What is a relational database?


  • A relational database is data organised as a collection of related tables
  • SQL (Structured Query Language) is used to extract data from the tables. Either a single table or data spread across two or more related tables.
  • A schema, which describes the data in a table, has to be created before data can be added
  • The schema can be used to provide some data validation on input

Using DB Browser for SQLite


  • The DB Browser for SQLite application allows you to connect to an existing database or create a new database
  • When connected to a database you can create new tables
  • When connected to a database you can write and run SQL queries and view the results
  • You can save the results of a query to a file

The Select Statement


  • Strictly speaking SQL is a standard, not a particular implementation
  • SQL implementation are sufficiently close that you only have to learn SQL once
  • The DDL constructs are used to create tables and other database objects
  • The DML constructs, typically the SELECT statement is used to retrieve data from one or more tables
  • The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest

Missing Data


  • You should expect missing data
  • You need to know how missing data is being represented in your dataset
  • Database systems always represent what they consider to be missing data as NULL
  • You can explicitly test for NULL values in your data
  • You may need other tests for different representations of NULL

Creating New Columns


  • New result columns can be created using arithmetic operators or built-in functions
  • New columns have to be given names or aliases
  • The CASE coding structure can be used to create new columns
  • The new columns are only in the query results. The original table is not changed

Aggregations


  • Built-in functions can be used to produce a variety of summary statistics
  • The DISTINCT keyword can be used to find the unique set of values in a column or columns
  • Data in columns can be summarised by values using the GROUP BY clause
  • Summarised data can be filtered using the HAVING clause

Creating tables and views


  • Database tables can be created using the DDL command CREATE TABLE
  • They can be populated using the INSERT INTO command
  • The SQLite plugin allows you to create a table and import data into it in one step
  • There are many options available to the CREATE TABLE command which allows greater control over how or what data can be loaded into the table
  • A View can be treated just like a table in a query
  • A View does not contain data like a table does, only the instructions on how to get the data

The SQLite command line


  • SQLite databases can be created, managed and queried from the SQLite shell utility
  • You can run the shell interactively from the command line, typing queries or dot commands at the prompt
  • You can call the SQLite3 program and specify a database and a set of commands to run. This aids automation

Joins


  • Joins are used to combine data from two or more tables.
  • Tables to be joined must have a column in each which represent the same thing
  • There are several different types of joins
  • The Inner join is the most commonly use
  • You may have to use the other join types to discover missing data or gaps in your data

Using database tables in other environments


  • ODBC - Open DataBase Connector allows a database to be connected to a program or application
  • Each database system has its own ODBC connectors
  • Programs such as Excel allow you to use ODBC to get data from databases
  • Programming languages such as Python and R provide libraries which facilitate ODBC connections