What is a relational database?
- A relational database is data organised as a collection of related tables
- SQL (Structured Query Language) is used to extract data from the tables. Either a single table or data spread across two or more related tables.
- A schema, which describes the data in a table, has to be created before data can be added
- The schema can be used to provide some data validation on input
Using DB Browser for SQLite
- The DB Browser for SQLite application allows you to connect to an existing database or create a new database
- When connected to a database you can create new tables
- When connected to a database you can write and run SQL queries and view the results
- You can save the results of a query to a file
The Select Statement
- Strictly speaking SQL is a standard, not a particular implementation
- SQL implementation are sufficiently close that you only have to learn SQL once
- The DDL constructs are used to create tables and other database objects
- The DML constructs, typically the SELECT statement is used to retrieve data from one or more tables
- The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest
Missing Data
- You should expect missing data
- You need to know how missing data is being represented in your dataset
- Database systems always represent what they consider to be missing
data as
NULL
- You can explicitly test for
NULL
values in your data - You may need other tests for different representations of
NULL
Creating New Columns
- New result columns can be created using arithmetic operators or built-in functions
- New columns have to be given names or aliases
- The
CASE
coding structure can be used to create new columns - The new columns are only in the query results. The original table is not changed
Aggregations
- Built-in functions can be used to produce a variety of summary statistics
- The
DISTINCT
keyword can be used to find the unique set of values in a column or columns - Data in columns can be summarised by values using the
GROUP BY
clause - Summarised data can be filtered using the
HAVING
clause
Creating tables and views
- Database tables can be created using the DDL command
CREATE TABLE
- They can be populated using the
INSERT INTO
command - The SQLite plugin allows you to create a table and import data into it in one step
- There are many options available to the
CREATE TABLE
command which allows greater control over how or what data can be loaded into the table - A View can be treated just like a table in a query
- A View does not contain data like a table does, only the instructions on how to get the data
The SQLite command line
- SQLite databases can be created, managed and queried from the SQLite shell utility
- You can run the shell interactively from the command line, typing queries or dot commands at the prompt
- You can call the SQLite3 program and specify a database and a set of commands to run. This aids automation
Joins
- Joins are used to combine data from two or more tables.
- Tables to be joined must have a column in each which represent the same thing
- There are several different types of joins
- The Inner join is the most commonly use
- You may have to use the other join types to discover missing data or gaps in your data
Using database tables in other environments
- ODBC - Open DataBase Connector allows a database to be connected to a program or application
- Each database system has its own ODBC connectors
- Programs such as Excel allow you to use ODBC to get data from databases
- Programming languages such as Python and R provide libraries which facilitate ODBC connections