Reference
Last updated on 2023-10-19 | Edit this page
Glossary
The definitions below are modified from the Carpentries Glosario (CC-BY-4.0)
- aggreation
- To combine many values into one, e.g., by summing a set of numbers.
- field
- A component of a record containing a single value. Every record in a database table has the same fields.
- filter
- To choose a set of records (i.e., rows of a table) based on the values they contain.
- full join
- A join that returns all rows and all columns from two tables A and B. Where the keys of A and B match, values are combined; where they do not, missing values from either table are filled with null, NA, or some other missing value signifier.
- group
- To divide data into subsets according to a set of criteria while leaving records in a single structure.
- inner join
- A join that returns the combination of rows from two tables, A and B, whose keys exist in both tables.
- join
- One of several operations that combine values from two tables.
- key
- A field or combination of fields whose value(s) uniquely identify a record within a table or dataset. Keys are often used to select specific records and in joins.
- left join
- A join that combines data from two tables, A and B, where keys in table A match keys in table B, fields are concatenated. Where a key in table A does not match a key in table B, columns from table B are filled with null, NA, or some other missing value. Keys from table B that do not match keys from table A are excluded for the result.
- missing value
- A special value such as null or NA used to indicate the absence of data. Missing values can signal that data was not collected or that the data did not exist in the first place (e.g., the middle name of someone who does not have one).
- null
- A special value used to represent a missing object.
- record
- A group of related values that are stored together. A record may be represented as a tuple or as a row in a table; in the latter case, every record in the table has the same fields.
- relational_database
- A database that organizes information into tables, each of which has a fixed set of named fields (shown as columns) and a variable number of records (shown as rows).
- right join
- A join that combines data from two tables, A and B. Where keys in table A match keys in table B, fields are concatenated. Where a key in table B does *not* match a key in table A, columns from table A are filled with null, NA, or some other missing value signifier. Keys from table A that do not exist in table B are dropped.
- select
- To choose entire columns or rows from a table by name or location.
- SQL
- The language used for writing queries for a relational database. The term is an acronym for Structured Query Language.
- table
- A set of records in a relational database or observations in a data frame. Tables are usually displayed as rows (each of which represents one record or observation) and columns (each of which represents a field or variable.)
Commands
See this cheat sheet for an list of the commands covered in this lesson.
Keywords
Keyword | Definition | Example | Description |
---|---|---|---|
SELECT | Select data from database or table | SELECT * | Selects the entire dataset |
SELECT column1 | Selects a particular column | ||
SELECT 1 + 2 | Performs a calculation | ||
FROM | Indicates the table from which the data is selected or deleted |
SELECT year FROM surveys |
Query will display the desired column from the table |
WHERE | Filter the result set so that only the values satisfying the condition are included |
SELECT * FROM surveys WHERE year == 1990 |
Query will display all values from the table for which the year is 1990 |
LIMIT | Retrieves the number of records from the table up to the limit value |
SELECT * FROM surveys LIMIT 5 |
Query will will return only the first 5 rows from the table |
DISTINCT | Select distinct values |
SELECT DISTINCT year FROM surveys |
Query will display the distinct years present on the table |
AS | Used as an alias to rename the column or table | SELECT 1 + 2 AS calc | Column will be renamed to “calc” |
GROUP BY | Groups the result set |
SELECT MAX(weight) FROM surveys GROUP BY year |
Query will display the max weight per year |
HAVING | Used to filter grouped rows when using aggregate functions |
SELECT MAX(weight) FROM surveys GROUP BY year HAVING MAX(weight) > 100 |
Filter the results by the years that have a maximum weight greater than 100g |
JOIN | Joins tables |
SELECT * FROM surveys JOIN species ON surveys.species_id = species.species_id |
Query will display all the columns from both tables where the condition is met |