Instructor Notes
Setup
Participants should install and run OpenRefine before the workshop so that any installation problems surface early.
The dataset used
- The dataset used in this lesson can be downloaded from Figshare through the link in the setup section.
- It will need to be downloaded to the local machine before it can be loaded into OpenRefine.
- A general description of the dataset used in the Social Sciences lessons can be found on the workshop data home page.
The Lessons
- Explains what OpenRefine is, what it is used for and where to get help.
- Covers creating an OpenRefine project from the lesson dataset.
- Introduces facets and clustering, and discusses the different clustering algorithms and how they can produce different results.
- Covers splitting columns, as well as undo/redo.
- Using Include and Exclude from a facet is covered and the difference between faceting and filtering is explained.
- Covers the various sort options for single or multiple columns.
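If learners ask why the clustering methods disagree, the idea behind key-collision clustering can be shown outside OpenRefine. The sketch below is a simplified Python illustration that follows the steps OpenRefine documents for its "fingerprint" and "n-gram fingerprint" keying methods; it is not OpenRefine's actual implementation. The function names and sample values are made up for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def _ascii_fold(s):
    # Decompose accented characters, then drop the combining marks (e.g. "ç" -> "c").
    s = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in s if not unicodedata.combining(ch))


def fingerprint(value):
    # Trim, lowercase, remove punctuation, fold to ASCII,
    # then sort the unique whitespace-separated tokens and rejoin them.
    s = value.strip().lower()
    s = s.translate(str.maketrans("", "", string.punctuation))
    s = _ascii_fold(s)
    return " ".join(sorted(set(s.split())))


def ngram_fingerprint(value, n=2):
    # Lowercase, fold to ASCII, keep only letters and digits (this drops
    # punctuation AND whitespace), then join the sorted unique n-grams.
    s = _ascii_fold(value.lower())
    s = "".join(ch for ch in s if ch.isalnum())
    grams = {s[i:i + n] for i in range(len(s) - n + 1)}
    return "".join(sorted(grams))


def cluster(values, key):
    # Group values whose keys collide; keep only groups with 2+ distinct spellings.
    groups = defaultdict(list)
    for v in values:
        groups[key(v)].append(v)
    return [vs for vs in groups.values() if len(set(vs)) > 1]


if __name__ == "__main__":
    values = ["Chirodzo", "chirodzo ", "Chirodzo.", "Ruaca-Nhamuenda", "Ruaca - Nhamuenda"]
    print(cluster(values, fingerprint))        # only the "Chirodzo" variants cluster
    print(cluster(values, ngram_fingerprint))  # the "Ruaca" variants cluster as well
```

Because the fingerprint method keeps whitespace as a token separator, "Ruaca-Nhamuenda" and "Ruaca - Nhamuenda" end up with different keys, whereas the n-gram method discards whitespace and groups them. This is one concrete way to show that the choice of algorithm changes the clusters offered.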
Examining Numbers in OpenRefine
- Explains that every cell value is treated as a string (text) until you convert it.
- Explains how to change the data type (e.g. to a number) and the additional faceting options this provides.
- Explains how the actions performed in a project can be extracted to an external file, and how that file can then be used to re-apply the same changes.
- Covers the overall format of a project ‘file’ and how the components can be viewed.
- This may require installing additional software on Windows machines (e.g. 7-Zip), because the built-in unzipping facility does not handle tar.gz files.
- A list of OpenRefine resources available online (taken from the Ecology lessons).
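Because an exported project file is a gzip-compressed tar archive, an instructor who cannot install 7-Zip on a Windows machine could instead use Python's standard-library `tarfile` module to view its components. This is a minimal sketch, assuming Python 3 is available; the function names are hypothetical, and the code makes no assumptions about which files the archive contains.

```python
import sys
import tarfile


def list_project_archive(path):
    """Return the member names of an exported project archive (.tar.gz)."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def extract_project_archive(path, dest):
    """Extract every member of the archive into the directory `dest`."""
    with tarfile.open(path, mode="r:gz") as archive:
        # Newer Python versions provide a "data" filter that blocks unsafe paths;
        # fall back to plain extraction on versions that lack it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_project.py my-project.openrefine.tar.gz
    for name in list_project_archive(sys.argv[1]):
        print(name)
```

Listing the members first lets learners see the overall structure before deciding whether to extract anything.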