Instructor Notes
Setup
Participants should install and run OpenRefine before the workshop, so that any problems reveal themselves early.
The dataset used
- The dataset used in this lesson can be downloaded from Figshare through the link in the setup section.
- It will need to be downloaded to the local machine before it can be loaded into OpenRefine.
- A general description of the dataset used in the Social Sciences lessons can be found in the workshop data home page.
The Lessons
- Explains what OpenRefine is, what it is used for and where to get help.
- Covers the creation of an OpenRefine project using our dataset.
- Facets and clustering are introduced and there is a discussion on the different clustering algorithms and how they may produce different results.
- Splitting columns is covered, as is undo/redo.
- Using Include and Exclude from a facet is covered and the difference between faceting and filtering is explained.
- The various sort options for single or multiple columns are covered.
Examining Numbers in OpenRefine
- Explains that everything is a string until you change it.
- Explains how to change the data type and the additional faceting ability it provides.
- Explains how the actions taken within a project can be copied to an external file, and how that same file is then used to re-apply the changes.
- Covers the overall format of a project ‘file’ and how the components can be viewed.
- This may require installing additional software on a Windows machine (e.g. 7-zip), as the built-in unzipping facility does not work with tar.gz files.
- A list of various OpenRefine resources available online (taken from the Ecology lessons).
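For instructors who would rather not install an extra tool just to look inside a tar.gz project archive, the following is a minimal sketch using Python's standard `tarfile` module. It is not part of the lesson itself; the function name `list_archive_members` and the archive file name in the usage note are illustrative assumptions.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

Calling `list_archive_members("my-project.openrefine.tar.gz")` (a hypothetical file name) returns the entry names as a list, which can be printed to show learners the components of the archive without unpacking it.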
Introduction
Please help improve this page
There are several issues related to this section of the lesson:
- the goals of the (whole) lesson should be stated (#38)
- it does not explain the difference between data cleaning and data organisation (#56)
- the contents do not match the objectives (#86)
- it does not explain when (not) to use OpenRefine (#103)
- the Getting Help section should move to the end or an Extras page (#122)
Your input on these issues would be much appreciated!
Data privacy when using APIs or reconciliation
Most functionality does not require an Internet connection and keeps your data on your own computer. Some functions, however, such as looking up data from URLs or reconciling values in your dataset with online services, necessarily send data to those services. While this lesson does not cover these functions, it may be important to know how data could be shared with outside parties, especially if you work with sensitive or confidential data.
Zooming hides buttons
OpenRefine is used through its graphical user interface in this lesson. In classroom settings or in online classes, you probably want to zoom in on the interface so that text is readable to all. However, when you zoom in, some controls may fall outside the view. Dialog windows in OpenRefine cannot be dragged, so the only way to show buttons that were outside the view is to zoom out again.
If you are planning to teach this lesson to a big room, you may want to check if the main projector screen or monitor is large enough to show all of the user interface while having the text large enough that all learners can see it.
Working with OpenRefine
Importing the sample data
The file has a single header row and comma-separated values. OpenRefine should not have trouble figuring out the settings for parsing these data. Either US-ASCII or UTF-8 is fine as the character encoding.
Consider giving the project a meaningful name. If you do, briefly explain how that name is meaningful (to you and hopefully others).
There are many columns in the file, which may be handled after importing.
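If you want to confirm before class that your downloaded copy of the file really has a single header row, comma-separated values, and a readable encoding, a quick check outside OpenRefine can help. The sketch below uses Python's standard `csv` module; the function name `summarize_csv` and the file name in the usage note are assumptions made for illustration, not names from the lesson.

```python
import csv


def summarize_csv(path, encoding="utf-8"):
    """Return (column_names, data_row_count) for a comma-separated file
    whose first row is the header."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)  # the default delimiter is a comma
        header = next(reader)        # first row holds the column names
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `header, rows = summarize_csv("sample-data.csv")` (a hypothetical file name) followed by `print(len(header), rows)` shows how many columns and data rows to expect in the OpenRefine preview. A `UnicodeDecodeError` here would suggest the file is not in the encoding you passed.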
Open Project when you return to the start screen
If at any time during the lesson you (accidentally) end up back at the start screen, you can demonstrate “Open Project”. It reopens your project where you left off, which shows that OpenRefine continually saves the project in the background.
Exercises available in OpenRefine for Ecologists lesson
These facet types are explored further in Examining Numeric Columns in the OpenRefine for Ecologists lesson. Note that this is a different lesson!