Instructor Notes

Setup


Participants should install and run OpenRefine before the workshop so that any problems surface early.

The dataset used


  • The dataset used in this lesson can be downloaded from Figshare through the link in the setup section.
  • It will need to be downloaded to the local machine before it can be loaded into OpenRefine.
  • A general description of the dataset used in the Social Sciences lessons can be found on the workshop data home page.

The Lessons


Introduction

  • Explains what OpenRefine is, what it is used for and where to get help.

Working with OpenRefine

  • Covers the creation of an OpenRefine project using our dataset.
  • Facets and clustering are introduced, followed by a discussion of the different clustering algorithms and how they may produce different results.
  • Splitting columns is covered as is undo/redo.
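
For instructors who want to explain what a clustering method does under the hood, the following is a simplified Python sketch of a key-collision ("fingerprint") approach. It is an illustration only, not OpenRefine's own code; the function names `fingerprint` and `cluster` and the sample values are made up for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalised key (simplified fingerprint)."""
    # Strip accents by decomposing characters and dropping non-ASCII parts.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Trim whitespace, lowercase, and remove ASCII punctuation.
    cleaned = ascii_only.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Split into tokens, drop duplicates, sort, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical messy entries, invented for illustration.
villages = ["Chirodzo", "chirodzo ", "Ruaca", "Ruaca.", "God"]
clusters = cluster(villages)
print(clusters)
```

A key-based method like this only groups values that normalise to exactly the same key, so a misspelling such as "Chirodso" would not be merged with "Chirodzo". Distance-based (nearest neighbour) methods can catch such cases, which is one reason different algorithms may propose different clusters.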

Filtering and Sorting

  • Using Include and Exclude from a facet is covered and the difference between faceting and filtering is explained.
  • The various sort options for single or multiple columns are covered.

Examining Numbers in OpenRefine

  • Explains that everything is a string until you change it.
  • Explains how to change the data type and the additional faceting ability it provides.
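
The point about strings versus numbers can be illustrated outside OpenRefine with a short Python analogy. This is a sketch of the general idea, not of OpenRefine's behaviour in detail; the values are invented for the example.

```python
# Values as they might arrive from a CSV file: every cell is text.
raw_values = ["3", "12", "7"]

# Sorting text compares character by character, so "12" sorts before "3".
sorted_as_text = sorted(raw_values)

# After converting to numbers, sorting and range comparisons behave numerically.
as_numbers = [int(value) for value in raw_values]
sorted_as_numbers = sorted(as_numbers)
values_over_five = [n for n in as_numbers if n > 5]

print(sorted_as_text)
print(sorted_as_numbers)
print(values_over_five)
```

The range-style selection in the last step is comparable to what a numeric facet makes possible once a column's data type has been changed.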

Using scripts

  • Explains how actions within a project can be copied to an external file and re-applied. The same file is used to re-apply the changes.
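
The extracted actions are stored as JSON, so learners can inspect them with any JSON-aware tool. The sketch below parses a small, abbreviated example in Python. The JSON shown is an assumption about the general shape of an exported history (a list of operations, each with an `op` identifier and a `description`); the exact fields vary by operation and OpenRefine version, and the column name `village` is hypothetical.

```python
import json

# Abbreviated, illustrative operation history; real exports contain more fields.
history_json = """
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column village",
    "columnName": "village",
    "expression": "value.trim()"
  },
  {
    "op": "core/column-split",
    "description": "Split column village by separator",
    "columnName": "village",
    "separator": ";"
  }
]
"""

operations = json.loads(history_json)

# Summarise each step: which operation ran and on which column.
summary = [(step["op"], step.get("columnName")) for step in operations]
for op_name, column in summary:
    print(f"{op_name} -> {column}")
```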

Saving results

  • Covers the overall format of a project ‘file’ and how the components can be viewed.
  • This may require installing additional software on Windows machines (e.g. 7-Zip), as the built-in unzipping facility does not work with tar.gz files.
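
Where installing extra software is not possible, a tar.gz archive can also be inspected with Python's standard library. This is a minimal sketch; the helper name `list_archive_members` is made up for this example, and the path passed to it would be wherever the exported project was saved.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of the files stored in a .tar.gz archive."""
    # mode "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()
```

Calling `list_archive_members` with the path to an exported project returns the component file names without unpacking anything to disk.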

Other resources in OpenRefine

  • Lists various OpenRefine resources available online (taken from the Ecology lessons).