Introduction
- OpenRefine is a powerful, free, open-source tool for cleaning data.
- OpenRefine automatically tracks every step you take while working with your data.
Importing Data to OpenRefine
- Use the Create Project option to import data
- You can control how data is imported using the options on the import screen
- Several file types may be imported into OpenRefine
Exploring Data with OpenRefine
- Faceting can identify errors or outliers in data
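
To make the idea behind a text facet concrete, here is a minimal Python sketch. It is not how OpenRefine implements faceting; it only mimics the result, which is a list of distinct values with their counts. The `text_facet` helper and the sample `villages` column are hypothetical, chosen for illustration.

```python
from collections import Counter


def text_facet(values):
    """Return (value, count) pairs, most frequent first, like a text facet list."""
    return Counter(values).most_common()


# Hypothetical column with one misspelling ("Chirdozo").
villages = [
    "Chirodzo", "Chirodzo",
    "Ruaca", "Ruaca", "Ruaca",
    "Chirdozo",
    "God",
]

facet = text_facet(villages)
for value, count in facet:
    print(f"{value}: {count}")
```

Values that appear only once or twice, such as the misspelled entry here, are the ones worth inspecting as possible errors or outliers.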
Transforming Data
- Clustering can identify outliers in data and help us fix errors in bulk
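
The following Python sketch illustrates the general idea of key-collision clustering, one of the clustering approaches OpenRefine offers. It is a simplified approximation written for this summary, not OpenRefine's own code: the `fingerprint` and `cluster` functions and the sample `names` are assumptions. Each value is reduced to a normalized key, and values that share a key are grouped as likely variants of the same entry.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, strip accents and punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical column values with inconsistent spelling and spacing.
names = [
    "Ruaca Nhamuenda", "Nhamuenda, Ruaca", "ruaca  nhamuenda",
    "Chirodzo", "chirodzo ",
    "God",
]

clusters = cluster(names)
for group in clusters:
    print(group)
```

Once variants are grouped this way, all members of a group can be replaced with a single chosen spelling, which is what makes bulk fixes possible.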
Filtering and Sorting with OpenRefine
- OpenRefine provides various ways to sort and filter data without affecting the raw data.
Reconciliation of Values
- OpenRefine can match values against existing reconciliation services to enrich data
Looking Up Data
- OpenRefine can look up custom URLs to fetch data based on what’s in an OpenRefine project
- Such API calls can be custom-built
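
As a rough sketch of what "custom-built" means here, the Python below builds one lookup URL per cell value. The endpoint `https://api.example.org/lookup`, the `q` and `format` parameters, and the `build_lookup_url` helper are all hypothetical; a real service defines its own URL pattern. No request is sent, only the URLs are constructed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/lookup"


def build_lookup_url(cell_value, fmt="json"):
    """Build a query URL for one cell value, URL-encoding it safely."""
    return BASE_URL + "?" + urlencode({"q": cell_value, "format": fmt})


# Hypothetical cell values from a project column.
urls = [build_lookup_url(value) for value in ["Ruaca Nhamuenda", "Chirodzo"]]
for url in urls:
    print(url)
```

Encoding the cell value matters because values containing spaces or punctuation would otherwise produce invalid URLs.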
Exporting Data Cleaning Steps
- OpenRefine tracks all changes, and this history can be exported as a script to reuse in future analyses or to reproduce an analysis.
- Scripts can (and should) be published together with the dataset as part of the digital appendix of the research output.
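
An exported operation history is a JSON list of steps, so it can be read and summarized with a few lines of code before publishing it. The Python sketch below assumes each step carries an `op` and a `description` field; the `SAMPLE_HISTORY` content is an abbreviated, made-up example rather than a real export, and `summarize_operations` is a hypothetical helper.

```python
import json

# Abbreviated, illustrative history; a real export contains more fields per step.
SAMPLE_HISTORY = """
[
  {
    "op": "core/text-transform",
    "columnName": "village",
    "expression": "value.trim()",
    "description": "Text transform on cells in column village using expression value.trim()"
  },
  {
    "op": "core/mass-edit",
    "columnName": "village",
    "description": "Mass edit cells in column village"
  }
]
"""


def summarize_operations(history_json):
    """Return one numbered, human-readable line per recorded operation."""
    operations = json.loads(history_json)
    return [
        f"{number}. {step['op']}: {step['description']}"
        for number, step in enumerate(operations, start=1)
    ]


summary = summarize_operations(SAMPLE_HISTORY)
for line in summary:
    print(line)
```

A plain-language summary like this can sit alongside the JSON file in a digital appendix so readers can see what was done without opening OpenRefine.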
Exporting and Saving Data from OpenRefine
- OpenRefine can save the cleaned data in a number of formats.
- Cleaned data or entire projects can be exported from OpenRefine.
- Projects can be shared with collaborators, enabling them to see, reproduce and check all data cleaning steps you performed.
Other Resources in OpenRefine
- Online examples and resources are a good way to learn more about OpenRefine