Examining Numbers in OpenRefine
Last updated on 2023-06-12 | Edit this page
Estimated time: 20 minutes
Overview
Questions
- How can we convert a column from one data type to another?
- How can we find non-numeric values in a column that should contain numbers?
Objectives
- Transform a text column into a number column.
- Identify and modify non-numeric values in a numeric column using facets.
Numbers
When a table is imported into OpenRefine, all columns are treated as
containing text values. We saw earlier how we can sort column values as
numbers, but this does not change the cells in a column from text to
numbers. Rather, this interprets the values as numbers for the purposes
of sorting but keeps the underlying data type as is. We can, however,
transform columns from text to other data types (e.g. number or date)
using the Edit cells
> Common transforms
feature. Here we will experiment changing columns to numbers and see
what additional capabilities that grants us.
Be sure to remove any Text filter
facets you have
enabled from the left panel so that we can examine our whole dataset.
You can remove an existing facet by clicking the x
in the
upper left of that facet window.
To transform cells in the years_farm
column to numbers,
click the down arrow for that column, then Edit cells
>
Common transforms…
> To number
. You will
notice the years_farm
values change from left-justified to
right-justified, and black to green in color.
Transforming column contents to numbers
Transform three more columns, no_membrs
,
years_liv
, and buildings_in_compound
, from
text to numbers. Can all columns be transformed to numbers? - Try it
with village
for example.
Only observations that include only numerals (0-9) can be transformed
to numbers. If you apply a number transformation to a column that
doesn’t meet this criteria, and then click the Undo / Redo
tab, you will see a step that starts with
Text transform on 0 cells
. This means that the data in that
column was not transformed.
Numeric facet
Sometimes non-numeric values or blanks occur in a column where
numbers are expected. Such values may represent errors in data entry,
and we want to find them. We can do that with a
Numeric facet
.
Using a numeric facet
- For a column you transformed to numbers, edit one or two cells,
replacing the numbers with text (such as
abc
) or blank (no number or text). You will need to change theData type
totext
using the drop-down menu. - Use the column pulldown menu to apply a numeric facet to the column you edited. The facet will appear in the left panel.
- Notice that there are several checkboxes in this facet:
Numeric
,Non-numeric
,Blank
, andError
. Below these checkboxes are counts of the number of cells in each category. You should see checks forNon-numeric
andBlank
if you changed some values. - Experiment with checking or unchecking these boxes to select subsets of your data.
When done examining the numeric data, remove this facet by clicking
the x
in the upper left corner of its panel. Note that this
does not undo the edits you made to the cells in this column. Use the
Undo / Redo
function to reverse these changes.
Key Points
- OpenRefine also provides ways to to examine and clean numerical data.