Summary and Schedule
Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start.
We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too!
In this lesson, you will learn:
- Good data entry practices - formatting data tables in spreadsheets
- How to avoid common formatting mistakes
- Approaches for handling dates in spreadsheets
- Basic quality control and data manipulation in spreadsheets
- Exporting data from spreadsheets
In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged
to use their own computers to insure the proper setup of tools for an
efficient workflow.
These lessons assume no prior knowledge
of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
Prerequisites
This lesson requires a working copy of spreadsheet software, such as
Microsoft Excel or LibreOffice or OpenOffice.org (see more details in
“Setup”).
To most effectively use these
materials, please make sure to install everything before
working through this lesson.
For Instructors
If you are teaching this lesson in a workshop, please see the Instructor notes.
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Introduction | What are basic principles for using spreadsheets for good data organization? |
Duration: 00h 18m | 2. Formatting data tables in Spreadsheets | How do we format data in spreadsheets for effective data use? |
Duration: 00h 53m | 3. Formatting problems | What are some common challenges with formatting data in spreadsheets and how can we avoid them? |
Duration: 01h 13m | 4. Dates as data | What are good approaches for handling dates in spreadsheets? |
Duration: 01h 26m | 5. Quality control | How can we carry out basic quality control and quality assurance in spreadsheets? |
Duration: 01h 46m | 6. Exporting data | How can we export data from spreadsheets in a way that is useful for downstream applications? |
Duration: 01h 56m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data
Download this data file to your computer: https://ndownloader.figshare.com/files/2252083
About the data
The data for this lesson is a part of the Data Carpentry Ecology workshop. It is a teaching version of the Portal Database. The data in this lesson is a subset of the teaching version that has been intentionally ‘messed up’ for this lesson.
The data for this lesson and the workshop are in the Portal Project Teaching Database available on FigShare, with a CC-BY license available for reuse.
Ernest, M., Brown, J., Valone, T., and White, E.P. (2017). Portal Project Teaching Database. Version 6. Figshare. DOI: 10.6084/m9.figshare.1314459.v6
Software
To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, OpenOffice.org, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same.
For this lesson, if you don’t have a spreadsheet program already, you can use LibreOffice. It’s a free, open source spreadsheet program.
Windows
- Download the Installer
- Install LibreOffice by going to the installation page. The version for Windows should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice
- Once the installer is downloaded, double click on it and LibreOffice should install.
Mac OS X
- Download the Installer
- Install LibreOffice by going to the installation page. The version for Mac should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice
- Once the installer is downloaded, double click on it and LibreOffice should install.
Linux
- Download the Installer
- Install LibreOffice by going to the installation page. The version for Linux should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice
- Once the installer is downloaded, double click on it and LibreOffice should install.