Summary and Schedule
Python is a general purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.
This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in one and a half days (about 10 hours). They start with basic information about Python syntax and the Jupyter notebook interface, then move on to importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
For Instructors
If you are teaching this lesson in a workshop, please see the Instructor notes.
| Elapsed time | Episode | Questions |
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Before we start | What is Python and why should I learn it? |
| 00h 30m | 2. Short Introduction to Programming in Python | How do I program in Python? How can I represent my data in Python? |
| 01h 05m | 3. Starting With Data | How can I import data in Python? What is Pandas? Why should I use Pandas to work with data? |
| 02h 05m | 4. Indexing, Slicing and Subsetting DataFrames in Python | How can I access specific data within my data set? How can Python and Pandas help me to analyse my data? |
| 03h 05m | 5. Data Types and Formats | What types of data can be contained in a DataFrame? Why is the data type important? |
| 03h 50m | 6. Combining DataFrames with Pandas | Can I work with data from multiple sources? How can I combine data from different data sets? |
| 04h 35m | 7. Data Workflows and Automation | Can I automate operations in Python? What are functions and why should I use them? |
| 06h 05m | 8. Making Plots With plotnine | How can I visualize data in Python? What is ‘grammar of graphics’? |
| 07h 35m | 9. Data Ingest and Visualization - Matplotlib and Pandas | What other tools can I use to create plots apart from ggplot? Why should I use Python to create plots? |
| 09h 20m | 10. Accessing SQLite Databases Using Python and Pandas | What if my data are stored in an SQL database? Can I manage them with Python? How can I write data from Python to be used with SQL? |
| 10h 05m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data
Data for this lesson is from the Portal Project Teaching Database. Specifically, we use the following eight data files:
- bouldercreek_09_2013.txt
- plots.csv
- portal_mammals.sqlite
- species.csv
- speciesSubset.csv
- surveys.csv
- surveys2001.csv
- surveys2002.csv
Please download them and move them all to the same directory, or download all the files as a single zip file. You’ll need to unzip this file after downloading it.
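After downloading, you can check from Python that everything landed in one folder. The sketch below is a minimal example that assumes you saved the zip archive somewhere on disk. It unpacks the archive with the standard-library zipfile module and lists which of the eight expected files are missing. The helper names `unzip` and `missing_data_files`, and the paths in the usage comment, are hypothetical placeholders.

```python
from pathlib import Path
import zipfile

# The eight data files used in this lesson.
EXPECTED_FILES = [
    "bouldercreek_09_2013.txt",
    "plots.csv",
    "portal_mammals.sqlite",
    "species.csv",
    "speciesSubset.csv",
    "surveys.csv",
    "surveys2001.csv",
    "surveys2002.csv",
]


def unzip(archive, destination):
    """Extract every member of the zip file `archive` into `destination`."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)


def missing_data_files(folder):
    """Return the expected data files that are not directly inside `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Hypothetical usage (adjust the paths to where you saved the download):
# unzip("data.zip", "data")
# print(missing_data_files("data"))  # an empty list means every file is in place
```

If the archive unpacks into a sub-folder, pass that sub-folder to `missing_data_files` instead.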
Installing Python using Anaconda
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of the scientific packages we use in the lesson individually can be a bit cumbersome, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.10 is fine and will continue to receive security patches until 2026-OCT-04).
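Once Python is installed, you can confirm which version you are running with a few lines of Python, for example in a Jupyter cell. This is a minimal sketch using only the standard library. The helper name `check_python_version` is illustrative, not part of any package.

```python
import sys


def check_python_version(required_major=3):
    """Return True if the running interpreter's major version matches."""
    return sys.version_info.major == required_major


print(sys.version)  # full version string of this interpreter
print("Python 3.x detected:", check_python_version())
```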
Installing Anaconda
Follow the instructions below for your operating system.
Windows
Open https://www.anaconda.com/products/individual in your web browser.
Download the Anaconda Python 3 installer for Windows.
Double-click the executable and install Python 3 using the recommended settings. Make sure that the Register Anaconda as my default Python 3.x option is checked (it should be checked by default in the latest version of Anaconda).
Verify the installation: click Start, then search for and select Anaconda Prompt from the menu. A window should pop up where you can now type commands, such as checking your conda installation with:
conda --help
macOS
Visit https://www.anaconda.com/products/individual in your web browser.
Download the Anaconda Python 3 installer for macOS. These instructions assume that you use the graphical installer .pkg file.
Follow the Anaconda Python 3 installation instructions. Make sure that the install location is set to “Install only for me” so Anaconda will install its files locally, relative to your home directory. Installing the software for all users tends to create problems in the long run and should be avoided.
Verify the installation: click the Launchpad icon in the Dock, type Terminal in the search field, then click Terminal. A window should pop up where you can now type commands, such as checking your conda installation with:
conda --help
Linux
Note that the following installation steps require you to work from the terminal (shell). If you run into any difficulties, please request help before the workshop begins.
Open https://www.anaconda.com/products/individual in your web browser.
Download the Anaconda Python 3 installer for Linux.
Install Anaconda using all of the defaults for installation:
- Open a terminal window.
- Navigate to the folder where you downloaded the installer.
- Type bash Anaconda3- and press Tab. The name of the file you just downloaded should appear.
- Press Return.
- Follow the text-only prompts. When the license agreement appears (a colon will be present at the bottom of the screen), press Spacebar until you see the bottom of the text. Type yes and press Return to approve the license. Press Return again to approve the default location for the files. Type yes and press Return to prepend Anaconda to your PATH (this makes the Anaconda distribution your user’s default Python).
- Verify the installation: this depends a bit on your Linux distribution, but often you will have an Applications listing in which you can select a Terminal icon you can click. A window should pop up where you can now type commands, such as checking your conda installation with:
  conda --help
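As an alternative check from inside Python, you can ask whether a `conda` executable is visible on the PATH of the running process. This is a minimal sketch using `shutil.which` from the standard library. The helper name `find_conda` is illustrative. A result of None does not necessarily mean conda is missing. For example, on Windows, conda is typically on PATH only inside the Anaconda Prompt.

```python
import shutil


def find_conda():
    """Return the full path of the `conda` executable on PATH, or None."""
    return shutil.which("conda")


location = find_conda()
if location is None:
    print("conda was not found on PATH; revisit the installation steps.")
else:
    print("conda found at:", location)
```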
Required Python Packages
The following are packages needed for this workshop:
- Jupyter notebook
- NumPy
- pandas
- Matplotlib
- plotnine
All packages apart from plotnine will have been installed automatically with Anaconda, and we can use Anaconda as a package manager to install the missing plotnine package. Open a Terminal if you are using macOS or Linux (see the instructions above), or launch an Anaconda Prompt if you are using Windows. In your terminal window, type the following:
conda install -y -c conda-forge plotnine
This will install the latest compatible version of plotnine into your conda environment.
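To confirm that the workshop packages can be imported, you can run a short check from Python or a Jupyter cell. The sketch below uses `importlib.util.find_spec`, which returns None when a top-level module cannot be found. The helper name `check_packages` is illustrative. The module names are assumptions: `notebook` is taken to be the import name of the classic Jupyter notebook package, and `numpy` is listed because pandas depends on it.

```python
import importlib.util

# Import names (not necessarily conda package names) of the workshop packages.
REQUIRED = ["numpy", "pandas", "matplotlib", "plotnine", "notebook"]


def check_packages(names=REQUIRED):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_packages().items():
    print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```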
Required packages: Miniconda
Miniconda is a lightweight version of Anaconda. If you install Miniconda instead of Anaconda, you need to install the required packages manually:
conda install -y numpy pandas matplotlib jupyter
conda install -y -c conda-forge plotnine
(Alternative) Installing required packages with environment file
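Download the environment.yml file (right-click the link and select Save As). Then, in the directory where you downloaded environment.yml, run:
conda env create -f environment.yml
Activate the new environment with conda activate, followed by the environment name given in the name: field of environment.yml.
You can deactivate the environment with:
conda deactivate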
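To check from Python which conda environment is active, you can read the `CONDA_DEFAULT_ENV` variable that `conda activate` sets, and look at `sys.prefix`, which points at the running interpreter's installation directory. This is a minimal sketch. The helper name `active_conda_env` is illustrative. A Jupyter kernel inherits these values from the terminal that launched it.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Active conda environment:", active_conda_env())
print("Interpreter prefix:", sys.prefix)
```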
Launch a Jupyter notebook
After installing either Anaconda or Miniconda and the workshop packages, launch a Jupyter notebook by typing this command into the terminal or Anaconda Prompt:
jupyter notebook
The notebook should open automatically in your browser. If it does not, or if you wish to use a different browser, open this link: http://localhost:8888.
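If the browser does not open, you can check whether anything is accepting connections on the default address. This is a minimal sketch using the standard-library `socket` module. The helper name `is_listening` is illustrative. A True result only shows that some program is listening on that port, not that it is Jupyter. If port 8888 is already in use, Jupyter may pick another port, so the address printed in the terminal is the one to trust.

```python
import socket


def is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if is_listening():
    print("Something is listening on port 8888; try http://localhost:8888")
else:
    print("Nothing is listening on port 8888; check the terminal for the URL Jupyter printed.")
```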
Leave the terminal used to launch Jupyter open
Jupyter depends on a server that runs in the background, tied to the window used to launch it. Closing that window stops the server, which causes errors in the web interface. When you are done, you can either close the terminal or, if you need the terminal for other tasks, shut down the server by pressing Ctrl+C and then typing y within 5 seconds.
For a brief introduction to Jupyter Notebooks, please consult our Introduction to Jupyter Notebooks page.