Summary and Schedule
Python is a general purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.
This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with basic information about Python syntax and the Jupyter notebook interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
Prerequisites
This lesson requires a working copy of Python.
To use these materials most effectively, please install everything and download the data files mentioned in the Setup tab before working through this lesson.
For Instructors
If you are teaching this lesson in a workshop, please see the Instructor notes.
Setup Instructions | Download files required for the lesson |
Duration: 00h 00m | 1. Introduction to Python | Why learn Python? What are Jupyter notebooks? |
Duration: 00h 15m | 2. Python basics | How do I assign values to variables? How do I do arithmetic? What is a built-in function? How do I see results? What data types are supported in Python? |
Duration: 01h 10m | 3. Python control structures | What constructs are available for changing the flow of a program? How can I repeat an action many times? How can I perform the same task(s) on a set of items? |
Duration: 01h 55m | 4. Creating re-usable code | What are user-defined functions? How can I automate my code for re-use? |
Duration: 02h 35m | 5. Processing data from a file | How can I read and write files? What kind of data files can I read? |
Duration: 03h 45m | 6. Dates and Time | How are dates and time represented in Python? How can I manipulate dates and times? |
Duration: 04h 10m | 7. Processing JSON data | What is JSON format? How can I extract specific data items from a JSON record? How can I convert an array of JSON records into a table? |
Duration: 04h 55m | 8. Reading data from a file using Pandas | What is Pandas? How do I read files using Pandas? What is the difference between reading files using Pandas and other methods of reading files? |
Duration: 05h 15m | 9. Extracting rows and columns | How can I extract specific rows and columns from a Dataframe? How can I add or delete columns from a Dataframe? How can I find and change missing values in a Dataframe? |
Duration: 05h 45m | 10. Data Aggregation using Pandas | How can I summarise the data in a data frame? |
Duration: 06h 15m | 11. Joining Pandas Dataframes | How can I join two Dataframes with a common key? |
Duration: 06h 50m | 12. Wide and long data formats | What are long and wide formats? Why would I want to change between them? |
Duration: 07h 25m | 13. Data visualisation using Matplotlib | How can I create visualisations of my data? |
Duration: 08h 15m | 14. Accessing SQLite Databases | How can I access database tables using Pandas and Python? What are the advantages of storing data in a database? |
Duration: 09h 15m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data
Data for this lesson is from the SAFI Teaching Database and the Audit of Political Engagement 11, 2013.
We will use the files listed below for the data in this lesson. You can download the files by clicking on the links.
Note: make sure to place the data files in the same folder as the notebook you are running.
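As a quick check of the note above, the following minimal sketch lists which expected data files are not yet in the folder the notebook runs from. The helper name `missing_files` and the file name `my_data_file.csv` are hypothetical placeholders, not part of the lesson; substitute the names of the files you actually downloaded.

```python
from pathlib import Path


def missing_files(expected_names, folder="."):
    """Return the names from expected_names that are not files in folder."""
    folder_path = Path(folder)
    return [name for name in expected_names if not (folder_path / name).is_file()]


# Hypothetical file name for illustration; replace with your downloaded files.
expected = ["my_data_file.csv"]

# Path.cwd() is the folder the notebook (or script) is running from.
print("Notebook folder:", Path.cwd())
print("Missing files:", missing_files(expected))
```

An empty list means every expected file is in place; any names printed still need to be copied into the notebook folder.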
Software
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend an all-in-one installer.
For this workshop we use Python version 3.x.
Setup instructions for Python
To complete the materials for the Python lesson, you will need Python installed on your machine. As many of the examples and exercises use Jupyter notebooks, you will need Jupyter installed as well.
The Anaconda distribution of Python will allow you to install both Python and Jupyter notebooks as a single install. Anaconda will also install many other commonly used Python packages.
How to install the Anaconda distribution of Python
- Follow the Anaconda link above to the Anaconda website. There are versions of Anaconda available for Windows, macOS, and Linux. The website will detect your operating system and provide a link to the appropriate download.
- There will be two options, one for Python 2.x and another for Python 3.x. We will take the Python 3.x option. Python 2.x will eventually be phased out but is still provided for backward compatibility with some older optional Python modules. The majority of popular modules have been converted to work with Python 3.x. The actual value of x will vary depending on when you download. At the time of writing I am being offered Python 3.6 or Python 2.7.
- For Windows and Linux there is the option of either a 64-bit (default) download or a 32-bit download. Unless you know that you have an old 32-bit PC, you should choose the 64-bit installer.
- Run the downloaded installer program. Accept the default settings until you are given the option to add Anaconda to your PATH environment variable. Although the installer recommends against this option and shows a warning, you should select it. This will make it easier later on to start Jupyter notebooks from any location.
- The installation can take a few minutes. When it has finished, you should be able to open a cmd prompt (type cmd from the Windows Start menu) and, in the cmd window, type python. You should get a display similar to that below.
- The >>> prompt tells you that you are in the Python environment. You can exit Python with the exit() command.
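Once the >>> prompt appears, the installation can be checked from within Python itself. The sketch below is one possible check, not an official part of the setup: it reports the interpreter version and whether a `jupyter` command is visible on PATH. The helper name `is_python3` is made up for illustration.

```python
import shutil
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter is some Python 3.x release."""
    return version_info[0] == 3


print("Python version:", sys.version.split()[0])
print("Python 3.x:", is_python3())

# shutil.which returns the full path of a command found on PATH, or None.
jupyter_path = shutil.which("jupyter")
print("jupyter command:", jupyter_path or "not found on PATH")
```

If the last line reports that the command was not found, Anaconda was probably not added to PATH during installation.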
Running Jupyter Notebooks in Windows
- In File Explorer, navigate to the location where you can select the folder that contains your Jupyter notebooks (it can be empty initially).
- Hold down the Shift key and right-click the mouse.
- The pop-up menu items will include an option to start a cmd window or, in the latest Windows release, a ‘PowerShell’ window. Select whichever appears.
- When the window opens, type the command jupyter notebook.
- Several messages will appear in the command window. In addition, your default web browser will open and display the Jupyter notebook home page. The main part of this is a file browser window starting at the folder you selected in step 1.
- There may be existing notebooks which you can select and open in a new tab in your browser or there is a menu option to create a new notebook.
- The Jupyter package creates a small web service and opens your browser pointing at it. If your browser does not open, you can open it manually and specify ‘localhost:8888’ as the URL.
- Port 8888 is the default port used by the Jupyter web service, but if that port is already in use Jupyter will increment the port number automatically. Either way, the port number actually used is given in a message in the cmd/powershell window.
- Once running, the cmd/powershell window will display additional messages, e.g. about saving notebooks, but there is no need to interact with it directly. The window can be minimized and ignored.
- To shut Jupyter down, select the cmd/powershell window, press Ctrl+C twice, and then close the window.
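The port behaviour described above can be illustrated with a short sketch. This is not Jupyter's own implementation, only an assumed approximation of the idea: start at 8888 and move to the next port number whenever one is already taken. The function name `first_free_port` and its parameters are hypothetical.

```python
import socket


def first_free_port(start=8888, attempts=50, host="127.0.0.1"):
    """Return the first port at or above start that can be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port already in use, try the next one
            return port
    raise RuntimeError("no free port found in the range tried")


port = first_free_port()
print(f"First free port at or above 8888: {port}")
```

If a notebook server is already running on port 8888, this reports a higher number. The port Jupyter really uses is always the one shown in the cmd/powershell window.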