Before we Start


Figure 1

RStudio extends what R can do, and makes it easier to write R code and interact with R.

Figure 2

Automatic car gear shift, representing the ease of RStudio

Figure 3

Screenshot of the RStudio startup screen

Figure 4

Example of a working directory structure

Figure 5

Screenshot of Packages pane

Figure 6

Screenshot of Install Packages Window

Introduction to R


Starting with Data


Figure 1

A 3 by 3 data frame with columns showing numeric, character and logical values.

Figure 2

Monsters at a fork in the road, with signs saying "here" and "not here". One direction, "not here", leads to a scary dark forest with spiders and absolute filepaths, while the other leads to a sunny green meadow and a city below a rainbow, a world free of absolute filepaths.
Image credit: Allison Horst

Figure 3

Yes/no bar graph showing number of individuals who are members of irrigation association

Figure 4

Bar plot of association membership, showing missing responses.

Figure 5

Bar graph showing number of individuals who are members of an irrigation association, including an undetermined option

Data Wrangling with dplyr


Data Wrangling with tidyr


Figure 1

R for Data Science, Wickham H and Grolemund G (https://r4ds.had.co.nz/index.html) © Wickham, Grolemund 2017. This image is licensed under Attribution-NonCommercial-NoDerivs 3.0 United States (CC-BY-NC-ND 3.0 US)


Figure 2

Long and wide dataframe layouts mainly affect readability. You may prefer the “wide” format visually, since more of the data fits on the screen. However, all of the R functions we have used so far expect your data to be in a “long” format, because the long format is more machine-readable and closer to the way databases are organised.


Figure 3

Two tables shown side-by-side. The first row of the left table is highlighted in blue, and the first four rows of the right table are also highlighted in blue to show how each value of 'items owned' is given its own row by the separate_longer_delim() function. The 'items owned logical' column is highlighted in yellow on the right table to show how the mutate() function adds a new column.

Figure 4

Two tables shown side-by-side. The 'items owned' column is highlighted in blue on the left table, and the column names are highlighted in blue on the right table to show how the values of 'items owned' become the column names in the output of the pivot_wider() function. The 'items owned logical' column is highlighted in yellow on the left table, and the values of the bicycle, television, and solar panel columns are highlighted in yellow on the right table to show how the values of the 'items owned logical' column become the values of all three of those columns.
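
The reshaping steps described in Figures 3 and 4 can be sketched in a few lines of R. This is only a minimal illustration under stated assumptions: the toy table, its column names (key_ID, items_owned, items_owned_logical) and its values are hypothetical stand-ins rather than the lesson's dataset, and separate_longer_delim() requires tidyr 1.3.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per household, with all owned items
# stored in a single semicolon-delimited column.
households <- tibble(
  key_ID = c(1, 2),
  items_owned = c("bicycle;television;solar_panel", "bicycle")
)

# Long format (as in Figure 3): each item gets its own row, and
# mutate() adds a logical column marking the item as owned.
households_long <- households %>%
  separate_longer_delim(items_owned, delim = ";") %>%
  mutate(items_owned_logical = TRUE)

# Wide format (as in Figure 4): each item becomes its own column, and
# household/item combinations that never occurred are filled with FALSE.
households_wide <- households_long %>%
  pivot_wider(
    names_from = items_owned,
    values_from = items_owned_logical,
    values_fill = list(items_owned_logical = FALSE)
  )
```

Printing households_long and households_wide side by side shows the same information in both layouts: the long table has one row per household and item, while the wide table has one row per household and one logical column per item.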

Data Visualisation with ggplot2


Figure 1


Figure 2


Figure 3


Figure 4

Scatter plot of number of items owned versus number of household members.

Figure 5

Scatter plot of number of items owned versus number of household members, with transparency added to points.

Figure 6

Scatter plot of number of items owned versus number of household members, showing jitter.

Figure 7

Scatter plot of number of items owned versus number of household members, with jitter and transparency.

Figure 8

Scatter plot of number of items owned versus number of household members, with points shown in blue.

Figure 9


Figure 10

Previous plot with dots colored by village.

Figure 11

Scatter plot showing positive trend between number of household members and number of items owned.

Figure 12

Box plot of number of rooms by wall type.

Figure 13

Previous plot with dot plot added as additional layer to show individual values. Boxplot layer is transparent.

Figure 14


Figure 15

Box plot of number of livestock owned by wall type, with dot plot added as additional layer to show individual values.

Figure 16

Previous plot with dots colored based on whether respondent was a member of an irrigation association.

Figure 17

Bar plot showing counts of respondent wall types.

Figure 18

Stacked bar plot of wall types showing each village as a different color.

Figure 19

Bar plot of respondent wall types with each village as a separate bar.

Figure 20

Side by side bar plot showing percent of respondents in each village with each wall type.

Figure 21

Bar plot showing percent of respondents in each village who were part of association.

Figure 22

Previous plot with plot title and labels added.

Figure 23

Bar plot showing percent of each wall type in each village.

Figure 24

Bar plot showing percent of each wall type in each village, with black and white theme applied.

Figure 25

Multi-panel bar chart showing percent of respondents in each village who owned each item, with no grids behind bars.

Figure 26


Figure 27


Figure 28

Multi-panel bar charts showing percent of respondents in each village who owned each item, with grids behind the bars.

Figure 29


Getting started with R Markdown (Optional)


Figure 1

R Markdown wizard monsters creating an R Markdown document from a recipe.
Image credit: Allison Horst

Figure 2

Screenshot of the New R Markdown file dialogue box in RStudio

Figure 3

The 'knitting' process: First, R Markdown is converted to Markdown, which is then converted (via pandoc) to .html, .pdf, .docx, etc.

Figure 4


Figure 5

I made this plot while attending an awesome Data Carpentries workshop where I learned a ton of cool stuff!

Processing JSON data (Optional)