Before class

  • Setup class organization at Github
    • Check the Allow members to create repositories for this organization permission
    • Set the Default permissions for the organization to None if you want to avoid students accessing each others repositories
  • Have students create a GitHub account and email their username to the instructor.
  • Add students’ username to organization.

For class

  • Download Gaeta_etal_CLC_data.csv.
  • Either arrange to have a teaching partner to attend class or be logged into GitHub as another user in the browser for collaboration demos.

  • Open the following links in a browser and zoom in to make the images fill the screen.

Live coding demo and assignment are intertwined and designed to work in order. Instructions for both Posit Cloud & RStudio are included so just use the setup pieces that fit your class. Sign into GitHub

Introduction

Motivation

  • Who has a directory on their computer with a bunch of filenames
  • Get rid of messy folders and track changes to things like data files and code in a more manageable way.

Benefits of version control

  • Track changes (but better)
    • Tracks every change ever made in groups called commits
      • Every commit stores the full state of all of your files at that time
    • Never lose anything
      • Revert or restore to any commit
      • Easily unbreak your code/data/manuscript
      • No more file name changes
  • Collaboration
    • Work on things simultaneously
      • See what changes others have made
      • Everyone has the most recent version of everything

Version control using Git & RStudio

Connecting Posit Cloud to GitHub (only if teaching in Posit Cloud)

  1. Go to https://login.posit.cloud/ (login in necessary)
  2. Click Posit User Settings
  3. Go to GitHub section and click Private repo access also enabled
  4. Will be redirected to GitHub to authorize Posit Cloud
  5. Click Authorize posit-hosted

Create a Git repo

  1. Navigate to Github in a web browser and login.
  2. Click the + at the upper right corner of the page and choose New repository.
  3. Choose the class organization (e.g., dcsemester) as the Owner of the repo.
  4. Fill in a Repository name that follows the form FirstnameLastname.
  5. Select Private.
  6. Select Initialize this repository with a README.
  7. Click Create Repository.
  8. From new GitHub repository, click green Code button -> Click the Copy url to clipboard button.

Connect to the Git repo

Posit Cloud

  1. Posit Cloud (in class org): New Project -> New Project from Git Repo
  2. Paste copied URL in URL of your Git Repository:.
  3. Click OK.
  4. Check to make sure you have a Git tab in the upper right window.

RStudio

  1. RStudio, File -> New Project -> Version Control -> Git
  2. Paste copied URL in Repository URL:.
  3. Leave Project directory name: blank; automatically given repo name.
  4. Choose where to Create project as subdirectory of:.
  5. Click Create Project.
  6. Check to make sure you have a Git tab in the upper right window.

Install

install.packages(c('dplyr', 'readr', 'usethis', 'gitcreds'))

Introduce yourself to Git

library(usethis)

use_git_config(user.name = "[name]", user.email = "[email]")

That was Exercise 1 - Set-up Git. Have students confirm that this all worked and fix any issues.

First commits

Commit data

  • Download the data file Gaeta_etal_CLC_data.csv to your project directory.
  • Add the data file to version control
  • Two step process:
  1. Add the data file (checkbox)
  2. Commit it
  • Git -> Select Gaeta_etal_CLC_data.csv.
  • Commit with message.
    • Add fish size and growth rate data
  • History:
    • One commit
    • Shows that the file has been added to version control

Commit R script

  • Read in data to new R script.
library(dplyr)
library(readr)

fish_data = read_csv("Gaeta_etal_CLC_data.csv")

Make sure to have a new line at the end for a clean diff

  • Save as fish-analysis.R.
  • Git -> Select fish-analysis.R.
    • Changes in staged files will be included in next commit.
    • Can also see changes by selecting Diff
  • Commit with message.
    • Start script comparing fish length and scale size
  • History:
    • Two commits
    • See what changes were made to fish-analysis.R

Building a history

  • fish-analysis.R doesn’t currently show on the Git tab
    • No saved changes since last commit
  • Add some more code to fish-analysis.R
    • Create new categorical size column
fish_data_cat = fish_data |>
  mutate(length_cat = ifelse(length > 200, "big", "small"))
  • Save fish-analysis.R.
  • Now we see the file on the Git tab.
    • M indicates that it’s been modified.
  • To commit these changes, we need to stage the file.
    • Check the box next to fish-analysis.R.
  • Commit with message.
    • Add categorical fish length column
  • History:
    • Three commits
    • Each fish-analysis.R commit shows the additions we made in that commit.
  • Modify this code in fish-analysis.R
    • Change category cut-off size
fish_data_cat = fish_data %>% 
  mutate(length_cat = ifelse(length > 300, "big", "small"))
  • Save file -> stage -> commit
    • Change size cutoff for new column
    • Green sections for added lines, red for deleted
    • Git works line by line.
      • The previous version of the line is shown as deleted.
      • The new version of the line is shown as added.

Do Exercise 2 - First Solo Commit and Exercise 3 - Second Solo Commit

Instructor also do exercises

Committing multiple files

  • Commits can include multiple files at once
  • Let’s move our data file into a data subdirectory
  • New Folder -> data
  • Checkbox Gaeta_etal_CLC_data.csv -> More -> Move
  • Change code to read from new subdirectory
fish_data = read_csv("data/Gaeta_etal_CLC_data.csv")
  • Changes to R script indicated by M
  • Original datafile has a red D next to it which indicates “deleted”
  • New, untracked, data directory
  • git initially thinks we’ve deleted Gaeta_etal_CLC_data.csv and created a new Gaeta_etal_CLC_data.csv file in a new directory.
  • Click on both the old and new files to stage them
  • git then recognizes that we have moved (or renamed) the file by making the two files into one and marking this with an R for “rename”.

  • Commit: Move data file into subdirectory

Do Exercise 4 - Commit Multiple Files.

Instructor also do exercise

Git as a time machine

Experiment with impunity

fish_data_cat = fish_data %>% 
  mutate(length_cat = ifelse(length > 300, "large", "small"))
  • Save and show changes are staged
  • More -> Revert -> Yes

  • Get previous state of a file
    • History -> select commit -> View file @ ...
    • Save file over current file
    • Copy and paste relevant piece into current file

Delete with impunity

  • Both of these also work for deleted files
  • Close the upper left window with the fish-analysis.R.
  • Choose the File tab in the lower right window.
  • Select fish-analysis.R -> Delete -> Yes
  • Stage deleted file -> More -> Revert -> Yes

GitHub Remotes

Draw diagram to link local machine with GitHub origin.

  • So far we’ve worked with a local Git repository.
  • One of the big benefits of version control is easy collaboration.
  • To do this, we synchronize our local changes with a remote repository called origin.
  • Our remote repository is on GitHub.
    • By far the most popular hosted version control site
    • Public and private hosted repositories
    • Private free for students and academics
      • https://education.github.com/
        • For the assignment, we’re using private repositories that we made at the beginning.

Push to a remote

Connect to GitHub

  • To push to your remote we first have to connect to GitHub, which is a little tricky
  • First, log in to GitHub in your browser
  • Then create a GitHub token, this is like a special password just for one computer
usethis::create_github_token()
  • You may need to allow popups and try again

  • Select defaults
  • Create token
  • Copy token

  • Now add this token our local git setup so that it can use it to connect to GitHub
gitcreds::gitcreds_set()
  • Paste your password

Push

  • Push sends your recent commits to the origin remote.

Draw push arrow on diagram on board from local to origin.

  • Before a Push your commits show in your local history but not on the remote.

Show local commit history and lack of history in remote.

  • To Push to your remote, select the Push button at the top of the Git tab.
  • Now your changes and commit history are also stored on the remote.

Show local commits now on origin.

Do Exercise 5 - Pushing Changes.

Have students email a link to their repo to their instructor once they have finished Pushing Changes

The instructor should then commit the following code to their repo with the commit message: Plot histogram of scale length by categorical size

ggplot(fish_data_cat, aes(x = scalelength, fill = length_cat)) +
  geom_histogram()

Either you (logged in as another user) or your teaching partner should make the same change to your respository

Pulling

  • Big advantage to remotes is easy collaboration
  • Avoids emailing files and shared folders where you are never sure if you actually have the most recent version
  • Makes it easy to see what collaborators have done
  • Automatically combines non-overlapping changes
  • While I’ve been talking, a collaborator has added a plot of scale size and fish length to the code.

Show origin with collaborator commit.

Add collaborator local repo to diagram and pull arrow from origin to locals.

  • Pull the changes from the remote repo with the Pull button on the Git tab

Show updates to history following Pull and run code

Do Tasks 3-6 in Exercise 6 - Pulling and Pushing.

Merges

Demo merges either with a partner or by logging into GitHub as another user in the browser.

  • What happens if two people make changes at the same time?
    • If they edit different parts of the code git will combine them automatically
    • If they edit the same areas of the code this requires human intervention
  • Merges

  • You decide to change the number of histogram bins to 10
geom_histogram(bins = 10)
  • Your collaborator reassesses the measurement device and decides it is accurate down to 0.5 mm and pushes the change to the remote repository [make this change in the remote]
filter(scalelength >= 0.5)
  • You try to push your change
  • Get an error that shows someone else has made a change & you need to incorporate it to push
  • Pull
  • Merge happens automatically
  • You have both sets of changes
  • Remote still only has collaborators changes
  • Push to add the merged version to the remote

Merge conflicts

  • If both you and your collaborator edit the same location in the code git doesn’t know how to combine the changes.
  • A human has to make this kind of decision.

  • You decide to change "big" to "large"
mutate(length_cat = ifelse(length > 300, "large", "small"))
  • Your collaborator changes the size threshold and pushes to the remote
mutate(length_cat = ifelse(length > 250, "big", "small"))
  • You attempt to push your changes
  • Merge conflict when pulling collaborators changes
  • This shows as U for “unmerged” in RStudio
  • First block of code is your version
  • Second block is the version on the remote
  • Combine into a single block that includes everything
mutate(length_cat = ifelse(length > 250, "large", "small"))
  • Click check box next to file
  • Commit indicating that it is a merge
  • Still not on remote yet
  • Push

Full GitHub flow

  • Collaborating on Github can get more complex with “forks” and “branches.

Optional: Redraw diagram with local, origin, and upstream. Arrows from origin to/from upstream are pull requests and merges.

Show an example of a working repository with branches and forks. Navigate to pull requests.