Introducing R and RStudio IDE
- R is a powerful, popular open-source scripting language
- You can customize the layout of RStudio, and use the project feature to manage the files and packages used in your analysis
- RStudio allows you to run R in an easy-to-use interface and makes it easy to find help
R Basics
- Effectively using R is a journey of months or years. Still, you don’t have to be an expert to use R, and you can start using and analyzing your data with about a day’s worth of training.
- It is important to understand how data are organized by R in a given object type and how the mode of that type (e.g. numeric, character, or logical) determines how R will operate on those data.
- Working with vectors effectively prepares you for understanding how data are organized in R.
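As a minimal sketch of the points above about vectors and modes, the following uses made-up values (the names `positions`, `genes`, and `mixed` are hypothetical, not from the lesson data) to show how the mode of a vector affects what R does with it:

```r
# Hypothetical example vectors; the values are invented for illustration.
positions <- c(100, 2500, 48000)        # numeric vector
genes     <- c("geneA", "geneB", "geneC")  # character vector

mode(positions)   # "numeric"
mode(genes)       # "character"

# Arithmetic and comparisons are applied element by element on numeric vectors.
positions_kb <- positions / 1000
is_far       <- positions > 1000        # logical vector

# A vector holds one mode only: mixing types coerces every element
# to the most general one (here, character).
mixed <- c(1, "two", TRUE)
mode(mixed)       # "character"
```

Checking `mode()` (or `class()`) before operating on a vector is a quick way to confirm that R is treating the values the way you expect.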
Introduction to the example dataset and file type
- The dataset comes from a real world experiment in E. coli.
- Publicly available FASTQ files can be downloaded from NCBI SRA.
- Several steps are taken outside of R/RStudio to create VCF files from FASTQ files.
- VCF files store variant calls in a special format.
R Basics continued - factors and data frames
- It is easy to import data into R from tabular formats including Excel. However, you still need to check that R has imported and interpreted your data correctly
- There are best practices for organizing your data (keeping it tidy) and R is great for this
- Base R has many useful functions for manipulating your data, but all of R’s capabilities are greatly enhanced by software packages developed by the community
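The advice above about checking an import can be sketched as follows. This is an illustrative example only: it reads a small, invented CSV table from a string rather than a real file or Excel sheet, and the column names (`sample_id`, `POS`, `QUAL`, `filter_status`) are assumptions.

```r
# Invented tabular data, supplied as text so the example is self-contained.
csv_text <- "sample_id,POS,QUAL,filter_status
S1,1521,91,PASS
S1,2300,NA,LowQual
S2,1521,228,PASS"

variants <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect what R actually imported: column types, sizes, and missing values.
str(variants)
summary(variants)

# Convert a categorical column to a factor explicitly, then check its levels.
variants$filter_status <- factor(variants$filter_status)
levels(variants$filter_status)
```

Looking at `str()` and `summary()` right after import tends to catch the common problems early, such as numbers read as text or unexpected missing values.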
Using packages from Bioconductor
- Bioconductor is an alternative package repository for bioinformatics packages.
- Installing packages from Bioconductor requires a new method, since it is not compatible with the `install.packages()` function used for CRAN.
- Check Bioconductor to see if there is a package relevant to your analysis before writing code yourself.
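One common way to install from Bioconductor is through the `BiocManager` package, which is itself installed from CRAN. The sketch below wraps that idiom in a helper; the helper name `ensure_bioc` is hypothetical and not part of any package, and it only contacts the network when a requested package is missing.

```r
# Hypothetical helper: install the requested packages through BiocManager,
# but only those that are not already available.
ensure_bioc <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]

  if (length(missing_pkgs) > 0) {
    # BiocManager itself comes from CRAN, so install.packages() works for it.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(missing_pkgs)
  }

  # Return (invisibly) the packages that had to be installed.
  invisible(missing_pkgs)
}
```

For example, `ensure_bioc("VariantAnnotation")` would install the Bioconductor package VariantAnnotation if it is not already present; that package is chosen here only as an illustration.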
Data Wrangling and Analyses with Tidyverse
- Use the `dplyr` package to manipulate data frames.
- Use `glimpse()` to quickly look at your data frame.
- Use `select()` to choose variables from a data frame.
- Use `filter()` to choose data based on values.
- Use `mutate()` to create new variables.
- Use `group_by()` and `summarize()` to work with subsets of data.
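The verbs listed above can be chained together. The following is a minimal sketch that assumes the `dplyr` package is installed; the `variants` data frame and its values are invented for illustration, as are the thresholds.

```r
library(dplyr)

# Invented example data; column names and values are assumptions.
variants <- data.frame(
  sample_id = c("S1", "S1", "S2", "S2", "S2"),
  POS       = c(100, 250, 100, 400, 900),
  QUAL      = c(91, 12, 228, 45, 60),
  DP        = c(4, 10, 6, 8, 2)
)

# Quick overview of columns and types.
glimpse(variants)

high_qual <- variants %>%
  select(sample_id, POS, QUAL, DP) %>%   # choose variables
  filter(QUAL >= 40) %>%                 # keep rows based on values
  mutate(deep = DP >= 5)                 # create a new variable

# Summarize each sample separately.
per_sample <- high_qual %>%
  group_by(sample_id) %>%
  summarize(n_variants = n(), mean_qual = mean(QUAL))

per_sample
```

Each step returns a new data frame, so the pipeline can be read top to bottom and any intermediate result can be inspected on its own.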
Data Visualization with ggplot2
- ggplot2 is a powerful tool for high-quality plots
- ggplot2 provides a flexible and readable grammar to build plots
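To illustrate what the grammar looks like in practice, here is a small sketch that assumes the `ggplot2` package is installed. The data frame and its values are invented; the plot is built by adding a geometry, labels, and a theme to a base mapping.

```r
library(ggplot2)

# Invented example data; column names and values are assumptions.
variants <- data.frame(
  sample_id = c("S1", "S1", "S2", "S2", "S2"),
  POS       = c(100, 250, 100, 400, 900),
  QUAL      = c(91, 12, 228, 45, 60)
)

# Data and aesthetic mappings first, then layers added with `+`.
p <- ggplot(variants, aes(x = POS, y = QUAL, color = sample_id)) +
  geom_point() +
  labs(x = "Position (bp)", y = "Quality score", color = "Sample") +
  theme_bw()

print(p)
```

Because each component is a separate layer, swapping `geom_point()` for another geometry or changing the theme does not require rewriting the rest of the plot.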
Getting help with R
- R provides thousands of functions for analyzing data, and provides several ways to get help
- Using R will mean searching for online help, and there are tips and resources on how to search effectively
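A few of the built-in ways to get help can be sketched as follows, using `read.csv` purely as an example function. The interactive shortcuts are shown as comments because they open the help viewer.

```r
# Interactive shortcuts (typed at the console):
#   ?read.csv            open the help page for a function
#   ??"linear model"     search all installed help pages for a phrase

# The same lookup as ?read.csv, stored as an object; printing it opens the page.
read_csv_help <- help("read.csv")

# Show a function's arguments without opening the full help page.
args(read.csv)

# Find functions whose names match a pattern.
read_fns <- apropos("^read\\.csv")
read_fns

# When asking for help online, include your R version and loaded packages.
sessionInfo()
```

Including the output of `sessionInfo()` along with a small reproducible example generally makes it easier for others to answer a question.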