Reference
Last updated on 2025-10-21
Introduction to R and RStudio
- Use the escape key to cancel incomplete commands or running code (Ctrl+C if you're using R from the shell).
- Basic arithmetic operations follow standard order of precedence:
  - Brackets: `(`, `)`
  - Exponents: `^` or `**`
  - Divide: `/`
  - Multiply: `*`
  - Add: `+`
  - Subtract: `-`
- Scientific notation is available, e.g. `2e-3`
- Anything to the right of a `#` is a comment; R will ignore it!
- Functions are denoted by `function_name()`. Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
- Comparison operators: `<`, `<=`, `>`, `>=`, `==`, `!=`
- Use `all.equal` to compare numbers!
- `<-` is the assignment operator. Anything to the right is evaluated, then stored in a variable named to the left.
- `ls` lists all variables and functions you've created
- `rm` can be used to remove them
- When assigning values to function arguments, you must use `=`.
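
The points above can be tried out with a short sketch. The variable names (`result`, `grouped`, `scratch`, and so on) are made up for illustration and are not part of the lesson.

```r
# Exponents are evaluated before multiplication, which comes before addition
result <- 3 + 5 * 2 ^ 2        # same as 3 + (5 * (2 ^ 2))
grouped <- (3 + 5) * 2 ^ 2     # brackets are evaluated first

small <- 2e-3                  # scientific notation for 0.002

# Floating-point arithmetic is inexact, so == can be misleading;
# all.equal() compares with a small tolerance instead
exact_match <- (0.1 + 0.2) == 0.3
near_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Function arguments are assigned with =, variables with <-
rounded <- round(3.14159, digits = 2)

# rm() removes a variable; ls() would no longer list it
scratch <- 1
rm(scratch)

print(c(result, grouped, small, rounded))
print(c(exact_match, near_match))
```

Wrapping `all.equal()` in `isTRUE()` is a common pattern because `all.equal()` returns a description of the difference, not `FALSE`, when the values differ.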
Project management with RStudio
- To create a new project, go to File -> New Project
- Some best practices:
  - Treat data as read-only
  - Keep cleaned data separate from raw dirty data
  - Treat generated output as disposable
  - Keep related data together
  - Use a consistent naming scheme
Data Structures
- Use `read.csv()` to read data into memory
- `class()` gives you the data class of your object
- R automatically converts data types
- The functions `length()`, `nrow()`, `head()`, `tail()`, and `str()` can be useful to explore data.
- Factors are a special class to deal with categorical data.
- Lists provide a flexible data type.
- Data frames are a special case of lists.
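
As a rough illustration of these points, the sketch below builds a small data frame in memory instead of reading a file. The `cats` data and its column names are hypothetical.

```r
# Hypothetical example data, built in memory so no file is needed
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  likes_string = c(1, 0, 1),
  stringsAsFactors = TRUE   # store the text column as a factor
)

class(cats)          # the class of the whole object
class(cats$coat)     # the class of one column
levels(cats$coat)    # the categories of the factor

# Mixing types in one vector triggers automatic conversion (coercion)
mixed <- c(2, "black", TRUE)
class(mixed)

# A data frame is a list of equal-length vectors
is.list(cats)
length(cats)         # number of columns
nrow(cats)           # number of rows
str(cats)            # compact summary of the structure
```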
Exploring Data Frames
- R makes it easy to import datasets stored remotely
- `?data.frame` is a key data structure. It is a `list` of `vectors`.
- `cbind()` will add a column (vector) to a data.frame.
- `rbind()` will add a row (list) to a data.frame.
Useful functions for querying data structures:
- `?str`: structure, prints out a summary of the whole data structure
- `?class`: what is the data structure?
- `?head`: print the first `n` elements (rows for two-dimensional objects)
- `?tail`: print the last `n` elements (rows for two-dimensional objects)
- `?rownames`, `?colnames`, `?dimnames`: retrieve or modify the row names and column names of an object.
- `?length`: get the number of elements in an atomic vector
- `?nrow`, `?ncol`, `?dim`: get the dimensions of an n-dimensional object (won't work on atomic vectors or lists).

If your data frame contains factors, you need to take extra steps to add rows that contain new level values.
- `read.csv` to read in data in a regular structure
  - `sep` argument to specify the separator
    - "," for comma separated
    - "\t" for tab separated
  - Other arguments:
    - `header=TRUE` if there is a header row
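
A minimal sketch of the operations in this section follows. The `cats` data is hypothetical, and the tab-separated input is passed through the `text` argument so that no file is required.

```r
# Hypothetical data frame for illustration
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  stringsAsFactors = FALSE
)

# cbind() adds a column; the vector needs one value per row
cats <- cbind(cats, age = c(2, 3, 5))

# rbind() adds a row, supplied as a list with one value per column
cats <- rbind(cats, list("tortoiseshell", 3.3, 9))

dim(cats)        # rows, then columns
colnames(cats)
str(cats)

# With a factor column, add the new level before adding the row
cats_f <- data.frame(coat = factor(c("calico", "black")), weight = c(2.1, 5.0))
levels(cats_f$coat) <- c(levels(cats_f$coat), "tortoiseshell")
cats_f <- rbind(cats_f, list("tortoiseshell", 3.3))

# read.csv() with a tab separator and a header row
tsv_text <- "coat\tweight\ncalico\t2.1\nblack\t5.0"
from_text <- read.csv(text = tsv_text, sep = "\t", header = TRUE)
```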
Subsetting data
- Elements can be accessed by:
  - Index
  - Name
  - Logical vectors
- `[` single square brackets:
  - extract single elements or subset vectors, e.g. `x[1]` extracts the first item from vector `x`.
  - extract single elements of a list. The returned value will be another `list()`.
  - extract columns from a data.frame
- `[` with two arguments to:
  - extract rows and/or columns of matrices and data.frames
  - e.g. `x[1,2]` will extract the value in row 1, column 2.
  - e.g. `x[,2]` will extract the entire second column of values.
- `[[` double square brackets to extract items from lists.
- `$` to access columns or list elements by name
- negative indices skip elements
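
The subsetting forms above can be compared side by side in a short sketch. The objects `x`, `lst`, and `m` are invented for illustration.

```r
# A named numeric vector
x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8)

by_index <- x[1]          # first element
by_name <- x["b"]         # element named "b"
by_logical <- x[x > 6]    # elements where the condition is TRUE
without_first <- x[-1]    # negative index skips the first element

# Lists: [ returns a list, [[ and $ return the item itself
lst <- list(name = "gapminder", years = c(1952, 2007))
still_a_list <- lst[1]
the_item <- lst[[1]]
by_dollar <- lst$years

# Two arguments: rows first, then columns
m <- matrix(1:6, nrow = 2)
one_value <- m[1, 2]      # row 1, column 2
second_col <- m[, 2]      # entire second column
second_row <- m[2, ]      # entire second row
```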
Data frame manipulation with dplyr
- `?select` to extract variables by name.
- `?filter` to return rows with matching conditions.
- `?group_by` to group data by one or more variables.
- `?summarize` to summarize multiple values to a single value.
- `?mutate` to add new variables to a data.frame.
- `?count` and `?n` to tally values in the data frame.
- Combine operations using the `?"%>%"` pipe operator.
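
One way these verbs might be chained is sketched below. It assumes the dplyr package is installed; the `gap` data frame is a tiny made-up stand-in, not the real gapminder data.

```r
library(dplyr)

# Hypothetical miniature dataset for illustration
gap <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  continent = c("X", "X", "X", "X", "Y", "Y"),
  year = c(2002, 2007, 2002, 2007, 2002, 2007),
  pop = c(10, 12, 20, 22, 5, 6),
  gdpPercap = c(1000, 1100, 2000, 2100, 500, 600)
)

summary_2007 <- gap %>%
  filter(year == 2007) %>%                    # keep matching rows
  select(continent, pop, gdpPercap) %>%       # keep named columns
  mutate(gdp = pop * gdpPercap) %>%           # add a new column
  group_by(continent) %>%                     # group by one variable
  summarize(mean_gdp = mean(gdp), n_countries = n())

# count() tallies rows per group in one step
counts <- gap %>% count(continent)

print(summary_2007)
print(counts)
```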
Creating Publication-Quality Graphics with ggplot2
- figures can be created with the grammar of graphics:
  - `library(ggplot2)`
  - `ggplot` to create the base figure
  - aesthetics (`aes`) specify the data axes, shape, color, and data size
  - geometry (`geom`) functions specify the type of plot, e.g. `point`, `line`, `density`, `box`
  - geometry functions also add statistical transforms, e.g. `geom_smooth`
  - `scale` functions change the mapping from data to aesthetics
  - `facet` functions stratify the figure into panels
  - aesthetics apply to individual layers, or can be set for the whole plot inside `ggplot`.
  - `theme` functions change the overall look of the plot
  - order of layers matters!
  - `ggsave` to save a figure.
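
The layers listed above might be combined as in the following sketch. It assumes the ggplot2 package is installed; the `gap` data, the axis label, and the output file name are all hypothetical.

```r
library(ggplot2)

# Hypothetical miniature dataset for illustration
gap <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  continent = rep(c("X", "X", "Y"), each = 3),
  year = rep(c(1997, 2002, 2007), times = 3),
  lifeExp = c(60, 62, 64, 70, 71, 73, 50, 53, 55)
)

# Aesthetics set inside ggplot() apply to every layer
p <- ggplot(gap, aes(x = year, y = lifeExp, color = country)) +
  geom_line() +                                   # drawn first
  geom_point() +                                  # drawn on top of the lines
  scale_y_continuous(name = "Life expectancy (years)") +
  facet_wrap(~ continent) +                       # one panel per continent
  theme_minimal()

# Save to a temporary directory; PDF needs no extra graphics device
out_file <- file.path(tempdir(), "life_expectancy.pdf")
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

Swapping the order of `geom_line()` and `geom_point()` would draw the lines over the points, which is one way to see that layer order matters.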
Writing data
- `write.table` to write out objects in a regular format
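
A small sketch of writing a data frame as comma-separated text follows. The `scores` data and the file name are made up, and the file goes to a temporary directory.

```r
# Hypothetical data to write out
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

out_path <- file.path(tempdir(), "scores.csv")

# sep sets the delimiter; quote and row.names suppress extra decoration
write.table(scores, file = out_path, sep = ",",
            quote = FALSE, row.names = FALSE)

# Read the raw lines back to inspect what was written
written <- readLines(out_path)
print(written)
```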
Glossary
- argument
- A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
- assign
- To give a value a name by associating a variable with it.
- body
- (of a function): the statements that are executed when a function runs.
- comment
- A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a `#` character and run to the end of the line; comments in SQL start with `--`, and other languages have other conventions.
- comma-separated values
- (CSV) A common textual representation for tables in which the values in each row are separated by commas.
- delimiter
- A character or characters used to separate individual values, such as the commas between columns in a CSV file.
- documentation
- Human-language text written to explain what software does, how it works, or how to use it.
- floating-point number
- A number containing a fractional part and an exponent. See also: integer.
- for loop
- A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
- index
- A subscript that specifies the location of a single value in a collection, such as a single pixel in an image.
- integer
- A whole number, such as -12343. See also: floating-point number.
- library
- In R, the directory or directories where packages are stored.
- package
- A collection of R functions, data and compiled code in a well-defined format. Packages are stored in a library and loaded using the library() function.
- parameter
- A variable named in the function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
- return statement
- A statement that causes a function to stop executing and return a value to its caller immediately.
- sequence
- A collection of information that is presented in a specific order.
- shape
- An array's dimensions, represented as a vector. For example, a 5×3 array's shape is `(5,3)`.
- string
- Short for “character string”, a sequence of zero or more characters.
- syntax error
- A programming error that occurs when statements are in an order or contain characters not expected by the programming language.
- type
- The classification of something in a program (for example, the contents of a variable) as a kind of number (e.g. floating-point, integer), string, or something else. In R, the `typeof()` function is used to query a variable's type.
- while loop
- A loop that keeps executing as long as some condition is true. See also: for loop.
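
Several of the glossary terms (parameter, argument, body, return statement, for loop, while loop, type) can be seen together in one short sketch. The function `sum_positive` and the variable names are invented for illustration.

```r
# `values` is the parameter; the code between the braces is the body
sum_positive <- function(values) {
  total <- 0
  # for loop: runs once for each value in the vector
  for (v in values) {
    if (v > 0) {
      total <- total + v
    }
  }
  return(total)   # return statement: hands the value back to the caller
}

# c(-2, 3, 4.5) is the argument passed to the function
result <- sum_positive(c(-2, 3, 4.5))

# while loop: keeps running as long as the condition is true
countdown <- 3
while (countdown > 0) {
  countdown <- countdown - 1
}

# typeof() reports the type of a value
int_type <- typeof(1L)
dbl_type <- typeof(1.5)
chr_type <- typeof("a")
```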