
## R Programming Basics

R is easiest to use when you know how the R language works. This tutorial will teach you the implicit background knowledge that informs every piece of R code. You’ll learn about:

• functions and their arguments
• objects
• R’s basic data types
• R’s basic data structures including vectors and lists
• R’s package system

### R – A Functional Language

R's design differs significantly from that of imperative languages such as C, C++, or Fortran. Keep the following principles in mind so that R's functional side does not become a source of confusion if you are used to imperative languages.

• All computations are done in functions, which are the main building blocks for organizing code.
• Functions are first-class citizens of the language, e.g. they can be used directly as objects, argument values, etc.
• Side effects: functions generally have no side effects; their inputs and the calling environment are left unchanged.
• Vectorized: functions generally work on all elements of a vector at once, so explicit for-loops are often unnecessary.
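
The points above can be illustrated with a minimal sketch. The names `square`, `apply_twice`, and `set_first_to_zero` are made up for this illustration and are not part of the tutorial.

```r
# Functions are ordinary objects: they can be stored in variables
# and passed to other functions as arguments.
square <- function(x) x^2
apply_twice <- function(f, x) f(f(x))
result_twice <- apply_twice(square, 3)   # square(square(3))

# A function works on its own copy of the input,
# so the caller's object is left unchanged.
set_first_to_zero <- function(v) {
  v[1] <- 0
  v
}
original <- c(5, 6, 7)
modified <- set_first_to_zero(original)

# Vectorized: one call handles every element, no loop needed.
squares <- square(1:5)
```

After running this, `original` still holds `5 6 7`, while `modified` holds `0 6 7`.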

## Functions

### Run a function

Can you use the `sqrt()` function in the chunk below to compute the square root of 962?

### Code

Use the code chunk below to examine the code that `sqrt` runs.

### lm

Compare the code in `sqrt()` to the code in another R function, `lm()`. Examine `lm()`’s code body in the chunk below.
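
As a rough sketch of the steps so far: calling a function with parentheses runs it, while typing its name without parentheses prints the function itself. The object name `root` is only used for this illustration.

```r
# Call the function: name, parentheses, and an argument.
root <- sqrt(962)
root

# Type the name without parentheses to print the function itself.
sqrt   # a primitive: its body is implemented in compiled code
lm     # an ordinary R function: its full R source is printed
```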

### help pages

Wow! `lm()` runs a lot of code. What does it do? Open the help page for `lm()` in the chunk below and find out.

### Code comments

What do you think the chunk below will return? Run it and see. The result should be nothing. R will not run anything on a line after a `#` symbol. This is useful because it lets you write human-readable comments in your code: just place the comments after a `#`. Now delete the `#` and re-run the chunk. You should see a result.
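
A small sketch of how `#` behaves, using `sqrt(16)` as a stand-in for the chunk's code (the original chunk is not shown here):

```r
# sqrt(16)                     # the whole line is a comment, so nothing runs
commented_result <- sqrt(16)   # code before the # runs; this text is ignored
commented_result
```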

## Arguments

### args()

`rnorm()` is a function that generates random variables from a normal distribution. Find the arguments of `rnorm` using the `args()` function.

### rnorm() 1

Use `rnorm()` to generate 100 random normal values with a mean of 100 and a standard deviation of 15.
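
One possible way to work through the two steps above. The `set.seed()` call and the object name `scores` are additions for this sketch, so that the random draw is reproducible.

```r
# List the arguments of rnorm() and their default values.
args(rnorm)

# set.seed() is an addition here; it makes the random draw reproducible.
set.seed(123)
scores <- rnorm(100, mean = 100, sd = 15)

length(scores)   # number of values drawn
mean(scores)     # close to, but not exactly, 100
```

Naming the arguments (`mean = 100`, `sd = 15`) makes the call easier to read than relying on their position.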

### rnorm() 2

Can you spot the error in the code below? Fix the code and then re-run it.

## Objects

### `=` vs `<-`

In many languages you assign a value to a variable with the equals sign, `=`. R supports this too, but internally the assignment is also carried out by a function.

```r
a = 1; a
`=`(a, 2); a
```

R purists emphasize this functional aspect by using the arrow sign, `<-`.

```r
a <- 3; a
`<-`(a, 3); a
```

Don’t let the `<-` sign confuse you: use it for assignment in preference to `=`.

### Object names

You can choose almost any name you like for an object, as long as the name does not begin with a number or a special character like `+`, `-`, `*`, `/`, `^`, `!`, `@`, or `&`.
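
To see the naming rule in action, the sketch below asks R whether a line of code can be parsed at all. `parses_ok` is a hypothetical helper written for this illustration, not a built-in function.

```r
# Hypothetical helper: TRUE if R can parse the code, FALSE on a syntax error.
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses_ok("scores_2024 <- 1")   # TRUE: digits and _ are fine after the first letter
parses_ok("2scores <- 1")       # FALSE: the name begins with a number
parses_ok("@scores <- 1")       # FALSE: the name begins with a special character
```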

### Using objects

In the code chunk below, save the results of `rnorm(100, mean = 100, sd = 15)` to an object named `data`. Then, on a new line, call the `hist()` function on `data` to plot a histogram of the random values.

### What if?

What do you think would happen if you assigned `data` to a new object named `copy`, like this? Run the code and then inspect both `data` and `copy`.
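
A sketch of one way to explore this question. The `set.seed()` call and the extra step that modifies `copy` are additions for illustration.

```r
set.seed(1)   # added so the example is reproducible
data <- rnorm(100, mean = 100, sd = 15)

copy <- data             # copy now holds the same values as data
identical(copy, data)    # TRUE

copy[1] <- 0             # changing copy...
identical(copy, data)    # ...leaves data untouched, so this is FALSE
```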

### Data sets

Objects provide an easy way to store data sets in R. In fact, R comes with many toy data sets pre-loaded. Examine the contents of `iris` to see a classic toy data set. Hint: how could you learn more about the `iris` object?

### rm()

What if you accidentally overwrite an object? If that object came with R or one of its packages, you can restore the original version of the object by removing your version with `rm()`. Run `rm()` on `iris` below to restore the iris data set.
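
The sequence below sketches the overwrite-and-restore cycle. The string `"oops"` is just a placeholder for whatever value was assigned by accident.

```r
iris <- "oops"   # accidentally overwrite the data set with a string
class(iris)      # "character"

rm(iris)         # remove your version from the workspace
class(iris)      # "data.frame": R finds the original data set again
head(iris, 3)
```

`rm()` only deletes the copy in your workspace; the original stays available because it lives in a package that ships with R.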

## Vectors

### Create a vector

In the chunk below, create a vector using the concatenate function `c()` that contains the integers from one to ten.

Hint: Each number must be separated with a comma.

### The `:` Operator

If your vector contains a sequence of contiguous integers, you can create it with the `:` shortcut. Run `1:10` in the chunk below. What do you get? What do you suppose `1:20` would return?
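
For comparison, here is a short sketch of both ways of building the same vector. The object names `from_c` and `from_colon` are made up for this example.

```r
from_c     <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
from_colon <- 1:10

all(from_c == from_colon)   # TRUE: the values are the same
typeof(from_c)              # "double": plain numbers are stored as doubles
typeof(from_colon)          # "integer": here the : operator produces integers
```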

### Using `seq()`

The same result can be achieved using the sequence function `seq()`. It supports the construction of more complex sequence vectors using the `by` and `length.out` arguments. Let’s compare the sequences below:
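
The original sequences are not shown here, so the calls below are only illustrative choices:

```r
seq(1, 10)                  # same values as 1:10
seq(1, 10, by = 2)          # 1 3 5 7 9
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)         # 10 7 4 1
```

`by` fixes the step size, while `length.out` fixes how many values you get and lets R work out the step.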

### Using `rep()`

We can repeat objects using the `rep` function. Let’s create a vector using `rep()` and check the results:
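
A few illustrative calls (the values are arbitrary) showing the `times` and `each` arguments of `rep()`:

```r
rep(7, times = 3)                   # 7 7 7
rep(c(1, 2), times = 3)             # 1 2 1 2 1 2
rep(c(1, 2), each = 3)              # 1 1 1 2 2 2
rep(c("a", "b"), times = c(2, 1))   # "a" "a" "b"
```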

### Subsetting with `[ ]`

You can extract any element of a vector by placing a pair of brackets behind the vector. Inside the brackets place the number of the element that you’d like to extract. For example, `vec[3]` would return the third element of the vector named `vec`.

Use the chunk below to extract the fourth element of `vec`.

### More `[ ]`

You can also use `[]` to extract multiple elements of a vector. Place the vector `c(1,2,5)` between the brackets below. What does R return?

### Names

If the elements of your vector have names, you can extract them by name. To do so place a name or vector of names in the brackets behind a vector. Surround each name with quotation marks, e.g. `vec2[c("alpha", "beta")]`.

Extract the element named gamma from the vector below.
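
The vectors used in these exercises are not shown here, so the sketch below assumes hypothetical contents for `vec` and `vec2`:

```r
vec <- c(6, 1, 3, 6, 10, 5)                 # hypothetical values
vec[4]                                      # 6: the fourth element
vec[c(1, 2, 5)]                             # 6 1 10: first, second, and fifth

vec2 <- c(alpha = 1, beta = 2, gamma = 3)   # hypothetical named vector
vec2["gamma"]                               # the element named gamma
vec2[c("alpha", "beta")]                    # several elements by name
```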

### Vectorised operations

Predict what the code below will return. Then look at the result.

### Vector recycling

Recycling rule: when an operation combines two vectors of different lengths, the shorter one is recycled (repeated) until it matches the length of the longer one.

Predict what the code below will return. Then look at the result.
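
As an illustration with made-up numbers, the following calls show element-wise arithmetic and recycling side by side:

```r
x <- c(1, 2, 3, 4)

x + c(10, 20, 30, 40)   # 11 22 33 44: element by element
x * 2                   # 2 4 6 8: the single 2 is recycled four times
x + c(10, 20)           # 11 22 13 24: c(10, 20) is recycled to 10 20 10 20
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.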

## Types

### Integers

Create a vector of integers from one to five. Can you imagine why you might want to use integers instead of numbers/doubles?
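
One possible solution is sketched below; the names `ints` and `dbls` are arbitrary. Integers are stored exactly and take up less memory than doubles, which is one reason to prefer them for counts and indices.

```r
ints <- c(1L, 2L, 3L, 4L, 5L)   # the L suffix marks integer literals
dbls <- c(1, 2, 3, 4, 5)        # without the suffix, numbers are doubles

typeof(ints)           # "integer"
typeof(dbls)           # "double"
identical(ints, 1:5)   # TRUE: the : operator gives the same integer vector
```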

### Factors

Factors are variables in R that take on a limited number of different values. Such variables are often referred to as categorical variables. Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. Factor levels specify the mapping between integer values and character labels.
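
A small sketch with an invented `sizes` variable shows the two parts of a factor, the integer codes and the level labels:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)         # "small" "medium" "large": the labels
as.integer(sizes)     # 1 3 2 1: the stored integer codes
as.character(sizes)   # the labels looked up for each code
table(sizes)          # counts per level
```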

### Floating point arithmetic

Computers must use a finite amount of memory to store decimal numbers (which can sometimes require infinite precision). As a result, some decimals can only be saved as very precise approximations. From time to time you’ll notice side effects of this imprecision, like below.

Compute the square root of two, square the answer (i.e. multiply the square root of two by the square root of two), and then subtract two from the result. What answer do you expect? What answer do you get?
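
A sketch of the computation, together with `all.equal()`, which compares numbers up to a small tolerance and is one common way to deal with this kind of rounding error:

```r
result <- sqrt(2)^2 - 2
result                            # tiny, but not exactly zero

sqrt(2)^2 == 2                    # FALSE: exact comparison fails
isTRUE(all.equal(sqrt(2)^2, 2))   # TRUE: equal within a small tolerance
```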

### Date and Time

`Date` objects are stored as the number of days since `1970-01-01` (the origin) and are created as follows:

``## [1] "2017-11-11"``
``## [1] "2017-11-11"``
``## [1] "2017-11-11"``

`POSIXct` (date-time) objects are represented as the number of seconds since `1970-01-01` (UTC).

``## [1] "2017-11-11 11:11:00 CET"``
``## [1] "2017-11-11 11:11:00 CET"``
``## [1] "2017-11-11 11:11:00 CET"``
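
The code that produced the outputs above is not shown. The calls below are one plausible reconstruction, not necessarily the original: three ways to build the same `Date` and three ways to build the same `POSIXct` value. The object names `d1` to `d3` and `t1` to `t3` are made up.

```r
# Three ways to create the same Date.
d1 <- as.Date("2017-11-11")                         # ISO format needs no format string
d2 <- as.Date("11.11.2017", format = "%d.%m.%Y")    # custom format
d3 <- as.Date(17481, origin = "1970-01-01")         # days since the origin
as.numeric(d1)                                      # 17481

# Three ways to create the same date-time in the CET time zone.
t1 <- as.POSIXct("2017-11-11 11:11:00", tz = "CET")
t2 <- as.POSIXct("11.11.2017 11:11", format = "%d.%m.%Y %H:%M", tz = "CET")
t3 <- ISOdatetime(2017, 11, 11, 11, 11, 0, tz = "CET")
as.numeric(t1)                                      # seconds since 1970-01-01 00:00:00 UTC
```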

### Character or object?

One of the most common mistakes in R is to call an object when you mean to call a character string and vice versa.
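
The difference is easiest to see side by side. In this sketch, `greeting` is an invented object name:

```r
greeting <- "hello"

greeting      # the object: R looks up its value, "hello"
"greeting"    # a character string: R returns the text "greeting" itself

nchar(greeting)     # 5: characters in "hello"
nchar("greeting")   # 8: characters in "greeting"
```

Leaving out the quotes around a string usually leads to an "object not found" error, because R then looks for an object with that name.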

### Matrices

`matrix` objects are vectors with a dimension attribute attached.
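
To make this concrete, the sketch below (with invented names `v` and `m`) turns a plain vector into a matrix just by setting its dimensions, and compares the result with `matrix()`:

```r
v <- 1:6
dim(v) <- c(2, 3)            # attach a dimension attribute: 2 rows, 3 columns
is.matrix(v)                 # TRUE

m <- matrix(1:6, nrow = 2)   # the usual constructor, filled column by column
identical(v, m)              # TRUE: both are the same object
m[2, 3]                      # 6: row 2, column 3
```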

### Coercion

Convert between data types using functions of the form

``as.<datatype>(...)``

For the most important data types, use e.g.

• `as.numeric()`, `as.integer()`
• `as.character()`
• `as.logical()`
• `as.Date(..., format = "<format>")`
• `as.POSIXct(..., format = "<format>")`
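
A few illustrative conversions with arbitrary input values:

```r
as.numeric("3.14")    # 3.14
as.integer("42")      # 42
as.integer(3.9)       # 3: the fraction is dropped, not rounded
as.character(10)      # "10"
as.logical("TRUE")    # TRUE
as.logical(0)         # FALSE: zero is FALSE, other numbers are TRUE
as.Date("11.11.2017", format = "%d.%m.%Y")
```

Values that cannot be converted, such as `as.numeric("abc")`, become `NA` and R issues a warning.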

### Missing Values

R supports missing values as `NA` for all of its basic data types.

``## [1] FALSE FALSE  TRUE FALSE``

Use the `is.na()` function to check for `NA` values.
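
The vector behind the output above is not shown. The hypothetical `x` below reproduces the same pattern and shows how `NA` propagates through a computation:

```r
x <- c(1, 2, NA, 4)     # hypothetical vector with one missing value

is.na(x)                # FALSE FALSE TRUE FALSE
sum(is.na(x))           # 1: number of missing values
mean(x)                 # NA: a missing value makes the result missing
mean(x, na.rm = TRUE)   # mean of the remaining values 1, 2, and 4
```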

## Lists

### Make a list

Make a list that contains the elements `1001`, `TRUE`, and `"stories"`. Give each element a name.

### Extract an element

Extract the number 1001 from the list below.
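
One possible solution to both list exercises is sketched below. The list name `things` and the element names are assumptions, since the original list is not shown.

```r
things <- list(number = 1001, logical = TRUE, string = "stories")

things$number        # 1001: extract by name with $
things[["number"]]   # 1001: extract by name with [[ ]]
things[[1]]          # 1001: extract by position
things["number"]     # single brackets return a list of length one
```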

### Data Frames

You can make a data frame with the `data.frame()` function, which works similarly to `c()` and `list()`. Assemble the vectors below into a data frame with the column names `numbers`, `logicals`, `strings`.

### Extract a column

Given that a data frame is a type of list (with named elements), how could you extract the strings column of the `df` data frame below? Do it.
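
A sketch covering both data frame exercises. The vectors `nums`, `logs`, and `strs` and their values are assumptions, as the originals are not shown.

```r
nums <- c(1, 2, 3)
logs <- c(TRUE, FALSE, TRUE)
strs <- c("a", "b", "c")

# stringsAsFactors = FALSE keeps the strings as characters in older R versions too.
df <- data.frame(numbers = nums, logicals = logs, strings = strs,
                 stringsAsFactors = FALSE)

is.list(df)       # TRUE: a data frame is a list of columns
df$strings        # extract a column like a list element
df[["strings"]]   # the same column with [[ ]]
```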

## Packages

### R Packages

When you first install R, you get a small collection of core packages known as base R. The remaining packages—there are over 10,000 of them—are optional. You don’t need to install them unless you want to use them.

ggplot2 is one of these optional packages, and so are the other packages that we will look at in these tutorials. Some of the most popular and most modern parts of R come in the optional packages.

You don’t need to worry about installing packages in these tutorials. Each tutorial comes with all of the packages that you need pre-installed; this is how we make the tutorials easy to use.

However, one day, you may want to use R outside of these tutorials. When that day comes, you’ll want to remember which packages to download to acquire the functions you use here. Throughout the tutorials, I will try to make it as clear as possible where each function comes from!

If you’d like to learn more about installing R packages (or R or the RStudio IDE), go to the section Installation and First Steps.

### A common error

In the code chunk below, load the `tidyverse` package. Whenever you load a package, R will also load all of the packages that the first package depends on. `tidyverse` takes advantage of this to create a shortcut for loading several common packages at once. Whenever you load `tidyverse`, it also loads core packages including `ggplot2`, `dplyr`, `tibble`, `tidyr`, `readr`, and `purrr`.
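
Loading `tidyverse` only works where it is installed, so this sketch uses `tools` as a stand-in. `tools` ships with R but is not attached by default. It is an assumption here that the common error in question is calling a function before its package has been attached.

```r
# Without attaching the package, a function can be reached with the pkg:: prefix.
tools::file_ext("report.csv")

# In a fresh session, calling file_ext("report.csv") at this point would
# typically fail with: could not find function "file_ext".

library(tools)           # attach the package
file_ext("report.csv")   # now found without the prefix

# The tidyverse equivalent, assuming the package is installed:
# library(tidyverse)
```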