Lesson 7: R Basics
R as calculator
You can use R as a basic calculator. Type a couple of calculations in the console to get used to the console as well as R Markdown. Simple calculations and single lines of code are usually typed in the console, while larger blocks of code are written in the source pane and saved in files for later use.
## [1] 13
## [1] 2
## [1] 42
## [1] 390625
## [1] 390625
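The input lines that produced the outputs above are not shown. A minimal sketch of calculations that give the same values follows; the specific numbers are chosen here for illustration and are not necessarily the ones used in the lesson.

```r
# Basic arithmetic in the console; the operands are illustrative assumptions.
8 + 5    # addition       -> 13
6 - 4    # subtraction    -> 2
6 * 7    # multiplication -> 42
25^4     # exponentiation -> 390625
25**4    # `**` is an alternative notation for `^` -> 390625
```

Each line prints its result directly in the console, prefixed with `[1]`.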
For more complicated calculations you need functions. Frequently used
functions are built into R, so you only have to call them. A function
usually needs one or more arguments. You can use the Help pane at the
bottom right of RStudio to find information on R functions. You can
also type ?function_name in the console.
Let’s stay with the calculations and try to calculate the square root
of a number. You can raise the number to the power 0.5, but you can also
use the function sqrt().
## [1] 2
## [1] 2
## [1] 4
## [1] 1.732051
## [1] 2
## [1] 4
## [1] 1.732051
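The calls behind the outputs above are not shown. As a hedged sketch, the following lines (with numbers picked for illustration) show both ways of taking a square root and give values of the same kind:

```r
# Two equivalent ways to take a square root; the inputs are illustrative.
4^0.5     # raise to the power 0.5     -> 2
sqrt(4)   # built-in function          -> 2
sqrt(16)  # -> 4
sqrt(3)   # -> 1.732051 (printed with R's default of 7 significant digits)
# Type ?sqrt in the console to open the help page of the function.
```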
Variables
It is useful to store data in variables, otherwise you will have to
type the same number (or word/code) again. You can store number(s),
word(s), text, functions, data frames, etc. in a variable. Variable
names usually avoid capital letters (unless necessary) and cannot
contain spaces. To store data in a variable, the assignment arrow
(<-) is used. The equals sign (=) can
also be used, but the arrow is more common in R code.
## [1] 100
In the Environment pane on the right side of the screen you can see that the number 100 is stored in the variable var1. You can call the value of the variable in the console and/or use it in calculations.
## [1] 150
## [1] 1000
## [1] 10
## [1] 12
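The code for the outputs above is missing from this page. A minimal sketch, assuming var1 holds 100 as described in the text; the calculations themselves are illustrative guesses that happen to give the same values:

```r
var1 <- 100    # store the number 100 in the variable var1
var1           # print the stored value -> 100
var1 + 50      # -> 150
var1 * 10      # -> 1000
sqrt(var1)     # -> 10
var1 / 10 + 2  # -> 12
```

The variable itself is not changed by these calculations; only an assignment with `<-` changes what is stored.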
Of course it is also possible to store other types of data in variables. The types of data have been explained before, but we will limit our lessons to numbers, characters and factors. Text that explains the code is written as a comment after a ‘#’ (as you have seen in the examples above). But you can also store text/words in variables to use them later.
# Store the word "hello" in the variable var2 and the word "world" in the variable var3.
var2 <- "hello"
var3 <- "world"
var2
## [1] "hello"
var3
## [1] "world"
Words cannot be added together with + the way numbers can. To join
words into a sentence, the paste() or the paste0() function is used:
paste() puts a space between the elements by default, while paste0()
joins them without a separator.
# Paste the variables var2 and var3 together using the `paste()` function. Store it in the variable var4.
var4 <- paste(var2, var3)
var4
## [1] "hello world"
# Paste the variables var2 and var3 together using the `paste0()` function. Store it in the variable var5.
var5 <- paste0(var2, var3)
var5
## [1] "helloworld"
The opposite of paste() is the function strsplit(), which can
be useful to separate words into letters (for example a DNA
sequence into its single nucleotides, or a protein sequence into amino
acids).
# Convert the following protein sequences into single amino acids. When you use the split argument "", the sequence is split after each character.
var6 <- "MSFGRTYHGHHGVAAMKLMNPPLYWQQRTASDDETYNMCGGFLLKILYTQSS"
var7 <- "Met-Ala-Ala-His-Pro-Leu-Pro-Gln-Asp-Asn-Ala-Val-Lys-Lys-Tyr-Ser-Pro-Ile"
strsplit(var6, "")
## [[1]]
## [1] "M" "S" "F" "G" "R" "T" "Y" "H" "G" "H" "H" "G" "V" "A" "A" "M" "K" "L" "M"
## [20] "N" "P" "P" "L" "Y" "W" "Q" "Q" "R" "T" "A" "S" "D" "D" "E" "T" "Y" "N" "M"
## [39] "C" "G" "G" "F" "L" "L" "K" "I" "L" "Y" "T" "Q" "S" "S"
strsplit(var7, "-")
## [[1]]
## [1] "Met" "Ala" "Ala" "His" "Pro" "Leu" "Pro" "Gln" "Asp" "Asn" "Ala" "Val"
## [13] "Lys" "Lys" "Tyr" "Ser" "Pro" "Ile"
Some useful variables are built into R. For example, the
lowercase letters of the alphabet are stored in the variable letters.
You can use this variable to extract letters and use them in functions.
Just type letters in the console and see what is stored in this
variable. You can transform these lowercase letters to capital letters.
# Store the built-in vector `letters` in the variable var8.
var8 <- letters
var8
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
# Use the function `toupper()` to transform the letters to the capital letters and store this in the variable var9.
var9 <- toupper(var8)
var9
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
This is just a workaround, because R also has the capital letters
already stored in the variable LETTERS.
Vectors and data types
Before we go to the last type of data (factors) we will discuss
vectors first. A vector is a collection of elements of a single data
type; if you combine different types, R converts them to one common
type. The most basic vector is created with the c() function.
# Create a vector with the numbers 1, 3, 6, 15 and 19 and store it in the variable var10.
var10 <- c(1, 3, 6, 15, 19)
var10
## [1] 1 3 6 15 19
You can do many operations on vectors. For example, determine the
length of the vector (the number of elements in the vector) with the
length() function. Or you can add a certain amount to all
the elements of a vector.
# Determine how many elements are in the vector stored in var10 using the `length()` function.
length(var10)
## [1] 5
## [1] 7 9 12 21 25
## [1] 10 30 60 150 190
## [1] 1.000000 1.732051 2.449490 3.872983 4.358899
## [1] 8.8
## [1] 6
## [1] 7.823043
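The calls for the outputs above are not shown on this page. A sketch of element-wise operations and summary functions on var10 that is consistent with those values (which functions the lesson actually used is an assumption):

```r
var10 <- c(1, 3, 6, 15, 19)

# Arithmetic is applied to every element of the vector.
var10 + 6      # -> 7 9 12 21 25
var10 * 10     # -> 10 30 60 150 190
sqrt(var10)    # square root of each element

# Summary functions reduce the vector to a single number.
mean(var10)    # -> 8.8
median(var10)  # -> 6
sd(var10)      # sample standard deviation
```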
It is possible to use comparison operators on vectors to search for
certain values. These operators are == (equal to), > (greater than),
< (less than), >= (greater than or equal to), <= (less than or equal
to), and != (not equal to).
## [1] FALSE FALSE FALSE TRUE TRUE
## [1] FALSE FALSE TRUE TRUE TRUE
## [1] TRUE TRUE TRUE FALSE FALSE
## [1] FALSE FALSE FALSE TRUE FALSE
## [1] FALSE TRUE TRUE TRUE TRUE
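The comparisons that produced the logical vectors above are not shown. A sketch on var10 that is consistent with them (the exact thresholds are inferred from the outputs, so treat them as assumptions):

```r
var10 <- c(1, 3, 6, 15, 19)

# Each comparison is evaluated per element and returns TRUE or FALSE.
var10 > 6    # -> FALSE FALSE FALSE  TRUE  TRUE
var10 >= 6   # -> FALSE FALSE  TRUE  TRUE  TRUE
var10 <= 6   # ->  TRUE  TRUE  TRUE FALSE FALSE
var10 == 15  # -> FALSE FALSE FALSE  TRUE FALSE
var10 != 1   # -> FALSE  TRUE  TRUE  TRUE  TRUE

# TRUE counts as 1, so sum() gives the number of elements that match.
sum(var10 > 6)  # -> 2
```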
The last data type that we will use in this lesson is the factor. The factor is not a number but can be ordered. For example, for a test the letters “O”, “V”, “G” are used to grade the test. These are factors and are ordered, because “O” < “V” < “G”.
Check the Help menu to see what the factor() function does and
what arguments it takes. The data are passed as a vector created with the
c() function. The argument levels lists the
unique values that can occur in the factor, and the argument
ordered is set to TRUE if the order given in
levels is meaningful.
# Store the factors "V", "V", "G", "O", "V", "G", "O", "O" and "V" in the variable var11.
var11 <- factor(c("V", "V", "G", "O", "V", "G", "O", "O", "V"), levels = c("O", "V", "G"), ordered = TRUE)
var11
## [1] V V G O V G O O V
## Levels: O < V < G
## [1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
## [1] "numeric"
## [1] "ordered" "factor"
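The last three outputs above lack their input lines. One plausible reading, given as a sketch: because var11 is an ordered factor, its elements can be compared with a level, and class() reports the type of an object. Which exact calls the lesson used is an assumption.

```r
var10 <- c(1, 3, 6, 15, 19)
var11 <- factor(c("V", "V", "G", "O", "V", "G", "O", "O", "V"),
                levels = c("O", "V", "G"), ordered = TRUE)

# Comparisons follow the order of the levels: O < V < G.
var11 < "G"   # TRUE for every "O" and "V", FALSE for every "G"

class(var10)  # -> "numeric"
class(var11)  # -> "ordered" "factor"
```

Without `ordered = TRUE` the comparison `var11 < "G"` is not meaningful, and R returns NA with a warning.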
The seq() and the rep() functions can also
be useful to create vectors. The rep() function repeats
elements a given number of times and seq() generates a
sequence with regular intervals. For example: take only the odd or
even numbers, or take every 3rd element from a certain vector.
# Take the odd numbers from 1-20 and store them in a new variable, var12
var12 <- seq(1, 20, 2)
var12
## [1] 1 3 5 7 9 11 13 15 17 19
# Take the even numbers from 1-20 and store them in a new variable, var13
var13 <- seq(2, 20, 2)
var13
## [1] 2 4 6 8 10 12 14 16 18 20
# Create a vector with four times the letter "a" and store them in the variable var14.
var14 <- rep("a", 4)
var14
## [1] "a" "a" "a" "a"
# Create a vector with four times the letter "a" and "b" (end result: "a" "b" "a" "b" ...).
var15 <- rep(c("a", "b"), 4)
var15
## [1] "a" "b" "a" "b" "a" "b" "a" "b"
# Create a vector with four times the letter "a" and then four times the letter "b".
# Store this vector in the variable var16 (end result: "a" "a" "a" ... "b" "b" "b" ...).
var16 <- rep(c("a", "b"), each = 4)
var16
## [1] "a" "a" "a" "a" "b" "b" "b" "b"
class(var16)
## [1] "character"
Last thing about data types: it is possible to store different types of data in a vector, but the elements are then converted to one type of data. Letters cannot be converted to numbers, so a combination of numbers and letters (characters) is converted to characters for all the elements in the vector.
# Create a vector with the elements 1, 2, 3 and "a" and store the vector in the variable var17. Which type are the elements in the vector?
var17 <- c(1:3, "a")
var17
## [1] "1" "2" "3" "a"
class(var17)
## [1] "character"
Vectors are the place where elements can be stored and used when
necessary. But what if you want to use only a selection of the elements?
Then you need to know where the elements are in the vector and you need
to know how to extract the element(s). This is called indexing. You can
use indexing on vectors, but (as we will see) also on data frames. To
extract elements from a vector, the square brackets [] are
used.
## [1] "A"
## [1] "Z"
## [1] "Z"
## [1] "D" "N" "A"
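The indexing calls for the outputs above are missing. A sketch of how square brackets select elements by position, using var9 as defined earlier (the exact calls are assumptions consistent with the outputs):

```r
var9 <- LETTERS              # the 26 capital letters, as stored in var9 earlier

var9[1]                      # first element  -> "A"
var9[26]                     # 26th element   -> "Z"
var9[length(var9)]           # last element, without knowing the length -> "Z"
var9[c(4, 14, 1)]            # several positions at once -> "D" "N" "A"
```

Note that R starts counting at 1, not at 0 as many other programming languages do.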
# Paste these letters together to "DNA" and store it in var18.
var18 <- paste0(var9[4], var9[14], var9[1])
var18
## [1] "DNA"
Dataframes & Tidyverse
Data frames can be constructed from vectors of equal length. The
vectors become the columns (the variables) of the data
frame, while the rows are the observations. When you
have vectors that you want to collect in a data frame, you can use the
data.frame() function.
# Create two vectors with the numbers 1:10 in vec1 and the numbers 11:20 in vec2. Combine these vectors into a data frame, df1.
vec1 <- c(1:10)
vec2 <- c(11:20)
df1 <- data.frame(vec1, vec2)
df1
## vec1 vec2
## 1 1 11
## 2 2 12
## 3 3 13
## 4 4 14
## 5 5 15
## 6 6 16
## 7 7 17
## 8 8 18
## 9 9 19
## 10 10 20
Data frames created in base R are ‘basic’. It is possible to use an
enhanced version of the data frame from the tidyverse
library. You can load this library by typing
library(tidyverse) in the console. Many other useful
functions in this library will be used during this course. Make sure
that you load this library every time before we start the lesson. To
make the data frames a bit more presentable in the HTML format that we use
during the course, you will also need the library
kableExtra.
We use the tibble as a more informative data frame during this
course. Make a tibble of df1 and make it more presentable
with kableExtra. The function
kable_styling() lets you style your tibble the way
you like.
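library(tidyverse)   # provides tibble() and the pipe %>%
library(knitr)
library(kableExtra)  # provides kbl() and kable_styling()
library(pillar)
# Show a data frame as an HTML table with the column type under each column name.
formatted_table <- function(df) {
  col_types <- sapply(df, pillar::type_sum)
  new_col_names <- paste0(names(df), "<br>", "<span style='font-weight: normal;'>", col_types, "</span>")
  kbl(df, col.names = new_col_names, escape = FALSE, format = "html") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}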
# Create a tibble from df1 and store it in tib1.
tib1 <- tibble(df1)
# Format the table with the function `formatted_table()` that has been defined above.
# The only argument for this function will be the tibble.
formatted_table(tib1)

| vec1 int | vec2 int |
|---|---|
| 1 | 11 |
| 2 | 12 |
| 3 | 13 |
| 4 | 14 |
| 5 | 15 |
| 6 | 16 |
| 7 | 17 |
| 8 | 18 |
| 9 | 19 |
| 10 | 20 |
You can rename the column headers with the
names() function by assigning it a vector with the new
names of the columns. Selecting elements, rows and columns can be done
using indexing, like indexing with vectors. The difference is that there
are now two dimensions instead of one. The square brackets
[] are still used, but now you have to indicate the rows
and columns, separated by a comma.
# Give the data frame tib1 new names for the headers of the data frame: "Variable1" and "Variable2".
names(tib1) <- c("Variable1", "Variable2")
tib1
## # A tibble: 10 × 2
## Variable1 Variable2
## <int> <int>
## 1 1 11
## 2 2 12
## 3 3 13
## 4 4 14
## 5 5 15
## 6 6 16
## 7 7 17
## 8 8 18
## 9 9 19
## 10 10 20
## # A tibble: 1 × 1
## Variable1
## <int>
## 1 3
## # A tibble: 1 × 1
## Variable2
## <int>
## 1 11
## # A tibble: 10 × 1
## Variable2
## <int>
## 1 11
## 2 12
## 3 13
## 4 14
## 5 15
## 6 16
## 7 17
## 8 18
## 9 19
## 10 20
## [1] 11 12 13 14 15 16 17 18 19 20
## # A tibble: 1 × 2
## Variable1 Variable2
## <int> <int>
## 1 4 14
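The indexing calls behind the outputs above are not shown. The sketch below illustrates the `[row, column]` pattern on a plain base R data frame with the same column names, so that it runs without extra packages. The data frame `df` and the chosen positions are assumptions for illustration.

```r
# A base R data frame with the same content as tib1 (hypothetical stand-in).
df <- data.frame(Variable1 = 1:10, Variable2 = 11:20)

df[3, 1]               # row 3, column 1              -> 3
df[1, 2]               # row 1, column 2              -> 11
df[, 2, drop = FALSE]  # whole column 2, kept as a data frame
df$Variable2           # column 2 as a plain vector   -> 11 12 ... 20
df[4, ]                # whole row 4 (all columns)
```

One difference to keep in mind: on a tibble, `tib1[3, 1]` and `tib1[, 2]` stay tibbles, as the outputs above show, whereas a base data frame returns a plain value or vector unless `drop = FALSE` is given.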
Let’s add a new column with newly acquired data to this data frame. There are different ways to do this, but the most basic method is the following:
# Add a third column that contains the values 21-30 and has the header Variable3.
tib1$Variable3 <- c(21:30)
tib1
## # A tibble: 10 × 3
## Variable1 Variable2 Variable3
## <int> <int> <int>
## 1 1 11 21
## 2 2 12 22
## 3 3 13 23
## 4 4 14 24
## 5 5 15 25
## 6 6 16 26
## 7 7 17 27
## 8 8 18 28
## 9 9 19 29
## 10 10 20 30
Note that you cannot add a new column this way if the new vector does not have the same length as the number of rows in the data frame. You can even do calculations on the whole data frame if it consists of numbers only.
## Variable1 Variable2 Variable3
## 1 2 12 22
## 2 3 13 23
## 3 4 14 24
## 4 5 15 25
## 5 6 16 26
## 6 7 17 27
## 7 8 18 28
## 8 9 19 29
## 9 10 20 30
## 10 11 21 31
## Variable1 Variable2 Variable3
## 1 6 66 126
## 2 12 72 132
## 3 18 78 138
## 4 24 84 144
## 5 30 90 150
## 6 36 96 156
## 7 42 102 162
## 8 48 108 168
## 9 54 114 174
## 10 60 120 180
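The calculations that produced the two tables above are not shown. A sketch that gives the same numbers, using a base R data frame `df` as a stand-in for tib1 (the stand-in and the constants 1 and 6 are inferred from the outputs, so treat them as assumptions):

```r
# Stand-in for tib1 with three numeric columns.
df <- data.frame(Variable1 = 1:10, Variable2 = 11:20, Variable3 = 21:30)

# Arithmetic on a numeric data frame is applied to every cell.
df + 1   # first row becomes 2, 12, 22
df * 6   # first row becomes 6, 66, 126
```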
# Create a second tibble, tib2, that consists of three columns and 10 rows. Take random numbers for each element. Now add tib1 to tib2.
tib2 <- tibble(sample(1:20, 10), sample(21:40, 10), sample(41:60, 10))
tib2
## # A tibble: 10 × 3
## `sample(1:20, 10)` `sample(21:40, 10)` `sample(41:60, 10)`
## <int> <int> <int>
## 1 9 32 43
## 2 10 24 58
## 3 19 38 49
## 4 11 29 56
## 5 17 31 51
## 6 14 22 60
## 7 15 27 47
## 8 5 23 48
## 9 20 34 52
## 10 4 25 45
tib1 + tib2
## Variable1 Variable2 Variable3
## 1 10 43 64
## 2 12 36 80
## 3 22 51 72
## 4 15 43 80
## 5 22 46 76
## 6 20 38 86
## 7 22 44 74
## 8 13 41 76
## 9 29 53 81
## 10 14 45 75
R comes with a couple of standard data frames that can be used for
exercises to practice your skills in R. One of these data frames is the
iris data frame. We will use the function
head() to show only the first 10 rows of the data
frame.
## # A tibble: 10 × 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 11 5.4 3.7 1.5 0.2 setosa
## [1] 3.9
## [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
## [1] versicolor
## Levels: setosa versicolor virginica
## [1] 4.4
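The calls for the iris outputs above are missing from this page. The sketch below shows indexing calls on the built-in iris data frame that are consistent with those outputs; the exact rows and columns the lesson picked are assumptions.

```r
# iris ships with R, so no package needs to be loaded.
head(iris, 10)            # first 10 rows of the data frame

iris$Petal.Length[1:10]   # first 10 values of one column
iris[11, ]                # the complete 11th row
iris[6, 2]                # row 6, column 2 (Sepal.Width) -> 3.9
iris$Species[1:10]        # a factor; its levels are printed below the values
iris$Species[51]          # -> versicolor
max(iris$Sepal.Width)     # largest sepal width -> 4.4
```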
# Show the row that contains the maximum sepal width.
iris[iris$Sepal.Width == max(iris$Sepal.Width), ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 16 5.7 4.4 1.5 0.4 setosa
## [1] 3.758
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 0.300 1.300 1.199 1.800 2.500
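The summary outputs above have lost their input lines. A sketch of base R calls that give output of this form (that these are the calls used in the lesson is an assumption):

```r
mean(iris$Petal.Length)    # average of one column -> 3.758

# summary() on a data frame describes every column:
# quartiles and mean for numbers, counts per level for factors.
summary(iris)

# summary() on a single numeric column gives one row of statistics.
summary(iris$Petal.Width)
```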
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 14 4.3 3.0 1.1 0.1 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
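The second table above looks like iris sorted by ascending sepal length, but its code is not shown. A sketch of how order() can be used for this, assuming that is what the lesson did:

```r
# order() returns the row positions that would sort the column;
# using them as row index rearranges the whole data frame.
sorted_iris <- iris[order(iris$Sepal.Length), ]  # ascending by default
head(sorted_iris, 10)                            # the 10 shortest sepals
```

The row names (14, 9, 39, ...) still refer to the original positions, which makes it easy to trace a row back to the unsorted data.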
# Order the data frame descending on the sepal width.
head(iris[order(iris$Sepal.Width, decreasing = TRUE), ], 10L) # OR
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 16 5.7 4.4 1.5 0.4 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 16 5.7 4.4 1.5 0.4 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 47 5.1 3.8 1.6 0.2 setosa
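The alternative that followed the "OR" is missing from this page. One common base R alternative, offered here only as a guess at what was meant, is to sort on the negated column:

```r
# Negating a numeric column reverses its sort order,
# so ascending order on -Sepal.Width is descending order on Sepal.Width.
desc_iris <- iris[order(-iris$Sepal.Width), ]
head(desc_iris, 10L)
```

This trick only works for numeric columns; for characters or factors the `decreasing = TRUE` argument is the safer choice.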
Plotting
The last part of this lesson is plotting. Data frames often contain
variables that might be correlated. To plot a graph we use
the basic plot() function for now; we will move to other
plot functions later on. The most basic plot needs x-values and
y-values, which can be extracted from a data frame.
# Create a plot from the vectors with numbers 1-10 (x-values) and numbers 11-20 (y-values).
plot(c(1:10), c(11:20))
You see the plot is very, very basic, but this is just a start. Let’s
use the iris data frame to get some more interesting plots
and let’s try to adjust the plot to our liking (axis titles, graph
title, maybe add some color to it…). Check Help for the
plot() function. Search for base::plot. Let’s
have a look at the different kinds of plots we can use with this data frame.
The first basic plot is the XY-scatterplot.
# Create an XY-scatterplot with the petal width on the x-axis and the petal length on the y-axis.
plot(iris$Petal.Width, iris$Petal.Length)
# Add labels for the x- and y-axes. Width and length are in cm.
# Add a main title to the graph.
# Make the points blue.
plot(iris$Petal.Width, iris$Petal.Length, xlab = "Petal Width (cm)",
     ylab = "Petal Length (cm)", main = "Petal Width vs Petal Length", col = "blue")
It is also possible to use the formula notation y ~ x to
plot the same data.
# Plot the same data using the formula notation.
plot(iris$Petal.Length ~ iris$Petal.Width, xlab = "Petal Width (cm)",
     ylab = "Petal Length (cm)", main = "Petal Width vs Petal Length", col = "blue")
You can add a trendline to the plot as well.
plot(iris$Petal.Length ~ iris$Petal.Width, xlab = "Petal Width (cm)",
     ylab = "Petal Length (cm)", main = "Petal Width vs Petal Length", col = "blue")
# Fit a linear model with lm() and draw it as a red line with abline().
abline(lm(iris$Petal.Length ~ iris$Petal.Width), col = "red")
You can also add other variables to the plot, as long as the x-values are the same variable.
# Create a plot in which the sepal length and the petal length (y-values) are plotted against the sepal width.
plot(iris$Sepal.Length ~ iris$Sepal.Width, col = "blue")
points(iris$Petal.Length ~ iris$Sepal.Width, col = "red")
Let’s try to make a histogram of the sepal width.
The function chooses convenient steps to divide the
widths into classes. You can adjust this with the
argument breaks =. This argument needs a lower and upper
limit and the step between the breaks (you have used the function
seq() to achieve this before). Add some better titles to
the graph and axes while you are at it.
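To see what the breaks argument receives, it can help to look at the vector that seq() produces and at the counts per class without drawing anything. This is a small sketch; the object names `class_limits` and `h` are made up for illustration.

```r
# The class limits: from 2.0 up to 5.0 in steps of 0.5.
class_limits <- seq(2.0, 5.0, 0.5)
class_limits   # -> 2.0 2.5 3.0 3.5 4.0 4.5 5.0

# plot = FALSE returns the histogram data instead of drawing it.
h <- hist(iris$Sepal.Width, breaks = class_limits, plot = FALSE)
h$counts       # number of flowers in each of the 6 classes
```

The breaks must cover the full range of the data; otherwise hist() stops with an error.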
# Create a histogram that shows the sepal widths between 2.0 and 5.0 with steps of 0.5.
hist(iris$Sepal.Width, breaks = seq(2.0, 5.0, 0.5), xlab = "Sepal Width (cm)", main = "Histogram of Sepal Width")
With the boxplot you can add series and compare them to each other. The range of the measurements is shown as well as their variability.
# Create a boxplot of the petal width and sepal width. Use the argument `names =` for the variables.
boxplot(iris$Petal.Width, iris$Sepal.Width, names = c("Petal Width", "Sepal Width"))
The barplot is also frequently used to visualize data. The basic barplot contains the values on the y-axis and the categories on the x-axis. Let’s plot the average of the (numeric) variables in a barplot.
# Create a barplot of the averages of the sepal width, sepal length, petal width and petal length.
barplot(c(mean(iris$Sepal.Width), mean(iris$Sepal.Length), mean(iris$Petal.Width),
          mean(iris$Petal.Length)),
        names.arg = c("Sepal Width", "Sepal Length", "Petal Width", "Petal Length"))
Learning outcomes
In this lesson you have learned to:
- use base R to get to know how R and RStudio work,
- create variables to store data,
- create data frames from vectors,
- access data in vectors and data frames,
- plot data with base R.
This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.