R
Data Analysis Exercises
Import Tidyverse:
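A minimal sketch of the import, assuming the tidyverse is already installed:

```r
# Attach the core tidyverse packages (dplyr, tidyr, readr, ggplot2, ...).
# Assumption: the package was installed beforehand, e.g. with
# install.packages("tidyverse").
library(tidyverse)
```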
Function to improve printing tibble as HTML:
library(kableExtra)
formatted_table <- function(df) {
  kbl(df) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}
This file can be downloaded here
To allow a comparison between the analysis in R and in Excel, the exercises were kept as similar as possible.
Exercise 1
Import the Mice Protein Expression dataset in R. It contains the expression levels of a set of proteins measured in different mice. Perform the following calculations on this data frame and display them as a table:
- Calculate the minimum, maximum, average and median of the columns
CDK5_N and Tau_N. Display the answers in a data frame.
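One possible way to collect the four statistics in a data frame is sketched below. The values are made-up toy data, not the real dataset, and the object names `mice` and `stats` are hypothetical; only the column names come from the exercise.

```r
# Made-up toy values, NOT the real dataset; only the column names follow the exercise.
mice <- data.frame(
  MouseID = c("m1", "m2", "m3", "m4"),
  CDK5_N  = c(0.30, 0.25, 0.35, 0.40),
  Tau_N   = c(0.20, NA, 0.10, 0.30)
)

cols <- c("CDK5_N", "Tau_N")

# One row per column, one statistic per output column.
# na.rm = TRUE skips empty cells instead of returning NA.
stats <- data.frame(
  protein = cols,
  minimum = sapply(mice[cols], min, na.rm = TRUE),
  maximum = sapply(mice[cols], max, na.rm = TRUE),
  average = sapply(mice[cols], mean, na.rm = TRUE),
  median  = sapply(mice[cols], median, na.rm = TRUE),
  row.names = NULL
)
stats
```

The resulting data frame could then be passed to `formatted_table()` for display.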
Exercise 2
Index, filter and sort your data frame to answer the following
questions:
1. Which mouse has the highest expression of the Tau protein?
2. What is the relative expression value of this protein in this
particular mouse?
3. Which mouse has the lowest expression of the pAKT protein?
4. How many empty cells (NA) are in the BAD column? Use the
summary function to find the number of empty cells.
5. Which mouse of the Ts65Dn genotype group has the highest Tau
expression (use multi-sorting)?
6. Which mouse of the Ts65Dn genotype, and saline treatment group has
the highest Tau expression (use multi-sorting)?
Mouse with the highest expression of the protein Tau_N:
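A minimal sketch of a sort-based approach, using dplyr (part of the tidyverse) on made-up toy rows. The object names are hypothetical.

```r
library(dplyr)

# Made-up toy rows, not values from the real dataset.
mice <- data.frame(
  MouseID = c("m1", "m2", "m3", "m4"),
  Tau_N   = c(0.20, 0.45, NA, 0.30)
)

# Sort from high to low; arrange() always places NA last,
# so the first row holds the highest measured value.
top_tau <- mice %>%
  arrange(desc(Tau_N)) %>%
  head(1)

top_tau$MouseID
```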
The relative expression value for this protein in this particular mouse:
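The value itself can be pulled out of the matching row. The sketch below filters on the column maximum instead of sorting; the data are again made-up toy rows.

```r
library(dplyr)

# Made-up toy rows, not values from the real dataset.
mice <- data.frame(
  MouseID = c("m1", "m2", "m3", "m4"),
  Tau_N   = c(0.20, 0.45, NA, 0.30)
)

# Keep the row(s) holding the column maximum, then extract the value.
# filter() drops rows where the comparison evaluates to NA.
top_value <- mice %>%
  filter(Tau_N == max(Tau_N, na.rm = TRUE)) %>%
  pull(Tau_N)

top_value
```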
Mouse with the lowest expression of the pAKT protein:
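For the lowest value, `slice_min()` from dplyr is one option. The column name `pAKT_N` is an assumption based on the naming of the other expression columns, and the rows are made up.

```r
library(dplyr)

# Made-up toy rows; the column name pAKT_N is assumed.
mice <- data.frame(
  MouseID = c("m1", "m2", "m3", "m4"),
  pAKT_N  = c(0.25, 0.18, 0.22, 0.30)
)

# Keep the single row with the smallest pAKT_N value.
lowest_pakt <- mice %>%
  slice_min(pAKT_N, n = 1)

lowest_pakt$MouseID
```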
Number of empty cells (NA) in the BAD column:
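A sketch of how `summary()` reports empty cells, with a direct count as a cross-check. The column name `BAD_N` is an assumption and the values are made up.

```r
# Made-up toy rows; the column name BAD_N is assumed.
mice <- data.frame(
  MouseID = c("m1", "m2", "m3", "m4", "m5"),
  BAD_N   = c(0.15, NA, 0.18, NA, 0.20)
)

# summary() on a numeric column adds an "NA's" entry when empty cells exist
# (the entry is absent when the column has no NA at all).
summary(mice$BAD_N)
na_from_summary <- summary(mice$BAD_N)[["NA's"]]

# Cross-check: count the NA values directly.
na_count <- sum(is.na(mice$BAD_N))
```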
Mouse of the Ts65Dn genotype group with the highest Tau expression:
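Multi-sorting can be sketched with `arrange()` and two sort keys. The toy rows are made up, and the genotype labels `Control` and `Ts65Dn` are assumed spellings.

```r
library(dplyr)

# Made-up toy rows; genotype labels are assumed.
mice <- data.frame(
  MouseID  = c("m1", "m2", "m3", "m4", "m5"),
  Genotype = c("Control", "Ts65Dn", "Ts65Dn", "Control", "Ts65Dn"),
  Tau_N    = c(0.50, 0.45, 0.30, 0.20, 0.40)
)

# First key: Genotype in descending order, which puts "Ts65Dn" before "Control".
# Second key: Tau_N from high to low within each genotype.
sorted <- mice %>%
  arrange(desc(Genotype), desc(Tau_N))

sorted$MouseID[1]
```

In this toy data the overall maximum belongs to a Control mouse, so sorting on Tau_N alone would give the wrong answer.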
Mouse of the Ts65Dn genotype, and saline treatment group with the highest Tau expression (use multi-sorting):
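The same idea extends to three sort keys. The sketch assumes a column named `Treatment` with the labels `Saline` and `Memantine`; these names and all values are illustrative only. A filter-based cross-check is included.

```r
library(dplyr)

# Made-up toy rows; column name Treatment and its labels are assumed.
mice <- data.frame(
  MouseID   = c("m1", "m2", "m3", "m4", "m5", "m6"),
  Genotype  = c("Ts65Dn", "Ts65Dn", "Ts65Dn", "Control", "Ts65Dn", "Control"),
  Treatment = c("Memantine", "Saline", "Saline", "Saline", "Memantine", "Memantine"),
  Tau_N     = c(0.60, 0.35, 0.42, 0.70, 0.20, 0.10)
)

# Sort by genotype, then treatment, then expression (all descending),
# so the Ts65Dn/Saline mouse with the highest Tau_N ends up in row 1.
sorted <- mice %>%
  arrange(desc(Genotype), desc(Treatment), desc(Tau_N))

# Cross-check with an explicit filter.
check <- mice %>%
  filter(Genotype == "Ts65Dn", Treatment == "Saline") %>%
  slice_max(Tau_N, n = 1)
```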
Exercise 3
- A relative expression level > 0.5 is considered a high
expression level. How many mice have a high expression level for
DYRK1A?
- Apply this calculation for all proteins. For which protein do you
observe a count of 218? Hint: you can calculate the mean and the sum for
multiple columns at once with the colMeans and colSums functions.
- The average pELK expression is higher than that of pERK. But how many mice have a pELK expression level higher than 0.75 AND a pERK expression level higher than 0.25?
High expression level:
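Counting rows above a threshold comes down to summing a logical vector. A sketch on made-up values, assuming the column is named `DYRK1A_N`:

```r
# Made-up toy rows; the column name DYRK1A_N is assumed.
mice <- data.frame(
  MouseID  = c("m1", "m2", "m3", "m4", "m5"),
  DYRK1A_N = c(0.55, 0.40, NA, 0.62, 0.50)
)

# TRUE counts as 1 and FALSE as 0; na.rm = TRUE ignores empty cells.
# Note that 0.50 is not strictly greater than 0.5.
high_count <- sum(mice$DYRK1A_N > 0.5, na.rm = TRUE)
high_count
```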
High expression levels for all proteins:
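Following the hint, `colSums()` can apply the same count to every expression column at once. The sketch assumes that expression columns are the ones whose names end in `_N`; the values are made up, and the target count in the lookup is adapted to the toy data.

```r
# Made-up toy rows; the "_N" suffix convention is an assumption.
mice <- data.frame(
  MouseID  = c("m1", "m2", "m3", "m4"),
  DYRK1A_N = c(0.55, 0.40, 0.62, 0.48),
  pELK_N   = c(0.80, 0.90, NA, 0.70),
  pERK_N   = c(0.20, 0.30, 0.26, 0.10),
  Genotype = c("Control", "Ts65Dn", "Ts65Dn", "Control")
)

# Pick the expression columns by their shared suffix.
protein_cols <- grep("_N$", names(mice), value = TRUE)

# The comparison yields a logical matrix; colSums() counts TRUE per column.
high_counts <- colSums(mice[protein_cols] > 0.5, na.rm = TRUE)

# Look up which column has a given count (3 in this toy data).
names(high_counts)[high_counts == 3]
```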
Higher expression levels for pELK than 0.75 AND higher expression levels for pERK than 0.25:
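Two conditions joined with AND can be written with `&` inside `filter()`. A sketch with assumed column names `pELK_N` and `pERK_N` and made-up values:

```r
library(dplyr)

# Made-up toy rows; column names are assumed.
mice <- data.frame(
  MouseID = c("m1", "m2", "m3", "m4"),
  pELK_N  = c(0.80, 0.90, 0.70, 1.10),
  pERK_N  = c(0.20, 0.30, 0.40, NA)
)

# Both conditions must hold; rows where the test evaluates to NA are dropped.
both_high <- mice %>%
  filter(pELK_N > 0.75 & pERK_N > 0.25) %>%
  nrow()

both_high
```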
Exercise 4
Note that this exercise differs a bit from the Excel counterpart. We do not use conditional formatting but selection of rows instead.
- Select rows with a relative expression value higher than 2.3 for the
pCASP9 protein. Which treatment occurs most often among these selected
rows?
- Select rows with duplicate MouseIDs. Are there any duplicate
MouseIDs?
- Select the following columns: MouseID, APP_N, NR1_N, pCREB_N, S6_N,
and Genotype.
- Select everything but the columns with relative expression measurements. Hint: the names of the columns with relative expression values all have something in common.
Select rows with a relative expression value higher than 2.3 for the pCASP9 protein:
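A sketch of the row selection followed by a tally per treatment. The column names `pCASP9_N` and `Treatment`, the treatment labels, and all values are assumptions for illustration.

```r
library(dplyr)

# Made-up toy rows; column names and treatment labels are assumed.
mice <- data.frame(
  MouseID   = c("m1", "m2", "m3", "m4", "m5"),
  pCASP9_N  = c(2.40, 1.90, 2.35, 2.50, 2.10),
  Treatment = c("Saline", "Saline", "Memantine", "Saline", "Memantine")
)

# Keep only the rows above the threshold.
selected <- mice %>%
  filter(pCASP9_N > 2.3)

# Tally the treatments in the selection, most frequent first.
treatment_counts <- selected %>%
  count(Treatment, sort = TRUE)

treatment_counts
```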
Select rows with duplicate MouseIDs. Are there any duplicate MouseIDs?
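One way to select duplicated IDs is to group by `MouseID` and keep groups with more than one row. The toy data below deliberately contain a duplicate; it says nothing about the real dataset.

```r
library(dplyr)

# Made-up toy rows with one deliberately repeated ID.
mice <- data.frame(
  MouseID = c("m1", "m2", "m2", "m3"),
  Tau_N   = c(0.20, 0.45, 0.44, 0.30)
)

# Keep every row whose MouseID occurs more than once.
dupes <- mice %>%
  group_by(MouseID) %>%
  filter(n() > 1) %>%
  ungroup()

# Quick yes/no check in base R.
any_dupes <- anyDuplicated(mice$MouseID) > 0
```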
Select the following columns: MouseID, APP_N, NR1_N, pCREB_N, S6_N, and Genotype:
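Selecting columns by name can be sketched with `select()`. The toy data frame holds made-up values and two extra columns (`DYRK1A_N`, `Treatment`) so that the selection has something to drop.

```r
library(dplyr)

# Made-up toy rows, not values from the real dataset.
mice <- data.frame(
  MouseID   = c("m1", "m2"),
  DYRK1A_N  = c(0.50, 0.51),
  APP_N     = c(0.40, 0.41),
  NR1_N     = c(2.30, 2.31),
  pCREB_N   = c(0.20, 0.21),
  S6_N      = c(0.10, 0.11),
  Genotype  = c("Control", "Ts65Dn"),
  Treatment = c("Saline", "Memantine")
)

# Keep only the listed columns, in the listed order.
subset_cols <- mice %>%
  select(MouseID, APP_N, NR1_N, pCREB_N, S6_N, Genotype)

names(subset_cols)
```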
Select everything but the columns with relative expression measurements:
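Assuming the common element in the expression column names is the `_N` suffix, a negated `ends_with()` selection drops them all at once. The toy values are made up.

```r
library(dplyr)

# Made-up toy rows; the "_N" suffix convention is an assumption.
mice <- data.frame(
  MouseID   = c("m1", "m2"),
  APP_N     = c(0.40, 0.41),
  Tau_N     = c(0.20, 0.21),
  Genotype  = c("Control", "Ts65Dn"),
  Treatment = c("Saline", "Memantine")
)

# Drop every column whose name ends in "_N".
metadata <- mice %>%
  select(-ends_with("_N"))

names(metadata)
```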
Exercise 5
Note that this exercise differs a bit from the Excel counterpart. We do not use a pivot table but the results will be similar.
Group the mice by genotype.
Calculate the standard deviation, average and median of the relative
expression of the following proteins:
- PKCA
- RRP1
- BRAF
- JNK
Round the values to 3 decimals.
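A sketch of the grouped summary with `group_by()`, `summarise()` and `across()`. The column names carry an assumed `_N` suffix, the values are made up, and the output uses one column per protein and statistic (for example `PKCA_N_sd`).

```r
library(dplyr)

# Made-up toy rows; column names with the "_N" suffix are assumed.
mice <- data.frame(
  Genotype = c("Control", "Control", "Ts65Dn", "Ts65Dn"),
  PKCA_N   = c(0.30, 0.50, 0.20, 0.40),
  RRP1_N   = c(0.10, 0.20, 0.30, 0.50),
  BRAF_N   = c(0.35, 0.45, 0.25, 0.30),
  JNK_N    = c(0.31, 0.29, 0.33, 0.27)
)

by_genotype <- mice %>%
  group_by(Genotype) %>%
  summarise(
    # Apply the three statistics to each of the four columns.
    across(
      c(PKCA_N, RRP1_N, BRAF_N, JNK_N),
      list(
        sd     = ~ sd(.x, na.rm = TRUE),
        mean   = ~ mean(.x, na.rm = TRUE),
        median = ~ median(.x, na.rm = TRUE)
      )
    )
  ) %>%
  # Round every numeric result to 3 decimals.
  mutate(across(where(is.numeric), ~ round(.x, 3)))

by_genotype
```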
This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.