Go back to the main page
Go back to the R overview page


R

Data Analysis Exercises

Import Tidyverse:

library(tidyverse)

Function to improve printing tibble as HTML:

library(kableExtra)
formatted_table <- function(df){
  kbl(df) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}

This file can be downloaded here

In order to compare the analysis in R with Excel, the exercises where kept as similar as possible.

Exercise 1

Import the Mice Protein Expression dataset in R. It contains data on the levels of protein expression of mice proteins in different mice. Perform the following calculations on this data frame and display them as a table:

  • Calculate the minimum, maximum, average, median from the columns CDK5_N and Tau_N. Display the answers in a data frame.
#Your code here

Exercise 2

Index, filter and sort your data frame to answer the following questions:
1. Which mouse has the highest expression of the Tau protein?
2. What is the relative expression value of this protein in this particular mouse?
3. Which mouse has the lowest expression of the pAKT protein?
4. How many empty cells (NA) are in the BAD column? Use the summary function to find the amount of empty cells.
5. Which mouse of the Ts65Dn genotype group has the highest Tau expression (use multi-sorting)?
6. Which mouse of the Ts65Dn genotype, and saline treatment group has the highest Tau expression (use multi-sorting)?

Mouse with the highest expression of the protein Tau_N:

#Your code here

The relative expression value for this protein in this particular mouse:

#Your code here

Mouse with the lowest expression of the pAKT protein:

#Your code here

Amount of empty cells (NA) in the BAD column:

#Your code here

Mouse of the Ts65Dn genotype group with the highest Tau expression:

#Your code here

Mouse of the Ts65Dn genotype, and saline treatment group with the highest Tau expression (use multi-sorting):

#Your code here

Exercise 3

  1. An relative expression level > 0.5 would be considered a high expression level. How many mice do have a high expression level for DYRK1A?
  2. Apply this calculation for all proteins. For which protein do you observe a count of 218? Hint: you can calculate the mean and the sum for multiple columns at once with the colMeans and colSums functions.
  3. The average pELK expression is higher than pERK. But how many mice do have higher expression levels for pELK than 0.75 AND higher expression levels for pERK than 0.25?

High expression level:

#Your code here

High expression levels for all proteins:

#Your code here

Higher expression levels for pELK than 0.75 AND higher expression levels for pERK than 0.25:

#Your code here

Exersise 4

Note that this exercise differs a bit from the Excel counterpart. We do not use conditional formatting but selection of rows instead.

  1. Select rows with a relative expression value higher than 2.3 for the pCASP9 protein. Which treatment is mostly found for these selected proteins?
  2. Select rows with duplicate MouseIDs. Are there any duplicate MouseIDs?
  3. Select the following columns: MouseID, APP_N, NR1_N, pCREB_N, S6_N, and Genotype.
  4. Select everything but the columns with relative expression measurements. Hint: the column names of the column with relative expression values all have something in common.

Select rows with a relative expression value higher than 2.3 for the pCASP9 protein:

#Your code here

Select rows with duplicate MouseIDs. Are there any duplicate MouseIDs?

#Your code here

Select the following columns: MouseID, APP_N, NR1_N, pCREB_N, S6_N, and Genotype:

#Your code here

Select everything but the columns with relative expression measurements:

#Your code here

Exercise 5

Note that this exercise differs a bit from the Excel counterpart. We do not use a pivot table but the results will be similar.

Group the genotypes of the mice.
Calculate the standard deviation, average and the median of the relative expression of the following genes:
- PKCA
- RRP1
- BRAF
- JNK

Round the values to 3 decimals.

#Your code here

Go back to the main page
Go back to the R overview page
⬆️ Back to Top


This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.