

R

Data Cleaning Exercises

library(tidyverse)

This file can be downloaded here

Exercise 1

The first data set was not loaded correctly in Excel. Load the Excel file as a tibble in R and separate the columns correctly. Hint: use the separate_wider_delim function from tidyverse. Consult the documentation for this function with ?separate_wider_delim. At the bottom of the help page you will find examples you can run. Take a close look at the first example, which should give you a clue on how to solve this problem.
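To illustrate how separate_wider_delim works, here is a minimal sketch on made-up data rather than on the exercise file. The tibble `raw`, its column `all_in_one`, the semicolon delimiter, and the new column names are all assumptions for the example; check the actual file to see which delimiter it uses.

```r
library(tidyverse)

# Hypothetical data: each record sits in a single column,
# with the fields separated by semicolons (an assumption).
raw <- tibble(all_in_one = c("A;1;x", "B;2;y", "C;3;z"))

# Split the single column into three named columns.
separated <- raw %>%
  separate_wider_delim(
    cols  = all_in_one,
    delim = ";",
    names = c("id", "value", "label")
  )

separated
```

Note that the new columns are all of type character. A column holding numbers still needs a conversion (for example with as.numeric) before you can calculate with it.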

#Your code here

Exercise 2

In this file, the cells contain the units along with the values. This makes calculations impossible, because the values are stored as text strings. Remove the units so that calculations become possible.
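One possible approach, sketched on made-up data, is parse_number from readr (loaded with tidyverse), which extracts the first number from a string and drops the surrounding text. The tibble `measurements` and the unit "kg" are assumptions for the example and are not taken from the exercise file.

```r
library(tidyverse)

# Hypothetical data: the values are stored as text because
# the unit is written in the same cell.
measurements <- tibble(
  sample = c("s1", "s2", "s3"),
  weight = c("12.5 kg", "8 kg", "10.25 kg")
)

# parse_number() keeps the numeric part and returns a double.
cleaned <- measurements %>%
  mutate(weight_kg = parse_number(weight))

# Calculations now work on the new column.
mean(cleaned$weight_kg)
```

Putting the unit in the column name (here `weight_kg`) keeps the information that was removed from the cells.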

#Your code here

Exercise 3

This data set contains rows with duplicate data. Load the data and remove the duplicates from the data table.
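As a small illustration on made-up data, the distinct function from dplyr keeps one copy of every unique row. The tibble `observations` below is hypothetical and only serves to show the idea.

```r
library(tidyverse)

# Hypothetical data: rows 3 and 5 repeat earlier rows.
observations <- tibble(
  id    = c(1, 2, 2, 3, 1),
  value = c("a", "b", "b", "c", "a")
)

# Number of rows that repeat an earlier row.
n_duplicates <- sum(duplicated(observations))

# Keep only the first occurrence of each unique row.
deduplicated <- observations %>%
  distinct()

deduplicated
```

Called without column names, distinct compares complete rows, so two rows only count as duplicates when all their values match.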

#Your code here

Exercise 4

This data set contains empty cells. Make the empty elements explicit in R by converting them to NA. Count how many elements there are:

  • in total
  • with missing data
  • without missing data
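The sketch below shows one way to do this on made-up data. It assumes that "elements" means individual cells and that the empty cells arrived in R as empty strings; the tibble `survey` is hypothetical. Depending on how you import the file, the empty cells may already be NA, in which case only the counting part is needed.

```r
library(tidyverse)

# Hypothetical data with empty strings in the text columns.
survey <- tibble(
  name = c("Ann", "", "Cid"),
  city = c("Ghent", "Leuven", ""),
  age  = c(31, NA, 45)
)

# Turn empty strings into NA in every character column.
explicit <- survey %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Count the cells: in total, missing, and not missing.
n_total    <- nrow(explicit) * ncol(explicit)
n_missing  <- sum(is.na(explicit))
n_complete <- sum(!is.na(explicit))

c(total = n_total, missing = n_missing, complete = n_complete)
```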
#Your code here

Exercise 5

This data set also contains missing data in many fields. However, instead of leaving the cells empty, the author used a specific character to mark missing values. Open the file in a text editor to find out which character was used and replace it with NA.
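A minimal sketch of the replacement step on made-up data is given below. The placeholder "?" and the tibble `readings` are assumptions for the example; the character actually used in the file is something you have to find yourself. Because of the placeholder, every column is assumed to have been read as text.

```r
library(tidyverse)

# Hypothetical data: "?" marks a missing value (an assumption),
# so the numeric column was read as character.
readings <- tibble(
  site = c("north", "?", "south"),
  temp = c("21.5", "?", "19.0")
)

# Replace the placeholder with NA, then let readr guess
# the column types again.
with_na <- readings %>%
  mutate(across(where(is.character), ~ na_if(.x, "?"))) %>%
  type_convert()

with_na
```

The same result can be obtained at import time: read_csv and related readr functions have an `na` argument that lists the strings to treat as missing.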

#Your code here



This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.