Lesson 11-13: Data analysis
Once the data has been read in and cleaned up, it is time to start analyzing and presenting it. In these lessons, we look at the analysis part: how to sort, filter and rearrange the data in a data frame (or tibble), how to inspect the properties of specific variables, and how to do some commonly used calculations on the data.
First, let’s load a data set that has already been cleaned up. As in
previous lessons, we start by setting up the formatting of the tibbles
we create in this part, using the tidyverse and
kableExtra libraries.
library(tidyverse)
library(kableExtra)
library(knitr)
library(pillar)
formatted_table <- function(df) {
  # Show each column's type (as abbreviated by pillar) below its name.
  col_types <- sapply(df, pillar::type_sum)
  new_col_names <- paste0(names(df), "<br>", "<span style='font-weight: normal;'>", col_types, "</span>")
  kbl(df, col.names = new_col_names, escape = FALSE, format = "html") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}

Download the file chronic_kidney_disease.csv and check in a text editor which delimiter the file uses. Then read the file into R.
# Read the data on chronic kidney disease.
kidney_data <- read_csv("./files_10_data_analysis_exercises/add_exercises/chronic_kidney_disease.csv")
# Replace any missing data (marked with "?") with NA values.
# Hint: check which columns are of character type but contain numbers.
tibble1 <- tibble(kidney_data) %>%
  replace(. == "?", NA) %>%
  mutate(`White blood cell count (cells/µl)` = as.numeric(`White blood cell count (cells/µl)`)) %>%
  mutate(`Red blood cell count (millions/µl)` = as.numeric(`Red blood cell count (millions/µl)`))
formatted_table(head(tibble1))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 157 | 62 | 70 | 1.025 | 3 | 0 | normal | abnormal | notpresent | notpresent | 122 | 42.0 | 1.7 | 136 | 4.7 | 12.6 | 39 | 7900 | 3.9 | yes | yes | no | good | no | no | ckd |
| 109 | 54 | 70 | NA | NA | NA | NA | NA | notpresent | notpresent | 233 | 50.1 | 1.9 | NA | NA | 11.7 | NA | NA | NA | no | yes | no | good | no | no | ckd |
| 17 | 47 | 80 | NA | NA | NA | NA | NA | notpresent | notpresent | 114 | 87.0 | 5.2 | 139 | 3.7 | 12.1 | NA | NA | NA | yes | no | no | poor | no | no | possibleckd |
| 347 | 43 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 108 | 25.0 | 1.0 | 144 | 5.0 | 17.8 | 43 | 7200 | 5.5 | no | no | no | good | no | no | notckd |
| 24 | 42 | 100 | 1.015 | 4 | 0 | normal | abnormal | notpresent | present | NA | 50.0 | 1.4 | 129 | 4.0 | 11.1 | 39 | 8300 | 4.6 | yes | no | no | poor | no | no | possibleckd |
| 175 | 60 | 50 | 1.010 | 0 | 0 | NA | normal | notpresent | notpresent | 261 | 58.0 | 2.2 | 113 | 3.0 | NA | NA | 4200 | 3.4 | yes | no | no | good | no | no | ckd |
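As an alternative to replacing the "?" markers after reading, readr can treat them as missing values while the file is parsed. The sketch below uses a made-up three-row CSV string (the column names `wbc` and `rbc` are hypothetical stand-ins, not the real column names) and assumes a readr version that accepts literal data wrapped in `I()`.

```r
library(readr)

# Made-up CSV text standing in for the real file; "?" marks a missing value.
csv_text <- "patient_id,wbc,rbc
1,7900,3.9
2,?,?
3,7200,5.5"

# I() tells read_csv() to treat the string as literal data instead of a path.
# Listing "?" in `na` turns those cells into NA during parsing, so the
# columns are read as numeric right away and need no as.numeric() step.
toy <- read_csv(I(csv_text), na = c("", "NA", "?"), show_col_types = FALSE)

toy
```

With this approach the columns that contain "?" never become character columns in the first place.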
Select in data frames using base R
Let’s start by selecting specific data from the data frame. With square brackets we select by index number: the row number comes first, then the column number. A column can also be selected by its name, using the $ sign.
## # A tibble: 1 × 1
## `Age (years)`
## <dbl>
## 1 60
## # A tibble: 1 × 26
## patient_id `Age (years)` `Blood pressure (mm/Hg)` `Specific gravity` Albumine
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 74 56 90 1.01 2
## # ℹ 21 more variables: Sugar <dbl>, `Red blood cells` <chr>,
## # `Pus in cells` <chr>, `Pus cell clumps` <chr>, Bacteria <chr>,
## # `[Glucose] (mg/dl)` <dbl>, `[Blood urea] (mg/dl)` <dbl>,
## # `[Creatine] (mg/dl)` <dbl>, `[Na] (mEq/L)` <dbl>, `[K] (mEq/L)` <dbl>,
## # `Hemoglobine (mg)` <dbl>, `Packed cell volume` <dbl>,
## # `White blood cell count (cells/µl)` <dbl>,
## # `Red blood cell count (millions/µl)` <dbl>, Hypertension <chr>, …
## # A tibble: 280 × 1
## `Blood pressure (mm/Hg)`
## <dbl>
## 1 70
## 2 70
## 3 80
## 4 60
## 5 100
## 6 50
## 7 80
## 8 70
## 9 70
## 10 100
## # ℹ 270 more rows
## [1] 70 70 80 60 100 50 80 70 70 100 60 90 80 80 70 70 50 80
## [19] 90 60 70 70 80 70 100 80 100 80 80 80 70 80 80 60 60 110
## [37] 70 90 NA 60 80 80 70 70 60 70 70 80 60 70 70 70 80 80
## [55] 80 60 60 100 70 70 80 90 70 80 80 100 80 90 70 90 90 100
## [73] 60 80 60 70 70 60 80 70 70 90 70 120 70 70 80 70 80 NA
## [91] 60 60 90 60 60 70 80 60 60 70 80 NA 80 90 90 80 80 80
## [109] 80 90 80 100 70 80 90 70 60 60 60 70 80 60 90 60 70 70
## [127] 80 100 90 70 100 90 60 70 80 60 90 70 70 50 80 70 90 70
## [145] 70 90 70 80 70 90 100 70 60 80 90 70 60 70 60 70 80 80
## [163] 60 70 100 70 80 70 140 NA 80 80 90 80 90 70 70 70 60 80
## [181] 70 70 70 NA 60 80 90 100 80 60 80 90 90 80 80 70 70 60
## [199] 80 70 70 70 80 80 80 80 70 80 70 NA 80 100 60 60 80 80
## [217] 80 50 70 80 70 70 80 70 70 90 60 80 70 70 70 110 80 60
## [235] 70 60 100 80 80 60 90 80 60 70 60 80 NA 70 80 70 60 70
## [253] 80 90 80 60 60 70 NA 60 60 80 70 90 90 60 180 60 100 80
## [271] 80 60 80 80 NA 60 90 80 80 60
Notice the difference between selecting a column using the index number of the column and using the name of the column. The first command returns a tibble, the second command returns a vector.
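The difference between the two ways of selecting can be tried out on a small example. This is a minimal sketch on a hypothetical three-row tibble with made-up values and shortened column names (`age`, `bp`), not the kidney data itself.

```r
library(tibble)

# Hypothetical mini data set (values are made up for illustration).
toy <- tibble(
  patient_id = c(1, 2, 3),
  age = c(62, 54, 47),
  bp = c(70, 70, 80)
)

toy[2, 2]   # row 2, column 2: a 1 x 1 tibble
toy[2, ]    # all columns of row 2: a 1 x 3 tibble
toy[, 3]    # column 3 by index: still a tibble (3 x 1)
toy$bp      # column by name with $: a plain numeric vector
toy[[3]]    # double square brackets also return the vector
```

Functions such as mean() or max() expect a vector, so `$` or `[[ ]]` is usually the form to pass to them.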
Slicing specific rows using tidyverse
With tidyverse you can take slices (selected rows) from
the data frame with the slice() function.
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 157 | 62 | 70 | 1.025 | 3 | 0 | normal | abnormal | notpresent | notpresent | 122 | 42 | 1.7 | 136 | 4.7 | 12.6 | 39 | 7900 | 3.9 | yes | yes | no | good | no | no | ckd |
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 333 | 23 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 99 | 46 | 1.2 | 142 | 4.0 | 17.7 | 46 | 4300 | 5.5 | no | no | no | good | no | no | notckd |
| 275 | 52 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 125 | 22 | 1.2 | 139 | 4.6 | 16.5 | 43 | 4700 | 4.6 | no | no | no | good | no | no | notckd |
| 150 | 8 | 60 | 1.025 | 3 | 0 | normal | normal | notpresent | notpresent | 78 | 27 | 0.9 | NA | NA | 12.3 | 41 | 6700 | NA | no | no | no | poor | yes | no | ckd |
| 10 | 50 | 60 | 1.010 | 2 | 4 | NA | abnormal | present | notpresent | 490 | 55 | 4.0 | NA | NA | 9.4 | 28 | NA | NA | yes | yes | no | good | no | yes | ckd |
| 192 | 46 | 110 | 1.015 | 0 | 0 | NA | normal | notpresent | notpresent | 130 | 16 | 0.9 | NA | NA | NA | NA | NA | NA | no | no | no | good | no | no | ckd |
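Tables like the ones above can be produced with slice() and row positions. The following is a minimal sketch on a hypothetical six-row tibble; the values and the short column name `age` are illustrative only.

```r
library(dplyr)

# Hypothetical mini data set (values are for illustration only).
toy <- tibble(
  patient_id = c(157, 109, 17, 347, 24, 175),
  age = c(62, 54, 47, 43, 42, 60)
)

first_row  <- slice(toy, 1)         # only the first row
rows_2_4   <- slice(toy, 2:4)       # a consecutive range of rows
picked     <- slice(toy, c(1, 5))   # any set of row positions
all_but_1  <- slice(toy, -1)        # negative positions drop rows

first_row
rows_2_4
```

Note that slice() works purely on row positions, so the result changes if the tibble is sorted or filtered first.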
With slice_head() and slice_tail() you can
get the top or bottom rows, respectively.
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 157 | 62 | 70 | 1.025 | 3 | 0 | normal | abnormal | notpresent | notpresent | 122 | 42.0 | 1.7 | 136 | 4.7 | 12.6 | 39 | 7900 | 3.9 | yes | yes | no | good | no | no | ckd |
| 109 | 54 | 70 | NA | NA | NA | NA | NA | notpresent | notpresent | 233 | 50.1 | 1.9 | NA | NA | 11.7 | NA | NA | NA | no | yes | no | good | no | no | ckd |
| 17 | 47 | 80 | NA | NA | NA | NA | NA | notpresent | notpresent | 114 | 87.0 | 5.2 | 139 | 3.7 | 12.1 | NA | NA | NA | yes | no | no | poor | no | no | possibleckd |
| 347 | 43 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 108 | 25.0 | 1.0 | 144 | 5.0 | 17.8 | 43 | 7200 | 5.5 | no | no | no | good | no | no | notckd |
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 106 | 50 | 90 | NA | NA | NA | NA | NA | notpresent | notpresent | 89 | 118 | 6.1 | 127 | 4.4 | 6.0 | 17 | 6500 | NA | yes | yes | no | good | yes | yes | ckd |
| 270 | 23 | 80 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 111 | 34 | 1.1 | 145 | 4.0 | 14.3 | 41 | 7200 | 5.0 | no | no | no | good | no | no | notckd |
| 348 | 38 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 99 | 19 | 0.5 | 147 | 3.5 | 13.6 | 44 | 7300 | 6.4 | no | no | no | good | no | no | notckd |
| 102 | 17 | 60 | 1.010 | 0 | 0 | NA | normal | notpresent | notpresent | 92 | 32 | 2.1 | 141 | 4.2 | 13.9 | 52 | 7000 | NA | no | no | no | good | no | no | possibleckd |
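The number of rows returned by slice_head() and slice_tail() is set with `n`, or as a fraction of all rows with `prop`. A minimal sketch, again on a hypothetical six-row tibble with illustrative values:

```r
library(dplyr)

# Hypothetical mini data set (values are for illustration only).
toy <- tibble(
  patient_id = c(157, 109, 17, 347, 24, 175),
  age = c(62, 54, 47, 43, 42, 60)
)

top_two    <- slice_head(toy, n = 2)       # first two rows
bottom_two <- slice_tail(toy, n = 2)       # last two rows
top_half   <- slice_head(toy, prop = 0.5)  # first 50% of the rows

top_two
bottom_two
```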
With slice_max() and slice_min() it is
possible to get the row with the maximum or minimum value, respectively,
for a specific column.
# Get the rows with the maximum and minimum values for hemoglobin levels.
formatted_table(slice_max(tibble1, order_by = `Hemoglobine (mg)`, n = 1))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 347 | 43 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 108 | 25 | 1.0 | 144 | 5 | 17.8 | 43 | 7200 | 5.5 | no | no | no | good | no | no | notckd |
| 363 | 67 | 80 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 99 | 40 | 0.5 | NA | NA | 17.8 | 44 | 5900 | 5.2 | no | no | no | good | no | no | notckd |
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 249 | 56 | 90 | 1.01 | 4 | 1 | normal | abnormal | present | notpresent | 176 | 309.0 | 13.3 | 124 | 6.5 | 3.1 | 9 | 5400 | 2.1 | yes | yes | no | poor | yes | yes | ckd |
| 14 | 68 | 80 | 1.01 | 3 | 2 | normal | abnormal | present | present | 157 | 90.0 | 4.1 | 130 | 6.4 | 5.6 | 16 | 11000 | 2.6 | yes | yes | yes | poor | yes | no | ckd |
| 195 | 70 | 90 | 1.02 | 2 | 1 | abnormal | abnormal | notpresent | present | 184 | 98.6 | 3.3 | 138 | 3.9 | 5.8 | NA | NA | NA | yes | yes | yes | poor | no | no | ckd |
The argument n sets how many of the top (or bottom) rows are returned. If
several rows share the same maximum (or minimum) value, R returns all of
the tied rows, even if you set n = 1.
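If exactly n rows are needed, ties can be switched off with the `with_ties` argument of slice_max() and slice_min(). A minimal sketch on a hypothetical tibble in which two made-up patients share the highest value (the column name `hemoglobin` is illustrative):

```r
library(dplyr)

# Made-up values; two patients share the highest hemoglobin value.
toy <- tibble(
  patient_id = c(1, 2, 3, 4),
  hemoglobin = c(12.6, 17.8, 11.7, 17.8)
)

# Default behaviour: ties are kept, so n = 1 can return more than one row.
top_with_ties <- slice_max(toy, order_by = hemoglobin, n = 1)

# with_ties = FALSE returns exactly n rows; only one of the tied rows is kept.
top_single <- slice_max(toy, order_by = hemoglobin, n = 1, with_ties = FALSE)

nrow(top_with_ties)  # 2
nrow(top_single)     # 1
```

Whether dropping ties is appropriate depends on the question: for a "who has the highest value" report, keeping all tied rows is usually the safer choice.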
Filter and sort in R
Like in Excel, it is possible to filter and sort on the data in the data frame.
# Select data that only contains the data for patients that have a blood pressure that is higher than 70 (mm/Hg). Assign the data to 'tibble2'.
tibble2 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70)
formatted_table(head(tibble2))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 47 | 80 | NA | NA | NA | NA | NA | notpresent | notpresent | 114 | 87 | 5.2 | 139 | 3.7 | 12.1 | NA | NA | NA | yes | no | no | poor | no | no | possibleckd |
| 24 | 42 | 100 | 1.015 | 4 | 0 | normal | abnormal | notpresent | present | NA | 50 | 1.4 | 129 | 4.0 | 11.1 | 39 | 8300 | 4.6 | yes | no | no | poor | no | no | possibleckd |
| 351 | 29 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 83 | 49 | 0.9 | 139 | 3.3 | 17.5 | 40 | 9900 | 4.7 | no | no | no | good | no | no | notckd |
| 245 | 48 | 100 | NA | NA | NA | NA | NA | notpresent | notpresent | 103 | 79 | 5.3 | 135 | 6.3 | 6.3 | 19 | 7200 | 2.6 | yes | no | yes | poor | no | no | ckd |
| 145 | 57 | 90 | 1.015 | 5 | 0 | abnormal | abnormal | notpresent | present | NA | 322 | 13.0 | 126 | 4.8 | 8.0 | 24 | 4200 | 3.3 | yes | yes | yes | poor | yes | yes | ckd |
| 258 | 42 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 98 | 20 | 0.5 | 140 | 3.5 | 13.9 | 44 | 8400 | 5.5 | no | no | no | good | no | no | notckd |
# Select data that contains the same condition as above (Blood pressure > 70) AND age of the patient < 50 years. Use the '&' operator to combine conditions. Store the data in 'tibble3'.
tibble3 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70 &
`Age (years)` < 50)
formatted_table(head(tibble3))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 47 | 80 | NA | NA | NA | NA | NA | notpresent | notpresent | 114 | 87 | 5.2 | 139 | 3.7 | 12.1 | NA | NA | NA | yes | no | no | poor | no | no | possibleckd |
| 24 | 42 | 100 | 1.015 | 4 | 0 | normal | abnormal | notpresent | present | NA | 50 | 1.4 | 129 | 4.0 | 11.1 | 39 | 8300 | 4.6 | yes | no | no | poor | no | no | possibleckd |
| 351 | 29 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 83 | 49 | 0.9 | 139 | 3.3 | 17.5 | 40 | 9900 | 4.7 | no | no | no | good | no | no | notckd |
| 245 | 48 | 100 | NA | NA | NA | NA | NA | notpresent | notpresent | 103 | 79 | 5.3 | 135 | 6.3 | 6.3 | 19 | 7200 | 2.6 | yes | no | yes | poor | no | no | ckd |
| 258 | 42 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 98 | 20 | 0.5 | 140 | 3.5 | 13.9 | 44 | 8400 | 5.5 | no | no | no | good | no | no | notckd |
| 218 | 33 | 90 | 1.015 | 0 | 0 | NA | normal | notpresent | notpresent | 92 | 19 | 0.8 | NA | NA | 11.8 | 34 | 7000 | NA | no | no | no | good | no | no | ckd |
# Select data that meets either of the conditions mentioned: Blood pressure > 70 mm/Hg OR age < 50 years. Use the '|' operator to accept either condition. Store the data in 'tibble4'.
tibble4 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70 |
`Age (years)` < 50)
formatted_table(head(tibble4))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 47 | 80 | NA | NA | NA | NA | NA | notpresent | notpresent | 114 | 87 | 5.2 | 139 | 3.7 | 12.1 | NA | NA | NA | yes | no | no | poor | no | no | possibleckd |
| 347 | 43 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 108 | 25 | 1.0 | 144 | 5.0 | 17.8 | 43 | 7200 | 5.5 | no | no | no | good | no | no | notckd |
| 24 | 42 | 100 | 1.015 | 4 | 0 | normal | abnormal | notpresent | present | NA | 50 | 1.4 | 129 | 4.0 | 11.1 | 39 | 8300 | 4.6 | yes | no | no | poor | no | no | possibleckd |
| 351 | 29 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 83 | 49 | 0.9 | 139 | 3.3 | 17.5 | 40 | 9900 | 4.7 | no | no | no | good | no | no | notckd |
| 332 | 34 | 70 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | NA | 33 | 1.0 | 150 | 5.0 | 15.3 | 44 | 10500 | 6.1 | no | no | no | good | no | no | notckd |
| 167 | 34 | 70 | 1.020 | 0 | 0 | abnormal | normal | notpresent | notpresent | 139 | 19 | 0.9 | NA | NA | 12.7 | 42 | 2200 | NA | no | no | no | poor | no | no | possibleckd |
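The effect of '&' versus '|' is easiest to see on a few rows. The sketch below uses a hypothetical four-row tibble with made-up values and short column names (`bp`, `age`); it also shows that separating conditions with a comma inside filter() behaves like '&'.

```r
library(dplyr)

# Hypothetical mini data set (values are made up for illustration).
toy <- tibble(
  patient_id = 1:4,
  bp = c(80, 60, 100, 70),
  age = c(47, 43, 62, 54)
)

# AND: both conditions must hold.
both <- filter(toy, bp > 70 & age < 50)

# A comma between conditions is equivalent to '&'.
both_comma <- filter(toy, bp > 70, age < 50)

# OR: at least one of the conditions must hold.
either <- filter(toy, bp > 70 | age < 50)

both$patient_id    # 1
either$patient_id  # 1 2 3
```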
We have already seen the negation operator ! used to exclude certain data. Here it appears as part of the "not equal to" comparison !=.
# Select data of patients that have a blood pressure higher than 70 mm/Hg AND leave out the data of patients whose appetite has been scored as "poor". Store the data in 'tibble5'.
tibble5 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70 & Appetite != "poor")
formatted_table(head(tibble5))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 351 | 29 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 83 | 49 | 0.9 | 139 | 3.3 | 17.5 | 40 | 9900 | 4.7 | no | no | no | good | no | no | notckd |
| 258 | 42 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 98 | 20 | 0.5 | 140 | 3.5 | 13.9 | 44 | 8400 | 5.5 | no | no | no | good | no | no | notckd |
| 177 | 65 | 80 | 1.015 | 2 | 1 | normal | normal | present | notpresent | 215 | 133 | 2.5 | NA | NA | 13.2 | 41 | NA | NA | no | yes | no | good | no | no | ckd |
| 265 | 50 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 97 | 40 | 0.6 | 150 | 4.5 | 14.2 | 48 | 10500 | 5.0 | no | no | no | good | no | no | notckd |
| 218 | 33 | 90 | 1.015 | 0 | 0 | NA | normal | notpresent | notpresent | 92 | 19 | 0.8 | NA | NA | 11.8 | 34 | 7000 | NA | no | no | no | good | no | no | ckd |
| 291 | 47 | 80 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 124 | 44 | 1.0 | 140 | 4.9 | 14.9 | 41 | 7000 | 5.7 | no | no | no | good | no | no | notckd |
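One thing to keep in mind with filters like these is that filter() only keeps rows for which the condition is TRUE, so rows where the comparison evaluates to NA are dropped as well. A minimal sketch on a hypothetical tibble with made-up values, including one way to keep rows with a missing appetite score if that is what is wanted:

```r
library(dplyr)

# Hypothetical mini data set with some missing values (made up).
toy <- tibble(
  patient_id = 1:5,
  bp = c(80, 60, NA, 100, 90),
  appetite = c("good", "poor", "good", NA, "poor")
)

# Patient 3 has no blood pressure and patient 4 has no appetite score;
# both comparisons give NA for them, so both rows are dropped.
strict <- filter(toy, bp > 70 & appetite != "poor")

# Explicitly allow a missing appetite score with is.na().
lenient <- filter(toy, bp > 70 & (appetite != "poor" | is.na(appetite)))

strict$patient_id   # 1
lenient$patient_id  # 1 4
```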
Sorting on a specific column is done with the arrange()
function. Unless you specify otherwise, the column is sorted in
ascending order.
# Sort on creatine levels with the lowest value on top. Store the new tibble in tibble6.
tibble6 <- arrange(tibble1, `[Creatine] (mg/dl)`)
formatted_table(head(tibble6))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 354 | 32 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 102 | 17 | 0.4 | 147 | 4.7 | 14.6 | 41 | 6800 | 5.1 | no | no | no | good | no | no | notckd |
| 258 | 42 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 98 | 20 | 0.5 | 140 | 3.5 | 13.9 | 44 | 8400 | 5.5 | no | no | no | good | no | no | notckd |
| 307 | 47 | 60 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 137 | 17 | 0.5 | 150 | 3.5 | 13.6 | 44 | 7900 | 4.5 | no | no | no | good | no | no | notckd |
| 363 | 67 | 80 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 99 | 40 | 0.5 | NA | NA | 17.8 | 44 | 5900 | 5.2 | no | no | no | good | no | no | notckd |
| 316 | 35 | NA | 1.020 | 0 | 0 | normal | normal | NA | NA | 99 | 30 | 0.5 | 135 | 4.9 | 15.4 | 48 | 5000 | 5.2 | no | no | no | good | no | no | notckd |
| 375 | 70 | 80 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 74 | 41 | 0.5 | 143 | 4.5 | 15.1 | 48 | 9700 | 5.6 | no | no | no | good | no | no | notckd |
# Sort on creatine levels with the highest value on top. Store the new tibble in tibble7.
tibble7 <- arrange(tibble1, desc(`[Creatine] (mg/dl)`))
formatted_table(head(tibble7))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | 60 | 90 | NA | NA | NA | NA | NA | notpresent | notpresent | NA | 180 | 76.0 | 4.5 | NA | 10.9 | 32 | 6200 | 3.6 | yes | yes | yes | good | no | no | ckd |
| 61 | 67 | 80 | 1.010 | 1 | 3 | normal | abnormal | notpresent | notpresent | 182 | 391 | 32.0 | 163.0 | 39.0 | NA | NA | NA | NA | no | no | no | good | yes | no | ckd |
| 6 | 68 | 70 | 1.010 | 0 | 0 | NA | normal | notpresent | notpresent | 100 | 54 | 24.0 | 104.0 | 4.0 | 12.4 | 36 | NA | NA | no | no | no | good | no | no | ckd |
| 143 | 41 | 80 | 1.015 | 1 | 4 | abnormal | normal | notpresent | notpresent | 210 | 165 | 18.0 | 135.0 | 4.7 | NA | NA | NA | NA | no | yes | no | good | no | no | possibleckd |
| 134 | 47 | 100 | 1.010 | NA | NA | normal | NA | notpresent | notpresent | 122 | NA | 16.9 | 138.0 | 5.2 | 10.8 | 33 | 10200 | 3.8 | no | yes | no | good | no | no | ckd |
| 154 | 56 | 90 | 1.005 | 4 | 3 | abnormal | abnormal | notpresent | notpresent | 242 | 132 | 16.4 | 140.0 | 4.2 | 8.4 | 26 | NA | 3.0 | yes | yes | no | poor | yes | yes | ckd |
You can easily verify that the lowest or highest value is on top by
comparing the result of the min() and max() functions with the first
row of the sorted table.
# Check the minimum of [Creatine] in tibble6.
# Check the maximum of [Creatine] in tibble7.
min(tibble6$`[Creatine] (mg/dl)`, na.rm = TRUE)
## [1] 0.4
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 354 | 32 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 102 | 17 | 0.4 | 147 | 4.7 | 14.6 | 41 | 6800 | 5.1 | no | no | no | good | no | no | notckd |
max(tibble7$`[Creatine] (mg/dl)`, na.rm = TRUE)
## [1] 76
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | 60 | 90 | NA | NA | NA | NA | NA | notpresent | notpresent | NA | 180 | 76 | 4.5 | NA | 10.9 | 32 | 6200 | 3.6 | yes | yes | yes | good | no | no | ckd |
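The same check can be written as a comparison between the first value of the sorted column and the result of min() or max(). The sketch below uses a hypothetical tibble with made-up values and a shortened column name (`creatinine`); it also shows that arrange() places missing values at the end in both sort directions, which is why the first row is a safe place to look.

```r
library(dplyr)

# Hypothetical mini data set with one missing value (made up).
toy <- tibble(
  patient_id = 1:4,
  creatinine = c(1.7, NA, 0.4, 5.2)
)

ascending  <- arrange(toy, creatinine)
descending <- arrange(toy, desc(creatinine))

# The first value of each sorted column matches min() and max().
ascending$creatinine[1] == min(toy$creatinine, na.rm = TRUE)   # TRUE
descending$creatinine[1] == max(toy$creatinine, na.rm = TRUE)  # TRUE

# The NA ends up in the last row for both sort directions.
tail(ascending, 1)
tail(descending, 1)
```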
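To sort on multiple levels, pass additional columns to arrange(). Rows are sorted on the first column, and rows with equal values in that column are then sorted on the next one. Each column can have its own direction, so you can sort one column from low to high and the next from high to low.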
# Sort on Blood pressure and then on creatine level, both from low to high.
tibble8 <- arrange(tibble1, `Blood pressure (mm/Hg)`, `[Creatine] (mg/dl)`)
formatted_table(head(tibble8))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7 | 50 | 1.020 | 4 | 0 | NA | normal | notpresent | notpresent | NA | 18 | 0.8 | NA | NA | 11.3 | 38 | 6000 | NA | no | no | no | good | no | no | ckd |
| 186 | 8 | 50 | 1.020 | 4 | 0 | normal | normal | notpresent | notpresent | NA | 46 | 1.0 | 135 | 3.8 | NA | NA | NA | NA | no | no | no | good | yes | no | ckd |
| 175 | 60 | 50 | 1.010 | 0 | 0 | NA | normal | notpresent | notpresent | 261 | 58 | 2.2 | 113 | 3.0 | NA | NA | 4200 | 3.4 | yes | no | no | good | no | no | ckd |
| 229 | 59 | 50 | 1.010 | 3 | 0 | normal | abnormal | notpresent | notpresent | 241 | 191 | 12.0 | 114 | 2.9 | 9.6 | 31 | 15700 | 3.8 | no | yes | no | good | yes | no | ckd |
| 354 | 32 | 60 | 1.025 | 0 | 0 | normal | normal | notpresent | notpresent | 102 | 17 | 0.4 | 147 | 4.7 | 14.6 | 41 | 6800 | 5.1 | no | no | no | good | no | no | notckd |
| 307 | 47 | 60 | 1.020 | 0 | 0 | normal | normal | notpresent | notpresent | 137 | 17 | 0.5 | 150 | 3.5 | 13.6 | 44 | 7900 | 4.5 | no | no | no | good | no | no | notckd |
# Sort on Blood pressure from low to high and then on creatine level from high to low.
tibble9 <- arrange(tibble1, `Blood pressure (mm/Hg)`, desc(`[Creatine] (mg/dl)`))
formatted_table(head(tibble9))
| patient_id dbl | Age (years) dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Albumine dbl | Sugar dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 229 | 59 | 50 | 1.010 | 3 | 0 | normal | abnormal | notpresent | notpresent | 241 | 191 | 12.0 | 114 | 2.9 | 9.6 | 31 | 15700 | 3.8 | no | yes | no | good | yes | no | ckd |
| 175 | 60 | 50 | 1.010 | 0 | 0 | NA | normal | notpresent | notpresent | 261 | 58 | 2.2 | 113 | 3.0 | NA | NA | 4200 | 3.4 | yes | no | no | good | no | no | ckd |
| 186 | 8 | 50 | 1.020 | 4 | 0 | normal | normal | notpresent | notpresent | NA | 46 | 1.0 | 135 | 3.8 | NA | NA | NA | NA | no | no | no | good | yes | no | ckd |
| 1 | 7 | 50 | 1.020 | 4 | 0 | NA | normal | notpresent | notpresent | NA | 18 | 0.8 | NA | NA | 11.3 | 38 | 6000 | NA | no | no | no | good | no | no | ckd |
| 127 | 71 | 60 | 1.015 | 4 | 0 | normal | normal | notpresent | notpresent | 118 | 125 | 5.3 | 136 | 4.9 | 11.4 | 35 | 15200 | 4.3 | yes | yes | no | poor | yes | no | ckd |
| 189 | 64 | 60 | 1.010 | 4 | 1 | abnormal | abnormal | notpresent | present | 239 | 58 | 4.3 | 137 | 5.4 | 9.5 | 29 | 7500 | 3.4 | yes | yes | no | poor | yes | no | ckd |
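How the second sort column only rearranges rows within groups of equal blood pressure can be traced on a handful of rows. A minimal sketch, assuming a hypothetical five-row tibble with made-up values and short column names (`bp`, `creatinine`):

```r
library(dplyr)

# Hypothetical mini data set (values are made up for illustration).
toy <- tibble(
  patient_id = c(1, 2, 3, 4, 5),
  bp = c(60, 50, 60, 50, 60),
  creatinine = c(0.5, 2.2, 5.3, 0.8, 1.0)
)

# Both columns ascending: within each bp value, lowest creatinine first.
by_bp_then_low <- arrange(toy, bp, creatinine)

# bp ascending, creatinine descending within each bp value.
by_bp_then_high <- arrange(toy, bp, desc(creatinine))

by_bp_then_low$patient_id   # 4 2 1 5 3
by_bp_then_high$patient_id  # 2 4 3 5 1
```

In both results the bp column reads 50, 50, 60, 60, 60; only the order inside each bp group differs.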
If you only want to work with some of the columns, you can pick them with the select() function.
# Select the columns for patient_id, Blood pressure, sodium and potassium levels.
tibble10 <- select(tibble1, patient_id, `Blood pressure (mm/Hg)`,
                   `[Na] (mEq/L)`, `[K] (mEq/L)`)
formatted_table(head(tibble10))
| patient_id dbl | Blood pressure (mm/Hg) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl |
|---|---|---|---|
| 157 | 70 | 136 | 4.7 |
| 109 | 70 | NA | NA |
| 17 | 80 | 139 | 3.7 |
| 347 | 60 | 144 | 5.0 |
| 24 | 100 | 129 | 4.0 |
| 175 | 50 | 113 | 3.0 |
If you want to leave out columns, use the ‘-’ sign to indicate which columns should not be present.
# Leave out the columns for Age, Albumine and Sugar.
tibble11 <- select(tibble1, -`Age (years)`, -Albumine, -Sugar)
formatted_table(head(tibble11))
| patient_id dbl | Blood pressure (mm/Hg) dbl | Specific gravity dbl | Red blood cells chr | Pus in cells chr | Pus cell clumps chr | Bacteria chr | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl | [Na] (mEq/L) dbl | [K] (mEq/L) dbl | Hemoglobine (mg) dbl | Packed cell volume dbl | White blood cell count (cells/µl) dbl | Red blood cell count (millions/µl) dbl | Hypertension chr | Diabetes mellitus chr | Coronary Artery Disease chr | Appetite chr | Pedal edema chr | Anemia chr | Classification chr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 157 | 70 | 1.025 | normal | abnormal | notpresent | notpresent | 122 | 42.0 | 1.7 | 136 | 4.7 | 12.6 | 39 | 7900 | 3.9 | yes | yes | no | good | no | no | ckd |
| 109 | 70 | NA | NA | NA | notpresent | notpresent | 233 | 50.1 | 1.9 | NA | NA | 11.7 | NA | NA | NA | no | yes | no | good | no | no | ckd |
| 17 | 80 | NA | NA | NA | notpresent | notpresent | 114 | 87.0 | 5.2 | 139 | 3.7 | 12.1 | NA | NA | NA | yes | no | no | poor | no | no | possibleckd |
| 347 | 60 | 1.025 | normal | normal | notpresent | notpresent | 108 | 25.0 | 1.0 | 144 | 5.0 | 17.8 | 43 | 7200 | 5.5 | no | no | no | good | no | no | notckd |
| 24 | 100 | 1.015 | normal | abnormal | notpresent | present | NA | 50.0 | 1.4 | 129 | 4.0 | 11.1 | 39 | 8300 | 4.6 | yes | no | no | poor | no | no | possibleckd |
| 175 | 50 | 1.010 | NA | normal | notpresent | notpresent | 261 | 58.0 | 2.2 | 113 | 3.0 | NA | NA | 4200 | 3.4 | yes | no | no | good | no | no | ckd |
If several column names share a common beginning or ending, you can
select all of them at once with the functions
starts_with() or ends_with().
# Select the columns that end with '(mg/dl)'.
# Select the columns that start with 'Pus'.
tibble12 <- select(tibble1, patient_id, ends_with("(mg/dl)"))
formatted_table(head(tibble12))
| patient_id dbl | [Glucose] (mg/dl) dbl | [Blood urea] (mg/dl) dbl | [Creatine] (mg/dl) dbl |
|---|---|---|---|
| 157 | 122 | 42.0 | 1.7 |
| 109 | 233 | 50.1 | 1.9 |
| 17 | 114 | 87.0 | 5.2 |
| 347 | 108 | 25.0 | 1.0 |
| 24 | NA | 50.0 | 1.4 |
| 175 | 261 | 58.0 | 2.2 |
| patient_id dbl | Pus in cells chr | Pus cell clumps chr |
|---|---|---|
| 157 | abnormal | notpresent |
| 109 | NA | notpresent |
| 17 | NA | notpresent |
| 347 | normal | notpresent |
| 24 | abnormal | notpresent |
| 175 | normal | notpresent |
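The call that produced the second table is not shown in this lesson. A minimal sketch of a starts_with() selection that gives the same column layout, run on a small made-up tibble (the `demo` object, its values and the name `pus_cols` are assumptions for illustration, not rows from the kidney data set):

```r
library(dplyr)

# Hypothetical toy data; the column names mimic the kidney data set.
demo <- tibble(
  patient_id = c(1, 2, 3),
  `Pus in cells` = c("normal", "abnormal", NA),
  `Pus cell clumps` = c("notpresent", "present", "notpresent"),
  `[Glucose] (mg/dl)` = c(122, 233, NA)
)

# Keep patient_id plus every column whose name starts with "Pus".
pus_cols <- select(demo, patient_id, starts_with("Pus"))
names(pus_cols)
```

Listing patient_id explicitly keeps the identifier next to the selected measurements, as in the ends_with() example.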
Because there is so much data to analyze, it might be helpful to look at a summary of all the data that is present in the data frame.
## patient_id Age (years) Blood pressure (mm/Hg) Specific gravity
## Min. : 1.0 Min. : 2.00 Min. : 50.00 Min. :1.005
## 1st Qu.:110.5 1st Qu.:42.00 1st Qu.: 70.00 1st Qu.:1.010
## Median :202.0 Median :55.00 Median : 70.00 Median :1.020
## Mean :202.9 Mean :51.45 Mean : 76.05 Mean :1.017
## 3rd Qu.:302.2 3rd Qu.:65.00 3rd Qu.: 80.00 3rd Qu.:1.020
## Max. :399.0 Max. :90.00 Max. :180.00 Max. :1.025
## NA's :5 NA's :9 NA's :36
## Albumine Sugar Red blood cells Pus in cells
## Min. :0.000 Min. :0.000 Length:280 Length:280
## 1st Qu.:0.000 1st Qu.:0.000 Class :character Class :character
## Median :0.000 Median :0.000 Mode :character Mode :character
## Mean :1.024 Mean :0.438
## 3rd Qu.:2.000 3rd Qu.:0.000
## Max. :5.000 Max. :5.000
## NA's :35 NA's :38
## Pus cell clumps Bacteria [Glucose] (mg/dl) [Blood urea] (mg/dl)
## Length:280 Length:280 Min. : 70.0 Min. : 10.00
## Class :character Class :character 1st Qu.:100.0 1st Qu.: 27.25
## Mode :character Mode :character Median :124.0 Median : 41.00
## Mean :150.2 Mean : 56.98
## 3rd Qu.:171.5 3rd Qu.: 64.75
## Max. :490.0 Max. :391.00
## NA's :33 NA's :14
## [Creatine] (mg/dl) [Na] (mEq/L) [K] (mEq/L) Hemoglobine (mg)
## Min. : 0.400 Min. : 4.5 Min. : 2.700 Min. : 3.10
## 1st Qu.: 0.900 1st Qu.:135.0 1st Qu.: 3.900 1st Qu.:10.50
## Median : 1.300 Median :138.0 Median : 4.400 Median :12.70
## Mean : 3.006 Mean :137.3 Mean : 4.754 Mean :12.53
## 3rd Qu.: 2.800 3rd Qu.:141.0 3rd Qu.: 4.900 3rd Qu.:14.90
## Max. :76.000 Max. :163.0 Max. :47.000 Max. :17.80
## NA's :12 NA's :67 NA's :68 NA's :39
## Packed cell volume White blood cell count (cells/µl)
## Min. : 9.00 Min. : 2200
## 1st Qu.:33.00 1st Qu.: 6325
## Median :41.00 Median : 7900
## Mean :39.17 Mean : 8355
## 3rd Qu.:46.00 3rd Qu.: 9800
## Max. :54.00 Max. :26400
## NA's :51 NA's :78
## Red blood cell count (millions/µl) Hypertension Diabetes mellitus
## Min. :2.100 Length:280 Length:280
## 1st Qu.:3.925 Class :character Class :character
## Median :4.800 Mode :character Mode :character
## Mean :4.706
## 3rd Qu.:5.500
## Max. :8.000
## NA's :94
## Coronary Artery Disease Appetite Pedal edema
## Length:280 Length:280 Length:280
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Anemia Classification
## Length:280 Length:280
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 50.00 70.00 70.00 76.05 80.00 180.00 9
The summary contains the most commonly used statistics (such as the
mean, minimum and maximum values) for the columns that contain numeric
data.
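The output above has the shape of base R's summary() applied first to a whole data frame and then to a single column. A minimal sketch on a small made-up data frame (the `demo` object and its values are assumptions, not part of the kidney data set):

```r
# Hypothetical toy data with one numeric and one character column.
demo <- data.frame(
  bp = c(50, 70, 70, 80, NA),
  class = c("ckd", "ckd", "notckd", "notckd", "ckd"),
  stringsAsFactors = FALSE
)

# Summary of every column: quartiles and mean for numeric columns,
# length, class and mode for character columns.
summary(demo)

# Summary of a single numeric column; NA values are counted separately.
bp_summary <- summary(demo$bp)
bp_summary
```

For the lesson data, the same two calls would take tibble1 and one of its backticked columns as input.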
Rounding numbers
You see in the previous tables that calculations are not rounded to 2
or 3 decimals. If you want to round to a specific number of decimals,
you can use the round() function. NOTE: rounding follows
the IEC 60559 standard, so a value that ends in exactly 5 is rounded to the
‘even number’ (for example, 2.5 becomes 2 and 3.5 becomes 4).
## [1] 0
## [1] 2
## [1] 2
## [1] 4
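The four outputs above match what round() returns for 0.5, 1.5, 2.5 and 3.5. The inputs themselves are not shown in the lesson, so treat them as an assumption in this short sketch:

```r
# Ties (exactly .5) go to the nearest even integer.
round(0.5)  # 0
round(1.5)  # 2
round(2.5)  # 2
round(3.5)  # 4

# The same calls, vectorised (the name `ties` is just for illustration).
ties <- round(c(0.5, 1.5, 2.5, 3.5))
ties
```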
If you want to round to a certain number of decimals, use the
argument digits =.
## [1] 2.83
## [1] 2.8346
## [1] 2.8345
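The inputs behind the three outputs above are not shown, so the sketch below uses its own value to illustrate the digits = argument (the variable `x` is an assumption for illustration):

```r
x <- 3.14159

# Round to two and to four decimals.
two_dec <- round(x, digits = 2)   # 3.14
four_dec <- round(x, digits = 4)  # 3.1416

# A negative value for digits rounds to tens, hundreds, and so on.
hundreds <- round(1234.567, digits = -2)  # 1200
```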
Statistics
R is designed for statistical analysis. A full treatment is outside the scope of this course, but we will discuss the main statistical analyses that are used in the lab.
We have encountered a couple of the statistical calculations already.
# Calculate the average of the 'White blood cell count (cells/µl)'. Round the average to 2 decimals.
# Present it nicely in a sentence.
WBC <- round(mean(tibble1$`White blood cell count (cells/µl)`, na.rm = T), digits = 2)
paste("The average White Bloodcell Count is:", WBC, "cells/µl")
## [1] "The average White Bloodcell Count is: 8354.95 cells/µl"
# What are the minimum and maximum amounts of Hemoglobine? Present without decimals.
# Present it nicely in a sentence.
hemo_min <- round(min(tibble1$`Hemoglobine (mg)`, na.rm = T))
hemo_max <- round(max(tibble1$`Hemoglobine (mg)`, na.rm = T))
paste("The minimum amount of hemoglobin is:", hemo_min, "mg")
## [1] "The minimum amount of hemoglobin is: 3 mg"
paste("The maximum amount of hemoglobin is:", hemo_max, "mg")
## [1] "The maximum amount of hemoglobin is: 18 mg"
# What is the median of the age of the patients? Present without decimals.
# Present it nicely in a sentence.
median_pat <- round(median(tibble1$`Age (years)`, na.rm = T))
paste("The median age of the patients is:", median_pat, "years")
## [1] "The median age of the patients is: 55 years"
# Present the quantiles for 25%, 50% and 75% of the glucose levels. Round to two decimals.
round(quantile(tibble1$`[Glucose] (mg/dl)`, c(0.25, 0.50, 0.75), na.rm = T), digits = 2)
##   25%   50%   75%
## 100.0 124.0 171.5
# Calculate the average, the standard deviation and the standard error of the mean for the Creatine levels.
# Round the average and the standard deviation to one decimal and the standard error to three decimals.
# Present them in a sentence.
mean_crea <- round(mean(tibble1$`[Creatine] (mg/dl)`, na.rm = T), digits = 1)
sd_crea <- round(sd(tibble1$`[Creatine] (mg/dl)`, na.rm = T), digits = 1)
# Note: length() also counts NA entries, so n here is the full column length.
se_crea <- round(sd(tibble1$`[Creatine] (mg/dl)`,
na.rm = T)/sqrt(length(tibble1$`[Creatine] (mg/dl)`)),
digits = 3)
paste("The average Creatine level is:", mean_crea, "mg/dl")
## [1] "The average Creatine level is: 3 mg/dl"
paste("The standard deviation of the Creatine level is:", sd_crea)
## [1] "The standard deviation of the Creatine level is: 5.9"
paste("The standard error of the mean for the Creatine levels is:", se_crea)
## [1] "The standard error of the mean for the Creatine levels is: 0.35"
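The standard error above divides by sqrt(length(...)), and length() counts NA entries as well, while sd(..., na.rm = T) ignores them. A minimal sketch of an alternative that uses the number of non-missing values for n (the helper name `sem` and the `values` vector are hypothetical, not part of the lesson):

```r
# Hypothetical helper: standard error of the mean, ignoring NA values.
sem <- function(x) {
  x <- x[!is.na(x)]          # keep only the observed values
  sd(x) / sqrt(length(x))    # n is now the number of non-missing values
}

# Made-up example vector with one missing value.
values <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
sem(values)
```

On a column with many missing values this gives a somewhat larger standard error than dividing by the full column length.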
Summary of data
With the tidyverse, summarize() returns these statistics in a tibble,
and combining it with group_by() lets you calculate and show
these statistics by category.
# Summarize the Blood pressure and present it in a tibble.
bp1 <- summarize(tibble1, `Blood pressure (mm/Hg)` = mean(`Blood pressure (mm/Hg)`,
na.rm = T))
formatted_table(bp1)
| Blood pressure (mm/Hg) dbl |
|---|
| 76.05166 |
# Calculate the average Blood pressure for each category in 'Classification', such as 'ckd' (chronic kidney disease) and 'notckd' (not chronic kidney disease).
by_class <- group_by(tibble1, Classification)
bp2 <- summarize(by_class, `Blood pressure (mm/Hg)` = mean(`Blood pressure (mm/Hg)`,
na.rm = T))
formatted_table(bp2)
| Classification chr | Blood pressure (mm/Hg) dbl |
|---|---|
| ckd | 78.19444 |
| notckd | 71.05769 |
| possibleckd | 85.21739 |
# Add other variables for which you can calculate the mean.
bp3 <- summarize(by_class, `Blood pressure (mm/Hg)` = mean(`Blood pressure (mm/Hg)`,
na.rm = T),
`Age (years)` = mean(`Age (years)`, na.rm = T),
`[Glucose] (mg/dl)` = mean(`[Glucose] (mg/dl)`, na.rm = T))
formatted_table(bp3)
| Classification chr | Blood pressure (mm/Hg) dbl | Age (years) dbl | [Glucose] (mg/dl) dbl |
|---|---|---|---|
| ckd | 78.19444 | 55.85135 | 178.1102 |
| notckd | 71.05769 | 45.80189 | 108.3500 |
| possibleckd | 85.21739 | 49.00000 | 182.0500 |
# And add the standard deviation to make the data complete.
bp4 <- summarize(by_class, mean_bp = mean(`Blood pressure (mm/Hg)`, na.rm = T),
sd_bp = sd(`Blood pressure (mm/Hg)`, na.rm = T),
mean_age = mean(`Age (years)`, na.rm = T),
sd_age = sd(`Age (years)`, na.rm = T),
mean_glucose = mean(`[Glucose] (mg/dl)`, na.rm = T),
sd_glucose = sd(`[Glucose] (mg/dl)`, na.rm = T))
formatted_table(bp4)
| Classification chr | mean_bp dbl | sd_bp dbl | mean_age dbl | sd_age dbl | mean_glucose dbl | sd_glucose dbl |
|---|---|---|---|---|---|---|
| ckd | 78.19444 | 14.467363 | 55.85135 | 18.01185 | 178.1102 | 86.75611 |
| notckd | 71.05769 | 8.580659 | 45.80189 | 15.86336 | 108.3500 | 18.26537 |
| possibleckd | 85.21739 | 23.523598 | 49.00000 | 12.64120 | 182.0500 | 101.65757 |
Using summarize_each() makes it easier to calculate the
(selected) statistics for blood pressure, age and glucose concentrations
and present them in an orderly manner. However, there are many NA values
in this data set. Remember that a calculation on data containing NA
values returns NA.
## [1] "Mean with NA values: NA"
## [1] "Mean without NA values: 4.72"
# Resolve with the `na.rm = ` argument.
paste("Mean with NA values removed:", mean(c(6.5, 3.5, 1.3, NA, 7.8), na.rm = T))
## [1] "Mean with NA values removed: 4.775"
## [1] "Sum with NA values: NA"
## [1] "Sum without NA values: 23.6"
# Resolve with the `na.rm = ` argument.
paste("Sum with NA values removed:", sum(c(6.5, 3.5, 1.3, NA, 7.8), na.rm = T))
## [1] "Sum with NA values removed: 19.1"
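The code behind the "with NA values" and "without NA values" outputs is not shown. A sketch of two vectors that are consistent with the printed results; the second vector is inferred from the printed mean (4.72) and sum (23.6), so treat its fourth value as an assumption:

```r
with_na <- c(6.5, 3.5, 1.3, NA, 7.8)
without_na <- c(6.5, 3.5, 1.3, 4.5, 7.8)  # assumed values, see lead-in

mean(with_na)      # NA: a single NA makes the result NA
mean(without_na)   # 4.72
sum(with_na)       # NA
sum(without_na)    # 23.6

# na.rm = TRUE drops the NA before calculating.
mean_rm <- mean(with_na, na.rm = TRUE)  # 4.775
sum_rm <- sum(with_na, na.rm = TRUE)    # 19.1
```

Note that removing the NA changes the number of values from five to four, which is why the mean becomes 4.775 rather than 4.72.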
To use summarize_each(), dropping the NA values first will
let us do the calculations.
# First drop all NA values to be able to do calculations.
# Then use the `summarize_each()` to do the calculations for all columns.
bp5 <- drop_na(by_class)
bp5_stats <- summarize_each(bp5, funs(mean, sd, se = sd(.)/sqrt(n())))
formatted_table(bp5_stats)
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | Red blood cells_mean dbl | Pus in cells_mean dbl | Pus cell clumps_mean dbl | Bacteria_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | Hypertension_mean dbl | Diabetes mellitus_mean dbl | Coronary Artery Disease_mean dbl | Appetite_mean dbl | Pedal edema_mean dbl | Anemia_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | Red blood cells_sd dbl | Pus in cells_sd dbl | Pus cell clumps_sd dbl | Bacteria_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | Hypertension_sd dbl | Diabetes mellitus_sd dbl | Coronary Artery Disease_sd dbl | Appetite_sd dbl | Pedal edema_sd dbl | Anemia_sd dbl | patient_id_se dbl | Age (years)_se dbl | Blood pressure (mm/Hg)_se dbl | Specific gravity_se dbl | Albumine_se dbl | Sugar_se dbl | Red blood cells_se dbl | Pus in cells_se dbl | Pus cell clumps_se dbl | Bacteria_se dbl | [Glucose] (mg/dl)_se dbl | [Blood urea] (mg/dl)_se dbl | [Creatine] (mg/dl)_se dbl | [Na] (mEq/L)_se dbl | [K] (mEq/L)_se dbl | Hemoglobine (mg)_se dbl | Packed cell volume_se dbl | White blood cell count (cells/µl)_se dbl | Red blood cell count (millions/µl)_se dbl | Hypertension_se dbl | Diabetes mellitus_se dbl | Coronary Artery Disease_se dbl | Appetite_se dbl | Pedal edema_se dbl | Anemia_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 142.61538 | 59.50000 | 75.76923 | 1.012692 | 3.038461 | 0.9615385 | NA | NA | NA | NA | 195.9231 | 100.1538 | 5.0461538 | 130.9231 | 6.219231 | 9.826923 | 29.76923 | 11234.615 | 3.776923 | NA | NA | NA | NA | NA | NA | 72.49832 | 14.602055 | 14.470128 | 0.0045234 | 1.076318 | 1.310901 | NA | NA | NA | NA | 74.84112 | 66.87762 | 3.7022405 | 7.450761 | 8.3718108 | 2.436647 | 8.011530 | 5101.368 | 1.1229631 | NA | NA | NA | NA | NA | NA | 14.218090 | 2.863698 | 2.837826 | 0.0008871 | 0.2110834 | 0.2570888 | NA | NA | NA | NA | 14.677552 | 13.115779 | 0.7260691 | 1.4612145 | 1.6418472 | 0.4778657 | 1.5711903 | 1000.4605 | 0.2202312 | NA | NA | NA | NA | NA | NA |
| notckd | 324.66667 | 46.05128 | 71.66667 | 1.022756 | 0.000000 | 0.0000000 | NA | NA | NA | NA | 107.9872 | 33.5000 | 0.8576923 | 141.7051 | 4.333333 | 15.025641 | 46.53846 | 7596.154 | 5.302564 | NA | NA | NA | NA | NA | NA | 45.31955 | 15.829364 | 8.888438 | 0.0025029 | 0.000000 | 0.000000 | NA | NA | NA | NA | 18.20125 | 11.42877 | 0.2601295 | 4.952001 | 0.5801179 | 1.354112 | 3.959838 | 1816.836 | 0.5500825 | NA | NA | NA | NA | NA | NA | 5.131428 | 1.792323 | 1.006417 | 0.0002834 | 0.0000000 | 0.0000000 | NA | NA | NA | NA | 2.060886 | 1.294054 | 0.0294539 | 0.5607037 | 0.0656854 | 0.1533230 | 0.4483634 | 205.7162 | 0.0622846 | NA | NA | NA | NA | NA | NA |
| possibleckd | 79.33333 | 59.00000 | 86.66667 | 1.013333 | 2.000000 | 0.0000000 | NA | NA | NA | NA | 135.6667 | 102.6667 | 4.3000000 | 134.0000 | 5.066667 | 9.300000 | 28.66667 | 8700.000 | 3.566667 | NA | NA | NA | NA | NA | NA | 62.17180 | 2.645751 | 5.773503 | 0.0028868 | 0.000000 | 0.000000 | NA | NA | NA | NA | 34.48671 | 47.64802 | 2.2271057 | 2.645751 | 0.2309401 | 1.708801 | 4.509250 | 2095.233 | 0.4725816 | NA | NA | NA | NA | NA | NA | 35.894908 | 1.527525 | 3.333333 | 0.0016667 | 0.0000000 | 0.0000000 | NA | NA | NA | NA | 19.910913 | 27.509594 | 1.2858201 | 1.5275252 | 0.1333333 | 0.9865766 | 2.6034166 | 1209.6832 | 0.2728451 | NA | NA | NA | NA | NA | NA |
summarize_each() is an example of a function that is still available
but will not be updated anymore, because a newer function,
across(), does the same job in a better way. Using
across() will put the mean and sd next to each other (which
might be more convenient for your analyses).
# Do the same as for `summarize_each()` with the new `across()` function.
bp6_stats <- bp5 %>%
summarise(across(everything(), list(mean = mean, sd = sd)))
formatted_table(bp6_stats)
| Classification chr | patient_id_mean dbl | patient_id_sd dbl | Age (years)_mean dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_mean dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_mean dbl | Specific gravity_sd dbl | Albumine_mean dbl | Albumine_sd dbl | Sugar_mean dbl | Sugar_sd dbl | Red blood cells_mean dbl | Red blood cells_sd dbl | Pus in cells_mean dbl | Pus in cells_sd dbl | Pus cell clumps_mean dbl | Pus cell clumps_sd dbl | Bacteria_mean dbl | Bacteria_sd dbl | [Glucose] (mg/dl)_mean dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_mean dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_mean dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_mean dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_mean dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_mean dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_mean dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_mean dbl | Red blood cell count (millions/µl)_sd dbl | Hypertension_mean dbl | Hypertension_sd dbl | Diabetes mellitus_mean dbl | Diabetes mellitus_sd dbl | Coronary Artery Disease_mean dbl | Coronary Artery Disease_sd dbl | Appetite_mean dbl | Appetite_sd dbl | Pedal edema_mean dbl | Pedal edema_sd dbl | Anemia_mean dbl | Anemia_sd dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 142.61538 | 72.49832 | 59.50000 | 14.602055 | 75.76923 | 14.470128 | 1.012692 | 0.0045234 | 3.038461 | 1.076318 | 0.9615385 | 1.310901 | NA | NA | NA | NA | NA | NA | NA | NA | 195.9231 | 74.84112 | 100.1538 | 66.87762 | 5.0461538 | 3.7022405 | 130.9231 | 7.450761 | 6.219231 | 8.3718108 | 9.826923 | 2.436647 | 29.76923 | 8.011530 | 11234.615 | 5101.368 | 3.776923 | 1.1229631 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| notckd | 324.66667 | 45.31955 | 46.05128 | 15.829364 | 71.66667 | 8.888438 | 1.022756 | 0.0025029 | 0.000000 | 0.000000 | 0.0000000 | 0.000000 | NA | NA | NA | NA | NA | NA | NA | NA | 107.9872 | 18.20125 | 33.5000 | 11.42877 | 0.8576923 | 0.2601295 | 141.7051 | 4.952001 | 4.333333 | 0.5801179 | 15.025641 | 1.354112 | 46.53846 | 3.959838 | 7596.154 | 1816.836 | 5.302564 | 0.5500825 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| possibleckd | 79.33333 | 62.17180 | 59.00000 | 2.645751 | 86.66667 | 5.773503 | 1.013333 | 0.0028868 | 2.000000 | 0.000000 | 0.0000000 | 0.000000 | NA | NA | NA | NA | NA | NA | NA | NA | 135.6667 | 34.48671 | 102.6667 | 47.64802 | 4.3000000 | 2.2271057 | 134.0000 | 2.645751 | 5.066667 | 0.2309401 | 9.300000 | 1.708801 | 28.66667 | 4.509250 | 8700.000 | 2095.233 | 3.566667 | 0.4725816 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
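The NA entries in the table above belong to the character columns. A minimal sketch of one way to avoid them, assuming dplyr 1.0 or later: restrict across() to the numeric columns with where(is.numeric), and pass na.rm = TRUE per column instead of dropping whole rows first. The `demo` tibble, its values and the name `stats` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data; names and values are not from the kidney data set.
demo <- tibble(
  class = c("ckd", "ckd", "notckd", "notckd"),
  bp = c(80, 90, 70, NA),
  appetite = c("good", "poor", "good", "good")
)

stats <- demo %>%
  group_by(class) %>%
  summarise(across(
    where(is.numeric),                      # skip character columns
    list(mean = ~ mean(.x, na.rm = TRUE),   # ignore NA within each column
         sd = ~ sd(.x, na.rm = TRUE))
  ))
stats
```

Handling NA values per column keeps rows that drop_na() would remove entirely, so the group sizes can differ from the tables in this lesson.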
Some of the columns contain characters, so calculations are not
possible for these columns. You can drop these columns using
select().
# Drop the mean ('Red blood cells_mean') and the standard deviation ('Red blood cells_sd') columns for 'Red blood cells'.
bp7_stats <- summarize_each(bp5, funs(mean, sd, se = sd(.)/sqrt(n()))) %>%
select(c(-`Red blood cells_mean`,-`Red blood cells_sd`))
formatted_table(bp7_stats)
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | Pus in cells_mean dbl | Pus cell clumps_mean dbl | Bacteria_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | Hypertension_mean dbl | Diabetes mellitus_mean dbl | Coronary Artery Disease_mean dbl | Appetite_mean dbl | Pedal edema_mean dbl | Anemia_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | Pus in cells_sd dbl | Pus cell clumps_sd dbl | Bacteria_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | Hypertension_sd dbl | Diabetes mellitus_sd dbl | Coronary Artery Disease_sd dbl | Appetite_sd dbl | Pedal edema_sd dbl | Anemia_sd dbl | patient_id_se dbl | Age (years)_se dbl | Blood pressure (mm/Hg)_se dbl | Specific gravity_se dbl | Albumine_se dbl | Sugar_se dbl | Red blood cells_se dbl | Pus in cells_se dbl | Pus cell clumps_se dbl | Bacteria_se dbl | [Glucose] (mg/dl)_se dbl | [Blood urea] (mg/dl)_se dbl | [Creatine] (mg/dl)_se dbl | [Na] (mEq/L)_se dbl | [K] (mEq/L)_se dbl | Hemoglobine (mg)_se dbl | Packed cell volume_se dbl | White blood cell count (cells/µl)_se dbl | Red blood cell count (millions/µl)_se dbl | Hypertension_se dbl | Diabetes mellitus_se dbl | Coronary Artery Disease_se dbl | Appetite_se dbl | Pedal edema_se dbl | Anemia_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 142.61538 | 59.50000 | 75.76923 | 1.012692 | 3.038461 | 0.9615385 | NA | NA | NA | 195.9231 | 100.1538 | 5.0461538 | 130.9231 | 6.219231 | 9.826923 | 29.76923 | 11234.615 | 3.776923 | NA | NA | NA | NA | NA | NA | 72.49832 | 14.602055 | 14.470128 | 0.0045234 | 1.076318 | 1.310901 | NA | NA | NA | 74.84112 | 66.87762 | 3.7022405 | 7.450761 | 8.3718108 | 2.436647 | 8.011530 | 5101.368 | 1.1229631 | NA | NA | NA | NA | NA | NA | 14.218090 | 2.863698 | 2.837826 | 0.0008871 | 0.2110834 | 0.2570888 | NA | NA | NA | NA | 14.677552 | 13.115779 | 0.7260691 | 1.4612145 | 1.6418472 | 0.4778657 | 1.5711903 | 1000.4605 | 0.2202312 | NA | NA | NA | NA | NA | NA |
| notckd | 324.66667 | 46.05128 | 71.66667 | 1.022756 | 0.000000 | 0.0000000 | NA | NA | NA | 107.9872 | 33.5000 | 0.8576923 | 141.7051 | 4.333333 | 15.025641 | 46.53846 | 7596.154 | 5.302564 | NA | NA | NA | NA | NA | NA | 45.31955 | 15.829364 | 8.888438 | 0.0025029 | 0.000000 | 0.000000 | NA | NA | NA | 18.20125 | 11.42877 | 0.2601295 | 4.952001 | 0.5801179 | 1.354112 | 3.959838 | 1816.836 | 0.5500825 | NA | NA | NA | NA | NA | NA | 5.131428 | 1.792323 | 1.006417 | 0.0002834 | 0.0000000 | 0.0000000 | NA | NA | NA | NA | 2.060886 | 1.294054 | 0.0294539 | 0.5607037 | 0.0656854 | 0.1533230 | 0.4483634 | 205.7162 | 0.0622846 | NA | NA | NA | NA | NA | NA |
| possibleckd | 79.33333 | 59.00000 | 86.66667 | 1.013333 | 2.000000 | 0.0000000 | NA | NA | NA | 135.6667 | 102.6667 | 4.3000000 | 134.0000 | 5.066667 | 9.300000 | 28.66667 | 8700.000 | 3.566667 | NA | NA | NA | NA | NA | NA | 62.17180 | 2.645751 | 5.773503 | 0.0028868 | 0.000000 | 0.000000 | NA | NA | NA | 34.48671 | 47.64802 | 2.2271057 | 2.645751 | 0.2309401 | 1.708801 | 4.509250 | 2095.233 | 0.4725816 | NA | NA | NA | NA | NA | NA | 35.894908 | 1.527525 | 3.333333 | 0.0016667 | 0.0000000 | 0.0000000 | NA | NA | NA | NA | 19.910913 | 27.509594 | 1.2858201 | 1.5275252 | 0.1333333 | 0.9865766 | 2.6034166 | 1209.6832 | 0.2728451 | NA | NA | NA | NA | NA | NA |
# Or simplify using 'forward-chaining' with the pipe operator (%>%).
bp8_stats <- bp5 %>%
summarise_each(funs(mean, sd, se = sd(.)/sqrt(n()))) %>%
select(c(-`Red blood cells_mean`,-`Red blood cells_sd`))
formatted_table(bp8_stats)
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | Pus in cells_mean dbl | Pus cell clumps_mean dbl | Bacteria_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | Hypertension_mean dbl | Diabetes mellitus_mean dbl | Coronary Artery Disease_mean dbl | Appetite_mean dbl | Pedal edema_mean dbl | Anemia_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | Pus in cells_sd dbl | Pus cell clumps_sd dbl | Bacteria_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | Hypertension_sd dbl | Diabetes mellitus_sd dbl | Coronary Artery Disease_sd dbl | Appetite_sd dbl | Pedal edema_sd dbl | Anemia_sd dbl | patient_id_se dbl | Age (years)_se dbl | Blood pressure (mm/Hg)_se dbl | Specific gravity_se dbl | Albumine_se dbl | Sugar_se dbl | Red blood cells_se dbl | Pus in cells_se dbl | Pus cell clumps_se dbl | Bacteria_se dbl | [Glucose] (mg/dl)_se dbl | [Blood urea] (mg/dl)_se dbl | [Creatine] (mg/dl)_se dbl | [Na] (mEq/L)_se dbl | [K] (mEq/L)_se dbl | Hemoglobine (mg)_se dbl | Packed cell volume_se dbl | White blood cell count (cells/µl)_se dbl | Red blood cell count (millions/µl)_se dbl | Hypertension_se dbl | Diabetes mellitus_se dbl | Coronary Artery Disease_se dbl | Appetite_se dbl | Pedal edema_se dbl | Anemia_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 142.61538 | 59.50000 | 75.76923 | 1.012692 | 3.038461 | 0.9615385 | NA | NA | NA | 195.9231 | 100.1538 | 5.0461538 | 130.9231 | 6.219231 | 9.826923 | 29.76923 | 11234.615 | 3.776923 | NA | NA | NA | NA | NA | NA | 72.49832 | 14.602055 | 14.470128 | 0.0045234 | 1.076318 | 1.310901 | NA | NA | NA | 74.84112 | 66.87762 | 3.7022405 | 7.450761 | 8.3718108 | 2.436647 | 8.011530 | 5101.368 | 1.1229631 | NA | NA | NA | NA | NA | NA | 14.218090 | 2.863698 | 2.837826 | 0.0008871 | 0.2110834 | 0.2570888 | NA | NA | NA | NA | 14.677552 | 13.115779 | 0.7260691 | 1.4612145 | 1.6418472 | 0.4778657 | 1.5711903 | 1000.4605 | 0.2202312 | NA | NA | NA | NA | NA | NA |
| notckd | 324.66667 | 46.05128 | 71.66667 | 1.022756 | 0.000000 | 0.0000000 | NA | NA | NA | 107.9872 | 33.5000 | 0.8576923 | 141.7051 | 4.333333 | 15.025641 | 46.53846 | 7596.154 | 5.302564 | NA | NA | NA | NA | NA | NA | 45.31955 | 15.829364 | 8.888438 | 0.0025029 | 0.000000 | 0.000000 | NA | NA | NA | 18.20125 | 11.42877 | 0.2601295 | 4.952001 | 0.5801179 | 1.354112 | 3.959838 | 1816.836 | 0.5500825 | NA | NA | NA | NA | NA | NA | 5.131428 | 1.792323 | 1.006417 | 0.0002834 | 0.0000000 | 0.0000000 | NA | NA | NA | NA | 2.060886 | 1.294054 | 0.0294539 | 0.5607037 | 0.0656854 | 0.1533230 | 0.4483634 | 205.7162 | 0.0622846 | NA | NA | NA | NA | NA | NA |
| possibleckd | 79.33333 | 59.00000 | 86.66667 | 1.013333 | 2.000000 | 0.0000000 | NA | NA | NA | 135.6667 | 102.6667 | 4.3000000 | 134.0000 | 5.066667 | 9.300000 | 28.66667 | 8700.000 | 3.566667 | NA | NA | NA | NA | NA | NA | 62.17180 | 2.645751 | 5.773503 | 0.0028868 | 0.000000 | 0.000000 | NA | NA | NA | 34.48671 | 47.64802 | 2.2271057 | 2.645751 | 0.2309401 | 1.708801 | 4.509250 | 2095.233 | 0.4725816 | NA | NA | NA | NA | NA | NA | 35.894908 | 1.527525 | 3.333333 | 0.0016667 | 0.0000000 | 0.0000000 | NA | NA | NA | NA | 19.910913 | 27.509594 | 1.2858201 | 1.5275252 | 0.1333333 | 0.9865766 | 2.6034166 | 1209.6832 | 0.2728451 | NA | NA | NA | NA | NA | NA |
Removing columns with character type data is also possible.
# Remove columns with character type.
bp9_stats <- select_if(drop_na(group_by(tibble1, Classification)), is.numeric)
formatted_table(summarise_each(bp9_stats, funs(mean, sd, se = sd(.)/sqrt(n()))))
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | patient_id_se dbl | Age (years)_se dbl | Blood pressure (mm/Hg)_se dbl | Specific gravity_se dbl | Albumine_se dbl | Sugar_se dbl | [Glucose] (mg/dl)_se dbl | [Blood urea] (mg/dl)_se dbl | [Creatine] (mg/dl)_se dbl | [Na] (mEq/L)_se dbl | [K] (mEq/L)_se dbl | Hemoglobine (mg)_se dbl | Packed cell volume_se dbl | White blood cell count (cells/µl)_se dbl | Red blood cell count (millions/µl)_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 142.61538 | 59.50000 | 75.76923 | 1.012692 | 3.038461 | 0.9615385 | 195.9231 | 100.1538 | 5.0461538 | 130.9231 | 6.219231 | 9.826923 | 29.76923 | 11234.615 | 3.776923 | 72.49832 | 14.602055 | 14.470128 | 0.0045234 | 1.076318 | 1.310901 | 74.84112 | 66.87762 | 3.7022405 | 7.450761 | 8.3718108 | 2.436647 | 8.011530 | 5101.368 | 1.1229631 | 14.218090 | 2.863698 | 2.837826 | 0.0008871 | 0.2110834 | 0.2570888 | 14.677552 | 13.115779 | 0.7260691 | 1.4612145 | 1.6418472 | 0.4778657 | 1.5711903 | 1000.4605 | 0.2202312 |
| notckd | 324.66667 | 46.05128 | 71.66667 | 1.022756 | 0.000000 | 0.0000000 | 107.9872 | 33.5000 | 0.8576923 | 141.7051 | 4.333333 | 15.025641 | 46.53846 | 7596.154 | 5.302564 | 45.31955 | 15.829364 | 8.888438 | 0.0025029 | 0.000000 | 0.000000 | 18.20125 | 11.42877 | 0.2601295 | 4.952001 | 0.5801179 | 1.354112 | 3.959838 | 1816.836 | 0.5500825 | 5.131428 | 1.792323 | 1.006417 | 0.0002834 | 0.0000000 | 0.0000000 | 2.060886 | 1.294054 | 0.0294539 | 0.5607037 | 0.0656854 | 0.1533230 | 0.4483634 | 205.7162 | 0.0622846 |
| possibleckd | 79.33333 | 59.00000 | 86.66667 | 1.013333 | 2.000000 | 0.0000000 | 135.6667 | 102.6667 | 4.3000000 | 134.0000 | 5.066667 | 9.300000 | 28.66667 | 8700.000 | 3.566667 | 62.17180 | 2.645751 | 5.773503 | 0.0028868 | 0.000000 | 0.000000 | 34.48671 | 47.64802 | 2.2271057 | 2.645751 | 0.2309401 | 1.708801 | 4.509250 | 2095.233 | 0.4725816 | 35.894908 | 1.527525 | 3.333333 | 0.0016667 | 0.0000000 | 0.0000000 | 19.910913 | 27.509594 | 1.2858201 | 1.5275252 | 0.1333333 | 0.9865766 | 2.6034166 | 1209.6832 | 0.2728451 |
bp10_stats <- bp5 %>%
select_if(is.numeric) %>%
summarise_each(funs(mean, sd, se = sd(.)/sqrt(n())))
formatted_table(bp10_stats)
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | patient_id_se dbl | Age (years)_se dbl | Blood pressure (mm/Hg)_se dbl | Specific gravity_se dbl | Albumine_se dbl | Sugar_se dbl | [Glucose] (mg/dl)_se dbl | [Blood urea] (mg/dl)_se dbl | [Creatine] (mg/dl)_se dbl | [Na] (mEq/L)_se dbl | [K] (mEq/L)_se dbl | Hemoglobine (mg)_se dbl | Packed cell volume_se dbl | White blood cell count (cells/µl)_se dbl | Red blood cell count (millions/µl)_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 142.61538 | 59.50000 | 75.76923 | 1.012692 | 3.038461 | 0.9615385 | 195.9231 | 100.1538 | 5.0461538 | 130.9231 | 6.219231 | 9.826923 | 29.76923 | 11234.615 | 3.776923 | 72.49832 | 14.602055 | 14.470128 | 0.0045234 | 1.076318 | 1.310901 | 74.84112 | 66.87762 | 3.7022405 | 7.450761 | 8.3718108 | 2.436647 | 8.011530 | 5101.368 | 1.1229631 | 14.218090 | 2.863698 | 2.837826 | 0.0008871 | 0.2110834 | 0.2570888 | 14.677552 | 13.115779 | 0.7260691 | 1.4612145 | 1.6418472 | 0.4778657 | 1.5711903 | 1000.4605 | 0.2202312 |
| notckd | 324.66667 | 46.05128 | 71.66667 | 1.022756 | 0.000000 | 0.0000000 | 107.9872 | 33.5000 | 0.8576923 | 141.7051 | 4.333333 | 15.025641 | 46.53846 | 7596.154 | 5.302564 | 45.31955 | 15.829364 | 8.888438 | 0.0025029 | 0.000000 | 0.000000 | 18.20125 | 11.42877 | 0.2601295 | 4.952001 | 0.5801179 | 1.354112 | 3.959838 | 1816.836 | 0.5500825 | 5.131428 | 1.792323 | 1.006417 | 0.0002834 | 0.0000000 | 0.0000000 | 2.060886 | 1.294054 | 0.0294539 | 0.5607037 | 0.0656854 | 0.1533230 | 0.4483634 | 205.7162 | 0.0622846 |
| possibleckd | 79.33333 | 59.00000 | 86.66667 | 1.013333 | 2.000000 | 0.0000000 | 135.6667 | 102.6667 | 4.3000000 | 134.0000 | 5.066667 | 9.300000 | 28.66667 | 8700.000 | 3.566667 | 62.17180 | 2.645751 | 5.773503 | 0.0028868 | 0.000000 | 0.000000 | 34.48671 | 47.64802 | 2.2271057 | 2.645751 | 0.2309401 | 1.708801 | 4.509250 | 2095.233 | 0.4725816 | 35.894908 | 1.527525 | 3.333333 | 0.0016667 | 0.0000000 | 0.0000000 | 19.910913 | 27.509594 | 1.2858201 | 1.5275252 | 0.1333333 | 0.9865766 | 2.6034166 | 1209.6832 | 0.2728451 |
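As a hedged aside, the same per-group mean, standard deviation and standard error (sd / sqrt(n)) can be sketched with across(), which current dplyr recommends over summarise_each() and funs(). The tibble toy and its columns age and bp are made-up illustration data and are not part of the kidney data set.

```r
library(dplyr)

# Hypothetical toy data standing in for the kidney data set.
toy <- tibble(
  Classification = c("ckd", "ckd", "ckd", "notckd", "notckd", "notckd"),
  age            = c(50, 60, 70, 40, 45, 50),
  bp             = c(80, 90, 100, 70, 70, 70)
)

toy_stats <- toy %>%
  group_by(Classification) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x),
        sd   = ~ sd(.x),
        se   = ~ sd(.x) / sqrt(n())  # standard error of the mean
      )
    )
  )

print(toy_stats)
```

Note that across() orders the result per column (age_mean, age_sd, age_se, then bp_mean, and so on), whereas the tables above list all means first, then all standard deviations, then all standard errors.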
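Note: summarize_all() and summarize_each() give the same results. summarize_each() and funs() are deprecated in current versions of dplyr, so in new code prefer summarize_all() with a named list of functions, or across().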
# Use `summarize_each()` and `summarize_all()` on the same data and compare the results.
df_summ_all <- tibble1 %>%
  group_by(Classification) %>%
  summarize_all(list(mean = mean, sd = sd))
formatted_table(df_summ_all)
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | Red blood cells_mean dbl | Pus in cells_mean dbl | Pus cell clumps_mean dbl | Bacteria_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | Hypertension_mean dbl | Diabetes mellitus_mean dbl | Coronary Artery Disease_mean dbl | Appetite_mean dbl | Pedal edema_mean dbl | Anemia_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | Red blood cells_sd dbl | Pus in cells_sd dbl | Pus cell clumps_sd dbl | Bacteria_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | Hypertension_sd dbl | Diabetes mellitus_sd dbl | Coronary Artery Disease_sd dbl | Appetite_sd dbl | Pedal edema_sd dbl | Anemia_sd dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 131.0199 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 71.85573 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| notckd | 322.5755 | 45.80189 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 41.38759 | 15.86336 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| possibleckd | 123.6087 | NA | 85.21739 | NA | NA | NA | NA | NA | NA | NA | NA | 71.56522 | 3.678261 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 68.00851 | NA | 23.5236 | NA | NA | NA | NA | NA | NA | NA | NA | 47.47423 | 3.860635 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
by_heart <- group_by(tibble1, Classification)
formatted_table(summarize_each(by_heart, funs(mean, sd)))
| Classification chr | patient_id_mean dbl | Age (years)_mean dbl | Blood pressure (mm/Hg)_mean dbl | Specific gravity_mean dbl | Albumine_mean dbl | Sugar_mean dbl | Red blood cells_mean dbl | Pus in cells_mean dbl | Pus cell clumps_mean dbl | Bacteria_mean dbl | [Glucose] (mg/dl)_mean dbl | [Blood urea] (mg/dl)_mean dbl | [Creatine] (mg/dl)_mean dbl | [Na] (mEq/L)_mean dbl | [K] (mEq/L)_mean dbl | Hemoglobine (mg)_mean dbl | Packed cell volume_mean dbl | White blood cell count (cells/µl)_mean dbl | Red blood cell count (millions/µl)_mean dbl | Hypertension_mean dbl | Diabetes mellitus_mean dbl | Coronary Artery Disease_mean dbl | Appetite_mean dbl | Pedal edema_mean dbl | Anemia_mean dbl | patient_id_sd dbl | Age (years)_sd dbl | Blood pressure (mm/Hg)_sd dbl | Specific gravity_sd dbl | Albumine_sd dbl | Sugar_sd dbl | Red blood cells_sd dbl | Pus in cells_sd dbl | Pus cell clumps_sd dbl | Bacteria_sd dbl | [Glucose] (mg/dl)_sd dbl | [Blood urea] (mg/dl)_sd dbl | [Creatine] (mg/dl)_sd dbl | [Na] (mEq/L)_sd dbl | [K] (mEq/L)_sd dbl | Hemoglobine (mg)_sd dbl | Packed cell volume_sd dbl | White blood cell count (cells/µl)_sd dbl | Red blood cell count (millions/µl)_sd dbl | Hypertension_sd dbl | Diabetes mellitus_sd dbl | Coronary Artery Disease_sd dbl | Appetite_sd dbl | Pedal edema_sd dbl | Anemia_sd dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ckd | 131.0199 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 71.85573 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| notckd | 322.5755 | 45.80189 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 41.38759 | 15.86336 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| possibleckd | 123.6087 | NA | 85.21739 | NA | NA | NA | NA | NA | NA | NA | NA | 71.56522 | 3.678261 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 68.00851 | NA | 23.5236 | NA | NA | NA | NA | NA | NA | NA | NA | 47.47423 | 3.860635 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
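Most cells in the two tables above are NA. This is most likely because mean() and sd() return NA as soon as a column contains a missing value, and because they cannot summarize character columns. A minimal sketch of one way to handle both, assuming a made-up tibble toy with hypothetical columns age and appetite, is to restrict the summary to numeric columns and pass na.rm = TRUE:

```r
library(dplyr)

# Hypothetical toy data: one missing age and one character column.
toy <- tibble(
  Classification = c("ckd", "ckd", "notckd", "notckd"),
  age            = c(50, NA, 40, 46),
  appetite       = c("good", "poor", "good", "good")
)

# Default behaviour: a single NA in a group makes that group's mean NA.
with_na <- toy %>%
  group_by(Classification) %>%
  summarise(across(where(is.numeric), list(mean = mean, sd = sd)))

# Keep only numeric columns and drop missing values before summarising.
without_na <- toy %>%
  group_by(Classification) %>%
  summarise(across(
    where(is.numeric),
    list(
      mean = ~ mean(.x, na.rm = TRUE),
      sd   = ~ sd(.x, na.rm = TRUE)
    )
  ))

print(with_na)
print(without_na)
```

In this sketch the sd for the ckd group is still NA after removing the missing value, because only one observation remains and a standard deviation needs at least two.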
Learning outcomes
In this lesson you have learned to:
- select rows and columns with base R and tidyverse,
- filter and sort data in data frames,
- round numbers and do statistical analysis on data,
- summarize statistical analyses on data frames.
This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.