Lesson 11-13: Data analysis

Mark Sibbald, Jurre Hageman

2025-10-15



Once the data has been read in and cleaned up, it is time to start analyzing and presenting it. In these lessons we look at the analysis part: how to sort, filter and rearrange the data in a data frame (or tibble), how to inspect the properties of specific variables, and how to do some commonly used calculations on the data.

First, let’s load a data set that has already been cleaned up. As in previous lessons, we start by setting up the formatting of the tibbles we create in this part, using the tidyverse and kableExtra libraries.

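library(tidyverse)
library(kableExtra)
library(knitr)
library(pillar)

# Render a data frame as an HTML table with each column's type shown under its name.
formatted_table <- function(df) {
  # Short type label per column, e.g. "dbl" or "chr".
  col_types <- sapply(df, pillar::type_sum)
  new_col_names <- paste0(names(df), "<br>", "<span style='font-weight: normal;'>", col_types, "</span>")
  # escape = FALSE keeps the HTML in the column names intact.
  kbl(df, col.names = new_col_names, escape = FALSE, format = "html") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}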

Download the file chronic_kidney_disease.csv and check in a text editor which delimiter the file uses. Then read the file into R.

# Read the data on chronic kidney disease.
kidney_data <- read_csv("./files_10_data_analysis_exercises/add_exercises/chronic_kidney_disease.csv")
# Replace any missing data (marked with "?") with NA values.
# Hint: check which columns are of character type but contain numbers.
tibble1 <- tibble(kidney_data) %>%
  replace(.== "?", NA) %>%
  mutate(`White blood cell count (cells/µl)` = as.numeric(`White blood cell count (cells/µl)`)) %>%
  mutate(`Red blood cell count (millions/µl)` = as.numeric(`Red blood cell count (millions/µl)`))

formatted_table(head(tibble1))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
157 62 70 1.025 3 0 normal abnormal notpresent notpresent 122 42.0 1.7 136 4.7 12.6 39 7900 3.9 yes yes no good no no ckd
109 54 70 NA NA NA NA NA notpresent notpresent 233 50.1 1.9 NA NA 11.7 NA NA NA no yes no good no no ckd
17 47 80 NA NA NA NA NA notpresent notpresent 114 87.0 5.2 139 3.7 12.1 NA NA NA yes no no poor no no possibleckd
347 43 60 1.025 0 0 normal normal notpresent notpresent 108 25.0 1.0 144 5.0 17.8 43 7200 5.5 no no no good no no notckd
24 42 100 1.015 4 0 normal abnormal notpresent present NA 50.0 1.4 129 4.0 11.1 39 8300 4.6 yes no no poor no no possibleckd
175 60 50 1.010 0 0 NA normal notpresent notpresent 261 58.0 2.2 113 3.0 NA NA 4200 3.4 yes no no good no no ckd
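
As an alternative to replacing "?" after reading, the missing-value marker can be declared while reading the file, so that numeric columns are parsed as numbers right away. The sketch below illustrates this with the na argument of read_csv(); the small temporary file and the column name wbc_count are hypothetical stand-ins, not the real data set.

```r
library(readr)

# Hypothetical toy file standing in for chronic_kidney_disease.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c("patient_id,wbc_count",
             "1,7900",
             "2,?",
             "3,8300"), tmp)

# Treat "?" (next to the usual "" and "NA") as missing while reading.
toy <- suppressMessages(read_csv(tmp, na = c("", "NA", "?")))

# The column is numeric, with NA where the file contained "?".
is.numeric(toy$wbc_count)
toy$wbc_count
```

With this approach the later as.numeric() conversions would not be needed, assuming "?" is the only non-numeric marker in those columns.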


Select in data frames using base R

Let’s start by selecting specific data from the data frame. To select by position, put the index numbers between square brackets, rows first and columns second: [row, column]. A column can also be selected by name with the $ sign.

# Select the value from the 6th row and the 2nd column.
tibble1[6, 2]
## # A tibble: 1 × 1
##   `Age (years)`
##           <dbl>
## 1            60
# Select the 123rd row.
tibble1[123, ]
## # A tibble: 1 × 26
##   patient_id `Age (years)` `Blood pressure (mm/Hg)` `Specific gravity` Albumine
##        <dbl>         <dbl>                    <dbl>              <dbl>    <dbl>
## 1         74            56                       90               1.01        2
## # ℹ 21 more variables: Sugar <dbl>, `Red blood cells` <chr>,
## #   `Pus in cells` <chr>, `Pus cell clumps` <chr>, Bacteria <chr>,
## #   `[Glucose] (mg/dl)` <dbl>, `[Blood urea] (mg/dl)` <dbl>,
## #   `[Creatine] (mg/dl)` <dbl>, `[Na] (mEq/L)` <dbl>, `[K] (mEq/L)` <dbl>,
## #   `Hemoglobine (mg)` <dbl>, `Packed cell volume` <dbl>,
## #   `White blood cell count (cells/µl)` <dbl>,
## #   `Red blood cell count (millions/µl)` <dbl>, Hypertension <chr>, …
# Select the 3rd column using the column number.
tibble1[, 3]
## # A tibble: 280 × 1
##    `Blood pressure (mm/Hg)`
##                       <dbl>
##  1                       70
##  2                       70
##  3                       80
##  4                       60
##  5                      100
##  6                       50
##  7                       80
##  8                       70
##  9                       70
## 10                      100
## # ℹ 270 more rows
# Select the 3rd column using the column name.
tibble1$`Blood pressure (mm/Hg)`
##   [1]  70  70  80  60 100  50  80  70  70 100  60  90  80  80  70  70  50  80
##  [19]  90  60  70  70  80  70 100  80 100  80  80  80  70  80  80  60  60 110
##  [37]  70  90  NA  60  80  80  70  70  60  70  70  80  60  70  70  70  80  80
##  [55]  80  60  60 100  70  70  80  90  70  80  80 100  80  90  70  90  90 100
##  [73]  60  80  60  70  70  60  80  70  70  90  70 120  70  70  80  70  80  NA
##  [91]  60  60  90  60  60  70  80  60  60  70  80  NA  80  90  90  80  80  80
## [109]  80  90  80 100  70  80  90  70  60  60  60  70  80  60  90  60  70  70
## [127]  80 100  90  70 100  90  60  70  80  60  90  70  70  50  80  70  90  70
## [145]  70  90  70  80  70  90 100  70  60  80  90  70  60  70  60  70  80  80
## [163]  60  70 100  70  80  70 140  NA  80  80  90  80  90  70  70  70  60  80
## [181]  70  70  70  NA  60  80  90 100  80  60  80  90  90  80  80  70  70  60
## [199]  80  70  70  70  80  80  80  80  70  80  70  NA  80 100  60  60  80  80
## [217]  80  50  70  80  70  70  80  70  70  90  60  80  70  70  70 110  80  60
## [235]  70  60 100  80  80  60  90  80  60  70  60  80  NA  70  80  70  60  70
## [253]  80  90  80  60  60  70  NA  60  60  80  70  90  90  60 180  60 100  80
## [271]  80  60  80  80  NA  60  90  80  80  60

Notice the difference between selecting a column using the index number of the column and using the name of the column. The first command returns a tibble, the second command returns a vector.
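
The difference can be checked directly. The sketch below uses a small made-up tibble (the names toy and age are hypothetical) and also shows double square brackets, which return a vector just like the $ sign does.

```r
library(tibble)

# Hypothetical toy data, not the kidney data set.
toy <- tibble(patient_id = c(1, 2, 3), age = c(62, 54, 47))

col_as_tibble  <- toy[, 2]   # single brackets: the result is still a tibble
col_as_vector  <- toy[[2]]   # double brackets: the result is a plain vector
same_as_dollar <- toy$age    # the $ sign also gives a plain vector

is_tibble(col_as_tibble)
is.numeric(col_as_vector)
identical(col_as_vector, same_as_dollar)
```

Note that a base R data.frame behaves differently here: selecting a single column with [, 2] drops to a vector by default, whereas a tibble always stays a tibble.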


Slicing specific rows using Tidyverse

With tidyverse you can take slices (selected rows) from the data frame with the slice() function.

# Select the first row of tibble1.
formatted_table(slice(tibble1, 1))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
157 62 70 1.025 3 0 normal abnormal notpresent notpresent 122 42 1.7 136 4.7 12.6 39 7900 3.9 yes yes no good no no ckd
# Select the rows 32 to 36 of tibble1
formatted_table(slice(tibble1, 32:36))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
333 23 80 1.020 0 0 normal normal notpresent notpresent 99 46 1.2 142 4.0 17.7 46 4300 5.5 no no no good no no notckd
275 52 80 1.020 0 0 normal normal notpresent notpresent 125 22 1.2 139 4.6 16.5 43 4700 4.6 no no no good no no notckd
150 8 60 1.025 3 0 normal normal notpresent notpresent 78 27 0.9 NA NA 12.3 41 6700 NA no no no poor yes no ckd
10 50 60 1.010 2 4 NA abnormal present notpresent 490 55 4.0 NA NA 9.4 28 NA NA yes yes no good no yes ckd
192 46 110 1.015 0 0 NA normal notpresent notpresent 130 16 0.9 NA NA NA NA NA NA no no no good no no ckd

With slice_head() and slice_tail() you can get the top or bottom rows, respectively.

# Get the top 4 rows of tibble1.
formatted_table(slice_head(tibble1, n = 4))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
157 62 70 1.025 3 0 normal abnormal notpresent notpresent 122 42.0 1.7 136 4.7 12.6 39 7900 3.9 yes yes no good no no ckd
109 54 70 NA NA NA NA NA notpresent notpresent 233 50.1 1.9 NA NA 11.7 NA NA NA no yes no good no no ckd
17 47 80 NA NA NA NA NA notpresent notpresent 114 87.0 5.2 139 3.7 12.1 NA NA NA yes no no poor no no possibleckd
347 43 60 1.025 0 0 normal normal notpresent notpresent 108 25.0 1.0 144 5.0 17.8 43 7200 5.5 no no no good no no notckd
# Get the bottom 4 rows of tibble1.
formatted_table(slice_tail(tibble1, n = 4))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
106 50 90 NA NA NA NA NA notpresent notpresent 89 118 6.1 127 4.4 6.0 17 6500 NA yes yes no good yes yes ckd
270 23 80 1.025 0 0 normal normal notpresent notpresent 111 34 1.1 145 4.0 14.3 41 7200 5.0 no no no good no no notckd
348 38 80 1.020 0 0 normal normal notpresent notpresent 99 19 0.5 147 3.5 13.6 44 7300 6.4 no no no good no no notckd
102 17 60 1.010 0 0 NA normal notpresent notpresent 92 32 2.1 141 4.2 13.9 52 7000 NA no no no good no no possibleckd

With slice_max() and slice_min() it is possible to get the row with the maximum or minimum value, respectively, for a specific column.

# Get the rows with the maximum and minimum values for hemoglobin levels.
formatted_table(slice_max(tibble1, order_by = `Hemoglobine (mg)`, n = 1))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
347 43 60 1.025 0 0 normal normal notpresent notpresent 108 25 1.0 144 5 17.8 43 7200 5.5 no no no good no no notckd
363 67 80 1.025 0 0 normal normal notpresent notpresent 99 40 0.5 NA NA 17.8 44 5900 5.2 no no no good no no notckd
formatted_table(slice_min(tibble1, order_by = `Hemoglobine (mg)`, n = 3))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
249 56 90 1.01 4 1 normal abnormal present notpresent 176 309.0 13.3 124 6.5 3.1 9 5400 2.1 yes yes no poor yes yes ckd
14 68 80 1.01 3 2 normal abnormal present present 157 90.0 4.1 130 6.4 5.6 16 11000 2.6 yes yes yes poor yes no ckd
195 70 90 1.02 2 1 abnormal abnormal notpresent present 184 98.6 3.3 138 3.9 5.8 NA NA NA yes yes yes poor no no ckd

The argument n sets how many top (or bottom) rows are returned. If several rows tie for the maximum (or minimum) value, all of them are returned, even with n = 1, because ties are kept by default. That is why the maximum above shows two rows.
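
If exactly n rows are wanted even when values tie, slice_max() and slice_min() accept with_ties = FALSE. A minimal sketch on made-up data (the tibble toy and the column hemoglobin are hypothetical):

```r
library(dplyr)

# Hypothetical toy data in which two patients share the maximum value.
toy <- tibble(patient_id = 1:4,
              hemoglobin = c(17.8, 12.1, 17.8, 9.4))

# Default behaviour: both tied rows come back, although n = 1.
with_ties <- slice_max(toy, order_by = hemoglobin, n = 1)

# with_ties = FALSE returns exactly n rows.
without_ties <- slice_max(toy, order_by = hemoglobin, n = 1, with_ties = FALSE)

nrow(with_ties)
nrow(without_ties)
```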


Filter and sort in R

As in Excel, you can filter and sort the data in a data frame.

# Keep only the patients with a blood pressure higher than 70 (mm/Hg). Assign the result to 'tibble2'.
tibble2 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70)
formatted_table(head(tibble2))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
17 47 80 NA NA NA NA NA notpresent notpresent 114 87 5.2 139 3.7 12.1 NA NA NA yes no no poor no no possibleckd
24 42 100 1.015 4 0 normal abnormal notpresent present NA 50 1.4 129 4.0 11.1 39 8300 4.6 yes no no poor no no possibleckd
351 29 80 1.020 0 0 normal normal notpresent notpresent 83 49 0.9 139 3.3 17.5 40 9900 4.7 no no no good no no notckd
245 48 100 NA NA NA NA NA notpresent notpresent 103 79 5.3 135 6.3 6.3 19 7200 2.6 yes no yes poor no no ckd
145 57 90 1.015 5 0 abnormal abnormal notpresent present NA 322 13.0 126 4.8 8.0 24 4200 3.3 yes yes yes poor yes yes ckd
258 42 80 1.020 0 0 normal normal notpresent notpresent 98 20 0.5 140 3.5 13.9 44 8400 5.5 no no no good no no notckd
# Select data that contains the same condition as above (Blood pressure > 70) AND age of the patient < 50 years. Use the '&' operator to combine conditions. Store the data in 'tibble3'.
tibble3 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70 &
                    `Age (years)` < 50)
formatted_table(head(tibble3))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
17 47 80 NA NA NA NA NA notpresent notpresent 114 87 5.2 139 3.7 12.1 NA NA NA yes no no poor no no possibleckd
24 42 100 1.015 4 0 normal abnormal notpresent present NA 50 1.4 129 4.0 11.1 39 8300 4.6 yes no no poor no no possibleckd
351 29 80 1.020 0 0 normal normal notpresent notpresent 83 49 0.9 139 3.3 17.5 40 9900 4.7 no no no good no no notckd
245 48 100 NA NA NA NA NA notpresent notpresent 103 79 5.3 135 6.3 6.3 19 7200 2.6 yes no yes poor no no ckd
258 42 80 1.020 0 0 normal normal notpresent notpresent 98 20 0.5 140 3.5 13.9 44 8400 5.5 no no no good no no notckd
218 33 90 1.015 0 0 NA normal notpresent notpresent 92 19 0.8 NA NA 11.8 34 7000 NA no no no good no no ckd
# Select data that meets either of the conditions mentioned: Blood pressure > 70 mm/Hg OR age < 50 years. Use the '|' operator to accept either condition. Store the data in 'tibble4'.
tibble4 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70 |
                    `Age (years)` < 50)
formatted_table(head(tibble4))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
17 47 80 NA NA NA NA NA notpresent notpresent 114 87 5.2 139 3.7 12.1 NA NA NA yes no no poor no no possibleckd
347 43 60 1.025 0 0 normal normal notpresent notpresent 108 25 1.0 144 5.0 17.8 43 7200 5.5 no no no good no no notckd
24 42 100 1.015 4 0 normal abnormal notpresent present NA 50 1.4 129 4.0 11.1 39 8300 4.6 yes no no poor no no possibleckd
351 29 80 1.020 0 0 normal normal notpresent notpresent 83 49 0.9 139 3.3 17.5 40 9900 4.7 no no no good no no notckd
332 34 70 1.025 0 0 normal normal notpresent notpresent NA 33 1.0 150 5.0 15.3 44 10500 6.1 no no no good no no notckd
167 34 70 1.020 0 0 abnormal normal notpresent notpresent 139 19 0.9 NA NA 12.7 42 2200 NA no no no poor no no possibleckd
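
In filter(), conditions separated by commas are combined in the same way as with the '&' operator. The sketch below compares both spellings, and the '|' variant, on a small made-up tibble (toy, bp and age are hypothetical names).

```r
library(dplyr)

# Hypothetical toy data, not the kidney data set.
toy <- tibble(patient_id = 1:5,
              bp  = c(70, 80, 100, 60, 90),
              age = c(62, 47, 42, 43, 55))

with_ampersand <- filter(toy, bp > 70 & age < 50)   # both conditions must hold
with_comma     <- filter(toy, bp > 70, age < 50)    # same meaning, written with a comma
with_or        <- filter(toy, bp > 70 | age < 50)   # at least one condition holds

identical(with_ampersand$patient_id, with_comma$patient_id)
with_or$patient_id
```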

We have already seen the negation operator ! to exclude certain data. Here it is used as part of the "not equal to" operator !=.

# Select data of patients that have a blood pressure higher than 70 mm/Hg AND leave out the data of patients whose appetite has been scored as "poor". Store the data in 'tibble5'.
tibble5 <- filter(tibble1, `Blood pressure (mm/Hg)` > 70 & Appetite != "poor")
formatted_table(head(tibble5))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
351 29 80 1.020 0 0 normal normal notpresent notpresent 83 49 0.9 139 3.3 17.5 40 9900 4.7 no no no good no no notckd
258 42 80 1.020 0 0 normal normal notpresent notpresent 98 20 0.5 140 3.5 13.9 44 8400 5.5 no no no good no no notckd
177 65 80 1.015 2 1 normal normal present notpresent 215 133 2.5 NA NA 13.2 41 NA NA no yes no good no no ckd
265 50 80 1.020 0 0 normal normal notpresent notpresent 97 40 0.6 150 4.5 14.2 48 10500 5.0 no no no good no no notckd
218 33 90 1.015 0 0 NA normal notpresent notpresent 92 19 0.8 NA NA 11.8 34 7000 NA no no no good no no ckd
291 47 80 1.025 0 0 normal normal notpresent notpresent 124 44 1.0 140 4.9 14.9 41 7000 5.7 no no no good no no notckd
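
Because this data set contains many NA values, it helps to know that filter() only keeps rows for which the condition is TRUE. Rows where the condition evaluates to NA are dropped without a warning. The sketch below shows this on made-up data (toy and bp are hypothetical names) and how to keep the missing values explicitly with is.na().

```r
library(dplyr)

# Hypothetical toy data with one missing blood pressure.
toy <- tibble(patient_id = 1:4,
              bp = c(80, NA, 60, 100))

# NA > 70 evaluates to NA, so patient 2 is dropped.
high_bp <- filter(toy, bp > 70)

# Keep the rows with a missing value as well.
high_or_missing <- filter(toy, bp > 70 | is.na(bp))

high_bp$patient_id
high_or_missing$patient_id
```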

Sorting on a specific column is done with the arrange() function. By default, the column is sorted in ascending order.

# Sort on creatine levels with the lowest value on top. Store the new tibble in tibble6.
tibble6 <- arrange(tibble1, `[Creatine] (mg/dl)`)
formatted_table(head(tibble6))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
354 32 60 1.025 0 0 normal normal notpresent notpresent 102 17 0.4 147 4.7 14.6 41 6800 5.1 no no no good no no notckd
258 42 80 1.020 0 0 normal normal notpresent notpresent 98 20 0.5 140 3.5 13.9 44 8400 5.5 no no no good no no notckd
307 47 60 1.020 0 0 normal normal notpresent notpresent 137 17 0.5 150 3.5 13.6 44 7900 4.5 no no no good no no notckd
363 67 80 1.025 0 0 normal normal notpresent notpresent 99 40 0.5 NA NA 17.8 44 5900 5.2 no no no good no no notckd
316 35 NA 1.020 0 0 normal normal NA NA 99 30 0.5 135 4.9 15.4 48 5000 5.2 no no no good no no notckd
375 70 80 1.020 0 0 normal normal notpresent notpresent 74 41 0.5 143 4.5 15.1 48 9700 5.6 no no no good no no notckd
# Sort on creatine levels with the highest value on top, using desc(). Store the new tibble in tibble7.
tibble7 <- arrange(tibble1, desc(`[Creatine] (mg/dl)`))
formatted_table(head(tibble7))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
21 60 90 NA NA NA NA NA notpresent notpresent NA 180 76.0 4.5 NA 10.9 32 6200 3.6 yes yes yes good no no ckd
61 67 80 1.010 1 3 normal abnormal notpresent notpresent 182 391 32.0 163.0 39.0 NA NA NA NA no no no good yes no ckd
6 68 70 1.010 0 0 NA normal notpresent notpresent 100 54 24.0 104.0 4.0 12.4 36 NA NA no no no good no no ckd
143 41 80 1.015 1 4 abnormal normal notpresent notpresent 210 165 18.0 135.0 4.7 NA NA NA NA no yes no good no no possibleckd
134 47 100 1.010 NA NA normal NA notpresent notpresent 122 NA 16.9 138.0 5.2 10.8 33 10200 3.8 no yes no good no no ckd
154 56 90 1.005 4 3 abnormal abnormal notpresent notpresent 242 132 16.4 140.0 4.2 8.4 26 NA 3.0 yes yes no poor yes yes ckd

You can verify that the lowest or highest value is on top by comparing the result of min() or max() with the first row of the sorted table.

# Check the minimum of [Creatine] in tibble 6.
# Check the maximum of [Creatine] in tibble 7.
min(tibble6$`[Creatine] (mg/dl)`, na.rm = T)
## [1] 0.4
formatted_table(tibble6[1, ])
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
354 32 60 1.025 0 0 normal normal notpresent notpresent 102 17 0.4 147 4.7 14.6 41 6800 5.1 no no no good no no notckd
max(tibble7$`[Creatine] (mg/dl)`, na.rm = T)
## [1] 76
formatted_table(tibble7[1, ])
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
21 60 90 NA NA NA NA NA notpresent notpresent NA 180 76 4.5 NA 10.9 32 6200 3.6 yes yes yes good no no ckd
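
The same check can be written as a comparison instead of reading it off the table. The sketch below does this on made-up data (toy and creatine are hypothetical names) and also shows that arrange() places missing values at the end, both for ascending and for descending sorts.

```r
library(dplyr)

# Hypothetical toy data with one missing value.
toy <- tibble(patient_id = 1:4,
              creatine = c(1.7, NA, 0.4, 5.2))

ascending  <- arrange(toy, creatine)
descending <- arrange(toy, desc(creatine))

# The first row holds the minimum or maximum, respectively.
ascending$creatine[1]  == min(toy$creatine, na.rm = TRUE)
descending$creatine[1] == max(toy$creatine, na.rm = TRUE)

# The NA ends up in the last row in both cases.
is.na(tail(ascending$creatine, 1))
is.na(tail(descending$creatine, 1))
```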

Sorting on multiple levels is very simple. Just add another argument on which column to sort. You can even first sort from high to low and then from low to high.

# Sort on Blood pressure and then on creatine level, both from low to high.
tibble8 <- arrange(tibble1, `Blood pressure (mm/Hg)`, `[Creatine] (mg/dl)`)
formatted_table(head(tibble8))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
1 7 50 1.020 4 0 NA normal notpresent notpresent NA 18 0.8 NA NA 11.3 38 6000 NA no no no good no no ckd
186 8 50 1.020 4 0 normal normal notpresent notpresent NA 46 1.0 135 3.8 NA NA NA NA no no no good yes no ckd
175 60 50 1.010 0 0 NA normal notpresent notpresent 261 58 2.2 113 3.0 NA NA 4200 3.4 yes no no good no no ckd
229 59 50 1.010 3 0 normal abnormal notpresent notpresent 241 191 12.0 114 2.9 9.6 31 15700 3.8 no yes no good yes no ckd
354 32 60 1.025 0 0 normal normal notpresent notpresent 102 17 0.4 147 4.7 14.6 41 6800 5.1 no no no good no no notckd
307 47 60 1.020 0 0 normal normal notpresent notpresent 137 17 0.5 150 3.5 13.6 44 7900 4.5 no no no good no no notckd
# Sort on Blood pressure from low to high and then on creatine level from high to low.
tibble9 <- arrange(tibble1, `Blood pressure (mm/Hg)`, desc(`[Creatine] (mg/dl)`))
formatted_table(head(tibble9))
patient_id <dbl> | Age (years) <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Albumine <dbl> | Sugar <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
229 59 50 1.010 3 0 normal abnormal notpresent notpresent 241 191 12.0 114 2.9 9.6 31 15700 3.8 no yes no good yes no ckd
175 60 50 1.010 0 0 NA normal notpresent notpresent 261 58 2.2 113 3.0 NA NA 4200 3.4 yes no no good no no ckd
186 8 50 1.020 4 0 normal normal notpresent notpresent NA 46 1.0 135 3.8 NA NA NA NA no no no good yes no ckd
1 7 50 1.020 4 0 NA normal notpresent notpresent NA 18 0.8 NA NA 11.3 38 6000 NA no no no good no no ckd
127 71 60 1.015 4 0 normal normal notpresent notpresent 118 125 5.3 136 4.9 11.4 35 15200 4.3 yes yes no poor yes no ckd
189 64 60 1.010 4 1 abnormal abnormal notpresent present 239 58 4.3 137 5.4 9.5 29 7500 3.4 yes yes no poor yes no ckd

If you only want to work on some of the columns, you can select them with the select() function.

# Select the columns for patient_id, Blood pressure, sodium and potassium levels.
tibble10 <- select(tibble1, patient_id, `Blood pressure (mm/Hg)`, 
                   `[Na] (mEq/L)`, `[K] (mEq/L)`)
formatted_table(head(tibble10))
patient_id <dbl> | Blood pressure (mm/Hg) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl>
157 70 136 4.7
109 70 NA NA
17 80 139 3.7
347 60 144 5.0
24 100 129 4.0
175 50 113 3.0

If you want to leave out columns, use the ‘-’ sign to indicate which columns should not be present.

# Leave out the columns for Age, Albumine and Sugar.
tibble11 <- select(tibble1, -`Age (years)`, -Albumine, -Sugar)
formatted_table(head(tibble11))
patient_id <dbl> | Blood pressure (mm/Hg) <dbl> | Specific gravity <dbl> | Red blood cells <chr> | Pus in cells <chr> | Pus cell clumps <chr> | Bacteria <chr> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl> | [Na] (mEq/L) <dbl> | [K] (mEq/L) <dbl> | Hemoglobine (mg) <dbl> | Packed cell volume <dbl> | White blood cell count (cells/µl) <dbl> | Red blood cell count (millions/µl) <dbl> | Hypertension <chr> | Diabetes mellitus <chr> | Coronary Artery Disease <chr> | Appetite <chr> | Pedal edema <chr> | Anemia <chr> | Classification <chr>
157 70 1.025 normal abnormal notpresent notpresent 122 42.0 1.7 136 4.7 12.6 39 7900 3.9 yes yes no good no no ckd
109 70 NA NA NA notpresent notpresent 233 50.1 1.9 NA NA 11.7 NA NA NA no yes no good no no ckd
17 80 NA NA NA notpresent notpresent 114 87.0 5.2 139 3.7 12.1 NA NA NA yes no no poor no no possibleckd
347 60 1.025 normal normal notpresent notpresent 108 25.0 1.0 144 5.0 17.8 43 7200 5.5 no no no good no no notckd
24 100 1.015 normal abnormal notpresent present NA 50.0 1.4 129 4.0 11.1 39 8300 4.6 yes no no poor no no possibleckd
175 50 1.010 NA normal notpresent notpresent 261 58.0 2.2 113 3.0 NA NA 4200 3.4 yes no no good no no ckd

If several column names share a common beginning or ending, you can select them together with the functions starts_with() or ends_with().

# Select the columns that end with '(mg/dl)'.
# Select the columns that start with 'Pus'.
tibble12 <- select(tibble1, patient_id, ends_with("(mg/dl)"))
formatted_table(head(tibble12))
patient_id <dbl> | [Glucose] (mg/dl) <dbl> | [Blood urea] (mg/dl) <dbl> | [Creatine] (mg/dl) <dbl>
157 122 42.0 1.7
109 233 50.1 1.9
17 114 87.0 5.2
347 108 25.0 1.0
24 NA 50.0 1.4
175 261 58.0 2.2
tibble13 <- select(tibble1, patient_id, starts_with("Pus"))
formatted_table(head(tibble13))
patient_id <dbl> | Pus in cells <chr> | Pus cell clumps <chr>
157 abnormal notpresent
109 NA notpresent
17 NA notpresent
347 normal notpresent
24 abnormal notpresent
175 normal notpresent
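
Two related selection helpers may be useful next to starts_with() and ends_with(): contains() matches text anywhere in the column name, and all_of() selects the columns listed in a character vector. A minimal sketch on made-up data (the tibble toy and the vector wanted are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with column names in the style of the kidney data set.
toy <- tibble(patient_id = 1:2,
              `[Glucose] (mg/dl)` = c(122, 233),
              `[Na] (mEq/L)` = c(136, NA),
              Appetite = c("good", "poor"))

# contains() matches the literal text anywhere in the name.
by_text <- select(toy, patient_id, contains("mEq"))

# all_of() takes the column names from a character vector.
wanted  <- c("patient_id", "Appetite")
by_name <- select(toy, all_of(wanted))

names(by_text)
names(by_name)
```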

Because there is so much data to analyze, it might be helpful to look at a summary of all the data that is present in the data frame.

# Give a summary of the data in the complete data frame.
summary(tibble1)
##    patient_id     Age (years)    Blood pressure (mm/Hg) Specific gravity
##  Min.   :  1.0   Min.   : 2.00   Min.   : 50.00         Min.   :1.005   
##  1st Qu.:110.5   1st Qu.:42.00   1st Qu.: 70.00         1st Qu.:1.010   
##  Median :202.0   Median :55.00   Median : 70.00         Median :1.020   
##  Mean   :202.9   Mean   :51.45   Mean   : 76.05         Mean   :1.017   
##  3rd Qu.:302.2   3rd Qu.:65.00   3rd Qu.: 80.00         3rd Qu.:1.020   
##  Max.   :399.0   Max.   :90.00   Max.   :180.00         Max.   :1.025   
##                  NA's   :5       NA's   :9              NA's   :36      
##     Albumine         Sugar       Red blood cells    Pus in cells      
##  Min.   :0.000   Min.   :0.000   Length:280         Length:280        
##  1st Qu.:0.000   1st Qu.:0.000   Class :character   Class :character  
##  Median :0.000   Median :0.000   Mode  :character   Mode  :character  
##  Mean   :1.024   Mean   :0.438                                        
##  3rd Qu.:2.000   3rd Qu.:0.000                                        
##  Max.   :5.000   Max.   :5.000                                        
##  NA's   :35      NA's   :38                                           
##  Pus cell clumps      Bacteria         [Glucose] (mg/dl) [Blood urea] (mg/dl)
##  Length:280         Length:280         Min.   : 70.0     Min.   : 10.00      
##  Class :character   Class :character   1st Qu.:100.0     1st Qu.: 27.25      
##  Mode  :character   Mode  :character   Median :124.0     Median : 41.00      
##                                        Mean   :150.2     Mean   : 56.98      
##                                        3rd Qu.:171.5     3rd Qu.: 64.75      
##                                        Max.   :490.0     Max.   :391.00      
##                                        NA's   :33        NA's   :14          
##  [Creatine] (mg/dl)  [Na] (mEq/L)    [K] (mEq/L)     Hemoglobine (mg)
##  Min.   : 0.400     Min.   :  4.5   Min.   : 2.700   Min.   : 3.10   
##  1st Qu.: 0.900     1st Qu.:135.0   1st Qu.: 3.900   1st Qu.:10.50   
##  Median : 1.300     Median :138.0   Median : 4.400   Median :12.70   
##  Mean   : 3.006     Mean   :137.3   Mean   : 4.754   Mean   :12.53   
##  3rd Qu.: 2.800     3rd Qu.:141.0   3rd Qu.: 4.900   3rd Qu.:14.90   
##  Max.   :76.000     Max.   :163.0   Max.   :47.000   Max.   :17.80   
##  NA's   :12         NA's   :67      NA's   :68       NA's   :39      
##  Packed cell volume White blood cell count (cells/µl)
##  Min.   : 9.00      Min.   : 2200                    
##  1st Qu.:33.00      1st Qu.: 6325                    
##  Median :41.00      Median : 7900                    
##  Mean   :39.17      Mean   : 8355                    
##  3rd Qu.:46.00      3rd Qu.: 9800                    
##  Max.   :54.00      Max.   :26400                    
##  NA's   :51         NA's   :78                       
##  Red blood cell count (millions/µl) Hypertension       Diabetes mellitus 
##  Min.   :2.100                      Length:280         Length:280        
##  1st Qu.:3.925                      Class :character   Class :character  
##  Median :4.800                      Mode  :character   Mode  :character  
##  Mean   :4.706                                                           
##  3rd Qu.:5.500                                                           
##  Max.   :8.000                                                           
##  NA's   :94                                                              
##  Coronary Artery Disease   Appetite         Pedal edema       
##  Length:280              Length:280         Length:280        
##  Class :character        Class :character   Class :character  
##  Mode  :character        Mode  :character   Mode  :character  
##                                                               
##                                                               
##                                                               
##                                                               
##     Anemia          Classification    
##  Length:280         Length:280        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
# Give a summary of the Blood pressure data.
summary(tibble1$`Blood pressure (mm/Hg)`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   50.00   70.00   70.00   76.05   80.00  180.00       9

The summary contains the most commonly used statistics (such as the minimum, mean and maximum values) for the columns that contain numeric data. For character columns it only reports the length, class and mode.

Rounding numbers

You see in the previous tables that the calculated values are not rounded to 2 or 3 decimals. If you want to round to a specific number of decimals, you can use the round() function. NOTE: rounding follows the IEC 60559 standard: a value that lies exactly halfway between two candidates (such as 0.5 or 2.5) is rounded to the nearest even number.

# Compare how the following numbers are rounded:
round(0.5)
## [1] 0
round(1.5)
## [1] 2
round(2.5)
## [1] 2
round(3.5)
## [1] 4
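
The half-to-even behaviour can be checked on a whole vector at once. The sketch below also defines round_half_up(), a hypothetical helper (not part of base R or of this lesson) that rounds halves away from zero, for cases where that rule is wanted. It is only meant for values that are exactly representable as binary fractions, such as the halves used here.

```r
# Values that lie exactly halfway between two integers.
halves <- c(0.5, 1.5, 2.5, 3.5)

# Base R: halves go to the nearest even integer.
round(halves)

# Hypothetical helper: round halves away from zero instead.
round_half_up <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}

round_half_up(halves)
```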

If you want to round to a certain number of decimals, use the digits argument.

# Round a number to different numbers of decimals.
round(2.83456785, digits = 2)
## [1] 2.83
round(2.83455775, digits = 4)
## [1] 2.8346
round(2.83445775, digits = 4)
## [1] 2.8345
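
round() changes the value itself, not the way it is printed, so trailing zeros disappear (3.10 is printed as 3.1). When a fixed number of decimals should appear in a sentence, base R's sprintf() or format() can be used instead. A small sketch with a made-up value:

```r
value <- 3.006   # made-up example value

# round() returns a number; the trailing zero is not printed.
round(value, digits = 1)

# sprintf() returns text with exactly one decimal.
sprintf("%.1f", value)

# format() with nsmall keeps at least one decimal.
format(round(value, digits = 1), nsmall = 1)

paste("The value is:", sprintf("%.1f", value), "mg/dl")
```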


Statistics

R is designed for statistical analysis. Statistics itself is outside the scope of this course, but we will discuss the main statistical calculations that are used in the lab.

We have encountered a couple of the statistical calculations already.

# Calculate the average of the 'White blood cell count (cells/µl)'. Round the average to 2 decimals.
# Present it nicely in a sentence.
WBC <- round(mean(tibble1$`White blood cell count (cells/µl)`, na.rm = T), digits = 2)
paste("The average White Bloodcell Count is:", WBC, "cells/µl")
## [1] "The average White Bloodcell Count is: 8354.95 cells/µl"
# What are the minimum and maximum amounts of Hemoglobine? Present without decimals.
# Present it nicely in a sentence.
hemo_min <- round(min(tibble1$`Hemoglobine (mg)`, na.rm = T))
hemo_max <- round(max(tibble1$`Hemoglobine (mg)`, na.rm = T))
paste("The minimum amount of hemoglobin is:", hemo_min, "mg")
## [1] "The minimum amount of hemoglobin is: 3 mg"
paste("The maximum amount of hemoglobin is:", hemo_max, "mg")
## [1] "The maximum amount of hemoglobin is: 18 mg"
# What is the median of the age of the patients? Present without decimals.
# Present it nicely in a sentence.
median_pat <- round(median(tibble1$`Age (years)`, na.rm = T))
paste("The median age of the patients is:", median_pat, "years")
## [1] "The median age of the patients is: 55 years"
# Present the quantiles for 25%, 50% and 75% of the glucose levels. Round to two decimals.
round(quantile(tibble1$`[Glucose] (mg/dl)`, c(0.25, 0.50, 0.75), na.rm = T), digits = 2)
##   25%   50%   75% 
## 100.0 124.0 171.5
# Calculate the average, the standard deviation and the standard error of the mean for the Creatine levels.
# Round the average and the standard deviation to one decimal and the standard error to three decimals.
# Present them in a sentence.
mean_crea <- round(mean(tibble1$`[Creatine] (mg/dl)`, na.rm = T), digits = 1)
sd_crea <- round(sd(tibble1$`[Creatine] (mg/dl)`, na.rm = T), digits = 1)
# Note: length() also counts the NA values, so n here is the number of rows, not the number of measurements.
se_crea <- round(sd(tibble1$`[Creatine] (mg/dl)`, 
                    na.rm = T)/sqrt(length(tibble1$`[Creatine] (mg/dl)`)), 
                 digits = 3)
paste("The average Creatine level is:", mean_crea, "mg/dl")
## [1] "The average Creatine level is: 3 mg/dl"
paste("The standard deviation of the Creatine level is:", sd_crea)
## [1] "The standard deviation of the Creatine level is: 5.9"
paste("The standard error of the mean for the Creatine levels is:", se_crea)
## [1] "The standard error of the mean for the Creatine levels is: 0.35"
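
The standard error of the mean is the standard deviation divided by the square root of the number of measurements. When a column contains NA values, that number should be the count of non-missing values. The sketch below uses a made-up vector and a hypothetical helper sem() to show the difference with dividing by length():

```r
# Made-up measurements with one missing value.
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)

# Hypothetical helper: standard error of the mean, ignoring NA values.
sem <- function(values) {
  values <- values[!is.na(values)]
  sd(values) / sqrt(length(values))
}

n_all <- length(x)        # counts the NA as well
n_obs <- sum(!is.na(x))   # counts only the real measurements

se_naive <- sd(x, na.rm = TRUE) / sqrt(n_all)
se_correct <- sem(x)

round(c(naive = se_naive, correct = se_correct), digits = 3)
```

Dividing by the larger n makes the standard error look smaller than it really is.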


Summary of data

With the tidyverse you can present these statistics in a tibble using summarize(), and combine it with group_by() to calculate and show them per category.

# Summarize the Blood pressure and present it in a tibble.
bp1 <- summarize(tibble1, `Blood pressure (mm/Hg)` = mean(`Blood pressure (mm/Hg)`, 
                                                         na.rm = T))
formatted_table(bp1)
Blood pressure (mm/Hg)
dbl
76.05166
# Calculate the average Blood pressure per category in 'Classification': 'ckd' (chronic kidney disease), 'notckd' (not chronic kidney disease) and 'possibleckd'.
by_class <- group_by(tibble1, Classification)
bp2 <- summarize(by_class, `Blood pressure (mm/Hg)` = mean(`Blood pressure (mm/Hg)`, 
                                                         na.rm = T))
formatted_table(bp2)
Classification   Blood pressure (mm/Hg)
chr              dbl
ckd              78.19444
notckd           71.05769
possibleckd      85.21739
# Add other variables for which you can calculate the mean.
bp3 <- summarize(by_class, `Blood pressure (mm/Hg)` = mean(`Blood pressure (mm/Hg)`, 
                                                         na.rm = T), 
                 `Age (years)` = mean(`Age (years)`, na.rm = T), 
                 `[Glucose] (mg/dl)` = mean(`[Glucose] (mg/dl)`, na.rm = T))
formatted_table(bp3)
Classification   Blood pressure (mm/Hg)   Age (years)   [Glucose] (mg/dl)
chr              dbl                      dbl           dbl
ckd              78.19444                 55.85135      178.1102
notckd           71.05769                 45.80189      108.3500
possibleckd      85.21739                 49.00000      182.0500
# And add the standard deviation to make the data complete.
bp4 <- summarize(by_class, mean_bp = mean(`Blood pressure (mm/Hg)`, na.rm = T),
                 sd_bp = sd(`Blood pressure (mm/Hg)`, na.rm = T),
                 mean_age = mean(`Age (years)`, na.rm = T),
                 sd_age = sd(`Age (years)`, na.rm = T),
                 mean_glucose = mean(`[Glucose] (mg/dl)`, na.rm = T), 
                 sd_glucose = sd(`[Glucose] (mg/dl)`, na.rm = T))
formatted_table(bp4)
Classification   mean_bp    sd_bp       mean_age   sd_age     mean_glucose   sd_glucose
chr              dbl        dbl         dbl        dbl        dbl            dbl
ckd              78.19444   14.467363   55.85135   18.01185   178.1102       86.75611
notckd           71.05769   8.580659    45.80189   15.86336   108.3500       18.26537
possibleckd      85.21739   23.523598   49.00000   12.64120   182.0500       101.65757
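
The same pattern can be written as one pipeline, and it is often useful to report how many measurements each mean is based on. A minimal sketch on a made-up tibble (the column names class and bp are assumptions for illustration, not columns of the kidney data):

```r
library(dplyr)

# Made-up data: two classes, one missing blood pressure value.
toy <- tibble(
  class = c("ckd", "ckd", "ckd", "notckd", "notckd"),
  bp    = c(80, 90, NA, 70, 74)
)

toy_summary <- toy %>%
  group_by(class) %>%
  summarize(
    n_rows  = n(),                 # rows per class, NA included
    n_obs   = sum(!is.na(bp)),     # measurements per class
    mean_bp = mean(bp, na.rm = TRUE),
    sd_bp   = sd(bp, na.rm = TRUE)
  )

toy_summary
```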

Using summarize_each() makes it easier to calculate these statistics for (selected) columns such as blood pressure, age and glucose concentration, and to present them in an orderly manner. However, there are many NA values in this data set. Remember that a calculation on data that contains NA values returns NA.

paste("Mean with NA values:", mean(c(6.5, 3.5, 1.3, NA, 7.8)))
## [1] "Mean with NA values: NA"
paste("Mean without NA values:", mean(c(6.5, 3.5, 1.3, 4.5, 7.8)))
## [1] "Mean without NA values: 4.72"
# Resolve with the `na.rm = ` argument.
paste("Mean with NA values removed:", mean(c(6.5, 3.5, 1.3, NA, 7.8), na.rm = T))
## [1] "Mean with NA values removed: 4.775"
paste("Sum with NA values:", sum(c(6.5, 3.5, 1.3, NA, 7.8)))
## [1] "Sum with NA values: NA"
paste("Sum without NA values:", sum(c(6.5, 3.5, 1.3, 4.5, 7.8)))
## [1] "Sum without NA values: 23.6"
# Resolve with the `na.rm = ` argument.
paste("Sum with NA values removed:", sum(c(6.5, 3.5, 1.3, NA, 7.8), na.rm = T))
## [1] "Sum with NA values removed: 19.1"
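
Before choosing between the na.rm argument and dropping rows, it helps to know how many values are missing. A short base R sketch on a made-up data frame:

```r
# Made-up data frame with missing values in both columns.
df <- data.frame(
  age     = c(48, NA, 62, 51),
  glucose = c(121, NA, NA, 117)
)

# is.na() marks every missing value; colSums() counts them per column.
na_per_column <- colSums(is.na(df))
na_per_column

# Number of rows without any missing value.
complete_rows <- sum(complete.cases(df))
complete_rows
```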

To use summarize_each(), we first drop the rows that contain NA values, so that the calculations can be done.

# First drop all rows with NA values to be able to do calculations.
# Then use `summarize_each()` to do the calculations for all columns.
bp5 <- drop_na(by_class)
bp5_stats <- summarize_each(bp5, funs(mean, sd, se = sd(.)/sqrt(n())))
formatted_table(bp5_stats)
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
Red blood cells_mean
dbl
Pus in cells_mean
dbl
Pus cell clumps_mean
dbl
Bacteria_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
Hypertension_mean
dbl
Diabetes mellitus_mean
dbl
Coronary Artery Disease_mean
dbl
Appetite_mean
dbl
Pedal edema_mean
dbl
Anemia_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
Red blood cells_sd
dbl
Pus in cells_sd
dbl
Pus cell clumps_sd
dbl
Bacteria_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
Hypertension_sd
dbl
Diabetes mellitus_sd
dbl
Coronary Artery Disease_sd
dbl
Appetite_sd
dbl
Pedal edema_sd
dbl
Anemia_sd
dbl
patient_id_se
dbl
Age (years)_se
dbl
Blood pressure (mm/Hg)_se
dbl
Specific gravity_se
dbl
Albumine_se
dbl
Sugar_se
dbl
Red blood cells_se
dbl
Pus in cells_se
dbl
Pus cell clumps_se
dbl
Bacteria_se
dbl
[Glucose] (mg/dl)_se
dbl
[Blood urea] (mg/dl)_se
dbl
[Creatine] (mg/dl)_se
dbl
[Na] (mEq/L)_se
dbl
[K] (mEq/L)_se
dbl
Hemoglobine (mg)_se
dbl
Packed cell volume_se
dbl
White blood cell count (cells/µl)_se
dbl
Red blood cell count (millions/µl)_se
dbl
Hypertension_se
dbl
Diabetes mellitus_se
dbl
Coronary Artery Disease_se
dbl
Appetite_se
dbl
Pedal edema_se
dbl
Anemia_se
dbl
ckd 142.61538 59.50000 75.76923 1.012692 3.038461 0.9615385 NA NA NA NA 195.9231 100.1538 5.0461538 130.9231 6.219231 9.826923 29.76923 11234.615 3.776923 NA NA NA NA NA NA 72.49832 14.602055 14.470128 0.0045234 1.076318 1.310901 NA NA NA NA 74.84112 66.87762 3.7022405 7.450761 8.3718108 2.436647 8.011530 5101.368 1.1229631 NA NA NA NA NA NA 14.218090 2.863698 2.837826 0.0008871 0.2110834 0.2570888 NA NA NA NA 14.677552 13.115779 0.7260691 1.4612145 1.6418472 0.4778657 1.5711903 1000.4605 0.2202312 NA NA NA NA NA NA
notckd 324.66667 46.05128 71.66667 1.022756 0.000000 0.0000000 NA NA NA NA 107.9872 33.5000 0.8576923 141.7051 4.333333 15.025641 46.53846 7596.154 5.302564 NA NA NA NA NA NA 45.31955 15.829364 8.888438 0.0025029 0.000000 0.000000 NA NA NA NA 18.20125 11.42877 0.2601295 4.952001 0.5801179 1.354112 3.959838 1816.836 0.5500825 NA NA NA NA NA NA 5.131428 1.792323 1.006417 0.0002834 0.0000000 0.0000000 NA NA NA NA 2.060886 1.294054 0.0294539 0.5607037 0.0656854 0.1533230 0.4483634 205.7162 0.0622846 NA NA NA NA NA NA
possibleckd 79.33333 59.00000 86.66667 1.013333 2.000000 0.0000000 NA NA NA NA 135.6667 102.6667 4.3000000 134.0000 5.066667 9.300000 28.66667 8700.000 3.566667 NA NA NA NA NA NA 62.17180 2.645751 5.773503 0.0028868 0.000000 0.000000 NA NA NA NA 34.48671 47.64802 2.2271057 2.645751 0.2309401 1.708801 4.509250 2095.233 0.4725816 NA NA NA NA NA NA 35.894908 1.527525 3.333333 0.0016667 0.0000000 0.0000000 NA NA NA NA 19.910913 27.509594 1.2858201 1.5275252 0.1333333 0.9865766 2.6034166 1209.6832 0.2728451 NA NA NA NA NA NA
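
Note that drop_na() without arguments removes every row that has an NA in any column, so values that were measured are thrown away too. This is why the means above differ from the ones calculated earlier with na.rm = T (for example 75.77 versus 78.19 for the blood pressure in the ckd group). The sketch below shows the effect on a made-up tibble:

```r
library(dplyr)
library(tidyr)

# Made-up data: each column has one missing value, in different rows.
toy <- tibble(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# Option 1: drop every row with an NA, then calculate.
dropped <- toy %>%
  drop_na() %>%
  summarize(mean_x = mean(x), mean_y = mean(y))

# Option 2: keep all rows and ignore NA values per column.
kept <- toy %>%
  summarize(mean_x = mean(x, na.rm = TRUE), mean_y = mean(y, na.rm = TRUE))

dropped   # based on the 2 complete rows
kept      # based on 3 values per column
```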

summarize_each() is an example of a function that is still available but deprecated: it is no longer updated, because the newer across() function does the same job in a more flexible way. Using across() puts the mean and sd of each variable next to each other (which might be more convenient for your analyses).

# Do the same as for `summarize_each()` with the new `across()` function.
bp6_stats <- bp5 %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
formatted_table(bp6_stats)
Classification
chr
patient_id_mean
dbl
patient_id_sd
dbl
Age (years)_mean
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_mean
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_mean
dbl
Specific gravity_sd
dbl
Albumine_mean
dbl
Albumine_sd
dbl
Sugar_mean
dbl
Sugar_sd
dbl
Red blood cells_mean
dbl
Red blood cells_sd
dbl
Pus in cells_mean
dbl
Pus in cells_sd
dbl
Pus cell clumps_mean
dbl
Pus cell clumps_sd
dbl
Bacteria_mean
dbl
Bacteria_sd
dbl
[Glucose] (mg/dl)_mean
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_mean
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_mean
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_mean
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_mean
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_mean
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_mean
dbl
Red blood cell count (millions/µl)_sd
dbl
Hypertension_mean
dbl
Hypertension_sd
dbl
Diabetes mellitus_mean
dbl
Diabetes mellitus_sd
dbl
Coronary Artery Disease_mean
dbl
Coronary Artery Disease_sd
dbl
Appetite_mean
dbl
Appetite_sd
dbl
Pedal edema_mean
dbl
Pedal edema_sd
dbl
Anemia_mean
dbl
Anemia_sd
dbl
ckd 142.61538 72.49832 59.50000 14.602055 75.76923 14.470128 1.012692 0.0045234 3.038461 1.076318 0.9615385 1.310901 NA NA NA NA NA NA NA NA 195.9231 74.84112 100.1538 66.87762 5.0461538 3.7022405 130.9231 7.450761 6.219231 8.3718108 9.826923 2.436647 29.76923 8.011530 11234.615 5101.368 3.776923 1.1229631 NA NA NA NA NA NA NA NA NA NA NA NA
notckd 324.66667 45.31955 46.05128 15.829364 71.66667 8.888438 1.022756 0.0025029 0.000000 0.000000 0.0000000 0.000000 NA NA NA NA NA NA NA NA 107.9872 18.20125 33.5000 11.42877 0.8576923 0.2601295 141.7051 4.952001 4.333333 0.5801179 15.025641 1.354112 46.53846 3.959838 7596.154 1816.836 5.302564 0.5500825 NA NA NA NA NA NA NA NA NA NA NA NA
possibleckd 79.33333 62.17180 59.00000 2.645751 86.66667 5.773503 1.013333 0.0028868 2.000000 0.000000 0.0000000 0.000000 NA NA NA NA NA NA NA NA 135.6667 34.48671 102.6667 47.64802 4.3000000 2.2271057 134.0000 2.645751 5.066667 0.2309401 9.300000 1.708801 28.66667 4.509250 8700.000 2095.233 3.566667 0.4725816 NA NA NA NA NA NA NA NA NA NA NA NA
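
across() can also restrict the calculation to numeric columns and pass na.rm = TRUE, so neither drop_na() nor a separate select() is needed. A sketch on a made-up tibble; the column names are assumptions for illustration:

```r
library(dplyr)

toy <- tibble(
  class = c("ckd", "ckd", "notckd", "notckd"),
  bp    = c(80, 90, 70, NA),
  note  = c("a", "b", "c", "d")    # character column, skipped below
)

toy_stats <- toy %>%
  group_by(class) %>%
  summarize(across(
    where(is.numeric),
    list(
      mean = ~ mean(.x, na.rm = TRUE),
      sd   = ~ sd(.x, na.rm = TRUE),
      se   = ~ sd(.x, na.rm = TRUE) / sqrt(sum(!is.na(.x)))
    )
  ))

names(toy_stats)
toy_stats
```

The standard deviation of a group with a single measurement is NA, which is what the notckd row shows here.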

Some of the columns contain characters, so calculations are not possible for these columns and their results are NA. You can drop these columns using select().

# Drop the mean ('Red blood cells_mean') and standard deviation ('Red blood cells_sd') columns of 'Red blood cells'.
bp7_stats <- summarize_each(bp5, funs(mean, sd, se = sd(.)/sqrt(n()))) %>%
  select(c(-`Red blood cells_mean`,-`Red blood cells_sd`))
formatted_table(bp7_stats)
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
Pus in cells_mean
dbl
Pus cell clumps_mean
dbl
Bacteria_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
Hypertension_mean
dbl
Diabetes mellitus_mean
dbl
Coronary Artery Disease_mean
dbl
Appetite_mean
dbl
Pedal edema_mean
dbl
Anemia_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
Pus in cells_sd
dbl
Pus cell clumps_sd
dbl
Bacteria_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
Hypertension_sd
dbl
Diabetes mellitus_sd
dbl
Coronary Artery Disease_sd
dbl
Appetite_sd
dbl
Pedal edema_sd
dbl
Anemia_sd
dbl
patient_id_se
dbl
Age (years)_se
dbl
Blood pressure (mm/Hg)_se
dbl
Specific gravity_se
dbl
Albumine_se
dbl
Sugar_se
dbl
Red blood cells_se
dbl
Pus in cells_se
dbl
Pus cell clumps_se
dbl
Bacteria_se
dbl
[Glucose] (mg/dl)_se
dbl
[Blood urea] (mg/dl)_se
dbl
[Creatine] (mg/dl)_se
dbl
[Na] (mEq/L)_se
dbl
[K] (mEq/L)_se
dbl
Hemoglobine (mg)_se
dbl
Packed cell volume_se
dbl
White blood cell count (cells/µl)_se
dbl
Red blood cell count (millions/µl)_se
dbl
Hypertension_se
dbl
Diabetes mellitus_se
dbl
Coronary Artery Disease_se
dbl
Appetite_se
dbl
Pedal edema_se
dbl
Anemia_se
dbl
ckd 142.61538 59.50000 75.76923 1.012692 3.038461 0.9615385 NA NA NA 195.9231 100.1538 5.0461538 130.9231 6.219231 9.826923 29.76923 11234.615 3.776923 NA NA NA NA NA NA 72.49832 14.602055 14.470128 0.0045234 1.076318 1.310901 NA NA NA 74.84112 66.87762 3.7022405 7.450761 8.3718108 2.436647 8.011530 5101.368 1.1229631 NA NA NA NA NA NA 14.218090 2.863698 2.837826 0.0008871 0.2110834 0.2570888 NA NA NA NA 14.677552 13.115779 0.7260691 1.4612145 1.6418472 0.4778657 1.5711903 1000.4605 0.2202312 NA NA NA NA NA NA
notckd 324.66667 46.05128 71.66667 1.022756 0.000000 0.0000000 NA NA NA 107.9872 33.5000 0.8576923 141.7051 4.333333 15.025641 46.53846 7596.154 5.302564 NA NA NA NA NA NA 45.31955 15.829364 8.888438 0.0025029 0.000000 0.000000 NA NA NA 18.20125 11.42877 0.2601295 4.952001 0.5801179 1.354112 3.959838 1816.836 0.5500825 NA NA NA NA NA NA 5.131428 1.792323 1.006417 0.0002834 0.0000000 0.0000000 NA NA NA NA 2.060886 1.294054 0.0294539 0.5607037 0.0656854 0.1533230 0.4483634 205.7162 0.0622846 NA NA NA NA NA NA
possibleckd 79.33333 59.00000 86.66667 1.013333 2.000000 0.0000000 NA NA NA 135.6667 102.6667 4.3000000 134.0000 5.066667 9.300000 28.66667 8700.000 3.566667 NA NA NA NA NA NA 62.17180 2.645751 5.773503 0.0028868 0.000000 0.000000 NA NA NA 34.48671 47.64802 2.2271057 2.645751 0.2309401 1.708801 4.509250 2095.233 0.4725816 NA NA NA NA NA NA 35.894908 1.527525 3.333333 0.0016667 0.0000000 0.0000000 NA NA NA NA 19.910913 27.509594 1.2858201 1.5275252 0.1333333 0.9865766 2.6034166 1209.6832 0.2728451 NA NA NA NA NA NA
# Or simplify using the pipe operator %>% ('forward-chaining').
bp8_stats <- bp5 %>%
  summarise_each(funs(mean, sd, se = sd(.)/sqrt(n()))) %>%
  select(c(-`Red blood cells_mean`,-`Red blood cells_sd`))
formatted_table(bp8_stats)
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
Pus in cells_mean
dbl
Pus cell clumps_mean
dbl
Bacteria_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
Hypertension_mean
dbl
Diabetes mellitus_mean
dbl
Coronary Artery Disease_mean
dbl
Appetite_mean
dbl
Pedal edema_mean
dbl
Anemia_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
Pus in cells_sd
dbl
Pus cell clumps_sd
dbl
Bacteria_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
Hypertension_sd
dbl
Diabetes mellitus_sd
dbl
Coronary Artery Disease_sd
dbl
Appetite_sd
dbl
Pedal edema_sd
dbl
Anemia_sd
dbl
patient_id_se
dbl
Age (years)_se
dbl
Blood pressure (mm/Hg)_se
dbl
Specific gravity_se
dbl
Albumine_se
dbl
Sugar_se
dbl
Red blood cells_se
dbl
Pus in cells_se
dbl
Pus cell clumps_se
dbl
Bacteria_se
dbl
[Glucose] (mg/dl)_se
dbl
[Blood urea] (mg/dl)_se
dbl
[Creatine] (mg/dl)_se
dbl
[Na] (mEq/L)_se
dbl
[K] (mEq/L)_se
dbl
Hemoglobine (mg)_se
dbl
Packed cell volume_se
dbl
White blood cell count (cells/µl)_se
dbl
Red blood cell count (millions/µl)_se
dbl
Hypertension_se
dbl
Diabetes mellitus_se
dbl
Coronary Artery Disease_se
dbl
Appetite_se
dbl
Pedal edema_se
dbl
Anemia_se
dbl
ckd 142.61538 59.50000 75.76923 1.012692 3.038461 0.9615385 NA NA NA 195.9231 100.1538 5.0461538 130.9231 6.219231 9.826923 29.76923 11234.615 3.776923 NA NA NA NA NA NA 72.49832 14.602055 14.470128 0.0045234 1.076318 1.310901 NA NA NA 74.84112 66.87762 3.7022405 7.450761 8.3718108 2.436647 8.011530 5101.368 1.1229631 NA NA NA NA NA NA 14.218090 2.863698 2.837826 0.0008871 0.2110834 0.2570888 NA NA NA NA 14.677552 13.115779 0.7260691 1.4612145 1.6418472 0.4778657 1.5711903 1000.4605 0.2202312 NA NA NA NA NA NA
notckd 324.66667 46.05128 71.66667 1.022756 0.000000 0.0000000 NA NA NA 107.9872 33.5000 0.8576923 141.7051 4.333333 15.025641 46.53846 7596.154 5.302564 NA NA NA NA NA NA 45.31955 15.829364 8.888438 0.0025029 0.000000 0.000000 NA NA NA 18.20125 11.42877 0.2601295 4.952001 0.5801179 1.354112 3.959838 1816.836 0.5500825 NA NA NA NA NA NA 5.131428 1.792323 1.006417 0.0002834 0.0000000 0.0000000 NA NA NA NA 2.060886 1.294054 0.0294539 0.5607037 0.0656854 0.1533230 0.4483634 205.7162 0.0622846 NA NA NA NA NA NA
possibleckd 79.33333 59.00000 86.66667 1.013333 2.000000 0.0000000 NA NA NA 135.6667 102.6667 4.3000000 134.0000 5.066667 9.300000 28.66667 8700.000 3.566667 NA NA NA NA NA NA 62.17180 2.645751 5.773503 0.0028868 0.000000 0.000000 NA NA NA 34.48671 47.64802 2.2271057 2.645751 0.2309401 1.708801 4.509250 2095.233 0.4725816 NA NA NA NA NA NA 35.894908 1.527525 3.333333 0.0016667 0.0000000 0.0000000 NA NA NA NA 19.910913 27.509594 1.2858201 1.5275252 0.1333333 0.9865766 2.6034166 1209.6832 0.2728451 NA NA NA NA NA NA
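
Instead of naming every unwanted column, the columns that contain only NA values can be removed in one step. A sketch using select() with where() on a made-up summary table (the column names and numbers are assumptions for illustration):

```r
library(dplyr)

# Made-up summary: the 'cells' columns hold only NA values.
toy_stats <- tibble(
  class      = c("ckd", "notckd"),
  bp_mean    = c(78.2, 71.1),
  cells_mean = c(NA_real_, NA_real_),
  bp_sd      = c(14.5, 8.6),
  cells_sd   = c(NA_real_, NA_real_)
)

# Keep a column only if not all of its values are NA.
cleaned <- toy_stats %>%
  select(where(~ !all(is.na(.x))))

names(cleaned)
```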

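It is also possible to remove all columns with character type data at once, by keeping only the numeric columns with select_if(). The grouping column Classification is kept, as the output shows.

# Remove columns with character type by selecting only the numeric columns.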
bp9_stats <- select_if(drop_na(group_by(tibble1, Classification)), is.numeric)
formatted_table(summarise_each(bp9_stats, funs(mean, sd, se = sd(.)/sqrt(n()))))
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
patient_id_se
dbl
Age (years)_se
dbl
Blood pressure (mm/Hg)_se
dbl
Specific gravity_se
dbl
Albumine_se
dbl
Sugar_se
dbl
[Glucose] (mg/dl)_se
dbl
[Blood urea] (mg/dl)_se
dbl
[Creatine] (mg/dl)_se
dbl
[Na] (mEq/L)_se
dbl
[K] (mEq/L)_se
dbl
Hemoglobine (mg)_se
dbl
Packed cell volume_se
dbl
White blood cell count (cells/µl)_se
dbl
Red blood cell count (millions/µl)_se
dbl
ckd 142.61538 59.50000 75.76923 1.012692 3.038461 0.9615385 195.9231 100.1538 5.0461538 130.9231 6.219231 9.826923 29.76923 11234.615 3.776923 72.49832 14.602055 14.470128 0.0045234 1.076318 1.310901 74.84112 66.87762 3.7022405 7.450761 8.3718108 2.436647 8.011530 5101.368 1.1229631 14.218090 2.863698 2.837826 0.0008871 0.2110834 0.2570888 14.677552 13.115779 0.7260691 1.4612145 1.6418472 0.4778657 1.5711903 1000.4605 0.2202312
notckd 324.66667 46.05128 71.66667 1.022756 0.000000 0.0000000 107.9872 33.5000 0.8576923 141.7051 4.333333 15.025641 46.53846 7596.154 5.302564 45.31955 15.829364 8.888438 0.0025029 0.000000 0.000000 18.20125 11.42877 0.2601295 4.952001 0.5801179 1.354112 3.959838 1816.836 0.5500825 5.131428 1.792323 1.006417 0.0002834 0.0000000 0.0000000 2.060886 1.294054 0.0294539 0.5607037 0.0656854 0.1533230 0.4483634 205.7162 0.0622846
possibleckd 79.33333 59.00000 86.66667 1.013333 2.000000 0.0000000 135.6667 102.6667 4.3000000 134.0000 5.066667 9.300000 28.66667 8700.000 3.566667 62.17180 2.645751 5.773503 0.0028868 0.000000 0.000000 34.48671 47.64802 2.2271057 2.645751 0.2309401 1.708801 4.509250 2095.233 0.4725816 35.894908 1.527525 3.333333 0.0016667 0.0000000 0.0000000 19.910913 27.509594 1.2858201 1.5275252 0.1333333 0.9865766 2.6034166 1209.6832 0.2728451
# The same calculation, written with the pipe operator.
bp10_stats <- bp5 %>%
  select_if(is.numeric) %>%
  summarise_each(funs(mean, sd, se = sd(.)/sqrt(n())))
formatted_table(bp10_stats)
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
patient_id_se
dbl
Age (years)_se
dbl
Blood pressure (mm/Hg)_se
dbl
Specific gravity_se
dbl
Albumine_se
dbl
Sugar_se
dbl
[Glucose] (mg/dl)_se
dbl
[Blood urea] (mg/dl)_se
dbl
[Creatine] (mg/dl)_se
dbl
[Na] (mEq/L)_se
dbl
[K] (mEq/L)_se
dbl
Hemoglobine (mg)_se
dbl
Packed cell volume_se
dbl
White blood cell count (cells/µl)_se
dbl
Red blood cell count (millions/µl)_se
dbl
ckd 142.61538 59.50000 75.76923 1.012692 3.038461 0.9615385 195.9231 100.1538 5.0461538 130.9231 6.219231 9.826923 29.76923 11234.615 3.776923 72.49832 14.602055 14.470128 0.0045234 1.076318 1.310901 74.84112 66.87762 3.7022405 7.450761 8.3718108 2.436647 8.011530 5101.368 1.1229631 14.218090 2.863698 2.837826 0.0008871 0.2110834 0.2570888 14.677552 13.115779 0.7260691 1.4612145 1.6418472 0.4778657 1.5711903 1000.4605 0.2202312
notckd 324.66667 46.05128 71.66667 1.022756 0.000000 0.0000000 107.9872 33.5000 0.8576923 141.7051 4.333333 15.025641 46.53846 7596.154 5.302564 45.31955 15.829364 8.888438 0.0025029 0.000000 0.000000 18.20125 11.42877 0.2601295 4.952001 0.5801179 1.354112 3.959838 1816.836 0.5500825 5.131428 1.792323 1.006417 0.0002834 0.0000000 0.0000000 2.060886 1.294054 0.0294539 0.5607037 0.0656854 0.1533230 0.4483634 205.7162 0.0622846
possibleckd 79.33333 59.00000 86.66667 1.013333 2.000000 0.0000000 135.6667 102.6667 4.3000000 134.0000 5.066667 9.300000 28.66667 8700.000 3.566667 62.17180 2.645751 5.773503 0.0028868 0.000000 0.000000 34.48671 47.64802 2.2271057 2.645751 0.2309401 1.708801 4.509250 2095.233 0.4725816 35.894908 1.527525 3.333333 0.0016667 0.0000000 0.0000000 19.910913 27.509594 1.2858201 1.5275252 0.1333333 0.9865766 2.6034166 1209.6832 0.2728451
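
A summary with dozens of columns is hard to read. One option, not covered in this lesson, is to reshape it so that every variable becomes a row, using pivot_longer() from tidyr. A sketch on a made-up summary table; it assumes that the column names end in _mean or _sd, as they do above:

```r
library(dplyr)
library(tidyr)

# Made-up wide summary with one row per class.
wide <- tibble(
  class    = c("ckd", "notckd"),
  bp_mean  = c(78, 71),
  bp_sd    = c(14, 9),
  age_mean = c(56, 46),
  age_sd   = c(18, 16)
)

# Split each column name into the variable and the statistic;
# ".value" turns the statistic part into separate columns.
long <- wide %>%
  pivot_longer(
    cols = -class,
    names_to = c("variable", ".value"),
    names_pattern = "(.*)_(mean|sd)$"
  )

long
```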

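Note: summarize_all() and summarize_each() give the same results. Because the rows with NA values are not dropped here and na.rm is not set, most of the results are NA.

# Use `summarize_all()` and `summarize_each()` on the same data and compare the results.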
df_summ_all <- tibble1 %>%
  group_by(Classification) %>%
  summarize_all(list(mean = mean, sd = sd))
formatted_table(df_summ_all)
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
Red blood cells_mean
dbl
Pus in cells_mean
dbl
Pus cell clumps_mean
dbl
Bacteria_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
Hypertension_mean
dbl
Diabetes mellitus_mean
dbl
Coronary Artery Disease_mean
dbl
Appetite_mean
dbl
Pedal edema_mean
dbl
Anemia_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
Red blood cells_sd
dbl
Pus in cells_sd
dbl
Pus cell clumps_sd
dbl
Bacteria_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
Hypertension_sd
dbl
Diabetes mellitus_sd
dbl
Coronary Artery Disease_sd
dbl
Appetite_sd
dbl
Pedal edema_sd
dbl
Anemia_sd
dbl
ckd 131.0199 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 71.85573 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
notckd 322.5755 45.80189 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 41.38759 15.86336 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
possibleckd 123.6087 NA 85.21739 NA NA NA NA NA NA NA NA 71.56522 3.678261 NA NA NA NA NA NA NA NA NA NA NA NA 68.00851 NA 23.5236 NA NA NA NA NA NA NA NA 47.47423 3.860635 NA NA NA NA NA NA NA NA NA NA NA NA
by_heart <- group_by(tibble1, Classification)
formatted_table(summarize_each(by_heart, funs(mean, sd)))
Classification
chr
patient_id_mean
dbl
Age (years)_mean
dbl
Blood pressure (mm/Hg)_mean
dbl
Specific gravity_mean
dbl
Albumine_mean
dbl
Sugar_mean
dbl
Red blood cells_mean
dbl
Pus in cells_mean
dbl
Pus cell clumps_mean
dbl
Bacteria_mean
dbl
[Glucose] (mg/dl)_mean
dbl
[Blood urea] (mg/dl)_mean
dbl
[Creatine] (mg/dl)_mean
dbl
[Na] (mEq/L)_mean
dbl
[K] (mEq/L)_mean
dbl
Hemoglobine (mg)_mean
dbl
Packed cell volume_mean
dbl
White blood cell count (cells/µl)_mean
dbl
Red blood cell count (millions/µl)_mean
dbl
Hypertension_mean
dbl
Diabetes mellitus_mean
dbl
Coronary Artery Disease_mean
dbl
Appetite_mean
dbl
Pedal edema_mean
dbl
Anemia_mean
dbl
patient_id_sd
dbl
Age (years)_sd
dbl
Blood pressure (mm/Hg)_sd
dbl
Specific gravity_sd
dbl
Albumine_sd
dbl
Sugar_sd
dbl
Red blood cells_sd
dbl
Pus in cells_sd
dbl
Pus cell clumps_sd
dbl
Bacteria_sd
dbl
[Glucose] (mg/dl)_sd
dbl
[Blood urea] (mg/dl)_sd
dbl
[Creatine] (mg/dl)_sd
dbl
[Na] (mEq/L)_sd
dbl
[K] (mEq/L)_sd
dbl
Hemoglobine (mg)_sd
dbl
Packed cell volume_sd
dbl
White blood cell count (cells/µl)_sd
dbl
Red blood cell count (millions/µl)_sd
dbl
Hypertension_sd
dbl
Diabetes mellitus_sd
dbl
Coronary Artery Disease_sd
dbl
Appetite_sd
dbl
Pedal edema_sd
dbl
Anemia_sd
dbl
ckd 131.0199 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 71.85573 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
notckd 322.5755 45.80189 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 41.38759 15.86336 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
possibleckd 123.6087 NA 85.21739 NA NA NA NA NA NA NA NA 71.56522 3.678261 NA NA NA NA NA NA NA NA NA NA NA NA 68.00851 NA 23.5236 NA NA NA NA NA NA NA NA 47.47423 3.860635 NA NA NA NA NA NA NA NA NA NA NA NA


Learning outcomes

In this lesson you have learned to:
- select rows and columns with base R and tidyverse,
- filter and sort data in data frames,
- round numbers and do statistical analysis on data,
- summarize statistical analyses on data frames.


— The end —






This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.