Go back to the main page
Go back to the R overview page
This Rmd file can be downloaded here.
Files used on this page:
- file01_dementia_patients_health_data.csv
R: Data Analysis
Data analysis methods
Let’s start with loading the Tidyverse library:
The following function is used to print tibbles in a proper way for the web. You can skip the use of this function to print tibbles to your screen in R Markdown documents.
library(knitr)
library(kableExtra)
library(pillar)
formatted_table <- function(df) {
col_types <- sapply(df, pillar::type_sum)
new_col_names <- paste0(names(df), "<br>", "<span style='font-weight: normal;'>", col_types, "</span>")
kbl(df,
col.names = new_col_names,
escape = FALSE, # This is crucial to allow the <br> tag to work
format = "html" # Ensure HTML format, although often auto-detected
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}Here we will focus on data transformation and analysis. We will
filter rows with filter(), arrange rows with
arrange() and select columns with select(). In
addition, we will make grouped summaries using summarize().
At last, we will do some basic calculations. Note that plotting data
will be the subject of the next section.
Note: R is like a swiss army knife. Extremely powerfull but also very eloborate. It is impossible to cover all analysis functions of R. So this is not a complete guide for data analysis. Instead, this course will focus on some common tasks. You can always elaborate your analysis by exploring the Tidyverse documentation. Or even use the base R documentation if the specific task you want to perform some custom tasks.
See: tidyverse documentation
See: base R documentation
Loading a dataset
As an example data set, we will be using the same data set as used in
the Excel section: namely, the
Dementia Patient Health,Prescriptions ML Dataset.
file_path <- "./files_09_data_analysis/file01_dementia_patients_health_data.csv"
df <- read_csv(file_path)
formatted_table(head(df))|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 0 | 0.0169728 | 78 | 93.03212 | 36.18387 | 56.83234 | 31.15763 | Galantamine | 12.0 | 61 | Secondary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 1 | Yes | Low-Carb Diet | Poor | Heart Disease | 1 |
| 0 | 0.0090002 | 89 | 93.56650 | 37.32632 | 59.75907 | 37.64044 | NA | NA | 69 | Primary School | Right | Male | Yes | Former Smoker | Negative | Moderate Activity | No | 8 | No | Mediterranean Diet | Poor | Heart Disease | 0 |
| 0 | 0.0864373 | 60 | 93.90651 | 37.03062 | 58.26647 | 50.67399 | Donepezil | 23.0 | 78 | Secondary School | Left | Female | Yes | Never Smoked | Negative | Mild Activity | Yes | 5 | Yes | Balanced Diet | Poor | Hypertension | 1 |
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20.0 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.1140278 | 94 | 94.54675 | 36.67807 | 66.59233 | 21.15486 | Rivastigmine | 1.5 | 67 | No School | Left | Male | No | Former Smoker | Positive | Mild Activity | Yes | 1 | No | Low-Carb Diet | Poor | Diabetes | 1 |
As you can see above, the data is loaded well. Also note that the
columns are of the correct data type (e.g. numeric for the
HeartRate and Weight columns and
character for Prescription and
Gender).
Select rows by index
You can select rows by index using the following notation:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0169728 | 78 | 93.03212 | 36.18387 | 56.83234 | 31.15763 | Galantamine | 12 | 61 | Secondary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 1 | Yes | Low-Carb Diet | Poor | Heart Disease | 1 |
Note the comma between the brackets. Since there is no specification of a column, all columns are selected.
Select column by index
You can also select a column based on an index:
|
Prescription chr |
|---|
| NA |
| Galantamine |
| NA |
| Donepezil |
| Memantine |
| Rivastigmine |
The head() function was used to truncate the output.
Select column by name
You can also select a column by name:
## [1] NA "Galantamine" NA "Donepezil" "Memantine"
## [6] "Rivastigmine"
Note that this action returns a vector instead of a data frame.
Select item by index
You can also select a specific item by index:
|
Prescription chr |
|---|
| Galantamine |
Now we have the value of the second row and the eighth column.
Analysis with Tidyverse functions
The previous examples where all examples of base R. We will now focus on some commonly used Tidyverse functions for the analysis and plotting (next lessons).
Herewith a table with some commonly used Tidyverse functions and the applications:
| Package | Function | Primary Purpose | Description |
|---|---|---|---|
dplyr |
filter() |
Subset rows | Selects a subset of rows based on column values that meet specified logical conditions. |
dplyr |
slice() |
Subset rows by position | Selects rows by their integer position in the data
frame. Also includes helpers like slice_head(),
slice_tail(), slice_min(),
slice_max(), and slice_sample(). |
dplyr |
select() |
Subset columns | Selects, drops, or reorders columns by their names. |
dplyr |
mutate() |
Add/Modify columns | Adds new columns or modifies existing ones based on a formula. |
dplyr |
arrange() |
Reorder rows | Sorts rows by the values of specified column(s) in ascending or descending order. |
dplyr |
group_by() |
Group data | Creates grouped data that subsequent functions (like
summarize()) operate on for each group. |
dplyr |
summarize() |
Summarize data | Reduces multiple values down to a single summary statistic (e.g., mean, count, standard deviation). |
dplyr |
inner_join() |
Combine data | Joins two data frames by matching keys, keeping only rows that have matches in both tables. |
tidyr |
pivot_longer() |
Reshape data | Lengthens data, increasing the number of rows and decreasing the number of columns. |
tidyr |
pivot_wider() |
Reshape data | Widens data, decreasing the number of rows and increasing the number of columns. |
ggplot2 |
ggplot() |
Create visualizations | Initializes a ggplot object, defining the data and aesthetic mappings. |
| Base R | %>% (pipe) |
Chaining operations | Feeds the result of one function directly into the first argument of the next function. |
Some examples will follow below.
Slicing with slice
Tidyverse uses the slice function (and variants) to
slice your tibbles.
Some examples:
Slice the first row:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
Or the second to fifth row:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0169728 | 78 | 93.03212 | 36.18387 | 56.83234 | 31.15763 | Galantamine | 12 | 61 | Secondary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 1 | Yes | Low-Carb Diet | Poor | Heart Disease | 1 |
| 0 | 0.0090002 | 89 | 93.56650 | 37.32632 | 59.75907 | 37.64044 | NA | NA | 69 | Primary School | Right | Male | Yes | Former Smoker | Negative | Moderate Activity | No | 8 | No | Mediterranean Diet | Poor | Heart Disease | 0 |
| 0 | 0.0864373 | 60 | 93.90651 | 37.03062 | 58.26647 | 50.67399 | Donepezil | 23 | 78 | Secondary School | Left | Female | Yes | Never Smoked | Negative | Mild Activity | Yes | 5 | Yes | Balanced Diet | Poor | Hypertension | 1 |
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
You can use slice_head to get the upper n
rows:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 0 | 0.0169728 | 78 | 93.03212 | 36.18387 | 56.83234 | 31.15763 | Galantamine | 12 | 61 | Secondary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 1 | Yes | Low-Carb Diet | Poor | Heart Disease | 1 |
| 0 | 0.0090002 | 89 | 93.56650 | 37.32632 | 59.75907 | 37.64044 | NA | NA | 69 | Primary School | Right | Male | Yes | Former Smoker | Negative | Moderate Activity | No | 8 | No | Mediterranean Diet | Poor | Heart Disease | 0 |
| 0 | 0.0864373 | 60 | 93.90651 | 37.03062 | 58.26647 | 50.67399 | Donepezil | 23 | 78 | Secondary School | Left | Female | Yes | Never Smoked | Negative | Mild Activity | Yes | 5 | Yes | Balanced Diet | Poor | Hypertension | 1 |
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
Or slice_tail to get the last n rows:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0818247 | 87 | 93.85196 | 36.49513 | 50.38011 | 42.31866 | Donepezil | 10 | 88 | Diploma/Degree | Left | Male | Yes | Never Smoked | Positive | Mild Activity | No | 5 | No | Balanced Diet | Poor | Diabetes | 1 |
| 1 | 0.1452494 | 97 | 94.52239 | 36.27080 | 94.00648 | 52.81257 | NA | NA | 80 | Primary School | Left | Female | No | Never Smoked | Negative | Moderate Activity | No | 9 | Yes | Low-Carb Diet | Poor | Diabetes | 0 |
| 1 | 0.0736918 | 65 | 98.57839 | 37.06570 | 80.08861 | 13.64023 | NA | NA | 67 | Primary School | Right | Female | No | Never Smoked | Positive | Sedentary | No | 8 | Yes | Balanced Diet | Good | Diabetes | 0 |
| 0 | 0.0373472 | 71 | 91.29858 | 37.03720 | 95.32221 | 17.44572 | Memantine | 20 | 62 | No School | Left | Male | Yes | Never Smoked | Positive | Sedentary | Yes | 2 | No | Low-Carb Diet | Good | None | 1 |
| 0 | 0.0859694 | 90 | 95.52283 | 36.02675 | 57.67145 | 30.01184 | NA | NA | 80 | Secondary School | Left | Female | Yes | Never Smoked | Positive | Mild Activity | No | 10 | Yes | Mediterranean Diet | Good | Heart Disease | 0 |
Using slice_max you can get the row with a max value
(for example the max Alcohol level):
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1998664 | 79 | 96.73969 | 37.12633 | 51.13334 | 54.88284 | NA | NA | 68 | Secondary School | Left | Male | Yes | Current Smoker | Positive | Moderate Activity | No | 8 | No | Balanced Diet | Good | Heart Disease | 0 |
Note that this is the same as the following in base R (but easier to read):
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1998664 | 79 | 96.73969 | 37.12633 | 51.13334 | 54.88284 | NA | NA | 68 | Secondary School | Left | Male | Yes | Current Smoker | Positive | Moderate Activity | No | 8 | No | Balanced Diet | Good | Heart Disease | 0 |
There is also a slice_min function.
Filter rows
One of the most frequent tasks that you will be doing on data frames is filtering rows. Here is how that works in R:
Note that
slice_head()is used to reduce the output to 6 rows.
Try it yourself without the `slice_head()to compare the output.
Filter rows on all patients of age 65:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1907845 | 61 | 90.20539 | 36.56134 | 74.24871 | 0.3364853 | Memantine | 10 | 65 | No School | Left | Female | No | Former Smoker | Positive | Moderate Activity | No | 0 | No | Mediterranean Diet | Poor | Hypertension | 1 |
| 0 | 0.0477836 | 84 | 98.39336 | 36.27682 | 89.19730 | 37.9368638 | NA | NA | 65 | Primary School | Left | Male | No | Former Smoker | Positive | Mild Activity | No | 10 | Yes | Low-Carb Diet | Good | Heart Disease | 0 |
| 0 | 0.0504704 | 84 | 95.66371 | 36.67772 | 71.11262 | 5.0795380 | Rivastigmine | 6 | 65 | Primary School | Left | Male | No | Former Smoker | Positive | Sedentary | No | 0 | No | Low-Carb Diet | Poor | Hypertension | 1 |
| 1 | 0.1623064 | 70 | 91.43118 | 36.51906 | 56.70106 | 22.7165516 | Galantamine | 12 | 65 | No School | Left | Male | Yes | Former Smoker | Positive | Mild Activity | Yes | 0 | Yes | Mediterranean Diet | Poor | Diabetes | 1 |
| 1 | 0.1862844 | 96 | 94.70693 | 36.47086 | 94.10628 | 3.7485111 | Donepezil | 23 | 65 | Primary School | Right | Female | No | Never Smoked | Positive | Moderate Activity | Yes | 0 | No | Balanced Diet | Good | Diabetes | 1 |
| 1 | 0.0498412 | 91 | 95.83319 | 37.08870 | 56.81603 | 32.2286017 | NA | NA | 65 | Secondary School | Right | Female | No | Former Smoker | Positive | Moderate Activity | No | 9 | Yes | Low-Carb Diet | Poor | Diabetes | 0 |
Measured oxygen level in blood considered as normal (>95):
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.0161938 | 90 | 96.42336 | 37.02463 | 83.97655 | 11.35061 | Donepezil | 10 | 87 | Primary School | Right | Male | No | Never Smoked | Positive | Mild Activity | No | 7 | No | Mediterranean Diet | Poor | Diabetes | 1 |
| 0 | 0.0157542 | 69 | 99.85949 | 36.95526 | 53.72508 | 36.62969 | NA | NA | 66 | Secondary School | Right | Female | No | Former Smoker | Positive | Mild Activity | No | 10 | Yes | Mediterranean Diet | Poor | None | 0 |
| 0 | 0.0826712 | 93 | 96.76003 | 36.51048 | 77.50146 | 34.23731 | NA | NA | 69 | Primary School | Right | Male | No | Former Smoker | Positive | Mild Activity | No | 8 | No | Balanced Diet | Good | Hypertension | 0 |
| 1 | 0.0280772 | 93 | 98.88930 | 36.62580 | 56.72634 | 52.64004 | Rivastigmine | 3 | 76 | Secondary School | Right | Female | Yes | Former Smoker | Negative | Moderate Activity | No | 6 | Yes | Low-Carb Diet | Poor | Diabetes | 1 |
To add a second condition, the & operator is
used:
Current smoker (condition 1) AND normal blood oxygen levels (condition
2):
df %>%
filter(Smoking_Status == "Current Smoker" & BloodOxygenLevel > 95) %>%
slice_head(n = 6) %>%
formatted_table()|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 0 | 0.1659716 | 84 | 97.20612 | 36.01245 | 72.61512 | 32.55944 | NA | NA | 87 | Secondary School | Left | Female | Yes | Current Smoker | Negative | Mild Activity | No | 9 | No | Balanced Diet | Good | None | 0 |
| 1 | 0.0385586 | 93 | 95.69586 | 36.31134 | 77.04856 | 49.58918 | NA | NA | 83 | Primary School | Right | Female | Yes | Current Smoker | Negative | Mild Activity | No | 8 | Yes | Balanced Diet | Good | Diabetes | 0 |
| 1 | 0.0271501 | 87 | 95.70534 | 36.95713 | 91.42437 | 38.93053 | NA | NA | 75 | Primary School | Left | Female | No | Current Smoker | Positive | Sedentary | No | 8 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 0 | 0.1759582 | 87 | 98.09701 | 37.06736 | 83.12899 | 26.25142 | NA | NA | 89 | Primary School | Right | Female | No | Current Smoker | Positive | Moderate Activity | No | 8 | Yes | Balanced Diet | Good | Heart Disease | 0 |
| 0 | 0.1326665 | 96 | 99.08544 | 37.35210 | 57.75965 | 43.13915 | NA | NA | 81 | Diploma/Degree | Right | Male | Yes | Current Smoker | Negative | Mild Activity | No | 8 | No | Balanced Diet | Good | Hypertension | 0 |
The ! operator can be used to exclude (negate) a
specific condition:
NOT a current smoker AND normal blood oxygen levels:
df %>%
filter(!Smoking_Status == "Current Smoker" & BloodOxygenLevel > 95) %>%
slice_head(n = 6) %>%
formatted_table()|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.0161938 | 90 | 96.42336 | 37.02463 | 83.97655 | 11.35061 | Donepezil | 10 | 87 | Primary School | Right | Male | No | Never Smoked | Positive | Mild Activity | No | 7 | No | Mediterranean Diet | Poor | Diabetes | 1 |
| 0 | 0.0157542 | 69 | 99.85949 | 36.95526 | 53.72508 | 36.62969 | NA | NA | 66 | Secondary School | Right | Female | No | Former Smoker | Positive | Mild Activity | No | 10 | Yes | Mediterranean Diet | Poor | None | 0 |
| 0 | 0.0826712 | 93 | 96.76003 | 36.51048 | 77.50146 | 34.23731 | NA | NA | 69 | Primary School | Right | Male | No | Former Smoker | Positive | Mild Activity | No | 8 | No | Balanced Diet | Good | Hypertension | 0 |
| 1 | 0.0280772 | 93 | 98.88930 | 36.62580 | 56.72634 | 52.64004 | Rivastigmine | 3 | 76 | Secondary School | Right | Female | Yes | Former Smoker | Negative | Moderate Activity | No | 6 | Yes | Low-Carb Diet | Poor | Diabetes | 1 |
| 1 | 0.0880516 | 91 | 97.40051 | 37.36268 | 58.80969 | 56.10713 | NA | NA | 64 | Primary School | Right | Female | Yes | Former Smoker | Negative | Sedentary | No | 10 | Yes | Low-Carb Diet | Good | Diabetes | 0 |
The | operator can be used to meet either one of the
conditions: Normal blood oxygen levels OR very low blood oxygen
levels:
df %>%
filter(BloodOxygenLevel < 92 | BloodOxygenLevel > 95) %>%
slice_head(n = 6) %>%
formatted_table()|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.0161938 | 90 | 96.42336 | 37.02463 | 83.97655 | 11.35061 | Donepezil | 10 | 87 | Primary School | Right | Male | No | Never Smoked | Positive | Mild Activity | No | 7 | No | Mediterranean Diet | Poor | Diabetes | 1 |
| 0 | 0.0157542 | 69 | 99.85949 | 36.95526 | 53.72508 | 36.62969 | NA | NA | 66 | Secondary School | Right | Female | No | Former Smoker | Positive | Mild Activity | No | 10 | Yes | Mediterranean Diet | Poor | None | 0 |
| 1 | 0.0973395 | 64 | 90.31907 | 36.39629 | 58.36670 | 49.17576 | Rivastigmine | 3 | 87 | Diploma/Degree | Left | Female | No | Former Smoker | Positive | Sedentary | Yes | 7 | Yes | Balanced Diet | Good | Diabetes | 1 |
| 0 | 0.0674856 | 97 | 91.97409 | 37.11506 | 81.58276 | 22.28137 | NA | NA | 73 | Secondary School | Left | Female | Yes | Never Smoked | Positive | Mild Activity | No | 9 | No | Mediterranean Diet | Good | Heart Disease | 0 |
Arranging rows
You can use arrange() instead of filter()
if you want to sort your rows in a data frame instead of filtering
it:
Arrange on blood oxygen levels ascending (from low to high values):
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0843848 | 81 | 90.01068 | 36.50548 | 51.62094 | 13.967348 | Memantine | 5 | 87 | Primary School | Left | Male | Yes | Never Smoked | Positive | Sedentary | Yes | 6 | No | Mediterranean Diet | Poor | None | 1 |
| 1 | 0.0014001 | 72 | 90.02021 | 37.09395 | 80.43088 | 3.459051 | NA | NA | 86 | Primary School | Left | Female | Yes | Former Smoker | Negative | Moderate Activity | No | 8 | Yes | Balanced Diet | Good | Diabetes | 0 |
| 1 | 0.0368973 | 80 | 90.02619 | 36.94885 | 89.36069 | 11.432479 | NA | NA | 67 | Primary School | Left | Male | Yes | Current Smoker | Positive | Sedentary | No | 9 | No | Mediterranean Diet | Poor | Diabetes | 0 |
| 1 | 0.1461738 | 85 | 90.03076 | 36.85638 | 54.27371 | 21.277146 | Donepezil | 10 | 61 | Secondary School | Right | Male | Yes | Former Smoker | Positive | Mild Activity | No | 5 | No | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.0691490 | 78 | 90.04666 | 36.44196 | 90.87892 | 25.699952 | NA | NA | 87 | Primary School | Left | Male | No | Never Smoked | Negative | Mild Activity | No | 9 | Yes | Low-Carb Diet | Good | Diabetes | 0 |
| 0 | 0.1247041 | 69 | 90.07497 | 37.14372 | 84.65651 | 10.093780 | NA | NA | 66 | Primary School | Right | Male | Yes | Current Smoker | Negative | Sedentary | No | 9 | Yes | Low-Carb Diet | Poor | None | 0 |
If you want to sort in descending order (from high to low values):
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1630081 | 62 | 99.99923 | 36.98947 | 75.29522 | 15.94411 | NA | NA | 83 | No School | Right | Male | No | Never Smoked | Positive | Moderate Activity | No | 9 | Yes | Low-Carb Diet | Poor | Hypertension | 0 |
| 1 | 0.0631141 | 78 | 99.99880 | 36.25207 | 54.58186 | 56.58389 | NA | NA | 73 | Primary School | Left | Male | No | Former Smoker | Negative | Moderate Activity | No | 9 | Yes | Mediterranean Diet | Poor | Diabetes | 0 |
| 0 | 0.1322384 | 88 | 99.99487 | 36.34566 | 85.73596 | 51.07882 | Memantine | 20 | 69 | No School | Left | Female | Yes | Former Smoker | Positive | Moderate Activity | Yes | 1 | Yes | Low-Carb Diet | Good | Hypertension | 1 |
| 0 | 0.1264796 | 92 | 99.98091 | 36.71429 | 68.34561 | 11.14927 | NA | NA | 85 | Secondary School | Left | Female | Yes | Never Smoked | Positive | Moderate Activity | No | 10 | Yes | Mediterranean Diet | Good | None | 0 |
| 1 | 0.1407670 | 77 | 99.97098 | 36.53825 | 92.91167 | 31.58852 | Galantamine | 8 | 88 | Primary School | Right | Male | Yes | Former Smoker | Positive | Sedentary | No | 5 | No | Mediterranean Diet | Poor | Diabetes | 1 |
| 0 | 0.1700738 | 76 | 99.97006 | 37.48182 | 63.56496 | 35.78723 | NA | NA | 69 | Primary School | Right | Female | No | Former Smoker | Positive | Mild Activity | No | 9 | Yes | Low-Carb Diet | Poor | Hypertension | 0 |
And to select the row with the highest value:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1630081 | 62 | 99.99923 | 36.98947 | 75.29522 | 15.94411 | NA | NA | 83 | No School | Right | Male | No | Never Smoked | Positive | Moderate Activity | No | 9 | Yes | Low-Carb Diet | Poor | Hypertension | 0 |
You can also do some multi-level sorting:
Here is an example of sorting on age first and then sorting on blood oxygen levels:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1413211 | 87 | 90.42675 | 36.55439 | 64.13625 | 29.39000 | NA | NA | 60 | Primary School | Right | Female | Yes | Former Smoker | Positive | Moderate Activity | No | 8 | No | Low-Carb Diet | Good | None | 0 |
| 0 | 0.1156043 | 83 | 90.78808 | 37.35137 | 54.73249 | 59.33901 | NA | NA | 60 | Diploma/Degree | Right | Female | Yes | Former Smoker | Negative | Sedentary | No | 9 | No | Mediterranean Diet | Poor | None | 0 |
| 0 | 0.0149947 | 90 | 90.81721 | 36.21995 | 61.92799 | 51.30535 | Donepezil | 5 | 60 | No School | Right | Female | No | Never Smoked | Positive | Mild Activity | No | 0 | No | Mediterranean Diet | Good | Hypertension | 1 |
| 1 | 0.0719459 | 64 | 91.02595 | 36.06489 | 81.38060 | 50.23029 | NA | NA | 60 | Primary School | Left | Female | No | Former Smoker | Negative | Moderate Activity | No | 9 | Yes | Balanced Diet | Poor | Diabetes | 0 |
| 0 | 0.0916895 | 80 | 91.06952 | 37.12764 | 56.48799 | 13.65608 | NA | NA | 60 | Primary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 8 | No | Balanced Diet | Good | None | 0 |
| 1 | 0.0617510 | 87 | 91.12515 | 37.41713 | 59.74129 | 11.84277 | Galantamine | 12 | 60 | Primary School | Left | Male | Yes | Never Smoked | Negative | Moderate Activity | Yes | 4 | No | Balanced Diet | Good | Diabetes | 1 |
Here is an example of sorting on age first and then sorting on descending blood oxygen levels (from high to low):
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1680571 | 62 | 99.95640 | 37.42094 | 69.02086 | 46.843046 | NA | NA | 60 | Diploma/Degree | Right | Female | Yes | Never Smoked | Positive | Mild Activity | No | 8 | No | Balanced Diet | Poor | Hypertension | 0 |
| 0 | 0.1891478 | 72 | 99.85792 | 36.34691 | 63.65151 | 17.480047 | Memantine | 10 | 60 | Secondary School | Left | Female | Yes | Former Smoker | Positive | Moderate Activity | No | 2 | No | Balanced Diet | Good | Heart Disease | 1 |
| 0 | 0.0486234 | 86 | 99.70376 | 37.16851 | 68.34782 | 7.025723 | NA | NA | 60 | Secondary School | Left | Female | No | Current Smoker | Positive | Mild Activity | No | 8 | Yes | Balanced Diet | Poor | Hypertension | 0 |
| 0 | 0.1248077 | 97 | 99.65280 | 37.05998 | 55.02571 | 20.655383 | Rivastigmine | 3 | 60 | Secondary School | Left | Male | Yes | Former Smoker | Positive | Sedentary | Yes | 7 | No | Mediterranean Diet | Good | Hypertension | 1 |
| 0 | 0.0435600 | 78 | 98.88844 | 36.91453 | 81.60376 | 33.466336 | NA | NA | 60 | Secondary School | Right | Male | No | Former Smoker | Positive | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Hypertension | 0 |
| 1 | 0.1668177 | 94 | 98.88256 | 36.21034 | 67.06822 | 44.708687 | NA | NA | 60 | Primary School | Right | Female | No | Never Smoked | Positive | Mild Activity | No | 10 | No | Balanced Diet | Good | Diabetes | 0 |
Note: Missing values will always be sorted at the end.
Select columns with select
You have learned that variables in data sets should be arranged in
columns. Often data sets contain a lot of variables (columns). Most
likely, you are not interested in all of them. With the
select() function, you can select the variables that you
are interested in:
df %>%
select(Age, Smoking_Status, BloodOxygenLevel, HeartRate) %>%
slice_head(n = 6) %>%
formatted_table()|
Age dbl |
Smoking_Status chr |
BloodOxygenLevel dbl |
HeartRate dbl |
|---|---|---|---|
| 60 | Current Smoker | 96.23074 | 98 |
| 61 | Former Smoker | 93.03212 | 78 |
| 69 | Former Smoker | 93.56650 | 89 |
| 78 | Never Smoked | 93.90651 | 60 |
| 77 | Never Smoked | 97.50899 | 67 |
| 67 | Former Smoker | 94.54675 | 94 |
Or select all columns except the blood oxygen levels:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 0 | 0.0169728 | 78 | 36.18387 | 56.83234 | 31.15763 | Galantamine | 12.0 | 61 | Secondary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 1 | Yes | Low-Carb Diet | Poor | Heart Disease | 1 |
| 0 | 0.0090002 | 89 | 37.32632 | 59.75907 | 37.64044 | NA | NA | 69 | Primary School | Right | Male | Yes | Former Smoker | Negative | Moderate Activity | No | 8 | No | Mediterranean Diet | Poor | Heart Disease | 0 |
| 0 | 0.0864373 | 60 | 37.03062 | 58.26647 | 50.67399 | Donepezil | 23.0 | 78 | Secondary School | Left | Female | Yes | Never Smoked | Negative | Mild Activity | Yes | 5 | Yes | Balanced Diet | Poor | Hypertension | 1 |
| 1 | 0.1507473 | 67 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20.0 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.1140278 | 94 | 36.67807 | 66.59233 | 21.15486 | Rivastigmine | 1.5 | 67 | No School | Left | Male | No | Former Smoker | Positive | Mild Activity | Yes | 1 | No | Low-Carb Diet | Poor | Diabetes | 1 |
Or select besides the age, all column names that end with “Level”:
|
Age dbl |
AlcoholLevel dbl |
BloodOxygenLevel dbl |
Education_Level chr |
|---|---|---|---|
| 60 | 0.0849736 | 96.23074 | Primary School |
| 61 | 0.0169728 | 93.03212 | Secondary School |
| 69 | 0.0090002 | 93.56650 | Primary School |
| 78 | 0.0864373 | 93.90651 | Secondary School |
| 77 | 0.1507473 | 97.50899 | Secondary School |
| 67 | 0.1140278 | 94.54675 | No School |
Yes, there is also a function starts_With() to select
columns that start with certain characters.
Summary data
You can use the base R summary() function to get summary
information on a data frame:
## Diabetic AlcoholLevel HeartRate BloodOxygenLevel
## Min. :0.000 Min. :0.0004138 Min. : 60.00 Min. : 90.01
## 1st Qu.:0.000 1st Qu.:0.0455049 1st Qu.: 68.00 1st Qu.: 92.88
## Median :1.000 Median :0.0982347 Median : 79.00 Median : 95.39
## Mean :0.513 Mean :0.0984293 Mean : 79.38 Mean : 95.23
## 3rd Qu.:1.000 3rd Qu.:0.1518396 3rd Qu.: 90.00 3rd Qu.: 97.79
## Max. :1.000 Max. :0.1998664 Max. :100.00 Max. :100.00
##
## BodyTemperature Weight MRI_Delay Prescription
## Min. :36.00 Min. :50.07 Min. : 0.09468 Length:1000
## 1st Qu.:36.40 1st Qu.:61.39 1st Qu.:16.23737 Class :character
## Median :36.78 Median :74.15 Median :29.57719 Mode :character
## Mean :36.76 Mean :74.32 Mean :30.10357
## 3rd Qu.:37.13 3rd Qu.:87.02 3rd Qu.:44.17672
## Max. :37.50 Max. :99.98 Max. :59.95760
##
## Dosage in mg Age Education_Level Dominant_Hand
## Min. : 1.500 Min. :60.00 Length:1000 Length:1000
## 1st Qu.: 4.000 1st Qu.:67.00 Class :character Class :character
## Median : 8.000 Median :75.00 Mode :character Mode :character
## Mean : 9.213 Mean :74.91
## 3rd Qu.:12.000 3rd Qu.:83.00
## Max. :23.000 Max. :90.00
## NA's :515
## Gender Family_History Smoking_Status APOE_ε4
## Length:1000 Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Physical_Activity Depression_Status Cognitive_Test_Scores Medication_History
## Length:1000 Length:1000 Min. : 0.000 Length:1000
## Class :character Class :character 1st Qu.: 4.000 Class :character
## Mode :character Mode :character Median : 8.000 Mode :character
## Mean : 6.383
## 3rd Qu.: 9.000
## Max. :10.000
##
## Nutrition_Diet Sleep_Quality Chronic_Health_Conditions
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Dementia
## Min. :0.000
## 1st Qu.:0.000
## Median :0.000
## Mean :0.485
## 3rd Qu.:1.000
## Max. :1.000
##
As this output can be quit overwhelming you may select for the data that you are interested in (for example blood oxygen levels):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 90.01 92.88 95.39 95.23 97.79 100.00
But the Tidyverse library also contains a summarize()
function:
|
BloodOxygenLevel dbl |
|---|
| 95.22605 |
You can combine this with the group_by() function for
powerful analysis:
df %>%
group_by(Smoking_Status) %>%
summarize(Average_BloodOxygenLevel = mean(BloodOxygenLevel)) %>%
formatted_table()|
Smoking_Status chr |
Average_BloodOxygenLevel dbl |
|---|---|
| Current Smoker | 95.80224 |
| Former Smoker | 95.11622 |
| Never Smoked | 95.22261 |
Recall that this looks very much like the pivot table function in Excel (see figure 28 at data analysis Excel).
To create a similar output as the pivot tables in Excel:
df %>%
group_by(Smoking_Status) %>%
summarize(average_BOL = mean(BloodOxygenLevel), average_BT = mean(BodyTemperature)) %>%
formatted_table()|
Smoking_Status chr |
average_BOL dbl |
average_BT dbl |
|---|---|---|
| Current Smoker | 95.80224 | 36.74515 |
| Former Smoker | 95.11622 | 36.77285 |
| Never Smoked | 95.22261 | 36.75328 |
Now compare this output with figure 28 at data analysis Excel.
Likewise, you can extend this by providing the standard deviation:
df %>%
group_by(Smoking_Status) %>%
summarize(average_BOL = mean(BloodOxygenLevel), sd_BOL = sd(BloodOxygenLevel), average_BT = mean(BodyTemperature), sd_BT = sd(BodyTemperature)) %>%
formatted_table()|
Smoking_Status chr |
average_BOL dbl |
sd_BOL dbl |
average_BT dbl |
sd_BT dbl |
|---|---|---|---|---|
| Current Smoker | 95.80224 | 2.946324 | 36.74515 | 0.4338459 |
| Former Smoker | 95.11622 | 2.956668 | 36.77285 | 0.4367836 |
| Never Smoked | 95.22261 | 2.890827 | 36.75328 | 0.4267329 |
Round values
In R you can round decimals using the round()
function:
## [1] 2.4
## [1] 2.5
## [1] 2
## [1] 2
Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2.
So if you like your values to be rounded in the example above, just
wrap it in the round() function:
df %>%
group_by(Smoking_Status) %>%
summarize(average_BOL = round(mean(BloodOxygenLevel), digits = 1),
sd_BOL = round(sd(BloodOxygenLevel), digits = 2),
average_BT = round(mean(BodyTemperature), digits = 1),
sd_BT = round(sd(BodyTemperature), digits = 2)) %>%
formatted_table()|
Smoking_Status chr |
average_BOL dbl |
sd_BOL dbl |
average_BT dbl |
sd_BT dbl |
|---|---|---|---|---|
| Current Smoker | 95.8 | 2.95 | 36.7 | 0.43 |
| Former Smoker | 95.1 | 2.96 | 36.8 | 0.44 |
| Never Smoked | 95.2 | 2.89 | 36.8 | 0.43 |
Statistics
R is build for statistics so you can do all kind of sophisticated statistics in R. As this is not a statistics course, the use of statistics is beyond the scope of this course.
Some very basic functions that might be useful (all applied on the BloodOxygenLevel column):
## [1] "mean: 95.226"
## [1] "min: 90.01"
## [1] "max: 100"
## [1] "median: 95.39"
## 25% 50% 75%
## 92.88 95.39 97.79
## [1] "standard deviation: 2.929"
paste("standard error of the mean:", round(sd(df$BloodOxygenLevel)/sqrt(length(df$BloodOxygenLevel)), digits = 3)) #can also be calculated using the plotrix package## [1] "standard error of the mean: 0.093"
Summarize each
Using Tidyverse, you can also apply some of the above statistics more
conveniently using the summarize_each() function:
df %>%
group_by(Smoking_Status) %>%
summarize_each(funs(mean, sd, se=sd(.)/sqrt(n()))) %>%
formatted_table()|
Smoking_Status chr |
Diabetic_mean dbl |
AlcoholLevel_mean dbl |
HeartRate_mean dbl |
BloodOxygenLevel_mean dbl |
BodyTemperature_mean dbl |
Weight_mean dbl |
MRI_Delay_mean dbl |
Prescription_mean dbl |
Dosage in mg_mean dbl |
Age_mean dbl |
Education_Level_mean dbl |
Dominant_Hand_mean dbl |
Gender_mean dbl |
Family_History_mean dbl |
APOE_ε4_mean dbl |
Physical_Activity_mean dbl |
Depression_Status_mean dbl |
Cognitive_Test_Scores_mean dbl |
Medication_History_mean dbl |
Nutrition_Diet_mean dbl |
Sleep_Quality_mean dbl |
Chronic_Health_Conditions_mean dbl |
Dementia_mean dbl |
Diabetic_sd dbl |
AlcoholLevel_sd dbl |
HeartRate_sd dbl |
BloodOxygenLevel_sd dbl |
BodyTemperature_sd dbl |
Weight_sd dbl |
MRI_Delay_sd dbl |
Prescription_sd dbl |
Dosage in mg_sd dbl |
Age_sd dbl |
Education_Level_sd dbl |
Dominant_Hand_sd dbl |
Gender_sd dbl |
Family_History_sd dbl |
APOE_ε4_sd dbl |
Physical_Activity_sd dbl |
Depression_Status_sd dbl |
Cognitive_Test_Scores_sd dbl |
Medication_History_sd dbl |
Nutrition_Diet_sd dbl |
Sleep_Quality_sd dbl |
Chronic_Health_Conditions_sd dbl |
Dementia_sd dbl |
Diabetic_se dbl |
AlcoholLevel_se dbl |
HeartRate_se dbl |
BloodOxygenLevel_se dbl |
BodyTemperature_se dbl |
Weight_se dbl |
MRI_Delay_se dbl |
Prescription_se dbl |
Dosage in mg_se dbl |
Age_se dbl |
Education_Level_se dbl |
Dominant_Hand_se dbl |
Gender_se dbl |
Family_History_se dbl |
APOE_ε4_se dbl |
Physical_Activity_se dbl |
Depression_Status_se dbl |
Cognitive_Test_Scores_se dbl |
Medication_History_se dbl |
Nutrition_Diet_se dbl |
Sleep_Quality_se dbl |
Chronic_Health_Conditions_se dbl |
Dementia_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Current Smoker | 0.5333333 | 0.1002539 | 80.95556 | 95.80224 | 36.74515 | 75.77560 | 28.45576 | NA | NA | 75.50000 | NA | NA | NA | NA | NA | NA | NA | 8.922222 | NA | NA | NA | NA | 0.0000000 | 0.5016826 | 0.0602318 | 12.44937 | 2.946324 | 0.4338459 | 13.58834 | 17.32292 | NA | NA | 8.047960 | NA | NA | NA | NA | NA | NA | NA | 0.8104344 | NA | NA | NA | NA | 0.0000000 | 0.0528820 | 0.0063490 | 1.3122786 | 0.3105698 | 0.0457314 | 1.4323372 | 1.8259960 | NA | NA | 0.8483295 | NA | NA | NA | NA | NA | NA | NA | 0.0854273 | NA | NA | NA | NA | 0.0000000 |
| Former Smoker | 0.4956332 | 0.0963685 | 78.88865 | 95.11622 | 36.77285 | 73.96654 | 30.30279 | NA | NA | 74.88428 | NA | NA | NA | NA | NA | NA | NA | 6.041485 | NA | NA | NA | NA | 0.5502183 | 0.5005277 | 0.0574986 | 11.81211 | 2.956668 | 0.4367836 | 14.31802 | 16.78815 | NA | NA | 9.134884 | NA | NA | NA | NA | NA | NA | NA | 3.2368947 | NA | NA | NA | NA | 0.4980157 | 0.0233881 | 0.0026867 | 0.5519435 | 0.1381560 | 0.0204096 | 0.6690372 | 0.7844588 | NA | NA | 0.4268451 | NA | NA | NA | NA | NA | NA | NA | 0.1512502 | NA | NA | NA | NA | 0.0232707 |
| Never Smoked | 0.5265487 | 0.1001543 | 79.57080 | 95.22261 | 36.75328 | 74.39125 | 30.22980 | NA | NA | 74.81416 | NA | NA | NA | NA | NA | NA | NA | 6.223451 | NA | NA | NA | NA | 0.5154867 | 0.4998479 | 0.0596861 | 12.32764 | 2.890827 | 0.4267329 | 14.78750 | 16.86450 | NA | NA | 9.263618 | NA | NA | NA | NA | NA | NA | NA | 3.1986802 | NA | NA | NA | NA | 0.5003139 | 0.0235109 | 0.0028074 | 0.5798432 | 0.1359731 | 0.0200718 | 0.6955454 | 0.7932391 | NA | NA | 0.4357239 | NA | NA | NA | NA | NA | NA | NA | 0.1504533 | NA | NA | NA | NA | 0.0235328 |
Note: that the se=sd(.)/sqrt(n()) code is a custom made
function to calculate the standard error of the mean.
Columns with characters cannot be used to do these kind of
calculations. If you check the complete formatted table you will find
that these are empty (NA). For example the column for
Prescription; it is not possible to calculate the mean value for
character type of data (in this case: name of the medicine). So it is
best to omit them:
df %>%
group_by(Smoking_Status) %>%
summarize_each(funs(mean, sd, se=sd(.)/sqrt(n()))) %>%
select(-Prescription_mean) %>%
formatted_table()|
Smoking_Status chr |
Diabetic_mean dbl |
AlcoholLevel_mean dbl |
HeartRate_mean dbl |
BloodOxygenLevel_mean dbl |
BodyTemperature_mean dbl |
Weight_mean dbl |
MRI_Delay_mean dbl |
Dosage in mg_mean dbl |
Age_mean dbl |
Education_Level_mean dbl |
Dominant_Hand_mean dbl |
Gender_mean dbl |
Family_History_mean dbl |
APOE_ε4_mean dbl |
Physical_Activity_mean dbl |
Depression_Status_mean dbl |
Cognitive_Test_Scores_mean dbl |
Medication_History_mean dbl |
Nutrition_Diet_mean dbl |
Sleep_Quality_mean dbl |
Chronic_Health_Conditions_mean dbl |
Dementia_mean dbl |
Diabetic_sd dbl |
AlcoholLevel_sd dbl |
HeartRate_sd dbl |
BloodOxygenLevel_sd dbl |
BodyTemperature_sd dbl |
Weight_sd dbl |
MRI_Delay_sd dbl |
Prescription_sd dbl |
Dosage in mg_sd dbl |
Age_sd dbl |
Education_Level_sd dbl |
Dominant_Hand_sd dbl |
Gender_sd dbl |
Family_History_sd dbl |
APOE_ε4_sd dbl |
Physical_Activity_sd dbl |
Depression_Status_sd dbl |
Cognitive_Test_Scores_sd dbl |
Medication_History_sd dbl |
Nutrition_Diet_sd dbl |
Sleep_Quality_sd dbl |
Chronic_Health_Conditions_sd dbl |
Dementia_sd dbl |
Diabetic_se dbl |
AlcoholLevel_se dbl |
HeartRate_se dbl |
BloodOxygenLevel_se dbl |
BodyTemperature_se dbl |
Weight_se dbl |
MRI_Delay_se dbl |
Prescription_se dbl |
Dosage in mg_se dbl |
Age_se dbl |
Education_Level_se dbl |
Dominant_Hand_se dbl |
Gender_se dbl |
Family_History_se dbl |
APOE_ε4_se dbl |
Physical_Activity_se dbl |
Depression_Status_se dbl |
Cognitive_Test_Scores_se dbl |
Medication_History_se dbl |
Nutrition_Diet_se dbl |
Sleep_Quality_se dbl |
Chronic_Health_Conditions_se dbl |
Dementia_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Current Smoker | 0.5333333 | 0.1002539 | 80.95556 | 95.80224 | 36.74515 | 75.77560 | 28.45576 | NA | 75.50000 | NA | NA | NA | NA | NA | NA | NA | 8.922222 | NA | NA | NA | NA | 0.0000000 | 0.5016826 | 0.0602318 | 12.44937 | 2.946324 | 0.4338459 | 13.58834 | 17.32292 | NA | NA | 8.047960 | NA | NA | NA | NA | NA | NA | NA | 0.8104344 | NA | NA | NA | NA | 0.0000000 | 0.0528820 | 0.0063490 | 1.3122786 | 0.3105698 | 0.0457314 | 1.4323372 | 1.8259960 | NA | NA | 0.8483295 | NA | NA | NA | NA | NA | NA | NA | 0.0854273 | NA | NA | NA | NA | 0.0000000 |
| Former Smoker | 0.4956332 | 0.0963685 | 78.88865 | 95.11622 | 36.77285 | 73.96654 | 30.30279 | NA | 74.88428 | NA | NA | NA | NA | NA | NA | NA | 6.041485 | NA | NA | NA | NA | 0.5502183 | 0.5005277 | 0.0574986 | 11.81211 | 2.956668 | 0.4367836 | 14.31802 | 16.78815 | NA | NA | 9.134884 | NA | NA | NA | NA | NA | NA | NA | 3.2368947 | NA | NA | NA | NA | 0.4980157 | 0.0233881 | 0.0026867 | 0.5519435 | 0.1381560 | 0.0204096 | 0.6690372 | 0.7844588 | NA | NA | 0.4268451 | NA | NA | NA | NA | NA | NA | NA | 0.1512502 | NA | NA | NA | NA | 0.0232707 |
| Never Smoked | 0.5265487 | 0.1001543 | 79.57080 | 95.22261 | 36.75328 | 74.39125 | 30.22980 | NA | 74.81416 | NA | NA | NA | NA | NA | NA | NA | 6.223451 | NA | NA | NA | NA | 0.5154867 | 0.4998479 | 0.0596861 | 12.32764 | 2.890827 | 0.4267329 | 14.78750 | 16.86450 | NA | NA | 9.263618 | NA | NA | NA | NA | NA | NA | NA | 3.1986802 | NA | NA | NA | NA | 0.5003139 | 0.0235109 | 0.0028074 | 0.5798432 | 0.1359731 | 0.0200718 | 0.6955454 | 0.7932391 | NA | NA | 0.4357239 | NA | NA | NA | NA | NA | NA | NA | 0.1504533 | NA | NA | NA | NA | 0.0235328 |
In this case, we can also remove all character type columns at once (and then do the calculations) as follows. The code is a bit complex and therefore we added explanations to each line:
df %>%
# 1. Group the data FIRST by the categorical variable
group_by(Smoking_Status) %>%
# 2. Summarize the data, applying functions ACROSS all numeric columns
summarise(
# Use across() to select columns and apply multiple functions
# .cols = where(is.numeric) selects all numeric columns
# .fns = list(...) applies the functions
across(
.cols = where(is.numeric),
.fns = list(
mean = mean,
sd = sd,
se = ~ sd(.) / sqrt(n()) # The tilde (~) creates an anonymous function for SE
)
)
) %>%
# 3. Apply the custom/formatting function
formatted_table()|
Smoking_Status chr |
Diabetic_mean dbl |
Diabetic_sd dbl |
Diabetic_se dbl |
AlcoholLevel_mean dbl |
AlcoholLevel_sd dbl |
AlcoholLevel_se dbl |
HeartRate_mean dbl |
HeartRate_sd dbl |
HeartRate_se dbl |
BloodOxygenLevel_mean dbl |
BloodOxygenLevel_sd dbl |
BloodOxygenLevel_se dbl |
BodyTemperature_mean dbl |
BodyTemperature_sd dbl |
BodyTemperature_se dbl |
Weight_mean dbl |
Weight_sd dbl |
Weight_se dbl |
MRI_Delay_mean dbl |
MRI_Delay_sd dbl |
MRI_Delay_se dbl |
Dosage in mg_mean dbl |
Dosage in mg_sd dbl |
Dosage in mg_se dbl |
Age_mean dbl |
Age_sd dbl |
Age_se dbl |
Cognitive_Test_Scores_mean dbl |
Cognitive_Test_Scores_sd dbl |
Cognitive_Test_Scores_se dbl |
Dementia_mean dbl |
Dementia_sd dbl |
Dementia_se dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Current Smoker | 0.5333333 | 0.5016826 | 0.0528820 | 0.1002539 | 0.0602318 | 0.0063490 | 80.95556 | 12.44937 | 1.3122786 | 95.80224 | 2.946324 | 0.3105698 | 36.74515 | 0.4338459 | 0.0457314 | 75.77560 | 13.58834 | 1.4323372 | 28.45576 | 17.32292 | 1.8259960 | NA | NA | NA | 75.50000 | 8.047960 | 0.8483295 | 8.922222 | 0.8104344 | 0.0854273 | 0.0000000 | 0.0000000 | 0.0000000 |
| Former Smoker | 0.4956332 | 0.5005277 | 0.0233881 | 0.0963685 | 0.0574986 | 0.0026867 | 78.88865 | 11.81211 | 0.5519435 | 95.11622 | 2.956668 | 0.1381560 | 36.77285 | 0.4367836 | 0.0204096 | 73.96654 | 14.31802 | 0.6690372 | 30.30279 | 16.78815 | 0.7844588 | NA | NA | NA | 74.88428 | 9.134884 | 0.4268451 | 6.041485 | 3.2368947 | 0.1512502 | 0.5502183 | 0.4980157 | 0.0232707 |
| Never Smoked | 0.5265487 | 0.4998479 | 0.0235109 | 0.1001543 | 0.0596861 | 0.0028074 | 79.57080 | 12.32764 | 0.5798432 | 95.22261 | 2.890827 | 0.1359731 | 36.75328 | 0.4267329 | 0.0200718 | 74.39125 | 14.78750 | 0.6955454 | 30.22980 | 16.86450 | 0.7932391 | NA | NA | NA | 74.81416 | 9.263618 | 0.4357239 | 6.223451 | 3.1986802 | 0.1504533 | 0.5154867 | 0.5003139 | 0.0235328 |
This way, the only character type column that is preserved is the
Smoking_Status column. Note: there are still fields with
NA in this overview. These are caused by missing values in
the original dataset. If you try to calculate the mean value of a set of
numbers, including NA, the result will be
NA:
## [1] "mean with no `NA` values: 4.88"
## [1] "mean with `NA` values: NA"
## [1] "standard deviation with no `NA` values: 2.03273215156351"
## [1] "standard deviation with `NA` values: NA"
## [1] "sum with no `NA` values: 24.4"
## [1] "sum with `NA` values: NA"
We can get around this problem using the na.rm =
argument in the functions used. The default setting is
na.rm = FALSE. If we set this argument to
TRUE, all NA values will be removed before the
mean/sd/sum is calculated.
## [1] "mean with no `NA` values: 4.88"
## [1] "mean with `NA` values: 4.95"
## [1] "standard deviation with no `NA` values: 2.03273215156351"
## [1] "standard deviation with `NA` values: 2.34022790912908"
## [1] "sum with no `NA` values: 24.4"
## [1] "sum with `NA` values: 19.8"
Summarize and across
The summarize_each() function is deprecated (will work,
but not actively maintained anymore) in favor of the new
across() function that works within
summarize() and mutate().
df %>%
group_by(Smoking_Status) %>%
summarize(across(everything(), list(mean = mean, sd = sd))) %>%
formatted_table()|
Smoking_Status chr |
Diabetic_mean dbl |
Diabetic_sd dbl |
AlcoholLevel_mean dbl |
AlcoholLevel_sd dbl |
HeartRate_mean dbl |
HeartRate_sd dbl |
BloodOxygenLevel_mean dbl |
BloodOxygenLevel_sd dbl |
BodyTemperature_mean dbl |
BodyTemperature_sd dbl |
Weight_mean dbl |
Weight_sd dbl |
MRI_Delay_mean dbl |
MRI_Delay_sd dbl |
Prescription_mean dbl |
Prescription_sd dbl |
Dosage in mg_mean dbl |
Dosage in mg_sd dbl |
Age_mean dbl |
Age_sd dbl |
Education_Level_mean dbl |
Education_Level_sd dbl |
Dominant_Hand_mean dbl |
Dominant_Hand_sd dbl |
Gender_mean dbl |
Gender_sd dbl |
Family_History_mean dbl |
Family_History_sd dbl |
APOE_ε4_mean dbl |
APOE_ε4_sd dbl |
Physical_Activity_mean dbl |
Physical_Activity_sd dbl |
Depression_Status_mean dbl |
Depression_Status_sd dbl |
Cognitive_Test_Scores_mean dbl |
Cognitive_Test_Scores_sd dbl |
Medication_History_mean dbl |
Medication_History_sd dbl |
Nutrition_Diet_mean dbl |
Nutrition_Diet_sd dbl |
Sleep_Quality_mean dbl |
Sleep_Quality_sd dbl |
Chronic_Health_Conditions_mean dbl |
Chronic_Health_Conditions_sd dbl |
Dementia_mean dbl |
Dementia_sd dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Current Smoker | 0.5333333 | 0.5016826 | 0.1002539 | 0.0602318 | 80.95556 | 12.44937 | 95.80224 | 2.946324 | 36.74515 | 0.4338459 | 75.77560 | 13.58834 | 28.45576 | 17.32292 | NA | NA | NA | NA | 75.50000 | 8.047960 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 8.922222 | 0.8104344 | NA | NA | NA | NA | NA | NA | NA | NA | 0.0000000 | 0.0000000 |
| Former Smoker | 0.4956332 | 0.5005277 | 0.0963685 | 0.0574986 | 78.88865 | 11.81211 | 95.11622 | 2.956668 | 36.77285 | 0.4367836 | 73.96654 | 14.31802 | 30.30279 | 16.78815 | NA | NA | NA | NA | 74.88428 | 9.134884 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6.041485 | 3.2368947 | NA | NA | NA | NA | NA | NA | NA | NA | 0.5502183 | 0.4980157 |
| Never Smoked | 0.5265487 | 0.4998479 | 0.1001543 | 0.0596861 | 79.57080 | 12.32764 | 95.22261 | 2.890827 | 36.75328 | 0.4267329 | 74.39125 | 14.78750 | 30.22980 | 16.86450 | NA | NA | NA | NA | 74.81416 | 9.263618 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6.223451 | 3.1986802 | NA | NA | NA | NA | NA | NA | NA | NA | 0.5154867 | 0.5003139 |
Or use the scoped variant of summarize,
summarize_all():
|
Smoking_Status chr |
Diabetic_mean dbl |
AlcoholLevel_mean dbl |
HeartRate_mean dbl |
BloodOxygenLevel_mean dbl |
BodyTemperature_mean dbl |
Weight_mean dbl |
MRI_Delay_mean dbl |
Prescription_mean dbl |
Dosage in mg_mean dbl |
Age_mean dbl |
Education_Level_mean dbl |
Dominant_Hand_mean dbl |
Gender_mean dbl |
Family_History_mean dbl |
APOE_ε4_mean dbl |
Physical_Activity_mean dbl |
Depression_Status_mean dbl |
Cognitive_Test_Scores_mean dbl |
Medication_History_mean dbl |
Nutrition_Diet_mean dbl |
Sleep_Quality_mean dbl |
Chronic_Health_Conditions_mean dbl |
Dementia_mean dbl |
Diabetic_sd dbl |
AlcoholLevel_sd dbl |
HeartRate_sd dbl |
BloodOxygenLevel_sd dbl |
BodyTemperature_sd dbl |
Weight_sd dbl |
MRI_Delay_sd dbl |
Prescription_sd dbl |
Dosage in mg_sd dbl |
Age_sd dbl |
Education_Level_sd dbl |
Dominant_Hand_sd dbl |
Gender_sd dbl |
Family_History_sd dbl |
APOE_ε4_sd dbl |
Physical_Activity_sd dbl |
Depression_Status_sd dbl |
Cognitive_Test_Scores_sd dbl |
Medication_History_sd dbl |
Nutrition_Diet_sd dbl |
Sleep_Quality_sd dbl |
Chronic_Health_Conditions_sd dbl |
Dementia_sd dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Current Smoker | 0.5333333 | 0.1002539 | 80.95556 | 95.80224 | 36.74515 | 75.77560 | 28.45576 | NA | NA | 75.50000 | NA | NA | NA | NA | NA | NA | NA | 8.922222 | NA | NA | NA | NA | 0.0000000 | 0.5016826 | 0.0602318 | 12.44937 | 2.946324 | 0.4338459 | 13.58834 | 17.32292 | NA | NA | 8.047960 | NA | NA | NA | NA | NA | NA | NA | 0.8104344 | NA | NA | NA | NA | 0.0000000 |
| Former Smoker | 0.4956332 | 0.0963685 | 78.88865 | 95.11622 | 36.77285 | 73.96654 | 30.30279 | NA | NA | 74.88428 | NA | NA | NA | NA | NA | NA | NA | 6.041485 | NA | NA | NA | NA | 0.5502183 | 0.5005277 | 0.0574986 | 11.81211 | 2.956668 | 0.4367836 | 14.31802 | 16.78815 | NA | NA | 9.134884 | NA | NA | NA | NA | NA | NA | NA | 3.2368947 | NA | NA | NA | NA | 0.4980157 |
| Never Smoked | 0.5265487 | 0.1001543 | 79.57080 | 95.22261 | 36.75328 | 74.39125 | 30.22980 | NA | NA | 74.81416 | NA | NA | NA | NA | NA | NA | NA | 6.223451 | NA | NA | NA | NA | 0.5154867 | 0.4998479 | 0.0596861 | 12.32764 | 2.890827 | 0.4267329 | 14.78750 | 16.86450 | NA | NA | 9.263618 | NA | NA | NA | NA | NA | NA | NA | 3.1986802 | NA | NA | NA | NA | 0.5003139 |
If you want to remove the columns that yield NA’s:
|
Diabetic dbl |
AlcoholLevel dbl |
HeartRate dbl |
BloodOxygenLevel dbl |
BodyTemperature dbl |
Weight dbl |
MRI_Delay dbl |
Prescription chr |
Dosage in mg dbl |
Age dbl |
Education_Level chr |
Dominant_Hand chr |
Gender chr |
Family_History chr |
Smoking_Status chr |
APOE_ε4 chr |
Physical_Activity chr |
Depression_Status chr |
Cognitive_Test_Scores dbl |
Medication_History chr |
Nutrition_Diet chr |
Sleep_Quality chr |
Chronic_Health_Conditions chr |
Dementia dbl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0849736 | 98 | 96.23074 | 36.22485 | 57.56398 | 36.42103 | NA | NA | 60 | Primary School | Left | Female | No | Current Smoker | Negative | Sedentary | No | 10 | No | Low-Carb Diet | Poor | Diabetes | 0 |
| 0 | 0.0169728 | 78 | 93.03212 | 36.18387 | 56.83234 | 31.15763 | Galantamine | 12.0 | 61 | Secondary School | Right | Male | No | Former Smoker | Positive | Moderate Activity | No | 1 | Yes | Low-Carb Diet | Poor | Heart Disease | 1 |
| 0 | 0.0090002 | 89 | 93.56650 | 37.32632 | 59.75907 | 37.64044 | NA | NA | 69 | Primary School | Right | Male | Yes | Former Smoker | Negative | Moderate Activity | No | 8 | No | Mediterranean Diet | Poor | Heart Disease | 0 |
| 0 | 0.0864373 | 60 | 93.90651 | 37.03062 | 58.26647 | 50.67399 | Donepezil | 23.0 | 78 | Secondary School | Left | Female | Yes | Never Smoked | Negative | Mild Activity | Yes | 5 | Yes | Balanced Diet | Poor | Hypertension | 1 |
| 1 | 0.1507473 | 67 | 97.50899 | 36.06212 | 67.70503 | 27.81060 | Memantine | 20.0 | 77 | Secondary School | Right | Male | Yes | Never Smoked | Positive | Mild Activity | No | 0 | Yes | Low-Carb Diet | Good | Diabetes | 1 |
| 1 | 0.1140278 | 94 | 94.54675 | 36.67807 | 66.59233 | 21.15486 | Rivastigmine | 1.5 | 67 | No School | Left | Male | No | Former Smoker | Positive | Mild Activity | Yes | 1 | No | Low-Carb Diet | Poor | Diabetes | 1 |
Go back to the main page
Go back to the R overview page
⬆️ Back to Top
This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.