

This Rmd file can be downloaded here

library(tidyverse)

📘 Tidyverse Overview

💡 This page provides a structured overview of core tidyverse functions, grouped by category, with detailed examples for each category below.

The examples below use the Puromycin dataset, a classic example dataset that ships with base R (in the datasets package). It comes from a biochemical experiment on enzymatic reaction rates in cells that were either treated with Puromycin or left untreated, and it is widely used to demonstrate and test nonlinear regression models, particularly the Michaelis-Menten model of enzyme kinetics. It has 23 rows and three columns: conc (substrate concentration, ppm), rate (instantaneous reaction rate, counts/min/min), and state (treated or untreated).

head(Puromycin)
##   conc rate   state
## 1 0.02   76 treated
## 2 0.02   47 treated
## 3 0.06   97 treated
## 4 0.06  107 treated
## 5 0.11  123 treated
## 6 0.11  139 treated

Summary Table

💡 This section summarizes the core tidyverse operations on data frames and tibbles.

🧩 Row Operations: drop_na(), filter(), slice(), arrange(), distinct(), slice_sample(), slice_min(), slice_max(). Select, subset, or reorder rows.
📊 Column Operations: select(), rename(), mutate(), transmute(), relocate(), across(). Choose, rename, or transform columns.
🧮 Grouping & Aggregation: group_by(), ungroup(), summarize(), mutate(), filter(). Split data into groups and compute summaries.
🔗 Joins & Binding: left_join(), inner_join(), bind_rows(), bind_cols(). Combine multiple data frames.
🔄 Reshaping Data: pivot_longer(), pivot_wider(). Restructure data between long and wide formats.
📈 Visualization: ggplot(), geom_*(). Create and customize plots.
🧰 Utilities: glimpse(), pull(), count(). Inspect or extract data quickly.

Function-Specific Actions with Examples


🧩 Actions on Rows

💡 This section lists core row operations in the tidyverse.

drop_na() – Drop rows containing NA values

First create a copy with an NA value:

Puro_with_NA <- as_tibble(Puromycin) # create a tibble copy
Puro_with_NA[1, 2] <- NA # replace row 1, column 2 with NA
Puro_with_NA
## # A tibble: 23 × 3
##     conc  rate state  
##    <dbl> <dbl> <fct>  
##  1  0.02    NA treated
##  2  0.02    47 treated
##  3  0.06    97 treated
##  4  0.06   107 treated
##  5  0.11   123 treated
##  6  0.11   139 treated
##  7  0.22   159 treated
##  8  0.22   152 treated
##  9  0.56   191 treated
## 10  0.56   201 treated
## # ℹ 13 more rows

And drop the row with the NA:

# Drop row with NA
Puro_with_NA %>% drop_na() 
## # A tibble: 22 × 3
##     conc  rate state  
##    <dbl> <dbl> <fct>  
##  1  0.02    47 treated
##  2  0.06    97 treated
##  3  0.06   107 treated
##  4  0.11   123 treated
##  5  0.11   139 treated
##  6  0.22   159 treated
##  7  0.22   152 treated
##  8  0.56   191 treated
##  9  0.56   201 treated
## 10  1.1    207 treated
## # ℹ 12 more rows
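drop_na() also accepts column names, in which case only missing values in those columns count. A minimal sketch that rebuilds the same Puro_with_NA copy so it runs on its own (the names only_rate and only_conc are illustrative):

```r
library(dplyr)
library(tidyr)

# Rebuild the copy with one NA in the rate column (row 1)
Puro_with_NA <- as_tibble(Puromycin)
Puro_with_NA[1, 2] <- NA

# Only NAs in rate are considered, so row 1 is dropped
only_rate <- Puro_with_NA %>% drop_na(rate)
nrow(only_rate)  # 22

# conc contains no NAs, so no rows are dropped
only_conc <- Puro_with_NA %>% drop_na(conc)
nrow(only_conc)  # 23
```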
filter() – Subset rows based on conditions
# Keep only rows where rate > 200
Puromycin %>% filter(rate > 200)
##   conc rate   state
## 1 0.56  201 treated
## 2 1.10  207 treated
Selects rows that meet logical conditions. Supports multiple conditions with & (and) or | (or).
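For example, a comma between conditions acts like &, while | keeps rows that match either condition. A short sketch (the object names are illustrative):

```r
library(dplyr)

# Comma-separated conditions are combined with AND (same as &):
# treated rows with conc above 0.5
high_treated <- Puromycin %>% filter(state == "treated", conc > 0.5)

# | keeps rows matching either condition: very low or very high rates
extremes <- Puromycin %>% filter(rate < 50 | rate > 200)
extremes
```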
slice() – Select rows by position
# Keep the first 5 rows
Puromycin %>% slice(1:5)
##   conc rate   state
## 1 0.02   76 treated
## 2 0.02   47 treated
## 3 0.06   97 treated
## 4 0.06  107 treated
## 5 0.11  123 treated
Selects rows by position. Related helpers include slice_head() and slice_tail().
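A short sketch of the two helpers, taking the first and the last two rows:

```r
library(dplyr)

# First two rows (same as slice(1:2))
first_two <- Puromycin %>% slice_head(n = 2)

# Last two rows of the data
last_two <- Puromycin %>% slice_tail(n = 2)
last_two
```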
arrange() – Reorder rows by variable(s)
# Sort by rate descending
Puromycin %>% arrange(desc(rate))
##    conc rate     state
## 1  1.10  207   treated
## 2  0.56  201   treated
## 3  1.10  200   treated
## 4  0.56  191   treated
## 5  1.10  160 untreated
## 6  0.22  159   treated
## 7  0.56  158 untreated
## 8  0.22  152   treated
## 9  0.56  144 untreated
## 10 0.11  139   treated
## 11 0.22  131 untreated
## 12 0.22  124 untreated
## 13 0.11  123   treated
## 14 0.11  115 untreated
## 15 0.06  107   treated
## 16 0.11   98 untreated
## 17 0.06   97   treated
## 18 0.06   86 untreated
## 19 0.06   84 untreated
## 20 0.02   76   treated
## 21 0.02   67 untreated
## 22 0.02   51 untreated
## 23 0.02   47   treated
Orders rows by one or more columns. Use desc() for descending order.
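Several columns can be given: rows are sorted by the first column, and ties are broken by the next one. A sketch that sorts by state (in factor level order, so treated comes first) and then by rate, highest first, within each state:

```r
library(dplyr)

# Sort by state first, then by rate in descending order within each state
sorted <- Puromycin %>% arrange(state, desc(rate))
head(sorted, 3)
```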
distinct() – Keep unique rows
# Keep unique values of rate
Puromycin %>% distinct(rate)
##    rate
## 1    76
## 2    47
## 3    97
## 4   107
## 5   123
## 6   139
## 7   159
## 8   152
## 9   191
## 10  201
## 11  207
## 12  200
## 13   67
## 14   51
## 15   84
## 16   86
## 17   98
## 18  115
## 19  131
## 20  124
## 21  144
## 22  158
## 23  160
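Removes duplicate rows. When columns are named, distinct() returns only those columns (unless .keep_all = TRUE), with one row per unique combination. Every rate value in Puromycin is unique, so all 23 rows remain here.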
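Because rate has no duplicates, distinct() is more telling on conc, which repeats. A sketch (the object names are illustrative):

```r
library(dplyr)

# Unique substrate concentrations: 6 values
unique_conc <- Puromycin %>% distinct(conc)

# Unique state/conc combinations: 12
unique_pairs <- Puromycin %>% distinct(state, conc)

# .keep_all = TRUE keeps every column, taking the first row for each conc
first_per_conc <- Puromycin %>% distinct(conc, .keep_all = TRUE)
first_per_conc
```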
slice_sample() – Random sample of rows
# Randomly sample 3 rows
Puromycin %>% slice_sample(n = 3)
##   conc rate     state
## 1 0.22  159   treated
## 2 0.11  115 untreated
## 3 0.11   98 untreated
Randomly selects n rows, or a fraction of the rows with prop. The result changes from run to run unless a random seed is set with set.seed().
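A minimal sketch of making the sample reproducible with set.seed() (the seed 42 is arbitrary) and of sampling by proportion instead of a fixed count:

```r
library(dplyr)

# Same seed before each call gives the same rows
set.seed(42)
sample_a <- Puromycin %>% slice_sample(n = 3)
set.seed(42)
sample_b <- Puromycin %>% slice_sample(n = 3)
identical(sample_a, sample_b)  # TRUE

# prop samples a fraction of the rows: here about 20% of 23
set.seed(42)
sample_prop <- Puromycin %>% slice_sample(prop = 0.2)
nrow(sample_prop)
```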
slice_min()/slice_max() – Top or bottom rows by value
# 3 rows with highest rate
Puromycin %>% slice_max(rate, n = 3)
##   conc rate   state
## 1 1.10  207 treated
## 2 0.56  201 treated
## 3 1.10  200 treated
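Selects the rows with the smallest (slice_min()) or largest (slice_max()) values of a variable. Ties are kept by default (with_ties = TRUE), so more than n rows can be returned.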
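slice_min() works the same way, and ties matter: conc contains repeated values, so asking for one row can return several. A sketch (the object names are illustrative):

```r
library(dplyr)

# Two rows with the lowest rate
lowest <- Puromycin %>% slice_min(rate, n = 2)

# conc = 0.02 occurs four times; ties are kept by default
low_conc <- Puromycin %>% slice_min(conc, n = 1)
nrow(low_conc)  # 4

# with_ties = FALSE returns exactly n rows
low_conc_strict <- Puromycin %>% slice_min(conc, n = 1, with_ties = FALSE)
nrow(low_conc_strict)  # 1
```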

📊 Actions on Columns

💡 This section lists core column operations in the tidyverse.

select() – Choose specific columns
# Select columns conc and rate
Puromycin %>% select(conc, rate)
##    conc rate
## 1  0.02   76
## 2  0.02   47
## 3  0.06   97
## 4  0.06  107
## 5  0.11  123
## 6  0.11  139
## 7  0.22  159
## 8  0.22  152
## 9  0.56  191
## 10 0.56  201
## 11 1.10  207
## 12 1.10  200
## 13 0.02   67
## 14 0.02   51
## 15 0.06   84
## 16 0.06   86
## 17 0.11   98
## 18 0.11  115
## 19 0.22  131
## 20 0.22  124
## 21 0.56  144
## 22 0.56  158
## 23 1.10  160
Keeps only the specified columns. Can use helper functions like starts_with(), ends_with(), or contains().
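A sketch of the helpers mentioned above, plus negative selection and where() (the object names are illustrative):

```r
library(dplyr)

# Columns whose name starts with "c": conc
by_prefix <- Puromycin %>% select(starts_with("c"))

# A minus sign drops a column: conc and rate remain
without_state <- Puromycin %>% select(-state)

# where() selects by a predicate on the column: numeric columns only
numeric_only <- Puromycin %>% select(where(is.numeric))
```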
rename() – Rename columns
# Rename conc to conc_ppm
Puromycin %>% rename(conc_ppm = conc)
##    conc_ppm rate     state
## 1      0.02   76   treated
## 2      0.02   47   treated
## 3      0.06   97   treated
## 4      0.06  107   treated
## 5      0.11  123   treated
## 6      0.11  139   treated
## 7      0.22  159   treated
## 8      0.22  152   treated
## 9      0.56  191   treated
## 10     0.56  201   treated
## 11     1.10  207   treated
## 12     1.10  200   treated
## 13     0.02   67 untreated
## 14     0.02   51 untreated
## 15     0.06   84 untreated
## 16     0.06   86 untreated
## 17     0.11   98 untreated
## 18     0.11  115 untreated
## 19     0.22  131 untreated
## 20     0.22  124 untreated
## 21     0.56  144 untreated
## 22     0.56  158 untreated
## 23     1.10  160 untreated
Changes column names without altering the data.
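To rename many columns at once with a function, dplyr provides rename_with(). A small sketch (the _val suffix is an arbitrary example):

```r
library(dplyr)

# Apply a function to all column names
upper <- Puromycin %>% rename_with(toupper)

# Apply it only to selected columns
suffixed <- Puromycin %>% rename_with(~ paste0(.x, "_val"), c(conc, rate))
names(suffixed)
```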
mutate() – Add or modify columns
# Add a new column rate_per_conc
Puromycin %>% mutate(rate_per_conc = rate / conc)
##    conc rate     state rate_per_conc
## 1  0.02   76   treated     3800.0000
## 2  0.02   47   treated     2350.0000
## 3  0.06   97   treated     1616.6667
## 4  0.06  107   treated     1783.3333
## 5  0.11  123   treated     1118.1818
## 6  0.11  139   treated     1263.6364
## 7  0.22  159   treated      722.7273
## 8  0.22  152   treated      690.9091
## 9  0.56  191   treated      341.0714
## 10 0.56  201   treated      358.9286
## 11 1.10  207   treated      188.1818
## 12 1.10  200   treated      181.8182
## 13 0.02   67 untreated     3350.0000
## 14 0.02   51 untreated     2550.0000
## 15 0.06   84 untreated     1400.0000
## 16 0.06   86 untreated     1433.3333
## 17 0.11   98 untreated      890.9091
## 18 0.11  115 untreated     1045.4545
## 19 0.22  131 untreated      595.4545
## 20 0.22  124 untreated      563.6364
## 21 0.56  144 untreated      257.1429
## 22 0.56  158 untreated      282.1429
## 23 1.10  160 untreated      145.4545
Creates new columns or modifies existing ones using expressions.
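mutate() can also overwrite an existing column by reusing its name. A sketch that converts conc from ppm to ppb (an illustrative unit change; 1 ppm = 1000 ppb) and adds a hypothetical high_rate label with if_else() (the threshold of 150 is arbitrary):

```r
library(dplyr)

converted <- Puromycin %>%
  mutate(
    conc = conc * 1000,                            # overwrite: ppm -> ppb
    high_rate = if_else(rate > 150, "high", "low") # new character column
  )
head(converted, 3)
```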
transmute() – Create new columns and drop others
# Keep only rate_per_conc column
Puromycin %>% transmute(rate_per_conc = rate / conc)
##    rate_per_conc
## 1      3800.0000
## 2      2350.0000
## 3      1616.6667
## 4      1783.3333
## 5      1118.1818
## 6      1263.6364
## 7       722.7273
## 8       690.9091
## 9       341.0714
## 10      358.9286
## 11      188.1818
## 12      181.8182
## 13     3350.0000
## 14     2550.0000
## 15     1400.0000
## 16     1433.3333
## 17      890.9091
## 18     1045.4545
## 19      595.4545
## 20      563.6364
## 21      257.1429
## 22      282.1429
## 23      145.4545
Creates new columns and drops all others in the output. Since dplyr 1.1.0, transmute() is superseded by mutate(..., .keep = "none").
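In current dplyr, the same result can be written with the .keep argument of mutate(). A minimal sketch:

```r
library(dplyr)

# Equivalent to transmute(rate_per_conc = rate / conc)
only_new <- Puromycin %>% mutate(rate_per_conc = rate / conc, .keep = "none")

# .keep = "used" also keeps the columns used in the expression (conc, rate)
used <- Puromycin %>% mutate(rate_per_conc = rate / conc, .keep = "used")
names(used)
```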
relocate() – Move columns to new positions
# Move state column to the first position
Puromycin %>% relocate(state, .before = 1)
##        state conc rate
## 1    treated 0.02   76
## 2    treated 0.02   47
## 3    treated 0.06   97
## 4    treated 0.06  107
## 5    treated 0.11  123
## 6    treated 0.11  139
## 7    treated 0.22  159
## 8    treated 0.22  152
## 9    treated 0.56  191
## 10   treated 0.56  201
## 11   treated 1.10  207
## 12   treated 1.10  200
## 13 untreated 0.02   67
## 14 untreated 0.02   51
## 15 untreated 0.06   84
## 16 untreated 0.06   86
## 17 untreated 0.11   98
## 18 untreated 0.11  115
## 19 untreated 0.22  131
## 20 untreated 0.22  124
## 21 untreated 0.56  144
## 22 untreated 0.56  158
## 23 untreated 1.10  160
Reorders columns without changing data.
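relocate() also accepts .after and tidyselect helpers such as where(). A short sketch:

```r
library(dplyr)

# Place state directly after conc
after_conc <- Puromycin %>% relocate(state, .after = conc)

# Move all factor columns to the front (the default position)
factors_first <- Puromycin %>% relocate(where(is.factor))
```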
across() – Apply functions to multiple columns
# Convert all numeric columns to log scale
Puromycin %>% mutate(across(where(is.numeric), log))
##           conc     rate     state
## 1  -3.91202301 4.330733   treated
## 2  -3.91202301 3.850148   treated
## 3  -2.81341072 4.574711   treated
## 4  -2.81341072 4.672829   treated
## 5  -2.20727491 4.812184   treated
## 6  -2.20727491 4.934474   treated
## 7  -1.51412773 5.068904   treated
## 8  -1.51412773 5.023881   treated
## 9  -0.57981850 5.252273   treated
## 10 -0.57981850 5.303305   treated
## 11  0.09531018 5.332719   treated
## 12  0.09531018 5.298317   treated
## 13 -3.91202301 4.204693 untreated
## 14 -3.91202301 3.931826 untreated
## 15 -2.81341072 4.430817 untreated
## 16 -2.81341072 4.454347 untreated
## 17 -2.20727491 4.584967 untreated
## 18 -2.20727491 4.744932 untreated
## 19 -1.51412773 4.875197 untreated
## 20 -1.51412773 4.820282 untreated
## 21 -0.57981850 4.969813 untreated
## 22 -0.57981850 5.062595 untreated
## 23  0.09531018 5.075174 untreated
Applies a function across selected columns, often used inside mutate() or summarize().
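across() is just as useful inside summarize(). A sketch that computes per-state means of conc and rate, with .names controlling the output column names:

```r
library(dplyr)

means <- Puromycin %>%
  group_by(state) %>%
  summarize(across(c(conc, rate), mean, .names = "{.col}_mean"))
means
```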

🧮 Actions on Groups

💡 This section lists core grouping operations in the tidyverse.

group_by() – Group data by one or more variables
# Group by conc
Puromycin %>% group_by(conc)
## # A tibble: 23 × 3
## # Groups:   conc [6]
##     conc  rate state  
##    <dbl> <dbl> <fct>  
##  1  0.02    76 treated
##  2  0.02    47 treated
##  3  0.06    97 treated
##  4  0.06   107 treated
##  5  0.11   123 treated
##  6  0.11   139 treated
##  7  0.22   159 treated
##  8  0.22   152 treated
##  9  0.56   191 treated
## 10  0.56   201 treated
## # ℹ 13 more rows
Creates groups in the data for subsequent summarization or manipulation.
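Grouping by several variables creates one group per combination of values. A sketch that inspects the grouping with n_groups() and group_vars():

```r
library(dplyr)

by_state_conc <- Puromycin %>% group_by(state, conc)
n_groups(by_state_conc)   # 12
group_vars(by_state_conc) # "state" "conc"
```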
ungroup() – Remove grouping
# Remove grouping
Puromycin %>% group_by(conc) %>% ungroup()
## # A tibble: 23 × 3
##     conc  rate state  
##    <dbl> <dbl> <fct>  
##  1  0.02    76 treated
##  2  0.02    47 treated
##  3  0.06    97 treated
##  4  0.06   107 treated
##  5  0.11   123 treated
##  6  0.11   139 treated
##  7  0.22   159 treated
##  8  0.22   152 treated
##  9  0.56   191 treated
## 10  0.56   201 treated
## # ℹ 13 more rows
Removes the grouping and returns a regular, ungrouped tibble.
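ungroup() matters because summarize() with .groups = "drop_last" (the default when each group yields one row) only removes the last grouping level, so the result stays grouped by the remaining variables. A sketch:

```r
library(dplyr)

# Average rate per state and conc; only the conc level is dropped
avg_by_state_conc <- Puromycin %>%
  group_by(state, conc) %>%
  summarize(avg_rate = mean(rate), .groups = "drop_last")
group_vars(avg_by_state_conc)  # "state"

# ungroup() removes the remaining grouping
flat <- avg_by_state_conc %>% ungroup()
group_vars(flat)  # character(0)
```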
summarize() – Compute summary statistics by group
# Average rate by conc
Puromycin %>% group_by(conc) %>% summarize(avg_rate = mean(rate))
## # A tibble: 6 × 2
##    conc avg_rate
##   <dbl>    <dbl>
## 1  0.02     60.2
## 2  0.06     93.5
## 3  0.11    119. 
## 4  0.22    142. 
## 5  0.56    174. 
## 6  1.1     189
Aggregates data within each group to produce summary statistics. Often combined with group_by().
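summarize() can compute several statistics in one call; n() counts the rows in each group. A sketch per enzyme state:

```r
library(dplyr)

stats <- Puromycin %>%
  group_by(state) %>%
  summarize(
    n = n(),               # number of measurements per state
    mean_rate = mean(rate),
    sd_rate = sd(rate),
    max_rate = max(rate)
  )
stats
```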
mutate() with groups – Add or modify columns within groups
# Compute rank of rate within each conc group
Puromycin %>% group_by(conc) %>% mutate(rank_rate = rank(rate))
## # A tibble: 23 × 4
## # Groups:   conc [6]
##     conc  rate state   rank_rate
##    <dbl> <dbl> <fct>       <dbl>
##  1  0.02    76 treated         4
##  2  0.02    47 treated         1
##  3  0.06    97 treated         3
##  4  0.06   107 treated         4
##  5  0.11   123 treated         3
##  6  0.11   139 treated         4
##  7  0.22   159 treated         4
##  8  0.22   152 treated         3
##  9  0.56   191 treated         3
## 10  0.56   201 treated         4
## # ℹ 13 more rows
Performs column transformations separately within each group.
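A common grouped mutate() is centering values on their group mean. A sketch (rate_centered is an illustrative name); the final ungroup() keeps the grouping from affecting later steps:

```r
library(dplyr)

centered <- Puromycin %>%
  group_by(conc) %>%
  mutate(rate_centered = rate - mean(rate)) %>%  # mean() is per conc group
  ungroup()
head(centered, 3)
```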
filter() with groups – Subset rows within groups
# Keep the row with the highest rate within each conc group
Puromycin %>% group_by(conc) %>% filter(rate == max(rate))
## # A tibble: 6 × 3
## # Groups:   conc [6]
##    conc  rate state  
##   <dbl> <dbl> <fct>  
## 1  0.02    76 treated
## 2  0.06   107 treated
## 3  0.11   139 treated
## 4  0.22   159 treated
## 5  0.56   201 treated
## 6  1.1    207 treated
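On grouped data, filter() evaluates its condition within each group, so max(rate) is the maximum of each conc group rather than of the whole dataset. Here, slice_max(rate, n = 1) returns the same rows.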

🔗 Joins & Bindings

💡 This section lists core joining and binding operations in the tidyverse.

left_join() – Combine two data frames by matching rows on a common key, keeping every row of the left data frame.
# Convert Puromycin to a tibble named puromycin
puromycin <- as_tibble(Puromycin)
head(puromycin)
## # A tibble: 6 × 3
##    conc  rate state  
##   <dbl> <dbl> <fct>  
## 1  0.02    76 treated
## 2  0.02    47 treated
## 3  0.06    97 treated
## 4  0.06   107 treated
## 5  0.11   123 treated
## 6  0.11   139 treated
# Create a second dataset with additional info
treatment_info <- tibble(
  state = c("treated", "untreated"),
  description = c("Received Puromycin", "Control group")
)
head(treatment_info)
## # A tibble: 2 × 2
##   state     description       
##   <chr>     <chr>             
## 1 treated   Received Puromycin
## 2 untreated Control group
# Left join: add treatment descriptions
left_join(puromycin, treatment_info, by = "state")
## # A tibble: 23 × 4
##     conc  rate state   description       
##    <dbl> <dbl> <chr>   <chr>             
##  1  0.02    76 treated Received Puromycin
##  2  0.02    47 treated Received Puromycin
##  3  0.06    97 treated Received Puromycin
##  4  0.06   107 treated Received Puromycin
##  5  0.11   123 treated Received Puromycin
##  6  0.11   139 treated Received Puromycin
##  7  0.22   159 treated Received Puromycin
##  8  0.22   152 treated Received Puromycin
##  9  0.56   191 treated Received Puromycin
## 10  0.56   201 treated Received Puromycin
## # ℹ 13 more rows

Thus:

  • left_join(puromycin, treatment_info, by = "state") adds the description column to the puromycin dataset by matching values in the state column.
  • Every row of puromycin keeps its original data. Because state is a factor in puromycin but a character vector in treatment_info, the joined state column is returned as character (<chr>).
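The summary also lists inner_join(), which keeps only rows whose key is present in both data frames. A sketch using a hypothetical lookup table, treated_info, that describes only the treated state, compared with left_join():

```r
library(dplyr)

puromycin <- as_tibble(Puromycin)

# Hypothetical lookup table covering only the treated state
treated_info <- tibble(state = "treated", description = "Received Puromycin")

# inner_join(): only the 12 treated rows have a match
inner <- inner_join(puromycin, treated_info, by = "state")

# left_join(): all 23 rows; untreated rows get NA in description
left <- left_join(puromycin, treated_info, by = "state")
```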
bind_rows() / bind_cols() – Combine data frames by stacking rows (bind_rows()) or by placing columns side by side (bind_cols()).
# Split Puromycin into two groups
treated <- puromycin %>% filter(state == "treated")
untreated <- puromycin %>% filter(state == "untreated")

treated # Show treated, 12 rows
## # A tibble: 12 × 3
##     conc  rate state  
##    <dbl> <dbl> <fct>  
##  1  0.02    76 treated
##  2  0.02    47 treated
##  3  0.06    97 treated
##  4  0.06   107 treated
##  5  0.11   123 treated
##  6  0.11   139 treated
##  7  0.22   159 treated
##  8  0.22   152 treated
##  9  0.56   191 treated
## 10  0.56   201 treated
## 11  1.1    207 treated
## 12  1.1    200 treated
# Row binding: recombine after filtering
bind_rows(treated, untreated) # now 23 rows
## # A tibble: 23 × 3
##     conc  rate state  
##    <dbl> <dbl> <fct>  
##  1  0.02    76 treated
##  2  0.02    47 treated
##  3  0.06    97 treated
##  4  0.06   107 treated
##  5  0.11   123 treated
##  6  0.11   139 treated
##  7  0.22   159 treated
##  8  0.22   152 treated
##  9  0.56   191 treated
## 10  0.56   201 treated
## # ℹ 13 more rows
# Column binding: add an extra column
extra_info <- tibble(source = rep("Lab A", nrow(puromycin)))
extra_info
## # A tibble: 23 × 1
##    source
##    <chr> 
##  1 Lab A 
##  2 Lab A 
##  3 Lab A 
##  4 Lab A 
##  5 Lab A 
##  6 Lab A 
##  7 Lab A 
##  8 Lab A 
##  9 Lab A 
## 10 Lab A 
## # ℹ 13 more rows
bind_cols(puromycin, extra_info) # Columns bound
## # A tibble: 23 × 4
##     conc  rate state   source
##    <dbl> <dbl> <fct>   <chr> 
##  1  0.02    76 treated Lab A 
##  2  0.02    47 treated Lab A 
##  3  0.06    97 treated Lab A 
##  4  0.06   107 treated Lab A 
##  5  0.11   123 treated Lab A 
##  6  0.11   139 treated Lab A 
##  7  0.22   159 treated Lab A 
##  8  0.22   152 treated Lab A 
##  9  0.56   191 treated Lab A 
## 10  0.56   201 treated Lab A 
## # ℹ 13 more rows

Thus:
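- bind_rows() stacks datasets vertically (more rows); columns are matched by name.
- bind_cols() combines datasets horizontally (more columns); the inputs must have the same number of rows, and rows are matched by position, not by a key.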
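bind_rows() can also record which input each row came from through .id, and it fills columns missing from one input with NA. A sketch (the argument names treated/untreated and the origin column are illustrative):

```r
library(dplyr)

puromycin <- as_tibble(Puromycin)
treated <- puromycin %>% filter(state == "treated")
untreated <- puromycin %>% filter(state == "untreated")

# Named inputs plus .id add a column holding each input's name
combined <- bind_rows(treated = treated, untreated = untreated, .id = "origin")
table(combined$origin)

# The second input lacks state, so those rows get NA in state
partial <- bind_rows(treated, untreated %>% select(conc, rate))
```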


🔄 Reshaping Data

💡 This section lists operations that restructure data between long and wide formats in the tidyverse.

pivot_wider() – Spread key-value pairs across multiple columns. This is useful when you want to separate values into distinct columns.

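First add a replicate column that alternates between 1 and 2 down the rows (each state/conc pair has two measurements, except untreated at conc = 1.10, which has only one):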

Puromycin_reps <- Puromycin %>%
  mutate(replicate = rep(1:2, length.out = n()))
Puromycin_reps
##    conc rate     state replicate
## 1  0.02   76   treated         1
## 2  0.02   47   treated         2
## 3  0.06   97   treated         1
## 4  0.06  107   treated         2
## 5  0.11  123   treated         1
## 6  0.11  139   treated         2
## 7  0.22  159   treated         1
## 8  0.22  152   treated         2
## 9  0.56  191   treated         1
## 10 0.56  201   treated         2
## 11 1.10  207   treated         1
## 12 1.10  200   treated         2
## 13 0.02   67 untreated         1
## 14 0.02   51 untreated         2
## 15 0.06   84 untreated         1
## 16 0.06   86 untreated         2
## 17 0.11   98 untreated         1
## 18 0.11  115 untreated         2
## 19 0.22  131 untreated         1
## 20 0.22  124 untreated         2
## 21 0.56  144 untreated         1
## 22 0.56  158 untreated         2
## 23 1.10  160 untreated         1

Now make the data wide:

Puromycin_wide <- Puromycin_reps %>% pivot_wider(names_from = replicate, values_from = rate)
Puromycin_wide
## # A tibble: 12 × 4
##     conc state       `1`   `2`
##    <dbl> <fct>     <dbl> <dbl>
##  1  0.02 treated      76    47
##  2  0.06 treated      97   107
##  3  0.11 treated     123   139
##  4  0.22 treated     159   152
##  5  0.56 treated     191   201
##  6  1.1  treated     207   200
##  7  0.02 untreated    67    51
##  8  0.06 untreated    84    86
##  9  0.11 untreated    98   115
## 10  0.22 untreated   131   124
## 11  0.56 untreated   144   158
## 12  1.1  untreated   160    NA
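The NA in the last row appears because the untreated group has only one measurement at conc = 1.10, so there is no replicate 2 to fill that cell. The column names `1` and `2` are non-syntactic and need backticks in later code; the names_prefix argument gives clearer names. A sketch:

```r
library(dplyr)
library(tidyr)

# Rebuild the replicate column as above
Puromycin_reps <- Puromycin %>%
  mutate(replicate = rep(1:2, length.out = n()))

# names_prefix turns the replicate values 1 and 2 into rep_1 and rep_2
wide_named <- Puromycin_reps %>%
  pivot_wider(names_from = replicate, values_from = rate, names_prefix = "rep_")
wide_named
```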
pivot_longer() – Convert columns into key-value pairs. This is useful when you want to gather multiple columns into a single column.
Puromycin_long <- Puromycin_wide %>%
  pivot_longer(cols = c("1", "2"), names_to = "replicate", values_to = "rate")
Puromycin_long
## # A tibble: 24 × 4
##     conc state   replicate  rate
##    <dbl> <fct>   <chr>     <dbl>
##  1  0.02 treated 1            76
##  2  0.02 treated 2            47
##  3  0.06 treated 1            97
##  4  0.06 treated 2           107
##  5  0.11 treated 1           123
##  6  0.11 treated 2           139
##  7  0.22 treated 1           159
##  8  0.22 treated 2           152
##  9  0.56 treated 1           191
## 10  0.56 treated 2           201
## # ℹ 14 more rows
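Note that the long table has 24 rows rather than the original 23: pivot_longer() keeps the NA cell created by pivot_wider(). Setting values_drop_na = TRUE removes it, and names_prefix strips a prefix such as rep_ if the wide table was built with one. A sketch that rebuilds a prefixed wide table so it runs on its own:

```r
library(dplyr)
library(tidyr)

wide_named <- Puromycin %>%
  mutate(replicate = rep(1:2, length.out = n())) %>%
  pivot_wider(names_from = replicate, values_from = rate, names_prefix = "rep_")

long_clean <- wide_named %>%
  pivot_longer(
    cols = starts_with("rep_"),
    names_to = "replicate",
    names_prefix = "rep_",  # "rep_1" becomes "1"
    values_to = "rate",
    values_drop_na = TRUE   # drop the empty untreated / 1.10 / replicate 2 cell
  )
nrow(long_clean)  # 23
```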

📈 Visualization

💡 This section shows how to build plots with ggplot2, the tidyverse plotting package.

ggplot() – The function ggplot() creates visualizations of your data.

A plot is built from the following layers:

  1. Initialization Layer
  2. Geometry Layer
  3. Statistical Layer
  4. Scale Layer
  5. Label Layer
  6. Theme Layer

To show the different ggplot2 layers with the Puromycin dataset, we build the plot layer by layer: first the data and aesthetic mappings, then a geometry layer, and finally statistical transformations, scales, labels, and a theme.

Here is a step-by-step example using R code for a scatter plot with a smooth line, illustrating the key layers:

  • Initialization Layer

Sets up the data and the initial aesthetic mappings (x, y, color/group)

ggplot(Puromycin, aes(x = conc, y = rate, color = state))

No data are plotted yet: ggplot() only creates the coordinate grid and sets up the axes.

  • Geometry Layer

Adds the visual representation of the data (points for a scatter plot)

ggplot(Puromycin, aes(x = conc, y = rate, color = state)) +
  geom_point(size = 3)

The data are now plotted. The + operator adds a layer to the plot. Replacing geom_point() with another geom_*() function (e.g., geom_line() or geom_boxplot()) produces a different plot type.
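For example, swapping geom_point() for geom_boxplot() and putting state on the x axis gives a box plot of reaction rates per enzyme state. A sketch:

```r
library(ggplot2)

# One box per state, summarizing the distribution of rate
p_box <- ggplot(Puromycin, aes(x = state, y = rate)) +
  geom_boxplot()
p_box
```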

  • Statistical Layer

Adds a statistical summary (e.g., a smooth line fit to the data). For simplicity, we just use a linear model.

ggplot(Puromycin, aes(x = conc, y = rate, color = state)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1)

Note that enzyme kinetics data like these are normally fit with the Michaelis-Menten model, which requires nonlinear regression and is omitted here. For demonstration purposes, this example uses a linear model without standard error bands (se = FALSE).
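As a sketch of that Michaelis-Menten fit, base R's nls() can estimate the two parameters for the treated group, and geom_smooth(method = "nls") can draw one fitted curve per state. The starting values are assumptions (rough guesses read off the plot); nonlinear fits may fail to converge if they are far off.

```r
library(ggplot2)

# Michaelis-Menten model: rate = Vm * conc / (K + conc)
# Vm = maximum rate, K = concentration at half the maximum rate
mm_fit <- nls(rate ~ Vm * conc / (K + conc),
              data = subset(Puromycin, state == "treated"),
              start = c(Vm = 200, K = 0.05))
coef(mm_fit)  # Vm about 212.7, K about 0.064

# Same model drawn per state; se = FALSE because predict() for nls
# models ignores se.fit and returns no standard errors
p_mm <- ggplot(Puromycin, aes(x = conc, y = rate, color = state)) +
  geom_point(size = 3) +
  geom_smooth(method = "nls",
              formula = y ~ Vm * x / (K + x),
              method.args = list(start = c(Vm = 200, K = 0.05)),
              se = FALSE)
p_mm
```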

  • Scale Layer

Controls the mapping from the data values to the visual properties (e.g., color)

ggplot(Puromycin, aes(x = conc, y = rate, color = state)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
  scale_color_manual(name = "Enzyme State", values = c("untreated" = "darkgreen", "treated" = "purple"))

  • Label Layer

Adds non-data annotations like titles, axis labels, and caption

ggplot(Puromycin, aes(x = conc, y = rate, color = state)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
  scale_color_manual(name = "Enzyme State", values = c("untreated" = "darkgreen", "treated" = "purple")) +
  labs(
    title = "Reaction Rate vs. Substrate Concentration in Puromycin",
    subtitle = "Effect of enzyme state (treated/untreated)",
    x = "Concentration (ppm)",
    y = "Reaction Rate (counts/min/min)",
    caption = "Data from Puromycin dataset"
  )

  • Theme Layer

Controls the non-data graphical elements (e.g., background, grid lines, font size)

ggplot(Puromycin, aes(x = conc, y = rate, color = state)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
  scale_color_manual(name = "Enzyme State", values = c("untreated" = "darkgreen", "treated" = "purple")) +
  labs(
    title = "Reaction Rate vs. Substrate Concentration in Puromycin",
    subtitle = "Effect of enzyme state (treated/untreated)",
    x = "Concentration (ppm)",
    y = "Reaction Rate (counts/min/min)",
    caption = "Data from Puromycin dataset"
  ) +
  theme_minimal(base_size = 14)

This example only demonstrates how the layers combine.
For a complete overview of ggplot2, see this link.

You can also download a PDF cheat sheet here.


🧰 Utilities

💡 This section lists some utility operations in the tidyverse.

glimpse() – The function glimpse() is great for quick inspection of your data: it prints one line per column with the column type and its first values. This helps you verify column types and spot missing or unexpected values.
glimpse(Puromycin)
## Rows: 23
## Columns: 3
## $ conc  <dbl> 0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10…
## $ rate  <dbl> 76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200, 67, 51,…
## $ state <fct> treated, treated, treated, treated, treated, treated, treated, t…
pull() – The function pull() extracts a single column from a data frame or tibble as a vector.
Puromycin %>% pull(rate)
##  [1]  76  47  97 107 123 139 159 152 191 201 207 200  67  51  84  86  98 115 131
## [20] 124 144 158 160
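pull() without a column argument returns the last column, and negative numbers count from the right. A sketch:

```r
library(dplyr)

# Default var = -1: the last column (state), returned as a factor
last_col <- Puromycin %>% pull()

# Positive positions count from the left: column 1 is conc
first_col <- Puromycin %>% pull(1)
```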
count() – The function count() tallies the number of rows for each value (or combination of values) of one or more columns.
Puromycin %>% count(state)
##       state  n
## 1   treated 12
## 2 untreated 11
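count() accepts several columns, and the sort and name arguments order the result by frequency and rename the count column. A sketch (n_obs is an illustrative name):

```r
library(dplyr)

# One row per state/conc combination
by_state_conc <- Puromycin %>% count(state, conc)

# Most frequent conc values first, count column named n_obs
by_conc_sorted <- Puromycin %>% count(conc, sort = TRUE, name = "n_obs")
by_conc_sorted
```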



This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.