Demo: The Datasaurus Dozen Dataset
Intro
🦕 The Datasaurus Dozen data:
The datasauRus R package contains 13 datasets, one of which is the famous dino dataset (the “Datasaurus” shape).
Same Statistics:
All 13 datasets in the “Datasaurus Dozen” have nearly identical basic summary statistics: the means of \(x\) and \(y\), the standard deviations of \(x\) and \(y\), and the correlation between \(x\) and \(y\).
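For reference, these are the usual sample definitions. For \(n\) points \((x_i, y_i)\), R's mean(), sd() and cor() (Pearson correlation by default) compute the following, with \(\bar{y}\) and \(s_y\) defined the same way:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \\
s_x &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \\
r_{xy} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
               {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{aligned}
```

Each value compresses all \(n\) points into a single number, which is why very different point clouds can produce the same five values.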
Different Visualizations:
When plotted, these datasets form radically different visual shapes, including a star, a circle, and the iconic dinosaur.
Why it is important:
By providing data that is statistically indistinguishable but visually distinct, the datasauRus package drives home the point that plotting your data is crucial for understanding its underlying structure.
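The code chunk that produced the output below was not preserved. A minimal setup consistent with it (an assumption, not necessarily the author's exact code) loads the tidyverse and datasauRus packages and previews the first rows of datasaurus_dozen:

```r
# Assumed setup: both packages must already be installed, e.g. with
# install.packages(c("tidyverse", "datasauRus"))
library(tidyverse)   # attaches dplyr, ggplot2 and the %>% pipe, among others
library(datasauRus)  # provides the datasaurus_dozen data (columns dataset, x, y)

# Preview the first six rows
head(datasaurus_dozen)
```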
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## # A tibble: 6 × 3
##   dataset     x     y
##   <chr>   <dbl> <dbl>
## 1 dino     55.4  97.2
## 2 dino     51.5  96.0
## 3 dino     46.2  94.5
## 4 dino     42.8  91.4
## 5 dino     40.8  88.3
## 6 dino     38.7  84.9
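To check the intro's claim that the package holds 13 datasets, dplyr's count() tallies the rows for each value of dataset. The variable name points_per_dataset is chosen here for illustration:

```r
library(dplyr)
library(datasauRus)

# One row per dataset; column n is the number of (x, y) points in it
points_per_dataset <- datasaurus_dozen %>%
  count(dataset)

points_per_dataset
nrow(points_per_dataset)  # number of distinct datasets, should be 13
```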
# Summary statistics for each of the 13 datasets
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(
    mean_x    = mean(x),
    mean_y    = mean(y),
    std_dev_x = sd(x),
    std_dev_y = sd(y),
    corr_x_y  = cor(x, y)
  )
## # A tibble: 13 × 6
##    dataset    mean_x mean_y std_dev_x std_dev_y corr_x_y
##    <chr>       <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
##  1 away         54.3   47.8      16.8      26.9  -0.0641
##  2 bullseye     54.3   47.8      16.8      26.9  -0.0686
##  3 circle       54.3   47.8      16.8      26.9  -0.0683
##  4 dino         54.3   47.8      16.8      26.9  -0.0645
##  5 dots         54.3   47.8      16.8      26.9  -0.0603
##  6 h_lines      54.3   47.8      16.8      26.9  -0.0617
##  7 high_lines   54.3   47.8      16.8      26.9  -0.0685
##  8 slant_down   54.3   47.8      16.8      26.9  -0.0690
##  9 slant_up     54.3   47.8      16.8      26.9  -0.0686
## 10 star         54.3   47.8      16.8      26.9  -0.0630
## 11 v_lines      54.3   47.8      16.8      26.9  -0.0694
## 12 wide_lines   54.3   47.8      16.8      26.9  -0.0666
## 13 x_shape      54.3   47.8      16.8      26.9  -0.0656
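The printed table rounds every value, so one way to see how close the statistics really are is to compute, for each statistic, its spread (highest minus lowest) across the 13 datasets. This is a sketch; dataset_stats and spread_stats are names chosen here for illustration:

```r
library(dplyr)
library(tidyr)
library(datasauRus)

# Per-dataset summary statistics, as in the table above
dataset_stats <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(
    mean_x    = mean(x),
    mean_y    = mean(y),
    std_dev_x = sd(x),
    std_dev_y = sd(y),
    corr_x_y  = cor(x, y)
  )

# Long format: one row per (dataset, statistic), then the range of each statistic
spread_stats <- dataset_stats %>%
  pivot_longer(-dataset, names_to = "statistic", values_to = "value") %>%
  group_by(statistic) %>%
  summarize(
    lowest  = min(value),
    highest = max(value),
    spread  = highest - lowest
  )

spread_stats
```

A small spread in every row, set against the very different plots below, is the whole point of the Datasaurus Dozen.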
Away
datasaurus_dozen %>%
  filter(dataset == "away") %>%   # keep only the "away" points
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  theme_void() +                  # drop axes, gridlines and background
  theme(legend.position = "none")
Dino
datasaurus_dozen %>%
  filter(dataset == "dino") %>%   # keep only the "dino" points
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none")
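Filtering and plotting one dataset at a time, as above, means repeating the same code for each of the 13 datasets. A more compact sketch uses ggplot2's facet_wrap() to draw every dataset in its own panel. The default fixed scales put all panels on the same axes, which makes the contrast with the near-identical statistics immediate. The name all_plots is chosen here for illustration:

```r
library(ggplot2)
library(datasauRus)

# One small scatter plot per dataset, all sharing the same x and y scales
all_plots <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point(size = 0.8) +
  facet_wrap(vars(dataset), ncol = 4) +
  theme_minimal()

all_plots
```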