Lesson 11-13: Data visualization
Once the data has been read/loaded and cleaned up, it is time to start analyzing and presenting it. In these lessons we focus on visualization: we will use different plots to show the analysis, and look at what it takes to make the data sets usable for each type of plot.
First, let’s load a data set to work with that has largely been cleaned up already. As in previous lessons, we start by defining how the tibbles we create in this part are displayed, using the tidyverse and kableExtra libraries.
library(tidyverse)
library(kableExtra)
library(knitr)
library(pillar)
# Show a data frame as a striped HTML table, with each column's type under its name.
formatted_table <- function(df) {
col_types <- sapply(df, pillar::type_sum)
new_col_names <- paste0(names(df), "<br>", "<span style='font-weight: normal;'>", col_types, "</span>")
kbl(df, col.names = new_col_names, escape = FALSE, format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
}
Download the file dinoDatasetCSV.csv and check in a text editor which delimiter the file uses. Then read the file into R.
# Read the data on dinosaurs.
dino_data <- read_csv2("./files_13_data_visualization_exercises/add_exercises/dinoDatasetCSV.csv")
# Replace any missing data with NA values.
# Hint: check which columns are of character type but contain numbers.
tibble1 <- tibble(dino_data) %>%
replace(.== "?", NA) %>%
mutate(length_m = as.numeric(length_m)) %>%
mutate(weight_kg = as.numeric(weight_kg)) %>%
mutate(height_m = as.numeric(height_m))
formatted_table(head(tibble1))
| scientific_name (chr) | common_name (chr) | meaning (chr) | diet (chr) | length_m (dbl) | weight_kg (dbl) | height_m (dbl) | locomotion (chr) | period (chr) | lived_in (chr) | behavior_notes (chr) | first_discovered (chr) | fossil_location (chr) | notable_features (chr) | intelligence_level (chr) | source_link (chr) | row_index (dbl) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abelisaurus | Abelisaurus | Abel’s lizard | Carnivore | 7.0 | 1500 | 2.4 | Bipedal | Late Cretaceous | Argentina | Large theropod | 1985 | Argentina | Short arms | Medium | https://en.wikipedia.org/wiki/Abelisaurus | 0 |
| Abrictosaurus | Abrictosaurus | Wakeful lizard | Herbivore | 1.5 | 15 | 0.5 | Bipedal | Early Jurassic | South Africa | Small herbivore | 1974 | South Africa | Unique teeth | Medium | https://en.wikipedia.org/wiki/Abrictosaurus | 1 |
| Abrosaurus | Abrosaurus | Delicate lizard | Herbivore | 9.0 | 2000 | 4.5 | Quadrupedal | Middle Jurassic | China | Delicate skull | 1959 | China | Delicate skull | Medium | https://en.wikipedia.org/wiki/Abrosaurus | 2 |
| Abydosaurus | Abydosaurus | Abydos lizard | Herbivore | 18.0 | 30000 | 6.0 | Quadrupedal | Early Cretaceous | USA | Basal sauropod | 2010 | USA | Complete skull | Medium | https://en.wikipedia.org/wiki/Abydosaurus | 3 |
| Acantholipan | Acantholipan | Spiny shield | Herbivore | 5.0 | 2500 | 1.5 | Quadrupedal | Late Cretaceous | Mexico | Armored nodosaur | 2011 | Mexico | Clubless armored tail | Medium | https://en.wikipedia.org/wiki/Acantholipan | 4 |
| Acanthopholis | Acanthopholis | Spiny scales | Herbivore | 4.0 | 1000 | 1.2 | Quadrupedal | Early Cretaceous | UK | Spiny armor | 1865 | UK | Dermal armor | Medium | https://en.wikipedia.org/wiki/Acanthopholis | 5 |
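The replace() step and the three mutate() calls above can also be combined into a single step. The following is a minimal sketch, assuming dplyr 1.0 or later (for across()) and using a small hypothetical tibble instead of the dinosaur file. na_if() turns the "?" placeholders into NA before as.numeric() converts the text, so those cells do not produce coercion warnings.

```r
library(dplyr)

# Hypothetical toy data: numbers stored as text, "?" marks a missing value.
raw <- tibble(
  scientific_name = c("Dino A", "Dino B", "Dino C"),
  length_m  = c("7.0", "?", "9.0"),
  weight_kg = c("1500", "15", "?"),
  height_m  = c("2.4", "0.5", "4.5")
)

# Replace "?" with NA and convert the three measurement columns to numbers.
clean <- raw %>%
  mutate(across(c(length_m, weight_kg, height_m),
                ~ as.numeric(na_if(.x, "?"))))

clean
```

On the real data, the same across() call would be applied to tibble1 with the same three column names.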
Summarize each
Let’s create a summary of the dinosaurs that lived in the Middle Cretaceous period. We are only interested in the period, scientific name, length, weight and height of the animals.
# Select the dinosaurs from the Middle Cretaceous period and drop the rows that have NA values.
cretaceous <- tibble1 %>%
filter(period == "Middle Cretaceous") %>%
drop_na()
# Sort on scientific name and select only the columns containing the period, scientific name, length, weight and height.
sel_data <- cretaceous %>%
arrange(scientific_name) %>%
select(period, scientific_name, length_m, weight_kg, height_m)
# Change the colnames.
colnames(sel_data) <- c("Period", "Scientific name", "Length (m)",
"Weight (kg)", "Height (m)")
formatted_table(head(sel_data))
| Period (chr) | Scientific name (chr) | Length (m) (dbl) | Weight (kg) (dbl) | Height (m) (dbl) |
|---|---|---|---|---|
| Middle Cretaceous | Dongyangosaurus | 15 | 12000 | 4.0 |
| Middle Cretaceous | Elaltitan | 21 | 15000 | 5.0 |
| Middle Cretaceous | Epachthosaurus | 17 | 14000 | 4.5 |
| Middle Cretaceous | Ichthyovenator | 9 | 3500 | 3.5 |
| Middle Cretaceous | Ornithomimoides | 4 | 100 | 1.5 |
| Middle Cretaceous | Stegosaurides | 7 | 3000 | 3.0 |
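Because drop_na() is called before select(), a Middle Cretaceous dinosaur is removed when any of its columns is NA, even a column such as behavior_notes that is not used in the plots. Whether that is desirable depends on the analysis. The sketch below uses a hypothetical three-row tibble to show the difference between dropping on all columns and dropping on selected columns only.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with a gap in a measurement and in a text column.
dinos <- tibble(
  scientific_name = c("Dino A", "Dino B", "Dino C"),
  weight_kg       = c(1500, NA, 3000),
  behavior_notes  = c("Large theropod", "Small herbivore", NA)
)

# Without arguments, drop_na() removes a row if ANY column is NA.
all_complete <- drop_na(dinos)

# With column names, only NA values in those columns count.
weight_known <- drop_na(dinos, weight_kg)
```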
We will use this data to make different plots.
Bar chart
Create a bar chart with ggplot of the weight of the Middle Cretaceous dinosaurs. Add titles (a main title and axis titles), give the bars a steelblue color, and make sure that the labels on the x-axis are placed at a 45 degree angle.
# Create a bar chart with ggplot of the weight.
bar_dino <- ggplot(data = sel_data, aes(x = `Scientific name`, y = `Weight (kg)`)) +
geom_bar(stat="identity", fill="steelblue") +
labs(title="Weight of dinosaurs from the Cretaceous period", x = "Scientific name", y = "Weight (kg)") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
bar_dino
From the graph it is clear that Udelartitan, Volgatitan and Zhuchengtitan were the heaviest of the Middle Cretaceous dinosaurs in this data set (the name ‘titan’ already hints at this).
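In ggplot2, geom_col() is a shorthand for geom_bar(stat = "identity"), and reorder() can sort the bars by their value instead of alphabetically. A minimal sketch on hypothetical data (not taken from the dinosaur file) that checks both layers produce the same bar heights:

```r
library(ggplot2)

# Hypothetical toy data (not taken from the dinosaur file).
weights <- data.frame(
  name      = c("Dino A", "Dino B", "Dino C"),
  weight_kg = c(1500, 35000, 12000)
)

# geom_col() is shorthand for geom_bar(stat = "identity").
p_bar <- ggplot(weights, aes(x = name, y = weight_kg)) +
  geom_bar(stat = "identity", fill = "steelblue")
p_col <- ggplot(weights, aes(x = name, y = weight_kg)) +
  geom_col(fill = "steelblue")

# reorder() sorts the bars by weight instead of alphabetically (heaviest first).
p_sorted <- ggplot(weights, aes(x = reorder(name, -weight_kg), y = weight_kg)) +
  geom_col(fill = "steelblue") +
  labs(x = "Scientific name", y = "Weight (kg)")
```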
Grouped Bar chart
Let’s compare the length to the height of the dinosaurs in one bar
chart. First, you will have to make the data tidy with
pivot_longer(). Then you can create the grouped bar
chart.
# Make the data tidy. Check that the data is indeed tidy (length and height should end up in a column called Dimension).
tidy_data <- sel_data %>%
pivot_longer(c(`Length (m)`, `Height (m)`), names_to = "Dimension",
values_to = "Size (m)")
formatted_table(head(tidy_data))
| Period (chr) | Scientific name (chr) | Weight (kg) (dbl) | Dimension (chr) | Size (m) (dbl) |
|---|---|---|---|---|
| Middle Cretaceous | Dongyangosaurus | 12000 | Length (m) | 15.0 |
| Middle Cretaceous | Dongyangosaurus | 12000 | Height (m) | 4.0 |
| Middle Cretaceous | Elaltitan | 15000 | Length (m) | 21.0 |
| Middle Cretaceous | Elaltitan | 15000 | Height (m) | 5.0 |
| Middle Cretaceous | Epachthosaurus | 14000 | Length (m) | 17.0 |
| Middle Cretaceous | Epachthosaurus | 14000 | Height (m) | 4.5 |
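pivot_longer() turns each length/height pair into two rows, so a table with n dinosaurs becomes one with 2n rows. A minimal sketch, using the length and height of the first two dinosaurs of sel_data in a small stand-alone tibble:

```r
library(tibble)
library(tidyr)

# Length and height of two dinosaurs, copied from the sel_data preview.
wide <- tibble(
  `Scientific name` = c("Dongyangosaurus", "Elaltitan"),
  `Length (m)` = c(15, 21),
  `Height (m)` = c(4, 5)
)

# One row per dinosaur and dimension.
long <- pivot_longer(wide, c(`Length (m)`, `Height (m)`),
                     names_to = "Dimension", values_to = "Size (m)")
long
```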
# Plot the grouped bar chart.
dino_size1 <- ggplot(tidy_data, aes(`Scientific name`, `Size (m)`,
fill = Dimension)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title="Length and height for dinosaurs from the Cretaceous period",
y = "Size (m)") +
theme(axis.text.x = element_text(angle = 45, hjust=1, size = 7),
axis.title.x = element_blank()) # leave out title of the x-axis
dino_size1
Percent bar chart
You can create a percent bar chart to see, for each dinosaur, how large its length is relative to its height: every bar is scaled to 100% and split into the two dimensions.
# Create a percentage bar chart.
perc_dino_size1 <- ggplot(tidy_data, aes(`Scientific name`, `Size (m)`,
fill = `Dimension`)) +
geom_bar(stat = "identity", position="fill") +
labs(title="Length and height for dinosaurs from the Cretaceous period") +
theme(axis.text.x = element_text(angle = 45, hjust=1, size = 7),
axis.title.x = element_blank())
perc_dino_size1
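With position = "fill", each segment is divided by the total of its bar, so the y-axis runs from 0 to 1. The same shares can be computed by hand, as in this sketch with the length and height of two dinosaurs copied from the sel_data preview:

```r
library(dplyr)
library(tibble)

# Length and height of two dinosaurs, copied from the sel_data preview.
sizes <- tibble(
  `Scientific name` = rep(c("Dongyangosaurus", "Elaltitan"), each = 2),
  Dimension = rep(c("Length (m)", "Height (m)"), times = 2),
  `Size (m)` = c(15, 4, 21, 5)
)

# Share of each dimension within its own bar (what position = "fill" draws).
shares <- sizes %>%
  group_by(`Scientific name`) %>%
  mutate(share = `Size (m)` / sum(`Size (m)`)) %>%
  ungroup()
shares
```

To label the y-axis as percentages, scale_y_continuous(labels = scales::percent) can be added to the plot.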
Switching the order within a group
If you would like to present the data in a different order, you need to convert the column with the groups to the factor data type. This way, you can put the length before the height.
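ggplot2 draws the groups in the order of the factor levels. When a character column is turned into a factor without explicit levels, the levels are sorted alphabetically, which is why Height (m) comes before Length (m) in the first grouped bar chart. A small base R sketch of the difference:

```r
# Character values become factor levels in alphabetical order by default.
dims <- c("Length (m)", "Height (m)", "Length (m)", "Height (m)")
default_levels <- levels(factor(dims))

# Explicit levels fix the order used for bars, fills and legends.
custom_levels <- levels(factor(dims, levels = c("Length (m)", "Height (m)")))

default_levels
custom_levels
```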
# First, change the column of 'Dimension' to a factor type of data.
tidy_data <- tidy_data %>%
mutate(Dimension = factor(Dimension,
levels = c("Length (m)", "Height (m)")))
levels(tidy_data$Dimension)
## [1] "Length (m)" "Height (m)"
# Second, plot the grouped bar chart.
dino_size2 <- ggplot(tidy_data, aes(`Scientific name`, `Size (m)`,
fill = Dimension)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title="Length and height for dinosaurs from the Cretaceous period",
y = "Size (m)") +
theme(axis.text.x = element_text(angle = 45, hjust=1, size = 7),
axis.title.x = element_blank())
dino_size2
Use the same data to create a percentage bar chart.
# Create the percentage bar chart.
perc_dino_size2 <- ggplot(tidy_data, aes(`Scientific name`, `Size (m)`,
fill = `Dimension`)) +
geom_bar(stat = "identity", position="fill") +
labs(title="Length and height for dinosaurs from the Cretaceous period") +
theme(axis.text.x = element_text(angle = 45, hjust=1, size = 8),
axis.title.x = element_blank())
perc_dino_size2
Pie Chart
For a pie chart we will use a selection of the Cretaceous dinosaurs; otherwise the pie would be divided into too many slices, one for every dinosaur from this period.
We will take the 7 longest dinosaurs and see how the total length is distributed among these ‘long’ dinosaurs.
# Select the seven longest dinosaurs. Check your bar chart with the length and height of each Cretaceous dinosaur. Save these seven to a vector.
long_dino <- c("Dongyangosaurus", "Elaltitan", "Titanomachya", "Epachthosaurus", "Udelartitan",
"Volgatitan", "Zhuchengtitan")
# Filter the data set using this vector.
df_long_dino1 <- sel_data %>%
filter(`Scientific name`%in%long_dino)
formatted_table(df_long_dino1)
| Period (chr) | Scientific name (chr) | Length (m) (dbl) | Weight (kg) (dbl) | Height (m) (dbl) |
|---|---|---|---|---|
| Middle Cretaceous | Dongyangosaurus | 15 | 12000 | 4.0 |
| Middle Cretaceous | Elaltitan | 21 | 15000 | 5.0 |
| Middle Cretaceous | Epachthosaurus | 17 | 14000 | 4.5 |
| Middle Cretaceous | Titanomachya | 15 | 15000 | 6.0 |
| Middle Cretaceous | Udelartitan | 30 | 35000 | 7.0 |
| Middle Cretaceous | Volgatitan | 25 | 35000 | 6.0 |
| Middle Cretaceous | Zhuchengtitan | 25 | 30000 | 6.0 |
# OR select by using `slice_max()` and change the `Scientific name` to factor type of data (creates order in the pie chart):
df_long_dino2 <- sel_data %>%
slice_max(order_by = `Length (m)`, n = 7) %>%
mutate(`Scientific name` = factor(`Scientific name`,
levels = `Scientific name`))
formatted_table(df_long_dino2)
| Period (chr) | Scientific name (fct) | Length (m) (dbl) | Weight (kg) (dbl) | Height (m) (dbl) |
|---|---|---|---|---|
| Middle Cretaceous | Udelartitan | 30 | 35000 | 7.0 |
| Middle Cretaceous | Volgatitan | 25 | 35000 | 6.0 |
| Middle Cretaceous | Zhuchengtitan | 25 | 30000 | 6.0 |
| Middle Cretaceous | Elaltitan | 21 | 15000 | 5.0 |
| Middle Cretaceous | Epachthosaurus | 17 | 14000 | 4.5 |
| Middle Cretaceous | Dongyangosaurus | 15 | 12000 | 4.0 |
| Middle Cretaceous | Titanomachya | 15 | 15000 | 6.0 |
Now create the pie chart.
# Create the pie chart based on the length of these dinosaurs.
dino_pie1 <- ggplot(df_long_dino2, aes(x = "", y = `Length (m)`,
fill = `Scientific name`))+
geom_bar(stat="identity", width = 1) +
coord_polar("y", start=0, direction = -1) +
labs(title="Length for the largest Cretaceous dinosaur species") +
geom_text(aes(label = `Length (m)`), position = position_stack(vjust = 0.5)) + # add values to the pieces of the pie.
theme_void() # remove background, grid, numeric labels
dino_pie1
It seems that the ‘titans’ are the longest and heaviest dinosaurs (except Elaltitan).
If you really want to make it a fancy pie chart, you can try using the
ggrepel library.
# First, determine the positions of the labels outside the pie chart.
library(ggrepel)
fancy_pie1 <- df_long_dino2 %>%
# position the labels outside the pie chart correctly
mutate(csum = rev(cumsum(rev(`Length (m)`))),
pos = `Length (m)`/2 + lead(csum, 1),
pos = if_else(is.na(pos), `Length (m)`/2, pos))
# Now plot the pie chart and insert the labels with `geom_label_repel()`.
dino_pie2 <- ggplot(fancy_pie1,
aes(x = "", y = `Length (m)`,
fill = fct_inorder(`Scientific name`))) +
geom_bar(stat="identity", width = 1) +
coord_polar("y", start=0, direction = -1) +
scale_fill_brewer(palette = "Set3") + # color set for the pie chart
# Create labels outside the pie chart
geom_label_repel(data = fancy_pie1,
aes(y = pos, label = `Length (m)`),
size = 3.5, nudge_x = 0.75, show.legend = FALSE) +
labs(title="Length for the largest Cretaceous dinosaur species") +
guides(fill = guide_legend(title = "Dinosaur")) +
theme_void() # remove background, grid, numeric labels
dino_pie2
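The pos column holds the middle of each slice on the stacked y-axis. Because position_stack() puts the first factor level on top, each slice sits on top of the summed length of all slices after it, and its label goes halfway up the slice. The arithmetic as a base R sketch, using the lengths of df_long_dino2; here c(csum[-1], 0) plays the role of lead(csum, 1) with the final NA replaced, as the if_else() above does.

```r
# Lengths of the seven longest dinosaurs, in the order of df_long_dino2.
len <- c(30, 25, 25, 21, 17, 15, 15)

# Running total from the last slice upwards (same as rev(cumsum(rev(len)))).
csum <- rev(cumsum(rev(len)))

# Midpoint of each slice: half its own size plus everything stacked below it.
pos <- len / 2 + c(csum[-1], 0)
pos
```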
Or with percentages:
# First add a column with the calculated percentages (of the total length).
fancy_pie2 <- fancy_pie1 %>%
mutate(percentage = round(`Length (m)` / sum(`Length (m)`) * 100, 0))
# Then plot the same pie chart, but with the percentages instead of the length.
dino_pie3 <- ggplot(fancy_pie2,
aes(x = "", y = `Length (m)`,
fill = fct_inorder(`Scientific name`))) +
geom_bar(stat="identity", width = 1) +
coord_polar("y", start = 0, direction = -1) +
scale_fill_brewer(palette = "Set3") + # color set for the pie chart
# Create labels outside the pie chart
geom_label_repel(data = fancy_pie2,
aes(y = pos, label = paste0(percentage, "%")),
size = 3.5, nudge_x = 0.75, show.legend = FALSE) +
labs(title="Length for the largest Cretaceous dinosaur species") +
guides(fill = guide_legend(title = "Dinosaur")) +
theme_void() # remove background, grid, numeric labels
dino_pie3
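Rounded percentages do not always add up to exactly 100%, because every slice is rounded separately. With the lengths of df_long_dino2 (148 m in total) the whole-number percentages add up to 99%, as this base R sketch shows; rounding to one decimal reduces, but does not always remove, such a difference.

```r
# Lengths of the seven longest dinosaurs (from df_long_dino2).
len <- c(30, 25, 25, 21, 17, 15, 15)

pct <- round(len / sum(len) * 100, 0)
pct        # 20 17 17 14 11 10 10
sum(pct)   # 99, not 100
```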
Boxplot
For the boxplot we go back to the full dinosaur data set (tibble1). Let’s look at the height of the dinosaurs in different periods of time.
# Exclude (use `filter()` on tibble1) the dinosaurs that are heavier than 10000 kg.
# Drop any NA values.
# Select the columns period, scientific name, length, weight and height,
# Make the data in the column period factor type data: Triassic < Jurassic < Cretaceous,
# Save the data in tibble2.
tibble2 <- tibble1 %>%
filter(weight_kg <= 10000) %>%
drop_na() %>%
select(period, scientific_name, length_m, weight_kg, height_m) %>%
mutate(period = factor(period, levels = c("Early Triassic", "Middle Triassic", "Late Triassic",
"Early Jurassic", "Middle Jurassic", "Late Jurassic",
"Early Cretaceous", "Middle Cretaceous", "Late Cretaceous")))
colnames(tibble2) <- c("Period", "Scientific name", "Length (m)",
"Weight (kg)", "Height (m)")
formatted_table(head(tibble2))
| Period (fct) | Scientific name (chr) | Length (m) (dbl) | Weight (kg) (dbl) | Height (m) (dbl) |
|---|---|---|---|---|
| Late Cretaceous | Abelisaurus | 7.0 | 1500 | 2.4 |
| Early Jurassic | Abrictosaurus | 1.5 | 15 | 0.5 |
| Middle Jurassic | Abrosaurus | 9.0 | 2000 | 4.5 |
| Late Cretaceous | Acantholipan | 5.0 | 2500 | 1.5 |
| Early Cretaceous | Acanthopholis | 4.0 | 1000 | 1.2 |
| Late Cretaceous | Achelousaurus | 6.0 | 2500 | 2.0 |
# Create a boxplot for the height of the dinosaurs for each period in time.
height_period <- ggplot(tibble2, aes(x = `Period`, y = `Height (m)`)) +
geom_boxplot() +
labs(title="Height of dinosaurs in different periods in time",
x = "Period", y = "Height (m)") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
height_period
Among these dinosaurs (all 10000 kg or lighter), height seems to increase towards the Jurassic period and to stay at that level during the Cretaceous period.
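According to the ggplot2 documentation for geom_boxplot(), the box runs from the 25th to the 75th percentile, the line inside it marks the median, and the whiskers reach to the most extreme values within 1.5 times the interquartile range (IQR) of the box; points beyond that are drawn as outliers. A base R sketch of that arithmetic, using the six heights from the tibble2 preview treated as a single group purely for illustration:

```r
# Six heights from the tibble2 preview, treated as one group for illustration.
heights <- c(2.4, 0.5, 4.5, 1.5, 1.2, 2.0)

# Lower hinge, median and upper hinge of the box.
q <- quantile(heights, c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Points more than 1.5 * IQR beyond the box are drawn as outliers.
is_outlier <- heights < q[[1]] - 1.5 * iqr | heights > q[[3]] + 1.5 * iqr
heights[is_outlier]
```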
Grouped boxplot
Of course it is also possible to create grouped boxplots. Let’s use the length and height again and plot them against the period in time.
# First make a tidy tibble from tibble2.
tidy_period <- tibble2 %>%
pivot_longer(c(`Length (m)`, `Height (m)`), names_to = "Dimension",
values_to = "Size (m)")
formatted_table(head(tidy_period))
| Period (fct) | Scientific name (chr) | Weight (kg) (dbl) | Dimension (chr) | Size (m) (dbl) |
|---|---|---|---|---|
| Late Cretaceous | Abelisaurus | 1500 | Length (m) | 7.0 |
| Late Cretaceous | Abelisaurus | 1500 | Height (m) | 2.4 |
| Early Jurassic | Abrictosaurus | 15 | Length (m) | 1.5 |
| Early Jurassic | Abrictosaurus | 15 | Height (m) | 0.5 |
| Middle Jurassic | Abrosaurus | 2000 | Length (m) | 9.0 |
| Middle Jurassic | Abrosaurus | 2000 | Height (m) | 4.5 |
# Now create the boxplot.
size_period1 <- ggplot(tidy_period, aes(x = `Period`, y = `Size (m)`, fill = Dimension)) +
geom_boxplot() +
labs(title="Length and height for dinosaurs in different time periods") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
size_period1
If you would like to change the order of the height and length, you have to make the Dimension column a factor.
# Make the column for 'Dimension' a factor type.
tidy_period <- tidy_period %>%
mutate(Dimension = factor(Dimension, levels = c("Length (m)", "Height (m)")))
formatted_table(head(tidy_period))
| Period (fct) | Scientific name (chr) | Weight (kg) (dbl) | Dimension (fct) | Size (m) (dbl) |
|---|---|---|---|---|
| Late Cretaceous | Abelisaurus | 1500 | Length (m) | 7.0 |
| Late Cretaceous | Abelisaurus | 1500 | Height (m) | 2.4 |
| Early Jurassic | Abrictosaurus | 15 | Length (m) | 1.5 |
| Early Jurassic | Abrictosaurus | 15 | Height (m) | 0.5 |
| Middle Jurassic | Abrosaurus | 2000 | Length (m) | 9.0 |
| Middle Jurassic | Abrosaurus | 2000 | Height (m) | 4.5 |
# Create the boxplot.
size_period2 <- ggplot(tidy_period, aes(x = `Period`, y = `Size (m)`, fill = Dimension)) +
geom_boxplot() +
labs(title="Length and height for dinosaurs in different time periods") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
size_period2
Violin Chart
Using the same data, create a violin chart. Violin plots show more information about the distribution of the data than boxplots, but they are also more difficult to interpret.
# Create a violin chart of the height for the different time periods.
height_violin <- ggplot(tibble2, aes(x = `Period`, y = `Height (m)`)) +
geom_violin() +
labs(title="Height of dinosaurs in different time periods") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
height_violin
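Violin plots are easier to read when the quartiles are shown as well. One common option, sketched here on hypothetical data, is to draw a narrow boxplot on top of each violin; outlier.shape = NA hides the boxplot's separate outlier points so the density outline stays the focus.

```r
library(ggplot2)

# Hypothetical heights for two periods (not taken from the dinosaur file).
toy <- data.frame(
  period   = rep(c("Early Jurassic", "Late Jurassic"), each = 5),
  height_m = c(0.5, 1.0, 1.2, 1.5, 2.0, 2.0, 3.0, 4.5, 5.0, 6.0)
)

# A narrow boxplot inside each violin shows the median and quartiles
# on top of the density outline.
violin_box <- ggplot(toy, aes(x = period, y = height_m)) +
  geom_violin() +
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  labs(x = "Period", y = "Height (m)")
violin_box
```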
Line plots
For the following plots we will use the Climate disease dataset.
# Read the data from the file and save it in df1.
df1 <- read.csv2("./files_13_data_visualization_exercises/add_exercises/climate_disease_dataset.csv")
# Turn the first column into dates first: text in dd/mm/YYYY format does not sort chronologically.
df1$date <- as.Date(df1$date, "%d/%m/%Y")
# Filter on the Netherlands, Sweden, Portugal and Hungary and keep the 480 most recent records.
df2 <- df1 %>%
filter(country %in% c("Netherlands", "Sweden", "Portugal", "Hungary")) %>%
slice_max(order_by = date, n = 480)
formatted_table(head(df2))
| date (date) | country (chr) | region (chr) | avg_temp_F (dbl) | precipitation_mm (dbl) | air_quality_index (dbl) | uv_index (dbl) | malaria_cases (int) | dengue_cases (int) | population_density (int) | healthcare_budget (int) | X (dbl) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-12-01 | Portugal | West | 81.31159 | 216.72804 | 8.030935 | 12.000000 | 136 | 134 | 394 | 2520 | 178.3609 |
| 2023-12-01 | Sweden | West | 58.50859 | 318.78748 | 14.187193 | 9.076136 | 113 | 60 | 214 | 2681 | 137.3155 |
| 2023-12-01 | Hungary | Central | 62.04476 | 56.07903 | 0.000000 | 9.390880 | 83 | 52 | 107 | 4705 | 143.6806 |
| 2023-12-01 | Netherlands | Central | 83.97252 | 264.04019 | 71.588550 | 12.000000 | 92 | 66 | 421 | 1202 | 183.1505 |
| 2022-12-01 | Portugal | West | 75.53251 | 166.15460 | 59.467505 | 9.694721 | 65 | 119 | 394 | 2520 | 167.9585 |
| 2022-12-01 | Sweden | West | 60.34775 | 264.77139 | 0.000000 | 7.465901 | 67 | 102 | 214 | 2681 | 140.6259 |
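Dates read from a text file are character strings until they are converted. Text in dd/mm/YYYY format is compared character by character (day, then month, then year), so ordering or slicing on it, for example with slice_max(), does not follow the calendar. Converting with as.Date() first avoids this, as this base R sketch with three example dates shows:

```r
# Dates stored as text in dd/mm/YYYY format (as in the climate file).
date_text <- c("01/12/2022", "01/01/2023", "01/12/2023")

# Sorting the text compares day, then month, then year.
text_sorted <- sort(date_text, decreasing = TRUE)

# After as.Date() the values sort chronologically.
date_values <- as.Date(date_text, format = "%d/%m/%Y")
date_sorted <- sort(date_values, decreasing = TRUE)

text_sorted
date_sorted
```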
Now we can create a line plot.
# Create a line plot of the average precipitation against the date for the four selected countries, using df2.
line_plot1 <- ggplot(df2, aes(x = date, y = precipitation_mm, group = country)) +
geom_line() +
labs(title="Average precipitation (in mm) per month", x = "Date",
y = "Precipitation (mm)")
line_plot1
This is not very clear, since the lines of the different countries cannot be told apart. Let’s use different line types to distinguish the data for the different countries.
# Use different types of lines to distinguish the data from the different countries.
line_plot2 <- ggplot(df2, aes(x = date, y = precipitation_mm, group = country)) +
geom_line(aes(linetype = country)) +
labs(title="Average precipitation (in mm) per month", x = "Date",
y = "Precipitation (mm)")
line_plot2
This looks a bit better, but for this plot it is better to use colors to
distinguish the data from the different countries.
# Create the same line plot as before, but use colors to distinguish the data for the different countries.
line_plot3 <- ggplot(df2, aes(x = date, y = precipitation_mm, group = country)) +
geom_line(aes(color = country)) +
labs(title="Average precipitation (in mm) per month", x = "Date",
y = "Precipitation (mm)")
line_plot3
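In line_plot1 the group = country aesthetic is what keeps the countries on separate lines; without any grouping, geom_line() would join all points into one zig-zag line. Once a discrete variable such as country is mapped to color (or linetype), ggplot2 derives the groups from it automatically. A sketch on hypothetical data that counts the groups with layer_data():

```r
library(ggplot2)

# Hypothetical monthly values for two countries.
toy <- data.frame(
  month   = rep(1:3, times = 2),
  value   = c(10, 20, 15, 30, 25, 35),
  country = rep(c("Country A", "Country B"), each = 3)
)

# Mapping country to color splits the data into one line per country,
# so an explicit group = country is not needed here.
p <- ggplot(toy, aes(x = month, y = value, color = country)) +
  geom_line()

# layer_data() returns the data ggplot2 uses to draw the layer.
n_groups <- length(unique(layer_data(p)$group))
n_groups
```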
And add a trendline.
# Add a linear trendline (one line fitted to the data of all four countries together).
line_plot4 <- ggplot(df2, aes(x = date, y = precipitation_mm)) +
geom_line(aes(color = country)) +
labs(title="Average precipitation (in mm) per month", x = "Date",
y = "Precipitation (mm)") +
geom_smooth(method="lm")
line_plot4
Radar chart
For the radar chart we will use the Global Ecological Footprint data of 2023. To create radar charts you need the ggradar library, which is installed from GitHub with the remotes package.
# REMOVE THE # IN THE NEXT TWO LINES IF YOU HAVE NOT INSTALLED REMOTES AND GGRADAR YET.
#install.packages("remotes")
#remotes::install_github("ricardo-bion/ggradar")
library(ggradar)
Read the data from the Global Ecological Footprint data file.
# Read the data and store it in a data frame.
footprint <- read_csv("./files_13_data_visualization_exercises/add_exercises/Global_Ecological_Footprint_2023.csv")
# Check which four European countries have the highest population.
# Select the columns for the country and the footprints (use the `ends_with()` function) for the four selected European countries.
big4 <- footprint %>%
filter(Region == "EU-27") %>%
slice_max(order_by = `Population (millions)`, n = 4) %>%
arrange(Country) %>%
select(Country, ends_with("Footprint"))
formatted_table(head(big4))
| Country (chr) | Cropland Footprint (dbl) | Grazing Footprint (dbl) | Forest Product Footprint (dbl) | Carbon Footprint (dbl) | Fish Footprint (dbl) |
|---|---|---|---|---|---|
| France | 1.0 | 0.3 | 0.5 | 2.2 | 0.2 |
| Germany | 0.9 | 0.2 | 0.5 | 2.7 | 0.1 |
| Italy | 0.8 | 0.3 | 0.5 | 2.0 | 0.2 |
| Spain | 1.2 | 0.2 | 0.2 | 1.8 | 0.5 |
Now create the radar chart.
# Create a radar chart from the data with the footprints for the 4 European countries.
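# grid.min/grid.mid/grid.max set the grid to 0-3, matching the labels in values.radar
# (by default ggradar draws a 0-1 grid, while the carbon footprints go up to 2.7).
big4_fp <- ggradar(big4, grid.min = 0, grid.mid = 1.5, grid.max = 3,
values.radar = c("0", "1.5", "3.0"), legend.text.size = 8,
axis.label.size = 2.5, grid.label.size = 3, legend.position = "right") +
labs(title = "Ecological footprints of the 4 most populous European countries") +
theme(plot.title = element_text(size = 14))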
big4_fp
Bubble chart
Bubble charts are useful when you would like to show an extra (third) variable in a scatter plot, represented by the size of the points.
# Use the original data frame.
# Select the four European countries with the highest population and keep all columns.
big4_fp <- footprint %>%
filter(Region == "EU-27") %>%
slice_max(order_by = `Population (millions)`, n = 4)
formatted_table(head(big4_fp))
| Country (chr) | Region (chr) | SDGi (dbl) | Life Exectancy (dbl) | HDI (dbl) | Per Capita GDP (chr) | Income Group (chr) | Population (millions) (dbl) | Cropland Footprint (dbl) | Grazing Footprint (dbl) | Forest Product Footprint (dbl) | Carbon Footprint (dbl) | Fish Footprint (dbl) | Built up land...14 (dbl) | Total Ecological Footprint (Consumption) (dbl) | Cropland (dbl) | Grazing land (dbl) | Forest land (dbl) | Fishing ground (dbl) | Built up land...20 (dbl) | Total biocapacity (dbl) | Ecological (Deficit) or Reserve (dbl) | Number of Earths required (dbl) | Number of Countries required (dbl) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Germany | EU-27 | 82.2 | 81 | 0.94 | $54,192 | HI | 83.9 | 0.9 | 0.2 | 0.5 | 2.7 | 0.1 | 0.2 | 4.5 | 0.6 | 0.1 | 0.6994546 | 0.0745313 | 0.1780850 | 1.6145357 | -2.884143 | 2.978833 | 2.786361 |
| France | EU-27 | 81.2 | 82 | 0.90 | $47,995 | HI | 65.6 | 1.0 | 0.3 | 0.5 | 2.2 | 0.2 | 0.2 | 4.3 | 1.0 | 0.2 | 0.9738287 | 0.1110340 | 0.1509272 | 2.4583718 | -1.854753 | 2.855967 | 1.754464 |
| Italy | EU-27 | 78.3 | 83 | 0.90 | $43,010 | HI | 60.3 | 0.8 | 0.3 | 0.5 | 2.0 | 0.2 | 0.1 | 4.0 | 0.4 | 0.1 | 0.3408854 | 0.0657797 | 0.0912344 | 0.9711389 | -2.980057 | 2.616313 | 4.068621 |
| Spain | EU-27 | 79.9 | 83 | 0.91 | $39,753 | HI | 46.7 | 1.2 | 0.2 | 0.2 | 1.8 | 0.5 | 0.1 | 3.9 | 1.1 | 0.1 | 0.4113462 | 0.0606709 | 0.0573795 | 1.7221554 | -2.193411 | 2.592720 | 2.273643 |
# Create a bubble chart of the 'Number of Countries required' against the 'Total biocapacity', with the population as the third dimension.
bubble_big4 <- ggplot(big4_fp, aes(x = `Total biocapacity`, y = `Number of Countries required`)) +
geom_point(aes(color = Country, size = `Population (millions)`), alpha = 0.5) +
scale_size_area(max_size = 10)
bubble_big4
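scale_size_area() scales the bubbles so that their area, rather than their radius, is proportional to the value, with the largest value drawn at max_size; this matches how readers tend to compare bubble sizes. A sketch that checks this with layer_data(), using the populations from the big4_fp preview (the y mapping only serves to place the points):

```r
library(ggplot2)

# Population (millions) of the four countries, from the big4_fp preview.
pop <- data.frame(
  Country    = c("Germany", "France", "Italy", "Spain"),
  population = c(83.9, 65.6, 60.3, 46.7)
)

p <- ggplot(pop, aes(x = Country, y = population)) +
  geom_point(aes(size = population)) +
  scale_size_area(max_size = 10)

# Drawn sizes: squared (i.e. area) they are proportional to the population.
sizes <- layer_data(p)$size
sizes
```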
Learning outcomes
In this lesson you have learned to:
- visualize data with ggplot by:
- creating a basic bar chart,
- creating a grouped bar chart,
- creating a percentage bar chart,
- creating a pie chart,
- creating a box plot,
- changing the order in a grouped bar chart or box plot,
- creating a violin chart,
- creating a line plot,
- creating a radar chart,
- creating a bubble chart.
This web page is distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Creative Commons License: CC BY-SA 4.0.