A boxplot summarizes the distribution of a continuous variable. It displays the median, the first and third quartiles, and any outliers. This page explains how to build a basic boxplot with ggplot2.
The ggplot2 library builds boxplots with geom_boxplot(). You have to specify a quantitative variable for the Y axis and a qualitative variable for the X axis.
# Load ggplot2
library(ggplot2)
# The mtcars dataset is natively available
# head(mtcars)
# A really basic boxplot
ggplot(mtcars, aes(x = as.factor(cyl), y = mpg)) +
  geom_boxplot(fill = "slateblue", alpha = 0.2) +
  xlab("cyl")
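To make the link between the plot and the numbers explicit, the summary statistics behind each box can be approximated by hand. The following is a minimal sketch in base R, assuming the usual convention of quartiles for the box edges and a 1.5 × IQR rule for flagging outliers; the name `box_stats` is introduced here for illustration only.

```r
# Quartiles and outlier counts of mpg for each value of cyl (base R only).
# Assumption: outliers are points more than 1.5 * IQR beyond the quartiles.
box_stats <- t(sapply(split(mtcars$mpg, mtcars$cyl), function(v) {
  q <- unname(quantile(v, c(0.25, 0.5, 0.75)))
  fence <- 1.5 * (q[3] - q[1])
  c(
    q1 = q[1],
    median = q[2],
    q3 = q[3],
    n_outliers = sum(v < q[1] - fence | v > q[3] + fence)
  )
}))

# One row per cyl group, one column per statistic
print(box_stats)
```

Each row should correspond to one box in the plot above: the thick line is the median, and the box spans q1 to q3.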
Different types of boxplot can be created:
# Libraries
library(viridis)
library(ggplot2)
library(hrbrthemes)
library(tidyverse)
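# Create dataset: four groups of unequal size, each split evenly between two time points
# Fix the random seed so the simulated values are reproducible (the seed value is arbitrary)
set.seed(42)
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  time = c(rep(c("M1", "M3"), each = 250), rep(c("M1", "M3"), each = 250), rep(c("M1", "M3"), each = 10), rep(c("M1", "M3"), each = 50)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)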
# Basic boxplot
data %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6, option = "A") +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("Basic boxplot") +
  xlab("")
# Boxplot by groups
data %>%
  ggplot(aes(x = name, y = value, fill = time)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6, option = "A") +
  theme_ipsum() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 11)
  ) +
  labs(title = "Boxplot by groups", fill = "Time") +
  xlab("") +
  ylab("")
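The grouped version draws one box per combination of name and time, so it helps to know how many observations sit behind each box. The following is a small sketch in base R that rebuilds the same simulated structure and tabulates it; the names `example_data` and `group_sizes` and the seed value are assumptions made for this illustration.

```r
# Rebuild the simulated dataset (the seed is arbitrary, for reproducibility only)
set.seed(42)
example_data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  time = c(
    rep(c("M1", "M3"), each = 250), rep(c("M1", "M3"), each = 250),
    rep(c("M1", "M3"), each = 10), rep(c("M1", "M3"), each = 50)
  ),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Number of observations behind each box: rows are name, columns are time
group_sizes <- table(example_data$name, example_data$time)
print(group_sizes)
```

Group C is much smaller than the others, which a boxplot alone does not reveal.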
This document is a work of the statistics team in the Biostatistics and Medical Information Department at Saint-Louis Hospital in Paris (SBIM).
Based on The R Graph Gallery by Yan Holtz.