Let’s consider the number of people entering (red curve) and leaving (blue curve) a shop from 8am to 8pm. A line plot is an accurate representation here: it answers the question of how many people enter and leave the shop at each point in time very well.
```r
# Libraries
library(tidyverse)
library(hrbrthemes)

# Create data: number of people entering / leaving per half-hour slot, 8:00 to 20:00
data <- data.frame(
  x = seq(8, 20, 0.5),
  Entering = c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9),
  Leaving = c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)
)

# Reshape to long format (one row per hour and type), then draw one line per type
data %>%
  pivot_longer(-x, names_to = "type", values_to = "value") %>%
  ggplot(aes(x = x, y = value, color = type)) +
  geom_line() +
  ylim(0, 40) +
  scale_color_discrete(name = "") +
  scale_x_continuous(breaks = seq(8, 20, 1)) +
  # Markers A, B and C discussed in the text
  annotate("text", x = c(12.5, 16.3, 17.5), y = c(39, 27, 31), label = LETTERS[1:3]) +
  theme_ipsum() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = c(0.9, 0.9)
  ) +
  ylab("# of people") +
  xlab("Hour of day")
```
Now, what if somebody asks you when there are the most people in the shop? Is it at:
A, when the number of people entering the shop starts decreasing?
B, where more people are leaving than entering?
C, where the number of people leaving decreases?
To answer this question, your audience must think hard and will probably be confused. Instead of forcing the reader to do the calculation, it is probably better to represent the number of people in the shop directly:
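```r
# Compute and plot the number of people in the shop over time
data %>%
  # Net change per half-hour slot, shifted by a constant +5
  mutate(difference = Entering - Leaving + 5) %>%
  # Running total of the net change = people in the shop
  mutate(tot = cumsum(difference)) %>%
  ggplot(aes(x = x, y = tot)) +
  geom_line() +
  # Same markers A, B and C as in the first chart
  annotate("text", x = c(12.5, 16.3, 17.5), y = c(205, 300, 290), label = LETTERS[1:3]) +
  scale_x_continuous(breaks = seq(8, 20, 1)) +
  theme_ipsum() +
  theme(
    panel.grid.minor = element_blank()
  ) +
  ylab("# of people") +
  xlab("Hour of day")
```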
Of course, if more people leave the shop than enter, the number of people in the shop starts decreasing (marker B). But if you want your audience to focus on your point, do not give them extra work.
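The reasoning behind marker B can also be checked numerically. Below is a minimal base-R sketch reusing the example counts: the number of people in the shop is the running sum of (entering minus leaving), so it peaks in the last slot before leaving overtakes entering. It assumes the shop is empty at 8am, and it drops the constant +5 offset used in the plotting code, so its peak time differs slightly from that chart. The variable names are illustrative.

```r
# Counts per half-hour slot, copied from the example data (8:00 to 20:00)
hour     <- seq(8, 20, 0.5)
entering <- c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9)
leaving  <- c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)

# Net flow per slot: positive when more people enter than leave
net <- entering - leaving

# Running total = number of people in the shop (assumes it is empty at 8:00)
in_shop <- cumsum(net)

# First slot where more people leave than enter
first_negative <- which(net < 0)[1]

# Slot where the shop is the most crowded
peak <- which.max(in_shop)

cat("Leaving first exceeds entering at", hour[first_negative], "h\n")
cat("Shop is most crowded at", hour[peak], "h with", in_shop[peak], "people\n")
```

Running it reports that leaving first exceeds entering at 16.5 h and that the shop is most crowded at 16 h, the slot just before: exactly the information the reader would otherwise have to work out from the two curves.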
This is closely related to the problem of stacking.
A work by Yan Holtz for data-to-viz.com