Example

Let’s consider the number of people entering (red curve) and leaving (blue curve) a shop from 8am to 10pm. This is an accurate representation using a line plot, that answers very well the question of how many people are entering / leaving the shop.

# Libraries
library(tidyverse)
library(hrbrthemes)

# Create data
data <- data.frame(
  x = seq(8,20,0.5),
  Entering = c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9),
  Leaving = c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)
)

# reformat
data %>%
  gather( key=type, value=value, -1) %>%
  ggplot( aes(x=x, y=value, color=type)) +
    geom_line() +
    ylim(0,40) +
    scale_color_discrete(name="") +
    scale_x_continuous(breaks=seq(8,20,1)) +
    annotate( "text", x=c(12.5, 16.3, 17.5), y=c(39, 27, 31), label=LETTERS[1:3] ) +
    theme_ipsum() +
    theme(
      panel.grid.minor = element_blank(),
      legend.position = c(0.9, 0.9),
    ) +
    ylab("# of people") + 
    xlab("Hour of day")

Now, what if somebody asks you:

what is the trend of the total number of people in the shop?
at what time is the number of people in the shop declining?

Mental arithmetic

To answer these questions, your audience must think hard and will probably be confused.

Is it where the marker A is, when the number of people entering the shop starts decreasing?
Or marker B where more people are leaving than entering?
Or C where the number of people leaving decreases?

Instead of forcing the reader to make the calculation, it is probably better to represent the number of people in the shop directly:

# reformat
data %>%
  mutate(difference=Entering-Leaving + 5) %>%
  mutate(tot = cumsum(difference)) %>%
  ggplot( aes(x=x, y=tot)) +
    geom_line() +
    annotate( "text", x=c(12.5, 16.3, 17.5), y=c(205, 300, 290), label=LETTERS[1:3] ) +
    scale_x_continuous(breaks=seq(8,20,1)) +
    theme_ipsum() +
    theme(
      panel.grid.minor = element_blank()
    ) +
    ylab("# of people") + 
    xlab("Hour of day")

Of course, if more people leave the shop than enter, the total quantity starts decreasing (marker B). But if you want your audience to focus on your point, do not give them extra work.

Link with stacking

This is very related with the problem of [stacking].

Going further

Inspired from this talk by Chris Brown

Comments

Any thoughts on this? Found any mistake? Disagree? Please drop me a word on twitter or in the comment section below: