Annotation is a crucial component of a good dataviz. It can turn a boring graphic into an interesting and insightful way to convey information. Dataviz is often separated in two main types: exploratory
and explanatory
analysis. Annotation is used for the second type.
Our current attention span for looking at things online is, on average, less than five seconds. So if you can’t grab someone’s attention within five seconds, you’ve likely lost your viewer. Adding accurate annotation can greatly help to grab audience attention. Use keywords, shapes, colors and other visuals to help them go straight to the point.
Annotation is a general concept, and hundreds of different ways to annotate a chart exist, depending on the context
. Let’s study a few examples.
The main danger when plotting a line chart is to end up with a spaghetti chart: having too many groups often results in a confusing figure where it is hard to detect any pattern. This topic is well described here.
# Libraries
library(tidyverse)
library(hrbrthemes)
library(babynames)
library(viridis)
# Load dataset from github
data <- babynames %>%
filter(name %in% c("Mary","Emma", "Ida", "Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen")) %>%
filter(sex=="F")
# Plot
data %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
theme(
legend.position="none",
plot.title = element_text(size=14)
) +
ggtitle("A spaghetti chart of baby names popularity") +
theme_ipsum()
Instead of presenting this graphic to your audience, you probably want to highlight your main point. Let’s say that you’re specifically interested in the evolution of Amanda. Then it makes more sense to build something like this:
data %>%
mutate( highlight=ifelse(name=="Amanda", "Amanda", "Other")) %>%
ggplot( aes(x=year, y=n, group=name, color=highlight, size=highlight)) +
geom_line() +
scale_color_manual(values = c("#69b3a2", "lightgrey")) +
scale_size_manual(values=c(1.5,0.2)) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
geom_label( x=1990, y=55000, label="Amanda reached 3550\nbabies in 1970", size=4, color="#69b3a2") +
theme(
legend.position="none",
plot.title = element_text(size=14)
)
Here is a second example created by John Burn-Murdoch for the financial time. I found it on a website that I highly recommand: Dataviz Done Right
Here you can see that annotation plays a key role:
Any thoughts on this? Found any mistake? Disagree? Please drop me a word on twitter or in the comment section below:
A work by Yan Holtz for data-to-viz.com