Statistical Consulting Group - ETH Z · 2019-06-28 · Statistical Consulting Group Seminar for...
Transcript of Statistical Consulting Group - ETH Z · 2019-06-28 · Statistical Consulting Group Seminar for...
Introduction to ggplot2
Statistical Consulting Group
Seminar for Statistics, ETH Zurich
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 1 / 40
Table of Contents
1 Getting started with ggplot2
2 Aesthetics
3 Layers
4 Faceting
5 Tailoring Graphics
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 2 / 40
Getting Started with ggplot
In this section we will . . .
. . . get started with ggplot2
. . . create plots of one variable
. . . create plots of two variables
. . . learn how to save a plot
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 3 / 40
Must Have
Useful cheatsheet: https://www.rstudio.com/resources/cheatsheets/ (pickData Visualisation with ggplot2)
Source: link above. This image is under Creative Commons licence.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 4 / 40
Book Recommendations: R Graphics & ggplot
R Graphics CookbookWinston Chang, O’Reilly Media, 2012
and its online companion:http://www.cookbook-r.com/Graphs/
ggplot2: Elegant Graphics for Data Analysis (Use R!)Hadley Wickham, Springer, 2009
See also: https://ggplot2.tidyverse.org/
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 5 / 40
Why ggplot2?
Some advantages:
nice labels
nice colors
small margins
beautiful faceting or multipanel plots
very powerful and flexible: we will have a glimpse at the grammar ofgraphics
can easily change or update plots
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 6 / 40
Why ggplot2?
Some disadvantages:
ggplot2 can only deal with data.frames
default plots of model outputs are usually not possible
ggplot2 is not optimized for speed performance
3D plots are not possible
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 7 / 40
Functions in Package ggplot2
There are two important functions:
qplot: similar to base plotting functions (“for beginners”)
ggplot: the feature-rich “workhorse” (our focus)
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 8 / 40
Grammar of Graphics
The ”gg” in ggplot2 stands for grammar of graphics which is based onWilkinson’s (2005) grammar of graphics.
The grammar is useful because . . .
it is a generic way of creating a plot
it does not rely on a specific or customized graphic for a particularproblem
it allows for iterative updates of a plot
it uses the concepts of layers
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 9 / 40
Grammar of Graphics
Idea: all plots can be built from the same components
data set
coordinate system
aesthetic mapping that describes how information in data is beingmapped to visual properties (aesthetics) of geometric objects, socalled geoms.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 10 / 40
Grammar of Graphics
Source: https://www.rstudio.com/resources/cheatsheets/
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 11 / 40
Grammar of Graphics
Source: https://www.rstudio.com/resources/cheatsheets/
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 12 / 40
Overview: Plots of One Variable
One continuous variable:
histogram: geom_histogram()
densities: geom_density()
frequency plot: geom_freqpoly()
One discrete (categorical) variable:
barplot: geom_bar()
pie plot: different coordinate system of barplot . . .
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 13 / 40
Illustration with mpg Data Set
Let us use the mpg data set from ggplot2.
It contains 234 observations about the fuel efficiency of 38 popular cars in1999 and 2008.
Let’s have a look at the democode.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 14 / 40
Overview: Plots of Two Variables
Two continuous variables
scatter plot: geom_point()
scatter plot using jitter: geom_jitter()
smoother: geom_smooth()
Discrete x and continuous y
boxplot: geom_boxplot()
bar plot: geom_bar(stat = "identity")
Continuous function like time series
line plot: geom_line()
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 15 / 40
Illustration with mpg Data Set
Let’s look at the democode.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 16 / 40
How to Save a Plot?
First we create an R object with the corresponding plot:> v <- ggplot(data = mpg, aes(x = class, y = cty)) + geom_boxplot() +
+ geom_jitter(alpha = 0.3)
Plots can then be saved by ggsave():> ggsave(filename = "cool-boxplot-II.png", plot = v)
ggsave automatically recognizes the output format (pdf, png, jpg, eps,svg)!
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 17 / 40
How to Save a Plot?
Control the width & height and change the path:> ggsave(filename = "cool-boxplot-III.jpg", plot = v, width = 5, height = 4,
+ path = "/path/of/figures/")
Alternatively, don’t forget to print the plot:> pdf("cool-boxplot-IV.pdf")
> print(v)
> dev.off()
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 18 / 40
Aesthetics
In this section we will have a look at the aesthetics . . .
. . . size
. . . shape
. . . color
. . . and combine them
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 19 / 40
Aesthetics: size
> ggplot(data = mpg, aes(x = hwy, y = cty, size = displ)) +
+ geom_jitter()
> # displ: engine displacement, in litres
●
●
●●
●
●●●
●
●●
●
●●
●●
●
●
●
●
●●●
●●
●●●
●
●●
●
●
●
● ●●
●
●●●
●●
●
●●
●●
●●
●
●●●
●
●●
●●
●
●
●
●
●●
●
●●●
●
● ●
●
● ●●●
●●
●●●●
●●●● ●
●
●
●●
●●
● ● ●●●
●
●
●
●
●
●
●
●
●
● ●
● ●
●●
●●
●●●
●
●
●●
●●●
●
●●
● ●●●● ●●●
●
●●●
●
●
● ●
●●●
●●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●●
●● ●
●
●● ●
●
●●●
● ●●
●
●●●
●●
●
●
●
●
●
●
● ●
● ●● ●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
10
20
30
20 30 40hwy
cty
displ
●
●
●
●●●
2
3
4
5
6
7
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 20 / 40
Aesthetics: shape
> ggplot(data = mpg, aes(x = hwy, y = cty, shape = factor(cyl))) +
+ geom_jitter(size = 3)
> # we can set a fixed size for all the points
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
● ●
●●
●
●
● ●
●●
●●
●
●
●
●●●
● ●
●
●
●●
●● ●●
● ● ●●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
factor(cyl)
● 4
5
6
8
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 21 / 40
Aesthetics: color
> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl))) +
+ geom_jitter(size = 3)
●
●●
●
●
● ●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●●
●
● ●
●
●
●
● ●
●
●
●
●●
●●
●
●●
●●
●
●
●
● ●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●●●
●
●
●
●
●●
●
●●
●● ●
●
●
●●
●●
● ● ●●
●
●
●●
●
●
●●
●
●
● ●
●●
●●
●● ●●
●
●●
●●
● ●●
●
●
●
●●
●●
● ●●
●
●
●●●
●
●
● ●
●●
●
●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●●
●●
●●
●●
●
●
●● ●●
●●
●
●●
●
●
●● ●
● ●
●
●
●
●
●
●●
●
● ●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
● ●●
●
●●
●
●
●
●
10
20
30
20 30 40hwy
cty
factor(cyl)
●
●
●
●
4
5
6
8
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 22 / 40
Aesthetics: Combination
> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl),
+ shape = factor(cyl), size = displ)) +
+ geom_jitter()
> # there is only one combined legend for shape and color
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
● ●
●●●
●
●●
●●
●
●
●
●
●
●●●
● ●●
●
●
●
●● ●●●
● ●●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
displ
●
●
●
●●●
2
3
4
5
6
7
factor(cyl)
● 4
5
6
8
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 23 / 40
Aesthetics: Setting vs. Mapping
> ggplot(data = mpg, aes(x = hwy, y = cty, color = "blue", size = 2)) +
+ geom_jitter()
●
● ●●
●●●●
●
●●
●
●●
●●
●●
●
●
●●●
●●
●●●●
● ●
●
●
●
● ●●
●●
●●●●
●
●● ●●●●●
● ●●
●●●●●
●
●
●●
● ●
●
●●●
●
● ●
●
● ●●●
● ●● ●●●●●●● ●●
●
●●●
●● ● ●●
●
●
●●●
●
●●●
●
●●
● ●
●●●● ●●●
●●
●●
● ●●
●
●●
● ● ●●● ●●●
● ●●●
●●
●●
●●●●●
●●
●
●
●●● ●
●●●
●●
●
●
●●●●
● ●●
●
●●
●●●
●
●● ●●
●●●
● ● ●●
●● ●
● ●
●●
●
●
●
●●
●
● ●●●
●
●●●
●
●
●●
●●●●
●●
●
●
●
●● ●
●
●●
●
●
●●
10
15
20
25
30
35
10 20 30 40hwy
cty
colour
● blue
size
● 2
The color argument in aes will create a new variable with a single entry"blue" that is mapped to color (getting the first default color).Similarly for size.
In addition, a legend is being created.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 24 / 40
Aesthetics: Setting vs. Mapping
If we want to set color to the explicit value "blue", we can do this in thecorresponding layer (outside of aes). Similarly for size.
> ggplot(data = mpg, aes(x = hwy, y = cty)) +
+ geom_jitter(color = "blue", size = 2)
●
●
●●
●
● ●●
●
●●
●
●●
●●
●●
●
●
●●
●
●● ●
●●
●
● ●
●
●
●
● ●
●●●
● ●
●●
●
●●●●
●
●●
● ●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
● ●●
●
● ●●
●●●
●●
●● ●
●
●
●●
●
●● ● ●●
●
●
●
●
●●
●
●●
●
● ●
●●
●●●
●●
●●
●●
●●
● ●
●
●
●
●
● ●
●●
●●●
●
●
●●●
●
●
● ●
●●●
● ●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●●
● ●
●
●
●
●
●●
●
●
●● ●●
●●
●
● ● ●●
●● ●
● ●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●●
10
15
20
25
30
35
10 20 30 40hwy
cty
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 25 / 40
Layers
In this section we will look at. . .
. . . where to place which arguments.
. . . the order of layers.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 26 / 40
Layers: Where to Place Which Arguments?
All arguments specified in function ggplot() are passed to allsubsequent layers.
This holds true unless a layer contains another specification.
Arguments specified in a single layer only affect the correspondinglayer.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 27 / 40
Layers: Where to Place Which Arguments?
Basic plot to start with> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() + geom_smooth()
Color both points and smoothers per group of drv⇒ three smoothers are fitted:
> ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) + geom_jitter() +
+ geom_smooth()
●
●
●
●
●
● ●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●●●
●
● ●
●
●
●
● ●
●
●
●
●●
●●
●
●●
●●
●●
●●
●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
● ●●
●
●
●
●
●●●
●●●● ●
●
●
●●
●
●
● ● ●●●
●
●●
●
●
●
●●
●
● ●
● ●
●●
●● ●
●●
●
●
●●
● ●
●
●
●
●
● ●
●●
● ●●
●
●
●●●
●
●
● ●
●●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●●
● ●
●
●
●●
●●
●
●
●●●●
●●
●
● ● ●
●
●●
●
● ●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
drv
●
●
●
4
f
r
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 28 / 40
Layers: Where to Place Which Arguments?
Color the points per group of drv⇒ one smoother is fitted:
> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter(aes(color = drv)) +
+ geom_smooth()
●
● ●●
●
●●●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●●
●
● ●
●
●
●
●●
●●
●● ●
●●
●
●●
● ●
●
●
●
●●●
●
●●
●●
●
●
●
●
● ●
●
●●
●
●
●●
●
● ●●
●
●
●
●
●●●
●●
●●
●
●
●
●●
●
●
●● ●●
●
●
●
●
●
●
●
●●
●
●●
● ●
●●●●
●●●
●●
●●
●●
●
●
●●
●●
●●●
●●●
●
●●●
●
●
● ●
●●●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●●●
●
●●
●
● ● ●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
10
20
30
20 30 40hwy
cty
drv
●
●
●
4
f
r
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 29 / 40
Layers: Where to Place Which Arguments?
Color the smoothers per group drv
⇒ three smoothers are fitted:
> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() +
+ geom_smooth(aes(color = drv))
●
●
●●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
● ●
●
●
●
●●
●●
●
●●
●●
●
●●
●●●
●
●
●● ●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●●
●
●
●
●
●
● ●●
●●
●● ●
●
●
●● ●
●●
● ●●
●
●
●
●
●●
●●
●
●
● ●
● ●
●●
●●
●●●
●●
●●
● ●
●
●
●●
● ● ●●● ●●●
●
●●●
●
●
● ●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
● ●
●●
●
●
●●
●
●
●●
●●
●●
●
●●
●
●
●● ●
● ●
●
●
●
●
●
●
●
●
● ●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
drv
4
f
r
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 30 / 40
Layers: Order of Layers
Plot the points first and then add the layer with boxplots:
> ggplot(data = mpg, aes(x = class, y = cty)) +
+ geom_jitter(alpha = 0.4) +
+ geom_boxplot(aes(fill = class))
●
●
●
●
●
●
●
●
●
10
20
30
2seater compact midsize minivan pickup subcompact suvclass
cty
class
2seater
compact
midsize
minivan
pickup
subcompact
suv
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 31 / 40
Layers: Order of Layers
Plot the boxplot first and afterwards add the layer of points:
> ggplot(data = mpg, aes(x = class, y = cty)) +
+ geom_boxplot(aes(fill = class)) +
+ geom_jitter(alpha = 0.4)
●
●
●
●
●
●
●
●
●
10
20
30
2seater compact midsize minivan pickup subcompact suvclass
cty
class
2seater
compact
midsize
minivan
pickup
subcompact
suv
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 32 / 40
Faceting
In this section we will . . .
. . . consider faceting or multi-panel conditioning plots
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 33 / 40
Faceting: facet_wrap
Let’s look a multi-panel plots> # subset of the mpg data set
> mpg.small <- subset(mpg, manufacturer %in%
+ c("ford", "land rover", "toyota",
+ "chevrolet", "honda"))#, "volkswagen"))
> ggplot(data = mpg.small, aes(x = hwy, y = cty)) +
+ geom_jitter() + facet_wrap(~ manufacturer)
●
●
●●
●
●●
●●
●●
● ●
●
●
●
● ●●
●
●●●
●●●
●
●●
●●●●●
●●●
●
●
●●●
●●● ●●
●
●●
●●
●
●
●●
●●
●●●
● ● ●●
●● ●
● ●
●
●
●
●
●
●●
●
● ●●
●
●
●●
●●
●
●●
●
land rover toyota
chevrolet ford honda
20 30 20 30
20 3010
15
20
25
10
15
20
25
hwy
cty
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 34 / 40
Faceting: facet_grid
> ggplot(data = mpg.small, aes(x = hwy, y = cty)) +
+ geom_jitter() + facet_grid(manufacturer ~ cyl)
●
●
●
●●
● ●●
●●
●
●●
●● ●●● ● ●●
● ●●
●●
●● ●
● ●●
●●● ●
●●
●● ●●
●●●
●●●●● ●
● ●● ●
●
●
●●●
●●
●●●
●
● ●
●
●● ●●●●● ●
●●
● ●●●●
● ●●●
●
●●
4 6 8chevrolet
fordhonda
land rovertoyota
15 20 25 30 35 15 20 25 30 35 15 20 25 30 35
10
15
20
25
10
15
20
25
10
15
20
25
10
15
20
25
10
15
20
25
hwy
cty
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 35 / 40
Changing Colors or the Theme
In this section we will look at . . .
. . . how to select colors
. . . themes
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 36 / 40
How to Select Colors?
> require(RColorBrewer)
> display.brewer.all()
BrBGPiYG
PRGnPuOrRdBuRdGyRdYlBu
RdYlGnSpectral
AccentDark2Paired
Pastel1Pastel2
Set1Set2Set3
BluesBuGnBuPuGnBu
GreensGreys
OrangesOrRdPuBu
PuBuGnPuRd
PurplesRdPuRedsYlGn
YlGnBuYlOrBrYlOrRd
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 37 / 40
How to Select Colors?
> # use brewer
> ggplot(data = mpg, aes(x = hwy, fill = drv)) + geom_density(alpha = 0.5) +
+ scale_fill_brewer(palette = "Dark2")
0.00
0.05
0.10
20 30 40hwy
dens
ity
drv
4
f
r
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 38 / 40
How to Select Colors?
Websites helping you to select colorshttp://colorbrewer2.org/http://tools.medialab.sciences-po.fr/iwanthue/
Define your own colors> ggplot(data = mpg, aes(x = hwy, fill = drv)) +
+ geom_density(alpha = 0.5) +
+ scale_fill_manual(values = c("red", "green", "black"))
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 39 / 40
Themes
Change the theme of a plot using theme_...().
Let’s have a look at the democode.
> ggplot(data = mpg, aes(x = class, y = cty)) +
+ geom_boxplot(aes(fill = class)) +
+ geom_jitter(alpha = 0.4) +
+ theme_bw()
See the R package ggthemes for additional themes.
Seminar for Statistics, ETH Zurich (SfS) Introduction to ggplot2 40 / 40