Intro to GGPLOT2 - University of Alberta · 2019-03-20
Intro to GGPLOT2
March 19, 2019
In this lab you will learn to visualize raw data by plotting exploratory graphics with the ggplot2 package. Unlike final graphs for publication or a thesis, exploratory graphics are usually made very quickly while checking for errors, outliers, distributions, and correlations among variables. The goal is usually to develop a personal understanding of the data and to prioritize tasks for follow-up analysis.
Grammar of ggplot2
ggplot2 is designed for visualizing data stored in data frames. The ‘gg’ in the name stands for ‘grammar of graphics’, and ggplot2 is recognized as one of the three main graphics systems in R.
The most important thing to get used to with ggplot2 is the logical structure of plots. The code you write specifies the connections between the variables in your data and the x and y locations, colors, sizes, shapes, etc. that you see on the screen. In ggplot2, these logical connections between your data and the plot elements are called aesthetic mappings or just aesthetics.
You begin every plot by telling the ggplot() function what your data is, and then how the variables in this data logically map onto the plot’s aesthetics. Then you take the result and say what general sort of plot you want, such as a scatterplot, a boxplot, or a bar chart. In ggplot2, the overall type of plot is called a geom. Each geom has a function that creates it, and the function’s name follows the pattern “geom_…()”. For example, geom_point() makes scatterplots, geom_bar() makes bar plots, geom_boxplot() makes boxplots, and so on. You combine these two pieces, the ggplot(data, mapping) object and the geom_…(), by literally adding them together in an expression, using the + symbol.
Data, mapping (or aesthetics), and geometry (geom) are the three mandatory components of a ggplot2 plot. As with other functions, the output of ggplot2 can be assigned to an object for further editing. Other optional ggplot2 grammar components can be found in ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (2016).
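The three mandatory pieces can be seen in a very small example. The sketch below uses R's built-in mtcars data frame purely as a stand-in (it is not part of this lab's data), so that it runs without any file; the object names p and p_points are only illustrative.

```r
library(ggplot2)

# Data: the built-in mtcars data frame (a stand-in, not the lab data set)
# Mapping: car weight (wt) on the x axis, fuel economy (mpg) on the y axis
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Geom: adding geom_point() with `+` turns the mapping into a scatterplot
p_points <- p + geom_point()

# The result is an ordinary R object; printing it draws the plot
p_points
```

Because the plot is stored in an object, further layers can be added later in the same way, for example p_points + geom_smooth().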
A little too complex? Don’t worry; you will get familiar with the grammar system very soon. In this lab, we will use this ggplot2 syntax to plot the following exploratory graphics: histogram (density plot), boxplot, scatterplot (including scatterplot matrix), and barplot.
Data preparation
• For this exercise, use a weather station dataset “AB_Stations.csv” that you can download from the course website. The first three columns specify the weather station ID, as well as the ecosystem and the biome of Alberta in which the weather station is located. These are followed by a number of climate variables that you can use for exploration (MAT = mean annual temperature, MWMT = mean warmest month temperature, MCMT = mean coldest month temperature, MAP = mean annual precipitation, MSP = mean summer precipitation, DRYNESS = an index).
• Load required packages.
# install.packages('ggplot2')  # run once if ggplot2 is not installed
library(ggplot2)
• Import the dataset with the code below, and use the head(), tail(), str(), or View() functions to check the imported data table.
dat1 <- read.csv("AB_Stations.csv")
head(dat1, 10)
##    STATION ECOSYS     BIOME  MAT MWMT  MCMT MAP MSP DRYNESS
## 1   300114   G-NF Grassland  2.4 17.5 -24.9 443 287      17
## 2   301449  G-DMG Grassland  4.5 18.8 -23.1 415 257       3
## 3   302343   G-MG Grassland  4.9 18.4 -23.3 429 258      12
## 4   302369  G-DMG Grassland  5.1 18.3 -24.3 405 254       1
## 5   302789   G-NF Grassland  2.8 17.4 -22.2 431 291      14
## 6   304155   B-AP    Boreal -0.9 17.4 -30.4 480 292      45
## 7   304642  B-UBH    Boreal -0.5 16.3 -29.0 511 330      52
## 8   305076    M-M   Montane  2.8 15.1 -21.3 550 319      42
## 9   305773   B-CP    Boreal  3.3 17.3 -24.5 488 309      22
## 10  306733   B-KU    Boreal -3.5 16.5 -32.3 431 260      60
5.1. Histograms
One useful plot type for exploring raw data is the histogram. Histograms are commonly used to visually check the distribution of continuous variables. The geom for histograms is geom_histogram(). By default, the y axis of a ggplot2 histogram shows the number of observations in each bin, but y can also be set to density.
• Following the ggplot2 syntax, we can execute the following command to get a histogram for a variable, in this case the variable “DRYNESS”:
hist_a <- ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(color = 'gray90')
hist_a
ggplot2 picks a default number of bins when generating histograms, but chances are that the resulting bin width is not the most appropriate one for the histogram you want to make. It is therefore critical to try different bin settings to verify that the histogram reflects the data accurately. Too many bins make the histogram overly peaky and obscure the overall shape of the distribution, while too few bins hide the details of the distribution. There are two ways to change the bins:
1. First method: set the number of bins you want for the histogram;
ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(bins = 5, color = 'gray90')   # 5 bins
ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(bins = 20, color = 'gray90')  # 20 bins
ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(bins = 50, color = 'gray90')  # 50 bins
[Figure: three histograms of DRYNESS (x axis: DRYNESS, y axis: count) drawn with 5, 20, and 50 bins]
2. Second method: set the width of bins:
ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(binwidth = 1, color = 'gray90')
ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(binwidth = 5, color = 'gray90')
ggplot(dat1, aes(x = DRYNESS)) + geom_histogram(binwidth = 10, color = 'gray90')
[Figure: three histograms of DRYNESS (x axis: DRYNESS, y axis: count) drawn with bin widths of 1, 5, and 10]
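The two methods are linked by simple arithmetic: the number of bins is roughly the data range divided by the bin width. As a starting point for experimenting, a rule of thumb such as the Freedman-Diaconis rule can suggest a bin width. This rule is not prescribed by the lab; it is shown here only as one possible sketch, using simulated values as a stand-in for a skewed variable like DRYNESS.

```r
# Simulated stand-in for a skewed variable such as DRYNESS (not the lab data)
set.seed(42)
x <- rexp(200, rate = 1 / 30)

# Freedman-Diaconis rule of thumb: binwidth = 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(x) / length(x)^(1 / 3)

# Approximate number of bins implied by that width
n_bins <- ceiling(diff(range(x)) / fd_width)

fd_width
n_bins
```

Either number can then be passed on, for example geom_histogram(binwidth = fd_width) or geom_histogram(bins = n_bins), and adjusted by eye from there.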
• Histograms are also great for visually checking the effectiveness of data transformations. In this case, the square-root transformation achieves an approximately normal distribution.
hist_b <- ggplot(dat1, aes(x = sqrt(DRYNESS))) + geom_histogram(color = 'gray90')
hist_b
• You can also fill the bins with colors by group/class. In many scenarios we have multiple distributions that we would like to visualize simultaneously. For example, do the biomes have similar dryness? One commonly used strategy is to stack the bars on top of each other and fill them with a different color for each group:
hist_c <- ggplot(dat1, aes(x = DRYNESS, fill = BIOME)) +
  geom_histogram()
• Although counts are used on the y axis by default, you can change the y axis to density. When sample sizes differ among groups/classes, density histograms may show a different pattern from the frequency histograms.
hist_d <- ggplot(dat1, aes(x = DRYNESS, fill = BIOME)) +
  geom_histogram(aes(y = ..density..))  # specify y as density
[Figure: stacked histograms of DRYNESS filled by BIOME (Boreal, Grassland, Montane); panel hist_c with count on the y axis, panel hist_d with density on the y axis]
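One way to see why density and count histograms can differ when group sizes are uneven is to inspect the numbers ggplot2 computes. The sketch below uses simulated stand-in data with two groups of very different size (not the lab data) and assumes ggplot2 3.3.0 or later, where after_stat(density) is the newer spelling of ..density..; the names toy, bars, and areas are illustrative.

```r
library(ggplot2)

# Simulated stand-in data with unequal group sizes (not the lab data)
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), times = c(150, 30)),
  value = c(rnorm(150, mean = 50, sd = 15), rnorm(30, mean = 20, sd = 8))
)

p <- ggplot(toy, aes(x = value, fill = group)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5)

# layer_data() returns the values computed for the first layer
bars <- layer_data(p, 1)

# Bar area (density * bin width), summed within each group
areas <- tapply(bars$density * (bars$xmax - bars$xmin), bars$group, sum)
areas
```

Each group's bars are rescaled separately so that their total area should come out close to 1, which is why a small group can look as prominent as a large one on the density scale.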
• The biggest disadvantage of the stacked histogram is that each group is hard to quantify; for example, how many Boreal samples have DRYNESS values around 40? About 15, or 8? It is not easy to compare distributions among groups. One solution is to change the position of the bars. Dodging preserves the vertical position of a geom while adjusting its horizontal position.
• Histograms have been a popular visualization option since at least the 18th century, in part because they are easily generated by hand. More recently, as extensive computing power has become available in everyday devices such as laptops and cell phones, we see them increasingly being replaced by density plots. In a density plot, we attempt to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve.
[Figure: dodged histogram of DRYNESS by BIOME (y axis: count) and density plot of DRYNESS by BIOME (y axis: density); BIOME levels: Boreal, Grassland, Montane]
Similarly, we can fill the density curves with different colors. The alpha argument sets the transparency of the color, with values ranging from 0 (fully transparent) to 1 (fully opaque). Also, try adding multiple geoms:
hist_f + geom_histogram(aes(y = ..density..), alpha = 0.6, position = 'dodge')
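The code that builds the dodged histogram and the density plot (and that defines hist_f) does not appear in this handout. The following is only a guess at how such plots could be built, using simulated values as a stand-in for dat1; the name hist_e and all argument values are assumptions.

```r
library(ggplot2)

# Simulated stand-in for dat1 (not the real AB_Stations.csv values)
set.seed(2)
toy <- data.frame(
  BIOME = rep(c("Boreal", "Grassland", "Montane"), times = c(60, 30, 20)),
  DRYNESS = c(rnorm(60, 50, 15), rnorm(30, 15, 8), rnorm(20, 40, 10))
)

# Dodged histogram: bars of each group sit side by side within a bin
# (hist_e is a hypothetical name)
hist_e <- ggplot(toy, aes(x = DRYNESS, fill = BIOME)) +
  geom_histogram(position = "dodge", bins = 15)

# Density curves filled by group; alpha keeps overlapping areas visible
# (a guess at what hist_f may have looked like)
hist_f <- ggplot(toy, aes(x = DRYNESS, fill = BIOME)) +
  geom_density(alpha = 0.6)

# Adding a second geom on top of the density curves
hist_both <- hist_f +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.6,
                 position = "dodge", bins = 15)
hist_both
```

With the real data, toy would simply be replaced by dat1.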
5.2. Scatter plots
With scatter plots you can visually check the relationships among variables. Are they linear or curvilinear? Outliers are also easily visible.
• Now, try to use a scatter plot to visually check the relationships among variables and to identify outliers. To check the relationship between Mean Annual Precipitation (MAP) and Mean Summer Precipitation (MSP), we can use them as x and y in the plot, respectively (normally, the y axis is for the dependent variable and the x axis for the independent variable, but in this case it is fine to swap the axes). The geom function for scatter plots is geom_point().
ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point()
• Cool! There seems to be a positive relationship between these two variables. However, overlapping points can make the plot harder to interpret. One easy remedy is to make the points semi-transparent.
plt <- ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point(alpha = .3)
plt
[Figure: scatter plot of MSP (y axis) against MAP (x axis) with semi-transparent points]
• Besides changing the transparency, adding a small amount of random noise to the point positions (jittering) can, counterintuitively, make each point easier to see.
plt_jittered <- ggplot(dat1, aes(x = MAP, y = MSP)) +
  geom_point(position = "jitter")
plt_jittered
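The string "jitter" uses default noise amounts. If more control is wanted, position_jitter() accepts explicit width and height values and a seed so that the same jitter is drawn every time. A small sketch with made-up, exactly overlapping points (not the lab data):

```r
library(ggplot2)

# Stand-in data with many exactly overlapping points (not the lab data)
toy <- data.frame(
  x = rep(c(1, 2, 3), each = 20),
  y = rep(c(10, 20, 30), each = 20)
)

# width/height limit how far points may move; seed makes the jitter repeatable
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(position = position_jitter(width = 0.2, height = 1, seed = 123),
             alpha = 0.5)
p
```

Keeping the jitter small relative to the spacing of the data avoids giving a misleading impression of where the observations really are.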
• You can also add labels to your plot with the geom function geom_text(). In this case, we want to label each point with its STATION ID. The hjust and vjust arguments control the placement of the labels.
plt_label <- ggplot(dat1, aes(x = MAP, y = MSP, label = STATION)) +
  geom_point() +
  geom_text(hjust = 0, vjust = 0, size = 2.2, color = 'gray40')
plt_label
[Figure: scatter plot of MSP (y axis) against MAP (x axis) with each point labelled by its STATION ID]
Could you tell the STATION ID of the two outliers around the lower right corner of the plot?
• However, do all BIOME types follow the same relationship between MAP and MSP? To find out, we need to add a visual aid that separates these types (e.g., color or shape).
[Figure: two scatter plots of MSP (y axis) against MAP (x axis); the first has a BIOME legend (Boreal, Grassland, Montane)]
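The code behind these two panels is not shown in the handout. One plausible way to separate the biomes is to map BIOME to the color or shape aesthetic. The sketch below uses only the first ten stations printed by head(dat1, 10) earlier; the object names first10, plt_color, and plt_shape are illustrative.

```r
library(ggplot2)

# MAP, MSP and BIOME for the first ten rows of dat1, as printed by head(dat1, 10)
first10 <- data.frame(
  BIOME = c(rep("Grassland", 5), "Boreal", "Boreal", "Montane", "Boreal", "Boreal"),
  MAP = c(443, 415, 429, 405, 431, 480, 511, 550, 488, 431),
  MSP = c(287, 257, 258, 254, 291, 292, 330, 319, 309, 260)
)

# Separate biomes by color
plt_color <- ggplot(first10, aes(x = MAP, y = MSP, color = BIOME)) +
  geom_point()

# Separate biomes by point shape
plt_shape <- ggplot(first10, aes(x = MAP, y = MSP, shape = BIOME)) +
  geom_point(size = 2)

plt_color
plt_shape
```

With the full data set, first10 would be replaced by dat1.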
5.3. Box plots
• Just like scatter plots, boxplots are a good way to visually check the relationship between two variables. If one variable is continuous (as y) and the other is categorical (as x), then a boxplot is a good option, for instance to understand the general distribution of mean annual temperature (MAT) within each biome type (BIOME). The geom for boxplots is geom_boxplot().
ggplot(dat1, aes(x = BIOME, y = MAT)) + geom_boxplot()
If you still have time, you can add the following arguments within the parentheses of geom_boxplot() and see what they do: varwidth = T, notch = T
ggplot(dat1, aes(x = BIOME, y = MAT)) + geom_boxplot(varwidth = T)  # box width reflects the sample size
ggplot(dat1, aes(x = BIOME, y = MAT)) + geom_boxplot(notch = T)     # add a notch to the box(es)
[Figure: two boxplots of MAT (y axis) by BIOME (Boreal, Grassland, Montane), one with varwidth = T and one with notch = T]
Similarly, we can make a boxplot of mean annual temperature (MAT) for each ecosystem (ECOSYS).
ggplot(dat1, aes(x = ECOSYS, y = MAT)) + geom_boxplot()
[Figure: boxplot of MAT (y axis) by ECOSYS (x axis), with overlapping ecosystem labels along the x axis]
Since the ecosystem names take up space and can easily overlap, we prefer to put ECOSYS on the y axis and draw horizontal boxplots:
ggplot(dat1, aes(x = ECOSYS, y = MAT)) +
  geom_boxplot() + coord_flip()  # horizontal: flip the x and y axes
# Great! Now you can color the boxplots by their BIOME types:
ggplot(dat1, aes(x = ECOSYS, y = MAT, fill = BIOME)) +
  geom_boxplot(varwidth = T) + coord_flip()  # colored by BIOME groups
[Figure: horizontal boxplots of MAT (x axis) by ECOSYS (y axis); panel a without fill, panel b filled by BIOME (Boreal, Grassland, Montane)]
• Boxplots are generally useful, but they focus on only five numbers of the sample (the minimum, the maximum, and the 25th, 50th, and 75th percentiles, with points beyond 1.5 times the interquartile range drawn separately as outliers). To show more detail about the distribution, we can add jittered points or use a violin plot as alternatives.
ggplot(dat1, aes(x = BIOME, y = MAT)) +
  geom_violin() + geom_boxplot(width = .1)
ggplot(dat1, aes(x = BIOME, y = MAT)) +
  geom_boxplot() + geom_point(position = 'jitter', alpha = .2, size = 2)
[Figure: MAT (y axis) by BIOME (Boreal, Grassland, Montane); left panel: violin plots with narrow boxplots inside, right panel: boxplots with jittered points]
The first command narrows the boxplots and draws them inside the violin plots, and the second one overlays jittered points on the boxplots.
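To make the numbers behind a box concrete, the sketch below computes them by hand in base R for simulated values standing in for one group's MAT (not the lab data). It follows the default geom_boxplot() convention in which whiskers stop at the most extreme observations within 1.5 times the interquartile range of the box; the variable names are illustrative.

```r
# Simulated stand-in for one group's MAT values (not the lab data)
set.seed(7)
mat <- c(rnorm(40, mean = 2, sd = 1.5), 9)  # one deliberately extreme value

# Quartiles that define the box
q <- quantile(mat, probs = c(0.25, 0.5, 0.75))
iqr <- q[3] - q[1]

# Whiskers end at the most extreme observations within 1.5 * IQR of the box
lower_whisker <- min(mat[mat >= q[1] - 1.5 * iqr])
upper_whisker <- max(mat[mat <= q[3] + 1.5 * iqr])

# Anything beyond the whiskers is drawn as an individual point
outliers <- mat[mat < lower_whisker | mat > upper_whisker]

q
c(lower_whisker, upper_whisker)
outliers
```

Everything else about the sample, such as gaps or multiple modes, is invisible in the box, which is why violins or jittered points are useful additions.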
• Well done! So far we have analyzed only one continuous variable at a time. Can we visualize multiple variables in one plot?
First, we apply the gather() function from the tidyr package to transform the data frame from wide to long format. In ggplot2, x or y must each be mapped to a single variable. Therefore, we first gather the variables of interest into one column and then use the new data table for plotting. For example, let’s make a boxplot to compare the distributions of mean annual temperature (MAT), mean warmest month temperature (MWMT), and mean coldest month temperature (MCMT) across the three BIOME types.
# install.packages('tidyr')
library(tidyr)
dat2 <- gather(dat1, key = 'temp', value = 'value', MAT, MCMT, MWMT)
head(dat2, 3)  # quick check of the new data table
##   STATION ECOSYS     BIOME MAP MSP DRYNESS temp value
## 1  300114   G-NF Grassland 443 287      17  MAT   2.4
## 2  301449  G-DMG Grassland 415 257       3  MAT   4.5
## 3  302343   G-MG Grassland 429 258      12  MAT   4.9
ggplot(dat2, aes(x = temp, y = value, fill = BIOME)) +
  geom_boxplot()  # different colors for BIOME types
[Figure: boxplots of value (y axis) for temp = MAT, MCMT, MWMT (x axis), filled by BIOME (Boreal, Grassland, Montane)]
Looks nice! If the temperature variables were treatments instead, then different patterns among and within groups would be a strong clue of an interaction.
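In more recent versions of tidyr (1.0.0 and later, an assumption about your installation), the same wide-to-long step can also be written with pivot_longer(). The sketch below reshapes only the first three stations printed by head(dat1, 10) earlier; first3 and long3 are illustrative names.

```r
library(tidyr)

# First three rows of dat1, as printed by head(dat1, 10)
first3 <- data.frame(
  STATION = c(300114, 301449, 302343),
  BIOME = "Grassland",
  MAT = c(2.4, 4.5, 4.9),
  MWMT = c(17.5, 18.8, 18.4),
  MCMT = c(-24.9, -23.1, -23.3)
)

# pivot_longer() is the newer tidyr spelling of the same wide-to-long step
long3 <- pivot_longer(first3, cols = c(MAT, MCMT, MWMT),
                      names_to = "temp", values_to = "value")
long3
```

The rows come out in a different order than with gather() (station by station rather than variable by variable), which makes no difference for plotting.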
5.4. Multi-panel scatter plots in R
So far in this lab we have learned 1-dimensional (histogram, density plot) and 2-dimensional (scatter plot, boxplot) exploratory graphics, which normally analyze only one variable or one pair of variables at a time. If you have a data table with 10 potential independent variables, plotting them one by one is not efficient. To get a general idea of the relationships among variables in a very short time:
# requires the ggpairs() function from the GGally package
# install.packages('GGally')
library(GGally)
ggpairs(dat1[, c('MAT', 'MAP', 'MSP', 'DRYNESS', 'BIOME')],
        aes(color = BIOME))
[Figure: ggpairs() plot matrix of MAT, MAP, MSP, DRYNESS, and BIOME, colored by BIOME, showing pairwise scatter plots, density curves, and overall and per-biome correlation coefficients]
Voilà. Now you can see the plot matrix for MAT, MAP, MSP, and DRYNESS together with BIOME, with different colors distinguishing the BIOME types.
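If only the correlation coefficients are needed, base R's cor() returns them as a matrix. The sketch below uses just the first ten stations printed by head(dat1, 10) earlier, so its values will not match the coefficients in the full-data plot matrix; first10 and cor_mat are illustrative names.

```r
# Numeric columns for the first ten rows of dat1, as printed by head(dat1, 10)
first10 <- data.frame(
  MAT = c(2.4, 4.5, 4.9, 5.1, 2.8, -0.9, -0.5, 2.8, 3.3, -3.5),
  MAP = c(443, 415, 429, 405, 431, 480, 511, 550, 488, 431),
  MSP = c(287, 257, 258, 254, 291, 292, 330, 319, 309, 260),
  DRYNESS = c(17, 3, 12, 1, 14, 45, 52, 42, 22, 60)
)

# Pairwise Pearson correlations, rounded for reading
cor_mat <- round(cor(first10), 3)
cor_mat
```

With the full data set, the same call would be cor(dat1[, c('MAT', 'MAP', 'MSP', 'DRYNESS')]).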
5.5. Barplots
Bar charts and dot plots are used with summary statistics to compare class means, e.g. from experimental treatments or from sampling sites. In either case, you don’t have a sensible scale on your x-axis, i.e. the _**variable is a factor**_, such as “Control”, “Nitrogen”, “Phosphorus”, “Nitrogen & Phosphorus”. Because the bars/dots can be grouped, these graph types work well for factorial experiments with multiple treatments (or for hierarchical sampling designs).
Let’s use the other dataset (‘lentils_summary.csv’) contained in the zip file for barplots. The data set contains lentil yield measurements of three varieties from two farms; the mean yield and its standard deviation can be found in the columns ‘mYIELD’ and ‘sdYIELD’, respectively.
Once you load the data set, try out the following code for barcharts:
dat3 <- read.csv('lentils_summary.csv', header = T)
ggplot(dat3, aes(x = FARM, y = mYIELD, fill = VARIETY)) +
  geom_col()
ggplot(dat3, aes(x = FARM, y = mYIELD, fill = VARIETY)) +
  geom_col(position = 'dodge')
You can add an outline to enhance the contrast:
a <- ggplot(dat3, aes(x = FARM, y = mYIELD, fill = VARIETY)) +
  geom_col(position = 'dodge', color = 'black') +
  scale_fill_brewer(palette = 'Pastel2')
a
Next, let’s add error bars to represent uncertainty. As an example, we draw error bars for the standard deviation of yield. Note that we now need a slightly different position indicator. The default dodge amount from the main plot is 0.9. You can play with the number to see what it does.
a + geom_errorbar(aes(ymin = mYIELD - sdYIELD, ymax = mYIELD + sdYIELD),
                  width = 0.3, position = position_dodge(width = 0.9))
# You could also draw the error bars on one side only. We'll save that plot to object b
b <- a + geom_errorbar(aes(ymin = mYIELD, ymax = mYIELD + sdYIELD),
                       width = 0.3, position = position_dodge(width = 0.9))
b
[Figure: dodged bar charts of mYIELD (y axis) by FARM (Farm1, Farm2), filled by VARIETY (A, B, C); one with two-sided and one with one-sided error bars]
Then, you can add letters that indicate significance. Different letters indicate a significant difference between treatments. If two treatments have the same letter, the difference is not statistically significant. We place the labels on top of the error bars:
c <- b + geom_text(aes(x = FARM, y = mYIELD + sdYIELD + 30, group = VARIETY,
                       label = c('a', 'b', 'b', 'a', 'b', 'b')),
                   position = position_dodge(width = 0.9))
c
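Passing the letters as a loose vector relies on their order matching the row order of the data. An alternative sketch, assuming you are free to add a column, stores each letter in the same row as the bar it labels. The table below is made up (same column names as dat3, invented values, not the contents of lentils_summary.csv), and the names toy3 and sig are hypothetical.

```r
library(ggplot2)

# Made-up summary table with the same column names as dat3
# (values are invented for illustration only)
toy3 <- data.frame(
  FARM = rep(c("Farm1", "Farm2"), each = 3),
  VARIETY = rep(c("A", "B", "C"), times = 2),
  mYIELD = c(400, 600, 620, 350, 560, 580),
  sdYIELD = c(40, 50, 45, 35, 55, 50)
)

# Keep each significance letter in the same row as the bar it labels
toy3$sig <- c("a", "b", "b", "a", "b", "b")

p <- ggplot(toy3, aes(x = FARM, y = mYIELD, fill = VARIETY)) +
  geom_col(position = "dodge", color = "black") +
  geom_errorbar(aes(ymin = mYIELD, ymax = mYIELD + sdYIELD),
                width = 0.3, position = position_dodge(width = 0.9)) +
  geom_text(aes(y = mYIELD + sdYIELD + 30, group = VARIETY, label = sig),
            position = position_dodge(width = 0.9))
p
```

If the rows are later sorted or filtered, the letters then move with their bars.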
Another common customization is to change the order of your treatments. For example, you may want to show Farm2 before Farm1:
# easy to change the order of class levels:
c + scale_x_discrete(limits = c('Farm2', 'Farm1'))
# You can customize your labels, as usual, and if you have long labels,
# it's sometimes handy to put your labels on an angle
c + labs(y = "Yield (kg/ha)", x = "", title = "Lentil Plots") +
  theme(axis.text.x = element_text(angle = 45),
        axis.text.y = element_text(angle = 0))
# You can flip the chart and apply themes
c + coord_flip() + theme_minimal()
[Figure: three versions of the lentil bar chart with error bars and significance letters: Farm2 shown before Farm1; custom labels (y axis "Yield (kg/ha)", title "Lentil Plots") with angled x-axis text; flipped axes with theme_minimal()]
5.6. Dot plots for factorial designs
Dot charts can be used as an alternative to bar charts. They are generally easier to read when you have many treatments, due to their high data-to-ink ratio. Also, they do not need to start at 0 to convey the correct sense of the treatment effect (a problem with bar charts). Dot plots are also more compact, which can be useful if you have large experiments with many treatments or treatment levels. For the small factorial lentil experiment, it does not really matter much, but we’ll use it anyway as an illustration.
Many of the options that you have learned above can be used here as well. Change the numbers in the code below to see what happens. Try additional commands from the section above. You can also save what you like into objects (like a, b, and c above) so that you don’t have to re-run the code, which keeps things simple to look at.
ggplot(data = dat3, aes(x = FARM, y = mYIELD, color = VARIETY)) +
  geom_errorbar(aes(ymin = mYIELD - 1.96 * sdYIELD, ymax = mYIELD + 1.96 * sdYIELD,
                    color = VARIETY), width = 0.1) +
  geom_vline(xintercept = c(1, 2), linetype = 'dashed',
             color = 'gray60', alpha = 0.8) +
  scale_color_brewer(palette = 'Set2') +
  geom_point(size = 4) + coord_flip() + theme_bw()
[Figure: dot plot of mYIELD (x axis) by FARM (Farm1, Farm2; y axis), colored by VARIETY (A, B, C), with error bars]
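The error bars in the dot plot span the mean ± 1.96 times the standard deviation, which describes the spread of individual observations. If a 95% confidence interval for the mean is wanted instead, the standard error (standard deviation divided by the square root of the number of replicates) would be used. The handout does not state the number of replicates, so the values below, including n_plots, are purely hypothetical.

```r
# Hypothetical summary values (not from lentils_summary.csv)
m_yield <- 500    # group mean
sd_yield <- 60    # group standard deviation
n_plots <- 4      # assumed number of replicate plots per group

# Standard error of the mean
se_yield <- sd_yield / sqrt(n_plots)

# Normal-approximation 95% interval for the mean
ci_normal <- m_yield + c(-1, 1) * qnorm(0.975) * se_yield

# With few replicates, a t-based interval is wider
ci_t <- m_yield + c(-1, 1) * qt(0.975, df = n_plots - 1) * se_yield

se_yield
ci_normal
ci_t
```

Whichever quantity is drawn, it is worth stating in the figure caption what the error bars represent.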