A Crash R Course on Statistical Graphics

Dr. Isabella R. Ghement

Ghement Statistical Consulting Company Ltd.

isabella@ghement.ca

American Statistical Association Conference on Statistical Practice February 21, 2013, 1:00pm – 5:00pm

New Orleans, LA


Outline

1. Learning Goals

2. Overview of R

3. Things to Know about R

4. Good Practices

5. Getting Started with R

6. Data Import/Export

7. Graphical Systems in R

8. Basic R Graphics

9. Customizing Basic R Graphics

10. Advanced R Graphics

11. Customizing Advanced R Graphics

12. R Graphics Housekeeping

13. Summary

14. References


Learning Goals


Overarching Learning Goals

After attending this course, you will be able to:

Organize your work in R by creating and saving R scripts;

Import/export data using R;

Produce standard statistical plots using the R package graphics;

Produce advanced statistical plots using the R package lattice;

Customize basic and advanced statistical plots;

Save basic and advanced statistical plots in a variety of formats (e.g., jpeg, pdf).


Overview of R

Learning Goal: Understand what R is, what it can do for you and where to find R resources.


Overview of R

R is an open-source software environment and programming language for statistical computing and graphics.

R’s use is governed by the GNU General Public License.

R was created in the mid-1990s by Ross Ihaka and Robert Gentleman (also known as “R & R”) of the Statistics Department at the University of Auckland, New Zealand.

Some people claim that R was created by academics for academics. This may explain the steep learning curve some learners face when switching to R.


Overview of R

R gets updated several times a year and each upgrade includes new functionality. It’s good to keep up with the latest upgrades by installing the latest version of R. However, it is also important to keep all previously installed versions of R, as sometimes old R code will no longer work with recent versions of R.

You can check the website http://cran.stat.ucla.edu/ for R upgrades.

R is supported by all major operating systems: Windows, Mac, Linux and Unix.

R is developed at present by the R Development Core Team, a group of researchers with write access to the R source code.


Overview of R

R has its own dedicated website:

http://www.r-project.org/

The R website provides access to a variety of resources, including:

- R Mailing Lists (e.g., R-help)

- R Conferences (e.g., useR!)

- CRAN (i.e., the go-to website for installing R)

- Search resources

- R Manuals

- R Books

- R Journal

Overview of R

To cite R in publications, use the following:

R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

ISBN 3-900051-07-0, URL http://www.R-project.org/.


Overview of R

The original R is focused on function rather than form and its graphical user interface reflects this focus.

Efforts to improve R’s graphical user interface have led to enhanced versions of R such as:

• RStudio (http://www.rstudio.com)

• Revolution R (http://www.revolutionanalytics.com)

Overview of R

R offers a powerful and versatile platform for:

Data Processing and Manipulation (e.g., packages plyr, reshape);

Statistical Graphics (e.g., packages graphics, grid, lattice, ggplot2);

Statistical Analyses (see the CRAN Task Views website for a list of R packages dedicated to implementing specific statistical analyses, http://cran.r-project.org/web/views/);

Statistical Programming (e.g., built-in R programming language as well as ability to interface with C++, FORTRAN, Java and Python via packages such as Rcpp, Rfortran, rJava and RPy);

Statistical Reporting (e.g., excellent interface with LaTeX via Sweave and some interface with Word via packages such as R2HTML and rtf).

Overview of R

To learn more about R, you can refer to introductory R books such as:

“R in Action: Data Analysis and Graphics with R”, by Robert I. Kabacoff (Manning Publications Co., 2011)

“R for Dummies”, by Andrie de Vries and Joris Meys (John Wiley & Sons, 2012)

“R Cookbook”, by Paul Teetor (O'Reilly, 2011)

“R for Statistics”, by P.-A. Cornillon et al. (CRC Press, 2012)

The first of these books is accompanied by an excellent website, Quick-R: http://www.statmethods.net/.

Overview of R

R users familiar with other statistical software (e.g., Stata, SAS, SPSS) can also consult these books:

“R for Stata Users”, by Robert A. Muenchen and Joseph M. Hilbe (Springer, 2010);

“R for SAS and SPSS Users”, by Robert A. Muenchen (Springer, 2009).

See http://www.r-project.org/doc/bib/R-books.html for additional R book references.

Things To Know About R

Learning Goal: Be aware of some of R’s unique features and quirks.

Things to know about R

R is case sensitive!

e.g.: anova is different from Anova

R uses the assignment operator <-

to assign names or create new data objects.

e.g.: m <- 1 + 2

R provides access to help files via the question mark.

e.g.: ?mean


Things to know about R

R uses the function c() (short for combine, or concatenate) to combine values or labels.

e.g.: var <- c(1, 2, 3)

R uses quotation marks for character strings that can be interpreted as names or labels.

e.g.: col <- c("red", "blue")

data <- read.csv("datafile.csv")
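The points above (case sensitivity, assignment with <-, combining values with c(), and quoting character strings) can be tried out in a few lines. This is only a small sketch; the object names m, M, values and cols are made up for illustration.

```r
# Names are case sensitive: m and M are two different objects.
m <- 1 + 2
M <- 10
print(m)
print(M)

# c() combines values into a vector; character strings need quotation marks.
values <- c(1, 2, 3)
cols <- c("red", "blue")
print(length(values))
print(cols)
```

Typing ?c in the R Console opens the help file for c(), in the same way that ?mean does for mean().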


Things to know about R

R uses different types of structures for storing data:

vectors

factors

matrices

arrays

data frames

lists

e.g.:

[Figure: schematic examples of a vector, a factor, a matrix, an array and a data frame]
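As a rough illustration of the six storage structures listed above, the sketch below builds one small object of each kind. All object names and values are invented for the example and are not taken from the course data.

```r
# One small example of each data structure (illustrative values only).
v   <- c(2, 1, 3)                                  # vector
f   <- factor(c("M", "F", "F", "M"))               # factor
mat <- matrix(1:6, nrow = 2, ncol = 3)             # matrix: 2 rows, 3 columns
arr <- array(1:24, dim = c(2, 3, 4))               # array: 2 x 3 x 4
df  <- data.frame(id = 1:3,
                  sex = c("M", "F", "M"),
                  height = c(1.5, 1.8, 1.7))       # data frame
lst <- list(numbers = v, groups = f, table = df)   # list of mixed objects

# str() gives a compact description of any R object.
str(v)
str(f)
str(mat)
str(df)
```

A data frame can hold columns of different types, whereas all entries of a matrix or an array share a single type.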

Things to know about R

• R uses the symbol NA to denote missing values (i.e., Not Available).

• In R, operations performed on variables which include missing values produce a missing value as a result.

e.g.: a vector holding the values 1, NA, 3
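A short sketch of how missing values propagate, using a made-up vector x. The na.rm argument shown here is a standard argument of sum() and mean() for dropping missing values before computing.

```r
x <- c(1, NA, 3)

# Operations on a vector containing NA return NA ...
total <- sum(x)
print(total)
print(mean(x))

# ... unless missing values are explicitly removed first.
avg <- mean(x, na.rm = TRUE)
print(avg)

# is.na() flags which entries are missing.
print(is.na(x))
```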


Things to know about R

R relies on functions for the automation of operations.

e.g.:

f <- function(x){

plot(x)

return(summary(x))

}
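For comparison, here is a second, purely numerical function written in the same style. The name center_and_spread is hypothetical; the sketch only shows how a function is defined once and then reused on different inputs.

```r
# A small user-defined function: returns the mean and standard deviation of x.
center_and_spread <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# The same function can be applied to any numeric vector.
result <- center_and_spread(c(2, 4, 6, 8))
print(result)
print(center_and_spread(c(10, 20, 30)))
```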


Things to know about R

R uses packages to bundle up functions useful for performing certain data processing tasks, producing certain types of graphs or performing specialized statistical analyses. R packages may also include data sets and help documentation.

Thousands of R packages are available on CRAN (Comprehensive R Archive Network) and can be installed in R with the command:

install.packages("package_name")

Once installed in R, packages need to be attached to the current R working session:

require(package_name)

For a list of R packages available on CRAN, see: http://cran.r-project.org/web/packages/


Things to know about R

For the purpose of creating graphs or implementing statistical analyses, R uses formulas such as:

y ~ x (y as a function of x)

y ~ x | f (y as a function of x, conditional on f)

y ~ x1*x2 (y as a function of x1, x2 and their interaction)
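The sketch below shows that a formula is an ordinary R object that can be stored, inspected and passed to a modelling function. The data frame dat is invented for the illustration; lm(), all.vars() and terms() are standard functions from the stats package that ships with R.

```r
# Made-up data for illustration.
dat <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                  x = c(1, 2, 3, 4, 5, 6))

# A formula can be stored like any other object.
fm <- y ~ x
print(class(fm))
print(all.vars(fm))

# The same formula drives a linear regression of y on x.
fit <- lm(fm, data = dat)
print(coef(fit))

# y ~ x1*x2 expands to both main effects plus their interaction.
labels <- attr(terms(y ~ x1 * x2), "term.labels")
print(labels)
```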


Things to know about R

R uses various types of brackets:

[ ]

[[ ]]

( )

{ }

It takes a while to get used to the meaning of each of these brackets and know when and how to use them.
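As a rough guide to the four kinds of brackets, the following sketch uses each one once. The objects v, lst and sq are made up for the example.

```r
v <- c(10, 20, 30)
lst <- list(a = 1:3, b = "text")

# [ ] extracts a subset and keeps the type of the container.
second <- v[2]
sub_list <- lst["a"]        # still a list (of length 1)

# [[ ]] extracts a single element itself.
element <- lst[["a"]]       # the integer vector 1, 2, 3

# ( ) encloses the arguments of a function call.
avg <- mean(v)

# { } groups several statements, e.g. the body of a function.
sq <- function(x) {
  y <- x^2
  y
}

print(second)
print(sub_list)
print(element)
print(avg)
print(sq(3))
```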


Good Practices

Learning Goal: Adopt a basic set of good practices when working with R in order to keep your R work organized and ensure it is reproducible.


Be organized when working with R

• Set your working directory at the very beginning of each R session. This way, everything you save during that session is placed in your working directory.

• Type all of your R commands in script files, to ensure your work is reproducible. Script files are simply text files having the extension .R.

• Set desired options for controlling various aspects of the session (e.g., maximum object size, maximum memory size).

options(object.size=10e10)
memory.size(max=TRUE)

Note: memory.size() applies to Windows only and is a non-functional stub in R 4.2.0 and later.

Note: To access the help file for the options() function, type the following R command in the R Console window: ?options

Be organized when working with R

• It pays off to be diligent about dating, versioning and commenting all of your R script files.

• The pound symbol, #, is used to comment lines in an R script file.

e.g.:

# This is a comment in an R script file.
demo(graphics)

• R script files bear the extension .R.

• Suggested naming conventions for R script files:

Project_Results.R
ProjectResults.R
Project.Results.R

Getting Started with R

Learning Goal: Understand the R workflow and know how to interact with R via R script files.


R Workflow

1. Launch R and set up current session

2. Type R commands in an R script

3. Send R commands to the R Console for execution

4. Create R output (e.g., numerical output, graphs, processed data sets)

5. Save R output and quit R

Launching R

• If you have an R icon on your desktop, double click on it to launch R.

• If you don’t have an R icon on your desktop, go to Start > All Programs. Find the R application among the list of programs installed on your computer and select it in order to launch R.

[Figure: example of R icon]

Taking Stock of R’s Interface

Notes:

R has an R Console window, where we can either type commands directly or send commands stored in script files for execution.

R also has a GUI menu, which allows us to change the working directory for the current R working session and open new R script files.

GUI = Graphical User Interface

Setting Up the Current R Session

• Set your working directory for the current R session via the R GUI commands:

File > Change dir...

• Check your current working directory using the R command:

getwd()

• Set your options for the current R session by typing the following in your R script:

# set options for current session
options(object.size=10e10)
memory.size(max=TRUE)

Opening and Saving an R Script File

• Open a new R script using the following commands from the R GUI menu: File > New Script.

• Save the script using the R GUI commands File > Save as... For now, you can call the script Script.R.

• Make it a good habit to keep saving the script file as you continue to add R commands to it. Simply press the keyboard keys Ctrl + S whenever you are ready to save the script.

Note: Existing R script files can be accessed in R via the R GUI menu commands: File > Open script...

Modifying an R Script File

• Create a header for your script file, similar to the one below.

###################################
# A Crash R Course on Statistical Graphics
# New Orleans, LA
# February 21, 2013
###################################

• Save the script file and carry on.

• As we progress through the course, please copy and paste R commands from the course slides into your R script file(s) and then send these commands to R for execution by selecting them and using the keyboard keys Ctrl + R.

R Graphics Demo

• For now, type the following command in your R script to access an R Graphics Demo:

demo(graphics)

Press the Enter key to scroll through the various graphs available in this Demo.

• You can send the demo(graphics) command to R for execution by selecting it in the script file and pressing the keys Ctrl + R.

• You can also send commands to R for execution by copying them from the script file with Ctrl + C and pasting them into the R Console window with Ctrl + V.

Quitting R

• To quit R at the end of a session, you can simply type the following command in the R Console window:

quit()

• In general, you don’t need to save the workspace attached to the current R session if you save all of the script files and the numerical and graphical output they produce.

• For this course, we do not need to quit R yet.

Data Import

Learning Goal: Be able to import comma delimited data files and text data files in R.

R Functions for Data Import

R offers a variety of functions for importing data files. Two of these functions are shown below.

File Type               File Extension   R Function     R Help
Comma Separated File    .csv             read.csv()     ?read.csv
Text File               .txt             read.table()   ?read.table

Note that read.csv() is a special version of read.table(). Both of these functions require the name of the import file to be specified (provided the file is located in the current R working directory):

dataset <- read.csv("datafile.csv")
dataset <- read.table("datafile.txt")

read.csv()

One of the easiest ways to import data into R is to save that data as a comma delimited file (.csv) and then use the function read.csv() to bring this file into R.

dataset <- read.csv("datafile.csv", as.is = TRUE)

"datafile.csv": name of the csv file where the import data are stored; this file must be located in the R working directory

as.is = TRUE: option for preserving character variables

read.csv()

When calling read.csv(), it is important to use the option as.is = TRUE.

This will prevent R from automatically converting all of the character variables in the data to factors. (Since R 4.0.0, read.csv() no longer performs this conversion by default, so the option mainly matters for older versions of R.)

As a result, dates will be particularly easy to handle in R using the lubridate package.

read.csv()

The command read.csv() can also be used with the following arguments:

# browse interactively for the import data file
dataset <- read.csv(file.choose(), as.is = TRUE)

# extract the import data file from a specific location on the computer
dataset <- read.csv("C:/desktop/datafile.csv", as.is = TRUE)

read.table()

The function read.table() is used to import text data files (.txt) into R. In general, this function requires more arguments than read.csv().

dataset <- read.table("datafile.txt", sep="\t", header = TRUE, as.is = TRUE)

Notes:

sep specifies the type of separator used to delimit data columns, such as:

sep="\t"   tab
sep=" "    white space
sep=","    comma

header indicates whether or not the first line of the file contains the variable names

as.is indicates whether or not character variables should be preserved as character (rather than converted to factors)
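To see the effect of the sep and header arguments without needing a file on disk, the sketch below feeds small made-up tables to read.table() through its text argument. The contents of tab_text and comma_text are invented for the illustration.

```r
# The same small table, once tab-separated and once comma-separated.
tab_text   <- "id\tgroup\tvalue\n1\tA\t2.5\n2\tB\t3.7\n"
comma_text <- "id,group,value\n1,A,2.5\n2,B,3.7\n"

d1 <- read.table(text = tab_text,   sep = "\t", header = TRUE, as.is = TRUE)
d2 <- read.table(text = comma_text, sep = ",",  header = TRUE, as.is = TRUE)
str(d1)
str(d2)

# With header = FALSE the first line is treated as data, not as variable names.
d3 <- read.table(text = tab_text, sep = "\t", header = FALSE, as.is = TRUE)
str(d3)
```

With header = FALSE the variable names end up as the first data row, which is why all three columns of d3 are read as character.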


read.table()

The command read.table() can also be used with the following arguments:

# browse interactively for the import data file
dataset <- read.table(file.choose(), as.is = TRUE)

# extract the import data file from a specific location on the computer
dataset <- read.table("C:/desktop/datafile.txt", as.is = TRUE)

Example of Data Import in R

air <- read.csv("Air Quality Baton Rouge 2011.csv", as.is=TRUE)
str(air)
require(lubridate)
air$Date <- mdy(air$Date)
str(air)
air <- air[1:365, ]
View(air)

Notes on lubridate package: The lubridate package includes the following functions for converting character variables storing dates into date variables:

Function   Order of date components   Examples of dates handled
ymd()      year month day             "2012-10-31" or "2012/10/31"
mdy()      month day year             "10-31-2012" or "10/31/2012"
dmy()      day month year             "31-10-2012" or "31/10/2012"
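For readers who do not have lubridate installed, a roughly equivalent conversion can be sketched with base R's as.Date() and an explicit format string. This is an alternative technique, not the one used in the course; the final lines call lubridate only if it happens to be available.

```r
# Base R alternative: spell out the order of the date components in 'format'.
d_ymd <- as.Date("2012-10-31", format = "%Y-%m-%d")
d_mdy <- as.Date("10/31/2012", format = "%m/%d/%Y")
d_dmy <- as.Date("31-10-2012", format = "%d-%m-%Y")

# All three strings describe the same calendar day.
print(d_ymd)
print(d_mdy == d_ymd)
print(d_dmy == d_ymd)

# If lubridate is installed, mdy() infers the separator automatically.
if (requireNamespace("lubridate", quietly = TRUE)) {
  print(lubridate::mdy("10/31/2012"))
}
```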


Exploring Data Imported in R

R stores any data set imported via read.csv() or read.table() as a data frame (i.e., a tabular data set whose columns correspond to statistical variables and whose rows correspond to records).

R Commands for Exploring a Data Frame   Description
View(dataset)                           View data frame
str(dataset)                            Explore structure of data frame
names(dataset), rownames(dataset)       Extract names of variables and records in data frame
nrow(dataset), ncol(dataset)            Extract number of rows and columns in data frame
summary(dataset)                        Summarize the data frame
attach(dataset), detach(dataset)        Attach/detach the data frame to/from the R search path
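The exploration commands can be tried on any data frame. The sketch below uses airquality, a data set that ships with R (it is not the Baton Rouge file used in the course), and shows with() as one possible alternative to attach() for referring to variables by name.

```r
# airquality is a built-in data frame of daily New York air quality measurements.
aq <- airquality

str(aq)                 # structure: variable names, types, first values
print(names(aq))        # variable names
print(nrow(aq))         # number of records
print(ncol(aq))         # number of variables
print(summary(aq$Temp)) # summary of one variable
print(head(aq))         # first six records

# with() evaluates an expression inside the data frame without attaching it.
mean_temp <- with(aq, mean(Temp))
print(mean_temp)
```

View(aq) opens a spreadsheet-style viewer, so it is best run interactively rather than from a script.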

Exercise on Data Import

Import the data file ozone.txt into R. For ease, the R commands for data import are given below. Explore the resulting data frame.

ozone <- read.table("ozone.txt", header=TRUE, as.is=TRUE)
head(ozone)
str(ozone)
rownames(ozone)
ozone$date <- rownames(ozone)
require(lubridate)
ozone$date <- ymd(ozone$date)
head(ozone)
attach(ozone)

Details on ozone.txt

• Ozone and meteorological variables collected in Rennes (France) during the summer of 2001.

• The variables available are:

- maxO3 (maximum daily ozone)
- T12 (temperature at midday)
- wind (wind direction)
- rain
- Wx12 (projection of the wind speed vector on the east-west axis at midday)

Data Export

Learning Goal: Be able to export data from R in the form of comma delimited or text files.


R Functions for Data Export

R offers a variety of functions for exporting data files, but we will focus only on the two functions listed below.

File Type               File Extension   R Function      R Help
Comma Separated File    .csv             write.csv()     ?write.csv
Text File               .txt             write.table()   ?write.table

Data Export

R Command:

write.csv()

Generic Syntax:

write.csv(dataframe, "datafile.csv",

row.names=FALSE,

quote=FALSE)

Notes: When using write.csv():

• The argument dataframe can be any data frame available in your R working space;

• The argument "datafile.csv" represents the name of the csv file storing the exported data frame;

• The option row.names = FALSE prevents R from adding row names to the exported data file;

• The option quote=FALSE prevents R from adding quotes around values of character variables.

Data Export

R Command:

write.table()

Generic Syntax:

write.table(dataframe, "datafile.txt",

sep= "\t",

row.names=FALSE,

quote=FALSE)

Notes: When using write.table():

• The argument dataframe can be any data frame available in your R working space;

• The argument "datafile.txt" represents the name of the text file storing the exported data frame;

• The argument sep is used to specify the character that separates the columns (e.g., "\t" for a tab);

• The option row.names = FALSE prevents R from adding row names to the exported data file;

• The option quote=FALSE prevents R from adding quotes around values of character variables.
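A quick way to check an export is to read the file straight back in. The sketch below does this round trip for both functions with a small made-up data frame (scores) and temporary files, so nothing is written to the working directory.

```r
# Made-up data frame for illustration.
scores <- data.frame(name = c("Ann", "Bob", "Cy"),
                     score = c(90.5, 78, 85),
                     stringsAsFactors = FALSE)

# Temporary files, so the working directory is left untouched.
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")

write.csv(scores, csv_file, row.names = FALSE, quote = FALSE)
write.table(scores, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Inspect the raw text that was written.
csv_lines <- readLines(csv_file)
print(csv_lines)

# Read both files back and compare with the original data frame.
from_csv <- read.csv(csv_file, as.is = TRUE)
from_txt <- read.table(txt_file, sep = "\t", header = TRUE, as.is = TRUE)
print(all.equal(scores, from_csv))
print(all.equal(scores, from_txt))
```

Note that quote=FALSE is only safe when no character value contains the separator itself; a name such as "Smith, Ann" would otherwise be split into two columns when the csv file is read back.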

Example of Data Export in R

# Export comma separated file

write.csv(air, "airexport.csv",

row.names=FALSE,

quote=FALSE)

# Export text file

write.table(air, "airexport.txt",

sep= "\t",

row.names=FALSE,

quote=FALSE)


Exercise on Data Export

Export the data frame ozone in your working space as a comma delimited file (.csv). For your convenience, the R command for data export is given below.

# Export as a comma separated file

write.csv(ozone, "ozone.csv",

row.names=TRUE,

quote=FALSE)


Graphical Systems in R

Learning Goal: Know about the four graphical systems available in R and how to access references and help for each system.

Graphical Systems in R

[Diagram: the four graphical systems arranged from least sophisticated to most sophisticated: Base Graphics (least sophisticated), Trellis Graphics, Grammar of Graphics, Grid Graphics (most sophisticated)]

Graphical Systems in R

Graphical System      R Package   Book Reference
Base Graphics         graphics    “Graphics for Statistics and Data Analysis with R”, by Kevin J. Keen (CRC Press, 2010)
Trellis Graphics      lattice     “Lattice: Multivariate Data Visualization with R”, by Deepayan Sarkar (Springer, 2008)
Grammar of Graphics   ggplot2     “ggplot2: Elegant Graphics for Data Analysis”, by Hadley Wickham (Springer-Verlag, 2009)
Grid Graphics         grid        “R Graphics”, 2nd Edition, by Paul Murrell (Chapman & Hall/CRC, 2006)

Note: The graphics and grid packages come with the default installation of R, and lattice is bundled with standard R distributions as a recommended package. The ggplot2 package needs to be installed in R one time only. Packages other than graphics then need to be required for each R session.

install.packages("ggplot2")

require("lattice")
require("ggplot2")
require("grid")

Example of Graph Produced in R

[Figures: a plot of Daily Max Ozone (ppm) by Month, titled "2011 BATON ROUGE/CAPITOL", produced in turn with the graphics package, the lattice package and the ggplot2 package]

Getting Help on R Graphical Systems

To access the R help files associated with each of the four graphical systems, type the following commands in the R Console window:

help(package="graphics")
help(package="lattice")
help(package="ggplot2")
help(package="grid")

Getting Help on R Graphical Systems

To access the R help files associated with specific functions within a particular graphical system package, use commands similar to the ones below, where the first argument is the function name and the package argument is the package name:

help(barplot, package="graphics")
help(bwplot, package="lattice")
help(qplot, package="ggplot2")
help(arrowsGrob, package="grid")

Basic R Graphics

Learning Goal: Learn how to produce basic graphs using the R graphics package.

Basic R Graphics

R offers a collection of functions for producing standard graphics that are useful when conducting exploratory data analysis.

These functions are available via the graphics package, which is pre-installed in R.

Basic R Graphics

Graph Type                     R Command
Histogram                      hist(x)
Density Plot                   plot(density(x))
Boxplot                        boxplot(x)
Cumulative Distribution Plot   plot.ecdf(x)

x = quantitative variable
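The four commands in the table can be tried on any numeric variable. The sketch below uses the Temp column of the built-in airquality data set rather than the course data, and draws on an off-screen device (pdf(NULL)) so that it also runs non-interactively; both choices are assumptions made for this illustration.

```r
x <- airquality$Temp   # a quantitative variable from a built-in data set

pdf(NULL)              # off-screen device; remove this line and dev.off() to see the plots

h <- hist(x, main = "Histogram", xlab = "Temperature (F)")
plot(density(x), main = "Kernel Density Plot")
b <- boxplot(x, main = "Boxplot", ylab = "Temperature (F)")
plot(ecdf(x), main = "Empirical Cumulative Distribution Function")

dev.off()

# hist() and boxplot() also return the numbers behind the picture.
print(h$counts)
print(b$stats)
```

Here plot(ecdf(x)) is used for the cumulative distribution plot; it draws the same kind of step function as plot.ecdf(x).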


Basic R Graphics

Graph Type   R Command
Bar chart    barplot(table(f))
Dot chart    dotchart(table(f))
Pie Chart    pie(table(f))

f = qualitative variable (i.e., factor)


Basic R Graphs

Graph Type                R Command
Scatter plot              plot(y ~ x)
Time series plot          plot(y ~ date)
Coplot                    coplot(y ~ x|z)
Line Plot                 matplot(x, cbind(y,z))
Pairs plot                pairs(cbind(y,x,z))
Side-by-side boxplots     boxplot(y ~ f)
Side-by-side bar charts   barplot(table(f1, f2))

x, y, z = quantitative variables
f, f1, f2 = qualitative variables (i.e., factors)
date = time variable
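A few of the two-variable commands are sketched below on the built-in airquality data set. The factor Hot and the 80-degree cut-off are invented for the illustration, and an off-screen device is used so that the code also runs non-interactively.

```r
aq <- airquality
aq$Month <- factor(aq$Month, labels = month.name[5:9])   # months 5..9 -> May..September

# A made-up two-level factor: was the day warmer than 80 degrees F?
aq$Hot <- factor(ifelse(aq$Temp > 80, "Above 80F", "80F or below"))

pdf(NULL)   # off-screen device; remove this line and dev.off() to see the plots

plot(Ozone ~ Temp, data = aq, main = "Scatter plot")
boxplot(Ozone ~ Month, data = aq, main = "Side-by-side boxplots")

# Cross-tabulate two factors, then draw the counts as side-by-side bars.
counts <- table(aq$Hot, aq$Month)
barplot(counts, beside = TRUE, legend.text = TRUE,
        main = "Side-by-side bar charts")

dev.off()
print(counts)
```

Without beside = TRUE, barplot() stacks the bars within each month instead of placing them side by side.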


Histogram

Air Quality in Baton Rouge in 2011

[Figure: histogram titled "Histogram" of Daily Max Ozone (ppm), subtitle "2011 BATON ROUGE/CAPITOL"; y-axis: Frequency]

?hist

R Code for Histogram

air <- read.csv("Air Quality Baton Rouge 2011.csv", as.is=TRUE)
View(air)
str(air)
require(lubridate)
air$Date <- mdy(air$Date)
str(air)
air <- air[1:365, ]
names(air)
attach(air)

R Code for Histogram

hist(Ozone,

xlab="Daily Max Ozone (ppm)",

main="Histogram",

col="lightblue",

sub="2011 BATON ROUGE/CAPITOL",

col.sub="red")


Density Plot

[Figure: kernel density plot titled "Kernel Density Plot"; x-axis label: N = 365 Bandwidth = 0.004334; y-axis: Density]

?density

R Code for Density Plot

plot(density(Ozone),

main="Kernel Density Plot")


Boxplot

[Figure: boxplot titled "Boxplot" of Daily Max Ozone (ppm), subtitle "2011 BATON ROUGE/CAPITOL"]

?boxplot

R Code for Boxplot

boxplot(Ozone,

ylab="Daily Max Ozone (ppm)",

main="Boxplot",

col="lightblue",

sub="2011 BATON ROUGE/CAPITOL",

col.sub="red")


Side-by-Side Boxplots

[Figure: side-by-side boxplots titled "Side-by-Side Boxplots" of Daily Max Ozone (ppm) by Month (January to December), subtitle "2011 BATON ROUGE/CAPITOL"]

?boxplot

R Code for Side-by-Side Boxplots

Month <- months(Date)
Month <- factor(Month, levels=unique(Month))
boxplot(Ozone ~ Month,
  xlab="Month",
  ylab="Daily Max Ozone (ppm)",
  main="Side-by-Side Boxplots",
  col="lightblue",
  sub="2011 BATON ROUGE/CAPITOL",
  col.sub="red")

Empirical CDF

?plot.ecdf

plot.ecdf(Ozone,
  xlab="Daily Max Ozone (ppm)",
  main="Empirical Cumulative Distribution Function")

[Figure: empirical cumulative distribution function of Daily Max Ozone (ppm); y-axis: Fn(x)]

Barchart

Percent Change in Hispanic Population from Previous Decade

Decade   US
1990     53%
2000     58%
2010     43%

[Figure: bar chart titled "United States" of Percent Change in Hispanic Population from Previous Decade by Decade, with bars labeled 53%, 58% and 43%]

?barplot

R Code for Barchart

Decade <- c(1990, 2000, 2010)

Pct.Change.Hispanic.US <- c(53, 58, 43)

b <- barplot(Pct.Change.Hispanic.US,

col=c("#599ad3"),

xlab="Decade",

ylab="Percent Change in Hispanic Population from Previous Decade",

ylim=c(0,70),

main="United States")

abline(h=0)

text(b, Pct.Change.Hispanic.US + 2,

paste(Pct.Change.Hispanic.US,"%",sep=""))


Side-by-Side Barcharts

Percent Change in Hispanic Population from Previous Decade

Decade   US    New Orleans Metro
1990     53%   6%
2000     58%   9%
2010     43%   57%

[Figure: side-by-side bar chart of Percent Change in Hispanic Population from Previous Decade by Decade (1990, 2000, 2010) for the United States (53%, 58%, 43%) and New Orleans Metro (6%, 9%, 57%)]

?barplot

R Code for Side-by-Side Barcharts

Decade <- c(1990, 2000, 2010)
Pct.Change.Hispanic.US <- c(53, 58, 43)
Pct.Change.Hispanic.NewOrleansMetro <- c(6,9,57)
Pct.Change.Hispanic <- data.frame(Pct.Change.Hispanic.US, Pct.Change.Hispanic.NewOrleansMetro)
Pct.Change.Hispanic <- data.matrix(Pct.Change.Hispanic)
rownames(Pct.Change.Hispanic) <- Decade
Pct.Change.Hispanic

R Code for Side-by-Side Barcharts

b <- barplot(t(Pct.Change.Hispanic),
  beside=TRUE,
  col=c("#599ad3", "#79c36a"),
  xlab="Decade",
  ylab="Percent Change in Hispanic Population from Previous Decade",
  ylim=c(0,70))
abline(h=0)
text(b[1,], Pct.Change.Hispanic.US + 2,
  paste(Pct.Change.Hispanic.US,"%",sep=""))
text(b[2,], Pct.Change.Hispanic.NewOrleansMetro + 2,
  paste(Pct.Change.Hispanic.NewOrleansMetro,"%",sep=""))
legend("topleft",
  c("United States","New Orleans Metro"),
  fill=c("#599ad3", "#79c36a"),
  bty="n")

Stacked Barcharts

Age Distribution in New Orleans Metro (Expressed as Counts)

Decade   Under 5 years   5 to 17 years   18 to 64 years   65 years and older
1980     105,801         285,440         774,773          116,291
1990     97,768          256,363         771,383          138,877
2000     90,471          261,362         815,010          149,667
2010     77,154          195,664         752,855          142,091

[Figure: horizontal stacked bar chart of the Proportion in each age group (Under 5 years, 5 to 17 years, 18 to 64 years, 65 years and over) for 1980, 1990, 2000 and 2010, with each segment labeled by its proportion: 1980: 0.08, 0.22, 0.60, 0.09; 1990: 0.08, 0.20, 0.61, 0.11; 2000: 0.07, 0.20, 0.62, 0.11; 2010: 0.07, 0.17, 0.64, 0.12]

R Code for Stacked Barcharts

library(lattice)
library(plyr)
aged <- matrix(c(105801, 285440, 774773, 116291,
  97768, 256363, 771383, 138877,
  90471, 261362, 815010, 149667,
  77154, 195664, 752855, 142091),
  nrow=4, ncol=4, byrow=TRUE)
aged
colnames(aged) <- c("Under 5 years", "5 to 17 years", "18 to 64 years", "65 years and over")
rownames(aged) <- c("1980","1990","2000","2010")
aged

R Code for Stacked Barcharts

colors <- c(rgb(166,27,30,maxColorValue = 255),

rgb(192,80,77,maxColorValue = 255),

rgb(24,65,83,maxColorValue = 255),

rgb(130,184,208,maxColorValue = 255))

colorset <- simpleTheme(col=colors, border="white")


R Code for Stacked Barcharts

sb <- barchart(prop.table(aged, margin=1),
  xlab="Proportion",
  par.settings=colorset,
  panel=function(...) {
    panel.barchart(...)
    tmp <- list(...)
    tmp <- data.frame(x=tmp$x, y=tmp$y)
    # calculate positions of text labels
    df <- ddply(tmp, .(y), function(x) {
      data.frame(x, pos=cumsum(x$x)-x$x/2)
    })
    panel.text(x=df$pos, y=df$y, label=sprintf("%.02f", df$x), cex=0.7)
  },
  auto.key=list(columns=4, space="bottom", cex=0.8, size=1.4, adj=1, between=0.2, between.columns=0.1))
plot(sb)
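The label positions in the panel function come from the expression cumsum(x) - x/2, which places each label at the midpoint of its bar segment. The base R sketch below works this out for the 1980 row of the age table; the names counts, props, right_edge and pos are chosen for the illustration.

```r
# 1980 counts for the four age groups (from the table above).
counts <- c(105801, 285440, 774773, 116291)
props <- counts / sum(counts)       # proportions that make up the stacked bar

right_edge <- cumsum(props)         # where each segment ends
pos <- right_edge - props / 2       # midpoint of each segment

# Labels formatted the same way as in the panel function.
labels <- sprintf("%.02f", props)

print(round(right_edge, 3))
print(round(pos, 3))
print(labels)
```

Each midpoint lies between the end of the previous segment and the end of its own segment, which is why the labels appear centered inside the colored blocks.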

Comparative Pie Charts

Share of Hispanic Population by Nationality in Three New Orleans Parishes in 2010 (Expressed as a Count)

Parish        Mexican   Puerto Rican   Cuban   Other
Jefferson     10,194    2,682          3,840   36,986
Orleans       4,298     948            1,285   11,520
St. Tammany   3,593     933            816     5,628

[Figure: three pie charts, one per parish, each with slices for Mexican, Puerto Rican, Cuban and Other]

?pie

R Code for Comparative Piecharts

PopulationShare <- c(10194, 2682, 3840, 36986,
  4298, 948, 1285, 11520,
  3593, 933, 816, 5628)
PopulationShare <- matrix(PopulationShare, nrow=3, ncol=4, byrow=TRUE)
PopulationShare
rownames(PopulationShare) <- c("Jefferson","Orleans","St. Tammany")
colnames(PopulationShare) <- c("Mexican","Puerto Rican","Cuban","Other")
PopulationShare

R Code for Comparative Piecharts

# arrange the three pie charts side by side in one row
layout(matrix(c(1,2,3),1,3, byrow=TRUE))
cols <- c("#599ad3", "#9e66ab", "#79c36a", "#f9a65a")
pie(PopulationShare["Jefferson",], init=90, clockwise=TRUE, col=cols, radius=1.2)
pie(PopulationShare["Orleans",], init=90, clockwise=TRUE, col=cols, radius=1.2)
pie(PopulationShare["St. Tammany",], init=90, clockwise=TRUE, col=cols, radius=1.2)

Line Charts

[Figure: line chart of Population Living in Poverty by Year (1979, 1989, 1999, 2009) for Orleans Parish (62,114; 105,687; 110,179; 104,349) and the Rest of the New Orleans Metro (143,793; 152,042; 130,896; 82,469)]

?matplot

?matpoints

?matlines

R Code for Line Charts

Year <- c(1979, 1989, 1999, 2009)
OrleansParish <- c(62114, 105687, 110179, 104349)
RestNewOrleansMetro <- c(143793, 152042, 130896, 82469)
matplot(Year, cbind(OrleansParish, RestNewOrleansMetro),
  type="l",
  ylab="Population Living in Poverty",
  ylim=c(0,160000),
  lty=1, lwd=2,
  col=c("darkgreen","orange"),
  axes=FALSE)
axis(1, at=c(1979, 1989, 1999, 2009), labels=c(1979, 1989, 1999, 2009))
axis(2, at=pretty(0:160000))

R Code for Line Charts

segments(Year[1],OrleansParish[1]-10000, Year[1], OrleansParish[1])

segments(Year[2],OrleansParish[2]-10000, Year[2], OrleansParish[2])

segments(Year[3],OrleansParish[3]-10000, Year[3], OrleansParish[3])

segments(Year[4],OrleansParish[4]+10000, Year[4], OrleansParish[4])

text(Year[1], OrleansParish[1]-15000, paste(62,114,sep=","),col="darkgreen")

text(Year[2], OrleansParish[2]-15000, paste(105,687,sep=","),col="darkgreen")

text(Year[3], OrleansParish[3]-15000, paste(110,179,sep=","),col="darkgreen")

text(Year[4], OrleansParish[4]+15000,paste(104,349,sep=","),col="darkgreen")


R Code for Line Charts

segments(Year[1],RestNewOrleansMetro[1]-10000, Year[1], RestNewOrleansMetro[1])
segments(Year[2],RestNewOrleansMetro[2]-10000, Year[2], RestNewOrleansMetro[2])
segments(Year[3],RestNewOrleansMetro[3]+10000, Year[3], RestNewOrleansMetro[3])
segments(Year[4],RestNewOrleansMetro[4]-10000, Year[4], RestNewOrleansMetro[4])
text(Year[1], RestNewOrleansMetro[1]-15000, paste(143,793,sep=","), col="orange")
# "042" is quoted so that the leading zero is kept in the label
text(Year[2], RestNewOrleansMetro[2]-15000, paste(152,"042",sep=","), col="orange")
text(Year[3], RestNewOrleansMetro[3]+15000, paste(130,896,sep=","), col="orange")
text(Year[4], RestNewOrleansMetro[4]-15000, paste(82,469,sep=","), col="orange")

R Code for Line Charts

legend("bottomright",
  c("Orleans Parish","Rest of the New Orleans Metro"),
  col=c("darkgreen","orange"),
  lty=1, lwd=2, bty="n")
box()
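The value labels on this chart are assembled by pasting digit groups together, which is fragile: a numeric literal such as 042 is read as the number 42, so its leading zero is lost. One possible alternative, sketched below, is to let format() insert the thousands separator directly from the data; the object names pasted and labels are chosen for the illustration.

```r
RestNewOrleansMetro <- c(143793, 152042, 130896, 82469)

# Pasting digit groups drops the leading zero of the second group.
pasted <- paste(152, 042, sep = ",")
print(pasted)

# format() with big.mark builds all labels from the data in one step.
labels <- format(RestNewOrleansMetro, big.mark = ",", trim = TRUE)
print(labels)
```

The resulting character vector can be passed to text() in a single call, since text() accepts vectors of coordinates and labels.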


Time Series Plot

Used for plotting the values of a quantitative variable Y versus a time variable T.

e.g.: Y = Ozone
T = Date

plot(Y ~ T)
plot(Y ~ T, type= "l")
plot(Y ~ T, type= "h")

[Figure: three time series plots of Y versus T (Jan 01 to Jan 31), one for each of the three commands above: points, lines (type="l") and vertical bars (type="h")]

?plot

Time Series Plot (v.1)

[Figure: time series plot titled "Time Series Plot" of Daily Max Ozone (ppm) versus Date, drawn with points, subtitle "2011 BATON ROUGE/CAPITOL"]

plot(Ozone ~ Date,
  ylab="Daily Max Ozone (ppm)",
  main="Time Series Plot",
  sub="2011 BATON ROUGE/CAPITOL",
  col.sub="red")

Time Series Plot (v.2)

[Figure: time series plot titled "Time Series Plot" of Daily Max Ozone (ppm) versus Date, drawn with lines, subtitle "2011 BATON ROUGE/CAPITOL"]

plot(Ozone ~ Date,
  type="l",
  ylab="Daily Max Ozone (ppm)",
  main="Time Series Plot",
  sub="2011 BATON ROUGE/CAPITOL",
  col.sub="red")

Time Series Plot (v.3)

[Figure: time series plot titled "Time Series Plot" of Daily Max Ozone (ppm) versus Date, drawn with vertical bars, subtitle "2011 BATON ROUGE/CAPITOL"]

plot(Ozone ~ Date,
  type="h",
  ylab="Daily Max Ozone (ppm)",
  main="Time Series Plot",
  sub="2011 BATON ROUGE/CAPITOL",
  col.sub="red")

Time Series Plot (v.4)

[Figure: time series plot titled "Time Series Plot" of Daily Max Ozone (ppm) versus Date, drawn with vertical bars, with a dashed horizontal reference line labeled 0.10, subtitle "2011 BATON ROUGE/CAPITOL"]

bad <- ifelse(Ozone > 0.10, "red", "darkgrey")
plot(Ozone ~ Date,
  type="h",
  ylab="Daily Max Ozone (ppm)",
  col=bad,
  main="Time Series Plot",
  sub="2011 BATON ROUGE/CAPITOL",
  col.sub="blue")
abline(h=0.10, lty=2, col="red")
text(locator(1),"0.10")
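The coloring in this plot relies on ifelse() being vectorized: it returns one color per observation, and plot() then uses the colors in the same order as the data. The sketch below shows this on a short made-up vector (ozone_demo) rather than the course data.

```r
# Made-up ozone values for illustration (ppm).
ozone_demo <- c(0.04, 0.12, 0.09, 0.11, 0.10)

# One color per observation: red above the 0.10 threshold, dark grey otherwise.
bar_col <- ifelse(ozone_demo > 0.10, "red", "darkgrey")
print(bar_col)

# How many observations fall in each color group.
print(table(bar_col))
```

Note that a value exactly equal to 0.10 is not flagged, because the comparison is strict. Also, locator(1) waits for a mouse click on the plot, so it only works in an interactive session; in a script one could instead pass fixed coordinates to text().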

96

Scatterplot (v.1)

[Figure: scatterplot titled "Scatterplot"; x-axis: Temperature (°F), 30 to 100; y-axis: Daily Max Ozone (ppm), 0.00 to 0.14.]

plot(Ozone ~ Temp, xlab="Temperature (°F)", ylab="Daily Max Ozone (ppm)", main="Scatterplot")

97

Scatterplot (v.2)

[Figure: scatterplot titled "Scatterplot"; x-axis: Temperature (°F), 30 to 100; y-axis: Daily Max Ozone (ppm), 0.00 to 0.14.]

require(car)

scatterplot(Ozone ~ Temp, xlab="Temperature (°F)", ylab="Daily Max Ozone (ppm)", smooth=FALSE, reg.line=FALSE, main="Scatterplot")

Note: In version 3.0 and later of the car package, the argument reg.line is named regLine.

help(scatterplot, package="car")

98

Coplot

[Figure: coplot of Ozone (y-axis) versus Temp (x-axis), with one panel per interval of the conditioning variable ("Given : RelativeHumidity").]

coplot(Ozone ~ Temp | RelativeHumidity, panel = panel.smooth)

?coplot
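For readers without the course data, the same kind of conditioning plot can be sketched with the airquality data set that ships with R. The choice of Wind as the conditioning variable and of four intervals is an assumption made here for illustration only.

```r
# airquality is part of R's datasets package; drop rows with missing
# values so that every panel is based on complete observations
aq <- na.omit(airquality)

# Ozone versus Temp, conditional on four overlapping intervals of Wind;
# panel.smooth adds a smooth curve to each panel
coplot(Ozone ~ Temp | Wind, data = aq, panel = panel.smooth, number = 4)
```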

99

Pairs Plot

[Figure: pairs plot (scatterplot matrix) of Ozone, Temp and WindSpeed.]

pairs(cbind(Ozone, Temp, WindSpeed))

100

Exercise on Basic R Graphics

For this exercise, refer to the ozone data frame available in your R working space and follow the instructions below to create a variety of basic R graphs using variables from this data frame.

1. Create a histogram of maxO3.

2. Create a density plot of maxO3.

3. Create a boxplot of maxO3.

4. Create a cumulative distribution plot of maxO3.

101

Exercise on Basic R Graphics

5. Create a scatter plot of maxO3 versus T12.

6. Create a time series plot of maxO3.

7. Create side-by-side boxplots of maxO3 for the four wind directions (stored in the wind variable).

8. Create a bar chart for the variable rain.

9. Create a bar chart for rain according to wind.

102

Customizing Basic R Graphics

Learning Goal: Learn how to customize basic graphs using the R graphics package.

103

Customizing Basic R Graphics

Adding a main title:

e.g.: hist(Ozone, main="BATON ROUGE")

Adding a subtitle:

e.g.: hist(Ozone, sub="Year 2011")

[Figure: two histograms of Ozone (x-axis: Ozone; y-axis: Frequency), one with the main title "BATON ROUGE" and one with the default title "Histogram of Ozone" and the subtitle "Year 2011".]

104

Customizing Basic R Graphics

Adding x-axis and y-axis labels:

e.g.: hist(Ozone, xlab="Ozone", ylab="Frequency")

[Figure: histogram titled "Histogram of Ozone"; x-axis: Ozone; y-axis: Frequency.]

105

Customizing Basic R Graphics

Adding a legend:

hist(Ozone, freq=FALSE, ylim=c(0,30))

lines(density(Ozone))

legend("topright", "Density Curve", lty=1,bty="n")

[Figure: density histogram titled "Histogram of Ozone" with an overlaid density curve; x-axis: Ozone; y-axis: Density (0 to 30); legend entry "Density Curve" in the top right corner.]

106

Customizing Basic R Graphics

Adding text annotation:

hist(Ozone)

text(locator(1), "BATON ROUGE")

text(0.12, 20, "Year 2011")

[Figure: histogram titled "Histogram of Ozone" (x-axis: Ozone; y-axis: Frequency) annotated with the text "BATON ROUGE" and "Year 2011".]

Notes:

When using the text() function:

• locator(1) places text wherever we click on the current graph;

• text(x,y, "some text") places text at graph location defined by (x,y) coordinates.

107

Customizing Basic R Graphics

Adding colors:

hist(Ozone, col="violet")

[Figure: histogram titled "Histogram of Ozone" with violet bars; x-axis: Ozone; y-axis: Frequency.]

Note: To see the list of 600+ colors available in R, type the following command in the R Console window:

colors()

See also the help files for functions such as rainbow(), heat.colors(), terrain.colors() and palette().

108
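A short sketch of how the colour functions named in the note on colors() can be explored from the console. The search term "violet" and the choice of six rainbow colours are arbitrary illustrations.

```r
length(colors())                          # how many colour names R knows
head(colors(), 5)                         # the first few names
grep("violet", colors(), value = TRUE)    # names containing "violet"

# Six colours from the rainbow() palette, shown as equal-height bars
barplot(rep(1, 6), col = rainbow(6), axes = FALSE)
```

Any name returned by colors() can be passed to the col argument, as in hist(Ozone, col="violet").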

Customizing Basic R Graphics

Adding graphical symbols:

[Figure: two scatterplots of Ozone versus Temp, titled "pch=1" and "pch=19".]

par(mfrow=c(1,2))

plot(Ozone ~ Temp, pch=1, main="pch=1")

plot(Ozone ~ Temp, pch=19, main="pch=19")

Note: To see the graphical symbols available in R, use the command: example(pch)

109

Customizing Basic R Graphics

Controlling the size of graphical symbols:

par(mfrow=c(1,3))

plot(Ozone ~ Temp, cex=1, main="cex=1")

plot(Ozone ~ Temp, cex=0.5, main="cex=0.5")

plot(Ozone ~ Temp, cex=2, main= "cex=2")

[Figure: three scatterplots of Ozone versus Temp, titled "cex=1", "cex=0.5" and "cex=2".]

Options for cex:

cex = 1 (default size)

cex = 0.5 (half the default size)

cex = 2 (twice the default size)

110

Customizing Basic R Graphics

Adding lines:

hist(Ozone, freq=FALSE, ylim=c(0,30))

lines(density(Ozone))

Controlling the type and width of lines:

hist(Ozone, freq=FALSE, ylim=c(0,30))

lines(density(Ozone), lty=2, lwd=2)

[Figure: density histogram titled "Histogram of Ozone" with an overlaid density curve; x-axis: Ozone; y-axis: Density (0 to 30).]

Options for line type:

lty=1 (solid)

lty=2 (dashed)

lty=3 (dotted)

lty=4 (dotdash)

lty=5 (longdash)

lty=6 (twodash)

Options for line width:

lwd = 1 (default)

lwd = 0.5 (half the default width)

lwd = 2 (twice the default width)
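To compare the six line types listed above side by side, one option is to draw each of them as a horizontal line on an empty plot. This is only a sketch; the plot title and the use of lwd = 2 are illustrative choices.

```r
# Empty plotting region with one row per line type
plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0.5, 6.5),
     xlab = "", ylab = "lty", main = "Line types 1 to 6")

# Draw a horizontal line at height k using line type k
for (k in 1:6) {
  abline(h = k, lty = k, lwd = 2)
}
```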

111

Exercise on Customizing Basic R Graphics

Create a scatter plot of maxO3 versus T12 using the variables in the ozone data frame. Enhance this scatter plot by adding the following elements to it:

- main title and subtitle

- x-axis and y-axis labels

- some text annotation

- some color

- a particular type of graphical symbol (e.g., pch=8)

112

Advanced R Graphics

Learning Goal: Learn how to produce advanced graphs using the R lattice package.

113

Advanced R Graphics

Recall that advanced R graphics can be produced using any of the following R packages:

• grid (not covered in this course)

• lattice (covered in this course)

• ggplot2 (not covered in this course)

Note: The lattice package can replicate most of the basic graphics. However, the lattice package is particularly helpful for visualizing data conditional on the values of one or more variables.

114

Lattice Functions

lattice function graphics function Description

histogram() hist() Histogram

densityplot() plot(density()) Density Plot

bwplot() boxplot() Boxplot

stripplot() stripchart() Strip Plot

xyplot() plot() Scatter Plot

dotplot() dotchart() Dot Plot

barchart() barplot() Bar Chart

splom() pairs() Pairwise Scatterplot

cloud() persp() 3-D Scatterplot

115

Lattice Formulas

The functions in the lattice package rely on a

formula framework. For instance:

histogram(~ Y)

histogram(~ Y|F)

histogram(~Y|F1*F2)

xyplot(Y ~ X)

xyplot(Y ~ X |F)

xyplot(Y ~ X|F1*F2)

Symbol interpretation:

~   as a function of

|   conditional on

*   crossed with
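The formula patterns above can be tried on the airquality data set that ships with R. In this sketch, the Month factor is built from airquality's numeric month column; that step is an assumption added here so that the conditioning panels get readable labels.

```r
require(lattice)

# Complete cases only; months 5 to 9 become a factor in calendar order
aq <- na.omit(airquality)
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

histogram(~ Ozone, data = aq)              # ~ Y
histogram(~ Ozone | Month, data = aq)      # ~ Y | F  (one panel per month)
xyplot(Ozone ~ Temp, data = aq)            # Y ~ X
xyplot(Ozone ~ Temp | Month, data = aq)    # Y ~ X | F
```

Supplying data = aq lets the formula refer to column names directly, which avoids having to attach the data frame.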

116

Histogram

[Figure: histogram titled "2011 BATON ROUGE/CAPITOL"; x-axis: Daily Max Ozone (ppm), 0.00 to 0.15; y-axis: Count.]

require(lattice)

histogram(~Ozone, xlab="Daily Max Ozone (ppm)", type="count", main="2011 BATON ROUGE/CAPITOL")

Note: For the histogram function, we can also use type= "density" to obtain a density histogram or type="percent" to obtain a percent of total histogram.

117

Conditional Histograms

histogram(~Ozone | Month, xlab="Daily Max Ozone (ppm)", type="count", main="2011 BATON ROUGE/CAPITOL")

[Figure: conditional histograms titled "2011 BATON ROUGE/CAPITOL", one panel per month (January to December); x-axis: Daily Max Ozone (ppm); y-axis: Count.]

118

Density Plot

densityplot(~Ozone, xlab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL")

[Figure: density plot titled "2011 BATON ROUGE/CAPITOL"; x-axis: Daily Max Ozone (ppm), 0.00 to 0.15; y-axis: Density.]

119

Conditional Density Plots

densityplot(~Ozone | Month, xlab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL", as.table=TRUE)

[Figure: conditional density plots titled "2011 BATON ROUGE/CAPITOL", one panel per month (January to December, in table order); x-axis: Daily Max Ozone (ppm); y-axis: Density.]

120

Boxplot

[Figure: horizontal boxplot titled "2011 BATON ROUGE/CAPITOL"; x-axis: Daily Max Ozone (ppm).]

bwplot(~Ozone, xlab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL")

121

Side-by-Side Boxplots

[Figure: side-by-side boxplots, one per month (January to December); x-axis: Month; y-axis: Daily Max Ozone (ppm).]

bwplot(Ozone ~ Month, xlab="Month", ylab="Daily Max Ozone (ppm)")

122

Conditional Boxplots

[Figure: conditional boxplots titled "2011 BATON ROUGE/CAPITOL", one panel per month (January to December); x-axis: Daily Max Ozone (ppm).]

bwplot(~Ozone | Month, xlab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL")

123

Strip Plot

[Figure: strip plot titled "2011 BATON ROUGE/CAPITOL"; x-axis: Daily Max Ozone (ppm).]

stripplot(~Ozone, xlab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL")

124

Side-by-Side Strip Plots

[Figure: two side-by-side strip plots titled "2011 BATON ROUGE/CAPITOL", one without and one with jitter; x-axis: Month (January to December); y-axis: Daily Max Ozone (ppm).]

stripplot(Ozone ~ Month, xlab="Month", ylab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL", jitter=FALSE)

stripplot(Ozone ~ Month, xlab="Month", ylab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL", jitter=TRUE)

125

Conditional Strip Plots

[Figure: conditional strip plots titled "2011 BATON ROUGE/CAPITOL", one panel per month (January to December, in table order); x-axis: Daily Max Ozone (ppm).]

stripplot(~ Ozone | Month, xlab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL", as.table=TRUE)

126

Time Series Plot

[Figure: line plot titled "2011 BATON ROUGE/CAPITOL"; x-axis: Date (Jan to Jan); y-axis: Daily Max Ozone (ppm).]

xyplot(Ozone ~ Date, type="l", ylab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL")

127

Scatterplot

[Figure: scatterplot titled "2011 BATON ROUGE/CAPITOL"; x-axis: Temperature (°F), 40 to 100; y-axis: Daily Max Ozone (ppm).]

xyplot(Ozone ~ Temp, xlab="Temperature (°F)", ylab="Daily Max Ozone (ppm)", main="2011 BATON ROUGE/CAPITOL")

128

Conditional Scatterplots

[Figure: conditional scatterplots titled "2011 BATON ROUGE/CAPITOL", one panel per month (January to December, in table order); x-axis: Temperature (°F); y-axis: Daily Max Ozone (ppm).]

xyplot(Ozone ~ Temp | Month, xlab="Temperature (°F)", ylab="Daily Max Ozone (ppm)", as.table=TRUE, main="2011 BATON ROUGE/CAPITOL")

129

Dot Plot (v.1)

require(plyr)

air$Month <- Month

View(air)

OzoneMonthlySummary <- ddply(air, "Month", summarise,
                             Median = median(Ozone),
                             Q1 = quantile(Ozone, probs=0.25),
                             Q3 = quantile(Ozone, probs=0.75))

OzoneMonthlySummary

[Figure: dot plot with one row per month (January to December); x-axis: Median Value of Daily Max Ozone on a Given Month (ppm), 0.03 to 0.06.]
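The monthly summary computed with ddply() can also be sketched in base R, without the plyr package. The example below uses the airquality data set that ships with R rather than the course's air data frame, and the name MonthlySummary is a hypothetical stand-in for OzoneMonthlySummary.

```r
# Complete cases only; months 5 to 9 become a factor in calendar order
aq <- na.omit(airquality)
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# tapply() applies a function to Ozone within each level of Month;
# the results come back in the order of the factor levels
MonthlySummary <- data.frame(
  Month  = levels(aq$Month),
  Median = as.numeric(tapply(aq$Ozone, aq$Month, median)),
  Q1     = as.numeric(tapply(aq$Ozone, aq$Month, quantile, probs = 0.25)),
  Q3     = as.numeric(tapply(aq$Ozone, aq$Month, quantile, probs = 0.75))
)
MonthlySummary
```

The result has one row per month with the median and the two quartiles, which is the shape the dot plots and bar charts in this part of the course expect.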

130

Dot Plot (v.1)

dotplot(Month ~ Median, data=OzoneMonthlySummary,
        aspect=1.0,
        xlab="Median Value of Daily Max Ozone on a Given Month (ppm)",
        scales=list(cex=1.0),
        panel = function (x, y) {
          panel.abline(h = as.numeric(y), col = "gray", lty = 2)
          panel.xyplot(x, as.numeric(y), col = "blue", pch = 16)
        }
)

131

Dot Plot (v.2)

[Figure: dot plot with one row per month (January to December) and a horizontal segment through each dot; x-axis: Median Value of Daily Max Ozone on a Given Month (ppm), 0.02 to 0.08.]

Note: Reported ranges represent inter-quartile ranges.

132

Dot Plot (v.2)

dotplot(Month ~ Median, data = OzoneMonthlySummary,
        aspect = 1,
        xlim = c(0, 0.10),
        xlab = "Median Value of Daily Max Ozone on a Given Month (ppm)",
        panel = function (x, y) {
          panel.xyplot(x, y, pch = 16, col = "red")
          panel.segments(OzoneMonthlySummary$Q1, as.numeric(y),
                         OzoneMonthlySummary$Q3, as.numeric(y),
                         lty = 1, col = "black")
        }
)

133

Bar Chart (v.1)

[Figure: vertical bar chart titled "2011 BATON ROUGE/CAPITOL"; x-axis: Month (January to December); y-axis: Median Value of Daily Max Ozone on a Given Month (ppm), 0.03 to 0.06.]

barchart(Median ~ Month, data = OzoneMonthlySummary, xlab="Month", ylab="Median Value of Daily Max Ozone on a Given Month (ppm)", main="2011 BATON ROUGE/CAPITOL")

134

Bar Chart (v.2)

[Figure: vertical bar chart titled "2011 BATON ROUGE/CAPITOL"; x-axis: Month (January to December); y-axis: Median Value of Daily Max Ozone on a Given Month (ppm); each bar is labeled with its median value.]

barchart(Median ~ Month, data = OzoneMonthlySummary,
         xlab="Month",
         ylab="Median Value of Daily Max Ozone on a Given Month (ppm)",
         main="2011 BATON ROUGE/CAPITOL",
         panel=function(x, y, ...) {
           panel.barchart(x, y, ...)
           ltext(x=x, y=y+0.001, labels=y)
         }
)

135

Bar Chart (v.3)

[Figure: horizontal bar chart titled "2011 BATON ROUGE/CAPITOL" showing the median daily max ozone (ppm) for each month (January to December); each bar is labeled with its median value.]

barchart(Month ~ Median, data = OzoneMonthlySummary,
         xlab="Median Value of Daily Max Ozone on a Given Month (ppm)",
         xlim=c(0,0.08),
         ylab="Month",
         main="2011 BATON ROUGE/CAPITOL",
         panel=function(x, y, ...) {
           panel.barchart(x, y, ...)
           ltext(x=x+0.003, y=y, labels=x)
         }
)

136

Splom Plots

[Figure: scatter plot matrix ("Scatter Plot Matrix") of Ozone, Temp and RelativeHumidity.]

splom( ~ cbind(Ozone, Temp, RelativeHumidity))

137

Cloud Plot

[Figure: 3-D cloud plot of Ozone against Temp and RelativeHumidity.]

cloud(Ozone ~ Temp*RelativeHumidity)

138

Conditional Cloud Plot

[Figure: conditional 3-D cloud plots of Ozone against Temp and RelHum, one panel per month (January to December).]

RelHum <- RelativeHumidity

cloud(Ozone ~ Temp*RelHum | Month, as.table=TRUE, panel.aspect=0.8)

139

Enhancing Lattice Graphs

Learning Goal: Learn how to enhance advanced graphs using the R lattice package.

140

Enhancing Lattice Graphs

Lattice graphs can be enhanced by:

• Adding basic features (e.g., titles, x-axis and y-axis labels, colors)

• Using panel functions

• Using additional lattice graphics via the latticeExtra package

141

Basic Enhancement of Lattice Graphs

histogram( ~ Ozone,
           xlab = "Daily Max Ozone (ppm)",
           main = "2011 BATON ROUGE/CAPITOL",
           col = "lightblue")

[Figure: histogram with light blue bars titled "2011 BATON ROUGE/CAPITOL"; x-axis: Daily Max Ozone (ppm), 0.00 to 0.15; y-axis: Percent of Total.]

142

Panel Enhancements of Lattice Graphs

histogram( ~ Ozone,
           xlab = "Ozone",
           type = "density",
           panel = function(x, ...) {
             panel.histogram(x, ...)
             panel.mathdensity(dmath = dnorm, col = "black",
                               args = list(mean=mean(x), sd=sd(x)))
           }
)

[Figure: density histogram of Ozone with an overlaid normal density curve; x-axis: Ozone, 0.00 to 0.15; y-axis: Density.]

143

Panel Enhancements of Lattice Graphs

mypanel <- function(x,y){
  panel.xyplot(x,y)
  panel.lmline(x,y, col="red", lty=2)
  panel.loess(x,y, col="blue")
}

xyplot(Ozone ~ Date, panel=mypanel)

[Figure: scatterplot of Ozone versus Date (Jan to Jan) with a dashed red regression line and a blue loess curve.]
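A panel function like mypanel can be tried on the airquality data set that ships with R. The sketch below also adds "..." to the panel function's arguments, which is an optional variation (not in the original slide) that lets extra graphical settings such as pch pass through to panel.xyplot().

```r
require(lattice)
aq <- na.omit(airquality)   # complete cases only

mypanel <- function(x, y, ...) {
  panel.xyplot(x, y, ...)                     # the points
  panel.lmline(x, y, col = "red", lty = 2)    # least-squares line
  panel.loess(x, y, col = "blue")             # loess smooth
}

# pch = 16 is forwarded to panel.xyplot() through "..."
xyplot(Ozone ~ Temp, data = aq, panel = mypanel, pch = 16)
```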

144

The latticeExtra Package

The latticeExtra package extends the lattice package and has its own dedicated website: http://latticeextra.r-forge.r-project.org/

The latticeExtra package can be installed in R with the command:

install.packages("latticeExtra")

Once installed in R, the latticeExtra package can be attached to the current R session with the command:

require(latticeExtra)

145

latticeExtra: marginal.plot()

require(latticeExtra)

air$Month <- Month

air1 <- subset(air, Month=="January" | Month=="February" | Month=="March")

air2 <- air1[ , c("Month","Ozone","Temp","RelativeHumidity","SolarRadiation")]

air2$Month <- factor(air2$Month, levels=c("January","February","March"))

marginal.plot(air2[ , -1])

marginal.plot(air2[ , -1], groups=air2$Month, auto.key=list(lines=TRUE))

[Figure: marginal distributions of Ozone, Temp, RelativeHumidity and SolarRadiation, grouped by month (January, February, March).]

General Syntax:

marginal.plot( ~ Y)

marginal.plot( ~ Y, groups = G, auto.key=list(lines=TRUE))

146

latticeExtra: xyplot()

xyplot(Ozone ~ Date,
       xlab="Year 2011",
       ylab="Daily Max Ozone (ppm)",
       main="BATON ROUGE/CAPITOL",
       panel = function(...) {
         panel.xyplot(...)
         panel.smoother(..., span = 0.9)
       }
)

General Syntax:

xyplot(Y ~ X,
       xlab="x-axis label",
       ylab="y-axis label",
       main="Main Title",
       panel = function(...) {
         panel.xyplot(...)
         panel.smoother(..., span = 0.5)
       }
)

147

latticeExtra: ecdfplot()

require(latticeExtra)

ecdfplot(~Ozone,

xlab="Daily Max Ozone (ppm)",

main="Empirical Cumulative Distribution Plot")

ecdfplot(~Ozone | Month,

xlab="Daily Max Ozone (ppm)",

main="Empirical Cumulative Distribution Plot")

[Figure: two plots titled "Empirical Cumulative Distribution Plot" (x-axis: Daily Max Ozone (ppm); y-axis: Empirical CDF, 0.0 to 1.0), one for all observations and one with a separate panel per month (January to December).]

148

Exercise on Lattice Graphics

For this exercise, please refer to the variables in the ozone data frame. Before starting the exercise, create a month variable using the R commands provided below.

month <- months(date)

month <- factor(month, levels=unique(month))

Remember to attach the lattice package to your current R session with the command:

require(lattice)
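The reason for passing levels=unique(month) is that factor() otherwise sorts its levels alphabetically, which would scramble the order of the monthly panels. A small sketch with made-up dates (the values below are hypothetical, and months() returns names in the current locale's language):

```r
# Four hypothetical dates, already sorted in time order
date  <- as.Date(c("2011-01-10", "2011-01-25", "2011-02-20", "2011-03-15"))
month <- months(date)

levels(factor(month))                          # alphabetical order
levels(factor(month, levels = unique(month)))  # order of first appearance
```

With levels=unique(month) the levels follow the order in which the months first appear, so this matches calendar order only when the data are sorted by date.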

149

Exercise on Lattice Graphics

Use the functions in the lattice package to create the following graphs.

1) Create a histogram of maxO3.

2) Create conditional histograms of maxO3 for each month.

3) Create a density plot of maxO3.

4) Create conditional density plots of maxO3 for each month.

5) Create a boxplot of maxO3.

6) Create side-by-side boxplots of maxO3 for each month.

150

Exercise on Lattice Graphics

7) Create a scatter plot of maxO3 vs. T12 and enhance it by adding a title and axes labels.

8) Create conditional scatter plots of maxO3 vs. T12 given month.

9) Create a time series plot of maxO3.

10) Create a separate time series plot of maxO3 for each month.

11) Create a splom plot of the variables maxO3, T12 and Wx12.

12) Create a cloud plot visualizing the dependency of maxO3 on T12 and Wx12.

151

R Graphics Housekeeping

Learning Goal: Learn how to save the graphs you produce in R.

152

Exporting R Graphics

R Graphics can be exported in a variety of formats:

• metafile (.wmf)

• postscript (.ps)

• pdf (.pdf)

• png (.png)

• bmp (.bmp)

• TIFF (.tiff)

• JPEG (.jpeg)

153

Exporting R Graphics

Graphics can be exported from R in one of three ways:

1. Using the R Gui Menu: File > Save as.

2. Using Ctrl + C or Ctrl + W to copy the graph from the R Graphics window and Ctrl + V to insert the graph in a Word, Excel or PowerPoint document.

3. Using the R command line.

154

Exporting R Graphics

To save graphs from the R command line, use any of the following R functions:

win.metafile() (Windows only)

postscript()

pdf()

png()

bmp()

tiff()

jpeg()

Each of these functions will re-direct graphical output from the R graphics window to a file of the corresponding type (e.g., pdf). Calling dev.off() after constructing the graph is required: it closes the file and completes the export.

Accessing Help Files for R Graphics Export Functions:

?win.metafile ?postscript ?pdf ?png ?bmp ?tiff ?jpeg
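All of the export functions listed above follow the same three-step pattern: open the file device, draw, close the device. A minimal sketch with pdf(); the output location (R's temporary directory) and the page size are illustrative assumptions.

```r
out <- file.path(tempdir(), "graph.pdf")  # hypothetical output location

pdf(out, width = 7, height = 5)           # 1. open the file device (inches)
hist(rnorm(100))                          # 2. draw the graph as usual
dev.off()                                 # 3. close the device

file.exists(out)                          # TRUE once the file is written
```

Between the call that opens the device and dev.off(), nothing appears in the R graphics window because all output goes to the file.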

155

R Graphics Housekeeping

To create a windows metafile from the R

command line, use:

win.metafile("graph.wmf")

hist(rnorm(100))

dev.off()

156

R Graphics Housekeeping

To create a postscript file from the R

command line, use:

postscript("graph.ps")

hist(rnorm(100))

dev.off()

Postscript files can be viewed with GPL Ghostscript

(http://www.cs.wisc.edu/~ghost/).

157

R Graphics Housekeeping

To create a pdf file from the R command line,

use:

pdf("graph.pdf")

hist(rnorm(100))

dev.off()

158

R Graphics Housekeeping

To create a png file from the R command line,

use:

png("graph.png")

hist(rnorm(100))

dev.off()

159

R Graphics Housekeeping

To create a bmp file from the R command line,

use:

bmp("graph.bmp")

hist(rnorm(100))

dev.off()

160

R Graphics Housekeeping

To create a tiff file from the R command line,

use:

tiff("graph.tiff")

hist(rnorm(100))

dev.off()

161

R Graphics Housekeeping

To create a jpeg file from the R command line,

use:

jpeg("graph.jpeg")

hist(rnorm(100))

dev.off()
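The same open, draw, close pattern applies to lattice graphs, with one caveat: a lattice function returns an object that is only drawn when it is printed. At the top level of the console this happens automatically, but inside a loop or a function an explicit print() is needed. A sketch using the airquality data set that ships with R; the file name is a hypothetical choice.

```r
require(lattice)

out <- file.path(tempdir(), "lattice-graphs.pdf")  # hypothetical file name
pdf(out)                                           # one page per graph
for (v in c("Temp", "Wind")) {
  # inside a loop nothing is auto-printed, so print() draws the graph
  print(histogram(~ airquality[[v]], xlab = v))
}
dev.off()
```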

162

Exercise on R Graphics Housekeeping

With reference to the ozone data frame, create a histogram of the variable maxO3 using either the hist() function in the graphics package or the histogram() function in the lattice package.

1) Save this histogram as a pdf file using the R Gui menu commands File > Save as.

2) Save this histogram as a pdf file using the command line approach facilitated by the pdf() function.

163

Summary

164

Summary

R provides 4 different graphical systems for producing elegant, publication-quality graphics: graphics, grid, lattice, ggplot2.

In this course, we explored in more detail some of the functionality available in the graphics and lattice packages. The lattice package relies heavily on the grid package.

Once you get comfortable with the graphics and lattice packages, you can start exploring the ggplot2 package.

The ggplot2 package purports to combine the best features of the graphics and lattice packages, but has a completely different, more abstract syntax.

165

References on Graphics in R

166

References on Graphics in R

Books:

• “Graphics for Statistics and Data Analysis with R”, by Kevin J. Keen (CRC Press, 2010)

• “Lattice: Multivariate Data Visualization with R”, by Deepayan Sarkar (Springer, 2008)

• “ggplot2: Elegant Graphics for Data Analysis”, by Hadley Wickham (Springer-Verlag, 2009)

• “R Graphics”, 2nd Edition, by Paul Murrell (Chapman & Hall/CRC, 2011)

167

References on Graphics in R

Websites:

Website Address Website Description

http://www.r-project.org R Project

http://www.statmethods.net Quick-R

http://www.r-bloggers.com R Bloggers

http://lmdvr.r-forge.r-project.org Lattice Website

http://ggplot2.org Ggplot2 Website

http://www.stat.auckland.ac.nz/~paul/grid/grid.html Grid Website

http://gallery.r-enthusiasts.com R Graph Gallery

168

Thank you

Thank you very much for attending this course. If you have any questions related to the content of this course, please contact Dr. Isabella Ghement at the following address:

Dr. Isabella R. Ghement

Ghement Statistical Consulting Company Ltd.

301-7031 Blundell Road

Richmond, B.C.

Canada, V6Y 1J5

Tel: 604-767-1250

Fax: 604-270-3922

E-Mail: isabella@ghement.ca

169