STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the...

88
STA258H5 Al Nosedal and Alison Weir Winter 2017 Al Nosedal and Alison Weir STA258H5 Winter 2017 1 / 88

Transcript of STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the...

Page 1: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

STA258H5

Al Nosedaland Alison Weir

Winter 2017

Al Nosedal and Alison Weir STA258H5 Winter 2017 1 / 88

Page 2: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

INTRODUCING R AND ASSESSING NORMALITY

Al Nosedal and Alison Weir STA258H5 Winter 2017 2 / 88

Page 3: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

History of R

S: language for data analysis developed at Bell Labs circa 1976

Licensed by AT&T/Lucent to Insightful Corp. Product name: S-plus.

R: initially written and released as an open source software by RossIhaka and Robert Gentleman at U Auckland during 90s (R plays onname S)

Since 1997: international R-core team 1̃5 people & 1000s of codewriters and statisticians happy to share their libraries! AWESOME!

Al Nosedal and Alison Weir STA258H5 Winter 2017 3 / 88

Page 4: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

But it’s open source...

Doesn’t that mean it’s lousy? NO!!

Provides hundreds of thousands of functions and algorithms

Lets users fix bugs and add functionality

It is CUTTING EDGE, statistically and every other way.

Ensures that researchers around the world - not just ones in richcountries - are the co-owners of the software tools needed to carry outresearch

Most of R is written in? R! This makes it quite easy to see whatfunctions are actually doing.

Al Nosedal and Alison Weir STA258H5 Winter 2017 4 / 88

Page 5: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

What exactly is R?

R is used for data manipulation, statistics, and graphics.It is made of:

operators (+− < − ?) for calculations on vectors, arrays and matrices

a huge collection of functions

facilities for making unlimited types quality graphs

user contributed packages (sets of related functions)

the ability to interface with procedures written in C, C+, orFORTRAN and to write additional primitives.

Al Nosedal and Alison Weir STA258H5 Winter 2017 5 / 88

Page 6: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Advantages of R

Fast and free.

State of the art statistically

Only MATLAB has better graphics.

Lots of other programs use R (like Google).

Active user community

Excellent for simulation, programming, analysis ?

Forces you to think about your analysis.

Easy interface with database storage software

Al Nosedal and Alison Weir STA258H5 Winter 2017 6 / 88

Page 7: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Disadvantages of R

Not user friendly, minimal GUI.

Steep initial learning curve

No commercial support

Easy to make mistakes and not know.

Data prep and cleaning can be messy

Al Nosedal and Alison Weir STA258H5 Winter 2017 7 / 88

Page 8: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

The R GUI

Al Nosedal and Alison Weir STA258H5 Winter 2017 8 / 88

Page 9: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Openning a script in R

Al Nosedal and Alison Weir STA258H5 Winter 2017 9 / 88

Page 10: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Homework

This course uses R. R is an open-source computing package which hasseen a huge growth in popularity in the last few years. R can bedownloaded from https://cran.r-project.org

Please, download R and bring your laptop next time.

Al Nosedal and Alison Weir STA258H5 Winter 2017 10 / 88

Page 11: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

What is RStudio?

RStudio is a relatively new editor specially targeted at R. RStudio iscross-platform, free and open-source software.

Al Nosedal and Alison Weir STA258H5 Winter 2017 11 / 88

Page 12: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

More homework: Obtaining RStudio

Just go to:

http://www.rstudio.com

download the corresponding file, execute it locally and follow theinstructions given by the installer.

Al Nosedal and Alison Weir STA258H5 Winter 2017 12 / 88

Page 13: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Studio

Al Nosedal and Alison Weir STA258H5 Winter 2017 13 / 88

Page 14: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Getting your data into R Studio

1 Save data file as a something .csv file.

2 Click on Import Dataset in top right window

Al Nosedal and Alison Weir STA258H5 Winter 2017 14 / 88

Page 15: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Working in R Studio

Enter your commands in the console windowArithmetic Operations

2+3;

## [1] 5

3-2;

## [1] 1

3*2;

## [1] 6

Al Nosedal and Alison Weir STA258H5 Winter 2017 15 / 88

Page 16: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Working in R Studio

Enter your commands in the console windowArithmetic Operations

3/2;

## [1] 1.5

3^2;

## [1] 9

Al Nosedal and Alison Weir STA258H5 Winter 2017 16 / 88

Page 17: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Working in R Studio

Enter your commands in the console windowAssignment

To assign a value to a variable use < −Example:

a<-19;

Al Nosedal and Alison Weir STA258H5 Winter 2017 17 / 88

Page 18: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

How to use help in R?

If you know which function you want help with simply use help. Example:

help(histogram);

Al Nosedal and Alison Weir STA258H5 Winter 2017 18 / 88

Page 19: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

For now: R-Fiddle

R-Fiddle is a programming environment for R available online. It allows usto encode and to run a program written in R. The tool is available at thisURL: http://www.r-fiddle.org

Al Nosedal and Alison Weir STA258H5 Winter 2017 19 / 88

Page 20: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Old Faithful Geyser Data

DescriptionWaiting time between eruptions and the duration of the eruption for theOld Faithful geyser in Yellowstone National Park, Wyoming, USA.(A data frame with 272 observations on 2 variables.)

Al Nosedal and Alison Weir STA258H5 Winter 2017 20 / 88

Page 21: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Fiddle

Al Nosedal and Alison Weir STA258H5 Winter 2017 21 / 88

Page 22: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Fiddle

Al Nosedal and Alison Weir STA258H5 Winter 2017 22 / 88

Page 23: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Fiddle

Al Nosedal and Alison Weir STA258H5 Winter 2017 23 / 88

Page 24: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Fiddle

Al Nosedal and Alison Weir STA258H5 Winter 2017 24 / 88

Page 25: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Histogram

## Basic plot.

hist(faithful$eruptions);

Al Nosedal and Alison Weir STA258H5 Winter 2017 25 / 88

Page 26: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Histogram

Histogram of faithful$eruptions

faithful$eruptions

Fre

quen

cy

2 3 4 5

020

4060

Al Nosedal and Alison Weir STA258H5 Winter 2017 26 / 88

Page 27: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Histogram (with title)

## Nicer plot.

hist(faithful$eruptions,

main="Duration of Old Faithful Eruptions (min)");

Al Nosedal and Alison Weir STA258H5 Winter 2017 27 / 88

Page 28: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Histogram (with title)

Duration of Old Faithful Eruptions (min)

faithful$eruptions

Fre

quen

cy

2 3 4 5

020

4060

Al Nosedal and Alison Weir STA258H5 Winter 2017 28 / 88

Page 29: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Histogram

## Add axes labels and color.

hist(faithful$eruptions,

main="Duration of Old Faithful Eruptions (min)",

xlab="duration",ylab="count", col="blue");

Al Nosedal and Alison Weir STA258H5 Winter 2017 29 / 88

Page 30: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Histogram

Duration of Old Faithful Eruptions (min)

duration

coun

t

2 3 4 5

020

4060

Al Nosedal and Alison Weir STA258H5 Winter 2017 30 / 88

Page 31: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

## Basic plot.

boxplot(faithful$eruptions);

Al Nosedal and Alison Weir STA258H5 Winter 2017 31 / 88

Page 32: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

1.5

2.5

3.5

4.5

Al Nosedal and Alison Weir STA258H5 Winter 2017 32 / 88

Page 33: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

## Add axes labels and color.

boxplot(faithful$eruptions,

main="Duration of Old Faithful Eruptions (min)",

xlab="duration",ylab="count", col="blue");

Al Nosedal and Alison Weir STA258H5 Winter 2017 33 / 88

Page 34: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

1.5

2.5

3.5

4.5

Duration of Old Faithful Eruptions (min)

duration

coun

t

Al Nosedal and Alison Weir STA258H5 Winter 2017 34 / 88

Page 35: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Scatterplot

## Basic plot.

plot(faithful$eruptions,faithful$waiting);

Al Nosedal and Alison Weir STA258H5 Winter 2017 35 / 88

Page 36: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Scatterplot

●●

●●

●● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●●● ●

●●

●●

●●

● ●

● ●

1.5 2.5 3.5 4.5

5070

90

faithful$eruptions

faith

ful$

wai

ting

Al Nosedal and Alison Weir STA258H5 Winter 2017 36 / 88

Page 37: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Scatterplot

## Nicer plot.

plot(faithful$eruptions,faithful$waiting,

main="Eruption Duration vs Waiting Times (mins)",

xlab="duration",ylab="waiting time",

pch=19, col="blue");

Al Nosedal and Alison Weir STA258H5 Winter 2017 37 / 88

Page 38: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Scatterplot

●●

●●

●● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●● ●● ●

●●

●●

●●

● ●

● ●

1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

5070

90Eruption Duration vs Waiting Times (mins)

duration

wai

ting

time

Al Nosedal and Alison Weir STA258H5 Winter 2017 38 / 88

Page 39: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Making a panel of graphs

If you want more than one graph in a panel.

par(mfrow=c(nrow,ncol ) )

# where nrow= number of rows

# and ncol=number of columns;

Al Nosedal and Alison Weir STA258H5 Winter 2017 39 / 88

Page 40: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

par(mfrow=c(1,2) )

hist(faithful$eruptions,

main="Duration (min)",

xlab="duration",ylab="count", col="blue");

hist(faithful$waiting,

main="Waiting (min)",

xlab="waiting time",ylab="count", col="red");

Al Nosedal and Alison Weir STA258H5 Winter 2017 40 / 88

Page 41: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

Duration (min)

duration

coun

t

2 3 4 5

020

4060

Waiting (min)

waiting time

coun

t

40 70 100

010

3050

Al Nosedal and Alison Weir STA258H5 Winter 2017 41 / 88

Page 42: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

prototype<-rnorm(1000,mean=0,sd=1);

par(mfrow=c(2,2) )

hist(prototype,

main="prototype",

col="orange");

hist(faithful$eruptions,

main="Duration (min)",

xlab="duration",ylab="count", col="blue");

hist(faithful$waiting,

main="Waiting (min)",

xlab="waiting time",ylab="count", col="red");

Al Nosedal and Alison Weir STA258H5 Winter 2017 42 / 88

Page 43: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

prototype

prototype

Fre

quen

cy

−2 0 2 4

0

Duration (min)

duration

coun

t

2 3 4 5

0

Waiting (min)

waiting time

coun

t

40 60 80 100

0

Al Nosedal and Alison Weir STA258H5 Winter 2017 43 / 88

Page 44: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Problem

How much do people with a bachelor’s degree (but no higher degree)earn? Here are the incomes of 15 such people, chosen at random by theCensus Bureau in March 2002 and asked how much they earned in 2001.Most people reported their incomes to the nearest thousand dollars, so wehave rounded their responses to thousands of dollars: 110 25 50 50 55 3035 30 4 32 50 30 32 74 60.How could we find the ”typical” income for people with a bachelor’sdegree (but no higher degree)?

Al Nosedal and Alison Weir STA258H5 Winter 2017 44 / 88

Page 45: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Measuring center: the median

The median M is the midpoint of a distribution, the number such thathalf the observations are smaller and the other half are larger. To find themedian of the distribution:Arrange all observations in order of size, from smallest to largest.If the number of observations n is odd, the median M is the centerobservation in the ordered list. Find the location of the median bycounting n+1

2 observations up from the bottom of the list.If the number of observations n is even, the median M is the mean of thetwo center observations in the ordered list. Find the location of themedian by counting n+1

2 observations up from the bottom of the list.

Al Nosedal and Alison Weir STA258H5 Winter 2017 45 / 88

Page 46: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Income Problem (Median)

We know that if we want to find the median, M, we have to order ourobservations from smallest to largest: 4 25 30 30 30 32 32 35 50 50 50 5560 74 110. Let’s find the location of Mlocation of M = n+1

2 = 15+12 = 8

Therefore, M = x8 = 35 (x8= 8th observation on our ordered list).

Al Nosedal and Alison Weir STA258H5 Winter 2017 46 / 88

Page 47: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

The quartiles Q1 and Q3

To calculate the quartiles:Arrange the observations in increasing order and locate the median M inthe ordered list of observations.The first quartile Q1 is the median of the observations whose position inthe ordered list is to the left of the location of the overall median.The third quartile Q3 is the median of the observations whose position inthe ordered list is to the right of the location of the overall median.

Al Nosedal and Alison Weir STA258H5 Winter 2017 47 / 88

Page 48: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Income Problem (Q1)

Data:4 25 30 30 30 32 32 35 50 50 50 55 60 74 110.From previous work, we know that M = x8 = 35.This implies that the first half of our data has n1 = 7 observations. Let usfind the location of Q1:location of Q1 = n1+1

2 = 7+12 = 4.

This means that Q1 = x4 = 30.

Al Nosedal and Alison Weir STA258H5 Winter 2017 48 / 88

Page 49: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Income Problem (Q3)

Data:4 25 30 30 30 32 32 35 50 50 50 55 60 74 110.From previous work, we know that M = x8 = 35.This implies that the first half of our data has n2 = 7 observations. Let usfind the location of Q3:location of Q3 = n2+1

2 = 7+12 = 4.

This means that Q3 = 55.

Al Nosedal and Alison Weir STA258H5 Winter 2017 49 / 88

Page 50: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Five-number summary

The five-number summary of a distribution consists of the smallestobservation, the first quartile, the median, the third quartile, and thelargest observation, written in order from smallest to largest. In symbols,the five-number summary ismin Q1 M Q3 MAX .

Al Nosedal and Alison Weir STA258H5 Winter 2017 50 / 88

Page 51: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Income Problem (five-number summary)

Data: 4 25 30 30 30 32 32 35 50 50 50 55 60 74 110. The five-numbersummary for our income problem is given by:4 30 35 55 110

Al Nosedal and Alison Weir STA258H5 Winter 2017 51 / 88

Page 52: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

# Step 1. Entering Data;

income=c(4,25,30,30,30,32,32,35,50,50,50,55,60,74,110);

Al Nosedal and Alison Weir STA258H5 Winter 2017 52 / 88

Page 53: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

# Step 2. Finding five-number summary;

fivenum(income);

Al Nosedal and Alison Weir STA258H5 Winter 2017 53 / 88

Page 54: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

## [1] 4.0 30.0 35.0 52.5 110.0

Note. Sometimes, R will give you a slightly different five-number summary.

Al Nosedal and Alison Weir STA258H5 Winter 2017 54 / 88

Page 55: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Box plot

A boxplot is a graph of the five-number summary.A central box spans the quartiles Q1 and Q3.A line in the box marks the median M.Lines extended from the box out to the smallest and largest observations.

Al Nosedal and Alison Weir STA258H5 Winter 2017 55 / 88

Page 56: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

par(mfrow=c(1,3) )

boxplot(prototype,

main="prototype",

col="orange");

boxplot(faithful$eruptions,

main="eruption duration ",

col="blue");

boxplot(faithful$waiting,

main="time between eruptions",

col="red");

Al Nosedal and Alison Weir STA258H5 Winter 2017 56 / 88

Page 57: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

●●

−2

02

4

prototype

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

eruption duration

5060

7080

90

time between eruptions

Al Nosedal and Alison Weir STA258H5 Winter 2017 57 / 88

Page 58: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Boxplot

●●●

●●

−2

02

4

prototype

Should be:

Symmetric

≈ 5% outliers

Tails ≈ 1.5 IQR

Al Nosedal and Alison Weir STA258H5 Winter 2017 58 / 88

Page 59: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

The 68-95-99.7 rule

In the Normal distribution with mean µ and standard deviation σ:Approximately 68% of the observations fall within σ of the mean µ.Approximately 95% of the observations fall within 2σ of µ.Approximately 99.7% of the observations fall within 3σ of µ.Note. The 68-95-99.7 rule is also know as the empirical rule.

Al Nosedal and Alison Weir STA258H5 Winter 2017 59 / 88

Page 60: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Example N(µ = 0, σ = 1)

−3 −2 −1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

Roughly 68%

Al Nosedal and Alison Weir STA258H5 Winter 2017 60 / 88

Page 61: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Example N(µ = 0, σ = 1)

−3 −2 −1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

Roughly 95%

Al Nosedal and Alison Weir STA258H5 Winter 2017 61 / 88

Page 62: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

meanP<-mean(prototype);

sdP<-sqrt(var(prototype));

lower<-meanP-sdP;

upper<-meanP+sdP;

N<-length(prototype);

100*length(prototype[ lower<prototype & prototype<upper])/N;

## [1] 69.8

Al Nosedal and Alison Weir STA258H5 Winter 2017 62 / 88

Page 63: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

meanP<-mean(prototype);

sdP<-sqrt(var(prototype));

lower<-meanP-2*sdP;

upper<-meanP+2*sdP;

N<-length(prototype);

100*length(prototype[ lower<prototype & prototype<upper])/N;

## [1] 94.9

Al Nosedal and Alison Weir STA258H5 Winter 2017 63 / 88

Page 64: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

meanP<-mean(prototype);

sdP<-sqrt(var(prototype));

lower<-meanP-3*sdP;

upper<-meanP+3*sdP;

N<-length(prototype);

100*length(prototype[ lower<prototype & prototype<upper])/N;

## [1] 99.5

Al Nosedal and Alison Weir STA258H5 Winter 2017 64 / 88

Page 65: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

eruptions<-faithful$eruptions;

meanE<-mean(eruptions);

sdE<-sqrt(var(eruptions));

lower<-meanE-sdE;

upper<-meanE+sdE;

N<-length(eruptions);

100*length(eruptions[ lower<eruptions & eruptions<upper])/N;

## [1] 55.14706

Al Nosedal and Alison Weir STA258H5 Winter 2017 65 / 88

Page 66: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

eruptions<-faithful$eruptions;

meanE<-mean(eruptions);

sdE<-sqrt(var(eruptions));

lower<-meanE-2*sdE;

upper<-meanE+2*sdE;

N<-length(eruptions);

100*length(eruptions[ lower<eruptions & eruptions<upper])/N;

## [1] 100

Al Nosedal and Alison Weir STA258H5 Winter 2017 66 / 88

Page 67: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

eruptions<-faithful$eruptions;

meanE<-mean(eruptions);

sdE<-sqrt(var(eruptions));

lower<-meanE-3*sdE;

upper<-meanE+3*sdE;

N<-length(eruptions);

100*length(eruptions[ lower<eruptions & eruptions<upper])/N;

## [1] 100

Al Nosedal and Alison Weir STA258H5 Winter 2017 67 / 88

Page 68: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Homework?

Modify R Code given above and see what happens with waiting time.

Al Nosedal and Alison Weir STA258H5 Winter 2017 68 / 88

Page 69: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q Plot (Example)

A sample of n = 10 observations gives the values in the following table:

Ordered Observations Probability levels Standard Normalx(j) (j − 1/2)/n Quantiles q(j)-1 0.05 -1.645

-0.10 0.15 -1.0360.16 0.25 -0.6740.41 0.35 -0.3850.62 0.45 -0.1250.80 0.55 0.1251.26 0.65 0.3851.54 0.75 0.6741.71 0.85 1.0362.30 0.95 1.645

Here, for example, P[Z ≤ 0.385] =∫ 0.385−∞

1√2πe−z

2/2dz = 0.65.

Al Nosedal and Alison Weir STA258H5 Winter 2017 69 / 88

Page 70: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q Plot (Example)

Let us now construct the Q-Q plot and comment on its appearance. TheQ-Q plot for the foregoing data, which is a plot of the ordered data x(j)against the normal quantiles is shown below. The pairs of points(q(j), x(j)) lie very nearly along a straight line, and we would not reject thenotion that these data are Normally distributed-particularly with a samplesize as small as n = 10.

Al Nosedal and Alison Weir STA258H5 Winter 2017 70 / 88

Page 71: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

## Ordered observations;

obs<-c(-1,-0.1,0.16,0.41,0.62,0.80,1.26,1.54,1.71,2.30);

n<-length(obs);

## Corresponding probability values;

prob.levels<-(seq(1:n)-0.5)/n;

## Standard Normal Quantiles;

norm.quantiles<-qnorm(prob.levels);

Al Nosedal and Alison Weir STA258H5 Winter 2017 71 / 88

Page 72: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

## Q-Q plot;

plot(norm.quantiles,obs,

xlab=expression(q[(j)]),

ylab=expression(x[(j)]),

main="Ours",col="blue",pch=19);

## Q-Q plot (using R function);

qqnorm(obs,col="blue",pch=19);

Al Nosedal and Alison Weir STA258H5 Winter 2017 72 / 88

Page 73: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

●●

●●

●●

−1.5 −0.5 0.0 0.5 1.0 1.5

−1.

00.

01.

02.

0Ours

q(j)

x (j)

Al Nosedal and Alison Weir STA258H5 Winter 2017 73 / 88

Page 74: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

## Q-Q plot (using R function);

qqnorm(obs,col="blue",pch=19);

Al Nosedal and Alison Weir STA258H5 Winter 2017 74 / 88

Page 75: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

R Code

●●

●●

●●

−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5

−1.

00.

01.

02.

0Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 75 / 88

Page 76: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q plots

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

● ●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●●

●●

●●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●●

●●

●●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

−3 −2 −1 0 1 2 3

−3

−1

12

3prototype

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 76 / 88

Page 77: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q plots

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

−3 −2 −1 0 1 2 3

1.5

2.5

3.5

4.5

Duration (min)

duration

coun

t

Al Nosedal and Alison Weir STA258H5 Winter 2017 77 / 88

Page 78: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q plots

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

−3 −2 −1 0 1 2 3

5070

90Waiting (min)

waiting time

coun

t

Al Nosedal and Alison Weir STA258H5 Winter 2017 78 / 88

Page 79: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

If plots are OK → data could come from a Normal distribution . . .but it could come from some other distribution. So good plots don’tprove data came from a Normal distribution

If plots are not OK → data probably does not come from a Normaldistribution, we can’t assume data is from Normal population

Al Nosedal and Alison Weir STA258H5 Winter 2017 79 / 88

Page 80: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

n=30, N(0,1)

y1

Fre

quen

cy

−3 −1 1 2

0

n=30, N(0,1)

y2

Fre

quen

cy

−3 −1 1 3

0

n=30, N(0,1)

y3

Fre

quen

cy

−3 −1 1 2

0

n=30, N(0,1)

y4

Fre

quen

cy

−2 0 1 2

06

Al Nosedal and Alison Weir STA258H5 Winter 2017 80 / 88

Page 81: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q plots

●●

●●

●●●

−2 0 2

−2

−1

01

2n=30, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●

●●

●●

−2 0 2

−2

−1

01

2

n=30, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 81 / 88

Page 82: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Q-Q plots

●●

●●

●●

●●

●●

●●

−2 0 2

−2

−1

01

n=30, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

● ●

●●

●●

●●

●●

−2 0 2

−2

−1

01

2

n=30, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 82 / 88

Page 83: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

n=70, N(0,1)

y1

Fre

quen

cy

−2 0 1 2 3

0

n=70, N(0,1)

y2

Fre

quen

cy

−4 −2 0 2

0

n=70, N(0,1)

y3

Fre

quen

cy

−3 −1 1 2

0

n=70, N(0,1)

y4

Fre

quen

cy

−2 0 1 2

0

Al Nosedal and Alison Weir STA258H5 Winter 2017 83 / 88

Page 84: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

QQ plots

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−2 0 2

−1

01

23

n=70, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−2 0 2

−3

−1

12

n=70, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 84 / 88

Page 85: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

QQ plots

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●●●

−2 0 2

−2

01

2n=70, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

−2 0 2

−1

01

2

n=70, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 85 / 88

Page 86: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

Panel of graphs

n=120, N(0,1)

y1

Fre

quen

cy

−2 0 1 2

0

n=120, N(0,1)

y2

Fre

quen

cy

−4 −2 0 2

0

n=120, N(0,1)

y3

Fre

quen

cy

−3 −1 1 2

0

n=120, N(0,1)

y4

Fre

quen

cy

−4 −2 0 2

0

Al Nosedal and Alison Weir STA258H5 Winter 2017 86 / 88

Page 87: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

QQ plots

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

−2 0 2

−2

−1

01

n=120, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●●

−2 0 2

−4

−2

01

2

n=120, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 87 / 88

Page 88: STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the installer. ... Working in R Studio ... 40 60 80 100 0 Al Nosedal and Alison Weir STA258H5

QQ plots

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

−2 0 2

−3

−1

01

2n=120, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

−2 0 2

−3

−1

12

n=120, N(0,1)

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Al Nosedal and Alison Weir STA258H5 Winter 2017 88 / 88