STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the...
Transcript of STA258H5 - University of Torontonosedal/sta258/sta258-lec01-02.pdf · instructions given by the...
STA258H5
Al Nosedaland Alison Weir
Winter 2017
Al Nosedal and Alison Weir STA258H5 Winter 2017 1 / 88
INTRODUCING R AND ASSESSING NORMALITY
Al Nosedal and Alison Weir STA258H5 Winter 2017 2 / 88
History of R
S: language for data analysis developed at Bell Labs circa 1976
Licensed by AT&T/Lucent to Insightful Corp. Product name: S-plus.
R: initially written and released as an open source software by RossIhaka and Robert Gentleman at U Auckland during 90s (R plays onname S)
Since 1997: international R-core team 1̃5 people & 1000s of codewriters and statisticians happy to share their libraries! AWESOME!
Al Nosedal and Alison Weir STA258H5 Winter 2017 3 / 88
But it’s open source...
Doesn’t that mean it’s lousy? NO!!
Provides hundreds of thousands of functions and algorithms
Lets users fix bugs and add functionality
It is CUTTING EDGE, statistically and every other way.
Ensures that researchers around the world - not just ones in richcountries - are the co-owners of the software tools needed to carry outresearch
Most of R is written in? R! This makes it quite easy to see whatfunctions are actually doing.
Al Nosedal and Alison Weir STA258H5 Winter 2017 4 / 88
What exactly is R?
R is used for data manipulation, statistics, and graphics.It is made of:
operators (+− < − ?) for calculations on vectors, arrays and matrices
a huge collection of functions
facilities for making unlimited types quality graphs
user contributed packages (sets of related functions)
the ability to interface with procedures written in C, C+, orFORTRAN and to write additional primitives.
Al Nosedal and Alison Weir STA258H5 Winter 2017 5 / 88
Advantages of R
Fast and free.
State of the art statistically
Only MATLAB has better graphics.
Lots of other programs use R (like Google).
Active user community
Excellent for simulation, programming, analysis ?
Forces you to think about your analysis.
Easy interface with database storage software
Al Nosedal and Alison Weir STA258H5 Winter 2017 6 / 88
Disadvantages of R
Not user friendly, minimal GUI.
Steep initial learning curve
No commercial support
Easy to make mistakes and not know.
Data prep and cleaning can be messy
Al Nosedal and Alison Weir STA258H5 Winter 2017 7 / 88
The R GUI
Al Nosedal and Alison Weir STA258H5 Winter 2017 8 / 88
Openning a script in R
Al Nosedal and Alison Weir STA258H5 Winter 2017 9 / 88
Homework
This course uses R. R is an open-source computing package which hasseen a huge growth in popularity in the last few years. R can bedownloaded from https://cran.r-project.org
Please, download R and bring your laptop next time.
Al Nosedal and Alison Weir STA258H5 Winter 2017 10 / 88
What is RStudio?
RStudio is a relatively new editor specially targeted at R. RStudio iscross-platform, free and open-source software.
Al Nosedal and Alison Weir STA258H5 Winter 2017 11 / 88
More homework: Obtaining RStudio
Just go to:
http://www.rstudio.com
download the corresponding file, execute it locally and follow theinstructions given by the installer.
Al Nosedal and Alison Weir STA258H5 Winter 2017 12 / 88
R Studio
Al Nosedal and Alison Weir STA258H5 Winter 2017 13 / 88
Getting your data into R Studio
1 Save data file as a something .csv file.
2 Click on Import Dataset in top right window
Al Nosedal and Alison Weir STA258H5 Winter 2017 14 / 88
Working in R Studio
Enter your commands in the console windowArithmetic Operations
2+3;
## [1] 5
3-2;
## [1] 1
3*2;
## [1] 6
Al Nosedal and Alison Weir STA258H5 Winter 2017 15 / 88
Working in R Studio
Enter your commands in the console windowArithmetic Operations
3/2;
## [1] 1.5
3^2;
## [1] 9
Al Nosedal and Alison Weir STA258H5 Winter 2017 16 / 88
Working in R Studio
Enter your commands in the console windowAssignment
To assign a value to a variable use < −Example:
a<-19;
Al Nosedal and Alison Weir STA258H5 Winter 2017 17 / 88
How to use help in R?
If you know which function you want help with simply use help. Example:
help(histogram);
Al Nosedal and Alison Weir STA258H5 Winter 2017 18 / 88
For now: R-Fiddle
R-Fiddle is a programming environment for R available online. It allows usto encode and to run a program written in R. The tool is available at thisURL: http://www.r-fiddle.org
Al Nosedal and Alison Weir STA258H5 Winter 2017 19 / 88
Old Faithful Geyser Data
DescriptionWaiting time between eruptions and the duration of the eruption for theOld Faithful geyser in Yellowstone National Park, Wyoming, USA.(A data frame with 272 observations on 2 variables.)
Al Nosedal and Alison Weir STA258H5 Winter 2017 20 / 88
R Fiddle
Al Nosedal and Alison Weir STA258H5 Winter 2017 21 / 88
R Fiddle
Al Nosedal and Alison Weir STA258H5 Winter 2017 22 / 88
R Fiddle
Al Nosedal and Alison Weir STA258H5 Winter 2017 23 / 88
R Fiddle
Al Nosedal and Alison Weir STA258H5 Winter 2017 24 / 88
Histogram
## Basic plot.
hist(faithful$eruptions);
Al Nosedal and Alison Weir STA258H5 Winter 2017 25 / 88
Histogram
Histogram of faithful$eruptions
faithful$eruptions
Fre
quen
cy
2 3 4 5
020
4060
Al Nosedal and Alison Weir STA258H5 Winter 2017 26 / 88
Histogram (with title)
## Nicer plot.
hist(faithful$eruptions,
main="Duration of Old Faithful Eruptions (min)");
Al Nosedal and Alison Weir STA258H5 Winter 2017 27 / 88
Histogram (with title)
Duration of Old Faithful Eruptions (min)
faithful$eruptions
Fre
quen
cy
2 3 4 5
020
4060
Al Nosedal and Alison Weir STA258H5 Winter 2017 28 / 88
Histogram
## Add axes labels and color.
hist(faithful$eruptions,
main="Duration of Old Faithful Eruptions (min)",
xlab="duration",ylab="count", col="blue");
Al Nosedal and Alison Weir STA258H5 Winter 2017 29 / 88
Histogram
Duration of Old Faithful Eruptions (min)
duration
coun
t
2 3 4 5
020
4060
Al Nosedal and Alison Weir STA258H5 Winter 2017 30 / 88
Boxplot
## Basic plot.
boxplot(faithful$eruptions);
Al Nosedal and Alison Weir STA258H5 Winter 2017 31 / 88
Boxplot
1.5
2.5
3.5
4.5
Al Nosedal and Alison Weir STA258H5 Winter 2017 32 / 88
Boxplot
## Add axes labels and color.
boxplot(faithful$eruptions,
main="Duration of Old Faithful Eruptions (min)",
xlab="duration",ylab="count", col="blue");
Al Nosedal and Alison Weir STA258H5 Winter 2017 33 / 88
Boxplot
1.5
2.5
3.5
4.5
Duration of Old Faithful Eruptions (min)
duration
coun
t
Al Nosedal and Alison Weir STA258H5 Winter 2017 34 / 88
Scatterplot
## Basic plot.
plot(faithful$eruptions,faithful$waiting);
Al Nosedal and Alison Weir STA258H5 Winter 2017 35 / 88
Scatterplot
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●● ●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
1.5 2.5 3.5 4.5
5070
90
faithful$eruptions
faith
ful$
wai
ting
Al Nosedal and Alison Weir STA258H5 Winter 2017 36 / 88
Scatterplot
## Nicer plot.
plot(faithful$eruptions,faithful$waiting,
main="Eruption Duration vs Waiting Times (mins)",
xlab="duration",ylab="waiting time",
pch=19, col="blue");
Al Nosedal and Alison Weir STA258H5 Winter 2017 37 / 88
Scatterplot
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●● ●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
5070
90Eruption Duration vs Waiting Times (mins)
duration
wai
ting
time
Al Nosedal and Alison Weir STA258H5 Winter 2017 38 / 88
Making a panel of graphs
If you want more than one graph in a panel.
par(mfrow=c(nrow,ncol ) )
# where nrow= number of rows
# and ncol=number of columns;
Al Nosedal and Alison Weir STA258H5 Winter 2017 39 / 88
Panel of graphs
par(mfrow=c(1,2) )
hist(faithful$eruptions,
main="Duration (min)",
xlab="duration",ylab="count", col="blue");
hist(faithful$waiting,
main="Waiting (min)",
xlab="waiting time",ylab="count", col="red");
Al Nosedal and Alison Weir STA258H5 Winter 2017 40 / 88
Panel of graphs
Duration (min)
duration
coun
t
2 3 4 5
020
4060
Waiting (min)
waiting time
coun
t
40 70 100
010
3050
Al Nosedal and Alison Weir STA258H5 Winter 2017 41 / 88
Panel of graphs
prototype<-rnorm(1000,mean=0,sd=1);
par(mfrow=c(2,2) )
hist(prototype,
main="prototype",
col="orange");
hist(faithful$eruptions,
main="Duration (min)",
xlab="duration",ylab="count", col="blue");
hist(faithful$waiting,
main="Waiting (min)",
xlab="waiting time",ylab="count", col="red");
Al Nosedal and Alison Weir STA258H5 Winter 2017 42 / 88
Panel of graphs
prototype
prototype
Fre
quen
cy
−2 0 2 4
0
Duration (min)
duration
coun
t
2 3 4 5
0
Waiting (min)
waiting time
coun
t
40 60 80 100
0
Al Nosedal and Alison Weir STA258H5 Winter 2017 43 / 88
Problem
How much do people with a bachelor’s degree (but no higher degree)earn? Here are the incomes of 15 such people, chosen at random by theCensus Bureau in March 2002 and asked how much they earned in 2001.Most people reported their incomes to the nearest thousand dollars, so wehave rounded their responses to thousands of dollars: 110 25 50 50 55 3035 30 4 32 50 30 32 74 60.How could we find the ”typical” income for people with a bachelor’sdegree (but no higher degree)?
Al Nosedal and Alison Weir STA258H5 Winter 2017 44 / 88
Measuring center: the median
The median M is the midpoint of a distribution, the number such thathalf the observations are smaller and the other half are larger. To find themedian of the distribution:Arrange all observations in order of size, from smallest to largest.If the number of observations n is odd, the median M is the centerobservation in the ordered list. Find the location of the median bycounting n+1
2 observations up from the bottom of the list.If the number of observations n is even, the median M is the mean of thetwo center observations in the ordered list. Find the location of themedian by counting n+1
2 observations up from the bottom of the list.
Al Nosedal and Alison Weir STA258H5 Winter 2017 45 / 88
Income Problem (Median)
We know that if we want to find the median, M, we have to order ourobservations from smallest to largest: 4 25 30 30 30 32 32 35 50 50 50 5560 74 110. Let’s find the location of Mlocation of M = n+1
2 = 15+12 = 8
Therefore, M = x8 = 35 (x8= 8th observation on our ordered list).
Al Nosedal and Alison Weir STA258H5 Winter 2017 46 / 88
The quartiles Q1 and Q3
To calculate the quartiles:Arrange the observations in increasing order and locate the median M inthe ordered list of observations.The first quartile Q1 is the median of the observations whose position inthe ordered list is to the left of the location of the overall median.The third quartile Q3 is the median of the observations whose position inthe ordered list is to the right of the location of the overall median.
Al Nosedal and Alison Weir STA258H5 Winter 2017 47 / 88
Income Problem (Q1)
Data:4 25 30 30 30 32 32 35 50 50 50 55 60 74 110.From previous work, we know that M = x8 = 35.This implies that the first half of our data has n1 = 7 observations. Let usfind the location of Q1:location of Q1 = n1+1
2 = 7+12 = 4.
This means that Q1 = x4 = 30.
Al Nosedal and Alison Weir STA258H5 Winter 2017 48 / 88
Income Problem (Q3)
Data:4 25 30 30 30 32 32 35 50 50 50 55 60 74 110.From previous work, we know that M = x8 = 35.This implies that the first half of our data has n2 = 7 observations. Let usfind the location of Q3:location of Q3 = n2+1
2 = 7+12 = 4.
This means that Q3 = 55.
Al Nosedal and Alison Weir STA258H5 Winter 2017 49 / 88
Five-number summary
The five-number summary of a distribution consists of the smallestobservation, the first quartile, the median, the third quartile, and thelargest observation, written in order from smallest to largest. In symbols,the five-number summary ismin Q1 M Q3 MAX .
Al Nosedal and Alison Weir STA258H5 Winter 2017 50 / 88
Income Problem (five-number summary)
Data: 4 25 30 30 30 32 32 35 50 50 50 55 60 74 110. The five-numbersummary for our income problem is given by:4 30 35 55 110
Al Nosedal and Alison Weir STA258H5 Winter 2017 51 / 88
R Code
# Step 1. Entering Data;
income=c(4,25,30,30,30,32,32,35,50,50,50,55,60,74,110);
Al Nosedal and Alison Weir STA258H5 Winter 2017 52 / 88
R Code
# Step 2. Finding five-number summary;
fivenum(income);
Al Nosedal and Alison Weir STA258H5 Winter 2017 53 / 88
R Code
## [1] 4.0 30.0 35.0 52.5 110.0
Note. Sometimes, R will give you a slightly different five-number summary.
Al Nosedal and Alison Weir STA258H5 Winter 2017 54 / 88
Box plot
A boxplot is a graph of the five-number summary.A central box spans the quartiles Q1 and Q3.A line in the box marks the median M.Lines extended from the box out to the smallest and largest observations.
Al Nosedal and Alison Weir STA258H5 Winter 2017 55 / 88
Boxplot
par(mfrow=c(1,3) )
boxplot(prototype,
main="prototype",
col="orange");
boxplot(faithful$eruptions,
main="eruption duration ",
col="blue");
boxplot(faithful$waiting,
main="time between eruptions",
col="red");
Al Nosedal and Alison Weir STA258H5 Winter 2017 56 / 88
Boxplot
●
●
●
●
●
●
●
●
●
●
●
●●
●
−2
02
4
prototype
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
eruption duration
5060
7080
90
time between eruptions
Al Nosedal and Alison Weir STA258H5 Winter 2017 57 / 88
Boxplot
●
●
●
●
●
●
●
●●●
●
●●
●
−2
02
4
prototype
Should be:
Symmetric
≈ 5% outliers
Tails ≈ 1.5 IQR
Al Nosedal and Alison Weir STA258H5 Winter 2017 58 / 88
The 68-95-99.7 rule
In the Normal distribution with mean µ and standard deviation σ:Approximately 68% of the observations fall within σ of the mean µ.Approximately 95% of the observations fall within 2σ of µ.Approximately 99.7% of the observations fall within 3σ of µ.Note. The 68-95-99.7 rule is also know as the empirical rule.
Al Nosedal and Alison Weir STA258H5 Winter 2017 59 / 88
Example N(µ = 0, σ = 1)
−3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
Roughly 68%
Al Nosedal and Alison Weir STA258H5 Winter 2017 60 / 88
Example N(µ = 0, σ = 1)
−3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
Roughly 95%
Al Nosedal and Alison Weir STA258H5 Winter 2017 61 / 88
R Code
meanP<-mean(prototype);
sdP<-sqrt(var(prototype));
lower<-meanP-sdP;
upper<-meanP+sdP;
N<-length(prototype);
100*length(prototype[ lower<prototype & prototype<upper])/N;
## [1] 69.8
Al Nosedal and Alison Weir STA258H5 Winter 2017 62 / 88
R Code
meanP<-mean(prototype);
sdP<-sqrt(var(prototype));
lower<-meanP-2*sdP;
upper<-meanP+2*sdP;
N<-length(prototype);
100*length(prototype[ lower<prototype & prototype<upper])/N;
## [1] 94.9
Al Nosedal and Alison Weir STA258H5 Winter 2017 63 / 88
R Code
meanP<-mean(prototype);
sdP<-sqrt(var(prototype));
lower<-meanP-3*sdP;
upper<-meanP+3*sdP;
N<-length(prototype);
100*length(prototype[ lower<prototype & prototype<upper])/N;
## [1] 99.5
Al Nosedal and Alison Weir STA258H5 Winter 2017 64 / 88
R Code
eruptions<-faithful$eruptions;
meanE<-mean(eruptions);
sdE<-sqrt(var(eruptions));
lower<-meanE-sdE;
upper<-meanE+sdE;
N<-length(eruptions);
100*length(eruptions[ lower<eruptions & eruptions<upper])/N;
## [1] 55.14706
Al Nosedal and Alison Weir STA258H5 Winter 2017 65 / 88
R Code
eruptions<-faithful$eruptions;
meanE<-mean(eruptions);
sdE<-sqrt(var(eruptions));
lower<-meanE-2*sdE;
upper<-meanE+2*sdE;
N<-length(eruptions);
100*length(eruptions[ lower<eruptions & eruptions<upper])/N;
## [1] 100
Al Nosedal and Alison Weir STA258H5 Winter 2017 66 / 88
R Code
eruptions<-faithful$eruptions;
meanE<-mean(eruptions);
sdE<-sqrt(var(eruptions));
lower<-meanE-3*sdE;
upper<-meanE+3*sdE;
N<-length(eruptions);
100*length(eruptions[ lower<eruptions & eruptions<upper])/N;
## [1] 100
Al Nosedal and Alison Weir STA258H5 Winter 2017 67 / 88
Homework?
Modify R Code given above and see what happens with waiting time.
Al Nosedal and Alison Weir STA258H5 Winter 2017 68 / 88
Q-Q Plot (Example)
A sample of n = 10 observations gives the values in the following table:
Ordered Observations Probability levels Standard Normalx(j) (j − 1/2)/n Quantiles q(j)-1 0.05 -1.645
-0.10 0.15 -1.0360.16 0.25 -0.6740.41 0.35 -0.3850.62 0.45 -0.1250.80 0.55 0.1251.26 0.65 0.3851.54 0.75 0.6741.71 0.85 1.0362.30 0.95 1.645
Here, for example, P[Z ≤ 0.385] =∫ 0.385−∞
1√2πe−z
2/2dz = 0.65.
Al Nosedal and Alison Weir STA258H5 Winter 2017 69 / 88
Q-Q Plot (Example)
Let us now construct the Q-Q plot and comment on its appearance. TheQ-Q plot for the foregoing data, which is a plot of the ordered data x(j)against the normal quantiles is shown below. The pairs of points(q(j), x(j)) lie very nearly along a straight line, and we would not reject thenotion that these data are Normally distributed-particularly with a samplesize as small as n = 10.
Al Nosedal and Alison Weir STA258H5 Winter 2017 70 / 88
R Code
## Ordered observations;
obs<-c(-1,-0.1,0.16,0.41,0.62,0.80,1.26,1.54,1.71,2.30);
n<-length(obs);
## Corresponding probability values;
prob.levels<-(seq(1:n)-0.5)/n;
## Standard Normal Quantiles;
norm.quantiles<-qnorm(prob.levels);
Al Nosedal and Alison Weir STA258H5 Winter 2017 71 / 88
R Code
## Q-Q plot;
plot(norm.quantiles,obs,
xlab=expression(q[(j)]),
ylab=expression(x[(j)]),
main="Ours",col="blue",pch=19);
## Q-Q plot (using R function);
qqnorm(obs,col="blue",pch=19);
Al Nosedal and Alison Weir STA258H5 Winter 2017 72 / 88
R Code
●
●●
●●
●
●
●●
●
−1.5 −0.5 0.0 0.5 1.0 1.5
−1.
00.
01.
02.
0Ours
q(j)
x (j)
Al Nosedal and Alison Weir STA258H5 Winter 2017 73 / 88
R Code
## Q-Q plot (using R function);
qqnorm(obs,col="blue",pch=19);
Al Nosedal and Alison Weir STA258H5 Winter 2017 74 / 88
R Code
●
●●
●●
●
●
●●
●
−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5
−1.
00.
01.
02.
0Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 75 / 88
Q-Q plots
●
●
●●
●●
●
●
●●
●
●
●
●●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●●
●
●●●●
●
●
●
●
●
●●●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●●
●●
●
●
●●
●●
●
●●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
● ●●●●
●●
●
●
●●
●
●
●
●●
●
●●
●
●
●●
●●●
●
●●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●●
●●
●
●
●
●
●
●
● ●
●
●●
●
●●
●
●
●
●●●
●
●●
●
●
●●
●●●
●
●●
●
●
●
●
●
●●
●●
●
●●
●●
●●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●●
●●
●●●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●●
●●
●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●●●●
●
●
●
●●●
●●
●
●
●
●
●●●
●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●●
●●
●
●
●●
●●●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●●●
●●
●●●●
●●
●
●
●
●
●
●
●●
●
●
●●
●●●●
●
●
●
●
●●
●●
●
●
●●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●●●
●●●
●
●
●
●●
●
●
●●
●
●●●
●
●
●●
●
●
●
●
●●
●●
●●●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●●●
●●
●●●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●●●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●●●
●●
●
●
●
●
●●●
●
●●●
●
●
●
●
●
●●●
●
●
●●
●
●●
●
●
●
●●
●●
●
●
●●●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●●●
●●
●
●
●●
●
●●
●●●
●●
●
●
●
●
●
●
●●
●●
●●●●
●●●●
●
●
●●●
●●
●
●
●●
●
●●
●
●●
●
●●
●
●
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●●
●●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
12
3prototype
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 76 / 88
Q-Q plots
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
1.5
2.5
3.5
4.5
Duration (min)
duration
coun
t
Al Nosedal and Alison Weir STA258H5 Winter 2017 77 / 88
Q-Q plots
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
5070
90Waiting (min)
waiting time
coun
t
Al Nosedal and Alison Weir STA258H5 Winter 2017 78 / 88
If plots are OK → data could come from a Normal distribution . . .but it could come from some other distribution. So good plots don’tprove data came from a Normal distribution
If plots are not OK → data probably does not come from a Normaldistribution, we can’t assume data is from Normal population
Al Nosedal and Alison Weir STA258H5 Winter 2017 79 / 88
Panel of graphs
n=30, N(0,1)
y1
Fre
quen
cy
−3 −1 1 2
0
n=30, N(0,1)
y2
Fre
quen
cy
−3 −1 1 3
0
n=30, N(0,1)
y3
Fre
quen
cy
−3 −1 1 2
0
n=30, N(0,1)
y4
Fre
quen
cy
−2 0 1 2
06
Al Nosedal and Alison Weir STA258H5 Winter 2017 80 / 88
Q-Q plots
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
−2 0 2
−2
−1
01
2n=30, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
−2 0 2
−2
−1
01
2
n=30, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 81 / 88
Q-Q plots
●
●
●
●
●
●
●
●
●●
●
●●
●●
●●
●
●
●
●
●
●●
●
●
●
●●
●
−2 0 2
−2
−1
01
n=30, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
−2 0 2
−2
−1
01
2
n=30, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 82 / 88
Panel of graphs
n=70, N(0,1)
y1
Fre
quen
cy
−2 0 1 2 3
0
n=70, N(0,1)
y2
Fre
quen
cy
−4 −2 0 2
0
n=70, N(0,1)
y3
Fre
quen
cy
−3 −1 1 2
0
n=70, N(0,1)
y4
Fre
quen
cy
−2 0 1 2
0
Al Nosedal and Alison Weir STA258H5 Winter 2017 83 / 88
QQ plots
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●●
●
●
●
−2 0 2
−1
01
23
n=70, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●●
●●
●
●
●
●
●●
●●
●●
●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
−2 0 2
−3
−1
12
n=70, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 84 / 88
QQ plots
●
●
●
●
●●
●
●
●●
●
●
●●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●●
●
●
●
●
●●●●
−2 0 2
−2
01
2n=70, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
−2 0 2
−1
01
2
n=70, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 85 / 88
Panel of graphs
n=120, N(0,1)
y1
Fre
quen
cy
−2 0 1 2
0
n=120, N(0,1)
y2
Fre
quen
cy
−4 −2 0 2
0
n=120, N(0,1)
y3
Fre
quen
cy
−3 −1 1 2
0
n=120, N(0,1)
y4
Fre
quen
cy
−4 −2 0 2
0
Al Nosedal and Alison Weir STA258H5 Winter 2017 86 / 88
QQ plots
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−2 0 2
−2
−1
01
n=120, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●●●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●●
●●●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
−2 0 2
−4
−2
01
2
n=120, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 87 / 88
QQ plots
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●●●
●●
●
●
●●
●●●
●●
●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●●
●
●
●
−2 0 2
−3
−1
01
2n=120, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●●●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
−2 0 2
−3
−1
12
n=120, N(0,1)
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Al Nosedal and Alison Weir STA258H5 Winter 2017 88 / 88