STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F...
Transcript of STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F...
![Page 1: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/1.jpg)
STA258H5
Al Nosedaland Alison Weir
Winter 2017
Al Nosedal and Alison Weir STA258H5 Winter 2017 1 / 132
![Page 2: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/2.jpg)
SIMPLE LINEAR REGRESSION
Al Nosedal and Alison Weir STA258H5 Winter 2017 2 / 132
![Page 3: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/3.jpg)
Scatterplot
A scatterplot shows the relationship between two quantitative variablesmeasured on the same individuals. The values of one variable appear onthe horizontal axis, and the values of the other variable appear on thevertical axis. Each individual in the data appears as the point in the plotfixed by the values of both variables for that individual.Always plot the explanatory variable, if there is one, on the horizontal axis(the x axis) of a scatterplot. As a reminder, we usually call the explanatoryvariable x and the response variable y . If there is no explanatory-responsedistinction, either variable can go on the horizontal axis.
Al Nosedal and Alison Weir STA258H5 Winter 2017 3 / 132
![Page 4: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/4.jpg)
Examining Scatterplot
In any graph of data, look for the overall pattern and for strikingdeviations from that pattern.You can describe the overall pattern of a scatterplot by the form,direction, and strength of the relationship.An important kind of deviation is an outlier, an individual value that fallsoutside the overall pattern of the relationship.
Al Nosedal and Alison Weir STA258H5 Winter 2017 4 / 132
![Page 5: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/5.jpg)
Positive association, negative association
Two variables are positively associated when above-average values ofone tend to accompany above-average values of the other, andbelow-average values also tend to occur together.Two variables are negatively associated when above-average values ofone tend to accompany below-average values of the other, and vice versa.
Al Nosedal and Alison Weir STA258H5 Winter 2017 5 / 132
![Page 6: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/6.jpg)
Do heavier people burn more energy?
Metabolic rate, the rate at which the body consumes energy, is importantin studies of weight gain, dieting, and exercise. We have data on the leanbody mass and resting metabolic rate for 12 women who are subjects in astudy of dieting. Lean body mass, given in kilograms, is a person’s weightleaving out all fat. Metabolic rate is measured in calories burned per 24hours.The researchers believe that lean body mass is an important influence onmetabolic rate. Make a scatterplot to examine this belief.
Al Nosedal and Alison Weir STA258H5 Winter 2017 6 / 132
![Page 7: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/7.jpg)
Do heavier people burn more energy?
The study of dieting described earlier collected data on the lean body mass(in kilograms) and metabolic rate (in calories) for both female and malesubjects.
Mass Rate Sex Mass Rate Sex
36.1 995 F 40.3 1189 F54.6 1425 F 33.1 913 F48.5 1396 F 42.4 1124 F42.0 1418 F 34.5 1052 F50.6 1502 F 51.1 1347 F42.0 1256 F 41.2 1204 F
Al Nosedal and Alison Weir STA258H5 Winter 2017 7 / 132
![Page 8: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/8.jpg)
More data
Mass Rate Sex Mass Rate Sex
51.9 1867 M 47.4 1322 M46.9 1439 M 48.7 1614 M62.0 1792 M 51.9 1460 M62.9 1666 M
Al Nosedal and Alison Weir STA258H5 Winter 2017 8 / 132
![Page 9: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/9.jpg)
a) Make a scatterplot of metabolic rate versus lean body mass for all 19subjects. Use separate symbols to distinguish women and men. (This is acommon method to compare two groups of individuals in a scatterplot)b) Does the same overall pattern hold for both women and men? What isthe most important difference between women and men?
Al Nosedal and Alison Weir STA258H5 Winter 2017 9 / 132
![Page 10: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/10.jpg)
Reading our data
# Step 1. Entering data;
# url of metabolic rate data;
meta_url = "http://www.math.unm.edu/~alvaro/metabolic2.txt"
# import data in R;
data = read.table(meta_url, header = TRUE);
Al Nosedal and Alison Weir STA258H5 Winter 2017 10 / 132
![Page 11: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/11.jpg)
Reading our data
# Step 2. Formating data;
x.min=min(data$Mass);
x.max=max(data$Mass);
y.min=min(data$Rate);
y.max=max(data$Rate);
female=data[1:12, ];
male=data[13:19, ];
Al Nosedal and Alison Weir STA258H5 Winter 2017 11 / 132
![Page 12: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/12.jpg)
Scatterplot
# Step 3. Making scatterplot;
plot(female$Mass,female$Rate,pch=19,col="red",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)",
xlim=c(x.min,x.max),ylim=c(y.min,y.max));
Al Nosedal and Alison Weir STA258H5 Winter 2017 12 / 132
![Page 13: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/13.jpg)
Scatterplot
●
●●●●
●●
●
●●
●
●
35 40 45 50 55 60
1000
1600
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
Al Nosedal and Alison Weir STA258H5 Winter 2017 13 / 132
![Page 14: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/14.jpg)
Scatterplot
# Step 3. Making scatterplot;
plot(female$Mass,female$Rate,pch=19,col="red",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)",
xlim=c(x.min,x.max),ylim=c(y.min,y.max));
points(male$Mass,male$Rate,pch=19,col='blue');
# pch=19 tells R that you want solid circles;
Al Nosedal and Alison Weir STA258H5 Winter 2017 14 / 132
![Page 15: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/15.jpg)
Scatterplot
●
●●●●
●●
●
●●
●
●
35 40 45 50 55 60
1000
1600
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
●
●
●
●
●
●
●
Al Nosedal and Alison Weir STA258H5 Winter 2017 15 / 132
![Page 16: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/16.jpg)
Scatterplot
# Step 3. Making scatterplot;
plot(female$Mass,female$Rate,pch=19,col="red",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)",
xlim=c(x.min,x.max),ylim=c(y.min,y.max));
points(male$Mass,male$Rate,pch=19,col='blue');
legend("topleft",c("female","male"),pch=c(19,19),
col=c('red','blue'),bty="n");
# legend tells R that you want to add a legend to
# your graph;
# topleft, where you want to position legend;
# bty="n" NO box around legend;
Al Nosedal and Alison Weir STA258H5 Winter 2017 16 / 132
![Page 17: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/17.jpg)
Scatterplot
●
●●●●
●●
●
●●
●
●
35 40 45 50 55 60
1000
1600
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
●
●
●
●
●
●
●
●
●
femalemale
Al Nosedal and Alison Weir STA258H5 Winter 2017 17 / 132
![Page 18: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/18.jpg)
b) Solution
For both men and women, the association is linear and positive. Thewomen’s points show a stronger association. As a group, males typicallyhave larger values for both variables (they tend to have more mass, andtend to burn more calories per day).
Al Nosedal and Alison Weir STA258H5 Winter 2017 18 / 132
![Page 19: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/19.jpg)
Correlation
The correlation measures the direction and strength of the linearrelationship between two quantitative variables. Correlation is usuallywritten as r .Suppose that we have data on variables x and y for n individuals. Thevalues for the first individual are x1 and y1, the values for the secondindividual are x2 and y2, and so on.The means and standard deviations of the two variables are x and Sx forthe x-values, and y and Sy for the y -values. The correlation r between xand y is
r =1
n − 1
n∑i=1
(xi − x
Sx
)(yi − y
Sy
)
Al Nosedal and Alison Weir STA258H5 Winter 2017 19 / 132
![Page 20: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/20.jpg)
Facts about correlation
The correlation r measures the strength and direction of the linearassociation between two quantitative variables x and y . Although youcalculate a correlation for any scatterplot, r measures only straight-linerelationships.Correlation indicates the direction of a linear relationship by its sign: r > 0for a positive association and r < 0 for a negative association. Correlationalways satisfies −1 ≤ r ≤ 1 and indicates the strength of a relationship byhow close it is to −1 or 1. Perfect correlation, r = ±1, occurs only whenthe points on a scatterplot lie exactly on a straight line.
Al Nosedal and Alison Weir STA258H5 Winter 2017 20 / 132
![Page 21: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/21.jpg)
Coral reefs
This example is about a study in which scientists examined data on meansea surface temperatures (in degrees Celsius) and mean coral growth (inmillimeters per year) over a several-year period at locations in the RedSea. Here are the data:
Sea Surface Temperature Growth
29.68 2.6329.87 2.5830.16 2.6030.22 2.4830.48 2.2630.65 2.3830.90 2.26
Al Nosedal and Alison Weir STA258H5 Winter 2017 21 / 132
![Page 22: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/22.jpg)
a) Make a scatterplot. Which is the explanatory variable?b) Find the correlation r step-by-step. Explain how your value for rmatches your graph in a).c) Enter these data into your calculator and use the correlation function tofind r (or use R to find r).
Al Nosedal and Alison Weir STA258H5 Winter 2017 22 / 132
![Page 23: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/23.jpg)
Reading our data
# Step 1. Entering data;
# url of coral rate data;
coral_url = "http://www.math.unm.edu/~alvaro/coral.txt"
# import data in R;
data = read.table(coral_url, header = TRUE);
Al Nosedal and Alison Weir STA258H5 Winter 2017 23 / 132
![Page 24: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/24.jpg)
Scatterplot
Temperature is the explanatory variable.
# Step 2. Making scatterplot;
plot(data,xlab="Sea Surface Temperature",
ylab="Coral Growth (mm)",pch=19,col="blue");
Al Nosedal and Alison Weir STA258H5 Winter 2017 24 / 132
![Page 25: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/25.jpg)
Scatterplot
●
●
●●
●
●
●
2.3 2.4 2.5 2.6
29.8
30.4
Sea Surface Temperature
Cor
al G
row
th (
mm
)
Al Nosedal and Alison Weir STA258H5 Winter 2017 25 / 132
![Page 26: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/26.jpg)
x and y
First, let’s find x and y
x =29.68 + 29.87 + 30.16 + 30.22 + 30.48 + 30.65 + 30.90
7= 30.28
y =2.63 + 2.58 + 2.60 + 2.48 + 2.26 + 2.38 + 2.26
7= 2.4557
Al Nosedal and Alison Weir STA258H5 Winter 2017 26 / 132
![Page 27: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/27.jpg)
Sx
Now, let’s find Sx and Sy
S2x =
(29.68− 30.28)2 + ...+ (30.65− 30.28)2 + (30.90− 30.28)2
6
S2x = 0.1845
Sx = 0.4296
Al Nosedal and Alison Weir STA258H5 Winter 2017 27 / 132
![Page 28: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/28.jpg)
Sy
S2y =
(2.63− 2.4557)2 + ...+ (2.38− 2.4557)2 + (2.26− 2.4557)2
6
S2y = 0.0249
Sy = 0.1578
Al Nosedal and Alison Weir STA258H5 Winter 2017 28 / 132
![Page 29: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/29.jpg)
Sample Covariance
Finally, we find the sample covariance and r
7∑i=1
xiyi = (29.68)(2.63) + ...+ (30.65)(2.38) + (30.90)(2.26)
= 520.1504
Sample covariance =
∑ni=1 xiyin − 1
− nx y
n − 1
=520.1504
6− (7)(30.28)(2.4557)
6= 86.6917− 86.7516 = −0.0599
Al Nosedal and Alison Weir STA258H5 Winter 2017 29 / 132
![Page 30: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/30.jpg)
Correlation
r =Sample Covariance
SxSy=
−0.0599
(0.4296)(0.1578)= −0.8835
This is consistent with the strong, negative association depicted in thescatterplot.c) R will give a value of r = −0.8635908.
Al Nosedal and Alison Weir STA258H5 Winter 2017 30 / 132
![Page 31: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/31.jpg)
R Code
growth=data[ ,1];
# data[ ,1] gives you the first column of data;
temp=data[ ,2];
# data[ ,2] gives you the 2nd column of data;
cov(growth, temp);
## [1] -0.05593333
cor(growth, temp);
## [1] -0.8635908
Al Nosedal and Alison Weir STA258H5 Winter 2017 31 / 132
![Page 32: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/32.jpg)
Regression Line
A regression line is a straight line that describes how a response variable ychanges as an explanatory variable x changes. We often use a regressionline to predict the value of y for a given value of x .
Al Nosedal and Alison Weir STA258H5 Winter 2017 32 / 132
![Page 33: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/33.jpg)
Equation of the Least-Squares Regression Line
We have data on an explanatory variable x and a response variable y for nindividuals. From the data, calculate the means x and y and the standarddeviations Sx and Sy of the two variables, and their correlation r . Theleast-squares regression line is the line
y = a + bx
with slope
b = rSySx
and intercept
a = y − bx
Al Nosedal and Alison Weir STA258H5 Winter 2017 33 / 132
![Page 34: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/34.jpg)
Least-Squares Regression Line
The least-squares regression line of y on x is the line that makes thesum of the squares of the vertical distances of the data points from theline as small as possible.
Al Nosedal and Alison Weir STA258H5 Winter 2017 34 / 132
![Page 35: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/35.jpg)
Do heavier people burn more energy?
We have data on the lean body mass and resting metabolic rate for 12women who are subjects in a study of dieting. Lean body mass, given inkilograms, is a person’s weight leaving out all fat. Metabolic rate, incalories burned per 24 hours, is the rate at which the body consumesenergy.
Mass Rate Mass Rate
36.1 995 40.3 118954.6 1425 33.1 91348.5 1396 42.4 112442.0 1418 34.5 105250.6 1502 51.1 134742.0 1256 41.2 1204
Al Nosedal and Alison Weir STA258H5 Winter 2017 35 / 132
![Page 36: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/36.jpg)
a) Make a scatterplot that shows how metabolic rate depends on bodymass. There is a quite strong linear relationship, with correlationr = 0.876.b) Find the least-squares regression line for predicting metabolic rate frombody mass. Add this line to your scatterplot.c) Explain in words what the slope of the regression line tells us.d) Another woman has a lean body mass of 45 kilograms. What is herpredicted metabolic rate?
Al Nosedal and Alison Weir STA258H5 Winter 2017 36 / 132
![Page 37: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/37.jpg)
Solutions
b) the regression equation is
y = 201.2 + 24.026x
where y= metabolic rate and x= body mass.c) The slope tells us that on the average, metabolic rate increases byabout 24 calories per day for each additional kilogram of body mass.d) For x = 45 kg, the predicted metabolic rate isy = 1282.4 calories per day.
Al Nosedal and Alison Weir STA258H5 Winter 2017 37 / 132
![Page 38: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/38.jpg)
Reading our data
# Step 1. Entering data;
# url of metabolic rate data;
meta_url = "http://www.math.unm.edu/~alvaro/metabolic2.txt"
# import data in R;
data = read.table(meta_url, header = TRUE);
Al Nosedal and Alison Weir STA258H5 Winter 2017 38 / 132
![Page 39: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/39.jpg)
Reading our data
# Step 2. Formating data;
female = data[1:12, ];
mass = female$Mass;
rate = female$Rate;
Al Nosedal and Alison Weir STA258H5 Winter 2017 39 / 132
![Page 40: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/40.jpg)
R code
# Step 2. Making scatterplot;
plot(mass, rate ,pch=19,col="blue",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)");
Al Nosedal and Alison Weir STA258H5 Winter 2017 40 / 132
![Page 41: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/41.jpg)
Scatterplot
●
●●●
●
●●
●
●
●
●
●
35 40 45 50 55
900
1200
1500
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
Al Nosedal and Alison Weir STA258H5 Winter 2017 41 / 132
![Page 42: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/42.jpg)
Regression Equation (R Code)
# Step 3. Finding Regression Equation;
metabolic.reg=lm(rate~mass);
Al Nosedal and Alison Weir STA258H5 Winter 2017 42 / 132
![Page 43: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/43.jpg)
a and b
metabolic.reg$coef;
## (Intercept) mass
## 201.16160 24.02607
Al Nosedal and Alison Weir STA258H5 Winter 2017 43 / 132
![Page 44: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/44.jpg)
Scatterplot with least-squares line
plot(mass,rate,
pch=19,col="blue", xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)");
abline(metabolic.reg$coef, col="red");
Al Nosedal and Alison Weir STA258H5 Winter 2017 44 / 132
![Page 45: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/45.jpg)
Scatterplot with least-squares line
●
●●●
●
●●
●
●
●
●
●
35 40 45 50 55
900
1200
1500
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
Al Nosedal and Alison Weir STA258H5 Winter 2017 45 / 132
![Page 46: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/46.jpg)
Prediction
new<-data.frame(mass=45);
predict(metabolic.reg,newdata=new);
## 1
## 1282.335
Al Nosedal and Alison Weir STA258H5 Winter 2017 46 / 132
![Page 47: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/47.jpg)
Do heavier people burn more energy?
Return to the example about lean body mass and metabolic rate. We willuse these data to illustrate influence.a) Make a scatterplot of the data that is suitable for predicting metabolicrate from body mass, with two new points added. Point A: mass 42kilograms, metabolic rate 1500 calories. Point B: mass 70 kilograms,metabolic rate 1400 calories. In which direction is each of these points anoutlier?b) Add three least-squares regression lines to your plot: for the original 12women, for the original women plus Point A, and for the original womenplus Point B. Which new point is more influential for the regression line?Explain in simple language why each new point moves the line in the wayyour graph shows.
Al Nosedal and Alison Weir STA258H5 Winter 2017 47 / 132
![Page 48: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/48.jpg)
Reading our data
# Step 1. Entering data;
# url of metabolic rate data;
meta_url = "http://www.math.unm.edu/~alvaro/metabolic.txt"
# import data in R;
data = read.table(meta_url, header = TRUE);
Al Nosedal and Alison Weir STA258H5 Winter 2017 48 / 132
![Page 49: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/49.jpg)
Scatterplot
plot(data,pch=19,col="blue",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)");
Al Nosedal and Alison Weir STA258H5 Winter 2017 49 / 132
![Page 50: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/50.jpg)
Scatterplot
●
●●●
●
●●
●
●
●
●
●
35 40 45 50 55
900
1200
1500
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
Al Nosedal and Alison Weir STA258H5 Winter 2017 50 / 132
![Page 51: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/51.jpg)
Least-Squares Regression Line
# Step 3. Finding L-S Regression Line;
mod=lm(data$Rate~data$Mass);
Al Nosedal and Alison Weir STA258H5 Winter 2017 51 / 132
![Page 52: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/52.jpg)
Scatterplot + L-S Regression Line
plot(data,pch=19,col="blue",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)");
abline(mod$coeff,col="red",lty=2);
# abline tells R to add a line to your
# scatterplot;
# lty= 2 is used to draw a dashed-line;
Al Nosedal and Alison Weir STA258H5 Winter 2017 52 / 132
![Page 53: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/53.jpg)
Scatterplot + L-S Regression Line
●
●●●
●
●●
●
●
●
●
●
35 40 45 50 55
900
1200
1500
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
Al Nosedal and Alison Weir STA258H5 Winter 2017 53 / 132
![Page 54: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/54.jpg)
Scatterplot + A +B
plot(data,pch=19,col="blue",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)",
xlim=c(30,70),ylim=c(850,1600 ));
points(42,1500,pch="A",col="red");
#point A;
points(70,1400,pch="B",col="green");
#point B;
Al Nosedal and Alison Weir STA258H5 Winter 2017 54 / 132
![Page 55: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/55.jpg)
●
●●●
●
●
●
●
●
●
●
●
30 40 50 60 70
1000
1400
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
AB
Al Nosedal and Alison Weir STA258H5 Winter 2017 55 / 132
![Page 56: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/56.jpg)
Least-Squares Regression Lines
# Step 3. Finding L-S Regression Line;
mod=lm(data$Rate~data$Mass);
# original;
modA=lm(c(data$Rate,1500)~c(data$Mass,42));
# point A;
modB=lm(c(data$Rate,1400)~c(data$Mass,70));
# point B;
Al Nosedal and Alison Weir STA258H5 Winter 2017 56 / 132
![Page 57: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/57.jpg)
Scatterplot + A +B + L-S Regression Lines
plot(data,pch=19,col="blue",
xlab="Lean Body Mass (kg)",
ylab="Metabolic Rate (calories/day)",
xlim=c(30,70),ylim=c(850,1600 ));
points(42,1500,pch="A",col="red");
points(70,1400,pch="B",col="green");
abline(mod$coeff,col="blue",lty=2);
abline(modA$coeff,col="red",lty=2);
abline(modB$coeff,col="green",lty=2);
Al Nosedal and Alison Weir STA258H5 Winter 2017 57 / 132
![Page 58: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/58.jpg)
●
●●●
●
●
●
●
●
●
●
●
30 40 50 60 70
1000
1400
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
AB
Al Nosedal and Alison Weir STA258H5 Winter 2017 58 / 132
![Page 59: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/59.jpg)
Adding a legend
legend("bottomright",
c("original","original + A","original + B"),
col=c("blue","red","green"),
lty=c(2,2,2),bty="n");
Al Nosedal and Alison Weir STA258H5 Winter 2017 59 / 132
![Page 60: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/60.jpg)
●
●●●
●
●
●
●
●
●
●
●
30 40 50 60 70
1000
1400
Lean Body Mass (kg)
Met
abol
ic R
ate
(cal
orie
s/da
y)
AB
originaloriginal + Aoriginal + B
Al Nosedal and Alison Weir STA258H5 Winter 2017 60 / 132
![Page 61: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/61.jpg)
Solutions
a) Point A lies above the other points; that is, the metabolic rate is higherthan we expect for the given body mass. Point B lies to the right of theother points; that is, it is an outlier in the x (mass) direction, and themetabolic rate is lower than we would expect.b) In the plot, the dashed blue line is the regression line for the originaldata. The dashed red line slightly above that includes Point A; it has avery similar slope to the original line, but a slightly higher intercept,because Point A pulls the line up. The third line includes Point B, themore influential point; because Point B is an outlier in the x direction, it”pulls” the line down so that it is less steep.
Al Nosedal and Alison Weir STA258H5 Winter 2017 61 / 132
![Page 62: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/62.jpg)
Influential observations
An observation is influential for a statistical calculation if removing itwould markedly change the result of the calculation.The result of a statistical calculation may be of little practical use if itdepends strongly on a few influential observations.Points that are outliers in either the x or the y direction of a scatterplotare often influential for the correlation. Points that are outliers in the xdirection are often influential for the least-squares regression line.
Al Nosedal and Alison Weir STA258H5 Winter 2017 62 / 132
![Page 63: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/63.jpg)
Another example
There is some evidence that drinking moderate amounts of wine helpsprevent heart attacks. A table shown below gives data on yearly wineconsumption (liters of alcohol from drinking wine, per person) and yearlydeaths from heart disease (deaths per 100,000 people) in 19 developednations∗.
Al Nosedal and Alison Weir STA258H5 Winter 2017 63 / 132
![Page 64: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/64.jpg)
Another example
a) Make a scatterplot that shows how national wine consumption helpsexplain heart disease death rates.b) Describe the form of the relationship. Is there a linear pattern? Howstrong is the relationship?c) Is the direction of the association positive or negative? Explain insimple language what this says about wine and heart disease. Do youthink these data give good evidence that drinking wine causes a reductionin heart disease deaths? Why?
Al Nosedal and Alison Weir STA258H5 Winter 2017 64 / 132
![Page 65: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/65.jpg)
Table
Country Alcohol Heart Country Alcohol Heartfrom disease from diseasewine deaths wine deaths
Australia 2.5 211 Netherlands 1.8 167Austria 3.9 167 New Zealand 1.8 266Belgium 2.9 131 Norway 0.8 227Canada 2.4 191 Spain 6.5 86
Denmark 2.9 220 Sweden 1.6 207Finland 0.8 297 Switzerland 5.8 115France 9.1 71 United Kingdom 1.3 285Iceland 0.8 211 United States 1.2 199Ireland 0.7 300 West Germany 2.7 172
Italy 7.9 107
Al Nosedal and Alison Weir STA258H5 Winter 2017 65 / 132
![Page 66: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/66.jpg)
Solution (Bar chart)
# Step 1. Entering data;
consumption=c(2.5, 3.9, 2.9, 2.4, 2.9, 0.8, 9.1,
0.8, 0.7, 7.9, 1.8, 1.9, 0.8, 6.5, 1.6, 5.8, 1.3, 1.2, 2.7);
death.rates=c(211, 167, 131, 191, 220, 297, 71,
211, 300, 107,167, 266, 227, 86, 207, 115, 285, 199, 172);
Al Nosedal and Alison Weir STA258H5 Winter 2017 66 / 132
![Page 67: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/67.jpg)
Scatterplot (R code)
plot(consumption,death.rates);
Al Nosedal and Alison Weir STA258H5 Winter 2017 67 / 132
![Page 68: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/68.jpg)
Scatterplot (R code)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
2 4 6 8
100
200
300
consumption
deat
h.ra
tes
Al Nosedal and Alison Weir STA258H5 Winter 2017 68 / 132
![Page 69: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/69.jpg)
Another example (cont.)
Our table gives data on wine consumption and heart disease death rates in19 countries. A scatterplot shows a moderately strong relationship.a) The correlation for these variables is r = −0.843. What does a negativecorrelation say about wine consumption and heart disease deaths?b) The least-squares regression line for predicting heart disease death ratefrom wine consumption is
y = 260.56− 22.969x
Verify this using R. Then use this equation to predict the heart diseasedeath rate in another country where adults average 4 liters of alcohol fromwine each year.
Al Nosedal and Alison Weir STA258H5 Winter 2017 69 / 132
![Page 70: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/70.jpg)
a) Finding correlation
cor(consumption,death.rates);
## [1] -0.8428127
Al Nosedal and Alison Weir STA258H5 Winter 2017 70 / 132
![Page 71: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/71.jpg)
Least-squares Regression Line
explanatory<-consumption;
response<-death.rates;
wine.reg<-lm(response~explanatory);
Al Nosedal and Alison Weir STA258H5 Winter 2017 71 / 132
![Page 72: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/72.jpg)
R code
names(wine.reg);
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
Al Nosedal and Alison Weir STA258H5 Winter 2017 72 / 132
![Page 73: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/73.jpg)
a and b
wine.reg$coef;
## (Intercept) explanatory
## 260.56338 -22.96877
Al Nosedal and Alison Weir STA258H5 Winter 2017 73 / 132
![Page 74: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/74.jpg)
Prediction
wine.reg$coef[1]+wine.reg$coef[2]*4;
## (Intercept)
## 168.6883
Al Nosedal and Alison Weir STA258H5 Winter 2017 74 / 132
![Page 75: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/75.jpg)
Prediction (again...)
new=data.frame(explanatory=4);
predict(wine.reg,newdata=new);
## 1
## 168.6883
Al Nosedal and Alison Weir STA258H5 Winter 2017 75 / 132
![Page 76: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/76.jpg)
c) The association is negative: Countries with high wine consumption havefewer heart disease deaths, while low wine consumption tends to go withmore deaths from heart disease. This does not prove causation; there maybe some other reason for the link.
Al Nosedal and Alison Weir STA258H5 Winter 2017 76 / 132
![Page 77: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/77.jpg)
Our main example
One effect of global warming is to increase the flow of water into theArctic Ocean from rivers. Such an increase might have major effects onthe world’s climate. Six rivers (Yenisey, Lena, Ob, Pechora, Kolyma, andSevernaya Dvina) drain two-thirds of the Arctic in Europe and Asia.Several of these are among the largest rivers on earth. File arctic-rivers.txtcontains the total discharge from these rivers each year from 1936 to19992. Discharge is measured in cubic kilometers of water.
Al Nosedal and Alison Weir STA258H5 Winter 2017 77 / 132
![Page 78: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/78.jpg)
Reading our data
# url of arctic rivers data;
riv_url = "http://www.math.unm.edu/~alvaro/arctic-rivers.txt"
# import data in R;
arctic_rivers = read.table(riv_url, header = TRUE);
Al Nosedal and Alison Weir STA258H5 Winter 2017 78 / 132
![Page 79: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/79.jpg)
Scatterplot (R code)
plot(arctic_rivers$Year,arctic_rivers$Discharge);
Al Nosedal and Alison Weir STA258H5 Winter 2017 79 / 132
![Page 80: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/80.jpg)
Scatterplot (R code)
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●
●●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
1940 1960 1980 2000
1600
1900
arctic_rivers$Year
arct
ic_r
iver
s$D
isch
arge
Al Nosedal and Alison Weir STA258H5 Winter 2017 80 / 132
![Page 81: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/81.jpg)
Scatterplot (R code)
plot(arctic_rivers$Year,arctic_rivers$Discharge,
pch=19,col="blue");
Al Nosedal and Alison Weir STA258H5 Winter 2017 81 / 132
![Page 82: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/82.jpg)
Scatterplot (R code)
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●
●●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
1940 1960 1980 2000
1600
1900
arctic_rivers$Year
arct
ic_r
iver
s$D
isch
arge
Al Nosedal and Alison Weir STA258H5 Winter 2017 82 / 132
![Page 83: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/83.jpg)
Scatterplot (R code)
plot(arctic_rivers$Year,arctic_rivers$Discharge,
pch=19,col="blue", xlab="Year",
ylab="Discharge");
Al Nosedal and Alison Weir STA258H5 Winter 2017 83 / 132
![Page 84: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/84.jpg)
Scatterplot (R code)
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●
●●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
1940 1960 1980 2000
1600
1900
Year
Dis
char
ge
Al Nosedal and Alison Weir STA258H5 Winter 2017 84 / 132
![Page 85: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/85.jpg)
The scatterplot shows a weak positive, linear relationship.
Al Nosedal and Alison Weir STA258H5 Winter 2017 85 / 132
![Page 86: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/86.jpg)
Our main example
r<-cor(arctic_rivers$Year,arctic_rivers$Discharge);
r;
## [1] 0.3343926
Al Nosedal and Alison Weir STA258H5 Winter 2017 86 / 132
![Page 87: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/87.jpg)
The scatterplot shows a weak positive, linear relationship, which isconfirmed by r (0.3343926).
Al Nosedal and Alison Weir STA258H5 Winter 2017 87 / 132
![Page 88: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/88.jpg)
R code
explanatory<-arctic_rivers$Year;
response<-arctic_rivers$Discharge
rivers.reg<-lm(response~explanatory);
Al Nosedal and Alison Weir STA258H5 Winter 2017 88 / 132
![Page 89: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/89.jpg)
R code
names(rivers.reg);
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
Al Nosedal and Alison Weir STA258H5 Winter 2017 89 / 132
![Page 90: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/90.jpg)
a and b
rivers.reg$coef;
## (Intercept) explanatory
## -2056.769460 1.966163
Al Nosedal and Alison Weir STA258H5 Winter 2017 90 / 132
![Page 91: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/91.jpg)
Scatterplot with least-squares line
plot(explanatory,response,
pch=19,col="blue", xlab="Year",
ylab="Discharge");
abline(rivers.reg$coef, col="red");
Al Nosedal and Alison Weir STA258H5 Winter 2017 91 / 132
![Page 92: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/92.jpg)
Scatterplot with least-squares line
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●
●●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
1940 1960 1980 2000
1600
1900
Year
Dis
char
ge
Al Nosedal and Alison Weir STA258H5 Winter 2017 92 / 132
![Page 93: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/93.jpg)
Residuals
A residual is the difference between an observed value of the responsevariable and the value predicted by the regression line. That is,
residual = observed y − predicted y = y − y .
Al Nosedal and Alison Weir STA258H5 Winter 2017 93 / 132
![Page 94: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/94.jpg)
Scatterplot with residual line segments
plot(explanatory,response,
pch=19,col="blue", xlab="Year",
ylab="Discharge");
abline(rivers.reg$coef, col="red");
segments(explanatory, fitted(rivers.reg),
explanatory,response, lty=2, col="black");
Al Nosedal and Alison Weir STA258H5 Winter 2017 94 / 132
![Page 95: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/95.jpg)
Scatterplot with residual line segments
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●
●●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
1940 1960 1980 2000
1600
1900
Year
Dis
char
ge
Al Nosedal and Alison Weir STA258H5 Winter 2017 95 / 132
![Page 96: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/96.jpg)
Residual Plots
A residual plot is a scatterplot of the regression residuals against theexplanatory variable. Residual plots help us assess the fit of a regressionline.A residual plot magnifies the deviations of the points from the line andmakes it easier to see unusual observations and patterns.
Al Nosedal and Alison Weir STA258H5 Winter 2017 96 / 132
![Page 97: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/97.jpg)
Residual plot
plot(explanatory,resid(rivers.reg),
pch=19,col="blue", xlab="Year",
ylab="Residual");
abline(h=0, col="red",lty=2);
Al Nosedal and Alison Weir STA258H5 Winter 2017 97 / 132
![Page 98: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/98.jpg)
Residual plot
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●
●●
●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
1940 1960 1980 2000
−20
00
Year
Res
idua
l
Al Nosedal and Alison Weir STA258H5 Winter 2017 98 / 132
![Page 99: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/99.jpg)
INFERENCE FOR REGRESSION.
Al Nosedal and Alison Weir STA258H5 Winter 2017 99 / 132
![Page 100: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/100.jpg)
The Regression Model
We have n observations on an explanatory variable x and a responsevariable y . Our goal is to study or predict the behavior of y for givenvalues of x .
For any fixed value of x , the response y varies according to a Normaldistribution. Repeated measures y are independent of each other.
The mean response µy has a straight-line relationship with x :µy = α+βx . The slope β and intercept α are unknown parameters.
The standard deviation of y (call it σ) is the same for all values of x .The value of σ is unknown.The regression model has three parameters, α, β, and σ.
Al Nosedal and Alison Weir STA258H5 Winter 2017 100 / 132
![Page 101: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/101.jpg)
Regression Standard Error
The regression standard error is
s =
√1
n − 2
∑residual2 =
√√√√ 1
n − 2
n∑i=1
(yi − y)2.
Use s to estimate the unknown σ in the regression model.
Al Nosedal and Alison Weir STA258H5 Winter 2017 101 / 132
![Page 102: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/102.jpg)
Regression Standard Error
residuals<-resid(rivers.reg);
n<-length(residuals);
s<-sqrt(sum(residuals^2)/(n-2));
s;
## [1] 104.0026
Al Nosedal and Alison Weir STA258H5 Winter 2017 102 / 132
![Page 103: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/103.jpg)
Regression Standard Error (another way)
summary(rivers.reg)$sigma
## [1] 104.0026
Al Nosedal and Alison Weir STA258H5 Winter 2017 103 / 132
![Page 104: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/104.jpg)
Confidence intervals for the regression slope
A level C confidence interval for the slope β of the true regression line is
b ± t∗SEb.
In this formula, the standard error of the least-squares slope b is
SEb =s√∑
(xi − x)2
and t∗ is the critical value for the t(n − 2) density curve with area Cbetween −t∗ and t∗.
Al Nosedal and Alison Weir STA258H5 Winter 2017 104 / 132
![Page 105: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/105.jpg)
Example
We will use the data in arctic-rivers.txt to give a 90% confidence intervalfor the slope of the true regression of Arctic river discharge on year.
Al Nosedal and Alison Weir STA258H5 Winter 2017 105 / 132
![Page 106: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/106.jpg)
Confidence interval for slope
summary(rivers.reg)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2056.769460 1384.6873683 -1.485367 0.14251371
## explanatory 1.966163 0.7037491 2.793841 0.00692068
Al Nosedal and Alison Weir STA258H5 Winter 2017 106 / 132
![Page 107: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/107.jpg)
Confidence interval for slope
b<-summary(rivers.reg)$coef[2,1];
SEb<-summary(rivers.reg)$coef[2,2];
# lower bound;
b-qt(0.95,df=n-2)*SEb;
## [1] 0.7910398
# upper bound;
b+qt(0.95,df=n-2)*SEb;
## [1] 3.141286
Al Nosedal and Alison Weir STA258H5 Winter 2017 107 / 132
![Page 108: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/108.jpg)
R output gives b = 1.966163 and SEb = 0.7037491. There were n = 64observations, so df = 62. Our 90% Confidence Interval for β is given by(0.7910398, 3.1412862). Because this interval does not contain 0, we haveevidence that β (the rate at which discharge is increasing) is positive.
Al Nosedal and Alison Weir STA258H5 Winter 2017 108 / 132
![Page 109: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/109.jpg)
Testing the hypothesis of no linear relationship
We can also test hypotheses about the slope β. The most commonhypothesis is
H0 : β = 0.
A regression line with slope 0 is horizontal. That is, the mean of y doesnot change at all when x changes. So this H0 says that there is no truelinear relationship between x and y .
Al Nosedal and Alison Weir STA258H5 Winter 2017 109 / 132
![Page 110: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/110.jpg)
Significance test for regression slope
To test the hypothesis H0 : β = 0, compute the t statistic
t =b
SEb.
In terms of a random variable T having the t(n − 2) distribution, theP-value for a test of H0 against Ha : β 6= 0 is 2P(T ≥ |t|).
Al Nosedal and Alison Weir STA258H5 Winter 2017 110 / 132
![Page 111: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/111.jpg)
Our main example
The most important question we ask of the data in arctic-rivers.txt is this:Is the increasing trend visible in your plot statistically significant? If so,changes in the Arctic may already be affecting earth’s climate. Use R toanswer this question.
Al Nosedal and Alison Weir STA258H5 Winter 2017 111 / 132
![Page 112: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/112.jpg)
t.statistic<-b/SEb;
t.statistic;
## [1] 2.793841
p.value<-2*(1-pt(t.statistic,n-2));
p.value;
## [1] 0.00692068
Al Nosedal and Alison Weir STA258H5 Winter 2017 112 / 132
![Page 113: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/113.jpg)
summary(rivers.reg)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2056.769460 1384.6873683 -1.485367 0.14251371
## explanatory 1.966163 0.7037491 2.793841 0.00692068
Al Nosedal and Alison Weir STA258H5 Winter 2017 113 / 132
![Page 114: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/114.jpg)
The t statistic for testing H0 : β = 0 is therefore t = 2.7938409. This hasdf = 62; R gives a P-value of 0.0069207. There is significant evidence (atα = 0.01 significance level) that β is nonzero.
Al Nosedal and Alison Weir STA258H5 Winter 2017 114 / 132
![Page 115: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/115.jpg)
APPENDIX
Al Nosedal and Alison Weir STA258H5 Winter 2017 115 / 132
![Page 116: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/116.jpg)
Simple Linear Regression (SLR) Model
Y = β0 + β1X + ε
Y : Response or dependent variableX : Predictor or independent variable, treat as fixed.β0 and β1: Regression coefficients.ε: Random error.Goal: To be able to predict y for a given value of x .
Al Nosedal and Alison Weir STA258H5 Winter 2017 116 / 132
![Page 117: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/117.jpg)
Assumptions about ε
Error terms
Are uncorrelated
Have mean zero
Have constant variance
Don’t require Normal distribution
E (ε) = 0→ E (Y ) = E (β0 + β1X + ε) = β0 + β1X ”true line”
Al Nosedal and Alison Weir STA258H5 Winter 2017 117 / 132
![Page 118: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/118.jpg)
The least-squares procedure for fitting a line through a set of n datapoints is similar to the method that we might use if we fit a line by eye;that is, we want the differences between the observed values andcorresponding points on the fitted line to be ”small” according to somecriterion. A convenient way to accomplish this is to minimize the sum ofsquares of the vertical deviations from the fitted line.
Al Nosedal and Alison Weir STA258H5 Winter 2017 118 / 132
![Page 119: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/119.jpg)
Thus, if
yi = β0 + β1xi
is the predicted value of the ith y value, then the deviation of theobserved value yi from yi is the difference yi − yi and the sum of squaresof deviations to be minimized is
SSE =n∑
i=1
(yi − yi )2 =
n∑i=1
[yi − (β0 + β1xi )]2.
The quantity SSE is also called the sum of squares for error.
Al Nosedal and Alison Weir STA258H5 Winter 2017 119 / 132
![Page 120: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/120.jpg)
If SSE possesses a minimum, it will occur for values of β0 and β1 thatsatisfy the equations, ∂SSE
∂β0= 0 and ∂SSE
∂β1= 0. These equations are called
the least-squares equations for estimating the parameters of a line.
Al Nosedal and Alison Weir STA258H5 Winter 2017 120 / 132
![Page 121: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/121.jpg)
You can verify that the solutions are
β1 =
∑ni=1(xi − x)(yi − y)∑n
i=1(xi − x)2,
β0 = y − β1x .
(Further, it can be shown that the simultaneous solution for the twoleast-squares equations yields values of β0 and β1 that minimize SSE . Ileave this for you to prove).
Al Nosedal and Alison Weir STA258H5 Winter 2017 121 / 132
![Page 122: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/122.jpg)
Least-Squares Estimators for Simple Linear RegressionModel
β1 =SxySxx
,
where Sxy =∑n
i=1(xi − x)(yi − y) and Sxy =∑n
i=1(xi − x)2.
β0 = y − β1x .
Al Nosedal and Alison Weir STA258H5 Winter 2017 122 / 132
![Page 123: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/123.jpg)
Fitted Values and Residuals
Fitted Value:yi = β0 + β1xi
Residual:
ε = yi − yi
Al Nosedal and Alison Weir STA258H5 Winter 2017 123 / 132
![Page 124: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/124.jpg)
Estimating E (Y )
Point estimate ˆE (Y ) = y = β0 + β1x
Applies for any x , even ones we didn’t observe
Fitted values yi estimate means E (Yi )
Al Nosedal and Alison Weir STA258H5 Winter 2017 124 / 132
![Page 125: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/125.jpg)
Estimating σ2
Residuals εi = ei estimate errors εi
Estimate of the common variance σ2 =∑n
i=1 e2i
n−2
The denominator is n − 2 to make it an un unbiased estimator
Often called MSE, Mean Square Error
Square root of MSE called residual standard error, or standard errorof regression
Al Nosedal and Alison Weir STA258H5 Winter 2017 125 / 132
![Page 126: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/126.jpg)
Distribution of β1
β1 has a N
(β1,
σ2
Sxx
)
Al Nosedal and Alison Weir STA258H5 Winter 2017 126 / 132
![Page 127: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/127.jpg)
Sums of Squares
n∑i=1
(Yi − Y )2 =n∑
i=1
(Yi − Y )2 +n∑
i=1
(Yi − Yi )2
SSTO = SSR + SSE
Sum of Squares (Total) = Sum of Squares (Regression) + Sum of Squares(Error)
Al Nosedal and Alison Weir STA258H5 Winter 2017 127 / 132
![Page 128: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/128.jpg)
ANOVA Table
Analysis of Variance (ANOVA) Table.
Source df SS MS F
Regression 1 SSR MSR F∗ = MSR/MSEError/Residual n-2 SSE MSE
Total n-1 SSTO
Al Nosedal and Alison Weir STA258H5 Winter 2017 128 / 132
![Page 129: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/129.jpg)
ANOVA Table
Analysis of Variance (ANOVA) Table.
H0 : β1 = 0 vs Ha : β1 6= 0
Test Statistic: F ∗ = MSR/MSE
Reference distribution: F1,n−2
Note. F ∗ = t2∗
Al Nosedal and Alison Weir STA258H5 Winter 2017 129 / 132
![Page 130: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/130.jpg)
Coefficient of Determination
The statistic r2 is called the coefficient of determination and has aninteresting and useful interpretation.r2 can be interpreted as the proportion of the total variation in the yi ’sthat is explained by the variable x in a simple linear regression model (formore details, see page 601).
Al Nosedal and Alison Weir STA258H5 Winter 2017 130 / 132
![Page 131: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/131.jpg)
summary(rivers.reg)$fstatistic;
## value numdf dendf
## 7.805547 1.000000 62.000000
summary(rivers.reg)$r.squared;
## [1] 0.1118184
Al Nosedal and Alison Weir STA258H5 Winter 2017 131 / 132
![Page 132: STA258H5 - University of Torontonosedal/sta258/sta258-lec26-27-28.pdf · 54.6 1425 F 33.1 913 F 48.5 1396 F 42.4 1124 F 42.0 1418 F 34.5 1052 F 50.6 1502 F 51.1 1347 F 42.0 1256 F](https://reader033.fdocuments.in/reader033/viewer/2022042004/5e6ed2229add5972ae76d9ab/html5/thumbnails/132.jpg)
Another way would be typing in
summary(rivers.reg)
(that would show you everything but I don’t have enough room here...)
Al Nosedal and Alison Weir STA258H5 Winter 2017 132 / 132