The Role of r² in Regression Target Goal: I can use r² to explain the variation of y that is...
nicholas-offord -
The Role of r² in Regression
Target Goal: I can use r² to explain the variation of y that is explained by the LSRL.
D4: 3.2b
Hw: pg 191 – 43, 46, 48, 53, 63, 71 - 78
r: correlation coefficient
r²: the coefficient of determination
• If the line ŷ is a poor model, the value of r² turns out to be small, closer to 0.
• If the line ŷ fits the data fairly well, the value of r² turns out to be larger, closer to 1.
What is the meaning of r² in regression?
• Squares of the deviations about ŷ
Least-Squares Regression
• The Role of r² in Regression
The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y.
Definition:
The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula:
$$r^2 = 1 - \frac{SSE}{SST}$$
where
$$SSE = \sum \text{residual}^2$$
and
$$SST = \sum (y_i - \bar{y})^2$$
The formula for r² is made up of these parts:
• SST: total sum of squares about the mean ȳ.
$$SST = \sum (y - \bar{y})^2$$
• SSE: sum of squares for error.
$$SSE = \sum (y - \hat{y})^2$$
• r²: coefficient of determination.
$$r^2 = \frac{SST - SSE}{SST}$$
Ex: Large r²
If x is a good predictor of y, then the deviations and thus the SSE would be small; in fact, if all of the points fell exactly on the regression line, SSE would be 0 and r² would be 1.
$$r^2 = \frac{SST - SSE}{SST}$$
• For the data in this example,
x: 0 5 10
y: 0 7 8
$$r^2 = \frac{SST - SSE}{SST} = \frac{38 - 6}{38} = 0.842$$
Conclusion: We say that 84% of the variation in y is explained by the least-squares regression of y on x.
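The arithmetic in this example can be checked with a short script. The following is a minimal sketch, assuming the three data points given above; the helper name `r_squared` is illustrative and not from the slides, and the line is fitted with the standard least-squares formulas. It should reproduce SST = 38, SSE = 6, and r² of about 0.842.

```python
from statistics import mean

def r_squared(xs, ys):
    """Return (SST, SSE, r^2) for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope and intercept.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # SST: squared deviations about the mean of y.
    sst = sum((y - y_bar) ** 2 for y in ys)
    # SSE: squared residuals about the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sst, sse, (sst - sse) / sst

sst, sse, r2 = r_squared([0, 5, 10], [0, 7, 8])
print(sst, round(sse, 3), round(r2, 3))
```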
r² in Regression
The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
• The Role of r² in Regression
r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean ȳ for each value in the dataset. Consider the example on page 179. If we needed to predict a backpack weight for a new hiker, but didn't know the hiker's body weight, we could use the average backpack weight as our prediction.
If we use the mean backpack weight as our prediction, the sum of the squared residuals is 83.87.
SST = 83.87
If we use the LSRL to make our predictions, the sum of the squared residuals is 30.90.
SSE = 30.90
SSE/SST = 30.90/83.87 = 0.368
Therefore, 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.
r² = 1 – SSE/SST = 1 – 30.90/83.87 = 0.632
63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.
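The two fractions above follow directly from SST and SSE. Here is a minimal sketch of that arithmetic, assuming SST = 83.87 and SSE = 30.90 (the value stated for the LSRL residuals); the variable names are illustrative.

```python
sst = 83.87  # sum of squared residuals when predicting with the mean pack weight
sse = 30.90  # sum of squared residuals when predicting with the LSRL

unexplained = sse / sst  # fraction of variation the line does not account for
r2 = 1 - unexplained     # coefficient of determination

print(f"unexplained: {unexplained:.3f}, r^2: {r2:.3f}")
```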
• Correlation and Regression Wisdom
Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, be aware of their limitations.
Fact 1. The distinction between explanatory and response variables is important in regression.
Facts about least-squares regression
• Fact 2: There is a close connection between correlation and the slope of the least-squares regression line.
As the correlation grows less strong, the prediction ŷ moves less in response to changes in x.
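One way to see this connection is through the slope formula b = r(s_y/s_x): with the standard deviations held fixed, a weaker correlation gives a flatter line. The sketch below assumes hypothetical values s_x = 2 and s_y = 6, which are not from the slides; the function name `lsrl_slope` is also illustrative.

```python
def lsrl_slope(r, s_x, s_y):
    """Slope of the least-squares line: b = r * (s_y / s_x)."""
    return r * s_y / s_x

s_x, s_y = 2.0, 6.0  # hypothetical standard deviations, for illustration only
for r in (0.9, 0.6, 0.3, 0.0):
    b = lsrl_slope(r, s_x, s_y)
    # b is the predicted change in y-hat for a one-unit increase in x.
    print(f"r = {r:.1f} -> slope b = {b:.2f}")
```

As r shrinks toward 0, the slope shrinks with it, so the prediction responds less and less to a change in x.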
• Fact 3: Every LSRL passes through (x̄, ȳ).
• Remember: When reporting a regression, give r² as a measure of how successful the regression was in explaining the response.
• When you see a correlation (r), square it to get a better feel for the strength of the association.
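Fact 3 can be checked numerically: because the intercept is a = ȳ - b·x̄, plugging x̄ into the line gives back ȳ. This is a minimal sketch that reuses the three-point data set from the earlier example (x: 0, 5, 10; y: 0, 7, 8); the variable names are illustrative.

```python
from statistics import mean

xs, ys = [0, 5, 10], [0, 7, 8]
x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar

# The prediction at the mean of x should equal the mean of y.
y_hat_at_mean = a + b * x_bar
print(y_hat_at_mean, y_bar)
```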
Exercise: Predicting The Stock Market.
Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year.
• Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the change in the index for the entire year.
• We expect a positive correlation between x and y because the change during January contributes to the year's full change.
• Calculation from the data for the years 1960 to 1997 gives:
x̄ = 1.75%, s_x = 5.36%
ȳ = 9.07%, s_y = 15.35%
and r = 0.596
r = 0.596
a. What percent of the observed variation in yearly changes in the index is explained by a straight-line relationship with the changes during January?
r² = (0.596)² = 0.355, so 35.5% of the variation in yearly changes in the index is explained by the straight-line relationship with the changes during January.
b. What is the equation of the least-squares regression line for predicting full-year change from January change?
Find b: $b = r \frac{s_y}{s_x}$, b = 1.707
Find a: $a = \bar{y} - b\bar{x}$, a = 6.083%
The regression equation is
ŷ = a + bx
ŷ = 6.083% + 1.707x
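The slope and intercept can be reproduced from the summary statistics alone. This is a minimal sketch, assuming the values reported for 1960 to 1997; the variable names are illustrative.

```python
x_bar, s_x = 1.75, 5.36    # January change (%): mean and standard deviation
y_bar, s_y = 9.07, 15.35   # full-year change (%): mean and standard deviation
r = 0.596                  # correlation between x and y

b = r * s_y / s_x          # slope: b = r * (s_y / s_x)
a = y_bar - b * x_bar      # intercept: a = y-bar - b * x-bar

print(f"y-hat = {a:.3f} + {b:.3f}x")
```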
Predictions
c. The mean change in January is x̄ = 1.75%. Use your regression line to predict the change in the index in a year in which the index rises 1.75% (x̄) in January. Why could you have given this result without doing the calculation?
Every LSRL passes through (x̄, ȳ). Recall ȳ = 9.07%, so the predicted change is ŷ = 9.07%.
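As a check on part (c), plugging x̄ into the fitted equation should give back ȳ, up to rounding of the coefficients. This is a sketch that assumes the rounded coefficients a = 6.083 and b = 1.707 from part (b); the function name `predict` is illustrative.

```python
def predict(x, a=6.083, b=1.707):
    """Predicted full-year change (%) for a January change of x (%)."""
    return a + b * x

y_hat = predict(1.75)  # 1.75 is x-bar, the mean January change
print(round(y_hat, 2))
```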
Exercise: Class attendance and grades
A study of class attendance and grades among first-year students at a state university showed that, in general, students who attended a higher percent of their classes earned higher grades.
Class attendance explained 16% of the variation in grade index among students. What is the numerical value of the correlation between percent of class attended and grade index?
$$r = \sqrt{r^2} = \sqrt{0.16} = 0.40$$
High attendance goes with high grades, so the correlation must be positive.
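Recovering r from r² takes both a square root and a sign, since r² alone does not say whether the association is positive or negative. This is a minimal sketch; the helper `r_from_r_squared` and its `positive` flag are illustrative names, not part of the exercise.

```python
import math

def r_from_r_squared(r2, positive=True):
    """Return r given r^2 and the known direction of the association."""
    r = math.sqrt(r2)
    return r if positive else -r

# Attendance explains 16% of the variation in grade index,
# and higher attendance goes with higher grades.
r = r_from_r_squared(0.16, positive=True)
print(round(r, 2))
```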