Chapter 5
-
Upload
alea-wallace -
Category
Documents
-
view
32 -
download
1
description
Transcript of Chapter 5
Chapter 5Residuals, Residual Plots, &
Influential pointsBy Doug Fritz
Residuals (error) -Residuals (error) -
• The vertical deviation between the observations & the LSRL
• the sum of the residuals is alwaysalways zero zero
• error = observed - expected
yy ˆresidual
Residual plotResidual plot
• A scatterplot of the (x, residual) pairs.
• Residuals can be graphed against other statistics besides x
• Purpose is to tell if a linear associationlinear association exist between the x & y variables
• If no patternno pattern exists between the points in the residual plot, then the association is linearlinear.
Residuals
x
Residuals
x
LinearLinear Not linearNot linear
Age Range of Motion
35 154
24 142
40 137
31 133
28 122
25 126
26 135
16 135
14 108
20 120
21 127
30 122
One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation. Is there a linear relationship between age & range of motion?
Sketch a residual plot.
Since there is no pattern in the residual plot, there is a linear relationship between age and range of motion
x
Residuals
Age Range of Motion
35 154
24 142
40 137
31 133
28 122
25 126
26 135
16 135
14 108
20 120
21 127
30 122
Plot the residuals against the y-hats. How does this residual plot compare to the previous one?
Residualsy
Residual plots are the same no matter if plotted against x or y-hat.
x
Residuals Residuals
y
Coefficient of determination-Coefficient of determination-• r2
• gives the proportion of variationvariation in yy that can be attributed to an approximate linear relationship between x & y
• remains the same no matter which variable is labeled x
Age Range of Motion
35 154
24 142
40 137
31 133
28 122
25 126
26 135
16 135
14 108
20 120
21 127
30 122
Let’s examine r2.
Suppose you were going to predict a future y but you didn’t know the x-value. Your best guess would be the overall mean of the existing y’s.
SSy = 1564.917
Sum of the squared residuals (errors) using
the mean of y.
308.130y
Age Range of Motion
35 154
24 142
40 137
31 133
28 122
25 126
26 135
16 135
14 108
20 120
21 127
30 122
Now suppose you were going to predict a future y but you DO know the x-value. Your best guess would be the point on the LSRL for that x-value (y-hat).
Sum of the squared residuals (errors) using the LSRL.
SSy = 1085.735
583.107871.ˆ xy
Age Range of Motion
35 154
24 142
40 137
31 133
28 122
25 126
26 135
16 135
14 108
20 120
21 127
30 122
By what percent did the sum of the squared error go down when you went from just an “overall mean” model to the “regression on x” model?
SSy = 1085.735
SSy = 1564.917
3062916671564
7351085916671564
SSE
SSESSE
y
yy
..
..
ˆ
This is r2 – the amount of the
variation in the y-values that is explained by the x-values.
Age Range of Motion
35 154
24 142
40 137
31 133
28 122
25 126
26 135
16 135
14 108
20 120
21 127
30 122
How well does age predict the range of motion after knee surgery?
Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the linear regression of age and range of motion.
Interpretation of r2
Approximately rr22%% of the variation in yy can be explained by the LSRL of xx & yy.
Computer-generated regression analysis of knee surgery data:
Predictor Coef Stdev T P
Constant 107.58 11.12 9.67 0.000
Age 0.8710 0.4146 2.10 0.062
s = 10.42 R-sq = 30.6% R-sq(adj) = 23.7%
x . . y 871058107ˆ 5532.r
What is the equation of the LSRL?
Find the slope & y-intercept.
NEVER use adjusted r2!
Be sure to convert r2 to decimal beforebefore taking the square
root!
What are the correlation coefficient and the coefficient of
determination?
Outlier –Outlier –
• In a regression setting, an outlier is a data point with a largelarge residual
Influential point-Influential point-
• A point that influences where the LSRL is located
• If removed, it will significantly change the slope of the LSRL
Racket Resonance Acceleration
(Hz) (m/sec/sec)
1 105 36.0
2 106 35.0
3 110 34.5
4 111 36.8
5 112 37.0
6 113 34.0
7 113 34.2
8 114 33.8
9 114 35.0
10 119 35.0
11 120 33.6
12 121 34.2
13 126 36.2
14 189 30.0
One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact.
Sketch a scatterplot of these data.
Calculate the LSRL & correlation coefficient.
Does there appear to be an influential point? If so, remove it and then calculate the new LSRL & correlation coefficient.
(189,30) could be influential. Remove & recalculate LSRL
(189,30) was influential since it moved the LSRL
Which of these measures are Which of these measures are resistant?resistant?
• LSRL
• Correlation coefficient
• Coefficient of determination
NONENONE – all are affected by outliers