Lecture 10 Chapter 23. Inference for regression
description
Transcript of Lecture 10 Chapter 23. Inference for regression
![Page 1: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/1.jpg)
Lecture 10Chapter 23. Inference for regression
![Page 2: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/2.jpg)
Objectives (PSLS Chapter 23)
Inference for regression
(NHST Regression Inference Award)[B level award] Note Level
The regression model
Confidence interval for the regression slope β
Testing the hypothesis of no linear relationship
Inference for prediction
Conditions for inference
![Page 3: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/3.jpg)
Most scatterplots are
created from sample data.
Is the observed relationship statistically significant
(not explained by chance events due to random sampling alone)?
What is the population mean response y as a function of the
explanatory variable x?
y = + x
Common Questions Asked of Bivariate Relationships
![Page 4: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/4.jpg)
The regression model
The least-squares regression line
ŷ = a + bx
is a mathematical model of the
relationship between 2
quantitative variables:
“observed data = fit + residual”
The regression line is the fit.
For each data point in the sample, the residual is the difference (y - ŷ).
![Page 5: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/5.jpg)
At the population level, the model becomes
yi = ( + xi) + (i)
with residuals i independent and
Normally distributed N(0,).
The population mean response y is
y = + xŷ unbiased estimate for mean response y
a unbiased estimate for intercept
b unbiased estimate for slope
![Page 6: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/6.jpg)
The regression standard error, s, for n sample data points is
computed from the residuals (yi – ŷi):
s is an unbiased estimate of the regression standard deviation
The model has constant
variance of Y ( is the same
for all values of x).
For any fixed x, the responses y follow a Normal distribution with standard deviation .
2
)ˆ(
2
22
n
yy
n
residuals ii
![Page 7: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/7.jpg)
Regression Analysis: The regression equation isFrequency = - 6.19 + 2.33 Temperature
Predictor Coef SE Coef T PConstant -6.190 8.243 -0.75 0.462Temperature 2.3308 0.3468 6.72 0.000
S = 2.82159 R-Sq = 71.5% R-Sq(adj) = 69.9%
Frog mating call
Scientists recorded the call frequency of
20 gray tree frogs, Hyla chrysoscelis, as
a function of field air temperature.
MINITAB
2
)ˆ(
2
22
n
yy
n
residuals iis: aka “MSE”, regression standard
error, unbiased estimate of
![Page 8: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/8.jpg)
Confidence interval for the slope βEstimating the regression parameter for the slope is a case of one-
sample inference with σ unknown. Hence we rely on t distributions.
The standard error of the slope b is:
(s is the regression standard error)
Thus, a level C confidence interval for the slope is:
estimate ± t*SEestimate
b ± t* SEb
t* is t critical for t(df = n – 2) density curve with C% between –t* and +t*
![Page 9: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/9.jpg)
Regression Analysis: The regression equation isFrequency = - 6.19 + 2.33 Temperature
Predictor Coef SE Coef T PConstant -6.190 8.243 -0.75 0.462Temperature 2.3308 0.3468 6.72 0.000
S = 2.82159 R-Sq = 71.5% R-Sq(adj) = 69.9%
Frog mating call
standard error of the slope SEb
We are 95% confident that a 1 degree Celsius increase in temperature results
in an increase in mating call frequency between 1.60 and 3.06 notes/seconds.
95% CI for the slope is:
b ± t*SEb = 2.3308 ± (2.101)(0.3468) = 2.3308 ± 07286, or 1.6022 to 3.0594
n = 20, df = 18
t* = 2.101
![Page 10: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/10.jpg)
Testing the hypothesis of no relationship
)( 2
xx
sSEb
![Page 11: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/11.jpg)
Regression Analysis: The regression equation isFrequency = - 6.19 + 2.33 Temperature
Predictor Coef SE Coef T PConstant -6.190 8.243 -0.75 0.462Temperature 2.3308 0.3468 6.72 0.000
S = 2.82159 R-Sq = 71.5% R-Sq(adj) = 69.9%
Frog mating call
Two-sided P-value
for H0: β = 0
t = b / SEb = 2.3308 / 0.3468 = 6.721, with df = n – 2 = 18,
Table C: t > 3.922 P < 0.001 (two-sided test), highly significant.
There is a significant relationship between temperature and call frequency.
We test H0: β = 0 versus Ha: β ≠ 0
![Page 12: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/12.jpg)
Testing for lack of correlation
The regression slope b and the correlation
coefficient r are related and b = 0 r = 0.
Conversely, the population parameter for the slope β is related to the
population correlation coefficient ρ, and when β= 0 ρ = 0.
Thus, testing the hypothesis H0: β = 0 is the same as testing the
hypothesis of no correlation between x and y in the population from
which our data were drawn.
slope y
x
sb r
s
![Page 13: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/13.jpg)
Hand length and body height for a
random sample of 21 adult men
Regression Analysis: Hand length (mm) vs Height (m)
The regression equation isHand length (mm) = 126 + 36.4 Height (m)
Predictor Coef SE Coef T PConstant 125.51 39.77 3.16 0.005Height (m) 36.41 21.91 1.66 0.113
S = 6.74178 R-Sq = 12.7% R-Sq(adj) = 8.1%
No significant relationship
![Page 14: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/14.jpg)
Objectives (PSLS Chapter 23)
Inference for regression
(NHST Regression Inference Award)[B level award] Note Change
The regression model
Confidence interval for the regression slope β
Testing the hypothesis of no linear relationship
Inference for prediction
Conditions for inference
![Page 15: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/15.jpg)
Two Types of Inference for prediction1. One use of regression is for prediction within range: ŷ = a + bx.
But this prediction depends on the particular sample drawn.
We need statistical inference to generalize our conclusions.
2. To estimate an individual response y for a given value of x, we use
a prediction interval.
If we randomly sampled many times, there
would be many different values of y
obtained for a particular x following N(0,σ)
around the mean response µy.
![Page 16: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/16.jpg)
Confidence interval for µy
Predicting the population mean value of y, µy, for any value of x within
the range (domain) of data tested.
Using inference, we calculate a level C confidence interval for the
population mean μy of all responses y when x takes the value x*:
This interval is centered on ŷ, the unbiased estimate of μy.
The true value of the population mean μy at a given
value of x, will indeed be within our confidence
interval in C% of all intervals computed
from many different random samples.
![Page 17: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/17.jpg)
A level C prediction interval for a single observation on y when x
takes the value x* is:
ŷ ± t* SEŷ
A level C confidence interval for the mean response μy at a given
value x* of x is:
ŷ ± t* SE^
Use t* for a t distribution with df = n – 2
![Page 18: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/18.jpg)
Confidence interval for the mean
call frequency at 22 degrees C:
µy within 43.3 and 46.9 notes/sec
Prediction interval for individual
call frequencies at 22 degrees C:
ŷ within 38.9 and 51.3 notes/sec
MinitabPredicted Values for New Observations: Temperature=22
NewObs Fit SE Fit 95% CI 95% PI 1 45.088 0.863 (43.273, 46.902) (38.888, 51.287)
![Page 19: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/19.jpg)
Blood alcohol content (bac) as a function of alcohol consumed (number of beers) for a random sample of 16 adults.
What do you predict for 5 beers consumed?
What are typical values of BAC when 5 beers
are consumed? Give a range.
![Page 20: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/20.jpg)
Conditions for inference
The observations are independent
The relationship is indeed linear (in the parameters)
The standard deviation of y, σ, is the same for all values of x
The response y varies Normally
around its mean at a given x
![Page 21: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/21.jpg)
Using residual plots to check for regression validityThe residuals (y – ŷ) give useful information about the contribution of
individual data points to the overall pattern of scatter.
We view the residuals in
a residual plot:
If residuals are scattered randomly around 0 with uniform variation, it
indicates that the data fit a linear model, have Normally distributed
residuals for each value of x, and constant standard deviation σ.
![Page 22: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/22.jpg)
Residuals are randomly scattered
good!
Curved pattern
the shape of the relationship is
not modeled correctly.
Change in variability across plot
σ not equal for all values of x.
![Page 23: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/23.jpg)
Frog mating call
The data are a random sample of frogs.
The relationship is clearly linear.
The residuals are roughly Normally distributed.
The spread of the residuals around 0 is fairly homogenous along all values of x.
![Page 24: Lecture 10 Chapter 23. Inference for regression](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814903550346895db63a7a/html5/thumbnails/24.jpg)
Objectives (PSLS Chapter 23)
Inference for regression
(NHST Regression Inference Award)[B level award] Note Change
The regression model
Confidence interval for the regression slope β
Testing the hypothesis of no linear relationship
Inference for prediction
Conditions for inference