STATS 330: Lecture 6. Inference for the Regression... (9/14/2015)
04/21/23 330 Lecture 6 1
STATS 330: Lecture 6
Inference for the Regression model
Aim of today’s lecture:
To discuss how we assess the significance of variables in the regression
Key concepts:
• Standard errors
• Confidence intervals for the coefficients
• Tests of significance
Reference: Coursebook Section 3.2
Variability of the regression coefficients
Imagine that we keep the x’s fixed, but resample the errors and refit the plane. How much would the plane (estimated coefficients) change?
This gives us an idea of the variability (accuracy) of the estimated coefficients as estimates of the coefficients of the true regression plane.
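The thought experiment above can be tried directly by simulation. The following is only a sketch: the sample size, the "true" plane and the error standard deviation are made-up values chosen for illustration, not anything from the course data.

```r
# Sketch: keep the x's fixed, resample the errors, refit, and see how much
# the estimated coefficients change. All numbers here are illustrative.
set.seed(330)
n  <- 50
x1 <- runif(n, 0, 10)          # explanatory variables, generated once and kept fixed
x2 <- runif(n, 0, 10)
beta  <- c(1, 2, -0.5)         # assumed "true" plane (intercept, x1, x2)
sigma <- 2                     # assumed error standard deviation

one_refit <- function() {
  # new errors, same x's
  y <- beta[1] + beta[2] * x1 + beta[3] * x2 + rnorm(n, sd = sigma)
  coef(lm(y ~ x1 + x2))
}

est <- t(replicate(1000, one_refit()))   # 1000 x 3 matrix of estimates
sim_sd <- apply(est, 2, sd)              # spread of each coefficient over refits

# Theoretical standard deviations: sigma * sqrt(diag((X'X)^-1))
X <- cbind(1, x1, x2)
theory_sd <- sigma * sqrt(diag(solve(t(X) %*% X)))

print(round(rbind(simulated = sim_sd, theoretical = theory_sd), 3))
```

The simulated spread of each coefficient should be close to the theoretical value; this spread is the quantity that the standard error estimates from a single data set.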
The regression model (cont)
The data is scattered above and below the plane:
Size of “sticks” is random, controlled by $\sigma^2$, doesn’t depend on x1, x2
[Figure: data points scattered above and below the regression plane; axes Y, X1, X2]
Variability of coefficients (2)
Variability depends on
• The arrangement of the x’s (the more correlation, the more change, see Lecture 8)
• The error variance (the more scatter about the true plane, the more the fitted plane changes)
Measure variability by the standard error of the coefficients
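Both ingredients appear in the usual formula for the standard errors, the square roots of the diagonal of $s^2 (X^T X)^{-1}$: the matrix $(X^T X)^{-1}$ reflects the arrangement of the x's and $s^2$ estimates the error variance. The sketch below checks this against `lm`. It assumes that R's built-in `trees` data (columns `Girth`, `Height`, `Volume`) can stand in for the course's `cherry.df`, with `Girth` playing the role of diameter.

```r
# Assumption: the built-in 'trees' data stands in for cherry.df
fit <- lm(Volume ~ Girth + Height, data = trees)

X  <- model.matrix(fit)                        # design matrix, including the constant
s2 <- sum(resid(fit)^2) / df.residual(fit)     # residual mean square
se_manual <- sqrt(diag(s2 * solve(t(X) %*% X)))

se_lm <- coef(summary(fit))[, "Std. Error"]    # what summary() reports
print(cbind(se_manual, se_lm))
```

The two columns should agree, which shows where the Std. Error column of the regression summary comes from.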
Cherries
Call:
lm(formula = volume ~ diameter + height, data = cherry.df)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
diameter      4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-Squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16

Standard errors of coefficients (the Std. Error column)
Confidence interval
Confidence interval (2)
A 95% confidence interval for a regression coefficient is of the form
Estimated coefficient +/- standard error × t
where t is the 97.5% point of the appropriate t-distribution. The degrees of freedom are n-k-1, where n = number of cases (observations) in the regression, and k is the number of variables (assuming we have a constant term).
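As a rough illustration, the interval can be built by hand from the estimates, the standard errors and the t quantile. This sketch again assumes the built-in `trees` data as a stand-in for `cherry.df`.

```r
# Assumption: the built-in 'trees' data stands in for cherry.df
fit <- lm(Volume ~ Girth + Height, data = trees)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))       # standard errors of the coefficients
n   <- nrow(trees)                 # number of cases
k   <- 2                           # number of variables (plus a constant term)
tcrit <- qt(0.975, df = n - k - 1) # 97.5% point of t with n-k-1 df

ci_manual <- cbind(lower = est - tcrit * se,
                   upper = est + tcrit * se)
print(ci_manual)
print(confint(fit))                # should give the same limits
```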
Example: cherry trees
Use function confint
> confint(cherry.lm)
                    2.5%       97.5%
(Intercept) -75.68226247 -40.2930554
diameter      4.16683899   5.2494820
height        0.07264863   0.6058538
(cherry.lm is the object created by lm)
Hypothesis test
Often we ask “do we need a particular variable, given the others are in the model?”
Note that this is not the same as asking “is a particular variable related to the response?”
Can test the former by examining the ratio of the coefficient to its standard error
Hypothesis test (2)
This is the t-statistic t
The bigger |t|, the more we need the variable
Equivalently, the smaller the p-value, the more we need the variable
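A minimal sketch of the calculation: divide each coefficient by its standard error, then take the two-sided tail area of the t-distribution. The built-in `trees` data is assumed here as a stand-in for `cherry.df`.

```r
# Assumption: the built-in 'trees' data stands in for cherry.df
fit <- lm(Volume ~ Girth + Height, data = trees)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))

t_stat <- est / se                                    # coefficient / standard error
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(fit)) # area in both tails

print(cbind(t_stat, p_val))
print(coef(summary(fit))[, c("t value", "Pr(>|t|)")]) # same numbers from summary()
```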
Cherries
Call:
lm(formula = volume ~ diameter + height, data = cherry.df)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
diameter      4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-Squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16

t-values: the t value column
p-values: the Pr(>|t|) column
All variables required since p-values small (<0.05)
P-value
[Figure: density curve for t with 28 degrees of freedom. The P-value is the total area in the two tails beyond -2.607 and 2.607, which is 0.0145.]
Other hypotheses
Overall significance of the regression: do none of the variables have a relationship with the response?
Use the F statistic: the bigger F, the more evidence that at least one variable has a relationship
• equivalently, the smaller the p-value, the more evidence that at least one variable has a relationship
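The overall F statistic can be pulled out of the fitted model, or rebuilt from sums of squares by comparing the fit with a constant-only fit. This is a sketch under the assumption that the built-in `trees` data stands in for `cherry.df`.

```r
# Assumption: the built-in 'trees' data stands in for cherry.df
fit <- lm(Volume ~ Girth + Height, data = trees)

fs <- summary(fit)$fstatistic      # named vector: value, numdf, dendf
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]],
                lower.tail = FALSE)

# The same F from sums of squares
y   <- trees$Volume
tss <- sum((y - mean(y))^2)        # RSS of the constant-only model
rss <- sum(resid(fit)^2)           # RSS of the fitted model
k   <- 2                           # number of variables
F_manual <- ((tss - rss) / k) / (rss / df.residual(fit))

print(c(F = F_manual, p.value = p_overall))
```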
Cherries
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
diameter      4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-Squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16

F-value and p-value: the last line of the output
Testing if a subset is required
Often we want to test if a subset of variables is unnecessary
Terminology:
Full model: model with all the variables
Sub-model: model with a set of variables deleted.
Test is based on comparing the RSS of the submodel with the RSS of the full model. Full model RSS is always smaller (why?)
Testing if a subset is adequate (2)
If the full model RSS is not much smaller than the submodel RSS, the submodel is adequate: we don’t need the extra variables.
To do the test, we
• Fit both models, get RSS for both.
• Calculate test statistic (see next slide)
• If the test statistic is large (equivalently, the p-value is small), the submodel is not adequate
Test statistic
Test statistic is
$$F = \frac{(RSS_{SUB} - RSS_{FULL})/d}{s^2}$$
• d is the number of variables dropped
• $s^2$ is the estimate of $\sigma^2$ from the full model (the residual mean square)
• R has a function anova to do the calculations
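To see that `anova` computes exactly this statistic, the F value can be formed by hand and compared. The sketch below assumes the built-in `trees` data as a stand-in for `cherry.df`; the choice of submodel (dropping `Height`) is only for illustration.

```r
# Assumption: the built-in 'trees' data stands in for cherry.df
full <- lm(Volume ~ Girth + Height, data = trees)
sub  <- lm(Volume ~ Girth, data = trees)   # illustrative submodel: Height dropped

rss_full <- sum(resid(full)^2)
rss_sub  <- sum(resid(sub)^2)
d  <- df.residual(sub) - df.residual(full) # number of variables dropped
s2 <- rss_full / df.residual(full)         # residual mean square, full model

F_manual <- ((rss_sub - rss_full) / d) / s2

tab <- anova(sub, full)                    # the same test done by R
print(tab)
print(F_manual)
```

The F column in the second row of the `anova` table should match `F_manual`.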
P-values
When the smaller model is correct, the test statistic has an F-distribution with d and n-k-1 degrees of freedom
We assess if the value of F calculated from the sample is a plausible value from this distribution by means of a p-value
If the p-value is too small, we reject the hypothesis that the submodel is ok
P-values (cont)
[Figure: density of the F-distribution with 2 and 16 df (x-axis: F, y-axis: density). The p-value is the area under the curve to the right of the observed value of F.]
Example
Free fatty acid data: use physical measures to model a biochemical parameter in overweight children
Variables are
• FFA: free fatty acid level in blood (response)
• Age (months)
• Weight (pounds)
• Skinfold thickness (inches)
Data
ffa    age  weight  skinfold
0.759  105  67      0.96
0.274  107  70      0.52
0.685  100  54      0.62
0.526  103  60      0.76
0.859   97  61      1.00
0.652  101  62      0.74
0.349   99  71      0.76
1.120  101  48      0.62
1.059  107  59      0.56
1.035  100  51      0.44
… 20 observations in all
Analysis (1)
> model.full <- lm(ffa~age+weight+skinfold,data=fatty.df)
> summary(model.full)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.95777    1.40138   2.824  0.01222 *
age         -0.01912    0.01275  -1.499  0.15323
weight      -0.02007    0.00613  -3.274  0.00478 **
skinfold    -0.07788    0.31377  -0.248  0.80714

This suggests that age is not required if weight and skinfold are retained, and that skinfold is not required if weight and age are retained.
Can we get away with just weight?
Analysis (2)
> model.sub <- lm(ffa~weight,data=fatty.df)
> anova(model.sub,model.full)
Analysis of Variance Table

Model 1: ffa ~ weight
Model 2: ffa ~ age + weight + skinfold
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     18 0.91007
2     16 0.79113  2   0.11895 1.2028 0.3261
Small F, large p-value suggest weight alone is adequate. But test should be interpreted with caution, as we “pretested”
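The F and p-value in the table can be reproduced from the two RSS values and their degrees of freedom alone. The sketch below uses only the rounded numbers printed in the anova table, so it should agree with the table up to rounding.

```r
# Numbers copied from the anova table above (rounded as printed)
rss_sub  <- 0.91007; df_sub  <- 18   # Model 1: ffa ~ weight
rss_full <- 0.79113; df_full <- 16   # Model 2: ffa ~ age + weight + skinfold

d  <- df_sub - df_full               # number of variables dropped
s2 <- rss_full / df_full             # residual mean square of the full model

F_stat <- ((rss_sub - rss_full) / d) / s2
p_val  <- pf(F_stat, d, df_full, lower.tail = FALSE)  # upper tail area

print(round(c(F = F_stat, p.value = p_val), 4))
```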
Testing a combination of coefficients
Cherry trees: Our model is $V = c\,D^{\beta_1} H^{\beta_2}$
or
$\log(V) = \beta_0 + \beta_1 \log(D) + \beta_2 \log(H)$
Dimension analysis suggests $\beta_1 + \beta_2 = 3$
How can we test this? Test statistic is
$$t = \frac{\hat\beta_1 + \hat\beta_2 - 3}{se(\hat\beta_1 + \hat\beta_2)}$$
P value is area under t-curve beyond +/- t
Testing a combination (cont)
We can use the “R330” function test.lc to compute the value of t:
> cherry.lm = lm(log(volume)~log(diameter)+log(height),data=cherry.df)
> cc = c(0,1,1)
> c = 3
> test.lc(cherry.lm,cc,c)
$est
[1] 3.099773
$std.err
[1] 0.1765222
$t.stat
[1] 0.5652165
$df
[1] 28
$p.val
[1] 0.5764278
The “R330 package”
A set of functions written for the course, in the form of an R package
Install the package using the R packages menu (see coursebook for details). Then type
library(R330)
Testing a combination (cont)
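In general, we might want to test $c_0\beta_0 + c_1\beta_1 + c_2\beta_2 = c$
(in our example $c_0 = 0$, $c_1 = 1$, $c_2 = 1$, $c = 3$)
Estimate is
$$c_0\hat\beta_0 + c_1\hat\beta_1 + c_2\hat\beta_2$$
Test statistic is
$$t = \frac{c_0\hat\beta_0 + c_1\hat\beta_1 + c_2\hat\beta_2 - c}{s.e.(c_0\hat\beta_0 + c_1\hat\beta_1 + c_2\hat\beta_2)}$$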
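For readers without the R330 package, the same quantities can be sketched in base R. The helper `lin_comb_test` below is a hypothetical name written for this illustration, not the package's `test.lc`. It gets the standard error of the combination from the coefficient covariance matrix returned by `vcov`, and it assumes the built-in `trees` data stands in for `cherry.df`.

```r
# Hypothetical helper, not part of R330: test c0*b0 + c1*b1 + ... = rhs
lin_comb_test <- function(fit, cc, rhs = 0) {
  est <- sum(cc * coef(fit))                      # estimated combination
  se  <- sqrt(drop(t(cc) %*% vcov(fit) %*% cc))   # its standard error
  t_stat <- (est - rhs) / se
  df <- df.residual(fit)
  list(est = est, std.err = se, t.stat = t_stat, df = df,
       p.val = 2 * pt(-abs(t_stat), df))          # two-sided p-value
}

# Assumption: the built-in 'trees' data stands in for cherry.df
fit <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
res <- lin_comb_test(fit, cc = c(0, 1, 1), rhs = 3)
print(res)
```

If `trees` does match the course data, the result should be close to the test.lc output shown earlier: a large p-value, so the data are consistent with the coefficients summing to 3.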