Multiple Regression
Chapter 18
18.1 Introduction
• In this chapter we extend the simple linear regression model, and allow for any number of independent variables.
• We expect to build a model that fits the data better than the simple linear regression model.
• We will use computer printout to
  – Assess the model
    • How well it fits the data
    • Is it useful?
    • Are any required conditions violated?
  – Employ the model
    • Interpreting the coefficients
    • Predictions using the prediction equation
    • Estimating the expected value of the dependent variable
18.2 Model and Required Conditions

• We allow for k independent variables to potentially be related to the dependent variable:

  y = β0 + β1x1 + β2x2 + … + βkxk + ε

  where y is the dependent variable, x1, …, xk are the independent variables, β0, β1, …, βk are the coefficients, and ε is the random error variable.
The simple linear regression model allows for one independent variable, "x":

  y = β0 + β1x + ε

The multiple linear regression model allows for more than one independent variable:

  y = β0 + β1x1 + β2x2 + ε

Note how the straight line becomes a plane.

[Figure: the line y = β0 + β1x in the (x, y) plane becomes the plane y = β0 + β1x1 + β2x2 over the (x1, x2) space.]
… and a parabola becomes a parabolic surface.

[Figure: the curve y = b0 + b1x² becomes the surface y = b0 + b1x1² + b2x2 over the (x1, x2) space.]
• Required conditions for the error variable ε
  – The error is normally distributed with mean equal to zero and a constant standard deviation σε (independent of the value of y). σε is unknown.
  – The errors are independent.
• These conditions are required in order to
  – estimate the model coefficients,
  – assess the resulting model.
18.3 Estimating the Coefficients and Assessing the Model

• The procedure
  – Obtain the model coefficients and statistics using statistical computer software.
  – Diagnose violations of required conditions. Try to remedy problems when identified.
  – Assess the model fit and usefulness using the model statistics.
  – If the model passes the assessment tests, use it to interpret the coefficients and generate predictions.
• Example 18.1 Where to locate a new motor inn?
  – La Quinta Motor Inns is planning an expansion.
  – Management wishes to predict which sites are likely to be profitable.
  – Several areas where predictors of profitability can be identified are:
    • Competition
    • Market awareness
    • Demand generators
    • Demographics
    • Physical quality
Profitability (Margin) and its potential predictors:
  • Competition: Rooms - number of hotel/motel rooms within 3 miles of the site.
  • Market awareness: Nearest - distance to the nearest La Quinta inn.
  • Customers: Office space; College enrollment.
  • Community: Income - median household income.
  • Physical: Disttwn - distance to downtown.
– Data were collected from 100 randomly selected inns that belong to La Quinta, and the following suggested model was run:

  Margin = β0 + β1Rooms + β2Nearest + β3Office + β4College + β5Income + β6Disttwn + ε

| INN | MARGIN | ROOMS | NEAREST | OFFICE | COLLEGE | INCOME | DISTTWN |
|-----|--------|-------|---------|--------|---------|--------|---------|
| 1 | 55.5 | 3203 | 0.1 | 549 | 8 | 37 | 12.1 |
| 2 | 33.8 | 2810 | 1.5 | 496 | 17.5 | 39 | 0.4 |
| 3 | 49 | 2890 | 1.9 | 254 | 20 | 39 | 12.2 |
| 4 | 31.9 | 3422 | 1 | 434 | 15.5 | 36 | 2.7 |
| 5 | 57.4 | 2687 | 3.4 | 678 | 15.5 | 32 | 7.9 |
| 6 | 49 | 3759 | 1.4 | 635 | 19 | 41 | 4 |
SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.724611
  R Square            0.525062
  Adjusted R Square   0.49442
  Standard Error      5.512084
  Observations        100

ANOVA
|            | df | SS       | MS       | F        | Significance F |
|------------|----|----------|----------|----------|----------------|
| Regression | 6  | 3123.832 | 520.6387 | 17.13581 | 3.03E-13       |
| Residual   | 93 | 2825.626 | 30.38307 |          |                |
| Total      | 99 | 5949.458 |          |          |                |

|           | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95% |
|-----------|--------------|----------------|----------|----------|-----------|-----------|
| Intercept | 72.45461     | 7.893104       | 9.179483 | 1.11E-14 | 56.78049  | 88.12874  |
| ROOMS     | -0.00762     | 0.001255       | -6.06871 | 2.77E-08 | -0.01011  | -0.00513  |
| NEAREST   | -1.64624     | 0.632837       | -2.60136 | 0.010803 | -2.90292  | -0.38955  |
| OFFICE    | 0.019766     | 0.00341        | 5.795594 | 9.24E-08 | 0.012993  | 0.026538  |
| COLLEGE   | 0.211783     | 0.133428       | 1.587246 | 0.115851 | -0.05318  | 0.476744  |
| INCOME    | -0.41312     | 0.139552       | -2.96034 | 0.003899 | -0.69025  | -0.136    |
| DISTTWN   | 0.225258     | 0.178709       | 1.260475 | 0.210651 | -0.12962  | 0.580138  |
• Excel output. This is the sample regression equation (sometimes called the prediction equation):

  MARGIN = 72.455 - 0.008ROOMS - 1.646NEAREST + 0.02OFFICE + 0.212COLLEGE - 0.413INCOME + 0.225DISTTWN
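The same fit can be reproduced outside Excel. Here is a minimal sketch using Python's statsmodels, assuming the 100-inn data set is available as a CSV file with the column names shown above (the file name laquinta.csv is a placeholder, not part of the original example):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file name; any table with these columns will do.
df = pd.read_csv("laquinta.csv")

X = df[["ROOMS", "NEAREST", "OFFICE", "COLLEGE", "INCOME", "DISTTWN"]]
X = sm.add_constant(X)          # adds the intercept column "const"
y = df["MARGIN"]

model = sm.OLS(y, X).fit()      # ordinary least squares
print(model.summary())          # coefficients, t stats, R Square, F, etc.
```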
Let us assess this equation.
• Standard error of estimate
  – We need to estimate the standard error of estimate σε:

    $s_\varepsilon = \sqrt{\dfrac{SSE}{n-k-1}}$

  – Compare sε to the mean value of y.
    • From the printout, Standard Error = 5.5121.
    • Calculating the mean value of y we have ȳ = 45.739.
  – It seems sε is not particularly small.
  – Can we conclude the model does not fit the data well?
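A quick worked check against the ANOVA table above (SSE = 2825.626, n - k - 1 = 93):

  $s_\varepsilon = \sqrt{2825.626/93} = \sqrt{30.383} \approx 5.512$

which matches the Standard Error reported in the printout.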
• Coefficient of determination
  – The definition is

    $R^2 = 1 - \dfrac{SSE}{\sum_i (y_i - \bar{y})^2}$

  – From the printout, R² = 0.5251.
  – 52.51% of the variation in the measure of profitability is explained by the linear regression model formulated above.
  – When adjusted for degrees of freedom,
    Adjusted R² = 1 - [SSE/(n-k-1)] / [SS(Total)/(n-1)] = 49.44%
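A worked check against the ANOVA table (SSE = 2825.626, SS(Total) = 5949.458, n = 100, k = 6):

  $R^2 = 1 - 2825.626/5949.458 \approx 0.5251$

  $\text{Adjusted } R^2 = 1 - \dfrac{2825.626/93}{5949.458/99} = 1 - \dfrac{30.383}{60.095} \approx 0.4944$

Both agree with the printout values.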
• Testing the validity of the model
  – We pose the question: Is there at least one independent variable linearly related to the dependent variable?
  – To answer the question we test the hypotheses

    H0: β1 = β2 = … = βk = 0
    H1: At least one βi is not equal to zero.

  – If at least one βi is not equal to zero, the model is valid.
• To test these hypotheses we perform an analysis of variance procedure.

• The F test
  – Construct the F statistic

    $F = \dfrac{MSR}{MSE}, \qquad MSR = \dfrac{SSR}{k}, \qquad MSE = \dfrac{SSE}{n-k-1}$

  – Rejection region: F > Fα,k,n-k-1

[Variation in y] = SSR + SSE. A large F results from a large SSR; then much of the variation in y is explained by the regression model, the null hypothesis should be rejected, and thus the model is valid. (The required conditions must be satisfied.)
• Example 18.1 - continued
• Excel provides the following ANOVA results:

ANOVA
|            | df | SS       | MS       | F        | Significance F |
|------------|----|----------|----------|----------|----------------|
| Regression | 6  | 3123.832 | 520.6387 | 17.13581 | 3.03382E-13    |
| Residual   | 93 | 2825.626 | 30.38307 |          |                |
| Total      | 99 | 5949.458 |          |          |                |

(Regression SS = SSR and Residual SS = SSE; Regression MS = MSR and Residual MS = MSE; F = MSR/MSE.)
• Example 18.1 - continued (using the ANOVA results shown above)

  Fα,k,n-k-1 = F0.05,6,100-6-1 = 2.17
  F = 17.14 > 2.17

  Also, the p-value (Significance F) = 3.03382E-13.
  Clearly, α = 0.05 > 3.03382E-13, and the null hypothesis is rejected.

Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the βi is not equal to zero. Thus, at least one independent variable is linearly related to y. This linear regression model is valid.
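As a cross-check, the rejection point and p-value can be computed directly. A minimal sketch using scipy; the inputs come from the ANOVA table above, and the critical value computed for 93 denominator degrees of freedom comes out around 2.2, slightly different from the printed-table value 2.17 quoted above:

```python
from scipy import stats

k, n = 6, 100
F = 520.6387 / 30.38307                   # MSR / MSE from the ANOVA table, about 17.14
crit = stats.f.ppf(0.95, k, n - k - 1)    # F_{0.05, 6, 93}, roughly 2.2
p_value = stats.f.sf(F, k, n - k - 1)     # the "Significance F" value, about 3e-13

print(F > crit, p_value)                  # True, and p is far below alpha = 0.05
```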
• Let us interpret the coefficients.

  – b0 = 72.5. This is the intercept, the value of y when all the variables take the value zero. Since the data range of all the independent variables does not cover the value zero, do not interpret the intercept.

  – b1 = -0.0076. In this model, for each additional 1000 rooms within 3 miles of the La Quinta inn, the operating margin decreases on average by 7.6% (assuming the other variables are held constant).
  – b2 = -1.65. In this model, for each additional mile that the nearest competitor is from the La Quinta inn, the average operating margin decreases by 1.65%.

  – b3 = 0.02. For each additional 1000 sq-ft of office space, the average operating margin increases by 0.02%.

  – b4 = 0.21. For each additional thousand students, MARGIN increases by 0.21%.

  – b5 = -0.41. For each additional $1000 increase in median household income, MARGIN decreases by 0.41%.

  – b6 = 0.23. For each additional mile to the downtown center, MARGIN increases by 0.23% on average.
• Testing the coefficients
  – The hypotheses for each βi:

    H0: βi = 0
    H1: βi ≠ 0

  – Test statistic:

    $t = \dfrac{b_i - \beta_i}{s_{b_i}}, \qquad \text{d.f.} = n - k - 1$

  – The Excel printout of the coefficients, standard errors, t statistics, p-values, and confidence limits is shown above.
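For example, a worked application of the test to ROOMS using the printout above:

  $t = \dfrac{-0.00762 - 0}{0.001255} \approx -6.07, \qquad \text{d.f.} = 93, \qquad p \approx 2.77\times10^{-8}$

so ROOMS is significantly (linearly) related to MARGIN, while COLLEGE (p = 0.116) and DISTTWN (p = 0.211) are not significant at the 5% level.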
• Using the linear regression equation
  – The model can be used by
    • producing a prediction interval for the particular value of y, for a given set of values of xi;
    • producing an interval estimate for the expected value of y, for a given set of values of xi.
  – The model can be used to learn about relationships between the independent variables xi and the dependent variable y, by interpreting the coefficients βi.
• Example 18.1 - continued. Produce predictions.
  – Predict the MARGIN of an inn at a site with the following characteristics:
    • 3815 rooms within 3 miles,
    • closest competitor 3.4 miles away,
    • 476,000 sq-ft of office space,
    • 24,500 college students,
    • $39,000 median household income,
    • 3.6 miles to the downtown center.

  MARGIN = 72.455 - 0.008(3815) - 1.646(3.4) + 0.02(476) + 0.212(24.5) - 0.413(39) + 0.225(3.6) = 37.1%
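A minimal sketch of producing the same point prediction, together with the prediction interval and the interval estimate of the expected value, continuing from the statsmodels fit sketched earlier (the object name `model` and the column layout are carried over from that sketch and are assumptions, not part of the original example):

```python
import pandas as pd

# New site, with the constant column first to match the fitted design matrix.
new_site = pd.DataFrame({
    "const": [1.0], "ROOMS": [3815], "NEAREST": [3.4], "OFFICE": [476],
    "COLLEGE": [24.5], "INCOME": [39], "DISTTWN": [3.6],
})

pred = model.get_prediction(new_site)
print(pred.summary_frame(alpha=0.05))
# 'mean' is the point prediction (about 37%); 'obs_ci_lower/upper' give the
# prediction interval for a single site, and 'mean_ci_lower/upper' give the
# interval estimate of the expected value of y.
```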
18.4 Regression Diagnostics - II

• The required conditions for the model assessment to apply must be checked.
  – Is the error variable normally distributed? Draw a histogram of the residuals.
  – Is the error variance constant? Plot the residuals versus the predicted values ŷ.
  – Are the errors independent? Plot the residuals versus the time periods.
  – Can we identify outliers?
  – Is multicollinearity a problem?
• Example 18.2 House price and multicollinearity
– A real estate agent believes that a house selling price can be predicted using the house size, number of bedrooms, and lot size.
– A random sample of 100 houses was drawn and data recorded.
– Analyze the relationship among the four variables
| Price  | Bedrooms | H Size | Lot Size |
|--------|----------|--------|----------|
| 124100 | 3 | 1290 | 3900 |
| 218300 | 4 | 2080 | 6600 |
| 117800 | 3 | 1250 | 3750 |
| … | … | … | … |
• Solution
  – The proposed model is

    PRICE = β0 + β1BEDROOMS + β2H-SIZE + β3LOTSIZE + ε

  – Excel solution:

Regression Statistics
  Multiple R          0.74833
  R Square            0.559998
  Adjusted R Square   0.546248
  Standard Error      25022.71
  Observations        100

ANOVA
|            | df | SS       | MS       | F       | Significance F |
|------------|----|----------|----------|---------|----------------|
| Regression | 3  | 7.65E+10 | 2.55E+10 | 40.7269 | 4.57E-17       |
| Residual   | 96 | 6.01E+10 | 6.26E+08 |         |                |
| Total      | 99 | 1.37E+11 |          |         |                |

|           | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95% |
|-----------|--------------|----------------|----------|----------|-----------|-----------|
| Intercept | 37717.59     | 14176.74       | 2.660526 | 0.009145 | 9576.963  | 65858.23  |
| Bedrooms  | 2306.081     | 6994.192       | 0.329714 | 0.742335 | -11577.3  | 16189.45  |
| H Size    | 74.29681     | 52.97858       | 1.402393 | 0.164023 | -30.8649  | 179.4585  |
| Lot Size  | -4.36378     | 17.024         | -0.25633 | 0.798244 | -38.1562  | 29.42862  |

The model is valid, but no variable is significantly related to the selling price!
• However, when regressing the price on each independent variable alone, it is found that each variable is strongly related to the selling price.
• Multicollinearity is the source of this problem.

Correlation matrix:

|          | Price    | Bedrooms | H Size   | Lot Size |
|----------|----------|----------|----------|----------|
| Price    | 1        |          |          |          |
| Bedrooms | 0.645411 | 1        |          |          |
| H Size   | 0.747762 | 0.846454 | 1        |          |
| Lot Size | 0.740874 | 0.83743  | 0.993615 | 1        |

• Multicollinearity causes two kinds of difficulties:
  – The t statistics appear to be too small.
  – The coefficients cannot be interpreted as "slopes".
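The near-perfect correlation between H Size and Lot Size (0.9936) is what inflates the coefficient standard errors. The slides stop at the correlation matrix, but a variance inflation factor (VIF) check is a common next step. A minimal sketch, assuming the house data sit in a CSV file (the name houses.csv is hypothetical) with the column names above:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

houses = pd.read_csv("houses.csv")            # hypothetical file name
X = sm.add_constant(houses[["Bedrooms", "H Size", "Lot Size"]])

print(houses.corr())                          # reproduces the correlation matrix above

# VIF for each predictor; values far above 10 signal severe multicollinearity.
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```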
• Remedying violations of the required conditions
– Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.
– The transformations can improve the linear relationship between the dependent variable and the independent variables.
– Many computer software systems allow us to make the transformations easily.
• A brief list of transformations
  – y' = log y (for y > 0)
    • Use when σε increases with y, or
    • use when the error distribution is positively skewed.
  – y' = y²
    • Use when σε² is proportional to E(y), or
    • use when the error distribution is negatively skewed.
  – y' = y^(1/2) (for y > 0)
    • Use when σε² is proportional to E(y).
  – y' = 1/y
    • Use when σε² increases significantly when y increases beyond some value.
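A minimal sketch of how these transformations might be applied to a response vector before refitting, using numpy (the sample values are a few quiz marks from Example 18.3 below, used purely for illustration):

```python
import numpy as np

y = np.array([20.0, 24.0, 26.0, 30.0, 32.0, 23.0, 26.0, 25.0, 32.0, 31.0])

y_log   = np.log(y)      # y' = log y: sigma_eps grows with y, or errors positively skewed
y_sq    = y ** 2         # y' = y^2: errors negatively skewed
y_sqrt  = np.sqrt(y)     # y' = y^(1/2): error variance proportional to E(y)
y_recip = 1.0 / y        # y' = 1/y: variance rises sharply beyond some value of y
```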
• Example 18.3: Analysis, diagnostics, transformations
  – A statistics professor wanted to know whether the time limit affects the marks on a quiz.
  – A random sample of 100 students was split into 5 groups.
  – Each student wrote a quiz, but each group was given a different time limit. See the data below.

Marks:
| Time | 40 | 45 | 50 | 55 | 60 |
|------|----|----|----|----|----|
|      | 20 | 24 | 26 | 30 | 32 |
|      | 23 | 26 | 25 | 32 | 31 |
|      | …  | …  | …  | …  | …  |

Analyze these results, and include diagnostics.
The model tested: MARK = β0 + β1TIME + ε

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.86254
  R Square            0.743974
  Adjusted R Square   0.741362
  Standard Error      2.304609
  Observations        100

ANOVA
|            | df | SS     | MS       | F        | Significance F |
|------------|----|--------|----------|----------|----------------|
| Regression | 1  | 1512.5 | 1512.5   | 284.7743 | 9.42E-31       |
| Residual   | 98 | 520.5  | 5.311224 |          |                |
| Total      | 99 | 2033   |          |          |                |

|           | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95% |
|-----------|--------------|----------------|----------|----------|-----------|-----------|
| Intercept | -2.2         | 1.64582        | -1.33672 | 0.184409 | -5.46608  | 1.066077  |
| Time      | 0.55         | 0.032592       | 16.87526 | 9.42E-31 | 0.485322  | 0.614678  |

This model is useful and provides a good fit.

[Figure: histogram of the standardized residuals; the errors seem to be normally distributed.]
[Figure: standardized errors vs. predicted mark.]

The standard error of estimate seems to increase with the predicted value of y.

Two transformations are used to remedy this problem:
  1. y' = log_e y
  2. y' = 1/y
Let us see what happens when a transformation is applied.

[Figure: two scatter plots against Time.
  Left: the original data, where "Mark" is a function of "Time"; e.g., the points (40, 18) and (40, 23).
  Right: the modified data, where LogMark is a function of "Time"; the same points become (40, 2.89) and (40, 3.135), since log_e 18 = 2.89 and log_e 23 = 3.135.]
The new regression analysis and the diagnostics are:

The model tested: LOGMARK = β'0 + β'1TIME + ε'

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.8783
  R Square            0.771412
  Adjusted R Square   0.769079
  Standard Error      0.084437
  Observations        100

ANOVA
|            | df | SS       | MS       | F        | Significance F |
|------------|----|----------|----------|----------|----------------|
| Regression | 1  | 2.357901 | 2.357901 | 330.7181 | 3.58E-33       |
| Residual   | 98 | 0.698705 | 0.00713  |          |                |
| Total      | 99 | 3.056606 |          |          |                |

|           | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95% |
|-----------|--------------|----------------|----------|----------|-----------|-----------|
| Intercept | 2.129582     | 0.0603         | 35.31632 | 1.51E-57 | 2.009918  | 2.249246  |
| Time      | 0.021716     | 0.001194       | 18.18566 | 3.58E-33 | 0.019346  | 0.024086  |

Predicted LogMark = 2.1295 + 0.0217Time

This model is useful and provides a good fit.
[Figure: histogram of the standardized residuals; the errors seem to be normally distributed.]

[Figure: standardized residuals vs. predicted LogMark.]

The standard errors still change with the predicted y, but the change is smaller than before.
How do we use the modified model to predict?

Let TIME = 55 minutes.

  LogMark = 2.1295 + 0.0217Time = 2.1295 + 0.0217(55) = 3.323

To find the predicted mark, take the antilog:

  antilog_e 3.323 = e^3.323 = 27.743
18.5 Regression Diagnostics - III

• The Durbin-Watson test
  – This test detects first-order autocorrelation between consecutive residuals in a time series.
  – If autocorrelation exists, the error variables are not independent.
  – The statistic is

    $d = \dfrac{\sum_{i=2}^{n}(r_i - r_{i-1})^2}{\sum_{i=1}^{n} r_i^2}$

    where r_i is the residual at time i. The range of d is 0 ≤ d ≤ 4.
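A minimal sketch of computing d from a residual series in time order; statsmodels ships an equivalent `durbin_watson` helper, and the residual values here are illustrative stand-ins, not data from the examples:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

def dw(r: np.ndarray) -> float:
    """Durbin-Watson statistic: squared successive differences of the
    residuals divided by the sum of squared residuals."""
    return float(np.sum(np.diff(r) ** 2) / np.sum(r ** 2))

r = np.array([1.2, 0.8, 1.1, -0.4, -0.9, -0.3, 0.5])   # illustrative residuals
print(dw(r), durbin_watson(r))   # the two values agree; d near 2 suggests no autocorrelation
```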
[Figure: residuals plotted over time under the two patterns.]

• Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).

• Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).
• One-tail test for positive first-order autocorrelation
  – If d < dL, there is enough evidence to show that positive first-order correlation exists.
  – If d > dU, there is not enough evidence to show that positive first-order correlation exists.
  – If d is between dL and dU, the test is inconclusive.

• One-tail test for negative first-order autocorrelation
  – If d > 4 - dL, negative first-order correlation exists.
  – If d < 4 - dU, negative first-order correlation does not exist.
  – If d falls between 4 - dU and 4 - dL, the test is inconclusive.
• Two-tail test for first-order autocorrelation
  – If d < dL or d > 4 - dL, first-order autocorrelation exists.
  – If d falls between dL and dU, or between 4 - dU and 4 - dL, the test is inconclusive.
  – If d falls between dU and 4 - dU, there is no evidence of first-order autocorrelation.

[Figure: number line for d from 0 to 4; correlation exists below dL and above 4 - dL, the test is inconclusive between dL and dU and between 4 - dU and 4 - dL, and correlation does not exist between dU and 4 - dU (with 2 at the centre).]
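The decision regions above translate directly into a small helper. A sketch; the bounds dL and dU come from a Durbin-Watson table for the chosen significance level, n, and k:

```python
def dw_two_tail(d: float, dL: float, dU: float) -> str:
    """Two-tail Durbin-Watson decision rule for first-order autocorrelation."""
    if d < dL or d > 4 - dL:
        return "first-order autocorrelation exists"
    if dU <= d <= 4 - dU:
        return "no evidence of first-order autocorrelation"
    return "test is inconclusive"

# With the values from Example 18.4 below (d = 0.5931, dL = 1.10, dU = 1.54),
# the one-tail test used there and this two-tail rule point the same way.
print(dw_two_tail(0.5931, 1.10, 1.54))
```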
• Example 18.4
  – How does the weather affect the sales of lift tickets in a ski resort?
  – Data on the past 20 years of ticket sales, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
  – The model hypothesized was

    TICKETS = β0 + β1SNOWFALL + β2TEMPERATURE + ε

  – Regression analysis yielded the following results:
SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.3464529
  R Square            0.1200296
  Adjusted R Square   0.0165037
  Standard Error      1711.6764
  Observations        20

ANOVA
|            | df | SS        | MS        | F      | Significance F |
|------------|----|-----------|-----------|--------|----------------|
| Regression | 2  | 6793798.2 | 3396899.1 | 1.1594 | 0.3372706      |
| Residual   | 17 | 49807214  | 2929836.1 |        |                |
| Total      | 19 | 56601012  |           |        |                |

|           | Coefficients | Standard Error | t Stat    | P-value | Lower 95% | Upper 95% |
|-----------|--------------|----------------|-----------|---------|-----------|-----------|
| Intercept | 8308.0114    | 903.7285       | 9.1930391 | 5E-08   | 6401.3083 | 10214.715 |
| Snowfall  | 74.593249    | 51.574829      | 1.4463111 | 0.1663  | -34.22028 | 183.40678 |
| Tempture  | -8.753738    | 19.704359      | -0.444254 | 0.6625  | -50.32636 | 32.818884 |

The model seems to be very poor:
• The fit is very low (R Square = 0.12).
• It is not valid (Significance F = 0.33).
• No variable is linearly related to Sales.

Diagnosis of the required conditions resulted in the following findings.
[Figure: residuals vs. predicted y; the error variance is constant.]

[Figure: residuals over time; the errors are not independent.]

[Figure: histogram of the residuals (the error distribution); the errors may be normally distributed.]
[Figure: the residuals plotted over time.]

Test for positive first-order autocorrelation:
n = 20, k = 2. From the Durbin-Watson table: dL = 1.10, dU = 1.54. The statistic d = 0.5931.

Conclusion: Because d < dL, there is sufficient evidence to infer that positive first-order autocorrelation exists.

The residuals (first few shown): -2793.99, -1723.23, -2342.03, -956.955, -1963.73, …
Durbin-Watson statistic: d = 0.5931.

Using the computer (Excel):
Tools > Data Analysis > Regression (check the residual option, then OK).
Tools > Data Analysis Plus > Durbin-Watson Statistic > highlight the range of the residuals from the regression run > OK.
The autocorrelation has occurred over time. Therefore, a time-dependent variable added to the model may correct the problem.

The modified regression model:

  TICKETS = β0 + β1SNOWFALL + β2TEMPERATURE + β3YEARS + ε

• All the required conditions are met for this model.
• The fit of this model is high: R² = 0.74.
• The model is useful: Significance F = 5.93E-05.
• SNOWFALL and YEARS are linearly related to ticket sales.
• TEMPERATURE is not linearly related to ticket sales.
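A minimal sketch of this remedy, assuming the twenty years of data are in a CSV file ordered by year (the file name tickets.csv and the column names are assumptions, not part of the original example):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

tickets = pd.read_csv("tickets.csv")                  # hypothetical file, one row per year
tickets["YEARS"] = np.arange(1, len(tickets) + 1)     # time index 1, 2, ..., 20

X = sm.add_constant(tickets[["SNOWFALL", "TEMPERATURE", "YEARS"]])
fit = sm.OLS(tickets["TICKETS"], X).fit()

print(fit.rsquared, fit.f_pvalue)     # should rise to about 0.74 and fall to about 5.9E-05
print(durbin_watson(fit.resid))       # re-check that the errors now look independent
```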