Economics 173 Business Statistics Lecture 19 Fall, 2001© Professor J. Petry
http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/
2
The Regression Analysis Process
• A generalized procedure for combining art and science
– Develop a model that has a sound basis.
  • Theoretical and practical inputs into the model basis
– Gather data for the variables in the model.
  • Gather data for the dependent and independent variables
– Draw the scatter diagram to determine whether a linear model (or some other form) appears to be appropriate.
– Obtain the model coefficients and statistics using statistical software.
3
The Regression Analysis Process
• A generalized procedure for regression analysis
– Assess the model fit and usefulness using the model statistics.
  • Does the model hold promise? (overall model evaluation)
  • Do our variables make sense? (significance, signs)
– Diagnose violations of the required conditions. Try to remedy problems when they are identified.
– Reassess the model fit and usefulness using the model statistics.
– If the model passes the assessment tests, use it to interpret the coefficients and generate predictions.
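The "obtain the model coefficients" step is normally done by software such as Excel. As a minimal sketch, assuming NumPy is available, the same least-squares coefficients can be computed directly. The function name fit_ols and the toy data (generated exactly from y = 2 + 3x1 - x2, so the fit is perfect) are hypothetical illustrations, not part of the lecture.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical toy data built exactly from y = 2 + 3*x1 - 1*x2 (no noise).
X = [[1, 0], [2, 1], [3, 5], [4, 2], [5, 7]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
print(fit_ols(X, y))  # recovers b0 = 2, b1 = 3, b2 = -1 up to floating-point rounding
```

With real data the fit is not exact, and the statistics on the following slides (s_ε, R², F, t) are what tell us whether the estimated model is any good.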
4
Example 18.1 Where to locate a new motor inn?
– La Quinta Motor Inns is planning an expansion.
– Management wishes to predict which sites are likely to be profitable.
– Several predictors of profitability that can be identified include:
  • Competition
  • Market awareness
  • Demand generators
  • Demographics
  • Physical quality
5
Figure: Profitability and the variables used to measure each of its predictors
• Competition: ROOMS, the number of hotel/motel rooms within 3 miles of the site
• Market awareness: NEAREST, the distance to the nearest La Quinta inn
• Demand generators: OFFICE (office space) and COLLEGE (college enrollment)
• Demographics: INCOME, the median household income
• Physical: DISTTWN, the distance to downtown
6
– Data were collected from 100 randomly selected La Quinta inns and used to estimate the following suggested model:

$$\text{MARGIN} = \beta_0 + \beta_1\,\text{ROOMS} + \beta_2\,\text{NEAREST} + \beta_3\,\text{OFFICE} + \beta_4\,\text{COLLEGE} + \beta_5\,\text{INCOME} + \beta_6\,\text{DISTTWN} + \varepsilon$$

First six observations:

INN  MARGIN  ROOMS  NEAREST  OFFICE  COLLEGE  INCOME  DISTTWN
1    55.5    3203   0.1      549     8        37      12.1
2    33.8    2810   1.5      496     17.5     39      0.4
3    49      2890   1.9      254     20       39      12.2
4    31.9    3422   1        434     15.5     36      2.7
5    57.4    2687   3.4      678     15.5     32      7.9
6    49      3759   1.4      635     19       41      4
7
• Excel output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.724611
R Square            0.525062
Adjusted R Square   0.49442
Standard Error      5.512084
Observations        100

ANOVA
            df   SS         MS         F          Significance F
Regression   6   3123.832   520.6387   17.13581   3.03E-13
Residual    93   2825.626   30.38307
Total       99   5949.458

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   72.45461       7.893104         9.179483   1.11E-14   56.78049    88.12874
ROOMS       -0.00762       0.001255         -6.06871   2.77E-08   -0.01011    -0.00513
NEAREST     -1.64624       0.632837         -2.60136   0.010803   -2.90292    -0.38955
OFFICE      0.019766       0.00341          5.795594   9.24E-08   0.012993    0.026538
COLLEGE     0.211783       0.133428         1.587246   0.115851   -0.05318    0.476744
INCOME      -0.41312       0.139552         -2.96034   0.003899   -0.69025    -0.136
DISTTWN     0.225258       0.178709         1.260475   0.210651   -0.12962    0.580138

This is the sample regression equation (sometimes called the prediction equation):

$$\text{MARGIN} = 72.455 - 0.008\,\text{ROOMS} - 1.646\,\text{NEAREST} + 0.02\,\text{OFFICE} + 0.212\,\text{COLLEGE} - 0.413\,\text{INCOME} + 0.225\,\text{DISTTWN}$$

Assessing this equation
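To see how the prediction equation is used, the sketch below plugs the predictor values of inn 1 in the sample data (MARGIN 55.5, ROOMS 3203, NEAREST 0.1, OFFICE 549, COLLEGE 8, INCOME 37, DISTTWN 12.1) into the coefficients as printed in the Excel output. The names COEFFICIENTS, INTERCEPT, and predict_margin are hypothetical, and choosing inn 1 is only an illustration.

```python
# Coefficients as printed in the Excel output for Example 18.1 (La Quinta).
COEFFICIENTS = {
    "ROOMS": -0.00762,
    "NEAREST": -1.64624,
    "OFFICE": 0.019766,
    "COLLEGE": 0.211783,
    "INCOME": -0.41312,
    "DISTTWN": 0.225258,
}
INTERCEPT = 72.45461


def predict_margin(site):
    """Evaluate the sample regression equation for one site's predictor values."""
    return INTERCEPT + sum(b * site[name] for name, b in COEFFICIENTS.items())


# Predictor values of inn 1 in the sample data (its observed MARGIN is 55.5).
inn1 = {"ROOMS": 3203, "NEAREST": 0.1, "OFFICE": 549,
        "COLLEGE": 8, "INCOME": 37, "DISTTWN": 12.1}
predicted = predict_margin(inn1)
print(round(predicted, 2))         # 47.87
print(round(55.5 - predicted, 2))  # 7.63 (residual = observed - predicted)
```

The gap between the observed margin and the prediction is that inn's residual; the standard error of estimate summarizes the typical size of these residuals across all 100 inns.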
8
• Standard error of estimate
– We need to estimate the standard error of estimate:

$$s_\varepsilon = \sqrt{\frac{SSE}{n-k-1}}$$

– Compare s_ε to the mean value of y.
  • From the printout, Standard Error = 5.5121.
  • Calculating the mean value of y, we have ȳ = 45.739.
– It seems s_ε is not particularly small.
– Can we conclude the model does not fit the data well?
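The standard error of estimate can be recomputed from the ANOVA figures (SSE = 2825.626, n = 100, k = 6). The sketch below, whose helper name is hypothetical, also expresses s_ε as a fraction of ȳ = 45.739, which is the comparison the slide suggests. The ratio comes out at roughly 12%.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """s_e = sqrt(SSE / (n - k - 1)) for n observations and k independent variables."""
    return math.sqrt(sse / (n - k - 1))


s_e = standard_error_of_estimate(sse=2825.626, n=100, k=6)  # SSE from the ANOVA table
y_bar = 45.739                                               # mean MARGIN from the slide
print(round(s_e, 4))   # 5.5121, matching "Standard Error" in the printout
print(s_e / y_bar)     # about 0.12
```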
9
• Coefficient of determination
– The definition is

$$R^2 = 1 - \frac{SSE}{SST}$$

– From the printout, R² = 0.5251.
– 52.51% of the variation in the measure of profitability is explained by the linear regression model formulated above.
– When adjusted for degrees of freedom, Adjusted R² = 1 - [SSE/(n-k-1)] / [SST/(n-1)] = 49.44%.
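Both versions of the coefficient of determination follow directly from SSE and SST in the ANOVA table. A minimal sketch (the function names are hypothetical) reproduces the printout's R Square and Adjusted R Square:

```python
def r_squared(sse, sst):
    """Coefficient of determination: R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """R^2 adjusted for degrees of freedom: 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


SSE, SST, n, k = 2825.626, 5949.458, 100, 6  # from the ANOVA table for Example 18.1
print(round(r_squared(SSE, SST), 4))                 # 0.5251
print(round(adjusted_r_squared(SSE, SST, n, k), 4))  # 0.4944
```

The adjustment divides each sum of squares by its degrees of freedom, so adding a predictor that barely reduces SSE can lower the adjusted value even though R² itself never decreases.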
10
• Testing the validity of the model: The F Test
– We pose the question:
Is there at least one independent variable linearly related to the dependent variable?
– To answer the question, we test the hypotheses

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
$$H_1: \text{At least one } \beta_i \text{ is not equal to zero.}$$

– If at least one βi is not equal to zero, the model is valid.
11
• To test these hypotheses, we perform an analysis of variance procedure.
• The F test
– Construct the F statistic

$$F = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n-k-1}$$

– Rejection region: $F > F_{\alpha,k,n-k-1}$
SST = SSR + SSE. A large F results from a large SSR, which means that much of the variation in y is explained by the regression model. The null hypothesis should then be rejected; thus, the model is valid.
Required conditions must be satisfied.
12
Example 18.1 - continued
• Excel provides the following ANOVA results

ANOVA
            df   SS         MS         F          Significance F
Regression   6   3123.832   520.6387   17.13581   3.03382E-13
Residual    93   2825.626   30.38307
Total       99   5949.458

Here SSR = 3123.832, SSE = 2825.626, MSR = 520.6387, MSE = 30.38307, and F = MSR/MSE = 17.13581.
13
$F_{\alpha,k,n-k-1} = F_{0.05,6,100-6-1} = F_{0.05,6,93} \approx 2.20$, and F = 17.14 > 2.20.
Also, the p-value (Significance F) = 3.03382 × 10^-13.
Clearly, α = 0.05 > 3.03382 × 10^-13, and the null hypothesis is rejected.
Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the βi is not equal to zero. Thus, at least one independent variable is linearly related to y. This linear regression model is valid.
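The F test for Example 18.1 can be reproduced from the sums of squares in the ANOVA table. This is a minimal sketch with a hypothetical helper name; the critical value is hardcoded as an approximate F-table value for 6 and 93 degrees of freedom (an assumption). If SciPy is available, scipy.stats.f.ppf(0.95, 6, 93) gives the exact critical value and scipy.stats.f.sf(F, 6, 93) the p-value.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE, with MSR = SSR/k and MSE = SSE/(n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


F = f_statistic(ssr=3123.832, sse=2825.626, n=100, k=6)
F_CRITICAL = 2.20  # approximate F_{0.05,6,93} from an F table (assumption)

print(round(F, 2))      # 17.14, matching the Excel ANOVA output
print(F > F_CRITICAL)   # True: F falls in the rejection region, so H0 is rejected
```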
14
• Remember our Armani Pizza example from simple linear regression?
– Compare the p-values from the F-test and the t-test.
– In simple linear regression, the test of the validity of the overall model and the test of the slope of our single independent variable should (and do!) provide identical results.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.963366334
R Square            0.928074693
Adjusted R Square   0.91908403
Standard Error      28.13339035
Observations        10

ANOVA
            df   SS            MS            F             Significance F
Regression   1   81702.49878   81702.49878   103.2264983   7.5384E-06
Residual     8   6331.90122    791.4876525
Total        9   88034.4

               Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%
Intercept      -23.49273678   28.85565265      -0.814146783   0.439120548   -90.0340341   43.04856
X Variable 1   10.89424753    1.072263792      10.16004421    7.53839E-06   8.42160119    13.36689
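The identical p-values are no coincidence: with a single independent variable, the F statistic equals the square of the slope's t statistic. A quick check with the numbers from the Armani Pizza output:

```python
# Values from the Armani Pizza simple-regression output.
t_slope = 10.16004421   # t Stat for X Variable 1
F = 103.2264983         # F from the ANOVA table

# With one predictor, F = t**2, so both tests share the p-value 7.5384E-06.
print(round(t_slope ** 2, 4))  # 103.2265
print(round(F, 4))             # 103.2265
```

With two or more predictors this equality no longer holds, which is why the multiple regression model needs both the overall F test and a separate t test for each coefficient.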
15
• Do the Variables Make Sense? Interpreting the coefficients
– b0 = 72.5. This is the intercept, the value of y when all the variables take the value zero. Since the data range of the independent variables does not cover the value zero, do not interpret the intercept.
– b1 = -.0076. In this model, for each additional 1000 rooms within 3 miles of the La Quinta inn, the operating margin decreases on average by 7.6% (assuming the other variables are held constant).
16
– b2 = -1.65. In this model, for each additional mile between the La Quinta inn and the nearest other La Quinta inn, the average operating margin decreases by 1.65%.
– b3 = .02. For each additional 1000 sq-ft of office space, the average operating margin increases by .02%.
– b4 = .21. For each additional thousand students, MARGIN increases by .21%.
– b5 = -.41. For each additional $1000 of median household income, MARGIN decreases by .41%.
– b6 = .23. For each additional mile to the downtown center, MARGIN increases by .23% on average.
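Because each slope is the change in MARGIN per unit of its predictor with the others held constant, the predicted effect of any set of changes is the sum of b_i × Δx_i. In the sketch below, the helper margin_change and the combined scenario are hypothetical illustrations; units follow the interpretations above (rooms, 1000 sq-ft of office space, $1000 of income).

```python
# Coefficients as printed in the Excel output for Example 18.1.
b = {"ROOMS": -0.00762, "NEAREST": -1.64624, "OFFICE": 0.019766,
     "COLLEGE": 0.211783, "INCOME": -0.41312, "DISTTWN": 0.225258}


def margin_change(changes):
    """Change in predicted MARGIN when only the named predictors change,
    all others held constant: the sum of b_i * delta_x_i."""
    return sum(b[name] * delta for name, delta in changes.items())


print(round(margin_change({"ROOMS": 1000}), 2))  # -7.62: 1000 more rooms within 3 miles
print(round(margin_change({"INCOME": 1}), 2))    # -0.41: $1000 more median household income
# Hypothetical combined change: 1000 more rooms and 100 more units (1000 sq-ft each) of OFFICE.
print(round(margin_change({"ROOMS": 1000, "OFFICE": 100}), 2))  # -5.64
```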
17
• Testing the coefficients
– The hypotheses for each βi:

$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$

– Test statistic:

$$t = \frac{b_i - \beta_i}{s_{b_i}}, \qquad \text{d.f.} = n - k - 1$$

– Excel printout

            Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept   72.45461       7.893104         9.179483   1.11E-14   56.78048735    88.12874
ROOMS       -0.00762       0.001255         -6.06871   2.77E-08   -0.010110582   -0.00513
NEAREST     -1.64624       0.632837         -2.60136   0.010803   -2.902924523   -0.38955
OFFICE      0.019766       0.00341          5.795594   9.24E-08   0.012993085    0.026538
COLLEGE     0.211783       0.133428         1.587246   0.115851   -0.053178229   0.476744
INCOME      -0.41312       0.139552         -2.96034   0.003899   -0.690245235   -0.136
DISTTWN     0.225258       0.178709         1.260475   0.210651   -0.12962198    0.580138
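The t statistic for each coefficient is its estimate divided by its standard error (under H0, βi = 0). The sketch below applies a two-tailed test at α = 0.05 to every slope in Example 18.1. The critical value is hardcoded as an approximate table value for 93 degrees of freedom (an assumption); with SciPy, scipy.stats.t.ppf(0.975, 93) gives it exactly. Because the printed coefficients are rounded, the computed t values differ slightly from Excel's t Stat column (for ROOMS about -6.07 versus -6.06871).

```python
# (coefficient, standard error) pairs from the Excel output for Example 18.1.
estimates = {
    "ROOMS": (-0.00762, 0.001255),
    "NEAREST": (-1.64624, 0.632837),
    "OFFICE": (0.019766, 0.00341),
    "COLLEGE": (0.211783, 0.133428),
    "INCOME": (-0.41312, 0.139552),
    "DISTTWN": (0.225258, 0.178709),
}
T_CRITICAL = 1.986  # approximate t_{0.025,93} from a t table (assumption)


def t_statistic(b_i, s_bi, beta_i=0.0):
    """t = (b_i - beta_i) / s_{b_i}; under H0, beta_i = 0."""
    return (b_i - beta_i) / s_bi


for name, (b_i, s_bi) in estimates.items():
    t = t_statistic(b_i, s_bi)
    verdict = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
    print(f"{name:8s} t = {t:7.3f}  {verdict}")
```

The verdicts agree with the P-value column: ROOMS, NEAREST, OFFICE, and INCOME are significant at the 5% level, while COLLEGE (p = 0.116) and DISTTWN (p = 0.211) are not.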
18
• Example – Vacation Homes (18.1)
– A developer who specializes in summer cottage properties is looking at a lakeside tract of land for possible development.
– She wants to estimate the selling price of the individual lots.
– She knows from experience that the sale price depends on lot size, number of mature trees, and distance to the lake.
– She gathers data on 60 nearby lots that have sold recently and conducts a regression analysis, with the following results:
19
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.4924143
R Square            0.2424719
Adjusted R Square   0.20189
Standard Error      40.243529
Observations        60

ANOVA
            df   SS            MS           F          Significance F
Regression   3   29029.71625   9676.57208   5.974883   0.001315371
Residual    56   90694.33308   1619.54166
Total       59   119724.0493

            Coefficients   Standard Error   t Stat        P-value    Lower 95%      Upper 95%
Intercept   51.391216      23.51650385      2.18532554    0.033064   4.282029664    98.5004
Lot size    0.6999045      0.558855319      1.25238937    0.215633   -0.419616528   1.819425
Trees       0.6788131      0.229306132      2.96029204    0.0045     0.219458042    1.138168
Distance    -0.3783608     0.195236549      -1.9379609    0.057676   -0.769466342   0.012745
20
• Example – Vacation Homes (18.1)
1. What is the standard error of estimate? Interpret its value.
2. What is the coefficient of determination? What does this statistic tell you?
3. What is the coefficient of determination, adjusted for degrees of freedom? Why does this value differ from the coefficient of determination? What does this tell you about the model?
=========================================================
1. Test the overall validity of the model. What does the p-value of the test statistic tell you?
2. Interpret each of the coefficients.
3. Test to determine whether each of the independent variables is linearly related to the price of the lot.