Economics 173 Business Statistics Lecture 19 Fall, 2001© Professor J. Petry


http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

The Regression Analysis Process

• A generalized procedure for combining art and science
  – Develop a model that has a sound basis.
      • Theoretical and practical inputs into model basis
  – Gather data for the variables in the model.
      • Gather data for dependent and independent variables
  – Draw the scatter diagram to determine whether a linear model (or other forms) appears to be appropriate.
  – Obtain the model coefficients and statistics using statistical computer software.

The Regression Analysis Process

• A generalized procedure for regression analysis
  – Assess the model fit and usefulness using the model statistics.
      • Does the model hold promise? (overall model evaluation)
      • Do the variables make sense? (significance, signs)
  – Diagnose violations of required conditions. Try to remedy problems when identified.
  – If the model passes the assessment tests, use it to interpret the coefficients and generate predictions.

Example 18.1 Where to locate a new motor inn?

  – La Quinta Motor Inns is planning an expansion.
  – Management wishes to predict which sites are likely to be profitable.
  – Several predictors of profitability can be identified, including:
      • Competition
      • Market awareness
      • Demand generators
      • Demographics
      • Physical quality

Profitability
  • Competition
      – Rooms: number of hotel/motel rooms within 3 miles of the site.
  • Market awareness
      – Nearest: distance to the nearest La Quinta inn.
  • Demand generators
      – Office space
      – College enrollment
  • Demographics
      – Income: median household income.
  • Physical
      – Disttown: distance to downtown.

  – Data were collected from 100 randomly selected inns that belong to La Quinta, and a regression was run for the following suggested model:

    $\text{Margin} = \beta_0 + \beta_1\,\text{Rooms} + \beta_2\,\text{Nearest} + \beta_3\,\text{Office} + \beta_4\,\text{College} + \beta_5\,\text{Income} + \beta_6\,\text{Disttwn} + \varepsilon$

INN   MARGIN   ROOMS   NEAREST   OFFICE   COLLEGE   INCOME   DISTTWN
1     55.5     3203    0.1       549      8         37       12.1
2     33.8     2810    1.5       496      17.5      39       0.4
3     49       2890    1.9       254      20        39       12.2
4     31.9     3422    1         434      15.5      36       2.7
5     57.4     2687    3.4       678      15.5      32       7.9
6     49       3759    1.4       635      19        41       4

• Excel output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.724611
R Square            0.525062
Adjusted R Square   0.49442
Standard Error      5.512084
Observations        100

ANOVA
             df   SS         MS         F          Significance F
Regression   6    3123.832   520.6387   17.13581   3.03E-13
Residual     93   2825.626   30.38307
Total        99   5949.458

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   72.45461       7.893104         9.179483   1.11E-14   56.78049    88.12874
ROOMS       -0.00762       0.001255         -6.06871   2.77E-08   -0.01011    -0.00513
NEAREST     -1.64624       0.632837         -2.60136   0.010803   -2.90292    -0.38955
OFFICE      0.019766       0.00341          5.795594   9.24E-08   0.012993    0.026538
COLLEGE     0.211783       0.133428         1.587246   0.115851   -0.05318    0.476744
INCOME      -0.41312       0.139552         -2.96034   0.003899   -0.69025    -0.136
DISTTWN     0.225258       0.178709         1.260475   0.210651   -0.12962    0.580138

This is the sample regression equation (sometimes called the prediction equation):

MARGIN = 72.455 - 0.008 ROOMS - 1.646 NEAREST + 0.02 OFFICE + 0.212 COLLEGE - 0.413 INCOME + 0.225 DISTTWN
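As a rough illustration of how the prediction equation is used, the sketch below plugs one inn's values into the fitted equation. The coefficients are the unrounded values from the Excel printout, and Inn 1 from the data table is the input. The helper name `predict_margin` and the dictionary layout are assumptions made for this example only.

```python
# Coefficients copied from the Excel printout (unrounded values).
COEFFICIENTS = {
    "INTERCEPT": 72.45461,
    "ROOMS": -0.00762,
    "NEAREST": -1.64624,
    "OFFICE": 0.019766,
    "COLLEGE": 0.211783,
    "INCOME": -0.41312,
    "DISTTWN": 0.225258,
}


def predict_margin(site):
    """Return the predicted operating margin for one site (hypothetical helper)."""
    margin = COEFFICIENTS["INTERCEPT"]
    for name, value in site.items():
        margin += COEFFICIENTS[name] * value
    return margin


# Inn 1 from the data table.
inn_1 = {"ROOMS": 3203, "NEAREST": 0.1, "OFFICE": 549,
         "COLLEGE": 8, "INCOME": 37, "DISTTWN": 12.1}

predicted = predict_margin(inn_1)
residual = 55.5 - predicted  # observed MARGIN for Inn 1 minus the prediction
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

The gap between the observed and predicted margin for a single inn is one residual; the statistics on the following slides summarize such residuals over all 100 inns.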

Assessing this equation

• Standard error of estimate
  – We need to estimate the standard error of estimate:

    $s_\varepsilon = \sqrt{\dfrac{SSE}{n - k - 1}}$

  – Compare $s_\varepsilon$ to the mean value of y.
      • From the printout, Standard Error = 5.5121
      • Calculating the mean value of y, we have $\bar{y} = 45.739$
  – It seems $s_\varepsilon$ is not particularly small.
  – Can we conclude the model does not fit the data well?
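The comparison above can be checked by hand. The following is a minimal sketch that recomputes the standard error of estimate from the ANOVA figures in the printout and expresses it relative to the mean of y; the variable names are illustrative, not part of the Excel output.

```python
import math

# Values read from the Excel printout for Example 18.1.
sse = 2825.626   # residual sum of squares (SSE)
n = 100          # number of observations
k = 6            # number of independent variables
y_bar = 45.739   # mean of MARGIN, as given on the slide

s_e = math.sqrt(sse / (n - k - 1))  # standard error of estimate
ratio = s_e / y_bar                 # size of s relative to the mean of y

print(f"standard error of estimate = {s_e:.4f}")
print(f"s / y_bar = {ratio:.3f}")
```

The ratio comes out at roughly 12% of the mean margin, which is why the slide describes the standard error as "not particularly small".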

• Coefficient of determination
  – The definition is

    $R^2 = 1 - \dfrac{SSE}{SST}$

  – From the printout, $R^2 = 0.5251$.
  – 52.51% of the variation in the measure of profitability is explained by the linear regression model formulated above.
  – When adjusted for degrees of freedom,

    $\text{Adjusted } R^2 = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} = 49.44\%$
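Both figures can be reproduced from the sums of squares in the ANOVA table. This is a small sketch under the assumption that SSE and SST are taken from the Residual and Total rows of the printout; the variable names are illustrative.

```python
# Sums of squares from the ANOVA table of the Excel printout.
sse = 2825.626   # Residual SS
sst = 5949.458   # Total SS
n = 100          # observations
k = 6            # independent variables

r_squared = 1 - sse / sst
adjusted_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(f"R^2          = {r_squared:.4f}")
print(f"Adjusted R^2 = {adjusted_r_squared:.4f}")
```

The adjusted value is lower because each sum of squares is divided by its degrees of freedom, which penalizes the model for using six independent variables.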

• Testing the validity of the model: The F Test
  – We pose the question:
      Is there at least one independent variable linearly related to the dependent variable?
  – To answer the question we test the hypotheses
      $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
      $H_1$: At least one $\beta_i$ is not equal to zero.
  – If at least one $\beta_i$ is not equal to zero, the model is valid.

• To test these hypotheses we perform an analysis of variance procedure.
• The F test
  – Construct the F statistic

    $F = \dfrac{MSR}{MSE}$, where $MSR = SSR/k$ and $MSE = SSE/(n-k-1)$

  – Rejection region

    $F > F_{\alpha,\,k,\,n-k-1}$

  – SST = SSR + SSE. A large F results from a large SSR. Then, much of the variation in y is explained by the regression model. The null hypothesis should be rejected; thus, the model is valid.
  – Required conditions must be satisfied.

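Example 18.1 - continued

• Excel provides the following ANOVA results

[ANOVA table from the Excel output, with callouts: the SS column gives SSR (Regression row) and SSE (Residual row); the MS column gives MSR and MSE; the F column gives MSR/MSE.]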
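The callouts can be traced through with the numbers from the Excel printout for Example 18.1. This is a minimal sketch, with illustrative variable names, of how the F statistic is built from the sums of squares.

```python
# Sums of squares from the Excel ANOVA output for Example 18.1.
ssr = 3123.832   # Regression SS
sse = 2825.626   # Residual SS
n = 100          # observations
k = 6            # independent variables

sst = ssr + sse            # total variation in y
msr = ssr / k              # mean square for regression
mse = sse / (n - k - 1)    # mean square for error
f_stat = msr / mse

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
```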

  $F_{\alpha,\,k,\,n-k-1} = F_{0.05,\,6,\,100-6-1} = 2.17$
  $F = 17.14 > 2.17$

  Also, the p-value (Significance F) = $3.03382 \times 10^{-13}$.
  Clearly, $\alpha = 0.05 > 3.03382 \times 10^{-13}$, and the null hypothesis is rejected.

Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one independent variable is linearly related to y. This linear regression model is valid.
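The two routes to the conclusion (critical value and p-value) can be written as two comparisons. This sketch simply restates the slide's numbers; the critical value 2.17 is the table value quoted on the slide, not something computed here.

```python
alpha = 0.05
f_stat = 17.14          # F statistic from the ANOVA table (rounded)
f_critical = 2.17       # table value quoted on the slide
p_value = 3.03382e-13   # Significance F from the printout

reject_by_critical_value = f_stat > f_critical
reject_by_p_value = p_value < alpha

print("Reject H0 (critical value rule):", reject_by_critical_value)
print("Reject H0 (p-value rule):      ", reject_by_p_value)
```

Both rules always agree, since the p-value is below alpha exactly when F falls in the rejection region.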

• Remember our Armani Pizza example from simple linear regression?
  – Compare the p-values from the F-test and the t-test.
  – In simple linear regression, testing the validity of the overall model and testing the slope of our single independent variable should (and do!) provide identical results.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.963366334
R Square            0.928074693
Adjusted R Square   0.91908403
Standard Error      28.13339035
Observations        10

               Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      -23.49273678    28.85565265      -0.814146783   0.439120548   -90.0340341    43.04856
X Variable 1   10.89424753     1.072263792      10.16004421    7.53839E-06   8.42160119     13.36689
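One way to see why the two tests agree: with a single independent variable, the F statistic equals the square of the slope's t statistic, so the two p-values coincide. The sketch below checks this with the printout values, computing F once from t and once from R Square. The ANOVA table for this example is not reproduced here, so no printed F value is assumed.

```python
# Values from the Armani Pizza printout.
t_slope = 10.16004421    # t Stat for X Variable 1
r_squared = 0.928074693  # R Square
n = 10                   # Observations

# Route 1: in simple linear regression, F = t^2.
f_from_t = t_slope ** 2

# Route 2: F = (SSR / 1) / (SSE / (n - 2)) = R^2 / (1 - R^2) * (n - 2).
f_from_r_squared = r_squared / (1 - r_squared) * (n - 2)

print(f"F from t^2 = {f_from_t:.4f}")
print(f"F from R^2 = {f_from_r_squared:.4f}")
```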

• Do the Variables Make Sense?
  – Interpreting the coefficients
  – b0 = 72.5. This is the intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept.
  – b1 = -.0076. In this model, for each additional 1000 rooms within 3 miles of the La Quinta inn, the operating margin decreases on average by 7.6% (assuming the other variables are held constant).

  – b2 = -1.65. In this model, for each additional mile between the nearest competitor and the La Quinta inn, the average operating margin decreases by 1.65%.
  – b3 = .02. For each additional 1000 sq-ft of office space, the average operating margin increases by .02%.
  – b4 = .21. For each additional thousand students, MARGIN increases by .21%.
  – b5 = -.41. For each additional $1000 of median household income, MARGIN decreases by .41%.
  – b6 = .23. For each additional mile to the downtown center, MARGIN increases by .23% on average.
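Each interpretation above is a coefficient multiplied by a change in one variable, with the others held constant. The sketch below makes that arithmetic explicit. The unit sizes (1000 rooms, one mile, one unit of OFFICE, COLLEGE and INCOME, each read as thousands) follow the slide's wording and are otherwise assumptions about how the data are scaled.

```python
# Unrounded slope coefficients from the Excel printout.
coefficients = {
    "ROOMS": -0.00762,
    "NEAREST": -1.64624,
    "OFFICE": 0.019766,
    "COLLEGE": 0.211783,
    "INCOME": -0.41312,
    "DISTTWN": 0.225258,
}

# Change in each variable used in the slide's interpretation.
changes = {
    "ROOMS": 1000,   # 1000 additional rooms within 3 miles
    "NEAREST": 1,    # one additional mile
    "OFFICE": 1,     # one additional unit (1000 sq-ft per the slide)
    "COLLEGE": 1,    # one additional unit (1000 students per the slide)
    "INCOME": 1,     # one additional unit ($1000 per the slide)
    "DISTTWN": 1,    # one additional mile to downtown
}

# Expected change in MARGIN, other variables held constant.
effects = {name: coefficients[name] * changes[name] for name in coefficients}

for name, effect in effects.items():
    print(f"{name:8s} change of {changes[name]:>4}: MARGIN changes by {effect:+.2f}")
```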

• Testing the coefficients
  – The hypotheses for each $\beta_i$:
      $H_0: \beta_i = 0$
      $H_1: \beta_i \neq 0$
  – Test statistic:
      $t = \dfrac{b_i - \beta_i}{s_{b_i}}$, with d.f. = n - k - 1
  – Excel printout

            Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept   72.45461       7.893104         9.179483   1.11E-14   56.78048735    88.12874
ROOMS       -0.00762       0.001255         -6.06871   2.77E-08   -0.010110582   -0.00513
NEAREST     -1.64624       0.632837         -2.60136   0.010803   -2.902924523   -0.38955
OFFICE      0.019766       0.00341          5.795594   9.24E-08   0.012993085    0.026538
COLLEGE     0.211783       0.133428         1.587246   0.115851   -0.053178229   0.476744
INCOME      -0.41312       0.139552         -2.96034   0.003899   -0.690245235   -0.136
DISTTWN     0.225258       0.178709         1.260475   0.210651   -0.12962198    0.580138
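The t statistics in the printout can be approximately recovered as coefficient divided by standard error (with $\beta_i = 0$ under the null hypothesis). The sketch below does this and flags which variables are significant at an assumed $\alpha = 0.05$ using the printed p-values. Small differences from the printed t Stat values arise because the printed coefficients are rounded.

```python
ALPHA = 0.05  # assumed significance level

# (coefficient, standard error, p-value) from the Excel printout.
printout = {
    "ROOMS":   (-0.00762, 0.001255, 2.77e-08),
    "NEAREST": (-1.64624, 0.632837, 0.010803),
    "OFFICE":  (0.019766, 0.00341, 9.24e-08),
    "COLLEGE": (0.211783, 0.133428, 0.115851),
    "INCOME":  (-0.41312, 0.139552, 0.003899),
    "DISTTWN": (0.225258, 0.178709, 0.210651),
}

# t = (b_i - 0) / s_{b_i} under H0: beta_i = 0.
t_stats = {name: b / se for name, (b, se, _) in printout.items()}

# A variable is linearly related to MARGIN if its p-value is below alpha.
significant = [name for name, (_, _, p) in printout.items() if p < ALPHA]

for name, t in t_stats.items():
    print(f"{name:8s} t = {t:7.3f}")
print("Significant at 5%:", significant)
```

On these figures ROOMS, NEAREST, OFFICE and INCOME are significant at the 5% level, while COLLEGE and DISTTWN are not.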

• Example – Vacation Homes (18.1)
  – A developer who specializes in summer cottage properties is looking at a lakeside tract of land for possible development.
  – She wants to estimate the selling price for the individual lots.
  – She knows from experience that sale price depends upon lot size, number of mature trees, and distance to the lake.
  – She gathers data on 60 nearby lots which have sold recently, and conducts a regression analysis, with the following results:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.4924143
R Square            0.2424719
Adjusted R Square   0.20189
Standard Error      40.243529
Observations        60

ANOVA
             df   SS            MS           F          Significance F
Regression   3    29029.71625   9676.57208   5.974883   0.001315371
Residual     56   90694.33308   1619.54166
Total        59   119724.0493

            Coefficients   Standard Error   t Stat       P-value    Lower 95%      Upper 95%
Intercept   51.391216      23.51650385      2.18532554   0.033064   4.282029664    98.5004
Lot size    0.6999045      0.558855319      1.25238937   0.215633   -0.419616528   1.819425
Trees       0.6788131      0.229306132      2.96029204   0.0045     0.219458042    1.138168
Distance    -0.3783608     0.195236549      -1.9379609   0.057676   -0.769466342   0.012745

• Example – Vacation Homes (18.1)
  1. What is the standard error of the estimate? Interpret its value.
  2. What is the coefficient of determination? What does this statistic tell you?
  3. What is the coefficient of determination, adjusted for degrees of freedom? Why does this value differ from the coefficient of determination? What does this tell you about the model?
  =========================================================
  1. Test the overall validity of the model. What does the p-value of the test statistic tell you?
  2. Interpret each of the coefficients.
  3. Test to determine whether each of the independent variables is linearly related to the price of the lot.
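As a starting point for these questions, the summary statistics in the vacation-homes printout can be recomputed from its ANOVA table. This is a sketch only: the variable names are illustrative, $\alpha = 0.05$ is an assumed significance level, and the interpretation of each number is left to the reader.

```python
import math

ALPHA = 0.05  # assumed significance level

# ANOVA figures from the vacation-homes printout.
ssr = 29029.71625   # Regression SS
sse = 90694.33308   # Residual SS
n = 60              # observations
k = 3               # independent variables
sst = ssr + sse     # Total SS

s_e = math.sqrt(sse / (n - k - 1))                       # standard error of estimate
r_squared = ssr / sst                                    # coefficient of determination
adj_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
f_stat = (ssr / k) / (sse / (n - k - 1))                 # overall F statistic

# P-values of the individual t-tests, from the coefficient table.
p_values = {"Lot size": 0.215633, "Trees": 0.0045, "Distance": 0.057676}
significant = [name for name, p in p_values.items() if p < ALPHA]

print(f"s = {s_e:.4f}, R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"F = {f_stat:.4f}")
print("Significant at 5%:", significant)
```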
