MODEL BUILDING IN REGRESSION MODELS



Transcript of MODEL BUILDING IN REGRESSION MODELS

Page 1: MODEL BUILDING IN REGRESSION MODELS

MODEL BUILDING IN REGRESSION MODELS

Page 2: MODEL BUILDING IN REGRESSION MODELS

Model Building and Multicollinearity

• Suppose we have five factors that we feel could linearly affect y. If all 5 are included we have:

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε

• But while the p-value for the F-test (Significance F) might be small, one or more (if not all) of the p-values for the individual t-tests may be large.

• Question: Which factors make up the “best” model?
– This is called model building.

Page 3: MODEL BUILDING IN REGRESSION MODELS

Model Building

• There are many approaches to model building.
– Eliminating some (or all) of the variables with high p-values is one approach.

• Forward stepwise regression “builds” the model by adding one variable at a time.

• Modified F-tests can be used to test whether a certain subset of the variables should be included in the model.

Page 4: MODEL BUILDING IN REGRESSION MODELS

The Stepwise Regression Approach

• y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε

• Step 1: Run five simple linear regressions:
– y = β0 + β1x1
– y = β0 + β2x2
– y = β0 + β3x3
– y = β0 + β4x4
– y = β0 + β5x5

• Check the p-value for each.
– Note: for simple linear regression, Significance F = p-value for the t-test.

• Suppose the model with x4 has the lowest p-value (< α). Then x4 enters the model.
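The note that Significance F equals the t-test p-value in simple linear regression follows from the identity F = t² when there is a single predictor. The sketch below checks that identity numerically. The data are made up purely for illustration and the function name is hypothetical; none of it comes from the slides.

```python
import math

def simple_regression_t_and_f(x, y):
    """Return (t, F) for the slope of a one-predictor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression sum of squares
    mse = sse / (n - 2)                 # error degrees of freedom = n - 2
    t = b1 / math.sqrt(mse / sxx)       # t-statistic for the slope
    f = ssr / mse                       # F-statistic (MSR = SSR / 1)
    return t, f

# Made-up illustrative data, not taken from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.8, 6.1, 8.4, 9.6, 12.2]
t, f = simple_regression_t_and_f(x, y)
print(math.isclose(t * t, f, rel_tol=1e-9))
```

Because the square of a t variable with n-2 degrees of freedom follows an F distribution with 1 and n-2 degrees of freedom, the two tests give the same p-value.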

Page 5: MODEL BUILDING IN REGRESSION MODELS

Stepwise Regression

• Step 2: Run four 2-variable linear regressions, each including x4. Check Significance F and the p-values for:
– y = β0 + β4x4 + β1x1
– y = β0 + β4x4 + β2x2
– y = β0 + β4x4 + β3x3
– y = β0 + β4x4 + β5x5

• Suppose the model with x4 and x3 has the lowest p-values (all < α). Add x3.

Page 6: MODEL BUILDING IN REGRESSION MODELS

Stepwise Regression

• Step 3: Run three 3-variable linear regressions:
– y = β0 + β3x3 + β4x4 + β1x1
– y = β0 + β3x3 + β4x4 + β2x2
– y = β0 + β3x3 + β4x4 + β5x5

• Suppose none of these models has all p-values < α. STOP: the best model is the one with x3 and x4 only.
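The stepwise procedure can be sketched as a short loop. This is a minimal illustration under stated assumptions, not a definitive implementation. The callback `fit_pvalues` is hypothetical: it stands in for whatever tool runs the regression (for example, Excel's Regression output) and returns a p-value for each variable in the model. Ranking candidate models by the p-value of the newly added variable is also an assumption, since the slides only say "lowest p-values". The p-values in `toy_pvalues` are made up.

```python
def forward_stepwise(candidates, fit_pvalues, alpha=0.05):
    """Add one variable at a time while some candidate model has all p-values < alpha."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best_var, best_p = None, None
        for var in remaining:
            pvals = fit_pvalues(selected + [var])   # p-value for every variable in the model
            if all(p < alpha for p in pvals.values()):
                # Assumption: rank qualifying models by the new variable's p-value.
                if best_p is None or pvals[var] < best_p:
                    best_var, best_p = var, pvals[var]
        if best_var is None:
            break                                   # no model has all p-values < alpha: STOP
        selected.append(best_var)
        remaining.remove(best_var)
    return selected

def toy_pvalues(variables):
    """Hypothetical stand-in for a regression run; the p-values are made up.
    In real data the p-values change as other variables enter the model."""
    made_up = {"x1": 0.20, "x2": 0.30, "x3": 0.02, "x4": 0.001, "x5": 0.40}
    return {v: made_up[v] for v in variables}

print(forward_stepwise(["x1", "x2", "x3", "x4", "x5"], toy_pvalues))
```

With these made-up p-values the loop adds x4, then x3, and then stops, mirroring Steps 1 to 3 above.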

Page 7: MODEL BUILDING IN REGRESSION MODELS

Example

Page 8: MODEL BUILDING IN REGRESSION MODELS

Regression on 5 Variables

Page 9: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 1-Variable Tests

Page 10: MODEL BUILDING IN REGRESSION MODELS

Performing Tests With More Than One Variable

• Remember: the range for X must be contiguous.

• Use CUT and INSERT CUT CELLS to arrange the X columns so that they are next to each other.

Page 11: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 2-Variable Tests

Page 12: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 3-Variable Tests

Page 13: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 4-Variable Tests

Page 14: MODEL BUILDING IN REGRESSION MODELS

Best Model

• The best model is the three-variable model that includes x1, x4, and x5.

ŷ = -2782.66 + 130.5134x1 + 931.9743x4 + 21.36134x5

Page 15: MODEL BUILDING IN REGRESSION MODELS

TESTING PARTS OF THE MODEL

• Sometimes we wish to decide whether to keep a set of variables “as a group” or eliminate them from the model.
– Example: A model might include 3 dummy variables to account for how the dependent variable is affected by the season (or quarter) of the year.
– We will either keep all of the season variables or keep none.

• The general approach is to assess how much “extra value” these additional variables add to the model.
– The approach is a modified F-test.

Page 16: MODEL BUILDING IN REGRESSION MODELS

Approach: Compare Two Models –The Full Model and The Reduced Model

• Suppose a model consists of p variables and we wish to consider whether or not to keep a set of p-q of those p variables in the model.

• Two models:
– Full model: p variables
– Reduced model: q variables

• For notational convenience, assume the last p-q of the p variables are the ones that would be eliminated.

• A sample of size n is taken.

Page 17: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Test

• Modified F-Test:

H0: βq+1 = βq+2 = … = βp = 0

HA: At least one of these p-q β’s ≠ 0

• This is an F-test of the form:

Reject H0 (Accept HA) if: F > Fα,p-q,n-p-1

– p-q = number of variables considered for elimination
– n-p-1 = degrees of freedom for the error term of the full model

Page 18: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Statistic

• For this model, the F-statistic is defined by:

F = (Mean reduction in the squared errors)/(Mean square error of the full model) = ((SSEReduced – SSEFull)/(p-q))/MSEFull
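As a small sketch of the arithmetic behind the modified F-statistic, the function below computes F together with its two degrees of freedom from the error sums of squares of the full and reduced models. The function name is hypothetical and the sums of squares in the usage lines are made up so the arithmetic is easy to follow; they are not the values from the housing example.

```python
def modified_f_statistic(sse_reduced, sse_full, p, q, n):
    """Modified (partial) F-statistic for dropping p - q of the p variables."""
    df_num = p - q                 # number of variables considered for elimination
    df_den = n - p - 1             # error degrees of freedom of the full model
    mse_full = sse_full / df_den   # mean square error of the full model
    f_stat = ((sse_reduced - sse_full) / df_num) / mse_full
    return f_stat, df_num, df_den

# Made-up sums of squares, chosen only for easy arithmetic.
f_stat, df_num, df_den = modified_f_statistic(
    sse_reduced=150.0, sse_full=100.0, p=5, q=3, n=38
)
print(f_stat, df_num, df_den)   # 8.0 2 32
```

H0 is rejected when the returned statistic exceeds the critical value of the F distribution with the returned numerator and denominator degrees of freedom.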

Page 19: MODEL BUILDING IN REGRESSION MODELS

Example

• A housing price model (full model) is proposed for homes in Laguna Hills that takes into account p = 5 factors:
– House size, lot size, age, whether or not there is a pool, # bedrooms

• A reduced model that takes into account only the first three of these (q = 3) was discussed earlier.

• Based on a sample of n = 38 sales, can we conclude that adding these p-q = 2 additional variables (Pool, # Bedrooms) is significant?

Page 20: MODEL BUILDING IN REGRESSION MODELS
Page 21: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Test For This Example

• Modified F-Test:

H0: β4 = β5 = 0

HA: At least one of β4 and β5 ≠ 0

For α = .05, the test is

Reject H0 (Accept HA) if: F > F.05,2,32

F.05,2,32 can be generated in Excel by FINV(.05,2,32) = 3.29.
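Outside Excel, this particular critical value can be checked by hand because the numerator degrees of freedom equal 2. In that special case the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom, so the critical value can be solved for directly. The sketch below uses that shortcut; the function name is hypothetical, and the shortcut does not apply to other numerator degrees of freedom, where a statistics library or table would be needed.

```python
def f_critical_two_numerator_df(alpha, df_den):
    """Upper-alpha critical value of F(2, df_den).

    Solves (1 + 2 f / df_den) ** (-df_den / 2) = alpha for f.
    Valid only when the numerator degrees of freedom equal 2.
    """
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

critical = f_critical_two_numerator_df(0.05, 32)
print(round(critical, 2))   # 3.29
```

The result is about 3.2945, which agrees with the FINV value quoted above to four decimal places.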

Page 22: MODEL BUILDING IN REGRESSION MODELS

Full Model

(Excel regression output for the full model, with SSEFull, MSEFull, and DFEFull labeled.)

Page 23: MODEL BUILDING IN REGRESSION MODELS

Reduced Model

(Excel regression output for the reduced model, with SSEReduced labeled.)

Page 24: MODEL BUILDING IN REGRESSION MODELS

The Partial F-Test

• F-statistic: =((G3-C13)/2)/D13
– G3 holds the SSE from the reduced-model output worksheet

• Critical value: =FINV(.05,2,B13)

Page 25: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Statistic

• For this model, the modified F-statistic is:

F = ((SSEReduced – SSEFull)/(p-q))/MSEFull = 21.43522834

• The critical value of F = F.05,2,32 = 3.29453087

• 21.43522834 > 3.29453087

There is enough evidence to conclude that including Pool and Bedrooms is significant.

Page 26: MODEL BUILDING IN REGRESSION MODELS

Review

• Stepwise regression helps determine a “best model” from a series of possible independent variables (x’s).
– Step 1: Run one-variable regressions. If any p-value is < α, keep the variable with the lowest p-value in the model.
– Step 2: Run 2-variable regressions. One of the two variables in each model is the one determined in Step 1. Keep the model with the lowest p-values if both are < α.
– Repeat with 3, 4, 5 variables, etc., until no model has all p-values < α.

• Modified F-test for testing the significance of parts of the model
– Compare F to Fα,p-q,DFE(Full), where

F = ((SSEReduced – SSEFull)/(# terms removed))/MSEFull