Module 35M F Test

TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Transcript of Module 35M F Test

Page 1: Module 35M F Test

8/12/2019 Module 35M F Test

http://slidepdf.com/reader/full/module-35m-f-test 1/25

TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Page 2: Module 35M F Test

Test 1: Are Any of the x’s Useful in Predicting y?

We are asking: Can we conclude that at least one of the β’s (other than β0) ≠ 0?

H0: β1 = β2 = β3 = β4 = 0
HA: At least one of these β’s ≠ 0

α = .05

Page 3: Module 35M F Test

Idea of the Test

• Measure the overall “average variability” due to changes in the x’s

• Measure the overall “average variability” that is due to randomness (error)

• If the overall “average variability” due to changes in the x’s IS A LOT LARGER than the “average variability” due to error, we conclude at least one β is non-zero, i.e. at least one factor (x) is useful in predicting y

Page 4: Module 35M F Test

“Total Variability”

• Just like with simple linear regression, we have the total sum of squares due to regression, SSR, and the total sum of squares due to error, SSE, which are printed on the Excel output.

– The formulas are more complicated (they involve matrix operations)

Page 5: Module 35M F Test

“Average Variability”

• “Average variability” (mean variability) for a group is defined as the total variability divided by the degrees of freedom associated with that group:

• Mean Squares Due to Regression: MSR = SSR/DFR

• Mean Squares Due to Error: MSE = SSE/DFE

Page 6: Module 35M F Test

Degrees of Freedom

• Total number of degrees of freedom, DF(Total), always = n − 1

• Degrees of freedom for regression (DFR) = the number of factors in the regression (i.e. the number of x’s in the linear regression)

• Degrees of freedom for error (DFE) = difference between the two = DF(Total) − DFR
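
The degrees-of-freedom bookkeeping and the two mean squares can be sketched in a few lines of Python. The function name `mean_squares` and the sample numbers are hypothetical, chosen only for illustration; in practice SSR and SSE would be read from the Excel output.

```python
def mean_squares(n, k, ssr, sse):
    """Return (DFR, DFE, MSR, MSE) for a regression with n observations and k x's."""
    df_total = n - 1        # DF(Total) = n - 1
    dfr = k                 # DFR = number of x's
    dfe = df_total - dfr    # DFE = DF(Total) - DFR
    msr = ssr / dfr         # MSR = SSR / DFR
    mse = sse / dfe         # MSE = SSE / DFE
    return dfr, dfe, msr, mse


# Hypothetical numbers: 24 observations, 3 x's, SSR = 900, SSE = 400
dfr, dfe, msr, mse = mean_squares(n=24, k=3, ssr=900.0, sse=400.0)
print(dfr, dfe, msr, mse)
```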

Page 7: Module 35M F Test

The F-Statistic

• The F-statistic is defined as the ratio of two measures of variability. Here,

F = MSR/MSE

• Recall we are saying that if MSR is “large” compared to MSE, at least one β ≠ 0.

• Thus if F is “large”, we draw the conclusion that HA is true, i.e. at least one β ≠ 0.

Page 8: Module 35M F Test

The F-test

• “Large” compared to what?

F-tables give critical values for given values of α, DFR, and DFE

• TEST: REJECT H0 (Accept HA) if:

F = MSR/MSE > Fα,DFR,DFE
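
As a rough sketch of the decision rule, the comparison against the table value might look like this in Python. The function name `f_test` and the mean squares are hypothetical; the critical value 3.10 is the tabled F.05,3,20 (α = .05, DFR = 3, DFE = 20), looked up by hand as the slide describes.

```python
def f_test(msr, mse, f_critical):
    """Return (F, reject_h0), where F = MSR / MSE is compared with a table value."""
    f_stat = msr / mse
    return f_stat, f_stat > f_critical


# Hypothetical mean squares; 3.10 is the F-table value for alpha = .05, DFR = 3, DFE = 20
f_stat, reject_h0 = f_test(msr=300.0, mse=20.0, f_critical=3.10)
print(f_stat, reject_h0)
```

With these made-up numbers F is far above the critical value, so H0 would be rejected.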

Page 9: Module 35M F Test

RESULTS

• If we do not get a large F statistic

– We cannot conclude that any of the variables in this model are significant in predicting y.

• If we do get a large F statistic

– We can conclude at least one of the variables is significant for predicting y.

– NATURAL QUESTION: WHICH ONES?

Page 10: Module 35M F Test

[Excel output, annotated with the following labels]

DFR = number of x’s
DFE = Total DF − DFR
Total DF = n − 1
SSR
SSE
Total SS = Σ(yi − ȳ)²
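
To make the three sums of squares concrete, here is a minimal Python sketch. The helper `sums_of_squares` and the data are hypothetical, and a single x is used only to keep the example short; for a least-squares fit with an intercept, the same split Total SS = SSR + SSE holds with several x’s.

```python
def sums_of_squares(y, fitted):
    """Return (Total SS, SSR, SSE) from observed y values and fitted values."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # Total SS
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # due to regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # due to error
    return sst, ssr, sse


# Hypothetical data; 2.2 + 0.6x is the least-squares line for these five points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * xi for xi in x]

sst, ssr, sse = sums_of_squares(y, fitted)
print(sst, ssr, sse)
```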

Page 13: Module 35M F Test

Test 2: Which Variables Are Significant IN THIS MODEL?

• The question we are asking is, “taking all the other factors (x’s) into consideration, does a change in a particular x (x3, say) value significantly affect y?”

• This is another hypothesis test (a t-test).

• To test if the age of the house is significant:

H0: β3 = 0 (x3 is not significant in this model)
HA: β3 ≠ 0 (x3 is significant in this model)

Page 14: Module 35M F Test

The t-test for a particular factor IN THIS MODEL

• Reject H0 (Accept HA) if:

t = (β̂3 − 0) / s(β̂3) > t.025,DFE or t < −t.025,DFE

where s(β̂3) is the standard error of β̂3
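
The two-sided rule can be sketched as follows. The function name `t_test`, the coefficient estimate, and its standard error are hypothetical values for illustration; 2.086 is the tabled t.025,20 and would change with DFE.

```python
def t_test(estimate, std_error, t_critical):
    """Return (t, reject_h0) for H0: beta = 0 against a two-sided alternative."""
    t_stat = (estimate - 0) / std_error
    # Reject if t falls in either tail: t > t_critical or t < -t_critical
    return t_stat, abs(t_stat) > t_critical


# Hypothetical estimate and standard error; 2.086 is the t-table value for .025 and DFE = 20
t_stat, reject_h0 = t_test(estimate=-1.5, std_error=0.6, t_critical=2.086)
print(t_stat, reject_h0)
```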

Page 15: Module 35M F Test

t-value for test of β3 = 0

p-value for test of β3 = 0

Page 16: Module 35M F Test

Reading Printout for the t-test

• Simply look at the p-value

– p-value for β3 = 0 is .02194 < .05

• Thus the age of the house is significant in this model

• The other variables

– p-value for β1 = 0 is .0000839 < .05

• Thus square feet is significant in this model

– p-value for β2 = 0 is .15503 > .05

• Thus the land (acres) is not significant in this model
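
The same reading of the printout, written as a short Python sketch. The p-values are the ones quoted on this slide; the dictionary layout and variable names are just one hypothetical way to organize them.

```python
ALPHA = 0.05

# p-values quoted on the slide for the three individual t-tests
p_values = {
    "square feet": 0.0000839,
    "land (acres)": 0.15503,
    "age": 0.02194,
}

# A variable is significant in this model when its p-value is below alpha
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, flag in significant.items():
    verdict = "significant" if flag else "not significant"
    print(f"{name}: {verdict} in this model")
```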

Page 17: Module 35M F Test

Does A Poor t-value Imply the Variable is not Useful in Predicting y?

• NO

• It says the variable is not significant IN THIS MODEL when we consider all the other factors.

• In this model, land is not significant when included with square footage and age.

But if we had run this model without square footage, we would have gotten the output on the next slide.

Page 18: Module 35M F Test

p-value for land is .00000717.

In this model, land is significant.

Page 19: Module 35M F Test

Can it even happen that F says at least one variable is significant, but none of the t’s indicate a useful variable?

• YES

EXAMPLES IN WHICH THIS MIGHT HAPPEN:

– Miles per gallon vs. horsepower and engine size

– Salary vs. GPA and GPA in major

– Income vs. age and experience

– HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND

• There is a relation between the x’s: multicollinearity
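
One informal way to spot a relation between two x’s is to compute their sample correlation. The sketch below assumes hypothetical data for two predictors and a hand-written helper `pearson_r`; a correlation near +1 or −1 only hints at multicollinearity and is not a formal test.

```python
from math import sqrt


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / sqrt(s_aa * s_bb)


# Hypothetical predictors that move almost in lockstep
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
print(round(r, 4))
```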

Page 22: Module 35M F Test

What is Adjusted R²?

• Adjusted R² adjusts R² to take into account degrees of freedom.

• By assuming a higher order equation for y, we can force the curve to fit this one set of data points in the model, eliminating much of the variability (see next slide).

• But this is not what is going on! R² might be higher, but adjusted R² might be much lower

• Adjusted R² takes this into account

• Adjusted R² = 1 − MSE/MST, where MST = SST/(n − 1) and SST is the total sum of squares
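
A small Python sketch of the comparison, assuming the usual textbook definitions R² = 1 − SSE/SST and adjusted R² = 1 − (SSE/DFE)/(SST/(n − 1)) with DFE = n − k − 1. The function names and all numbers are hypothetical; they are chosen so that adding a second term barely reduces SSE.

```python
def r_squared(sst, sse):
    """R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sst, sse, n, k):
    """Adjusted R^2 = 1 - MSE/MST, with MSE = SSE/(n-k-1) and MST = SST/(n-1)."""
    mse = sse / (n - k - 1)
    mst = sst / (n - 1)
    return 1 - mse / mst


# Hypothetical: n = 5 points, SST = 6; one x gives SSE = 2.4, two x's give SSE = 2.3
r2_one, adj_one = r_squared(6.0, 2.4), adjusted_r_squared(6.0, 2.4, n=5, k=1)
r2_two, adj_two = r_squared(6.0, 2.3), adjusted_r_squared(6.0, 2.3, n=5, k=2)

print(r2_one, adj_one)
print(r2_two, adj_two)
```

In this made-up case R² rises slightly when the extra term is added while adjusted R² falls, which is the pattern the slide warns about.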

Page 23: Module 35M F Test

[Figure: Scatterplot, Sales vs Ad Dollars. x-axis: Ad Dollars ($0 to $1,400); y-axis: Sales ($0 to $140,000)]

This is not what is really going on

Page 24: Module 35M F Test

Review

• Are any of the x’s useful in predicting y IN THIS MODEL?

– Look at p-value for F-test: Significance F

– F = MSR/MSE would be compared to Fα,DFR,DFE

• Which variables are significant in this model?

– Look at p-values for the individual t-tests

• What proportion of the total variance in y can be explained by changes in the x’s?

– R²

– Adjusted R² takes into account the reduced degrees of freedom for the error term caused by including more terms in the model

Page 25: Module 35M F Test

4 Places to Look on Excel Printout

1- Regression equation

2- Significance F: Are any variables useful?

3- p-values for t-tests: Which variables are significant in this model?

4- R²: What proportion of y can be explained by changes in x?