Multiple Regression (Chapter 13)

Transcript of Multiple Regression, Chapter 13

Page 1:

Page 2:

Multiple Regression

Chapter 13

• Multiple Regression
• Assessing Overall Fit
• Predictor Significance
• Confidence Intervals for Y
• Binary Predictors
• Tests for Nonlinearity and Interaction
• Multicollinearity
• Violations of Assumptions
• Other Regression Topics

Page 3: Multiple Regression

• Multiple regression is an extension of bivariate regression to include more than one independent variable.

• Limitations of bivariate regression:
 - often simplistic
 - biased estimates if relevant predictors are omitted
 - lack of fit does not show that X is unrelated to Y

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Bivariate or Multivariate?

Page 4: Multiple Regression

• Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:

• The fitted regression equation is:
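Both equations appear only as images in the original slides; in the standard notation of this chapter they are:

```latex
% Population regression model (true coefficients \beta, error term \varepsilon):
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon

% Fitted regression equation (sample estimates b_0, b_1, \ldots, b_k):
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k
```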

Regression Terminology

Page 5: Multiple Regression

• n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are presented in the form of an n x k matrix:

Data Format

Page 6: Multiple Regression

• Consider the following data on the selling price of a home (Y, the response variable) and three potential explanatory variables:

 X1 = SqFt
 X2 = LotSize
 X3 = Baths

Illustration: Home Prices

Page 7: Multiple Regression

• Intuitively, the regression models are

Illustration: Home Prices

Page 8: Multiple Regression

• State the hypotheses about the sign of the coefficients in the model.

Logic of Variable Selection

Page 9: Multiple Regression

• Use Excel, MegaStat, MINITAB, or any other statistical package.

• For n = 30 home sales, here are the fitted regressions and their statistics of fit.
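The table of fitted regressions is an image in the original slides and is not recoverable here. As a sketch of what any of the listed packages computes, the least squares fit can be reproduced with numpy; the variable names follow the home-price illustration, but the data below are hypothetical values generated for the example, not the textbook's 30 sales.

```python
import numpy as np

# Hypothetical data (NOT the textbook's sample): generate 30 observations
# from a known linear model, then recover its coefficients by least squares.
rng = np.random.default_rng(42)
n = 30
sqft = rng.uniform(1000, 3000, n)
lotsize = rng.uniform(5000, 20000, n)
baths = rng.integers(1, 4, n).astype(float)

# True (assumed) model: Price = 50000 + 100*SqFt + 2*LotSize + 10000*Baths
price = 50000 + 100 * sqft + 2 * lotsize + 10000 * baths

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(n), sqft, lotsize, baths])
b, *_ = np.linalg.lstsq(X, price, rcond=None)

print(b)  # should recover b0=50000, b1=100, b2=2, b3=10000
```

Because the hypothetical data contain no noise, least squares recovers the assumed coefficients essentially exactly; real data would add a residual term.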

Fitted Regressions

• R² is the coefficient of determination and SE is the standard error of the regression.

Page 10: Multiple Regression

• A common mistake is to assume that the model with the best fit is preferred.

• Principle of Occam's Razor: when two explanations are otherwise equivalent, we prefer the simpler, more parsimonious one.

Common Misconceptions about Fit

Page 11: Multiple Regression

• Four Criteria for Regression Assessment

 Logic: Is there an a priori reason to expect a causal relationship between the predictors and the response variable?

 Fit: Does the overall regression show a significant relationship between the predictors and the response variable?

Regression Modeling

Page 12: Multiple Regression

• Four Criteria for Regression Assessment

 Parsimony: Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?

 Stability: Are the predictors related to one another so strongly that regression estimates become erratic?

Regression Modeling

Page 13: Assessing Overall Fit

• For a regression with k predictors, the hypotheses to be tested are

 H0: All the true coefficients are zero
 H1: At least one of the coefficients is nonzero

 In other words,

 H0: β1 = β2 = … = βk = 0
 H1: At least one of the coefficients is nonzero

F Test for Significance

Page 14: Assessing Overall Fit

• The ANOVA table decomposes the variation of the response variable around its mean into explained (SSR) and unexplained (SSE) components.

F Test for Significance

Page 15: Assessing Overall Fit

• The ANOVA calculations for a k-predictor model can be summarized as
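The summary table itself is an image in the original slides; the standard ANOVA layout it shows is:

```latex
% Source      SS     df         MS
% Regression  SSR    k          MSR = SSR/k
% Error       SSE    n - k - 1  MSE = SSE/(n - k - 1)
% Total       SST    n - 1
%
% Overall F statistic, with k and n - k - 1 degrees of freedom:
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}
```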

F Test for Significance

Page 16: Assessing Overall Fit

• Here are the ANOVA calculations for the home price data

F Test for Significance

Page 17: Assessing Overall Fit

• R², the coefficient of determination, is a common measure of overall fit.

• It can be calculated one of two ways.

• For example, for the home price data,
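The two equivalent formulas (images in the original slides) follow from the decomposition SST = SSR + SSE:

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1
```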

Coefficient of Determination (R²)

Page 18: Assessing Overall Fit

• It is generally possible to raise the coefficient of determination R² by including additional predictors.

• The adjusted coefficient of determination penalizes the inclusion of useless predictors.

• For n observations and k predictors,
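The adjustment formula, shown as an image in the original slides, is the standard one:

```latex
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```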

• For the home price data, the adjusted For the home price data, the adjusted RR22 is is

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Adjusted RAdjusted R22

Page 19: Assessing Overall Fit

• Limit the number of predictors based on the sample size.

• When n/k is small, R² no longer gives a reliable indication of fit.

• Suggested rules are:

 Evan's Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)

 Doane's Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)

How Many Predictors?

Page 20: Predictor Significance

• Test each fitted coefficient to see whether it is significantly different from zero.

• The hypothesis tests for predictor Xj are H0: βj = 0 versus H1: βj ≠ 0.

F Test for Significance

• If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute to the prediction of Y.

Page 21: Predictor Significance

• The test statistic for the coefficient of predictor Xj is

Test Statistic

• Find the critical value tα for a chosen level of significance α from Appendix D.

• Reject H0 if |tj| > tα, or if the p-value < α.

• The 95% confidence interval for the coefficient βj is
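The interval formula is an image in the original slides; the standard form, using the t value for 95% confidence, is:

```latex
b_j \pm t_{.025}\, s_{b_j} \qquad (d.f. = n - k - 1)
```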

Page 22: Confidence Intervals for Y

• The standard error of the regression (SE) is another important measure of fit.

• For n observations and k predictors,
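The formula, shown as an image in the original slides, is:

```latex
SE = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n - k - 1}}
```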

Standard Error

• If all predictions were perfect, SE = 0.

Page 23: Confidence Intervals for Y

• Approximate 95% confidence interval for the conditional mean of Y.

• Approximate 95% prediction interval for an individual Y value.

Standard Error

Page 24: Confidence Intervals for Y

• The t-values for 95% confidence are typically near 2 (as long as n is not too small).

• A very quick prediction interval without using a t table is:
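The quick-interval formulas are images in the original slides. Replacing the t value with 2 gives the forms below; the division by the square root of n for the conditional-mean interval is an assumption based on the chapter's quick rules, not recoverable from the transcript:

```latex
\hat{y} \pm 2\,\frac{SE}{\sqrt{n}} \qquad \text{(conditional mean of } Y\text{)}

\hat{y} \pm 2\,SE \qquad \text{(individual } Y \text{ value)}
```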

Very Quick Prediction Interval for Y

• Approximate 95% confidence interval for the conditional mean of Y.

• Approximate 95% prediction interval for an individual Y value.

Page 25: Binary Predictors

• A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition.

• For example, for n graduates from an MBA program: Employed = 1, Unemployed = 0.

• These variables are also called dummy or indicator variables.

• For clarity, name the binary variable after the characteristic that corresponds to the value 1.

What Is a Binary Predictor?

Page 26: Binary Predictors

• A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down.

• Suppose X1 is a binary predictor that can take on only the values 0 or 1.

• Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).

Effects of a Binary Predictor

Page 27: Binary Predictors

• The slope does not change; only the intercept is shifted. For example,

Effects of a Binary Predictor

Page 28: Binary Predictors

• In multiple regression, binary predictors require no special treatment. They are tested like any other predictor, using a t test.

Testing a Binary for Significance

Page 29: Binary Predictors

• More than one binary is needed when the number of categories to be coded exceeds two.

• For example, for the variable GPA by class level, each category is a binary variable:

 Freshman = 1 if a freshman, 0 otherwise
 Sophomore = 1 if a sophomore, 0 otherwise
 Junior = 1 if a junior, 0 otherwise
 Senior = 1 if a senior, 0 otherwise
 Masters = 1 if a master's candidate, 0 otherwise
 Doctoral = 1 if a PhD candidate, 0 otherwise

More Than One Binary

Page 30: Binary Predictors

• If there are c mutually exclusive and collectively exhaustive categories, then only c − 1 binaries are needed to code each observation.

More Than One Binary

Any one of the categories can be omitted, because the remaining c − 1 binary values uniquely determine the omitted one.
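The c − 1 coding can be sketched in Python, using the regional binaries that appear later in this deck; the five-observation sample is hypothetical:

```python
# Hypothetical sketch: coding c = 4 regions with c - 1 = 3 dummy columns.
# "West" is the omitted reference category, so its rows are all zeros.
regions = ["Midwest", "Neast", "Seast", "West", "Midwest"]
kept = ["Midwest", "Neast", "Seast"]  # the c - 1 binaries actually coded

rows = [[1 if r == c else 0 for c in kept] for r in regions]
for r, row in zip(regions, rows):
    print(r, row)
# A "West" observation is identified by the all-zero row [0, 0, 0],
# which is why the fourth column would be redundant.
```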

Page 31: Binary Predictors

• Including all c binaries for c categories would introduce a serious problem for the regression estimation.

• One column in the X data matrix would be a perfect linear combination of the other column(s).

• The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).

What if I Forget to Exclude One Binary?

Page 32: Binary Predictors

• Binaries are commonly used to code regions. For example,

 Midwest = 1 if in the Midwest, 0 otherwise
 Neast = 1 if in the Northeast, 0 otherwise
 Seast = 1 if in the Southeast, 0 otherwise
 West = 1 if in the West, 0 otherwise

Regional Binaries

Page 33: Tests for Nonlinearity and Interaction

• Sometimes the effect of a predictor is nonlinear.

• To test for nonlinearity of any predictor, include its square in the regression. For example,

• If the linear model is the correct one, the coefficients of the squared predictors (β2 and β4) would not differ significantly from zero.

• Otherwise a quadratic relationship would exist between Y and the respective predictor variable.
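The example model is an image in the original slides; a form consistent with the β2 and β4 labels in the surrounding text would be:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_2^2 + \varepsilon
```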

Tests for Nonlinearity

Page 34: Tests for Nonlinearity and Interaction

• Test for interaction between two predictors by including their product in the regression.

• If we reject the hypothesis H0: β3 = 0, then we conclude that there is a significant interaction between X1 and X2.

• Interaction effects require careful interpretation and cost 1 degree of freedom per interaction.
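The interaction model itself is an image in the original slides; the form consistent with testing H0: β3 = 0 is:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon
```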

Tests for Interaction

Page 35: Multicollinearity

• Multicollinearity occurs when the independent variables X1, X2, …, Xm are intercorrelated instead of being independent.

• Collinearity occurs if only two predictors are correlated.

• The degree of multicollinearity is the real concern.

What Is Multicollinearity?

Page 36: Multicollinearity

• Multicollinearity induces variance inflation when predictors are strongly intercorrelated.

• This results in wider confidence intervals for the true coefficients β1, β2, …, βm and makes the t statistic less reliable.

• The separate contribution of each predictor in "explaining" the response variable is difficult to identify.

Variance Inflation

Page 37: Multicollinearity

• To check whether two predictors are correlated (collinearity), inspect the correlation matrix using Excel, MegaStat, or MINITAB. For example,

Correlation Matrix

Page 38: Multicollinearity

Correlation Matrix

• A quick rule: a sample correlation whose absolute value exceeds 2/√n probably differs significantly from zero in a two-tailed test at α = .05.

• This applies to samples that are not too small (say, 20 or more).
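A minimal numeric illustration of the quick rule; the n = 25 sample size and the r = 0.52 correlation are hypothetical values chosen for the example:

```python
import math

# Quick rule: flag a sample correlation as probably significant at
# alpha = .05 (two-tailed) when |r| > 2 / sqrt(n).
n = 25
threshold = 2 / math.sqrt(n)
print(threshold)  # 0.4

r = 0.52  # hypothetical sample correlation between two predictors
print(abs(r) > threshold)  # True: probably differs from zero
```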

Page 39: Multicollinearity

Predictor Matrix Plots

• The collinearity for the squared predictors can often be seen in scatter plots. For example,

Page 40: Multicollinearity

Variance Inflation Factor (VIF)

• The matrix scatter plots and correlation matrix show only correlations between any two predictors.

• The variance inflation factor (VIF) is a more comprehensive test for multicollinearity.

• For a given predictor j, the VIF is defined as

where Rj² is the coefficient of determination when predictor j is regressed against all other predictors.
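The defining formula, shown as an image in the original slides, is:

```latex
VIF_j = \frac{1}{1 - R_j^2}
```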

Page 41: Multicollinearity

Variance Inflation Factor (VIF)

• Some possible situations are:

Page 42: Multiple Regression Chapter 1313 Multiple Regression Multiple Regression Assessing Overall Fit Assessing Overall Fit Predictor Significance Predictor.

MulticollinearityMulticollinearityMulticollinearityMulticollinearity

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Rules of Thumb
• There is no limit on the magnitude of the VIF.
• A VIF of 10 says that the other predictors “explain” 90% of the variation in predictor j.
• This indicates that predictor j is strongly related to the other predictors.
• However, it is not necessarily indicative of instability in the least squares estimate.
• A large VIF is a warning to consider whether predictor j really belongs in the model.


Are Coefficients Stable?
• Evidence of instability is
- when X1 and X2 have a high pairwise correlation with Y, yet one or both predictors have insignificant t statistics in the fitted multiple regression, and/or
- if X1 and X2 are positively correlated with Y, yet one has a negative slope in the multiple regression.


• As a test, try dropping a collinear predictor from the regression and see what happens to the fitted coefficients in the re-estimated model.
• If they don’t change much, then multicollinearity is not a concern.
• If it causes sharp changes in one or more of the remaining coefficients in the model, then the multicollinearity may be causing instability.
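This drop-a-predictor check can be sketched with simulated data (the variables and the simulated model y = 3·x1 + noise are our illustration, not from the text). With a nearly redundant x2, the individual coefficients on x1 and x2 are unstable, but their sum is estimated stably, and dropping x2 sharply improves the conditioning of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # nearly collinear with x1
y = 3 * x1 + rng.normal(scale=0.5, size=n)   # true model uses only x1

X_full = np.column_stack([np.ones(n), x1, x2])
X_drop = np.column_stack([np.ones(n), x1])   # re-estimate without x2

b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
b_drop, *_ = np.linalg.lstsq(X_drop, y, rcond=None)
```

Here b_full[1] + b_full[2] and b_drop[1] both land near the true slope of 3, even though b_full[1] and b_full[2] individually can swing widely from sample to sample.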


Violations of Assumptions


• The least squares method makes several assumptions about the (unobservable) random errors εi. Clues about these errors may be found in the residuals ei.
• Assumption 1: The errors are normally distributed.
• Assumption 2: The errors have constant variance (i.e., they are homoscedastic).
• Assumption 3: The errors are independent (i.e., they are nonautocorrelated).


Non-Normal Errors
• Except when there are major outliers, non-normal residuals are usually considered a mild violation.
• Regression coefficients and their variances remain unbiased and consistent.
• Confidence intervals for the parameters may be unreliable since they are based on the normality assumption.
• The confidence intervals are generally OK with a large sample size (e.g., n > 30) and no outliers.


• Test the hypotheses:
H0: Errors are normally distributed
H1: Errors are not normally distributed
• Create a histogram of residuals (plain or standardized) to visually reveal any outliers or serious asymmetry.
• The normal probability plot will also visually test for normality.


Nonconstant Variance (Heteroscedasticity)
• If the error variance is constant, the errors are homoscedastic. If the error variance is nonconstant, the errors are heteroscedastic.
• This violation is potentially serious.
• The least squares regression parameter estimates are unbiased and consistent.
• Estimated variances are biased (understated) and not efficient, resulting in overstated t statistics and narrow confidence intervals.


• The hypotheses are:
H0: Errors have constant variance (homoscedastic)
H1: Errors have nonconstant variance (heteroscedastic)
• Constant variance can be visually tested by examining scatter plots of the residuals against each predictor.
• Ideally there will be no pattern.



Autocorrelation
• Autocorrelation is a pattern of nonindependent errors that violates the assumption that each error is independent of its predecessor.
• This is a problem with time series data.
• Autocorrelated errors result in biased estimated variances, which lead to narrow confidence intervals and large t statistics.
• The model’s fit may be overstated.


• Test the hypotheses:
H0: Errors are nonautocorrelated
H1: Errors are autocorrelated
• We will use the observable residuals e1, e2, …, en for evidence of autocorrelation and the Durbin-Watson test statistic DW:

DW = [Σt=2..n (et − et−1)²] / [Σt=1..n et²]


• The DW statistic lies between 0 and 4.
• When H0 is true (no autocorrelation), the DW statistic will be near 2.
• A DW < 2 suggests positive autocorrelation.
• A DW > 2 suggests negative autocorrelation.
• Ignore the DW statistic for cross-sectional data.
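The DW statistic is easy to compute from the residuals. A minimal NumPy sketch (the helper name `durbin_watson` is ours):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum of squared successive residual differences divided by
    the residual sum of squares; near 2 when errors are independent."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)
```

Independent residuals give a value near 2, while strongly positively autocorrelated residuals (e.g., a random walk) push DW toward 0.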


Unusual Observations
• An observation may be unusual
1. because the fitted model’s prediction is poor (unusual residuals), or
2. because one or more predictors may be having a large influence on the regression estimates (unusual leverage).


• To check for unusual residuals, simply inspect the residuals to find instances where the model does not predict well.
• To check for unusual leverage, look at the leverage statistic (how far each observation is from the mean(s) of the predictors) for each observation.
• For n observations and k predictors, look for observations whose leverage exceeds 2(k + 1)/n.
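The leverage statistics are the diagonal of the hat matrix. A NumPy sketch (the helper name `leverages` is ours):

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^(-1) X', where the design
    matrix X already includes the intercept column of ones."""
    XtX_inv = np.linalg.inv(X.T @ X)
    # row-wise quadratic form x_i' (X'X)^(-1) x_i
    return np.einsum('ij,jk,ik->i', X, XtX_inv, X)
```

An observation is flagged when its leverage exceeds 2(k + 1)/n; as a check, the leverages always sum to k + 1.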


Other Regression Topics


Outliers: Causes and Cures
• An outlier may be due to an error in recording the data, and if so, the observation should be deleted.
• It is reasonable to discard an observation on the grounds that it represents a different population than the other observations.


Missing Predictors
• An outlier may also be an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.
• Try to identify the lurking variable and formulate a multiple regression model including both predictors.
• Unspecified “lurking” variables cause inaccurate predictions from the fitted regression.


Ill-Conditioned Data
• All variables in the regression should be of the same general order of magnitude.
• Do not mix very large data values with very small data values.
• To avoid mixing magnitudes, adjust the decimal point in both variables.
• Be consistent throughout the data column.
• The decimal adjustments for each data column need not be the same.
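The effect of mixed magnitudes can be measured with the condition number of X'X. In this sketch the columns (income in dollars, a rate as a proportion) are hypothetical examples of ours; shifting the decimal point within each column dramatically improves conditioning:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
income = rng.normal(60_000, 15_000, size=n)   # dollars: very large values
rate = rng.normal(0.05, 0.01, size=n)         # proportion: very small values

# Mixed magnitudes: intercept ~1, income ~60,000, rate ~0.05
X_raw = np.column_stack([np.ones(n), income, rate])

# Shift the decimal point consistently within each column
X_scaled = np.column_stack([np.ones(n), income / 1000, rate * 100])

cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_scaled = np.linalg.cond(X_scaled.T @ X_scaled)
```

The rescaling changes only the units of the coefficients, not the fit, but it makes the normal equations far better conditioned.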


Significance in Large Samples
• Statistical significance may not imply practical importance.
• Anything can be made significant if you get a large enough sample.


Model Specification Errors
• A misspecified model occurs when you estimate a linear model when actually a nonlinear model is required, or when a relevant predictor is omitted.
• To detect misspecification:
- Plot the residuals against the estimated Y (there should be no discernible pattern).
- Plot the residuals against the actual Y (there should be no discernible pattern).
- Plot the fitted Y against the actual Y (the points should follow a 45° line).


Missing Data
• Discard a variable if many data values are missing.
• If a Y value is missing, discard the observation to be conservative.
• Other options would be to use the mean of the X data column for the missing values, or to use a regression procedure to “fit” the missing X-value from the complete observations.
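The simple mean fill-in option mentioned above can be sketched in NumPy (the helper name `impute_mean` is ours):

```python
import numpy as np

def impute_mean(x):
    """Replace missing values (NaN) in a 1-D data column with the mean
    of the observed values -- the simple fill-in option noted above."""
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    filled[np.isnan(filled)] = np.nanmean(x)  # mean ignores the NaNs
    return filled
```

For example, `impute_mean([2.0, nan, 4.0, 6.0])` fills the gap with 4.0, the mean of the observed values.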


Binary Dependent Variable
• When the response variable Y is binary (0, 1), the least squares estimation method is no longer appropriate.
• Use logit and probit regression methods.
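To show the idea, a logit model P(Y = 1 | x) = 1/(1 + e^(−Xb)) can be fit by maximizing the log-likelihood. This gradient-ascent sketch is our minimal stand-in for the logit methods (production work would use a statistical package):

```python
import numpy as np

def fit_logit(X, y, iters=2000, lr=0.1):
    """Fit P(Y=1|x) = 1/(1 + exp(-Xb)) by gradient ascent on the
    log-likelihood, instead of least squares."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))     # fitted probabilities
        b += lr * X.T @ (y - p) / len(y)     # gradient of the log-likelihood
    return b
```

With two well-separated groups, the fitted probabilities approach 1 on one side and 0 on the other, which least squares on a 0/1 response cannot deliver.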


Stepwise and Best Subsets Regression
• The stepwise regression procedure finds the best-fitting model using 1, 2, 3, …, k predictors.
• This procedure is appropriate only when there is no theoretical model that specifies which predictors should be used.
• Perform best subsets regression using all possible combinations of predictors.


Applied Statistics in Business and Economics

End of Chapter 13