Ch14 - Multiple Regression and Correlation - rjerz.com


Page 1: Ch14 - Multiple Regression and Correlation - rjerz.com · Ch14 - Multiple Regression and Correlation Author: Rick Jerz Created Date: 4/4/2018 2:41:54 AM ...

4/3/18


Multiple Linear Regression and Correlation Analysis

Chapter 14

Dr. Rick Jerz

© 2018 rjerz.com

Multiple Regression Analysis

• For two independent variables, the general form of the multiple regression equation is:

$\hat{Y} = a + b_1 X_1 + b_2 X_2$

• $X_1$ and $X_2$ are the independent variables.

• $a$ is the Y-intercept.

• $b_1$ is the net change in Y for each unit change in $X_1$, holding $X_2$ constant. It is called a regression coefficient.

Regression Plane for a 2-Independent Variable Linear Regression Equation

Multiple Regression Analysis

• The general multiple regression equation with k independent variables is given by:

$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k$

• The least squares criterion is used to develop this equation. Because determining $b_1$, $b_2$, etc. by hand is very tedious, a software package such as Excel or MINITAB is recommended.
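To make the least squares criterion concrete, here is a minimal Python sketch, assuming NumPy is available. The data are synthetic and invented purely for illustration (they are not the Salsberry sample), and `fit_least_squares` is a hypothetical helper name, not something from the slides.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (a, b) that minimize the sum of squared residuals of y = a + X @ b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept a is estimated along with b1..bk.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic, noise-free data generated from Y = 10 + 2*X1 - 3*X2 (illustration only).
X_demo = np.array([[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]], dtype=float)
y_demo = 10 + 2 * X_demo[:, 0] - 3 * X_demo[:, 1]

a, b = fit_least_squares(X_demo, y_demo)
print(round(a, 3), np.round(b, 3))
```

Because the synthetic data contain no noise, the fit should recover the generating values (intercept 10, coefficients 2 and -3) up to floating-point error. With real data the residuals would not be zero.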

An Example

Salsberry Realty sells homes along the East Coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes.

Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace.

To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well


Example Data


Multiple Linear Regression in Excel


Interpreting the Regression Coefficients

1. The regression coefficient for mean outside temperature is -4.583. The coefficient is negative and shows an inverse relationship between heating cost and temperature: as the outside temperature increases, the cost to heat the home decreases. The numeric value of the regression coefficient provides more information. If we increase temperature by 1 degree and hold the other two independent variables constant, we can estimate a decrease of $4.583 in monthly heating cost. So if the mean temperature in Boston is 25 degrees and it is 35 degrees in Philadelphia, all other things being the same (insulation and age of furnace), we expect the heating cost to be $45.83 less in Philadelphia.

2. The attic insulation variable also shows an inverse relationship: the more insulation in the attic, the less the cost to heat the home. So the negative sign for this coefficient is logical. For each additional inch of insulation, we expect the cost to heat the home to decline $14.83 per month, regardless of the outside temperature or the age of the furnace.

3. The age of the furnace variable shows a direct relationship. With an older furnace, the cost to heat the home increases. Specifically, for each additional year older the furnace is, we expect the cost to increase $6.10 per month.

$\hat{Y} = 427.194 - 4.583 X_1 - 14.831 X_2 + 6.101 X_3$
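The Boston versus Philadelphia comparison in item 1 can be checked with a few lines of Python. This is only a sketch: the coefficient values come from the fitted equation, while the dictionary keys and the function name `cost_change` are hypothetical labels chosen for illustration.

```python
# Net regression coefficients from the fitted equation (dollars per unit change).
COEFFICIENTS = {"temperature": -4.583, "insulation": -14.831, "furnace_age": 6.101}

def cost_change(variable, delta):
    """Estimated change in monthly heating cost when one variable changes by
    delta and the other two independent variables are held constant."""
    return COEFFICIENTS[variable] * delta

# Philadelphia (35 degrees) versus Boston (25 degrees): 10 degrees warmer.
print(round(cost_change("temperature", 35 - 25), 2))  # -45.83
# One additional inch of attic insulation.
print(round(cost_change("insulation", 1), 2))         # -14.83
```

A negative result is a decrease in estimated cost, and a positive result (as for furnace age) is an increase.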

Applying the Model for Estimation

• What is the estimated heating cost for a home if the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?

$\hat{Y} = 427.194 - 4.583(30) - 14.831(5) + 6.101(10) = 276.56$
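The same estimate can be reproduced in code. The sketch below simply evaluates the fitted equation. The function and parameter names are assumptions made for readability and do not appear in the slides.

```python
def estimate_heating_cost(temperature, insulation, furnace_age):
    """Estimated January heating cost in dollars from the fitted equation."""
    return 427.194 - 4.583 * temperature - 14.831 * insulation + 6.101 * furnace_age

estimate = estimate_heating_cost(temperature=30, insulation=5, furnace_age=10)
print(round(estimate, 2))  # 276.56
```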

Some Assumptions and Tests

1. There is a linear relationship.

• Use a scatter diagram to plot the dependent variable against each independent variable.

2. Homoscedasticity: the variation of the residuals is the same (constant) for all predicted values.

• Use a scatter diagram to plot the predicted values (on the horizontal axis) against the residuals (on the vertical axis). There should be no trend or correlation.

3. Residuals are normally distributed.

• Use a frequency diagram to plot the residuals. Residuals should be normally distributed.

• Or use a chi-square test to compare the actual distribution with a perfect normal distribution.
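The quantities plotted in checks 2 and 3 can be computed as follows. This is a rough sketch assuming NumPy. The 20 observations are synthetic (randomly generated, not the Salsberry data), and a text frequency table stands in for the frequency diagram.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the synthetic data are reproducible

# Synthetic data for illustration only: 20 observations, 2 independent variables.
X = rng.uniform(0, 10, size=(20, 2))
y = 5 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1, size=20)

# Least squares fit with an intercept column.
design = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

predicted = design @ coef   # horizontal axis of the residual plot
residuals = y - predicted   # vertical axis of the residual plot

# Frequency table of the residuals: the counts behind a frequency diagram.
counts, edges = np.histogram(residuals, bins=5)
for count, low, high in zip(counts, edges[:-1], edges[1:]):
    print(f"{low:6.2f} to {high:6.2f}: {'*' * int(count)}")
```

A least squares fit with an intercept always yields residuals that average zero and have zero linear correlation with the predicted values. In the residual plot, then, the thing to look for is a changing spread (a funnel shape) or curvature rather than a simple slope.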


Some Assumptions and Tests (continued)

4. Independent variables should not be correlated with each other (multicollinearity).

• The VIF (variance inflation factor) of each independent variable in the model should be less than 10.

• Calculate the correlation between each pair of independent variables (it should be between -0.7 and 0.7).

• Or plot each independent variable against each of the others.

5. Successive residuals should be independent (autocorrelation).

• Use a scatter diagram (like #1). There should not be any pattern of negative or positive trend, meaning slope = 0.

• Or use the Durbin-Watson test.

6. Be careful predicting outside of the data range.
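The multicollinearity and autocorrelation checks can be sketched in Python, assuming NumPy. The helpers `vif` and `durbin_watson` are hypothetical names written for this illustration (statistical packages provide their own versions), and the input data are synthetic.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R_j^2), where
    R_j^2 comes from regressing column j on the remaining columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1 - residual @ residual / np.sum((target - target.mean()) ** 2)
        factors.append(1 / (1 - r_squared))
    return np.array(factors)

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Synthetic independent variables for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise correlations
print(np.round(vif(X), 2))                        # one VIF per variable

print(durbin_watson([1, -1, 1, -1]))  # 3.0: signs alternate (negative autocorrelation)
print(durbin_watson([1, 1, -1, -1]))  # 1.0: neighbors tend to agree (positive autocorrelation)
```

The Durbin-Watson statistic ranges from 0 to 4, and values near 2 suggest no autocorrelation. Judging whether a particular value is significant requires the Durbin-Watson critical value tables, which this sketch does not include.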
