Multiple Regression Analysis
Dr. Rick Jerz
Multiple Regression Analysis
• For two independent variables, the general form of the multiple regression equation is:
$\hat{Y} = a + b_1 X_1 + b_2 X_2$
• X1 and X2 are the independent variables
• a is the Y-intercept
• b1 is the net change in Y for each unit change in X1, holding X2 constant. It is called a regression coefficient
Regression Plane for a 2-Independent Variable Linear Regression Equation
Multiple Regression Analysis
• The general multiple regression equation with k independent variables is given by:
$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k$
• The least squares criterion is used to develop this equation. Because determining b1, b2, etc., by hand is very tedious, a software package such as Excel or MINITAB is recommended.
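As a rough illustration of what such a software package does, the sketch below fits a least squares equation with NumPy. The data are made-up, noise-free values (not the Salsberry sample), and the variable names are illustrative assumptions.

```python
import numpy as np

# Made-up, noise-free data generated from Y = 2 + 3*X1 - 1*X2 (illustrative only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.0, 7.0, 7.0, 11.0, 11.0])

# Design matrix: a column of ones for the intercept a, then one column per X.
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares: choose a, b1, b2 to minimize the sum of squared residuals.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coef
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the made-up data contain no noise, the fitted values of a, b1, and b2 should match the generating equation up to rounding error.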
An Example
Salsberry Realty sells homes along the East Coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes.
Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace.
To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well as the values of the three variables listed above.
Example Data
Multiple Linear Regression in Excel
Interpreting the Regression Coefficients
$\hat{Y} = 427.194 - 4.583 X_1 - 14.831 X_2 + 6.101 X_3$
Here X1 is the mean outside temperature, X2 is the inches of attic insulation, and X3 is the age of the furnace in years.
1. The regression coefficient for mean outside temperature is -4.583. The coefficient is negative and shows an inverse relationship between heating cost and temperature. As the outside temperature increases, the cost to heat the home decreases. If we increase temperature by 1 degree and hold the other two independent variables constant, we can estimate a decrease of $4.583 in monthly heating cost.
2. The attic insulation variable also shows an inverse relationship: the more insulation in the attic, the less the cost to heat the home. So the negative sign for this coefficient is logical. For each additional inch of insulation, we expect the cost to heat the home to decline $14.83 per month, regardless of the outside temperature or the age of the furnace.
3. The age of the furnace variable shows a direct relationship. With an older furnace, the cost to heat the home increases. Specifically, for each additional year of furnace age, we expect the cost to increase $6.10 per month.
Applying the Model for Estimation
• What is the estimated heating cost for a home if the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?
$\hat{Y} = 427.194 - 4.583(30) - 14.831(5) + 6.101(10) = 276.56$
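The same arithmetic can be written as a small function, which also makes the meaning of a single coefficient easy to check. This is a minimal sketch: the coefficients come from the fitted equation in the slides, while the function and parameter names are illustrative assumptions.

```python
def estimate_heating_cost(temperature, insulation_inches, furnace_age_years):
    """Estimated January heating cost from the fitted regression equation."""
    return (427.194
            - 4.583 * temperature
            - 14.831 * insulation_inches
            + 6.101 * furnace_age_years)

# The home from the question above: 30 degrees, 5 inches, 10-year-old furnace.
cost = estimate_heating_cost(30, 5, 10)
print(round(cost, 2))  # 276.56

# Raising the temperature by 1 degree, holding the other variables constant,
# changes the estimate by the temperature coefficient (about -4.583).
change = estimate_heating_cost(31, 5, 10) - cost
print(round(change, 3))  # -4.583
```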
Measures of Effectiveness
• Coefficient of multiple determination, $R^2$
• For a larger number of independent variables, we need to adjust this equation; the adjusted $R^2$ accounts for the number of independent variables in the model
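The two measures can be sketched in a few lines. The sketch assumes the standard definitions, R² = 1 - SSE / SS total and adjusted R² = 1 - (1 - R²)(n - 1) / (n - (k + 1)), where n is the sample size and k the number of independent variables. The numbers are made up for illustration and are not the Salsberry results.

```python
def r_squared(y, y_hat):
    """Coefficient of multiple determination: 1 - SSE / SS total."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - sse / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Made-up observed and predicted values (illustrative only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.2, 1.8, 3.0, 4.2, 4.8]
r2 = r_squared(y, y_hat)

# Hypothetical R^2 of 0.80 with n = 20 homes and k = 3 independent variables.
adj = adjusted_r_squared(0.80, n=20, k=3)
print(round(r2, 3), round(adj, 4))
```

The adjustment always lowers R² when k is at least 1, and the penalty grows as more independent variables are added to a fixed sample.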
Global Test: All Regression Coefficients are Zero
• The hypotheses are
• H0: β1 = β2 = β3 = 0
• H1: Not all βi's are 0
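The global test is usually carried out with an F statistic taken from the ANOVA table. A minimal sketch, assuming the standard form F = (SSR / k) / (SSE / (n - (k + 1))); the sums of squares below are hypothetical, not the Salsberry output.

```python
def global_f_statistic(ssr, sse, n, k):
    """F statistic for H0: all k regression coefficients are zero."""
    msr = ssr / k                # mean square regression
    mse = sse / (n - (k + 1))    # mean square error
    return msr / mse

# Hypothetical sums of squares for n = 20 observations and k = 3 variables.
f_stat = global_f_statistic(ssr=90.0, sse=10.0, n=20, k=3)
print(f_stat)  # 48.0
```

H0 is rejected when the computed F exceeds the critical value of the F distribution with k and n - (k + 1) degrees of freedom at the chosen significance level.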
Testing Each Independent Variable for Inclusion
• The test for individual variables determines which independent variables have regression coefficients that differ significantly from zero
• The variables that have zero regression coefficients are usually dropped from the analysis
• Can also be done using p-values
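The individual tests are commonly based on t ratios, each coefficient divided by its standard error. The sketch below computes them with NumPy, assuming the usual least squares formulas. The data are a small made-up design in which only the first variable affects Y, and the "noise" is a fixed, made-up disturbance so the result is reproducible.

```python
import itertools
import numpy as np

def coefficient_t_ratios(X, y):
    """Return coefficients, standard errors, and t ratios (intercept first)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares fit
    resid = y - A @ coef
    mse = resid @ resid / (n - (k + 1))           # residual variance estimate
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return coef, se, coef / se

# Made-up data: 8 rows, 3 variables coded -1/+1; only the first one matters.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
noise = 0.5 * X[:, 0] * X[:, 1] * X[:, 2]         # fixed disturbance, not random
y = 10.0 + 4.0 * X[:, 0] + noise

coef, se, t = coefficient_t_ratios(X, y)
print(np.round(t, 2))
```

Variables whose t ratios are close to zero (equivalently, whose p-values are large) are the candidates to drop.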
Stepwise Regression
• A step-by-step method to determine a regression equation that begins with a single independent variable and adds or deletes independent variables one by one. Only independent variables with nonzero regression coefficients are included in the regression equation
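A simplified version of the idea can be sketched as forward selection. Note the assumptions: this sketch only adds variables (it never deletes one), and it uses a minimum gain in R² as the entry rule instead of the significance tests a statistical package would apply. The function names, the `min_gain` threshold, and the data are all illustrative.

```python
import itertools
import numpy as np

def r_squared(X, y):
    """R^2 of a least squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, min_gain=0.01):
    """Add one variable at a time while R^2 improves by at least min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Try each remaining variable together with those already selected.
        trial = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(trial, key=trial.get)
        if trial[j_best] - best_r2 < min_gain:
            break                      # no candidate helps enough; stop
        selected.append(j_best)
        best_r2 = trial[j_best]
        remaining.remove(j_best)
    return selected, best_r2

# Made-up data: Y depends on variables 0 and 2, but not on variable 1.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2]

selected, r2 = forward_select(X, y)
print(selected, round(r2, 3))
```

In this made-up case the variable with the largest effect enters first, the second useful variable enters next, and the unrelated variable is never added.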
“Dummy” Variables
• These are “binary” variables, such as “yes”/“no”, that typically represent qualitative (categorical) characteristics
• We can use these in our regression models by coding them as 0s and 1s
• Example: In our real estate problem, we might want to include whether or not the home has an outdoor pool
• Yes = 1, No = 0
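The 0/1 coding can be sketched as follows. The answers are made up, and the function name is an illustrative assumption.

```python
def encode_pool(answers):
    """Map 'Yes'/'No' answers to 1/0 so they can be used in a regression."""
    return [1 if answer.strip().lower() == "yes" else 0 for answer in answers]

# Made-up answers to "Does the home have an outdoor pool?"
pool_answers = ["Yes", "No", "No", "Yes"]
pool_dummy = encode_pool(pool_answers)
print(pool_dummy)  # [1, 0, 0, 1]
```

The resulting column is then included in the regression like any other independent variable; its coefficient estimates the difference in Y between homes with and without a pool, holding the other variables constant.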
Some Assumptions and Tests
1. There is a linear relationship
• Use a scatter diagram to plot the dependent variable against each independent variable
2. Homoscedasticity: the variation of the residuals is the same (constant)
• Use a scatter diagram to plot the predicted values (on the horizontal axis) vs. the residuals (on the vertical axis). There should be no trend or correlation
3. Residuals are normally distributed
• Use a frequency diagram to plot the residuals. Residuals should be normally distributed
• Or use a chi-square test to compare the actual distribution with a perfect normal distribution
There is a Linear Relationship
Homoscedasticity
Residuals are Normally Distributed
Some Assumptions and Tests (continued)
4. Independent variables should not be correlated (multicollinearity)
• VIF should be less than 10 (in model)
• Calculate the correlation between independent variables (should be less than 0.7)
• Or, plot each independent variable against each other
5. Successive residuals should be independent (autocorrelation)
• Use a scatter diagram (like #1). There should not be any pattern of negative or positive trend, meaning slope = 0
• Or use the Durbin-Watson test
Be careful predicting outside of the data range!
Multicollinearity
• Independent variables should not be correlated with each other. If two are, one of them should be eliminated.
• The variance inflation factor (VIF) is used to identify correlated independent variables
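A VIF can be computed by regressing each independent variable on the others and taking 1 / (1 - R²) from that fit. The sketch below does this with NumPy. The data are made up so that the first two variables are nearly identical and the third is unrelated, and the function name is an illustrative assumption.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other variables
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Made-up data: x2 is almost a copy of x1; x3 is unrelated to both.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2])
x3 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
X = np.column_stack([x1, x2, x3])

v = vif(X)
print(np.round(v, 1))
```

In this made-up case the first two VIFs are far above 10, flagging x1 and x2 as correlated, while the third stays close to 1.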
Autocorrelation
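The Durbin-Watson statistic can be computed directly from the residuals in their original order. A minimal sketch, assuming the standard definition d = Σ(e_t - e_{t-1})² / Σ e_t²; the residuals below are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals that alternate in sign (negative autocorrelation).
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Made-up residuals that never change (strong positive autocorrelation).
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(d_alternating, d_constant)  # 3.0 0.0
```

The statistic ranges from 0 to 4. Values near 2 suggest no autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.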