Multiple Regression Model
INSR 260, Spring 2009Bob Stine
1
Overview Multiple Regression Model (MRM)
Estimators, terminology! ! ! ! ! similar to SRM
Assumptions! ! ! ! ! ! ! ! ! ! ! ! new plots
Inference! ! ! ! ! ! ! ! ! ! ! ! ! new test
Prediction! ! ! ! ! ! ! ! ! ! ! similar to SRM
Examples! ! ! ! (from Bowerman, Ch 4)Fuel consumptionSales management
2
Multiple Regression ModelEquation has k explanatory variablesMean! ! ! ! E Y|X = β0 + β1 X1 +...+ βk Xk = μy|x
Observations ! ! yi = β0 + β1 xi1 +...+ βk xik + εi
Assumptions (as in SRM)Independent observationsEqual variance σ2
Normal distribution around “line”! yi ~ N(μy|x,σ2)! ! ! ! εi ~ N(0, σ2)
k+2 parameters identify model! β0, β1, …, βk, σ2
3
Least SquaresCriterion
Find estimates that minimize sum of squared deviations! ! ! mina Σ(yi - a0 - a1 xi1 - … - ak xik)2
Fitted values, residualsFitted values (on the line)! ŷ = b0 + b1 xi1 + … + bk xik
Residual deviations!! ! ! e = y - ŷ
Standard error of regression (estimate of σ2)s2 = Σ ei2/(n-k-1)degrees of freedomRMSE = square root of s2
4
Goodness of FitR-squared statistic
Square of correlation between Y and ŶPercentage of “explained” variationAlways increases as variables are added to equation
Adjusted R-squaredWill not increase unless s2 gets smallerDifference from R2 increases as k increases
R2 = Explained SS
Total SS
R2 = 1 - s2
var(y)5
Checking AssumptionsScatterplots of Y on X1, Y on X2
Data for fuel consumption (n = 8)
Scatterplot matrix
6
7
8
9
10
11
12FUELCONS
25 30 35 40 45 50 55 60 65
TEMP
7
8
9
10
11
12
FUELCONS
0 5 10 15 20 25
CHILL
7
8
9
10
11
12
25
30
35
40
45
50
55
60
0
5
10
15
20
FUELCONS
7 8 9 10 11 12
TEMP
25 35 40 45 55
CHILL
0 5 10 15 20
Scatterplot Matrix
FUELCONS
TEMP
CHILL
1.0000
-0.9484
0.8706
-0.9484
1.0000
-0.7182
0.8706
-0.7182
1.0000
FUELCONS TEMP CHILL
Correlations
Data
y = weekly natural gas consumption X1 = average temperature
X2 = chill index (wind, clouds, temp)
Partial vs MarginalSRM
MRM
7
7
8
9
10
11
12
FU
ELC
ON
S
25 30 35 40 45 50 55 60 65
TEMP
Linear Fit
FUELCONS = 15.837857 - 0.1279217*TEMP
Intercept
TEMP
Term
15.837857
-0.127922
Estimate
0.801773
0.017457
Std Error
19.75
-7.33
t Ratio
<.0001*
0.0003*
Prob>|t|
Parameter Estimates
Linear Fit
Bivariate Fit of FUELCONS By TEMP
7
8
9
10
11
12
FU
ELC
ON
S
0 5 10 15 20 25
CHILL
Linear Fit
FUELCONS = 7.8494238 + 0.1835399*CHILL
Intercept
CHILL
Term
7.8494238
0.1835399
Estimate
0.652654
0.042339
Std Error
12.03
4.34
t Ratio
<.0001*
0.0049*
Prob>|t|
Parameter Estimates
Linear Fit
Bivariate Fit of FUELCONS By CHILL
7
8
9
10
11
12
FU
ELC
ON
S A
ctu
al
7 8 9 10 11 12
FUELCONS Predicted P=0.0001
RSq=0.97 RMSE=0.3671
RSquare
RSquare Adj
Root Mean Square Error
Mean of Response
Observations (or Sum Wgts)
0.97363
0.963081
0.367078
10.2125
8
Summary of Fit
Intercept
TEMP
CHILL
Term
13.108737
-0.090014
0.082495
Estimate
0.855698
0.014077
0.022003
Std Error
15.32
-6.39
3.75
t Ratio
<.0001*
0.0014*
0.0133*
Prob>|t|
Parameter Estimates
Slopes differ
More DiagnosticsOverall plots (MRM version of SRM scatterplots)
Leverage plots (partial regression plots)Simple regression view of MR slope, one for each slope
8
7
8
9
10
11
12
FU
ELC
ON
S A
ctu
al
7 8 9 10 11 12
FUELCONS Predicted P=0.0001
RSq=0.97 RMSE=0.3671
-0.6
-0.5
-0.4
-0.3
-0.2
-0.1
0.0
0.1
0.2
0.3
0.4
FU
ELC
ON
S R
esid
ual
7 8 9 10 11 12
FUELCONS Predicted
8
9
10
11
12
FU
ELC
ON
S
Levera
ge R
esid
uals
25 30 35 40 45 50 55 60 65
TEMP Leverage, P=0.0014
8
9
10
11
12
FU
ELC
ON
S
Levera
ge R
esid
uals
0 5 10 15 20 25
CHILL Leverage, P=0.0133
InferenceStandard error of the slope is affected by correlation among explanatory variables
Variance inflation factor (Chap 5)! Var(slope in MRM) ≈ Var(slope in SRM) VIF
Three equivalent methods for each estimated slope and the intercept
Confidence intervalt-statisticp-value
9
Var(bj) =!2
!i(xij ! xj)2
"1
1!R2Xj |Xm !=j
#
Intercept
TEMP
CHILL
Term
13.108737
-0.090014
0.082495
Estimate
0.855698
0.014077
0.022003
Std Error
15.32
-6.39
3.75
t Ratio
<.0001*
0.0014*
0.0133*
Prob>|t|
10.909095
-0.126201
0.0259356
Lower 95%
15.308379
-0.053827
0.1390543
Upper 95%
.
2.07
2.07
VIF
Parameter Estimates
Overall F TestTest both slopes simultaneously
H0: β1 = β2 = 0Ratio of variance explained to remaining variation
Test of the size of R2 statistic
10
7
8
9
10
11
12
FU
ELC
ON
S A
ctu
al
7 8 9 10 11 12
FUELCONS Predicted P=0.0001
RSq=0.97 RMSE=0.3671Model
Error
C. Total
Source
2
5
7
DF
24.875018
0.673732
25.548750
Sum of
Squares
12.4375
0.1347
Mean Square
92.3031
F Ratio
0.0001*
Prob > F
Analysis of Variance
F = R2/k
(1-R2)/(n-k-1)
PredictionNo simple plot
Extrapolation effect is more subtle
Software is needed to identify extrapolationOptions in Fit Model to save various standard errors as well as prediction and confidence intervalsAdd an extra row (before fitting) to get JMP to predict a new case
11
ŷ
Sales ExampleQuestion
Evaluation of sales representativesResponse is annual company sales in territory
y measured in thousands of units
Data are a sample for n = 25 sales representatives
Several explanatory variablesTime (months) with the companyTotal sales of company and rivals in territory (potential)Advertising expenditure in territoryCompany’s market share in prior four yearsChange in company’s market share
12
Initial Graphical Analysis
13
2000
4000
6000
0
100
200
300
20000
40000
60000
0
4000
8000
2
6
10
-2.0
-1.0
0.0
1.0
Sales
2000 5000
Time
0 100 250
MktPoten
20000
Adver
0 4000
MktShare
2 4 6 8 12
Change
-2.0 0.0
Scatterplot Matrix
Sales
Time
MktPoten
Adver
MktShare
Change
1.0000
0.6229
0.5978
0.5962
0.4835
0.4892
0.6229
1.0000
0.4540
0.2492
0.1062
0.2515
0.5978
0.4540
1.0000
0.1741
-0.2107
0.2683
0.5962
0.2492
0.1741
1.0000
0.2645
0.3765
0.4835
0.1062
-0.2107
0.2645
1.0000
0.0855
0.4892
0.2515
0.2683
0.3765
0.0855
1.0000
Sales Time MktPoten Adver MktShare Change
Correlations
Multiple Regression
14
-750
-500
-250
0
250
500
750
Sale
s R
esid
ual
1000 3000 4000 5000 6000
Sales Predicted
Residual by Predicted Plot
Overall fit
Leverage plots
2000
3000
4000
5000
6000
7000Sale
s A
ctu
al
1000 3000 4000 5000 6000
Sales Predicted P<.0001
RSq=0.92 RMSE=430.23
Actual by Predicted Plot
2000
3000
4000
5000
6000
7000
Sale
s L
evera
ge
Resid
uals
-50 0 50 100 200 300 400
Time Leverage, P=0.0065
Leverage Plot
2000
3000
4000
5000
6000
7000
Sale
s L
evera
ge
Resid
uals
10000 30000 50000 70000
MktPoten Leverage, P<.0001
Leverage Plot
Model SummaryOverall fit
Individual estimatesInterpretation of these estimates?Why linear? Implications of model are very strong.
15
RSquare
RSquare Adj
Root Mean Square Error
Mean of Response
Observations (or Sum Wgts)
0.915009
0.892643
430.2319
3374.568
25
Summary of Fit
Model
Error
C. Total
Source
5
19
24
DF
37862659
3516890
41379549
Sum of
Squares
7572532
185099
Mean Square
40.9106
F Ratio
<.0001*
Prob > F
Analysis of Variance
Intercept
Time
MktPoten
Adver
MktShare
Change
Term
-1113.788
3.6121012
0.0420881
0.1288568
256.95554
324.53345
Estimate
419.8869
1.1817
0.006731
0.037036
39.13607
157.2831
Std Error
-2.65
3.06
6.25
3.48
6.57
2.06
t Ratio
0.0157*
0.0065*
<.0001*
0.0025*
<.0001*
0.0530
Prob>|t|
Parameter Estimates
PredictionConditions for another rep (not one of these 25)
Sales were 3082Time with company!! 85.42Market potential!! ! 35,182.73Advertising!! ! ! ! 7,281.65Market share!! ! ! 9.64Change in share!! ! 0.28!
Prediction resultsPlug values for explanatory variables into equationPrediction !! ! ! ! ! ! ! ! ŷ = 4182Confidence interval for mean! 3884.9 to!4478.6Prediction interval for rep! ! 3233.6! to 5129.9Benchmarking implication: How is this rep doing?
16
Summary Multiple Regression Model (MRM)
Estimators!! ! partial (MRM) vs marginal (SRM)
Assumptions! ! ! ! ! ! ! ! ! ! leverage plots
Inference! ! ! ! ! ! ! ! ! ! ! ! ! F-test
Prediction! ! ! ! ! ! ! ! ! ! ! ! ! Software
17
Top Related