Post on 12-Jan-2016
1
Multiple Regression Analysis: Part 2
Interpretation and Diagnostics
2
Learning Objectives
Understand regression coefficients and semi-partial correlations
Learn to use diagnostics to locate problems with data (relative to MRA)
Understand assumptions, robustness, and methods of dealing with violations
Enhance our interpretation of equations
Understand entry methods
3
Statistical Tests & Interpretation
Interpretation of regression coefficients: standardized, unstandardized, intercept
Testing regression coefficients: t-statistic & interpretation
Testing R2
4
Output for MRA Run (coefficients)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Lower/Upper Bound: 95% CI for B)
Predictor                   B       Std. Error  Beta    t       Sig.   Lower Bound  Upper Bound
(Constant)                  23.066  6.156               3.747   0.001  10.263       35.868
Marketing in thousands $'s  -0.172  0.061       -1.066  -2.802  0.011  -0.299       -0.044
Rock Ratings                0.352   0.105       0.787   3.357   0.003  0.134        0.570
Number of plays per day     1.618   0.593       0.924   2.729   0.013  0.385        2.852
Dependent Variable: Sales Index
R2 = .558
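As a rough check on how the t and confidence-interval columns relate to B and its standard error, the sketch below recomputes them for the Marketing row. It assumes a two-tailed critical t of 2.08 for df = N - k - 1 = 21 (the value quoted on the outlier slide); the function name is illustrative only. Because it starts from the rounded B and Std. Error shown in the output, the results agree with the table only to within rounding.

```python
# Values copied from the Marketing row of the coefficients output.
b = -0.172          # unstandardized coefficient B
se = 0.061          # standard error of B
t_crit = 2.08       # ASSUMPTION: two-tailed .05 critical t for df = 21

def t_and_ci(b, se, t_crit):
    """Return the t statistic and the confidence interval for one coefficient."""
    t_stat = b / se
    half_width = t_crit * se
    return t_stat, (b - half_width, b + half_width)

t_stat, (lower, upper) = t_and_ci(b, se, t_crit)
print(f"t = {t_stat:.3f}, 95% CI for B = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero corresponds to a coefficient whose t exceeds the critical value in absolute size, which is why the Sig. column and the interval bounds tell the same story.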
5
Variance in Y Accounted for by two uncorrelated Predictors
[Figure: Venn diagrams of Y, X1, and X2 with areas A, B, and E]
Example #1: Small R2. A represents variance in Y accounted for by X1; B = variance in Y accounted for by X2.
Example #2: Larger R2. A represents variance in Y accounted for by X1; B = variance in Y accounted for by X2.
(A+B)/Y = R2; E (in the Y circle) equals error.
6
Variance in Y Accounted for by two correlated Predictors: sr2 and pr2
[Figure: Venn diagrams of Y, X1, and X2 with areas A, B, C, and D. Example #1: Small R2. Example #2: Larger R2.]
sr2 for X1 =
pr2 for X1 =
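For reference, both quantities can also be written in terms of squared multiple correlations rather than diagram areas. The notation below is not from the slides and is assumed for illustration: $R^2_{Y.12}$ is the R2 from the model with both predictors and $R^2_{Y.2}$ is the R2 from the model with X2 alone.

```latex
% Squared semi-partial (part) correlation: unique variance of X1 as a share of ALL variance in Y
sr_1^2 = R^2_{Y.12} - R^2_{Y.2}

% Squared partial correlation: unique variance of X1 as a share of the variance in Y not explained by X2
pr_1^2 = \frac{R^2_{Y.12} - R^2_{Y.2}}{1 - R^2_{Y.2}}
```

The two share a numerator and differ only in the denominator, so $pr_1^2 \ge sr_1^2$ whenever X2 explains any variance in Y.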
7
Unique Contributions -- breaking sr2 down
R2 = .558
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Lower/Upper Bound: 95% CI for B; Zero-order, Partial, Part: correlations)
Predictor                   B       Std. Error  Beta    t       Sig.   Lower Bound  Upper Bound  Zero-order  Partial  Part
(Constant)                  23.066  6.156               3.747   0.001  10.263       35.868
Marketing in thousands $'s  -0.172  0.061       -1.066  -2.802  0.011  -0.299       -0.044       0.388       -0.522   -0.406
Rock Ratings                0.352   0.105       0.787   3.357   0.003  0.134        0.570        0.615       0.591    0.487
Number of plays per day     1.618   0.593       0.924   2.729   0.013  0.385        2.852        0.527       0.512    0.396
Dependent Variable: Sales Index
8
A shortcoming to breaking down sr2
Coefficients(a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Zero-order, Partial, Part: correlations)
Model 1     B      Std. Error  Beta   t      Sig.   Zero-order  Partial  Part
(Constant)  .000   .222               .000   1.000
X1          .400   .263        .400   1.523  .146   .300        .346     .346
X2          -.200  .263        -.200  -.761  .457   .000        -.182    -.173
a. Dependent Variable: Y
R2 = .120
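The shortcoming can be made concrete with the standard two-predictor formulas. The sketch below reproduces the betas, part, and partial correlations in this output from the zero-order correlations, then compares the sum of the squared part correlations with R2. The correlation between X1 and X2 is not printed in the output; r_12 = .5 is the value implied by the betas and zero-order correlations, so treat it as an assumption. The function name is illustrative.

```python
import math

def two_predictor_stats(r_y1, r_y2, r_12):
    """Betas, part (semi-partial) and partial correlations for a two-predictor model."""
    unshared = 1.0 - r_12 ** 2            # variance in one predictor not shared with the other
    num1 = r_y1 - r_y2 * r_12
    num2 = r_y2 - r_y1 * r_12
    beta1, beta2 = num1 / unshared, num2 / unshared
    sr1, sr2 = num1 / math.sqrt(unshared), num2 / math.sqrt(unshared)
    pr1 = sr1 / math.sqrt(1.0 - r_y2 ** 2)
    pr2 = sr2 / math.sqrt(1.0 - r_y1 ** 2)
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return {"beta1": beta1, "beta2": beta2, "sr1": sr1, "sr2": sr2,
            "pr1": pr1, "pr2": pr2, "r_squared": r_squared}

# Zero-order correlations from the output; r_12 = 0.5 is an ASSUMED (inferred) value.
result = two_predictor_stats(r_y1=0.300, r_y2=0.000, r_12=0.5)
sum_sr_squared = result["sr1"] ** 2 + result["sr2"] ** 2

for name, value in result.items():
    print(f"{name:10s} {value: .3f}")
print(f"sum of sr^2 {sum_sr_squared:.3f} vs R^2 {result['r_squared']:.3f}")
```

Under these inputs X2 has no zero-order relationship with Y yet earns a non-zero part correlation, and the squared part correlations add up to more than R2, so they cannot be read as a clean partition of explained variance.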
9
Multicollinearity: One way it can all go bad!
[Figure: Venn diagram of Y, X1, and X2 with areas A, B, C, D, and E]
10
Methods for diagnosing multicollinearity
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Lower/Upper Bound: 95% CI for B; Tolerance, VIF: collinearity statistics)
Predictor                   B       Std. Error  Beta    t       Sig.   Lower Bound  Upper Bound  Tolerance  VIF
(Constant)                  23.066  6.156               3.747   0.001  10.263       35.868
Marketing in thousands $'s  -0.172  0.061       -1.066  -2.802  0.011  -0.299       -0.044       0.145441   6.8756
Rock Ratings                0.352   0.105       0.787   3.357   0.003  0.134        0.570        0.382775   2.6125
Number of plays per day     1.618   0.593       0.924   2.729   0.013  0.385        2.852        0.183382   5.4531
Dependent Variable: Sales Index

Collinearity Diagnostics(a)
(Variance Proportions are given for the constant and each predictor)
Model  Dimension  Eigenvalue  Condition Index  (Constant)  Marketing in thousands $'s  Rock Ratings  Number of plays per day
1      1          3.861       1.000            .00         .00                         .00           .00
1      2          .093        6.434            .37         .05                         .08           .00
1      3          .038        10.134           .01         .14                         .88           .05
1      4          .009        21.300           .61         .80                         .03           .95
a. Dependent Variable: Sales Index
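To show how these diagnostics relate to one another, the sketch below derives VIF from tolerance (VIF = 1 / tolerance, where tolerance is 1 minus the R2 from regressing a predictor on the other predictors) and the condition index from the eigenvalues (square root of the largest eigenvalue over each eigenvalue). The shortened dictionary keys are illustrative. Because the eigenvalues are printed to three decimals, the recomputed condition indices for the smallest eigenvalues only approximate the printed ones.

```python
import math

# Tolerances and eigenvalues copied from the output above.
tolerance = {
    "Marketing": 0.145441,
    "Rock Ratings": 0.382775,
    "Plays per day": 0.183382,
}
eigenvalues = [3.861, 0.093, 0.038, 0.009]

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Tolerance is 1 - R^2 of the predictor regressed on the other predictors.
r2_with_other_predictors = {name: 1.0 - tol for name, tol in tolerance.items()}

# Condition index: sqrt(largest eigenvalue / eigenvalue of this dimension).
largest = max(eigenvalues)
condition_index = [math.sqrt(largest / ev) for ev in eigenvalues]

for name in tolerance:
    print(f"{name:14s} VIF = {vif[name]:.4f}, "
          f"R^2 with other predictors = {r2_with_other_predictors[name]:.3f}")
print("condition indices:", [round(ci, 2) for ci in condition_index])
```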
11
Ways to fix multicollinearity
Discarding Predictors
Combining Predictors
Using Principal Components
Parcelling
Ridge Regression
12
Outliers and Influential Observations: Another way it can all go bad!
Outliers on y
Outliers on x’s
Influential data points
13
Outliers
Outliers on y: Standardized Residuals, Studentized Residuals (df = N – k – 1), Deleted Studentized Residuals
Outliers on x’s: Hat elements, Mahalanobis Distance
14
Outliers on y
tcrit(21) = 2.08
song  slsindex  marketing  rock   airplay  rockindx  PRE_1    RES_1    DRE_1    ZRE_1    SRE_1    SDR_1
20    59.51     172        52.84  21.99    52.84     47.7047  11.8063  12.7362  1.6615   1.7257   1.8179
21    60.87     144        58.60  21.61    58.60     53.9276  6.9424   7.3693   0.9770   1.0066   1.0069
22    62.23     139        56.05  25.45    56.05     60.1036  2.1245   2.9063   0.2990   0.3497   0.3423
23    63.59     5          49.96  8.00     49.96     52.7401  10.8469  21.8176  1.5265   2.1649   2.3971
24    64.95     189        80.82  28.98    80.82     65.9458  -0.9998  -1.3866  -0.1407  -0.1657  -0.1618
25    66.30     200        63.37  25.15    63.37     51.7151  14.5889  16.0434  2.0531   2.1530   2.3802
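A minimal sketch of the screening rule for outliers on y: compare each deleted studentized residual (SDR_1) with the critical t of 2.08 and flag the cases that exceed it in absolute value. Only the six songs listed on the slide are included, so this is not the full data set.

```python
# Deleted studentized residuals (SDR_1) for the six songs shown on the slide.
sdr = {20: 1.8179, 21: 1.0069, 22: 0.3423, 23: 2.3971, 24: -0.1618, 25: 2.3802}
t_crit = 2.08  # tcrit(21) as quoted on the slide

# Flag any case whose absolute deleted studentized residual exceeds the critical value.
outliers_on_y = sorted(song for song, value in sdr.items() if abs(value) > t_crit)
print("Songs flagged as outliers on y:", outliers_on_y)
```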
15
Outliers on Xs (Leverage)
Cutoff for $h_{ii}$: $\frac{2(k+1)}{N} = 0.32$
χ2(crit) for Mahalanobis’ Distance = 7.82
song  slsindex  marketing  rock   airplay  rockindx  MAH_1    LEV_1
20    59.51     172        52.84  21.99    52.84     0.7923   0.0330
21    60.87     144        58.60  21.61    58.60     0.4303   0.0179
22    62.23     139        56.05  25.45    56.05     5.4961   0.2290
23    63.59     5          49.96  8.00     49.96     11.1081  0.4628
24    64.95     189        80.82  28.98    80.82     5.7354   0.2390
25    66.30     200        63.37  25.15    63.37     1.2158   0.0507
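The two leverage screens can be sketched as follows. N = 25 and k = 3 are assumptions inferred from the cutoff of 0.32 and from df = 21 on the previous slide; only the six listed songs are included. In these rows MAH_1 equals (N - 1) times LEV_1 to within rounding, which is consistent with LEV_1 being a centered leverage value, though that reading is an inference from the numbers rather than something stated on the slide.

```python
n_cases, k = 25, 3                           # ASSUMPTION: inferred from 2(k+1)/N = 0.32
leverage_cutoff = 2 * (k + 1) / n_cases      # rule-of-thumb cutoff for hat elements
chi2_crit = 7.82                             # critical chi-square quoted on the slide

# song: (MAH_1, LEV_1) for the six songs shown.
rows = {
    20: (0.7923, 0.0330), 21: (0.4303, 0.0179), 22: (5.4961, 0.2290),
    23: (11.1081, 0.4628), 24: (5.7354, 0.2390), 25: (1.2158, 0.0507),
}

# Flag a case if it exceeds either the Mahalanobis or the leverage cutoff.
outliers_on_x = sorted(
    song for song, (mah, lev) in rows.items()
    if mah > chi2_crit or lev > leverage_cutoff
)

# Largest discrepancy between MAH_1 and (N - 1) * LEV_1 across the listed rows.
max_gap = max(abs(mah - (n_cases - 1) * lev) for mah, lev in rows.values())

print(f"leverage cutoff = {leverage_cutoff:.2f}")
print("Songs flagged as outliers on the predictors:", outliers_on_x)
print(f"max |MAH - (N-1)*LEV| = {max_gap:.4f}")
```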
16
Influential Observations
Cook’s Distance (cutoff ≈ 1.0)
DFFITs [cut-offs of 2 or 2*((k+1)/n)^0.5]
DFBeta
Standardized DFBeta
17
Influence (y & leverage)
song  slsindex  marketing  rock   airplay  rockindx  COO_1    SDF_1    SDB0_1   SDB1_1   SDB2_1   SDB3_1
20    59.51     172        52.84  21.99    52.84     0.0586   0.5102   0.0998   0.2585   -0.2728  -0.0651
21    60.87     144        58.60  21.61    58.60     0.0156   0.2497   -0.0696  -0.1229  0.0537   0.1257
22    62.23     139        56.05  25.45    56.05     0.0113   0.2076   -0.1324  -0.1589  0.0090   0.1909
23    63.59     5          49.96  8.00     49.96     1.1851   2.4107   0.7283   -1.2413  1.7368   -0.2221
24    64.95     189        80.82  28.98    80.82     0.0027   -0.1007  0.0783   0.0694   -0.0345  -0.0798
25    66.30     200        63.37  25.15    63.37     0.1155   0.7515   -0.1458  0.2495   -0.2708  0.0835
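The influence cutoffs can be applied to the listed rows in the same way. The sketch assumes N = 25 and k = 3 (inferred from the earlier slides) and uses the size-adjusted DFFITS cutoff, 2 times the square root of (k+1)/n; only the six songs shown are included.

```python
import math

n_cases, k = 25, 3                                   # ASSUMPTION: inferred from earlier slides
cooks_cutoff = 1.0
dffits_cutoff = 2 * math.sqrt((k + 1) / n_cases)     # size-adjusted DFFITS cutoff

# Cook's distance (COO_1) and standardized DFFITS (SDF_1) for the six songs shown.
cooks = {20: 0.0586, 21: 0.0156, 22: 0.0113, 23: 1.1851, 24: 0.0027, 25: 0.1155}
dffits = {20: 0.5102, 21: 0.2497, 22: 0.2076, 23: 2.4107, 24: -0.1007, 25: 0.7515}

flagged_by_cooks = sorted(s for s, d in cooks.items() if d > cooks_cutoff)
flagged_by_dffits = sorted(s for s, d in dffits.items() if abs(d) > dffits_cutoff)

print(f"DFFITS cutoff = {dffits_cutoff:.2f}")
print("Flagged by Cook's distance:", flagged_by_cooks)
print("Flagged by DFFITS:", flagged_by_dffits)
```

Song 25 has a large residual but comes in under both influence cutoffs, which illustrates that being an outlier on y does not by itself make a case influential.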
18
Once more, with feeling
R2 = .627
Coefficients(a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Lower/Upper Bound: 95% Confidence Interval for B)
Model 1                     B      Std. Error  Beta   t       Sig.  Lower Bound  Upper Bound
(Constant)                  19.017 5.811              3.273   .004  6.896        31.138
Marketing in thousands $'s  -.103  .062        -.593  -1.653  .114  -.233        .027
Rock Ratings                .188   .117        .437   1.604   .124  -.056        .431
Number of plays per day     1.737  .538        .933   3.230   .004  .615         2.859
a. Dependent Variable: Sales Index

ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  1383.859        3   461.286      11.200  .000(a)
Residual    823.706         20  41.185
Total       2207.565        23
a. Predictors: (Constant), Number of plays per day, Rock Ratings, Marketing in thousands $'s
b. Dependent Variable: Sales Index
19
Plot of Standardized y’ vs. Residual
20
A cautionary tale: Some more ways it can all go bad!
x   y1      y2     y3
4   4.260   3.100  5.390
5   5.680   4.740  5.730
6   7.240   6.130  6.080
7   4.820   7.260  6.420
8   6.950   8.140  6.770
9   8.810   8.770  7.110
10  8.040   9.140  7.460
11  8.330   9.260  7.810
12  10.840  9.130  8.150
13  7.580   8.740  12.740
14  9.960   8.100  8.840
We will use X to predict y1, y2 and y3 in turn.
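A minimal sketch of the three simple regressions, written in plain Python so it needs no libraries. The helper name is illustrative. It fits each y on x by least squares and reports the slope, intercept, and R2, which come out nearly identical across the three series even though the series look quite different when plotted.

```python
x = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
series = {
    "y1": [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96],
    "y2": [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10],
    "y3": [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84],
}

def simple_regression(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

fits = {name: simple_regression(x, y) for name, y in series.items()}
for name, (slope, intercept, r_squared) in fits.items():
    print(f"{name}: slope = {slope:.3f}, intercept = {intercept:.3f}, R2 = {r_squared:.4f}")
```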
21
Exhibit 1, x & y1
[Figure: Simple Regression Ex. 1. Scatterplot of y1 against X Values with fitted line, Linear (y1). R2 = 0.6665]
22
Exhibit 2 (x & y2)
[Figure: Simple Regression Ex. 2. Scatterplot of y2 against X Values with fitted line, Linear (y2). R2 = 0.6662]
23
Exhibit 3 (x & y3)
[Figure: Simple Regression Ex. 3. Scatterplot of y3 against X-Values with fitted line, Linear (y3). R2 = 0.6663]
24
Homoscedasticity: Yet another way it can all go bad!
What is homoscedasticity?
Is it better to have heteroscedasticity?
The effects of violation
How to identify it
Strategies for dealing with it
25
A visual representation of ways that it can all go bad!
26
Effect Size
Multiple Correlation (R): $R = r_{yy'}$
SMC (R2): $R^2 = \frac{SS_{REG}}{SS_{TOT}}$
$adjR^2 = 1 - (1 - R^2)\frac{N - 1}{N - k - 1}$
$F = \frac{R^2 / k}{(1 - R^2) / (N - k - 1)}$
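As a worked check of these formulas, the sketch below plugs in the sums of squares from the ANOVA table for the 24-case CD sales model (N = 24, k = 3). The variable names are illustrative. The results should match the R, R Square, Adjusted R Square, and F reported in that output to within rounding.

```python
import math

# Sums of squares from the ANOVA table of the 24-case CD sales model.
ss_reg = 1383.859
ss_tot = 2207.565
n_cases, k = 24, 3          # total df = 23, three predictors

r_squared = ss_reg / ss_tot                                   # SMC
multiple_r = math.sqrt(r_squared)                             # correlation of y with y'
adj_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - k - 1)
f_stat = (r_squared / k) / ((1 - r_squared) / (n_cases - k - 1))

print(f"R = {multiple_r:.3f}, R2 = {r_squared:.3f}, "
      f"adjusted R2 = {adj_r_squared:.3f}, F({k}, {n_cases - k - 1}) = {f_stat:.3f}")
```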
27
Cross Validation
Why
Useful statistics and techniques
Conditions under which likelihood of cross-validation is increased
28
Assumptions of Regression
Sample Size
Absence of Outliers & Influential Observations
Absence of Multicollinearity and Singularity
Normality
Linearity
Homoscedasticity of Errors
Independence of Errors
29
Structure Coefficients
What are they?
Vs. pattern coefficients or “weights”
Why we may need both
When they would be used in MRA
Why they are not commonly used
How you get them in SPSS
CD sales example
30
As a reminder, the coefficients (weights)
Coefficients(a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Lower/Upper Bound: 95% Confidence Interval for B)
Model 1                     B      Std. Error  Beta   t       Sig.  Lower Bound  Upper Bound
(Constant)                  19.017 5.811              3.273   .004  6.896        31.138
Marketing in thousands $'s  -.103  .062        -.593  -1.653  .114  -.233        .027
Rock Ratings                .188   .117        .437   1.604   .124  -.056        .431
Number of plays per day     1.737  .538        .933   3.230   .004  .615         2.859
a. Dependent Variable: Sales Index

Model Summary(b)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .792(a)  .627      .571               6.41758
a. Predictors: (Constant), Number of plays per day, Rock Ratings, Marketing in thousands $'s
b. Dependent Variable: Sales Index
31
Structure coefficients
Correlations with Unstandardized Predicted Value
Variable                    Pearson Correlation  Sig. (2-tailed)  N
Sales Index                 .792** (R)           .000             24
Marketing in thousands $'s  .765**               .000             24
Rock Ratings                .824**               .000             24
Number of plays per day     .949**               .000             24
**. Correlation is significant at the 0.01 level (2-tailed).
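One way to read these numbers: in least-squares regression a predictor's structure coefficient (its correlation with the predicted value) equals its zero-order correlation with the criterion divided by R. The sketch below uses that identity in reverse to recover the zero-order correlations implied by the table, and sets the structure coefficients beside the beta weights from the coefficients output. The implied correlations are derived here for illustration and are not printed on the slides.

```python
multiple_r = 0.792  # correlation of Sales Index with the predicted value

# Structure coefficients (from the correlations table) and beta weights
# (from the coefficients table) for the 24-case model.
structure = {
    "Marketing in thousands $'s": 0.765,
    "Rock Ratings": 0.824,
    "Number of plays per day": 0.949,
}
beta = {
    "Marketing in thousands $'s": -0.593,
    "Rock Ratings": 0.437,
    "Number of plays per day": 0.933,
}

# Identity: structure coefficient = r(x, y) / R, so r(x, y) = structure coefficient * R.
implied_zero_order = {name: rs * multiple_r for name, rs in structure.items()}

for name in structure:
    print(f"{name:28s} structure = {structure[name]: .3f}  "
          f"beta = {beta[name]: .3f}  implied r with y = {implied_zero_order[name]: .3f}")
```

Marketing carries a negative weight yet a strongly positive structure coefficient, which is the kind of disagreement that motivates reporting both.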
32
Model Building in MRA: “Canned” procedures
Enter
Forward
Backward Selection (Deletion)
Stepwise
Hierarchical
33
Hierarchical – Example
Predict employee satisfaction
Block 1: “Hygiene Factor”
Block 2: “Equity”
Block 3: “Organizational Commitment”
34
Model Summary
Model Summary(d)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .637(a)  .406      .405               .913
2      .730(b)  .533      .532               .810
3      .762(c)  .580      .578               .770
a. Predictors: (Constant), pbasic
b. Predictors: (Constant), pbasic, equity
c. Predictors: (Constant), pbasic, equity, affect, norm, indus, cont
d. Dependent Variable: satis
35
Analysis of Variance
ANOVA(d)
Model  Source      Sum of Squares  df    Mean Square  F        Sig.
1      Regression  591.651         1     591.651      709.152  .000(a)
1      Residual    866.845         1039  .834
1      Total       1458.496        1040
2      Regression  776.948         2     388.474      591.648  .000(b)
2      Residual    681.548         1038  .657
2      Total       1458.496        1040
3      Regression  846.134         6     141.022      238.123  .000(c)
3      Residual    612.361         1034  .592
3      Total       1458.496        1040
a. Predictors: (Constant), pbasic
b. Predictors: (Constant), pbasic, equity
c. Predictors: (Constant), pbasic, equity, affect, norm, indus, cont
d. Dependent Variable: satis
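In a hierarchical analysis the usual question is whether each block adds to the prediction beyond the blocks already entered. The sketch below computes the R2 change and the F test for that change from the sums of squares in the ANOVA table. The change statistics themselves are not shown in the output, so the values printed here are derived, and the function name is illustrative.

```python
ss_total = 1458.496

# Regression and residual sums of squares and df for the three models.
models = {
    1: {"ss_reg": 591.651, "df_reg": 1, "ss_res": 866.845, "df_res": 1039},
    2: {"ss_reg": 776.948, "df_reg": 2, "ss_res": 681.548, "df_res": 1038},
    3: {"ss_reg": 846.134, "df_reg": 6, "ss_res": 612.361, "df_res": 1034},
}

def block_change(reduced, full):
    """R2 change and F for the predictors added between two nested models."""
    added_df = full["df_reg"] - reduced["df_reg"]
    added_ss = full["ss_reg"] - reduced["ss_reg"]
    r2_change = added_ss / ss_total
    f_change = (added_ss / added_df) / (full["ss_res"] / full["df_res"])
    return r2_change, f_change, added_df, full["df_res"]

change_2 = block_change(models[1], models[2])   # Block 2 over Block 1
change_3 = block_change(models[2], models[3])   # Block 3 over Blocks 1 and 2

for label, (r2_change, f_change, df1, df2) in (("Block 2", change_2), ("Block 3", change_3)):
    print(f"{label}: R2 change = {r2_change:.3f}, F change({df1}, {df2}) = {f_change:.2f}")
```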
36
Coefficients for Models
Coefficients(a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient)
Model  Predictor   B      Std. Error  Beta   t       Sig.
1      (Constant)  1.246  .153               8.128   .000
1      pbasic      .971   .036        .637   26.630  .000
2      (Constant)  .686   .140               4.902   .000
2      pbasic      .672   .037        .441   18.207  .000
2      equity      .423   .025        .407   16.799  .000
3      (Constant)  1.331  .266               5.004   .000
3      pbasic      .494   .039        .324   12.718  .000
3      equity      .332   .026        .319   12.938  .000
3      affect      .041   .030        .032   1.371   .171
3      norm        .073   .017        .101   4.380   .000
3      indus       .052   .020        .066   2.663   .008
3      cont        -.138  .023        -.163  -5.930  .000
a. Dependent Variable: satis
37
Let’s not forget the lesson of structure coefficients…
Predictor  structure coefficients  Beta Wts.
affect     0.327                   0.032
norm       0.417                   0.101
cont       -0.580                  -0.163
indus      0.430                   0.066
equity     0.621                   0.319
pbasic     0.637                   0.324
38
Interpretation revisited
In light of multicollinearity
Standardized or unstandardized?
Suppressor effects
Missing predictors
Correlated / uncorrelated predictors
Structure coefficients
Reliability of indicators
Mathematical maximization nature of MRA