Multiple Regression Analysis: Part 2


Page 1: Multiple Regression Analysis: Part 2

Interpretation and Diagnostics

Page 2: Learning Objectives

Understand regression coefficients and semi-partial correlations

Learn to use diagnostics to locate problems with data (relative to MRA)

Understand assumptions, robustness, and methods of dealing with violations

Enhance our interpretation of equations

Understand entry methods

Page 3: Statistical Tests & Interpretation

Interpretation of regression coefficients: standardized, unstandardized, intercept

Testing regression coefficients: t-statistic & interpretation

Testing R2

Page 4: Output for MRA Run (coefficients)

                            Unstandardized Coef.     Standardized Coef.                      95% CI for B
                            B        Std. Error      Beta                 t       Sig.       Lower Bound   Upper Bound
(Constant)                  23.066   6.156                                3.747   0.001      10.263        35.868
Marketing in thousands $'s  -0.172   0.061           -1.066               -2.802  0.011      -0.299        -0.044
Rock Ratings                0.352    0.105           0.787                3.357   0.003      0.134         0.570
Number of plays per day     1.618    0.593           0.924                2.729   0.013      0.385         2.852

Dependent Variable: Sales Index

R2 = .558
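The unstandardized B's in the table above define the prediction equation directly. A minimal sketch (the predictor values 100, 50, and 20 below are made up for illustration, not taken from the slides):

```python
# Prediction from the fitted equation in the table above:
# Sales = 23.066 - 0.172*Marketing + 0.352*Rock + 1.618*Plays
def predict_sales(marketing, rock, plays):
    return 23.066 - 0.172 * marketing + 0.352 * rock + 1.618 * plays

# Each unstandardized B is the change in the Sales Index per one-unit
# change in that predictor, holding the other predictors constant:
base = predict_sales(100, 50, 20)
plus_one_play = predict_sales(100, 50, 21)
print(round(base, 2))                  # predicted sales index
print(round(plus_one_play - base, 3))  # equals B for plays: 1.618
```

Note that B for marketing is negative here even though marketing correlates positively with sales; the later slides on suppressor effects and multicollinearity explain why that can happen.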

Page 5: Variance in Y Accounted for by two uncorrelated Predictors

[Venn diagrams: circle Y overlapping two non-overlapping circles X1 and X2; A = overlap of Y and X1, B = overlap of Y and X2, E = remainder of Y]

Example #1: Small R2. A represents variance in Y accounted for by X1; B represents variance in Y accounted for by X2.

Example #2: Larger R2. A represents variance in Y accounted for by X1; B represents variance in Y accounted for by X2.

(A + B)/Y = R2; E (the rest of the Y circle) equals error.

Page 6: Variance in Y Accounted for by two correlated Predictors: sr2 and pr2

[Venn diagrams: circle Y overlapping correlated circles X1 and X2; A = unique overlap of Y and X1, B = unique overlap of Y and X2, C = overlap of Y with both X1 and X2, D = overlap of X1 and X2 outside Y, E = remainder of Y. Example #1: Small R2; Example #2: Larger R2]

In this area notation:

sr2 for X1 = A / (A + B + C + E)  (unique contribution of X1, relative to all the variance in Y)

pr2 for X1 = A / (A + E)  (unique contribution of X1, relative to the variance in Y not already explained by X2)
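For two predictors, sr2 and pr2 can be computed straight from the zero-order correlations. A sketch (the r values below are made up for illustration, not taken from the slides):

```python
from math import sqrt

# Two-predictor case: squared semi-partial (sr2) and partial (pr2)
# correlations for X1 from zero-order correlations.
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30   # r(Y,X1), r(Y,X2), r(X1,X2)

sr1 = (r_y1 - r_y2 * r_12) / sqrt(1 - r_12**2)   # unique X1, vs. all of Y
R2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
pr1_sq = sr1**2 / (1 - r_y2**2)   # unique X1, vs. Y not explained by X2

print(round(sr1**2, 3), round(pr1_sq, 3), round(R2, 3))
```

Because pr2 divides the same unique contribution by a smaller denominator (only the Y variance X2 leaves unexplained), pr2 is always at least as large as sr2.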

Page 7: Unique Contributions -- breaking sr2 down

R2 = .558

                            Unstandardized Coef.     Standardized Coef.                      95% CI for B                Correlations
                            B        Std. Error      Beta                 t       Sig.       Lower Bound   Upper Bound   Zero-order  Partial  Part
(Constant)                  23.066   6.156                                3.747   0.001      10.263        35.868
Marketing in thousands $'s  -0.172   0.061           -1.066               -2.802  0.011      -0.299        -0.044        0.388       -0.522   -0.406
Rock Ratings                0.352    0.105           0.787                3.357   0.003      0.134         0.570         0.615       0.591    0.487
Number of plays per day     1.618    0.593           0.924                2.729   0.013      0.385         2.852         0.527       0.512    0.396

Dependent Variable: Sales Index

Page 8: A shortcoming to breaking down sr2

Coefficients (Dependent Variable: Y)

            Unstandardized Coef.    Standardized Coef.
            B       Std. Error      Beta        t       Sig.    Zero-order  Partial  Part
(Constant)  .000    .222                        .000    1.000
X1          .400    .263            .400        1.523   .146    .300        .346     .346
X2          -.200   .263            -.200       -.761   .457    .000        -.182    -.173

R2 = .120

Page 9: Multicollinearity: One way it can all go bad!

[Venn diagram: Y overlapping two heavily overlapping circles X1 and X2; A and B are the small unique contributions, C the large shared contribution to Y, D the predictor overlap outside Y, E the remainder of Y]

Page 10: Methods for diagnosing multicollinearity

                            Unstandardized Coef.     Standardized Coef.                      95% CI for B                Collinearity Statistics
                            B        Std. Error      Beta                 t       Sig.       Lower Bound   Upper Bound   Tolerance   VIF
(Constant)                  23.066   6.156                                3.747   0.001      10.263        35.868
Marketing in thousands $'s  -0.172   0.061           -1.066               -2.802  0.011      -0.299        -0.044        0.145441    6.8756
Rock Ratings                0.352    0.105           0.787                3.357   0.003      0.134         0.570         0.382775    2.6125
Number of plays per day     1.618    0.593           0.924                2.729   0.013      0.385         2.852         0.183382    5.4531

Dependent Variable: Sales Index

Collinearity Diagnostics (Dependent Variable: Sales Index)

                                       Variance Proportions
Dimension  Eigenvalue  Condition Index  (Constant)  Marketing in thousands $'s  Rock Ratings  Number of plays per day
1          3.861       1.000            .00         .00                         .00           .00
2          .093        6.434            .37         .05                         .08           .00
3          .038        10.134           .01         .14                         .88           .05
4          .009        21.300           .61         .80                         .03           .95
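The Tolerance and VIF columns above are two views of the same quantity: VIF = 1 / Tolerance, where Tolerance = 1 - R2 from regressing that predictor on the others. A quick check using the values from the table:

```python
# Collinearity screen: VIF = 1 / Tolerance. A common rule of thumb is
# to worry when Tolerance < .10 (equivalently, VIF > 10).
tolerance = {
    "Marketing in thousands $'s": 0.145441,
    "Rock Ratings": 0.382775,
    "Number of plays per day": 0.183382,
}
for name, tol in tolerance.items():
    vif = 1 / tol
    flag = "  <-- severe" if tol < 0.10 else ""
    print(f"{name}: VIF = {vif:.2f}{flag}")
```

None of these predictors crosses the conventional VIF > 10 line, but marketing (VIF near 7) shares most of its variance with the other predictors, which is consistent with its suspicious negative B.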

Page 11: Ways to fix multicollinearity

Discarding predictors

Combining predictors: using principal components; parcelling

Ridge Regression

Page 12: Outliers and Influential Observations: Another way it can all go bad!

Outliers on y

Outliers on x’s

Influential data points

Page 13: Outliers

Outliers on y: standardized residuals; studentized residuals (df = N – k – 1); deleted studentized residuals

Outliers on x’s: hat elements; Mahalanobis distance

Page 14: Outliers on y

tcrit(21) = 2.08

song  slsindex  marketing  rock   airplay  rockindx  PRE_1    RES_1    DRE_1    ZRE_1    SRE_1    SDR_1
20    59.51     172        52.84  21.99    52.84     47.7047  11.8063  12.7362  1.6615   1.7257   1.8179
21    60.87     144        58.60  21.61    58.60     53.9276  6.9424   7.3693   0.9770   1.0066   1.0069
22    62.23     139        56.05  25.45    56.05     60.1036  2.1245   2.9063   0.2990   0.3497   0.3423
23    63.59     5          49.96  8.00     49.96     52.7401  10.8469  21.8176  1.5265   2.1649   2.3971
24    64.95     189        80.82  28.98    80.82     65.9458  -0.9998  -1.3866  -0.1407  -0.1657  -0.1618
25    66.30     200        63.37  25.15    63.37     51.7151  14.5889  16.0434  2.0531   2.1530   2.3802
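Applying the critical value from the slide to the SDR_1 column above flags the y-outliers mechanically:

```python
# Studentized deleted residuals (SDR_1) from the table above, compared
# with the two-tailed critical value t(21) = 2.08 given on the slide.
sdr = {20: 1.8179, 21: 1.0069, 22: 0.3423, 23: 2.3971, 24: -0.1618, 25: 2.3802}
t_crit = 2.08
flagged = sorted(song for song, r in sdr.items() if abs(r) > t_crit)
print(flagged)  # songs whose deleted studentized residual exceeds t_crit
```

Songs 23 and 25 exceed the cutoff; note that song 23's ordinary standardized residual (ZRE_1 = 1.53) looks unremarkable, which is exactly why the deleted version is preferred.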

Page 15: Outliers on Xs (Leverage)

Leverage cutoff: h_ii > 2(k + 1)/N = 2(3 + 1)/25 = 0.32

χ2(crit) for Mahalanobis’ Distance = 7.82

song  slsindex  marketing  rock   airplay  rockindx  MAH_1    LEV_1
20    59.51     172        52.84  21.99    52.84     0.7923   0.0330
21    60.87     144        58.60  21.61    58.60     0.4303   0.0179
22    62.23     139        56.05  25.45    56.05     5.4961   0.2290
23    63.59     5          49.96  8.00     49.96     11.1081  0.4628
24    64.95     189        80.82  28.98    80.82     5.7354   0.2390
25    66.30     200        63.37  25.15    63.37     1.2158   0.0507
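Both screens above can be reproduced from the LEV_1 column alone, since Mahalanobis distance is just (N - 1) times the centered leverage (consistent with the MAH_1 values in the table, up to rounding):

```python
# Leverage screen: with k = 3 predictors and N = 25 cases, the rule of
# thumb h_ii > 2(k + 1)/N gives the 0.32 cutoff shown on the slide.
k, N = 3, 25
lev_cutoff = 2 * (k + 1) / N
lev = {20: 0.0330, 21: 0.0179, 22: 0.2290, 23: 0.4628, 24: 0.2390, 25: 0.0507}
mah = {song: (N - 1) * h for song, h in lev.items()}  # Mahalanobis D2 = (N-1) * centered leverage
mah_crit = 7.82  # chi-square critical value for df = k = 3

print(lev_cutoff)                                           # 0.32
print(sorted(s for s, h in lev.items() if h > lev_cutoff))  # high-leverage songs
print(sorted(s for s, d in mah.items() if d > mah_crit))    # beyond the chi-square cutoff
```

Only song 23 (the case with marketing = 5 and airplay = 8, far from the other cases' x-values) exceeds both cutoffs.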

Page 16: Influential Observations

Cook’s Distance (cutoff ≈ 1.0)

DFFITS (cutoffs of 2, or 2·((k + 1)/n)^0.5)

DFBeta and standardized DFBeta

Page 17: Influence (y & leverage)

song  slsindex  marketing  rock   airplay  rockindx  COO_1   SDF_1    SDB0_1   SDB1_1   SDB2_1   SDB3_1
20    59.51     172        52.84  21.99    52.84     0.0586  0.5102   0.0998   0.2585   -0.2728  -0.0651
21    60.87     144        58.60  21.61    58.60     0.0156  0.2497   -0.0696  -0.1229  0.0537   0.1257
22    62.23     139        56.05  25.45    56.05     0.0113  0.2076   -0.1324  -0.1589  0.0090   0.1909
23    63.59     5          49.96  8.00     49.96     1.1851  2.4107   0.7283   -1.2413  1.7368   -0.2221
24    64.95     189        80.82  28.98    80.82     0.0027  -0.1007  0.0783   0.0694   -0.0345  -0.0798
25    66.30     200        63.37  25.15    63.37     0.1155  0.7515   -0.1458  0.2495   -0.2708  0.0835
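Applying the cutoffs from the previous page to the COO_1 and SDF_1 columns above:

```python
# Influence screens: Cook's D cutoff ~ 1.0; the sample-size-adjusted
# DFFITS cutoff 2*sqrt((k+1)/n) = 2*sqrt(4/25) = 0.8 for this data set.
k, n = 3, 25
dffits_cut = 2 * ((k + 1) / n) ** 0.5
cooks = {20: 0.0586, 21: 0.0156, 22: 0.0113, 23: 1.1851, 24: 0.0027, 25: 0.1155}
dffits = {20: 0.5102, 21: 0.2497, 22: 0.2076, 23: 2.4107, 24: -0.1007, 25: 0.7515}

print(dffits_cut)                                               # 0.8
print(sorted(s for s, d in cooks.items() if d > 1.0))           # influential by Cook's D
print(sorted(s for s, d in dffits.items() if abs(d) > dffits_cut))
```

Song 23 is flagged by both criteria, and its standardized DFBetas (e.g., SDB2_1 = 1.74) show which coefficients it is dragging around.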

Page 18: Once more, with feeling

R2 = .627

Coefficients (Dependent Variable: Sales Index)

                            Unstandardized Coef.    Standardized Coef.                    95% Confidence Interval for B
                            B       Std. Error      Beta        t       Sig.    Lower Bound  Upper Bound
(Constant)                  19.017  5.811                       3.273   .004    6.896        31.138
Marketing in thousands $'s  -.103   .062            -.593       -1.653  .114    -.233        .027
Rock Ratings                .188    .117            .437        1.604   .124    -.056        .431
Number of plays per day     1.737   .538            .933        3.230   .004    .615         2.859

ANOVA (Dependent Variable: Sales Index)
Predictors: (Constant), Number of plays per day, Rock Ratings, Marketing in thousands $'s

            Sum of Squares  df  Mean Square  F       Sig.
Regression  1383.859        3   461.286      11.200  .000
Residual    823.706         20  41.185
Total       2207.565        23

Page 19: Plot of Standardized y’ vs. Residual

[Scatterplot of standardized predicted values against standardized residuals]

Page 20: A cautionary tale: Some more ways it can all go bad!

x    y1      y2     y3
4    4.260   3.100  5.390
5    5.680   4.740  5.730
6    7.240   6.130  6.080
7    4.820   7.260  6.420
8    6.950   8.140  6.770
9    8.810   8.770  7.110
10   8.040   9.140  7.460
11   8.330   9.260  7.810
12   10.840  9.130  8.150
13   7.580   8.740  12.740
14   9.960   8.100  8.840

We will use X to predict y1, y2 and y3 in turn.
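The punch line of the next three exhibits can be verified directly: despite wildly different scatterplots, all three regressions give essentially the same line and R2. A plain-Python least-squares sketch using the data above:

```python
# Least-squares fits of y1, y2, and y3 on x, using the data above.
x = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
y1 = [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96]
y2 = [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10]
y3 = [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]

def fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))
    ss_tot = sum((yi - my) ** 2 for yi in ys)
    return intercept, slope, 1 - ss_res / ss_tot

for ys in (y1, y2, y3):
    a, b, r2 = fit(x, ys)
    print(f"y' = {a:.2f} + {b:.3f}x,  R2 = {r2:.4f}")
```

Every fit comes out near y' = 3.00 + 0.500x with R2 ≈ 0.666, which is why the summary statistics alone cannot tell you whether the model is appropriate; you have to look at the plots.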

Page 21: Exhibit 1, x & y1

[Scatterplot "Simple Regression Ex. 1": y1 against X values with fitted line; R2 = 0.6665]

Page 22: Exhibit 2 (x & y2)

[Scatterplot "Simple Regression Ex. 2": y2 against X values with fitted line; R2 = 0.6662]

Page 23: Exhibit 3 (x & y3)

[Scatterplot "Simple Regression Ex. 3": y3 against X values with fitted line; R2 = 0.6663]

Page 24: Homoscedasticity: Yet another way it can all go bad!

What is homoscedasticity?

Is it better to have heteroscedasticity?

The effects of violation

How to identify it

Strategies for dealing with it
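One crude way to identify heteroscedasticity numerically is to compare residual spread in the lower vs. upper half of the fitted values. This `spread_ratio` helper and its toy data are hypothetical illustrations, not part of any SPSS output:

```python
# A quick heteroscedasticity screen: ratio of mean absolute residual in
# the upper half of fitted values to that in the lower half. Values far
# above 1 suggest the funnel shape of non-constant error variance.
def spread_ratio(fitted, residuals):
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Toy data whose residual spread grows with the fitted value:
fitted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
residuals = [0.1, -0.1, 0.2, -0.2, 0.5, -0.5, 1.0, -1.0, 2.0, -2.0]
print(round(spread_ratio(fitted, residuals), 1))  # well above 1 -> funnel shape
```

In practice the residual-vs-predicted plot (as on page 19) remains the primary diagnostic; a ratio like this only summarizes what the plot shows.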

Page 25: A visual representation of ways that it can all go bad!

Page 26: Effect Size

Multiple Correlation (R): R = r_yy′ (the correlation between observed y and predicted y′)

SMC (R2): R2 = SS_REG / SS_TOT

Adjusted R2: R2_adj = 1 − (1 − R2)(N − 1)/(N − k − 1)

Test of R2: F = (R2/k) / ((1 − R2)/(N − k − 1))
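Plugging the "once more, with feeling" run (R2 = .627, N = 24 cases after deleting song 23, k = 3 predictors) into the formulas above reproduces the SPSS output shown on pages 18 and 30:

```python
# Adjusted R2 and the F test of R2 for the re-run CD sales model.
R2, N, k = 0.627, 24, 3

adj_R2 = 1 - (1 - R2) * (N - 1) / (N - k - 1)
F = (R2 / k) / ((1 - R2) / (N - k - 1))

print(round(adj_R2, 3))  # matches the Model Summary's Adjusted R Square
print(round(F, 1))       # matches the ANOVA table's F = 11.200
```

The adjustment matters most when k is large relative to N; here it pulls .627 down to .571.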

Page 27: Cross Validation

Why

Useful statistics and techniques

Conditions under which likelihood of cross-validation is increased

Page 28: Assumptions of Regression

Sample size

Absence of outliers & influential observations

Absence of multicollinearity and singularity

Normality

Linearity

Homoscedasticity of errors

Independence of errors

Page 29: Structure Coefficients

What are they? (vs. pattern coefficients or “weights”)

Why we may need both

When they would be used in MRA

Why they are not commonly used

How you get them in SPSS

CD sales example
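A structure coefficient is a predictor's correlation with the predicted scores, which equals its zero-order correlation with Y divided by R. A sketch using the zero-order r's from page 7 and R2 = .558 for the original CD-sales run:

```python
from math import sqrt

# Structure coefficient = r(predictor, Y) / R, i.e., the predictor's
# correlation with the predicted y' scores.
R = sqrt(0.558)
zero_order = {
    "Marketing in thousands $'s": 0.388,
    "Rock Ratings": 0.615,
    "Number of plays per day": 0.527,
}
for name, r in zero_order.items():
    print(f"{name}: structure coefficient = {r / R:.3f}")
```

Notice that marketing's structure coefficient is positive and substantial even though its beta weight is negative; that contrast is the "lesson of structure coefficients" revisited on page 37.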

Page 30: As a reminder, the coefficients (weights)

Coefficients (Dependent Variable: Sales Index)

                            Unstandardized Coef.    Standardized Coef.                    95% Confidence Interval for B
                            B       Std. Error      Beta        t       Sig.    Lower Bound  Upper Bound
(Constant)                  19.017  5.811                       3.273   .004    6.896        31.138
Marketing in thousands $'s  -.103   .062            -.593       -1.653  .114    -.233        .027
Rock Ratings                .188    .117            .437        1.604   .124    -.056        .431
Number of plays per day     1.737   .538            .933        3.230   .004    .615         2.859

Model Summary (Dependent Variable: Sales Index)
Predictors: (Constant), Number of plays per day, Rock Ratings, Marketing in thousands $'s

R      R Square  Adjusted R Square  Std. Error of the Estimate
.792   .627      .571               6.41758

Page 31: Structure coefficients

Pearson correlations with the Unstandardized Predicted Value (N = 24; ** correlation is significant at the 0.01 level, 2-tailed):

Sales Index                 .792**
Marketing in thousands $'s  .765**
Rock Ratings                .824**
Number of plays per day     .949**

Page 32: Model Building in MRA: “Canned” procedures

Enter

Forward

Backward Selection (Deletion)

Stepwise

Hierarchical

Page 33: Hierarchical – Example

Predict employee satisfaction

Block 1: “Hygiene Factor”

Block 2: “Equity”

Block 3: “Organizational Commitment”

Page 34: Model Summary

Model Summary (Dependent Variable: satis)

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .637  .406      .405               .913
2      .730  .533      .532               .810
3      .762  .580      .578               .770

Predictors: Model 1: (Constant), pbasic. Model 2: (Constant), pbasic, equity. Model 3: (Constant), pbasic, equity, affect, norm, indus, cont.
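What hierarchical entry buys you is the increment at each block. The standard change-in-R2 F test (not shown in the SPSS output above) can be computed from the Model Summary R2's and the residual df's in the ANOVA table on the next page:

```python
# F test of the R2 increment when a block of predictors is added:
# F = (delta_R2 / df_change) / ((1 - R2_full) / df_resid_full)
def f_change(r2_full, r2_reduced, df_change, df_resid_full):
    dr2 = r2_full - r2_reduced
    return dr2, (dr2 / df_change) / ((1 - r2_full) / df_resid_full)

d2, f2 = f_change(0.533, 0.406, df_change=1, df_resid_full=1038)  # Block 2: equity
d3, f3 = f_change(0.580, 0.533, df_change=4, df_resid_full=1034)  # Block 3: commitment vars
print(round(d2, 3), round(f2, 1))
print(round(d3, 3), round(f3, 1))
```

Both increments are significant at this sample size (N = 1041), though equity's 12.7-point jump in R2 is far larger than the commitment block's 4.7 points spread over four predictors.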

Page 35: Analysis of Variance

ANOVA (Dependent Variable: satis)

Model           Sum of Squares  df    Mean Square  F        Sig.
1  Regression   591.651         1     591.651      709.152  .000
   Residual     866.845         1039  .834
   Total        1458.496        1040
2  Regression   776.948         2     388.474      591.648  .000
   Residual     681.548         1038  .657
   Total        1458.496        1040
3  Regression   846.134         6     141.022      238.123  .000
   Residual     612.361         1034  .592
   Total        1458.496        1040

Predictors: Model 1: (Constant), pbasic. Model 2: (Constant), pbasic, equity. Model 3: (Constant), pbasic, equity, affect, norm, indus, cont.

Page 36: Coefficients for Models

Coefficients (Dependent Variable: satis)

                 Unstandardized Coef.    Standardized Coef.
Model            B      Std. Error       Beta        t       Sig.
1  (Constant)    1.246  .153                         8.128   .000
   pbasic        .971   .036             .637        26.630  .000
2  (Constant)    .686   .140                         4.902   .000
   pbasic        .672   .037             .441        18.207  .000
   equity        .423   .025             .407        16.799  .000
3  (Constant)    1.331  .266                         5.004   .000
   pbasic        .494   .039             .324        12.718  .000
   equity        .332   .026             .319        12.938  .000
   affect        .041   .030             .032        1.371   .171
   norm          .073   .017             .101        4.380   .000
   indus         .052   .020             .066        2.663   .008
   cont          -.138  .023             -.163       -5.930  .000

Page 37: Let’s not forget the lesson of structure coefficients…

         Structure coefficient  Beta weight
affect   0.327                  0.032
norm     0.417                  0.101
cont     -0.580                 -0.163
indus    0.430                  0.066
equity   0.621                  0.319
pbasic   0.637                  0.324

Page 38: Interpretation revisited

In light of multicollinearity

Standardized or unstandardized?

Suppressor effects

Missing predictors

Correlated / uncorrelated predictors

Structure coefficients

Reliability of indicators

Mathematical maximization nature of MRA