Multiple Regression Analysis: Part 2

Post on 12-Jan-2016



1

Multiple Regression Analysis: Part 2

Interpretation and Diagnostics

2

Learning Objectives

Understand regression coefficients and semi-partial correlations

Learn to use diagnostics to locate problems with data (relative to MRA)

Understand…
  Assumptions
  Robustness
  Methods of dealing with violations

Enhance our interpretation of equations

Understand entry methods

3

Statistical Tests & Interpretation

Interpretation of regression coefficients
  Standardized
  Unstandardized
  Intercept

Testing regression coefficients
  t-statistic & interpretation

Testing R²

4

Output for MRA Run (coefficients)

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower and Upper Bound give the 95% CI for B.

Predictor                   B       Std. Error  Beta    t       Sig.   Lower Bound  Upper Bound
(Constant)                  23.066  6.156               3.747   0.001  10.263       35.868
Marketing in thousands $'s  -0.172  0.061       -1.066  -2.802  0.011  -0.299       -0.044
Rock Ratings                0.352   0.105       0.787   3.357   0.003  0.134        0.570
Number of plays per day     1.618   0.593       0.924   2.729   0.013  0.385        2.852

Dependent Variable: Sales Index

R² = .558

5

Variance in Y Accounted for by two uncorrelated Predictors

[Figure: two diagrams, each showing circles Y, X1, and X2, with regions labeled A, B, and E]

Example #1: Small R². A represents variance in Y accounted for by X1; B = variance in Y accounted for by X2.

Example #2: Larger R². A represents variance in Y accounted for by X1; B = variance in Y accounted for by X2.

(A+B)/Y = R²; E (in the Y circle) equals error.

6

Variance in Y Accounted for by two correlated Predictors: sr2 and pr2

[Figure: two diagrams of overlapping Y, X1, and X2 circles, with regions labeled A, B, C, and D]

Example #1: Small R²

Example #2: Larger R²

sr2 for X1 =

pr2 for X1 =

7

Unique Contributions -- breaking sr2 down

R² = .558

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower and Upper Bound give the 95% CI for B; Zero-order, Partial, and Part are the Correlations columns.

Predictor                   B       Std. Error  Beta    t       Sig.   Lower Bound  Upper Bound  Zero-order  Partial  Part
(Constant)                  23.066  6.156               3.747   0.001  10.263       35.868
Marketing in thousands $'s  -0.172  0.061       -1.066  -2.802  0.011  -0.299       -0.044       0.388       -0.522   -0.406
Rock Ratings                0.352   0.105       0.787   3.357   0.003  0.134        0.570        0.615       0.591    0.487
Number of plays per day     1.618   0.593       0.924   2.729   0.013  0.385        2.852        0.527       0.512    0.396

Dependent Variable: Sales Index
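
The Part column is the semi-partial correlation, so squaring it gives sr², the share of total variance in Y that a predictor adds uniquely. A minimal sketch, assuming the standard identities sr² = R² (full) minus R² (without the predictor) and pr² = sr² / (1 − R² without the predictor), can recover the Partial column from the Part column and R². The variable and function names are illustrative only; the numbers are the ones printed on this slide.

```python
import math

R2_FULL = 0.558  # R² reported on the slide

# (predictor, Partial, Part) as printed in the Correlations columns
rows = [
    ("Marketing in thousands $'s", -0.522, -0.406),
    ("Rock Ratings", 0.591, 0.487),
    ("Number of plays per day", 0.512, 0.396),
]

def squared_partial_from_part(part, r2_full):
    """Return (sr², pr²) for one predictor.

    sr² is the squared Part correlation. R² without the predictor is
    r2_full - sr², and pr² = sr² / (1 - R² without the predictor).
    """
    sr2 = part ** 2
    r2_without = r2_full - sr2
    return sr2, sr2 / (1.0 - r2_without)

recovered = {}
for name, partial, part in rows:
    sr2, pr2 = squared_partial_from_part(part, R2_FULL)
    recovered[name] = (sr2, pr2)
    print(f"{name}: sr2 = {sr2:.3f}, pr2 = {pr2:.3f}, "
          f"|partial| recovered = {math.sqrt(pr2):.3f}, "
          f"|partial| printed = {abs(partial):.3f}")
```

Because the inputs are rounded to three decimals, the recovered partial correlations should match the printed ones only up to rounding.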

8

A shortcoming to breaking down sr2

Coefficients (Dependent Variable: Y)

Predictor   B      Std. Error  Beta   t      Sig.   Zero-order  Partial  Part
(Constant)  .000   .222               .000   1.000
X1          .400   .263        .400   1.523  .146   .300        .346     .346
X2          -.200  .263        -.200  -.761  .457   .000        -.182    -.173

R² = .120

9

Multicollinearity: One way it can all go bad!

[Figure: overlapping Y, X1, and X2 circles with regions labeled A, B, C, D, and E]

10

Methods for diagnosing multicollinearity

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower and Upper Bound give the 95% CI for B; Tolerance and VIF are the Collinearity Statistics.

Predictor                   B       Std. Error  Beta    t       Sig.   Lower Bound  Upper Bound  Tolerance  VIF
(Constant)                  23.066  6.156               3.747   0.001  10.263       35.868
Marketing in thousands $'s  -0.172  0.061       -1.066  -2.802  0.011  -0.299       -0.044       0.145441   6.8756
Rock Ratings                0.352   0.105       0.787   3.357   0.003  0.134        0.570        0.382775   2.6125
Number of plays per day     1.618   0.593       0.924   2.729   0.013  0.385        2.852        0.183382   5.4531

Dependent Variable: Sales Index

Collinearity Diagnostics (Dependent Variable: Sales Index)

The last four columns are the Variance Proportions.

Dimension  Eigenvalue  Condition Index  (Constant)  Marketing in thousands $'s  Rock Ratings  Number of plays per day
1          3.861       1.000            .00         .00                         .00           .00
2          .093        6.434            .37         .05                         .08           .00
3          .038        10.134           .01         .14                         .88           .05
4          .009        21.300           .61         .80                         .03           .95
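
The collinearity columns on this slide are tied together by simple arithmetic: VIF is the reciprocal of tolerance, tolerance is 1 minus the R² from regressing a predictor on the other predictors, and each condition index is the square root of the largest eigenvalue divided by that dimension's eigenvalue. The sketch below recomputes them from the printed values; the variable names are illustrative, and the condition indices agree with the slide only approximately because the printed eigenvalues are rounded to three decimals.

```python
import math

# Tolerance values and eigenvalues as printed on the slide
tolerance = {
    "Marketing in thousands $'s": 0.145441,
    "Rock Ratings": 0.382775,
    "Number of plays per day": 0.183382,
}
eigenvalues = [3.861, 0.093, 0.038, 0.009]

# VIF is the reciprocal of tolerance
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Tolerance = 1 - R² of the predictor regressed on the other predictors
r2_with_others = {name: 1.0 - tol for name, tol in tolerance.items()}

# Condition index per dimension: sqrt(largest eigenvalue / eigenvalue)
condition_index = [math.sqrt(eigenvalues[0] / ev) for ev in eigenvalues]

for name in tolerance:
    print(f"{name}: VIF = {vif[name]:.4f}, "
          f"R2 with other predictors = {r2_with_others[name]:.3f}")
print("Condition indices:", [round(ci, 3) for ci in condition_index])
```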

11

Ways to fix multicollinearity

Discarding Predictors

Combining Predictors
  Using Principal Components
  Parcelling

Ridge Regression

12

Outliers and Influential Observations: Another way it can all go bad!

Outliers on y

Outliers on x’s

Influential data points

13

Outliers

Outliers on y
  Standardized Residuals
  Studentized Residuals (df = N – k – 1)
  Deleted Studentized Residuals

Outliers on x’s
  Hat elements
  Mahalanobis Distance

14

Outliers on y

tcrit(21) = 2.08

song  slsindex  marketing  rock   airplay  rockindx  PRE_1    RES_1    DRE_1    ZRE_1    SRE_1    SDR_1
20    59.51     172        52.84  21.99    52.84     47.7047  11.8063  12.7362  1.6615   1.7257   1.8179
21    60.87     144        58.60  21.61    58.60     53.9276  6.9424   7.3693   0.9770   1.0066   1.0069
22    62.23     139        56.05  25.45    56.05     60.1036  2.1245   2.9063   0.2990   0.3497   0.3423
23    63.59     5          49.96  8.00     49.96     52.7401  10.8469  21.8176  1.5265   2.1649   2.3971
24    64.95     189        80.82  28.98    80.82     65.9458  -0.9998  -1.3866  -0.1407  -0.1657  -0.1618
25    66.30     200        63.37  25.15    63.37     51.7151  14.5889  16.0434  2.0531   2.1530   2.3802
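
The residual columns in this table are linked by each case's hat value h. As a hedged sketch using the usual textbook relations (DRE = RES / (1 − h), SRE = ZRE / √(1 − h), and SDR = SRE · √((df − 1) / (df − SRE²)) with df = N − k − 1), the code below recovers h from RES_1 and DRE_1 and then recomputes SRE_1 and SDR_1 for three of the listed songs. N = 25 and k = 3 are inferred from tcrit(21); the variable names are illustrative.

```python
import math

N, K = 25, 3          # cases and predictors (inferred from df = 21)
DF = N - K - 1        # residual degrees of freedom
T_CRIT = 2.08         # critical t from the slide

# (song, RES_1, DRE_1, ZRE_1, SRE_1, SDR_1) copied from the table
rows = [
    (20, 11.8063, 12.7362, 1.6615, 1.7257, 1.8179),
    (23, 10.8469, 21.8176, 1.5265, 2.1649, 2.3971),
    (25, 14.5889, 16.0434, 2.0531, 2.1530, 2.3802),
]

computed = {}
for song, res, dre, zre, sre_printed, sdr_printed in rows:
    h = 1.0 - res / dre                      # from DRE = RES / (1 - h)
    sre = zre / math.sqrt(1.0 - h)           # studentized residual
    sdr = sre * math.sqrt((DF - 1) / (DF - sre ** 2))  # deleted studentized
    computed[song] = (h, sre, sdr)
    status = "exceeds t-crit" if abs(sdr) > T_CRIT else "within t-crit"
    print(f"song {song}: h = {h:.4f}, SRE = {sre:.4f}, SDR = {sdr:.4f} ({status})")
```

Note how song 23 has a modest raw residual but a large deleted residual: its high hat value pulls the fitted line toward it, which is why the deleted versions are the more sensitive check.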

15

Outliers on Xs (Leverage)

Leverage cutoff:

$$h_{ii} > \frac{2(k+1)}{N} = 0.32$$

χ²(crit) for Mahalanobis’ Distance = 7.82

song  slsindex  marketing  rock   airplay  rockindx  MAH_1    LEV_1
20    59.51     172        52.84  21.99    52.84     0.7923   0.0330
21    60.87     144        58.60  21.61    58.60     0.4303   0.0179
22    62.23     139        56.05  25.45    56.05     5.4961   0.2290
23    63.59     5          49.96  8.00     49.96     11.1081  0.4628
24    64.95     189        80.82  28.98    80.82     5.7354   0.2390
25    66.30     200        63.37  25.15    63.37     1.2158   0.0507
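
The two columns above carry the same information: dividing MAH_1 by N − 1 reproduces LEV_1, which suggests LEV_1 is the centered leverage (h − 1/N). The sketch below checks that relation and applies both cutoffs from this slide. Treating LEV_1 as centered leverage, and therefore adding 1/N before comparing with 2(k + 1)/N, is an assumption; here the flagged case is the same either way.

```python
N, K = 25, 3
LEVERAGE_CUTOFF = 2 * (K + 1) / N   # cutoff from the slide
MAH_CRIT = 7.82                     # chi-square critical value from the slide

# (song, MAH_1, LEV_1) copied from the table
rows = [
    (20, 0.7923, 0.0330),
    (21, 0.4303, 0.0179),
    (22, 5.4961, 0.2290),
    (23, 11.1081, 0.4628),
    (24, 5.7354, 0.2390),
    (25, 1.2158, 0.0507),
]

flagged = []
for song, mah, lev in rows:
    lev_from_mah = mah / (N - 1)    # centered leverage implied by MAH_1
    h = lev + 1 / N                 # assumption: LEV_1 is centered leverage
    if h > LEVERAGE_CUTOFF or mah > MAH_CRIT:
        flagged.append(song)
    print(f"song {song}: MAH/(N-1) = {lev_from_mah:.4f}, LEV_1 = {lev:.4f}, h = {h:.4f}")

print("Flagged as outliers on the x's:", flagged)
```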

16

Influential Observations

Cook’s Distance (cutoff ≈ 1.0)

DFFITs [cut-offs of 2 or 2*((k+1)/n)^0.5]

DFBeta

Standardized DF Beta

17

Influence (y & leverage)

song  slsindex  marketing  rock   airplay  rockindx  COO_1   SDF_1    SDB0_1   SDB1_1   SDB2_1   SDB3_1
20    59.51     172        52.84  21.99    52.84     0.0586  0.5102   0.0998   0.2585   -0.2728  -0.0651
21    60.87     144        58.60  21.61    58.60     0.0156  0.2497   -0.0696  -0.1229  0.0537   0.1257
22    62.23     139        56.05  25.45    56.05     0.0113  0.2076   -0.1324  -0.1589  0.0090   0.1909
23    63.59     5          49.96  8.00     49.96     1.1851  2.4107   0.7283   -1.2413  1.7368   -0.2221
24    64.95     189        80.82  28.98    80.82     0.0027  -0.1007  0.0783   0.0694   -0.0345  -0.0798
25    66.30     200        63.37  25.15    63.37     0.1155  0.7515   -0.1458  0.2495   -0.2708  0.0835
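
Influence combines the size of the residual with leverage. A rough sketch, assuming the standard formulas Cook's D = SRE² / (k + 1) · h / (1 − h) and DFFITS = SDR · √(h / (1 − h)), reproduces COO_1 and SDF_1 for three songs from the residual and leverage values on the earlier slides. As before, h = LEV_1 + 1/N rests on the assumption that LEV_1 is centered leverage; function and variable names are illustrative.

```python
import math

N, K = 25, 3

# (song, SRE_1, SDR_1, LEV_1) copied from the earlier slide tables
cases = [
    (20, 1.7257, 1.8179, 0.0330),
    (23, 2.1649, 2.3971, 0.4628),
    (25, 2.1530, 2.3802, 0.0507),
]

def cooks_distance(sre, h, k):
    """Cook's D from the studentized residual and hat value."""
    return sre ** 2 / (k + 1) * h / (1.0 - h)

def dffits(sdr, h):
    """DFFITS from the deleted studentized residual and hat value."""
    return sdr * math.sqrt(h / (1.0 - h))

dffits_cutoff = 2 * math.sqrt((K + 1) / N)   # second cut-off listed on the slide

results = {}
for song, sre, sdr, lev in cases:
    h = lev + 1 / N                          # assumption: LEV_1 is centered leverage
    results[song] = (cooks_distance(sre, h, K), dffits(sdr, h))
    print(f"song {song}: Cook's D = {results[song][0]:.4f}, "
          f"DFFITS = {results[song][1]:.4f}")

print(f"DFFITS cutoff 2*sqrt((k+1)/n) = {dffits_cutoff:.2f}")
```

Songs 23 and 25 have nearly identical studentized residuals, yet only song 23 has a Cook's distance above the 1.0 cutoff, because its leverage is roughly five times larger.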

18

Once more, with feeling

R² = .627

Coefficients (Dependent Variable: Sales Index)

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower and Upper Bound give the 95% Confidence Interval for B.

Predictor                   B       Std. Error  Beta   t       Sig.  Lower Bound  Upper Bound
(Constant)                  19.017  5.811              3.273   .004  6.896        31.138
Marketing in thousands $'s  -.103   .062        -.593  -1.653  .114  -.233        .027
Rock Ratings                .188    .117        .437   1.604   .124  -.056        .431
Number of plays per day     1.737   .538        .933   3.230   .004  .615         2.859

ANOVA

Source      Sum of Squares  df  Mean Square  F       Sig.
Regression  1383.859        3   461.286      11.200  .000
Residual    823.706         20  41.185
Total       2207.565        23

Predictors: (Constant), Number of plays per day, Rock Ratings, Marketing in thousands $'s
Dependent Variable: Sales Index

19

Plot of Standardized y’ vs. Residual

20

A cautionary tale: Some more ways it can all go bad!

x   y1      y2     y3
4   4.260   3.100  5.390
5   5.680   4.740  5.730
6   7.240   6.130  6.080
7   4.820   7.260  6.420
8   6.950   8.140  6.770
9   8.810   8.770  7.110
10  8.040   9.140  7.460
11  8.330   9.260  7.810
12  10.840  9.130  8.150
13  7.580   8.740  12.740
14  9.960   8.100  8.840

We will use X to predict y1, y2 and y3 in turn.
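
The point of this data set can be checked numerically before looking at any plot. The sketch below fits a simple regression of each y on x with plain Python (no libraries) and prints intercept, slope, and R². The function name is illustrative; the data are the values listed above.

```python
x = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
y1 = [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96]
y2 = [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10]
y3 = [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]

def simple_regression(xs, ys):
    """Return (intercept, slope, R²) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return intercept, slope, r2

fits = {name: simple_regression(x, ys)
        for name, ys in (("y1", y1), ("y2", y2), ("y3", y3))}

for name, (b0, b1, r2) in fits.items():
    print(f"{name}: intercept = {b0:.3f}, slope = {b1:.3f}, R2 = {r2:.4f}")
```

The three fits are nearly indistinguishable by their summary statistics, so only the plots (or residual diagnostics) reveal that the data sets behave very differently.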

21

Exhibit 1, x & y1

[Figure: "Simple Regression Ex. 1", scatterplot of y1 against X Values with fitted line, Linear (y1); R² = 0.6665]

22

Exhibit 2 (x & y2)

[Figure: "Simple Regression Ex. 2", scatterplot of y2 against X Values with fitted line, Linear (y2); R² = 0.6662]

23

Exhibit 3 (x & y3)

[Figure: "Simple Regression Ex. 3", scatterplot of y3 against X-Values with fitted line, Linear (y3); R² = 0.6663]

24

Homoscedasticity: Yet another way it can all go bad!

What is homoscedasticity?

Is it better to have heteroscedasticity?

The effects of violation

How to identify it

Strategies for dealing with it

25

A visual representation of ways that it can all go bad!

26

Effect Size

Multiple Correlation (R):

$$R = r_{yy'}$$

SMC (R²):

$$R^2 = \frac{SS_{REG}}{SS_{TOT}}$$

$$\text{adj}R^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1}$$

$$F = \frac{R^2 / k}{(1 - R^2) / (N - k - 1)}$$
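
As a worked check of these effect-size formulas, the sketch below plugs in the sums of squares from the ANOVA table on the "Once more, with feeling" slide (total df = 23, so N = 24, with k = 3 predictors). The variable names are illustrative; the results can be compared with the R, R Square, Adjusted R Square, and F values printed in the SPSS output.

```python
# Sums of squares from the ANOVA table on the earlier slide
SS_REG = 1383.859
SS_TOT = 2207.565
N, K = 24, 3                     # N from total df + 1, K = number of predictors

r2 = SS_REG / SS_TOT                                   # squared multiple correlation
R = r2 ** 0.5                                          # multiple correlation
adj_r2 = 1 - (1 - r2) * (N - 1) / (N - K - 1)          # adjusted R²
F = (r2 / K) / ((1 - r2) / (N - K - 1))                # F test of R²

print(f"R = {R:.3f}, R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, F = {F:.3f}")
```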

27

Cross Validation

Why

Useful statistics and techniques

Conditions under which likelihood of cross-validation is increased

28

Assumptions of Regression

Sample Size

Absence of Outliers & Influential Observations

Absence of Multicollinearity and Singularity

Normality

Linearity

Homoscedasticity of Errors

Independence of Errors

29

Structure Coefficients

What are they?
  Vs. pattern coefficients or “weights”

Why we may need both

When they would be used in MRA

Why they are not commonly used

How you get them in SPSS
  CD sales example

30

As a reminder, the coefficients (weights)

Coefficients (Dependent Variable: Sales Index)

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower and Upper Bound give the 95% Confidence Interval for B.

Predictor                   B       Std. Error  Beta   t       Sig.  Lower Bound  Upper Bound
(Constant)                  19.017  5.811              3.273   .004  6.896        31.138
Marketing in thousands $'s  -.103   .062        -.593  -1.653  .114  -.233        .027
Rock Ratings                .188    .117        .437   1.604   .124  -.056        .431
Number of plays per day     1.737   .538        .933   3.230   .004  .615         2.859

Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .792  .627      .571               6.41758

Predictors: (Constant), Number of plays per day, Rock Ratings, Marketing in thousands $'s
Dependent Variable: Sales Index

31

Structure coefficients

Correlations with Unstandardized Predicted Value

Variable                    Pearson Correlation  Sig. (2-tailed)  N
Sales Index                 .792** (= R)         .000             24
Marketing in thousands $'s  .765**               .000             24
Rock Ratings                .824**               .000             24
Number of plays per day     .949**               .000             24

**. Correlation is significant at the 0.01 level (2-tailed).
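
A structure coefficient is the correlation between a predictor and the predicted scores, and it can also be obtained as the predictor's zero-order correlation with Y divided by R. The sketch below illustrates that equivalence for two predictors in plain Python. The data are made up for illustration (they are not the CD sales data), and the helper names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def zscores(a):
    """Standardize a list using the sample standard deviation."""
    n = len(a)
    m = sum(a) / n
    sd = math.sqrt(sum((u - m) ** 2 for u in a) / (n - 1))
    return [(u - m) / sd for u in a]

# Small made-up data set (illustrative only, not from the slides)
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [1.5, 2.0, 3.5, 3.0, 5.5, 5.0, 7.5, 6.0]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# Standardized weights (betas) for the two-predictor case
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
R = math.sqrt(beta1 * r_y1 + beta2 * r_y2)   # multiple correlation

# Predicted scores in standardized form, then structure coefficients two ways
z1, z2 = zscores(x1), zscores(x2)
y_hat = [beta1 * a + beta2 * b for a, b in zip(z1, z2)]
struct_direct = (pearson(x1, y_hat), pearson(x2, y_hat))
struct_ratio = (r_y1 / R, r_y2 / R)

print("betas:", round(beta1, 3), round(beta2, 3))
print("structure coefficients (direct):", [round(s, 3) for s in struct_direct])
print("structure coefficients (r / R): ", [round(s, 3) for s in struct_ratio])
```

With correlated predictors like these, the beta weights can differ sharply while the structure coefficients stay similar, which is the reason for reporting both.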

32

Model Building in MRA: “Canned” procedures

Enter

Forward

Backward Selection (Deletion)

Stepwise

Hierarchical

33

Hierarchical – Example

Predict employee satisfaction
  Block 1: “Hygiene Factor”
  Block 2: “Equity”
  Block 3: “Organizational Commitment”

34

Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .637a  .406      .405               .913
2      .730b  .533      .532               .810
3      .762c  .580      .578               .770

a. Predictors: (Constant), pbasic
b. Predictors: (Constant), pbasic, equity
c. Predictors: (Constant), pbasic, equity, affect, norm, indus, cont
d. Dependent Variable: satis

35

Analysis of Variance

Model  Source      Sum of Squares  df    Mean Square  F        Sig.
1      Regression  591.651         1     591.651      709.152  .000a
1      Residual    866.845         1039  .834
1      Total       1458.496        1040
2      Regression  776.948         2     388.474      591.648  .000b
2      Residual    681.548         1038  .657
2      Total       1458.496        1040
3      Regression  846.134         6     141.022      238.123  .000c
3      Residual    612.361         1034  .592
3      Total       1458.496        1040

a. Predictors: (Constant), pbasic
b. Predictors: (Constant), pbasic, equity
c. Predictors: (Constant), pbasic, equity, affect, norm, indus, cont
d. Dependent Variable: satis
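
In a hierarchical analysis the usual question is whether each added block improves prediction. A minimal sketch, assuming the standard F test for the change in R² (the increase in regression sum of squares per added predictor, divided by the residual mean square of the larger model), applies it to the sums of squares in the ANOVA output for this example. The output shown on the slides does not include the change statistics, so the values computed here are derived, not quoted; names are illustrative.

```python
SS_TOTAL = 1458.496

# (model, SS regression, df regression, SS residual, df residual) from the ANOVA output
models = [
    ("Model 1", 591.651, 1, 866.845, 1039),
    ("Model 2", 776.948, 2, 681.548, 1038),
    ("Model 3", 846.134, 6, 612.361, 1034),
]

changes = []
for previous, current in zip(models, models[1:]):
    _, ss_prev, df_prev, _, _ = previous
    name, ss_reg, df_reg, ss_res, df_res = current
    added = df_reg - df_prev                       # predictors added in this block
    delta_r2 = (ss_reg - ss_prev) / SS_TOTAL       # change in R²
    f_change = ((ss_reg - ss_prev) / added) / (ss_res / df_res)
    changes.append((name, delta_r2, added, df_res, f_change))
    print(f"{name}: delta R2 = {delta_r2:.3f}, "
          f"F change({added}, {df_res}) = {f_change:.2f}")
```

The R² changes agree with the differences between successive R Square values in the Model Summary (.533 − .406 and .580 − .533).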

36

Coefficients for Models

Coefficients (Dependent Variable: satis)

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

Model  Predictor   B      Std. Error  Beta   t       Sig.
1      (Constant)  1.246  .153               8.128   .000
1      pbasic      .971   .036        .637   26.630  .000
2      (Constant)  .686   .140               4.902   .000
2      pbasic      .672   .037        .441   18.207  .000
2      equity      .423   .025        .407   16.799  .000
3      (Constant)  1.331  .266               5.004   .000
3      pbasic      .494   .039        .324   12.718  .000
3      equity      .332   .026        .319   12.938  .000
3      affect      .041   .030        .032   1.371   .171
3      norm        .073   .017        .101   4.380   .000
3      indus       .052   .020        .066   2.663   .008
3      cont        -.138  .023        -.163  -5.930  .000

37

Let’s not forget the lesson of structure coefficients…

Predictor  Structure coefficient  Beta Wt.
affect     0.327                  0.032
norm       0.417                  0.101
cont       -0.580                 -0.163
indus      0.430                  0.066
equity     0.621                  0.319
pbasic     0.637                  0.324

38

Interpretation revisited

In light of multicollinearity

Standardized or unstandardized?

Suppressor effects

Missing predictors

Correlated / uncorrelated predictors

Structure coefficients

Reliability of indicators

Mathematical maximization nature of MRA