Regression Analyses
• Multiple IVs
• Single DV (continuous)
• Generalization of simple linear regression
• Y’ = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk, where k is the number of predictors
• Find the solution where Σ(Y − Y’)² is minimized
• Do not confuse the size of the bs with importance for prediction
• Can standardize to get betas, which can help determine relative importance
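The least-squares solution above can be sketched in plain Python. This is a minimal illustration with made-up data; the `solve` and `ols` helpers are our own, not from any statistics package:

```python
# Minimal sketch of the least-squares fit Y' = b0 + b1X1 + ... + bkXk,
# obtained by solving the normal equations (X'X)b = X'y. Data are made up.

def solve(A, v):
    """Gaussian elimination with partial pivoting for A x = v."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Weights [b0, b1, ..., bk] minimizing Sum(Y - Y')^2."""
    Xd = [[1.0] + list(row) for row in X]   # prepend intercept column
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(XtX, Xty)

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]   # two predictors
y = [3.1, 3.9, 7.2, 7.8, 11.0]                 # outcome
print(ols(X, y))                               # [b0, b1, b2]
```

If the outcome is exactly linear in the predictors, the recovered weights match the generating coefficients; with noisy data they are the unique minimizers of the squared errors.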
Why use Multiple Regression?
• Prediction – allows prediction of change in the D.V. resulting from changes in the multiple I.V.s
• Explanation – enables explanation of the variate by assessing the relative contribution of each I.V. to the regression equation
• More efficient than multiple simple regression equations – allows consideration of overlapping variance in the I.V.s

When do you use Multiple Regression?
• When theoretical or conceptual justification exists for predicting or explaining the D.V. with the set of I.V.s
• D.V. is metric/continuous – if not, use logistic regression or discriminant analysis
[Venn diagram: overlapping variance of Y, X1, and X2; regions a, b, and c mark variance in Y shared with the predictors, and e marks the residual variance in Y.]
Multiple Regression Assumptions
• DV is continuous and interval or ratio in scale
• Assumes multivariate normality for random IVs
• Assumes normal distributions and homogeneity of variance for each level of X for fixed IVs
• No error of measurement
• Correctly specified model
• Errors not correlated
• Expected mean of residuals is 0
• Homoscedasticity (error variance equal at all levels of X)
• Errors are independent/no autocorrelation (error for one score not correlated with error for another score)
• Residuals normally distributed
Multiple regression represents the construction of a weighted linear combination of variables:

    ŷᵢ = A + B1·X1,i + B2·X2,i + ... + Bk·Xk,i

The weights are derived to:

(a) Minimize the sum of the squared errors of prediction: Σᵢ (yᵢ − ŷᵢ)²

(b) Maximize the squared correlation (R²) between the original outcome variable and the predicted outcome variable based on the linear combination.

[Scatterplot of Y on X with fitted line ŷ = A + BX; the vertical distances y − ŷ are the errors of prediction.]
Multiple R
• R is like r except it involves multiple predictors, and R cannot be negative
• R is the correlation between Y and Y’, where Y’ = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
• R² tells us the proportion of variance accounted for (coefficient of determination)
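A quick numeric sketch of these quantities, using illustrative values. When Y’ comes from a least-squares fit with an intercept, R² equals the squared correlation between Y and Y’:

```python
# Sketch: R^2 = 1 - SS_res/SS_tot, and (for a least-squares fit with an
# intercept) also the squared correlation between Y and Y'. Values made up.
from math import sqrt

def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

y    = [2.0, 4.0, 5.0, 4.0, 5.0]    # observed
yhat = [2.8, 3.4, 4.0, 4.6, 5.2]    # predicted by a least-squares fit
print(r_squared(y, yhat))           # ~0.6
print(corr(y, yhat) ** 2)           # ~0.6 as well
```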
An example . . .
Y = Number of job interviews
X1 = GRE score
X2 = Years to complete Ph.D.
X3 = Number of publications
N = 500
[Histograms of the four variables (N = 500 each):
Interviews: Mean = 8.6, Std. Dev = 2.88
GRE: Mean = 1296.8, Std. Dev = 103.72
Publications: Mean = 4.3, Std. Dev = 2.31
Years to Complete Degree: Mean = 6.1, Std. Dev = 2.05]
Descriptive Statistics

                             Mean   Std. Deviation    N
Interviews                   8.60      2.883        500
GRE                       1296.82    103.724        500
Publications                 4.30      2.309        500
Years to Complete Degree     6.09      2.055        500
Correlations (Pearson, N = 500)

                             Interviews    GRE    Publications   Years to Complete Degree
Interviews                      1.000      .219       .677              -.375
GRE                              .219     1.000       .309              -.091
Publications                     .677      .309      1.000              -.286
Years to Complete Degree        -.375     -.091      -.286             1.000

Sig. (1-tailed) = .000 for every correlation except GRE with Years to Complete Degree (.020).
Predicting Interviews

[Venn diagram: variance in Interviews overlapping with variance in GRE, Publications, and Time to Graduate; regions a through e mark shared variance, and f marks the residual variance.]
Regression with SPSS

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT interviews
  /METHOD=ENTER years to complete gre pubs
  /SCATTERPLOT=(*ZPRED ,*ZRESID) .

From Analyze Menu
• Choose Regression
• Choose Linear
Model Summary(b)

Model 1:  R = .703(a)   R Square = .494   Adjusted R Square = .491
          Std. Error of the Estimate = 2.056
          Change Statistics: R Square Change = .494, F Change = 161.558,
          df1 = 3, df2 = 496, Sig. F Change = .000
          Durbin-Watson = 1.994

a. Predictors: (Constant), Years to Complete Degree, GRE, Publications
b. Dependent Variable: Interviews
The error that is minimized in the derivation of the regression weights appears here as the standard error of the estimate: the standard deviation of the errors of prediction.
The variance that is maximized in the derivation of the regression weights appears as R Square.
ANOVA(b)

Model 1        Sum of Squares    df    Mean Square      F       Sig.
Regression        2049.250        3      683.083     161.558   .000(a)
Residual          2097.142      496        4.228
Total             4146.392      499

a. Predictors: (Constant), Years to Complete Degree, GRE, Publications
b. Dependent Variable: Interviews
The error that is minimized in the derivation of the regression weights appears here as the residual mean square: the variance of the errors of prediction.
Coefficients(a)

Model 1                       B       Std. Error   Beta      t       Sig.   95% CI for B, Lower Bound
(Constant)                  6.576       1.219               5.395    .000          4.181
GRE                         3.029E-04    .001       .011     .325    .746          -.002
Publications                 .771        .044       .617   17.685    .000           .685
Years to Complete Degree    -.277        .047      -.198   -5.926    .000          -.369

a. Dependent Variable: Interviews
The weight b is unstandardized; the weight is called β (beta) if the variables are standardized.
Significance of Beta weights.
Output from SPSS
Multicollinearity
• Addition of many predictors increases the likelihood of multicollinearity problems
• Using multiple indicators of the same construct without combining them in some fashion will definitely create multicollinearity problems
• Wreaks havoc with analysis, e.g., significant overall R2 but no individual variables in the equation significant
• Can mask or hide variables that have large and meaningful impacts on the DV
Multicollinearity

Multicollinearity reflects redundancy in the predictor variables: the predictors are highly correlated with one another. When severe, the standard errors for the regression coefficients are inflated, and the individual influence of each predictor is harder to detect with confidence.
Coefficient Correlations(a)

Model 1, Correlations:
                           Years to Complete Degree      GRE       Publications
Years to Complete Degree           1.000                 .003          .272
GRE                                 .003                1.000         -.296
Publications                        .272                -.296         1.000

Model 1, Covariances (var(b) on the diagonal):
                           Years to Complete Degree      GRE       Publications
Years to Complete Degree        2.186E-03             1.485E-07     5.549E-04
GRE                             1.485E-07             8.705E-07    -1.204E-05
Publications                    5.549E-04             -1.20E-05     1.898E-03

a. Dependent Variable: Interviews
Coefficients(a) (continued)

Model 1                    Sig.   95% CI Lower   95% CI Upper   Zero-order   Partial   Part   Tolerance    VIF
(Constant)                 .000       4.181          8.971
GRE                        .746       -.002           .002          .219       .015     .010     .905     1.105
Publications               .000        .685           .856          .677       .622     .565     .838     1.194
Years to Complete Degree   .000       -.369          -.185         -.375      -.257    -.189     .918     1.089

a. Dependent Variable: Interviews
Coefficients(a)

Model 1                    B (Unstandardized)
(Constant)                    6.576
GRE                           3.029E-04
Publications                   .771
Years to Complete Degree      -.277

a. Dependent Variable: Interviews
The tolerance for a predictor is the proportion of variance that it does not share with the other predictors. The variance inflation factor (VIF) is the inverse of the tolerance.
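In the three-predictor case these quantities can be checked by hand. The sketch below uses the pairwise correlations from the interviews example, r(GRE, Publications) = .309, r(GRE, Years) = -.091, r(Publications, Years) = -.286, and comes close to the GRE row of the collinearity output (small differences come from rounding the correlations):

```python
# Sketch: tolerance and VIF by hand for the three-predictor case, using
# the pairwise correlations reported in the example output.

def tolerance_vif(r12, r13, r23):
    """Tolerance/VIF for predictor 1, given its correlations with
    predictors 2 and 3 (r12, r13) and their intercorrelation (r23)."""
    r2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    tol = 1 - r2          # proportion of variance NOT shared with the others
    return tol, 1 / tol   # VIF is the inverse of the tolerance

tol, vif = tolerance_vif(0.309, -0.091, -0.286)   # GRE vs. the other two
print(tol, vif)   # close to the reported .905 and 1.105
```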
Multicollinearity Remedies:
(1) Combine variables using factor analysis
(2) Use block entry
(3) Model specification (omit variables)
(4) Don’t worry about it as long as the program will allow it to run (you don’t have singularity, or perfect correlation)
Incremental R2
• Changes in R2 that occur when adding IVs
• Indicates the proportion of variance in prediction that is provided by adding Z to the equation
• It is what Z adds in prediction after controlling for the X already in the equation
• Total variance in Y can be broken up in different ways, depending on order of entry (which IVs are controlled first)
• If you have multiple IVs, change in R2 is strongly determined by intercorrelations and order of entry into the equation
• The later the point of entry, the less R2 is available to predict
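Using the correlations from the interviews example (Interviews with GRE = .219, Interviews with Publications = .677, GRE with Publications = .309), a quick sketch shows how strongly the R2 change depends on entry order:

```python
# Sketch: with two predictors, the R^2 change for a variable depends on
# whether it is entered first or second. Correlations from the example.

def r2_two(ry1, ry2, r12):
    """R^2 for Y regressed on two predictors, computed from correlations."""
    return (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)

r_y_gre, r_y_pub, r_gre_pub = 0.219, 0.677, 0.309
full = r2_two(r_y_gre, r_y_pub, r_gre_pub)
print(full - r_y_gre**2)   # R^2 added by Publications after GRE (~.41)
print(full - r_y_pub**2)   # R^2 added by GRE after Publications (~.0001)
```

Publications adds a great deal after GRE, while GRE adds almost nothing after Publications: the later point of entry, the less R2 is left to claim.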
Other Issues in Regression
• Suppressors (one IV correlated with the other IV but not with the DV; switches in sign)
• Empirical cross-validation
• Estimated cross-validation
• Dichotomization, trichotomization, median splits
  – Dichotomizing one variable reduces the maximum r to .798
  – Cost of dichotomizing is a loss of 1/5 to 2/3 of real variance
  – Dichotomizing on more than one variable can increase Type I error and yet can reduce power as well!
Significance of Overall R2
• Tests: a + b + c + d + e + f against area g (error)
• Get this from a simultaneous regression or from last step of block or hierarchical entry.
• Other approaches may or may not give you an appropriate test of overall R2, depending upon whether all variables are kept or some omitted.
[Venn diagram: Y overlapping with predictors X, W, and Z; areas a through f mark variance in Y shared with the predictors, and area g is error variance.]
Significance of Incremental R2
Change in R2 tests: a + b + c against area d + e + f + g
At this step, the t test for the b weight of X is the same as the square root of the F test if you only enter one variable. It is a test of whether or not the area of a + b + c is significant as compared to area d + e + f + g.
Step 1: Enter X
[Same Venn diagram of Y with X, W, and Z; only X has been entered at this step.]
Significance of Incremental R2
Change in R2 tests: d + e against area f + g
At this step, the t test for the b weight of X is a test of area a against area f + g and the t test for the b weight of W is a test of area d + e against area f + g.
Step 2: Enter W

[Same Venn diagram of Y with X, W, and Z; X and W have been entered.]
Significance of Incremental R2
Step 3: Enter Z

Change in R2 tests: f against area g

[Same Venn diagram of Y with X, W, and Z; all three predictors have been entered.]

At this step, the t test for the b weight of X is a test of area a against area g, the t test for the b weight of W is a test of area e against area g, and the t test for the b weight of Z is a test of area f against area g. These are the significance tests for the IV effects from a simultaneous regression analysis. No IV gets “credit” for areas b, c, d in a simultaneous analysis.
Hierarchical Regression: Significance of Incremental R2

[Same Venn diagram of Y with X, W, and Z.]

Enter variables in hierarchical fashion to determine R2 for each effect. Test each effect against error variance after all variables have been entered. Assume we entered X, then W, then Z in hierarchical fashion.
Tests for X: areas a + b + c against g
Tests for W: areas d + e against g
Tests for Z: area f against g
Significance test for b or Beta

[Same Venn diagram of Y with X, W, and Z.]

In the final equation, when we look at the t tests for our b weights, we are looking at the following tests:
Tests for X: only area a against g
Tests for W: only area e against g
Tests for Z: only area f against g
That’s why incremental or effect R2 tests are more powerful.
Methods of building regression equations
• Simultaneous: all variables entered at once
• Backward elimination (stepwise): starts with the full equation and eliminates IVs on the basis of significance tests
• Forward selection (stepwise): starts with no variables and adds on the basis of increment in R2
• Hierarchical: researcher determines order and enters each IV
• Block entry: researcher determines order and enters multiple IVs in single blocks
Simultaneous

[Venn diagram: Y overlapping with predictors X, Z, and W; labeled areas a through i mark unique, shared, and residual variance.]

Simultaneous: all variables entered at once
• Significance tests and R2 based on unique variance
• No variable “gets credit” for area g
• Variables with intercorrelations have less unique variance
• Variables X and Z together predict more than W, yet W might be significant while X and Z are not
• Betas are partialled, so the beta for W is larger than for X or Z
Backward Elimination

[Same Venn diagram of Y with X, Z, and W.]

• Starts with the full equation and eliminates IVs
• Gets rid of the least significant variable (probably X), then tests the remaining variables to see if they are significant
• Keeps all remaining significant variables
• Capitalizes on chance
• Low cross-validation
Forward Selection

[Same Venn diagram of Y with X, Z, and W.]

• Starts with no variables and adds IVs
• Adds the variable with the most unique R2, or the next most significant variable (probably W, because it gets credit for area i)
• Quits when additional variables are not significant
• Capitalizes on chance
• Low cross-validation
Hierarchical (Forced Entry)

[Same Venn diagram of Y with X, Z, and W.]

• Researcher determines order of entry for IVs
• Order based on theory, timing, or need for statistical control
• Less capitalization on chance
• Generally higher cross-validation
• Final model based on IVs of theoretical importance
• Order of entry determines which IV gets credit for area g
Order of Entry
• Determining order of entry is crucial
• Stepwise capitalizes on chance and reduces cross-validation and the stability of your prediction equation
  – Only useful to maximize prediction in a given sample
  – Can lose important variables
• Use the following to determine order:
  – Logic
  – Theory
  – Order of manipulations/treatments
  – Timing of measures
• Usefulness of the regression model is reduced as k (the number of IVs) approaches N (sample size)
  – Best to have at least a 15-to-1 ratio of cases to IVs, or more
Interpreting b or β
• B (or b) is the raw regression weight; β (beta) is standardized (scale invariant)
• At a given step, the size of b or β is influenced by order of entry into the regression equation
  – Should be interpreted at the entry step
• Once all variables are in the equation, the bs and βs will always be the same regardless of the order of entry
• Difficult to interpret b or β for main effects when an interaction is in the equation
Regression: Categorical IVs
• We can code groups and use regression to analyze the data (e.g., 1 and 2 to represent females and males)
• Overall R2 and significance tests for the full equation will not change regardless of how we code (as long as the coding is not redundant)
• Interpretation of the intercept (a) and slope (b or beta weights) WILL change depending on coding
• We can use coding to capture effects of categorical variables
Regression: Categorical IVs
• Total # of codes needed is always # of groups − 1
• Dummy coding
  – One group is assigned all 0s; b weights indicate the mean difference of each group coded 1 compared to the group coded 0
• Effect coding
  – One group is assigned all −1s; b weights indicate the mean difference of each group coded 1 from the grand mean
• All forms of coding give you the same overall R2 and significance tests for total R2
• Difference is in interpretation of the b weights
Dummy Coding

           White   Black   Asian   Hispanic
Dummy 1      0       1       0       0
Dummy 2      0       0       1       0
Dummy 3      0       0       0       1

• # of dummy codes = # of groups − 1
• Group that receives all zeros is the reference group
• Beta = comparison of the reference group to the group represented by 1
• Intercept in the regression equation is the mean of the reference group
Effect Coding

           White   Black   Asian   Hispanic
Effect 1    -1       1       0       0
Effect 2    -1       0       1       0
Effect 3    -1       0       0       1

• # of contrast codes = # of groups − 1
• Group that received all zeros in dummy coding now gets all −1s
• Beta = comparison of the group represented by 1 to the grand mean
• Intercept in the regression equation is the grand mean
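The two interpretations can be verified on made-up scores for a three-group factor. The `solve`/`ols` helpers below are our own normal-equations fit, not a library routine:

```python
# Sketch: dummy vs. effect coding for one three-group factor, two made-up
# scores per group. Expect: dummy intercept = reference-group mean, weights
# = group minus reference; effect intercept = grand mean (equal ns), weights
# = group minus grand mean.

def solve(A, v):
    """Gaussian elimination with partial pivoting for A x = v."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares weights [intercept, b1, ..., bk]."""
    Xd = [[1.0] + list(row) for row in X]
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(XtX, Xty)

y = [2, 4, 6, 8, 10, 12]       # group means: 3 (reference), 7, 11; grand mean 7
dummy  = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
effect = [[-1, -1], [-1, -1], [1, 0], [1, 0], [0, 1], [0, 1]]
print(ols(dummy, y))    # ~[3.0, 4.0, 8.0]: reference mean, group differences
print(ols(effect, y))   # ~[7.0, 0.0, 4.0]: grand mean, deviations from it
```

Both codings reproduce the same fitted values (the group means), so R2 and the overall test are identical; only the meaning of the intercept and weights changes.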
Regression with Categorical IVs vs. ANOVA
• Provides the same results as t tests or ANOVA
• Provides additional information
  – Regression equation (line of best fit)
  – Useful for future prediction
  – Effect size (R2)
  – Adjusted R2
Regression with Categorical Variables - Syntax

Step 1. Create k − 1 dummy variables
Step 2. Run the regression analysis with the dummy variables as predictors

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT fiw
  /METHOD=ENTER msdum1 msdum2 msdum3 msdum4 msdum5 .
Regression with Categorical Variables - Output

ANOVA(b)

Model 1        Sum of Squares    df    Mean Square     F      Sig.
Regression          3.562         5       .712       1.562   .170(a)
Residual          166.030       364       .456
Total             169.592       369

a. Predictors: (Constant), msdum5, msdum4, msdum2, msdum3, msdum1
b. Dependent Variable: fiw
Coefficients(a)

Model 1        B      Std. Error   Beta     t      Sig.
(Constant)   2.134       .049             43.669   .000
msdum1        .084       .078      .059    1.081   .280
msdum2       -.420       .260     -.084   -1.615   .107
msdum3       -.102       .162     -.033    -.631   .529
msdum4       -.634       .480     -.069   -1.321   .187
msdum5       -.127       .137     -.050    -.928   .354

a. Dependent Variable: fiw
Adjusted R2
• There may be “overfitting” of the model, and R2 may be inflated
• Model may not cross-validate (shrinkage)
• More shrinkage with small samples (< 10-15 observations per IV)
Model Summary

Model    R     R Square   Adjusted R Square   Std. Error of the Estimate
1      .145a     .021           .008                  .67537

a. Predictors: (Constant), msdum5, msdum4, msdum2, msdum3, msdum1
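The adjusted value can be reproduced with the usual shrinkage formula, 1 − (1 − R²)(N − 1)/(N − k − 1). A quick check using R² = .021, k = 5 dummies, and N = 370 (from the ANOVA's total df of 369):

```python
# Sketch: adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1),
# checked against the SPSS output above (R^2 = .021, N = 370, k = 5).

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.021, 370, 5), 3))   # close to the reported .008
```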
Example: Hierarchical Regression

Example: number of children, hours in family work, and sex as predictors of family interfering with work (fiw).

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT fiw
  /METHOD=ENTER numkids
  /METHOD=ENTER hrsfamil
  /METHOD=ENTER sex .
Hierarchical Regression Output

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       numkids(a)                 .            Enter
2       hrsfamil(a)                .            Enter
3       sex(a)                     .            Enter

a. All requested variables entered.
b. Dependent Variable: fiw
Model Summary

Model    R     R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1      .067a     .004           .001                  .66568                    .004           1.450      1    324       .229
2      .067b     .004          -.002                  .66669                    .000            .014      1    323       .907
3      .222c     .049           .040                  .65254                    .045          15.169      1    322       .000

a. Predictors: (Constant), numkids
b. Predictors: (Constant), numkids, hrsfamil
c. Predictors: (Constant), numkids, hrsfamil, sex
Hierarchical Regression Output

Coefficients(a)

Model                 B      Std. Error   Beta     t      Sig.
1  (Constant)       2.025       .098             20.673   .000
   numkids           .028       .024      .067    1.204   .229
2  (Constant)       2.034       .126             16.088   .000
   numkids           .028       .025      .065    1.112   .267
   hrsfamil          .000       .002     -.007    -.117   .907
3  (Constant)       2.433       .161             15.144   .000
   numkids           .038       .024      .088    1.541   .124
   hrsfamil          .000       .002     -.002    -.036   .971
   sex              -.285       .073     -.213   -3.895   .000

a. Dependent Variable: fiw
Simultaneous Regression Output

Coefficients(a)

Model 1        B      Std. Error   Beta     t      Sig.
(Constant)   2.433       .161             15.144   .000
numkids       .038       .024      .088    1.541   .124
hrsfamil      .000       .002     -.002    -.036   .971
sex          -.285       .073     -.213   -3.895   .000

a. Dependent Variable: fiw