Multiple Regression: Extension of Simple Linear Regression
Multiple Regression
Extension of Simple Linear Regression –
using multiple predictors
each predictor could help predict or explain additional variability in the response/criterion variable
However: what should be the effect of using any additional predictors?
Multiple Regression
What should be the effect of using additional
predictors?
Logically, unless correlation with DV is 0,
each predictor will improve prediction (explain additional variance in DV)
So just adding variables as predictors at
random will usually “improve” the model
Creates potential for misuse of the strategy
IDEALLY
each predictor should be
- correlated with the DV
- uncorrelated with other predictors (r over .8 undesirable)
each predictor should explain some
unique variability in DV
each predictor should make sense!
Best situation: CLEAR THEORY or LOGIC determines the predictors selected
Examples
Relationship Commitment
- satisfaction with outcomes (+)
- investments in relationship (+)
- attractiveness of available alternatives (-)
Job Satisfaction
- salary
- physical conditions
- social conditions
Simple linear regression
Yp = a + bX (+ residuals)
Multiple regression
Yp = a + b1X1 + b2X2 (+ residuals)
a is value of Y when all X = 0 (regression constant)
b’s are ‘partial regression coefficients’
slope for each predictor when
other predictors held constant
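To make the equation concrete, here is a minimal sketch of fitting a two-predictor model in Python (statsmodels assumed; the data are simulated, so all names and values are hypothetical). Later sketches reuse Y, X1, X2, X, and model from this block.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: two predictors (X1, X2) and a response (Y)
rng = np.random.default_rng(42)
X1 = rng.normal(50, 10, 100)
X2 = rng.normal(5, 2, 100)
Y = 10 + 0.4 * X1 + 3.0 * X2 + rng.normal(0, 5, 100)

# Stack the predictors and add the intercept column (the regression constant, a)
X = sm.add_constant(np.column_stack([X1, X2]))
model = sm.OLS(Y, X).fit()

# The b's are partial regression coefficients: the slope for each
# predictor with the other predictor held constant
print(model.params)    # [a, b1, b2]
print(model.rsquared)  # R squared for the model
```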
Graph of relationship when two predictors are used
Now try to fit a plane rather than a line – to minimize the errors of prediction
Multiple regression
Yp = a + b1X1 + b2X2 + b3X3 (+ residuals)
Commitment = a +
b1 (satisfaction) +
b2 (investments) -
b3 (alternatives)
(+ residuals)
A weighted linear combination of predictors
Comparison to ANOVA – main effects only model
• Let’s return to the question of predicting Exam 2 grades – using multiple predictors
• Undergraduate GPA (0-4 scale)
• GRE Verbal (200-800 scale)
• GRE Quantitative (200-800 scale)
• Exam 1 grade (0-100 scale)
• Mean Homework grade (0-10 scale)
Note the variety of scales for the predictors; the weights (partial regression coefficients) will vary to take those into account
Ideally, all predictors are related to the criterion, and are unrelated to each other
Variables Entered/Removedb

Model 1
  Variables Entered: homework, grev, gpatot, greq, exam1a
  Variables Removed: (none)
  Method: Enter

a. All requested variables entered.
b. Dependent Variable: exam2
Model Summary

Model 1:  R = .735a   R Square = .540   Adjusted R Square = .518   Std. Error of the Estimate = 4.61879

a. Predictors: (Constant), homework, grev, gpatot, greq, exam1
ANOVAb

Model 1       Sum of Squares    df   Mean Square        F     Sig.
Regression         2609.715      5       521.943   24.466    .000a
Residual           2218.658    104        21.333
Total              4828.373    109

a. Predictors: (Constant), homework, grev, gpatot, greq, exam1
b. Dependent Variable: exam2
Coefficientsa

Model 1      Unstandardized       Standardized
                B      Std. Error     Beta        t      Sig.   Zero-order  Partial    Part
(Constant)   16.059       8.406                 1.911    .059
gpatot         .192       1.226       .011       .157    .876      .127      .015      .010
grev       -7.4E-005       .006      -.001      -.011    .991      .137     -.001     -.001
greq           .001        .007       .006       .086    .932      .078      .008      .006
exam1          .440        .067       .500      6.538    .000      .637      .540      .435
homework      3.673        .681       .390      5.394    .000      .564      .468      .359

(Zero-order, Partial, and Part are correlations)
a. Dependent Variable: exam2
Exam 2 (Pred) = 16.06 + .19(gpa) - .00(grev) + .00(greq) + .44(exam1) + 3.67(homework)
Using just Exam 1 score, the correlation between Exam 1 and Exam 2 was r = .637, r2 = .406
Now the R, between the set of predictors and Exam 2 is .735,
and R2 = .540
Since gpatot, grev, greq were all not significant, should they be excluded from the equation?
Assumptions – never likely to satisfy all
Essentially the same as for r, but at a multivariate level
Independent Observations
Interval/Ratio Data – or at least pretend
Normality – all Predictors (X's) and Response (Y)
- errors of prediction are normally distributed
Linearity
- all X's have a linear relationship with Y
- errors of prediction/predicted scores are linear
Equality of Variances (Homoscedasticity)
- variability of errors of Y is the same at all values of X
Assumptions can be evaluated within SPSS at the multivariate level.
In the Regression window, choose Plots and request zresid (Y) and zpred (X).
The tables at the right demonstrate the patterns that would indicate each violation, although deciding when there is 'enough' discrepancy is still subjective.
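Outside SPSS, the same kind of plot can be produced in a few lines; a sketch with matplotlib, continuing from the earlier hypothetical fitting example:

```python
import matplotlib.pyplot as plt

# Continuing from the earlier sketch: `model` is the fitted OLS result.
# Standardize the predicted values and residuals, analogous to SPSS's
# ZPRED and ZRESID.
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
zresid = model.resid / model.resid.std()

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("Regression Standardized Predicted Value")
plt.ylabel("Regression Standardized Residual")
# Normality/linearity/homoscedasticity violations show up as skewed,
# curved, or funnel-shaped patterns rather than an even band around 0
plt.show()
```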
From Tabachnick & Fidell (2007). Using multivariate statistics (5th ed.). Boston: Allyn & Bacon.
Example to follow
Predicting Rated Distress, (1) none to (9) extreme, when a partner is emotionally unfaithful, using Age and Rated Distress over Sexual Infidelity as predictors. All 3 variables are skewed.
Other Considerations in Multiple Regression
-- Truncated Range – same as with r, can lead to poor assessment of 'real' R
-- Outliers due to multivariate deviation
Discrepancy (distance) – outlier on criterion
Leverage – outlier on predictors
Influence – combines D & L to assess influence on solution (change in regression coefficients if case deleted)
How these would appear in a simple linear regression situation
From Tabachnick & Fidell (2007). Using multivariate statistics (5th ed.). Boston: Allyn & Bacon.
Other Considerations in Multiple Regression
-- Outliers due to multivariate deviation
A simple diagnostic for Influence is to request the Cook's Distance statistic in the Regression window, Save option. Values over 1 suggest potentially strong influence.
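A sketch of the same diagnostic outside SPSS, continuing from the earlier hypothetical fitting example (statsmodels assumed):

```python
import numpy as np

# Continuing from the earlier sketch: `model` is the fitted OLS result
influence = model.get_influence()
cooks_d, _ = influence.cooks_distance  # one distance per case

# Flag cases exceeding the rule of thumb above (values over 1)
print(np.where(cooks_d > 1)[0])  # row indices of potentially influential cases
```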
[Scatterplot: pubs (Y) vs. years (X) with fitted regression line. Cook's distance = 92.6 for the outlier. Note that the residual for the outlier is not great, but it has strong influence on the solution.]
[Scatterplot: pubs (Y) vs. years (X), showing two fitted lines – the line with the outlier and the line without the outlier.]
Other Considerations in Multiple Regression
– Sample Size – if too small may get good, but meaningless prediction – too little variability
Minimum sample sizes recommended (to detect a moderate effect size, 13% of variance, with power of approximately .80)
(Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.)
• For test of a model: n = 50 + 8p
• For test of individual predictors in model: n = 104 + p
  (p = number of predictors)
Can also conduct a power analysis based on the effect size you desire to select your sample size
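Green's rules of thumb are easy to compute; a small worked sketch:

```python
def green_minimum_n(p):
    """Green (1991) minimum sample sizes for p predictors."""
    return {"model_test": 50 + 8 * p,
            "individual_predictors": 104 + p}

# e.g., the five-predictor Exam 2 model above
print(green_minimum_n(5))  # {'model_test': 90, 'individual_predictors': 109}
```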
Other Considerations in Multiple Regression
– Multicollinearity or Singularity
• Singularity – when one predictor is a combination of other predictors included
• Multicollinearity - when other predictors can account for a high degree of variability in a predictor
Other Considerations in Multiple Regression
– Diagnostics for Multicollinearity or Singularity
Tolerance is used as diagnostic statistic
If the other predictors are used to predict a given predictor, what variance is shared?
Tolerance is reported as 1 - R2 from that regression, so closer to 1 is better;
less than .2 indicates a problem
Variance Inflation Factor (VIF) also used.
It is the reciprocal of Tolerance, so can range from 1 up. Reflects degree to which standard error of b is increased due to correlations among predictors.
A value of 4 is cause for some concern
A value of 10 is a serious problem
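A sketch of computing both diagnostics, continuing from the earlier hypothetical fitting example (statsmodels' variance_inflation_factor assumed):

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Continuing from the earlier sketch: X is the design matrix with the
# constant in column 0; compute VIF for each predictor column
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
tolerances = [1 / v for v in vifs]  # Tolerance is the reciprocal of VIF

# Tolerance below .2, or VIF of 4+, is cause for concern; VIF of 10 is serious
print(vifs, tolerances)
```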
Assessing the Outcome
Testing the Overall Model as a single outcome
How well does the set of predictors (X's) predict the criterion (Y)?
Ho: all b’s are = 0, all partial regression coefficients = 0 Or
Ho: R = 0, the Multiple Correlation Coefficient = 0
R = correlation of actual Y with weighted linear combination of predictors (X’s)
Or – since weighted linear combination leads to predicted scores
R = correlation of actual Y with predicted Yp
Reminder: Partitioning the Variability in Y
SSTotal = Sum (Y – Mean Y)²
variability of Y scores from the mean
Separated into
SSregression = Sum (Yp – Mean Y)²
Improvement in predictions when using X (variability in Y explained by X), rather than assuming everyone gets the Mean
SSresidual = Sum (Y – Yp)²
Degree to which predictions do not match the actual scores
(prediction errors that have been minimized)
Example from Simple Linear Regression
[Scatterplot: iq (Y) vs. gpa (X), with fitted line iq = 53.05 + 16.99 × gpa, R-Square = 0.69. Mean IQ = 105; Mean GPA = 3.06.]
Mean IQ (105) would be your best 'guess' for every person if you had no useful predictor
The fitted line shows the improvement in prediction using GPA
Residual – distance from the prediction line (much greater for some cases than others)
Test using F – similar to simple linear regression
Partition SST into
• SSregression (explained by weighted combination)
• SSresidual (unexplained)
F = (SSregression / df regression) / (SSresidual / df residual)
df regression = (p + 1) – 1 = p (parameters in the model are the p predictors plus the one intercept a, minus 1, so df is often indicated simply as p)
df residual = n – p – 1

F = MSregression / MSresidual = explained (systematic + unsystematic) / unexplained (unsystematic)

Was R reliably different from 0? Yes, if F is significant
Recall: Standard Error of the Estimate = SQRT(MSresidual)
R² = SSregression / SSTotal = explained variability / total variability
% of variance accounted for by the model (see next slide for ANOVA example)
Adjusted R² gives a better estimate for the population, adjusted based on the number of predictors and sample size
Adjusted R² = 1 – (1 – R²)(n – 1)/(n – p – 1), so lower if small sample but many predictors
Can use R² for describing a sample
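A sketch rebuilding these quantities by hand, continuing from the earlier hypothetical fitting example, to confirm they match the values statsmodels reports:

```python
import numpy as np

# Continuing from the earlier sketch: Y, model; p = number of predictors
p, n = 2, len(Y)
Yp = model.fittedvalues

ss_total = np.sum((Y - Y.mean()) ** 2)
ss_reg = np.sum((Yp - Y.mean()) ** 2)   # SSregression
ss_res = np.sum((Y - Yp) ** 2)          # SSresidual

r2 = ss_reg / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
F = (ss_reg / p) / (ss_res / (n - p - 1))
see = np.sqrt(ss_res / (n - p - 1))     # Standard Error of the Estimate

print(r2, adj_r2, F, see)  # compare with model.rsquared, model.fvalue, etc.
```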
Tests of Between-Subjects Effects
Dependent Variable: Sensitive

Source            Type III Sum of Squares    df    Mean Square         F      Sig.   Partial Eta Squared
Corrected Model              83.200a          7        11.886       3.290    .003        .132
Intercept                  6451.600           1      6451.600    1785.585    .000        .922
GENDER                       14.400           1        14.400       3.985    .048        .026
RELATE                       44.600           3        14.867       4.115    .008        .075
GENDER * RELATE              24.200           3         8.067       2.233    .087        .042
Error                       549.200         152         3.613
Total                      7084.000         160
Corrected Total             632.400         159

a. R Squared = .132 (Adjusted R Squared = .092)

Example from Handout Packet, Page approx. 47
Test of a model in which there are 3 predictors used to predict the rating on "Sensitive", the DV
In some cases, the purpose of the regression analysis is simply to see if the Model “works”.
Does it explain variance in the criterion? Can it be used to make predictions?
Thus, the overall test of the model is all you need, and you can interpret the R2 or adjusted R2, and the SEE if you plan predictions
In other cases, you might want to know how the individual predictors contributed to the overall model.
Assessing the contribution of individual predictors
Dependent upon the set of predictors included!
Partial regression coefficient – can test to see if b = 0
Is b = 0 (slope = 0) when other predictors are held constant?
Tested using a t test with df = n – p – 1
Beta – partial regression coefficient when all variables are standardized (standardized slope). If b is significant, so is beta.
Test of Partial Regression Coefficient is like a typical statistical test of significance - it is or is not significant, and is influenced by sample size
Can also evaluate predictors based on “effect size” measures (practical significance)
These would be “significant” if b is significant
Partial correlation (pr) – as described in simple covariation section
correlation of predictor (X1) with DV (Y) after removing the variance in both explained by the other predictors
So both X1 and Y are adjusted before correlation is calculated
All other X’s are ‘partialed’ out of X1 and Y
pr2 – shared variance within context – what % of variability in Y does X1 explain after the other variables' contributions to explaining both are removed
There is less than 100% of the variability of Y left for X1 to explain
Semi-partial (part) correlation (sr) –
correlation of predictor (X1) with DV (Y) after removing the variance of X1 shared with the other predictors
So X1 is adjusted by removing variance shared with other X’s
But all variability in Y is left to be explained
Assesses ‘unique’ contribution of X1 to explaining Y
There is 100% of variability in Y to explain for each X in model
sr2 is considered best measure of individual predictor importance (practical significance)
R2 will drop by a predictor's sr2 when that predictor is removed from the model
(BOTH pr and sr ARE STILL DEPENDENT ON THE MODEL USED)
WHY?
[Venn diagram: overlapping circles for the variability of the DV (Y), IV 1 (X1), and IV 2 (X2). Regions: a = variability shared by the DV and X1 alone; b = variability shared by the DV and both IVs; c = variability shared by the DV and X2; d = variability of the DV shared with neither IV.]
Partial correlation² (X1) = a / (a + d)
Semi-partial correlation² (X1) = a / (a + b + c + d)
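The residual-based definitions above translate directly into code; a sketch for X1, with X2 as the only other predictor (continuing from the earlier hypothetical example):

```python
import numpy as np
import statsmodels.api as sm

def resid(y, others):
    """Residuals of y after regressing it on the other predictor(s)."""
    return sm.OLS(y, sm.add_constant(others)).fit().resid

# Partial r: BOTH X1 and Y are adjusted for X2 before correlating
pr = np.corrcoef(resid(X1, X2), resid(Y, X2))[0, 1]

# Semi-partial (part) r: only X1 is adjusted; all of Y's variance remains
sr = np.corrcoef(resid(X1, X2), Y)[0, 1]

print(pr, sr, sr**2)  # sr2 = unique variance in Y explained by X1
```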
Types of Multiple Regression
Standard – all predictors entered together
contribution of each depends on others in the group
Assumes other variables would usually be there and/or are relevant
Four Humor Styles
Investment Model Variables
Big Five Personality Dimensions
Hierarchical Regression - enter in planned sequence
Can enter individual predictors one at a time
Or enter groups of variables at separate steps
As new predictors are added, each one can only explain variability that is left
Assess the change in R2 at each step (does it increase significantly?) and the overall model when done – see the sketch below
Example – Predicting adult IQ:
- Parental IQ
- Prenatal experience
- Early infant experience
- Education
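A sketch of the R2-change test for one added step, continuing from the earlier hypothetical example (entering X1 first, then X2; the ordering here is arbitrary):

```python
import numpy as np
import statsmodels.api as sm

# Step 1: X1 alone; Step 2: X1 plus X2
m1 = sm.OLS(Y, sm.add_constant(X1)).fit()
m2 = sm.OLS(Y, sm.add_constant(np.column_stack([X1, X2]))).fit()

r2_change = m2.rsquared - m1.rsquared

# F test for the change (k = 1 predictor added; full model has p = 2)
k, n = 1, len(Y)
F_change = (r2_change / k) / ((1 - m2.rsquared) / (n - 2 - 1))
print(r2_change, F_change)
```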
Statistical Methods –
Let the data determine inclusion in the model, not based on a logical or theoretical ‘plan’
Assess each step by evaluating change in R or R2
Usually an exploratory tool in possible model building
Requires a larger sample to have confidence
(40 cases per predictor)
Stepwise
Begins with the single best predictor
Adds the next best, and assesses if the model is better
At each step, each variable is reassessed, and might be kept or removed
Stops when adding additional variables does not significantly improve the model (R)
Forward inclusion
Begins with single best predictor
Adds the next best, and assesses the improvement
Once a variable is in, it stays in; variables are only added if they improve the model (R)
Backward exclusion
Begins with full model
Removes the weakest contributor and assesses the loss
Keeps removing variables unless there is a significant drop in R
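SPSS handles these automatically; as an illustration only, a bare-bones forward-inclusion loop might look like this (continuing from the earlier hypothetical example; the p < .05 entry criterion is an arbitrary choice, and with one added predictor the t test is equivalent to the F test of the R2 change):

```python
import numpy as np
import statsmodels.api as sm

def forward_inclusion(y, candidates, alpha=0.05):
    """Add the best remaining predictor while it significantly improves the model."""
    chosen = []
    remaining = dict(candidates)  # name -> column of scores
    while remaining:
        # p-value each candidate would get if added to the current model
        trial_p = {}
        for name, x in remaining.items():
            cols = [candidates[c] for c in chosen] + [x]
            fit = sm.OLS(y, sm.add_constant(np.column_stack(cols))).fit()
            trial_p[name] = fit.pvalues[-1]
        best = min(trial_p, key=trial_p.get)
        if trial_p[best] >= alpha:
            break  # no remaining predictor significantly improves the model
        chosen.append(best)       # once in, it stays in
        del remaining[best]
    return chosen

print(forward_inclusion(Y, {"X1": X1, "X2": X2}))
```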
Research questions using Multiple Regression
Assess Overall Model
Assess individual predictors
Effects of adding or changing predictors
on overall model
on other individual predictors
Predictions in new sample
Other Multiple Regression issues/applications
Suppressor variables –
variables that improve the model due to correlations with other predictors, not the criterion. They 'suppress' variance in another predictor that is 'noise'
Evident if the simple r with the criterion is very low but the variable contributes to the model (sr is higher); can also produce a change in sign from r to b (i.e., positive r but negative b)
Other issues/applications
Mediation Models
Relationship of X to Y is mediated by some other variable
Example: Positive use of Humor for self → Perceived Stress, with Positive Personality (optimistic, hopeful, happy) as the proposed mediator
(Humor for self: high self-enhancing/low self-defeating)
Humor use (H) predicts Perceived Stress (c) – the direct path
Humor use predicts Positive Personality (PP) (a)
Positive Personality predicts Perceived Stress, with Humor in the model (b)
In a Hierarchical Model, enter PP first, then H; if PP mediates H, H no longer 'contributes' to the model – the c' path (H with the mediator in the model) is not significant
[Path diagram: H → PP (a), PP → Perceived Stress (b), H → Perceived Stress (c; c' when PP is in the model)]
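A sketch of the three regressions with simulated data (all names and effect sizes here are hypothetical, chosen only to illustrate the a, b, c, and c' paths):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data consistent with mediation: humor -> personality -> stress
rng = np.random.default_rng(1)
H = rng.normal(0, 1, 200)                   # positive use of humor for self
PP = 0.5 * H + rng.normal(0, 1, 200)        # positive personality (mediator)
STRESS = -0.6 * PP + rng.normal(0, 1, 200)  # perceived stress

c_path = sm.OLS(STRESS, sm.add_constant(H)).fit()  # H -> Stress (c, total)
a_path = sm.OLS(PP, sm.add_constant(H)).fit()      # H -> PP (a)
full = sm.OLS(STRESS, sm.add_constant(np.column_stack([PP, H]))).fit()
# In `full`: params[1] is the b path (PP), params[2] is c' (H with PP in model)

# Mediation pattern: c, a, b significant; c' shrunken toward 0 / not significant
print(c_path.pvalues[1], a_path.pvalues[1], full.pvalues[1], full.pvalues[2])
```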
Other issues/applications
Moderator Models – relationship of predictor with the criterion depends upon some other variable
(just like an interaction in ANOVA)
Yp = a + b1X1 + b2X2 + b3(X1X2) (+ residuals)
b1X1 and b2X2 are the main effects; b3(X1X2) is the interaction term added to the equation
Often requires some modification of the data prior to the analysis
- centering variables to avoid multicollinearity (if predictors do not have true 0 scores) – see the sketch below
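A sketch of centering and testing the interaction term (all data simulated, so the names and values are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical predictors without true zero points
rng = np.random.default_rng(7)
x1 = rng.normal(100, 15, 200)
x2 = rng.normal(50, 10, 200)
y = 5 + 0.2 * x1 + 0.3 * x2 + 0.01 * (x1 - 100) * (x2 - 50) + rng.normal(0, 3, 200)

# Center first: reduces the correlation between the main-effect terms
# and the product term (multicollinearity)
x1c = x1 - x1.mean()
x2c = x2 - x2.mean()
design = sm.add_constant(np.column_stack([x1c, x2c, x1c * x2c]))

mod = sm.OLS(y, design).fit()
print(mod.params[3], mod.pvalues[3])  # b3: is the moderation significant?
```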
Best situation
CLEAR THEORY to be tested
Relationship Commitment (low 8 – 72 high) predicted from:
- satisfaction with outcomes (+) (low 3 – 21 high)
- investments in relationship (+) (Subjective: low 6 – 54 high; Objective: none 0 – ?? lots)
- attractiveness of available alternatives (-) (low 6 – 48 high)
In Handout Packet
Begin by examining the individual variables for normality and outliers etc.
Can request Cook’s D to assess for outlier influence
Can check assumptions using plot from regression analysis
[Scatterplot – Dependent Variable: Global Commitment; Y axis: Regression Standardized Residual; X axis: Regression Standardized Predicted Value]
Then look at the simple correlations (r)
Expect predictors to correlate with the criterion, but not a lot with each other
Correlations (Pearson r, with one-tailed Sig. below; N = 75 for all)

                          Commit    Satis    Altern   ObjInv   SubjInv
Global Commitment          1.000     .310    -.422     .395     .551
Global Satisfaction         .310    1.000    -.237     .157     .339
Global alternatives        -.422    -.237    1.000    -.257    -.440
Objective Investments       .395     .157    -.257    1.000     .408
Subjective Investments      .551     .339    -.440     .408    1.000

Sig. (1-tailed)
Global Commitment              .     .003     .000     .000     .000
Global Satisfaction         .003        .     .020     .089     .001
Global alternatives         .000     .020        .     .013     .000
Objective Investments       .000     .089     .013        .     .000
Subjective Investments      .000     .001     .000     .000        .
Check to see how well the model worked
R and R2, and the test of significance
Standard error of the estimate
ANOVAb

Model 1       Sum of Squares    df   Mean Square        F     Sig.
Regression         1751.967      4       437.992   10.889    .000a
Residual           2815.713     70        40.224
Total              4567.680     74

a. Predictors: (Constant), Subjective Investments, Global Satisfaction, Objective Investments, Global alternatives
b. Dependent Variable: Global Commitment

Model Summary

Model 1:  R = .619a   R Square = .384   Adjusted R Square = .348   Std. Error of the Estimate = 6.34228

a. Predictors: (Constant), Global alternatives, Global Satisfaction, Objective Investments, Subjective Investments

R Square describes the sample; Adjusted R Square is used to generalize to the population; the Standard Error of the Estimate is the typical residual.
Coefficientsa

Model 1                     B      Std. Error   Beta       t      Sig.   95% CI for B (Lower, Upper)   Zero-order  Partial    Part   Tolerance    VIF
(Constant)               19.247      6.372              3.021    .004        (6.539, 31.955)
Global Satisfaction        .247       .214      .116    1.153    .253        (-.180, .674)                 .310      .137     .108      .875     1.143
Global alternatives       -.179       .098     -.192   -1.823    .073        (-.376, .017)                -.422     -.213    -.171      .791     1.264
Objective Investments   3.390E-02     .019      .183    1.773    .081        (-.004, .072)                 .395      .207     .166      .826     1.211
Subjective Investments     .346       .113      .353    3.071    .003        (.121, .571)                  .551      .345     .288      .668     1.496

(B, Std. Error = unstandardized coefficients; Beta = standardized; Zero-order, Partial, Part = correlations; Tolerance, VIF = collinearity statistics)
a. Dependent Variable: Global Commitment
Now can look at individual predictors
check collinearity
see which predictors are individually significant
look at individual contributions (semi-partial or part r2)
• Go through example in SPSS
• Look at G*Power
• Stepwise example in Handouts