
PS71020A lecture 8 - Multiple Regression - Dr Luke Smillie ([email protected])

correlation and association

• is there a relationship between IQ scores (X) and overall school achievement (Y)?
– do scores on X and Y covary?

– is there a correlation between X and Y?

r = COV_XY / (S_X S_Y)

regression and prediction

• can we predict overall school achievement (Y) using IQ scores (X)?
– how much variance in Y can we explain in terms of X?

– can we model Y using X?
• Y = bX + c + e

• Ŷ = bX + c

– represent the model, Ŷ, as a ‘line of best fit’

X unknown – best predictor of Y is Ȳ (the mean of Y); X known – best predictor of Y is Ŷ
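A minimal sketch of these two prediction rules, using made-up IQ and achievement numbers (all values here are hypothetical): with no information about X, predict the mean of Y; with X known, predict from the fitted line.

```python
import numpy as np

iq = np.array([95, 100, 105, 110, 120, 125], dtype=float)   # X (hypothetical)
achieve = np.array([2.1, 2.4, 2.3, 2.9, 3.2, 3.6])          # Y (hypothetical)

y_bar = achieve.mean()                  # best guess when X is unknown
b, c = np.polyfit(iq, achieve, deg=1)   # least-squares line of best fit
print("X unknown -> predict", round(y_bar, 2))
print("X = 115   -> predict", round(b * 115 + c, 2))
```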

Page 2: correlation and association regression and predictionhomepages.gold.ac.uk/ithome/Lect 8 multiple regression_HO... · 2006-11-27 · PS71020A lecture 8 -Multiple Regression -Dr Luke

PS71020A lecture 8 - Multiple Regression - Dr Luke Smillie [email protected] 2

the least squares criterion

• the model, Ŷ, is calculated to minimise errors of prediction according to the least-squares criterion (LSC)

• the LSC states that Σ(Y − Ŷ)² is minimised

• b is the unstandardised regression coefficient
– a weight calculated to satisfy the LSC
– indicates the predicted change in Y given a unit change in X
– gives the slope of the regression line

• c is the Y intercept
– gives the value of Y when X is zero (typically where the regression line intercepts the Y axis)

b = COV_XY / S_X²

c = Ȳ − bX̄
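A sketch of computing b and c directly from these formulas, on the same hypothetical data as before; the result matches what np.polyfit returns.

```python
import numpy as np

x = np.array([95, 100, 105, 110, 120, 125], dtype=float)   # hypothetical X
y = np.array([2.1, 2.4, 2.3, 2.9, 3.2, 3.6])               # hypothetical Y

b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # b = COV_XY / S_X^2
c = y.mean() - b * x.mean()                  # c = Ybar - b * Xbar
print(b, c)                                  # matches np.polyfit(x, y, 1)
```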

variance explained – r²

r² tells us the proportion of variance in Y which is explained by X

• a ratio reflecting the proportion of variance captured by our model relative to the overall variance in our data

• highly interpretable: r² = .50 means 50% of the variance in Y is explained by X

r² = SS_Regression / SS_Total = SS_Ŷ / SS_Y = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)²
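A sketch of r² as this ratio of sums of squares, on the same hypothetical data; the ratio equals the squared correlation.

```python
import numpy as np

x = np.array([95, 100, 105, 110, 120, 125], dtype=float)   # hypothetical X
y = np.array([2.1, 2.4, 2.3, 2.9, 3.2, 3.6])               # hypothetical Y

b, c = np.polyfit(x, y, deg=1)
y_hat = b * x + c

ss_total = np.sum((y - y.mean()) ** 2)    # sum (Y - Ybar)^2
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # sum (Yhat - Ybar)^2
print(ss_reg / ss_total)                  # equals np.corrcoef(x, y)[0, 1] ** 2
```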

multiple regression

• can we predict job performance (Y) from overall school achievement (X1) and IQ scores (X2)?
– how much variance in Y is explained by X1 and X2 in combination?

– how important is each predictor of job performance?

• two kinds of research questions in MR:
– is the model significant and important?

– are the individual predictors significant and important?

the structural model

Y = c + b1X1 + b2X2 + … + bpXp + e

Y, any DV score, is predicted according to:

c → an intercept on the Y axis, plus
b1X1 → a weighted effect of predictor X1
b2X2 → a weighted effect of predictor X2
bpXp → a weighted effect of predictor Xp
e → error
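A sketch of the structural model with two hypothetical predictors, solved by ordinary least squares; all data and coefficients below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(100, 15, n)                          # hypothetical predictor 1
x2 = rng.normal(100, 15, n)                          # hypothetical predictor 2
y = 10 + 0.3 * x1 + 0.2 * x2 + rng.normal(0, 5, n)   # DATA = MODEL + RESIDUAL

X = np.column_stack([np.ones(n), x1, x2])      # intercept column plus predictors
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares [c, b1, b2]
print(coefs)                                   # close to [10, 0.3, 0.2]
```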


the structural model

Y = c + b1X1 + b2X2 + … + bpXp + e

DATA = MODEL + RESIDUAL

the regression plane – two predictors (3D space)

[figure: regression plane in 3D space – axes: predictor 1 (X1), predictor 2 (X2), criterion (Y); plane: Ŷ = b1X1 + b2X2 + c]

unstandardised partial regression coefficients - b

• Ŷ is calculated according to the LSC
• solved for by finding a set of weights (b) minimising errors of prediction (around the plane)
– b1 indicates the change in Y given a unit change in X1, holding X2 … Xp constant

– when standardised, indicates the SD change in Y given an SD change in X, and is denoted β

• c is the Y intercept

• Ŷ is therefore a weighted combination of the predictors (and intercept) called a linear composite (LC)

bivariate regression

[figure: path diagram – IQ predicting school achievement; Ŷ = β1(IQ)]


multiple regression

[figure: path diagram – IQ and school achievement combining into a linear composite (LC) predicting job performance; Ŷ = β1(IQ) + β2(achieve)]

variance explained – R²

R² is simply the r² representing the proportion of variance in Y which is explained by Ŷ – the linear composite

• a ratio reflecting the proportion of variance captured by our model relative to the overall variance in our data

• highly interpretable: R² = .50 means 50% of the variance in Y is explained by the combination of X1, X2 … Xp

R² = SS_Regression / SS_Total = SS_Ŷ / SS_Y = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)²

R² vs r²

[figure: r² – IQ predicting school achievement directly; R² – IQ and school achievement combining into a linear composite (LC) predicting job performance]

significance of the model

• R² tells us how important the model is

• the model can also be tested for statistical significance

• the test is conducted on R, the multiple correlation coefficient, against df = p, N − p − 1

F = R²(N − p − 1) / [(1 − R²)p] = MS_regression / MS_residual
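A quick check of this formula using figures that appear in the worked example later in the handout (R² = .540, N = 110, p = 3 predictors):

```python
r2, n, p = 0.540, 110, 3
f = r2 * (n - p - 1) / ((1 - r2) * p)
print(f)   # about 41.5; the ANOVA table reports F = 41.427 from unrounded R^2
```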


[figure: scatterplot of school achievement (0–10) against IQ (90–130) with regression line, illustrating SStotal = SSregression + SSresidual]

importance of individual predictors

• r – simple correlation coefficient

• b – partial regression coefficient

• β – standardized partial regression coefficient

• pr – partial correlation coefficient

• sr – semi-partial correlation coefficient

r – simple correlation coefficient

• indicates the importance of a predictor in terms of its direct relationship with the criterion

• not very useful in MR as it does not take into account intercorrelations with other predictors

b – partial regression coefficient

• indication of the importance of a predictor in terms of the model (not the data)

• scale-bound, so can't compare magnitude

• can however compare significance – each b is tested by dividing by its standard error to give a t value:

t = b / SE_b, with df = N − p − 1

[figure: example model with predictors IQ, Motivation, and Anxiety]


β – standardized partial regression coefficient

• indication of the importance of a predictor in terms of the model (not the data)

• standardized (scale-free) so you CAN compare magnitude

• test of significance is the same as for b

β_p = b_p (s_p / s_Y)
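A quick check of this conversion using the iq coefficient and the SDs reported in the worked example later in the handout (b = .384, SD of iq = 12.63, SD of perform = 12.59):

```python
b_iq, sd_iq, sd_perform = 0.384, 12.63, 12.59
beta_iq = b_iq * (sd_iq / sd_perform)
print(beta_iq)   # about .385; SPSS reports beta = .386 from unrounded values
```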

pr – partial correlation coefficient

correlation between X1 and Y, with the variance shared with other predictors partialled out

pr² indicates the proportion of variance in the criterion left unexplained by the other predictors that is explained by X1

values are usually similar to β

sr – semi-partial correlation coefficient

correlation between X1 and Y, after removing from X1 any variance it shares with other predictors

sr² indicates the unique contribution to prediction by X1 (as a portion of R²)
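A sketch of both definitions via residualisation, on hypothetical data: the partial correlation removes the other predictor from both X1 and Y, while the semi-partial removes it from X1 only.

```python
import numpy as np

def residuals(target, predictor):
    """target with its linear overlap with predictor removed"""
    b, c = np.polyfit(predictor, target, deg=1)
    return target - (b * predictor + c)

rng = np.random.default_rng(1)
x2 = rng.normal(size=200)
x1 = 0.5 * x2 + rng.normal(size=200)
y = x1 + x2 + rng.normal(size=200)

pr = np.corrcoef(residuals(x1, x2), residuals(y, x2))[0, 1]  # x2 removed from both
sr = np.corrcoef(residuals(x1, x2), y)[0, 1]                 # x2 removed from x1 only
print(pr, sr)
```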


unique, shared, and total variance

• subtracting sr² for each predictor from R² gives the shared variance – the amount of variance in R² accounted for by all predictor variables

[figure: Venn diagram – criterion overlapped by predictor1 and predictor2; a unique sr² region for each predictor, plus the shared variance region, together make up R²]

assumptions of MR

• scale (predictor and criterion scores)
– measured using a continuous scale (interval or ratio)

– normality (variables are normally distributed)

– linearity (there is a straight-line relationship between predictors and criterion)

– predictors are not multicollinear or singular (extremely highly correlated)

more assumptions of MR

• residuals
– normality: the array of Y values is normally distributed around Ŷ (assumption of normality in arrays)

– homoscedasticity: the variance of Y values is constant across the full range of Ŷ values (assumption of homogeneity of variance in arrays)

– linearity: straight-line relationship between Ŷ and residuals (with mean = 0 and slope = 0)

– independence (residuals uncorrelated)

[figure: residual scatterplots – panels: assumptions met; non-normality; non-normality (curvilinearity); heteroscedasticity]


multicollinearity and singularity

• occurs when predictors are highly correlated (> .90)
• causes unstable calculation of regression weights (b)
• diagnosed with intercorrelations, tolerance, and VIF

– Tolerance = (1 − R²x), where R²x is the overlap between a particular predictor and all the other predictors; values below .10 considered problematic
– Variance Inflation Factor (VIF) = 1 / tolerance; values above 4 considered problematic (a hand computation is sketched below)

• best solution is to remove or combine collinear predictors
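A sketch of computing tolerance and VIF by hand for one predictor, on hypothetical data: regress the predictor on the remaining predictors and take that regression's R² as R²x.

```python
import numpy as np

rng = np.random.default_rng(2)
x2, x3 = rng.normal(size=200), rng.normal(size=200)
x1 = 0.7 * x2 + 0.3 * x3 + rng.normal(scale=0.5, size=200)   # correlated predictor

# regress x1 on the other predictors to get R2x
X_others = np.column_stack([np.ones(200), x2, x3])
coefs, *_ = np.linalg.lstsq(X_others, x1, rcond=None)
x1_hat = X_others @ coefs
r2x = 1 - np.sum((x1 - x1_hat) ** 2) / np.sum((x1 - x1.mean()) ** 2)

tolerance = 1 - r2x        # below .10 problematic
vif = 1 / tolerance        # above 4 problematic
print(tolerance, vif)
```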

outliers – extreme cases

• distort the solution and inflate standard errors

• univariate outliers
– cases beyond 3 SD on any variable

• multivariate outliers
– described in terms of:
• leverage (h) – distance of a case from the group centroid along the line/plane of best fit

• discrepancy – extent to which a case deviates from the line/plane of best fit

• influence – combined effect of leverage and discrepancy: the effect of the outlier on the solution

multivariate outliers – high influence

[figure: 3D plot – predictor 1 (X1), predictor 2 (X2), criterion (Y); case labelled with high leverage and high discrepancy]

multivariate outliers – low influence

[figure: 3D plot – predictor 1 (X1), predictor 2 (X2), criterion (Y); case labelled with high leverage and high discrepancy]


multivariate outliers – testing

• leverage
– leverage statistic (h): varies from 0 to 1; values > .50 are problematic

– Mahalanobis distance ≈ h × (n − 1), distributed as chi-square and tested as such (df = p, α < .001)

• discrepancy – not directly tested

• influence
– assesses the change in the solution when a case is removed

– Cook's distance: values > 1 are problematic (see the sketch below)
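A sketch of these three diagnostics computed from the hat matrix of a regression, on hypothetical data; the Mahalanobis line uses centred leverage times (n − 1), which is how SPSS relates the two statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 2                                        # cases and predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T                # hat matrix
h = np.diag(H)                                      # leverage; > .50 problematic
mahal = (h - 1 / n) * (n - 1)                       # centred leverage x (n - 1)

resid = y - H @ y
mse = resid @ resid / (n - p - 1)
cooks = resid ** 2 / (mse * (p + 1)) * h / (1 - h) ** 2   # > 1 problematic
print(h.max(), mahal.max(), cooks.max())
```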

more issues: sample size

various rules of thumb exist…

• for medium effect sizes (R² ≈ .25)
– significance of the model (R): N > 50 + 8p

– significance of predictors (β etc.): N > 104 + p

• more observations are needed if
– small effect sizes are anticipated

– DV is skewed

– IVs have low reliability

even more issues…

• reliability of measures
– low reliability of measures increases residual variance (less power to reject H0)

• principle of parsimony
– moderate intercorrelations are OK, but the goal should be to choose predictors which are
1. maximally correlated with the DV (high validity)
2. minimally correlated with each other (low collinearity)

principle of parsimony

[figure: Venn diagram – predictor1, predictor2, and predictor3 each overlapping the criterion's R² with little overlap among themselves]

• each predictor contributes uniquely and substantially – the model is parsimonious


principle of parsimony

[figure: Venn diagram – predictor1, predictor2, and predictor3 overlapping each other heavily and capturing little of R²]

• predictors are weak and overlap – are redundant – the model is not parsimonious

complete (but quick) example

• can we predict job performance (Y) from overall school achievement (X1), IQ (X2), and interview rating (X3)?
– is the model significant and important?
• look at R² and test R for significance

– are the individual predictors significant and important?
• look at the size and significance of r, β, sr², etc.

check assumptions

scale measurement:
– IQ, performance, and school achievement (GPA) measured using continuous scales
– interview ratings poorly scaled (categorical, but OK)

scale reliability:
– reliability measures all > .70

sample size:
– N = 110; satisfies the rule of 104 + p
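For reference, a sketch of how this analysis might be run in Python rather than SPSS, using statsmodels; the file name is hypothetical, and the column names iq, gpa, interview, and perform are assumed to match the dataset described above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file holding the 110 cases described above
df = pd.read_csv("job_performance.csv")

# OLS with perform as criterion and iq, gpa, interview as predictors
model = smf.ols("perform ~ iq + gpa + interview", data=df).fit()
print(model.summary())   # R-squared, F test, and per-predictor b, SE, t, p
```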

data checking – scale normality

[figure: histograms with normal curves – iq (Mean = 99.83, SD = 12.63, N = 110), interview (Mean = 8.05, SD = 0.51, N = 110), gpa (Mean = 2.40, SD = 0.86, N = 110)]

scales look roughly normally distributed

could transform IQ and check effect on solution

(see lect 4 for transformations)

[figure: histogram with normal curve – perform (Mean = 47.21, SD = 12.59, N = 110)]


data checking – scale normality

check skewness and kurtosis

a statistic divided by its standard error is a z test; p = .05 at ±1.96

Descriptives for iq:

                                   Statistic   Std. Error
Mean                                 99.8273      1.20465
95% CI for Mean – Lower Bound        97.4397
95% CI for Mean – Upper Bound       102.2149
5% Trimmed Mean                      99.4444
Median                               99.0000
Variance                            159.630
Std. Deviation                       12.63449
Minimum                              75.00
Maximum                             137.00
Range                                62.00
Interquartile Range                  18.00
Skewness                               .386         .230
Kurtosis                              -.130         .457

for IQ:
Z(skew) = .386 / .230 = 1.68, p > .05
Z(kurtosis) = −.130 / .457 < 1, ns
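A sketch of this z test in Python, on a stand-in sample; the standard errors below are the usual large-sample approximations, whereas SPSS uses exact small-sample formulas (hence its SEs of .230 and .457 above), so values will differ slightly.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# stand-in sample for the IQ scores (the real data are in SPSS)
x = np.random.default_rng(4).normal(100, 15, 110)
n = len(x)

z_skew = skew(x) / np.sqrt(6 / n)        # large-sample SE of skewness
z_kurt = kurtosis(x) / np.sqrt(24 / n)   # excess kurtosis, as SPSS reports
print(z_skew, z_kurt)                    # |z| > 1.96 -> non-normal at p < .05
```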

data checking – scale linearity

[figure: scatterplot matrix of iq, gpa, interview, and perform]

relationships among scales are reasonably linear

interview scores possibly the exception (perhaps rescale?)

important point is that there is no curvilinearity

data checking – univariate outliers

no univariate outliers

Extreme Values for iq:

Highest   1   case 27   137.00
          2   case 11   131.00
          3   case 23   128.00
          4   case 25   127.00
          5   case 17   121.00
Lowest    1   case 95    75.00
          2   case 76    75.00
          3   case  6    79.00
          4   case 55    81.00
          5   case 49    81.00

for IQ: mean = 99.82, SD = 12.63

maximum score (137) < 137.71 (just!); minimum score (75) > 61.93
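The 3 SD bounds quoted above are just mean ± 3 × SD:

```python
# IQ summary figures quoted above
mean, sd = 99.82, 12.63
low, high = mean - 3 * sd, mean + 3 * sd
print(low, high)   # 61.93 and 137.71; observed range 75-137 falls inside
```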

data checking - residuals

residuals are normally, linearly, and homogeneously distributed around Ŷ, and they are not correlated

[figure: regression standardized residuals plotted against regression standardized predicted values (Ŷ); dependent variable: perform]


data checking - parsimony

validities relatively high (except interview), collinearities relatively low

Pearson Correlations

            perform     iq    gpa   interview
perform       1.000   .613   .633     .414
iq             .613  1.000   .498     .258
gpa            .633   .498  1.000     .413
interview      .414   .258   .413    1.000

also no multicollinearity (all rs < .90)

data checking – multicollinearity

all tolerance values well above .10

all VIF values well below 4

therefore no multicollinearity or singularity

Coefficients (a)

Model 1      Tolerance    VIF
iq               .749   1.336
gpa              .665   1.503
interview        .826   1.211

a. Dependent Variable: perform

data checking – multivariate outliers

no multivariate outliers

Residuals Statistics (a)

                                 Minimum     Maximum      Mean   Std. Deviation    N
Predicted Value                  25.9542     68.7743   47.2091      9.24986      110
Std. Predicted Value              -2.298       2.331      .000      1.000        110
Standard Error of Pred. Value       .830       2.728     1.595       .430        110
Adjusted Predicted Value         25.2231     68.6111   47.1942      9.25462      110
Residual                       -27.49503    14.47562    .00000      8.54248      110
Std. Residual                     -3.174       1.671      .000       .986        110
Stud. Residual                    -3.239       1.689      .001      1.005        110
Deleted Residual               -28.62991    14.78442    .01484      8.87115      110
Stud. Deleted Residual            -3.396       1.704     -.003      1.015        110
Mahal. Distance                     .010       9.817     2.973      2.068        110
Cook's Distance                     .000        .108      .010       .015        110
Centered Leverage Value             .000        .090      .027       .019        110

a. Dependent Variable: perform

Cook’s distance < 1 (no influential cases)

maximum Mahalanobis distance = 9.82; critical χ²(3) = 12.84, so leverage is ns

the model: R and R2

Model Summary (b)

Model 1:  R = .735 (a)   R Square = .540   Adjusted R Square = .527   Std. Error of the Estimate = 8.66252

a. Predictors: (Constant), interview, iq, gpa
b. Dependent Variable: perform

ANOVA (b)

Model 1        Sum of Squares    df   Mean Square        F    Sig.
Regression           9326.029     3      3108.676   41.427   .000 (a)
Residual             7954.162   106        75.039
Total               17280.191   109

a. Predictors: (Constant), interview, iq, gpa
b. Dependent Variable: perform

R² is substantial

R is significant


the individual predictors - β

Coefficients (a)

              Unstandardized            Standardized
Model 1          B      Std. Error      Beta        t      Sig.   Zero-order   Partial   Part
(Constant)   -35.710      14.478                 -2.466    .015
iq              .384        .076        .386      5.063    .000       .613       .441    .334
gpa            5.460       1.177        .375      4.638    .000       .633       .411    .306
interview      3.912       1.777        .160      2.201    .030       .414       .209    .145

a. Dependent Variable: perform

the size and significance of β indicate that IQ and GPA are much stronger predictors of performance than interview scores

the individual predictors – sr2

(same coefficients table as above – see the Part correlations column)

semi-partial correlation aka ‘part correlation’

• IQ uniquely explains 11% (.334²) of performance variance
• GPA uniquely explains 9% (.306²)…
• interview ratings uniquely explain 2% (.145²)…
• shared variance = R² − unique = .54 − .11 − .09 − .02 = .32, i.e. 32%
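The same arithmetic from the part correlations in the table above:

```python
r2 = 0.540
part = {"iq": 0.334, "gpa": 0.306, "interview": 0.145}
unique = {k: v ** 2 for k, v in part.items()}       # sr^2 per predictor
shared = r2 - sum(unique.values())                  # variance the predictors share
print({k: round(v, 2) for k, v in unique.items()})  # iq .11, gpa .09, interview .02
print(round(shared, 2))                             # .31; the slide rounds to 32%
```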

additional issues…

• causality
– we speak of 'prediction', but cannot infer causality

– NO statistical analysis allows causal inferences – evidence for causality depends upon your design, not your analysis

suppressor variables

• a suppressor variable is a predictor which enhances the overall relationship (R²) by virtue of its relationship with another predictor, rather than with the criterion

• identifying suppressor variables:
– a significant regression weight (b), but a disproportionately small bivariate correlation (r)

– or b and r reverse signs

– indicates that one of the other predictors in the model suppresses variance which is unrelated to the DV


relationship to ANOVA

• ANOVA is a special case of multiple regression

• easy to demonstrate when we have dichotomous groups – e.g., males/females

• with > 2 categories we need to code the data so that SPSS understands what we are doing – dummy coding or effect coding (see the sketch below)
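A sketch of the equivalence on hypothetical heights for two dummy-coded groups of four: a regression on a 0/1 dummy predictor gives the same test as a one-way ANOVA on the two groups.

```python
import numpy as np
from scipy import stats

# hypothetical heights for two dummy-coded groups of four
gender = np.array([0, 0, 0, 0, 1, 1, 1, 1])
height = np.array([160.0, 165, 170, 165, 175, 180, 185, 180])

# regression on the 0/1 dummy predictor
reg = stats.linregress(gender, height)

# one-way ANOVA on the same two groups
f, p_anova = stats.f_oneway(height[gender == 0], height[gender == 1])

print(reg.pvalue, p_anova)               # identical p values
print((reg.slope / reg.stderr) ** 2, f)  # t^2 from the regression equals F
```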

regression with dichotomous predictor

Model Summary

Model 1:  R = .802 (a)   R Square = .643   Adjusted R Square = .583   Std. Error of the Estimate = 6.45497

a. Predictors: (Constant), GENDER

ANOVA (b)

Model 1        Sum of Squares   df   Mean Square       F     Sig.
Regression            450.000    1       450.000   10.800   .017 (a)
Residual              250.000    6        41.667
Total                 700.000    7

a. Predictors: (Constant), GENDER
b. Dependent Variable: HEIGHT

now run as an ANOVA…

Tests of Between-Subjects Effects

Dependent Variable: HEIGHT

Source            Type III SS    df   Mean Square          F    Sig.   Partial Eta Squared
Corrected Model     450.000 (a)   1       450.000     10.800   .017        .643
Intercept        217800.000      1    217800.000   5227.200   .000        .999
GENDER              450.000      1       450.000     10.800   .017        .643
Error               250.000      6        41.667
Total            218500.000      8
Corrected Total     700.000      7

a. R Squared = .643 (Adjusted R Squared = .583)

readings:

• Howell – chapter 15

• Tabachnick & Fidell – chapter 5, esp. pp. 153–164 for a complete example; also ch. 4, esp. pp. 66–80 for data screening issues

• good website: http://www2.chass.ncsu.edu/garson/PA765/regress.htm