PS71020A lecture 8 - Multiple Regression - Dr Luke Smillie -l.smillie@gold.ac.uk 1
correlation and association
• is there a relationship between IQ scores (X) and overall school achievement (Y)?
– do scores on X and Y covary?
– is there a correlation between X and Y?
r = COV_XY / (S_X S_Y)
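The correlation formula above can be sketched in Python with numpy (the lecture's own examples use SPSS; the IQ and achievement scores below are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical IQ (X) and school-achievement (Y) scores
X = np.array([95., 100., 105., 110., 120., 125.])
Y = np.array([2.0, 2.2, 2.8, 3.0, 3.4, 3.6])

# r = COV_XY / (S_X * S_Y): covariance rescaled by both standard deviations
cov_xy = np.cov(X, Y, ddof=1)[0, 1]
r = cov_xy / (np.std(X, ddof=1) * np.std(Y, ddof=1))
```

The result agrees with `np.corrcoef(X, Y)[0, 1]`, which computes the same quantity directly.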
regression and prediction
• can we predict overall school achievement (Y) using IQ scores (X)?
– how much variance in Y can we explain in terms of X?
– can we model Y using X?
• Y = bX + c + e
• Ŷ = bX + c
– represent the model, Ŷ, as a ‘line of best fit’
X unknown – best predictor of Y is Ȳ; X known – best predictor of Y is Ŷ
the least squares criterion
• the model, Ŷ, is calculated to minimise errors of prediction according to the least-squares criterion (LSC)
• LSC states that Σ(Y − Ŷ)² is minimised
• b is the unstandardised regression coefficient
– a weight calculated to satisfy the LSC
– indicates predicted change in Y given unit change in X
– gives the slope of the regression line
• c is the Y intercept
– gives the value of Y when X is zero (typically where the regression line intercepts the Y axis)
b = COV_XY / S²_X

c = Ȳ − bX̄
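These two formulas can be checked numerically; a minimal numpy sketch using the same kind of hypothetical scores (not real lecture data):

```python
import numpy as np

# Hypothetical IQ (X) and achievement (Y) scores
X = np.array([95., 100., 105., 110., 120., 125.])
Y = np.array([2.0, 2.2, 2.8, 3.0, 3.4, 3.6])

# b = COV_XY / S^2_X  and  c = Ybar - b * Xbar
b = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
c = Y.mean() - b * X.mean()

Y_hat = b * X + c   # the least-squares line of best fit
```

`np.polyfit(X, Y, 1)` solves the same least-squares problem and returns identical slope and intercept.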
variance explained – r²
• r² tells us the proportion of variance in Y which is explained by X
• a ratio reflecting the proportion of variance captured by our model relative to the overall variance in our data
• highly interpretable: r² = .50 means 50% of the variance in Y is explained by X
r² = SS_Ŷ / SS_Y = SS_Regression / SS_Total = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)²
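The sums-of-squares decomposition behind r² can be verified directly; a numpy sketch on the same hypothetical data as before:

```python
import numpy as np

X = np.array([95., 100., 105., 110., 120., 125.])
Y = np.array([2.0, 2.2, 2.8, 3.0, 3.4, 3.6])

# Fit the bivariate regression line
b = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
c = Y.mean() - b * X.mean()
Y_hat = b * X + c

# r^2 = SS_regression / SS_total
ss_total = np.sum((Y - Y.mean()) ** 2)
ss_regression = np.sum((Y_hat - Y.mean()) ** 2)
ss_residual = np.sum((Y - Y_hat) ** 2)
r_squared = ss_regression / ss_total
```

The ratio equals the squared Pearson correlation, and SS_total = SS_regression + SS_residual holds exactly.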
multiple regression
• can we predict job performance (Y) from overall school achievement (X1) and IQ scores (X2)?
– how much variance in Y is explained by X1 and X2 in combination?
– how important is each predictor of job performance?
• two kinds of research questions in MR:
– is the model significant and important?
– are the individual predictors significant and important?
the structural model
Y = c + b1X1 + b2X2 + … + bpXp + e

Y, any DV score, is predicted according to:
c → an intercept on the Y axis, plus
b1X1 → a weighted effect of predictor X1
b2X2 → a weighted effect of predictor X2
bpXp → a weighted effect of predictor Xp
e → error
the structural model
Y = c + b1X1 + b2X2 + … + bpXp + e
DATA = MODEL + RESIDUAL
the regression plane – two predictors (3D space)
[figure: plane of best fit, Ŷ = b1X1 + b2X2 + c, plotted over predictor 1 (X1), predictor 2 (X2), and criterion (Y) axes]
unstandardised partial regression coefficients - b
• Ŷ is calculated according to the LSC
• solved by finding a set of weights (b) minimising errors of prediction (around the plane)
– b1 indicates change in Y given unit change in X1 when X2 … Xp are held constant
– when standardised, indicates SD change in Y given SD change in X, and is denoted β
• c is the Y intercept
• Ŷ is therefore a weighted combination of the predictors (and intercept), called a linear composite (LC)
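Solving for the weights of the linear composite is an ordinary least-squares problem; a numpy sketch with simulated (hypothetical) predictors standing in for IQ and GPA:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 110
X1 = rng.normal(100, 15, n)                    # e.g. IQ-like predictor
X2 = rng.normal(2.5, 0.9, n)                   # e.g. GPA-like predictor
Y = 0.3 * X1 + 5.0 * X2 + rng.normal(0, 8, n)  # simulated criterion

# Design matrix with an intercept column; lstsq satisfies the LSC
A = np.column_stack([np.ones(n), X1, X2])
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)
c, b1, b2 = coefs

Y_hat = A @ coefs   # the linear composite Ŷ = c + b1*X1 + b2*X2
```

A least-squares fit leaves residuals orthogonal to every predictor (and to the intercept), which is a useful sanity check.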
[figure: bivariate regression – IQ predicting school achievement, Ŷ = β1(IQ)]
[figure: multiple regression – IQ and school achievement combined into a linear composite (LC) predicting job performance, Ŷ = β1(IQ) + β2(achieve)]
variance explained – R²
• R² is simply the r² representing the proportion of variance in Y which is explained by Ŷ – the linear composite
• a ratio reflecting the proportion of variance captured by our model relative to the overall variance in our data
• highly interpretable: R² = .50 means 50% of the variance in Y is explained by the combination of X1, X2 … Xp
R² = SS_Ŷ / SS_Y = SS_Regression / SS_Total = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)²
R² vs r²
[figure: Venn diagrams contrasting r² (IQ alone predicting school achievement) with R² (the linear composite of IQ and school achievement predicting job performance)]
significance of the model
• R² tells us how important the model is
• the model can also be tested for statistical significance
• the test is conducted on R, the multiple correlation coefficient, against df = p, N − p − 1
F = MS_regression / MS_residual = [R²(N − p − 1)] / [(1 − R²)p]
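Plugging in the values from this lecture's worked example (R² = .540, N = 110, p = 3) gives the same F that SPSS reports from the mean squares:

```python
# F = (R^2 * (N - p - 1)) / ((1 - R^2) * p), tested on df = (p, N - p - 1)
# Values taken from the worked example later in this lecture
R2, N, p = 0.540, 110, 3
F = (R2 * (N - p - 1)) / ((1 - R2) * p)
# F works out to roughly 41.5, agreeing (to rounding) with the
# MS_regression / MS_residual value in the example's ANOVA table
```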
[figure: scatterplot of school achievement (0–10) against IQ (90–130) with regression line, illustrating SStotal = SSregression + SSresidual]

importance of individual predictors
• r – simple correlation coefficient
• b – partial regression coefficient
• β – standardised partial regression coefficient
• pr – partial correlation coefficient
• sr – semi-partial correlation coefficient
r – simple correlation coefficient
• indicates importance of a predictor in terms of its direct relationship with the criterion
• not very useful in MR as it does not take into account intercorrelations with other predictors
b – partial regression coefficient
• indication of the importance of a predictor in terms of the model (not the data)
• scale-bound, so magnitudes can't be compared
• can however compare significance – each b is tested by dividing it by its standard error to give a t-value:
t = b / SE_b,  df = N − p − 1
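The standard errors (and hence the t-values) can be computed from the design matrix; a numpy sketch on simulated data (SPSS reports these same quantities in its coefficients table):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 110, 2
X = rng.normal(size=(n, p))
Y = X @ np.array([0.5, 0.0]) + rng.normal(size=n)  # simulated criterion

A = np.column_stack([np.ones(n), X])               # design matrix with intercept
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)

resid = Y - A @ coefs
mse = resid @ resid / (n - p - 1)                  # residual variance, df = N - p - 1

# SE of each coefficient from the diagonal of MSE * (A'A)^-1
se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
t = coefs / se                                     # t = b / SE_b
```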
β – standardized partial regression coefficient
• indication of the importance of a predictor in terms of the model (not the data)
• standardised (scale-free), so you CAN compare magnitudes
• the test of significance is the same as for b
β_p = b_p (s_p / s_Y)
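The β = b(s_p / s_Y) rescaling can be cross-checked against regressing z-scores on z-scores, which gives the same weights; a numpy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 110
X1 = rng.normal(100, 15, n)
X2 = rng.normal(2.5, 0.9, n)
Y = 0.3 * X1 + 5.0 * X2 + rng.normal(0, 8, n)

A = np.column_stack([np.ones(n), X1, X2])
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)
b = coefs[1:]

# beta_p = b_p * (s_p / s_Y): rescale each b by its predictor's SD over the criterion's SD
s = np.array([np.std(X1, ddof=1), np.std(X2, ddof=1)])
beta = b * s / np.std(Y, ddof=1)

# Cross-check: regressing standardised variables gives identical weights
Z = np.column_stack([np.ones(n),
                     (X1 - X1.mean()) / s[0],
                     (X2 - X2.mean()) / s[1]])
beta_z, *_ = np.linalg.lstsq(Z, (Y - Y.mean()) / np.std(Y, ddof=1), rcond=None)
```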
pr – partial correlation coefficient
• correlation between X1 and Y, with the variance shared with other predictors partialled out
• pr² indicates the proportion of variance in the criterion left unexplained by the other predictors that is explained by X1
• values are usually similar to β
sr – semi-partial correlation coefficient
• correlation between X1 and Y, after removing from X1 any variance it shares with other predictors
• sr² indicates the unique contribution to prediction by X1 (as a portion of R²)
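Both pr and sr can be obtained by residualising; a numpy sketch on simulated correlated predictors (the helper `resid` is an illustrative name, not a library function):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 110
X2 = rng.normal(size=n)
X1 = 0.5 * X2 + rng.normal(size=n)      # predictors intercorrelate
Y = X1 + X2 + rng.normal(size=n)        # simulated criterion

def resid(v, on):
    """Residual of v after regressing it on `on` (plus an intercept)."""
    A = np.column_stack([np.ones(len(v)), on])
    return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]

x1_unique = resid(X1, X2)                        # X1 with shared variance removed
sr = np.corrcoef(x1_unique, Y)[0, 1]             # semi-partial: Y left intact
pr = np.corrcoef(x1_unique, resid(Y, X2))[0, 1]  # partial: Y residualised too
```

Note |pr| ≥ |sr| always holds, since the partial correlation divides by a smaller (residualised) criterion SD.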
unique, shared, and total variance
• subtracting sr² for each predictor from R² gives the shared variance – the amount of variance in R² accounted for by all predictor variables
[figure: Venn diagram of criterion, predictor 1, and predictor 2 showing each predictor's sr² and the shared variance within R²]
assumptions of MR
• scale (predictor and criterion scores)
– measured using a continuous scale (interval or ratio)
– normality (variables are normally distributed)
– linearity (there is a straight-line relationship between predictors and criterion)
– predictors are not multicollinear or singular (extremely highly correlated)
more assumptions of MR
• residuals
– normality: the array of Y values is normally distributed around Ŷ (assumption of normality in arrays)
– homoscedasticity: the variance of Y values is constant across the full range of Ŷ values (assumption of homogeneity of variance in arrays)
– linearity: straight-line relationship between Ŷ and residuals (with mean = 0 and slope = 0)
– independence (residuals uncorrelated)
[figure: four residual plots – assumptions met; non-normality; non-normality (curvilinearity); heteroscedasticity]
multicollinearity and singularity
• occurs when predictors are highly correlated (> .90)
• causes unstable calculation of regression weights (b)
• diagnosed with intercorrelations, tolerance, and VIF
• Tolerance = (1 − R²x)
– where R²x is the overlap between a particular predictor and all the other predictors
– values below .10 considered problematic
• Variance Inflation Factor (VIF) = 1 / tolerance
– values above 4 considered problematic
• best solution is to remove or combine collinear predictors
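Tolerance and VIF follow directly from their definitions; a numpy sketch (the `tolerance` function is an illustrative helper, not an SPSS call, and the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 110
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + 0.6 * rng.normal(size=n)   # moderately correlated predictors
X = np.column_stack([X1, X2])

def tolerance(X, j):
    """1 - R^2 from regressing predictor j on all the other predictors."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    fitted = A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    r2 = np.corrcoef(fitted, X[:, j])[0, 1] ** 2
    return 1 - r2

tol = np.array([tolerance(X, j) for j in range(X.shape[1])])
vif = 1 / tol
# flag any predictor with tolerance < .10 or VIF > 4
```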
outliers – extreme cases
• distort the solution and inflate standard errors
• univariate outliers
– cases beyond 3 SD on any variable
• multivariate outliers
– described in terms of:
• leverage (h) – distance of a case from the group centroid along the line/plane of best fit
• discrepancy – extent to which a case deviates from the line/plane of best fit
• influence – combined effect of leverage and discrepancy: the effect of the outlier on the solution
multivariate outliers – high influence
[figure: 3D regression plane over predictor 1 (X1), predictor 2 (X2), and criterion (Y), with a case showing high leverage and high discrepancy]

multivariate outliers – low influence
[figure: 3D regression plane over predictor 1 (X1), predictor 2 (X2), and criterion (Y), with a case showing high leverage but low discrepancy]
multivariate outliers – testing
• leverage
– leverage statistic (h): varies from 0 to 1, values > .50 are problematic
– Mahalanobis distance ≈ h × (n − 1), distributed as chi-square and tested as such (df = p, α = .001)
• discrepancy
– not directly tested
• influence
– assessed via the change in the solution when a case is removed
– Cook's distance, values > 1 are problematic
more issues: sample size
various rules of thumb exist …
• for medium effect sizes (R² ≈ .25)
– significance of model (R): N > 50 + 8p
– significance of predictors (β etc.): N > 104 + p
• more observations are needed if
– small effect sizes are anticipated
– the DV is skewed
– the IVs have low reliability
even more issues…
• reliability of measures
– low reliability of measures increases residual variance (less power to reject H0)
• principle of parsimony
– moderate intercorrelations are OK, but the goal should be to choose predictors which are
1. maximally correlated with the DV (high validity)
2. minimally correlated with each other (low collinearity)
principle of parsimony
[figure: Venn diagram – three predictors each overlapping substantially with the criterion but little with each other, within R²]
• each predictor contributes uniquely and substantially
→ model is parsimonious
principle of parsimony
[figure: Venn diagram – three weak, heavily overlapping predictors contributing little to R²]
• predictors are weak and overlap – they are redundant
→ model is not parsimonious
complete (but quick) example
• can we predict job performance (Y) from overall school achievement (X1), IQ (X2), and interview rating (X3)?
– is the model significant and important?
• look at R² and test R for significance
– are the individual predictors significant and important?
• look at the size and significance of r, β, sr², etc.
check assumptions
• scale measurement:
– IQ, performance, and school achievement (GPA) measured using continuous scales
– interview ratings more crudely scaled (ordinal, but OK)
• scale reliability
– reliability measures all > .70
• sample size
– N = 110; satisfies the rule of N > 104 + p
data checking – scale normality
[figure: histograms of the four scales, all N = 110 –
iq: M = 99.83, SD = 12.63;
interview: M = 8.05, SD = 0.51;
gpa: M = 2.40, SD = 0.86;
perform: M = 47.21, SD = 12.59]
• scales look roughly normally distributed
• could transform IQ and check the effect on the solution (see lecture 4 for transformations)
data checking – scale normality
• check skewness and kurtosis
• a statistic divided by its standard error is a z test; p = .05 at ±1.96

Descriptives for iq (statistic, std. error):
mean = 99.83 (1.20); 95% CI for mean [97.44, 102.21]; 5% trimmed mean = 99.44; median = 99.00; variance = 159.63; SD = 12.63; minimum = 75.00; maximum = 137.00; range = 62.00; interquartile range = 18.00; skewness = .386 (.230); kurtosis = −.130 (.457)

for IQ:
Z(skew) = .386 / .230 = 1.68, p > .05
Z(kurtosis) = −.13 / .457 < 1, ns
data checking – scale linearity
[figure: scatterplot matrix of iq, gpa, interview, and perform]
• relationships among scales are reasonably linear
• interview scores are possibly the exception (perhaps rescale?)
• the important point is that there is no curvilinearity
data checking – univariate outliers
• no univariate outliers

Extreme values for iq (case number: value):
highest – 27: 137.00, 11: 131.00, 23: 128.00, 25: 127.00, 17: 121.00
lowest – 95: 75.00, 76: 75.00, 6: 79.00, 55: 81.00, 49: 81.00

for IQ: mean = 99.82, SD = 12.63
maximum score < 137.71, i.e. within 3 SD (just!); minimum score > 61.93
data checking - residuals
• residuals are normally, linearly and homogeneously distributed around Ŷ, and they are not correlated
[figure: regression standardized residuals plotted against standardized predicted values (Ŷ), dependent variable perform – no pattern apparent]
data checking - parsimony
• validities relatively high (except interview), collinearities relatively low
• also no multicollinearity (all rs < .90)

Pearson correlations:
            perform    iq     gpa   interview
perform      1.000   .613    .633     .414
iq            .613  1.000    .498     .258
gpa           .633   .498   1.000     .413
interview     .414   .258    .413    1.000
data checking – multicollinearity
• all tolerance values well above .10
• all VIF values well below 4
• therefore no multicollinearity or singularity

Collinearity statistics (Model 1):
            Tolerance    VIF
iq             .749     1.336
gpa            .665     1.503
interview      .826     1.211
a. Dependent Variable: perform
data checking – multivariate outliers
• no multivariate outliers

Residuals Statisticsᵃ (minimum, maximum, mean, std. deviation; N = 110):
Predicted Value:                 25.95,  68.77,  47.21,  9.25
Std. Predicted Value:           −2.298,  2.331,  .000,   1.000
Standard Error of Pred. Value:    .830,  2.728,  1.595,  .430
Adjusted Predicted Value:        25.22,  68.61,  47.19,  9.25
Residual:                       −27.50,  14.48,  .00,    8.54
Std. Residual:                  −3.174,  1.671,  .000,   .986
Stud. Residual:                 −3.239,  1.689,  .001,   1.005
Deleted Residual:               −28.63,  14.78,  .01,    8.87
Stud. Deleted Residual:         −3.396,  1.704, −.003,   1.015
Mahal. Distance:                  .010,  9.817,  2.973,  2.068
Cook's Distance:                  .000,   .108,  .010,   .015
Centered Leverage Value:          .000,   .090,  .027,   .019
a. Dependent Variable: perform

• Cook's distance < 1 (no influential cases)
• maximum Mahalanobis distance = 9.82; critical χ²(3) = 12.84 (α = .001), so leverage is ns
the model: R and R²

Model Summaryᵇ:
Model 1: R = .735ᵃ, R² = .540, Adjusted R² = .527, Std. Error of the Estimate = 8.66
a. Predictors: (Constant), interview, iq, gpa
b. Dependent Variable: perform

ANOVAᵇ (Model 1):
Regression: SS = 9326.03, df = 3, MS = 3108.68, F = 41.43, Sig. = .000ᵃ
Residual:   SS = 7954.16, df = 106, MS = 75.04
Total:      SS = 17280.19, df = 109
a. Predictors: (Constant), interview, iq, gpa
b. Dependent Variable: perform

• R² is substantial
• R is significant
the individual predictors - β

Coefficientsᵃ (Model 1):
                  B     Std. Error   Beta      t     Sig.   Zero-order  Partial   Part
(Constant)   −35.710      14.478            −2.466   .015
iq              .384        .076     .386    5.063   .000      .613       .441    .334
gpa            5.460       1.177     .375    4.638   .000      .633       .411    .306
interview      3.912       1.777     .160    2.201   .030      .414       .209    .145
a. Dependent Variable: perform
the size and significance of β indicates that IQ and GPA are much stronger predictors of performance than interview scores
the individual predictors – sr²

Coefficientsᵃ (Model 1), correlations columns:
             Zero-order   Partial   Part
iq              .613        .441    .334
gpa             .633        .411    .306
interview       .414        .209    .145
a. Dependent Variable: perform
semi-partial correlation aka ‘part correlation’
• IQ uniquely explains 11% (.334²) of performance variance
• GPA uniquely explains 9% (.306²) …
• interview ratings uniquely explain 2% (.145²) …
• shared variance = R² − unique = .54 − .11 − .09 − .02 = 32%
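The arithmetic above is easy to reproduce (the part/semi-partial correlations are taken from the coefficients table):

```python
# Unique contributions are squared semi-partial ("part") correlations
sr = {"iq": .334, "gpa": .306, "interview": .145}
R2 = .540

unique = {k: round(v ** 2, 2) for k, v in sr.items()}  # per-predictor unique variance
shared = round(R2 - sum(unique.values()), 2)           # remainder of R^2 is shared
```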
additional issues…
• causality
– we speak of 'prediction', but cannot infer causality
– NO statistical analysis allows causal inferences – evidence for causality depends on your design, not your analysis
suppressor variables
• a suppressor variable is a predictor which enhances the overall relationship (R²) by virtue of its relationship with another predictor, rather than with the criterion
• identifying suppressor variables:
– significant regression weight (b), but a disproportionately small bivariate correlation (r)
– or b and r have opposite signs
– indicates that one of the other predictors in the model suppresses variance which is unrelated to the DV
relationship to ANOVA
• ANOVA is a special case of multiple regression
• easy to demonstrate when we have dichotomous groups – e.g., males/females
• with > 2 categories we need to code the data so that SPSS understands what we are doing
– dummy coding or effect coding
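The dichotomous case is easy to verify: with a 0/1 dummy, the regression intercept is the 0-coded group's mean and b is the group mean difference (the heights below are hypothetical, not the slide's data):

```python
import numpy as np

# Hypothetical heights for two groups, gender dummy-coded 0/1
height = np.array([160., 165., 170., 155., 175., 180., 185., 190.])
gender = np.array([0, 0, 0, 0, 1, 1, 1, 1])

A = np.column_stack([np.ones(len(height)), gender])
coefs, *_ = np.linalg.lstsq(A, height, rcond=None)
c, b = coefs
# c is the mean of the 0-coded group; b is the between-group mean difference,
# so testing b against 0 is equivalent to the one-way ANOVA / t-test on groups
```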
regression with dichotomous predictor

Model Summary:
Model 1: R = .802ᵃ, R² = .643, Adjusted R² = .583, Std. Error of the Estimate = 6.45
a. Predictors: (Constant), GENDER

ANOVAᵇ (Model 1):
Regression: SS = 450.000, df = 1, MS = 450.000, F = 10.800, Sig. = .017ᵃ
Residual:   SS = 250.000, df = 6, MS = 41.667
Total:      SS = 700.000, df = 7
a. Predictors: (Constant), GENDER
b. Dependent Variable: HEIGHT
now run as an ANOVA…
Tests of Between-Subjects Effects (Dependent Variable: HEIGHT):
Source            Type III SS    df        MS           F       Sig.   Partial Eta Squared
Corrected Model      450.000ᵃ     1      450.000      10.800    .017         .643
Intercept         217800.000      1   217800.000    5227.200    .000         .999
GENDER               450.000      1      450.000      10.800    .017         .643
Error                250.000      6       41.667
Total             218500.000      8
Corrected Total      700.000      7
a. R Squared = .643 (Adjusted R Squared = .583)

note the identical SS, F, and p to the regression – and partial eta squared equals R²
readings:
• Howell – chapter 15
• Tabachnick & Fidell – chapter 5, esp. pp. 153–164 for a complete example
– also ch. 4, esp. pp. 66–80 for data screening issues
• good website: http://www2.chass.ncsu.edu/garson/PA765/regress.htm