Regression Analyses
• Multiple IVs
• Single DV (continuous)
• Generalization of simple linear regression
• Y’ = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk, where k is the number of predictors
• Find the solution where Σ(Y − Y’)² is minimized
• Do not confuse the size of the bs with importance for prediction
• Can standardize to get betas, which can help determine relative importance
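The least-squares solution above can be sketched in plain Python. This is a minimal illustration with made-up data; the `solve` and `ols` helpers are our own, not from any statistics package:

```python
# Minimal sketch of the least-squares fit Y' = b0 + b1X1 + ... + bkXk,
# obtained by solving the normal equations (X'X)b = X'y. Data are made up.

def solve(A, v):
    """Gaussian elimination with partial pivoting for A x = v."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Weights [b0, b1, ..., bk] minimizing Sum(Y - Y')^2."""
    Xd = [[1.0] + list(row) for row in X]   # prepend intercept column
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(XtX, Xty)

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]   # two predictors
y = [3.1, 3.9, 7.2, 7.8, 11.0]                 # outcome
print(ols(X, y))                               # [b0, b1, b2]
```

If the outcome is exactly linear in the predictors, the recovered weights match the generating coefficients; with noisy data they are the unique minimizers of the squared errors.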
Why use Multiple Regression?
• Prediction – allows prediction of change in the D.V. resulting from changes in the multiple I.V.s
• Explanation – enables explanation of the variate by assessing the relative contribution of each I.V. to the regression equation
• More efficient than multiple simple regression equations – allows consideration of overlapping variance in the I.V.s

When do you use Multiple Regression?
• When theoretical or conceptual justification exists for predicting or explaining the D.V. with the set of I.V.s
• D.V. is metric/continuous – if not, use logistic regression or discriminant analysis
[Venn diagram: overlapping variance of Y, X1, and X2; regions a, b, and c mark variance in Y shared with the predictors, and e marks the residual variance in Y.]
Multiple Regression Assumptions
• DV is continuous and interval or ratio in scale
• Assumes multivariate normality for random IVs
• Assumes normal distributions and homogeneity of variance for each level of X for fixed IVs
• No error of measurement
• Correctly specified model
• Errors not correlated
• Expected mean of residuals is 0
• Homoscedasticity (error variance equal at all levels of X)
• Errors are independent/no autocorrelation (error for one score not correlated with error for another score)
• Residuals normally distributed
Multiple regression represents the construction of a weighted linear combination of variables:

    ŷᵢ = A + B1·X1,i + B2·X2,i + ... + Bk·Xk,i

The weights are derived to:

(a) Minimize the sum of the squared errors of prediction: Σᵢ (yᵢ − ŷᵢ)²

(b) Maximize the squared correlation (R²) between the original outcome variable and the predicted outcome variable based on the linear combination.

[Scatterplot of Y on X with fitted line ŷ = A + BX; the vertical distances y − ŷ are the errors of prediction.]
Multiple R
• R is like r except it involves multiple predictors, and R cannot be negative
• R is the correlation between Y and Y’, where Y’ = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
• R² tells us the proportion of variance accounted for (coefficient of determination)
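A quick numeric sketch of these quantities, using illustrative values. When Y’ comes from a least-squares fit with an intercept, R² equals the squared correlation between Y and Y’:

```python
# Sketch: R^2 = 1 - SS_res/SS_tot, and (for a least-squares fit with an
# intercept) also the squared correlation between Y and Y'. Values made up.
from math import sqrt

def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

y    = [2.0, 4.0, 5.0, 4.0, 5.0]    # observed
yhat = [2.8, 3.4, 4.0, 4.6, 5.2]    # predicted by a least-squares fit
print(r_squared(y, yhat))           # ~0.6
print(corr(y, yhat) ** 2)           # ~0.6 as well
```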
An example . . .
Y = Number of job interviews
X1 = GRE score
X2 = Years to complete Ph.D.
X3 = Number of publications
N = 500
[Histograms of the four variables (N = 500 each):
Interviews: Mean = 8.6, Std. Dev = 2.88
GRE: Mean = 1296.8, Std. Dev = 103.72
Publications: Mean = 4.3, Std. Dev = 2.31
Years to Complete Degree: Mean = 6.1, Std. Dev = 2.05]
Descriptive Statistics

                             Mean   Std. Deviation    N
Interviews                   8.60      2.883        500
GRE                       1296.82    103.724        500
Publications                 4.30      2.309        500
Years to Complete Degree     6.09      2.055        500
Correlations (Pearson, N = 500)

                             Interviews    GRE    Publications   Years to Complete Degree
Interviews                      1.000      .219       .677              -.375
GRE                              .219     1.000       .309              -.091
Publications                     .677      .309      1.000              -.286
Years to Complete Degree        -.375     -.091      -.286             1.000

Sig. (1-tailed) = .000 for every correlation except GRE with Years to Complete Degree (.020).
Predicting Interviews

[Venn diagram: variance in Interviews overlapping with variance in GRE, Publications, and Time to Graduate; regions a through e mark shared variance, and f marks the residual variance.]
Regression with SPSS

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT interviews
  /METHOD=ENTER years to complete gre pubs
  /SCATTERPLOT=(*ZPRED ,*ZRESID) .

From Analyze Menu
• Choose Regression
• Choose Linear
Model Summary(b)

Model 1:  R = .703(a)   R Square = .494   Adjusted R Square = .491
          Std. Error of the Estimate = 2.056
          Change Statistics: R Square Change = .494, F Change = 161.558,
          df1 = 3, df2 = 496, Sig. F Change = .000
          Durbin-Watson = 1.994

a. Predictors: (Constant), Years to Complete Degree, GRE, Publications
b. Dependent Variable: Interviews
The error that is minimized in the derivation of the regression weights appears here as the standard error of the estimate: the standard deviation of the errors of prediction.
The variance that is maximized in the derivation of the regression weights appears as R Square.
ANOVA(b)

Model 1        Sum of Squares    df    Mean Square      F       Sig.
Regression        2049.250        3      683.083     161.558   .000(a)
Residual          2097.142      496        4.228
Total             4146.392      499

a. Predictors: (Constant), Years to Complete Degree, GRE, Publications
b. Dependent Variable: Interviews
The error that is minimized in the derivation of the regression weights appears here as the residual mean square: the variance of the errors of prediction.
Coefficients(a)

Model 1                       B       Std. Error   Beta      t       Sig.   95% CI for B, Lower Bound
(Constant)                  6.576       1.219               5.395    .000          4.181
GRE                         3.029E-04    .001       .011     .325    .746          -.002
Publications                 .771        .044       .617   17.685    .000           .685
Years to Complete Degree    -.277        .047      -.198   -5.926    .000          -.369

a. Dependent Variable: Interviews
The weight b is unstandardized; the weight is called β (beta) if the variables are standardized.
Significance of Beta weights.
Output from SPSS
Multicollinearity
• Addition of many predictors increases the likelihood of multicollinearity problems
• Using multiple indicators of the same construct without combining them in some fashion will definitely create multicollinearity problems
• Wreaks havoc with analysis, e.g., significant overall R2 but no individual variables in the equation significant
• Can mask or hide variables that have large and meaningful impacts on the DV
Multicollinearity

Multicollinearity reflects redundancy in the predictor variables: the predictors are highly correlated with one another. When severe, the standard errors for the regression coefficients are inflated, and the individual influence of each predictor is harder to detect with confidence.
Coefficient Correlations(a)

Model 1, Correlations:
                           Years to Complete Degree      GRE       Publications
Years to Complete Degree           1.000                 .003          .272
GRE                                 .003                1.000         -.296
Publications                        .272                -.296         1.000

Model 1, Covariances (var(b) on the diagonal):
                           Years to Complete Degree      GRE       Publications
Years to Complete Degree        2.186E-03             1.485E-07     5.549E-04
GRE                             1.485E-07             8.705E-07    -1.204E-05
Publications                    5.549E-04             -1.20E-05     1.898E-03

a. Dependent Variable: Interviews
Coefficients(a) (continued)

Model 1                    Sig.   95% CI Lower   95% CI Upper   Zero-order   Partial   Part   Tolerance    VIF
(Constant)                 .000       4.181          8.971
GRE                        .746       -.002           .002          .219       .015     .010     .905     1.105
Publications               .000        .685           .856          .677       .622     .565     .838     1.194
Years to Complete Degree   .000       -.369          -.185         -.375      -.257    -.189     .918     1.089

a. Dependent Variable: Interviews
Coefficients(a)

Model 1                    B (Unstandardized)
(Constant)                    6.576
GRE                           3.029E-04
Publications                   .771
Years to Complete Degree      -.277

a. Dependent Variable: Interviews
The tolerance for a predictor is the proportion of variance that it does not share with the other predictors. The variance inflation factor (VIF) is the inverse of the tolerance.
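In the three-predictor case these quantities can be checked by hand. The sketch below uses the pairwise correlations from the interviews example, r(GRE, Publications) = .309, r(GRE, Years) = -.091, r(Publications, Years) = -.286, and comes close to the GRE row of the collinearity output (small differences come from rounding the correlations):

```python
# Sketch: tolerance and VIF by hand for the three-predictor case, using
# the pairwise correlations reported in the example output.

def tolerance_vif(r12, r13, r23):
    """Tolerance/VIF for predictor 1, given its correlations with
    predictors 2 and 3 (r12, r13) and their intercorrelation (r23)."""
    r2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    tol = 1 - r2          # proportion of variance NOT shared with the others
    return tol, 1 / tol   # VIF is the inverse of the tolerance

tol, vif = tolerance_vif(0.309, -0.091, -0.286)   # GRE vs. the other two
print(tol, vif)   # close to the reported .905 and 1.105
```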
Multicollinearity Remedies:
(1) Combine variables using factor analysis
(2) Use block entry
(3) Model specification (omit variables)
(4) Don’t worry about it as long as the program will allow it to run (you don’t have singularity, or perfect correlation)
Incremental R2
• Changes in R2 that occur when adding IVs
• Indicates the proportion of variance in prediction that is provided by adding Z to the equation
• It is what Z adds in prediction after controlling for the X already in the equation
• Total variance in Y can be broken up in different ways, depending on order of entry (which IVs are controlled first)
• If you have multiple IVs, change in R2 is strongly determined by intercorrelations and order of entry into the equation
• The later the point of entry, the less R2 is available to predict
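Using the correlations from the interviews example (Interviews with GRE = .219, Interviews with Publications = .677, GRE with Publications = .309), a quick sketch shows how strongly the R2 change depends on entry order:

```python
# Sketch: with two predictors, the R^2 change for a variable depends on
# whether it is entered first or second. Correlations from the example.

def r2_two(ry1, ry2, r12):
    """R^2 for Y regressed on two predictors, computed from correlations."""
    return (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)

r_y_gre, r_y_pub, r_gre_pub = 0.219, 0.677, 0.309
full = r2_two(r_y_gre, r_y_pub, r_gre_pub)
print(full - r_y_gre**2)   # R^2 added by Publications after GRE (~.41)
print(full - r_y_pub**2)   # R^2 added by GRE after Publications (~.0001)
```

Publications adds a great deal after GRE, while GRE adds almost nothing after Publications: the later point of entry, the less R2 is left to claim.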
Other Issues in Regression
• Suppressors (one IV correlated with the other IV but not with the DV; switches in sign)
• Empirical cross-validation
• Estimated cross-validation
• Dichotomization, trichotomization, median splits
  – Dichotomizing one variable reduces the maximum r to .798
  – Cost of dichotomizing is a loss of 1/5 to 2/3 of real variance
  – Dichotomizing on more than one variable can increase Type I error and yet can reduce power as well!
Significance of Overall R2
• Tests: a + b + c + d + e + f against area g (error)
• Get this from a simultaneous regression or from last step of block or hierarchical entry.
• Other approaches may or may not give you an appropriate test of overall R2, depending upon whether all variables are kept or some omitted.
[Venn diagram: Y overlapping with predictors X, W, and Z; areas a through f mark variance in Y shared with the predictors, and area g is error variance.]
Significance of Incremental R2
Change in R2 tests: a + b + c against area d + e + f + g
At this step, the t test for the b weight of X is the same as the square root of the F test if you only enter one variable. It is a test of whether or not the area of a + b + c is significant as compared to area d + e + f + g.
Step 1: Enter X
[Same Venn diagram of Y with X, W, and Z; only X has been entered at this step.]
Significance of Incremental R2
Change in R2 tests: d + e against area f + g
At this step, the t test for the b weight of X is a test of area a against area f + g and the t test for the b weight of W is a test of area d + e against area f + g.
Step 2: Enter W

[Same Venn diagram of Y with X, W, and Z; X and W have been entered.]
Significance of Incremental R2
Step 3: Enter Z

Change in R2 tests: f against area g

[Same Venn diagram of Y with X, W, and Z; all three predictors have been entered.]

At this step, the t test for the b weight of X is a test of area a against area g, the t test for the b weight of W is a test of area e against area g, and the t test for the b weight of Z is a test of area f against area g. These are the significance tests for the IV effects from a simultaneous regression analysis. No IV gets “credit” for areas b, c, d in a simultaneous analysis.
Hierarchical Regression: Significance of Incremental R2

[Same Venn diagram of Y with X, W, and Z.]

Enter variables in hierarchical fashion to determine R2 for each effect. Test each effect against error variance after all variables have been entered. Assume we entered X, then W, then Z in hierarchical fashion.
Tests for X: areas a + b + c against g
Tests for W: areas d + e against g
Tests for Z: area f against g
Significance test for b or Beta

[Same Venn diagram of Y with X, W, and Z.]

In the final equation, when we look at the t tests for our b weights, we are looking at the following tests:
Tests for X: only area a against g
Tests for W: only area e against g
Tests for Z: only area f against g
That’s why incremental or effect R2 tests are more powerful.
Methods of building regression equations
• Simultaneous: all variables entered at once
• Backward elimination (stepwise): starts with the full equation and eliminates IVs on the basis of significance tests
• Forward selection (stepwise): starts with no variables and adds on the basis of increment in R2
• Hierarchical: researcher determines order and enters each IV
• Block entry: researcher determines order and enters multiple IVs in single blocks
Simultaneous

[Venn diagram: Y overlapping with predictors X, Z, and W; labeled areas a through i mark unique, shared, and residual variance.]

Simultaneous: all variables entered at once
• Significance tests and R2 based on unique variance
• No variable “gets credit” for area g
• Variables with intercorrelations have less unique variance
• Variables X and Z together predict more than W, yet W might be significant while X and Z are not
• Betas are partialled, so the beta for W is larger than for X or Z
Backward Elimination

[Same Venn diagram of Y with X, Z, and W.]

• Starts with the full equation and eliminates IVs
• Gets rid of the least significant variable (probably X), then tests the remaining variables to see if they are significant
• Keeps all remaining significant variables
• Capitalizes on chance
• Low cross-validation
Forward Selection

[Same Venn diagram of Y with X, Z, and W.]

• Starts with no variables and adds IVs
• Adds the variable with the most unique R2, or the next most significant variable (probably W, because it gets credit for area i)
• Quits when additional variables are not significant
• Capitalizes on chance
• Low cross-validation
Hierarchical (Forced Entry)

[Same Venn diagram of Y with X, Z, and W.]

• Researcher determines order of entry for IVs
• Order based on theory, timing, or need for statistical control
• Less capitalization on chance
• Generally higher cross-validation
• Final model based on IVs of theoretical importance
• Order of entry determines which IV gets credit for area g
Order of Entry
• Determining order of entry is crucial
• Stepwise capitalizes on chance and reduces cross-validation and the stability of your prediction equation
  – Only useful to maximize prediction in a given sample
  – Can lose important variables
• Use the following to determine order:
  – Logic
  – Theory
  – Order of manipulations/treatments
  – Timing of measures
• Usefulness of the regression model is reduced as k (the number of IVs) approaches N (sample size)
  – Best to have at least a 15-to-1 ratio of cases to IVs, or more
Interpreting b or β
• B (or b) is the raw regression weight; β (beta) is standardized (scale invariant)
• At a given step, the size of b or β is influenced by order of entry into the regression equation
  – Should be interpreted at the entry step
• Once all variables are in the equation, the bs and βs will always be the same regardless of the order of entry
• Difficult to interpret b or β for main effects when an interaction is in the equation
Regression: Categorical IVs
• We can code groups and use regression to analyze the data (e.g., 1 and 2 to represent females and males)
• Overall R2 and significance tests for the full equation will not change regardless of how we code (as long as the coding is not redundant)
• Interpretation of the intercept (a) and slope (b or beta weights) WILL change depending on coding
• We can use coding to capture effects of categorical variables
Regression: Categorical IVs
• Total # of codes needed is always # of groups − 1
• Dummy coding
  – One group is assigned all 0s; b weights indicate the mean difference of each group coded 1 compared to the group coded 0
• Effect coding
  – One group is assigned all −1s; b weights indicate the mean difference of each group coded 1 from the grand mean
• All forms of coding give you the same overall R2 and significance tests for total R2
• Difference is in interpretation of the b weights
Dummy Coding

           White   Black   Asian   Hispanic
Dummy 1      0       1       0       0
Dummy 2      0       0       1       0
Dummy 3      0       0       0       1

• # of dummy codes = # of groups − 1
• Group that receives all zeros is the reference group
• Beta = comparison of the reference group to the group represented by 1
• Intercept in the regression equation is the mean of the reference group
Effect Coding

           White   Black   Asian   Hispanic
Effect 1    -1       1       0       0
Effect 2    -1       0       1       0
Effect 3    -1       0       0       1

• # of contrast codes = # of groups − 1
• Group that received all zeros in dummy coding now gets all −1s
• Beta = comparison of the group represented by 1 to the grand mean
• Intercept in the regression equation is the grand mean
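The two interpretations can be verified on made-up scores for a three-group factor. The `solve`/`ols` helpers below are our own normal-equations fit, not a library routine:

```python
# Sketch: dummy vs. effect coding for one three-group factor, two made-up
# scores per group. Expect: dummy intercept = reference-group mean, weights
# = group minus reference; effect intercept = grand mean (equal ns), weights
# = group minus grand mean.

def solve(A, v):
    """Gaussian elimination with partial pivoting for A x = v."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares weights [intercept, b1, ..., bk]."""
    Xd = [[1.0] + list(row) for row in X]
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(XtX, Xty)

y = [2, 4, 6, 8, 10, 12]       # group means: 3 (reference), 7, 11; grand mean 7
dummy  = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
effect = [[-1, -1], [-1, -1], [1, 0], [1, 0], [0, 1], [0, 1]]
print(ols(dummy, y))    # ~[3.0, 4.0, 8.0]: reference mean, group differences
print(ols(effect, y))   # ~[7.0, 0.0, 4.0]: grand mean, deviations from it
```

Both codings reproduce the same fitted values (the group means), so R2 and the overall test are identical; only the meaning of the intercept and weights changes.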
Regression with Categorical IVs vs. ANOVA
• Provides the same results as t tests or ANOVA
• Provides additional information
  – Regression equation (line of best fit)
  – Useful for future prediction
  – Effect size (R2)
  – Adjusted R2
Regression with Categorical Variables - Syntax

Step 1. Create k − 1 dummy variables
Step 2. Run the regression analysis with the dummy variables as predictors

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT fiw
  /METHOD=ENTER msdum1 msdum2 msdum3 msdum4 msdum5 .
Regression with Categorical Variables - Output

ANOVA(b)

Model 1        Sum of Squares    df    Mean Square     F      Sig.
Regression          3.562         5       .712       1.562   .170(a)
Residual          166.030       364       .456
Total             169.592       369

a. Predictors: (Constant), msdum5, msdum4, msdum2, msdum3, msdum1
b. Dependent Variable: fiw
Coefficients(a)

Model 1        B      Std. Error   Beta     t      Sig.
(Constant)   2.134       .049             43.669   .000
msdum1        .084       .078      .059    1.081   .280
msdum2       -.420       .260     -.084   -1.615   .107
msdum3       -.102       .162     -.033    -.631   .529
msdum4       -.634       .480     -.069   -1.321   .187
msdum5       -.127       .137     -.050    -.928   .354

a. Dependent Variable: fiw
Adjusted R2
• There may be “overfitting” of the model, and R2 may be inflated
• Model may not cross-validate (shrinkage)
• More shrinkage with small samples (< 10-15 observations per IV)
Model Summary

Model    R     R Square   Adjusted R Square   Std. Error of the Estimate
1      .145a     .021           .008                  .67537

a. Predictors: (Constant), msdum5, msdum4, msdum2, msdum3, msdum1
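The adjusted value can be reproduced with the usual shrinkage formula, 1 − (1 − R²)(N − 1)/(N − k − 1). A quick check using R² = .021, k = 5 dummies, and N = 370 (from the ANOVA's total df of 369):

```python
# Sketch: adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1),
# checked against the SPSS output above (R^2 = .021, N = 370, k = 5).

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.021, 370, 5), 3))   # close to the reported .008
```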
Example: Hierarchical Regression

Example: number of children, hours in family work, and sex as predictors of family interfering with work (fiw).

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT fiw
  /METHOD=ENTER numkids
  /METHOD=ENTER hrsfamil
  /METHOD=ENTER sex .
Hierarchical Regression Output

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       numkids(a)                 .            Enter
2       hrsfamil(a)                .            Enter
3       sex(a)                     .            Enter

a. All requested variables entered.
b. Dependent Variable: fiw
Model Summary

Model    R     R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1      .067a     .004           .001                  .66568                    .004           1.450      1    324       .229
2      .067b     .004          -.002                  .66669                    .000            .014      1    323       .907
3      .222c     .049           .040                  .65254                    .045          15.169      1    322       .000

a. Predictors: (Constant), numkids
b. Predictors: (Constant), numkids, hrsfamil
c. Predictors: (Constant), numkids, hrsfamil, sex
Hierarchical Regression Output

Coefficients(a)

Model                 B      Std. Error   Beta     t      Sig.
1  (Constant)       2.025       .098             20.673   .000
   numkids           .028       .024      .067    1.204   .229
2  (Constant)       2.034       .126             16.088   .000
   numkids           .028       .025      .065    1.112   .267
   hrsfamil          .000       .002     -.007    -.117   .907
3  (Constant)       2.433       .161             15.144   .000
   numkids           .038       .024      .088    1.541   .124
   hrsfamil          .000       .002     -.002    -.036   .971
   sex              -.285       .073     -.213   -3.895   .000

a. Dependent Variable: fiw
Simultaneous Regression Output

Coefficients(a)

Model 1        B      Std. Error   Beta     t      Sig.
(Constant)   2.433       .161             15.144   .000
numkids       .038       .024      .088    1.541   .124
hrsfamil      .000       .002     -.002    -.036   .971
sex          -.285       .073     -.213   -3.895   .000

a. Dependent Variable: fiw