Getting More out of Multiple Regression Darren Campbell, PhD.
Overview
View on teaching statistics: when to apply, how to use, and how to interpret
Multiple Regression Techniques
1. Centring: removing group-difference confounds
2. Centring: interpreting continuous interactions
3. Spline functions: piecemeal polynomials
Estimate separate slopes for each bend in the regression polynomial
Perks of Multiple Regression
1. Realistic: many influences on behaviour
2. Control over confounds
3. Tests of relative importance
4. Identifies interactions
Why Not Use ANOVAs?
Not realistic: many behaviours/constructs are continuous, e.g., intelligence, personality
Loss of statistical power: when scores are grouped into categories, all scores within a category are assumed to be the same (plus error), which mixes systematic patterns into the error term
What is Centring? A simple re-scaling of raw scores
Raw score minus some constant value
x1 - 5.1:
1 - 5.1 = -4.1
4 - 5.1 = -1.1
x2 - 29.4:
30 - 29.4 = 0.6
35 - 29.4 = 5.6
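Here is the same arithmetic as a tiny Python sketch. The function name `centre` is an illustrative choice, not something from the talk. Rounding to two decimals only hides floating-point noise.

```python
def centre(scores, constant):
    """Centring: subtract the same constant from every raw score."""
    # Rounding to 2 decimals only hides floating-point noise (e.g. 4 - 5.1).
    return [round(score - constant, 2) for score in scores]

print(centre([1, 4], 5.1))     # x1 scores centred on 5.1  -> [-4.1, -1.1]
print(centre([30, 35], 29.4))  # x2 scores centred on 29.4 -> [0.6, 5.6]
```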
A Simple Case for Centring
Babies: cry & fuss (parent-report diary measures); flail about (limb movement)
Are these 2 infant behaviours related? Emotional responses & emotion regulation
Age | Moves/Hr | Crying Hrs/Day
6 week-olds | 5.1 | 4.7
6 month-olds | 29.4 | 3.5
Full sample | 17.2 | 4.1
Are these 2 infant behaviours related?
6 Week-Olds
r = +.47
some infants cry more & move more
others cry less & move less
[Figure: scatterplot of 6 week-old infants; x-axis: Activity (limb movements); y-axis: Hours of Crying]
6 Month-Olds
r = +.38
some infants cry more & move more
others cry less & move less
What if we combine the two groups?
[Figure: scatterplot of 6 month-old infants; x-axis: Activity (limb movements); y-axis: Hours of Crying]
• Full sample r = -0.22
[Figure: scatterplot of 6 week-old & 6 month-old infants combined; x-axis: Activity (limb movements); y-axis: Hours of Crying]
• Do we get a significant correlation? If so, what kind?
What Happened with the Correlations?
6 week-olds: r = +.47
6 month-olds: r = +.38
6 week & 6 month-olds combined: r = -.22
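Correlations = Grand Mean Centring
1) Compute mean deviations (score minus grand mean) for each variable, X & Y
2) Scale each variable's mean deviations by its SD
3) Correlate: r = sum of the products of the paired scaled deviations / (n - 1) (ranking the scores first gives Spearman's rank-order correlation instead)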
The Disappearing Correlation Explained
Grand mean centring led to all the older infants being classified as high movers and all the young infants as low movers
Young infants who were high criers & high movers became high criers & low movers
Large group differences in movement masked the within-group r's
What Should We Do?
Solution: Create Group Mean Deviations
Re-scale raw scores: raw - group mean
6 week-olds: xs - 5.1
6 month-olds: xs - 29.4
Crying | Raw AL | Group Mean Subtracted | Group-Centred AL
5.7 | 1 | -5.11 | -4.11
6 | 4 | -5.11 | -1.11
2 | 5 | -5.11 | -0.11
0.5 | 30 | -29.4 | 0.63
2.5 | 35 | -29.4 | 5.63
2 | 34 | -29.4 | 4.63
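Here is a minimal, standard-library sketch of the group-mean re-scaling. It assumes the data arrive as parallel lists of scores and group labels, and `group_mean_centre` is a hypothetical helper. The helper computes each group's mean from whatever data it is given. So the toy call on the six table rows gives different numbers from the slide, which subtracts the full-sample means (5.11 and 29.4).

```python
from collections import defaultdict
import statistics

def group_mean_centre(values, groups):
    """Subtract from each score the mean of its own group (raw - group mean)."""
    members = defaultdict(list)
    for value, group in zip(values, groups):
        members[group].append(value)
    group_means = {g: statistics.fmean(vs) for g, vs in members.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]

# Toy call on the six rows shown in the table; the group means here come from
# these six rows only, not from the full samples the talk used.
raw_al = [1, 4, 5, 30, 35, 34]
age = ["6 weeks"] * 3 + ["6 months"] * 3
centred = group_mean_centre(raw_al, age)
print([round(c, 2) for c in centred])
```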
• Raw Scores
[Figure: scatterplot of 6 week-old & 6 month-old infants, raw scores; x-axis: Activity (limb movements); y-axis: Hours of Crying]
Group Centred Scores
Group-mean-centred data: r = .41 for the full sample
Multiple regression could also work on uncentred variables: Crying = Group + Uncentred AL
Not a Group x AL interaction: the relation is the same for both groups
[Figure: scatterplot of group-centred scores for 6 weeks old and 6 months old infants; x-axis: Limb Movements / 48 Hrs (group-centred); y-axis: Hours of Crying / 48 Hrs]
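The claim that regression could also work on uncentred variables can be checked with a hypothetical simulation. NumPy is assumed to be available, and all data values are made up apart from the group means. A model with a Group term plus uncentred AL gives exactly the same AL slope as group-mean-centred AL alone. The Group term absorbs the between-group differences that group centring removes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data, not the talk's: two age groups with very different
# movement levels and the same within-group movement-crying slope (0.2).
n = 100
group = np.repeat([0, 1], n)                    # 0 = 6 weeks, 1 = 6 months
true_mean = np.where(group == 0, 5.1, 29.4)     # group movement means from the slides
al = true_mean + rng.normal(0, 3, 2 * n)
crying = 4.7 - 1.2 * group + 0.2 * (al - true_mean) + rng.normal(0, 1, 2 * n)

# Model A: Crying = Group + uncentred AL
X_a = np.column_stack([np.ones(2 * n), group, al])
b_a, *_ = np.linalg.lstsq(X_a, crying, rcond=None)

# Model B: Crying = group-mean-centred AL only
sample_means = np.array([al[group == g].mean() for g in (0, 1)])
al_c = al - sample_means[group]
X_b = np.column_stack([np.ones(2 * n), al_c])
b_b, *_ = np.linalg.lstsq(X_b, crying, rcond=None)

print(f"AL slope, Group + uncentred AL: {b_a[2]:.4f}")
print(f"AL slope, group-centred AL:     {b_b[1]:.4f}")
```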
Centring So Far
1. Centring is magic
2. Different types of centring, depending on the number used to re-scale the data:
Grand mean: Pearson correlations
Group means: infant limb movements
Regression Interactions: Centring
Great for interpreting interactions, which are trickier in regression than in ANOVAs: interactions based on 2+ continuous variables do not have predefined levels or groups
Multiple Regression: the Basics
The basic equation: Y = a + b1*X1 + b2*X2 + b3*X3 + e
Outcome = Intercept + Beta1 * predictor1 + B2 * pred2 + B3 * pred3 + Error
a = expected mean response of Y when all predictors equal 0
betas: for every 1-unit change in X (holding the other predictors constant), you get a beta-sized change in Y
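This sketch fits the basic equation by ordinary least squares with NumPy's `np.linalg.lstsq`, using simulated, hypothetical data. It also shows why the intercept's meaning depends on where zero sits. Once the predictors are mean-centred, the slopes are unchanged and a becomes the mean of Y.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Hypothetical predictors and outcome (assumed values, not the talk's data).
x1 = rng.normal(26, 14, n)
x2 = rng.normal(25, 28, n)
y = 1000 - 4 * x1 + 20 * x2 + rng.normal(0, 50, n)

def fit(*predictors):
    """OLS of y on an intercept plus the given predictors; returns [a, b1, b2, ...]."""
    X = np.column_stack([np.ones(n), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

raw = fit(x1, x2)                               # a = predicted Y when X1 = X2 = 0
centred = fit(x1 - x1.mean(), x2 - x2.mean())   # a = predicted Y at the predictor means
print("raw:", raw.round(2))
print("centred:", centred.round(2), "mean of Y:", round(y.mean(), 2))
```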
Regression Interactions: Centring Reduces Multicollinearity
The interaction predictor = x1 * x2
x1 & x2 values near 0 stay near 0 in the product, while high x1 & x2 values produce really high products, so the interaction term is highly correlated with the original x1 & x2 variables
Centring gives each predictor, x1 & x2, more moderate values above and below zero (both positive and negative numbers)
This reduces the multiplicative exaggeration between x1 & x2 and the interaction product x1*x2
Centring to Reduce Multicollinearity
[Figure: X1 with X1*X2 multicollinearity, original variables; x-axis: x1; y-axis: x1*x2 product]
[Figure: X1 with X1*X2 multicollinearity, centred variables; x-axis: x1; y-axis: x1*x2 product]
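A simulation sketch illustrates the multicollinearity point. NumPy is assumed, and the predictors are hypothetical, independent normal variables whose means and SDs echo the descriptive table. The raw x1 correlates substantially with the raw product. The centred x1 is nearly uncorrelated with the centred product.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
# Hypothetical, independent predictors with means/SDs echoing the slides' X1 and X2.
x1 = rng.normal(26.2, 14.5, n)
x2 = rng.normal(24.8, 27.6, n)

def r(a, b):
    """Pearson correlation between two arrays."""
    return np.corrcoef(a, b)[0, 1]

x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
r_raw = r(x1, x1 * x2)
r_centred = r(x1c, x1c * x2c)
print(f"r(x1, x1*x2)    = {r_raw:.2f}")
print(f"r(x1c, x1c*x2c) = {r_centred:.2f}")
```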
Regression
Y = a + b1*X1 + b2*X2 + b3*X1*X2 + e
How does X2 relate to Y at different levels of X1?
How does predictor 2 (shyness) relate to the outcome (social interactions) at different stress levels (X1)?
Descriptives, Mean (SD):
Variable | Uncentred Data | Centred Data
X1 | 26.2 (14.5) | 0.0 (14.5)
X2 | 24.8 (27.6) | 0.0 (27.6)
Correlation Matrix, Uncentred Data:
 | x2 | x12 | y
x1 | 0.58** | 0.65** | 0.14**
x2 | | 0.96** | 0.28**
x12 | | | 0.34**
Correlation Matrix, Centred Data:
 | x2c | x12c | y
x1c | 0.58** | 0.11 | 0.14*
x2c | | 0.66** | 0.28**
x12c | | | 0.34**
** p < .01
* p < .05
Regression Equation Results, No Interaction:
Y = b0 + b1 * X1 + b2 * X2
Uncentred: Y = 1164.8 - 4 X1 + 20 X2 **
Centred: Y = 1550.8 - 4 X1 + 20 X2 **
Regression Equation Results, Interaction Term Included:
Y = b0 + b1 * X1 + b2 * X2 + b3 * X1*X2
Uncentred: Y = 1733 - 19.1 X1 - 31.7 X2 ** + 1.26 X1*X2
Centred: Y = 1260 + 12.0 X1 + 1.1 X2 + 1.26 X1*X2
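The uncentred and centred interaction equations are related by substitution. Substituting X1 = X1c + m1 and X2 = X2c + m2 into the uncentred equation and collecting terms gives the centred coefficients. The sketch plugs in the rounded values reported on the slides. It lands at roughly 1265, 12.1, 1.3, and 1.26, close to the reported 1260, 12.0, 1.1, and 1.26. The gaps are presumably due to rounding in the reported coefficients and means.

```python
# Reported (rounded) uncentred coefficients and predictor means from the slides.
b0, b1, b2, b3 = 1733, -19.1, -31.7, 1.26
m1, m2 = 26.2, 24.8  # means of X1 and X2

# Substitute X1 = X1c + m1 and X2 = X2c + m2, then collect terms.
b0_c = b0 + b1 * m1 + b2 * m2 + b3 * m1 * m2
b1_c = b1 + b3 * m2   # slope of X1 where X2 is at its mean
b2_c = b2 + b3 * m1   # slope of X2 where X1 is at its mean
b3_c = b3             # the interaction coefficient is unchanged
print(round(b0_c), round(b1_c, 1), round(b2_c, 1), b3_c)
```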
But what does it mean…
How does X2 relate to Y at different levels of X1?
How does predictor 2 (shyness) relate to the outcome (social interactions) at different stress levels (X1)?
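Post Hocs (Simple Slopes)
Y = b0 + b1 * X1 + b2 * X2 + b3 * X1*X2
can be regrouped as
Y = (b0 + b1 * X1) + (b2 + b3 * X1) * X2
so the slope of X2 at any chosen X1 value is b2 + b3 * X1
Re-centre X1 at -1 SD below and +1 SD above the X1 mean (SD = 14.547663):
-1 SD: X1 - (-14.547663) = X1 + 14.547663
+1 SD: X1 - 14.547663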
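The regrouped equation gives the simple slope of X2 directly. This small sketch uses the centred coefficients from the slides (b2 = 1.1, b3 = 1.26) and the SD of X1; the helper name is hypothetical. The results, about -17.2 and 19.4, line up closely with the X2 coefficients reported for the -1 SD and +1 SD equations (-17.1 and 19.4).

```python
# Centred-model coefficients reported on the slides (rounded) and the SD of X1.
b2, b3 = 1.1, 1.26
sd_x1 = 14.547663

def simple_slope_x2(x1_value):
    """Slope of Y on X2 when X1 (centred) is held at x1_value: b2 + b3 * X1."""
    return b2 + b3 * x1_value

for label, x1_value in [("-1 SD", -sd_x1), ("mean", 0.0), ("+1 SD", sd_x1)]:
    print(f"{label:>6}: slope of X2 = {simple_slope_x2(x1_value):.1f}")
```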
[Figure: AL Mean Centred; x-axis: Movement Hrs/Day; y-axis: Crying Hrs/Day]
[Figure: AL -1 SD Below Mean; same axes]
[Figure: AL +1 SD Above Mean; same axes]
Scatterplots: Moving the Y Axis
-1 SD below X1 mean: Y = 1085 + 12.0 X1 - 17.1 X2 + 1.26 X1*X2; t(196) = -1.40, p = .16
Centred: Y = 1260 + 12.0 X1 + 1.1 X2 + 1.26 X1*X2; t(196) = 0.12, p = .88
+1 SD above X1 mean: Y = 1435 + 12.0 X1 + 19.4 X2 ** + 1.26 X1*X2; t(196) = 3.66, p = .001
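Moving the Y axis amounts to re-centring X1 and refitting. The sketch uses simulated data generated from the slide's centred equation plus noise (a hypothetical set-up; NumPy assumed). It confirms that after shifting X1 by ±1 SD, the fitted X2 coefficient equals the simple slope b2 + b3*X1 from the centred fit. The b1 and b3 coefficients stay the same.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Hypothetical centred predictors with SDs echoing the slides (14.5 and 27.6);
# the generating coefficients borrow the slide's centred equation.
x1 = rng.normal(0, 14.5, n)
x1 -= x1.mean()
x2 = rng.normal(0, 27.6, n)
x2 -= x2.mean()
y = 1260 + 12.0 * x1 + 1.1 * x2 + 1.26 * x1 * x2 + rng.normal(0, 300, n)

def fit_interaction(x1_shifted):
    """OLS of y on 1, X1, X2 and X1*X2 with X1 re-centred; returns [b0, b1, b2, b3]."""
    X = np.column_stack([np.ones(n), x1_shifted, x2, x1_shifted * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

sd = x1.std(ddof=1)
at_mean = fit_interaction(x1)
at_minus_1sd = fit_interaction(x1 + sd)  # X1 = 0 now sits 1 SD below the mean
at_plus_1sd = fit_interaction(x1 - sd)   # X1 = 0 now sits 1 SD above the mean

# Moving the Y axis changes b0 and b2 but leaves b1 and b3 alone;
# the new b2 equals the simple slope b2 + b3 * X1 from the centred fit.
print(at_minus_1sd[2], at_mean[2] + at_mean[3] * -sd)
print(at_plus_1sd[2], at_mean[2] + at_mean[3] * sd)
```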
Regression Interaction Example
Predicting inhibitory ability (Simon Says-like games) from motor activity (physical movement) & age in 4- to 6-year-olds: Movement x Age interaction
F(1, 81) = 5.9, p < .02
Young (-1.5 SD): movement beta significant, positive relation with inhibition
Middle (mean): movement beta p = .10, a trend toward a relation with inhibition
Older (+1.5 SD): movement beta n.s. for inhibition
Polynomials, Centring, & Spline Functions
Polynomial relations: quadratic, cubic, etc
Y = a + b1*X1 - b2*X1*X1 + e
[Figure: example curvilinear (polynomial) relation]
Curvilinear pattern: the squared term (X1*X1) assumes a symmetric pattern
But the pattern may not be symmetric...
Perceived Control (Y) slowly increases & then declines rapidly in old age
[Figure: Perceived Control (Y) by age]
[Figure: curvilinear pattern from the previous slide]
This Brings Us to Spline Functions
Split up predictor X into 2+ variables: XLow & XHigh
[Figure: XLow & XHigh segments of the curvilinear pattern]
XLow = X - (-5), with values at and beyond the next change point set to zero
Ditto for XHigh
Re-run: Y = a + b1*XLow - b2*XHigh + e
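This is a sketch of one common way to code a two-piece linear spline. It is an assumption, since the talk's own coding, built around X - (-5), may differ. Both pieces are centred on the change point: XLow carries X minus the change point below it and is zero above, and XHigh is the reverse. The data are hypothetical and noiseless: a slow rise, then a steep decline after a change point at 10. With no noise, ordinary least squares recovers the two slopes exactly. NumPy is assumed.

```python
import numpy as np

def spline_basis(x, knot):
    """Two-piece linear spline coding, both pieces centred on the change point (knot).

    x_low  = x - knot below the knot, 0 at or above it
    x_high = x - knot above the knot, 0 at or below it
    """
    x = np.asarray(x, dtype=float)
    x_low = np.where(x < knot, x - knot, 0.0)
    x_high = np.where(x > knot, x - knot, 0.0)
    return x_low, x_high

# Hypothetical outcome: slow rise up to the change point, then a steep decline.
x = np.linspace(0, 15, 61)
knot = 10.0
y = 300 + np.where(x < knot, 8 * (x - knot), -60 * (x - knot))

x_low, x_high = spline_basis(x, knot)
X = np.column_stack([np.ones_like(x), x_low, x_high])
(a, b_low, b_high), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"a (fitted Y at change point) = {a:.1f}, slope below = {b_low:.1f}, slope above = {b_high:.1f}")
```

Because both pieces are centred on the change point, a is the fitted Y at that point. Each slope gets its own estimate and standard error, which is what allows one segment to be significant and another not.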
Perks of Spline Functions
Estimate the slope anywhere along the range
A slope can be significant on one part of the range and n.s. on another
Slopes can be steeper or shallower in different parts of the range
Multiple Regression Techniques
1. Centring: removing group-difference confounds
2. Centring: interpreting continuous interactions
3. Spline functions: a more precise understanding of polynomial patterns
Questions
• Alpha control procedures for spline functions: could it be argued that you are just describing a pattern already identified?
– Conservatively, you could apply an alpha control procedure. I like the False Discovery Rate procedures.
– Replication is preferred, but not always possible.
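The talk does not say which False Discovery Rate procedure it uses. The Benjamini-Hochberg step-up procedure is the best-known one, and it is sketched here in plain Python. The p-values in the usage line are hypothetical tests of three spline-segment slopes.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q.

    Returns a list of booleans, True where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k (1-based) whose p-value is <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values for three spline-segment slopes.
print(benjamini_hochberg([0.001, 0.02, 0.16]))
```

Here Benjamini-Hochberg rejects the first two hypotheses. A Bonferroni cut-off of .05/3 (about .0167) would keep only the first.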
Alpha Control Aside
• The source of Type 1 errors is typically poorly described.
• Typical explanation: if enough probability tests are run, the probability will increase to the point where something becomes significant just by chance.
– But probability is linked to the representativeness of your data, and Type 1 error is a proxy for how likely it is that your data are representative.
• My view: the real source of Type 1 errors is that if you divide up the data into enough subgroupings, eventually one of those subgroupings will differ because it is unrepresentative of reality.
Standardized vs Centred
• Centred: x - xM
• Standardized: (x - xM) / SDx
– Makes the variability (SD) of each predictor = 1
– Standardized beta = raw b * SDx / SDy
– Similar to centring, but the different metric needs to be adjusted for interaction terms
• To get comparable results with an interaction term:
– Standardize X1 and X2 before forming the X1*X2 product, then use the "raw" (unstandardized) coefficients
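This sketch checks both bullet points on simulated, hypothetical data (NumPy assumed). First, the standardized beta from a main-effects model equals raw b × SDx / SDy. Second, when X1 and X2 are standardized before the product is formed, the interaction's raw coefficient is the unstandardized b3 rescaled by SD1 × SD2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
# Hypothetical predictors and outcome; generating values echo the slides' uncentred fit.
x1 = rng.normal(26.2, 14.5, n)
x2 = rng.normal(24.8, 27.6, n)
y = 1733 - 19.1 * x1 - 31.7 * x2 + 1.26 * x1 * x2 + rng.normal(0, 300, n)

def ols(outcome, *cols):
    """OLS of outcome on an intercept plus cols; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(n), *cols])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

s1, s2, sy = x1.std(ddof=1), x2.std(ddof=1), y.std(ddof=1)
z1 = (x1 - x1.mean()) / s1
z2 = (x2 - x2.mean()) / s2
zy = (y - y.mean()) / sy

# Main effects only: standardized beta = raw b * SDx / SDy
raw_main = ols(y, x1, x2)
std_main = ols(zy, z1, z2)
print(std_main[1], raw_main[1] * s1 / sy)

# Interaction: standardize X1 and X2 first, then form the product; keep Y raw.
raw_int = ols(y, x1, x2, x1 * x2)
std_int = ols(y, z1, z2, z1 * z2)
print(std_int[3], raw_int[3] * s1 * s2)  # interaction b rescaled by SD1 * SD2
```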
Centring and Spline Functions
Relatively simple procedures
Old dogs in the statistics world, but new tricks for many
That’s All Folks!