Getting More out of Multiple Regression Darren Campbell, PhD.


Overview

View on Teaching Statistics: When to Apply, How to Use & How to Interpret

Multiple Regression Techniques

1. Centring: removing group-difference confounds

2. Centring: interpreting continuous interactions

3. Spline functions – Piecemeal Polynomials

Estimate separate slopes for each angle of the regression polynomial

Perks of Multiple Regression

1. Realistic: many influences on behaviour

2. Control over confounds

3. Test for relative importance

4. Identify interactions

Why Not Use ANOVAs?

Not realistic: many behaviours / constructs are continuous, e.g., intelligence, personality

Loss of statistical power: within a category, scores are assumed to be the same + error, mixing systematic patterns into the error term

What is Centring?

Simple re-scaling of raw scores: Raw Score minus Some Constant value

x1 – 5.1

1 – 5.1 = -4.1

4 – 5.1 = -1.1

x2 – 29.4

30 – 29.4 = 0.6

35 – 29.4 = 5.6
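The re-scaling above is just one constant subtracted from every raw score. A minimal Python sketch, using the slide's constants 5.1 and 29.4; the function name `centre` is an illustrative assumption:

```python
def centre(scores, constant):
    """Re-scale raw scores by subtracting a constant (e.g., a mean)."""
    # Rounded to 2 decimals only to hide floating-point noise.
    return [round(x - constant, 2) for x in scores]

# Constants 5.1 and 29.4 are the ones shown on the slide.
x1_centred = centre([1, 4], 5.1)
x2_centred = centre([30, 35], 29.4)
print(x1_centred, x2_centred)
```

Which constant is subtracted (grand mean, group mean, or a value such as 1 SD above the mean) is what distinguishes the types of centring used in the rest of the talk.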

A Simple Case for Centring

Babies: Cry & Fuss – parent report diary measures

Flail about – limb movement

Are these 2 infant behaviours related? Emotional Responses & Emotion Regulation

A Simple Case for Centring

Age             Moves / Hr    Crying Hrs/Day

6 week olds     5.1           4.7

6 month olds    29.4          3.5

Full Sample     17.2          4.1

Are these 2 infant behaviours related?

6 Week-Olds

r = +.47

some infants cry more & move more

others cry less & move less

[Figure: scatterplot, 6 week-old infants; x-axis: Activity - limb movements; y-axis: Hours of Crying]

6 Month-Olds

r = +.38

some infants cry more & move more

others cry less & move less

What if we combine the two groups?

[Figure: scatterplot, 6 month-old infants; y-axis: Hours of Crying]

• Full sample r = -0.22

[Figure: scatterplot, 6 week-olds & 6 month-old infants combined; y-axis: Hours of Crying]

• Do we get a significant corr? If so, what kind?

What happened with the Correlations?

6 Week-olds: r = +.47

6 Month-Olds: r = +.38

6 Week & 6 Month-olds: r = -0.22

Correlations = Grand Mean Centring

1) Mean Deviations for each variable: X & Y

2) Rank Order Mean Deviations

3) Correlate 2 rank orders of X & Y
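A minimal sketch of why a Pearson correlation amounts to grand mean centring: every score is first turned into a deviation from the mean of the whole sample, and r is built from those deviations. The scores below are hypothetical and the helper name `pearson_r` is an assumption. Strictly, Pearson's r uses the sizes of the deviations rather than literal ranks, so "rank order" is best read loosely here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations around each variable's grand mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]   # grand-mean deviations for X
    dev_y = [yi - mean_y for yi in y]   # grand-mean deviations for Y
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return cross / sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))

# Hypothetical scores, not the study data.
moves = [2, 4, 6, 8]
crying = [1, 3, 2, 4]
r = pearson_r(moves, crying)
print(round(r, 2))
```

Because only deviations from the full-sample mean enter the formula, a large mean difference between two subgroups becomes part of every infant's deviation score.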

The Disappearing Correlation Explained

Grand mean centring led to all the older infants being classified as high movers and all the young infants as low movers

Young infants who were high criers & high movers became high criers & low movers

Large group differences in movement altered the detection of the within-group r's

What should we do?

Solution: Create Group Mean Deviations

Re-scale raw scores: Raw – Group Mean

6 week-olds: xs – 5.1

6 month-olds: xs – 29.4

Solution: Create Group Mean Deviations

Crying    Raw AL    Group Means    Group Centred AL

5.7       1         -5.11          -4.11

6         4         -5.11          -1.11

2         5         -5.11          -0.11

0.5       30        -29.4          0.63

2.5       35        -29.4          5.63

2         34        -29.4          4.63
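The table's last column can be reproduced by subtracting each infant's own group mean from the raw activity level (AL). A minimal sketch; the group labels are hypothetical, and the 6-month mean of 29.37 is an assumption inferred from the centred values (the slide shows it rounded to 29.4):

```python
# Group means: 5.11 is printed in the table; 29.37 is inferred from the
# "Group Centred AL" column (assumption), shown rounded as 29.4 on the slide.
group_means = {"6wk": 5.11, "6mo": 29.37}

# (group, raw activity level) pairs from the table.
rows = [("6wk", 1), ("6wk", 4), ("6wk", 5),
        ("6mo", 30), ("6mo", 35), ("6mo", 34)]

# Subtract each infant's own group mean rather than the grand mean.
group_centred = [round(al - group_means[group], 2) for group, al in rows]
print(group_centred)
```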

• Raw Scores

[Figure: scatterplot of raw scores, 6 week-olds & 6 month-old infants; y-axis: Hours of Crying]

Group Centred Scores

Group mean centred data: r = .41 for the full sample

Multiple Regression could also work on uncentred variables: Crying = Group + Uncentred AL

Not a Group x AL interaction – the relation is the same for both groups

[Figure: scatterplot of group centred scores; x-axis: Limb Movements / 48 Hrs; y-axis: Hours of Crying / 48 Hrs; legend: 6 Weeks Old, 6 Months Old]
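A minimal sketch of both routes, using hypothetical noise-free data rather than the infant data: group mean centring recovers the positive within-group relation, and a regression of the form Crying = Group + Uncentred AL yields the same within-group slope. The group coding (0/1) and all numbers are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: within each age group crying rises 0.3 hrs per extra
# limb movement, but the older group moves far more and cries less overall.
x_young = np.arange(1.0, 10.0)            # 1..9, mean 5
x_old = np.arange(25.0, 34.0)             # 25..33, mean 29
al = np.concatenate([x_young, x_old])
group = np.concatenate([np.zeros(9), np.ones(9)])   # 0 = 6 weeks, 1 = 6 months
crying = np.where(group == 0, 4.0 + 0.3 * (al - 5.0), 3.0 + 0.3 * (al - 29.0))

# Raw correlation: the group difference is mixed into the estimate.
r_raw = np.corrcoef(al, crying)[0, 1]

# Group mean centring removes the group difference in activity level.
al_centred = np.where(group == 0, al - x_young.mean(), al - x_old.mean())
r_centred = np.corrcoef(al_centred, crying)[0, 1]

# Crying = Group + Uncentred AL: the AL coefficient is the within-group slope.
design = np.column_stack([np.ones_like(al), group, al])
coefs, *_ = np.linalg.lstsq(design, crying, rcond=None)
al_slope = coefs[2]

print(round(r_raw, 2), round(r_centred, 2), round(al_slope, 2))
```

In this constructed example the raw correlation is negative while the group centred correlation is positive, which mirrors the sign flip reported for the infant data.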

Centring so far

1. Centring is Magic

2. Different types of centring, depending on the number used to re-scale the data

Grand mean – Pearson Correlations

Group Means – Infant Limb Movements

Regression Interactions Centring

Great for interpreting interactions

Trickier than for ANOVAs: regression interactions do not have pre-defined levels or groups

Based on 2+ continuous vars

Multiple Regression - the Basics

The Basic Equation: Y = a + b1*X1 + b2*X2 + b3*X3 + e

Outcome = Intercept + Beta1 * predictor1 + Beta2 * predictor2 + Beta3 * predictor3 + Error

a = expected response of Y when every predictor is 0 (with mean-centred predictors in this equation, that is the mean of Y)

betas: for every 1 unit change in X you get a beta-sized change in Y
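A minimal sketch of what centring does to this equation, using hypothetical data and an assumed helper `fit`: the slopes stay the same, and only the intercept moves, becoming the mean of Y once the predictors are mean-centred.

```python
import numpy as np

# Hypothetical data for illustration only.
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
y = np.array([5.0, 9.0, 10.0, 16.0, 17.0, 22.0])

def fit(predictors, outcome):
    """Ordinary least squares with an intercept; returns [a, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

raw = fit([x1, x2], y)
centred = fit([x1 - x1.mean(), x2 - x2.mean()], y)

print("uncentred a:", raw[0])       # expected Y when X1 = X2 = 0
print("centred a:  ", centred[0])   # expected Y at the predictor means
print("slopes unchanged:", np.allclose(raw[1:], centred[1:]))
```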

Regression Interactions Centring: Reducing multicollinearity

interaction predictor = x1 * x2

x1 & x2 numbers near 0 stay near 0, and high x1 & x2 numbers get really high

the interaction term is highly correlated with the original x1 & x2 variables

Centring makes each predictor, x1 & x2, have more moderate numbers above and below zero: positive and negative numbers

Reduces the multiplicative exaggeration between x1 & x2 and the interaction product x1*x2
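A minimal sketch of the effect, on a hypothetical grid of all-positive predictors (not the study data): the raw product x1*x2 correlates strongly with x1, while the product of the centred predictors does not.

```python
import numpy as np

# Hypothetical predictors: every combination of x1 = 1..10 and x2 = 1..10,
# so both are all-positive and uncorrelated with each other.
x1, x2 = [a.ravel().astype(float)
          for a in np.meshgrid(np.arange(1, 11), np.arange(1, 11))]

# Correlation of x1 with the raw product term.
r_raw = np.corrcoef(x1, x1 * x2)[0, 1]

# Centre each predictor first, then form the product.
x1c = x1 - x1.mean()
x2c = x2 - x2.mean()
r_centred = np.corrcoef(x1c, x1c * x2c)[0, 1]

print(round(r_raw, 2), round(r_centred, 2))
```

The balanced grid is an idealised case. With real, skewed, or correlated predictors, centring typically lowers these correlations without removing them entirely.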

Centring to reduce Multicollinearity

[Figure: X1 with X1*X2 multicollinearity, Original Variables; x-axis: x1; y-axis: x1*x2 product]

[Figure: X1 with X1*X2 multicollinearity, Centred Variables; x-axis: x1; y-axis: x1*x2 product]

Regression

Y = a + b1*X1 + b2*X2 + b3*X1*X2 + e

How does X2 relate to Y at different levels of X1?

How does predictor 2 (shyness) relate to the outcome (social interactions) at different stress levels (X1)?

Uncentred Data, mean (SD): X1 = 26.2 (14.5), X2 = 24.8 (27.6)

Centred Data, mean (SD): X1 = 0.0 (14.5), X2 = 0.0 (27.6)

Correlation Matrix:

Uncentred    x1     x2        x12       y
x1           --     0.58**    0.65**    0.14**
x2                  --        0.96**    0.28**
x12                           --        0.34**

Centred      x1c    x2c       x12c      y
x1c          --     0.58**    0.11      0.14*
x2c                 --        0.66**    0.28**
x12c                          --        0.34**

** p = .01

* p = .05

Regression Equation Results No Interaction:

Y = b0 + b1 * X1 + b2 * X2

Uncentred: Y = 1164.8 – 4 X1 + 20 X2 **

Centred: Y = 1550.8 – 4 X1 + 20 X2 **

Regression Equation Results

Interaction Term Included: Y = b0 + b1 * X1 + b2 * X2 + b3 * X1*X2

Uncentred: Y = 1733 – 19.1 X1 – 31.7 X2 ** + 1.26 X1*X2

Centred: Y = 1260 + 12.0 X1 + 1.1 X2 + 1.26 X1*X2

But what does it mean…

How does X2 relate to Y at different levels of X1?

How does predictor 2 (shyness) relate to the outcome (social interactions) at different stress levels (X1)?

Post Hocs Y = b0 + b1 * X1 + b2 * X2 + b3 * X1*X2

Y = ( b1 * X1 + b0 ) + ( b2 + b3 * X1 ) * X2

-1 SD below X1 Mean & +1 SD above X1 Mean

-1 SD: X – (–14.547663) = X + 14.547663

+1 SD: X – 14.547663
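A minimal sketch of this re-centring approach, using hypothetical noise-free data with a known interaction (not the study data): shifting X1 so that zero sits 1 SD below or above its mean and refitting makes b2 the slope of X2 at that level of X1, i.e. b2 + b3 * X1. The helper name `x2_slope` is an assumption.

```python
import numpy as np

# Hypothetical centred predictors on a grid, with a known interaction.
x1c, x2c = [a.ravel().astype(float)
            for a in np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))]
y = 10 + 2.0 * x1c + 1.0 * x2c + 0.5 * x1c * x2c

def x2_slope(x1, x2, outcome):
    """Fit Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 and return b2."""
    design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[2]

sd = x1c.std()   # SD of the moderator X1

# Re-centre X1 so that zero sits 1 SD below or above its mean, then refit.
slope_low = x2_slope(x1c - (-sd), x2c, y)   # X2 slope at X1 = -1 SD
slope_mean = x2_slope(x1c, x2c, y)          # X2 slope at the X1 mean
slope_high = x2_slope(x1c - sd, x2c, y)     # X2 slope at X1 = +1 SD

print(slope_low, slope_mean, slope_high)
```

Refitting, rather than computing b2 + b3 * X1 by hand, has the practical advantage that the software also reports a standard error and significance test for each simple slope.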

[Figure: three scatterplot panels (AL Mean Centred; AL -1 SD Below Mean; AL +1 SD Above Mean); x-axis: Movement Hrs/Day; y-axis: Crying Hrs/Day]

Scatterplots: Moving the Y Axis

-1 SD Below X1 Mean: Y = 1085 – 19.1 X1 – 17.1 X2 + 1.26 X1*X2; t (1,196) = -1.40, p = .16

Centred: Y = 1260 + 12.0 X1 + 1.1 X2 + 1.26 X1*X2; t (1,196) = 0.12, p = .88

+1 SD Above X1 Mean: Y = 1435 – 19.1 X1 + 19.4 X2 ** + 1.26 X1*X2; t (1,196) = 3.66, p = .001
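The X2 slopes and intercepts in these equations can be approximated by hand from the centred model, using Y = (b0 + b1 * X1) + (b2 + b3 * X1) * X2. A minimal sketch with the coefficients as printed on the slides; small differences from the reported values are expected because those coefficients are rounded.

```python
# Coefficients reported for the centred model.
b0, b1, b2, b3 = 1260.0, 12.0, 1.1, 1.26
sd_x1 = 14.547663   # SD of X1, as used for the +/- 1 SD re-centring

def intercept_at(x1_value):
    """Intercept when X1 is held at a chosen centred value: b0 + b1 * X1."""
    return b0 + b1 * x1_value

def x2_slope_at(x1_value):
    """Slope of X2 when X1 is held at a chosen centred value: b2 + b3 * X1."""
    return b2 + b3 * x1_value

for label, x1_value in [("-1 SD", -sd_x1), ("mean", 0.0), ("+1 SD", sd_x1)]:
    print(label, round(intercept_at(x1_value), 1), round(x2_slope_at(x1_value), 2))
```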

Regression Interaction Example

Predicting inhibitory ability with motor activity & age

Simon Says-like games; 4 to 6 yr-olds & physical movement

Move by Age interaction: F (1, 81) = 5.9, p < .02

Young (-1.5 SD): move beta sig. + Inhibition

Middle (Mean): move beta p = .10 ~ Inhibition

Older (+1.5 SD): move beta n.s. Inhibition

Polynomials, Centring, & Spline Functions

Polynomial relations: quadratic, cubic, etc

Y = a + b1*X1 - b2*X1*X1 + e

[Figure: curvilinear pattern of Y across X]

Curvilinear Pattern

Assume a symmetric pattern – X^2

But, it may not be ...

Perceived Control (Y) slowly increases & then declines rapidly in old age

[Figure: asymmetric curvilinear pattern shown alongside the symmetric quadratic pattern]

This Brings us to Spline Functions

Split up predictor X into 2+ variables: XLow & XHigh

[Figure: Y across X, with X ranging from -10 to 20]

XLow = X – (-5) & set values at the next change point to zero. Ditto for XHigh

Re-run Y = a + b1*XLow - b2*XHigh + e
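A minimal sketch of one common way to code such a spline, assuming a single change point (knot) and hypothetical noise-free data: X is re-scaled around the knot, XLow is set to zero above the knot, and XHigh is set to zero below it. The knot value and the exact coding are assumptions; the coding used in the original analysis may differ. The fit returns signed slopes, so a declining upper segment appears as a negative coefficient.

```python
import numpy as np

knot = 5.0                       # hypothetical change point
x = np.arange(-10.0, 21.0)       # predictor values from -10 to 20

# Hypothetical outcome: slow rise (slope 5) up to the knot, then a steep
# decline (slope -20) after it.
y = 200 + 5.0 * np.minimum(x - knot, 0) - 20.0 * np.maximum(x - knot, 0)

# Split X into two variables that are zero on the other side of the knot.
x_low = np.minimum(x - knot, 0)    # varies only below the knot
x_high = np.maximum(x - knot, 0)   # varies only above the knot

design = np.column_stack([np.ones_like(x), x_low, x_high])
(a, b_low, b_high), *_ = np.linalg.lstsq(design, y, rcond=None)
print(a, b_low, b_high)
```

With this coding the intercept is the expected Y at the knot, and each slope describes only its own segment, which is what allows one segment to be significant while the other is not.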

Perks of Spline Functions

Estimate slope anywhere along the range

Can be sig on one part - n.s. on another

Steeper or shallower

Multiple Regression Techniques

1. Centring: removing group-difference confounds

2. Centring: interpreting continuous interactions

3. Spline functions: more precise understanding of polynomial patterns

Questions

• Alpha control procedures for spline functions

– Could it be argued that you are describing the pattern already identified?

– Conservatively, you could apply an alpha control procedure. I like the False Discovery Rate procedures.

– Replication is preferred, but not always possible.
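A minimal sketch of one False Discovery Rate procedure, the Benjamini-Hochberg step-up rule. The slide does not say which FDR procedure is meant, so this choice is an assumption, as are the function name and the level q = .05. The p-values are the three simple-slope tests reported earlier, used here only as an illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans: True where a test is significant under FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# p-values for the X2 slope at -1 SD, the mean, and +1 SD of X1.
print(benjamini_hochberg([0.16, 0.88, 0.001]))
```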

Alpha Control Aside

• The source of Type 1 errors is typically poorly described.

• Typical: If enough probability tests are run, the probability will increase to the point where something becomes significant just by chance.

– But, probability is linked to the representativeness of your data and type 1 error is a proxy for the likelihood of the representativeness of your data.

• My View: The real source of Type 1 errors is that if you

– divide up the data into enough subgroupings

– eventually one of those subgroupings will differ because it is misrepresentative of reality.

Standardized vs Centred

• Centred is x – xM

• Standardized is (x – xM) / SDx

– Makes variability for each predictor = 1

– Standardized Beta = raw b * SDx / SDy

– Similar to centring, but the different metric needs to be adjusted for interaction terms

• To get comparable results with an interaction term

– Standardization should be applied to X1 and X2 prior to the X1*X2 estimate, then use the “raw” coefficients
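A minimal sketch of these points with hypothetical data and assumed helper names: the standardized beta obtained from b * SDx / SDy matches the slope from regressing standardized Y on standardized predictors, and for an interaction model X1 and X2 are standardized before the product is formed.

```python
import numpy as np

# Hypothetical data for illustration only.
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
y = np.array([5.0, 9.0, 10.0, 16.0, 17.0, 22.0])

def zscore(v):
    """Standardize: (x - mean) / SD."""
    return (v - v.mean()) / v.std(ddof=1)

def fit(columns, outcome):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(outcome))] + list(columns))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

# Standardized beta from the raw slope: b * SDx / SDy.
raw = fit([x1, x2], y)
beta_x1 = raw[1] * x1.std(ddof=1) / y.std(ddof=1)

# The same value comes from regressing standardized Y on standardized predictors.
z_fit = fit([zscore(x1), zscore(x2)], zscore(y))

# For an interaction: standardize X1 and X2 first, then form the product,
# and read the "raw" coefficients of this fit.
z1, z2 = zscore(x1), zscore(x2)
interaction_fit = fit([z1, z2, z1 * z2], zscore(y))
print(beta_x1, z_fit[1], interaction_fit)
```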

Centring and Spline Functions

Relatively simple procedures

Old dogs in the Statistics World but new tricks for many

That’s All Folks!
