Collinearity. Symptoms of collinearity Collinearity between independent variables – High r 2 High...

Post on 17-Dec-2015

219 views 0 download

Transcript of Collinearity. Symptoms of collinearity Collinearity between independent variables – High r 2 High...

Collinearity

Symptoms of collinearity

• Collinearity between independent variables – High r2

• High vif of variables in model• Variables significant in simple regression, but not in

multiple regression• Variables not significant in multiple regression, but multiple

regression model (as whole) significant• Large changes in coefficient estimates between full and

reduced models• Large standard errors in multiple regression models despite

high power

Collinearity and confounding independent variables

Two independent variables, correlated with each other, where

both influence the response

Methods

• Truth: y = 10 + 3x1 + 3x2 + N(0,2)

• x1 = U[0,10]

• x2 = x1 + N(0,z) where• z = U[0.5,20]• Run simple regression between y and x1

• Run multiple regression between y and x1 + x2

• No interactions!

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

01

02

03

04

05

0

Collinearity between x1 and x2

Va

ria

nce

infla

tion

fact

or

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

05

10

15

Collinearity between x1 and x2

Co

effi

cie

nt E

stim

ate

for

X1

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

01

23

4

Collinearity between x1 and x2

SE

of E

stim

ate

for

X1

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

1e

-46

1e

-36

1e

-26

1e

-16

1e

-06

Collinearity between x1 and x2

P-v

alu

e fo

r co

effi

cie

nt e

stim

ate

for

x1

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

12

34

Collinearity between x1 and x2

Co

effi

cie

nt e

stim

ate

for

x1

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Collinearity between x1 and x2

SE

of e

stim

ate

for

x1

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

1e

-33

1e

-26

1e

-19

1e

-12

1e

-05

Collinearity between x1 and x2

P-v

alu

e fo

r co

effi

cie

nt o

f est

ima

te fo

r x1

Collinearity and redundant independent variables

Two independent variables, correlated with each other, where only one influences the response,

although we don’t know which one

Methods

• Truth: y = 10 + 3x1 + N(0,2)

• x1 = U[0,10]

• x2 = x1 + N(0,z) where• z = U[0.5,20]• Run simple regression between y and x1

• Run multiple regression between y and x1 + x2

• No interactions!

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

01

02

03

04

05

0

Collinearity between x1 and x2

Va

ria

nce

infla

tion

fact

or

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

2.0

2.5

3.0

3.5

4.0

Collinearity between x1 and x2

Co

effi

cie

nt o

f est

ima

te fo

r x1

Simple regression: y~x1

0.0 0.2 0.4 0.6 0.8 1.0

0.0

60

.08

0.1

00

.12

0.1

4

Collinearity between x1 and x2

SE

for

coe

ffici

en

t of e

stim

ate

for

x1

Simple regression: y~x2

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Collinearity between x1 and x2

Co

effi

cie

nt o

f est

ima

te fo

r x2

Simple regression: y~x2

0.0 0.2 0.4 0.6 0.8 1.0

0.0

50

.10

0.1

50

.20

0.2

50

.30

Collinearity between x1 and x2

SE

for

coe

ffici

en

t of e

stim

ate

for

x2

Simple regression: y~x2

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Collinearity between x1 and x2

P-v

alu

e fo

r co

effi

cie

nt o

f est

ima

te fo

r x2

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

Collinearity between x1 and x2

Co

effi

cie

nt o

f est

ima

te fo

r x1

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Collinearity between x1 and x2

SE

of e

stim

ate

for

x1

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

Collinearity between x1 and x2

Co

effi

cie

nt o

f est

ima

te fo

r x2

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

Collinearity between x1 and x2

SE

for

est

ima

te fo

r x2

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

1e

-33

1e

-26

1e

-19

1e

-12

1e

-05

Collinearity between x1 and x2

P-v

alu

e fo

r e

stim

ate

for

x1

Multiple regression: y~x1+x2

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Collinearity between x1 and x2

P-v

alu

e fo

r e

stim

ate

for

x2

What to do?

• Be sure to calculate collinearity and vif among independent variables (before you start your analysis)

• Pay attention to how coefficient estimates and variable significance change as variables are removed or added

• Be careful to identify potentially confounding variables prior to data collection

Is a variable redundant or confounding?

• Think!• Extreme collinearity

– Redundant• Large changes in coefficient estimates of both variables

between full and reduced models– Confounding

• Large changes in coefficient estimates of one variable between full and reduced models– Redundant – full model estimate close to zero

• Uncertain – assume confounding– Multiple regression always produces unbiased estimates (on

average) regardless of type of collinearity

What to do? Confounding variables

• Be sure to sample in a manner that eliminates collinearity– Collinearity may be due to real collinearity or sampling

artifact• Use multiple regression

– May have large standard errors if strong collinearity• Include confounding variables even if non-

significant• Get more data

– Decreases standard errors (vif)

What to do? Redundant variables

• Determine which variable explains response best using P-values from regression and changes in coefficient estimates with variable addition and removal

• Do not include redundant variable in final model– Reduces vif

• Try a variable reduction technique like PCA