Christopher Dougherty EC220 - Introduction to econometrics (chapter 6) Slideshow: variable...

Christopher Dougherty EC220 - Introduction to econometrics (chapter 6) Slideshow: variable misspecification i: omitted variable bias Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 6). [Teaching Resource] © 2012 The Author This version available at: http://learningresources.lse.ac.uk/132/ Available in LSE Learning Resources Online: May 2012 This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/ http://learningresources.lse.ac.uk/


VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE

In this sequence and the next we will investigate the consequences of misspecifying the regression model in terms of explanatory variables.

Consequences of variable misspecification

True model: either $Y = \beta_1 + \beta_2 X_2 + u$ or $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$

Fitted model: either $\hat{Y} = b_1 + b_2 X_2$ or $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$

To keep the analysis simple, we will assume that there are only two possibilities. Either Y depends only on X2, or it depends on both X2 and X3.


If Y depends only on X2, and we fit a simple regression model, we will not encounter any problems, assuming of course that the regression model assumptions are valid.

Fitted model $\hat{Y} = b_1 + b_2 X_2$, true model $Y = \beta_1 + \beta_2 X_2 + u$: correct specification, no problems.

Likewise we will not encounter any problems if Y depends on both X2 and X3 and we fit the multiple regression.

Fitted model $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$, true model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$: correct specification, no problems.

In this sequence we will examine the consequences of fitting a simple regression when the true model is multiple.


In the next one we will do the opposite and examine the consequences of fitting a multiple regression when the true model is simple.


The omission of a relevant explanatory variable causes the regression coefficients to be biased and the standard errors to be invalid.

Fitted model $\hat{Y} = b_1 + b_2 X_2$, true model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$: coefficients are biased (in general); standard errors are invalid.

In the present case, the omission of X3 causes b2 to be biased by the second term on the right-hand side of the expression for E(b2) below. We will explain this first intuitively and then demonstrate it mathematically.

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$; fitted model: $\hat{Y} = b_1 + b_2 X_2$

$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2}$$

[Diagram: Y, X2 and X3, with three labelled effects: the direct effect of X2, holding X3 constant (β2); the effect of X3 (β3); and the apparent effect of X2, acting as a mimic for X3.]

The intuitive reason is that, in addition to its direct effect β2, X2 has an apparent indirect effect as a consequence of acting as a proxy for the missing X3.


The strength of the proxy effect depends on two factors: the strength of the effect of X3 on Y, which is given by β3, and the ability of X2 to mimic X3.

The ability of X2 to mimic X3 is determined by the slope coefficient obtained when X3 is regressed on X2, namely the ratio $\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) / \sum (X_{2i} - \bar{X}_2)^2$ that multiplies β3 in the bias term.

We will now derive the expression for the bias mathematically. It is convenient to start by deriving an expression for the deviation of Yi about its sample mean. It can be expressed in terms of the deviations of X2, X3, and u about their sample means.

$$Y_i - \bar{Y} = (\beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i) - (\beta_1 + \beta_2 \bar{X}_2 + \beta_3 \bar{X}_3 + \bar{u}) = \beta_2 (X_{2i} - \bar{X}_2) + \beta_3 (X_{3i} - \bar{X}_3) + (u_i - \bar{u})$$

Although Y really depends on X3 as well as X2, we make a mistake and regress Y on X2 only. The slope coefficient is therefore as shown.

$$b_2 = \frac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y})}{\sum (X_{2i} - \bar{X}_2)^2}$$

We substitute for the Y deviations and simplify.

$$b_2 = \frac{\sum (X_{2i} - \bar{X}_2)\left[\beta_2 (X_{2i} - \bar{X}_2) + \beta_3 (X_{3i} - \bar{X}_3) + (u_i - \bar{u})\right]}{\sum (X_{2i} - \bar{X}_2)^2}$$

$$b_2 = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2} + \frac{\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})}{\sum (X_{2i} - \bar{X}_2)^2}$$

Hence we have demonstrated that b2 has three components.
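The three components can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the sample size, the seed and the parameter values (β2 = 1, β3 = 2) are all illustrative assumptions, and the data are simulated rather than taken from any data set.

```python
import random


def deviations(values):
    """Return each value minus the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def decompose_b2(x2, x3, u, beta2, beta3):
    """Return b2 from regressing Y on X2 only, plus its three components."""
    d2, d3, du = deviations(x2), deviations(x3), deviations(u)
    sxx = sum(a * a for a in d2)
    # Y follows the true model; the intercept is omitted because it
    # cancels when deviations about the sample mean are taken.
    y = [beta2 * a + beta3 * b + e for a, b, e in zip(x2, x3, u)]
    dy = deviations(y)
    b2 = sum(a * b for a, b in zip(d2, dy)) / sxx
    proxy = beta3 * sum(a * b for a, b in zip(d2, d3)) / sxx  # mimic effect
    noise = sum(a * e for a, e in zip(d2, du)) / sxx          # error term
    return b2, (beta2, proxy, noise)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
x2 = [rng.gauss(0, 1) for _ in range(200)]
x3 = [0.5 * a + rng.gauss(0, 1) for a in x2]  # X3 correlated with X2 (assumed)
u = [rng.gauss(0, 1) for _ in range(200)]

b2, parts = decompose_b2(x2, x3, u, beta2=1.0, beta3=2.0)
print(b2, sum(parts))  # the two numbers agree up to rounding error
```

In any one sample the third component is not zero; the argument that follows concerns its expected value.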

To investigate biasedness or unbiasedness, we take the expected value of b2. The first two terms are unaffected because they contain no random components. Thus we focus on the expectation of the error term.

$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2} + E\left[\frac{\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})}{\sum (X_{2i} - \bar{X}_2)^2}\right]$$

X2 is nonstochastic, so the denominator of the error term is nonstochastic and may be taken outside the expression for the expectation.

$$E\left[\frac{\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})}{\sum (X_{2i} - \bar{X}_2)^2}\right] = \frac{1}{\sum (X_{2i} - \bar{X}_2)^2} E\left[\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})\right]$$

In the numerator the expectation of a sum is equal to the sum of the expectations (first expected value rule).

$$E\left[\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})\right] = \sum E\left[(X_{2i} - \bar{X}_2)(u_i - \bar{u})\right]$$

In each product, the factor involving X2 may be taken out of the expectation because X2 is nonstochastic.

$$\sum E\left[(X_{2i} - \bar{X}_2)(u_i - \bar{u})\right] = \sum (X_{2i} - \bar{X}_2) E(u_i - \bar{u})$$

By Assumption A.3, the expected value of u is 0. It follows that the expected value of the sample mean of u is also 0. Hence the expected value of the error term is 0.

$$E(u_i - \bar{u}) = E(u_i) - E(\bar{u}) = 0 \quad\Rightarrow\quad E\left[\frac{\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})}{\sum (X_{2i} - \bar{X}_2)^2}\right] = \frac{1}{\sum (X_{2i} - \bar{X}_2)^2} \sum (X_{2i} - \bar{X}_2) E(u_i - \bar{u}) = 0$$

Thus we have shown that the expected value of b2 is equal to the true value plus a bias term. Note: the definition of a bias is the difference between the expected value of an estimator and the true value of the parameter being estimated.
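A small Monte Carlo experiment can illustrate this result. This is a sketch under stated assumptions, not the author's own example: X2 and X3 are generated once and then held fixed (nonstochastic, as in the derivation), only u is redrawn in each replication, and the parameter values, sample size and seed are arbitrary.

```python
import random


def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)                        # arbitrary seed
n, beta1, beta2, beta3 = 50, 1.0, 1.0, 2.0    # illustrative values only
x2 = [rng.uniform(0, 10) for _ in range(n)]   # held fixed across replications
x3 = [0.5 * a + rng.uniform(0, 2) for a in x2]  # fixed, correlated with x2

# True value plus the bias term: beta3 times the slope of X3 on X2.
expected_b2 = beta2 + beta3 * slope(x2, x3)

draws = []
for _ in range(5000):
    u = [rng.gauss(0, 1) for _ in range(n)]   # E(u) = 0 by construction
    y = [beta1 + beta2 * a + beta3 * b + e for a, b, e in zip(x2, x3, u)]
    draws.append(slope(x2, y))                # misspecified fit: X3 omitted

mean_b2 = sum(draws) / len(draws)
print(round(expected_b2, 3), round(mean_b2, 3))
```

The average of b2 over the replications should lie close to β2 plus the bias term, not close to β2 itself.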

$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2}$$

As a consequence of the misspecification, the standard errors, t tests and F test are invalid.


. reg S ASVABC SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  147.36
       Model |  1135.67473     2  567.837363           Prob > F      =  0.0000
    Residual |  2069.30861   537  3.85346109           R-squared     =  0.3543
-------------+------------------------------           Adj R-squared =  0.3519
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.963

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1328069   .0097389    13.64   0.000     .1136758     .151938
          SM |   .1235071   .0330837     3.73   0.000     .0585178    .1884963
       _cons |   5.420733   .4930224    10.99   0.000     4.452244    6.389222
------------------------------------------------------------------------------

We will illustrate the bias using an educational attainment model. To keep the analysis simple, we will assume that in the true model S depends only on ASVABC and SM. The output above shows the corresponding regression using EAEF Data Set 21.

$$S = \beta_1 + \beta_2 ASVABC + \beta_3 SM + u$$


We will run the regression a second time, omitting SM. Before we do this, we will try to predict the direction of the bias in the coefficient of ASVABC.

$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (ASVABC_i - \overline{ASVABC})(SM_i - \overline{SM})}{\sum (ASVABC_i - \overline{ASVABC})^2}$$


It is reasonable to suppose, as a matter of common sense, that β3 is positive. This assumption is strongly supported by the fact that its estimate in the multiple regression is positive and highly significant.


The correlation between ASVABC and SM is positive, so the numerator of the bias term must be positive. The denominator is automatically positive since it is a sum of squares and there is some variation in ASVABC. Hence the bias should be positive.

. cor SM ASVABC
(obs=540)

        |       SM   ASVABC
--------+------------------
     SM |   1.0000
 ASVABC |   0.4202   1.0000

Here is the regression omitting SM.

. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036
------------------------------------------------------------------------------

Coefficient of ASVABC in the two regressions:

Regression             Coef.      Std. Err.      t
. reg S ASVABC SM      .1328069   .0097389     13.64
. reg S ASVABC         .148084    .0089431     16.56

As you can see, the coefficient of ASVABC is indeed higher when SM is omitted. Part of the difference may be due to pure chance, but part is attributable to the bias.
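The size of the gap can be related to the bias formula with a little arithmetic. The sketch below relies on a standard in-sample property of OLS that the slides do not state: the coefficient in the short regression equals the coefficient in the long regression plus the estimated coefficient of the omitted variable times the slope from regressing the omitted variable on the included one. The slope of SM on ASVABC is not reported in the output, so the value computed here is only the one implied by that identity; the variable names are illustrative.

```python
# Coefficients copied from the Stata output for the two regressions.
b_asvabc_long = 0.1328069   # coefficient of ASVABC, SM included
b_sm_long = 0.1235071       # coefficient of SM in the same regression
b_asvabc_short = 0.148084   # coefficient of ASVABC, SM omitted

# Difference between the short and long estimates of the ASVABC coefficient.
gap = b_asvabc_short - b_asvabc_long

# Implied slope from regressing SM on ASVABC (not reported in the source;
# derived from the identity gap = b_sm_long * slope).
implied_slope = gap / b_sm_long

print(round(gap, 4), round(implied_slope, 4))
```

The implied slope is positive, in line with the positive correlation between SM and ASVABC reported earlier.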


Here is the regression omitting ASVABC instead of SM. We would expect b3 to be upwards biased. We anticipate that β2 is positive and we know that both the numerator and the denominator of the other factor in the bias expression are positive.

$$E(b_3) = \beta_3 + \beta_2 \frac{\sum (ASVABC_i - \overline{ASVABC})(SM_i - \overline{SM})}{\sum (SM_i - \overline{SM})^2}$$

. reg S SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   80.93
       Model |  419.086251     1  419.086251           Prob > F      =  0.0000
    Residual |  2785.89708   538  5.17824736           R-squared     =  0.1308
-------------+------------------------------           Adj R-squared =  0.1291
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.2756

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          SM |   .3130793   .0348012     9.00   0.000     .2447165    .3814422
       _cons |   10.04688   .4147121    24.23   0.000     9.232226    10.86153
------------------------------------------------------------------------------

In this case the bias is quite dramatic. The coefficient of SM has more than doubled. The reason for the bigger effect is that the variation in SM is much smaller than that in ASVABC, while β2 and β3 are similar in size, judging by their estimates.

Coefficient of SM in the two regressions:

Regression             Coef.      Std. Err.      t
. reg S ASVABC SM      .1235071   .0330837      3.73
. reg S SM             .3130793   .0348012      9.00

Finally, we will investigate how R² behaves when a variable is omitted. In the simple regression of S on ASVABC, R² is 0.34, and in the simple regression of S on SM it is 0.13.

Regression            R-squared   Adj R-squared
. reg S ASVABC SM        0.3543          0.3519
. reg S ASVABC           0.3376          0.3364
. reg S SM               0.1308          0.1291


Does this imply that ASVABC explains 34% of the variance in S and SM 13%? No, because the multiple regression reveals that their joint explanatory power is 0.35, not 0.47.
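The arithmetic behind this comparison can be laid out explicitly. This is only an illustrative sketch using the R-squared values from the three regressions of S; the variable names are hypothetical.

```python
# R-squared values copied from the three regression outputs.
r2_asvabc_only = 0.3376   # reg S ASVABC
r2_sm_only = 0.1308       # reg S SM
r2_both = 0.3543          # reg S ASVABC SM

# Adding the two simple-regression values overstates joint explanatory power.
naive_sum = r2_asvabc_only + r2_sm_only
overstatement = naive_sum - r2_both

# Increase in R-squared when each variable is added to the other.
gain_from_sm = r2_both - r2_asvabc_only
gain_from_asvabc = r2_both - r2_sm_only

print(round(naive_sum, 4), round(overstatement, 4))
print(round(gain_from_sm, 4), round(gain_from_asvabc, 4))
```

Adding SM to a regression that already contains ASVABC raises R-squared by far less than the 0.13 that SM achieves on its own.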


In the second regression, ASVABC is partly acting as a proxy for SM, and this inflates its apparent explanatory power. Similarly, in the third regression, SM is partly acting as a proxy for ASVABC, again inflating its apparent explanatory power.


However, it is also possible for omitted variable bias to lead to a reduction in the apparent explanatory power of a variable. This will be demonstrated using a simple earnings function model, supposing the logarithm of hourly earnings to depend on S and EXP.

$$LGEARN = \beta_1 + \beta_2 S + \beta_3 EXP + u$$

. reg LGEARN S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  100.86
       Model |  50.9842581     2   25.492129           Prob > F      =  0.0000
    Residual |  135.723385   537  .252743734           R-squared     =  0.2731
-------------+------------------------------           Adj R-squared =  0.2704
       Total |  186.707643   539   .34639637           Root MSE      =  .50274

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1235911   .0090989    13.58   0.000     .1057173     .141465
         EXP |   .0350826   .0050046     7.01   0.000     .0252515    .0449137
       _cons |   .5093196   .1663823     3.06   0.002     .1824796    .8361596
------------------------------------------------------------------------------


If we omit EXP from the regression, the coefficient of S should be subject to a downward bias. β3 is likely to be positive. The numerator of the other factor in the bias term is negative since S and EXP are negatively correlated. The denominator is positive.

$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (S_i - \bar{S})(EXP_i - \overline{EXP})}{\sum (S_i - \bar{S})^2}$$

. cor S EXP
(obs=540)

        |        S      EXP
--------+------------------
      S |   1.0000
    EXP |  -0.2179   1.0000


For the same reasons, the coefficient of EXP in a simple regression of LGEARN on EXP should be downwards biased.

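$$LGEARN = \beta_1 + \beta_2 S + \beta_3 EXP + u$$

$$E(b_3) = \beta_3 + \beta_2 \frac{\sum (EXP_i - \overline{EXP})(S_i - \bar{S})}{\sum (EXP_i - \overline{EXP})^2}$$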


. reg LGEARN S EXP
------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1235911   .0090989    13.58   0.000     .1057173     .141465
         EXP |   .0350826   .0050046     7.01   0.000     .0252515    .0449137
       _cons |   .5093196   .1663823     3.06   0.002     .1824796    .8361596
------------------------------------------------------------------------------

. reg LGEARN S
------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1096934   .0092691    11.83   0.000     .0914853    .1279014
       _cons |   1.292241   .1287252    10.04   0.000     1.039376    1.545107
------------------------------------------------------------------------------

. reg LGEARN EXP
------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .0202708   .0056564     3.58   0.000     .0091595     .031382
       _cons |    2.44941   .0988233    24.79   0.000     2.255284    2.643537
------------------------------------------------------------------------------


As can be seen, the coefficients of S and EXP are indeed lower in the simple regressions: 0.1097 against 0.1236 for S, and 0.0203 against 0.0351 for EXP.
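As a rough consistency check on these figures, the sample counterpart of the bias formula holds exactly for OLS: the simple-regression slope equals the multiple-regression slope plus the other coefficient times the slope from regressing the omitted variable on the included one. The sketch below backs out those two auxiliary slopes from the reported coefficients and compares them with the reported correlation. The auxiliary regressions are not shown in the slides, and the variable names are illustrative.

```python
import math

# Coefficients copied from the regression output above.
b_s_multiple, b_exp_multiple = 0.1235911, 0.0350826   # reg LGEARN S EXP
b_s_simple = 0.1096934                                # reg LGEARN S
b_exp_simple = 0.0202708                              # reg LGEARN EXP

# In-sample identity: b_simple = b_multiple + (other coefficient) * d,
# where d is the slope from regressing the omitted variable on the included one.
d_exp_on_s = (b_s_simple - b_s_multiple) / b_exp_multiple
d_s_on_exp = (b_exp_simple - b_exp_multiple) / b_s_multiple

# The product of the two auxiliary slopes is the squared correlation of S and EXP;
# the sign is negative because both implied slopes are negative.
implied_corr = -math.sqrt(d_exp_on_s * d_s_on_exp)

print(f"implied slope of EXP on S: {d_exp_on_s:.4f}")
print(f"implied slope of S on EXP: {d_s_on_exp:.4f}")
print(f"implied correlation:       {implied_corr:.4f}")
```

The implied correlation comes out at roughly -0.218, in line with the -0.2179 reported by the cor command, so the three sets of estimates are mutually consistent.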

VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE


. reg LGEARN S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  100.86
       Model |  50.9842581     2   25.492129           Prob > F      =  0.0000
    Residual |  135.723385   537  .252743734           R-squared     =  0.2731
-------------+------------------------------           Adj R-squared =  0.2704
       Total |  186.707643   539   .34639637           Root MSE      =  .50274

. reg LGEARN S

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  140.05
       Model |  38.5643833     1  38.5643833           Prob > F      =  0.0000
    Residual |   148.14326   538  .275359219           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.2051
       Total |  186.707643   539   .34639637           Root MSE      =  .52475

. reg LGEARN EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   12.84
       Model |  4.35309315     1  4.35309315           Prob > F      =  0.0004
    Residual |   182.35455   538  .338948978           R-squared     =  0.0233
-------------+------------------------------           Adj R-squared =  0.0215
       Total |  186.707643   539   .34639637           Root MSE      =  .58219


A comparison of $R^2$ for the three regressions shows that the sum of $R^2$ in the simple regressions, 0.2065 + 0.0233 = 0.2298, is actually less than $R^2$ in the multiple regression, 0.2731.
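The comparison can be reproduced from the sums of squares in the three output headers, since $R^2$ is the model sum of squares divided by the total sum of squares. The following is a minimal sketch of that arithmetic. The dictionary keys are labels chosen here for illustration.

```python
# Sums of squares copied from the three regression headers above.
total_ss = 186.707643
model_ss = {"S and EXP": 50.9842581, "S only": 38.5643833, "EXP only": 4.35309315}

# R-squared is the explained (model) sum of squares over the total sum of squares.
r2 = {name: ss / total_ss for name, ss in model_ss.items()}
sum_simple = r2["S only"] + r2["EXP only"]

for name, value in r2.items():
    print(f"R2 ({name}): {value:.4f}")
print(f"sum of simple-regression R2: {sum_simple:.4f}")
print(f"shortfall relative to the multiple regression: {r2['S and EXP'] - sum_simple:.4f}")
```

Because the total sum of squares is the same in all three regressions, the comparison amounts to noting that the two simple-regression model sums of squares add up to less than the model sum of squares of the multiple regression.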

VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE


This is because the apparent explanatory power of S in the second regression has been undermined by the downwards bias in its coefficient. The same is true for the apparent explanatory power of EXP in the third equation.
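The same pattern can be reproduced with artificial data. The sketch below is an illustration under assumed values, not a reconstruction of the data set used in the slides: it draws two negatively correlated regressors, gives both a positive coefficient (the values of beta2, beta3 and rho are hypothetical), and fits the multiple and the two simple regressions using only the Python standard library.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible
n = 2000
beta2, beta3, rho = 1.0, 1.0, -0.5  # hypothetical values, not estimates from the slides

# Draw two standard-normal regressors with correlation rho, then build y.
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [rho * a + (1 - rho**2) ** 0.5 * random.gauss(0, 1) for a in x2]
y = [beta2 * a + beta3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]

def sxy(a, b):
    """Sum of cross-products of deviations from the sample means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def simple_fit(x, y):
    """Slope and R-squared from a regression of y on x and a constant."""
    slope = sxy(x, y) / sxy(x, x)
    return slope, slope * sxy(x, y) / sxy(y, y)

def multiple_fit(x2, x3, y):
    """Slopes and R-squared from a regression of y on x2, x3 and a constant."""
    s22, s33, s23 = sxy(x2, x2), sxy(x3, x3), sxy(x2, x3)
    s2y, s3y = sxy(x2, y), sxy(x3, y)
    det = s22 * s33 - s23**2
    b2 = (s2y * s33 - s3y * s23) / det
    b3 = (s3y * s22 - s2y * s23) / det
    return b2, b3, (b2 * s2y + b3 * s3y) / sxy(y, y)

b2, b3, r2_multiple = multiple_fit(x2, x3, y)
b2_simple, r2_x2 = simple_fit(x2, y)
b3_simple, r2_x3 = simple_fit(x3, y)

print(f"x2 coefficient: multiple {b2:.3f}, simple {b2_simple:.3f}")
print(f"x3 coefficient: multiple {b3:.3f}, simple {b3_simple:.3f}")
print(f"R2: multiple {r2_multiple:.3f}, sum of simple {r2_x2 + r2_x3:.3f}")
```

With these assumed values each simple-regression coefficient is pulled well below its multiple-regression counterpart, and the two simple-regression $R^2$ values sum to less than the multiple-regression $R^2$, as in the earnings example.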

VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE


Copyright Christopher Dougherty 2011.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 6.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.

11.07.25