
Econ 139/239: Introduction to Econometrics
Handout 8

Sophia Zhengzi Li

Department of Economics

Duke University

Summer II, 2010

Handout 8 Econ 139/239, SummerII, 2010

Omitted Variable Bias

The methodology we've covered so far has (at least) one big limitation: there's only one RHS variable explaining Y.

Consider the Test Scores regression from Chapter 4.

What if STR is picking up something besides just the student-teacher ratio?



In other words, what if something else is driving test scores?

For example: percent of English learners, teacher quality, a richer school, a richer neighborhood, parents' education.

Why do we care? We'd like to establish a causal effect.

We don't want STR getting credit (or blame) for the effect of something else.

Worse, what if STR is significant only because other variables are correlated with both STR and TESTSCR?

Both problems are examples of omitted variable bias.

Definition: If a regressor is correlated with a variable that has been omitted from the analysis but that determines (in part) the dependent variable, then the OLS estimator will have omitted variable bias.



Omitted variable bias (OVB) occurs when two conditions hold:

1. The omitted variable is correlated with the included regressor (OVB 1).
2. The omitted variable is a determinant of the dependent variable (OVB 2).

Examples:

Percentage of English learners, time of day of the test, teachers' parking lot space per pupil.

Education and wages:

$$Wage = \beta_0 + \beta_1 Educ + u$$

Omitting ability will cause you to overestimate the importance of schooling. Can you see why?

Formally, omitted variable bias occurs when we don't include in our regression all the variables that are correlated with Y and one (or more) of the regressors (Xs).



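Let's see what happens when we omit a relevant variable from our analysis. Suppose the true model is:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (1)$$

with $E[u_i \mid X_{1i}, X_{2i}] = 0$.

So $\beta_1$ is the true slope of $X_{1i}$.

Notice that, using the law of iterated expectations (LIE), we have

$$E[u_i \mid X_{1i}] = E\big[E[u_i \mid X_{1i}, X_{2i}] \mid X_{1i}\big] = 0 \qquad (*)$$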

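It's also useful to note that for any variables P and Q:

$$\sum_i (P_i - \bar{P})(Q_i - \bar{Q}) = \sum_i (P_i - \bar{P})\,Q_i \qquad (**)$$

Why? Just expand the sum and cancel.¹

¹ $\sum_i (P_i - \bar{P})(Q_i - \bar{Q}) = \sum_i \big[(P_i - \bar{P})Q_i - \bar{Q}(P_i - \bar{P})\big] = \sum_i (P_i - \bar{P})Q_i - \bar{Q}\sum_i (P_i - \bar{P}) = \sum_i (P_i - \bar{P})Q_i$, since $\sum_i (P_i - \bar{P}) = 0$.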
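The demeaning identity can also be checked numerically. The following is a minimal sketch on made-up data; the sample size, the seed and the distributions are arbitrary assumptions, not part of the handout.

```python
import random

random.seed(0)
n = 50
P = [random.gauss(0, 1) for _ in range(n)]
Q = [random.gauss(5, 2) for _ in range(n)]
P_bar = sum(P) / n
Q_bar = sum(Q) / n

# Left side: both variables are demeaned.
lhs = sum((p - P_bar) * (q - Q_bar) for p, q in zip(P, Q))
# Right side: only P is demeaned.
rhs = sum((p - P_bar) * q for p, q in zip(P, Q))

# The two sums agree up to floating-point error.
print(abs(lhs - rhs) < 1e-9)
```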



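In particular, this means we can write the OLS estimator for $\beta_1$ (in the univariate regression of Chap 4) as

$$\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X})\,Y_i}{\sum_i (X_i - \bar{X})^2}$$

Now suppose we (incorrectly) assume

$$Y_i = \beta_0 + \beta_1 X_{1i} + v_i \qquad (2)$$

where $v_i \equiv \beta_2 X_{2i} + u_i$.

If we estimate the slope $\beta_1$ using equation (2) we get

$$\hat{\beta}_1 = \frac{\sum_i (X_{1i} - \bar{X}_1)\,Y_i}{\sum_i (X_{1i} - \bar{X}_1)^2}$$

But is $E(\hat{\beta}_1) = \beta_1$?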


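Well, what is $E(\hat{\beta}_1)$?

$$
\begin{aligned}
E(\hat{\beta}_1)
&= E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)\,Y_i}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 = E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)(\beta_0+\beta_1 X_{1i}+\beta_2 X_{2i}+u_i)}{\sum_i (X_{1i}-\bar{X}_1)^2}\right] \\
&= E\left[\frac{\beta_0 \sum_i (X_{1i}-\bar{X}_1)}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + E\left[\frac{\beta_1 \sum_i (X_{1i}-\bar{X}_1)\,X_{1i}}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + E\left[\frac{\beta_2 \sum_i (X_{1i}-\bar{X}_1)\,X_{2i}}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)\,u_i}{\sum_i (X_{1i}-\bar{X}_1)^2}\right] \\
&= 0
 + E\left[\frac{\beta_1 \sum_i (X_{1i}-\bar{X}_1)^2}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + \beta_2\, E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)\,u_i}{\sum_i (X_{1i}-\bar{X}_1)^2}\right] \\
&= \beta_1
 + \beta_2\, E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + E\left[E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)\,u_i}{\sum_i (X_{1i}-\bar{X}_1)^2} \,\Bigg|\, X_{11},\dots,X_{1i},\dots,X_{1n}\right]\right] \\
&= \beta_1
 + \beta_2\, E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
 + E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)\,E(u_i \mid X_{1i})}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]
\end{aligned}
$$

The first term vanishes because $\sum_i (X_{1i}-\bar{X}_1) = 0$, the second and third use (**), and the last term is zero by (*).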



So

$$E(\hat{\beta}_1) = \beta_1 + \beta_2\, E\left[\frac{\sum_i (X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{\sum_i (X_{1i}-\bar{X}_1)^2}\right]$$

The second term will only equal 0 in the case where $X_{1i}$ & $X_{2i}$ are uncorrelated (OVB 1 fails) or $\beta_2 = 0$ (OVB 2 fails), so $\hat{\beta}_1$ will be unbiased only if $Cov(X_1, X_2) = 0$ and/or $X_2$ is irrelevant.

Otherwise

$$E(\hat{\beta}_1) = \beta_1 + \beta_2\, E\left[\frac{s_{X_1 X_2}}{s^2_{X_1}}\right]$$

Thus, if $Cov(X_1, X_2) \neq 0$ and $\beta_2 \neq 0$ (OVB conditions 1 & 2), $\hat{\beta}_1$ will give you a biased estimate of $X_{1i}$'s expected impact on Y.

If we ignore the problem, we will reach very misleading conclusions.
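The bias formula can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration under assumed values: the true coefficients (beta1 = 1, beta2 = 2), the parameter delta that links X2 to X1, and the helper names `slope` and `simulate` are all hypothetical. With X2 = delta * X1 + noise and Var(X1) = 1, the formula predicts an average short-regression slope close to beta1 + beta2 * delta.

```python
import random

def slope(x, y):
    """Univariate OLS slope: sum((x - x_bar) * y) / sum((x - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    num = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

def simulate(beta1=1.0, beta2=2.0, delta=0.5, n=200, reps=500, seed=0):
    """Average the slope from regressing Y on X1 alone over many samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # X2 is correlated with X1 whenever delta != 0 (OVB 1).
        x2 = [delta * a + rng.gauss(0, 1) for a in x1]
        # X2 determines Y whenever beta2 != 0 (OVB 2).
        y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(slope(x1, y))  # X2 is omitted from the regression
    return sum(estimates) / reps

biased = simulate(delta=0.5)    # formula predicts about 1 + 2 * 0.5 = 2
unbiased = simulate(delta=0.0)  # OVB 1 fails, so the average should be near 1
print(round(biased, 2), round(unbiased, 2))
```

Setting beta2 = 0 instead of delta = 0 should remove the bias in the same way, since OVB 2 then fails.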



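Omitted variable bias means that OLS A1 ($E(v_i \mid X_{1i}) = 0$ in model (2)) is incorrect. Why?

Consider the previous example where we estimated

$$Y_i = \beta_0 + \beta_1 X_{1i} + v_i \qquad (2)$$

but the true relationship was

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (1)$$

Since $v_i = \beta_2 X_{2i} + u_i$, it is easy to show that in (2)

$$E(v_i \mid X_{1i}) = E(\beta_2 X_{2i} + u_i \mid X_{1i}) = \beta_2 E(X_{2i} \mid X_{1i}) + E(u_i \mid X_{1i}) = \beta_2 E(X_{2i} \mid X_{1i})$$

which will not equal 0 in general.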




The true relationship was

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

but we estimated

$$Y_i = \beta_0 + \beta_1 X_{1i} + v_i$$

where it turns out that $E(v_i \mid X_{1i}) = \beta_2 E(X_{2i} \mid X_{1i})$.

Recall that the error term (here, $v_i$) represents all factors (other than $X_i$) that are determinants of the dependent variable $Y_i$ (OVB 2).

If one of these factors is correlated with $X_i$ (OVB 1) then the error term will be correlated with $X_i$.

Since this violates OLS A1, OLS is no longer unbiased.



Consistency

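Furthermore, when you have OVB, OLS is not only biased, but also inconsistent. Let's see why.

From the previous proof, we see that

$$\hat{\beta}_1 = \beta_1 + \beta_2\,\frac{\frac{1}{n}\sum_i (X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{\frac{1}{n}\sum_i (X_{1i}-\bar{X}_1)^2} + \frac{\frac{1}{n}\sum_i (X_{1i}-\bar{X}_1)(u_i-\bar{u})}{\frac{1}{n}\sum_i (X_{1i}-\bar{X}_1)^2} = \beta_1 + \beta_2\,\frac{s_{X_1 X_2}}{s^2_{X_1}} + \frac{s_{X_1 u}}{s^2_{X_1}}$$

We know $s_{X_1 X_2} \xrightarrow{p} \sigma_{X_1 X_2}$, $s^2_{X_1} \xrightarrow{p} \sigma^2_{X_1}$, and $s_{X_1 u} \xrightarrow{p} \sigma_{X_1 u} = 0$.

Therefore:

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2\,\frac{\sigma_{X_1 X_2}}{\sigma^2_{X_1}} \neq \beta_1 \quad (\text{unless } \sigma_{X_1 X_2} = 0 \text{ or } \beta_2 = 0)$$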



Consistency

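Stock and Watson make the same point without explicit reference to $X_2$, but the conclusion is the same.

Here's what they argue.

Suppose that one (or more) variables have been omitted from the regression, meaning that X is now likely to be correlated with the error term u.

Let the correlation between $X_i$ and $u_i$ be represented by $\rho_{Xu}$, where $\rho_{Xu} = \dfrac{\sigma_{Xu}}{\sigma_X \sigma_u}$.

We can then see that

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i-\bar{X})\,u_i}{\frac{1}{n}\sum_i (X_i-\bar{X})^2} \xrightarrow{p} \beta_1 + \frac{\sigma_{Xu}}{\sigma^2_X} = \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$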



Some Things to Remember about OVB

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$

OVB is a problem whether the sample size is large or small. Why? Because $\hat{\beta}_1$ is inconsistent!

The magnitude of the bias depends on the correlation between the regressor and the omitted variable or, more generally, the error term.

Note that if $X_{1i}$ & $X_{2i}$ are uncorrelated there is no problem ($X_1$ can't absorb the effect of $X_2$ on Y).

The correlation between the regressor and the omitted variable (more generally, the error term) determines the direction (sign) of the bias.



Example: A Classic Question from Labor Economics

Will obtaining more education increase your potential earnings?

We have data² on employed males from the 1980 NLS.

wage = average monthly earnings (in 1980 $)

educ = years of education

² For more information on this dataset, see David Neumark and McKinley Blackburn, "Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials," Quarterly Journal of Economics, 107(94), 1992.


Labor Example

So an additional year of schooling is expected to increase average monthly earnings by about $60 (about $145 in today's dollars).

What do you think is missing here?

How about ability? Can we hope to measure it?



Labor Example

We could use a person's IQ score as a proxy (since we can sometimes get data on this).

Sure enough, IQ and education are positively correlated.

What do you think will happen to the coefficient on education if we control for IQ?

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2\,\frac{\sigma_{X_1 X_2}}{\sigma^2_{X_1}}$$



Labor Example

The expected impact of education on earnings is now smaller. Before, it was getting credit both for the school effect and for differences in raw intelligence.

Now we have estimated the returns to education, controlling for IQ.

So how do we do this in general, and how does it solve the omitted variable bias problem?



Multiple Regression

The solution to the omitted variable bias problem is to add (if you can) the other relevant variables to the regression.

Examples:

$$Wage = \beta_0 + \beta_1 Educ + \beta_2 Ability + u$$

$$TestScr = \beta_0 + \beta_1 STR + \beta_2 El\_Pct + u$$

Of course, the interpretation of the coefficients changes a little.

For example, $\beta_1$ is now the expected change in TestScr associated with a one unit change in STR, holding the percent of English learners (El_Pct) constant.

Now we are estimating the pure impact of STR on TestScr, controlling for this other variable.

So what does it take to add more variables to OLS?

Fortunately, adding additional regressors is really easy!


Estimation of the Multiple Regression Model


In the multiple regression model, we assume that the population regression line (the relationship that holds between Y and the Xs on average) is given by

$$E(Y_i \mid X_{1i}, \dots, X_{ki}) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$$

As in the univariate case, $\beta_0$ is the intercept and $\beta_k$ is the slope coefficient of $X_k$.

$\beta_0$ (the intercept) is the expected value of $Y_i$ when all the regressors (the $X_{ji}$'s) are zero.

$\beta_1$ (the slope coefficient of $X_1$) is the effect on Y (the expected change in Y) of a one unit change in $X_1$, holding all other variables constant (or controlling for all other variables).

$\beta_1$ may also be described as the partial effect on Y of $X_1$, holding all other variables constant.




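The population regression model is then given by

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$

where (by definition) $u_i \equiv Y_i - E(Y_i \mid X_{1i}, \dots, X_{ki})$.

As in the univariate case, the OLS residuals are still given by $\hat{u}_i = Y_i - \hat{Y}_i$, where

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}.$$

OLS minimizes the sum of the squared residuals $\sum_i \hat{u}_i^2$, yielding explicit formulas for the estimators $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$.

Since the formulas involve matrix algebra, we will not derive them here, but the intuition is the same as in the univariate case.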
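For readers curious about what the matrix version looks like in practice, here is a minimal sketch using NumPy on simulated data. The handout does not derive these formulas, so treat this as an outside illustration: the data-generating values (1, 2, -3) and all variable names are assumptions. The estimates solve the normal equations (X'X) b = X'y, which are the first-order conditions of the least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)       # regressors may be correlated
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + u        # hypothetical true coefficients

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# Solve the normal equations (X'X) b = X'y for the OLS estimates.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals are orthogonal to every column of X (the first-order conditions).
residuals = y - X @ beta_hat
print(beta_hat)
```

In applied work a least-squares routine such as `np.linalg.lstsq` is usually preferred to forming X'X explicitly, because it is numerically more stable.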






In Stata, estimation takes place as before; you simply list the additional regressors you wish to include alongside str.

For a one-student increase in the student-teacher ratio, we expect test scores to decrease by 1.1 points, holding all other variables constant.






The expected decrease was 2.28 before. It's gone down because el_pct and str are positively correlated and $\beta_2 < 0$.

Recall that

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2\,\frac{\sigma_{X_1 X_2}}{\sigma^2_{X_1}}$$

Intuitively, str was getting some of the blame that really belongs to el_pct.



OLS Assumptions

As in the univariate case, in order to have unbiasedness, consistency and asymptotic normality, we must make some assumptions.

Recall that the population regression model is

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$

OLS Assumption 1 (Linearity)³: $E(u_i \mid X_{1i}, \dots, X_{ki}) = 0$

OLS Assumption 2 (Simple random sample): $(Y_i, X_{1i}, \dots, X_{ki})$ are iid.

OLS Assumption 3 (No extreme outliers): $X_{1i}, \dots, X_{ki}, u_i$ have non-zero & finite fourth moments.

OLS Assumption 4 (No perfect collinearity): Regressors can't be written as linear combinations of each other.

³ Note that this condition also implies that the conditional expectation is zero given any subset of regressors. For example, $E[u_i \mid X_{1i}] = E[u_i \mid X_{ji}] = E[u_i \mid X_{1i}, X_{2i}] = 0$. Why? This is due to a more complicated version of LIE, according to which $E[Y \mid X] = E\big[E[Y \mid X, Z] \mid X\big]$.

OLS Assumption 4: No Perfect Collinearity



Assumption 4 is new! Why is it important?

Suppose $X_{2i} = a + bX_{1i}$ (where a and b are known constants):

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i = (\beta_0 + \beta_2 a) + (\beta_1 + \beta_2 b)\,X_{1i} + u_i$$

This is equivalent to the univariate regression, which we were able to estimate because the FOCs gave us two equations and two unknowns.

Now there are three unknowns, but still only two equations, so we can't identify the parameters.
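The "three unknowns, two equations" problem shows up numerically as a rank-deficient design matrix. The sketch below assumes illustrative constants a = 3 and b = 2 and simulated data; it only demonstrates the rank argument and is not taken from the handout.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 3.0 + 2.0 * x1          # X2 = a + b * X1 with a = 3, b = 2 (illustrative)

# Columns: intercept, X1, X2.
X = np.column_stack([np.ones(n), x1, x2])

# Three columns, but X2 is a linear combination of the other two,
# so only two columns are linearly independent.
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)
```

Because the rank is smaller than the number of columns, X'X is singular and the normal equations do not have a unique solution, which is the matrix counterpart of the identification failure described above.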



Examples of Perfect Collinearity

Fraction of English learners and % of English speakers, income and after tax income, dummies for both males and females.

This is not usually a problem in practice, since if you include perfectly collinear variables, your software will either give you back an error message, or drop as many regressors as necessary to make the remaining variables non-collinear.

Note that if two or more of the regressors are highly but not perfectly collinear, Assumption 4 is not violated and OLS can still be computed (although the affected coefficients may be estimated imprecisely).

In fact, a purpose of OLS is to sort out the independent effects of the various regressors when they are potentially correlated.


OLS Assumption 5? No, Thanks!



Heteroskedasticity & Homoskedasticity

The concepts of heteroskedasticity and homoskedasticity also carry over to the multiple regression model.

If we want to assume homoskedasticity (in general we won't), we would add

OLS Assumption 5 (Homoskedasticity): $Var(u_i \mid X_{1i}, \dots, X_{ki}) = \sigma_u^2$


Properties of Multivariate OLS

Assumptions A1 - A4 are sufficient to prove that OLS yields an unbiased and consistent estimator of the intercept and all the slopes.

They're also sufficient to prove that the $\hat{\beta}$s are asymptotically normal.

In this case, this means that not only is each coefficient $\hat{\beta}_j \overset{a}{\sim} N\big(\beta_j, \sigma^2_{\hat{\beta}_j}\big)$, but all the estimated coefficients are jointly normal.⁴

Adding OLS A5 makes OLS BLUE.

⁴ Recall (from Handout 2) that if a group of random variables are distributed joint normal, then each of them is normally distributed as well.

Hypothesis Testing and Confidence Intervals



Since each coefficient is asymptotically normal, tests and CIs related to one coefficient proceed just as before.

Hypothesis Testing

To test the hypothesis $H_0: \beta_j = \beta_{j,0}$ against the alternative $H_A: \beta_j \neq \beta_{j,0}$:

Compute the standard error of $\hat{\beta}_j$, $SE(\hat{\beta}_j)$.

Compute the t-statistic, $t^{act} = \dfrac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)}$.

Compute the p-value, p-value $= 2\Phi(-|t^{act}|)$, where $t^{act}$ is the value of the t-statistic actually computed.

Reject $H_0$ at the 5% significance level if the p-value is less than 0.05, or equivalently, if $|t^{act}| > 1.96$.


Confidence Intervals



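When the sample size is large, a 95% confidence interval for $\beta_j$ can be constructed as $\hat{\beta}_j \pm 1.96\,SE(\hat{\beta}_j)$ or:

$$\left[\hat{\beta}_j - 1.96\,SE(\hat{\beta}_j),\ \ \hat{\beta}_j + 1.96\,SE(\hat{\beta}_j)\right]$$

Remember that this confidence interval contains the true value of $\beta_j$ with a 95% probability (i.e. it contains the true value of $\beta_j$ in 95% of all possible randomly selected samples).

Equivalently, it is also the set of values of $\beta_j$ that cannot be rejected by a 5% two-sided hypothesis test.

99% CI for $\beta_j$: $\hat{\beta}_j \pm 2.575\,SE(\hat{\beta}_j)$

90% CI for $\beta_j$: $\hat{\beta}_j \pm 1.645\,SE(\hat{\beta}_j)$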


Example: Confidence Intervals

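Given a one percentage point increase in EL_PCT we expect TESTSCR to decrease by .65 points, holding all other variables constant.

95% CI for $\beta_2$: $(-.65 \pm 1.96 \times .03) = (-.7, -.59)$

99% CI for $\beta_2$: $(-.65 \pm 2.58 \times .03) = (-.73, -.57)$

95% CI for $\beta_1$: $(-1.1 \pm 1.96 \times .43) = (-1.94, -.26)$

99% CI for $\beta_1$: $(-1.1 \pm 2.58 \times .43) = (-2.2, .01)$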
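These intervals are simple enough to recompute by hand or in a few lines of code. The sketch below uses the rounded estimates and standard errors quoted on the slide; the function name `confidence_interval` is hypothetical, and small differences from the slide's endpoints may arise because the slide may have used unrounded values.

```python
def confidence_interval(estimate, std_error, critical_value=1.96):
    """Large-sample interval: estimate +/- critical_value * std_error."""
    half_width = critical_value * std_error
    return (estimate - half_width, estimate + half_width)

# Rounded values from the slide: coefficients on EL_PCT and STR.
ci_el_95 = confidence_interval(-0.65, 0.03)
ci_str_95 = confidence_interval(-1.1, 0.43)
ci_str_99 = confidence_interval(-1.1, 0.43, critical_value=2.58)

for label, ci in [("EL_PCT 95%", ci_el_95),
                  ("STR 95%", ci_str_95),
                  ("STR 99%", ci_str_99)]:
    print(label, tuple(round(bound, 2) for bound in ci))
```

Note that the 99% interval for the STR coefficient contains zero while the 95% interval does not, which matches a p-value between 0.01 and 0.05.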


Example: Hypothesis Testing

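$H_0: \beta_2 = 0$ vs. $H_A: \beta_2 \neq 0$

p-value $= 2\Phi\left(-\left|\dfrac{-.65}{.03}\right|\right) = 2\Phi(-20.9) \approx 0$

$H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$

p-value $= 2\Phi\left(-\left|\dfrac{-1.1}{.43}\right|\right) = 2\Phi(-2.54) \approx .011$

$H_0: \beta_1 = -1$ vs. $H_A: \beta_1 \neq -1$

p-value $= 2\Phi\left(-\left|\dfrac{-1.1 + 1}{.43}\right|\right) = 2\Phi(-.23) = .82$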
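The same calculations can be sketched with the Python standard library, using the identity Phi(z) = 0.5 * erfc(-z / sqrt(2)). The function names are hypothetical, and the inputs are the rounded numbers from the slide, so the resulting t-statistics can differ slightly from those shown above (which appear to be based on unrounded estimates).

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_sided_p_value(estimate, null_value, std_error):
    """Return (t, p) with t = (estimate - null) / SE and p = 2 * Phi(-|t|)."""
    t_stat = (estimate - null_value) / std_error
    return t_stat, 2 * normal_cdf(-abs(t_stat))

t_zero, p_zero = two_sided_p_value(-1.1, 0.0, 0.43)       # H0: beta_1 = 0
t_minus1, p_minus1 = two_sided_p_value(-1.1, -1.0, 0.43)  # H0: beta_1 = -1

# Reject at the 5% level when the p-value is below 0.05.
print(round(t_zero, 2), p_zero < 0.05)
print(round(t_minus1, 2), p_minus1 < 0.05)
```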


Tests Involving More than One Coefficient

But what if we want to test more complicated hypotheses?

For example, what if the null is related to more than one coefficient?

1. Example 1: $H_0: \beta_1 = \beta_2$ vs. $H_A: \beta_1 \neq \beta_2$
2. Example 2: $H_0: \beta_1 + \beta_2 = 1$ vs. $H_A: \beta_1 + \beta_2 \neq 1$
3. Example 3: $H_0: \beta_1 = 0$ & $\beta_2 = 0$ vs. $H_A: \beta_1 \neq 0$ and/or $\beta_2 \neq 0$

With 1 or 2, we can transform the regression.

Let's see how.





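The regression reported above was

$$TESTSCR_i = \beta_0 + \beta_1 STR_i + \beta_2 EL\_PCT_i + u_i$$

For $H_0: \beta_1 = \beta_2$, $H_A: \beta_1 \neq \beta_2$

we can run the following regression

$$TESTSCR_i = \beta_0 + \gamma\, STR_i + \beta_2 (STR_i + EL\_PCT_i) + u_i$$

and test $H_0: \gamma = 0$, $H_A: \gamma \neq 0$.

How is this a test of the null above?

$$TESTSCR_i = \beta_0 + (\beta_2 + \gamma)\, STR_i + \beta_2 EL\_PCT_i + u_i$$

Comparing with the original regression, $\beta_1 = \beta_2 + \gamma$, so $\gamma = 0$ exactly when $\beta_1 = \beta_2$.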





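For $H_0: \beta_1 + \beta_2 = 1$, $H_A: \beta_1 + \beta_2 \neq 1$

we can run

$$TESTSCR_i - STR_i = \beta_0 + \gamma\, STR_i + \beta_2 (EL\_PCT_i - STR_i) + u_i$$

and test $H_0: \gamma = 0$, $H_A: \gamma \neq 0$.

$$TESTSCR_i = \beta_0 + (\gamma + 1 - \beta_2)\, STR_i + \beta_2 EL\_PCT_i + u_i$$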
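Both transformations can be checked numerically: the coefficient on STR in each transformed regression should equal the corresponding combination of coefficients from the original regression. The sketch below uses simulated stand-ins for STR, EL_PCT and TESTSCR (not the real data set), and the helper `ols` is a hypothetical convenience function built on NumPy's least-squares routine.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated stand-ins for the variables; all numbers are illustrative.
rng = np.random.default_rng(0)
n = 400
str_ = rng.normal(20, 2, size=n)
el_pct = 2.0 * (str_ - 20) + rng.normal(15, 5, size=n)
testscr = 680 - 1.1 * str_ - 0.65 * el_pct + rng.normal(0, 10, size=n)

# Original regression: TESTSCR on STR and EL_PCT.
b0, b1, b2 = ols(testscr, str_, el_pct)

# Example 1: regress on STR and (STR + EL_PCT); gamma equals b1 - b2.
_, gamma_eq, _ = ols(testscr, str_, str_ + el_pct)

# Example 2: regress (TESTSCR - STR) on STR and (EL_PCT - STR);
# gamma equals b1 + b2 - 1.
_, gamma_sum, _ = ols(testscr - str_, str_, el_pct - str_)

print(np.isclose(gamma_eq, b1 - b2, atol=1e-6),
      np.isclose(gamma_sum, b1 + b2 - 1, atol=1e-6))
```

Because the transformed regressions span the same set of regressors as the original one, the fit is unchanged; the benefit is that the standard error reported for the coefficient on STR is the one needed for the t-test of each null.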

To handle Example 3, we need some additional tools.
