Instrumental Variables: Problems


Methods of Economic Investigation

Lecture 16

Last Time: IV

• Monotonicity

• Exclusion restriction

• Can we test our exclusion restriction? Overidentification test; separate regression tests

Today’s Class: Issues with Instrumental Variables

• Heterogeneous treatment effects: interpretation in the LATE framework

• Weak instruments: bias in 2SLS, asymptotic properties, problems when the first stage is not very big

Heterogeneous Treatment Effects: Recall our counterfactual worlds

Each individual has two potential outcomes, Y0 and Y1.

We only observe one of these for any given individual.

Now define a counterfactual for S as well: S1 is the value of S if Z = 1, and S0 is the value of S if Z = 0. Again, we only ever observe one of these for any given individual.

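[Figure: The Counterfactual Individual U. A tree that splits on Z = 1 vs. Z = 0, then on the counterfactual treatment (S1U = 1 or S1U = 0 under Z = 1; S0U = 0 or S0U = 1 under Z = 0), with the potential outcomes Y1U and Y0U shown at each branch.]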

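The ITT is E[Y | Z=1] − E[Y | Z=0] (red vs. orange nodes in the figure).

The observed difference is E[Y | S=1] − E[Y | S=0] (light blue vs. dark blue nodes).

The ATE is E[Y1U − Y0U]: with homogeneous effects it does not matter which node we evaluate it at; with heterogeneous effects, it is the average across the nodes.

The LATE is the average effect for compliers, the individuals with S1U = 1 and S0U = 0: E[Y | S1U=1, S0U=0, Z=1] − E[Y | S1U=1, S0U=0, Z=0] = E[Y1U − Y0U | S1U=1, S0U=0].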
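To make these four quantities concrete, here is a minimal simulation sketch under assumptions of our own: a hypothetical population that is 20% always takers, 30% never takers and 50% compliers (no defiers), a randomly assigned binary Z, and a treatment effect of 1 for compliers and 3 for everyone else. The function names and all numbers are illustrative only.

```python
import random


def simulate(n=100_000, seed=0):
    """Draw a hypothetical population with heterogeneous treatment effects.

    Each dict holds both counterfactual treatments (s0, s1), both potential
    outcomes (y0, y1), and what we actually observe (z, s, y).
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        u = rng.random()
        if u < 0.2:
            s0, s1, effect = 1, 1, 3.0   # always taker
        elif u < 0.5:
            s0, s1, effect = 0, 0, 3.0   # never taker
        else:
            s0, s1, effect = 0, 1, 1.0   # complier (no defiers)
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + effect
        z = rng.randint(0, 1)            # randomly assigned instrument
        s = s1 if z == 1 else s0         # we observe only one of S0, S1
        y = y1 if s == 1 else y0         # and only one of Y0, Y1
        people.append({"s0": s0, "s1": s1, "y0": y0, "y1": y1,
                       "z": z, "s": s, "y": y})
    return people


def mean(values):
    values = list(values)
    return sum(values) / len(values)


people = simulate()
itt = (mean(p["y"] for p in people if p["z"] == 1)
       - mean(p["y"] for p in people if p["z"] == 0))
observed = (mean(p["y"] for p in people if p["s"] == 1)
            - mean(p["y"] for p in people if p["s"] == 0))
ate = mean(p["y1"] - p["y0"] for p in people)
late = mean(p["y1"] - p["y0"] for p in people
            if p["s1"] == 1 and p["s0"] == 0)

print(f"ITT={itt:.2f} observed={observed:.2f} ATE={ate:.2f} LATE={late:.2f}")
```

In this made-up population the ITT is roughly half the LATE because only half the population complies, and the LATE (1) differs from the ATE (about 2) because compliers happen to have smaller effects than the other types.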

Writing the first stage as counterfactuals: We can now define our variable of interest S as follows:

Si = S0i + (S1i − S0i)Zi = π0 + π1i Zi + νi

In this specification: π0 = E[S0i]

π1i = S1i − S0i

νi = S0i − E[S0i]

E[π1i] = E[S1i − S0i]: the average effect of Z on S. This is just our ATE for the first-stage regression.

Exclusion Restriction: The instrument operates only through the channel of the variable of interest. With homogeneous effects we describe this as E[ηZ] = 0.

In potential-outcome notation: for any value of S (i.e., S = 0, 1), Y(S, 0) = Y(S, 1).

Another way to think of this is that the exclusion restriction says we only want to look at the part of S that varies with Z.

The set of potential outcomes: Use the exclusion restriction to define potential outcomes with Y(S, Z):

Y1i = Y(1,1) = Y(1,0) = Y(S=1)

Y0i = Y(0,1) = Y(0,0) = Y(S=0)

Rewrite the observed outcome as: Yi = Yi(0, Zi) + [Yi(1, Zi) − Yi(0, Zi)]Si

= Y0i + (Y1i − Y0i)Si

= α0 + ρiSi + ηi, where α0 = E[Y0i], ρi = Y1i − Y0i, and ηi = Y0i − E[Y0i]

S is the unique channel through which the instrument operates.

Monotonicity: For the set of individuals affected by the instrument, the instrument must have an effect in the same direction. It can have no effect on some people (e.g., always takers, never takers). For those it does affect (e.g., compliers), it must be that π1i > 0 for all such i, or π1i < 0 for all such i,

where Si = π0 + π1i Zi + νi

In terms of the counterfactuals, it must be the case that S1i ≥ S0i for all i (or S1i ≤ S0i for all i).
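Monotonicity is what lets us read the first stage as the share of compliers. The sketch below uses hypothetical type shares (and illustrative function names) to compute E[S1i − S0i] and Pr[S1i > S0i] from the four compliance types; with binary S, S1i − S0i is +1 for compliers, −1 for defiers and 0 otherwise, so the two coincide only when there are no defiers.

```python
from fractions import Fraction

# Counterfactual treatments (S0, S1) for each compliance type.
TYPES = {
    "always taker": (1, 1),
    "never taker": (0, 0),
    "complier": (0, 1),
    "defier": (1, 0),
}


def first_stage(shares):
    """E[S1 - S0]: the average effect of Z on S (the first-stage ATE)."""
    return sum(shares[t] * (s1 - s0) for t, (s0, s1) in TYPES.items())


def complier_share(shares):
    """Pr[S1 > S0]: the share of the population moved into treatment by Z."""
    return sum(shares[t] for t, (s0, s1) in TYPES.items() if s1 > s0)


# Hypothetical type shares; each dict sums to one.
monotone = {"always taker": Fraction(2, 10), "never taker": Fraction(3, 10),
            "complier": Fraction(5, 10), "defier": Fraction(0)}
with_defiers = {"always taker": Fraction(2, 10), "never taker": Fraction(3, 10),
                "complier": Fraction(4, 10), "defier": Fraction(1, 10)}

print(first_stage(monotone), complier_share(monotone))          # 1/2 1/2
print(first_stage(with_defiers), complier_share(with_defiers))  # 3/10 2/5
```

With defiers present, the first stage (3/10) understates the complier share (2/5), because defiers cancel out some of the compliers.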

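Back to LATE: Given these assumptions (together with random assignment of Z), the IV estimand is the LATE:

$$\frac{E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0]}{E[S_i \mid Z_i=1] - E[S_i \mid Z_i=0]} = E[Y_{1i} - Y_{0i} \mid S_{1i} > S_{0i}]$$

To see why, note the following. Conditioning on Zi = 1 sets Si = S1i, and because Zi is independent of (Y0i, Y1i, S0i, S1i) we can then drop the conditioning:

E[Yi | Zi=1] = E[Y0i + (Y1i − Y0i)Si | Zi=1]

= E[Y0i + (Y1i − Y0i)S1i]

E[Yi | Zi=0] = E[Y0i + (Y1i − Y0i)S0i]

E[Si | Zi=1] − E[Si | Zi=0] = E[S1i − S0i] = Pr[S1i > S0i], where the last equality uses monotonicity (with binary S, S1i − S0i is 1 for compliers and 0 for everyone else).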

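LATE continued: Substituting these equalities into our formula we get:

$$\frac{E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0]}{E[S_i \mid Z_i=1] - E[S_i \mid Z_i=0]} = \frac{E[(Y_{1i} - Y_{0i})(S_{1i} - S_{0i})]}{E[S_{1i} - S_{0i}]} = \frac{E[Y_{1i} - Y_{0i} \mid S_{1i} > S_{0i}]\,\Pr(S_{1i} > S_{0i})}{\Pr(S_{1i} > S_{0i})}$$

(The second step uses monotonicity: S1i − S0i equals 1 for compliers and 0 for everyone else.)

We are left with our LATE estimate:

$$E[Y_{1i} - Y_{0i} \mid S_{1i} > S_{0i}]$$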
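A minimal sketch of the sample analogue of this ratio (the Wald estimator), applied to a simulated population of our own design: compliers gain 1, always takers and never takers gain 3, and always takers also have a higher Y0, so a naive comparison by S would be contaminated by selection. Under these assumptions the estimate should land near the compliers' effect of 1 rather than the ATE of about 2. Function names and parameter values are hypothetical.

```python
import random


def wald(z, s, y):
    """Sample analogue of (E[Y|Z=1] - E[Y|Z=0]) / (E[S|Z=1] - E[S|Z=0])."""
    def diff_in_means(v):
        treated = [vi for vi, zi in zip(v, z) if zi == 1]
        control = [vi for vi, zi in zip(v, z) if zi == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)

    return diff_in_means(y) / diff_in_means(s)   # reduced form / first stage


def simulate(n=100_000, seed=1):
    """Hypothetical population: 20% always takers, 30% never takers, 50% compliers."""
    rng = random.Random(seed)
    z, s, y = [], [], []
    for _ in range(n):
        u = rng.random()
        if u < 0.2:
            s0, s1, effect, base = 1, 1, 3.0, 1.0   # always takers: higher Y0
        elif u < 0.5:
            s0, s1, effect, base = 0, 0, 3.0, 0.0   # never takers
        else:
            s0, s1, effect, base = 0, 1, 1.0, 0.0   # compliers
        zi = rng.randint(0, 1)                      # randomly assigned Z
        si = s1 if zi == 1 else s0
        y0 = base + rng.gauss(0.0, 1.0)
        z.append(zi)
        s.append(si)
        y.append(y0 + effect * si)
    return z, s, y


z, s, y = simulate()
estimate = wald(z, s, y)
print(f"Wald / LATE estimate: {estimate:.2f}")   # should be close to 1
```

Always takers and never takers contribute nothing to either difference in means (their S does not move with Z), which is why their larger effects never show up in the estimate.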

How to Interpret the LATE: Remember that the exclusion restriction gives Y(S=1, Z=0) = Y(S=1, Z=1) = Y(S=1). With homogeneous effects, Y1i − Y0i is the same for everyone, so it does not matter whose effect the instrument identifies. In the case of heterogeneous effects this is not true: the LATE will generally not be the same as the ATE.

Our estimate is "local" to the set of people our instrument affects (the compliers). Is this a group we care about on its own? Is there a theory of how this group's effect size might relate to other groups' effects?

Finite Sample Problems: This is a very complicated topic.

There are exact results for special cases and approximations for more general cases.

It is hard to say anything that is definitely true, but we can give useful guidance.

With sufficiently strong instruments in a sufficiently large finite sample, you're fine.

Weak instruments generate 3 problems: bias, incorrect measurement of the variance, and a non-normal distribution.

Some Intuition for Why the Strength of Instruments Is Important: Consider a very strong instrument.

Z can explain a lot of the variation in s, so s-hat is very close to s.

Think of the limiting case where the correlation is perfect: then s-hat = s, the IV estimator is identical to the OLS estimator, and it will have the same distribution. If the errors are normal, this is the same as the asymptotic distribution.

What if we have a weak instrument? Think of the extreme case where the true correlation between s and Z is zero, so Z is useless.

The first stage tries to find some correlation anyway, so the estimated coefficients will not generally be exactly zero and there will be some variation in s-hat.

There is no reason to believe s-hat contains more 'good' variation than s itself.

So the central tendency is the OLS estimate, but with a lot more noise, so a very big variance.
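A small Monte Carlo sketch of this intuition, under a data-generating process we assume for illustration: y = βx + ε, x = πz + u, unit-variance normal errors with corr(ε, u) = 0.5, and β = 1, so the OLS probability limit is 1.5 when π = 0. All function names and parameter values are hypothetical. With a useless instrument the IV draws should be centered near the OLS value with a very wide spread; with a strong instrument they should be centered near β.

```python
import random
import statistics


def draw_sample(n, pi, rho, beta, rng):
    """Hypothetical DGP: y = beta*x + eps, x = pi*z + u, corr(eps, u) = rho."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)
        # eps has unit variance and covariance rho with u, so x is endogenous.
        ei = rho * ui + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        xi = pi * zi + ui
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ei)
    return z, x, y


def iv(z, x, y):
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))


def ols(x, y):
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def monte_carlo(pi, reps=2000, n=100, rho=0.5, beta=1.0, seed=2):
    rng = random.Random(seed)
    iv_draws, ols_draws = [], []
    for _ in range(reps):
        z, x, y = draw_sample(n, pi, rho, beta, rng)
        iv_draws.append(iv(z, x, y))
        ols_draws.append(ols(x, y))
    return iv_draws, ols_draws


def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


weak_iv, weak_ols = monte_carlo(pi=0.0)   # Z is useless
strong_iv, _ = monte_carlo(pi=1.0)        # Z explains half the variance of x

print("median IV, useless Z:", statistics.median(weak_iv))
print("median OLS, same data:", statistics.median(weak_ols))
print("median IV, strong Z:", statistics.median(strong_iv))
print("IQR of IV, useless vs strong Z:", iqr(weak_iv), iqr(strong_iv))
```

Medians and interquartile ranges are reported instead of means and standard deviations because, as the following slides explain, the IV estimator in this just-identified setup has no finite mean.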

A Simple Example: One endogenous variable, no exogenous variables, one instrument. All variables are known to be mean zero, so we estimate the equations without intercepts:

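$$y_i = \beta x_i + \varepsilon_i$$

$$x_i = \pi z_i + u_i$$

$$\hat\beta_{IV} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i} = \beta + \frac{\sum_i z_i \varepsilon_i}{\sum_i z_i x_i}$$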
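A minimal sketch of the simple example, with parameter values chosen by us for illustration (β = 2, π = 0.8, and ε correlated with u so that x is endogenous). It computes the IV estimate as Σziyi / Σzixi and checks numerically that its deviation from β is exactly the term Σziεi / Σzixi. The function name is hypothetical.

```python
import random


def iv_no_intercept(z, x, y):
    """Just-identified IV with mean-zero variables: sum(z*y) / sum(z*x)."""
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))


rng = random.Random(0)
beta, pi, n = 2.0, 0.8, 500                          # hypothetical true values
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
eps = [0.5 * ui + rng.gauss(0.0, 1.0) for ui in u]   # correlated with u: x is endogenous
x = [pi * zi + ui for zi, ui in zip(z, u)]
y = [beta * xi + ei for xi, ei in zip(x, eps)]

beta_iv = iv_no_intercept(z, x, y)
# The estimation error is exactly the ratio sum(z*eps) / sum(z*x).
error_term = sum(zi * ei for zi, ei in zip(z, eps)) / sum(zi * xi for zi, xi in zip(z, x))
print(beta_iv, beta + error_term)   # equal up to floating-point rounding
```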

Finite Sample Problems 1 and 2: To address the issue of bias, we want to take the expectation of the final term, Σziεi / Σzixi, and would like it to be zero.

Problem: the mean does not exist. There are 'fat tails', i.e., a sizeable probability of getting a very large outcome. This happens when Σzixi is small, which is more likely when instruments are weak.

There is a similar issue for variance estimation.

Finite Sample Problem 3

zi is non-stochastic.

(εi, ui) have a joint normal distribution with mean zero, variances σ²ε and σ²u, and covariance σεu.

If σεu = 0, there is no endogeneity problem and the OLS estimator is consistent.

If σεu ≠ 0, there is an endogeneity problem and the OLS estimator is inconsistent.

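IV Estimator for this special case:

$$\hat\beta_{IV} - \beta = \frac{\sum_i z_i \varepsilon_i}{\sum_i z_i x_i} = \frac{\frac{1}{n}\sum_i z_i \varepsilon_i}{\pi\,\frac{1}{n}\sum_i z_i^2 + \frac{1}{n}\sum_i z_i u_i}$$

Both the numerator and the denominator of the final term are linear combinations of normal random variables (plus a constant, in the denominator), so they are also normally distributed.

So the deviation of the IV estimator from β is a ratio of two (correlated) normal random variables.

Sounds simple, but it isn't.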
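Treating zi as fixed, the moments of the numerator N and denominator D of the final term follow directly from the assumptions on (εi, ui). Writing them out (our own derivation, not from the original slides) shows exactly which two normals enter the ratio:

```latex
\begin{aligned}
N &= \frac{1}{n}\sum_i z_i \varepsilon_i \sim \mathcal{N}\!\left(0,\ \frac{\sigma_\varepsilon^2}{n^2}\sum_i z_i^2\right),\\
D &= \pi\,\frac{1}{n}\sum_i z_i^2 + \frac{1}{n}\sum_i z_i u_i \sim \mathcal{N}\!\left(\frac{\pi}{n}\sum_i z_i^2,\ \frac{\sigma_u^2}{n^2}\sum_i z_i^2\right),\\
\operatorname{Cov}(N, D) &= \frac{\sigma_{\varepsilon u}}{n^2}\sum_i z_i^2,
\qquad \hat\beta_{IV} - \beta = \frac{N}{D}.
\end{aligned}
```

The mean of D is proportional to π, so when the instrument is weak D has substantial probability mass near zero, which is where the fat tails come from.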

A Very Special Case: π = σεu = 0

x is exogenous and Z is useless (basically, OLS would be okay, but maybe you don't know this). In this case the numerator and denominator in

$$\hat\beta_{IV} - \beta = \frac{\frac{1}{n}\sum_i z_i \varepsilon_i}{\frac{1}{n}\sum_i z_i u_i}$$

• are independent normal random variables with mean zero (ε and u are independent, since they are jointly normal with σεu = 0)

• so the IV estimator has a Cauchy distribution, which has no mean (or other moments)
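A quick simulation check of the Cauchy result, assuming for illustration σε = σu = 1, so that β̂IV − β should be standard Cauchy. Following the slide's setup, z is drawn once and then held fixed across replications. The function name and settings are hypothetical; the benchmarks are the standard Cauchy quartiles (−1 and +1) and Pr(|C| > 10) = 1 − (2/π)·arctan(10).

```python
import math
import random
import statistics


def cauchy_case_draws(reps=5000, n=50, seed=3):
    """Draws of beta_hat_IV - beta when pi = 0 and eps, u are independent N(0, 1).

    z is drawn once and held fixed across replications (non-stochastic z).
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    draws = []
    for _ in range(reps):
        num = sum(zi * rng.gauss(0.0, 1.0) for zi in z)   # sum of z*eps
        den = sum(zi * rng.gauss(0.0, 1.0) for zi in z)   # sum of z*x, with x = u
        draws.append(num / den)
    return draws


draws = cauchy_case_draws()
q1, q2, q3 = statistics.quantiles(draws, n=4)
tail_share = sum(abs(d) > 10 for d in draws) / len(draws)

print("quartiles:", q1, q2, q3)   # standard Cauchy: -1, 0, +1
print("share with |error| > 10:", tail_share, "vs", 1 - 2 / math.pi * math.atan(10))
```

Averaging more draws would not help here: the sample mean of independent Cauchy draws is itself Cauchy and never settles down.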

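Rules of Thumb: With normal errors, the mean of the IV estimator exists only if there is at least one over-identifying restriction, so the just-identified estimator has no mean.

• Where the mean does not exist, the median can probably be used as a measure of the central tendency of the IV estimator.

• Where the mean exists, the bias of IV relative to OLS shrinks as the first-stage F-statistic grows; this is where the rule of thumb on the F-stat comes from.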

What to do - 1: Report the first stage and think about whether it makes sense. Are the magnitude and sign as you would expect, or are the estimates too big, or large but wrong-signed?

Report the F-statistic on the excluded instruments. The bigger this is, the better. A general suggestion: F-statistics above about 10 put you in the safe zone.
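A minimal sketch of the first-stage F-statistic, assuming the setting of the simple example: one excluded instrument, no other regressors, and mean-zero variables, so the first stage is estimated without an intercept. With a single instrument the F-statistic is just the squared t-statistic on the first-stage coefficient. Function names and parameter values are hypothetical; with several instruments or controls you would need the joint F-test from a full regression instead.

```python
import random


def first_stage_f(z, x):
    """F-statistic on one excluded instrument, no intercept (mean-zero variables).

    With a single instrument this equals the squared t-statistic on pi_hat
    in the first-stage regression x = pi*z + u.
    """
    n = len(z)
    szz = sum(zi * zi for zi in z)
    pi_hat = sum(zi * xi for zi, xi in zip(z, x)) / szz
    resid_var = sum((xi - pi_hat * zi) ** 2 for zi, xi in zip(z, x)) / (n - 1)
    se_pi = (resid_var / szz) ** 0.5
    return (pi_hat / se_pi) ** 2


def make_first_stage(pi, n=1000, seed=4):
    """Hypothetical first stage x = pi*z + u with standard normal z and u."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [pi * zi + rng.gauss(0.0, 1.0) for zi in z]
    return z, x


f_strong = first_stage_f(*make_first_stage(pi=0.5))
f_weak = first_stage_f(*make_first_stage(pi=0.05))
print(f_strong, f_weak)
```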

What to do - 2: Pick your best single instrument and report just-identified estimates using this one only. Just-identified IV is median-unbiased and therefore unlikely to be subject to a weak-instruments critique.

Look at the coefficients, t-statistics, and F-statistics for the excluded instruments in the reduced-form regression of the dependent variable on the instruments. Remember that the reduced form is proportional to the causal effect of interest. Most importantly, the reduced-form estimates, since they are OLS, are unbiased.
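In the just-identified case the proportionality is exact in the sample: the reduced-form coefficient divided by the first-stage coefficient reproduces the IV estimate (this is sometimes called indirect least squares). A minimal sketch under an illustrative data-generating process of our own (β = 1.5, π = 0.6, mean-zero variables, no intercepts); the helper name is hypothetical.

```python
import random


def ols_through_origin(w, v):
    """Slope from regressing v on w without an intercept."""
    return sum(a * b for a, b in zip(w, v)) / sum(a * a for a in w)


rng = random.Random(5)
beta, pi, n = 1.5, 0.6, 5000                      # hypothetical true values
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [pi * zi + ui for zi, ui in zip(z, u)]
# The structural error 0.7*u + noise is correlated with x, so x is endogenous.
y = [beta * xi + 0.7 * ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)]

first_stage = ols_through_origin(z, x)    # estimates pi
reduced_form = ols_through_origin(z, y)   # estimates pi * beta, by OLS
iv = sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))

print(reduced_form / first_stage, iv)     # identical up to rounding
```

Because the reduced form is an OLS regression of y on the exogenous z, its sign and significance can be read directly, without the finite-sample problems of the ratio.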

Next time: Maximum Likelihood Estimation

Two uses:

• LIML as an alternative to 2SLS

• Discrete choice models (logit, probit, etc.)
