Instrumental Variables Estimation (with Examples from Criminology)
Robert Apel, Ph.D., School of Criminal Justice, University at Albany
Center for Social and Demographic Analysis, University at Albany, May 5 & 7, 2009
Vital Statistics
Ph.D., Criminology and Criminal Justice, 2004
– University of Maryland
Coursework in Department of Economics
Dissertation used instrumental variables
– State child labor laws as instrumental variables for the causal effect of youth employment on antisocial behavior
Topics That Will Be Covered in this Workshop
Why use IV?
– Discussion of endogeneity bias
– Statistical motivation for IV
What is an IV?
– Identification issues
– Statistical properties of IV estimators
How is an IV model estimated?
– Software and data examples
– Diagnostics: IV relevance, IV exogeneity, Hausman
Review of the Linear Model
Population model: Y = α + βX + ε
– Assume that the true slope is positive, so β > 0
Sample model: Y = a + bX + e
– Least squares (LS) estimator of β:
bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)
Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?
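As a minimal sketch of the covariance form of the LS estimator, the snippet below computes a and b on a small made-up data set. The helper names (`mean`, `cov`, `ls_fit`) and the numbers are illustrative assumptions, not part of the workshop materials.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def ls_fit(x, y):
    """Return (a, b) for the sample model Y = a + bX + e."""
    b = cov(x, y) / cov(x, x)      # bLS = Cov(X,Y) / Var(X)
    a = mean(y) - b * mean(x)      # intercept passes through the means
    return a, b

# Hypothetical data generated exactly by Y = 1 + 2X (no error term).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
a_hat, b_hat = ls_fit(x, y)
print(a_hat, b_hat)
```

With no error term the fit recovers the intercept and slope used to build the data; the causal question on the slide only arises once an error term correlated with X enters.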
Review of the Linear Model
Key assumption of the linear model:
E(X′e) = Cov(X,e) = E(e | X) = 0
– Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y
Important statistical property of the LS estimator under exogeneity:
E(bLS) = β + Cov(X,e) / Var(X)
plim(bLS) = β + Cov(X,e) / Var(X)
Under exogeneity the second term in each expression equals 0, so bLS is unbiased and consistent
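The decomposition bLS = β + Cov(X,e) / Var(X) can be checked on simulated data, where the error e is known by construction. This is a sketch under assumed values (β = 2, normal draws, a hypothetical parameter `rho` that controls how strongly e depends on X); in real data e is never observed.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def simulate(rho, n=5000, alpha=1.0, beta=2.0, seed=0):
    """Hypothetical data where e = rho * X + noise, so rho drives Cov(X, e)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    e = [rho * xi + rng.gauss(0, 1) for xi in x]
    y = [alpha + beta * xi + ei for xi, ei in zip(x, e)]
    return x, e, y

results = {}
for rho in (0.0, 0.5):
    x, e, y = simulate(rho)
    b_ls = cov(x, y) / cov(x, x)
    bias_term = cov(x, e) / cov(x, x)   # computable only because e was simulated
    results[rho] = (b_ls, bias_term)
    print(f"rho={rho}: bLS={b_ls:.3f}, Cov(X,e)/Var(X)={bias_term:.3f}")
```

When `rho` is 0 the bias term is close to zero and bLS sits near β; when `rho` is 0.5 the estimate shifts by roughly that amount.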
Endogeneity and the Evaluation Problem
When is the exogeneity assumption violated?
– Measurement error → Attenuation bias
– Instantaneous causation → Simultaneity bias
– Omitted variables → Selection bias
Selection bias is the problem in observational research that undermines causal inference
– Measurement error and instantaneous causation can be posed as problems of omitted variables
When Is the Exogeneity Assumption Violated?
(1) Measurement error in X (u) that is correlated with M.E. in Y (v) or with the model error (e)
– Classical M.E. leads to attenuation, 0 < E(bLS) < β, but non-random M.E. (or correlation between M.E. and X, Y, V, and/or e) introduces unknown biases
And, if there are multiple X’s, bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. “smearing”)
[Diagram: X → Y, with measurement error u attached to X, measurement error v attached to Y, and model error e]
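Attenuation under classical measurement error can be illustrated with a short simulation. The sketch assumes a true slope of 2 and equal variances for the true X and the error u, so the textbook attenuation factor Var(X*) / [Var(X*) + Var(u)] is about one half; all names and values are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
n = 20000
beta = 2.0

x_true = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]          # classical M.E.: independent noise
x_obs = [xt + ui for xt, ui in zip(x_true, u)]   # what the analyst actually sees
y = [beta * xt + rng.gauss(0, 1) for xt in x_true]

b_true_x = cov(x_true, y) / cov(x_true, x_true)  # infeasible regression on true X
b_obs_x = cov(x_obs, y) / cov(x_obs, x_obs)      # feasible regression on noisy X
reliability = cov(x_true, x_true) / cov(x_obs, x_obs)

print(f"slope with true X:  {b_true_x:.3f}")
print(f"slope with noisy X: {b_obs_x:.3f}")
print(f"beta * reliability: {beta * reliability:.3f}")
```

The slope on the noisy X stays positive but shrinks toward zero, which is the 0 < E(bLS) < β pattern described above.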
When Is the Exogeneity Assumption Violated?
(2) Instantaneous causation of Y on X
– Direction of the bias depends on the sign of the feedback effect, Y → X
If positive, E(bLS) > β, so overestimate true effect
If negative, E(bLS) < β, so underestimate true effect, and in severe cases can even flip the sign so that E(bLS) < 0 even though β > 0
[Diagram: X ⇄ Y]
This non-recursivity complicates the relationship between price and quantity in economics
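The sign flip under negative feedback can be reproduced in a small simulation. The sketch assumes a two-equation system Y = βX + e and X = γY + v with β = 0.5 and γ = –0.8 (both values are hypothetical), solves it for X, and compares bLS with the large-sample value implied by β + Cov(X,e) / Var(X).

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

rng = random.Random(2)
n = 20000
beta, gamma = 0.5, -0.8        # true effect X -> Y, and negative feedback Y -> X

x, y = [], []
for _ in range(n):
    e, v = rng.gauss(0, 1), rng.gauss(0, 1)
    xi = (gamma * e + v) / (1 - beta * gamma)   # X after solving both equations
    x.append(xi)
    y.append(beta * xi + e)

b_ls = cov(x, y) / cov(x, x)
# Large-sample value with unit-variance e and v: beta + Cov(X,e) / Var(X).
expected = beta + gamma * (1 - beta * gamma) / (gamma ** 2 + 1)

print(f"true beta = {beta}, bLS = {b_ls:.3f}, implied plim = {expected:.3f}")
```

Under these assumed values the true effect is positive, yet bLS comes out negative.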
When Is the Exogeneity Assumption Violated?
(3) Omitted variable (W) that is correlated with both X and Y
– Classic problem of omitted variables bias
Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)
[Diagram: X → Y, with W affecting both X and Y]
Things are more complicated in applied settings because there are bound to be many W’s, not to mention that the “smearing” problem applies in this context also
Example #1: Police Hiring
Measurement error
– Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size
Instantaneous causation
– More police might be hired during a crime wave
Omitted variables
– Large departments may differ in fundamental ways difficult to measure (e.g., urban, heterogeneous)
Example #2: Sanction Perceptions
Measurement error
– Measures of perceived sanction risk are probably “noisy” (M.E. in X), resulting in attenuation at best
Instantaneous causation
– Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative
Omitted variables
– Perceived risk probably correlated with unobserved determinants of crime (e.g., intelligence)
Example #3: Delinquent Peers
Measurement error
– Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)
Instantaneous causation
– If there is influence/imitation, then it is bidirectional
Omitted variables
– High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)
Regression Estimation Ignoring Omitted Variables
Suppose we estimate treatment effect model:
Y = α + βX + ε
– Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated)
Least squares estimator:
bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0)
– Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)
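The equality between the LS slope and the difference in group means for a binary X can be verified by hand or with a few lines of code. The five observations below are made up purely for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

# Hypothetical sample: three untreated units (X = 0), two treated units (X = 1).
x = [0, 0, 0, 1, 1]
y = [1.0, 2.0, 3.0, 5.0, 7.0]

b_ls = cov(x, y) / cov(x, x)

treated = [yi for xi, yi in zip(x, y) if xi == 1]
untreated = [yi for xi, yi in zip(x, y) if xi == 0]
diff_in_means = mean(treated) - mean(untreated)

print(b_ls, diff_in_means)
```

Both quantities equal 4 here (treated mean 6 minus untreated mean 2), up to floating-point rounding.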
Regression Estimation Ignoring Omitted Variables
But suppose the population treatment effect model is instead:
Y = α + βX + (δW + ω)
– Now the residual conveys information about W
Consider a plausible example
– Y = crime, X = marriage, W = “marriageability”
“Marriageability” can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills,...
– Including “signals” that individuals emit about these qualities
Regression Estimation Ignoring Omitted Variables
What does LS estimate when W is omitted?
bLS = Cov(X,Y) / Var(X) = β + δ × [Cov(X,W) / Var(X)] = β + δ × [E(W | X = 1) – E(W | X = 0)]
– β = True impact of marriage on crime (–)
– δ = Impact of marriageability on crime (–)
– E(W | X = 1) – E(W | X = 0) = Difference in marriageability between married and unmarried (+)
Marriage effect on crime will be overestimated
– IMPORTANT: Even if β = 0, bLS < 0
Regression Estimation Ignoring Omitted Variables
So...
bLS = β + δ × [E(W | X = 1) – E(W | X = 0)]
Estimate of β is unbiased if and only if
1. Marriageability is uncorrelated with crime
δ = 0
or...
2. Marriageability is “balanced” (i.e., equivalent) between married and unmarried subjects
E(W | X = 1) = E(W | X = 0)
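The omitted-variable formula can be illustrated with a simulation of the marriage example in which the true marriage effect is set to zero. Everything here is an assumption made for the sketch: β = 0, δ = –1, a standard normal “marriageability” W, and a selection rule under which higher-W people are more likely to marry.

```python
import random

def group_mean(values, flags, flag):
    """Mean of `values` over the units whose flag equals `flag`."""
    selected = [v for v, f in zip(values, flags) if f == flag]
    return sum(selected) / len(selected)

rng = random.Random(3)
n = 20000
beta, delta = 0.0, -1.0     # no true marriage effect; marriageability lowers crime

w = [rng.gauss(0, 1) for _ in range(n)]                       # unobserved W
x = [1 if wi + rng.gauss(0, 1) > 0 else 0 for wi in w]        # selection into marriage
y = [beta * xi + delta * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b_ls = group_mean(y, x, 1) - group_mean(y, x, 0)              # LS with W omitted
w_gap = group_mean(w, x, 1) - group_mean(w, x, 0)             # imbalance in W
predicted = beta + delta * w_gap                              # formula from the slide

print(f"bLS = {b_ls:.3f}, beta + delta * gap = {predicted:.3f}")
```

Even though β is zero by construction, bLS is clearly negative, and it tracks β + δ × [E(W | X = 1) – E(W | X = 0)] up to sampling noise.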
Omitted Variables in Criminological Research
What variables of interest to criminologists are surely endogenous?
– Micro = Employment, education, marriage, military service, fertility, conviction, family structure,....
– Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration,....
Basically, EVERYTHING!
– (I’m sorry to be the one to break it to you)
Traditional Strategies to Deal with Omitted Variables
Randomization (physical control)
– Achieves balance (in expectation) on any and all potential W’s
– Control variables are technically unnecessary
Covariate adjustment (statistical control)
– Control for potential W’s in a regression model
– But...we have no idea how many W’s there are, so model misspecification is still a real problem here
Quasi-Experimental Strategies to Deal with Omitted Variables
Difference in differences (fixed-effects model)
– Requires panel data
Propensity score matching
– Requires a lot of measured background variables
Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized
Instrumental variables estimation
– Requires an exclusion restriction
Instrumental Variables Estimation Is a Viable Approach
An “instrumental variable” for X is one solution to the problem of omitted variables bias
Requirements for Z to be a valid instrument for X
– Relevant = Correlated with X
– Exogenous = Not correlated with Y except through its correlation with X
[Diagram: Z → X → Y, with the omitted variable W and the error e]
Important Point about Instrumental Variables Models
I often hear...“A good instrument should not be correlated with the dependent variable”
– WRONG!!!
Z has to be correlated with Y, otherwise it is useless as an instrument
– It can only be correlated with Y through X
A good instrument must not be correlated with the unobserved determinants of Y
Important Point about Instrumental Variables Models
Not all of the available variation in X is used
– Only that portion of X which is “explained” by Z is used to explain Y
[Diagram relating X, Y, and Z]
X = Endogenous variable
Y = Response variable
Z = Instrumental variable
Important Point about Instrumental Variables Models
[Diagram relating X, Y, and Z]
Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y
[Diagram relating X, Y, and Z]
Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for
Important Point about Instrumental Variables Models
The IV estimator is BIASED
– In other words, E(bIV) ≠ β (finite-sample bias)
– The appeal of IV derives from its consistency
“Consistency” is a way of saying that b → β (in probability) as N → ∞
So…IV studies often have very large samples
– But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway
Asymptotic behavior of IV
plim(bIV) = β + Cov(Z,e) / Cov(Z,X)
– If Z is truly exogenous, then Cov(Z,e) = 0
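A simulation makes the contrast between LS and IV concrete. The sketch below assumes β = 2, an unobserved confounder W that enters both X and the error, and an instrument Z that is exogenous by construction; every name and coefficient is hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

rng = random.Random(4)
n = 20000
beta = 2.0

z, x, e = [], [], []
for _ in range(n):
    zi = rng.gauss(0, 1)                 # instrument, independent of W and the noise
    wi = rng.gauss(0, 1)                 # unobserved confounder
    z.append(zi)
    x.append(0.8 * zi + wi + rng.gauss(0, 1))
    e.append(wi + rng.gauss(0, 1))       # error contains W, so Cov(X, e) != 0
y = [1.0 + beta * xi + ei for xi, ei in zip(x, e)]

b_ls = cov(x, y) / cov(x, x)
b_iv = cov(z, y) / cov(z, x)
iv_bias_term = cov(z, e) / cov(z, x)     # near zero because Z is exogenous here

print(f"bLS = {b_ls:.3f}, bIV = {b_iv:.3f}, Cov(Z,e)/Cov(Z,X) = {iv_bias_term:.4f}")
```

In this setup bLS lands well above 2 while bIV is close to 2, and bIV differs from β by exactly the sample analogue of Cov(Z,e) / Cov(Z,X).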
Instrumental Variables Terminology
Three different models to be familiar with
– First stage: X = α0 + α1Z + ω
– Structural model: Y = β0 + β1X + ε
– Reduced form: Y = δ0 + δ1Z + ξ
An interesting equality:
δ1 = α1 × β1
so…
β1 = δ1 / α1
[Diagram: Z → X (coefficient α1, error ω) → Y (coefficient β1, error ε); Z → Y (coefficient δ1, error ξ)]
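The equality δ1 = α1 × β1 suggests estimating the first stage and the reduced form separately and taking their ratio. The sketch below does this on simulated data with assumed values α1 = 0.8 and β1 = 2; the variable names are illustrative only.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

rng = random.Random(5)
n = 20000
alpha1, beta1 = 0.8, 2.0

z = [rng.gauss(0, 1) for _ in range(n)]
omega = [rng.gauss(0, 1) for _ in range(n)]
x = [alpha1 * zi + oi for zi, oi in zip(z, omega)]
# The structural error shares omega with X, so X is endogenous.
y = [beta1 * xi + oi + rng.gauss(0, 1) for xi, oi in zip(x, omega)]

a1_hat = cov(z, x) / cov(z, z)     # first-stage slope
d1_hat = cov(z, y) / cov(z, z)     # reduced-form slope
b1_hat = d1_hat / a1_hat           # ratio recovers the structural slope

print(f"a1 = {a1_hat:.3f}, d1 = {d1_hat:.3f}, d1 / a1 = {b1_hat:.3f}")
```

Because Var(Z) cancels in the ratio, d1 / a1 is numerically the same as Cov(Z,Y) / Cov(Z,X).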
Different Types of Instrumental Variables Estimators
Wald estimator for binary instrument:
bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)]
– Difference in response ÷ Difference in treatment
Instrumental variables (IV) estimator:
bIV = (Z′X)⁻¹Z′Y = Cov(Z,Y) / Cov(Z,X)
– Shows that bIV can be recovered from two samples
Two-stage least squares (2SLS) estimator:
b2SLS = (X̃′X̃)⁻¹X̃′Y = Cov(X̃,Y) / Var(X̃)
– X̃ represents “fitted” value from first-stage model
Different Types of Instrumental Variables Estimators
Single binary instrument and no control variables...
bWald = bIV = b2SLS
Single instrument (binary or continuous) with or without control variables...
bIV = b2SLS
Multiple instruments (binary or continuous) with or without control variables...
b2SLS
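The three-way equality for a single binary instrument with no controls can be confirmed numerically. The data-generating values below are assumptions chosen for the sketch; the point is only that the three formulas return the same number on the same sample.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def group_mean(values, flags, flag):
    selected = [v for v, f in zip(values, flags) if f == flag]
    return sum(selected) / len(selected)

rng = random.Random(6)
n = 2000
z = [rng.randint(0, 1) for _ in range(n)]        # binary instrument
x, y = [], []
for zi in z:
    wi = rng.gauss(0, 1)                         # unobserved confounder
    xi = 0.5 * zi + wi + rng.gauss(0, 1)
    x.append(xi)
    y.append(1.0 + 2.0 * xi + wi + rng.gauss(0, 1))

# Wald: difference in response divided by difference in treatment.
b_wald = (group_mean(y, z, 1) - group_mean(y, z, 0)) / (
    group_mean(x, z, 1) - group_mean(x, z, 0))

# IV: ratio of covariances.
b_iv = cov(z, y) / cov(z, x)

# 2SLS: regress Y on the first-stage fitted values.
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
x_fit = [a0 + a1 * zi for zi in z]
b_2sls = cov(x_fit, y) / cov(x_fit, x_fit)

print(b_wald, b_iv, b_2sls)
```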
More on the Method of Two-Stage Least Squares (2SLS)
Step 1: X = a0 + a1Z1 + a2Z2 + … + akZk + u
– Obtain fitted values (X̃) from the first-stage model
Step 2: Y = b0 + b1X̃ + e
– Substitute the fitted X̃ in place of the original X
– Note: If done manually in two stages, the standard errors are based on the wrong residual e = Y – b0 – b1X̃ when it should be e = Y – b0 – b1X
Best to just let the software do it for you
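The warning about manual two-stage estimation can be seen by computing the slope's standard error with each residual. This is a sketch on simulated data with assumed coefficients and one instrument; it uses the simple homoskedastic formula sqrt(σ² / SST of X̃) with an n – 2 degrees-of-freedom correction, which is one common convention and not necessarily what any particular package reports.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def slope_se(residuals, regressor):
    """sqrt(sigma^2 / SST), with sigma^2 estimated from the supplied residuals."""
    sigma2 = sum(r * r for r in residuals) / (len(residuals) - 2)
    center = mean(regressor)
    sst = sum((v - center) ** 2 for v in regressor)
    return math.sqrt(sigma2 / sst)

rng = random.Random(7)
n = 5000
z, x, y = [], [], []
for _ in range(n):
    zi, wi = rng.gauss(0, 1), rng.gauss(0, 1)
    xi = 0.8 * zi + wi + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(1.0 + 2.0 * xi + wi + rng.gauss(0, 1))

# Step 1: first-stage fitted values.
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
x_fit = [a0 + a1 * zi for zi in z]

# Step 2: regress Y on the fitted values.
b1 = cov(x_fit, y) / cov(x_fit, x_fit)
b0 = mean(y) - b1 * mean(x_fit)

resid_wrong = [yi - b0 - b1 * xf for yi, xf in zip(y, x_fit)]   # uses fitted X
resid_right = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]       # uses original X
se_wrong = slope_se(resid_wrong, x_fit)
se_right = slope_se(resid_right, x_fit)

print(f"b1 = {b1:.3f}, naive s.e. = {se_wrong:.4f}, corrected s.e. = {se_right:.4f}")
```

In this particular setup the naive standard error is too large; in other settings the error can go either way, which is why the slide recommends leaving the calculation to software.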
Including Control Variables in an IV/2SLS Model
Control variables (W’s) should be entered into the model at both stages
– First stage: X = a0 + a1Z + a2W + u
– Second stage: Y = b0 + b1X̃ + b2W + e
Control variables are considered “instruments,” they are just not “excluded instruments”
– They serve as their own instrument
Functional Form Considerations with IV/2SLS
Binary endogenous regressor (X)
– Consistency of second-stage estimates does not hinge on getting first-stage functional form correct
Binary response variable (Y)
– IV probit (or logit) is feasible but is technically unnecessary
In both cases, linear model is tractable, easily interpreted, and consistent
– Although variance adjustment is well advised
Functional Form Considerations with IV/2SLS
Quadratic second stage with a continuous endogenous regressor
– Entering first-stage fitted values and their square into second-stage model leads to inconsistency
The square of a linear projection is not equivalent to a linear projection on a quadratic
– Squares and cross-products of IV’s should be treated as additional instruments
Kelejian (1971)
– Linear and squared X’s are treated as two different endogenous regressors
Technical Conditions Required for Model Identification
Order condition = At least the same # of IV’s as endogenous X’s
– Just-identified model: # IV’s = # X’s
– Overidentified model: # IV’s > # X’s
Rank condition = At least one IV must be significant in the first-stage model
– Number of linearly independent columns in a matrix
E(X | Z,W) cannot be perfectly correlated with E(X | W)
Statistical Inference with IV
Variance estimation
σ²(βLS) = σ²ε / SSTX
σ²(βIV) = σ²ε / (SSTX × R²X,Z)
where…
ε = Y – β0 – β1X
NOTICE: Because R²X,Z < 1, s(bIV) > s(bLS)
– IV standard errors tend to be large, especially when R²X,Z is very small, which can lead to type II errors
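Taking the two variance formulas on this slide at face value (same σ²ε and SSTX in both), the IV standard error is the LS standard error multiplied by 1 / √R²X,Z. The helper below, whose name is hypothetical, shows how quickly that multiplier grows as the first-stage R² shrinks.

```python
import math

def iv_se_multiplier(r2_xz):
    """Ratio s(bIV) / s(bLS) implied by the two variance formulas: 1 / sqrt(R^2)."""
    if not 0 < r2_xz <= 1:
        raise ValueError("first-stage R-squared must lie in (0, 1]")
    return 1 / math.sqrt(r2_xz)

# Illustrative first-stage R-squared values (not taken from any study).
multipliers = {r2: iv_se_multiplier(r2) for r2 in (0.25, 0.04, 0.01)}
for r2, m in multipliers.items():
    print(f"R2 = {r2}: IV s.e. is about {m:.1f} times the LS s.e.")
```

A first-stage R² of 0.04 already implies a standard error five times larger than under LS, which is the loss of power behind the type II error warning.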
Instrumental Variables and Randomized Experiments
Imperfect compliance in randomized trials
– Some individuals assigned to treatment group will not receive Tx, and some assigned to control group will receive Tx
Assignment error; subject refusal; investigator discretion
– Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior
A problem in randomized job training studies and other social experiments (e.g., housing vouchers)
Instrumental Variables and Randomized Experiments
Two different measures of treatment (X)
– Treatment assigned = Exogenous
Intention-to-treat (ITT) analysis
– Reduced-form model: Y = δ0 + δ1Z + ξ
Often leads to underestimation of treatment effect
– Treatment delivered = Endogenous
Individuals who do not comply probably differ in ways that can undermine the study
Self-selection bias and inconsistency
Angrist (2006), J.E.C.
Minneapolis D.V. experiment
– Sherman and Berk (1984)
Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties present at scene
– Random assignment of arrest-mediation-separation
– But...treatment assigned was not treatment delivered
Fidelity vis-à-vis arrest, but many subjects (~25%) assigned to mediation/separation were arrested
– “Upgrading” was more likely when suspect was rude, suspect assaulted officer, weapons were involved, victim persistently demanded arrest, and incident violated restraining order
Angrist (2006), J.E.C.
[Path diagram: Treatment Assigned (Arrest), Treatment Delivered (Arrest), Recidivism, and Violence Proneness, with signed paths (+, –, +, +)]
Angrist (2006), J.E.C.
Estimates of effect of arrest (vs. mediate or separate) on D.V. recidivism (Tables 2, 3)
– OLS: b = –.070 (s.e. = .038)
– ITT: b = –.108 (s.e. = .041)
– 2SLS: b = –.140 (s.e. = .053)
Deterrent effect of arrest is twice as large in 2SLS as opposed to OLS
– In this context, the 2SLS estimate is known as a “local average treatment effect” (I’ll come back to this)
Sexton and Hebel (1984), J.A.M.A.
Maternal smoking and birth weight
– Sexton and Hebel (1984)
Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants
– At least 10 cigarettes per day and not past 18th week
– Random assignment of staff assistance in a smoking cessation program
Personal visits; telephone and mail contacts
– But...some smokers in treatment group did not quit and some smokers in control group did quit
Sexton and Hebel (1984), J.A.M.A.
[Path diagram: Smoking Intervention, Smoking Frequency, Smoking Propensity, Birth Weight, and Difficult Pregnancy, with signed paths]
Sexton and Hebel (1984), J.A.M.A.
(1) First-stage model
Mean cigarettes smoked:
Treatment = 6.4
Control = 12.8
First-stage effect: bFS = –6.4
(2) Reduced-form model
Mean birth weight:
Treatment = 3,278g
Control = 3,186g
Reduced-form effect: bRF = 92
(3) Structural model
Effect of smoking frequency on mean birth weight:
bIV = 92 / –6.4 = –14.4g
Each cigarette reduces birth weight by 14.4 grams
Sexton and Hebel (1984), J.A.M.A.
As an interesting aside, it’s also possible to estimate the effect of continuing smoking (vs. quitting) from the data
– First stage: bFS = –0.23 (57% vs. 80% smokers)
– Reduced form: bRF = 92g
– Structural: bIV = 92 / –0.23 = –400g
Women who were still smoking in the 8th month of pregnancy bore children who were 400 grams lighter, on average
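Both structural estimates from the Sexton and Hebel figures are Wald ratios, so they can be reproduced from the four group means quoted in the slides. The function name `wald` is an illustrative choice; the inputs are the numbers reported above.

```python
def wald(y_treat, y_control, x_treat, x_control):
    """Difference in response divided by difference in treatment."""
    return (y_treat - y_control) / (x_treat - x_control)

# Effect per cigarette: birth weight (grams) against mean cigarettes smoked.
per_cigarette = wald(3278, 3186, 6.4, 12.8)

# Effect of continuing to smoke: birth weight against share still smoking.
continuing = wald(3278, 3186, 0.57, 0.80)

print(f"per cigarette: {per_cigarette:.1f} g")
print(f"continuing vs. quitting: {continuing:.0f} g")
```

The first ratio is 92 / –6.4, which rounds to the –14.4 grams on the slide; the second is 92 / –0.23, or about –400 grams.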
Permutt and Hebel (1989), Biometrics
Estimates of the effect of smoking frequency (in 8th month) on birth weight
– OLS: b = 2g (s.e. not reported)
– 2SLS: b = –14g (s.e. = 7g)
Here as well, 2SLS yields the “local average treatment effect” of smoking on birth weight
Instrumental Variables and Local Average Treatment Effects
Definition of a L.A.T.E.
– The average treatment effect for individuals “who can be induced to change [treatment] status by a change in the instrument”
Imbens and Angrist (1994, p. 470)
– The average causal effect of X on Y for “compliers,” as opposed to “always takers” or “never takers”
Not a particularly well-defined (sub)population
L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.
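A simulated trial with imperfect compliance shows why the Wald/IV estimate is “local.” The sketch assumes three latent types with made-up shares and treatment effects (compliers 60% with effect –1; always takers and never takers 20% each with effect –4) and no “defiers”; none of these numbers come from the studies discussed here.

```python
import random

def group_mean(values, flags, flag):
    selected = [v for v, f in zip(values, flags) if f == flag]
    return sum(selected) / len(selected)

# Hypothetical treatment effects by compliance type.
EFFECT = {"complier": -1.0, "always": -4.0, "never": -4.0}
SHARE = {"complier": 0.6, "always": 0.2, "never": 0.2}

rng = random.Random(8)
n = 40000
z, x, y = [], [], []
for _ in range(n):
    u = rng.random()
    kind = "complier" if u < 0.6 else ("always" if u < 0.8 else "never")
    zi = rng.randint(0, 1)                       # randomized assignment
    if kind == "complier":
        xi = zi                                  # takes treatment only if assigned
    else:
        xi = 1 if kind == "always" else 0        # ignores assignment
    z.append(zi)
    x.append(xi)
    y.append(EFFECT[kind] * xi + rng.gauss(0, 1))

b_wald = (group_mean(y, z, 1) - group_mean(y, z, 0)) / (
    group_mean(x, z, 1) - group_mean(x, z, 0))
population_ate = sum(SHARE[k] * EFFECT[k] for k in EFFECT)

print(f"Wald/IV estimate = {b_wald:.2f}")
print(f"complier effect = {EFFECT['complier']}, population A.T.E. = {population_ate:.2f}")
```

The estimate lands near the complier effect of –1 and not near the population average of –2.2, because only compliers change treatment status when the instrument changes.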
L.A.T.E. in the Previous Two Examples
In the D.V. study...
– For men who were arrested as per the experimental protocol, arrest resulted in a mean 14-percentage-point decline in the probability of recidivism compared to non-arrest interventions
In the maternal smoking study...
– For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from mean 11 cigarettes)