Matching Estimators

Matching Estimators Methods of Economic Investigation Lecture 11


Transcript of Matching Estimators

Page 1: Matching Estimators

Matching Estimators

Methods of Economic Investigation Lecture 11

Page 2: Matching Estimators

Last Time

General Theme: If you don’t have an experiment, how do you get a ‘control group’?

Difference in Differences

  How it works: compare before-after changes between two comparable entities

  Assumptions: Fixed differences over time

  Tests to improve credibility of assumption: pre-treatment trends, Ashenfelter Dip

Page 3: Matching Estimators

Today’s Class

Another way to get a control group: Matching

Assumptions for identification

Specific form of matching called “propensity score matching”

Is it better than just a plain old regression?

Page 4: Matching Estimators

The Counterfactual Framework

Counterfactual: what would have happened to the treated subjects, had they not received treatment?

Idea: individuals selected into treatment and nontreatment groups have potential outcomes in both states: the one in which they are observed and the one in which they are not observed.

Page 5: Matching Estimators

Reminder of Terms

For the treated group, we have the observed mean outcome under treatment, E(Y1|T=1), and the unobserved mean outcome under nontreatment, E(Y0|T=1).

For the control group, we have the observed mean E(Y0|T=0) and the unobserved mean E(Y1|T=0).

Page 6: Matching Estimators

What is “matching”?

Pairing treatment and comparison units that are similar in terms of observable characteristics

Can do this in regressions (with covariates) or prior to regression to define your treatment and control samples

Page 7: Matching Estimators

Matching Assumption

Conditioning on observables (X) we can take assignment to treatment ‘as if’ random, i.e.

$$(Y_{1i}, Y_{0i}) \perp T_i \mid X_i$$

What is the implicit statement: unobservables (stuff not in X) play no role in treatment assignment (T)

Page 8: Matching Estimators

A matched estimator

E(Y1 – Y0 | X, T=1) = {E[Y1 | X, T=1] – E[Y0 | X, T=0]} – {E[Y0 | X, T=1] – E[Y0 | X, T=0]}

First term in braces: matched treatment effect. Second term in braces: selection bias, assumed to be zero.

Key idea: all selection occurs only through observed X

Page 9: Matching Estimators

Just do a regression…

Regressions are flexible

  If you only put in a “main effect”, the regression will estimate a purely linear specification

  Interactions and fixed effects allow different slopes and intercepts for any combination of variables

  Can include quadratic and higher-order polynomial terms if necessary

But fundamentally you specify additively separable terms

Page 10: Matching Estimators

Sometimes regression not feasible…

The issue is largely related to dimensionality

Each time you add an observable characteristic, you partition your data into more bins.

Imagine all variables are zero-one variables

Then if you have k X’s, you have 2^k potential different values

Need enough observations at each value to estimate that precisely
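The binning problem can be made concrete with a small count. The sketch below is only an illustration: the function name `cell_counts` and the eight-row sample are hypothetical, not part of the lecture. It tabulates how many observations land in each of the 2^k cells.

```python
from itertools import product

def cell_counts(rows, k):
    """Count observations in each of the 2**k cells formed by k binary covariates."""
    counts = {cell: 0 for cell in product((0, 1), repeat=k)}
    for row in rows:
        counts[tuple(row)] += 1
    return counts

# Hypothetical sample: 8 observations on k = 3 zero-one covariates
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1),
        (1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 1, 0)]
counts = cell_counts(rows, 3)
n_cells = len(counts)                                  # number of possible cells, 2**3
n_empty = sum(1 for c in counts.values() if c == 0)    # cells with no data at all
print(n_cells, n_empty)
```

Even with as many observations as cells, some cells are empty here, and adding a fourth covariate would double the number of cells again.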

Page 11: Matching Estimators

Reducing the Dimensionality

Use of propensity score: probability of receiving treatment, conditional on covariates

Key assumption: if $(Y_{1i}, Y_{0i}) \perp T_i \mid X_i$, and defining $p(X_i) = \Pr(T_i = 1 \mid X_i)$, then

$$(Y_{1i}, Y_{0i}) \perp T_i \mid X_i \;\Rightarrow\; (Y_{1i}, Y_{0i}) \perp T_i \mid p(X_i)$$

If this is true, can interpret estimate of differences in outcomes conditional on X as causal effect

Page 12: Matching Estimators

Why not control for X?

Matching is flexible in a different way

  Avoid specifying a particular functional form for the outcome equation, decision process or unobservable term

  Just need the “right” observables

Flexible in the form of how X’s affect treatment probability but inflexible in how treatment probability affects outcome

Page 13: Matching Estimators

Participation decision

Remember our 3 groups (on this slide T is whether the treatment is offered and R is whether it is received):

Always takers: take the treatment if offered AND take the treatment if not offered. We observe them if T=0 but R=1

Never takers: don’t take the treatment if not offered AND don’t take it even if it is offered. We observe them if T=1 but R=0

Compliers: just do what they’re assigned to do. T=1 & R=1 OR T=0 & R=0

Page 14: Matching Estimators

Conditions for Matching to Work

Take 1-sided non-compliance for ease: if not offered, can’t take it, but some people don’t take it even if offered

Error term for never takers

Error term for compliers

If it’s zero

Perfect compliance: so conditioning on X replicates experimental setting

On average, conditional on X, unobservables are the same

Page 15: Matching Estimators

Common Support

A match can only exist if there is a region of “common support”

People with the same X values are in both the treatment and the control groups

Let S be the set of all observables X, then 0 < Pr(T=1 | X) < 1 for X in some subset S* of S

Intuition: Someone in control close enough to match to treatment unit OR enough overlap in the distribution of treated and untreated individuals

Page 16: Matching Estimators

Lots of common support

[Figure: kernel density of x (roughly -4 to 4) for the treatment and control groups]

Between red and blue line is area of common support

Page 17: Matching Estimators

Not so much common support

[Figure: kernel density of x (roughly -5 to 10) for the treatment and control groups]

Page 18: Matching Estimators

Trimming

Define min and max values of X for the region of overlap and drop all units not in that region

Remove regions which do not have strictly positive propensity score density in both treatment and control distributions (Smith and Todd, 2005)

Both are quite similar when used in practice, but if sections are missing in the middle of the distribution, can use the second option
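The first (min/max) option can be sketched in a few lines. This is a minimal illustration applied to propensity scores rather than raw X. The function names and the score values are made up for the example, and it does not implement the second, density-based option.

```python
def common_support_bounds(p_treated, p_control):
    """Overlap region: from the larger of the two minima to the smaller of the two maxima."""
    lo = max(min(p_treated), min(p_control))
    hi = min(max(p_treated), max(p_control))
    return lo, hi

def trim(scores, lo, hi):
    """Keep only the units whose score lies inside the overlap region."""
    return [p for p in scores if lo <= p <= hi]

# Hypothetical propensity scores
p_treated = [0.35, 0.50, 0.70, 0.92]
p_control = [0.05, 0.20, 0.40, 0.65, 0.80]

lo, hi = common_support_bounds(p_treated, p_control)
kept_treated = trim(p_treated, lo, hi)   # drops the treated unit above the control maximum
kept_control = trim(p_control, lo, hi)   # drops the controls below the treated minimum
```

Note that this rule only cuts the tails. A gap in the middle of one distribution would survive it, which is the case where the second option is preferable.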

Page 19: Matching Estimators

How do we match on p(X)

Taken literally, should match on exactly p(Xi)

In practice this is hard to do, so the strategy is to match treated units to comparison units whose p-scores are sufficiently close to be considered a match

Issues:

  How many times can 1 unit be a “match”?

  How many units to match to each treatment unit?

  How to “match” if using more than 1 control unit per treatment unit?

Page 20: Matching Estimators

Replacement

Issue: once control group person Z is a match for individual A, can she also be a match for individual B?

Trade-off between bias and precision:

  With replacement: minimizes the propensity score distance between the treated unit and its matched comparison unit

  Without replacement

Page 21: Matching Estimators

Are we doing a one-to-one match?

If 1-to-1 match: units closely related but estimates may not be very precise

The more units you include in the match, the more the p-score of the control group will differ from the treatment group

Trade-off between bias and precision

Typically use 1-to-many match because 1-to-1 is extremely data intensive if X is multi-dimensional

Page 22: Matching Estimators

Different Matching Algorithms-1

Can use nearest neighbor, which chooses the m closest comparison units

  Implicitly weights these all the same

  Get fixed m but may end up with different p-scores

Can use ‘caliper’: radius around a point

  Again implicitly weights these the same

  Fixed difference in p-scores, but may not be many units in radius

Stratify

  Break sample up into intervals

  Estimate treatment effect separately in each region
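The nearest-neighbor and caliper ideas above can be sketched together. This is a minimal illustration, assuming matching with replacement and equal weights on the matched controls. The function names, the (p-score, outcome) tuple layout and the numbers are assumptions made for the example, and stratification is not shown.

```python
def nearest_neighbors(p_i, control_scores, m=1, caliper=None):
    """Indices of the m controls closest to p_i, optionally restricted to a caliper."""
    ranked = sorted(range(len(control_scores)),
                    key=lambda j: abs(control_scores[j] - p_i))
    if caliper is not None:
        ranked = [j for j in ranked if abs(control_scores[j] - p_i) <= caliper]
    return ranked[:m]

def att_nearest_neighbor(treated, controls, m=1, caliper=None):
    """treated, controls: lists of (p-score, outcome). Matching is with replacement."""
    control_scores = [p for p, _ in controls]
    diffs = []
    for p_i, y_i in treated:
        idx = nearest_neighbors(p_i, control_scores, m, caliper)
        if not idx:
            continue  # no control inside the caliper: this treated unit is dropped
        y_match = sum(controls[j][1] for j in idx) / len(idx)  # equal weights
        diffs.append(y_i - y_match)
    return sum(diffs) / len(diffs)

# Hypothetical (p-score, outcome) pairs
treated = [(0.30, 10.0), (0.60, 14.0), (0.90, 20.0)]
controls = [(0.28, 8.0), (0.55, 11.0), (0.62, 13.0), (0.40, 9.0)]

att_nn = att_nearest_neighbor(treated, controls, m=1)
att_caliper = att_nearest_neighbor(treated, controls, m=1, caliper=0.05)
```

In this made-up sample the treated unit at 0.90 has no close control. Plain nearest neighbor still matches it to the control at 0.62, while the caliper version drops it, so the two estimates differ.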

Page 23: Matching Estimators

Different Matching Algorithms-2

Can also use some type of distribution:

  Kernel estimator puts some type of distribution (e.g. normal) around each treatment unit and weights closer control units more and farther control units less

  Explicit weighting function can be used if you have some knowledge of how related units of certain distances are to each other
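A kernel-weighted comparison outcome can be sketched as follows. This assumes a normal (Gaussian) kernel and a bandwidth chosen by the user. The function names, the bandwidth of 0.1 and the data are illustrative assumptions, not values from the lecture.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the weighting function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_counterfactual(p_i, controls, bandwidth):
    """Weighted mean of control outcomes; controls closer to p_i get more weight."""
    weights = [gaussian_kernel((p_j - p_i) / bandwidth) for p_j, _ in controls]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, controls)) / total

# Hypothetical (p-score, outcome) pairs for two control units
controls = [(0.30, 8.0), (0.90, 20.0)]

near_first = kernel_counterfactual(0.30, controls, bandwidth=0.1)  # dominated by the first control
midway = kernel_counterfactual(0.60, controls, bandwidth=0.1)      # both controls weighted equally
```

Unlike nearest neighbor or caliper matching, every control contributes here. The bandwidth decides how quickly the weight falls off with distance.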

Page 24: Matching Estimators

How close is close enough?

No “right” answer in these choices: will depend heavily on sample issues

  How deep is the common support (i.e. are there lots of people in both control and treatment group at all the p-score values)?

Should all be the same asymptotically but may differ in finite samples (which is everything)

Page 25: Matching Estimators

Tradeoffs in different methods

Source: Caliendo and Kopeinig, 2005

Page 26: Matching Estimators

How to estimate a p-score

Typically use a logit

Specific, useful functional form for estimating “discrete choice” models

You haven’t learned these yet but you will

For now, think of running a regular OLS regression where the outcome is 1 if you got the treatment and zero if you didn’t

Take the E[T | X] and that’s your propensity score
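For readers who want to see the logit version, here is a minimal sketch with a single covariate, fitted by Newton-Raphson using only the standard library. The function names and the eight data points are hypothetical. In practice one would use a statistics package and several covariates.

```python
import math

def fit_logit(x, t, iterations=25):
    """Fit Pr(T=1 | x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b)
        g0 = sum(ti - pi for ti, pi in zip(t, p))
        g1 = sum((ti - pi) * xi for ti, pi, xi in zip(t, p, x))
        # Information matrix (negative Hessian), a symmetric 2x2
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += inverse(information) * gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def propensity(a, b, xi):
    """Predicted probability of treatment at covariate value xi."""
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))

# Hypothetical data: treatment becomes more likely as x rises
x = [-2, -1, -1, 0, 0, 1, 1, 2]
t = [0, 0, 1, 0, 1, 0, 1, 1]

a_hat, b_hat = fit_logit(x, t)
p_scores = [propensity(a_hat, b_hat, xi) for xi in x]
```

One practical difference from the OLS shortcut is that logit predictions always lie strictly between 0 and 1, whereas fitted values from a linear probability regression can fall outside that range.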

Page 27: Matching Estimators

The Treatment Effect

Assume the CIA (conditional independence assumption) holds and there is a sufficient region of common support

Difference in outcome between treated individual i and weighted comparison group J, with weights generated by the p-score distribution in the common support region

N is the treatment group and |N| is the size of the treatment group

J is the comparison group and |J| is the number of comparison group units matched to i
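The formula from this slide did not survive in the transcript. One standard form that is consistent with the definitions above is given here as a reconstruction, not as the slide's exact expression. The weights $w_{ij}$ and the subscript on $J_i$ are notation introduced for this sketch.

```latex
\hat{\tau}_{ATT} \;=\; \frac{1}{|N|} \sum_{i \in N} \Big( Y_i \;-\; \sum_{j \in J_i} w_{ij}\, Y_j \Big),
\qquad \sum_{j \in J_i} w_{ij} = 1
```

With equal weights on the matched units, $w_{ij} = 1/|J_i|$, which is the nearest-neighbor and caliper case. Kernel matching instead lets $w_{ij}$ fall with the distance between $p(X_i)$ and $p(X_j)$.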

Page 28: Matching Estimators

General Procedure

Run regression:

  Dependent variable: T=1 if participate; T=0 otherwise

  Choose appropriate conditioning variables, X

  Obtain propensity score: predicted probability (p)

Then match, either:

  1-to-1 match: estimate difference in outcomes for each pair; take average difference as treatment effect

  1-to-n match: nearest neighbor matching, caliper matching, nonparametric/kernel matching

Multivariate analysis based on new sample

Page 29: Matching Estimators

Standard Errors

Problem: estimated variance of treatment effect should include additional variance from estimating p

Typically people “bootstrap”, which is a non-parametric approach: estimate your coefficients over and over on resampled data until you get a distribution of those coefficients, then use the variance from that

Will do this in a few weeks
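As a preview, the bootstrap idea can be sketched as below. This is an illustration under stated assumptions: units are resampled with replacement within each group, and `estimator` is any function that recomputes the whole estimate from a resampled data set. The placeholder `mean_difference` and the outcome values are hypothetical. A real application would pass a function that re-estimates the p-score and redoes the matching in every replication.

```python
import random
import statistics

def bootstrap_se(treated, controls, estimator, reps=500, seed=0):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Resample each group with replacement, keeping group sizes fixed
        t_star = [rng.choice(treated) for _ in treated]
        c_star = [rng.choice(controls) for _ in controls]
        # The estimator is rerun from scratch on the resampled data
        estimates.append(estimator(t_star, c_star))
    return statistics.stdev(estimates)

def mean_difference(treated, controls):
    """Placeholder estimator: difference in mean outcomes between the groups."""
    return statistics.mean(treated) - statistics.mean(controls)

# Hypothetical outcomes
treated_y = [10.0, 14.0, 20.0, 12.0, 16.0]
control_y = [8.0, 11.0, 13.0, 9.0, 10.0]

se = bootstrap_se(treated_y, control_y, mean_difference)
```

Fixing the seed makes the result reproducible. The number of replications is a choice, and 500 here is arbitrary.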

Page 30: Matching Estimators

Some concerns about Matching

Data intensive in propensity score estimation

  May reduce dimensionality of treatment effect estimation but still need enough of a sample to estimate propensity score over common support

  Need LOTS of X’s for this to be believable

Inflexible in how p-score is related to treatment

  Worry about heterogeneity

  Bias terms much more difficult to sign (non-linear p-score bias)

Page 31: Matching Estimators

Matching + Diff-in-Diff

Worry that unobservables are causing selection because matching on X is not sufficient

Can combine this with difference-in-differences estimates

  Take control group J for each individual i

  Estimate difference before treatment

  If the groups are truly ‘as if’ random, it should be zero

  If it’s not zero: can assume fixed differences over time and take the before-after difference in treatment and control groups
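The steps above can be sketched for data that have already been matched. This is a minimal illustration: the function name, the dictionary keys and the numbers are assumptions, and each `control_*` value stands for the (possibly weighted) mean outcome of the comparison group J matched to that treated unit.

```python
def matched_did(pairs):
    """Return (pre-treatment gap, difference-in-differences) across matched pairs."""
    n = len(pairs)
    # Average treated-minus-control gap before treatment: zero if 'as if' random
    pre_gap = sum(p["treated_pre"] - p["control_pre"] for p in pairs) / n
    # Average treated-minus-control gap after treatment
    post_gap = sum(p["treated_post"] - p["control_post"] for p in pairs) / n
    # Under fixed differences over time, the pre-treatment gap is netted out
    return pre_gap, post_gap - pre_gap

# Hypothetical matched data
pairs = [
    {"treated_pre": 10.0, "control_pre": 9.0, "treated_post": 15.0, "control_post": 11.0},
    {"treated_pre": 12.0, "control_pre": 11.0, "treated_post": 16.0, "control_post": 13.0},
]

pre_gap, did = matched_did(pairs)
```

In this made-up example the pre-treatment gap is not zero, so the raw post-treatment gap would overstate the effect. The difference-in-differences estimate removes that fixed gap.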

Page 32: Matching Estimators

Next Time

Comparing non-experimental methods to the experiments they are trying to replicate

Goal: see how well these techniques work to get the estimated experimental effect