Measuring Impact: Impact Evaluation Methods for Policy Makers

Impact Evaluation


World Bank Institute
Human Development Network
Middle East and North Africa Region

Measuring Impact: Impact Evaluation Methods for Policy Makers
Paul Gertler, UC Berkeley

Note: slides by Sebastian Martinez, Christel Vermeersch and Paul Gertler. The content of this presentation reflects the views of the authors and not necessarily those of the World Bank. This version: November 2009.

Impact Evaluation
  Logical Framework: how the program works “in theory”
  Measuring Impact: Identification Strategy
  Data
  Operational Plan
  Resources

Measuring Impact
1) Causal Inference
  Counterfactuals
  False Counterfactuals:
    Before & After (pre & post)
    Enrolled & Not Enrolled (apples & oranges)
2) IE Methods Toolbox:
  Random Assignment
  Random Promotion
  Discontinuity Design
  Difference in Differences (Diff-in-diff)
  Matching (P-score matching)

Our Objective
  Estimate the CAUSAL effect (impact) of intervention P (program or treatment) on outcome Y (indicator, measure of success)
  Example: what is the effect of a Health Insurance Subsidy Program (P) on Out of Pocket Health Expenditures (Y)?

Causal Inference
  What is the impact of P on Y?
  Answer: α = (Y | P=1) - (Y | P=0)
  Can we all go home?

Problem of missing data
  For a program beneficiary:
    we observe (Y | P=1): health expenditures (Y) with health insurance subsidy (P=1)
    but we do not observe (Y | P=0): health expenditures (Y) without health insurance subsidy (P=0)
  α = (Y | P=1) - (Y | P=0)
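The missing-data problem can be made concrete with a small sketch. The three households, their dollar amounts, and the field names `y_with` and `y_without` are hypothetical, invented only for illustration. The point is that the evaluator's data set holds one of the two potential outcomes per household and never both.

```python
# Hypothetical potential outcomes: yearly health expenditures with (P=1)
# and without (P=0) the subsidy. In real data only one of the two exists.
households = [
    {"id": 1, "enrolled": True,  "y_with": 8.0, "y_without": 15.0},
    {"id": 2, "enrolled": True,  "y_with": 7.0, "y_without": 14.0},
    {"id": 3, "enrolled": False, "y_with": 9.0, "y_without": 20.0},
]

def observe(household):
    """Return what an evaluator can see: the outcome under the actual status.

    The other potential outcome is the counterfactual and is reported as None.
    """
    if household["enrolled"]:
        return {"id": household["id"], "y_p1": household["y_with"], "y_p0": None}
    return {"id": household["id"], "y_p1": None, "y_p0": household["y_without"]}

observed = [observe(h) for h in households]

# With both potential outcomes (never available in practice) the household-level
# impact would be alpha = (Y | P=1) - (Y | P=0).
true_alpha = [h["y_with"] - h["y_without"] for h in households]

for row in observed:
    print(row)
print("alpha per household (unobservable in practice):", true_alpha)
```

Every observed row has a `None` in one column, which is why the counterfactual has to be estimated rather than read off the data.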

Solution
  Estimate what would have happened to Y in the absence of P
  We call this the COUNTERFACTUAL
  The key to a good impact evaluation is a valid counterfactual!

Estimating Impact of P on Y
  OBSERVE (Y | P=1): outcome with treatment
  ESTIMATE (Y | P=0): counterfactual
  α = (Y | P=1) - (Y | P=0)
  IMPACT = outcome with treatment - counterfactual
  Intention to Treat (ITT): those offered treatment
  Treatment on the Treated (TOT): those receiving treatment
  Use comparison or control group

Example: What is the impact of giving Fulanito additional pocket money (P) on Fulanito’s consumption of candies (Y)?

The perfect “Clone”
  Fulanito: 6 Candies
  Fulanito’s Clone: 4 Candies
  Impact = 6 - 4 = 2 Candies

In reality, use statistics
  Treatment: Average Y = 6 Candies
  Comparison: Average Y = 4 Candies
  Impact = 6 - 4 = 2 Candies
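As a minimal sketch of "use statistics", the impact estimate is the average outcome in the treatment group minus the average outcome in the comparison group. The individual candy counts below are hypothetical, chosen only so that the group averages match the slide (6 and 4); the function name `estimated_impact` is likewise an illustration.

```python
from statistics import mean

# Hypothetical candy counts, chosen so the group averages match the slide
# (treatment average 6, comparison average 4).
treatment_candies = [5, 6, 7, 6]
comparison_candies = [4, 3, 5, 4]

def estimated_impact(treated, comparison):
    """Average outcome of the treated minus average outcome of the comparison group."""
    return mean(treated) - mean(comparison)

impact = estimated_impact(treatment_candies, comparison_candies)
print(f"Impact = {mean(treatment_candies)} - {mean(comparison_candies)} = {impact} candies")
```

The estimate is only as good as the comparison group: the subtraction is trivial, while finding a group that behaves like a clone is the hard part.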

Finding Good Comparison Groups
  We want to find “clones” for the Fulanitos in our programs
  The treatment and comparison groups should have identical characteristics, except for benefiting from the intervention
  In practice, use program eligibility & assignment rules to construct valid counterfactuals
  With a good comparison group, the only reason for different outcomes between treatments and controls is the intervention (P)

National Health System Reform
  Closing gap in access and quality of services between rural and urban areas
  Large expansion in supply of health services
  Reduction of health care costs for rural poor
Health Insurance Subsidy Program (HISP)
  Pilot program
  Covers costs for primary health care and drugs
  Targeted to poor: eligibility based on poverty index
Rigorous impact evaluation with rich data
  200 communities, 10,000 households
  Baseline and follow-up data two years later
Many outcomes of interest
  Yearly out of pocket health expenditures per capita
What is the effect of HISP (P) on health expenditures (Y)?
  If impact is a reduction of $9 or more, then scale up nationally

Case Study: HISP
  [Figure: Eligibility and Enrollment. Groups shown: Ineligibles (Non-Poor) and Eligibles (Poor); Not Enrolled and Enrolled.]

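False Counterfactuals:
  Before & After (pre & post)
  Enrolled & Not Enrolled (apples & oranges)

Counterfeit Counterfactual #1: Before & After
  [Figure: outcome Y against Time, from T=0 (Baseline) to T=1 (Endline), with point C (counterfactual) marked and the gap labeled “IMPACT?”]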

Case 1: Before & After
  Observe only beneficiaries (P=1)
  2 observations in time:
    expenditures at T=0
    expenditures at T=1
  “Impact” = A-B

What is the effect of HISP (P) on health expenditures (Y)?

                           Outcome with Treatment   Counterfactual   Impact
                           (After)                  (Before)         (Y | P=1) - (Y | P=0)
health expenditures (Y)    7.8                      14.4             -6.6**

                                               Linear Regression   Multivariate Linear Regression
estimated impact on health expenditures (Y)    -6.59**             -6.65**

Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).

Case 1: What’s the Problem?
  [Figure: Y against time from T=0 to T=1, showing the before & after estimate α = -$6.6, alternative counterfactual points C and D, and the label “Impact?”]
  Economic Boom: Real Impact = A-C; A-B is an underestimate
  Economic Recession: Real Impact = A-D; A-B is an overestimate
  Problem with before & after: doesn’t control for other time-varying factors!
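The bias in a before & after comparison can be sketched with a few lines of arithmetic. The values of A (7.8) and B (14.4) come from the Case 1 table; the three counterfactual scenarios for T=1 are hypothetical, since the slides do not say what expenditures would have been without the program. The function name is an illustration only.

```python
def before_after_vs_true(after, before, counterfactual_at_endline):
    """Compare the before & after estimate (A - B) with the impact measured
    against what would have happened without the program at T=1."""
    naive = after - before                            # A - B
    true_impact = after - counterfactual_at_endline   # A - counterfactual
    bias = naive - true_impact                        # change that would have occurred anyway
    return naive, true_impact, bias

A = 7.8    # enrolled households at T=1 (from the slides)
B = 14.4   # the same households at T=0 (from the slides)

# HYPOTHETICAL scenarios for expenditures at T=1 without the program.
scenarios = {
    "no change over time": 14.4,
    "expenditures would have risen": 17.4,
    "expenditures would have fallen": 11.4,
}

for label, counterfactual in scenarios.items():
    naive, true_impact, bias = before_after_vs_true(A, B, counterfactual)
    print(f"{label}: A-B = {naive:.1f}, "
          f"impact vs counterfactual = {true_impact:.1f}, bias = {bias:.1f}")
```

The before & after estimate is the same in every scenario, while the impact against the counterfactual changes. The two agree only when nothing else moves over time.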

False Counterfactual #2: Enrolled & Not Enrolled
  If we have post-treatment data on
    Enrolled: treatment group
    Not-enrolled: “control” group (counterfactual)
      Those ineligible to participate
      Those that choose NOT to participate
  Selection Bias
    Reason for not enrolling may be correlated with outcome (Y)
    Control for observables, but not unobservables!!
    Estimated impact is confounded with other things

Case 2: Enrolled & Not Enrolled
  Measure outcomes in post-treatment (T=1)
  [Figure: Ineligibles (Non-Poor) and Eligibles (Poor); Enrolled: Y = 7.8; Not Enrolled: Y = 21.8]
  In what ways might enrolled & not enrolled be different, other than their enrollment in the program?

                           Outcome with Treatment   Counterfactual   Impact
                           (Enrolled)               (Not Enrolled)   (Y | P=1) - (Y | P=0)
health expenditures (Y)    7.8                      21.8             -14**

                                               Linear Regression   Multivariate Linear Regression
estimated impact on health expenditures (Y)    -13.9**             -9.4**

Will you recommend scaling up HISP?
  Before-After: Are there other time-varying factors that also influence health expenditures?
  Enrolled-Not Enrolled: Are reasons for enrolling correlated with health expenditures? Selection Bias
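Selection bias in an enrolled versus not-enrolled comparison can be illustrated with a small simulation. Everything in it is an assumption made for the sketch: the true effect of -10, the spending levels of poor and non-poor households, and the 60% enrollment rate among the poor. None of these numbers come from the HISP data.

```python
import random
from statistics import mean

rng = random.Random(0)          # fixed seed so the sketch is reproducible
TRUE_EFFECT = -10.0             # assumed true impact (hypothetical)

enrolled_y, not_enrolled_y = [], []
for _ in range(10_000):
    poor = rng.random() < 0.5
    # Assumption: without the program, non-poor households spend more on health.
    y_without = rng.gauss(18.0 if poor else 24.0, 3.0)
    # Only poor (eligible) households can enroll; assume 60% of them do.
    enrolls = poor and rng.random() < 0.6
    if enrolls:
        enrolled_y.append(y_without + TRUE_EFFECT)
    else:
        not_enrolled_y.append(y_without)

naive_estimate = mean(enrolled_y) - mean(not_enrolled_y)
print(f"true effect: {TRUE_EFFECT:.1f}")
print(f"enrolled minus not enrolled: {naive_estimate:.1f}")
```

The comparison overstates the reduction because the not-enrolled group includes non-poor households that would have spent more anyway. In this sketch the source of the difference (poverty status) is observable; in practice, unobserved reasons for enrolling can produce the same kind of bias and cannot be controlled for.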

Policy Recommendation?

Case                              Method                           Impact on health expenditures (Y)
Case 1: Before and After          Linear Regression                -6.59**
Case 1: Before and After          Multivariate Linear Regression   -6.65**
Case 2: Enrolled & Not-Enrolled   Linear Regression                -13.9**
Case 2: Enrolled & Not-Enrolled   Multivariate Linear Regression   -9.4**

Keep in mind…
Two common comparisons to be avoided!!
  Before & After (pre & post)
    Compare: same individuals before and after they receive P
    Problem: other things may have happened over time
  Enrolled & Not Enrolled (apples & oranges)
    Compare: a group of individuals that enrolled in a program with a group that chooses not to enroll
    Problem: Selection Bias; we don’t know why they are not enrolled
  Both counterfactuals may lead to biased estimates of the impact

2) IE Methods Toolbox:
  Random Assignment
  Random Promotion
  Discontinuity Design
  Difference in Differences (Diff-in-diff)
  Matching (P-score matching)

Choosing your IE method(s)…
Key information you will need for identifying the right method for your program:
  Prospective/retrospective evaluation?
  Eligibility rules and criteria?
    Poverty targeting?
    Geographic targeting?
  Roll-out plan (pipeline)?
  Is the number of eligible units larger than available resources at a given point in time?
    Budget and capacity constraints?
    Excess demand for program?
    Etc.

Choosing your IE method(s)…
  Best design = best comparison group you can find + least operational risk
  Have we controlled for “everything”?
    Internal validity
    Good comparison group
  Is the result valid for “everyone”?
    External validity
    Local versus global treatment effect
    Evaluation results apply to population we’re interested in
  Choose the “best” possible design given the operational context


Randomized Treatments and Controls
  When universe of eligibles > # benefits: Randomize!
    Lottery for who is offered benefits
    Fair, transparent and ethical way to assign benefits to equally deserving populations
  Oversubscription:
    Give each eligible unit the same chance of receiving treatment
    Compare those offered treatment with those not offered treatment (controls)
  Randomized phase in:
    Give each eligible unit the same chance of receiving treatment first, second, third…
    Compare those offered treatment first, with those offered treatment later (controls)
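The two designs above, a lottery under oversubscription and a randomized phase-in, can be sketched as follows. The function names, the community IDs and the seed are hypothetical and used only for illustration; a real assignment would normally be run in public or be otherwise auditable.

```python
import random

def lottery(eligible_units, n_slots, seed=None):
    """Offer treatment to n_slots units drawn with equal probability; the rest are controls."""
    rng = random.Random(seed)
    shuffled = list(eligible_units)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_slots]), sorted(shuffled[n_slots:])

def randomized_phase_in(eligible_units, n_phases, seed=None):
    """Randomly order units, then split them into phases of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(eligible_units)
    rng.shuffle(shuffled)
    return [sorted(shuffled[i::n_phases]) for i in range(n_phases)]

communities = list(range(1, 201))          # hypothetical IDs for 200 communities
treatment, control = lottery(communities, n_slots=100, seed=2009)
phases = randomized_phase_in(communities, n_phases=2, seed=2009)

print(len(treatment), "treatment communities;", len(control), "control communities")
print("phase sizes:", [len(p) for p in phases])
```

Shuffling the full list and cutting it gives every eligible unit the same chance of landing in any group, and fixing the seed makes the draw reproducible for later verification.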

Randomized treatments and controls
  [Figure: 1. Universe (units marked Ineligible or Eligible); 2. Random Sample of Eligibles; 3. Randomize Treatment (Control group shown); annotated with External Validity and Internal Validity]

Unit of Randomization
  Choose according to type of program:
    Individual/Household
    School/Health Clinic/catchment area
    Block/Village/Community
    Ward/District/Region
  Keep in mind:
    Need “sufficiently large” number of units to detect minimum desired impact: power
    Spillovers/contamination
    Operational and survey costs
  As a rule of thumb, randomize at the smallest viable unit of implementation

Case 3: Random Assignment
  Health Insurance Subsidy Program (HISP)
  Unit of randomization: Community
  200 communities in the sample
  Randomized phase-in:
    100 treatment communities (5,000 households): started receiving transfers at baseline T = 0
    100 control communities (5,000 households): receive transfers after follow up T = 1 if program is scaled up
  [Figure: 100 Treatment Communities (5,000 HH) and 100 Control Communities (5,000 HH) over the comparison period]
  How do we know we have good clones?

Case 3: Balance at Baseline

                                            Control   Treatment   T-stat
Health Expenditures ($ yearly per capita)   14.57     14.48       -0.39
Head’s age (years)                          42.3      41.6        1.2
Spouse’s age (years)                        36.8      36.8        -0.38
Head’s education (years)                    2.8       2.9         -2.16**
Spouse’s education (years)                  2.6       2.7         -0.006

** = significant at 1%

Case 3: Balance at Baseline (continued)

                              Control   Treatment   T-stat
Head is female = 1            0.07      0.07        0.66
Indigenous = 1                0.42      0.42        0.21
Number of household members   5.7       5.7         -1.21
Bathroom = 1                  0.56      0.57        -1.04
Hectares of Land              1.71      1.67        1.35
Distance to hospital (km)     106       109         -1.02
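A balance check of this kind compares baseline means across the two randomized groups and reports a t-statistic for each characteristic. The sketch below computes an unequal-variance two-sample t-statistic on simulated data; the ages are hypothetical draws, not the HISP survey, and the function name is an illustration. Because HISP randomized whole communities, a real analysis would also need to account for clustering, which this sketch ignores.

```python
import math
import random
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Two-sample t-statistic (unequal variances) for the difference in means a - b."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# HYPOTHETICAL baseline data: household head's age in two randomized groups,
# both drawn from the same distribution, as random assignment would imply.
rng = random.Random(1)
control_age = [rng.gauss(42.0, 10.0) for _ in range(5000)]
treatment_age = [rng.gauss(42.0, 10.0) for _ in range(5000)]

t = t_statistic(treatment_age, control_age)
print(f"means: control {mean(control_age):.1f}, "
      f"treatment {mean(treatment_age):.1f}, t = {t:.2f}")
# In large samples, |t| below about 2.58 means no difference is detected at the 1% level.
```

With many characteristics tested, an occasional significant difference can arise by chance even under a valid randomization, so the table is read as a whole rather than row by row.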

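Case 3: Random Assignment

                                          Treatment Group             Counterfactual               Impact
                                          (Randomized to treatment)   (Randomized to comparison)   (Y | P=1) - (Y | P=0)
Baseline (T=0) health expenditures (Y)    14.48                       14.57                        -0.09
Follow-up (T=1) health expenditures (Y)   7.8                         17.9                         -10.1**

                                               Linear Regression   Multivariate Linear Regression
estimated impact on health expenditures (Y)    -10.1**             -10**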
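The follow-up difference in means (7.8 - 17.9 = -10.1) matches the linear regression estimate. A minimal sketch of why, assuming the "Linear Regression" column is a regression of Y on a treatment dummy P alone: in that case the coefficient on P equals the difference in group means. The household values below are hypothetical and the helper `ols_slope` is an illustration; the multivariate version, which adds household characteristics, is not shown.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# HYPOTHETICAL follow-up expenditures for a handful of households.
treated_y = [7.0, 8.0, 9.0]
control_y = [17.0, 18.0, 19.0]

difference_in_means = mean(treated_y) - mean(control_y)

# Regress Y on a treatment dummy P (1 = randomized to treatment, 0 = comparison).
p = [1] * len(treated_y) + [0] * len(control_y)
y = treated_y + control_y
regression_estimate = ols_slope(p, y)

print("difference in means:", difference_in_means)
print("coefficient on P:", regression_estimate)
```

Under random assignment the two groups are comparable at baseline, so this simple difference can be read as the impact. The same regression applied to non-randomized groups returns the same arithmetic but not a causal estimate.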

HISP Policy Recommendation?

Case                              Method                           Impact of HISP on health expenditures (Y)
Case 1: Before and After          Multivariate Linear Regression   -6.65**
Case 2: Enrolled & Not-Enrolled   Linear Regression                -13.9**
Case 2: Enrolled & Not-Enrolled   Multivariate Linear Regression   -9.4**
Case 3: Random Assignment         Multivariate Linear Regression   -10**

Keep in mind…
  Random Assignment:
    With large enough samples, produces two groups that are statistically equivalent
    We have identified the perfect “clone”
      [Figure: Randomized beneficiary and Randomized comparison]
    Feasible for prospective evaluations with over-subscription/excess demand
    Most pilots and new programs fall into this category!

Remember…
  Objective of impact evaluation is to estimate the CAUSAL effect or IMPACT of a program on outcomes of interest
  To estimate impact, we need to estimate the counterfactual
    What would have happened in the absence of the program
    Use comparison or control groups
  We have a toolbox with 5 methods to identify good comparison groups
  Choose the best evaluation method that is feasible in the program’s operational context

THANK YOU!
