Practical Sampling for Impact Evaluations

Global Workshop on Development Impact Evaluation in Finance and Private Sector, Rio de Janeiro, June 6-10, 2011. Practical Sampling for Impact Evaluations. Vincenzo di Maro.


Page 1: Practical Sampling for  Impact Evaluations

Global Workshop on Development Impact Evaluation in Finance and Private Sector
Rio de Janeiro, June 6-10, 2011

Practical Sampling for Impact Evaluations

Vincenzo di Maro

Page 2: Practical Sampling for  Impact Evaluations

Introduction

How do we construct a sample to credibly detect a meaningful effect?
• Which populations or groups are we interested in and where do we find them?
• How many people/firms/units should be interviewed/observed from that population?
• How does this affect the evaluation budget?

Page 3: Practical Sampling for  Impact Evaluations

Outline

1. Sampling frame
• What populations or groups are we interested in?
• How do we find them?

2. Sample size
• Why it is so important: confidence in results
• Determinants of appropriate sample size
• Further issues
• Examples

3. Budgets

Page 4: Practical Sampling for  Impact Evaluations

Sampling frame

Who are we interested in?
a) All SMEs?
b) All formal SMEs?
c) All formal SMEs in a particular sector?
d) All formal SMEs in a particular region?

Need to keep in mind external validity
• Can findings from population (c) inform appropriate programs to help informal firms in a different sector?
• Can findings from population (d) inform national policy?

But should also keep in mind feasibility and what you want to learn
• Might not be possible or desirable to pilot a very broadly defined program or policy

Page 5: Practical Sampling for  Impact Evaluations

Sampling frame: Finding the units we’re interested in

Depends on size and type of experiment

Lottery among applicants
• Example: BDS (business development services) program among informal firms in a particular area
• Can use treatment and comparison units from applicant pool
• If not feasible (50,000 get the treatment), need to draw a sample to measure impact

Policy change
• Example: A change in business registration rules in randomly selected districts
• To measure impact on profits, cannot sample all informal businesses in treatment and comparison districts
• Will need to draw a sample of firms within districts

Required information before sampling
• Complete listing of all units of observation available for sampling in each area or group
• Tricky for units like informal firms, but there are techniques to overcome this

Page 6: Practical Sampling for  Impact Evaluations

Outline

1. Sampling frame
• What populations or groups are we interested in?
• How do we find them?

2. Sample size
• Why it is so important: confidence in results
• Determinants of appropriate sample size
• Further issues
• Examples

3. Budgets

Page 7: Practical Sampling for  Impact Evaluations

Sample size and confidence

Start with a simpler question than program impact

Say we wanted to know the average annual profits of an SME in Rio.
• Option 1: We go out and track down 5 business owners and take the average of their responses.
• Option 2: We track down 1,000 business owners and average their responses.

Which average is likely to be closer to the true average?

Page 8: Practical Sampling for  Impact Evaluations

Sample size and confidence

5 firms

Profits              Number of firms
$0 - $1,000          1
$1,001 - $5,000      2
$5,001 - $10,000     1
$10,001 - $15,000    0
$15,001+             1

1,000 firms

Profits              Number of firms
$0 - $1,000          70
$1,001 - $5,000      150
$5,001 - $10,000     650
$10,001 - $15,000    125
$15,001+             5
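
The contrast between asking 5 firms and asking 1,000 firms can be illustrated with a small simulation. This is only a sketch: the profit distribution below is a made-up lognormal population, not data from Rio, and the function name is hypothetical. It draws many samples of each size and reports how much the sample average moves from one sample to the next.

```python
import random
import statistics


def spread_of_sample_means(population, n, draws=500, seed=0):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)


# Hypothetical population of annual profits (skewed, like most profit data).
_rng = random.Random(42)
population = [_rng.lognormvariate(8.8, 0.6) for _ in range(20_000)]

spread_5 = spread_of_sample_means(population, 5)
spread_1000 = spread_of_sample_means(population, 1000)

print(f"true mean:                  {statistics.fmean(population):,.0f}")
print(f"spread of average, n=5:     {spread_5:,.0f}")
print(f"spread of average, n=1,000: {spread_1000:,.0f}")
```

The average based on 1,000 firms varies far less across samples than the average based on 5 firms, which is why it is more likely to be close to the true average.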

Page 9: Practical Sampling for  Impact Evaluations

Sample size and confidence

Similarly, when determining program impact
• Need many observations to say with confidence whether average outcome of treatment group is higher/lower than in comparison group

What do I mean by confidence?
• Minimizing statistical error

Types of errors
• Type 1 error: You say there is a program impact when there really isn’t one.
• Type 2 error: There really is a program impact but you cannot detect it.

Page 10: Practical Sampling for  Impact Evaluations

Sample size and confidence

Type 1 error: Find program impact when there’s none
• Error can be minimized after data collection, during statistical analysis
• Need to adjust the significance levels of impact estimates (e.g. 99% or 95% confidence intervals)

Type 2 error: Cannot see that there really is a program impact
• In jargon: statistical test has low power
• Error must be minimized before data collection
• Best method of doing this: ensuring you have a large enough sample

Whole point of an impact evaluation is to learn something
• Ex ante: We don’t know how large the impact of this program is
• Low powered ex post: This program might have increased firms’ profits by 50% but we cannot distinguish a 50% increase from an increase of zero with any confidence

Page 11: Practical Sampling for  Impact Evaluations

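Calculating sample size

The formula:

\[
N = \frac{4\,\sigma^{2}\,(z_{\alpha/2} + z_{\beta})^{2}}{D^{2}}\,\bigl[\,1 + \rho\,(H - 1)\,\bigr]
\]

Main things to be aware of:
1. Detectable effect size (D)
2. Probability of type 1 and 2 errors (through z_{\alpha/2} and z_{\beta})
3. Variance of outcome(s) (\sigma^{2})
4. Units (firms, banks) per treated area (H), with \rho the correlation of outcomes among units in the same area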
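
A minimal sketch of how the determinants listed on this slide combine, assuming the common textbook form N = 4σ²(z_{α/2} + z_β)² / D² × [1 + ρ(H − 1)] for a two-arm comparison with equal group sizes. The function and argument names are hypothetical, and the numbers in the usage lines are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(effect, sd, units_per_area=1, icc=0.0,
                      alpha=0.05, power=0.80):
    """Total N (both arms combined) under the assumed formula.

    effect         -- smallest detectable effect D, in outcome units
    sd             -- standard deviation of the outcome (sigma)
    units_per_area -- H, units sampled per treated area (cluster)
    icc            -- rho, correlation of outcomes within an area
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided type 1 error
    z_beta = z.inv_cdf(power)            # power = 1 - P(type 2 error)
    n_unclustered = 4 * sd ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2
    design_effect = 1 + icc * (units_per_area - 1)
    return ceil(n_unclustered * design_effect)


n_simple = total_sample_size(effect=10, sd=50)
n_clustered = total_sample_size(effect=10, sd=50, units_per_area=8, icc=0.05)
print(n_simple, n_clustered)
```

Halving the detectable effect roughly quadruples N, and any positive within-area correlation pushes N up further as more units are drawn from each area.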

Page 12: Practical Sampling for  Impact Evaluations

Calculating sample size

Smallest detectable effect size
• Smallest effect you want to be able to distinguish from zero
• Examples: a 30% increase in sales, a 25% decrease in bribes paid
• Larger samples make it easier to detect smaller effects

Do female and male entrepreneurs work similar hours?
• Claim: On average, women work 40 hours/week, men work 44 hours/week
• If statistic came from sample of 10 women & 10 men
  • Hard to say if they are different
  • Would be easier to say they are different if women work 30 hours/week and men work 80 hours/week
• But if statistic came from sample of 500 women and 500 men
  • More likely that they truly are different
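
The hours example can be made concrete with a rough two-sample z-test. This is a sketch under a loud assumption: the slide gives no variance, so a standard deviation of 10 hours/week in both groups is invented here purely for illustration, and the helper name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(diff, sd, n_per_group):
    """p-value for a difference in means, equal sd and equal group sizes."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z_stat = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


ASSUMED_SD = 10  # hours/week; hypothetical, not from the slide

p_small_sample = two_sided_p_value(diff=44 - 40, sd=ASSUMED_SD, n_per_group=10)
p_large_sample = two_sided_p_value(diff=44 - 40, sd=ASSUMED_SD, n_per_group=500)
p_big_gap = two_sided_p_value(diff=80 - 30, sd=ASSUMED_SD, n_per_group=10)

print(f"4-hour gap, 10 per group:   p = {p_small_sample:.3f}")
print(f"4-hour gap, 500 per group:  p = {p_large_sample:.2g}")
print(f"50-hour gap, 10 per group:  p = {p_big_gap:.2g}")
```

Under this assumption the same 4-hour gap is indistinguishable from zero with 10 people per group but clearly detectable with 500, while a 50-hour gap stands out even in the small sample.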

Page 13: Practical Sampling for  Impact Evaluations

Calculating sample size

How do you choose the smallest detectable effect size?
• Smallest effect that would prompt a policy response
• Smallest effect that would allow you to say that a program was not a failure

This program significantly increased sales by 40%.
• Great - let’s think about how we can scale this up.

This program significantly increased sales by 10%.
• Great... uh... wait: we spent all of that money and it only increased sales by that much?

Page 14: Practical Sampling for  Impact Evaluations

Calculating sample size

Type 1 and Type 2 errors

Type 1
• Significance level of estimates usually set to 1% or 5%
• 1% or 5% probability of concluding there is an effect when in fact there is none

Type 2
• Power usually set to 80% or 90%
• 20% or 10% probability of failing to detect an effect that really exists
• Larger samples give higher power
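
To see how power rises with sample size at a fixed significance level, here is a minimal sketch using the normal approximation for a two-arm comparison of means. The effect (5) and standard deviation (50) are hypothetical inputs chosen for illustration, and the function name is an assumption, not part of any package.

```python
from statistics import NormalDist


def power_two_arm(n_per_arm, effect, sd, alpha=0.05):
    """Approximate power of a two-sided test for a difference in means.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    standard_error = sd * (2 / n_per_arm) ** 0.5
    z_critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect / standard_error - z_critical)


for n in (100, 500, 1570, 3000):
    print(f"n per arm = {n:>5}: power = {power_two_arm(n, effect=5, sd=50):.2f}")
```

With these inputs a design with 100 firms per arm would miss a true effect most of the time, and roughly 1,570 per arm are needed to reach the conventional 80% power.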

Page 15: Practical Sampling for  Impact Evaluations

Calculating sample size

Variance of outcomes
• Less underlying variance makes a difference easier to detect, so a smaller sample can suffice

Page 16: Practical Sampling for  Impact Evaluations

Calculating sample size

Variance of outcomes
• How do we know this before we decide our sample size and collect our data?
  • Ideal pre-existing data often... non-existent
  • Can use pre-existing data from a similar population
  • Example: Enterprise Surveys, labor force surveys
• Makes this a bit of guesswork, not a foolproof exercise
• Use as a guide

Page 17: Practical Sampling for  Impact Evaluations

Further issues

1. Multiple treatment arms

2. Group-disaggregated results

3. Take-up

4. Data quality

Page 18: Practical Sampling for  Impact Evaluations

Further issues

Multiple treatment arms
• Straightforward to compare each treatment separately to the comparison group
• To compare treatment groups requires very large samples
  • Especially if treatments very similar, differences between the treatment groups would be smaller
  • In effect, it’s like fixing a very small detectable effect size

Group-disaggregated results
• Are effects different for men and women? For different sectors?
• If genders/sectors expected to react in a similar way, then estimating differences in treatment impact also requires very large samples

Page 19: Practical Sampling for  Impact Evaluations

Who is taller?

Detecting smaller differences is harder

Page 20: Practical Sampling for  Impact Evaluations

Further issues

Group-disaggregated results
• To ensure balance across treatment and comparison groups, good to divide sample into strata before assigning treatment

Strata
• Sub-populations
• Common strata: geography, gender, sector, initial values of outcome variable
• Treatment assignment (or sampling) occurs within these groups

Page 21: Practical Sampling for  Impact Evaluations

Why do we need strata?

Geography example [map of units; legend: T = treatment, C = comparison]

Page 22: Practical Sampling for  Impact Evaluations

Why do we need strata?

What’s the impact in a particular region?
• Sometimes hard to say with any confidence

Page 23: Practical Sampling for  Impact Evaluations

Why do we need strata?

Random assignment to treatment within geographical units
• Within each unit, ½ will be treatment, ½ will be comparison.
• Similar logic for gender, industry, firm size, etc.
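
Random assignment within strata can be sketched in a few lines. This is an illustration only: the firm IDs and region names are invented, and `assign_within_strata` is a hypothetical helper rather than a function from any package.

```python
import random
from collections import defaultdict


def assign_within_strata(stratum_by_unit, seed=0):
    """Randomly assign half of each stratum to treatment (T), the rest to comparison (C).

    stratum_by_unit maps a unit ID to its stratum (e.g. region).
    In a stratum with an odd number of units, the extra unit goes to C.
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    members = defaultdict(list)
    for unit, stratum in stratum_by_unit.items():
        members[stratum].append(unit)

    assignment = {}
    for stratum in sorted(members):
        units = sorted(members[stratum])
        rng.shuffle(units)
        half = len(units) // 2
        for position, unit in enumerate(units):
            assignment[unit] = "T" if position < half else "C"
    return assignment


# Hypothetical frame: 40 firms spread evenly over four regions.
regions = ["North", "South", "East", "West"]
firms = {f"firm_{i:03d}": regions[i % 4] for i in range(40)}
assignment = assign_within_strata(firms)

for region in regions:
    treated = sum(1 for f, r in firms.items() if r == region and assignment[f] == "T")
    print(region, "treated:", treated, "comparison:", 10 - treated)
```

Because the draw happens separately inside each region, every region ends up with the same number of treatment and comparison firms, which simple randomization over the whole sample does not guarantee.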

Page 24: Practical Sampling for  Impact Evaluations

Further issues

Take-up
• Low take-up increases detectable effect size
  • Can only find an effect if it is really large
  • Effectively decreases sample size

Example: Offering matching grants to SMEs for BDS services
• Offer to 5,000 firms
• Only 50 participate
• Probably can only say there is an effect on sales with confidence if they become Fortune 500 companies
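
A rough way to quantify the take-up problem, assuming the program only affects firms that participate and that required sample size scales with the inverse square of the detectable effect. The symbols p (take-up rate), D (effect on participants) and N (required sample) are notation introduced here for illustration.

```latex
% Average effect over everyone offered the program is diluted by take-up:
D_{\text{offered}} = p \cdot D
% Required sample size scales with 1 / (effect)^2, so
N(p) = \frac{N(1)}{p^{2}}
% Matching grants example: p = 50 / 5000 = 0.01
N(0.01) = \frac{N(1)}{0.01^{2}} = 10{,}000 \times N(1)
```

With 1% take-up, the sample would need to be about 10,000 times larger than under full take-up to detect the same effect on participants, which is why only an enormous effect could be detected.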

Page 25: Practical Sampling for  Impact Evaluations

Further issues

Data quality
• Poor data quality effectively increases required sample size
  • Missing observations
  • Increased noise
• Can be partly addressed with field coordinator on the ground monitoring data collection

Page 26: Practical Sampling for  Impact Evaluations

Example from Ghana

Calculations can be made in many statistical packages, e.g. Stata, Optimal Design

Experiment in Ghana designed to increase the profits of microenterprise firms

Baseline profits
• 50 cedi per month.
• Profits data typically noisy, so a coefficient of variation >1 common.

Example Stata code to detect 10% increase in profits:
• sampsi 50 55, p(0.8) pre(1) post(1) r1(0.5) sd1(50) sd2(50)
• Having both a baseline and endline decreases required sample size (pre and post)

Page 27: Practical Sampling for  Impact Evaluations

Example from Ghana

Results
• 10% increase (from 50 to 55): 1,178 firms in each group
• 20% increase (from 50 to 60): 295 firms in each group
• 50% increase (from 50 to 75): 48 firms in each group (but this effect size is not realistic)

What if take-up is only 50%?
• Offer business training that increases profits by 20%, but only half the firms do it.
• Mean for treated group = 0.5*50 + 0.5*60 = 55
• Equivalent to detecting a 10% increase with 100% take-up: need 1,178 in each group instead of 295 in each group
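
The Ghana figures can be reproduced outside Stata with a short calculation. This sketch assumes the reported numbers correspond to an analysis that adjusts for baseline profits (ANCOVA), so that the post-only sample size is multiplied by (1 - r²) with r = 0.5, and that the test is two-sided at 5%. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def firms_per_group(mean_control, mean_treated, sd, baseline_corr=0.0,
                    alpha=0.05, power=0.80):
    """Firms needed in each group, one baseline and one follow-up round."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    effect = mean_treated - mean_control
    n_follow_up_only = 2 * (z_sum * sd / effect) ** 2
    # Assumed adjustment: controlling for baseline removes a share r^2 of the variance.
    return ceil(n_follow_up_only * (1 - baseline_corr ** 2))


for target in (55, 60, 75):
    n = firms_per_group(50, target, sd=50, baseline_corr=0.5)
    print(f"from 50 to {target}: {n} firms in each group")

# 50% take-up of a training that raises profits from 50 to 60 for participants.
take_up = 0.5
diluted_mean = 50 + take_up * (60 - 50)
n_half_take_up = firms_per_group(50, diluted_mean, sd=50, baseline_corr=0.5)
print(f"50% take-up: {n_half_take_up} firms in each group")
```

Under these assumptions the calculation returns the same 1,178, 295 and 48 firms per group, and shows that halving take-up quadruples the required sample.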

Page 28: Practical Sampling for  Impact Evaluations

Outline

1. Sampling frame
• What populations or groups are we interested in?
• How do we find them?

2. Sample size
• Why it is so important: confidence in results
• Determinants of appropriate sample size
• Further issues
• Examples

3. Budgets

Page 29: Practical Sampling for  Impact Evaluations

Budgets

What is required?
• Data collection
  • Survey firm
  • Data entry
• Field coordinator to ensure treatment follows randomization protocol and to monitor data collection
• Data analysis

Page 30: Practical Sampling for  Impact Evaluations

Budgets

How much will all of this cost? Huge range. Often depends on
• Length of survey
• Ease of finding respondents
• Spatial dispersion of respondents
• Security issues
• Formal vs informal firms
• Required human capital of enumerator
• Et cetera...

• Firm-level survey data: $40-350/firm
• Household survey data: $40+/household
• Field coordinator: $10,000-$40,000/year
  • Depends on whether you can find a local hire
• Administrative data: Usually free
  • Sometimes has limited outcomes, can miss most of the informal sector

Page 31: Practical Sampling for  Impact Evaluations

Budgets

Money can buy power!

Budget               $10,000   $25,000   $37,000   $49,000
Firms per cluster    8         7         8         7
Clusters             55        147       205       294
Total obs            440       1,029     1,640     2,058
Power                0.30      0.64      0.80      0.90

Page 32: Practical Sampling for  Impact Evaluations

Summing up

The sample size of your impact evaluation will determine how much you can learn from your experiment

Some judgment and guesswork in calculations but important to spend time on them
• If sample size is too low: waste of time and money because you will not be able to detect a non-zero impact with any confidence
• If little effort put into sample design and data collection: See above.

Questions?