Time and Duration of Exposure

Timing and Duration of Exposure in Evaluations of Social Programs

Elizabeth M. King and Jere R. Behrman

Impact evaluations aim to measure the outcomes that can be attributed to a specific policy or intervention. While there have been excellent reviews of the different methods for estimating impact, insufficient attention has been paid to questions related to timing: How long after a program has begun should it be evaluated? For how long should treatment groups be exposed to a program before they benefit from it? Are there time patterns in a program's impact? This paper examines the evaluation issues related to timing, and discusses the sources of variation in the duration of exposure within programs and their implications for impact estimates. It reviews the evidence from careful evaluations of programs (with a focus on developing countries) on the ways that duration affects impacts.

A critical risk that faces all development aid is that it will not pay off as expected (or that it will not be perceived as effective) in reaching development targets. Despite the billions of dollars spent on improving health, nutrition, learning, and household welfare, we know surprisingly little about the impact of many social programs in developing countries. One reason for this is that governments and the development community tend to expand programs quickly even in the absence of credible evidence, which reflects an extreme impatience towards adequately piloting and assessing new programs first. This impatience is understandable given the urgency of the problems being addressed, but it can result in costly but avoidable mistakes and failures; it can also result in really promising new programs being terminated too soon when a rapid assessment shows negative or no impact.

However, recent promises of substantially more aid from rich countries and large private foundations have intensified interest in assessing aid effectiveness.
This interest is reflected in a call for more evaluations of the impact of donor-funded programs in order to understand what type of intervention works and what doesn't.¹

© The Author 2009. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: [email protected]. doi:10.1093/wbro/lkn009. Advance Access publication February 23, 2009. The World Bank Research Observer 24:55–82.

Researchers are responding enthusiastically to this call. There have been important developments in evaluation methods as they apply to social programs, especially on the question of how best to identify a group with which to compare intended program beneficiaries: that is, a group of people who would have had the same outcomes as the program group without the program.²

The timing question in evaluations, however, is arguably as important but relatively understudied. This question has many dimensions. For how long after a program has been launched should one wait before evaluating it? How long should treatment groups be exposed to a program before they can be expected to benefit from it, either partially or fully? How should one take account of the heterogeneity in impact that is related to the duration of exposure? This timing issue is relevant for all evaluations, but particularly so for the evaluation of social programs that require changes in the behaviors of both service providers and service users in order to bring about measurable outcomes. If one evaluates too early, there is a risk of finding only partial or no impact; too late, and there is a risk that the program might lose donor and public support or that a badly designed program might be expanded. Figure 1 illustrates this point by showing that the true impact of a program may not be immediate or constant over time, for reasons that we discuss in this paper. Comparing two hypothetical programs whose impact differs over time, we see that an evaluation undertaken at time t1 indicates that the case in the bottom panel has a higher impact than the case in the top panel, while an evaluation at time t3 suggests the opposite result.

Figure 1. The Timing of Evaluations Can Affect Impact Estimates
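The reversal just described can be reproduced with a small simulation. The two trajectories below are purely hypothetical (the functional forms and all parameters are invented), built only to show how the ranking of two programs flips with the evaluation date:

```python
import math

# Hypothetical impact trajectories (in standard-deviation units of some
# outcome) as functions of months since program launch.
def impact_a(t):
    # Impact builds slowly (behavior change takes time), then saturates.
    return 0.5 * (1.0 - math.exp(-0.08 * t))

def impact_b(t):
    # Strong early impact that wanes over time (e.g., a pioneering effect).
    return 0.4 * math.exp(-0.03 * t)

for t in (6, 48):  # an early and a late evaluation date, in months
    a, b = impact_a(t), impact_b(t)
    better = "A" if a > b else "B"
    print(f"month {t:2d}: A = {a:.2f}, B = {b:.2f} -> program {better} looks better")
```

At the early date program B dominates; by the late date the ordering has reversed, which is precisely the risk of fixing a single evaluation date without regard to the time path of impact.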
This paper discusses key issues related to the timing of programs and the time path of their impact, and how these have been addressed in evaluations.³ Many evaluations treat interventions as if they were instantaneous, predictable changes in conditions and equal across treatment groups. Many evaluations also implicitly assume that the effect on individuals is dichotomous (that is, that individuals are either exposed or not), as might be the case in a one-shot vaccination program that provides permanent immunization. There is no consideration of the possibility that the effects vary according to differences in program exposure.⁴ Whether the treatment involves immunization or a more process-oriented program such as community organization, the unstated assumptions are often that the treatment occurs at a specified inception date, and that it is implemented completely and in precisely the same way across treatment groups.

There are several reasons why implementation is neither immediate nor perfect, why the duration of exposure to a treatment differs not only across program areas but also across ultimate beneficiaries, and why varying lengths of exposure might lead to different estimates of program impact. This paper discusses three broad sources of variation in duration of exposure, and reviews the literature related to those sources (see Appendix Table A-1 for a list of the studies reviewed). One source pertains to organizational factors that affect the leads and lags in program implementation, and to timing issues related to program design and the objectives of an evaluation. A second source refers to spillover effects, including variation that arises from the learning and adoption by beneficiaries and possible contamination of the control groups. Spillover effects are external (to the program) sources of variation in the treatment: while these may pertain more to compliance than timing, they can appear and intensify with time, and so affect estimates of program impact. A third source pertains to heterogeneous responses to treatment.
Although there can be different sources of heterogeneity in impact, the focus here is on those associated with age or cohort, especially as these cohort effects interact with how long a program has been running.

Organizational Factors and Variation in Program Exposure

Program Design and the Timing of Evaluations

How long one should wait to evaluate a program depends on the nature of the intervention itself and the purpose of the evaluation. For example, in the case of HIV/AIDS or tuberculosis treatment programs, adherence to the treatment regime over a period of time is necessary for the drugs to be effective. While drug effectiveness in treating the disease is likely to be the outcome of interest, an evaluation of the program might also consider adherence rates as an intermediate outcome of the program, and so the evaluation need not take place only at the end of the program but during the implementation itself. In the case of worker training programs, workers must first enroll for the training, and then some time passes during which the training occurs. If the training program has a specific duration, the evaluation should take place after the completion of the training program.

However, timing may not be so easy to pin down if the timing of the intervention itself is the product of a stochastic process. For example, a market downturn may cause workers to be unemployed, triggering their eligibility for worker training, or a market upturn may cause trainees to leave the program to start a job, as Ravallion and others (2005) observe in Argentina's Trabajar workfare program. In cases where the timing of entry into (or exit from) a program itself differs across potential beneficiaries, the outcomes of interest depend on an individual selection process and on the passage of time. An evaluation of these programs should consider selection bias. Randomized evaluations of trials with well-defined start and end dates do not address this issue.

In fact the timing of a program may be used for identification purposes. For example, some programs are implemented in phases.
If the phasing is applied randomly, the random variation in duration can be used for identification purposes in estimating program impact (Rosenzweig and Wolpin 1986 is a seminal article on this point). One instance is Mexico's PROGRESA (Programa de Educacion, Salud y Alimentacion), which was targeted at the poorest rural communities when it began. Using administrative and census data on measures of poverty, the program identified the potential beneficiaries. Of the 506 communities chosen for the evaluation sample, about two-thirds were randomly selected to receive the program activities during the first two years of the program, starting in mid-1998, while the remaining one-third received the program in the third year, starting in the fall of 2000. The group that received the intervention later has been used as a control group in evaluations of PROGRESA (see, for example, Schultz 2004; Behrman, Sengupta, and Todd 2005).

One way to regard duration effects is that, given constant dosage or intensity of a treatment, lengthening duration of exposure is akin to increasing intensity, and thus the likelihood of greater impact. Two cases show that impact is likely to be underestimated if the evaluation coverage is too short. First, skill development programs are an obvious example of the importance of the duration of program exposure: beneficiaries who attend only part of a training course are less likely to benefit from the course and attain the program goals than those who complete the course. In evaluating the impact of a training course attended by students, Rouse and Krueger (2004) distinguish between students who completed the computer instruction offered through the Fast ForWord program and those who did not. The authors define completion as a function of the amount of training
attended and the actual progress of students toward the next stage of the program, as reflected in the percentage of exercises at the current level mastered at a prespecified level of proficiency.⁵ The authors find that, among students who received more comprehensive treatment (as reflected by the total number of completed days of training and the level of achievement of the completion criteria), performance improved more quickly on one of the reading tests (but not all) that the authors use.

Banerjee and others (2007) evaluate two randomly assigned programs in urban India: a remedial training program that hired young women to teach children with low literacy and numeracy skills, and a computer-assisted learning program. Illustrating the point that a longer duration of exposure intensifies treatment, the remedial program raised average test scores by 0.14 of a standard deviation in the first year and 0.28 of a standard deviation in the second year of the program, while computer-assisted learning increased math scores by 0.35 of a standard deviation in the first year and 0.47 of a standard deviation in the second year. The authors interpret the larger estimate in the second year as an indication that the first year laid the foundation for the program to help the children benefit from its second year.

Lags in Implementation

One assumption that impact evaluations often make is that, once a program starts, its implementation occurs at a specific and knowable time that is usually determined at a central program office. Program documents, such as World Bank project loan documents, typically contain official project launch dates, but these dates often differ from the date of actual implementation in a project area. When a program actually begins depends on supply- and demand-related realities in the field.
For example, a program requiring material inputs (such as textbooks or medicines) relies on the arrival of those inputs in the program areas: the timing of the procurement of the inputs by the central program office may not indicate accurately when those inputs arrive at their intended destinations.⁶ In a large early childhood development program in the Philippines, administrative data indicate that the timing of the implementation differed substantially across program areas: because of lags in central procurement, three years after project launch not all providers in the program areas had received the required training (Armecin and others 2006). Besides supply lags, snags in information flows and project finances can also delay implementation. In conditional cash transfer programs in Mexico and Ecuador, delays in providing the information about intended household beneficiaries prevented program operators in some sites from making punctual transfers to households (Rawlings and Rubio 2005; Schady and Araujo 2008).⁷ In Argentina poor municipalities found it more difficult to raise the cofinancing required for the subprojects of the country's Trabajar program, which weakened the program's targeting performance (Ravallion 2002).

It is possible to address the problem of implementation lags in part if careful and complete administrative data on timing are available for the program: cross-referencing such data with information from public officials or community leaders in treatment areas could reveal the institutional reasons for variation in implementation.
For example, if there is an average gap of one year between program launch and actual implementation, then it is reasonable for the evaluation to make an allowance of one year after program launch before estimating program impact.⁸ However, reliable information on dates is often not readily available, so studies have tended to allot an arbitrary grace period to account for lags. Assuming a constant allowance for delays, moreover, may not be an adequate solution if there is wide variation in the timing of implementation across treatment areas. This is likely to be the case if the program involves a large number of geographical regions or a large number of components and actors. In programs that cover several states or provinces, region or state fixed-effects might control for duration differences if the differences are homogeneous within a region or state. If the delays are not independent of unobservable characteristics in the program areas, that may also influence program impact. An evaluation of Madagascar's SEECALINE program provides an example of how to define area-specific starting dates. It defined the start of the program in each treatment site as the date of the first child-weighing session in that site. The area-specific date takes into account the program's approach of gradual and sequential expansion, and the expected delays between the signing of the contract with the implementing NGO and the point when a treatment site is actually open and operational (Galasso and Yau 2006). This method requires detailed program-monitoring data.

If a program has many components, the solution may hinge on the evaluator's understanding of the technical production function and thus on identifying the elements that must be present for the program to be effective. For example, in a school improvement program that requires additional teacher training and instructional materials, the materials might arrive in schools at about the same time, but the additional teacher training might be achieved only over a period of several months, perhaps because of differences in teacher availability.
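The SEECALINE-style convention described above, defining each treatment site's start as the date of its first recorded activity, can be sketched directly from program-monitoring data. All records, dates, and site names below are hypothetical:

```python
from datetime import date

# Hypothetical monitoring records: (site, activity, date of activity).
records = [
    ("site-1", "first weighing session", date(2002, 3, 10)),
    ("site-1", "micronutrient delivery", date(2002, 5, 2)),
    ("site-2", "first weighing session", date(2003, 1, 20)),
    ("site-3", "first weighing session", date(2002, 9, 5)),
]
official_launch = date(2001, 6, 1)  # central project launch (invented)
survey_date = date(2004, 6, 1)      # date of the evaluation survey (invented)

# Site-specific start date = earliest recorded activity in that site.
starts = {}
for site, _activity, day in records:
    starts[site] = min(starts.get(site, day), day)

# Exposure is measured from the site-specific start, not the official launch.
for site, start in sorted(starts.items()):
    lag_months = (start - official_launch).days / 30.4
    exposure_months = (survey_date - start).days / 30.4
    print(f"{site}: implementation lag {lag_months:.0f} months, "
          f"exposure at survey {exposure_months:.0f} months")
```

Measuring exposure this way absorbs the gradual, site-by-site expansion into the evaluation instead of assuming a single program-wide start date.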
The evaluator, when considering the timing of the evaluation, must decide whether the effective program start should be defined according to the date when the materials arrive in schools or when all (or most?) of the teachers have completed their training. In the Madagascar example above, although the program has several components (for example, growth monitoring, micronutrient supplementation, deworming), the inception date of each site was fixed according to a growth-monitoring activity, that of the first weighing session (Galasso and Yau 2006).

Although the primary objective of evaluations is usually to measure the impact of programs, often they also monitor progress during the course of implementation and thus help to identify problems that need correction. An evaluation of the Bolivia Social Investment Fund illustrates this point clearly (Newman and others 2002). One of the program components was to improve the drinking water supply through investments in small-scale water systems. However, the first laboratory analysis of water quality showed little improvement in program areas. Interviews with local beneficiaries explained why: contrary to plan, people designated to maintain water quality lacked training; inappropriate materials were used for tubes and the water tanks; and the lack of water meters made it difficult to collect fees needed to finance maintenance work. After training was provided in all the program communities, a second analysis of water supply indicated significantly less fecal contamination in the water in those areas.

How are estimates of impact affected by variation in program start and by lags in implementation? Variation in program exposure that is not incorporated into program evaluation is very likely to bias downward the intent-to-treat (ITT) estimates of the program's impact, especially if such impact increases with the exposure of the beneficiaries who are actually treated. But the size of this underestimation, for a given average lag across communities, depends on the nature of the lags.
If the program implementation delays are not random, it matters if they are inversely or directly correlated with unobserved attributes of the treated groups that may positively affect program success. If the implementation lags are directly correlated with unobserved local attributes, then the true ITT effects are underestimated to a larger extent; for example, central administrators may put less effort into starting the programs in areas that have worse unobserved determinants of the outcomes of interest, such as a weaker management capability of local officials to implement a program in these areas. If implementation delays are instead inversely associated with the unobserved local attributes (that is, the central administrators put more effort into starting the program in those same areas), then the ITT effects are underestimated to a lesser extent. If instead the program delays are random, the extent of the underestimation depends on the variance in the implementation lags (still given the same mean lag). All else being equal, greater random variance in the lags results in greater underestimation of the ITT effects. This is because a larger classical random measurement error in a right-side variable biases the estimated coefficient more towards zero.

If the start of the treatment for individual beneficiaries has been identified correctly, implementation delays in themselves do not necessarily affect estimates of treatment-on-the-treated (TOT) effects.
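The attenuation claim above can be checked with a small Monte Carlo sketch. Here the evaluator records exposure from official launch dates while actual start-up lags behind by a random amount; the effect size, lag distribution, and sample size are all invented for illustration:

```python
import random

random.seed(0)
N = 20_000
BETA = 0.5  # true effect per year of actual exposure (invented)

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sum((a - mx) ** 2 for a in x)

def estimated_beta(lag_sd):
    # Actual exposure at survey time varies across sites (0-4 years).
    exposure = [random.uniform(0, 4) for _ in range(N)]
    # Recorded exposure is based on official launch dates, which precede
    # actual start-up by a random lag with a mean of one year: classical
    # measurement error in the right-side variable.
    recorded = [x + max(0.0, random.gauss(1.0, lag_sd)) for x in exposure]
    outcome = [BETA * x + random.gauss(0.0, 0.3) for x in exposure]
    return ols_slope(recorded, outcome)

# Same mean lag throughout; only the variance of the lags changes.
for sd in (0.0, 0.5, 1.0):
    print(f"lag sd = {sd:.1f} years -> estimated effect = {estimated_beta(sd):.3f}")
```

With zero lag variance the constant one-year shift leaves the slope unbiased near 0.5; as the variance grows, the estimate shrinks toward zero even though the mean lag is unchanged.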
In some cases this date of entry can be relatively easy to identify: for example, the dates on which beneficiaries enroll in a program may be established through a household or facility survey or administrative records (for example, school enrollment rosters or clinic logbooks). In other cases, however, the identification may be more difficult: for example, beneficiaries may be unable to distinguish among alternative, contemporaneous programs or to recall their enrollment dates, or the facility or central program office may not monitor beneficiary program enrollments.⁹ Nonetheless, even if the variation in treatment dates within program areas is handled adequately and enrollment dates are identified fairly accurately at the beneficiary level, nonrandom implementation delays bias TOT estimates. Even a well-specified facility or household survey may be adversely affected by unobservables that may be related to the direction and size of the program impact. The duration of exposure, like program take-up, has to be treated as endogenous. The problem of selection bias motivates the choice of random assignment to estimate treatment effects in social programs.

Learning by Providers

A different implementation lag is associated with the fact that program operators (or providers of services) themselves face a learning curve that depends on time in training and on-the-job experience. This most likely produces some variation in the quality of program implementation that is independent of whether there has been a lag in the procurement of the training. This too is an aspect of program operation that is often not captured in impact evaluations. Although the evaluation of Madagascar's SEECALINE program allotted a grace period of two to four months for the training of service providers, it is likely that much of the learning by providers happened on the job after the formal training.

While the learning process of program operators may delay full program effectiveness, another effect could be working in the opposite direction.
The pioneering effect means that implementers may exhibit extra dedication, enthusiasm, and effort during the first stages, because the program may represent an innovative endeavor to attain an especially important goal. (A simplistic diagram of this effect is shown in Figure 1, bottom panel.) Jimenez and Sawada (1999) find that newer EDUCO schools in El Salvador had better outcomes than older schools (with school characteristics held constant). They interpret this as evidence of a Hawthorne effect: that is, newer schools were more motivated and willing to undertake reforms than were the older schools. If such a phenomenon exists, it would exert an opposite pull on the estimated impacts and, if sufficiently strong, might offset the learning effect, at least in the early phases of a new program. Over time, however, this extra dedication, enthusiasm, and effort are likely to wane.¹⁰ If there are heterogeneities in this unobserved pioneering effect across program sites that are correlated with observed characteristics (for example, schooling of program staff), the result will be biased estimates of the impact of such characteristics on initial program success.

Spillover Effects

The observable gains from a social program during its entire existence, much less after only a few years of implementation, may be an underestimate of its full potential impact for several reasons that are external to the program design. First, evaluations are typically designed to measure outcomes at the completion of a program, and yet the program might yield additional and unintended outcomes in the longer run. Second, while the assignment of individuals or groups of individuals to a treatment can be defined, program beneficiaries may not actually take up an intervention or may not do so until after they have learned more about the program. Third, with time, control groups or groups other than the intended beneficiaries might find a way of obtaining the treatment, or they may be affected simply by learning about the existence of the program, possibly because of expectations that the program will be expanded to their area.
If non-compliance is correlated with the outcome of interest, then the difference in the average outcomes between the treatment and the control groups is a biased estimate of the average effect of the intervention. We discuss these three examples below.

Short-Run and Long-Run Outcomes

Programs that invest in cumulative processes, such as a child's physiological growth and accumulation of knowledge, require the passage of time. This implies that longer program exposure would yield greater gains, though probably with diminishing marginal returns. Also, such cumulative processes could lead to outcomes beyond those originally intended and possibly beyond those of immediate interest to policymakers. Early childhood development (ECD) programs are an excellent example of short-run outcomes that could lead to long-run outcomes beyond those envisioned by the original design. These programs aim to mitigate the multiple risks facing very young children, and to promote their physical and mental development by improving nutritional intake and/or cognitive stimulation. The literature review by Grantham-McGregor and others (2007) identifies studies that use longitudinal data from Brazil, Guatemala, Jamaica, the Philippines, and South Africa that establish causality between preschool cognitive development and subsequent schooling outcomes. The studies suggest that a one standard deviation increase in early cognitive development predicts substantially improved school outcomes in adolescence, as measured by test scores, grades attained, and dropout behavior (for example, 0.71 additional grade by age 18 in Brazil).

Looking beyond childhood, Garces, Thomas, and Currie (2002) find evidence from the U.S.
Head Start program that links preschool attendance not only to higher educational attainment but also to higher earnings and better adult social outcomes. Using longitudinal data from the Panel Study of Income Dynamics and controlling for the participants' disadvantaged background, they conclude that exposure to Head Start for whites is associated in the short run with significantly lower dropout rates, and in the long run with 30 percent greater probability of high school completion, 28 percent higher likelihood of attending college, and higher earnings in their early twenties. For African-Americans participation in Head Start is associated with a 12-percentage-point lower probability of being booked for or charged with a crime.

Another example of an unintended long-run outcome is provided by an evaluation (Angrist, Bettinger, and Kremer 2004) of Colombia's school voucher program at the secondary level (PACES or Programa de Ampliacion de Cobertura de la Educacion Secundaria). This finds longer-run outcomes beyond the original program goal of increasing the secondary school enrollment rate of the poorest youths in urban areas. Using administrative records, the follow-up study finds that the program increased high-school graduation rates of voucher students in Bogota by 5-7 percentage points, which is consistent with the earlier outcome of a 10-percentage-point increase in eighth-grade completion rates (Angrist and others 2002). Correcting for the greater percentage of lottery winners taking college admissions tests, the program increased test scores by two-tenths of a standard deviation in the distribution of potential test scores.

In their evaluation of a rural roads project in Vietnam, Mu and van de Walle (2007) find that, because of developments external to the program, rural road construction and rehabilitation produced larger gains as more time elapsed after project completion. The impacts of roads depend on people using them, so for the benefits of the project to be apparent, more bicycles or motorized vehicles must be made available to rural populations connected by the roads.
But the impacts of the new roads also include other developments that arose more slowly, such as a switch from agriculture to non-agricultural income-earning activities, and an increase in secondary schooling following a rise in primary school completion. These impacts grew at an increasing rate as more months passed, taking two years more on average to emerge.

In the long run, however, impacts can also vanish. Short-term estimates are not likely to be informative about such issues as the extent of diminishing marginal returns to exposure, which would be an important part of the information basis of policies. In Vietnam the impact of the rural roads project on the availability of foods and on employment opportunities for unskilled jobs emerged quite rapidly: it then waned as the control areas caught up with the program areas, an effect we return to below (Mu and van de Walle 2007). In Jamaica a nutritional-supplementation-cum-psychological-stimulation program for children under two yielded mixed effects on cognition and education years later (Walker and others 2005). While the interventions benefited child development even at age 11 (stunted children who received stimulation continued to show cognition benefits), small improvements from supplementation noted at age 7 were no longer present at age 11. In fact, impact can vanish much sooner after a treatment ends. In the example of two randomized trials in India, although impact rose in the second year of the program, one year after the programs had ended, impact dropped. For the remedial program, the gain fell to 0.1 of a standard deviation and was no longer statistically significant; for the computer learning program, the gain dropped to 0.09 of a standard deviation, though it was still significant (Banerjee and others 2007).

Chen, Mu, and Ravallion (2008) point to how longer-term effects might be invisible to evaluators of the long-term impact of the Southwest China Project, which gave selected poor villages in three provinces funding for a range of infrastructure investments and social services.
The authors find only small and statistically insignificant average income gains in the project villages four years after the disbursement period. They attribute this partly to significant displacement effects caused by the government cutting the funding for nonproject activities in the project villages and reallocating resources to the nonproject villages. Because of these displacement effects, the estimated impacts of the project are likely to be underestimated. To estimate an upper bound on the size of this bias, the increase in spending in the comparison villages is assumed to be equal to the displaced spending in the project villages. Under this assumption, the upper bound of the bias could be as high as 50 percent, and it could be even larger if the project actually has positive long-term benefits.

Long-term benefits, however, are often not a powerful incentive to support a program or policy. The impatience of many policymakers with a pilot-evaluate-learn approach to policymaking and action is usually coupled with a high discount rate. This results in little appetite to invest in programs for which benefits are mostly enjoyed in the future. Even aid agencies exhibit this impatience, and yet programs that are expected to have long-run benefits would be just the sort of intervention that development aid agencies should support because local politicians are likely to dismiss them.

Learning and Adoption by Beneficiaries

Programs do not necessarily attain full steady-state effectiveness after implementation commences. Learning by providers and beneficiaries may take time, a necessary transformation of accountability relationships may not happen immediately, or the behavioral responses of providers and consumers may be slow in becoming apparent.

The success of a new child-immunization or nutrition program depends on parents learning about the program and bringing their children to the providers, and the providers giving treatment. In Mexico's PROGRESA the interventions were randomly assigned at the community level.
If program uptake were perfect, a simple comparison between eligible children in the control and treatment localities would have been sufficient to estimate the program TOT effect (Behrman and Hoddinott 2005). However, not all potential beneficiaries sought services: only 61-64 percent of the eligible children aged 4 to 24 months and only half of those aged 2 to 4 years actually received the program's nutritional supplements. The evaluation found no significant ITT effects, but did find that the TOT effects were significant, despite individual and household controls.

In Colombia's secondary-education voucher program too, information played a role at both the local government level and the student level (King, Orazem, and Wohlgemuth 1999). Since the program was cofunded by the central and municipal governments, information given to the municipal governments was critical to securing their collaboration. At the beginning of the program, the central government met with the heads of the departmental governments to announce the program and solicit their participation; in turn the departmental governors invited municipal governments to participate. Dissemination of information to families was particularly important, because participation was voluntary and the program targeted only certain students (specifically those living in neighborhoods classified among the two lowest socioeconomic strata in the country) on the basis of specific eligibility criteria. Some local governments used newspapers to disseminate information about the program.

In decentralization reforms, the learning and adoption processes are arguably more complex because the decision to participate and the success of implementation depend on many more actors.
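Returning to the PROGRESA uptake example above: the relation between ITT and TOT estimates under imperfect take-up can be sketched with simulated data. Take-up of roughly 60 percent loosely echoes the figures cited; the effect size and sample size are invented. With random assignment, no take-up in the control group, and no effect on non-participants, dividing the ITT difference by the take-up rate recovers the TOT effect (a Wald, or instrumental-variables, estimate):

```python
import random

random.seed(1)
N = 50_000
TRUE_TOT = 0.4  # invented effect on those actually treated (SD units)
TAKE_UP = 0.6   # share of the assigned group that participates (invented)

def mean(xs):
    return sum(xs) / len(xs)

# Random assignment; only some of those assigned actually take up the
# program, and (by assumption) nobody in the control group does.
assigned = [random.random() < 0.5 for _ in range(N)]
treated = [a and random.random() < TAKE_UP for a in assigned]
outcome = [TRUE_TOT * t + random.gauss(0.0, 1.0) for t in treated]

itt = (mean([y for y, a in zip(outcome, assigned) if a])
       - mean([y for y, a in zip(outcome, assigned) if not a]))
takeup_hat = mean([t for t, a in zip(treated, assigned) if a])
tot = itt / takeup_hat  # Wald estimate of the effect on the treated

print(f"ITT = {itt:.2f}, take-up = {takeup_hat:.2f}, TOT = {tot:.2f}")
```

The ITT estimate is diluted toward zero by the roughly 40 percent of assignees who never participate, which is one reason an evaluation can find weak ITT effects yet significant TOT effects.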
Even the simplest form of this type of changeingovernanceentailsashiftintheaccountabilityrelationshipsbetweenlevelsofgovernment andbetweengovernments andprovidersforexample, thetransferof thesupervisionandfundingof publichospitalsfromthenational governmentto asubnationalgovernment.InNicaraguas autonomousschoolsprograminthe1990s, forexample, thedateaschool signedthecontract withthegovernmentwasconsideredtobethedatetheschool ofciallybecameautonomous. Infact,thesigningof thecontractwasmerelytherststeptowardschool autonomy: itwouldhavebeenfollowedbytrainingactivities, theelectionof theschool man-agement council, the development of a school improvement plan, and so on.Hence, thereforms full impact onoutcomes wouldhavebeenfelt onlyafter aperiodof time, and the size of this impact might have increased graduallyastheelementsofthereformwereputinplace.However,itisnoteasytodeterminethe length of the learning period. Among teachers, school directors, andparentsintheso-calledautonomousschools,theevaluationndsalack ofagree-mentonwhethertheirschoolshadbecomeautonomousandtheextentto whichthis hadbeen achieved(King and Ozler1998).An in-depthqualitativeanalysisin66 The World Bank Research Observer, vol. 24, no. 1 (February 2009)adozenrandomlyselectedschools conrmsthat school personnel haddifferentinterpretations of what had been achieved (Rivarola and Fuller 1999).StudiesofthediffusionoftheGreenRevolutioninAsiainthemid-1960shigh-light the role of social learningamongbeneciaries. Before adopting the newtechnology, individuals seemto have learned about it fromthe experiences oftheir neighbors (their previous decisions and outcomes). This wait-and-seeprocessaccountedforsomeof theobservedlagsintheuseof high-yieldingseedvarieties in India at the time (Foster and Rosenzweig 1995; Munshi 2004). 
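The arithmetic linking ITT and TOT estimates under imperfect take-up, as in the PROGRESA nutrition example above, can be sketched with simulated data. The Bloom (instrumental-variables) adjustment simply divides the ITT by the take-up rate. Every number below is illustrative, not an estimate from any program discussed in this paper:

```python
import random

random.seed(0)

TRUE_EFFECT = 0.4   # hypothetical effect (in outcome z-score units) on actual users
TAKE_UP = 0.62      # illustrative take-up rate among those assigned to treatment
N = 100_000

def outcome(assigned_to_treatment):
    # Only assignees who actually take up the program receive the effect.
    takes_up = assigned_to_treatment and random.random() < TAKE_UP
    return (TRUE_EFFECT if takes_up else 0.0) + random.gauss(0, 1)

treat = [outcome(True) for _ in range(N)]
control = [outcome(False) for _ in range(N)]

itt = sum(treat) / N - sum(control) / N  # intent-to-treat: compare by assignment
tot = itt / TAKE_UP                      # Bloom adjustment: rescale by take-up

print(f"ITT = {itt:.3f}, TOT = {tot:.3f}")
```

The ITT is diluted toward TAKE_UP times TRUE_EFFECT, which is why an evaluation can find no significant ITT effect while the TOT effect on actual users remains substantial.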
In rice villages the proportion of farmers who adopted the new seed varieties rose from 26 percent in the first year following the introduction of the technology to 31 percent in the third year; in wheat villages, the proportion of adopters increased from 29 percent to 49 percent. Farmers who did not have neighbors with comparable attributes (such as farm size, or characteristics unobserved in available data such as soil quality) may have had to carry out more of their own experimentation. This would probably have been a more costly form of learning because the farmers bore all the risk of the choices they made (Munshi 2004).

The learning process at work during the Green Revolution is similar to that described by Miguel and Kremer (2003, 2004) about the importance of social networks in the adoption of new health technology, in this case deworming drugs. Survey data on individual social networks of the treatment group in rural Kenya reveal that social links provided nontreatment groups better information about the deworming drugs, and thus affected program take-up. Two years after the start of the deworming program, school absenteeism among the treatment group had fallen by about one-quarter on average. There were significant gains in several measures of health status (including reductions in worm infection, child growth stunting, and anemia) and gains in self-reported health. But children whose parents had more social links to early treatment schools were significantly less likely to take deworming drugs. The authors speculate that this disappointing finding could be due to overly optimistic expectations about the impact of the drugs, or to the fact that the health gains from deworming take time to be realized, while the side effects of the drugs are immediately felt.

Providing information about a program, however, is no guarantee of higher program uptake.
One striking example of this is given by a program in Uttar Pradesh, India, which aimed to strengthen community participation in public schools by providing information to village members (Banerjee and others 2008). More information apparently did not lead to higher participation by the Village Education Committee (VEC), by parents, or by teachers. The evaluators attribute this poor result to more deep-seated information blockages: village members were unaware of the roles and responsibilities of the VEC, despite the existence of these committees since 2001, and a large proportion of the VEC members were not even aware of their membership.

The nutritional component in PROGRESA was undersubscribed (because parents lacked information about the program and its benefits), and the community mobilization in Uttar Pradesh was found wanting (because basic information about the roles and powers of village organizations is difficult to convey). Impact evaluations that do not take information diffusion and learning by beneficiaries into account obtain downward-biased ITT and TOT impact estimates. The learning process might be implicit, for example when program information diffuses to potential beneficiaries during the course of implementation, perhaps primarily by word of mouth; or it could be explicit, for example when a program aims an information campaign at potential beneficiaries during a well-defined time period.

Two points are worth noting about the role of learning in impact evaluation. One is the simple point discussed above that learning takes time.
A steady-state level of effective demand among potential beneficiaries (effective in the sense that the beneficiaries actually act to enroll in or use program services) is reached through a process of expanding effective demand for a program.[11] This implies that ITT estimates of program impact are biased downward if the estimates are based on data obtained prior to the attainment of this steady-state effective demand. The extent of the bias depends on whether learning (or the expansion of effective demand) is correlated with unobserved program attributes; specifically, there is less downward bias if this correlation is positive. There may be heterogeneity in this learning process: programs that have better unobserved management capabilities may promote more rapid learning, while those that have worse management capabilities may face slower learning. Heterogeneity in learning would affect the extent to which the ITT and TOT impacts that are estimated before a program has approached full effectiveness are downward-biased, but to a lesser degree if the heterogeneity in learning is random.

The second point is that the learning process itself may be a program component, and thus an outcome of interest in an impact evaluation. How beneficiaries learn and decide to participate is often external to a program, since the typical assumption is that beneficiaries will take up a program if the program exists. In fact, the exposure of beneficiaries to specific communication interventions about a program may be necessary to encourage uptake. There is a large literature, for example, that shows a strong association between exposure to mass-media information campaigns and the use of contraceptive methods and family planning services. The aims of such campaigns have been to make potential beneficiaries aware of these services, and to break down sociocultural resistance to them (Cleland and others 2006).
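The first point can be made concrete with a stylized calculation. Assuming, purely for illustration, that take-up among eligibles follows a logistic diffusion path toward a steady-state ceiling, the ITT measured at any evaluation date is the steady-state treatment effect scaled by the take-up reached so far, so evaluations conducted before the steady state understate the eventual ITT. All parameter values here are hypothetical:

```python
import math

TRUE_TOT = 0.5        # hypothetical steady-state effect on actual program users
STEADY_TAKE_UP = 0.8  # assumed ceiling of effective demand among eligibles

def take_up(month, midpoint=12, rate=0.4):
    # Illustrative logistic diffusion of program take-up over time.
    return STEADY_TAKE_UP / (1 + math.exp(-rate * (month - midpoint)))

def measured_itt(month):
    # With no spillovers, measured ITT = treatment effect x current take-up.
    return TRUE_TOT * take_up(month)

for month in (3, 6, 12, 24, 36):
    print(f"month {month:2d}: take-up {take_up(month):.2f}, "
          f"measured ITT {measured_itt(month):.2f}")
```

Under these assumptions an evaluation at month 3 would recover only a small fraction of the ITT that would be measured once effective demand has leveled off.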
This social marketing approach has been used also to stimulate the demand for insecticide-treated mosquito nets for malaria control, and has increased demand, especially among the poorest and most remote households (Rowland and others 2002; Kikumbih and others 2005). To understand how learning takes place is to begin to understand the black box that lies between program design and outcomes; and if this learning were promoted in a random fashion, it could serve as an exogenous instrument for the estimation of program impact.

Peer Effects

The longer a program has been in operation, the more likely it is that specific interventions will spill over to populations beyond the treatment group and thus affect impact estimates. Peer effects increase impact, as in the case of the Head Start example already mentioned. Garces, Thomas, and Currie (2002) find strong spillover effects within the family: higher birth-order children (that is, younger siblings) seem to benefit more than their older siblings, especially among African-Americans, because older siblings are able to teach younger ones. Hence, expanding the definition of impact to include peer effects adds to impact estimates.

Peer effects also arise when specific program messages (either directly from communications interventions or from observing treatment groups) diffuse to control groups and alter their behavior in the same direction as in the treatment group. While this contagion is probably desirable from the point of view of policymakers, it likely depresses impact estimates since differences between the control and treatment groups are diminished. Another form of leakage that grows with time may not be so harmless from the point of view of program objectives. For programs that target only specific populations, time allows political pressure to build for the program to be more inclusive, and even for nontargeted groups to find ways of obtaining treatment (for example through migration into program sites).
Because of the demand-driven nature of the Bolivia Social Investment Fund, for instance, not all communities selected for active promotion applied for and received a SIF-financed education project, but some communities not selected for active promotion nevertheless applied for promotion and obtained an education project (Newman and others 2002).

Heterogeneity of Impact

An examination of how program impact varies according to the observable characteristics of the beneficiaries can teach us important lessons on policy and program design. Our focus here is on occasions when duration or timing differences interact with the sources of heterogeneity in impact. One important source of heterogeneity in some programs is cohort membership.

Cohort Effects

The age of beneficiaries may be one reason why duration of exposure to a program matters, and the estimates of ITT and TOT impacts can be affected substantially by whether the timing is targeted toward critical age ranges. Take the case of ECD programs, such as infant feeding and preschool education, which target children for just a few years after birth. This age targeting is based on the evidence that a significant portion of a child's physical and cognitive development occurs at a very young age, and that the returns to improvements in the living or learning conditions of the child are highest at those ages. The epidemiological and nutritional literatures emphasize that children under three years of age are especially vulnerable to malnutrition and neglect (see Engle and others 2007 for a review). Finding that a nutritional supplementation program in Jamaica did not produce long-term benefits for children, Walker and others (2005) suggest that prolonging the supplementation, or supplementing at an earlier age, during pregnancy and soon after birth, might have benefited later cognition. It might have been more effective than the attempt to reverse the effects of undernutrition through supplementation at an older age.
Applying evaluation methods to drought shocks, Hoddinott and Kinsey (2001) also conclude that in rural Zimbabwe children in the age range of 12 to 24 months are the most vulnerable to such events: these children lose 1.5-2 centimeters of physical growth, while older children, 2 to 5 years of age, do not seem to experience a slowdown in growth.[12] In a follow-up study Alderman, Hoddinott, and Kinsey (2006) conclude that the longer the exposure of young children to civil war and drought, the larger the negative effect of these shocks on child height; moreover, older children suffer less than younger children in terms of growth.[13]

Interaction of Cohort Effects and Duration of Exposure

As discussed above, the impacts of some programs crucially depend on whether or not an intended beneficiary is exposed to an intervention at a particularly critical age range, such as during the first few years of life. Other studies illustrate that the duration of exposure during the critical age range also matters. The evaluation by Frankenberg, Suriastini, and Thomas (2005) of Indonesia's Midwife in the Village program shows just this. The program was intended to expand the availability of health services to mothers and thus improve children's health outcomes. By exploiting the timing of the (nonrandom) introduction of a midwife to a community, the authors distinguish between the children, living in the same community, who were exposed and those who were not exposed to a midwife. The authors group the sample of children into three birth cohorts. For each group, the extent of exposure to a village midwife during the vulnerable period of early childhood varied as a function of whether the village had a midwife and, if so, when she had arrived. In communities that had a midwife from 1993 onward, children in the younger cohort had been fully exposed to the program when data were collected, whereas children in the middle cohort had been only partially exposed.
The authors conclude that partial exposure to the village midwife program conferred no benefits in improved child nutrition, while full exposure from birth yielded an increase in the height-for-age z-score of 0.35 to 0.44 of a standard deviation among children aged 1 to 4 years.

Three other studies test the extent to which ECD program impacts are sensitive to the duration of program exposure and the ages of the children during the program. Behrman, Cheng, and Todd (2004) evaluated the impact of a preschool program in Bolivia, the Proyecto Integral de Desarrollo Infantil. Their analysis explicitly takes into account the dates of program enrollment of individual children. In their comparison of treated and untreated children, they find evidence of positive program impacts on motor skills, psychosocial skills, and language acquisition that are concentrated among children 37 months of age and older at the time of the evaluation. When they disaggregated their results by the duration of program exposure, the effects were most clearly observed among children who had been involved in the program for more than a year.

Like the Bolivia evaluation, the evaluation of the early childhood development program in the Philippines mentioned above finds that the program impacts vary according to the duration of exposure of children, although this variation is not as dramatic as the variation associated with children's ages (Armecin and others 2006). Administrative delays and the different ages of children at the start of the program resulted in the length of exposure of eligible children varying from 0 to 30 months, with a mean duration of 14 months and a substantial standard deviation of 6 months. Duration of exposure varied widely, even when a child's age was controlled for.
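One simple way to let estimated impact vary with length of exposure, in the spirit of the Bolivia and Philippines evaluations above, is to compare outcome means across exposure-duration categories rather than pool all treated children into a single group. The categories, effect sizes, and variable names below are hypothetical; the sketch only illustrates how a pooled treatment contrast can mask duration-dependent gains:

```python
import random

random.seed(1)

# Simulated outcome z-scores under effects that grow with exposure duration.
# Effect sizes are invented, loosely echoing evaluations in which gains were
# concentrated among children exposed for more than a year.
EFFECT_BY_CATEGORY = {
    "control": 0.00,
    "under_6m": 0.05,
    "6_to_12m": 0.15,
    "over_12m": 0.45,
}
N_PER_GROUP = 50_000

data = {
    cat: [effect + random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    for cat, effect in EFFECT_BY_CATEGORY.items()
}

control_mean = sum(data["control"]) / N_PER_GROUP
impact = {  # impact estimate per exposure category, relative to control
    cat: sum(vals) / N_PER_GROUP - control_mean
    for cat, vals in data.items() if cat != "control"
}

for cat, est in impact.items():
    print(f"{cat:>9}: estimated impact {est:+.2f} SD")
```

A single pooled treatment dummy here would average the three category effects and report a modest overall gain, hiding the fact that nearly all of the benefit accrues after a year of exposure.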
The study finds that, for motor and language development, two- and three-year-old children exposed to the program had z-scores 0.5 to 1.8 standard deviations higher, depending on length of exposure, than children in the control areas, and that these gains were much lower among older children.

Gertler (2004) also estimates how duration of exposure to health interventions in Mexico's PROGRESA affected the probability of child illness, using two models: one assumes that program impact is independent of duration, and the other allows impact to vary according to the length of exposure. The interventions required that children under 2 years be immunized, visit nutrition monitoring clinics, and obtain nutritional supplements, and that their parents receive training on nutrition, health, and hygiene; children between 2 and 5 years of age were expected to have been immunized already, but were to obtain the other services. Gertler finds no program impact after a mere 6 months of program exposure for children under 3 years of age, but with 24 months of program exposure the illness rate among the treatment group was about 40 percent lower than the rate among the control group, a difference that is significant at the 1 percent level.

The interaction of age effects and the duration of exposure has been examined also by Pitt, Rosenzweig, and Gibbons (1993) and by Duflo (2001) in Indonesia, and by Chin (2005) in India, in their evaluations of schooling programs. These studies use information on the region and year of birth of children, combined with administrative data on the year and placement of programs, to measure duration of program exposure. Duflo (2001), for example, estimates the impact of a massive school construction program on subsequent schooling attainment and on the wages of the birth cohorts affected by the program in Indonesia. From 1973 to 1978 more than 61,000 primary schools were built throughout the country, and the enrollment rate among children aged 7-12 rose from 69 percent to 83 percent.
By linking district-level data on the number of new schools by year and matching these data with intercensal survey data on men born between 1950 and 1972, Duflo defines how long an individual was exposed to the program. The impact estimates indicate that each new school per 1,000 children increased years of education by 0.12 to 0.19 years among the first cohort fully exposed to the program.

Chin (2005) uses a similar approach in estimating the impact of India's Operation Blackboard. Taking grades 1-5 as the primary school grades, ages 6-10 as the corresponding primary school ages, and 1988 as the first year that schools would have received program resources, Chin supposes that only students born in 1978 or later would have been of primary school age for at least one year in the program regime, and therefore were potentially exposed to the program for most of their schooling. The evaluation compares two birth cohorts: a younger cohort born between 1978 and 1983, and therefore potentially exposed to the program, and an older cohort. The impact estimates suggest that accounting for duration somewhat lowers the impact as measured, but it remains statistically significant, though only for girls.

Conclusions

This paper has focused on the dimensions of timing and duration of exposure that relate to program or policy implementation. Impact evaluations of social programs or policies typically ignore these dimensions; they assume that interventions occur at a specified date and produce intended or predictable changes in conditions among the beneficiary groups. This is perhaps a reasonable assumption when the intervention itself occurs within a very short time period and has an immediate effect, such as some immunization programs, or is completely under the direction and control of the evaluator, as in small pilot programs.
In the examples we have cited (India's Green Revolution, Mexico's PROGRESA conditional cash transfer program, Madagascar's SEECALINE child nutrition program, and an early childhood development program in the Philippines, among others), this is far from true. Indeed, initial operational fits and starts in most programs, and a learning process for program operators and beneficiaries, can delay full program effectiveness; also, there are many reasons why these delays are not likely to be the same across program sites.

We have catalogued sources of the variation in the duration of program exposure across treatment areas and beneficiaries, including program design features that have built-in waiting periods, lags in implementation due to administrative or bureaucratic procedures, spillover effects, and the interaction between sources of heterogeneity in impact and duration of exposure. Some evaluations demonstrate that accounting for these variations in length of program exposure alters impact estimates significantly, so ignoring these variations can generate misleading conclusions about an intervention. Appendix Table A-1 indicates that a number of impact evaluation studies do incorporate one or more of these timing and duration effects. The most commonly addressed source of duration effects is cohort affiliation. This is not surprising, since many interventions, such as education and nutrition programs, are allocated on the basis of age, in terms of both timing of entry into and exit from the program. On the other hand, implementation lags are recognized but often not explicitly addressed.

What can be done to capture timing and the variation in length of program exposure? First, the quality of program data should be improved.
Such data could come from administrative records on the design and implementation details of a program, in combination with survey data on program take-up by beneficiaries. Program data on the timing of implementation are likely to be available from program management units, but these data may not be available at the desired level of disaggregation; this might be the district, community, providers, or individual, depending on where the variation in timing is thought to be the greatest. Compiling such data on large programs that decentralize to numerous local offices could be costly. There is obviously a difference in the primary concern of the high-level program manager and of the evaluator. The program manager's concern is the disbursement of project funds and the procurement of major expenditure items, whereas the evaluator's concern would be to ascertain when the funds and inputs reach treatment areas or beneficiaries.

Second, the timing of the evaluation should take into account the time path of program impacts. Figure 1 illustrates that program impact, however measured, can change over time, for various reasons discussed in the paper, so there are risks of not finding significant impact when a program is evaluated too early or too late. The learning process by program operators or by beneficiaries could produce a curve showing increasing impact over time, while a pioneering effect could show a very early steep rise in program impact that is not sustainable. Figure 1 thus suggests that early rapid assessments to judge the success of a program could be misleading, and also that repeated observations may be necessary to estimate true impact. Several studies that we reviewed measured their outcomes of interest more than once after the start of the treatment, and some compared short-run and long-run effects to examine whether the short-run impact had persisted. Possible changes in impact over time imply that evaluations should not be a once-off activity for any long-lasting program or policy.
In fact, as discussed above, examining long-term impacts could point to valuable lessons about the diffusion of good practices over time (Foster and Rosenzweig 1995) or, sadly, about how governments can reduce impact by implementing other policies that (perhaps unintentionally) disadvantage the program areas (Chen, Mu, and Ravallion 2008).

Third, the evaluation method applied should take into account the source of variation in duration of program exposure. Impact estimates are affected by the length of program exposure, depending on whether or not the source of variation in duration is common within a treatment area and whether or not this source is a random phenomenon. Some pointers are: If the length of implementation lags is about equal across treatment sites, then a simple comparison between the beneficiaries in the treatment and control areas would be sufficient to estimate the average impact of the program or the ITT effects under many conditions, though not if there are significant learning or pioneering effects that differ across them. If the delays vary across treatment areas but not within those areas, and if the variation is random or independent of unobservable characteristics in the program areas that may also affect program effectiveness, then it is also possible to estimate the ITT effects with appropriate controls for the area, or with fixed effects for different exposure categories. In cases where the intervention and its evaluation are designed together, such as pilot programs, it is possible and desirable to explore the time path of program impact by allocating treatment groups to different lengths of exposure in a randomized way. This treatment allocation on the basis of duration differences can yield useful operational lessons about program design, so it deserves more experimentation in the future.

Appendix

Table A-1.
Examples of Evaluations That Consider Timing Issues and Duration of Program Exposure in Estimating Program Impact

The sources of variation in timing and duration of program exposure considered by these studies include: implementation lags; short-run and long-run outcomes; learning and use by beneficiaries; cohort effects; and cohort interacted with duration of exposure.

- Angrist and others (2002); Angrist, Bettinger, and Kremer (2004) - Colombia - School voucher program for secondary level
- Armecin and others (2006) - Philippines - Comprehensive early childhood development (ECD) program
- Banerjee and others (2007) - India - Balsakhi school remedial and computer-assisted learning programs
- Behrman and Hoddinott (2005) - Mexico - PROGRESA nutrition intervention
- Behrman, Cheng, and Todd (2004) - Bolivia - PIDI preschool program
- Behrman, Sengupta, and Todd (2005); Schultz (2004) - Mexico - PROGRESA education intervention
- Chin (2005) - India - Operation Blackboard: additional teachers per school
- Duflo (2001) - Indonesia - School construction program
- Foster and Rosenzweig (1995) - India - Green Revolution: new seed varieties
- Frankenberg, Suriastini, and Thomas (2005) - Indonesia - Midwife in the Village program
- Galasso and Yau (2006) - Madagascar - SEECALINE child nutrition program
- Garces, Thomas, and Currie (2002) - United States - Head Start program: ECD
- Hoddinott and Kinsey (2001); Alderman, Hoddinott, and Kinsey (2006) - Zimbabwe - Drought shocks; civil war
- Jimenez and Sawada (1999) - El Salvador - EDUCO schools: community participation
- Gertler (2004) - Mexico - PROGRESA health and nutrition services
- King and Ozler (1998); Rivarola and Fuller (1999) - Nicaragua - School autonomy reform
- Miguel and Kremer (2003, 2004) - Kenya - School-based deworming program
- Mu and van de Walle (2007) - Vietnam - Rural roads rehabilitation project
- Munshi (2004) - India - Green Revolution: new seed varieties
- Rouse and Krueger (2004) - United States - Fast ForWord program: computer-assisted learning
- Walker and others (2005) - Jamaica - Nutrition supplementation

Note: Review articles on early childhood development programs, for example Engle and others (2007) and Grantham-McGregor and others (2007), cover a long list of studies that we mention in the text but are not listed in this table; many of those studies examine age-specific effects, and some examine short- and long-run impacts.

Notes

Elizabeth M. King (corresponding author) is Research Manager, Development Research Group, at the World Bank; her address for correspondence is eking@worldbank.org. Jere R. Behrman is Professor, Department of Economics, at the University of Pennsylvania. The authors are grateful to Laura Chioda and to three anonymous referees for helpful comments on a previous draft. All remaining errors are ours.

1.
For instance, the International Initiative for Impact Evaluation (3IE) has been set up by governments of several countries, donor agencies, and private foundations to address the desire of the development community to build up systematically more evidence about effective interventions.

2. There have been excellent reviews of the choice of methods as applied to social programs. See, for example, Grossman (1994), Heckman and Smith (1995), Ravallion (2001), Cobb-Clark and Crossley (2003), and Duflo (2004).

3. To keep the discussion focused on the timing issue and the duration of exposure, we avoid discussing the specific evaluation method (or methods) used by the empirical studies that we cite. However, we restrict our selection of studies to review to those that have a sound evaluation design, whether experimental or using econometric techniques. Nor do we discuss estimation issues such as sample attrition bias, which is one of the ways in which a duration issue has been taken into account in the evaluation literature.

4. See Heckman, Lalonde, and Smith (1999) for a review.

5. Because Rouse and Krueger (2004) define the treatment group more stringently, however, the counterfactual treatment received by the control students becomes more mixed, and a share of these students is contaminated by partial participation in the program.

6. In their assessment of the returns to World Bank investment projects, Pohl and Mihaljek (1992) cite construction delays among the risks that account for a wedge between ex ante (appraisal) estimates and ex post estimates of rates of return. They estimate that, on average, projects take considerably more time to implement than expected at appraisal: six years rather than four years.

7. In Mexico's well-known PROGRESA program, payment records from an evaluation sample showed that 27 percent of the eligible population had not received benefits after almost two years of program operation, possibly as a result of delays in setting up the program's management information system (Rawlings and Rubio 2005).
In Ecuador's Bono de Desarrollo Humano, the lists of the beneficiaries who had been allocated the transfer through a lottery did not reach program operators, and so about 30 percent of them did not take up the program (Schady and Araujo 2008).

8. Chin (2005) makes a one-year adjustment in her evaluation of Operation Blackboard in India. Although the Indian government allocated and disbursed funds for the program for the first time in fiscal 1987, not all schools received program resources until the following school year. In addition to the delay in implementation, Chin also finds that only between one-quarter and one-half of the project teachers were sent to one-teacher schools, while the remaining project teachers were used in ways the central government had not intended. Apparently, the state and local governments had exercised their discretion in the use of the OB teachers.

9. In two programs that we know, administrative records at the individual level were maintained at local program offices, not at a central program office, and local record-keeping varied in quality and form (for example, some records were computerized and some were not), so that a major effort was required to collect and check records during the evaluations.

10. Leonard (2008) provides an example of such an effect. He uses the presence of a Hawthorne effect, produced by the unexpected arrival of a research team to observe a physician, in order to measure an exogenous, short-term change in the quality of service provided by a physician. Indeed, there was a significant jump in quality upon the arrival of observers, but quality returned to pre-visit levels after some time.

11. Information campaigns for programs that attempt to improve primary-school quality, or to enhance child nutrition through primary-school feeding programs, in a context in which virtually all primary-school-age children are already enrolled, would seem less relevant than such campaigns as part of a new program to improve preschool child development where there had previously been no preschool programs.

12.
The authors estimate the impact of atypically low rainfall levels by including a year's delay, because the food shortages would be apparent only one year after the drought, but before the next harvest was ready.

13. To estimate these longer-run impacts, Alderman, Hoddinott, and Kinsey (2006) combine data on children's ages with information on the duration of the civil war and the episodes of drought used in their analysis. They undertook a new household survey to trace children measured in earlier surveys.

References

Alderman, Harold, John Hoddinott, and William Kinsey. 2006. "Long Term Consequences of Early Childhood Malnutrition." Oxford Economic Papers 58(3):450-74.

Angrist, Joshua, Eric Bettinger, and Michael Kremer. 2004. "Long-Term Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia." National Bureau of Economic Research Working Paper No. 10713, August.

Angrist, Joshua D., Eric Bettinger, Erik Bloom, Elizabeth M. King, and Michael Kremer. 2002. "Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment." American Economic Review 92(5):1535-58.

Armecin, Graeme, Jere R. Behrman, Paulita Duazo, Sharon Ghuman, Socorro Gultiano, Elizabeth M. King, and Nannette Lee. 2006. "Early Childhood Development through an Integrated Program: Evidence from the Philippines." Policy Research Working Paper 3922. Washington, DC: World Bank.

Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden. 2007. "Remedying Education: Evidence from Two Randomized Experiments in India." Quarterly Journal of Economics 122(3):1235-64.

Banerjee, Abhijit V., Rukmini Banerji, Esther Duflo, Rachel Glennerster, and Stuti Khemani. 2008. "Pitfalls of Participatory Programs: Evidence from a Randomized Evaluation in Education in India." Policy Research Working Paper 4584. Washington, DC: World Bank.

Behrman, Jere R., and John Hoddinott. 2005.
"Programme Evaluation with Unobserved Heterogeneity and Selective Implementation: The Mexican Progresa Impact on Child Nutrition." Oxford Bulletin of Economics and Statistics 67(4):547-69.

Behrman, Jere R., Yingmei Cheng, and Petra E. Todd. 2004. "Evaluating Preschool Programs When Length of Exposure to the Program Varies: A Nonparametric Approach." Review of Economics and Statistics 86(1):108-32.

Behrman, Jere R., Piyali Sengupta, and Petra Todd. 2005. "Progressing through PROGRESA: An Impact Assessment of Mexico's School Subsidy Experiment." Economic Development and Cultural Change 54(1):237-75.

Chen, Shaohua, Ren Mu, and Martin Ravallion. 2008. "Are There Lasting Impacts of Aid to Poor Areas?" Policy Research Working Paper 4084. Washington, DC: World Bank.

Chin, Aimee. 2005. "Can Redistributing Teachers across Schools Raise Educational Attainment? Evidence from Operation Blackboard in India." Journal of Development Economics 78(2):384-405.

Cleland, John, Stan Bernstein, Alex Ezeh, Anibal Faundes, Anna Glasier, and Jolene Innis. 2006. "Family Planning: The Unfinished Agenda." Lancet 368(November):1810-27.

Cobb-Clark, Deborah A., and Thomas Crossley. 2003. "Econometrics for Evaluations: An Introduction to Recent Developments." Economic Record 79(247):491-511.

Duflo, Esther. 2001. "Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment." American Economic Review 91(4):795-813.

Duflo, Esther. 2004. "Scaling Up and Evaluation." In F. Bourguignon and B. Pleskovic, eds., Annual World Bank Conference on Development Economics: Accelerating Development. Washington, DC: World Bank.

Engle, Patrice L., Maureen M. Black, Jere R. Behrman, Meena Cabral de Mello, Paul J. Gertler, Lydia Kapiriri, Reynaldo Martorell, Mary Eming Young, and the International Child Development Steering Group. 2007. "Strategies to Avoid the Loss of Developmental Potential in More Than 200 Million Children in the Developing World." Lancet 369(January):229-42.

Foster, Andrew D., and Mark R. Rosenzweig. 1995.
LearningbyDoingandLearningfromOthers:Human Capital and Technical Change in Agriculture. Journal of Political Economy 103(6):11761209.Frankenberg, Elizabeth, WayanSuriastini, andDuncanThomas. 2005. CanExpandingAccesstoBasicHealthCareImproveChildrens HealthStatus? Lessons fromIndonesias MidwifeintheVillage Programme. Population Studies 59(1):519.Galasso, Emanuela, andJeffreyYau. 2006. LearningthroughMonitoring: LessonsfromaLarge-ScaleNutritionPrograminMadagascar. PolicyResearchWorkingPaper4058. Washington, DC: WorldBank.Garces, Eliana, DuncanThomas, and Janet Currie. 2002. Longer-TermEffects of Head Start.American Economic Review 92(4):9991012.Gertler, Paul J. 2004. Do Conditional Cash Transfers Improve Child Health? Evidence fromProgresas Control Randomized Experiment. American Economic Review 94(2):33641.Grantham-McGregor, Sally, Yin Bun Cheung, Santiago Cueto, Paul Glewwe, Linda Richter,and Barbara Strupp, and the International Child Development Steering Group. 2007.Developmental Potential inthe First 5Years for ChildreninDeveloping Countries. Lancet369(9555):6070.Grossman, JeanBaldwin.. 1994. EvaluatingSocial Policies: PrinciplesandU.S. Experience.WorldBank Research Observer 9(2):15980.Heckman,JamesJ.,andJeffreyA.Smith.1995.AssessingtheCaseforSocialExperiments.Journalof Economic Perspectives 9(2):85110.Heckman,JamesJ.,R.J.Lalonde,andJeffreyA.Smith.1999.TheEconomicsandEconometricsofActiveLaborMarket Programs.InOrleyAshenfelter, andDavidCard, eds., Handbookof LaborEconomics. Vol. 1. Amsterdam: North-Holland.Hoddinott,John,andWilliamKinsey.2001.ChildGrowthintheTimeofDrought.OxfordBulletinof Economics and Statistics 63(4):40936.Jimenez, Emmanuel, andYasuyuki Sawada. 1999. DoCommunity-ManagedSchools Work? AnEvaluation of El Salvadors EDUCO Program. World Bank Economic Review 13(3):41541.Kikumbih, Nassor, KaraHanson, AnneMills, Hadji Mponda, andJoannaArmstrongSchellenberg.2005. The Economics of Social Marketing: The Case of Mosquito Nets inTanzania. 
SocialScience and Medicine 60(2):36981.King, Elizabeth M., and Berk Ozler. 1998. Whats Decentralization Gotto Do withLearning? The Case ofNicaraguasSchool AutonomyReform. WorkingPaperonImpactEvaluationof EducationReforms9 (June). Washington, DC: Development Research Group, World Bank.King, ElizabethM., PeterF. Orazem, andDarinWohlgemuth. 1999. Central MandatesandLocalIncentives: Colombias Targeted Voucher Program. World Bank Economic Review 13(3):46791.80 The World Bank Research Observer, vol. 24, no. 1 (February 2009)Leonard,KennethL..2008.IsPatientSatisfactionSensitivetoChangesintheQuality ofCare?AnExploitation of the Hawthorne Effect. Journal of Health Economics 27(2):44459.Miguel, Edward, and Michael Kremer. 2003. Social Networks and Learning about Health inKenya. National Bureau of Economic Research Working Paper and Center for GlobalDevelopment, manuscript, July 2003..2004.Worms:IdentifyingImpactsonEducationandHealthinthePresenceofTreatmentExternalities. Econometrica 72(1):159217.Mu, Ren, andDominiquevandeWalle. 2007. Rural RoadsandPoor AreaDevelopment inVietnam.Policy Research Working Paper Series 4340. World Bank.Munshi, Kaivan. 2004. Social LearninginaHeterogeneous Population: TechnologyDiffusioninthe Indian Green Revolution. Journal of Development Economics 73(1):185213.Newman, JohnL., MennoPradhan, LauraB. Rawlings, Geert Ridder, RamiroCoa, andJoseLuisEvia. 2002. AnImpactEvaluationof Education, Health, andWaterSupplyInvestmentsbytheBolivian Social Investment Fund. World Bank Economic Review 16(2):24174.Pitt, Mark M., Mark R. Rosenzweig, and Donna M. Gibbons. 1993. The Determinants andConsequences of thePlacement of Government Programs inIndonesia.WorldBankEconomicReview 7(3):31948.Pohl, Gerhard, andDubravkoMihaljek. 1992. Project EvaluationandUncertaintyinPractice: AStatistical Analysis of Rate-of-ReturnDivergences of 1,015WorldBankProjects. World BankEconomic Review 6(2):25577.Ravallion, Martin. 2001. 
The Mystery of the Vanishing Benets: An Introduction to ImpactEvaluation. World Bank Economic Review 150(1):11540.Ravallion, Martin.. 2002. Are the Poor Protected fromBudget Cuts? Evidence for Argentina.Journal of Applied Economics 5(1):95121.Ravallion, Martin, Emanuela Galasso, Teodoro Lazo, and Ernesto Philipp. 2005. What CanEx-Participants Reveal about a Programs Impact? Journal of Human Resources 40(1):20830.Rawlings, Laura B., and Gloria M. Rubio. 2005. Evaluating the Impact of Conditional CashTransfer Programs. World Bank Research Observer 20(1):2955.Rivarola, Magdalena, andBruce Fuller. 1999. Nicaraguas Experiment toDecentralize Schools:Contrasting Views of Parents, Teachers, and Directors. Comparative Education Review43(4):489521.Rosenzweig,MarkR.,andKennethI.Wolpin.1986.EvaluatingtheEffectsofOptimallyDistributedPublicPrograms: ChildHealthandFamilyPlanningInterventions.AmericanEconomic Review76(3):47082.Rouse,CeciliaElena, andAlanB. Krueger. 2004. PuttingComputerizedInstructiontotheTest: ARandomized Evaluation of a Scientically Based Reading Program. Economics of EducationReview 23(4):32338.Rowland, Mark, Jayne Webster, Padshah Saleh, Daniel Chandramohan, TimFreeman, BarbaraPearcy, NaeemDurrani, Abdur Rab, andNasir Mohammed. 2002. Preventionof MalariainAfghanistanthroughSocial Marketingof Insecticide-TreatedNets: Evaluationof CoverageandEffectiveness by Cross-sectional Surveys and Passive Surveillance. Tropical Medicine andInternational Health 7(10):81322.Schady, Norbert, and Maria Caridad Araujo. 2008. Cash Transfers, Conditions, and SchoolEnrollment in Ecuador. Econom a 8(2):4370.Schultz, T. Paul. 2004. School Subsidies for thePoor: EvaluatingtheMexicanProgresaPovertyProgram. Journal of Development Economics 74(2):199250.King and Behrman 81Walker, SusanP., SusanM. Chang, ChristineA. Powell, andSallyM. Grantham-McGregor. 
2005.Effects of Early Childhood Psychosocial Stimulation and Nutritional Supplementation onCognition and Education in Growth-Stunted Jamaican Children: Prospective Cohort Study.Lancet 366(9499):180407.82 The World Bank Research Observer, vol. 24, no. 1 (February 2009)