The Economics and Econometrics of Active Labor Market Programs

James J. Heckman, University of Chicago
Robert J. LaLonde, Michigan State University
and
Jeffrey A. Smith, University of Western Ontario

Prepared for the Handbook of Labor Economics, Volume III, Orley Ashenfelter and David Card, editors. We thank Susanne Ackum Agell for her helpful comments on Scandinavian active labor market programs and Costas Meghir for very helpful comments on Sections 1-7.



Contents

1. Introduction
2. Public Job Training and Active Labor Market Policies
3. The Evaluation Problem and the Parameters of Interest in Evaluating Social Programs
   3.1 The Evaluation Problem
   3.2 The Counterfactuals of Interest
   3.3 The Counterfactuals Most Commonly Estimated in the Literature
   3.4 Is Treatment on the Treated an Interesting Economic Parameter?
4. The Prototypical Solutions to the Evaluation Problem
   4.1 The Before-After Estimator
   4.2 The Difference-in-Differences Estimator
   4.3 The Cross Section Estimator
5. Social Experiments
   5.1 How Social Experiments Solve the Evaluation Problem
   5.2 Intention to Treat and Substitution Bias
   5.3 Social Experiments in Practice
      5.3.1 Two Important Social Experiments
      5.3.2 The Practical Importance of Dropping Out and Substitution
      5.3.3 Additional Problems Common to All Evaluations
6. Econometric Models of Outcomes and Program Participation
   6.1 Uses of Economic Models
   6.2 Prototypical Models of Earnings and Program Participation
   6.3 Expected Present Value of Earnings Maximization
      6.3.1 Common Treatment Effect
      6.3.2 A Separable Representation
      6.3.3 Variable Treatment Effect
      6.3.4 Imperfect Credit Markets
      6.3.5 Training as a Form of Job Search
   6.4 The Role of Program Eligibility Rules in Determining Participation
   6.5 Administrative Discretion and the Efficiency and Equity of Training Provision
   6.6 The Conflict between the Economic Approach to Program Evaluation and the Modern Approach to Social Experiments
7. Non-experimental Evaluations
   7.1 The Problem of Causal Inference in Non-experimental Evaluations
   7.2 Constructing a Comparison Group
   7.3 Econometric Evaluation Estimators
   7.4 Identification Assumptions for Cross-Section Estimators
      7.4.1 The Method of Matching
      7.4.2 Index Sufficient Methods and the Classical Econometric Selection Model
      7.4.3 The Method of Instrumental Variables
      7.4.4 The Instrumental Variable Estimator as a Matching Estimator
      7.4.5 IV Estimators and the Local Average Treatment Effect
      7.4.6 Regression Discontinuity Estimators
   7.5 Using Aggregate Time Series Data on Cohorts of Participants to Evaluate Programs
   7.6 Panel Data Estimators
      7.6.1 Analysis of the Common Coefficient Model
      7.6.2 The Fixed Effects Method
      7.6.3 U_t Follows a First-Order Autoregressive Process
      7.6.4 U_t is Covariance Stationary
      7.6.5 Repeated Cross-Section Analogs of Longitudinal Procedures
      7.6.6 The Fixed Effect Model
      7.6.7 The Error Process Follows a First-Order Autoregression
      7.6.8 Covariance Stationary Errors
      7.6.9 The Anomalous Properties of First Difference or Fixed Effect Models
      7.6.10 Robustness of Panel Data Methods in the Presence of Heterogeneous Responses to Treatment
      7.6.11 Panel Data Estimators as Matching Estimators
   7.7 Robustness to Biased Sampling Plans
      7.7.1 The IV Estimator and Choice-Based Sampling
      7.7.2 The IV Estimator and Contamination Bias
      7.7.3 Repeated Cross-Section Methods with Unknown Training Status and Choice-Based Sampling
   7.8 Bounding and Sensitivity Analysis
8. Econometric Practice
   8.1 Data Sources
      8.1.1 Using Existing General Survey Data Sets
      8.1.2 Using Administrative Data
      8.1.3 Collecting New Survey Data
      8.1.4 Combining Data Sources
   8.2 Characterizing Selection Bias
   8.3 A Simulation Study of the Sensitivity of Nonexperimental Methods
      8.3.1 A Model of Earnings and Program Participation
      8.3.2 The Data Generating Process
      8.3.3 The Estimators We Examine
      8.3.4 Results from the Simulations
   8.4 Specification Testing and the Fallacy of Alignment
9. Indirect Effects, Displacement, and General Equilibrium Treatment Effects
   9.1 Review of Traditional Approaches to Displacement and Substitution
   9.2 General Equilibrium Approaches
      9.2.1 Davidson and Woodbury
      9.2.2 Heckman, Lochner, and Taber
   9.3 Summary on General Equilibrium Approaches
10. A Survey of Empirical Findings
   10.1 The Objectives of Program Evaluations
   10.2 The Impact of Government Programs on Labor Market Outcomes
   10.3 The Findings from U.S. Social Experiments
   10.4 The Findings from Non-experimental Evaluations of U.S. Programs
   10.5 The Findings from European Evaluations
11. Conclusions

1 Introduction

Public provision of job training, of wage subsidies and of job search assistance is a feature of the modern welfare state. These activities are cornerstones of European “active labor market policies,” and have been a feature of U.S. social welfare policy for more than three decades. Such policies also have been advocated as a way to soften the shocks administered to the labor markets of former East Bloc and Latin economies currently in transition to market-based systems.

A central characteristic of the modern welfare state is a demand for “objective” knowledge about the effects of various government tax and transfer programs. Different parties benefit and lose from such programs. Assessments of these benefits and losses often play critical roles in policy decision-making. Recently, interest in evaluation has been elevated as many economies with modern welfare states have floundered, and as the costs of running welfare states have escalated.

This chapter examines the evidence on the effectiveness of welfare state active labor market policies such as training, job search and job subsidy policies, and the methods used to obtain that evidence. Our methodological discussion of alternative approaches to evaluating programs has more general interest. Few U.S. government programs have received such intensive scrutiny, and have been subject to so many different types of evaluation methodologies, as has governmentally-supplied job training. In part, this is because short run measures of the outcomes of government training programs are more easily obtained and are more readily accepted. Outcomes such as earnings, employment, and educational and occupational attainment are all more easily measured than the outcomes of health and public school education programs. In addition, short run measures of the outcomes of training programs are more closely linked to the “treatment” of training. In public school and health programs, a variety of inputs over the life cycle often give rise to measured outcomes. For these programs, attribution of specific effects to specific causes is more problematic.

A major focus of this chapter is on the general lessons learned from over thirty years of experience in evaluating government training programs. Most of our lessons come from American studies because the U.S. government has been much more active in promoting evaluations than have other governments, and the results from the evaluations are often used to expand, or to contract, government programs. We demonstrate that recent studies in Europe indicate that the basic patterns and lessons from the American case apply more generally.

The two relevant empirical questions in this literature are (i) adjusting for their lower skills and abilities, do participants in government employment and training programs benefit from these programs? and (ii) are these programs worthwhile social investments? As currently constituted, these programs are often ineffective on both counts. For most groups of participants, the benefits are modest, and at worst participation in government programs is harmful. Moreover, many programs and initiatives cannot pass a cost-benefit test. Even when programs are cost effective, they are rarely associated with a large scale improvement in skills. But, at the same time, there is substantial heterogeneity in the impacts of these programs. For some groups these programs appear to generate significant benefits both to the participants and to society.

We believe that there are two reasons why the private and social gains from these programs are generally small. First, the per-capita expenditures on participants are usually small relative to the deficits that these programs are being asked to address. In order for such interventions to generate large gains they would have to be associated with very large internal rates of return. Moreover, these returns would have to be larger than what is estimated for private sector training (Mincer, 1993). Another reason that the gains from these programs are generally low is that these services are targeted toward relatively unskilled and less able individuals. Evidence on the complementarity between the returns to training and skill in the private sector suggests that the returns to training in the public sector should be relatively low.
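The arithmetic behind the first reason can be illustrated with a short sketch. It assumes, purely for illustration, that a program produces a constant annual earnings gain for a fixed number of years, and it finds the internal rate of return by bisection. The function names, the $2,500 outlay, the gain amounts and the ten-year horizon are all hypothetical and are not taken from the chapter's data.

```python
def present_value(annual_gain, rate, years):
    """Present value of a constant annual gain received at the end of years 1..years."""
    return sum(annual_gain / (1.0 + rate) ** t for t in range(1, years + 1))


def internal_rate_of_return(cost, annual_gain, years, tol=1e-10):
    """Rate at which the present value of the gains equals the cost (bisection)."""
    if annual_gain * years < cost:
        raise ValueError("gains never repay the cost at a non-negative rate")
    low, high = 0.0, 1.0
    # Expand the upper bracket until the present value falls below the cost.
    while present_value(annual_gain, high, years) > cost:
        high *= 2.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if present_value(annual_gain, mid, years) > cost:
            low = mid   # gains still exceed the cost, so the rate must be higher
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical numbers: a $2,500 outlay and three candidate annual gains.
cost, years = 2500.0, 10
for gain in (300.0, 500.0, 1000.0):
    irr = internal_rate_of_return(cost, gain, years)
    print(f"annual gain ${gain:,.0f}: internal rate of return {irr:.1%}")
```

Under these assumptions, a larger earnings gain on the same small outlay mechanically implies a higher internal rate of return, which is the sense in which large gains from low-cost interventions would require unusually high returns.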

We also survey the main methodological lessons learned from thirty years of evaluation activity conducted mainly in the United States. We have identified eight lessons from the evaluation literature that we believe should guide practice in the future. First, there are many parameters of interest in evaluating any program. This multiplicity of parameters results in part because of the heterogeneous impacts of these programs. As a result of this heterogeneity, some popular estimators that are well-suited for estimating one set of parameters are poorly suited for estimating others. Understanding that responses to the same measured treatment are heterogeneous across people, that measured treatments themselves are heterogeneous, that in many cases people participate in programs based in part on this heterogeneity and that econometric estimators should allow for this possibility, is an important insight of the modern literature that challenges traditional approaches to program evaluation. Because of this heterogeneity, many different parameters are required to answer the interesting evaluation questions.

Second, there is inherently no method of choice for conducting program evaluations. The choice of an appropriate estimator should be guided by the economics underlying the problem, the data that are available or that can be acquired, and the evaluation question being addressed.

A third lesson from the evaluation literature is that better data help a lot. The data available to most analysts have been exceedingly crude, as we document below. Too much has been asked of econometric methods to remedy the defects of the underlying data. When certain features of the data are improved, the evaluation problem becomes much easier. The best solution to the evaluation problem lies in improving the quality of the data on which evaluations are conducted and not in the development of formal econometric methods to circumvent inadequate data.

Fourth, it is important to compare comparable people. Many non-experimental evaluations identify the parameter of interest by comparing observationally different persons using extrapolations based on inappropriate functional forms imposed to make incomparable people comparable. A major advantage of nonparametric methods for solving the problem of selection bias is that, rigorously applied, they force analysts to compare only comparable people.

Fifth, evidence that different non-experimental estimators produce different estimates of the same parameter does not indicate that non-experimental methods cannot address the underlying self-selection problem in the data. Instead, different estimates obtained from different estimators simply indicate that different estimators address the selection problem in different ways and that non-random participation in social programs is an important problem that deserves more attention in its own right. Different methods produce the same estimates only if there is no problem of selection bias.
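A small simulation can make the fifth lesson concrete. The sketch below is not the chapter's own simulation design; it assumes a deliberately simple data generating process in which earnings equal a permanent person-specific component plus noise, the true impact is a common constant `alpha`, and participation either is random or depends on the permanent component. All names and parameter values are hypothetical. It compares a post-program cross-section comparison of means with a difference-in-differences comparison.

```python
import random
from statistics import mean


def simulate(selection_on_theta, n=20000, alpha=1.0, seed=0):
    """Return (cross-section estimate, difference-in-differences estimate)."""
    rng = random.Random(seed)
    pre = {True: [], False: []}    # pre-program earnings by participation status
    post = {True: [], False: []}   # post-program earnings by participation status
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)   # permanent earnings component
        taste = rng.gauss(0.0, 1.0)   # participation shock unrelated to earnings
        # With selection, persons with low permanent earnings are more likely to enroll.
        d = (theta + taste < 0.0) if selection_on_theta else (taste < 0.0)
        pre[d].append(theta + rng.gauss(0.0, 1.0))
        post[d].append(theta + alpha * d + rng.gauss(0.0, 1.0))
    cross_section = mean(post[True]) - mean(post[False])
    diff_in_diff = (mean(post[True]) - mean(pre[True])) - (
        mean(post[False]) - mean(pre[False])
    )
    return cross_section, diff_in_diff


for label, flag in (("random participation", False), ("selection on theta", True)):
    cs, dd = simulate(flag)
    print(f"{label}: cross-section {cs:.2f}, difference-in-differences {dd:.2f}")
```

In this assumed setup the two estimators agree (both are close to `alpha`) when participation is random and diverge when participation depends on the permanent component, because only the difference-in-differences comparison removes that component. Under other selection rules the ranking of the estimators could differ.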

Sixth, a corollary lesson, derived from lessons three, four and five, is that the message from LaLonde’s (1986) influential study of nonexperimental estimators has been misunderstood. Once analysts define bias clearly, compare comparable people, know a little about the unemployment histories of trainees and comparison group members, administer them the same questionnaire and place them in the same local labor market, much of the bias in using nonexperimental methods is attenuated. Variability in estimates across estimators arises from the fact that different nonexperimental estimators solve the selection problem under different assumptions, and these assumptions are often incompatible with each other. Only if there is no selection bias would all evaluation estimators identify the same parameter.

Seventh, three decades of experience with social experimentation have enhanced our understanding of the benefits and limitations of this approach to program evaluation. Like all evaluation methods, this method is based on implicit identifying assumptions. Experimental methods estimate the effect of the program compared to no programs at all when they are used to evaluate the effect of a program for which there are few good substitutes. They are less effective when evaluating ongoing programs in part because they appear to disrupt established bureaucratic procedures. The threat of disruption leads local bureaucrats to oppose their adoption. To the extent that programs are disrupted, the program evaluated by the method is not the ongoing program that one seeks to evaluate. The parameter estimated in experimental evaluations is often not likely to be of primary interest to policy makers and researchers, and in any event has to be more carefully interpreted than is commonly done in most public policy discussions. However, if there is no disruption, and the other problems that plague experiments are absent, the evidence from social experiments provides a benchmark for learning about the performance of alternative non-experimental methods.

Eighth, and finally, programs implemented at a national or regional level affect both participants and nonparticipants. The current practice in the entire “treatment effect” literature is to ignore the indirect effects of programs on nonparticipants by assuming they are negligible. This practice can produce substantially misleading estimates of program impacts if indirect effects are substantial. To account for the impacts of programs on both participants and nonparticipants, general equilibrium frameworks are required when programs substantially impact the economy.

The remainder of the chapter is organized as follows. In Section 2, we distinguish among several types of active labor market policies and describe the types of employment and training services offered both in the U.S. and in Europe, their approximate costs, and their intended effects. We introduce the evaluation problem in Section 3. We discuss the importance of heterogeneity in the response to treatment for defining counterfactuals of interest. We consider what economic questions the most widely used counterfactuals answer. In Section 4, we present three prototypical solutions to the problem cast in terms of mean impacts. These prototypes are generalized throughout the rest of this chapter, but three basic principles introduced in this section underlie all approaches to program evaluation when the parameters of interest are means or conditional means. In Section 5, we present conditions under which social experiments solve the evaluation problem and assess the effectiveness of social experiments as a tool for evaluating employment and training programs. In Section 6, we outline two prototypical models of program participation and outcomes that represent the earliest and the latest thinking in the literature. We demonstrate the implications of these decision rules for the choice of an econometric evaluation estimator. We discuss the empirical evidence on the determinants of participation in government training programs.

The econometric models used to evaluate the impact of training programs in non-experimental settings are described in Section 7. The interplay between the economics of program participation and the choice of an appropriate evaluation estimator is stressed. In Section 8, we discuss some of the lessons learned from implementing various approaches to evaluation. Included in this section are the results of a simulation analysis based on the empirical model of Ashenfelter and Card (1985), where we demonstrate the sensitivity of the performance of alternative estimators to assumptions about heterogeneity in impact among persons and other data generating processes of the underlying econometric model. We also reexamine LaLonde’s (1986) evidence on the performance of nonexperimental estimators and reinterpret the main lessons from his study.

Section 9 discusses the problems that arise in using microeconomic methods to evaluate programs with macroeconomic consequences. A striking example of the problems that can arise from this practice is provided. Two empirically operational general equilibrium frameworks are presented, and the lessons from applying them in practice are summarized. Section 10 surveys the findings from the non-experimental literature, and contrasts them with those from experimental evaluations. We conclude in Section 11 by surveying the main methodological lessons learned from the program evaluation literature on job training.


2 Public Job Training and Active Labor Market Policies

Many government policies affect employment and wages. The “active labor market” policies we analyze have two important features that distinguish them from general policies, such as income taxes, that also affect the labor market. First, they are targeted toward the unemployed or toward those with low skills or little work experience who have completed (usually at a low level) their formal schooling. Second, the policies are aimed at promoting employment and/or wage growth among this population, rather than just providing income support.

Table 2.1 describes the set of policies we consider. This set includes: (a) classroom training (CT) consisting of basic education to remedy deficiencies in general skills or vocational training to provide the skills necessary for particular jobs; (b) subsidized employment with public or private employers (WE), which includes public service employment (wholly subsidized temporary government jobs) and work experience (subsidized entry-level jobs at public or non-profit employers designed to introduce young people to the world of work) as well as wage supplements and fixed payments to private firms for hiring new workers; (c) subsidies to private firms for the provision of on-the-job training (OJT); (d) training in how to obtain a job; and (e) in-kind subsidies to job search such as referrals to employers and free access to job listings. Policies (d) and (e) fall under the general heading of job search assistance (JSA), which also includes the job matching services provided by the U.S. Employment Service and similar agencies in other countries.

As we argue in more detail below, distinguishing the types of training provided is important for two reasons. First, different types of training often imply different economic models of training participation and impact and therefore different econometric estimation strategies. Second, because most existing training programs provide a mix of these services, heterogeneity in the impact of training becomes an important practical concern. As we show in Section 7, this heterogeneity has important implications for the choice of econometric methods for evaluating active labor market policies.

We do not analyze privately supplied job training despite its greater quantitative importance to modern economies (see Heckman, Lochner and Taber, 1998a, or Mincer, 1962, 1993). For example, in the United States, Jacob Mincer has estimated that such training amounts to approximately 4 to 5 percent of GDP, annually. Despite the magnitude of this investment there are surprisingly few publicly-available studies of the returns to private job training, and many of those that are available do not control convincingly for the non-random allocation of training among private sector workers. Governments demand publicly-justified evaluations of training programs while private firms, to the extent that they formally evaluate their training programs, keep their findings to themselves. An emphasis on objective publicly accessible evaluations is a distinctive feature of the modern welfare state, especially in an era of limited funds and public demands for accountability.

Table 2.2 presents the amount spent on active labor market policies by a number of OECD countries. Most OECD countries provide some mix of the employment and training services described in Table 2.1. Differences among countries include the relative emphasis on each type of service, the particular populations targeted for service, the total resources spent on the programs, how resources are allocated among programs and the extent to which employment and training services are integrated with other programs such as unemployment insurance or social assistance. In addition, although the programs we study are funded by governments, they are not always conducted by governments, especially in the U.S. and the U.K. In decentralized training systems, private firms and local organizations play an important role in providing employment and training services.

Table 2.2 reveals that many OECD countries spend substantial sums on active labor market policies. In nearly all countries, total expenditures are more than one-third of total expenditures on unemployment benefits, and some countries’ expenditures on active labor market policies exceed those on unemployment benefits. Usually only a fraction of these expenditures are for CT. Further, even in countries that emphasize classroom training, governments spend substantial sums on other active labor market policies. Denmark spends 1 percent of its GDP on CT for adults, the most of any OECD country. However, this expenditure amounts to only 40 percent of its total spending on active labor market programs. Only in Canada is the fraction spent on CT larger. At the opposite extreme, Japan and the U.S. spend only 0.03 percent and 0.04 percent, respectively, of their GDP on CT. However, as the table shows, these two countries also spend the smallest share of GDP on active labor market policies.
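The Danish figures quoted above imply a total spending share that the text does not state directly. The following sketch simply carries out that arithmetic; the function name is hypothetical, and the result is only as precise as the rounded figures in the text.

```python
def total_share_of_gdp(component_share_of_gdp, component_fraction_of_total):
    """Total spending as a percent of GDP, given one component (percent of GDP)
    and that component's fraction of the total."""
    return component_share_of_gdp / component_fraction_of_total


# Figures quoted in the text: CT is 1 percent of GDP and 40 percent of the total.
denmark_total = total_share_of_gdp(1.0, 0.40)
denmark_non_ct = denmark_total - 1.0
print(f"implied total: about {denmark_total:.1f} percent of GDP")
print(f"implied non-CT spending: about {denmark_non_ct:.1f} percent of GDP")
```

On these rounded figures, Danish spending on active labor market programs is roughly 2.5 percent of GDP, of which about 1.5 percentage points go to services other than classroom training.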

The low percentage of GDP spent on active labor market programs in the U.S. has led some researchers to comment on the irony that despite these low expenditures, U.S. programs have been evaluated more extensively and over a longer period of time than programs elsewhere (Haveman and Saks, 1985; Björklund, 1993). Indeed, much of what is known about the impacts of these programs and many of the methodological developments associated with evaluating them come from U.S. evaluations.[1]

[1] However, the level of total expenditure in the U.S. is still quite large. Relative total expenditures on active labor market policies can be inferred from Table 2.2 using the relative sizes of each economy compared with the U.S. For example, the German economy is somewhat less than one-fourth the size of the U.S. economy, and the French, Italian and British economies are approximately one-sixth the size of the U.S. economy. Accordingly, training expenditures are somewhat greater in Germany and France, about the same in Italy, and less in the United Kingdom than in the U.S. See OECD, Employment Outlook (1996), Table 1.1, p.2.
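The inference described in this footnote combines each country's spending share of GDP with the size of its economy relative to the U.S. A minimal sketch of that calculation follows. The function names are hypothetical, and the spending shares in the example are invented for illustration because the Table 2.2 values are not reproduced here; only the relative economy sizes of about one-fourth and one-sixth come from the footnote.

```python
def spending_relative_to_us(share_of_gdp, us_share_of_gdp, gdp_relative_to_us):
    """Ratio of a country's total spending to U.S. total spending.

    Both shares are in the same units (for example, percent of GDP);
    gdp_relative_to_us is the country's GDP divided by U.S. GDP.
    """
    return (share_of_gdp * gdp_relative_to_us) / us_share_of_gdp


def break_even_share_ratio(gdp_relative_to_us):
    """How many times the U.S. share a country must spend to match U.S. spending."""
    return 1.0 / gdp_relative_to_us


# Hypothetical shares: a country one-fourth the size of the U.S. economy that
# spends 1.2 percent of GDP, against an assumed U.S. share of 0.2 percent.
ratio = spending_relative_to_us(1.2, 0.2, 0.25)
print(f"spending relative to the U.S.: {ratio:.2f}")
print(f"break-even share ratio at one-sixth the size: {break_even_share_ratio(1 / 6):.1f}")
```

Under this calculation, an economy one-sixth the size of the U.S. spends more in total than the U.S. only if its spending share of GDP is more than six times the U.S. share.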


We now consider in detail each type of employment and training service in Table 2.1. This discussion motivates the consideration of alternative economic models of program participation and impact in Sections 6 and 7, and our focus on heterogeneity in program impacts. It also provides a context for the empirical literature on the impact of these programs that we review in Section 10.

The first category listed in Table 2.1 is classroom training. In many countries, CT represents the largest fraction of government expenditures on active labor market policy, and most of that expenditure is devoted to vocational training. Even in the U.S., where remedial programs aimed at high school dropouts and other low-skill individuals play a larger role than elsewhere, most CT programs provide vocational training. By design, most CT programs in the OECD are of limited duration. For example, in Denmark CT typically lasts 2 to 4 weeks (Jensen, et al., 1993), while four months is the more typical duration in Sweden and three months in the United Kingdom and the United States. Per capita expenditures on such training vary substantially, with a training slot costing approximately $7,500 in Sweden and between $2,000 and $3,000 in the United States.[2] The Swedish figures include stipends for participants while the U.S. figures do not.

An important difference among OECD countries that provide CT is the extent to which the training is relatively standardized and therefore less tailored to the requirements of firms or the market in general. In the 1980s and early 1990s, the Nordic countries usually provided CT in government training centers that use standardized materials and teaching methods. However, the emphasis has shifted recently, especially in Sweden, toward decentralized and firm based training. In the United Kingdom and the U.S., the provision of CT is highly decentralized and its content depends on the choices made by local councils of business, political, and labor leaders. The local councils receive funding from the federal government and then subcontract for CT with private vocational and proprietary schools and local community colleges. Due to this highly decentralized structure, both participant characteristics and training content can vary substantially among locales, which suggests that the impact of training is likely to vary substantially across individuals in evaluations of such programs.

The second category of services listed in Table 2.1 is wage and employment subsidies. This category encompasses several different specific services which we group together due to their analytic similarity. The simplest example of this type of policy provides subsidies to private firms for hiring workers in particular groups. These subsidies may take the form of a fixed amount for each new employee hired or some fraction of the employee’s wage for a period of time. In the U.S., the Targeted Jobs Tax Credit is an example of this type of program. Heckman, Lochner, Smith and Taber (1997) discuss the empirical evidence on the effectiveness of wage and employment subsidies in greater detail.

[2] Unless otherwise indicated all monetary units are expressed in 1997 U.S. dollars.

Temporary work experience (WE) usually targets low skilled youth or adults with poor employment histories and provides them with a job lasting 3 to 12 months in the public or nonprofit sector. The idea of these programs is to ease the transition of these groups into regular jobs, by helping them learn about the world of work and develop good work habits. Such programs constitute a very small proportion of U.S. training initiatives, but substantial fractions of services provided to youth in countries such as France (TUC) and the United Kingdom (Community Programmes). In public sector employment (PSE) programs, governments create temporary public sector jobs. These jobs usually require some amount of skill and are aimed at unemployed adults with recent work experience rather than youth or the disadvantaged. Except for a brief period during the late 1970s, they have not been used in the United States since the Depression era. However, they have been and remain an important component of active labor market policy in several European countries.

The third category in Table 2.1 is subsidized on-the-job training at private firms. The goal of subsidized OJT programs is to induce employers to provide job-relevant skills, including firm-specific skills, to disadvantaged workers. In the U.S., employers receive a 50 percent wage subsidy for up to six months; in the U.K. employers receive a lump sum per week (O’Higgins, 1994). Although evidence is limited and firm training is difficult to measure, there is a widespread view that these programs in fact provide little training, even informal on-the-job training, and are better characterized as a work experience or wage subsidy program (e.g., Breen, 1988; Hutchinson and Church, 1989).[3] Survey responses by employers who have hired or sponsored OJT trainees suggest that they value the program for its help in reducing the costs associated with hiring and retaining suitable employees more than for the opportunity to increase the skills of new workers (Begg, et al., 1991).

For purposes of evaluation, it is almost always impossible to distinguish those OJT experiences from which new skills were acquired from those that amounted to work experience or wage subsidy without a training component. In addition, because OJT is provided by individual employers, this indeterminacy is not simply a program-specific feature, but holds among individuals within the same program. Consequently, OJT programs will likely have heterogeneous effects, and the impact, if any, of these programs will result from some combination of learning by doing, the usual training provided by the firm to new workers and incremental training beyond that provided to unsubsidized workers.

[3] The provision of subsidized OJT is particularly hard to monitor both because on-the-job training has proven difficult to measure with survey methods (Barron, Berger and Black, 1997) and because trainees often do not perceive that they have been treated any differently than their co-workers who are not subsidized. In fact, both groups may have received substantial amounts of informal on-the-job training. For evidence of the importance of informal on-the-job training in the U.S., see Barron, Black and Loewenstein (1989).

The fourth category of services in Table 2.1 is job search assistance. The purpose of these services is to facilitate the matching process between workers and firms both by reducing time unemployed and by increasing match quality. The programs are usually operated by the national or local employment service, but sometimes may be subcontracted out to third parties. Included under this category are direct placement in vacant jobs, employer referrals, in-kind subsidies to search such as free access to job listings and telephones for contacting employers, career counseling, and instruction in job search skills. The last of these, which often includes instruction in general social skills, was developed in the U.S., but is now used in the U.K., Sweden, and, recently, France (Björklund and Regner, 1996, p. 24). In recent years, JSA has become more popular due to its low cost, usually just a few hundred dollars per participant, and relatively solid record of performance (which we discuss in detail in Section 10).

To conclude this section, we discuss five features of employment and training programs that should be kept in mind when evaluating them. First, as the operation of these programs has become more decentralized in OECD countries, there have emerged differences between how these programs were designed and how they are implemented (Hollister and Freedman, 1988). Actual practice can deviate substantially from explicit written policy.4 Therefore, the evaluator must be careful to characterize the program as implemented when assessing its impacts.

Second, participants often receive services from more than one category in Table 2.1. For example, classroom training in vocational skills might be followed by job search assistance. In the U.K., the Youth Training Scheme (now Youth Training) was explicitly designed to combine OJT with 13 weeks of CT. Some expensive programs combine several of the services listed in Table 2.1 into a single package. For example, in the U.S. the Job Corps program for youth combines classroom training with work experience and job search assistance in a residential setting at a current cost of around $19,000 per participant. Many available survey data sets do not identify all the services received by a participant. In this case, the practice of combining together various types of training, particularly when combinations are tailored to the needs of individual trainees as in the U.S. JTPA program, constitutes another source of heterogeneity in the impact of training. Even when administrative data are available that identify the services received, isolating the impact of particular individual services often proves difficult or impossible in practice due to the small samples receiving particular combinations of services or due to difficulties in determining the process by which individuals come to receive particular service combinations.

4 For example, see Breen (1988) and Hollister and Freedman (1990) describing the implementation of WEP in Ireland and Hollister and Freedman (1990) and Leigh (1995) describing the implementation of JTPA in the United States.

Third, certain features of active labor market programs affect individuals’ decisions to participate in training. In some countries, such as Sweden and the United Kingdom, participation in training is a condition for receiving unemployment benefits rather than less generous social assistance payments. In the U.S., participation is sometimes required by a court order in lieu of alternative punishment.

Fourth, program administrators often have considerable discretion over whom they admit into government training programs. This discretion results from the fact that the number of applicants often exceeds the number of available training positions. It has long been a feature of U.S. programs, but also has characterized programs in Austria, Denmark, Germany, Norway, and the United Kingdom (Björklund and Regner, 1996; Westergard-Neilsen, 1993; Kraus, et al., 1997). Consequently, when modeling participation in training, it may be important to account for not only individual incentives, but also those of the program operators. In Section 6, we discuss the incentives facing program operators and how they affect the characteristics of participants in government training programs.

Finally, the different types of services require different economic models of program participation and impact. For example, the standard human capital model captures the essence of individual decisions to invest in vocational skills (CT). It provides little guidance to behavior regarding job search assistance or wage subsidies. In Section 6 we present economic models that describe participation in alternative programs and discuss their implications for evaluation research.


3 The Evaluation Problem and the Parameters of Interest in Evaluating Social Programs

3.1 The Evaluation Problem

Constructing counterfactuals is the central problem in the literature on evaluating social programs. In the simplest form of the evaluation problem, persons are imagined as being able to occupy one of two mutually exclusive states: “0” for the untreated state and “1” for the treated state. Treatment is associated with participation in the program being evaluated.5 Associated with each state is an outcome, or set of outcomes. It is easiest to think of each state as consisting of only a single outcome measure, such as earnings, but just as easily, we can use the framework to model vectors of outcomes such as earnings, employment and participation in welfare programs. In the models presented in Section 6, we study an entire vector of earnings or employment at each age that result from program participation.

We can express these outcomes as a function of conditioning variables, $X$. Denote the potential outcomes by $Y_0$ and $Y_1$, corresponding to the untreated and treated states. Each person has a $(Y_0, Y_1)$ pair. Assuming that means exist, we may write the (vector of) outcomes in each state as

(3.1a)  $Y_0 = \mu_0(X) + U_0$

(3.1b)  $Y_1 = \mu_1(X) + U_1$

where $E(Y_0 \mid X) = \mu_0(X)$ and $E(Y_1 \mid X) = \mu_1(X)$. To simplify the notation, we keep the conditioning on $X$ implicit unless it serves to clarify the exposition by making it explicit. The potential outcome actually realized depends on decisions made by individuals, firms, families or government bureaucrats. This model of potential outcomes is variously attributed to Fisher (1935), Neyman (1935), Roy (1951), Quandt (1972, 1988) or Rubin (1974).

To focus on main ideas, throughout most of this chapter we assume $E(U_1 \mid X) = E(U_0 \mid X) = 0$, although as we note at several places in this paper, this is not strictly required. For many of the estimators that we consider in this chapter we allow for the more general case

$Y_0 = g_0(X) + U_0$

$Y_1 = g_1(X) + U_1$

where $E(U_0 \mid X) \neq 0$ and $E(U_1 \mid X) \neq 0$. Then $\mu_0(X) = g_0(X) + E(U_0 \mid X)$ and $\mu_1(X) = g_1(X) + E(U_1 \mid X)$.6 Thus $X$ is not necessarily exogenous in the ordinary econometric usage of that term. These conditions do not imply that $E(U_1 - U_0 \mid X, D = 1) = 0$, where $D = 1$ denotes participation in the program and $D = 0$ nonparticipation. $D$ may depend on $U_1$, $U_0$ or $U_1 - U_0$ and $X$.

5 In this paper, we only consider a two potential state model in order to focus on the main ideas. Heckman (1998a) develops a multiple state model of potential outcomes for a large number of mutually exclusive states. The basic ideas in his work are captured in the two outcome models we present here.

6 For example, an exogeneity assumption is not required when using social experiments to identify $E(Y_1 - Y_0 \mid X, D = 1)$.

Note also that $Y$ may be a vector of outcomes or a time series of potential outcomes: $(Y_{0t}, Y_{1t})$, for $t = 1, \ldots, T$, on the same type of variable. We will encounter the latter case when we analyze panel data on outcomes. In this case, there is usually a companion set of $X$ variables which we will sometimes assume to be strictly exogenous in the conventional econometric meaning of that term: $E(U_{0t} \mid X) = 0$, $E(U_{1t} \mid X) = 0$ where $X = (X_1, \ldots, X_T)$. In defining a sequence of “treatment on the treated” parameters, $E(Y_{1t} - Y_{0t} \mid X, D = 1)$, $t = 1, \ldots, T$, this assumption allows us to abstract from any dependence between $U_{1t}$, $U_{0t}$ and $X$. It excludes differences in $U_{1t}$ and $U_{0t}$ arising from $X$ dependence and allows us to focus on differences in outcomes solely attributable to $D$. While convenient, this assumption is overly strong.

However, we stress that the exogeneity assumption in either cross section or panel contexts is only a matter of convenience and is not strictly required. What is required for an interpretable definition of the “treatment on the treated” parameter is avoiding conditioning on $X$ variables caused by $D$ even holding $Y^P = ((Y_{01}, Y_{11}), \ldots, (Y_{0T}, Y_{1T}))$ fixed, where $Y^P$ is the vector of potential outcomes. More precisely, we require that for the conditional density of the data

$f(X \mid D, Y^P) = f(X \mid Y^P)$

i.e. we require that the realization of $D$ does not determine $X$ given the vector of potential outcomes. Otherwise, the parameter $E(Y_1 - Y_0 \mid X, D = 1)$ does not capture the full effect of treatment on the treated as it operates through all channels and certain other technical problems discussed in Heckman (1998a) arise. In order to obtain $E(Y_{1t} - Y_{0t} \mid X, D = 1)$ defined on subsets of $X$, say $X_c$, simply integrate out $E(Y_{1t} - Y_{0t} \mid X, D)$ against the density $f(\tilde{X}_c \mid D = 1)$, where $\tilde{X}_c$ is the portion of $X$ not in $X_c$: $X = (X_c, \tilde{X}_c)$.

Note, finally, that the choice of a base state “0” is arbitrary. Clearly the roles of “0” and “1” can be reversed. In the case of human capital investments, there is a natural base state. But for many other evaluation problems the choice of a base is arbitrary. Assumptions appropriate for one choice of “0” and “1” need not carry over to the opposite choice. With this cautionary note in mind, we proceed as if a well-defined base state exists.

In many problems it is convenient to think of “0” as a benchmark “no treatment” state. The gain to the individual of moving from “0” to “1” is given by

(3.2)  $\Delta = Y_1 - Y_0$.

If one could observe both $Y_0$ and $Y_1$ for the same person at the same time, the gain $\Delta$ would be known for each person. The fundamental evaluation problem arises because we do not know both coordinates of $(Y_1, Y_0)$ and hence $\Delta$ for anybody. All approaches to solving this problem attempt to estimate the missing data. These attempts to solve the evaluation problem differ in the assumptions they make about how the missing data are related to the available data, and what data are available. Most approaches to evaluation in the social sciences accept the impossibility of constructing $\Delta$ for anyone. Instead, the evaluation problem is redefined from the individual level to the population level to estimate the mean of $\Delta$, or some other aspect of the distribution of $\Delta$, for various populations of interest. The question becomes what features of the distribution of $\Delta$ should be of interest and for what populations should it be defined?
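The missing-data character of the problem can be made concrete with a small simulation. The sketch below is purely illustrative: the population size, the distributions of the base-state outcome and the gain, and the 40 percent participation rate are all assumptions chosen for the example, not features of any program discussed in this chapter. It constructs both potential outcomes for each simulated person, then hides the one that would not be observed.

```python
import random

random.seed(0)

# Hypothetical population: each person has a pair of potential outcomes (Y0, Y1).
N = 1000
people = []
for _ in range(N):
    y0 = random.gauss(10.0, 2.0)            # outcome in the untreated state "0"
    delta = random.gauss(1.0, 1.5)          # individual gain, Delta = Y1 - Y0
    y1 = y0 + delta                         # outcome in the treated state "1"
    d = 1 if random.random() < 0.4 else 0   # participation indicator D (assigned at random here)
    people.append((y0, y1, d))

# What an evaluator sees: only the outcome for the state each person occupies.
observed = [
    {"D": d, "Y0": (y0 if d == 0 else None), "Y1": (y1 if d == 1 else None)}
    for (y0, y1, d) in people
]

# Number of persons for whom Delta could be computed directly from observed data.
n_identified_gains = sum(
    1 for row in observed if row["Y0"] is not None and row["Y1"] is not None
)

# The population mean of Delta is still a well-defined target, known here only
# because the simulation generated both coordinates.
true_mean_gain = sum(y1 - y0 for (y0, y1, _) in people) / N

print(n_identified_gains)
print(round(true_mean_gain, 3))
```

In the simulated data no person contributes both coordinates, so no individual gain can be formed, while the mean gain remains a meaningful population-level object.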

3.2 The Counterfactuals of Interest

There are many possible counterfactuals of interest for evaluating a social program. One might like to compare the state of the world in the presence of the program to the state of the world if the program were operated in a different way, or to the state of the world if the program did not exist at all, or to the state of the world if alternative programs were used to replace the present program. A full evaluation entails an enumeration of all outcomes of interest for all persons both in the current state of the world and in all the alternative states of interest, and a mechanism for valuing the outcomes in the different states.

Outcomes of interest in program evaluations include the direct benefits received, the level of behavioral variables for participants and nonparticipants and the payments for the program, for both participants and nonparticipants, including taxes levied to finance a publicly provided program. These measures would be displayed for each individual in the economy to characterize each state of the world.

In a Robinson Crusoe economy, participation in a program is a well-defined event. In a modern economy, almost everyone participates in each social program either directly or indirectly. A training program affects more than the trainees. It also affects the persons with whom the trainees compete in the labor market, the firms that hire them and the taxpayers who finance the program. The impact of the program depends on the number and composition of the trainees. Participation in a program does not mean the same thing for all people.

The traditional evaluation literature usually defines the effect of participation to be the effect of the program on participants explicitly enrolled in the program. These are the “Direct Effects.” They exclude the effects of a program that do not flow from direct participation, known as the “Indirect Effects.” This distinction appears in the pioneering work of H. G. Lewis on measuring union relative wage effects (Lewis, 1963). His insights apply more generally to all evaluation problems in social settings.

There may be indirect effects for both direct participants and direct nonparticipants. Thus a direct participant may pay taxes to support the program just as persons who do not directly participate may also pay taxes. A firm may be an indirect beneficiary of the lower wages resulting from an expansion of the trained workforce. The conventional econometric and statistical literature ignores the indirect effects of programs and equates “treatment” outcomes with the direct outcome $Y_1$ in the program state and “no treatment” with the direct outcome $Y_0$ in the no program state.

Determining all outcomes in all states is not enough to evaluate a program. Another aspect of the evaluation problem is the valuation of the outcomes. In a democratic society, aggregation of the evaluations and the outcomes in a form useful for social deliberations also is required. Different persons may value the same state of the world differently even if they experience the same “objective” outcomes and pay the same taxes. Preferences may be interdependent. Redistributive programs exist, in part, because of altruistic or paternalistic preferences. Persons may value the outcomes of other persons either positively or negatively. Only if one person’s preferences are dominant (the idealized case of a social planner with a social welfare function) is there a unique evaluation of the outcomes associated with each possible state from each possible program.

The traditional program evaluation literature assumes that the valuation of the direct effects of the program boils down to the effect of the program on GDP. This assumption ignores the important point that different persons value the same outcomes differently and that the democratic political process often entails coalitions of persons who value outcomes in different ways. Both efficiency and equity considerations may receive different weights from different groups. Different mechanisms for aggregating evaluations and resolving social conflicts exist in different societies. Different types of information are required to evaluate a program under different modes of social decision making.

Both for pragmatic and political reasons, government social planners, statisticians or policy makers may value objective output measures differently than the persons or institutions being evaluated. The classic example is the value of nonmarket time (Greenberg, 1997). Traditional program evaluations exclude such valuations largely because of the difficulty of imputing the value and quantity of nonmarket time. By doing this, however, these evaluations value labor supply in the market sector at the market wage, but value labor supply in the nonmarket sector at a zero wage. By contrast, individuals value labor supply in the nonmarket sector at their reservation wage. In this example, two different sets of preferences value the same outcomes differently. In evaluating a social program in a society that places weight on individual preferences, it is appropriate to recognize personal evaluations and that the same outcome may be valued in different ways by different social actors.

Programs that embody redistributive objectives inherently involve different groups. Even if the taxpayers and the recipients of the benefits of a program have the same preferences, their valuations of a program will, in general, differ. Altruistic considerations often motivate such programs. These often entail private valuations of distributions of program impacts - how much recipients gain over what they would experience in the absence of the program. (See Heckman and Smith, 1993, 1995, 1998a and Heckman, Smith and Clements, 1997.)

Answers to many important evaluation questions require knowledge of the distribution of program gains, especially for programs that have a redistributive objective or programs for which altruistic motivations play a role in motivating the existence of the program. Let $D = 1$ denote direct participation in the program and $D = 0$ denote direct nonparticipation. To simplify the argument in this section, ignore any indirect effects. From the standpoint of a detached observer of a social program who takes the base state values (denoted “0”) as those that would prevail in the absence of the program, it is of interest to know, among other things,

(A) the proportion of people taking the program who benefit from it:
$\Pr(Y_1 > Y_0 \mid D = 1) = \Pr(\Delta > 0 \mid D = 1)$;

(B) the proportion of the total population benefiting from the program:
$\Pr(Y_1 > Y_0 \mid D = 1) \cdot \Pr(D = 1) = \Pr(\Delta > 0 \mid D = 1) \cdot \Pr(D = 1)$;

(C) selected quantiles of the impact distribution:
$\inf_{\Delta} \{\Delta : F(\Delta \mid D = 1) > q\}$, where $q$ is a quantile of the distribution and where “inf” is the smallest attainable value of $\Delta$ that satisfies the condition stated in the braces;

(D) the distribution of gains at selected base state values:
$F(\Delta \mid D = 1, Y_0 = y_0)$;

(E) the increase in the level of outcomes above a certain threshold $\bar{y}$ due to a policy:
$\Pr(Y_1 > \bar{y} \mid D = 1) - \Pr(Y_0 > \bar{y} \mid D = 1)$.

Measure (A) is of interest in determining how widely program gains are distributed among participants. Participants in the political process with preferences over distributions of program outcomes would be unlikely to assign the same weight to two programs with the same mean outcome, one of which produced favorable outcomes for only a few persons while the other distributed gains more broadly. When considering a program, it is of interest to determine the proportion of participants who are harmed as a result of program participation, indicated by $\Pr(Y_1 < Y_0 \mid D = 1)$. Negative mean impact results might be acceptable if most participants gain from the program. These features of the outcome distribution are likely to be of interest to evaluators even if the persons studied do not know their $Y_0$ and $Y_1$ values in advance of participating in the program.

Measure (B) is the proportion of the entire population that benefits from the program, assuming that the costs of financing the program are broadly distributed and are not perceived to be related to the specific program being evaluated. If voters have correct expectations about the joint distribution of outcomes, it is of interest to politicians to determine how widely program benefits are distributed. At the same time, large program gains received by a few persons may make it easier to organize interest groups in support of a program than if the same gains are distributed more widely.

Evaluators interested in the distribution of program benefits would be interested in measure (C). Evaluators who take a special interest in the impact of a program on recipients in the lower tail of the base state distribution would find measure (D) of interest. It reveals how the distribution of gains depends on the base state for participants. Measure (E) provides the answers to the question “do the distributions of gains for the participants dominate the distribution of outcomes if they did not participate?” (See Heckman, Smith and Clements, 1997; and Heckman and Smith, 1998a.) Expanding the scope of the discussion to evaluate the indirect effects of the program makes it more likely that estimating distributional impacts is an important part in conducting program evaluations.
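To fix ideas about measures (A), (B), (C) and (E), the following sketch computes their sample analogues in a simulated population where, unlike in real data, both potential outcomes are known for everyone. Everything about the data generating process (normal outcomes, a mean gain of 0.5, and a participation rule that favors persons with larger gains) is an assumption made for illustration. Measure (D) is omitted because conditioning on an exact base-state value would require a smoothing choice that the text does not specify.

```python
import random

random.seed(1)

N = 20000
rows = []
for _ in range(N):
    y0 = random.gauss(0.0, 1.0)             # base-state outcome Y0
    delta = random.gauss(0.5, 1.0)          # individual gain Delta = Y1 - Y0
    y1 = y0 + delta
    # Hypothetical participation rule: larger gains make enrollment more likely.
    d = 1 if delta + random.gauss(0.0, 1.0) > 0.5 else 0
    rows.append((y0, y1, d))

treated = [(y0, y1) for (y0, y1, d) in rows if d == 1]
gains = sorted(y1 - y0 for (y0, y1) in treated)
p_d1 = len(treated) / N                     # sample analogue of Pr(D = 1)

# (A): share of participants with a positive gain.
measure_a = sum(1 for g in gains if g > 0) / len(gains)

# (B): share of the whole population that benefits, ignoring indirect effects.
measure_b = measure_a * p_d1


def quantile_c(q):
    """(C): smallest gain whose empirical CDF among participants exceeds q (0 <= q < 1)."""
    n = len(gains)
    for i, g in enumerate(gains):
        if (i + 1) / n > q:
            return g
    return gains[-1]


def measure_e(ybar):
    """(E): change in the share of participants above the threshold ybar."""
    above_treated = sum(1 for (_, y1) in treated if y1 > ybar) / len(treated)
    above_base = sum(1 for (y0, _) in treated if y0 > ybar) / len(treated)
    return above_treated - above_base


print(round(measure_a, 3), round(measure_b, 3))
print(round(quantile_c(0.5), 3), round(measure_e(0.0), 3))
```

None of these quantities can be computed this way from actual data, since they depend on the joint distribution of $(Y_0, Y_1)$; the simulation only shows what each measure summarizes.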

3.3 The Counterfactuals Most Commonly Estimated in the Literature

The evaluation problem in its most general form for distributions of outcomes is formidable and is not considered in depth either in this chapter or in the literature. (Heckman and Smith, 1998a, and Heckman, Smith and Clements, 1997, consider identification and estimation of counterfactual distributions.) Instead, in this chapter we focus on counterfactual means, and consider a form of the problem in which analysts have access to information on persons who are in one state or the other at any time, and for certain time periods there are some persons in both states, but there is no information on any single person who is in both states at the same time. As discussed in Heckman (1998a) and Heckman and Smith (1998a), a crucial assumption in the traditional evaluation literature is that the no treatment state approximates the no program state. This would be true if indirect effects are negligible.

Most of the empirical work in the literature on evaluating government training programs focuses on means and in particular on one mean counterfactual: the mean direct effect of treatment on those who take treatment. The transition from the individual to the group level counterfactual recognizes the inherent impossibility of observing the same person in both states at the same time. By dealing with aggregates, rather than individuals, it is sometimes possible to estimate group impact measures even though it may be impossible to measure the impacts of a program on any particular individual. To see this point more formally, consider the switching regression model with two regimes denoted by “1” and “0” (Quandt, 1972). The observed outcome $Y$ is given by

(3.3)  $Y = D Y_1 + (1 - D) Y_0$.

When $D = 1$ we observe $Y_1$; when $D = 0$ we observe $Y_0$.

To cast the foregoing model in a more familiar-looking form, and to distinguish it from conventional regression models, express the means in (3.1a) and (3.1b) in more familiar linear regression form:

$E(Y_j \mid X) = \mu_j(X) = X\beta_j$,  $j = 0, 1$.

With these expressions, substitute from (3.1a) and (3.1b) into (3.3) to obtain

$Y = D(\mu_1(X) + U_1) + (1 - D)(\mu_0(X) + U_0)$.

Rewriting,

$Y = \mu_0(X) + D(\mu_1(X) - \mu_0(X) + U_1 - U_0) + U_0$.

Using the linear regression representation, we obtain

(3.4)  $Y = X\beta_0 + D(X(\beta_1 - \beta_0) + U_1 - U_0) + U_0$.

Observe that from the definition of a conditional mean, $E(U_0 \mid X) = 0$ and $E(U_1 \mid X) = 0$.

The parameter most commonly invoked in the program evaluation literature, although not the one actually estimated in social experiments, or in most nonexperimental evaluations, is the effect of randomly picking a person with characteristics $X$ and moving that person from “0” to “1”:

$E(Y_1 - Y_0 \mid X) = E(\Delta \mid X)$.

In terms of the switching regression model this parameter is the coefficient on $D$ in the non-error component of the following “regression” equation:

(3.5)  $Y = \mu_0(X) + D(\mu_1(X) - \mu_0(X)) + \{U_0 + D(U_1 - U_0)\}$
         $= \mu_0(X) + D\,E(\Delta \mid X) + \{U_0 + D(U_1 - U_0)\}$
         $= X\beta_0 + D X(\beta_1 - \beta_0) + \{U_0 + D(U_1 - U_0)\}$

where the term in braces is the “error.”

If the model is specialized so that there are $K$ regressors plus an intercept and $\beta_1 = (\beta_{10}, \ldots, \beta_{1K})$ and $\beta_0 = (\beta_{00}, \ldots, \beta_{0K})$, where the intercepts occupy the first position, and the slope coefficients are the same in both regimes:

$\beta_{1j} = \beta_{0j} = \beta_j$,  $j = 1, \ldots, K$

and $\beta_{00} = \beta_0$ and $\beta_{10} - \beta_{00} = \alpha$, the parameter under consideration reduces to $\alpha$:

(3.6)  $E(Y_1 - Y_0 \mid X) = \beta_{10} - \beta_{00} = \alpha$.

The regression model for this special case may be written as

(3.7)  $Y = X\beta + D\alpha + \{U_0 + D(U_1 - U_0)\}$.

It is nonstandard from the standpoint of elementary econometrics because the error term has a component that switches on or off with $D$. In general, its mean is not zero because $E[U_0 + D(U_1 - U_0)] = E(U_1 - U_0 \mid D = 1)\Pr(D = 1)$. If $U_1 - U_0$, or variables statistically dependent on it, help determine $D$, $E(U_1 - U_0 \mid D = 1) \neq 0$. Intuitively, if persons who have high gains $(U_1 - U_0)$ are more likely to appear in the program, then this term is positive.
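The expression for the mean of the error term in (3.7) follows in a few lines from the law of iterated expectations. The worked step below uses only what has already been stated, namely that $U_0$ has mean zero and that $D$ takes the values 0 and 1; it is offered as a reading aid rather than as part of the original derivation.

```latex
\begin{align*}
E[U_0 + D(U_1 - U_0)]
  &= E(U_0) + E[D(U_1 - U_0)] \\
  &= 0 + \Pr(D = 1)\, E(U_1 - U_0 \mid D = 1) + \Pr(D = 0) \cdot 0 \\
  &= E(U_1 - U_0 \mid D = 1)\, \Pr(D = 1).
\end{align*}
```

The second line splits the expectation of $D(U_1 - U_0)$ by the value of $D$: the product is zero whenever $D = 0$, so only the participants contribute.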

In practice most non-experimental and experimental studies do not estimate $E(\Delta \mid X)$. Instead, most nonexperimental studies estimate the effect of treatment on the treated, $E(\Delta \mid X, D = 1)$. This parameter conditions on participation in the program as follows:

(3.8)  $E(\Delta \mid X, D = 1) = E(Y_1 - Y_0 \mid X, D = 1) = X(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X, D = 1)$.

It is the coefficient on $D$ in the non-error component of the following regression equation:

(3.9)  $Y = \mu_0(X) + D\,E(\Delta \mid X, D = 1) + \{U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)]\}$
         $= X\beta_0 + D(X(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X, D = 1)) + \{U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)]\}$.

$E(\Delta \mid X, D = 1)$ is a nonstandard parameter in conventional econometrics. It combines “structural” parameters $X(\beta_1 - \beta_0)$ with the means of the unobservables $(E(U_1 - U_0 \mid X, D = 1))$. It measures the average gain in the outcome for persons who choose to participate in a program compared to what they would have experienced in the base state. It computes the average gain in terms of both observables and unobservables. It is the latter that makes the parameter look nonstandard. Most econometric activity is devoted to separating $\beta_0$ and $\beta_1$ from the effects of the regressors on $U_1$ and $U_0$. Parameter (3.8) combines these effects.

This parameter is implicitly defined conditional on the current levels of participation in the program in society at large. Thus it recognizes social interaction. But at any point in time the aggregate participation level is just a single number, and the composition of trainees is fixed. From a single cross section of data, it is not possible to estimate how variation in the levels and composition of participants in a program affects the parameter.

The two evaluation parameters we have just presented are the same if we assume that $U_1 - U_0 = 0$, so the unobservables are common across the two states. From (3.9) we now have $Y_1 - Y_0 = \mu_1(X) - \mu_0(X) = X(\beta_1 - \beta_0)$. The difference between potential outcomes in the two states is a function of $X$ but not of unobservables. Further specializing the model to one of intercept differences (i.e. $Y_1 - Y_0 = \alpha$) requires that the difference between potential outcomes is a constant. The associated regression can be written as the familiar-looking dummy variable regression model:

(3.10)  $Y = X\beta + D\alpha + U$, where $E(U) = 0$.

The parameter $\alpha$ is easy to interpret as a standard structural parameter and the specification (3.10) looks conventional. In fact, model (3.10) dominates the conventional evaluation literature. The validity of many conventional instrumental variables methods and longitudinal estimation strategies is contingent on this specification as we document below. The conventional econometric evaluation literature focuses on $\alpha$, or more rarely, $X(\beta_1 - \beta_0)$, and the selection problem arises from the correlation between $D$ and $U$.
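The selection problem in the common coefficient model (3.10) can be illustrated numerically. In the sketch below, $X\beta$ is collapsed to a constant intercept, the true $\alpha$ is set to 1.0, and participation is made more likely for persons with low $U$; all of these are assumptions chosen for the illustration. With a single binary regressor, the least squares coefficient on $D$ equals the difference in mean outcomes between participants and nonparticipants, so the bias can be read off directly.

```python
import random

random.seed(2)

ALPHA = 1.0    # common treatment effect in (3.10); an illustrative value
N = 50000

data = []
for _ in range(N):
    u = random.gauss(0.0, 1.0)                        # unobservable U with mean zero
    # Hypothetical participation rule: persons with low U are more likely to enroll.
    d = 1 if u + random.gauss(0.0, 1.0) < 0.0 else 0
    y = 2.0 + ALPHA * d + u                           # (3.10) with X*beta reduced to an intercept
    data.append((y, d, u))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# OLS coefficient on a lone binary regressor = difference in group means of Y.
alpha_hat = mean(y for y, d, _ in data if d == 1) - mean(y for y, d, _ in data if d == 0)

# The part of that difference due to dependence between D and U.
selection_bias = mean(u for _, d, u in data if d == 1) - mean(u for _, d, u in data if d == 0)

print(round(alpha_hat, 3), round(selection_bias, 3))
```

The estimate differs from `ALPHA` by exactly the difference in the mean of $U$ between the two groups, which is the sense in which eliminating the dependence between $D$ and $U$ is the whole problem in this model.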

While familiar, the framework of (3.10) is very special. Potential outcomes $(Y_1, Y_0)$ differ only by a constant ($Y_1 - Y_0 = \alpha$). The best $Y_1$ is the best $Y_0$. All people gain or lose the same amount in going from “0” to “1”. There is no heterogeneity in gains. Even in the more general case, with $\mu_1(X)$ and $\mu_0(X)$ distinct, or $\beta_1 \neq \beta_0$ in the linear regression representation, so long as $U_1 = U_0$ among people with the same $X$, there is no heterogeneity in the outcomes moving from “0” to “1”. This assumed absence of heterogeneity in response to treatments is strong. When tested, it is almost always rejected (see Heckman, Smith and Clements, 1997 and the evidence presented below).

There is one case when $U_1 \neq U_0$, where the two parameters of interest are still equal even though there is dispersion in the gain $\Delta$. This case occurs when

(3.11)  $E(U_1 - U_0 \mid X, D = 1) = 0$.

Condition (3.11) arises when, conditional on $X$, $D$ does not explain or predict $U_1 - U_0$. This condition could arise if agents who select into state “1” from “0” either do not know or do not act on $U_1 - U_0$, or information dependent on $U_1 - U_0$, in making their decision to participate in the program. Ex post, there is heterogeneity, but ex ante it is not acted on in determining participation in the program.

When the gain does not affect individuals’ decisions to participate in the program, the error terms (the terms in braces in (3.7) and (3.9)) have conventional properties. The only bias in estimating the coefficients on $D$ in the regression models arises from the dependence between $U_0$ and $D$, just as the only source of bias in the common coefficient model is the covariance between $U$ and $D$ when $E(U \mid X) = 0$. To see this point take the expectation of the terms in braces in (3.7) and (3.9), respectively, to obtain the following:

$E(U_0 + D(U_1 - U_0) \mid X, D) = E(U_0 \mid X, D)$

and

$E(U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)] \mid X, D) = E(U_0 \mid X, D)$.


A problem that remains when condition (3.11) holds is that the $D$ component in the error terms contributes a component of variance to the model and so makes the model heteroscedastic:

$Var(U_0 + D(U_1 - U_0) \mid X, D) = Var(U_0 \mid X, D) + 2\,Cov(U_0, U_1 - U_0 \mid X, D)\,D + Var(U_1 - U_0 \mid X, D)\,D$.

The distinction between a model with $U_1 = U_0$, and one with $U_1 \neq U_0$, is fundamental to understanding modern developments in the program evaluation literature. When $U_1 = U_0$ and we condition on $X$, everyone with the same $X$ has the same treatment effect. The evaluation problem greatly simplifies and one parameter answers all of the conceptually distinct evaluation questions we have posed. “Treatment on the treated” is the same as the effect of taking a person at random and putting him/her into the program. The distributional questions (A)–(E) all have simple answers because everyone with the same $X$ has the same $\Delta$. Equation (3.10) is amenable to analysis by conventional econometric methods. Eliminating the covariance between $D$ and $U$ is the central problem in this model.

When $U_1 \neq U_0$, but (3.11) characterizes the program being evaluated, most of the familiar econometric intuition remains valid. This is the “random coefficient” model with the coefficient on $D$ “random” (from the standpoint of the observing economist), but uncorrelated with $D$. The central problem in this model is covariance between $U_0$ and $D$, and the only additional econometric problem arises in accounting for heteroscedasticity in getting the right standard errors for the coefficients. In this case, the response to treatment varies among persons with the same $X$ values. The mean effect of treatment on the treated and the effect of treatment on a randomly chosen person are the same.
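A small simulation can make the random coefficient case concrete. The sketch below is illustrative only: the mean gain of 2, the error variances, the 50 percent participation rate, and the function name `simulate_random_coefficient` are assumptions chosen for the example, not values from the chapter. Participation is drawn independently of both $U_0$ and $U_1 - U_0$, so (3.11) holds and $U_0$ is uncorrelated with $D$. The mean difference then recovers the mean gain, while the error variance differs between the two groups, as in the variance expression above.

```python
import random
import statistics


def simulate_random_coefficient(n=20000, seed=0):
    """Draw outcomes Y = U0 + D * (alpha + (U1 - U0)) with gains unrelated to D."""
    rng = random.Random(seed)
    alpha = 2.0  # hypothetical mean gain from treatment
    treated, untreated = [], []
    for _ in range(n):
        d = rng.random() < 0.5          # participation ignores the gain, so (3.11) holds
        u0 = rng.gauss(0.0, 1.0)        # no-treatment error, independent of D here
        gain_dev = rng.gauss(0.0, 2.0)  # U1 - U0, mean zero whatever D is
        if d:
            treated.append(u0 + alpha + gain_dev)
        else:
            untreated.append(u0)
    return treated, untreated


treated, untreated = simulate_random_coefficient()
mean_difference = statistics.fmean(treated) - statistics.fmean(untreated)
var_treated = statistics.variance(treated)      # close to Var(U0) + Var(U1 - U0) = 5
var_untreated = statistics.variance(untreated)  # close to Var(U0) = 1
print(mean_difference, var_treated, var_untreated)
```

In this design the point estimate is not the problem; the unequal variances are what call for heteroscedasticity-robust standard errors.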

In the general case when $U_1 \neq U_0$ and (3.11) no longer holds, we enter a new world not covered in the traditional econometric evaluation literature. A variety of different treatment effects can be defined. Conventional econometric procedures often break down or require substantial modification. The error term for the model (3.5) has a non-zero mean.⁷ Both error terms are heteroscedastic. The distinctions among these three models are fundamental to this chapter and the entire literature on program evaluation: (a) the coefficient on $D$ is fixed (given $X$) for everyone; (b) the coefficient on $D$ is variable (given $X$), but does not help determine program participation; and (c) the coefficient on $D$ is variable (given $X$) and does help determine program participation.

⁷ $E[U_0 + D(U_1 - U_0) \mid X] = E(U_1 - U_0 \mid X, D = 1)\Pr(D = 1 \mid X) \neq 0.$


3.4 Is Treatment on the Treated an Interesting Economic Parameter?

What economic question does parameter (3.2) answer? How does it relate to the conventional parameter of interest in cost-benefit analysis, the effect of a program on GDP? In order to relate the parameter (3.2) to the parameters needed to perform traditional cost-benefit analysis, it is fruitful to consider a more general framework. Following our previous discussion, we consider two discrete states or sectors corresponding to direct participation and nonparticipation and a vector of policy variables $\varphi$ that affect the outcomes in both states and the allocation of persons to states or sectors. The policy variables may be discrete or continuous. Our framework departs from the conventional treatment effect literature and allows for general equilibrium effects.

Assuming that costless lump-sum transfers are possible, that a single social welfare function governs the distribution of resources, and that prices reflect true opportunity costs, traditional cost-benefit analysis (see, e.g., Harberger, 1971) seeks to determine the impact of programs on the total output of society. Efficiency becomes the paramount criterion in this framework, with the distributional aspects of policies assumed to be taken care of by lump-sum transfers and taxes engineered by an enlightened social planner. In this framework, impacts on total output are the only objects of interest in evaluating programs. The distribution of program impacts is assumed to be irrelevant. This framework is favorable to the use of mean outcomes to evaluate social programs.

Within the context of the simple framework discussed in Section 3.1, let $Y_1$ and $Y_0$ be individual output which trades at a constant relative price of “1” set externally and not affected by the decisions of the agents we analyze. Alternatively, assume that the policies we consider do not alter relative prices. Let $\varphi$ be a vector of policy variables which operate on all persons. These generate indirect effects. $c(\varphi)$ is the social cost of $\varphi$ denominated in “0” units. We assume that $c(0) = 0$ and that $c$ is convex and increasing in $\varphi$. Let $N_1(\varphi)$ be the number of persons in state “1” and $N_0(\varphi)$ be the number of persons in state “0”. The total output of society is

$$N_1(\varphi)E(Y_1 \mid D = 1, \varphi) + N_0(\varphi)E(Y_0 \mid D = 0, \varphi) - c(\varphi),$$

where $N_1(\varphi) + N_0(\varphi) = \bar N$ is the total number of persons in society. For simplicity, we assume that all persons have the same person-specific characteristics $X$. Vector $\varphi$ is general enough to include financial incentive variables for participation in the program as well as mandates that assign persons to a particular state. A policy may benefit some and harm others.


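Assume for convenience that the treatment choice and mean outcome functions are differentiable and, for the sake of argument, further assume that $\varphi$ is a scalar. Then the change in output in response to a marginal increase in $\varphi$ from any given position is:

$$\Delta(\varphi) = \frac{\partial N_1(\varphi)}{\partial \varphi}\,[E(Y_1 \mid D = 1, \varphi) - E(Y_0 \mid D = 0, \varphi)] + N_1(\varphi)\left[\frac{\partial E(Y_1 \mid D = 1, \varphi)}{\partial \varphi}\right] + N_0(\varphi)\left[\frac{\partial E(Y_0 \mid D = 0, \varphi)}{\partial \varphi}\right] - \frac{\partial c(\varphi)}{\partial \varphi}. \tag{3.12}$$

The first term arises from the transfer of persons across sectors that is induced by the policy change; it uses $\partial N_0(\varphi)/\partial \varphi = -\partial N_1(\varphi)/\partial \varphi$, which follows from $N_1(\varphi) + N_0(\varphi) = \bar N$. The second term arises from changes in output within each sector induced by the policy change. The third term is the marginal social cost of the change.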

In principle, this measure could be estimated from time-series data on the change in aggregate GDP occurring after the program parameter $\varphi$ is varied. Assuming a well-defined social welfare function and making the additional assumption that prices are constant at initial values, an increase in GDP evaluated at base period prices raises social welfare provided that feasible bundles can be constructed from the output after the social program parameter is varied so that all losers can be compensated. (See, e.g., Laffont, 1989, p. 155, or the comprehensive discussion in Chipman and Moore, 1976.)

If marginal policy changes have no effect on intra-sector mean output, the bracketed elements in the second set of terms are zero. In this case, the parameters of interest for evaluating the impact of the policy change on GDP are:

(i) $\partial N_1(\varphi)/\partial \varphi$, the number of people entering or leaving state 1.
(ii) $E(Y_1 \mid D = 1, \varphi) - E(Y_0 \mid D = 0, \varphi)$, the mean output difference between sectors.
(iii) $\partial c(\varphi)/\partial \varphi$, the social marginal cost of the policy.

It is revealing that nowhere on this list are the parameters that receive the most attention in the econometric policy evaluation literature. (See, e.g., Heckman and Robb, 1985a.) These are “the effect of treatment on the treated”:

(a) $E(Y_1 - Y_0 \mid D = 1, \varphi)$
or
(b) $E(Y_1 \mid \varphi = \bar\varphi) - E(Y_0 \mid \varphi = 0)$, where $\varphi = \bar\varphi$ sets $N_1(\bar\varphi) = \bar N$. This is the effect of universal coverage for the program.


Parameter (ii) can be estimated by taking simple mean differences between the outputs in the two sectors; no adjustment for selection bias is required. Parameter (i) can be obtained from knowledge of the net movement of persons across sectors in response to the policy change, something usually neglected in micro policy evaluation (for exceptions, see Moffitt, 1992, or Heckman, 1992). Parameter (iii) can be obtained from cost data. Full social marginal costs should be included in the computation of this term. The typical micro evaluation neglects all three terms. Costs are rarely collected and gross outcomes are typically reported; entry effects are neglected and term (ii) is usually “adjusted” to avoid selection bias when, in fact, no adjustment is needed to estimate the impact of the program on GDP.

It is informative to place additional structure on this model. This leads to a representation of a criterion that is widely used in the literature on microeconomic program evaluation and also establishes a link with the models of program participation used in the later sections of this chapter. Assume a binary choice random utility framework. Suppose that agents make choices based on net utility and that policies affect participant utility through an additively-separable term $k(\varphi)$ that is assumed scalar and differentiable. Net utility is

$$U = X + k(\varphi)$$

where $k$ is monotonic in $\varphi$ and where the joint distributions of $(Y_1, X)$ and $(Y_0, X)$ are $F(y_1, x)$ and $F(y_0, x)$, respectively. The underlying variables are assumed to be continuously distributed. In the special case of the Roy model of self-selection (see Heckman and Honoré, 1990, for one discussion), $X = Y_1 - Y_0$,

$$D = 1(U \geq 0) = 1(X \geq -k(\varphi)),$$

where “1” is the indicator function ($1(Z > 0) = 1$ if $Z > 0$; $= 0$ otherwise),

$$N_1(\varphi) = \bar N \Pr(U \geq 0) = \bar N \int_{-k(\varphi)}^{\infty} f(x)\,dx,$$

and

$$N_0(\varphi) = \bar N \Pr(U < 0) = \bar N \int_{-\infty}^{-k(\varphi)} f(x)\,dx.$$

Total output is

$$\bar N \int_{-\infty}^{\infty} y_1 \int_{-k(\varphi)}^{\infty} f(y_1, x \mid \varphi)\,dx\,dy_1 + \bar N \int_{-\infty}^{\infty} y_0 \int_{-\infty}^{-k(\varphi)} f(y_0, x \mid \varphi)\,dx\,dy_0 - c(\varphi).$$


Under standard conditions (see, e.g., Royden, 1968), we may differentiate this expression to obtain the following expression for the marginal change in output with respect to a change in $\varphi$:

$$\Delta(\varphi) = \bar N k'(\varphi) f_x(-k(\varphi))\,[E(Y_1 \mid D = 1, x = -k(\varphi), \varphi) - E(Y_0 \mid D = 0, x = -k(\varphi), \varphi)]$$
$$\qquad + \bar N \left[\int_{-\infty}^{\infty} y_1 \int_{-k(\varphi)}^{\infty} \frac{\partial f(y_1, x \mid \varphi)}{\partial \varphi}\,dx\,dy_1 + \int_{-\infty}^{\infty} y_0 \int_{-\infty}^{-k(\varphi)} \frac{\partial f(y_0, x \mid \varphi)}{\partial \varphi}\,dx\,dy_0\right] - \frac{\partial c(\varphi)}{\partial \varphi}. \tag{3.13}$$

This model has a well-defined margin, $X = -k(\varphi)$, which is the utility of the marginal entrant into the program. The utility of the participant might be distinguished from the objective of the social planner who seeks to maximize total output. The first set of terms corresponds to the gain arising from the movement of persons at the margin (the term in brackets) weighted by the proportion of the population at the margin, $k'(\varphi) f_x(-k(\varphi))$, times the number of people in the population. This term is the net gain from switching sectors. The expression in brackets in the first term is a limit form of the “local average treatment effect” of Imbens and Angrist (1994), which we discuss further in our discussion of instrumental variables in Section 7.4.3. The second set of terms is the intrasector change in output resulting from a policy change. This includes both direct and indirect effects. The second set of terms is ignored in most evaluation studies. It describes how people who do not switch sectors are affected by the policy. The third term is the direct marginal social cost of the policy change. It includes the cost of administering the program plus the opportunity cost of consumption foregone to raise the taxes used to finance the program. Below we demonstrate the empirical importance of accounting for the full social costs of programs.
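The first and third terms of (3.13) can be checked numerically in a special case. The sketch below is only an illustration under assumptions that are not in the chapter: $X = Y_1 - Y_0$ is normal with hypothetical mean 0.5 and standard deviation 1, $k(\varphi) = \varphi$, $c(\varphi) = 0.1\varphi^2$, $\bar N = 1$, and the distributions do not depend on $\varphi$, so the second set of terms in (3.13) is zero. In the Roy model the gross gain of the marginal entrant is $E(Y_1 - Y_0 \mid X = -k(\varphi)) = -k(\varphi)$, so the analytic marginal change is $\bar N k'(\varphi) f_x(-k(\varphi))(-k(\varphi)) - c'(\varphi)$. The code compares this with a finite-difference derivative of total output.

```python
import math

MU, SIGMA = 0.5, 1.0   # hypothetical mean and s.d. of X = Y1 - Y0
N_BAR = 1.0            # population normalized to one (assumption)
MEAN_Y0 = 10.0         # hypothetical mean no-treatment output


def k(phi):
    return phi                 # hypothetical policy term in net utility


def k_prime(phi):
    return 1.0


def cost(phi):
    return 0.1 * phi ** 2      # hypothetical convex social cost


def cost_prime(phi):
    return 0.2 * phi


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def density_x(x):
    return normal_pdf((x - MU) / SIGMA) / SIGMA


def total_output(phi):
    """N_BAR * [E(Y0) + E(X * 1(X >= -k(phi)))] - c(phi), using Y1 = Y0 + X."""
    z = (-k(phi) - MU) / SIGMA
    gain_of_participants = MU * (1.0 - normal_cdf(z)) + SIGMA * normal_pdf(z)
    return N_BAR * (MEAN_Y0 + gain_of_participants) - cost(phi)


def marginal_output(phi):
    """First and third terms of (3.13); the marginal entrant's gain is -k(phi)."""
    margin = -k(phi)
    return N_BAR * k_prime(phi) * density_x(margin) * margin - cost_prime(phi)


phi, step = 0.3, 1e-5
numeric = (total_output(phi + step) - total_output(phi - step)) / (2.0 * step)
analytic = marginal_output(phi)
print(numeric, analytic)
```

With a positive $k(\varphi)$ the marginal entrant has a negative gross gain, so in this example expanding the program lowers output even before costs are counted.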

At an optimum, $\Delta(\varphi) = 0$, provided standard second order conditions are satisfied. Marginal benefit should equal the marginal cost. We can use either a cost-based measure of marginal benefit or a benefit-based measure of cost to evaluate the marginal gains or marginal costs of the program, respectively.

Observe that the local average treatment effect is simply the effect of treatment on the treated for persons at the margin ($X = -k(\varphi)$):

$$E(Y_1 \mid D = 1, X = -k(\varphi), \varphi) - E(Y_0 \mid D = 0, X = -k(\varphi), \varphi) = E(Y_1 - Y_0 \mid D = 1, X = -k(\varphi), \varphi). \tag{3.14}$$

This expression is obvious once it is recognized that the set $X = -k(\varphi)$ is the indifference set. Persons in that set are indifferent between participating in the program and not participating. The Imbens and Angrist (1994) parameter is a marginal version of the “treatment on the treated” evaluation parameter for gross outcomes. This parameter is one of the ingredients required to produce an evaluation of the impact of a marginal change in the social program on total output, but it ignores costs and the effect of a change in the program on the outcomes of persons who do not switch sectors.⁸

The conventional evaluation parameter,

$$E(Y_1 - Y_0 \mid D = 1, x, \varphi),$$

does not incorporate costs, does not correspond to a marginal change, and includes rents accruing to persons. This parameter is in general inappropriate for evaluating the effect of a policy change on GDP. However, under certain conditions which we now specify, this parameter is informative about the gross gain accruing to the economy from the existence of a program at level $\tilde\varphi$ compared to the alternative of shutting it down. This is the information required for an “all or nothing” evaluation of a program.

The appropriate criterion for an all or nothing evaluation of a policy at level $\varphi = \tilde\varphi$ is

$$A(\tilde\varphi) = \{N_1(\tilde\varphi)E(Y_1 \mid D = 1, \varphi = \tilde\varphi) + N_0(\tilde\varphi)E(Y_0 \mid D = 0, \varphi = \tilde\varphi) - c(\tilde\varphi)\} - \{N_1(0)E(Y_1 \mid D = 1, \varphi = 0) + N_0(0)E(Y_0 \mid D = 0, \varphi = 0)\},$$

where $\varphi = 0$ corresponds to the case where there is no program, so that $N_1(0) = 0$ and $N_0(0) = \bar N$. If $A(\tilde\varphi) > 0$, total output is increased by establishing the program at level $\tilde\varphi$.

In the special case where the outcome in the benchmark state “0” is the same whether or not the program exists,

$$E(Y_0 \mid D = 0, \varphi = \tilde\varphi) = E(Y_0 \mid D = 0, \varphi = 0). \tag{3.15}$$

This condition defines the absence of general equilibrium effects in the base state, so the no program state for nonparticipants is the same as the nonparticipation state. Assumption (3.15) is what enables analysts to generalize from partial equilibrium to general equilibrium settings. Recalling that $\bar N = N_1(\varphi) + N_0(\varphi)$, when (3.15) holds we have⁹

$$A(\tilde\varphi) = N_1(\tilde\varphi)E(Y_1 - Y_0 \mid D = 1, \varphi = \tilde\varphi) - c(\tilde\varphi). \tag{3.16}$$

Given costless redistribution of the benefits, the output-maximizing solution for $\varphi$ also maximizes social welfare. For this important case, which is applicable to small-scale social programs with partial participation, the measure “treatment on the treated” which we focus on in this chapter is justified. For evaluating the effect of marginal variation or “fine-tuning” of existing policies, measure $\Delta(\varphi)$ is more appropriate.¹⁰
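A short numerical check shows how the all or nothing criterion collapses to (3.16). This is a sketch under a stated assumption: every person's no-treatment output is the same with and without the program, which is how condition (3.15) is read here. The function name and all of the numbers (200 participants, 800 nonparticipants, the group means, and a cost of 300) are hypothetical.

```python
def all_or_nothing_gain(n1, n0, mean_y1_part, mean_y0_part, mean_y0_nonpart, cost):
    """Return A computed from its definition and from the shortcut (3.16)."""
    # Output with the program: participants produce Y1, nonparticipants Y0.
    with_program = n1 * mean_y1_part + n0 * mean_y0_nonpart - cost
    # Output without the program: everyone is in state "0" with unchanged Y0.
    without_program = n1 * mean_y0_part + n0 * mean_y0_nonpart
    from_definition = with_program - without_program
    # Shortcut (3.16): participants times treatment on the treated, minus cost.
    from_shortcut = n1 * (mean_y1_part - mean_y0_part) - cost
    return from_definition, from_shortcut


from_definition, from_shortcut = all_or_nothing_gain(
    n1=200, n0=800,
    mean_y1_part=15.0,     # E(Y1 | D = 1), hypothetical
    mean_y0_part=12.0,     # E(Y0 | D = 1), hypothetical
    mean_y0_nonpart=10.0,  # E(Y0 | D = 0), hypothetical
    cost=300.0,
)
print(from_definition, from_shortcut)
```

The nonparticipants' output cancels between the two regimes, which is why only the participants' gain and the program cost remain.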

⁸ Heckman and Smith (1998a) and Heckman (1997) present comprehensive discussions of the Imbens and Angrist (1994) parameter. We discuss this parameter further in Section 7. One important difference between their parameter and the traditional treatment on the treated parameter is that the latter excludes variables like $\varphi$ from the conditioning set, but the Imbens-Angrist parameter includes it.

⁹ Condition (3.15) is stronger than what is required to justify (3.16). The condition only has to hold for the subset of the population ($N_0(\varphi)$ in number) who would not participate in the presence of the program.

¹⁰ Björklund and Moffitt (1987) estimate both the marginal gross gain and the average gross gain from participating in a program. However, they do not present estimates of marginal or average costs.


4 Prototypical Solutions to the Evaluation Problem

An evaluation entails making some comparison between “treated” and “untreated” persons. This section considers three widely-used comparisons for estimating the impact of treatment on the treated: $E(Y_1 - Y_0 \mid X, D = 1)$. All use some form of comparison to construct the required counterfactual $E(Y_0 \mid X, D = 1)$. Data on $E(Y_1 \mid X, D = 1)$ are available from program participants. A person who has participated in a program is paired with an “otherwise comparable” person or set of persons who have not participated in it. The set may contain just one person. In most applications of the method, the paired partner is not literally assumed to be a replica of the treated person in the untreated state, although some panel data evaluation estimators make such an assumption. Thus, in general, $\Delta = Y_1 - Y_0$ is not estimated exactly. Instead, the outcome of the paired partners is treated as a proxy for $Y_0$ for the treated individual, and the population mean difference between treated and untreated persons is estimated by averaging over all pairs. The method can be applied symmetrically to nonparticipants to estimate what they would have earned if they had participated. For that problem the challenge is to find $E(Y_1 \mid X, D = 0)$, since the data on nonparticipants enable one to identify $E(Y_0 \mid X, D = 0)$.

A major difficulty with the application of this method is providing some objective way of demonstrating that a candidate partner or set of partners is “otherwise comparable.” Many econometric and statistical methods are available for adjusting differences between persons receiving treatment and potential matching partners, which we discuss in Section 7.

4.1 The Before-After Estimator

In the empirical literature on program evaluation, the most commonly-used evaluation strategy compares a person with himself/herself. This is a comparison strategy based on longitudinal data. It exploits the intuitively-appealing idea that persons can be in both states at different times, and that outcomes measured in one state at one time are good proxies for outcomes in the same state at other times, at least for the no-treatment state. This gives rise to the motivation for the simple “before-after” estimator, which is still widely used. Its econometric descendant is the fixed effect estimator without a comparison group.

The method assumes that there is access either (i) to longitudinal data on outcomes measured before and after a program for a person who participates in it, or (ii) to repeated cross section data from the same population where at least one cross section is from a period prior to the program. To incorporate time into our analysis, we introduce “t” subscripts. Let $Y_{1t}$ be the post-program earnings of a person who participates in the program. When longitudinal data are available, $Y_{0t'}$ is the pre-program outcome of the person. For simplicity, assume that program participation occurs only at time period $k$, where $t > k > t'$. The “before-after” estimator uses pre-program earnings $Y_{0t'}$ to proxy the no-treatment state in the post-program period. In other words, the underlying identifying assumption is

$$E(Y_{0t} - Y_{0t'} \mid D = 1) = 0. \tag{4.A.1}$$

If this assumption is valid, the “before-after” estimator is given by

$$(\bar Y_{1t} - \bar Y_{0t'})_1, \tag{4.1}$$

where the subscript “1” denotes conditioning on $D = 1$, and the bar denotes sample means.

To see how this estimator works, observe that for each individual the gain from the program may be written as

$$Y_{1t} - Y_{0t} = (Y_{1t} - Y_{0t'}) + (Y_{0t'} - Y_{0t}).$$

The second term $(Y_{0t'} - Y_{0t})$ is the approximation error. If this term averages out to zero, we may estimate the impact of participation on those who participate in a program by subtracting participants’ mean pre-program earnings from the mean of their post-program earnings. These means also may be defined for different values of participants’ characteristics, $X$.

The before-after estimator does not literally require longitudinal data to identify the means (Heckman and Robb, 1985a,b). As long as the approximation error averages out, repeated cross-sectional data that sample the same population over time, but not necessarily the same persons, are sufficient to construct a before-after estimate. An advantage of this approach is that it only requires information on the participants and their pre-participation histories to evaluate the program.

The major drawback to this estimator is its reliance on the assumption that the approximation errors average out. This assumption requires that among participants, the mean outcome in the no-treatment state is the same in $t$ and $t'$. Changes in the overall state of the economy between $t$ and $t'$, or changes in the life cycle position of a cohort of participants, can violate this assumption.

A good example of a case in which assumption (4.A.1) is likely violated is provided in the work of Ashenfelter (1978). Ashenfelter observed that prior to enrollment in a training program, participants experience a decline in their earnings. Later research demonstrates that Ashenfelter’s “dip” is a common feature of the pre-program earnings of participants in government training programs. See Figures 4.1 to 4.6, which show the dip for a variety of programs in different countries. If this decline in earnings is transitory, and earnings is a mean-reverting process so that the dip is eventually restored even in the absence of participation in the program, and if period $t'$ falls in the period of transitorily low earnings, then the approximation error will not average out. In this example, the before-after estimator overstates the average effect of training on the trained and attributes mean reversion that would occur in any event to the effect of the program. On the other hand, if the decline is permanent, the before-after estimator is unbiased for the parameter of interest. In this case, any improvement in earnings is properly attributable to the program. Another potential defect of this estimator is that it attributes to the program any trend in earnings due to macro or lifecycle factors.
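The effect of Ashenfelter's dip on the before-after estimator can be illustrated with a short simulation. It is a sketch under assumed values that do not come from the chapter: a true gain of 1, a dip of 2 in pre-program earnings, normal earnings noise, and the hypothetical function name `before_after`. When the dip is transitory, the estimator returns roughly the gain plus the dip; when the decline is permanent, it returns roughly the gain.

```python
import random
import statistics


def before_after(permanent_dip, n=20000, seed=1):
    """Before-after estimate (4.1) when the pre-program period t' falls in a dip."""
    rng = random.Random(seed)
    true_gain, dip = 1.0, 2.0                # hypothetical magnitudes
    pre, post = [], []
    for _ in range(n):
        base = rng.gauss(10.0, 1.0)          # person-specific earnings level
        y0_pre = base - dip + rng.gauss(0.0, 0.5)   # Y0 at t', measured in the dip
        # Y0 at t: earnings recover unless the decline is permanent.
        y0_post = base - (dip if permanent_dip else 0.0) + rng.gauss(0.0, 0.5)
        pre.append(y0_pre)
        post.append(y0_post + true_gain)     # observed Y1 at t for participants
    return statistics.fmean(post) - statistics.fmean(pre)


estimate_transitory = before_after(permanent_dip=False)  # near gain + dip = 3
estimate_permanent = before_after(permanent_dip=True)    # near the true gain = 1
print(estimate_transitory, estimate_permanent)
```

Only participants' earnings enter the calculation, which is the appeal of the estimator and also the reason it cannot separate mean reversion from program impact.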

Two different approaches have been used to solve these problems with the before-after estimators. One controversial method generalizes the before-after estimator by making use of many periods of pre-program data and extrapolating from the period before $t'$ to generate the counterfactual state in period $t$. It assumes that $Y_{0t}$ and $Y_{0t'}$ can be adjusted to equality using data on the same person, or the same populations of persons, followed over time. As an example, suppose that $Y_{0t}$ is a function of $t$, or is a function of $t$-dated variables. If we have access to enough data on pre-program outcomes prior to date $t'$ to extrapolate post-program outcomes $Y_{0t}$, and if there are no errors of extrapolation, or if it is safe to assume that such errors average out to zero across persons in period $t$, one can replace the missing data, or at least averages of the missing data, using extrapolated values. This method is appropriate if population mean outcomes evolve as deterministic functions of time or macroeconomic variables like unemployment. This procedure is discussed further in Section 7.5.¹¹ The second approach is based on the before-after estimator which we discuss next.

4.2 The Difference-in-Differences Estimator

A more widely used approach to the evaluation problem assumes access either (i) to longitudinal data or (ii) to repeated cross-section data on nonparticipants in periods $t$ and $t'$. If the mean change in the no-program outcome measures is the same for participants and nonparticipants, i.e., if the following assumption is valid:

$$E(Y_{0t} - Y_{0t'} \mid D = 1) = E(Y_{0t} - Y_{0t'} \mid D = 0), \tag{4.A.2}$$

then the difference-in-differences estimator given by

$$(\bar Y_{1t} - \bar Y_{0t'})_1 - (\bar Y_{0t} - \bar Y_{0t'})_0, \quad t > k > t', \tag{4.2}$$

is valid for $E(\Delta_t \mid D = 1) = E(Y_{1t} - Y_{0t} \mid D = 1)$, where $\Delta_t = Y_{1t} - Y_{0t}$, because¹²

$$E[(\bar Y_{1t} - \bar Y_{0t'})_1 - (\bar Y_{0t} - \bar Y_{0t'})_0] = E(\Delta_t \mid D = 1).$$

¹¹ See also Heckman and Robb (1985a), pp. 210-215.

¹² The proof is immediate. Make the following decomposition:

$$(\bar Y_{1t} - \bar Y_{0t'})_1 = (\bar Y_{1t} - \bar Y_{0t})_1 + (\bar Y_{0t} - \bar Y_{0t'})_1.$$

If assumption (4.A.2) is valid, the change in the outcome measure in the comparison group serves to benchmark common year or age effects among participants.

Because we cannot form the change in outcomes between the treated and untreated states, the expression

$$(Y_{1t} - Y_{0t'})_1 - (Y_{0t} - Y_{0t'})_0$$

cannot be formed for anyone, although we can form one or the other of these terms for everyone. Thus, we cannot use the difference-in-differences estimator to identify the distribution of gains without making further assumptions.¹³ Like the before-after estimator, we can implement the difference-in-differences estimator for means (4.2) on repeated cross sections. It is not necessary to sample the same persons in periods $t$ and $t'$, just persons from the same populations.

Ashenfelter’s dip provides an example of a case where assumption (4.A.2) is likely to be violated. If $Y$ is earnings, and $t'$ is measured at the time of a transitory earnings dip, and if nonparticipants do not experience the dip, then (4.A.2) will be violated, because the time path of no-program earnings between $t'$ and $t$ will be different between participants and nonparticipants. In this example, the difference-in-differences estimator overstates the average impact of training on the trainee.
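The role of assumption (4.A.2) can be seen in a simple simulation of estimator (4.2). The sketch below is illustrative and rests on assumed values: a true gain of 1, a common trend of 0.5 between $t'$ and $t$, earnings levels of 8 and 10 for the two groups, and the hypothetical function name `diff_in_diff`. With no dip, the common trend differences out and the estimate is close to the gain; with a transitory dip of 2 among participants only, the estimate is too large by about the size of the dip.

```python
import random
import statistics


def diff_in_diff(dip, n=20000, seed=2):
    """Difference-in-differences estimate (4.2) from simulated earnings changes."""
    rng = random.Random(seed)
    true_gain, trend = 1.0, 0.5              # hypothetical gain and common trend
    participant_change, nonparticipant_change = [], []
    for _ in range(n):
        # Participants: pre-program earnings may sit in a transitory dip at t'.
        y0_pre = 8.0 - dip + rng.gauss(0.0, 0.5)
        y1_post = 8.0 + trend + true_gain + rng.gauss(0.0, 0.5)
        participant_change.append(y1_post - y0_pre)
        # Nonparticipants: same trend, no dip, different earnings level.
        c_pre = 10.0 + rng.gauss(0.0, 0.5)
        c_post = 10.0 + trend + rng.gauss(0.0, 0.5)
        nonparticipant_change.append(c_post - c_pre)
    return statistics.fmean(participant_change) - statistics.fmean(nonparticipant_change)


estimate_common_trend = diff_in_diff(dip=0.0)  # (4.A.2) holds: near the gain, 1
estimate_with_dip = diff_in_diff(dip=2.0)      # (4.A.2) fails: near gain + dip, 3
print(estimate_common_trend, estimate_with_dip)
```

The example uses individual changes for convenience; the same estimate could be formed from group means in repeated cross sections, since only the four means enter (4.2).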

4.3 The Cross-Section Estimator

A third estimator compares mean outcomes of participants and nonparticipants at time $t$. This estimator is sometimes called the cross-section estimator. It does not compare the same persons because, by hypothesis, a person cannot be in both states at the same time. Because of this fact, cross-section estimators cannot estimate the distribution of gains unless additional assumptions are invoked beyond those required to estimate mean impacts.

The key identifying assumption for the cross-section estimator of the mean is that

$$E(Y_{0t} \mid D = 1) = E(Y_{0t} \mid D = 0), \tag{4.A.3}$$

i.e., that on average persons who do not participate in the program have the same no-treatment outcome as those who do participate. If this assumption is valid, then the cross-section estimator is given by

$$(\bar Y_{1t})_1 - (\bar Y_{0t})_0. \tag{4.3}$$

This estimator is valid under assumption (4.A.3) because¹⁴

$$E((\bar Y_{1t})_1 - (\bar Y_{0t})_0) = E(\Delta_t \mid D = 1).$$

¹² (continued) The claim follows upon taking expectations.

¹³ One assumption that identifies the distribution of gains is to assume that $(Y_{1t} - Y_{0t})_1$ is independent of $(Y_{0t} - Y_{0t'})_1$ and that the distribution of $(Y_{0t} - Y_{0t'})_1$ is the same as the distribution of $(Y_{0t} - Y_{0t'})_0$. Then the results on deconvolution in Heckman, Smith and Clements (1997) can be applied. See their paper for details.

If persons go into the program based on outcome measures in the post-program state, then assumption (4.A.3) will be violated. The assumption would be satisfied if participation in the program is unrelated to outcomes in the no-program state in the post-program period. Thus, it is possible for Ashenfelter’s dip to characterize the data on earnings in the pre-program period, and yet for (4.A.3) to be satisfied. Moreover, as long as the macro economy and aging process operate identically on participants and nonparticipants, the cross section estimator is not vulnerable to the problems that plague the before-after estimator.
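A simulation can show when assumption (4.A.3) holds and when it fails. The sketch below is illustrative only; the true gain of 1, the normal distribution of no-treatment earnings, the enrollment rule, and the function name `cross_section` are all assumptions. When participation is unrelated to $Y_{0t}$, the mean difference is close to the gain. When persons with low no-treatment earnings in period $t$ are the ones who enroll, the estimator is biased downward and can even take the wrong sign.

```python
import random
import statistics


def cross_section(select_on_y0, n=40000, seed=3):
    """Cross-section estimate: participant mean minus nonparticipant mean at t."""
    rng = random.Random(seed)
    true_gain = 1.0                          # hypothetical common gain
    participants, nonparticipants = [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 1.0)            # no-treatment outcome in period t
        if select_on_y0:
            d = y0 < 10.0                    # low no-program earners enroll
        else:
            d = rng.random() < 0.5           # participation unrelated to Y0
        if d:
            participants.append(y0 + true_gain)
        else:
            nonparticipants.append(y0)
    return statistics.fmean(participants) - statistics.fmean(nonparticipants)


estimate_unrelated = cross_section(select_on_y0=False)  # (4.A.3) holds: near 1
estimate_selected = cross_section(select_on_y0=True)    # (4.A.3) fails: well below 1
print(estimate_unrelated, estimate_selected)
```

In the selected case the gap in no-treatment outcomes between the two groups is larger than the gain itself, so the program appears harmful even though every participant benefits.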

The cross section estimator (4.3), the difference-in-differences estimator (4.2), and the before-after estimator (4.1) comprise the trilogy of conventional non-experimental evaluation estimators. All of these estimators can be defined conditional on observable characteristics $X$. Conditioning on $X$ or additional “instrumental” variables makes it more likely that modified versions of assumptions (4.A.3), (4.A.2), or (4.A.1) will be satisfied, but this is not guaranteed. If, for example, the distribution of $X$ characteristics is different between participants ($D = 1$) and nonparticipants ($D = 0$), conditioning on $X$ may eliminate systematic differences in outcomes between the two groups. Using modern nonparametric procedures, it is possible to exploit each of the identifying conditions to estimate nonparametric versions of all three estimators. On the other hand, if the difference between participants and nonparticipants is due to unobservables, conditioning may accentuate, and not eliminate, differences between participants and nonparticipants in the no-program state.¹⁵

The three estimators exploit three different principles but all are based on making some comparison. The assumptions that justify one method will not, in general, justify any of the other methods. All of the estimators considered in this chapter exploit one of these three principles. They extend the simple mean differences just discussed by making a variety of adjustments to the means. Throughout the rest of the chapter, we organize our discussion of alternative estimators by discussing how they modify the simple mean differences used in the three intuitive estimators to account for nonstationary environments and different regressors in the different comparison groups. We first consider social experimentation and how it constructs the counterfactuals used in policy evaluations.

¹⁴ Proof: write

$$(\bar Y_{1t})_1 - (\bar Y_{0t})_0 = (\bar Y_{1t})_1 - (\bar Y_{0t})_1 + (\bar Y_{0t})_1 - (\bar Y_{0t})_0$$

and take expectations invoking assumption (4.A.3).

¹⁵ Thus if $|E(Y_0 \mid D = 1) - E(Y_0 \mid D = 0)| = M$, there is no guarantee that $|E(Y_0 \mid D = 1, X) - E(Y_0 \mid D = 0, X)| < M$. For some values of $X$, the gap could widen.


5 Social Experiments

Randomization is one solution to the evaluation problem. Recent years have witnessed increasing use of experimental designs to evaluate North American employment and training programs. This approach has been less common in Europe, though a small number of experiments have been conducted in Britain, Norway and Sweden. When the appropriate qualifications are omitted, the impact estimates from these social experiments are easy for analysts to calculate and for policymakers to understand (see, e.g., Burtless, 1995). As a result of its apparent simplicity, evidence from social experiments has had an important impact on the design of U.S. welfare and training programs.¹⁶ Because of the importance of experimental designs in this literature, in this section we show how they solve the evaluation problem, describe how they have been implemented in practice, and discuss their advantages and limitations.

5.1 How Social Experiments Solve the Evaluation Problem

An important lesson of this section is that social experiments, like other evaluation methods, provide estimates of the parameters of interest only under certain behavioral and statistical assumptions. To see this, let “*” denote outcomes in the presence of random assignment. Thus, conditional on $X$, for each person we have $(Y_1^*, Y_0^*, D^*)$ in the presence of random assignment and $(Y_1, Y_0, D)$ when the program operates normally without randomization. Let $R = 1$ if a person for whom $D^* = 1$ is randomized into the program and $R = 0$ if the person is randomized out. Thus, $R = 1$ corresponds to the experimental treatment group and $R = 0$ to the experimental control group.

The essential assumption required to use randomization to solve the evaluation problem for estimating the mean effect of treatment on the treated is that

$$E(Y_1^* - Y_0^* \mid X, D^* = 1) = E(Y_1 - Y_0 \mid X, D = 1). \tag{5.A.1}$$

A stronger set of conditions, not strictly required, are

$$E(Y_1^* \mid X, D^* = 1) = E(Y_1 \mid X, D = 1) \tag{5.A.2a}$$

and

$$E(Y_0^* \mid X, D^* = 1) = E(Y_0 \mid X, D = 1). \tag{5.A.2b}$$

Assumption (5.A.2a) states that the means from the treatment and control groups generated by random assignment produce the desired population parameter. With certain exceptions discussed below, this assumption rules out changes in the impact of participation due to the presence of random assignment as well as changes in the process of program participation. The first part of this assumption can in principle be tested by comparing the outcomes of participants under a regime of randomization with the outcomes of participants under the usual regime.

¹⁶ We discuss this evidence in Section 10.

If (5.A.2a) is true, among the population for whom $D = 1$ and $R = 1$ we can identify

$$E(Y_1 \mid X, D = 1, R = 1) = E(Y_1 \mid X, D = 1).$$

Under (5.A.2a), information sufficient to estimate this mean without bias is routinely produced from data collected on participants in social programs. The new information produced by an experiment comes from those randomized out of the program. Using the experimental control group it is possible to estimate

$$E(Y_0 \mid X, D = 1, R = 0) = E(Y_0 \mid X, D = 1).$$

Thus, experiments produce data that satisfy assumption (4.A.3). Simple application of the cross-section estimator identifies

$$E(\Delta \mid X, D = 1) = E(Y_1 - Y_0 \mid X, D = 1).$$

Within the context of the model of equation (3.10), an experiment that satisfies (5.A.1) or (5.A.2a) and (5.A.2b) does not make $D$ orthogonal to $U$. It simply equates the bias in the two groups $R = 1$ and $R = 0$. Thus in the model of equation (3.1), under (5.A.2a), $E(Y \mid X, D = 1, R = 1) = g_1(X) + E(U_1 \mid X, D = 1)$ and $E(Y \mid X, D = 1, R = 0) = g_0(X) + E(U_0 \mid X, D = 1)$.17

Rewriting the first conditional mean, we obtain

$$E(Y \mid X, D = 1, R = 1) = g_1(X) + E(U_1 - U_0 \mid X, D = 1) + E(U_0 \mid X, D = 1).$$

Subtracting the second mean from the first eliminates the common selection bias component $E(U_0 \mid X, D = 1)$, so

$$E(Y \mid X, D = 1, R = 1) - E(Y \mid X, D = 1, R = 0) = g_1(X) - g_0(X) + E(U_1 - U_0 \mid X, D = 1).$$

When the model (3.1) is specialized to one of intercept differences, as in (3.10), this parameter simplifies to $\alpha$. Notice that the method of social experiments does not set either $E(U_1 \mid X, D = 1)$ or $E(U_0 \mid X, D = 1)$ equal to zero. Rather, it balances the selection bias in the treatment and control groups.
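The balancing argument can be illustrated with a small simulation. The sketch below is not from the chapter: it assumes a common effect $\alpha = 1$, sets $g_0(X) = 0$, and uses a hypothetical selection rule in which persons with low $U_0$ are more likely to apply. Under those assumptions the experimental mean difference should be close to $\alpha$ even though $E(U_0 \mid D = 1)$ is far from zero, while the comparison with nonparticipants is contaminated by selection.

```python
import random
from statistics import mean


def simulate_experiment(n=100_000, alpha=1.0, seed=0):
    """Return (experimental, cross_section, bias_d1) from simulated data."""
    rng = random.Random(seed)
    treated, controls, nonparticipants = [], [], []
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)          # unobservable U0, with g0(X) = 0
        # Hypothetical selection rule (an assumption): low-U0 persons apply.
        d = u0 + rng.gauss(0.0, 1.0) < 0.0
        if not d:
            nonparticipants.append(u0)    # D = 0: we observe Y0
        elif rng.random() < 0.5:          # D = 1 and randomized in (R = 1)
            treated.append(alpha + u0)    # Y1 = alpha + U0 (common effect)
        else:                             # D = 1 and randomized out (R = 0)
            controls.append(u0)           # Y0 = U0
    experimental = mean(treated) - mean(controls)
    cross_section = mean(treated) - mean(nonparticipants)
    bias_d1 = mean(controls)              # sample analog of E(U0 | D = 1)
    return experimental, cross_section, bias_d1


experimental, cross_section, bias_d1 = simulate_experiment()
print(f"experimental mean difference: {experimental:.3f}")
print(f"comparison with nonparticipants: {cross_section:.3f}")
print(f"estimated E(U0 | D = 1): {bias_d1:.3f}")
```

The selection term appears in both experimental groups and cancels in the difference; it is never set to zero.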

17 Notice that in this section we allow for the more general model $Y_0 = g_0(X) + U_0$, $Y_1 = g_1(X) + U_1$, where $E(U_0 \mid X) \neq 0$ and $E(U_1 \mid X) \neq 0$.


Stronger assumptions must be made to identify the distribution of impacts $F(\Delta \mid D = 1)$.18 Without invoking further assumptions, data from experiments, like data from non-experimental sources, are unable to identify the distribution of impacts because the same person is not observed in both states at the same time (Heckman, 1992; Heckman, Smith and Clements, 1997; Heckman and Smith, 1993, 1995, 1998a).

If assumption (5.A.1) or assumptions (5.A.2a) and (5.A.2b) fail to hold because the program participation probabilities are affected, so that $D^*$ and $D$ are different, then the composition of the participant population differs in the presence of random assignment. In two important special cases, experimental data still provide unbiased estimates of the effect of treatment on the treated. First, if the effect of training is the same for everyone, changing the composition of the participants has no effect because the parameter of interest is the same for all possible participant populations (Heckman, 1992). This assumption is sometimes called the common treatment effect assumption and, letting the subscript $i$ denote the value of a variable for individual $i$, may be formally expressed as

$$Y_{1i} - Y_{0i} = \Delta_i \equiv \Delta \quad \text{for all } i. \tag{5.A.3}$$

This assumption is equivalent to setting $U_1 = U_0$ in (3.9). Assumption (5.A.3) can be defined conditionally on observed characteristics, so we may write $\Delta = \Delta(X)$. Notice, however, that in this case, if randomization induces persons with certain $X$ values not to participate in the program, then estimates of $\Delta(X)$ can only be obtained for values of $X$ possessed by persons who participate in the program. In this case (5.A.1) is satisfied but (5.A.2a) and (5.A.2b) are not.

The second special case where experimental data still provide unbiased estimates of the effect of treatment on the treated arises when decisions about training are not affected by the realized gain from participating in the program. This case could arise if potential trainees know $E(\Delta \mid X)$ but not $\Delta$ at the time participation decisions are made. Formally, the second condition is

$$E(\Delta \mid X, D = 1) = E(\Delta \mid X), \tag{5.A.4}$$

which is equivalent to condition (3.11) in the model (3.9). If either (5.A.3) or (5.A.4) holds, the simple experimental mean difference estimator is unbiased for $E(\Delta \mid X, D = 1)$.

Randomization improves on the non-experimental cross-section estimator even if there is no selection bias. In an experiment, for all values of $X$ for which $D = 1$, one can identify

$$E(\Delta \mid X, D = 1) = E(Y_1 - Y_0 \mid X, D = 1).$$

Using assumption (4.A.3) in an ordinary nonexperimental evaluation, there may be values of $X$ such that $\Pr(D = 1 \mid X) = 1$; that is, there may be values of $X$ with no comparison group members. Randomization avoids this difficulty by balancing the distribution of $X$ values in the treatment and control groups (Heckman, 1996). At the same time, however, random assignment conditional on $D = 1$ cannot provide estimates of $\Delta(X)$ for values of $X$ such that $\Pr(D = 1 \mid X) = 0$.

18 Replace “E” with “F” in (5.A.2a) and (5.A.2b) to obtain one necessary condition.

The stage of potential program participation at which randomization is applied (eligibility, application, or acceptance into a program) determines what can be learned from a social experiment. For randomization conditional on acceptance into a program ($D = 1$), we can estimate the effect of treatment on the treated,

$$E(\Delta \mid X, D = 1) = E(Y_1 - Y_0 \mid X, D = 1),$$

using simple experimental means. We cannot estimate the effect of randomly selecting a person to go into the program,

$$E(\Delta \mid X) = E(Y_1 - Y_0 \mid X),$$

by using simple experimental means unless one of two conditions prevails. The first condition is just the common effect assumption (5.A.3). This assumption is explicit in the widely-used dummy endogenous variable model (Heckman, 1978). The second condition is that embodied in assumption (5.A.4), that participation decisions are independent of the person-specific component of the impact. In both cases, the mean impact of treatment on a randomly selected person is the same as the mean impact of treatment on the treated.

In the general case, it is difficult to estimate the effect of randomly assigning a person with characteristics $X$ to go into a program. This is because persons randomized into a program cannot be compelled to participate in it. In order to secure compliance, it may be necessary to compensate or persuade persons to participate. For example, in many U.S. social experiments, program operators threaten to reduce participants’ social assistance benefits if they refuse to participate in training. Such actions, even if successful, alter the environment in which persons operate and may make it impossible to estimate $E(\Delta \mid X)$ using experimental means. One assumption that guarantees compliance is the existence of a “compensation” or “punishment” level $c$ such that

$$\Pr(D = 1 \mid X, c) = 1 \tag{5.A.5a}$$

and

$$E(\Delta \mid X, c) = E(\Delta \mid X). \tag{5.A.5b}$$

The first part of the assumption guarantees that a person with characteristics $X$ can be “bribed” or “persuaded” to participate in the program. The second part of the assumption guarantees that compensation $c$ does not affect the outcome being evaluated.19 If $c$ is a monetary payment, it would be optimal from the standpoint of an experimental analyst to find the minimal value of $c$ that satisfies these conditions.

19 Observe that the value of $c$ is not necessarily unique.

Randomization of eligibility is sometimes proposed as a less disruptive alternative to randomization conditional on $D = 1$. Randomizing eligibility avoids the application and screening costs that are incurred when accepted individuals are randomized out of a program. Because the randomization is performed outside of training centers, it also avoids some of the political costs that have accompanied the use of the experimental method.

Consider a population of persons who are usually eligible for the program. Randomize eligibility within this population. Let $e = 1$ if a person retains eligibility and $e = 0$ if a person becomes ineligible. Assume that eligibility does not disturb the underlying structure of the random variables $(Y_0, Y_1, D, X)$ and that $\Pr(D = 1 \mid X) \neq 0$. Then Heckman (1996) shows that

$$\frac{E(Y \mid X, e = 1) - E(Y \mid X, e = 0)}{\Pr(D = 1 \mid X, e = 1)} = E(\Delta \mid X, D = 1).$$

Randomization of eligibility produces samples that can be used to identify $E(\Delta \mid X, D = 1)$ and also to recover $\Pr(D = 1 \mid X)$. The latter is not recovered from samples which condition on $D = 1$ (Heckman, 1992; Moffitt, 1992). Without additional assumptions of the sort previously discussed, randomization on eligibility will not, in general, identify $E(\Delta \mid X)$.
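As a rough numerical check of the eligibility result, the sketch below simulates a population with heterogeneous gains and selection on those gains, randomizes eligibility, and compares the ratio estimator with the true mean gain among participants. The distributions, the participation rule and all variable names are illustrative assumptions rather than anything taken from the chapter; conditioning on $X$ is omitted for simplicity.

```python
import random
from statistics import mean


def eligibility_experiment(n=200_000, seed=1):
    """Return (ratio_estimate, true_tt, participation_rate)."""
    rng = random.Random(seed)
    y_eligible, y_ineligible = [], []
    d_eligible, gains_of_participants = [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)               # untreated outcome
        delta = rng.gauss(1.0, 1.0)             # heterogeneous gain
        # Hypothetical rule: persons with larger gains want to participate.
        wants_program = delta + rng.gauss(0.0, 1.0) > 1.0
        if rng.random() < 0.5:                  # e = 1: retains eligibility
            d = wants_program
            y_eligible.append(y0 + delta if d else y0)
            d_eligible.append(1.0 if d else 0.0)
            if d:
                gains_of_participants.append(delta)
        else:                                   # e = 0: cannot participate
            y_ineligible.append(y0)
    participation_rate = mean(d_eligible)       # recovers Pr(D = 1)
    ratio_estimate = (mean(y_eligible) - mean(y_ineligible)) / participation_rate
    true_tt = mean(gains_of_participants)       # E(Delta | D = 1) in the sample
    return ratio_estimate, true_tt, participation_rate


ratio_estimate, true_tt, participation_rate = eligibility_experiment()
print(f"ratio estimate: {ratio_estimate:.3f}, true E(Delta | D=1): {true_tt:.3f}")
print(f"estimated Pr(D = 1): {participation_rate:.3f}")
```

Because participants are selected on their gains in this setup, the recovered parameter exceeds the population mean gain of 1.0, consistent with the point that eligibility randomization does not in general identify $E(\Delta \mid X)$.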

5.2 Intention to Treat and Substitution Bias

The objective of most experimental designs is to estimate the conditional mean impact of training, or $E(\Delta \mid X, D = 1)$. However, in many experiments a significant fraction of the treatment group drops out of the program and does not receive the services being evaluated.20 In general, in the presence of dropping out $E(\Delta \mid X, D = 1)$ cannot be identified using comparisons of means. Instead, the experimental mean difference estimates the mean effect of the offer of treatment, or what is sometimes called the “intent to treat.” For many purposes, this is the policy-relevant parameter. It is informative on how the availability of a program affects participant outcomes. Attrition is a normal feature of an ongoing program.

To obtain an estimate of the impact of training on those who actually receive it, additional assumptions are required beyond (5.A.1) or (5.A.2a) and (5.A.2b). Let $T$ be an indicator for actual receipt of treatment, with $T = 1$ for persons actually receiving training, and $T = 0$ otherwise. Let $T^*$ be a similarly defined latent variable for control group members indicating whether or not they would have actually received training, had they been in the treatment group. Define

$$E(\Delta \mid X, D = 1, R = 1, T = 1) = E(\Delta \mid X, D = 1, T = 1)$$

as the mean impact of training on those members of the treatment group who actually receive it. This parameter will equal the original parameter of interest $E(\Delta \mid X, D = 1)$ only in the special cases where (5.A.3), the common effect assumption, holds, or where an analog to (5.A.4) holds so that the decision of treatment group members to drop out is independent of $(\Delta - E(\Delta))$, the person-specific component of their impact.

20 Using the analysis in the preceding subsection, dropping out by experimental treatment group members could be reduced by compensating them for completing training.

A consistent estimate of the impact of training on those who actually received it can be obtained under the assumption that the mean outcome of the treatment group dropouts is the same as that of their analogs in the control group, so that

$$E(Y \mid X, D = 1, R = 1, T = 0) = E(Y \mid X, D = 1, R = 0, T^* = 0). \tag{5.A.6}$$

Note that this assumption rules out situations where the treatment group dropouts receive potentially valuable partial treatment. Under (5.A.6),

$$\frac{E(Y \mid X, D = 1, R = 1) - E(Y \mid X, D = 1, R = 0)}{\Pr(T = 1 \mid X, D = 1, R = 1)} \tag{5.1}$$

identifies the mean impact of training on those who receive it.21 This estimator scales up the experimental mean difference estimate by dividing it by the fraction of the treatment group receiving training. When all treatment group members receive training, the denominator equals one and the estimator reduces to the simple experimental mean difference. Estimator (5.1) also shows that the simple mean difference estimator provides a downward biased estimate of the mean impact of training on the trained when there are dropouts from the treatment group, because the denominator always lies between zero and one. Heckman, Smith and Taber (1998) present methods for estimating distributions of outcomes and for testing the identifying assumptions in the presence of dropping out. They also present evidence on the validity of the assumptions that justify (5.1) in the National JTPA Study data.
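A minimal sketch of the sample analog of (5.1) follows. The function name and the earnings figures are hypothetical and chosen only for illustration; the calculation is meaningful only if an assumption like (5.A.6) holds.

```python
import math


def dropout_adjusted_estimate(mean_treatment, mean_control, receipt_rate):
    """Sample analog of (5.1): mean difference divided by the receipt rate.

    receipt_rate is the share of the treatment group that actually received
    training, the sample analog of Pr(T = 1 | X, D = 1, R = 1).
    """
    if not 0.0 < receipt_rate <= 1.0:
        raise ValueError("receipt_rate must lie in (0, 1]")
    return (mean_treatment - mean_control) / receipt_rate


# Hypothetical numbers: mean earnings by experimental status and a 60 percent
# receipt rate in the treatment group.
intent_to_treat = 9_600.0 - 9_000.0
adjusted = dropout_adjusted_estimate(9_600.0, 9_000.0, 0.6)
print(f"intent to treat: {intent_to_treat:.0f}, adjusted: {adjusted:.0f}")

# With full receipt the adjustment leaves the mean difference unchanged.
assert math.isclose(dropout_adjusted_estimate(9_600.0, 9_000.0, 1.0), 600.0)
```

Because the receipt rate is at most one, the adjusted estimate is never smaller in absolute value than the unadjusted mean difference.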

In an experimental evaluation, the converse problem can also arise for the control group members. In an ideal experiment, no control group members would receive either the experimental treatment or close substitutes to it from other sources. In practice, a significant fraction of controls often receives similar services from other sources. In this situation, the mean earnings of control group members no longer correspond to $E(Y_0 \mid X, D = 1)$ and neither the experimental mean difference estimator nor the adjusted estimator (5.1) identifies the impact of training relative to no training for those who receive it. However, under certain conditions discussed in Section 3, the experimental estimate can be interpreted as the mean incremental effect of the program relative to a world in which it does not exist.

21 See, e.g., Mallar (1978), Bloom (1984) and Heckman, Smith and Taber (1998).


As in the case of treatment group dropouts, identifying the impact of training on the trained in the presence of control group substitution requires additional assumptions beyond (5.A.1) or (5.A.2a) and (5.A.2b). Let $S = 1$ denote control group members receiving substitute training from alternative sources, let $S = 0$ denote control group members receiving no training, and let $Y_2$ be the outcome conditional on receipt of alternative training. Consider the general case with both treatment group dropping out and control group substitution. In this context, one approach would be to invoke the assumptions required to apply non-experimental techniques as described in Section 7 to the treatment group data to obtain an estimate of the impact of the training being evaluated on those who receive it. Heckman, Hohmann, Khoo and Smith (1998) employ this and other strategies using data from the National JTPA Study.

Alternatively, two other assumptions allow use of the control group data to estimate the impact of training on the trained. The first assumption is a generalized common effect assumption, where to distinguish individuals we restore the subscript $i$:

$$Y_{1i} - Y_{0i} = Y_{2i} - Y_{0i} = \Delta_i \equiv \Delta \quad \text{for all } i. \tag{5.A.3$'$}$$

This assumption states that (a) the impact of the program being evaluated is the same as the impact of substitute programs for each person and (b) that all persons respond exactly the same way to the program (a common effect assumption). The second assumption is a generalized version of (5.A.4), where

$$E(Y_1 - Y_0 \mid X, D = 1, T = 1, R = 1) = E(Y_2 - Y_0 \mid X, D = 1, S = 1, R = 0). \tag{5.A.4$'$}$$

This assumption states that the mean impact of the training being evaluated received by treatment group members who do not drop out equals the mean impact of substitute training on those control group members who receive it. Both (5.A.3′) and (5.A.4′) are strong assumptions. To be plausible, either would require evidence that the training received by treatment group members was similar in content and duration to that received by control group members. Note that (5.A.3′) implies (5.A.4′). Under either assumption, the ratio

$$\frac{E(Y \mid X, D = 1, R = 1) - E(Y \mid X, D = 1, R = 0)}{\Pr(T = 1 \mid X, D = 1, R = 1) - \Pr(S = 1 \mid X, D = 1, R = 0)} \tag{5.2}$$

identifies the mean impact of training on those who receive it in both the experimental treatment and control groups, provided that the denominator is not zero. The similarity of estimator (5.2) to the instrumental variable estimator defined in Section 7 is not accidental; under assumptions (5.A.3′) or (5.A.4′), random assignment is a valid instrument for training because it is correlated with training receipt but not with any other determinants of the outcome $Y$. Without one of these assumptions, random assignment is not, in general, a valid instrument (Heckman, 1997; Heckman, Hohmann, Khoo and Smith, 1998). To see this point, consider a model in which individuals know their gain from training, but because the treatment group has access to the program being evaluated, it faces a lower cost of training. In this case, controls are less likely to be trained, but the mean gross impact would be larger among control trainees than among the treatment trainees. Drawing on the analysis of Section 7, this correlation violates the condition required for the IV estimator to identify the parameter of interest.
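The behavior of (5.2) under the generalized common effect assumption can be illustrated by simulation. In the sketch below, which is an illustration under stated assumptions and not the authors' procedure, training from either source raises earnings by the same amount for everyone, treatment group members drop out, and controls obtain substitute training. The take-up probabilities, earnings distribution and variable names are all hypothetical.

```python
import random
from statistics import mean


def simulate_substitution(n=200_000, delta=1_000.0, seed=2):
    """Return (mean_difference, receipt_gap, ratio) for a simulated experiment."""
    rng = random.Random(seed)
    outcomes = {1: [], 0: []}     # keyed by R (1 = treatment, 0 = control)
    trained = {1: [], 0: []}
    for _ in range(n):
        y0 = rng.gauss(8_000.0, 2_000.0)          # earnings without training
        r = 1 if rng.random() < 0.5 else 0        # random assignment
        # Hypothetical take-up: higher in the treatment group, and higher for
        # low earners in both groups (so trainees are a selected sample).
        base_rate = 0.7 if r == 1 else 0.3
        rate = base_rate + (0.1 if y0 < 8_000.0 else -0.1)
        got_training = rng.random() < rate        # T = 1 or S = 1
        outcomes[r].append(y0 + (delta if got_training else 0.0))
        trained[r].append(1.0 if got_training else 0.0)
    mean_difference = mean(outcomes[1]) - mean(outcomes[0])
    receipt_gap = mean(trained[1]) - mean(trained[0])
    return mean_difference, receipt_gap, mean_difference / receipt_gap


mean_difference, receipt_gap, ratio = simulate_substitution()
print(f"experimental mean difference: {mean_difference:.0f}")
print(f"difference in training receipt: {receipt_gap:.2f}")
print(f"ratio estimate (5.2): {ratio:.0f}")
```

If the gain varied across persons and take-up depended on it differently in the two groups, as in the lower-cost example in the text, the same ratio would not in general recover the mean impact on trainees.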

5.3 Social Experiments in Practice

In this subsection we discuss how social experiments operate in practice. We present empirical evidence on some of the theoretical issues surrounding social experiments discussed in the preceding subsections and provide a context for the discussion of the experimental evidence on the impact of training in Section 10. To make the discussion concrete, we focus in particular on two of the best known U.S. social experiments: the National Supported Work (NSW) demonstration (Hollister, et al., 1984) and the recent National JTPA Study (NJS).22 We begin with a brief discussion of the implementation of these two experiments.

5.3.1 Two Important Social Experiments

The NSW Demonstration was one of the first employment and training experiments. It tested the effect of 9 to 18 months of guaranteed work experience in unskilled occupations on groups of long-term AFDC (welfare) recipients, ex-drug addicts, ex-criminal offenders, and economically disadvantaged youths in 10 sites across the U.S. These jobs were in a sheltered environment in which productivity standards were gradually raised over time and participants met frequently with program counselors to discuss grievances and performance.

The NSW enrollment process began with a referral, usually by a welfare agency, drug rehabilitation agency, or prisoners’ assistance society. Program operators then interviewed potential participants and eliminated any persons that they believed “would be disruptive to their programs” (Hollister, et al., 1984, p. 35). Following this screening, a third party randomly assigned one-half of the qualified applicants to the treatment group. The remainder were assigned to the control group and prevented from receiving NSW services. Although the controls could not receive NSW services, program administrators could not prevent them from receiving other training services in their community, such as those offered under another widely available training program with the acronym CETA. Follow-up data on the experimental treatment and control groups were collected via both surveys and administrative earnings records.

In contrast to the NSW, the NJS sought to evaluate the effectiveness of an ongoing training program. From the start, the goal of evaluating an ongoing program without significantly disrupting its operations (and thereby violating assumption (5.A.1) or assumptions (5.A.2a) and (5.A.2b)) posed significant problems. The first of these arose in selecting the training centers at which random assignment would take place. Initially, evaluators planned to use a random sample of the nearly 600 U.S. JTPA training sites. Randomly choosing the evaluation sites would enhance the “external validity” of the experiment, that is, the extent to which its findings can be generalized to the population of JTPA training centers. Yet it was difficult to persuade local administrators to participate in an evaluation that required them to randomly deny services to eligible applicants. When only four of the randomly selected sites or their alternates agreed to participate, the study was redesigned to include a “diverse” group of 16 centers willing to participate in a random assignment study (see Doolittle and Traeger, 1990; or the summary of their analysis presented in Hotz, 1992). Evaluators had to contact 228 JTPA training centers in order to obtain these 16 volunteers.23 The option of forcing centers to participate was rejected because of the importance of securing the cooperation of local administrators in preserving the integrity of random assignment. Such concerns are not without foundation, as the integrity of an experimental training evaluation in Norway was undermined by the behavior of local operators (Torp, et al., 1993).

22 See, among others, Doolittle and Traeger (1990), Bloom, et al. (1993) and Orr, et al. (1994).

Concerns about disrupting normal program operations and violating (5.A.1) or (5.A.2a)-(5.A.2b) also led to an unusual approach to the evaluation of the specific service types provided by JTPA. This program offers a personalized mix of employment and training services including all those listed in Table 2.1 with the exception of public service employment. During their enrollment in the program, participants may receive two or more of these services in sequence, where the sequence may depend on the participant’s success or failure in those services provided first. As a result of this heterogeneous, fluid structure, it was impossible without changing the character of the program to conduct random assignment conditional on (planned) receipt of particular services or sets of services. Instead, JTPA staff recommended particular services for each potential participant prior to random assignment, and impact estimates were calculated conditional on these recommendations. In particular, the recommendations were grouped into three “treatment streams”: the “CT-OS stream,” which included persons recommended for classroom training (CT) and possibly other services (OS), but not on-the-job training (OJT); the “OJT stream,” which included persons recommended for OJT (and possibly other services) but not CT; and the “other stream,” which included the rest of the admitted applicants, most of whom ended up receiving only job search assistance. Note that this issue did not arise in the NSW, which provided a single service to all of its participants. In the NJS, follow-up data on earnings, employment and other outcomes were obtained from both surveys and multiple administrative data sources.

23 Very large training centers (e.g., Los Angeles) and small, rural centers were excluded from the study design from the outset of the center enrollment process, for administrative and cost reasons, respectively. The final set of 16 training centers received a total of $1 million in payments to cover the cost of participating in the experiment.

5.3.2 The Practical Importance of Dropping Out and Substitution

The most important problems affecting social experiments are treatment group dropout and control group substitution. These problems are not unique to experiments. Persons drop out of programs whether or not they are experimentally evaluated. There is no evidence that the rate of dropping out increases during an experimental evaluation. Most programs have good substitutes, so the effect of a program as typically estimated is measured relative to the full range of activities in which nonparticipants engage. Experiments exacerbate this problem by creating a pool of persons who attempted to take training and who then flock to substitute programs when they are placed in an experimental control group.

Table 5.1 demonstrates the practical importance of these problems in experimental evaluations by reporting the rates of treatment group dropout and control group substitution from a variety of social experiments. It reveals that the fraction of treatment group members receiving program services is often less than 0.7, and sometimes less than 0.5. Furthermore, the observed characteristics of the treatment group members who drop out often differ from those who remain and receive the program services.24 In regard to substitution, Table 5.1 shows that as many as 40 percent of the controls in some experiments received substitute services elsewhere. In an ideal experiment, all treatment group members receive the treatment and there is no control group substitution, so that the difference between the fraction of treatments and controls that receive the treatment equals 1.0. In practice, this difference is often well below 1.0.

The extent of both substitution and dropout depends on the characteristics of the treatment being evaluated and the local program environment. In the NSW, where the treatment was relatively unique and of high enough quality to be clearly perceived as valuable by participants, dropout and substitution rates were low enough to approximate the ideal case. In contrast, in the NJS and other evaluations of programs that provide low cost services widely available from other sources, substitution and dropout rates are high.25 In the NJS, the substitution problem is accentuated by the fact that JTPA relies on outside vendors to provide most of its training. Many of these vendors, such as community colleges, provide the same training to the general public, often with subsidies from other government programs such as Pell Grants. In addition, in order to help in recruiting sites to participate in the NJS, evaluators allowed them to provide control group members with a list of alternative training providers in the community. Of the 16 sites in the NJS, 14 took advantage of this opportunity to alert control group members to substitute training opportunities.

24 For the NSW, see LaLonde (1984); for the NJS see Smith (1992).

25 For the NJS, Table 5.1 reveals the additional complication that estimates of the rate of training receipt in the treatment and control groups depend on the data source used to make the calculation. In particular, because many treatment group members do not report training that administrative records show they received, dropout rates measured using only the survey data are substantially higher than those that combine the survey and administrative data. At the same time, because administrative data are not available on control group training receipt (other than the very small number of persons who defeated the experimental protocol), using only self-report data on controls but the combined data for the treatment group will likely overstate the difference in service receipt levels between the two groups.

To see the effect of high rates of dropping out and substitution on the interpretation of the experimental evidence, consider Project Independence. The unadjusted experimental impact estimate is $264 over the 2-year follow-up period, while application of the IV estimator that uses sample moments in place of the population moments in (5.2) yields an adjusted impact estimate of $1,100 ($264/0.24). The first estimate indicates the mean impact of the offer of treatment relative to the other employment and training opportunities available in the community. Under assumptions (5.A.3′) or (5.A.4′), the latter estimate indicates the impact of training relative to no training in both the treatment and control groups. Under these assumptions, the high rates of dropping out and substitution suggest that the experimental mean difference estimate is strongly downward biased as an estimate of the impact of treatment on the treated, the primary parameter of policy interest.

A problem unique to experimental evaluations is violation of (5.A.1), or (5.A.2a) and (5.A.2b), which produces what Heckman (1992) and Heckman and Smith (1993, 1995) call “randomization bias.” In the NJS, this problem took the form of concerns that expanding the pool of accepted applicants, which was required to keep the number of participants at normal levels while creating a control group, would change the process of selection of persons into the program. Specifically, training centers were concerned that the additional recruits brought in during the experiment would be less motivated and harder to train and therefore benefit less from the program. Concerns about this problem were frequently cited by training centers that declined to participate in the NJS (Doolittle and Traeger, 1990). To partially allay these concerns, random assignment was changed from the 1:1 ratio that minimizes the sampling variance of the experimental impact estimator to a 2:1 ratio of treatments to controls.
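The precision cost of moving from a 1:1 to a 2:1 assignment ratio can be worked out directly. The sketch below assumes a fixed total sample and equal outcome variances in the treatment and control groups; both are simplifying assumptions introduced here, and the function name is hypothetical.

```python
import math


def variance_of_mean_difference(n_total, share_treated, sigma2=1.0):
    """Sampling variance of a difference in means with a fixed total sample.

    Assumes independent observations and a common outcome variance sigma2 in
    both groups (a simplifying assumption, not a claim about the NJS data).
    """
    n_treated = n_total * share_treated
    n_control = n_total - n_treated
    return sigma2 / n_treated + sigma2 / n_control


n_total = 6_000
var_one_to_one = variance_of_mean_difference(n_total, 1 / 2)
var_two_to_one = variance_of_mean_difference(n_total, 2 / 3)
variance_ratio = var_two_to_one / var_one_to_one
print(f"variance under 2:1 relative to 1:1: {variance_ratio:.3f}")

# Among a grid of treated shares, the equal split gives the smallest variance.
shares = [k / 20 for k in range(1, 20)]
best_share = min(shares, key=lambda s: variance_of_mean_difference(n_total, s))
assert math.isclose(best_share, 0.5)
```

Under these assumptions the 2:1 design raises the sampling variance by about one eighth, a modest cost relative to the concerns it was meant to address.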

Although we have no direct evidence on the empirical importance of changes in participation patterns on measured outcomes during the NJS, there is some indirect evidence about the validity of (5.A.1) or (5.A.2a) and (5.A.2b) in this instance. First of all, a number of training centers in the NJS streamlined their intake processes during the experiment, sometimes with the help of an intake consulting firm whose services were subsidized as part of the evaluation. In so doing, they generally reduced the number of visits and other costs paid by potential trainees, thereby including among those randomly assigned less motivated persons than were normally served. Second, some training centers asked for, and received, additional temporary reductions in the random assignment ratio during the course of the experiment when they experienced difficulties recruiting sufficient qualified applicants to keep the program operating at normal levels.

A second problem unique to experiments involves obtaining experimental estimates of the effects of individual components of services provided in sequence as part of a single program. Experimental designs can readily determine how access to a bundle of services affects participants’ earnings. More difficult is the question of how participation at each stage influences earnings, when participants can drop out during the sequence. Providing an experimental answer to this question requires randomization at each stage in the sequence.26 In a program with several stages, this would lead to a proliferation of treatments and either large (and costly) samples or insufficient sample sizes. In practice, such sequential randomization has not been attempted in evaluating job training programs.

A final problem unique to experimental designs is that even under ideal conditions, they are unable to answer many questions of interest besides the narrow impact of “treatment on the treated” parameter. For example, it is not possible in practice to obtain simple experimental estimates of the duration of post-random assignment employment due to post-random assignment selection problems (Ham and LaLonde, 1990). An elaborate analysis of self-selection, of the sort that social experiments are meant to avoid, is required. As another example, consider estimating the impact of training on wage rates. The problem that arises in this case is that we observe wages only for those employed following random assignment. If the experimental treatment affects employment, then the sample of employed treatments will have different observed and unobserved characteristics than the employed controls. In general, we would expect that the persons without wages will be less skilled. The experimental impact estimate cannot separate out differences between the distribution of observed wages in the treatment and control groups that result from the effect of the program on wage rates from the effect of the program on selection into employment. Under these circumstances, only non-experimental methods such as those discussed in Section 7 can provide an answer to the question of interest.
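The wage example can be made concrete with a stylized simulation. In the sketch below, training is assumed to have no effect on wage rates but to raise the probability of employment, so that less skilled persons are drawn into work in the treatment group. The skill distribution, the employment rule and the size of the employment effect are all hypothetical assumptions used only for illustration.

```python
import random
from statistics import mean


def simulate_wage_selection(n=100_000, employment_boost=0.8, seed=3):
    """Return (observed_wage_gap, employment_gap) among randomized persons."""
    rng = random.Random(seed)
    wages = {1: [], 0: []}        # observed log wages of the employed, by R
    employed = {1: [], 0: []}
    for _ in range(n):
        skill = rng.gauss(0.0, 1.0)
        log_wage = skill                          # no wage effect of training
        r = 1 if rng.random() < 0.5 else 0        # random assignment
        # Hypothetical employment rule: training helps people find work.
        works = skill + employment_boost * r + rng.gauss(0.0, 1.0) > 0.0
        employed[r].append(1.0 if works else 0.0)
        if works:
            wages[r].append(log_wage)             # wages seen only if employed
    observed_wage_gap = mean(wages[1]) - mean(wages[0])
    employment_gap = mean(employed[1]) - mean(employed[0])
    return observed_wage_gap, employment_gap


observed_wage_gap, employment_gap = simulate_wage_selection()
print(f"employment effect: {employment_gap:.3f}")
print(f"observed wage gap among the employed: {observed_wage_gap:.3f}")
```

Although the true wage effect is zero by construction, the comparison of observed wages is negative because the employed treatment group members are on average less skilled than the employed controls.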

5.3.3 Additional Problems Common to All Evaluations

There are a number of other problems that arise in both social experiments and non-experimental evaluations. Solving these problems in an experimental setting requires analysts to make the same types of choices (and assumptions) that are required in a non-experimental analysis. An important point of this subsection is that experimental impact estimates are sensitive to these choices in the same way as non-experimental estimates. A related concern is that experimental evaluations should, but often do not, include sensitivity analyses indicating the effect of the choices made on the impact estimates obtained.

26 Alternatively, in a program with three stages, program administrators might randomly assign eligible participants to one of several treatment groups, with the first group receiving only stage 1 services, the second receiving stage 1 and stage 2 services and the third receiving services from all three stages. However, a problem may arise with this scheme if participants assigned to the second and third stages of the program at some point decline to participate. In that case, the design described in the text would be more effective.

The first common evaluation problem arises from imperfect data. Different survey instruments can yield different measures for the same variable for the same person in a given time period (see Smith, 1997a,b, and the citations therein). For example, self-reported measures of earnings or welfare receipt from surveys typically differ from administrative measures covering the same period (LaLonde and Maynard, 1987; Bloom, et al., 1993). As we discuss in Section 8, in the case of earnings, data sources commonly used for evaluation research differ in the types of earnings covered, the presence or absence of top-coding and the extent of missing or incorrect values. The evaluator must trade off these factors when choosing which data source to rely on. Whatever the data source used, the analyst must make decisions about how to handle outliers and missing values.

To underscore the point that experimental impacts for the same program can differ due to different choices about data sources and data handling, we compare the impact estimates for NJS presented in the two official experimental impact reports, Bloom, et al. (1993) and Orr, et al. (1994).27 As shown in Table 5.2, these two reports give substantially different estimates of the impact of JTPA training for the same demographic groups over the same time period. The differences result from different decisions about whom to include in the evaluation sample, how to combine earnings information from surveys and administrative data, how to treat seemingly anomalous reports of overtime earnings in the survey data and so on. Several of the point estimates differ substantially, as do the implications about the relative effectiveness of the three treatment streams for adult women. The estimated 18-month impact for adult women in the “other services” stream triples from the 18-month impact report to the 30-month impact report, making it the service with the largest estimated impact despite the low average cost of the services provided to persons in this stream.

27 A complete discussion of the impact estimates from the NJS appears in Section 10.

The second problem common to experimental and non-experimental evaluations is sample attrition. Note that sample attrition is not the same as dropping out of the program. Both control and treatment group members can attrit from the sample, and treatment group members who drop out of the program will often remain in the data. In the NSW, attrition from the evaluation sample by the 18-month follow-up interview was 10 percent for the adult women, but more than 30 percent for the male participants. In the NJS study, sample attrition by the 18-month follow-up was 12 percent for the adult women and approximately 20 percent for the adult males. Such high rates of attrition are common among the disadvantaged due to relatively frequent changes in residence and other difficulties with making follow-up contacts.

Sample attrition poses a problem for experimental evaluations when it is correlated with individual characteristics or with the impact of treatment conditional on characteristics. In practice, persons with poorer labor market characteristics tend to have higher attrition rates (see, e.g., Brown, 1979). Even if attrition affects both experimental and control groups in the same way, the experiment estimates the mean impact of the program only for those who remain in the sample. Usually, attrition rates are both non-random and larger for controls than for treatments. In this case, the experimental estimate of the impact of training is biased because individuals’ experimental status, R, is correlated with their likelihood of being in the sample. In this setting, experimental evaluations become non-experimental evaluations because evaluators must make some assumption to deal with selection bias.
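A minimal simulation can show how differential, non-random attrition biases the experimental mean difference. Everything in the sketch is hypothetical: the earnings equation, the attrition probabilities, the true impact of 1.0 and the function name `impact_among_respondents` are assumptions chosen for illustration, with low-earning controls assumed to be the hardest group to re-interview.

```python
import random
import statistics


def impact_among_respondents(n=40000, true_impact=1.0, seed=1):
    """Experimental mean difference computed only on sample members who respond.

    Hypothetical attrition rule: controls with low earnings are hard to
    re-interview, while treatments attrit at a low rate unrelated to earnings.
    """
    rng = random.Random(seed)
    kept = {0: [], 1: []}
    for _ in range(n):
        treated = 1 if rng.random() < 0.5 else 0   # random assignment
        u = rng.gauss(0.0, 1.0)                    # unobserved earnings component
        earnings = 5.0 + true_impact * treated + u
        if treated:
            attrit_prob = 0.1
        else:
            attrit_prob = 0.6 if u < -0.5 else 0.1
        if rng.random() >= attrit_prob:            # person remains in the sample
            kept[treated].append(earnings)
    return statistics.mean(kept[1]) - statistics.mean(kept[0])


estimate = impact_among_respondents()
print(f"estimate among respondents: {estimate:.2f} (true impact is 1.00)")
```

Because the remaining controls are positively selected on earnings under this assumed rule, the estimate falls short of the true impact even though assignment was random.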


6 Econometric Models of Outcomes and Program Participation

The economic approach to program evaluation is based on estimating behavioral relationships that can be applied to evaluate policies not yet implemented. A focus on invariant behavioral relationships is the cornerstone of the econometric approach. Economic relationships provide frameworks within which empirical knowledge can be accumulated across different studies. They offer guidance on the specification of empirical relationships for any given study and the type of data required to estimate a behaviorally-motivated evaluation model. Alternative empirical evaluation strategies can be judged, in part, by the economic justification for them. Estimators that make economically implausible or empirically unjustified assumptions about behavior should receive little support.

The approach to evaluation guided by economic models is in contrast with the case-by-case approach of statistics that at best offers intuitive frameworks for motivating estimators. The emphasis in statistics is on particular estimators and not on the models motivating the estimators. The output of such case-by-case studies often does not cumulate. Since no articulated behavioral theory is used in this approach, it is not helpful in organizing evidence across studies or in suggesting explanatory variables or behaviorally motivated empirical relationships for a given study. It produces estimated parameters that are very difficult to use in answering well posed evaluation questions.

All economic evaluation models have two ingredients: (a) a model of outcomes and (b) a model of program participation. This section presents several prototypical econometric models. The first was developed by Heckman (1978) to rationalize the evidence in Ashenfelter (1978). The second rationalizes the evidence presented in Heckman and Smith (1998b) and Heckman, Ichimura, Smith and Todd (1998).

6.1 Uses of Economic Models

There are several distinct uses of economic models. (1) They suggest lists of explanatory variables that might belong in both outcome and participation equations. (2) They sometimes suggest plausible “exclusion restrictions” - variables that influence participation but do not directly influence outcomes, that can be used to help identify models in the presence of self-selection by participants. (3) They sometimes suggest specific functional forms of estimating equations motivated by a priori theory or by cumulated empirical wisdom.


6.2 Prototypical Models of Earnings and Program Participation

To simplify the discussion, and start where the published literature currently stops, assume that persons have only one period in their lives - period $k$ - where they have the chance to take job training. From the beginning of economic life, $t = 1$ up through $t = k$, persons have one outcome associated with the no-training state “0”:

$$Y_{0j}, \quad j = 1, \ldots, k.$$

After period $k$, there are two potential outcomes corresponding to the training outcome (denoted “1”) and the no-training outcome (“0”):

$$(Y_{0j}, Y_{1j}), \quad j = k+1, \ldots, T,$$

where $T$ is the end of economic life.

Persons participate in training only if they apply to a program and are accepted into it. Several decision makers may be involved: individuals, family members and bureaucrats. Let $D = 1$ if a person participates in a program; $D = 0$ otherwise. Then the full description of participation and potential outcomes is

$$\left(D;\; Y_{0t},\ t = 1, \ldots, k;\; (Y_{0t}, Y_{1t}),\ t = k+1, \ldots, T\right). \tag{6.1}$$

As before, observed outcomes after period $k$ can be written as a switching regression model:

$$Y_t = D Y_{1t} + (1 - D) Y_{0t}.$$

The most familiar model, and the one most widely used in the training program evaluation literature, assumes that program participation decisions are individual choices based on maximizing the expected present value of earnings. It ignores family and bureaucratic influences on participation decisions.

6.3 Expected Present Value of Earnings Maximization

In period $k$, a prospective trainee seeks to maximize the expected present value of earnings. Earnings is the outcome of interest. The information available to the agent in period $k$ is $I_k$. The cost of program participation consists of two components: $c$ (direct costs) and foregone earnings during the period. Training takes one period to complete. Assume that credit markets are perfect so that agents can lend and borrow freely at interest rate $r$. The expected present value of earnings maximizing decision rule is to participate in the program ($D = 1$) if

$$E\left[\sum_{j=1}^{T-k} \frac{Y_{1,k+j}}{(1+r)^j} - c - \sum_{j=0}^{T-k} \frac{Y_{0,k+j}}{(1+r)^j} \,\middle|\, I_k\right] \ge 0, \tag{6.2}$$


and not to participate in the program ($D = 0$) if this inequality does not hold. In (6.2), the expectations are computed with respect to the information available to the person in period $k$ ($I_k$). It is important to notice that the expectations in (6.2) are the private expectations of the decision maker. They may or may not conform to the expectations computed against the true ex ante distribution. Note further that $I_k$ may differ among persons in the same environment or may differ among environments. Many variables external to the model may belong in the information sets of persons. Thus friends, relatives and other channels of information may affect personal expectations.28

The following are consequences of this decision rule. (a) Older persons, and persons with higher discount rates, are less likely to take training. (b) Earnings prior to time period $k$ are irrelevant for determining participation in the program except for their value in forecasting future earnings (i.e., except as they enter the person’s information set $I_k$). (c) Only current costs and the discounted gain to earnings determine participation in the program. Persons with lower foregone earnings and lower direct costs of program participation are more likely to go into the program. (d) Any dependence between the realized (measured) income at date $t$ and $D$ is induced by the decision rule. It is the relationship between the expected outcomes at the time decisions are made and the realized outcomes that generates the structure of the bias for any econometric estimator of a model. This framework underlies much of the empirical work in the literature on evaluating job training programs (see, e.g., Ashenfelter, 1978, Bassi, 1983, 1984, and Ashenfelter and Card, 1985). We now consider various specializations of it.
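Decision rule (6.2) and consequence (a) can be made concrete with a short calculation. The sketch below is only an illustration: the function names `participates` and `flat_paths`, the flat earnings paths, the constant gain of 10 and the cost of 20 are hypothetical values, and the agent's expectations are simply supplied as lists of numbers.

```python
def participates(expected_y1, expected_y0, direct_cost, r):
    """Decision rule (6.2): True if the expected net present value of training is >= 0.

    expected_y1: E(Y_{1,k+j} | I_k) for j = 1, ..., T-k
    expected_y0: E(Y_{0,k+j} | I_k) for j = 0, ..., T-k (first entry is foregone earnings)
    """
    if len(expected_y0) != len(expected_y1) + 1:
        raise ValueError("expected_y0 must have one more entry than expected_y1")
    pv_train = sum(y / (1.0 + r) ** j for j, y in enumerate(expected_y1, start=1))
    pv_no_train = sum(y / (1.0 + r) ** j for j, y in enumerate(expected_y0, start=0))
    return pv_train - direct_cost - pv_no_train >= 0.0


def flat_paths(periods_left, base=100.0, gain=10.0):
    """Hypothetical flat earnings paths with a constant gain from training."""
    return [base + gain] * periods_left, [base] * (periods_left + 1)


# Same costs and gains; only the remaining horizon or the discount rate differs.
young = participates(*flat_paths(30), direct_cost=20.0, r=0.05)
old = participates(*flat_paths(5), direct_cost=20.0, r=0.05)
impatient = participates(*flat_paths(30), direct_cost=20.0, r=0.15)
print(young, old, impatient)
```

With these assumed numbers only the person with a long horizon and a low discount rate enrolls, which is the pattern described in consequence (a).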

6.3.1 Common Treatment Effect

As discussed in Section 3, the common treatment effect model is implicitly assumed in much of the literature evaluating job training programs. It assumes that $Y_{1t} - Y_{0t} = \alpha_t$, $t > k$, where $\alpha_t$ is a common constant for everyone. Another version writes $\alpha_t$ as a function of $X$, $\alpha_t(X)$. We take it as a point of departure for our analysis. The model was first presented in Heckman (1978). Ashenfelter and Card (1985) and Heckman and Robb (1985a, 1986a) develop it. In this model, the effect of treatment on the treated and the effect of randomly assigning a person to treatment come to the same thing, i.e. $E(Y_{1t} - Y_{0t} \mid X, D = 1) = E(Y_{1t} - Y_{0t} \mid X)$, since the difference between the two income streams is the same for all persons with the same $X$ characteristics. Under this model, decision rule (6.2) specializes to the discrete choice model

28 A sharp contrast between a model of perfect certainty and a model of uncertainty is that the latter introduces the possibility of incorporating many more “explanatory variables” in the model in addition to the direct objects of the theory.


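$$D = 1 \ \text{ if } \ E\left(\sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j} - c - Y_{0k} \,\middle|\, I_k\right) \ge 0; \qquad D = 0 \ \text{ otherwise.} \tag{6.3}$$

If the $\alpha_{k+j}$ are constant in all periods and $T$ is large ($T \to \infty$), the criterion simplifies to

$$D = 1 \ \text{ if } \ E\left(\frac{\alpha}{r} - c - Y_{0k} \,\middle|\, I_k\right) \ge 0; \qquad D = 0 \ \text{ otherwise.} \tag{6.4}$$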
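The step from (6.3) to (6.4) rests on the geometric series $\sum_{j=1}^{\infty} \alpha/(1+r)^j = \alpha/r$. A minimal numerical check is sketched below; the values of $\alpha$, $r$, the costs and the function names are hypothetical, and the agent is assumed to know $\alpha$, $c$ and $Y_{0k}$ so that the expectation operator drops out.

```python
def pv_of_gains(alpha, r, horizon):
    """Present value of a constant gain alpha received in periods k+1, ..., k+horizon."""
    return sum(alpha / (1.0 + r) ** j for j in range(1, horizon + 1))


def participates_infinite_horizon(alpha, r, direct_cost, foregone_earnings):
    """Rule (6.4) when alpha, c and Y_0k are known to the agent (an assumption)."""
    return alpha / r - direct_cost - foregone_earnings >= 0.0


alpha, r = 10.0, 0.05
finite = [pv_of_gains(alpha, r, h) for h in (10, 50, 500)]
limit = alpha / r   # value the finite-horizon sums approach as the horizon grows

for horizon, value in zip((10, 50, 500), finite):
    print(horizon, round(value, 4), "limit:", limit)
```

The finite-horizon present values rise toward $\alpha/r$ as the horizon lengthens, so for long horizons the participation decision depends only on how $\alpha/r$ compares with $c + Y_{0k}$.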

Even though agents are assumed to be farsighted, and possess the ability to make accurate forecasts, the decision rule is simple. Persons compare current costs (both direct costs $c$ and foregone earnings, $Y_{0k}$) with expected future rewards

$$E\left[\sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j} \,\middle|\, I_k\right].$$

Future rewards are the same for everyone of the same age and with the same discount rate. Future values of $Y_{0t}$ do not directly determine participation given $Y_{0k}$. The link between $D$ and $Y_{0t}$, $t > k$, comes through the dependence of $Y_{0t}$ on $Y_{0k}$ and any dependence on cost $c$. If one knew, or could proxy, $Y_{0k}$ and $c$, one could condition on these variables and eliminate selective differences between participants and nonparticipants. Since returns are identical across persons, only variation across persons in the direct cost and foregone earnings components determines the variation in the probability of program participation across persons. Assuming that $c$ and $Y_{0k}$ are unobserved by the econometrician, but known to the agent making the decision to go into training,

$$\Pr(D = 1) = \Pr\left(\sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j} > c + Y_{0k}\right).$$

In the case of an infinite-horizon, temporally-constant treatment effect, $\alpha$, the expression simplifies to

$$\Pr(D = 1) = \Pr\left(\frac{\alpha}{r} \ge c + Y_{0k}\right).$$

This simple model is rich enough to be consistent with Ashenfelter’s dip. As discussed in Section 4, the “dip” refers to the pattern that the earnings of program participants decline just prior to their participation in the program. If earnings are temporarily low in enrollment period $k$, and $c$ does not offset $Y_{0k}$, persons with low earnings in the enrollment period enter the program. Since the return is the same for everyone, it is low opportunity costs or tuition that drive program participation in this model. If $\alpha$, $c$ or $Y_{0k}$ depend on observed characteristics, one can condition on those characteristics in constructing the probability of program participation.

This model is an instance of a more general approach to modelling behavior that is used in the economic evaluation literature. Write the net utility of program participation of the decision maker as $IN$. An individual participates in the program ($D = 1$) if and only if $IN > 0$. Adopting a separable specification, we may write

$$IN = H(X) - V.$$

In terms of the previous example, $H(X) = \sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j}$ is a constant, and $V = c + Y_{0k}$.

The probability that $D = 1$ given $X$ is

$$\Pr(D = 1 \mid X) = \Pr(V < H(X) \mid X). \tag{6.5}$$

If $V$ is stochastically independent of $X$, we obtain the important special case

$$\Pr(D = 1 \mid X) = \Pr(V < H(X)),$$

which is widely assumed in econometric studies of discrete choice.29

If $V$ is normal with mean $\mu_1$ and variance $\sigma_V^2$, then

$$\Pr(D = 1 \mid X) = \Pr(V < H(X)) = \Phi\left(\frac{H(X) - \mu_1}{\sigma_V}\right), \tag{6.6}$$

where $\Phi$ is the cumulative distribution function of a standard normal random variable. If $V$ is a standardized logit,

$$\Pr(D = 1 \mid X) = \frac{\exp(H(X))}{1 + \exp(H(X))}.$$
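The two parametric choice probabilities can be evaluated directly. The sketch below is illustrative: the function names and the grid of index values $H(X)$ are hypothetical, and the normal distribution function is computed from the error function in the standard library.

```python
import math


def probit_participation(h, mean_v=0.0, sd_v=1.0):
    """Equation (6.6): Pr(D = 1 | X) when V is normal with the given mean and s.d."""
    z = (h - mean_v) / sd_v
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_participation(h):
    """Pr(D = 1 | X) when V follows a standardized logistic distribution."""
    return 1.0 / (1.0 + math.exp(-h))   # algebraically equal to exp(h) / (1 + exp(h))


for h in (-1.0, 0.0, 1.0):              # hypothetical values of the index H(X)
    print(h, round(probit_participation(h), 4), round(logit_participation(h), 4))
```

Both functions map the index $H(X)$ into the unit interval and differ only in the distribution assumed for $V$.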

Although these functional forms are traditional, they are restrictive and are not required by the econometric approach. Conditions for nonparametric identifiability of $\Pr(D = 1 \mid X)$ given different assumptions about the dependence of $X$ and $V$ are presented in Cosslett (1983), and Matzkin (1992). Cosslett (1983), Matzkin (1993) and Ichimura (1993) consider nonparametric estimation of $H$ and the distribution of $V$. Lewbel (1998) demonstrates how discrete choice models can be identified under much weaker assumptions than independence between $X$ and $V$. Under certain conditions, information about agent decisions to participate in a training program can be informative about their preferences and the outcomes of a program.

29 Conditions for the existence of a discrete choice random utility representation of a choice process are given in McLennan (1990).

Heckman and Smith (1998a) demonstrate conditions under which knowledge of the self-selection decisions of agents embodied in $\Pr(D = 1 \mid X)$ is informative about the value of $Y_1$ relative to $Y_0$. In the Roy model (see, e.g., Heckman and Honoré, 1990), $IN = Y_1 - Y_0 = (\mu_1(X) - \mu_0(X)) + (U_1 - U_0)$. Assuming $X$ is independent of $U_1 - U_0$, from self-selection decisions of persons into a program, it is possible to estimate $\mu_1(X) - \mu_0(X)$ up to scale, where the scale is $[\mathrm{Var}(U_1 - U_0)]^{1/2}$. This is a standard result in discrete choice theory. Thus in the Roy model it is possible to recover $E(Y_1 - Y_0 \mid X)$ up to scale just from knowledge of the choice probability. Under additional assumptions on the support of $X$, Heckman and Smith (1998a) demonstrate that it is possible to recover the full joint distribution $F(y_0, y_1 \mid X)$ and to answer all of the evaluation questions about means and distributions posed in Section 3. Under more general self-selection rules, it is still possible to infer the personal valuations of a program from observing selection into the program and attrition from it. The Roy model is the one case where personal evaluations of a program, as revealed by the choice behavior of the agents studied, coincide with the “objective” evaluations based on $Y_1 - Y_0$.
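The “up to scale” result can be illustrated under one convenient distributional assumption. The sketch below assumes, purely for illustration, that $U_1 - U_0$ is normal with mean zero and independent of $X$, so that $\Pr(D = 1 \mid X) = \Phi\big((\mu_1(X) - \mu_0(X))/\sigma\big)$ with $\sigma = [\mathrm{Var}(U_1 - U_0)]^{1/2}$. The three covariate cells, the gains and the scale are hypothetical numbers; the source does not require normality.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def scaled_mean_gain(participation_prob):
    """Return (mu_1(X) - mu_0(X)) / sd(U_1 - U_0) implied by Pr(D = 1 | X).

    Assumes the Roy rule D = 1 iff Y_1 - Y_0 > 0 and, for illustration only,
    that U_1 - U_0 is normal with mean zero and independent of X.
    """
    return STD_NORMAL.inv_cdf(participation_prob)


# Hypothetical "true" mean gains used to generate the choice probabilities.
true_gain = {"x_a": 1.0, "x_b": 2.0, "x_c": -1.0}
scale = 2.0                                   # sd(U_1 - U_0), unknown to the analyst
choice_prob = {x: STD_NORMAL.cdf(g / scale) for x, g in true_gain.items()}

# The analyst sees only the choice probabilities and inverts them.
recovered = {x: scaled_mean_gain(p) for x, p in choice_prob.items()}
print(recovered)
```

The recovered values equal the true gains divided by the unknown scale, so signs and ratios of mean gains across values of $X$ are identified from participation behavior alone, while levels are not.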

Within the context of a choice-theoretic model, it is of interest to consider the assumptions that justify the three intuitive evaluation estimators introduced in Section 4, starting with the cross-section estimator (3.3) - which is valid if assumption (4.A.3) is correct. Given decision rule (6.3), under what conditions is it plausible to assume that

$$E(Y_{0t} \mid D = 1) = E(Y_{0t} \mid D = 0), \quad t > k, \tag{4.A.3}$$

so that cross section comparisons identify the true program effect? (Recall that in a model with homogeneous treatment impacts, the various mean treatment effects all come to the same thing.) We assume that evaluators do not observe costs nor do they observe $Y_{0k}$ for trainees.

Assumption (4.A.3) would be satisfied in period $t$ if

$$E\left(Y_{0t} \,\middle|\, \sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j} - c - Y_{0k} \ge 0\right) = E\left(Y_{0t} \,\middle|\, \sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j} - c - Y_{0k} < 0\right), \quad t > k.$$

One way this condition can be satisfied is if earnings are distributed independently over time ($Y_{0k}$ independent of $Y_{0t}$, $t > k$) and direct costs $c$ are independent of $Y_{0t}$, $t > k$. More generally, only independence in the means with respect to $c + Y_{0k}$ is required.30 If the dependence in earnings vanishes for earnings measured more than $\ell$ periods apart (e.g. if earnings are a moving average of order $\ell$), then for $t > k + \ell$, assumption (4.A.3) would be satisfied in such periods.

Considerable evidence indicates that earnings have an autoregressive component (see, e.g., Ashenfelter 1978; Ashenfelter and Card, 1985; MaCurdy, 1982; Farber and Gibbons, 1994). Then (4.A.3) seems implausible except for special cases.31 Moreover, if stipends (a component of $c$) are determined in part by current and past income because they are targeted toward low-income workers, then (4.A.3) is unlikely to be satisfied.
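The role of dependence in earnings over time can be seen in a small simulation of the cross-section estimator. The sketch is hypothetical throughout: earnings are $Y_{0t} = 100 + \varphi + U_{0t}$, a permanent component $\varphi$ stands in for serial dependence (rather than an explicit autoregression), enrollment follows rule (6.4) with $\alpha/r - c$ set to 95, and the true effect is 5.

```python
import random
import statistics


def cross_section_estimate(sd_permanent, n=20000, alpha=5.0, seed=2):
    """Post-program mean earnings of participants minus nonparticipants.

    Hypothetical model: Y_0t = 100 + phi + U_0t with phi a permanent component
    and U_0t transitory shocks drawn independently each period. Persons enroll
    when earnings in period k are at or below 95.
    """
    rng = random.Random(seed)
    outcomes = {0: [], 1: []}
    for _ in range(n):
        phi = rng.gauss(0.0, sd_permanent)
        y0_k = 100.0 + phi + rng.gauss(0.0, 10.0)   # earnings in enrollment period k
        y0_t = 100.0 + phi + rng.gauss(0.0, 10.0)   # no-training earnings in some t > k
        d = 1 if y0_k <= 95.0 else 0
        outcomes[d].append(y0_t + alpha * d)        # observed post-program earnings
    return statistics.mean(outcomes[1]) - statistics.mean(outcomes[0])


independent = cross_section_estimate(sd_permanent=0.0)   # earnings independent over time
persistent = cross_section_estimate(sd_permanent=10.0)   # earnings dependent over time
print(round(independent, 2), round(persistent, 2))
```

When earnings are independent over time the comparison is close to the assumed effect of 5. With a permanent component, participants are persistently low earners and the cross-section estimate is biased downward, here far enough to change sign.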

30 Formally, it is required that $E(Y_{0t} \mid c + Y_{0k})$ does not depend on $c$ and $Y_{0k}$ for all $t > k$.

31 Note, however, much of this evidence is for log earnings and not earnings levels.

Access to better information sometimes makes it more likely that a version of assumption (4.A.3) will be satisfied if it is revised to condition on observables $X$:

$$E(Y_{0t} \mid D = 1, X) = E(Y_{0t} \mid D = 0, X). \tag{4.A.3$'$}$$

In this example, let $X = (c, Y_{0k})$. Then if we observe $Y_{0k}$ for everyone, and can condition on it, and if $c$ is independent of $Y_{0t}$ given $Y_{0k}$, then

$$E(Y_{0t} \mid D = 1, Y_{0k}) = E\left(Y_{0t} \,\middle|\, \sum_{j=1}^{T-k} \frac{\alpha_{k+j}}{(1+r)^j} - Y_{0k} \ge c,\ Y_{0k}\right) = E(Y_{0t} \mid Y_{0k}) = E(Y_{0t} \mid D = 0, Y_{0k}).$$

Then for common values of $Y_{0k}$, assumption (4.A.3$'$) is satisfied for $X = Y_{0k}$.

Ironically, using too much information may make it difficult to satisfy (4.A.3$'$). To see this, suppose that we observe $c$ and $Y_{0k}$ and $X = (c, Y_{0k})$. Now

$$E(Y_{0t} \mid D = 1, (c, Y_{0k})) = E(Y_{0t} \mid c, Y_{0k})$$

and

$$E(Y_{0t} \mid D = 0, (c, Y_{0k})) = E(Y_{0t} \mid c, Y_{0k})$$

because $c$ and $Y_{0k}$ perfectly predict $D$. But (4.A.3$'$) is not satisfied because decision rule (6.3) perfectly partitions the $(c, Y_{0k})$ space into disjoint sets. There are no common values of $X = (c, Y_{0k})$ such that (4.A.3$'$) can be satisfied. In this case, the “regression discontinuity design” estimator of Campbell and Stanley (1966) is appropriate. We discuss this estimator in Section 7.4.6 below.

If we assume that

$$0 < \Pr(D = 1 \mid X) < 1$$

we rule out the phenomenon of perfect predictability of $D$ given $X$. This condition guarantees that persons with the same $X$ values have a positive probability of being both participants and nonparticipants.32 Ironically, having too much information may be a bad thing. We need some “random” variation that places observationally equivalent people in both states. The existence of this fortuitous randomization lies at the heart of the method of matching.
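The contrast between conditioning on $Y_{0k}$ alone and conditioning on $(c, Y_{0k})$ can be checked mechanically. The sketch below uses a tiny hypothetical population in which $D = 1$ exactly when $c + Y_{0k} \le 75$ (rule (6.4) with $\alpha/r = 75$); the function name `cells_with_overlap` and all values are assumptions made for illustration.

```python
from collections import defaultdict


def cells_with_overlap(records, key):
    """Return the conditioning cells that contain both participants and nonparticipants."""
    counts = defaultdict(lambda: [0, 0])          # cell -> [n nonparticipants, n participants]
    for rec in records:
        counts[key(rec)][rec["d"]] += 1
    return sorted(cell for cell, (n0, n1) in counts.items() if n0 > 0 and n1 > 0)


# Hypothetical population: D = 1 exactly when c + Y_0k <= 75.
records = [
    {"c": c, "y0k": y0k, "d": 1 if c + y0k <= 75 else 0}
    for c in (10, 30)
    for y0k in (50, 60)
]

by_y0k = cells_with_overlap(records, key=lambda rec: rec["y0k"])
by_c_and_y0k = cells_with_overlap(records, key=lambda rec: (rec["c"], rec["y0k"]))
print(by_y0k, by_c_and_y0k)
```

Conditioning on $Y_{0k}$ alone leaves both participants and nonparticipants in every cell, because unobserved variation in $c$ acts as the “random” variation. Conditioning on $(c, Y_{0k})$ leaves no cell with both groups, so there is nothing to match on.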

32 This is one of two conditions that Rosenbaum and Rubin (1983) call “strong ignorability” and is central to the validity of matching. We discuss these conditions further in Section 7.3.

Next consider assumption (4.A.1). It is satisfied in this example if in a time homogeneous environment, a “fixed effect” or “components of variance structure” characterizes $Y_{0t}$ so that there is an invariant random variable $\varphi$ such that $Y_{0t}$ can be written as

$$Y_{0t} = \beta_t + \varphi + U_{0t} \ \text{ for all } t \quad \text{and} \quad E(U_{0t} \mid \varphi) = 0 \ \text{ for all } t, \tag{6.7}$$

where the $U_{0t}$ are mutually independent, and $c$ is independent of $U_{0t}$. If $Y_{0t}$ is earnings, then $\varphi$ is “permanent income” and the $U_{0t}$ are “transitory deviations” around it. Then using (6.3) for $t > k > t'$, we have

$$E(Y_{1t} - Y_{0t'} \mid D = 1) = \alpha_t + \beta_t - \beta_{t'},$$

since $E(U_{0t} \mid D = 1) - E(U_{0t'} \mid D = 1) = 0$.

From the assumption of time homogeneity, $\beta_t = \beta_{t'}$. Thus assumption (4.A.1) is satisfied and the before-after estimator identifies $\alpha_t$. It is clearly not necessary to assume that the $U_{0t}$ are mutually independent, just that

$$E(U_{0t} - U_{0t'} \mid D = 1) = 0, \tag{6.8}$$

i.e. that the innovation $U_{0t} - U_{0t'}$ is mean independent of $U_{0k} + c$. In terms of the economics of the model, it is required that participation does not depend on transitory innovations in earnings in periods $t$ and $t'$. For decision model (6.3), this condition is satisfied as long as $U_{0k}$ is independent of $U_{0t}$ and $U_{0t'}$, or as long as $U_{0k} + c$ is mean independent of both terms.
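The requirement that participation not depend on the transitory shocks in the two comparison periods can be illustrated by simulation. The sketch below assumes a hypothetical version of the fixed effect model with no time effects, independent transitory shocks, enrollment when $Y_{0k} \le 95$ and a true effect of 5. It computes the before-after estimator twice: once with a “before” period $t' < k$, and once using the enrollment period $k$ itself as the “before” period.

```python
import random
import statistics


def before_after_estimates(n=20000, alpha=5.0, seed=3):
    """Before-after estimates for participants using two different 'before' periods.

    Hypothetical model: Y_0t = 100 + phi + U_0t with independent transitory
    shocks U_0t, no time effects, and enrollment when Y_0k <= 95.
    Returns (estimate using t' < k, estimate using the enrollment period k).
    """
    rng = random.Random(seed)
    diff_vs_earlier, diff_vs_k = [], []
    for _ in range(n):
        phi = rng.gauss(0.0, 10.0)                               # permanent component
        y0_before = 100.0 + phi + rng.gauss(0.0, 10.0)           # period t' < k
        y0_k = 100.0 + phi + rng.gauss(0.0, 10.0)                # enrollment period k
        y1_after = 100.0 + phi + rng.gauss(0.0, 10.0) + alpha    # period t > k, trained
        if y0_k <= 95.0:                                         # participants only
            diff_vs_earlier.append(y1_after - y0_before)
            diff_vs_k.append(y1_after - y0_k)
    return statistics.mean(diff_vs_earlier), statistics.mean(diff_vs_k)


clean, contaminated = before_after_estimates()
print(round(clean, 2), round(contaminated, 2))
```

With a “before” period whose shock is independent of the enrollment decision, the estimate is close to the assumed effect. Using period $k$, whose transitory shock drives enrollment, overstates the gain because participants' earnings are temporarily depressed in that period.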

If, however, the $U_{0t}$ are serially correlated, then (4.A.1) will generally not be satisfied. Thus if a transitory decline in earnings persists over several time periods (as seems to be true as a consequence of Ashenfelter’s dip), so that there is stochastic dependence of $(U_{0t}, U_{0t'})$ with $U_{0k}$, then it is unlikely that the key identifying assumption is satisfied. One special case where it is satisfied, developed by Heckman (1978) and Heckman and Robb (1985a) and applied by Ashenfelter and Card (1985) and Finifter (1987) among others, is a “symmetric differences” assumption. If $t$ and $t'$ are symmetrically aligned (so that $t = k + \ell$ and $t' = k - \ell$) and conditional expectations forward and backward are symmetric, so that

$$E(U_{0t} \mid c + \beta_t + U_{0k}) = E(U_{0t'} \mid c + \beta_k + U_{0k}), \tag{6.9}$$

then assumption (4.A.1) is satisfied. This identifying condition motivates the symmetric differences estimator discussed in Section 7.6.

Some evidence of non-stationary wage growth presented by Farber and Gibbons (1994), MaCurdy (1982), Topel and Ward (1992) and others suggests that earnings can be approximated by a “random walk” specification. If

$$Y_{0t} = \beta_t + \eta + \sum_{j=0}^{t} \nu_j, \tag{6.10}$$

where the $\nu_j$ are mean zero, mutually independent and identically-distributed random variables independent of $\eta$, then (6.8) and (6.9) will not generally be satisfied. Thus even if conditional expectations are linear, both forward and backward, it does not follow that (4.A.1) will hold. Let the variance of $\eta$ and the variance of $\nu_j$ be finite. Assume that $E(\eta) = 0$. Suppose $c$ is independent of all the $\nu_j$ and $\eta$. Then

$$E(U_{0t} \mid c + \beta_t + U_{0k}) = \frac{\sigma_\eta^2 + k\sigma_\nu^2}{\sigma_c^2 + \sigma_\eta^2 + k\sigma_\nu^2}\,\big(c + U_{0k} - E(c)\big)$$

and

$$E(U_{0t'} \mid c + \beta_t + U_{0k}) = \frac{\sigma_\eta^2 + t'\sigma_\nu^2}{\sigma_c^2 + \sigma_\eta^2 + k\sigma_\nu^2}\,\big(c + U_{0k} - E(c)\big).$$

These two expressions are not equal unless $\sigma_\nu^2 = 0$.

A more general model that is consistent with the evidence reported in the literature writes

$$Y_{0t} = \mu_{0t}(X) + \eta + U_{0t},$$

where

$$U_{0t} = \sum_{j=1}^{k} \rho_{0j} U_{0,t-j} + \sum_{j=1}^{m} m_{0j} \nu_{t-j},$$

where the $\nu_{t-j}$ satisfy $E(\nu_{t-j}) = 0$ at all leads and lags, and are uncorrelated with $\eta$, and where $U_{0t}$ is an autoregression of order $k$ and moving average of length $m$. Some authors like MaCurdy (1982) or Farber and Gibbons (1994) allow the coefficients $(\rho_{0j}, m_{0j})$ to depend on $t$ and do not require that the innovations be identically distributed over time. For the logarithm of white male earnings in the United States, MaCurdy (1982) finds that a model with a permanent component ($\eta$), plus one autoregressive coefficient ($k = 1$) and two moving average terms ($m = 2$) describes his data.33 Farber and Gibbons report similar evidence.

These time series models suggest generalizations of the before-after estimator that exploit the longitudinal structure of earnings processes but work with more general types of differences that align future and past earnings. These are developed at length in Heckman and Robb (1985, 1986), Heckman (1998a) and in Section 7.6.

If there are “time effects,” so that $\beta_t \ne \beta_{t'}$, (4.A.1) will not be satisfied. Before-after estimators will confound time effects with program gains. The “difference in differences” estimator circumvents this problem for models in which (4.A.1) is satisfied for the unobservables of the model but $\beta_t \ne \beta_{t'}$. Note, however, that in order to apply this assumption it is necessary that time effects be additive in some transformation of the dependent variable and identical across participants and nonparticipants. If they are not, then (4.A.2) will not be satisfied.

For example, if the decision rule for program participation is such that persons with lower life cycle wage growth paths are admitted into the program, or persons who are more vulnerable to the national economy are trained, then the assumption of common time (or age) effects across participants and nonparticipants will be inappropriate and the difference-in-differences estimator will not identify true program impacts.
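A worked arithmetic example makes the common time effect requirement explicit. The group means below are hypothetical numbers, as are the function name and the assumed true effect of 5. In the first case both groups share the same additive time effect; in the second, participants are on a flatter growth path than the comparison group.

```python
def difference_in_differences(part_before, part_after, comp_before, comp_after):
    """Change for participants minus change for the comparison group."""
    return (part_after - part_before) - (comp_after - comp_before)


alpha = 5.0            # hypothetical true program effect
time_effect = 3.0      # shift in mean earnings between t' and t

# Case 1: the same additive time effect applies to both groups.
common = difference_in_differences(
    part_before=90.0, part_after=90.0 + time_effect + alpha,
    comp_before=104.0, comp_after=104.0 + time_effect,
)

# Case 2: participants are on a flatter growth path (time effect of 1 instead of 3).
differential = difference_in_differences(
    part_before=90.0, part_after=90.0 + 1.0 + alpha,
    comp_before=104.0, comp_after=104.0 + time_effect,
)
print(common, differential)
```

In the first case the estimator returns the assumed effect even though the two groups differ in levels. In the second, the difference in growth paths is attributed to the program and the estimate understates the effect.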

33 The estimated value of $\rho_{01}$ is close to 1 so that the model is close to a random walk in levels of log earnings.


6.3.2 A Separable Representation

In implementing econometric evaluation strategies, it is common to control for observed characteristics $X$. Invoking the separability assumption, we write the outcome equation for $Y_{0t}$ as

$$Y_{0t} = g_{0t}(X) + U_{0t},$$

where $g_{0t}$ is a behavioral relationship and $U_{0t}$ has a finite mean conditioning on $X$. A parallel expression can be written for $Y_{1t}$:

$$Y_{1t} = g_{1t}(X) + U_{1t}.$$

The expression for $g_{0t}(X)$ is a structural relationship that may or may not be different from $\mu_{0t}(X)$, the conditional mean. It is a ceteris paribus relationship that informs us of the effect of changes of $X$ on $Y_{0t}$ holding $U_{0t}$ constant. Throughout this chapter we distinguish $\mu_{1t}$ from $g_{1t}$ and $\mu_{0t}$ from $g_{0t}$. For the latter, we allow for the possibility that $E(U_{1t} \mid X) \ne 0$ and $E(U_{0t} \mid X) \ne 0$. The separability enables us to isolate the effect of self selection, as it operates through the “error term”, from the structural outcome equation:

$$E(Y_{0t} \mid D = 0, X) = g_{0t}(X) + E(U_{0t} \mid D = 0, X), \tag{6.11a}$$

$$E(Y_{1t} \mid D = 1, X) = g_{1t}(X) + E(U_{1t} \mid D = 1, X). \tag{6.11b}$$

The $g_{0t}(X)$ and $g_{1t}(X)$ functions are invariant across different conditioning schemes and decision rules provided that $X$ is available to the analyst. One can borrow knowledge of these functions from other studies collected under different conditioning rules including the conditioning rules that define the samples used in social experiments. Although the conditional mean of the errors differs across studies, the $g_{0t}(X)$ and analogous $g_{1t}(X)$ functions are invariant across studies. If they can be identified, they can be meaningfully compared across studies, unlike the parameter treatment on the treated which, in the case of heterogeneous response to treatment that is acted on by agents, differs across programs with different decision rules and different participant compositions.

A special case of this representation is the basis for an entire literature. Suppose that

(P.1) The random utility representation (6.5) is valid.

Further, suppose that

(P.2) $(U_{0t}, U_{1t}, V) \perp\!\!\!\perp X$ (“$\perp\!\!\!\perp$” denotes stochastic independence)

and finally assume that

(P.3) the distribution of $V$, $F(V)$, is strictly increasing in $V$.

Then, as shown in footnote 34,

$$E(U_{0t} \mid D = 1, X) = K_{0t}(\Pr(D = 1 \mid X)) \tag{6.12a}$$

and

$$E(U_{1t} \mid D = 1, X) = K_{1t}(\Pr(D = 1 \mid X)). \tag{6.12b}$$

The mean error term is a function of $P$, the probability of participation in the program. This special case receives empirical support in Heckman, Ichimura, Smith and Todd (1998) and Heckman, Ichimura and Todd (1997). It enables analysts to characterize the dependence between $U_{0t}$ and $X$ by the dependence of $U_{0t}$ on $\Pr(D = 1 \mid X)$, which is a scalar function of $X$. As a practical matter, this greatly reduces the empirical task of estimating selection models. Instead of having to explore all possible dependence relationships between $U$ and $X$, the analyst can confine attention to the more manageable task of exploring the dependence between $U$ and $\Pr(D = 1 \mid X)$. An investigation of the effect of conditioning on program eligibility rules or self selection on $Y_{0t}$ comes down to an investigation of the effect of the conditioning on $Y_{0t}$ as it operates through the probability $P$. It motivates a focus on the determinants of participation in the program in order to understand selection bias and is the basis for the “control function” estimators developed in Section 7.
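For one familiar parametric case the function $K_{0t}$ has a closed form. The sketch below assumes, as an illustration that goes beyond (P.1)-(P.3), that $(U_{0t}, V)$ are jointly normal with $V$ standard normal and $\rho = \mathrm{Cov}(U_{0t}, V)$. Since $D = 1$ if and only if $V < H(X)$, this gives $E(U_{0t} \mid D = 1, X) = -\rho\,\phi(h)/\Phi(h)$ with $h = \Phi^{-1}(P)$. The value of $\rho$ and the grid of probabilities are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def control_function(p, rho):
    """K_0t(P) = E(U_0t | D = 1, X) when D = 1 iff V < H(X).

    Illustration only: assumes (U_0t, V) are jointly normal, V is standard
    normal, and rho = Cov(U_0t, V). Then E(U_0t | V < h) = -rho * pdf(h) / cdf(h)
    with h = inv_cdf(P), and cdf(h) = P.
    """
    h = STD_NORMAL.inv_cdf(p)
    return -rho * STD_NORMAL.pdf(h) / p


for p in (0.1, 0.5, 0.9):               # hypothetical participation probabilities
    print(p, round(control_function(p, rho=0.5), 4))
```

Any two values of $X$ that imply the same participation probability receive the same selection term, which is the sense in which $P$ summarizes the dependence of the error on $X$. Under this assumed sign of $\rho$, the term shrinks toward zero as $P$ approaches one, because nearly everyone participates and little selection remains.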

34 The proof is immediate. The proof of (6.12b) follows by similar reasoning. We follow Heckman (1980) and Heckman and Robb (1985a, 1986b). Assume that $U_{0t}, V$ are jointly continuous random variables, with density $f(U_{0t}, V \mid X)$. From (P.2),

$$f(U_{0t}, V \mid X) = f(U_{0t}, V).$$

Thus

$$E(U_{0t} \mid X, D = 1) = \frac{\int_{-\infty}^{\infty} U_{0t} \int_{-\infty}^{H(X)} f(U_{0t}, V)\, dV\, dU_{0t}}{\int_{-\infty}^{H(X)} f(V)\, dV}.$$

Now

$$\Pr(D = 1 \mid X) = \int_{-\infty}^{H(X)} f(V)\, dV.$$

Inverting, we obtain

$$H(X) = F_V^{-1}(\Pr(D = 1 \mid X)).$$

Thus

$$E(U_{0t} \mid X, D = 1) = \frac{\int_{-\infty}^{\infty} U_{0t} \int_{-\infty}^{F_V^{-1}(\Pr(D = 1 \mid X))} f(U_{0t}, V)\, dV\, dU_{0t}}{\Pr(D = 1 \mid X)} \overset{\text{def}}{=} K_{0t}(\Pr(D = 1 \mid X)).$$


If, however, (P.2) is not satisfied, then the separable representation is not valid. Then it is necessary to know more than the probability of participation to characterize $E(U_{0t} \mid X, D = 1)$. In this case it is necessary to characterize both the dependence between $U_{0t}$ and $X$ given $D = 1$ and the probability of participation.

6.3.3 Variable Treatment Effect

A more general version of the decision rule, given by (6.2), allows $(Y_{0t}, Y_{1t})$ to be a pair of random variables with no necessary restriction connecting them. In the more general case,

$$\alpha_t = Y_{1t} - Y_{0t}, \quad t > k,$$

is now a random variable. In this case, as previously discussed in Section 3, there is a distinction between the parameter “the mean effect of treatment on the treated” and the “mean effect of randomly assigning a person with characteristics $X$ into the program”.

In one important case discussed in Heckman and Robb (1985a), the two parameters have the same ex post mean value even if the treatment effect $\alpha_t$ is heterogeneous after conditioning on $X$. Suppose that $\alpha_t$ is unknown to the agent at the time enrollment decisions are made. The agent forecasts $\alpha_t$ using the information available in his/her information set $I_k$. $E(\alpha_t \mid I_k)$ is the private expectation of gain by the agent. If ex post gains of participants with characteristics $X$ are the same as what the ex post gains of nonparticipants would have been had they participated, then the two parameters are the same. This would arise if both participants and nonparticipants have the same ex ante expected gains

$$E(\alpha_t \mid D = 1, I_k) = E(\alpha_t \mid D = 0, I_k) = E(\alpha_t \mid I_k),$$

and if

$$E[E(\alpha_t \mid I_k) \mid X, D = 1] = E[E(\alpha_t \mid I_k) \mid X, D = 0],$$

where the expectations are computed with respect to the observed ex-post distribution of the $X$. This condition requires that the information in the participant’s decision set has the same relationship to $X$ as it has for nonparticipants. The interior expectations in the preceding expression are subjective. The exterior expectations in the expression are computed with respect to distributions of objectively-observed characteristics. The condition for the two parameters to be the same is

$$E[E(\alpha_t \mid I_k, D = 1) \mid X, D = 1] = E[E(\alpha_t \mid I_k, D = 0) \mid X, D = 0].$$


As long as the ex-post objective expectation of the subjective expectations is the same, the two parameters ($E(\alpha_t \mid X, D = 1)$ and $E(\alpha_t(X))$) are the same. This condition would be satisfied if, for example, all agents, irrespective of their $X$ values, place themselves at the mean of the objective distribution, i.e.,

$$E(\alpha_t \mid I_k, D = 1) = E(\alpha_t \mid I_k, D = 0) = \bar{\alpha}_t$$

(see, e.g., Heckman and Robb, 1985a). Differences across persons in program participation are generated by factors other than potential outcomes. In this case, the ex-post surprise,

$$(\alpha_t - \bar{\alpha}_t),$$

does not depend on $X$ or $D$ in the sense that

$$E(\alpha_t - \bar{\alpha}_t \mid X, D = 1) = 0.$$

So

$$E(Y_{1t} - Y_{0t} \mid X, D = 1) = \bar{\alpha}_t.$$

This discussion demonstrates the importance of understanding the decision rule and its relationship to measured outcomes in formulating an evaluation model. If agents do not make their decisions based on the unobserved components of gains from the program or on variables statistically related to those components, the analysis for the common coefficient model presented in section (a) remains valid even if there is variability in $U_{1t} - U_{0t}$. If agents anticipate the gains, and base decisions on them, at least in part, then a different analysis is required.
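The contrast between the two cases can be seen in a small simulation. The sketch below is illustrative only: the normal distributions, the unit mean gain and the function name are assumptions of the example. When agents place themselves at the mean gain, participation is driven by a cost term unrelated to outcomes and the two parameters coincide; when agents act on their own realized gain, treatment on the treated exceeds the random-assignment mean.

```python
import numpy as np


def mean_gains(select_on_gain, n=200_000, seed=1):
    """Return (treatment on the treated, mean gain under random assignment)."""
    rng = np.random.default_rng(seed)
    alpha_bar = 1.0
    alpha = alpha_bar + rng.standard_normal(n)   # heterogeneous gain alpha_t
    cost = rng.standard_normal(n)                # factor unrelated to outcomes
    if select_on_gain:
        index = alpha - cost                     # agents anticipate their own gain
    else:
        index = alpha_bar - cost                 # agents forecast the mean gain only
    d = index > 0
    return alpha[d].mean(), alpha.mean()


if __name__ == "__main__":
    for flag in (False, True):
        tt, ate = mean_gains(flag)
        print(f"select_on_gain={flag}:  TT={tt:.3f}  random assignment={ate:.3f}")
```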

The conditions for the absence of bias for one parameter are different from the conditions for the absence of bias for another parameter. The difference between the “random assignment” parameter $E(Y_{1t} - Y_{0t} \mid X)$ and the “treatment on the treated” parameter is the gain in the unobservables going from one state to the next:

$$E(U_{1t} - U_{0t} \mid X, D = 1) = E(\Delta_t \mid X, D = 1) - E(\Delta_t \mid X).$$

The only way to avoid bias for both mean parameters is if $E(U_{1t} - U_{0t} \mid X, D = 1) = 0$.

Unlike the other estimators, the before-after estimators are non-robust to time effects that are common across participants and nonparticipants. The difference-in-differences estimators and the cross-section estimators are unbiased under different conditions. The cross-section estimator for the period $t$ common effect and the “treatment on the treated” variable-effect version of the model require that mean unobservables in the no-program state be the same for participants and nonparticipants. The difference-in-differences estimator requires a balance of the bias in the change in the unobservables from period $t'$ to period $t$. If the cross-section conditions for the absence of bias are satisfied for all $t$, then the assumption justifying the difference-in-differences estimator is satisfied.

However, the converse is not true. Even if the conditions for the absence of bias in the difference-in-differences estimator are satisfied, the conditions for absence of bias for the cross-section estimator are not necessarily satisfied. Moreover, failure of the difference-in-differences condition for the absence of bias does not imply failure of the condition for absence of bias for the cross-section estimator. Ashenfelter’s dip provides an empirically relevant example of this point. If $t'$ is measured during the period of the dip, but the dip is mean-reverting in post-program periods, then the condition for the absence of cross-section bias could be satisfied because, post-program, there could be no selective differences among participants.
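A simple simulation makes the first half of this point concrete. The sketch below assumes a data generating process chosen for the example and not taken from the text: earnings have a permanent component that drives participation, transitory components are independent across periods, and the true common effect is `alpha`. The change in unobservables is then balanced across groups, so difference-in-differences is unbiased, while the level of unobservables is not, so the cross-section estimator is biased.

```python
import numpy as np


def estimator_biases(n=200_000, alpha=1.0, seed=2):
    """Return (cross-section bias, difference-in-differences bias)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n)                        # permanent component
    d = theta + rng.standard_normal(n) < 0.0              # low-theta persons enroll
    y_pre = theta + rng.standard_normal(n)                # period t' < k
    y_post = theta + rng.standard_normal(n) + alpha * d   # period t > k
    cross_section = y_post[d].mean() - y_post[~d].mean()
    change = y_post - y_pre                               # theta differences out
    diff_in_diff = change[d].mean() - change[~d].mean()
    return cross_section - alpha, diff_in_diff - alpha


if __name__ == "__main__":
    cs_bias, did_bias = estimator_biases()
    print(f"cross-section bias = {cs_bias:.3f}, diff-in-diff bias = {did_bias:.3f}")
```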

6.3.4 Imperfect Credit Markets

How robust is the analysis of Sections 6.2 and 6.3, and in particular the conditions for bias, to alternative specifications of decision rules and the economic environments in which individuals operate? To answer this question, we first reexamine the decision rule after dropping our assumption of perfect credit markets. There are many ways to model imperfect credit markets. The most extreme approach assumes that persons consume their earnings each period. This changes the decision rule (6.2) and produces a new interpretation for the conditions for absence of bias. Let $G$ denote a time-separable strictly concave utility function and let $\beta$ be a subjective discount rate. Suppose that persons have exogenous income flow $\eta_t$ per period. Expected utility maximization given information $I_k$ produces the following program participation rule:

(6.13) $$D = \begin{cases} 1 & \text{if } E\left[\displaystyle\sum_{j=1}^{T-k} \beta^j \left\{G(Y_{1,k+j} + \eta_{k+j}) - G(Y_{0,k+j} + \eta_{k+j})\right\} + G(\eta_k - c_k) - G(Y_{0k} + \eta_k) \,\Big|\, I_k\right] \geq 0, \\ 0 & \text{otherwise.} \end{cases}$$

As in the previous cases, earnings prior to time period $k$ are only relevant for forecasting future earnings (i.e., as elements of $I_k$). However, the decision rule (6.2) is fundamentally altered in this case. Future earnings in both states determine participation in a different way. Common components of earnings in the two states do not difference out unless $G$ is a linear function.35

35 Due to the nonlinearity of $G$, there are wealth effects in the decision to take training.
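The wealth effect in rule (6.13) can be checked numerically. The following is a minimal sketch under simplifying assumptions that are not in the text: perfect foresight, constant earnings and income flows over time, and illustrative parameter values. The function names are hypothetical, and `periods` stands for $T - k$.

```python
import math


def net_expected_utility(G, y1, y0, y0k, cost, eta, beta, periods):
    """Bracketed term of (6.13) under perfect foresight with constant flows."""
    future = sum(beta**j * (G(y1 + eta) - G(y0 + eta))
                 for j in range(1, periods + 1))
    training_period = G(eta - cost) - G(y0k + eta)   # consumption while in training
    return future + training_period


def participates(G, **kwargs):
    """D = 1 when the net expected utility gain is non-negative."""
    return net_expected_utility(G, **kwargs) >= 0


if __name__ == "__main__":
    common = dict(y1=12.0, y0=10.0, y0k=10.0, cost=1.0, beta=0.95, periods=10)
    for eta in (2.0, 50.0):                          # low and high exogenous income
        linear = participates(lambda x: x, eta=eta, **common)
        concave = participates(math.log, eta=eta, **common)
        print(f"eta={eta:5.1f}  linear G: D={int(linear)}  log G: D={int(concave)}")
```

With linear $G$ the income flow cancels from every term, so the decision is the same at both income levels. With logarithmic $G$ and these illustrative values, the low-income person declines training and the high-income person accepts it.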


Consider the permanent-transitory model of equation (6.7). That model is favorable to the application of longitudinal before-after estimators. Suppose that the $U_{0t}$ are independent and identically distributed, and there is a common-effect model. Condition (6.8) is not satisfied in a perfect foresight environment when there are credit constraints, or in an environment in which the $U_{0t}$ can be partially forecast,36 because for $t > k > t'$

$$E(U_{0t} \mid X, D = 1) \neq 0$$

even though

$$E(U_{0t'} \mid X, D = 1) = 0,$$

so

$$E(U_{0t} - U_{0t'} \mid X, D = 1) \neq 0.$$

The before-after estimator is now biased. So is the difference-in-differences estimator. If, however, the $U_{0t}$ are not known, and cannot be partially forecast, then condition (6.8) is valid, so both the before-after and difference-in-differences estimators are unbiased.

Even in a common effect model, with $Y_{0t}$ (or $U_{0t}$) independently and identically distributed, the cross-section estimator is biased for period $t > k$ in an environment of perfect certainty with credit constraints because $D$ depends on $Y_{0t}$ through decision rule (6.13). On the other hand, if $Y_{0t}$ is not forecastable with respect to the information in $I_k$, the cross-section estimator is unbiased.

The analysis in this subsection and the previous subsections has major implications for a certain style of evaluation research. Understanding the stochastic model of the outcome process is not enough. It is also necessary to know how the decision makers process the information and make decisions about program participation.

6.3.5 Training As A Form of Job Search

Heckman and Smith (1998b) find that among persons eligible for the JTPA program, the unemployed are much more likely to enter the program than are other eligible persons. Persons are defined to be unemployed if they are not working but report themselves as actively seeking work. The relationship uncovered by Heckman and Smith is not due to eligibility requirements. In the United States, unemployment is not a precondition for participation in the program.

Several previous studies suggest that Ashenfelter’s dip results from changes in labor force status, instead of from declines in wages or hours among those who work. Using even a crude measure of employment rates, namely whether a person was employed at all during a calendar year, Card and Sullivan (1988) observed that U.S. CETA training participants’ employment rates declined prior to entering training.37 Their evidence suggests that changes in labor force dynamics instead of changes in earnings may be a more precise way to characterize participation in training.

36 “Partially forecastable” means that some component of $U_{0t}$ resides in the information set $I_k$. That is, letting $f(y \mid x)$ be the density of $y$ given $x$, $f(U_{0t} \mid I_k) \neq f(U_{0t})$, so that $I_k$ predicts $U_{0t}$ in this sense. One could define “moment forecastability” using conditional expectations of certain moments of a function “$\varphi$”: if $E(\varphi(U_{0t}) \mid I_k) \neq E(\varphi(U_{0t}))$, then $\varphi(U_{0t})$ is partially moment forecastable using the information in $I_k$. More formally, a random variable is fully forecastable if the $\sigma$-algebra generating $U_{0t}$ is contained in the $\sigma$-algebra of $I_k$. It is partially forecastable if the complement of the projection of the $\sigma$-algebra of $U_{0t}$ onto the $\sigma$-algebra of $I_k$ is not the empty set. It is fully unforecastable if the projection of the $\sigma$-algebra of $U_{0t}$ onto the $\sigma$-algebra of $I_k$ is the empty set.

Heckman and Smith (1998b) show that whether a person is employed, unemployed (not employed and looking for work), or out of the labor force is a powerful predictor of participation in training programs. Moreover, they find that recent changes in labor force status are important determinants of participation for all demographic groups. In particular, eligible persons who have just become unemployed, either through job loss or through re-entry into the labor force, have the highest probabilities of participation. For women, divorce, another form of job termination, is a predictor of who goes into training. Among those who either are employed or out of the labor force, persons who have recently entered these states have much higher program participation probabilities than persons who have been in those states for some time. Their evidence is formalized by the model presented in this section.

The previous models that we have considered are formulated in terms of levels of costs and earnings. When opportunity costs are low, or tuition costs are low, persons are more likely to enter training. The model presented here recognizes that changes in labor force states account for participation in training. Low earnings levels are a subsidiary predictor of program participation that is overshadowed in empirical importance by unemployment dynamics in the analyses of Heckman and Smith (1998b).

Persons with zero earnings differ substantially in their participation probabilities depending on their recent labor force status histories. Yet, in models based on pre-training earnings dynamics, such as the one presented in Section 6.3, such persons are assumed to have the same behavior irrespective of their labor market histories.

The importance of labor force status histories also is not surprising given that many employment and training services, such as job search assistance, on-the-job training at private firms, and direct placement, are all designed to lead to immediate employment. By providing these services, these programs function as a form of job search for many participants. Recognizing this role of active labor market policies is an important development in recent research. It indicates that in many cases, participation in active labor market programs should not be modeled as if it were like a schooling decision, such as we have modeled it in the preceding sections.

In this section, we summarize the evidence on the determinants of participation in the program and construct a simple economic model in which job search makes two contributions to labor market prospects: (a) it increases the rate of arrival of job offers and (b) it improves the distribution of wages in the sense of giving agents a stochastically dominant wage distribution compared to the one they face without search. Training is one form of unemployment that facilitates job search. Different training options will produce different job prospects characterized by different wage and layoff distributions. Searchers might participate in programs that subsidize the rate of arrival of job offers (JSA as described in Section 2), or that improve the distribution from which wage offers are drawn (i.e., basic educational and training investments).

37 Ham and LaLonde (1990) report the same result using semi-monthly employment rates for adult women participating in NSW.

Instead of motivating participation in training with a standard human capital model, we motivate participation as a form of search among options. Because JSA constitutes a large component of active labor market policy, it is of interest to see how the decision rule is altered if enhanced job search rather than human capital accumulation is the main factor motivating individuals’ participation in these programs.

Our model is based on the idea that in program $j$, wage offers arrive from a distribution $F_j$ at rate $\lambda_j$. Persons pay $c_j$ to sample from $F_j$. (The costs can be negative.) Assume that the arrival times are statistically independent of the wage offers and that arrival times and wage offers from one search option are independent of the wages and arrival times of other search options. At any point in time, persons pick the search option with the highest expected return. To simplify the analysis, suppose that all distributions are time invariant and denote by $N$ the value of nonmarket time. Persons can select among any of $J$ options, denoted by $j$. Associated with each option is a rate at which jobs appear, $\lambda_j$. Let the discount rate be $r$. These parameters may vary among persons but for simplicity we assume that they are constant for the same person over time. This heterogeneity among persons produces differences among choices in training options, and differences in the decision to undertake training.

In the unemployed state, a person receives a nonmarket benefit, $N$. The choice between search from any of the training and job search options can be written in “Gittins index” form. (See, e.g., Berry and Fristedt, 1986.) Under our assumptions, being in the nonmarket state has constant per-period value $N$ irrespective of the search option selected. Letting $V_{je}$ be the value of employment arising from search option $j$, the value of being unemployed under training option $j$ is:

(6.14a) $$V_{ju} = N - c_j + \frac{\lambda_j}{1+r} E_j \max[V_{je}, V_{ju}] + \frac{(1-\lambda_j)}{1+r} V_{ju}.$$

The first term, $(N - c_j)$, is the value of nonmarket time minus the $j$-specific cost of search. The second term is the discounted product of the probability that an offer arrives next period if the $j$th option is used, and the expected value of the maximum of the two options: work (valued at $V_{je}$) or unemployment $V_{ju}$. The third term is the probability that the person will continue to search times the value of doing so. In a stationary environment, if it is optimal to search from $j$ today, it is optimal to do so tomorrow.

Let $\sigma_{je}$ be the exogenous rate at which jobs disappear. For a job holder, the value of employment is $V_{je}$:

(6.14b) $$V_{je} = Y_j + \frac{(1-\sigma_{je})}{1+r} V_{je} + \frac{\sigma_{je}}{1+r} E_j[\max(V_N, V_{ju})].$$

$V_{ju}$ is the value of optimal job search under $j$. The expression consists of the current flow of earnings ($Y_j$) plus the discounted ($\frac{1}{1+r}$) expected value of employment ($V_{je}$) times the probability that the job is retained ($1 - \sigma_{je}$). The third term arises from the possibility that a person loses his/her job (this happens with probability $\sigma_{je}$) times the expected value of the maximum of the search and nonmarket value options ($V_N$).

To simplify this expression, assume that $V_{ju} > V_N$. If this is not so, the person would never search under any training option under any event. Under this assumption, $V_{je}$ simplifies to

$$V_{je} = Y_j + \frac{(1-\sigma_{je})}{1+r} V_{je} + \frac{\sigma_{je}}{1+r} V_{ju},$$

so

(6.14c) $$V_{je} = \frac{\sigma_{je}}{r+\sigma_{je}} V_{ju} + \frac{(1+r)Y_j}{r+\sigma_{je}}.$$
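For readers who want the intermediate algebra, the step from the simplified recursion to (6.14c) collects the $V_{je}$ terms on the left-hand side and rescales:

```latex
\begin{align*}
V_{je}\left[1 - \frac{1-\sigma_{je}}{1+r}\right]
  &= Y_j + \frac{\sigma_{je}}{1+r}\,V_{ju}, \\
V_{je}\,\frac{r+\sigma_{je}}{1+r}
  &= Y_j + \frac{\sigma_{je}}{1+r}\,V_{ju}, \\
V_{je}
  &= \frac{\sigma_{je}}{r+\sigma_{je}}\,V_{ju} + \frac{(1+r)\,Y_j}{r+\sigma_{je}}.
\end{align*}
```

It follows that $V_{je} > V_{ju}$ exactly when $(1+r)Y_j > rV_{ju}$, that is, when $Y_j > V_{ju}\,(r/(1+r))$, so $V_{ju}\,(r/(1+r))$ acts as the reservation wage for option $j$.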

Substituting (6.14c) into (6.14a), we obtain, after some rearrangement,

$$V_{ju} = \frac{(1+r)(N - c_j) + \lambda_j E_j(V_{je} \mid V_{je} > V_{ju}) \Pr(Y_j > V_{ju}(r/(1+r)))}{r + \lambda_j \Pr(Y_j > V_{ju}(r/(1+r)))}.$$

In deriving this expression, we assume that the environment is stationary so that the optimal policy at time $t$ is also the optimal policy at $t'$ provided that the state variables are the same in each period.
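Because $V_{ju}$ appears on both sides of this expression, it has to be solved numerically. The sketch below is one way to do so under assumptions that are not in the text: wage offers take a finite set of values with known probabilities, $V_{ju} > V_N$ holds, and the parameter values are purely illustrative. It iterates on (6.14a) with (6.14c) substituted in, then evaluates the right-hand side of the rearranged expression so the two can be compared. All function names are hypothetical.

```python
import numpy as np


def value_employed(y, v_u, r, sigma):
    """V_je from (6.14c) for wage y and unemployment value v_u."""
    return sigma / (r + sigma) * v_u + (1 + r) * y / (r + sigma)


def solve_value_unemployed(wages, probs, N, c, lam, r, sigma,
                           tol=1e-10, max_iter=100_000):
    """Fixed point of (6.14a) by successive approximation."""
    wages = np.asarray(wages, dtype=float)
    probs = np.asarray(probs, dtype=float)
    v_u = 0.0
    for _ in range(max_iter):
        v_e = value_employed(wages, v_u, r, sigma)
        expected_max = np.sum(probs * np.maximum(v_e, v_u))
        v_new = N - c + lam / (1 + r) * expected_max + (1 - lam) / (1 + r) * v_u
        if abs(v_new - v_u) < tol:
            return v_new
        v_u = v_new
    raise RuntimeError("value iteration did not converge")


def closed_form_rhs(v_u, wages, probs, N, c, lam, r, sigma):
    """Right-hand side of the rearranged expression, evaluated at v_u."""
    wages = np.asarray(wages, dtype=float)
    probs = np.asarray(probs, dtype=float)
    accept = wages > v_u * r / (1 + r)          # offers above the reservation wage
    p_accept = probs[accept].sum()
    # E(V_je | V_je > V_ju) * Pr(accept), computed as a partial sum.
    partial = np.sum(probs[accept] * value_employed(wages[accept], v_u, r, sigma))
    return ((1 + r) * (N - c) + lam * partial) / (r + lam * p_accept)


if __name__ == "__main__":
    params = dict(N=2.0, c=1.0, lam=0.3, r=0.05, sigma=0.1)
    wages, probs = [5.0, 10.0, 15.0], [0.3, 0.4, 0.3]
    v_u = solve_value_unemployed(wages, probs, **params)
    print(v_u, closed_form_rhs(v_u, wages, probs, **params))
```

Running the solver once per option and taking the largest value gives the choice among options described next in the text.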

The optimal search strategy is

$$\hat{j} = \arg\max_j \{V_{ju}\}$$

provided that $V_{ju} > V_N$ for at least one $j$. The lower $c_j$ and the higher $\lambda_j$, the more attractive is option $j$. The larger the $F_j$, in the sense that $j$ stochastically dominates $j'$ ($F_j(x) < F_{j'}(x)$), so that more of the mass of $F_j$ is in the upper portion of the distribution, the more attractive is option $j$. Given the search options available to individuals, enrollment in a job training program may be the most effective option.

The probability that a training spell from option $j$ lasts $T_j = t_j$ periods or more is

$$\Pr(T_j \geq t_j) = [1 - \lambda_j(1 - F_j(V_{ju}(r/(1+r))))]^{t_j},$$

where $1 - \lambda_j(1 - F_j(V_{ju}(r/(1+r))))$ is the sum of the probability of receiving no offer ($1 - \lambda_j$) plus the probability of receiving an offer that is not acceptable ($\lambda_j F_j(V_{ju}(r/(1+r)))$). This model is nonlinear in the basic parameters. Because of this nonlinearity, many estimators relying on additive separability of the unobservables, such as difference-in-differences or the fixed effect schemes for eliminating unobservables, are ineffective evaluation estimators.
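The duration formula is a geometric survival probability, which the following short sketch evaluates. The function names and the numerical values are illustrative assumptions; `cdf_at_reservation` stands for $F_j(V_{ju}(r/(1+r)))$.

```python
def per_period_stay_probability(offer_rate, cdf_at_reservation):
    """Probability of no offer plus probability of an unacceptable offer."""
    no_offer = 1.0 - offer_rate
    rejected_offer = offer_rate * cdf_at_reservation
    return no_offer + rejected_offer


def prob_training_lasts(t, offer_rate, cdf_at_reservation):
    """Pr(T_j >= t_j): the person stays in option j for t consecutive periods."""
    return per_period_stay_probability(offer_rate, cdf_at_reservation) ** t


if __name__ == "__main__":
    # Offers arrive with probability 0.3; 40 percent of offers fall below
    # the reservation wage and are rejected.
    for t in range(0, 6):
        print(t, round(prob_training_lasts(t, 0.3, 0.4), 4))
```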

This simple model summarizes the available empirical evidence on job training programs. (a) It rationalizes variability in the length of time persons with identical characteristics spend in training. Persons receive different wage offers at different times and leave the program to accept the wage offers at different dates. (b) It captures the notion that training programs might raise the rate of job arrivals, the $\lambda_j$ (this is an essential function of “job search assistance” programs), or they might produce skills by improving the $F_j$, or both. (c) It accounts for why there might be recidivism back into training programs. As jobs are terminated (at rate $\sigma_{je}$), persons re-enter the program to search for a replacement job. Recidivism is an important feature of major job training programs. Trott and Baj (1993) estimate that as many as 20 percent of all JTPA program participants in Northern Illinois have been in the program at least twice with the modal number being three. This has important implications for the contamination bias problem that we discuss in Section 7.7.

A less attractive feature of the model is that persons do not switch search strategies. This is a consequence of the assumed stationarity of the environment and the assumption that agents know both arrival rates and wage offer distributions. Relaxing the stationarity assumption produces switching among strategies, which seems to be consistent with the evidence. A more general, but less analytically tractable, model allows for learning about wage offer distributions as in Weitzman (1979). In such a model, persons may switch strategies as they learn about the arrival rates or the wage offers obtained under a given strategy. The learning can take place within each type of program and may also entail word-of-mouth learning from fellow trainees taking the option.

Weitzman’s model captures this idea in a very simple way and falls within the Gittins index framework. The basic idea is as follows. Persons have $J$ search options. They pick the option with the highest value and take a draw from it. They accept the draw if the value of the realized draw is better than the expected value of the best remaining option. Otherwise they try out the latter option. If the draws from the $J$ options are independently distributed, a Gittins-index strategy describes this policy. In this framework, unemployed persons may try a variety of options, including job training, before they take a job or drop out of the labor force.

One could also extend this model to allow the value of non-market time, $N$, to become stochastic. If $N$ fluctuates, persons would enter or exit the labor force depending on the value of $N$. Adding this feature captures the employment dynamics of trainees described by Card and Sullivan (1988).

In this more general model, shocks to the value of leisure or termination of previous jobs make persons contemplate taking training. Whether or not they do so depends on the value of training compared to the value of other strategies for finding jobs. Allowing for these considerations produces a model broadly consistent with the evidence presented in Heckman and Smith (1998b) that persons enter training as a consequence of displacement from both the market and nonmarket sector.

The full details of this model remain to be developed (see Heckman and Smith, 1999, for a start). We suggest that future analyses of program participation be based on this empirically more concordant model. For the rest of this chapter, however, we take decision rule (6.3) as canonical in order to motivate and justify the choice of alternative econometric estimators. We urge our readers to modify our analysis to incorporate the lessons from the framework of labor force dynamics sketched here.

6.4 The Role of Program Eligibility Rules In Determining Participation

Several institutional features of most training programs suggest that the participation rule is more complex than that characterized by the simple model presented above in Section 6.2. For example, eligibility for training is often based on a set of objective criteria, such as current or past earnings being below some threshold. In this instance, individuals can take training at time $k$ only if they have had low earnings, regardless of its potential benefit to them. For example, enrollees satisfy

(6.15) $\alpha/r - Y_{ik} - c_i > 0$ and the eligibility rule $Y_{i,k-1} < K$,

where $K$ is a cutoff level. More general eligibility rules can be analyzed in the same framework.
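Rule (6.15) combines a self-selection condition with an eligibility screen, which the following minimal sketch writes out. The function name, argument names and numerical values are illustrative assumptions.

```python
def enrolls(alpha, r, y_k, cost, y_prev, cutoff):
    """Rule (6.15): positive net gain and last-period earnings below cutoff K."""
    wants_training = alpha / r - y_k - cost > 0   # alpha/r - Y_ik - c_i > 0
    eligible = y_prev < cutoff                    # Y_{i,k-1} < K
    return wants_training and eligible


if __name__ == "__main__":
    # Same expected gain, different earnings in period k - 1.
    print(enrolls(alpha=2.0, r=0.1, y_k=10.0, cost=5.0, y_prev=8.0, cutoff=9.0))
    print(enrolls(alpha=2.0, r=0.1, y_k=10.0, cost=5.0, y_prev=12.0, cutoff=9.0))
```

The second person would gain from training but is screened out by the earnings cutoff, which is the sense in which eligibility applies regardless of potential benefit.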

The universality of Ashenfelter’s dip in pre-program earnings among program participants occurs despite the substantial variation in eligibility rules among training programs. This suggests that earnings or employment dynamics drive the participation process and that Ashenfelter’s dip is not an artifact of eligibility rules. Few major training programs in the United States have required earnings declines to qualify for program eligibility. Certain CETA programs in the late 1970s required participants to be unemployed during the period just prior to enrollment, while NSW required participants to be unemployed at the date of enrollment. MDTA contained no eligibility requirements, but restricted training stipends to persons who were unemployed or “underemployed.”38 For the JTPA program, eligibility has been confined to the economically disadvantaged (defined by low family income over the past six months, participation in a cash welfare program or Food Stamps, or being a foster child or disabled). There is also a 10 percent “audit window” of eligibility for persons facing other unspecified “barriers to employment.”

38 Eligibility for CETA varied by subprogram. CETA’s controversial Public Sector Employment (PSE) program required participants to have experienced a minimum number of days of unemployment or “underemployment” just prior to enrollment. In general, persons became eligible for other CETA programs by having a low income or limited ability in English. Considerable discretion was left to the states and training centers to determine who enrolled in the program. By contrast, the NSW eligibility requirements were quite specific. Adult women had to be on AFDC at the time of enrollment, have received AFDC for 30 of the last 36 months, and have a youngest child age six years or older. Youth in the NSW had to be age 17-20 years with no high school diploma or equivalency degree and have not been in school in the past six months. In addition, fifty percent of youth participants had to have had some contact with the criminal justice system (Hollister, et al., 1984).

It is possible that Ashenfelter’s dip results simply from a mechanical operation of program eligibility rules that condition on recent earnings. Such rules select individuals with particular types of earnings patterns into the eligible population. To illustrate this point, consider the monthly earnings of adult males who were eligible for JTPA in a given month from the 1986 panel of the U.S. Survey of Income and Program Participation (SIPP). For most people, eligibility is determined by family earnings over the past six months. The mean monthly earnings of adult males appear in Figure 4.1 aligned relative to month ‘k,’ the month when eligibility is measured. The figure reveals a dip in the mean earnings of adult male eligibles centered in the middle of the six month window over which family income is measured when determining JTPA eligibility.

Figure 4.1 also displays the mean earnings of adult males in the experimental control group from the NJS.39 The earnings dip for the controls, who applied and were admitted in the program, is larger than for the sample of JTPA eligibles from the SIPP. Moreover, this dip reaches its minimum during month ‘k’ rather than three or four months before as would be indicated by the operation of eligibility rules. The substantial difference between the mean earnings patterns of JTPA participants and eligibles implies that Ashenfelter’s dip does not result from the mechanical operation of program eligibility rules.40

39 Such data were collected at four of the 16 training centers that participated in the study.

40 Devine and Heckman (1996) present certain nonstationary family income processes that can generate Ashenfelter’s dip from the application of JTPA eligibility rules. However, in their empirical work they find a dip centered at $k - 3$ or $k - 4$ for adult men and adult women, but no dip for male and female youth.

6.5 Administrative Discretion and the Efficiency and Equity of Training Provision

Training participation also often depends on discretionary choices made by program operators. Recent research focuses on how program operators allocate training services among groups and on how administrative performance standards affect the allocation of these services. The main question that arises in these studies is the potential trade-off between equity and efficiency, and the potential conflict between social objectives and program operators’ incentives. An efficiency criterion that seeks to maximize the social return to public training investments, regardless of the implications for income distribution, implies focusing training resources on those groups for whom the impact is largest (per dollar spent). In contrast, equity and redistributive criteria dictate focusing training resources on groups who are most in “need” of services.

These goals of efficiency and equity are written into the U.S. Job Training Partnership Act.41 Whether or not these twin goals conflict with each other depends on the empirical relationship between initial skill levels and the impact of training. As we discuss below in Section 10, the impact of training appears to vary on the basis of observable characteristics, such as sex, age, race and what practitioners call “barriers to employment”: low schooling, lack of employment experience and so on. These twin goals would be in conflict if the largest social returns resulted from training the most job ready applicants.

41 A related issue involves differences in the types of services provided to different groups conditional on participation in a program. The U.S. General Accounting Office (1991) finds such differences alarming in the JTPA program. Smith (1992) argues that they result from differences across groups in readiness for immediate employment and in the availability of income support during classroom training.

In recent years, especially in the United States, policymakers have used administrative performance standards to assess the success of program operators in different training sites. Under JTPA, these standards are based primarily on average employment rates and average wage rates of trainees shortly after they leave training. The target levels for each site are adjusted based on a regression model that attempts to hold constant features of the environment over which the local training site has no control, such as racial composition.42 Sites whose performance exceeds these standards may be rewarded with additional funding; those that fall below may be sanctioned. The use of such performance standards, instead of measures of the impact of training, raises the issue of “cream-skimming” by program operators (Bassi, 1984). Program staff concerned solely with their site’s performance relative to the standard should admit into the program applicants who are likely to be employed at good wages (the “cream”) regardless of whether or not they benefit from the program. By contrast, they should avoid applicants who are less likely to be employed after leaving training or have low expected wages, even if the impact of the training for such persons is likely to be large. The implications of cream-skimming for equity are clear. If it exists, program operators are directing resources away from those most in need. However, its implications for efficiency depend on the empirical relationship between short-term outcome levels and long-term impacts. If applicants who are likely to be subsequently employed also are those who benefit the most from the program, performance standards indirectly encourage the efficient provision of training services.43

42 See Heckman and Smith (1997d) and the essays in Heckman (1998b) for more detailed descriptions of the JTPA performance standards system. Similar systems based on the JTPA system now form a part of most U.S. training programs.

A small literature examines the empirical importance of cream-skimming in JTPA programs. Anderson, et al. (1991) and Anderson, et al. (1993) look for evidence of cream-skimming by comparing the observable characteristics of JTPA participants and individuals eligible for JTPA. They report evidence of cream-skimming, defined in their study as the case in which individuals with fewer barriers to employment have differentially higher probabilities of participating in training. However, this finding may result not from cream-skimming by JTPA staff, but because among those in the JTPA eligible population, more employable persons self-select into training.44

Two more recent studies address this problem. Using data from the NJS, Heckman and Smith (1998e) decompose the process of participation in JTPA into a series of stages. They find that much of what appears to be cream-skimming in simple comparisons between participants’ and eligibles’ characteristics is self-selection. For example, high school dropouts are very unlikely to be aware of JTPA and as a result are unlikely ever to apply. To assess the role of cream-skimming, Heckman, Smith and Taber (1996) study a sample of applicants from one of the NJS training centers. They find that program staff at this training center do not cream-skim, and appear instead to favor the hard-to-serve when deciding whom to admit into the program. Such evidence suggests that cream-skimming may not be of major empirical importance, perhaps because the social service orientation of JTPA staff moderates the incentives provided by the performance standards system, or because of local political incentives to serve more disadvantaged groups. For programs in Norway, Aakvik (1998) finds strong evidence of negative selection of participants on outcomes. Heinrich (1998) reports just the opposite for a job training program in the United States. At this stage no universal generalization about bureaucratic behavior regarding cream-skimming is possible.

Studies based on the NJS also provide evidence on the implications of cream-skimming, even if it were to exist. Heckman, Smith and Clements (1997) find that except for those who are very unlikely to be employed, the impact of training does not vary with the expected levels of employment or earnings in the absence of training. This finding indicates that the impact on efficiency of cream-skimming (or alternatively the efficiency cost of serving the hard-to-serve) is low. Similarly, (1998d) find little empirical relationship between the outcome measures used in the JTPA performance standards system and experimental estimates of the impact of JTPA training. These findings suggest that cream-skimming has little impact on efficiency, and that administrative performance standards, to the extent that they affect who is served, do little to increase either the efficiency or equity of training provision.

43 Heckman and Smith (1997d) discuss this issue in greater depth. The discussion in the text presumes that the costs of training provided to different groups are roughly equal.

44 Program staff often have some control over who applies through their decisions about where and how much to publicize the program. However, this control is much less important than their ability to select among program applicants.
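The efficiency argument can be illustrated with a small simulation. The sketch below is not taken from the studies cited; it assumes a single hypothetical site, made-up earnings distributions, and, mirroring the finding described above, a training impact that is statistically independent of the no-training outcome. The function name `compare_admission_rules` and all parameter values are illustrative assumptions.

```python
import numpy as np


def compare_admission_rules(n=100_000, admit_share=0.5, seed=0):
    """Compare cream-skimming with random admission at one hypothetical site."""
    rng = np.random.default_rng(seed)
    # Hypothetical earnings each applicant would have without training.
    y0 = rng.normal(loc=10.0, scale=3.0, size=n)
    # ASSUMPTION: the training impact is drawn independently of y0.
    impact = rng.normal(loc=1.0, scale=1.0, size=n)
    y1 = y0 + impact  # earnings after training

    k = int(n * admit_share)
    cream = np.argsort(y0)[-k:]  # admit the k applicants with the highest y0
    lottery = rng.choice(n, size=k, replace=False)  # admit k applicants at random

    return {
        "cream": {"outcome": float(y1[cream].mean()),
                  "impact": float(impact[cream].mean())},
        "lottery": {"outcome": float(y1[lottery].mean()),
                    "impact": float(impact[lottery].mean())},
    }


if __name__ == "__main__":
    for rule, stats in compare_admission_rules().items():
        print(f"{rule:8s} mean outcome = {stats['outcome']:.2f}, "
              f"mean impact = {stats['impact']:.2f}")
```

Under these assumptions, admitting the most job-ready applicants raises the post-program outcome level that a performance standard would reward, while the mean impact among those served is essentially the same as under random admission. If impacts instead rose or fell with the no-training outcome, the two rules would also differ in efficiency.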

6.6 The Conflict Between the Economic Approach to Program Evaluation and the Modern Approach to Social Experiments

We have already noted in Section 5 that under ideal conditions, social experiments identify $E(Y_1 - Y_0 \mid X, D = 1)$. Without further assumptions and econometric manipulation, they do not answer the other evaluation questions posed in Section 3. As a consequence of the self-selected nature of the samples generated by social experiments, the data produced from them are far from ideal for estimating the structural parameters of behavioral models. This makes it difficult to generalize findings across experiments or to use experiments to identify the policy-invariant structural parameters that are required for econometric policy evaluation.

To see this, recall that social experiments balance bias, but they do not eliminate the dependence between $U_0$ and $D$ or $U_1$ and $D$. Thus from the experiments conducted under ideal conditions, we can recover the conditional densities $f(y_0 \mid X, D = 1)$ and $f(y_1 \mid X, D = 1)$. From nonparticipants we can recover $f(y_0 \mid X, D = 0)$. It is the density $f(y_0 \mid X, D = 1)$ that is the new information produced from social experiments. The other densities are available from observational data. All of these densities condition on choices. Knowledge of the conditional means

$$E(Y_0 \mid X, D = 1) = g_0(X) + E(U_0 \mid X, D = 1)$$

and

$$E(Y_1 \mid X, D = 1) = g_1(X) + E(U_1 \mid X, D = 1)$$

does not allow us to separately identify the structure $(g_0(X), g_1(X))$ from the conditional error terms without invoking the usual assumptions made in the nonexperimental selection literature. Moreover, the error processes for $U_0$ and $U_1$ conditional on $D = 1$ are fundamentally different than those in the population at large if participation in the program depends, in part, on $U_0$ and $U_1$.
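Subtracting the first conditional mean from the second makes the confounding of structure and error explicit. As a worked step, assuming only the additively separable outcome equations $Y_j = g_j(X) + U_j$, $j = 0, 1$, that the two conditional means presuppose, the parameter identified by an ideal experiment can be written as

$$E(Y_1 - Y_0 \mid X, D = 1) = \left[ g_1(X) - g_0(X) \right] + E(U_1 - U_0 \mid X, D = 1).$$

The bracketed term is the structural difference. The second term depends on how participants are selected, and it vanishes only in special cases, for example when $U_1 = U_0$ or when participation is unrelated to $U_1 - U_0$ given $X$.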


For these reasons, evidence from social experiments on programs with different participation and eligibility rules does not cumulate in any interpretable way. The estimated treatment effects reported from the experiments combine structure and error in different ways, and the conditional means of the outcomes bear no simple relationship to $g_0(X)$ or $g_1(X)$ ($X\beta_0$ and $X\beta_1$ in a linear regression setting). Thus it is not possible, without conducting a nonexperimental selection study, to relate the conditional means or regression functions obtained from a social experiment to a core set of policy-invariant structural parameters. Ham and LaLonde (1996) present one of the few attempts to recover structural parameters from a randomized experiment, where randomization was administered at the stage where persons applied and were accepted into the program. The complexity of their analysis is revealing about the difficulty of recovering structural parameters from social experiments.
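A small simulation can make the non-cumulation point concrete. The sketch below is purely illustrative and is not the authors' analysis: it assumes a Roy-style participation rule in which a person applies when the individual gain $Y_1 - Y_0$ exceeds a program-specific cost, and it uses made-up structural constants `G0` and `G1`. Two hypothetical programs share the same structure but differ in the cost threshold, and each is evaluated by an ideal experiment that randomizes among applicants.

```python
import numpy as np

# ASSUMPTION: hypothetical structural means; the structural gain G1 - G0 is 1.0.
G0, G1 = 10.0, 11.0


def experimental_estimate(cost, n=400_000, seed=0):
    """Ideal-experiment estimate of treatment on the treated for one program."""
    rng = np.random.default_rng(seed)
    u0 = rng.normal(0.0, 1.0, n)  # unobservable in the no-treatment outcome
    u1 = rng.normal(0.0, 1.0, n)  # unobservable in the treatment outcome
    y0 = G0 + u0
    y1 = G1 + u1

    # ASSUMPTION: Roy-style rule, so participation depends on U0 and U1.
    applies = (y1 - y0) > cost
    # Randomization is administered among applicants only.
    assigned_treatment = rng.random(n) < 0.5
    treated = applies & assigned_treatment
    controls = applies & ~assigned_treatment

    # Treated applicants reveal Y1; randomized-out applicants reveal Y0.
    return float(y1[treated].mean() - y0[controls].mean())


if __name__ == "__main__":
    for cost in (0.0, 2.0):
        print(f"participation cost {cost:.1f}: "
              f"experimental estimate = {experimental_estimate(cost):.2f}")
```

Both experiments are internally valid for their own participants, yet under these assumptions the two estimates differ from each other and from the common structural gain of 1.0, because each adds a different selection term to the same structure.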

In bypassing the need to specify economic models, many recent social experiments produce evidence that is not informative about them. They generate choice-based, endogenously stratified samples that are difficult to use in addressing any other economic question apart from the narrow question of determining the impact of treatment on the treated for one program with one set of participation and eligibility rules.
