

arXiv:1410.0163v1 [stat.ME] 1 Oct 2014

    Statistical Science

2014, Vol. 29, No. 3, 323–358
DOI: 10.1214/14-STS480
© Institute of Mathematical Statistics, 2014

Instrumental Variables: An Econometrician’s Perspective¹

    Guido W. Imbens

Abstract. I review recent work in the statistics literature on instrumental variables methods from an econometrics perspective. I discuss some of the older, economic, applications including supply and demand models and relate them to the recent applications in settings of randomized experiments with noncompliance. I discuss the assumptions underlying instrumental variables methods and in what settings these may be plausible. By providing context to the current applications, a better understanding of the applicability of these methods may arise.

Key words and phrases: Simultaneous equations models, randomized experiments, potential outcomes, noncompliance, selection models.

    1. INTRODUCTION

Instrumental Variables (IV) refers to a set of methods developed in econometrics starting in the 1920s to draw causal inferences in settings where the treatment of interest cannot be credibly viewed as randomly assigned, even after conditioning on additional covariates, that is, settings where the assumption of no unmeasured confounders does not hold.²

Guido W. Imbens is the Applied Econometrics Professor and Professor of Economics, Graduate School of Business, Stanford University, Stanford, California 94305, USA and NBER. E-mail: [email protected]; URL: http://www.gsb.stanford.edu/users/imbens.

¹Discussed in 10.1214/14-STS494, 10.1214/14-STS485, 10.1214/14-STS488, 10.1214/14-STS491; rejoinder at 10.1214/14-STS496.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2014, Vol. 29, No. 3, 323–358. This reprint differs from the original in pagination and typographic detail.

²There is another literature in econometrics using instrumental variables methods also to deal with classical measurement error (where explanatory variables are measured with error that is independent of the true values). My remarks in the current paper do not directly reflect on the use of instrumental variables to deal with measurement error. See Sargan (1958) for a classical paper, and Hillier (1990) and Arellano (2002) for more recent discussions.

In the last two decades, these methods have attracted considerable attention in the statistics literature. Although this recent statistics literature builds on the earlier econometric literature, there are nevertheless important differences. First, the recent statistics literature primarily focuses on the binary treatment case. Second, the recent literature explicitly allows for treatment effect heterogeneity. Third, the recent instrumental variables literature (starting with Imbens and Angrist (1994); Angrist, Imbens and Rubin (1996); Heckman (1990); Manski (1990); and Robins (1986)) explicitly uses the potential outcome framework used by Neyman for randomized experiments and generalized to observational studies by Rubin (1974, 1978, 1990). Fourth, in the applications this literature has concentrated on, including randomized experiments with noncompliance, the intention-to-treat or reduced-form estimates are often of greater interest than they are in the traditional econometric simultaneous equations applications.

Partly the recent statistics literature has been motivated by the earlier econometric literature on instrumental variables, starting with Wright (1928) (see the discussion on the origins of instrumental variables in Stock and Trebbi (2003)). However, there are also other antecedents, outside of the traditional econometric instrumental variables literature, notably the work by Zelen on encouragement designs (Zelen, 1979, 1990). Early papers in the recent statistics literature include Angrist, Imbens and Rubin




(1996), Robins (1989) and McClellan and Newhouse (1994). Recent reviews include Rosenbaum (2010), Vansteelandt et al. (2011) and Hernán and Robins (2006). Although these reviews include many references to the earlier economics literature, it might still be useful to discuss the econometric literature in more detail to provide some background and perspective on the applicability of instrumental variables methods in other fields. In this discussion, I will do so.

Instrumental variables methods have been a central part of the econometrics canon since the first half of the twentieth century, and continue to be an integral part of most graduate and undergraduate textbooks (e.g., Angrist and Pischke, 2009; Bowden and Turkington (1984); Greene (2011); Hayashi (2000); Manski (1995); Stock and Watson (2010); Wooldridge, 2010, 2008). Like the statisticians Fisher and Neyman (Fisher (1925); Splawa-Neyman, 1990), early econometricians such as Wright (1928), Working (1927), Tinbergen (1930) and Haavelmo (1943) were interested in drawing causal inferences, in their case about the effect of economic policies on economic behavior. However, in sharp contrast to the statistical literature on causal inference, the starting point for these econometricians was not the randomized experiment. From the outset, there was a recognition that in the settings they studied, the causes, or treatments, were not assigned to passive units (economic agents in their setting, such as individuals, households, firms or countries). Instead the economic agents actively influence, or even explicitly choose, the level of the treatment they receive. Choice, rather than chance, was the starting point for thinking about the assignment mechanism in the econometrics literature. In this perspective, units receiving the active treatment are different from those receiving the control treatment not just because of the receipt of the treatment: they (choose to) receive the active treatment because they are different to begin with. This makes the treatment potentially endogenous, and creates what is sometimes in the econometrics literature referred to as the selection problem (Heckman (1979)).

The early econometrics literature on instrumental variables did not have much impact on thinking in the statistics community. Although some of the technical work on large sample properties of various estimators did get published in statistics journals (e.g., the still influential Anderson and Rubin, 1949 paper), applications by noneconomists were rare. It is

not clear exactly what the reasons for this are. One possibility is the fact that the early literature on instrumental variables was closely tied to substantive economic questions (e.g., interventions in markets), using theoretical economic concepts that may have appeared irrelevant or difficult to translate to other fields (e.g., supply and demand). This may have suggested to noneconomists that the instrumental variables methods in general had limited applicability outside of economics. The use of economic concepts was not entirely unavoidable, as the critical assumptions underlying instrumental variables methods are substantive and require subtle subject matter knowledge. A second reason may be that although the early work by Tinbergen and Haavelmo used a notation that is very similar to what Rubin (1974) later called the potential outcome notation, quickly the literature settled on a notation only involving realized or observed outcomes; see for a historical perspective Hendry and Morgan (1992) and Imbens (1997). This realized-outcome notation that remains common in the econometric textbooks obscures the connections between the Fisher and Neyman work on randomized experiments and the instrumental variables literature. It is only in the 1990s that econometricians returned to the potential outcome notation for causal questions (e.g., Heckman (1990); Manski (1990); Imbens and Angrist (1994)), facilitating and initiating a dialogue with statisticians on instrumental variable methods.

The main theme of the current paper is that the early work in econometrics is helpful in understanding the modern instrumental variables literature, and furthermore, is potentially useful in improving applications of these methods and identifying potential instruments. These methods may in fact be useful in many settings statisticians study. Exposure to treatment is rarely solely a matter of chance or solely a matter of choice. Both aspects are important and help to understand when causal inferences are credible and when they are not. In order to make these points, I will discuss some of the early work and put it in a modern framework and notation. In doing so, I will address some of the concerns that have been raised about the applicability of instrumental variables methods in statistics. I will also discuss some areas where the recent statistics literature has extended and improved our understanding of instrumental variables methods. Finally, I will review some of the econometric terminology and relate it to the statistical literature to remove some of the semantic


barriers that continue to separate the literatures. I should emphasize that many of the topics discussed in this review continue to be active research areas, about which there is considerable controversy both inside and outside of econometrics.

The remainder of the paper is organized as follows. In Section 2, I will discuss the distinction between the statistics literature on causality with its primary focus on chance, arising from its origins in the experimental literature, and the econometrics or economics literature with its emphasis on choice. The next two sections discuss in detail two classes of examples. In Section 3, I discuss the canonical example of instrumental variables in economics, the estimation of supply and demand functions. In Section 4, I discuss a modern class of examples, randomized experiments with noncompliance. In Section 5, I discuss the substantive content of the critical assumptions, and in Section 6, I link the current literature to the older textbook discussions. In Section 7, I discuss some of the recent extensions of traditional instrumental variables methods. Section 8 concludes.

2. CHOICE VERSUS CHANCE IN TREATMENT ASSIGNMENT

Although the objectives of causal analyses in statistics and econometrics are very similar, traditionally statisticians and economists have approached these questions very differently. A key difference in the approaches taken in the statistical and econometric literatures is the focus on different assignment mechanisms, those with an emphasis on chance versus those with an emphasis on choice. Although in practice in many observational studies assignment mechanisms have elements of both chance and choice, the traditional starting points in the two literatures are very different, and it is only recently that these literatures have discovered how much they have in common.³

³In both literatures, it is typically assumed that there is no interference between units. In the statistics literature, this is often referred to as the Stable Unit Treatment Value Assumption (SUTVA, Rubin (1978)). In economics, there are many cases where this is not a reasonable assumption because there are general equilibrium effects. In an interesting recent experiment, Crépon et al. (2012) varied the scale of experimental interventions (job training programs in their case) in different labor markets and found that the scale substantially affected the average effects of the interventions. There is also a growing literature on settings directly modeling interactions. In this discussion, I will largely ignore the complications arising from interference between units. See, for example, Manski (2000a).

2.1 The Statistics Literature: The Focus on Chance

The starting point in the statistics literature, going back to Fisher (1925) and Splawa-Neyman (1990), is the randomized experiment, with both Fisher and Neyman motivated by agricultural applications where the units of analysis are plots of land. To be specific, suppose we are interested in the average causal effect of a binary treatment or intervention, say fertilizer A or fertilizer B, on plot yields. In the modern notation and language originating with Rubin (1974), the unit (plot) level causal effect is a comparison between the two potential outcomes, Y_i(A) and Y_i(B) [e.g., the difference τ_i = Y_i(B) − Y_i(A)], where Y_i(A) is the potential outcome given fertilizer A and Y_i(B) is the potential outcome given fertilizer B, both for plot i. In a completely randomized experiment with N plots, we select M (with M ∈ {1, ..., N − 1}) plots at random to receive fertilizer B, with the remaining N − M plots assigned to fertilizer A. Thus, the treatment assignment, denoted by X_i ∈ {A, B} for plot i, is by design independent of the potential outcomes.⁴ In this specific setting, the work by Fisher and Neyman shows how one can draw exact causal inferences. Fisher focused on calculating exact p-values for sharp null hypotheses, typically the null hypothesis of no effect whatsoever, Y_i(A) = Y_i(B) for all plots. Neyman focused on developing unbiased estimators for the average treatment effect ∑_i (Y_i(B) − Y_i(A))/N and the variance of those estimators.

The subsequent literature in statistics, much of it associated with the work by Rubin and coauthors (Cochran (1968); Cochran and Rubin (1973); Rubin, 1974, 1990, 2006; Rosenbaum and Rubin, 1983; Rubin and Thomas (1992); Rosenbaum, 2002, 2010; Holland (1986)), has focused on extending and generalizing the Fisher and Neyman results that were derived explicitly for randomized experiments to the more general setting of observational studies. A large part of this literature focuses on the case where


⁴To facilitate comparisons with the econometrics literature, I will follow the notation that is common in econometrics, denoting the endogenous regressors, here the treatment of interest, by X_i, and later the instruments by Z_i. Additional (exogenous) regressors will be denoted by V_i. In the statistics literature, the treatments of interest are often denoted by W_i, the instruments by Z_i, with X_i denoting additional regressors or attributes.


the researcher has additional background information available about the units in the study. The additional information is in the form of pretreatment variables or covariates not affected by the treatment. Let V_i denote these covariates. A key assumption in this literature is that conditional on these pretreatment variables the assignment to treatment is independent of the potential outcomes. Formally,

X_i ⊥ (Y_i(A), Y_i(B)) | V_i   (unconfoundedness).

Following Rubin (1990), I refer to this assumption as unconfoundedness given V_i, also known as no unmeasured confounders. This assumption, in combination with the auxiliary assumption that for all values of the covariates the probability of being assigned to each level of the treatment is strictly positive, is referred to as strong ignorability (Rosenbaum and Rubin, 1983). If we assume only that X_i ⊥ Y_i(A) | V_i and X_i ⊥ Y_i(B) | V_i rather than jointly, the assumption is referred to as weak unconfoundedness (Imbens (2000)), and the combination as weak ignorability. Substantively, it is not clear that there are cases in the setting with binary treatments where the weak version is plausible but not the strong version, although the difference between the two assumptions has some content in the multivalued treatment case (Imbens (2000)). In the econometric literature, closely related assumptions are referred to as selection-on-observables (Barnow, Cain and Goldberger (1980)) or exogeneity.

Under weak ignorability (and thus also under strong ignorability), it is possible to estimate precisely the average effect of the treatment in large samples. In other words, the average effect of the treatment is identified. Various specific methods have been proposed, including matching, subclassification and regression. See Rosenbaum (2010), Rubin (2006), Imbens (2004, 2014), Gelman and Hill (2006), Imbens and Rubin (2014) and Angrist and Pischke (2009) for general discussions and surveys. Robins and coauthors (Robins (1986); Gill and Robins (2001); Richardson and Robins (2013); Van der Laan and Robins, 2003) have extended this approach to settings with sequential treatments.
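To make the estimation step concrete, the following is a minimal Python sketch of one of the methods just mentioned, regression adjustment; the simulated data-generating process, the single covariate, and all parameter values are my own illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated covariate and potential outcomes (illustrative choices, not from the paper).
v = rng.normal(size=n)
y_a = 1.0 + 0.5 * v + rng.normal(size=n)   # outcome under treatment A
y_b = y_a + 2.0 + 0.3 * v                  # outcome under treatment B; true average effect = 2.0

# Unconfounded assignment: the probability of receiving B depends only on the observed covariate V.
p_b = 1 / (1 + np.exp(-v))
x = rng.uniform(size=n) < p_b              # True = B, False = A
y_obs = np.where(x, y_b, y_a)

# Naive difference in means is biased here because V shifts both assignment and outcomes.
naive = y_obs[x].mean() - y_obs[~x].mean()

# Regression adjustment: fit E[Y | X, V] with an interaction, then average the predicted
# difference over the empirical distribution of V.
design = np.column_stack([np.ones(n), x, v, x * v])
beta, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
d_b = np.column_stack([np.ones(n), np.ones(n), v, v])              # everyone assigned B
d_a = np.column_stack([np.ones(n), np.zeros(n), v, np.zeros(n)])   # everyone assigned A
ate_reg = (d_b @ beta - d_a @ beta).mean()

print(f"naive difference in means: {naive:.3f}")
print(f"regression-adjusted ATE:   {ate_reg:.3f}  (true value 2.0)")
```

Because assignment depends only on the observed covariate, the adjusted estimate recovers the average effect in this sketch, while the raw difference in means does not.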

2.2 The Econometrics Literature: The Focus on Choice

In contrast to the statistics literature whose point of departure was the randomized experiment, the starting point in the economics and econometrics literatures for studying causal effects emphasizes the choices that led to the treatment received. Unlike the original applications in statistics where the units are passive, for example, plots of land, with no influence over their treatment exposure, units in economic analyses are typically economic agents, for example, individuals, families, firms or administrations. These are agents with objectives and the ability to pursue these objectives within constraints. The objectives are typically closely related to the outcomes under the various treatments. The constraints may be legal, financial or information-based.

The starting point of economic science is to model these agents as behaving optimally. More specifically, this implies that economists think of every one of these agents as choosing the level of the treatment to most efficiently pursue their objectives given the constraints they face.⁵ In practice, of course, there is often evidence that not all agents behave optimally. Nevertheless, the starting point is the presumption that optimal behavior is a reasonable approximation to actual behavior, and the models economists take to the data often reflect this.

    2.3 Some Examples

Let us contrast the statistical and econometric approaches in a highly stylized example. Roy (1951) studies the problem of occupational choice and the implications for the observed distribution of earnings. He focuses on an example where individuals can choose between two occupations, hunting and fishing. Each individual has a level of productivity associated with each occupation, say, the total value of the catch per day. For individual i, the two productivity levels are Y_i(h) and Y_i(f), for the productivity level if hunting and fishing, respectively.⁶

Suppose the researcher is interested in the average difference in productivity in these two occupations, τ = E[Y_i(f) − Y_i(h)], where the averaging is over the population of individuals.⁷ The researcher observes for all units in the sample the occupation they chose

⁵In principle, these objectives may include the effort it takes to find the optimal strategy, although it is rare that these costs are taken into account.

⁶In this example, the no-interference (SUTVA) assumption that there are no effects of other individuals' choices and, therefore, that the individual level potential outcomes are well defined is tenuous (if one hunter is successful that will reduce the number of animals available to other hunters), but I will ignore these issues here.

⁷That is not actually the goal of Roy's original study, but that is beside the point here.


(X_i, equal to h for hunters and f for fishermen) and the productivity in their chosen occupation,

Y_i^obs = Y_i(X_i) = Y_i(h) if X_i = h, and Y_i(f) if X_i = f.

In the Fisher–Neyman–Rubin statistics tradition, one might start by estimating τ by comparing productivity levels by occupation:

τ̂ = Ȳ_f^obs − Ȳ_h^obs,

where

Ȳ_f^obs = (1/N_f) ∑_{i: X_i = f} Y_i^obs,   Ȳ_h^obs = (1/N_h) ∑_{i: X_i = h} Y_i^obs,

N_f = ∑_{i=1}^{N} 1_{X_i = f}   and   N_h = N − N_f.

If there is concern that these unadjusted differences are not credible as estimates of the average causal effect, the next step in this approach would be to adjust for observed individual characteristics such as education levels or family background. This would be justified if individuals can be thought of as choosing, at least within homogenous groups defined by covariates, randomly which occupation to engage in.

Roy, in the economics tradition, starts from a very different place. Instead of assuming that individuals choose their occupation (possibly after conditioning on covariates) randomly, he assumes that each individual chooses her occupation optimally, that is, the occupation that maximizes her productivity:

X_i = f if Y_i(f) ≥ Y_i(h), and X_i = h otherwise.

There need not be a solution in all cases, especially if there is interference, and thus there are general equilibrium effects, but I will assume here that such a solution exists. If this assumption about the occupation choice were strictly true, it would be difficult to learn much about τ from data on occupations and earnings. In the spirit of research by Manski (1990, 2000b, 2001), Manski and Pepper (2000), and Manski et al. (1992), one can derive bounds on τ, exploiting the fact that if X_i = f, then the unobserved Y_i(h) must satisfy Y_i(h) ≤ Y_i(f), with Y_i(f) observed. For the Roy model, the specific calculations have been reported in Manski (1995), Section 2.6. Without additional information or restrictions, these bounds might be fairly wide, and often one would not learn much about τ. However, the original version of the

Roy model, where individuals know ex ante the exact value of the potential outcomes and choose the level of the treatment corresponding to the maximum of those, is ultimately not plausible in practice. It is likely that individuals face uncertainty regarding their future productivity, and thus may not be able to choose the ex post optimal occupation; see for bounds under that scenario Manski and Nagin (1998). Alternatively, and this is emphasized in Athey and Stern (1998), individuals may have more complex objective functions taking into account heterogenous costs or nonmonetary benefits associated with each occupation. This creates a wedge between the outcomes that the researcher focuses on and the outcomes that the agent optimizes over. What is key here in relation to the statistics literature is that under the Roy model and its generalizations the very fact that two individuals have different occupations is seen as indicative that they have different potential outcomes, thus fundamentally calling into question the unconfoundedness assumption that individuals with similar pretreatment variables but different treatment levels are comparable. This concern about differences between individuals with the same values for pretreatment variables but different treatment levels underlies many econometric analyses of causal effects, specifically in the literature on selection models. See Heckman and Robb (1985) for a general discussion.
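The following small simulation is entirely my own illustrative construction (independent normal productivities with parameter values chosen for clarity, not anything specified by Roy or by this paper). It shows how the Roy assignment rule undermines the naive comparison: the difference in observed mean productivity across occupations bears little relation to τ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Illustrative potential productivities (my own parameter choices):
# fishing is slightly better on average but much more variable.
y_h = rng.normal(loc=1.0, scale=1.0, size=n)   # productivity if hunting
y_f = rng.normal(loc=1.2, scale=2.0, size=n)   # productivity if fishing

tau_true = (y_f - y_h).mean()                  # population average effect, about 0.2

# Roy's assignment rule: each individual picks the occupation with the higher productivity.
fish = y_f >= y_h
y_obs = np.where(fish, y_f, y_h)

# The "statistics tradition" estimator: compare observed productivity by occupation.
tau_naive = y_obs[fish].mean() - y_obs[~fish].mean()

print(f"true tau:                  {tau_true:+.3f}")
print(f"naive difference in means: {tau_naive:+.3f}")   # far from tau: selection on gains
```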

Let me discuss two additional examples. There is a large literature in economics concerned with estimating the causal effect of educational achievement (measured as years of education) on earnings; see for general discussions Griliches (1977) and Card (2001). One starting point, and in fact the basis of a large empirical literature, is to compare earnings for individuals who look similar in terms of background characteristics, but who differ in terms of educational achievement. The concern in an equally large literature is that those individuals who choose to acquire higher levels of education did so precisely because they expected their returns to additional years of education to be higher than individuals who choose not to acquire higher levels of education expected their returns to be. In the terminology of the returns-to-education literature, the individuals choosing higher levels of education may have higher levels of ability, which lead to higher earnings for given levels of education.

Another canonical example is that of voluntary job training programs. One approach to estimate


the causal effect of training programs on subsequent earnings would be to compare earnings for those participating in the program with earnings for those who did not. Again the concern would be that those who choose to participate did so because they expected bigger benefits (financial or otherwise) from doing so than individuals who chose not to participate.

These issues also arise in the missing data literature. The statistics literature (Rubin, 1976, 1987, 1996; Little and Rubin, 1987) has primarily focused on models that assume that units with item nonresponse are comparable to units with complete response, conditional on covariates that are always observed. The econometrics literature (Heckman, 1976, 1979) has focused more heavily on models that interpret the nonresponse as the result of systematic differences between units. Philipson (1997a, 1997b), Philipson and DeSimone (1997), and Philipson and Hedges (1998) take this even further, viewing survey response as a market transaction, where individuals not responding to the survey do so deliberately because the costs of responding outweigh the benefits to these nonrespondents. The Heckman-style selection models often assume strong parametric alternatives to the Little and Rubin missing-at-random or ignorability condition. This has often in turn led to estimators that are sensitive to small changes in the data generating process. See Little (1985).

These issues of nonrandom selection are of course not special to economics. Outside of randomized experiments, the exposure to treatment is typically also chosen to achieve some objectives, rather than randomly within homogenous populations. For example, physicians presumably choose treatments for their patients optimally, given their knowledge and given other constraints (e.g., financial). Similarly, in economics and other social sciences one may view individuals as making optimal decisions, but these are typically made given incomplete information, leading to errors that may make the ultimate decisions appear as good as random within homogenous subpopulations. What is important is that the starting point is different in the two disciplines, and this has led to the development of substantially different methods for causal inference.

    2.4 Instrumental Variables

How do instrumental variables methods address the type of selection issues the Roy model raises?

At the core, instrumental variables change the incentives for agents to choose a particular level of the treatment, without affecting the potential outcomes associated with these treatment levels. Consider a job training program example where the researcher is interested in the average effect of the training program on earnings. Each individual is characterized by two potential earnings outcomes, earnings given the training and earnings in the absence of the training. Each individual chooses to participate or not based on their perceived net benefits from doing so. As pointed out in Athey and Stern (1998), it is important that these net benefits that enter into the individual's decision differ from the earnings that are the primary outcome of interest to the researcher. They do so by the costs associated with participating in that regime. Suppose that there is variation in the costs individuals incur with participation in the training program. The costs are broadly defined, and may include travel time to the program facilities, or the effort required to become informed about the program. Furthermore, suppose that these costs are independent of the potential outcomes. This is a strong assumption, often made more plausible by conditioning on covariates. Measures of the participation cost may then serve as instrumental variables and aid in the identification of the causal effects of the program. Ultimately, we compare earnings for individuals with low costs of participation in the program with those for individuals with high costs of participation and attribute the difference in average earnings to the difference in the rate of participation in the program between the two groups.

In almost all cases, the assumption that there is no direct effect of the change in incentives on the potential outcomes is controversial, and it needs to be assessed on a case-by-case basis. The second part of the assumption, that the costs are independent of the potential outcomes, possibly after conditioning on covariates, is qualitatively very different. In some cases, it is satisfied by design, for example, if the incentives are randomized. In observational studies, it is a substantive, unconfoundedness-type, assumption, that may be more plausible or at least approximately hold after conditioning on covariates. For example, in a number of studies researchers have used physical distance to facilities as instruments for exposure to treatments available at such facilities. Such studies include McClellan and Newhouse (1994) and Baiocchi et al. (2010), who use distance to hospitals with particular capabilities as an instrument for

  • INSTRUMENTAL VARIABLES 7

treatments associated with those capabilities, after conditioning on distance to the nearest medical facility, and Card (1995), who uses distance to colleges as an instrument for attending college.
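As a stylized numerical sketch of the logic in the job training example (my own construction: the binary cost instrument, the constant treatment effect, and all parameter values are illustrative assumptions, not taken from any study cited here), the ratio of the earnings difference between low-cost and high-cost groups to the corresponding difference in participation rates recovers the program effect, even though the naive participant/nonparticipant comparison is badly biased.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Illustrative potential earnings; the training effect is 2.0 for everyone in this sketch.
y0 = rng.normal(loc=10.0, scale=3.0, size=n)   # earnings without training
y1 = y0 + 2.0                                  # earnings with training

# Binary instrument: low participation cost (z = True) vs. high cost (z = False),
# assigned independently of the potential earnings.
z = rng.uniform(size=n) < 0.5

# Participation depends on the cost and on an unobserved taste that is correlated
# with potential earnings; this is the selection problem.
taste = (y0 - 10.0) / 3.0 + rng.normal(size=n)
d = (0.8 * z + 0.5 * taste + rng.normal(size=n)) > 0.5
y_obs = np.where(d, y1, y0)

naive = y_obs[d].mean() - y_obs[~d].mean()

# IV ("Wald") estimate: the effect of the cost instrument on earnings divided by
# its effect on participation.
itt_y = y_obs[z].mean() - y_obs[~z].mean()
itt_d = d[z].mean() - d[~z].mean()
iv = itt_y / itt_d

print(f"naive participant/nonparticipant difference: {naive:+.3f}")
print(f"IV (Wald) estimate:                          {iv:+.3f}  (true effect 2.0)")
```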

3. THE CLASSIC EXAMPLE: SUPPLY AND DEMAND

In this section, I will discuss the classic example of instrumental variables methods in econometrics, that is, simultaneous equations. Simultaneous equations models are both at the core of the econometrics canon and at the core of the confusion concerning instrumental variables methods in the statistics literature. More precisely, in this section I will look at supply and demand models that motivated the original research into instrumental variables. Here, the endogeneity, that is, the violation of unconfoundedness, arises from an equilibrium condition. I will discuss the model in a very specific example to make the issues clear, as I think that perhaps the level of abstraction used in the older econometric textbooks has hampered communication with researchers in other fields.

    3.1 Discussions in the Statistics Literature

To show the level of frustration and confusion in the statistics literature with these models, let me present some quotes. In a comment on Pratt and Schlaifer (1984), Dawid (1984) writes "I despair of ever understanding the logic of simultaneous equations well enough to tackle them" (page 24). Cox (1992) writes in a discussion on causality "it seems reasonable that models should be specified in a way that would allow direct computer simulation of the data. . . . This, for example, precludes the use of y_2 as an explanatory variable for y_1 if at the same time y_1 is an explanatory variable for y_2" (page 294). This restriction appears to rule out the first model Haavelmo considers, that is, equations (1.1) and (1.2) (Haavelmo (1943), page 2):

Y = aX + ε_1,   X = bY + ε_2

(see also Haavelmo, 1944). In fact, the comment by Cox appears to rule out all simultaneous equations models of the type studied by economists. Holland (1988), in a comment on structural equation methods in econometrics, writes "why should [this disturbance] be independent of [the instrument] . . . when the very definition of [this disturbance] involves [the instrument]" (page 460). Freedman writes "Additionally, some variables are taken to be exogenous (independent of the disturbance terms) and some endogenous (dependent on the disturbance terms). The rationale is seldom clear, because—among other things—there is seldom any very clear description of what the disturbance terms mean, or where they come from" (Freedman (2006), page 699).

    3.2 The Market for Fish

The specific example I will use in this section is the market for whiting (a particular white fish, often used in fish sticks) traded at the Fulton fish market in New York City. Whiting was sold at the Fulton fish market at the time by a small number of dealers to a large number of buyers. Kathryn Graddy collected data on quantities and prices of whiting sold by a particular trader at the Fulton fish market on 111 days between December 2, 1991, and May 8, 1992 (Graddy, 1995, 1996; Angrist, Graddy and Imbens (2000)). I will take as the unit of analysis a day, and interchangeably refer to this as a market. Each day, or market, during the period covered in this data set, indexed by t = 1, ..., 111, a number of pounds of whiting are sold by this particular trader, denoted by Q_t^obs. Not every transaction on the same day involves the same price, but to focus on the essentials I will aggregate the total amount of whiting sold and the total amount of money it was sold for, and calculate a price per pound (in cents) for each of the 111 days, denoted by P_t^obs. Figure 1 presents a scatterplot of the observed log price and log quantity data. The average quantity sold over the 111 days was 6335 pounds, with a standard deviation of 4040 pounds, for an average of the average within-day prices of 88 cents per pound and a standard deviation of 34 cents. For example, on the first day of this period 8058 pounds were sold for an average of 65 cents, and the next day 2224 pounds were sold for an average of 100 cents. Table 1 presents averages of log prices and log quantities for the fish data.

Now suppose we are interested in predicting the effect of a tax in this market. To be specific, suppose the government is considering imposing a 100 × r% tax (e.g., a 10% tax) on all whiting sold, but before doing so it wishes to predict the average percentage change in the quantity sold as a result of the tax. We may formalize that by looking at the average effect on the logarithm of the quantity, τ = E[ln Q_t(r) − ln Q_t(0)], where Q_t(r) is the quantity traded in market/day t if the tax rate were set at r. The problem, substantially worse than in the standard causal inference setting where for some units


    Fig. 1. Scatterplot of log prices and log quantities.

we observe one of the two potential outcomes and for other units we observe the other potential outcome, is that in all 111 markets we observe the quantity traded at tax rate 0, Q_t^obs = Q_t(0), and we never see the quantity traded at the tax rate contemplated by the government, Q_t(r). Because only E[ln Q_t(0)] is directly estimable from data on the quantities we observe, the question is how to draw inferences about E[ln Q_t(r)].

A naive approach would be to assume that a tax increase by 10% would simply raise prices by 10%. If one additionally is willing to make the unconfoundedness assumption that prices can be viewed as set independently of market conditions on a particular day, it follows that those markets after the introduction of the tax where the price net of taxes is $1.00 would on average be like those markets prior to the introduction of the 10% tax where the price was $1.10. Formally, this approach assumes that

E[ln Q_t(r) | P_t^obs = p] = E[ln Q_t(0) | P_t^obs = (1 + r) × p],   (3.1)

implying that

E[ln Q_t(r) − ln Q_t(0) | P_t^obs = p]
  = E[ln Q_t^obs | P_t^obs = (1 + r) × p] − E[ln Q_t^obs | P_t^obs = p]
  ≈ E[ln Q_t^obs | ln P_t^obs = r + ln p] − E[ln Q_t^obs | ln P_t^obs = ln p].

The last quantity is often estimated using linear regression methods. Typically, the regression function is assumed to be linear in logarithms with constant coefficients,

ln Q_t^obs = α_ls + β_ls × ln P_t^obs + ε_t.   (3.2)

Table 1
Fulton fish market data (N = 111)

                Number of       Logarithm of price              Logarithm of quantity
                observations    Average   Standard deviation    Average   Standard deviation
All             111             −0.19     (0.38)                8.52      (0.74)

Stormy           32              0.04     (0.35)                8.27      (0.71)
Not-stormy       79             −0.29     (0.35)                8.63      (0.73)

Stormy           32              0.04     (0.35)                8.27      (0.71)
Mixed            34             −0.16     (0.35)                8.51      (0.77)
Fair             45             −0.39     (0.37)                8.71      (0.69)


Ordinary least squares estimation with the Fulton fish market data collected by Graddy leads to

ln Q̂_t^obs = 8.42 − 0.54 × ln P_t^obs,
            (0.08)   (0.18)

with standard errors in parentheses. The estimated regression line is also plotted in Figure 1. Interestingly, this is what Working (1927) calls the "statistical 'demand curve'," as opposed to the concept of a demand curve in economic theory. This simple regression, in combination with the assumption embodied in (3.1), suggests that the quantity traded would go down, on average, by 5.4% in response to a 10% tax:

τ̂ = −0.054 (s.e. 0.018).

Why does this answer, or at least the method by which it was derived, not make any sense to an economist? The answer assumes that prices can be viewed as independent of the potential quantities traded, or, in other words, unconfounded. This assignment mechanism is unrealistic. In reality, it is likely that the markets/days where, prior to the introduction of the tax, the price was $1.10 were systematically different from those where the price was $1.00. From an economist's perspective, the fact that the price was $1.10 rather than $1.00 implies that market conditions must have been different, and it is likely that these differences are directly related to the potential quantities traded. For example, on days where the price was high there may have been more buyers, or buyers may have been interested in buying larger quantities, or there may have been less fish brought ashore. In order to predict the effect of the tax, we need to think about the responses of both buyers and sellers to changes in prices, and about the determination of prices. This is where economic theory comes in.

    3.3 The Supply of and Demand for Fish

So, how do economists go about analyzing questions such as this one if not by regressing quantities on prices? The starting point for economists is to think of an economic model for the determination of prices (the treatment assignment mechanism in Rubin's potential outcome terminology). The first part of the simplest model an economist would consider for this type of setting is a pair of functions, the demand and supply functions. Think of the buyers coming to the Fulton fish market on a given market/day (say, day t) with a demand function Q_t^d(p). This function tells us, for that particular morning, how much fish all buyers combined would be willing to buy if the price on that day were p, for any value of p. This function is conceptually exactly like the potential outcomes setup commonly used in causal inference in the modern literature. It is more complicated than the binary treatment case with two potential outcomes, because there is a potential outcome for each value of the price, with more or less a continuum of possible price values, but it is in line with continuous treatment extensions such as those in Gill and Robins (2001). Common sense, and economic theory, suggests that this demand function is a downward sloping function: buyers would likely be willing to buy more pounds of whiting if it were cheaper. Traditionally, the demand function is specified parametrically, for example, linear in logarithms:

ln Q_t^d(p) = α_d + β_d × ln p + ε_t^d,   (3.3)

where β_d is the price elasticity of demand. This equation is not a regression function like (3.2). It is interpreted as a structural equation or behavioral equation, and in the treatment effect literature terminology, it is a model for the potential outcomes. Part of the confusion between the model for the potential outcomes in (3.3) and the regression function in (3.2) may stem from the traditional notation in the econometrics literature where the same symbol (e.g., Q_t) would be used for the observed outcomes (Q_t^obs in our notation) and the potential outcome function [Q_t^d(p) in our notation], and the same symbol (e.g., P_t) would be used for the observed value of the treatment (P_t^obs in our notation) and the argument in the potential outcome function (p in our notation). Interestingly, the pioneers in this literature, Tinbergen (1930) and Haavelmo (1943), did distinguish between these concepts in their notation, but the subsequent literature on simultaneous equations dropped that distinction and adopted a notation that did not distinguish between observed and potential outcomes. For a historical perspective see Christ (1994) and Stock and Trebbi (2003). My view is that dropping this distinction was merely incidental, and that implicitly the interpretation of the simultaneous equations models remained that in terms of potential outcomes.⁸

⁸As a reviewer pointed out, once one views simultaneous equations in terms of potential outcomes, there is a natural normalization of the equations. This suggests that perhaps the discussions of issues concerning normalizations of equations in simultaneous equations models (e.g., Basmann, 1963a, 1963b, 1965; Hillier (1990)) implicitly rely on a different interpretation, for example, thinking of the endogeneity arising from measurement error. Throughout this discussion, I will interpret simultaneous equations in terms of potential outcomes, viewing the realized outcome notation simply as obscuring that.


Implicit (by the lack of a subscript on the coefficients) in the specification of the demand function in (3.3) is the strong assumption that the effect of a unit change in the logarithm of the price (equal to β_d) is the same for all values of the price, and that the effect is the same in all markets. This is clearly a very strong assumption, and the modern literature on simultaneous equations (see Matzkin (2007) for an overview) has developed less restrictive specifications allowing for nonlinear and nonadditive effects while maintaining identification. The unobserved component in the demand function, denoted by ε_t^d, represents unobserved determinants of the demand on any given day/market: a particular buyer may be sick on a particular day and not go to the market, or may be expecting a client wanting to purchase a large quantity of whiting. We can normalize this unobserved component to have expectation zero, where the expectation is taken over all markets or days:

E[ln Q_t^d(p)] = α_d + β_d × ln p.

The interpretation of this expectation is subtle, and again it is part of the confusion that sometimes arises. Consider the expected demand at p = 1, E[ln Q_t^d(1)], under the linear specification in (3.3) equal to α_d + β_d · ln(1) = α_d. This α_d is the average of all demand functions, evaluated at price equal to $1.00, irrespective of what the actual price in the market is, where the expectation is taken over all markets. It is not, and this is key, the conditional expectation of the observed quantity in markets where the observed price is equal to $1.00 (or, which is the same thing, the demand function at 1 in those markets), which is E[ln Q_t^obs | P_t^obs = 1] = E[ln Q_t^d(1) | P_t^obs = 1]. The original Tinbergen and Haavelmo notation and the modern potential outcome version is helpful in making this distinction, compared to the sixties econometrics textbook notation.⁹


⁹Other notations have been recently proposed to stress the difference between the conditional expectation of the observed outcome and the expectation of the potential outcome. Pearl (2000) writes the expected demand when the price is set to $1.00 as E[ln Q_t^d | do(P_t = 1)], rather than conditional on the price being observed to be $1.00. Hernán and Robins (2006) write this average potential outcome as E[ln Q_t^d(P_t = 1)], whereas Lauritzen and Richardson (2002) write it as E[ln Q_t^obs ‖ P_t^obs = 1], where the double ‖ implies conditioning by intervention.

Similar to the demand function, the supply function Q_t^s(p) represents the quantity of whiting the sellers collectively are willing to sell at any given price p, on day t. Here, common sense would suggest that this function is sloping upward: the higher the price, the more the sellers are willing to sell. As with the demand function, the supply function is typically specified parametrically with constant coefficients:

ln Q_t^s(p) = α_s + β_s × ln p + ε_t^s,   (3.4)

where β_s is the price elasticity of supply. Again we can normalize the expectation of ε_t^s to zero (where the expectation is taken over markets), and write

E[ln Q_t^s(p)] = α_s + β_s × ln p.

Note that ε_t^d and ε_t^s are not assumed to be independent in general, although in some applications that may be a reasonable assumption. In this specific example, ε_t^d may represent random variation in the set or number of buyers coming to the market on a particular day, and ε_t^s may represent random variation in suppliers showing up at the market and in their ability to catch whiting during the preceding days. These components may well be uncorrelated, but there may be common components, for example, in traffic conditions around the market that make it difficult for both suppliers and buyers to come to the market.

    3.4 Market Equilibrium

Now comes the second part of the simple economic model, the determination of the price, or, in the terminology of the treatment effect literature, the assignment mechanism. The conventional assumption in this type of market is that the price that is observed, that is, the price at which the fish is traded in market/day t, is the (unique) market clearing price at which demand and supply are equal. In other words, this is the price at which the market is in equilibrium, denoted by P_t^obs. This equilibrium price solves

Q_t^d(P_t^obs) = Q_t^s(P_t^obs).   (3.5)

The observed quantity on that day, that is, the quantity actually traded, denoted by Q_t^obs, is then equal



to the demand function at the equilibrium price (or, equivalently, because of the equilibrium assumption, the supply function at that price):

Q_t^obs = Q_t^d(P_t^obs) = Q_t^s(P_t^obs).   (3.6)

Assuming that the demand function does slope downward and the supply function does slope upward, and both are linear in logarithms, the equilibrium price exists and is unique, and we can solve for the observed price and quantities in terms of the parameters of the model and the unobserved components:

ln P_t^obs = (α_d − α_s)/(β_s − β_d) + (ε_t^d − ε_t^s)/(β_s − β_d)   and

ln Q_t^obs = (β_s · α_d − β_d · α_s)/(β_s − β_d) + (β_s · ε_t^d − β_d · ε_t^s)/(β_s − β_d).

For economists, this is a more plausible model for the determination of realized prices and quantities than the model that assumes prices are independent of market conditions. It is not without its problems though. Chief among these from our perspective is the complication that, just as in the Roy model, we cannot necessarily infer the values of the unknown parameters in this model even if we have data on equilibrium prices and quantities P_t^obs and Q_t^obs for many markets.

Another issue is how buyers and sellers arrive at the equilibrium price. There is a theoretical economic literature addressing this question. Often the idea is that there is a sequential process of buyers making bids, and suppliers responding with offers of quantities at those prices, with this process repeating itself until it arrives at a price at which supply and demand are equal. In practice, economists often refrain from specifying the details of this process and simply assume that the market is in equilibrium. If the process is fast enough, it may be reasonable to ignore the specifics of the process and analyze the data as if equilibrium was instantaneous.¹⁰ A related issue is whether this model with an equilibrium price that equates supply and demand is a reasonable approximation to the actual process that determines prices and quantities. In fact, Graddy's data contain information showing that the seller would trade at different prices on the same day, so strictly

¹⁰See Shapley and Shubik (1977) and Giraud (2003), and for some experimental evidence, Plott and Smith (1987) and Smith (1982).

speaking this model does not hold. There is a long tradition in economics, however, of using such models as approximations to price determination and we will do so here.

Finally, let me connect this to the textbook discussion of supply and demand models. In many textbooks, the demand and supply equations would be written directly in terms of the observed (equilibrium) quantities and prices as

ln Q_t^obs = α_s + β_s × ln P_t^obs + ε_t^s,   (3.7)

ln Q_t^obs = α_d + β_d × ln P_t^obs + ε_t^d.   (3.8)

This representation leaves out much of the structure that gives the demand and supply functions their meaning, that is, the demand equation (3.3), the supply equation (3.4) and the equilibrium condition (3.5). As Strotz and Wold (1960) write, "Those who write such systems [(3.7) and (3.8)] do not, however, really mean what they write, but introduce an ellipsis which is familiar to economists" (page 425), with the ellipsis referring to the market equilibrium condition that is left out. See also Strotz (1960), Strotz and Wold (1965), and Wold (1960).

    3.5 The Statistical Demand Curve

Given this setup, let me discuss two issues. First, let us explore, under this model, the interpretation of what Working (1927) called the "statistical demand curve." The covariance between observed (equilibrium) log quantities and log prices is

cov(ln Q_t^obs, ln P_t^obs) = (β_s · σ_d² + β_d · σ_s² − ρ · σ_d · σ_s · (β_d + β_s)) / (β_s − β_d)²,

where σ_d and σ_s are the standard deviations of ε_t^d and ε_t^s, respectively, and ρ is their correlation. Because the variance of ln P_t^obs is (σ_s² + σ_d² − 2 · ρ · σ_d · σ_s)/(β_s − β_d)², it follows that the regression coefficient in the regression of log quantities on log prices is

cov(ln Q_t^obs, ln P_t^obs) / var(ln P_t^obs) = (β_s · σ_d² + β_d · σ_s² − ρ · σ_d · σ_s · (β_d + β_s)) / (σ_s² + σ_d² − 2 · ρ · σ_d · σ_s).

Working focuses on the interpretation of this relation between equilibrium quantities and prices. Suppose that the correlation between ε_t^d and ε_t^s, denoted by ρ, is zero. Then the regression coefficient is a weighted average of the two slope coefficients of the supply and demand function, weighted by the variances of the residuals:

cov(ln Q_t^obs, ln P_t^obs) / var(ln P_t^obs) = β_s · σ_d²/(σ_s² + σ_d²) + β_d · σ_s²/(σ_s² + σ_d²).


If σ_d² is small relative to σ_s², then we estimate something close to the slope of the demand function, and if σ_s² is small relative to σ_d², then we estimate something close to the slope of the supply function. In general, however, as Working stresses, the "statistical demand curve" is not informative about the demand function (or about the supply function); see also Leamer (1981).

    3.6 The Effect of a Tax Increase

The second question is how this model with supply and demand functions and a market clearing price helps us answer the substantive question of interest. The specific question considered is the effect of the tax increase on the average quantity traded. In a given market, let p be the price sellers receive per pound of whiting, and let p̃ = p × (1 + r) be the price buyers pay after the tax has been imposed. The key assumption is that the only way buyers and sellers respond to the tax is through the effect of the tax on prices: they do not change how much they would be willing to buy or sell at any given price, and the process that determines the equilibrium price does not change. The technical econometric term for this is that the demand and supply functions are structural or invariant, in the sense that they are not affected by changes in the treatment, taxes in this case. This may not be a perfect assumption, but certainly in many cases it is reasonable: if I have to pay $1.10 per pound of whiting, I probably do not care whether 10 cents of that goes to the government and $1 to the seller, or all of it goes to the seller. If we are willing to make that assumption, we can solve for the new equilibrium price and quantity. Let P_t(r) be the new equilibrium price [net of taxes, that is, the price sellers receive, with (1 + r) · P_t(r) the price buyers pay], given a tax rate r, with in our example r = 0.1. This price solves

Q_t^d(P_t(r) × (1 + r)) = Q_t^s(P_t(r)).

Given the log linear specification for the demand and supply functions, this leads to

ln P_t(r) = (α_d − α_s)/(β_s − β_d) + β_d × ln(1 + r)/(β_s − β_d) + (ε_t^d − ε_t^s)/(β_s − β_d).

The result of the tax is that the average of the logarithm of the price that sellers receive with a positive tax rate r is less than what they would have received in the absence of the tax:

E[ln P_t(r)] = (α_d − α_s)/(β_s − β_d) + β_d × ln(1 + r)/(β_s − β_d) ≤ (α_d − α_s)/(β_s − β_d) = E[ln P_t(0)].

    (Note that βd < 0.) On the other hand, the buyerswill pay more on average:

    E[ln((1 + r) · Pt(r))] =αd −αsβs − βd +

    βs × ln(1 + r)βs − βd

    ≥ E[lnPt(0)].The quantity traded after the tax increase is

    lnQt(r) =βs · αd − βd ·αs

    βs − βd +βs · βd · ln(1 + r)

    βs − βd

    +βs · εdt − βd · εst

    βs − βd ,

    which is less than the quantity that would be tradedin the absence of the tax increase. The causal effectis

    lnQt(r)− lnQt(0) =βs · βd · ln(1 + r)

    βs − βd ,

    the same in all markets, and proportional to the sup-ply and demand elasticities and, for small r, propor-tional to the tax. What should we take away fromthis discussion? There are three points. First, theregression coefficient in the regression of log quan-tity on log prices does not tell us much about theeffect of new tax. The sign of this regression coeffi-cient is ambiguous, depending on the variances andcovariance of the unobserved determinants of supplyand demand. Second, in order to predict the mag-nitude of the effect of a new tax we need to learnabout the demand and supply functions separately,or in the econometrics terminology, identify the sup-ply and demand functions. Third, observations onequilibrium prices and quantities by themselves donot identify these functions.
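As a quick numeric sketch (the elasticities below are assumptions chosen for illustration, not estimates from Graddy's data), the formulas above imply the following equilibrium responses to a 10% tax:

```python
import numpy as np

# Illustrative sketch only: beta_s and beta_d are assumed elasticities.
beta_s, beta_d, r = 1.0, -1.5, 0.10

d_ln_q = beta_s * beta_d * np.log(1 + r) / (beta_s - beta_d)
d_ln_p_sellers = beta_d * np.log(1 + r) / (beta_s - beta_d)   # price sellers receive
d_ln_p_buyers = beta_s * np.log(1 + r) / (beta_s - beta_d)    # price buyers pay

print(f"quantity falls by about {-d_ln_q:.1%}")               # ~5.7%
print(f"sellers' price falls by about {-d_ln_p_sellers:.1%}")  # ~5.7%
print(f"buyers' price rises by about {d_ln_p_buyers:.1%}")     # ~3.8%
```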

    3.7 Identification with Instrumental Variables

Given this identification problem, how do we identify the demand and supply functions? This is where instrumental variables enter the discussion. To identify the demand function, we look for determinants of the supply of whiting that do not affect the demand for whiting, and, similarly, to identify the supply function we look for determinants of the demand for whiting that do not affect the supply. In this specific case, Graddy (1995, 1996) assumes that weather conditions at sea on the days prior to market \(t\), denoted by \(Z_t\), affect supply but do not affect demand. Certainly, it appears reasonable to think that weather is a direct determinant of supply: having high waves and strong winds makes it harder to catch fish. On the other hand, there does not seem to be any reason why demand on day \(t\), at a given price \(p\), would be correlated with wave height or wind speed on previous days. This assumption may be made more plausible by conditioning on covariates. For example, if one is concerned that weather conditions on land affect demand, one may wish to condition on those, and only use variation in weather conditions at sea given similar weather conditions on land as an instrument. Formally, the key assumptions are that
\[
Q^d_t(p) \perp Z_t \quad \text{and} \quad Q^s_t(p) \not\perp Z_t,
\]
possibly conditional on covariates. If both of these conditions hold, we can use weather conditions as an instrument.

How do we exploit these assumptions? The traditional approach is to generalize the functional form of the supply function to explicitly incorporate the effect of the instrument on the supply of whiting. In our notation,
\[
\ln Q^s_t(p, z) = \alpha^s + \beta^s \times \ln p + \gamma^s \times z + \varepsilon^s_t.
\]
The demand function remains unchanged, capturing the fact that demand is not affected by the instrument:
\[
\ln Q^d_t(p, z) = \alpha^d + \beta^d \times \ln p + \varepsilon^d_t.
\]
We assume that the unobserved components of supply and demand are independent of (or at least uncorrelated with) the weather conditions:
\[
(\varepsilon^d_t, \varepsilon^s_t) \perp Z_t.
\]
The equilibrium price \(P^{\mathrm{obs}}_t\) is the solution for \(p\) in the equation
\[
Q^d_t(p, Z_t) = Q^s_t(p, Z_t),
\]
which, in combination with the log-linear specification for the demand and supply functions, leads to
\[
\ln P^{\mathrm{obs}}_t = \frac{\alpha^d - \alpha^s}{\beta^s - \beta^d}
+ \frac{\varepsilon^d_t - \varepsilon^s_t}{\beta^s - \beta^d}
- \frac{\gamma^s \cdot Z_t}{\beta^s - \beta^d}
\]
and
\[
\ln Q^{\mathrm{obs}}_t = \frac{\beta^s \cdot \alpha^d - \beta^d \cdot \alpha^s}{\beta^s - \beta^d}
+ \frac{\beta^s \cdot \varepsilon^d_t - \beta^d \cdot \varepsilon^s_t}{\beta^s - \beta^d}
- \frac{\gamma^s \cdot \beta^d \cdot Z_t}{\beta^s - \beta^d}.
\]
Now consider the expected value of the equilibrium price and quantity given the weather conditions:
\[
\mathrm{E}[\ln Q^{\mathrm{obs}}_t \mid Z_t = z]
= \frac{\beta^s \cdot \alpha^d - \beta^d \cdot \alpha^s}{\beta^s - \beta^d}
- \frac{\gamma^s \cdot \beta^d}{\beta^s - \beta^d} \cdot z
\tag{3.9}
\]
and
\[
\mathrm{E}[\ln P^{\mathrm{obs}}_t \mid Z_t = z]
= \frac{\alpha^d - \alpha^s}{\beta^s - \beta^d}
- \frac{\gamma^s}{\beta^s - \beta^d} \cdot z.
\tag{3.10}
\]
Equations (3.9) and (3.10) are what is called in econometrics the reduced form of the simultaneous equations model. They express the endogenous variables (those variables whose values are determined inside the model, price and quantity in this example) in terms of the exogenous variables (those variables whose values are not determined within the model, weather conditions in this example). The slope coefficients on the instrument in these reduced-form equations are what in randomized experiments with noncompliance would be called the intention-to-treat effects. One can estimate the coefficients in the reduced form by least squares methods. The key insight is that the ratio of the coefficients on the weather conditions in the two regression functions, \(-\gamma^s \cdot \beta^d/(\beta^s - \beta^d)\) in the quantity regression and \(-\gamma^s/(\beta^s - \beta^d)\) in the price regression, is equal to the slope coefficient \(\beta^d\) of the demand function.
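The following sketch illustrates this indirect least squares logic on simulated data (again an illustrative design: all parameter values, including the supply shifter `gamma_s`, are assumptions and do not come from Graddy's data). The ratio of the two reduced-form slopes recovers the demand slope even though the direct regression of quantity on price does not:

```python
import numpy as np

# Illustrative simulation of the reduced-form / indirect least squares idea.
rng = np.random.default_rng(2)
n = 100_000
alpha_s, alpha_d = 0.0, 5.0
beta_s, beta_d, gamma_s = 1.0, -1.5, -0.5   # gamma_s < 0: bad weather shifts supply down
z = rng.binomial(1, 0.5, n)                 # binary weather instrument (1 = stormy)
eps_d = rng.normal(0, 1.0, n)
eps_s = rng.normal(0, 0.2, n)

# Equilibrium price and quantity implied by the log-linear system.
ln_p = (alpha_d - alpha_s + eps_d - eps_s - gamma_s * z) / (beta_s - beta_d)
ln_q = alpha_d + beta_d * ln_p + eps_d

# Reduced-form slopes: coefficients on z in the quantity and price equations.
rf_q = np.cov(ln_q, z)[0, 1] / np.var(z, ddof=1)
rf_p = np.cov(ln_p, z)[0, 1] / np.var(z, ddof=1)
print(rf_q / rf_p, beta_d)                  # the ratio recovers beta_d = -1.5
```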

For some purposes, the reduced-form or intention-to-treat effects may be of substantive interest. In the Fulton fish market example, people attempting to predict prices and quantities under the current regime may find these estimates of interest. They are of less interest to policy makers contemplating the introduction of a new tax. In simultaneous equations settings, the demand and supply functions are viewed as structural in the sense that they are not affected by interventions in the market such as new taxes. As such they, and not the reduced-form regression functions, are the key components of predictions of market outcomes under new regimes. This is somewhat different in many of the recent applications of instrumental variables methods in the statistics literature in the context of randomized experiments with noncompliance, where the intention-to-treat effects are traditionally of primary interest.

Let me illustrate this with the Fulton Fish Market data collected by Graddy. For ease of illustration, let me simplify the instrument to a binary one: the weather conditions are either good for catching fish (\(Z_t = 0\), fair weather, corresponding to low wind speed and low wave height) or stormy (\(Z_t = 1\), corresponding to relatively strong winds and high waves).\(^{11}\) The price is the average daily price in cents for one dealer, and the quantity is the daily quantity in pounds. The two estimated reduced forms are
\[
\widehat{\ln Q^{\mathrm{obs}}_t} = 8.63 - 0.36 \times Z_t \qquad (\text{standard errors } 0.08 \text{ and } 0.15)
\]
and
\[
\widehat{\ln P^{\mathrm{obs}}_t} = -0.29 + 0.34 \times Z_t \qquad (\text{standard errors } 0.04 \text{ and } 0.07).
\]
Hence, the instrumental variables estimate of the slope of the demand function is
\[
\hat\beta^d = \frac{-0.36}{0.34} = -1.08 \quad (\text{s.e. } 0.46).
\]

[Fig. 2. Scatterplot of log prices and log quantities by weather conditions.]

Another, perhaps more intuitive, way of looking at these estimates is to consider the location of the average log quantity and average log price separately by weather conditions. Figure 2 presents the scatterplot of log quantities and log prices, with the stars indicating stormy days and the plus signs indicating calm days. On fair weather days the average log price is \(-0.29\), and the average log quantity is 8.6. On stormy days, the average log price is 0.04, and the average log quantity is 8.3. These two loci are marked by circles in Figure 2. On stormy days, the price is higher and the quantity traded is lower than on fair weather days. This is used to estimate the slope of the demand function. The figure also includes the estimated demand function based on using the indicator for stormy days as an instrument for the price: the estimated demand function goes through the two points defined by the averages of the log price and log quantity for stormy and fair weather days.

\(^{11}\)The formal definition I use, following Angrist, Graddy and Imbens (2000), is that stormy is defined as wind speed greater than 18 knots in combination with wave height of more than 4.5 ft, and fair weather is anything else.
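The same Wald-style calculation can be written directly in terms of these group means (a small sketch using the rounded numbers reported above; small differences from the reported \(-1.08\) reflect rounding of the inputs):

```python
# Group means by weather implied by the estimated reduced forms above
# (fair: Z=0, stormy: Z=1); the slope of the line through the two points
# is the Wald / indirect least squares estimate of the demand slope.
mean_ln_p = {0: -0.29, 1: -0.29 + 0.34}
mean_ln_q = {0: 8.63, 1: 8.63 - 0.36}

beta_d_hat = (mean_ln_q[1] - mean_ln_q[0]) / (mean_ln_p[1] - mean_ln_p[0])
print(round(beta_d_hat, 2))   # about -1.06, close to the reported -1.08
```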

With the data collected by Graddy, it is more difficult to point-identify the supply curve. The traditional route toward identifying the supply curve would rely on finding an instrument that shifts demand without directly affecting supply. Without such an instrument, we cannot point-identify the effect of the introduction of the tax on quantity and prices. It is possible under weaker assumptions to find bounds on these estimands (e.g., Leamer (1981); Manski (2003)), but we do not pursue this here.

3.8 Recent Research on Simultaneous Equations Models

The traditional econometric literature on simultaneous equations models is surveyed in Hausman (1983). Compared to the discussion in the preceding sections, this literature focuses on a more general case, allowing for multiple endogenous variables and multiple instruments. The modern econometric literature, starting in the 1980s, has relaxed the linearity and additivity assumptions in specification (3.3) substantially. Key references to this literature are Brown (1983), Roehrig (1988), Newey and Powell (2003), Chesher (2003, 2010), Benkard and Berry (2006), Matzkin (2003, 2007), Altonji and Matzkin (2005), Imbens and Newey (2009), Hoderlein and Mammen (2007), Horowitz (2011) and Horowitz and Lee (2007). Matzkin (2007) provides a recent survey of this technically demanding literature. This literature has continued to use the observed outcome notation, making it more difficult to connect to the statistical literature. Here, I briefly review some of this literature. The starting point is a structural equation, in the potential outcome notation,
\[
Y_i(x) = \alpha + \beta \cdot x + \varepsilon_i,
\]
and an instrument \(Z_i\) that satisfies
\[
Z_i \perp \varepsilon_i \quad \text{and} \quad Z_i \not\perp X_i.
\]
The traditional econometric literature would formulate this in the observed outcome notation as
\[
Y_i = \alpha + \beta \cdot X_i + \varepsilon_i, \qquad Z_i \perp \varepsilon_i \quad \text{and} \quad Z_i \not\perp X_i.
\]
There are a number of generalizations considered in the modern literature. First, instead of assuming independence of the unobserved component and the instrument, part of the current literature assumes only that the conditional mean of the unobserved component given the instrument is free of dependence on the instrument, allowing the variance and other distributional aspects to depend on the value of the instrument; see Horowitz (2011). Another generalization of the linear model allows for general nonlinear functional forms of the type
\[
Y_i = g(X_i) + \varepsilon_i, \qquad Z_i \perp \varepsilon_i \quad \text{and} \quad Z_i \not\perp X_i,
\]
where the focus is on nonparametric identification and estimation of \(g(x)\); see Brown (1983), Roehrig (1988), Benkard and Berry (2006). Allowing for even more generality, researchers have studied nonadditive versions of these models with
\[
Y_i = g(X_i, \varepsilon_i), \qquad Z_i \perp \varepsilon_i \quad \text{and} \quad Z_i \not\perp X_i,
\]
with \(g(x, \varepsilon)\) strictly monotone in a scalar unobserved component \(\varepsilon\). In these settings, point identification often requires strong assumptions on the support of the instrument and its relation to the endogenous regressor, and, therefore, researchers have also explored bounds. See Matzkin (2003, 2007, 2008) and Imbens and Newey (2009).
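For the linear constant-coefficient case, the classical instrumental variables estimator can be written as a ratio of covariances. Here is a minimal simulated sketch (the data-generating process and all parameter values are assumptions for illustration), showing that this ratio recovers \(\beta\) when least squares does not:

```python
import numpy as np

# Illustrative sketch of the classical linear IV estimator for
# Y_i = alpha + beta * X_i + eps_i with instrument Z_i.
rng = np.random.default_rng(1)
n = 50_000
alpha, beta = 1.0, 2.0

z = rng.binomial(1, 0.5, n)                 # instrument
u = rng.normal(size=n)                      # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)        # treatment depends on Z and U
eps = u + rng.normal(size=n)                # outcome error correlated with X
y = alpha + beta * x + eps

beta_ols = np.cov(y, x)[0, 1] / np.var(x, ddof=1)     # biased upward here
beta_iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]     # cov(Y,Z)/cov(X,Z)
print(beta_ols, beta_iv)   # the IV estimate is close to 2.0; OLS is not
```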

4. A MODERN EXAMPLE: RANDOMIZED EXPERIMENTS WITH NONCOMPLIANCE AND HETEROGENEOUS TREATMENT EFFECTS

In this section, I will discuss part of the modern literature on instrumental variables methods that has evolved simultaneously in the statistics and econometrics literatures. I will do so in the context of a second example. On the one hand, concern arose in the econometric literature about the restrictiveness of the functional form assumptions in traditional instrumental variables methods, and in particular about the constant treatment effect assumption commonly used in the so-called selection models (Heckman (1979); Heckman and Robb (1985)). The initial results in this literature demonstrated the difficulties in establishing point identification (Heckman (1990); Manski (1990)), leading to the bounds approach developed by Manski (1995, 2003). At the same time, statisticians analyzed the complications arising from noncompliance in randomized experiments (Robins (1989)) and the merits of encouragement designs (Zelen, 1979, 1990). By adopting a common framework and notation in Imbens and Angrist (1994) and Angrist, Imbens and Rubin (1996), these literatures have become closely connected and have influenced each other substantially.

4.1 The McDonald, Hiu and Tierney (1992) Data

The canonical example in this literature is that of a randomized experiment with noncompliance. To illustrate the issues, I will use here data previously analyzed in Hirano et al. (2000) and McDonald, Hiu and Tierney (1992). McDonald, Hiu and Tierney (1992) carried out a randomized experiment to evaluate the effect of an influenza vaccination on flu-related hospital visits. Instead of randomly assigning individuals to receive the vaccination, the researchers randomly assigned physicians to receive letters reminding them of the upcoming flu season and encouraging them to vaccinate their patients. This is what Zelen (1979, 1990) refers to as an encouragement design. I discuss this using the potential outcome notation used for this particular setup in Angrist, Imbens and Rubin (1996), and in general sometimes referred to as the Rubin Causal Model (Holland (1986)), although there are important antecedents in Splawa-Neyman (1990).


Table 2. Influenza data (N = 2861)

    Hospitalized for         Influenza    Letter    Number of
    flu-related reasons      vaccine                individuals
    (Y_i^obs)                (X_i^obs)    (Z_i)
    No                       No           No            1027
    No                       No           Yes            935
    No                       Yes          No             233
    No                       Yes          Yes            422
    Yes                      No           No              99
    Yes                      No           Yes             84
    Yes                      Yes          No              30
    Yes                      Yes          Yes             31

I consider two distinct treatments: the first is the receipt of the letter, and the second the receipt of the influenza vaccination. Let \(Z_i \in \{0, 1\}\) be the indicator for the receipt of the letter, and let \(X_i \in \{0, 1\}\) be the indicator for the receipt of the vaccination. We start by postulating the existence of four potential outcomes. Let \(Y_i(z, x)\) be the potential outcome corresponding to receipt of the letter equal to \(Z_i = z\) and receipt of the vaccination equal to \(X_i = x\), for \(z = 0, 1\) and \(x = 0, 1\). In addition, we postulate the existence of two potential outcomes corresponding to the receipt of the vaccination as a function of the receipt of the letter, \(X_i(z)\), for \(z = 0, 1\). We observe for each unit in a population of size \(N = 2861\) the value of the assignment, \(Z_i\), the treatment actually received, \(X^{\mathrm{obs}}_i = X_i(Z_i)\), and the potential outcome corresponding to the assignment and treatment received, \(Y^{\mathrm{obs}}_i = Y_i(Z_i, X_i(Z_i))\). Table 2 presents the number of individuals for each of the eight values of the triple \((Z_i, X^{\mathrm{obs}}_i, Y^{\mathrm{obs}}_i)\) in the McDonald, Hiu and Tierney data set. It should be noted that the randomization in this experiment is at the physician level. I do not have physician indicators and, therefore, ignore the clustering. This will tend to lead to underestimation of the standard errors.

    4.2 Instrumental Variables Assumptions

There are four key assumptions underlying instrumental variables methods, beyond the no-interference assumption or SUTVA, with different versions for some of them. I will introduce these assumptions in this section, and in Section 5 discuss their substantive content in the context of some examples. The first assumption concerns the assignment of the instrument \(Z_i\), in the flu example the receipt of the letter by the physician. The assumption requires that the instrument is as good as randomly assigned:
\[
Z_i \perp \bigl(Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1), X_i(0), X_i(1)\bigr)
\quad \text{(random assignment)}.
\tag{4.1}
\]
This assumption is often satisfied by design: if the assignment is physically randomized, as the letter is in the flu example and as in many of the applications in the statistics literature (e.g., see the discussion in Robins (1989)), it is automatically satisfied. In other applications with observational data, common in the econometrics literature, this assumption is more controversial. It can in those cases be relaxed by requiring it to hold only within subpopulations defined by covariates \(V_i\), assuming the assignment of the instrument is unconfounded:
\[
Z_i \perp \bigl(Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1), X_i(0), X_i(1)\bigr) \mid V_i
\quad \text{(unconfounded assignment given } V_i\text{)}.
\]
This is identical to the generalization from random assignment to unconfounded assignment in observational studies. Either version of this assumption justifies the causal interpretation of Intention-To-Treat (ITT) effects, the comparison of outcomes by assignment to the treatment. In many cases, these ITT effects are only of limited interest, however, and this motivates the consideration of additional assumptions that do allow the researcher to make statements about the causal effects of the treatment of interest. It should be stressed, however, that in order to draw inferences beyond ITT effects, additional assumptions will be used; whether the resulting inferences are credible will depend on the credibility of these assumptions.

The second class of assumptions limits or rules out completely direct effects of the assignment (the receipt of the letter in the flu example) on the outcome, other than through the effect of the assignment on the receipt of the treatment of interest (the receipt of the vaccine). This is the most critical, and typically most controversial, assumption underlying instrumental variables methods, sometimes viewed as the defining characteristic of instruments. One way of formulating this assumption is as
\[
Y_i(0, x) = Y_i(1, x) \quad \text{for } x = 0, 1, \text{ for all } i
\quad \text{(exclusion restriction)}.
\]


Robins (1989) formulates a similar assumption as requiring that the instrument is "not an independent causal risk factor" (Robins (1989), page 119). Under this assumption, we can drop the \(z\) argument of the potential outcomes and write the potential outcomes without ambiguity as \(Y_i(x)\). This assumption is typically a substantive one. In the flu example, one might be concerned that the physician, in response to the receipt of the letter, takes actions that affect the likelihood of the patient getting infected with the flu other than simply administering the flu vaccine. In randomized experiments with noncompliance, the exclusion restriction is sometimes made implicitly by indexing the potential outcomes only by the treatment \(x\) and not the instrument \(z\) (e.g., Zelen (1990)).

There are other, weaker versions of this assumption. Hirano et al. (2000) use a stochastic version of the exclusion restriction that only requires that the distribution of \(Y_i(0, x)\) is the same as the distribution of \(Y_i(1, x)\). Manski (1990) uses a weaker restriction that he calls a level set restriction, which requires that the average value of \(Y_i(0, x)\) is equal to the average value of \(Y_i(1, x)\). In another approach, Manski and Pepper (2000) consider monotonicity assumptions that restrict the sign of \(Y_i(1, x) - Y_i(0, x)\) across individuals without requiring that the effects are completely absent.

Imbens and Angrist (1994) combine the random assignment assumption and the exclusion restriction by postulating the existence of a pair of potential outcomes \(Y_i(x)\), for \(x = 0, 1\), and directly assuming that
\[
Z_i \perp (Y_i(0), Y_i(1)).
\]
A disadvantage of this formulation is that it becomes less clear exactly what role randomization of the instrument plays. Another version of this combination of the exclusion restriction and random assignment assumption does not require full independence, but assumes that the conditional mean of \(Y_i(0)\) and \(Y_i(1)\) given the instrument is free of dependence on the instrument. A concern with such assumptions is that they are functional form dependent: if they hold in levels, they do not hold in logarithms unless full independence holds.

A third assumption that is often used, labeled monotonicity by Imbens and Angrist (1994), requires that
\[
X_i(1) \geq X_i(0) \quad \text{for all } i \quad \text{(monotonicity)}.
\]
This assumption rules out the presence of units who always do the opposite of their assignment [units with \(X_i(0) = 1\) and \(X_i(1) = 0\)], and is therefore also referred to as the no-defiance assumption (Balke and Pearl (1995)). It is implicit in the latent index models often used in econometric evaluation models (e.g., Heckman and Robb, 1985). In randomized experiments such as the flu example, this assumption is often plausible. There it requires that, in response to the receipt of the letter by their physician, no patient is less likely to get the vaccine. Robins (1989) makes this assumption in the context of a randomized trial for the effect of AZT on AIDS, and describes the assumption as "often, but not always, reasonable" (Robins (1989), page 122).

Finally, we need the instrument to be correlated with the treatment, or the instrument to be relevant in the terminology of Phillips (1989) and Staiger and Stock (1997):
\[
X_i \not\perp Z_i.
\]
In practice, we need the correlation to be substantial in order to draw precise inferences. A recent literature on weak instruments is concerned with credible inference in settings where this correlation between the instrument and the treatment is weak; see Staiger and Stock (1997) and Andrews and Stock (2007).

The random assignment assumption and the exclusion restriction are conveniently captured by the graphical model below, although the monotonicity assumption does not fit in as easily. The unobserved component \(U\) has a direct effect on both the treatment \(X\) and the outcome \(Y\) (captured by arrows from \(U\) to \(X\) and to \(Y\)). The instrument \(Z\) is not related to the unobserved component \(U\) (captured by the absence of a link between \(U\) and \(Z\)), and is only related to the outcome \(Y\) through the treatment \(X\) (as captured by the arrow from \(Z\) to \(X\) and an arrow from \(X\) to \(Y\), and the absence of an arrow between \(Z\) and \(Y\)).

[Graphical model: arrows \(Z \rightarrow X\), \(X \rightarrow Y\), \(U \rightarrow X\) and \(U \rightarrow Y\); no arrow between \(Z\) and \(U\) or between \(Z\) and \(Y\).]

I will primarily focus on the case with all four assumptions maintained, random assignment, the exclusion restriction, monotonicity and instrument relevance, without additional covariates, because this case has been the focus of, or a special case of the focus of, many studies, allowing me to compare different approaches. Methodological studies considering essentially this set of assumptions, sometimes without explicitly stating instrument relevance, and sometimes adding additional assumptions, include Robins (1989), Heckman (1990), Manski (1990), Imbens and Angrist (1994), Angrist, Imbens and Rubin (1996), Robins and Greenland (1996), Balke and Pearl (1995, 1997), Greenland (2000), Hernán and Robins (2006), Robins (1994), Robins and Rotnitzky (2004), Vansteelandt and Goetghebeur (2003), Vansteelandt et al. (2011), Hirano et al. (2000), Tan (2006, 2010), Abadie (2002, 2003), Duflo, Glennester and Kremer (2007), Brookhart et al. (2006), Martens et al. (2006), Morgan and Winship (2007), and others. Many more studies make the same assumptions in combination with a constant treatment effect assumption.

The modern literature analyzed this setting using a number of different approaches. Initially, the literature focused on the inability, under these four assumptions, to identify the average effect of the treatment. Some researchers, including prominently Manski (1990), Balke and Pearl (1995) and Robins (1989), showed that although one could not point-identify the average effect under these assumptions, there was information about the average effect in the data under these assumptions, and they derived bounds for it. Another strand of the literature, starting with Imbens and Angrist (1994) and Angrist, Imbens and Rubin (1996), abandoned the effort to do inference for the overall average effect, and focused on subpopulations for which the average effect could be identified, the so-called compliers, leading to the local average treatment effect. We discuss the bounds approach in the next section (Section 4.3) and the local average treatment effect approach in Sections 4.4–4.6.

    4.3 Point Identification versus Bounds

In a number of studies, the primary estimand is the average effect of the treatment, or the average effect for the treated:
\[
\tau = \mathrm{E}[Y_i(1) - Y_i(0)] \quad \text{and} \quad
\tau_t = \mathrm{E}[Y_i(1) - Y_i(0) \mid X_i = 1].
\tag{4.2}
\]
With only the four assumptions, random assignment, the exclusion restriction, monotonicity and instrument relevance, Robins (1989), Manski (1990) and Balke and Pearl (1995) established that the average treatment effect can often not be consistently estimated, even in large samples. In other words, it is often not point-identified.

Following this result, a number of different approaches have been taken. Heckman (1990) showed that if the instrument takes on values such that the probability of treatment given the instrument can be arbitrarily close to zero and one, then the average effect is identified. This is sometimes referred to as identification at infinity. Robins (1989) also formulates assumptions that allow for point identification, focusing on the average effect for the treated, \(\tau_t\). These assumptions restrict the average value of the potential outcomes, when they are not observed, in terms of average outcomes that are observed. For example, Robins formulates the condition that
\[
\mathrm{E}[Y_i(1) - Y_i(0) \mid Z_i = 1, X_i = 1]
= \mathrm{E}[Y_i(1) - Y_i(0) \mid Z_i = 0, X_i = 1],
\]
which, in combination with the random assignment assumption and the exclusion restriction, allows for point identification of the average effect for the treated. Robins also formulates two other assumptions, including one where the effects are proportional to the survival rates \(\mathrm{E}[Y_i(1) \mid Z_i = 1, X_i = 1]\) and \(\mathrm{E}[Y_i(1) \mid Z_i = 0, X_i = 1]\), respectively, that also point-identify the average effect for the treated. However, Robins questions the applicability of these results by commenting that "it would be hard to imagine that there is sufficient understanding of the biological mechanism. . . to have strong beliefs that any of the three conditions. . . is more likely to hold than either of the other two" (Robins (1989), page 122).

As an alternative to adding assumptions, Robins (1989), Manski (1990) and Balke and Pearl (1995) focused on the question of what can be learned about \(\tau\) or \(\tau_t\) given these four assumptions that do not allow for point identification. Here, I focus on the case where the three assumptions, random assignment, the exclusion restriction and monotonicity, are maintained (without necessarily instrument relevance holding), although Robins (1989) and Manski (1990) also consider other combinations of assumptions. For ease of exposition, I focus on the bounds for the average treatment effect \(\tau\) under these assumptions, in the case where \(Y_i(0)\) and \(Y_i(1)\) are binary. Then
\[
\begin{aligned}
\mathrm{E}[Y_i(1) - Y_i(0)] \in \bigl[\,
& -(1 - \mathrm{E}[X_i \mid Z_i = 1]) \cdot \mathrm{E}[Y_i \mid Z_i = 1, X_i = 0] \\
& + \mathrm{E}[Y_i \mid Z_i = 1] - \mathrm{E}[Y_i \mid Z_i = 0] \\
& + \mathrm{E}[X_i \mid Z_i = 0] \cdot (\mathrm{E}[Y_i \mid Z_i = 0, X_i = 1] - 1), \\
& (1 - \mathrm{E}[X_i \mid Z_i = 1]) \cdot (1 - \mathrm{E}[Y_i \mid Z_i = 1, X_i = 0]) \\
& + \mathrm{E}[Y_i \mid Z_i = 1] - \mathrm{E}[Y_i \mid Z_i = 0] \\
& + \mathrm{E}[X_i \mid Z_i = 0] \cdot \mathrm{E}[Y_i \mid Z_i = 0, X_i = 1] \,\bigr],
\end{aligned}
\]
which are known as the natural bounds. In this simple setting, this is a straightforward calculation. Work by Manski (1995, 2003, 2005, 2007), Robins (1989) and Hernán and Robins (2006) extends the partial identification approach to substantially more complex settings.

For the McDonald–Hiu–Tierney flu data, the estimated identified set for the population average treatment effect is
\[
\mathrm{E}[Y_i(1) - Y_i(0)] \in [-0.24, 0.64].
\]
There is a growing literature developing methods for establishing confidence intervals for parameters in settings with partial identification, taking sampling uncertainty into account; see Imbens and Manski (2004) and Chernozhukov, Hong and Tamer (2007).
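These numbers can be reproduced directly from the Table 2 counts. The sketch below is an illustration, not the authors' code: it simply plugs the empirical frequencies into the natural-bounds formula quoted above.

```python
# Natural bounds for the average treatment effect from the Table 2 counts,
# keyed by (Y, X, Z). Empirical frequencies stand in for the expectations.
counts = {
    (0, 0, 0): 1027, (0, 0, 1): 935, (0, 1, 0): 233, (0, 1, 1): 422,
    (1, 0, 0): 99,   (1, 0, 1): 84,  (1, 1, 0): 30,  (1, 1, 1): 31,
}

def wmean(values_weights):
    """Weighted mean of (value, count) pairs."""
    total = sum(w for _, w in values_weights)
    return sum(v * w for v, w in values_weights) / total

# E[X | Z = z], E[Y | Z = z] and E[Y | Z = z, X = x] as empirical means.
e_x_z = {z: wmean([(x, n) for (y, x, zz), n in counts.items() if zz == z]) for z in (0, 1)}
e_y_z = {z: wmean([(y, n) for (y, x, zz), n in counts.items() if zz == z]) for z in (0, 1)}
e_y_zx = {(z, x): wmean([(y, n) for (y, xx, zz), n in counts.items() if zz == z and xx == x])
          for z in (0, 1) for x in (0, 1)}

itt_y = e_y_z[1] - e_y_z[0]
lower = -(1 - e_x_z[1]) * e_y_zx[1, 0] + itt_y + e_x_z[0] * (e_y_zx[0, 1] - 1)
upper = (1 - e_x_z[1]) * (1 - e_y_zx[1, 0]) + itt_y + e_x_z[0] * e_y_zx[0, 1]
print(round(lower, 2), round(upper, 2))   # -0.24 0.64
```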

    4.4 Compliance Types

Imbens and Angrist (1994) and Angrist, Imbens and Rubin (1996) take a different approach. Rather than focusing on the average effect for the population, which is not identified under the three assumptions given in Section 4.2, they focus on different average causal effects. A first key step in the Angrist–Imbens–Rubin setup is that we can think of four different compliance types, defined by the pair of values of \((X_i(0), X_i(1))\), that is, defined by how individuals would respond to different assignments in terms of receipt of the treatment:\(^{12}\)
\[
T_i =
\begin{cases}
\text{n (never-taker)} & \text{if } X_i(0) = X_i(1) = 0,\\
\text{c (complier)} & \text{if } X_i(0) = 0,\ X_i(1) = 1,\\
\text{d (defier)} & \text{if } X_i(0) = 1,\ X_i(1) = 0,\\
\text{a (always-taker)} & \text{if } X_i(0) = X_i(1) = 1.
\end{cases}
\]
Given the existence of deterministic potential outcomes, this partitioning of the population into four subpopulations is simply a definition.\(^{13}\) It clarifies immediately that it will be difficult to identify the average effect of the primary treatment (the receipt of the vaccine) for the entire population: never-takers and always-takers can only be observed exposed to a single level of the treatment of interest, and thus for these groups any point estimates of the causal effect of the treatment must be based on extrapolation.

\(^{12}\)Frangakis and Rubin (2002) generalize this notion of subpopulations whose membership is not completely observed into their principal stratification approach; see also Section 7.2.

\(^{13}\)Outside of this framework, the existence of these four subpopulations would be an assumption.

We cannot infer, without additional assumptions, the compliance type of any unit: for each unit we observe \(X_i(Z_i)\), but the data contain no information about the value of \(X_i(1 - Z_i)\). For each unit, there are therefore two compliance types consistent with the observed behavior. Nor can we identify the proportion of individuals of each compliance type without additional restrictions. The monotonicity assumption implies that there are no defiers. This, in combination with random assignment, implies that we can identify the population shares of the remaining three compliance types. The proportions of always-takers and never-takers are
\[
\pi_a = \mathrm{pr}(T_i = a) = \mathrm{pr}(X_i = 1 \mid Z_i = 0) \quad \text{and} \quad
\pi_n = \mathrm{pr}(T_i = n) = \mathrm{pr}(X_i = 0 \mid Z_i = 1),
\]
respectively, and the proportion of compliers is the remainder:
\[
\pi_c = \mathrm{pr}(T_i = c) = 1 - \pi_a - \pi_n.
\]
For the McDonald–Hiu–Tierney data these shares are estimated to be
\[
\hat\pi_a = 0.189, \qquad \hat\pi_n = 0.692, \qquad \hat\pi_c = 0.119,
\]
although, as I discuss in Section 5.2, these shares may not be consistent with the exclusion restriction.
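A short sketch (reusing the `counts` dictionary from the bounds example above) recovers these shares from Table 2, up to rounding in the last digit:

```python
# Estimated compliance-type shares under monotonicity, from the Table 2 counts
# (illustrative sketch; reuses the `counts` dictionary defined earlier).
n_z0 = sum(n for (y, x, z), n in counts.items() if z == 0)
n_z1 = sum(n for (y, x, z), n in counts.items() if z == 1)

pi_a = sum(n for (y, x, z), n in counts.items() if z == 0 and x == 1) / n_z0
pi_n = sum(n for (y, x, z), n in counts.items() if z == 1 and x == 0) / n_z1
pi_c = 1.0 - pi_a - pi_n
print(round(pi_a, 3), round(pi_n, 3), round(pi_c, 3))
# close to the reported 0.189, 0.692 and 0.119 (differences reflect rounding)
```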

    4.5 Local Average Treatment Effects

If, in addition to monotonicity, we also assume that the exclusion restriction holds, Imbens and Angrist (1994) and Angrist, Imbens and Rubin (1996) show that the local average treatment effect, or complier average causal effect, is identified:
\[
\tau_{\mathrm{late}} = \mathrm{E}[Y_i(1) - Y_i(0) \mid T_i = c]
= \frac{\mathrm{E}[Y_i \mid Z_i = 1] - \mathrm{E}[Y_i \mid Z_i = 0]}{\mathrm{E}[X_i \mid Z_i = 1] - \mathrm{E}[X_i \mid Z_i = 0]}.
\tag{4.3}
\]


The components of the right-hand side of this expression can be estimated consistently from a random sample \((Z_i, X_i, Y_i)_{i=1}^N\). For the McDonald–Hiu–Tierney data, this leads to
\[
\hat\tau_{\mathrm{late}} = -0.125 \quad (\text{s.e. } 0.090).
\]
Note that, just as in the supply and demand example, the causal estimand is the ratio of the intention-to-treat effects of the letter on hospitalization and of the letter on the receipt of the vaccine. These intention-to-treat effects are
\[
\widehat{\mathrm{ITT}}_Y = -0.015 \quad (\text{s.e. } 0.011), \qquad
\widehat{\mathrm{ITT}}_X = \hat\pi_c = 0.119 \quad (\text{s.e. } 0.016),
\]
with the latter equal to the estimated proportion of compliers in the population.
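Again as an illustrative check (reusing `e_y_z` and `e_x_z` from the bounds sketch above), the ratio of the two intention-to-treat effects reproduces this estimate:

```python
# Wald / LATE estimate from the Table 2 counts (sketch; reuses `e_y_z` and
# `e_x_z` computed in the bounds example).
itt_y = e_y_z[1] - e_y_z[0]   # effect of the letter on hospitalization
itt_x = e_x_z[1] - e_x_z[0]   # effect of the letter on vaccination (complier share)
tau_late = itt_y / itt_x
print(round(itt_y, 3), round(itt_x, 3), round(tau_late, 3))
# approximately -0.015, 0.118 and -0.125, matching the reported estimates up to rounding
```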

Without the monotonicity assumption, but maintaining the random assignment assumption and the exclusion restriction, the ratio of ITT effects still has a clear interpretation. In that case, it is equal to a linear combination of the average effects of the treatment for compliers and defiers:
\[
\begin{aligned}
& \frac{\mathrm{E}[Y_i \mid Z_i = 1] - \mathrm{E}[Y_i \mid Z_i = 0]}{\mathrm{E}[X_i \mid Z_i = 1] - \mathrm{E}[X_i \mid Z_i = 0]} \\
& \quad = \frac{\mathrm{pr}(T_i = c)}{\mathrm{pr}(T_i = c) - \mathrm{pr}(T_i = d)} \cdot \mathrm{E}[Y_i(1) - Y_i(0) \mid T_i = c] \\
& \qquad - \frac{\mathrm{pr}(T_i = d)}{\mathrm{pr}(T_i = c) - \mathrm{pr}(T_i = d)} \cdot \mathrm{E}[Y_i(1) - Y_i(0) \mid T_i = d].
\end{aligned}
\tag{4.4}
\]
This estimand has a clear interpretation if the treatment effect is constant across all units, but if there is heterogeneity in the treatment effects, it is a weighted average with some weights negative. This representation shows that if the monotonicity assumption is violated, but the proportion of defiers is small relative to that of compliers, the interpretation of the instrumental variables estimand is not severely impacted.
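A tiny numeric illustration of these weights (the shares and effects below are assumed values, not estimates from the flu data):

```python
# With a few defiers, the estimand in (4.4) is a linear combination with one
# negative weight; with a constant effect it still recovers that effect.
p_c, p_d = 0.30, 0.05                     # assumed complier and defier shares
w_c = p_c / (p_c - p_d)                   # weight on the complier effect: 1.2
w_d = -p_d / (p_c - p_d)                  # weight on the defier effect: -0.2
tau_c, tau_d = -0.10, -0.10               # equal effects => estimand is still -0.10
print(w_c, w_d, w_c * tau_c + w_d * tau_d)
```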

4.6 Do We Care About the Local Average Treatment Effect?

The local average treatment effect is an unusual estimand. It is an average effect of the treatment for a subpopulation that cannot be identified, in the sense that there are no units whom we know for sure to belong to this subpopulation, although there are some units whom we know do not belong to it. A more conventional approach is to start an analysis by clearly articulating the object of interest, say, the average effect of a treatment for a well-defined population. There may be challenges in obtaining credible estimates of this object of interest, and along the way one may make more or less credible assumptions, but typically the focus remains squarely on the originally specified object of interest.

Here, the approach appears to be quite different. We started off by defining unit-level treatment effects for all units. We did not articulate explicitly what the target estimand was. In the McDonald–Hiu–Tierney influenza-vaccine application, a natural estimand might be the population average effect of the vaccine. Then, apparently more or less by accident, the definition of the compliance types led us to focus on the average effects for compliers. In this example, the compliers were defined by the response, in terms of the receipt of the vaccine, to the receipt of the letter. It appears difficult to argue that this is a substantively interesting group, and in fact no attempt was made to do so.

This type of example has led distinguished researchers both in economics and in statistics to question whether and why one should care about the local average treatment effect. The economist Deaton writes, "I find it hard to make any sense of the LATE [local average treatment effect]" (Deaton (2010), page 430). Pearl similarly wonders, "Realizing that the population averaged treatment effect (ATE) is not identifiable in experiments marred by noncompliance, they have shifted attention to a specific response type (i.e., compliers) for which the causal effect was identifiable, and presented the latter [the local average treatment effect] as an approximation for ATE. . . . However, most authors in this category do not state explicitly whether their focus on a specific stratum is motivated by mathematical convenience, mathematical necessity (to achieve identification) or a genuine interest in the stratum under analysis" (Pearl (2011), page 3). Freedman writes, "In many circumstances, the instrumental-variables estimator turns out to be estimating some data-dependent average of structural parameters, whose meaning would have to be elucidated" (Freedman (2006), pages 700–701). Let me attempt to clear up this confusion. See also Imbens (2010). An instrumental variables analysis is an analysis in a second-best setting. It would have been preferable if one had been able to carry out a well-designed randomized experiment. However, such an experiment was not carried out, and we have noncompliance. As a result, we cannot answer all the questions we might have wanted to ask. Specifically, if the noncompliance is substantial, we are limited in the questions we can answer credibly and precisely. Ultimately, there is only one subpopulation we can credibly (point-)identify the average effect of the treatment for, namely, the compliers.

It may be useful to draw an analogy. Suppose a researcher is interested in evaluating a medical treatment and suppose a randomized experiment had been carri