
Identification, weak instruments and statistical inference in econometrics

Jean-Marie Dufour
Université de Montréal

First version: May 2003
This version: July 2003

This paper is based on the author's Presidential Address to the Canadian Economics Association given on May 31, 2003, at Carleton University (Ottawa). The author thanks Bryan Campbell, Tarek Jouini, Lynda Khalaf, William McCausland, Nour Meddahi, Benoît Perron and Mohamed Taamouti for several useful comments. This work was supported by the Canada Research Chair Program (Chair in Econometrics, Université de Montréal), the Alexander-von-Humboldt Foundation (Germany), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Canada Council for the Arts (Killam Fellowship), the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Fonds de recherche sur la société et la culture (Québec), and the Fonds de recherche sur la nature et les technologies (Québec).

Canada Research Chair Holder (Econometrics). Centre interuniversitaire de recherche en analyse des organisations (CIRANO), Centre interuniversitaire de recherche en économie quantitative (CIREQ), and Département de sciences économiques, Université de Montréal. Mailing address: Département de sciences économiques, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec, Canada H3C 3J7. TEL: 1 514 343 2400; FAX: 1 514 343 5831; e-mail: [email protected]. Web page: http://www.fas.umontreal.ca/SCECO/Dufour


    ABSTRACT

We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions, a condition not satisfied by standard Wald-type methods based on standard errors, and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, and Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria for assessing alternative procedures.

Key words: hypothesis testing; confidence set; confidence interval; identification; testability; asymptotic theory; exact inference; pivotal function; nonparametric model; Bahadur-Savage; heteroskedasticity; serial dependence; unit root; simultaneous equations; structural model; instrumental variable; weak instrument; weak identification; simultaneous inference; projection; split-sample; conditional test; Monte Carlo test; bootstrap.

JEL classification numbers: C1, C12, C14, C15, C3, C5.


RÉSUMÉ

Nous analysons les problèmes d'inférence associés à l'identification et à la testabilité en économétrie, en soulignant la similarité entre les deux questions. Après une courte revue des notions statistiques requises, nous étudions tour à tour l'inférence dans les modèles non-paramétriques ainsi que les résultats récents sur les modèles structurels faiblement identifiés (ou les instruments faibles). Nous remarquons que beaucoup d'hypothèses, pour lesquelles des tests sont régulièrement proposés, ne sont pas en fait testables, tandis que plusieurs méthodes économétriques fréquemment utilisées sont fondamentalement inappropriées pour les modèles considérés. De telles situations conduisent à des problèmes statistiques mal posés et sont souvent associées à un emploi mal avisé de résultats distributionnels asymptotiques. Concernant les hypothèses non-paramétriques, nous analysons trois problèmes de base pour lesquels de telles difficultés apparaissent : (1) tester une hypothèse sur un moment avec des restrictions trop faibles sur la forme de la distribution; (2) l'inférence avec hétéroscédasticité de forme non spécifiée; (3) l'inférence dans les modèles dynamiques avec un nombre illimité de paramètres. Concernant les modèles faiblement identifiés, nous insistons sur l'importance d'utiliser des fonctions pivotales, une condition qui n'est pas satisfaite par les méthodes usuelles de type Wald basées sur l'emploi d'écarts-types, et nous passons en revue les développements récents dans ce domaine, en mettant l'accent sur la construction de tests et régions de confiance valides. Les techniques considérées comprennent les différentes statistiques proposées, l'emploi de bornes, la subdivision d'échantillon, les techniques de projection, le conditionnement et les tests de Monte Carlo. Parmi les critères utilisés pour évaluer les procédures, nous insistons sur la possibilité de fournir une théorie distributionnelle à distance finie, sur la robustesse par rapport à la présence d'instruments faibles, ainsi que sur la robustesse par rapport à la spécification d'un modèle pour les variables explicatives endogènes du modèle.

Mots clés : test d'hypothèse; région de confiance; intervalle de confiance; identification; testabilité; théorie asymptotique; inférence exacte; fonction pivotale; modèle non-paramétrique; Bahadur-Savage; hétéroscédasticité; dépendance sérielle; racine unitaire; équations simultanées; modèle structurel; variable instrumentale; instrument faible; inférence simultanée; projection; subdivision d'échantillon; test conditionnel; test de Monte Carlo; bootstrap.

Classification JEL : C1, C12, C14, C15, C3, C5.


Contents

List of Propositions and Theorems

1. Introduction
2. Models
3. Statistical notions
   3.1. Hypotheses
   3.2. Test level and size
   3.3. Confidence sets and pivots
   3.4. Testability and identification
4. Testability, nonparametric models and asymptotic methods
   4.1. Procedures robust to nonnormality
   4.2. Procedures robust to heteroskedasticity of unknown form
   4.3. Procedures robust to autocorrelation of arbitrary form
5. Structural models and weak instruments
   5.1. Standard simultaneous equations model
   5.2. Statistical problems associated with weak instruments
   5.3. Characterization of valid tests and confidence sets
6. Approaches to weak instrument problems
   6.1. Anderson-Rubin statistic
   6.2. Projections and inference on parameter subsets
   6.3. Alternatives to the AR procedure
7. Extensions
   7.1. Multivariate regression, simulation-based inference and nonnormal errors
   7.2. Nonlinear models
8. Conclusion

List of Propositions and Theorems

4.1 Theorem: Mean non-testability in nonparametric models
4.2 Theorem: Characterization of heteroskedasticity robust tests
4.4 Theorem: Unit root non-testability in nonparametric models


1. Introduction

The main objective of econometrics is to supply methods for analyzing economic data, building models, and assessing alternative theories. Over the last 25 years, econometric research has led to important developments in many areas, such as: (1) new fields of applications linked to the availability of new data, financial data, micro-data, panels, qualitative variables; (2) new models: multivariate time series models, GARCH-type processes; (3) a greater ability to estimate nonlinear models which require an important computational capacity; (4) methods based on simulation: bootstrap, indirect inference, Markov chain Monte Carlo techniques; (5) methods based on weak distributional assumptions: nonparametric methods, asymptotic distributions based on weak regularity conditions; (6) discovery of various nonregular problems which require nonstandard distributional theories, such as unit roots and unidentified (or weakly identified) models.

An important component of this work is the development of procedures for testing hypotheses (or models). Indeed, a view widely held by both scientists and philosophers is that testability, or the formulation of testable hypotheses, constitutes a central feature of scientific activity, a view we share. With the exception of mathematics, it is not clear that a discipline should be viewed as scientific if it does not lead to empirically testable hypotheses. But this requirement leaves open the question of formulating operational procedures for testing models and theories. To date, the only coherent, or at least the only well-developed, set of methods are those supplied by statistical and econometric theory.

Last year, on the same occasion, MacKinnon (2002) discussed the use of simulation-based inference methods in econometrics, specifically bootstrapping, as a way of getting more reliable tests and confidence sets. In view of the importance of the issue, this paper also considers questions associated with the development of reliable inference procedures in econometrics. But our exposition will be, in a way, more specialized, and in another way, more general and critical. Specifically, we shall focus on general statistical issues raised by identification in econometric models, and more specifically on weak instruments in the context of structural models [e.g., simultaneous equations models (SEM)]. We will find it useful to bring together two separate streams of literature: namely, results (from mathematical statistics and econometrics) on testability in nonparametric models, and the recent econometric research on weak instruments.¹ In particular, we shall emphasize that identification problems arise in both literatures and have similar consequences for econometric methodology. Further, the literature on nonparametric testability sheds light on various econometric problems and their solutions.

Simultaneous equations models (SEM) are related in a natural way to the concept of equilibrium postulated by economic theory, both in microeconomics and macroeconomics. So it is not surprising that SEM were introduced and most often employed in the analysis of economic data. Methods for estimating and testing such models constitute a hallmark of econometric theory and represent one of its most remarkable achievements. The problems involved are difficult, raising among various issues the possibility of observational equivalence between alternative parameter values (non-identification) and the use of instrumental variables (IV). Further, the finite-sample distributional theory of estimators and test statistics is very complex, so inference is typically based on large-sample approximations.²

¹ By a nonparametric model (or hypothesis), we mean a set of possible data distributions such that a distribution [e.g., the true distribution] cannot be singled out by fixing a finite number of parameter values.

IV methods have become a routine part of econometric analysis and, despite a lot of loose ends (often hidden by asymptotic distributional theory), the topic of SEM was dormant until a few years ago. Roughly speaking, an instrument should have two basic properties: first, it should be independent of (or, at least, uncorrelated with) the disturbance term in the equation of interest (exogeneity); second, it should be correlated with the included endogenous explanatory variables for which it is supposed to serve as an instrument (relevance). The exogeneity requirement has been well known from the very beginning of IV methods. The second one was also known from the theory of identification, but its practical importance was not well appreciated and was often hidden from attention by lists of instruments relegated to footnotes (if not simply absent) in research papers. It returned to center stage with the discovery of so-called weak instruments, which can be interpreted as instruments with little relevance (i.e., weakly correlated with endogenous explanatory variables).

Weak instruments lead to poor performance of standard econometric procedures, and cases where they have pernicious effects may be difficult to detect.³ Interest in the problem also goes far beyond IV regressions and SEM, because it underscores the pitfalls in using large-sample approximations, as well as the importance of going back to basic statistical theory when developing econometric methods.

A parameter (or a parameter vector) in a model is not identified when it is not possible to distinguish between alternative values of the parameter. In parametric models, this is typically interpreted by stating that the postulated distribution of the data as a function of the parameter vector (the likelihood function) can be the same for different values of the parameter vector.⁴ An important consequence of this sort of situation is a statistical impossibility: we cannot design a data-based procedure for distinguishing between equivalent parameter values (unless additional information is introduced). In particular, no reasonable test can be produced.⁵

In nonparametric models, identification is more difficult to characterize because a likelihood function (involving a finite number of parameters) is not available and parameters are often introduced through more abstract techniques (e.g., functionals of distribution functions). But the central problem is the same: can we distinguish between alternative values of the parameter? So, quite generally, an identification problem can be viewed as a special form of non-testability. Specifically,

identification involves the possibility of distinguishing different parameter values on the basis of the corresponding data distributions, while testability refers to the possibility of designing procedures that can discriminate between subsets of parameter values.

² For reviews, see Phillips (1983) and Taylor (1983).

³ Early papers which called attention to the problem include: Nelson and Startz (1990a, 1990b), Buse (1992), Choi and Phillips (1992), Maddala and Jeong (1992), and Bound, Jaeger, and Baker (1993, 1995).

⁴ For general expositions of the theory of identification in econometrics and statistics, the reader may consult Rothenberg (1971), Fisher (1976), Hsiao (1983), Prakasa Rao (1992), Bekker, Merckens, and Wansbeek (1994) and Manski (1995, 2003).

⁵ By a reasonable test, we mean here a test that both satisfies a level constraint and may have power superior to the level when the tested hypothesis (the null hypothesis) does not hold. This will be discussed in greater detail below.
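The statistical impossibility behind non-identification can be made concrete with a small numerical sketch (our own illustration, not from the paper): in a Gaussian model whose mean depends on two parameters only through their sum, observationally equivalent parameter values yield literally identical likelihoods, so no data-based test can separate them.

```python
import numpy as np

# Toy model (ours, not from the paper): Gaussian data whose mean is a + b, so
# the parameters a and b are not identified; only their sum is. The likelihood
# is identical at (1, 2) and at (-5, 8), two observationally equivalent values.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=50)  # data generated with a + b = 3

def log_likelihood(a, b, data, sigma=1.0):
    resid = data - (a + b)                   # the mean depends on a + b only
    return (-0.5 * np.sum(resid**2) / sigma**2
            - len(data) * np.log(sigma * np.sqrt(2 * np.pi)))

print(np.isclose(log_likelihood(1.0, 2.0, x),
                 log_likelihood(-5.0, 8.0, x)))  # True: no test can separate them
```

Reparameterizing in terms of the sum (a transformation of the parameter space, as discussed above) removes the problem entirely.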


Alternatively, a problem of non-testability can be viewed as a form of non-identification (or underidentification). These problems are closely related. Furthermore, it is well known that one can create a non-identified model by introducing redundant parameters, and conversely identification problems can be eliminated by transforming the parameter space (e.g., by reducing the number of parameters). Problems of non-identification are associated with bad parameterizations, that is, inappropriate choices of parameter representations. We will see below that the same remark applies quite generally to non-testability problems, especially in nonparametric setups.

In this paper, we pursue two main objectives: first, we analyze the statistical problems associated with non-identification within the broader context of testability; second, we review the inferential issues linked to the possible presence of weak instruments in structural models. More precisely, regarding the issue of testability, the following points will be emphasized:

1. many models and hypotheses are formulated in ways that make them fundamentally non-testable; in particular, this tends to be the case in nonparametric setups;

2. such difficulties arise in basic, apparently well-defined problems, such as: (a) testing an hypothesis about a mean when the observations are independent and identically distributed (i.i.d.); (b) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form; (c) testing the unit root hypothesis on an autoregressive model whose order can be arbitrarily large;

3. some parameters tend to be non-testable (badly identified) in nonparametric models while others are not; in particular, non-testability easily occurs for moments (e.g., means, variances) while it does not for quantiles (e.g., medians); from this viewpoint, moments are not a good way of representing the properties of distributions in nonparametric setups, while quantiles are;

4. these phenomena underscore parametric nonseparability problems: statements about a given parameter (often interpreted as the parameter of interest) are not empirically meaningful without supplying information about other parameters (often called nuisance parameters); but hypotheses that set both the parameter of interest and some nuisance parameters may well be testable in such circumstances, so that the development of appropriate inference procedures should start from a joint approach;

5. to the extent that asymptotic distributional theory is viewed as a way of producing statistical methods which are valid under weak distributional assumptions, it is fundamentally misleading because, under nonparametric assumptions, such approximations are arbitrarily bad in finite samples.
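Point 3 can be illustrated numerically (our own sketch, with made-up numbers): moving an arbitrarily small fraction of a distribution's probability mass far into the tail shifts the mean without bound, while the median barely moves. This is the mechanism behind the fragility of moments, and the robustness of quantiles, in nonparametric setups.

```python
import numpy as np

# Numerical sketch (ours): contaminating 0.1% of the mass of a distribution
# shifts the mean by an arbitrary amount but leaves the median untouched.
base = np.zeros(10_000)        # a distribution concentrated at 0
contaminated = base.copy()
contaminated[:10] = 1e7        # 10 of 10,000 points pushed out to 10^7

print(np.mean(base), np.mean(contaminated))      # 0.0 vs 10000.0
print(np.median(base), np.median(contaminated))  # 0.0 vs 0.0
```

Since the contamination fraction and the tail location are both free in a nonparametric family, no finite sample can bound the mean, which is exactly why level-correct tests about means lose all power in such setups.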

Concerning weak instruments, we will review the associated problems and proposed solutions, with an emphasis on finite-sample properties and the development of tests and confidence sets which are robust to the presence of weak instruments. In particular, the following points will be stressed:

1. in accordance with basic statistical theory, one should always look for pivots as the fundamental ingredient for building tests and confidence sets; this principle appears to be especially important when identification problems are present;


2. parametric nonseparability arises in striking ways when some parameters may not be identified, so that proper pivots may easily involve many more parameters than the parameter of interest; this also indicates that the common distinction between parameters of interest and nuisance parameters can be quite arbitrary, if not misleading;

3. important additional criteria for evaluating procedures in such contexts include various forms of invariance (or robustness), such as: (a) robustness to weak instruments; (b) robustness to instrument exclusion; (c) robustness to the specification of the model for the endogenous explanatory variables in the equation(s) of interest;

4. weak instrument problems underscore in a striking way the limitations of large-sample arguments for deriving and evaluating inference procedures;

5. very few informative pivotal functions have been proposed in the context of simultaneous equations models;

6. the early statistic proposed by Anderson and Rubin (1949, AR) constitutes one of the (very rare) truly pivotal functions proposed for SEM; furthermore, it satisfies all the invariance properties listed above, so that it may reasonably be viewed as a fundamental building block for developing reliable inference procedures in the presence of weak instruments;

7. a fairly complete set of inference procedures that allow one to produce tests and confidence sets for all model parameters can be obtained through projection techniques;

8. various extensions and improvements over the AR method are possible, especially in improving power; however, it is important to note that these often come at the expense of using large-sample approximations or giving up robustness.

The literature on weak instruments is growing rapidly, and we cannot provide here a complete review. In particular, we will not discuss in any detail results on estimation, the detection of weak instruments, or asymptotic theory in this context. For that purpose, we refer the reader to the excellent survey recently published by Stock, Wright, and Yogo (2002).
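As an illustration of point 6, here is a minimal sketch of the Anderson-Rubin idea in the simplest case (one structural equation, no included exogenous regressors; the function name and the simulated design are ours, not the paper's): under H₀: β = β₀, with fixed instruments Z and i.i.d. Gaussian errors, regressing y − Yβ₀ on Z and testing that all k coefficients vanish yields an exact F(k, n − k) statistic, however weak the instruments may be.

```python
import numpy as np
from scipy import stats

# Minimal sketch (ours) of the Anderson-Rubin statistic, simplest case: one
# structural equation y = Y*beta + u with instruments Z and no included
# exogenous regressors. Under H0: beta = beta0, the statistic is exactly
# F(k, n - k) distributed, regardless of instrument strength.
def anderson_rubin(y, Y, Z, beta0):
    n, k = Z.shape
    v = y - Y @ beta0                      # structural residual under H0
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # projection onto the span of Z
    e = P @ v                              # part of v explained by Z
    ar = (e @ e / k) / ((v - e) @ (v - e) / (n - k))
    return ar, stats.f.sf(ar, k, n - k)

# Simulated weak-instrument design (all numbers hypothetical)
rng = np.random.default_rng(0)
n, k = 200, 3
Z = rng.standard_normal((n, k))
u, v2 = rng.standard_normal(n), rng.standard_normal(n)
Y = (Z @ np.array([0.05, 0.0, 0.0]) + 0.9 * u + v2)[:, None]  # nearly irrelevant instruments
y = 0.5 * Y[:, 0] + u                                         # true beta = 0.5
ar, pval = anderson_rubin(y, Y, Z, np.array([0.5]))
print(f"AR = {ar:.2f}, p-value = {pval:.2f}")
```

Inverting this test, the set {β₀ : AR(β₀) below the F critical value} gives a confidence set that remains valid under weak instruments, which is the starting point for the projection techniques of point 7.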

The paper is organized as follows. In the next two sections, we review succinctly some basic notions concerning models (section 2) and statistical theory (section 3), which are important for our discussion. In section 4, we study testability problems in nonparametric models. In section 5, we review the statistical difficulties associated with weak instruments. In section 6, we examine a number of possible solutions in the context of linear SEM, while extensions to nonlinear or non-Gaussian models are considered in section 7. We conclude in section 8.

2. Models

The purpose of econometric analysis is to develop mathematical representations of data, which we call models or hypotheses (models subject to restrictions). An hypothesis should have two basic features.


1. It must restrict the expected behavior of observations, i.e., be informative. A non-restrictive hypothesis says nothing and, consequently, does not teach us anything: it is empirically empty, void of empirical content. The more restrictive a model is, the more informative it is, and the more interesting it is.

2. It must be compatible with available data; ideally, we would like it to be true.

    However, these two criteria are not always compatible:

1. the information criterion suggests the use of parsimonious models that usually take the form of parametric models based on strong assumptions; note that the information criterion is emphasized by an influential view in the philosophy of science which stresses falsifiability as a criterion for the scientific character of a theory [Popper (1968)];

2. in contrast, compatibility with observed data is most easily satisfied by vague models which impose few restrictions; vague models may take the form of parametric models with a large number of free parameters or nonparametric models which involve an infinite set of free parameters and thus allow for weak assumptions.

Models can be classified as being either deterministic or stochastic. Deterministic models, which claim to make arbitrarily precise predictions, are highly falsifiable but always inconsistent with observed data. Accordingly, most models used in econometrics are stochastic. Such models are unverifiable: as with any theory that makes an indefinite number of predictions, we can never be sure that the model will not be put in jeopardy by new data. Moreover, they are logically unfalsifiable: in contrast with deterministic models, a probabilistic model is usually logically compatible with all possible observation vectors.

Given these facts, it is clear that any criterion for assessing whether an hypothesis is acceptable must involve a conventional aspect. The purpose of hypothesis testing theory is to supply a coherent framework for accepting or rejecting probabilistic hypotheses. It is a probabilistic adaptation of the falsification principle.⁶

3. Statistical notions

In this section, we review succinctly basic statistical notions which are essential for understanding the rest of our discussion. The general outlook follows modern statistical testing theory, derived from the Neyman-Pearson approach and described in standard textbooks, such as Lehmann (1986).

3.1. Hypotheses

Consider an observational experiment whose result can be represented by a vector of observations

X⁽ⁿ⁾ = (X₁, …, Xₙ)   (3.1)

where Xᵢ takes real values, and let

F(x) = F(x₁, …, xₙ) = P[X₁ ≤ x₁, …, Xₙ ≤ xₙ]   (3.2)

be its distribution, where x = (x₁, …, xₙ). We denote by Fₙ the set of possible distribution functions on ℝⁿ [F ∈ Fₙ].

For various reasons, we prefer to represent distributions in terms of parameters. There are two ways of introducing parameters in a model. The first is to define a function from a space of probability distributions to a vector in some Euclidean space:

θ : Fₙ → ℝᵖ.   (3.3)

Examples of such parameters include: the moments of a distribution (mean, variance, kurtosis, etc.), its quantiles (median, quartiles, etc.). Such functions are also called functionals. The second approach is to define a family of distribution functions which are indexed by a parameter vector θ:

F(x) = F₀(x | θ)   (3.4)

where F₀ is a distribution function with a specific form. For example, if F₀(x | θ) represents a Gaussian distribution with mean μ and variance σ² [e.g., corresponding to a Gaussian law], we have θ = (μ, σ²).

⁶ For further discussion on the issues discussed in this section, the reader may consult Dufour (2000).
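The two routes in (3.3)-(3.4) can be sketched numerically (a toy example of ours): a functional assigns a number to a distribution directly, here approximated by evaluating it on a large simulated sample, without committing to any parametric family.

```python
import numpy as np

# Toy illustration (ours) of a "functional": a map from a distribution F to a
# number, approximated here by evaluating the map on a large simulated sample.
rng = np.random.default_rng(1)
sample = rng.standard_normal(100_000)  # stand-in for draws from F = N(0, 1)

theta_mean = np.mean(sample)           # the mean functional: F -> E_F[X]
theta_median = np.median(sample)       # the median (0.5-quantile) functional

# The parametric route (3.4) instead fixes a family F0(x | theta), e.g. the
# Gaussian family with theta = (mu, sigma^2), and lets theta index F directly.
print(abs(theta_mean) < 0.05, abs(theta_median) < 0.05)  # True True (both near 0)
```

Both functionals here target the same value (0), but, as section 4 will show, they behave very differently once the family of admissible distributions is nonparametric.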

A model is parametric if the distribution of the data is specified up to a finite number of (scalar) parameters. Otherwise, it is nonparametric. An hypothesis H₀ on X⁽ⁿ⁾ is an assertion

H₀ : F ∈ ℋ₀,   (3.5)

where ℋ₀ is a subset of Fₙ, the set of all possible distribution functions. The set ℋ₀ may contain a single distribution (simple hypothesis) or several distributions (composite hypothesis). In particular, if we can write θ = (θ₁, θ₂), H₀ often takes the following form:

ℋ₀ ≡ {F(·) : F(x) = F₀(x | θ₁, θ₂) and θ₁ = θ₁⁰}.   (3.6)

We usually abbreviate this as:

H₀ : θ₁ = θ₁⁰.   (3.7)

In such a case, we call θ₁ the parameter of interest, and θ₂ a nuisance parameter: the parameter of interest is set by H₀ but the nuisance parameter remains unknown. H₀ may be interpreted as follows: there is at least one distribution in ℋ₀ that can be viewed as a representation compatible with the observed behavior of X⁽ⁿ⁾. Then we can say that:

H₀ is acceptable ⟺ (∃F ∈ ℋ₀) F is acceptable   (3.8)

or, equivalently,

H₀ is unacceptable ⟺ (∀F ∈ ℋ₀) F is unacceptable.   (3.9)


It is important to note here that showing that H₀ is unacceptable requires one to show that all distributions in ℋ₀ are incompatible with the observed data.

3.2. Test level and size

A test for H₀ is a rule by which one decides to reject or accept the hypothesis (or to view it as incompatible with the data). It usually takes the form:

reject H₀ if Sₙ(X₁, …, Xₙ) > c,
do not reject H₀ if Sₙ(X₁, …, Xₙ) ≤ c.   (3.10)

The test has level α iff

P_F[Rejecting H₀] ≤ α for all F ∈ ℋ₀   (3.11)

or, equivalently,

sup_{F ∈ ℋ₀} P_F[Rejecting H₀] ≤ α,   (3.12)

where P_F[·] is the function (probability measure) giving the probability of an event when the data distribution function is F. The test has size α if

sup_{F ∈ ℋ₀} P_F[Rejecting H₀] = α.   (3.13)

H₀ is testable if we can find a finite number c that satisfies the level restriction. Probabilities of rejecting H₀ for distributions outside ℋ₀ (i.e., for F ∉ ℋ₀) define the power function of the test.⁷ Power describes the ability of a test to detect a false hypothesis. Alternative tests are typically assessed by comparing their powers: between two tests with the same level, the one with the highest power against a given alternative distribution F ∉ ℋ₀ is deemed preferable (at least, under this particular alternative). Among tests with the same level, we typically like to have a test with the highest possible power against alternatives of interest.

As the set ℋ₀ gets larger, the test procedure must satisfy a bigger set of constraints: the larger the set of distributions compatible with a null hypothesis, the stronger the restrictions on the test procedure. In other words, the less restrictive an hypothesis is, the more restricted will be the corresponding test procedure. It is easy to understand that imposing a large set of restrictions on a test procedure may reduce its power against specific alternatives. There may be a point where the restrictions are no longer implementable, in the sense that no procedure which has some power can satisfy the level constraint: H₀ is non-testable. In such a case, we have an ill-defined test problem.

In a framework such as the one in (3.6), where we distinguish between a parameter of interest θ₁ and a nuisance parameter θ₂, this is typically due to heavy dependence of the distribution of Sₙ on the nuisance parameter θ₂. If the latter is specified, we may be able to find a (finite) critical value c = c(α, θ₂) that satisfies the level constraint (3.11). But, in ill-defined problems, c(α, θ₂) depends heavily on θ₂, so that it is not possible to find a useful (finite) critical value for testing H₀, i.e. sup_{θ₂} c(α, θ₂) = ∞. Besides, even if this is the case, this does not imply that an hypothesis that would fix both θ₁ and θ₂ is not testable, i.e. the hypothesis H₀′ : (θ₁, θ₂) = (θ₁⁰, θ₂⁰) may be perfectly testable. But only a complete specification of the vector (θ₁, θ₂) does allow one to interpret the values taken by the test statistic Sₙ (nonseparability).

⁷ More formally, the power function can be defined as the function: P(F) = P_F[Rejecting H₀] for F ∈ ℋ₁ \ ℋ₀, where ℋ₁ is an appropriate subset of the set of all possible distributions Fₙ. Sometimes, it is also defined on the set ℋ₁ ∪ ℋ₀, in which case it should satisfy the level constraint for F ∈ ℋ₀.
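The level requirement (3.11)-(3.12) can be checked by simulation (a sketch of ours, using the classical two-sided t-test as the example): the rejection probability under the null must stay at or below α for every distribution in ℋ₀, here meaning for every value of the nuisance parameter σ.

```python
import numpy as np
from scipy import stats

# Simulation sketch (ours): the two-sided t-test of H0: mu = 0 with i.i.d.
# N(mu, sigma^2) data respects the level constraint (3.11) for every value
# of the nuisance parameter sigma.
rng = np.random.default_rng(42)
n, alpha, reps = 30, 0.05, 20_000
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # two-sided critical value

for sigma in (0.5, 1.0, 10.0):                # vary the nuisance parameter
    x = rng.normal(0.0, sigma, size=(reps, n))
    t = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)
    reject = np.mean(np.abs(t) > crit)
    print(sigma, round(reject, 3))            # close to alpha = 0.05 each time
```

The rejection frequency stays near α for every σ because the t-statistic is pivotal, which is precisely the property discussed in the next subsection; in an ill-defined problem, no finite critical value would control the rejection rate uniformly over the nuisance parameter.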

    3.3. Confidence sets and pivots

    If we consider an hypothesis of the form

    H₀(θ₁⁰) : θ₁ = θ₁⁰ (3.14)

and if we can build a different test Sₙ(θ₁⁰; X₁, . . . , Xₙ) for each possible value of θ₁⁰, we can determine the set of values that can be viewed as compatible with the data according to the tests considered:

    C = {θ₁⁰ : Sₙ(θ₁⁰; X₁, . . . , Xₙ) ≤ c(θ₁⁰)} . (3.15)

    If

    P_F[Rejecting H₀(θ₁⁰)] ≤ α for all F ∈ H(θ₁⁰) , (3.16)

we have

    inf_{θ₁, θ₂} P[θ₁ ∈ C] ≥ 1 − α . (3.17)

    C is a confidence set with level 1 − α for θ₁. The set C covers the true parameter value θ₁ with probability at least 1 − α. The minimal probability of covering the true value of θ₁, i.e. inf_{θ₁, θ₂} P[θ₁ ∈ C], is called the size of the confidence set.

    In practice, confidence regions (or confidence intervals) were made possible by the discovery of pivotal functions (or pivots): a pivot for θ₁ is a function Sₙ(θ₁; X₁, . . . , Xₙ) whose distribution does not depend on unknown parameters (nuisance parameters); in particular, the distribution does not depend on θ₂. More generally, the function Sₙ(θ₁; X₁, . . . , Xₙ) is boundedly pivotal if its distribution function may depend on θ but is bounded over the parameter space [see Dufour (1997)]. When we have a pivotal function (or a boundedly pivotal function), we can find a point c such that:

    P[Sₙ(θ₁; X₁, . . . , Xₙ) ≤ c] ≥ 1 − α . (3.18)

For example, if X₁, . . . , Xₙ are i.i.d. N[μ, σ²], the t-statistic

    tₙ(μ) = √n (X̄ₙ − μ)/s_X , (3.19)

where X̄ₙ = Σᵢ₌₁ⁿ Xᵢ/n and s_X = [Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)²/(n − 1)]^{1/2}, follows a Student t(n − 1) distribution which does not depend on the unknown values of μ and σ; hence, it is a pivot. By contrast, √n (X̄ₙ − μ) is not a pivot because its distribution depends on σ. More generally, in the classical linear model with several regressors, the t statistics for individual coefficients [say, t(βᵢ) = √n (β̂ᵢ − βᵢ)/σ̂_{β̂ᵢ}] constitute pivots because their distributions do not depend on unknown nuisance parameters; in particular, the values of the other regression coefficients disappear from the distribution.
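As a quick illustration of pivotality (this simulation is mine, not from the paper; the sample size, parameter pairs and seed are arbitrary), one can check numerically that the distribution of tₙ(μ) in (3.19) is unaffected by the values of μ and σ:

```python
import numpy as np

def t_stat(x, mu):
    # t_n(mu) = sqrt(n) * (xbar_n - mu) / s_X, as in (3.19)
    n = len(x)
    return np.sqrt(n) * (x.mean() - mu) / x.std(ddof=1)

rng = np.random.default_rng(0)
n, reps = 25, 20000

# Simulate t_n(mu) under two very different (mu, sigma) pairs.
quantiles = {}
for mu, sigma in [(0.0, 1.0), (50.0, 9.0)]:
    stats = np.array([t_stat(rng.normal(mu, sigma, n), mu) for _ in range(reps)])
    quantiles[(mu, sigma)] = np.quantile(stats, 0.95)

q1 = quantiles[(0.0, 1.0)]
q2 = quantiles[(50.0, 9.0)]
# Both empirical 95% quantiles estimate the same Student t(24) quantile
# (about 1.711), whatever mu and sigma: t_n(mu) is a pivot.
print(round(q1, 2), round(q2, 2))
```

Repeating the experiment with √n(X̄ₙ − μ) instead would show quantiles that scale with σ, confirming that this statistic is not pivotal.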

    3.4. Testability and identification

    When formulating and trying to solve test problems, two types of basic difficulties can arise. First, there is no valid test that satisfies reasonable properties [such as depending upon the data]: in such a case, we have a non-testable hypothesis, an empirically empty hypothesis. Second, the proposed statistic cannot be pivotal for the model considered: its distribution varies too much under the null hypothesis to determine a finite critical point satisfying the level restriction (3.18).

    If an hypothesis is non-testable, we are not able to design a reasonable procedure for deciding whether it holds (without the introduction of additional data or information). This difficulty is closely related to the concept of identification in econometrics. A parameter θ is identifiable iff

    θ(F₁) ≠ θ(F₂)  ⟹  F₁ ≠ F₂ . (3.20)

For θ₁ ≠ θ₂, we can, in principle, design a procedure for deciding whether θ = θ₁ or θ = θ₂. The values of θ are testable. More generally, a parametric transformation g(θ) is identifiable iff

    g[θ(F₁)] ≠ g[θ(F₂)]  ⟹  F₁ ≠ F₂ . (3.21)

Intuitively, these definitions mean that different values of the parameter imply different distributions of the data, so that we may expect to be able to tell the difference by looking at the data. This is certainly the case when a unique distribution is associated with each parameter value [for example, we may use the Neyman-Pearson likelihood ratio test to make the decision], but this may not be the case when a parameter covers several distributions. In the next section, we examine several cases where this happens.

    4. Testability, nonparametric models and asymptotic methods

    We will now discuss three examples of test problems that look perfectly well defined and sensible at first sight, but turn out to be ill-defined when we look at them more carefully. These include: (1) testing an hypothesis about a mean when the observations are independent and identically distributed (i.i.d.); (2) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form; (3) testing the unit root hypothesis on an autoregressive model whose order can be arbitrarily large.⁸

    4.1. Procedures robust to nonnormality

    One of the most basic problems in econometrics and statistics consists in testing an hypothesis about a mean, for example, its equality to zero. Tests on regression coefficients in linear regressions

    ⁸Further discussion of the issues considered in this section is available in Dufour (2001). For related discussions, see also Horowitz (2001), Maasoumi (1992) and Pötscher (2002).

or, more generally, on parameters of models which are estimated by the generalized method of moments (GMM), can be viewed as extensions of this fundamental problem. If the simplest versions of the problem have no reasonable solution, the situation will not improve when we consider more complex versions (as done routinely in econometrics).

    The problem of testing an hypothesis about a mean has a very well known and neat solution when the observations are independent and identically distributed (i.i.d.) according to a normal distribution: we can use a t test. The normality assumption, however, is often considered to be too strong. So it is tempting to consider a weaker (less restrictive) version of this null hypothesis, such as

    H₀(μ₀) : X₁, . . . , Xₙ are i.i.d. observations with E(X₁) = μ₀ . (4.1)

    In other words, we would like to test the hypothesis that the observations have mean μ₀, under the general assumption that X₁, . . . , Xₙ are i.i.d. Here H₀(μ₀) is a nonparametric hypothesis because the distribution of the data cannot be completely specified by fixing a finite number of parameters. The set of possible data distributions (or data generating processes) compatible with this hypothesis, i.e.,

    H(μ₀) = {distribution functions Fₙ ∈ 𝓕ₙ such that H₀(μ₀) is satisfied} , (4.2)

is much larger here than in the Gaussian case and imposes very strong restrictions on the test. Indeed, the set H(μ₀) is so large that the following property must hold.

    Theorem 4.1 MEAN NON-TESTABILITY IN NONPARAMETRIC MODELS. If a test has level α for H₀(μ₀), i.e.

    P_{Fₙ}[Rejecting H₀(μ₀)] ≤ α for all Fₙ ∈ H(μ₀) , (4.3)

then, for any μ₁ ≠ μ₀,

    P_{Fₙ}[Rejecting H₀(μ₀)] ≤ α for all Fₙ ∈ H(μ₁) . (4.4)

Further, if there is at least one value μ₁ ≠ μ₀ such that

    P_{Fₙ}[Rejecting H₀(μ₀)] ≥ α for at least one Fₙ ∈ H(μ₁) , (4.5)

then, for all μ₁ ≠ μ₀,

    P_{Fₙ}[Rejecting H₀(μ₀)] = α for all Fₙ ∈ H(μ₁) . (4.6)

PROOF. See Bahadur and Savage (1956).

    In other words [by (4.4)], if a test has level α for testing H₀(μ₀), the probability of rejecting H₀(μ₀) should not exceed the level α irrespective of how far the true mean is from μ₀. Further [by (4.6)], if by luck the power of the test gets as high as the level, then the probability of rejecting should be uniformly equal to the level α. Here, the restrictions imposed by the level constraint are so strong that the test cannot have power exceeding its level: it should be insensitive to cases where the null hypothesis does not hold! An optimal test (say, at level .05) in such a problem can be run

as follows: (1) ignore the data; (2) using a random number generator, produce a realization of a variable U according to a uniform distribution on the interval (0, 1), i.e., U ∼ U(0, 1); (3) reject H₀ if U ≤ .05. Clearly, this is not an interesting procedure. It is also easy to see that a similar result will hold if we add various nonparametric restrictions on the distribution, such as a finite variance assumption.
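The "optimal" procedure just described can be written out in a few lines; the sketch below (the function name and replication count are mine, purely illustrative) confirms that its rejection frequency is the level whatever data it is fed:

```python
import random

def trivial_level_test(data, alpha=0.05, rng=random):
    # Step (1): ignore the data entirely.
    del data
    # Steps (2)-(3): draw U ~ U(0, 1) and reject H0 when U <= alpha.
    return rng.random() <= alpha

random.seed(1)
reps = 100_000
rejections = sum(trivial_level_test([0.0] * 100) for _ in range(reps))
# The frequency is close to 0.05 regardless of the sample passed in:
# the test has exact level 0.05 and power 0.05 everywhere.
print(rejections / reps)
```

Feeding it any other sample, however far its mean is from μ₀, leaves the rejection frequency unchanged, which is exactly the content of Theorem 4.1.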

    The above theorem also implies that a test based on the asymptotic distribution of the usual t statistic for μ = μ₀ [tₙ(μ₀) defined in (3.19)] has size one under H₀(μ₀):

    sup_{Fₙ ∈ H(μ₀)} P_{Fₙ}[|tₙ(μ₀)| > c] = 1 (4.7)

for any finite critical value c. In other words, procedures based on the asymptotic distribution of a test statistic have a size that deviates arbitrarily from their nominal size.

    A way to interpret what happens here is through the distinction between pointwise convergence and uniform convergence. Suppose, to simplify, that the probability of rejecting H₀(μ₀) when it is true depends on a single nuisance parameter θ in the following way:

    Pₙ(θ) ≡ P[|tₙ(μ₀)| > c] = 0.05 + (0.95) e^{−|θ|n} , (4.8)

where θ ≠ 0. Then, for each value of θ, the test has level 0.05 asymptotically, i.e.

    lim_{n→∞} Pₙ(θ) = 0.05 , (4.9)

but the size of the test is one for all sample sizes:

    sup_{θ>0} Pₙ(θ) = 1 , for all n . (4.10)

Pₙ(θ) converges to a level of 0.05 pointwise (for each θ), but the convergence is not uniform, so that the probability of rejection is arbitrarily close to one for θ sufficiently close to zero (for all sample sizes n).
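The pointwise-versus-uniform distinction can be checked numerically on the hypothetical rejection function (4.8); the θ grid below is arbitrary:

```python
import math

def p_reject(theta, n):
    # P_n(theta) = 0.05 + 0.95 * exp(-|theta| * n), as in (4.8)
    return 0.05 + 0.95 * math.exp(-abs(theta) * n)

# Pointwise: for any fixed theta != 0 the rejection probability tends to 0.05 ...
print([round(p_reject(0.5, n), 4) for n in (10, 100, 1000)])

# ... but not uniformly in theta: for every n, values of theta close enough
# to zero push the rejection probability back up toward 1.
for n in (10, 100, 1000):
    sup = max(p_reject(t, n) for t in (10.0 ** -k for k in range(1, 12)))
    print(n, round(sup, 4))
```

Increasing n only shrinks the neighbourhood of θ = 0 where over-rejection occurs; it never removes it, which is why the size stays at one for every sample size.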

    Many other hypotheses lead to similar difficulties. Examples include:

    1. hypotheses about various moments of Xₜ:

    H₀(σ²) : X₁, . . . , Xₙ are i.i.d. observations such that Var(Xₜ) = σ² ,
    H₀(μₚ) : X₁, . . . , Xₙ are i.i.d. observations such that E(Xₜᵖ) = μₚ ;

    2. most hypotheses on the coefficients of a regression (linear or nonlinear), a structural equation (as in SEM), or a more general estimating function [Godambe (1960)]:

    H₀(θ₀) : gₜ(Zₜ, θ₀) = uₜ , t = 1, . . . , T, where u₁, . . . , u_T are i.i.d.

    In econometrics, models of the form H₀(θ₀) are typically estimated and tested through a variant of the generalized method of moments (GMM), usually with weaker assumptions on the distribution

of u₁, . . . , u_T; see Hansen (1982), Newey and West (1987a), Newey and McFadden (1994) and Hall (1999). To the extent that GMM methods are viewed as a way to allow for weak assumptions, it follows from the above discussion that they constitute pseudo-solutions of ill-defined problems.

    It is important to observe that the above discussion does not imply that all nonparametric hypotheses are non-testable. In the present case, the problem of non-testability could be eliminated by choosing another measure of central tendency, such as a median:

    H_{0.50}(m₀) : X₁, . . . , Xₙ are i.i.d. continuous r.v.'s such that Med(Xₜ) = m₀ , t = 1, . . . , T .

    H_{0.50}(m₀) can be easily tested with a sign test [see Pratt and Gibbons (1981, Chapter 2)]. More generally, hypotheses on the quantiles of the distribution of observations in a random sample remain testable nonparametrically:

    H_{p₀}(Q_{p₀}) : X₁, . . . , Xₙ are i.i.d. observations such that P[Xₜ ≤ Q_{p₀}] = p₀ , t = 1, . . . , T .

Moments are not empirically meaningful functionals in nonparametric models (unless strong distributional assumptions are added), though quantiles are.
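A sign test for H_{0.50}(m₀) is easy to implement exactly, since under the null hypothesis the number of observations above m₀ is Binomial(n, 1/2) whatever the (continuous) data distribution; the function and sample below are illustrative sketches, not taken from the paper:

```python
from math import comb

def sign_test_pvalue(x, m0):
    # Exact two-sided sign test of H0: Med(X) = m0 for i.i.d. continuous data.
    # Under H0, S = #{x_i > m0} ~ Binomial(n, 1/2), so the p-value requires no
    # knowledge of the distribution (no moments, no symmetry, no homoskedasticity).
    signs = [xi > m0 for xi in x if xi != m0]  # ties occur with probability 0
    n, s = len(signs), sum(signs)
    k = max(s, n - s)
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2.0 ** n
    return min(1.0, 2.0 * tail)

x = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 3.9, 3.3, 2.9, 3.6]  # sample centered near 3
print(sign_test_pvalue(x, 0.0))   # all ten signs positive: p = 2 * (1/2)**10
print(sign_test_pvalue(x, 3.0))   # six of ten above 3: large p-value
```

The exact binomial null distribution is what makes the quantile hypotheses above testable where the corresponding moment hypotheses are not.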

    4.2. Procedures robust to heteroskedasticity of unknown form

    Another common problem in econometrics consists in developing methods which remain valid when making inference on regression coefficients while the variances of the observations are not identical (heteroskedasticity). In particular, this may go as far as looking for tests which are robust to heteroskedasticity of unknown form. But it is not widely appreciated that this involves very strong restrictions on the procedures that can satisfy this requirement. To see this, consider the problem which consists in testing whether n observations are independent with common zero median, namely:

    H₀ : X₁, . . . , Xₙ are independent random variables, each with a distribution symmetric about zero. (4.11)

    Equivalently, H₀ states that the joint distribution Fₙ of the observations belongs to the (huge) set H₀ = {Fₙ ∈ 𝓕ₙ : Fₙ satisfies H₀}: H₀ allows heteroskedasticity of unknown form. In such a case, we have the following theorem.

    Theorem 4.2 CHARACTERIZATION OF HETEROSKEDASTICITY ROBUST TESTS. If a test has level α for H₀, where 0 < α < 1, then it must satisfy the condition

    P[Rejecting H₀ | |X₁|, . . . , |Xₙ|] ≤ α under H₀ . (4.12)

PROOF. See Pratt and Gibbons (1981, Section 5.10) and Lehmann and Stein (1949).

    In other words, a valid test with level α must be a sign test or, more precisely, its level must be equal to α conditional on the absolute values of the observations (which amounts to considering a test based on the signs of the observations). From this, the following remarkable property follows.

    Corollary 4.3 If, for all 0 < α < 1, the condition (4.12) is not satisfied, then the size of the test is equal to one, i.e.

    sup_{Fₙ ∈ H₀} P_{Fₙ}[Rejecting H₀] = 1 . (4.13)

    In other words, if a test procedure does not satisfy (4.12) for all levels 0 < α < 1, then its true size is one irrespective of its nominal size. Most so-called heteroskedasticity-robust procedures based on corrected standard errors [see White (1980), Newey and West (1987b), Davidson and MacKinnon (1993, Chapter 16), Cushing and McGarvey (1999)] do not satisfy condition (4.12) and consequently have size one.⁹
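To illustrate the contrast, the following simulation (my own sketch; the scale pattern, sample size and seed are arbitrary) checks that a test satisfying (4.12), here the exact two-sided sign test, keeps its size under extreme heteroskedasticity, because its level is controlled conditionally on |X₁|, . . . , |Xₙ|:

```python
import numpy as np
from math import comb

n, reps, alpha = 50, 20_000, 0.05

def upper_tail(k):
    # P[S >= k] when S ~ Binomial(n, 1/2)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2.0 ** n

# Exact two-sided sign-test critical value: reject when S >= c or S <= n - c.
c = next(k for k in range(n + 1) if upper_tail(k) <= alpha / 2)
exact_size = 2 * upper_tail(c)  # <= alpha by construction

rng = np.random.default_rng(42)
rej = 0
for _ in range(reps):
    # Extreme heteroskedasticity: every observation has its own scale,
    # spanning eight orders of magnitude; all remain symmetric about zero.
    scales = 10.0 ** rng.uniform(-4.0, 4.0, n)
    x = rng.normal(0.0, scales)
    s = int((x > 0).sum())
    rej += (s >= c) or (s <= n - c)

print(round(exact_size, 4), round(rej / reps, 4))  # the two should be close
```

Because the sign count is Binomial(n, 1/2) conditionally on the absolute values, the rejection frequency matches the exact size no matter how wild the variance pattern is.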

    4.3. Procedures robust to autocorrelation of arbitrary form

    As a third illustration, let us now examine the problem of testing the unit root hypothesis in the context of an autoregressive model whose order is infinite or is not bounded by a prespecified maximal order:

    Xₜ = β₀ + Σ_{k=1}^p λₖ X_{t−k} + uₜ , uₜ i.i.d. N[0, σ²] , t = 1, . . . , n , (4.14)

where p is not bounded a priori. This type of problem has attracted a lot of attention in recent years.¹⁰ We wish to test:

    H₀ : Σ_{k=1}^p λₖ = 1 (4.15)

or, more precisely,

    H₀ : Xₜ = β₀ + Σ_{k=1}^p λₖ X_{t−k} + uₜ , t = 1, . . . , n , for some p ≥ 0 , Σ_{k=1}^p λₖ = 1 and uₜ i.i.d. N[0, σ²] . (4.16)

    About this problem, we can show the following theorem and corollary.

    Theorem 4.4 UNIT ROOT NON-TESTABILITY IN NONPARAMETRIC MODELS. If a test has level α for H₀, i.e.

    P_{Fₙ}[Rejecting H₀] ≤ α for all Fₙ satisfying H₀ , (4.17)

then

    P_{Fₙ}[Rejecting H₀] ≤ α for all Fₙ . (4.18)

PROOF. See Cochrane (1991) and Blough (1992).

    ⁹For examples of size distortions, see Dufour (1981) and Campbell and Dufour (1995, 1997).

    ¹⁰For reviews of this huge literature, the reader may consult: Banerjee, Dolado, Galbraith, and Hendry (1993), Stock (1994), Tanaka (1996), and Maddala and Kim (1998).

    Corollary 4.5 If, for all 0 < α < 1, the condition (4.18) is not satisfied, then the size of the test is equal to one, i.e.

    sup_{Fₙ ∈ H₀} P_{Fₙ}[Rejecting H₀] = 1 ,

where H₀ is the set of all data distributions Fₙ that satisfy H₀.

    As in the mean problem, the null hypothesis is simply too large (unrestricted) to allow testing from a finite data set. Consequently, all procedures that claim to offer corrections for very general forms of serial dependence [e.g., Phillips (1987), Phillips and Perron (1988)] are affected by these problems: irrespective of the nominal level of the test, the true size under the hypothesis H₀ is equal to one.

    To get a testable hypothesis, it is essential to fix jointly the order of the AR process (i.e., a numerical upper bound on the order) and the sum of the coefficients: for example, we could consider the following null hypothesis where the order of the autoregressive process is equal to 12:

    H₀(12) : Xₜ = β₀ + Σ_{k=1}^{12} λₖ X_{t−k} + uₜ , t = 1, . . . , n , Σ_{k=1}^{12} λₖ = 1 and uₜ i.i.d. N[0, σ²] . (4.19)

    The order of the autoregressive process is an essential part of the hypothesis: it is not possible to separate inference on the unit root hypothesis from inference on the order of the process. Similar difficulties will also occur for most other hypotheses on the coefficients of (4.16). For further discussion of this topic, the reader may consult Sims (1971a, 1971b), Blough (1992), Faust (1996, 1999) and Pötscher (2002).

    5. Structural models and weak instruments

    Several authors in the past have noted that usual asymptotic approximations are not valid or lead to very inaccurate results when parameters of interest are close to regions where these parameters are no longer identifiable. The literature on this topic is now considerable.¹¹ In this section, we shall examine these issues in the context of SEM.

    ¹¹See Sargan (1983), Phillips (1984, 1985, 1989), Gleser and Hwang (1987), Koschat (1987), Hillier (1990), Nelson and Startz (1990a, 1990b), Buse (1992), Choi and Phillips (1992), Maddala and Jeong (1992), Bound, Jaeger, and Baker (1993, 1995), Dufour and Jasiak (1993, 2001), McManus, Nankervis, and Savin (1994), Angrist and Krueger (1995), Hall, Rudebusch, and Wilcox (1996), Dufour (1997), Shea (1997), Staiger and Stock (1997), Wang and Zivot (1998), Zivot, Startz, and Nelson (1998), Hall and Peixe (2000), Stock and Wright (2000), Hahn and Hausman (2002a, 2002b, 2002c), Hahn, Hausman, and Kuersteiner (2001), Dufour and Taamouti (2000, 2001a, 2001b), Startz, Nelson, and Zivot (2001), Kleibergen (2001a, 2001b, 2002a, 2002b, 2003), Bekker (2002), Bekker and Kleibergen (2001), Chao and Swanson (2001, 2003), Moreira (2001, 2003a, 2003b), Moreira and Poi (2001), Stock and Yogo (2002, 2003), Stock, Wright, and Yogo (2002), Wright (2002, 2003), Imbens and Manski (2003), Kleibergen and Zivot (2003), Perron (2003), and Zivot, Startz, and Nelson (2003).

    5.1. Standard simultaneous equations model

    Let us consider the standard simultaneous equations model:

    y = Yβ + X₁γ + u , (5.1)
    Y = X₁Π₁ + X₂Π₂ + V , (5.2)

where y and Y are T × 1 and T × G matrices of endogenous variables, X₁ and X₂ are T × k₁ and T × k₂ matrices of exogenous variables, β and γ are G × 1 and k₁ × 1 vectors of unknown coefficients, Π₁ and Π₂ are k₁ × G and k₂ × G matrices of unknown coefficients, u = (u₁, . . . , u_T)′ is a T × 1 vector of structural disturbances, and V = [V₁, . . . , V_T]′ is a T × G matrix of reduced-form disturbances. Further,

    X = [X₁, X₂] is a full-column rank T × k matrix, (5.3)

where k = k₁ + k₂. Finally, to get a finite-sample distributional theory for the test statistics, we shall use the following assumptions on the distribution of u:

    u and X are independent; (5.4)
    u ∼ N[0, σᵤ² I_T] . (5.5)

    (5.4) may be interpreted as the strict exogeneity of X with respect to u. Note that the distribution of V is not otherwise restricted; in particular, the vectors V₁, . . . , V_T need not follow a Gaussian distribution and may be heteroskedastic. Below, we shall also consider the situation where the reduced-form equation for Y includes a third set of instruments X₃ which are not used in the estimation:

    Y = X₁Π₁ + X₂Π₂ + X₃Π₃ + V , (5.6)

where X₃ is a T × k₃ matrix of explanatory variables (not necessarily strictly exogenous); in particular, X₃ may be unobservable. We view this situation as important because, in practice, it is quite rare that one can consider all the relevant instruments that could be used. Even more generally, we could also assume that Y obeys a general nonlinear model of the form:

    Y = g(X₁, X₂, X₃, V, Π) , (5.7)

where g(·) is a possibly unspecified nonlinear function and Π is an unknown parameter matrix.

    The model presented in (5.1) - (5.2) can be rewritten in reduced form as:

    y = X₁π₁ + X₂π₂ + v , (5.8)
    Y = X₁Π₁ + X₂Π₂ + V , (5.9)

where π₁ = γ + Π₁β, v = u + Vβ, and

    π₂ = Π₂β . (5.10)

    Suppose now that we are interested in making inference about β. (5.10) is the crucial equation governing identification in this system: we need to be able to recover β from the values of the regression coefficients π₂ and Π₂. The necessary and sufficient condition for identification is the well-known rank condition for the identification of β:

    β is identifiable iff rank(Π₂) = G . (5.11)

We have a weak instrument problem when either rank(Π₂) < G (non-identification), or Π₂ is close to having deficient rank [i.e., rank(Π₂) = G with strong linear dependence between the rows (or columns) of Π₂]. There is no compelling definition of the notion of near-nonidentification, but reasonable characterizations include the condition that det(Π₂′Π₂) is close to zero, or that Π₂′Π₂ has one or several eigenvalues close to zero.

    parameter estimation; (2) condence interval construction; (3) hypothesis testing. We now considerthese problems in greater detail.

    5.2. Statistical problems associated with weak instruments

    The problems associated with weak instruments were originally discovered through their consequences for estimation. Work in this area includes:

    1. theoretical work on the exact distribution of two-stage least squares (2SLS) and other consistent structural estimators and test statistics [Phillips (1983, 1984, 1985, 1989), Rothenberg (1984), Hillier (1990), Nelson and Startz (1990a), Buse (1992), Maddala and Jeong (1992), Choi and Phillips (1992), Dufour (1997)];

    2. weak-instrument (local to non-identification) asymptotics [Staiger and Stock (1997), Wang and Zivot (1998), Stock and Wright (2000)];

    3. empirical examples [Bound, Jaeger, and Baker (1995)].

    The main conclusions of this research can be summarized as follows.

    1. Theoretical results show that the distributions of various estimators depend in a complicated way on unknown nuisance parameters. Thus, they are difficult to interpret.

    2. When identification conditions are not satisfied, standard asymptotic theory for estimators and test statistics typically collapses.

    3. With weak instruments,

    (a) the 2SLS estimator becomes heavily biased [in the same direction as ordinary least squares (OLS)];

    (b) the distribution of the 2SLS estimator is quite far from the normal distribution (e.g., bimodal).

    4. A striking illustration of these problems appears in the reconsideration by Bound, Jaeger, and Baker (1995) of a study on returns to education by Angrist and Krueger (1991, QJE). Using 329,000 observations, these authors found that replacing the instruments used by Angrist and Krueger (1991) with randomly generated (totally irrelevant) instruments produced very similar point estimates and standard errors. This result indicates that the original instruments were weak.

    For a more complete discussion of estimation with weak instruments, the reader may consult Stock, Wright, and Yogo (2002).

    5.3. Characterization of valid tests and confidence sets

    Weak instruments also lead to very serious problems when one tries to perform tests or build confidence intervals on the parameters of the model. Consider the general situation where we have two parameters θ₁ and θ₂ [i.e., θ = (θ₁, θ₂)] such that θ₂ is no longer identified when θ₁ takes a certain value, say θ₁ = θ₁⁰:

    L(y | θ₁, θ₂) ≡ L(y | θ₁⁰) . (5.12)

    Theorem 5.1 If θ₂ is a parameter whose value is not bounded, then the confidence region C with level 1 − α for θ₂ must have the following property:

    P[C is unbounded] > 0 (5.13)

and, if θ₁ = θ₁⁰,

    P[C is unbounded] ≥ 1 − α . (5.14)

    PROOF. See Dufour (1997).

    Corollary 5.2 If C does not satisfy the property given in the previous theorem, its size must be zero.

    This will be the case, in particular, for any Wald-type confidence interval, obtained by assuming that

    t_{θ̂₂} = (θ̂₂ − θ₂)/σ̂_{θ̂₂} ∼ N(0, 1) (approximately) , (5.15)

which yields confidence intervals of the form θ̂₂ − c σ̂_{θ̂₂} ≤ θ₂ ≤ θ̂₂ + c σ̂_{θ̂₂}, where P[|N(0, 1)| > c] ≤ α. By the above corollary, this type of interval has level zero, irrespective of the critical value c used:

    inf P[θ̂₂ − c σ̂_{θ̂₂} ≤ θ₂ ≤ θ̂₂ + c σ̂_{θ̂₂}] = 0 . (5.16)

    In such situations, the notion of standard error loses its usual meaning and does not constitute a valid basis for building confidence intervals. In SEM, for example, this applies to standard confidence intervals based on 2SLS estimators and their asymptotic standard errors.

    Correspondingly, if we wish to test an hypothesis of the form H₀ : θ₂ = θ₂⁰, the size of any test that rejects H₀ when

    t_{θ̂₂}(θ₂⁰) = |θ̂₂ − θ₂⁰|/σ̂_{θ̂₂} > c(α) (5.17)

will deviate arbitrarily from its nominal size. No unique large-sample distribution for t_{θ̂₂} can provide valid tests and confidence intervals based on the asymptotic distribution of t_{θ̂₂}. From a statistical viewpoint, this means that t_{θ̂₂} is not a pivotal function for the model considered. More generally, this type of problem affects the validity of all Wald-type methods, which are based on comparing parameter estimates with their estimated covariance matrix.

    By contrast, in models of the form (5.1) - (5.5), the distribution of the LR statistic for most hypotheses on model parameters can be bounded and cannot move arbitrarily: likelihood ratios are boundedly pivotal functions and provide a valid basis for testing and confidence set construction [see Dufour (1997)].

    The central conclusion here is: tests and confidence sets on the parameters of a structural model should be based on proper pivots.

    6. Approaches to weak instrument problems

    What should be the features of a satisfactory solution to the problem of making inference in structural models? We shall emphasize here four properties: (1) the method should be based on proper pivotal functions (ideally, a finite-sample pivot); (2) robustness to the presence of weak instruments; (3) robustness to excluded instruments; (4) robustness to the formulation of the model for the explanatory endogenous variables Y (which is desirable in many practical situations).

    In light of these criteria, we shall first discuss the Anderson-Rubin procedure, which in our view is the reference method for dealing with weak instruments in the context of standard SEM; second, the projection technique, which provides a general way of making a wide spectrum of tests and confidence sets; and thirdly, several recent proposals aimed at improving and extending the AR procedure.

    6.1. Anderson-Rubin statistic

    A solution to testing in the presence of weak instruments has been available for more than 50 years [Anderson and Rubin (1949)] and is now center stage again [Dufour (1997), Staiger and Stock (1997)]. Interestingly, the AR method can be viewed as an alternative way of exploiting instruments for inference on a structural model, although it pre-dates the introduction of 2SLS methods in SEM [Theil (1953), Basmann (1957)], which later became the most widely used method for estimating linear structural equation models.¹² The basic problem considered consists in testing the hypothesis

    H₀(β₀) : β = β₀ (6.1)

in model (5.1) - (5.4). In order to do that, we consider an auxiliary regression obtained by subtracting Yβ₀ from both sides of (5.1) and expanding the right-hand side in terms of the instruments. This yields the regression

    y − Yβ₀ = X₁θ₁ + X₂θ₂ + ε , (6.2)

where θ₁ = γ + Π₁(β − β₀), θ₂ = Π₂(β − β₀) and ε = u + V(β − β₀). Under the null hypothesis H₀(β₀), this equation reduces to

    y − Yβ₀ = X₁θ₁ + ε , (6.3)

so we can test H₀(β₀) by testing H₀′(β₀) : θ₂ = 0 in the auxiliary regression (6.2). This yields the following F-statistic, the Anderson-Rubin statistic, which follows a Fisher distribution under the null hypothesis:

    AR(β₀) = {[SS₀(β₀) − SS₁(β₀)]/k₂} / {SS₁(β₀)/(T − k)} ∼ F(k₂, T − k) , (6.4)

where SS₀(β₀) = (y − Yβ₀)′M(X₁)(y − Yβ₀) and SS₁(β₀) = (y − Yβ₀)′M(X)(y − Yβ₀); for any full-rank matrix A, we denote P(A) = A(A′A)⁻¹A′ and M(A) = I − P(A). What plays the crucial role here is the fact that we have instruments (X₂) that can be related to Y but are excluded from the structural equation. To draw inference on the structural parameter β, we hang on the variables in X₂: if we add X₂ to the constrained structural equation (6.3), its coefficients should be zero. For these reasons, we shall call the variables in X₂ auxiliary instruments.

    Since the latter statistic is a proper pivot, it can be used to build confidence sets for β:

    C_β(α) = {β₀ : AR(β₀) ≤ F_α(k₂, T − k)} , (6.5)

where F_α(k₂, T − k) is the critical value for a test with level α based on the F(k₂, T − k) distribution. When there is only one endogenous explanatory variable (G = 1), this set has an explicit solution involving a quadratic inequation, i.e.

    C_β(α) = {β₀ : a β₀² + b β₀ + c ≤ 0} , (6.6)

where a = Y′HY, H ≡ M(X₁) − M(X)[1 + k₂ F_α(k₂, T − k)/(T − k)], b = −2 Y′Hy, and c = y′Hy. The set C_β(α) may easily be determined by finding the roots of the quadratic polynomial in equation (6.6); see Dufour and Jasiak (2001) and Zivot, Startz, and Nelson (1998) for details.

    ¹²The basic ideas for using instrumental variables for inference on structural relationships appear to go back to Working (1927) and Wright (1928). For an interesting discussion of the origin of IV methods in econometrics, see Stock and Trebbi (2003).

    When G > 1, the set C_β(α) is not in general an ellipsoid, but it remains fairly manageable by using the theory of quadrics [Dufour and Taamouti (2000)]. When the model is correct and its parameters are well identified by the instruments used, C_β(α) is a closed bounded set close to an ellipsoid. In other cases, it can be unbounded or empty. Unbounded sets are highly likely when the model is not identified, so they point to lack of identification. Empty confidence sets can occur (with a non-zero probability) when we have more instruments than parameters in the structural equation (5.1), i.e. the model is overidentified. An empty confidence set means that no value of the parameter vector β is judged to be compatible with the available data, which indicates that the model is misspecified. So the procedure provides, as an interesting by-product, a specification test.¹³

    It is also easy to see that the above procedure remains valid even if the extended reduced form (5.6) is the correct model for Y. In other words, we can leave out a subset of the instruments (X₃) and use only X₂: the level of the procedure will not be affected. Indeed, this will also hold if Y is determined by the general, possibly nonlinear, model (5.7). The procedure is robust to excluded instruments as well as to the specification of the model for Y. The power of the test may be affected by the choice of X₂, but its level is not. Since it is quite rare that an investigator can be sure relevant instruments have not been left out, this is an important practical consideration.

The AR procedure can be extended easily to deal with linear hypotheses which involve $\gamma$ as well. For example, to test an hypothesis of the form
$$H_0(\beta_0, \gamma_0): \beta = \beta_0 \text{ and } \gamma = \gamma_0, \qquad (6.7)$$
we can consider the transformed model
$$y - Y\beta_0 - X_1\gamma_0 = X_1\theta_1 + X_2\theta_2 + \varepsilon. \qquad (6.8)$$
Since, under $H_0(\beta_0, \gamma_0)$,
$$y - Y\beta_0 - X_1\gamma_0 = u, \qquad (6.9)$$
we can test $H_0(\beta_0, \gamma_0)$ by testing $H_0'(\beta_0, \gamma_0): \theta_1 = 0$ and $\theta_2 = 0$ in the auxiliary regression (6.8); see Maddala (1974). Tests for more general restrictions of the form
$$H_0(\beta_0, \nu_0): \beta = \beta_0 \text{ and } R\gamma = \nu_0, \qquad (6.10)$$
where $R$ is an $r \times K$ fixed full-rank matrix, are discussed in Dufour and Jasiak (2001).

The AR procedure thus enjoys several remarkable features. Namely, it is: (1) pivotal in finite samples; (2) robust to weak instruments; (3) robust to excluded instruments; (4) robust to the specification of the model for $Y$ (which can be nonlinear with an unknown form); further, (5) the AR method provides asymptotically valid tests and confidence sets under quite weak distributional assumptions (basically, the assumptions that cover the usual asymptotic properties of linear regression); and (6) it can be extended easily to test restrictions and build confidence sets which also involve the coefficients of the exogenous variables, such as $H_0(\beta_0, \nu_0)$ in (6.10).
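As a concrete illustration of the transformed regression (6.8), the sketch below simulates a small model and computes the AR statistic as the standard F test that all coefficients vanish in the regression of $y - Y\beta_0 - X_1\gamma_0$ on $[X_1, X_2]$. All data and parameter values here are simulated for illustration only; they are not taken from the paper.

```python
import numpy as np

def ar_test(y, Y, X1, X2, beta0, gamma0):
    """AR-type test of H0: beta = beta0 and gamma = gamma0.

    Regress y - Y@beta0 - X1@gamma0 on X = [X1, X2] and compute the
    standard F statistic for all coefficients being zero; under H0 (and
    normal errors) it follows an exact F(k, T - k) distribution.
    """
    T = y.shape[0]
    X = np.column_stack([X1, X2])
    k = X.shape[1]
    z = y - Y @ beta0 - X1 @ gamma0            # transformed dependent variable
    resid = z - X @ np.linalg.lstsq(X, z, rcond=None)[0]
    rss, tss = resid @ resid, z @ z            # restricted model: no regressors
    return ((tss - rss) / k) / (rss / (T - k))

# Simulated example (illustrative numbers)
rng = np.random.default_rng(0)
T = 200
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, 3))
Pi2 = np.array([[1.0], [0.8], [-0.6]])         # strong instruments
V = rng.standard_normal((T, 1))
u = 0.5 * V[:, 0] + rng.standard_normal(T)     # endogeneity: u correlated with V
Y = X1 * 0.5 + X2 @ Pi2 + V
beta, gamma = np.array([0.7]), np.array([1.2])
y = Y @ beta + X1 @ gamma + u

F_true = ar_test(y, Y, X1, X2, beta, gamma)            # evaluated at the true values
F_far = ar_test(y, Y, X1, X2, np.array([5.0]), gamma)  # evaluated far from the truth
```

Under $H_0$ the statistic is $F(4, 196)$ in this design; when the instruments are strong, values of $\beta_0$ far from the truth should produce much larger statistics, which is what gives the test its power.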

But the method also has its drawbacks. The main ones are: (1) the tests and confidence sets obtained in this way apply only to the full vector $\beta$ [or $(\beta', \gamma')'$]; what can we do if $\beta$ has more than one element? (2) power may be low if too many instruments are added ($X_2$ has too many variables) to perform the test, especially if the instruments are irrelevant; (3) the error normality assumption is restrictive, and we may wish to consider other distributional assumptions; (4) the structural equations are assumed to be linear. We will now discuss a number of methods which have been proposed in order to circumvent these drawbacks.

13. For further discussion of this point, see Kleibergen (2002b).


6.2. Projections and inference on parameter subsets

Suppose now that $\beta$ [or $(\beta', \gamma')'$] has more than one component. The fact that a procedure with a finite-sample theory has been obtained for joint hypotheses of the form $H_0(\beta_0)$ [or $H_0(\beta_0, \gamma_0)$] is not due to chance: since the distribution of the data is determined by the full parameter vector, there is no reason in general why one should be able to decide on the value of a component of $\beta$ independently of the others. Such a separation is feasible only in special situations, e.g. in the classical linear model (without exact multicollinearity). Lack of identification is precisely a situation where the value of a parameter may be determined only after various restrictions (e.g., the values of other parameters) have been imposed. So parametric nonseparability arises here, and inference should start from a simultaneous approach. If the data generating process corresponds to a model where parameters are well identified, precise inferences on individual coefficients may be feasible. This raises the question of how one can move from a joint inference on $\beta$ to inference on its components.

A general approach to this problem consists in using a projection technique. If
$$P[\beta \in C_\beta(\alpha)] \geq 1 - \alpha, \qquad (6.11)$$
then, for any function $g(\beta)$,
$$P\big[g(\beta) \in g[C_\beta(\alpha)]\big] \geq 1 - \alpha. \qquad (6.12)$$
If $g(\beta)$ is a component of $\beta$ or (more generally) a linear transformation $g(\beta) = w'\beta$, the confidence set for this linear combination of the parameters takes the usual form $[w'\tilde{\beta} - \hat{\sigma} z_\alpha,\; w'\tilde{\beta} + \hat{\sigma} z_\alpha]$, with $\tilde{\beta}$ a k-class type estimator of $\beta$; see Dufour and Taamouti (2000).14

Another interesting feature comes from the fact that the confidence sets obtained in this way are simultaneous in the sense of Scheffé. More precisely, if $\{g_a(\beta) : a \in A\}$ is a set of functions of $\beta$, then
$$P\big[g_a(\beta) \in g_a[C_\beta(\alpha)] \text{ for all } a \in A\big] \geq 1 - \alpha. \qquad (6.13)$$
If these confidence intervals are used to test different hypotheses, an unlimited number of hypotheses can be tested without losing control of the overall level.
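When $C_\beta(\alpha)$ happens to be an ellipsoid $\{\beta : (\beta - \hat{\beta})'A(\beta - \hat{\beta}) \le c\}$ with $A$ positive definite, the projection for $g(\beta) = w'\beta$ has the closed form $w'\hat{\beta} \pm (c\,w'A^{-1}w)^{1/2}$ (a standard constrained-optimization calculation). The sketch below checks this formula against a direct search over the ellipsoid boundary; the ellipsoid and all numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

def project_interval(beta_hat, A, c, w):
    """Projection of the ellipsoid {beta: (beta-beta_hat)' A (beta-beta_hat) <= c}
    onto the linear combination w'beta: [w'beta_hat -/+ sqrt(c * w'A^{-1}w)]."""
    half = np.sqrt(c * (w @ np.linalg.solve(A, w)))
    center = w @ beta_hat
    return center - half, center + half

# Illustrative ellipsoid (made-up numbers)
beta_hat = np.array([1.0, -0.5])
A = np.array([[2.0, 0.3], [0.3, 1.0]])
c = 5.99                      # approx. chi-square(2) 95% critical value
w = np.array([1.0, 1.0])
lo, hi = project_interval(beta_hat, A, c, w)

# Numerical check: sample exact boundary points of the ellipsoid and project them
L = np.linalg.cholesky(np.linalg.inv(A))             # L L' = A^{-1}, so L' A L = I
angles = np.linspace(0.0, 2.0 * np.pi, 2000)
S = np.vstack([np.cos(angles), np.sin(angles)])      # unit circle
pts = beta_hat[None, :] + np.sqrt(c) * (L @ S).T     # boundary of the ellipsoid
vals = pts @ w                                       # projected values w'beta
```

The extremes of `vals` should match `lo` and `hi` up to grid resolution, which is the sense in which the projection interval collects every value of $w'\beta$ compatible with the joint set.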

6.3. Alternatives to the AR procedure

In view of improving the power of AR procedures, a number of alternative methods have been suggested recently. We will now discuss several of them.

a. Generalized auxiliary regression. A general approach to the problem of testing $H_0(\beta_0)$ consists in replacing $X_2$ in the auxiliary regression
$$y - Y\beta_0 = X_1\theta_1 + X_2\theta_2 + \varepsilon \qquad (6.14)$$

14. $g[C_\beta(\alpha)]$ takes the form of a bounded confidence interval as soon as the confidence set $C_\beta(\alpha)$ is bounded. For further discussion of projection methods, the reader may consult Dufour (1990, 1997), Campbell and Dufour (1997), Abdelkhalek and Dufour (1998), Dufour, Hallin, and Mizera (1998), Dufour and Kiviet (1998), and Dufour and Jasiak (2001).


by an alternative set of auxiliary instruments, say $\bar{Z}$ of dimension $T \times \bar{k}_2$. In other words, we consider the generalized auxiliary regression
$$y - Y\beta_0 = X_1\theta_1 + \bar{Z}\bar{\theta}_2 + \varepsilon \qquad (6.15)$$

where $\bar{\theta}_2 = 0$ under $H_0(\beta_0)$. So we can test $H_0(\beta_0)$ by testing $\bar{\theta}_2 = 0$ in (6.15). The problem then consists in selecting $\bar{Z}$ so that the level can be controlled and power may be improved with respect to the AR auxiliary regression (6.14). For example, it is easy to see that the power of the AR test could become low if a large set of auxiliary instruments is used, especially if the latter are weak. So several alternative procedures can be generated by reducing the number of auxiliary instruments (the number of columns in $\bar{Z}$).

At the outset, we should note that, if (5.8) were the correct model and $\Pi = [\Pi_1, \Pi_2]$ were known, then an optimal choice from the viewpoint of power consists in choosing $\bar{Z} = X_2\Pi_2$; see Dufour and Taamouti (2001b). The practical problem, of course, is that $\Pi_2$ is unknown. This suggests that we replace $X_2\Pi_2$ by an estimate, such as
$$\bar{Z} = X_2\hat{\Pi}_2 \qquad (6.16)$$

where $\hat{\Pi}_2$ is an estimate of the reduced-form coefficient $\Pi_2$ in (5.2). The problem then consists in choosing $\hat{\Pi}_2$. For that purpose, it is tempting to use the least squares estimator $\hat{\Pi} = (X'X)^{-1}X'Y$. However, $\hat{\Pi}$ and $u$ are not independent, and we continue to face a simultaneity problem with messy distributional consequences. Ideally, we would like to select an estimate $\hat{\Pi}_2$ which is independent of $u$.

b. Split-sample optimal auxiliary instruments. If we can assume that the error vectors $(u_t, V_t')'$, $t = 1, \ldots, T$, are independent, this approach to estimating $\Pi_2$ may be feasible by using a split-sample technique: a fraction of the sample is used to obtain $\hat{\Pi}_2$ and the rest to estimate the auxiliary regression (6.15) with $\bar{Z} = X_2\hat{\Pi}_2$. Under such circumstances, by conditioning on $\hat{\Pi}_2$, we can easily see that the standard F test for $\bar{\theta}_2 = 0$ is valid. Further, this procedure is robust to weak instruments, to excluded instruments, and to the specification of the model for $Y$ [i.e., under the general assumptions (5.6) or (5.7)], as long as the independence between $\hat{\Pi}_2$ and $u$ can be maintained. Of course, using a split-sample may involve a loss in the effective number of observations, and there will be a trade-off between the efficiency gain from using a smaller number of auxiliary instruments and the observations that are sacrificed to get $\hat{\Pi}_2$. Better results tend to be obtained by using a relatively small fraction of the sample (10%, for example) to obtain $\hat{\Pi}_2$ and the rest for the main equation. For further details on this procedure, the reader may consult Dufour and Jasiak (2001) and Kleibergen (2002a).15
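A minimal sketch of the split-sample device, under the assumed independence of the $(u_t, V_t')'$: a 10% subsample yields $\hat{\Pi}_2$, and the remaining observations run the auxiliary F test with $\bar{Z} = X_2\hat{\Pi}_2$. The data-generating values below are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
T, k1, k2 = 500, 1, 4
X1 = np.ones((T, k1))
X2 = rng.standard_normal((T, k2))
Pi2 = np.array([[1.0], [0.8], [-0.5], [0.3]])
V = rng.standard_normal((T, 1))
u = 0.6 * V[:, 0] + rng.standard_normal(T)
Y = X1 * 0.5 + X2 @ Pi2 + V
beta0 = np.array([1.5])
y = Y @ beta0 + X1[:, 0] * 2.0 + u        # H0 holds: true beta equals beta0

# 1) A small fraction of the sample is used only to estimate Pi2 ...
T1 = T // 10
X_a = np.column_stack([X1[:T1], X2[:T1]])
Pi_hat = np.linalg.lstsq(X_a, Y[:T1], rcond=None)[0]
Pi2_hat = Pi_hat[k1:]

# 2) ... and the rest runs the auxiliary regression with Z = X2 @ Pi2_hat
y2, Y2, X1b, X2b = y[T1:], Y[T1:], X1[T1:], X2[T1:]
Z = X2b @ Pi2_hat
z = y2 - Y2 @ beta0
W = np.column_stack([X1b, Z])
resid_u = z - W @ np.linalg.lstsq(W, z, rcond=None)[0]
resid_r = z - X1b @ np.linalg.lstsq(X1b, z, rcond=None)[0]
q, Tb = Z.shape[1], z.shape[0]
F = ((resid_r @ resid_r - resid_u @ resid_u) / q) / (resid_u @ resid_u / (Tb - W.shape[1]))
```

Conditionally on $\hat{\Pi}_2$ (which is independent of the second-subsample errors here), `F` follows a standard F distribution under $H_0$, which is what makes the split-sample trick deliver finite-sample validity.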

A number of alternative procedures can be cast in the framework of equation (6.15).

c. LM-type GMM-based statistic. If we take $\bar{Z} = Z_{WZ}$ with
$$Z_{WZ} = P[M(X_1)X_2]\,Y = P[M(X_1)X_2]\,M(X_1)Y = [M(X_1)X_2]\,\hat{\Pi}_2, \qquad (6.17)$$

15. Split-sample techniques often lead to important distributional simplifications; for further discussion of this type of method, see Angrist and Krueger (1995) and Dufour and Torrès (1998, 2000).


$$\hat{\Pi}_2 = [X_2'M(X_1)X_2]^{-1}X_2'M(X_1)Y, \qquad (6.18)$$

the F-statistic [say, $F_{GMM}(\beta_0)$] for $\bar{\theta}_2 = 0$ is a monotonic transformation of the LM-type statistic $LM_{GMM}(\beta_0)$ proposed by Wang and Zivot (1998). Namely,
$$F_{GMM}(\beta_0) = \frac{T - k_1 - G}{G}\; \frac{(1/T)\,LM_{GMM}(\beta_0)}{1 - (1/T)\,LM_{GMM}(\beta_0)} \qquad (6.19)$$
where
$$LM_{GMM}(\beta_0) = \frac{(y - Y\beta_0)'\,P[Z_{WZ}]\,(y - Y\beta_0)}{(y - Y\beta_0)'\,M(X_1)\,(y - Y\beta_0)/T}. \qquad (6.20)$$

Note that $\hat{\Pi}_2$ above is the ordinary least squares (OLS) estimator of $\Pi_2$ from the multivariate regression (5.2), so that $F_{GMM}(\beta_0)$ can be obtained by computing the F-statistic for $\bar{\theta}_2 = 0$ in the regression
$$y - Y\beta_0 = X_1\theta_1 + (X_2\hat{\Pi}_2)\bar{\theta}_2 + u. \qquad (6.21)$$
When $k_2 \geq G$, the statistic $F_{GMM}(\beta_0)$ can also be obtained by testing $\theta_2^* = 0$ in the auxiliary regression
$$y - Y\beta_0 = X_1\theta_1 + \hat{Y}\theta_2^* + u \qquad (6.22)$$
where $\hat{Y} = X\hat{\Pi}$. It is also interesting to note that the OLS estimates of $\theta_1$ and $\theta_2^*$, obtained by fitting the latter equation, are identical to the 2SLS estimates of $\theta_1$ and $\theta_2^*$ in the equation
$$y - Y\beta_0 = X_1\theta_1 + Y\theta_2^* + u. \qquad (6.23)$$
The $LM_{GMM}$ test may thus be interpreted as an approximation to the optimal test based on replacing the optimal auxiliary instrument $X_2\Pi_2$ by $X_2\hat{\Pi}_2$. The statistic $LM_{GMM}(\beta_0)$ is also numerically identical to the corresponding LR-type and Wald-type tests, based on the same GMM estimator (in this case, the 2SLS estimator of $\beta$).

As mentioned above, the distribution of this statistic will be affected by the fact that $X_2\hat{\Pi}_2$ and $u$ are not independent. In particular, it is influenced by the presence of weak instruments. But Wang and Zivot (1998) showed that the distribution of $LM_{GMM}(\beta_0)$ is bounded by the $\chi^2(k_2)$ distribution asymptotically. When $k_2 = G$ (usually deemed the just-identified case, although the model may be under-identified in that case), we see easily [from (6.21)] that $F_{GMM}(\beta_0)$ is (almost surely) identical with the AR statistic, i.e.
$$F_{GMM}(\beta_0) = AR(\beta_0) \quad \text{if } k_2 = G, \qquad (6.24)$$
so that $F_{GMM}(\beta_0)$ follows an exact $F(G,\, T - k)$ distribution, while for $k_2 > G$,
$$G\,F_{GMM}(\beta_0) \;\leq\; \frac{T - k_1 - G}{T - k_1 - k_2}\; k_2\, AR(\beta_0), \qquad (6.25)$$
so that the distribution of $LM_{GMM}(\beta_0)$ can be bounded in finite samples by the distribution of a


monotonic transformation of an $F(k_2,\, T - k)$ variable [which, for $T$ large, is very close to the $\chi^2(k_2)$ distribution]. But, for $T$ reasonably large, $AR(\beta_0)$ will always reject when $F_{GMM}(\beta_0)$ rejects (at a given level), so the power of the AR test is uniformly superior to that of the $LM_{GMM}$ bound test.16
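The auxiliary-regression form of $F_{GMM}(\beta_0)$ and the finite-sample bound (6.25) can be checked numerically. The sketch below (simulated data, illustrative values) computes $F_{GMM}(\beta_0)$ from the OLS estimate $\hat{\Pi}_2$ and verifies that $G\,F_{GMM}(\beta_0)$ never exceeds the AR-based bound.

```python
import numpy as np

def proj(M):
    """Orthogonal projector onto the column space of M."""
    return M @ np.linalg.pinv(M)

rng = np.random.default_rng(1)
T, k1, k2, G = 300, 2, 5, 1
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X2 = rng.standard_normal((T, k2))
V = rng.standard_normal((T, G))
u = 0.4 * V[:, 0] + rng.standard_normal(T)
Y = X1 @ rng.standard_normal((k1, G)) + X2 @ rng.standard_normal((k2, G)) + V
beta0 = np.array([0.9])
y = Y @ beta0 + X1 @ np.array([1.0, -1.0]) + u      # H0 holds at beta0

z = y - Y @ beta0
M1 = np.eye(T) - proj(X1)                            # M(X1)
Pi2_hat = np.linalg.lstsq(M1 @ X2, M1 @ Y, rcond=None)[0]   # OLS estimate of Pi2
a = z @ proj(M1 @ X2 @ Pi2_hat) @ z                  # explained by X2 Pi2_hat given X1
m = z @ M1 @ z
F_gmm = ((T - k1 - G) / G) * a / (m - a)             # F statistic for (6.21)

MX = np.eye(T) - proj(np.column_stack([X1, X2]))     # M(X)
AR = ((z @ (M1 - MX) @ z) / k2) / (z @ MX @ z / (T - k1 - k2))
bound = ((T - k1 - G) / (T - k1 - k2)) * k2 * AR     # right-hand side of (6.25)
```

The inequality holds by construction because $X_2\hat{\Pi}_2$ spans at most a $G$-dimensional subspace of the space spanned by $M(X_1)X_2$, so its explained sum of squares can never exceed that of the full AR regression.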

d. Kleibergen's K test. If we take $\bar{Z} = Z_K$ with
$$Z_K = P(X)\left[\, Y - (y - Y\beta_0)\,\frac{s_{uV}(\beta_0)}{s_{uu}(\beta_0)} \,\right] = X\,\tilde{\Pi}(\beta_0) \equiv \tilde{Y}(\beta_0), \qquad (6.26)$$
$$\tilde{\Pi}(\beta_0) = \hat{\Pi} - \hat{\gamma}(\beta_0)\,\frac{s_{uV}(\beta_0)}{s_{uu}(\beta_0)}, \qquad \hat{\Pi} = (X'X)^{-1}X'Y, \qquad (6.27)$$
$$\hat{\gamma}(\beta_0) = (X'X)^{-1}X'(y - Y\beta_0), \qquad s_{uV}(\beta_0) = \frac{1}{T - k}\,(y - Y\beta_0)'M(X)Y, \qquad (6.28)$$
$$s_{uu}(\beta_0) = \frac{(y - Y\beta_0)'M(X)(y - Y\beta_0)}{T - k}, \qquad (6.29)$$
we obtain a statistic which reduces to the one proposed by Kleibergen (2002a) for $k_1 = 0$. More precisely, with $k_1 = 0$, the F-statistic $F_K(\beta_0)$ for $\bar{\theta}_2 = 0$ is equal to Kleibergen's statistic $K(\beta_0)$ divided by $G$:
$$F_K(\beta_0) = K(\beta_0)/G. \qquad (6.30)$$

This procedure tries to correct the simultaneity problem associated with the use of $\hat{Y}$ in the $LM_{GMM}$ statistic by purging it from its correlation with $u$ [by subtracting the term $\hat{\gamma}(\beta_0)\,s_{uV}(\beta_0)/s_{uu}(\beta_0)$ in $Z_K$]. In other words, $F_K(\beta_0)$ and $K(\beta_0) = G\,F_K(\beta_0)$ can be obtained by testing $\theta_2^* = 0$ in the regression
$$y - Y\beta_0 = X_1\theta_1 + \tilde{Y}(\beta_0)\,\theta_2^* + u \qquad (6.31)$$
where the fitted values $\hat{Y}$, which appear in the auxiliary regression (6.22) for the $LM_{GMM}$ test, have been replaced by $\tilde{Y}(\beta_0) = \hat{Y} - X\hat{\gamma}(\beta_0)\,s_{uV}(\beta_0)/s_{uu}(\beta_0)$, which are closer to being orthogonal with $u$.

If $k_2 = G$, we have $F_K(\beta_0) = AR(\beta_0) \sim F(G,\, T - k)$, while in the other cases ($k_2 > G$), we can see easily that the bound for $F_{GMM}(\beta_0)$ in (6.25) also applies to $F_K(\beta_0)$:
$$G\,F_K(\beta_0) \;\leq\; \frac{T - k_1 - G}{T - k_1 - k_2}\; k_2\, AR(\beta_0). \qquad (6.32)$$
Kleibergen (2002a) did not supply a finite-sample distributional theory but showed (assuming $k_1 = 0$) that $K(\beta_0)$ follows a $\chi^2(G)$ distribution asymptotically under $H_0(\beta_0)$, irrespective of the presence of weak instruments. This entails that the $K(\beta_0)$ test will have power higher than that of the $LM_{GMM}$ test [based on the $\chi^2(k_2)$ bound], at least in the neighborhood of the null hypothesis, although not necessarily far away from the null hypothesis.

It is also interesting to note that the inequality (6.32) indicates that the distribution of $K(\beta_0) \equiv$

16. The $\chi^2(k_2)$ bound also follows in a straightforward way from (6.25). Note that Wang and Zivot (1998) do not provide the auxiliary regression interpretation (6.21)-(6.22) of their statistics. For details, see Dufour and Taamouti (2001b).


$G\,F_K(\beta_0)$ can be bounded in finite samples by a $[k_2(T - k_1 - G)/(T - k)]\,F(k_2,\, T - k)$ distribution. However, because of the stochastic dominance of $AR(\beta_0)$, there would be no advantage in using the bound to get critical values for $K(\beta_0)$, for the AR test would then have better power. In view of the fact that the above procedure is based on estimating the mean of $X\Pi$ (using $X\hat{\Pi}$) and the covariances between the errors in the reduced form for $Y$ and $u$ [using $s_{uV}(\beta_0)$], it can become quite unreliable in the presence of excluded instruments.
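For $k_1 = 0$, the construction (6.26)-(6.29) can be coded directly. The sketch below (simulated data, illustrative values) computes $K(\beta_0)$ as the quadratic form in the projection of $y - Y\beta_0$ on $\tilde{Y}(\beta_0)$, standardized by $s_{uu}(\beta_0)$, and checks the AR-based domination discussed above.

```python
import numpy as np

def K_stat(y, Y, X, beta0):
    """Kleibergen-type K statistic for H0: beta = beta0 (case k1 = 0).

    Builds Ytilde(beta0) = X [Pihat - gammahat(beta0) s_uV(beta0)/s_uu(beta0)]
    and returns z' P[Ytilde] z / s_uu with z = y - Y beta0.
    """
    T, G = Y.shape
    k = X.shape[1]
    z = y - Y @ beta0
    XtX_inv = np.linalg.inv(X.T @ X)
    Pi_hat = XtX_inv @ (X.T @ Y)
    gam = XtX_inv @ (X.T @ z)
    MXz = z - X @ gam                      # M(X) z
    MXY = Y - X @ Pi_hat                   # M(X) Y
    s_uu = (z @ MXz) / (T - k)
    s_uV = (MXz @ MXY) / (T - k)
    Ytilde = X @ (Pi_hat - np.outer(gam, s_uV) / s_uu)
    Pz = Ytilde @ np.linalg.lstsq(Ytilde, z, rcond=None)[0]   # P[Ytilde] z
    return (z @ Pz) / s_uu                 # asymptotically chi-square(G) under H0

# Simulated example (illustrative)
rng = np.random.default_rng(11)
T, k, G = 300, 4, 1
X = rng.standard_normal((T, k))
V = rng.standard_normal((T, G))
u = 0.5 * V[:, 0] + rng.standard_normal(T)
Y = X @ np.array([[0.6], [0.4], [-0.3], [0.2]]) + V
beta0 = np.array([1.0])
y = Y @ beta0 + u

K = K_stat(y, Y, X, beta0)
z = y - Y @ beta0
zres = z - X @ np.linalg.lstsq(X, z, rcond=None)[0]
AR = ((z @ z - zres @ zres) / k) / (zres @ zres / (T - k))   # AR statistic (k1 = 0)
```

Because $\tilde{Y}(\beta_0)$ lies in the span of $X$, the quadratic form in $K(\beta_0)$ can never exceed the one underlying the AR statistic, which is the finite-sample domination exploited in (6.32).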

e. Likelihood ratio test. The likelihood ratio (LR) statistic for $\beta = \beta_0$ was also studied by Wang and Zivot (1998). The LR test statistic in this case takes the form
$$LR_{LIML} = T\,\big[\ln\big(\kappa(\beta_0)\big) - \ln\big(\kappa(\hat{\beta}_{LIML})\big)\big] \qquad (6.33)$$
where $\hat{\beta}_{LIML}$ is the limited information maximum likelihood (LIML) estimator of $\beta$ and
$$\kappa(\beta) = \frac{(y - Y\beta)'M(X_1)(y - Y\beta)}{(y - Y\beta)'M(X)(y - Y\beta)}. \qquad (6.34)$$
Like $LM_{GMM}$, the distribution of $LR_{LIML}$ depends on unknown nuisance parameters under $H_0(\beta_0)$, but its asymptotic distribution is $\chi^2(k_2)$ when $k_2 = G$ and bounded by the $\chi^2(k_2)$ distribution in the other cases [a result in accordance with the general LR distributional bound given in Dufour (1997)]. This bound can also be easily derived from the following inequality:
$$LR_{LIML} \;\leq\; \frac{T}{T - k}\; k_2\, AR(\beta_0), \qquad (6.35)$$
so that the distribution of $LR_{LIML}$ is bounded in finite samples by the distribution of a $[T k_2/(T - k)]\,F(k_2,\, T - k)$ variable; for details, see Dufour and Khalaf (2000). For $T$ reasonably large, this entails that the $AR(\beta_0)$ test will have power higher than that of the $LR_{LIML}$ test [based on the $\chi^2(k_2)$ bound], at least in the neighborhood of the null hypothesis. So the power of the AR test is uniformly superior to that of the $LR_{LIML}$ bound test. Because the LR test depends heavily on the specification of the model for $Y$, it is not robust to excluded instruments.
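Since $\kappa(\beta)$ in (6.34) is a ratio of quadratic forms in $(1, -\beta')'$, its minimum over $\beta$ (the LIML value) is the smallest eigenvalue of $B^{-1}A$ with $A = \bar{Y}'M(X_1)\bar{Y}$, $B = \bar{Y}'M(X)\bar{Y}$ and $\bar{Y} = [y, Y]$, a standard way of computing LIML. The sketch below (simulated data, $G = 1$) computes $LR_{LIML}$ this way; the bound check restates (6.35) through $\ln x \le x - 1$, since $[T/(T-k)]\,k_2\,AR(\beta_0) = T\,[\kappa(\beta_0) - 1]$.

```python
import numpy as np

rng = np.random.default_rng(7)
T, k1, k2 = 250, 1, 3
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, k2))
X = np.column_stack([X1, X2])
k = k1 + k2
V = rng.standard_normal(T)
u = 0.5 * V + rng.standard_normal(T)
Yend = 0.3 + X2 @ np.array([1.0, -0.7, 0.4]) + V   # one endogenous regressor (G = 1)
y = 0.8 * Yend + 1.0 + u
Ybar = np.column_stack([y, Yend])

def M(Z):
    """Residual-maker matrix I - Z (Z'Z)^+ Z'."""
    return np.eye(T) - Z @ np.linalg.pinv(Z)

A = Ybar.T @ M(X1) @ Ybar
B = Ybar.T @ M(X) @ Ybar

def kappa(beta):
    """kappa(beta) from (6.34), written as a ratio of quadratic forms."""
    b = np.array([1.0, -beta])
    return (b @ A @ b) / (b @ B @ b)

kappa_min = np.min(np.real(np.linalg.eigvals(np.linalg.solve(B, A))))
beta0 = 0.8
LR_liml = T * (np.log(kappa(beta0)) - np.log(kappa_min))
ar_bound = T * (kappa(beta0) - 1.0)     # equals [T/(T-k)] k2 AR(beta0)
```

`kappa_min` is at least 1 because $A - B$ is positive semidefinite, which is also why $LR_{LIML}$ is nonnegative and the bound (6.35) follows.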

f. Conditional tests. A promising approach was recently proposed by Moreira (2003a). His suggestion consists in conditioning upon an appropriately selected portion of the sufficient statistics for a gaussian SEM. On assuming that the covariance matrix of the errors is known, the corresponding conditional distribution of various test statistics for $H_0(\beta_0)$ does not involve nuisance parameters. The conditional distribution is typically not standard but may be established by simulation. Such an approach may lead to power gains. On the other hand, the assumption that error covariances are known is rather implausible, and the extension of the method to the case where the error covariance matrix is unknown is obtained at the expense of using a large-sample approximation. Like Kleibergen's procedure, this method yields an asymptotically similar test. For further discussion, see Moreira and Poi (2001) and Moreira (2003b).

g. Instrument selection procedures. Systematic search methods for identifying relevant instruments and excluding unimportant instruments have been discussed by several authors; see Hall,


Rudebusch, and Wilcox (1996), Hall and Peixe (2000), Dufour and Taamouti (2001a), and Donald and Newey (2001). In this setup, the power of AR-type tests depends on a function of the model parameters called the concentration coefficient. One way to approach instrument selection is thus to maximize the concentration coefficient, so as to maximize test power. Robustness to instrument exclusion is very handy in this context. For further discussion, the reader may consult Dufour and Taamouti (2001a).

To summarize, in special situations, alternatives to the AR procedure may allow some power gains with respect to the AR test based on an unreduced set of instruments. But they also have some important drawbacks. In particular, (1) only an asymptotic distributional theory is supplied, (2) the statistics used are not pivotal in finite samples, although Kleibergen's and Moreira's statistics are asymptotically pivotal, and (3) they are not robust to instrument exclusion or to the formulation of the model for the explanatory endogenous variables. It is also of interest to note that finite-sample versions of several of these asymptotic tests may be obtained by using split-sample methods.

All the problems and techniques discussed above relate to sampling-based statistical methods. SEM can also be analyzed through a Bayesian approach, which can alleviate the indeterminacies associated with identification via the introduction of a prior distribution on the parameter space. Bayesian inferences always depend on the choice of prior distribution (a property viewed as undesirable in the sampling approach), but this dependence becomes especially strong when identification problems are present [see Gleser and Hwang (1987)]. This paper only aims at discussing problems and solutions which arise within the sampling framework, and it is beyond its scope to debate the advantages and disadvantages of Bayesian methods under weak identification. For additional discussion of this issue, see Kleibergen and Zivot (2003) and Sims (2001).

7. Extensions

We will discuss succinctly some extensions of the above results to multivariate setups (where several structural equations may be involved), models with non-Gaussian errors, and nonlinear models.

7.1. Multivariate regression, simulation-based inference and nonnormal errors

Another approach to inference on a linear structural equation model is based on observing that the structural model (5.1)-(5.4) can be put in the form of a multivariate linear regression (MLR):
$$\bar{Y} = XB + U \qquad (7.1)$$
where $\bar{Y} = [y,\, Y]$, $B = [\delta,\, \Pi]$, $U = [v,\, V] = [\bar{U}_1, \ldots, \bar{U}_T]'$ with $v = u + V\beta$, $\delta = [\delta_1',\, \delta_2']'$, $\Pi = [\Pi_1',\, \Pi_2']'$, $\delta_1 = \gamma + \Pi_1\beta$ and $\delta_2 = \Pi_2\beta$.17 This model is linear except for the nonlinear restriction $\delta_2 = \Pi_2\beta$. Let us now make the assumption that the errors in the different equations for each observation, $\bar{U}_t$, satisfy the property:

$$\bar{U}_t = J\,W_t, \qquad t = 1, \ldots, T, \qquad (7.2)$$

17. Most of this section is based on Dufour and Khalaf (2000, 2001, 2002).


where the vector $w = \mathrm{vec}(W_1, \ldots, W_T)$ has a known distribution and $J$ is an unknown nonsingular matrix (which enters into the covariance matrix of the error vectors $\bar{U}_t$). This distributional assumption is, in a way, more restrictive than the one made in section 5.1 (because of the assumption on $V$) and, in another way, less restrictive, because the distribution of $u$ is not taken to be necessarily $N[0,\, \sigma_u^2 I_T]$.

Consider now an hypothesis of the form
$$H_0: RBC = D \qquad (7.3)$$
where $R$, $C$ and $D$ are fixed matrices. This is called a uniform linear (UL) hypothesis; for example, the hypothesis $\beta = \beta_0$ tested by the AR test can be written in this form [see Dufour and Khalaf (2000)]. The corresponding gaussian LR statistic is
$$LR(H_0) = T\,\ln\big(|\hat{\Sigma}_0|/|\hat{\Sigma}|\big) \qquad (7.4)$$
where $\hat{\Sigma} = \hat{U}'\hat{U}/T$ and $\hat{\Sigma}_0 = \hat{U}_0'\hat{U}_0/T$ are respectively the unrestricted and restricted estimates of the error covariance matrix. The AR test can also be obtained as a monotonic transformation of a statistic of the form $LR(H_0)$. An important feature of $LR(H_0)$ in this case is that its distribution under $H_0$ does not involve nuisance parameters and may be easily simulated (it is a pivot); see Dufour and Khalaf (2002). In particular, its distribution is completely invariant to the unknown $J$ matrix (or the error covariance matrix). In such a case, even though this distribution may be complicated, we can use Monte Carlo test techniques (a form of parametric bootstrap) to obtain exact test procedures.18 Multivariate extensions of AR tests, which impose restrictions on several structural equations, can be obtained in this way. Further, this approach allows one to consider any (possibly non-gaussian) distribution on $w$.
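The Monte Carlo test logic behind this is generic: when a statistic is pivotal under $H_0$, its null distribution can be tabulated by simulation, and the rank-based p-value $\hat{p} = (\#\{S_i \ge S_{obs}\} + 1)/(N + 1)$ yields an exactly level-$\alpha$ test whenever $\alpha(N + 1)$ is an integer. A generic sketch follows; the toy pivot below is purely illustrative and is not one of the SEM statistics above.

```python
import numpy as np

def monte_carlo_pvalue(stat_obs, simulate_stat, N=99, rng=None):
    """Exact Monte Carlo p-value for a statistic that is pivotal under H0.

    simulate_stat(rng) must draw one realization of the statistic under H0;
    with N = 99 the resulting test has exact level alpha for alpha in
    {0.01, 0.02, ..., 0.99}.
    """
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([simulate_stat(rng) for _ in range(N)])
    return (np.sum(sims >= stat_obs) + 1) / (N + 1)

# Toy illustration: the pivot is the squared mean of T standard normal
# errors, whose exact distribution we pretend not to know in closed form.
T = 50
def simulate_stat(rng):
    return rng.standard_normal(T).mean() ** 2

rng = np.random.default_rng(3)
stat_obs = simulate_stat(rng)           # "data" generated under H0
p = monte_carlo_pvalue(stat_obs, simulate_stat, N=99, rng=rng)
```

The key design point is that validity requires only pivotality of the simulated statistic, not knowledge of its distribution, which is exactly the property established for $LR(H_0)$ above.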

More generally, it is of interest to note that the LR statistic for just about any hypothesis on $B$ can be bounded by the LR statistic for an appropriately selected UL hypothesis: setting $b = \mathrm{vec}(B)$ and
$$H_0: R\,b \in \Delta_0 \qquad (7.5)$$
where $R$ is an arbitrary $q \times k(G + 1)$ matrix and $\Delta_0$ is an arbitrary subset of $\mathbb{R}^q$, the distribution of the corresponding LR statistic can be bounded by the LR statistic for a UL hypothesis (which is pivotal). This covers as special cases all restrictions on the coefficients of SEM (as long as they are written in the MLR form). To avoid the use of such bounds (which may be quite conservative), it is also possible to use maximized Monte Carlo tests [Dufour (2002)].

All the above procedures are valid for parametric models that specify the error distribution up to an unknown linear transformation (the $J$ matrix), which allows for an unknown covariance matrix. It is easy to see that these procedures (including the exact procedures discussed in section 6) yield asymptotically valid inferences under much weaker assumptions than those used to obtain the finite-sample results. However, in view of the discussion in section 4, the pitfalls and limitations of such arguments should be remembered: there is no substitute for a provably exact procedure.

If we aim at getting tests and confidence sets for nonparametric versions of the SEM (where the

18. For further discussion of Monte Carlo test methods, see Dufour and Khalaf (2001) and Dufour (2002).


error distribution involves an infinite set of nuisance parameters), this may be achievable by looking at distribution-free procedures based on permutations, ranks or signs. There is very little work on this topic in the SEM context. For a first look, however, the reader may consult the recent paper by Bekker (2002).

7.2. Nonlinear models

It is relatively difficult to characterize identification and study its consequences in nonlinear structural models, but problems similar to those noted for linear SEM do arise. Nonlinear structural models typically take the form
$$f_t(y_t, x_t, \theta) = u_t, \qquad E_\mu[u_t \mid Z_t] = 0, \qquad t = 1, \ldots, T, \qquad (7.6)$$
where $f_t(\cdot)$ is a vector of possibly nonlinear relationships, $y_t$ is a vector of endogenous variables, $x_t$ is a vector of exogenous variables, $\theta$ is a vector of unknown parameters, $Z_t$ is a vector of conditioning variables (or instruments), usually with a number of additional distributional assumptions, and $E_\mu[\,\cdot\,]$ refers to the expected value under a distribution with parameter value $\mu$. In such models, $\theta$ can be viewed as identifiable if there is only one value of $\theta$ [say, $\theta = \bar{\theta}$] that satisfies (7.6), and we have weak identification (or weak instruments) when the expected values $E_\mu[f_t(y_t, x_t, \theta) \mid Z_t]$, $t = 1, \ldots, T$, are weakly sensitive to the value of $\theta$.
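A confidence set for $\theta$ can be built directly from a GMM-type objective evaluated at candidate values (the approach formalized by Stock and Wright (2000), discussed next). A schematic sketch for a simple linear moment condition $E[Z_t(y_t - \theta x_t)] = 0$: evaluate a continuously updated GMM objective on a grid of $\theta_0$ and keep the values below a $\chi^2$ critical value. The data, grid and critical value are illustrative assumptions, not material from the paper.

```python
import numpy as np

def s_stat(theta0, y, x, Z):
    """Continuously-updated GMM objective for E[Z_t (y_t - theta x_t)] = 0;
    asymptotically chi-square(dim Z) at the true theta under weak conditions."""
    T = y.shape[0]
    u = y - theta0 * x
    g = Z.T @ u / T                                  # sample moment vector
    W = (Z * (u ** 2)[:, None]).T @ Z / T            # heteroskedasticity-robust weight
    return T * g @ np.linalg.solve(W, g)

# Simulated example (illustrative)
rng = np.random.default_rng(5)
T = 400
Z = rng.standard_normal((T, 3))
v = rng.standard_normal(T)
x = Z @ np.array([0.8, 0.5, -0.4]) + v
u = 0.3 * v + rng.standard_normal(T)
y = 1.5 * x + u                                      # true theta = 1.5

grid = np.linspace(0.5, 2.5, 401)
svals = np.array([s_stat(th, y, x, Z) for th in grid])
conf_set = grid[svals <= 7.81]                       # approx. chi-square(3) 95% point
```

The collected grid points form a joint confidence set whose asymptotic coverage does not depend on instrument strength, mirroring the objective-function results described below.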

Research on weak identification in nonlinear models remains scarce. Nonlinearity makes it difficult to construct finite-sample procedures, even in models where identification difficulties do not occur. So it is not surprising that work in this area has been based mostly on large-sample approximations. Stock and Wright (2000) studied the asymptotic distributions of GMM-based estimators and test statistics under conditions of weak identification (and weak "high level" asymptotic distributional assumptions). While GMM estimators of $\theta$ have nonstandard asymptotic distributions, the objective function minimized by the GMM procedure follows an asymptotic distribution which is unaffected by the presence of weak instruments: it is asymptotically pivotal. So tests and confidence sets based on the objective function can be asymptotically valid irrespective of the presence of weak instruments. These results are achieved for the full parameter vector $\theta$, i.e. for hypotheses of the form $\theta = \theta_0$ and the corresponding joint confidence sets. This is not surprising: parametric nonseparability arises here for two reasons, model nonlinearity and the possibility of non-identification. Of course, once a joint confidence set for $\theta$ has been b