
ISSN 1183-1057

SIMON FRASER UNIVERSITY

Department of Economics

Working Papers

12-17

“Testing Identification Strength”

Bertille Antoine & Eric Renault

January 8, 2017


Testing Identification Strength

Bertille Antoine∗ and Eric Renault†

January 8, 2017

Abstract

We consider models defined by a set of moment restrictions that may be subject to weak identification. We propose a testing procedure to assess whether instruments are "too weak" for standard (Gaussian) asymptotic theory to be reliable. Since the validity of standard asymptotics for GMM rests upon a Taylor expansion of the first-order conditions, we distinguish two cases: (i) models that are either linear or separable in the parameters of interest; (ii) general models that are neither linear nor separable. Our testing procedure is similar in both cases, but our null hypothesis of weak identification for a nonlinear model is broader than the popular one. Our test is straightforward to apply and allows one to test the null hypothesis of weak identification of specific subvectors without assuming identification of the components not under test. In the linear case, it can be seen as a generalization of the popular first-stage F-test that fixes its shortcomings in the case of heteroskedasticity. In simulations, our test is well behaved when compared to contenders, both in terms of size and power. In particular, the focus on subvectors gives us power to reject the null of weak identification on some components of interest. This observation may explain why, when applied to the estimation of the Elasticity of Intertemporal Substitution, our test is the only one to find matching results for every country under the two symmetric popular specifications: the intercept parameter is always found strongly identified, whereas the slope parameter is always found weakly identified.

Keywords: GMM; Weak IV; Test; Subvector.

JEL classification: C32; C12; C13; C51.

∗Simon Fraser University. Email: bertille [email protected]
†Brown University. Email: eric [email protected]


1 Introduction

Following Hahn and Hausman (2003), the weak instruments problem should be defined by two features. The first relates to the bias of two-stage least squares (2SLS) towards OLS, while the second involves the inaccurate inference associated with standard asymptotic theory. The main goal of this paper is to provide the practitioner with a tool to assess the second feature. Specifically, we propose to test the null hypothesis that instruments are too weak for reliable application of "standard asymptotic theory". When the sample provides compelling evidence against the null, the practitioner will conclude that she can safely rely on standard inference.

Power gain in pretesting for weak instruments is arguably the main contribution of this paper. We achieve this power gain by testing weak identification on some specific subset of components of the vector of parameters of interest. One can often suspect which components of the structural parameter are likely to be poorly identified. Two cases of interest can then be distinguished when testing the null of weak identification on these suspicious components. On the one hand, maintaining the assumption that the other components are properly identified leads to substantial power gains. On the other hand, we also show how to pretest without assuming anything about the identification of the other components. In our Monte Carlo study, it is quite striking how pretesting specific components (rather than the whole vector) can lead to improved inference.

The main "trick" of this paper is related to macroeconomists' common practice in their empirical study of DSGE models. They often suspect which parameters are likely to be poorly identified (for instance, the subjective discount factor), and they may prefer to fix the value of these parameters rather than run the risk of contaminating inference on other parameters by a vain attempt to identify all parameters together. This practice also suggests a simple way to test for weak identification of some specific components. If we add a well-tuned perturbation term to the GMM estimator of these components, then, under the null of weak identification, the perturbation should have an immaterial impact on the value of the J-test statistic for overidentification proposed by Hansen (1982). By contrast, sufficiently accurate identification (our alternative hypothesis in the testing procedure) will consistently detect such a perturbation. Depending on several identification patterns, we develop a testing strategy for different identification strengths, which is based on the perturbation of GMM estimators and the corresponding J-tests. These tests are not only practitioner-friendly, but also conformable to a theoretical view of weak identification as developed very generally by Dufour (1997). When the degree of identification can be arbitrarily small, valid confidence sets should be infinite with positive probability. In terms of tests, this means that a null hypothesis allowing arbitrarily large distortions of the true value per unit of standard error may deliver a positive p-value.

Interestingly enough, our analysis sheds new light, through the powerful lens of GMM, on the popular rule of thumb used to assess identification strength (see Staiger and Stock (1997) and Stock and Yogo (2005)). Recall that this rule of thumb is based on the magnitude of the first-stage F-statistic, with heteroskedasticity-robust computation of the variance of OLS when necessary. Andrews (2015) has recently pointed out that "pretest based on the heteroskedasticity robust first-stage F-statistic fails to control coverage distortions in heteroskedastic linear IV". The beauty of the GMM approach based on distorted J-test statistics is that it provides information not only about the behavior of the test statistic under the null of weak identification (which is also known for the F-statistic thanks to its robustification), but also under the alternative of stronger identification. We show precisely that GMM theory delivers the relevant metric to assess identification strength (beyond the null of weak identification), by contrast with the robustified F-statistic. In all these cases of linear models, the relevant amount of distortion of the GMM estimator needed to possibly reject the null of weak identification is proportional to 1/√T, precisely because the null of interest, following Staiger and Stock (1997), is akin to the relevance of instruments vanishing as fast as 1/√T. The same holds for separable models in the spirit of Stock and Wright (2000).

On theoretical grounds, an additional contribution of this paper is to characterize, for the purpose of testing, the relevant amount of arbitrarily large distortions that may deliver positive p-values for models that are neither linear nor separable with respect to the parameters under test. By doing so, we bridge the gap between Dufour (1997) and the literature on alternative asymptotics that has been developed to capture more accurately the finite sample distributions of GMM estimators in the presence of weak instruments. In the spirit of the recent paper by Andrews and Cheng (2012), many different patterns of nearly-weak/strong identification are discussed. More generally, the weak instrument literature can be understood by considering the reduced rank setting as the limit of a sequence of Data Generating Processes (DGP) indexed by the sample size. Antoine and Renault (2009, 2012) have characterized how various degrees of identification weakness (as defined by the rate of convergence towards reduced rank along the sequence of drifting DGPs) lead to various rates of convergence for estimators of structural parameters. The validity of standard asymptotic normal distributions of GMM estimators rests upon Taylor expansions of first-order conditions. These expansions involve the computation of derivatives not only at the true value of the unknown parameters but also at intermediate points between the true value and the GMM estimator. The slow convergence of the GMM estimator may then introduce asymptotic distortions. As a result, poor identification may be harmful not only when no consistent estimators are available (the case studied by Stock and Wright (2000)), but also, more generally, when the consistent estimators at hand may not converge faster than the fourth root of the sample size.

In all cases, linear or not, we devise tests with a controlled asymptotic size that may be conservative but are consistent against all relevant alternatives. We show that the finite sample power of these tests increases significantly when the researcher is able to set the focus on a specific subvector, while possibly (but not necessarily) assuming proper identification of the other components. We are also able to keep valid testing procedures, still based on distorted J-test statistics, even when lack of consistency under the null prevents us from consistently estimating the efficient weighting matrix needed for asymptotic chi-square distributions. We then apply a projection technique in the spirit of Chaudhuri and Zivot (2011).

The related literature is too vast to be properly summarized here. As already mentioned, several authors instead recommend inference procedures that are robust to weak identification as an alternative to pretesting: see e.g. Moreira (2003), Kleibergen (2005), Guggenberger, Kleibergen, Mavroeidis and Chen (2012), references therein, and the survey by Andrews and Stock (2007). The pretesting literature can be divided into two categories depending on the null under test: the null of poor identification (as in this paper) is considered by Staiger and Stock (1997) and Stock and Yogo (2005); whereas the null of strong identification is considered by Hahn and Hausman (2002), Inoue and Rossi (2011), as well as Bravo, Escanciano and Otsu (2012). In our Monte Carlo study, we consider a linear IV regression model (with or without conditional heteroskedasticity) and compare the performance of our pretesting procedure to Staiger and Stock's (1997) and Stock and Yogo's (2005). Finally, we revisit the empirical application of Yogo (2004) (estimation of the Elasticity of Intertemporal Substitution from Euler equations) and compare with the recent test developed by Montiel-Olea and Pflueger (2013).

The paper is organized as follows. Section 2 introduces our framework and the motivating example of the identification of the Elasticity of Intertemporal Substitution. In section 3, we set the focus on linear (or separable) models and show how our testing procedure generalizes the first-stage F-test popularized by Staiger and Stock (1997) to test for weak identification. General nonlinear and non-separable frameworks are considered in section 4. We show in section 5 how some recent additional examples (Andrews and Cheng (2012), Dovonon and Renault (2013)) are nested within our general framework; this means that our identification test could shed some new light on these issues as well. In section 6, we illustrate the finite sample performance of our tests through Monte Carlo simulations in a linear IV regression model, and we apply our testing procedure to the estimation of the Elasticity of Intertemporal Substitution. Section 7 concludes. The proofs of the theoretical results and the tables of empirical results are gathered in the Appendix.

2 General framework and a motivating example

In this section, we first introduce our general framework, which remains valid throughout the paper. Second, we revisit a well-known empirical example, the estimation of the Elasticity of Intertemporal Substitution, to motivate our framework.

2.1 General framework

We consider models where the structural parameter θ of interest (θ ∈ Θ ⊂ R^p) is defined as the solution of moment conditions,

$$E[\phi_t(\theta)] = 0,$$

for some known function φ_t(θ) of size k ≥ p defined for observations t = 1, ..., T. The associated sample mean is

$$\bar\phi_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} \phi_t(\theta).$$

The moment conditions are assumed to define a true unknown value θ^0 of the parameters in the interior of Θ,

$$E\left[\bar\phi_T(\theta)\right] = 0 \iff \theta = \theta^0.$$

Our framework accommodates the case of a drifting DGP (see e.g. Stock and Wright (2000) and more examples in section 3 below) where the above expectation may be arbitrarily close to zero for a (large) set of parameters when T is infinitely large.


We focus on the possibly weak identification of a subvector π of θ. More precisely, we consider the following partition of the vector θ of structural parameters,

$$\theta = (\psi', \pi')', \quad \psi \in \mathbb{R}^{p_1}, \; \pi \in \mathbb{R}^{p_2}, \; p = p_1 + p_2,$$

and we want to test the identification strength of π (e.g. whether π is weakly identified or not). Note that we do not specify any identification pattern for the nuisance parameters ψ. Obviously, inference about the parameters of interest π will be simpler (and sharper) when we maintain the assumption that the nuisance parameters ψ are strongly identified; however, it is important to emphasize that all possible cases will be considered in this paper, e.g. testing whether π is weakly identified regardless of the identification properties of the nuisance parameters ψ.

For the purpose of GMM inference, we will consider a GMM estimator θ̂_T of θ^0 defined as

$$\hat\theta_T = \arg\min_{\theta\in\Theta} J_T[\theta, W_T(\theta^*)], \quad \text{where } J_T[\theta, W_T(\theta^*)] = T\,\bar\phi_T(\theta)'\,W_T(\theta^*)\,\bar\phi_T(\theta).$$

We have in mind one of the two following GMM estimators:

• θ̂_T is the continuously updated GMM estimator (CU-GMM hereafter) when θ* = θ and [W_T(θ*)]^{-1} is a consistent estimator S_T(θ) of S(θ), the long-run covariance matrix of φ̄_T(θ), for all θ ∈ Θ,

$$\sqrt{T}\left(\bar\phi_T(\theta) - E\left[\bar\phi_T(\theta)\right]\right) \xrightarrow{d} N\left(0, S(\theta)\right).$$

• θ̂_T is an efficient GMM estimator when [W_T(θ*)]^{-1} does not depend on θ* and is a consistent estimator S_T of S(θ^0).

Keeping in mind that the above weighting matrix W_T(θ) may or may not actually depend on θ, our estimator will therefore be denoted as follows without loss of generality,

$$\hat\theta_T = \arg\min_{\theta\in\Theta} J_T[\theta, W_T(\theta)] \quad \text{where } J_T[\theta, W_T(\theta)] = T\,\bar\phi_T(\theta)'\,W_T(\theta)\,\bar\phi_T(\theta). \tag{2.1}$$

In many cases, the two approaches are relevant and tightly related because, for S_T(θ) a consistent estimator of the long-run covariance matrix S(θ), one would consider for the purpose of the second approach

$$S_T = S_T(\tilde\theta_T),$$

where θ̃_T is a first-step consistent estimator of θ^0. Then θ̂_T is an efficient two-step GMM estimator. When it is not necessary, we will not make an explicit distinction between these two approaches. However, in cases where no consistent estimator of θ^0 is available (e.g. due to weak identification), only the first approach via CU-GMM is feasible.

In any case, a maintained assumption will be that, not only θ̂_T = O_P(1), but also, for some positive definite (possibly random) matrices W and 𝒲,

$$\operatorname{Plim}\left[W_T(\theta^0)\right] = W \quad \text{and} \quad W_T(\hat\theta_T) \xrightarrow{d} \mathcal{W}.$$

In most cases, W_T(θ̂_T) will actually converge in probability towards a deterministic matrix 𝒲 = W. However, when W_T(θ) does depend on θ and θ̂_T may not be a consistent estimator of θ^0, we can only assume that W_T(θ̂_T) = O_P(1). Then, the above assumption is natural, at least for a subsequence.
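The two-step efficient GMM estimator described above can be sketched numerically. The following Python fragment is an illustrative sketch only, not the paper's code: it uses a one-dimensional grid search for the minimization and assumes serially independent data, so that the long-run covariance S_T(θ) reduces to the sample variance of the moment contributions. All function names are ours.

```python
import numpy as np

def gmm_objective(theta, phi, W):
    """J_T[theta, W] = T * phibar_T(theta)' W phibar_T(theta), cf. equation (2.1)."""
    g = phi(theta)            # (T, k) array of moment contributions phi_t(theta)
    gbar = g.mean(axis=0)     # sample mean phibar_T(theta)
    return g.shape[0] * gbar @ W @ gbar

def two_step_gmm(phi, theta_grid):
    """Two-step efficient GMM over a scalar grid (illustrative only).
    Step 1: identity weighting. Step 2: W = S_T(theta_1)^{-1}, where S_T is the
    sample variance of the moments (valid under serial independence)."""
    k = phi(theta_grid[0]).shape[1]
    j1 = [gmm_objective(th, phi, np.eye(k)) for th in theta_grid]
    theta1 = theta_grid[int(np.argmin(j1))]          # first-step estimator
    S = np.cov(phi(theta1), rowvar=False)            # S_T(theta_1)
    W2 = np.linalg.inv(S)
    j2 = [gmm_objective(th, phi, W2) for th in theta_grid]
    return theta_grid[int(np.argmin(j2))], W2
```

For instance, with linear IV moments φ_t(θ) = Z_t(y_t − θ x_t) and strong instruments, `two_step_gmm` returns an estimate close to the efficient GMM solution.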

2.2 Motivating example: the estimation of the EIS

To motivate our general framework, we now revisit a well-known empirical challenge: the estimation of the Elasticity of Intertemporal Substitution (EIS), a key parameter in macroeconomics and finance. For example, Woodford (2003, chapter 4) discusses its interpretation in the context of a monetary policy model; see Campbell and Viceira (1999) for its interpretation in a consumption and portfolio choice model.

Let β be the subjective discount factor, γ the coefficient of relative risk aversion, π the EIS, and define λ = (1−γ)/(1−1/π). The Epstein-Zin (1989, 1991) objective function is defined recursively by

$$U_t = \left[(1-\beta)\,C_t^{(1-\gamma)/\lambda} + \beta\left(E_t\,U_{t+1}^{1-\gamma}\right)^{1/\lambda}\right]^{\lambda/(1-\gamma)},$$

where C_t is consumption at time t. The maximization of the above utility function subject to the intertemporal budget constraint (see Epstein and Zin (1991)) yields the following Euler equation,

$$E_t\left[\left(\beta\left(\frac{C_{t+1}}{C_t}\right)^{-1/\pi}\right)^{\lambda}\left(\frac{1}{1+R_{w,t+1}}\right)^{1-\lambda}(1+R_{i,t+1})\right] = 1 \tag{2.2}$$

where (1+R_{i,t+1}) is the gross real return on asset i and (1+R_{w,t+1}) is the gross real return on the portfolio of all invested wealth at time (t+1).


Given a vector of k (valid) instruments Z_t, the preference parameters θ = (ψ', π)' with ψ' = (β, γ) can be estimated by GMM (or CU-GMM) through the above nonlinear Euler equation after defining the moment functions

$$\phi_t(\theta) = Z_t\left[\left(\beta\left(\frac{C_{t+1}}{C_t}\right)^{-1/\pi}\right)^{\lambda}\left(\frac{1}{1+R_{w,t+1}}\right)^{1-\lambda}(1+R_{i,t+1}) - 1\right].$$

See e.g. Hansen and Singleton (1982) for the power utility case and Epstein and Zin (1991) for Epstein-Zin preferences.

The difficulty with this approach is two-fold: first, as mentioned by Epstein and Zin (1991), it requires observations on the returns on the wealth portfolio, which includes returns on human capital; second, it is unclear whether sufficiently strong instruments can be found for conventional asymptotics and the associated inference procedures to be reliable: see e.g. Stock and Wright (2000) and Antoine and Renault (2009) for the power utility case.

The first practical issue can easily be avoided by estimating the preference parameters from a linearized Euler equation, as it does not require knowledge of the returns on the wealth portfolio. For example, when also using the market return, a second-order log-linearization¹ of the Euler equation (2.2) yields:

$$E_t r_{i,t+1} = \mu_i + \frac{1}{\pi}\,E_t \Delta c_{t+1} \tag{2.3}$$

where lowercase letters denote the logarithms of the corresponding uppercase variables, Δc_{t+1} is consumption growth at time (t+1), and μ_i is a constant term that depends on (conditional) second moments of r_{i,t+1} and Δc_{t+1}: see Campbell and Viceira (2002, chapter 2) for full details and the explicit definition of μ_i.

By setting

$$\eta_{i,t+1} = r_{i,t+1} - E_t r_{i,t+1} - \frac{1}{\pi}\left(\Delta c_{t+1} - E_t \Delta c_{t+1}\right),$$

the model can be rewritten as the following linear IV regression model,

$$r_{i,t+1} = \mu_i + \frac{1}{\pi}\,\Delta c_{t+1} + \eta_{i,t+1},$$

with associated moment functions,

$$\phi_t(\mu_i, \pi) = Z_t\,\eta_{i,t+1} = Z_t\left[r_{i,t+1} - \mu_i - \frac{1}{\pi}\,\Delta c_{t+1}\right].$$

¹When asset returns and consumption are jointly log-normally distributed and homoskedastic conditional on information at time t, (2.3) is the exact log-linearization of (2.2).
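For concreteness, the linearized moment condition above can be taken to data with a simple 2SLS-type GMM estimator. The sketch below is ours, not the paper's: the variable names `r`, `dc`, `Z` (asset returns, consumption growth, instruments) are illustrative, and the weighting matrix is (Z'Z/T)^{-1}, i.e. plain 2SLS.

```python
import numpy as np

def eis_iv_estimate(r, dc, Z):
    """Estimate (mu_i, b) in r_{t+1} = mu_i + b * dc_{t+1} + eta with b = 1/pi,
    using moments phi_t = Z_t (r_{t+1} - mu_i - b * dc_{t+1}) and 2SLS weighting.
    Returns (mu_i, pi) with pi = 1/b the EIS (illustrative sketch)."""
    T = len(r)
    X = np.column_stack([np.ones(T), dc])    # regressors: intercept, consumption growth
    Zc = np.column_stack([np.ones(T), Z])    # instruments include the constant
    W = np.linalg.inv(Zc.T @ Zc / T)         # 2SLS weighting matrix
    A = Zc.T @ X / T
    b = Zc.T @ r / T
    coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
    return coef[0], 1.0 / coef[1]            # (mu_i, EIS pi)
```

With strong instruments this is the usual 2SLS estimator of the slope 1/π, inverted to recover the EIS.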


The obvious disadvantage of the linearized Euler equation is that the discount factor β cannot be identified. Nevertheless, the study of the linear IV regression model remains interesting because much more is known about how to detect weak identification in it: see e.g. Staiger and Stock (1997) and Stock and Yogo (2005). This is the focus of the next section. The detection of weak identification in general nonlinear models is dealt with in section 4, in particular because we will then have to set the focus on a more pervasive context of poor identification.

3 Testing for weak identification in a separable model

In this section, we introduce our testing procedure in the simpler framework of linear models, or nonlinear separable models. This separable setup can accommodate, for instance, a (partially) linearized consumption Euler equation. We also provide a theoretical justification for the popular rule of thumb introduced by Staiger and Stock (1997) to detect weak identification in a linear homoskedastic model, and show how our testing procedure can be seen as a generalization of it that remains valid in heteroskedastic and nonlinear models.

3.1 Detecting weak identification in a linear model

Our test for identification strength is based on the asymptotic behavior of the Hansen J-test statistic defined as

$$J_T = J_T\left[\hat\theta_T, W_T(\hat\theta_T)\right].$$

In the standard case of strong identification of the whole vector θ, it is well known that J_T is asymptotically distributed as χ²(k−p). However, in the general case, we only have

$$J_T \xrightarrow{d} J_\infty \le \chi^2(k),$$

since, in the case of continuously updated GMM,

$$J_T = T\,\bar\phi_T(\hat\theta_T)'\left[S_T(\hat\theta_T)\right]^{-1}\bar\phi_T(\hat\theta_T) \le T\,\bar\phi_T(\theta^0)'\left[S_T(\theta^0)\right]^{-1}\bar\phi_T(\theta^0) = J_T(\theta^0) \xrightarrow{d} \chi^2(k),$$

while, in the case of a (two-step) efficient GMM,

$$J_T = T\,\bar\phi_T(\hat\theta_T)'\,S_T^{-1}\,\bar\phi_T(\hat\theta_T) \le T\,\bar\phi_T(\theta^0)'\,S_T^{-1}\,\bar\phi_T(\theta^0) = J_T(\theta^0) + o_p(1) \xrightarrow{d} \chi^2(k).$$


To detect any departure from standard asymptotic theory due to identification weakness, at least regarding the subvector π of θ, our key idea is to assess the impact of any distortion in the estimation of π on the J-test statistic. More precisely, for some non-zero p₂-dimensional vector δ, we consider the asymptotic behavior of

$$J_T^\delta = J_T\left[\hat\theta_T^\delta, W_T(\hat\theta_T)\right] \quad \text{where } \hat\theta_T = \left(\hat\psi_T', \hat\pi_T'\right)' \text{ and } \hat\theta_T^\delta = \left(\hat\psi_T', \hat\pi_T' + \delta'\right)' = \hat\theta_T + (0', \delta')'. \tag{3.1}$$

To simplify the exposition, we focus in this subsection on moment conditions that are linear with respect to the parameters of interest π, that is,

$$\bar\phi_T(\theta) = a_T(\psi) + A_T\,\pi,$$

for some sequences a_T(·) and A_T of matricial functions that are known from the observations of a sample of size T: the intercept a_T(ψ) may depend on the nuisance parameters, but the slope A_T may not. Maintaining this assumption is the price to pay to keep the simplicity of the linear setting (with respect to π) without maintaining any specific identification assumption about the nuisance parameters ψ. For expositional simplicity, we also maintain the assumption that, for some positive definite matrix W,

$$\operatorname{Plim}\left[W_T(\hat\theta_T)\right] = W.$$

The case of a random limit can be accommodated at the cost of more complicated empirical strategies that will be described in section 4 below.

Linearity allows us to get the following decomposition,

$$J_T^\delta = J_T + T\,\delta' A_T' W_T(\hat\theta_T) A_T\,\delta.$$
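The perturbed statistic J^δ_T of equation (3.1) is straightforward to compute once the GMM estimate and its weighting matrix are in hand. A minimal Python sketch in our own notation (the weighting matrix is held fixed at W_T(θ̂_T), as in the definition above):

```python
import numpy as np

def j_stat(theta, phi, W):
    """J-test statistic T * phibar_T(theta)' W phibar_T(theta)."""
    g = phi(theta)                 # (T, k) moment contributions
    gbar = g.mean(axis=0)
    return g.shape[0] * gbar @ W @ gbar

def perturbed_j(theta_hat, delta, phi, W, n_psi):
    """J^delta_T: the J-statistic evaluated at theta_hat with delta added to the
    pi-block (the last p2 components), leaving the psi-block untouched and the
    weighting matrix fixed at W = W_T(theta_hat)."""
    theta_delta = np.asarray(theta_hat, dtype=float).copy()
    theta_delta[n_psi:] += delta   # perturb only pi
    return j_stat(theta_delta, phi, W)
```

For moments linear in π and a GMM estimate satisfying the first-order conditions, `perturbed_j` exceeds `j_stat` at θ̂_T by exactly the quadratic term T δ'A_T'W A_T δ of the decomposition above.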

Then, several cases are worth considering depending on the identification of the vector π. When the vector π of parameters is strongly identified, we have under standard regularity conditions:

$$A_\infty = \operatorname{Plim} A_T \;\Rightarrow\; \operatorname{Plim}\left[\delta' A_T' W_T(\hat\theta_T) A_T\,\delta\right] = \delta' A_\infty' W A_\infty\,\delta > 0.$$

The last inequality follows from the fact that the matrix W is positive definite and that A_∞ must be a full column rank matrix (otherwise π would not be strongly identified). Thus,

$$\operatorname{Plim}\left[J_T^\delta\right] = +\infty.$$


It is worth noting that this result still holds even if, due to some identification weakness for π, the matrix A_T goes to zero but, for some positive ν < 1/2,

$$\operatorname{Plim}\left[T^{\nu} A_T\right] = A_\infty,$$

with A_∞ a full column rank matrix.

By contrast, when the subvector π is weakly identified, we see that

$$T\,\delta' A_T' W_T(\hat\theta_T) A_T\,\delta \xrightarrow{d} \delta' A_\infty' W A_\infty\,\delta,$$

if we assume that

$$\sqrt{T}\,A_T \xrightarrow{d} A_\infty.$$

Note that this latter assumption is conformable to Staiger and Stock's (1997) approach to weak identification in linear models. More precisely, if we assume that

$$\lim_{T\to\infty} \sqrt{T}\,E(A_T) = \mu,$$

a central limit theorem will characterize A_∞ as a Gaussian random matrix with expectation μ.

The following theorem summarizes our results so far.

The following Theorem summarizes our results so far.

Theorem 3.1. (Asymptotic behavior of JδT in the linear case)

Consider moment conditions that are linear with respect to π,

φT (θ) = aT (ψ) + ATπ .

(i) When π is weakly identified, that is when√TAT

d→ A∞

and when we have jointly(

JT (θ0)√

TAT

)

d→(

J∞(θ0)

A∞

)

with J∞(θ0) ∼ χ2(k)

Then,

JδTd→ Jδ∞ ≤ J

∞(θ0) + δ′A′

∞WA∞δ . (3.2)

(ii) When π is (nearly-)strongly identified, that is when there exists ν ∈ [0, 1/2) such that

Plim [T νAT ] = A∞ with A∞ full column rank matrix

Then,

Plim[

JδT]

= +∞ .


Several comments are in order. First, the clear-cut asymptotic behavior of the statistic J_T^δ (for any given nonzero deterministic perturbation δ) will obviously allow us to build a test of the null hypothesis of weak identification. Theorem 3.1 shows that such a test should be consistent not only against the case of strong identification (ν = 0), but also in any case of nearly-weak or semi-strong identification (see Hahn and Kuersteiner (2002), Antoine and Renault (2009), and Andrews and Cheng (2012)) defined by some ν ∈ (0, 1/2). Even more generally, the test sketched above should have nontrivial power for any given δ such that

$$\operatorname{Plim}\left[\sqrt{T}\,\|A_T\,\delta\|\right] = +\infty.$$

Second, the above discussion highlights that the implicit definition of the null hypothesis of weak identification is indeed

$$H_{0,\pi}(WI):\; \|A_T\| = O_P\left(1/\sqrt{T}\right). \tag{3.3}$$

This condition is necessary and sufficient to design a test of weak identification based on the test statistic J_T^δ that is consistent, at least for some directions δ ∈ R^{p₂}. In other words, the implicit null hypothesis is actually defined by a rate of convergence. This should not be a cause of concern regarding its testability: it is actually the case for all tests about identification that rely upon the weak instruments asymptotics, such as Staiger and Stock (1997), Stock and Wright (2000), and others. More generally, null hypotheses implicitly defined by a rate of convergence are not new in econometrics. When one tests the specification of a parametric model (e.g. a regression function, the drift of a diffusion process, ...) by comparing the parametric estimator to a nonparametric counterpart, such as an estimator based on kernel smoothing, one implicitly acknowledges that the critical region depends on the rate of convergence of a bandwidth parameter.

Third, a convenient feature of Theorem 3.1 is that it paves the way for tests of identification strength such that, when we are able to reject the null of weak identification (that is, the hypothesis that the matrix A_T goes to zero as fast as 1/√T), it is then possible (see e.g. Antoine and Renault (2009, 2012)) to use standard tools of asymptotic inference about π. This is in sharp contrast with approaches for which the null hypothesis under test is strong identification. For instance, the test devised by Bravo, Escanciano and Otsu (2012), albeit similar in spirit to ours, is only able to test the null hypothesis of strong identification against an alternative of (nearly-)weak identification. This shortcoming is shared by all classical tests for weak identification that are based on a Hausman test (see also Guggenberger and Smith (2005), Inoue and Rossi (2011)). Our main goal consists in providing underpinnings for the validity of standard asymptotic inference as soon as the null of poor identification is rejected. This also explains why we will need to define a more pervasive concept of poor identification in general nonlinear and non-separable models; see section 4 below.

In order to devise a formal test of weak identification based on Theorem 3.1, one must use equation (3.2) to compute a critical value c_α such that

$$\Pr\left[J_\infty(\theta^0) + \delta' A_\infty' W A_\infty\,\delta > c_\alpha\right] = \alpha. \tag{3.4}$$

We expect that the parameters of the distribution at stake in (3.4) can be consistently estimated, so that a consistent estimator c_{α,T} of the population critical value c_α is readily available. Then we will be able to define a consistent test with asymptotic level α by rejecting the null of weak identification if and only if

$$J_T^\delta > c_{\alpha,T}.$$
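One simple way to obtain a feasible critical value c_{α,T} is to simulate the right-hand side of (3.2), plugging in estimates of the quantities appearing in (3.4). The sketch below rests on simplifying assumptions that are ours, not the paper's: it draws A_∞ as Gaussian with estimated mean μ̂ and (vectorized) variance Ω̂, independently of the χ²(k) term, whereas the joint limiting law in Theorem 3.1 is more general.

```python
import numpy as np

def critical_value(delta, mu_hat, Omega_hat, W_hat, k, alpha=0.05,
                   ndraws=100_000, seed=0):
    """Monte Carlo approximation of c_alpha in (3.4):
    Pr[chi2(k) + delta' A' W A delta > c_alpha] = alpha,
    drawing vec(A) ~ N(vec(mu_hat), Omega_hat) independently of the chi-square
    term (independence is a simplifying assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    k_dim, p2 = mu_hat.shape                 # mu_hat is (k, p2)
    chi = rng.chisquare(k, size=ndraws)
    L = np.linalg.cholesky(Omega_hat)        # Omega_hat is (k*p2, k*p2)
    draws = rng.standard_normal((ndraws, k_dim * p2)) @ L.T + mu_hat.ravel()
    A_all = draws.reshape(ndraws, k_dim, p2)
    v = A_all @ delta                        # (ndraws, k) vectors A delta
    quad = np.einsum('nj,jk,nk->n', v, W_hat, v)
    return np.quantile(chi + quad, 1.0 - alpha)
```

As a sanity check, when μ̂ = 0 and Ω̂ ≈ 0 the quadratic term vanishes and the critical value reduces to the 1−α quantile of χ²(k).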

We describe a comprehensive example of this testing strategy in the next subsections. We first show that our testing procedure encompasses the popular rule of thumb of Staiger and Stock (1997), while providing its theoretical underpinnings.

3.2 Revisiting the rule-of-thumb

We consider the standard linear IV regression model to revisit the first-stage F-statistic as a convenient and popular tool to detect weak identification. For the sake of expositional simplicity and following common practice, we focus on the linear model with a single endogenous regressor, while we assume that all exogenous regressors have been partialled out,

$$y = Y\pi + u, \qquad Y = Z\beta + v,$$

where the dependent variable y and the endogenous regressor Y are (T, 1)-vectors and Z is the (T, k)-matrix of instruments whose validity is ensured by the maintained assumption E(u_t Z_t) = 0; we also assume E(v_t Z_t) = 0. The variables (u_t, v_t, Y_t, Z_t), t = 1, ..., T, are assumed to be serially independent. The first-stage F-statistic controls the statistical significance of the reduced-form coefficient β:

$$F_T = \frac{T}{k}\,\hat\beta_T'\,\hat\Sigma_T^{-1}\,\hat\beta_T, \quad \text{with } \hat\beta_T \text{ the OLS estimator of } \beta,$$


under the maintained assumption that √T(β̂_T − β) is asymptotically normally distributed with mean zero and variance Σ, consistently estimated by Σ̂_T.

When the error term v is homoskedastic with variance σ²_v, one can take

$$\hat\Sigma_T = T\,\hat\sigma_v^2\,(Z'Z)^{-1} \quad \text{with} \quad \hat\sigma_v^2 = \frac{1}{T-k}\sum_{t=1}^{T}\left(\hat v_t - \bar v_T\right)^2, \quad \hat v_t = Y_t - Z_t'\hat\beta_T, \quad \bar v_T = \frac{1}{T}\sum_{t=1}^{T}\hat v_t,$$

and then, when the true value of β is zero, F_T is asymptotically distributed as χ²(k)/k, where χ²(k) denotes the chi-square distribution with k degrees of freedom (actually Fisher F(k, T−k) in finite samples when the error terms are Gaussian).

In the general case, one can use an Eicker-White robust variance estimator for Σ̂_T. F_T is then the so-called robust F-statistic, and its use to detect weak identification has recently been discussed by Montiel-Olea and Pflueger (2013) as well as Andrews (2015). In all circumstances, the asymptotic distribution of F_T when the true value of β is zero is χ²(k)/k.

3.2.1 Theoretical underpinnings of the rule-of-thumb

Under weak identification of π, the reduced-form parameter β drifts to zero at rate 1/√T, so that the asymptotic distribution of F_T is a non-central chi-square. By contrast, if β is a non-drifting non-zero vector, the F-statistic diverges to infinity as fast as T. Hence the rule of thumb: F_T should be found sufficiently large to reject the null hypothesis of weak identification of π, H_{0,π}(WI).

Proposition 3.2. Under the null H_{0,π}(WI) of weak identification defined in (3.3),

F_T →d (1/k) χ²(k, λ) with λ = lim_T [ T β′ Σ^{-1} β ] ,

where χ²(k, λ) denotes the non-central chi-square distribution with non-centrality parameter λ. Under strong identification, in case of homoskedastic reduced-form error terms v_t,

F_T ∼ (T/k) β′ Σ^{-1} β → ∞ .

Proposition 3.2 justifies the rule of thumb put forward by Staiger and Stock (1997) and Stock and Yogo (2005): Staiger and Stock suggest declaring instruments weak if the first-stage F-statistic falls below 10; Stock and Yogo provide cut-off values for more than one endogenous regressor based on coverage distortions of Wald confidence sets based on 2SLS.


However, it must be kept in mind that the above discussion rests upon the maintained assumption of homoskedasticity. The robustified version of the F-statistic using the Eicker-White estimator of Σ maintains the same distribution under the null, but does not provide any guidance about the pattern of divergence of the F-test statistic under strong identification. This is the reason why Andrews (2015) states that "pretest based on the heteroskedasticity-robust first-stage F-statistic fails to control coverage distortions in heteroskedastic linear IV".

Our interpretation is that in case of heteroskedasticity, the robustified F-statistic does not

consider the right norm of the reduced form parameters β. We now revisit the F-test statistic

through the lens of GMM and address this shortcoming.

3.2.2 Generalizing the rule-of-thumb

The key intuition is that in case of strong identification, the GMM criterion should be able

to detect a small distortion of the parameters. Let us rewrite the linear IV model in the

GMM framework,

θ̂_T = argmin_{θ∈Θ} [ φ_T′(θ) S_T^{-1} φ_T(θ) ] , with φ_T(θ) = (1/T) Z′(y − Y π) ,

and S_T a consistent estimator of the long-run variance matrix of √T φ_T(θ⁰), denoted S(θ⁰). The structural parameter θ coincides here with the parameter of interest π.

Following Hansen (1982), researchers may check the validity of the moment conditions by

computing the J-test statistic of overidentification,

J_T = T φ_T′(θ̂_T) S_T^{-1} φ_T(θ̂_T) .

As already mentioned in section 3.1, when instruments are valid and strong, J_T is asymptotically distributed as χ²(k − 1). However, when instruments are weak, the GMM estimator θ̂_T = π̂_T is no longer consistent, and we can only state that the estimation error (π̂_T − π) is O_P(1) under the null of weak identification H_{0,π}(WI). As a result, the asymptotic distribution is no longer χ²(k − 1) and is dominated by χ²(k) since, by definition,

J_T = T φ_T′(θ̂_T) S_T^{-1} φ_T(θ̂_T) ≤ T φ_T′(θ⁰) S_T^{-1} φ_T(θ⁰) →d χ²(k) .

Since the validity of the J-test statistic goes beyond the linear model, it suggests the definition of a general rule-of-thumb based on the J-test statistic as an alternative to the F-statistic. For reasons explained shortly, we will actually consider a J-test statistic computed


at a distorted value of the GMM estimator²,

J_T(δ) = T φ_T′(θ̂_T + δ) S_T^{-1} φ_T(θ̂_T + δ) .
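As a minimal sketch, J_T(δ) can be computed for the linear IV model, with 2SLS as the efficient GMM estimator under homoskedasticity; the simulated design, function names, and the value δ = 0.3 below are illustrative assumptions, not the paper's:

```python
import numpy as np

def distorted_J(y, Y, Z, delta):
    """Distorted J-statistic J_T(delta) = T phi_T(th+delta)' S_T^{-1} phi_T(th+delta)
    for the linear IV model with phi_T(th) = Z'(y - Y*th)/T (scalar th = pi).
    Sketch: S_T is the homoskedastic estimate s_u^2 * Z'Z / T."""
    T = len(y)
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection on the instruments
    th = (Y @ P @ y) / (Y @ P @ Y)          # 2SLS = efficient GMM here
    u = y - Y * th
    S = (u @ u / T) * (Z.T @ Z) / T         # homoskedastic long-run variance estimate
    phi = Z.T @ (y - Y * (th + delta)) / T  # moments at the distorted estimator
    return T * phi @ np.linalg.solve(S, phi)

rng = np.random.default_rng(1)
T, k = 1000, 4
Z = rng.standard_normal((T, k))
v = rng.standard_normal(T)
u = 0.3 * v + rng.standard_normal(T)        # endogeneity via correlated errors
Y = Z @ np.full(k, 0.8) + v                 # strong first stage
y = Y * 0.5 + u                             # structural equation, pi = 0.5

J0, Jd = distorted_J(y, Y, Z, 0.0), distorted_J(y, Y, Z, 0.3)
print(J0, Jd)   # under strong identification, J_T(delta) diverges with T while J_T(0) stays moderate
```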

Proposition 3.3. Under the null of weak identification H_{0,π}(WI), for any real number δ,

J_T(δ) = O_p(1) .

Moreover, under homoskedasticity, J_T(δ) converges in distribution towards J_∞(δ) with

J_∞(δ) ≤ χ²(k) + δ² (σ²_v/σ²_u) χ²(k, λ) + 2δ (σ_v/σ_u) √( χ²(k) χ²(k, λ) ) with λ = lim_T [ T β′ Σ^{-1} β ] .

The upper bound of the asymptotic distribution of J_∞(δ) is actually χ²(k) if we choose a sequence of distortion terms δ = δ_T going to zero when T goes to infinity. This result will be shown in much more general contexts in sections 3.3 and 4. In addition, if we rather choose δ = σ_u/σ_v in order to give the same order of magnitude to the structural and reduced-form equations, we end up with a simpler upper bound:

[ √(χ²(k)) + √(χ²(k, λ)) ]² ≤ 4 χ²(k, λ) .

Even though the standard J-test statistic J_T(0) and the rescaled F-statistic kF_T have such similar asymptotic behaviors under the null, the former, by contrast with the latter, would be useless to detect the alternative of strong identification. In this case, we know that J_T(0) is asymptotically χ²(k − 1); that being said, the trick of distortion of the GMM estimator makes all the difference. If θ = π is strongly identified but we replace the root-T consistent estimator θ̂_T by (θ̂_T + δ),

√T(θ̂_T + δ − θ⁰) = √T(θ̂_T − θ⁰) + √T δ

will blow up at the rate √T δ (or √T δ_T by choosing δ_T going to zero at a rate slower than 1/√T). Therefore, up to the possible dampening effect due to δ_T going to zero, the distorted J-test statistic should detect strong identification as well as the F-statistic F_T. This intuition is confirmed by the following result.

Proposition 3.4. Let us introduce the following notation:

Q_z(v) = E[ z_t z_t′ σ²_v(z_t) ] , where σ²_v(z_t) = E[v²_t | z_t] ,
Q_z(u) = E[ z_t z_t′ σ²_u(z_t) ] , where σ²_u(z_t) = E[u²_t | z_t] .

² Since π coincides with θ, the distortion δ applies here to the whole vector θ, unlike in equation (3.1).


When π is strongly identified, we have

Plim [ kF_T / T ] = β′ Q_z Q_z(v)^{-1} Q_z β = β′ Σ^{-1} β ,

Plim [ J_T(δ) / T ] = δ² β′ Q_z Q_z(u)^{-1} Q_z β = δ² (σ²_v / σ²_u) β′ Σ^{-1} β ,

where, for each statistic, the last equality is valid under conditional homoskedasticity.

The key message of Proposition 3.4 is that, in general, the first-stage F-statistic does not use the right metric to assess the significance of β. It is only in the homoskedastic case that both Q_z(u) and Q_z(v) are proportional to Q_z ≡ E[z_t z_t′], so that the right metric is proportional to Q_z or, equivalently, to Σ^{-1}.

To conclude, it is worth mentioning that the approach promoted above requires a consistent estimator of the variance of the error term u. This may be problematic under the null of weak identification, when no consistent estimation of the structural equation is available. The feasible version of this strategy will be discussed in section 3.4.1 below.

3.3 Detecting weak identification in a separable model

As explained in section 3.1, our testing procedure is based on a distorted version of the J-test statistic for overidentification, J_T^δ. In order to control the size of our test, we need to characterize the asymptotic distribution of J_T^δ under the null. In a general nonlinear context, this introduces two issues. First, since under the null of weak identification no consistent estimator of θ⁰ is available in general, we may not be able to consistently estimate the long-run variance matrix S(θ⁰). This issue will be addressed at the end of this subsection. For the moment, we maintain the possibly infeasible assumption that there exists such a consistent estimator S_T. Second, even with S_T, it is unclear whether J_T^δ converges in distribution with a well-characterized upper bound. This is the reason why we need to consider instead J_T(δ_T) with a tuning parameter δ_T going to zero with T.

The introduction of δ_T → 0 may induce some power loss in the linear homoskedastic case considered in the previous section: we may have to accept a slight power loss with respect to the naive F-test statistic since J_T(δ_T)/F_T = O_p(δ²_T); the slower the convergence of δ_T to zero, the smaller the implied power loss. We argue that this is actually the price to pay to design a testing procedure that remains valid beyond the linear homoskedastic case.


When considering δ_T → 0, we hope that, for T large, our distorted J-test statistic behaves under the null as

J_T(0) = T φ_T′(θ̂_T) S_T^{-1} φ_T(θ̂_T) ≤ T φ_T′(θ⁰) S_T^{-1} φ_T(θ⁰) →d χ²(k) .

If this is the case, it means that we can control the asymptotic size of our test by using the critical values provided by χ²(k). The fact that J_T(δ_T) may behave under the null asymptotically as J_T(0), for any sequence δ_T going to zero (even very slowly), actually comes from the singularity of the Jacobian matrix. A first-order Taylor expansion around θ⁰ gives

φ_T(θ̂_T + δ_T) ∼ φ_T(θ⁰) + (∂φ_T(θ⁰)/∂θ′) δ_T , (3.5)

with, under the null,

‖ ∂φ_T(θ⁰)/∂π ‖ = O_p(1/√T) . (3.6)

However, the above equation implicitly assumes that the higher-order terms in the Taylor expansion (3.5) are negligible. Since the first-order term is driven to zero by the Jacobian matrix, higher-order terms may not be negligible. Of course, higher-order terms were not an issue in the linear setting of section 3.1. It turns out that there are no such issues either in the case of a separable model:

φ_t(θ) = φ_{1t}(ψ) + (1/T^ν) φ_{2t}(ψ, π) , with θ = (ψ, π). (3.7)

The null of weak identification corresponds to ν = 1/2 (as assumed in the original framework of Stock and Wright (2000); see also Example 5.3 below), while the alternative corresponds to any 0 ≤ ν < 1/2. In addition, the nuisance parameters ψ are also assumed to be strongly identified:

Plim [ ∂φ_T(θ)/∂ψ′ ] = Γ_ψ(θ) , with Γ_ψ(θ) full-column rank,

and, under the null, Γ_ψ(θ) = Plim [ ∂φ_{1T}(θ)/∂ψ′ ] .

The important property of this separable case is that, under the null, we have, at the true value,

Plim [ ∂²φ_T(θ⁰) / ∂π∂π′ ] = 0 .


Then, second-order terms in (3.5) are negligible insofar as the distortion δ_T goes to zero and only affects the components π̂_T of θ̂_T (see also equation (3.1)).

More formally, our test procedure is the following:

1. Compute the GMM estimator, θ̂_T = (ψ̂_T, π̂_T) = argmin_{θ∈Θ} J_T[θ, W_T(θ*)];

2. Compute the distorted estimator, θ̂_T^{δ_T} = (ψ̂_T′, π̂_T′ + δ_T′)′ with δ_T → 0;

3. Compute the distorted J-test statistic J_T(δ_T);

4. Reject the null of weak identification of π when J_T(δ_T) > χ²_{1−α}(k − p₁), with p₁ the dimension of ψ and χ²_{1−α}(k − p₁) the (1 − α)-quantile of the chi-square distribution with (k − p₁) degrees of freedom.
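The four steps can be sketched for the linear IV model (scalar θ = π, p₁ = 0, so the χ²(k) critical value applies). The simulated design, the choice δ_T = δ/log(log T) with δ = 1, and the hard-coded quantile χ²_{0.95}(4) ≈ 9.488 are illustrative assumptions:

```python
import math
import numpy as np

# Sketch of the four-step procedure for the linear IV model (theta = pi scalar,
# p1 = 0). All names, the design, and delta = 1 are illustrative assumptions.

def weak_id_test(y, Y, Z, delta=1.0, crit=9.488):   # chi2_{0.95}(4) ~ 9.488
    T, k = Z.shape
    # Step 1: GMM estimator (2SLS is efficient GMM under homoskedasticity)
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    th = (Y @ P @ y) / (Y @ P @ Y)
    # Step 2: distorted estimator th + delta_T with delta_T -> 0
    delta_T = delta / math.log(math.log(T))
    # Step 3: distorted J-statistic J_T(delta_T)
    u = y - Y * th
    S = (u @ u / T) * (Z.T @ Z) / T
    phi = Z.T @ (y - Y * (th + delta_T)) / T
    J = T * phi @ np.linalg.solve(S, phi)
    # Step 4: reject weak identification when J_T(delta_T) exceeds the quantile
    return J, J > crit

rng = np.random.default_rng(2)
T, k = 5000, 4
Z = rng.standard_normal((T, k))
v = rng.standard_normal(T)
u = 0.3 * v + rng.standard_normal(T)
Y_strong = Z @ np.full(k, 0.8) + v                  # strong first stage
Y_weak = Z @ (np.full(k, 1.0) / np.sqrt(T)) + v     # drifting (weak) first stage

J_s, rej_s = weak_id_test(Y_strong * 0.5 + u, Y_strong, Z)
J_w, rej_w = weak_id_test(Y_weak * 0.5 + u, Y_weak, Z)
print(J_s, rej_s, J_w, rej_w)
```

In the strong design the statistic diverges with T and the test rejects weak identification; in the drifting design it typically stays near its bounded null distribution.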

Note that the (k − p₁) degrees of freedom show up because we maintain the assumption that the p₁ components of ψ are strongly identified. The theorem below shows that the assumption of sufficiently strong identification of ψ can be relaxed at the price of a more conservative test, with critical values computed from χ²(k) rather than from χ²(k − p₁).

Theorem 3.5. (Test of weak identification of π in the separable case (3.7))
For an arbitrary choice of the deterministic sequence δ_T such that δ_T → 0, we define two asymptotic tests with respective critical regions W_T^{δ_T} and W̃_T^{δ_T},

W_T^{δ_T} = { J_T(δ_T) > χ²_{1−α}(k − p₁) } and W̃_T^{δ_T} = { J_T(δ_T) > χ²_{1−α}(k) } ,

where χ²_{1−α}(d) is the (1 − α)-quantile of the chi-square distribution with d degrees of freedom.

(i) The test W̃_T^{δ_T} is asymptotically conservative at level α for the null hypothesis H_{0,π}(WI).

(ii) Under strong identification of ψ, the test W_T^{δ_T} is asymptotically conservative for the same null hypothesis H_{0,π}(WI).

(iii) The tests W_T^{δ_T} and W̃_T^{δ_T} are consistent against any alternative that makes the choice of δ_T conformable to Plim [J_T(δ_T)] = ∞.

The proof of Theorem 3.5 will be provided in the appendix as a special case of the proof of Theorem 4.4 of the next section.

Intermediate critical values χ²_{1−α}(k − q) with 0 < q < p₁ can be considered after assuming that at least q components of ψ are strongly identified. Hence, in order to obtain a procedure that is less conservative, parameters known to be strongly identified (e.g. the intercept) should not be included in π but rather in ψ. It is also worth mentioning that nothing prevents us from testing the identification strength of the whole vector θ = (ψ, π).

The consistency claim (iii) in Theorem 3.5 may look somewhat tautological. One may interpret it as stating that the choice of a sequence δ_T fulfilling the consistency condition in (iii) is likely feasible in practice since, by definition, under the alternative,

√T ∂φ_T(θ⁰)/∂π′ ≠ O_p(1) ,

that is, at least for a subsequence,

‖ √T ∂φ_T(θ⁰)/∂π′ ‖ → ∞ .

Our main task is to pin down a sequence δ_T going to zero sufficiently slowly to ensure that, for this sequence,

‖ √T (∂φ_T(θ)/∂π′) δ_T ‖ → ∞ .

Hence, the slower the sequence δ_T converges to zero, the more powerful the resulting test will be. Note that the size of our test is always controlled, regardless of the chosen sequence δ_T. In finite samples, the choice of this tuning parameter follows a data-based selection rule that will be described in subsection 3.4.2.

Recall that Theorem 3.5 enables us to address the issue of interest - testing for weak identification - in the separable case (which includes the linear case). Besides the fact that a similar test may not work in the non-separable case, it is worth realizing that, even if it did work, it may not be so useful in practice. When second-order terms in Taylor expansions are not negligible, standard inference procedures may break down even when ν in (3.7) is smaller than 1/2 (while weak identification corresponds to ν = 1/2). In other words, being able to reject the null H_{0,π}(WI) may not guarantee valid standard inference in the non-separable case; as a result, testing for weak identification in such a case may not be as empirically relevant. An extended null hypothesis will be introduced in section 4 below.

3.4 Practical implementation of our testing procedure

We now explain how to implement our testing procedure in practice. More specifically, we first explain how our testing procedure can be robustified to the absence of a consistent estimator of S(θ⁰). Then, we explain how to choose the tuning parameter δ_T.


3.4.1 Testing without a consistent estimator of S(θ⁰)

We start by sketching our robust testing procedure:

(i) Compute the CU-GMM estimator θ̂_T of θ⁰;

(ii) For some chosen 0 < γ < 1, build the robust confidence region C_T(1 − γ) for θ⁰ using the CU-GMM objective function. If C_T is empty, the null hypothesis is automatically rejected;

(iii) If C_T is not empty, compute the test statistic

min_{θ∈C_T(1−γ)} [ T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ) φ_T(θ̂_T^{δ_T}) ] , (3.8)

and compare it to the appropriate chi-square quantile, as discussed in Theorem 3.5.
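Steps (i)-(iii) can be sketched on a one-dimensional grid for the linear IV model. The grid bounds, the continuously-updated (centered) variance estimator, and the quantile χ²_{0.99}(4) ≈ 13.28 are illustrative assumptions rather than the paper's exact implementation:

```python
import math
import numpy as np

# Hedged sketch of the robust procedure (i)-(iii): build a CU-GMM confidence
# region on a parameter grid, then minimize the distorted J-statistic over it.
# The design, grid bounds, and quantiles (k = 4) are illustrative choices.

rng = np.random.default_rng(3)
T, k = 2000, 4
Z = rng.standard_normal((T, k))
v = rng.standard_normal(T)
u = 0.3 * v + rng.standard_normal(T)
Y = Z @ np.full(k, 0.8) + v
y = Y * 0.5 + u

def phi_and_S(th):
    """Moments phi_T(th) and centered variance estimator S_T(th)."""
    res = y - Y * th
    phi = Z.T @ res / T
    S = (Z * res[:, None]).T @ (Z * res[:, None]) / T - np.outer(phi, phi)
    return phi, S

def cu_objective(th):
    phi, S = phi_and_S(th)
    return T * phi @ np.linalg.solve(S, phi)

grid = np.linspace(-1.0, 2.0, 301)
obj = np.array([cu_objective(th) for th in grid])
th_hat = grid[obj.argmin()]                 # (i) CU-GMM estimator on the grid
CT = grid[obj <= 13.28]                     # (ii) region via chi2_{0.99}(4) ~ 13.28

delta_T = 1.0 / math.log(math.log(T))
phi_d = Z.T @ (y - Y * (th_hat + delta_T)) / T   # moments at the distorted estimator

# (iii) minimize the distorted statistic over C_T(1 - gamma); an empty region
# would mean automatic rejection.
stat = min(T * phi_d @ np.linalg.solve(phi_and_S(th)[1], phi_d) for th in CT)
print(th_hat, len(CT), stat)
```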

Our robust testing procedure is related to the inference procedure of Stock and Wright (2000) and to the projection method of Chaudhuri and Zivot (2011); it also applies to the general framework of section 4 (with a slight change of notation). The idea of computing the quantity (3.8) by minimizing over θ ∈ C_T(1 − γ), for some well-chosen γ ∈ (0, 1), is motivated by the fact that, by definition,

liminf_T Pr( inf_{θ∈C_T(1−γ)} [ T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ) φ_T(θ̂_T^{δ_T}) ] ≤ T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ⁰) φ_T(θ̂_T^{δ_T}) ) ≥ lim_T Pr( θ⁰ ∈ C_T(1 − γ) ) = 1 − γ .

Moreover, we know that, under H₀,

√T φ_T(θ̂_T^{δ_T}) = √T φ_T(θ⁰) + o_p(1) ,

which implies that

T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ⁰) φ_T(θ̂_T^{δ_T}) = T φ_T′(θ⁰) S_T^{-1}(θ⁰) φ_T(θ⁰) + o_p(1) →d χ²(k) .

Therefore, under H₀, we have:

liminf_T Pr( inf_{θ∈C_T(1−γ)} [ T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ) φ_T(θ̂_T^{δ_T}) ] ≤ T φ_T′(θ⁰) S_T^{-1}(θ⁰) φ_T(θ⁰) ) ≥ 1 − γ ,


where

T φ_T′(θ⁰) S_T^{-1}(θ⁰) φ_T(θ⁰) →d χ²(k) .

Note that a sharper bound can even be obtained when we maintain the assumption that a p₁-subvector ψ of θ is sufficiently well-identified to allow standard inference.

With θ = (ψ, π), let θ̃_T denote the unfeasible GMM estimator such that:

θ̃_T ≡ (ψ̃_T, π⁰) with ψ̃_T = argmin_ψ [ φ_T′(ψ, π⁰) S_T^{-1}(ψ, π⁰) φ_T(ψ, π⁰) ] .

We have

T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ⁰) φ_T(θ̂_T^{δ_T}) = T φ_T′(θ̃_T) S_T^{-1}(θ̃_T) φ_T(θ̃_T) + o_p(1) →d χ²(k − p₁) .

Overall, in any circumstances (and possibly without maintaining any identification assumption about ψ), we have under H₀:

liminf_T Pr( inf_{θ∈C_T(1−γ)} [ T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ) φ_T(θ̂_T^{δ_T}) ] ≤ T φ_T′(θ̃_T) S_T^{-1}(θ̃_T) φ_T(θ̃_T) ) = 1 − γ* ,

with γ* ≤ γ, and where θ̃_T stands for θ⁰ when no identification assumption is maintained about ψ.

We define our robust test for H_{0,π}, asymptotically conservative at level α, by the following procedure: we reject the null H₀ either when C_T(1 − γ) is empty, or when

inf_{θ∈C_T(1−γ)} [ T φ_T′(θ̂_T^{δ_T}) S_T^{-1}(θ) φ_T(θ̂_T^{δ_T}) ] > χ²_{1−α}(k − p₁) , (3.9)

with p₁ = 0 when no identification assumption is maintained about ψ.

Under the null, the probability of rejection is asymptotically smaller than or equal to

ε + (1 − ε)[γ + (1 − γ*)α] ,

where ε is the probability for the set C_T(1 − γ) to be empty. To see this, consider the case where the set C_T(1 − γ) is not empty: the probability of the event (3.9) is then upper bounded by (γ + (1 − γ*)α), since two cases may occur:

(i) either θ⁰ is in C_T(1 − γ) (with asymptotic probability (1 − γ)), and then the event (3.9) may occur with probability at most α(1 − γ*), by definition of γ*;


(ii) or θ⁰ is not in C_T(1 − γ), which may occur with asymptotic probability at most γ.

Overall, the asymptotic level of the test is upper bounded by ε + (1 − ε)[γ + (1 − γ*)α]. To conclude, it is sufficient to realize that, by taking γ arbitrarily close to 0, the asymptotic level of the test becomes upper bounded by α since, by definition,

γ → 0 ⇒ ε → 0 and γ* → 0 ⇒ [ε + (1 − ε)[γ + (1 − γ*)α]] → α .
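A quick numeric check of this bound, under the purely illustrative assumption that ε and γ* are of the same order as γ (the argument only requires both to vanish with γ):

```python
# Numeric check: the bound eps + (1 - eps) * [gamma + (1 - gamma_star) * alpha]
# shrinks to alpha as gamma -> 0. Setting eps = gamma_star = gamma is an
# illustrative assumption only.
alpha = 0.05
for gamma in [0.1, 0.01, 0.001, 0.0001]:
    eps = gamma_star = gamma
    bound = eps + (1 - eps) * (gamma + (1 - gamma_star) * alpha)
    print(gamma, round(bound, 6))
```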

3.4.2 Choosing the tuning parameter δ_T

For a given well-suited sequence δ_T converging to zero, for example δ_T = δ/log(log(T)), we know that the test W_T^{δ_T} (and possibly W̃_T^{δ_T}) defined in Theorem 3.5 is asymptotically conservative and consistent with probability one for any vector δ chosen randomly (with an absolutely continuous probability distribution). However, for the sake of finite-sample performance, it clearly matters a lot to fine-tune the length of the perturbation δ in due proportion to the accuracy of the estimation of θ already allowed by the available sample of size T. We now detail a data-driven procedure to choose δ appropriately. This search sets the focus on the length of δ rather than on its direction. The user of the procedure may eventually want to add a random component by picking the direction of δ randomly on the unit sphere. To simplify our discussion and notation, we focus on the case where the moments are separable and where the subvector ψ (not under test) is assumed sufficiently strongly identified. Extensions to other cases are straightforward and not discussed explicitly.
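As a small illustration of why such a slowly vanishing sequence preserves power, δ_T = δ/log(log(T)) goes to zero while √T δ_T still diverges (here with δ = 1, an illustrative choice):

```python
import math

# delta_T = delta / log(log T) vanishes, yet sqrt(T) * delta_T diverges:
# the distortion shrinks too slowly for root-T consistency to absorb it.
for T in (10**2, 10**4, 10**6, 10**8):
    delta_T = 1.0 / math.log(math.log(T))
    print(T, round(delta_T, 3), round(math.sqrt(T) * delta_T, 1))
```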

Our proposed procedure for eliciting δ relies on subsampling and is decomposed into two steps: first, the design of the grid of relevant candidate points for the perturbation parameter; second, the selection of the actual perturbation parameter.

• Step 1: design of the grid of candidate points for the perturbation vector.
First, we consider the empirical distribution of the GMM estimators of θ across all the subsamples of ⌊T^κ⌋ consecutive observations³ (with κ user-chosen between 0 and 1; e.g. κ = 1/2). The range of this distribution suggests a relevant grid of values of θ for the purpose of the Monte-Carlo study. The fineness of the grid is user-chosen: in our experiments, we used 10 points for each component of θ. Our results were not too sensitive to this choice.

³ ⌊T^κ⌋ refers to the largest integer smaller than or equal to T^κ. We consider consecutive observations to accommodate possible serial dependencies.


Note that there are many other ways to obtain a meaningful grid of candidate points for

the perturbation vector that may be less computer-intensive, or may rely on additional

information about the values of the parameters.

• Step 2: selection of the perturbation vector.
Second, we consider H subsamples of ⌊T^{κ*}⌋ consecutive observations (where κ* may or may not coincide with the aforementioned κ). For each value δ_m in the grid and for each subsample s (with s = 1, ..., H), we compute the associated local-to-zero version of the estimator,

θ̂^{a,δ_m}_{⌊T^{κ*}⌋,s} = θ̂_{⌊T^{κ*}⌋,s} + ( 0_{p₁} , δ_m/log(log(⌊T^{κ*}⌋)) )′ ,

and the corresponding distorted J-test statistic for overidentification,

J^{a,δ_m}_{⌊T^{κ*}⌋,s} = ⌊T^{κ*}⌋ φ_{⌊T^{κ*}⌋,s}′(θ̂^{a,δ_m}_{⌊T^{κ*}⌋,s}) S_T^{-1} φ_{⌊T^{κ*}⌋,s}(θ̂^{a,δ_m}_{⌊T^{κ*}⌋,s}) .

As a result, for each perturbation vector δ_m in the grid, we obtain a cross-sectional distribution of the test statistic (J^{a,δ_m}_{⌊T^{κ*}⌋,s})_{s=1,...,H}. We can then extract the (1 − α*)-quantile of this distribution, for some user-chosen α*. We select the perturbation vector δ_{m*} associated with the (1 − α*)-quantile closest to the (1 − α*)-quantile of the chi-square distribution with (k − p₁) degrees of freedom⁴. Note that (1 − α*) may or may not correspond to the actual asymptotic size of the designed test. Regardless of the chosen (1 − α*) and associated perturbation vector δ_{m*}, the asymptotic size α of the test is always controlled, as shown above. In our Monte-Carlo experiments, we choose κ = κ* = 1/2 and α = α*; our results (through unreported experiments) are not too sensitive to these choices.
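The two steps can be compressed into the following sketch for the linear IV model with scalar θ (p₁ = 0). The block size ⌊T^{1/2}⌋, the number of subsamples H, the grid built from the spread of subsample estimates, and the target quantile χ²_{0.95}(4) ≈ 9.488 are illustrative choices, not the paper's exact rule:

```python
import math
import numpy as np

# Hedged sketch of the subsampling rule for choosing the perturbation length
# (linear IV, theta scalar, p1 = 0). All tuning choices are illustrative.

rng = np.random.default_rng(4)
T, k = 4096, 4
Z = rng.standard_normal((T, k))
v = rng.standard_normal(T)
u = 0.3 * v + rng.standard_normal(T)
Y = Z @ np.full(k, 0.8) + v
y = Y * 0.5 + u

def two_sls(ys, Ys, Zs):
    P = Zs @ np.linalg.solve(Zs.T @ Zs, Zs.T)
    return (Ys @ P @ ys) / (Ys @ P @ Ys)

def J_stat(ys, Ys, Zs, th):
    n = len(ys)
    res0 = ys - Ys * two_sls(ys, Ys, Zs)
    S = (res0 @ res0 / n) * (Zs.T @ Zs) / n
    phi = Zs.T @ (ys - Ys * th) / n
    return n * phi @ np.linalg.solve(S, phi)

b = int(math.isqrt(T))                     # block size floor(T^(1/2))
H = 50                                     # number of consecutive-block subsamples
starts = rng.integers(0, T - b, size=H)
shrink = math.log(math.log(b))

# Step 1: candidate lengths from the spread of subsample estimates of theta
ests = [two_sls(y[s:s+b], Y[s:s+b], Z[s:s+b]) for s in starts]
grid = np.linspace(1e-3, 2 * np.std(ests) + 1e-3, 10)

# Step 2: pick delta whose (1 - alpha*)-quantile of distorted subsample
# J-statistics is closest to chi2_{0.95}(k) ~ 9.488 (p1 = 0)
def q95(delta):
    Js = [J_stat(y[s:s+b], Y[s:s+b], Z[s:s+b],
                 two_sls(y[s:s+b], Y[s:s+b], Z[s:s+b]) + delta / shrink)
          for s in starts]
    return np.quantile(Js, 0.95)

delta_star = min(grid, key=lambda d: abs(q95(d) - 9.488))
print(delta_star)
```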

4 Detecting poor identification in the general model

In this section, we consider general models defined by non-linear and non-separable moment

conditions. We are interested in assessing the identification strength of some components

of the vector of structural parameters θ. When testing for identification strength, we will

specifically check whether our dataset allows us to reject the null hypothesis that some

components of θ are too poorly identified - so poorly that standard (Gaussian) asymptotic

⁴ Recall that the appropriate number of degrees of freedom depends on the identification assumption imposed on ψ. For instance, when ψ is assumed near-weakly identified, we use (k − p₁) degrees of freedom, and when no such identification assumption is maintained, we rather use k degrees of freedom.


theory is not reliable. Our testing strategy, similarly to the one designed for linear or

separable models, still amounts to a conservative J-test questioning the rate of convergence

of a given GMM estimator. However, the generality of the model calls for a more extensive

concept of ”poor (weak) identification”. More precisely, the null hypothesis under test (see

definition at the beginning of section 4.2 below) covers a broader variety of poor identification

patterns than the strict definition of weak identification considered previously. We start with

some preliminary theoretical results before defining our testing procedure.

4.1 Preliminary theoretical results

4.1.1 Asymptotic theory

We first present the asymptotic theory for GMM estimators under some high-level assumptions. While a careful study of rates of convergence of GMM estimators is provided in Antoine and Renault (2009, 2010), we simplify the exposition by directly maintaining the required high-level regularity assumptions.

Assumption 1. (High-level regularity assumptions)

(a) The identification strength of θ is characterized by a sequence M_T of deterministic non-singular matrices of size p with lim_T (M_T/√T) = 0 such that

Γ(θ⁰) ≡ Plim [ (∂φ_T(θ⁰)/∂θ′) M_T ] exists and is full-column rank. (4.1)

(b) For any GMM estimator θ̂_T and any sequence θ̄_T between θ⁰ and θ̂_T component by component⁵, we have

Γ(θ⁰) = Plim [ (∂φ_T(θ̄_T)/∂θ′) M_T ] ,

where Γ(θ⁰) is the full-column rank matrix introduced in (4.1).

(c) √T φ_T(θ⁰) converges in distribution towards a normal distribution with mean zero and variance S(θ⁰).

(d) Any GMM estimator θ̂_T is such that √T M_T^{-1}(θ̂_T − θ⁰) = O_P(1).

Comments:

(i) As already acknowledged, assumptions 1(a) and (b) are high-level assumptions, and we

refer the interested reader to Antoine and Renault (2010) for more primitive conditions.

⁵ Hereafter, we use the notation θ̄_T ∈ [θ⁰, θ̂_T].


(ii) Going from assumption 1(a) to (b) only amounts to assuming that

θ̄_T ∈ [θ⁰, θ̂_T] ⇒ Plim [ ( ∂φ_T(θ̄_T)/∂θ′ − ∂φ_T(θ⁰)/∂θ′ ) M_T ] = 0 . (4.2)

The required zero limit in (4.2) is obviously ensured for the components of the moment vector φ_T(·) that are linear with respect to the parameters θ. For those which are not linear with respect to some subset of components of θ, the issue at stake is to know whether their rate of convergence along the sequence θ̄_T is sufficient to supersede the possible divergence to infinity of the sequence M_T.

(iii) Under assumption 1(d), we have √T M_T^{-1}(θ̂_T − θ⁰) = O_P(1). Hence, the validity of (4.2) means that, in relevant directions, the convergence to zero of M_T/√T dominates the divergence to infinity of the sequence M_T. Roughly speaking, ‖M_T‖ should not blow up as fast as T^{1/4}. This threshold is the key concept of "near-strong identification" promoted by Antoine and Renault (2009, 2010) as a sufficient condition for identification of θ strong enough for standard (Gaussian) asymptotic theory.

The following theorem extends the asymptotic normality result given in the linear case, as well as results previously given in Antoine and Renault (2009, 2010).

Theorem 4.1. (Asymptotic normality and efficient estimation)
Let θ̂_T denote any GMM estimator. Under assumption 1, we have:

(a) √T M_T^{-1}(θ̂_T − θ⁰) is asymptotically normal with mean zero and variance

Σ(θ⁰) = [Γ′(θ⁰) Ω Γ(θ⁰)]^{-1} Γ′(θ⁰) Ω S(θ⁰) Ω Γ(θ⁰) [Γ′(θ⁰) Ω Γ(θ⁰)]^{-1} .

(b) With a weighting matrix Ω_T = S_T^{-1}, where S_T denotes a consistent estimator of S(θ⁰), the associated GMM estimator is efficient, with asymptotic variance [Γ′(θ⁰) S^{-1}(θ⁰) Γ(θ⁰)]^{-1}.

In our framework, the terminology "efficient GMM" must be carefully qualified. For all practical purposes, Theorem 4.1(b) states that, for T large enough, √T M_T^{-1}(θ̂_T − θ⁰) can be seen as a Gaussian vector with mean zero and variance consistently estimated by

M_T^{-1} [ (∂φ_T′(θ̂_T)/∂θ) S_T^{-1} (∂φ_T(θ̂_T)/∂θ′) ]^{-1} (M_T^{-1})′ , (4.3)

since Γ(θ⁰) = Plim [ (∂φ_T(θ⁰)/∂θ′) M_T ]. However, it is incorrect to deduce from formula (4.3) that √T(θ̂_T − θ⁰) can be seen (for T large enough) as a Gaussian vector with mean zero and


variance consistently estimated by

[ (∂φ_T′(θ̂_T)/∂θ) S_T^{-1} (∂φ_T(θ̂_T)/∂θ′) ]^{-1} . (4.4)

The above matrix (4.4) is actually the inverse of an asymptotically singular matrix. In this sense, a truly standard GMM theory does not apply, and some components of √T(θ̂_T − θ⁰) actually blow up. Quite fortunately, standard inference procedures work, albeit for non-standard reasons. For all practical purposes related to inference about the structural parameter θ, the knowledge of the matrix M_T is not required; see also the discussion in Antoine and Renault (2010). In particular, the J-test for overidentification can be performed as usual, as stated in the following result.

Theorem 4.2. (J-test)
Under the assumptions of Theorem 4.1, for any GMM estimator with a weighting matrix Ω_T = S_T^{-1}, where S_T denotes a consistent estimator of S(θ⁰), we have

T φ_T′(θ̂_T) S_T^{-1} φ_T(θ̂_T) →d χ²(k − p) .

The J-test is a "black box" that conceals the fact that the quality of identification is heterogeneous. The asymptotic singularity of (4.4) means that the actual rate of convergence of the GMM estimator may vary across linear combinations of the structural parameter vector θ. The following high-level assumption helps to characterize the relevant directions in the parameter space.

Assumption 2. The sequence of matrices MT such that assumption 1 is fulfilled can be

chosen as MT = RΛT , for some fixed non-singular matrix R and a sequence ΛT of diagonal

matrices.

As shown in Antoine and Renault (2010), from an initial sequence of matrices M_T that does not fulfill assumption 2, we can always build the diagonal coefficients of Λ_T as the singular values of the matrix M_T (i.e., as the square roots of the eigenvalues of M_T M_T′), while the matrix R is the limit of a sequence of orthogonal matrices of eigenvectors of M_T M_T′. Then, Theorem 4.1 characterizes the asymptotic normal distribution of √T Λ_T^{-1} R^{-1}(θ̂_T − θ⁰). In other words, when considering the reparametrization η ≡ R^{-1}θ, the j-th component of η̂_T ≡ R^{-1}θ̂_T is a consistent asymptotically normal estimator of η⁰_j (with η⁰ ≡ R^{-1}θ⁰) with rate of convergence √T/λ_{jT} (with λ_{jT} the j-th diagonal coefficient of Λ_T). Moreover,


Antoine and Renault (2010) have shown that this rate of convergence is not impacted by a preliminary consistent estimation of the matrix R, making this asymptotically normal estimation feasible. The characterization of the different rates of convergence through the diagonal coefficients of the matrix Λ_T may matter for interpretation, even though their knowledge is not necessary to run Wald inference from (4.3).
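The construction of Λ_T and R from an initial rate matrix M_T can be sketched with a symmetric two-dimensional example; the rates and the rotation angle below are purely illustrative:

```python
import numpy as np

# Sketch: recovering the diagonal rates Lambda_T and the rotation R from an
# initial (non-diagonal) rate matrix M_T via the eigendecomposition of M_T M_T'.
# The example mixes a strong direction (lambda_jT = 1) with a nearly weak one
# (lambda_jT = T^0.4); numbers are purely illustrative.

T = 10_000
D = np.diag([1.0, T**0.4])                  # heterogeneous identification strengths
U = np.array([[np.cos(0.3), -np.sin(0.3)],
              [np.sin(0.3),  np.cos(0.3)]])
M_T = U @ D @ U.T                           # initial, non-diagonal rate matrix

# Singular values of M_T = square roots of the eigenvalues of M_T M_T'
eigval, eigvec = np.linalg.eigh(M_T @ M_T.T)
Lambda_T = np.diag(np.sqrt(eigval))         # diagonal coefficients lambda_jT
R = eigvec                                  # orthogonal matrix of eigenvectors

# Here M_T is symmetric positive definite, so the spectral decomposition
# R Lambda_T R' reproduces it exactly (in general, M_T = R Lambda_T only up to
# an orthogonal factor).
print(np.allclose(R @ Lambda_T @ R.T, M_T))
```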

4.1.2 Identification of subvectors

When testing for identification strength, we will check whether our dataset allows us to reject the null hypothesis that some components of θ are (too) poorly identified. Throughout, ψ denotes a vector of p₁ components of θ, while π collects the p₂ (= p − p₁) remaining components of θ that are not included in ψ. For simplicity, ψ corresponds hereafter to the first p₁ components of θ, that is, θ = (ψ, π). We consider cases where the econometrician is concerned about the poor identification of π while prior knowledge warrants "sufficiently strong" identification of ψ.⁶

It is only when a sequence of matrices MT characterizing the identification strength of θ is block-diagonal,
\[
M_T = \begin{bmatrix} M_{\psi,T} & 0 \\ 0 & M_{\pi,T} \end{bmatrix} , \qquad (4.5)
\]
that we can deduce the identification strengths of ψ and π from the identification strength

of θ. A well-known example is the separable case (see Example 5.3 below) put forward by Stock and Wright (2000), where the two subsets of components of θ are disentangled as follows7:
\[
E\phi_T(\theta) = E\phi_{1T}(\psi) + \frac{1}{T^{\nu}}\, E\phi_{2T}(\theta) , \quad \text{with } 0 < \nu \leq 1/2 ,
\]
\[
\operatorname{Rank}\!\left( E\,\frac{\partial \phi_{1T}(\psi^0)}{\partial \psi'} \right) = p_1 \quad \text{and} \quad \operatorname{Rank}\!\left( E\,\frac{\partial \phi_{2T}(\theta^0)}{\partial \pi'} \right) = p_2 .
\]
MT can then be defined as in (4.5) with $M_{\psi,T} = I_{p_1}$ and $M_{\pi,T} = T^{\nu} I_{p_2}$. Our setting here is more general since the matrix Mψ,T may also go to infinity.

Up to a convenient reparameterization, the above block-diagonal structure of (4.5) is not

really restrictive. From assumption 2, we define the new vector of parameters, η ≡ R−1θ

6 Note that our procedure also accommodates testing the whole vector θ, and testing the subvector π without assuming that ψ is sufficiently strongly identified.
7 Strictly speaking, Stock and Wright (2000) only consider the limit case with ν = 1/2.


whose identification strength is characterized by the sequence of diagonal matrices ΛT .

Hence, the maintained assumption (4.5) implies that such a reparameterization is possible

with a block-diagonal matrix R. It makes sense to question the identification strength of π

while maintaining sufficiently strong identification on ψ, precisely because the two subvectors

are disentangled in the classification of directions as regards identification strength.

Our tests for identification strength provide some answers to the following question: taking

for granted that ψ is sufficiently strongly identified, do our data confirm the strong iden-

tification of a larger vector of unknown parameters, say (ψ, π)? Moreover, at the cost of

possibly more conservativeness, we may even be able to decide whether a parameter π is

sufficiently well identified (when the null of weak identification of π is rejected) while remaining agnostic about the identification of the other parameters ψ. Our null hypothesis is devised such that failing to reject it implies that standard inference (using the Gaussian asymptotic theory of section 4.1.1) may not be reliable when the parameter π is considered unknown. Faced with such negative evidence, two strategies are available:

(i) one resorts to inference procedures that are robust to weak identification. Of course,

robustness has a cost in terms of efficiency of estimators, power of tests, and maintained

assumptions regarding nuisance parameters;

(ii) following the common practice of calibration, one may fix the value of parameters in

π at pre-specified levels provided by other studies hoping that these calibrated values

are not too far from the unknown ones and will not contaminate inference on ψ. The

validity of this practice has been extensively studied by Dridi, Guay and Renault

(2007) who propose some encompassing tests for backtesting it.

In any case, both strategies will always maintain the assumption that, when π is fixed at

its true unknown value π0, the remaining moment problem is well-behaved.

Definition 4.1. (Sufficiently strong identification of a subvector)

With the block-diagonal structure (4.5) for the sequence of matrices MT and a true unknown

value π0 for π, ψ is sufficiently strongly identified if the two following conditions are fulfilled.

(i) Assumption 1 is fulfilled for the sequence of matrices Mψ,T in the context of the (infeasible) moment model
\[
E\left[\phi_t(\psi, \pi^0)\right] = 0 \quad \text{with } \psi \in \Theta(\pi^0) = \left\{ \psi \in \mathbb{R}^{p_1} ;\; (\psi, \pi^0) \in \Theta \right\} .
\]


(ii) For any GMM estimator θT and any sequence θ∗T such that ψ∗T = ψT and $\pi^*_T - \pi_T = o_P(T^{-1/4})$, we have
\[
\frac{\partial \phi_T(\theta^*_T)}{\partial \pi'}\, M_{\pi,T} = O_P(1) .
\]

When wondering whether a parameter vector θ = (ψ, π) that strictly nests ψ is sufficiently strongly identified, the key issue is to check that the convergence condition (4.2) holds for any sequence $\bar{\theta}_T$ between the true value θ0 and some GMM estimator θT, that is,
\[
\operatorname{Plim}\left[ \left( \frac{\partial \phi_T(\bar{\theta}_T)}{\partial \theta'} - \frac{\partial \phi_T(\theta^0)}{\partial \theta'} \right) M_T \right] = 0 . \qquad (4.6)
\]

The maintained assumption $\lim_T (M_T/\sqrt{T}) = 0$ may not be sufficient to get (4.6) in a general model, because a rate of convergence for θT strictly slower than √T may be unable to protect against the asymptotic blow-up of the sequence MT:
\[
\frac{\partial \phi_T(\bar{\theta}_T)}{\partial \theta'}\, M_T = \left[ \frac{\partial \phi_T(\bar{\theta}_T)}{\partial \psi'}\, M_{\psi,T} \;\vdots\; \frac{\partial \phi_T(\bar{\theta}_T)}{\partial \pi'}\, M_{\pi,T} \right] . \qquad (4.7)
\]

In the second block of (4.7), the derivative $\partial \phi_T(\bar{\theta}_T)/\partial \pi'$ usually depends on the estimator of the weakest parameters π. Since these parameters are only consistent at a rate ‖Mπ,T‖/√T, a Taylor expansion of $\partial \phi_T(\bar{\theta}_T)/\partial \pi'$ (assuming φ twice continuously differentiable) will allow us to prove (4.6) only if ‖Mπ,T‖²/√T goes to zero when T goes to infinity. In other words, the condition ensuring that the poor identification of π does not impair the identification of the whole vector θ is that ‖Mπ,T‖ = o(T^{1/4}), or equivalently, that the rate of convergence of all parameters in π (rate defined by the sequence of matrices Mπ,T/√T) is faster than T^{1/4}.
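The rate arithmetic behind this threshold can be checked numerically; the exponents below are illustrative choices of ours:

```python
# Minimal numerical check of the T**(1/4) threshold: if ||M_pi_T|| grows like
# T**a, the second-order Taylor term scales as ||M_pi_T||**2 / sqrt(T)
# = T**(2a - 1/2), which vanishes as T grows if and only if a < 1/4.
def second_order_scale(T, a):
    return (T**a) ** 2 / T**0.5

# a = 0.2 (below the threshold): the second-order term shrinks with T.
shrinks = second_order_scale(1e8, 0.2) < second_order_scale(1e4, 0.2)
# a = 0.3 (above the threshold): the second-order term blows up with T.
blows_up = second_order_scale(1e8, 0.3) > second_order_scale(1e4, 0.3)
```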

As already emphasized in Antoine and Renault (2012), this condition is quite similar in

spirit to Andrews’ (1994) study of MINPIN estimators, or estimators defined as MINimiz-

ing a criterion function that might depend on a Preliminary Infinite dimensional Nuisance

parameter estimator. Intuitively (and even without an infinite-dimensional issue), second-order terms in Taylor expansions (see the above discussion regarding a Taylor expansion of $\partial \phi_T(\bar{\theta}_T)/\partial \pi'$) have to remain negligible relative to first-order terms. Since this condition comes as a byproduct of second-order Taylor expansions, it explains why no threshold like T^{1/4} pops up in the linear case. Following Antoine and Renault (2009), this latter property will be dubbed near-strong identification of the associated linear combination.

The above issue is obviously easier to control for when the weakest parameters π are not

multiplied by the most explosive part of the sequence MT , namely Mπ,T : such cases have


been discussed in section 3, for example, when the moment conditions are linear with respect to π,
\[
\phi_t(\theta) = a_t(\psi) + B_t(\psi)\,\pi ,
\]
or when they are separable à la Stock and Wright (see their Assumption C, p. 1061),
\[
E\phi_T(\theta) = E\phi_{1T}(\theta_1) + \frac{1}{T^{\nu}}\, E\phi_{2T}(\theta) .
\]

4.2 Detecting poor identification in the general case

In this section, the null hypotheses under test concern the rate of convergence of πT, a subset of components of a given GMM estimator θT. Unlike the existing literature, we do not maintain any identification assumption on the remaining components of θT, namely ψT. To formulate a well-suited null hypothesis about the rate of convergence of πT, we follow the practice that has been dominant since Staiger and Stock (1997) and set up the null hypothesis with the worst-case scenario regarding the identification of π, that is, the rate of convergence of πT. The worst-case scenario of interest is actually that no linear combination of the parameters can be estimated at a rate sufficient to guarantee standard asymptotic theory. As already stressed in section 3, the worst-case scenario regarding the rate of convergence of πT in the linear or separable case is that this rate does not even diverge to infinity. We explain below that in the general case, the worst-case scenario is that the rate of convergence of πT is not greater than T^{1/4}.

For some real number ν ∈ ]0, 1/2], we consider the following null hypothesis.

Null hypothesis of identification of π at a rate no faster than T^{1/2−ν}:
\[
H_{0,\pi}(\nu) : \quad \frac{\partial \phi_T(\theta)}{\partial \pi'} = O_P\!\left( \frac{1}{T^{\nu}} \right) .
\]

The above null hypothesis encompasses the null hypothesis of weak identification of π (with

ν = 1/2) and the strong identification of ψ (with ν = 0) considered in section 3. In a general

(non-linear and non-separable) model, we will only be able to control the level of the test

when ν ≤ 1/4; see Corollary 4.3. As already mentioned, the case ν ≤ 1/4 is precisely the case of interest for non-linear, non-separable models. It means that, when the data allow us to reject the null hypothesis, we can conclude that the rate of convergence is faster than T^{1/2−ν}, that is, faster than T^{1/4}. It is exactly the case of ”near-strong identification” (see comments after (4.2)) when, following Antoine and Renault (2009, 2012), we can safely use


standard Gaussian asymptotic inference, based on studentization (Wald test statistics), to perform standard inference in spite of non-standard rates of convergence.

The key intuition for our proposed test of H0,π(ν) comes from a technical result (Lemma

A.1 in the appendix) that allows us to characterize the behavior of moment conditions

computed at a conveniently distorted value of the GMM estimator θT . Specifically, the

distorted estimator $\theta_T^{\delta_T}$ is defined as
\[
\theta_T^{\delta_T} = (\psi_T, \pi_T + \delta_T) \quad \text{with } \delta_T = \delta/a_T , \qquad (4.8)
\]
where this distortion δT depends on a deterministic sequence aT and a direction δ ∈ Rp2.

Our proposed test computes the J-test at the distorted estimator. We can show the following

result.

Corollary 4.3. (i) Under the null hypothesis H0,π(ν), ν ≤ 1/4, for any deterministic sequence aT such that aT/T^{1/2−ν} → ∞, we have, for any δ ∈ Rp2,
\[
\operatorname{Plim}\left[ J_T(\delta_T) - J_T(0) \right] = 0 .
\]
(ii) Assume that δ is drawn randomly according to some absolutely continuous probability distribution on Rp2. Then, under the alternative hypothesis to H0,π(ν), with convenient regularity conditions, there exists a deterministic sequence aT such that aT/T^{1/2−ν} → ∞ and, at least for a convenient subsequence,
\[
\operatorname{Plim}\left[ J_T(\delta_T) \right] = \infty . \qquad (4.9)
\]

Comments:

(i) When testing H0,π(ν), the level of our test is controlled whenever ν ≤ 1/4: the equiva-

lence result above between the usual J-test and the J-test evaluated at the distorted GMM

estimator allows us to bound the asymptotic distribution of our test statistic under the null.

In this respect, the test designed for a general model actually works similarly to the one for

a linear or separable model. Unfortunately, without the linearity or separability, the above

equivalence only holds for ν ≤ 1/4.

(ii) The convenient regularity conditions, made explicit in the appendix, are not really restrictive. They are implied in particular by the assumption that the moment conditions are affine w.r.t. π. The key intuition is that when $\lim_T \left\| (\sqrt{T}/a_T)\, M_{\pi,T}^{-1} \right\| = \infty$ as in Lemma A.1, we can be sure that $\lim_T \left\| (\sqrt{T}/a_T)\, M_{\pi,T}^{-1} \delta \right\| = \infty$ for generically all directions δ. Then, the result (4.9) follows by standard Taylor expansions, knowing that $\sqrt{T}\, M_T^{-1}(\theta_T - \theta^0) = O_P(1)$.


(iii) In the separable case (3.7), the above lemma holds for the null hypothesis H0,π(1/2) as

shown in the appendix.

• Testing procedure:

With general moment conditions that may not be separable with respect to the parameters under test π, we wonder whether some linear combinations of these parameters can be consistently estimated at a rate faster than T^{1/4}. In other words, we want to test the following null hypothesis (see section 4.1 above for a precise definition of H0,π(1/4)):

H0,π(1/4): No identification within π at a rate faster than T^{1/4}.

Our testing procedure in the general case is quite similar to the procedure described in the previous section. However, for clarity, the following highlights its key elements. We consider a sequence aT such that aT/T^{1/4} → ∞ and a deterministic vector δ to build a distorted version $\theta_T^{\delta_T}$ of the GMM estimator θT as in (4.8). Our asymptotically conservative test of H0,π(1/4) is then based on the corresponding J-test statistic computed at the distorted estimator, JT(δT).
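As an illustration, here is a stylized sketch of the procedure on simulated data; the DGP, the sample size, and the tuning choices aT = T^0.35 and δ = 1 are all ours, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized linear IV data (made-up design): y = alpha0 + Y*pi0 + eps with
# instruments Z and true theta0 = (alpha0, pi0) = (0, 0).
T, Lz = 500, 4
Z = rng.standard_normal((T, Lz))
U = rng.standard_normal(T)
eps = 0.5 * U + rng.standard_normal(T)            # endogeneity of Y
Y = Z @ np.full(Lz, 0.2) + U
y = eps                                           # alpha0 = pi0 = 0

X = np.column_stack([np.ones(T), Y])              # regressors (constant, Y)
W = np.column_stack([np.ones(T), Z])              # k = Lz + 1 moment conditions

def j_stat(theta):
    """J-statistic T * gbar' S^{-1} gbar for the moments w_t (y_t - x_t'theta)."""
    g = W * (y - X @ theta)[:, None]
    gbar = g.mean(axis=0)
    S = np.cov(g.T)                               # moment covariance estimate
    return float(T * gbar @ np.linalg.solve(S, gbar))

# 2SLS (efficient GMM under homoskedasticity) for theta = (alpha, pi)
P = W @ np.linalg.pinv(W.T @ W) @ W.T
theta_hat = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)

# Distort only pi, as in (4.8): (alpha_hat, pi_hat + delta/a_T), with a_T
# diverging faster than T**(1/4); a_T = T**0.35 is our arbitrary choice.
a_T, delta = T**0.35, 1.0
theta_dist = theta_hat + np.array([0.0, delta / a_T])

# Conservative critical value: chi-square 0.95-quantile, k = Lz + 1 = 5 df.
reject_H0 = j_stat(theta_dist) > 11.07            # reject -> near-strong id of pi
```

Rejection (JT(δT) above the conservative chi-square critical value) supports near-strong identification of π; failure to reject warns that standard Gaussian inference on π may not be reliable.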

Theorem 4.4. (Test of poor identification of π in the general case)
For an arbitrary choice of a deterministic sequence aT such that aT/T^{1/4} → ∞ and of a vector δ ∈ Rp2, we define two asymptotic tests with respective critical regions $W_T^{a,\delta}$ and $\overline{W}_T^{a,\delta}$,
\[
W_T^{a,\delta} = \left\{ J_T(\delta_T) > \chi^2_{1-\alpha}(k) \right\} \quad \text{and} \quad \overline{W}_T^{a,\delta} = \left\{ J_T(\delta_T) > \chi^2_{1-\alpha}(k - p_1) \right\} ,
\]
where $\chi^2_{1-\alpha}(d)$ is the (1−α)-quantile of the chi-square distribution with d degrees of freedom.

(i) Under assumptions 1c), d) and 2, the test $W_T^{a,\delta}$ is asymptotically conservative at level α for the null hypothesis H0,π(1/4). The test $W_T^{a,\delta}$ is consistent against any alternative that makes the choice (aT, δ) conformable to (4.9).

(ii) If we assume in addition that ψ is sufficiently strongly identified, the test $\overline{W}_T^{a,\delta}$ is asymptotically conservative for the same null hypothesis H0,π(1/4), albeit less conservative than $W_T^{a,\delta}$. The test $\overline{W}_T^{a,\delta}$ is consistent against any alternative that makes the choice (aT, δ) conformable to (4.9).

It is worth realizing that the same comments apply to Theorems 3.5 and 4.4 regarding

the use of intermediate critical values depending on the number of parameters for which

sufficiently strong identification is granted, and the wide range of alternatives against which

consistency is warranted.


5 Additional examples

Our general framework accommodates two classes of models that may be subject to iden-

tification issues. First, as in Andrews and Cheng (2012), the parameter π of interest may

be identified if and only if another parameter β is not equal to zero. ζ denotes additional

parameters not related to the identification of π, and ψ = (β, ζ) collects parameters that

are always well-identified. In this case, we have

θ = (ψ, π) = (β, ζ, π) , (5.1)

and the moment function φt(θ) does not depend on π when β = 0.

Second, in order to include the weak IV literature that is not nested in the above framework,

we also consider cases where the source of weak identification cannot be tied to the value

of a structural parameter but to some auxiliary parameter β instead. In this case, ψ = ζ is

always well-identified and

θ = (ψ, π) = (ζ, π) . (5.2)

As shown in the examples below, the weak IV literature can be included when considering β

as a reduced form parameter, either explicitly introduced (as in Staiger and Stock (1997)),

or implicitly at stake (as in Stock and Wright (2000)). In all cases, we end up with a

classification of identification categories consistent with the one proposed by Andrews and

Cheng (2012) (see their Table 1 p2164), but extended to encompass the weak IV literature.

In particular, the case β = 0 always corresponds to lack of identification for the parameter

π, while the weak identification case is captured by a sequence of drifting DGPs such that

\[
\|\beta\| = O\!\left( \frac{1}{\sqrt{T}} \right) . \qquad (5.3)
\]

Example 5.1. (Andrews and Cheng, 2012).

In the context of Andrews and Cheng (2012), we have

\[
\frac{\partial \phi_T(\beta, \zeta, \pi)}{\partial \pi'} = \frac{\partial \phi_T(0, \zeta, \pi)}{\partial \pi'} + \frac{\partial^2 \phi_T(0, \zeta, \pi)}{\partial \pi \partial \beta'}\, \beta + o_P(\|\beta\|) , \qquad (5.4)
\]
and, since φT(θ) does not depend on π when β = 0,
\[
\frac{\partial \phi_T(0, \zeta, \pi)}{\partial \pi'} = 0 .
\]


From these equations, it is clear that our null hypothesis H0,π(WI) is always fulfilled in

the case (5.3) dubbed weak identification by Andrews and Cheng (2012). A consistent test

of our null hypothesis H0,π(WI) of weak identification of π will a fortiori be a consistent

test of weak identification as defined (through β) by Andrews and Cheng (2012). In this

respect, our setting encompasses the setting of Andrews and Cheng (2012), at least when

the criterion function QT (θ) that they minimize over the parameter space Θ is a sample

average that is smooth in θ. In this case, our estimating function φT (θ) must be interpreted

as the p-dimensional vector (p = k) of first-order derivatives of QT (θ). Andrews and Cheng

(2012) consider two leading examples: a nonlinear regression model, and an ARMA (1,1)

model estimated by QML. In the former case, weak identification comes from the fact that

some nonlinear function of π (and observations) is weighted by β while in the latter, it

comes from the fact that the roots of the AR and the MA lag-polynomials coincide when

β = 0.

Example 5.2. (Dovonon and Renault, 2013).

Dovonon and Renault (2013) consider moment conditions:

E [φt(β, ζ, π)] = 0

such that for all (ζ, π):

E

[

∂φt(0, ζ, π)

∂π′

]

= 0

However, by contrast with the framework of Andrews and Cheng (2012), the estimating

equations always depend on π, even when β = 0. The first derivative with respect to π is

actually zero when β = 0 and the derivative is computed at the true unknown value of the

parameters of interest (ζ, π). The parameter π suffers from first-order underidentification when β = 0 but, still, it is identified through higher-order terms in Taylor expansions.

The first-order underidentification issue was first addressed in the IV literature by Sargan (1983), while Lee and Chesher (1986) and Rotnitzky, Bottai, Cox, and Robins (2000) had to deal with it for MLE inference about the selectivity bias in a TOBIT-type model.

Recently, Chen, Chernozhukov, Lee, and Newey (2014) have also proposed a general (non-

parametric) treatment of these issues. In all these cases, the structural model of interest is

actually defined by the constraint β = 0, while the unconstrained model is not even consid-

ered. By contrast, Dovonon and Renault (2013) set the focus on the J-overidentification test


of this model. In this context, the condition (5.3) of weak identification à la Andrews and Cheng (2012) would rather be interpreted as a condition of local misspecification. Such local misspecification is especially meaningful when one is interested in a parsimonious model that may only be approximately true.

As far as our null hypothesis H0,π(WI) of weak identification is concerned, by revisiting the

decomposition (5.4) through its probability limit, one can easily see that, once again, the

weak identification condition (5.3) implies our null hypothesis. When the econometrician

rejects our null hypothesis H0,π(WI), she actually rejects the possibility of (first-order) local

misspecification in her model. This specification test can be seen as an alternative to the

specification test of Dovonon and Renault (2013).

Example 5.3. (Stock and Wright, 2000).

In this case, we have a DGP that is drifting through the value of an auxiliary parameter β

such that

E [φt(θ)] = E [φ1t(ζ)] + βE [φ2t(ζ, π)] .

The parameter ζ is well-identified by the estimating equations,

E [φ1t(ζ)] = 0 .

However, the parameter π is obviously not identified when β = 0. Our weak identification

hypothesis H0,π(WI) is implied by the condition (5.3) as in Andrews and Cheng (2012). To

see that, simply note that as soon as the moment conditions of interest fulfill a weak law of

large numbers, we have
\[
\lim \left[ \sqrt{T}\beta \right] = b \;\Rightarrow\; \operatorname{Plim}\left[ \sqrt{T}\, \frac{\partial \phi_T(\theta)}{\partial \pi'} \right] = \lim \sqrt{T}\, E\left[ \frac{\partial \phi_t(\theta)}{\partial \pi'} \right] = b\, E\left[ \frac{\partial \phi_{2t}(\theta)}{\partial \pi'} \right] .
\]

Note also that the case where all structural parameters are well-identified can easily be

nested after considering θ = ζ .


6 Empirical illustrations

6.1 Linear IV regression model

Consider the following linear IV regression model (see also section 3.2),

\[
y_t = \alpha_0 + Y_t \pi_0 + h(Z_t)\,\varepsilon_t , \qquad (6.1)
\]
\[
Y_t = Z_t'\beta + U_t ,
\]

where Yt is a univariate endogenous regressor, while Zt is a vector of Lz (exogenous) in-

strumental variables that follows a standard normal distribution. (εt, Ut) is normally dis-

tributed and independent of Zt. We set θ0 = (α0 π0)′ = (0 0)′. We consider two versions

of the model: a homoskedastic model with $h(z) = 1$ and a heteroskedastic model with $h(z) = \sqrt{(1 + (e'z)^2)/(L_z + 1)}$, where e is the vector of ones of size Lz. In both models,

(h(Zt)εt, Ut) has mean 0, unit unconditional variances, and unconditional correlation ρ; β

is proportional to the vector e. The intercept parameter is always strongly identified, while

the slope parameter is more or less weakly identified depending on the value of β ′β.
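A minimal sketch of this design, with T, ρ, and the correlation construction chosen by us, together with the first-stage F-statistic that underlies the Staiger-Stock rule of thumb (instruments deemed weak when F < 10):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the Monte-Carlo design (6.1); T and rho are our illustrative choices.
T, Lz, rho = 1000, 4, 0.5
bb = 0.163                                         # beta'beta ("stronger identification")
e = np.ones(Lz)
beta = np.sqrt(bb / Lz) * e                        # beta proportional to e

Z = rng.standard_normal((T, Lz))
U = rng.standard_normal(T)
# corr(eps, U) = rho; the paper calibrates the unconditional correlation of
# (h(Z)*eps, U), which this simple construction only approximates.
eps = rho * U + np.sqrt(1 - rho**2) * rng.standard_normal(T)
h = np.sqrt((1 + (Z @ e)**2) / (Lz + 1))           # heteroskedastic scale h(Z_t)
Y = Z @ beta + U
y = 0.0 + 0.0 * Y + h * eps                        # theta0 = (alpha0, pi0) = (0, 0)

# First-stage F-statistic for the joint significance of Z in the regression
# of Y on (1, Z): the quantity behind the Staiger-Stock rule of thumb.
X1 = np.column_stack([np.ones(T), Z])
b1, *_ = np.linalg.lstsq(X1, Y, rcond=None)
rss1 = np.sum((Y - X1 @ b1)**2)
rss0 = np.sum((Y - Y.mean())**2)
F = ((rss0 - rss1) / Lz) / (rss1 / (T - Lz - 1))
weak_by_rule_of_thumb = F < 10
```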

In this experiment, we compare the properties of three pretests: (i) our pretest introduced

in section 3; (ii) Staiger and Stock’s (1997) rule of thumb; (iii) Stock and Yogo’s (2005)

pretest based on 10% bias of 2SLS8. Results for the homoskedastic model with β ′β =

0.163 (stronger identification), 0.010 (weaker identification), and 0 (no identification) are

respectively collected in Tables 1 to 3. Corresponding results for the heteroskedastic model

are collected in Tables 4 to 6.

Each table contains three panels. Panel A provides the first four moments of the Monte-

Carlo distribution of the standardized GMM estimator. The reader can then assess how far

the Monte-Carlo distribution is from its standard asymptotic approximation: recall that for

the standard normal distribution, we expect mean 0, variance 1, skewness 0 and kurtosis

3. Panel B collects results for detecting weak identification of the whole parameter vector

θ jointly. We consider four pretests: two versions of our test, respectively robust and not robust to heteroskedasticity through the choice of the estimator of the covariance matrix of the moment restrictions, hereafter AR (robust) and AR (not robust); the rule of thumb of Staiger and Stock, hereafter SS; the Stock and Yogo pretest based on 10% bias of 2SLS, hereafter SY. For each pretest, we provide the

rejection probabilities. Finally, Panel C collects results for detecting weak identification of a

8This is the version of the test proposed by Stock and Yogo which is commonly used. Results for alternate

versions of their test based on 5% bias, as well as 10% and 15% size distortion are available upon request.


subvector using our test: specifically, testing for weak identification of the intercept without

assuming that the slope is sufficiently strongly identified; testing for weak identification of

the slope under the assumption that the intercept is sufficiently strongly identified. Several

comments are in order.

(1) The distribution of the standardized GMM estimator displayed in Panel A confirms

that the asymptotic approximation always works well for the intercept parameter (which is

always well-identified), whereas the approximation worsens for the slope parameter as the

identification issues become more acute for small values of β ′β.

(2) AR robust vs AR not robust. When comparing the performances of the two versions

of our test, it is clear that a lot of power is lost by the robust version of our test. This is

especially striking for the test on the whole vector (intercept and slope jointly). However,

when only testing the intercept, both versions of our test reject weak identification with

probability 1 in all cases, as expected.

To conclude, we recommend testing the identification strength of specific components rather than of the whole vector whenever possible, in order to obtain good power properties.

6.2 Estimating the EIS

In this section, we apply our testing procedures to the motivating example discussed in

section 2.2, the instrumental variables estimation of the Elasticity of Intertemporal Substi-

tution (EIS). Recall the linearized Euler equation in two standard IV frameworks:

\[
\Delta c_{t+1} = \nu + \psi r_{t+1} + u_{t+1} ,
\]
\[
r_{t+1} = \xi + \frac{1}{\psi}\, \Delta c_{t+1} + \eta_{t+1} ,
\]

where ψ is the EIS, ∆ct+1 the consumption growth at time (t + 1), rt+1 a real asset return

at time (t + 1), ν and ξ two constants. The vector of (valid) instruments is denoted by Zt.

Following Yogo (2004), we use the real return on the short-term interest rate for rt, and

two lags of nominal interest rate, inflation, consumption growth, and log dividend-price ratio as instruments. The quarterly data for 11 countries can be found on Yogo's webpage

at https://sites.google.com/site/motohiroyogo/. Our empirical study replicates par-

tially the studies by Yogo (2004) and Montiel-Olea and Pflueger (2013) who have found that

the EIS point estimates are small and close to zero.
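For concreteness, a bare-bones 2SLS sketch of the first specification; the helper function and the simulated stand-in data are ours (the actual study uses Yogo's quarterly data):

```python
import numpy as np

def tsls(y, x, Z):
    """Two-stage least squares for y = c + x*b + u with instruments Z (sketch)."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])               # (constant, endogenous x)
    W = np.column_stack([np.ones(T), Z])               # instrument matrix
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]   # first-stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage: (c, b)

# Stand-in data mimicking the setup: consumption growth regressed on the real
# interest rate with 8 instruments (two lags of four variables), all simulated.
rng = np.random.default_rng(2)
T = 200
Z = rng.standard_normal((T, 8))
r = Z @ np.full(8, 0.1) + rng.standard_normal(T)       # asset return (made up)
dc = 0.1 * r + rng.standard_normal(T)                  # consumption growth (made up)
nu_hat, psi_hat = tsls(dc, r, Z)                       # psi_hat estimates the EIS
```

Only the coefficients are recovered here; standard errors from the naive second stage would be invalid, which is precisely why the weak-identification pretests discussed in the text matter.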

Table 7 reports pretests for weak instruments for 11 countries and compares them to Table 1 in Yogo (2004) and Table 2 in Montiel-Olea and Pflueger (2013). Panel A pretests for weak


identification with the ex-post real interest rate as the endogenous variable, while Panel B

pretests for weak identification with the consumption growth as the endogenous variable.

We compare seven pretests: the Stock and Yogo test based on 10% bias of 2SLS, three versions of the test proposed by Montiel-Olea and Pflueger, and three versions of the test proposed

in this paper. Montiel-Olea and Pflueger propose two testing procedures, simplified and

generalized, that rely on the same test statistic and adjust the critical values. Accordingly,

we consider the three associated pretests with simplified critical values csimp, and generalized

critical values, cTSLS and cLIML. We also report the results for three versions of our pretest:

pretesting the intercept and the slope jointly, pretesting the intercept only, and pretesting

the slope only.

As already mentioned, the intercept (either ν or ξ) is always strongly identified, while only

the slope (either ψ or 1/ψ) may be weakly identified. This distinction cannot be captured

by joint tests such as Stock and Yogo’s or Montiel-Olea and Pflueger’s. Our pretest for the

intercept always conclude that it is strongly identified. Our pretest for the slope always

conclude that it is weakly identified. Our joint test is associated with mixed results: it

rejects weak identification for 5 countries in Panel A (out of 11) and 2 countries in Panel B.

The pretests of Stock and Yogo and Montiel-Olea and Pflueger also lead to mixed results.

However, it is worth mentioning that none of these pretests can reject weak identification in

Panel B. This corresponds to the results of our test for the slope only. Finally, our test is the

only one to find matching results for every country under both specifications: the intercept

parameter is always found strongly identified, whereas the slope parameter is always found

weakly identified.

To conclude, and in line with the Monte-Carlo results, we recommend using our subvector pretest rather than the joint pretest: as shown in Panels A and B, the subvector pretest always yields consistent conclusions about identification strength regardless of the formulation of the model.

7 Conclusion

This paper introduces a new test for identification strength based on sensitivity analysis of

a GMM estimator. Interestingly enough, in linear homoskedastic cases, this test is asymptotically equivalent to a properly tuned rule of thumb based on an F-test performed on the reduced form. Moreover, by contrast with the naive F-test, even when robustified for heteroskedasticity, our GMM approach provides the right metric for assessing the discrepancy


from the null hypothesis.

Our null hypothesis is defined by the fact that identification would be too weak for valid

asymptotic inference. This concern for valid inference implies that we define the null hypothesis by a rate of convergence, while the relevant rate may differ in linear and nonlinear

contexts. The curvature of the model with respect to the possibly weakly identified param-

eters may make their slow convergence even more detrimental. The relevant rates govern

the rate of distortion of GMM estimators that is performed in order to detect weak identifi-

cation. This careful definition is the price to pay to control the asymptotic behavior of tests

and estimators through the setup of drifting DGPs. In addition, we provide a consistent

cross-validation procedure for the choice of the relevant tuning parameters for the distortion

of the GMM estimators. The important advantage of having defined the null hypothesis by an ”excessive level of weak identification” is that, once compelling evidence has been gathered by the practitioner to reject the null, she can safely perform standard GMM inference.

It is also worth stressing that our procedure can be understood as a general rule of thumb, independently of the asymptotic theory based on drifting DGPs. After all, it is natural to conclude that our GMM problem is well behaved when we note that the overidentification test statistic is sufficiently sensitive to a well-tuned distortion of the GMM estimator.

Moreover, other contexts of weak identification, like first-order underidentification studied

by Dovonon and Renault (2013), are also characterized by slower rates of convergence, albeit

not based on the arguably non-orthodox argument of drifting DGPs.

Even more generally, the impossibility theorems devised by Dufour (1997) perfectly charac-

terize the accuracy delusion that can be produced by overlooked identification issues. Our

strategy to distort the GMM estimator sounds like a relevant answer to this risk. And, at

the end of the day, if distorting the GMM estimator does not have a significant impact on

the overidentification test statistic, the practitioner should be worried that inference may

not be as accurate as it was supposed to be.

References

[1] D.W.K. Andrews, Asymptotics for Semiparametric Econometric Models via Stochastic

Equicontinuity, Econometrica 62 (1994), 43–72.

[2] D.W.K. Andrews and X. Cheng, Estimation and Inference with Weak, Semi-strong,

and Strong Identification, Econometrica 80 (2012), 2153–2211.


[3] D.W.K. Andrews and J.H. Stock, Inference with Weak Instruments, Econometric So-

ciety Monograph Series, vol. 3, ch. 8 in Advances in Economics and Econometrics,

Theory and Applications: Ninth World Congress of the Econometric Society, Cam-

bridge University Press, Cambridge, 2007.

[4] I. Andrews, Robust Two-Step Confidence Sets, and the Trouble with the First-stage

F-statistic, Working paper (2015).

[5] B. Antoine and E. Renault, Efficient GMM with Nearly-weak Instruments, The Econo-

metrics Journal 12 (2009), 135–171.

[6] , Efficient Inference with Poor Instruments: a General Framework, ch. 2 in

Handbook of Empirical Economics and Finance, pp. 29–70, edited by A. Ullah and

D.E. Giles, Taylor & Francis, 2010.

[7] , Efficient Minimum Distance Estimation with Multiple Rates of Convergence,

Journal of Econometrics (2012), 350–367.

[8] , On the Relevance of Weaker Instruments, Forthcoming Econometric Reviews

(2016).

[9] F. Bravo, J.C. Escanciano, and T. Otsu, A simple test for identification in GMM under

conditional moment restrictions, Advances in Econometrics 29 (2012), 455–477.

[10] J.Y. Campbell, Consumption-based Asset Pricing, in Hanbook of Economics and Fi-

nance 13 (2003), 803–887.

[11] J.Y. Campbell and L.M Viceira, Consumption and Portfolio Decisions When Expected

Returns Are Time Varying, Quarterly Journal of Economics 114 (1999), 433–495.

[12] , Strategic Asset Allocation: Portfolio Choice for Long-Term Investors, Claren-

don Lectures in Economics (New York: Oxford University Press, 2002.

[13] M. Caner, Testing, Estimation in GMM and CUE with Nearly-Weak Identification,

Econometric Reviews 29 (2010), 330–363.

[14] S. Chaudhuri and E. Zivot, A new method of projection-based inference in GMM with

weakly identified nuisance parameters, Journal of Econometrics 164 (2011), 239–

251.

[15] X. Chen, V. Chernozhukov, S. Lee, and W. Newey, Local Identification of Nonparamet-

ric and Semiparametric Models, Econometrica 82 (2014), 785–809.

41

Page 43: Department of Economics Working Papersciano and Otsu (2012). In our Monte-Carlo study, we consider a linear IV regression model (with or without conditional heteroskedasticity) and

[16] P. Dovonon and E. Renault, Testing for Common Conditionally Heteroskedastic Fac-

tors, Econometrica 81 (2013), 2561–2586.

[17] R. Dridi, A. Guay, and E. Renault, Indirect Inference and Calibration of Dynamic

Stochastic General Equilibrium Models, Journal of Econometrics 136 (2007), 397–

430.

[18] J.-M. Dufour, Some Impossibility Theorems in Econometrics with Applications to Struc-

tural and Dynamic Models, Econometrica 65 (1997), 1365–1388.

[19] L.G. Epstein and S.E. Zin, Substitution, Risk Aversion, and the Temporal Behavior

of Consumption and Asset Returns: A Theoretical Framework, Econometrica 57

(1989), 937–969.

[20] , Substitution, Risk Aversion, and the Temporal Behavior of Consumption and

Asset Returns: An Empirical Analysis, Journal of Political Economy 99 (1991),

263–286.

[21] P. Guggenberger, F. Kleibergen, S. Mavroeidis, and L. Chen, On the Asymptotic Sizes

of Subset Anderson-Rubin and Lagrange Multiplier Tests in Linear Instrumental

Variables Regression, Econometrica 80 (2012), 2649–2666.

[22] P. Guggenberger and R.J. Smith, Generalized empirical likelihood estimators and tests

under partial, weak and strong identification., Econometric Theory 21 (2005), 667–

709.

[23] J. Hahn and J. Hausman, A new Specification Test for the Validity of Instrumental

Variables, Econometrica 70 (2002), 163–190.

[24] , Weak Instruments: Diagnosis and Cures in Empirical Econometrics, American

Economic Review 93 (2003), 118–125.

[25] J. Hahn and G. Kuersteiner, Discontinuities of Weak Instruments limiting Distribu-

tions, Economics Letters 75 (2002), 325–331.

[26] C. Hansen, J. Hausman, and W. Newey, Estimation with Many Instrumental Variables,

Journal of Business and Economic Statistics 26 (2008), 398–422.

[27] L.P. Hansen, Large Sample Properties of Generalized Method of Moments Estimators,

Econometrica 50 (1982), no. 4, 1029–1054.

[28] L.P. Hansen and K.J. Singleton, Generalized Instrumental Variables Estimation of Non-

linear Rational Expectations Models, Econometrica 50 (1982), 1269–1286.

42

Page 44: Department of Economics Working Papersciano and Otsu (2012). In our Monte-Carlo study, we consider a linear IV regression model (with or without conditional heteroskedasticity) and

[29] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, 1985.

[30] A. Inoue and B. Rossi, ”Testing for Weak Identification in Possibly Non-linear Models”,

Journal of Econometrics 161 (2011), 246–261.

[31] F. Kleibergen, Testing Parameters in GMM without assuming that they are identified,

Econometrica 73 (2005), 1103–1123.

[32] L.F. Lee and A. Chesher, Specification Testing when Score Test Statistics are Identically

Zero, Journal of Econometrics 31 (1986), 121–149.

[33] A. McCloskey, Bonferroni-Based Size-Correction for Nonstandard Testing Problems,

Working paper, Brown University (2015).

[34] J.L. Montiel-Olea and C. Pflueger, A Robust Test for Weak Instruments, Journal of

Business and Economics Statistics 31 (2013), 358–369.

[35] M.J. Moreira, A Conditional Likelihood Ratio Test for Structural Models, Econometrica

71 (2003), 1027–1048.

[36] W.K. Newey and F. Windmeijer, GMM with Many Weak Moment Conditions, Econo-

metrica 77 (2009), 687–719.

[37] A. Rotnitzky, D. R. Cox, M. Bottai, and J. Robins, Likelihood-based Inference with

Singular Information Matrix, Bernoulli 6 (2000), 243–284.

[38] J.D. Sargan, Identification and lack of Identification, Econometrica 51 (1983), 1605–

1633.

[39] D. Staiger and J. Stock, Instrumental Variables Regression with Weak instruments,

Econometrica 65 (1997), 557–586.

[40] J.H. Stock and J.H. Wright, GMM with Weak Identification, Econometrica 68 (2000),

no. 5, 1055–1096.

[41] J.H. Stock and M. Yogo, Asymptotic Distributions of Instrumental Variables Statis-

tics With Many Instruments, ch. 8 in Identification and Inference for Econometric

Models: Essays in Honor of Thomas Rothenberg, pp. 109–120, edited by D.W.K.

Andrews and J.H. Stock, Cambridge University Press, Cambridge, U.K., 2005.

[42] M.D. Woodford, Interest and Prices: Foundations of a Theory of Monetary Policy,

Princeton University Press, 2003.

43

Page 45: Department of Economics Working Papersciano and Otsu (2012). In our Monte-Carlo study, we consider a linear IV regression model (with or without conditional heteroskedasticity) and

[43] M. Yogo, Estimating the Elasticity of Intertemporal Substitution When Instruments are

Weak, The Review of Economics and Statistics 86 (2004), 797–810.

A Proofs of the main results

Proof of Propositions 3.2, 3.3 and 3.4:

(i) Asymptotic behavior of the $F_T$ statistic:

By assumption:
$$\sqrt{T}\,\Sigma^{-1/2}(\hat\beta - \beta) \xrightarrow{d} N(0, I).$$
Under the null of weak identification:
$$\lim_T \beta\sqrt{T} = c \;\Rightarrow\; \sqrt{T}\,\Sigma^{-1/2}\hat\beta \xrightarrow{d} N\left(\Sigma^{-1/2}c,\, I\right),$$
and thus:
$$F_T = \frac{1}{k}\left\|\sqrt{T}\,\Sigma^{-1/2}\hat\beta\right\|^2 \xrightarrow{d} \frac{\chi^2(k,\lambda)}{k}, \quad \text{with } \lambda = c'\Sigma^{-1}c.$$
Under the alternative hypothesis:
$$\sqrt{T}\,\Sigma^{-1/2}\hat\beta = \sqrt{T}\,\Sigma^{-1/2}\beta + \Sigma^{-1/2}\left(\frac{Z'Z}{T}\right)^{-1}\frac{Z'V}{\sqrt{T}} = \sqrt{T}\,\Sigma^{-1/2}\beta + O_P(1).$$
Hence:
$$\frac{kF_T}{T} = \beta'\Sigma^{-1}\beta + O_P(1/\sqrt{T}).$$
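The limiting behavior of $F_T$ under the weak-identification drift $\beta = c/\sqrt{T}$ can be checked by simulation. The sketch below is not the paper's code; all constants are illustrative choices, with $Q_Z = I$ and $\sigma_v = 1$ so that $\Sigma = I$ and $\lambda = c'c$:

```python
import numpy as np

# Monte-Carlo sketch: under the drifting DGP beta = c/sqrt(T), the rescaled
# first-stage statistic k*F_T should be close in distribution to a non-central
# chi-square chi2(k, lambda), whose mean is k + lambda.
rng = np.random.default_rng(0)
T, k, M = 500, 3, 2000                 # sample size, instruments, Monte-Carlo draws
c = np.array([2.0, 1.0, 0.5])          # local-to-zero parameter (illustrative)
beta = c / np.sqrt(T)                  # drifting first-stage coefficient
lam = c @ c                            # non-centrality c' Sigma^{-1} c = c'c here

kF = np.empty(M)
for m in range(M):
    Z = rng.standard_normal((T, k))    # instruments with E[z z'] = I
    v = rng.standard_normal(T)         # homoskedastic first-stage error
    x = Z @ beta + v                   # first-stage regression
    b = np.linalg.solve(Z.T @ Z, Z.T @ x)
    s2 = np.sum((x - Z @ b) ** 2) / (T - k)
    kF[m] = b @ (Z.T @ Z) @ b / s2     # k * F_T = T * b' Sigma_hat^{-1} b

print(kF.mean())                       # should be close to k + lam = 8.25
```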

(ii) Asymptotic behavior of the $J_T(\delta)$ statistic:

$$J_T(\delta) = T\,\phi_T'(\hat\theta_T + \delta)\,S_T^{-1}\,\phi_T(\hat\theta_T + \delta)$$
with:
$$\phi_T(\hat\theta_T + \delta) = \frac{1}{T}Z'(y - Y\hat\pi_T - Y\delta) = \frac{1}{T}Z'u + \frac{1}{T}Z'Y(\pi - \hat\pi_T) - \frac{1}{T}Z'Y\delta$$
$$= \frac{1}{T}Z'u + \frac{1}{T}Z'Z\beta(\pi - \hat\pi_T) + \frac{1}{T}Z'v(\pi - \hat\pi_T) - \frac{1}{T}Z'Z\beta\delta - \frac{1}{T}Z'v\beta\delta.$$
Under the null of weak identification:
$$\lim_T \beta\sqrt{T} = c \;\Rightarrow\; \sqrt{T}\phi_T(\hat\theta_T + \delta) = \frac{Z'u}{\sqrt{T}} + c\,\frac{Z'Z}{T}\left[(\pi - \hat\pi_T) - \delta\right] + \frac{Z'v}{\sqrt{T}}\left[(\pi - \hat\pi_T) - c\delta\right] + o_P(1).$$
Under weak identification, $(\pi - \hat\pi_T) = O_P(1)$. Consequently, none of the first three terms on the right-hand side can be neglected in general. Therefore, we can only state that:
$$J_T(\delta) = T\,\phi_T'(\hat\theta_T + \delta)\,S_T^{-1}\,\phi_T(\hat\theta_T + \delta) = O_P(1),$$
with a non-standard asymptotic distribution.

Under the alternative hypothesis:
$$\sqrt{T}\phi_T(\hat\theta_T + \delta) = \frac{Z'u}{\sqrt{T}} + \frac{1}{T}Z'Z\beta\sqrt{T}(\pi - \hat\pi_T) + \frac{Z'v}{\sqrt{T}}(\pi - \hat\pi_T) - \frac{1}{T}Z'Z\beta\delta\sqrt{T} - \frac{Z'v}{\sqrt{T}}\beta\delta$$
$$= -\left(\frac{Z'Z}{T}\right)\beta\delta\sqrt{T} + O_P(1).$$
Hence:
$$\frac{J_T(\delta)}{T} = \delta^2\,\beta' Q_Z S^{-1} Q_Z\,\beta + O_P(1/\sqrt{T}),$$
where $Q_Z$ stands for the positive definite matrix $\operatorname{Plim}[Z'Z/T]$.

(iii) Comparison of the $F_T$ and $J_T(\delta)$ statistics:

Let us first do it under the null hypothesis of weak identification.
$$J_T(\delta) = T\,\phi_T'(\hat\theta_T + \delta)\,S_T^{-1}\,\phi_T(\hat\theta_T + \delta)
= \left[\sqrt{T}\phi_T(\hat\theta_T) - \frac{Z'Y}{\sqrt{T}}\delta\right]' S_T^{-1}\left[\sqrt{T}\phi_T(\hat\theta_T) - \frac{Z'Y}{\sqrt{T}}\delta\right]$$
$$= \|A_T - B_T\|^2_{S_T} \leq \|A_T\|^2_{S_T} + \|B_T\|^2_{S_T} + 2\,\|A_T\|_{S_T}\|B_T\|_{S_T},$$
with the general notation:
$$\|A\|^2_\Omega = A'\Omega^{-1}A.$$
We have:
$$\|A_T\|^2_{S_T} = T\,\phi_T'(\hat\theta_T)\,S_T^{-1}\,\phi_T(\hat\theta_T), \qquad
\|B_T\|^2_{S_T} = \delta^2\left(\frac{Z'Y}{\sqrt{T}}\right)' S_T^{-1}\left(\frac{Z'Y}{\sqrt{T}}\right).$$
We have seen in the main text that $\|A_T\|^2_{S_T}$ behaves asymptotically like $\|A_T^*\|^2_{S_T} \xrightarrow{d} \chi^2(k)$.

Moreover:
$$\lim_T \beta\sqrt{T} = c \;\Rightarrow\; \|B_T\|^2_{S_T} = \delta^2\left(\frac{Z'Z}{T}c + \frac{Z'v}{\sqrt{T}}\right)' S_T^{-1}\left(\frac{Z'Z}{T}c + \frac{Z'v}{\sqrt{T}}\right) + o_P(1)$$
$$= \delta^2\left(Q_Z c + \frac{Z'v}{\sqrt{T}}\right)' S^{-1}\left(Q_Z c + \frac{Z'v}{\sqrt{T}}\right) + o_P(1).$$
The asymptotic distribution of this variable will in general be a complicated mixture of non-central chi-squares, except if we maintain the simplifying assumption of conditional homoskedasticity. In this case, $S = \sigma_u^2 Q_Z$ with $\sigma_u^2 = E(u_t^2)$, and
$$\|B_T\|^2_{S_T} = \frac{\delta^2}{\sigma_u^2}\left(Q_Z^{1/2}c + Q_Z^{-1/2}\frac{Z'v}{\sqrt{T}}\right)'\left(Q_Z^{1/2}c + Q_Z^{-1/2}\frac{Z'v}{\sqrt{T}}\right)
= \delta^2\frac{\sigma_v^2}{\sigma_u^2}\left(\frac{1}{\sigma_v}Q_Z^{1/2}c + \zeta_T\right)'\left(\frac{1}{\sigma_v}Q_Z^{1/2}c + \zeta_T\right),$$
where the vector $\zeta_T$ is asymptotically standard normal. Therefore,
$$\|B_T\|^2_{S_T} \xrightarrow{d} \delta^2\frac{\sigma_v^2}{\sigma_u^2}\chi^2(k,\lambda), \quad \text{with } \lambda = \frac{1}{\sigma_v^2}c'Q_Z c = c'\Sigma^{-1}c.$$
Therefore, we have shown that, under the null of weak identification and under conditional homoskedasticity, the asymptotic distribution of $J_T(\delta)$ is bounded above by:
$$\chi^2(k) + \delta^2\frac{\sigma_v^2}{\sigma_u^2}\chi^2(k,\lambda) + 2\,\delta\frac{\sigma_v}{\sigma_u}\sqrt{\chi^2(k)\,\chi^2(k,\lambda)}.$$
Moreover, if we choose the parameter distortion to properly rescale the error terms, that is,
$$\delta = \frac{\sigma_u}{\sigma_v},$$
we get the announced upper bound:
$$\chi^2(k) + \chi^2(k,\lambda) + 2\sqrt{\chi^2(k)\,\chi^2(k,\lambda)}.$$

Even without assuming conditional homoskedasticity, we deduce from (ii) above that, under the alternative hypothesis of strong identification:
$$\frac{kF_T}{T} = \|\beta\|^2_\Sigma + O_P(1/\sqrt{T}), \qquad \frac{J_T(\delta)}{T} = \|\beta\|^2_\Omega + O_P(1/\sqrt{T}),$$
with $\Omega = Q_Z^{-1} S Q_Z^{-1}/\delta^2$. It is worth noting that, due to serial independence:
$$\Sigma = Q_Z^{-1}\,E\left[z_t z_t'\,\sigma_v^2(z_t)\right]Q_Z^{-1}, \qquad S = E\left[z_t z_t'\,\sigma_u^2(z_t)\right].$$
And we get very similar formulas:
$$\frac{kF_T}{T} = \|Q_Z\beta\|^2_{Q_z(v)} + O_P(1/\sqrt{T}), \qquad \frac{J_T(\delta)}{T} = \delta^2\,\|Q_Z\beta\|^2_{Q_z(u)} + O_P(1/\sqrt{T}),$$
where:
$$Q_z(u) = E\left[z_t z_t'\,\sigma_u^2(z_t)\right], \qquad Q_z(v) = E\left[z_t z_t'\,\sigma_v^2(z_t)\right].$$
Under conditional homoskedasticity, we get:
$$\frac{kF_T}{T} - \frac{J_T(\delta)}{T} = O_P(1/\sqrt{T}),$$
when we choose the parameter distortion to properly rescale the error terms, that is, $\delta = \sigma_u/\sigma_v$.
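Under strong identification and conditional homoskedasticity, the two rescaled statistics should thus agree once $\delta = \sigma_u/\sigma_v$. A minimal numerical sketch (simulated linear IV data with hypothetical parameter values; not the paper's code):

```python
import numpy as np

# With homoskedastic errors and delta = sigma_u/sigma_v, both kF_T/T and
# J_T(delta)/T should be close to beta' Q_Z beta / sigma_v^2 (= 0.095 below).
rng = np.random.default_rng(1)
T, k = 50_000, 3
beta = np.array([0.5, 0.3, 0.2])             # strong first stage (illustrative)
pi, su, sv = 1.0, 1.0, 2.0                   # structural slope and error std devs
delta = su / sv                              # rescaling distortion

Z = rng.standard_normal((T, k))              # instruments with E[z z'] = I
Y = Z @ beta + sv * rng.standard_normal(T)   # first stage
y = pi * Y + su * rng.standard_normal(T)     # structural equation

# first-stage statistic: k * F_T
b = np.linalg.solve(Z.T @ Z, Z.T @ Y)
s2v = np.sum((Y - Z @ b) ** 2) / (T - k)
kF = b @ (Z.T @ Z) @ b / s2v

# 2SLS estimate of pi, then J_T evaluated at the distorted value pi_hat + delta
Yhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)
pi_hat = (Yhat @ y) / (Yhat @ Y)
phi = Z.T @ (y - Y * (pi_hat + delta)) / T
s2u = np.sum((y - Y * pi_hat) ** 2) / (T - 1)
S = s2u * (Z.T @ Z) / T
J = T * phi @ np.linalg.solve(S, phi)

print(kF / T, J / T)                         # both close to 0.38 / 4 = 0.095
```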

Proof of Theorem 4.1 (Asymptotic normality and efficient GMM estimator):

A mean-value expansion of the moment conditions around $\theta^0$, for some $\bar\theta_T$ between $\hat\theta_T$ and $\theta^0$, gives
$$\phi_T(\hat\theta_T) = \phi_T(\theta^0) + \frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}(\hat\theta_T - \theta^0). \quad (A.1)$$
Combined with the first-order conditions,
$$\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\phi_T(\hat\theta_T) = 0,$$
this yields
$$\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\phi_T(\theta^0) + \frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}(\hat\theta_T - \theta^0) = 0$$
$$\Leftrightarrow\; M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\sqrt{T}\phi_T(\theta^0) + M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T\,M_T^{-1}\sqrt{T}(\hat\theta_T - \theta^0) = 0. \quad (A.2)$$
Under Assumption 1a), we have
$$\frac{\partial\phi_T(\hat\theta_T)}{\partial\theta'}M_T \xrightarrow{P} \Gamma(\theta^0) \quad\text{and}\quad \frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T \xrightarrow{P} \Gamma(\theta^0)$$
$$\Rightarrow\; M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T \xrightarrow{P} \Gamma'(\theta^0)\,\Omega\,\Gamma(\theta^0)$$
$$\Rightarrow\; M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T \text{ is invertible for } T \text{ large enough.}$$
Combined with (A.2), we get
$$M_T^{-1}\sqrt{T}(\hat\theta_T - \theta^0) = -\left[M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T\right]^{-1} M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,\Omega_T\,\sqrt{T}\phi_T(\theta^0).$$
Assumption 1d) and a standard argument for optimality allow us to conclude.

Proof of Theorem 4.2 (J-test):

Using (A.1) and (A.2), we get:
$$\sqrt{T}\phi_T(\hat\theta_T) = \sqrt{T}\phi_T(\theta^0) - \frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T\left[M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,S_T^{-1}\,\frac{\partial\phi_T(\bar\theta_T)}{\partial\theta'}M_T\right]^{-1} M_T'\frac{\partial\phi_T'(\hat\theta_T)}{\partial\theta}\,S_T^{-1}\,\sqrt{T}\phi_T(\theta^0)$$
$$\Rightarrow\; T\,Q_T(\hat\theta_T) = \left[\sqrt{T}\phi_T(\theta^0)\right]' S_T'^{-1/2}\left[I_k - P_X\right]S_T^{-1/2}\left[\sqrt{T}\phi_T(\theta^0)\right] + o_P(1),$$
with $S_T^{-1} = S_T'^{-1/2}S_T^{-1/2}$ and $P_X = X(X'X)^{-1}X'$ for $X = S_T^{-1/2}\frac{\partial\phi_T(\hat\theta_T)}{\partial\theta'}M_T$. And we get the expected result.
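When the moment conditions are linear in $\theta$, the projection representation of $TQ_T(\hat\theta_T)$ is in fact an exact algebraic identity, not merely an $o_P(1)$ approximation, which can be checked numerically. A sketch with simulated data (the weighting choice and parameter values are illustrative assumptions, not the paper's design):

```python
import numpy as np

# Exact projection identity in the linear case: with phi_T(theta) = a - G theta,
# the minimized GMM objective T*Q_T(theta_hat) equals the projection form
# T * w'(I - P_X) w, where w = S^{-1/2} phi_T(theta0) and X = S^{-1/2} G.
rng = np.random.default_rng(2)
T, k, p = 500, 4, 2
theta0 = np.array([1.0, -0.5])
Z = rng.standard_normal((T, k))                    # instruments
Y = Z @ rng.standard_normal((k, p)) + rng.standard_normal((T, p))
y = Y @ theta0 + rng.standard_normal(T)

a = Z.T @ y / T                                    # phi_T(theta) = a - G theta
G = Z.T @ Y / T
u0 = y - Y @ theta0
S = (Z * (u0 ** 2)[:, None]).T @ Z / T             # heteroskedasticity-robust weighting

Si = np.linalg.inv(S)
theta_hat = np.linalg.solve(G.T @ Si @ G, G.T @ Si @ a)   # efficient GMM estimator
phi_hat = a - G @ theta_hat
J_direct = T * phi_hat @ Si @ phi_hat              # T * Q_T(theta_hat)

L = np.linalg.cholesky(S)                          # one square-root choice, S = L L'
w = np.linalg.solve(L, a - G @ theta0)             # S^{-1/2} phi_T(theta0)
X = np.linalg.solve(L, G)
PX = X @ np.linalg.solve(X.T @ X, X.T)
J_proj = T * w @ (np.eye(k) - PX) @ w

print(abs(J_direct - J_proj))                      # agree up to rounding error
```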

Lemma A.1. (i) Under the null hypothesis $H_0(\nu)$, for any deterministic sequence $a_T$ such that $a_T/T^\nu \to \infty$, we have
$$\lim_T\left[\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\right] = 0.$$
(ii) Under the alternative hypothesis to $H_0(\nu)$, and under convenient regularity conditions, there exists a deterministic sequence $a_T$ such that $a_T/T^\nu \to \infty$ and, at least for a convenient subsequence,
$$\lim_T\left\|\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\right\| = \infty.$$


Proof of Lemma A.1:

Our proof uses the notation introduced in Section 4.

(i) Assume that we can find a deterministic sequence $a_T$ with $a_T/T^\nu \to \infty$ such that the sequence of matrices $\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}$ does not converge to zero. Then, there exists a vector $\delta \in \mathbb{R}^{p_2}$ such that $\sqrt{T}M_{\pi,T}^{-1}\frac{\delta}{a_T}$ does not converge to zero. Assume, for expositional simplicity, that the first coefficient $b_T$ of this vectorial sequence does not converge to zero. Then, up to eliciting a well-chosen subsequence, we can claim that for some $\varepsilon > 0$, we have $|b_T| > \varepsilon$ for all $T$.

However, we know that:
$$\sqrt{T}M_T^{-1}\left(\hat\theta_T - \theta^0\right) = O_P(1),$$
and, in particular, with obvious notations,
$$\sqrt{T}M_{\pi,T}^{-1}\left(\hat\pi_T - \pi^0\right) = \sqrt{T}\Lambda_{\pi,T}^{-1}R_\pi^{-1}\left(\hat\pi_T - \pi^0\right) = O_P(1).$$
Note that $R_\pi^{-1}\pi = \eta_\pi$, where $\eta = R^{-1}\theta$ is the new vector of parameters (after rotation of the parameter space) defined just after Assumption 2. Thus we have
$$\sqrt{T}\Lambda_{\pi,T}^{-1}\left(\hat\eta_{\pi,T} - \eta_\pi^0\right) = O_P(1),$$
and, in particular, focusing on the first (diagonal) coefficient $\lambda_{1,\pi,T}$ of $\Lambda_{\pi,T}$ and the first coefficient $\hat\eta_{1,\pi,T}$ of $\hat\eta_{\pi,T}$, we have:
$$\frac{\sqrt{T}}{\lambda_{1,\pi,T}}\left(\hat\eta_{1,\pi,T} - \eta_{1,\pi}^0\right) = O_P(1).$$
However, since $b_T$ has been defined as the first coefficient of
$$\sqrt{T}M_{\pi,T}^{-1}\frac{\delta}{a_T} = \sqrt{T}\Lambda_{\pi,T}^{-1}R_\pi^{-1}\frac{\delta}{a_T},$$
it can be written
$$b_T = \frac{\sqrt{T}}{\lambda_{1,\pi,T}}\,\frac{\delta_1}{a_T},$$
where $\delta_1$ stands for the first coefficient of $R_\pi^{-1}\delta$. Note that $\delta_1 \neq 0$ (since $|b_T| > \varepsilon$), and we deduce from a comparison of the two above formulas that
$$\frac{b_T}{\delta_1}\,a_T\left(\hat\eta_{1,\pi,T} - \eta_{1,\pi}^0\right) = O_P(1).$$
Since $|b_T| > \varepsilon$ for all $T$ (or at least for a subsequence), this implies that, at least along a subsequence,
$$a_T\left(\hat\eta_{1,\pi,T} - \eta_{1,\pi}^0\right) = O_P(1).$$
Therefore, the null hypothesis $H_0(\nu)$ must be violated, since $a_T/T^\nu \to \infty$ and $(\hat\eta_{1,\pi,T} - \eta_{1,\pi}^0)$ has been built as a linear combination of $(\hat\pi_T - \pi^0)$.

(ii) Under the alternative, we can find a deterministic sequence $b_T$ with $b_T/T^\nu \to \infty$ such that, for some non-zero vector $\delta \in \mathbb{R}^{p_2}$, we have (for a well-suited subsequence):
$$b_T\,\delta'\left(\hat\pi_T - \pi^0\right) = O_P(1).$$
Then:
$$\delta'\left(\hat\pi_T - \pi^0\right) = \gamma'\left(\hat\eta_{\pi,T} - \eta_\pi^0\right) = O_P(1/b_T),$$
for some non-zero vector $\gamma = R_\pi'\delta$. Let us consider another deterministic sequence $a_T$ with $a_T/T^\nu \to \infty$ but $a_T/b_T \to 0$. Then:
$$\gamma'\,a_T\left(\hat\eta_{\pi,T} - \eta_\pi^0\right) = o_P(1).$$
Since we maintain the assumption that, at least for a convenient subsequence, $\gamma'(\hat\eta_{\pi,T} - \eta_\pi^0)$ does not go to zero at a rate faster than $\min_j\left[\lambda_{j,\pi,T}/\sqrt{T}\right]$, we are able to conclude that at least one diagonal coefficient of $\left[a_T\Lambda_{\pi,T}/\sqrt{T}\right]$ goes to zero. Therefore, since
$$\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1} = \frac{\sqrt{T}}{a_T}\Lambda_{\pi,T}^{-1}R_\pi^{-1},$$
at least one row of this matrix is such that the sum of the absolute values of its coefficients goes to infinity. In other words, the norm $\|\cdot\|_\infty$ of this matrix (the maximum row sum norm, see Horn and Johnson (1985), p. 295) goes to infinity. Since, for the spectral matrix norm $\|\cdot\|$ we are using in this paper, we have $\|M_{\pi,T}^{-1}\| \geq \frac{1}{\sqrt{p_2}}\|M_{\pi,T}^{-1}\|_\infty$ (see Horn and Johnson (1985), p. 314), we can conclude that, at least for a convenient subsequence,
$$\lim_T\left\|\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\right\| = \infty.$$
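The norm comparison invoked above, $\|A\| \geq \|A\|_\infty/\sqrt{n}$ for the spectral and maximum-row-sum norms of an $m \times n$ matrix (here $p_2$ plays the role of $n$), is easy to sanity-check numerically. A sketch with random matrices:

```python
import numpy as np

# Check the matrix-norm inequality ||A||_2 >= ||A||_inf / sqrt(n)
# (Horn and Johnson, 1985): if the maximum-row-sum norm blows up,
# the spectral norm must blow up too.
rng = np.random.default_rng(3)
ok = True
for _ in range(200):
    m, n = rng.integers(1, 8, size=2)
    A = rng.standard_normal((m, n))
    spec = np.linalg.norm(A, 2)          # largest singular value
    row = np.linalg.norm(A, np.inf)      # maximum absolute row sum
    ok = ok and spec >= row / np.sqrt(n) - 1e-12

print(ok)
```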

Proof of Corollary 4.3:

Our proof uses the notation introduced in Section 4.

(i) From Lemma A.1(i), we get:
$$\lim_T\left[\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\delta\right] = 0. \quad (A.3)$$
Moreover, a mean-value expansion of the moment conditions gives:
$$\sqrt{T}\phi_T(\theta_T^{\delta_T}) = \sqrt{T}\phi_T(\hat\theta_T) + \sqrt{T}\,\frac{\partial\phi_T}{\partial\theta'}(\bar\theta_T)\left(\theta_T^{\delta_T} - \hat\theta_T\right),$$
where, with the standard abuse of notation, we have defined component by component some $\bar\theta_T$ between $\theta_T^{\delta_T}$ and $\hat\theta_T$. Then, by definition of $\theta_T^{\delta_T}$,
$$\sqrt{T}\phi_T(\theta_T^{\delta_T}) = \sqrt{T}\phi_T(\hat\theta_T) + \sqrt{T}\,\frac{\partial\phi_T}{\partial\pi'}(\bar\theta_T)\,\frac{\delta}{a_T} \quad (A.4)$$
$$= \sqrt{T}\phi_T(\hat\theta_T) + \frac{\partial\phi_T}{\partial\pi'}(\bar\theta_T)\,M_{\pi,T}\,\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\delta.$$
It is worth realizing that in all cases we know that
$$\frac{\partial\phi_T}{\partial\pi'}(\bar\theta_T)\,M_{\pi,T} = O_P(1) \quad (A.5)$$
• in the linear case, because $\frac{\partial\phi_T}{\partial\pi'}(\bar\theta_T) = \frac{\partial\phi_T}{\partial\pi'}(\theta^0)$;

• in the separable case, it follows from
$$E\frac{\partial\phi_T(\theta_T)}{\partial\theta'}M_T = \left[\; E\frac{\partial\phi_{1T}(\psi)}{\partial\psi'} + \frac{1}{T^\nu}E\frac{\partial\phi_{2T}(\theta_T)}{\partial\psi'} \qquad E\frac{\partial\phi_{2T}(\theta_T)}{\partial\pi'} \;\right];$$
• in the general case, it is implied by Definition 4.1 since
$$\psi_T^{\delta_T} = \hat\psi_T \quad\text{and}\quad \pi_T^{\delta_T} - \hat\pi_T = O(\delta/a_T) = o(T^{-1/4}).$$
Then, from (A.3), (A.4) and (A.5), we deduce:
$$\sqrt{T}\phi_T(\theta_T^{\delta_T}) = \sqrt{T}\phi_T(\hat\theta_T) + o_P(1),$$
and the required result follows as an immediate consequence.

(ii) Since, from Lemma A.1, we have under convenient regularity conditions
$$\lim_T\left\|\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\right\| = \infty,$$
we have for most vectors $\delta \in \mathbb{R}^{p_2}$:
$$\lim_T\left\|\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\delta\right\| = \infty. \quad (A.6)$$
Only vectors $\delta$ in the orthogonal space of the relevant eigenspace would not fulfill this condition. In particular, condition (A.6) is fulfilled with probability 1 when $\delta$ is drawn randomly according to some absolutely continuous probability distribution. Then, using the expansion (A.4), we expect the vector $\sqrt{T}\phi_T(\theta_T^{\delta_T})$ to "blow up" like the vector
$$z_T \equiv \frac{\partial\phi_T(\bar\theta_T)}{\partial\pi'}\,M_{\pi,T}\,\frac{\sqrt{T}}{a_T}M_{\pi,T}^{-1}\delta.$$
$z_T$ must blow up since, by Definition 4.1, $\left[\frac{\partial\phi_T(\theta^0)}{\partial\pi'}M_{\pi,T}\right]$ is asymptotically of full column rank. If $\left[\frac{\partial\phi_T(\bar\theta_T)}{\partial\pi'}M_{\pi,T}\right]$ is different from $\left[\frac{\partial\phi_T(\theta^0)}{\partial\pi'}M_{\pi,T}\right]$ (due to some non-linearity with respect to $\pi$), it would take some perverse asymptotic singularity to erase the blow-up in (A.6). Note that, insofar as the vector $\sqrt{T}\phi_T(\theta_T^{\delta_T})$ blows up, we can be sure that $\operatorname{Plim}[J_T(\delta_T)] = \infty$ since, for $T$ sufficiently large,
$$J_T(\delta_T) \geq \operatorname{Mineg}(\Omega_T)\left\|\sqrt{T}\phi_T(\theta_T^{\delta_T})\right\|^2 \geq \frac{1}{2}\operatorname{Mineg}(\Omega)\left\|\sqrt{T}\phi_T(\theta_T^{\delta_T})\right\|^2,$$
with probability one asymptotically, where $\operatorname{Mineg}(A)$ denotes the smallest eigenvalue of a matrix $A$, and $\operatorname{Mineg}(\Omega) > 0$ by positive definiteness.

Proof of Theorem 4.4: It follows directly from Corollary 4.3.


B Graphs and Tables of results

Panel A: Distribution of the standardized GMM estimator

                Mean   Variance   Skewness   Kurtosis
    Intercept   0.04     1.07      -0.01       2.98
    Slope       0.12     1.08       0.49       4.03

Panel B: Inference on the whole vector

                     No pretest                    Pretest
                  GMM    AR (robust)    AR (not robust)   SS    SY
    Rej. Prob.    n.a.       0                 1           1     1

Panel C: Inference on a subvector

    Rej. Prob.    AR (robust)   AR (not robust)
    Intercept        1.00            1.00
    Slope            0               1.00

Table 1: Inference in the linear homoskedastic model with T = 200, M = 1,000, ρ = 0.8, β'β = 0.163 (stronger identification), θ⁰ = (0 0)'. Panel A provides the first four moments of the Monte-Carlo distribution of the standardized GMM estimator. Panel B collects the rejection probabilities when testing for weak identification of the whole parameter vector θ. Finally, Panel C collects rejection probabilities when testing for weak identification of a subvector: testing the intercept without assuming strong identification of the slope; testing the slope when assuming strong identification of the intercept.


Panel A: Distribution of the standardized GMM estimator

                Mean   Variance   Skewness   Kurtosis
    Intercept   0.05     1.00      -0.00       2.77
    Slope       0.30     1.14       1.06       5.40

Panel B: Inference on the whole vector

                     No pretest                    Pretest
                  GMM    AR (robust)    AR (not robust)   SS     SY
    Rej. Prob.    n.a.       0               1.00         0.47   0.58

Panel C: Inference on a subvector

    Rej. Prob.    AR (robust)   AR (not robust)
    Intercept        1.00            1.00
    Slope            0               0.55

Table 2: Inference in the linear homoskedastic model with T = 200, M = 1,000, ρ = 0.8, β'β = 0.010 (weaker identification), θ⁰ = (0 0)'. Panel A provides the first four moments of the Monte-Carlo distribution of the standardized GMM estimator. Panel B collects the rejection probabilities when testing for weak identification of the whole parameter vector θ. Finally, Panel C collects rejection probabilities when testing for weak identification of a subvector: testing the intercept without assuming strong identification of the slope; testing the slope when assuming strong identification of the intercept.


Panel A: Distribution of the standardized GMM estimator

                Mean   Variance   Skewness   Kurtosis
    Intercept   0.04     0.62      -0.05       3.91
    Slope       1.43     1.45       0.78       3.48

Panel B: Inference on the whole vector

                     No pretest                    Pretest
                  GMM    AR (robust)    AR (not robust)   SS    SY
    Rej. Prob.    n.a.       0               0.95          0     0

Panel C: Inference on a subvector

    Rej. Prob.    AR (robust)   AR (not robust)
    Intercept        1.00            1.00
    Slope            0               0.06

Table 3: Inference in the linear homoskedastic model with T = 200, M = 1,000, ρ = 0.8, β'β = 0 (no identification), θ⁰ = (0 0)'. Panel A provides the first four moments of the Monte-Carlo distribution of the standardized GMM estimator. Panel B collects the rejection probabilities when testing for weak identification of the whole parameter vector θ. Finally, Panel C collects rejection probabilities when testing for weak identification of a subvector: testing the intercept without assuming strong identification of the slope; testing the slope when assuming strong identification of the intercept.


Panel A: Distribution of the standardized GMM estimator

                Mean   Variance   Skewness   Kurtosis
    Intercept   0.04     1.04       0.01       2.97
    Slope       0.11     1.11       0.52       3.82

Panel B: Inference on the whole vector

                     No pretest                    Pretest
                  GMM    AR (robust)    AR (not robust)   SS    SY
    Rej. Prob.    n.a.       0                 1           1     1

Panel C: Inference on a subvector

    Rej. Prob.    AR (robust)   AR (not robust)
    Intercept        1.00            1.00
    Slope            0               0.98

Table 4: Inference in the linear heteroskedastic model with T = 200, M = 1,000, ρ = 0.8, β'β = 0.163 (stronger identification), θ⁰ = (0 0)'. Panel A provides the first four moments of the Monte-Carlo distribution of the standardized GMM estimator. Panel B collects the rejection probabilities when testing for weak identification of the whole parameter vector θ. Finally, Panel C collects rejection probabilities when testing for weak identification of a subvector: testing the intercept without assuming strong identification of the slope; testing the slope when assuming strong identification of the intercept.


Panel A: Distribution of the standardized GMM estimator

                Mean   Variance   Skewness   Kurtosis
    Intercept   0.04     0.95       0.02       2.79
    Slope       0.28     1.17       1.00       4.73

Panel B: Inference on the whole vector

                     No pretest                    Pretest
                  GMM    AR (robust)    AR (not robust)   SS     SY
    Rej. Prob.    n.a.       0               0.99         0.47   0.59

Panel C: Inference on a subvector

    Rej. Prob.    AR (robust)   AR (not robust)
    Intercept        1.00            1.00
    Slope            0               0.18

Table 5: Inference in the linear heteroskedastic model with T = 200, M = 1,000, ρ = 0.8, β'β = 0.010 (weaker identification), θ⁰ = (0 0)'. Panel A provides the first four moments of the Monte-Carlo distribution of the standardized GMM estimator. Panel B collects the rejection probabilities when testing for weak identification of the whole parameter vector θ. Finally, Panel C collects rejection probabilities when testing for weak identification of a subvector: testing the intercept without assuming strong identification of the slope; testing the slope when assuming strong identification of the intercept.


Panel A: Distribution of the standardized GMM estimator

                Mean   Variance   Skewness   Kurtosis
    Intercept   0.02     0.55      -0.10       4.04
    Slope       1.30     1.29       0.74       3.37

Panel B: Inference on the whole vector

                     No pretest                    Pretest
                  GMM    AR (robust)    AR (not robust)   SS    SY
    Rej. Prob.    n.a.      0.00             0.86          0     0

Panel C: Inference on a subvector

    Rej. Prob.    AR (robust)   AR (not robust)
    Intercept        1.00            1.00
    Slope            0               0.02

Table 6: Inference in the linear heteroskedastic model with T = 200, M = 1,000, ρ = 0.8, β'β = 0 (no identification), θ⁰ = (0 0)'. Panel A provides the first four moments of the Monte-Carlo distribution of the standardized GMM estimator. Panel B collects the rejection probabilities when testing for weak identification of the whole parameter vector θ. Finally, Panel C collects rejection probabilities when testing for weak identification of a subvector: testing the intercept without assuming strong identification of the slope; testing the slope when assuming strong identification of the intercept.


Panel A: $\Delta c_{t+1} = \nu + \psi r_{t+1} + u_{t+1}$

                               Stock      Montiel-Olea and Pflueger    Antoine and Renault
    Country  Sample period     and Yogo   c_simp   c_TSLS   c_LIML     joint   inter.   slope
    USA      1947:3-1998:4        1         0        0        0          1       1        0
    AUL      1970:3-1998:4        1         0        1        1          1       1        0
    CAN      1970:3-1999:1        1         0        0        1          1       1        0
    FR       1970:3-1998:3        1         1        1        1          1       1        0
    GER      1979:1-1998:3        1         0        0        0          1       1        0
    ITA      1971:4-1998:1        1         1        1        1          0       1        0
    JAP      1970:3-1998:4        0         0        0        0          0       1        0
    NTH      1977:3-1998:4        1         0        0        0          0       1        0
    SWD      1970:3-1999:2        1         1        1        1          0       1        0
    SWT      1976:2-1998:4        0         0        0        0          0       1        0
    UK       1970:3-1999:1        1         0        0        0          0       1        0

Panel B: $r_{t+1} = \xi + (1/\psi)\Delta c_{t+1} + \eta_{t+1}$

                               Stock      Montiel-Olea and Pflueger    Antoine and Renault
    Country  Sample period     and Yogo   c_simp   c_TSLS   c_LIML     joint   inter.   slope
    USA      1947:3-1998:4        0         0        0        0          1       1        0
    AUL      1970:3-1998:4        0         0        0        0          1       1        0
    CAN      1970:3-1999:1        0         0        0        0          0       1        0
    FR       1970:3-1998:3        0         0        0        0          0       1        0
    GER      1979:1-1998:3        0         0        0        0          0       1        0
    ITA      1971:4-1998:1        0         0        0        0          0       1        0
    JAP      1970:3-1998:4        0         0        0        0          0       1        0
    NTH      1977:3-1998:4        0         0        0        0          0       1        0
    SWD      1970:3-1999:2        0         0        0        0          0       1        0
    SWT      1976:2-1998:4        0         0        0        0          0       1        0
    UK       1970:3-1999:1        0         0        0        0          0       1        0

Table 7: Pretest for weak instruments in the estimation of the EIS. "0" indicates that the pretest cannot reject at the 5% level, whereas "1" indicates that the pretest rejects at the 5% level.
