
Robust Inference in Models Identified via Heteroskedasticity

Daniel J. Lewis
Harvard University*

October 31, 2017

Preliminary draft. All comments welcome via email.

Most recent version available here.

Abstract

Simultaneous equations models identified via heteroskedasticity may be subject to weak identification concerns due to proportional changes in variance across series. Considering data from Nakamura & Steinsson (2017), I calibrate simulations to show the relevance of these concerns in practice. I propose conditions under which robust inference methods for a subset of the parameter vector are valid. In simulation, these methods avoid dramatic size-distortions present in strong-identification methods. While there is power loss relative to standard inference, they are less conservative than the previous alternative of projection methods. I propose two tests for weak identification. I offer a method for robust inference on IRFs for SVARs identified via heteroskedasticity. An empirical application to Nakamura & Steinsson (2017) shows that weak identification is a problem with daily policy shocks, but not higher-frequency shocks. This confirms the authors’ suspicions, and shows the value of focusing empirical work on carefully constructed shock series.

JEL Classification: C12, C32, E43

* I would like to thank Jim Stock, Isaiah Andrews, Adam McCloskey, Jose Montiel Olea, Emi Nakamura, Mikkel Plagborg-Møller, and Jón Steinsson for their helpful feedback.


1 Introduction

Latent variables, like the shocks in the structural vector auto-regressions (SVARs) of Sims (1980), are ubiquitous in economic models across fields, where observed innovations are related to unobserved (structural) shocks by a linear combination matrix. A variety of identification schemes have been applied to map such innovations to structural shocks. In recent years, the literature has turned away from imposing identifying assumptions on the linear combination matrix between shocks and innovations, realizing such assumptions are too closely related to the structural relationships of interest. Instead, focus has turned to how assumptions on the structural shocks can contribute to identification. First proposed by Rigobon (2003), identification via heteroskedasticity uses the presence of multiple variance regimes to identify the structural parameters of interest. Reduced-form second moments are products of those parameters and second moments of the structural shocks. Identification via heteroskedasticity supposes that the variance regimes can be discerned (either using prior information or estimation), and sometimes that only a subset of variances change. It is reasonable to worry that, in small samples or in data measured with noise, the variance regimes may not be markedly distinct. Such failure of the identifying assumptions can result in weak identification, as noted by Magnusson & Mavroeidis (2014), Nakamura & Steinsson (2017), and Hébert & Schreger (2017). Unfortunately, there do not currently exist general methods for assessing the strength of identification or conducting robust inference in this setting.

This paper draws on the extensive weak identification literature to provide a comprehensive treatment of the weak identification problem potentially present in models identified via heteroskedasticity. First, I provide a theoretical exposition of how such a system of VAR residuals may exhibit weak identification properties, drawing an analogy to classical weak-IV to build intuition. I develop a fully general model, allowing for n variables where multiple (or all) variances may change. A Monte Carlo study displays the extent to which asymptotic distributions are affected for an empirically-motivated calibration. Limiting representations are established for S−statistics and K−statistics for both the full parameter vector and a subset of parameters, facilitating non-conservative identification-robust inference. Conducting such inference on subsets of the parameter vector is crucial for applied work, and is not straightforward in weak identification settings. The tests proposed have the advantage over previously available approaches of being neither prohibitively conservative, unlike projection methods (Dufour (1997), Robins (2004), Dufour & Taamouti (2005), Kleibergen (2005), Chaudhuri (2008), Chaudhuri et al. (2010), Chaudhuri & Zivot (2011), McCloskey (2016)), nor limited to a simplified two-variable setting (Nakamura & Steinsson, 2017). Monte Carlo analysis confirms that standard Wald tests exhibit dramatic size distortions, while the robust methods perform well. Next, tests of identification are offered; bias-based critical values for a first-stage F−test are suggested from Montiel Olea & Pflueger (2013) for an empirically-common simple case, and a fully-general size-based method is proposed from Andrews (2017).

I also offer a new approach to inference on the final object of interest in many macroeconomic applications, impulse response functions (IRFs). IRFs are transformations of both reduced-form and structural parameters. Point-wise confidence sets can be constructed using a simple test-inversion approach, with relatively small critical values. This parallels the recent work of Montiel Olea, Stock, & Watson (2016), who offer such a method in the external instruments setting; however, their method exploits linearities not present in the identification via heteroskedasticity context except in the simplest case. As such, the method proposed here is more general, and, I believe, the first to apply in this context.

Throughout, the paper considers an empirical application to the data of Nakamura & Steinsson (2017). They study the impact of forward guidance on treasury futures yields, using a variety of bivariate simultaneous equations models. They use identification via heteroskedasticity as a comparison to their main identification strategy. Two policy series are considered to track the latent monetary policy shocks: 1-day changes in treasury yields and 30-minute changes in a “policy news” series constructed using principal components. They find suggestive evidence of weak identification for the 1-day changes when they construct robust confidence sets using a bootstrap approach to an AR-type statistic. In applying the techniques developed here, I confirm the authors’ suspicions, with the one-day changes in the yield not passing any of the proposed tests of identification. On the other hand, the 30-minute policy shocks satisfy the tests at virtually all levels. In addition, Monte Carlo work calibrated from the data shows dramatic size-distortions resulting from using test statistics employing strong-identification asymptotics.

In light of these findings, scrutiny must be applied to the proliferation of applied papers now using the approach. While many applications come from monetary economics and international finance, examples now exist in public finance (Jahn & Weber, 2016), growth (Islam et al., 2017), trade (Lin et al., 2016, Feenstra & Weinstein, 2017), political economy (Rigobon & Rodrik, 2005, Khalid, 2016), environmental economics (Millimet & Roy, 2016, Gong et al., 2017), agriculture and energy (Fernandez-Perez et al., 2016), education (Hogan & Rigobon, 2009, Klein & Vella, 2009), marketing (Zaefarian et al., 2017), and even fertility studies (Mönkediek & Bras, 2016). Within the macrofinance core of the literature, notable papers include Rigobon & Sack (2003, 2004), Craine & Martin (2008), Eichengreen & Panizza (2016), Ehrmann & Fratzscher (2016), and Hébert & Schreger (2017). An entirely separate strand of literature focuses on the method of Lewbel (2012), which is based on Rigobon’s (2003) approach; such analysis is also subject to these weak identification issues, but further work remains to extend the present results to Lewbel’s setting.

The paper proceeds as follows. Section 2 formally presents the identification via heteroskedasticity approach, and provides intuition behind the weak identification problem and when it may arise, supported by simulation evidence. Section 3 proposes methods for weak identification robust inference, both for the full parameter vector and a subset of parameters, and compares them to alternative methods in a Monte Carlo study. Section 4 offers two tests of identification. Section 5 develops a method to construct robust confidence sets for IRFs in non-linear identification schemes. Section 6 presents the empirical application. Section 7 concludes.

Notation: $M_{ij}$ denotes the $ij$th element of matrix $M$; $M^{(j)}$ denotes the $j$th column of matrix $M$; $M_{(i)}$ denotes the $i$th row of matrix $M$; $\operatorname{vech}(M)$ denotes the unique vectorization of matrix $M$; $A_i$ denotes the matrix of coefficients corresponding to lag $i$ of the lag polynomial $A(L)$.

2 A weak identification problem

2.1 The standard setting

Before considering issues of weak identification, it is necessary to discuss how identification via heteroskedasticity behaves in an ideal environment. Consider an $n$-dimensional system of equations of the form

$$\eta_t = H \varepsilon_t,$$

where $\varepsilon_t$ are structural shocks with a diagonal covariance matrix. Often, these innovations, $\eta_t$, result from a reduced-form VAR of the form

$$A(L) Y_t = \eta_t,$$

which has the structural representation

$$C(L) Y_t = \varepsilon_t,$$


where $C(L) = H^{-1} A(L)$. The lag polynomial, $A(L)$, and thus the reduced-form residuals $\eta_t$, can be estimated consistently by OLS. The conditional variance of $\varepsilon_t$ is central to the identification scheme. Identification via heteroskedasticity exploits the additional moment equations generated by assuming multiple variance regimes for the structural shocks and thus the reduced-form residuals. This requires at least two variance regimes, in which case the system is just-identified. The literature makes some standard assumptions:

Assumption 1. For all $t = 1, 2, \ldots, T$ and regimes $R_j$, $j = 1, 2, \ldots, k$,

1. $H$ is fixed over time, invertible, and has a unit diagonal,

2. $E[\varepsilon_t \mid t \in R_j] = 0$, $E[\varepsilon_t \varepsilon_t' \mid t \in R_j] = \Sigma_{\varepsilon,j}$,

3. $\Sigma_{\varepsilon,j}^{-1/2} \varepsilon_t$, for $t$ in $R_j$, are i.i.d. cross-sectionally and across time.

The unit-diagonal normalization means that a unit structural shock to variable $i$ is assumed to have a unit impact on the reduced-form innovation to $i$. Assumption 1.2 assumes stationarity within each regime and the existence of the expectation $\Sigma_{\varepsilon,j}$. The independence of the shocks allows estimators to be well-behaved, but could be relaxed to a mixing condition. The identical distribution assumption is made for convenience.

For simplicity of exposition, I discuss the two variance regime case throughout this paper. This is without loss of generality; the results can be extended to settings with more regimes, like that considered in Rigobon & Sack (2003). In particular, note that if a system of equations demonstrates weak identification for two regimes, further sub-dividing the regimes will not necessarily strengthen the variation.¹

A typical framework for identification via heteroskedasticity contrasts “event” observations and “control” observations, arguing that on the event days, when, for example, a piece of news reaches a market, variables are likely to be more volatile than on a typical day.²

One of the most common applications is typified by Nakamura & Steinsson (2017), studying monetary policy, where the event observations are changes in returns on announcement days, and the control observations are changes in returns on other days. On days when monetary policy announcements are made, relevant securities prices or interest rates should exhibit more dramatic movements than on a typical day.

¹ The extension of the theoretical results of this paper to accommodate additional regimes, up to $k$, is quite straightforward. Most extend trivially. When a rank condition or similar is required for a theorem to apply, it is then taken with respect to a matrix containing $k$ columns for the variance regimes, instead of 2.

² When regimes and breaks must instead be estimated, there can be substantial bias in the estimation, as discussed in Lewis (2017).


In keeping with this dominant application, I replace the sets of dates, $R_j$, with $P, C \subset T$ ($P$ for policy, high variance; $C$ for control, low variance) and $T_P = |P|$ and similarly for $C$. Then, by Assumption 1.1-1.2, the reduced-form covariance is

$$\Sigma_{\eta,j} = H \Sigma_{\varepsilon,j} H'. \tag{1}$$

Given two periods, (1) yields $2 \times \frac{n(n+1)}{2} = n^2 + n$ equations, with $n^2 - n + 2 \times n = n^2 + n$ unknowns. The system is just-identified in an order-condition sense; for now, assume that a rank condition also holds, so it is in fact identified. The system lends itself to estimation via GMM. In the $n = 2$ case, for each regime $j$, the reduced-form covariance can be decomposed as

$$\begin{aligned}
\sigma^2_{\eta_1,j} &= \sigma^2_{\varepsilon_1,j} + H_{12}^2 \sigma^2_{\varepsilon_2,j}, \\
\sigma^2_{\eta_2,j} &= H_{21}^2 \sigma^2_{\varepsilon_1,j} + \sigma^2_{\varepsilon_2,j}, \\
\sigma_{\eta_1\eta_2,j} &= H_{21} \sigma^2_{\varepsilon_1,j} + H_{12} \sigma^2_{\varepsilon_2,j}.
\end{aligned}$$

Assumption 1.2 and (1) imply $E[\eta_t \eta_t' \mid t \in T_j] = \Sigma_{\eta,j}$, which suggests the natural moment equations

$$m_t(\eta_t, H, \Sigma_{\varepsilon,j}) = \begin{pmatrix} \eta_{1t}^2 - \sigma^2_{\varepsilon_1,j} - H_{12}^2 \sigma^2_{\varepsilon_2,j} \\ \eta_{2t}^2 - H_{21}^2 \sigma^2_{\varepsilon_1,j} - \sigma^2_{\varepsilon_2,j} \\ \eta_{1t}\eta_{2t} - H_{21} \sigma^2_{\varepsilon_1,j} - H_{12} \sigma^2_{\varepsilon_2,j} \end{pmatrix},$$

where date $t$ is in regime $j$. Define the vector $\theta \in \Theta$ as the unique elements of $H$, $\Sigma_{\varepsilon,C}$, and $\Sigma_{\varepsilon,P}$, where $\Theta$ is compact. Then, stacking the moment equations from each regime, the resulting moment function is

$$\phi_t(\theta) = \begin{bmatrix} \mathbf{1}[t \in T_C]\left(\operatorname{vech}(\eta_t \eta_t') - \operatorname{vech}(H \Sigma_{\varepsilon,C} H')\right) \\ \mathbf{1}[t \in T_P]\left(\operatorname{vech}(\eta_t \eta_t') - \operatorname{vech}(H \Sigma_{\varepsilon,P} H')\right) \end{bmatrix}. \tag{2}$$

Clearly, $E[\phi_t(\theta_0)] = 0$ at $\theta_0$, the true parameter value. The standard GMM objective function is defined as

$$S_T\left(\theta; \bar\theta_T(\theta)\right) = \left[T^{-1/2} \sum_{t=1}^T \phi_t(\theta)\right]' W_T\left(\bar\theta_T(\theta)\right) \left[T^{-1/2} \sum_{t=1}^T \phi_t(\theta)\right]. \tag{3}$$

Expressing identification via heteroskedasticity as a GMM problem is not novel; Rigobon (2003), Rigobon & Sack (2003, 2004), and Craine & Martin (2008) all use a related approach. Focus is hence restricted to the continuous-updating estimator (CUE) form of (3) with the efficient weighting matrix. This implies the use of $S_T(\theta)$ and $W_T(\theta)$ in (3), where $W_T(\theta) = \Omega_T(\theta)^{-1}$, $\Omega_T(\theta) = \hat E\left[\phi_t(\theta)\,\phi_t(\theta)'\right]$.
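For concreteness, the following is a minimal Python sketch of the stacked moment function (2) and the CUE objective (3) for the $n = 2$ case. It is a sketch, not the paper's code: the helper names are mine, and the centered sample covariance is one common way to estimate $\Omega_T(\theta)$.

```python
import numpy as np

def vech(M):
    """Unique vectorization of a symmetric matrix (lower triangle)."""
    return M[np.tril_indices(M.shape[0])]

def phi(eta, is_P, H, Sig_C, Sig_P):
    """Stacked moment function (2), evaluated for every observation.
    eta: T x n residuals; is_P: boolean regime indicator (policy days)."""
    out = []
    for t in range(eta.shape[0]):
        m = vech(np.outer(eta[t], eta[t]))
        mC = (m - vech(H @ Sig_C @ H.T)) * (not is_P[t])
        mP = (m - vech(H @ Sig_P @ H.T)) * is_P[t]
        out.append(np.concatenate([mC, mP]))
    return np.array(out)                      # T x (n^2 + n)

def S_T(phi_vals):
    """CUE objective (3) with the efficient weighting matrix
    W_T = Omega_T^{-1}, Omega_T estimated by the sample covariance."""
    T = phi_vals.shape[0]
    gbar = phi_vals.mean(axis=0)
    Omega = np.cov(phi_vals.T, bias=True)     # estimate of E[phi phi']
    return T * gbar @ np.linalg.solve(Omega, gbar)
```

Evaluated at the true $\theta_0$, this objective is the $S$−statistic used for robust inference below.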


2.2 Identification in the simple case

The discussion above, and much of the literature, assumes a rank condition to hold, but that need not be the case. Consider a 2-variable system where only the variance of one structural shock (say the second) changes, a common model in the literature. This is henceforth referred to as “the simple case”. Economically, it corresponds to the intuition that the monetary policy shock may be more volatile on FOMC announcement days, but other shocks should not be. Then, writing out the equations for each regime under Assumption 1,

$$\Sigma_{\eta,C} = \begin{pmatrix} \sigma^2_{\eta_1,C} & \sigma_{\eta_1\eta_2,C} \\ \sigma_{\eta_1\eta_2,C} & \sigma^2_{\eta_2,C} \end{pmatrix} = H \begin{pmatrix} \sigma^2_{\varepsilon_1} & 0 \\ 0 & \sigma^2_{\varepsilon_2,C} \end{pmatrix} H',$$

$$\Sigma_{\eta,P} = \begin{pmatrix} \sigma^2_{\eta_1,P} & \sigma_{\eta_1\eta_2,P} \\ \sigma_{\eta_1\eta_2,P} & \sigma^2_{\eta_2,P} \end{pmatrix} = H \begin{pmatrix} \sigma^2_{\varepsilon_1} & 0 \\ 0 & \sigma^2_{\varepsilon_2,P} \end{pmatrix} H'.$$

$H_{12}$ is the parameter of interest. This off-diagonal element measures the impact of a unit structural shock (say a policy shock) on another variable in the system. For instance, in Nakamura & Steinsson (2017), this represents the impact of policy news on treasury futures, which they argue proxy for market expectations of monetary policy. This is frequently the case in empirical work, where the policy shock is the one exhibiting heteroskedasticity, and the researcher is interested in its impact on the other series (the “dependent” variable). In this setting, $H_{12}$ is identified in closed form:

$$\frac{\sigma_{\eta_1\eta_2,P} - \sigma_{\eta_1\eta_2,C}}{\sigma^2_{\eta_2,P} - \sigma^2_{\eta_2,C}} = \frac{H_{12}\,\Delta\sigma^2_{\varepsilon_2}}{\Delta\sigma^2_{\varepsilon_2}} = H_{12},$$

where $\Delta$ is the difference operator. This is in fact just one of three closed-form expressions for $H_{12}$ (as the problem is over-identified).³

Due to an observation in Rigobon & Sack (2004), this can be characterized as an IV problem:

$$H_{12} = \frac{\Delta\sigma_{\eta_1\eta_2}}{\Delta\sigma^2_{\eta_2}} = \frac{\frac{T}{T_P}\sum_{t\in P} \eta_{1t}\eta_{2t} - \frac{T}{T_C}\sum_{t\in C} \eta_{1t}\eta_{2t}}{\frac{T}{T_P}\sum_{t\in P} \eta_{2t}^2 - \frac{T}{T_C}\sum_{t\in C} \eta_{2t}^2} = \frac{\sum_{t=1}^T \eta_{1t} Z_t}{\sum_{t=1}^T \eta_{2t} Z_t},$$

where

$$Z_t = \left[\mathbf{1}(t \in T_P) \times \frac{T}{T_P} - \mathbf{1}(t \in T_C) \times \frac{T}{T_C}\right] \eta_{2t}.$$
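A minimal sketch of this IV characterization, assuming numpy arrays `eta1`, `eta2` of residuals and a boolean policy-day indicator (the function name is mine):

```python
import numpy as np

def h12_iv(eta1, eta2, is_policy):
    """Estimate H12 as the ratio of covariance changes across regimes,
    written as an IV estimator with instrument Z_t built from eta_2t."""
    T = len(eta1)
    weights = np.where(is_policy, T / is_policy.sum(),
                       -T / (~is_policy).sum())
    Z = weights * eta2                       # the instrument
    return np.sum(eta1 * Z) / np.sum(eta2 * Z)
```

The numerator estimates $T\,\Delta\sigma_{\eta_1\eta_2}$ and the denominator $T\,\Delta\sigma^2_{\eta_2}$; when the variance change is small, the denominator is near zero and the weak-instrument problem described next emerges.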

³ The following development of weak identification applies equally to either of the alternative identifying equations.


When this instrument is not strongly correlated with the residuals, weak identification arises. Why might this be the case? Suppose the change in the variance of the second shock is quite small, such that it can be modeled as local-to-zero. Then the variance changes of the reduced-form residuals are also local-to-zero:

$$\Delta\sigma^2_{\eta_2} = d_2 / T^{1/2}, \qquad \Delta\sigma_{\eta_1\eta_2} = d_{12} / T^{1/2}.$$

In this case, the estimator will have an asymptotic distribution described by

$$\hat H_{12} \xrightarrow{d} \frac{z_{12}}{z_2}, \quad \text{where } z = \begin{pmatrix} z_{12} \\ z_2 \end{pmatrix} \sim N\left(\begin{pmatrix} d_{12} \\ d_2 \end{pmatrix}, V\right).$$

This is the canonical weak instruments setting: the estimator is not consistent for $H_{12}$, nor is it asymptotically normally distributed; rather, it converges to the ratio of two (generally correlated) normals. As a result, standard inference methods break down.
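The pathology is easy to visualize by simulating the limiting ratio directly; the localization parameters and covariance below are arbitrary illustrative values, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d12, d2 = 0.2, 0.5                    # illustrative localization parameters
V = np.array([[1.0, 0.4],             # illustrative limit covariance
              [0.4, 1.0]])
z = rng.multivariate_normal([d12, d2], V, size=100_000)
H12_limit = z[:, 0] / z[:, 1]         # ratio of correlated normals

# The draws are heavy-tailed and skewed, nothing like a normal distribution:
print(np.percentile(H12_limit, [2.5, 50, 97.5]))
```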

2.3 General case

I now extend the intuition present in the simple case to a general $n$-variable framework. In doing so, I take seriously the rank condition, previously alluded to (and violated in the weak-IV discussion above). Sentana & Fiorentini (2001) establish a condition for $H$ to be globally identified in the presence of time-varying volatility, which is simplified in Proposition 1.

Proposition 1. $H$ is globally identified from $\Sigma_{\eta 1}$ and $\Sigma_{\eta 2}$ up to column order provided the rows of $\begin{bmatrix} \Sigma_{\varepsilon 1} & \Sigma_{\varepsilon 2} \end{bmatrix}$ are not proportional.

Proof. See Appendix.

Proposition 1 presents the conditions under which the equations of identification via heteroskedasticity satisfy the rank condition, and the system is in fact identified. Under an additional assumption, distinguishing the columns of $H$ or the shocks, identification is to a point, not just up to column order. An example, adopted hence, is Assumption 2:

Assumption 2. The shock of interest experiences the largest relative change in variance across regimes.

This is one of a large class of assumptions that allow the researcher to impose a partial order on the columns of $H$. This assumption was implicit in the simple case above. For a more detailed discussion of this labeling problem in a related setting, see Lewis (2017).


Proposition 1 implies that weak identification can arise if two (or more) variances change by a similar factor. This nests the scenario illustrated in the above simple case, where one variance change is local-to-zero and the other change is precisely zero. As an example of variances changing by similar factors, consider a case where, following a macroeconomic announcement, two variables, including the variable of interest, experience shocks with about $\gamma$ times the usual volatility. This could occur if there are shocks to more than one dimension of monetary policy news. During the recovery from the financial crisis, the Fed made simultaneous announcements about the path of the Fed Funds rate and Quantitative Easing; the Reserve Bank of New Zealand routinely announces changes to its Official Cash Rate simultaneously with the release of detailed projections for a variety of key financial variables. Depending on the coordination between policy measures, it is plausible that their variances change by comparable factors. This general pathology of proportional changes is modeled as a local-to-unity relationship in the ratio of relative changes of two variances,

$$\frac{\sigma^2_{\varepsilon_i P} / \sigma^2_{\varepsilon_i C}}{\sigma^2_{\varepsilon_j P} / \sigma^2_{\varepsilon_j C}} = 1 + \frac{d}{\sqrt{T}},$$

where $d$ is finite. Without loss of generality, let $j = 1$, $i = 2$ and denote $\sigma^2_{\varepsilon_1 P} / \sigma^2_{\varepsilon_1 C} = \gamma$, so

$$\sigma^2_{\varepsilon_2 P} = \gamma\, \sigma^2_{\varepsilon_2 C} \left(1 + \frac{d}{\sqrt{T}}\right). \tag{4}$$

This nests the near-zero issue of the simple case when $\gamma = 1$.
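As a sketch, the device (4) can be embedded in a simulation by letting one structural variance drift with the sample size (an illustrative helper, not code from the paper):

```python
import numpy as np

def drifting_variance(sig2_e2_C, gamma, d, T):
    """Policy-regime variance of the second shock under device (4):
    sigma^2_{e2,P} = gamma * sigma^2_{e2,C} * (1 + d / sqrt(T)),
    so the two relative variance changes coincide as T grows."""
    return gamma * sig2_e2_C * (1.0 + d / np.sqrt(T))
```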

Proposition 2. Adopting the modeling device in (4) and Assumption 1.1-1.3, $H$ is asymptotically unidentified.

Proof. See Appendix.

Under the local-to-unity modeling device, the proportionality requirement of Proposition 1 fails, resulting in an unidentified system, as stated in Proposition 2.

Dufour (1997) (Section 4) demonstrates the impact that such deficiencies can have on testing problems. He shows that the size of Wald tests tends to unity as a system tends towards non-identification. In this setting, it is straightforward to show, fixing $d$, that the size of a Wald test on the full parameter vector is unity asymptotically. In fact, the results of the Monte Carlo simulations in Table 2 illustrate this well, establishing the severity of the problems caused by weak identification in this setting.

It is important to note that in the case where $H_{12}$ is close to zero, there may also be a weak identification problem when $H_{12}$ is estimated using the analogy to linear IV. Hébert & Schreger (2017) argue for the use of one of the three possible closed-form identifying equations, as the others have a zero denominator under their null hypothesis ($H_{12} = 0$), since $H_{12}$ appears in the denominator. This is not a concern in the GMM method discussed here. However, if, like in Hébert & Schreger (2017), one is interested in the null hypothesis that $H_{12} = 0$, the identification problems resulting from negligible variance changes are potentially exacerbated under the null.

Table 1: Monte Carlo Specifications

                   very weak identification    data                      strong identification
small sample       T = 375,  δ = δ̂/10          T = 375,  δ = δ̂           T = 375,  δ = 10 × δ̂
data               T = 750,  δ = δ̂/10          T = 750,  δ = δ̂           T = 750,  δ = 10 × δ̂
large sample       T = 1500, δ = δ̂/10          T = 1500, δ = δ̂           T = 1500, δ = 10 × δ̂

Each cell represents a calibration of the Monte Carlo exercise. The sample split allocates $\lceil \frac{676}{750} T \rceil$ of the observations to the control period, in keeping with the data sample studied in the empirical application. $\hat\delta = \frac{\hat\sigma^2_{\varepsilon_1 P} / \hat\sigma^2_{\varepsilon_1 C}}{\hat\sigma^2_{\varepsilon_2 P} / \hat\sigma^2_{\varepsilon_2 C}} - 1$ is the estimated measure of identification from the empirical data.

2.4 Simulation evidence

Distribution of estimators

Monte Carlo evidence illustrates the scope of the problem in empirically-calibrated data. Indeed, the distributions of estimates under weak identification are far from those assumed under standard asymptotics. The central Monte Carlo specification is calibrated from estimates from the nominal future, 1-day window nominal yield data of Nakamura & Steinsson (2017), discussed in more detail in the empirical application. In particular,

$$H = \begin{bmatrix} 1 & 0.38 \\ 0.65 & 1 \end{bmatrix}, \quad \Sigma_{\varepsilon,C} = \begin{bmatrix} 0.0042 & 0 \\ 0 & 0.0004 \end{bmatrix}, \quad \Sigma_{\varepsilon,P} = \begin{bmatrix} 0.0065 & 0 \\ 0 & 0.0011 \end{bmatrix}. \tag{5}$$

Define $\delta \equiv \frac{\sigma^2_{\varepsilon_1 P} / \sigma^2_{\varepsilon_1 C}}{\sigma^2_{\varepsilon_2 P} / \sigma^2_{\varepsilon_2 C}} - 1$. I calculate the implied $\hat\delta = \frac{\hat\sigma^2_{\varepsilon_1 P} / \hat\sigma^2_{\varepsilon_1 C}}{\hat\sigma^2_{\varepsilon_2 P} / \hat\sigma^2_{\varepsilon_2 C}} - 1$ as a measure of identification, which can be varied in the simulations. The specifications considered are described in Table 1. $T = 750$ is taken as the central sample length, in keeping with the empirical application. All other parameters are held constant. Estimation proceeds by CUE GMM.⁴

⁴ Given that the problem is just-identified, the weighting matrix is irrelevant to point estimates. Therefore, estimation can be equivalently conducted without resorting to numerical optimization via the eigenvector method developed by Sims (2014). Given such point estimates, inference can be conducted by plugging them back into the CUE GMM moment function.
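A minimal sketch of one draw from this calibration follows; Gaussian shocks are my illustrative assumption (Assumption 1 only requires i.i.d. standardized shocks), and the 676/750 control split follows the note to Table 1:

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[1.0, 0.38],
              [0.65, 1.0]])
sig2_C = np.array([0.0042, 0.0004])   # structural variances, control regime
sig2_P = np.array([0.0065, 0.0011])   # structural variances, policy regime

def draw_sample(T=750):
    """Simulate reduced-form residuals eta_t = H eps_t across two regimes."""
    T_C = int(np.ceil(676 / 750 * T))          # control observations
    is_P = np.arange(T) >= T_C                 # remaining are policy days
    sd = np.sqrt(np.where(is_P[:, None], sig2_P, sig2_C))
    eps = rng.standard_normal((T, 2)) * sd     # heteroskedastic shocks
    return eps @ H.T, is_P                     # eta (T x 2), regime flags
```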

Figure 1 presents histograms of the $t$−statistic on the parameter of interest, $H_{12}$, for 10,000 draws. It is clear that the estimates are not normally distributed for low degrees of identification, heuristically proxied by $\delta \times \sqrt{T}$. However, for the “strong identification” specifications, the distribution is closer to a $t$−distribution.

To further illustrate the implications of this problem in a macroeconomic context, I impose a reduced-form VAR(1) specification estimated in the empirical application:

$$A_1 = \begin{bmatrix} -0.078 & -0.086 \\ 0.079 & 0.064 \end{bmatrix}.$$

I consider how weak identification propagates to the IRFs. I apply the “true” value of the reduced-form coefficients to the estimated $H$ matrices, as, in standard estimating procedures, $A_1$ will be estimated consistently and will contribute relatively little to the variation in the distribution of the IRFs. For clarity of illustration, I calculate and plot the IRFs for only 200 draws, so the paths are actually distinguishable. Figure 2 presents the results. These results should concern macroeconomists: under weak identification, the distribution of the IRFs is highly diffuse, and there should be little confidence in an estimated path from any single observed draw when weak identification is a concern. Even the sign of the responses is ambiguous.

3 Weak identification robust inference

3.1 Asymptotic distribution of test statistics

Existing results for the asymptotic properties of weakly identified GMM apply directly in this context. The asymptotic distribution of GMM estimators, robust to weak identification, was established in Stock, Wright, & Yogo (1997) and Stock & Wright (2000). Instead of providing an asymptotic distribution for the parameter estimates, as in strongly identified GMM problems, they show that $S_T(\theta_0)$ follows a chi-square distribution. Kleibergen (2005) provides a more refined statistic, which is asymptotically efficient, making use of the estimated Jacobian.

The present paper proceeds using the Kleibergen “$K$−statistic”, as defined in Kleibergen (2005), while the simpler $S$−statistic is retained in numerical work for comparison. Note that while there now exist test statistics that may exhibit more desirable properties than the $K$−statistic (for example, Andrews’ (2016) Conditional Linear Combination tests), it is pursued here since most newer statistics make use of a convex combination of the $K$− and $J$−statistics. In the leading just-identified case, the $J$−statistic, testing overidentifying restrictions, cannot be used. However, researchers interested in overidentified systems should consider such refinements. To proceed, additional assumptions are needed.

Figure 1: Distribution of $t$−statistics. $t$−statistics on $H_{12}$ calculated from 10,000 Monte Carlo draws, using the sample length in the left margin and the degree of identification in the bottom margin. Extreme outliers are truncated to allow comparison on the same axes. Calibration details are given in equation (5). Point estimation proceeds via Sims’ (2014) eigenvector method, with inference using these values in CUE GMM.

Figure 2: Distribution of IRFs. IRF paths for a unit shock for 200 Monte Carlo draws of $H_{12}$, taking $A_1$ as given. Sample length is given in the left margin and the degree of identification in the bottom margin. Calibration details are given in equation (5). The horizontal axis is calibrated to days. Point estimation of $H$ proceeds via Sims’ (2014) eigenvector method, with inference using these values in CUE GMM.

Assumption 3. Assume

1. $E[|\varepsilon_t|^{4+\rho}] < \infty$, $\rho > 0$, in all regimes for all $t = 1, 2, \ldots, T$,

2. As $T \to \infty$, $T_j \to \infty$ for all $j$.

Define

$$\bar\phi_t(\theta) = \phi_t(\theta) - E(\phi_t(\theta)), \qquad q_t(\theta) = \operatorname{vec}\left(\frac{\partial \phi_t(\theta)}{\partial \theta'}\right), \qquad \bar q_t(\theta) = q_t(\theta) - E(q_t(\theta)).$$

Lemma 1 provides asymptotic distributions for $\bar\phi_t(\theta_0)$ and $\bar q_t(\theta_0)$.

Lemma 1. Under Assumptions 1 & 3,

$$\psi_T(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^T \begin{pmatrix} \bar\phi_t(\theta_0) \\ \bar q_t(\theta_0) \end{pmatrix} \xrightarrow{d} \begin{pmatrix} \psi_\phi \\ \psi_{\theta_0} \end{pmatrix},$$

where $\psi = \begin{pmatrix} \psi_\phi \\ \psi_{\theta_0} \end{pmatrix}$ is a $2(n^2+n)$-dimensional normally distributed random variable with mean zero and positive semi-definite $2(n^2+n) \times 2(n^2+n)$-dimensional covariance matrix

$$V(\theta) = \begin{pmatrix} V_{\phi\phi}(\theta) & V_{\phi\theta}(\theta) \\ V_{\theta\phi}(\theta) & V_{\theta\theta}(\theta) \end{pmatrix} = \lim_{T\to\infty} \operatorname{var}\left[\frac{1}{\sqrt{T}} \begin{pmatrix} \bar\phi_T(\theta) \\ \bar q_T(\theta) \end{pmatrix}\right]$$

for $\bar\phi_T(\theta) = \sum \bar\phi_t(\theta)$.

Lemma 2 provides additional properties needed for the use of the estimated Jacobian in the $K$−statistic.


Lemma 2. Under Assumptions 1.3 & 3.1, the moment covariance matrix estimator $\hat V(\theta_0)$ satisfies

$$\hat V(\theta_0) \xrightarrow{p} V(\theta_0)$$

and

$$\frac{\partial \operatorname{vec}\left(\hat V_{\phi\phi}(\theta_0)\right)}{\partial \theta'} \xrightarrow{p} \frac{\partial \operatorname{vec}\left(V_{\phi\phi}(\theta_0)\right)}{\partial \theta'}.$$

These two lemmas, proven in the Appendix, mirror those of Kleibergen (2005), and similarly yield Theorem 1 of that paper:

Theorem 1. If Lemmas 1 and 2 hold,

$$K(\theta_0) \xrightarrow{d} \chi^2_{n^2+n}.$$

This offers an asymptotic distribution for the $K$−statistic as defined above under the enumerated assumptions.
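In code, the resulting full-vector robust test reduces to comparing the statistic to a $\chi^2_{n^2+n}$ critical value. The sketch below uses the $S$−statistic and reuses the `S_T` helper sketched in Section 2.1; the $K$−statistic would additionally require the estimated Jacobian:

```python
from scipy.stats import chi2

def s_test_full_vector(phi_at_theta0, level=0.95):
    """Robust test of H0: theta = theta_0. phi_at_theta0 is the
    T x (n^2+n) array of moments evaluated at theta_0; the degrees
    of freedom equal the number of moments (6 when n = 2)."""
    stat = S_T(phi_at_theta0)          # CUE objective at theta_0
    df = phi_at_theta0.shape[1]
    return stat, stat > chi2.ppf(level, df)
```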

Monte Carlo: full vector  Monte Carlo evidence shows that tests based on these identification-robust asymptotics perform far better than Wald tests. The distributions of $t$−statistics in the previous section suggest that strong identification asymptotics may break down, exemplified by Dufour’s (1997) unit-size result for the Wald test. To investigate this, Monte Carlo simulations are run using the calibrations of equation (5) and Table 1. For each sample size and degree of identification, 10,000 draws were taken, and Wald statistics, $S$−statistics, and $K$−statistics calculated to test the null hypothesis of the true parameter vector (a six-restriction test). These are compared to the appropriate limiting distribution 5% critical values ($\chi^2_6(0.95)$). The resulting rejection rates are presented in Table 2. The Wald tests exhibit extremely large size distortions, in alignment with Dufour’s asymptotic unit-size result, which improve with the strength of identification. For a given $\delta$, increasing $T$ increases the strength of identification. The $S$−tests and $K$−tests, however, are not systematically affected by the degree of identification, as expected of robust tests. Their size distortion does decrease with sample size, which is indicative of the respective asymptotics “kicking in”, not a lack of robustness. In a macroeconomic sense, it appears that performance of Wald-based inference approaches an acceptable level only for variance changes orders of magnitude larger than those observed empirically.


Table 2: Size of tests on the full parameter vector

               δ̂/10           δ̂            δ̂ × 10
            Wald   S/K    Wald   S/K    Wald   S/K
T = 375     91.6   13.3   70.8   13.9   25.6   13.3
T = 750     91.5   10.1   64.9    9.7   16.9   10.0
T = 1500    91.57   7.9   58.3    8.1   11.7    8.1

Rejection rate of the true parameter vector based on 10,000 Monte Carlo draws. While $S$ and $K$ results are not identical, they are the same to rounding error, and thus displayed together. Calibration details are given in equation (5). Estimation via CUE GMM.

Subset inference  A key challenge in the weak identification literature is inference on a subset of the parameter vector. Kleibergen (2005) offers a further refinement over Theorem 1, providing a limiting distribution for estimates of a sub-vector of parameters, provided the concentrated Jacobian, based on the remaining parameters, has full rank. Previous work with identification via heteroskedasticity has skirted the problem of subset inference except in the simple case, leaving interested practitioners, should they worry about weak identification, to rely on projection methods based on the $S$−statistic (see Dufour (1997), Dufour & Taamouti (2005), Chaudhuri (2008), Chaudhuri & Zivot (2008)) and the result of Theorem 1. I characterize the circumstances in which the present setting will satisfy Kleibergen’s rank condition, allowing the direct application of the subset $K$−statistic, and less conservative critical values.

To apply Kleibergen’s (2005) results, I partition $\theta$ into the parameter(s) of interest, $\beta$, and the remainder, $\alpha$, and make the following assumption:

Assumption 4. Conditional on β, α is strongly identified.

This can be operationalized as a condition on the Jacobian,

$$J_\alpha(\alpha, \beta) = \lim_{T\to\infty} E\left[\frac{1}{T} \sum_{t=1}^T \left(\frac{\partial \phi_t(\alpha, \beta)}{\partial \alpha'}\right)\bigg|_{\alpha,\beta}\right] = E\left[\left(\frac{\partial \phi_t(\alpha, \beta)}{\partial \alpha'}\right)\bigg|_{\alpha,\beta}\right]$$

having full rank. Under Assumption 4, Kleibergen (2005) proves the following Theorem.

Theorem 2. If Lemmas 1 and 2 hold, then under Assumption 4,

$$K(\beta_0) \xrightarrow{d} \zeta_{p_{int}},$$

where $\zeta_{p_{int}}$ is a $\chi^2$ random variable with $p_{int}$ degrees of freedom, and $p_{int}$ is the dimension of $\beta$.


Theorem 2 establishes that the subset $K$−statistic on the parameters of interest is asymptotically distributed chi-square with degrees of freedom equal to the number of parameters of interest instead of the length of the full parameter vector. The proof of Kleibergen (2005) suffices under Lemmas 1 and 2 and Assumption 4. I henceforth refer to this as a “plug-in” statistic due to the approach taken with the nuisance parameters in the calculation of the statistic. The subset result, due to Kleibergen (2005), extends Stock & Wright’s (2000) result for the concentrated $S$−statistic, which holds only when $\beta$ is the full set of weakly identified parameters.

I now characterize systems of equations identified via heteroskedasticity that satisfy Assumption 4. Since the identification results presented here are global, they also imply local identification, and thus imply the Jacobian condition, Assumption 4. I restrict analysis to the cases where $\beta$ is a single element of $H$ or a single column of $H$. In policy analysis, the object of interest is generally either the immediate impact of one shock on one variable, or an IRF, for which a full column of $H$ is required.

Sentana & Fiorentini (2001) generalize the result described in Proposition 1 to partial identification. I reformulate their result in Proposition 3. First, define $r(M)$ as the “proportional row space” of matrix $M$. That is, it is the maximal set of rows in $M$ such that none are proportional.⁵

Proposition 3. For $H$ partitioned $[H_A \ \vdots\ H_B]$, $H_A$ is identified from the covariance matrices provided no row in the upper block of

$$\begin{bmatrix} \operatorname{vec}(\Sigma_{\eta_A 1}) & \operatorname{vec}(\Sigma_{\eta_A 2}) \\ r(\operatorname{vec}(\Sigma_{\eta_B 1}), \operatorname{vec}(\Sigma_{\eta_B 2})) \end{bmatrix}$$

is proportional to another row.

Proof. This is a restatement of Sentana & Fiorentini (2001) Proposition 4, with linear independence replaced with the weaker non-proportionality, as explained in the Appendix under the proof of Proposition 1.

This result makes some progress with respect to Assumption 4. It offers partial identification when variance pathologies exist. However, in terms of inference, it shows only that if all elements in $H_B$ were fixed, the remainder would be identified, and Assumption 4 would be satisfied. However, given that the point of subset inference is to be able to choose a limited collection of parameters of interest and perform inference on them, having to fix all weakly identified parameters is not necessarily helpful. This result is loosely analogous to Theorem 3 in Stock & Wright (2000). Armed with the fact that attention can be limited to cases where the subset of interest is a single element or a single column, I extend the identification result of Proposition 3 from $H_A$ to all of $H$ in Theorem 3.

⁵ Contrast to the typical definition of the row space, which is further limited to not include bases that are any type of linear combination of the others.


Theorem 3. For $H$ partitioned $[H_A \ \vdots\ H_B]$, where $H_B$ contains at most two columns, $H$ is identified from the covariance matrices provided no row in the upper block of

$$\begin{bmatrix} \operatorname{vec}(\Sigma_{\eta_A 1}) & \operatorname{vec}(\Sigma_{\eta_A 2}) \\ r(\operatorname{vec}(\Sigma_{\eta_B 1}), \operatorname{vec}(\Sigma_{\eta_B 2})) \end{bmatrix}$$

is proportional to another row if

1. A single element $H_{lj}$ is fixed and $H_{jk} \neq 1/H_{lj}$ for $H^{(j)}, H^{(k)} \in H_B$, or

2. The full column $H^{(j)} \in H_B$ is fixed.

Proof. See Appendix.

This result is more useful in a subset testing context than Proposition 3. The Sentana & Fiorentini result yields partial identification for the columns of $H$ unaffected by variance deficiencies. When there is proportionality in the variances, this does not say anything about the affected columns of $H$. As noted above, this means Assumption 4 cannot be met unless all affected columns of $H$ are conditioned upon. By explicitly incorporating the information to be used in the null hypothesis of the subset test, I obtain conditional strong identification for the remainder of $H_B$ by Theorem 3. A system of equations satisfying the conditions of Theorem 3 meets Assumption 4, and the subset $K$−statistic is valid.

To ground this in the weak identification setting, recall the local-to-unity device used in Section 2.3. Using this approach, I characterize weak identification settings for which the subset $K$−statistic is valid.

Condition 1. If there are two variances that change by an $O(1/\sqrt{T})$ factor, and one is that of the shock of interest, then Assumption 4 is satisfied asymptotically.

While this result constitutes an improvement on the previous possibilities for subset inference, it warrants some important remarks.

Remark 1. The shock of interest must be one of those affected by the variance pathology. Given how identification via heteroskedasticity is usually applied, where focus is on changes in the variance of the shock of interest, this is not out of line with common practice.

Remark 2. Given how few variances are permitted to be affected by these pathologies, a researcher should err towards minimizing the number of series in the system of equations subject to the constraint that the reduced-form innovations span the structural shocks (invertibility).

Remark 3. Moreover, in settings with three or more variables, if only one is thought to exhibit substantial variance changes across the samples, as is often the case, then the subset $K$−statistic cannot be used. This is due to the fact that the series of interest is not one of the at most two affected by proportionality. In this case, robust inference would require the assumption of zero changes in the other series, and then estimation and inference can proceed using the closed-form methods of the simple case.

Table 3: Size of $t$−test, projection $S$−test, and plug-in $S$−test on $H_{12}$

               δ̂/10                       δ̂                         10 × δ̂
            t     Sproj  Splug/Kplug   t     Sproj  Splug/Kplug   t     Sproj  Splug/Kplug
T = 375     44.2  0.0    4.6           17.1  0.0    5.2           8.0   0.0    4.6
T = 750     41.2  0.0    5.0           12.2  0.0    5.0           6.3   0.0    4.6
T = 1500    37.3  0.0    5.3            9.4  0.0    5.0           5.8   0.0    4.9

Rejection rate of the true parameter value for $H_{12}$ based on 10,000 Monte Carlo draws. While $S$ and $K$ results are not identical, they are the same to rounding error, and thus displayed together. The $S_{proj}$ results are not identically zero, but round to 0.0. Calibration details are given in equation (5). Estimation via CUE GMM.

Monte Carlo: subset  Repeating the Monte Carlo assessment of size-distortion for subset tests demonstrates scope for improvement over standard procedures. Now I compare four testing approaches: the Wald test, the projected $S$−test based on Theorem 1, the newly proposed plug-in $K$−test based on Theorem 2, and a plug-in $S$−test. The results are displayed in Table 3. The projected $S$−test represents the best available option for inference in this general setting, absent the results of Theorem 3. First, as with tests on the full parameter vector, the standard Wald test is substantially oversized, though the problem is not as serious as for the full vector. As identification gets stronger, the distortion shrinks. The $S$−test based on projection methods is substantially undersized, rejecting extremely rarely. The rejection rate is effectively zero in simulation. However, the plug-in method using both the $K$− and $S$−statistics is consistently well-sized, regardless of the degree of identification or sample size.

In contrast, the “Fieller’s Method” strategy employed in Nakamura & Steinsson (2017) faces the same downsides as the standard projection tests. Nakamura & Steinsson construct their weak identification robust confidence intervals using a bootstrap-based version of Fieller’s Method, which they draw from Staiger, Stock, & Watson (1997). Asymptotically, this coincides with an $S$−statistic. This method allows them to test estimates of $H_{21}$, under the assumption of a single variance change. Their approach consists of constructing the test statistic

$$\omega(H_{21}) = \Delta\sigma_{\eta_1\eta_2} - H_{21}\, \Delta\sigma^2_{\eta_1}$$

for candidate values of $H_{21}$, for a series of bootstrap draws from the sample. The 95% confidence interval then consists of the range of values of $H_{21}$ between the 2.5% and 97.5% quantiles of $\omega(H_{21})$. This approach can only be immediately applied to the simple case where one structural variance changes; as such, it is not suited to the exercise of Table 3, or the fully general model allowing multiple variance changes. In theory, this method could be applied to other parameters or combinations of parameters in a more relaxed model, but it is not immediately possible to test each parameter separately; extending the method to more complicated specifications of $\omega(\cdot)$ would require an argument of several parameters to be specified by the researcher, and returns to the standard projection problem. In other words, the use of a bootstrap approach for constructing an $S$−type statistic does not alter its limitations for subset inference. The bootstrap procedure might offer finite-sample improvements over the projection $S$−test. However, even assuming strong identification, Brüggemann, Jentsch, & Trenkler (2016) show that standard bootstrapping schemes (pairwise and wild) may fail in the presence of conditional heteroskedasticity. They propose a block bootstrap that is asymptotically valid under strong identification, but offer no discussion of weak identification.

3.2 Power improvements in subset testing

The power improvements offered by subset tests using smaller critical values are demonstrated in Monte Carlo simulation. To do so, the previous simulation approach is modified. I fix $T = 750$, but still consider a range of strengths of identification. I now test the null hypothesis of $H_{12} = 0.3787$ against a sequence of local alternatives, which are used to generate the data. Figure 3 presents power curves based on these simulations. Note that these are not size-adjusted; this is necessary to demonstrate the relative advantage of using the plug-in approaches over the projection approach. Since both are based on an identical $S$− or $K$−statistic, it is necessary to compare the values to asymptotic critical values to differentiate the tests. The strength of the plug-in approach is that it justifies using less conservative critical values in testing, not a new test statistic. At hypotheses relatively close to the true parameter value, the Wald test rejects with a significantly higher probability than either $S$−test or the plug-in $K$−test. For the two weaker identification calibrations, power of the $S$− and $K$−tests gradually increases before increasing dramatically to close the gap on the Wald test at distant alternative values. For the strong identification calibration, the power of the $S$− and $K$−tests rises more steadily. For all degrees of identification, there is a large spike in power around $H_{12} = 1.5$. Figure 5 suggests one possible explanation; under this particular calibration, the asymptotic variance of the parameter estimates approaches zero around this value. As it gets closer to zero, it is natural that the rejection rate of tests on that parameter increases dramatically. Naturally, the spike occurs for the weaker identifications earlier, as the variance is near-zero for a wider range of alternatives due to noisier estimation. A smaller effect can be seen for the Wald statistic in each parametrization. Naturally, the Wald test has better power properties the stronger the identification. The most important point is that the plug-in tests outperform the projection tests universally.

Figure 3: Power. Power curves formed from estimates of rejection rates of the null hypothesis ($H_{12} = 0.38$) against a sequence of local alternatives ($x$−axis) based on 1,000 Monte Carlo draws. Cubic smoothing was employed to smooth curves. The far-right spikes in the robust methods are discussed in the text and Figure 5. Calibration details are given in equation (5). Estimation via CUE GMM.

Figure 4 repeats the above exercise, but now plots size-adjusted power instead of power for the Wald test, the $K$−test, and the $S$−test (since size-adjusted power of both $S$−tests is mechanically identical). It is clear that there is power loss due to using robust inference, particularly for alternatives close to the truth, but this must be weighed against the substantial size distortion of the Wald tests. The plug-in tests do, however, offer a substantial improvement over projection tests. It should also be noted that, particularly in more specific applied contexts, there is significant scope for more powerful robust tests (for example, the weighted $J-K$ test or the conditional linear combination test of Andrews (2016)).


Figure 4: Size-adjusted power. Size-adjusted power curves formed from estimates of rejection rates of the null hypothesis ($H_{12} = 0.38$) against a sequence of local alternatives ($x$−axis) based on 1,000 Monte Carlo draws. The critical values are based on quantiles from 10,000 Monte Carlo draws. Cubic smoothing was employed to smooth curves. The far-right spikes in the robust methods are discussed in the text and Figure 5. Calibration details are given in equation (5). Estimation via CUE GMM.


Figure 5: Calibration details. The asymptotic variance of the estimator of $H_{12}$ computed using an arbitrarily long time series to estimate the moment covariance matrix at the true value; the asymptote to zero around a local alternative of 1.5 may explain the spike in rejection rate visible in the power curves, Figures 3 and 4.

4 Tests of weak identification

Tests for weak identification pose a challenge in general GMM settings. It is straightforward to reparametrize the GMM framework to afford tests of identification (i.e., $\delta = 0$); however, the suitable null is rather the presence of weak identification ($\delta$ close enough to 0 to cause problems in estimation). The simple case coincides with just-identified linear IV with a single endogenous variable, and a rule of thumb based on bias is provided, drawing on the results of Staiger & Stock (1997) and Stock & Yogo (2005), specialized to a setting with heteroskedasticity. For the general case, I recommend the approach of Andrews (2017). Lütkepohl & Milunovich (2016) do offer tests of identification in systems identified via heteroskedasticity, but their work mainly considers GARCH-type models and makes distributional assumptions to allow maximum likelihood methods.

4.1 Single-variance change rule of thumb

The analogy to the just-identified linear IV case with a single endogenous regressor is no doubt an appealing feature of simpler systems of equations, and has been employed repeatedly in the literature. With that simplicity comes a familiar test. However, it must be noted that using this model fails to utilize all of the available moment conditions, and may pose misspecification problems if the assumption that other variances do not change is faulty (as discussed in the empirical application). Given the apparent preference for the simplified model in applied work, I discuss it anyway. This setting suggests utilizing the approach of Staiger & Stock (1997) and Stock & Yogo (2005), which yields the oft-cited “$F > 10$” rule of thumb. The first-stage $F$−statistic is based on the first-stage regression,

$$\eta_{2t} = \Pi Z_t + \nu_t = \Pi \left[\mathbf{1}(t \in T_P) \times \frac{T}{T_P} - \mathbf{1}(t \in T_C) \times \frac{T}{T_C}\right] \eta_{2t} + \nu_t.$$

Stock & Yogo (2005) derive expressions for the worst-case asymptotic bias (relative to OLS) and size for a given degree of identification governed by $\Pi$, but assume homoskedasticity. Fortunately, Montiel Olea & Pflueger (2013) develop a test that is valid under the presence of arbitrary heteroskedasticity, autocorrelation, or clustering. They use a “scaled $F$−statistic”, which, in this simplest setting, reduces to the standard $F$−statistic. They compare the Nagar bias of a TSLS estimator to a “worst case” benchmark. The Nagar bias approximates the distribution of the TSLS estimator under weak identification using a second-order Taylor approximation and computes the bias of that distribution. The benchmark coincides with the OLS comparison of Staiger & Stock (1997) and Stock & Yogo (2005) under those papers’ distributional assumptions. Table 4 presents the critical values for the first-stage weak instruments test based on TSLS bias for the 5% significance level from Montiel Olea & Pflueger (2013), analogous to Table 5.1 in Stock & Yogo (2005). Additional simulations, not reported here, based on an identification via heteroskedasticity DGP, produced critical values within Monte Carlo error of those of Montiel Olea & Pflueger. The size exercise of Stock & Yogo (2005) is not pursued, as this setting makes it impossible to perform their controlled comparison since the worst-case size depends on the covariance between first- and second-stage residuals. This covariance depends mechanically on the degree of identification, resulting in non-monotonicity in the relationship between the degree of identification and worst-case size, rendering any threshold rule inappropriate. The generally-adopted threshold of $F > 10$ corresponds to the critical value for relative bias of 10%, which here translates to a new rule of thumb of $F > 23$, which can easily be adopted in future work in this simple setting.

Table 4: Critical values for first-stage $F$−test based on TSLS bias (5% significance level)

Bias             0.05    0.1     0.2     0.3
Critical Value   37.42   23.11   15.06   12.05

Critical values for the first-stage $F$−statistic from Montiel Olea & Pflueger (2013); estimates deliver bias less than or equal to that listed in 95% of samples.
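A sketch of this diagnostic, computing the first-stage $F$−statistic with a heteroskedasticity-robust (HC0) variance and applying the $F > 23$ threshold; the helper name and the HC0 choice are mine:

```python
import numpy as np

def first_stage_F(eta2, is_policy):
    """First-stage F for eta_2t = Pi * Z_t + nu_t, where Z_t rescales
    eta_2t by regime. Uses an HC0 robust variance for Pi."""
    T = len(eta2)
    Z = np.where(is_policy, T / is_policy.sum(),
                 -T / (~is_policy).sum()) * eta2
    Pi = np.sum(Z * eta2) / np.sum(Z * Z)
    nu = eta2 - Pi * Z
    var_Pi = np.sum(Z**2 * nu**2) / np.sum(Z * Z) ** 2   # HC0 variance
    F = Pi**2 / var_Pi
    return F, F > 23.0   # rule of thumb: ~10% worst-case relative bias
```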

4.2 Method for the general case

In the more general case, more complex methods are required. Tests of the null of non-identification are straightforward, as noted in Wright (2001), but testing of weak identification in general GMM has long been a problem. However, Andrews (2017) offers a two-step strategy, which can be adopted here. The necessary assumptions for its application are discussed and verified in the Appendix. Briefly, given a user-specified maximum allowable size distortion, $\xi$, a preliminary robust confidence set is constructed to have asymptotic coverage $1 - \nu - \xi$, regardless of identification. Then, $1 - \nu$ non-robust and robust confidence sets are constructed. If the preliminary confidence set is contained by the non-robust confidence set, the non-robust confidence set is adopted (strong identification); otherwise, the robust set should be used (weak identification). Asymptotically, the probability of making the correct determination converges to unity. These confidence sets can be constructed using $K$−statistics and Wald tests as elsewhere in this paper; Andrews, however, considers a linear combination of the $K$− and $S$−statistics due to favourable power properties. The interested reader should review the original paper for a more thorough discussion of the test’s properties and the choice of statistics to be used in its construction. Determination of the strength of identification can be conducted with respect to the full parameter vector or a subset of parameters of interest. These tests are applied in the empirical application.
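Schematically, the determination reduces to a containment check. The sketch below represents each confidence set as an interval for simplicity; in general the sets need not be intervals, so this illustrates the logic rather than Andrews' (2017) implementation:

```python
def two_step_determination(cs_prelim, cs_nonrobust, cs_robust):
    """Andrews (2017)-style two-step rule with interval confidence sets
    given as (lo, hi) tuples. cs_prelim has coverage 1 - nu - xi; the
    other two have coverage 1 - nu."""
    lo_p, hi_p = cs_prelim
    lo_n, hi_n = cs_nonrobust
    if lo_n <= lo_p and hi_p <= hi_n:      # preliminary set contained
        return cs_nonrobust, "strong identification"
    return cs_robust, "weak identification"
```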

5 Robust inference on impulse responses

The ability to perform robust inference on $H$ also enables inference on another object of interest, the impulse response function (IRF). $H$ is not the ultimate object of interest in many applications of identification via heteroskedasticity. Indeed, much macroeconomic policy analysis relies on the IRFs generated by SVARs. These are formed via non-linear combinations of the lag coefficients, $A(L)$, and $H$. The literature on this topic is nascent. Montiel Olea, Stock & Watson (2016) develop a method that exploits the linearity of the external instruments problem to offer an elegant solution in that context. Chevillon, Mavroeidis, & Zhan (2016) consider the case of long-run restrictions and offer a projection-based method that also accounts for cointegration issues they face. Neither of those methods can be employed for identification via heteroskedasticity. What follows is a general method that can be used across SVAR identification schemes with simple modifications, although it cannot immediately address the cointegration issues faced by Chevillon, Mavroeidis, & Zhan (2016).

Using a test-inversion approach, it is possible to compute confidence sets for IRFs. A structural impulse response function at horizon $h$ is computed as

$$\Lambda^0 = H, \qquad \Lambda^h = \sum_{v=1}^{h} \Lambda^{h-v} A_v, \quad h = 1, 2, \ldots, \tag{6}$$


where $A_v$ denotes the lag coefficient matrix corresponding to the $v$th lag. The response of the $j$th variable to a unit shock to $k$ can be read off as the $jk$ element of this object. It is helpful to define the related object, $B^h$, given by

$$B^0 = I, \qquad B^h = \sum_{v=1}^{h} B^{h-v} A_v, \quad h = 1, 2, \ldots,$$

such that $\Lambda^h = B^h H$. Thus, $B^h = B^h(A(L))$, entirely a function of the lag coefficients. An element of interest in $\Lambda^h$, $\Lambda^h_{ij}$, is the product of the $i$th row of $B^h$ and the $j$th column of $H$. I now present conditions under which the remainder of $\Lambda^h$ is strongly identified conditional on $\Lambda^h_{ij}$. First consider an analog to the partial identification result of Proposition 3:

Proposition 4. If $B^h$ is invertible, for $\Lambda^h$ partitioned $[\Lambda^h_A \ \vdots\ \Lambda^h_B]$, $\Lambda^h_A$ is identified up to scale from the covariance matrices provided no row in the upper block of

$$\begin{bmatrix} \operatorname{vec}(\Sigma_{\eta_A 1}) & \operatorname{vec}(\Sigma_{\eta_A 2}) \\ r(\operatorname{vec}(\Sigma_{\eta_B 1}), \operatorname{vec}(\Sigma_{\eta_B 2})) \end{bmatrix}$$

is proportional to another row.

Proof. See Appendix.

This follows the same argument as Proposition 3, simply replacing $H$ with $B^h H$, with invertibility of the product guaranteed by invertibility of $B^h$ and $H$. It says that IRFs to shocks with non-proportional variance processes are strongly identified. This leads to a stronger result.

Theorem 4. For $\Lambda^h$ partitioned $[\Lambda^h_A \ \vdots\ \Lambda^h_B]$, where $\Lambda^h_B$ contains two columns, $\Lambda^h_A$ is identified up to scale from the covariance matrices provided

1. $B^h$ is invertible,

2. No row in the upper block of $\begin{bmatrix} \operatorname{vec}(\Sigma_{\eta_A 1}) & \operatorname{vec}(\Sigma_{\eta_A 2}) \\ r(\operatorname{vec}(\Sigma_{\eta_B 1}), \operatorname{vec}(\Sigma_{\eta_B 2})) \end{bmatrix}$ is proportional to another row,

3. A single element $\Lambda^h_{ij}$ is fixed and $\Lambda^h_{jk} \neq 1/\Lambda^h_{ij}$ for $\Lambda^{h(j)}, \Lambda^{h(k)} \in \Lambda^h_B$.

Proof. See Appendix.

This is an extension of Theorem 3, again simply replacing $H$ with $\Lambda^h = B^h H$. Identification up to scale may be unfamiliar for IRFs, but recall that IRFs to each shock are always implicitly scaled by the normalization of the corresponding column of $H$. Theorem 4 implies that, fixing an element of interest in $\Lambda^h$, subject to the same remarks made following Theorem 3, an analog to Assumption 4 is met, and inference can proceed using $\chi^2_1$ critical values, as shown in Theorem 2. This is a standard test-inversion problem.
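For reference, the recursion behind $\Lambda^h = B^h H$ (equation (6)) takes only a few lines; this sketch takes the lag matrices $A_1, \ldots, A_p$ as a list (the function name is mine):

```python
import numpy as np

def structural_irfs(H, A_list, horizons):
    """Compute Lambda^h = B^h H for h = 0..horizons, where B^0 = I and
    B^h = sum_{v=1}^{h} B^{h-v} A_v (A_v = 0 beyond the last lag)."""
    n = H.shape[0]
    B = [np.eye(n)]
    for h in range(1, horizons + 1):
        Bh = sum(B[h - v] @ A_list[v - 1]
                 for v in range(1, min(h, len(A_list)) + 1))
        B.append(Bh)
    return [Bh @ H for Bh in B]     # [Lambda^0, ..., Lambda^horizons]
```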

However, it is worth noting how this result can be applied in a computationally efficient manner in practice. The following discussion assumes Theorem 4 applies. Recall that, asymptotically, moments identifying reduced-form VAR coefficients ($A(L)$) and those identifying $H$ have block-diagonal covariance; see, e.g., Lütkepohl (2006). This fact will be useful, but some work is required first. Define a moment equation $f_t(A, \Lambda^h, \Sigma_C, \Sigma_P)$ such that it stacks the moments yielding the reduced-form VAR coefficients, $f^A_t(A)$, and those pertaining to $\Lambda^h$,

$$f^\Lambda_t(A, \Lambda^h, \Sigma_C, \Sigma_P) = \operatorname{vec}\begin{pmatrix} \mathbf{1}[t \in T_C]\left(B^h \eta_t \eta_t' B^{h\prime} - \Lambda^h \Sigma_C \Lambda^{h\prime}\right) \\ \mathbf{1}[t \in T_P]\left(B^h \eta_t \eta_t' B^{h\prime} - \Lambda^h \Sigma_P \Lambda^{h\prime}\right) \end{pmatrix}.$$

Conditional on some $\Lambda^h_{ij}$, the remainder of the parameters can be strongly identified from these moments by Theorem 4. Robust inference for $\Lambda^h_{ij}$ can then proceed using $\chi^2_1$ critical values and the efficiently-weighted GMM objective function based on $f^\Lambda_t(\cdot)$, denoted $S^\Lambda_T(\cdot)$. Note that $f^\Lambda_t$ can be rewritten as
$$\mathrm{vec}\begin{pmatrix} 1[t \in T_C]\left(B^h \eta_t \eta_t' B^{h\prime} - B^h H \Sigma_C H' B^{h\prime}\right) \\ 1[t \in T_P]\left(B^h \eta_t \eta_t' B^{h\prime} - B^h H \Sigma_P H' B^{h\prime}\right) \end{pmatrix} = \mathrm{vec}\begin{pmatrix} B^h \, 1[t \in T_C]\left(\eta_t \eta_t' - H \Sigma_C H'\right) B^{h\prime} \\ B^h \, 1[t \in T_P]\left(\eta_t \eta_t' - H \Sigma_P H'\right) B^{h\prime} \end{pmatrix},$$

since all moments are simply scaled by $B^h$. Because the efficient weighting matrix will be used, $B^h$ can be ignored: it is applied as a constant matrix and will drop out once the moments are re-weighted by the inverse of the moment covariance. In particular, for
$$f^H_t(H, \Sigma_C, \Sigma_P) = \mathrm{vec}\begin{pmatrix} 1[t \in T_C]\left(\eta_t \eta_t' - H \Sigma_C H'\right) \\ 1[t \in T_P]\left(\eta_t \eta_t' - H \Sigma_P H'\right) \end{pmatrix},$$

$$\Omega_{f^H}^{-1/2} f^H_t(H, \Sigma_C, \Sigma_P) = \Omega_{f^\Lambda}^{-1/2} f^\Lambda_t\left(A, \Lambda^h, \Sigma_C, \Sigma_P\right),$$
where $\Omega_{f^H}^{-1/2}$ and $\Omega_{f^\Lambda}^{-1/2}$ are the inverse square roots of the relevant moment covariances. Denote the GMM objective function corresponding to
$$f^{block}_t(A, H, \Sigma_C, \Sigma_P) = \begin{bmatrix} f^A_t(A) \\ \Omega_{f^H}^{-1/2} f^H_t(H, \Sigma_C, \Sigma_P) \end{bmatrix}$$
as $S^{block}_T(A, H, \Sigma_C, \Sigma_P)$. Then
$$S^\Lambda_T\left(A, \Lambda^h, \Sigma_C, \Sigma_P\right) = S^{block}_T(A, H, \Sigma_C, \Sigma_P),$$
where $\Lambda^h = B^h(A)H$; this mapping is unique since both $B^h$ and $H$ have been assumed to be invertible.
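As an illustration of this machinery, the sketch below builds the regime-split moments $f^H_t$ and an efficiently-weighted (CUE-style) objective from reduced-form residuals. It uses vec rather than vech for brevity, so the moment covariance is pseudo-inverted; the function names and implementation details are assumptions.

```python
import numpy as np

def fH_moments(eta, in_P, H, Sigma_C, Sigma_P):
    """Stacked moments f^H_t: regime-specific second-moment deviations.

    eta  : (T, n) reduced-form residuals
    in_P : (T,) boolean, True when t is in the high-variance regime T_P
    Returns a (T, 2*n*n) array of per-period moments.
    """
    T, n = eta.shape
    outer = np.einsum('ti,tj->tij', eta, eta)       # eta_t eta_t' for each t
    dev_C = outer - H @ Sigma_C @ H.T               # eta_t eta_t' - H Sigma_C H'
    dev_P = outer - H @ Sigma_P @ H.T
    f = np.zeros((T, 2 * n * n))
    f[~in_P, :n * n] = dev_C[~in_P].reshape(-1, n * n)  # 1[t in T_C] block
    f[in_P, n * n:] = dev_P[in_P].reshape(-1, n * n)    # 1[t in T_P] block
    return f

def S_statistic(f):
    """Efficiently-weighted GMM (CUE-style) objective: T * fbar' Omega^+ fbar."""
    T = f.shape[0]
    fbar = f.mean(axis=0)
    Omega = f.T @ f / T  # moment covariance estimate (mean ~0 at true values)
    return T * fbar @ np.linalg.pinv(Omega) @ fbar  # pinv: vec moments are redundant
```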


By transforming the moment functions to separate the reduced-form and structural parameters, as is the case in $f^{block}_t(\cdot)$, the asymptotic block diagonality of the covariance matrix of such moments can be exploited. Asymptotically, using the fact that the covariance of $f^A_t(\cdot)$ and $f^H_t(\cdot)$ is block diagonal, $S^{block}_T$ can be decomposed as
$$S^{block}_T(A, H, \Sigma_C, \Sigma_P) = S^A_T(A) + S^H_T(H, \Sigma_C, \Sigma_P).$$
Finally, combining results, this implies that, asymptotically,
$$S^A_T(A) + S^H_T(H, \Sigma_C, \Sigma_P) = S^\Lambda_T\left(A, \Lambda^h, \Sigma_C, \Sigma_P\right). \tag{7}$$

Armed with this decomposition, it is relatively straightforward to construct a confidence set for $\Lambda^h_{ij}$. From (7), $\lambda = \Lambda^h_{ij}$ can be in a $1-\nu$ confidence set from the right-hand side if and only if there is a pair $A, H$ in the $1-\nu$ joint confidence set from the left-hand side such that $[B^h(A)H]_{ij} = \lambda$. Thus, computing the values of $\lambda$ implied by the values of $A$ and $H$ in the $1-\nu$ joint confidence set yields a $1-\nu$ set for $\Lambda^h_{ij}$. As noted above, concentrated $S$-statistics for $\Lambda^h_{ij}$ based on $S^\Lambda_T$ can be compared to $\chi^2_1$ critical values; such values can thus also be used for the left-hand side. A confidence set can thus be constructed as follows:

1. Construct the $1-\nu$ Wald set for $A$ using the $\chi^2_1(1-\nu)$ critical value. Since $A$ is strongly identified, this is asymptotically equivalent to the set of values such that $S^A_T(A) \leq \chi^2_1(1-\nu)$.

2. Construct the $1-\nu$ concentrated robust confidence set for $H_{(j)}$ using the $\chi^2_1(1-\nu)$ critical value and $S^H_T(H_{(j)})$, even if there is more than one free parameter in $H_{(j)}$. (Steps 1 and 2, constructing preliminary confidence sets, are not strictly necessary; they allow the researcher to construct relevant sub-grids that focus the search on combinations of $A, H$ likely to lie within the joint confidence set, before working in a higher-dimensional space of possible values of $\Lambda^h_{ij}$.)

3. For each element $A^{(n)}$ of the first set and each element $H^{(m)}$ of the second, compute the implied value $\lambda^{(nm)}$ using $[B^h(A)H]_{ij} = \lambda$.

4. If $S^A_T(A^{(n)}) + S^H_T(H^{(m)}) \leq \chi^2_1(1-\nu)$, then $\lambda^{(nm)}$ is in the $1-\nu$ confidence set for $\Lambda^h_{ij}$.

The resulting set need not be connected, a common feature in weak identification settings. The test will be dominated by the confidence set for $H$, which almost always carries the greater uncertainty. The critical values used have one degree of freedom, matching the power of the method of Montiel Olea, Stock & Watson (2016). These values differ from those of Chevillon, Mavroeidis & Zhan (2016) due to their need to accommodate cointegration issues.
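A minimal sketch of steps 1 to 4 follows, assuming the researcher supplies the grids, the objective functions `S_A` and `S_H`, and the map from $A$ to $B^h$; all names are illustrative. It also exploits the fact that a candidate with $S^A_T(A) > \chi^2_1(1-\nu)$ can never satisfy the joint criterion, since $S^H_T \geq 0$.

```python
import numpy as np
from scipy.stats import chi2

def irf_confidence_set(A_grid, H_grid, S_A, S_H, Bh_of_A, i, j, nu=0.05):
    """Test-inversion confidence set for Lambda^h_ij = [B^h(A) H]_ij.

    A_grid, H_grid : candidate parameter values (e.g., from preliminary sets)
    S_A, S_H       : callables returning the GMM objectives S^A_T and S^H_T
    Bh_of_A        : callable mapping A to B^h(A)
    """
    crit = chi2.ppf(1 - nu, df=1)   # chi^2_1(1 - nu) critical value
    accepted = []
    for A in A_grid:
        sA = S_A(A)
        if sA > crit:               # step 1: outside the set for A alone, skip
            continue
        Bh = Bh_of_A(A)
        for H in H_grid:
            # step 4: joint acceptance region against the chi^2_1 critical value
            if sA + S_H(H) <= crit:
                accepted.append((Bh @ H)[i, j])   # step 3: implied lambda
    return np.array(accepted)  # union of accepted lambdas; need not be an interval
```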

6 Empirical application

To demonstrate the use of these results in practice, and the extent to which weak identification can impact applied work, I re-examine the data of Nakamura & Steinsson (2017). (I am very grateful to Emi Nakamura and Jón Steinsson for making their policy news series available to me.) They analyze the impact of policy shocks on nominal and real Treasury forward yields of varying maturities, performing Rigobon-style analysis as a robustness check on their main results. Their analysis uses two different policy shock series: one is the one-day change in the 2-year nominal Treasury yield, and the other is the 30-minute change in a "policy news" series, which they construct from several macro variables using Principal Components Analysis. (They also consider an additional instrument, one-day changes in the "policy news" series, but it is omitted here to permit a simple comparison between the two included policy measures.) They use weak-identification-robust techniques and express concern about weak identification problems for the response to 2-year nominal Treasury yields using a 1-day window; they believe that the 30-minute changes behave as a strong instrument. Their main approach "regresses" 1-day changes of the forward rates on announcement days on the 1-day change in 2-year nominal yields. In the Rigobon-style analysis, they use announcement days (Tuesdays or Wednesdays) as the high-variance regime and a sample of other Tuesdays and Wednesdays as the control period, or "low variance" sample. For more detail on the dataset, consult the original paper. For comparison, I also examine a specification using the authors' constructed 30-minute-window "policy news shocks".

The data considered here is restricted to a subset of that considered by Nakamura & Steinsson. The dataset was reconstructed using data on 2-year nominal yields and 2-year nominal and real forward rates; it spans 2004 to 2014, shorter than in the original paper, since the forward data is only publicly available back to 2004. Note that Nakamura & Steinsson (2017) also analyze 5- and 10-year forwards, but only the shortest maturity is selected for the current exercise. Their analysis takes observed changes in the interest rates to be reduced-form residuals. This means that for the purpose of applying an IRF methodology, modifications must be made, as discussed below.

6.1 Specification & identification

Nakamura & Steinsson (2017) restrict only the variance of policy shocks to change on announcement days. This places the analysis in the simple two-variable, one-shock setting, with analogy to just-identified linear IV with a single endogenous regressor. However, this paper proposes a generalized estimation framework, making use of all moments and allowing for the possibility that the variances of both structural shocks might change. Economically, it might be the case that only the variance of the policy shock should change, but if that is the case, the restriction need not be imposed mechanically, as the estimation will bear it out.
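As a sketch of this unrestricted two-variable estimation, the snippet below reuses the `fH_moments` and `S_statistic` helpers sketched earlier; the parameter packing (unit-diagonal H and four free variances) mirrors the quantities reported in Table 5, while the optimizer choice and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def unpack(theta):
    """theta = (H12, H21, s2_sC, s2_iC, s2_sP, s2_iP); unit-diagonal H."""
    H = np.array([[1.0, theta[0]], [theta[1], 1.0]])
    Sigma_C = np.diag(theta[2:4])   # control-regime structural variances
    Sigma_P = np.diag(theta[4:6])   # announcement-regime structural variances
    return H, Sigma_C, Sigma_P

def cue_objective(theta, eta, in_P):
    """Continuously-updated GMM objective using both regimes' moments."""
    H, Sigma_C, Sigma_P = unpack(theta)
    f = fH_moments(eta, in_P, H, Sigma_C, Sigma_P)  # moments sketched earlier
    return S_statistic(f)

# Illustrative call: eta holds reduced-form residuals, in_P flags announcement days.
# est = minimize(cue_objective, x0, args=(eta, in_P), method='Nelder-Mead')
```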

Table 5 reports estimates for this unrestricted model. For the 30-minute policy news shock, the results are extremely close to those reported by Nakamura & Steinsson for their restricted model (on a slightly longer dataset): compare 0.94 to 0.96 and 1.13 to 1.10. This suggests that for a policy shock series that incorporates several dimensions of policy news and is calculated over a short window, the assumption that the policy series captures almost all of the heteroskedasticity in the true policy shocks is valid, and thus the single-variance-change assumption is innocuous. Further evidence of this is the fact that the policy shock variances are estimated to be the same regardless of the dependent variable, and the estimated shocks themselves are similar. On the other hand, the results for the 1-day change in the nominal yield are starkly different. The values of the H parameters make less economic sense; the sizable values of H21 are highly surprising, as are their opposite signs for the nominal and real series. There should be no instantaneous impact on policy rates from Treasuries in daily data. The lower pass-through to nominal forwards is also at odds with the other results. Rather than suggesting that there is something wrong with the unrestricted estimation procedure, this is indicative of weak identification.

To assess these fears more rigorously, I apply the proposed tests of identification; they bear out the suspicions noted in the original paper. Because this model can be cast in the two-variable, one-variance-change, just-identified IV setting, as it is by Nakamura & Steinsson, the first-stage F-statistic critical values can be used, as if the researcher intended to estimate that restricted model. These results are reported in the first panel of Table 6. For the 1-day nominal yield policy series, the first-stage F-statistic is very low compared to all of the critical values computed, and thus there is evidence of weak identification based on this bias metric. For the 30-minute policy news shock, the first-stage F-statistic is large and surpasses all critical values. Applying the general two-step size-based test without making an assumption on variance changes, reported in the second panel, shows similar results. The 1-day nominal yield policy variable gives weak identification at each level of acceptable distortion, for both dependent variables. For the 30-minute policy news shock, there is evidence of weak identification only at the 5% distortion threshold, for both dependent variables, suggesting fairly strong identification. These results comport well with intuition, Nakamura & Steinsson's findings, and the discussion of estimates above.

Table 5: Estimates

                 | Real, yield | Nominal, yield | Real, policy news | Nominal, policy news
H12              | 1.10        | 0.38           | 0.94              | 1.13
H21              | -0.39       | 0.65           | -0.0015           | 0.0003
σ²_{s,C}         | 0.0025      | 0.0042         | 0.0034            | 0.0042
σ²_{i,C}         | 0.0001      | 0.0004         | 0.0001            | 0.0001
σ²_{s,P}         | 0.0044      | 0.0065         | 0.0051            | 0.0056
σ²_{i,P}         | 0.0002      | 0.0011         | 0.0008            | 0.0008

GMM CUE estimates allowing for changes in all variances, using 1-day changes in 2-year real or nominal forward rates as the "dependent" variable and 1-day changes in the 2-year nominal yield or 30-minute changes in Nakamura & Steinsson's policy news series as the "independent" variable.

Table 6: Tests of Identification

                      | First-stage F (bias)          | Andrews 2-step (size)
                      | F       | 0.2  | 0.1  | 0.05  | 0.2  | 0.1  | 0.05
Real, 1-day shock     | 5.77    | ×    | ×    | ×     | ×    | ×    | ×
Nominal, 1-day shock  |         |      |      |       | ×    | ×    | ×
Real, 30-min shock    | 1073.50 | ✓    | ✓    | ✓     | ✓    | ✓    | ×
Nominal, 30-min shock |         |      |      |       | ✓    | ✓    | ×

The first panel tests each shock series using the first-stage F-statistic bias-based critical values in Table 4; the F-statistic is a property of the shock series, so it appears once per series. The second panel conducts the Andrews two-step size test for each specification. ✓ indicates identification is deemed strong at the given distortion level; × indicates weak identification.

6.2 Performance of tests

The simulation evidence presented previously offers insight into the performance of tests in this empirical context. Nakamura & Steinsson's AR-type test for H12 yields very wide confidence sets for the Rigobon procedure. Although they do not compare inference techniques, this suggests that standard methods likely perform poorly. The Monte Carlo results of Section 3 establish that this is the case, using a calibration based on the 1-day-window nominal yield and nominal forward specification. As shown in Table 2, a Wald test of the full parameter vector is found to have size tending to unity, and distortions are prohibitive even for a variance change an order of magnitude larger than is present in the data and empirical sample size. Even in the best-case calibration considered, the Wald test is unambiguously outperformed by the robust tests considered. Turning to a subset of parameters, Table 3 shows that a t-test on H12 yields size distortions between 5 and 15% for all but the most strongly identified calibrations, while the robust tests perform well. These results imply that, empirically, there may be substantial returns to adopting robust testing procedures in terms of size distortion. While there is power loss compared to non-robust methods, the tests proposed are more powerful than those resulting from any projection-type argument.


6.3 Application to IRFs

The application to monetary policy shocks provides an opportunity to assess the performance of the proposed method for computing IRF confidence sets and to demonstrate the difference in performance relative to conventional techniques in the presence of weak identification. However, the precise setting considered by Nakamura & Steinsson follows an event-study approach, and thus does not directly permit the computation of an IRF, as it considers disjoint series of days. For the purpose of this exercise, I adopt a new framework in the spirit of that paper. Instead of considering a sample of control and event days, I consider the full time series of the daily yield on 2-year nominal Treasuries and the daily yield on 2-year nominal Treasury futures for all trading days between January 2004 and November 2015. I then estimate a VAR(1) on this data:

$$y_t = \begin{pmatrix} s_t \\ i_t \end{pmatrix} = A_1 y_{t-1} + \eta_t.$$
Now, instead of being raw 1-day changes in the variables, the residuals are the components of the time series not predictable from prior data. Estimation yields
$$A_1 = \begin{pmatrix} -0.078 & -0.086 \\ 0.079 & 0.064 \end{pmatrix}.$$
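A minimal sketch of this estimation step, assuming `y` holds the two daily yield series; for a VAR(1), $B^h = A_1^h$, so the earlier IRF recursion simplifies accordingly.

```python
import numpy as np

def estimate_var1(y):
    """OLS estimate of y_t = A1 y_{t-1} + eta_t (no intercept, as written above).

    y : (T, n) array of daily yields
    Returns A1 (n, n) and residuals eta ((T-1, n)).
    """
    Y, X = y[1:], y[:-1]
    A1 = np.linalg.lstsq(X, Y, rcond=None)[0].T  # solves X @ A1' = Y
    eta = Y - X @ A1.T
    return A1, eta

# For a VAR(1), B^h = A1^h, so Lambda^h = A1^h @ H at horizon h.
```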

The point estimates are computed using GMM. Note that, unsurprisingly given the presence of weak identification, the estimated value of H is dramatically different from that on the main dataset: the value of H12 is now 2.37. The proposed test-inversion procedure is then applied. The block-diagonal structure of the moment covariance is exploited to form separate preliminary confidence sets for A and H. Combining the gridded values and computing the S-statistic yields the set of response points that cannot be rejected by the test at the 95% level. For comparison, two other confidence sets are displayed. The first is a standard Wald 95% confidence set using GMM, the multivariate delta method, and standard χ² critical values. The second is a 95% confidence set computed using a conventional block bootstrapping procedure with block length equal to sixty days.
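For the bootstrap comparison, here is a sketch of one conventional moving-block scheme with the stated block length; the paper does not spell out the exact resampling details, so this variant, like the `estimate_H` helper, is an assumption.

```python
import numpy as np

def block_bootstrap_irf(y, horizon, i, j, estimate_H, n_boot=999, block=60, seed=0):
    """Moving-block bootstrap percentile CI for Lambda^h_ij in a VAR(1).

    estimate_H : callable mapping VAR residuals to an estimate of H
    """
    rng = np.random.default_rng(seed)
    T = y.shape[0]
    starts_all = np.arange(T - block + 1)
    draws = []
    for _ in range(n_boot):
        starts = rng.choice(starts_all, size=int(np.ceil(T / block)))
        yb = np.vstack([y[s:s + block] for s in starts])[:T]  # resampled series
        A1, eta = estimate_var1(yb)            # VAR(1) estimator sketched earlier
        H = estimate_H(eta)
        draws.append((np.linalg.matrix_power(A1, horizon) @ H)[i, j])
    return np.percentile(draws, [2.5, 97.5])   # 95% percentile interval
```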

Figure 6 presents the results. The confidence sets become exponentially narrower at further horizons, in keeping with the standard behaviour of VAR IRFs, as the $B^h$ component in $\Lambda^h$ tends towards a very precisely estimated zero. The bootstrap set captures the asymmetry of the problem, but is virtually contained within the much wider robust set. These results show that inference based on non-robust confidence sets is likely to be misleading and to exhibit substantial size distortions.


The proposed method for computing robust confidence sets appears advisable whenever a researcher is concerned that the data may suffer from weak identification. In this example, using the standard confidence sets would be highly misleading as to the precision of the result. As noted above, this method carries over to weak identification issues present in other SVAR identification schemes by merely changing the moment equations for H accordingly. In economic terms, when identification may be weak, non-robust inference makes overstating the significance of a policy impact likely.

7 Conclusion

This paper provides a comprehensive framework for practitioners to estimate and conduct inference on models identified via heteroskedasticity, in particular SVARs. It presents a conception of how weak identification may arise in this context and models the deficiency through which it arises. Simulations demonstrate that such identification issues significantly impact the performance of standard inference techniques, leading to substantial size distortions. The problem is presented in a standard GMM framework, and asymptotic distributions for test statistics are derived.

I establish the validity of non-conservative subset inference based on a K-type statistic, which does not suffer from the size distortions of F- or Wald-based inference and is more powerful than projection methods. Measures are proposed to test for the presence of weak identification. The final theoretical contribution is a general method to construct confidence sets for impulse response functions in the presence of weak identification. This technique has applications beyond the present paper, throughout identification schemes for SVARs that may be at risk of weak identification.

These contributions are put through their paces on a subset of the data analyzed using the heteroskedasticity approach by Nakamura & Steinsson (2017). This exercise exposes the extent to which the judicious choice of a policy shock series can impact results; choosing a longer window and a series encompassing less information yields weaker identification and may lead to misspecification. It suggests that for many of the ad hoc macroeconomic shock series that applied papers may use, standard inference may be threatened by identification problems. Under such weak identification, the new methods provide superior size control in a variety of contexts. Comparison of the new technique for computing robust confidence sets for IRFs to Wald and bootstrap techniques highlights the substantial size distortions present in standard techniques when identification is weak, even absent non-linearities.

Weak identification is a concern that should be taken seriously by those using heteroskedasticity assumptions to identify simultaneous equations models. This approach has considerable traction across many fields of economics, and beyond. The empirical exercise demonstrates its presence in relevant data, as well as the need for the general, more flexible estimation framework proposed. Fortunately, with the proposed testing procedures, the need for robust inference should not be a deterrent to practitioners.

Figure 6: IRF Confidence Sets

Response (in b.p.) of 2-year nominal forwards on Treasuries to a 1 b.p. monetary policy shock. The IRF path is computed using the horizon-by-horizon IRF estimator (computing it from the H12 and $A_1$ estimates at the 0th horizon yields negligible differences). A Wald 95% confidence set is plotted, along with a bootstrap confidence interval following a block bootstrap. The collapse of the confidence sets towards zero along with the point estimate is due to the precision with which the near-zero $A_1^h$ values are estimated.


8 Appendix

Proof of Proposition 1

Proposition 1. $H$ is globally identified from $\Sigma_{\eta 1}$ and $\Sigma_{\eta 2}$, up to column order, provided the rows of $[\Sigma_{\varepsilon 1} \;\; \Sigma_{\varepsilon 2}]$ are not proportional.

Proof. This is a restatement of Sentana & Fiorentini (2001), Proposition 3, with linear independence replaced by proportionality due to the presence of only two regimes. The underlying condition needed by Sentana & Fiorentini is that the matrix $[\Sigma_{\varepsilon 1} \;\; \Sigma_{\varepsilon 2}]$ is of full rank. In their setting, where they in effect assume the number of regimes is greater than $n$, this translates into linear independence. Here, with only two regimes, the matrix of variances is generally "tall" and can be of full column rank even if some rows are linearly dependent. Thus, the "no proportionality" condition is required: without it, only the column space of the corresponding columns of $H$ would be identified, not the individual columns. To see this, consider Sims' (2014) argument that columns of $H$ are identified as the right eigenvectors of $\Sigma_{\eta 1}\Sigma_{\eta 2}^{-1}$ provided they correspond to unique elements (eigenvalues) of the matrix $\Sigma_{\varepsilon 1}\Sigma_{\varepsilon 2}^{-1}$. Repeated eigenvalues mean that the corresponding eigenvectors are not uniquely determined, instead spanning the column space of the corresponding elements of $H$. Repeated eigenvalues occur if and only if rows of the matrix $[\Sigma_{\varepsilon 1} \;\; \Sigma_{\varepsilon 2}]$ are proportional. Since $n \geq 2$, non-proportional rows imply that the matrix $[\Sigma_{\varepsilon 1} \;\; \Sigma_{\varepsilon 2}]$ is of full rank, 2, the weaker condition of Sentana & Fiorentini.
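To illustrate Sims' eigenvector argument from the proof above, the following sketch recovers the columns of H (up to scale and order) from simulated reduced-form covariance matrices; the numerical values are illustrative assumptions.

```python
import numpy as np

# Simulate a 2-variable system with distinct variance ratios across regimes
H = np.array([[1.0, 0.5], [0.3, 1.0]])
Sigma_e1 = np.diag([1.0, 2.0])            # regime-1 structural variances
Sigma_e2 = np.diag([4.0, 1.0])            # regime-2; ratios 1/4 and 2 are distinct
Sigma_eta1 = H @ Sigma_e1 @ H.T
Sigma_eta2 = H @ Sigma_e2 @ H.T

# Sigma_eta1 inv(Sigma_eta2) = H (Sigma_e1 inv(Sigma_e2)) inv(H),
# so columns of H are right eigenvectors of the left-hand product.
eigvals, eigvecs = np.linalg.eig(Sigma_eta1 @ np.linalg.inv(Sigma_eta2))
H_recovered = eigvecs / eigvecs[0]        # rescale each column (unit first row)
print(eigvals)       # variance ratios sigma^2_{i1}/sigma^2_{i2}: 0.25 and 2
print(H_recovered)   # columns of H, up to order and scale
```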

Proof of Proposition 2

Proposition 2. Adopting the modeling device in (4), $H$ is asymptotically unidentified.

Proof. I model the variance deficiency as
$$\frac{\sigma^2_{\varepsilon_i P} / \sigma^2_{\varepsilon_i C}}{\sigma^2_{\varepsilon_j P} / \sigma^2_{\varepsilon_j C}} = 1 + \frac{d}{\sqrt{T}}.$$
Under this device, the row of $[\Sigma_{\varepsilon 1} \;\; \Sigma_{\varepsilon 2}]$ corresponding to the regime-$P$ variances is equal to
$$\left[\sigma^2_{\varepsilon_1 P} \quad \frac{\sigma^2_{\varepsilon_1 P}\sigma^2_{\varepsilon_2 C}}{\sigma^2_{\varepsilon_1 C}}\left(1 + d/T^{1/2}\right)\right].$$
In the limit, for finite $d$, this equals $\left[\sigma^2_{\varepsilon_1 P} \quad \sigma^2_{\varepsilon_1 P}\sigma^2_{\varepsilon_2 C}/\sigma^2_{\varepsilon_1 C}\right]$. However, this is $\sigma^2_{\varepsilon_1 P}/\sigma^2_{\varepsilon_1 C}$ times $\left[\sigma^2_{\varepsilon_1 C} \quad \sigma^2_{\varepsilon_2 C}\right]$, the row corresponding to the regime-$C$ variances, so the condition of Proposition 1 is violated, and only the column space of $H$ is identified.
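Mirroring the sketch after Proposition 1, the failure can be seen numerically: with proportional variance rows, the eigenvalues of $\Sigma_{\eta 1}\Sigma_{\eta 2}^{-1}$ coincide, so the eigenvectors, and hence the columns of H, are no longer pinned down. The values below are illustrative.

```python
import numpy as np

H = np.array([[1.0, 0.5], [0.3, 1.0]])
Sigma_e1 = np.diag([1.0, 2.0])
Sigma_e2 = np.diag([3.0, 6.0])   # proportional to regime 1: ratio 1/3 for both shocks
Sigma_eta1 = H @ Sigma_e1 @ H.T
Sigma_eta2 = H @ Sigma_e2 @ H.T

eigvals, eigvecs = np.linalg.eig(Sigma_eta1 @ np.linalg.inv(Sigma_eta2))
print(eigvals)   # both equal 1/3: repeated eigenvalue, eigenvectors not unique
```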


Proof of Lemma 1

Lemma 1. Under Assumptions 1 & 3,
$$\psi_T(\theta) = \frac{1}{\sqrt{T}}\sum_{t=1}^T \begin{pmatrix} \phi_t(\theta_0) \\ q_t(\theta_0) \end{pmatrix} \xrightarrow{d} \begin{pmatrix} \psi_\phi \\ \psi_{\theta_0} \end{pmatrix},$$
where $\psi = (\psi_\phi', \psi_{\theta_0}')'$ is a $2(n^2+n)$-dimensional normally distributed random variable with mean zero and positive semi-definite $2(n^2+n) \times 2(n^2+n)$-dimensional covariance matrix
$$\Omega(\theta) = V(\theta) = \begin{pmatrix} \Omega_{\phi\phi}(\theta) & \Omega_{\phi\theta}(\theta) \\ \Omega_{\theta\phi}(\theta) & \Omega_{\theta\theta}(\theta) \end{pmatrix} = \lim_{T\to\infty} \mathrm{var}\left[\frac{1}{\sqrt{T}}\begin{pmatrix} \phi_T(\theta) \\ q_T(\theta) \end{pmatrix}\right],$$
with $\phi_T(\theta) = \sum_t \phi_t(\theta)$.

Proof. First, I prove that $\psi^\phi_T(\theta) = T^{-1/2}\sum_{t=1}^T \phi_t(\theta)$ obeys a central limit theorem. For simplicity the proof assumes $T_P = T_C = T/2$, but this can trivially be generalized as long as Assumption 3.2 is met. By Assumptions 1.1–1.2,
$$\begin{aligned}
\psi^\phi_T(\theta) &= T^{-1/2}\left[\sum_{t=1}^T \begin{pmatrix} 1[t \leq T/2]\left(\mathrm{vech}(\eta_t\eta_t') - \mathrm{vech}(H\Sigma_{\varepsilon,1}H')\right) \\ 1[t > T/2]\left(\mathrm{vech}(\eta_t\eta_t') - \mathrm{vech}(H\Sigma_{\varepsilon,2}H')\right) \end{pmatrix} - E\begin{pmatrix} 1[t \leq T/2]\left(\mathrm{vech}(E[\eta_t\eta_t' \mid t \leq T/2]) - \mathrm{vech}(H\Sigma_{\varepsilon,1}H')\right) \\ 1[t > T/2]\left(\mathrm{vech}(E[\eta_t\eta_t' \mid t > T/2]) - \mathrm{vech}(H\Sigma_{\varepsilon,2}H')\right) \end{pmatrix}\right] \\
&= T^{-1/2}\sum_{t=1}^T \begin{pmatrix} 1[t \leq T/2]\left(\mathrm{vech}(\eta_t\eta_t') - \mathrm{vech}(E[\eta_t\eta_t' \mid t \leq T/2])\right) \\ 1[t > T/2]\left(\mathrm{vech}(\eta_t\eta_t') - \mathrm{vech}(E[\eta_t\eta_t' \mid t > T/2])\right) \end{pmatrix} \\
&= 2^{-1/2}(T/2)^{-1/2}\sum_{t=1}^{T/2} \begin{pmatrix} \mathrm{vech}(\eta_t\eta_t') - \mathrm{vech}(E[\eta_t\eta_t' \mid t \leq T/2]) \\ 0_{n(n+1)/2 \times 1} \end{pmatrix} + 2^{-1/2}(T/2)^{-1/2}\sum_{t=T/2+1}^{T} \begin{pmatrix} 0_{n(n+1)/2 \times 1} \\ \mathrm{vech}(\eta_t\eta_t') - \mathrm{vech}(E[\eta_t\eta_t' \mid t > T/2]) \end{pmatrix}.
\end{aligned}$$
Each of the summations obeys a standard multivariate CLT with the addition of Assumptions 1.3 and 3.1. Thus the entire expression obeys a CLT as a linear combination of random variables each obeying a CLT, for any $\theta \in \Theta$ (the expression does not depend on the particular value of $\theta$). Thus
$$\psi^\phi_T(\theta_0) \xrightarrow{d} N(0, V_{\phi\phi}(\theta_0)).$$
By definition, $q_t(\cdot) = 0$ deterministically; note that $\frac{\partial \phi(\theta)}{\partial \theta'} = -\frac{\partial\left(\mathrm{vech}(\Sigma_{\eta C})', \mathrm{vech}(\Sigma_{\eta P})'\right)'}{\partial \theta'} = E\left[\frac{\partial \phi(\theta)}{\partial \theta'}\right]$, since it contains only population parameters and no data (the moment equations are separable in data and parameters). This holds for any $\theta \in \Theta$; $\theta$ need not equal $\theta_0$. Thus $\psi_\theta$ is a degenerate random variable. That $V(\theta)$ takes the form given follows by construction. It remains to show that $V(\theta)$ is positive semi-definite. Since all but the top-left block will be zeros, it suffices to show that $V_{\phi\phi}(\theta)$ is positive semi-definite. This follows as $V_{\phi\phi}(\theta_0)$ has the form $E[MM']$.

Proof of Lemma 2

Lemma 2. Under Assumptions 1.3 & 3.1, the covariance matrix estimator $\hat{V}(\theta_0)$ satisfies
$$\hat{V}(\theta_0) \xrightarrow{p} V(\theta_0)$$
and
$$\frac{\partial\, \mathrm{vec}\left(\hat{V}_{\phi\phi}(\theta_0)\right)}{\partial \theta'} \xrightarrow{p} \frac{\partial\, \mathrm{vec}\left(V_{\phi\phi}(\theta_0)\right)}{\partial \theta'}.$$

Proof. By a standard Law of Large Numbers and Assumptions 1.3 and 3.1, the natural covariance estimator is consistent: $\frac{1}{T}\sum_t \phi_t(\theta_0)\phi_t(\theta_0)' \xrightarrow{p} E\left[\phi_t(\theta_0)\phi_t(\theta_0)'\right]$. Then
$$V(\theta_0) = \lim_{T\to\infty} \mathrm{var}\left[\frac{1}{\sqrt{T}}\phi_T(\theta_0)\right] = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T \mathrm{var}\left(\phi_t(\theta_0)\right),$$
again by Assumption 1.3, which simplifies to $E\left[\phi_t(\theta_0)\phi_t(\theta_0)'\right]$. Since $q_t$ is deterministic, this establishes the first part of the Lemma.

For the second part, note that $\frac{\partial \hat{V}_{\phi\phi}(\theta_0)}{\partial \theta'} = \frac{\partial\left[\frac{1}{T}\sum_{t=1}^T \phi_t(\theta_0)\phi_t(\theta_0)'\right]}{\partial \theta'} = \frac{1}{T}\sum_{t=1}^T \left[\frac{\partial \phi_t(\theta_0)}{\partial \theta'}\phi_t(\theta_0)'\right]$. $\frac{\partial \phi_t(\theta_0)}{\partial \theta'}$ is a matrix of zeros, ones, and continuous functions of elements of $\theta$; it is entirely deterministic. Similarly, $\frac{\partial V_{\phi\phi}(\theta_0)}{\partial \theta'} = E\left[\frac{\partial \phi_t(\theta_0)}{\partial \theta'}\phi_t(\theta_0)'\right] = \frac{\partial \phi_t(\theta_0)}{\partial \theta'}E\left[\phi_t(\theta_0)'\right]$, and since $E\left[\phi_t(\theta_0)'\right]$ is consistently estimated, so too is $\frac{\partial V_{\phi\phi}(\theta_0)}{\partial \theta'}$, by Slutsky's Theorem.


Proof of Theorem 1

Theorem 1.
$$K(\theta_0) \xrightarrow{d} \chi^2_{n^2+n}.$$

Proof. The result follows directly from Kleibergen (2005). Lemmas 1 and 2 establish Assumptions 1 and 2 from that paper, which are used to prove Theorem 1 therein. Note that Lemma 1 and part of Lemma 2 establish the required conditions of Stock & Wright (2000), Theorem 2 (their Assumption A and the consistency of the covariance matrix for the weighting matrix), so
$$S_T(\theta_0) \xrightarrow{d} \chi^2_{n^2+n}$$
as an immediate corollary.

Proof of Theorem 3

Theorem 3. For $H$ partitioned $[H_A \,\vdots\, H_B]$, where $H_B$ contains at most two columns, $H$ is identified from the covariance matrices provided no row in the upper block of
$$\begin{bmatrix} \mathrm{vec}(\Sigma_{\eta A1}) & \mathrm{vec}(\Sigma_{\eta A2}) \\ r\left(\mathrm{vec}(\Sigma_{\eta B1}), \mathrm{vec}(\Sigma_{\eta B2})\right) \end{bmatrix}$$
is proportional to another row if

1. a single element $H_{lj}$ is fixed and $H_{jk} \neq 1/H_{lj}$ for $H_{(j)}, H_{(k)} \in H_B$, or

2. the full column $H_{(j)} \in H_B$ is fixed.

Proof. The proof follows from extending the proof of Proposition 4 in Sentana & Fiorentini (2001). They show that for a similarly partitioned $H$, the columns of $H_A$ are identified. However, the columns of $H_B$ are identified only up to an orthogonal rotation $Q$, $QQ' = Q'Q = I$. $H_B$ represents the portion of $H$ pertaining to proportional variance processes, and as such cannot contain just a single column. If $H_B$ contains two columns, then $Q$ is $2\times 2$. Consider first a single fixed element of $H_{(j)}$, the subject of the null hypothesis for the subset test. Without loss of generality, let it be $H_{2j} = x$. This yields the system of equations
$$\begin{pmatrix} 1 & H_{1k} \\ x & 1 \\ \vdots & \vdots \\ H_{nj} & H_{nk} \end{pmatrix} \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{pmatrix} 1 & H_{1k} \\ x & 1 \\ \vdots & \vdots \\ H_{nj} & H_{nk} \end{pmatrix}. \tag{8}$$
The normalization of $H_{1j}$ and $H_{2k}$ to unity is without loss of generality, as identification is only up to the scale of each column. Since $Q$ is orthogonal, fixing column order, $Q_{11}^2 + Q_{21}^2 = 1$. Given this and the equation
$$xQ_{11} + Q_{21} = x,$$
I can solve for $Q_{11}$ and $Q_{21}$, where the sign is pinned down by the unit normalization. This yields two possible solutions for $Q_{11}$ and $Q_{21}$: $Q_{11} = 1,\, Q_{21} = 0$, and $Q_{11} = \frac{x^2-1}{x^2+1},\, Q_{21} = \frac{2x}{x^2+1}$. However, an additional equation implied by (8), $Q_{11} + H_{1k}Q_{21} = 1$, rules out the second solution unless $H_{1k} = 1/x$. With $Q_{11}$ and $Q_{21}$ thus pinned down, the other column of $Q$ is unique, and thus the entirety of $H$ is identified.
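For completeness, here is the quadratic step behind the two candidate solutions, which the proof above leaves implicit:
$$Q_{21} = x(1 - Q_{11}), \qquad Q_{11}^2 + x^2(1 - Q_{11})^2 = 1 \;\Longrightarrow\; (1+x^2)Q_{11}^2 - 2x^2 Q_{11} + (x^2 - 1) = 0,$$
$$Q_{11} = \frac{x^2 \pm 1}{x^2+1} \in \left\{1,\; \frac{x^2-1}{x^2+1}\right\}, \qquad Q_{21} = x(1 - Q_{11}) \in \left\{0,\; \frac{2x}{x^2+1}\right\}.$$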

This argument extends to the case where the entirety of $H_{(j)}$ is fixed. Now, however, the solution is unique unless $H_{lk}/H_{2k} = H_{lj}/H_{2j}$ for all $l$, in which case column $k$ is a scalar multiple of column $j$, making $H$ non-invertible, which is false by Assumption 1.1. Thus, the solution when a full column of $H$ is specified is unique.

The restriction that $H_B$ contain at most two columns is necessary to yield conditional identification without any assumptions on the variances. If three columns pertaining to proportional variance processes were included, and column $j$'s value conditioned upon, the remaining columns could not be distinguished.

Proof of Proposition 4

Proposition 4. If $B^h$ is invertible, for $\Lambda^h$ partitioned $[\Lambda^h_A \,\vdots\, \Lambda^h_B]$, $\Lambda^h_A$ is identified up to scale from the covariance matrices provided no row in the upper block of
$$\begin{bmatrix} \mathrm{vec}(\Sigma_{\eta A1}) & \mathrm{vec}(\Sigma_{\eta A2}) \\ r\left(\mathrm{vec}(\Sigma_{\eta B1}), \mathrm{vec}(\Sigma_{\eta B2})\right) \end{bmatrix}$$
is proportional to another row.

Proof. Proposition 3 follows from Sentana & Fiorentini's (2001) Proposition 4. This proposition similarly follows by replacing $H$ with $\Lambda^h = B^h H$, where both are full-rank $n \times n$ matrices.

Proof of Theorem 4

Theorem 4. For $\Lambda^h$ partitioned $[\Lambda^h_A \,\vdots\, \Lambda^h_B]$, where $\Lambda^h_B$ contains two columns, $\Lambda^h_A$ is identified up to scale from the covariance matrices provided

1. $B^h$ is invertible,

2. no row in the upper block of
$$\begin{bmatrix} \mathrm{vec}(\Sigma_{\eta A1}) & \mathrm{vec}(\Sigma_{\eta A2}) \\ r\left(\mathrm{vec}(\Sigma_{\eta B1}), \mathrm{vec}(\Sigma_{\eta B2})\right) \end{bmatrix}$$
is proportional to another row,

3. a single element $\Lambda^h_{ij}$ is fixed and $\Lambda^h_{jk} \neq 1/\Lambda^h_{ij}$ for $\Lambda^h_{(j)}, \Lambda^h_{(k)} \in \Lambda^h_B$.

Proof. This follows from the proof of Theorem 3. The condition that $B^h$ is full rank guarantees that, like $H$, $\Lambda^h$ is full rank, which is needed to yield Proposition 4, embedded in the theorem. Then $H$ can simply be replaced in the proof of Theorem 3 with $\Lambda^h$. Note that this requires a unit normalization of $\Lambda^h$, which is unfamiliar. However, $\Lambda^h$ is always normalized, at least implicitly, by any normalization applied to $H$. Thus, the unit normalization is without loss of generality. Any identification results must similarly hold under a different normalization, where $\Lambda^h$ is rescaled back to the conventional $B^h H$, where $H$ has a unit diagonal.

Verification of Andrews (2017) Assumptions

Andrews' (2017) framework for weak identification relies on the necessary conditions for his Theorem 1 being satisfied. For GMM, this requires conditions on both the robust confidence sets constructed and the non-robust confidence sets, as laid out in his Assumptions 2–6.

Assumption 2 coincides with Lemma 1. Assumption 3 largely coincides with Lemma 2, requiring the consistent estimation of $V(\theta)$. It also requires consistent estimation of the weighting matrix, which, given the focus on CUE in the present paper, also follows from Lemma 2. Assumption 4 requires, for any draw of data, the existence of normalizing matrices for both the Jacobian and the orthogonalized Jacobian evaluated at $\theta_0$ such that, when normalized, they converge to full-rank matrices. Since the Jacobian is deterministic, it is identical to the orthogonalized Jacobian. Under strong identification, the identity matrix serves as such a normalizing matrix; it is trivial to construct a matrix with judiciously placed $O(T^{1/2})$ elements under weak identification. Unsurprisingly, if the system is unidentified, no such matrices exist.

Assumption 5 states four conditions underlying the performance of Wald tests under strong identification. First, given the separability of $\phi_t(\theta)$ in data and parameters, $\phi_T(\theta)$ converges uniformly over $\Theta$ since the sample second moments converge uniformly. Uniform boundedness follows similarly. The third condition requires that the estimated weighting matrix converge uniformly over $\Theta$, which follows since $W(\theta)$ is continuous in $\theta$ and measurable, $\Theta$ is compact, and $E\left[\mathrm{vech}(\eta_t\eta_t')\mathrm{vech}(\eta_t\eta_t')'\right] < \infty$ by Assumptions 1.1 and 3.1. Since CUE is considered here, positive definiteness of $W(\theta)$ follows from the positive definiteness of $V_{\phi\phi}(\theta)$, which is established in the final point of Assumption 6 below. Finally, its maximal eigenvalue is bounded by Assumptions 1.1 and 3.1, and the minimal eigenvalue is bounded away from zero since $W(\theta)$ is positive definite. The second and fourth conditions are identification requirements, ensuring the population objective is small if and only if it is evaluated in a neighbourhood of the true parameter value; these are satisfied if the system meets the requirements for strong identification.

Assumption 6 imposes five conditions guaranteeing asymptotic normality of $\hat{\theta}$ under strong identification. First, it requires that $\theta_0$ be in the interior of $\Theta$. Second, it requires that $\phi_T(\theta)$ and $\Omega(\theta)$ be continuously differentiable; the first part holds as the Jacobian is a deterministic matrix of ones, zeros, and simple continuous functions of elements of $\theta$, and the second part holds since $\frac{\partial \Omega(\theta)}{\partial \theta'} = 2\frac{\partial \phi_t(\theta)}{\partial \theta'}\frac{1}{T}\phi_T(\theta)'$ and $\frac{1}{T}\phi_T(\theta)$ is continuous in $\theta$. The third set of conditions, on the Jacobian, is satisfied trivially since it is deterministic and of full rank under strong identification. $\sup_{\theta \in B(\theta_0)}\left\|\frac{\partial \Omega(\theta)}{\partial \theta'}\right\| = O_p(1)$ (stochastic boundedness) holds since $\frac{1}{T}\phi_T(\theta)$ consists of linear combinations of estimated second moments of the data (which have finite second moments by Assumption 3.1) and multiplicative functions of $\theta$ (which are finite for $\theta \in B(\theta_0)$). For the fifth condition, since the moment function is continuous in $\theta$ and the asymptotic variance is a continuous function of the moment function, the asymptotic variance is continuous in $\theta$. Thus, the covariance estimator is uniformly consistent on a ball around $\theta_0$. Since, for all $\theta$, $V_{\phi\phi}(\theta) = E\left[\phi_t(\theta)\phi_t(\theta)'\right]$, an outer product, $V_{\phi\phi}(\theta)$ is positive semi-definite on $B(\theta_0)$. Since no elements of $\phi_t(\theta)$ have zero variance, the result can be extended to positive definiteness.


References

Anderson, T. W. and H. Rubin (1949): "Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations," The Annals of Mathematical Statistics, 20, 46–63.

Andrews, D. W. K. and X. Cheng (2012): "Estimation and Inference With Weak, Semi-Strong, and Strong Identification," Econometrica, 80, 2153–2211.

Andrews, I. (2016): "Conditional Linear Combination Tests for Weakly Identified Models," Econometrica, 84, 2155–2182.

——— (2017): "Valid Two-Step Identification-Robust Confidence Sets for GMM," Review of Economics and Statistics (forthcoming).

Andrews, I. and A. Mikusheva (2016): "A Geometric Approach to Nonlinear Econometric Models," Econometrica, 84, 1571–1612.

Bacchiocchi, E. and L. Fanelli (2015): "Identification in Structural Vector Autoregressive Models with Structural Changes, with an Application to U.S. Monetary Policy," Oxford Bulletin of Economics and Statistics, 77, 761–779.

Chaudhuri, S. (2008): "Projection-type Score Tests for Subsets of Parameters," Ph.D. thesis, University of Washington.

Chaudhuri, S., T. Richardson, J. Robins, and E. Zivot (2010): "A New Projection-type Split-Sample Score Test in Linear Instrumental Variables Regression," Econometric Theory, 26, 1820–1837.

Chaudhuri, S. and E. Zivot (2008): "A Comment on Weak Instrument Robust Tests in GMM and the New Keynesian Phillips Curve: Inference on Subsets of Parameters in GMM," 1–12.

——— (2011): "A New Method of Projection-based Inference in GMM with Weakly Identified Nuisance Parameters," Journal of Econometrics, 164, 239–251.

Craine, R. and V. L. Martin (2008): "International Monetary Policy Surprise Spillovers," Journal of International Economics, 75, 180–196.

Dufour, J. M. (1997): "Some Impossibility Theorems in Econometrics With Applications to Structural and Dynamic Models," Econometrica, 65, 1365–1387.

——— (2003): "Identification, Weak Instruments, and Statistical Inference in Econometrics," Canadian Journal of Economics/Revue canadienne d'économique, 36, 767–808.

Dufour, J. M. and J. Jasiak (2001): "Finite Sample Limited Information Inference Methods for Structural Equations and Models With Generated Regressors," International Economic Review, 42.

Dufour, J. M. and M. Taamouti (2005): "Projection-Based Statistical Inference in Linear Structural Models with Possibly Weak Instruments," Econometrica, 73, 1351–1365.

Ehrmann, M. and M. Fratzscher (2017): "Euro Area Government Bonds – Fragmentation and Contagion During the Sovereign Debt Crisis," Journal of International Money and Finance, 70, 26–44.

Eichengreen, B. and U. Panizza (2016): "A Surplus of Ambition: Can Europe Rely on Large Primary Surpluses to Solve its Debt Problem?" Economic Policy, 31, 5–49.

Feenstra, R. C. and D. E. Weinstein (2017): "Globalization, Markups, and US Welfare," Journal of Political Economy, 125, 1040–1074.

Fernandez-Perez, A., B. Frijns, and A. Tourani-Rad (2016): "Contemporaneous Interactions among Fuel, Biofuel and Agricultural Commodities," Energy Economics, 58, 1–10.

Fieller, E. C. (1954): "Some Problems in Interval Estimation," Journal of the Royal Statistical Society, 16, 175–185.

Gong, X., S. Yang, and M. Zhang (2017): "Not Only Health: Environmental Pollution Disasters and Political Trust," Sustainability, 9, 1–28.

Guggenberger, P., F. Kleibergen, S. Mavroeidis, and L. Chen (2012): "On the Asymptotic Sizes of Subset Anderson-Rubin and Lagrange Multiplier Tests in Linear Instrumental Variables Regression," Econometrica, 80, 2649–2666.

Han, S. and A. McCloskey (2017): "Estimation and Inference with a (Nearly) Singular Jacobian," Manuscript.

Hébert, B. and J. Schreger (2017): "The Costs of Sovereign Default: Evidence from Argentina," American Economic Review (forthcoming).

Hogan, V. and R. Rigobon (2009): "Using Unobserved Supply Shocks to Estimate the Returns to Education," Manuscript.

Islam, A., F. Islam, and C. Nguyen (2017): "Skilled Immigration, Innovation, and the Wages of Native-Born Americans," Industrial Relations, 56, 459–488.

Jahn, E. and E. Weber (2016): "Identifying the Substitution Effect of Temporary Agency Employment," Macroeconomic Dynamics, 20, 1264–1281.

Khalid, U. (2016): "The Effect of Trade and Political Institutions on Economic Institutions," Journal of International Trade and Economic Development, 26, 89–110.

Kleibergen, F. (2005): "Testing Parameters in GMM Without Assuming that they Are Identified," Econometrica, 73, 1103–1123.

Kleibergen, F. and S. Mavroeidis (2008): "Inference on Subsets of Parameters in GMM without Assuming Identification," Manuscript.

——— (2009): "Weak Instrument Robust Tests in GMM and the New Keynesian Phillips Curve," Journal of Business & Economic Statistics, 27, 293–311.

Klein, R. and F. Vella (2009): "Estimating the Return to Endogenous Schooling Decisions via Conditional Second Moments," Journal of Human Resources, 44, 1047–1065.

Lewbel, A. (2012): "Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models," Journal of Business & Economic Statistics, 30, 67–80.

Lewis, D. J. (2017): "Identification via Generalized Stochastic Volatility," Manuscript.

Lin, F., E. O. Weldemicael, and X. Wang (2016): "Export Sophistication Increases Income in Sub-Saharan Africa: Evidence from 1981–2000," Empirical Economics, 52, 1627–1649.

Magnusson, L. M. and S. Mavroeidis (2014): "Identification using Stability Restrictions," Econometrica, 82, 1799–1851.

McCloskey, A. (2017): "Bonferroni-Based Size-Correction for Nonstandard Testing Problems," Journal of Econometrics, 200, 17–35.

Millimet, D. L. and J. Roy (2016): "Empirical Tests of the Pollution Haven Hypothesis when Environmental Regulation is Endogenous," Journal of Applied Econometrics, 31, 652–677.

Mönkediek, B. and H. A. Bras (2016): "The Interplay of Family Systems, Social Networks and Fertility in Europe: Cohorts Born Between 1920 and 1960," Economic History of Developing Regions, 31, 136–166.

Nakamura, E. and J. Steinsson (2017): "High Frequency Identification of Monetary Non-Neutrality," Manuscript.

Olea, J. L. M., J. H. Stock, and M. W. Watson (2016): "Inference in Structural VARs with External Instruments," Manuscript.

Rigobon, R. (2003): "Identification through Heteroskedasticity," The Review of Economics and Statistics, 85, 777–792.

Rigobon, R. and D. Rodrik (2005): "Rule of Law, Democracy, Openness, and Income," Economics of Transition, 13, 533–564.

Rigobon, R. and B. Sack (2003): "Measuring the Reaction of Monetary Policy to the Stock Market," Quarterly Journal of Economics, 118, 639–669.

——— (2004): "The Impact of Monetary Policy on Asset Prices," Journal of Monetary Economics, 51, 1553–1575.

Robins, J. M. (2004): "Optimal Structural Nested Models for Optimal Sequential Decisions," Proceedings of the Second Seattle Symposium in Biostatistics, 189–326.

Rothenberg, T. (1971): "Identification in Parametric Models," Econometrica, 39, 577–591.

Sentana, E. and G. Fiorentini (2001): "Identification, Estimation and Testing of Conditionally Heteroskedastic Factor Models," Journal of Econometrics, 102, 143–164.

Sims, C. (1980): "Macroeconomics and Reality," Econometrica, 48, 1–48.

——— (2014): "Using Financial Data in Macroeconomic Models," in Nonlinearities in Macroeconomics and Finance in the Light of Crises, Frankfurt: ECB.

Staiger, D. and J. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65, 557–586.

Staiger, D., J. H. Stock, and M. W. Watson (1997): "How Precise Are Estimates of the Natural Rate of Unemployment?" in Reducing Inflation: Motivation and Strategy, ed. by C. Romer and D. Romer, University of Chicago Press, 195–246.

Stock, J. H. and J. H. Wright (2000): "GMM with Weak Identification," Econometrica, 68, 1055–1096.

Stock, J. H., J. H. Wright, and M. Yogo (2002): "GMM, Weak Instruments, and Weak Identification," Journal of Business and Economic Statistics Symposium on GMM.

Stock, J. H. and M. Yogo (2005): "Testing for Weak Instruments in Linear IV Regression," in Identification and Inference in Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews and J. H. Stock, 80–108.

Wright, J. H. (2001): "Detecting Lack of Identification in GMM," Manuscript.

Zaefarian, G., V. Kadile, S. C. Henneberg, and A. Leischnig (2017): "Endogeneity Bias in Marketing Research: Problem, Causes and Remedies," Industrial Marketing Management, 65, 39–46.