Projection Minimum Distance: An Estimator for Dynamic ... jorda.pdf · This version, March 2006...

This version, March 2006

Projection Minimum Distance: An Estimator for Dynamic

Macroeconomic Models∗

Abstract

This paper introduces a simple, two-step estimator for rational expectations models and time series modelsin general. We call this estimator projection minimum distance (PMD) and show that it is consistent andasymptotically normal. PMD provides consistent estimates of parameters even when the economic modelis misspecified. In addition, it provides a simple chi-square specification test and a statistically basedmetric of closeness between the impulse responses implied by the model and those from the data. Weshow how PMD relates to GMM and how invalid/weak instrument problems in GMM are easily resolvedby PMD. PMD is simple to implement as it only involves two simple least squares steps and can begeneralized to more complex and nonlinear environments.

• Keywords: impulse response, local projection, rational expectations, minimum chi-squared estima-tion.

• JEL Codes: C32, E47, C53.

Òscar JordàDepartment of EconomicsUniversity of California, DavisOne Shields Ave.Davis, CA 95616e-mail: [email protected]

Sharon KozickiResearch DepartmentBank of Canada234 Wellington StreetOttawa, Ontario, CanadaK1A 0G9e-mail: [email protected]

∗The views expressed herein are solely those of the authors and do not necessarily reflect the views of theBank of Canada. We thank Timothy Cogley, James Hamilton, Peter Hansen, Kevin Hoover, Frank Wolak andseminar participants at Duke University, the Federal Reserve Bank of New York, Federal Reserve Bank of SanFrancisco, Federal Reserve Bank of St. Louis, Stanford University, the University of California, Davis, University ofKansas and the Winter Meetings of the American Economic Association, 2006 in Boston for useful comments andsuggestions. Jordà is thankful for the hospitality of the Federal Reserve Bank of San Francisco during the creationof this paper.

1 Introduction

Estimating and testing competing structural models of the macroeconomy is necessary to advance

our understanding of the competing forces at work and it is fundamental in establishing appro-

priate policy responses in actual economies. Inevitably, macroeconomic models compromise their

realism in favor of tractability and analysis. These opposing forces offer significant challenges for

formal statistical evaluation. For instance, the maximum likelihood principle and its asymptotic

optimality properties require that the structural model be a representation of the density of the

underlying data generation process (DGP). This demand is very hard to meet in practice and

results in rejection of many useful models (economically speaking). Routine failure of specifica-

tion tests has therefore led researchers down the path of evaluating their economic models by

comparing their impulse responses with those from the data by the “eyeball” metric.

This paper introduces new methods to estimate and evaluate structural, dynamic, stochastic

macroeconomic models. The method, which we label projection minimum distance (PMD), is a

two-step estimator. In the first step, we compute the impulse responses from the data semipara-

metrically with the local projections estimator introduced by Jordà (2005). Next, we represent the

stable solution of the model in terms of its Wold representation and obtain the mapping between

the structural parameters and the Wold coefficients by the method of undetermined coefficients

(see Christiano, 2002). Hence, in the second step, the structural parameters of the model are

effectively estimated by minimizing the distance between the impulse responses from the data and

those implied by the model. The resulting estimator is based on a minimum chi-square estimator

(Ferguson, 1958) and belongs to the broader family of minimum distance estimators of which

GMM is also a member of the class.

However, PMD has important advantages that distinguish it from GMM and other commonly

used estimators. Specifically, PMD consists of two, simple, least-squares steps, and therefore, is

easily implementable. The principle behind PMD consists in minimizing the distance between

1

the data’s and the model’s impulse responses — the dimension of fit macroeconomic researchers

most care about. Two implications follow from this principle: First, we provide an overall mis-

specification test based on overidentifying restrictions that is distributed chi-square. Effectively,

this test is the formal equivalent of the “eyeball” determination of when a model matches the

relevant dynamics of the data. Secondly, PMD is designed to provide consistent estimates even

when the model is dynamically misspecified. We show that the common practice of using lags of

the endogenous variables as instruments in GMM can only be justified by the internal dynamics of

the data, not the model. Otherwise, GMM presents and invalid instrument problem that results

in inconsistent estimates. Fuhrer and Olivei (2004) explain this problem in more detail.

Econometrically, the paper has three contributions. First, it is critical for the consistency

properties of PMD to obtain consistent, first-step estimates of the impulse responses from the

data. This motivates our preference for local projections over approximations based on vector

autoregressions (VARs). Consequently, we begin by deriving consistency and asymptotic nor-

mality results for the local projection estimates. Second and unlike classical minimum distance,

the minimum chi-square step is based on a function of the structural parameters that depends

on sample estimates. Consequently, we derive the consistency and asymptotic normality of the

minimum chi-square step and appropriately incorporate into the asymptotic variance-covariance

matrix the uncertainty generated in the first-step. Hence, we show that a misspecification test

of overidentifying restrictions is distributed chi-square. Finally, we show that PMD is consistent

even when the economic model is dynamically misspecified and how our model corrects the invalid

instrument problem that GMM suffers.

We introduce PMD and the main results in the context of a flexible, linear state-space repre-

sentation of a dynamic rational expectations model. However, PMD is not limited by linearity.

PMD can be made more flexible in two different ways that do not significantly complicate the

method. First, the first-step local projections can be made nonlinear in a variety of ways as

described in Jordà (2005). Secondly, the method is not limited to linear rational expectations

2

models. However, because the paper is already dense with results, we defer a more detailed dis-

cussion of these two issues to another paper. Finally, in addition to showing how PMD relates to

GMM we also show how it relates to the Hannan-Rissanen (1982) estimator and how PMD can

be used to estimate rather general time series models, including traditional ARMA(p,q) models.

2 Projection Minimum Distance: The Method

In this section we propose an estimator for rational expectations models (and more broadly time

series models), where it is important to be flexible about the underlying dynamics of the data

generating process (DGP) to ensure consistent estimates of the structural parameters of interest.

The estimator we propose consists of two steps. In the first step we estimate impulse responses

from the sample with a flexible, semi-parametric estimator: Jordà’s (2005) local projections. In the

second step we minimize the distance between the impulse responses estimated from the sample

and the responses generated from the macroeconomic model being considered. This second step

is based on Ferguson’s (1958) minimum chi-square estimator. We call this two-step estimator

“projection minimum distance” (PMD) and show that it is consistent and asymptotically normal.

We find it useful to use a generic rational expectations formulation (e.g., see Farmer, 1993;

and Evans and Honkapohja, 2001) to illustrate the particulars of our technique. Hence, let yt

be an r × 1 vector of endogenous variables whose behavior is described by the backward- and

forward-looking model

Φ00yt = Φ01yt−1 + Φ

02Etyt+1 + ut, E(utu

0t) = I (1)

where ut is the r × 1 vector of expectational errors. The r × r coefficient matrix Φ0 makes

explicit the nature of the contemporaneous relations between elements of yt. Expression (1) is

rather general. For instance, yt can include exogenous variables whose economic processes are

not directly modeled but whose reduced-form laws-of-motion can be expressed in finite Markov

3

form. Similarly (1) does not restrict the model to have first order dynamics since the vector yt can

always be appropriately redefined to include lags of yt and (1) can be thought of as a state-space

representation. Expression (1) can therefore be appropriate for both dynamic stochastic general

equilibrium or dynamic partial equilibrium models of the economy.

A stable solution for (1) is a dynamic stochastic difference equation in yt. In most practical

situations, the stability of the solution implies that it is covariance-stationary, and hence, by the

Wold decomposition theorem (see Anderson, 1994), can be represented as,

yt =∞Xj=0

B0jεt−j . (2)

with B0 = I, E(εtε0t) = Ω and with the relation between the reduced form residuals of the Wold

decomposition, εt, and the structural residuals of the economic model, ut, given by εt = Φ−10 ut

(we omit the constant and deterministic terms for simplicity but without loss of generality).

Substituting (2) into (1) and equating terms in εt−j (in analogous manner to the method of

undetermined coefficients, see Christiano, 2002), we obtain the following set of conditions:

Φ00 = Φ02B01 + Φ

00 for j = 0 (3)

B0j = Φ01B0j−1 + Φ

02Bj+1 for j ≥ 1

where we remark that an estimate of Φ0 can be obtained by imposing the structural identification

conditions of the model and noticing that Ω = Φ−10¡Φ−10

¢0.

Momentarily ignoring the conditions associated with j = 0 in (3), it is clear that for j ≥ 1,

structural identification plays no role in estimating Φ1 and Φ2, which can be done on the basis of

the reduced-form impulse responses, Bj , alone. Define

4

Br(h+1)×r

=

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

B0

B1

...

Bh

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦,

with B0 = I; and the selector matrices

S0 =

∙0

r(h−1)×r(Ih−1 ⊗ Ir) 0

r(h−1)×r

¸,

S1 =

∙(Ih−1 ⊗ Ir) 0

r(h−1)×2r

¸,

S2 =

∙0

r(h−1)×2r(Ih−1 ⊗ Ir)

¸.

Then we can stack the conditions in (3) for j = 1, ..., h more compactly as

S0B = S1BΦ1 + S2BΦ2 (4)

The vec operator can be applied to both sides of this expression to vectorize the coefficient vectors.

Let b = vec(B); φ = vec(Φ1) vec(Φ2)0 then (4) can be written as

(Ir ⊗ S0) b = (Ir ⊗ S1B) (Ir ⊗ S2B)φ . (5)

This last expression makes clear that if the impulse response coefficients in B were known rather

than estimated, then φ could be simply estimated, say, by least squares. Instead, we show below

that one can easily obtain semi-parametric estimates of b (or B interchangeably) with local pro-

jections in a first stage regression. In the second stage, we minimize the distance between the two

sides of expression (5) by minimum chi-square methods.

In particular, let

g³bbT ;φ´ = (Ir ⊗ S0)bbT − n³Ir ⊗ S1 bBT´ ³

Ir ⊗ S2 bBT´oφ (6)

5

then, a minimum chi-square estimator of φ can be found by minimizing

minφ

bQT (φ) = g ³bbT ;φ´0cWg ³bbT ;φ´ (7)

for some weighting matrix cW.Several remarks deserve comment. First, this estimation method is not limited by linear

expectational mechanisms since (6) can be easily generalized appropriately. However, we do not

explore the details of this feature explicitly in this paper. Second, the matrices Φ1 and Φ2 contain

2r2 parameters that we want to estimate but we have hr2 conditions available for estimation,

where h is determined by the sample size and the researcher in the same way that instruments

are selected in GMM estimation. Thus, the availability of these overidentifying restrictions will

provide a natural test of model misspecification. We show that this test is distributed chi-squared

under standard regularity conditions that we discuss below. This test effectively measures the

distance between the impulse responses from the data and those from the theoretical model.

Macroeconomic models, whether calibrated, estimated by maximum likelihood, or estimated by

some other method, are routinely evaluated by how well their impulse responses match those

responses generated by the data. Hence, the overidentifying restrictions test provides a formal

statistical metric for this common ocular diagnostic. Third, the relation between the coefficients

in φ and bbT is not a known function, as is commonly assumed in classical minimum distance

estimation, but rather depends on the first-stage estimates bbT . Below we show how we incorporatethis feature into the proofs of consistency and asymptotic normality.

In the next section we derive consistency and asymptotic normality results for the local pro-

jection estimator proposed therein. Estimates from this first-step are then incorporated into

the minimum chi-square step, whose consistency and asymptotic normality properties we derive

thereafter.

6

3 First-Step: Local Projections

3.1 Consistency

In the previous section we argued that a stable solution of the rational expectations model (1) is

therefore covariance-stationary and has a Wold decomposition,

yt =∞Xj=0

Bjεt−j (8)

where for simplicity and without loss of generality we drop the constant and any deterministic

terms. From the Wold decomposition theorem (see e.g. Anderson, 1994):

(i) E(εt) = 0 and εt are i.i.d.

(ii) E(εtε0t) = Σεr×r

(iii)P∞j=0 kBjk <∞ where kBjk2 = tr(B0jBj) and B0 = Ir

(iv) det B(z) 6= 0 for |z| ≤ 1 where B(z) =P∞j=0BjzjAnderson (1994) shows that the process in (8) can also be written as:

yt =∞Xj=1

Ajyt−j + εt (9)

such that,

(v)P∞j=1 kAjk <∞

(vi) A(z) = Ir −P∞j=1Ajz

j = B(z)−1

(vii) detA(z) 6= 0 for |z| ≤ 1.

Results (iii) and (vi) imply that

£Ir −A1L−A2L2 − ...

¤ £Ir +B1L+B2L

2 + ...¤= Ir

7

where L is the usual lag operator. Then, equating terms in the powers of L, we have the following

correspondences:

B1 = A1 (10)

B2 = A1B1 +A2

...

Bh = A1Bh−1 +A2Bh−2 + ...+Ah−1B1 +Ah

for any h ≥ 1 with B0 = Ir. Hence, expression (10) delivers a set of conditions that would allow

us to determine the impulse response function for yt if we knew what the Aj were. A truncated

version of these conditions for a finite lag VAR is available in Hamilton (1994). In addition and

by simple recursive substitution in the V AR(∞) representation, we have that

yt+h = Ah1yt +A

h2yt−1 + ...+ εt+h +B1εt+h−1 + ...+Bh−1εt+1 (11)

where:

(i) Ah1 = Bh for h ≥ 1

(ii) Ahj = Bh−1Aj +Ah−1j+1 where h ≥ 1; A0j+1 = 0; B0 = Ir; and j ≥ 1.

Now consider truncating the infinite lagged expression (11) at lag k

yt+h = Ah1yt +A

h2yt−1 + ...+A

hkyt−k+1 + vk,t+h (12)

vk,t+h =∞X

j=k+1

Ahj yt−j + εt+h +h−1Xj=1

Bjεt+h−j .

In what follows, we show that least squares estimates of (12) produce consistent estimates for

Ahj for j = 1, ..., k. Notice that this is more than we need since we are only interested in the

consistency of the Ah1 .

8

Let Γ(j) ≡ E(yty0t+j) with Γ(−j) = Γ(j)0. Further define:

(i) Xt,k =¡y0t,y

0t−1, ...,y

0t−k+1

¢0that is, the regressors in (12).

(ii) bΓ1,k,hkr×r

= (T − k − h)−1PT−ht=k Xt,ky

0t+h

(iii) bΓk = (T − k − h)−1PT−ht=k Xt,kX

0t,k

then, the mean-square error linear predictor of yt+h based on yt, ...,yt−k+1 is given by the

least-squares formula

bAr×kr

(k, h) = ( bAh1 , ..., bAhk) = bΓ01,k,hbΓ−1k (13)

The following theorem establishes the consistency of these least-squares estimates for A(k, h) =¡Ah1 , ..., A

hk

¢.

Theorem 1 Consistency. Let yt satisfy (8) and assume that:

(i) E|εitεjtεktεlt| <∞ for 1≤ i, j, k, l ≤ r

(ii) k is chosen as a function of T such that

k2

T→ 0 as T, k →∞

(iii) k is chosen as a function of T such that

k1/2∞X

j=k+1

kAjk→ 0 as T, k →∞

Then: °°° bA(k, h)−A(k, h)°°° p→ 0

The proof of this theorem is in the appendix and parallels the proof in Lewis and Reinsel (1985)

on the consistency properties of estimates from a truncated VAR when the underlying model is a

VAR(∞). A natural consequence of the theorem provides the essential result that we need, that

local projection estimates are consistent for the impulse response coefficients, that is, bAh1 p→ Bh.

9

3.2 Asymptotic Normality

We now show that least-squares estimates from the truncated projections in (12) are asymptoti-

cally normal, although for the purposes of the PMD estimator, proving that bAh1 is asymptoticallynormally distributed would suffice. Notice that we can write

bA(k, h)−A(k, h) = ((T − k − h)−1 T−hXt=k

vk,t+hX0t,k

)bΓ−1k= (T − k − h)−1

⎡⎣T−hXt=k

⎧⎨⎩⎛⎝ ∞Xj=k+1

Ahj yt−j

⎞⎠+ εt+h +h−1Xj=1

Bjεt+h−j

⎫⎬⎭X 0t,k

⎤⎦ bΓ−1k= (T − k − h)−1

⎧⎨⎩T−hXt=k

⎛⎝ ∞Xj=k+1

Ahj yt−j

⎞⎠X 0t,k

⎫⎬⎭nΓ−1k +³bΓ−1k − Γ−1k ó+

(T − k − h)−1⎧⎨⎩T−hXt=k

⎛⎝εt+h +h−1Xj=1

Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭nΓ−1k +³bΓ−1k − Γ−1k ó

Hence, the strategy of the proof will consist in showing that the first term in the sum above

vanishes in probability and that the second term converges in probability as follows,

(T − k − h)1/2 vech bA(k, h)−A(k, h)i p→

(T − k − h)1/2 vec⎡⎣(T − k − h)−1

⎧⎨⎩T−hXt=k


Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭Γ−1k⎤⎦

so that by showing that this last term is asymptotically normal, we complete the proof.

Define,

U1T =

⎧⎨⎩(T − k − h)−1T−hXt=k

⎛⎝ ∞Xj=k+1

Ahj yt−j

⎞⎠X 0t,k

⎫⎬⎭U∗2T =

⎧⎨⎩(T − k − h)−1T−hXt=k


Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭then

(T − k − h)1/2 vech bA(k, h)−A(k, h)i =

(T − k − h)1/2⎧⎪⎪⎨⎪⎪⎩

vec£U1TΓ

−1k

¤+ vec

hU1T

³bΓ−1k − Γ−1k í+vec

£U∗2TΓ

−1k

¤+ vec

hU∗2T

³bΓ−1k − Γ−1k í⎫⎪⎪⎬⎪⎪⎭

10

hence

(T − k − h)1/2 vech bA(k, h)−A(k, h)i− (T − k − h)1/2 vec £U∗2TΓ−1k ¤ =

(T − k − h)1/2⎧⎪⎪⎨⎪⎪⎩vec

£U1TΓ

−1k

¤+ vec

hU1T

³bΓ−1k − Γ−1k ´i+vec

hU∗2T

³bΓ−1k − Γ−1k ´i⎫⎪⎪⎬⎪⎪⎭ =

¡Γ−1k ⊗ Ir

¢vec

h(T − k − h)1/2 U1T

i+n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U1T i+n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U∗2T i

Define, with a slight change in the order of the summands,

W1T =n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U1T i

W2T =n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U∗2T i

W3T =¡Γ−1k ⊗ Ir

¢vec

h(T − k − h)1/2 U1T

ithen, in the next theorem we show that W1T

p→ 0, W2Tp→ 0, W3T

p→ 0.

Theorem 2 Let yt satisfy (8) and assume that

(i) E |εitεjtεktεlt| <∞; 1 ≤ i, j, k, l ≤ r

(ii) k is chosen as a function of T such that k3

T → 0, k, T →∞

(iii) k is chosen as a function of T such that

(T − k − h)1/2∞X

j=k+1

kAjk→ 0; k,T→∞

Then

(T − k − h)1/2 vech bA(k, h)−A(k, h)i p→

(T − k − h)1/2 vec⎡⎣⎧⎨⎩(T − k − h)−1

T−hXt=k


Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭Γ−1k⎤⎦

11

Proof is given in the appendix. Now that we have shown that W1T , W2T , and W3T vanish in

probability, all that remains is to show that

ST ≡ (T − k − h)1/2 vec⎡⎣(T − k − h)−1

⎧⎨⎩T−hXt=k


Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭Γ−1k⎤⎦ d→

N(0,Ωh) with Ωh =¡Γ−1k ⊗ Σh

¢; Σh =

⎛⎝Σε + h−1Xj=1

BjΣεB0j

⎞⎠Since, vec

h bA(k, h)−A(k, h)i p→ ST , and STd→ N(0,Ωh), then we will have vec

h bA(k, h)−A(k, h)i d→

N(0,Ωh). We establish this result in the next theorem.

Theorem 3 Let yt satisfy (8) and assume

(i) E|εitεjtεktεlt| <∞; 1≤ i, j, k, l ≤ r

(ii) k is chosen as a function of T such that

k3

T→ 0, k, T →∞

Then

STd→ N(0,Ωh)

The proof is provided in the appendix. Several results deserve comment. Observe that for each

horizon h considered, the asymptotic variance-covariance matrix Ωh is determined by the moving

average structure of the residuals, Ωh =¡Γ−1k ⊗ Σh

¢; Σh =

³Σε +

Ph−1j=1 BjΣεB

0j

´. Jordà (2005)

suggests that one way of estimating this variance-covariance matrix semiparametrically is with

the Newey-West (1987) heteroskedasticity and autocorrelation consistent estimator. However, if

the projections are estimated sequentially, notice that each of the projections up to horizon h− 1

provide consistent estimates of Bj j = 1, 2, ..., h − 1, which could then be used to estimate the

asymptotic variance-covariance matrix parametrically. We postpone discussion on the differences

between these two methods for a later paper and proceed with the alternative based on the

Newey-West estimator.

12

In practice, we find it convenient to estimate responses for horizons 1, ..., h jointly as follows.

Define,

(i) Xt−1,kr(k−2)×1

≡ ¡y0t−1, ...,y0t−k+1¢0(ii) Yt,h

rh×1≡ ¡y0t+1, ...,y0t+h¢0

(iii) Mt−1,k1×1

≡ 1−PT−ht=k X

0t−1,k

³PT−ht=k Xt−1,kX

0t−1,k

´−1Xt−1,k

(iv) bΓ1|kr×r≡ (T − k − h)−1PT−h

t=k ytMt−1,ky0t

(v) bΓ1,h|kr×rh

≡ (T − k − h)−1PT−ht=k ytMt−1,kY 0t,h

Hence, the impulse response coefficient matrices for horizons 1 through h can be jointly esti-

mated in a single step with

bΓ01,h|kbΓ−11|k =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

bA11bA21bAh1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦=

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

bB1bB2...

bBh

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦= bB(1, h) (14)

Using the usual least-squares formulas, notice that

bB(1, h) = B(1, h) +((T − k − h)−1 T−hXt=k

ytMt−1,kV 0t,h

)0 bΓ−11|k + op(1) (15)

where Vt,h ≡¡v0t+1, ...,v

0t+h

¢0;vt+j = εt+j + B1εt+j−1 + ... + Bj−1εt+1 for j = 1, ..., h and the

terms vanishing in probability in (15) involve the terms U1T , U2T , and U3T defined in the proof of

theorem one, which makes use of the condition k1/2P∞j=k+1 ||Aj || → 0 as T, k → ∞. Under the

conditions of theorem 2, wew can write

(T − k − h)1/2vec³ bB(1, h)−B(1, h)´ p→ (16)

(T − k − h)1/2 vec"((T − k − h)−1

T−hXt=k

Vt,hMt−1,ky0t

)bΓ−11|k#

13

from which we can derive the asymptotic distribution under theorems 2 and 3.

Next notice that

(T − k − h)−1T−hXt=k

Vt,hV0t,h

p→ Σvrh×rh

(17)

The specific form of the variance-covariance matrix Σv can be derived as follows. Define 0j = 0j×j;

0m,n = 0m×n;. Recall that Vt,h ≡

¡v0t+1, ...,v

0t+h

¢0from above. Specifically,

Vt,h =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

εt+1

εt+2 +B1εt+1

...

εt+h +B1εt+h−1 + ...+Bh−1εt+1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦= ΨBεt,h,

where εt,h ≡¡ε0t+1, ..., ε0t+h

¢0and ΨB = [ΨB,1 . . .ΨB,h] with

ΨB,1 =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

Ir

B1

...

Bh−1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ΨB,h =

⎡⎢⎢⎣ 0r(h−1),r

Ir

⎤⎥⎥⎦ and for 1 < j < h, ΨB,j =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

0r(j−1),r

Ir

...

Bj−1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦.

Then, assuming Σεr×r

= E(εt+1ε0t+1), it is easy to show that

Σv = ΨBIr ⊗ ΣεΨ0B (18)

and hence that

(T − k − h)1/2 vec³ bB(1, h)−B(1, h)´ d→ N (0,ΩB)

ΩBr2h×r2h

=

ÃΓ−11|kr×r⊗ Σvrh×rh

!

In practice, one requires sample estimates bΓ−11|k and bΣv.We have already defined a sample estimatefor the first term, however, the latter estimate deserves particular consideration. As we mentioned

14

in our derivations for theorem 3, one avenue would be to estimate expression (18) by a Newey-

West type estimator of the variance-covariance matrix Σv. However, given the parametric form

of expression (18), it would be easy to construct a sample estimate of ΩB by plugging-in the

estimates bB(1, h) and bΣε into the expression (18). Furthermore, feasible GLS estimates of B(1, h)could be easily computed in two steps: (1) compute bB(1, h), bΣε, and hence bΣv (2) do GLS byscaling Yt,h with bΣ−1/2v . We defer further investigation on the properties of this estimator to a

different paper to avoid burdening the reader with too many results.

In practice, we find it convenient to estimate responses for horizons 1, ..., h jointly as follows.

define yj for j = h, ..., 1, 0, —1, ..., —k as the (T − k − h)× r matrix of observations of the vector

yt+j. Additionally, define Y ≡ (y1, ...,yh)0 ; X ≡ (Ih ⊗ y0) ; Z ≡

¡1(T−k−h)×r,y−1, ...,y−k+1

¢and hence, Mz = I(T−k−h)h − (Ih ⊗ Z)

£(Ih ⊗ Z)0 (Ih ⊗ Z)

¤−1(Ih ⊗ Z) . Standard properties of

least-squares allow us to jointly estimate

bB(1, h) =⎡⎢⎢⎢⎢⎢⎢⎣bB1...

bBh

⎤⎥⎥⎥⎥⎥⎥⎦ = [X0MzX]

−1[X 0MzY ] (19)

and with asymptotic variance-covariance matrix for vec( bB(1, h)), bV 2b = n[X 0MzX]−1 ⊗ bΣho ; bΣh =³bΣ+Ph−1

j=1bBj bΣ bB0j´ , and bΣ = v0v

T−k−h ; bv =MzY −MzX bB(1, h). Hence, recall that what is neededfor the two-step estimator is an estimate of b = vec(B) = vec(B(0, h)) where B0 = Ir. Hence,

from (19), bbT = (vec(Ir) vec( bB(1, h))0 where bV 2b can easily be redefined as

bV 2b =⎛⎜⎜⎝ 0r 0r×(hr2)

0(hr2)×r bV 2b⎞⎟⎟⎠

We conclude this section by highlighting its main results and drawing a brief comparison to

estimates of the Bj based on a V AR(k). First, our local projection estimator ensures that we have

consistent estimates of Bj for j = 1, ..., h. We have shown that this result holds even when the

underlying process is an infinite order V AR. Secondly, we have derived the variance-covariance

15

matrix of B(1, h). Finally, even if the underlying data generating process is more complex than an

infinite order V AR, the nature of the local approximation of the projections is likely to provide

more robust estimates of the Bj . In contrast, estimates of Bj based on a V AR(k) have serious

disadvantages. First, consistency of the Bj is only available for j = 1, ..., k. For any j > k the

Bj will be inconsistent for processes whose V AR representation has more than k lags (as would

be expected with the V ARMA representations of the newest generation of dynamic, stochastic,

general equilibrium model). Seconly, a closed form expression for the variance-covariance matrix

of the Bj is unavailable. This prevents us from forming the optimal weighting matrix in the

minimum distance step and complicates the computation of valid standard errors for the structural

parameter estimates.

4 The Second Step: Minimum Chi-Square

4.1 Consistency

Given bbT from the first-stage described in the previous sections, our objective here is to estimate

φ = vec(Φ1) vec(Φ2)0 from the conditions in (5) and minimization of expression (7), where

expression (6) is rewritten here for convenience

g³bbT ;φ´ = (Ir ⊗ S0)bbT − n³Ir ⊗ S1 bBT´ ³

Ir ⊗ S2 bBT´oφand hence the quadratic function we need to minimize is

minφ

bQT (φ) = g ³bbT ;φ´0cWg ³bbT ;φ´where bφT denotes the sample estimate of φ from this two-step estimator and Q0(φ) denotes

the objective function at b0. The following theorem establishes the conditions under which this

estimator is consistent for φ0.

16

Theorem 4 Given that bbT p→ b0, assume that

(i) cW p→W, a positive semidefinite matrix

(ii) Q0(φ) is uniquely maximized at φ0

(iii) The parameter space φ0 ∈ is compact

(iv) Q0(φ) is continuous

(v) bQT (φ) converges uniformly in probability to Q0(φ)(vi) hr2 ≥ dim(φ)

Then

bφT p→ φ0

Proof is provided in the appendix. Next, we show that the minimum chi-square estimator is

asymptotically normal.

4.2 Asymptotic Normality

The results in this section are based on Theorem 3.2 and pages 2149, 2175 in Newey and McFadden

(1994). The following theorem derives the asymptotic distribution of bφT .Theorem 5 Assume:

(i) cW p→W, a positive semidefinite matrix

(ii) bbT p→ b0 and bφT p→ φ0

(iii) b0 and φ0 are in the interior of their parameter spaces

(iv) g(bbT ;φ) is continuously differentiable in a neighborhood N of φ0

(v)√Tg(b0;φ0) =

√T (Ir ⊗ S0)

³bbT − b0´ d→ N(0, VB)

17

(vi) There is a Gb and a Gφ that are continuous at b0 and φ0 respectively and

supφ∈N

°°°∇bg(bbT ;φ)−Gb°°° p→ 0

supφ∈N

°°°∇φg(bbT ;φ)−Gφ

°°° p→ 0

(vii) For Gφ = Gφ(φ0), then G0φWGφ in invertible

(viii) hr2 ≥ dim(φ)

Then:

√T³bφT − φ0

´d→ N (0, Vφ) (20)

Vφ =¡G0φWGφ

¢−1G0φW (I ⊗ S0)0 Vb (I ⊗ S0)WGφ

¡G0φWGφ

¢−1+

¡G0φWGφ

¢−1G0bWΣhWGb

¡G0φWGφ

¢−1+

¡G0φWGφ

¢−1 G0φWGb −G0bMzX0VbG0φWGb −G0bMzX¡G0φWGφ

¢−1The optimal weighting matrix W is

Wo =©(I ⊗ S0)0 Vb (I ⊗ S0)

ª−1which would simplify the expression of the variance further into

Vφ =¡G0φWoGφ

¢−1+ V o

φ(bT )

where V oφ(bT )

denotes that portion of the variance of bφT due to the fact that bbT is estimated andcorresponds to the last two lines in expression (20) evaluated atWo. Absent this uncertainty, then

the expression for Vφ would simplify to³G0φWoGφ

´−1, the usual minimum distance result with

optimal weighting matrix.

5 PMD: A Summary for Practitioners

The PMD estimator of the model in expression (1) and repeated here for convenience

18

Φ00yt = Φ01yt−1 + Φ

02Etyt+1 + ut, E (u0tut) = I

consists of the following steps:

1. Construct Y = (y1, ...,yh)0; X = (Ih ⊗ y0) ; Z =

¡1(T−k−h)×r,y−1, ...,y−k+1

¢; Mz =

I(T−k−h)h− (Ih ⊗ Z)£(Ih ⊗ Z)0 (Ih ⊗ Z)

¤−1(Ih ⊗ Z) , where yj is the (T −k−h)×r matrix

of observations for the vector yt+j .

2. Compute by least squares bbT = vec(Ir bB(1, h)), wherebB(1, h) = [X 0MzX]

−1[X 0MzY ]

3. Compute bVb as bVb = (T − k − h)−1n[X 0MzX]

−1 ⊗ bΣho ; bΣh estimated by Newey-West onthe residuals of the regression in bullet point 2. Remember to redefine bVb by adding theappropriate zeros corresponding to the variance and covariances of the term I in bbT

4. Define bYb = (Ir ⊗ S0)bbT ; bXb = n³Ir ⊗ S1 bBT´ ³Ir ⊗ S2 bBT´o where S0, S1, S2 are the ma-

trices that select the appropriate elements of bBT = bB(1, h). Further define φ = vec(Φ1) vec(Φ2) .

5. The minimum chi-square estimate of φ is easily computed by weighted least-squares

bφT = ³ bXb bV −1bbXb´−1 ³ bXb bV −1b

bYb´

6. The variance-covariance matrix for bφT can be calculated as:bVφ = 1

T − k − h³ bX 0

bcWo

bXb´−1 + 1

T − k − hbV oφ(bT )

where cWo =n(I ⊗ S0)0 bVb (I ⊗ S0)o−1 and the expression for bV oφ(bT ) can be computed di-

rectly from expression (20).

7. A test of misspecification can be constructed by noticing that

³bYb − bXbbφT´0cW0

³bYb − bXbbφT´ d→ χ2(h−2)r2

19

6 Invalid Instruments: PMD versus GMM

We begin this section with a simple motivating example. Suppose the DGP is characterized by

the backward/forward model:

yt = φ1yt−1 + φ2Etyt+1 + εt. (21)

Instead, Euler conditions of conventional rational expectations models can be expressed as

yt = ρEtyt+1 + ut, (22)

which in this example, are misspecified with respect to the DGP. Based on the economic model

in (22), yt−j ; j > 1 would be considered valid instruments for GMM estimation and hence, an

estimate of ρ would be found with the set of conditions

bρGMM =

Ã1

T

TXyt+1yt−j

!−1Ã1

T

TXyt−jyt

!. (23)

It is easy to see that the probability limit of these conditions is

bρGMMp→ φ2 + φ1

γj−1γj+1

; j ≥ 1

where γj = COV (ytyt−j). Notice that the bias, φ1γj−1γj+1

, does not disappear by selecting longer

lags of yt−j as instruments, the usual GMM recommendation. In fact, any yt−j ; j ≥ 1 is an

invalid instrument and although γj → 0 as j →∞, γj−1γj+1is indeterminate, since both covariances

are simultaneously going to zero. Meanwhile, as j → ∞ the correlation of the instrument with

the regressor is exponentially decaying to zero.

Instead consider what happens in PMD. Notice that the MA(∞) representation of (21) is

yt =∞Xj=0

bjεt−j

20

and hence, under the proposed model in (22), we obtain the following set of conditions analogous

to (3), namely

⎡⎢⎢⎢⎢⎢⎢⎣b1

...

bh

⎤⎥⎥⎥⎥⎥⎥⎦ =⎡⎢⎢⎢⎢⎢⎢⎣

1

...

bh−1

⎤⎥⎥⎥⎥⎥⎥⎦ ρ (24)

which we can use to set-up the PMD estimator. Local projections for the bj are

bbj = Ã 1T

TXy0t−jMt−jyt−j

!−1Ã1

T

TXy0t−jMt−jyt

!

whereMt = 1−zt (z0tzt)−1 zt and zt = (1 yt−2 ...yt−k+1) . In order to compare PMD directly with

the GMM expression (23) notice that

bj = ρbj−1

then, for j ≥ 1

bρPMD =

Ã1

T

TXy0t−jMt−jyt−j

!−1Ã1

T

TXy0t−jMt−jyt

!

and hence

bρPMDp→ φ1

since

1

T

TXyt−j−1Mt−jyt

p→ 0;1

T

TXεt−jMt−jyt

p→ 0.

Clearly, a main distinguishing characteristic between PMD and GMM is that PMD uses the

semiparametric estimates of the impulse responses to generate conditional instruments that are

21

valid even if the model is misspecified. In contrast, GMM uses endogenous instruments uncon-

ditionally and therefore their validity heavily depends on having specified the correct model. A

second advantage of PMD is that the optimal weights assigned to each impulse responses con-

dition are inversely proportional to the estimated variance of the impulse response coefficient.

Hence, it is the marginal informational content of this conditional instrument that determines its

contribution to the parameter estimates and its variance.

7 Monte Carlo Experiments (incomplete)

7.1 Estimating ARMA(p,q) models with PMD

This section investigates the small sample properties of PMD. We take this opportunity to further

demonstrate the flexibility of our method by experimenting with univariate ARMA specifica-

tions, which would typically require numerical optimization routines. In particular, consider a

covariance-stationary ARMA(p,q) model,

yt = ρ1yt−1 + ...+ ρpyt−p + εt + θ1εt−1 + ...+ θqεt−q (25)

with Wold decomposition,

yt =∞Xj=0

bjεt−j (26)

with b0 = 1. Matching the responses bi in (26) into (25) delivers the set of conditions

22

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

b1

b2

b3

...

bh

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦=

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1

b1

b2

...

bh−1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ρ1 +

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

0

1

b1

...

bh−2

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ρ2 + ...+

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

0

0

0

...

bh−p

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ρp + (27)

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1

0

0

...

0

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦θ1 +

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

0

1

0

...

0

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦θ2 + ...+

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

0

0

0

...

0

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦θq

Table 1 summarizes a battery of experiments based on several ARMA(1,1) specifications that

include pure AR and pure MA examples as well. For each choice of ARMA parameters indicated

in the table, residuals are drawn from a standard normal and used to generate samples of 100 and

300 observations, where the first 100 observations generated are discarded to avoid initialization

problems. The horizon h is chosen when a joint F-test of the h+1 projection fails to reject the null

that the projection coefficients are zero. Finally, we compare the properties of ERME against the

Durbin-Hannan-Rissanen estimator for ARMA models (see Durbin, 1960 and Hannan-Rissanen,

1982). This is a two-step estimator based on fitting a long autoregression in the first step and

then using the residuals to estimate the ARMA model in the second step. We choose this method

for several reasons. Maximum likelihood estimation requires numerical routines which crashed

for some draws of the residuals. The Durbin-Hannan-Rissanen estimator does not have these

numerical problems and provides asymptotically consistent and efficient estimates1 .

Overall, we found the results of our Monte-Carlo experiment very encouraging. ERME esti-

1 The GAUSS routines used to estimate the models with the Durbin-Hannan-Rissanenapproach were obtained from the collection of time series routines prepared by RainerSchlittgen and Thomas Noack and available at the American University repository:http://www.american.edu/academic.depts/cas/econ/gaussres/timeseri/timeseri.htm

23

mates are numerically very similar (even slightly more accurate) to the Durbin-Hannan-Rissanen

estimates and close to the true parameter values. Although not reported in the table, we found the

numerical estimates and associated standard errors to be very stable to the choice of truncation

horizon h. Standard errors from ERME tended to be slightly less efficient, the difference becoming

smaller for larger samples or when the truncation horizon is larger.

7.2 Fuhrer and Olivei (2004)

7.3 Systems

8 Application: Fuhrer and Olivei (2004) revisited

The popular, elementary New-Keynesian framework for monetary policy analysis combines forward-

looking, micro-founded output (IS curve) and inflation (AS curve) Euler equations with a policy

reaction function. The specific expectations mechanism considered plays a prominent role: purely

forward-looking expectations mechanisms seem to be at odds with the dynamic properties of the

data. Fuhrer and Olivei (2004) investigate whether the nature of this dissonance can be explained

by the poor properties of weak instruments GMM or whether “hybrid” specifications, that simul-

taneously incorporate forward and backward-looking behavior, are warranted. To this end, Fuhrer

and Olivei (2004) propose a GMM procedure that imposes the dynamic constraints implied by

the economic model on the instruments. They dub this procedure “optimal-instruments” GMM

(OI-GMM) and explore its properties relative to conventional GMM and MLE.

This section uses ERME on the specifications in Fuhrer and Olivei (2004) to offer a benchmark

comparison between GMM, MLE and OI-GMM, and ERME. The basic specification is:

zt = (1− µ) zt−1 + µEtzt+1 + γEtxt + εt (28)

In the output Euler equation, zt is a measure of the output gap, xt is a measure of the real interest

rate, and hence, γ < 0. In the inflation Euler version of (28), zt in a measure of inflation, xt is

a measure of the output gap, and γ > 0 signifying that a positive output gap exerts “demand

24

pressure” on inflation.

Fuhrer and Olivei (2004) experiment with a quarterly sample from 1966:Q1 to 2001:Q4 and

use the following measures for zt and xt. The output gap is measured, either by the log deviation

of real GDP from its HP trend or, from a segmented time trend with breaks in 1974 and 1995.

Real interest rates are measured by the difference of the federal funds rate and next period’s

inflation. Inflation is measured by the log change in the GDP, chain-weighted price index. In

addition, Fuhrer and Olivei (2004) experiment with real unit labor costs instead of the output gap

for the inflation Euler equation. Further details can be found in their paper. Accordingly, Tables

2 and 3 reproduce Tables 4 and 5 in Fuhrer and Olivei (2004) and incorporate the results we find

from ERME. In addition, we report unconstrained estimates of expression (28) labeled with the

superscript “u” for unconstrained (“c” for constrained). The projection lag length of ERME’s first

step is determined by Akaike’s information criterion and the horizon truncation h is determined

with a joint significance F-test of the projections at each horizon, as we did in the Monte Carlos.

Table 2 shows significant discrepancies between GMM, MLE and OI-GMM on one hand, and

ERME on the other. Where as the former methods’ estimates of µ range between 0.52 to 0.41,

ERME’s estimates are in the range −0.40 to −0.17. Interestingly, ERME finds a much stronger

link and of the correct sign between current and future interest rates and the output gap - a

coefficient that is routinely estimated to be insignificant in the literature. Our estimates range

from −0.13 to −0.10, whereas maximum likelihood suggests this coefficient to be −0.0084 at best.

At worst the coefficient is of the wrong sign and insignificant in all other cases.

We investigated two possible explanations of the discrepancies between methods. First, we

considered whether the constraint that the coefficient on zt−1 and the coefficient on Etzt+1 add

up to one and imposed by Fuhrer and Olivei (2004), was being violated by the data. This turned

out not to be the case as can be observed from Table 2. Secondly, Cameron and Trivedi (2005)

suggest that in small samples, optimally weighted minimum distance estimators can be biased and

that it may be preferable to use a weighting matrix with equal weights. The results with equal

25

weights ERME are reported in the bottom third of table 2. Estimates of µ and γ under both the

constrained and unconstrained versions changed very little indeed, suggesting that this was not

the source of discrepancy.

What could explain the differences then? The overall specification J-test from ERME suggests

that, whether unconstrained or not, estimated with equal weights or not, the theoretical model

is misspecified. Because MLE and OI-GMM impose this theoretical and misspecified model on

the data, we attribute differences across methods to this constraint. This impression was further

reinforced by the results reported in Table 3. As we shall see momentarily, when the J-test indicates

that the theoretical model is not misspecified, estimates from ERME are almost identical to MLE

and OI-GMM. Because ERME is robust to misspecification of the underlying data generating

process, we feel strongly about our results if only the controlled environment of a Monte-Carlo

exercise could arbitrate among the merit of each method.

Table 3 reports estimates of the inflation Euler equation. As we remarked above, when ERME’s

J-test does not find evidence of misspecification, ERME’s estimates are very close to the MLE

estimates (in fact, somewhat closer than OI-GMM). µ is estimated to be approximately 0.20

and the output gap enters with the correct sign and with a coefficient of about 0.10. However,

discrepancies among methods arise when we use real unit labor costs instead of the output gap. In

this case, the J-test suggests the model is misspecified and, while ERME finds a value of µ which

is still in the neighborhood of 0.20, it finds that the effect of real unit labor costs on inflation is

much larger: 0.35 and highly significant. In contrast, both ML and OI-GMM have µ growing from

0.20 to 0.45 while the effect of real unit labor costs becomes insignificant.

We conclude this section by observing that ERME estimates were insensitive to the choice

of projection lag length (we obtained similar results with varying lag length specifications) and

changed very little when we vary the number of impulse response horizons used. Obviously, while

our method produces results that are preferable from the economic point of view (in that the

coefficient estimates of γ are more closely in line with a priori economic intuition), the more sensible

26

argument comes from observing that, while MLE and OI-GMM impose the economic model on the

data (and hence their properties cannot be assessed when this assumption is violated), ERME does

no such thing — it uses local projections to obtain the most accurate description of the properties

of the data generating process and uses these to obtain estimates of the parameters of interest.

9 Conclusions

This paper introduces a disarmingly simple method to estimate rational expectations models based

on mapping the structural parameters of the economic model into the impulse responses from the

data. The method avoids many of the computational complexities associated with full-information

maximum likelihood or the optimal instruments GMM approach proposed by Fuhrer and Olivei

(2004), yet possesses several advantages over these alternative estimation procedures. By assuming

that the solution paths and exogenous stochastic processes in the model are characterized by a

Wold representation rather than by a specific Markov process, we require less knowledge about

the data generating process and can accommodate more general dynamic specifications. Perhaps

most importantly, our efficient response matching estimator is designed to estimate parameters

so that the model fits the data precisely along the dimensions against which it will be evaluated.

The success or failure of dynamic stochastic general equilibrium models is ultimately based on

their ability to explain the dynamic interactions among the economic variables involved, which

interactions are typically summarized by impulse response functions. It is in this sense that our

method conforms to the introductory quote by Sims.

An important feature of our method originates from how we adapt the local projection re-

sponse estimator proposed by Jordà (2005). This impulse response estimator is more robust to

general misspecification of the relevant underlying stochastic processes. This feature is particu-

larly advantageous as it allows the ERME method to extract additional information from longer

horizon impulse responses in cases where the Wold decomposition differs from the typically as-

sumed Markov format. In the empirical example, this additional information tightened estimates

27

of standard errors considerably.

New second and higher order accurate solution techniques for nonlinear dynamic stochastic

general equilibrium models (see, e.g. Kim et al., 2003) deliver equilibrium conditions that are

polynomial (rather than linear) difference equations. Future research will investigate the ERME

method in these environments. We expect that, while the relationship between impulse response

coefficients and the model’s parameters will no longer be linear, the only real variation in our

method will be a nonlinear-GLS type step, which is still relatively straight-forward to apply and

may open the door to formal evaluation of this next generation of models.

10 Appendix

Proof. Theorem 1

Notice that

bA(k, h)−A(k, h) = bΓ01,k,hbΓ−1k −A(k, h)bΓkbΓ−1k =⎧⎨⎩(T − k − h)−1∞Xj=k

vk,t+hX0t,k

⎫⎬⎭ bΓ−1kwhere

vk,t+h =∞X

j=k+1

Ahj yt−j + εt+h +h−1Xj=1

Bjεt+h−j

Hence,

bA(k, h)−A(k, h) =

⎧⎨⎩(T − k − h−1)T−hXt=k

⎛⎝ ∞Xj=k+1

Ahj yt−j

⎞⎠X 0t,k

⎫⎬⎭ bΓ−1k +

((T − k − h−1)

T−hXt=k

εt+hX0t,k

)bΓ−1k +⎧⎨⎩(T − k − h−1)T−hXt=k

⎛⎝ hXj=1

Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭ bΓ−1kDefine the matrix norm kCk21 = supl6=0 l

0C0C0l0l , that is, the largest eigenvalue of C 0C. When C is

symmetric, this is the square of the largest eigenvalue of C. Then

kABk2 ≤ kAk21 kBk2 and kABk2 ≤ kAk2 kBk21

28

Hence °°° bA(k, h)−A(k, h)°°° ≤ kU1T k°°°bΓ−1k °°°1+ kU2T k

°°°bΓ−1k °°°1+ kU3T k

°°°bΓ−1k °°°1

where

U1T =

⎧⎨⎩(T − k − h−1)T−hXt=k

⎛⎝ ∞Xj=k+1

Ahj yt−j

⎞⎠X 0t,k

⎫⎬⎭U2T =

((T − k − h−1)

T−hXt=k

εt+hX0t,k

)

U3T =

⎧⎨⎩(T − k − h−1)T−hXt=k

⎛⎝ hXj=1

Bjεt+h−j

⎞⎠X 0t,k

⎫⎬⎭Lewis and Reinsel (1985) show that

°°°bΓ−1k °°°1is bounded, therefore, the next objective is to show

kU1T k p→ 0, kU2T k p→ 0, and kU3T k p→ 0. We begin by showing kU2T k p→ 0, which is easiest to

see since εt+h and X 0t,k are independent, so that their covariance is zero. Formally and following

similar derivations in Lewis and Reinsel (1985)

E³kU2T k2

´= (T − k − h)−2

T−hXt=k

E¡εt+hε

0t+h

¢E(X 0

t,kX0t,k)

by independence. Hence

E³kU2T k2

´= (T − k − h)−1tr(Σ)k tr [Γ(0)]

Since kT−k−h → 0 by assumption (ii), then E

³kU2T k2

´p→ 0, and hence kU2T k p→ 0.

Next, consider kU3T k p→ 0. The proof is very similar since εt+h−j, j = 1, ..., h− 1 and X 0t,k are

independent. As long as kBjk2 < ∞ (which is true given that the Wold decomposition ensures

thatP∞j=0 kBjk < ∞, then using the same arguments we used to show kU2T k

p→ 0, it is easy to

see that kU3T k p→ 0.

Finally, we show that kU1T k p→ 0.The objective here is to show that assumption (iii) implies

that

k1/2∞X

j=k+1

°°Ahj °°→ 0, k, T → 0

29

because we will need this condition to hold to complete the proof later. Recall that

Ahj = Bh−1Aj +Ah−1j+1 ; A

0j+1 = 0; B0 = Ir; h, j ≥ 1, h finite

Hence

k1/2∞X

j=k+1

°°Ahj °° = k1/2⎧⎨⎩

∞Xj=k+1

kBh−1Aj +Bh−2Aj+1 + ...+B1Aj+h−2 +Aj+h−1k⎫⎬⎭

by recursive substitution. Thus

k1/2∞X

j=k+1

°°Ahj °° ≤ k1/2⎧⎨⎩

∞Xj=k+1

kBh−1Ajk+ ...+ kB1Aj+h−2k+ kAj+h−1k⎫⎬⎭

Define λ as the max kBh−1k , ..., kB1k , then sinceP∞j=0 kBjk <∞ we know λ <∞ so that

k1/2∞X

j=k+1

°°Ahj °° ≤ k1/2⎧⎨⎩λ

∞Xj=k+1

kAjk+ ...+ λ∞X

j=k+1

kAj+h−2k+∞X

j=k+1

kAj+h−1k⎫⎬⎭

By assumption (iii) and since λ <∞, then each of the elements in the sum goes to zero as T, k go

to infinity. Finally, to prove kU1T k p→ 0 all that is required is to follow the same steps as in Lewis

and Reinsel (1985) but using the condition

k1/2∞X

j=k+1

°°Ahj °°→ 0, k, T → 0

instead.

Proof. Theorem 2

We begin by showing thatW1Tp→ 0. Lewis and Reinsel (1985) show that under assumption (ii),

k1/2°°°bΓ−1k − Γ−1k °°°

1

p→ 0 and E³°°°k−1/2 (T − k − h)1/2 U1T°°°´ ≤ s (T − k − h)1/2P∞j=k+1 °°Ahj °° p→

0; k, T → ∞ from assumption (iii) and using similar derivations as in the proof of consistency

with s being a generic constant. Hence W1Tp→ 0.

Next, we show W2Tp→ 0. Notice that

|W2T | ≤ k1/2°°°bΓ−1k − Γ−1k °°°

1

°°°k−1/2(T − k − h)1/2U∗2T°°°As in the previous step, Lewis and Reinsel (1985) establish that k1/2

°°°bΓ−1k − Γ−1k °°°1

p→ 0 and from

the proof of consistency, we know the second term is bounded in probability, which is all we need

to establish the result.

30

Lastly, we need to showW3Tp→ 0, however, the proof of this result is identical to that in Lewis

and Reinsel once one realizes that assumption (iii) implies that

(T − k − h)1/2∞X

j=k+1

°°Ahj °° p→ 0

and substituting this result into their proof.

Proof. Theorem 3

Follows directly from Lewis and Reinsel (1985) by redefining

STm = (T − k − h)1/2 vec⎡⎣⎧⎨⎩(T − k − h)−1

T−hXt=k


Bjεt+h−j

⎞⎠X 0t,k(m)

⎫⎬⎭Γ−1k⎤⎦

for m = 1, 2, ... and Xt,k(m) as defined in Lewis and Reinsel (1985).

Proof. Theorem 4

Notice that when bbT p→ b0, then

g(bbT ;φ) = (Ir ⊗ S0)bbT − n³Ir ⊗ S1 bBT´ ³Ir ⊗ S2 bBT´oφ p−→

= (Ir ⊗ S0) b0 − (Ir ⊗ S1B0) (Ir ⊗ S2B0)φ

= g (b0;φ)

by the continuous mapping theorem. Furthermore and given assumption (i)

Q0(φ) = g(b0;φ)0Wg(b0;φ)

which is a quadratic expression that is uniquely maximized at φ0. Q0(φ) is obviously a continuous

function. Hence, the last thing we need to show is that

supφ∈Θ

¯ bQT (φ)−Q0(φ)¯ p→ 0

uniformly. [I have not done this formally yet. Here is where the stochastic equicontinuity may

come in]

Proof. Theorem 5

31

The strategy of the proof consists in stacking the first and second steps into a vector of

conditions whose weighted quadratic distance we want to minimize. This set-up then allows us to

apply the mean value theorem on the set of first order conditions and derive the desired result.

Therefore, let

mT (b) =MzY −MzXb

and notice that

minbmT (b)

0mT (b)

results in

bbT = (X 0MzX)−1(X 0MzY ) ,

the local projection estimator for the first-step, reduced-form, impulse responses we presented in

section 3. Hence, stacking the first and second stage conditions into a vector, we have

RT (b,φ) =

∙mT (b)

0 g³bbT ;φ´0 ¸

⎡⎢⎢⎣ I 0

0 W

⎤⎥⎥⎦⎡⎢⎢⎣ mT (b)

g³bbT ;φ´

⎤⎥⎥⎦ .The first order conditions of the problem

minβ,φ

RT (b,φ)

are:

⎡⎢⎢⎣ ∇bmT (b)0

0

∇bg³bbT ;φ´0 ∇φg

³bbT ;φ´0⎤⎥⎥⎦⎡⎢⎢⎣ I 0

0 W

⎤⎥⎥⎦⎡⎢⎢⎣ mT (b)

g³bbT ;φ´

⎤⎥⎥⎦ = 0. (29)

Notice that

32

∇bmT (b) = −MzX

∇bg (b;φ) = (I ⊗ S0)− (Φ01 ⊗ I) (I ⊗ S−1) + (Φ02 ⊗ I) (I ⊗ S1)

∇φg (b;φ) = − (I ⊗ S−1)B (I ⊗ S1)B

We find it convenient to define Gb = ∇bg (b;φ) and Gφ = ∇φg (b;φ) and we use Gb = ∇bg¡b;φ¢

to evaluate the gradient at a mean value b ∈hbbT , b0iand bGb = ∇bg ³bbT ; bφT´ and similarly for Gφ

and bGφ.

Next, apply the mean value theorem to the last term in the first order conditions (29):

⎡⎢⎢⎣ mT (b)

g³bbT ;φ´

⎤⎥⎥⎦ =⎡⎢⎢⎣ m (b0)

g (b0;φ0)

⎤⎥⎥⎦+⎡⎢⎢⎣ −MzX 0

Gb Gφ

⎤⎥⎥⎦⎡⎢⎢⎣ bbT − b0bφT − φ0

⎤⎥⎥⎦ (30)

and plug this expression back into the first order conditions (29)

⎡⎢⎢⎣ −X 0Mz 0

bG0b bGφ

⎤⎥⎥⎦⎡⎢⎢⎣ I 0

0 W

⎤⎥⎥⎦×⎧⎪⎪⎨⎪⎪⎩⎡⎢⎢⎣ m (b0)

g (b0;φ0)

⎤⎥⎥⎦+⎡⎢⎢⎣ −MzX 0

Gb Gφ

⎤⎥⎥⎦⎡⎢⎢⎣ bbT − b0bφT − φ0

⎤⎥⎥⎦⎫⎪⎪⎬⎪⎪⎭ = 0

After tedious algebra, we obtain

⎡⎢⎢⎣ bbT − b0bφT − φ0

⎤⎥⎥⎦ = (31)

⎡⎢⎢⎣ (X 0MzZ)−1 0

−³ bG0φWGφ

´−1 n bG0φWGb − bG0bMzXo(X 0MzZ)

−1 ³ bG0φWGφ

´−1⎤⎥⎥⎦×

⎡⎢⎢⎣ −X 0Mz 0

bG0b bG0φW⎤⎥⎥⎦⎡⎢⎢⎣ m (b0)

g (b0;φ0)

⎤⎥⎥⎦

33

Notice that from conditions and assumptions of the problem,

√Tm (b0)

d→ N (0,Σh)

√T (I ⊗ S0) g (b0;φ0) d→ N (0, Vb)

bGb, Gb p→ Gb

bGφ, Gφp→ Gφ

Therefore, the second row of expression (31) allows us to write

√T³bφT − φ0

´d→ N (0, Vφ)

Vφ =¡G0φWGφ

¢−1G0φW (I ⊗ S0)0 Vb (I ⊗ S0)WGφ

¡G0φWGφ

¢−1+

¡G0φWGφ

¢−1G0bWΣhWGb

¡G0φWGφ

¢−1+

¡G0φWGφ

¢−1 G0φWGb −G0bMzX0VbG0φWGb −G0bMzX¡G0φWGφ

¢−1where we make use of the fact that

(X 0MzX)−1X 0MzΣhMzX (X

0MzX)−1= Vb

ReferencesBlanchard, Olivier and Danny Quah (1989) “The Dynamic Effects of Aggregate Demandand Aggregate Supply Disturbances,” American Economic Review, 79: 655-73

Cameron, A. Colin and Pravin Trivedi (2005) Microeconometrics: Methods and Ap-plications. Cambridge: Cambridge University Press.

Christiano, Lawrence J. (2002) “Solving Dynamic Equilibrium Models by a Method of Un-determined Coefficients, Computational Economics, 20, 21-55.

Christiano, Lawrence J., Martin Eichenbaum and Charles L. Evans (2005) “Nominal Rigidi-ties and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy,v. 113, 1-45.

34

Cooley, Thomas F. (1997) “Calibrated Models,” Oxford Review of Economic Policy, 13:55-69.

Durbin, James (1960) “The Fitting of Time-Series Models,” Revue de l’Institut Internationalde Statistique, 28(3), 233-244.

Evans, George W. and Seppo Honkapohja (2001) Learning and Expectations in Macro-economics. Princeton and Oxford: Princeton University Press.

Farmer, Roger E. A. (1993) The Macroeconomics of Self-Fulfilling Prophecies. Cam-bridge, MA: MIT Press.

Ferguson, Thomas S. (1958) “A Method of Generating Best Asymptotically Normal Esti-mates with Application to the Estimation of Bacterial Densities,” Annals of MathematicalStatistics, 29: 1046-62.

Fuhrer, Jeffrey C. and Giovanni P. Olivei (2004) “Estimating Forward Looking Euler Equa-tions with GMM Estimators: An Optimal Instruments Approach,” FRB Boston Series,paper n. 04-2.

Hannan, Edward J. and Jorma Rissanen (1982) “Recursive Estimation of MixedAutoregressive-Moving Average Order,” Biometrika, 69(1), 81-94.

Kim, Jinill, Sunghyun Kim, Ernst Schaumburg, and Christopher A. Sims (2003) “Calculatingand Using Second Order Accurate Solutions of Discrete Time Dynamic Equilibrium Models,”Federal Reserve Board, Finance and Economics Discussion Series, 2003-61.

Hamilton, James D. (1994) Time Series Analysis. Princeton, New Jersey: PrincetonUniversity Press.

Harvey, James E., Kevin D. Hoover and Kevin D. Salyer, eds. (1998) Real Business Cy-cles, A Reader. London and New York: Routledge.

Jordà, Òscar (2005) “Estimation and Inference of Impulse Responses by Local Projections,”American Economic Review, 95(1): 161-182.

Lütkepohl, Helmut (1993) Introduction to Multiple Time Series Analysis, (2nd ed.).Berlin: Springer-Verlag.

McCallum, Bennett T. (1983) “On Non-Uniqueness in Rational Expectations Models: AnAttempt at Perspective,” Journal of Monetary Economics, March, 139-168.

McCallum, Bennett T. (2003) “The Unique Minimum State Variable RE Solution is E-Stablein All Well Formulated Linear Models,” National Bureau of Economic Research, workingpaper 9960.

Rotemberg, Julio J. and Michael Woodford (1997) “An Optimization-Based EconometricFramework for the Evaluation of Monetary Policy,” in NBER Macroeconomics Annual1997, Julio J. Rotemberg and Ben S. Bernanke, eds. Cambridge and London: MIT Press.

Rudebusch, Glenn D. and Jeffrey C. Fuhrer (2004) “Estimating the Euler Equation forOutput,” Journal of Monetary Economics, 51(6): 1133-53.

Sbordone, Argia M. (2004) “A Limited Information Approach to the Simultaneous Estima-tion of Wage and Price Dynamics,” Rutgers University, mimeo.

35

Sbordone, Argia M. (2005) “Do Expected Future Marginal Costs Drive Inflation Dynamics?”Federal Reserve Bank of New York, Staff Report n0. 204.

Sims, Christopher A. (1996) “Macroeconomics and Methodology,” Journal of EconomicPerspectives, v. 10, n.1, 105-120.

Stock, James H., Jonathan Wright, and Motohiro Yogo (2002) “A Survey of Weak Instru-ments and Weak Identification in Generalized Method of Moments,” Journal of Businessand Economic Statistics, 20, 518-529.

Watson, Mark (1993) “Measures of Fit for Calibrated Models,” Journal of Political Economy ,101:1011-1041.

36

38

Table 1. Fitting ARMA(p,q) Models with Efficient Response Matching Estimation. A Monte-Carlo Comparison Data simulated from the model:

)1,0(~ 11 Nyy ttttt εθεερ −− ++=

Model ρ = 0.25

θ = 0.50 ρ = 0.50 θ = 0.25

ρ = 0.50 θ = 0.50

ρ = 0.75 θ = 0

ρ = 0 θ = -0.50

T = 100 DHR 0.264

(0.146) 0.482

(0.130)

0.479 (0.135) 0.264

(0.147)

0.483 (0.110) 0.518

(0.107)

0.702 (0.099) 0.019

(0.138)

-0.034 (0.190) -0.482 (0.167)

ERME* 0.268 (0.152) 0.472

(0.177)

0.461 (0.143) 0.259

(0.166)

0.479 (0.123) 0.500

(0.152)

0.716 (0.133) 0.010

(0.165)

-0.019 (0.220) -0.507 (0.240)

h 5 4 5 4 3 T = 300

DHR 0.246 (0.084) 0.495

(0.075)

0.485 (0.076) 0.251

(0.084)

0.482 (0.064) 0.507

(0.062)

0.747 (0.051) -0.005 (0.078)

-0.006 (0.112) -0.501 (0.097)

ERME* 0.246 (0.092) 0.494

(0.107)

0.482 (0.084) 0.263

(0.101)

0.483 (0.071) 0.501

(0.090)

0.748 (0.061) -0.005 (0.074)

-0.020 (0.124) -0.500 (0.136)

h 7 8 9 12 5 Notes: DHR refers to the Durbin-Hannan-Rissanen estimator for ARMA models. ERME* refers to the ERME estimates obtained from choosing the maximum impulse response horizon h for which the joint null that all the coefficients are zero is rejected at a 95% confidence level. This choice of horizon is reported in the row labeled h. T = 100, 300 refers to the sample size. The first 100 observations of each simulation are discarded to avoid initialization problems. The Monte-Carlo is replicated 200 times. The numbers reported are the Monte-Carlo averages of the estimates (to assess consistency) and the standard errors are the Monte-Carlo average over the standard errors obtained from each of the two estimation alternatives (to assess efficiency).

38

Table 2 – ERME, MLE, GMM and Optimal Instruments GMM: A Comparison

Estimates of Output Euler Equation: 1966:Q1 to 2001:Q4

( ) ttttttt xEzEzz εγµµ +++−= +− 111 Estimation

Method Specification µ

SE(µ) γ

SE(γ) p-value

J-Statistic 1st Step

Lag Length GMM HP 0.52

(0.053) 0.0024

(0.0094)

GMM ST 0.51 (0.049)

0.0029 (0.0093)

ML HP 0.47 (0.035)

-0.0056 (0.0037)

ML ST 0.42 (0.052)

-0.0084 (0.0055)

OI – GMM HP 0.47 (0.062)

-0.0010 (0.023)

OI – GMM ST 0.41 (0.064)

-0.0010 (0.022)

Optimally Weighted ERME (optimal horizon, h = 4) ERMEc HP -0.40

(0.067) -0.10

(0.075) 0.015 4

ERMEc ST -0.32 (0.062)

-0.14 (0.075)

0.003 4

ERMEu HP 1.43/-0.41 (0.12/0.07)

-0.10 (0.075)

0.016 4

ERMEu ST 1.45/-0.32 (0.11/0.06)

-0.13 (0.075)

0.007 4

Equally Weighted ERME (optimal horizon, h = 4) ERMEc HP -0.23

(0.123) -0.10

(0.082) 0.001 4

ERMEc ST -0.17 (0.119)

-0.10 (0.086)

0.000 4

ERMEu HP 1.18/-0.25 (0.17/0.12)

-0.09 (0.101)

0.001 4

ERMEu ST 1.21/-0.17 (0.16/0.12)

-0.11 (0.10)

0.000 4

Notes: Top third of the table is Table 4 in Fuhrer and Olivei (2004). Estimates based on 4 horizons of the impulse response (determined by a joint F-test on the projections) and projection lag-length is determined by AICc. HP is the Hodrick-Prescott filter of log real GDP, and ST is a segmented deterministic trend of log real GDP. ERMEc constrains the coefficients on zt-1 and Etzt+1 to add up to one. ERMEu keeps the coefficients unconstrained and reports the value of the coefficient on zt-1 and then Etzt+1. Optimal Weights ERME uses the variance-covariance matrix from the projection step as weights in the second step. Equally weighted ERME uses the identity matrix instead. P-value of the J-Statistic is an overall specification test. Hence rejection of the null (i.e., p-value < 0.05, say), indicates model misspecification.

39

Table 3 – ERME, MLE, GMM and Optimal Instruments GMM: A Comparison Estimates of Inflation Euler Equation: 1966:Q1 to 2001:Q4

( ) ttttttt xEzEzz εγµµ +++−= +− 111

Estimation

Method Specification µ

SE(µ) γ

SE(γ) p-value

J-Statistic 1st Step

Lag Length GMM HP 0.66

(0.13) -0.055 (0.072)

GMM ST 0.63 (0.13)

-0.030 (0.050)

GMM rulc 0.60 (0.086)

0.053 (0.038)

ML HP 0.17 (0.037)

0.10 (0.042)

ML ST 0.18 (0.036)

0.074 (0.034)

ML rulc 0.47 (0.024)

0.050 (0.0081)

OI – GMM HP 0.23 (0.093)

0.12 (0.042)

OI – GMM ST 0.21 (0.11)

0.097 (0.039)

OI - GMM rulc 0.45 (0.028)

0.054 (0.0081)

Optimally Weighted ERME (optimal horizon, h = 4) ERMEc HP 0.21

(0.11) 0.09

(0.06) 0.714 3

ERMEc ST 0.21 (0.11)

0.07 (0.06)

0.829 3

ERMEc rulc 0.22 (0.10)

0.32 (0.07)

0.000 2

ERMEu HP 0.59/0.22 (0.19/0.11)

0.09 (0.06)

0.912 3

ERMEu ST 0.61/0.22 (0.19/0.11)

0.07 (0.06)

0.963 3

ERMEu rulc 0.56/0.18 (0.14/0.10)

0.35 (0.08)

0.003 2

Equally Weighted ERME (optimal horizon, h = 4) ERMEc HP 0.21

(0.11) 0.09

(0.06) 0.432 3

ERMEc ST 0.21 (0.11)

0.07 (0.06)

0.829 3

ERMEc rulc 0.22 (0.10)

0.32 (0.07)

0.000 2

ERMEu HP 0.59/0.22 (0.19/0.11)

0.09 (0.06)

0.913 3

ERMEu ST 0.61/0.22 (0.19/0.11)

0.07 (0.06)

0.963 3

ERMEu rulc 0.56/0.18 (0.14/0.10)

0.35 (0.08)

0.003 2

Notes: Top third of the table is Table 5 in Fuhrer and Olivei (2004). Estimates based on 4 horizons of the impulse response (determined by a joint F-test on the projections) and projection lag-length is determined by AICc. HP is the Hodrick-Prescott filter of log real GDP, and ST is a segmented deterministic trend of log real GDP. When the entry rulc appears, the specification replaces the output gap with real unit labor costs as the driving process. ERMEc constrains the coefficients on zt-1 and Etzt+1 to add up to one. ERMEu keeps the coefficients unconstrained and reports the value of the coefficient on zt-1 and then Etzt+1. Optimal Weights ERME uses the variance-covariance matrix from the projection step as weights in the second step. Equally weighted ERME uses the identity matrix instead. P-value of the J-Statistic is an overall specification test. Hence rejection of the null (i.e., p-value < 0.05, say), indicates model misspecification.

Projection Minimum Distance: An Estimator for Dynamic ... jorda.pdf · This version, March 2006...

Documents

Transcript of Projection Minimum Distance: An Estimator for Dynamic ... jorda.pdf · This version, March 2006...