Projection Minimum Distance: An Estimator for Dynamic ... jorda.pdf · This version, March 2006...
-
Upload
nguyentruc -
Category
Documents
-
view
217 -
download
0
Transcript of Projection Minimum Distance: An Estimator for Dynamic ... jorda.pdf · This version, March 2006...
This version, March 2006
Projection Minimum Distance: An Estimator for Dynamic
Macroeconomic Models∗
Abstract
This paper introduces a simple, two-step estimator for rational expectations models and time series modelsin general. We call this estimator projection minimum distance (PMD) and show that it is consistent andasymptotically normal. PMD provides consistent estimates of parameters even when the economic modelis misspecified. In addition, it provides a simple chi-square specification test and a statistically basedmetric of closeness between the impulse responses implied by the model and those from the data. Weshow how PMD relates to GMM and how invalid/weak instrument problems in GMM are easily resolvedby PMD. PMD is simple to implement as it only involves two simple least squares steps and can begeneralized to more complex and nonlinear environments.
• Keywords: impulse response, local projection, rational expectations, minimum chi-squared estima-tion.
• JEL Codes: C32, E47, C53.
Òscar JordàDepartment of EconomicsUniversity of California, DavisOne Shields Ave.Davis, CA 95616e-mail: [email protected]
Sharon KozickiResearch DepartmentBank of Canada234 Wellington StreetOttawa, Ontario, CanadaK1A 0G9e-mail: [email protected]
∗The views expressed herein are solely those of the authors and do not necessarily reflect the views of theBank of Canada. We thank Timothy Cogley, James Hamilton, Peter Hansen, Kevin Hoover, Frank Wolak andseminar participants at Duke University, the Federal Reserve Bank of New York, Federal Reserve Bank of SanFrancisco, Federal Reserve Bank of St. Louis, Stanford University, the University of California, Davis, University ofKansas and the Winter Meetings of the American Economic Association, 2006 in Boston for useful comments andsuggestions. Jordà is thankful for the hospitality of the Federal Reserve Bank of San Francisco during the creationof this paper.
1 Introduction
Estimating and testing competing structural models of the macroeconomy is necessary to advance
our understanding of the competing forces at work and it is fundamental in establishing appro-
priate policy responses in actual economies. Inevitably, macroeconomic models compromise their
realism in favor of tractability and analysis. These opposing forces offer significant challenges for
formal statistical evaluation. For instance, the maximum likelihood principle and its asymptotic
optimality properties require that the structural model be a representation of the density of the
underlying data generation process (DGP). This demand is very hard to meet in practice and
results in rejection of many useful models (economically speaking). Routine failure of specifica-
tion tests has therefore led researchers down the path of evaluating their economic models by
comparing their impulse responses with those from the data by the “eyeball” metric.
This paper introduces new methods to estimate and evaluate structural, dynamic, stochastic
macroeconomic models. The method, which we label projection minimum distance (PMD), is a
two-step estimator. In the first step, we compute the impulse responses from the data semipara-
metrically with the local projections estimator introduced by Jordà (2005). Next, we represent the
stable solution of the model in terms of its Wold representation and obtain the mapping between
the structural parameters and the Wold coefficients by the method of undetermined coefficients
(see Christiano, 2002). Hence, in the second step, the structural parameters of the model are
effectively estimated by minimizing the distance between the impulse responses from the data and
those implied by the model. The resulting estimator is based on a minimum chi-square estimator
(Ferguson, 1958) and belongs to the broader family of minimum distance estimators of which
GMM is also a member of the class.
However, PMD has important advantages that distinguish it from GMM and other commonly
used estimators. Specifically, PMD consists of two, simple, least-squares steps, and therefore, is
easily implementable. The principle behind PMD consists in minimizing the distance between
1
the data’s and the model’s impulse responses — the dimension of fit macroeconomic researchers
most care about. Two implications follow from this principle: First, we provide an overall mis-
specification test based on overidentifying restrictions that is distributed chi-square. Effectively,
this test is the formal equivalent of the “eyeball” determination of when a model matches the
relevant dynamics of the data. Secondly, PMD is designed to provide consistent estimates even
when the model is dynamically misspecified. We show that the common practice of using lags of
the endogenous variables as instruments in GMM can only be justified by the internal dynamics of
the data, not the model. Otherwise, GMM presents and invalid instrument problem that results
in inconsistent estimates. Fuhrer and Olivei (2004) explain this problem in more detail.
Econometrically, the paper has three contributions. First, it is critical for the consistency
properties of PMD to obtain consistent, first-step estimates of the impulse responses from the
data. This motivates our preference for local projections over approximations based on vector
autoregressions (VARs). Consequently, we begin by deriving consistency and asymptotic nor-
mality results for the local projection estimates. Second and unlike classical minimum distance,
the minimum chi-square step is based on a function of the structural parameters that depends
on sample estimates. Consequently, we derive the consistency and asymptotic normality of the
minimum chi-square step and appropriately incorporate into the asymptotic variance-covariance
matrix the uncertainty generated in the first-step. Hence, we show that a misspecification test
of overidentifying restrictions is distributed chi-square. Finally, we show that PMD is consistent
even when the economic model is dynamically misspecified and how our model corrects the invalid
instrument problem that GMM suffers.
We introduce PMD and the main results in the context of a flexible, linear state-space repre-
sentation of a dynamic rational expectations model. However, PMD is not limited by linearity.
PMD can be made more flexible in two different ways that do not significantly complicate the
method. First, the first-step local projections can be made nonlinear in a variety of ways as
described in Jordà (2005). Secondly, the method is not limited to linear rational expectations
2
models. However, because the paper is already dense with results, we defer a more detailed dis-
cussion of these two issues to another paper. Finally, in addition to showing how PMD relates to
GMM we also show how it relates to the Hannan-Rissanen (1982) estimator and how PMD can
be used to estimate rather general time series models, including traditional ARMA(p,q) models.
2 Projection Minimum Distance: The Method
In this section we propose an estimator for rational expectations models (and more broadly time
series models), where it is important to be flexible about the underlying dynamics of the data
generating process (DGP) to ensure consistent estimates of the structural parameters of interest.
The estimator we propose consists of two steps. In the first step we estimate impulse responses
from the sample with a flexible, semi-parametric estimator: Jordà’s (2005) local projections. In the
second step we minimize the distance between the impulse responses estimated from the sample
and the responses generated from the macroeconomic model being considered. This second step
is based on Ferguson’s (1958) minimum chi-square estimator. We call this two-step estimator
“projection minimum distance” (PMD) and show that it is consistent and asymptotically normal.
We find it useful to use a generic rational expectations formulation (e.g., see Farmer, 1993;
and Evans and Honkapohja, 2001) to illustrate the particulars of our technique. Hence, let yt
be an r × 1 vector of endogenous variables whose behavior is described by the backward- and
forward-looking model
Φ00yt = Φ01yt−1 + Φ
02Etyt+1 + ut, E(utu
0t) = I (1)
where ut is the r × 1 vector of expectational errors. The r × r coefficient matrix Φ0 makes
explicit the nature of the contemporaneous relations between elements of yt. Expression (1) is
rather general. For instance, yt can include exogenous variables whose economic processes are
not directly modeled but whose reduced-form laws-of-motion can be expressed in finite Markov
3
form. Similarly (1) does not restrict the model to have first order dynamics since the vector yt can
always be appropriately redefined to include lags of yt and (1) can be thought of as a state-space
representation. Expression (1) can therefore be appropriate for both dynamic stochastic general
equilibrium or dynamic partial equilibrium models of the economy.
A stable solution for (1) is a dynamic stochastic difference equation in yt. In most practical
situations, the stability of the solution implies that it is covariance-stationary, and hence, by the
Wold decomposition theorem (see Anderson, 1994), can be represented as,
yt =∞Xj=0
B0jεt−j . (2)
with B0 = I, E(εtε0t) = Ω and with the relation between the reduced form residuals of the Wold
decomposition, εt, and the structural residuals of the economic model, ut, given by εt = Φ−10 ut
(we omit the constant and deterministic terms for simplicity but without loss of generality).
Substituting (2) into (1) and equating terms in εt−j (in analogous manner to the method of
undetermined coefficients, see Christiano, 2002), we obtain the following set of conditions:
Φ00 = Φ02B01 + Φ
00 for j = 0 (3)
B0j = Φ01B0j−1 + Φ
02Bj+1 for j ≥ 1
where we remark that an estimate of Φ0 can be obtained by imposing the structural identification
conditions of the model and noticing that Ω = Φ−10¡Φ−10
¢0.
Momentarily ignoring the conditions associated with j = 0 in (3), it is clear that for j ≥ 1,
structural identification plays no role in estimating Φ1 and Φ2, which can be done on the basis of
the reduced-form impulse responses, Bj , alone. Define
4
Br(h+1)×r
=
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
B0
B1
...
Bh
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦,
with B0 = I; and the selector matrices
S0 =
∙0
r(h−1)×r(Ih−1 ⊗ Ir) 0
r(h−1)×r
¸,
S1 =
∙(Ih−1 ⊗ Ir) 0
r(h−1)×2r
¸,
S2 =
∙0
r(h−1)×2r(Ih−1 ⊗ Ir)
¸.
Then we can stack the conditions in (3) for j = 1, ..., h more compactly as
S0B = S1BΦ1 + S2BΦ2 (4)
The vec operator can be applied to both sides of this expression to vectorize the coefficient vectors.
Let b = vec(B); φ = vec(Φ1) vec(Φ2)0 then (4) can be written as
(Ir ⊗ S0) b = (Ir ⊗ S1B) (Ir ⊗ S2B)φ . (5)
This last expression makes clear that if the impulse response coefficients in B were known rather
than estimated, then φ could be simply estimated, say, by least squares. Instead, we show below
that one can easily obtain semi-parametric estimates of b (or B interchangeably) with local pro-
jections in a first stage regression. In the second stage, we minimize the distance between the two
sides of expression (5) by minimum chi-square methods.
In particular, let
g³bbT ;φ´ = (Ir ⊗ S0)bbT − n³Ir ⊗ S1 bBT´ ³
Ir ⊗ S2 bBT´oφ (6)
5
then, a minimum chi-square estimator of φ can be found by minimizing
minφ
bQT (φ) = g ³bbT ;φ´0cWg ³bbT ;φ´ (7)
for some weighting matrix cW.Several remarks deserve comment. First, this estimation method is not limited by linear
expectational mechanisms since (6) can be easily generalized appropriately. However, we do not
explore the details of this feature explicitly in this paper. Second, the matrices Φ1 and Φ2 contain
2r2 parameters that we want to estimate but we have hr2 conditions available for estimation,
where h is determined by the sample size and the researcher in the same way that instruments
are selected in GMM estimation. Thus, the availability of these overidentifying restrictions will
provide a natural test of model misspecification. We show that this test is distributed chi-squared
under standard regularity conditions that we discuss below. This test effectively measures the
distance between the impulse responses from the data and those from the theoretical model.
Macroeconomic models, whether calibrated, estimated by maximum likelihood, or estimated by
some other method, are routinely evaluated by how well their impulse responses match those
responses generated by the data. Hence, the overidentifying restrictions test provides a formal
statistical metric for this common ocular diagnostic. Third, the relation between the coefficients
in φ and bbT is not a known function, as is commonly assumed in classical minimum distance
estimation, but rather depends on the first-stage estimates bbT . Below we show how we incorporatethis feature into the proofs of consistency and asymptotic normality.
In the next section we derive consistency and asymptotic normality results for the local pro-
jection estimator proposed therein. Estimates from this first-step are then incorporated into
the minimum chi-square step, whose consistency and asymptotic normality properties we derive
thereafter.
6
3 First-Step: Local Projections
3.1 Consistency
In the previous section we argued that a stable solution of the rational expectations model (1) is
therefore covariance-stationary and has a Wold decomposition,
yt =∞Xj=0
Bjεt−j (8)
where for simplicity and without loss of generality we drop the constant and any deterministic
terms. From the Wold decomposition theorem (see e.g. Anderson, 1994):
(i) E(εt) = 0 and εt are i.i.d.
(ii) E(εtε0t) = Σεr×r
(iii)P∞j=0 kBjk <∞ where kBjk2 = tr(B0jBj) and B0 = Ir
(iv) det B(z) 6= 0 for |z| ≤ 1 where B(z) =P∞j=0BjzjAnderson (1994) shows that the process in (8) can also be written as:
yt =∞Xj=1
Ajyt−j + εt (9)
such that,
(v)P∞j=1 kAjk <∞
(vi) A(z) = Ir −P∞j=1Ajz
j = B(z)−1
(vii) detA(z) 6= 0 for |z| ≤ 1.
Results (iii) and (vi) imply that
£Ir −A1L−A2L2 − ...
¤ £Ir +B1L+B2L
2 + ...¤= Ir
7
where L is the usual lag operator. Then, equating terms in the powers of L, we have the following
correspondences:
B1 = A1 (10)
B2 = A1B1 +A2
...
Bh = A1Bh−1 +A2Bh−2 + ...+Ah−1B1 +Ah
for any h ≥ 1 with B0 = Ir. Hence, expression (10) delivers a set of conditions that would allow
us to determine the impulse response function for yt if we knew what the Aj were. A truncated
version of these conditions for a finite lag VAR is available in Hamilton (1994). In addition and
by simple recursive substitution in the V AR(∞) representation, we have that
yt+h = Ah1yt +A
h2yt−1 + ...+ εt+h +B1εt+h−1 + ...+Bh−1εt+1 (11)
where:
(i) Ah1 = Bh for h ≥ 1
(ii) Ahj = Bh−1Aj +Ah−1j+1 where h ≥ 1; A0j+1 = 0; B0 = Ir; and j ≥ 1.
Now consider truncating the infinite lagged expression (11) at lag k
yt+h = Ah1yt +A
h2yt−1 + ...+A
hkyt−k+1 + vk,t+h (12)
vk,t+h =∞X
j=k+1
Ahj yt−j + εt+h +h−1Xj=1
Bjεt+h−j .
In what follows, we show that least squares estimates of (12) produce consistent estimates for
Ahj for j = 1, ..., k. Notice that this is more than we need since we are only interested in the
consistency of the Ah1 .
8
Let Γ(j) ≡ E(yty0t+j) with Γ(−j) = Γ(j)0. Further define:
(i) Xt,k =¡y0t,y
0t−1, ...,y
0t−k+1
¢0that is, the regressors in (12).
(ii) bΓ1,k,hkr×r
= (T − k − h)−1PT−ht=k Xt,ky
0t+h
(iii) bΓk = (T − k − h)−1PT−ht=k Xt,kX
0t,k
then, the mean-square error linear predictor of yt+h based on yt, ...,yt−k+1 is given by the
least-squares formula
bAr×kr
(k, h) = ( bAh1 , ..., bAhk) = bΓ01,k,hbΓ−1k (13)
The following theorem establishes the consistency of these least-squares estimates for A(k, h) =¡Ah1 , ..., A
hk
¢.
Theorem 1 Consistency. Let yt satisfy (8) and assume that:
(i) E|εitεjtεktεlt| <∞ for 1≤ i, j, k, l ≤ r
(ii) k is chosen as a function of T such that
k2
T→ 0 as T, k →∞
(iii) k is chosen as a function of T such that
k1/2∞X
j=k+1
kAjk→ 0 as T, k →∞
Then: °°° bA(k, h)−A(k, h)°°° p→ 0
The proof of this theorem is in the appendix and parallels the proof in Lewis and Reinsel (1985)
on the consistency properties of estimates from a truncated VAR when the underlying model is a
VAR(∞). A natural consequence of the theorem provides the essential result that we need, that
local projection estimates are consistent for the impulse response coefficients, that is, bAh1 p→ Bh.
9
3.2 Asymptotic Normality
We now show that least-squares estimates from the truncated projections in (12) are asymptoti-
cally normal, although for the purposes of the PMD estimator, proving that bAh1 is asymptoticallynormally distributed would suffice. Notice that we can write
bA(k, h)−A(k, h) = ((T − k − h)−1 T−hXt=k
vk,t+hX0t,k
)bΓ−1k= (T − k − h)−1
⎡⎣T−hXt=k
⎧⎨⎩⎛⎝ ∞Xj=k+1
Ahj yt−j
⎞⎠+ εt+h +h−1Xj=1
Bjεt+h−j
⎫⎬⎭X 0t,k
⎤⎦ bΓ−1k= (T − k − h)−1
⎧⎨⎩T−hXt=k
⎛⎝ ∞Xj=k+1
Ahj yt−j
⎞⎠X 0t,k
⎫⎬⎭nΓ−1k +³bΓ−1k − Γ−1k ´o+
(T − k − h)−1⎧⎨⎩T−hXt=k
⎛⎝εt+h +h−1Xj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭nΓ−1k +³bΓ−1k − Γ−1k ´o
Hence, the strategy of the proof will consist in showing that the first term in the sum above
vanishes in probability and that the second term converges in probability as follows,
(T − k − h)1/2 vech bA(k, h)−A(k, h)i p→
(T − k − h)1/2 vec⎡⎣(T − k − h)−1
⎧⎨⎩T−hXt=k
⎛⎝εt+h +h−1Xj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭Γ−1k⎤⎦
so that by showing that this last term is asymptotically normal, we complete the proof.
Define,
U1T =
⎧⎨⎩(T − k − h)−1T−hXt=k
⎛⎝ ∞Xj=k+1
Ahj yt−j
⎞⎠X 0t,k
⎫⎬⎭U∗2T =
⎧⎨⎩(T − k − h)−1T−hXt=k
⎛⎝εt+h +h−1Xj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭then
(T − k − h)1/2 vech bA(k, h)−A(k, h)i =
(T − k − h)1/2⎧⎪⎪⎨⎪⎪⎩
vec£U1TΓ
−1k
¤+ vec
hU1T
³bΓ−1k − Γ−1k ´i+vec
£U∗2TΓ
−1k
¤+ vec
hU∗2T
³bΓ−1k − Γ−1k ´i⎫⎪⎪⎬⎪⎪⎭
10
hence
(T − k − h)1/2 vech bA(k, h)−A(k, h)i− (T − k − h)1/2 vec £U∗2TΓ−1k ¤ =
(T − k − h)1/2⎧⎪⎪⎨⎪⎪⎩vec
£U1TΓ
−1k
¤+ vec
hU1T
³bΓ−1k − Γ−1k ´i+vec
hU∗2T
³bΓ−1k − Γ−1k ´i⎫⎪⎪⎬⎪⎪⎭ =
¡Γ−1k ⊗ Ir
¢vec
h(T − k − h)1/2 U1T
i+n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U1T i+n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U∗2T i
Define, with a slight change in the order of the summands,
W1T =n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U1T i
W2T =n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U∗2T i
W3T =¡Γ−1k ⊗ Ir
¢vec
h(T − k − h)1/2 U1T
ithen, in the next theorem we show that W1T
p→ 0, W2Tp→ 0, W3T
p→ 0.
Theorem 2 Let yt satisfy (8) and assume that
(i) E |εitεjtεktεlt| <∞; 1 ≤ i, j, k, l ≤ r
(ii) k is chosen as a function of T such that k3
T → 0, k, T →∞
(iii) k is chosen as a function of T such that
(T − k − h)1/2∞X
j=k+1
kAjk→ 0; k,T→∞
Then
(T − k − h)1/2 vech bA(k, h)−A(k, h)i p→
(T − k − h)1/2 vec⎡⎣⎧⎨⎩(T − k − h)−1
T−hXt=k
⎛⎝εt+h +h−1Xj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭Γ−1k⎤⎦
11
Proof is given in the appendix. Now that we have shown that W1T , W2T , and W3T vanish in
probability, all that remains is to show that
ST ≡ (T − k − h)1/2 vec⎡⎣(T − k − h)−1
⎧⎨⎩T−hXt=k
⎛⎝εt+h +h−1Xj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭Γ−1k⎤⎦ d→
N(0,Ωh) with Ωh =¡Γ−1k ⊗ Σh
¢; Σh =
⎛⎝Σε + h−1Xj=1
BjΣεB0j
⎞⎠Since, vec
h bA(k, h)−A(k, h)i p→ ST , and STd→ N(0,Ωh), then we will have vec
h bA(k, h)−A(k, h)i d→
N(0,Ωh). We establish this result in the next theorem.
Theorem 3 Let yt satisfy (8) and assume
(i) E|εitεjtεktεlt| <∞; 1≤ i, j, k, l ≤ r
(ii) k is chosen as a function of T such that
k3
T→ 0, k, T →∞
Then
STd→ N(0,Ωh)
The proof is provided in the appendix. Several results deserve comment. Observe that for each
horizon h considered, the asymptotic variance-covariance matrix Ωh is determined by the moving
average structure of the residuals, Ωh =¡Γ−1k ⊗ Σh
¢; Σh =
³Σε +
Ph−1j=1 BjΣεB
0j
´. Jordà (2005)
suggests that one way of estimating this variance-covariance matrix semiparametrically is with
the Newey-West (1987) heteroskedasticity and autocorrelation consistent estimator. However, if
the projections are estimated sequentially, notice that each of the projections up to horizon h− 1
provide consistent estimates of Bj j = 1, 2, ..., h − 1, which could then be used to estimate the
asymptotic variance-covariance matrix parametrically. We postpone discussion on the differences
between these two methods for a later paper and proceed with the alternative based on the
Newey-West estimator.
12
In practice, we find it convenient to estimate responses for horizons 1, ..., h jointly as follows.
Define,
(i) Xt−1,kr(k−2)×1
≡ ¡y0t−1, ...,y0t−k+1¢0(ii) Yt,h
rh×1≡ ¡y0t+1, ...,y0t+h¢0
(iii) Mt−1,k1×1
≡ 1−PT−ht=k X
0t−1,k
³PT−ht=k Xt−1,kX
0t−1,k
´−1Xt−1,k
(iv) bΓ1|kr×r≡ (T − k − h)−1PT−h
t=k ytMt−1,ky0t
(v) bΓ1,h|kr×rh
≡ (T − k − h)−1PT−ht=k ytMt−1,kY 0t,h
Hence, the impulse response coefficient matrices for horizons 1 through h can be jointly esti-
mated in a single step with
bΓ01,h|kbΓ−11|k =
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
bA11bA21bAh1
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦=
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
bB1bB2...
bBh
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦= bB(1, h) (14)
Using the usual least-squares formulas, notice that
bB(1, h) = B(1, h) +((T − k − h)−1 T−hXt=k
ytMt−1,kV 0t,h
)0 bΓ−11|k + op(1) (15)
where Vt,h ≡¡v0t+1, ...,v
0t+h
¢0;vt+j = εt+j + B1εt+j−1 + ... + Bj−1εt+1 for j = 1, ..., h and the
terms vanishing in probability in (15) involve the terms U1T , U2T , and U3T defined in the proof of
theorem one, which makes use of the condition k1/2P∞j=k+1 ||Aj || → 0 as T, k → ∞. Under the
conditions of theorem 2, wew can write
(T − k − h)1/2vec³ bB(1, h)−B(1, h)´ p→ (16)
(T − k − h)1/2 vec"((T − k − h)−1
T−hXt=k
Vt,hMt−1,ky0t
)bΓ−11|k#
13
from which we can derive the asymptotic distribution under theorems 2 and 3.
Next notice that
(T − k − h)−1T−hXt=k
Vt,hV0t,h
p→ Σvrh×rh
(17)
The specific form of the variance-covariance matrix Σv can be derived as follows. Define 0j = 0j×j;
0m,n = 0m×n;. Recall that Vt,h ≡
¡v0t+1, ...,v
0t+h
¢0from above. Specifically,
Vt,h =
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
εt+1
εt+2 +B1εt+1
...
εt+h +B1εt+h−1 + ...+Bh−1εt+1
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦= ΨBεt,h,
where εt,h ≡¡ε0t+1, ..., ε0t+h
¢0and ΨB = [ΨB,1 . . .ΨB,h] with
ΨB,1 =
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
Ir
B1
...
Bh−1
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ΨB,h =
⎡⎢⎢⎣ 0r(h−1),r
Ir
⎤⎥⎥⎦ and for 1 < j < h, ΨB,j =
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
0r(j−1),r
Ir
...
Bj−1
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦.
Then, assuming Σεr×r
= E(εt+1ε0t+1), it is easy to show that
Σv = ΨBIr ⊗ ΣεΨ0B (18)
and hence that
(T − k − h)1/2 vec³ bB(1, h)−B(1, h)´ d→ N (0,ΩB)
ΩBr2h×r2h
=
ÃΓ−11|kr×r⊗ Σvrh×rh
!
In practice, one requires sample estimates bΓ−11|k and bΣv.We have already defined a sample estimatefor the first term, however, the latter estimate deserves particular consideration. As we mentioned
14
in our derivations for theorem 3, one avenue would be to estimate expression (18) by a Newey-
West type estimator of the variance-covariance matrix Σv. However, given the parametric form
of expression (18), it would be easy to construct a sample estimate of ΩB by plugging-in the
estimates bB(1, h) and bΣε into the expression (18). Furthermore, feasible GLS estimates of B(1, h)could be easily computed in two steps: (1) compute bB(1, h), bΣε, and hence bΣv (2) do GLS byscaling Yt,h with bΣ−1/2v . We defer further investigation on the properties of this estimator to a
different paper to avoid burdening the reader with too many results.
In practice, we find it convenient to estimate responses for horizons 1, ..., h jointly as follows.
define yj for j = h, ..., 1, 0, —1, ..., —k as the (T − k − h)× r matrix of observations of the vector
yt+j. Additionally, define Y ≡ (y1, ...,yh)0 ; X ≡ (Ih ⊗ y0) ; Z ≡
¡1(T−k−h)×r,y−1, ...,y−k+1
¢and hence, Mz = I(T−k−h)h − (Ih ⊗ Z)
£(Ih ⊗ Z)0 (Ih ⊗ Z)
¤−1(Ih ⊗ Z) . Standard properties of
least-squares allow us to jointly estimate
bB(1, h) =⎡⎢⎢⎢⎢⎢⎢⎣bB1...
bBh
⎤⎥⎥⎥⎥⎥⎥⎦ = [X0MzX]
−1[X 0MzY ] (19)
and with asymptotic variance-covariance matrix for vec( bB(1, h)), bV 2b = n[X 0MzX]−1 ⊗ bΣho ; bΣh =³bΣ+Ph−1
j=1bBj bΣ bB0j´ , and bΣ = v0v
T−k−h ; bv =MzY −MzX bB(1, h). Hence, recall that what is neededfor the two-step estimator is an estimate of b = vec(B) = vec(B(0, h)) where B0 = Ir. Hence,
from (19), bbT = (vec(Ir) vec( bB(1, h))0 where bV 2b can easily be redefined as
bV 2b =⎛⎜⎜⎝ 0r 0r×(hr2)
0(hr2)×r bV 2b⎞⎟⎟⎠
We conclude this section by highlighting its main results and drawing a brief comparison to
estimates of the Bj based on a V AR(k). First, our local projection estimator ensures that we have
consistent estimates of Bj for j = 1, ..., h. We have shown that this result holds even when the
underlying process is an infinite order V AR. Secondly, we have derived the variance-covariance
15
matrix of B(1, h). Finally, even if the underlying data generating process is more complex than an
infinite order V AR, the nature of the local approximation of the projections is likely to provide
more robust estimates of the Bj . In contrast, estimates of Bj based on a V AR(k) have serious
disadvantages. First, consistency of the Bj is only available for j = 1, ..., k. For any j > k the
Bj will be inconsistent for processes whose V AR representation has more than k lags (as would
be expected with the V ARMA representations of the newest generation of dynamic, stochastic,
general equilibrium model). Seconly, a closed form expression for the variance-covariance matrix
of the Bj is unavailable. This prevents us from forming the optimal weighting matrix in the
minimum distance step and complicates the computation of valid standard errors for the structural
parameter estimates.
4 The Second Step: Minimum Chi-Square
4.1 Consistency
Given bbT from the first-stage described in the previous sections, our objective here is to estimate
φ = vec(Φ1) vec(Φ2)0 from the conditions in (5) and minimization of expression (7), where
expression (6) is rewritten here for convenience
g³bbT ;φ´ = (Ir ⊗ S0)bbT − n³Ir ⊗ S1 bBT´ ³
Ir ⊗ S2 bBT´oφand hence the quadratic function we need to minimize is
minφ
bQT (φ) = g ³bbT ;φ´0cWg ³bbT ;φ´where bφT denotes the sample estimate of φ from this two-step estimator and Q0(φ) denotes
the objective function at b0. The following theorem establishes the conditions under which this
estimator is consistent for φ0.
16
Theorem 4 Given that bbT p→ b0, assume that
(i) cW p→W, a positive semidefinite matrix
(ii) Q0(φ) is uniquely maximized at φ0
(iii) The parameter space φ0 ∈ is compact
(iv) Q0(φ) is continuous
(v) bQT (φ) converges uniformly in probability to Q0(φ)(vi) hr2 ≥ dim(φ)
Then
bφT p→ φ0
Proof is provided in the appendix. Next, we show that the minimum chi-square estimator is
asymptotically normal.
4.2 Asymptotic Normality
The results in this section are based on Theorem 3.2 and pages 2149, 2175 in Newey and McFadden
(1994). The following theorem derives the asymptotic distribution of bφT .Theorem 5 Assume:
(i) cW p→W, a positive semidefinite matrix
(ii) bbT p→ b0 and bφT p→ φ0
(iii) b0 and φ0 are in the interior of their parameter spaces
(iv) g(bbT ;φ) is continuously differentiable in a neighborhood N of φ0
(v)√Tg(b0;φ0) =
√T (Ir ⊗ S0)
³bbT − b0´ d→ N(0, VB)
17
(vi) There is a Gb and a Gφ that are continuous at b0 and φ0 respectively and
supφ∈N
°°°∇bg(bbT ;φ)−Gb°°° p→ 0
supφ∈N
°°°∇φg(bbT ;φ)−Gφ
°°° p→ 0
(vii) For Gφ = Gφ(φ0), then G0φWGφ in invertible
(viii) hr2 ≥ dim(φ)
Then:
√T³bφT − φ0
´d→ N (0, Vφ) (20)
Vφ =¡G0φWGφ
¢−1G0φW (I ⊗ S0)0 Vb (I ⊗ S0)WGφ
¡G0φWGφ
¢−1+
¡G0φWGφ
¢−1G0bWΣhWGb
¡G0φWGφ
¢−1+
¡G0φWGφ
¢−1 G0φWGb −G0bMzX0VbG0φWGb −G0bMzX¡G0φWGφ
¢−1The optimal weighting matrix W is
Wo =©(I ⊗ S0)0 Vb (I ⊗ S0)
ª−1which would simplify the expression of the variance further into
Vφ =¡G0φWoGφ
¢−1+ V o
φ(bT )
where V oφ(bT )
denotes that portion of the variance of bφT due to the fact that bbT is estimated andcorresponds to the last two lines in expression (20) evaluated atWo. Absent this uncertainty, then
the expression for Vφ would simplify to³G0φWoGφ
´−1, the usual minimum distance result with
optimal weighting matrix.
5 PMD: A Summary for Practitioners
The PMD estimator of the model in expression (1) and repeated here for convenience
18
Φ00yt = Φ01yt−1 + Φ
02Etyt+1 + ut, E (u0tut) = I
consists of the following steps:
1. Construct Y = (y1, ...,yh)0; X = (Ih ⊗ y0) ; Z =
¡1(T−k−h)×r,y−1, ...,y−k+1
¢; Mz =
I(T−k−h)h− (Ih ⊗ Z)£(Ih ⊗ Z)0 (Ih ⊗ Z)
¤−1(Ih ⊗ Z) , where yj is the (T −k−h)×r matrix
of observations for the vector yt+j .
2. Compute by least squares bbT = vec(Ir bB(1, h)), wherebB(1, h) = [X 0MzX]
−1[X 0MzY ]
3. Compute bVb as bVb = (T − k − h)−1n[X 0MzX]
−1 ⊗ bΣho ; bΣh estimated by Newey-West onthe residuals of the regression in bullet point 2. Remember to redefine bVb by adding theappropriate zeros corresponding to the variance and covariances of the term I in bbT
4. Define bYb = (Ir ⊗ S0)bbT ; bXb = n³Ir ⊗ S1 bBT´ ³Ir ⊗ S2 bBT´o where S0, S1, S2 are the ma-
trices that select the appropriate elements of bBT = bB(1, h). Further define φ = vec(Φ1) vec(Φ2) .
5. The minimum chi-square estimate of φ is easily computed by weighted least-squares
bφT = ³ bXb bV −1bbXb´−1 ³ bXb bV −1b
bYb´
6. The variance-covariance matrix for bφT can be calculated as:bVφ = 1
T − k − h³ bX 0
bcWo
bXb´−1 + 1
T − k − hbV oφ(bT )
where cWo =n(I ⊗ S0)0 bVb (I ⊗ S0)o−1 and the expression for bV oφ(bT ) can be computed di-
rectly from expression (20).
7. A test of misspecification can be constructed by noticing that
³bYb − bXbbφT´0cW0
³bYb − bXbbφT´ d→ χ2(h−2)r2
19
6 Invalid Instruments: PMD versus GMM
We begin this section with a simple motivating example. Suppose the DGP is characterized by
the backward/forward model:
yt = φ1yt−1 + φ2Etyt+1 + εt. (21)
Instead, Euler conditions of conventional rational expectations models can be expressed as
yt = ρEtyt+1 + ut, (22)
which in this example, are misspecified with respect to the DGP. Based on the economic model
in (22), yt−j ; j > 1 would be considered valid instruments for GMM estimation and hence, an
estimate of ρ would be found with the set of conditions
bρGMM =
Ã1
T
TXyt+1yt−j
!−1Ã1
T
TXyt−jyt
!. (23)
It is easy to see that the probability limit of these conditions is
bρGMMp→ φ2 + φ1
γj−1γj+1
; j ≥ 1
where γj = COV (ytyt−j). Notice that the bias, φ1γj−1γj+1
, does not disappear by selecting longer
lags of yt−j as instruments, the usual GMM recommendation. In fact, any yt−j ; j ≥ 1 is an
invalid instrument and although γj → 0 as j →∞, γj−1γj+1is indeterminate, since both covariances
are simultaneously going to zero. Meanwhile, as j → ∞ the correlation of the instrument with
the regressor is exponentially decaying to zero.
Instead consider what happens in PMD. Notice that the MA(∞) representation of (21) is
yt =∞Xj=0
bjεt−j
20
and hence, under the proposed model in (22), we obtain the following set of conditions analogous
to (3), namely
⎡⎢⎢⎢⎢⎢⎢⎣b1
...
bh
⎤⎥⎥⎥⎥⎥⎥⎦ =⎡⎢⎢⎢⎢⎢⎢⎣
1
...
bh−1
⎤⎥⎥⎥⎥⎥⎥⎦ ρ (24)
which we can use to set-up the PMD estimator. Local projections for the bj are
bbj = Ã 1T
TXy0t−jMt−jyt−j
!−1Ã1
T
TXy0t−jMt−jyt
!
whereMt = 1−zt (z0tzt)−1 zt and zt = (1 yt−2 ...yt−k+1) . In order to compare PMD directly with
the GMM expression (23) notice that
bj = ρbj−1
then, for j ≥ 1
bρPMD =
Ã1
T
TXy0t−jMt−jyt−j
!−1Ã1
T
TXy0t−jMt−jyt
!
and hence
bρPMDp→ φ1
since
1
T
TXyt−j−1Mt−jyt
p→ 0;1
T
TXεt−jMt−jyt
p→ 0.
Clearly, a main distinguishing characteristic between PMD and GMM is that PMD uses the
semiparametric estimates of the impulse responses to generate conditional instruments that are
21
valid even if the model is misspecified. In contrast, GMM uses endogenous instruments uncon-
ditionally and therefore their validity heavily depends on having specified the correct model. A
second advantage of PMD is that the optimal weights assigned to each impulse responses con-
dition are inversely proportional to the estimated variance of the impulse response coefficient.
Hence, it is the marginal informational content of this conditional instrument that determines its
contribution to the parameter estimates and its variance.
7 Monte Carlo Experiments (incomplete)
7.1 Estimating ARMA(p,q) models with PMD
This section investigates the small sample properties of PMD. We take this opportunity to further
demonstrate the flexibility of our method by experimenting with univariate ARMA specifica-
tions, which would typically require numerical optimization routines. In particular, consider a
covariance-stationary ARMA(p,q) model,
yt = ρ1yt−1 + ...+ ρpyt−p + εt + θ1εt−1 + ...+ θqεt−q (25)
with Wold decomposition,
yt =∞Xj=0
bjεt−j (26)
with b0 = 1. Matching the responses bi in (26) into (25) delivers the set of conditions
22
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
b1
b2
b3
...
bh
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦=
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
1
b1
b2
...
bh−1
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ρ1 +
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
0
1
b1
...
bh−2
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ρ2 + ...+
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
0
0
0
...
bh−p
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦ρp + (27)
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
1
0
0
...
0
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦θ1 +
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
0
1
0
...
0
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦θ2 + ...+
⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣
0
0
0
...
0
⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦θq
Table 1 summarizes a battery of experiments based on several ARMA(1,1) specifications that
include pure AR and pure MA examples as well. For each choice of ARMA parameters indicated
in the table, residuals are drawn from a standard normal and used to generate samples of 100 and
300 observations, where the first 100 observations generated are discarded to avoid initialization
problems. The horizon h is chosen when a joint F-test of the h+1 projection fails to reject the null
that the projection coefficients are zero. Finally, we compare the properties of ERME against the
Durbin-Hannan-Rissanen estimator for ARMA models (see Durbin, 1960 and Hannan-Rissanen,
1982). This is a two-step estimator based on fitting a long autoregression in the first step and
then using the residuals to estimate the ARMA model in the second step. We choose this method
for several reasons. Maximum likelihood estimation requires numerical routines which crashed
for some draws of the residuals. The Durbin-Hannan-Rissanen estimator does not have these
numerical problems and provides asymptotically consistent and efficient estimates1 .
Overall, we found the results of our Monte-Carlo experiment very encouraging. ERME esti-
1 The GAUSS routines used to estimate the models with the Durbin-Hannan-Rissanenapproach were obtained from the collection of time series routines prepared by RainerSchlittgen and Thomas Noack and available at the American University repository:http://www.american.edu/academic.depts/cas/econ/gaussres/timeseri/timeseri.htm
23
mates are numerically very similar (even slightly more accurate) to the Durbin-Hannan-Rissanen
estimates and close to the true parameter values. Although not reported in the table, we found the
numerical estimates and associated standard errors to be very stable to the choice of truncation
horizon h. Standard errors from ERME tended to be slightly less efficient, the difference becoming
smaller for larger samples or when the truncation horizon is larger.
7.2 Fuhrer and Olivei (2004)
7.3 Systems
8 Application: Fuhrer and Olivei (2004) revisited
The popular, elementary New-Keynesian framework for monetary policy analysis combines forward-
looking, micro-founded output (IS curve) and inflation (AS curve) Euler equations with a policy
reaction function. The specific expectations mechanism considered plays a prominent role: purely
forward-looking expectations mechanisms seem to be at odds with the dynamic properties of the
data. Fuhrer and Olivei (2004) investigate whether the nature of this dissonance can be explained
by the poor properties of weak instruments GMM or whether “hybrid” specifications, that simul-
taneously incorporate forward and backward-looking behavior, are warranted. To this end, Fuhrer
and Olivei (2004) propose a GMM procedure that imposes the dynamic constraints implied by
the economic model on the instruments. They dub this procedure “optimal-instruments” GMM
(OI-GMM) and explore its properties relative to conventional GMM and MLE.
This section uses ERME on the specifications in Fuhrer and Olivei (2004) to offer a benchmark
comparison between GMM, MLE and OI-GMM, and ERME. The basic specification is:
zt = (1− µ) zt−1 + µEtzt+1 + γEtxt + εt (28)
In the output Euler equation, zt is a measure of the output gap, xt is a measure of the real interest
rate, and hence, γ < 0. In the inflation Euler version of (28), zt in a measure of inflation, xt is
a measure of the output gap, and γ > 0 signifying that a positive output gap exerts “demand
24
pressure” on inflation.
Fuhrer and Olivei (2004) experiment with a quarterly sample from 1966:Q1 to 2001:Q4 and
use the following measures for zt and xt. The output gap is measured, either by the log deviation
of real GDP from its HP trend or, from a segmented time trend with breaks in 1974 and 1995.
Real interest rates are measured by the difference of the federal funds rate and next period’s
inflation. Inflation is measured by the log change in the GDP, chain-weighted price index. In
addition, Fuhrer and Olivei (2004) experiment with real unit labor costs instead of the output gap
for the inflation Euler equation. Further details can be found in their paper. Accordingly, Tables
2 and 3 reproduce Tables 4 and 5 in Fuhrer and Olivei (2004) and incorporate the results we find
from ERME. In addition, we report unconstrained estimates of expression (28) labeled with the
superscript “u” for unconstrained (“c” for constrained). The projection lag length of ERME’s first
step is determined by Akaike’s information criterion and the horizon truncation h is determined
with a joint significance F-test of the projections at each horizon, as we did in the Monte Carlos.
Table 2 shows significant discrepancies between GMM, MLE and OI-GMM on one hand, and
ERME on the other. Where as the former methods’ estimates of µ range between 0.52 to 0.41,
ERME’s estimates are in the range −0.40 to −0.17. Interestingly, ERME finds a much stronger
link and of the correct sign between current and future interest rates and the output gap - a
coefficient that is routinely estimated to be insignificant in the literature. Our estimates range
from −0.13 to −0.10, whereas maximum likelihood suggests this coefficient to be −0.0084 at best.
At worst the coefficient is of the wrong sign and insignificant in all other cases.
We investigated two possible explanations of the discrepancies between methods. First, we
considered whether the constraint that the coefficient on zt−1 and the coefficient on Etzt+1 add
up to one and imposed by Fuhrer and Olivei (2004), was being violated by the data. This turned
out not to be the case as can be observed from Table 2. Secondly, Cameron and Trivedi (2005)
suggest that in small samples, optimally weighted minimum distance estimators can be biased and
that it may be preferable to use a weighting matrix with equal weights. The results with equal
25
weights ERME are reported in the bottom third of table 2. Estimates of µ and γ under both the
constrained and unconstrained versions changed very little indeed, suggesting that this was not
the source of discrepancy.
What could explain the differences then? The overall specification J-test from ERME suggests
that, whether unconstrained or not, estimated with equal weights or not, the theoretical model
is misspecified. Because MLE and OI-GMM impose this theoretical and misspecified model on
the data, we attribute differences across methods to this constraint. This impression was further
reinforced by the results reported in Table 3. As we shall see momentarily, when the J-test indicates
that the theoretical model is not misspecified, estimates from ERME are almost identical to MLE
and OI-GMM. Because ERME is robust to misspecification of the underlying data generating
process, we feel strongly about our results if only the controlled environment of a Monte-Carlo
exercise could arbitrate among the merit of each method.
Table 3 reports estimates of the inflation Euler equation. As we remarked above, when ERME’s
J-test does not find evidence of misspecification, ERME’s estimates are very close to the MLE
estimates (in fact, somewhat closer than OI-GMM). µ is estimated to be approximately 0.20
and the output gap enters with the correct sign and with a coefficient of about 0.10. However,
discrepancies among methods arise when we use real unit labor costs instead of the output gap. In
this case, the J-test suggests the model is misspecified and, while ERME finds a value of µ which
is still in the neighborhood of 0.20, it finds that the effect of real unit labor costs on inflation is
much larger: 0.35 and highly significant. In contrast, both ML and OI-GMM have µ growing from
0.20 to 0.45 while the effect of real unit labor costs becomes insignificant.
We conclude this section by observing that ERME estimates were insensitive to the choice
of projection lag length (we obtained similar results with varying lag length specifications) and
changed very little when we vary the number of impulse response horizons used. Obviously, while
our method produces results that are preferable from the economic point of view (in that the
coefficient estimates of γ are more closely in line with a priori economic intuition), the more sensible
26
argument comes from observing that, while MLE and OI-GMM impose the economic model on the
data (and hence their properties cannot be assessed when this assumption is violated), ERME does
no such thing — it uses local projections to obtain the most accurate description of the properties
of the data generating process and uses these to obtain estimates of the parameters of interest.
9 Conclusions
This paper introduces a disarmingly simple method to estimate rational expectations models based
on mapping the structural parameters of the economic model into the impulse responses from the
data. The method avoids many of the computational complexities associated with full-information
maximum likelihood or the optimal instruments GMM approach proposed by Fuhrer and Olivei
(2004), yet possesses several advantages over these alternative estimation procedures. By assuming
that the solution paths and exogenous stochastic processes in the model are characterized by a
Wold representation rather than by a specific Markov process, we require less knowledge about
the data generating process and can accommodate more general dynamic specifications. Perhaps
most importantly, our efficient response matching estimator is designed to estimate parameters
so that the model fits the data precisely along the dimensions against which it will be evaluated.
The success or failure of dynamic stochastic general equilibrium models is ultimately based on
their ability to explain the dynamic interactions among the economic variables involved, which
interactions are typically summarized by impulse response functions. It is in this sense that our
method conforms to the introductory quote by Sims.
An important feature of our method originates from how we adapt the local projection re-
sponse estimator proposed by Jordà (2005). This impulse response estimator is more robust to
general misspecification of the relevant underlying stochastic processes. This feature is particu-
larly advantageous as it allows the ERME method to extract additional information from longer
horizon impulse responses in cases where the Wold decomposition differs from the typically as-
sumed Markov format. In the empirical example, this additional information tightened estimates
27
of standard errors considerably.
New second and higher order accurate solution techniques for nonlinear dynamic stochastic
general equilibrium models (see, e.g. Kim et al., 2003) deliver equilibrium conditions that are
polynomial (rather than linear) difference equations. Future research will investigate the ERME
method in these environments. We expect that, while the relationship between impulse response
coefficients and the model’s parameters will no longer be linear, the only real variation in our
method will be a nonlinear-GLS type step, which is still relatively straight-forward to apply and
may open the door to formal evaluation of this next generation of models.
10 Appendix
Proof. Theorem 1
Notice that
bA(k, h)−A(k, h) = bΓ01,k,hbΓ−1k −A(k, h)bΓkbΓ−1k =⎧⎨⎩(T − k − h)−1∞Xj=k
vk,t+hX0t,k
⎫⎬⎭ bΓ−1kwhere
vk,t+h =∞X
j=k+1
Ahj yt−j + εt+h +h−1Xj=1
Bjεt+h−j
Hence,
bA(k, h)−A(k, h) =
⎧⎨⎩(T − k − h−1)T−hXt=k
⎛⎝ ∞Xj=k+1
Ahj yt−j
⎞⎠X 0t,k
⎫⎬⎭ bΓ−1k +
((T − k − h−1)
T−hXt=k
εt+hX0t,k
)bΓ−1k +⎧⎨⎩(T − k − h−1)T−hXt=k
⎛⎝ hXj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭ bΓ−1kDefine the matrix norm kCk21 = supl6=0 l
0C0C0l0l , that is, the largest eigenvalue of C 0C. When C is
symmetric, this is the square of the largest eigenvalue of C. Then
kABk2 ≤ kAk21 kBk2 and kABk2 ≤ kAk2 kBk21
28
Hence °°° bA(k, h)−A(k, h)°°° ≤ kU1T k°°°bΓ−1k °°°1+ kU2T k
°°°bΓ−1k °°°1+ kU3T k
°°°bΓ−1k °°°1
where
U1T =
⎧⎨⎩(T − k − h−1)T−hXt=k
⎛⎝ ∞Xj=k+1
Ahj yt−j
⎞⎠X 0t,k
⎫⎬⎭U2T =
((T − k − h−1)
T−hXt=k
εt+hX0t,k
)
U3T =
⎧⎨⎩(T − k − h−1)T−hXt=k
⎛⎝ hXj=1
Bjεt+h−j
⎞⎠X 0t,k
⎫⎬⎭Lewis and Reinsel (1985) show that
°°°bΓ−1k °°°1is bounded, therefore, the next objective is to show
kU1T k p→ 0, kU2T k p→ 0, and kU3T k p→ 0. We begin by showing kU2T k p→ 0, which is easiest to
see since εt+h and X 0t,k are independent, so that their covariance is zero. Formally and following
similar derivations in Lewis and Reinsel (1985)
E³kU2T k2
´= (T − k − h)−2
T−hXt=k
E¡εt+hε
0t+h
¢E(X 0
t,kX0t,k)
by independence. Hence
E³kU2T k2
´= (T − k − h)−1tr(Σ)k tr [Γ(0)]
Since kT−k−h → 0 by assumption (ii), then E
³kU2T k2
´p→ 0, and hence kU2T k p→ 0.
Next, consider kU3T k p→ 0. The proof is very similar since εt+h−j, j = 1, ..., h− 1 and X 0t,k are
independent. As long as kBjk2 < ∞ (which is true given that the Wold decomposition ensures
thatP∞j=0 kBjk < ∞, then using the same arguments we used to show kU2T k
p→ 0, it is easy to
see that kU3T k p→ 0.
Finally, we show that kU1T k p→ 0.The objective here is to show that assumption (iii) implies
that
k1/2∞X
j=k+1
°°Ahj °°→ 0, k, T → 0
29
because we will need this condition to hold to complete the proof later. Recall that
Ahj = Bh−1Aj +Ah−1j+1 ; A
0j+1 = 0; B0 = Ir; h, j ≥ 1, h finite
Hence
k1/2∞X
j=k+1
°°Ahj °° = k1/2⎧⎨⎩
∞Xj=k+1
kBh−1Aj +Bh−2Aj+1 + ...+B1Aj+h−2 +Aj+h−1k⎫⎬⎭
by recursive substitution. Thus
k1/2∞X
j=k+1
°°Ahj °° ≤ k1/2⎧⎨⎩
∞Xj=k+1
kBh−1Ajk+ ...+ kB1Aj+h−2k+ kAj+h−1k⎫⎬⎭
Define λ as the max kBh−1k , ..., kB1k , then sinceP∞j=0 kBjk <∞ we know λ <∞ so that
k1/2∞X
j=k+1
°°Ahj °° ≤ k1/2⎧⎨⎩λ
∞Xj=k+1
kAjk+ ...+ λ∞X
j=k+1
kAj+h−2k+∞X
j=k+1
kAj+h−1k⎫⎬⎭
By assumption (iii) and since λ <∞, then each of the elements in the sum goes to zero as T, k go
to infinity. Finally, to prove kU1T k p→ 0 all that is required is to follow the same steps as in Lewis
and Reinsel (1985) but using the condition
k1/2∞X
j=k+1
°°Ahj °°→ 0, k, T → 0
instead.
Proof. Theorem 2
We begin by showing thatW1Tp→ 0. Lewis and Reinsel (1985) show that under assumption (ii),
k1/2°°°bΓ−1k − Γ−1k °°°
1
p→ 0 and E³°°°k−1/2 (T − k − h)1/2 U1T°°°´ ≤ s (T − k − h)1/2P∞j=k+1 °°Ahj °° p→
0; k, T → ∞ from assumption (iii) and using similar derivations as in the proof of consistency
with s being a generic constant. Hence W1Tp→ 0.
Next, we show W2Tp→ 0. Notice that
|W2T | ≤ k1/2°°°bΓ−1k − Γ−1k °°°
1
°°°k−1/2(T − k − h)1/2U∗2T°°°As in the previous step, Lewis and Reinsel (1985) establish that k1/2
°°°bΓ−1k − Γ−1k °°°1
p→ 0 and from
the proof of consistency, we know the second term is bounded in probability, which is all we need
to establish the result.
30
Lastly, we need to showW3Tp→ 0, however, the proof of this result is identical to that in Lewis
and Reinsel once one realizes that assumption (iii) implies that
(T − k − h)1/2∞X
j=k+1
°°Ahj °° p→ 0
and substituting this result into their proof.
Proof. Theorem 3
Follows directly from Lewis and Reinsel (1985) by redefining
STm = (T − k − h)1/2 vec⎡⎣⎧⎨⎩(T − k − h)−1
T−hXt=k
⎛⎝εt+h +h−1Xj=1
Bjεt+h−j
⎞⎠X 0t,k(m)
⎫⎬⎭Γ−1k⎤⎦
for m = 1, 2, ... and Xt,k(m) as defined in Lewis and Reinsel (1985).
Proof. Theorem 4
Notice that when bbT p→ b0, then
g(bbT ;φ) = (Ir ⊗ S0)bbT − n³Ir ⊗ S1 bBT´ ³Ir ⊗ S2 bBT´oφ p−→
= (Ir ⊗ S0) b0 − (Ir ⊗ S1B0) (Ir ⊗ S2B0)φ
= g (b0;φ)
by the continuous mapping theorem. Furthermore and given assumption (i)
Q0(φ) = g(b0;φ)0Wg(b0;φ)
which is a quadratic expression that is uniquely maximized at φ0. Q0(φ) is obviously a continuous
function. Hence, the last thing we need to show is that
supφ∈Θ
¯ bQT (φ)−Q0(φ)¯ p→ 0
uniformly. [I have not done this formally yet. Here is where the stochastic equicontinuity may
come in]
Proof. Theorem 5
31
The strategy of the proof consists in stacking the first and second steps into a vector of
conditions whose weighted quadratic distance we want to minimize. This set-up then allows us to
apply the mean value theorem on the set of first order conditions and derive the desired result.
Therefore, let
mT (b) =MzY −MzXb
and notice that
minbmT (b)
0mT (b)
results in
bbT = (X 0MzX)−1(X 0MzY ) ,
the local projection estimator for the first-step, reduced-form, impulse responses we presented in
section 3. Hence, stacking the first and second stage conditions into a vector, we have
RT (b,φ) =
∙mT (b)
0 g³bbT ;φ´0 ¸
⎡⎢⎢⎣ I 0
0 W
⎤⎥⎥⎦⎡⎢⎢⎣ mT (b)
g³bbT ;φ´
⎤⎥⎥⎦ .The first order conditions of the problem
minβ,φ
RT (b,φ)
are:
⎡⎢⎢⎣ ∇bmT (b)0
0
∇bg³bbT ;φ´0 ∇φg
³bbT ;φ´0⎤⎥⎥⎦⎡⎢⎢⎣ I 0
0 W
⎤⎥⎥⎦⎡⎢⎢⎣ mT (b)
g³bbT ;φ´
⎤⎥⎥⎦ = 0. (29)
Notice that
32
∇bmT (b) = −MzX
∇bg (b;φ) = (I ⊗ S0)− (Φ01 ⊗ I) (I ⊗ S−1) + (Φ02 ⊗ I) (I ⊗ S1)
∇φg (b;φ) = − (I ⊗ S−1)B (I ⊗ S1)B
We find it convenient to define Gb = ∇bg (b;φ) and Gφ = ∇φg (b;φ) and we use Gb = ∇bg¡b;φ¢
to evaluate the gradient at a mean value b ∈hbbT , b0iand bGb = ∇bg ³bbT ; bφT´ and similarly for Gφ
and bGφ.
Next, apply the mean value theorem to the last term in the first order conditions (29):
⎡⎢⎢⎣ mT (b)
g³bbT ;φ´
⎤⎥⎥⎦ =⎡⎢⎢⎣ m (b0)
g (b0;φ0)
⎤⎥⎥⎦+⎡⎢⎢⎣ −MzX 0
Gb Gφ
⎤⎥⎥⎦⎡⎢⎢⎣ bbT − b0bφT − φ0
⎤⎥⎥⎦ (30)
and plug this expression back into the first order conditions (29)
⎡⎢⎢⎣ −X 0Mz 0
bG0b bGφ
⎤⎥⎥⎦⎡⎢⎢⎣ I 0
0 W
⎤⎥⎥⎦×⎧⎪⎪⎨⎪⎪⎩⎡⎢⎢⎣ m (b0)
g (b0;φ0)
⎤⎥⎥⎦+⎡⎢⎢⎣ −MzX 0
Gb Gφ
⎤⎥⎥⎦⎡⎢⎢⎣ bbT − b0bφT − φ0
⎤⎥⎥⎦⎫⎪⎪⎬⎪⎪⎭ = 0
After tedious algebra, we obtain
⎡⎢⎢⎣ bbT − b0bφT − φ0
⎤⎥⎥⎦ = (31)
⎡⎢⎢⎣ (X 0MzZ)−1 0
−³ bG0φWGφ
´−1 n bG0φWGb − bG0bMzXo(X 0MzZ)
−1 ³ bG0φWGφ
´−1⎤⎥⎥⎦×
⎡⎢⎢⎣ −X 0Mz 0
bG0b bG0φW⎤⎥⎥⎦⎡⎢⎢⎣ m (b0)
g (b0;φ0)
⎤⎥⎥⎦
33
Notice that from conditions and assumptions of the problem,
√Tm (b0)
d→ N (0,Σh)
√T (I ⊗ S0) g (b0;φ0) d→ N (0, Vb)
bGb, Gb p→ Gb
bGφ, Gφp→ Gφ
Therefore, the second row of expression (31) allows us to write
√T³bφT − φ0
´d→ N (0, Vφ)
Vφ =¡G0φWGφ
¢−1G0φW (I ⊗ S0)0 Vb (I ⊗ S0)WGφ
¡G0φWGφ
¢−1+
¡G0φWGφ
¢−1G0bWΣhWGb
¡G0φWGφ
¢−1+
¡G0φWGφ
¢−1 G0φWGb −G0bMzX0VbG0φWGb −G0bMzX¡G0φWGφ
¢−1where we make use of the fact that
(X 0MzX)−1X 0MzΣhMzX (X
0MzX)−1= Vb
ReferencesBlanchard, Olivier and Danny Quah (1989) “The Dynamic Effects of Aggregate Demandand Aggregate Supply Disturbances,” American Economic Review, 79: 655-73
Cameron, A. Colin and Pravin Trivedi (2005) Microeconometrics: Methods and Ap-plications. Cambridge: Cambridge University Press.
Christiano, Lawrence J. (2002) “Solving Dynamic Equilibrium Models by a Method of Un-determined Coefficients, Computational Economics, 20, 21-55.
Christiano, Lawrence J., Martin Eichenbaum and Charles L. Evans (2005) “Nominal Rigidi-ties and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy,v. 113, 1-45.
34
Cooley, Thomas F. (1997) “Calibrated Models,” Oxford Review of Economic Policy, 13:55-69.
Durbin, James (1960) “The Fitting of Time-Series Models,” Revue de l’Institut Internationalde Statistique, 28(3), 233-244.
Evans, George W. and Seppo Honkapohja (2001) Learning and Expectations in Macro-economics. Princeton and Oxford: Princeton University Press.
Farmer, Roger E. A. (1993) The Macroeconomics of Self-Fulfilling Prophecies. Cam-bridge, MA: MIT Press.
Ferguson, Thomas S. (1958) “A Method of Generating Best Asymptotically Normal Esti-mates with Application to the Estimation of Bacterial Densities,” Annals of MathematicalStatistics, 29: 1046-62.
Fuhrer, Jeffrey C. and Giovanni P. Olivei (2004) “Estimating Forward Looking Euler Equa-tions with GMM Estimators: An Optimal Instruments Approach,” FRB Boston Series,paper n. 04-2.
Hannan, Edward J. and Jorma Rissanen (1982) “Recursive Estimation of MixedAutoregressive-Moving Average Order,” Biometrika, 69(1), 81-94.
Kim, Jinill, Sunghyun Kim, Ernst Schaumburg, and Christopher A. Sims (2003) “Calculatingand Using Second Order Accurate Solutions of Discrete Time Dynamic Equilibrium Models,”Federal Reserve Board, Finance and Economics Discussion Series, 2003-61.
Hamilton, James D. (1994) Time Series Analysis. Princeton, New Jersey: PrincetonUniversity Press.
Harvey, James E., Kevin D. Hoover and Kevin D. Salyer, eds. (1998) Real Business Cy-cles, A Reader. London and New York: Routledge.
Jordà, Òscar (2005) “Estimation and Inference of Impulse Responses by Local Projections,”American Economic Review, 95(1): 161-182.
Lütkepohl, Helmut (1993) Introduction to Multiple Time Series Analysis, (2nd ed.).Berlin: Springer-Verlag.
McCallum, Bennett T. (1983) “On Non-Uniqueness in Rational Expectations Models: AnAttempt at Perspective,” Journal of Monetary Economics, March, 139-168.
McCallum, Bennett T. (2003) “The Unique Minimum State Variable RE Solution is E-Stablein All Well Formulated Linear Models,” National Bureau of Economic Research, workingpaper 9960.
Rotemberg, Julio J. and Michael Woodford (1997) “An Optimization-Based EconometricFramework for the Evaluation of Monetary Policy,” in NBER Macroeconomics Annual1997, Julio J. Rotemberg and Ben S. Bernanke, eds. Cambridge and London: MIT Press.
Rudebusch, Glenn D. and Jeffrey C. Fuhrer (2004) “Estimating the Euler Equation forOutput,” Journal of Monetary Economics, 51(6): 1133-53.
Sbordone, Argia M. (2004) “A Limited Information Approach to the Simultaneous Estima-tion of Wage and Price Dynamics,” Rutgers University, mimeo.
35
Sbordone, Argia M. (2005) “Do Expected Future Marginal Costs Drive Inflation Dynamics?”Federal Reserve Bank of New York, Staff Report n0. 204.
Sims, Christopher A. (1996) “Macroeconomics and Methodology,” Journal of EconomicPerspectives, v. 10, n.1, 105-120.
Stock, James H., Jonathan Wright, and Motohiro Yogo (2002) “A Survey of Weak Instru-ments and Weak Identification in Generalized Method of Moments,” Journal of Businessand Economic Statistics, 20, 518-529.
Watson, Mark (1993) “Measures of Fit for Calibrated Models,” Journal of Political Economy ,101:1011-1041.
36
38
Table 1. Fitting ARMA(p,q) Models with Efficient Response Matching Estimation. A Monte-Carlo Comparison Data simulated from the model:
)1,0(~ 11 Nyy ttttt εθεερ −− ++=
Model ρ = 0.25
θ = 0.50 ρ = 0.50 θ = 0.25
ρ = 0.50 θ = 0.50
ρ = 0.75 θ = 0
ρ = 0 θ = -0.50
T = 100 DHR 0.264
(0.146) 0.482
(0.130)
0.479 (0.135) 0.264
(0.147)
0.483 (0.110) 0.518
(0.107)
0.702 (0.099) 0.019
(0.138)
-0.034 (0.190) -0.482 (0.167)
ERME* 0.268 (0.152) 0.472
(0.177)
0.461 (0.143) 0.259
(0.166)
0.479 (0.123) 0.500
(0.152)
0.716 (0.133) 0.010
(0.165)
-0.019 (0.220) -0.507 (0.240)
h 5 4 5 4 3 T = 300
DHR 0.246 (0.084) 0.495
(0.075)
0.485 (0.076) 0.251
(0.084)
0.482 (0.064) 0.507
(0.062)
0.747 (0.051) -0.005 (0.078)
-0.006 (0.112) -0.501 (0.097)
ERME* 0.246 (0.092) 0.494
(0.107)
0.482 (0.084) 0.263
(0.101)
0.483 (0.071) 0.501
(0.090)
0.748 (0.061) -0.005 (0.074)
-0.020 (0.124) -0.500 (0.136)
h 7 8 9 12 5 Notes: DHR refers to the Durbin-Hannan-Rissanen estimator for ARMA models. ERME* refers to the ERME estimates obtained from choosing the maximum impulse response horizon h for which the joint null that all the coefficients are zero is rejected at a 95% confidence level. This choice of horizon is reported in the row labeled h. T = 100, 300 refers to the sample size. The first 100 observations of each simulation are discarded to avoid initialization problems. The Monte-Carlo is replicated 200 times. The numbers reported are the Monte-Carlo averages of the estimates (to assess consistency) and the standard errors are the Monte-Carlo average over the standard errors obtained from each of the two estimation alternatives (to assess efficiency).
38
Table 2 – ERME, MLE, GMM and Optimal Instruments GMM: A Comparison
Estimates of Output Euler Equation: 1966:Q1 to 2001:Q4
( ) ttttttt xEzEzz εγµµ +++−= +− 111 Estimation
Method Specification µ
SE(µ) γ
SE(γ) p-value
J-Statistic 1st Step
Lag Length GMM HP 0.52
(0.053) 0.0024
(0.0094)
GMM ST 0.51 (0.049)
0.0029 (0.0093)
ML HP 0.47 (0.035)
-0.0056 (0.0037)
ML ST 0.42 (0.052)
-0.0084 (0.0055)
OI – GMM HP 0.47 (0.062)
-0.0010 (0.023)
OI – GMM ST 0.41 (0.064)
-0.0010 (0.022)
Optimally Weighted ERME (optimal horizon, h = 4) ERMEc HP -0.40
(0.067) -0.10
(0.075) 0.015 4
ERMEc ST -0.32 (0.062)
-0.14 (0.075)
0.003 4
ERMEu HP 1.43/-0.41 (0.12/0.07)
-0.10 (0.075)
0.016 4
ERMEu ST 1.45/-0.32 (0.11/0.06)
-0.13 (0.075)
0.007 4
Equally Weighted ERME (optimal horizon, h = 4) ERMEc HP -0.23
(0.123) -0.10
(0.082) 0.001 4
ERMEc ST -0.17 (0.119)
-0.10 (0.086)
0.000 4
ERMEu HP 1.18/-0.25 (0.17/0.12)
-0.09 (0.101)
0.001 4
ERMEu ST 1.21/-0.17 (0.16/0.12)
-0.11 (0.10)
0.000 4
Notes: Top third of the table is Table 4 in Fuhrer and Olivei (2004). Estimates based on 4 horizons of the impulse response (determined by a joint F-test on the projections) and projection lag-length is determined by AICc. HP is the Hodrick-Prescott filter of log real GDP, and ST is a segmented deterministic trend of log real GDP. ERMEc constrains the coefficients on zt-1 and Etzt+1 to add up to one. ERMEu keeps the coefficients unconstrained and reports the value of the coefficient on zt-1 and then Etzt+1. Optimal Weights ERME uses the variance-covariance matrix from the projection step as weights in the second step. Equally weighted ERME uses the identity matrix instead. P-value of the J-Statistic is an overall specification test. Hence rejection of the null (i.e., p-value < 0.05, say), indicates model misspecification.
39
Table 3 – ERME, MLE, GMM and Optimal Instruments GMM: A Comparison Estimates of Inflation Euler Equation: 1966:Q1 to 2001:Q4
( ) ttttttt xEzEzz εγµµ +++−= +− 111
Estimation
Method Specification µ
SE(µ) γ
SE(γ) p-value
J-Statistic 1st Step
Lag Length GMM HP 0.66
(0.13) -0.055 (0.072)
GMM ST 0.63 (0.13)
-0.030 (0.050)
GMM rulc 0.60 (0.086)
0.053 (0.038)
ML HP 0.17 (0.037)
0.10 (0.042)
ML ST 0.18 (0.036)
0.074 (0.034)
ML rulc 0.47 (0.024)
0.050 (0.0081)
OI – GMM HP 0.23 (0.093)
0.12 (0.042)
OI – GMM ST 0.21 (0.11)
0.097 (0.039)
OI - GMM rulc 0.45 (0.028)
0.054 (0.0081)
Optimally Weighted ERME (optimal horizon, h = 4) ERMEc HP 0.21
(0.11) 0.09
(0.06) 0.714 3
ERMEc ST 0.21 (0.11)
0.07 (0.06)
0.829 3
ERMEc rulc 0.22 (0.10)
0.32 (0.07)
0.000 2
ERMEu HP 0.59/0.22 (0.19/0.11)
0.09 (0.06)
0.912 3
ERMEu ST 0.61/0.22 (0.19/0.11)
0.07 (0.06)
0.963 3
ERMEu rulc 0.56/0.18 (0.14/0.10)
0.35 (0.08)
0.003 2
Equally Weighted ERME (optimal horizon, h = 4) ERMEc HP 0.21
(0.11) 0.09
(0.06) 0.432 3
ERMEc ST 0.21 (0.11)
0.07 (0.06)
0.829 3
ERMEc rulc 0.22 (0.10)
0.32 (0.07)
0.000 2
ERMEu HP 0.59/0.22 (0.19/0.11)
0.09 (0.06)
0.913 3
ERMEu ST 0.61/0.22 (0.19/0.11)
0.07 (0.06)
0.963 3
ERMEu rulc 0.56/0.18 (0.14/0.10)
0.35 (0.08)
0.003 2
Notes: Top third of the table is Table 5 in Fuhrer and Olivei (2004). Estimates based on 4 horizons of the impulse response (determined by a joint F-test on the projections) and projection lag-length is determined by AICc. HP is the Hodrick-Prescott filter of log real GDP, and ST is a segmented deterministic trend of log real GDP. When the entry rulc appears, the specification replaces the output gap with real unit labor costs as the driving process. ERMEc constrains the coefficients on zt-1 and Etzt+1 to add up to one. ERMEu keeps the coefficients unconstrained and reports the value of the coefficient on zt-1 and then Etzt+1. Optimal Weights ERME uses the variance-covariance matrix from the projection step as weights in the second step. Equally weighted ERME uses the identity matrix instead. P-value of the J-Statistic is an overall specification test. Hence rejection of the null (i.e., p-value < 0.05, say), indicates model misspecification.