Working Paper No. 33

Cointegrated VARMA Models and Forecasting US Interest Rates

Christian Kascha and Carsten Trenkler

October 2011    

 

University of Zurich

Department of Economics

Working Paper Series

ISSN 1664-7041 (print), ISSN 1664-705X (online)


Cointegrated VARMA Models and Forecasting US Interest Rates∗

Christian Kascha† Carsten Trenkler‡

University of Zurich University of Mannheim

October 3, 2011

Abstract

We bring together some recent advances in the literature on vector autoregressive moving-average (VARMA) models, creating a relatively simple specification and estimation strategy for the cointegrated case. We show that in the cointegrated case with fixed initial values there exists a so-called final moving-average representation which is usually simpler but not as parsimonious as the usual Echelon form. Furthermore, we prove that our specification strategy is consistent also in the case of cointegrated series. In order to show the potential usefulness of the method, we apply it to US interest rates and find that it generates forecasts superior to methods which do not allow for moving-average terms.

Keywords: Cointegration, VARMA Models, Forecasting

1 Introduction

In this paper, we propose a relatively simple specification and estimation strategy for cointegrated vector autoregressive moving-average (VARMA) models, using the estimators given in Yap & Reinsel (1995) and Poskitt (2003) and the identified forms proposed by Dufour & Pelletier (2008). In

∗We thank Francesco Ravazzolo, Kyusang Yu, Michael Vogt and seminar participants at the Tinbergen Institute Amsterdam, the Universities of Aarhus, Bonn, Konstanz, Mannheim, and Zurich as well as session participants at the ISF 2011, Prague, at the ESEM 2011, Oslo, and at the New Developments in Time Series Econometrics conference, Florence, for very helpful comments.

†University of Zurich, Chair for Statistics and Empirical Economic Research, Zurichbergstrasse 14, 8032 Zurich, Switzerland; [email protected]

‡Address of corresponding author: University of Mannheim, Department of Economics, Chair of Empirical Economics, L7, 3-5, 68131 Mannheim, Germany; [email protected]


order to show its potential usefulness, we apply the procedure in a forecasting exercise for US interest rates and find promising results.

Existing specification and estimation procedures for cointegrated VARMA models can be found in Yap & Reinsel (1995), Lutkepohl & Claessen (1997), Poskitt (2003, 2006) and also Poskitt (2009). Common to these papers is the use of the reverse Echelon form, a set of parameter restrictions which ensures that the remaining coefficients are identified with respect to the likelihood function. A related but different approach uses so-called "scalar-component" representations, originally proposed by Tiao & Tsay (1989) and embedded in a complete estimation procedure by Athanasopoulos & Vahid (2008). While both representations of a given process can be quite parsimonious, they can be relatively complex in structure. Instead, we extend the simpler identified representations of Dufour & Pelletier (2008) to the cointegrated case with fixed initial values. Furthermore, we propose to specify the model using Dufour & Pelletier's (2008) order selection criteria applied to the model estimated in levels. We prove a.s. consistency of the estimated orders in this case. While we believe that our proposed specification and estimation procedure for this class of models stands out because of its simplicity and robustness, this is not to say that our procedure should be preferred to the alternative methods mentioned above under all circumstances. The method of choice likely depends on the type of data at hand and the sample size; this is an empirical question that goes beyond the scope of this paper.

Finally, we apply the methods to the problem of predicting U.S. treasury bill and bond interest rates with different maturities, taking cointegration as given. We find quite promising results relative to a multivariate random walk and the standard vector error correction model (VECM). An investigation of the relative forecasting performances over time shows that the VARMA model delivers consistently good forecasts apart from a period stretching from the mid-nineties to 2000.

The motivation for looking at this particular model class stems from the well-known theoretical advantages of VARMA models over pure vector autoregressive (VAR) processes; see e.g. Lutkepohl (2005). In contrast to VAR models, the class of VARMA models is closed under linear transformations. For example, a subset of variables generated by a VAR process is typically generated by a VARMA, not by a VAR process (Lutkepohl 1984a,b). It is well known that linearized dynamic stochastic general equilibrium (DSGE) models imply that the variables of interest are generated by a finite-order VARMA process. Fernandez-Villaverde, Rubio-Ramírez, Sargent & Watson (2007) show formally how DSGE models and VARMA processes are linked. Cooley & Dwyer (1998) claim that modeling macroeconomic time series systematically as pure VARs is not justified by the underlying economic theory. A comparison of structural identification using VAR, VARMA and state space representations is provided by Kascha & Mertens (2009).


Our particular application is part of a vast literature on the term structure of interest rates and therefore serves as an ideal framework in which to compare different modeling strategies. The cointegration approach has become a widespread tool for term structure analysis, following the seminal paper of Campbell & Shiller (1987). If the rational expectations hypothesis of the term structure (REHTS) holds, then the spread of two interest rates is stationary. Accordingly, there should exist K − 1 cointegration relations in a system of K (nonstationary) interest rates. While there is strong empirical evidence for K − 1 cointegration relations among money market rates and medium-term bond yields, see e.g. Hall, Anderson & Granger (1992), Engsted & Tanggaard (1994), Cuthbertson (1996), Hassler & Wolters (2001), a smaller number of cointegration relations is usually found if long-term interest rates are considered in addition. This has been found e.g. by Shea (1992), Carstensen (2003) and Cavaliere, Rahbek & Taylor (2010).

Starting with Campbell & Shiller (1987), many studies have also analyzed whether spreads help to predict individual interest rates, either through classical regression models including the spread or through vector autoregressive models. Given the widespread use of cointegration techniques to test for the REHTS, it is, however, surprising that only a few papers apply corresponding multivariate models for cointegration, like the VECM, to predicting interest rates; see Hall et al. (1992), Hassler & Wolters (2001) and, more recently, Clarida, Sarno, Taylor & Valente (2006).

Results on the forecasting performance of (cointegrated) VARMA models are even sparser. Using an identified form different from ours, Yap & Reinsel (1995) apply a cointegrated VARMA model to U.S. interest rates but do not evaluate its forecasting performance. An interesting contribution is the one by Feunou (2009), who uses a VARMA model for modeling the whole yield curve, imposing no-arbitrage restrictions in a stationary model instead of cointegration restrictions. Another study is provided by Monfort & Pegoraro (2007), using switching VARMA term structure models, again in a stationary context. The applied part of our paper adds to this literature.

The rest of the paper is organized as follows. Section 2 presents the paper's contribution and the application. Section 3 gives details on the proposed methodology. Section 4 concludes. Programs and data can be found on the homepages of the authors.

2 Cointegrated VARMA models

This section summarizes the model framework and the results of the paper. The technical details are given in Section 3 and the proofs are provided in the appendix.

This paper mainly assembles and extends elements of the articles of


Yap & Reinsel (1995), Poskitt (2003) and Dufour & Pelletier (2008) in order to construct a reasonably easy and fast strategy for the specification and estimation of cointegrated VARMA models. The considered model for a time series of dimension $K$, $y_t = (y_{t,1}, \ldots, y_{t,K})'$, is
$$y_t = \mu_0 + \sum_{j=1}^{p} A_j y_{t-j} + u_t + \sum_{j=1}^{q} M_j u_{t-j} \quad \text{for } t = 1, \ldots, T, \qquad (1)$$
given fixed initial values $y_0, \ldots, y_{-p+1}$. The error terms $u_t$ are assumed to be i.i.d. with mean zero, positive definite covariance matrix $\Sigma_u$, and at least finite second moments (Poskitt 2003, Assumptions A.2, A.3). Let us define the autoregressive and moving-average polynomials by $A(L) = I_K - A_1 L - \ldots - A_p L^p$ and $M(L) = I_K + M_1 L + \ldots + M_q L^q$, respectively, where $L$ denotes the lag operator. $M(L)$ is assumed to be invertible. We are interested in the case in which the process has $s$ unit roots such that $|A(z)| = a^{st}(z)(1-z)^s$ for $0 < s \leq K$ and $a^{st}(z) \neq 0$ for $|z| \leq 1$, where $|\cdot|$ refers to the determinant. Then it is said that the cointegration rank of $y_t$ is $r = K - s$ and we can decompose $\Pi := \sum_{j=1}^{p} A_j - I_K$ as $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $(K \times r)$ matrices with full column rank $r$. Furthermore, the constant is assumed to take the form $\mu_0 = -\alpha\rho$ such that a trend in the differences is ruled out and one can write
$$\Delta y_t = \alpha\beta'(y_{t-1} - \rho) + \sum_{j=1}^{k} \Gamma_j \Delta y_{t-j} + u_t + \sum_{j=1}^{q} M_j u_{t-j}, \qquad (2)$$
where $k = p - 1$.
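To make the setup concrete, the following is a minimal simulation sketch of a bivariate version of (2) with cointegrating rank $r = 1$ and a scalar moving-average term (anticipating the FMA form (3) below). All numerical values are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bivariate example of (2): Delta y_t = alpha * (beta' y_{t-1} - rho) + u_t + m1 * u_{t-1}
K, T = 2, 500
alpha = np.array([[-0.2], [0.1]])           # loading matrix (K x r), r = 1
beta = np.array([[1.0], [-1.0]])            # cointegrating vector: the spread y1 - y2
rho = np.array([0.5])                       # intercept absorbed in the cointegrating relation
m1 = 0.4                                    # scalar MA coefficient, M(L) = (1 + m1 L) I_K
Sigma_u = np.array([[1.0, 0.7], [0.7, 1.0]])

y = np.zeros((T + 1, K))
u_prev = np.zeros(K)
for t in range(1, T + 1):
    u = rng.multivariate_normal(np.zeros(K), Sigma_u)
    dy = alpha @ (beta.T @ y[t - 1] - rho) + u + m1 * u_prev
    y[t] = y[t - 1] + dy                    # levels are I(1), the spread is stationary
    u_prev = u

spread = y[:, 0] - y[:, 1]
print("std of level:", y[:, 0].std().round(2), " std of spread:", spread.std().round(2))
```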

Now, it is well known that one has to impose certain restrictions on the parameter matrices in order to achieve uniqueness. That is, given a series $(y_t)$, there is generally more than one pair of finite polynomials $[A(z), M(z)]$ such that (1) is satisfied. Therefore, one has to restrict the set of considered pairs $[A(z), M(z)]$ to a subset such that every process satisfying (1) is represented by exactly one pair in this subset.

Poskitt (2003) proposes a complete modelling strategy using the Echelon form, which is based on so-called Kronecker indices. Here, we use the much simpler final moving-average (FMA) representation proposed by Dufour & Pelletier (2008) in the context of stationary VARMA models. This representation imposes restrictions on the moving-average polynomial only. More precisely, we consider only polynomials $[A(z), M(z)]$ such that

$$M(L) = m(L) I_K, \quad m(L) = 1 + m_1 L + \ldots + m_q L^q \qquad (3)$$

is true.¹ As already noted by Dufour & Pelletier (2008), this identification strategy remains valid even if $A(z)$ has roots on the unit circle. The reason is that the polynomial $M^{-1}(L)A(L)$ can be uniquely related to $M(L)$ and

¹Dufour & Pelletier (2008) also propose another representation that restricts attention to pairs with diagonal moving-average polynomials,
$$M(L) = \bigoplus_{k=1}^{K} m_k(L), \quad m_k(L) = 1 + m_{k,1}L + \ldots + m_{k,q_k}L^{q_k},$$
where the $m_k(z)$ are scalar polynomials. This form delivered results similar to those for the FMA form and is therefore not discussed in the paper.


$A(L)$ (Dufour & Pelletier 2008, Lemma 3.8 and Theorem 3.9). What remains is to define the FMA form in the non-stationary context with fixed initial values. Analogous to the results in Poskitt (2006), we can show that in this particular case the resulting pair of polynomials need no longer be left-coprime.

Prior to estimation and specification, we subtract the sample mean from the observations; that is, we actually apply the methods to $y_t - T^{-1}\sum_{s=1}^{T} y_s$ in the VARMA case. However, the notation will not distinguish between raw and adjusted data and we simply write
$$y_t = \sum_{j=1}^{p} A_j y_{t-j} + u_t + \sum_{j=1}^{q} M_j u_{t-j}, \qquad (1')$$
for example. The estimation methods used remain valid provided the constant can indeed be absorbed into the cointegrating relation; see Yap & Reinsel (1995, Section 6) and Poskitt (2003, Section 2, p. 507).

Dufour & Pelletier (2008) also propose an information criterion for specifying stationary VARMA models identified via (3). In their setting, the unobserved residuals are first estimated by a long autoregression and then used to fit models of different orders p and q via generalized least squares (GLS). The orders which minimize their information criterion are then chosen. We modify their procedure by replacing the GLS regressions with OLS regressions which are applied to the cointegrated VARMA model in levels.

Having determined the orders p and q, we use the algorithm described in Poskitt (2003) to obtain an initial estimate of the cointegrated model. The estimator basically amounts to an OLS regression in the VECM representation. This estimate is then updated using the algorithm described in Yap & Reinsel (1995) in order to obtain an efficient estimate in a Gaussian setting. The last step is another GLS regression.

The complete procedure for a given cointegrating rank is:

1. Subtract the sample mean from the yt and estimate a long autoregression using the de-meaned data.

2. Estimate (1') by OLS for different orders p and q, imposing the FMA form. The order estimate (p, q) is the pair which minimizes the information criterion in (13).

3. Get a preliminary estimate via Poskitt’s method.



4. Update the preliminary estimates by the method described by Yap and Reinsel.

For the proofs and the forecasting exercise, we take the cointegrating rank as given. However, one might use the results of Yap & Reinsel (1995) to specify the cointegrating rank in the last two steps of the procedure.

We show that this modeling strategy is potentially interesting by applying it to a prediction exercise for US interest rates and comparing the resulting forecasts to those of the random walk (RW) model and of a VECM that has only autoregressive terms and whose order is chosen by minimizing the BIC. The VECM is estimated via reduced rank regression (Johansen 1988, 1991, 1996).

We take monthly averages of interest rate data for treasury bills and bonds from the FRED database of the Federal Reserve Bank of St. Louis. The series used are TB3MS, TB6MS, GS1, GS5 and GS10 with maturities of 3 months, 6 months, 1 year, 5 years and 10 years, respectively. Our vintage starts in 1970:1, ends in 2010:1 and comprises T = 482 data points. Denote by $R_{t,m_k}$ the annualized interest rate for the k-th maturity $m_k$. Throughout we analyze $y_{t,k} := 100 \ln(1 + R_{t,m_k})$.

Both the VAR and the VARMA models are specified and estimated using the data available at the forecast origin. Then, forecasts for horizon h are obtained iteratively. As the sample expands, both models are re-specified and re-estimated, forecasts are formed, and so on, until the end of the available sample is reached. In order to have sufficient observations for estimation, the first forecasts are obtained at $T_s = 200$. Thus we are left with 282 observations for evaluating 1-month-ahead forecasts, for example.
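A minimal sketch of this expanding-window design, with hypothetical `fit` and `forecast` callables standing in for the model-specific specification, estimation and iterative prediction steps described above:

```python
import numpy as np

def expanding_window_forecasts(y, fit, forecast, t_start=200, h=1):
    """Expanding-window forecast errors, one per forecast origin.

    y        : (T, K) data array.
    fit      : callable, data -> fitted model (re-specified at every origin).
    forecast : callable, (model, data, h) -> (K,) point forecast for t + h.
    """
    T = y.shape[0]
    errors = []
    for origin in range(t_start, T - h + 1):
        model = fit(y[:origin])                  # use only data up to the origin
        y_hat = forecast(model, y[:origin], h)   # iterate forecasts out to horizon h
        errors.append(y[origin + h - 1] - y_hat) # error for target observation origin + h
    return np.asarray(errors)
```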

Hence, we compare two modeling strategies rather than two models: one which allows for nonzero moving-average terms and includes the special case of a pure VAR, and one which exclusively considers the latter case.

Table 1 and Table 2 contain the main results for the RW model, the VECM, the cointegrated VARMA estimated via Poskitt's (2003) initial estimator (VARMA), and the cointegrated VARMA with estimates updated by one iteration of the algorithm presented in Yap & Reinsel (1995) (VARMA YP). The first table gives the mean square prediction errors (MSPEs) series by series for different systems and horizons. The MSPE is defined in the standard way. The second table gives results for the determinant of the MSPE matrix for different horizons and systems; that is,

$$|MSPE_h| = \left| \frac{1}{T - T_s - h + 1} \sum_{t=T_s}^{T-h} (y_{t+h} - \hat{y}_{t+h|t})(y_{t+h} - \hat{y}_{t+h|t})' \right|,$$
using an obvious notation and omitting the dependence on the specific model and system. This determinant serves as a measure of joint forecasting precision, as we do not want to enter the discussion of how to obtain the


best density forecast. In Table 1, the maturities of the systems are given in the left column; that is, the first two columns stand for the bivariate system with interest rates for maturities 3 and 6 months. The forecast horizons are 1, 3, 6 and 12 months. Table 2 is structured similarly. For both tables, the entries for the RW model are absolute while the entries for the other models are always relative to the corresponding entry for the RW model. For example, the first entry in the first row tells us that the random walk produces a one-step-ahead MSPE of 0.042, which corresponds to $\sqrt{0.042} \simeq 0.205$ percentage points. In the same row, the entry for the VECM at h = 1 tells us that this model produces one-step-ahead forecasts of the 3-month interest rate with an MSPE roughly 20% lower than that of the RW model.
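As an illustration (our own sketch, with an assumed array layout for the stored forecasts), the joint criterion can be computed as follows.

```python
import numpy as np

def det_mspe(y, y_hat, t_s, h):
    """Determinant of the MSPE matrix, |MSPE_h|, as defined in the text.

    y     : (T, K) array of realizations.
    y_hat : (T, K) array where y_hat[t] holds the h-step forecast of y[t]
            made at origin t - h (rows before the first forecast may be NaN).
    """
    errors = y[t_s + h - 1:] - y_hat[t_s + h - 1:]   # forecast errors for targets T_s+h, ..., T
    mspe = errors.T @ errors / errors.shape[0]       # (1/(T - T_s - h + 1)) * sum of e e'
    return np.linalg.det(mspe)
```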

Table 1 shows that the cointegrated models are more advantageous relative to the RW model for the bivariate systems than for the larger systems. Apparently, cointegrated VAR or VARMA models can be very advantageous at one-month and three-month horizons, while the RW becomes more competitive for longer horizons and longer maturities, at least when individual MSPEs are considered. The cointegrated models also work better when maturities which are close to each other are grouped together. When comparing the MSPE figures for the VECM and the VARMA models (VARMA and VARMA YP), one can see that the VARMA model generally performs better than the VECM, sometimes quite clearly. To give an example, the gain in forecasting precision can amount to more than 20% for the bivariate systems at short horizons. Typically, a VARMA(1,1) model is preferred by the information criterion over pure VAR models, while the BIC usually picks two autoregressive lags. An exception is the system consisting of five variables. Here, the lag selection criterion almost always chooses no moving-average terms and thus the "VARMA results" are actually results for the pure VECM when estimated with the algorithm of Poskitt (2003) or Yap & Reinsel (1995), respectively. Therefore, the comparison for the five-variable system amounts to a comparison of different estimation algorithms for the same model, and it turns out that in this case reduced rank regression is largely preferable to the approximate methods in terms of the MSPE measure. Note that the forecasting performances of VARMA and VARMA YP are typically quite similar.

The results in Table 2 largely reflect those in Table 1. That is, the cointegrated models' forecasts are usually more precise than the forecasts generated by the random walk, and the VARMA predictions are usually more accurate than the VAR predictions, apart from the special case of the five-dimensional system discussed above. However, in contrast to the single-series MSPE results, the forecasts generated by the cointegrated models are superior, in terms of the joint criterion, to those of the random walk even at longer horizons, in particular for h = 12.

To get a complete picture of the performance of the cointegrated models


vis-à-vis the RW for h-step-ahead forecasts of the k-th series in the system, we compute cumulative sums of squared prediction error differences, defined as
$$\sum_{s=T_s+h}^{t} \left( e^2_{s,RW,k,h} - e^2_{s,M,k,h} \right), \quad t = T_s + h, \ldots, T, \qquad (4)$$
where M stands for either the VECM or the cointegrated VARMA model (VARMA YP) and $e_{t,RW,k,h}$, $e_{t,M,k,h}$ are the forecast errors from predicting $y_{t,k}$ based on information up to $t-h$, i.e. $e_{t,M,k,h} = y_{t,k} - \hat{y}_{t,k|t-h,M}$. Ideally, we should see the above sum steadily increasing over time if forecasting method M is indeed preferable to the RW. The pictures are given in Figure 2 for the system with maturities 3 months and 1 year and forecast horizons h = 1, 6, 12.
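Equation (4) translates directly into a few lines of numpy (a sketch of ours):

```python
import numpy as np

def cssed(e_rw, e_m):
    """Cumulative sum of squared prediction error differences, eq. (4).

    e_rw, e_m : 1-D arrays of h-step forecast errors for the random walk and
                for model M, aligned on the same forecast targets.
    Increasing values indicate that model M beats the random walk over time.
    """
    return np.cumsum(e_rw**2 - e_m**2)
```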

First, Figure 2 of course mirrors the results of Table 1, as the table contains the end-of-sample results. Second, the forecasting advantage of both models is relatively consistent through time. While there are occasional "jumps" when the cointegrated models perform much better than the RW model, these jumps do not appear to dominate the results in Table 1. The forecasting advantage of the VARMA model appears, however, marginally more stable over time. Third, there is also a period, roughly from the mid-nineties to 2000, when the RW model performed better than the cointegrated multivariate models. Interestingly, a similar finding is obtained by de Pooter, Ravazzolo & van Dijk (2010) in a different context. In sum, the pictures support the view that the forecasting advantage of both the VECM and the VARMA model over the RW is systematic.

3 Methodological Details

3.1 VARMA Modelling

We start by discussing the general VARMA process

$$A_0 y_t = \sum_{j=1}^{m} A_j y_{t-j} + M_0 u_t + \sum_{j=1}^{m} M_j u_{t-j}, \quad \text{for } t = 1, \ldots, T, \qquad (5)$$
where, for simplicity, $m$ denotes the maximum of the autoregressive and moving-average lag orders in this section. Using the notation from Poskitt (2006) with minor modifications, we define $\deg[A(z), M(z)]$ as the maximum row degree $\max_{1 \leq k \leq K} \deg_k[A(z), M(z)]$, where $\deg_k[A(z), M(z)]$ denotes the polynomial degree of the $k$-th row of $[A(z), M(z)]$. Then we can define a class of processes by $\{[A\,M]\}_m := \{[A(z), M(z)] \,|\, \deg[A(z), M(z)] = m\}$.

For the moment we just assume


Assumption 3.1 The $K$-dimensional series $(y_t)_{t=1-m}^{T}$ admits a VARMA representation as in (5) with $A_0 = M_0$, $A_0$ invertible, $[A(z), M(z)] \in \{[A\,M]\}_m$ and fixed initial values $y_{1-m}, \ldots, y_0$;

but impose additional restrictions on the general model as needed. The proofs are given in the appendix.

Identification

The identification of the parameters of the FMA form follows from the observation that any process that satisfies (5) can always be written as
$$y_t = \sum_{s=1}^{t+m-1} \Pi_s y_{t-s} + u_t + n_t, \quad t = 1-m, \ldots, T, \qquad (6)$$
where it holds, by construction of the sequences $\Pi_i$ and $n_t$, that
$$0 = \sum_{j=0}^{m} M_j \Pi_{i-j}, \quad i \geq m+1, \qquad (7)$$
$$0 = \sum_{j=0}^{m} M_j n_{t-j}, \quad t = 1, \ldots, T. \qquad (8)$$

On the other hand, given a process satisfying (6) and the existence of matrices $M_0, M_1, \ldots, M_m$ such that conditions (7) and (8) are true, the process has a VARMA representation as above. These statements are made precise in the following theorem, which is just a restatement of the corresponding theorem in Poskitt (2006).

Theorem 3.1 The process $(y_t)_{t=1-m}^{T}$ admits a VARMA representation as in (5) with $[A(z), M(z)] \in \{[A\,M]\}_m$ and initial conditions $y_0, \ldots, y_{1-m}$ if and only if $(y_t)_{t=1-m}^{T}$ admits an autoregressive representation
$$y_t = \sum_{s=1}^{t+m-1} \Pi_s y_{t-s} + u_t + n_t, \quad t = 1-m, \ldots, T,$$
in which the conditions (7) and (8) are satisfied.

Now, one assigns to the autoregressive representation a unique VARMA representation. Although not necessary for the derivations that follow immediately, we assume that $M(z)$ is invertible:

Assumption 3.2 $|M(z)| \neq 0$ for $|z| \leq 1$.


Because of the properties of the adjoint, $M^{ad}(z)M(z) = |M(z)| I_K$, equations (7) and (8) imply
$$0 = \sum_{j=0}^{q} m_j \Pi_{i-j}, \quad i \geq q+1, \qquad (9)$$
$$0 = \sum_{j=0}^{q} m_j n_{t-j}, \quad t = q-m+1, \ldots, T. \qquad (10)$$
Here, $|M(z)| \equiv m(z) = m_0 + m_1 z + \ldots + m_q z^q$ is a scalar polynomial and $q = m \cdot K$ is its maximal order. Therefore, one can define a pair in final moving-average form as in (3), $[A(z), m(z)I_K]$, provided the stated assumptions hold and $T \geq q - m + 1$. This representation, however, is not the only representation of this form. To achieve uniqueness, we select the representation of the form $[A(z), m(z)I_K]$ with the lowest possible degree of the scalar polynomial $m(z)$ such that the first coefficient is one and (9) and (10) are satisfied.

Theorem 3.2 Assume that the process $(y_t)_{t=1-m}^{T}$ satisfies Assumptions 3.1 and 3.2. Then for $T \geq q - m + 1$ there exists a unique, observationally equivalent representation in terms of a pair $[A_0(z), m_0(z)I_K]$ with orders $p_0$ and $q_0$, respectively, commencing from some $t_0 \geq 1-m$.

In contrast to the discussion in Dufour & Pelletier (2008), the special feature of the non-stationary case with fixed initial values is that the FMA representation does not need to be left-coprime; in particular, the autoregressive and moving-average polynomials can have the same roots. This is a consequence of condition (10) and is not very surprising given the results of Poskitt (2006) on the Echelon form representation in the same setting.

Now, if we assume normality and independence, i.e. $u_t \sim$ i.i.d. $N(0, \Sigma_u)$ with $\Sigma_u$ positive definite, then under our assumptions the parameters of the model are identified via the Gaussian partial likelihood function
$$f(y_{t_0}^{T} \,|\, y_{1-m}^{t_0-1}, \lambda),$$
where $y_{t_0}^{T} = (y_{t_0}', \ldots, y_T')'$, $y_{1-m}^{t_0-1} = (y_{1-m}', \ldots, y_{t_0-1}')'$ and $\lambda$ is the parameter vector of the final moving-average form. This follows from Poskitt (2006, Section 2.2) and the observation that Assumptions 2.1 and 2.2 of that paper are satisfied in the present case.

For the cointegrated case, we make the following assumption.

Assumption 3.3 $|A(z)| = a^{st}(z)(1-z)^s$ for $0 < s \leq K$, where $a^{st}(z) \neq 0$ for $|z| \leq 1$. The number $r = K - s$ is called the cointegrating rank of the series.


Then the corresponding error-correction representation
$$\Delta y_t = \Pi y_{t-1} + \sum_{j=1}^{p_0-1} \Gamma_j \Delta y_{t-j} + m(L) u_t \qquad (11)$$
with the same initial conditions as above is identified, as there exists a one-to-one mapping between this representation and the representation in levels (cf. Poskitt 2006, Section 4.1).

Specification

Since a legitimate critique of VARMA modeling is the increased specification uncertainty, we think that a serious forecast comparison has to involve modeling uncertainty. Therefore, we choose to select the orders in a data-dependent way in our forecast study, as described in the following.

First, the sample mean is subtracted from the observations, as justified above. In order to determine the lag orders p and q of the VARMA model (1) with the FMA structure in (3), we apply a two-step approach similar to Dufour & Pelletier (2008) and use the information criterion they have suggested. Our procedure works as follows.

1. Fit a long VAR regression with $h_T$ lags to the mean-adjusted series,
$$y_t = \sum_{i=1}^{h_T} \Pi_i^{h_T} y_{t-i} + u_t^{h_T}. \qquad (12)$$
Denote the estimated residuals from (12) by $\hat{u}_t^{h_T}$.

2. Regress $y_t$ on $\phi_{t-1}^{h_T}(p, q) = [y_{t-1}', \ldots, y_{t-p}', \hat{u}_{t-1}^{h_T\prime}, \ldots, \hat{u}_{t-q}^{h_T\prime}]'$, $t = s_T + 1, \ldots, T$, by OLS, imposing the FMA restriction in (3), for all combinations of $p = k + 1 \leq p_T$ and $q \leq q_T$, with $s_T = \max(p_T, q_T) + h_T$. Denote the estimate of the corresponding error covariance matrix by $\hat{\Sigma}_T(p, q) = (1/N) \sum_{t=s_T+1}^{T} z_t(p, q) z_t'(p, q)$, where the $z_t(p, q)$ are the OLS residuals. Compute the information criterion
$$DP(p, q) = \ln |\hat{\Sigma}_T(p, q)| + \dim(\gamma_{(p,q)}) \frac{(\ln N)^{1+\nu}}{N}, \quad \nu > 0, \qquad (13)$$
where $N = T - s_T$ and $\dim(\gamma_{(p,q)})$ is the dimension of the vector of free parameters of the corresponding VARMA(p, q) model in levels.

3. Choose the AR and MA orders by $(\hat{p}, \hat{q})_{IC} = \operatorname{argmin}_{(p,q)} DP(p, q)$, where the minimization is over $\{1, 2, \ldots, p_T\} \times \{0, 1, \ldots, q_T\}$.
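For illustration, here is a numpy sketch (our own, not the authors' code) of how $DP(p, q)$ in (13) can be evaluated for a single pair $(p, q)$, with the FMA restriction imposed through a stacked OLS regression. The long-VAR residuals are assumed to be computed beforehand as in step 1, $h_T$ is chosen by the rule discussed below, and the grid search of step 3 then simply minimizes this value over p and q; the value $\nu = 0.2$ follows the choice reported in the text.

```python
import numpy as np

def dp_criterion(y, u_hat, p, q, s_T, nu=0.2):
    """Value of DP(p, q) in (13) for one (p, q) pair under the FMA form.

    The restriction M_j = m_j * I_K is imposed by giving each lagged residual
    vector a single scalar coefficient in a stacked OLS regression.
    y, u_hat : (T, K) arrays of mean-adjusted data and long-VAR residuals.
    """
    T, K = y.shape
    N = T - s_T
    rows, rhs = [], []
    for t in range(s_T, T):                                  # observations s_T+1, ..., T
        ar_part = np.concatenate([y[t - i] for i in range(1, p + 1)])        # length p*K
        U_t = np.column_stack([u_hat[t - j] for j in range(1, q + 1)]) if q else np.zeros((K, 0))
        X_t = np.hstack([np.kron(ar_part[None, :], np.eye(K)), U_t])         # K x (p*K^2 + q)
        rows.append(X_t)
        rhs.append(y[t])
    X = np.vstack(rows)
    Y = np.concatenate(rhs)
    gamma, *_ = np.linalg.lstsq(X, Y, rcond=None)            # stacked OLS estimate
    resid = (Y - X @ gamma).reshape(N, K)
    Sigma = resid.T @ resid / N
    dim_gamma = p * K * K + q                                # free parameters under the FMA form
    return np.linalg.slogdet(Sigma)[1] + dim_gamma * np.log(N) ** (1 + nu) / N
```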

In order to show consistency we make the following assumption which isequivalent to Assumption A.2 in Poskitt (2003).


Assumption 3.4 The true error term vectors $u_t = (u_{t,1}, u_{t,2}, \ldots, u_{t,K})'$, $t = 1-p, \ldots, 0, 1, \ldots, T$, form an independent, identically distributed zero-mean white noise sequence with positive definite variance-covariance matrix $\Sigma_u$. Furthermore, the moment condition $E(\|u_t\|^{\delta_1}) < \infty$ for some $\delta_1 > 2$, where $\|\cdot\|$ denotes the Euclidean norm, and the growth rate $\|u_t\| = O((\log t)^{1-\delta_2})$ almost surely (a.s.) for some $0 < \delta_2 < 1$ also hold.

Using Assumption 3.4 we obtain the following theorem on the consistencyof the order estimators.

Theorem 3.3 If Assumptions 3.1-3.4 hold, if $h_T = [c(\ln T)^a]$ is the integer part of $c(\ln T)^a$ for some $c > 0$, $a > 1$, and if $\max(p_T, q_T) < h_T$, then the orders chosen according to (13) converge a.s. to their true values.

Theorem 3.3 is the counterpart to Dufour & Pelletier (2008, Theorem 5.1), dealing with the stationary VARMA setup, and, to some extent, to Poskitt (2003, Proposition 3.2), referring to cointegrated VARMA models identified via the Echelon form. Note that we can use the same penalty term $C_T = (\ln N)^{1+\nu}$, $\nu > 0$, as in the stationary VARMA case. However, the assumptions on the error terms have to be strengthened. In particular, an i.i.d. assumption is needed in contrast to the strong mixing assumption employed by Dufour & Pelletier (2008). Existing formal results of Poskitt & Lutkepohl (1995) and Huang & Guo (1990) show that weakening Assumption 3.4, e.g. making an appropriate martingale difference sequence assumption on $u_t$, leads to convergence orders that are too low for the estimators obtained in steps 1 and 2. As a consequence, the penalty term needs to be stronger; e.g. one may set it to $C_T' = h_T C_T$. Nevertheless, Poskitt (2003) argues that it is likely that the needed convergence orders can be obtained under weaker conditions than those stated in Assumption 3.4.

The practitioner has to choose values for $\nu$, $h_T$, $p_T$, and $q_T$ satisfying the conditions contained in Theorem 3.3. We set $\nu = 0.2$ following Dufour & Pelletier (2008). As pointed out by Poskitt (2003) and Lutkepohl (2005, Chapter 14), no clear guideline exists on how to select $h_T$ in the non-stationary case. We adopt the rule $h_T = \max\left(\max(p_T, q_T) + 1, (\ln T)^{1.25}\right)$ from Poskitt (2003) with $p_T = q_T = 4$. Choosing larger values for $p_T$ and $q_T$ left the results virtually unchanged. Alternatively, one may use Akaike's information criterion (AIC) to determine $h_T$, resulting in the estimator $h_T^{AIC}$, say. While Poskitt (2003) conjectures that $h_T^{AIC}$ satisfies the condition on $h_T$ given in Theorem 3.3, the latter has only been proven for the BIC by Bauer & Wagner (2005, Corollary 1).

Estimation

Given the estimated orders and residuals of the long autoregression (12) weobtain Poskitt’s (2003) initial estimator as follows.


The cointegrated VARMA model (2) can be conveniently written as
$$\Delta y_t = \Pi y_{t-1} + [\Gamma \; M] Z_{t-1} + u_t, \qquad (14)$$
where $\Gamma = [\Gamma_1, \ldots, \Gamma_k]$, $M = [M_1, \ldots, M_q]$ and $Z_{t-1} = [\Delta y_{t-1}', \ldots, \Delta y_{t-k}', u_{t-1}', \ldots, u_{t-q}']'$. Let $Z_t^{h_T}$ be the matrix obtained from $Z_t$ by replacing the $u_t$ by $\hat{u}_t^{h_T}$. Let $\gamma_1$ be the vector of free parameters in $\mathrm{vec}[\Gamma \; M]$ and let $\gamma_2 = (\mathrm{vec}(\Pi)', \gamma_1')'$ be the augmented vector. Identification restrictions are imposed by defining suitable matrices $R$, $R_2$ such that $\mathrm{vec}([\Gamma \; M]) = R\gamma_1$ and $\mathrm{vec}([\Pi \; \Gamma \; M]) = R_2\gamma_2$, respectively. Equipped with these definitions, one can write
$$\Delta y_t = (y_{t-1}' \otimes I_K, \; Z_{t-1}^{h_T\prime}) \, \mathrm{vec}([\Pi \; \Gamma \; M]) + u_t = (y_{t-1}' \otimes I_K, \; Z_{t-1}^{h_T\prime}) R_2 \gamma_2 + u_t = X_t \gamma_2 + u_t. \qquad (15)$$
Poskitt's (2003) initial estimator is the feasible GLS estimator
$$\tilde{\gamma}_2 = \left( \sum_{t=h_T+1}^{T} X_t' (\tilde{\Sigma}_{\Delta,T})^{-1} X_t \right)^{-1} \sum_{t=h_T+1}^{T} X_t' (\tilde{\Sigma}_{\Delta,T})^{-1} \Delta y_t, \qquad (16)$$
which is strongly consistent (Poskitt 2003, Propositions 4.1 and 4.2) given Assumptions 3.1-3.4, where $\tilde{\Sigma}_{\Delta,T}$ is an estimate of $\Sigma_u$ obtained from OLS estimation of (15).² The estimated matrices are denoted by $\tilde{\Pi}$, $\tilde{\Gamma}$, $\tilde{M}$. To exploit the reduced rank structure in $\Pi = \alpha\beta'$, $\beta$ is normalized such that $\beta = [I_r, \beta^{*\prime}]'$. Then $\alpha$ is estimated by the first $r$ columns of $\tilde{\Pi}$, that is,
$$\tilde{\alpha} = \tilde{\Pi}[\cdot, 1:r], \qquad (17)$$
$$\tilde{\beta}^{*} = \left( \tilde{\alpha}' \big( \tilde{M}(1) \tilde{\Sigma}_T \tilde{M}(1)' \big)^{-1} \tilde{\alpha} \right)^{-1} \left( \tilde{\alpha}' \big( \tilde{M}(1) \tilde{\Sigma}_T \tilde{M}(1)' \big)^{-1} \tilde{\Pi}[\cdot, r+1:K] \right). \qquad (18)$$
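Equations (17) and (18) translate directly into the following array-based sketch (our own illustration):

```python
import numpy as np

def reduced_rank_normalization(Pi, M1, Sigma, r):
    """Normalization step in (17)-(18): given an estimated impact matrix Pi,
    the long-run MA matrix M(1) and an error covariance estimate Sigma,
    recover alpha and beta* under the normalization beta = [I_r, beta*']'.
    """
    alpha = Pi[:, :r]                                   # first r columns, eq. (17)
    W = np.linalg.inv(M1 @ Sigma @ M1.T)                # GLS-type weighting matrix
    beta_star = np.linalg.solve(alpha.T @ W @ alpha,    # eq. (18)
                                alpha.T @ W @ Pi[:, r:])
    return alpha, beta_star
```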

These estimates are taken as starting values for one iteration of a conditional maximum likelihood estimation procedure as in Yap & Reinsel (1995). Define the vector of free parameters, given the cointegration restrictions, as $\delta := (\mathrm{vec}((\beta^*)')', \mathrm{vec}(\alpha)', \gamma_1')'$ and its value at the $j$-th iteration as $\delta^{(j)}$. The elements of the initial vector $\delta^{(0)} = \tilde{\delta}$ correspond to (16)-(18). Compute $u_t^{(j)}$ and $\Sigma_u^{(j)}$ according to
$$M^{(j)}(L) u_t^{(j)} = \Delta y_t - \alpha^{(j)} (\beta^{(j)})' y_{t-1} - \Gamma^{(j)}(L) \Delta y_{t-1}, \qquad (19)$$
$$\Sigma_u^{(j)} = \frac{1}{T} \sum_{t=1}^{T} u_t^{(j)} (u_t^{(j)})'. \qquad (20)$$

²Our formulation differs from his because we formulate the models in differences throughout. The procedures yield identical results.


For the calculation, it is assumed that $y_t = \Delta y_t = u_t = 0$ for $t \leq 0$. Only
$$W_t^{(j)} := -\frac{\partial u_t^{(j)}}{\partial \delta'}$$
is needed for computing one iteration of the proposed Newton-Raphson scheme. $W_t^{(j)}$ can itself be calculated iteratively as
$$(W_t^{(j)})' = \left[ (y_{t-1}'H \otimes \alpha), \; (y_{t-1}'\beta \otimes I_K), \; ((Z_{t-1}^{(j)})' \otimes I_K) R \right] - \sum_{i=1}^{q} M_i (W_{t-i}^{(j)})',$$
where $H' := [0_{((K-r) \times r)}, I_{K-r}]$. The estimate is then updated according to
$$\delta^{(j+1)} - \delta^{(j)} = \left( \sum_{t=1}^{T} W_t^{(j)} (\Sigma_u^{(j)})^{-1} (W_t^{(j)})' \right)^{-1} \sum_{t=1}^{T} W_t^{(j)} (\Sigma_u^{(j)})^{-1} u_t^{(j)},$$
which amounts to a GLS estimation step. The estimates of the residuals and their covariance can be updated according to (19) and (20). The one-step iteration estimator $\delta^{(1)}$ is consistent and asymptotically fully efficient according to Yap & Reinsel (1995, Theorem 2), given the strong consistency of the initial estimator $\tilde{\gamma}_2$ in (16).

Given estimates of the parameters and innovations, forecasts are obtained by using the implied VARMA form in levels. Finally, the sample mean, which was subtracted earlier, is added to the forecasts.
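A sketch (ours, with hypothetical argument names) of this iterative forecasting step in levels, setting future errors to zero and adding the sample mean back at the end:

```python
import numpy as np

def varma_forecast_levels(y, u, A, M, mean, h):
    """Iterative h-step forecasts from an estimated VARMA(p, q) in levels.

    y, u : (T, K) arrays of mean-adjusted data and estimated residuals.
    A, M : lists of estimated (K, K) AR and MA coefficient matrices.
    mean : (K,) sample mean that was subtracted before estimation.
    """
    y_path = [row for row in y]
    u_path = [row for row in u] + [np.zeros(y.shape[1])] * h   # future errors set to zero
    for _ in range(h):
        t = len(y_path)
        y_next = sum(A[j] @ y_path[t - 1 - j] for j in range(len(A)))
        y_next += sum(M[j] @ u_path[t - 1 - j] for j in range(len(M)))
        y_path.append(y_next)
    return np.asarray(y_path[-h:]) + mean                      # add the sample mean back
```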

3.2 Benchmark Models

There are two benchmark models in the forecasting exercise. The first is the multivariate random walk $y_t = y_{t-1} + u_t$, $u_t \sim$ i.i.d.$(0_K, \Sigma_u)$, where the notation means that $u_t$ is an independent white noise process. Point forecasts are obtained in a standard way.
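In code form, the random walk point forecast is trivial: the h-step forecast simply repeats the last observation (a sketch of ours).

```python
import numpy as np

def rw_forecast(y, h):
    """Random walk point forecasts: y_hat_{T+1}, ..., y_hat_{T+h} all equal y_T."""
    return np.tile(y[-1], (h, 1))
```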

The second benchmark model is the VECM
$$\Delta y_t = \mu_0 + \Pi y_{t-1} + \sum_{j=1}^{k} \Gamma_j \Delta y_{t-j} + u_t, \quad t = k+2, \ldots, T, \qquad (21)$$
where the assumptions on the initial values, the parameters and the $u_t$ are analogous to the assumptions made for the VARMA model (1). The lag length is chosen using the Bayesian information criterion, $\hat{p}_{BIC} = \operatorname{argmin}_p BIC(p)$, where the minimization is over $p = 1, \ldots, p_T$ and $\hat{k}_{BIC} = \hat{p}_{BIC} - 1$. Following the results in Bauer & Wagner (2005), we take $p_T = [(T/\log T)^{1/2}]$. Paulsen (1984) shows that the standard order selection criteria are consistent for multivariate autoregressive processes with unit roots. The BIC is
$$BIC(p) = \ln |\hat{\Sigma}(p)| + \frac{\ln N}{N} (pK + 1)K, \qquad (22)$$
where $N = T - p_T$, $\hat{\Sigma}(p) = \sum_{t=p_T+1}^{T} \hat{u}_t \hat{u}_t' / N$ is an estimate of the error term covariance matrix $\Sigma$, and the $\hat{u}_t$ are obtained by estimating an unrestricted


VAR model of order p using $y_{p_T+1-p}, \ldots, y_T$ by OLS. After that, the parameters of (21) are estimated by reduced rank maximum likelihood estimation (Johansen 1988, 1991, 1996). Forecasts are obtained iteratively by using the implied estimated VAR form.
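A sketch (ours) of the BIC-based lag selection in (22), using a common estimation sample for all candidate orders as described above:

```python
import numpy as np

def bic_lag_selection(y, p_T):
    """Select the VAR order by BIC(p) over p = 1, ..., p_T; the VECM lag is p - 1.

    y : (T, K) data matrix; all candidate orders are fit by OLS on the common
        sample t = p_T + 1, ..., T with an intercept, as in the text.
    """
    T, K = y.shape
    N = T - p_T
    best_p, best_bic = None, np.inf
    for p in range(1, p_T + 1):
        X = np.hstack([np.ones((N, 1))] + [y[p_T - i:T - i] for i in range(1, p + 1)])
        B, *_ = np.linalg.lstsq(X, y[p_T:], rcond=None)
        resid = y[p_T:] - X @ B
        Sigma = resid.T @ resid / N
        bic = np.linalg.slogdet(Sigma)[1] + np.log(N) / N * (p * K + 1) * K
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p, best_p - 1   # (p_BIC, k_BIC)
```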

4 Conclusion

In this paper, we tie together some recent advances in the literature on VARMA models, creating a relatively simple specification and estimation strategy for the cointegrated case. In order to show its potential usefulness, we applied the procedure in a forecasting exercise for US interest rates and found promising results.

There are a couple of issues which could be followed up. For example, the intercept term in the cointegrating relation is treated by subtracting the sample mean from the series, and it would be desirable to have a more efficient method in this case. Also, it would be good to augment the model with time-varying conditional variances. Finally, the development of model diagnostic tests appropriate for the cointegrated VARMA case would be of interest.

References

Athanasopoulos, G. & Vahid, F. (2008), 'VARMA versus VAR for macroeconomic forecasting', Journal of Business & Economic Statistics 26, 237–252.

Bauer, D. & Wagner, M. (2005), Autoregressive approximations of multiple frequency I(1) processes, Economics Series 174, Institute for Advanced Studies.

Campbell, J. Y. & Shiller, R. J. (1987), 'Cointegration and tests of present value models', Journal of Political Economy 95(5), 1062–88.

Carstensen, K. (2003), 'Nonstationary term premia and cointegration of the term structure', Economics Letters 80(3), 409–413.

Cavaliere, G., Rahbek, A. & Taylor, A. (2010), 'Co-integration rank testing under conditional heteroskedasticity', Econometric Theory 26(6), 1719–1760.

Clarida, R. H., Sarno, L., Taylor, M. P. & Valente, G. (2006), 'The role of asymmetries and regime shifts in the term structure of interest rates', Journal of Business 79(3), 1193–1224.

Cooley, T. F. & Dwyer, M. (1998), 'Business cycle analysis without much theory. A look at structural VARs', Journal of Econometrics 83, 57–88.


Cuthbertson, K. (1996), 'The expectations hypothesis of the term structure: The UK interbank market', The Economic Journal 106(436), 578–592.

de Pooter, M., Ravazzolo, F. & van Dijk, D. (2010), 'Term structure forecasting using macro factors and forecast combination', Norges Bank Working Paper 2010/01.

Dufour, J. M. & Pelletier, D. (2008), 'Practical methods for modelling weak VARMA processes: Identification, estimation and specification with a macroeconomic application', Discussion Paper, McGill University, CIREQ and CIRANO.

Engsted, T. & Tanggaard, C. (1994), 'Cointegration and the US term structure', Journal of Banking & Finance 18(1), 167–181.

Fernandez-Villaverde, J., Rubio-Ramírez, J. F., Sargent, T. J. & Watson, M. W. (2007), 'ABCs (and Ds) of understanding VARs', American Economic Review 97(3), 1021–1026.

Feunou, B. (2009), 'A no-arbitrage VARMA term structure model with macroeconomic variables', Working Paper, Duke University, June 2009.

Guo, L., Chen, H.-F. & Zhang, J.-F. (1989), 'Consistent order estimation for linear stochastic feedback control systems (CARMA model)', Automatica 25(1), 147–151.

Hall, A. D., Anderson, H. M. & Granger, C. W. J. (1992), 'A cointegration analysis of treasury bill yields', The Review of Economics and Statistics 74(1), 116–26.

Hassler, U. & Wolters, J. (2001), Forecasting money market rates in the unified Germany, in R. Friedmann, L. Knuppel & H. Lutkepohl, eds, 'Econometric Studies: A Festschrift in Honour of Joachim Frohn', LIT, pp. 185–201.

Huang, D. & Guo, L. (1990), 'Estimation of nonstationary ARMAX models based on the Hannan-Rissanen method', The Annals of Statistics 18(4), 1729–1756.

Johansen, S. (1988), 'Statistical analysis of cointegration vectors', Journal of Economic Dynamics and Control 12(2-3), 231–254.

Johansen, S. (1991), 'Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models', Econometrica 59(6), 1551–1580.

Johansen, S. (1996), Likelihood-based Inference in Cointegrated Vector Autoregressive Models, Oxford University Press, Oxford.


Kascha, C. & Mertens, K. (2009), 'Business cycle analysis and VARMA models', Journal of Economic Dynamics and Control 33(2), 267–282.

Lai, T. L. & Wei, C. Z. (1982), 'Asymptotic properties of projections with applications to stochastic regression problems', Journal of Multivariate Analysis 12, 346–370.

Lutkepohl, H. (1984a), 'Linear aggregation of vector autoregressive moving average processes', Economics Letters 14(4), 345–350.

Lutkepohl, H. (1984b), 'Linear transformations of vector ARMA processes', Journal of Econometrics 26(3), 283–293.

Lutkepohl, H. (1996), Handbook of Matrices, Chichester: Wiley.

Lutkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer-Verlag.

Lutkepohl, H. & Claessen, H. (1997), 'Analysis of cointegrated VARMA processes', Journal of Econometrics 80(2), 223–239.

Monfort, A. & Pegoraro, F. (2007), 'Switching VARMA term structure models', Journal of Financial Econometrics 5(1), 105–153.

Nielsen, B. (2006), Order determination in general vector autoregressions, in H.-C. Ho, C.-K. Ing & T. L. Lai, eds, 'Time Series and Related Topics: In Memory of Ching-Zong Wei', Vol. 52 of IMS Lecture Notes and Monograph Series, Institute of Mathematical Statistics, pp. 93–112.

Paulsen, J. (1984), 'Order determination of multivariate autoregressive time series with unit roots', Journal of Time Series Analysis 5(2), 115–127.

Poskitt, D. (2006), 'On the identification and estimation of nonstationary and cointegrated ARMAX systems', Econometric Theory 22(6), 1138–1175.

Poskitt, D. S. (2003), 'On the specification of cointegrated autoregressive moving-average forecasting systems', International Journal of Forecasting 19(3), 503–519.

Poskitt, D. S. (2009), Vector autoregressive moving-average identification for macroeconomic modeling: Algorithms and theory, Working Paper 12/2009, Monash University.

Poskitt, D. S. & Lutkepohl, H. (1995), Consistent specification of cointegrated autoregressive moving average systems, Discussion Paper 1996-74, Humboldt Universitat zu Berlin, SFB 373.


Shea, G. S. (1992), 'Benchmarking the expectations hypothesis of the interest-rate term structure: An analysis of cointegration vectors', Journal of Business & Economic Statistics 10(3), 347–66.

Tiao, G. C. & Tsay, R. S. (1989), 'Model specification in multivariate time series', Journal of the Royal Statistical Society, B 51(2), 157–213.

Yap, S. F. & Reinsel, G. C. (1995), 'Estimation and testing for unit roots in a partially nonstationary vector autoregressive moving average model', Journal of the American Statistical Association 90(429), 253–267.


Appendix

Proof of Theorem 3.1:

$\Rightarrow$: Suppose $(y_t)_{t=1-m}^{T}$ satisfies (5) given initial conditions. One can view the sequence $(u_t)_{t=1-m}^{T}$ as a solution to (5), viewed as a system of equations for the errors with given initial conditions $u_0, \ldots, u_{1-m}$. Then we know that $(u_t)_{t=1-m}^{T}$ is the sum of a particular solution and the appropriately chosen solution of the corresponding homogeneous system of equations, $u_t = u_t^P + (-n_t)$, say.

Define the sequence $(\Pi_i)_{i \in \mathbb{N}_0}$ by the recursive relations $A_0 = -M_0 \Pi_0$ and
$$A_i = \sum_{j=0}^{i} M_j \Pi_{i-j}, \quad \text{for } i = 1, \ldots, m, \qquad (23)$$
$$0 = \sum_{j=0}^{m} M_j \Pi_{i-j}, \quad \text{for } i \geq m+1. \qquad (24)$$
Define now $(u_t^P)_{t=1-m}^{T}$ by $u_t^P := y_t - \sum_{s=1}^{t+m-1} \Pi_s y_{t-s}$, where $\sum_{s=1}^{0} \Pi_s y_{t-s} := 0$. Then, $(u_t^P)_{t=1-m}^{T}$ is indeed a particular solution, as for $t \geq 1$
$$\sum_{j=0}^{m} M_j u_{t-j}^P = \sum_{j=0}^{m} M_j \left( y_{t-j} - \sum_{s=1}^{t-j+m-1} \Pi_s y_{t-s-j} \right) = A_0 y_t - \sum_{j=1}^{m} A_j y_{t-j}.$$
Further, define $(n_t)_{t=1-m}^{T}$ by $n_t = u_t^P - u_t$ for $t = 1-m, \ldots, 0$ and $0 = \sum_{i=0}^{m} M_i n_{t-i}$ for $t = 1, \ldots, T$. By construction of $n_t$, $y_t = \sum_{s=1}^{t+m-1} \Pi_s y_{t-s} + u_t + n_t$ for $t = 1-m, \ldots, 0$. Then, we also have $u_t = u_t^P - n_t$ for $t \geq 1$, as $(-n_t)_{t=1-m}^{T}$ represents a solution to the homogeneous system.

$\Leftarrow$: Conversely, suppose $(y_t)_{t=1-m}^{T}$ admits an autoregressive representation as in (6) and there exist $(K \times K)$ matrices $M_j$, $j = 0, \ldots, m$, such that $0 = \sum_{j=0}^{m} M_j \Pi_{i-j}$ for $i \geq m+1$ and $0 = \sum_{j=0}^{m} M_j n_{t-j}$ for $t = 1, \ldots, T$. Then, for $t = 1, \ldots, T$, it holds that
$$\sum_{j=0}^{m} M_j y_{t-j} = \sum_{j=0}^{m} M_j \sum_{s=1}^{t-j+m-1} \Pi_s y_{t-j-s} + \sum_{j=0}^{m} M_j u_{t-j} + \sum_{j=0}^{m} M_j n_{t-j},$$
which leads to
$$-\sum_{v=0}^{t+m-1} \sum_{j=0}^{\min(v,m)} M_j \Pi_{v-j} y_{t-v} = -\sum_{v=0}^{m} \sum_{j=0}^{v} M_j \Pi_{v-j} y_{t-v} = A_0 y_t - \sum_{v=1}^{m} A_v y_{t-v} = M(L) u_t,$$
where the last line defines the $A_v$'s.

Proof of Theorem 3.2:

From Theorem 3.1, $(y_t)_{t=1-m}^{T}$ has an autoregressive representation. One can express conditions (9) and (10) for a suitable pair $[A(z), m(z)I_K]$ by defining the polynomials $\Pi(z) = -(\Pi_0 + \Pi_1 z + \ldots)$ and $n(z) = n_{1-m} + n_{2-m} z + \ldots$. One also defines the (stochastic) polynomial $o(z) = o_0 + o_1 z + \ldots + o_{t_0} z^{t_0}$, which captures that $\sum_{j=0}^{\max(q, t-m+1)} m_j n_{t-j} = 0$ for $t \leq \bar{t}$ for some $\bar{t}$:
$$A(z) = m(z)\Pi(z), \qquad (25)$$
$$o(z) = m(z)n(z). \qquad (26)$$
Then, for given polynomials $(\Pi(z), n(z))$, one defines the set of all scalar polynomials $m(z)$ with the first coefficient normalized to one for which there exist finite polynomials $A(z)$ and $o(z)$ such that (25) and (26) are satisfied. Denote this set by $S$. Since the (normalized) determinant of $M(z)$ satisfies the above conditions, $S$ is not empty. Denote one solution to
$$\min_{m(L) \in S} \deg(m(L))$$
by $m_0(z)$ with degree $q_0$, where $\deg: S \to \mathbb{N}$ is the function that assigns the degree to every polynomial in $S$. Denote the associated polynomials by $A_0(z)$, $o_0(z)$ with degrees $p_0$, $t_0$, respectively.

Suppose there is another solution of the same degree, $m_1(z) = 1 + m_{1,1} z + \ldots + m_{1,q_0} z^{q_0}$, with
$$A_1(z) = m_1(z)\Pi(z), \qquad o_1(z) = m_1(z)n(z).$$
Since both polynomials are of degree $q_0$, $a = m_{0,q_0}/m_{1,q_0}$ exists and one gets
$$A_0(z) - aA_1(z) = (m_0(z) - am_1(z))\Pi(z), \qquad o_0(z) - ao_1(z) = (m_0(z) - am_1(z))n(z).$$
Then, normalization of the first non-zero coefficient of $(m_0(z) - am_1(z))$ would give a polynomial in $S$ with degree smaller than $q_0$, a contradiction. Thus $m_0(z)$ is unique.

Note that condition (25) alone would imply left-coprimeness of $[A_0(z), m_0(z)I_K]$, but if $n(z) \neq 0$ the minimal orders $p_0$, $q_0$ might well be above those of the left-coprime solution to (25).


Proof of Theorem 3.3:

Similar to Guo, Chen & Zhang (1989), we prove $(\hat{p}_T, \hat{q}_T) \to (p_0, q_0)$ a.s. by showing that the only limit point of $(\hat{p}_T, \hat{q}_T)$ is indeed $(p_0, q_0)$ with probability one, where $p_0$ and $q_0$ are the true lag orders. Thus, the convergence of $\hat{p}_T$ and $\hat{q}_T$ follows, which is equivalent to joint convergence. In order to show this, we demonstrate that the events "$(\hat{p}_T, \hat{q}_T)$ has a limit point $(p, q)$ with $p + q > p_0 + q_0$" (assuming $p \geq p_0$, $q \geq q_0$) and "$(\hat{p}_T, \hat{q}_T)$ has a limit point $(p, q)$ with $p < p_0$ or $q < q_0$" both have probability zero.

Following Huang & Guo (1990), we rely on the spectral norm in order to analyze the convergence behaviour of various sample moments; that is, for an $(m \times n)$ matrix $A$, $\|A\| := \sqrt{\lambda_{max}(AA')}$, where $\lambda_{max}(\cdot)$ denotes the maximal eigenvalue. Lutkepohl (1996, Ch. 8) provides a summary of the properties of this norm. The stochastic order symbols $o$ and $O$ are understood in the context of almost sure convergence.

Case 1: $p \geq p_0$, $q \geq q_0$, $p + q > p_0 + q_0$

For simplicity, write $T$ instead of $N$ in our lag selection criterion (13). Then
$$DP(p, q) - DP(p_0, q_0) = \ln\frac{\det \hat{\Sigma}_T(p, q)}{\det \hat{\Sigma}_T(p_0, q_0)} + c\,\frac{(\ln T)^{1+\nu}}{T},$$
where $c > 0$ is a constant. We have to show that $DP(p, q) - DP(p_0, q_0)$ has a positive limit for any pair $(p, q)$ with $p_0 \leq p \leq p_T$, $q_0 \leq q \leq q_T$, and $p + q > p_0 + q_0$. Similar to Nielsen (2006, Proof of Theorem 2.5), it is sufficient to show that $T(\hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q)) = O\{g(T)\}$ for some $g(T)$ such that $(\ln T)^{1+\nu}/g(T) \to \infty$ in this case.

Let us introduce the following notation:
$$\begin{aligned}
\phi_t^0(p, q) &= [y_t', \ldots, y_{t-p+1}', u_t', \ldots, u_{t-q+1}']', \\
\phi_t^{h_T}(p, q) &= [y_t', \ldots, y_{t-p+1}', (\hat{u}_t^{h_T})', \ldots, (\hat{u}_{t-q+1}^{h_T})']', \\
Y_T &= [y_1', \ldots, y_T']', \qquad U_T = [u_1', \ldots, u_T']', \\
x_t^0(p, q) &= [(\phi_{t-1}^0(p, q)' \otimes I_K)R]', \qquad x_t^{h_T}(p, q) = [(\phi_{t-1}^{h_T}(p, q)' \otimes I_K)R]', \\
X_T^0(p, q) &= [x_1^0(p, q), \ldots, x_T^0(p, q)]', \qquad X_T(p, q) = [x_1^{h_T}(p, q), \ldots, x_T^{h_T}(p, q)]', \\
\gamma(p, q) &= [\mathrm{vec}(A_1, A_2, \ldots, A_p)', m_1, m_2, \ldots, m_q]',
\end{aligned}$$
where $\gamma(p, q)$ is the $((pK^2 + q) \times 1)$ vector of true parameters such that $A_i = 0$ and $m_j = 0$ for $i > p_0$, $j > q_0$, respectively.


Then, one can write
$$y_t = \sum_{i=1}^{p} A_i y_{t-i} + u_t + \sum_{i=1}^{q} M_i u_{t-i} = [A_1, \ldots, A_p, M_1, \ldots, M_q]\phi_{t-1}^0(p, q) + u_t = (\phi_{t-1}^0(p, q)' \otimes I_K)R\gamma(p, q) + u_t = x_t^0(p, q)'\gamma(p, q) + u_t$$
in order to summarize the model in matrix notation as
$$Y_T = X_T^0(p, q)\gamma(p, q) + U_T = X_T(p, q)\gamma(p, q) + [X_T^0(p, q) - X_T(p, q)]\gamma(p, q) + U_T = X_T(p, q)\gamma(p, q) + R_T + U_T, \qquad (27)$$
where $R_T := [X_T^0(p, q) - X_T(p, q)]\gamma(p, q)$.

$R_T$ does not depend on $p$, $q$ for $p \geq p_0$, $q \geq q_0$ and can be decomposed as $R_T = [r_0', r_1', \ldots, r_{T-1}']'$, where $r_t$, $t = 0, 1, \ldots, T-1$, is a $K \times 1$ vector. Let $Z_T(p, q) = [z_1(p, q)', \ldots, z_T(p, q)']'$ be the OLS residuals obtained from regressing $Y_T$ on $X_T(p, q)$, i.e.
$$\begin{aligned}
Z_T(p, q) &= Y_T - X_T(p, q)\big[ X_T(p, q)'X_T(p, q) \big]^{-1} X_T(p, q)'Y_T \\
&= X_T(p, q)\gamma(p, q) + R_T + U_T - X_T(p, q)\big[ X_T(p, q)'X_T(p, q) \big]^{-1} X_T(p, q)'\big( X_T(p, q)\gamma(p, q) + R_T + U_T \big) \\
&= [R_T + U_T] - X_T(p, q)\big[ X_T(p, q)'X_T(p, q) \big]^{-1} X_T(p, q)'[R_T + U_T].
\end{aligned}$$
The estimator of the error covariance matrix $\Sigma$ in dependence on $p$ and $q$ is given by $\hat{\Sigma}_T(p, q) = T^{-1}\sum_{t=1}^{T} z_t(p, q)z_t(p, q)'$. Furthermore, note that $\hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q)$ is positive semidefinite since $p \geq p_0$ and $q \geq q_0$ in the current setup. Hence, we have
$$\begin{aligned}
\| \hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q) \| &= \lambda_{max}\big( \hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q) \big) \leq \mathrm{tr}\big( \hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q) \big) = \mathrm{tr}\big( \hat{\Sigma}_T(p_0, q_0) \big) - \mathrm{tr}\big( \hat{\Sigma}_T(p, q) \big) \\
&= T^{-1} Z_T(p_0, q_0)' Z_T(p_0, q_0) - T^{-1} Z_T(p, q)' Z_T(p, q) \qquad (28) \\
&= T^{-1} [R_T + U_T]' X_T(p, q) \big[ X_T(p, q)' X_T(p, q) \big]^{-1} X_T(p, q)' [R_T + U_T] \\
&\quad - T^{-1} [R_T + U_T]' X_T(p_0, q_0) \big[ X_T(p_0, q_0)' X_T(p_0, q_0) \big]^{-1} X_T(p_0, q_0)' [R_T + U_T].
\end{aligned}$$
We have, for the terms on the right-hand side (r.h.s.) of the last equality in (28),

$$\begin{aligned}
&[R_T + U_T]' X_T(p, q) \big[ X_T(p, q)' X_T(p, q) \big]^{-1} X_T(p, q)' [R_T + U_T] \\
&= O\Big( \big\| [X_T(p, q)' X_T(p, q)]^{-1/2} X_T(p, q)' [R_T + U_T] \big\|^2 \Big) \\
&= O\Big( \big\| [X_T(p, q)' X_T(p, q)]^{-1/2} X_T(p, q)' R_T \big\|^2 \Big) + O\Big( \big\| [X_T(p, q)' X_T(p, q)]^{-1/2} X_T(p, q)' U_T \big\|^2 \Big), \qquad (29)
\end{aligned}$$
where the result holds for all $p \geq p_0$ and $q \geq q_0$. As in Poskitt & Lutkepohl (1995, Proof of Theorem 3.2), we obtain from Lai & Wei (1982, Theorem 3), for any $m = \max(p, q)$,
$$\begin{aligned}
\big\| [X_T(p, q)' X_T(p, q)]^{-1/2} X_T(p, q)' U_T \big\|^2 &= O\left( \max\left\{ 1, \ln^{+}\left( \sum_{n=1}^{m} \sum_{t} \|y_{t-n}\|^2 + \|\hat{u}_{t-n}^{h_T}\|^2 \right) \right\} \right) \qquad (30) \\
&= O(\ln m) + O\left( \ln\left( O\left\{ \sum_{t} \|y_t\|^2 + \|\hat{u}_t^{h_T}\|^2 \right\} \right) \right) \quad \text{a.s.},
\end{aligned}$$
where $\ln^{+}(x)$ denotes the positive part of $\ln(x)$. Moreover, we have that $\sum_t \|y_t\|^2 = O(T^{g})$ due to Assumption 3.3, where the growth rate is independent of $m$; see Poskitt & Lutkepohl (1995, Proof of Theorem 3.2, Proof of Lemma 3.1). Therefore, the second term on the r.h.s. of (30) is $O(\ln T)$ for all $m$. Hence, the left-hand side (l.h.s.) of (30) is $O(\ln T)$ a.s. since $m \leq s_T \leq h_T = [c(\ln T)^a]$, $c > 0$, $a > 1$.

Similar to Poskitt & Lutkepohl (1995, Proof of Theorem 3.2), we obtain from a standard result on least squares
$$\begin{aligned}
\big\| [X_T(p, q)' X_T(p, q)]^{-1/2} X_T(p, q)' R_T \big\|^2 &\leq \sum_{t=0}^{T-1} \sum_{i=1}^{K} r_{i,t}^2 \leq \|\gamma(p, q)\|^2 \cdot \sum_{n=1}^{q} \sum_{t} \|u_{t-n} - \hat{u}_{t-n}^{h_T}\|^2 \qquad (31) \\
&= O(\ln T) \quad \text{a.s.},
\end{aligned}$$
where the last line follows from Poskitt (2003, Proposition 3.1) due to Assumption 3.3, our choice of $h_T$, and since $\|\gamma(p, q)\| = \text{constant} < \infty$ independent of $(p, q)$. Hence, we have $T^{-1}[R_T + U_T]' X_T(p, q) [X_T(p, q)' X_T(p, q)]^{-1} X_T(p, q)' [R_T + U_T] = O(\ln T / T)$ a.s. uniformly in $(p, q)$.

Using (29)-(31), we have $\| \hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q) \| = O(\ln T / T)$ such that $T(\hat{\Sigma}_T(p_0, q_0) - \hat{\Sigma}_T(p, q)) = O\{\ln(T)\}$, the desired result, and therefore
$$DP(p, q) - DP(p_0, q_0) > 0 \quad \text{a.s.}$$
for sufficiently large $T$.


Case 2: $(p, q)$ with $p < p_0$ or $q < q_0$

For $(p, q)$ with $p < p_0$ or $q < q_0$, write
$$DP(p, q) - DP(p_0, q_0) = \ln \big| I_K + \big( \hat{\Sigma}_T(p, q) - \hat{\Sigma}_T(p_0, q_0) \big) \hat{\Sigma}_T^{-1}(p_0, q_0) \big| + o(1).$$
As in Nielsen (2006), it suffices to show that $\liminf \lambda_{max}(\hat{\Sigma}_T(p, q) - \hat{\Sigma}_T(p_0, q_0)) > 0$. To do so, let us introduce some further notation:
$$\hat{\gamma}_T(p, q) = \big[ X_T'(p, q) X_T(p, q) \big]^{-1} X_T'(p, q) Y_T = [\mathrm{vec}(\hat{A}_1, \hat{A}_2, \ldots, \hat{A}_p)', \hat{m}_1, \hat{m}_2, \ldots, \hat{m}_q]', \qquad (32)$$
and, defining $s_p = \max(p, p_0)$ and $s_q = \max(q, q_0)$,
$$\hat{\gamma}_T^0(p, q) = [\mathrm{vec}(\hat{A}_1, \hat{A}_2, \ldots, \hat{A}_{s_p})', \hat{m}_1, \hat{m}_2, \ldots, \hat{m}_{s_q}]' \qquad (33)$$
with $\hat{A}_i = 0$ for $i > p$ and $\hat{m}_i = 0$ for $i > q$. Then, we get
$$\begin{aligned}
Z_T(p, q) &= Y_T - X_T(p, q)\hat{\gamma}_T(p, q) = Y_T - X_T(s_p, s_q)\hat{\gamma}_T^0(p, q) \\
&= Y_T - X_T(s_p, s_q)\gamma(s_p, s_q) + X_T(s_p, s_q)[\gamma(s_p, s_q) - \hat{\gamma}_T^0(p, q)] \\
&= U_T + \tilde{X}_T\gamma(s_p, s_q) + X_T(s_p, s_q)\tilde{\gamma}_T(p, q),
\end{aligned}$$
where $\gamma(p, q)$ is defined as above in Case 1,
$$\tilde{X}_T := X_T^0(s_p, s_q) - X_T(s_p, s_q), \qquad (34)$$
and $\tilde{\gamma}_T(p, q) = \gamma(s_p, s_q) - \hat{\gamma}_T^0(p, q)$.

Let $\tilde{x}_t'$ and $x_t'(s_p, s_q)$ be the typical $K \times (pK^2 + q)$ (sub)matrices of the $TK \times (pK^2 + q)$ matrices $\tilde{X}_T$ and $X_T(s_p, s_q)$, respectively, i.e. the partition of $\tilde{X}_T$ and $X_T(s_p, s_q)$ is analogous to that of $X_T(p, q)$ above. Then, for $p < p_0$ or $q < q_0$, the residual covariance matrix can be written as
$$\begin{aligned}
\hat{\Sigma}_T(p, q) &= \frac{1}{T} \sum_{t=1}^{T} z_t(p, q) z_t(p, q)' \\
&= \frac{1}{T} \sum_{t=1}^{T} x_t'(s_p, s_q)\tilde{\gamma}_T(p, q)\tilde{\gamma}_T'(p, q)x_t(s_p, s_q) \\
&\quad + \frac{1}{T} \sum_{t=1}^{T} \big( x_t'(s_p, s_q)\tilde{\gamma}_T(p, q) \big)\big( u_t + \tilde{x}_t'\gamma(s_p, s_q) \big)' + \frac{1}{T} \sum_{t=1}^{T} \big( u_t + \tilde{x}_t'\gamma(s_p, s_q) \big)\big( x_t'(s_p, s_q)\tilde{\gamma}_T(p, q) \big)' \\
&\quad + \frac{1}{T} \sum_{t=1}^{T} \big( u_t + \tilde{x}_t'\gamma(s_p, s_q) \big)\big( u_t + \tilde{x}_t'\gamma(s_p, s_q) \big)' \\
&= D_{1,T} + (D_{2,T} + D_{2,T}') + D_{3,T},
\end{aligned}$$


where $D_{1,T}$, $D_{2,T}$, and $D_{3,T}$ are equal to the square products and cross products in the above equation, respectively.

Similarly, the residual covariance matrix based on the true orders $p_0$ and $q_0$ can be expressed as
$$\begin{aligned}
\hat{\Sigma}_T(p_0, q_0) &= \frac{1}{T} \sum_{t=1}^{T} z_t(p_0, q_0) z_t'(p_0, q_0) \\
&= \frac{1}{T} \sum_{t=1}^{T} x_t'(p_0, q_0)\tilde{\gamma}_T(p_0, q_0)\tilde{\gamma}_T'(p_0, q_0)x_t(p_0, q_0) \\
&\quad + \frac{1}{T} \sum_{t=1}^{T} \big( x_t'(p_0, q_0)\tilde{\gamma}_T(p_0, q_0) \big)\big( u_t + \tilde{x}_t'\gamma(p_0, q_0) \big)' + \frac{1}{T} \sum_{t=1}^{T} \big( u_t + \tilde{x}_t'\gamma(p_0, q_0) \big)\big( x_t'(p_0, q_0)\tilde{\gamma}_T(p_0, q_0) \big)' \\
&\quad + \frac{1}{T} \sum_{t=1}^{T} \big( u_t + \tilde{x}_t'\gamma(p_0, q_0) \big)\big( u_t + \tilde{x}_t'\gamma(p_0, q_0) \big)' \\
&= D_{1,T}^0 + (D_{2,T}^0 + (D_{2,T}^0)') + D_{3,T}^0,
\end{aligned}$$
where $D_{1,T}^0$, $D_{2,T}^0$, and $D_{3,T}^0$ are defined analogously. Then,
$$\hat{\Sigma}_T(p, q) - \hat{\Sigma}_T(p_0, q_0) = D_{1,T} + \big( D_{2,T} + D_{2,T}' - D_{1,T}^0 - D_{2,T}^0 - (D_{2,T}^0)' \big) + \big( D_{3,T} - D_{3,T}^0 \big). \qquad (35)$$
It is easily seen that $D_{3,T}$ and $D_{3,T}^0$ both converge to $\Sigma_u$ a.s. Therefore, the third term in (35) is $o(1)$. We will further show that $D_{2,T}$, $D_{1,T}^0$, and $D_{2,T}^0$ are $o(1)$ a.s. and that $\liminf \lambda_{max}(D_{1,T}) > 0$, while noting that $D_{1,T}$ is p.s.d. by construction, which shows $\liminf \lambda_{max}(\hat{\Sigma}_T(p, q) - \hat{\Sigma}_T(p_0, q_0)) > 0$, the desired result.

$D_{1,T}$: Since $D_{1,T}$ is positive semidefinite by construction, it has at least one nonzero eigenvalue if

$$
\begin{aligned}
\lambda_{\max}(D_{1,T}) &= \lambda_{\max}\Big(\frac{1}{T}\sum_{t=1}^{T} x_t'(s_p,s_q)\tilde\gamma_T(p,q)\tilde\gamma_T'(p,q)x_t(s_p,s_q)\Big) \\
&= \lambda_{\max}\Big(\tilde\Gamma_T(p,q)\Big(\frac{1}{T}\sum_{t=1}^{T}\phi_t(s_p,s_q)\phi_t'(s_p,s_q)\Big)\tilde\Gamma_T'(p,q)\Big) \\
&\ge \lambda_{\min}\Big(\frac{1}{T}\sum_{t=1}^{T}\phi_t(s_p,s_q)\phi_t'(s_p,s_q)\Big)\,\|\tilde\Gamma_T(p,q)\|^2 \\
&> 0,
\end{aligned}
$$


where $\tilde\Gamma_T(p,q) = [A_1, \ldots, M_{s_q}]$ is $\tilde\gamma_T(p,q)$ rearranged in matrix form. Then, according to Poskitt & Lütkepohl (1995, Proof of Theorem 3.2), one gets $\liminf_{T\to\infty} T^{-1}\lambda_{\min}\big(\sum_{t=1}^{T}\phi_t(s_p,s_q)\phi_t'(s_p,s_q)\big) > 0$ a.s., and we also have $\|\tilde\Gamma_T(p,q)\|^2 = \mathrm{const} > 0$ from Huang & Guo (1990, p. 1753). This gives $\liminf \lambda_{\max}(D_{1,T}) > 0$.
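The eigenvalue bound used in the last display, $\lambda_{\max}(\Gamma A \Gamma') \ge \lambda_{\min}(A)\,\|\Gamma\|^2$ for positive semidefinite $A$ and the spectral norm $\|\cdot\|$, is a generic matrix inequality; the short Python check below (with arbitrary simulated matrices) merely illustrates it and is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 3, 8
Gamma = rng.standard_normal((K, m))
B = rng.standard_normal((m, m))
A = B @ B.T                                   # positive semidefinite matrix

lhs = np.linalg.eigvalsh(Gamma @ A @ Gamma.T).max()                  # lambda_max(Gamma A Gamma')
rhs = np.linalg.eigvalsh(A).min() * np.linalg.norm(Gamma, 2) ** 2    # lambda_min(A) * ||Gamma||^2
print(lhs >= rhs - 1e-10)                     # True: the bound holds up to numerical precision
```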

$D^0_{1,T}$: We have

$$
\begin{aligned}
\tilde\gamma_T(p_0,q_0) &= \gamma(p_0,q_0) - \gamma^0_T(p_0,q_0) \qquad (36)\\
&= -\big[X_T'(p_0,q_0)X_T(p_0,q_0)\big]^{-1}X_T'(p_0,q_0)\big[\tilde X_T\gamma(p_0,q_0) + U_T\big]
\end{aligned}
$$

due to (27), (32), (33), and (34). Therefore,

$$
\begin{aligned}
\Big\|\frac{1}{T}\sum_{t=1}^{T} x_t'(p_0,q_0)\tilde\gamma_T(p_0,q_0)\tilde\gamma_T(p_0,q_0)'x_t(p_0,q_0)\Big\|
&\le \frac{1}{T}\sum_{t=1}^{T}\big\|x_t'(p_0,q_0)\tilde\gamma_T(p_0,q_0)\tilde\gamma_T(p_0,q_0)'x_t(p_0,q_0)\big\| \\
&= \frac{1}{T}\sum_{t=1}^{T}\tilde\gamma_T(p_0,q_0)'x_t(p_0,q_0)x_t'(p_0,q_0)\tilde\gamma_T(p_0,q_0) \\
&= \frac{1}{T}\,\tilde\gamma_T(p_0,q_0)'\Big(\sum_{t=1}^{T}x_t(p_0,q_0)x_t'(p_0,q_0)\Big)\tilde\gamma_T(p_0,q_0) \\
&= \frac{1}{T}\,\tilde\gamma_T(p_0,q_0)'\big(X_T'(p_0,q_0)X_T(p_0,q_0)\big)\tilde\gamma_T(p_0,q_0),
\end{aligned}
$$

and, using the above result on $\tilde\gamma_T$,

$$
\begin{aligned}
&= \frac{1}{T}\big[\tilde X_T\gamma(p_0,q_0) + U_T\big]'X_T(p_0,q_0)\big[X_T'(p_0,q_0)X_T(p_0,q_0)\big]^{-1}X_T'(p_0,q_0)\big[\tilde X_T\gamma(p_0,q_0) + U_T\big] \\
&= \frac{1}{T}\,O(\ln T) \quad \text{a.s.},
\end{aligned}
$$

where the last line follows from (29)-(31) in the first part of the proof; compare also Huang & Guo (1990, p. 1754).


$D_{2,T}$: For $D_{2,T}$, we have

$$
\begin{aligned}
&\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(x_t'(s_p,s_q)\tilde\gamma_T(p,q)\big)\big(u_t + \tilde x_t'\gamma(s_p,s_q)\big)'\Big\| \\
&\quad= \Big\|\frac{1}{T}\sum_{t=1}^{T}\tilde\Gamma_T(p,q)\phi_t(s_p,s_q)\big(u_t + \tilde x_t'\gamma(s_p,s_q)\big)'\Big\| \\
&\quad= \Big\|\frac{1}{T}\,\tilde\Gamma_T(p,q)\big(\Phi_T'\Phi_T\big)^{1/2}\big(\Phi_T'\Phi_T\big)^{-1/2}\sum_{t=1}^{T}\phi_t(s_p,s_q)\big(u_t + \tilde x_t'\gamma(s_p,s_q)\big)'\Big\|,
\end{aligned}
$$

where $\Phi_T := [\phi_0(s_p,s_q),\ldots,\phi_{T-1}(s_p,s_q)]'$ is a $T\times(s_p+s_q)K$ matrix.

Then, similar to the approach in Huang & Guo (1990),

$$
\begin{aligned}
&\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(x_t'(s_p,s_q)\tilde\gamma_T(p,q)\big)\big(u_t + \tilde x_t'\gamma(s_p,s_q)\big)'\Big\| \\
&\quad\le \frac{1}{T}\,\big\|\tilde\Gamma_T(p,q)\big(\Phi_T'\Phi_T\big)^{1/2}\big\|\,\big\|\big(\Phi_T'\Phi_T\big)^{-1/2}\Phi_T'\big(\bar U_T + \bar X_T\big)\big\|,
\end{aligned}
$$

where $\bar U_T := [u_1,\ldots,u_T]'$ and $\bar X_T := [\tilde x_1'\gamma(s_p,s_q),\ldots,\tilde x_T'\gamma(s_p,s_q)]'$. Now, tedious but straightforward calculations lead to

$$
\begin{aligned}
&\frac{1}{T}\,\big\|\tilde\Gamma_T(p,q)\big(\Phi_T'\Phi_T\big)^{1/2}\big\|\,\big\|\big(\Phi_T'\Phi_T\big)^{-1/2}\Phi_T'\big(\bar U_T + \bar X_T\big)\big\| \\
&\quad\le \Big(\frac{1}{T}\,\big\|\tilde\Gamma_T(p,q)\big(\Phi_T'\Phi_T\big)\tilde\Gamma_T'(p,q)\big\|\Big)^{1/2} \qquad (37)\\
&\qquad\times \Big(\frac{1}{T}\,\big(U_T + \tilde X_T\gamma(s_p,s_q)\big)'\big(\Phi_T\otimes I_K\big)\big((\Phi_T'\Phi_T)^{-1}\otimes I_K\big)\big(\Phi_T'\otimes I_K\big)\big(U_T + \tilde X_T\gamma(s_p,s_q)\big)\Big)^{1/2} \\
&\quad= \big[O(1)\big]^{1/2}\,\Big[O\Big(\frac{\ln T}{T}\Big)\Big]^{1/2} = o(1) \quad \text{a.s.},
\end{aligned}
$$

following from the results on $D_{1,T}$ and again from (29)-(31) in the first part of the proof. Note in this respect that the results in (30) and (31) also hold when using the regressor matrix $\Phi_T \otimes I_K$ appearing in (37). This is due to the fact that the relevant properties of linear projections and OLS do not depend on whether the restricted or unrestricted form of the regressor matrix is used.


$D^0_{2,T}$: Similar to the arguments used for $D_{2,T}$, we can write

$$
\begin{aligned}
&\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(x_t'(p_0,q_0)\tilde\gamma_T(p_0,q_0)\big)\big(u_t + \tilde x_t'\gamma(p_0,q_0)\big)'\Big\| \\
&\quad\le \frac{1}{T}\sum_{t=1}^{T}\big\|\big(x_t'(p_0,q_0)\tilde\gamma_T(p_0,q_0)\big)\big(u_t + \tilde x_t'\gamma(p_0,q_0)\big)'\big\| \\
&\quad\le \frac{1}{T}\sum_{t=1}^{T}\big[\tilde\gamma_T'(p_0,q_0)x_t(p_0,q_0)x_t'(p_0,q_0)\tilde\gamma_T(p_0,q_0)\big]^{1/2}\,\big[\big(u_t + \tilde x_t'\gamma(p_0,q_0)\big)'\big(u_t + \tilde x_t'\gamma(p_0,q_0)\big)\big]^{1/2} \\
&\quad\le \frac{1}{T}\,\big[\tilde\gamma_T(p_0,q_0)'\big(X_T'(p_0,q_0)X_T(p_0,q_0)\big)\tilde\gamma_T(p_0,q_0)\big]^{1/2}\,\big[\big(U_T + \tilde X_T\gamma(p_0,q_0)\big)'\big(U_T + \tilde X_T\gamma(p_0,q_0)\big)\big]^{1/2} \\
&\quad= \Big[O\Big(\frac{\ln T}{T}\Big)\Big]^{1/2}\big[O(1)\big]^{1/2} = o(1) \quad \text{a.s.},
\end{aligned}
$$

using arguments identical to those used to evaluate $D^0_{1,T}$ and noting that $T^{-1}\big(U_T + \tilde X_T\gamma(p_0,q_0)\big)'\big(U_T + \tilde X_T\gamma(p_0,q_0)\big) = T^{-1}U_T'U_T + o(1) = O(1)$ a.s. due to the results of Poskitt & Lütkepohl (1995, Proof of Theorem 3.2) and applying Poskitt (2003, Proposition 3.1). This completes the proof.


Table 1: Comparison of mean squared prediction errors

                   RW                          VECM                         VARMA                       VARMA_YP
             1     3     6     12        1     3     6     12         1     3     6     12         1     3     6     12

3M, 6M
  3M       0.042 0.228 0.681 2.100     0.791 0.834 0.880 0.924      0.740 0.793 0.834 0.888      0.738 0.789 0.822 0.878
  6M       0.042 0.235 0.676 2.015     0.832 0.954 1.001 1.004      0.794 0.903 0.944 0.966      0.780 0.886 0.926 0.954

3M, 1Y
  3M       0.042 0.228 0.681 2.100     0.860 0.912 0.873 0.877      0.725 0.746 0.774 0.808      0.728 0.736 0.751 0.783
  1Y       0.055 0.285 0.756 2.067     0.899 1.089 1.058 1.051      0.793 0.934 0.978 0.991      0.786 0.908 0.949 0.964

3M, 5Y
  3M       0.042 0.228 0.681 2.100     1.062 1.065 0.961 0.896      0.817 0.886 0.899 0.847      0.803 0.908 0.912 0.825
  5Y       0.065 0.282 0.591 1.120     0.930 1.076 1.042 1.078      0.879 0.987 1.030 1.063      0.875 0.979 1.016 1.032

3M, 10Y
  3M       0.042 0.228 0.681 2.100     1.092 1.070 0.988 0.894      0.843 0.955 0.962 0.867      0.847 1.003 1.001 0.861
  10Y      0.053 0.206 0.420 0.755     0.939 1.069 1.053 1.053      0.915 1.015 1.044 1.052      0.914 1.006 1.027 1.020

3M, 6M, 1Y
  3M       0.042 0.228 0.681 2.100     0.779 0.800 0.889 0.934      0.795 0.821 0.877 0.902      0.794 0.829 0.878 0.899
  6M       0.042 0.235 0.676 2.015     0.769 0.897 1.000 1.009      0.818 0.900 0.971 0.972      0.823 0.911 0.973 0.969
  1Y       0.055 0.285 0.756 2.067     0.846 1.027 1.131 1.126      0.884 0.997 1.076 1.075      0.887 1.004 1.074 1.071

1Y, 5Y, 10Y
  1Y       0.055 0.285 0.756 2.067     1.070 1.347 1.306 1.031      0.962 1.155 1.112 0.994      0.972 1.175 1.141 1.010
  5Y       0.065 0.282 0.591 1.120     0.954 1.046 1.020 0.961      0.917 0.996 0.959 0.974      0.935 1.006 0.977 0.991
  10Y      0.053 0.206 0.420 0.755     0.955 1.006 0.959 0.926      0.924 0.983 0.952 0.973      0.940 0.990 0.968 0.993

3M, 1Y, 10Y
  3M       0.042 0.228 0.681 2.100     0.911 1.088 1.173 1.013      0.728 0.783 0.829 0.838      0.730 0.802 0.856 0.838
  1Y       0.055 0.285 0.756 2.067     0.979 1.297 1.372 1.167      0.822 1.004 1.042 0.999      0.830 1.029 1.070 0.996
  10Y      0.053 0.206 0.420 0.755     0.960 1.073 1.093 1.085      0.912 1.006 1.025 1.048      0.911 1.002 1.013 1.018

3M-10Y
  3M       0.042 0.228 0.681 2.100     0.998 1.136 1.089 0.884      1.211 1.154 1.007 0.855      1.242 1.181 1.021 0.853
  6M       0.042 0.235 0.676 2.015     1.027 1.186 1.162 0.951      1.193 1.161 1.057 0.913      1.222 1.185 1.071 0.911
  1Y       0.055 0.285 0.756 2.067     1.072 1.217 1.194 1.010      1.195 1.164 1.079 0.967      1.216 1.183 1.091 0.966
  5Y       0.065 0.282 0.591 1.120     1.023 1.045 1.019 0.988      1.059 1.012 0.959 0.960      1.065 1.016 0.961 0.957
  10Y      0.053 0.206 0.420 0.755     1.011 1.014 0.982 0.960      1.017 0.986 0.948 0.939      1.017 0.984 0.943 0.933

Note: The table reports mean squared prediction errors for systems with different maturities and different models; the sub-columns 1, 3, 6, and 12 refer to the forecast horizon. VARMA refers to the VARMA model estimated via Poskitt's (2003) method and VARMA_YP refers to the same model estimated via the method of Yap & Reinsel (1995).
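For readers who want to reproduce this type of comparison with their own forecasts, the entries rest on mean squared prediction errors; the short Python sketch below uses hypothetical actual and forecast series and only illustrates the basic arithmetic, including a ratio relative to a benchmark forecast.

```python
import numpy as np

def mspe(actual, forecast):
    """Mean squared prediction error over the evaluation sample."""
    e = np.asarray(actual) - np.asarray(forecast)
    return np.mean(e ** 2)

# Hypothetical evaluation sample (illustrative data only)
rng = np.random.default_rng(1)
actual = rng.standard_normal(120)
fc_model = actual + 0.5 * rng.standard_normal(120)   # some model's h-step forecasts
fc_bench = actual + 0.7 * rng.standard_normal(120)   # benchmark forecasts

print(mspe(actual, fc_model))                            # absolute MSPE of the model
print(mspe(actual, fc_model) / mspe(actual, fc_bench))   # ratio relative to the benchmark (< 1 favours the model)
```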


Table 2: Comparison of determinants of mean squared prediction error matrices

                       RW                          VECM                         VARMA                       VARMA_YP
                 1     3     6     12        1     3     6     12         1     3     6     12         1     3     6     12

3M, 6M         0.000 0.004 0.015 0.079     0.805 0.788 0.757 0.559      0.746 0.767 0.755 0.549      0.755 0.816 0.793 0.560
3M, 1Y         0.001 0.013 0.067 0.344     0.857 0.860 0.707 0.551      0.688 0.734 0.657 0.495      0.705 0.774 0.666 0.484
3M, 5Y         0.002 0.041 0.238 1.129     1.000 1.107 0.958 0.803      0.740 0.895 0.905 0.759      0.725 0.929 0.926 0.740
3M, 10Y        0.002 0.037 0.213 1.117     1.032 1.172 1.067 0.860      0.794 1.026 1.046 0.840      0.792 1.073 1.076 0.817
3M, 6M, 1Y     0.000 0.000 0.000 0.002     0.750 0.737 0.801 0.603      0.751 0.686 0.745 0.562      0.758 0.694 0.748 0.561
3M, 1Y, 10Y    0.000 0.000 0.002 0.011     1.118 1.359 1.171 0.816      0.931 1.090 1.030 0.847      0.958 1.133 1.065 0.857
1Y, 5Y, 10Y    0.000 0.001 0.006 0.055     0.939 1.158 0.995 0.490      0.749 0.913 0.817 0.461      0.776 1.009 0.857 0.437
3M-10Y         0.000 0.000 0.000 0.000     0.927 1.051 0.943 0.449      1.147 1.057 0.928 0.491      1.170 1.069 0.923 0.470

Note: The table reports the determinant of the mean squared prediction error matrices for systems with different maturities and different models; the sub-columns 1, 3, 6, and 12 refer to the forecast horizon. VARMA refers to the VARMA model estimated via Poskitt's (2003) method and VARMA_YP refers to the same model estimated via the method of Yap & Reinsel (1995).
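For the system-wide comparison, the scalar reported per system and horizon is the determinant of the K x K mean squared prediction error matrix. A minimal sketch of that computation, again with hypothetical forecast errors, is:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n = 3, 120                             # e.g. three maturities, 120 evaluation points
errors = rng.standard_normal((n, K))      # hypothetical h-step forecast errors, one row per point

mspe_matrix = errors.T @ errors / n       # K x K mean squared prediction error matrix
print(np.linalg.det(mspe_matrix))         # determinant summarizing system-wide accuracy
```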


Figure 1: US Treasury bill and bond yields. See text for definitions.


(a) Forecasting horizon: 1 month

(b) Forecasting horizon: 6 months

(c) Forecasting horizon: 12 months

Figure 2: Cumulative squared prediction errors of the VECM and the VARMA model for different horizons.
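The cumulative squared prediction error paths shown in Figure 2 are running sums of squared forecast errors over the evaluation sample; a small illustrative sketch with hypothetical error series is:

```python
import numpy as np

rng = np.random.default_rng(3)
e_vecm = rng.standard_normal(200)     # hypothetical VECM forecast errors over the evaluation sample
e_varma = rng.standard_normal(200)    # hypothetical VARMA forecast errors

csqpe_vecm = np.cumsum(e_vecm ** 2)    # cumulative squared prediction errors, VECM
csqpe_varma = np.cumsum(e_varma ** 2)  # cumulative squared prediction errors, VARMA
# A widening gap between the two paths indicates a persistent accuracy difference.
```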
