Automated Estimation of Vector Error Correction Models

Zhipeng Liao* Peter C. B. Phillips†

December, 2010

Abstract

Model selection and associated issues of post-model selection inference present well known challenges in empirical econometric research. These modeling issues are manifest in all applied work but they are particularly acute in multivariate time series settings such as cointegrated systems, where multiple interconnected decisions can materially affect the form of the model and its interpretation. In cointegrated system modeling, empirical estimation typically proceeds in a stepwise manner that involves the determination of cointegrating rank and autoregressive lag order in a reduced rank vector autoregression, followed by estimation and inference. This paper proposes an automated approach to cointegrated system modeling that uses adaptive shrinkage techniques to estimate vector error correction models with unknown cointegrating rank structure and unknown transient lag dynamic order. These methods enable simultaneous order estimation of the cointegrating rank and autoregressive order in conjunction with oracle-like efficient estimation of the cointegrating matrix and transient dynamics. As such they offer considerable advantages to the practitioner as an automated approach to the estimation of cointegrated systems. The paper develops the new methods, derives their limit theory, reports simulations and presents an empirical illustration.

Keywords: Adaptive shrinkage; Automation; Cointegrating rank; Lasso regression; Oracle efficiency; Transient dynamics; Vector error correction.

JEL classification: C22

* Yale University. Support from an Anderson Fellowship is gratefully acknowledged. Email: [email protected]

† Yale University, University of Auckland, University of Southampton and Singapore Management University. Support from the NSF under Grant No SES 09-56687 is gratefully acknowledged. Email: [email protected]


1 Introduction

Cointegrated system modeling is now one of the main workhorses in empirical time series

research. Much of this empirical research makes use of vector error correction (VECM)

formulations. While there is often some prior information concerning the number of coin-

tegrating vectors, most practical work involves (at least con�rmatory) pre-testing to de-

termine the cointegrating rank of the system as well as the lag order in the autoregressive

component that embodies the transient dynamics. These order selection decisions can be

made by sequential likelihood ratio tests (e.g. Johansen, 1988, for rank determination) or

the application of suitable information criteria (Phillips, 1996). The latter approach offers

several advantages such as joint determination of the cointegrating rank and autoregressive

order, consistent estimation of both order parameters (Chao and Phillips, 1999), robustness to heterogeneity in the errors, and the convenience and generality of semi-parametric

estimation in cases where the focus is simply the cointegrating rank (Cheng and Phillips,

2010). While appealing for practitioners, all of these methods are nonetheless subject to

pre-test bias and post model selection inferential problems (Leeb and Pötscher, 2005).

The present paper explores a different approach. The goal is to liberate the empirical

researcher from sequential testing procedures and post model selection issues in inference

about cointegrated systems and in policy work that relies on impulse responses. The ideas

originate in recent work on sparse system estimation using shrinkage techniques such as

lasso and bridge regression. These procedures utilize penalized least squares criteria in

regression that can succeed, at least asymptotically, in selecting the correct regressors in

a linear regression framework while consistently estimating the non-zero regression coefficients.

One of the contributions of this paper is to show how to develop adaptive versions of

these shrinkage methods that apply in vector error correction modeling. The implementation of these methods is not immediate. This is partly because of the nonlinearities

involved in potential reduced rank structures and partly because of the interdependence of

decision making concerning the form of the transient dynamics and the cointegrating rank

structure. The paper designs a mechanism of estimation and selection that works through

the eigenvalues of the levels coefficient matrix and the coefficient matrices of the transient

dynamic components. The methods apply in quite general vector systems with unknown

cointegrating rank structure and unknown lag dynamics. They permit simultaneous order

estimation of the cointegrating rank and autoregressive order in conjunction with oracle-like efficient estimation of the cointegrating matrix and transient dynamics. As such they

offer considerable advantages to the practitioner: in effect, it becomes unnecessary to implement pre-testing procedures because the empirical results reveal the order parameters

as a consequence of the fitting procedure. In this sense, the methods provide an automated

approach to the estimation of cointegrated systems. In the scalar case, the methods reduce

to estimation in the presence or absence of a unit root and thereby implement an implicit

unit root test procedure, as suggested in earlier work by Caner and Knight (2009).

The paper is organized as follows. Section 2 lays out the model and assumptions and

shows how to implement adaptive shrinkage methods in VECM systems. Section 3 considers a simplified first order version of the VECM without lagged differences, which reveals

the approach to cointegrating rank selection and develops key elements in the limit theory.

Here we show that the cointegrating rank $r_o$ is identified by the number of zero eigenvalues of $\Pi_o$, and that the latter is consistently recovered by suitably designed shrinkage estimation.

Section 4 extends this system and its asymptotics to the general case of cointegrated systems with weakly dependent errors. Here it is demonstrated that the cointegration rank $r_o$ can be consistently selected despite the fact that $\Pi_o$ may not be consistently estimable.

Section 5 deals with the practically important case of a general VECM system driven by

independent identically distributed (i.i.d.) shocks, where shrinkage estimation simultaneously performs consistent lag selection, cointegrating rank selection, and optimal estimation

of the system coefficients. Section 6 considers adaptive selection of the tuning parameter.

Section 7 reports some simulation findings and a brief empirical illustration. Section 8

concludes and outlines some useful extensions of the methods and limit theory to other

models. Proofs and some supplementary technical results are given in the Appendix.

2 Vector Error Correction and Adaptive Shrinkage

Throughout the paper we consider the following parametric VECM representation of a cointegrated system:

$$\Delta Y_t = \Pi_o Y_{t-1} + \sum_{j=1}^{p} B_{o,j}\,\Delta Y_{t-j} + u_t, \qquad (2.1)$$

where $\Delta Y_t = Y_t - Y_{t-1}$, $Y_t$ is an $m$-dimensional vector-valued time series, $\Pi_o = \alpha_o\beta_o'$ has rank $0 \le r_o \le m$, $B_{o,j}$ ($j = 1,\ldots,p$) are $m \times m$ (transient) coefficient matrices, and $u_t$ is an $m$-vector error term with mean zero and nonsingular covariance matrix $\Omega_u$. The rank

$r_o$ of $\Pi_o$ is an order parameter measuring the cointegrating rank, or the number of (long run) cointegrating relations in the system. The lag order $p$ is a second order parameter, characterizing the transient dynamics in the system.
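To fix ideas, the system (2.1) is straightforward to simulate. The sketch below generates data from a trivariate VECM with cointegrating rank $r_o = 1$ and $p = 1$; the parameter values (and the choice $\Omega_u = I_3$) are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Simulate the VECM (2.1) with m = 3, r_o = 1, p = 1.
# alpha, beta, B1 are illustrative choices; note beta'alpha = -0.5,
# so R = I + beta'alpha has eigenvalue 0.5, inside the unit circle.
rng = np.random.default_rng(0)
m, n, p = 3, 500, 1
alpha = np.array([[-0.5], [0.1], [0.2]])   # m x r_o loading matrix
beta = np.array([[1.0], [-1.0], [0.5]])    # m x r_o cointegrating matrix
Pi = alpha @ beta.T                        # levels coefficient Pi_o, rank 1
B1 = 0.2 * np.eye(m)                       # transient coefficient B_{o,1}

Y = np.zeros((n + p + 1, m))
for t in range(p + 1, n + p + 1):
    u = rng.standard_normal(m)             # i.i.d. shocks with Omega_u = I
    Y[t] = Y[t - 1] + Pi @ Y[t - 1] + B1 @ (Y[t - 1] - Y[t - 2]) + u
```

The simulated $Y_t$ has $m - r_o = 2$ stochastic trends, while the scalar combination $\beta_o' Y_t$ is stationary.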

Conventional methods of estimation of (2.1) include reduced rank regression or maximum likelihood, based on the assumption of Gaussian $u_t$ and a Gaussian likelihood. This

approach relies on known $r_o$ and known $p$, so implementation requires preliminary order

parameter estimation. The system can also be estimated by unrestricted fully modified

vector autoregression (Phillips, 1995), which leads to consistent estimation of the unit

roots in (2.1), the cointegrating vectors and the transient dynamics. This method does

not require knowledge of $r_o$ but does require knowledge of $p$. In addition, a semiparametric approach can be adopted in which $r_o$ is estimated semiparametrically by order selection, as

in Cheng and Phillips (2010), followed by fully modified least squares regression to estimate

the cointegrating matrix. This method achieves asymptotically efficient estimation of the

long run relations (under Gaussianity) but does not estimate the transient relations.

The present paper explores the estimation of the parameters of (2.1) by Lasso-type regression, i.e. least squares (LS) regression with penalization. The resulting estimator is a shrinkage estimator. Specifically, the LS shrinkage estimator of $(\Pi_o, B_o)$, where $B_o = (B_{o,1}, \ldots, B_{o,p})$, is defined as

$$(\widehat{\Pi}_n, \widehat{B}_n) = \arg\min_{\Pi,\,B_1,\ldots,B_p \in \mathbb{R}^{m\times m}} \sum_{t=1}^{n} \Big\| \Delta Y_t - \Pi Y_{t-1} - \sum_{j=1}^{p} B_j\,\Delta Y_{t-j} \Big\|_E^2 + n\sum_{j=1}^{p} \widehat{P}_{\lambda_n}(B_j) + n\sum_{k=1}^{m} \widehat{P}_{\lambda_n}\big[\phi_k(\Pi)\big], \qquad (2.2)$$

where $\|\cdot\|_E$ is the Euclidean norm, $\widehat{P}_{\lambda_n}(\cdot)$ is a penalty function with tuning parameter $\lambda_n$, and $\phi_k(\Pi) = |\lambda_k(\Pi)|$ denotes the modulus of the $k$-th ordered eigenvalue in $\{\lambda_k(\Pi)\}_{k=1}^m$ of the matrix $\Pi$.¹

¹ Throughout this paper, for any $m \times m$ matrix $\Pi$, we order the eigenvalues of $\Pi$ in ascending order of their moduli, i.e. $|\lambda_1(\Pi)| \le |\lambda_2(\Pi)| \le \cdots \le |\lambda_m(\Pi)|$. When there is a pair of complex conjugate eigenvalues, we order the one with negative imaginary part before the other.

Let $S_{n,\lambda} = \{k : \lambda_k(\widehat{\Pi}_n) \neq 0\}$ be the index set of nonzero eigenvalues of $\widehat{\Pi}_n$. Since the eigenvalues of $\widehat{\Pi}_n$ are ordered as indicated, if there are $d_{S_{n,\lambda}}$ elements in $S_{n,\lambda}$, then $S_{n,\lambda} = \{m - d_{S_{n,\lambda}} + 1, \ldots, m\}$. Similarly, we index the zero eigenvalues of $\widehat{\Pi}_n$ using the set $S^c_{n,\lambda} = \{k : \lambda_k(\widehat{\Pi}_n) = 0\} = \{1, \ldots, m - d_{S_{n,\lambda}}\}$. Given $\lambda_n$, this procedure delivers a one step estimator of the model (2.1) with an implied estimate of the


cointegrating rank (based on the number of non-zero eigenvalues of $\widehat{\Pi}_n$) and an implied estimate of the transient dynamic order $p$ and transient dynamic structure (that is, the non-zero elements of $B_o$) based on the fitted value $\widehat{B}_n$.

In the literature of variable selection there are many popular choices of the penalty function. One example for $\widehat{P}_{\lambda_n}(\cdot)$ is the adaptive Lasso penalty (Zou, 2006), defined as

$$\widehat{P}_{\lambda_n}(\phi) = \lambda_n \widehat{w}_\phi |\phi|, \qquad (2.3)$$

where $\widehat{w}_\phi = 1/|\widehat{\phi}_n|^{\omega}$ with $\omega > 0$ and some consistent estimator $\widehat{\phi}_n$ of $\phi$. Other examples are the bridge penalty (Knight, 2000), defined as

$$\widehat{P}_{\lambda_n}(\phi) = \lambda_n |\phi|^{\gamma}, \qquad (2.4)$$

where $\gamma \in (0,1)$, and the SCAD penalty (Fan and Li, 2001), defined as

$$\widehat{P}_{\lambda_n}(\phi) = \begin{cases} \lambda_n|\phi|, & |\phi| \le \lambda_n, \\[4pt] \dfrac{a\lambda_n|\phi|}{a-1} - \dfrac{\phi^2 + \lambda_n^2}{2(a-1)}, & \lambda_n < |\phi| \le a\lambda_n, \\[4pt] \dfrac{(a+1)\lambda_n^2}{2}, & a\lambda_n < |\phi|, \end{cases} \qquad (2.5)$$

where $a$ is some positive real number strictly larger than 2. In the context of variable selection, the LS shrinkage estimator with the above penalty functions has "oracle" properties in the sense that any zero coefficients are estimated as zeros with probability approaching 1 (w.p.a.1). Correspondingly, the non-zero coefficients are estimated as if the true model were known and implemented in estimation (Fan and Li, 2001).
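The three penalties in (2.3)-(2.5) are simple scalar functions and can be sketched directly. The function names below are ours, and the default $a = 3.7$ is the value suggested by Fan and Li (2001):

```python
def adaptive_lasso(phi, lam, phi_hat, omega=1.0):
    """Adaptive Lasso (2.3): lam * |phi| / |phi_hat|**omega,
    where phi_hat is a first-step consistent estimate of phi."""
    return lam * abs(phi) / abs(phi_hat) ** omega

def bridge(phi, lam, gamma=0.5):
    """Bridge penalty (2.4): lam * |phi|**gamma with gamma in (0, 1)."""
    return lam * abs(phi) ** gamma

def scad(phi, lam, a=3.7):
    """SCAD penalty (2.5) of Fan and Li (2001); requires a > 2."""
    x = abs(phi)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return a * lam * x / (a - 1) - (x ** 2 + lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2
```

The SCAD penalty is constant beyond $a\lambda_n$ and the adaptive Lasso weight deflates the penalty on large coefficients, which is the mechanism behind the oracle behavior described above.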

Let $\lambda(\Pi_o) = (\lambda_1(\Pi_o), \ldots, \lambda_m(\Pi_o))$ denote the vector containing the $m$ ordered eigenvalues of the matrix $\Pi_o \in \mathbb{R}^{m\times m}$. When $\{u_t\}_{t\ge1}$ is an i.i.d. process or a martingale difference sequence, the LS estimators $(\widehat{\Pi}_{ols}, \widehat{B}_{ols})$ of $(\Pi_o, B_o)$ are well known to be consistent. Thus it seems intuitively clear that some form of adaptive penalization can be devised to consistently distinguish the zero and nonzero components in $B_o$ and $\lambda(\Pi_o)$. We show that the shrinkage LS estimator defined in (2.2) enjoys these oracle-like properties, in the sense that the zero components in $B_o$ and $\lambda(\Pi_o)$ are estimated as zeros w.p.a.1. Thus, $\Pi_o$ and the non-zero elements in $B_o$ are estimated as if the form of the true model were known, and inferences can be conducted as if we knew the true cointegration rank $r_o$.

If the transient behavior of (2.1) is misspecified and (for some given lag order $p$) the error process $\{u_t\}_{t\ge1}$ is weakly dependent and $r_o > 0$, then consistent estimators of $(\Pi_o, B_o)$ are typically unavailable without further assumptions. However, we show here that the $m - r_o$ zero eigenvalues of $\Pi_o$ can still be consistently estimated with an order $n$ convergence

rate, while the remaining eigenvalues of $\Pi_o$ are estimated with asymptotic bias at a $\sqrt{n}$ convergence rate. These different convergence rates matter because, when the non-zero eigenvalues of $\Pi_o$ are occasionally (asymptotically) estimated as zeros, they make it possible to distinguish consistently the zero eigenvalues from the biasedly estimated non-zero eigenvalues of $\Pi_o$. Specifically, we show that if the estimator of some non-zero eigenvalue of $\Pi_o$ has probability limit zero, then this estimator converges in probability to zero at the rate $\sqrt{n}$, while estimates of the zero eigenvalues of $\Pi_o$ all have convergence rate $n$. Hence the adaptive penalties attached to estimates of zero eigenvalues of $\Pi_o$ diverge to infinity at a rate faster than those attached to estimates of the nonzero eigenvalues of $\Pi_o$, even though the latter also converge to zero in probability. As we have prior knowledge of these different divergence rates in a potentially cointegrated system, we can impose explicit conditions on the convergence rate of the tuning parameter to ensure that only the estimates of zero eigenvalues of $\Pi_o$ are adaptively shrunk to zero in finite samples.

For empirical implementation of our approach, we provide data-driven procedures for

selecting the tuning parameter of the penalty function in finite samples. Our method is

executed as follows.

(i) Solve the minimization problem in (6.2) to obtain the data-determined tuning parameter $\widehat{\lambda}^*_n$.

(ii) Using the tuning parameter $\widehat{\lambda}^*_n$, obtain the LS shrinkage estimator $(\widehat{\Pi}_n, \widehat{B}_n)$ of $(\Pi_o, B_o)$.

(iii) The cointegration rank selected by the shrinkage method is implied by the rank of the shrinkage estimator $\widehat{\Pi}_n$, and the lagged differences selected by the shrinkage method are implied by the nonzero matrices in $\widehat{B}_n$.

(iv) The LS shrinkage estimator $(\widehat{\Pi}_n, \widehat{B}_{S_{n,B}})$ contains shrinkage bias introduced by the penalties on the nonzero eigenvalues of $\widehat{\Pi}_n$ and on the nonzero matrices in $\widehat{B}_n$. To remove this bias, run a second-step reduced rank regression based on the cointegration rank and the model selected by the LS shrinkage estimation given $\widehat{\lambda}^*_n$.

Notation is standard. For vector-valued, zero mean, covariance stationary stochastic

processes $\{a_t\}_{t\ge1}$ and $\{b_t\}_{t\ge1}$, $\Sigma_{ab}(h) = E[a_t b_{t+h}']$ and $\Lambda_{ab} = \sum_{h=0}^{\infty} \Sigma_{ab}(h)$ denote the lag $h$ autocovariance matrix and one-sided long-run covariance matrix. The symbolism $a_n \sim b_n$ means that $(1 + o_p(1))b_n = a_n$ or vice versa; the expression $a_n = o_p(b_n)$ signifies that $\Pr(|a_n/b_n| \ge \epsilon) \to 0$ for all $\epsilon > 0$ as $n \to \infty$; and $a_n = O_p(b_n)$ when $\Pr(|a_n/b_n| \ge M) \to 0$ as $n$ and $M$ go to infinity. As usual, "$\to_p$" and "$\to_d$" denote convergence in probability and convergence in distribution, respectively.
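Step (iii) of the procedure above amounts to reading the order parameters off the fitted matrices. A minimal sketch, assuming hypothetical shrinkage estimates `Pi_hat` and `B_hats` in which shrunk entries are exact zeros (a small tolerance mimics this numerically):

```python
import numpy as np

def select_orders(Pi_hat, B_hats, tol=1e-8):
    """Read the selected cointegration rank and lag indices off a fit."""
    eigvals = np.linalg.eigvals(Pi_hat)
    rank = int(np.sum(np.abs(eigvals) > tol))       # nonzero eigenvalues
    lags = [j + 1 for j, B in enumerate(B_hats)
            if np.max(np.abs(B)) > tol]             # nonzero B_j matrices
    return rank, lags

# Hypothetical fit: rank-1 Pi_hat, with B_2 shrunk exactly to zero.
Pi_hat = np.outer([1.0, -0.5], [0.3, 0.3])
B_hats = [0.2 * np.eye(2), np.zeros((2, 2))]
rank, lags = select_orders(Pi_hat, B_hats)
```

Here `rank` is 1 and `lags` is `[1]`, so the second-step reduced rank regression in (iv) would be run with $r = 1$ and a single lagged difference.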

3 The First Order VECM

This section considers the following simplified first order version of (2.1):

$$\Delta Y_t = \Pi_o Y_{t-1} + u_t = \alpha_o\beta_o' Y_{t-1} + u_t. \qquad (3.1)$$

The model contains no deterministic trend and no lagged differences. Our focus in this simplified system is to outline the approach to cointegrating rank selection and develop key elements in the limit theory, showing consistency in rank selection and reduced rank coefficient matrix estimation. The theory is then generalized in subsequent sections.

We impose the following assumptions on the innovations $u_t$.

Assumption 3.1 (WN) $\{u_t\}_{t\ge1}$ is an $m$-dimensional i.i.d. process with mean zero and nonsingular covariance matrix $\Omega_u$.

Assumption 3.1 ensures that the parameter matrix $\Pi_o$ is consistently estimable in this simplified system. It could be relaxed to allow for martingale difference innovations. Under Assumption 3.1, partial sums of $u_t$ satisfy the functional law

$$n^{-1/2}\sum_{t=1}^{[n\cdot]} u_t \to_d B_u(\cdot), \qquad (3.2)$$

where $B_u(\cdot)$ is vector Brownian motion with variance matrix $\Omega_u$.

Assumption 3.2 (RR) (i) The determinantal equation $|I - (I + \Pi_o)\lambda| = 0$ has roots on or outside the unit circle; (ii) the matrix $\Pi_o$ has rank $r_o$, with $0 \le r_o \le m$; (iii) if $r_o > 0$, then the matrix $R = I_{r_o} + \beta_o'\alpha_o$ has eigenvalues within the unit circle.

As $\Pi_o = \alpha_o\beta_o'$ has rank $r_o$, we can choose $\alpha_o$ and $\beta_o$ to be $m \times r_o$ matrices with full rank. When $r_o = 0$, we simply take $\Pi_o = 0$. Let $\alpha_{o,\perp}$ and $\beta_{o,\perp}$ be the matrix orthogonal complements of $\alpha_o$ and $\beta_o$ and, without loss of generality, assume that $\alpha_{o,\perp}'\alpha_{o,\perp} = I_{m-r_o}$ and $\beta_{o,\perp}'\beta_{o,\perp} = I_{m-r_o}$. Assumption 3.2(iii) implies that $\beta_o'\alpha_o$ is nonsingular, which further implies that $\alpha_{o,\perp}'\beta_{o,\perp}$ is nonsingular.

Suppose $\Pi_o \neq 0$ and define $Q = [\beta_o, \alpha_{o,\perp}]'$. In view of the well known relation (e.g. Johansen, 1995)

$$\alpha_o(\beta_o'\alpha_o)^{-1}\beta_o' + \beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\alpha_{o,\perp}' = I_m,$$

it follows that $Q^{-1} = \big[\alpha_o(\beta_o'\alpha_o)^{-1},\ \beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\big]$ and then

$$Q\Pi_o Q^{-1} = \begin{pmatrix} \beta_o'\alpha_o & 0 \\ 0 & 0 \end{pmatrix}. \qquad (3.3)$$

The cointegrating rank is the number $r_o$ of non-zero eigenvalues of $\Pi_o$, and $\beta_{o,\perp}$ is a matrix of normalized eigenvectors corresponding to the zero eigenvalues. When $\Pi_o = 0$, the result holds trivially with $r_o = 0$ and $\beta_{o,\perp} = I_m$.
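The block structure in (3.3) is easy to verify numerically. The sketch below uses an illustrative rank-one choice of $\alpha_o$ and $\beta_o$ (our own values, with $\beta_o'\alpha_o = -0.5$) and checks that $Q\Pi_o Q^{-1}$ is block diagonal:

```python
import numpy as np

def perp(A):
    """Orthonormal basis for the orthogonal complement of col(A)."""
    u, _, _ = np.linalg.svd(A, full_matrices=True)
    return u[:, A.shape[1]:]

alpha = np.array([[-0.5], [0.1], [0.2]])    # alpha_o, m x r_o
beta = np.array([[1.0], [-1.0], [0.5]])     # beta_o, m x r_o
Pi = alpha @ beta.T                         # Pi_o = alpha_o beta_o'

alpha_perp = perp(alpha)                    # alpha_perp' alpha_o = 0
Q = np.vstack([beta.T, alpha_perp.T])       # Q = [beta_o, alpha_perp]'
M = Q @ Pi @ np.linalg.inv(Q)

# M carries beta_o' alpha_o = -0.5 in its top-left r_o x r_o block and
# zeros elsewhere, displaying the m - r_o zero eigenvalues explicitly.
```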

Let $S_\lambda = \{k : \lambda_k(\Pi_o) \neq 0\}$ be the index set of nonzero eigenvalues of $\Pi_o$ and similarly let $S^c_\lambda = \{k : \lambda_k(\Pi_o) = 0\}$ denote the index set of zero eigenvalues of $\Pi_o$. By the ordering of the eigenvalues of $\Pi_o$, we know that $S_\lambda = \{m - d_{S_\lambda} + 1, \ldots, m\}$ and $S^c_\lambda = \{1, \ldots, m - d_{S_\lambda}\}$, where $d_{S_\lambda}$ is the cardinality of the set $S_\lambda$. Based on the above discussion, the following equality holds:

$$d_{S_\lambda} = r_o. \qquad (3.4)$$

Consistent selection of the rank of $\Pi_o$ is therefore equivalent to consistent recovery of the zero components in $\lambda(\Pi_o)$. We show that the latter is achieved by the sparsity of our shrinkage estimator of the eigenvalues of $\Pi_o$.

Using the matrix $Q$, we can transform (3.1) as

$$\Delta Z_t = \Phi_o Z_{t-1} + w_t, \qquad (3.5)$$

where

$$Z_t = \begin{pmatrix} \beta_o' Y_t \\ \alpha_{o,\perp}' Y_t \end{pmatrix} := \begin{pmatrix} Z_{1,t} \\ Z_{2,t} \end{pmatrix}, \qquad w_t = \begin{pmatrix} \beta_o' u_t \\ \alpha_{o,\perp}' u_t \end{pmatrix} := \begin{pmatrix} w_{1,t} \\ w_{2,t} \end{pmatrix},$$

and $\Phi_o = Q\Pi_o Q^{-1}$. Assumption 3.2 leads to the following Wold representation for $Z_{1,t}$:

$$Z_{1,t} = \beta_o' Y_t = \sum_{i=0}^{\infty} R^i \beta_o' u_{t-i} = R(L)\beta_o' u_t, \qquad (3.6)$$

and the partial sum Granger representation

$$Y_t = C\sum_{s=1}^{t} u_s + \alpha_o(\beta_o'\alpha_o)^{-1}R(L)\beta_o' u_t + CY_0, \qquad (3.7)$$

where $C = \beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\alpha_{o,\perp}'$. Under Assumption 3.2 and (3.2), we have the functional law

$$n^{-1/2}\sum_{t=1}^{[n\cdot]} w_t \to_d B_w(\cdot) = QB_u(\cdot) = \begin{bmatrix} \beta_o' B_u(\cdot) \\ \alpha_{o,\perp}' B_u(\cdot) \end{bmatrix} =: \begin{bmatrix} B_{w_1}(\cdot) \\ B_{w_2}(\cdot) \end{bmatrix}$$

for $w_t = Qu_t$, so that

$$n^{-1/2}\sum_{t=1}^{[n\cdot]} Z_{1,t} = n^{-1/2}\sum_{t=1}^{[n\cdot]} \beta_o' Y_t \to_d -(\beta_o'\alpha_o)^{-1}B_{w_1}(\cdot), \qquad (3.8)$$

since $R(1) = \sum_{i=0}^{\infty} R^i = (I - R)^{-1} = -(\beta_o'\alpha_o)^{-1}$. Also

$$n^{-1}\sum_{t=1}^{n} Z_{1,t-1}Z_{1,t-1}' = n^{-1}\sum_{t=1}^{n} \beta_o' Y_{t-1}Y_{t-1}'\beta_o \to_p \Sigma_{z_1z_1},$$

where $\Sigma_{z_1z_1} := \mathrm{Var}(\beta_o' Y_t) = \sum_{i=0}^{\infty} R^i \beta_o' \Omega_u \beta_o R^{i\prime}$.

The shrinkage LS estimator $\widehat{\Pi}_n$ of $\Pi_o$ is defined as

$$\widehat{\Pi}_n = \arg\min_{\Pi\in\mathbb{R}^{m\times m}} \sum_{t=1}^{n}\|\Delta Y_t - \Pi Y_{t-1}\|_E^2 + n\sum_{k=1}^{m}\widehat{P}_{\lambda_n}\big[\phi_k(\Pi)\big]. \qquad (3.9)$$

The unrestricted LS estimator $\widehat{\Pi}_{ols}$ of $\Pi_o$ is

$$\widehat{\Pi}_{ols} = \arg\min_{\Pi\in\mathbb{R}^{m\times m}} \sum_{t=1}^{n}\|\Delta Y_t - \Pi Y_{t-1}\|_E^2 = \left(\sum_{t=1}^{n}\Delta Y_t Y_{t-1}'\right)\left(\sum_{t=1}^{n} Y_{t-1}Y_{t-1}'\right)^{-1}. \qquad (3.10)$$

Let $\lambda(\widehat{\Pi}_{ols}) = [\lambda_1(\widehat{\Pi}_{ols}), \ldots, \lambda_m(\widehat{\Pi}_{ols})]$ be the vector of ordered eigenvalues of $\widehat{\Pi}_{ols}$. The asymptotic properties of $\widehat{\Pi}_{ols}$ and its eigenvalues are described in the following result.

Lemma 3.1 Under Assumptions 3.1 and 3.2, we have:

(a) $\widehat{\Pi}_{ols}$ satisfies

$$Q\big(\widehat{\Pi}_{ols} - \Pi_o\big)Q^{-1}D_n^{-1} = O_p(1), \qquad (3.11)$$

where $D_n = \mathrm{diag}\big(n^{-1/2}I_{r_o},\, n^{-1}I_{m-r_o}\big)$;

(b) the eigenvalues of $\widehat{\Pi}_{ols}$ satisfy

$$\big[\lambda_1(\widehat{\Pi}_{ols}), \ldots, \lambda_{m-r_o}(\widehat{\Pi}_{ols})\big] \to_p (0, \ldots, 0)_{1\times(m-r_o)} \qquad (3.12)$$

and

$$\big[\lambda_{m-r_o+1}(\widehat{\Pi}_{ols}), \ldots, \lambda_m(\widehat{\Pi}_{ols})\big] \to_p \big[\lambda_{m-r_o+1}(\Pi_o), \ldots, \lambda_m(\Pi_o)\big]; \qquad (3.13)$$

(c) the first $m - r_o$ ordered eigenvalues of $\widehat{\Pi}_{ols}$ satisfy

$$n\big[\lambda_1(\widehat{\Pi}_{ols}), \ldots, \lambda_{m-r_o}(\widehat{\Pi}_{ols})\big] \to_d \big(\widetilde{\lambda}_{o,1}, \ldots, \widetilde{\lambda}_{o,m-r_o}\big), \qquad (3.14)$$

where the $\widetilde{\lambda}_{o,j}$ ($j = 1, \ldots, m - r_o$) are the solutions of the following determinantal equation:

$$\left| \lambda I_{m-r_o} - \left(\int dB_{w_2}B_{w_2}'\right)\left(\int B_{w_2}B_{w_2}'\right)^{-1} \right| = 0, \qquad (3.15)$$

where $B_{w_2} = \alpha_{o,\perp}' B_u$ is Brownian motion with covariance matrix $\alpha_{o,\perp}'\Omega_u\alpha_{o,\perp}$.

The results of Lemma 3.1 are useful in constructing the adaptive Lasso penalty function, because the OLS estimate $\widehat{\Pi}_{ols}$ and the related eigenvalue estimates $\lambda_k(\widehat{\Pi}_{ols})$ of $\lambda_k(\Pi_o)$ ($k = 1, \ldots, m$) can be used as first step estimates in the penalty function. The convergence rates of $\widehat{\Pi}_{ols}$ and $\lambda_k(\widehat{\Pi}_{ols})$ are important for delivering consistent model selection and cointegration rank selection. In the univariate case where $m = 1$ and $r_o = 0$, we immediately have (see Lemma 1(iii) in the Appendix)

$$n\,\lambda_{S^c_\lambda}\big(\widehat{\Pi}_{ols}\big) \to_d \frac{\int B_u(a)\,dB_u(a)}{\int B_u^2(a)\,da}, \qquad (3.16)$$

coinciding with the asymptotic distribution of the LS estimator in a regression with a unit root.
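The first-step quantities that feed the adaptive penalty are cheap to compute. The sketch below (with an illustrative bivariate DGP of our own choosing, $r_o = 1$ and $\beta_o'\alpha_o = -0.5$) forms $\widehat{\Pi}_{ols}$ as in (3.10) and orders its eigenvalues in ascending modulus as in footnote 1:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 2000
alpha = np.array([[-0.4], [0.1]])
beta = np.array([[1.0], [-1.0]])
Pi = alpha @ beta.T                     # true eigenvalues: 0 and -0.5

Y = np.zeros((n + 1, m))
for t in range(1, n + 1):
    Y[t] = Y[t - 1] + Pi @ Y[t - 1] + rng.standard_normal(m)

dY = np.diff(Y, axis=0)                 # Delta Y_t, t = 1..n
Ylag = Y[:-1]                           # Y_{t-1}
Pi_ols = (dY.T @ Ylag) @ np.linalg.inv(Ylag.T @ Ylag)   # equation (3.10)

lam = np.linalg.eigvals(Pi_ols)
lam = lam[np.argsort(np.abs(lam))]      # ascending modulus, footnote 1
# lam[0] estimates the zero eigenvalue at rate n (Lemma 3.1(c));
# lam[1] estimates beta_o' alpha_o = -0.5 at rate sqrt(n).
```

Raising the modulus of `lam[0]` to the power $-\omega$ then yields the large adaptive weight that drives the estimated zero eigenvalue to an exact zero in (3.9).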

The following assumptions provide some regularity conditions on the penalty function $\widehat{P}_{\lambda_n}(\cdot)$, which are sufficient for establishing consistency and the convergence rate of $\widehat{\Pi}_n$, as well as the sparsity of $\lambda(\widehat{\Pi}_n)$.

Assumption 3.3 (PF-I) (i) The penalty function $\widehat{P}_{\lambda_n}(\cdot)$ is non-negative with $\widehat{P}_{\lambda_n}(0) = 0$ and satisfies

$$\widehat{P}_{\lambda_n}(\phi) = o_p(1) \qquad (3.17)$$

for all $\phi \in (0,\infty)$; (ii) $\widehat{P}_{\lambda_n}(\cdot)$ is twice continuously differentiable at $\phi \in (0,\infty)$ with $\widehat{P}''_{\lambda_n}(\phi) = o_p(1)$ for all $\phi \in (0,\infty)$; (iii) $\widehat{P}_{\lambda_n}(\cdot)$ satisfies

$$n^{1/2}\big|\widehat{P}'_{\lambda_n}(\phi)\big| = o_p(1) \qquad (3.18)$$

for any $\phi \in (0,\infty)$; (iv) $\widehat{P}_{\lambda_n}(\cdot)$ is an increasing function on $(0,+\infty)$ and for each $k \in S^c_\lambda$ there exists some sequence $v_{n,k} = o(1)$ such that

$$\liminf_{n\to\infty} \big\{ n\,\widehat{P}_{\lambda_n}\big(n^{-1}v_{n,k}\big) \big\} = \infty \quad a.e. \qquad (3.19)$$

Assumption 3.3(i) essentially states that the shrinkage effect on estimates of the non-zero eigenvalues goes to zero asymptotically. This requirement is important for establishing the consistency of the shrinkage estimator $\widehat{\Pi}_n$. Assumption 3.3(ii) is a useful regularity condition in deriving the convergence rate of $\widehat{\Pi}_n$, and Assumption 3.3(iii) assures a $\sqrt{n}$ convergence rate. Assumption 3.3(iv) ensures that the penalty $\widehat{P}_{\lambda_n}(\cdot)$ attached to estimates of the zero eigenvalues diverges to infinity, which ensures that the zero eigenvalues of $\Pi_o$ are estimated as zeros w.p.a.1 and that cointegration rank selection is consistent. Similar general assumptions are imposed on the penalty function $\widehat{P}_{\lambda_n}(\cdot)$ in Liao (2010), where shrinkage techniques are employed to perform consistent moment selection in a GMM framework.

Theorem 3.1 (Consistency) Suppose Assumptions WN, RR and PF-I(i) are satisfied. Then the shrinkage LS estimator $\widehat{\Pi}_n$ is consistent, i.e. $\widehat{\Pi}_n - \Pi_o = o_p(1)$.

When consistent shrinkage estimators are considered, Theorem 3.1 extends Theorem 1 of Caner and Knight (2009), where shrinkage techniques are used to perform a unit root test. As the eigenvalues $\lambda(\Pi)$ of the matrix $\Pi$ are continuous functions of $\Pi$, we deduce from the consistency of $\widehat{\Pi}_n$ and continuous mapping that $\lambda_k(\widehat{\Pi}_n) \to_p \lambda_k(\Pi_o)$ for all $k = 1, \ldots, m$. Theorem 3.1 implies that the nonzero eigenvalues of $\Pi_o$ are estimated as non-zeros, which means that the rank of $\Pi_o$ will not be under-selected. However, consistency of the estimates of the non-zero eigenvalues is not necessary for consistent cointegration rank selection. In that case what is essential is that the probability limits of the estimates of those (non-zero) eigenvalues are not zero, or at least that their convergence rates are slower than those of estimates of the zero eigenvalues. This point will be pursued in the following section, where it is demonstrated that consistent estimation of the cointegrating rank continues to hold for weakly dependent innovations $\{u_t\}_{t\ge1}$, even though full consistency of $\widehat{\Pi}_n$ does not generally apply in that case.

Our next result gives the convergence rate of the shrinkage estimator $\widehat{\Pi}_n$.

Theorem 3.2 (Rate of Convergence) Denote $a_n = \max_{k \in S_\lambda}\big|\widehat{P}'_{\lambda_n}\big[\phi_k(\Pi_o)\big]\big|$. Under Assumptions WN, RR and PF-I(i)-(ii), the shrinkage LS estimator $\widehat{\Pi}_n$ satisfies the following:

(a) if $r_o = 0$, then $\widehat{\Pi}_n - \Pi_o = O_p(n^{-1} + n^{-1}a_n)$;

(b) if $0 < r_o \le m$, then $\big(\widehat{\Pi}_n - \Pi_o\big)Q^{-1}D_n^{-1} = O_p(1 + n^{1/2}a_n)$.

The term $a_n$ represents the shrinkage bias that the penalty function introduces into the LS shrinkage estimator. If the convergence rate of $\lambda_n$ is fast enough that Assumption PF-I(iii) is satisfied, then Theorem 3.2 implies that $\widehat{\Pi}_n - \Pi_o = O_p(n^{-1})$ when $r_o = 0$ and $\big(\widehat{\Pi}_n - \Pi_o\big)Q^{-1}D_n^{-1} = O_p(1)$ otherwise. Hence, under Assumptions WN, RR and PF-I(i)-(iii), the LS shrinkage estimator $\widehat{\Pi}_n$ has the same stochastic properties as the LS estimator $\widehat{\Pi}_{ols}$. However, we next show that if the tuning parameter $\lambda_n$ converges to zero not too fast, so that Assumption PF-I(iv) is also satisfied, then the correct rank restriction $r = r_o$ is automatically imposed on the LS shrinkage estimator $\widehat{\Pi}_n$ w.p.a.1.

Recall that $S_{n,\lambda}$ is the index set of the largest (or last) $r_o$ ordered eigenvalues of $\widehat{\Pi}_n$ and $S^c_{n,\lambda}$ is the index set of the smallest (or first) $m - r_o$ ordered eigenvalues of $\widehat{\Pi}_n$. By virtue of the consistency of $\lambda(\widehat{\Pi}_n) = \big[\lambda_1(\widehat{\Pi}_n), \ldots, \lambda_m(\widehat{\Pi}_n)\big]$, we can deduce that $S^c_{n,\lambda} \subseteq S^c_\lambda$. To show $S^c_{n,\lambda} = S^c_\lambda$, it suffices to show that $S^c_\lambda \subseteq S^c_{n,\lambda}$.

Theorem 3.3 (Sparsity-I) Under Assumptions WN, RR and PF-I,

$$\Pr\big(\lambda_k(\widehat{\Pi}_n) = 0\big) \to 1 \qquad (3.20)$$

for all $k \in S^c_\lambda$, as $n \to \infty$.

Proof. Let $\widehat{\Pi}_{n,r_o}$ denote the reduced rank regression estimator with the rank constraint $r(\Pi) = r_o$. By the definition of $\widehat{\Pi}_n$,

$$\sum_{t=1}^{n}\Big(\big\|\Delta Y_t - \widehat{\Pi}_n Y_{t-1}\big\|_E^2 - \big\|\Delta Y_t - \widehat{\Pi}_{n,r_o} Y_{t-1}\big\|_E^2\Big) \le n\sum_{k=1}^{m}\Big\{\widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_{n,r_o}\big)\big] - \widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_n\big)\big]\Big\}. \qquad (3.21)$$

The left-hand side of the inequality in (3.21) can be rewritten as

$$L_n := \big[\mathrm{vec}\big(\widehat{\Pi}_{n,r_o} - \widehat{\Pi}_n\big)\big]'\left(\sum_{t=1}^{n} Y_{t-1}Y_{t-1}' \otimes I_m\right)\big[\mathrm{vec}\big(2\Pi_o - \widehat{\Pi}_{n,r_o} - \widehat{\Pi}_n\big)\big] + 2\big[\mathrm{vec}\big(\widehat{\Pi}_{n,r_o} - \widehat{\Pi}_n\big)\big]'\,\mathrm{vec}\left(\sum_{t=1}^{n} Y_{t-1}u_t'\right). \qquad (3.22)$$

We invoke Theorem 3.2 and Theorem 3.1 in Liao and Phillips (2010) to deduce that $L_n = O_p(1)$.

On the event $\{\phi_{k_o}(\widehat{\Pi}_n) \neq 0\}$ for some $k_o \in S^c_\lambda$, the right-hand side of the inequality in (3.21) can be bounded in the following way:

$$R_n := n\sum_{k\in S_\lambda}\Big\{\widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_{n,r_o}\big)\big] - \widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_n\big)\big]\Big\} - n\sum_{k\in S^c_\lambda}\widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_n\big)\big] \le -n\widehat{P}_{\lambda_n}\big[\phi_{k_o}\big(\widehat{\Pi}_n\big)\big] + n\sum_{k\in S_\lambda}\Big\{\widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_{n,r_o}\big)\big] - \widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_n\big)\big]\Big\}. \qquad (3.23)$$

From Assumptions PF-I(ii)-(iii), the Bauer-Fike Theorem and Theorem 3.1 in Liao and Phillips (2010), we have

$$n\sum_{k\in S_\lambda}\Big\{\widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_{n,r_o}\big)\big] - \widehat{P}_{\lambda_n}\big[\phi_k\big(\widehat{\Pi}_n\big)\big]\Big\} \le n^{1/2}\widehat{P}'_{\lambda_n}\big[\phi_k(\Pi_o)\big]\,\big\|n^{1/2}\big(\widehat{\Pi}_{n,r_o} - \widehat{\Pi}_n\big)\big\|_E + [1 + o_p(1)]\,\widehat{P}''_{\lambda_n}\big[\phi_k(\Pi_o)\big]\,\big\|n^{1/2}\big(\widehat{\Pi}_{n,r_o} - \widehat{\Pi}_n\big)\big\|_E^2 = O_p(1). \qquad (3.24)$$

Suppose that $\Pr\big(\phi_{k_o}(\widehat{\Pi}_n) \neq 0\big) \to \varsigma \in (0,1]$; then on the event $\{\phi_{k_o}(\widehat{\Pi}_n) \neq 0\}$ the restriction $\phi_{k_o}(\Pi) = 0$ is not imposed on the LS estimator $\widehat{\Pi}_n$ asymptotically. Hence we can use Assumption PF-I(iv) and Theorem 4.1 in Liao and Phillips (2010) to deduce that

$$n\widehat{P}_{\lambda_n}\big[\phi_{k_o}\big(\widehat{\Pi}_n\big)\big] = n\widehat{P}_{\lambda_n}\big\{n^{-1}\big[n\,\phi_{k_o}\big(\widehat{\Pi}_n\big)\big]\big\} \ge n\widehat{P}_{\lambda_n}\big(n^{-1}v_n\big) \to_p \infty \qquad (3.25)$$

as $n \to \infty$. From the results in (3.23)-(3.25), we have $R_n \to -\infty$ as $n \to \infty$ and $L_n > R_n$ with nontrivial probability, which contradicts the definition of $\widehat{\Pi}_n$. Hence it follows that $\Pr\big(\lambda_k(\widehat{\Pi}_n) = 0\big) \to 1$ for all $k \in S^c_\lambda$, as $n \to \infty$.

Theorem 3.1 and the CMT imply that

$$\lambda_k(\widehat{\Pi}_n) \to_p \lambda_k(\Pi_o). \qquad (3.26)$$

Combining Theorem 3.1 and Theorem 3.3, we deduce that

$$\Pr(S_{n,\lambda} = S_\lambda) \to 1, \qquad (3.27)$$

which implies consistent cointegration rank selection, giving the following result.

Corollary 3.4 Under Assumptions WN, RR and PF-I, we have

$$\Pr\big(r(\widehat{\Pi}_n) = r_o\big) \to 1 \qquad (3.28)$$

as $n \to \infty$, where $r(\widehat{\Pi}_n)$ denotes the rank of $\widehat{\Pi}_n$.

It is clear that Assumption PF-I plays an important role in deriving consistent cointegration rank selection. We next show that the adaptive Lasso and bridge penalty functions satisfy Assumption PF-I; the SCAD penalty satisfies Assumptions PF-I(i)-(iii) but fails to satisfy Assumption PF-I(iv).

Corollary 3.5 The adaptive Lasso, bridge and SCAD penalty functions have the following properties:

(a) If $\lambda_n = o(1)$, then the adaptive Lasso and bridge penalty functions satisfy Assumptions PF-I(i)-(ii) and the SCAD penalty function satisfies Assumptions PF-I(i)-(iii);

(b) if $n^{1/2}\lambda_n = o(1)$, then the adaptive Lasso and bridge penalty functions satisfy Assumptions PF-I(ii)-(iii);

(c) if $\lambda_n n^{\omega} \to \infty$, then the adaptive Lasso penalty satisfies Assumption PF-I(iv); if $n^{1-\gamma}\lambda_n \to \infty$, then the bridge penalty satisfies Assumption PF-I(iv);

(d) the SCAD penalty function does not satisfy Assumption PF-I(iv).

Proof. (a). First it is clear that the adaptive Lasso, bridge and SCAD penalty functionsall satisfy bP�n(0) = 0. For the adaptive Lasso penalty,

bP�n [��k(�o)] = �nj�k(b�ols)j�!j�k(�o)j = op(1)14

Page 15: Automated Estimation of Vector Error Correction Models - American

for all k 2 S�, by the consistency of �k(b�ols) and the Slutsky Theorem. The adaptiveLasso penalty is clearly twice di¤erentiable with bP 00�n(�) = 0 for all � 2 (0;1), and thussatis�es Assumptions PF.(i)-(ii). For the bridge penalty,

�n bP�n [��k(�o)] = �nj�k(�o)j = o(1);for all k and bP 00�n(�) = ( � 1)�nj�j �2 ! 0;

for any � 2 (0;1). Hence the bridge penalty satis�es Assumptions PF.(i)-(ii). For theSCAD penalty, when n is su¢ ciently large, we have

bP�n [��k(�o)] = (a+ 1)�2n2

= o(1);

for all k 2 S�. Furthermore when n is su¢ ciently large, the SCAD penalty function is

clearly twice di¤erentiable at �k(�o) (k 2 S�) with

pn bP 0�n [��k(�o)] = pn [�na� j�k(�o)j]+(a� 1) I(� > �n) = 0; (3.29)

where [x]+ = maxf0; xg, and bP 00�n [��k(�o)] = 0 for all k 2 S�. Hence the SCAD penalty

satis�es Assumptions PF.(i)-(iii).

(b). For the adaptive Lasso penalty,
$$n^{\frac{1}{2}}\hat{P}'_{\lambda_n}[|\lambda_k(\Pi_o)|] = n^{\frac{1}{2}}\lambda_n|\lambda_k(\hat{\Pi}_{ols})|^{-\omega} = o_p(1),$$
where the last equality is by the consistency of $\lambda_k(\hat{\Pi}_{ols})$ and the Slutsky Theorem. For the bridge penalty,
$$n^{\frac{1}{2}}\hat{P}'_{\lambda_n}[|\lambda_k(\Pi_o)|] = n^{\frac{1}{2}}\gamma\lambda_n|\lambda_k(\Pi_o)|^{\gamma-1} = o(1).$$
Hence both the adaptive Lasso and bridge penalty functions satisfy Assumption PF.(iii).

(c). For the adaptive Lasso penalty, by Lemma 3.1, we can choose $v_{n,k} = n^{-\frac{\omega}{2}}\lambda_n^{-\frac{1}{2}} = o(1)$ such that
$$n\hat{P}_{\lambda_n}(n^{-1}v_{n,k}) = |n\lambda_k(\hat{\Pi}_{ols})|^{-\omega}\,|n^{\omega}\lambda_nv_{n,k}| = |n\lambda_k(\hat{\Pi}_{ols})|^{-\omega}\,n^{\frac{\omega}{2}}\lambda_n^{\frac{1}{2}} \to_p \infty$$
for all $k \in S_\lambda^c$. Thus the adaptive Lasso penalty satisfies Assumption PF.(iv). For the bridge penalty, we can choose $v_{n,k} = (n^{1-\gamma}\lambda_n)^{-\frac{1}{2}} = o(1)$ such that
$$n\hat{P}_{\lambda_n}(n^{-1}v_{n,k}) = n^{1-\gamma}\lambda_n|v_{n,k}|^{\gamma} = (n^{1-\gamma}\lambda_n)^{1-\frac{\gamma}{2}} \to \infty$$
for all $k \in S_\lambda^c$. Hence the bridge penalty function satisfies Assumption PF.(iv).

(d). For the SCAD penalty, if $n\lambda_n \to \infty$ and $\lambda_n \to 0$, then
$$n\hat{P}_{\lambda_n}(n^{-1}v_{n,k}) = \lambda_n|v_{n,k}| \to 0$$
for any $v_{n,k} = o(1)$. If $n\lambda_n \to 0$, then
$$n\hat{P}_{\lambda_n}(n^{-1}v_{n,k}) = \frac{a\lambda_n|v_{n,k}|}{a-1} - \frac{n^{-1}v_{n,k}^2 + n\lambda_n^2}{2(a-1)} \to 0,$$
or
$$n\hat{P}_{\lambda_n}(n^{-1}v_{n,k}) = \frac{(a+1)n\lambda_n^2}{2} \to 0.$$
Hence there exists no sequence $v_{n,k}$ such that Assumption PF.(iv) is satisfied. $\blacksquare$

From the sparsity and consistency of $\lambda_k(\hat{\Pi}_n)$ ($k = 1,\dots,m$), we can deduce that the rank constraint $r(\Pi) = r_o$ is imposed on the LS shrinkage estimator $\hat{\Pi}_n$ w.p.a.1. Hence, for large enough $n$, the shrinkage estimator $\hat{\Pi}_n$ can be decomposed as $\hat{\alpha}_n\hat{\beta}_n'$ w.p.a.1, where $\hat{\alpha}_n$ and $\hat{\beta}_n$ are some $m \times r_o$ matrices. Without loss of generality, we assume the first $r_o$ columns of $\Pi_o$ are linearly independent. To ensure identification, we normalize $\beta_o$ as $\beta_o = [I_{r_o}, O_{r_o}]'$, where $O_{r_o}$ is some $r_o \times (m-r_o)$ matrix such that
$$\Pi_o = \alpha_o\beta_o' = [\alpha_o,\ \alpha_oO_{r_o}]. \quad (3.30)$$
Hence $\alpha_o$ is the first $r_o$ columns of $\Pi_o$, which is an $m \times r_o$ matrix with full rank, and $O_{r_o}$ is uniquely determined by the equation $\alpha_oO_{r_o} = \Pi_{o,2}$, where $\Pi_{o,2}$ denotes the last $m-r_o$ columns of $\Pi_o$. Correspondingly, for large enough $n$ we can normalize $\hat{\beta}_n$ as $\hat{\beta}_n = [I_{r_o}, \hat{O}_n]'$, where $\hat{O}_n$ is some $r_o \times (m-r_o)$ matrix.

From Theorem 3.2 and Assumption 3.3.(ii), we have
$$O_p(1) = (\hat{\Pi}_n - \Pi_o)Q^{-1}D_n^{-1} = (\hat{\Pi}_n - \Pi_o)\left[\sqrt{n}\,\beta_o(\beta_o'\beta_o)^{-1},\ n\,\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1}\right]. \quad (3.31)$$
From (3.31), we can deduce that
$$n\,\hat{\alpha}_n(\hat{\beta}_n - \beta_o)'\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1} = O_p(1),$$
which implies
$$n(\hat{O}_n - O_o) = O_p(1). \quad (3.32)$$
Similarly, we have
$$\sqrt{n}\left[(\hat{\alpha}_n - \alpha_o)\hat{\beta}_n'\beta_o(\beta_o'\beta_o)^{-1} + \alpha_o(\hat{\beta}_n - \beta_o)'\beta_o(\beta_o'\beta_o)^{-1}\right] = O_p(1),$$
which combined with (3.32) implies
$$\sqrt{n}\,(\hat{\alpha}_n - \alpha_o) = O_p(1). \quad (3.33)$$

To conclude this section, we give the centered limit distribution of the shrinkage estimator $\hat{\Pi}_n$ in the following theorem.

Theorem 3.6 Under Assumptions WN, RR and PF, we have
$$Q(\hat{\Pi}_n - \Pi_o)Q^{-1}D_n^{-1} \to_d \begin{pmatrix} \beta_o'B_{1,m}\Sigma_{z_1z_1}^{-1} & \phi_oB_{2,m}\left(\int B_{w_2}B_{w_2}'\right)^{-1} \\ \beta_{o\perp}'B_{1,m}\Sigma_{z_1z_1}^{-1} & 0 \end{pmatrix} \quad (3.34)$$
where $B_{1,m} = N(0, \Omega_u \otimes \Sigma_{z_1z_1})$, $B_{2,m} = \left(\int B_{w_2}dB_u'\right)'$ and $\phi_o = \beta_o'\alpha_o(\alpha_o'\alpha_o)^{-1}\alpha_o'$.

Proof. By the sparsity of $\lambda_k(\hat{\Pi}_n)$ ($k = 1,\dots,m$), we can deduce that $\hat{\alpha}_n$ and $\hat{\beta}_n$ minimize the following criterion function w.p.a.1:
$$V_{3,n}(\alpha,\beta) = \sum_{t=1}^{n}\left\|\Delta Y_t - \alpha\beta'Y_{t-1}\right\|_E^2 + n\sum_{k\in S_\lambda}\hat{P}_{\lambda_n}\left[|\lambda_k(\alpha\beta')|\right]. \quad (3.35)$$

Define $U_{1,n}^* = \sqrt{n}\,(\hat{\alpha}_n - \alpha_o)$ and $U_{3,n}^* = n(\hat{\beta}_n - \beta_o)' = \left[0_{r_o},\ n(\hat{O}_n - O_o)\right] := \left[0_{r_o},\ U_{2,n}^*\right]$; then
$$(\hat{\Pi}_n - \Pi_o)Q^{-1}D_n^{-1} = \left[\hat{\alpha}_n(\hat{\beta}_n - \beta_o)' + (\hat{\alpha}_n - \alpha_o)\beta_o'\right]Q^{-1}D_n^{-1} = \left[n^{-\frac{1}{2}}\hat{\alpha}_nU_{3,n}^*\beta_o(\beta_o'\beta_o)^{-1} + U_{1,n}^*,\ \hat{\alpha}_nU_{3,n}^*\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1}\right].$$

Define
$$\Phi_n(U) = \left[n^{-\frac{1}{2}}\hat{\alpha}_nU_3\beta_o(\beta_o'\beta_o)^{-1} + U_1,\ \hat{\alpha}_nU_3\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1}\right],$$
where $U_3 = [0_{r_o}, U_2]$; then, by definition, $U_n^* = (U_{1,n}^*, U_{2,n}^*)$ minimizes the following criterion function w.p.a.1:
$$V_{4,n}(U) = \sum_{t=1}^{n}\left[\left\|\Delta Y_t - \Pi_oY_{t-1} - \Phi_n(U)D_nZ_{t-1}\right\|_E^2 - \left\|\Delta Y_t - \Pi_oY_{t-1}\right\|_E^2\right] + \sum_{k\in S_\lambda}n\left\{\hat{P}_{\lambda_n}\left[|\lambda_k(\Phi_n(U)D_nQ + \Pi_o)|\right] - \hat{P}_{\lambda_n}\left[|\lambda_k(\Pi_o)|\right]\right\}.$$

For any compact set $K \subset \mathbb{R}^{m\times r_o} \times \mathbb{R}^{r_o\times(m-r_o)}$ and any $U \in K$, we have
$$\Phi_n(U)D_nQ = O_p(n^{-\frac{1}{2}}).$$
Hence, from Assumptions PF.(ii)-(iii) and the Bauer-Fike Theorem, we can deduce that
$$n\left|\hat{P}_{\lambda_n}\left[|\lambda_k(\Phi_n(U)D_nQ + \Pi_o)|\right] - \hat{P}_{\lambda_n}\left[|\lambda_k(\Pi_o)|\right]\right| \le Cn\max_{k\in S_\lambda}\left\{\hat{P}'_{\lambda_n}\left[|\lambda_k(\Pi_o)|\right]\right\}\left\|\Phi_n(U)D_nQ\right\|_E = O_p(n^{\frac{1}{2}})\max_{k\in S_\lambda}\left\{\hat{P}'_{\lambda_n}\left[|\lambda_k(\Pi_o)|\right]\right\} = o_p(1), \quad (3.36)$$

uniformly over $U \in K$.

As $\beta_o = [I_{r_o}, O_{r_o}]'$, by definition we have $\beta_{o\perp} = [-O_{r_o}', I_{m-r_o}]'$ and hence
$$\Phi_n(U) \to_p \left[U_1,\ (0_{m\times r_o}, \alpha_oU_2)\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1}\right] = \left[U_1,\ \alpha_oU_2(\beta_{o\perp}'\beta_{o\perp})^{-1}\right] =: \Phi_1(U) \quad (3.37)$$

uniformly over $U \in K$. By Lemma 10.1 and (3.37), we deduce that
$$\sum_{t=1}^{n}\left[\left\|\Delta Y_t - \Pi_oY_{t-1} - \Phi_n(U)D_nZ_{t-1}\right\|_E^2 - \left\|\Delta Y_t - \Pi_oY_{t-1}\right\|_E^2\right]$$
$$= \mathrm{vec}\left[\Phi_n(U)\right]'\left(D_n\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n \otimes I_m\right)\mathrm{vec}\left[\Phi_n(U)\right] - 2\,\mathrm{vec}\left[\Phi_n(U)\right]'\mathrm{vec}\left(\sum_{t=1}^{n}u_tZ_{t-1}'D_n\right)$$
$$\to_d \mathrm{vec}\left[\Phi_1(U)\right]'\left[\begin{pmatrix}\Sigma_{z_1z_1} & 0\\ 0 & \int B_{w_2}B_{w_2}'\end{pmatrix} \otimes I_m\right]\mathrm{vec}\left[\Phi_1(U)\right] - 2\,\mathrm{vec}\left[\Phi_1(U)\right]'\mathrm{vec}\left[(B_{1,m}, B_{2,m})\right] := V_2(U) \quad (3.38)$$

uniformly over $U \in K$. Note that $\mathrm{vec}[\Phi_1(U)] = \left[\mathrm{vec}(U_1)',\ \mathrm{vec}(\alpha_oU_2(\beta_{o\perp}'\beta_{o\perp})^{-1})'\right]'$ and $\mathrm{vec}(\alpha_oU_2(\beta_{o\perp}'\beta_{o\perp})^{-1}) = \left[(\beta_{o\perp}'\beta_{o\perp})^{-1} \otimes \alpha_o\right]\mathrm{vec}(U_2)$. Hence $V_2(U)$ can be rewritten as
$$V_2(U) = \mathrm{vec}(U_1)'\left[\Sigma_{z_1z_1} \otimes I_m\right]\mathrm{vec}(U_1) + \mathrm{vec}(U_2)'\left[(\beta_{o\perp}'\beta_{o\perp})^{-1}\int B_{w_2}B_{w_2}'(\beta_{o\perp}'\beta_{o\perp})^{-1} \otimes \alpha_o'\alpha_o\right]\mathrm{vec}(U_2)$$
$$- 2\,\mathrm{vec}(U_1)'\mathrm{vec}(B_{1,m}) - 2\,\mathrm{vec}(U_2)'\mathrm{vec}\left[\alpha_o'B_{2,m}(\beta_{o\perp}'\beta_{o\perp})^{-1}\right]. \quad (3.39)$$

The expression in (3.39) makes it clear that $V_2(U)$ is uniquely minimized at $(U_1^*, U_2^*)$, where
$$U_1^* = B_{1,m}\Sigma_{z_1z_1}^{-1} \quad\text{and}\quad U_2^* = (\alpha_o'\alpha_o)^{-1}\alpha_o'B_{2,m}\left(\int B_{w_2}B_{w_2}'\right)^{-1}(\beta_{o\perp}'\beta_{o\perp}). \quad (3.40)$$

From (3.32) and (3.33), we can see that $U_n^*$ is asymptotically tight. Invoking the Argmax Continuous Mapping Theorem (ACMT), we can deduce that
$$U_n^* = (U_{1,n}^*, U_{2,n}^*) \to_d (U_1^*, U_2^*).$$
Applying the CMT, we get
$$Q(\hat{\Pi}_n - \Pi_o)Q^{-1}D_n^{-1} \to_d \begin{pmatrix}\beta_o'U_1^* & \beta_o'\alpha_oU_2^*(\beta_{o\perp}'\beta_{o\perp})^{-1}\\ \beta_{o\perp}'U_1^* & 0\end{pmatrix},$$
which finishes the proof. $\blacksquare$

Remark 3.7 The generalized shrinkage LS estimator is defined as
$$\hat{\Pi}_{g,n} = \arg\min_{\Pi}\sum_{t=1}^{n}(\Delta Y_t - \Pi Y_{t-1})'\hat{\Omega}_{u,n}^{-1}(\Delta Y_t - \Pi Y_{t-1}) + n\sum_{k=1}^{m}\hat{P}_{\lambda_n}\left[|\lambda_k(\Pi)|\right], \quad (3.41)$$
where $\hat{\Omega}_{u,n}$ is some consistent estimator of $\Omega_u$. The consistency and convergence rate of $\hat{\Pi}_{g,n}$ and the sparsity of $\lambda_k(\hat{\Pi}_{g,n})$ ($k = 1,\dots,m$) can be established similarly. Following

similar arguments to those of Theorem 3.6, we can deduce that
$$\sum_{t=1}^{n}\left(u_t - \Phi_n(U)D_nZ_{t-1}\right)'\hat{\Omega}_{u,n}^{-1}\left(u_t - \Phi_n(U)D_nZ_{t-1}\right) - \sum_{t=1}^{n}u_t'\hat{\Omega}_{u,n}^{-1}u_t$$
$$\to_d \mathrm{vec}(U_1)'\left[\Sigma_{z_1z_1} \otimes \Omega_u^{-1}\right]\mathrm{vec}(U_1) + \mathrm{vec}(U_2)'\left[(\beta_{o\perp}'\beta_{o\perp})^{-1}\int B_{w_2}B_{w_2}'(\beta_{o\perp}'\beta_{o\perp})^{-1} \otimes \alpha_o'\Omega_u^{-1}\alpha_o\right]\mathrm{vec}(U_2)$$
$$- 2\,\mathrm{vec}(U_1)'\mathrm{vec}\left(\Omega_u^{-1}B_{1,m}\right) - 2\,\mathrm{vec}(U_2)'\mathrm{vec}\left[\alpha_o'\Omega_u^{-1}B_{2,m}(\beta_{o\perp}'\beta_{o\perp})^{-1}\right] := V_3(U), \quad (3.42)$$
uniformly over $U \in K$. $V_3(U)$ is uniquely minimized at $(U_{g,1}^*, U_{g,2}^*)$, where $U_{g,1}^* = B_{1,m}\Sigma_{z_1z_1}^{-1}$ and
$$U_{g,2}^* = (\alpha_o'\Omega_u^{-1}\alpha_o)^{-1}\alpha_o'\Omega_u^{-1}B_{2,m}\left(\int B_{w_2}B_{w_2}'\right)^{-1}(\beta_{o\perp}'\beta_{o\perp}).$$

Invoking the ACMT, we deduce that
$$\sqrt{n}\,(\hat{\alpha}_{g,n} - \alpha_o) \to_d B_{1,m}\Sigma_{z_1z_1}^{-1}, \quad (3.43)$$
and
$$n(\hat{O}_{g,n} - O_o) \to_d (\alpha_o'\Omega_u^{-1}\alpha_o)^{-1}\alpha_o'\Omega_u^{-1}B_{2,m}\left(\int B_{w_2}B_{w_2}'\right)^{-1}(\beta_{o\perp}'\beta_{o\perp}). \quad (3.44)$$

In the triangular representation of a cointegrated system studied in Phillips (1991), we have $\alpha_o = [I_{r_o}, 0_{r_o\times(m-r_o)}]'$ and $\beta_o = [I_{r_o}, O_o]'$. Thus $\alpha_{o\perp}'\alpha_{o\perp} = I_{m-r_o}$ and
$$\alpha_o'\Omega_u^{-1}\alpha_o = \Omega_{u,11\cdot2}^{-1} \quad\text{and}\quad \alpha_o'\Omega_u^{-1} = \Omega_{u,11\cdot2}^{-1}\left[I_{r_o},\ -\Omega_{u,12}\Omega_{u,22}^{-1}\right], \quad (3.45)$$
where $\Omega_{u,11\cdot2} = \Omega_{u,11} - \Omega_{u,12}\Omega_{u,22}^{-1}\Omega_{u,21}$ and $\Omega_{u,11}$, $\Omega_{u,12}$ and $\Omega_{u,22}$ denote the leading $r_o\times r_o$ sub-matrix of $\Omega_u$, the upper-right $r_o\times(m-r_o)$ sub-matrix of $\Omega_u$ and the last $(m-r_o)\times(m-r_o)$ sub-matrix of $\Omega_u$, respectively. Using (3.44) and (3.45), we have
$$n(\hat{O}_{g,n} - O_o) \to_d \int dB_{u_{1\cdot2}}B_{u_2}'\left(\int B_{u_2}B_{u_2}'\right)^{-1}, \quad (3.46)$$
where $B_{u_1}$ and $B_{u_2}$ denote the first $r_o$ and last $m-r_o$ elements of $B_u$, and $B_{u_{1\cdot2}} = B_{u_1} - \Omega_{u,12}\Omega_{u,22}^{-1}B_{u_2}$. From the result in (3.46), we can see that the generalized shrinkage LS estimator $\hat{O}_{g,n}$ of the cointegration matrix $O_o$ is asymptotically equivalent to the maximum likelihood estimator studied in Phillips (1991).
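The partitioned-matrix formulas in (3.45) are an instance of the standard block-inverse identity and can be checked numerically. The sketch below (Python/NumPy; the dimensions and the particular positive definite matrix are illustrative choices of ours, not from the paper) verifies that $\alpha_o'\Omega_u^{-1} = \Omega_{u,11\cdot2}^{-1}[I_{r_o}, -\Omega_{u,12}\Omega_{u,22}^{-1}]$ when $\alpha_o = [I_{r_o}, 0]'$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, ro = 3, 1
M = rng.normal(size=(m, m))
Omega_u = M @ M.T + np.eye(m)            # an arbitrary positive definite error variance

O11 = Omega_u[:ro, :ro]                  # leading ro x ro block
O12 = Omega_u[:ro, ro:]                  # upper-right ro x (m-ro) block
O22 = Omega_u[ro:, ro:]                  # last (m-ro) x (m-ro) block
O11_2 = O11 - O12 @ np.linalg.inv(O22) @ O12.T   # Omega_{u,11.2}

# alpha_o = [I_ro, 0]' picks the first block row of Omega_u^{-1}
alpha = np.vstack([np.eye(ro), np.zeros((m - ro, ro))])
lhs = alpha.T @ np.linalg.inv(Omega_u)
rhs = np.linalg.inv(O11_2) @ np.hstack([np.eye(ro), -O12 @ np.linalg.inv(O22)])
assert np.allclose(lhs, rhs)             # the identity behind (3.45) holds numerically
```

The identity holds for any positive definite $\Omega_u$, which is why (3.45) delivers the explicit form of the limit distribution in (3.46).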

4 Extension I: VECM Estimation with Weakly Dependent Innovations

In this section, we study shrinkage reduced rank estimation in a scenario where the equation innovations $\{u_t\}_{t\ge1}$ are weakly dependent. Specifically, we assume that $\{u_t\}_{t\ge1}$ is generated by a linear process satisfying the following condition.

Assumption 4.1 (LP) Let $D(L) = \sum_{j=0}^{\infty}D_jL^j$, where $D_0 = I_m$ and $D(1)$ has full rank. Let $u_t$ have the Wold representation
$$u_t = D(L)\varepsilon_t = \sum_{j=0}^{\infty}D_j\varepsilon_{t-j}, \quad\text{with}\quad \sum_{j=0}^{\infty}j^{\frac{1}{2}}\|D_j\|_E < \infty, \quad (4.1)$$
where $\varepsilon_t$ is $i.i.d.(0, \Sigma_\varepsilon)$ with $\Sigma_\varepsilon$ positive definite and finite fourth moments.

Denote the long-run variance of $\{u_t\}_{t\ge1}$ as $\Omega_u = \sum_{h=-\infty}^{\infty}\Sigma_{uu}(h)$. From the Wold representation in (4.1), we have $\Omega_u = D(1)\Sigma_\varepsilon D(1)'$, which is positive definite because $D(1)$ has full rank and $\Sigma_\varepsilon$ is positive definite. The fourth moment assumption is needed for the limit distribution of sample autocovariances in the case of misspecified transient dynamics.
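The closed form $\Omega_u = D(1)\Sigma_\varepsilon D(1)'$ is easy to evaluate for a finite-order example. A minimal sketch (Python/NumPy; the MA(1) coefficient matrix `D1` and the identity innovation variance are illustrative choices of ours, not values from the paper):

```python
import numpy as np

m = 2
D1 = np.array([[0.3, 0.1],
               [0.0, 0.2]])             # illustrative MA(1) coefficient matrix
Sigma_eps = np.eye(m)                    # innovation variance (positive definite)

# For u_t = eps_t + D1 eps_{t-1}, D(1) = I + D1 and the long-run variance is
# Omega_u = D(1) Sigma_eps D(1)'.
D_one = np.eye(m) + D1
Omega_u = D_one @ Sigma_eps @ D_one.T

# Positive definite because D(1) has full rank and Sigma_eps > 0.
assert np.all(np.linalg.eigvalsh(Omega_u) > 0)
print(Omega_u)
```

The same computation with $D(1) = \sum_j D_j$ applies to any linear process satisfying the summability condition in (4.1).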

The following lemma is useful in establishing the asymptotic properties of the shrinkage estimator with weakly dependent innovations.

Lemma 4.1 Under Assumptions 3.2 and 4.1, (a)-(c) and (e) of Lemma 10.1 are unchanged, while Lemma 10.1.(d) becomes
$$n^{-\frac{1}{2}}\sum_{t=1}^{n}\left[u_tZ_{1,t-1}' - \Lambda_{uz_1}(1)\right] \to_d N(0, V_{uz_1}), \quad (4.2)$$
where $\Lambda_{uz_1}(1) = \sum_{j=0}^{\infty}\Sigma_{uu}(j)\beta_o(R^j)' < \infty$ and $V_{uz_1}$ is the long-run variance matrix of $u_t \otimes Z_{1,t-1}$.

Proof. From the partial sum expression in (3.7), we get $Z_{1,t-1} = \beta_o'Y_{t-1} = R(L)\beta_o'u_t$, which implies that $\{\beta_o'Y_{t-1}\}_{t\ge1}$ is a stationary process. Note that
$$E\left[u_tZ_{1,t-1}'\right] = \sum_{j=0}^{\infty}E\left[u_tu_{t-j}'\right]\beta_o(R^j)' = \sum_{j=0}^{\infty}\Sigma_{uu}(j)\beta_o(R^j)' < \infty.$$
Using a CLT for linear process time series (e.g. the multivariate version of Theorem 8 and Remark 3.9 of Phillips and Solo, 1992), we deduce that
$$n^{-\frac{1}{2}}\sum_{t=1}^{n}\left[u_tZ_{1,t-1}' - \Lambda_{uz_1}(1)\right] \to_d N(0, V_{uz_1}),$$
which establishes (4.2). The results of (a)-(c) and (e) can be proved using similar arguments to those of Lemma 10.1. $\blacksquare$

As expected, under general weak dependence assumptions on $u_t$, the simple reduced rank regression models (2.1) and (3.1) are susceptible to the effects of potential misspecification in the transient dynamics. These effects bear on the stationary components in the system. In particular, due to the centering term $\Lambda_{uz_1}(1)$ in (4.2), both the OLS estimator $\hat{\Pi}_{ols}$ and the shrinkage estimator $\hat{\Pi}_n$ are asymptotically biased. Specifically, we show that $\hat{\Pi}_{ols}$ has the following probability limit:
$$\hat{\Pi}_{ols} \to_p \Pi_1 := Q^{-1}H_oQ + \Pi_o, \quad (4.3)$$
where $H_o = \left[\Lambda_{wz_1}(1)\Sigma_{z_1z_1}^{-1},\ 0_{m\times(m-r_o)}\right]$. Note that
$$Q^{-1}H_oQ + \Pi_o = \left[\alpha_o + \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}\right]\beta_o' = \tilde{\alpha}_o\beta_o', \quad (4.4)$$

which implies that the asymptotic bias of the OLS estimator $\hat{\Pi}_{ols}$ is introduced via the bias in the pseudo-true value limit $\tilde{\alpha}_o$. Observe also that $\Pi_1 = \tilde{\alpha}_o\beta_o'$ has rank at most equal to $r_o$, the number of rows in $\beta_o'$. Denote

$$\hat{S}_{12} = \sum_{t=1}^{n}\frac{Z_{1,t-1}Z_{2,t-1}'}{n}, \quad \hat{S}_{21} = \sum_{t=1}^{n}\frac{Z_{2,t-1}Z_{1,t-1}'}{n}, \quad \hat{S}_{11} = \sum_{t=1}^{n}\frac{Z_{1,t-1}Z_{1,t-1}'}{n} \quad\text{and}\quad \hat{S}_{22} = \sum_{t=1}^{n}\frac{Z_{2,t-1}Z_{2,t-1}'}{n}.$$

The next lemma presents some asymptotic properties of the bias in the OLS estimator $\hat{\Pi}_{ols}$.

Lemma 4.2 Let $H_n = n\left[\Lambda_{wz_1}(1),\ 0_{m\times(m-r_o)}\right]\left(\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'\right)^{-1}$ and $\Pi_{1,n} = Q^{-1}H_nQ + \Pi_o$. Then

(a) $H_n$ converges in probability to $H_o$, i.e. $H_n \to_p H_o$;

(b) $nQ^{-1}H_nQ\beta_{o\perp}$ has limit distribution $\tilde{\Pi}_1\beta_{o\perp}$, where
$$\tilde{\Pi}_1 = -\Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}\left[\left(\int B_{w_2}dB_{w_1}'\right)' + \Lambda_{w_1w_2}\right]\left(\int B_{w_2}B_{w_2}'\right)^{-1}\beta_{o\perp}'; \quad (4.5)$$

(c) $\sqrt{n}\,Q^{-1}(H_n - H_o)Q\beta_o$ has the limit distribution $\tilde{\Pi}_2\beta_o$, where
$$\tilde{\Pi}_2 = \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}N(0, V_{z_1z_1})\Sigma_{z_1z_1}^{-1}\beta_o' \quad (4.6)$$
and $N(0, V_{z_1z_1})$ denotes the matrix limit distribution of $\sqrt{n}\,(\hat{S}_{11} - \Sigma_{z_1z_1})$.

Proof. (a). Denote $\Lambda_{wz_1}(1) = Q\Lambda_{uz_1}(1)$. By Lemma 4.1, we have
$$H_n = \left[\Lambda_{wz_1}(1),\ 0_{m\times(m-r_o)}\right]n^{\frac{1}{2}}D_n\left(D_n\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n\right)^{-1}D_nn^{\frac{1}{2}}$$
$$= \left[\Lambda_{wz_1}(1),\ 0_{m\times(m-r_o)}\right]\left(D_n\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n\right)^{-1}\begin{pmatrix}I_{r_o} & 0\\ 0 & n^{-\frac{1}{2}}I_{m-r_o}\end{pmatrix} \to_p \left[\Lambda_{wz_1}(1)\Sigma_{z_1z_1}^{-1},\ 0\right] = H_o. \quad (4.7)$$

(b). From the expression for $H_n$ in the first line of (4.7), we get
$$nQ^{-1}H_nQ\beta_{o\perp} = \left[\Lambda_{uz_1}(1),\ 0\right]\left(D_n\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n\right)^{-1}\begin{pmatrix}0\\ n^{\frac{1}{2}}\beta_{o\perp}'\beta_{o\perp}\end{pmatrix}$$
$$= -\Lambda_{uz_1}(1)\hat{S}_{11}^{-1}\hat{S}_{12}\left(n^{-1}\hat{S}_{22} - n^{-1}\hat{S}_{21}\hat{S}_{11}^{-1}\hat{S}_{12}\right)^{-1}\beta_{o\perp}'\beta_{o\perp}. \quad (4.8)$$
By Lemma 4.1 and the CMT, we have
$$\left(n^{-1}\hat{S}_{22} - n^{-1}\hat{S}_{21}\hat{S}_{11}^{-1}\hat{S}_{12}\right)^{-1} \to_d \left(\int B_{w_2}B_{w_2}'\right)^{-1}. \quad (4.9)$$
Next note that
$$\hat{S}_{12} \to_d \left(\int B_{w_2}dB_{w_1}'\right)' + \Lambda_{w_1w_2}. \quad (4.10)$$
The claimed result now follows by applying (4.9), (4.10), Lemma 4.1 and the CMT to the expression in (4.8).

(c). From the expression for $H_n$ in the first line of (4.7), we get
$$\sqrt{n}\,Q^{-1}(H_n - H_o)Q\beta_o = \sqrt{n}\left[\Lambda_{uz_1}(1),\ 0\right]\left\{\begin{pmatrix}\hat{S}_{11} & n^{-\frac{1}{2}}\hat{S}_{12}\\ n^{-\frac{1}{2}}\hat{S}_{21} & n^{-1}\hat{S}_{22}\end{pmatrix}^{-1} - \begin{pmatrix}\Sigma_{z_1z_1}^{-1} & 0\\ 0 & 0\end{pmatrix}\right\}Q\beta_o$$
$$= \sqrt{n}\,\Lambda_{uz_1}(1)\left[H_{n,11} - \Sigma_{z_1z_1}^{-1}\right]\beta_o'\beta_o, \quad (4.11)$$
where $H_{n,11} = \left(\hat{S}_{11} - \hat{S}_{12}\hat{S}_{22}^{-1}\hat{S}_{21}\right)^{-1}$ and we use $\beta_{o\perp}'\beta_o = 0$ in obtaining the last equality. Invoking the CLT and Lemma 4.1, we get
$$\Lambda_{uz_1}(1)\sqrt{n}\left[H_{n,11} - \Sigma_{z_1z_1}^{-1}\right]\beta_o'\beta_o = -\Lambda_{uz_1}(1)H_{n,11}\sqrt{n}\left[H_{n,11}^{-1} - \Sigma_{z_1z_1}\right]\Sigma_{z_1z_1}^{-1}\beta_o'\beta_o$$
$$= -\Lambda_{uz_1}(1)H_{n,11}\sqrt{n}\left[\hat{S}_{11} - \Sigma_{z_1z_1}\right]\Sigma_{z_1z_1}^{-1}\beta_o'\beta_o + o_p(1) \to_d \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}N(0, V_{z_1z_1})\Sigma_{z_1z_1}^{-1}\beta_o'\beta_o, \quad (4.12)$$
which finishes the proof. $\blacksquare$

Denote the rank of $\Pi_1$ by $r_1$. Then, by virtue of the expression $\Pi_1 = \tilde{\alpha}_o\beta_o'$, we have $r_1 \le r_o$ as indicated. Without loss of generality, we decompose $\Pi_1$ as $\Pi_1 = \tilde{\alpha}_1\tilde{\beta}_1'$, where $\tilde{\alpha}_1$ and $\tilde{\beta}_1$ are $m \times r_1$ matrices with full rank. Denote the orthogonal complements of $\tilde{\alpha}_1$ and $\tilde{\beta}_1$ as $\tilde{\alpha}_{1\perp}$ and $\tilde{\beta}_{1\perp}$ respectively. Similarly, we decompose $\tilde{\beta}_{1\perp}$ as $\tilde{\beta}_{1\perp} = \left[\tilde{\beta}_\perp,\ \beta_{o\perp}\right]$, where $\tilde{\beta}_\perp$ is an $m \times (r_o - r_1)$ matrix. Denote $\tilde{\Pi}_0 = \left(\int B_{w_2}dB_u'\right)'\left(\int B_{w_2}B_{w_2}'\right)^{-1}\beta_{o\perp}'$ and $\tilde{\Pi}_3 = N(0, V_{uz_1})\Sigma_{z_1z_1}^{-1}\beta_o'$. Let $[\lambda_1(\hat{\Pi}_{ols}), \dots, \lambda_m(\hat{\Pi}_{ols})]$ and $[\lambda_1(\Pi_1), \dots, \lambda_m(\Pi_1)]$ be the ordered eigenvalues of $\hat{\Pi}_{ols}$ and $\Pi_1$, respectively.

Lemma 4.3 Under Assumptions 3.2 and 4.1, we have the following results:

(a) the OLS estimator $\hat{\Pi}_{ols}$ satisfies
$$\left[Q(\hat{\Pi}_{ols} - \Pi_o)Q^{-1} - H_n\right]D_n^{-1} = O_p(1); \quad (4.13)$$

(b) the eigenvalues of $\hat{\Pi}_{ols}$ satisfy
$$\left[\lambda_1(\hat{\Pi}_{ols}), \dots, \lambda_{m-r_1}(\hat{\Pi}_{ols})\right] \to_p (0, \dots, 0)_{1\times(m-r_1)} \quad (4.14)$$
and
$$\left[\lambda_{m-r_1+1}(\hat{\Pi}_{ols}), \dots, \lambda_m(\hat{\Pi}_{ols})\right] \to_p \left[\lambda_{m-r_1+1}(\Pi_1), \dots, \lambda_m(\Pi_1)\right]; \quad (4.15)$$

(c) the first $m-r_o$ ordered eigenvalues of $\hat{\Pi}_{ols}$ satisfy
$$n\left[\lambda_1(\hat{\Pi}_{ols}), \dots, \lambda_{m-r_o}(\hat{\Pi}_{ols})\right] \to_d \left[\tilde{\lambda}_1^0, \dots, \tilde{\lambda}_{m-r_o}^0\right], \quad (4.16)$$
where $\tilde{\lambda}_j^0$ ($j = 1, \dots, m-r_o$) are the ordered solutions of $\left|uI_{m-r_o} - \beta_{o\perp}'(\tilde{\Pi}_0 + \tilde{\Pi}_1)\beta_{o\perp}\right| = 0$;

(d) $\hat{\Pi}_{ols}$ has $r_o - r_1$ eigenvalues satisfying
$$\sqrt{n}\left[\lambda_{m-r_o+1}(\hat{\Pi}_{ols}), \dots, \lambda_{m-r_1}(\hat{\Pi}_{ols})\right] \to_d \left[\tilde{\lambda}_{m-r_o+1}^0, \dots, \tilde{\lambda}_{m-r_1}^0\right], \quad (4.17)$$
where $\tilde{\lambda}_j^0$ ($j = m-r_o+1, \dots, m-r_1$) are the ordered solutions of $\left|uI_{r_o-r_1} - \tilde{\beta}_\perp'(\tilde{\Pi}_2 + \tilde{\Pi}_3)\tilde{\beta}_\perp\right| = 0$.

Proof. (a). From the expressions in (3.10) and (4.7),
$$\left[Q(\hat{\Pi}_{ols} - \Pi_o)Q^{-1} - H_n\right]D_n^{-1} = \left[\sum_{t=1}^{n}\left(w_tZ_{t-1}' - H_{1,o}\right)\right]D_n\left(D_n\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n\right)^{-1}, \quad (4.18)$$
where $H_{1,o} = \left[\Lambda_{wz_1}(1),\ 0_{m\times(m-r_o)}\right]$. Now, the results in (a) follow directly from Lemma 4.1 and the CMT.

(b). Denote $P = \left[\tilde{\beta}_1,\ \tilde{\beta}_{1\perp}\right]$ and $S_n(\lambda) = \lambda I_m - \hat{\Pi}_{ols}$; then, by definition, the eigenvalues of $\hat{\Pi}_{ols}$ are the solutions of the determinantal equation
$$0 = \left|P'S_n(\lambda)P\right| = \begin{vmatrix}\lambda\tilde{\beta}_1'\tilde{\beta}_1 - \tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_1 & -\tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_{1\perp}\\ -\tilde{\beta}_{1\perp}'\hat{\Pi}_{ols}\tilde{\beta}_1 & \lambda I_{m-r_1} - \tilde{\beta}_{1\perp}'\hat{\Pi}_{ols}\tilde{\beta}_{1\perp}\end{vmatrix}. \quad (4.19)$$
As $\Pi_1\tilde{\beta}_{1\perp} = 0$ and $\hat{\Pi}_{ols} = \Pi_1 + o_p(1)$,
$$\tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_{1\perp} = \tilde{\beta}_1'\left(\hat{\Pi}_{ols} - \Pi_1\right)\tilde{\beta}_{1\perp} + \tilde{\beta}_1'\Pi_1\tilde{\beta}_{1\perp} = o_p(1). \quad (4.20)$$
Similarly, we have
$$\tilde{\beta}_{1\perp}'\hat{\Pi}_{ols}\tilde{\beta}_{1\perp} = \tilde{\beta}_{1\perp}'\left(\hat{\Pi}_{ols} - \Pi_1\right)\tilde{\beta}_{1\perp} = o_p(1) \quad (4.21)$$
and
$$\tilde{\beta}_{1\perp}'\hat{\Pi}_{ols}\tilde{\beta}_1 \to_p \tilde{\beta}_{1\perp}'\Pi_1\tilde{\beta}_1 \quad\text{and}\quad \tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_1 \to_p \tilde{\beta}_1'\Pi_1\tilde{\beta}_1. \quad (4.22)$$
From the results in (4.19)-(4.22), we can invoke the Slutsky Theorem to deduce that
$$\left|\lambda I_m - \hat{\Pi}_{ols}\right| \to_p \left|\lambda I_{m-r_1}\right|\cdot\left|\lambda\tilde{\beta}_1'\tilde{\beta}_1 - \tilde{\beta}_1'\Pi_1\tilde{\beta}_1\right|, \quad (4.23)$$
uniformly over any compact set in $\mathbb{R}$, where the solutions of $\left|\lambda\tilde{\beta}_1'\tilde{\beta}_1 - \tilde{\beta}_1'\Pi_1\tilde{\beta}_1\right| = 0$ are the $r_1$ nonzero eigenvalues of $\Pi_1$. Hence the claimed results follow by (4.23) and the CMT.

(c). If we denote $u_{n,k}^* = n\lambda_k(\hat{\Pi}_{ols})$, then by definition $u_{n,k}^*$ ($k \in S_{ols,\lambda}^c$) solves the determinantal equation
$$0 = \left|\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right|\cdot\left|\tilde{\beta}_{1\perp}'\left(S_n(u) - S_n(u)\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'S_n(u)\right)\tilde{\beta}_{1\perp}\right|, \quad (4.24)$$
where $S_n(u) = \frac{u}{n}I_m - \hat{\Pi}_{ols}$.

From the results in (4.3) and (4.4), we have
$$\tilde{\beta}_1'S_n(u)\tilde{\beta}_1 = \frac{u}{n}\tilde{\beta}_1'\tilde{\beta}_1 - \tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_1 \to_p -\tilde{\beta}_1'\tilde{\alpha}_1\tilde{\beta}_1'\tilde{\beta}_1, \quad (4.25)$$
where $\tilde{\beta}_1'\tilde{\alpha}_1\tilde{\beta}_1'\tilde{\beta}_1$ is an $r_1 \times r_1$ nonsingular matrix. Hence $u_{n,k}^*$ asymptotically solves the determinantal equation
$$0 = \left|\tilde{\beta}_{1\perp}'\left(S_n(u) - S_n(u)\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'S_n(u)\right)\tilde{\beta}_{1\perp}\right|. \quad (4.26)$$
Denote $T_n(u) = S_n(u) - S_n(u)\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'S_n(u)$; then (4.26) can be equivalently written as
$$0 = \left|\tilde{\beta}_\perp'T_n(u)\tilde{\beta}_\perp\right|\cdot\left|\beta_{o\perp}'\left(T_n(u) - T_n(u)\tilde{\beta}_\perp\left[\tilde{\beta}_\perp'T_n(u)\tilde{\beta}_\perp\right]^{-1}\tilde{\beta}_\perp'T_n(u)\right)\beta_{o\perp}\right|. \quad (4.27)$$

Note that
$$n^{\frac{1}{2}}\tilde{\beta}_\perp'S_n(u)\tilde{\beta}_\perp = n^{-\frac{1}{2}}u\tilde{\beta}_\perp'\tilde{\beta}_\perp - n^{\frac{1}{2}}\tilde{\beta}_\perp'\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\tilde{\beta}_\perp - n^{\frac{1}{2}}\tilde{\beta}_\perp'\Pi_{1,n}\tilde{\beta}_\perp,$$
$$n^{\frac{1}{2}}\tilde{\beta}_1'S_n(u)\tilde{\beta}_\perp = -n^{\frac{1}{2}}\tilde{\beta}_1'\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\tilde{\beta}_\perp - n^{\frac{1}{2}}\tilde{\beta}_1'\Pi_{1,n}\tilde{\beta}_\perp, \quad \tilde{\beta}_\perp'S_n(u)\tilde{\beta}_1 = -\tilde{\beta}_\perp'\hat{\Pi}_{ols}\tilde{\beta}_1.$$
From the above expressions, we can write
$$n^{\frac{1}{2}}\tilde{\beta}_\perp'T_n(u)\tilde{\beta}_\perp = n^{-\frac{1}{2}}u\tilde{\beta}_\perp'\tilde{\beta}_\perp - \tilde{\beta}_\perp'\left(I_m + \hat{\Pi}_{ols}\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'\right)M_{1,n}, \quad (4.28)$$
where $M_{1,n} = n^{\frac{1}{2}}\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\tilde{\beta}_\perp + n^{\frac{1}{2}}\Pi_{1,n}\tilde{\beta}_\perp$. Using Lemma 4.1 and the results in (a), we can deduce that
$$n^{\frac{1}{2}}\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\tilde{\beta}_\perp = Q^{-1}\left[Q\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)Q^{-1}D_n^{-1}\right]n^{\frac{1}{2}}D_nQ\tilde{\beta}_\perp \to_d N(0, V_{uz_1})\Sigma_{z_1z_1}^{-1}\beta_o'\tilde{\beta}_\perp := N_1\tilde{\beta}_\perp. \quad (4.29)$$
Using the results in Lemma 4.2, we get
$$n^{\frac{1}{2}}\Pi_{1,n}\tilde{\beta}_\perp = n^{\frac{1}{2}}\left(\Pi_{1,n} - \Pi_1\right)\tilde{\beta}_\perp = n^{\frac{1}{2}}Q^{-1}\left(H_n - H_o\right)Q\tilde{\beta}_\perp \to_d \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}N(0, V_{z_1z_1})\Sigma_{z_1z_1}^{-1}\beta_o'\tilde{\beta}_\perp := N_2\tilde{\beta}_\perp. \quad (4.30)$$

From (4.28)-(4.30), we can deduce that
$$\left|\sqrt{n}\,\tilde{\beta}_\perp'T_n(u)\tilde{\beta}_\perp\right| \to_d \left|\tilde{\beta}_\perp'(N_1 + N_2)\tilde{\beta}_\perp\right| \ne 0 \quad\text{almost surely.} \quad (4.31)$$
Next note that
$$\beta_{o\perp}'T_n(u)\beta_{o\perp} = n^{-1}u\beta_{o\perp}'\beta_{o\perp} - \beta_{o\perp}'\left(I_m + \hat{\Pi}_{ols}\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'\right)M_{2,n}, \quad (4.32)$$
$$\tilde{\beta}_\perp'T_n(u)\beta_{o\perp} = -\tilde{\beta}_\perp'\left(I_m + \hat{\Pi}_{ols}\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'\right)M_{2,n}, \quad (4.33)$$
where $M_{2,n} = \left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\beta_{o\perp} + \Pi_{1,n}\beta_{o\perp}$. By (4.22) and (4.25), we have
$$\beta_{o\perp}'T_n(u)\tilde{\beta}_\perp = \beta_{o\perp}'\hat{\Pi}_{ols}\tilde{\beta}_\perp - \beta_{o\perp}'\hat{\Pi}_{ols}\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_\perp = o_p(1). \quad (4.34)$$
Using Lemma 4.2, we can deduce that
$$n\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\beta_{o\perp} = Q^{-1}\left[Q\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)Q^{-1}D_n^{-1}\right]\begin{pmatrix}0\\ \beta_{o\perp}'\beta_{o\perp}\end{pmatrix} \to_d \left(\int B_{w_2}dB_u'\right)'\left(\int B_{w_2}B_{w_2}'\right)^{-1}\beta_{o\perp}'\beta_{o\perp} := \tilde{\Pi}_0\beta_{o\perp}. \quad (4.35)$$
Using the result in (4.5), we get
$$n\Pi_{1,n}\beta_{o\perp} = n\left(\Pi_o + Q^{-1}H_nQ\right)\beta_{o\perp} = nQ^{-1}H_nQ\beta_{o\perp} \to_d \tilde{\Pi}_1\beta_{o\perp}. \quad (4.36)$$
From (4.32)-(4.36), we deduce that
$$\left|n\beta_{o\perp}'\left(T_n(u) - T_n(u)\tilde{\beta}_\perp\left[\tilde{\beta}_\perp'T_n(u)\tilde{\beta}_\perp\right]^{-1}\tilde{\beta}_\perp'T_n(u)\right)\beta_{o\perp}\right| \to_d \left|uI_{m-r_o} - \beta_{o\perp}'\left(\tilde{\Pi}_0 + \tilde{\Pi}_1\right)\beta_{o\perp}\right|, \quad (4.37)$$
uniformly over any compact set in $\mathbb{R}$. Now, the results in (c) follow from (4.37) and the CMT.

(d). If we denote $u_{n,k}^* = \sqrt{n}\,\lambda_k(\hat{\Pi}_{ols})$, then by definition $u_{n,k}^*$ ($k \in \tilde{S}_{ols,\lambda}^c$) solves the determinantal equation
$$0 = \left|\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right|\cdot\left|\tilde{\beta}_{1\perp}'\left(S_n(u) - S_n(u)\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'S_n(u)\right)\tilde{\beta}_{1\perp}\right|, \quad (4.38)$$
where $S_n(u) = \frac{u}{\sqrt{n}}I_m - \hat{\Pi}_{ols}$.

Note that
$$\tilde{\beta}_{1\perp}'S_n(u)\tilde{\beta}_{1\perp} = n^{-\frac{1}{2}}uI_{m-r_1} - \tilde{\beta}_{1\perp}'\hat{\Pi}_{ols}\tilde{\beta}_{1\perp}, \quad (4.39)$$
$$\tilde{\beta}_{1\perp}'S_n(u)\tilde{\beta}_1 = -\tilde{\beta}_{1\perp}'\hat{\Pi}_{ols}\tilde{\beta}_1 \quad\text{and}\quad \tilde{\beta}_1'S_n(u)\tilde{\beta}_{1\perp} = -\tilde{\beta}_1'\hat{\Pi}_{ols}\tilde{\beta}_{1\perp}. \quad (4.40)$$
Using expressions (4.39) and (4.40), we have
$$\tilde{\beta}_{1\perp}'\left(S_n(u) - S_n(u)\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'S_n(u)\right)\tilde{\beta}_{1\perp} = n^{-\frac{1}{2}}uI_{m-r_1} - \tilde{\beta}_{1\perp}'\left(I_m + \hat{\Pi}_{ols}\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'\right)\hat{\Pi}_{ols}\tilde{\beta}_{1\perp}. \quad (4.41)$$
From (4.35), we get
$$\sqrt{n}\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\beta_{o\perp} = o_p(1). \quad (4.42)$$
Using (4.29), (4.5) and (4.6), we have
$$\sqrt{n}\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)\tilde{\beta}_\perp \to_d \tilde{\Pi}_3\tilde{\beta}_\perp, \quad (4.43)$$
$$\sqrt{n}\,\Pi_{1,n}\beta_{o\perp} = o_p(1), \quad (4.44)$$
$$\sqrt{n}\,\Pi_{1,n}\tilde{\beta}_\perp \to_d \tilde{\Pi}_2\tilde{\beta}_\perp. \quad (4.45)$$
Now, using the results in (4.42)-(4.45), we get
$$\sqrt{n}\,\hat{\Pi}_{ols}\tilde{\beta}_{1\perp} = \sqrt{n}\left[\hat{\Pi}_{ols} - \Pi_{1,n}\right]\tilde{\beta}_{1\perp} + \sqrt{n}\,\Pi_{1,n}\tilde{\beta}_{1\perp} \to_d \left[\tilde{\Pi}_3\tilde{\beta}_\perp,\ 0_{m\times(m-r_o)}\right] + \left[\tilde{\Pi}_2\tilde{\beta}_\perp,\ 0_{m\times(m-r_o)}\right] = \left[\left(\tilde{\Pi}_2 + \tilde{\Pi}_3\right)\tilde{\beta}_\perp,\ 0_{m\times(m-r_o)}\right]. \quad (4.46)$$
From the results in (4.41)-(4.46), it follows that
$$\left|\sqrt{n}\,\tilde{\beta}_{1\perp}'\left(S_n(u) - S_n(u)\tilde{\beta}_1\left[\tilde{\beta}_1'S_n(u)\tilde{\beta}_1\right]^{-1}\tilde{\beta}_1'S_n(u)\right)\tilde{\beta}_{1\perp}\right|$$
$$\to_d \left|uI_{m-r_1} - \tilde{\beta}_{1\perp}'\left(I_m + \tilde{\alpha}_1\left[\tilde{\beta}_1'\tilde{\alpha}_1\right]^{-1}\tilde{\beta}_1'\right)\left[\left(N_1 + N_2\right)\tilde{\beta}_\perp,\ 0_{m\times(m-r_o)}\right]\right| = \left|uI_{m-r_o}\right|\cdot\left|uI_{r_o-r_1} - \tilde{\beta}_\perp'\left(\tilde{\Pi}_2 + \tilde{\Pi}_3\right)\tilde{\beta}_\perp\right|. \quad (4.47)$$
Note that the determinantal equation
$$\left|uI_{m-r_o}\right|\cdot\left|uI_{r_o-r_1} - \tilde{\beta}_\perp'\left(\tilde{\Pi}_2 + \tilde{\Pi}_3\right)\tilde{\beta}_\perp\right| = 0 \quad (4.48)$$
has $m-r_o$ zero solutions, which correspond to the probability limits of $\sqrt{n}\,\lambda_k(\hat{\Pi}_{ols})$ ($k \in S_{ols,\lambda}^c$), as illustrated in (c). Equation (4.48) also has $r_o-r_1$ non-trivial solutions given by the stochastic determinantal equation $\left|uI_{r_o-r_1} - \tilde{\beta}_\perp'\left(\tilde{\Pi}_2 + \tilde{\Pi}_3\right)\tilde{\beta}_\perp\right| = 0$, which finishes the proof. $\blacksquare$

We next derive the asymptotic properties of the shrinkage estimator $\hat{\Pi}_n$ with weakly dependent innovations. Let $\tilde{S}_\lambda = \{k : \lambda_k(\Pi_1) \ne 0,\ k = 1,\dots,m\}$ be the index set of nonzero eigenvalues of $\Pi_1$. From Lemma 4.3, it is clear that $\tilde{S}_\lambda \subseteq S_\lambda$.

Corollary 4.1 Under Assumptions 3.1, 4.1 and 3.3.(i), the shrinkage estimator $\hat{\Pi}_n$ satisfies
$$\hat{\Pi}_n \to_p \Pi_1, \quad (4.49)$$
where $\Pi_1$ is defined in (4.3).

Proof. First, when $r_o = 0$, we have $\Pi_1 = \tilde{\alpha}_o\beta_o' = 0 = \Pi_o$. Hence, (4.49) follows by similar arguments to those in the proof of Theorem 3.1. To finish the proof, we only need to consider the scenarios $r_o = m$ and $r_o \in (0, m)$.

Using the same notation $V_{1,n}(\Pi)$ as defined in the proof of Theorem 3.1, by definition we have $V_{1,n}(\hat{\Pi}_n) \le V_{1,n}(\Pi_1)$, which implies
$$\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\left(\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' \otimes I_m\right)\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right] - 2\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\mathrm{vec}\left[\sum_{t=1}^{n}u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\right]$$
$$\le n\left\{\sum_{k=1}^{m}\hat{P}_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right] - \sum_{k=1}^{m}\hat{P}_{\lambda_n}\left[|\lambda_k(\hat{\Pi}_n)|\right]\right\}. \quad (4.50)$$

When $r_o = m$, $Y_t$ is stationary and we have
$$\frac{1}{n}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' \to_p \Sigma_{YY}(0) = R(1)\Omega_uR(1)'. \quad (4.51)$$
From the results in (4.50) and (4.51), we get, w.p.a.1,
$$\left\|\hat{\Pi}_n - \Pi_1\right\|_E^2\rho_{\min} - \left\|\hat{\Pi}_n - \Pi_1\right\|_Ec_n - d_n \le 0, \quad (4.52)$$
where $\rho_{\min}$ denotes the smallest eigenvalue of $\Sigma_{YY}(0)$, which is positive with probability 1,
$$c_n = \left\|\sum_{t=1}^{n}\frac{u_tY_{t-1}'}{n} - (\Pi_1 - \Pi_o)\sum_{t=1}^{n}\frac{Y_{t-1}Y_{t-1}'}{n}\right\|_E \to_p \left\|\Lambda_{uY}(1) - \Lambda_{uY}(1)\Sigma_{z_1z_1}^{-1}\Sigma_{z_1z_1}\right\|_E = 0,$$
and $d_n = \sum_{k\in\tilde{S}_\lambda}\hat{P}_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right] = o_p(1)$. So the result in (4.49) follows directly from the inequality in (4.52).

When $0 < r_o < m$, we get
$$\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\left(\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' \otimes I_m\right)\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right] = \mathrm{vec}(\hat{\Pi}_n - \Pi_1)'\left(B_nD_n\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_nB_n' \otimes I_m\right)\mathrm{vec}(\hat{\Pi}_n - \Pi_1) \ge \rho_{\min}\left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_E^2. \quad (4.53)$$

Next, note that
$$\left[\sum_{t=1}^{n}u_tZ_{t-1}' - (\Pi_1 - \Pi_o)Q^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'\right]D_n = \left[\begin{array}{c}n^{-\frac{1}{2}}\sum_{t=1}^{n}Z_{1,t-1}u_t'\\ n^{-1}\sum_{t=1}^{n}Z_{2,t-1}u_t'\end{array}\right]' - \left[\begin{array}{c}n^{-\frac{1}{2}}\sum_{t=1}^{n}Z_{1,t-1}Z_{1,t-1}'\Sigma_{z_1z_1}^{-1}\Lambda_{uz_1}(1)'\\ n^{-1}\sum_{t=1}^{n}Z_{2,t-1}Z_{1,t-1}'\Sigma_{z_1z_1}^{-1}\Lambda_{uz_1}(1)'\end{array}\right]'. \quad (4.54)$$

From Lemma 10.1, we can deduce that
$$n^{-1}\sum_{t=1}^{n}Z_{2,t-1}u_t' = O_p(1) \quad\text{and}\quad n^{-1}\sum_{t=1}^{n}Z_{2,t-1}Z_{1,t-1}'\Sigma_{z_1z_1}^{-1}\Lambda_{uz_1}(1)' = O_p(1). \quad (4.55)$$
Similarly, we get
$$n^{-\frac{1}{2}}\sum_{t=1}^{n}\left[Z_{1,t-1}u_t' - \Lambda_{uz_1}(1)'\right] - n^{\frac{1}{2}}\left[\hat{S}_{11} - \Sigma_{z_1z_1}\right]\Sigma_{z_1z_1}^{-1}\Lambda_{uz_1}(1)' = O_p(1). \quad (4.56)$$

Define $e_n = \left\|\left[\sum_{t=1}^{n}u_tZ_{t-1}' - (\Pi_1 - \Pi_o)Q^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'\right]D_n\right\|_E$; then from (4.54)-(4.56) we can deduce that $e_n = O_p(1)$. By the Cauchy-Schwarz inequality, we have
$$\left|\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\mathrm{vec}\left[\sum_{t=1}^{n}u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\right]\right|$$
$$= \left|\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\mathrm{vec}\left[\left(\sum_{t=1}^{n}u_tZ_{t-1}' - (\Pi_1 - \Pi_o)Q^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'\right)D_nB_n'\right]\right| \le \left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_Ee_n. \quad (4.57)$$
Define $d_n = n\sum_{k\in\tilde{S}_\lambda}\hat{P}_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right]$; then from the results in (4.50), (4.53) and (4.57), we get
$$\rho_{\min}\left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_E^2 - 2\left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_Ee_n - d_n \le 0. \quad (4.58)$$
Now, the result in (4.49) follows by the same arguments as in Theorem 3.1. $\blacksquare$

Corollary 4.1 implies that the shrinkage estimator $\hat{\Pi}_n$ has the same probability limit as the OLS estimator $\hat{\Pi}_{ols}$. We next derive the convergence rate of $\hat{\Pi}_n$.

Corollary 4.2 Denote $a_n = \max_{k\in\tilde{S}_\lambda}\left|\hat{P}'_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right]\right|$. Under Assumptions RR, 3.3.(i)-(ii) and 4.1, the shrinkage LS estimator $\hat{\Pi}_n$ satisfies

(a) if $r_o = 0$, then $\hat{\Pi}_n - \Pi_1 = O_p(n^{-1} + n^{-1}a_n)$;

(b) if $0 < r_o \le m$, then $(\hat{\Pi}_n - \Pi_1)Q^{-1}D_n^{-1} = O_p(n^{-\frac{1}{2}} + a_n)$.

Proof. By the results of Corollary 4.1 and the CMT, we deduce that $\lambda_k(\hat{\Pi}_n)$ is a consistent estimator of $\lambda_k(\Pi_1)$ for all $k = 1,\dots,m$. Using the same arguments as those of Theorem 3.2, we deduce that
$$\sum_{k\in\tilde{S}_\lambda}\left\{\hat{P}_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right] - \hat{P}_{\lambda_n}\left[|\lambda_k(\hat{\Pi}_n)|\right]\right\} \le C\max_{k\in\tilde{S}_\lambda}\left|\hat{P}'_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right]\right|\left\|\hat{\Pi}_n - \Pi_1\right\|_E + o_p(1)\left\|\hat{\Pi}_n - \Pi_1\right\|_E^2. \quad (4.59)$$

Using the inequality (4.59) in (4.50) gives
$$\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\left(\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' \otimes I_m\right)\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right] - 2\left[\mathrm{vec}(\hat{\Pi}_n - \Pi_1)\right]'\mathrm{vec}\left[\sum_{t=1}^{n}u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\right]$$
$$\le Cn\max_{k\in\tilde{S}_\lambda}\left|\hat{P}'_{\lambda_n}\left[|\lambda_k(\Pi_1)|\right]\right|\left\|\hat{\Pi}_n - \Pi_1\right\|_E + o_p(1)\left\|\hat{\Pi}_n - \Pi_1\right\|_E^2, \quad (4.60)$$
where $C > 0$ denotes a generic constant.

When $r_o = 0$, the convergence rate of $\hat{\Pi}_n$ can be derived using the same arguments as in Theorem 3.2. Hence, to finish the proof, we only need to consider the scenarios $r_o = m$ and $0 < r_o < m$.

Consider the case $r_o = m$. Note that
$$n^{-\frac{1}{2}}\sum_{t=1}^{n}u_tY_{t-1}' - n^{-\frac{1}{2}}(\Pi_1 - \Pi_o)\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' = n^{-\frac{1}{2}}\sum_{t=1}^{n}\left[u_tY_{t-1}' - \Lambda_{uY}(1)\right] - \Lambda_{uY}(1)\Sigma_{z_1z_1}^{-1}\,n^{\frac{1}{2}}\left[\hat{S}_{11} - \Sigma_{z_1z_1}\right] = O_p(1).$$
Following similar arguments to those of Theorem 3.2, we get
$$\left\|\hat{\Pi}_n - \Pi_1\right\|_E^2\rho_{\min} - \left\|\hat{\Pi}_n - \Pi_1\right\|_E\left(c_n + a_nC\right) \le 0, \quad (4.61)$$
where $c_n = n^{-1}\left\|\sum_{t=1}^{n}u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\right\|_E = O_p(n^{-\frac{1}{2}})$. From the inequality in (4.61), we deduce that
$$\hat{\Pi}_n - \Pi_1 = O_p(n^{-\frac{1}{2}} + a_n). \quad (4.62)$$

When $0 < r_o < m$, we can use (4.50), (4.53) and (4.57) in the proof of Corollary 4.1 and (10.29) in the proof of Corollary 3.2 to get
$$\rho_{\min}\left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_E^2 - 2\left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_Ee_n \le na_nC_V\left\|\hat{\Pi}_n - \Pi_1\right\|_E, \quad (4.63)$$
where $e_n = \left\|\left[\sum_{t=1}^{n}u_tZ_{t-1}' - (\Pi_1 - \Pi_o)Q^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'\right]D_n\right\|_E = O_p(1)$. Note that
$$\left\|\hat{\Pi}_n - \Pi_1\right\|_E = \left\|(\hat{\Pi}_n - \Pi_1)B_nB_n^{-1}\right\|_E \le Cn^{-\frac{1}{2}}\left\|(\hat{\Pi}_n - \Pi_1)B_n\right\|_E. \quad (4.64)$$
From the inequalities in (4.63) and (4.64), we obtain
$$(\hat{\Pi}_n - \Pi_1)B_n = O_p(n^{-\frac{1}{2}} + a_n), \quad (4.65)$$
which finishes the proof. $\blacksquare$

Corollary 4.1 implies that the nonzero eigenvalues of $\Pi_1$ are estimated as nonzero w.p.a.1. However, the matrix $\Pi_1$ may have more zero eigenvalues than $\Pi_o$. To ensure consistent cointegration rank selection, we need to show that $\Pi_1$ has $m-r_o$ zero eigenvalues which are estimated as zero w.p.a.1 and $r_o-r_1$ zero eigenvalues which are estimated as nonzero w.p.a.1. From Lemma 4.3, we see that $\hat{\Pi}_{ols}$ has $m-r_o$ eigenvalues which converge to zero at the rate $n$ and $r_o-r_1$ eigenvalues which converge to zero at the rate $\sqrt{n}$. The different convergence rates of the estimates of the zero eigenvalues of $\Pi_1$ enable us to empirically distinguish the estimates of the $m-r_o$ zero eigenvalues of $\Pi_1$ from the estimates of the $r_o-r_1$ zero eigenvalues of $\Pi_1$. For example, we can define thresholding estimators of the eigenvalues of $\Pi_1$ in the following way:
$$\hat{\lambda}_{th,k} = \begin{cases}0 & \text{if } |\hat{\lambda}_k| \le n^{-\frac{5}{8}}\\ \hat{\lambda}_k & \text{otherwise},\end{cases}$$
where $\hat{\lambda}_k = \lambda_k(\hat{\Pi}_{ols})$. Let $\tilde{S}_{ols,\lambda} = \{k : \lambda_k(\hat{\Pi}_{ols}) \ne 0,\ k = 1,\dots,m\}$ be the index set of nonzero eigenvalues of $\hat{\Pi}_{ols}$. As $n^{\frac{5}{8}}|\hat{\lambda}_k| = n^{\frac{1}{8}}|n^{\frac{1}{2}}\hat{\lambda}_k| \to_p \infty$ for all $k \in \tilde{S}_{ols,\lambda}$, it follows that $\hat{\lambda}_{th,k} = \hat{\lambda}_k$ for all $k \in \tilde{S}_{ols,\lambda}$ w.p.a.1. On the other hand, $n^{\frac{5}{8}}|\hat{\lambda}_k| = n^{-\frac{3}{8}}|n\hat{\lambda}_k| \to_p 0$ for all $k \in \tilde{S}_{ols,\lambda}^c$, and hence $\hat{\lambda}_{th,k} = 0$ for all $k \in \tilde{S}_{ols,\lambda}^c$ w.p.a.1. From the arguments above, we see that consistent cointegration rank selection is achieved. We next show that the LS shrinkage estimator $\hat{\Pi}_n$ has only $m-r_o$ zero eigenvalues w.p.a.1.

Assumption 4.2 The penalty function $\hat{P}_{\lambda_n}(\cdot)$ satisfies
$$n^{\frac{1}{2}+v}\hat{P}'_{\lambda_n}\left[|\lambda_k(\hat{\Pi}_n)|\right] = o_p(1)$$
for all $k \in S_\lambda$ and some positive constant $v$.
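The eigenvalue-thresholding rule described above (cutoff $n^{-5/8}$, sitting strictly between the $n^{-1}$ and $n^{-1/2}$ convergence rates) can be sketched as follows. The function name and the example magnitudes are our own illustrative choices, not from the paper:

```python
import numpy as np

def threshold_eigenvalues(eigs, n):
    """Set eigenvalue estimates below the n**(-5/8) cutoff to zero.

    Estimates of the m - ro 'fast' zero eigenvalues are O_p(1/n), so they
    fall below n**(-5/8) w.p.a.1; estimates converging at the sqrt(n) rate
    (or limiting to a nonzero value) stay above it w.p.a.1.
    """
    cutoff = n ** (-5 / 8)
    return np.where(np.abs(eigs) <= cutoff, 0.0, eigs)

n = 10_000
# Last entry mimics an O_p(1/n)-scale estimate of a zero eigenvalue.
eigs = np.array([0.8, 0.3, 2.0 / n])
print(threshold_eigenvalues(eigs, n))   # zeroes only the O_p(1/n)-scale entry
```

With $n = 10{,}000$ the cutoff is $n^{-5/8} \approx 0.0032$, so the first two estimates survive while the $O_p(1/n)$-scale estimate is set to zero, matching the rank-selection argument in the text.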

It is easy to see that if $n^{\frac{1+\omega}{2}}\lambda_n = o(1)$, then by Lemma 4.3 the adaptive Lasso penalty satisfies Assumption 4.2. For any $m\times m$ matrix $\Pi$, we have by definition
$$\Pi v_k = \lambda_kv_k \quad\text{and}\quad \Pi'u_k = \lambda_ku_k,$$
where $\lambda_k$ is the $k$-th largest eigenvalue of $\Pi$, and $v_k$ and $u_k$ are the related normalized right and left eigenvectors. Applying the chain rule, we get
$$d\Pi\,v_k + \Pi\,dv_k = v_k\,d\lambda_k + \lambda_k\,dv_k,$$
so that
$$u_k'\,d\Pi\,v_k = u_k'v_k\,d\lambda_k, \quad (4.66)$$
where we use the fact that $u_k'\Pi = \lambda_ku_k'$. It is easy to see that $\tilde{\beta}_{1\perp}$ and $\tilde{\alpha}_{1\perp}$ contain the right and left eigenvectors of the zero eigenvalues of $\Pi_1$ respectively. When $r_1 = m$, the limit results $\lambda_k(\hat{\Pi}_n) \to_p \lambda_k(\Pi_1)$, $k = 1,\dots,m$, imply consistent cointegration rank selection immediately. When $r_1 < m$, both $\tilde{\beta}_{1\perp}$ and $\tilde{\alpha}_{1\perp}$ are non-zero. Suppose the $(i,j)$-th element of $\tilde{\beta}_{1\perp}$ and the $(l,m)$-th element of $\tilde{\alpha}_{1\perp}$ are nonzero; then, taking $v_k$ and $u_k$ in (4.66) to be the corresponding columns of $\tilde{\beta}_{1\perp}$ and $\tilde{\alpha}_{1\perp}$, we deduce that
$$\left.\frac{d\lambda_k}{d\Pi_{jl}}\right|_{\Pi=\Pi_1} = \frac{(\tilde{\alpha}_{1\perp})_{jm}(\tilde{\beta}_{1\perp})_{li}}{(\tilde{\alpha}_{1\perp})_{\cdot m}'(\tilde{\beta}_{1\perp})_{\cdot i}}. \quad (4.67)$$
The result in (4.67) is important for establishing the following result.

Corollary 4.3 Under Assumptions LP, RR, 3.3 and 4.2, we have
$$\Pr\left(\lambda_k(\hat{\Pi}_n) \ne 0\right) \to 1 \quad (4.68)$$
for all $k \in S_\lambda$ as $n \to \infty$, and
$$\Pr\left(\lambda_k(\hat{\Pi}_n) = 0\right) \to 1 \quad (4.69)$$
for all $k \in S_\lambda^c$ as $n \to \infty$.

Proof. On the event $\{\lambda_k(\hat{\Pi}_n) = 0\}$ for some $k \in S_\lambda$, we have the following Karush-Kuhn-Tucker (KKT) optimality condition:
$$\left|n^{-\frac{1}{2}+v}\,\mathrm{vec}(\partial\hat{\Pi}_n)'\left(Y_{-1} \otimes I_m\right)\left[\Delta y - \left(Y_{-1}' \otimes I_m\right)\mathrm{vec}(\hat{\Pi}_n)\right]\right| < \frac{n^{\frac{1}{2}+v}\,\hat{P}'_{\lambda_n}\left[|\lambda_k(\hat{\Pi}_n)|\right]}{2}, \quad (4.70)$$
where $\partial\hat{\Pi}_n$ is the matrix whose $(j,l)$-th element is the sample analogue of the derivative in (4.67) and whose other elements are zero. By Corollary 4.1 and continuous mapping, we have
$$\partial\hat{\Pi}_n \to_p \partial\Pi_1, \quad (4.71)$$
where $\partial\Pi_1$ denotes the corresponding matrix of derivatives evaluated at $\Pi_1$.

Denote
$$\hat{J} = \sum_{t=1}^{n}\frac{Y_{t-1}u_t'}{n} - \sum_{t=1}^{n}\frac{Y_{t-1}Y_{t-1}'}{n}\left(\hat{\Pi}_n - \Pi_o\right)' = Q^{-1}\left[\sum_{t=1}^{n}\frac{Z_{t-1}w_t'}{n} - \sum_{t=1}^{n}\frac{Z_{t-1}Z_{t-1}'}{n}\,Q'^{-1}\left(\hat{\Pi}_n - \Pi_o\right)'Q'\right]Q'^{-1},$$
where
$$\sum_{t=1}^{n}\frac{Z_{t-1}Z_{t-1}'}{n}\,Q'^{-1}\left(\hat{\Pi}_n - \Pi_o\right)'Q' = \begin{pmatrix}\hat{S}_{11} & \hat{S}_{12}\\ \hat{S}_{21} & \hat{S}_{22}\end{pmatrix}\begin{pmatrix}\beta_o'(\hat{\Pi}_n - \Pi_o)\beta_o(\beta_o'\beta_o)^{-1} & \beta_o'\hat{\Pi}_n\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1}\\ \beta_{o\perp}'(\hat{\Pi}_n - \Pi_o)\beta_o(\beta_o'\beta_o)^{-1} & \beta_{o\perp}'\hat{\Pi}_n\beta_{o\perp}(\beta_{o\perp}'\beta_{o\perp})^{-1}\end{pmatrix}'$$
$$= \begin{pmatrix}\hat{S}_{11} & \hat{S}_{12}\\ \hat{S}_{21} & \hat{S}_{22}\end{pmatrix}\begin{pmatrix}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + o_p(1) & \Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + o_p(1)\\ (\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\hat{\Pi}_n'\beta_o & (\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\hat{\Pi}_n'\beta_{o\perp}\end{pmatrix}.$$

Hence
$$\hat{J}_{11} = \sum_{t=1}^{n}\frac{Z_{1,t-1}w_{1t}'}{n} - \left[\hat{S}_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + \hat{S}_{12}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\hat{\Pi}_n'\beta_o\right] + o_p(1) = \sum_{t=1}^{n}\frac{Z_{1,t-1}w_{1t}'}{n} - \hat{S}_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + o_p(1)$$
$$= \hat{S}_{11}\left(\hat{S}_{11}^{-1} - \Sigma_{z_1z_1}^{-1}\right)\sum_{t=1}^{n}\frac{Z_{1,t-1}w_{1t}'}{n} + \hat{S}_{11}\Sigma_{z_1z_1}^{-1}\left[\sum_{t=1}^{n}\frac{Z_{1,t-1}w_{1t}'}{n} - \Lambda_{z_1w_1}(1)\right] + o_p(1),$$
which implies that
$$\left\|n^{\frac{1}{2}+v}\hat{J}_{11}\right\|_E \to_p \infty \quad (4.72)$$

for any $v > 0$. Next note that
$$\hat{J}_{12} = \sum_{t=1}^{n}\frac{Z_{1,t-1}w_{2t}'}{n} - \left[\hat{S}_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + \hat{S}_{12}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\hat{\Pi}_n'\beta_{o\perp}\right] + o_p(1) = \sum_{t=1}^{n}\frac{Z_{1,t-1}w_{2t}'}{n} - \hat{S}_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + o_p(1)$$
$$= \hat{S}_{11}\left(\hat{S}_{11}^{-1} - \Sigma_{z_1z_1}^{-1}\right)\sum_{t=1}^{n}\frac{Z_{1,t-1}w_{2t}'}{n} + \hat{S}_{11}\Sigma_{z_1z_1}^{-1}\left[\sum_{t=1}^{n}\frac{Z_{1,t-1}w_{2t}'}{n} - \Lambda_{z_1w_2}(1)\right] + o_p(1),$$
which implies that
$$\left\|n^{\frac{1}{2}+v}\hat{J}_{12}\right\|_E \to_p \infty. \quad (4.73)$$

For the third term,
$$\hat{J}_{21} = \sum_{t=1}^{n}\frac{Z_{2,t-1}w_{1t}'}{n} - \left[\hat{S}_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + \hat{S}_{22}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\hat{\Pi}_n'\beta_o\right] + o_p(1)$$
$$= \sum_{t=1}^{n}\frac{Z_{2,t-1}w_{1t}'}{n} - \hat{S}_{22}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\left(\hat{\Pi}_n - \Pi_{1,n}\right)'\beta_o - \left[\hat{S}_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + \hat{S}_{22}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\Pi_{1,n}'\beta_o\right] + o_p(1)$$
$$= \int B_{w_2}dB_{w_1}' - \int B_{w_2}B_{w_2}'\,(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\left(\hat{\Pi}_n - \Pi_{1,n}\right)'\beta_o + o_p(1).$$
If $\hat{\Pi}_n$ were the OLS estimator, it would be easy to see that
$$\int B_{w_2}B_{w_2}'\,(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\left(\hat{\Pi}_{ols} - \Pi_{1,n}\right)'\beta_o \to_d \int B_{w_2}dB_{w_1}'.$$
However, as shown in Liao and Phillips (2010), when rank constraints are imposed on $\hat{\Pi}_n$, the limiting distribution of $\int B_{w_2}B_{w_2}'\,(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'(\hat{\Pi}_n - \Pi_{1,n})'\beta_o$ will be different from the above. Hence, we can deduce that
$$\left\|n^{\frac{1}{2}+v}\hat{J}_{21}\right\|_E \to_p \infty. \quad (4.74)$$

For the fourth term,
$$\hat{J}_{22} = \sum_{t=1}^{n}\frac{Z_{2,t-1}w_{2t}'}{n} - \left[\hat{S}_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + \hat{S}_{22}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\hat{\Pi}_n'\beta_{o\perp}\right] + o_p(1)$$
$$= \sum_{t=1}^{n}\frac{Z_{2,t-1}w_{2t}'}{n} - \hat{S}_{22}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\left(\hat{\Pi}_n - \Pi_{1,n}\right)'\beta_{o\perp} - \left[\hat{S}_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + \hat{S}_{22}(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\Pi_{1,n}'\beta_{o\perp}\right] + o_p(1)$$
$$= \int B_{w_2}dB_{w_2}' - \int B_{w_2}B_{w_2}'\,(\beta_{o\perp}'\beta_{o\perp})^{-1}\beta_{o\perp}'\left(\hat{\Pi}_n - \Pi_{1,n}\right)'\beta_{o\perp} + o_p(1).$$
Using similar arguments to those used in deriving (4.74), we have
$$\left\|n^{\frac{1}{2}+v}\hat{J}_{22}\right\|_E \to_p \infty. \quad (4.75)$$

From the results in (4.71) and (4.72)-(4.75), we can deduce that
$$\left|n^{-\frac{1}{2}+v}\,\mathrm{vec}(\partial\hat{\Pi}_n)'\left(Y_{-1} \otimes I_m\right)\left[\Delta y - \left(Y_{-1}' \otimes I_m\right)\mathrm{vec}(\hat{\Pi}_n)\right]\right| \to_p \infty. \quad (4.76)$$
However, from Assumption 4.2, we have
$$\frac{n^{\frac{1}{2}+v}\,\hat{P}'_{\lambda_n}\left[|\lambda_k(\hat{\Pi}_n)|\right]}{2} = o_p(1). \quad (4.77)$$
The results in (4.70), (4.76) and (4.77) imply that $\Pr\left(\lambda_k(\hat{\Pi}_n) = 0\right) \to 0$ for any $k \in S_\lambda$, giving (4.68). The proof of (4.69) is similar to that of Theorem 3.3 and is therefore omitted. $\blacksquare$

Corollary 4.3 implies that $\hat{\Pi}_n$ has $m-r_o$ eigenvalues estimated as zero w.p.a.1 and the other eigenvalues estimated as nonzero w.p.a.1. Hence Corollary 4.3 gives us the following result immediately.

Theorem 4.4 (Sparsity-II) Under the conditions of Corollary 4.3, we have
$$\Pr\left(r(\hat{\Pi}_n) = r_o\right) \to 1 \quad (4.78)$$
as $n \to \infty$, where $r(\hat{\Pi}_n)$ denotes the rank of $\hat{\Pi}_n$.

5 Extension II: VECM Estimation with Explicit Transient Dynamics

In this section, we are interested in estimation of the VECM model
$$\Delta Y_t = \Pi_oY_{t-1} + \sum_{j=1}^{p}B_{o,j}\Delta Y_{t-j} + u_t \quad (5.1)$$
with simultaneous cointegration rank selection and lag order selection. Denoting $B_o = (B_{o,1}, \dots, B_{o,p})$, the unknown parameter matrices $(\Pi_o, B_o)$ are estimated via the following penalized LS estimation:
$$(\hat{\Pi}_n, \hat{B}_n) = \arg\min_{\Pi\in\mathbb{R}^{m\times m},\,B_j\in\mathbb{R}^{m\times m}}\sum_{t=1}^{n}\left\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j=1}^{p}B_j\Delta Y_{t-j}\right\|_E^2 + n\sum_{j=1}^{p}\hat{P}_{\lambda_n}(B_j) + n\sum_{k=1}^{m}\hat{P}_{\lambda_n}\left[|\lambda_k(\Pi)|\right]. \quad (5.2)$$
In (5.2), $\hat{P}_{\lambda_n}(\cdot)$ applied to a matrix denotes the adaptive group Lasso penalty function, i.e.
$$\hat{P}_{\lambda_n}(A) = \lambda_n\|A\|_E\big/\|\hat{A}_n\|_E^{\omega}$$
for any $m\times m$ matrix $A$, where $\omega > 0$ and $\hat{A}_n$ is some first-step estimator of $A$.

For example, if kAkE 6= 0 for some m�m matrix A and bAn is a �rst step consistent esti-mator of A, then under condition �n ! 0, the penalty on the estimate of each component

of A will go to zero asymptotically because

jj bAnjj!E !p kAk!E 6= 0;


even though the matrix $A$ may have some zero elements. On the other hand, if $\|A\|_E = 0$, then the penalty on the estimate of each component of $A$ is nontrivial and it is possible to consistently recover the zero matrix $A$ in the LS shrinkage estimation. Also note that when $A$ is a scalar, the adaptive group Lasso penalty function reduces to the adaptive Lasso penalty function.
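As a concrete numerical illustration of the penalty $\hat P_{\lambda_n}(A) = \lambda_n\|A\|_E/\|\hat A_n\|_E^{\omega}$, the following sketch shows how the first-step estimate scales the penalty groupwise; the function name and the small `eps` safeguard are our own choices, not part of the paper's formulation:

```python
import numpy as np

def adaptive_group_lasso_penalty(A, A_first_step, lam, omega=1.0, eps=1e-12):
    """Compute lam * ||A||_E / ||A_first_step||_E**omega, where ||.||_E is
    the Euclidean (Frobenius) norm and A_first_step is a first-step
    (e.g. OLS) estimate of A.  eps guards against division by zero."""
    norm_A = np.linalg.norm(A)
    norm_first = max(np.linalg.norm(A_first_step), eps)
    return lam * norm_A / norm_first ** omega

# Nonzero matrix with an accurate first-step estimate: mild penalty
A = np.eye(2)
print(round(adaptive_group_lasso_penalty(A, A, lam=0.1), 3))        # 0.1

# When the first-step estimate is near the zero matrix, the tiny
# first-step norm inflates the penalty on any nonzero candidate,
# pushing the shrinkage estimate toward the zero matrix
A_hat_small = 0.01 * np.eye(2)
print(adaptive_group_lasso_penalty(A, A_hat_small, lam=0.1) > 1.0)  # True
```

As $\lambda_n\to0$ the first case vanishes while the second stays large, which is exactly the mechanism behind consistent groupwise selection described above.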

To consistently select the true order of the lagged differences, $B_o$ should be at least consistently estimable. Thus we assume that the error term $u_t$ satisfies Assumption 3.1. Assumption 3.2 needs revision to accommodate the general structure in (5.1). Denote
\[
C(\lambda) = \Pi_o\lambda + \sum_{j=0}^{p} B_{o,j}(1-\lambda)\lambda^{j}, \quad\text{where } B_{o,0} = -I_m.
\]

Assumption 5.1 (RR) (i) The determinantal equation $|C(\lambda)| = 0$ has roots on or outside the unit circle; (ii) the matrix $\Pi_o$ has rank $r_o$, with $0\le r_o\le m$; (iii) the following $(m-r_o)\times(m-r_o)$ matrix
\[
\alpha_{o\perp}'\Big(I_m - \sum_{j=1}^{p} B_{o,j}\Big)\beta_{o\perp} \tag{5.3}
\]
is nonsingular.

Under Assumption 5.1, the time series $Y_t$ has the following partial sum representation,
\[
Y_t = C_B\sum_{s=1}^{t} u_s + \Phi(L)u_t + C_B Y_0, \tag{5.4}
\]
where $C_B = \beta_{o\perp}\big[\alpha_{o\perp}'\big(I_m - \sum_{j=1}^{p}B_{o,j}\big)\beta_{o\perp}\big]^{-1}\alpha_{o\perp}'$ and $\Phi(L)u_t = \sum_{s=0}^{\infty}\Phi_s u_{t-s}$ is a stationary process. From the partial sum representation in (5.4), we deduce that $\beta_o'Y_t = \beta_o'\Phi(L)u_t$ and $\Delta Y_{t-j}$ ($j=0,\dots,p$) are stationary.
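To see the decomposition (5.4) at work numerically, the short simulation below generates a bivariate version of (5.1); all parameter values ($\alpha_o$, $\beta_o$, $B_1$, the sample size) are illustrative assumptions chosen only to satisfy Assumption 5.1, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 2
alpha = np.array([[-0.5], [0.25]])   # illustrative loading matrix
beta = np.array([[1.0], [-1.0]])     # illustrative cointegrating vector
B1 = 0.2 * np.eye(m)                 # one lag of transient dynamics
Pi = alpha @ beta.T                  # reduced-rank Pi_o = alpha * beta'

Y = np.zeros((n + 2, m))
for t in range(2, n + 2):
    dY_lag = Y[t - 1] - Y[t - 2]
    Y[t] = Y[t - 1] + Pi @ Y[t - 1] + B1 @ dY_lag + rng.standard_normal(m)

# beta' Y_t is the stationary error-correction term, while each level
# series contains the wandering partial-sum component C_B * sum(u_s)
ect = Y @ beta
print(ect.std() < Y[:, 0].std())     # True: the levels wander far more
```

The contrast between the bounded dispersion of $\beta_o'Y_t$ and the growing dispersion of the levels is the empirical fingerprint of the representation in (5.4).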

Define
\[
Q_B := \begin{pmatrix} \beta_o' & 0\\ 0 & I_{mp}\\ \alpha_{o\perp}' & 0\end{pmatrix}_{m(p+1)\times m(p+1)}
\]

and denote $\Delta X_{t-1} = \big(\Delta Y_{t-1}',\dots,\Delta Y_{t-p}'\big)'$; then the model in (5.1) can be rewritten as
\[
\Delta Y_t = \big[\Pi_o \;\; B_o\big]\begin{bmatrix} Y_{t-1}\\ \Delta X_{t-1}\end{bmatrix} + u_t. \tag{5.5}
\]
Denote
\[
Z_{t-1} = Q_B\begin{bmatrix} Y_{t-1}\\ \Delta X_{t-1}\end{bmatrix} = \begin{bmatrix} Z_{1,t-1}\\ Z_{2,t-1}\end{bmatrix}, \tag{5.6}
\]
where $Z_{1,t-1} = \begin{bmatrix}\beta_o'Y_{t-1}\\ \Delta X_{t-1}\end{bmatrix}$ is a stationary process and $Z_{2,t-1} = \alpha_{o\perp}'Y_{t-1}$ consists of the I(1) components.

Lemma 5.1 Under Assumptions 3.1 and 5.1, the results in Lemma 10.1 hold for $Z_{1,t}$ and $Z_{2,t}$ defined in (5.6).

We first establish the asymptotic properties of the OLS estimator $(\hat\Pi_{ols}, \hat B_{ols})$ of $(\Pi_o, B_o)$ and the asymptotic properties of the eigenvalues of $\hat\Pi_{ols}$. The estimator $(\hat\Pi_{ols}, \hat B_{ols})$ has the following closed-form solution:
\[
(\hat\Pi_{ols}, \hat B_{ols}) = \big(\hat S_{y_0y_1}\;\;\hat S_{y_0x_0}\big)\begin{pmatrix}\hat S_{y_1y_1} & \hat S_{y_1x_0}\\ \hat S_{x_0y_1} & \hat S_{x_0x_0}\end{pmatrix}^{-1}, \tag{5.7}
\]
where
\[
\hat S_{y_0y_1} = \frac1n\sum_{t=1}^{n}\Delta Y_tY_{t-1}', \quad \hat S_{y_0x_0} = \frac1n\sum_{t=1}^{n}\Delta Y_t\Delta X_{t-1}', \quad \hat S_{y_1y_1} = \frac1n\sum_{t=1}^{n}Y_{t-1}Y_{t-1}',
\]
\[
\hat S_{y_1x_0} = \frac1n\sum_{t=1}^{n}Y_{t-1}\Delta X_{t-1}', \quad \hat S_{x_0y_1} = \hat S_{y_1x_0}', \quad \hat S_{x_0x_0} = \frac1n\sum_{t=1}^{n}\Delta X_{t-1}\Delta X_{t-1}'. \tag{5.8}
\]
Denote $Y_{-1} = (Y_0,\dots,Y_{n-1})_{m\times n}$, $\Delta Y = (\Delta Y_1,\dots,\Delta Y_n)_{m\times n}$ and $\hat M_0 = I_n - n^{-1}\Delta X'\hat S_{x_0x_0}^{-1}\Delta X$, where $\Delta X = (\Delta X_0,\dots,\Delta X_{n-1})_{mp\times n}$; then $\hat\Pi_{ols}$ has the explicit representation
\[
\hat\Pi_{ols} = \big(\Delta Y\hat M_0Y_{-1}'\big)\big(Y_{-1}\hat M_0Y_{-1}'\big)^{-1} = \Pi_o + \big(U\hat M_0Y_{-1}'\big)\big(Y_{-1}\hat M_0Y_{-1}'\big)^{-1}, \tag{5.9}
\]
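The closed form in (5.7)-(5.9) amounts to one stacked multivariate least-squares regression of $\Delta Y_t$ on $(Y_{t-1}, \Delta X_{t-1})$, which can be sketched as follows (the helper function is our own illustration, and `np.linalg.lstsq` stands in for the explicit sample-moment inversion):

```python
import numpy as np

def vecm_ols(Y, p):
    """OLS estimates of (Pi, B_1, ..., B_p) in
    dY_t = Pi Y_{t-1} + sum_j B_j dY_{t-j} + u_t,
    obtained by one stacked regression of dY_t on (Y_{t-1}, dX_{t-1})."""
    dY = np.diff(Y, axis=0)
    y = dY[p:]                                    # responses dY_t
    regressors = [Y[p:-1]]                        # level lag Y_{t-1}
    for j in range(1, p + 1):
        regressors.append(dY[p - j:-j])           # lagged difference dY_{t-j}
    X = np.hstack(regressors)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # m(1+p) x m coefficients
    m = Y.shape[1]
    return coef[:m].T, coef[m:].T                 # Pi_hat, (B_1,...,B_p)

# sanity check on data simulated from a rank-one VECM with one lag
rng = np.random.default_rng(2)
n, m = 400, 2
alpha, beta = np.array([[-0.5], [0.25]]), np.array([[1.0], [-1.0]])
Pi, B1 = alpha @ beta.T, 0.2 * np.eye(m)
Y = np.zeros((n + 2, m))
for t in range(2, n + 2):
    Y[t] = Y[t - 1] + Pi @ Y[t - 1] + B1 @ (Y[t - 1] - Y[t - 2]) \
        + rng.standard_normal(m)
Pi_hat, B_hat = vecm_ols(Y, p=1)
print(Pi_hat.shape, B_hat.shape)                  # (2, 2) (2, 2)
```

This unrestricted OLS fit is the first-step estimator that feeds the adaptive penalty weights in (5.2).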


where $U = (u_1,\dots,u_n)_{m\times n}$. Recall that $[\lambda_1(\hat\Pi_{ols}),\dots,\lambda_m(\hat\Pi_{ols})]$ and $[\lambda_1(\Pi_o),\dots,\lambda_m(\Pi_o)]$ are the ordered eigenvalues of $\hat\Pi_{ols}$ and $\Pi_o$ respectively, where $\lambda_j(\Pi_o) = 0$ ($j = 1,\dots,m-r_o$).

Lemma 5.2 Suppose that Assumptions 3.1 and 5.1 hold.
(a) Define $D_{n,B} = \mathrm{diag}(n^{\frac12}I_{r_o+mp},\, nI_{m-r_o})$; then $\big[(\hat\Pi_{ols},\hat B_{ols}) - (\Pi_o,B_o)\big]Q_B^{-1}D_{n,B}$ has the following limit distribution:
\[
\Big(N(0,\Omega_u\otimes\Sigma_{z_1z_1})\Sigma_{z_1z_1}^{-1},\;\Big(\int B_{w_2}dB_u'\Big)'\Big(\int B_{w_2}B_{w_2}'\Big)^{-1}\Big); \tag{5.10}
\]
(b) the eigenvalues of $\hat\Pi_{ols}$ satisfy Lemma 3.1(b);
(c) the first $m-r_o$ ordered eigenvalues of $\hat\Pi_{ols}$ satisfy Lemma 3.1(c).

Proof. To prove (a), start by defining $\hat S_{uy_1} = \frac1n\sum_{t=1}^{n}u_tY_{t-1}'$ and $\hat S_{ux_0} = \frac1n\sum_{t=1}^{n}u_t\Delta X_{t-1}'$. From the expression in (5.8), we get
\[
\big[(\hat\Pi_{ols},\hat B_{ols}) - (\Pi_o,B_o)\big]Q_B^{-1}D_{n,B} = \big(\hat S_{uy_1}\;\hat S_{ux_0}\big)Q_B'D_{n,B}^{-1}\Big[D_{n,B}^{-1}Q_B\begin{pmatrix}\hat S_{y_1y_1} & \hat S_{y_1x_0}\\ \hat S_{x_0y_1} & \hat S_{x_0x_0}\end{pmatrix}Q_B'D_{n,B}^{-1}\Big]^{-1}. \tag{5.11}
\]
Note that
\[
\big(\hat S_{uy_1}\;\hat S_{ux_0}\big)Q_B'D_{n,B}^{-1} = U\Big[Q_B\begin{pmatrix}Y_{-1}\\ \Delta X\end{pmatrix}\Big]'D_{n,B}^{-1} = \big(n^{-\frac12}UZ_1'\;\; n^{-1}UZ_2'\big) \tag{5.12}
\]
and
\[
D_{n,B}^{-1}Q_B\begin{pmatrix}\hat S_{y_1y_1} & \hat S_{y_1x_0}\\ \hat S_{x_0y_1} & \hat S_{x_0x_0}\end{pmatrix}Q_B'D_{n,B}^{-1} = \begin{pmatrix} n^{-1}\sum_{t=1}^{n}Z_{1,t}Z_{1,t}' & n^{-\frac32}\sum_{t=1}^{n}Z_{1,t}Z_{2,t}'\\ n^{-\frac32}\sum_{t=1}^{n}Z_{2,t}Z_{1,t}' & n^{-2}\sum_{t=1}^{n}Z_{2,t}Z_{2,t}'\end{pmatrix}, \tag{5.13}
\]
where $Z_1 = (Z_{1,0},\dots,Z_{1,n-1})$ and $Z_2 = (Z_{2,0},\dots,Z_{2,n-1})$. Now the result in (5.10) follows by applying Lemma 5.1.

The proofs of (b) and (c) are similar to those of Lemma 3.1(b)-(c) and are therefore omitted.

Denote the index set of the zero components in $B_o$ as $S_B^c$, such that $\|B_{o,j}\| = 0$ for all $j\in S_B^c$ and $\|B_{o,j}\|\neq0$ otherwise. We next derive the asymptotic properties of the LS shrinkage estimator defined in (5.2).

Lemma 5.3 If Assumptions 3.1 and 5.1 are satisfied and $\lambda_n = o(1)$, then the LS shrinkage estimator $(\hat\Pi_n,\hat B_n)$ satisfies
\[
\big[(\hat\Pi_n,\hat B_n) - (\Pi_o,B_o)\big]Q_B^{-1}D_{n,B} = O_p(1 + n^{\frac12}\lambda_n). \tag{5.14}
\]

Proof. Let $\Phi = (\Pi, B)$ and
\[
V_{3,n}(\Phi) = \sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j=1}^{p}B_j\Delta Y_{t-j}\Big\|_E^2 + n\sum_{j=1}^{p}\hat P_{\lambda_n}(B_j) + n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi)\big] = V_{4,n}(\Phi) + n\sum_{j=1}^{p}\hat P_{\lambda_n}(B_j) + n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi)\big].
\]
Set $\hat\Phi_n = (\hat\Pi_n,\hat B_n)$; then by definition $V_{3,n}(\hat\Phi_n)\le V_{3,n}(\Phi_o)$, so that
\[
n\,\mathrm{vec}\big[n^{-\frac12}(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big]'\,W_n\,\mathrm{vec}\big[n^{-\frac12}(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big] + 2n\,\mathrm{vec}\big[n^{-\frac12}(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big]'\,d_n \le n(e_{1,n}+e_{2,n}), \tag{5.15}
\]
where $W_n = D_{n,B}^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_{n,B}^{-1}\otimes I_m$, $d_n = \mathrm{vec}\big(n^{-\frac12}D_{n,B}^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\big)$, $e_{1,n} = \sum_{j\in S_B}\big[\hat P_{\lambda_n}(B_{o,j}) - \hat P_{\lambda_n}(\hat B_{n,j})\big]$ and $e_{2,n} = \sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}[\lambda_k(\Pi_o)] - \hat P_{\lambda_n}[\lambda_k(\hat\Pi_n)]\big\}$. Applying the Cauchy-Schwarz inequality to (5.15), we deduce that
\[
\mu_{n,\min}\big\|n^{-\frac12}(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E^2 - \big\|n^{-\frac12}(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E\,\|d_n\| - (e_{1,n}+e_{2,n}) \le 0, \tag{5.16}
\]
where $\mu_{n,\min}$ denotes the smallest eigenvalue of $W_n$, which is bounded away from zero w.p.a.1. By the definition of the adaptive group Lasso penalty function, the consistency of the first-step OLS estimator and the Slutsky Theorem, we find that
\[
e_{1,n} \le \sum_{j\in S_B}\hat P_{\lambda_n}(B_{o,j}) = o_p(1) \quad\text{and}\quad e_{2,n} \le \sum_{k\in S_\Pi}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] = o_p(1). \tag{5.17}
\]


If $r_o = m$, then $D_{n,B} = n^{\frac12}I_{m(p+1)}$ and hence $n^{-\frac12}D_{n,B} = I_{m(p+1)}$ and $n^{-\frac12}D_{n,B}^{-1} = n^{-1}I_{m(p+1)}$. As $Z_{t-1}u_t'$ is a stationary martingale difference sequence, we have by the LLN
\[
d_n = \mathrm{vec}\Big(n^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\Big) = o_p(1). \tag{5.18}
\]
The consistency of $\hat\Phi_n$ is implied by the inequality in (5.16) and the results in (5.17)-(5.18). If $r_o\in[0,m)$, then by Lemma 5.1 we can deduce that
\[
d_n = \mathrm{vec}\Big(n^{-\frac12}D_{n,B}^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\Big) = o_p(1). \tag{5.19}
\]
The consistency of $\hat\Phi_n$ is implied by the inequality in (5.16) and the results in (5.17) and (5.19).

We next derive the convergence rate of the LS shrinkage estimator $\hat\Phi_n$. Using arguments similar to those in the proof of Theorem 3.2, we get
\[
\sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}[\lambda_k(\Pi_o)] - \hat P_{\lambda_n}[\lambda_k(\hat\Pi_n)]\big\} \le n^{-\frac12}C\max_{k\in S_\Pi}\big\{\hat P_{\lambda_n}'[\lambda_k(\Pi_o)]\big\}\,\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|, \tag{5.20}
\]
where $C$ is some generic positive constant. Similarly,
\[
\sum_{j\in S_B}\big[\hat P_{\lambda_n}(B_{o,j}) - \hat P_{\lambda_n}(\hat B_{n,j})\big] \le n^{-\frac12}C\max_{j\in S_B}\big\{\hat P_{\lambda_n}'(B_{o,j})\big\}\,\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|. \tag{5.21}
\]
Combining the results in (5.20)-(5.21), we get
\[
n(e_{1,n}+e_{2,n}) \le n^{\frac12}a_n\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|, \tag{5.22}
\]
where $a_n = \max_{k\in S_\Pi}\big\{\hat P_{\lambda_n}'[\lambda_k(\Pi_o)]\big\}\vee\max_{j\in S_B}\big\{\hat P_{\lambda_n}'(B_{o,j})\big\} = O_p(\lambda_n)$. From Lemma 5.1, we have
\[
n^{\frac12}d_n = \mathrm{vec}\Big(D_{n,B}^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\Big) = O_p(1). \tag{5.23}
\]
Using the inequality in (5.22), we can rewrite the inequality in (5.16) as
\[
\mu_{n,\min}\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E^2 - \big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E\big(\|n^{\frac12}d_n\| + n^{\frac12}a_n\big) \le 0, \tag{5.24}
\]


which combined with the result in (5.23) implies the rate claimed in (5.14).

The results in Lemma 5.3 imply that the LS shrinkage estimator $(\hat\Pi_n,\hat B_n)$ can have the same convergence rate as the OLS estimator $(\hat\Pi_{ols},\hat B_{ols})$. We next show that if the tuning parameter $\lambda_n$ converges to zero not too fast, then the zero eigenvalues of $\Pi_o$ and the zero matrices in $B_o$ are estimated as zero w.p.a.1. Let the smallest $m-r_o$ eigenvalues of $\hat\Pi_n$ be indexed by $S_{n,\Pi}^c$ and the zero matrices in $\hat B_n$ by $S_{n,B}^c$.

Theorem 5.1 Suppose that Assumptions 3.1 and 5.1 hold. If the tuning parameter $\lambda_n$ and $\omega$ satisfy the conditions $\omega > \frac12$, $n^{\frac12}\lambda_n = o(1)$, $\lambda_nn^{\omega}\to\infty$ and $n^{\frac{1+\omega}2}\lambda_n\to\infty$, then we have
\[
\Pr\big(\lambda_k(\hat\Pi_n) = 0\big) \to 1 \tag{5.25}
\]
for all $k\in S_\Pi^c$, and
\[
\Pr\big(\hat B_{n,j} = 0_{m\times m}\big) \to 1 \tag{5.26}
\]
for all $j\in S_B^c$.

Proof. The first result can be proved using arguments similar to those in the proof of Theorem 3.3 and hence is omitted. We next show the second result. Note that
\[
\sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j=1}^{p}B_j\Delta Y_{t-j}\Big\|_E^2 - \sum_{t=1}^{n}\|u_t\|_E^2 = \mathrm{vec}(\Phi-\Phi_o)'\Big(Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'Q_B^{-1\prime}\otimes I_m\Big)\mathrm{vec}(\Phi-\Phi_o) - 2\,\mathrm{vec}(\Phi-\Phi_o)'\,\mathrm{vec}\Big(Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\Big). \tag{5.27}
\]
On the event $\{\hat B_{n,j}\neq 0_{m\times m}\}$ for some $j\in S_B^c$, we have the following KKT optimality condition:
\[
\frac{n\lambda_n}{2\,\|\hat B_{ols,j}\|_E^{\omega}\,\|\hat B_{n,j}\|_E}\,\mathrm{vec}\big(\hat B_{n,j}\big) = V\big(\hat W_B\otimes I_m\big)\mathrm{vec}\big(\hat\Phi_n-\Phi_o\big) - V\,\mathrm{vec}\Big(Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\Big), \tag{5.28}
\]
where $V = \mathrm{diag}(V_1,\dots,V_{p+1})$ with $V_{j+1} = I_m$ and $V_l = 0_m$ for $l\neq j+1$, and $\hat W_B = Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'Q_B^{-1\prime}$. By the definition of $V$, we have $V = V^*\otimes I_m$, where $V^*$ is a $(p+1)\times(p+1)$ diagonal


matrix whose $(j+1)$-th diagonal element is 1 and whose other elements are zero. Hence, the results in (5.28) imply that
\[
\frac{n^{\frac{1+\omega}2}\lambda_n}{2\,\|n^{\frac12}\hat B_{ols,j}\|_E^{\omega}} = \Big\|V^*\Big(\frac{\hat W_B}{n^{\frac12}}\otimes I_m\Big)\mathrm{vec}\big(\hat\Phi_n-\Phi_o\big) - \mathrm{vec}\Big(V^*Q_B^{-1}\frac{\sum_{t=1}^{n}Z_{t-1}u_t'}{n^{\frac12}}\Big)\Big\|_E. \tag{5.29}
\]
By Lemma 5.1 and Lemma 5.3, we deduce that
\[
V^*\Big(\frac{\hat W_B}{n^{\frac12}}\otimes I_m\Big)\mathrm{vec}\big(\hat\Phi_n-\Phi_o\big) = V^*\Big(\frac{\hat W_BQ_B'D_{n,B}^{-1}}{n^{\frac12}}\otimes I_m\Big)\mathrm{vec}\big[(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big] = O_p(1) \tag{5.30}
\]
and
\[
\mathrm{vec}\Big(V^*Q_B^{-1}\frac{\sum_{t=1}^{n}Z_{t-1}u_t'}{n^{\frac12}}\Big) = O_p(1). \tag{5.31}
\]
However, as $n^{\frac{1+\omega}2}\lambda_n\to\infty$, it follows by Lemma 5.2 and the Slutsky Theorem that
\[
\frac{n^{\frac{1+\omega}2}\lambda_n}{2\,\|n^{\frac12}\hat B_{ols,j}\|_E^{\omega}} \to_p \infty. \tag{5.32}
\]
Now, using the results in (5.29)-(5.31), we can deduce that the event $\{\hat B_{n,j}\neq 0_{m\times m}\}$ for any $j\in S_B^c$ has probability approaching zero as $n\to\infty$, which finishes the proof.

Theorem 5.1 indicates that the zero eigenvalues of $\Pi_o$ and the zero matrices in $B_o$ are estimated as zeros w.p.a.1. Hence Lemma 5.3 and Theorem 5.1 imply consistent cointegration rank selection and consistent lag order selection.
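Reading the selected rank and lag set off a fitted shrinkage estimate is then mechanical, as in the sketch below (the helper, the tolerance `tol`, and the toy matrices are our own illustrative choices, not output of the actual estimator):

```python
import numpy as np

def selected_rank_and_lags(Pi_hat, B_hats, tol=1e-10):
    """Selected cointegration rank = number of eigenvalues of Pi_hat not
    shrunk to zero; selected lag set = indices j with B_j not shrunk to
    the zero matrix (tol only absorbs floating-point dust)."""
    eigs = np.linalg.eigvals(Pi_hat)
    rank = int(np.sum(np.abs(eigs) > tol))
    lags = [j + 1 for j, B in enumerate(B_hats) if np.linalg.norm(B) > tol]
    return rank, lags

# toy shrinkage output: Pi_hat of rank one, lag 1 kept, lag 2 shrunk away
Pi_hat = np.array([[-0.5, 0.5], [0.25, -0.25]])
B_hats = [0.2 * np.eye(2), np.zeros((2, 2))]
print(selected_rank_and_lags(Pi_hat, B_hats))     # (1, [1])
```

Because the shrinkage estimator sets the relevant eigenvalues and lag matrices exactly to zero w.p.a.1, no auxiliary testing step is needed to recover $r_o$ and $S_B$.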

Theorem 5.2 Under the conditions of Theorem 5.1, we have
\[
\Pr\big(S_{n,\Pi} = S_\Pi \text{ and } S_{n,B} = S_B\big) \to 1 \tag{5.33}
\]
as $n\to\infty$.

As
\[
Q_B = \begin{pmatrix} I_{r_o} & 0 & 0\\ 0 & 0 & I_{mp}\\ 0 & I_{m-r_o} & 0\end{pmatrix}\mathrm{diag}(Q,\, I_{mp}),
\]

it is easy to see that
\[
Q_B^{-1} = \begin{pmatrix} \alpha_o(\beta_o'\alpha_o)^{-1} & 0 & \beta_{o\perp}(\alpha_{o\perp}'\beta_{o\perp})^{-1}\\ 0 & I_{mp} & 0\end{pmatrix}.
\]
To ensure identification, we normalize $\beta_o$ as $\beta_o = [I_{r_o},\, O_{r_o}]'$, where $O_{r_o}$ is some $r_o\times(m-r_o)$ matrix such that $\Pi_o = \alpha_o\beta_o' = [\alpha_o,\, \alpha_oO_{r_o}]$. From Lemma 5.3, we have
\[
\Big(n^{\frac12}(\hat\Pi_n-\Pi_o)\alpha_o(\beta_o'\alpha_o)^{-1},\;\; n^{\frac12}(\hat B_n-B_o),\;\; n(\hat\Pi_n-\Pi_o)\beta_{o\perp}(\alpha_{o\perp}'\beta_{o\perp})^{-1}\Big) = O_p(1),
\]
which implies that
\[
n\big(\hat O_n - O_o\big) = O_p(1), \tag{5.34}
\]
\[
n^{\frac12}\big(\hat B_n - B_o\big) = O_p(1), \tag{5.35}
\]
\[
n^{\frac12}\big(\hat\alpha_n - \alpha_o\big) = O_p(1). \tag{5.36}
\]

To conclude this section, we derive the centered limit distribution of the shrinkage estimator $\hat\Phi_S = (\hat\Pi_n,\hat B_{S_B})$ in the following theorem, where $\hat B_{S_B}$ denotes the LS shrinkage estimator of the nonzero matrices in $B_o$. Denote $I_{S_B} = \mathrm{diag}(I_{1,m},\dots,I_{d_{S_B},m})$, where the $I_{j,m}$ ($j = 1,\dots,d_{S_B}$) are $m\times m$ identity matrices and $d_{S_B}$ is the cardinality of the index set $S_B$. Define
\[
Q_S := \begin{pmatrix} \beta_o' & 0\\ 0 & I_{S_B}\\ \alpha_{o\perp}' & 0\end{pmatrix} \quad\text{and}\quad D_S = \mathrm{diag}\big(n^{\frac12}I_{r_o},\, n^{\frac12}I_{S_B},\, nI_{m-r_o}\big),
\]
where the identity matrix $I_{S_B} = I_{md_{S_B}}$ in $Q_S$ serves to accommodate the nonzero matrices in $B_o$.

Theorem 5.3 Under the conditions of Theorem 5.1, we have
\[
\big(\hat\Phi_S - \Phi_{o,S}\big)Q_S^{-1}D_S \to_d \Big(N(0,\Omega_u\otimes\Sigma_{z_1z_1})\Sigma_{z_1z_1}^{-1},\;\; \alpha_oU_2^*(\alpha_{o\perp}'\beta_{o\perp})^{-1}\Big), \tag{5.37}
\]
where
\[
U_2^* = (\alpha_o'\alpha_o)^{-1}\alpha_o'\Big(\int dB_uB_{w_2}'\Big)\Big(\int B_{w_2}B_{w_2}'\Big)^{-1}\big(\alpha_{o\perp}'\beta_{o\perp}\big). \tag{5.38}
\]

Proof. From the results of Theorem 5.1, we deduce that $\hat\alpha_n$, $\hat\beta_n$ and $\hat B_{S_B}$ minimize the following criterion function w.p.a.1:
\[
V_{3,n}(\Phi_S) = \sum_{t=1}^{n}\Big\|\Delta Y_t - \alpha\beta'Y_{t-1} - \sum_{j\in S_B}B_j\Delta Y_{t-j}\Big\|_E^2 + n\sum_{k\in S_\Pi}\hat P_{\lambda_n}\big[\lambda_k(\alpha\beta')\big] + n\sum_{j\in S_B}\hat P_{\lambda_n}(B_j).
\]
Define $U_{1,n}^* = \sqrt n(\hat\alpha_n-\alpha_o)$, $U_{2,n} = \big(0_{r_o\times r_o},\, U_{2,n}^*\big)$, where $U_{2,n}^* = n(\hat O_n-O_o)$, and $U_{3,n}^* = \sqrt n\big(\hat B_{S_B}-B_{o,S_B}\big)$; then
\[
\big[(\hat\Pi_n-\Pi_o),\,(\hat B_{S_B}-B_{o,S_B})\big]Q_S^{-1}D_S = \big[n^{-\frac12}\hat\alpha_nU_{2,n}\alpha_o(\beta_o'\alpha_o)^{-1} + U_{1,n}^*,\;\; U_{3,n}^*,\;\; \hat\alpha_nU_{2,n}\beta_{o\perp}(\alpha_{o\perp}'\beta_{o\perp})^{-1}\big].
\]
Denote
\[
\Lambda_n(U) = \big[n^{-\frac12}\hat\alpha_nU_2\alpha_o(\beta_o'\alpha_o)^{-1} + U_1,\;\; U_3,\;\; \hat\alpha_nU_2\beta_{o\perp}(\alpha_{o\perp}'\beta_{o\perp})^{-1}\big],
\]
where for $U = (U_1, U_2^*, U_3)$ we set $U_2 = (0_{r_o\times r_o},\, U_2^*)$; then by definition $U_n^* = \big(U_{1,n}^*,\, U_{2,n}^*,\, U_{3,n}^*\big)$ minimizes the following criterion function:
\[
V_{4,n}(U) = \sum_{t=1}^{n}\big(\|u_t - \Lambda_n(U)D_S^{-1}Z_{S,t-1}\|_E^2 - \|u_t\|_E^2\big) + n\sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}\big[\lambda_k(\Lambda_n(U)D_S^{-1}Q_SL_1 + \Pi_o)\big] - \hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\big\} + n\sum_{j\in S_B}\big\{\hat P_{\lambda_n}\big[\Lambda_n(U)D_S^{-1}Q_SL_{j+1} + B_{o,j}\big] - \hat P_{\lambda_n}(B_{o,j})\big\},
\]
where $L_j = \mathrm{diag}(A_1,\dots,A_{p+1})$ with $A_j = I_m$ and $A_l = 0$ for $l\neq j$, and $Z_{S,t-1} = Q_S\begin{pmatrix}Y_{t-1}\\ \Delta X_{S_B,t-1}\end{pmatrix}$.

For any compact set $K\subset\mathbb{R}^{m\times r_o}\times\mathbb{R}^{r_o\times(m-r_o)}\times\mathbb{R}^{m\times md_{S_B}}$ and any $U\in K$, we have
\[
\Lambda_n(U)D_S^{-1}Q_S = O_p(n^{-\frac12}).
\]
Hence, using similar arguments to those in the proof of Theorem 3.6, we can deduce that
\[
n\sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}\big[\lambda_k(\Lambda_n(U)D_S^{-1}Q_SL_1 + \Pi_o)\big] - \hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\big\} = o_p(1) \tag{5.39}
\]


and
\[
n\sum_{j\in S_B}\big\{\hat P_{\lambda_n}\big[\Lambda_n(U)D_S^{-1}Q_SL_{j+1} + B_{o,j}\big] - \hat P_{\lambda_n}(B_{o,j})\big\} = o_p(1) \tag{5.40}
\]
uniformly over $U\in K$. Next, note that
\[
\Lambda_n(U) \to_p \big[U_1,\;\; U_3,\;\; \alpha_oU_2\beta_{o\perp}(\alpha_{o\perp}'\beta_{o\perp})^{-1}\big] := \Lambda_1(U) \tag{5.41}
\]
uniformly over $U\in K$. By Lemma 5.1 and (5.41), we can deduce that
\[
\sum_{t=1}^{n}\big(\|u_t - \Lambda_n(U)D_S^{-1}Z_{S,t-1}\|_E^2 - \|u_t\|_E^2\big) \to_d \mathrm{vec}\big[\Lambda_1(U)\big]'\Big[\begin{pmatrix}\Sigma_{z_1z_1} & 0\\ 0 & \int B_{w_2}B_{w_2}'\end{pmatrix}\otimes I_m\Big]\mathrm{vec}\big[\Lambda_1(U)\big] - 2\,\mathrm{vec}\big[\Lambda_1(U)\big]'\,\mathrm{vec}\big[(B_{1,m},\, B_{2,m})\big] := V_2(U) \tag{5.42}
\]
uniformly over $U\in K$, where $B_{1,m} = N(0,\Omega_u\otimes\Sigma_{z_1z_1})$ and $B_{2,m} = \big(\int B_{w_2}dB_u'\big)'$. Note that $V_2(U)$ can be rewritten as
\[
V_2(U) = \mathrm{vec}(U_1,U_3)'\big(\Sigma_{z_1z_1}\otimes I_m\big)\mathrm{vec}(U_1,U_3) + \mathrm{vec}(U_2)'\Big[(\beta_{o\perp}'\alpha_{o\perp})^{-1}\int B_{w_2}B_{w_2}'(\alpha_{o\perp}'\beta_{o\perp})^{-1}\otimes\alpha_o'\alpha_o\Big]\mathrm{vec}(U_2) - 2\,\mathrm{vec}(U_1,U_3)'\,\mathrm{vec}(B_{1,m}) - 2\,\mathrm{vec}(U_2)'\,\mathrm{vec}\big[\alpha_o'B_{2,m}(\alpha_{o\perp}'\beta_{o\perp})^{-1}\big]. \tag{5.43}
\]
The expression in (5.43) makes it clear that $V_2(U)$ is uniquely minimized at $(U_1^*, U_2^*, U_3^*)$, where
\[
(U_1^*, U_3^*) = B_{1,m}\Sigma_{z_1z_1}^{-1} \quad\text{and}\quad U_2^* = (\alpha_o'\alpha_o)^{-1}\alpha_o'B_{2,m}\Big(\int B_{w_2}B_{w_2}'\Big)^{-1}\big(\alpha_{o\perp}'\beta_{o\perp}\big). \tag{5.44}
\]
From (5.34) and (5.36), we can see that $U_n^*$ is asymptotically tight. Invoking the ACMT, we deduce that $U_n^*\to_d U^*$. The results in (5.37) follow by applying the CMT.


6 Adaptive Selection of the Tuning Parameters

This section develops a data-driven procedure for selecting the tuning parameter $\lambda_n$. Given $\lambda$, we denote the rank of the shrinkage estimator $\hat\Pi_n(\lambda)$ by $\hat r_n(\lambda)$ and let $S_{B,\lambda}$ be the index set of the nonzero matrices in the shrinkage estimator $\hat B_n(\lambda)$. Define
\[
IC_n(\lambda) = \min_{\substack{\Pi\in\mathbb{R}^{m\times m},\, r(\Pi) = \hat r_n(\lambda)\\ B_j\in\mathbb{R}^{m\times m}}}\Big\{\sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j\in S_{B,\lambda}}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} - C_nK\big[\hat r_n(\lambda),\, \hat p_n(\lambda)\big], \tag{6.1}
\]
where $\hat p_n(\lambda)$ denotes the cardinality of the index set $S_{B,\lambda}$, $K(r,p)$ is a strictly decreasing function of $r$ and $p$, and $C_n$ is a positive sequence. We propose to select the tuning parameter $\lambda$ by minimizing the information criterion $IC_n(\lambda)$ defined in (6.1), i.e.
\[
\hat\lambda_n^* = \arg\min_{\lambda\in\Omega_n} IC_n(\lambda), \tag{6.2}
\]
where $\Omega_n = [0,\lambda_{\max,n}]$ and $\lambda_{\max,n}$ is a positive sequence such that $\lambda_{\max,n} = o(1)$.

Note that the first term in (6.1) measures the fit of the selected model, while the second term rewards the selection of a smaller cointegration rank and a smaller number of lagged differences. When an overfitted model is selected, we can show that
\[
\min_{\substack{\Pi,\, r(\Pi) = \hat r_n(\lambda)\\ B_j}}\Big\{\sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j\in S_{B,\lambda}}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} - \min_{\substack{\Pi,\, r(\Pi) = r_o\\ B_j}}\Big\{\sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j\in S_B}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} = O_p(1), \tag{6.3}
\]
while if $C_n\to\infty$, then
\[
C_n\big\{K(r_o, p_o) - K\big[\hat r_n(\lambda),\, \hat p_n(\lambda)\big]\big\} \to \infty. \tag{6.4}
\]
The results in (6.3)-(6.4) imply that overfitted models will not be selected if the tuning parameter $\hat\lambda_n^*$ is chosen by minimizing the information criterion in (6.1). On the other hand,


if an underfitted model is selected, then we can show that
\[
\min_{\substack{\Pi,\, r(\Pi) = \hat r_n(\lambda)\\ B_j}}\Big\{\frac1n\sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j\in S_{B,\lambda}}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} - \min_{\substack{\Pi,\, r(\Pi) = r_o\\ B_j}}\Big\{\frac1n\sum_{t=1}^{n}\Big\|\Delta Y_t - \Pi Y_{t-1} - \sum_{j\in S_B}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} \to_p C, \tag{6.5}
\]
where $C$ is some generic positive constant, while if $C_n/n\to0$, then
\[
\frac{C_n}{n}\big\{K(r_o, p_o) - K\big[\hat r_n(\lambda),\, \hat p_n(\lambda)\big]\big\} \to 0. \tag{6.6}
\]
The results in (6.5)-(6.6) imply that underfitted models will not be selected if the tuning parameter $\hat\lambda_n^*$ is chosen by minimizing the information criterion in (6.1). From the above discussion, we see that the following assumption is important for the LS shrinkage estimator based on $\hat\lambda_n^*$ to enjoy oracle-like properties.

Assumption 6.1 (i) $K(r,p)$ is a bounded and strictly decreasing function of $r$ and $p$; (ii) $C_n\to\infty$ as the sample size $n\to\infty$ and $C_n = o(n)$.

We next show that LS shrinkage estimation based on $\hat\lambda_n^*$, selected by minimizing the information criterion defined in (6.1), is consistent in cointegration rank and lag order selection.

Theorem 6.1 Under Assumptions 3.1, 5.1 and 6.1, we have
\[
\Pr\big(S_{\Pi,\hat\lambda_n^*} = S_\Pi \text{ and } S_{B,\hat\lambda_n^*} = S_B\big) \to 1 \tag{6.7}
\]
as $n\to\infty$, where $S_{\Pi,\hat\lambda_n^*}$ denotes the index set of the nonzero eigenvalues of $\hat\Pi_n(\hat\lambda_n^*)$ and $S_{B,\hat\lambda_n^*}$ denotes the index set of the nonzero matrices in $\hat B_n(\hat\lambda_n^*)$.

For empirical implementation, the function $K(r,p)$ can be specified as $K(r,p) = r^2 - 2mr - p$ and the sequence $C_n$ can be $\log n$ or $\log\log n$, which give BIC- and HQIC-type information criteria, respectively. On the other hand, if $C_n = o(n)$ and $C_n < M$ for some finite constant $M$ and all $n$, then we can show that LS shrinkage estimation based on $\hat\lambda_n^*$ selects underfitted models with probability approaching zero, while it has nontrivial probability of selecting overfitted models. One such example is the AIC-type information criterion, where $K(r,p) = r^2 - 2mr - p$ and $C_n = 2$.
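A minimal sketch of the resulting selection rule is given below, using the BIC-type choice $K(r,p) = r^2 - 2mr - p$ and $C_n = \log n$; the candidate fits and their SSR values are made-up illustrative numbers, not output of the actual shrinkage estimator:

```python
import numpy as np

def info_criterion(ssr, r, p_hat, m, Cn):
    """IC_n(lambda) = SSR - C_n * K(r, p) with K(r, p) = r**2 - 2*m*r - p."""
    K = r ** 2 - 2 * m * r - p_hat
    return ssr - Cn * K

m, n = 3, 200
Cn = np.log(n)           # BIC type; np.log(np.log(n)) would give HQIC
fits = [                 # (lambda, SSR, selected rank, selected lag count)
    (0.01, 580.0, 3, 2), # overfitted: smallest SSR, weakest bonus
    (0.10, 590.0, 1, 1), # correct rank and lag order
    (1.00, 900.0, 0, 0), # underfitted: SSR blows up
]
best = min(fits, key=lambda f: info_criterion(f[1], f[2], f[3], m, Cn))
print(best[0])           # 0.1 -- the middle fit attains the smallest IC
```

The example mirrors the argument above: the overfitted fit gains only an $O_p(1)$ SSR improvement but loses an $O(C_n)$ bonus, while the underfitted fit loses an $O(n)$ amount of fit.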

7 Simulation Study

8 An Empirical Example

9 Conclusion

One of the main challenges in any applied econometric work is the selection of a good model for practical implementation. The conduct of inference and model use in forecasting and policy analysis are inevitably conditioned on the empirical process of model selection, which typically leads to issues of post-model selection inference. Adaptive Lasso and bridge estimation methods provide a methodology in which these difficulties may be attenuated by simultaneous model selection and estimation.

This paper uses that methodology to develop an automated approach to cointegrated system modeling that enables simultaneous estimation of the cointegrating rank and autoregressive order in conjunction with oracle-like efficient estimation of the cointegrating matrix and the transient dynamics. As such, the methods offer practical advantages to the empirical researcher by avoiding sequential techniques where cointegrating rank and transient dynamics are estimated prior to model fitting.

Various extensions of the methods developed here are possible. One rather obvious extension is to allow for parametric restrictions on the cointegrating matrix which may relate to theory-induced specifications. Lasso-type procedures have so far been confined to parametric models, whereas cointegrated systems are often formulated with some nonparametric elements relating to unknown features of the model. A second extension of the present methodology, therefore, is to semiparametric formulations in which the error process in the VECM is weakly dependent, which is partly considered already in Section 4. These and other generalizations of the framework will be explored in future work.


10 Appendix

We start with the following standard asymptotic result.

Lemma 10.1 Under Assumptions 3.1 and 3.2, we have
(a) $n^{-1}\sum_{t=1}^{n}Z_{1,t-1}Z_{1,t-1}' \to_p \Sigma_{z_1z_1}$;
(b) $n^{-\frac32}\sum_{t=1}^{n}Z_{1,t-1}Z_{2,t-1}' \to_p 0$;
(c) $n^{-2}\sum_{t=1}^{n}Z_{2,t-1}Z_{2,t-1}' \to_d \int B_{w_2}B_{w_2}'$;
(d) $n^{-\frac12}\sum_{t=1}^{n}u_tZ_{1,t-1}' \to_d N(0,\Omega_u\otimes\Sigma_{z_1z_1})$;
(e) $n^{-1}\sum_{t=1}^{n}u_tZ_{2,t-1}' \to_d \big(\int B_{w_2}dB_u'\big)'$.
The quantities in (c), (d), and (e) converge jointly.

Proof of Lemma 10.1. See Johansen (1995) and Cheng and Phillips (2009).
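Part (a) is an ordinary law of large numbers for the stationary component $Z_{1,t}$; the quick numerical check below illustrates it with a scalar AR(1) stand-in (the AR coefficient 0.5 and the sample size are arbitrary illustrative choices):

```python
import numpy as np

# For z_t = 0.5 z_{t-1} + e_t with e_t ~ N(0,1), the stationary variance
# is 1/(1 - 0.25) = 4/3, and n^{-1} * sum z_t^2 should settle near it,
# mirroring Lemma 10.1(a)
rng = np.random.default_rng(1)
n = 100_000
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
sample_moment = (z ** 2).mean()
print(abs(sample_moment - 4 / 3) < 0.05)   # True at this sample size
```

Parts (c)-(e), by contrast, are functional central limit results and converge to random limits rather than constants, which is why they require the stronger joint-convergence statement.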

Proof of Lemma 3.1. (a) From (3.10),
\[
Q\big(\hat\Pi_{ols}-\Pi_o\big)Q^{-1}D_n = \Big(\sum_{t=1}^{n}Qu_tY_{t-1}'Q'\Big)\Big(\sum_{t=1}^{n}QY_{t-1}Y_{t-1}'Q'\Big)^{-1}D_n = \Big(\sum_{t=1}^{n}v_tZ_{t-1}'D_n^{-1}\Big)\Big(D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n^{-1}\Big)^{-1}, \tag{10.1}
\]
where $v_t = Qu_t$ and $Z_{t-1} = QY_{t-1}$. Result (a) follows directly from Lemma 10.1.

(b) Denote $P = [\beta_o,\, \beta_{o\perp}]$ and $S_n(\lambda) = \lambda I_m - \hat\Pi_{ols}$. Then, by definition, the elements of $\lambda(\hat\Pi_{ols})$ are the solutions of the determinantal equation
\[
0 = \big|\lambda I_m - \hat\Pi_{ols}\big| = \big|P'S_n(\lambda)P\big| = \begin{vmatrix} \lambda\beta_o'\beta_o - \beta_o'\hat\Pi_{ols}\beta_o & -\beta_o'\hat\Pi_{ols}\beta_{o\perp}\\ -\beta_{o\perp}'\hat\Pi_{ols}\beta_o & \lambda I_{m-r_o} - \beta_{o\perp}'\hat\Pi_{ols}\beta_{o\perp}\end{vmatrix}. \tag{10.2}
\]
Using (a), we deduce that
\[
\beta_o'\hat\Pi_{ols}\beta_{o\perp} = \beta_o'\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp} = o_p(1), \tag{10.3}
\]
\[
\beta_{o\perp}'\hat\Pi_{ols}\beta_{o\perp} = \beta_{o\perp}'\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp} = o_p(1), \tag{10.4}
\]
and, similarly,
\[
\beta_{o\perp}'\hat\Pi_{ols}\beta_o \to_p \beta_{o\perp}'\alpha_o\beta_o'\beta_o \quad\text{and}\quad \beta_o'\hat\Pi_{ols}\beta_o \to_p \beta_o'\alpha_o\beta_o'\beta_o. \tag{10.5}
\]


Using (10.2)-(10.5), we deduce that
\[
\big|\lambda I_m - \hat\Pi_{ols}\big| \to_p \big|\lambda I_{m-r_o}\big|\cdot\big|\lambda\beta_o'\beta_o - \beta_o'\alpha_o\beta_o'\beta_o\big| \tag{10.6}
\]
uniformly over any compact set in $\mathbb{R}^m$. By Assumption 3.2(i), $\lambda(\Pi_o)\in U_1 := \{\lambda\in\mathbb{R}^m: \|\lambda\|_E\le1\}$ and $U_1$ is a compact set in $\mathbb{R}^m$. Thus, by continuous mapping, we have
\[
\lambda_{S_{\Pi,n}^c}\big(\hat\Pi_{ols}\big) \to_p 0 \quad\text{and}\quad \lambda_{S_{\Pi,n}}\big(\hat\Pi_{ols}\big) \to_p \lambda_{S_\Pi}(\Pi_o), \tag{10.7}
\]
where $\lambda_{S_\Pi}(\Pi_o)$ denotes the solutions of the equation $\big|\lambda\beta_o'\beta_o - \beta_o'\alpha_o\beta_o'\beta_o\big| = 0$. This determinantal equation is equivalent to $\big|\lambda I_{r_o} - \beta_o'\alpha_o\big| = 0$, so result (b) follows.

(c) Using the notation from (b), we have
\[
\big|S_n(\lambda)\big| = \big|\beta_o'S_n(\lambda)\beta_o\big|\cdot\Big|\beta_{o\perp}'\big\{S_n(\lambda) - S_n(\lambda)\beta_o\big[\beta_o'S_n(\lambda)\beta_o\big]^{-1}\beta_o'S_n(\lambda)\big\}\beta_{o\perp}\Big|. \tag{10.8}
\]
Let $\lambda_k^* = n\lambda_k(\hat\Pi_{ols})$ ($k\in S_{n,\Pi}^c$), so that $\lambda_k^*$ is by definition a solution of the equation
\[
0 = \big|\beta_o'\bar S_n(\lambda)\beta_o\big|\cdot\Big|\beta_{o\perp}'\big\{\bar S_n(\lambda) - \bar S_n(\lambda)\beta_o\big[\beta_o'\bar S_n(\lambda)\beta_o\big]^{-1}\beta_o'\bar S_n(\lambda)\big\}\beta_{o\perp}\Big|, \tag{10.9}
\]
where $\bar S_n(\lambda) = \frac{\lambda}{n}I_m - \hat\Pi_{ols}$.

For any compact subset $K\subset\mathbb{R}$, we can invoke the results in (a) to show that
\[
\beta_o'\bar S_n(\lambda)\beta_o = \frac{\lambda}{n}\beta_o'\beta_o - \beta_o'\big(\hat\Pi_{ols}-\Pi_o\big)\beta_o - \beta_o'\Pi_o\beta_o \to_p -\beta_o'\alpha_o\beta_o'\beta_o \tag{10.10}
\]
uniformly over $K$. From Assumption 3.2(iii), we have
\[
\big|\beta_o'\Pi_o\beta_o\big| = \big|\beta_o'\alpha_o\beta_o'\beta_o\big| = \big|\beta_o'\alpha_o\big|\cdot\big|\beta_o'\beta_o\big| \neq 0.
\]
Thus, the normalized $m-r_o$ smallest eigenvalues $\lambda_k^*$ ($k\in S_{n,\Pi}^c$) of $\hat\Pi_{ols}$ are asymptotically the solutions of the following determinantal equation:
\[
0 = \Big|\beta_{o\perp}'\big\{\bar S_n(\lambda) - \bar S_n(\lambda)\beta_o\big[\beta_o'\bar S_n(\lambda)\beta_o\big]^{-1}\beta_o'\bar S_n(\lambda)\big\}\beta_{o\perp}\Big|, \tag{10.11}
\]


where
\[
\beta_o'\bar S_n(\lambda)\beta_{o\perp} = -\beta_o'\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp}, \tag{10.12}
\]
\[
\beta_{o\perp}'\bar S_n(\lambda)\beta_{o\perp} = \frac{\lambda}{n}I_{m-r_o} - \beta_{o\perp}'\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp}, \tag{10.13}
\]
\[
\beta_{o\perp}'\bar S_n(\lambda)\beta_o = -\beta_{o\perp}'\hat\Pi_{ols}\beta_o \to_p -\beta_{o\perp}'\alpha_o\beta_o'\beta_o. \tag{10.14}
\]
Using the results in (10.10) and (10.12)-(10.14), we get
\[
\beta_{o\perp}'\big\{\bar S_n(\lambda) - \bar S_n(\lambda)\beta_o\big[\beta_o'\bar S_n(\lambda)\beta_o\big]^{-1}\beta_o'\bar S_n(\lambda)\big\}\beta_{o\perp} = \frac{\lambda}{n}I_{m-r_o} - \beta_{o\perp}'\big[I_m - \alpha_o(\beta_o'\alpha_o)^{-1}\beta_o' + o_p(1)\big]\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp}. \tag{10.15}
\]
Note that
\[
\beta_{o\perp}'\big[I_m - \alpha_o(\beta_o'\alpha_o)^{-1}\beta_o'\big]Q^{-1} = \big(0_{(m-r_o)\times r_o},\; (\alpha_{o\perp}'\beta_{o\perp})^{-1}\big) := H_1, \tag{10.16}
\]
and
\[
Q\beta_{o\perp} = [\beta_o,\, \alpha_{o\perp}]'\,\beta_{o\perp} = \begin{pmatrix} 0\\ \alpha_{o\perp}'\beta_{o\perp}\end{pmatrix} := H_2'. \tag{10.17}
\]

Using (10.1), we deduce that
\[
nH_1Q\big(\hat\Pi_{ols}-\Pi_o\big)Q^{-1}H_2' = H_1\Big(\sum_{t=1}^{n}v_tZ_{t-1}'D_n^{-1}\Big)\Big(D_n^{-1}\sum_{t=1}^{n}Z_tZ_t'D_n^{-1}\Big)^{-1}H_2' \to_d (\alpha_{o\perp}'\beta_{o\perp})^{-1}\Big(\int B_{w_2}dB_{w_2}'\Big)'\Big(\int B_{w_2}B_{w_2}'\Big)^{-1}\big(\alpha_{o\perp}'\beta_{o\perp}\big). \tag{10.18}
\]
Then, from (10.11)-(10.18), we obtain
\[
\Big|n\,\beta_{o\perp}'\big\{\bar S_n(\lambda) - \bar S_n(\lambda)\beta_o\big[\beta_o'\bar S_n(\lambda)\beta_o\big]^{-1}\beta_o'\bar S_n(\lambda)\big\}\beta_{o\perp}\Big| \to_d \bigg|\lambda I_{m-r_o} - \Big(\int B_{w_2}dB_{w_2}'\Big)'\Big(\int B_{w_2}B_{w_2}'\Big)^{-1}\bigg| \tag{10.19}
\]
uniformly over $K$. The result in (c) follows from (10.19) and by continuous mapping.


Proof of Theorem 3.1. Define
\[
V_{1,n}(\Pi) = \sum_{t=1}^{n}\|\Delta Y_t - \Pi Y_{t-1}\|_E^2 + n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi)\big] = V_{2,n}(\Pi) + n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi)\big].
\]
We can write
\[
\sum_{t=1}^{n}\|\Delta Y_t - \Pi Y_{t-1}\|_E^2 = \big[\Delta y - (Y_{-1}'\otimes I_m)\mathrm{vec}(\Pi)\big]'\big[\Delta y - (Y_{-1}'\otimes I_m)\mathrm{vec}(\Pi)\big],
\]
where $\Delta y = \mathrm{vec}(\Delta Y)$, $\Delta Y = (\Delta Y_1,\dots,\Delta Y_n)_{m\times n}$ and $Y_{-1} = (Y_0,\dots,Y_{n-1})_{m\times n}$.

By definition, $V_{1,n}(\hat\Pi_n)\le V_{1,n}(\Pi_o)$, and thus
\[
\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\Big(\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\otimes I_m\Big)\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big] + 2\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\,\mathrm{vec}\Big(\sum_{t=1}^{n}Y_{t-1}u_t'\Big) \le n\Big\{\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] - \sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\hat\Pi_n)\big]\Big\}. \tag{10.20}
\]
When $r_o = 0$, $\Delta Y_t$ is stationary and $Y_t$ is a full-rank I(1) process, so that
\[
n^{-2}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' \to_d \int_0^1 B_u(a)B_u'(a)\,da, \tag{10.21}
\]
and $n^{-2}\sum_{t=1}^{n}Y_{t-1}u_t' = o_p(1)$. From the results in (10.20) and (10.21), we get, w.p.a.1,
\[
\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - 2\big\|\hat\Pi_n-\Pi_o\big\|_E\,c_n - d_n \le 0, \tag{10.22}
\]
where $\mu_{\min}$ denotes the smallest eigenvalue of the random matrix $\int_0^1 B_u(a)B_u'(a)\,da$, which is positive with probability 1, $c_n = \big\|n^{-2}\sum_{t=1}^{n}Y_{t-1}u_t'\big\|_E = o_p(1)$ and $d_n = n^{-1}\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] = o_p(1)$. From (10.22), it is straightforward to deduce that $\big\|\hat\Pi_n-\Pi_o\big\|_E = o_p(1)$.

When $r_o = m$, $Y_t$ is stationary and we have
\[
n^{-1}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' \to_p \Gamma_{YY}(0) = R(1)\Omega_uR(1)', \tag{10.23}
\]
and $n^{-1}\sum_{t=1}^{n}Y_{t-1}u_t' = o_p(1)$. From the results in (10.20) and (10.23), we get, w.p.a.1,
\[
\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - 2\big\|\hat\Pi_n-\Pi_o\big\|_E\,c_n - d_n \le 0, \tag{10.24}
\]
where $\mu_{\min}$ denotes the smallest eigenvalue of $\Gamma_{YY}(0)$, which is positive with probability 1, $c_n = \big\|n^{-1}\sum_{t=1}^{n}Y_{t-1}u_t'\big\|_E = o_p(1)$ and $d_n = \sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] = o_p(1)$. So the consistency of $\hat\Pi_n$ follows directly from the inequality in (10.24).

Denote $B_n = Q^{-1}D_n$; then, when $0 < r_o < m$, we can use the results in Lemma 10.1 to deduce that
\[
\sum_{t=1}^{n}Y_{t-1}Y_{t-1}' = Q^{-1}D_n\Big(D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n^{-1}\Big)D_nQ'^{-1} = B_n\Big[\begin{pmatrix}\Sigma_{z_1z_1} & 0\\ 0 & \int B_{w_2}B_{w_2}'\end{pmatrix} + o_p(1)\Big]B_n',
\]
and thus
\[
\mathrm{vec}(\hat\Pi_n-\Pi_o)'\Big(B_n\Big[D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z_{t-1}'D_n^{-1}\Big]B_n'\otimes I_m\Big)\mathrm{vec}(\hat\Pi_n-\Pi_o) \ge \mu_{\min}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E^2, \tag{10.25}
\]
where $\mu_{\min}$ is the smallest eigenvalue of $\mathrm{diag}\big(\Sigma_{z_1z_1},\, \int B_{w_2}B_{w_2}'\big)$ and is strictly positive with probability 1. Next observe that
\[
\Big|\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\,\mathrm{vec}\Big(B_nD_n^{-1}\sum_{t=1}^{n}Z_{t-1}u_t'\Big)\Big| \le \big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E\,e_n, \tag{10.26}
\]
where $e_n = \big\|D_n^{-1}Q\sum_{t=1}^{n}Y_{t-1}u_t'\big\|_E = O_p(1)$ by Lemma 10.1. Denoting $d_n = n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]$, we have the inequality
\[
\mu_{\min}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E^2 - 2\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E\,e_n - d_n \le 0, \tag{10.27}
\]
which implies
\[
\big(\hat\Pi_n-\Pi_o\big)B_n = O_p\big(1 + d_n^{\frac12}\big). \tag{10.28}
\]
By the definition of $B_n$ and (10.28), we deduce that
\[
\hat\Pi_n - \Pi_o = O_p\big(n^{-\frac12} + n^{-\frac12}d_n^{\frac12}\big) = o_p(1).
\]

which implies the consistency of $\hat\Pi_n$.

Proof of Theorem 3.2. By the consistency of $\hat\Pi_n$ and the CMT, $\lambda_k(\hat\Pi_n)$ is consistent for $\lambda_k(\Pi_o)$. Consistency of $\lambda_k(\hat\Pi_n)$ and continuous twice differentiability of $\hat P_{\lambda_n}(\cdot)$ imply that
\[
\Big|\sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] - \hat P_{\lambda_n}\big[\lambda_k(\hat\Pi_n)\big]\big\}\Big| \le \max_{k\in S_\Pi}\big|\hat P_{\lambda_n}'\big[\lambda_k(\Pi_o)\big]\big|\sum_{k\in S_\Pi}\big|\lambda_k(\hat\Pi_n)-\lambda_k(\Pi_o)\big| + \max_{k\in S_\Pi}\big|\hat P_{\lambda_n}''\big[\lambda_k(\Pi_o)\big] + o_p(1)\big|\sum_{k\in S_\Pi}\big|\lambda_k(\hat\Pi_n)-\lambda_k(\Pi_o)\big|^2. \tag{10.29}
\]
By the triangle inequality and the Bauer-Fike theorem on eigenvalue sensitivity, we have the inequality
\[
\big|\lambda_k(\hat\Pi_n)-\lambda_k(\Pi_o)\big| \le C_V\big\|\hat\Pi_n-\Pi_o\big\|_E \tag{10.30}
\]
for all $k\in S_\Pi$, where the constant $C_V = \|V_o\|_E\,\big\|V_o^{-1}\big\|_E\in(0,\infty)$ and $V_o$ is the eigenvector matrix of $\Pi_o$.

Using (10.29) and (10.30) and invoking the inequality in (10.20), we get
\[
\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\Big(\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\otimes I_m\Big)\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big] + 2\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\,\mathrm{vec}\Big(\sum_{t=1}^{n}Y_{t-1}u_t'\Big) \le Cn\max_{k\in S_\Pi}\big|\hat P_{\lambda_n}'\big[\lambda_k(\Pi_o)\big]\big|\,\big\|\hat\Pi_n-\Pi_o\big\|_E + o_p(1)\,\big\|\hat\Pi_n-\Pi_o\big\|_E^2, \tag{10.31}
\]
where $C>0$ denotes a generic constant.

When $r_o = 0$, we use (10.21) and (10.31) to deduce that
\[
\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - \big\|\hat\Pi_n-\Pi_o\big\|_E\big(c_n + n^{-1}a_nC\big) \le 0, \tag{10.32}
\]


where $c_n = \big\|n^{-2}\sum_{t=1}^{n}Y_{t-1}u_t'\big\|_E = O_p(n^{-1})$. We deduce from the inequality (10.32) that
\[
\hat\Pi_n - \Pi_o = O_p\big(n^{-1} + n^{-1}a_n\big). \tag{10.33}
\]
When $r_o = m$, we use (10.23) and (10.31) to deduce that
\[
\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - \big\|\hat\Pi_n-\Pi_o\big\|_E\,(c_n + a_nC) \le 0, \tag{10.34}
\]
where $c_n = \big\|\frac1n\sum_{t=1}^{n}Y_{t-1}u_t'\big\|_E = O_p(n^{-\frac12})$. The inequality (10.34) leads to
\[
\hat\Pi_n - \Pi_o = O_p\big(n^{-\frac12} + a_n\big). \tag{10.35}
\]

When $0 < r_o < m$, we can use the results in (10.25), (10.26) and (10.31) to deduce that
\[
\mu_{\min}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E^2 - 2\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E\,e_n \le na_nC_V\big\|(\hat\Pi_n-\Pi_o)B_nB_n^{-1}\big\|_E, \tag{10.36}
\]
where $e_n = \big\|D_n^{-1}Q\sum_{t=1}^{n}Y_{t-1}u_t'\big\|_E = O_p(1)$. Note that
\[
\big\|(\hat\Pi_n-\Pi_o)B_nB_n^{-1}\big\|_E \le Cn^{-\frac12}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E. \tag{10.37}
\]
Using (10.36) and (10.37), we get
\[
\big(\hat\Pi_n-\Pi_o\big)B_n = O_p\big(1 + n^{\frac12}a_n\big), \tag{10.38}
\]
which finishes the proof.


References

[1] M. Caner and K. Knight, "No country for old unit root tests: bridge estimators differentiate between nonstationary versus stationary models and select optimal lag," unpublished manuscript, 2009.

[2] J. Chao and P. C. B. Phillips, "Model selection in partially nonstationary vector autoregressive processes with reduced rank structure," Journal of Econometrics, vol. 91, no. 2, pp. 227-271, 1999.

[3] X. Cheng and P. C. B. Phillips, "Semiparametric cointegrating rank selection," Econometrics Journal, vol. 12, pp. S83-S104, 2009.

[4] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348-1360, 2001.

[5] S. Johansen, "Statistical analysis of cointegration vectors," Journal of Economic Dynamics and Control, vol. 12, no. 2-3, pp. 231-254, 1988.

[6] S. Johansen, Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, USA, 1995.

[7] K. Knight and W. Fu, "Asymptotics for lasso-type estimators," Annals of Statistics, vol. 28, no. 5, pp. 1356-1378, 2000.

[8] H. Leeb and B. Pötscher, "Model selection and inference: facts and fiction," Econometric Theory, vol. 21, no. 1, pp. 21-59, 2005.

[9] Z. Liao, "Adaptive GMM shrinkage estimation with consistent moment selection," unpublished manuscript, 2010.

[10] Z. Liao and P. C. B. Phillips, "Reduced rank regression of partially non-stationary vector autoregressive processes under misspecification," unpublished manuscript, 2010.

[11] P. C. B. Phillips, "Optimal inference in cointegrated systems," Econometrica, vol. 59, no. 2, pp. 283-306, 1991.

[12] P. C. B. Phillips, "Fully modified least squares and vector autoregression," Econometrica, vol. 63, no. 5, pp. 1023-1078, 1995.

[13] P. C. B. Phillips, "Econometric model determination," Econometrica, vol. 64, no. 4, pp. 763-812, 1996.

[14] P. C. B. Phillips and V. Solo, "Asymptotics for linear processes," Annals of Statistics, vol. 20, no. 2, pp. 971-1001, 1992.

[15] H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418-1429, 2006.