

Estimation of Nonlinear Error Correction Models

Myung Hwan Seo∗ London School of Economics

The Suntory Centre

Suntory and Toyota International Centres for Economics and Related Disciplines London School of Economics and Political Science

Discussion paper Houghton Street No. EM/2007/517 London WC2A 2AE March 2007 Tel: 020 7955 6679

∗ Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom. E-mail address: [email protected]. This research was supported through a grant from the Economic and Social Science Research Council.


Abstract

Asymptotic inference in nonlinear vector error correction models (VECM) that exhibit regime-specific short-run dynamics is nonstandard and complicated. This paper contributes to the literature in several important ways. First, we establish the consistency of the least squares estimator of the cointegrating vector allowing for both smooth and discontinuous transition between regimes. This is a nonregular problem due to the presence of cointegration and nonlinearity. Second, we obtain the convergence rates of the cointegrating vector estimates. They differ depending on whether the transition is smooth or discontinuous. In particular, we find that the rate in the discontinuous threshold VECM is extremely fast, namely n^{3/2}, compared to the standard rate of n. This finding is very useful for inference on short-run parameters. Third, we provide an alternative inference method for the threshold VECM based on the smoothed least squares (SLS). The SLS estimator of the cointegrating vector and threshold parameter converges to a functional of a vector Brownian motion and it is asymptotically independent of that of the slope parameters, which is asymptotically normal.

Keywords: Threshold Cointegration, Smooth Transition Error Correction, Least Squares, Smoothed Least Squares, Consistency, Convergence Rate.

JEL No: C32 © The author. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.


1 Introduction

Nonlinear error correction models (ECM) have been studied actively in economics and there are numerous applications. To list only a few, see Michael, Nobay, and Peel (1997) for the application in purchasing power parity, Anderson (1997) in the term structure model of interest rates, Escribano (2004) in money demand, Psaradakis, Sola, and Spagnolo (2004) in the relation between stock prices and dividends, and Sephton (2003) in spatial market arbitrage, and see also a review by Granger (2001). The models include the smooth transition ECM of Granger and Teräsvirta (1993), threshold cointegration of Balke and Fomby (1997), the Markov switching ECM of Psaradakis et al. (2004), and the cubic polynomials of Escribano (2004).

A strand of econometric literature focuses on testing for the presence of nonlinearity and cointegration in an attempt to disentangle the nonstationarity from nonlinearity. A partial list includes Hansen and Seo (2002), Kapetanios, Shin, and Snell (2006) and Seo (2006). Time series properties of various ECMs have been established by Corradi, Swanson, and White (2000) and Saikkonen (2005, 2007) among others. However, results on estimation are still limited. Most of all, consistency has not been proven except for special cases. It is difficult to establish due to the lack of uniformity in the convergence over the cointegrating vector space, as noted by Saikkonen (1995), who derived the consistency of the MLE of a cointegrated system that is nonlinear in parameters but otherwise linear. de Jong (2002) studied consistency of minimization estimators of smooth transition ECMs where the error correction term appears only in a bounded transition function. Another case, studied by Kristensen and Rahbek (2008), is that the function is unbounded but becomes linear as the error correction term diverges. Next, estimation of regime-switching and/or discontinuous cases has hardly been studied, although they include important classes of models such as the smooth transition ECM and threshold cointegration. Hansen and Seo (2002) proposed the MLE under normality but only conjectured its consistency. While it may be argued that the two-step approach by Engle and Granger (1987) can be adopted due to the super-consistency of the cointegrating vector estimate, the estimation error cannot be ignored in nonlinear ECMs as shown by de Jong (2001).1

The purpose of this paper is to develop asymptotic theory for a class of nonlinear vector error correction models (VECM). In particular, we consider regime switching VECMs, where each regime exhibits different short-run dynamics and the regime switching depends on the disequilibrium error. Examples include threshold cointegration and smooth transition VECM. First, we establish the square root n consistency of the LS estimator of the cointegrating vector $\beta$. This enables us to employ de Jong (2002) to make asymptotic inference for both short-run and long-run parameters jointly in smooth transition models. Then, we turn to discontinuous models, focusing on the threshold cointegration model, which is particularly popular in practice.

This paper shows that the convergence of the LS estimator of $\beta$ in the threshold cointegration model is extremely fast, at the rate $n^{3/2}$. This asymptotics is based not on the diminishing threshold asymptotics of Hansen (2000) but on the fixed threshold asymptotics. Two different irregularities contribute to this fast rate. First, the estimating function lacks uniformity over the cointegrating vector space as the data become stationary at the true value, which is the reason for the super-consistency of the standard cointegrating vector estimates. Second, $\beta$ takes part in regime switching, which is discontinuous. This model discontinuity also boosts the convergence rate, yielding the super-consistency of the threshold estimate as in Chan (1993). While this fast convergence rate is certainly interesting and has some inferential value, e.g. when we perform a sequential test to determine the number of regimes, it makes it very challenging to obtain an asymptotic distribution of the estimator. Even in the stationary threshold autoregression, the asymptotic distribution is very complicated and cannot be tabulated (see Chan 1993). Subsampling, as reported by Gonzalo and Wolf (2005), is the only way in the literature to approximate the distribution, although it would not work when $\beta$ is estimated due to the involved nonstationarity. Meanwhile, Seo and Linton (2007) proposed the smoothed least squares (SLS) estimation for threshold regression models, which results in the asymptotic normality of the threshold estimate and is applicable to the threshold cointegration model.

1 de Jong (2001) provides an orthogonality condition under which the two-step approach is valid.

We develop the asymptotic distributions of the SLS estimators of $\beta$, the threshold parameter $\gamma$, and the other short-run parameters. The estimates $\hat\beta$ and $\hat\gamma$ converge jointly to a functional of Brownian motions, with rates slightly slower than those of the unsmoothed counterparts. This slow-down in convergence rate has already been observed in Seo and Linton (2007) and is the price to pay to achieve standard inference. The remaining regression parameter estimates converge to the Normal as if the true values of $\beta$ and $\gamma$ were known. This is not the case if the transition function is smooth. We also show that $\beta$ can be treated as if known in the SLS estimation of the short-run parameters, including $\gamma$, if we plug in the unsmoothed cointegrating vector estimate, due to its fast convergence rate. A set of Monte Carlo experiments demonstrates that this two-step approach is more efficient in finite samples.

This paper is organized as follows. Section 2 introduces the regime switching VECMs and establishes the square root n consistency of the LS estimator of $\beta$. Section 3 concentrates on the threshold cointegration model, obtaining the convergence rate of the LS estimator of $\beta$ and the asymptotic distributions of the SLS estimators of all the model parameters. It also discusses estimation of the asymptotic variances. Finite sample performance of the proposed estimators is examined in Section 4. Section 5 concludes. Proofs of theorems are collected in the appendix.

We make the following conventions throughout the paper. The integral $\int$ is taken over $R$ unless specified otherwise and the summation $\sum_t$ with respect to $t$ is taken for all available observations for a given sample. The subscript 0 and the hat in any parameter indicate the true value and an estimate of the parameter, respectively, e.g., $\beta_0$ and $\hat\beta$. For a function $g$, $\|g\|_2^2 = \int g(x)^2 dx$ and $g^{(i)}$ indicates the $i$th derivative of $g$. And, for a random vector $x_t$ and a parameter $\theta$, we write $g_t(\theta) = g(x_t, \theta)$, $g_t = g(x_t, \theta_0)$ and $\hat g_t = g(x_t, \hat\theta)$. For example, if $z_t(\beta) = x_t'\beta$, then we write $z_t = x_t'\beta_0$ and $\hat z_t = x_t'\hat\beta$. The weak convergence of stochastic processes under the uniform metric is signified by $\Rightarrow$.


2 Regime Switching Error Correction Models

Let $x_t$ be a $p$-dimensional $I(1)$ vector that is cointegrated with a single cointegrating vector. It is denoted by $(1, \beta')'$, normalizing the first element to 1. Define the error correction term $z_t(\beta) = x_{1t} + x_{2t}'\beta$, where $x_t = (x_{1t}, x_{2t}')'$, and let

$$X_{t-1}(\beta) = \left(1, z_{t-1}(\beta), \Delta_{t-1}'\right)',$$

where $\Delta_{t-1}$ denotes the vector of the lagged first difference terms $(\Delta x_{t-1}', \ldots, \Delta x_{t-l+1}')'$. Then, consider a two-regime vector error correction model

$$\Delta x_t = A' X_{t-1}(\beta) + D' X_{t-1}(\beta)\, d_{t-1}(\beta, \gamma) + u_t, \qquad (1)$$

where $t = l+1, \ldots, n$, and $d_t(\beta, \gamma) = d(z_t(\beta), \gamma)$ is a bounded function that controls the transition from one regime to the other. It need not be continuous. Typical examples of the transition function include the indicator function $1\{z_t(\beta) > \gamma\}$ and the logistic function $(1 + \exp(-\gamma_1 (z_t(\beta) - \gamma_2)))^{-1}$, where $\gamma = (\gamma_1, \gamma_2)'$.
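The two transition functions named above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the logistic function is written in its standard form with a plus sign in the denominator, which is assumed here because it is the form that is bounded between 0 and 1.

```python
import numpy as np

def indicator_transition(z, gamma):
    """Threshold (discontinuous) transition: 1{z > gamma}."""
    return (np.asarray(z, dtype=float) > gamma).astype(float)

def logistic_transition(z, gamma1, gamma2):
    """Smooth transition: (1 + exp(-gamma1 * (z - gamma2)))^(-1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-gamma1 * (z - gamma2)))

# Illustrative values of the error correction term z_t(beta).
z = np.array([-2.0, 0.0, 2.0])
print(indicator_transition(z, 0.0))
print(logistic_transition(z, 1.0, 0.0))
# A large slope gamma1 makes the logistic function close to the indicator
# away from the location parameter gamma2.
print(logistic_transition(z, 50.0, 0.0))
```

Both functions are bounded, as model (1) requires; only the indicator is discontinuous in $z$.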

The threshold cointegration model of Balke and Fomby (1997) and the smooth transition error correction model in Granger and Teräsvirta (1993) can be viewed as special cases. As an alternative, Escribano (2004) used cubic polynomials to capture this type of regime-specific error correction behavior. While the last model is not nested in model (1), all this literature focuses on nonlinear adjustment based on the magnitude of the disequilibrium error. In this regard, Gonzalo and Pitarakis (2006) is different, in that a stationary variable determines the regimes. While we study a two-regime model to simplify our exposition, we expect that models with more than two regimes can be analyzed in a similar way. A symmetric three-regime model can be directly embedded in the two-regime model by replacing the threshold variable $z_t$ with its absolute value $|z_t|$. However, some of the assumptions imposed later in this section, in particular the one that the series is $I(1)$, become more difficult to verify. See Saikkonen (2007).

We introduce some matrix notation. Define $X(\beta)$, $X^*_\gamma(\beta)$, $y$, and $u$ as the matrices stacking $X_{t-1}'(\beta)$, $X_{t-1}'(\beta)\, d_{t-1}(\beta, \gamma)$, $\Delta x_t$ and $u_t$, respectively. Let $\lambda = \mathrm{vec}\left((A', D')'\right)$, where vec stacks rows of a matrix. We call by $A_z$ and $D_z$ the columns of $A'$ and $D'$ that are associated with $z_{t-1}(\beta)$ and $z_{t-1}(\beta)\, d_{t-1}(\beta, \gamma)$, respectively, and by $\lambda_z$ the collection of $A_z$ and $D_z$. Then, we may write

$$y = \left[\left(X(\beta), X^*_\gamma(\beta)\right) \otimes I_p\right] \lambda + u.$$

We consider the LS estimation, which minimizes

$$S_n^*(\theta) = \left(y - \left[\left(X(\beta), X^*_\gamma(\beta)\right) \otimes I_p\right]\lambda\right)' \left(y - \left[\left(X(\beta), X^*_\gamma(\beta)\right) \otimes I_p\right]\lambda\right), \qquad (2)$$

where $\theta = (\beta', \gamma, \lambda')'$. The LS estimator is then defined as

$$\hat\theta^* = \arg\min_{\theta} S_n^*(\theta),$$

where the minimum is taken over a compact parameter space $\Theta$. The concentrated LS is computationally convenient, since it is simple OLS for a fixed $(\beta, \gamma)$, i.e.

$$\hat\lambda^*(\beta, \gamma) = \left( \begin{bmatrix} X(\beta)'X(\beta) & X(\beta)'X^*_\gamma(\beta) \\ X^*_\gamma(\beta)'X(\beta) & X^*_\gamma(\beta)'X^*_\gamma(\beta) \end{bmatrix}^{-1} \begin{pmatrix} X(\beta)' \\ X^*_\gamma(\beta)' \end{pmatrix} \otimes I_p \right) y,$$

which is then plugged back into (2) for optimization over $(\beta, \gamma)$. In practice, a grid search over $(\beta, \gamma)$ can be applied. In particular, the grid for $\beta$ can be set up around a preliminary estimate of $\beta$, which can be obtained based on the linear VECM, such as Johansen's maximum likelihood estimator or the simple OLS estimator as described in Hansen and Seo (2002).
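The concentrated LS with a grid search can be sketched as follows for a bivariate system with the indicator transition and no lagged differences. Everything specific here is an assumption made for illustration and is not taken from the paper: the coefficient values in the simulated data, the grid ranges, the function names, and the trimming rule that discards grid points leaving fewer than 10% of observations in a regime.

```python
import numpy as np

def simulate_tvecm(n, beta0=-1.0, gamma0=0.0, seed=0):
    """Simulate x_0..x_n from a bivariate threshold VECM (hypothetical coefficients)."""
    rng = np.random.default_rng(seed)
    A_prime = np.array([[0.2, -0.1], [-0.2, 0.1]])   # rows: equations; cols: (intercept, z)
    D_prime = np.array([[-0.4, -0.3], [0.4, 0.3]])   # shift applied when z_{t-1} > gamma0
    x = np.zeros((n + 1, 2))
    for t in range(1, n + 1):
        z = x[t - 1, 0] + beta0 * x[t - 1, 1]        # error correction term z_{t-1}
        X = np.array([1.0, z])
        dx = A_prime @ X + (D_prime @ X) * (z > gamma0) + 0.5 * rng.standard_normal(2)
        x[t] = x[t - 1] + dx
    return x

def concentrated_ssr(x, beta, gamma, trim=0.1):
    """Sum of squared residuals after OLS on the short-run parameters for fixed (beta, gamma)."""
    dx = np.diff(x, axis=0)                          # Delta x_t
    z = x[:-1, 0] + beta * x[:-1, 1]                 # z_{t-1}(beta)
    d = (z > gamma).astype(float)                    # indicator transition
    share = d.mean()
    if share < trim or share > 1.0 - trim:           # trimming rule (an assumption)
        return np.inf
    W = np.column_stack([np.ones_like(z), z, d, z * d])
    coef, *_ = np.linalg.lstsq(W, dx, rcond=None)    # same regressors in every equation
    return float(np.sum((dx - W @ coef) ** 2))

def grid_search(x, beta_grid, gamma_grid):
    """Return (minimal SSR, beta, gamma) over the grid."""
    best = (np.inf, None, None)
    for b in beta_grid:
        for g in gamma_grid:
            ssr = concentrated_ssr(x, b, g)
            if ssr < best[0]:
                best = (ssr, b, g)
    return best

x = simulate_tvecm(500)
beta_grid = np.linspace(-1.2, -0.8, 41)              # in practice, centred on a preliminary estimate
gamma_grid = np.linspace(-0.5, 0.5, 41)
ssr_hat, beta_hat, gamma_hat = grid_search(x, beta_grid, gamma_grid)
print(beta_hat, gamma_hat)
```

Because every equation shares the same regressors, minimizing (2) for fixed $(\beta, \gamma)$ reduces to equation-by-equation OLS, which is what the least squares call does.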

The asymptotic property of the estimator $\hat\theta^*$ is nonstandard due to the irregular feature of $S_n^*$, which does not obey a uniform law of large numbers. Thus, we take a two-step approach. First, it is shown that $\hat\beta^* = \beta_0 + O_p(n^{-1/2})$ by evaluating the difference between $\inf S_n^*(\theta)$ and $S_n^*(\theta_0)$, where the infimum is taken over all $\theta \in \Theta$ such that $r_n |\beta - \beta_0| > \kappa$ for a sequence $r_n$ such that $r_n \to \infty$ and $r_n/\sqrt{n} \to 0$. Similar approaches were taken by Wu (1981) and Saikkonen (1995) among others. The latter established the consistency of the maximum likelihood estimator of a nonlinear transformation of $\beta$ in the linear model. Second, the consistency of the short-run parameter estimates is established by the standard consistency argument using a uniform law of large numbers.

We assume the following for the consistency of the estimator $\hat\theta^*$.

Assumption 1 (a) $\{u_t\}$ is an independent and identically distributed sequence with $E u_t = 0$, $E u_t u_t' = \Sigma$ that is positive definite.

(b) $\{\Delta x_t, z_t\}$ is a sequence of strictly stationary strong mixing random variables with mixing numbers $\alpha_m$, $m = 1, 2, \ldots$, that satisfy $\alpha_m = o\left(m^{-(\alpha_0+1)/(\alpha_0-1)}\right)$ as $m \to \infty$ for some $\alpha_0 \ge 1$, and for some $\varepsilon > 0$, $E|X_t X_t'|^{\alpha_0+\varepsilon} < \infty$ and $E|X_{t-1} u_t|^{\alpha_0+\varepsilon} < \infty$. Furthermore, $E\Delta x_t = 0$ and the partial sum process, $x_{[ns]}/\sqrt{n}$, $s \in [0, 1]$, converges weakly to a vector Brownian motion with a covariance matrix $\Omega$, which is the long-run covariance matrix of $\Delta x_t$ and has rank $p - 1$ such that $(1, \beta_0')\Omega = 0$. In particular, assume that $x_{2[ns]}/\sqrt{n}$ converges weakly to a vector Brownian motion $B$ with a covariance matrix that is finite and positive definite.

(c) The parameter space $\Theta$ is compact and bounded away from zero for $\lambda_z$, and there is a function $\tilde d(x)$ that is monotonic, integrable, and symmetric around zero, and by which $\sup_{\gamma \in \Gamma} |d(x, \gamma) - 1\{x > 0\}|$ is bounded.

(d) Let $u_t(\alpha, \lambda, \gamma)$ be defined as in (1) replacing $z_t(\beta)$ with $z_t + \alpha$, where $\alpha$ belongs to a compact set in $R$, and let

$$S(\alpha, \lambda, \gamma) = E\left(u_t'(\alpha, \lambda, \gamma)\, u_t(\alpha, \lambda, \gamma)\right).$$

Then, assume that $\frac{1}{n}\sum_t u_t(\alpha, \lambda, \gamma)' u_t(\alpha, \lambda, \gamma) \overset{p}{\to} S(\alpha, \lambda, \gamma)$ uniformly in $(\alpha, \lambda, \gamma)$ on any compact set, that $S(\alpha, \lambda, \gamma)$ is continuous in all its arguments, and that it is uniquely minimized at $(\alpha, \lambda, \gamma) = (0, \lambda_0, \gamma_0)$.

Condition (a) is common, as in Chan (1993). It simplifies our presentation but could be relaxed. While we assume the stationarity and mixing conditions for $\{\Delta x_t, z_t\}$, Saikkonen (2007) provides more primitive conditions on $\{u_t\}$ and the coefficients. They are stronger than requiring each regime to satisfy the standard conditions in the linear VECM. We also focus on processes without a linear time trend by assuming that $E\Delta x_t = 0$.

Unlike for nonlinear models with stationary variables, the consistency proof for nonlinear error correction models is difficult to establish at a general level. It depends crucially on the shape of the nonlinear transformation of the error correction term when the variable takes large values, see Park and Phillips (2001). Condition (c) identifies the shape, which is piecewise linear in large values of the error correction term. It is clear that the indicator functions and logistic functions satisfy (c). It distinguishes the current work from previous ones. de Jong (2002) considered the nonlinearity only through a bounded function and Kristensen and Rahbek (2008) through an unbounded function, which becomes linear for large values of the error correction term $z_{t-1}$. Thus, they do not capture the regime-specific behavior, which makes our consistency proof much different from the previous ones. We do not consider the more general functional forms discussed in Saikkonen (2005) and Escribano and Mira (2002). The condition for $\lambda_z$ is not necessary but convenient for our proof and does not appear to be very restrictive. We note that the case with $\lambda_z = 0$ is similar to the model studied by de Jong (2002), as the error correction term appears only in a bounded function in this case. We also comment on the case where the threshold variable is $|z_{t-1}(\beta)|$. The proof of the consistency goes through almost unchanged, but an additional assumption on $\lambda_z$ such that $A_z + D_z \neq 0$ will facilitate the direct application of the proof, since $1\{|\xi_t| > \gamma\} \approx 1\{|\xi_t| > 0\}$ for an integrated process $\xi_t$.

The conditions in (d) are a standard set of conditions that are imposed to ensure the consistency of nonlinear least squares estimators. A more specific set of sufficient conditions to ensure the uniform law of large numbers can be found in Andrews (1987) or Pötscher and Prucha (1991), for instance. It can be easily checked that the commonly used smooth transition functions and the indicator functions for threshold models satisfy such conditions. It implicitly imposes the condition that $D_0 \neq 0$, as in the standard threshold model. The identification condition for $\alpha$ here is to identify the cointegrating vector in the square root n neighborhood and it is also imposed in de Jong (2002). The conditions in Assumption 1 do not guarantee the existence of a measurable least squares estimator since we do not impose the continuity of the function $d$. In this case, we can still establish the consistency based on convergence in outer measure, see e.g. Newey and McFadden (1994). To ease the exposition, we implicitly assume the measurability in the theorem below.

Theorem 1 Under Assumption 1, $\hat\theta^* - \theta_0$ is $o_p(1)$ and furthermore, $\sqrt{n}\left(\hat\beta^* - \beta_0\right) = o_p(1)$.

When the transition function $d_{t-1}(\beta, \gamma)$ satisfies a certain smoothness condition, the asymptotic distribution of $\hat\theta^*$ can be derived following the standard approach using the Taylor series expansion. de Jong (2002) explored minimization estimators with a nonlinear objective function that involves the error correction term, and derived the asymptotic distributions of such estimators under the assumption that $\sqrt{n}\left(\hat\beta^* - \beta_0\right) = O_p(1)$. Thus, we refer to de Jong (2002) for the case with a smooth $d_{t-1}(\beta, \gamma)$. It is worth noting that the asymptotic distribution of the short-run parameter estimates is in general dependent on the estimation error of the cointegrating vector despite its super-consistency, due to the nonlinearity of the model. On the other hand, the threshold model has not been studied due to the irregular feature of the indicator function, although the model has been adopted more widely in empirical research. We turn to the so-called threshold cointegration model and develop an asymptotic theory for the model in the next section.

3 Threshold Cointegration Model

Balke and Fomby (1997) introduced the threshold cointegration model, which corresponds to model (1) with $d(z_t(\beta), \gamma) = 1\{z_t(\beta) > \gamma\}$, to allow for a nonlinear and/or asymmetric adjustment process to the equilibrium. That is,

$$\Delta x_t = \begin{cases} A_0 + A_z z_{t-1}(\beta) + A_1 \left(\Delta x_{t-1}', \ldots, \Delta x_{t-l+1}'\right)' + u_t, & \text{if } z_{t-1}(\beta) \le \gamma, \\ B_0 + B_z z_{t-1}(\beta) + B_1 \left(\Delta x_{t-1}', \ldots, \Delta x_{t-l+1}'\right)' + u_t, & \text{if } z_{t-1}(\beta) > \gamma, \end{cases}$$

where $B = A + D$. The motivation of the model was that the magnitude and/or the sign of the disequilibrium $z_{t-1}$ plays a central role in determining the short-run dynamics (see e.g. Taylor 2001). Thus, they employed the error correction term as the threshold variable. This threshold variable makes the estimation problem highly irregular, as the cointegrating vector is subject to two different sorts of nonlinearity. Even when the cointegrating vector is prespecified, the estimation is nonstandard. We introduce a smoothed estimator and study the asymptotic properties of both smoothed and unsmoothed estimators in the following subsections.

To resolve the irregularity of the indicator function, Seo and Linton (2007) introduced a smoothed least squares estimator. To describe the estimator, define a bounded function $K(\cdot)$ satisfying

$$\lim_{s \to -\infty} K(s) = 0, \qquad \lim_{s \to +\infty} K(s) = 1.$$

A distribution function is often used for $K$. Let $K_t(\beta, \gamma) = K\left(\frac{z_t(\beta) - \gamma}{h}\right)$, where $h \to 0$ as $n \to \infty$. To define the smoothed objective function, we replace $d_{t-1}(\beta, \gamma)$ in (1) with $K_{t-1}(\beta, \gamma)$ and define the matrix $X_\gamma(\beta)$ that stacks $X_{t-1}'(\beta) K_{t-1}(\beta, \gamma)$. Then, we have the smoothed objective function

$$S_n(\theta) = \left(y - \left[\left(X(\beta), X_\gamma(\beta)\right) \otimes I_p\right]\lambda\right)' \left(y - \left[\left(X(\beta), X_\gamma(\beta)\right) \otimes I_p\right]\lambda\right). \qquad (3)$$

And the smoothed least squares (SLS) estimator is defined as

$$\hat\theta = \arg\min_{\theta \in \Theta} S_n(\theta).$$

Similarly to the concentrated LS estimator, we can define

$$\hat\lambda(\beta, \gamma) = \left( \begin{bmatrix} X(\beta)'X(\beta) & X(\beta)'X_\gamma(\beta) \\ X_\gamma(\beta)'X(\beta) & X_\gamma(\beta)'X_\gamma(\beta) \end{bmatrix}^{-1} \begin{pmatrix} X(\beta)' \\ X_\gamma(\beta)' \end{pmatrix} \otimes I_p \right) y, \qquad (4)$$

and by profiling we can minimize $S_n(\theta)$ with respect to $(\beta, \gamma)$.
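A minimal sketch of the smoothing step follows, assuming the standard normal distribution function for $K$ and a bivariate system without lagged differences. The function names, the simulated random-walk data, and the bandwidth values are hypothetical and serve only to show how the smoothed weight replaces the indicator in the concentrated objective.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def K(s):
    """Standard normal distribution function, used here as the smoothing function K."""
    return 0.5 * (1.0 + _erf(np.asarray(s, dtype=float) / math.sqrt(2.0)))

def smoothed_ssr(x, beta, gamma, h):
    """Concentrated smoothed objective: OLS on the short-run parameters for fixed (beta, gamma)."""
    dx = np.diff(x, axis=0)                      # Delta x_t
    z = x[:-1, 0] + beta * x[:-1, 1]             # z_{t-1}(beta)
    k = K((z - gamma) / h)                       # K_{t-1}(beta, gamma) replaces the indicator
    W = np.column_stack([np.ones_like(z), z, k, z * k])
    coef, *_ = np.linalg.lstsq(W, dx, rcond=None)
    return float(np.sum((dx - W @ coef) ** 2))

# As h shrinks, K((z - gamma) / h) approaches the indicator 1{z > gamma}.
z_demo = np.array([-1.0, -0.1, 0.1, 1.0])
for h in (1.0, 0.1, 0.01):
    print(h, np.round(K((z_demo - 0.0) / h), 3))

# Illustrative data only: two independent random walks.
rng = np.random.default_rng(1)
x_demo = np.cumsum(rng.standard_normal((200, 2)), axis=0)
print(smoothed_ssr(x_demo, -1.0, 0.0, 0.1))
```

Unlike the indicator version, the smoothed objective is differentiable in $(\beta, \gamma)$, so a derivative-based optimizer could be used in place of a grid search.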

It is worth mentioning that the true model is a threshold model and we employ the smoothing only for the estimation purpose. Since

$$K\left(\frac{z_t(\beta) - \gamma}{h}\right) \to 1\{z_t(\beta) > \gamma\}$$

as $h \to 0$, $S_n(\theta)$ converges in probability to the probability limit of $S_n^*(\theta)$ as $n \to \infty$.

We make the following assumptions regarding the smoothing function $K$ and the bandwidth parameter $h$.

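Assumption 2 (a) $K$ is twice differentiable everywhere, $K^{(1)}$ is symmetric around zero, and $K^{(1)}$ and $K^{(2)}$ are uniformly bounded and uniformly continuous. Furthermore, $\int |K^{(1)}(s)|^4 ds$, $\int |K^{(2)}(s)|^2 ds$, and $\int |s^2 K^{(2)}(s)| ds$ are finite.

(b) For some integer $\vartheta \ge 1$ and each integer $i$ $(1 \le i \le \vartheta)$, $\int |s^i K^{(1)}(s)| ds < \infty$, and

$$\int s^{i-1} \mathrm{sgn}(s) K^{(1)}(s)\, ds = 0, \qquad \int s^{\vartheta} \mathrm{sgn}(s) K^{(1)}(s)\, ds \neq 0,$$

and $K(x) - K(0) \gtrless 0$ if $x \gtrless 0$.

(c) Furthermore, for some $\varepsilon > 0$,

$$\lim_{n \to \infty} h^{i-\vartheta} \int_{|hs| > \varepsilon} |s^i K^{(1)}(s)|\, ds = 0, \qquad \lim_{n \to \infty} h^{-1} \int_{|hs| > \varepsilon} |K^{(2)}(s)|\, ds = 0.$$

(d) The sequence $\{h\}$ satisfies that for some sequence $m \ge 1$,

$$n h^3 \to 0, \qquad \log(nm) \left(n^{1-6/r} h^2 m^{-2}\right)^{-1} \to 0,$$

and

$$h^{-9k/2-3}\, n^{(9k/2+2)/r+\varepsilon}\, \alpha_m \to 0,$$

where $k$ is the dimension of $\theta$ and $r > 4$ is specified in Assumption 4.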

These conditions are imposed in Seo and Linton (2007) and are common in smoothed estimation, as in Horowitz (1992) for example. Condition (b) is analogous to the condition defining the so-called $\vartheta$th order kernel, and requires a kernel $K^{(1)}$ that permits negative values when $\vartheta > 1$, and $K(0) = 1/2$. We impose the condition that $K(x) - K(0) \gtrless 0$ if $x \gtrless 0$ as we need negative kernels for $\vartheta > 1$. Condition (c) is standard. The standard normal cumulative distribution function clearly satisfies these conditions; see Seo and Linton (2007) for an example with $\vartheta > 1$. Condition (d) serves to determine the rate for $h$. While this range of rates is admissible, we do not have a sharp bound and thus no optimal rate. If the data were independent and identically distributed, the conditions would simplify to $(nh^2)^{-1} \log n \to 0$ and $nh^3 \to 0$. As will be shown in the following section, a smaller $h$ implies faster convergence, whereas it may destroy the asymptotic normality if it is too small. This condition is not relevant for the consistency of $\hat\theta$. On the other hand, too large an $h$ introduces correlation between the threshold estimate $\hat\gamma$ and the slope estimate $\hat\lambda$.
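As a worked illustration of the simplified conditions for independent data, suppose (purely as an assumption, since the text prescribes no functional form) that the bandwidth is polynomial in the sample size, $h = c\, n^{-a}$ with constants $c > 0$ and $a > 0$. Then

$$n h^3 = c^3 n^{1-3a} \to 0 \iff a > \tfrac{1}{3}, \qquad (n h^2)^{-1} \log n = c^{-2} n^{2a-1} \log n \to 0 \iff a < \tfrac{1}{2},$$

so that under this parametrization the admissible exponents are $1/3 < a < 1/2$. The dependent-data conditions in Assumption 2 (d) are more restrictive and are not covered by this calculation.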

The following corollary establishes the consistency of the smoothed least squares estimator.

Corollary 2 Under Assumptions 1 and 2, $\hat\theta - \theta_0$ is $o_p(1)$ and furthermore, $\sqrt{n}\left(\hat\beta - \beta_0\right)$ is $o_p(1)$.

3.1 Convergence Rates and Asymptotic Distributions

The unsmoothed LS estimator of the threshold parameter $\gamma$ is super-consistent in the standard stationary threshold regression and has a complicated asymptotic distribution, which depends not only on certain moments but on the whole distribution of the data. In contrast, the smoothed LS estimator of the same parameter exhibits asymptotic normality, while the smoothing slows down the convergence rate. The nonstandard nature of the estimation of threshold models becomes more complicated in threshold cointegration, since the thresholding relies on the error correction term, which is estimated simultaneously with the threshold parameter $\gamma$. We begin by developing the convergence rates of the unsmoothed estimators of the cointegrating vector $\beta$ and the threshold parameter $\gamma$, and then explore the asymptotic distribution of the smoothed estimators.

The asymptotic behavior of the threshold estimator relies heavily on whether the model is continuous or not. We focus on the discontinuous model. The following is assumed.

Assumption 3 (a) For almost every $\Delta_t$, the probability distribution of $z_t$ conditional on $\Delta_t$ has everywhere positive density with respect to Lebesgue measure.

(b) $E\left(X_{t-1}' D_0 D_0' X_{t-1} \mid z_{t-1} = \gamma_0\right) > 0$.

Condition (a) ensures that the threshold parameter $\gamma$ is uniquely identified and is common in threshold autoregressions. While it is more complicated to verify the condition in general, it is easy to see that the threshold VECM without any lagged term satisfies it, since it entails a threshold autoregression in $z_t$. This remark is also relevant to Assumption 4 (c). The discontinuity of the regression function is assumed in condition (b). At the threshold point $\gamma_0$, the change in the regression function is nonzero, which makes the regression function discontinuous. This enables a super-efficient estimation of the threshold parameter $\gamma$. If the regression function is continuous, Gonzalo and Wolf (2005) showed that the least squares estimator of the threshold parameter is root n consistent and asymptotically normal in the context of stationary threshold autoregression, which may be used to test for condition (b).

We obtain the following rate result for the unsmoothed estimator of $\beta$ and $\gamma$.

Theorem 3 Under Assumptions 1 and 3, $\hat\beta^* = \beta_0 + O_p\left(n^{-3/2}\right)$ and $\hat\gamma^* = \gamma_0 + O_p\left(n^{-1}\right)$.


It is surprising that the cointegrating vector estimate converges faster than the standard $n$-rate. Heuristically, $-x_{2t-1}'\left(\hat\beta^* - \beta_0\right)$ behaves like a threshold estimate in a stationary threshold model, since $z_{t-1}(\beta) = z_{t-1} + x_{2t-1}'(\beta - \beta_0)$ and hence

$$1\{z_{t-1}(\beta) > \gamma\} = 1\{z_{t-1} > \gamma - x_{2t-1}'(\beta - \beta_0)\}.$$

Since $\sup_{2 \le t \le n} |x_{t-1}| = O_p\left(n^{1/2}\right)$ and the threshold estimate is super-consistent, it is expected that $\hat\beta^* - \beta_0 = O_p\left(n^{-3/2}\right)$. This fast rate of convergence has an important inferential implication for the short-run parameters, as will be discussed later.

We turn to the smoothed estimator for inference on the cointegrating vector. While subsampling is shown to be valid for approximating the asymptotic distribution of the unsmoothed LS estimator of the threshold parameter in the stationary threshold autoregression (see Gonzalo and Wolf 2005), the extension to threshold cointegration is not trivial due to the involved nonstationarity. The smoothing of the objective function enables us to develop the asymptotic distribution based on the standard Taylor series expansion. Let $f(\cdot)$ denote the density of $z_t$ and $f(\cdot \mid \Delta)$ the conditional density given $\Delta_t = \Delta$. Also define

$$\tilde K_1(s) = K^{(1)}(s)\left(1\{s > 0\} - K(s)\right)$$

and

$$\sigma_v^2 = E\left[ \left\|K^{(1)}\right\|_2^2 \left(X_{t-1}' D_0 u_t\right)^2 + \left\|\tilde K_1\right\|_2^2 \left(X_{t-1}' D_0 D_0' X_{t-1}\right)^2 \,\Big|\, z_{t-1} = \gamma_0 \right] f(\gamma_0), \qquad (5)$$

$$\sigma_q^2 = K^{(1)}(0)\, E\left(X_{t-1}' D_0 D_0' X_{t-1} \mid z_{t-1} = \gamma_0\right) f(\gamma_0). \qquad (6)$$

First, we set out the assumptions that we need to derive the asymptotic distribution.

Assumption 4 (a) $E\left[|X_t u_t'|^r\right] < \infty$, $E\left[|X_t X_t'|^r\right] < \infty$, for some $r > 4$.

(b) $\{\Delta x_t, z_t\}$ is a sequence of strictly stationary strong mixing random variables with mixing numbers $\alpha_m$, $m = 1, 2, \ldots$, that satisfy $\alpha_m \le C m^{-(2r-2)/(r-2)-\eta}$ for positive $C$ and $\eta$, as $m \to \infty$.

(c) For some integer $\vartheta \ge 2$ and each integer $i$ such that $1 \le i \le \vartheta - 1$, all $z$ in a neighborhood of $\gamma$, almost every $\Delta$, and some $M < \infty$, $f^{(i)}(z \mid \Delta)$ exists and is a continuous function of $z$ satisfying $|f^{(i)}(z \mid \Delta)| < M$. In addition, $f(z \mid \Delta) < M$ for all $z$ and almost every $\Delta$.

(d) The conditional joint density $f(z_t, z_{t-m} \mid \Delta_t, \Delta_{t-m}) < M$, for all $(z_t, z_{t-m})$ and almost all $(\Delta_t, \Delta_{t-m})$.

(e) $\theta_0$ is an interior point of $\Theta$.

These assumptions are analogous to those imposed in Seo and Linton (2007), who study the SLS estimator of the threshold regression model. Condition (a) ensures the convergence of the variance-covariance estimators but can be weakened. We need a stronger mixing condition, as set out in (b), than that required for consistency. Conditions (c) - (e) are common in smoothed estimation as in Horowitz (1992), with only (d) being the dependent-sample analogue of a random-sample condition. In particular, we require a more stringent smoothness condition (condition (c)) on the conditional density $f(z \mid \Delta)$ for the smoothed estimation than for the unsmoothed estimation.


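We present the asymptotic distribution below.

Theorem 4 Suppose Assumptions 1 - 4 hold. Let $W$ denote a standard Brownian motion that is independent of $B$. Then,

$$\begin{pmatrix} n h^{-1/2} \left(\hat\beta - \beta_0\right) \\ \sqrt{n h^{-1}} \left(\hat\gamma - \gamma_0\right) \end{pmatrix} \Rightarrow \frac{\sigma_v}{\sigma_q^2} \begin{pmatrix} \int_0^1 B B' & \int_0^1 B \\ \int_0^1 B' & 1 \end{pmatrix}^{-1} \begin{pmatrix} \int_0^1 B\, dW \\ W(1) \end{pmatrix},$$

$$\sqrt{n} \left(\hat\lambda - \lambda_0\right) \Rightarrow N\left(0, \left[E\left( \begin{pmatrix} 1 & d_{t-1} \\ d_{t-1} & d_{t-1} \end{pmatrix} \otimes X_{t-1} X_{t-1}' \right)\right]^{-1} \otimes \Sigma \right),$$

and these two random vectors are asymptotically independent. The unsmoothed estimator $\hat\lambda^*$ has the same asymptotic distribution as $\hat\lambda$.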

We make some remarks on the similarities to and differences from the linear cointegration model and the stationary threshold model. First, the asymptotic distribution of $\hat\beta$ and $\hat\gamma$ is mixed normal, the same asymptotic distribution as that of the standard OLS estimator of the cointegrating vector and the constant in the exogenous case, up to the scaling factor $\sigma_v/\sigma_q^2$. A reading of the proof of this theorem reveals that the linear part does not contribute to the asymptotic variance of $\hat\beta$, although $\beta$ appears both inside the indicator and in the linear part of the model. The factor $\sigma_v^2/\sigma_q^4$ contains the conditional expectation and density at the discontinuity point. It is the asymptotic variance of the threshold estimate if the true cointegrating vector were known. Other than the estimation of this factor, the inference can be made in the same way as in the standard OLS case. Second, the cointegrating vector estimate converges faster than the usual rate $n$ but slower than $n^{3/2}$, the rate obtained for the unsmoothed estimator. This is also the case for the estimators $\hat\gamma^*$ and $\hat\gamma$ of the threshold point $\gamma$. Third, as in the stationary threshold model, the slope parameter estimate $\hat\lambda$ is asymptotically independent of the estimation of $\beta$ and $\gamma$.
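The mixed normal limit in Theorem 4 can be explored numerically. The following is a minimal sketch, not part of the paper's own computations: it assumes the bivariate case (so that $B$ is scalar), takes $B$ to be a standard Brownian motion, and treats the argument `scale` as a hypothetical stand-in for $\sigma_v/\sigma_q^2$. The function name `draw_limit` and the discretization by Gaussian random walks are choices made for this illustration only.

```python
import math
import random


def draw_limit(steps=500, scale=1.0, rng=random):
    """One draw of scale * M^{-1} (int B dW, W(1))' for scalar B.

    B and W are independent standard Brownian motions on [0, 1],
    approximated by Gaussian random walks with `steps` increments.
    `scale` stands in for sigma_v / sigma_q^2 (hypothetical value).
    """
    dt = 1.0 / steps
    b = w = 0.0
    int_bb = int_b = int_bdw = 0.0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        int_bb += b * b * dt   # Riemann sum for int B^2
        int_b += b * dt        # Riemann sum for int B
        int_bdw += b * dw      # Ito sum for int B dW (B at the left endpoint)
        b += rng.gauss(0.0, math.sqrt(dt))
        w += dw
    # Invert M = [[int B^2, int B], [int B, 1]] with the 2x2 formula.
    det = int_bb - int_b * int_b
    beta_part = scale * (int_bdw - int_b * w) / det
    gamma_part = scale * (int_bb * w - int_b * int_bdw) / det
    return beta_part, gamma_part


# Example: a few hundred draws from the approximate limit law.
draws = [draw_limit(rng=random.Random(seed)) for seed in range(200)]
```

Because $W$ is independent of $B$, each draw is normal conditional on the path of $B$, which is what makes inference on $(\beta, \gamma)$ analogous to the standard OLS case once the scaling factor is estimated.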

The convergence rates of the estimators $\hat\beta$ and $\hat\gamma$ depend on the smoothing parameter $h$ in such a way that a smaller $h$ accelerates the convergence. This is in contrast to the smoothed maximum score estimation. In the extreme case where $h = 0$, we obtain the fastest convergence, which corresponds to the unsmoothed estimator. A smaller $h$ boosts the convergence rates by reducing the bias, but too small an $h$ destroys the asymptotic normality. We do not know the exact order of $h$ at which the asymptotic normality breaks down, which requires further research.
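To make the role of $h$ concrete, the sketch below tabulates the normalizing rates of Theorem 4 next to the usual rate $n$ and the unsmoothed rate $n^{3/2}$. It assumes the bandwidth rule used later in the Monte Carlo section, $h = \hat\sigma n^{-1/2}\log n$, with $\hat\sigma = 1$ as a purely illustrative value; the function name `convergence_rates` is hypothetical.

```python
import math


def convergence_rates(n, sigma=1.0):
    """Normalizing rates implied by Theorem 4 for a given sample size."""
    h = sigma * n ** -0.5 * math.log(n)  # bandwidth rule (assumed, sigma illustrative)
    return {
        "h": h,
        "linear_beta": float(n),        # usual rate for a cointegrating vector
        "sls_beta": n / math.sqrt(h),   # n h^{-1/2}
        "sls_gamma": math.sqrt(n / h),  # (n / h)^{1/2}
        "unsmoothed_beta": n ** 1.5,    # n^{3/2}
    }


for n in (100, 250, 500):
    r = convergence_rates(n)
    print(n, round(r["h"], 3), round(r["sls_beta"], 1), round(r["sls_gamma"], 1))
```

For these sample sizes the smoothed rate for the cointegrating vector lies strictly between $n$ and $n^{3/2}$, in line with the second remark above.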

The asymptotic independence between the estimator $\hat\beta$ and the estimator $\hat\lambda$ of the slope parameter $\lambda$, and the asymptotic normality of $\hat\lambda$, contrast with the result in smooth transition cointegration models, where the asymptotic distribution of $\hat\lambda$ not only draws on the estimation of $\beta$ but is non-Normal without a certain orthogonality condition (see e.g. de Jong 2001; 2002).$^{2}$ This is due to the slower convergence of the estimators of $\beta$ in the smooth transition models. Therefore, it should also be noted that the Engle-Granger type two-step approach, where the cointegrating vector is estimated by the linear regression of $x_{1t}$ on $x_{2t}$ and the estimate is plugged into the error correction model, does not work in our case in the sense that the estimation error affects the asymptotic distribution of $\hat\lambda$. Therefore, the above independence result is useful for the construction of confidence intervals for the slope parameter $\lambda$.

2 In case of $E z_t = 0$, we can still retain the asymptotic Normality of the slope estimate by estimating (1) after replacing $z_{t-1}(\beta)$ with $\bar z_{t-1}(\tilde\beta) = (1, \tilde\beta')\left(x_{t-1} - \frac{1}{n}\sum_s x_{s-1}\right)$, for any $n$-consistent $\tilde\beta$, as in de Jong (2001). It is worth noting, however, that this demeaning increases the asymptotic variance.

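Furthermore, we may propose a two-step approach for the inference on the short-run parameters, making use of the fact that the unsmoothed estimator $\hat\beta^*$ converges faster than the smoothed estimator $\hat\beta$. In principle, we can treat $\hat\beta^*$ as if it were the true value $\beta_0$. The following corollary states this.

Corollary 5 Suppose Assumptions 1 - 4 hold. Let $\hat\gamma(\beta)$ be the smoothed estimator of $\gamma$ when $\beta$ is given. Then, $\hat\gamma(\hat\beta^*)$ has the same asymptotic distribution as $\hat\gamma(\beta_0)$; that is, under the normalization of Theorem 4,

$$\sqrt{n h^{-1}}\,\left(\hat\gamma(\hat\beta^*) - \gamma_0\right) \Rightarrow N\left(0, \frac{\sigma_v^2}{\sigma_q^4}\right).$$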

3.2 Asymptotic Variance Estimation

The construction of confidence intervals for the slope parameter $\lambda$ is straightforward, as $\hat\lambda$ and $\hat\lambda^*$ are just OLS estimators given $(\beta, \gamma)$. We may treat the estimates $\hat\beta$ and $\hat\gamma$ (or $\hat\beta^*$ and $\hat\gamma^*$) as if they were $\beta_0$ and $\gamma_0$, due to Theorem 4. We may use either $1\{z_{t-1}(\hat\beta) > \hat\gamma\}$ or $K_{t-1}(\hat\beta, \hat\gamma)$ for $d_{t-1}$. The inference for $(\beta, \gamma)$ requires estimating $\Omega$, $\sigma_v^2$, and $\sigma_q^2$.$^{3}$ The estimation of $\Omega$ can be done by applying a standard method of HAC estimation to $\Delta x_t$; see e.g. Andrews (1991). Although $\sigma_v^2$ and $\sigma_q^2$ involve nonparametric objects like a conditional expectation and a density, we do not have to carry out a nonparametric estimation, as they are limits of the first and second derivatives of the objective function with respect to the threshold parameter $\gamma$.

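Thus, let

$$\hat\tau_t = \frac{1}{2\sqrt{h}}\, X_{t-1}(\hat\beta)'\hat D\, K^{(1)}_{t-1}(\hat\beta, \hat\gamma)\, \hat u_t, \tag{7}$$

where $\hat u_t$ is the residual from the regression (1), and let

$$\hat\sigma_v^2 = \frac{1}{n}\sum_t \hat\tau_t^2, \quad\text{and}\quad \hat\sigma_q^2 = \frac{h}{2n}\, Q_{n22}(\hat\theta),$$

where $Q_{n22}$ is the diagonal element corresponding to $\gamma$ of the Hessian matrix $Q_n$; see the Appendix for the explicit formulas. Consistency of $\hat\sigma_q^2$ is straightforward from the proof of Theorem 4, and that of $\hat\sigma_v^2$ can be obtained after a slight modification of Theorem 4 of Seo and Linton (2007).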

We can construct a confidence interval for $\gamma$ based on Corollary 5. The estimation of $\sigma_v^2$ and $\sigma_q^2$ can be done as above with $\hat\beta = \hat\beta^*$. Due to the asymptotic normality and independence, the construction of the confidence interval is much simpler this way, without the need to estimate $\Omega$.
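The interval based on Corollary 5 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own code: it reads the normalization $\sqrt{n/h}$ from Theorem 4, takes the variance estimates $\hat\sigma_v^2$ and $\hat\sigma_q^2$ as given inputs, and uses hypothetical numbers in the example call. The function name `gamma_confidence_interval` is likewise hypothetical.

```python
import math
from statistics import NormalDist


def gamma_confidence_interval(gamma_hat, sigma_v2, sigma_q2, n, h, level=0.95):
    """Normal-approximation interval for the threshold, following Corollary 5.

    Uses sqrt(n/h) * (gamma_hat - gamma_0) ~ N(0, sigma_v^2 / sigma_q^4),
    so the standard error is sqrt(h/n) * sigma_v / sigma_q^2.
    """
    if sigma_v2 < 0 or sigma_q2 <= 0 or n <= 0 or h <= 0:
        raise ValueError("sigma_v2 must be non-negative; sigma_q2, n, h positive")
    std_err = math.sqrt(h / n) * math.sqrt(sigma_v2) / sigma_q2
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    return gamma_hat - z * std_err, gamma_hat + z * std_err


# Hypothetical inputs, for illustration only.
lo, hi = gamma_confidence_interval(0.03, sigma_v2=1.2, sigma_q2=0.8, n=250, h=0.35)
print(lo, hi)
```

No estimate of $\Omega$ enters the calculation, which is the simplification noted above.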

Even though $\hat\lambda$ and $\hat\lambda^*$ are asymptotically independent of $(\hat\beta, \hat\gamma)$ and $(\hat\beta^*, \hat\gamma^*)$, they are dependent in finite samples. So, we may not benefit from the imposition of the block diagonal feature of the asymptotic variance matrix. Corollary 5 enables the standard way of constructing confidence intervals based on the inversion of the t-statistic with a jointly estimated covariance matrix. In this case, we may define $\hat\tau_t$ in (7) using the score of $u_t(\theta)$ with respect to $(\gamma, \lambda)$ for a given $\hat\beta^*$. See Seo and Linton (2007) for more discussion.

3 In the MLE of the linear VECM, the likelihood ratio statistic for a hypothesis on $\beta$ converges to the Chi-square distribution, as the log likelihood function under normality is approximately quadratic. It will be interesting to examine if the same holds true for the threshold cointegration model.

4 Monte Carlo Experiments

This section investigates the finite sample performance of the estimators explored in this paper. Of particular interest are the various estimators of the cointegrating vector $\beta$ and the threshold parameter $\gamma$. We compare the unsmoothed LS estimator $\hat\beta^*$ of $\beta$ with the SLS estimator $\hat\beta$ and Johansen's maximum likelihood estimator $\tilde\beta$, which is based on the linear VECM. For comparison purposes, we also compute the restricted estimators $\hat\beta^*_0$ and $\hat\beta_0$, which are the unsmoothed LS and SLS estimators of $\beta$ when $\gamma$ is fixed at the true value $\gamma_0$. Similarly, $\hat\gamma^*_0$ and $\hat\gamma_0$ denote the restricted unsmoothed LS and SLS estimators of $\gamma$ when $\beta$ is prespecified at the true value $\beta_0$. To distinguish the SLS estimator $\hat\gamma$ from the two-step SLS estimator, let $\hat\gamma_2$ denote the two-step estimator.

The simulation samples are generated from the following process

$$\Delta x_t = \begin{pmatrix} -1 \\ 0 \end{pmatrix}(x_{1t-1} - \beta_0 x_{2t-1}) + \begin{pmatrix} -2 \\ 0 \end{pmatrix} 1\{x_{1t-1} - \beta_0 x_{2t-1} \le \gamma_0\} + \begin{pmatrix} -0.5 \\ 0 \end{pmatrix}(x_{1t-1} - \beta_0 x_{2t-1})\, 1\{x_{1t-1} - \beta_0 x_{2t-1} > \gamma_0\} + u_t,$$

where $u_t \sim iid\, N(0, I_2)$, $t = 0, \ldots, n$, and $\Delta x_0 = u_0$. This process was considered in Hansen and Seo (2002), who provide the finite sample performance of the maximum likelihood estimator of $\beta$ and $\gamma$. We fix $\beta_0 = 1$ and $\gamma_0 = 0$. While the data generating process does not contain any lagged first difference term, the model is estimated with two lagged terms in addition to the error correction term. The estimation is based on a grid search with grid sizes for $\beta$ and $\gamma$ being 100 and 500, respectively. The grid for $\beta$ is set around Johansen's maximum likelihood estimator $\tilde\beta$ as in Hansen and Seo (2002). For the smoothed estimators, we use the standard normal distribution function for $K$ and set $h = \hat\sigma n^{-1/2}\log n$, where $\hat\sigma^2$ is the sample variance of the error correction term.
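The design can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the coefficient signs and regime inequalities are read from the displayed process, the starting level is set so that $\Delta x_0 = u_0$ with $x_{-1} = 0$, and the smoothed indicator is taken to be $K((z_{t-1}(\beta) - \gamma)/h)$ with $K$ the standard normal distribution function. The function names are hypothetical, and the grid search itself is not reproduced.

```python
import math
import random


def simulate_threshold_vecm(n, beta0=1.0, gamma0=0.0, seed=None):
    """Simulate x_t = (x1_t, x2_t), t = 0..n, from the Monte Carlo design."""
    rng = random.Random(seed)
    # Delta x_0 = u_0, taking x_{-1} = 0 as the starting level (assumption).
    x1 = [rng.gauss(0.0, 1.0)]
    x2 = [rng.gauss(0.0, 1.0)]
    for _ in range(n):
        z = x1[-1] - beta0 * x2[-1]  # lagged error correction term
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dx1 = -1.0 * z + u1
        if z <= gamma0:
            dx1 += -2.0              # intercept shift in the lower regime
        else:
            dx1 += -0.5 * z          # extra adjustment in the upper regime
        x1.append(x1[-1] + dx1)
        x2.append(x2[-1] + u2)       # second equation is a pure random walk
    return x1, x2


def normal_cdf(v):
    """Standard normal distribution function, used as the kernel K."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def smoothed_indicator(x1, x2, beta, gamma):
    """Return the bandwidth h and K((z_{t-1}(beta) - gamma) / h) for each t."""
    z = [a - beta * b for a, b in zip(x1[:-1], x2[:-1])]
    n = len(z)
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / (n - 1)
    h = math.sqrt(var_z) * n ** -0.5 * math.log(n)  # h = sigma_hat n^{-1/2} log n
    return h, [normal_cdf((v - gamma) / h) for v in z]


x1, x2 = simulate_threshold_vecm(250, seed=0)
h, weights = smoothed_indicator(x1, x2, beta=1.0, gamma=0.0)
```

Replacing the weights with the indicator $1\{z_{t-1}(\beta) > \gamma\}$ gives the unsmoothed counterpart; for fixed $(\beta, \gamma)$ either version reduces estimation of the remaining parameters to OLS.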

Table 1 summarizes the results of our experiments with sample sizes $n = 100$, 250, and 500. We examine the finite sample distributions of the various estimators in terms of the mean, root mean squared error (RMSE), mean absolute error (MAE), and selected percentiles from 1000 simulation replications. The RMSE and MAE are reported in logs.
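The summary statistics reported in Table 1 can be computed from a vector of estimation errors as sketched below. This is an illustrative helper, not the code used for the paper: it assumes natural logarithms (the base is not stated in the text) and a linear-interpolation rule for percentiles, and the name `summarize_errors` is hypothetical.

```python
import math


def summarize_errors(errors, percentiles=(5, 25, 50, 75, 95)):
    """Mean, log RMSE, log MAE and percentiles of estimation errors.

    math.log raises ValueError if every error is exactly zero.
    """
    n = len(errors)
    mean = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    ordered = sorted(errors)

    def percentile(p):
        # Linear interpolation between order statistics (one common convention).
        pos = (n - 1) * p / 100.0
        lower = math.floor(pos)
        upper = min(lower + 1, n - 1)
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

    return {
        "mean": mean,
        "log_rmse": math.log(rmse),   # natural log (assumption)
        "log_mae": math.log(mae),
        "percentiles": {p: percentile(p) for p in percentiles},
    }
```

In a replication of the experiment, `errors` would hold the 1000 values of, say, $\hat\beta - \beta_0$ for a given sample size.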

The results are as expected. Johansen's maximum likelihood estimator $\tilde\beta$ does not perform as well as any of the other estimators, which are obtained from estimating the correctly specified threshold cointegration model. The unsmoothed estimators and the restricted estimators outperform their smoothed and unrestricted counterparts, respectively, in terms of RMSE and MAE. However, careful examination of the percentiles reveals that the smoothed estimator $\hat\beta$ exhibits a shorter interval between the 5th and 95th percentiles than the unsmoothed estimator $\hat\beta^*$. The percentiles of all the estimators of $\beta$ seem very symmetric around the medians, which are mostly zero.

(RMSE and MAE are in logs; the columns 5 to 95 are percentiles, in %.)

                                  Mean    RMSE     MAE       5      25      50      75      95
n = 100
$\tilde\beta - \beta_0$          0.001  -2.841  -3.191  -0.083  -0.028   0.000   0.028   0.098
$\hat\beta^* - \beta_0$          0.001  -2.880  -3.293  -0.082  -0.023  -0.001   0.023   0.091
$\hat\beta^*_0 - \beta_0$       -0.001  -3.009  -3.617  -0.063  -0.015  -0.001   0.011   0.062
$\hat\beta - \beta_0$           -0.001  -2.854  -3.268  -0.085  -0.025   0.001   0.025   0.084
$\hat\beta_0 - \beta_0$         -0.002  -2.947  -3.470  -0.068  -0.020  -0.001   0.019   0.068
$\hat\gamma^* - \gamma_0$       -0.672   0.138  -0.284  -2.649  -1.153  -0.293  -0.032   0.258
$\hat\gamma^*_0 - \gamma_0$     -0.631   0.080  -0.408  -2.617  -1.023  -0.209  -0.038   0.123
$\hat\gamma - \gamma_0$         -0.461   0.149  -0.265  -2.702  -0.945  -0.058   0.241   0.700
$\hat\gamma_0 - \gamma_0$       -0.444   0.090  -0.380  -2.719  -0.862  -0.021   0.192   0.504
$\hat\gamma_2 - \gamma_0$       -0.497   0.146  -0.272  -2.802  -1.014  -0.086   0.219   0.582

n = 250
$\tilde\beta - \beta_0$          0.000  -3.960  -4.241  -0.030  -0.011   0.000   0.011   0.031
$\hat\beta^* - \beta_0$          0.001  -4.302  -4.637  -0.022  -0.007   0.001   0.007   0.023
$\hat\beta^*_0 - \beta_0$        0.000  -4.639  -5.133  -0.014  -0.003   0.000   0.003   0.014
$\hat\beta - \beta_0$            0.000  -4.278  -4.626  -0.022  -0.007   0.000   0.007   0.020
$\hat\beta_0 - \beta_0$          0.000  -4.540  -4.866  -0.019  -0.006   0.000   0.006   0.016
$\hat\gamma^* - \gamma_0$       -0.116  -0.928  -1.778  -0.734  -0.108  -0.031   0.015   0.156
$\hat\gamma^*_0 - \gamma_0$     -0.102  -1.007  -2.085  -0.526  -0.069  -0.021  -0.002   0.069
$\hat\gamma - \gamma_0$         -0.057  -0.888  -1.677  -0.746  -0.073   0.019   0.113   0.236
$\hat\gamma_0 - \gamma_0$       -0.049  -0.901  -1.826  -0.622  -0.043   0.030   0.100   0.184
$\hat\gamma_2 - \gamma_0$       -0.054  -0.917  -1.704  -0.769  -0.071   0.025   0.105   0.236

n = 500
$\tilde\beta - \beta_0$          0.000  -2.006  -2.147  -0.015  -0.005   0.000   0.006   0.015
$\hat\beta^* - \beta_0$         -0.000  -2.279  -2.425  -0.009  -0.003  -0.000   0.002   0.008
$\hat\beta^*_0 - \beta_0$       -0.000  -2.491  -2.698  -0.005  -0.001  -0.000   0.001   0.005
$\hat\beta - \beta_0$           -0.000  -2.253  -2.383  -0.009  -0.003  -0.000   0.003   0.008
$\hat\beta_0 - \beta_0$         -0.000  -2.335  -2.476  -0.007  -0.003  -0.000   0.002   0.008
$\hat\gamma^* - \gamma_0$       -0.006  -1.165  -1.346  -0.115  -0.035  -0.004   0.020   0.096
$\hat\gamma^*_0 - \gamma_0$     -0.010  -1.438  -1.636  -0.068  -0.022  -0.007   0.002   0.045
$\hat\gamma - \gamma_0$          0.032  -1.080  -1.199  -0.092  -0.013   0.030   0.076   0.155
$\hat\gamma_0 - \gamma_0$        0.031  -1.213  -1.319  -0.051  -0.007   0.031   0.061   0.122
$\hat\gamma_2 - \gamma_0$        0.028  -1.102  -1.229  -0.086  -0.015   0.025   0.068   0.146

Notation. Johansen's maximum likelihood estimator, $\tilde\beta$; the unsmoothed estimators, $\hat\beta^*$; the restricted unsmoothed estimators, $\hat\beta^*_0$; the smoothed estimators, $\hat\beta$; the restricted smoothed estimators, $\hat\beta_0$; and finally the two-step estimator $\hat\gamma_2$. The estimators of $\gamma$ are labeled analogously.

Table 1: Distribution of Estimators

Similar observations can be made for the comparison among the estimators of the threshold parameter $\gamma$. The unsmoothed estimators have smaller RMSE and MAE than the smoothed ones. Knowledge of the true value of $\beta$ helps reduce the RMSE and MAE of the estimators of $\gamma$. The two-step estimator $\hat\gamma_2$, which employs the more efficient estimator $\hat\beta^*$ rather than $\hat\beta$, indeed improves upon $\hat\gamma$ and even has a smaller RMSE than the restricted estimator $\hat\gamma_0$ when $n = 250$. The distributions of all the estimators of $\gamma$ appear quite asymmetric and have large negative biases when $n = 100$. However, for $n = 500$ the distributions become more or less symmetric around the medians, and the biases are much smaller than those for $n = 100$. We also note that the biases of the unsmoothed estimators are much bigger than those of the smoothed estimators when $n = 100$ and $n = 250$.

5 Conclusion

We have established the consistency of the LS estimators of the cointegrating vector in general regime switching VECMs, validating the application of some existing results on the joint estimation of long-run and short-run parameters for models with smooth transition. We also provided asymptotic inference methods for threshold cointegration, establishing the convergence rates and asymptotic distributions of the LS and SLS estimators of the model parameters. In particular, the theory and Monte Carlo experiments indicate that the inference on the threshold parameter can be improved upon by using the two-step estimator.

While we only considered two-regime models, our results might be extended to multiple-regime models, provided that the assumptions on stationarity and the invariance principle in Assumption 1 hold true. In that case, we may consider the sequential estimation strategy discussed in Bai and Perron (1998) and Hansen (1999). A sequence of estimations and tests can determine the number of regimes and the threshold parameter. The LM test by Hansen and Seo (2002) can be employed to test for the presence of the second break without modification, due to the fast rate of convergence of the cointegrating vector estimators.

It is also possible to think of the case with more than one cointegrating relation if $p$ is greater than 2. In this case, the threshold variable can be understood as a linear combination of those cointegrating vectors. However, the models commonly used in empirical applications are bivariate, and the estimation of such a model is more demanding; thus we leave it for future research.

Proof of Theorems

A word on notation. Throughout this section, $C$ stands for a generic constant that is finite. The proof of Theorem 1 makes use of the following lemma, which might be of independent interest.

Lemma 6 Let $\{Z_n\}$ be a sequence of random variables and $\{r_n\}$ be a sequence of positive numbers such that $r_n \to \infty$. If $a_n Z_n = o_p(1)$ for any sequence $\{a_n\}$ such that $a_n/r_n \to 0$, then

$$r_n Z_n = O_p(1).$$

Proof of Lemma 6 Let $X_n = r_n Z_n$ and suppose $X_n \ne O_p(1)$. Then, there exists $\varepsilon > 0$ such that for any $K$,

$$\limsup_{n\to\infty} \Pr(|X_n| > K) > \varepsilon.$$

Thus, there exists $n_1$ such that $\Pr(|X_{n_1}| > 1) > \varepsilon$. Similarly one can find $n_2 > n_1$ such that $\Pr(|X_{n_2}| > 2) > \varepsilon$, and $n_3 < n_4 < \cdots$, accordingly. Let $b_n = 1$ for $n \le n_1$, $b_n = 2$ for $n_1 < n \le n_2$, etc. Then, it is clear from the construction that $b_n \to \infty$, and that $\Pr(|X_n| > b_n) > \varepsilon$ infinitely often (i.o.), which implies that $X_n/b_n \ne o_p(1)$. However, given the condition of the lemma, $X_n/b_n = (r_n/b_n) Z_n = o_p(1)$ since $(r_n/b_n)/r_n \to 0$. This yields a contradiction. $\blacksquare$

Proof of Theorem 1

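Let $\Theta_{r_n,\delta} = \{\theta \in \Theta : r_n|\beta - \beta_0| > \delta\}$. Supremums and infimums in this proof are taken on the set $\Theta_{r_n,\delta}$, unless specified otherwise. To show that $\hat\beta^* - \beta_0 = o_p(r_n^{-1})$ we need to show that for every $\delta > 0$,

$$\Pr\left\{\inf_\theta S_n^*(\theta)/n - S_n^*(\theta_0)/n > 0\right\} \to 1. \tag{8}$$

Let $X^*_{\beta,\gamma} = (X(\beta), X^*_\gamma(\beta)) \otimes I_p$ and rewrite (2) as

$$S_n^*(\theta) = y'y + \lambda' X^{*\prime}_{\beta,\gamma} X^*_{\beta,\gamma}\lambda - 2y'X^*_{\beta,\gamma}\lambda.$$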

Let $\psi = \sqrt{n}(\beta - \beta_0)$ and $r_n$ be a sequence of real numbers such that $\sqrt{n} \ge r_n \to \infty$ and $r_n/\sqrt{n} \to 0$ as $n \to \infty$. Then,

$$|\psi| = r_n|\beta - \beta_0|\,(\sqrt{n}/r_n) \ge \delta\sqrt{n}/r_n \to \infty \tag{9}$$

for any $\theta \in \Theta_{r_n,\delta}$. Note that

$$\frac{1}{n}S_n^*(\theta) - \frac{1}{n}S_n^*(\theta_0) = \frac{1}{n}\lambda' X^{*\prime}_{\beta,\gamma}X^*_{\beta,\gamma}\lambda - \frac{2}{n}y'X^*_{\beta,\gamma}\lambda + \frac{1}{n}y'y - \frac{1}{n}S_n^*(\theta_0)$$

and that $y'y/n - S_n^*(\theta_0)/n = y'y/n - u'u/n$ converges in probability to a positive constant, as in the standard linear regression, due to Assumption 1 (c), and this term is free of the parameter $\theta$. Therefore, it is sufficient to show that

$$\Pr\left\{\inf_\theta\left(\frac{1}{n}\lambda' X^{*\prime}_{\beta,\gamma}X^*_{\beta,\gamma}\lambda - \frac{2}{n}y'X^*_{\beta,\gamma}\lambda\right) \ge 0\right\} = \Pr\left\{\inf_\theta\left(|\psi|\,\frac{\lambda' X^{*\prime}_{\beta,\gamma}X^*_{\beta,\gamma}\lambda}{n|\psi|^2} - 2\,\frac{y'X^*_{\beta,\gamma}\lambda}{n|\psi|}\right) \ge 0\right\} \to 1, \tag{10}$$

as $n \to \infty$. This follows from (9) if we show that (i)

$$\sup_\theta\left|\frac{y'X^*_{\beta,\gamma}\lambda}{n|\psi|}\right| = O_p(1), \tag{11}$$

and (ii) $\inf_\theta \frac{1}{n|\psi|^2}\lambda' X^{*\prime}_{\beta,\gamma}X^*_{\beta,\gamma}\lambda$ converges weakly to a random variable that is positive with probability one.

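Show (i) first. Note that $y'X^*_{\beta,\gamma}/n$ consists of the sample means of the product of $\Delta x_t$ and $(1, z_{t-1}(\beta), \Delta'_{t-1})$ and that of $\Delta x_t$ and $(1, z_{t-1}(\beta), \Delta'_{t-1})\,d_{t-1}(\beta,\gamma)$. However, as $\lambda$ and $d_{t-1}(\beta,\gamma)$ are bounded, it is sufficient to observe that $\frac{1}{n}\sum_t|\Delta x_t'|$ and $\frac{1}{n}\sum_t|\Delta x_t\Delta'_{t-1}|$ are $O_p(1)$, and that

$$\frac{1}{n|\psi|}\sum_t|z_{t-1}(\beta)\Delta x_t'| \le \frac{1}{n\delta}\sum_t|z_{t-1}\Delta x_t'| + \frac{1}{n}\sum_t\left|\frac{\Delta x_t x'_{2t-1}}{\sqrt{n}}\right| = O_p(1),$$

by the law of large numbers for $|z_{t-1}\Delta x_t'|$, the weak convergence of $x_{2t}/\sqrt{n}$ and the Cauchy-Schwarz inequality.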

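We consider (ii) now. Let $\dot\psi = \psi/|\psi|$. Since $\frac{1}{n}X^{*\prime}_{\beta,\gamma}X^*_{\beta,\gamma}$ is the matrix of the sample means of the outer product of

$$\left((1, z_{t-1}(\beta), \Delta'_{t-1}),\ (1, z_{t-1}(\beta), \Delta'_{t-1})\,d_{t-1}(\beta,\gamma)\right)',$$

we may write

$$\frac{1}{n|\psi|^2}\lambda' X^{*\prime}_{\beta,\gamma}X^*_{\beta,\gamma}\lambda = \lambda_z'\left(\Xi^*_n(\beta,\gamma) \otimes I_p\right)\lambda_z + R_n(\theta),$$

where

$$\Xi^*_n(\beta,\gamma) = \begin{pmatrix} \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 & \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 d_{t-1}(\beta,\gamma) \\ \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 d_{t-1}(\beta,\gamma) & \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 d^2_{t-1}(\beta,\gamma) \end{pmatrix},$$

and

$$\sup_\theta|R_n(\theta)| = O_p\left(\left|\frac{r_n}{\sqrt{n}}\right|^2\right),$$

by the same reasoning used to show (i). Since $\lambda_z$ is bounded away from zero by Assumption 1 (c), it is sufficient to show that

$$\Xi^*_n(\beta,\gamma) \Rightarrow (\dot\psi'\Omega\dot\psi)\begin{pmatrix} \int_0^1 W^2 & \int_0^1 W^2 1\{W > 0\} \\ \int_0^1 W^2 1\{W > 0\} & \int_0^1 W^2 1\{W > 0\} \end{pmatrix}, \tag{12}$$

where $W$ is the standard Brownian motion. Note that the matrix in (12) is positive definite and free of parameters up to a constant multiple $(\dot\psi'\Omega\dot\psi)$, which is positive and bounded away from zero.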

Now we show (12). It follows from Assumption 1 (b) and the continuous mapping theorem that

$$\frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 \Rightarrow \int_0^1 (B'\dot\psi)^2 \overset{d}{=} \int_0^1 W^2\,(\dot\psi'\Omega\dot\psi), \tag{13}$$

and that

$$\frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2\, 1\{x'_{2t-1}\dot\psi > 0\} \Rightarrow \int_0^1 (B'\dot\psi)^2\, 1\{B'\dot\psi > 0\} \overset{d}{=} (\dot\psi'\Omega\dot\psi)\int_0^1 W^2 1\{W > 0\}.$$

Then, it remains to show that

$$\sup_\theta\left|\frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2\left(1\{x'_{2t-1}\dot\psi > 0\} - d(z_{t-1}(\beta),\gamma)\right)\right| = o_p(1) \tag{14}$$

and that

$$\sup_\theta\left|\frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2\left(1\{x'_{2t-1}\dot\psi > 0\} - d^2(z_{t-1}(\beta),\gamma)\right)\right| = o_p(1). \tag{15}$$

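Since $|1\{x > 0\} - g^2(x)| \le (\sup_x|g(x)| + 1)\,|1\{x > 0\} - g(x)|$ for any bounded function $g$, and $\sup_{\dot\psi,t}(x'_{2t-1}\dot\psi/\sqrt{n})^2 = O_p(1)$, it is sufficient to show that

$$\sup_\theta\frac{1}{n}\sum_t\left|d(z_{t-1}(\beta),\gamma) - 1\{x'_{2t-1}\dot\psi > 0\}\right| \le R_{1n} + R_{2n} = o_p(1),$$

where

$$R_{1n} = \sup_\theta\frac{1}{n}\sum_t\left|1\{z_{t-1}(\beta) > 0\} - 1\{x'_{2t-1}\dot\psi > 0\}\right|,$$

$$R_{2n} = \sup_\theta\frac{1}{n}\sum_t\left|d(z_{t-1}(\beta),\gamma) - 1\{z_{t-1}(\beta) > 0\}\right|.$$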

Due to (9) and the fact that $z_{t-1}(\beta) = z_{t-1} + \frac{x'_{2t-1}\dot\psi}{\sqrt{n}}|\psi|$,

$$\frac{1}{n}\sum_t\left|1\{z_{t-1}(\beta) > 0\} - 1\{x'_{2t-1}\dot\psi > 0\}\right| \le \frac{1}{n}\sum_t 1\left\{\left|\frac{x'_{2t-1}\dot\psi}{\sqrt{n}}\right| \le \left|\frac{r_n z_{t-1}}{\delta\sqrt{n}}\right|\right\}. \tag{16}$$

For any $m_n > 0$,

$$\sup_{|\dot\psi|=1} 1\left\{\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| \le \left|\frac{r_n z_{t-1}}{\delta\sqrt{n}}\right|\right\} \le 1\left\{\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| \le \left|\frac{r_n z_{t-1}}{\delta\sqrt{n}}\right|\right\} \le 1\left\{\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| \le m_n\right\} + 1\left\{\left|\frac{r_n z_{t-1}}{\delta\sqrt{n}}\right| > m_n\right\}. \tag{17}$$

Consider the first term in (17). Since $x'_{2[nr]}\dot\psi/\sqrt{n}$ converges weakly on $\{0 \le r \le 1\} \times \{\dot\psi : |\dot\psi| = 1\}$,

$$E\,1\left\{\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| \le m_n\right\} \to \Pr\left\{\inf_{|\dot\psi|=1}|B_r'\dot\psi| \le 0\right\} = 0,$$

for any $r \in [0, 1]$ and for any decreasing sequence $m_n \to 0$. And for the second term in (17), we observe that $E\,1\{|r_n z_{t-1}/(\delta\sqrt{n})| > m_n\}$ is the same for all $t$ and that it goes to zero as $m_n\sqrt{n}/r_n \to \infty$. Consider the right hand side term of (16). Letting $m_n \to 0$ and $m_n\sqrt{n}/r_n \to \infty$, we observe that

$$E\,\frac{1}{n}\sum_t 1\left\{\left|\frac{x'_{2t-1}\dot\psi}{\sqrt{n}}\right| \le \left|\frac{r_n z_{t-1}}{\delta\sqrt{n}}\right|\right\} = \int_0^1 E\,1\left\{\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| \le \left|\frac{r_n z_{[nr]}}{\delta\sqrt{n}}\right|\right\}dr \to 0,$$

by the dominated convergence theorem. This shows that $R_{1n} = o_p(1)$.

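Next, it follows from Assumption 1 (c) that

$$\sup_\theta\left|d(z_{t-1}(\beta),\gamma) - 1\{z_{t-1}(\beta) > 0\}\right| \le \tilde d\left(\inf_\theta\left|z_{[nr]} + \frac{x'_{2[nr]}\psi}{\sqrt{n}}\right|\right), \tag{18}$$

where $[nr] = t - 1$. Due to (9), for each $r$ and for any sequence $\{\varepsilon_n\}$ such that $\varepsilon_n \to 0$ and $\varepsilon_n\sqrt{n}/r_n \to \infty$ as $n \to \infty$,

$$E\left[\tilde d\left(\inf_\theta\left|z_{[nr]} + \frac{x'_{2[nr]}\psi}{\sqrt{n}}\right|\right)\right] \le E\left[\tilde d\left(\frac{\delta\sqrt{n}}{r_n}\left(\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| - \varepsilon_n\right)\right) 1\left\{\sup_\theta\left|\frac{z_{[nr]}}{|\psi|}\right| \le \varepsilon_n\right\} 1\left\{\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| > 2\varepsilon_n\right\}\right]$$

$$\qquad + C\,E\left(1\left\{\left|\frac{z_{[nr]}\,r_n}{\delta\sqrt{n}}\right| > \varepsilon_n\right\} + 1\left\{\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| \le 2\varepsilon_n\right\}\right).$$

As $\inf_{|\dot\psi|=1}|x'_{2[nr]}\dot\psi/\sqrt{n}| = O_p(1)$, $E\,1\{\inf_{|\dot\psi|=1}|x'_{2[nr]}\dot\psi/\sqrt{n}| < 2\varepsilon_n\} \to 0$. Due to the fact that $\varepsilon_n\sqrt{n}/r_n \to \infty$, we have $E\,1\{|z_{[nr]}\,r_n/(\delta\sqrt{n})| > \varepsilon_n\} \to 0$, and

$$E\left[\tilde d\left(\frac{\delta\sqrt{n}}{r_n}\left(\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| - \varepsilon_n\right)\right) 1\left\{\sup_\theta\left|\frac{z_{[nr]}}{|\psi|}\right| \le \varepsilon_n\right\} 1\left\{\inf_{|\dot\psi|=1}\left|\frac{x'_{2[nr]}\dot\psi}{\sqrt{n}}\right| > 2\varepsilon_n\right\}\right] \le \tilde d\left(\delta\varepsilon_n\sqrt{n}/r_n\right) \to 0.$$

Thus, it follows from the dominated convergence theorem that the integral of (18) over $r$ on the interval $[0, 1]$ goes to zero. This shows $R_{2n} = o_p(1)$, thus completing the proof of (8). Then, it follows from Lemma 6 that $\sqrt{n}(\hat\beta^* - \beta_0) = O_p(1)$.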

We turn to show that � � � = op (1) andpn���� �0

�= op (1). When d (x; ) is con-

tinuous in all its arguments, we can resort to Theorem 1 of de Jong (2002) : And it can be

readily generalized to the case where the limit objective function is continuous. Recalling

Assumption 1 (d) and reading the proof of Theorem 1 of de Jong (2002) ; we see that we only

have to show his equation (A:9) with an = 0; which corresponds to

1

n

Xt

"ut

�x02t�1�p

n; �;

�0ut

�x02t�1�p

n; �;

�� S

�x02t�1�p

n; �;

�#p�! 0; (19)

uniformly in (�; ) on any compact set. However, for any � and and C > 0; the term in


(19) is bounded by

sup�; ;j�j�C

����� 1nXt

�ut (�; �; )

0ut (�; �; )� S (�; �; )

������+ 1

n

Xt

1

�sup�

����x02t�1�pn

���� > C� ;whose �rst term converges to zero in probability by assumption and the second term can be

made arbitrarily small by choosing C large enough due to the weak convergence ofx02t�1�p

n:

Therefore, the proof is complete. �

Proof of Corollary 2

As in the proof of Theorem 1, we need to show that

$$\Pr\left\{\inf_\theta S_n(\theta)/n - S_n(\theta_0)/n > 0\right\} \to 1,$$

where we follow the notational convention of the proof of the theorem that infimums and supremums are taken over $\Theta_{r_n,\delta}$ unless specified otherwise. As $S_n(\theta_0)/n$ does not contain any $I(1)$ variable, the result in Seo and Linton (2007) applies, so that it is sufficient to show (10) with $X^*_{\beta,\gamma}$ replaced by $X_{\beta,\gamma} = (X(\beta), X_\gamma(\beta)) \otimes I_p$. However, (11) is obvious as $K$ is bounded. Thus, we need to show (12), that is,

$$\Xi_n(\beta,\gamma) = \begin{pmatrix} \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 & \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 K_{t-1}(\beta,\gamma) \\ \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 K_{t-1}(\beta,\gamma) & \frac{1}{n^2}\sum_t (x'_{2t-1}\dot\psi)^2 K^2_{t-1}(\beta,\gamma) \end{pmatrix} \Rightarrow (\dot\psi'\Omega\dot\psi)\begin{pmatrix} \int_0^1 W^2 & \int_0^1 W^2 1\{W > 0\} \\ \int_0^1 W^2 1\{W > 0\} & \int_0^1 W^2 1\{W > 0\} \end{pmatrix},$$

which follows if we show (14) and (15) with $d$ replaced by $K$. The proof hinges on (18), which is the part where $d$ matters. However, it still holds when replacing $d$ with $K$ and taking the supremum over both $\theta$ and $h$ on any interval including zero. This establishes that $\hat\beta$ is $\sqrt{n}$-consistent.

The remaining part of the proof follows if the conditions on $K$ satisfy Assumption 1 (d). In other words, defining $\tilde u_t(\cdot)$ like $u_t(\cdot)$ with the indicator replaced by $K$, we need to show the uniform convergence of its sample mean to $S(\cdot)$, which follows from Seo and Linton (2007) as $\tilde u_t(\cdot)$ consists of stationary variables. This completes the proof. $\blacksquare$

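A word on notation. Having proved Theorem 1 and Corollary 2, we write hereafter $\theta = (\psi', \gamma, \lambda')'$ and $\theta_1 = (\psi', \gamma)'$, where $\psi = \sqrt{n}(\beta - \beta_0)$ as in the proof of Theorem 1, with slight abuse of notation. To further simplify notation assume $\gamma_0 = 0$, and thus $\theta_{10}$ is a vector of zeros.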

Proof of Theorem 3

Let $\Theta_c = \{\theta : |\theta - \theta_0| < c\}$. Due to the consistency shown in Theorem 1, we may restrict the parameter space to $\Theta_c$ for some $c > 0$, which will be specified below. It is sufficient to show the following claim for the estimator of $\theta_1$ to be $n$-consistent, as in Chan (1993).

Claim I For any $\varepsilon > 0$, there exist $c > 0$ and $K > 0$ such that

$$\liminf_{n\to\infty}\Pr\left\{S_n(\theta) - S_n(0,\lambda) > 0 \text{ for all } \theta \in \Theta_{c,K}\right\} > 1 - \varepsilon,$$

where $\Theta_{c,K} = \Theta_c \cap \{\theta : |\theta_1| > K/n\}$.

Proof of Claim I: Let $\gamma_t = \gamma - \frac{x'_{2t}}{\sqrt{n}}\psi$ and $u_t(\theta) = u_{1t}(\theta) + u_{2t}(\theta)$, where

$$u_{1t}(\theta) = u_t - (A - A_0)'X_{t-1} - (D - D_0)'X_{t-1}1\{z_{t-1} > \gamma_{t-1}\} - D_0'X_{t-1}\left(1\{z_{t-1} > \gamma_{t-1}\} - 1\{z_{t-1} > 0\}\right),$$

$$u_{2t}(\theta) = -\left(A_z + D_z 1\{z_{t-1} > \gamma_{t-1}\}\right)\frac{x'_{2t-1}}{\sqrt{n}}\psi.$$

Since $u_{2t}(0,\lambda) = 0$, $(S_n(\theta) - S_n(0,\lambda))/n = D_{1n}(\theta) + D_{2n}(\theta)$, where

$$D_{1n}(\theta) = \frac{1}{n}\sum_t\left(u_{1t}(\theta)'u_{1t}(\theta) - u_{1t}(0,\lambda)'u_{1t}(0,\lambda)\right),$$

$$D_{2n}(\theta) = \frac{1}{n}\sum_t u_{2t}(\theta)'u_{2t}(\theta) + \frac{2}{n}\sum_t u_{1t}(\theta)'u_{2t}(\theta).$$

Note that $\tilde x_{2n} = \sup_{t<n}|x_{2t-1}/\sqrt{n}| = O_p(1)$ and thus $\tilde\gamma_n = \sup_{t<n}|\gamma_{t-1}| = O_p(|\theta_1|) = O_p(c)$ for $\theta \in \Theta_c$. Then,

$$\left|\frac{1}{n}\sum_t u_{2t}(\theta)'u_{2t}(\theta)\right| \le O\left([\tilde x_{2n}]^2|\psi|^2\right) = O_p(c|\psi|). \tag{20}$$

Since $\{u_t\}$ is a martingale difference sequence, so is the sequence $\{u_t 1\{z_{t-1} > \gamma_{t-1}\}\}$, implying

$$\frac{1}{n}\sum_t u_t 1\{z_{t-1} > \gamma_{t-1}\}\frac{x'_{2t-1}}{\sqrt{n}} = o_p(1);$$

see e.g. Hansen (1992). This implies that

$$\left|\frac{1}{n}\sum_t u_{1t}(\theta)'u_{2t}(\theta)\right| \le o_p(|\psi|) + O_p(c|\psi|) + o_p(|\psi|), \tag{21}$$

because $\left|\frac{1}{n}\sum_t X_{t-1}1\{z_{t-1} > \gamma_{t-1}\}\frac{x'_{2t-1}}{\sqrt{n}}\right| \le \frac{1}{n}\sum_t\left|X_{t-1}\frac{x'_{2t-1}}{\sqrt{n}}\right| = O_p(1)$ and

$$\frac{1}{n}\sum_t\left|X_{t-1}\left(1\{z_{t-1} > \gamma_{t-1}\} - 1\{z_{t-1} > 0\}\right)1\{z_{t-1} > \gamma_{t-1}\}\frac{x'_{2t-1}}{\sqrt{n}}\right| \le \tilde x_{2n}\frac{1}{n}\sum_t|X_{t-1}|\,1\{|z_{t-1}| \le |\gamma_{t-1}|\} = O_p(1)\,O_p(c).$$

The meaning of $o_p(|\psi|)$ is that the term can be bounded for all large $n$ by the product of $|\psi|$ and an arbitrarily small constant with probability $1 - \varepsilon$ for any $\varepsilon > 0$. The same is true for $O_p(c|\psi|)$ since $c$ can be chosen arbitrarily small. Thus, we conclude from (20) and (21) that for any $m, \varepsilon > 0$, there is $c > 0$ such that

$$\liminf_{n\to\infty}\Pr\left\{|D_{2n}(\theta)| \le m|\theta_1| \text{ for all } \theta \in \Theta_c\right\} > 1 - \varepsilon. \tag{22}$$

On the other hand, as we show below, for any " > 0 and all su¢ ciently large n; there

exists some constant m0 > 0 such that D1n (�) > m0 j�1j with probability 1 � "; which willcomplete the proof of Claim I as m is arbitrary. By direct calculation as above, we may write

D1n (�) = (D00 +Op (c))

1

n

Xt

Xt�1X0t�11

�jzt�1j �

�� t�1��D0+(D0

0 +O (c))1

n

Xt

Xt�1ut1�jzt�1j �

�� t�1��+Op �c2� :We �rst argue that the same as (22) can be said for the last two terms. It is obvious for the

last term Op�c2�and it is left to show that for any "; � > 0 and su¢ ciently large n

Pr

(sup

�2�c;K

����� 1nXt

ut1�jzt�1j �

�� t�1������� = j�1j > �

)< ": (23)

The other terms in Xt�1 can be analyzed similarly. To show (23), we consider a grid

�b =n�1 =

�bi1 ; :::; bik

�0K=n : j�1j < c; and i1; :::; ik = 1; 2; :::

o;

for b > 1 and �rst show (23) when the supremum is taken over �b: By the Markov inequality

Pr

(sup�b

����� 1nXt

ut1�jzt�1j �

�� t�1������� = ���K=n �bi1 ; :::; bik�0��� > �

)

�X

i1;:::;ik

E�� 1n

Pt ut1

�jzt�1j �

�� t�1����2��K=n (bi1 ; :::; bik)0��2 �2=

Xi1;:::;ik

1

�2K2 j(bi1 ; :::; bik)j2Xt

Eu2t1�jzt�1j �

�� t�1�� : (24)

However, since the conditional distribution, say, Ft�1 of zt�1 given x2t�1 has a density, which

is bounded by M , an expansion of it yields that

Xt

EFt�1��� t�1��� �M 1

n

Xt

E

���� x2tpn����2K ���bi1 ; :::; bik��� = O �K ���bi1 ; :::; bik���� : (25)

Furthermore, b > 1 implies thatP

i1;:::;ik

���bi1 ; :::; bik����1 < 1: Thus, letting K large makes

the term in (24) smaller than any given " > 0:

Next, if a1 and a2 are any two adjacent points in �b; then ja1 � a2j � ja1j (b� 1)K=n �c (b� 1)K=n: Also, let 1t and 2t denote 0ts corresponding to any two points (say, �11 and


�12; ) lying between a1 and a2; then��1�jzt�1j � �� 1t�1��� 1�jzt�1j � �� 2t�1����� 1

(�� 1t�1��� supt;�11;�12

j 1t � 2tj � jzt�1j ��� 1t�1��+ sup

t;�11;�12

j 1t � 2tj)

and

supt;�11;�12

j 1t � 2tj � ~x2n c (b� 1)K

n;

where the supremum is taken over t < n and over �11 and �12 between a1 and a2. Since a1and a2 were chosen arbitrary, the supremum can be extended to the collection of all �11 and

�12 that lie between any two adjacent point in �b and by the same reasoning as (25) ;

E sup�11;�12

�����Xt

ut�1�jzt�1j �

�� 1t�1��� 1�jzt�1j � �� 2t�1�������� = O (c (b� 1)K) :

Since b can be chosen arbitrarily close to 1; this completes the proof of (23) :

Turning to the �rst term of D1n; let F denote the distribution function of jztj then itfollows from the standard uniform law of large numbers that

supjxj;j�1j�C

1

n

Xt

j1 fjzt�1j � jx0�1jg � F (jx0�1j)jp�! 0;

for any C <1: Since C is arbitrary and ~x2n = Op (1) ;

supj�1j�C

1

n

Xt

��1�jzt�1j � �� _ t�1��� F ��� _ t�1����� p�! 0: (26)

Since x2[nr]=pn) B (r) and F is continuous, it follows from (26) that

1

n

Xt

1�jzt�1j �

�� _ t�1��) Z 1

0

F��� �B (r)0 ���� dr d

=

Z 1

0

F (j + �0�W (r)j) dr;

where W is a standard Brownian motion and the equality in distribution follows from the

normality of B: Then, for all large n;

Pr

(1

n

Xt

1�jzt�1j �

�� _ t�1���m5 j�1j � 0 for all �1 2 �c

)

� Pr

�Z 1

0

F (j + �0�W (r)j) dr �m5 j�1j � 0 for all �1 2 �c�� "=2

� 1� ";

where the last inequality is shown below in several steps.

First, there is m1 such that F (z) � m1z for all z 2 [0; 1] and sup0�r�1 jW (r)j = Op (1)and j j and j�j are bounded by c: Thus, choosing c small we can argue that for any " > 0;


with probability greater than 1� ";Z 1

0

F (j + �0�W (r)j) dr �Z 1

0

m1 j + �0�W (r)j dr:

Second, since j j � c; if jW (r)j � c; then j + �0�W (r)j � 2�0� jW (r)j � 2m2 j�j jW (r)j ;for somem2 as is positive de�nite. Thus, choosing c small, Pr finf0�r�1 jW (r)j � cg > 1�"and thus with probability greater 1� ";Z 1

0

m1 j + �0�W (r)j dr � m3 j�jZ 1

0

jW (r)j dr:

Third, for any " > 0; there is m4 such that PrnR 1

0jW (r)j dr > m4

o> 1 � ": Thus, we can

conclude that for any " > 0; there exists a constant m5 > 0 such that

Pr

�Z 1

0

F (j + �0�W (r)j) dr �m5 j�1j � 0�> 1� "=2:

Proof of Theorem 4

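To derive the limit distribution of the SLS estimator $\hat\theta$, define $T_n(\theta) = \frac{\partial S_n(\theta)}{n\,\partial\theta}$ and $Q_n(\theta) = \frac{\partial^2 S_n(\theta)}{n\,\partial\theta\,\partial\theta'}$. Then, by the mean value theorem applied to the first order condition $T_n(\hat\theta) = 0$,

$$\sqrt{n}\,D_n^{-1}(\hat\theta - \theta_0) = -\left(D_n Q_n(\tilde\theta) D_n\right)^{-1}\sqrt{n}\,D_n T_n(\theta_0),$$

where $D_n$ is a diagonal matrix, whose first $p - 1$ elements are $(h/n)^{1/2}$, the $p$-th element is $\sqrt{h}$, and the others are 1, and $\tilde\theta$ lies between $\hat\theta$ and $\theta_0$. Recall that $X'_{t-1}(\beta)D = X'_{t-1}D + \frac{x'_{2t-1}\psi}{\sqrt{n}}D_z'$ and write the residuals of the SLS as

$$e_t(\theta) = u_t - (A - A_0)'X_{t-1} - (D - D_0)'X_{t-1}d_{t-1} - D'X_{t-1}\left(K_{t-1}(\beta,\gamma) - d_{t-1}\right) - \left(A_z + D_z K_{t-1}(\beta,\gamma)\right)\frac{x'_{2t-1}\psi}{\sqrt{n}}. \tag{27}$$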

Then,

@et (�)0

@�= �x2t�1

"A0z +D

0zKt�1 (�; ) +

�D0z

x02t�1�pn

+X 0t�1D

� K(1)t�1 (�; )h

#(28)

@et (�)0

@ = �

�X 0t�1D +D

0z

x02t�1�pn

� K(1)t�1 (�; )h

(29)

@et (�)0

@�=

0@ ��X 0t�1D +D

0zx02t�1�p

n

� Ip

�Kt�1 (�; )�X 0t�1D +D

0zx02t�1�p

n

� Ip

1A :


Convergence of $T_n$

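The asymptotic distribution of

\[
\sqrt{n}\,D_n T_n(\theta_0)/2 = \frac{1}{\sqrt{n}} D_n \sum_t \frac{\partial e_t(\theta_0)'}{\partial\theta} e_t(\theta_0)
\]

has been developed in Seo and Linton (2007) except for the part corresponding to $\beta$. Thus, we focus on the convergence of

\[
\frac{\sqrt{h}}{2n}\sum_t \frac{\partial e_t(\theta_0)'}{\partial\beta} e_t(\theta_0) = -\frac{1}{n}\sum_t x_{2t-1}\left( \sqrt{h}\,v_{1t} + \sqrt{h}\,v_{2t} + v_{3t}/\sqrt{h} \right),
\]

where

\[
\begin{aligned}
v_{1t} &= A_{z0}'u_t + D_{z0}'\mathcal{K}_{t-1}u_t, \\
v_{2t} &= \left( A_{z0}' + D_{z0}'\mathcal{K}_{t-1} \right) D_0'X_{t-1}\left( \mathcal{K}_{t-1} - d_{t-1} \right), \\
v_{3t} &= \mathcal{K}^{(1)}_{t-1}X_{t-1}'D_0\left[ u_t - D_0'X_{t-1}\left( \mathcal{K}_{t-1} - d_{t-1} \right) \right],
\end{aligned}
\]

and that of the covariances of $\sqrt{n}\,D_n T_n(\theta_0)$.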

Since $v_{1t}$ is a martingale difference array, $\frac{1}{n}\sum_t x_{2t-1}v_{1t} = O_p(1)$ due to the convergence of stochastic integrals (see e.g. Kurtz and Protter 1991). The convergence of $n^{-1}h^{-1/2}\sum_t x_{2t-1}v_{2t}$ is similar to that of $\frac{1}{n\sqrt{h}}\sum_t x_{2t-1}v_{3t}$, which we will establish here. Then, it follows that

\[
\frac{1}{n}\sum_t x_{2t-1}\left( v_{1t} + v_{2t} \right)\sqrt{h} = o_p(1),
\]

as $h \to 0$. Let $\bar v_{3t} = (v_{3t} - Ev_{3t})/\sqrt{h}$; then $\bar v_{3t}$ is a zero mean strong mixing array. Seo and Linton (2007, Lemma 2) have shown that $\sqrt{n/h}\,Ev_{3t} \to 0$ and $\mathrm{var}\left[ (hn)^{-1/2}\sum_t v_{3t} \right] = \mathrm{var}\left[ v_{3t}/\sqrt{h} \right] + o(1) \to \sigma_v^2$, which is defined in (5). This implies that

\[
\frac{1}{n}\sum_t x_{2t-1}Ev_{3t}/\sqrt{h} = o_p(1),
\]

and that

\[
n^{-1/2}\sum_{t=2}^{[nr]} \begin{pmatrix} \Delta x_{2t-1} \\ \bar v_{3t} \end{pmatrix} \Rightarrow \begin{pmatrix} B \\ \sigma_v W \end{pmatrix}, \tag{30}
\]

due to Assumption 4 and the invariance principle of Wooldridge and White (1988, Theorem 2.11), where $W$ is a standard Brownian motion that is independent of $B$. For the independence between $B$ and $W$, see Lemma 2 of Seo and Linton (2007), which shows that $\sum_{s,t=1}^{n} E\,\Delta x_s \bar v_{3t} = o(1)$.

For the convergence of $\frac{1}{n}\sum_t x_{2t-1}\bar v_{3t}$, we resort to Hansen (1992, Theorem 3.1). Checking his conditions, we observe that the moment condition for $\bar v_{3t}$ is not met, that is, $E|\bar v_{3t}|^p \to \infty$ for $p > 2$. However, this condition is not necessary; it is used only to show that $\sup_{1\le t\le n} n^{-1/2}\sum_{k=1}^{\infty} E[\bar v_{3t+k} \mid \mathcal{F}_t] = o_p(1)$, where $\mathcal{F}_t$ is the natural filtration at time $t$. Using the Markov inequality, he obtains that for $p > 2$

\[
\Pr\left\{ \left| \sup_{1\le t\le n} n^{-1/2}\sum_{k=1}^{\infty} E[\bar v_{3t+k} \mid \mathcal{F}_t] \right| \ge \varepsilon \right\} \le \frac{C\,E|\bar v_{3t}|^p}{\varepsilon^p n^{p/2-1}} \to 0 \tag{31}
\]

if $E|\bar v_{3t}|^p < \infty$. Now we show that, while $E|\bar v_{3t}|^p$ is not bounded for $p > 2$, it diverges at a rate slower than $n^{p/2-1}$, so that (31) still holds. As $\sqrt{n/h}\,Ev_{3t} \to 0$ and the part associated with $u_t$ can be handled in the same manner, we focus on

\[
\begin{aligned}
& E\left| \mathcal{K}^{(1)}_{t-1}X_{t-1}'D_0D_0'X_{t-1}\left( \mathcal{K}_{t-1} - d_{t-1} \right) \right|^p h^{-p/2} \\
&\quad = h^{-p/2}\iint \left| X'D_0D_0'X \right|^p \left| \mathcal{K}^{(1)}\left( \frac{z-\gamma_0}{h} \right)\left( \mathcal{K}\left( \frac{z-\gamma_0}{h} \right) - 1(z > \gamma_0) \right) \right|^p f(z \mid X)\,dz\,dP_X \\
&\quad = h^{1-p/2}\iint \left| X'D_0D_0'X \right|^p \left| \mathcal{K}^{(1)}(s)\left( \mathcal{K}(s) - 1(s > 0) \right) \right|^p f(hs + \gamma_0 \mid X)\,ds\,dP_X,
\end{aligned}
\]

where $P_X$ is the distribution of $X_{t-1}$ and the last equality follows by the change of variables $s = (z - \gamma_0)/h$. Note that $f$ is bounded for almost every $X$, $\mathcal{K}(s) - 1(s > 0)$ is bounded, $|\mathcal{K}^{(1)}|^p$ is integrable, and $E|X'D_0D_0'X|^p < \infty$. As $h^{1-p/2}/n^{p/2-1} \to 0$ for $p > 2$, we conclude that (31) converges to zero. Therefore,

\[
\frac{\sqrt{h}}{2\sigma_v n}\sum_t \frac{\partial e_t(\theta_0)'}{\partial\beta} e_t(\theta_0) \Rightarrow \int_0^1 B\,dW. \tag{32}
\]
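The change-of-variables step behind (31) and (32) says that the $p$-th moment grows like $h^{1-p/2}$ as $h \to 0$. The following sketch illustrates that scaling numerically. It is not the paper's setting: it assumes a scalar case with $X'D_0D_0'X$ set to 1, $\gamma_0 = 0$, a standard normal density for $z_{t-1}$, and the standard normal cdf as the kernel; the function names are hypothetical.

```python
import math

def phi(s):
    """Standard normal density; plays the role of K^(1) and of f here."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def Phi(s):
    """Standard normal cdf; plays the role of the smooth kernel K."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    width = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * width)
    return total * width / 3.0

def raw_moment(h, p):
    """h^(-p/2) * E|K1(z/h) (K(z/h) - 1{z > 0})|^p for z ~ N(0, 1)."""
    def integrand(z, indicator):
        s = z / h
        return abs(phi(s) * (Phi(s) - indicator)) ** p * phi(z)
    reach = 12.0 * h  # the integrand is negligible beyond 12 bandwidths
    left = simpson(lambda z: integrand(z, 0.0), -reach, 0.0)
    right = simpson(lambda z: integrand(z, 1.0), 0.0, reach)
    return h ** (-p / 2.0) * (left + right)

def scaled_moment(h, p):
    """Moment divided by the claimed rate h^(1 - p/2); should stabilise as h -> 0."""
    return raw_moment(h, p) / h ** (1.0 - p / 2.0)

p = 3
coarse, fine = scaled_moment(0.1, p), scaled_moment(0.01, p)
print(raw_moment(0.1, p), raw_moment(0.01, p), coarse, fine)
```

Under these assumptions the raw moment increases as $h$ shrinks, while the rescaled version is nearly constant, which is the behaviour the argument relies on when it compares $h^{1-p/2}$ with $n^{p/2-1}$.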

Finally, note that an argument similar to the one that showed the asymptotic independence in (30) yields the asymptotic independence between the scores for $\beta$ and $\lambda$. For the covariance between the scores for $\beta$ and $\gamma$, note that $v_{3t} = h\,\frac{\partial e_t(\theta_0)'}{\partial\gamma} e_t(\theta_0)$.

Convergence of $Q_n$

To begin with, we claim the following lemma.

Lemma 7 Under the conditions of this theorem, $h^{-1}(\hat\gamma - \gamma_0)$ and $h^{-1}\hat\alpha$ are $o_p(1)$.

Proof of Lemma 7 Recalling the formulas (27) and (29), we may write

\[
\frac{1}{n}\sum_t \frac{\partial e_t(\theta)'}{\partial\gamma} e_t(\theta) = T_{1n}(\theta) + \cdots + T_{10n}(\theta), \tag{33}
\]

where

\[
\begin{aligned}
T_{1n}(\theta) &= -\frac{1}{nh}\sum_t \mathcal{K}^{(1)}_{t-1}(\beta,\gamma)X_{t-1}'Du_t, \\
T_{2n}(\theta) &= \mathrm{tr}\left[ D'\frac{1}{nh}\sum_t \mathcal{K}^{(1)}_{t-1}(\beta,\gamma)X_{t-1}X_{t-1}'(A - A_0) \right], \\
T_{3n}(\theta) &= \mathrm{tr}\left[ D'\frac{1}{nh}\sum_t X_{t-1}X_{t-1}'\mathcal{K}^{(1)}_{t-1}(\beta,\gamma)\left( \mathcal{K}_{t-1}(\beta,\gamma) - d_{t-1} \right)D \right],
\end{aligned}
\]

and so on until we reach

\[
T_{10n}(\theta) = \frac{1}{nh}\sum_t \frac{\alpha'x_{2t-1}}{\sqrt{n}}\,\mathcal{K}^{(1)}_{t-1}(\beta,\gamma)\,D_z'\left( A_z + D_z\mathcal{K}_{t-1}(\beta,\gamma) \right)\frac{x_{2t-1}'\alpha}{\sqrt{n}}.
\]

As we proved the consistency of the smoothed least squares estimator of $\theta$ in Corollary 2, we can find a sequence $r_n \to 0$ such that $\Pr\{ |\hat\theta - \theta_0| > r_n \} \to 0$. Without loss of generality, we restrict the parameter space to $\Theta_{r_n} = \{ |\theta - \theta_0| \le r_n \}$. We first show that $\sup_{\Theta_{r_n}} |T_{in}(\theta)| = o_p(1)$ for all $i \ne 3$, which implies that

\[
T_{3n}(\hat\theta) = o_p(1), \tag{34}
\]

since (33) equals zero at $\theta = \hat\theta$ due to the first order condition of the SLS estimation.

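First take $T_{1n}(\theta)$. In particular, consider

\[
\left| \frac{1}{nh}\sum_t \mathcal{K}^{(1)}\left( \frac{z_{t-1} + x_{2t-1}'\alpha/\sqrt{n} - \gamma}{h} \right)u_t \right|,
\]

and the other terms in the vector can be handled similarly. As in Seo and Linton, we may assume $u_t$ is bounded by some constant $C < \infty$ and divide the parameter space for $\theta_1$ of $\Theta_{r_n}$ into $\Lambda_n$ non-overlapping pieces $\Theta_{ni}$, $i = 1, \ldots, \Lambda_n$, such that the distance between any two points in each piece is smaller than or equal to $\delta h^{2+\varepsilon}$ for some $\delta > 0$. Then, $\Lambda_n = O\left( h^{-3(p-1)(1+\varepsilon/2)} \right)$. An application of a Hoeffding type inequality for martingale difference sequences (Azuma 1967) yields that for $\theta_{1i} = (\alpha_i, \gamma_i) \in \Theta_{ni}$ and some constant $C_1$

\[
\begin{aligned}
& \Pr\left\{ \sup_{i \le \Lambda_n}\left| \frac{1}{nh}\sum_t u_t\,\mathcal{K}^{(1)}\left( \frac{z_{t-1} + x_{2t-1}'\alpha_i/\sqrt{n} - \gamma_i}{h} \right) \right| > \eta \right\} \\
&\quad \le \sum_{i \le \Lambda_n}\Pr\left\{ \left| \frac{1}{nh}\sum_t u_t\,\mathcal{K}^{(1)}\left( \frac{z_{t-1} + x_{2t-1}'\alpha_i/\sqrt{n} - \gamma_i}{h} \right) \right| > \eta \right\} \\
&\quad \le \Lambda_n\exp\left( -nh^2\eta^2/C_1 \right) \to 0,
\end{aligned}
\]

as $nh^2 \to \infty$. And for any $(\alpha_1, \gamma_1)$ and $(\alpha_2, \gamma_2)$ in $\Theta_{ni}$,

\[
\begin{aligned}
& \frac{1}{nh}\sum_t \left| \mathcal{K}^{(1)}\left( \frac{z_{t-1} + x_{2t-1}'\alpha_1/\sqrt{n} - \gamma_1}{h} \right) - \mathcal{K}^{(1)}\left( \frac{z_{t-1} + x_{2t-1}'\alpha_2/\sqrt{n} - \gamma_2}{h} \right) \right| |u_t| \\
&\quad \le \frac{1}{nh}\sum_t \sup_{\alpha,\gamma}\left| \mathcal{K}^{(2)}\left( \frac{z_{t-1} + x_{2t-1}'\alpha/\sqrt{n} - \gamma}{h} \right)u_t\frac{x_{2t-1}}{\sqrt{n}} \right| \frac{\delta h^{2+\varepsilon}}{h} \le O_p(h^{\varepsilon}),
\end{aligned}
\]

as $\sup_{1\le t\le n}\left| x_{2t-1}/\sqrt{n} \right| = O_p(1)$ and $\mathcal{K}^{(2)}$ is bounded.

Next, we study the convergence of $T_{3n}(\theta)$. The same analysis applies to $T_{in}$ for $i = 2, 4, 5, \ldots, 10$, to yield that they are all $o_p(1)$. Note for example that $T_{2n}$ is the product of $(A - A_0)$, which is bounded by $r_n$, and of a sample mean, which is $O_p(1)$ by the same method as below.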
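The union bound used for $T_{1n}$ above trades a polynomially growing number of pieces against an exponentially small tail. The sketch below evaluates the logarithm of a bound of the form (number of pieces) times $\exp(-nh^2\eta^2/C_1)$ along a bandwidth sequence $h = n^{-1/4}$, for which $nh^2 \to \infty$. The constants, the bandwidth sequence, and the function name are assumptions made for illustration, and the $O(\cdot)$ constant in the number of pieces is set to one.

```python
import math

def log_union_bound(n, h, p, eps, eta, c1):
    """log of  h^(-3(p-1)(1+eps/2)) * exp(-n h^2 eta^2 / c1)."""
    log_pieces = -3.0 * (p - 1) * (1.0 + eps / 2.0) * math.log(h)
    return log_pieces - n * h * h * eta * eta / c1

sizes = [10**4, 10**6, 10**8]
log_bounds = [
    log_union_bound(n, n ** -0.25, p=2, eps=0.1, eta=0.1, c1=1.0)
    for n in sizes
]
for n, value in zip(sizes, log_bounds):
    print(n, value)
```

With these illustrative constants the bound is uninformative (larger than one) for the smaller sample sizes and becomes negligible only for large $n$, which is consistent with the purely asymptotic nature of the argument.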

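Define

\[
T_t(x, \theta_1) = X_{t-1}X_{t-1}'\,\mathcal{K}^{(1)}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right)\left( \mathcal{K}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right) - 1\{ z_{t-1} + x'\alpha > \gamma \} \right),
\]

and

\[
\varsigma(x, \theta_1) = E\left( T_t(x, \theta_1) \right).
\]

Furthermore, letting $\dot\gamma = \gamma - x'\alpha$, define

\[
\dot T_t(\dot\gamma) = X_{t-1}X_{t-1}'\,\mathcal{K}^{(1)}\left( \frac{z_{t-1} - \dot\gamma}{h} \right)\left( \mathcal{K}\left( \frac{z_{t-1} - \dot\gamma}{h} \right) - 1\{ z_{t-1} > \dot\gamma \} \right), \qquad \dot\varsigma(\dot\gamma) = E\,\dot T_t(\dot\gamma).
\]

Assume that $x$ belongs to a compact set $\mathcal{X}$ and thus $\dot\gamma$ lies within an interval $\Gamma_n$, shrinking to zero as $|\theta_1| \le r_n \to 0$. Then, it is clear that

\[
\sup_{x \in \mathcal{X},\,\theta_1 \in \Theta_{r_n}}\left| \frac{1}{nh}\sum_t \left( T_t(x, \theta_1) - \varsigma(x, \theta_1) \right) \right| \le \sup_{\dot\gamma \in \Gamma_n}\left| \frac{1}{nh}\sum_t \left( \dot T_t(\dot\gamma) - \dot\varsigma(\dot\gamma) \right) \right| = o_p(1), \tag{35}
\]

where the last equality is due to Lemma 4 (17) of Seo and Linton (2007). Since (35) holds for any compact set $\mathcal{X}$ and $\sup_{t \le n}\left| n^{-1/2}x_{2t-1} \right| = O_p(1)$, (35) holds true when $x$ is replaced with $n^{-1/2}x_{2t-1}$. This implies that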

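\[
\sup_{\Theta_{r_n}}\left| T_{3n}(\theta) - \mathrm{tr}\left[ D'\frac{1}{nh}\sum_t \varsigma\left( \frac{x_{2t-1}}{\sqrt{n}}, \theta_1 \right)D \right] \right| = o_p(1), \tag{36}
\]

and thus

\[
\frac{1}{nh}\sum_t \varsigma\left( \frac{x_{2t-1}}{\sqrt{n}}, \hat\theta_1 \right) = o_p(1), \tag{37}
\]

due to (34). Furthermore,

\[
\frac{1}{nh}\sum_t \varsigma\left( \frac{x_{2t-1}}{\sqrt{n}}, \theta_{10} \right) = \frac{1}{nh}\sum_t \dot\varsigma(0) = o(1), \tag{38}
\]

where the last equality is due to (24) of Seo and Linton (2007). Expanding the $(1,1)$ element of (37) around $\theta_{10}$ yields that for $\tilde\theta_1$ between $\hat\theta_1$ and $\theta_{10}$

\[
\frac{1}{nh}\sum_t \varsigma_{11}\left( \frac{x_{2t-1}}{\sqrt{n}}, \hat\theta_1 \right) = \frac{1}{nh}\sum_t \varsigma_{11}\left( \frac{x_{2t-1}}{\sqrt{n}}, \theta_{10} \right) + \frac{\partial}{\partial\theta_1'}\frac{1}{nh}\sum_t \varsigma_{11}\left( \frac{x_{2t-1}}{\sqrt{n}}, \tilde\theta_1 \right)\left( \hat\theta_1 - \theta_{10} \right),
\]

where $\varsigma_{11}$ denotes the $(1,1)$ element of $\varsigma$. Given (37) and (38), this implies that $h^{-1}\left( \hat\theta_1 - \theta_{10} \right) = o_p(1)$ if

\[
\frac{\partial}{\partial\theta_1}\frac{1}{n}\sum_t \varsigma_{11}\left( \frac{x_{2t-1}}{\sqrt{n}}, \tilde\theta_1 \right) \ne o_p(1). \tag{39}
\]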

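However,

\[
\begin{aligned}
& \frac{\partial}{\partial\theta_1}E\left[ \mathcal{K}^{(1)}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right)\left( \mathcal{K}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right) - 1\{ z_{t-1} + x'\alpha > \gamma \} \right) \right] \\
&\quad = E\left[ \mathcal{K}^{(2)}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right)\left( \mathcal{K}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right) - 1\{ z_{t-1} + x'\alpha > \gamma \} \right)\frac{1}{h} \right](x', -1)' \\
&\qquad + \left[ E\,\mathcal{K}^{(1)}\left( \frac{z_{t-1} + x'\alpha - \gamma}{h} \right)^2\frac{1}{h} - \mathcal{K}^{(1)}(0)\,f(\gamma - x'\alpha) \right](x', -1)',
\end{aligned}
\]

where $f$ denotes the density of $z_t$. Using the change-of-variables formula,

\[
E\,\mathcal{K}^{(1)}\left( \frac{z_{t-1} - \dot\gamma}{h} \right)^2\frac{1}{h} = \int_{-\infty}^{\infty}\mathcal{K}^{(1)}\left( \frac{z - \dot\gamma}{h} \right)^2\frac{1}{h}f(z)\,dz = \int_{-\infty}^{\infty}\mathcal{K}^{(1)}(z)^2 f(hz + \dot\gamma)\,dz \to \int_{-\infty}^{\infty}\mathcal{K}^{(1)}(z)^2\,dz \cdot f(0),
\]

by the dominated convergence theorem as $h$ and $\dot\gamma$ go to zero. Similarly,

\[
E\left[ \mathcal{K}^{(2)}\left( \frac{z_{t-1} - \dot\gamma}{h} \right)\left( \mathcal{K}\left( \frac{z_{t-1} - \dot\gamma}{h} \right) - 1\{ z_{t-1} > \dot\gamma \} \right)\frac{1}{h} \right] \to \int_{-\infty}^{\infty}\mathcal{K}^{(2)}(z)\left( \mathcal{K}(z) - 1\{ z > 0 \} \right)dz \cdot f(0).
\]

Furthermore, it follows from integration by parts that

\[
\int_{-\infty}^{\infty}\mathcal{K}^{(1)}(s)^2\,ds - \int_{-\infty}^{\infty}\mathcal{K}^{(2)}(s)\left( 1\{ s > 0 \} - \mathcal{K}(s) \right)ds = -\int_{-\infty}^{\infty}\mathcal{K}^{(2)}(s)\,1\{ s > 0 \}\,ds = \mathcal{K}^{(1)}(0).
\]

Since $\sup_{1\le t\le n,\,\theta_1 \in \Theta_{r_n}}\left| \frac{x_{2t-1}'}{\sqrt{n}}\alpha - \gamma \right| = o_p(1)$, we obtain

\[
\frac{\partial}{\partial\theta_1}\frac{1}{n}\sum_t \varsigma_{11}\left( \frac{x_{2t-1}}{\sqrt{n}}, \tilde\theta_1 \right) = 2\int_{-\infty}^{\infty}\mathcal{K}^{(2)}(z)\left( \mathcal{K}(z) - 1\{ z > 0 \} \right)dz \cdot f(0)\,\frac{1}{n}\sum_t \left( \frac{x_{2t-1}'}{\sqrt{n}}, -1 \right)' + o_p(1).
\]

Thus, (39) follows since $\int_{-\infty}^{\infty}\mathcal{K}^{(2)}(z)\left( \mathcal{K}(z) - 1\{ z > 0 \} \right)dz > 0$ and $f(0) > 0$. ∎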
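The integration-by-parts identity used in the proof of Lemma 7 can be verified numerically for a concrete kernel. The sketch below assumes the standard normal cdf as the kernel, so that the first derivative is the normal density and the second derivative is $-s$ times the density. This choice is made for illustration only and is not claimed to be the paper's kernel; the helper names are hypothetical.

```python
import math

def phi(s):
    """Standard normal density, the first derivative K^(1) of the assumed kernel."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def Phi(s):
    """Standard normal cdf, the assumed kernel K."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def k2(s):
    """Second derivative K^(2) of the assumed kernel."""
    return -s * phi(s)

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    width = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * width)
    return total * width / 3.0

LIMIT = 10.0  # the integrands are negligible outside [-10, 10]

int_k1_squared = simpson(lambda s: phi(s) ** 2, -LIMIT, LIMIT)
# Split at zero so the indicator 1{s > 0} is constant on each piece.
int_k2_indicator_minus_k = (
    simpson(lambda s: k2(s) * (0.0 - Phi(s)), -LIMIT, 0.0)
    + simpson(lambda s: k2(s) * (1.0 - Phi(s)), 0.0, LIMIT)
)
left_side = int_k1_squared - int_k2_indicator_minus_k
middle = -simpson(k2, 0.0, LIMIT)
right_side = phi(0.0)
print(left_side, middle, right_side)
```

All three quantities should coincide up to quadrature error, since the boundary terms vanish for this kernel.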

In view of Lemma 7 and the proof, we can restrict the parameter space to

\[
\Theta_n = \left\{ \theta \in \Theta : |\lambda - \lambda_0| < r_n,\ h^{-1}|\gamma - \gamma_0| < r_n,\ h^{-1}|\alpha| < r_n \right\},
\]

for a sequence $r_n \to 0$. Let

\[
Q_n^a(\theta) = \frac{1}{n}\sum_t \frac{\partial e_t(\theta)'}{\partial\theta}\frac{\partial e_t(\theta)}{\partial\theta'}, \qquad Q_n^b(\theta) = \frac{1}{n}\sum_t \sum_{i=1}^{p}\frac{\partial^2 e_{it}(\theta)}{\partial\theta\,\partial\theta'}e_{it}(\theta),
\]

where the subscript $i$ of a matrix (or a vector) indicates the $i$th column (element) of the matrix (the vector). Then, $Q_n(\theta) = 2Q_n^a(\theta) + 2Q_n^b(\theta)$. Start with $Q_n^b(\theta)$, in particular with

\[
-\sum_t \frac{\partial^2 e_{it}(\theta)}{\partial\beta\,\partial\beta'}e_{it}(\theta) = \sum_t x_{2t-1}x_{2t-1}'\left( 2D_{zi}\frac{\mathcal{K}^{(1)}_{t-1}(\beta,\gamma)}{h} + X_{t-1}(\beta)'D_i\frac{\mathcal{K}^{(2)}_{t-1}(\beta,\gamma)}{h^2} \right)e_{it}(\theta). \tag{40}
\]

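Since $h \to 0$ and $\sup_{1\le t\le n}\left| \frac{x_{2t-1}'\alpha}{h\sqrt{n}} \right| = o_p(1)$, $X_{t-1}(\beta)'D_i = X_{t-1}'D_i + D_{zi}\frac{x_{2t-1}'\alpha}{\sqrt{n}} = X_{t-1}'D_i + o_p(1)$ and the leading term in (40) with the normalization is

\[
\frac{h}{n^2}\sum_t x_{2t-1}x_{2t-1}'X_{t-1}'D_i\frac{\mathcal{K}^{(2)}_{t-1}(\beta,\gamma)}{h^2}e_{it}(\theta) \Rightarrow \tilde\sigma_{qi}^2\int\tilde{\mathcal{K}}_2\int_0^1 BB', \tag{41}
\]

where $\tilde{\mathcal{K}}_2(s) = \mathcal{K}^{(2)}(s)\left( \mathcal{K}(s) - 1\{ s > 0 \} \right)$ and $\tilde\sigma_{qi}^2 = E\left( D_{0i}'X_{t-1}X_{t-1}'D_{0i} \mid z_{t-1} = 0 \right)f(0)$. The convergence (41) can be achieved in the same way as in Lemma 7. That is, decomposing it as in the lemma, we can easily show that the $T_{in}$ are negligible except for $i = 3$. For $T_{3n}$, the same as (35) holds with

\[
\varsigma(x, \theta) = \frac{1}{hn^2}\sum_t xx'E\left[ X_{t-1}'D_i\,\mathcal{K}^{(2)}\left( \frac{z_{t-1}}{h} + \dot\gamma \right)\left( \mathcal{K}\left( \frac{z_{t-1}}{h} + \dot\gamma \right) - 1\left\{ \frac{z_{t-1}}{h} + \dot\gamma > 0 \right\} \right) \right],
\]

where $\dot\gamma = \frac{x'\alpha - \gamma}{h}$. Since $\sup_{t \le n,\,\theta \in \Theta_n}\left| \frac{x_{2t-1}'\alpha/\sqrt{n} - \gamma}{h} \right| = o_p(1)$ and by the change-of-variables technique

\[
\frac{1}{h}E\left( X_{t-1}'D_i \right)^2\mathcal{K}^{(2)}_{t-1}\left( \mathcal{K}_{t-1} - d_{t-1} \right) \to E\left( (D_i'X_{t-1})^2 \mid z_{t-1} = 0 \right)f(0)\int\tilde{\mathcal{K}}_2,
\]

the convergence in (41) follows.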

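The remaining terms in $Q_n^b(\theta)$ are $\frac{\partial^2 e_{it}(\theta)}{\partial\lambda\,\partial\lambda'} = 0$,

\[
\frac{\partial^2 e_{it}(\theta)}{\partial\gamma\,\partial\beta'} = \left( D_{zi}\frac{\mathcal{K}^{(1)}_{t-1}(\beta,\gamma)}{h} + X_{t-1}(\beta)'D_i\frac{\mathcal{K}^{(2)}_{t-1}(\beta,\gamma)}{h^2} \right)x_{2t-1}',
\]

\[
\frac{\partial^2 e_{it}(\theta)}{\partial\gamma\,\partial\gamma'} = -X_{t-1}(\beta)'D_i\frac{\mathcal{K}^{(2)}_{t-1}(\beta,\gamma)}{h^2}, \qquad
\frac{\partial^2 e_{it}(\theta)}{\partial\lambda\,\partial\gamma'} = \begin{pmatrix} 0 \\ \frac{\mathcal{K}^{(1)}_{t-1}(\beta,\gamma)}{h}X_{t-1}(\beta) \end{pmatrix} \otimes I_i,
\]

and

\[
\frac{\partial^2 e_{it}(\theta)}{\partial\lambda\,\partial\beta'} = -\begin{pmatrix} \iota_2 \\ \mathcal{K}_{t-1}(\beta,\gamma)\,\iota_2 + \frac{\mathcal{K}^{(1)}_{t-1}(\beta,\gamma)}{h}X_{t-1}(\beta) \end{pmatrix}x_{2t-1}' \otimes I_i,
\]

where $\iota_2 = (0, 1, 0, \ldots, 0)$, whose dimension is $(pl + 2)$. As their convergence can be analyzed similarly as above, we omit the details and conclude that

\[
D_nQ_n^bD_n \Rightarrow \tilde\sigma_q^2\int\tilde{\mathcal{K}}_2
\begin{pmatrix}
-\int_0^1 BB' & \int_0^1 B & 0 \\
\int_0^1 B' & 1 & 0 \\
0 & 0 & 0
\end{pmatrix},
\]

where $\tilde\sigma_q^2 = \sum_{i=1}^{p}\tilde\sigma_{qi}^2$. Similarly,

\[
D_nQ_n^aD_n \Rightarrow
\begin{bmatrix}
\left\| \mathcal{K}^{(1)} \right\|_2^2\,\tilde\sigma_q^2
\begin{pmatrix}
\int_0^1 BB' & -\int_0^1 B \\
-\int_0^1 B' & 1
\end{pmatrix} & 0 \\
0 & E\left[ \begin{pmatrix} 1 & d_{t-1} \\ d_{t-1} & d_{t-1} \end{pmatrix} \otimes X_{t-1}X_{t-1}' \right] \otimes I_p
\end{bmatrix}.
\]

Finally, note that $\left\| \mathcal{K}^{(1)} \right\|_2^2 - \int\tilde{\mathcal{K}}_2 = \mathcal{K}^{(1)}(0)$ by an application of integration by parts, which yields the desired result.

The convergence of $\hat\lambda$ is a direct consequence of Theorem 3, which obtains the convergence rates of $\hat\beta$ and $\hat\gamma$, and Theorem 5 of Seo and Linton (2007). ∎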

Proof of Corollary 5

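Let $\beta_n = \beta_0 + O_p(n^{-1})$ and $\theta_2 = (\gamma, \lambda)$. The consistency of $\hat\theta_2$ is obvious since we established the consistency in Corollary 2 based on the $\sqrt{n}$-consistency of $\hat\beta$. For the limit distribution, let $T_{2n}(\theta_2; \beta)$ and $Q_{2n}(\theta_2; \beta)$ denote the score and Hessian of the sum of squares function with a fixed $\beta$. But we have already derived the convergence of the Hessian $Q_n(\theta)$ for $\theta = \theta_0 + o_p(1)$. Thus, we only have to examine the score $T_{2n}$. Let

\[
\bar e_t(\theta_2; \beta_n) = u_t - (A - A_0)'X_{t-1} - (D - D_0)'X_{t-1}d_{t-1} - D'X_{t-1}\left( \mathcal{K}_{t-1}(\beta_n, \gamma) - d_{t-1} \right) - \left( A_z + D_z\mathcal{K}_{t-1}(\beta_n, \gamma) \right)x_{2t-1}'(\beta_n - \beta_0),
\]

\[
\frac{\partial\bar e_t(\theta_2)'}{\partial\theta_2} =
\begin{pmatrix}
-\left( X_{t-1}'D + D_z'x_{2t-1}'(\beta_n - \beta_0) \right)\mathcal{K}^{(1)}_{t-1}(\beta_n, \gamma)/h \\
-\left( X_{t-1}'D + D_z'x_{2t-1}'(\beta_n - \beta_0) \right) \otimes I_p \\
-\mathcal{K}_{t-1}(\beta_n, \gamma)\left( X_{t-1}'D + D_z'x_{2t-1}'(\beta_n - \beta_0) \right) \otimes I_p
\end{pmatrix}.
\]

Then, we need to show that

\[
\frac{\sqrt{h}}{\sqrt{n}}\sum_{t=1}^{n}\left| \frac{\partial\bar e_t(\theta_{20}; \beta_n)'}{\partial\theta_2}\bar e_t(\theta_{20}; \beta_n) - \frac{\partial\bar e_t(\theta_{20}; \beta_0)'}{\partial\theta_2}\bar e_t(\theta_{20}; \beta_0) \right| = o_p(1).
\]

As the arguments are all similar for each term, we show that

\[
\begin{aligned}
& \frac{\sqrt{h}}{\sqrt{n}}\sum_{t=1}^{n}\left| X_{t-1}'D_0D_0'X_{t-1}\left( \mathcal{K}_{t-1}(\beta_n, \gamma_0) - \mathcal{K}_{t-1} \right) \right| \\
&\quad \le \frac{\sqrt{n}}{\sqrt{h}}\left| \beta_n - \beta_0 \right|\sup_{1\le t\le n}\left| \frac{x_{2t-1}'}{\sqrt{n}} \right|\frac{1}{n}\sum_{t=1}^{n}\left| X_{t-1}'D_0D_0'X_{t-1}\mathcal{K}^{(1)}_{t-1}(\tilde\beta, \gamma_0) \right| = o_p(1),
\end{aligned}
\]

where $\tilde\beta$ lies between $\beta_0$ and $\beta_n$, since $(\beta_n - \beta_0)\sqrt{n}/\sqrt{h} = o_p(1)$ and $\mathcal{K}^{(1)}$ is bounded. This completes the proof. ∎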

References

Anderson, H. M. (1997): "Transaction Costs and Non-linear Adjustment towards Equilibrium in the US Treasury Bill Market," Oxford Bulletin of Economics and Statistics, 59(4), 465–84.

Andrews, D. W. K. (1987): "Consistency in nonlinear econometric models: a generic uniform law of large numbers," Econometrica, 55(6), 1465–1471.

Andrews, D. W. K. (1991): "Heteroskedasticity and autocorrelation consistent covariance matrix estimation," Econometrica, 59, 817–858.

Azuma, K. (1967): "Weighted sums of certain dependent random variables," The Tohoku Mathematical Journal. Second Series, 19, 357–367.

Bai, J., and P. Perron (1998): "Estimating and testing linear models with multiple structural changes," Econometrica, 66, 47–78.

Balke, N., and T. Fomby (1997): "Threshold cointegration," International Economic Review, 38, 627–645.

Bec, F., and A. Rahbek (2004): "Vector equilibrium correction models with non-linear discontinuous adjustments," The Econometrics Journal, 7(2), 628–651.

Chan, K. S. (1993): "Consistency and Limiting Distribution of the Least Squares Estimator of a Threshold Autoregressive Model," The Annals of Statistics, 21, 520–533.

Corradi, V., N. R. Swanson, and H. White (2000): "Testing for stationarity-ergodicity and for comovements between nonlinear discrete time Markov processes," Journal of Econometrics, 96(1), 39–73.

de Jong, R. M. (2001): "Nonlinear estimation using estimated cointegrating relations," Journal of Econometrics, 101(1), 109–122.

de Jong, R. M. (2002): "Nonlinear minimization estimators in the presence of cointegrating relations," Journal of Econometrics, 110(2), 241–259.

Engle, R. F., and C. W. J. Granger (1987): "Co-integration and error correction: representation, estimation, and testing," Econometrica, 55(2), 251–276.

Escribano, A. (2004): "Nonlinear Error Correction: The Case of Money Demand in the United Kingdom (1878-2000)," Macroeconomic Dynamics, 8(1), 76–116.

Escribano, A., and S. Mira (2002): "Nonlinear error correction models," Journal of Time Series Analysis, 23(5), 509–522.

Gonzalo, J., and J.-Y. Pitarakis (2006): "Threshold Effects in Cointegrating Relationships," Oxford Bulletin of Economics and Statistics, 68(s1), 813–833.

Gonzalo, J., and M. Wolf (2005): "Subsampling Inference in Threshold Autoregressive Models," Journal of Econometrics, 127(201-224), 209–233.

Granger, C., and T. Terasvirta (1993): Modelling nonlinear economic relationships. Oxford University Press.

Granger, C. W. J. (2001): "Overview of Nonlinear Macroeconometric Empirical Models," Macroeconomic Dynamics, 5(4), 466–81.

Hansen, B. (1992): "Convergence to Stochastic Integrals for Dependent Heterogeneous Processes," Econometric Theory, 8, 489–500.

Hansen, B., and B. Seo (2002): "Testing for two-regime threshold cointegration in vector error correction models," Journal of Econometrics, 110, 293–318.

Hansen, B. E. (1999): "Threshold effects in non-dynamic panels: estimation, testing, and inference," Journal of Econometrics, 93(2), 345–368.

Hansen, B. E. (2000): "Sample splitting and threshold estimation," Econometrica, 68, 575–603.

Horowitz, J. L. (1992): "A smoothed maximum score estimator for the binary response model," Econometrica, 60(3), 505–531.

Kapetanios, G., Y. Shin, and A. Snell (2006): "Testing for Cointegration in Nonlinear Smooth Transition Error Correction Models," Econometric Theory, 22.

Kristensen, D., and A. Rahbek (2008): "Likelihood-Based Inference in Nonlinear Error-Correction Models," Journal of Econometrics, forthcoming.

Kurtz, T., and P. Protter (1991): "Weak Limit Theorems for Stochastic Integrals and Stochastic Differential Equations," Annals of Probability, 19, 1035–1070.

Lo, M., and E. Zivot (2001): "Threshold Cointegration and Nonlinear Adjustment to the Law of One Price," Macroeconomic Dynamics, 5(4), 533–576.

Michael, P., A. R. Nobay, and D. A. Peel (1997): "Transactions Costs and Nonlinear Adjustment in Real Exchange Rates: An Empirical Investigation," Journal of Political Economy, 105(4), 862–79.

Newey, W. K., and D. McFadden (1994): "Large sample estimation and hypothesis testing," in Handbook of econometrics, Vol. IV, vol. 2, pp. 2111–2245. North-Holland, Amsterdam.

Park, J., and P. Phillips (2001): "Nonlinear Regressions with Integrated Time Series," Econometrica, 69, 117–161.

Pötscher, B. M., and I. R. Prucha (1991): "Basic structure of the asymptotic theory in dynamic nonlinear econometric models. I. Consistency and approximation concepts," Econometric Reviews, 10(2), 125–216.

Psaradakis, Z., M. Sola, and F. Spagnolo (2004): "On Markov error-correction models, with an application to stock prices and dividends," Journal of Applied Econometrics, 19(1), 69–88.

Saikkonen, P. (1995): "Problems with the asymptotic theory of maximum likelihood estimation in integrated and cointegrated systems," Econometric Theory, 11(5), 888–911.

Saikkonen, P. (2005): "Stability results for nonlinear error correction models," Journal of Econometrics, 127, 69–81.

Saikkonen, P. (2007): "Stability of Regime Switching Error Correction Models under Linear Cointegration," Econometric Theory, 24, 294–318.

Seo, M. (2006): "Bootstrap Testing for the Null of No Cointegration in a Threshold Vector Error Correction Model," Journal of Econometrics, 134(1), 129–150.

Seo, M., and O. Linton (2007): "A Smoothed Least Squares Estimator For The Threshold Regression," Journal of Econometrics, 141, 704.

Sephton, P. S. (2003): "Spatial market arbitrage and threshold cointegration," American Journal of Agricultural Economics, 85, 1041.

Taylor, A. M. (2001): "Potential Pitfalls for the Purchasing Power Parity Puzzle? Sampling and Specification Biases in Mean-Reversion Tests of the Law of One Price," Econometrica, 69(2), 473–498.

Wooldridge, J. M., and H. White (1988): "Some invariance principles and central limit theorems for dependent heterogeneous processes," Econometric Theory, 4(2), 210–230.

Wu, C.-F. (1981): "Asymptotic theory of nonlinear least squares estimation," The Annals of Statistics, 9(3), 501–513.