Efficient tests for unit roots with prediction errors



Journal of Statistical Planning and Inference 113 (2003) 341–370

www.elsevier.com/locate/jspi

Efficient tests for unit roots with prediction errors

Ismael Sánchez

Departamento de Estadística y Econometría, Universidad Carlos III de Madrid, Avda. de la Universidad 30, 28911 Leganés, Madrid, Spain

Received 2 December 2000; received in revised form 1 October 2001; accepted 24 October 2001

Abstract

It is well-known that the main difference between a stationary (or trend-stationary) process and a process with a unit root is to be observed in their long-term behaviour. This paper exploits this idea and shows that nearly optimal unit-root tests can admit an interpretation based on prediction performance. This result is not only useful in understanding how efficient tests use the information, but it can also be used to construct new unit-root tests based on prediction errors. A Monte Carlo experiment for the autoregressive moving-average of order (1, 1) indicates that the proposed tests have desirable size and power properties. © 2002 Elsevier Science B.V. All rights reserved.

MSC: 62M10; 62E20; 62F03; 62F04

Keywords: Optimal tests; Predictive mean squared error; Unit roots

1. Introduction

1.1. General considerations

This article analyses the relationship between the prediction errors of a predictor that assumes the presence of a unit root and the efficient detection of such a root. The motivation for this analysis is the intuitive concept that the main difference between a stationary (or trend-stationary) process and a process with a unit root is to be observed in their long-term behaviour (see, e.g., Hamilton, 1994, Chapter 15). In spite of this well-documented result, there is not yet a theory that relates the optimal detection of a unit root with the long-term behaviour of a process. The main contribution of this article is to fill this gap, proving that nearly optimal unit-root tests can be built using

E-mail address: [email protected] (I. Sánchez).

0378-3758/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-3758(01)00313-5


the information of prediction errors. Therefore, prediction errors do contain valuable information for the efficient detection of a unit root.

Since the work of Dickey and Fuller (1979), there has been an abundance of literature devoted to the detection of autoregressive unit roots. A possible criterion for classifying the existing literature is the approach to the detection of the unit root. The original and most commonly employed approach is that of Dickey and Fuller (1979), which is based on the asymptotic properties of the ordinary least squares (OLS) estimator. Important variations of Dickey–Fuller tests (DF tests) are their extensions to other estimation methods, such as maximum likelihood (e.g., Pantula et al., 1994; Yap and Reinsel, 1995; Shin and Fuller, 1998; Shin and Lee, 2000), the weighted symmetric estimator (Park and Fuller, 1995; Fuller, 1996, p. 568), and generalized least-squares (GLS) detrending under a fixed local alternative, developed by Elliott et al. (1996) (see also Hwang and Schmidt, 1996; Xiao and Phillips, 1998, among others).

A very important approach is the construction of tests observing certain optimality criteria (under normality). Works following this so-called optimal approach are Sargan and Bhargava (1983) and Bhargava (1986). These authors extend the basis of optimal serial correlation tests to construct approximate uniformly most powerful invariant (UMPI) tests and approximate locally best invariant (LBI) tests. Important extensions of the Sargan and Bhargava tests, and also closely related to DF tests, are the M tests of Stock (1990) (see also Perron and Ng, 1996). Recently, Ng and Perron (2001) have improved the performance of these M tests using GLS detrending under a fixed local alternative (hereafter MGLS tests). Important contributions to this optimal approach are the (infeasible) point optimal invariant (POI) tests and LBI tests developed by Dufour and King (1991). A sequence of these infeasible POI tests (that use the true value of the root as the alternative) provides the upper attainable power (Gaussian power envelope) at each alternative. This power envelope can be used as a benchmark to evaluate the performance of any feasible test. Dufour and King (1991) found that POI tests perform well in practice if a fixed alternative is properly chosen. Following this idea, Elliott et al. (1996) and Elliott (1999) proposed a feasible POI test (PT test) using a fixed local-to-unity alternative.

This article introduces a new approach for unit-root detection based on prediction errors. In this approach, the information that the process contains a unit root is obtained from the behaviour of the prediction errors. The usefulness of this new approach is twofold. First, it proves very useful in giving an intuitive interpretation of how efficient tests use the information. Second, it allows the construction of new unit-root tests based on prediction errors. The new tests are shown to be asymptotically nearly optimal in the vicinity of one. Therefore, this approach is also related to the aforementioned optimal approach. Hence, the intuitive notion that nonstationarity is related to the long-term behaviour of the process has optimal properties for unit-root detection. The new tests also have good finite-sample properties. In the AR(1) case, the empirical power is similar to that of the POI test, and also similar to other efficient tests based on the optimal approach, such as the PT test. The proposed tests can be extended to the general ARMA case by fitting an ARMA model. Therefore, they need not rely on autoregressive approximations. A Monte Carlo experiment for an ARMA(1, 1) model shows that the


proposed tests still have high power with very small size distortion. The performance of the proposed tests in this more general ARMA case is also similar to that of the efficient tests based on the optimal approach, such as the PT test.

The paper proceeds as follows. After setting up the notation and the model in the next subsection, Section 2 introduces test statistics based on the mean squared prediction error (MSPE) of a random-walk predictor. These tests are the basis for the remaining tests of the paper. Section 3 derives tests that are nearly optimal in the vicinity of unity and shows that they have a prediction-error interpretation. The resulting near-optimal tests happen to be functions of the statistics proposed in Section 2. Section 4 uses the prediction-error interpretation to modify the near-optimal tests in order to achieve high power in regions far from the unit circle. Section 5 compares the proposed tests with others existing in the literature through a Monte Carlo experiment in the AR(1) case. Section 6 extends the tests to the ARMA case and compares them with other tests in a simulation exercise with an ARMA(1, 1) model. Section 7 concludes. Mathematical proofs are given in Appendix A.

1.2. Notation and the model

Let {y_t} be a discrete stochastic process. We assume that this process contains a deterministic component d_t and a pure stochastic component x_t; namely, y_t = d_t + x_t. The deterministic component can be a mean, d_t = μ, or a deterministic trend, d_t = μ + βt. The pure stochastic part has the structure x_t = ρx_{t-1} + u_t, satisfying the following conditions:

Assumption A. x_t is initialized at t = 0 by x_0, a random variable with finite variance.

Assumption B. u_t is a stationary and invertible ARMA(p, q) process satisfying φ(B)u_t = θ(B)a_t, where a_t is a sequence of iid random variables with E(a_t) = 0 and E(a_t²) = σ²; φ(B) = 1 − φ_1B − ··· − φ_pB^p and θ(B) = 1 − θ_1B − ··· − θ_qB^q are polynomials in the backward-shift operator B, with no common factors and with either φ_p ≠ 0 or θ_q ≠ 0.

Assumption C. T^{-1/2} Σ_{t=1}^{[Tr]} u_t →_d ωW(r) and T^{-1/2} Σ_{t=1}^{[Tr]} a_t →_d σW(r), where ω² = E(u_t²) + 2Σ_{k=2}^∞ E(u_1u_k) = σ²{θ(1)/φ(1)}² is the long-run variance of u_t, and W(r) is a standard Brownian motion defined on C[0, 1]. The symbol →_d denotes weak convergence in distribution. For 0 ≤ r ≤ 1, [Tr] denotes the greatest integer less than or equal to Tr.

Notice that Assumption A includes the case x_0 = constant, with probability one, as a special case. Notice also that the limits of the partial sums of u_t and a_t depend on the same standard Brownian motion W(r). The data-generating process is, therefore,

y_t = μ + βt + x_t,   x_t = ρx_{t-1} + u_t,   u_t = ψ(B)a_t,   (1)

where ψ(B) = φ(B)^{-1}θ(B). This process can also be expressed as y_t = (1 − ρ)μ + ρβ + (1 − ρ)βt + ρy_{t-1} + ψ(B)a_t.
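To make the setup concrete, the data-generating process (1) can be simulated directly. The sketch below is ours, not the paper's; the function name and parameter values are illustrative, with an ARMA(1, 1) disturbance following the sign conventions of Assumption B:

```python
import numpy as np

def simulate_dgp(T, rho, mu=0.0, beta=0.0, phi=0.0, theta=0.0,
                 sigma=1.0, x0=0.0, seed=None):
    """Simulate y_t = mu + beta*t + x_t with x_t = rho*x_{t-1} + u_t and
    ARMA(1,1) errors phi(B)u_t = theta(B)a_t, i.e.
    u_t = phi*u_{t-1} + a_t - theta*a_{t-1} (note the minus sign in theta(B))."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, T + 1)
    u = np.zeros(T + 1)
    for t in range(1, T + 1):
        u[t] = phi * u[t - 1] + a[t] - theta * a[t - 1]
    x = np.zeros(T + 1)
    x[0] = x0                      # Assumption A: x_0 with finite variance
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + u[t]
    return mu + beta * np.arange(1, T + 1) + x[1:]
```

Setting rho = 1 produces the unit-root null, while |rho| < 1 with x0 = 0 corresponds to the conditional treatment of the initial value discussed in Section 3.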


2. Tests based on the ratio of observed and expected MSPE

This section presents the basic test statistics of the prediction-error approach. They arise by comparing the empirical MSPE of a predictor with its expected value under the null hypothesis of a unit root. The proposed tests are based on the following property: if the process is stationary, the empirical MSPE remains bounded as the horizon increases, even though the predictor assumes there is a unit root. On the other hand, if there is a unit root, both the empirical and the expected MSPEs are unbounded. We define e_{j|i} as the empirical prediction error of forecasting y_j from t = i (j > i) under the null hypothesis of a random walk (u_t = a_t). Therefore, e_{j|i} = y_j − y_i. We first base the test statistics on the cumulative sum of squared prediction errors from the origin of the series. Then

Σ_{t=2}^T e_{t|1}² = Σ_{t=2}^T (y_t − y_1)².   (2)

The expected value of this statistic, under the null hypothesis of a unit root, is

E(Σ_{t=2}^T e_{t|1}²) = σ_u²{1 + 2 + ··· + (T − 1)} = (1/2)σ_u²T(T − 1).   (3)

Since σ_u² is not known, the consistent (under the null) estimator σ̂_u² = T^{-1} Σ_{t=2}^T (y_t − y_{t-1})² can be used. Finally, the test statistic is obtained by dividing the cumulative sum (2) by its estimated expected value under the null (3). The constant can be dropped since it does not affect the test. Also, for the sake of simplicity, T(T − 1) is approximated by T². The proposed statistic of this prediction-error approach is as follows:

C_1 = Σ_{t=2}^T (y_t − y_1)² / {T Σ_{t=2}^T (y_t − y_{t-1})²},   (4)

where C stands for cumulative and the subscript shows the origin of the predictions. This test statistic is invariant under the group of transformations y_t → ay_t + b, with a, b constants, and therefore it is not affected by the mean value of the series; hence it is also applicable to the case of a non-zero mean. It can be verified that, under the null hypothesis of a unit root, C_1 = O_p(1). Under the alternative, the numerator is of lower order of magnitude than in the previous case, since T^{-2} Σ_{t=2}^T y_t² = O_p(T^{-1}). Therefore, under the stationary alternative, C_1 →_p 0, and a consistent unit-root test against a stationary alternative has the rejection region C_1 < c_1.

We now extend this test to the case of a null hypothesis of a random walk with a drift, y_t = β + y_{t-1} + u_t, where u_t = a_t. The prediction errors are e_{t|1} = y_t − y_1 − β̂(t − 1). An efficient estimator of β, under the null, can be obtained from the regression Δy_t = β + u_t; therefore, β̂ = (T − 1)^{-1}(y_T − y_1). A consistent estimator of σ_u², under the null, is σ̂_u² = T^{-1} Σ_{t=2}^T (y_t − y_{t-1} − β̂)². This leads to the following test statistic:

C_1^τ = Σ_{t=2}^T {y_t − y_1 − β̂(t − 1)}² / {T Σ_{t=2}^T (y_t − y_{t-1} − β̂)²} = Σ_{t=2}^T (ỹ_t − ỹ_1)² / {T Σ_{t=2}^T (ỹ_t − ỹ_{t-1})²},   (5)


where ỹ_t is the estimated detrended series. Similarly to C_1, the rejection region is C_1^τ < c_1^τ. The statistics C_1 and C_1^τ already appear in the literature with a different justification. They correspond to (TN_1)^{-1} and (TN_2)^{-1}, respectively, where N_1 and N_2 are given in Bhargava (1986). Bhargava (1986) shows that, if the Anderson approximation is used for the inverse of the covariance of the process, and the first observation is extracted from its conditional distribution, these statistics lead to approximate LBI tests, under normality. These statistics are also derived in Tanaka (1996), with the same initial conditions, by taking the second derivative of the likelihood function, under normality (extended score tests).
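In code, the statistics (4) and (5) are short ratios; the sketch below (the function names are ours) follows the definitions directly, with β̂ = (y_T − y_1)/(T − 1):

```python
import numpy as np

def C1(y):
    """C_1 of (4): cumulative squared prediction errors of a random-walk
    predictor from the origin, over T times the estimated expected value
    under the unit-root null."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    num = np.sum((y[1:] - y[0]) ** 2)
    den = T * np.sum(np.diff(y) ** 2)
    return num / den

def C1_tau(y):
    """C_1^tau of (5): the same ratio after removing the drift estimate
    beta_hat = (y_T - y_1)/(T - 1) from the prediction errors."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    beta_hat = (y[-1] - y[0]) / (T - 1)
    e = y[1:] - y[0] - beta_hat * np.arange(1, T)
    num = np.sum(e ** 2)
    den = T * np.sum((np.diff(y) - beta_hat) ** 2)
    return num / den
```

Both tests reject the unit-root null for small values of the statistic.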

Another way of evaluating the prediction errors is to consider the series in reverse order (backward process) as well. The use of the reversed time series can be justified by the time-reversibility of Gaussian stationary processes (see, e.g., Weiss, 1975; Box and Jenkins, 1976, p. 197). This property states that the processes in direct and reverse order have the same covariance structure. Therefore, under stationarity, better use of the information could be obtained if both processes are analysed. The statistic that averages the prediction errors from both extremes of the series and at each horizon is

C_{1T} = {Σ_{t=2}^T (y_t − y_1)² + Σ_{t=1}^{T-1} (y_t − y_T)²} / {2T Σ_{t=2}^T (y_t − y_{t-1})²},   (6)

in which the subscript 1T denotes the origins of the predictions. Several authors have also used the property of time-reversibility to improve the performance of DF tests (Sen and Dickey, 1987; Pantula et al., 1994; Leybourne, 1995; Park and Fuller, 1995; among others). However, a unit-root test based on (6) had not been proposed in the literature. We will see in the next section that both the C_1 and C_{1T} tests have, apart from an intuitive interpretation, optimal properties. For the case of a null hypothesis of a random walk with a drift, it can be shown that

C_{1T}^τ = C_1^τ.

The following theorem shows the limiting distributions of the statistics C_1 and C_{1T} when the process is y_t = y_{t-1} + u_t, u_t = ψ(B)a_t, and that of C_1^τ when the process is y_t = β + y_{t-1} + u_t, u_t = ψ(B)a_t. For convenience, the following processes are defined: W_B(r) ≡ {W(1 − r) − W(1)} and W_τ(r) ≡ W(r) − rW(1).

Theorem 1. Under Assumptions A, B, and C, when ρ = 1,

(i) C_1 →_d λ^{-2} ∫_0^1 {W(r)}² dr;

(ii) C_1^τ (= C_{1T}^τ) →_d λ^{-2} ∫_0^1 {W_τ(r)}² dr;

(iii) C_{1T} →_d λ^{-2} (1/2) [∫_0^1 {W(r)}² dr + ∫_0^1 {W_B(r)}² dr];

where λ² = ω^{-2}σ_u², σ_u² = E(u_t²), and ω² = σ²ψ(1)².


Remark 1. The consistency of the tests follows from noting that the limiting distributions are O_p(1) and are positive with probability one, so that the critical values of the tests are positive. Since under the alternative C_1 →_d 0, C_1^τ →_d 0, and C_{1T} →_d 0 as T → ∞, the consistency holds.
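The behaviour behind Remark 1 is easy to reproduce: under the random-walk null C_1 remains O_p(1), while under a stationary AR(1) alternative it collapses toward zero. The Monte Carlo sketch below is ours; the sample size, the number of replications, and the AR coefficient 0.5 are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
T, reps = 500, 200

def C1(y):
    # statistic (4)
    return np.sum((y[1:] - y[0]) ** 2) / (len(y) * np.sum(np.diff(y) ** 2))

def ar1(rho, T):
    # stationary AR(1) started at zero
    x = np.zeros(T)
    e = rng.normal(size=T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + e[t]
    return x

c1_null = np.median([C1(np.cumsum(rng.normal(size=T))) for _ in range(reps)])
c1_alt = np.median([C1(ar1(0.5, T)) for _ in range(reps)])
# c1_alt falls far below c1_null, matching the rejection region C_1 < c_1
```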

3. Nearly optimal tests and prediction errors

3.1. Asymptotic results

This section proposes asymptotic tests that are nearly optimal under normality of the disturbances. The resulting statistics turn out to be the same as the statistics proposed in the previous section (C_{1T} and C_1) or functions of them. In all of these cases, the resulting nearly optimal tests also allow an interpretation based on prediction errors and, therefore, are useful to explain how efficient unit-root tests use the information. Two important conclusions can be obtained from this section. First, it confirms the intuition that the behaviour of a process in the long term provides important information for the efficient detection of the unit root. Second, it is shown here that efficient tests exploit the time-reversibility property of Gaussian stationary processes to improve their asymptotic performance. The interest here is in asymptotic and invariant tests (invariant to the values of the parameters describing the deterministic terms) with high power in the vicinity of ρ = 1. To obtain these properties, a second-order Taylor expansion of the asymptotic POI test statistics, around the null hypothesis, is performed. Hence, the proposed tests are, asymptotically, nearly optimal invariant close to the null hypothesis. This section develops tests for the AR(1) case. Section 6 extends the proposed tests to the ARMA case.

We assume, first, model (1) with β = 0 and ψ(B) = 1 (so that u_t = a_t). We also assume that u_t is normally distributed. In this paper, two different assumptions are made about x_0:

Assumption A1 (Nonstationary case). We assume that x_0 = 0, so x_1 = a_1. Therefore, y_1 is extracted from its conditional distribution.

Assumption A2 (Stationary case). We assume that x_0 = 0 when ρ = 1, and that x_0 is random with zero mean and variance σ²/(1 − ρ²) when |ρ| < 1. Therefore, y_1 is extracted from its unconditional distribution.

Under Assumption A1, the process x_t is still nonstationary under the alternative. This assumption is denoted as the nonstationary case, although some authors also refer to it as the conditional case. The covariance matrix of the T × 1 vector x = (x_1, ..., x_T)′ is σ²Ω_N(ρ), where

Ω_N^{-1}(ρ) = I_T − ρ(L_T + L_T′) + ρ²L_T′L_T,   (7)

with I_T a T × T identity matrix and L_T a T × T matrix with ones on the diagonal immediately below the main diagonal and zeros elsewhere. This nonstationary model is


the same as that of Dufour and King (1991), with d_1 = 1. According to these authors, the POI test of ρ = 1 against ρ = ρ_0 rejects the null hypothesis for small values of

S_N(ρ_0) = x̂_N′Ω_N^{-1}(ρ_0)x̂_N / x̂_1′Ω_N^{-1}(1)x̂_1,   (8)

where x̂_N and x̂_1 are GLS residual vectors using Ω_N^{-1}(ρ_0) and Ω_N^{-1}(1), respectively. Under Assumption A2, x_t is covariance stationary under the alternative. This assumption is denoted as the stationary case (also known in the unit-root literature as the unconditional case). The covariance matrix of the vector x is σ²Ω_N(1) under the null and σ²Ω_S(ρ) under the alternative, where

Ω_S^{-1}(ρ) = Ω_N^{-1}(ρ) − ρ²ee′,   (9)

with e = (1, 0, ..., 0)′. From the Neyman–Pearson lemma, an asymptotic representation of the POI test against some alternative, |ρ_0| < 1, rejects the null for small values of

S_S(ρ_0) = x̂_S′Ω_S^{-1}(ρ_0)x̂_S / x̂_1′Ω_N^{-1}(1)x̂_1,   (10)

where x̂_S is the GLS residual vector using Ω_S^{-1}(ρ_0) (see, e.g., Kadiyala, 1970 or Elliott, 1999). The nonstationary case is a reasonable assumption when the time series has its origin at x_0. Potential applications of this assumption can be found, for instance, in engineering, where the beginning of processes and experiments is known and controlled. On the other hand, the stationary case reflects the situation of a time series with its origin in the very far past. This assumption can be of interest, for instance, with some macroeconomic data. The following theorems show the Taylor-series expansions of (8) and (10) around ρ = 1.
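A direct implementation of (7) and (8) for the demeaned (μ unknown) case is sketched below. It is our own construction, computing the residual vectors x̂_N and x̂_1 by GLS regression on a constant under the alternative and under the null; by construction the ratio equals one at ρ_0 = 1:

```python
import numpy as np

def omega_N_inv(rho, T):
    """Inverse covariance (7): I - rho*(L + L') + rho^2 * L'L."""
    L = np.eye(T, k=-1)  # ones on the first subdiagonal
    return np.eye(T) - rho * (L + L.T) + rho ** 2 * (L.T @ L)

def poi_stat_N(y, rho0):
    """Nonstationary-case POI statistic (8); reject for small values."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    iota = np.ones(T)
    def gls_resid(Oinv):
        # GLS residuals from an intercept-only regression
        mu_hat = (iota @ Oinv @ y) / (iota @ Oinv @ iota)
        return y - mu_hat * iota
    O0, O1 = omega_N_inv(rho0, T), omega_N_inv(1.0, T)
    x_alt, x_null = gls_resid(O0), gls_resid(O1)
    return (x_alt @ O0 @ x_alt) / (x_null @ O1 @ x_null)
```

The feasible versions discussed in the text fix the alternative rho0 in advance, e.g. through a local-to-unity parameterization.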

Theorem 2. Let y_t = μ + x_t, where x_t = ρx_{t-1} + a_t, with x_0 following Assumption A1. Then,

S_N(ρ) = x̂_N′Ω_N^{-1}(ρ)x̂_N / x̂_1′Ω_N^{-1}(1)x̂_1 = 1 + (ρ − 1)S_N′(1) + {(ρ − 1)²/2}S_N″(1) + O{(ρ − 1)³},

where

S_N′(1) = ∂S_N(ρ)/∂ρ |_{ρ=1} = 1 − (y_T − y_1)² / Σ_{t=2}^T (y_t − y_{t-1})²,

S_N″(1) = ∂²S_N(ρ)/∂ρ² |_{ρ=1} = 2 Σ_{t=2}^T (y_t − y_1)² / Σ_{t=2}^T (y_t − y_{t-1})² − 4(y_T − y_1)² / Σ_{t=2}^T (y_t − y_{t-1})².   (11)

Theorem 3. Let y_t = μ + x_t, where x_t = ρx_{t-1} + a_t, with x_0 following Assumption A2. Then,

S_S(ρ) = x̂_S′Ω_S^{-1}(ρ)x̂_S / x̂_1′Ω_N^{-1}(1)x̂_1 = 1 + (ρ − 1)S_S′(1) + {(ρ − 1)²/2}S_S″(1) + O{(ρ − 1)³},


where

S_S′(1) = ∂S_S(ρ)/∂ρ |_{ρ=1} = 1 − (1/2)(y_T − y_1)² / Σ_{t=2}^T (y_t − y_{t-1})²,

S_S″(1) = ∂²S_S(ρ)/∂ρ² |_{ρ=1} = {Σ_{t=2}^T (y_t − y_1)² + Σ_{t=1}^{T-1} (y_t − y_T)²} / {2 Σ_{t=2}^T (y_t − y_{t-1})²} − {(y_T − y_1)² + Σ_{t=2}^{T-1} (y_t − y_1)(y_T − y_t)} / Σ_{t=2}^T (y_t − y_{t-1})².   (12)

Our interest is in building unit-root tests based on this Taylor expansion up to the second derivative. These tests will be denoted as N tests. For the sake of exposition, we first analyse the asymptotic tests obtained from the use of only the first or only the second derivatives, together with their interpretation and relation to the existing literature. These asymptotic tests resulting from the first and second derivatives will then be used to build the N tests.

As a first approximation, test statistics based on the Taylor expansion can be constructed by using only the first derivatives, S_N′(1) and S_S′(1), as expressed in (11) and (12). These derivatives are, apart from a constant, identical for the stationary and nonstationary cases. By the generalized Neyman–Pearson lemma (see, e.g., Ferguson, 1967), these first derivatives lead to asymptotically LBI tests. Therefore, these LBI tests reject the null for small values of

E_N = E_S = (y_T − y_1)² / Σ_{t=2}^T (y_t − y_{t-1})²,   (13)

where E stands for extreme points of the series and the subscript N or S denotes the nonstationary case or the stationary case, respectively. This statistic was previously obtained by Nabeya and Tanaka (1990) and Tanaka (1996), but only under Assumption A1 (corresponding to E_N). Therefore, it is shown here that the same statistic holds for the LBI test in the stationary case (E_S). Statistic (13) also has an interpretation in terms of prediction errors: it is the ratio of the empirical squared prediction error of the last observation, calculated from the origin of the series assuming a unit root, to the consistent estimator of its expectation. When μ = 0 (or μ is known and the series is previously demeaned with that value), following the same arguments as in the proofs of Theorems 2 and 3, the corresponding first derivatives lead to the test statistics

E_S^0 = (y_T² + y_1²) / Σ_{t=2}^T (y_t − y_{t-1})²,   (14)

E_N^0 = y_T² / Σ_{t=2}^T (y_t − y_{t-1})²,   (15)

for the stationary and nonstationary cases, respectively. These asymptotic LBI tests are only invariant under transformations of the type y_t → ay_t. This last statistic, E_N^0, was also previously obtained by Nabeya and Tanaka (1990), Stock (1994), and Tanaka


(1996). The statistic E_S^0 for the LBI test in the stationary case is first derived here. Tests based on these E statistics have optimal power only when the alternative is very close to the null (say, ρ = 0.999...). Their power is, however, very low for alternatives of practical interest.

Given the limited usefulness of the first derivatives for building unit-root tests, one could discard this part of the Taylor expansion and consider test statistics based only on the second derivatives. To obtain random variables with nondegenerate limiting distributions, it is necessary to divide the second derivatives by T. This correction makes the second term in both S_N″(1) and S_S″(1) converge in probability to zero under the null hypothesis of a unit root. In the nonstationary case, it is verified that

(y_T − y_1)² / {T Σ_{t=2}^T (y_t − y_{t-1})²} = O_p(T^{-1}).

In the stationary case, it holds that

Σ_{t=2}^{T-1} (y_t − y_1)(y_T − y_t) / {T Σ_{t=2}^T (y_t − y_{t-1})²} = Σ_{t=2}^T tγ̂_t / {(T − 1)σ̂_u²} = O_p(T^{-1}),   (16)

where γ̂_t = Σ_{j=1}^{T-t} u_ju_{j+t}/T. Note that (16) still holds if u_t follows a general stationary ARMA model. With these results, the second derivatives satisfy

(1/2T) S_N″(1) = C_1 + O_p(T^{-1}),   (17)

(1/T) S_S″(1) = C_{1T} + O_p(T^{-1}),   (18)

where C_1 and C_{1T} are derived in the previous section using the prediction-error interpretation. Therefore, the second derivatives lead to the asymptotic tests C_1 and C_{1T}, respectively.

An important conclusion can be extracted from these tests based on the first or second derivatives: in the stationary case, unit-root tests seem to exploit the time-reversibility property of Gaussian stationary processes to make more efficient use of the information. This can be seen, e.g., in the asymptotic equivalence between a test based on S_S″(1) and one based on C_{1T}, where prediction errors are evaluated from both ends of the series. In the nonstationary case, however, a test based on S_N″(1) is asymptotically equivalent to C_1, where only one extreme of the series is considered. Statistics (13) and (14) also admit this time-reversibility interpretation: when μ = 0, expression (15) is a measure of how far the last observation has deviated from the mark, using zero as a reference. Under Assumption A2, however, if the time-reversed series is also considered, the corresponding asymptotically LBI test should be based on both extremes, as is (14), since the first observation will be the last one for the reversed process. Similarly, when μ is unknown, the time-reversibility interpretation can also explain the fact that the corresponding asymptotically LBI tests use the same test statistic (13) in both the stationary and nonstationary cases. This optimal property of time-reversibility for unit-root testing is also important to understand why modifications of DF tests based on the use of the series in reverse order improve their performance (e.g., Sen and Dickey, 1987; Leybourne, 1995; Park and Fuller, 1995; Shin and So, 1997).


If μ = 0, and following the same arguments as in the proofs of Theorems 2 and 3, the asymptotic tests based on the second derivatives use the statistic

C_1^0 = C_{1T}^0 = Σ_{t=2}^{T-1} y_t² / {T Σ_{t=2}^T (y_t − y_{t-1})²}

for the nonstationary and stationary cases, respectively.

Finally, our main asymptotic tests are based on both the first and second derivatives of the Taylor expansion. Hence, they also have a prediction-error interpretation. These tests are asymptotically nearly optimal invariant in the vicinity of one, and are denoted as N tests. For convenience, the value of ρ that determines the neighbourhood around unity is parameterized as ρ_c = 1 − c/T, with c > 0. This formulation is more appropriate than the use of a fixed value of ρ, since the power for a given alternative depends on T. In the nonstationary case, the second-order Taylor expansion leads to S_N(ρ_c) ≈ 1 − c/T + (c/T)(E_N + cC_1). Therefore, the corresponding N test rejects the null hypothesis for small values of

N_N = E_N + cC_1.   (19)

Similarly, for the stationary case, S_S(ρ_c) ≈ 1 − c/T + {c/(2T)}(E_S + cC_{1T}), and the corresponding N test will be

N_S = E_S + cC_{1T}.   (20)

The extension to the μ = 0 case leads to

N_N^0 = E_N^0 + cC_1^0,   (21)

N_S^0 = E_S^0 + cC_{1T}^0.   (22)
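Pulling together (13), (4), and (6), the statistics (19) and (20) take only a few lines of code. The sketch below is ours, and the value c = 10 is purely illustrative, not a recommendation of the paper:

```python
import numpy as np

def N_stats(y, c=10.0):
    """Near-optimal statistics (19)-(20): N_N = E_N + c*C_1 and
    N_S = E_S + c*C_1T, both rejecting the unit-root null for small
    values. The default c = 10 is an illustrative choice."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    ssd = np.sum(np.diff(y) ** 2)
    E = (y[-1] - y[0]) ** 2 / ssd                      # (13), E_N = E_S
    C1 = np.sum((y[1:] - y[0]) ** 2) / (T * ssd)       # (4)
    C1T = (np.sum((y[1:] - y[0]) ** 2)
           + np.sum((y[:-1] - y[-1]) ** 2)) / (2 * T * ssd)  # (6)
    return E + c * C1, E + c * C1T
```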

The following theorem extends the previous results to the case with a deterministic linear trend. Since the theorem holds for both the stationary and nonstationary cases, differences in notation are omitted. This theorem extends the results of Nabeya and Tanaka (1990) and Tanaka (1996) to the stationary case.

Theorem 4. Let y_t = μ + βt + x_t, where x_t = ρx_{t-1} + a_t, with x_0 either fixed or a random variable with zero mean and variance σ²/(1 − ρ²). Then,

S(ρ) = x̂′Ω^{-1}(ρ)x̂ / x̂_1′Ω_N^{-1}(1)x̂_1 = 1 + (ρ − 1)S′(1) + {(ρ − 1)²/2}S″(1) + O{(ρ − 1)³},

where

S′(1) = ∂S(ρ)/∂ρ |_{ρ=1} = 1,

S″(1) = ∂²S(ρ)/∂ρ² |_{ρ=1} = 2 Σ_{t=2}^T {y_t − y_1 − β̂(t − 1)}² / Σ_{t=2}^T (y_t − y_{t-1} − β̂)² = 2 Σ_{t=2}^T (ỹ_t − ỹ_1)² / Σ_{t=2}^T (ỹ_t − ỹ_{t-1})².


Since the first derivative is a constant, an LBI test would be of no practical interest. By the generalized Neyman–Pearson lemma, the second derivative can be used to construct asymptotically LBIU tests for both the stationary and nonstationary cases. It can be verified, by dividing the second derivative by T, that the asymptotically LBIU tests reject the null for small values of the statistic C_1^τ, derived in Section 2 using the prediction-error interpretation. The fact that the test statistic is the same in both the stationary and nonstationary cases is also compatible with the time-reversibility interpretation of the stationary case since, as seen in Section 2, C_{1T}^τ = C_1^τ. Therefore, the N tests are

N_N^τ = N_S^τ = C_1^τ.   (23)

The limiting distributions of the statistics N_N and N_S, for the general case of u_t = ψ(B)a_t, are shown in the following theorem. The limiting distributions of the N_N^0 and N_S^0 tests are the same as for the N_N test. The proof is a straightforward application of the proof of Theorem 1 and is omitted.

Theorem 5. Let y_t be the process (1) with ρ = 1 and β = 0. Then, under Assumptions A, B, and C,

(i) N_N →_d λ^{-2} {W(1)² + c ∫_0^1 {W(r)}² dr};

(ii) N_S →_d λ^{-2} [W(1)² + c(1/2){∫_0^1 {W(r)}² dr + ∫_0^1 {W_B(r)}² dr}].

Remark 2. The consistency of the tests can be established using the same argumentsas in Remark 1.
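Approximate critical values for the limiting law in Theorem 5(i) can be obtained by Monte Carlo on a discretized Brownian motion. The sketch below takes λ = 1 (iid disturbances); the grid size, the number of replications, c = 10, and the 5% level are all illustrative choices of ours:

```python
import numpy as np

# Simulate W(1)^2 + c * int_0^1 W(r)^2 dr by discretizing W on n steps.
rng = np.random.default_rng(42)
n, reps, c = 1000, 2000, 10.0
draws = np.empty(reps)
for i in range(reps):
    W = np.cumsum(rng.normal(scale=np.sqrt(1.0 / n), size=n))
    draws[i] = W[-1] ** 2 + c * np.mean(W ** 2)   # mean approximates the integral
crit_5pct = np.quantile(draws, 0.05)              # reject when N_N < crit_5pct
```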

The procedure followed here to obtain the N tests has some similarities with that of Ploberger (2001), who gives a general procedure for obtaining an asymptotically complete class of tests in the general case of a locally asymptotically quadratic likelihood. Hence, for a test not within this class, one can find a test within the class that has a better or equal asymptotic power function. For a given testing problem, the test statistic that defines the asymptotically complete class is, in general, some nonlinear function of the first and second derivatives of the likelihood function. Therefore, in the unit-root case, the local tests of this complete class would contain a nonlinear function of the E-tests and C-tests. Building such a test in a general case is not straightforward, since it must fulfill considerable restrictions. However, in the deterministic trend case, it is possible to check that N_N^τ and N_S^τ belong to this class in the vicinity of one (see Theorem 7 and Remark 2 in Ploberger, 2001).

3.2. Relation with existing tests

There are other unit-root tests in the literature that also admit a prediction-error interpretation. The Bayesian unit-root test by Phillips and Ploberger (1994) and Phillips (1996), based on the posterior information criterion (PIC), also has, to some extent, this interpretation. The Phillips and Ploberger test compares the penalized log-likelihood of two competing models: one that imposes the unit root and one based on an unrestricted estimation. The problem of testing for the presence of a unit root is then treated as a model-selection problem, and PIC is used to choose the best model. In large samples and stationary situations, the PIC can be considered equivalent to BIC, where the penalty function is based on a simple parameter count. However, in the presence of a unit root, and due to the different convergence rates of the estimators of such a root, the PIC behaves differently from BIC. The PIC then tends to attach a greater penalty to the parameter associated with the unit root than to the parameters associated with stationary components. The purpose of this greater penalty is precisely to help build an efficient predictor, since the inclusion of excess nonstationarity leads to a greater increase in prediction error than does overparameterization of stationary components. Therefore, when the null hypothesis of a unit root is true, the unit-root test based on PIC tends to select the model with the unit-root restriction, since it will be the more efficient predictor. Extensions of this approach to a multivariate setting can be found in Phillips (1996) and Chao and Phillips (1999). On the pros and cons of using unit-root tests to build an efficient predictor, see also Diebold and Kilian (2000) and Sánchez (2002).

The N tests and the PIC tests are therefore similar in the sense that both are related to the prediction performance of a nonstationary process. The difference between the two tests lies in the approach used to detect the unit root. The PIC test detects the unit root because its estimator has a greater convergence rate, which the PIC translates into a greater penalty in order to obtain a better predictor. The N test detects the unit root simply because the prediction errors behave as in a nonstationary process: it reads the information directly from the prediction errors. For this reason, we call this a prediction-error approach.

The N tests are also related to the P_T tests in Elliott et al. (1996) and Elliott (1999). These P_T tests are asymptotically POI tests at the specific alternative ρ = ρ_c, whereas the N tests come from an approximation of asymptotically POI tests around unity. Therefore, both the N and P_T tests are alternative ways to use the optimal theory to construct tests with high power when ρ is close to the null, although only the N tests have an explicit prediction-error interpretation. When there are no deterministic components, these tests (denoted as P^0_T and N^0 tests) verify, if ρ = 1,

P^0_{T-N} = cN^0_N + o_p(1),
P^0_{T-S} = cN^0_S + o_p(1),

where the subscript N or S is added to the P_T tests to distinguish the initial-condition assumption. The extension of the P^0_T and N^0 tests to the case with deterministic components can be interpreted as replacing y_t, in the numerator of the statistics, with its detrended counterpart. The tests differ, however, in the detrending method. Whereas in P_T tests detrending is done by GLS under the alternative ρ_c, in N tests it is done under the null. This detrending method gives the N tests a clear interpretation in terms of prediction errors.


When there are no deterministic components, the N^0_N test is also related to the MZ_α test proposed by Stock (1990) (see also Perron and Ng, 1996; Ng and Perron, 2001), since it can be written that

MZ_α = (E^0_N − 1)(2C^0_1)^{-1}.    (24)

Therefore, this test also uses information about the first and second derivatives of the Taylor expansion of the asymptotically POI test in the no-deterministic case. It can be seen from (24) that the MZ_α test is designed assuming Assumption A1. Under Assumption A2, the test could have been written as MZ_α = (E^0_S − 1)(2C^0_1)^{-1} instead of (24). Analogously, the MSB test of Stock (1990), based on Bhargava's (1986) R1 statistic (see also Sargan and Bhargava, 1983; Perron and Ng, 1996), can be expressed as

MSB = (C^0_1)^{1/2}.    (25)

Similar to the P_T tests, Ng and Perron (2001) extend the MZ_α and MSB tests to the case with deterministic components by detrending by GLS under the alternative ρ_c. The resulting tests are denoted by MZ^GLS_α and MSB^GLS, respectively. It is important to note that, when d_t = μ, it has been proved here that an efficient use of the information leads to different test statistics, depending on the initial conditions. Therefore, classic unit-root tests that impose a specific assumption on the initial values need not have comparable performances under different assumptions. However, when d_t = μ + βt, this comment does not apply.
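In the no-deterministic case, the MZ_α and MSB statistics in (24)–(25) are simple functionals of the observed series. The following is an illustrative sketch (not the author's code), using the standard Ng–Perron sample expressions MZ_α = (T^{-1}y_T^2 − ω̂^2)/(2T^{-2}Σ y_{t−1}^2) and MSB = (T^{-2}Σ y_{t−1}^2/ω̂^2)^{1/2}, with ω̂^2 estimated naively here by the mean squared first difference (adequate only for serially uncorrelated u_t):

```python
import numpy as np

def mz_alpha_msb(y):
    """Ng-Perron MZ_alpha and MSB statistics for a series with no
    deterministic terms. The long-run variance omega2 is estimated
    naively by the variance of first differences (an assumption that
    is only adequate for iid errors)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    omega2 = np.mean(np.diff(y) ** 2)           # naive long-run variance
    s = np.sum(y[:-1] ** 2) / T**2              # T^{-2} sum y_{t-1}^2
    mz = (y[-1] ** 2 / T - omega2) / (2.0 * s)  # MZ_alpha
    msb = np.sqrt(s / omega2)                   # MSB
    return mz, msb

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(200))         # random walk: null is true
mz, msb = mz_alpha_msb(y)
```

Both tests reject for small values of their statistics; in practice ω̂^2 would be replaced by one of the spectral estimators discussed in Section 6.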

4. Modi(ed tests with predictors based on a local alternative

The previous section shows how the prediction-error approach leads to (asymptotically nearly) efficient tests in the vicinity of unity. This section also uses the prediction-error interpretation to modify the proposed statistics in order to achieve greater power at alternatives far from the unit circle. The modified statistics use predictors constructed under a fixed local alternative ρ_c instead of under the null of a unit root. It should be noted that information regarding the alternative ρ_c has already been used in the definition of the N tests in the last section; this information is summarized in the parameter c of the statistics (19)–(22). In this section, the information of this local alternative is extended to the predictors as well. All previous tests (E-tests, C-tests, and N-tests) have greater empirical power with this approach. The best performance is obtained by the modifications of the N tests (denoted as N^GLS tests). Our attention, therefore, is restricted to these tests.

We suppose that the process follows model (1) with ψ(B) = 1. Under the null hypothesis of a unit root, the h-steps-ahead prediction of y_{t+h} from y_t is ŷ_{t+h|t} = β̂h + y_t. The test statistics proposed in previous sections use this predictor. It is now considered that the predictor assumes that the process is nearly nonstationary with ρ = ρ_c. Therefore, the new predictor is

ŷ^c_{t+h|t} = μ̂_c(1 − ρ^h_c) + β̂_c t(1 − ρ^h_c) + β̂_c h + ρ^h_c y_t,

where μ̂_c and β̂_c are obtained by GLS estimation using the covariance matrix Ω_N(ρ_c) or Ω_S(ρ_c). In the nonstationary case, these estimators can also be obtained by regressing [y_1, (1 − ρ_c B)y_2, ..., (1 − ρ_c B)y_T] on [z_1, (1 − ρ_c B)z_2, ..., (1 − ρ_c B)z_T], where z_t = 1 if d_t = μ and z_t = (1, t)' if d_t = μ + βt. In the stationary case, the same procedure can be used, but y_1 is replaced with (1 − ρ^2_c)^{1/2} y_1, and z_1 is replaced with (1 − ρ^2_c)^{1/2} z_1. We denote ỹ_t = y_t − μ̂_c and ỹ^τ_t = y_t − μ̂_c − β̂_c t. Then, the empirical prediction error of forecasting y_{t+h} from y_t can be expressed as ỹ_{t+h} − ρ^h_c ỹ_t if d_t = μ, or ỹ^τ_{t+h} − ρ^h_c ỹ^τ_t if d_t = μ + βt. In the nonstationary case, the modified N test statistics are

N^GLS_N = (ỹ_T − ρ^{T−1}_c ỹ_1)^2 / Σ_{t=2}^T (y_t − y_{t−1})^2 + c Σ_{t=2}^T (ỹ_t − ρ^{t−1}_c ỹ_1)^2 / [T Σ_{t=2}^T (y_t − y_{t−1})^2]    (26)

and

N^GLS_τN = Σ_{t=2}^T (ỹ^τ_t − ρ^{t−1}_c ỹ^τ_1)^2 / [T Σ_{t=2}^T (y_t − y_{t−1} − β̂)^2].    (27)

In the stationary case, the modified N test statistics are

N^GLS_S = [(ỹ_T − ρ^{T−1}_c ỹ_1)^2 + (ỹ_1 − ρ^{T−1}_c ỹ_T)^2] / [2 Σ_{t=2}^T (y_t − y_{t−1})^2]
        + c [Σ_{t=2}^T (ỹ_t − ρ^{t−1}_c ỹ_1)^2 + Σ_{t=1}^{T−1} (ỹ_{T−t} − ρ^t_c ỹ_T)^2] / [2T Σ_{t=2}^T (y_t − y_{t−1})^2]    (28)

and

N^GLS_τS = Σ_{t=2}^T (ỹ^τ_t − ρ^{t−1}_c ỹ^τ_1)^2 / [T Σ_{t=2}^T (y_t − y_{t−1} − β̂)^2].    (29)

The motivation for these modified tests is clear. If the process is such that ρ = ρ_c, it can be expected that the numerators of the N^GLS tests are smaller than those based on a misspecified random-walk predictor, as in the N tests, and, hence, it will be easier to reject the null of a unit root. It is also reasonable to foresee a similar effect if the process is such that ρ ≈ ρ_c. This modification, however, can alter the limiting distributions under the null, as stated in the following theorem.

Theorem 6. Let y_t be the process (1) with ρ = 1 and let ρ_c = 1 − c/T. Then, under Assumptions A, B, and C,

(i) N^GLS_N →_d κ^{-2} { W(1)^2 + c ∫_0^1 W(r)^2 dr };

(ii) N^GLS_τN →_d κ^{-2} ∫_0^1 U_c(r)^2 dr;

(iii) N^GLS_S →_d κ^{-2} (1/2) { [W^c_F(1)^2 + W^c_B(1)^2] + c [∫_0^1 W^c_F(r)^2 dr + ∫_0^1 W^c_B(r)^2 dr] };

(iv) N^GLS_τS →_d κ^{-2} ∫_0^1 K_c(r)^2 dr,


where

U_c(r) = W(r) − rb,   b = λW(1) + 3(1 − λ) ∫_0^1 rW(r) dr,   λ = (1 + c)/(1 + c + c^2/3),

W^c_F(r) = V_0(r) − e^{−cr} V_0(0),   V_0(r) = W(r) − γ_0,

γ_0 = {1/(2 + c)} { W(1) + c ∫_0^1 W(r) dr },

W^c_B(r) = V_0(1 − r) − e^{−cr} V_0(1),

K_c(r) = V^τ_0(r) − e^{−cr} V^τ_0(0),   V^τ_0(r) = W(r) − γ^τ_0 − rγ^τ_1,

[γ^τ_0, γ^τ_1]' = [ c^2 + 2c,  c + c^2/2 ;  c + c^2/2,  1 + c + c^2/3 ]^{-1} [ cW(1) + c^2 ∫_0^1 W(r) dr,  (1 + c)W(1) + c^2 ∫_0^1 rW(r) dr ]'.
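As a concrete illustration of how the modified statistics can be computed in the AR(1) case with d_t = μ, the sketch below estimates μ̂_c by the quasi-differenced regression described above and then evaluates N^GLS_N as in (26). This is an illustrative implementation under those assumptions, not the author's code:

```python
import numpy as np

def n_gls_n(y, c=8.0):
    """Sketch of the NGLS-N statistic of Eq. (26) for d_t = mu.
    rho_c = 1 - c/T; mu_c is estimated by regressing the quasi-differenced
    series [y_1, y_2 - rho_c*y_1, ...] on [1, 1-rho_c, ..., 1-rho_c]."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    rho = 1.0 - c / T
    # GLS (quasi-difference) estimation of mu under the local alternative
    yq = np.concatenate(([y[0]], y[1:] - rho * y[:-1]))
    zq = np.concatenate(([1.0], np.full(T - 1, 1.0 - rho)))
    mu = np.dot(zq, yq) / np.dot(zq, zq)
    yt = y - mu                                  # demeaned series
    denom = np.sum(np.diff(y) ** 2)              # sum (y_t - y_{t-1})^2
    lags = np.arange(1, T)                       # exponents t-1 for t=2..T
    num1 = (yt[-1] - rho ** (T - 1) * yt[0]) ** 2
    num2 = np.sum((yt[1:] - rho ** lags * yt[0]) ** 2)
    return num1 / denom + c * num2 / (T * denom)

rng = np.random.default_rng(0)
stat = n_gls_n(np.cumsum(rng.standard_normal(100)))  # series with a unit root
```

The null is rejected for small values of the statistic, using critical values such as those in Table 7.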

5. Finite sample performance. AR(1) case

This section reports on a Monte Carlo experiment, for the AR(1) case, comparing the empirical power of the N and N^GLS tests with that of some others that appear in the literature. The main conclusion of this section is that the proposed N^GLS tests, based on prediction errors, have empirical powers very close to those of the POI tests, used as a benchmark. We will also use the results of Section 3 to explain the relative behaviour of the competing tests. Elliott et al. (1996) showed that currently used tests (apart from the LBI test) have asymptotic power functions very close to the asymptotic Gaussian power envelope when there are no deterministic components. However, their performance in the presence of deterministic components can be very different, even asymptotically. Therefore, only processes with some deterministic component are considered.

The proposed tests are compared with the pivotal τ̂_DF test of Dickey and Fuller (1979); the P_T and τ_GLS tests, proposed by Elliott et al. (1996) for the nonstationary case and Elliott (1999) for the stationary case (denoted by Q_T and DF-GLSu, respectively, in Elliott, 1999); the weighted symmetric estimator test (τ_W), proposed by Park and Fuller (1995) (see also Pantula et al., 1994; Fuller, 1996); and the MZ^GLS_α and MSB^GLS tests of Ng and Perron (2001). The MZ^GLS_α and MSB^GLS tests have only been developed for the nonstationary case, but they have been applied here to the stationary case using the corresponding covariance matrix in the GLS detrending model. The asymptotically POI tests (S_S(ρ) and S_N(ρ)), where the true parameter ρ is used as the alternative, are also included as a benchmark. For the sake of clarity, the subscript N is added to those statistics that use GLS estimation with the covariance matrix Ω_N(ρ_c), and the subscript


Table 1
Selected values of c for different unit-root tests

Stationary case

             N^GLS_S   N^GLS_N   P_T-S   τ_GLS-S   MZ^GLS_α-S   MSB^GLS_S
d_t = μ      10        13.5      10      10        10           7
d_t = μ+βt   10        20        10      10        10           7

Non-stationary case

             N^GLS_N   P_T-N   τ_GLS-N   MZ^GLS_α-N   MSB^GLS_N
d_t = μ      8         7       7         7            7
d_t = μ+βt   8         13.5    13.5      13.5         13.5

S when the matrix Ω_S(ρ_c) is used. The Monte Carlo experiment was conducted for sample sizes T = 50 and 100, but only T = 100 is reported; conclusions are identical for both sample sizes.

Critical values for some of the tests depend on the value of c in ρ_c = 1 − c/T. Since we are interested in tests with high power in a real situation, the selection of c is based on the empirical performance in finite samples. Table 1 summarizes the selected value of c for each test. To construct this table, critical values for each test and a given value of c were obtained from 100,000 replications of the model y_t = y_{t−1} + a_t, with a_t ~ N(0, 1) and y_1 = a_1. Then, the empirical power was evaluated from 100,000 replications of the model y_t = ρy_{t−1} + a_t, with a_t ~ N(0, 1) and y_1 = a_1 in the nonstationary case, and y_1 = a_1/(1 − ρ^2)^{1/2} in the stationary case. The selected values of c are those that provide, on average, an empirical power closest to that of the S_N(ρ) and S_S(ρ) tests over the set of values ρ = 0.97, 0.95, 0.90, 0.80. The critical values and empirical power of the POI tests are obtained, for each value of ρ, using a similar Monte Carlo experiment.
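The simulation design just described can be sketched as follows. This is a toy version (far fewer replications than the 100,000 used here), and the statistic passed in is a simple left-tailed placeholder, not one of the tests of the paper:

```python
import numpy as np

def mc_critical_value_and_power(stat, T=100, rho=0.90, reps=2000,
                                level=0.05, seed=0, stationary=False):
    """Toy version of the paper's design: critical value from the null
    (random walk, y_1 = a_1), then rejection rate under
    y_t = rho*y_{t-1} + a_t, with y_1 = a_1 (nonstationary case) or
    y_1 = a_1/sqrt(1-rho^2) (stationary case).
    Left-tail test: reject for small values of the statistic."""
    rng = np.random.default_rng(seed)
    null_stats = np.empty(reps)
    alt_stats = np.empty(reps)
    for i in range(reps):
        a = rng.standard_normal(T)
        null_stats[i] = stat(np.cumsum(a))         # null: unit root
        b = rng.standard_normal(T)
        y = np.empty(T)
        y[0] = b[0] / np.sqrt(1 - rho**2) if stationary else b[0]
        for t in range(1, T):
            y[t] = rho * y[t - 1] + b[t]
        alt_stats[i] = stat(y)
    cv = np.quantile(null_stats, level)            # reject for small values
    power = np.mean(alt_stats <= cv)
    return cv, power

# Example with a simple left-tailed placeholder statistic (the normalized
# sum of squared levels), merely to exercise the design:
cv, power = mc_critical_value_and_power(
    lambda y: np.sum(y[:-1]**2) / (len(y)**2 * np.mean(np.diff(y)**2)))
```

Averaging the resulting powers over the grid of ρ values then allows the choice of c that best tracks the POI benchmark.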

It can be observed in Table 1 that, in the nonstationary case, the values of c for the P_T-N, τ_GLS-N, MZ^GLS_α-N, and MSB^GLS_N tests agree with the values proposed by the respective authors using asymptotic criteria. In the stationary case, the values of c proposed in Elliott (1999) for the P_T-S and τ_GLS-S tests also take finite-sample performance into consideration, and they agree with the values obtained in the present experiment. There were no reported values for the MZ^GLS_α-S and MSB^GLS_S tests in the stationary case. It should be noted, however, that the values of c for the MSB^GLS_S test are different from those of the remaining competing tests, whereas in the nonstationary case they coincide with those of the P_T-N, τ_GLS-N, and MZ^GLS_α-N tests. The N^GLS_N test is also included in the stationary case because of its excellent performance in that setting. This N^GLS_N test is always calculated as in (27) and the estimation is always made with the matrix Ω_N(ρ_c). To facilitate comparison, the N tests use the same values of c as the N^GLS tests. The empirical power of N^GLS tests using values of c around the recommended ones is not much affected. This is illustrated in Fig. 1, where the empirical power of N^GLS tests is plotted for different values of c. We can see in this figure that the empirical power is only significantly affected if we choose a value of c very distant from the recommended


Fig. 1. Empirical power function of the N^GLS_N test for different values of c and initial-conditions assumptions. The solid line corresponds to the recommended value of c.

one (solid line). The N^GLS tests are thus fairly robust to this parameter. We can also see in Fig. 1 that the recommended values of c remain valid even for alternatives that are far from the unit root.

Finite-sample critical values for the DF tests are in Fuller (1996). Finite-sample critical values for the τ_GLS-N test (linear trend case) and the P_T tests are in Elliott et al. (1996) for the nonstationary case. Finite-sample critical values for the remaining tests have been obtained through 100,000 Monte Carlo replications. All these critical values, for T = 100, can be found in Table 2. Tables 3 and 4 show the empirical powers of the competing tests. From the tables, we observe the following:

1. As suggested by the asymptotic results, the N test has high power (similar to the benchmark) when the alternative is close to the unit root (ρ not lower than 0.9). For smaller values of ρ, however, this test tends to have lower power than the benchmark.


Table 2
Empirical 5% critical values for T = 100 obtained through 100,000 Monte Carlo replications

Stationary and non-stationary case

             τ_W      τ̂_DF
d_t = μ      −2.56    −2.90
d_t = μ+βt   −3.28    −3.45

Stationary case

             N_S      N^GLS_S   N_N      N^GLS_N   P_T-S   MZ^GLS_α-S   MSB^GLS_S   τ_GLS-S
d_t = μ      0.7605   0.4292    0.8702   0.6406    4.71    −11.29       0.2060      −2.77
d_t = μ+βt   0.0380   0.0269    0.0380   0.0268    2.94    −16.72       0.1736      −3.21

Non-stationary case

             N_N      N^GLS_N   P_T-N    MZ^GLS_α-N   MSB^GLS_N   τ_GLS-N
d_t = μ      0.5472   0.4581    3.11     −8.89        0.2286      −2.14
d_t = μ+βt   0.0380   0.0305    5.64     −15.76       0.1759      −3.03

Table 3
AR(1) case. Empirical power at size 5%. T = 100. d_t = μ. 100,000 replications

Stationary case

ρ      S_S(ρ)   N_S    N^GLS_S   N_N    N^GLS_N   τ_W    P_T-S   τ_GLS-S   MZ^GLS_α-S   MSB^GLS_S   τ̂_DF
0.97   0.12     0.12   0.12      0.11   0.12      0.12   0.12    0.09      0.12         0.11        0.08
0.95   0.20     0.19   0.20      0.18   0.20      0.20   0.20    0.15      0.20         0.19        0.13
0.90   0.52     0.45   0.52      0.43   0.52      0.52   0.52    0.39      0.52         0.50        0.34
0.80   0.97     0.82   0.97      0.76   0.97      0.97   0.97    0.92      0.97         0.97        0.87

Non-stationary case

ρ      S_N(ρ)   N_N    N^GLS_N   τ_W    P_T-N   τ_GLS-N   MZ^GLS_α-N   MSB^GLS_N   τ̂_DF
0.97   0.17     0.17   0.16      0.15   0.17    0.17      0.16         0.16        0.08
0.95   0.29     0.29   0.29      0.26   0.29    0.28      0.28         0.27        0.11
0.90   0.67     0.65   0.67      0.61   0.66    0.66      0.66         0.64        0.29
0.80   0.98     0.93   0.98      0.98   0.96    0.98      0.98         0.97        0.86

2. The N tests are, in general, surpassed by the N^GLS tests. These N^GLS tests have empirical power very close to the benchmark, both in the stationary and nonstationary cases. Although N^GLS_N was constructed to have high power in the nonstationary case, it also has very high power in the stationary case.

3. The N^GLS, P_T, and MZ^GLS_α tests have similar behaviour, with empirical power identical or very close to the benchmark.


Table 4
AR(1) case. Empirical power at size 5%. T = 100. d_t = μ + βt. 100,000 replications

Stationary case

ρ      S_S(ρ)   N_S    N^GLS_S   N^GLS_N   τ_W    P_T-S   τ_GLS-S   MZ^GLS_α-S   MSB^GLS_S   τ̂_DF
0.97   0.07     0.07   0.07      0.07      0.07   0.07    0.07      0.07         0.07        0.06
0.95   0.10     0.10   0.10      0.10      0.10   0.10    0.10      0.10         0.10        0.09
0.90   0.26     0.24   0.25      0.25      0.25   0.26    0.24      0.26         0.26        0.20
0.80   0.78     0.64   0.77      0.77      0.78   0.77    0.76      0.78         0.78        0.66

Non-stationary case

ρ      S_N(ρ)   N_N    N^GLS_N   τ_W    P_T-N   τ_GLS-N   MZ^GLS_α-N   MSB^GLS_N   τ̂_DF
0.97   0.07     0.07   0.07      0.07   0.07    0.07      0.07         0.07        0.07
0.95   0.11     0.11   0.11      0.11   0.11    0.11      0.11         0.11        0.08
0.90   0.30     0.29   0.30      0.29   0.30    0.30      0.30         0.30        0.19
0.80   0.82     0.72   0.82      0.81   0.82    0.82      0.82         0.82        0.64

4. The MSB^GLS test has lower power than the N^GLS, P_T, and MZ^GLS_α tests in the case with d_t = μ, but it has similar performance when d_t = μ + βt. This result is in accordance with the asymptotic theory. As can be seen in (25), the MSB^GLS test only uses the information of the second derivative of the Taylor expansion of the asymptotically POI test. This information might be enough to construct an efficient test if d_t = μ + βt (see Theorem 4). When d_t = μ, however, more efficient tests might need both first and second derivatives (see Theorems 2 and 3), as N^GLS, P_T, and MZ^GLS_α do.

5. The asymptotic results of Section 3 also help to explain the relative behaviour of the tests based on least squares: τ_W and τ_GLS. When d_t = μ, Table 3 shows that the τ_W test performs better than the τ_GLS test in the stationary case, but worse in the nonstationary case. This result can be explained by the different efficiency (near the unit root) of the information in each case, shown in Section 3. In the stationary case, it has been seen that an efficient use of the information leads to the use of the series in both direct and reverse order, as τ_W does. On the other hand, in the nonstationary case, it is more efficient to use only the series in direct order, just as τ_GLS does. This result suggests that a modification of the τ_GLS test, based on the use of a symmetric estimator, could improve its performance in the stationary case.

6. Regarding the case d_t = μ + βt, it has been shown in Section 3 that both the stationary and the nonstationary cases use the same nearly optimal test. This means that, in the presence of a deterministic linear trend and as far as unit-root detection is concerned, the series in reverse order contains the same information as in direct order. This can explain the fact that the performances of τ_W and τ_GLS are very similar in this setting.


6. The general ARMA case

This section extends the proposed tests to a general ARMA case. For the sake of brevity, only N^GLS tests are considered. The extension of the proposed tests requires the consistent estimation of κ^2, where κ^2 = ω^{-2}σ^2_u, with ω^2 = σ^2_a ψ(1)^2. As a consistent estimator of σ^2_u we can use σ̂^2_u = Σ_{t=2}^T (y_t − y_{t−1})^2/T if d_t = μ, or σ̂^2_u = Σ_{t=2}^T (y_t − y_{t−1} − β̂)^2/T if d_t = μ + βt. It can, however, be seen that the numerator of σ̂^2_u is part of the N^GLS test statistic. Therefore, the generalization to a general ARMA case is straightforward. For example, the generalization of N^GLS_N and N^GLS_τN is

N^GLS_N = (ỹ_T − ρ^{T−1}_c ỹ_1)^2 / (T ω̂^2) + c Σ_{t=2}^T (ỹ_t − ρ^{t−1}_c ỹ_1)^2 / (T^2 ω̂^2),    (30a)

N^GLS_τN = Σ_{t=2}^T (ỹ^τ_t − ρ^{t−1}_c ỹ^τ_1)^2 / (T^2 ω̂^2),    (30b)

with ω̂^2 a consistent estimator of ω^2. The remaining statistics of previous sections can be generalized in the same fashion. A second option for the extension to an ARMA case would be to incorporate the information of the model into the predictions in the numerator of the test statistics. For example, if u_t is an MA(1), and considering d_t = μ and a_t = 0 for t < 1, the prediction of ỹ_t from the first observation ỹ_1 would be ỹ_{t|1} = ρ^{t−2}_c(ρ_c − θ̂)ỹ_1, with θ̂ an estimator of θ, instead of the ρ^{t−1}_c ỹ_1 used in (30a). A limited experiment has revealed that this second approach does not have better empirical performance than the first one. Therefore, for the sake of brevity, further details are omitted.
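The ARMA generalization only changes the scaling of the statistics. A sketch of (30a)–(30b) is given below, taking the GLS-detrended series ỹ_t (here `ytil`) and ỹ^τ_t (here `ytau`) and a long-run variance estimate ω̂^2 as given; the naive inputs in the usage lines are illustrative assumptions, not the paper's estimators:

```python
import numpy as np

def n_gls_n_arma(ytil, c, omega2):
    """Eq. (30a): NGLS-N for general ARMA errors, with omega2 a consistent
    estimate of the long-run variance. ytil is the GLS-demeaned series."""
    T = len(ytil)
    rho = 1.0 - c / T
    lags = np.arange(1, T)                     # exponents t-1 for t=2..T
    num1 = (ytil[-1] - rho ** (T - 1) * ytil[0]) ** 2
    num2 = np.sum((ytil[1:] - rho ** lags * ytil[0]) ** 2)
    return num1 / (T * omega2) + c * num2 / (T**2 * omega2)

def n_gls_tau_n_arma(ytau, c, omega2):
    """Eq. (30b): trend-case statistic; ytau is the GLS-detrended series."""
    T = len(ytau)
    rho = 1.0 - c / T
    lags = np.arange(1, T)
    return np.sum((ytau[1:] - rho ** lags * ytau[0]) ** 2) / (T**2 * omega2)

# Illustrative inputs: simple OLS demeaning and a naive omega2 estimate
# (assumptions for the sketch; the paper uses GLS demeaning and the
# spectral estimators of Section 6.1).
rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(200))
stat = n_gls_n_arma(y - y.mean(), c=8.0, omega2=np.mean(np.diff(y)**2))
```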

6.1. Finite sample performance

The finite-sample properties of N^GLS, P_T, MZ^GLS_α, τ_GLS, and τ_W are compared. The model considered is y_t = ρy_{t−1} + a_t − θa_{t−1}, where the a_t are independent, identically distributed, standard normal errors. Experiments have been devised for both the nonstationary case (y_1 = a_1) and the stationary case, with sample size T = 100. In the stationary case, and for ρ < 1, a set of 50 initial observations was generated and discarded to avoid the effect of initial values. The selection of the autoregressive truncation lag k for τ_GLS and τ_W has been made with the BIC and also with the MIC procedure, proposed in Ng and Perron (2001). The MIC procedure is designed to diminish the well-known size distortion in the presence of MA components. The remaining competing tests need the estimation of ω^2. There are several well-known estimators of ω^2 in the literature. The most popular is the autoregressive estimator of the spectral density at frequency zero, defined as ω̂^2_{OLS–AR} = σ̂^2_{AR}(1 − Σ_{j=1}^k φ̂^*_j)^{-2}, where σ̂^2_{AR} is the residual variance obtained from the OLS estimation of Δy_t = d_t + α y_{t−1} + Σ_{j=1}^k φ^*_j Δy_{t−j} + e_t. There are also several procedures for the choice of the truncation lag k (see, e.g., Ng and Perron, 1995, 2001). In this experiment, however, we consider a modification of this estimator, proposed in Ng and Perron (2001), involving detrended data obtained by local-to-unity GLS. This estimator of ω^2 is ω̂^2_{GLS–AR} = σ̂^2_{AR}(1 − Σ_{j=1}^k φ̂^*_j)^{-2}, where


σ̂^2_{AR} is the residual variance obtained from the OLS estimation of Δy^c_t = α y^c_{t−1} + Σ_{j=1}^k φ^*_j Δy^c_{t−j} + v_t, where the y^c_t are GLS-detrended data using ρ_c = 1 − c/T, with c = 7 if d_t = μ, and c = 13.5 if d_t = μ + βt. Ng and Perron (2001) show that ω̂^2_{GLS–AR} improves the performance of unit-root tests with respect to ω̂^2_{OLS–AR}. The choice of the truncation lag k in ω̂^2_{GLS–AR} was also made by the BIC and the MIC. When the BIC is used, k is restricted to 3 ≤ k ≤ 8, as in Elliott et al. (1996). For the MIC, the restriction 0 ≤ k ≤ 10(T/100)^{1/4} is used, as in Ng and Perron (2001). In this paper, we also consider the estimator proposed in Sánchez (2000), defined as

ω̂^2_{GLS–ARMA} = σ̂^2 (1 − Σ_{i=1}^q θ̂_i)^2 / (1 − Σ_{j=1}^{p−1} φ̂^*_j)^2,    (31)

where the φ̂^*_j and θ̂_i are estimates from the error-correction regression

Δy^c_t = α y^c_{t−1} + Σ_{j=1}^{p−1} φ^*_j Δy^c_{t−j} + a_t − Σ_{i=1}^q θ_i a_{t−i}.    (32)
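The autoregressive spectral estimator ω̂^2_{GLS–AR} described above can be sketched as follows for the d_t = μ case, with a fixed lag k instead of BIC/MIC selection (an illustrative implementation, not the author's code):

```python
import numpy as np

def omega2_gls_ar(y, k=4, c=7.0):
    """Sketch of the AR spectral estimator at frequency zero from
    GLS-demeaned data (d_t = mu case): fit by OLS the regression
    dy_t = alpha*y_{t-1} + sum_j phi_j*dy_{t-j} + v_t on detrended data,
    then omega2 = s2 / (1 - sum phi_j)^2."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    rho = 1.0 - c / T
    # GLS demeaning under the local alternative rho_c
    yq = np.concatenate(([y[0]], y[1:] - rho * y[:-1]))
    zq = np.concatenate(([1.0], np.full(T - 1, 1.0 - rho)))
    yc = y - np.dot(zq, yq) / np.dot(zq, zq)
    dy = np.diff(yc)
    # build regressors: lagged level and k lagged differences
    rows, resp = [], []
    for t in range(k, len(dy)):
        rows.append(np.concatenate(([yc[t]], dy[t - k:t][::-1])))
        resp.append(dy[t])
    X, v = np.array(rows), np.array(resp)
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    resid = v - X @ beta
    s2 = np.mean(resid ** 2)                 # residual variance
    phi_sum = np.sum(beta[1:])               # sum of AR coefficients
    return s2 / (1.0 - phi_sum) ** 2

rng = np.random.default_rng(2)
om2 = omega2_gls_ar(np.cumsum(rng.standard_normal(300)))
```

The ARMA variant (31) would replace the pure AR fit with an ARMA(p−1, q) error-correction fit; its nonlinear estimation is omitted from the sketch.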

The advantage of ω̂^2_{GLS–ARMA} lies in the possibility of fitting an ARMA model, whereas ω̂^2_{OLS–AR} and ω̂^2_{GLS–AR} must always rely on autoregressive approximations. There are, besides, many procedures for the identification of ARMA models (see Koreisha and Pukkila, 1995 and references therein). As shown in Sánchez (2000), the possibility of estimating an ARMA model significantly reduces the size distortion induced by moving-average polynomials. The estimation method is exact maximum likelihood when d_t = μ and marginal likelihood when d_t = μ + βt. The computations have been done with the NAG routine G13bef. Since the estimation methods are nonlinear, a set of initial values was needed. The performance of the tests when θ ≤ 0 is very robust to these initial values. However, when θ > 0, the tests are highly sensitive to the initial values. Sánchez (2000) analyses several sets of initial values, both random and fixed. The best performance is obtained with initial values close to the unit circle for both parameters, ρ and θ. The initial values used in this experiment are ρ_0 = 0.95 and θ_0 = 0.90. For each parameter configuration, all tests are based on the same seeds. Empirical size and power, based on 10,000 replications, are summarized in Tables 5 and 6. From these tables, we observe the following:

1. The advantage of the estimator ω̂^2_{GLS–ARMA}, shown in Sánchez (2000), is also observed with the proposed N^GLS tests. Tests based on the BIC tend to have inflated size, especially at θ = 0.8. Tests based on the MIC, apart from τ_W and τ_GLS, have reasonable size at θ = 0.8 in the stationary case. However, they are undersized at θ = 0 and can have inflated size at negative values of θ. Besides, tests based on the MIC can suffer a severe loss of power when compared with the AR(1) case.

2. The N^GLS_N test has similar performance to the P_T test and, in the stationary case, also to the N^GLS_S test. These three tests also perform better than the remaining competing tests. As mentioned above, the N^GLS and P_T tests can be considered as alternative ways to use the optimal theory to build efficient tests, although only the N^GLS tests have a prediction-error interpretation.


Table 5
Empirical size and power for 5% level. T = 100. Model: (1 − ρB)(y_t − d_t) = (1 − θB)a_t. Stationary case. 10,000 replications^a

                                   d_t = μ                              d_t = μ + βt
                      θ:   −0.8   −0.5   0      0.5    0.8      −0.8   −0.5   0      0.5    0.8
ARMA      N^GLS_N  ρ=1.00  0.058  0.059  0.057  0.037  0.050    0.069  0.068  0.059  0.029  0.066
                   ρ=0.90  0.571  0.568  0.526  0.286  0.228    0.301  0.297  0.244  0.103  0.159
          N^GLS_S  ρ=1.00  0.059  0.060  0.058  0.036  0.049    0.068  0.067  0.058  0.028  0.065
                   ρ=0.90  0.571  0.568  0.518  0.261  0.220    0.301  0.300  0.245  0.103  0.158
          MZ^GLS_α ρ=1.00  0.072  0.073  0.072  0.044  0.055    0.098  0.097  0.087  0.041  0.073
                   ρ=0.90  0.623  0.619  0.574  0.314  0.238    0.407  0.398  0.335  0.137  0.171
          P_T      ρ=1.00  0.055  0.056  0.056  0.037  0.051    0.061  0.064  0.058  0.030  0.068
                   ρ=0.90  0.564  0.563  0.521  0.287  0.232    0.299  0.294  0.250  0.110  0.164

AR(MIC)   N^GLS_N  ρ=1.00  0.085  0.050  0.033  0.041  0.056    0.088  0.051  0.028  0.027  0.059
                   ρ=0.90  0.295  0.273  0.270  0.235  0.296    0.137  0.107  0.077  0.097  0.205
          N^GLS_S  ρ=1.00  0.087  0.048  0.033  0.036  0.050    0.089  0.049  0.028  0.028  0.061
                   ρ=0.90  0.277  0.255  0.244  0.210  0.279    0.141  0.106  0.077  0.096  0.205
          MZ^GLS_α ρ=1.00  0.096  0.060  0.042  0.045  0.058    0.115  0.073  0.043  0.045  0.077
                   ρ=0.90  0.317  0.303  0.300  0.251  0.311    0.193  0.160  0.128  0.149  0.233
          P_T      ρ=1.00  0.080  0.047  0.034  0.039  0.055    0.079  0.048  0.026  0.029  0.065
                   ρ=0.90  0.294  0.270  0.263  0.237  0.301    0.141  0.112  0.083  0.111  0.218
          τ_GLS    ρ=1.00  0.027  0.024  0.021  0.032  0.082    0.030  0.029  0.028  0.033  0.109
                   ρ=0.90  0.169  0.173  0.192  0.212  0.401    0.118  0.119  0.127  0.142  0.299
          τ_W      ρ=1.00  0.020  0.024  0.022  0.036  0.106    0.007  0.016  0.024  0.045  0.181
                   ρ=0.90  0.132  0.181  0.241  0.273  0.515    0.033  0.060  0.121  0.169  0.449

AR(BIC)   N^GLS_N  ρ=1.00  0.142  0.106  0.086  0.086  0.357    0.255  0.166  0.127  0.095  0.375
                   ρ=0.90  0.580  0.556  0.536  0.528  0.894    0.479  0.414  0.340  0.294  0.711
          N^GLS_S  ρ=1.00  0.150  0.112  0.085  0.085  0.351    0.255  0.169  0.126  0.094  0.373
                   ρ=0.90  0.582  0.559  0.526  0.503  0.889    0.479  0.415  0.341  0.293  0.706
          MZ^GLS_α ρ=1.00  0.158  0.127  0.099  0.099  0.391    0.293  0.207  0.157  0.131  0.455
                   ρ=0.90  0.609  0.597  0.574  0.561  0.924    0.542  0.487  0.421  0.382  0.785
          P_T      ρ=1.00  0.139  0.104  0.085  0.085  0.359    0.241  0.155  0.122  0.101  0.388
                   ρ=0.90  0.575  0.553  0.532  0.529  0.904    0.473  0.417  0.341  0.308  0.727
          τ_GLS    ρ=1.00  0.065  0.052  0.052  0.074  0.437    0.079  0.059  0.049  0.083  0.564
                   ρ=0.90  0.348  0.331  0.313  0.442  0.965    0.245  0.220  0.189  0.303  0.896
          τ_W      ρ=1.00  0.067  0.037  0.051  0.122  0.690    0.076  0.033  0.052  0.148  0.845
                   ρ=0.90  0.410  0.346  0.456  0.749  0.998    0.242  0.148  0.211  0.495  0.995

^a The N^GLS, P_T, and MZ^GLS_α tests use the spectral estimator ω̂^2_{GLS–ARMA} when an ARMA(1,1) model is fitted, and the spectral estimator ω̂^2_{GLS–AR} when an AR approximation is used.


Table 6
Empirical size and power for 5% level. T = 100. Model: (1 − ρB)(y_t − d_t) = (1 − θB)a_t. Non-stationary case. 10,000 replications^a

                                   d_t = μ                              d_t = μ + βt
                      θ:   −0.8   −0.5   0      0.5    0.8      −0.8   −0.5   0      0.5    0.8
ARMA      N^GLS_N  ρ=1.00  0.055  0.057  0.055  0.042  0.050    0.064  0.064  0.060  0.031  0.067
                   ρ=0.90  0.732  0.717  0.643  0.339  0.232    0.352  0.343  0.285  0.123  0.164
          MZ^GLS_α ρ=1.00  0.060  0.063  0.062  0.045  0.053    0.088  0.090  0.082  0.039  0.072
                   ρ=0.90  0.758  0.741  0.670  0.353  0.236    0.437  0.428  0.359  0.150  0.174
          P_T      ρ=1.00  0.055  0.056  0.057  0.043  0.050    0.062  0.063  0.059  0.031  0.068
                   ρ=0.90  0.747  0.732  0.649  0.347  0.231    0.351  0.344  0.287  0.126  0.165

AR(MIC)   N^GLS_N  ρ=1.00  0.085  0.050  0.033  0.041  0.056    0.088  0.051  0.028  0.027  0.059
                   ρ=0.90  0.295  0.273  0.270  0.235  0.296    0.137  0.107  0.077  0.097  0.205
          MZ^GLS_α ρ=1.00  0.096  0.060  0.042  0.045  0.058    0.115  0.073  0.043  0.045  0.077
                   ρ=0.90  0.317  0.303  0.300  0.251  0.311    0.193  0.160  0.128  0.149  0.233
          P_T      ρ=1.00  0.080  0.047  0.034  0.039  0.055    0.079  0.048  0.026  0.029  0.065
                   ρ=0.90  0.294  0.270  0.263  0.237  0.301    0.141  0.112  0.083  0.111  0.218
          τ_GLS    ρ=1.00  0.036  0.038  0.031  0.039  0.079    0.034  0.033  0.029  0.037  0.098
                   ρ=0.90  0.358  0.279  0.357  0.279  0.263    0.143  0.146  0.147  0.159  0.221
          τ_W      ρ=1.00  0.020  0.024  0.022  0.036  0.106    0.007  0.016  0.024  0.045  0.183
                   ρ=0.90  0.175  0.232  0.299  0.299  0.522    0.042  0.066  0.129  0.173  0.441

AR(BIC)   N^GLS_N  ρ=1.00  0.142  0.106  0.086  0.086  0.357    0.255  0.166  0.127  0.095  0.375
                   ρ=0.90  0.580  0.556  0.536  0.528  0.894    0.479  0.414  0.340  0.294  0.711
          MZ^GLS_α ρ=1.00  0.158  0.127  0.099  0.099  0.391    0.293  0.207  0.157  0.131  0.455
                   ρ=0.90  0.609  0.597  0.574  0.561  0.924    0.542  0.487  0.421  0.382  0.785
          P_T      ρ=1.00  0.139  0.104  0.085  0.085  0.359    0.241  0.155  0.122  0.101  0.388
                   ρ=0.90  0.575  0.553  0.532  0.529  0.904    0.473  0.417  0.341  0.308  0.727
          τ_GLS    ρ=1.00  0.063  0.062  0.050  0.075  0.335    0.079  0.064  0.049  0.080  0.506
                   ρ=0.90  0.563  0.599  0.536  0.560  0.727    0.275  0.262  0.219  0.334  0.802
          τ_W      ρ=1.00  0.066  0.041  0.051  0.126  0.695    0.078  0.038  0.052  0.142  0.857
                   ρ=0.90  0.480  0.444  0.527  0.791  0.999    0.264  0.171  0.230  0.521  0.997

^a The N^GLS, P_T, and MZ^GLS_α tests use the spectral estimator ω̂^2_{GLS–ARMA} when an ARMA(1,1) model is fitted, and the spectral estimator ω̂^2_{GLS–AR} when an AR approximation is used.


Table 7
Critical values

N^GLS_N (stationary case: c = 13.5; non-stationary case: c = 8)

       Stationary case                        Non-stationary case
T      1%      2.5%    5%      10%            1%      2.5%    5%      10%
25     0.4207  0.4943  0.5782  0.7081         0.2843  0.3418  0.4056  0.4993
50     0.4146  0.4987  0.5917  0.7318         0.2835  0.3486  0.4251  0.5440
100    0.4285  0.5249  0.6406  0.8045         0.2930  0.3699  0.4581  0.5988
150    0.4426  0.5518  0.6728  0.8587         0.2988  0.3787  0.4743  0.6268
250    0.4555  0.5770  0.7120  0.9284         0.3027  0.3920  0.4945  0.6650
500    0.4780  0.6105  0.7674  1.0180         0.3103  0.3989  0.5088  0.6928
∞      0.5165  0.6697  0.8551  1.1726         0.3152  0.4120  0.5303  0.7301

N^GLS_τN (stationary case: c = 20; non-stationary case: c = 8)

T      1%      2.5%    5%      10%            1%      2.5%    5%      10%
25     0.0221  0.0248  0.0277  0.0319         0.0253  0.0289  0.0326  0.0380
50     0.0204  0.0234  0.0266  0.0311         0.0231  0.0270  0.0311  0.0369
100    0.0201  0.0234  0.0268  0.0317         0.0221  0.0262  0.0305  0.0367
150    0.0201  0.0234  0.0272  0.0324         0.0219  0.0260  0.0303  0.0366
250    0.0199  0.0236  0.0276  0.0331         0.0217  0.0258  0.0302  0.0366
500    0.0203  0.0242  0.0282  0.0341         0.0215  0.0257  0.0302  0.0367
∞      0.0206  0.0246  0.0290  0.0354         0.0212  0.0255  0.0299  0.0366

We conclude that, as expected from the intuition, the use of prediction errors allows the construction of unit-root tests with excellent size and power properties, even with large moving-average roots. These two characteristics, intuitive interpretation and good performance, make the NGLS test of potential interest to practitioners. A more detailed set of critical values can be found in Table 7. Since the performance of NGLS-N and NGLS-S is similar, only critical values for NGLS-N are supplied.
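For sample sizes between the tabulated values of T, a practitioner can interpolate in Table 7. The sketch below is an illustration, not part of the paper: the helper name and the choice of linear interpolation are mine, while the numbers are the 5% critical values of NGLS-N in the non-stationary case (c = 8) from Table 7.

```python
import numpy as np

# 5% critical values of the NGLS-N test, non-stationary case (c = 8),
# taken from Table 7 for finite T.
T_GRID = np.array([25.0, 50.0, 100.0, 150.0, 250.0, 500.0])
CV_5PCT = np.array([0.4056, 0.4251, 0.4581, 0.4743, 0.4945, 0.5088])

def critical_value(T):
    """Linearly interpolated 5% critical value; clamped outside the grid."""
    return float(np.interp(T, T_GRID, CV_5PCT))

print(critical_value(75))  # midway between the T = 50 and T = 100 entries
```

For very large samples the tabulated asymptotic (T = ∞) value, 0.5303 at the 5% level, is the natural limit; `np.interp` simply clamps at the largest finite entry, so lookups beyond T = 500 should be treated with that in mind.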

7. Concluding remarks

This paper presents a new approach to unit-root detection. This so-called prediction-error approach exploits the intuitive relationship between the long-run behaviour of a process and the presence of a unit root, and shows the link between our intuition and the optimal detection of such a root.

From a theoretical point of view, this prediction-error approach is very useful for understanding how (nearly) optimal tests use the information. It is seen in this paper that nearly optimal tests in the vicinity of the null can be constructed with the prediction errors of a predictor that assumes the presence of a unit root, where the predictions are evaluated from the origin of the series. When the series is covariance-stationary under the alternative, a more efficient use of the information (in the vicinity of the null) is obtained if the time-reversed series is also considered. In that case, both ends of the series are used as origins of the predictions.
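The idea can be made concrete with a stylized computation. The sketch below is illustrative only — it is not the paper's NGLS statistic, and the normalization is mine: the unit-root predictor forecasts every future value by the value at the origin, so the forward prediction errors from the first observation are y_t − y_1 and, for the time-reversed series, the backward errors from the last observation are y_{T−t} − y_T.

```python
import numpy as np

def prediction_error_stat(y):
    """Stylized two-sided prediction-error statistic (illustrative only).

    Under a unit root, the long-run forecast from the origin is the origin
    itself, so the forward errors are y_t - y_1 and, on the time-reversed
    series, the backward errors are y_{T-t} - y_T.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    fwd = y[1:] - y[0]           # forward errors from the first observation
    bwd = y[:-1][::-1] - y[-1]   # backward errors from the last observation
    sigma2 = np.mean(np.diff(y) ** 2)   # crude innovation-variance estimate
    return (np.sum(fwd ** 2) + np.sum(bwd ** 2)) / (2 * T ** 2 * sigma2)

print(prediction_error_stat([1.0, 2.0, 3.0, 4.0]))
```

Under the unit-root null the normalized sums converge to functionals of Brownian motion, while under stationarity the forecast errors from either origin stay bounded and the statistic collapses toward zero; small values therefore speak against the unit root. The NGLS tests of the paper refine this idea with local-to-unity GLS demeaning or detrending and a long-run variance estimate.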

From a practical point of view, the proposed approach allows the construction of tests for a general ARMA setting. These tests are denoted as NGLS tests and have

an empirical performance similar to that of the PT tests. A Monte Carlo experiment for the ARMA(1,1) model shows that both the NGLS and PT tests have an excellent relative performance among a set of tests currently in common use.

The present approach not only has an intuitive interpretation, but also good theoretical and empirical properties. It can therefore be used to design nonstationarity tests under different circumstances than those considered here. Outlier-robust unit-root tests, or nonparametric tests, are examples of promising directions for future research.

Acknowledgements

The author is especially grateful to Daniel Peña, Graham Elliott, and an anonymous referee for valuable discussions and comments. He thanks Professor W. Ploberger for providing him with a copy of his unpublished manuscript. Part of this research was conducted while the author was visiting the University of California, San Diego. He is indebted to this institution. This research was supported in part by CICYT, Grants PB96-0339 and BEC2000-0167. The usual disclaimer applies.

Appendix A

Proof of Theorem 1. (i) The limiting distribution of $C_1$ is a straightforward application of the functional central limit theorem (FCLT) to $S_t=\sum_{j=2}^t u_j$.

(ii) To obtain the limiting distribution of $C_1^\tau$, it is convenient to rewrite $y_t-y_1-\hat\beta(t-1)=x_t-x_1-(\hat\beta-\beta)(t-1)$. Since $\hat\beta=(T-1)^{-1}(y_T-y_1)=\beta+(T-1)^{-1}\sum_{t=2}^T u_t$, it can be shown that $\sqrt{T}(\hat\beta-\beta)\stackrel{d}{\to}\omega W(1)$. By the FCLT and using $T^{-(k+1)}\sum_{t=1}^T t^k\to(k+1)^{-1}$, $k=0,1,\dots$ as $T\to\infty$, it is verified that $T^{-1/2}\{y_t-y_1-\hat\beta(t-1)\}=T^{-1/2}\{x_t-x_1-(\hat\beta-\beta)(t-1)\}\stackrel{d}{\to}\omega\{W(r)-rW(1)\}\equiv\omega W^\tau(r)$. Then, from the continuous mapping theorem (CMT), the distribution can be obtained.

(iii) By the FCLT, $T^{-1/2}(y_{T-t}-y_T)\stackrel{d}{\to}\omega\{W(1-r)-W(1)\}\equiv\omega W_B^0(r)$. From the CMT, the limiting distribution is obtained.
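The deterministic limit used in part (ii), $T^{-(k+1)}\sum_{t=1}^T t^k\to(k+1)^{-1}$, is easy to check numerically. The snippet below is an illustrative check, not part of the proof:

```python
# Check that T^{-(k+1)} * sum_{t=1}^T t^k approaches 1/(k+1).
T = 10**6
for k in range(3):
    s = sum(t**k for t in range(1, T + 1)) / T**(k + 1)
    print(k, s)  # close to 1, 1/2, 1/3
```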

Proof of Theorem 2. The GLS residuals of the numerator of $S_N$ are $x_{N,t}=y_t-\hat\mu_N$, where

$$\hat\mu_N=\frac{y_1+(1-\rho)\sum_{t=2}^T(y_t-\rho y_{t-1})}{1+(T-1)(1-\rho)^2}.\qquad(A.1)$$

Likewise, the residuals of the denominator of $S_N$ are $x_{1,t}=y_t-\hat\mu_1$, where $\hat\mu_1$ is the GLS estimator using $\Omega_N^{-1}(1)$. Then, $\hat\mu_1=y_1$. The first derivative is $S_N'(1)=\{x_1'\Omega_N^{-1}(1)x_1\}^{-1}\,\partial\{x_N'\Omega_N^{-1}(\rho)x_N\}/\partial\rho|_{\rho=1}$, where it can be verified that $x_1'\Omega_N^{-1}(1)x_1=\sum_{t=2}^T(y_t-y_{t-1})^2$. Then, the numerator of $S_N'(1)$ can be written as $\partial\{x_N'\Omega_N^{-1}(\rho)x_N\}/\partial\rho|_{\rho=1}=2(\partial x_N'/\partial\rho)\Omega_N^{-1}(\rho)x_N|_{\rho=1}+x_N'\{\partial\Omega_N^{-1}(\rho)/\partial\rho\}x_N|_{\rho=1}$. Since $\partial x_N'/\partial\rho=-(\partial\hat\mu_N/\partial\rho)e'$, with $e$ a $T\times1$ vector of ones, and also

$$e'\Omega_N^{-1}(\rho)x_N|_{\rho=1}=0,\qquad(A.2)$$

it can be shown that $\partial\{x_N'\Omega_N^{-1}(\rho)x_N\}/\partial\rho|_{\rho=1}=x_N'\{\partial\Omega_N^{-1}(\rho)/\partial\rho\}x_N|_{\rho=1}$. From (7) it can be shown that

$$\left.\frac{\partial\Omega_N^{-1}(\rho)}{\partial\rho}\right|_{\rho=1}=2L'L-(L+L')=\Omega_N^{-1}(1)-\operatorname{diag}(0,\dots,0,1),\qquad(A.3)$$

where diag denotes a diagonal matrix. Also, from (A.1), it holds that $x_{N,t}(1)=y_t-y_1$. Therefore, it can be obtained that $S_N'(1)=1-\{\sum_{t=2}^T(y_t-y_{t-1})^2\}^{-1}(y_T-y_1)^2$.

For the second derivative, it can be written that

$$\frac{\partial^2}{\partial\rho^2}\,x_N'\Omega_N^{-1}(\rho)x_N=2\frac{\partial^2x_N'}{\partial\rho^2}\Omega_N^{-1}(\rho)x_N+4\frac{\partial x_N'}{\partial\rho}\frac{\partial\Omega_N^{-1}(\rho)}{\partial\rho}x_N\qquad(A.4a)$$
$$\qquad+2\frac{\partial x_N'}{\partial\rho}\Omega_N^{-1}(\rho)\frac{\partial x_N}{\partial\rho}+x_N'\frac{\partial^2\Omega_N^{-1}(\rho)}{\partial\rho^2}x_N.\qquad(A.4b)$$

Applying (A.2), the first term in (A.4a) is null at $\rho=1$. Also, by (A.3) and applying that $\partial\hat\mu_N/\partial\rho|_{\rho=1}=y_1-y_T$, it can be obtained that $4(\partial x_N'/\partial\rho)\{\partial\Omega_N^{-1}(\rho)/\partial\rho\}x_N|_{\rho=1}=-4(y_T-y_1)^2$. Similarly, since $e'\Omega_N^{-1}(1)e=1$, it holds that $(\partial x_N'/\partial\rho)\Omega_N^{-1}(\rho)(\partial x_N/\partial\rho)|_{\rho=1}=(y_T-y_1)^2$. To solve the last term in (A.4b), it can be applied that

$$\left.\frac{\partial^2\Omega_N^{-1}(\rho)}{\partial\rho^2}\right|_{\rho=1}=2L'L,\qquad(A.5)$$

and, therefore, $x_N'\{\partial^2\Omega_N^{-1}(\rho)/\partial\rho^2\}x_N|_{\rho=1}=2\sum_{t=2}^{T-1}(y_t-y_1)^2$ and the theorem holds.
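The closed form for $S_N'(1)$ can be verified numerically. The sketch below is illustrative and not from the paper; it assumes the standard nonstationary GLS quadratic form $x'\Omega_N^{-1}(\rho)x=x_1^2+\sum_{t=2}^T(x_t-\rho x_{t-1})^2$, and compares a central-difference derivative of $S_N(\rho)$ at $\rho=1$ with $1-(y_T-y_1)^2/\sum(\Delta y_t)^2$:

```python
import numpy as np

def mu_hat_N(y, rho):
    # GLS estimate of the mean under the nonstationary weighting (eq. A.1)
    T = len(y)
    num = y[0] + (1 - rho) * np.sum(y[1:] - rho * y[:-1])
    return num / (1 + (T - 1) * (1 - rho) ** 2)

def S_N(y, rho):
    # S_N(rho) = x_N' Omega_N^{-1}(rho) x_N / sum_t (dy_t)^2
    x = y - mu_hat_N(y, rho)
    quad = x[0] ** 2 + np.sum((x[1:] - rho * x[:-1]) ** 2)
    return quad / np.sum(np.diff(y) ** 2)

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(200))   # any series works; a random walk here

h = 1e-5
numeric = (S_N(y, 1 + h) - S_N(y, 1 - h)) / (2 * h)       # central difference
closed = 1 - (y[-1] - y[0]) ** 2 / np.sum(np.diff(y) ** 2)  # Theorem 2
print(numeric, closed)
```

The two numbers agree to the accuracy of the finite-difference approximation, including the re-estimation of the mean at each $\rho$, whose contribution vanishes at $\rho=1$ exactly as (A.2) predicts.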

Proof of Theorem 3. The GLS residuals of the numerator of $S_S$ are $x_{S,t}(\rho)=y_t-\hat\mu_S$. Then,

$$\hat\mu_S=\frac{(1-\rho)y_1+(1-\rho)y_T+\sum_{t=2}^{T-1}(1-\rho)^2y_t}{2(1-\rho)+(T-2)(1-\rho)^2}=\frac{y_1+y_T+\sum_{t=2}^{T-1}(1-\rho)y_t}{2+(T-2)(1-\rho)}.\qquad(A.6)$$

Following the same arguments as in Theorem 2, it holds that

$$e'\Omega_S^{-1}(\rho)x_S|_{\rho=1}=0.\qquad(A.7)$$

Also, by (9), it holds that

$$\left.\frac{\partial\Omega_S^{-1}(\rho)}{\partial\rho}\right|_{\rho=1}=2L'L-(L+L')-2e_1e_1'=\Omega_S^{-1}(1)-\operatorname{diag}(1,0,\dots,0,1),\qquad(A.8)$$

where $e_1$ denotes the first column of the identity matrix. Then, it can be obtained that $S_S'(1)=1-\{2\sum_{t=2}^T(y_t-y_{t-1})^2\}^{-1}(y_T-y_1)^2$. Likewise, for the second derivative, it can be obtained that

$$\frac{\partial^2}{\partial\rho^2}\,x_S'\Omega_S^{-1}(\rho)x_S=2\frac{\partial^2x_S'}{\partial\rho^2}\Omega_S^{-1}(\rho)x_S+4\frac{\partial x_S'}{\partial\rho}\frac{\partial\Omega_S^{-1}(\rho)}{\partial\rho}x_S+2\frac{\partial x_S'}{\partial\rho}\Omega_S^{-1}(\rho)\frac{\partial x_S}{\partial\rho}+x_S'\frac{\partial^2\Omega_S^{-1}(\rho)}{\partial\rho^2}x_S.\qquad(A.9)$$

Similar to the nonstationary case, it can be obtained that the first term in (A.9) is null at $\rho=1$. Applying (A.8), it can also be seen that the second term in (A.9) is null at $\rho=1$. Since $e'\Omega_S^{-1}(1)e=0$, the third term is also null at unity. To solve the fourth term in (A.9), it can be verified that

$$\left.\frac{\partial^2\Omega_S^{-1}(\rho)}{\partial\rho^2}\right|_{\rho=1}=2L'L-2e_1e_1'=\operatorname{diag}(0,2,2,\dots,2,0).\qquad(A.10)$$

Hence, $x_S'\{\partial^2\Omega_S^{-1}(\rho)/\partial\rho^2\}x_S|_{\rho=1}=\tfrac12\sum_{t=2}^{T-1}(y_t-y_1)^2+\tfrac12\sum_{t=2}^{T-1}(y_T-y_t)^2-\sum_{t=2}^{T-1}(y_t-y_1)(y_T-y_t)$, and the theorem holds.
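The stationary-case derivative admits the same numerical sanity check. The sketch below is illustrative and not from the paper; it assumes the stationary GLS quadratic form $x'\Omega_S^{-1}(\rho)x=(1-\rho^2)x_1^2+\sum_{t=2}^T(x_t-\rho x_{t-1})^2$, under which (A.6) is the GLS mean:

```python
import numpy as np

def mu_hat_S(y, rho):
    # GLS mean under the stationary weighting (eq. A.6)
    T = len(y)
    return (y[0] + y[-1] + (1 - rho) * np.sum(y[1:-1])) / (2 + (T - 2) * (1 - rho))

def S_S(y, rho):
    # x' Omega_S^{-1}(rho) x = (1 - rho^2) x_1^2 + sum_{t>=2} (x_t - rho x_{t-1})^2
    x = y - mu_hat_S(y, rho)
    quad = (1 - rho**2) * x[0] ** 2 + np.sum((x[1:] - rho * x[:-1]) ** 2)
    return quad / np.sum(np.diff(y) ** 2)

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(200))

h = 1e-5
numeric = (S_S(y, 1 + h) - S_S(y, 1 - h)) / (2 * h)              # central difference
closed = 1 - (y[-1] - y[0]) ** 2 / (2 * np.sum(np.diff(y) ** 2))  # Theorem 3
print(numeric, closed)
```

Note the factor 2 in the closed form relative to Theorem 2: the stationary weighting puts the first and last observations on an equal footing, which halves the endpoint correction.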

Proof of Theorem 4. The proof is given firstly for the stationary case and then for the nonstationary case. The GLS residuals in the numerator of $S_S$ are $x_{S,t}=y_t-\hat\mu-\hat\beta t$, where $\hat\mu$ and $\hat\beta$ are the GLS estimators using the matrix $\Omega_S^{-1}(\rho)$. Likewise, the residuals of the denominator are $x_{1,t}=y_t-\hat\mu_1-\tilde\beta t$, where $\hat\mu_1$ and $\tilde\beta$ are the GLS estimators using the matrix $\Omega^{-1}(1)$. Then, $\hat\mu_1=y_1-\tilde\beta$ and $\tilde\beta=(y_T-y_1)(T-1)^{-1}$. It can then be verified that $x_1'\Omega^{-1}(1)x_1=\sum_{t=2}^T(y_t-y_{t-1}-\tilde\beta)^2$. The numerator of $S_S'(1)$ can be written as

$$\left.\frac{\partial}{\partial\rho}\,x_S'\Omega_S^{-1}(\rho)x_S\right|_{\rho=1}=2\left.\frac{\partial x_S'}{\partial\rho}\Omega_S^{-1}(\rho)x_S\right|_{\rho=1}+\left.x_S'\frac{\partial\Omega_S^{-1}(\rho)}{\partial\rho}x_S\right|_{\rho=1}.\qquad(A.11)$$

After some algebraic manipulation, it can be obtained that

$$\begin{bmatrix}\hat\mu\\ \hat\beta\end{bmatrix}=\begin{bmatrix}(d_3n_1-d_2n_2)/D\\ \{d_1n_2-(1-\rho)d_2n_1\}/D\end{bmatrix},$$

where $n_1=y_T+y_1+(1-\rho)\sum_{t=2}^{T-1}y_t$, $n_2=y_1(1-2\rho)+\{T-(T-1)\rho\}y_T+(1-\rho)^2\sum_{t=2}^{T-1}ty_t$, $d_1=(1+\rho)+(T-1)(1-\rho)$, $d_2=(1+\rho)+\sum_{t=2}^T\{t-(t-1)\rho\}$, $d_3=(1-\rho^2)+\sum_{t=2}^T\{t-(t-1)\rho\}^2$, and $D=d_1d_3-(1-\rho)d_2^2$. It can then be obtained that, when $\rho=1$: $\hat\mu|_{\rho=1}=y_1-(T-1)^{-1}(y_T-y_1)$, $\hat\beta|_{\rho=1}=(T-1)^{-1}(y_T-y_1)\equiv\tilde\beta$, $\partial\hat\mu/\partial\rho|_{\rho=1}=(y_T-y_1)(T/4-\tfrac12)-\tfrac12\sum_{t=2}^{T-1}y_t$, and $\partial\hat\beta/\partial\rho|_{\rho=1}=0$. Then, when $\rho=1$, $x_{S,t}|_{\rho=1}=y_t-y_1-\tilde\beta(t-1)$. After some algebra, it can be seen that $(\partial x_S'/\partial\rho)\Omega_S^{-1}(\rho)|_{\rho=1}=(0,0,\dots,0)$. Then, the first term in (A.11) is null. Applying result (A.8), it can be seen that the second term in (A.11) is equal to $x_1'\Omega^{-1}(1)x_1$, so that $S_S'(1)=1$. For the second derivative, we can make a decomposition similar to (A.9). Following the same arguments as in the first derivative, it can be seen that the second and third terms of this decomposition are zero. It is easy to verify that the term $(\partial^2x_S'/\partial\rho^2)\Omega_S^{-1}(\rho)|_{\rho=1}$ is a vector with all elements equal to zero, except for the first and the last one. The term $x_S|_{\rho=1}$, however, is a vector with first and last elements equal to zero. Therefore, the first term in this decomposition is also null. By (A.10), we obtain that the fourth term of this decomposition is $2\sum_{t=2}^{T-1}\{y_t-y_1-\tilde\beta(t-1)\}^2$ and, then, the theorem holds in the stationary case.

In the nonstationary case, we also have a decomposition like (A.11). The GLS residuals in the numerator of $S_N$ are $x_{N,t}=y_t-\hat\mu-\hat\beta t$, where $\hat\mu$ and $\hat\beta$ are now the GLS estimators using the matrix $\Omega_N^{-1}(\rho)$. After some algebra, we obtain that

$$\begin{bmatrix}\hat\mu\\ \hat\beta\end{bmatrix}=\begin{bmatrix}(b_3m_1-b_2m_2)/F\\ (b_1m_2-b_2m_1)/F\end{bmatrix},$$

where $m_1=(1-\rho)y_T+(1-\rho+\rho^2)y_1+(1-\rho)^2\sum_{t=2}^{T-1}y_t$, $m_2=(1-\rho)^2y_1+\{T-(T-1)\rho\}y_T+(1-\rho)^2\sum_{t=2}^{T-1}ty_t$, $b_1=1+(T-1)(1-\rho)^2$, $b_2=1+(1-\rho)\{(T-\rho)+(1-\rho)\sum_{t=2}^{T-1}t\}$, $b_3=1+\sum_{t=2}^T\{t-(t-1)\rho\}^2$, and $F=b_1b_3-b_2^2$. It can then be obtained that, when $\rho=1$: $\hat\mu|_{\rho=1}=y_1-(T-1)^{-1}(y_T-y_1)$, $\hat\beta|_{\rho=1}=(T-1)^{-1}(y_T-y_1)\equiv\tilde\beta$, $\partial\hat\mu/\partial\rho|_{\rho=1}=0$, and $\partial\hat\beta/\partial\rho|_{\rho=1}=0$. Then, when $\rho=1$, $x_{N,t}|_{\rho=1}=y_t-y_1-\tilde\beta(t-1)$. Therefore, $\partial x_N'/\partial\rho|_{\rho=1}$ is a vector of zeros. Using the same arguments as in the stationary case, we have $S_N'(1)=1$. For the second derivative, we have a decomposition similar to (A.4a)–(A.4b), where it is easy to check that the second and third terms are null. It can also be verified, in this nonstationary case, that $(\partial^2x_N'/\partial\rho^2)\Omega_N^{-1}(\rho)|_{\rho=1}$ is a vector with all elements equal to zero, except the first and the last ones, and that $x_N|_{\rho=1}$ is a vector with first and last elements equal to zero. Therefore, the first term of this decomposition is null. By (A.5), we obtain that the fourth term of this decomposition is $2\sum_{t=2}^{T-1}\{y_t-y_1-\tilde\beta(t-1)\}^2$ and, then, the theorem also holds in the nonstationary case.
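Theorem 4's claim that the derivative at $\rho=1$ equals one in both cases can also be checked numerically. The sketch below is illustrative and not from the paper; it assumes the same AR(1) quasi-differencing quadratic forms as above (first-observation weight $1-\rho^2$ in the stationary case, weight 1 in the nonstationary case) and obtains the GLS coefficients on a constant and trend from the corresponding normal equations:

```python
import numpy as np

def quad_form(a, b, rho, stationary):
    # Bilinear form a' Omega^{-1}(rho) b under AR(1) quasi-differencing.
    w1 = (1.0 - rho**2) if stationary else 1.0
    return w1 * a[0] * b[0] + np.sum((a[1:] - rho * a[:-1]) * (b[1:] - rho * b[:-1]))

def S_trend(y, rho, stationary):
    # S(rho) = x' Omega^{-1}(rho) x / sum_t (dy_t - beta_tilde)^2, with x the
    # GLS residual from a constant and a linear trend.
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    Z = np.column_stack([np.ones(T), t])
    A = np.array([[quad_form(Z[:, i], Z[:, j], rho, stationary) for j in (0, 1)]
                  for i in (0, 1)])
    b = np.array([quad_form(Z[:, i], y, rho, stationary) for i in (0, 1)])
    mu, beta = np.linalg.solve(A, b)     # GLS normal equations
    x = y - mu - beta * t
    beta_tilde = (y[-1] - y[0]) / (T - 1)
    denom = np.sum((np.diff(y) - beta_tilde) ** 2)
    return quad_form(x, x, rho, stationary) / denom

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(100))
h = 1e-4
for stationary in (False, True):
    d = (S_trend(y, 1 + h, stationary) - S_trend(y, 1 - h, stationary)) / (2 * h)
    print(stationary, d)   # Theorem 4: the derivative at rho = 1 is 1
```

The derivative is evaluated slightly away from $\rho=1$ because, in the stationary case, the weight on the constant vanishes at unity and the normal equations become singular there.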

Proof of Theorem 6. (i) In the NGLS-N statistic, $y_t^\mu=y_t-\hat\mu_c$, where $\hat\mu_c$ is estimated by GLS using the matrix $\Omega_N^{-1}(\rho_c)$, $\rho_c=1-c/T$. From Elliott et al. (1996), it can be obtained that $T^{-1/2}y_t^\mu\stackrel{d}{\to}\omega W(r)$. Then, applying that $\lim_{T\to\infty}\rho_c^T=e^{-c}$, $\lim_{T\to\infty}\rho_c^{[Tr]}=e^{-rc}$, and the CMT, it can be verified that $T^{-1}\sigma_u^{-2}(y_T^\mu-\rho_c^{T-1}y_1^\mu)^2\stackrel{d}{\to}(\omega^2/\sigma_u^2)\{W(1)\}^2$ and $T^{-2}\sigma_u^{-2}\sum_{t=2}^T(y_t^\mu-\rho_c^{t-1}y_1^\mu)^2\stackrel{d}{\to}(\omega^2/\sigma_u^2)\int_0^1\{W(r)\}^2\,dr$.

(ii) In the NGLS-$\tau$N test, $y_t^\tau=y_t-\hat\mu_c-\hat\beta_ct$, where the estimation is by GLS using the matrix $\Omega_N^{-1}(\rho_c)$. From Elliott et al. (1996), $T^{-1/2}y_t^\tau\stackrel{d}{\to}\omega U_c(r)$. It can be checked that $U_c(0)=0$. Then, $T^{-1/2}(y_t^\tau-\rho_c^{t-1}y_1^\tau)\stackrel{d}{\to}\omega U_c(r)$. From the CMT, $T^{-2}\sigma_u^{-2}\sum_{t=2}^T(y_t^\tau-\rho_c^{t-1}y_1^\tau)^2\stackrel{d}{\to}(\omega^2/\sigma_u^2)\int_0^1\{U_c(r)\}^2\,dr$.

(iii) In the NGLS-S test, $y_t^\mu=y_t-\hat\mu_c$, where $\hat\mu_c$ is estimated by GLS using $\Omega_S^{-1}(\rho_c)$. From Lemma 3 in Elliott (1999), $T^{-1/2}y_t^\mu\stackrel{d}{\to}\omega V_0^\mu(r)$. Then, $T^{-1/2}(y_T^\mu-\rho_c^{T-1}y_1^\mu)\stackrel{d}{\to}\omega\{V_0^\mu(1)-e^{-c}V_0^\mu(0)\}\equiv\omega W_F^c(1)$ and $T^{-1/2}(y_1^\mu-\rho_c^{T-1}y_T^\mu)\stackrel{d}{\to}\omega\{V_0^\mu(0)-e^{-c}V_0^\mu(1)\}\equiv\omega W_B^c(1)$. Therefore, from the CMT,

$$\frac{(y_T^\mu-\rho_c^{T-1}y_1^\mu)^2}{2T\sigma_u^2}+\frac{(y_1^\mu-\rho_c^{T-1}y_T^\mu)^2}{2T\sigma_u^2}\stackrel{d}{\to}\frac{\omega^2}{2\sigma_u^2}\bigl[\{W_F^c(1)\}^2+\{W_B^c(1)\}^2\bigr].$$

Similarly, it can also be verified that $T^{-1/2}(y_t^\mu-\rho_c^{t-1}y_1^\mu)\stackrel{d}{\to}\omega\{V_0^\mu(r)-e^{-cr}V_0^\mu(0)\}\equiv\omega W_F^c(r)$ and $T^{-1/2}(y_{T-t}^\mu-\rho_c^{t-1}y_T^\mu)\stackrel{d}{\to}\omega\{V_0^\mu(1-r)-e^{-cr}V_0^\mu(1)\}\equiv\omega W_B^c(r)$. Hence,

$$\frac{\sum_{t=2}^T(y_t^\mu-\rho_c^{t-1}y_1^\mu)^2}{2T^2\sigma_u^2}+\frac{\sum_{t=2}^T(y_{T-t}^\mu-\rho_c^{t-1}y_T^\mu)^2}{2T^2\sigma_u^2}\stackrel{d}{\to}\frac{\omega^2}{2\sigma_u^2}\left[\int_0^1\{W_F^c(r)\}^2\,dr+\int_0^1\{W_B^c(r)\}^2\,dr\right].$$

(iv) In the NGLS-$\tau$S test, $y_t^\tau=y_t-\hat\mu_c-\hat\beta_ct$, where the estimation is by GLS with $\Omega_S^{-1}(\rho_c)$. From Elliott (1999), $T^{-1/2}y_t^\tau\stackrel{d}{\to}\omega V_0^\tau(r)$. Therefore, $T^{-1/2}(y_t^\tau-\rho_c^{t-1}y_1^\tau)\stackrel{d}{\to}\omega\{V_0^\tau(r)-e^{-cr}V_0^\tau(0)\}\equiv\omega K_c(r)$. Then, $T^{-2}\sigma_u^{-2}\sum_{t=1}^{T-1}(y_t^\tau-\rho_c^{t-1}y_1^\tau)^2\stackrel{d}{\to}(\omega^2/\sigma_u^2)\int_0^1\{K_c(r)\}^2\,dr$.
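The proof repeatedly uses $\lim_{T\to\infty}\rho_c^{[Tr]}=e^{-rc}$ with $\rho_c=1-c/T$. A quick numerical illustration (not part of the proof) with the recommended $c=8$:

```python
import math

# rho_c = 1 - c/T satisfies rho_c^[Tr] -> exp(-r c) as T grows.
c, r = 8.0, 0.5
for T in (10**3, 10**6):
    rho_c = 1 - c / T
    print(T, rho_c ** int(T * r), math.exp(-r * c))
```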

References

Bhargava, A., 1986. On the theory of testing for unit roots in observed time series. Rev. Econom. Stud. 53, 369–384.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA.
Chao, J., Phillips, P.C.B., 1999. Model selection in partially nonstationary vector autoregression. J. Econometrics 91, 227–272.
Dickey, D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time series with a unit root. J. Amer. Statist. Assoc. 74, 427–431.
Diebold, F.X., Kilian, L., 2000. Unit-root tests are useful for selecting forecasting models. J. Bus. Econom. Statist. 18, 265–273.
Dufour, J.M., King, M.L., 1991. Optimal invariant tests for the autocorrelation coefficient in linear regressions with stationary or nonstationary AR(1) errors. J. Econometrics 47, 115–143.
Elliott, G., 1999. Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution. Internat. Econom. Rev. 40, 767–783.
Elliott, G., Rothenberg, T.J., Stock, J.H., 1996. Efficient tests for an autoregressive unit root. Econometrica 64, 813–836.
Ferguson, T.S., 1967. Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.
Fuller, W.A., 1996. Introduction to Statistical Time Series, 2nd Edition. Wiley, New York.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.
Hwang, J., Schmidt, P., 1996. Alternative methods of detrending and the power of unit-root tests. J. Econometrics 71, 227–248.
Kadiyala, K.R., 1970. Testing for the independence of regression disturbances. Econometrica 38, 97–117.
Koreisha, S.G., Pukkila, T., 1995. A comparison between different order-determination criteria for identification of ARIMA models. J. Bus. Econom. Statist. 13, 127–131.
Leybourne, S.J., 1995. Testing for unit roots using forward and reverse Dickey–Fuller regressions. Oxford Bull. Econom. Statist. 57, 559–571.
Nabeya, S., Tanaka, K., 1990. Limiting power of unit-root tests in time-series regression. J. Econometrics 46, 247–271.
Ng, S., Perron, P., 1995. Unit-root tests in ARMA models with data-dependent methods for the selection of the truncation lag. J. Amer. Statist. Assoc. 90, 268–281.
Ng, S., Perron, P., 2001. Lag length selection and the construction of unit-root tests with good size and power. Econometrica 69, 1519–1554.
Pantula, S.G., Gonzalez-Farias, G., Fuller, W.A., 1994. A comparison of unit-root test criteria. J. Bus. Econom. Statist. 12, 449–459.
Park, H.J., Fuller, W.A., 1995. Alternative estimators and unit root tests for the autoregressive process. J. Time Ser. Anal. 16, 415–429.
Perron, P., Ng, S., 1996. Useful modifications to some unit root tests with dependent errors and their local asymptotic properties. Rev. Econom. Stud. 63, 435–463.
Phillips, P.C.B., 1996. Econometric model determination. Econometrica 64, 763–812.
Phillips, P.C.B., Ploberger, W., 1994. Posterior odds testing for a unit root with data-based model selection. Econometric Theory 10, 774–808.
Ploberger, W., 2001. A complete class of tests when the likelihood is locally asymptotically quadratic. Manuscript, University of Rochester.
Sánchez, I., 2000. Spectral density estimators at frequency zero for nonstationarity tests in ARMA models. Working Paper, Universidad Carlos III de Madrid.
Sánchez, I., 2002. Efficient forecasting in nearly non-stationary processes. J. Forecasting, in preparation.
Sargan, J.D., Bhargava, A., 1983. Testing the residuals from least squares regression for being generated by the Gaussian random walk. Econometrica 51, 153–174.
Sen, D.L., Dickey, D.A., 1987. Symmetric test for second differencing in univariate time series. J. Bus. Econom. Statist. 5, 463–473.
Shin, D.W., Fuller, W.A., 1998. Unit-root tests based on unconditional maximum likelihood estimation for the autoregressive moving average. J. Time Ser. Anal. 19, 591–599.
Shin, D.W., Lee, J.H., 2000. Consistency of the maximum likelihood estimators for nonstationary ARMA regressions with time trends. J. Statist. Plann. Inference 87, 55–68.
Shin, D.W., So, B.S., 1997. Semiparametric unit-root tests based on symmetric estimators. Statist. Probab. Lett. 33, 177–184.
Stock, J.H., 1990. A class of tests for integration and cointegration. Manuscript, Harvard University.
Stock, J.H., 1994. Unit roots, structural breaks, and trends. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics, Vol. IV. Elsevier, Amsterdam.
Tanaka, K., 1996. Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. Wiley, New York.
Weiss, G., 1975. Time-reversibility of linear stochastic processes. J. Appl. Probab. 12, 831–836.
Xiao, Z., Phillips, P.C.B., 1998. An ADF coefficient test for a unit root in ARMA models of unknown order with empirical applications to the US economy. Econometrics J. 1, 27–43.
Yap, S.F., Reinsel, G.C., 1995. Results on estimation and testing for a unit root in the nonstationary autoregressive moving-average model. J. Time Ser. Anal. 16, 339–353.