
Some Tools for Identification of Nonlinear

Time Series

Henrik Aalborg Nielsen∗ and Henrik Madsen†

Dept. of Mathematical Modelling, Technical University of Denmark

DK-2800 Lyngby, Denmark

October 23, 1998

Abstract

In classical time series analysis the sample autocorrelation function (SACF) and the sample partial autocorrelation function (SPACF) have gained wide application for structural identification of linear time series models. For non-linear time series these tools are not applicable since they only address variation which can be explained by linear models, and for this reason they may completely fail to detect non-linear dependencies. We suggest generalizations, founded on smoothing techniques, applicable for structural identification of non-linear time series models. A similar generalization of the sample cross correlation function is discussed. Furthermore, a measure of the departure from linearity is suggested. It is shown how bootstrapping can be applied to test for independence and for linearity. The generalizations do not prescribe a particular smoothing technique. In fact, when the smoothers are replaced by a linear regression the generalizations of SACF and SPACF reduce to a close approximation of their linear counterparts. For this reason a smooth transition from the linear to the non-linear case can be obtained by varying the bandwidth of a local linear smoother. By adjusting the

∗E-mail: [email protected]

†E-mail: [email protected]


flexibility of the smoother the power of the tests for independence and linearity against specific alternatives can be adjusted. The generalizations allow for graphical presentations, very similar to those used for SACF and SPACF. In this report the generalizations are tested on some simulated data sets and on the Canadian lynx data. The generalizations seem to perform well and the measure of the departure from linearity proves to be an important additional tool.

KEY WORDS: Lagged scatter plot; Non-linear time series; Smoothing; Non-parametric; Independence; Bootstrap.

1 Introduction

The sample autocorrelation function and the sample partial autocorrelation function have gained wide application for structural identification of linear time series models. For non-linear time series these tools are not sufficient because they only address linear dependencies.

During the last couple of decades a number of results on properties of, and estimation and testing in, non-linear models have been obtained. For an overview (Priestley, 1988; Tong, 1990; Tjøstheim, 1994) can be consulted. However, considerably fewer results have been seen on the problem of structural identification. Tjøstheim and Auestad (1994) have suggested a method based on kernel estimates to select the significant lags in a non-linear model, and Granger and Lin (1994) used the mutual information coefficient and Kendall's τ as generalizations of the correlation coefficient and Kendall's partial τ as a generalization of the partial correlation coefficient. Chen and Tsay (1993) have considered a best subset modelling procedure and the ACE and BRUTO algorithms, see e.g. (Hastie and Tibshirani, 1990), for identification of non-linear additive ARX models. Recently, Lin and Pourahmadi (1998) have used the BRUTO algorithm to identify the lags needed in a semi-parametric non-linear model. Multivariate adaptive regression splines (Friedman, 1991) were introduced for modelling of non-linear autoregressive time series by Lewis and Stevens (1991). Terasvirta (1994) suggested a modelling procedure for non-linear autoregressive time series in which a (parametric) smooth threshold autoregressive model is used in case a linear model proves to be inadequate. For the case of non-linear transfer functions Hinich (1979) considered the case where the impulse response function of the transfer function depends


linearly on the input process.

In this report we define the new tools LDF (Lag Dependence Function), PLDF (Partial Lag Dependence Function), and NLDF (Non-linear Lag Dependence Function) for structural identification of non-linear time series. The tools can be applied in a way very similar to the sample autocorrelation function and the sample partial autocorrelation function. The tools are based on smoothing techniques, but they are not dependent on any particular smoother; see e.g. (Hastie and Tibshirani, 1990, Chapter 3) for an overview of smoothing techniques. For some smoothers an (almost) continuous transition from the linear to the non-linear case can be obtained by varying the smoothing parameter. Also, smoothers applying optimal selection of the bandwidth may be used; however, see e.g. (Chen and Tsay, 1993) for a discussion of the potential problems in applying criteria such as generalized cross validation to time series data. Under a hypothesis of independence, bootstrap confidence intervals (Efron and Tibshirani, 1993) of the lag dependence function are readily calculated, and we propose that these can also be applied for the partial lag dependence function. Furthermore, the non-linear lag dependence function can be used to test specific linear hypotheses. A modification of the partial lag dependence function is suggested that seemingly is more appropriate for identifying models for prediction. This also applies when linear models are used. The lag dependence function and the non-linear lag dependence function are readily calculated in that only univariate smoothing is needed, whereas multivariate smoothing or backfitting is required for the application of the remaining tools.

The suggested tools are illustrated both by using simulated linear and non-linear time series models, and by considering the Canadian lynx data (Moran, 1953), which have attained a bench-mark status in the time series literature. Using the Canadian lynx data, results very similar to those found by Lin and Pourahmadi (1998) are obtained.

In Section 2 the study is motivated by considering a simple deterministic non-linear process for which the sample autocorrelation function is non-significant. Section 3 describes the relations between multiple linear regression, correlation, and partial correlation with focus on aspects leading to the generalization. The proposed tools are described in Sections 4, 5, 6, and 7, and bootstrapping is considered in Section 8. Examples of application, considering simulated linear and non-linear processes and the Canadian lynx data (Moran, 1953), are found in Section 9. In Section 10 a generalization of the sample cross correlation function is briefly discussed.


Finally, in Section 11 some further remarks are given.

2 Motivation

The sample autocorrelation function (Brockwell and Davis, 1987), commonly used for structural identification in classical time series analysis, measures only the degree of linear dependency. In fact, deterministic series exist for which the sample autocorrelation function is almost zero, see also (Granger, 1983). One such example is xt = 4xt−1(1 − xt−1), for which Figure 1 shows 1000 values using x1 = 0.8 and the corresponding sample autocorrelation function SACF together with an approximate 95% confidence interval of the estimates under the hypothesis that the underlying process is i.i.d. Furthermore, lagged scatter plots for lags one and two are shown. From the plot of the series and the SACF the deterministic structure is not revealed. However, the lagged scatter plots clearly reveal that the series contains a non-linear dynamic dependency.
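The recursion and its near-zero sample autocorrelations are easy to reproduce. The following sketch (not the authors' code; plain Python, with the usual biased-denominator SACF estimator assumed) generates the series and evaluates the first few lags:

```python
def logistic_series(n, x1=0.8):
    """Iterate x[t] = 4*x[t-1]*(1 - x[t-1]) from the starting value x1."""
    x = [x1]
    for _ in range(n - 1):
        x.append(4.0 * x[-1] * (1.0 - x[-1]))
    return x

def sacf(x, k):
    """Sample autocorrelation at lag k (biased-denominator estimator)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n
    return ck / c0

x = logistic_series(1000)
print([round(sacf(x, k), 3) for k in (1, 2, 3)])  # all close to zero
```

Although xt is an exact function of xt−1, all the sample autocorrelations are small, which is the point of the example.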


Figure 1: The time series (top), SACF (bottom, left), xt versus xt−1 (bottom, middle), and xt versus xt−2 (bottom, right) for 1000 values from the recursion xt = 4xt−1(1 − xt−1).

In practice the series will often be contaminated with noise and it is then


difficult to judge from the lagged scatter plots whether any dependence is present. Smoothing the lagged scatter plots will aid the interpretation, but different smoothing parameters may result in quite different estimates. Therefore it is important to separate the variability of the smooth from the underlying dependence.

From Figure 1 it is revealed that, in principle, xt can be regarded as a function of xt−k for any k > 0, but k = 1 is sufficient, since xt can be predicted exactly from xt−1 alone. This indicates that there may exist a non-linear equivalent to the partial autocorrelation function (Brockwell and Davis, 1987) and reveals that substantial information can be obtained by adjusting for the dependence of lags 1, . . . , k − 1 when xt and xt−k are addressed. The sample partial autocorrelation function amounts to a linear adjustment.

3 Preliminaries

Estimates of correlation and partial correlation are closely related to values of the squared degree of determination (R-squared) obtained using linear regression models. The generalizations of the sample autocorrelation function SACF and the sample partial autocorrelation function SPACF are based on similar R-squared values obtained using non-linear models. In this section the relations between multiple linear regression, correlation, and partial correlation are presented.

Consider the multivariate stochastic variable (Y, X1, . . . , Xk). The squared multiple correlation coefficient ρ²_{0(1...k)} between Y and (X1, . . . , Xk) can be written (Kendall and Stuart, 1961, p. 334, Eq. (27.56))

ρ²_{0(1...k)} = ( V[Y] − V[Y | X1, . . . , Xk] ) / V[Y].    (1)

If the variances are estimated using a maximum likelihood estimator, assuming normality, it then follows that an estimate of ρ²_{0(1...k)} is

R²_{0(1...k)} = ( SS_0 − SS_{0(1...k)} ) / SS_0,    (2)

where SS_0 = Σ (y_i − ȳ)² (with ȳ = Σ y_i / N) and SS_{0(1...k)} is the sum of squares of the least squares residuals when regressing y_i linearly on

5

Page 6: Some Tools for Identification of Nonlinear Time Series · of linear time series models. For non-linear time series these tools are not applicable since they only address variation

x_{1i}, . . . , x_{ki} (i = 1, . . . , N). R²_{0(1...k)} is also called the squared degree of determination of the regression and can be interpreted as the relative reduction in variance due to the regressors.

Hence it follows that when regressing y_i linearly on x_{ki} the squared degree of determination R²_{0(k)} equals the squared estimate of the correlation between Y and X_k, and furthermore it follows that R²_{0(k)} = R²_{k(0)}.
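This symmetry is easy to verify numerically. The sketch below (illustrative only; the small data set is made up) computes R-squared via eq. (2) for a simple linear regression and checks that the same value is obtained whichever variable is treated as the response:

```python
def r_squared(y, x):
    """R-squared of the least-squares regression of y on x, via eq. (2)."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx                       # least-squares slope
    alpha = my - beta * mx                 # least-squares intercept
    ss0 = sum((b - my) ** 2 for b in y)    # SS around the mean
    ssres = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    return (ss0 - ssres) / ss0             # relative reduction in variance

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
print(round(r_squared(y, x), 4), round(r_squared(x, y), 4))  # equal values
```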

The partial correlation coefficient ρ_{(0k)|(1...k−1)} between Y and X_k given X_1, . . . , X_{k−1} measures the extent to which, by using linear models, the variation in Y which cannot be explained by X_1, . . . , X_{k−1} can be explained by X_k. Consequently, the partial correlation coefficient is the correlation between (Y | X_1, . . . , X_{k−1}) and (X_k | X_1, . . . , X_{k−1}), see also (Rao, 1965, p. 270). Using (Whittaker, 1990, p. 140) we obtain

ρ²_{(0k)|(1...k−1)} = ( V[Y | X_1, . . . , X_{k−1}] − V[Y | X_1, . . . , X_k] ) / V[Y | X_1, . . . , X_{k−1}].    (3)

For k = 1 it is readily seen that ρ²_{(0k)|(1...k−1)} = ρ²_{0(1)}. If the variances are estimated using the maximum likelihood estimator, assuming normality, it follows that an estimate of ρ²_{(0k)|(1...k−1)} is

R²_{(0k)|(1...k−1)} = ( SS_{0(1...k−1)} − SS_{0(1...k)} ) / SS_{0(1...k−1)}.    (4)

Besides being an estimate of ρ²_{(0k)|(1...k−1)}, this value can also be interpreted as the relative decrease in the variance when including x_{ki} as an additional predictor in the linear regression of y_i on x_{1i}, . . . , x_{k−1,i}. Note that (4) may also be derived from (Ezekiel and Fox, 1959, p. 193).
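A numerical illustration of eq. (4) can be sketched as follows (numpy assumed; the simulated data and coefficients are arbitrary): fit the two nested regressions, form the drop in residual sum of squares, and normalize by the residual sum of squares of the smaller model:

```python
import numpy as np

def ssres(y, X):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.sum((y - X1 @ beta) ** 2))

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(scale=0.3, size=200)

ss_1 = ssres(y, np.column_stack([x1]))           # SS_0(1)
ss_12 = ssres(y, np.column_stack([x1, x2]))      # SS_0(12)
partial_r2 = (ss_1 - ss_12) / ss_1               # eq. (4) with k = 2
print(round(partial_r2, 3))                      # large here: x2 matters
```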

Interpreting R²_{0(1...k)}, R²_{0(k)}, and R²_{(0k)|(1...k−1)} as measures of variance reduction, these can be calculated and interpreted for a wider class of models. In the remaining part of this report a tilde ("~") will be used above values of SS and R² obtained from models other than linear models.

4 Lag Dependence

Assume that observations {x1, . . . , xN} from a stationary stochastic process {Xt} exist. It is readily shown that the estimate of the autocorrelation function in lag k is approximately equal to the estimate of the correlation coefficient between Xt and Xt−k using the observations {x1, . . . , xN}. Hence, the squared SACF(k) can be approximated by the squared degree of determination when regressing xt linearly on xt−k, i.e. R²_{0(k)}.

This observation leads to a generalization of SACF(k), based on the R̃²_{0(k)} obtained from a smooth of the k-lagged scatter plot, i.e. a plot of xt against xt−k. The smooth is an estimate of the conditional mean f_k(x) = E[Xt | Xt−k = x]. We then define the Lag Dependence Function in lag k, LDF(k), as

LDF(k) = sign( f_k(b) − f_k(a) ) √( (R̃²_{0(k)})_+ )    (5)

where a and b are the minimum and maximum over the observations and the subscript "+" indicates truncation of negative values. The truncation is necessary to ensure that (5) is defined. However, the truncation will only become active in extreme cases. For some smoothers and lags it becomes active for the series considered in Figure 1.

Due to the sign, when f_k(·) is restricted to be linear, LDF(k) is a good approximation of SACF(k) and, hence, it can be interpreted as a correlation. In the general case LDF(k) can be interpreted as (the signed square-root of) the part of the overall variation in xt which can be explained by xt−k. Generally, the R-squared for the non-parametric regression of xt on xt−k, R̃²_{0(k)}, does not equal the R-squared for the corresponding non-parametric regression of xt−k on xt, and consequently, unlike SACF(k), the lag dependence function is not an even function. In this report only causal models will be considered; (5) will only be used for k > 0 and by definition LDF(0) will be set equal to one.
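As a concrete illustration, the sketch below implements LDF(k) for the logistic-map series of Section 2, using a Nadaraya-Watson kernel smoother as a stand-in (the report does not prescribe a particular smoother, and both the smoother and the bandwidth h here are arbitrary choices, not the report's local polynomial setup):

```python
import math

def nw_smooth(xs, ys, x0, h):
    """Gaussian-kernel estimate of E[Y | X = x0]."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def ldf(x, k, h=0.1):
    """Lag Dependence Function in lag k, eq. (5), with a kernel smoother."""
    xs, ys = x[:-k], x[k:]                      # k-lagged scatter plot
    fit = [nw_smooth(xs, ys, x0, h) for x0 in xs]
    m = sum(ys) / len(ys)
    ss0 = sum((y - m) ** 2 for y in ys)         # SS around the mean
    ssk = sum((y - f) ** 2 for y, f in zip(ys, fit))
    r2 = max((ss0 - ssk) / ss0, 0.0)            # truncated R-squared
    a, b = min(xs), max(xs)                     # range of the observations
    sign = 1.0 if nw_smooth(xs, ys, b, h) >= nw_smooth(xs, ys, a, h) else -1.0
    return sign * math.sqrt(r2)

x = [0.8]                                       # the series of Section 2
for _ in range(499):
    x.append(4.0 * x[-1] * (1.0 - x[-1]))
print(round(abs(ldf(x, 1)), 2))                 # close to 1
```

In contrast to the near-zero SACF of this series, |LDF(1)| is close to one, reflecting the exact non-linear dependence on lag one.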

5 Strictly Non-Linear Lag Dependence

The lag dependence function described in Section 4 measures both linear and non-linear dependence. If, in the definition of R̃²_{0(k)}, the sum of squares from an overall mean, SS_0, is replaced by the sum of squares from fitting a straight line to the k-lagged scatter plot, a measure of non-linearity is obtained. In this report this will be called the strictly Non-linear Lag Dependence Function in lag k, or NLDF(k).


6 Partial Lag Dependence

For the time series {x1, . . . , xN} the sample partial autocorrelation function in lag k, denoted SPACF(k) or φ_kk, is obtainable as the Yule–Walker estimate of φ_kk in the AR(k) model

Xt = φ_k0 + φ_k1 Xt−1 + . . . + φ_kk Xt−k + et,    (6)

where {et} is i.i.d. with zero mean and constant variance, see also (Brockwell and Davis, 1987, p. 235). An additive, but non-linear, alternative to (6) is

Xt = ϕ_k0 + f_k1(Xt−1) + . . . + f_kk(Xt−k) + et.    (7)

This model may be fitted using the backfitting algorithm (Hastie and Tibshirani, 1990), see also Section 6.1. The function f_kk(·) can be interpreted as a partial dependence function in lag k when the effect of lags 1, . . . , k − 1 is accounted for. If the functions f_kj(·) (j = 1, . . . , k) are restricted to be linear, then f_kk(x) = φ_kk x and the function can be uniquely identified by its slope φ_kk.

However, since the partial autocorrelation function in lag k is the correlation between (Xt | Xt−1, . . . , Xt−(k−1)) and (Xt−k | Xt−1, . . . , Xt−(k−1)), the squared SPACF(k) may also be approximated by R²_{(0k)|(1...k−1)}, based on linear autoregressive models of order k − 1 and k. Using models of the type (7), SPACF(k) may then be generalized using an R-squared value obtained from a comparison of models (7) of order k − 1 and k. This value is denoted R̃²_{(0k)|(1...k−1)} and we define the Partial Lag Dependence Function in lag k, PLDF(k), as

PLDF(k) = sign( f_kk(b) − f_kk(a) ) √( (R̃²_{(0k)|(1...k−1)})_+ ).    (8)

When (7) is replaced by (6), PLDF(k) is a good approximation of SPACF(k). As for LDF(k), generally, PLDF(k) cannot be interpreted as a correlation. However, PLDF(k) can be interpreted as (the signed square-root of) the relative decrease in one-step prediction variance when lag k is included as a predictor. For k = 1 the model (7) corresponding to k − 1 reduces to an overall mean and the R-squared value in (8) is thus R̃²_{0(1)}, whereby PLDF(1) = LDF(1) if the same smoother is used for both


functions. It can be noticed that the same relation exists between the partial autocorrelation function and the autocorrelation function. For k = 0 the partial lag dependence function is set equal to one.

Except for the sign, PLDF(k) may also be based on the completely general autoregressive model

xt = g_k(xt−1, . . . , xt−k) + et    (9)

where g_k : R^k → R. However, the estimation of g_k(·, . . . , ·) under no assumption other than smoothness is not feasible in practice for k larger than, say, three, see also (Hastie and Tibshirani, 1990). Recently, alternatives to (9) have been considered by Lin and Pourahmadi (1998).

6.1 Fitting the Additive Models

To fit the non-linear additive autoregressive model (7) the backfitting algorithm (Hastie and Tibshirani, 1990) is suggested. However, concurvity (Hastie and Tibshirani, 1990) between the lagged values of the time series may exist and, hence, the estimates may not be uniquely defined. For this reason it is suggested to fit models of increasing order, starting with k = 1 and ending with the highest lag K for which PLDF(k) is to be calculated. In the calculation of the residual sum of squares only residuals corresponding to t = K + 1, . . . , N should be used.

For the numerical examples considered in this report local polynomial regression (Cleveland and Devlin, 1988) is used for smoothing. The convergence criterion used is the maximum absolute change in any of the estimates relative to the range of the fitted values. Also, an iteration limit is applied.

For k = 1 the estimation problem reduces to local polynomial regression and hence convergence is guaranteed. If for any k = 2, . . . , K convergence is not obtained, or if the residual sum of squares increases compared to the previous lag, we put f_jk(·) = 0 (j = k, . . . , K) and f_kj(·) = f_{k−1,j}(·) (j = 1, . . . , k − 1). This ensures that convergence is possible for k + 1.
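A bare-bones version of the backfitting loop can be sketched as follows (assumptions: a Gaussian-kernel smoother stands in for the local polynomial regression used in the report, a fixed iteration count replaces the convergence criterion, and the simulated non-linear AR(2) series is made up for illustration):

```python
import math
import random

def smooth(xs, resid, h=0.3):
    """Kernel smooth of resid on xs, evaluated at every point of xs."""
    out = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
        out.append(sum(wi * ri for wi, ri in zip(w, resid)) / sum(w))
    return out

def backfit(y, X, iters=10):
    """Cyclically smooth partial residuals to fit y ~ mean + sum_j f_j(X[j])."""
    n, K = len(y), len(X)
    mean = sum(y) / n
    f = [[0.0] * n for _ in range(K)]
    for _ in range(iters):
        for j in range(K):
            partial = [y[i] - mean - sum(f[m][i] for m in range(K) if m != j)
                       for i in range(n)]
            fj = smooth(X[j], partial)
            c = sum(fj) / n                     # centre each component
            f[j] = [v - c for v in fj]
    return mean, f

random.seed(1)
x = [0.0, 0.0]                                  # made-up non-linear AR(2) data
for _ in range(200):
    x.append(0.8 * math.sin(2.0 * x[-1]) - 0.3 * x[-2] + random.gauss(0.0, 0.3))
y, lags = x[2:], [x[1:-1], x[:-2]]              # response and lags 1, 2
mean, f = backfit(y, lags)
fit = [mean + f[0][i] + f[1][i] for i in range(len(y))]
ssres = sum((a - b) ** 2 for a, b in zip(y, fit))
ss0 = sum((a - mean) ** 2 for a in y)
r2 = 1.0 - ssres / ss0
print(round(r2, 2))  # fraction of variation captured by the additive fit
```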

7 Partial R-squared for Non-Linear Relations

Basically, PLDF(k) compares the fit of a model containing lags 1, . . . , k with the fit of a model containing lags 1, . . . , k − 1. For prediction


purposes it may seem more appropriate to compare the reduction in the one-step prediction variance. From the definition of R²_{0(1...k)} it is seen that

ΔR²_{0(k)} = R²_{0(1...k)} − R²_{0(1...k−1)} = ( SS_{0(1...k−1)} − SS_{0(1...k)} ) / SS_0,    (10)

is the normalized reduction in the (in-sample) one-step prediction error variance when including lag k as a predictor. The Partial R-Squared Function in lag k, PRSF(k), is then defined as

PRSF(k) = sign( f_kk(b) − f_kk(a) ) √( (ΔR̃²_{0(k)})_+ ).    (11)

As for PLDF(k) it is possible, except for the sign, to define PRSF(k) using models of the type (9). For k = 1 the model (7) corresponding to k − 1 reduces to an overall mean and, hence, ΔR̃²_{0(1)} = R̃²_{0(1)}. Consequently, PRSF(1) is equal to PLDF(1) and LDF(1), assuming the same smoother is used in all cases.
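For linear models the identity in eq. (10) can be checked numerically. The sketch below (numpy assumed; the simulated AR(2) series is arbitrary) confirms that the difference of nested R-squared values equals the drop in residual sum of squares normalized by SS_0:

```python
import numpy as np

def fit_ss(y, cols):
    """Residual sum of squares of the least-squares fit of y on the given columns."""
    X = np.column_stack([np.ones(len(y))] + cols)   # intercept plus regressors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

rng = np.random.default_rng(42)
x = rng.normal(size=300)
for t in range(2, 300):                             # made-up AR(2) recursion
    x[t] += 0.5 * x[t - 1] - 0.3 * x[t - 2]
y, l1, l2 = x[2:], x[1:-1], x[:-2]                  # response and lags 1, 2

ss0 = float(np.sum((y - y.mean()) ** 2))            # SS_0
ss1 = fit_ss(y, [l1])                               # SS_0(1)
ss12 = fit_ss(y, [l1, l2])                          # SS_0(12)
lhs = (ss0 - ss12) / ss0 - (ss0 - ss1) / ss0        # R^2_0(12) - R^2_0(1)
rhs = (ss1 - ss12) / ss0                            # right-hand side of (10)
print(abs(lhs - rhs) < 1e-9)                        # True up to rounding
```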

8 Testing Hypotheses of Independence or Linearity

Smoothers usually require one or more smoothing parameters to be selected, see e.g. (Hastie and Tibshirani, 1990, Chapter 3). Therefore, in principle, smoothing parameters can be selected to obtain R-squared values arbitrarily close to one, also when the underlying process is i.i.d. (assuming no ties are present in the data). For this reason it is important to obtain confidence limits of, e.g., the lag dependence function under the hypothesis that the underlying process is i.i.d. and for a given set of smoothing parameters. Furthermore, it seems applicable to use the strictly non-linear lag dependence function to test for linearity. These aspects are considered in this section.

8.1 Testing for Independence

Under the hypothesis that the time series {x1, . . . , xN} consists of observations from an i.i.d. process, the distribution of any of the quantities discussed in the


previous sections, except NLDF(k), can be approximated by generating a large number of i.i.d. time series of length N from an estimate of the density function of the process and recalculating the quantities for each of the generated time series. In this report the empirical density function will be used. However, for short time series it may be more appropriate to condition on a parametric form of the density function.

Methods as outlined above are often denoted bootstrap methods, and in this context various approaches to the calculation of approximate confidence intervals have been addressed extensively in the literature, see e.g. (Efron and Tibshirani, 1993).

If local polynomial smoothers of degree one or larger are used, then at every point the smoother will be more flexible than a globally linear model. Since LDF(k) and PLDF(k) reduce to close approximations of SACF(k) and SPACF(k), respectively, when linear models are used instead of smoothers, ±2/√N will be a lower bound on the 95% confidence interval of LDF(k) and PLDF(k).

Calculation of LDF(k) involves only scatter plot smoothing and, thus, it is faster to calculate than, e.g., PLDF(k). For this reason it is suggested to base a test for independence on LDF(k) for some range k = 1, . . . , K. For an i.i.d. process it is obvious that the distribution of LDF(k) will depend on k only due to the fact that k affects the number of points on the k-lagged scatter plot. Hence, when k ≪ N the distribution of LDF(k) under the hypothesis of independence is approximately independent of k.
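The bootstrap test described above can be sketched compactly (assumptions: a crude bin-means "smoother" replaces a local polynomial smoother, 200 replicates replace the 1000 used in the report, and the Gaussian test series is made up): resample the series i.i.d. from its empirical distribution, recompute the |LDF(1)|-type statistic for each replicate, and use an upper quantile of the null distribution as the limit:

```python
import random

def r2_lag(x, k, bins=10):
    """Truncated R-squared of the k-lagged scatter plot with bin means as the smooth."""
    xs, ys = x[:-k], x[k:]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    groups = [[] for _ in range(bins)]
    for a, b in zip(xs, ys):
        groups[min(int((a - lo) / width), bins - 1)].append(b)
    m = sum(ys) / len(ys)
    ss0 = sum((b - m) ** 2 for b in ys)
    ssk = 0.0
    for g in groups:
        if g:
            gm = sum(g) / len(g)
            ssk += sum((b - gm) ** 2 for b in g)
    return max((ss0 - ssk) / ss0, 0.0)

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
stat = r2_lag(x, 1) ** 0.5                      # |LDF(1)|-type statistic

null = []
for _ in range(200):                            # resample under independence
    xb = [random.choice(x) for _ in range(len(x))]
    null.append(r2_lag(xb, 1) ** 0.5)
limit = sorted(null)[int(0.95 * len(null))]     # upper 95% null quantile
print(round(stat, 3), round(limit, 3))          # both small under independence
```

Because the statistic is strictly positive for a flexible smoother even under independence, the comparison is made on absolute values, in line with the discussion below.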

The sign in the definition of LDF(k) is included only to establish an approximate equality with SACF(k) when linear models are used and to include information about the sign of the average value of the slope. When the observations originate from an i.i.d. process, LDF(k) will be positive with probability 1/2. Consequently, when the smoother is flexible enough the null-distribution of LDF(k) will be bimodal, since in this case R̃²_{0(k)} will be strictly positive. The most efficient way of handling this problem is to base the bootstrap calculations on the absolute value of LDF(k). Hence, an upper confidence limit on |LDF(k)| is to be approximated.

Below the standard, percentile, and BCa methods, all defined in (Efron and Tibshirani, 1993, Chapters 13 and 14), will be briefly discussed. For the series considered in Figure 1, LDF(k) was calculated for k ≤ 12 using a local linear smoother and a nearest neighbour bandwidth of 1/3. The result is shown in Figure 2 together with 95% bootstrap confidence limits


calculated separately for each lag and based on 1000 bootstrap replicates generated under the hypothesis of independence. The BCa limit could not be calculated for lags 1 to 4, since all the bootstrap replicates were either smaller or larger than the actual value of |LDF(k)|. Results corresponding to Figure 2 when the true process is standard Gaussian i.i.d. are shown in Figure 3. For practical purposes an equality of the standard and percentile methods is observed (no difference is visible on the plots), whereas the results obtained using the BCa method are highly dependent on the lag through the value of |LDF(k)|. Hence, the BCa method cannot be used when the confidence limit is only calculated for one lag and used for the remaining lags as outlined above. The high degree of correspondence between the standard and percentile methods indicates that sufficient precision can be obtained using the standard method on fewer bootstrap replicates. This is highly related to the approximate normality of |LDF(k)|, and it is suggested that this is investigated for each application before a choice between the standard and percentile method is made.


Figure 2: Absolute value of the Lag Dependence Function of the deterministic series presented in Figure 1. The dots indicate the maximum over the 1000 bootstrap replicates. Standard, percentile, and BCa 95% confidence limits are indicated by lines (BCa dotted).

Figure 3: Absolute value of the Lag Dependence Function of 1000 observations from a standard Gaussian i.i.d. process. The dots indicate the maximum over the 1000 bootstrap replicates. Standard, percentile, and BCa 95% confidence limits are indicated by lines (BCa dotted).

The underlying model of the BCa method assumes that the estimate in question may be biased, that the variance of the estimate depends linearly on an increasing transformation of the true parameter (Efron and Tibshirani, 1993, pp. 326-328), and furthermore that the estimate is normally distributed. The bias and the slope of the line are then estimated from the data. With λ being the fraction of the bootstrap replicates strictly below the original estimate, the bias is Φ⁻¹(λ), where Φ is the cumulative standard Gaussian distribution function. This explains why the BCa limit does not exist for lags 1-4 of the deterministic series. The slope is estimated by use of the jackknife procedure (Efron and Tibshirani, 1993, p. 186). It seems that, although the underlying model of the BCa method is a superset of the underlying model of the standard method, the estimation of bias and slope induces some additional variation in the confidence limit obtained. As a consequence it may be advantageous to average the BCa limits over the lags and use this value instead of the individual values. However, the standard and percentile methods seem to be appropriate for this application, and since significant savings of computational effort can be obtained with these methods, it is suggested that only these are applied on a routine basis.
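For reference, the BCa ingredients described above can be sketched as follows (a sketch of the construction in Efron and Tibshirani, 1993, Ch. 14; the function name is ours). The degenerate case where all replicates fall on one side of the original estimate is reported as non-existing, matching the failure observed for lags 1-4.

```python
import numpy as np
from statistics import NormalDist

def bca_upper(theta_hat, reps, jack_thetas, level=0.95):
    """BCa upper limit from bootstrap replicates `reps` and jackknife
    estimates `jack_thetas` of the statistic.
    z0: bias correction, Phi^{-1} of the fraction of replicates strictly
        below the original estimate; undefined when that fraction is 0 or 1.
    a:  acceleration, from the jackknife estimates."""
    nd = NormalDist()
    lam = float(np.mean(reps < theta_hat))
    if lam == 0.0 or lam == 1.0:
        return None                      # BCa limit does not exist
    z0 = nd.inv_cdf(lam)
    jm = jack_thetas.mean()
    num = np.sum((jm - jack_thetas) ** 3)
    den = 6.0 * np.sum((jm - jack_thetas) ** 2) ** 1.5
    a = num / den if den else 0.0
    z = nd.inv_cdf(level)
    alpha = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    return float(np.quantile(reps, alpha))
```

When z0 ≈ 0 and a ≈ 0 the BCa limit reduces to the percentile limit; the extra variation mentioned above enters through the estimated z0 and a.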

8.2 Testing for Linearity

Assuming a specific linear model, this model can be used for simulation, and an approximate bootstrap confidence limit for |NLDF(k)| can be obtained under it. Consequently, the alternative contains both linear and non-linear models. For the approach to be sensible, the linear model needs to be appropriately selected, i.e. using the standard time series tools of identification, estimation, and validation. Alternatively, the simulations can be performed using autocovariances only, assuming these to be zero after a specific lag. In this case an estimator of autocovariance must be used that ensures that the autocovariance function used for simulation is non-negative definite, see e.g. (Brockwell and Davis, 1987, p. 27). Note that using this approach on the series considered in Figure 1 will, essentially, result in a test for independence.
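The parametric bootstrap under a fitted linear model can be sketched as below. This assumes an OLS-fitted Gaussian AR(p) model; the estimator and the burn-in length are our choices for the sketch, not prescribed by the report.

```python
import numpy as np

def fit_ar(x, p):
    """OLS fit of an AR(p) model with intercept.
    Returns (coefficients for lags 1..p, intercept, residual std)."""
    X = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)]
                        + [np.ones(len(x) - p)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    resid = x[p:] - X @ beta
    return beta[:p], beta[p], resid.std(ddof=p + 1)

def simulate_ar(coefs, c, sigma, n, rng, burn=200):
    """One parametric-bootstrap replicate under the fitted Gaussian AR model."""
    p = len(coefs)
    x = np.zeros(n + burn)
    e = rng.normal(0.0, sigma, n + burn)
    for t in range(p, n + burn):
        x[t] = c + coefs @ x[t - p : t][::-1] + e[t]
    return x[burn:]
```

Each replicate series is then passed through the NLDF calculation, and the empirical distribution of |NLDF(k)| over replicates gives the approximate confidence limit under the linear null.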

Hjellvik and Tjøstheim (1996) consider a similar test for linearity and use Akaike's information criterion (Brockwell and Davis, 1987) to select an appropriate AR(p) model under which the bootstrap replicates are generated. In (Theiler, Eubank, Longtin, Galdrikian and Farmer, 1992) a range of alternative linear null hypotheses is considered. In particular, the random sampling of the phase spectrum described in Section 2.4.1 of that reference seems to be a relevant linear null hypothesis.

8.3 Confidence Limit for |PLDF(k)|

In Section 8.1 it is shown how bootstrapping can be used to construct an approximate confidence limit for |LDF(k)|. There is some indication that this limit can also be used for |PLDF(k)| if the same smoother is used for the calculation of LDF(k) and fk1(·), ..., fkk(·) (Sections 4 and 6).

For (linear) autoregressive models of order p, with i.i.d. N(0, σ²) errors, fitted using N observations, it holds approximately that the residual sum of squares is distributed as σ²χ²(N − p) (Brockwell and Davis, 1987, pp. 251 and 254). Therefore, if the true process is i.i.d. with variance σ², the following approximations apply when linear autoregressive models are used:

SS0 ∼ σ²χ²(N − 1)   (12)

SS0(k) ∼ σ²χ²(N − 2)   (13)

SS0(1...k−1) ∼ σ²χ²(N − k)   (14)

SS0(1...k) ∼ σ²χ²(N − k − 1)   (15)

For N ≫ k the distributions of all four sums of squares are approximately equal.
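A small Monte Carlo illustration of (12)-(15), assuming OLS-fitted autoregressions: for i.i.d. data the residual sums of squares for the null fit and for an AR(k) fit have nearly the same mean when N ≫ k. (With this OLS convention the exact degrees of freedom differ slightly from (12)-(15), since only N − k residuals enter the AR(k) fit, but the point that the distributions nearly coincide is unchanged.)

```python
import numpy as np

def resid_ss_ar(x, p):
    """Residual sum of squares from an OLS-fitted AR(p) model with
    intercept (p = 0 gives the sum of squares about the mean, SS0)."""
    if p == 0:
        return float(np.sum((x - x.mean()) ** 2))
    X = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)]
                        + [np.ones(len(x) - p)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return float(np.sum((x[p:] - X @ beta) ** 2))
```

For standard Gaussian i.i.d. data of length N = 400 and k = 3, the means of the two statistics over repeated simulation are both within a few percent of N, as the approximations above predict.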

For locally weighted regression, Cleveland and Devlin (1988) state that the distribution of the residual sum of squares can be approximated by a constant multiplied by a χ² variable, see also (Hastie and Tibshirani, 1990, Section 3.9). Furthermore, for generalized additive models, Hastie and Tibshirani (1990, Section 8.1) use a χ² distribution with degrees of freedom equal to the number of observations minus a quantity depending on the flexibility of the smoothers used.

For these reasons we conjecture that when N ≫ k, and when the same smoother is used for LDF(k) and PLDF(k) as outlined in the beginning of this section, the sums of squares SS0, SS0(k), SS0(1...k−1), and SS0(1...k) will follow approximately the same distribution.

This conjecture leads to approximate equality of means and variances of the sums of squares. Since for both LDF(k) and PLDF(k) the compared models differ by an additive term, estimated by the same smoother in both cases, we also conjecture that for an i.i.d. process

Cor[SS0(k), SS0] ≈ Cor[SS0(1...k), SS0(1...k−1)]. (16)

Using linearizations about the means of the sums of squares, it then follows from the approximate equality of means that

E[|LDF (k)|] ≈ E[|PLDF (k)|], (17)

and from both conjectures that

V [|LDF (k)|] ≈ V [|PLDF (k)|]. (18)

Eqs. (17) and (18) tell us that the approximate i.i.d. confidence limit obtained for |LDF(k)| can also be used as an approximate limit for |PLDF(k)|. In Section 9 (Canadian lynx data) an example of the quality of the approximation is given, and the arguments above seem to be confirmed by the bootstrap limits obtained in that example.


9 Examples

9.1 Linear processes

Below it is briefly illustrated how LDF, PLDF, and PRSF behave compared to SACF and SPACF in the case of simple linear processes. The AR(2) process

Xt = 1.13Xt−1 − 0.64Xt−2 + et (19)

and the MA(2) process

Xt = et + 0.6983et−1 + 0.5247et−2 (20)

are considered, where in both cases {et} is i.i.d. N(0, 1).
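The two processes can be simulated in outline as follows. This is a sketch in Python with a generic normal generator; since the report used the S-PLUS generator, the exact sample paths of Figures 4 and 5 will not be reproduced.

```python
import numpy as np

def simulate_eq19_eq20(n=100, seed=0):
    """Simulate n values from the AR(2) process (19) and the MA(2)
    process (20), both driven by i.i.d. N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    burn = 100                                # discard the AR transient
    e = rng.normal(size=n + burn)
    ar = np.zeros(n + burn)
    for t in range(2, n + burn):              # Eq. (19)
        ar[t] = 1.13 * ar[t - 1] - 0.64 * ar[t - 2] + e[t]
    e2 = rng.normal(size=n + 2)               # fresh noise for the MA part
    ma = e2[2:] + 0.6983 * e2[1:-1] + 0.5247 * e2[:-2]   # Eq. (20)
    return ar[burn:], ma
```

For (19) the theoretical lag-one autocorrelation is φ1/(1 − φ2) = 1.13/1.64 ≈ 0.69, and for (20) it is θ1(1 + θ2)/(1 + θ1² + θ2²) ≈ 0.60, with zero autocorrelation beyond lag two; long simulations from the sketch reproduce these values.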

Figures 4 and 5 contain plots based on 100 simulated values from (19) and (20), respectively (the default random number generator of S-PLUS version 3.4 for HP-UX was used). Each figure shows SACF and SPACF. The remaining plots are LDF, PLDF, and PRSF for local linear smoothers using a nearest neighbour bandwidth of 1.00 (2nd row), 0.50 (3rd row), and 0.10 (bottom row). 95% confidence intervals are indicated by dotted lines. The confidence intervals obtained for LDF are included on the plots of PLDF.

For the calculation of PLDF and PRSF a convergence criterion (see Section 6.1) of 0.01 and an iteration limit of 20 are used. Standard bootstrap intervals are calculated for LDF under the i.i.d. hypothesis using 200 replicates. For LDF the agreement with SACF is large for nearest neighbour bandwidths 1.0 and 0.5. As expected, the range of the confidence interval increases with decreasing bandwidth, and, using the smallest bandwidth, it is almost impossible to reject the i.i.d. hypothesis.

When a nearest neighbour bandwidth of 1.0 is used, PLDF agrees well with SPACF for the lower half of the lags, whereas PLDF is exactly zero for most of the upper half of the lags. Similar comments apply for nearest neighbour bandwidths 0.5 and 0.1. This is due to the function estimates being set equal to zero when the iteration limit is exceeded. The alternative to PLDF, namely PRSF, behaves similarly, but for lags two or larger the absolute value of PRSF is smaller than the absolute value of PLDF. This is because PRSF addresses decreases in (normalized) prediction error variances, whereas PLDF compares the fit relative to the fit obtained when the last lag is excluded.


Figure 4: Plots of autocorrelation functions and their generalizations for 100 observations from the AR(2) process (19).


Figure 5: Plots of autocorrelation functions and their generalizations for 100 observations from the MA(2) process (20).


9.2 Non-linear processes

Three non-linear processes are addressed, namely (i) the non-linear autoregressive process (NLAR(1))

Xt = 1 / (1 + exp(−5Xt−1 + 2.5)) + et, (21)

where {et} is i.i.d. N(0, 0.1²), and (ii) the non-linear moving average process (NLMA(1))

Xt = et + 2 cos(et−1), (22)

where {et} is i.i.d. N(0, 1), and (iii) the non-linear deterministic process described in Section 2, called DNLAR(1) in the following. For all three cases 1000 observations are generated. The starting value for NLAR(1) is set to 0.5 and for DNLAR(1) it is set to 0.8. Plots of the series NLAR(1) and NLMA(1) are shown in Figure 6. The plot of DNLAR(1) is shown in Figure 1.
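The two stochastic series can be generated as sketched below; the function names are ours, while the recursions, noise variances, and the NLAR(1) starting value follow (21) and (22).

```python
import numpy as np

def simulate_nlar1(n, x0=0.5, sd=0.1, seed=0):
    """Eq. (21): X_t = 1 / (1 + exp(-5 X_{t-1} + 2.5)) + e_t,
    with e_t i.i.d. N(0, sd^2) and starting value x0."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = 1.0 / (1.0 + np.exp(-5.0 * x[t - 1] + 2.5)) + rng.normal(0.0, sd)
    return x

def simulate_nlma1(n, seed=0):
    """Eq. (22): X_t = e_t + 2 cos(e_{t-1}), with e_t i.i.d. N(0, 1)."""
    e = np.random.default_rng(seed).normal(size=n + 1)
    return e[1:] + 2.0 * np.cos(e[:-1])
```

Note that for (22) the lag-one autocovariance is 2E[e cos(e)] = 0, so the linear SACF sees nothing even though X_t depends strongly on e_{t−1}; this is exactly what Figure 8 shows.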


Figure 6: Plots of the series NLAR(1) (top) and NLMA(1) (bottom).

For the calculation of LDF, PLDF, and NLDF a local linear smoother with a nearest neighbour bandwidth of 0.5 is used. Actually, lagged scatter plots indicate that a local quadratic smoother should be applied, at least for NLMA(1) and DNLAR(1), but to avoid a perfect fit for the deterministic series a local linear smoother is used. Confidence intervals are constructed using standard normal intervals, since normal QQ-plots of the absolute values of the 200 bootstrap replicates showed this to be appropriate. The confidence interval obtained for LDF is included on the plots of PLDF.


Figure 8 shows SACF, SPACF, LDF, and PLDF for the three series. For NLMA(1) and DNLAR(1) the linear tools, SACF and SPACF, indicate independence, while LDF shows that lag dependence is present. From these observations it can be concluded that NLMA(1) and DNLAR(1) are non-linear processes. From the plots of LDF and PLDF it cannot be inferred whether NLMA(1) is of the autoregressive or of the moving average type. For DNLAR(1) the autoregressive property is clearer, since PLDF drops to exactly zero after lag two. In the case of DNLAR(1), a more flexible smoother will result in values of LDF significantly different from zero for lags larger than two, while, for NLMA(1), LDF will be close to zero for lags larger than one regardless of the flexibility of the smoother used. This indicates that DNLAR(1) is of the autoregressive type and NLMA(1) of the moving average type.

For NLAR(1) the linear tools indicate that the observations come from an AR(1) process. This is not seriously contradicted by LDF or PLDF, although LDF declines somewhat more slowly to zero than SACF. To investigate whether the underlying process is linear, a Gaussian AR(1) model is fitted to the data, and this model is used as the hypothesis under which 200 (parametric) bootstrap replicates of NLDF are generated. Figure 7 shows NLDF and a 95% standard normal interval, constructed under the hypothesis mentioned above. A normal QQ-plot shows that the absolute values of the bootstrap replicates are approximately Gaussian. From the plot it is concluded that the underlying process is not the estimated AR(1) model, and based on PLDF it is thus concluded that the observations originate from a non-linear process of AR(1) type.


Figure 7: NLDF for NLAR(1), including a 95% confidence interval under the assumption of an AR(1) process (dotted).



Figure 8: SACF, SPACF, LDF, and PLDF for the series NLMA(1), DNLAR(1), and NLAR(1) (columns, left to right).


9.3 Canadian lynx data

Recently, Lin and Pourahmadi (1998) analyzed the Canadian lynx data (Moran, 1953) using non-parametric methods quite similar to the methods presented in this report. The data are included in the software S-PLUS (version 3.4 for HP-UX) and described in (Tong, 1990, Section 7.2). In this report a thorough analysis of the data will not be presented; rather, the data are used to illustrate how the suggested methods can be applied. As in (Lin and Pourahmadi, 1998), the data are log10-transformed prior to the analysis.

For the transformed data, LDF, PLDF, and NLDF are computed using a local quadratic smoother and nearest neighbour bandwidths of 0.5 and 1.0. For LDF, 200 bootstrap replicates are generated under the i.i.d. hypothesis, and QQ-plots indicate that standard normal intervals are appropriate. The same applies for NLDF, with the exception that the bootstrap replicates are generated under the hypothesis that the AR(2) model of Moran (1953), also described by Lin and Pourahmadi (1998), is true. Confidence intervals are also computed for PLDF for the nearest neighbour bandwidth of 1.0. These intervals are based on 100 bootstrap replicates of PLDF generated under the i.i.d. hypothesis. QQ-plots indicate that the percentile method should be applied to the absolute values of PLDF.

In Figure 9, plots of LDF, NLDF, and PLDF are shown. Dotted lines indicate 95% confidence intervals under the i.i.d. hypothesis (LDF) and under the AR(2) model of Moran (1953) (NLDF). The intervals obtained for LDF are also shown on the plots of PLDF. Furthermore, for the nearest neighbour bandwidth of 1.0, a 95% confidence interval for white noise is included on the plot of PLDF (solid lines).

The plots of LDF clearly reveal that the process is not i.i.d. The plots of NLDF for a nearest neighbour bandwidth of 0.5 show hardly any significant values, but when a nearest neighbour bandwidth of 1.0 is used, lags three and four show weak significance. This indicates that a small departure from linearity is present in the data. Finally, the plots of PLDF clearly illustrate that lags one and two are the most important lags and that the other lags are, practically, non-significant. In conclusion, an appropriate model seems to be a non-linear autoregressive model containing lags one and two, i.e. a model of the type (7) with k = 2.

Estimation in this model using local quadratic smoothers and a nearest neighbour bandwidth of 1.0 yields the results shown in Figure 10. The response for lag one seems to be nearly linear. This aspect should be investigated further. The results agree well with the results of Lin and Pourahmadi (1998).


Figure 9: Canadian lynx data (log10-transformed). Plots of LDF, NLDF, and PLDF using local quadratic smoothers and nearest neighbour bandwidths 0.5 (top row) and 1.0 (bottom row).

10 Lagged Cross Dependence

Given two time series {x1, ..., xN} and {y1, ..., yN}, the Sample Cross Correlation Function between the processes {Xt} and {Yt} at lag k (SCCFxy(k)) is an estimate of the correlation between Xt−k and Yt. It is possible to generalize this in a way similar to the way LDF is constructed. Like SCCF, this generalization will be sensitive to autocorrelation, or lag dependence, in {Xt} in general. For SCCF this problem is (approximately) solved by prewhitening (Brockwell and Davis, 1987, p. 402). However, prewhitening depends heavily on the assumption of linearity, in that it relies on the impulse response from the noise being independent of the level. For this reason, in the non-linear case it is not possible to use prewhitening, and the appropriateness of the generalization of SCCF depends on {Xt} being i.i.d.
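One way to make the generalization concrete is the signed R² of a smooth of Yt on Xt−k, analogous to LDF. This is our sketch, not the report's implementation: a Nadaraya-Watson smoother is used, and the correlation sign stands in for the sign of the average slope.

```python
import numpy as np

def lagged_cross_r2(x, y, k, bandwidth=0.5):
    """Signed R^2 of a kernel smooth of Y_t on X_{t-k} (k >= 1): one
    plausible generalization of SCCF_xy(k), appropriate when {X_t}
    is i.i.d. (see text)."""
    xk, yk = x[:-k], y[k:]
    h = bandwidth * np.std(xk)
    w = np.exp(-0.5 * ((xk[:, None] - xk[None, :]) / h) ** 2)
    fit = (w @ yk) / w.sum(axis=1)
    ss0 = np.sum((yk - yk.mean()) ** 2)
    r2 = (ss0 - np.sum((yk - fit) ** 2)) / ss0
    return np.sign(np.corrcoef(xk, yk)[0, 1]) * r2
```

For example, with Yt = X²t−2 plus noise and {Xt} i.i.d. Gaussian, SCCF is close to zero at all lags, while the statistic above is large in absolute value at lag two.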



Figure 10: Non-linear additive autoregressive model for the log10-transformed Canadian lynx data (f21(·) solid, f22(·) dotted). The estimate of the constant term is 2.76 and the MSE of the residuals is 0.0414.

11 Final Remarks

The generalizations of the sample correlation functions reduce to their linear counterparts when the smoothers are replaced by linear models. Hence, if a local linear smoother is applied, an almost continuous transition from linear to non-linear measures of dependence is obtainable via the bandwidth of the smoother.

In the test for an i.i.d. process the alternative contains both linear and non-linear models. The degree of departure from linearity that can be detected by the test can be adjusted by selecting the flexibility of the smoother. If, e.g., a local quadratic smoother with a nearest neighbour bandwidth of 100% is applied, the test will have large power against an "almost quadratic" alternative. If, on the other hand, a more flexible smoother is used, the power against an "almost quadratic" alternative will be lower.

Optimal bandwidth selection is not addressed in this report. The methods can still be applied in that setting, but the power against specific alternatives cannot then be adjusted.

The estimation in the non-linear additive autoregressive model, which is used in the generalization of the sample partial autocorrelation function, breaks down in the case of concurvity. Hence, the lags enter the model sequentially, and in case of non-convergence of the backfitting iterations the last lag is, essentially, excluded from all subsequent models, cf. Section 6.1. Furthermore, slow convergence has been observed in some cases. Hence, the particular procedure is probably not optimal. The procedure may be replaced by a modified BRUTO algorithm (Hastie and Tibshirani, 1990, pp. 262-263) in which the smoothing parameters are fixed but where the null fit is still a possibility. However, due to the autoregression, e.g. local regression smoothers will then be non-linear, and the approximation to the degrees of freedom used in the BRUTO algorithm may not be applicable. Inspired by the work of Ye (1998), the degrees of freedom may be defined as the sum of the sensitivities of the fitted values with respect to the observations. Using this definition the degrees of freedom will equal the trace of the smoother matrix plus a quantity involving partial derivatives of elements of the smoother matrix with respect to the observations.
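The sequential estimation rests on a standard backfitting loop (Hastie and Tibshirani, 1990); a minimal sketch is given below, with the smoother passed in as a function and a convergence criterion mirroring the 0.01 limit used in Section 9.1. On non-convergence the caller can set the last component to zero, as the report's procedure does; the function names are ours.

```python
import numpy as np

def backfit_additive(X, y, smooth, max_iter=20, tol=0.01):
    """Backfitting for the additive model y = c + sum_j f_j(X[:, j]).
    `smooth(x, z)` must return a fit of z on x at the points x.
    Returns (constant, fitted components, converged flag)."""
    n, p = X.shape
    c = y.mean()
    f = np.zeros((n, p))
    for _ in range(max_iter):
        change = 0.0
        for j in range(p):
            # partial residual: remove all components except the j-th
            partial = y - c - f.sum(axis=1) + f[:, j]
            new = smooth(X[:, j], partial)
            new -= new.mean()            # centre each f_j for identifiability
            change = max(change, np.abs(new - f[:, j]).max())
            f[:, j] = new
        if change < tol:
            return c, f, True
    return c, f, False                   # did not converge within max_iter
```

With a linear smoother and independent regressors the loop converges in a few iterations; concurvity among the lagged inputs is precisely the situation in which `change` fails to shrink and the non-convergence branch is taken.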

If the conditional mean of the series can be modelled, the methods described in this report can be applied to the series of squared residuals, and the conditional variance can, possibly, be addressed in this way. This approach is quite similar to the approach of Tjøstheim and Auestad (1994, Section 5).

The first author has created an experimental S-PLUS library implementing most of the methods described in this report, using local polynomial smoothers. The software has been used for most of the numerical calculations in this report. A copy of the software can be obtained by contacting the first author.

References

Brockwell, P. J. and Davis, R. A. (1987), Time Series: Theory and Methods, Springer-Verlag, Berlin/New York.

Chen, R. and Tsay, R. S. (1993), ‘Nonlinear additive ARX models’, Journal of the American Statistical Association 88, 955–967.

Cleveland, W. S. and Devlin, S. J. (1988), ‘Locally weighted regression: An approach to regression analysis by local fitting’, Journal of the American Statistical Association 83, 596–610.

Efron, B. and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, Chapman & Hall, London/New York.

Ezekiel, M. and Fox, K. A. (1959), Methods of Correlation and Regression Analysis, third edn, John Wiley & Sons, Inc., New York.


Friedman, J. H. (1991), ‘Multivariate adaptive regression splines’, The Annals of Statistics 19, 1–67. (Discussion: pp. 67–141).

Granger, C. and Lin, J.-L. (1994), ‘Using the mutual information coefficient to identify lags in nonlinear models’, Journal of Time Series Analysis 15, 371–384.

Granger, C. W. J. (1983), Forecasting white noise, in A. Zellner, ed., ‘Applied Time Series Analysis of Economic Data’, U.S. Department of Commerce, Washington, pp. 308–314.

Hastie, T. J. and Tibshirani, R. J. (1990), Generalized Additive Models, Chapman & Hall, London/New York.

Hinich, M. J. (1979), ‘Estimating the lag structure of a nonlinear time series model’, Journal of the American Statistical Association 74, 449–452.

Hjellvik, V. and Tjøstheim, D. (1996), ‘Nonparametric statistics for testing of linearity and serial independence’, Journal of Nonparametric Statistics 6, 223–251.

Kendall, M. and Stuart, A. (1961), Inference and Relationship, Vol. 2 of The Advanced Theory of Statistics, first edn, Charles Griffin & Company Limited, London.

Lewis, P. A. W. and Stevens, J. G. (1991), ‘Nonlinear modeling of time series using multivariate adaptive regression splines (MARS)’, Journal of the American Statistical Association 86, 864–877.

Lin, T. C. and Pourahmadi, M. (1998), ‘Nonparametric and non-linear models and data mining in time series: a case-study on the Canadian lynx data’, Applied Statistics 47, 187–201.

Moran, P. A. P. (1953), ‘The statistical analysis of the sunspot and lynx cycles’, Journal of Animal Ecology 18, 115–116.

Priestley, M. B. (1988), Non-linear and Non-stationary Time Series Analysis, Academic Press, New York/London.

Rao, C. R. (1965), Linear Statistical Inference and its Applications, Wiley, New York.


Terasvirta, T. (1994), ‘Specification, estimation, and evaluation of smooth transition autoregressive models’, Journal of the American Statistical Association 89, 208–218.

Theiler, J., Eubank, S., Longtin, A., Galdrikian, B. and Farmer, J. D. (1992), ‘Testing for nonlinearity in time series: the method of surrogate data’, Physica D 58, 77–94.

Tjøstheim, D. (1994), ‘Non-linear time series: A selective review’, Scandinavian Journal of Statistics 21, 97–130.

Tjøstheim, D. and Auestad, B. H. (1994), ‘Nonparametric identification of nonlinear time series: Selecting significant lags’, Journal of the American Statistical Association 89, 1410–1419.

Tong, H. (1990), Non-linear Time Series. A Dynamical System Approach, Oxford University Press, Oxford.

Whittaker, J. (1990), Graphical Models in Applied Multivariate Statistics, Wiley, New York.

Ye, J. (1998), ‘On measuring and correcting the effects of data mining and model selection’, Journal of the American Statistical Association 93(441), 120–131.
