
Journal of Econometrics 119 (2004) 323–353
www.elsevier.com/locate/econbase

Kernel-based nonlinear canonical analysis and time reversibility

Serge Darolles (a), Jean-Pierre Florens (b), Christian Gouriéroux (c,∗)

(a) CREST and Société Générale Asset Management, France
(b) GREMAQ and IDEI, Toulouse, France
(c) CEPREMAP and CREST, Laboratoire de Finance-Assurance, 15 Boulevard Gabriel Péri, 92245 Malakoff Cedex, France

Abstract

We consider a kernel-based approach to nonlinear canonical correlation analysis and its implementation for time series. We deduce a test procedure of the reversibility hypothesis. The method is applied to the analysis of stochastic differential equations from high-frequency data on stock returns.
© 2003 Published by Elsevier B.V.

JEL classification: C14; C22; C40

Keywords: Nonlinear canonical analysis; Kernel estimators; Reversibility hypothesis; Diffusion equations; High-frequency data

1. Introduction

Let us consider a multivariate stationary process $(X_t)$ with a continuous distribution, and denote by $f_h$ the joint density of $(X_t, X_{t-h})$ and by $f$ the marginal density of $X_t$. Under weak conditions, the joint density can be decomposed as (see e.g. Barrett and Lampard, 1955; Dunford and Schwartz, 1963; Lancaster, 1968):

$$f_h(x_t, x_{t-h}) = f(x_t)\, f(x_{t-h}) \left[ 1 + \sum_{i=1}^{\infty} \lambda_{i,h}\, \varphi_{i,h}(x_t)\, \psi_{i,h}(x_{t-h}) \right], \tag{1}$$

∗ Corresponding author. Tel.: +33-1-4117-3593; fax: +33-1-4117-7666. E-mail address: [email protected] (C. Gouriéroux).

0304-4076/$ - see front matter © 2003 Published by Elsevier B.V. doi:10.1016/S0304-4076(03)00199-4


where the canonical correlations $\lambda_{i,h}$, $i$ varying, are decreasing: $\lambda_{1,h} \ge \lambda_{2,h} \ge \cdots \ge 0$, $\forall h$, and the canonical directions satisfy the orthogonality conditions:

$$\begin{aligned}
E[\varphi_{i,h}(X_t)\,\varphi_{k,h}(X_t)] &= 0, \quad \forall k \ne i,\ \forall h,\\
E[\psi_{i,h}(X_t)\,\psi_{k,h}(X_t)] &= 0, \quad \forall k \ne i,\ \forall h,\\
E[\varphi_{i,h}(X_t)] = E[\psi_{i,h}(X_t)] &= 0, \quad \forall i, h,
\end{aligned} \tag{2}$$

and the normalization conditions:

$$V[\varphi_{i,h}(X_t)] = V[\psi_{i,h}(X_t)] = 1, \quad \forall i, h.$$

In this note we introduce kernel-based nonparametric estimators of the canonical correlations and canonical directions.

The nonlinear canonical decompositions (1) are the basis for analyzing nonlinear dynamics. For instance, they are used to estimate indirectly the drift and the volatility functions of a univariate stochastic differential equation from discrete time data (see Hansen and Scheinkman, 1995; Kessler and Sørensen, 1999; Darolles and Gouriéroux, 2001; Hansen et al., 1998; Chen et al., 1998). They are also used to define the nonlinear autocorrelograms, i.e. the sequence ($\lambda_{i,h}$, $h$ varying) (see Ding and Granger, 1996; Gouriéroux and Jasiak, 2002), to characterize the dynamic models with finite dimensional predictor space (see Gouriéroux and Jasiak, 2001), to test for the independence hypothesis (see Rosenblatt, 1975 and Dauxois and Nkiet, 1998), to exhibit the dynamics of extreme risks in finance, to study smooth transitions (Gouriéroux and Jasiak, 2003; Larsen and Sørensen, 2003), and to analyze the nonlinear comovements between series, i.e. the nonlinear copersistence (see e.g. Gouriéroux and Jasiak, 1999).

In Section 2 we introduce the unconstrained estimator of the canonical decomposition and describe its asymptotic properties. In Section 3 we consider the corresponding estimation constrained by the reversibility property and discuss the test of the reversibility hypothesis. The method is applied in Section 4 to high-frequency data on stock returns to illustrate its practical usefulness. Section 5 concludes. The proofs are gathered in the appendices.

2. Unconstrained estimator

We consider a pair of continuous random vectors $(X, Y)$, whose joint p.d.f. admits a nonlinear canonical decomposition:

$$f(x, y) = f(x, \cdot)\, f(\cdot, y) \left[ 1 + \sum_{i=1}^{\infty} \lambda_i\, \varphi_i(x)\, \psi_i(y) \right], \tag{3}$$

where the canonical correlations and canonical directions satisfy the standard orthogonality and normalization conditions. This decomposition exists whenever:

Assumption A.1. $\displaystyle\iint \frac{f^2(x, y)}{f(x, \cdot)\, f(\cdot, y)}\, dx\, dy < +\infty$.


The elements of the canonical decomposition have important interpretations (see e.g. Hotelling, 1936; Hannan, 1961; Anderson, 1963; Dauxois and Pousse, 1975; Darolles et al., 1998). Indeed, the first pair $(\varphi_1, \psi_1)$ is a solution of the optimization problem $\max_{\varphi, \psi} \operatorname{Corr}[\varphi(X), \psi(Y)]$, whereas $\lambda_1$ is the maximal value of the correlation. The pair $(\varphi_2, \psi_2)$ is the next most correlated pair, $\lambda_2$ is the correlation between $\varphi_2$ and $\psi_2$, and so on. These successive optimization problems are used to derive numerically the elements of the canonical decomposition.
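As an illustration of how the successive optimization problems can be solved numerically, the sketch below discretizes a known joint density on a grid and reads the canonical decomposition off a singular value decomposition. The grid discretization, the SVD route, and the function name `canonical_analysis_on_grid` are assumptions made for this example, not the procedure described by the authors. The bivariate Gaussian case is used because its canonical correlations are known to be the powers of the correlation coefficient.

```python
import numpy as np

def canonical_analysis_on_grid(joint, n_pairs=3):
    """Canonical correlations and directions of a discretized joint density.

    joint[i, j] is proportional to f(x_i, y_j) on a rectangular grid
    (hypothetical helper, for illustration only).
    """
    P = joint / joint.sum()              # cell probabilities
    p = P.sum(axis=1)                    # discretized marginal of X
    q = P.sum(axis=0)                    # discretized marginal of Y
    # Matrix f(x, y) / sqrt(f(x, .) f(., y)), weighted by the cell masses
    Q = P / np.sqrt(np.outer(p, q))
    U, s, Vt = np.linalg.svd(Q)
    # The leading triple (singular value 1, constant functions) is trivial.
    lam = s[1:n_pairs + 1]
    phi = U[:, 1:n_pairs + 1] / np.sqrt(p)[:, None]     # unit variance under p
    psi = Vt[1:n_pairs + 1, :].T / np.sqrt(q)[:, None]  # unit variance under q
    return lam, phi, psi

# Standard bivariate Gaussian density with correlation rho (unnormalized)
rho = 0.5
grid = np.linspace(-6.0, 6.0, 241)
x, y = np.meshgrid(grid, grid, indexing="ij")
joint = np.exp(-(x**2 - 2.0 * rho * x * y + y**2) / (2.0 * (1.0 - rho**2)))
lam, phi, psi = canonical_analysis_on_grid(joint)
```

In this Gaussian example the first two values in `lam` should be close to `rho` and `rho**2`, and each column of `phi` has zero mean and unit variance under the discretized marginal.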

In practice, the distribution of the pair $(X, Y)$ is not known and the theoretical canonical analysis cannot be performed. However, we can estimate this canonical decomposition if some observations $(X_n, Y_n)$, $n = 1, \ldots, N$, of $(X, Y)$ are available. We assume:

Assumption A.2. The sequence $(X_n, Y_n)$, $n \ge 1$, is a stationary process, whose marginal distribution coincides with the distribution of $(X, Y)$.

The results will especially be applied to a stationary time series $(X_t, t = 0, 1, \ldots)$ observed until date $T$, with $X_n = X_t$, $Y_n = X_{t-h}$. For this reason and without loss of generality we consider vectors $X$ and $Y$ with the same dimension:

Assumption A.3. X and Y are d-dimensional.

2.1. Definition of the estimator

When the joint p.d.f. is unknown, we perform the nonlinear canonical analysis on a nonparametric estimator of $f$. We consider a kernel estimator$^1$ of this joint p.d.f. (see Rosenblatt, 1956; Silverman, 1986). Let us introduce two kernels $K_1$, $K_2$ defined on $\mathbb{R}^d$. The unknown density function is estimated by

$$f_N(x, y) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h_{1N}^d\, h_{2N}^d}\, K_1\!\left( \frac{X_n - x}{h_{1N}} \right) K_2\!\left( \frac{Y_n - y}{h_{2N}} \right), \tag{2}$$

where $h_{1N}$, $h_{2N}$ are the bandwidths associated with the two components. Then we consider the associated canonical decomposition $\lambda_{i,N}$, $\varphi_{i,N}$, $\psi_{i,N}$, $i \ge 1$. It will be computed by solving the sequence of optimization problems corresponding to $f_N$.

In this approach, $f_N$ has to satisfy the properties of a density function, for any $N$, to ensure the validity of the empirical canonical analysis. This justifies the next assumption.

Assumption A.4. The kernels $K_1$, $K_2$ are nonnegative, with unit mass.
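The product-kernel estimator $f_N$ can be sketched in a few lines. The example below is a minimal illustration that assumes a product Gaussian kernel for both $K_1$ and $K_2$ (a choice that is nonnegative with unit mass, as Assumption A.4 requires); the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel on R^d; u has shape (..., d)."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)

def joint_kernel_density(x, y, X, Y, h1, h2):
    """Evaluate f_N at a single point (x, y).

    X and Y are samples of shape (N, d); x and y have shape (d,);
    h1 and h2 are the two bandwidths.
    """
    N, d = X.shape
    k1 = gaussian_kernel((X - x) / h1)   # K_1((X_n - x) / h_1N), shape (N,)
    k2 = gaussian_kernel((Y - y) / h2)   # K_2((Y_n - y) / h_2N), shape (N,)
    return np.sum(k1 * k2) / (N * h1**d * h2**d)

# Usage on simulated scalar data (d = 1)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
Y = 0.5 * X + rng.standard_normal((200, 1))
value = joint_kernel_density(np.zeros(1), np.zeros(1), X, Y, h1=0.3, h2=0.3)
```

Because the kernels are nonnegative and integrate to one, the resulting $f_N$ is itself a density for every sample size.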

1 Other nonparametric estimators, such as sieve estimators, can be considered (see Chen et al., 1998; Darolles et al., 1998).


2.2. Assumptions for consistency and normality of $f_N$

The asymptotic properties of the estimated canonical decomposition have to be deduced from the asymptotic properties of the kernel density estimator. We consider the following assumptions to get the uniform consistency of the kernel density estimator and a central limit theorem.

Assumption A.5.$^2$ The variables $X$ and $Y$ take values in the same compact set $\mathcal{X} \subset \mathbb{R}^d$; $\mathcal{X} = [0, 1]^d$, say.

Assumption A.6. The probability density function $f$ is continuous on $\mathcal{X}^2$.

Assumption A.7. The strictly stationary sequence $(X_n, Y_n)$ is geometrically strong mixing, i.e. with $\alpha$-mixing$^3$ coefficients such that $\alpha_k \le c_0 \rho^k$, for some fixed $c_0 > 0$ and $0 \le \rho < 1$.

Assumption A.8. The kernels $K_i$, $i = 1, 2$, are bounded, symmetric, of order 2, Lipschitzian, and satisfy $\lim_{\|u\| \to \infty} \|u\|^d K_i(u) = 0$, $i = 1, 2$.

Assumption A.9. As $N \to \infty$, $h_{iN} \to 0$, $N h_{iN}^d / (\log N)^2 \to +\infty$, $i = 1, 2$.

Assumptions A.2–A.9 give the uniform consistency of the kernel density estimator (see Lemma A.1). We introduce an additional assumption to obtain the uniform consistency of the associated inner product estimator (see Lemma A.2):

$$\langle \varphi(X), \psi(Y) \rangle_N = \iint \varphi(x)\, \psi(y)\, f_N(x, y)\, dx\, dy.$$

Assumption A.10. The probability density function $f$ is bounded from below by $\varepsilon > 0$.

Such an assumption is now standard in the nonparametric literature (see e.g. Bosq, 1998 and the references therein). If the density function were known, it would be possible to transform the data to get a uniform distribution on the compact set $\mathcal{X} = [0, 1]^d$. When the probability density function is unknown, a preliminary transformation can still be applied to satisfy Assumption A.10, if we have a priori information on the tail behavior of the joint distribution. Without this a priori information and without Assumption A.10, the results below can be strongly modified since they implicitly include a tail analysis which is out of the scope of this paper. In the special case of a Markov process $(X_t)$, and the choice $X = X_t$ and $Y = X_{t-1}$, there exists another approach to circumvent the tail problem. It consists in censoring the time series from its extreme values while performing an appropriate change of time (Darolles and Gouriéroux, 2001). Indeed, there is a simple relation between the canonical decompositions of the initial and transformed processes which can be exploited.

2 The compactness assumption is not restrictive. Indeed, it is always possible to transform the initial data by a one-to-one transform onto a compact set, since the canonical analysis prior to transformation is easily deduced from the canonical analysis of the transformed data.

3 The $\alpha$-mixing coefficients $\alpha_k$ are defined as

$$\alpha_k = \sup_{\substack{B \in \sigma(X_s,\, s \le t) \\ C \in \sigma(X_s,\, s \ge t+k)}} |P(B \cap C) - P(B) P(C)|, \quad k \ge 1.$$

We now introduce the assumptions needed to derive a central limit theorem for the kernel density estimator.

Assumption A.11. The p.d.f. $f_{t_1, t_2, t_3, t_4}$ of $\{(X_{t_1}, Y_{t_1}), (X_{t_2}, Y_{t_2}), (X_{t_3}, Y_{t_3}), (X_{t_4}, Y_{t_4})\}$ exists for any $t_1 < t_2 < t_3 < t_4$, and $\sup_{t_1 < t_2 < t_3 < t_4} \|f_{t_1, t_2, t_3, t_4}\|_\infty < \infty$.

Assumption A.12. The p.d.f. $f_{t_1, t_2}$ of $\{(X_{t_1}, Y_{t_1}), (X_{t_2}, Y_{t_2})\}$ satisfies the condition $\sup_{t_1 < t_2} \|f_{t_1, t_2} - f \otimes f\|_\infty < \infty$, where $f \otimes f$ denotes the product of the marginal p.d.f. of $(X_{t_i}, Y_{t_i})$, $i = 1, 2$.

Assumption A.13. The p.d.f. $f$ is twice continuously differentiable on $]0, 1[^{2d}$, and there exists $b$ such that $\|f\|_\infty < b$ and $\|f^{(2)}\|_\infty < b$.

Assumption A.14. As $N \to \infty$, $h_{iN} \to 0$, $N h_{iN}^d \to \infty$, $N h_{iN}^{d+4} \to 0$, $i = 1, 2$.
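To make the rate conditions of Assumption A.14 concrete, consider, purely as an illustration, a polynomial bandwidth $h_{iN} = c N^{-a}$ with $c > 0$ and $a > 0$. This parametrization is an assumption introduced here and is not part of the original text. Substituting it into the two rate conditions gives:

```latex
\begin{align*}
N h_{iN}^{d}   &= c^{d}\, N^{1 - a d}       \to \infty &&\iff a < \tfrac{1}{d}, \\
N h_{iN}^{d+4} &= c^{d+4}\, N^{1 - a (d+4)} \to 0      &&\iff a > \tfrac{1}{d+4}.
\end{align*}
```

Under this parametrization the assumption holds exactly when $1/(d+4) < a < 1/d$; for scalar data ($d = 1$) this means $1/5 < a < 1$.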

Assumptions A.2–A.8 and A.11–A.14 allow us to derive a central limit theorem and then obtain the asymptotic distribution of the kernel density estimator (see Lemma B.1).

2.3. Consistency of the estimated canonical analysis

We are concerned with the consistency of the first $p$ estimated canonical correlations and canonical directions, i.e. their convergence to their theoretical counterparts. We first need to introduce some identifiability conditions.

Assumption A.15. The first $p$ canonical correlations are distinct.

Assumption A.15 is an identifiability condition for the canonical correlations. Since $\sum_{i=1}^{\infty} \lambda_i^2 = \iint [f^2(x, y) / f(x, \cdot) f(\cdot, y)]\, dx\, dy < +\infty$, distinct nonzero canonical correlations are necessarily isolated. Another identifiability assumption has to be introduced for the canonical directions, which are defined up to a change of sign.

Assumption A.16. There exists a value $x_0$ such that $\varphi_j(x_0) \ne 0$, $j = 1, \ldots, p$.

Hence we select the pair of canonical directions with $\varphi_{jN}(x_0) > 0$ and $\varphi_j(x_0) > 0$, $j = 1, \ldots, p$.

Moreover, the functional parameters of interest have to belong to the admissible values of the associated estimators. Note that the initial and approximated optimization problems associated with the nonlinear canonical analysis are not defined on the same spaces of functions. Indeed, the approximated optimizations involve the spaces $L^2_N(X)$, $L^2_N(Y)$ of square integrable functions with respect to $f_N$. A function $\varphi$ is square integrable in the approximated optimization problem of order $N$ if and only if

$$\int \varphi^2(x)\, \frac{1}{h_{1N}^d}\, K_1\!\left( \frac{X_n - x}{h_{1N}} \right) dx < +\infty, \quad n = 1, \ldots, N.$$

Since the observations can take any value from the support of the marginal distribution of $f$ and the bandwidth may vary, it is useful to introduce the following space:

$$L^2_{K_1}(X) = \left\{ \varphi : \int \varphi^2(x)\, \frac{1}{h_1^d}\, K_1\!\left( \frac{\tilde{x} - x}{h_1} \right) dx < +\infty,\ \forall h_1 > 0,\ \forall \tilde{x} \in \operatorname{supp} f \right\},$$

the space $L^2_{K_2}(Y)$ being defined accordingly. The links between the spaces $L^2_{K_1}(X)$ and $L^2(X)$ ($L^2_{K_2}(Y)$ and $L^2(Y)$, respectively) involve the respective tails of the kernels $K_1$, $K_2$ and the p.d.f. $f$. Intuitively, we have to select kernels with rather thin tails to be sure that $L^2_{K_1}(X)$ includes the theoretical canonical directions of interest. This explains the assumption below.

Assumption A.17. The canonical directions $\varphi_i$ and $\psi_j$ are such that $\varphi_i \in L^2_{K_1}(X)$, $i = 1, \ldots, p$, and $\psi_j \in L^2_{K_2}(Y)$, $j = 1, \ldots, p$.

Finally, the assumption below is useful to study the estimators in an appropriate topological space.

Assumption A.18. (i) The canonical variates $\varphi_i$, $\psi_i$, $i = 1, \ldots, p$, are continuously differentiable.

(ii) The kernels $K_1$, $K_2$ are continuously differentiable.

Theorem 2.1. Under the identification conditions A.15 and A.16, the estimability conditions A.17–A.18 and the technical assumptions A.1–A.10, $\lambda_{i,N}$, $\varphi_{i,N}$, $\psi_{i,N}$, $i = 1, \ldots, p$, converge to their theoretical counterparts in the sense:

$$\lambda_{i,N} \xrightarrow[N \to \infty]{} \lambda_i \quad \text{a.s.},$$

$$\int |\varphi_{i,N}(x) - \varphi_i(x)|^2 f(x, \cdot)\, dx \xrightarrow[N \to \infty]{} 0 \quad \text{a.s.},$$

$$\int |\psi_{i,N}(y) - \psi_i(y)|^2 f(\cdot, y)\, dy \xrightarrow[N \to \infty]{} 0 \quad \text{a.s.}$$

Proof. See Appendix A.

The a.s. convergence of the integrated mean square error does not imply in general the a.s. pointwise convergence of the estimated canonical variates. It implies (under some additional assumptions) the convergence of inner products $\langle \varphi_{i,N}, g \rangle$, for any test function $g \in L^2(X)$ (see Darolles et al., 2001). However, to obtain simple pointwise asymptotic distributions, we only consider $x$ for which the pointwise convergence is achieved.


2.4. Asymptotic distributions

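The convergence properties of the estimated canonical correlations and canonical directions allow us to expand the first-order conditions and to derive the asymptotic distributions.

Theorem 2.2. Under Assumptions A.1–A.18,

(i) the asymptotic distribution of $\lambda_N = (\lambda_{1,N}, \ldots, \lambda_{p,N})$ is a Gaussian distribution:

$$\sqrt{N} (\lambda_N - \lambda) \xrightarrow{d} N(0, V),$$

where the elements of the matrix $V$ are:

$$V_{i,j} = \frac{1}{4} \sum_{k=-\infty}^{\infty} \operatorname{Cov}\big[ 2\varphi_i(X_n)\psi_i(Y_n) - \lambda_i \varphi_i^2(X_n) - \lambda_i \psi_i^2(Y_n),\ 2\varphi_j(X_{n+k})\psi_j(Y_{n+k}) - \lambda_j \varphi_j^2(X_{n+k}) - \lambda_j \psi_j^2(Y_{n+k}) \big].$$

(ii) The asymptotic distribution of $\varphi_N(x) = (\varphi_{1,N}(x), \ldots, \varphi_{p,N}(x))$ is a Gaussian distribution:

$$\sqrt{N h_{1N}^d}\, (\varphi_N(x) - \varphi(x)) \xrightarrow{d} N(0, W(x)),$$

where $\displaystyle W_{i,j}(x) = \frac{1}{\lambda_i \lambda_j}\, \frac{1}{f(x, \cdot)} \int K_1^2(u)\, du\ \operatorname{Cov}[\psi_i(Y), \psi_j(Y) \mid X = x]$.

(iii) The asymptotic distribution of $\psi_N(y) = (\psi_{1,N}(y), \ldots, \psi_{p,N}(y))$ is a Gaussian distribution:

$$\sqrt{N h_{2N}^d}\, (\psi_N(y) - \psi(y)) \xrightarrow{d} N(0, U(y)),$$

where $\displaystyle U_{i,j}(y) = \frac{1}{\lambda_i \lambda_j}\, \frac{1}{f(\cdot, y)} \int K_2^2(u)\, du\ \operatorname{Cov}[\varphi_i(X), \varphi_j(X) \mid Y = y]$.

Proof. See Appendix B.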

The rate of convergence is a parametric rate for the canonical correlations, whereas it is nonparametric for the canonical directions. The asymptotic variance $W_{i,i}(x)$ of $\varphi_{i,N}$ coincides with the asymptotic variance of the Nadaraya–Watson estimator of $(1/\lambda_i) E[\psi_i(Y) \mid X]$. This corresponds to the interpretation of the canonical direction $\varphi_i$ as the conditional expectation of the canonical direction $\psi_i$, up to a scale factor.
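The conditional expectation interpretation follows directly from decomposition (3) together with the orthogonality and normalization conditions. The derivation below is a formal sketch: the interchange of sum and integral is taken for granted, and $\lambda_i > 0$ is assumed.

```latex
\begin{align*}
E[\psi_i(Y) \mid X = x]
  &= \int \psi_i(y)\, \frac{f(x, y)}{f(x, \cdot)}\, dy
   = \int \psi_i(y)\, f(\cdot, y)
     \Big[ 1 + \sum_{k \ge 1} \lambda_k\, \varphi_k(x)\, \psi_k(y) \Big]\, dy \\
  &= E[\psi_i(Y)] + \sum_{k \ge 1} \lambda_k\, \varphi_k(x)\, E[\psi_i(Y)\, \psi_k(Y)]
   = \lambda_i\, \varphi_i(x).
\end{align*}
```

The last equality uses $E[\psi_i(Y)] = 0$ and $E[\psi_i(Y)\psi_k(Y)] = 1$ if $k = i$ and $0$ otherwise, so that $\varphi_i(x) = (1/\lambda_i) E[\psi_i(Y) \mid X = x]$.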

3. The reversibility hypothesis

In this section we are interested in reversible processes, i.e. processes with identical distributional properties in initial and reversed times. Discretized unidimensional diffusion processes are examples of reversible processes. Therefore, when rejecting the reversibility hypothesis, we also reject the existence of an underlying unidimensional diffusion process (see Section 4). Some other procedures to test for unidimensional diffusion processes have been based on the embeddability hypothesis (see Florens et al., 1998).

The reversibility condition implies that, $\forall h$, $f_h(x_t, x_{t-h}) = f_h(x_{t-h}, x_t)$, that is, the symmetry of the bivariate distribution $f_h$ at any lag. We explain in the subsection below how to estimate the canonical decomposition of $f(x, y)$ under these reversibility (i.e. symmetry) conditions. Then, by comparing the unconstrained and constrained estimators, we derive a test of the symmetry hypothesis.

3.1. Constrained estimators

There exist different kernel estimators of the canonical decomposition taking into account the symmetry constraint.

(i) We select identical kernels $K_1 = K_2 = K$ and bandwidths $h_{1N} = h_{2N} = h_N$, whereas we artificially double the size of the sample by considering $(X_i, Y_i)$, $(Y_i, X_i)$, $i = 1, \ldots, N$. The constrained kernel estimator of the density function is

$$f_N^R(x, y) = \frac{1}{2 N h_N^{2d}} \sum_{n=1}^{N} \left\{ K\!\left( \frac{X_n - x}{h_N} \right) K\!\left( \frac{Y_n - y}{h_N} \right) + K\!\left( \frac{X_n - y}{h_N} \right) K\!\left( \frac{Y_n - x}{h_N} \right) \right\}.$$

The constrained estimators of the canonical correlations and canonical directions are deduced from the canonical decomposition of $f_N^R$ and denoted by $\lambda_{i,N}^R$ and $\varphi_{i,N}^R = \psi_{i,N}^R$, $i > 0$.

(ii) We look for the solutions of the spectral problem:

$$\tfrac{1}{2} (T^* + T)\, \varphi_{i,N}^R = \lambda_{i,N}^R\, \varphi_{i,N}^R, \quad \langle \varphi_{i,N}^R, \varphi_{i,N}^R \rangle = 1,$$

where

$$T\varphi(y) = \int \varphi(x)\, \frac{f_N(x, y)}{f_N(\cdot, y)}\, dx, \qquad T^*\varphi(x) = \int \varphi(y)\, \frac{f_N(x, y)}{f_N(x, \cdot)}\, dy.$$

(iii) We look for the solutions of the spectral problem:

$$\tfrac{1}{2} (T^* T + T T^*)\, \varphi_{i,N}^R = (\lambda_{i,N}^R)^2\, \varphi_{i,N}^R, \quad \langle \varphi_{i,N}^R, \varphi_{i,N}^R \rangle = 1.$$

The latter two approaches are based on the property that the conditional expectation operator is self-adjoint under the reversibility hypothesis. These three constrained estimators do not provide the same results in finite samples, but share the same asymptotic properties under the reversibility hypothesis.
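The first two constrained estimators can be sketched on a grid for scalar data ($d = 1$). The code below is an illustration under several assumptions that are not in the original: a Gaussian kernel, a finite evaluation grid, Riemann-sum discretization of the operators $T$ and $T^*$, and hypothetical function names. It builds the symmetrized density of approach (i) and then solves a discretized version of the spectral problem of approach (ii).

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def constrained_density_on_grid(X, Y, grid, h):
    """Symmetrized kernel estimate f^R_N evaluated on grid x grid (d = 1)."""
    KX = gauss((X[:, None] - grid[None, :]) / h)   # K((X_n - g)/h), shape (N, G)
    KY = gauss((Y[:, None] - grid[None, :]) / h)   # K((Y_n - g)/h), shape (N, G)
    f = KX.T @ KY / (len(X) * h**2)                # unconstrained f_N(g_a, g_b)
    return 0.5 * (f + f.T)                         # adds the swapped pairs (Y_n, X_n)

def reversible_spectrum(f_grid, n_pairs=3):
    """Leading nontrivial eigenvalues of the discretized operator (T + T*)/2."""
    P = f_grid / f_grid.sum()
    p = P.sum(axis=1)                  # discretized f_N(x, .)
    q = P.sum(axis=0)                  # discretized f_N(., y)
    T = (P / q[None, :]).T             # T[j, i]  = P[i, j] / q[j]
    T_star = P / p[:, None]            # T*[i, j] = P[i, j] / p[i]
    eig = np.linalg.eigvals(0.5 * (T + T_star)).real
    eig = np.sort(eig)[::-1]
    return eig[1:n_pairs + 1]          # drop the trivial eigenvalue 1

# Usage on a simulated Gaussian AR(1) path, which is a reversible process
rng = np.random.default_rng(1)
eps = rng.standard_normal(501)
x = np.empty(501)
x[0] = eps[0]
for t in range(1, 501):
    x[t] = 0.5 * x[t - 1] + eps[t]
grid = np.linspace(-3.0, 3.0, 61)
fR = constrained_density_on_grid(x[1:], x[:-1], grid, h=0.3)
lam_R = reversible_spectrum(fR)
```

With a symmetric density estimate the two discretized operators coincide, so the averaging in `reversible_spectrum` only matters when it is applied to an unconstrained estimate.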

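Theorem 3.1. Under the assumptions of Theorem 2.2, and if the reversibility hypothesis is satisfied,

(i) $\lambda_{i,N}^R$, $\varphi_{i,N}^R$, $i = 1, \ldots, p$, converge a.s. to their theoretical counterparts.

(ii) The asymptotic distribution of $\lambda_N^R = (\lambda_{1,N}^R, \ldots, \lambda_{p,N}^R)$ is a Gaussian distribution:

$$\sqrt{N} (\lambda_N^R - \lambda) \xrightarrow{d} N(0, V^R),$$

where

$$V_{i,j}^R = \frac{1}{4} \sum_{k=-\infty}^{\infty} \operatorname{Cov}\big[ 2\varphi_i(X_n)\varphi_i(Y_n) - \lambda_i \varphi_i^2(X_n) - \lambda_i \varphi_i^2(Y_n),\ 2\varphi_j(X_{n+k})\varphi_j(Y_{n+k}) - \lambda_j \varphi_j^2(X_{n+k}) - \lambda_j \varphi_j^2(Y_{n+k}) \big].$$

(iii) The asymptotic distribution of $\varphi_N^R(x) = (\varphi_{1,N}^R(x), \ldots, \varphi_{p,N}^R(x))$ is a Gaussian distribution:

$$\sqrt{N h_N^d}\, (\varphi_N^R(x) - \varphi(x)) \xrightarrow{d} N(0, W^R(x)),$$

where

$$W_{i,j}^R(x) = \frac{1}{2}\, \frac{1}{\lambda_i \lambda_j}\, \frac{1}{f(x, \cdot)} \int K_1^2(u)\, du\ \operatorname{Cov}[\varphi_i(Y), \varphi_j(Y) \mid X = x].$$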

Proof. See Appendix C.

We obtain asymptotic results which may be directly compared to Theorem 2.2. The constrained and unconstrained estimators of the canonical correlations have the same asymptotic distribution under the reversibility hypothesis. In contrast, the asymptotic variance of the constrained estimated canonical variate is half the variance of the unconstrained one. This is a correction for the doubled size of the sample (see interpretation (i) of the constrained estimator).

3.2. Comparison of constrained and unconstrained estimators

Under the null hypothesis of reversibility we can compare the asymptotic properties of the constrained and unconstrained estimators of the canonical correlations and canonical directions. The difference between these two types of estimators can be used to construct testing procedures of the reversibility hypothesis.

3.2.1. Asymptotic equivalence

Under the reversibility hypothesis, $\lambda_{1N}$ and $\lambda_{1N}^R$ admit the same first-order expansion. Therefore, we need the second-order expansion of the difference $\lambda_{1N} - \lambda_{1N}^R$ to discuss testing procedures. The property below is proved in Appendix D. The notation $\sim$ means that the residual term can be neglected with respect to the terms of the left- and right-hand sides.

Theorem 3.2. Under the reversibility hypothesis, we have the asymptotic equivalences:

$$2(\varphi_{1N} - \varphi_{1N}^R) \sim \varphi_{1N} - \psi_{1N}$$

and

$$\lambda_{1N} - \lambda_{1N}^R \sim 2\lambda_1 \int [\varphi_{1N}(y) - \varphi_{1N}^R(y)]^2 f(\cdot, y)\, dy \sim \frac{\lambda_1}{2} \int [\varphi_{1N}(y) - \psi_{1N}(y)]^2 f(\cdot, y)\, dy.$$


Proof. See Appendix D.

The first relation means that it is equivalent to construct a test procedure based on the difference between the constrained and unconstrained estimators of the canonical variate $\varphi_1$, or on the difference between the unconstrained estimators of the canonical variates $\varphi_1$ and $\psi_1$. The second equation shows that a testing procedure based on the difference between the constrained and unconstrained canonical correlations consists in introducing an appropriate measure of discrepancy between the estimators of $\varphi_1$.

3.2.2. Test procedure

A test procedure can be deduced from the asymptotic properties of

$$I_N = \mu_1 \int [\varphi_{1N}(y) - \psi_{1N}(y)]^2 f(\cdot, y)\, dy. \tag{4}$$

The difference between $\varphi_{1N}$ and $\psi_{1N}$ has been weighted by the marginal distribution $f(\cdot, y)$ to keep the interpretation of Theorem 3.2. Similar results can be derived if $f(\cdot, y)$ is replaced by another weighting function $w(y)$ (see e.g. Hall, 1984a; Tenreiro, 1997). For instance, we may use $w(y) = f^2(\cdot, y)$ (see Pagan and Ullah, 1999, p. 168).

The analysis is similar to the one usually followed for the integrated square error of nonparametric density or regression estimators (see e.g. Bickel and Rosenblatt, 1973; Nadaraya, 1983; Hall, 1984a, b). However, these results are generally derived for i.i.d. observations, and we will use an extension of limit theorems established by Tenreiro (1995, 1997) (see also Meloche, 1990).

We assume that the process $Z_i = (X_i, Y_i)'$ is strongly stationary and geometrically absolutely regular (see Bradley, 1986). Then, under regularity conditions described in Appendix E, we get the theorem below:

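Theorem 3.3. Under the reversibility hypothesis,

(i) the asymptotic distribution of $I_N - E I_N$ is a Gaussian distribution:

$$N h_N^{d/2} (I_N - E I_N) \xrightarrow{d} N(0, \sigma^2),$$

where

$$\sigma^2 = 2 \left\{ \int \left[ \int K(u) K(u + v)\, du \right]^2 dv \right\} \int \big( \operatorname{Var}[\varphi_1(X_{i+1}) \mid X_i = x_i] \big)^2\, dx_i.$$

(ii) The bias is

$$E I_N = \frac{2}{N h_N^d} \int K^2(u)\, du \int \operatorname{Var}[\varphi_1(X_{i+1}) \mid X_i = x_i]\, dx_i + o\!\left( \frac{1}{N h_N^{d-1}} \right).$$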

After replacement of $I_N$ by a consistent estimator, we deduce from Theorem 3.3 a procedure for testing the reversibility hypothesis.
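Two kernel constants appear in Theorem 3.3: $\int K^2(u)\,du$ in the bias term and $\int [\int K(u)K(u+v)\,du]^2\,dv$ in the asymptotic variance. The sketch below evaluates both by Riemann sums for a scalar Gaussian kernel. The choice of kernel, the case $d = 1$, and the integration grid are assumptions of this example; the closed forms quoted in the comments are standard Gaussian integrals.

```python
import numpy as np

u = np.linspace(-10.0, 10.0, 2001)          # integration grid, step 0.01
du = u[1] - u[0]

def K(t):
    """Scalar Gaussian kernel."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

# int K^2(u) du; closed form 1 / (2 sqrt(pi)) for the Gaussian kernel
int_K2 = np.sum(K(u) ** 2) * du

# c(v) = int K(u) K(u + v) du, evaluated for every v on the grid;
# for the Gaussian kernel c is the N(0, 2) density.
c = np.array([np.sum(K(u) * K(u + v)) * du for v in u])

# int c(v)^2 dv; closed form 1 / (2 sqrt(2 pi)) for the Gaussian kernel
int_c2 = np.sum(c**2) * du
```

The remaining ingredients of the test, namely the conditional variance integrals and $I_N$ itself, depend on the data and on the estimated canonical variates and are not covered by this sketch.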


The test procedures described above can be used to check whether discrete time data are compatible with an underlying continuous time diffusion model. To answer this question we can proceed in two steps. First, apply a test for embeddability (see e.g. Florens et al., 1998) to check whether the eigenvalues are strictly positive, that is, whether the data are compatible with a continuous time model (not necessarily a diffusion). If the embeddability hypothesis is not rejected, the reversibility test can then be performed.

4. Applications

In this section we provide two illustrations of the approach. The first one is based on an artificial dataset consisting of simulated realizations of a reflected Brownian motion. This is a Markov reversible process providing a basis for a comparison of different estimation techniques. The second example involves high-frequency data on returns on the Alcatel stock traded on the Paris Bourse.

4.1. Reflected Brownian motion

In this example we consider a reflected Brownian motion, with zero drift, a variance $\sigma^2$, and two reflecting barriers at $0$ and $l$. This process is stationary, Markovian and reversible. Its infinitesimal generator:

$$A f(x) = \lim_{h \to 0} \left[ \frac{E[f(X_{t+h}) \mid X_t = x] - f(x)}{h} \right] = \frac{1}{2} \sigma^2 \frac{d^2}{dx^2} f(x) \tag{5}$$

is defined for any function $f$ belonging to $D(A) = \{ f \in L^2 : Af \text{ exists and } f'(0) = f'(l) = 0 \}$, where $L^2$ is the space of square integrable functions with respect to Lebesgue measure on $[0, l]$. The eigenelements $(\nu_i, e_i)$ of $A$ are (see Darolles and Laurent, 2000 for the computation):

$$\nu_i = -\frac{1}{2} \left( \frac{i \sigma \pi}{l} \right)^2, \quad e_i(x) = \cos\!\left( \frac{i \pi}{l} x \right), \quad i \text{ varying}.$$

Therefore, the canonical variates associated with discrete observations $(X_t, X_{t-1})$ are $\varphi_i = \psi_i = e_i$, up to a change of sign, and the canonical correlations are $\lambda_i = \exp(\nu_i)$, $i$ varying. Moreover, the reflected Brownian motion is reversible.
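The formula for the canonical correlations can be checked numerically. The short sketch below evaluates the exponential of the generator eigenvalues $-\frac{1}{2}(i\sigma\pi/l)^2$ for the parameter values used in the simulation study of this section (volatility equal to 1 and barrier equal to $\pi$); the variable names are illustrative.

```python
import numpy as np

sigma, l = 1.0, np.pi                       # volatility and upper barrier
i = np.arange(1, 5)                         # orders 1 to 4
generator_eigenvalues = -0.5 * (i * sigma * np.pi / l) ** 2
canonical_correlations = np.exp(generator_eigenvalues)
rounded = np.round(canonical_correlations, 4)
```

Rounded to four decimals, these values agree with the "True" column of Table 1.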

We simulate a path $(X_t, t = 1, 2, \ldots, T)$ of length $T = 2500$, for the volatility $\sigma = 1$ and the barrier $l = \pi$. Next, we use these artificial observations to find the nonlinear canonical decomposition of the joint distribution of $(X_t, X_{t-1})$. Two estimation methods are successively applied to the data $(X_t, X_{t-1})$, $t = 2, \ldots, T$:

(i) an unconstrained kernel method, with Gaussian kernels $K_1(x) = K_2(x)$, and bandwidths $h_1 = h_2 = 0.1025$;

(ii) the same kernel method constrained by the reversibility hypothesis.

We provide the estimated canonical correlations in Table 1.
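A path of this kind can be generated by folding a free Brownian path into the interval between the barriers. The sketch below is an assumption about how such data might be simulated, not the authors' code: it starts from the uniform stationary distribution, samples at unit time steps, and uses the sample correlation between $\cos(\pi X_t / l)$ and $\cos(\pi X_{t-1} / l)$ as a crude plug-in check of the first canonical correlation (this shortcut uses the known first canonical variate and is not the kernel method).

```python
import numpy as np

def simulate_reflected_bm(T, sigma=1.0, l=np.pi, seed=0):
    """Brownian motion reflected at 0 and l, sampled at unit time steps."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0.0, l)                         # stationary (uniform) start
    steps = sigma * rng.standard_normal(T - 1)       # unit-time Gaussian increments
    w = x0 + np.concatenate(([0.0], np.cumsum(steps)))
    y = np.mod(w, 2.0 * l)                           # fold the free path into [0, 2l)
    return np.where(y > l, 2.0 * l - y, y)           # mirror (l, 2l) back onto [0, l]

l = np.pi
X = simulate_reflected_bm(2500, sigma=1.0, l=l)
e1_now = np.cos(np.pi * X[1:] / l)                   # e_1(X_t)
e1_lag = np.cos(np.pi * X[:-1] / l)                  # e_1(X_{t-1})
lambda1_hat = np.corrcoef(e1_now, e1_lag)[0, 1]
```

For a path of this length, `lambda1_hat` should fall near the theoretical value $\exp(-1/2)$, up to sampling error.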


Table 1
Estimated canonical correlations $\lambda_i$

Order   True     Unconstrained   Constrained
1       0.6065   0.6146          0.6139
2       0.1353   0.1795          0.1717
3       0.0111   0.0897          0.0734
4       0.0003   0.0781          0.0682

Fig. 1. Constrained kernel estimator of the first canonical variate.

The constrained and unconstrained estimators of the canonical correlations are close to each other, and close to the true values. Similar results are obtained when comparing the estimators of the canonical variates. Hence, we only consider below the constrained estimation method.

To study the asymptotic variance of the estimators, we perform the following Monte Carlo study. We replicate 250 simulated paths using the same parameter values and we compute at each point of the support the mean and the standard deviation of the estimator of the first canonical variate. Figs. 1 and 2 provide the averaged estimators and the pointwise confidence bands, computed for the first and the second canonical variates. Each figure presents the true function (dotted line) and its estimator (continuous line).

The kernel estimators of the canonical variates approximately satisfy the boundary constraints $\varphi_1'(0) = \varphi_1'(l) = 0$. This nice property is not satisfied in practice by the standard sieve method, which can create substantial finite sample bias (see Darolles and Gouriéroux, 2001 for a modification of the sieve approach to integrate the boundary effects).


Fig. 2. Constrained kernel estimator of the second canonical variate.

Table 2
Estimated canonical correlations $\lambda_i$

Order   Unconstrained   Constrained
1       0.3595          0.2569
2       0.1829          0.1459
3       0.1467          0.1260
4       0.030           0.1255

4.2. High-frequency data

We consider a series of returns corresponding to the Alcatel stock traded on the Paris Bourse. The prices are resampled every 20 min from real time records and the returns are computed by differencing the log-prices. The observation period is May 2, 1997–August 30, 1997, and includes 1705 observations. For this application, we can assume that returns take values in a compact set. Indeed, trading would automatically stop if the price change with respect to the opening price were too large.

We implement the unconstrained and constrained kernel-based methods, with a Gaussian kernel and bandwidths $h_{1N} = h_{2N} = 0.062$. The reversibility hypothesis is clearly rejected when we compare the constrained and unconstrained estimated canonical correlations (see Table 2; their asymptotic variances are provided in Appendix F). The introduction of the reversibility constraint induces an underestimation of the first canonical correlation by about 30%.

The estimated canonical variates are provided in Figs. 3 and 4 for the unconstrained case. The first variate is represented by a continuous line, the second one by a dashed line, and the third one by a dotted line.


Fig. 3. Estimated current canonical variates.

Fig. 4. Estimated lagged canonical variates.

It is commonly assumed in financial theory that the stock returns $(X_t, t \ge 0)$ satisfy a stochastic differential equation:

$$dX_t = \mu(X_t)\, dt + \sigma(X_t)\, dW_t \quad \text{(say)}.$$

In such a case the process is necessarily reversible and the first canonical variate corresponds to a monotone function, the second one to a function with one breakpoint, and so on. The comparison of the three figures shows clearly that the reversibility property has to be rejected, as do the expected patterns of the canonical variates. In particular, the observed returns are not compatible with an underlying unidimensional stochastic differential equation.

How should the pattern of the first canonical variate be interpreted? It is well known that the (linear) autocorrelations of stock returns are generally insignificant, which is consistent with the theory of market efficiency. In our case the first order linear correlation is 0.065 and is not significant. Therefore, the linear transformation will not belong to the subspaces generated by the first canonical variates. Moreover, the literature on ARCH models insists on the so-called volatility persistence, implying a large autocorrelation of squared returns. Therefore, it is not surprising to find a first canonical variate with a parabolic form, even if the pattern also includes some leverage effect (see Black, 1976).

5. Concluding remarks

In this paper we develop a nonlinear canonical correlation analysis based on kernel estimators of the density function. It allows us to define nonparametric estimators of the canonical correlations and canonical directions, either unconstrained or constrained by the reversibility property. This nonparametric canonical analysis is especially useful for financial applications based on high-frequency data and for the estimation of the drift and volatility functions of diffusion equations. It may also be used to study the liquidity risk in an analysis of intertrade durations (see Gouriéroux and Jasiak, 2002), or to implement technical analysis based models for directions of price changes (see Darolles et al., 2000).

Acknowledgements

The authors thank Eric Renault and three anonymous referees for many valuable comments and constructive criticisms.

Appendix A. Proof of Theorem 2.1

From Roussas (1988, Theorem 3.1) and Bosq (1998, Theorem 2.2), we get the following lemma.

Lemma A.1. Under Assumptions A.2–A.9, the kernel estimator of the p.d.f. is uniformly strongly consistent.

From this result, we deduce a uniform consistency property of integrals with respect to $f_N$.

Lemma A.2. Under Assumptions A.2–A.10, $\iint g(x, y) f_N(x, y)\, dx\, dy$ converges a.s. uniformly to $\iint g(x, y) f(x, y)\, dx\, dy$, for any function $g$ in $G = \{ g : \iint |g(x, y)| f(x, y)\, dx\, dy \le 1 \}$.

Proof. For any function $g$ in $G$, we get:

$$\sup_{g \in G} \left| \iint g(x, y) [f_N(x, y) - f(x, y)]\, dx\, dy \right| \le \sup_{(x, y) \in \mathcal{X}^2} \frac{|f_N(x, y) - f(x, y)|}{f(x, y)}.$$

Using Assumption A.10, this uniform convergence is equivalent to the uniform convergence of $f_N(x, y)$ to $f(x, y)$ given by Lemma A.1.


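Let us now prove the consistency for $i = 1$. The case $i \ne 1$ can be easily deduced by similar arguments. Let us introduce the three following maximization problems:

Problem A.1. $(\varphi_{1,N}, \psi_{1,N})$ solution of $\max_{\varphi, \psi} \iint \varphi(x) \psi(y) f_N(x, y)\, dx\, dy$, subject to $\int \varphi^2(x) f_N(x, \cdot)\, dx = \int \psi^2(y) f_N(\cdot, y)\, dy = 1$.

Problem A.2. $(\tilde{\varphi}_{1,N}, \tilde{\psi}_{1,N})$ solution of $\max_{\varphi, \psi} \iint \varphi(x) \psi(y) f_N(x, y)\, dx\, dy$, subject to $\int \varphi^2(x) f(x, \cdot)\, dx = \int \psi^2(y) f(\cdot, y)\, dy = 1$.

Problem A.3. $(\varphi_1, \psi_1)$ solution of $\max_{\varphi, \psi} \iint \varphi(x) \psi(y) f(x, y)\, dx\, dy$, subject to $\int \varphi^2(x) f(x, \cdot)\, dx = \int \psi^2(y) f(\cdot, y)\, dy = 1$.

Let us denote $H = \{ (\varphi, \psi) : \int \varphi^2(x) f(x, \cdot)\, dx = \int \psi^2(y) f(\cdot, y)\, dy = 1 \}$. We first prove the consistency of $\tilde{\varphi}_{1,N}$ and $\tilde{\psi}_{1,N}$.

(i) Consistency of $\tilde{\varphi}_{1,N}$ and $\tilde{\psi}_{1,N}$: Using the Cauchy–Schwarz inequality, we get $g(x, y) = \varphi(x) \psi(y) \in G$, $\forall (\varphi, \psi) \in H$, and by Lemma A.2 we deduce the a.s. uniform convergence $\iint \varphi(x) \psi(y) f_N(x, y)\, dx\, dy \to \iint \varphi(x) \psi(y) f(x, y)\, dx\, dy$ as $N \to \infty$. Moreover, the mapping $(\varphi, \psi) \to \iint \varphi(x) \psi(y) f(x, y)\, dx\, dy$ is continuous from $L^2(X) \times L^2(Y)$ to $\mathbb{R}$. We use Jennrich's lemma, valid for any metric space (see Jennrich, 1969, proof of Theorem 6), to deduce the a.s. convergence of the solutions of the finite sample problem to the solution of the limit Problem A.3. This implies: $\|\tilde{\varphi}_{1,N} - \varphi_1\|_2 \to 0$ a.s. and $\|\tilde{\psi}_{1,N} - \psi_1\|_2 \to 0$ a.s., where $\|\cdot\|_2$ is the $L^2$ distance.

(ii) Consistency of $\varphi_{1,N}$ and $\psi_{1,N}$: Since $g(x, y) = \varphi^2(x) \in G$ and $g(x, y) = \psi^2(y) \in G$ when $(\varphi, \psi) \in H$, we deduce from Lemma A.2 the equicontinuity property, which implies the a.s. convergences: $\alpha_N = \int \tilde{\varphi}_{1,N}^2(x) f_N(x, \cdot)\, dx \to \int \varphi_1^2(x) f(x, \cdot)\, dx = 1$ a.s., and $\beta_N = \int \tilde{\psi}_{1,N}^2(y) f_N(\cdot, y)\, dy \to \int \psi_1^2(y) f(\cdot, y)\, dy = 1$ a.s. The solutions $(\varphi_{1,N}, \psi_{1,N})$ and $(\tilde{\varphi}_{1,N}, \tilde{\psi}_{1,N})$ of Problems A.1 and A.2 are proportional up to the terms $(\alpha_N^{1/2}, \beta_N^{1/2})$. Therefore, we have

$$\|\varphi_{1,N} - \varphi_1\|_2 \le \|\tilde{\varphi}_{1,N} - \varphi_1\|_2 + \|(1 - \alpha_N^{-1/2}) \tilde{\varphi}_{1,N}\|_2 = \|\tilde{\varphi}_{1,N} - \varphi_1\|_2 + |1 - \alpha_N^{-1/2}|,$$

and we deduce $\|\varphi_{1,N} - \varphi_1\|_2 \to 0$ a.s. A similar computation gives the consistency of $\psi_{1,N}$.

(iii) Consistency of $\lambda_{1,N}$: It is a consequence of the equicontinuity property.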

Appendix B. Proof of Theorem 2.2

From Bosq (1998, Theorem 2.3), we get the following central limit theorem for the kernel density estimator.

Lemma B.1. Under Assumptions A.2–A.8, A.11–A.14, for any $(x, y) \in\, ]0, 1[^{2d}$, $\sqrt{N h_{1N}^d h_{2N}^d}\, [f_N(x, y) - f(x, y)] \xrightarrow{d} N(0, W(x, y))$, where $W(x, y) = f(x, y) \int K_1^2(u)\, du \int K_2^2(u)\, du$.


To derive the asymptotic distributions, we need central limit theorems for speciIctransformations of the kernel density estimator.

Lemma B.2. Under Assumptions A.2–A.8, A.11–A.14, we get for a given function $g(x,y)$:

(i) $\sqrt{N} \int g(x,y)[f_N(x,y) - f(x,y)]\,dx\,dy \xrightarrow{d} N(0,V)$, with asymptotic variance $V = \sum_{k=-\infty}^{\infty} \mathrm{Cov}[g(X_n,Y_n), g(X_{n+k},Y_{n+k})]$.

(ii) $\sqrt{N h_{1N}^d} \int g(x,y)[f_N(x,y) - f(x,y)]\,dy \xrightarrow{d} N(0,W(x))$, with asymptotic variance $W(x) = \int K_1^2(u)\,du \int g^2(x,y) f(x,y)\,dy$.

(iii) If the reversibility hypothesis is satisfied, $\sqrt{N h_N^d} \int g(x,y)[f_N^R(x,y) - f(x,y)]\,dy \xrightarrow{d} N(0,W(x))$, with asymptotic variance $W(x) = \frac{1}{2} \int K^2(u)\,du \int g^2(x,y) f(x,y)\,dy$.
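The variance $V$ in Lemma B.2(i) is an infinite sum of autocovariances. A minimal sketch of how it might be approximated in practice is given below; it simply truncates the sum at a user-chosen lag. The truncation lag `max_lag` and the function name are assumptions introduced for the example, and the paper does not prescribe this particular estimator.

```python
import numpy as np

def long_run_variance(g_values, max_lag):
    """Truncated version of V = sum_k Cov[g_n, g_{n+k}], keeping |k| <= max_lag."""
    g = np.asarray(g_values, dtype=float)
    n = g.size
    centered = g - g.mean()
    total = np.dot(centered, centered) / n  # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = np.dot(centered[:-k], centered[k:]) / n
        total += 2.0 * gamma_k  # lags +k and -k are equal under stationarity
    return total

# Toy input: an alternating sequence, whose lag-1 autocovariance is close to -1.
alternating = np.array([1.0, -1.0] * 500)
v0 = long_run_variance(alternating, max_lag=0)
v1 = long_run_variance(alternating, max_lag=1)
```

The toy input shows that a plain truncated sum can be negative, so a weighted (for example Bartlett-type) version may be preferable when a nonnegative estimate is required.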

We can now begin the proof of Theorem 2.2. We derive asymptotic distributions for $p = 1$. The case $p \neq 1$ is easily deduced by similar arguments.

(i) Expansion of the first-order conditions. Let us denote by $L^2(X)$ (resp. $L^2(Y)$) the space of square integrable functions of $X$ (resp. $Y$), by $\|\cdot\|_2$ the associated $L^2$ norm, by $T$ the conditional expectation operator from $L^2(X)$ to $L^2(Y)$:

$$T : \varphi(X) \mapsto T\varphi(Y) = E[\varphi(X) \mid Y] = \int \varphi(x) \frac{f(x,y)}{f(\cdot,y)}\,dx,$$

and by $T^*$ the adjoint operator of $T$:

$$T^* : \psi(Y) \mapsto T^*\psi(X) = E[\psi(Y) \mid X] = \int \psi(y) \frac{f(x,y)}{f(x,\cdot)}\,dy.$$

The first-order conditions associated with Problem A.3 are $T^*T\varphi_1 = \mu_1\varphi_1$ and $TT^*\psi_1 = \mu_1\psi_1$, where $\mu_1 = \lambda_1^2$. Moreover $\langle\varphi_1,\varphi_1\rangle = 1$, $\langle\psi_1,\psi_1\rangle = 1$, and:

$$T\varphi_1 = \mu_1^{1/2}\psi_1, \qquad T^*\psi_1 = \mu_1^{1/2}\varphi_1.$$
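The first-order conditions say that the canonical correlations are the singular values of the conditional expectation operator $T$. A minimal numerical sketch is given below: the joint density is discretized on a grid and a singular value decomposition is applied. The bivariate Gaussian example, the grid and the function name are assumptions chosen for illustration; for a Gaussian pair with correlation $\rho$ the canonical correlations are known to be $\rho, \rho^2, \ldots$, which gives a simple check.

```python
import numpy as np

def canonical_correlations(joint_pmf, n_values=3):
    """Leading singular values of a discretized conditional expectation operator.

    joint_pmf[i, j] approximates P(X = x_i, Y = y_j) on a grid. The matrix
    M[i, j] = p_ij / sqrt(p_i. * p_.j) represents T (up to transposition) in
    orthonormal coordinates, so its singular values approximate
    1 (constant functions), lambda_1, lambda_2, ...
    """
    px = joint_pmf.sum(axis=1)
    py = joint_pmf.sum(axis=0)
    m = joint_pmf / np.sqrt(np.outer(px, py))
    s = np.linalg.svd(m, compute_uv=False)
    return s[: n_values + 1]

# Hypothetical example: standard bivariate Gaussian density with correlation rho.
rho = 0.5
grid = np.linspace(-6.0, 6.0, 201)
x, y = np.meshgrid(grid, grid, indexing="ij")
density = np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))
pmf = density / density.sum()

s = canonical_correlations(pmf)
```

The squared singular values `s**2` play the role of the eigenvalues of $T^*T$; the first one corresponds to constant functions, which the normalization $E[\varphi_i] = E[\psi_i] = 0$ excludes.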

$T$, $T^*$, $\varphi_1$, $\psi_1$ and $\mu_1$ are functions of the distribution $F$. We denote by $dT_F(H)$ (resp. $dT^*_F(H)$, $d\varphi_{1F}(H)$, $d\mu_{1F}(H)$) the Gateaux derivative of $T$ (resp. $T^*$, $\varphi_1$, $\mu_1$) in the direction $H$ at the point $F$ ($h(x,y)$ denotes the p.d.f. of $H$). We will derive the explicit expressions of $d\mu_{1F}(H)$ and $d\varphi_{1F}(H)$. The Gateaux derivative of the first-order condition corresponding to $\mu_1$ and $\varphi_1$ is

$$d\mu_{1F}(H)\varphi_1 = dT^*_F(H)T\varphi_1 + T^*dT_F(H)\varphi_1 + (T^*T - \mu_1 I)\,d\varphi_{1F}(H), \tag{B.1}$$

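where $I$ is the identity operator.

(a) Expression of $d\mu_{1F}(H)$: Let us focus on $d\mu_{1F}(H)$. To eliminate the term $d\varphi_{1F}(H)$ in the expression above, we compute the inner product of this equation with $\varphi_1$. We find

$$d\mu_{1F}(H)\langle\varphi_1,\varphi_1\rangle = \langle\varphi_1, (dT^*_F(H)T + T^*dT_F(H))\varphi_1\rangle + \langle\varphi_1, (T^*T - \mu_1 I)\,d\varphi_{1F}(H)\rangle.$$

By the normalization condition $\langle\varphi_1,\varphi_1\rangle = 1$, its derivative satisfies

$$\langle\varphi_1, d\varphi_{1F}(H)\rangle = 0.$$

Moreover, by using the definitions of the adjoint operator and of $\psi_1$, we get

$$d\mu_{1F}(H) = \mu_1^{1/2}\,[\langle\varphi_1, dT^*_F(H)\psi_1\rangle + \langle\psi_1, dT_F(H)\varphi_1\rangle]. \tag{B.2}$$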

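Let us now compute the quantities $dT_F(H)\varphi_1(y)$ and $dT^*_F(H)\psi_1(x)$. We get

$$dT_F(H)\varphi_1(y) = \int \varphi_1(x)\left[\frac{h(x,y)}{f(\cdot,y)} - \frac{f(x,y)}{f(\cdot,y)^2}\,h(\cdot,y)\right]dx$$

$$= \frac{1}{f(\cdot,y)}\left[\int \varphi_1(x)h(x,y)\,dx - T\varphi_1(y)\,h(\cdot,y)\right]$$

$$= \frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(x) - \mu_1^{1/2}\psi_1(y)\bigr)h(x,y)\,dx$$

and

$$dT^*_F(H)\psi_1(x) = \frac{1}{f(x,\cdot)}\int \bigl(\psi_1(y) - \mu_1^{1/2}\varphi_1(x)\bigr)h(x,y)\,dy. \tag{B.3}$$

By introducing this expression in condition (B.2), we get the expression of $d\mu_{1F}(H)$ as an integral with respect to $h$:

$$d\mu_{1F}(H) = \mu_1^{1/2}\int \bigl[2\varphi_1(x)\psi_1(y) - \mu_1^{1/2}\varphi_1^2(x) - \mu_1^{1/2}\psi_1^2(y)\bigr]h(x,y)\,dx\,dy. \tag{B.4}$$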

(b) Expression of $d\varphi_{1F}(H)$: From (B.1), we get

$$(\mu_1 I - T^*T)\,d\varphi_{1F}(H) = dT^*_F(H)T\varphi_1 + T^*dT_F(H)\varphi_1 - d\mu_{1F}(H)\varphi_1$$

$$= dT^*_F(H)T\varphi_1 + T^*dT_F(H)\varphi_1 - \langle\varphi_1, dT^*_F(H)T\varphi_1 + T^*dT_F(H)\varphi_1\rangle\varphi_1.$$

The r.h.s. is orthogonal to $\varphi_1$, that is, to the null space of the operator $\mu_1 I - T^*T$, which implies the existence of a solution $d\varphi_{1F}(H)$. Moreover, the solution is unique due to the constraint $\langle\varphi_1, d\varphi_{1F}(H)\rangle = 0$. It satisfies the Fredholm equation:

$$(\mu_1 I - T^*T)\,d\varphi_{1F}(H) = dv_F(H),$$

where $dv_F(H)$ denotes the r.h.s. It can be written as

$$d\varphi_{1F}(H)(x) = \frac{1}{\mu_1}\,dv_F(H) - \int b(x,s)\,dv_F(H)\,ds,$$

where

$$b(x,s) = \frac{1}{\mu_1}\sum_{j\neq 1}\frac{\mu_1}{\mu_1 - \mu_j}\,\varphi_j(x)\varphi_j(s).$$

Assumption A.15 ensures the convergence of the series defining $b$.

(c) Frechet differentiability: The directional Gateaux derivatives can be used to derive the first-order expansions of $\mu_1(F+H) - \mu_1(F)$, $\varphi_1(F+H) - \varphi_1(F)$ and $\psi_1(F+H) - \psi_1(F)$, whenever these functions are Frechet differentiable.

Let us consider the canonical variate $\varphi_1(F)$, which is solution of the implicit equation:

$$T^*_F T_F\varphi_1 - \langle T_F\varphi_1, T_F\varphi_1\rangle_F\,\varphi_1 = 0.$$

It is easily checked that the mapping:

$$g : (F,\varphi) \mapsto T^*_F T_F\varphi - \langle T_F\varphi, T_F\varphi\rangle_F\,\varphi$$

is Frechet differentiable for the $L^2(X)$ norm for $\varphi$ and $g$, and the norm:

$$\|F\|_{(\infty,1)} = \sup_{x,y}|F(x,y)| + \sup_{x,y}\sum_{i=1}^{d}\left|\frac{\partial F}{\partial x_i}(x,y)\right| + \sup_{x,y}\sum_{i=1}^{d}\left|\frac{\partial F}{\partial y_i}(x,y)\right|,$$

for $F$. We deduce the differentiability of $\varphi_1$ with respect to $F$ by the implicit function theorem. Indeed we have

$$\frac{\partial g}{\partial\varphi}(F,\varphi_1)\,d\varphi_{1F} = (T^*T - \mu_1 I)\,d\varphi_{1F} \neq 0.$$

Similarly, we can check that

$$\mu_1(F) = \max_{\varphi,\psi}\langle T_F\varphi, \psi\rangle_F,$$

where the maximum is computed on the pairs $(\varphi,\psi)$ of continuously differentiable functions, is Frechet differentiable for the norm:

$$\|F\|_{(\infty,0)} = \sup_{x,y}|F(x,y)|,$$

for $F$ and the standard norm of $\mathbb{R}$ for $\mu$.

(ii) Asymptotic distribution of $\lambda_{1,N}$. By the Frechet differentiability of $\mu_1(F)$ and Assumption A.18, we deduce the first-order expansion of $\mu_{1,N}$. We have

$$\mu_{1,N} - \mu_1 = \mu_1(F_N) - \mu_1(F) = d\mu_{1F}(F_N - F) + o(\|F_N - F\|_{(\infty,0)})$$

and

$$\sqrt{N}(\mu_{1,N} - \mu_1) = \sqrt{N}\,d\mu_{1F}(F_N - F) + o(\sqrt{N}\|F_N - F\|_{(\infty,0)}).$$

Since $\sqrt{N}\|F_N - F\|_{(\infty,0)} = O_p(1)$ (see Aït-Sahalia, 1992, proof of Theorem 2), the two terms $\sqrt{N}(\mu_{1,N} - \mu_1)$ and $\sqrt{N}\,d\mu_{1F}(F_N - F)$ have the same asymptotic distribution. We get

$$\sqrt{N}\,d\mu_{1F}(F_N - F) = \sqrt{N}\int \mu_1^{1/2}\bigl[2\varphi_1(x)\psi_1(y) - \mu_1^{1/2}\varphi_1^2(x) - \mu_1^{1/2}\psi_1^2(y)\bigr]\,[f_N(x,y) - f(x,y)]\,dx\,dy,$$

and the asymptotic distribution of $\mu_{1,N}$ follows from Lemma B.2(i) with $g(x,y) = \mu_1^{1/2}[2\varphi_1(x)\psi_1(y) - \mu_1^{1/2}\varphi_1^2(x) - \mu_1^{1/2}\psi_1^2(y)]$. We immediately deduce the asymptotic distribution of $\lambda_{1,N} = (\mu_{1,N})^{1/2}$.
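The last step is an application of the delta method to the square root. A short worked version is sketched below, writing $\mu_1 = \lambda_1^2$ for the squared canonical correlation, assuming $\lambda_1 > 0$, and denoting by $V$ the asymptotic variance delivered by Lemma B.2(i) for the function $g$ above (the notation $V$ for this particular variance is introduced here for the example):

```latex
\sqrt{N}\,(\lambda_{1,N} - \lambda_1)
  = \sqrt{N}\,\bigl(\mu_{1,N}^{1/2} - \mu_1^{1/2}\bigr)
  = \frac{1}{2\mu_1^{1/2}}\,\sqrt{N}\,(\mu_{1,N} - \mu_1) + o_p(1)
  \xrightarrow{d} N\!\left(0, \frac{V}{4\lambda_1^{2}}\right).
```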

(iii) Asymptotic distribution of $\varphi_{1,N}$. By the Frechet differentiability of $\varphi_1(F)$ and Assumption A.18, we deduce the first-order expansion of $\varphi_{1,N}$. We get

$$\varphi_{1,N} - \varphi_1 = \varphi_1(F_N) - \varphi_1(F) = d\varphi_{1F}(F_N - F) + o(\|F_N - F\|_{(\infty,1)})$$

and

$$\sqrt{N h_{1N}^d}\,(\varphi_{1,N} - \varphi_1) = \sqrt{N h_{1N}^d}\,d\varphi_{1F}(F_N - F) + o\Bigl(\sqrt{N h_{1N}^d}\,\|F_N - F\|_{(\infty,1)}\Bigr).$$

Since $\sqrt{N h_{1N}^d}\,\|F_N - F\|_{(\infty,1)} = O_p(1)$ (see Aït-Sahalia, 1992, proof of Theorem 3), $\sqrt{N h_{1N}^d}\,(\varphi_{1,N} - \varphi_1)$ and $\sqrt{N h_{1N}^d}\,d\varphi_{1F}(F_N - F)$ have the same asymptotic distribution (see Serfling, 1980, Lemma B, p. 218). Moreover, by using the expression of $d\varphi_{1F}(H)$ and comparing the rates of convergence, we get

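$$\sqrt{N h_{1N}^d}\,d\varphi_{1F}(F_N - F)(x) = \frac{1}{\mu_1}\sqrt{N h_{1N}^d}\,dT^*_F(F_N - F)\,T\varphi_1(x) + o_p(1)$$

$$= \frac{1}{\lambda_1}\frac{1}{f(x,\cdot)}\sqrt{N h_{1N}^d}\int [\psi_1(y) - \lambda_1\varphi_1(x)]\,[f_N(x,y) - f(x,y)]\,dy + o_p(1). \tag{B.5}$$

We apply Lemma B.2(ii) with $g(x,y) = (1/\lambda_1)[1/f(x,\cdot)][\psi_1(y) - \lambda_1\varphi_1(x)]$ to get the asymptotic distribution of $\varphi_{1,N}(x)$:

$$\sqrt{N h_{1N}^d}\,(\varphi_{1,N}(x) - \varphi_1(x)) \xrightarrow{d} N(0, W_{1,1}(x)),$$

where

$$W_{1,1}(x) = \int K_1^2(u)\,du\;\frac{1}{\lambda_1^2}\frac{1}{f^2(x,\cdot)}\int [\psi_1(y) - \lambda_1\varphi_1(x)]^2 f(x,y)\,dy$$

$$= \frac{1}{\lambda_1^2}\frac{1}{f(x,\cdot)}\int K_1^2(u)\,du\;V[\psi_1(Y) \mid X = x].$$

The proof is similar for deriving the asymptotic distributions of $(\varphi_{1,N}(x),\ldots,\varphi_{p,N}(x))$ and of $(\psi_{1,N}(x),\ldots,\psi_{p,N}(x))$.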
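The simplification of $W_{1,1}(x)$ into a conditional variance relies on the first-order condition $E[\psi_1(Y) \mid X = x] = T^*\psi_1(x) = \lambda_1\varphi_1(x)$. A worked version of that intermediate step, under this condition, is:

```latex
\int [\psi_1(y) - \lambda_1\varphi_1(x)]^2\,\frac{f(x,y)}{f(x,\cdot)}\,dy
  = E\bigl[(\psi_1(Y) - E[\psi_1(Y) \mid X = x])^2 \mid X = x\bigr]
  = V[\psi_1(Y) \mid X = x],
\qquad\text{so that}\qquad
\frac{1}{f^2(x,\cdot)}\int [\psi_1(y) - \lambda_1\varphi_1(x)]^2 f(x,y)\,dy
  = \frac{1}{f(x,\cdot)}\,V[\psi_1(Y) \mid X = x].
```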

Appendix C. Proof of Theorem 3.1

The proof of consistency is similar to the proof given for Theorem 2.1 in Appendix A. We derive asymptotic distributions of the estimators under the reversibility hypothesis. We only present the expansions for the constrained estimators of type (iii) defined by

$$\tfrac{1}{2}(T^*T + TT^*)\,\varphi^R_{i,N} = \mu^R_{i,N}\,\varphi^R_{i,N}, \qquad \langle\varphi^R_{i,N}, \varphi^R_{i,N}\rangle = 1,$$

where $\mu^R_{i,N} = (\lambda^R_{i,N})^2$. For expository purpose, we consider the case $p = 1$. The Gateaux derivative of the first-order condition is

$$d\mu^R_{1F}(H)\varphi_1 = \tfrac{1}{2}\,d(T^*_F T_F + T_F T^*_F)(H)\varphi_1 + \bigl(\tfrac{1}{2}(T^*T + TT^*) - \mu_1 I\bigr)\,d\varphi^R_{1F}(H).$$

The Gateaux derivatives are computed at a point $F$ which satisfies the reversibility hypothesis and is such that $T^* = T$, $\varphi_1 = \psi_1$. We get

$$d\mu^R_{1F}(H)\varphi_1 = \tfrac{1}{2}\,[dT^*_F(H)T + T\,dT_F(H) + dT_F(H)T + T\,dT^*_F(H)]\varphi_1 + (T^2 - \mu_1 I)\,d\varphi^R_{1F}(H), \tag{C.1}$$

where $I$ is the identity operator.

(i) Expression of $d\mu^R_{1F}(H)$: To eliminate the terms $d\varphi^R_{1F}(H)$ in the previous expression, we compute the inner product of this equation with $\varphi_1$. We find

$$d\mu^R_{1F}(H) = \mu_1^{1/2}\,[\langle\varphi_1, dT^*_F(H)\varphi_1\rangle + \langle\varphi_1, dT_F(H)\varphi_1\rangle]. \tag{C.2}$$

By comparing with the Gateaux derivative of the unconstrained estimator (see Appendix B), we note that

$$d\mu^R_{1F}(H) = d\mu_{1F}(H),$$

under the reversibility hypothesis. In particular, the constrained and unconstrained estimated canonical correlations have the same asymptotic distributions.

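(ii) Expression of $d\varphi^R_{1F}(H)$: From (C.1), we get

$$(\mu_1 I - T^2)\,d\varphi^R_{1F}(H) = dv^R_F(H),$$

where

$$dv^R_F(H) = \tfrac{1}{2}\,[(dT^*_F(H) + dT_F(H))T + T(dT^*_F(H) + dT_F(H))]\varphi_1 - d\mu^R_{1F}(H)\varphi_1.$$

When $H$ is replaced by $F_N - F$ in the expansion and by comparing the rates of convergence, we get

$$\sqrt{N h_N^d}\,d\varphi^R_{1F}(F_N - F)(y) = \frac{\sqrt{N h_N^d}}{\mu_1}\,\frac{1}{2}\,[dT^*_F(F_N - F) + dT_F(F_N - F)]\,T\varphi_1(y) + o_p(1)$$

$$= \frac{\sqrt{N h_N^d}}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(x) - \mu_1^{1/2}\varphi_1(y)\bigr)\left[\frac{f_N(x,y) + f_N(y,x)}{2} - f(x,y)\right]dx + o_p(1)$$

$$= \frac{\sqrt{N h_N^d}}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(x) - \mu_1^{1/2}\varphi_1(y)\bigr)\,[f^R_N(x,y) - f(x,y)]\,dx + o_p(1).$$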

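We apply Lemma B.2(iii) with $g(x,y) = (1/\lambda_1)[1/f(\cdot,y)][\varphi_1(x) - \lambda_1\varphi_1(y)]$ to obtain the asymptotic distribution of $\varphi^R_{1,N}$:

$$\sqrt{N h_N^d}\,(\varphi^R_{1,N}(x) - \varphi_1(x)) \xrightarrow{d} N(0, W^R_{1,1}(x)),$$

where

$$W^R_{1,1}(x) = \frac{1}{2}\frac{1}{\lambda_1^2}\frac{1}{f^2(x,\cdot)}\int K_1^2(u)\,du\int [\varphi_1(y) - \lambda_1\varphi_1(x)]^2 f(x,y)\,dy$$

$$= \frac{1}{2}\frac{1}{\lambda_1^2}\frac{1}{f(x,\cdot)}\int K_1^2(u)\,du\;V[\varphi_1(Y) \mid X = x].$$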
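The constrained estimator used in this appendix is the symmetrized kernel density $f^R_N(x,y) = [f_N(x,y) + f_N(y,x)]/2$. A minimal sketch is given below, assuming $d = 1$, a Gaussian kernel and a common bandwidth; the function names, the simulated series and the bandwidth value are hypothetical choices for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumption of this sketch)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_density(x, y, xs, ys, h):
    """Unconstrained estimator f_N(x, y) with a common bandwidth h (d = 1)."""
    weights = gaussian_kernel((x - xs) / h) * gaussian_kernel((y - ys) / h)
    return np.mean(weights) / h**2

def reversible_density(x, y, xs, ys, h):
    """Constrained estimator f_N^R(x, y) = [f_N(x, y) + f_N(y, x)] / 2."""
    return 0.5 * (kernel_density(x, y, xs, ys, h) + kernel_density(y, x, xs, ys, h))

# Hypothetical series; the pairs are (X_t, X_{t-1}).
rng = np.random.default_rng(1)
series = rng.normal(size=2001)
xs, ys = series[1:], series[:-1]

a = reversible_density(0.3, -0.7, xs, ys, h=0.4)
b = reversible_density(-0.7, 0.3, xs, ys, h=0.4)
```

By construction `a` and `b` coincide, which is the symmetry $f^R_N(x,y) = f^R_N(y,x)$ imposed by the reversibility hypothesis.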

Appendix D. Proof of Theorem 3.2

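(i) Second-order Gateaux derivatives: Let us derive the second-order Gateaux derivative of $d\mu_{1F} - d\mu^R_{1F}$ at a point $F$ satisfying the reversibility hypothesis. From (B.4) and (C.2) we have

$$d\mu_{1F}(H) = \int \bigl[2\mu_1^{1/2}\varphi_1(x)\psi_1(y) - \mu_1\varphi_1^2(x) - \mu_1\psi_1^2(y)\bigr]h(x,y)\,dx\,dy,$$

and

$$d\mu^R_{1F}(H) = \int \bigl[2\mu_1^{1/2}\varphi_1(x)\varphi_1(y) - \mu_1\varphi_1^2(x) - \mu_1\varphi_1^2(y)\bigr]h(x,y)\,dx\,dy.$$

Then we compute $d^2\mu_{1F}(H,L) - d^2\mu^R_{1F}(H,L)$ as

$$\int d\bigl[2\mu_1^{1/2}\varphi_1(x)\psi_1(y) - 2\mu_1^{1/2}\varphi_1(x)\varphi_1(y) - \mu_1\psi_1^2(y) + \mu_1\varphi_1^2(y)\bigr](L)\,h(x,y)\,dx\,dy$$

$$= \int d\bigl[\bigl(2\mu_1^{1/2}\varphi_1(x) - \mu_1(\psi_1(y) + \varphi_1(y))\bigr)(\psi_1(y) - \varphi_1(y))\bigr](L)\,h(x,y)\,dx\,dy$$

$$= 2\int \mu_1^{1/2}\bigl[\varphi_1(x) - \mu_1^{1/2}\varphi_1(y)\bigr]\,d(\psi_{1F} - \varphi_{1F})(L)(y)\,h(x,y)\,dx\,dy, \tag{D.1}$$

since $\psi_1 = \varphi_1$ under the reversibility hypothesis. The notation $d(\psi_{1F} - \varphi_{1F})(L)$ is used for $d\psi_{1F}(L) - d\varphi_{1F}(L)$.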

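(ii) Second-order expansion of $\mu_{1,N} - \mu^R_{1,N}$: The first nonzero term of the expansion is $d^2(\mu_{1F} - \mu^R_{1F})(L,L)$, where $L = F_N - F$. From (D.1), we get

$$\mu_{1,N} - \mu^R_{1,N} \sim 2\mu_1\int \frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\bigl(\varphi_1(x) - \lambda_1\varphi_1(y)\bigr)\,d(\psi_{1F} - \varphi_{1F})(L)(y)\,l(x,y)\,f(\cdot,y)\,dx\,dy$$

$$= 2\mu_1\int\left[\frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(x) - \lambda_1\varphi_1(y)\bigr)l(x,y)\,dx\right] d(\psi_{1F} - \varphi_{1F})(L)(y)\,f(\cdot,y)\,dy. \tag{D.2}$$

We know from (B.5) that

$$d\varphi_{1F}(L)(y) \sim \frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)l(y,s)\,ds$$

and

$$d\psi_{1F}(L)(y) \sim \frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)l(s,y)\,ds.$$

Then

$$d(\psi_{1F} - \varphi_{1F})(L)(y) \sim \frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)[l(s,y) - l(y,s)]\,ds,$$

when $F$ satisfies the reversibility hypothesis. Moreover, we note that

$$d(\varphi_{1F} - \varphi^R_{1F})(L)(y) \sim \frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)\left[l(y,s) - \frac{l(y,s) + l(s,y)}{2}\right]ds$$

$$\sim \frac{1}{2}\frac{1}{\lambda_1}\frac{1}{f(\cdot,y)}\int \bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)[l(y,s) - l(s,y)]\,ds$$

$$\sim -\tfrac{1}{2}\,d(\psi_{1F} - \varphi_{1F})(L)(y). \tag{D.3}$$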

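By replacing in (D.2), we get

$$\mu_{1,N} - \mu^R_{1,N} \sim -4\mu_1\int d\psi_{1F}(L)(y)\,d(\varphi_{1F} - \varphi^R_{1F})(L)(y)\,f(\cdot,y)\,dy$$

$$= -4\mu_1\langle d\psi_{1F}(L), d(\varphi_{1F} - \varphi^R_{1F})(L)\rangle \quad \text{(say)}.$$

(iii) Additional equivalences: Since the computations are symmetrical in $\varphi$, $\psi$, we deduce

$$\langle d\psi_{1F}(L), d(\varphi_{1F} - \varphi^R_{1F})(L)\rangle = \langle d\varphi_{1F}(L), d(\psi_{1F} - \psi^R_{1F})(L)\rangle.$$

By applying (D.3), we get

$$\langle d\psi_{1F}(L), d(\psi_{1F} - \varphi_{1F})(L)\rangle \sim -\langle d\varphi_{1F}(L), d(\psi_{1F} - \varphi_{1F})(L)\rangle,$$

or equivalently

$$\langle d(\psi_{1F} + \varphi_{1F})(L), d(\psi_{1F} - \varphi_{1F})(L)\rangle \sim 0. \tag{D.4}$$

We deduce that

$$\langle d\psi_{1F}(L), d(\varphi_{1F} - \varphi^R_{1F})(L)\rangle \sim -\tfrac{1}{2}\langle d\psi_{1F}(L), d(\psi_{1F} - \varphi_{1F})(L)\rangle \quad \text{(from (D.3))}$$

$$\sim -\tfrac{1}{4}\langle d(\psi_{1F} - \varphi_{1F})(L), d(\psi_{1F} - \varphi_{1F})(L)\rangle \quad \text{(from (D.4))},$$

and therefore

$$\mu_{1,N} - \mu^R_{1,N} \sim \mu_1\int [d(\psi_{1F} - \varphi_{1F})(L)(y)]^2 f(\cdot,y)\,dy.$$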

Appendix E. Proof of Theorem 3.3

(i) Decomposition of $I_N - EI_N$: By applying (D.3), we get

$$I_N = \mu_1\int [\varphi_{1N}(y) - \psi_{1N}(y)]^2 f(\cdot,y)\,dy$$

$$= \iiint \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)\,[l_N(x,y) - l_N(y,x)]\,[l_N(s,y) - l_N(y,s)]\,dx\,ds\,dy,$$

where $l_N(x,y) = f_N(x,y) - f(x,y)$. Under the reversibility hypothesis, we have $El_N(X,Y) = El_N(Y,X)$. We deduce that

$$I_N - EI_N = \frac{1}{N h_N^{d/2}}\,\frac{2}{N}\sum_{i<j}[H_N(Z_i,Z_j) - EH_N(Z_i,Z_j)] + \frac{1}{N^{3/2} h_N^{d/2}}\,\frac{1}{\sqrt{N}}\sum_{i}[H_N(Z_i,Z_i) - EH_N(Z_i,Z_i)], \tag{E.1}$$

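where

$$H_N(Z_i,Z_j) = \iiint \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{7d/2}}$$

$$\times\left[K\!\left(\frac{x - x_i}{h_N}\right)K\!\left(\frac{y - y_i}{h_N}\right) - E\!\left[K\!\left(\frac{x - x_i}{h_N}\right)K\!\left(\frac{y - y_i}{h_N}\right)\right] - K\!\left(\frac{y - x_i}{h_N}\right)K\!\left(\frac{x - y_i}{h_N}\right) + E\!\left[K\!\left(\frac{y - x_i}{h_N}\right)K\!\left(\frac{x - y_i}{h_N}\right)\right]\right]$$

$$\times\left[K\!\left(\frac{s - x_j}{h_N}\right)K\!\left(\frac{y - y_j}{h_N}\right) - E\!\left[K\!\left(\frac{s - x_j}{h_N}\right)K\!\left(\frac{y - y_j}{h_N}\right)\right] - K\!\left(\frac{y - x_j}{h_N}\right)K\!\left(\frac{s - y_j}{h_N}\right) + E\!\left[K\!\left(\frac{y - x_j}{h_N}\right)K\!\left(\frac{s - y_j}{h_N}\right)\right]\right]ds\,dx\,dy. \tag{E.2}$$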

(ii) Asymptotic distribution of $I_N - EI_N$: We can now apply Tenreiro (1997) and Theorem 1 concerning the asymptotic behavior of U-statistics under dependence conditions. Let us denote by $\bar Z_0$ an independent copy of $Z_0$. Under the assumptions below:

A.1* $EH_N(Z_0, z) = 0$;

A.2* $H_N(Z_i,Z_j) = H_N(Z_j,Z_i)$;

A.3* $E[H_N(Z_0,\bar Z_0)^2] = 2\sigma^2 + o(1)$;

A.4* There exist $\delta_0 > 0$, $\gamma_0 < \frac{1}{2}$ such that $u_N(4+\delta_0) = O(N^{\gamma_0})$, where $u_N(r) = \max\{\max_{1\le i\le N}\|H_N(Z_i,Z_0)\|_r, \|H_N(Z_0,\bar Z_0)\|_r\}$ and $\|\cdot\|_r = (E|\cdot|^r)^{1/r}$;

the term

$$\frac{1}{N}\sum_{1\le i\le j\le N}[H_N(Z_i,Z_j) - EH_N(Z_i,Z_j)]$$

is asymptotically normal with zero mean and variance $\sigma^2$.

We easily check that Assumptions A.1* and A.2* are satisfied, and the expression of $\sigma^2$ is computed below. Moreover, since the second term of the decomposition (E.1) is of order $O(N^{-3/2}h_N^d)$, we deduce that

$$N h_N^{d/2}(I_N - EI_N) \xrightarrow{d} N(0,\sigma^2).$$

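(iii) Computation of $\sigma^2$: We get

$$H_N(Z_i,\bar Z_i) = \frac{1}{h_N^{7d/2}}\iiint \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(s) - \lambda_1\varphi_1(y)\bigr)$$

$$\times\left[K\!\left(\frac{x - x_i}{h_N}\right)K\!\left(\frac{y - x_{i-1}}{h_N}\right) - K\!\left(\frac{x - x_{i-1}}{h_N}\right)K\!\left(\frac{y - x_i}{h_N}\right)\right]$$

$$\times\left[K\!\left(\frac{s - \bar x_i}{h_N}\right)K\!\left(\frac{y - \bar x_{i-1}}{h_N}\right) - K\!\left(\frac{s - \bar x_{i-1}}{h_N}\right)K\!\left(\frac{y - \bar x_i}{h_N}\right)\right]ds\,dx\,dy$$

$$\sim A_{N,1} + A_{N,2} + A_{N,3} + A_{N,4},$$

where

$$A_{N,1} = \int \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(\bar x_i) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y - x_{i-1}}{h_N}\right)K\!\left(\frac{y - \bar x_{i-1}}{h_N}\right)dy,$$

$$A_{N,2} = -\int \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(\bar x_{i-1}) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y - x_{i-1}}{h_N}\right)K\!\left(\frac{y - \bar x_i}{h_N}\right)dy,$$

$$A_{N,3} = -\int \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x_{i-1}) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(\bar x_i) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y - x_i}{h_N}\right)K\!\left(\frac{y - \bar x_{i-1}}{h_N}\right)dy,$$

$$A_{N,4} = \int \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x_{i-1}) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(\bar x_{i-1}) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y - x_i}{h_N}\right)K\!\left(\frac{y - \bar x_i}{h_N}\right)dy,$$

are functions of the independent copies $(x_i, x_{i-1})$, $(\bar x_i, \bar x_{i-1})$. When computing the quantity:

$$E[H_N(Z_i,\bar Z_i)^2] = E(A_{N,1} + A_{N,2} + A_{N,3} + A_{N,4})^2,$$

we get squares of the type $E[A_{N,j}^2]$ and cross terms like $E[A_{N,j}A_{N,k}]$, $k \neq j$. We first check that the cross terms can be neglected. For instance, we have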

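$$E[A_{N,1}A_{N,2}] = -\iiiint\left\{\int \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(\bar x_i) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y - x_{i-1}}{h_N}\right)K\!\left(\frac{y - \bar x_{i-1}}{h_N}\right)dy\right\}$$

$$\times\left\{\int \frac{1}{f(\cdot,y^*)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(y^*)\bigr)\bigl(\varphi_1(\bar x_{i-1}) - \lambda_1\varphi_1(y^*)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y^* - x_{i-1}}{h_N}\right)K\!\left(\frac{y^* - \bar x_i}{h_N}\right)dy^*\right\}$$

$$\times f(x_i,x_{i-1})\,f(\bar x_i,\bar x_{i-1})\,dx_i\,dx_{i-1}\,d\bar x_i\,d\bar x_{i-1}$$

$$= -\iiiint\left\{\int \frac{1}{f(\cdot,x_{i-1} + uh_N)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(x_{i-1} + uh_N)\bigr)\bigl(\varphi_1(x_{i-1} - v^*h_N) - \lambda_1\varphi_1(x_{i-1} + uh_N)\bigr)K(u)K(u+v)\,du\right\}$$

$$\times\left\{\int \frac{1}{f(\cdot,x_{i-1} + u^*h_N)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(x_{i-1} + u^*h_N)\bigr)\bigl(\varphi_1(x_{i-1} - vh_N) - \lambda_1\varphi_1(x_{i-1} + u^*h_N)\bigr)K(u^*)K(u^* + v^*)\,du^*\right\}$$

$$\times f(x_i,x_{i-1})\,f(x_{i-1} - v^*h_N, x_{i-1} - vh_N)\,h_N^d\,dx_i\,dx_{i-1}\,dv\,dv^*$$

$$= O(h_N^d).$$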

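Let us now consider a square term. We get

$$E[A_{N,1}^2] = \iiiint\left\{\int \frac{1}{f(\cdot,y)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(y)\bigr)\bigl(\varphi_1(\bar x_i) - \lambda_1\varphi_1(y)\bigr)\,\frac{1}{h_N^{3d/2}}\,K\!\left(\frac{y - x_{i-1}}{h_N}\right)K\!\left(\frac{y - \bar x_{i-1}}{h_N}\right)dy\right\}^2 f(x_i,x_{i-1})\,f(\bar x_i,\bar x_{i-1})\,dx_i\,dx_{i-1}\,d\bar x_i\,d\bar x_{i-1}$$

$$= \iiiint\left\{\int \frac{1}{f(\cdot,x_{i-1} + uh_N)}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(x_{i-1} + uh_N)\bigr)\bigl(\varphi_1(\bar x_i) - \lambda_1\varphi_1(x_{i-1} + uh_N)\bigr)K(u)K(u+v)\,du\right\}^2 f(x_i,x_{i-1})\,f(\bar x_i, x_{i-1} - vh_N)\,dx_i\,d\bar x_i\,dx_{i-1}\,dv$$

$$\sim \iiiint\left\{\int \frac{1}{f(\cdot,x_{i-1})}\bigl(\varphi_1(x_i) - \lambda_1\varphi_1(x_{i-1})\bigr)\bigl(\varphi_1(\bar x_i) - \lambda_1\varphi_1(x_{i-1})\bigr)K(u)K(u+v)\,du\right\}^2 f(x_i,x_{i-1})\,f(\bar x_i,x_{i-1})\,dx_i\,d\bar x_i\,dx_{i-1}\,dv$$

$$= \int\left[\int K(u)K(u+v)\,du\right]^2 dv\int \bigl(\mathrm{Var}[\varphi_1(X_i) \mid X_{i-1} = x_{i-1}]\bigr)^2 dx_{i-1}.$$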
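The kernel-dependent constant $\int[\int K(u)K(u+v)\,du]^2\,dv$ appearing in the last expression can be evaluated numerically once a kernel is chosen. The sketch below assumes $d = 1$ and a Gaussian kernel (an assumption made for the example; the paper's kernel may differ), for which the inner integral is the $N(0,2)$ density and the constant has the closed form $1/(2\sqrt{2\pi})$. The function name and grid settings are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumption of this sketch)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def convolution_constant(kernel, lim=12.0, n=2401):
    """Approximate int [ int K(u) K(u + v) du ]^2 dv by Riemann sums (d = 1)."""
    grid = np.linspace(-lim, lim, n)
    step = grid[1] - grid[0]
    # inner[j] approximates int K(u) K(u + v_j) du for each grid point v_j
    inner = np.array([np.sum(kernel(grid) * kernel(grid + v)) * step for v in grid])
    return np.sum(inner**2) * step

value = convolution_constant(gaussian_kernel)
closed_form = 1.0 / (2.0 * np.sqrt(2.0 * np.pi))  # valid for the Gaussian kernel only
```

For another kernel, only the `kernel` argument needs to change; the closed form above no longer applies in that case.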

It is easily checked that $EA_{N,j}^2 = EA_{N,1}^2$, $\forall j$.

(iv) Computation of $EI_N$: By the strong stationarity of the process $(Z_i)$ and the symmetry of the function $H_N$, we get

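$$EI_N = \frac{1}{N^2 h_N^{d/2}}\sum_{i=1}^{N}\sum_{j=1}^{N}[EH_N(Z_i,Z_j)]$$

$$= \frac{1}{N^2 h_N^{d/2}}\sum_{k=-\infty}^{+\infty}[\max(N - |k|, 0)\,EH_N(Z_i,Z_{i+k})]$$

$$\sim \frac{1}{N h_N^{d/2}}\sum_{k=-\infty}^{+\infty}EH_N(Z_i,Z_{i+k}). \tag{E.3}$$

The analysis of the limiting behavior of $EH_N(Z_i,Z_{i+k})$ is based on the equivalences below:

$$\int K_N(x,x_i)\,a(x)\,dx = a(x_i) + o(h_N), \tag{E.4}$$

$$\int K_N(y,y_i)K_N(y,y_j)\,a(y)\,dy = \int K(u)\,\frac{1}{h_N^d}\,K\!\left(u + \frac{y_i - y_j}{h_N}\right)a(y_i + uh_N)\,du$$

$$= \begin{cases} o(h_N) & \text{if } y_i \neq y_j, \\[4pt] \dfrac{1}{h_N^d}\left[a(y_i)\displaystyle\int K^2(u)\,du + o(h_N)\right] & \text{if } y_i = y_j, \end{cases} \tag{E.5}$$

whenever $K$ tends to zero at infinity at a sufficient rate (see Assumption A.8). In the application on time series, we have $y_i = x_{i-1}$. From decomposition (E.2) and equivalence (E.5), we deduce that the expectations:

$$EH_N(Z_i,Z_{i+k}) = o(h_N),$$

if $|k| \ge 2$. Moreover, we have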
if |k|¿ 2. Moreover, we have

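$$EH_N(Z_i,Z_{i+1}) = -\int K^2(u)\,du\int f(x_{i+1} \mid x_i)\,f(x_i \mid x_{i-1})\bigl(\varphi_1(x_{i-1}) - \lambda_1\varphi_1(x_i)\bigr)\bigl(\varphi_1(x_{i+1}) - \lambda_1\varphi_1(x_i)\bigr)\,dx_{i-1}\,dx_i\,dx_{i+1} + o(h_N)$$

$$= o(h_N),$$

since $\int \bigl(\varphi_1(x_{i+1}) - \lambda_1\varphi_1(x_i)\bigr)f(x_{i+1} \mid x_i)\,dx_{i+1} = 0$. Then we compute

$$EH_N(Z_i,Z_i) = \frac{1}{h_N^{d/2}}\int K^2(u)\,du\left\{\iint \bigl(\varphi_1(x_i) - \lambda_1\varphi_1(x_{i-1})\bigr)^2 f(x_i \mid x_{i-1})\,dx_i\,dx_{i-1} + \iint \bigl(\varphi_1(x_{i-1}) - \lambda_1\varphi_1(x_i)\bigr)^2 f(x_{i-1} \mid x_i)\,dx_{i-1}\,dx_i\right\}$$

$$= \frac{2}{h_N^{d/2}}\int K^2(u)\,du\int \mathrm{Var}[\varphi_1(X_{i+1}) \mid X_i = x_i]\,dx_i,$$

due to the reversibility hypothesis. We get

$$EI_N = \frac{2}{N h_N^d}\int K^2(u)\,du\int \mathrm{Var}[\varphi_1(X_{i+1}) \mid X_i = x_i]\,dx_i.$$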

Appendix F. Variance matrices

We display in Tables 3 and 4 the constrained and unconstrained estimated canonical correlations, together with their estimated standard errors.

To construct a procedure for testing the reversibility hypothesis from Theorem 3.3, we give in Table 5 the covariance matrix of $\lambda_i - \lambda_i^R$, $i = 1,\ldots,6$.

Table 3
Estimated canonical correlations and standard error
Constrained estimation

i    $\lambda_i^R$    Standard error
1    0.25659518    0.048155669
2    0.14579950    0.030734183
3    0.12594681    0.015937796
4    0.12531014    0.032680527
5    0.00543565    0.025264084
6    0.00166780    0.007468832


Table 4
Estimated canonical correlations and standard error
Unconstrained estimation

i    $\lambda_i$    Standard error
1    0.35914099    0.031038080
2    0.18268443    0.033228646
3    0.14661676    0.092862605
4    0.03072694    0.067672767
5    0.00338121    0.029166852
6    0.00017670    0.0078029362

Table 5
Estimated covariance of difference of canonical correlations
Unconstrained estimation

i    $\lambda_i - \lambda_i^R$    Standard error
1    0.10254581    0.02038768
2    −0.03688493    0.01122321
3    −0.02066995    0.03007654
4    0.09458319    0.01987775
5    0.00205443    0.00616685
6    0.00149110    0.00250076
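As a quick arithmetic cross-check of the three tables, the short sketch below recomputes the differences between the unconstrained (Table 4) and constrained (Table 3) canonical correlations and compares them with the first column of Table 5. The variable names are hypothetical and the values are simply transcribed from the tables as printed here.

```python
# Canonical correlations transcribed from Tables 3, 4 and 5 (first columns only).
constrained = [0.25659518, 0.14579950, 0.12594681, 0.12531014, 0.00543565, 0.00166780]
unconstrained = [0.35914099, 0.18268443, 0.14661676, 0.03072694, 0.00338121, 0.00017670]
table5 = [0.10254581, -0.03688493, -0.02066995, 0.09458319, 0.00205443, 0.00149110]

# Differences lambda_i - lambda_i^R recomputed from Tables 3 and 4.
differences = [u - c for u, c in zip(unconstrained, constrained)]

# Largest discrepancy in absolute value between recomputed and tabulated entries.
max_abs_gap = max(abs(abs(d) - abs(t)) for d, t in zip(differences, table5))

# Whether each recomputed difference has the same sign as the tabulated one.
same_sign = [d * t > 0 for d, t in zip(differences, table5)]
```

In this transcription the magnitudes agree up to rounding, while only the first row has the sign of $\lambda_i - \lambda_i^R$; the remaining rows carry the opposite sign, which may be worth checking against the published tables before the values are used in a test statistic.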

References

Aït-Sahalia, Y., 1992. The delta and bootstrap methods for nonparametric kernel functionals. Working Paper, MIT.

Anderson, T.W., 1963. Asymptotic theory for principal component analysis. Annals of Mathematical Statistics 34, 122–148.

Barrett, J.F., Lampard, D.G., 1955. An expansion for some second order probability distributions and its application to noise problems. IRE Transactions on PGIT IT-1, 10–15.

Bickel, P., Rosenblatt, M., 1973. On some global measures of the deviations of density function estimates. Annals of Statistics 1, 1071–1095.

Black, F., 1976. Studies in stock price volatility changes. Proceedings of the 1976 Business Meeting of the Business and Economic Statistics Section, American Statistical Association, pp. 177–181.

Bosq, D., 1998. Nonparametric Statistics for Stochastic Processes. Lecture Notes in Statistics. Springer, New York.

Bradley, R., 1986. Basic properties of strong mixing conditions. In: Eberlein, E., et al. (Eds.), Dependence in Probability and Statistics, a Survey of Recent Results. Birkhäuser, Boston, pp. 165–192.

Chen, X., Hansen, L., Scheinkman, J., 1998. Shape preserving estimation of diffusions. Discussion Paper, University of Chicago.

Darolles, S., Gouriéroux, C., 2001. Truncated dynamics and estimation of diffusion equations. Journal of Econometrics 102, 1–22.

Darolles, S., Laurent, J.P., 2000. Approximating payoffs and pricing formulas. Journal of Economic Dynamics and Control 24, 1721–1746.

Darolles, S., Florens, J.P., Renault, E., 1998. Nonlinear principal components and inference on a conditional expectation operator. Discussion Paper, CREST.

Darolles, S., Gouriéroux, C., Le Fol, G., 2000. Intraday transaction price dynamics. Annales d'Economie et Statistique 60, 208–238.

Darolles, S., Florens, J.P., Renault, E., 2001. Nonparametric instrumental regression. Discussion Paper, CREST.

Dauxois, J., Nkiet, G., 1998. Nonlinear canonical analysis and independence tests. Annals of Statistics 26 (4), 1254–1278.

Dauxois, J., Pousse, A., 1975. Une extension de l'analyse canonique. Quelques applications. Annales de l'Institut Henri Poincaré 11, 355–379.

Ding, Z., Granger, C., 1996. Modelling volatility persistence of speculative returns: a new approach. Journal of Econometrics 73, 185–216.

Dunford, N., Schwartz, J., 1963. Linear Operators, Vol. 2. Wiley, New York.

Florens, J.P., Renault, E., Touzi, N., 1998. Testing for embeddability by stationary reversible continuous-time Markov processes. Econometric Theory 14, 744–769.

Gouriéroux, C., Jasiak, J., 1999. Nonlinear persistence and copersistence. Working Paper CEPREMAP 9920.

Gouriéroux, C., Jasiak, J., 2001. State-space models with finite dimensional dependence. Journal of Time Series Analysis 22, 665–678.

Gouriéroux, C., Jasiak, J., 2002. Nonlinear autocorrelograms with application to intertrade durations. Journal of Time Series Analysis 23, 127–154.

Gouriéroux, C., Jasiak, J., 2003. Jacobi process and smooth transitions. CREST, DP.

Hall, P., 1984a. Central limit theorem for integrated square error properties of multivariate nonparametric density estimators. Journal of Multivariate Analysis 14, 1–16.

Hall, P., 1984b. Integrated square error properties of kernel estimators of regression functions. Annals of Statistics 12, 241–260.

Hannan, E., 1961. The general theory of canonical correlation and its relation to functional analysis. Journal of the Australian Mathematical Society 2, 229–242.

Hansen, L., Scheinkman, J., 1995. Back to the future: generating moment implications for continuous time Markov processes. Econometrica 63, 767–807.

Hansen, L., Scheinkman, J., Touzi, N., 1998. Spectral methods for identifying scalar diffusions. Journal of Econometrics 86, 1–32.

Hotelling, H., 1936. Relations between two sets of variables. Biometrika 28, 321–377.

Jennrich, R., 1969. Asymptotic properties of nonlinear least squares estimators. Annals of Mathematical Statistics 40, 633–643.

Kessler, M., Sørensen, M., 1999. Estimating equations based on eigenfunctions for discretely observed diffusion process. Bernoulli 5, 299–314.

Lancaster, H., 1968. The structure of bivariate distributions. Annals of Mathematical Statistics 29, 719–736.

Larsen, K., Sørensen, M., 2003. Diffusion models for exchange rates in a target zone. University of Copenhagen, DP.

Meloche, J., 1990. Asymptotic behavior of the mean integrated squared error of kernel density estimators for dependent observations. Canadian Journal of Statistics 18, 205–224.

Nadaraya, E., 1983. A limit distribution of the square error deviation of nonparametric estimators of the regression functions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 64, 37–48.

Pagan, A., Ullah, A., 1999. Nonparametric Econometrics. Cambridge University Press, Cambridge.

Rosenblatt, M., 1956. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics 27, 832–837.

Rosenblatt, M., 1975. A quadratic measure of deviation of two dimensional density estimates and a test of independence. Annals of Statistics 3, 1–14.

Roussas, G., 1988. Nonparametric estimation in mixing sequences of random variables. Journal of Statistical Planning and Inference 18, 135–149.

Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.

Silverman, B., 1986. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Tenreiro, C., 1995. Théorèmes limites pour les erreurs quadratiques intégrées des estimateurs à noyau de la densité et de la régression sous des conditions de dépendance. Comptes Rendus de l'Académie des Sciences Paris, Series I 320, 1535–1538.

Tenreiro, C., 1997. Loi asymptotique des erreurs quadratiques intégrées des estimateurs à noyau de la densité et de la régression sous des conditions de dépendance. Portugaliae Mathematica 53, 187–213.
