Page 1

Optimal Dimension Reduction for (Functional) Time Series

Marc Hallin

Université libre de Bruxelles (ULB)

IMA Workshop on Forecasting from Complexity
Institute for Mathematics and its Applications

University of Minnesota, Minneapolis
April 23–27, 2018

Page 2

Partly based on

Hörmann, S., Kidziński, L., and Hallin, M. (2015). Dynamic functional principal components, Journal of the Royal Statistical Society Series B 77, 319–348.

Hallin, M., Hörmann, S., and Lippi, M. (2018). On optimal dimension reduction for high-dimensional and functional time series, Statistical Inference for Stochastic Processes, to appear.

Page 3

Dimension reduction techniques are at the core of the statistical analysis of high-dimensional observations

Xt high-dimensional “replaced with” Yt = (Yt1, . . . , YtK)′, K “small”, t = 1, . . . , T, where the Ytk's are scalars

• observations in Rp (p-dimensional, p large): Xt = (Xt1, . . . , Xtp)′;

• functional observations: Xt = Xt(u), u ∈ [0, 1] (any basis is infinite-dimensional);

Page 4

“Replacing Xt (p-dimensional or functional) by Yt” (K-dimensional, K “small”) means “linear approximations of Xt can be constructed from the Yt's”—via linear combinations (static loading) or linear filtering (dynamic loading) ...

Page 5

Dimension reduction techniques are at the core of the statistical analysis of high-dimensional observations

• observations in Rp, p large, 0 < K < p

• functional observations—intrinsically infinite-dimensional—0 < K < ∞

The latter case, where each observation is a function, is increasingly frequent in a variety of applications—econometrics, finance, environmental studies ... “Tick by tick” time series of returns, p stocks; daily pollution or climatic data (e.g. with half-hour resolution); daily geophysical data (e.g. horizontal component of the magnetic field with minute resolution); ocean surface salinity, velocities, etc.

Page 6

“Tick by tick” financial data

Figure 1: S&P100 market index (OEX, plotted against time) for 10 consecutive trading days, 405 measurements per day.

Page 7

Pollution data

Figure 2: 10 days of PM10 data (plotted against time) in Graz (Austria), 48 observations per day.

Page 8

Geophysical data

Figure 3: Horizontal component of the magnetic field measured in one-minute resolution at the Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT, 1440 measurements per day.

Page 9

Functional time series

Each day yields one observation Xt = {Xt(u), u ∈ [0, 1]}, which is treated as a real-valued function (or the corresponding curve).

A common feature of all those datasets is that they do exhibit serial dependence, hence should be analyzed as stochastic processes—functional time series.

The intraday process {Xt(u), u ∈ [0, 1]} typically is not stationary, while the functional process {Xt, t ∈ Z} is.

We shall assume throughout that Xt ∈ L2([0, 1]).

Page 10

Dimension reduction

Whether the data are in Rp or functional, the most popular dimension reduction technique is Principal Components, PCA (Functional Principal Components, FPCA).

Page 11

PCA in Rp

Let µ := EX = 0 and C := Cov(X), with eigenvectors e1, . . . , ep and eigenvalues λ1 > . . . > λp. The p-dimensional X decomposes into (Karhunen-Loève expansion)

X = ∑_{k=1}^{p} Yk ek = ∑_{k=1}^{p} 〈X, ek〉 ek,

where

• Yk := 〈X , ek〉 is X ’s kth Principal Component (a scalar)

• Cov(Y ) = diag(λ1, . . . , λp)

Interpretation: Y1 is the normed linear combination of X1, . . . , Xp with highest variance, etc.
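To make the above concrete, here is a minimal R sketch (toy data; every object name is illustrative, nothing here comes from the freqdom package) computing static principal components from the eigendecomposition of the empirical covariance matrix and comparing the rank-K approximation error with the discarded eigenvalues.

## Minimal sketch: static PCA via the eigendecomposition of the
## empirical covariance matrix (toy data, illustrative names only).
set.seed(1)
n <- 500; p <- 5; K <- 2
A <- matrix(rnorm(p * p), p, p)
X <- matrix(rnorm(n * p), n, p) %*% A           # i.i.d. observations in R^p
X <- scale(X, center = TRUE, scale = FALSE)     # mean zero, as assumed above
ev <- eigen(cov(X), symmetric = TRUE)           # eigenvalues lambda_k, eigenvectors e_k
Y  <- X %*% ev$vectors                          # principal components Y_k = <X, e_k>
Xhat <- Y[, 1:K] %*% t(ev$vectors[, 1:K])       # rank-K reconstruction
mean(rowSums((X - Xhat)^2))                     # empirical E||X - sum_{k<=K} <X,e_k> e_k||^2 ...
sum(ev$values[(K + 1):p])                       # ... is close to the sum of discarded eigenvalues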

Page 12

Optimality of PCA-based dimension reduction in Rp

The success of Principal Components as a dimension reduction technique is explained by the fact that the (Karhunen-Loève) expansion

X = ∑_{k=1}^{p} Yk ek = ∑_{k=1}^{p} 〈X, ek〉 ek

is such that, for any K ≤ p and any K-tuple v1, . . . , vK of p-dimensional vectors (wlg, normed),

E ‖X − ∑_{k=1}^{K} 〈X, ek〉 ek‖² ≤ E ‖X − ∑_{k=1}^{K} 〈X, vk〉 vk‖²

(the left-hand side being equal to ∑_{k=K+1}^{p} λk).

Page 13

Optimality of PCA-based dimension reduction in Rp

For all 1 ≤ K ≤ p, the K first principal components (Y1, . . . , YK) thus

• provide the “best” (linear) reduction of X to dimension K in the sense that, for any 1 ≤ K < p, (Y1, . . . , YK) is the “best” K-dimensional approximation of X and, moreover,

•• they are mutually orthogonal (now in the sense of uncorrelatedness), which greatly simplifies the statistical analysis.

Page 14

Functional PCA

All this is easily extended to random variables in functional spaces.

Here we consider only variables in the Hilbert space L2 = L2([0, 1]) equipped with inner product

〈x, y〉 = ∫_0^1 x(u) y(u) du

and norm ‖x‖ = √〈x, x〉.

Page 15

Basic ingredients for Functional PCA

Let X = X(u) be a functional random variable, with mean function

µ = µ(u) = EX = (EX(u) : u ∈ [0, 1])

(without loss of generality, let us assume µ(u) = 0 for all u)

and covariance kernel

C = (C(u, v) := Cov(X(u), X(v)) : (u, v) ∈ [0, 1]²).

Instead of covariance matrices, we are dealing here with covariance operators. The covariance operator C maps f ∈ L2 to C(f) ∈ L2, with

C(f)(u) = ∫_0^1 C(u, v) f(v) dv, u ∈ [0, 1].
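To make the operator notation concrete, the following R sketch (a toy example; the Brownian-motion kernel and all names are assumptions for illustration) discretizes a covariance kernel C(u, v) on a grid and approximates both the image C(f) and the eigenelements of C by Riemann sums.

## Minimal sketch: a discretized covariance operator acting on f in L2[0, 1].
r  <- 200
u  <- seq(0, 1, length.out = r)
du <- 1 / r
C  <- outer(u, u, pmin)              # illustrative kernel C(u, v) = min(u, v)
f  <- sin(2 * pi * u)                # some f in L2[0, 1], on the grid
Cf <- as.vector(C %*% f) * du        # Riemann sum for C(f)(u) = int_0^1 C(u, v) f(v) dv
ee <- eigen(C * du, symmetric = TRUE)
lambda <- ee$values                  # eigenvalues lambda_1 >= lambda_2 >= ... (summable)
e1 <- ee$vectors[, 1] / sqrt(du)     # first eigenfunction, rescaled so that ||e1|| = 1 in L2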

Page 16

Functional PCA

Functional PCA uses the fact that C is a compact (in fact, “trace class”) operator. For any f ∈ L2, the image C(f) of f admits an eigendecomposition

C(f) = ∑_{k≥1} λk 〈f, ek〉 ek,

with ∑_{k≥1} λk < ∞, where ek ∈ L2 is the kth eigenfunction of C, with eigenvalue λk, ‖ek‖ = 1 and ek ⊥ eℓ for k ≠ ℓ: the sequence e1, e2, . . . thus constitutes an orthonormal basis for L2[0, 1].

Page 17

KL expansion

In that basis, X admits a representation (called Karhunen-Loève expansion)

X(u) = ∑_{k≥1} 〈X, ek〉 ek(u) =: ∑_{k≥1} Yk ek(u), u ∈ [0, 1],

where

Yk := 〈X, ek〉 is X's kth functional principal component (a scalar), with

Cov(Yk, Yℓ) = 0 for k ≠ ℓ and Var(Yk) = λk → 0, hence Yk → 0 in quadratic mean, as k → ∞.
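In practice X is only observed on a grid, and the inner products 〈X, ek〉 become Riemann sums. A minimal R sketch (toy curves, illustrative names) of empirical static FPCA:

## Minimal sketch: empirical static FPCA from curves observed on a grid.
set.seed(2)
n <- 100; r <- 101
u <- seq(0, 1, length.out = r); du <- 1 / r
X <- outer(rnorm(n), sin(2 * pi * u)) +                  # toy functional data:
     outer(rnorm(n, sd = 0.5), cos(2 * pi * u)) +        # two smooth components
     matrix(rnorm(n * r, sd = 0.1), n, r)                # plus rough noise
X <- sweep(X, 2, colMeans(X))                            # centre, so that mu(u) = 0
Chat <- crossprod(X) / n                                 # empirical covariance kernel on the grid
ee <- eigen(Chat * du, symmetric = TRUE)
ek <- ee$vectors / sqrt(du)                              # eigenfunctions e_k, with ||e_k|| = 1
Y  <- X %*% ek * du                                      # scores Y_k = <X, e_k> (Riemann sums)
round(ee$values[1:4], 3)                                 # estimated lambda_1 > lambda_2 > ...
round(cov(Y[, 1:4]), 3)                                  # approximately diag(lambda_1, ..., lambda_4)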

Page 18

Optimality

That Karhunen-Loève expansion is such that, for any 0 < K ∈ N and any L2[0, 1]-valued sequence v1, v2, . . . ,

E ∫_0^1 ( X(u) − ∑_{k=1}^{K} Yk ek(u) )² du ≤ E ∫_0^1 ( X(u) − ∑_{k=1}^{K} 〈X, vk〉 vk(u) )² du,

that is,

E ‖X − ∑_{k=1}^{K} Yk ek‖² ≤ E ‖X − ∑_{k=1}^{K} 〈X, vk〉 vk‖².

Page 19

KL expansion

Hence, just as in the vector case, the K first functional principal components (Y1, . . . , YK)

• provide the “best reduction to dimension K” of X in the sense that (Y1, . . . , YK) is the “best” K-dimensional approximation of X, and, moreover,

•• Y1, . . . , YK are mutually orthogonal ...

Page 20

Static KL expansion

Let us now assume {Xt | t ∈ Z} is a second-order stationary process. Then, the K first functional principal components

(Y1t, . . . , YKt) := (〈Xt, e1〉, . . . , 〈Xt, eK〉), t ∈ Z

(a K-dimensional process) based on the eigenfunctions ek of the covariance operator C

• fail to exploit the information contained in Xt's leads and lags, hence

• only provide the “best” static K-dimensional approximation of X, that is, the best among the reductions involving instantaneous linear combinations only, of the form 〈Xt, ek〉, of the Xt's.

In a time series context, linear combinations involving the past and future values of Xt are likely to do a better job.

Page 21

Static KL expansion

Moreover,

•• unless {Xt | t ∈ Z} is an uncorrelated process, the Ykt's are mutually orthogonal “at lag zero” only:

Yk1,t = 〈Xt, ek1〉 is uncorrelated with Yk2,t = 〈Xt, ek2〉 for k1 ≠ k2, but in general Yk1,t = 〈Xt, ek1〉 and Yk2,t−ℓ = 〈Xt−ℓ, ek2〉 are (cross-)correlated.

Page 22

Conclusion:

Principal components (whether classical or functional) in a time series context do not enjoy the two properties that make them an efficient statistical tool in the i.i.d. (uncorrelated) case.

Can we adapt traditional PCA (call it static PCA) to the dynamic context in order to recover its optimal dimension reduction and mutual orthogonality properties in the time-series context?

Page 23

Brillinger’s Dynamic PCA

This is what Brillinger did in the multivariate time series case. Denote by {Xt := (X1t, . . . , Xpt)′, t ∈ Z} a second-order stationary process, with values in Rp, mean µ = 0 and autocovariances

Γk := E[Xt X′t−k], k ∈ Z.

Intuitively, informative linear approximations of Xt should exploit the autocovariance structure (all Γk's), hence involve not only the present, but also past and future values of the Xt's instead of restricting to contemporaneous ones.

Page 24

Brillinger’s Dynamic PCA

This requires looking for the normed linear combinations of present, past and future observations maximizing (subject to orthogonality constraints) the variance, etc.—not just those involving the present observations: dynamic linear combinations (filters), not just static ones.

Page 25

Brillinger’s Dynamic PCA

Brillinger (1981), in a pathbreaking but all-too-often overlooked contribution, has solved this problem for p-dimensional processes admitting a spectral density (a spectral density matrix) ...

... which does not mean that Brillinger’s concept is a frequency-domain concept! Fourier transforms here are used as tools, and other spectral concepts probably can be considered as well.

Page 26

Brillinger’s Dynamic PCA

Brillinger’s dynamic principal components are based on a factorization of spectral density matrices instead of the factorization of the (instantaneous) covariance matrices.

Therefore, let us make the additional assumption that

• the spectral measure of {Xt, t ∈ Z} is absolutely continuous with respect to the Lebesgue measure on [−π, π], that is, {Xt} has a p × p spectral density matrix

Σ(θ) := (1/2π) ∑_{ℓ=−∞}^{∞} Γℓ e^{−iℓθ},

with entries (σij(θ)), θ ∈ [−π, π];
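In practice Σ(θ) has to be estimated; the following R sketch (a lag-window estimator with Bartlett weights; the bandwidth q and all names are illustrative choices, not the deck's prescriptions) shows one simple possibility, assuming X is an n × p matrix holding the observed series.

## Minimal sketch: lag-window estimate of Sigma(theta) = (1/2pi) sum_l Gamma_l e^{-i l theta}.
spec_density <- function(X, theta, q = floor(sqrt(nrow(X)))) {
  n <- nrow(X); p <- ncol(X)
  X <- scale(X, center = TRUE, scale = FALSE)
  S <- matrix(0 + 0i, p, p)
  for (l in -q:q) {
    w   <- 1 - abs(l) / (q + 1)                        # Bartlett weights
    idx <- max(1, 1 + l):min(n, n + l)                 # t such that X_t and X_{t-l} both exist
    G   <- crossprod(X[idx, , drop = FALSE],
                     X[idx - l, , drop = FALSE]) / n   # estimate of Gamma_l = E[X_t X'_{t-l}]
    S   <- S + w * G * exp(-1i * l * theta)
  }
  S / (2 * pi)
}
## toy usage: a bivariate VAR(1)-type series, estimated density at theta = 0
set.seed(3)
n <- 400; X <- matrix(rnorm(n * 2), n, 2)
for (t in 2:n) X[t, ] <- 0.6 * X[t - 1, ] + rnorm(2)
spec_density(X, theta = 0)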

Page 27

Brillinger’s Dynamic PCA

The matrix Σ(θ) is Hermitian and positive semidefinite; denoting by Σ̄(θ) its complex conjugate, Σ̄(θ) = Σ′(θ) = Σ(−θ).

The eigenvalues λ1(θ) ≥ λ2(θ) ≥ . . . ≥ λp(θ) ≥ 0 of Σ(θ) are real; call them the dynamic eigenvalues of {Xt}.

The corresponding eigenvectors are called the dynamic eigenvectors of {Xt}; denote by ϕ′k(θ) the row eigenvector (of Σ(θ)) associated with λk(θ): then, ϕk(θ) is a column eigenvector of Σ(−θ) = Σ̄(θ) = Σ′(θ), still with eigenvalue λk(θ):

ϕ′k(θ) Σ(θ) = λk(θ) ϕ′k(θ)   and   Σ(−θ) ϕk(θ) = λk(θ) ϕk(θ).

Page 28

Brillinger’s Dynamic PCA

Spectral densities always are defined up to a set of θ values with Lebesgue measure zero; rather than with functions, we thus are dealing with equivalence classes of a.e. equal functions; by Σ(θ) we tacitly mean a representative of such a class—the same comment applies to dynamic eigenvalues and eigenvectors.

Page 29

Brillinger’s Dynamic PCA

For all θ,

• the eigenvalues λk(θ) are real,

• the p × p matrix of eigenvectors ϕ(θ) := (ϕ1(θ) . . . ϕp(θ)) is unitary, that is,

ϕ(θ) ϕ*(θ) = I = ϕ*(θ) ϕ(θ),

hence

ϕ̄(θ) ϕ′(θ) = I = ϕ′(θ) ϕ̄(θ)

(where ϕ*(θ) and ϕ̄(θ) stand for the adjoint of ϕ(θ) and its conjugate, respectively),

• Σ(−θ) = Σ̄(θ) implies we can impose ϕ(−θ) = ϕ̄(θ),

• ϕ′(θ) Σ(θ) ϕ̄(θ) = Λ(θ) and ϕ̄(θ) Λ(θ) ϕ′(θ) = Σ(θ).

Page 30

Matrices and Filters

Any matrix or vector M(θ) with square-integrable, θ-measurable elements defined over [−π, π] can be expanded (componentwise) into a Fourier series

M(θ) = (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} M(θ) e^{iℓθ} dθ ] e^{−iℓθ},

where the right-hand side converges in quadratic mean. That expansion creates a correspondence between the square-integrable matrix-valued function M(θ) and the square-summable filter

M(L) := (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} M(θ) e^{iℓθ} dθ ] L^ℓ

(L, as usual, stands for the lag operator).
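A small R sketch of this correspondence (purely illustrative: M(θ) = (1 + cos θ) I2, whose filter has nonzero coefficients only at ℓ = 0 and ℓ = ±1), recovering the coefficients of M(L) by a Riemann-sum approximation of (1/2π) ∫ M(θ) e^{iℓθ} dθ.

## Minimal sketch: from a matrix-valued function M(theta) on [-pi, pi]
## to the coefficients of the corresponding filter M(L).
theta <- seq(-pi, pi, length.out = 1024)[-1]        # frequency grid
dth   <- 2 * pi / 1023                              # grid spacing
M     <- lapply(theta, function(th) (1 + cos(th)) * diag(2))
fourier_coef <- function(M, theta, dth, l) {
  ## (1/2pi) * int M(theta) exp(i l theta) d theta, by Riemann sum
  Reduce(`+`, Map(function(Mth, th) Mth * exp(1i * l * th), M, theta)) * dth / (2 * pi)
}
round(Re(fourier_coef(M, theta, dth, 0)), 3)        # ~ I, the coefficient of L^0
round(Re(fourier_coef(M, theta, dth, 1)), 3)        # ~ 0.5 * I, the coefficient of L^1
round(Re(fourier_coef(M, theta, dth, 2)), 3)        # ~ 0: all higher coefficients vanish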

Page 31

Brillinger’s Dynamic PCA

The (p × m)-dimensional matrix M(θ) and the filter M(L) then are strongly connected by the fact that

• if the p-dimensional process {Xt} has spectral density matrix Σ(θ),

• then the m-variate stochastic process {M′(L)Xt} has spectral density matrix

M′(θ) Σ(θ) M̄(θ),

where M̄ is the conjugate of M.

Page 32

Brillinger’s Dynamic PCA

The p × 1 dynamic eigenvectors θ ↦ ϕk(θ), in particular, can be expanded into

ϕk(θ) = (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} ϕk(θ) e^{iℓθ} dθ ] e^{−iℓθ} =: ∑_{ℓ=−∞}^{∞} ψkℓ e^{−iℓθ},

defining p × 1 square-summable filters of the form

ϕk(L) = (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} ϕk(θ) e^{iℓθ} dθ ] L^ℓ =: ∑_{ℓ=−∞}^{∞} ψkℓ L^ℓ,

where ψkℓ is a p-dimensional vector of Fourier coefficients; since ϕ(−θ) = ϕ̄(θ), the ψkℓ's are real.

Page 33

Brillinger’s Dynamic PCA

It follows that the p-tuple

Yt := ϕ′(L) Xt

of (real-valued) univariate processes {Ykt | t ∈ Z}, where

Ykt := ϕ′k(L) Xt = ∑_{ℓ=−∞}^{∞} ψ′kℓ Xt−ℓ = ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉,

has diagonal spectral density

ϕ′(θ) Σ(θ) ϕ̄(θ) = Λ(θ),

with diagonal elements ϕ′k(θ) Σ(θ) ϕ̄k(θ) = λk(θ).

Hence, the {Ykt}'s are mutually orthogonal at all leads and lags, with

Var(Ykt) = λk := ∫_{−π}^{π} λk(θ) dθ.

Page 34

Brillinger’s Dynamic PCA

Definition. The univariate process {Ykt | t ∈ Z} is called {Xt}'s kth dynamic principal component (k = 1, . . . , p).

Page 35

Brillinger’s Dynamic PCA

The properties of {Xt}'s dynamic principal components extend to the time-series context the standard properties of traditional principal components associated with the eigenvalues and eigenvectors of {Xt}'s covariance matrix.

In particular, the variance λk of Ykt is such that

λ1 = max Var( ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ ) over {aiℓ : ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} a²iℓ = 1},

and, for k = 2, . . . , p,

λk = max Var( ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ ) over the same set, subject to ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ being orthogonal to Y1t, . . . , Yk−1,t.

Page 36

Brillinger’s Dynamic PCA

Because ϕ(θ) is unitary, ϕ′−1(θ) = ϕ̄(θ). Therefore,

ϕ̄(L) Yt = Xt,

since it has spectral density

ϕ̄(θ) Λ(θ) ϕ′(θ) = ϕ̄(θ) ϕ′(θ) Σ(θ) ϕ̄(θ) ϕ′(θ) = Σ(θ).

Developing ϕ̄(L)Yt (taking into account the fact that the ψkℓ's are real), we obtain

Xt = ϕ̄(L) Yt = ∑_{k=1}^{p} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ.

Page 37

Brillinger’s Dynamic PCA

The dynamic principal components {(Y1t, . . . , Ypt)′} thus provide the expansion (the dynamic Karhunen-Loève expansion of {Xt}), of the form

Xt = ∑_{k=1}^{p} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ,

the truncation at K of which provides, for any 1 ≤ K ≤ p, the “best” reduction of {Xt} to a K-tuple of linear combinations of its present, past, and future values.
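The whole pipeline can be sketched in a few lines of R on a toy bivariate series in which a single shock loads the two components at lags 0 and 1 (so that static PCA necessarily misses about half of the variance while one dynamic component almost suffices). All tuning choices (bandwidth q, frequency grid, filter truncation Lmax, the crude phase normalization of the frequency-wise eigenvectors) are illustrative shortcuts, not the deck's or freqdom's actual algorithm.

## Self-contained sketch of dynamic PCA on a toy bivariate series.
set.seed(4)
n <- 600; p <- 2; Lmax <- 20
u <- rnorm(n + 1)                                    # one common shock ...
X <- cbind(u[2:(n + 1)], u[1:n]) +                   # ... loaded at lags 0 and 1
     0.1 * matrix(rnorm(n * p), n, p)                # plus small idiosyncratic noise
X <- scale(X, center = TRUE, scale = FALSE)
## lag-window estimate of Sigma(theta)
q <- floor(sqrt(n))
Gam <- function(l) {
  idx <- max(1, 1 + l):min(n, n + l)
  crossprod(X[idx, , drop = FALSE], X[idx - l, , drop = FALSE]) / n
}
Sigma <- function(th) {
  S <- matrix(0 + 0i, p, p)
  for (l in -q:q) S <- S + (1 - abs(l) / (q + 1)) * Gam(l) * exp(-1i * l * th)
  S / (2 * pi)
}
## frequency-wise eigenvectors on (0, pi); negative frequencies handled by conjugation.
## Numerical eigenvectors carry an arbitrary phase, anchored here (crudely) by making
## their first coordinate real and positive.
nth <- 300; th <- (seq_len(nth) - 0.5) * pi / nth
phase_fix <- function(W) {
  for (k in seq_len(ncol(W))) W[, k] <- W[, k] * Mod(W[1, k]) / W[1, k]
  W
}
W <- lapply(th, function(t) phase_fix(eigen(Sigma(t), symmetric = TRUE)$vectors))
## filter coefficients: psi[, k, Lmax + 1 + l] ~ (1/2pi) int w_k(theta) e^{-i l theta} d theta
psi <- array(0, c(p, p, 2 * Lmax + 1))
for (j in seq_len(nth)) for (l in -Lmax:Lmax)
  psi[, , Lmax + 1 + l] <- psi[, , Lmax + 1 + l] + Re(W[[j]] * exp(-1i * l * th[j])) / nth
## dynamic principal components Y_kt = sum_l psi'_{k,l} X_{t-l}, truncated at |l| <= Lmax
tt <- (Lmax + 1):(n - Lmax)
Y <- matrix(0, length(tt), p)
for (l in -Lmax:Lmax) Y <- Y + X[tt - l, , drop = FALSE] %*% psi[, , Lmax + 1 + l]
## rank-1 dynamic Karhunen-Loeve reconstruction X_t ~ sum_l Y_{1,t+l} psi_{1,l}
inner <- (Lmax + 1):(length(tt) - Lmax)
Xrec <- matrix(0, length(inner), p)
for (l in -Lmax:Lmax) Xrec <- Xrec + outer(Y[inner + l, 1], psi[, 1, Lmax + 1 + l])
mean(rowSums((X[tt[inner], ] - Xrec)^2))             # small: one dynamic PC nearly suffices
## static rank-1 approximation, for comparison: misses about half of the variance
e1 <- eigen(cov(X), symmetric = TRUE)$vectors[, 1]
mean(rowSums((X - X %*% e1 %*% t(e1))^2))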

Page 38

Brillinger’s Dynamic PCA

More precisely, for any sequence of p-dimensional vectors

vkℓ, k = 1, . . . , p, ℓ ∈ Z,

such that ∑_{ℓ=−∞}^{∞} ‖vkℓ‖ < ∞, letting

Ỹkt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, vkℓ〉,

we have, for all K = 1, . . . , p,

E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ‖² ≤ E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Ỹk,t+ℓ vkℓ‖².

Page 39

Brillinger’s Dynamic PCA

Mutual orthogonality of Yk1,t and Yk2,s, k1 ≠ k2, already has been shown for all s and t.

Contrary to the dimension reduction based on the static principal components of {Xt}, the dimension reduction based on the dynamic principal components thus recovers, in the time series context, the desired properties of standard principal component-based dimension reduction in the i.i.d. case.

Page 40

Dynamic FPCA

How can we extend this to the functional setting?

A neat treatment of the spectral approach to the analysis of functional time series has been proposed (only quite recently) by Panaretos & Tavakoli (AoS, 2013), who give the following definition.

Definition. Provided that ∑_ℓ ‖Cov(Xt, Xt−ℓ)‖_S < ∞ (a summability condition on Hilbert-Schmidt norms—details are skipped),

F^X_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Xt, Xt−ℓ) e^{−iℓθ}

exists and is called the spectral density operator of the functional process {Xt}.

Page 41

Dynamic FPCA

Based on the latter, instead of static functional transformations (mapping an L2-valued variable to an RK-valued one; here K is an arbitrary integer)

Xt ↦ Yt = (Y1t, . . . , YKt)′ =: Ψ0(Xt), say, with Ψ0 : L2 → RK,

we consider linear functional filters (mapping an L2-valued functional process to an RK-valued variable)

(. . . , Xt−1, Xt, Xt+1, . . .) ↦ Yt = (Y1t, . . . , YKt)′ =: ∑_{ℓ=−∞}^{∞} Ψℓ(Xt−ℓ), Ψℓ : L2 → RK.

Page 42

Dynamic FPCA

Those Ψℓ's should be such that Ykt and Yk′t′ are uncorrelated whenever k ≠ k′, for all t and t′ (at all leads and lags, not just contemporaneously: autocorrelations are admitted, but not cross-correlations) ...

Page 43

Dynamic FPCA

... which holds if and only if the (traditional) spectral density matrix

Σ^Y_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Yt, Yt−ℓ) e^{−iℓθ}

of the vector process {Yt} is diagonal for (almost) all θ ∈ [−π, π].

Accordingly, let us investigate the relation between the spectral density matrix Σ^Y_θ and the spectral density operator F^X_θ.

Page 44

Dynamic FPCA

For fixed θ, the spectral density operator F^X_θ (for every θ, a non-negative, self-adjoint Hilbert-Schmidt operator) has properties similar to those of a covariance operator.

In particular, for any f ∈ L2, the image F^X_θ(f) of f admits the eigendecomposition (just as for the usual covariance operator)

F^X_θ(f) = ∑_{k≥1} λk(θ) 〈f, ϕk(θ)〉 ϕk(θ),

where λk(θ) and ϕk(θ) are F^X_θ's eigenvalues (in descending order of magnitude) and eigenfunctions, respectively.

Page 45

Dynamic FPCA

The relation between functional filters and frequency-indexed operators is similar to, but more delicate than, the relation between matrix filters and frequency-indexed matrices ...

Since each Ψℓ in the functional filter (providing a reduction to dimension p)

(. . . , Xt−1, Xt, Xt+1, . . .) ↦ ∑_{ℓ=−∞}^{∞} Ψℓ(Xt−ℓ) =: Yt, Ψℓ : L2 → Rp,

is linear, it has, for some ψ1ℓ, . . . , ψpℓ ∈ L2, the representation (Riesz representation)

Ψℓ(f) = (〈f, ψ1ℓ〉, . . . , 〈f, ψpℓ〉)′.

Then, the following relation holds between the spectral density operator F^X_θ and the spectral density matrix Σ^Y_θ of the filtered p-dimensional process {Yt}.

Page 46

Dynamic FPCA

Let

ψ*k(θ) := ∑_{ℓ=−∞}^{∞} ψkℓ e^{iℓθ}, k = 1, . . . , p

(the ψkℓ's thus are the Fourier coefficients of the ψ*k's). We have

Σ^Y_θ = [ 〈F^X_θ(ψ*1(θ)), ψ*1(θ)〉   · · ·   〈F^X_θ(ψ*p(θ)), ψ*1(θ)〉
                 ⋮                                ⋱                                ⋮
          〈F^X_θ(ψ*1(θ)), ψ*p(θ)〉   · · ·   〈F^X_θ(ψ*p(θ)), ψ*p(θ)〉 ].

Page 47

Dynamic FPCA

Let us choose the ψkℓ's (equivalently, the functional filters Ψℓ) in such a way that

ψ*k(θ) = ∑_{ℓ∈Z} ψkℓ e^{iℓθ} = ϕk(θ).

That is, choose as ψkℓ's the Fourier coefficients of F^X_θ's kth eigenfunction ϕk(θ):

ψkℓ = (1/2π) ∫_{−π}^{π} ϕk(θ) e^{−iℓθ} dθ

(note again that those ψkℓ's, just as the ψ*k(θ)'s and ϕk(θ)'s, are functions, viz. ψkℓ(u) = (1/2π) ∫_{−π}^{π} ϕk(θ)(u) e^{−iℓθ} dθ).

Page 48

Dynamic FPCA

Definition. The kth Dynamic Functional Principal Component of Xt is the univariate real-valued process

Ykt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉, t ∈ Z, k = 1, . . . , p

(formally, the same definition as for Rp-valued processes), with variance

∫_{−π}^{π} λk(θ) dθ.

The spectral density matrix Σ^Y_θ of the Ykt's then is diagonal, with diagonal elements λk(θ).

Page 49

Elementary properties

Assume {Xt : t ∈ Z} is a functional process with summable autocovariances and let Ykt, t ∈ Z, be its kth dynamic FPC. Then the following holds:

(a) The series defining Ykt is mean square convergent.

(b) Ykt is real.

(c) If the Xt's are serially uncorrelated, then the dynamic FPCs coincide with the static FPCs.

(d) For k ≠ k′, the principal components Ykt and Yk′s are uncorrelated for all s, t.

Page 50

Dynamic Karhunen-Loève expansion

We then can recover the original process by means of a functional version of the dynamic Karhunen-Loève expansion previously obtained in Rp.

Definition. The dynamic Karhunen-Loève expansion of the functional process Xt is

Xt = ∑_{k=1}^{∞} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ, t ∈ Z,

with

Ykt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉

({Xt}'s dynamic functional principal components).

Page 51

Optimality

For any L2-valued sequence

vkℓ, k, ℓ ∈ Z,

such that ∑_{ℓ=−∞}^{∞} ‖vkℓ‖ < ∞, letting

Ỹkt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, vkℓ〉,

we have, for all K ∈ N, the desired optimality property

E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ‖² ≤ E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Ỹk,t+ℓ vkℓ‖².

Page 52

The last proposition provides, as a natural measure of how well a functional time series can be represented in finite dimension K, the proportion of variance explained by the first K dynamic FPCs:

{ ∑_{k=1}^{K} ∫_{−π}^{π} λk(θ) dθ } / E‖X1‖².

Page 53

Real data example (pollution data)

Figure 4: 10 subsequent observations of Sqrt(PM10) − mean against intraday time (left panel), the corresponding static KL expansion with one term (middle panel), and the dynamic KL expansion with one term (right panel).

Page 54

Estimation of Functional Dynamic Principal Components

In practice, covariance and spectral density operators are not known.

Theorem (Consistency)

Let Ŷkt be the random variable defined by

Ŷkt := ∑_{ℓ=−L}^{L} 〈Xt−ℓ, ψ̂kℓ〉, k = 1, . . . , p and t = L + 1, . . . , n − L,

where the ψ̂kℓ's are estimated from the empirical covariance operator. Then (under appropriate technical assumptions), we have, for L = L(n) → ∞, that Ŷkt − Ykt → 0 in probability as n → ∞.

Page 55

Implementation in a nutshell

(i) In practice, a function x(u) is observed on a grid 0 ≤ u1 < u2 < . . . < ur ≤ 1, and converted into a functional observation within the space spanned by a finite number of basis functions v1, . . . , vd as

x(u) = Z1 v1(u) + . . . + Zd vd(u) = v′(u) Z.

Commonly, Fourier bases, B-splines or wavelets are used. The coefficients Zk can be obtained, for example, via least squares fitting or some penalized form thereof.
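A minimal R sketch of step (i), assuming a single noisy curve observed on a grid and a small Fourier basis (the basis and its dimension d are illustrative choices):

## Minimal sketch: from discrete observations x(u_1), ..., x(u_r) to basis coefficients Z.
r <- 96
u <- seq(0, 1, length.out = r)
x <- sin(2 * pi * u) + 0.5 * cos(4 * pi * u) + rnorm(r, sd = 0.2)   # noisy observed curve
d <- 5
B <- cbind(1, sin(2 * pi * u), cos(2 * pi * u),                     # v_1(u), ..., v_d(u)
              sin(4 * pi * u), cos(4 * pi * u))                     # evaluated on the grid
Z <- coef(lm(x ~ B - 1))                        # least-squares coefficients Z_1, ..., Z_d
xfit <- as.vector(B %*% Z)                      # the functional observation x(u) = v'(u) Z
round(Z, 2)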

Page 56

Implementation in a nutshell

(ii) We are working on a finite-dimensional space now, where operators reduce to matrices acting on Z := (Z1, . . . , Zd)′.

In particular, the spectral density operator

F^X_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Xt, Xt−ℓ) e^{−iℓθ}

reduces to the matrix

G^X_θ = (1/2π) ( ∑_{h∈Z} C^Z_h e^{−ihθ} ) V′,

where C^Z_h := E[Zh Z′0] and V := (〈vi, vj〉 : 1 ≤ i, j ≤ d).
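A sketch of step (ii) in R, on a toy coefficient process Zt and a small orthonormal Fourier basis (so that V is close to the identity); the lag-window weights and the bandwidth are again illustrative choices.

## Minimal sketch: the matrix G_theta^X acting on basis coefficients.
set.seed(5)
n <- 400; d <- 3
Zmat <- matrix(rnorm(n * d), n, d)
for (t in 2:n) Zmat[t, ] <- 0.5 * Zmat[t - 1, ] + rnorm(d)     # coefficient process Z_t
u <- seq(0, 1, length.out = 200); du <- 1 / 200
B <- rbind(rep(1, 200), sqrt(2) * sin(2 * pi * u), sqrt(2) * cos(2 * pi * u))  # v_1, v_2, v_3
V <- (B %*% t(B)) * du                                         # V = (<v_i, v_j>), here ~ I
CZ <- function(h) {                                            # C^Z_h = E[Z_h Z'_0]
  idx <- max(1, 1 + h):min(n, n + h)
  crossprod(Zmat[idx, , drop = FALSE], Zmat[idx - h, , drop = FALSE]) / n
}
G <- function(th, q = floor(sqrt(n))) {                        # (1/2pi)(sum_h C^Z_h e^{-ih theta}) V'
  S <- matrix(0 + 0i, d, d)
  for (h in -q:q) S <- S + (1 - abs(h) / (q + 1)) * CZ(h) * exp(-1i * h * th)
  (S / (2 * pi)) %*% t(V)
}
round(G(0), 3)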

Page 57

Implementation in a nutshell

(iii) Relation between the eigenfunctions/values of F^X_θ and the eigenvectors/values of G^X_θ.

Assume that λm(θ) is the m-th eigenvalue of G^X_θ, with eigenvector ϕm(θ). Then λm(θ) is also an eigenvalue of F^X_θ, with eigenfunction v′ϕm(θ).

From there, we obtain

ψmk = v′ (1/2π) ∫_{−π}^{π} ϕm(s) e^{−iks} ds =: v′ ψmk

(the same symbol ψmk denoting, on the left, the filter coefficient as a function in L2 and, on the right, the corresponding d-vector of basis coefficients), and

Ymt = ∑_{k∈Z} ∫_0^1 Z′t−k v(u) v′(u) ψmk du = ∑_{k∈Z} Z′t−k V ψmk.
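Continuing the sketch of the previous step (it reuses Zmat, V and G from that snippet, so it is not standalone): the leading eigenvector of G_θ^X on a frequency grid yields, via its Fourier coefficients, the coefficient vectors ψmk and hence the first dynamic score series. The truncation Lmax, the normalization and the crude phase anchor are the same illustrative shortcuts as before.

## Continuation (assumes Zmat, V, G from the previous sketch are in the workspace).
Lmax <- 15; nth <- 200; th <- (seq_len(nth) - 0.5) * pi / nth
eig1 <- lapply(th, function(t) {
  e <- eigen(G(t))                                   # G is not Hermitian: plain eigen()
  a <- e$vectors[, which.max(Re(e$values))]          # eigenvector of the leading eigenvalue
  a <- a / sqrt(Re(sum(Conj(a) * (V %*% a))))        # normalize so that ||v'a|| = 1 in L2
  a * Mod(a[1]) / a[1]                               # crude phase anchor (first coordinate real > 0)
})
## coefficient vectors psi_{1,k} ~ (1/2pi) int phi_1(s) e^{-iks} ds (negative s by conjugation)
psi1 <- sapply(-Lmax:Lmax, function(k)
  Re(Reduce(`+`, Map(function(a, t) a * exp(-1i * k * t), eig1, th))) / nth)
## first dynamic FPC scores Y_1t = sum_k Z'_{t-k} V psi_{1,k}
tt <- (Lmax + 1):(nrow(Zmat) - Lmax)
Y1 <- rowSums(sapply(seq(-Lmax, Lmax), function(k)
  Zmat[tt - k, , drop = FALSE] %*% V %*% psi1[, k + Lmax + 1]))
head(round(Y1, 2))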

Page 58

Implementation in a nutshell

(iv) Of course, all quantities involved need to be estimated (consistency results have been established).

Details can be found in the JRSS paper.

Implementation in R: package freqdom.

Page 59

Real data example

Figure 5: The sequence of the first static FPC scores (red) [73%] and the dynamic ones (black) [80%], plotted against days.

Page 60

Simulation

Setting: functional AR(1) process; autoregressive kernel proportional to κ; length n = 250.

Figure 6: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.5.

Page 61

Simulation

Figure 7: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.7.

Page 62

Simulation

Figure 8: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.9.

Page 63

Conclusion

Static PCA and FPCA (which are everyday practice) are inadequate in the context of time series: the KL decomposition does not involve basis processes that are mutually orthogonal at all leads and lags, and the optimality property is lost in non-i.i.d. cases.

Dynamic PCA and FPCA take into account serial dependence,and recover those properties.

Consistent empirical versions are computationally feasible.

Simulations and the real data example are quite encouraging.

Page 64

Conclusion (continued)

Moreover,

Dynamic principal components are a basic ingredient of the General Dynamic Factor methods used (see Forni, Hallin, Lippi, and Reichlin/Zaffaroni 2000, 2005, . . . , 2015, 2017) in the analysis of large panels of time series data;

in ongoing research (Hallin, Hörmann, Nisol), functional dynamic principal component methods are used in a functional extension of General Dynamic Factor methods, allowing for the analysis/prediction of large panels containing both scalar and functional time series.