Page 1

Optimal Dimension Reduction for (Functional) Time Series

Marc Hallin

Université libre de Bruxelles (ULB)

IMA Workshop on Forecasting from Complexity
Institute for Mathematics and its Applications

University of Minnesota, Minneapolis
April 23–27, 2018

Page 2

Partly based on

Hörmann, S., Kidziński, L., and Hallin, M. (2015). Dynamic functional principal components, Journal of the Royal Statistical Society Series B 77, 319–348.

Hallin, M., Hörmann, S., and Lippi, M. (2018). On optimal dimension reduction for high-dimensional and functional time series, Statistical Inference for Stochastic Processes, to appear.

Page 3

Dimension reduction techniques are at the core of the statistical analysis of high-dimensional observations

Xt high-dimensional “replaced with” Yt = (Yt1, . . . , YtK)′, K “small”, t = 1, . . . , T, where the Ytk's are scalars

• observations in Rp (p-dimensional, p large): Xt = (Xt1, . . . , Xtp)′;

• functional observations: Xt = Xt(u), u ∈ [0, 1] (any basis is infinite-dimensional);

Page 4

“Replacing Xt (p-dimensional or functional) by Yt” (K-dimensional, K “small”) means “linear approximations of Xt can be constructed from the Yt's”—via linear combinations (static loading) or linear filtering (dynamic loading) ...

Page 5

Dimension reduction techniques are at the core of the statistical analysis of high-dimensional observations

• observations in Rp, p large, 0 < K < p

• functional observations—intrinsically infinite-dimensional—0 < K < ∞

The latter case, where each observation is a function, is increasingly frequent in a variety of applications—econometrics, finance, environmental studies ... “Tick by tick” time series of returns, p stocks; daily pollution or climatic data (e.g. with half-hour resolution); daily geophysical data (e.g. horizontal component of the magnetic field with minute resolution); ocean surface salinity, velocities, etc.

Page 6

“Tick by tick” financial data

Figure 1: S&P100 market index (OEX, plotted against time) for 10 consecutive trading days, 405 measurements per day.

Page 7

Pollution data

Figure 2: 10 days of PM10 data (plotted against time) in Graz (Austria), 48 observations per day.

Page 8

Geophysical data

Figure 3: Horizontal component of the magnetic field measured in one-minute resolution at the Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT, 1440 measurements per day.

Page 9

Functional time series

Each day yields one observation Xt = {Xt(u), u ∈ [0, 1]}, which is treated as a real-valued function (or the corresponding curve).

A common feature of all those datasets is that they do exhibit serial dependence, hence should be analyzed as stochastic processes—functional time series.

The intraday process {Xt(u), u ∈ [0, 1]} typically is not stationary, while the functional process {Xt, t ∈ Z} is.

We shall assume throughout that Xt ∈ L2([0, 1]).

Page 10

Dimension reduction

Whether the data are in Rp or functional, the most popular dimension reduction technique is Principal Components, PCA (Functional Principal Components, FPCA).

Page 11

PCA in Rp

Let µ := EX = 0 and C := Cov(X), with eigenvectors e1, . . . , ep and eigenvalues λ1 > . . . > λp. The p-dimensional X decomposes into (Karhunen-Loève expansion)

X = ∑_{k=1}^{p} Yk ek = ∑_{k=1}^{p} 〈X, ek〉 ek,

where

• Yk := 〈X , ek〉 is X ’s kth Principal Component (a scalar)

• Cov(Y ) = diag(λ1, . . . , λp)

Interpretation: Y1 is the normed linear combination of X1, . . . , Xp with highest variance, etc.
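To make the above concrete, here is a minimal R sketch (toy data; every object name is illustrative, nothing here comes from the freqdom package) computing static principal components from the eigendecomposition of the empirical covariance matrix and comparing the rank-K approximation error with the discarded eigenvalues.

## Minimal sketch: static PCA via the eigendecomposition of the
## empirical covariance matrix (toy data, illustrative names only).
set.seed(1)
n <- 500; p <- 5; K <- 2
A <- matrix(rnorm(p * p), p, p)
X <- matrix(rnorm(n * p), n, p) %*% A           # i.i.d. observations in R^p
X <- scale(X, center = TRUE, scale = FALSE)     # mean zero, as assumed above
ev <- eigen(cov(X), symmetric = TRUE)           # eigenvalues lambda_k, eigenvectors e_k
Y  <- X %*% ev$vectors                          # principal components Y_k = <X, e_k>
Xhat <- Y[, 1:K] %*% t(ev$vectors[, 1:K])       # rank-K reconstruction
mean(rowSums((X - Xhat)^2))                     # empirical E||X - sum_{k<=K} <X,e_k> e_k||^2 ...
sum(ev$values[(K + 1):p])                       # ... is close to the sum of discarded eigenvalues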

Page 12

Optimality of PCA-based dimension reduction in Rp

The success of Principal Components as a dimension reduction technique is explained by the fact that the (Karhunen-Loève) expansion

X = ∑_{k=1}^{p} Yk ek = ∑_{k=1}^{p} 〈X, ek〉 ek

is such that, for any K ≤ p and any K-tuple v1, . . . , vK of p-dimensional vectors (wlg, normed),

E ‖X − ∑_{k=1}^{K} 〈X, ek〉 ek‖² ≤ E ‖X − ∑_{k=1}^{K} 〈X, vk〉 vk‖²

(the left-hand side being equal to ∑_{k=K+1}^{p} λk).

Page 13

Optimality of PCA-based dimension reduction in Rp

For all 1 ≤ K ≤ p, the K first principal components (Y1, . . . , YK) thus

• provide the “best” (linear) reduction of X to dimension K in the sense that, for any 1 ≤ K < p, (Y1, . . . , YK) is the “best” K-dimensional approximation of X and, moreover,

•• they are mutually orthogonal (now in the sense of uncorrelatedness), which greatly simplifies the statistical analysis.

Page 14

Functional PCA

All this is easily extended to random variables in functional spaces.

Here we consider only variables in the Hilbert space L2 = L2([0, 1]) equipped with inner product

〈x, y〉 = ∫_0^1 x(u) y(u) du

and norm ‖x‖ = √〈x, x〉.

Page 15

Basic ingredients for Functional PCA

Let X = X(u) be a functional random variable, with mean function

µ = µ(u) = EX = (EX(u) : u ∈ [0, 1])

(without loss of generality, let us assume µ(u) = 0 for all u)

and covariance kernel

C = (C(u, v) := Cov(X(u), X(v)) : (u, v) ∈ [0, 1]²).

Instead of covariance matrices, we are dealing here with covariance operators. The covariance operator C maps f ∈ L2 to C(f) ∈ L2, with

C(f)(u) = ∫_0^1 C(u, v) f(v) dv, u ∈ [0, 1].
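To make the operator notation concrete, the following R sketch (a toy example; the Brownian-motion kernel and all names are assumptions for illustration) discretizes a covariance kernel C(u, v) on a grid and approximates both the image C(f) and the eigenelements of C by Riemann sums.

## Minimal sketch: a discretized covariance operator acting on f in L2[0, 1].
r  <- 200
u  <- seq(0, 1, length.out = r)
du <- 1 / r
C  <- outer(u, u, pmin)              # illustrative kernel C(u, v) = min(u, v)
f  <- sin(2 * pi * u)                # some f in L2[0, 1], on the grid
Cf <- as.vector(C %*% f) * du        # Riemann sum for C(f)(u) = int_0^1 C(u, v) f(v) dv
ee <- eigen(C * du, symmetric = TRUE)
lambda <- ee$values                  # eigenvalues lambda_1 >= lambda_2 >= ... (summable)
e1 <- ee$vectors[, 1] / sqrt(du)     # first eigenfunction, rescaled so that ||e1|| = 1 in L2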

Page 16

Functional PCA

Functional PCA uses the fact that C is a compact (in fact, “trace class”) operator. For any f ∈ L2, the image C(f) of f admits an eigendecomposition

C(f) = ∑_{k≥1} λk 〈f, ek〉 ek,

with ∑_{k≥1} λk < ∞, where ek ∈ L2 is the kth eigenfunction of C, with eigenvalue λk, ‖ek‖ = 1 and ek ⊥ eℓ for k ≠ ℓ: the sequence e1, e2, . . . thus constitutes an orthonormal basis for L2[0, 1].

Page 17

KL expansion

In that basis, X admits a representation (called Karhunen-Loève expansion)

X(u) = ∑_{k≥1} 〈X, ek〉 ek(u) =: ∑_{k≥1} Yk ek(u), u ∈ [0, 1],

where

Yk := 〈X, ek〉 is X's kth functional principal component (a scalar), with

Cov(Yk, Yℓ) = 0 for k ≠ ℓ and Var(Yk) = λk → 0, hence Yk → 0 in quadratic mean, as k → ∞.
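In practice X is only observed on a grid, and the inner products 〈X, ek〉 become Riemann sums. A minimal R sketch (toy curves, illustrative names) of empirical static FPCA:

## Minimal sketch: empirical static FPCA from curves observed on a grid.
set.seed(2)
n <- 100; r <- 101
u <- seq(0, 1, length.out = r); du <- 1 / r
X <- outer(rnorm(n), sin(2 * pi * u)) +                  # toy functional data:
     outer(rnorm(n, sd = 0.5), cos(2 * pi * u)) +        # two smooth components
     matrix(rnorm(n * r, sd = 0.1), n, r)                # plus rough noise
X <- sweep(X, 2, colMeans(X))                            # centre, so that mu(u) = 0
Chat <- crossprod(X) / n                                 # empirical covariance kernel on the grid
ee <- eigen(Chat * du, symmetric = TRUE)
ek <- ee$vectors / sqrt(du)                              # eigenfunctions e_k, with ||e_k|| = 1
Y  <- X %*% ek * du                                      # scores Y_k = <X, e_k> (Riemann sums)
round(ee$values[1:4], 3)                                 # estimated lambda_1 > lambda_2 > ...
round(cov(Y[, 1:4]), 3)                                  # approximately diag(lambda_1, ..., lambda_4)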

Page 18

Optimality

That Karhunen-Loève expansion is such that, for any 0 < K ∈ N and any L2[0, 1]-valued sequence v1, v2, . . . ,

E ∫_0^1 ( X(u) − ∑_{k=1}^{K} Yk ek(u) )² du ≤ E ∫_0^1 ( X(u) − ∑_{k=1}^{K} 〈X, vk〉 vk(u) )² du,

that is,

E ‖X − ∑_{k=1}^{K} Yk ek‖² ≤ E ‖X − ∑_{k=1}^{K} 〈X, vk〉 vk‖².

Page 19

KL expansion

Hence, just as in the vector case, the K first functional principal components (Y1, . . . , YK)

• provide the “best reduction to dimension K” of X in the sense that (Y1, . . . , YK) is the “best” K-dimensional approximation of X, and, moreover,

•• Y1, . . . , YK are mutually orthogonal ...

Page 20

Static KL expansion

Let us now assume {Xt | t ∈ Z} is a second-order stationary process. Then, the K first functional principal components

(Y1t, . . . , YKt) := (〈Xt, e1〉, . . . , 〈Xt, eK〉), t ∈ Z

(a K-dimensional process) based on the eigenfunctions ek of the covariance operator C

• fail to exploit the information contained in Xt's leads and lags, hence

• only provide the “best” static K-dimensional approximation of X, that is, the best among the reductions involving instantaneous linear combinations only, of the form 〈Xt, ek〉, of the Xt's.

In a time series context, linear combinations involving the past and future values of Xt are likely to do a better job.

Page 21

Static KL expansion

Moreover,

•• unless {Xt | t ∈ Z} is an uncorrelated process, the Ykt's are mutually orthogonal “at lag zero” only:

Yk1,t = 〈Xt, ek1〉 is uncorrelated with Yk2,t = 〈Xt, ek2〉 for k1 ≠ k2, but in general Yk1,t = 〈Xt, ek1〉 and Yk2,t−ℓ = 〈Xt−ℓ, ek2〉 are (cross-)correlated.

Page 22

Conclusion:

Principal components (whether classical or functional) in a time series context do not enjoy the two properties that make them an efficient statistical tool in the i.i.d. (uncorrelated) case.

Can we adapt traditional PCA (call it static PCA) to the dynamic context in order to recover its optimal dimension reduction and mutual orthogonality properties in the time-series context?

Page 23

Brillinger’s Dynamic PCA

This is what Brillinger did in the multivariate time series case. Denote by {Xt := (X1t, . . . , Xpt)′, t ∈ Z} a second-order stationary process, with values in Rp, mean µ = 0 and autocovariances

Γk := E[Xt X′t−k], k ∈ Z.

Intuitively, informative linear approximations of Xt should exploit the autocovariance structure (all Γk's), hence involve not only the present, but also past and future values of the Xt's instead of restricting to contemporaneous ones.

Page 24

Brillinger’s Dynamic PCA

This requires looking for the normed linear combinations of present, past and future observations maximizing (subject to orthogonality constraints) the variance, etc.—not just those involving the present observations: dynamic linear combinations (filters), not just static ones.

Page 25

Brillinger’s Dynamic PCA

Brillinger (1981), in a pathbreaking but all-too-often overlooked contribution, has solved this problem for p-dimensional processes admitting a spectral density (a spectral density matrix) ...

... which does not mean that Brillinger’s concept is a frequency-domain concept! Fourier transforms here are used as tools, and other spectral concepts probably can be considered as well.

Page 26

Brillinger’s Dynamic PCA

Brillinger’s dynamic principal components are based on a factorization of spectral density matrices instead of the factorization of the (instantaneous) covariance matrices.

Therefore, let us make the additional assumption that

• the spectral measure of {Xt, t ∈ Z} is absolutely continuous with respect to the Lebesgue measure on [−π, π], that is, {Xt} has a p × p spectral density matrix

Σ(θ) := (1/2π) ∑_{ℓ=−∞}^{∞} Γℓ e^{−iℓθ},

with entries (σij(θ)), θ ∈ [−π, π];
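In practice Σ(θ) has to be estimated; the following R sketch (a lag-window estimator with Bartlett weights; the bandwidth q and all names are illustrative choices, not the deck's prescriptions) shows one simple possibility, assuming X is an n × p matrix holding the observed series.

## Minimal sketch: lag-window estimate of Sigma(theta) = (1/2pi) sum_l Gamma_l e^{-i l theta}.
spec_density <- function(X, theta, q = floor(sqrt(nrow(X)))) {
  n <- nrow(X); p <- ncol(X)
  X <- scale(X, center = TRUE, scale = FALSE)
  S <- matrix(0 + 0i, p, p)
  for (l in -q:q) {
    w   <- 1 - abs(l) / (q + 1)                        # Bartlett weights
    idx <- max(1, 1 + l):min(n, n + l)                 # t such that X_t and X_{t-l} both exist
    G   <- crossprod(X[idx, , drop = FALSE],
                     X[idx - l, , drop = FALSE]) / n   # estimate of Gamma_l = E[X_t X'_{t-l}]
    S   <- S + w * G * exp(-1i * l * theta)
  }
  S / (2 * pi)
}
## toy usage: a bivariate VAR(1)-type series, estimated density at theta = 0
set.seed(3)
n <- 400; X <- matrix(rnorm(n * 2), n, 2)
for (t in 2:n) X[t, ] <- 0.6 * X[t - 1, ] + rnorm(2)
spec_density(X, theta = 0)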

Page 27

Brillinger’s Dynamic PCA

The matrix Σ(θ) is Hermitian and positive semidefinite; denoting by Σ̄(θ) its complex conjugate, Σ̄(θ) = Σ′(θ) = Σ(−θ).

The eigenvalues λ1(θ) ≥ λ2(θ) ≥ . . . ≥ λp(θ) ≥ 0 of Σ(θ) are real; call them the dynamic eigenvalues of {Xt}.

The corresponding eigenvectors are called the dynamic eigenvectors of {Xt}; denote by ϕ′k(θ) the row eigenvector (of Σ(θ)) associated with λk(θ): then, ϕk(θ) is a column eigenvector of Σ(−θ) = Σ̄(θ) = Σ′(θ), still with eigenvalue λk(θ):

ϕ′k(θ) Σ(θ) = λk(θ) ϕ′k(θ)   and   Σ(−θ) ϕk(θ) = λk(θ) ϕk(θ).

Page 28

Brillinger’s Dynamic PCA

Spectral densities always are defined up to a set of θ values with Lebesgue measure zero; rather than with functions, we thus are dealing with equivalence classes of a.e. equal functions; by Σ(θ) we tacitly mean a representative of such a class—the same comment applies to dynamic eigenvalues and eigenvectors.

Page 29

Brillinger’s Dynamic PCA

For all θ,

• the eigenvalues λk(θ) are real,

• the p × p matrix of eigenvectors ϕ(θ) := (ϕ1(θ) . . . ϕp(θ)) is unitary, that is,

ϕ(θ) ϕ*(θ) = I = ϕ*(θ) ϕ(θ),

hence

ϕ̄(θ) ϕ′(θ) = I = ϕ′(θ) ϕ̄(θ)

(where ϕ*(θ) and ϕ̄(θ) stand for the adjoint of ϕ(θ) and its conjugate, respectively),

• Σ(−θ) = Σ̄(θ) implies we can impose ϕ(−θ) = ϕ̄(θ),

• ϕ′(θ) Σ(θ) ϕ̄(θ) = Λ(θ) and ϕ̄(θ) Λ(θ) ϕ′(θ) = Σ(θ).

Page 30

Matrices and Filters

Any matrix or vector M(θ) with square-integrable, θ-measurable elements defined over [−π, π] can be expanded (componentwise) into a Fourier series

M(θ) = (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} M(θ) e^{iℓθ} dθ ] e^{−iℓθ},

where the right-hand side converges in quadratic mean. That expansion creates a correspondence between the square-integrable matrix-valued function M(θ) and the square-summable filter

M(L) := (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} M(θ) e^{iℓθ} dθ ] L^ℓ

(L, as usual, stands for the lag operator).
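A small R sketch of this correspondence (purely illustrative: M(θ) = (1 + cos θ) I2, whose filter has nonzero coefficients only at ℓ = 0 and ℓ = ±1), recovering the coefficients of M(L) by a Riemann-sum approximation of (1/2π) ∫ M(θ) e^{iℓθ} dθ.

## Minimal sketch: from a matrix-valued function M(theta) on [-pi, pi]
## to the coefficients of the corresponding filter M(L).
theta <- seq(-pi, pi, length.out = 1024)[-1]        # frequency grid
dth   <- 2 * pi / 1023                              # grid spacing
M     <- lapply(theta, function(th) (1 + cos(th)) * diag(2))
fourier_coef <- function(M, theta, dth, l) {
  ## (1/2pi) * int M(theta) exp(i l theta) d theta, by Riemann sum
  Reduce(`+`, Map(function(Mth, th) Mth * exp(1i * l * th), M, theta)) * dth / (2 * pi)
}
round(Re(fourier_coef(M, theta, dth, 0)), 3)        # ~ I, the coefficient of L^0
round(Re(fourier_coef(M, theta, dth, 1)), 3)        # ~ 0.5 * I, the coefficient of L^1
round(Re(fourier_coef(M, theta, dth, 2)), 3)        # ~ 0: all higher coefficients vanish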

Page 31

Brillinger’s Dynamic PCA

The (p × m)-dimensional matrix M(θ) and the filter M(L) then are strongly connected by the fact that

• if the p-dimensional process {Xt} has spectral density matrix Σ(θ),

• then the m-variate stochastic process {M′(L)Xt} has spectral density matrix

M′(θ) Σ(θ) M̄(θ),

where M̄ is the conjugate of M.

Page 32

Brillinger’s Dynamic PCA

The p × 1 dynamic eigenvectors θ ↦ ϕk(θ), in particular, can be expanded into

ϕk(θ) = (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} ϕk(θ) e^{iℓθ} dθ ] e^{−iℓθ} =: ∑_{ℓ=−∞}^{∞} ψkℓ e^{−iℓθ},

defining p × 1 square-summable filters of the form

ϕk(L) = (1/2π) ∑_{ℓ=−∞}^{∞} [ ∫_{−π}^{π} ϕk(θ) e^{iℓθ} dθ ] L^ℓ =: ∑_{ℓ=−∞}^{∞} ψkℓ L^ℓ,

where ψkℓ is a p-dimensional vector of Fourier coefficients; since ϕ(−θ) = ϕ̄(θ), the ψkℓ's are real.

Page 33

Brillinger’s Dynamic PCA

It follows that the p-tuple

Yt := ϕ′(L) Xt

of (real-valued) univariate processes {Ykt | t ∈ Z}, where

Ykt := ϕ′k(L) Xt = ∑_{ℓ=−∞}^{∞} ψ′kℓ Xt−ℓ = ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉,

has diagonal spectral density

ϕ′(θ) Σ(θ) ϕ̄(θ) = Λ(θ),

with diagonal elements ϕ′k(θ) Σ(θ) ϕ̄k(θ) = λk(θ).

Hence, the {Ykt}'s are mutually orthogonal at all leads and lags, with

Var(Ykt) = λk := ∫_{−π}^{π} λk(θ) dθ.

Page 34

Brillinger’s Dynamic PCA

Definition. The univariate process {Ykt | t ∈ Z} is called {Xt}'s kth dynamic principal component (k = 1, . . . , p).

Page 35

Brillinger’s Dynamic PCA

The properties of {Xt}'s dynamic principal components extend to the time-series context the standard properties of traditional principal components associated with the eigenvalues and eigenvectors of {Xt}'s covariance matrix.

In particular, the variance λk of Ykt is such that

λ1 = max Var( ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ ) over {aiℓ : ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} a²iℓ = 1},

and, for k = 2, . . . , p,

λk = max Var( ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ ) over the same set, subject to ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ being orthogonal to Y1t, . . . , Yk−1,t.

Page 36

Brillinger’s Dynamic PCA

Because ϕ(θ) is unitary, ϕ′−1(θ) = ϕ̄(θ). Therefore,

ϕ̄(L) Yt = Xt,

since it has spectral density

ϕ̄(θ) Λ(θ) ϕ′(θ) = ϕ̄(θ) ϕ′(θ) Σ(θ) ϕ̄(θ) ϕ′(θ) = Σ(θ).

Developing ϕ̄(L)Yt (taking into account the fact that the ψkℓ's are real), we obtain

Xt = ϕ̄(L) Yt = ∑_{k=1}^{p} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ.

Page 37

Brillinger’s Dynamic PCA

The dynamic principal components {(Y1t, . . . , Ypt)′} thus provide the expansion (the dynamic Karhunen-Loève expansion of {Xt}), of the form

Xt = ∑_{k=1}^{p} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ,

the truncation at K of which provides, for any 1 ≤ K ≤ p, the “best” reduction of {Xt} to a K-tuple of linear combinations of its present, past, and future values.
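The whole pipeline can be sketched in a few lines of R on a toy bivariate series in which a single shock loads the two components at lags 0 and 1 (so that static PCA necessarily misses about half of the variance while one dynamic component almost suffices). All tuning choices (bandwidth q, frequency grid, filter truncation Lmax, the crude phase normalization of the frequency-wise eigenvectors) are illustrative shortcuts, not the deck's or freqdom's actual algorithm.

## Self-contained sketch of dynamic PCA on a toy bivariate series.
set.seed(4)
n <- 600; p <- 2; Lmax <- 20
u <- rnorm(n + 1)                                    # one common shock ...
X <- cbind(u[2:(n + 1)], u[1:n]) +                   # ... loaded at lags 0 and 1
     0.1 * matrix(rnorm(n * p), n, p)                # plus small idiosyncratic noise
X <- scale(X, center = TRUE, scale = FALSE)
## lag-window estimate of Sigma(theta)
q <- floor(sqrt(n))
Gam <- function(l) {
  idx <- max(1, 1 + l):min(n, n + l)
  crossprod(X[idx, , drop = FALSE], X[idx - l, , drop = FALSE]) / n
}
Sigma <- function(th) {
  S <- matrix(0 + 0i, p, p)
  for (l in -q:q) S <- S + (1 - abs(l) / (q + 1)) * Gam(l) * exp(-1i * l * th)
  S / (2 * pi)
}
## frequency-wise eigenvectors on (0, pi); negative frequencies handled by conjugation.
## Numerical eigenvectors carry an arbitrary phase, anchored here (crudely) by making
## their first coordinate real and positive.
nth <- 300; th <- (seq_len(nth) - 0.5) * pi / nth
phase_fix <- function(W) {
  for (k in seq_len(ncol(W))) W[, k] <- W[, k] * Mod(W[1, k]) / W[1, k]
  W
}
W <- lapply(th, function(t) phase_fix(eigen(Sigma(t), symmetric = TRUE)$vectors))
## filter coefficients: psi[, k, Lmax + 1 + l] ~ (1/2pi) int w_k(theta) e^{-i l theta} d theta
psi <- array(0, c(p, p, 2 * Lmax + 1))
for (j in seq_len(nth)) for (l in -Lmax:Lmax)
  psi[, , Lmax + 1 + l] <- psi[, , Lmax + 1 + l] + Re(W[[j]] * exp(-1i * l * th[j])) / nth
## dynamic principal components Y_kt = sum_l psi'_{k,l} X_{t-l}, truncated at |l| <= Lmax
tt <- (Lmax + 1):(n - Lmax)
Y <- matrix(0, length(tt), p)
for (l in -Lmax:Lmax) Y <- Y + X[tt - l, , drop = FALSE] %*% psi[, , Lmax + 1 + l]
## rank-1 dynamic Karhunen-Loeve reconstruction X_t ~ sum_l Y_{1,t+l} psi_{1,l}
inner <- (Lmax + 1):(length(tt) - Lmax)
Xrec <- matrix(0, length(inner), p)
for (l in -Lmax:Lmax) Xrec <- Xrec + outer(Y[inner + l, 1], psi[, 1, Lmax + 1 + l])
mean(rowSums((X[tt[inner], ] - Xrec)^2))             # small: one dynamic PC nearly suffices
## static rank-1 approximation, for comparison: misses about half of the variance
e1 <- eigen(cov(X), symmetric = TRUE)$vectors[, 1]
mean(rowSums((X - X %*% e1 %*% t(e1))^2))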

Page 38

Brillinger’s Dynamic PCA

More precisely, for any sequence of p-dimensional vectors

vkℓ, k = 1, . . . , p, ℓ ∈ Z,

such that ∑_{ℓ=−∞}^{∞} ‖vkℓ‖ < ∞, letting

Ỹkt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, vkℓ〉,

we have, for all K = 1, . . . , p,

E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ‖² ≤ E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Ỹk,t+ℓ vkℓ‖².

Page 39

Brillinger’s Dynamic PCA

Mutual orthogonality of Yk1,t and Yk2,s, k1 ≠ k2, already has been shown for all s and t.

Contrary to the dimension reduction based on the static principal components of {Xt}, the dimension reduction based on the dynamic principal components thus recovers, in the time series context, the desired properties of standard principal component-based dimension reduction in the i.i.d. case.

Page 40

Dynamic FPCA

How can we extend this to the functional setting?

A neat treatment of the spectral approach to the analysis of functional time series has been proposed (only quite recently) by Panaretos & Tavakoli (AoS, 2013), who give the following definition.

Definition. Provided that ∑_ℓ ‖Cov(Xt, Xt−ℓ)‖_S < ∞ (a summability condition on Hilbert-Schmidt norms—details are skipped),

F^X_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Xt, Xt−ℓ) e^{−iℓθ}

exists and is called the spectral density operator of the functional process {Xt}.

Page 41

Dynamic FPCA

Based on the latter, instead of static functional transformations (mapping an L2-valued variable to an RK-valued one; here K is an arbitrary integer)

Xt ↦ Yt = (Y1t, . . . , YKt)′ =: Ψ0(Xt), say, with Ψ0 : L2 → RK,

we consider linear functional filters (mapping an L2-valued functional process to an RK-valued variable)

(. . . , Xt−1, Xt, Xt+1, . . .) ↦ Yt = (Y1t, . . . , YKt)′ =: ∑_{ℓ=−∞}^{∞} Ψℓ(Xt−ℓ), Ψℓ : L2 → RK.

Page 42

Dynamic FPCA

Those Ψℓ's should be such that Ykt and Yk′t′ are uncorrelated whenever k ≠ k′, for all t and t′ (at all leads and lags, not just contemporaneously: autocorrelations are admitted, but not cross-correlations) ...

Page 43

Dynamic FPCA

... which holds if and only if the (traditional) spectral density matrix

Σ^Y_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Yt, Yt−ℓ) e^{−iℓθ}

of the vector process {Yt} is diagonal for (almost) all θ ∈ [−π, π].

Accordingly, let us investigate the relation between the spectral density matrix Σ^Y_θ and the spectral density operator F^X_θ.

Page 44

Dynamic FPCA

For fixed θ, the spectral density operator F^X_θ (for every θ, a non-negative, self-adjoint Hilbert-Schmidt operator) has properties similar to those of a covariance operator.

In particular, for any f ∈ L2, the image F^X_θ(f) of f admits the eigendecomposition (just as for the usual covariance operator)

F^X_θ(f) = ∑_{k≥1} λk(θ) 〈f, ϕk(θ)〉 ϕk(θ),

where λk(θ) and ϕk(θ) are F^X_θ's eigenvalues (in descending order of magnitude) and eigenfunctions, respectively.

Page 45

Dynamic FPCA

The relation between functional filters and frequency-indexed operators is similar to, but more delicate than, the relation between matrix filters and frequency-indexed matrices ...

Since each Ψℓ in the functional filter (providing a reduction to dimension p)

(. . . , Xt−1, Xt, Xt+1, . . .) ↦ ∑_{ℓ=−∞}^{∞} Ψℓ(Xt−ℓ) =: Yt, Ψℓ : L2 → Rp,

is linear, it has, for some ψ1ℓ, . . . , ψpℓ ∈ L2, the representation (Riesz representation)

Ψℓ(f) = (〈f, ψ1ℓ〉, . . . , 〈f, ψpℓ〉)′.

Then, the following relation holds between the spectral density operator F^X_θ and the spectral density matrix Σ^Y_θ of the filtered p-dimensional process {Yt}.

Page 46

Dynamic FPCA

Let

ψ*k(θ) := ∑_{ℓ=−∞}^{∞} ψkℓ e^{iℓθ}, k = 1, . . . , p

(the ψkℓ's thus are the Fourier coefficients of the ψ*k's). We have

Σ^Y_θ = [ 〈F^X_θ(ψ*1(θ)), ψ*1(θ)〉   · · ·   〈F^X_θ(ψ*p(θ)), ψ*1(θ)〉
                 ⋮                                ⋱                                ⋮
          〈F^X_θ(ψ*1(θ)), ψ*p(θ)〉   · · ·   〈F^X_θ(ψ*p(θ)), ψ*p(θ)〉 ].

Page 47

Dynamic FPCA

Let us choose the ψkℓ's (equivalently, the functional filters Ψℓ) in such a way that

ψ*k(θ) = ∑_{ℓ∈Z} ψkℓ e^{iℓθ} = ϕk(θ).

That is, choose as ψkℓ's the Fourier coefficients of F^X_θ's kth eigenfunction ϕk(θ):

ψkℓ = (1/2π) ∫_{−π}^{π} ϕk(θ) e^{−iℓθ} dθ

(note again that those ψkℓ's, just as the ψ*k(θ)'s and ϕk(θ)'s, are functions, viz. ψkℓ(u) = (1/2π) ∫_{−π}^{π} ϕk(θ)(u) e^{−iℓθ} dθ).

Page 48

Dynamic FPCA

Definition. The kth Dynamic Functional Principal Component of Xt is the univariate real-valued process

Ykt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉, t ∈ Z, k = 1, . . . , p

(formally, the same definition as for Rp-valued processes), with variance

∫_{−π}^{π} λk(θ) dθ.

The spectral density matrix Σ^Y_θ of the Ykt's then is diagonal, with diagonal elements λk(θ).

Page 49

Elementary properties

Assume {Xt : t ∈ Z} is a functional process with summable autocovariances and let Ykt, t ∈ Z, be its kth dynamic FPC. Then the following holds:

(a) The series defining Ykt is mean square convergent.

(b) Ykt is real.

(c) If the Xt's are serially uncorrelated, then the dynamic FPCs coincide with the static FPCs.

(d) For k ≠ k′, the principal components Ykt and Yk′s are uncorrelated for all s, t.

Page 50

Dynamic Karhunen-Loève expansion

We then can recover the original process by means of a functional version of the dynamic Karhunen-Loève expansion previously obtained in Rp.

Definition. The dynamic Karhunen-Loève expansion of the functional process Xt is

Xt = ∑_{k=1}^{∞} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ, t ∈ Z,

with

Ykt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉

({Xt}'s dynamic functional principal components).

Page 51

Optimality

For any L2-valued sequence

vkℓ, k, ℓ ∈ Z,

such that ∑_{ℓ=−∞}^{∞} ‖vkℓ‖ < ∞, letting

Ỹkt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, vkℓ〉,

we have, for all K ∈ N, the desired optimality property

E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ‖² ≤ E ‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Ỹk,t+ℓ vkℓ‖².

Page 52

The last proposition provides, as a natural measure of how well a functional time series can be represented in finite dimension K, the proportion of variance explained by the first K dynamic FPCs:

{ ∑_{k=1}^{K} ∫_{−π}^{π} λk(θ) dθ } / E‖X1‖².

Page 53

Real data example (pollution data)

Figure 4: 10 subsequent observations of Sqrt(PM10) − mean against intraday time (left panel), the corresponding static KL expansion with one term (middle panel), and the dynamic KL expansion with one term (right panel).

Page 54

Estimation of Functional Dynamic Principal Components

In practice, covariance and spectral density operators are not known.

Theorem (Consistency)

Let Ŷkt be the random variable defined by

Ŷkt := ∑_{ℓ=−L}^{L} 〈Xt−ℓ, ψ̂kℓ〉, k = 1, . . . , p and t = L + 1, . . . , n − L,

where the ψ̂kℓ's are estimated from the empirical covariance operator. Then (under appropriate technical assumptions), we have, for L = L(n) → ∞, that Ŷkt − Ykt → 0 in probability as n → ∞.

Page 55

Implementation in a nutshell

(i) In practice, a function x(u) is observed on a grid 0 ≤ u1 < u2 < . . . < ur ≤ 1, and converted into a functional observation within the space spanned by a finite number of basis functions v1, . . . , vd as

x(u) = Z1 v1(u) + . . . + Zd vd(u) = v′(u) Z.

Commonly, Fourier bases, B-splines or wavelets are used. The coefficients Zk can be obtained, for example, via least squares fitting or some penalized form thereof.
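A minimal R sketch of step (i), assuming a single noisy curve observed on a grid and a small Fourier basis (the basis and its dimension d are illustrative choices):

## Minimal sketch: from discrete observations x(u_1), ..., x(u_r) to basis coefficients Z.
r <- 96
u <- seq(0, 1, length.out = r)
x <- sin(2 * pi * u) + 0.5 * cos(4 * pi * u) + rnorm(r, sd = 0.2)   # noisy observed curve
d <- 5
B <- cbind(1, sin(2 * pi * u), cos(2 * pi * u),                     # v_1(u), ..., v_d(u)
              sin(4 * pi * u), cos(4 * pi * u))                     # evaluated on the grid
Z <- coef(lm(x ~ B - 1))                        # least-squares coefficients Z_1, ..., Z_d
xfit <- as.vector(B %*% Z)                      # the functional observation x(u) = v'(u) Z
round(Z, 2)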

Page 56

Implementation in a nutshell

(ii) We are working on a finite-dimensional space now, where operators reduce to matrices acting on Z := (Z1, . . . , Zd)′.

In particular, the spectral density operator

F^X_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Xt, Xt−ℓ) e^{−iℓθ}

reduces to the matrix

G^X_θ = (1/2π) ( ∑_{h∈Z} C^Z_h e^{−ihθ} ) V′,

where C^Z_h := E[Zh Z′0] and V := (〈vi, vj〉 : 1 ≤ i, j ≤ d).
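A sketch of step (ii) in R, on a toy coefficient process Zt and a small orthonormal Fourier basis (so that V is close to the identity); the lag-window weights and the bandwidth are again illustrative choices.

## Minimal sketch: the matrix G_theta^X acting on basis coefficients.
set.seed(5)
n <- 400; d <- 3
Zmat <- matrix(rnorm(n * d), n, d)
for (t in 2:n) Zmat[t, ] <- 0.5 * Zmat[t - 1, ] + rnorm(d)     # coefficient process Z_t
u <- seq(0, 1, length.out = 200); du <- 1 / 200
B <- rbind(rep(1, 200), sqrt(2) * sin(2 * pi * u), sqrt(2) * cos(2 * pi * u))  # v_1, v_2, v_3
V <- (B %*% t(B)) * du                                         # V = (<v_i, v_j>), here ~ I
CZ <- function(h) {                                            # C^Z_h = E[Z_h Z'_0]
  idx <- max(1, 1 + h):min(n, n + h)
  crossprod(Zmat[idx, , drop = FALSE], Zmat[idx - h, , drop = FALSE]) / n
}
G <- function(th, q = floor(sqrt(n))) {                        # (1/2pi)(sum_h C^Z_h e^{-ih theta}) V'
  S <- matrix(0 + 0i, d, d)
  for (h in -q:q) S <- S + (1 - abs(h) / (q + 1)) * CZ(h) * exp(-1i * h * th)
  (S / (2 * pi)) %*% t(V)
}
round(G(0), 3)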

Page 57

Implementation in a nutshell

(iii) Relation between the eigenfunctions/values of F^X_θ and the eigenvectors/values of G^X_θ.

Assume that λm(θ) is the m-th eigenvalue of G^X_θ, with eigenvector ϕm(θ). Then λm(θ) is also an eigenvalue of F^X_θ, with eigenfunction v′ϕm(θ).

From there, we obtain

ψmk = v′ (1/2π) ∫_{−π}^{π} ϕm(s) e^{−iks} ds =: v′ ψmk

(the same symbol ψmk denoting, on the left, the filter coefficient as a function in L2 and, on the right, the corresponding d-vector of basis coefficients), and

Ymt = ∑_{k∈Z} ∫_0^1 Z′t−k v(u) v′(u) ψmk du = ∑_{k∈Z} Z′t−k V ψmk.
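Continuing the sketch of the previous step (it reuses Zmat, V and G from that snippet, so it is not standalone): the leading eigenvector of G_θ^X on a frequency grid yields, via its Fourier coefficients, the coefficient vectors ψmk and hence the first dynamic score series. The truncation Lmax, the normalization and the crude phase anchor are the same illustrative shortcuts as before.

## Continuation (assumes Zmat, V, G from the previous sketch are in the workspace).
Lmax <- 15; nth <- 200; th <- (seq_len(nth) - 0.5) * pi / nth
eig1 <- lapply(th, function(t) {
  e <- eigen(G(t))                                   # G is not Hermitian: plain eigen()
  a <- e$vectors[, which.max(Re(e$values))]          # eigenvector of the leading eigenvalue
  a <- a / sqrt(Re(sum(Conj(a) * (V %*% a))))        # normalize so that ||v'a|| = 1 in L2
  a * Mod(a[1]) / a[1]                               # crude phase anchor (first coordinate real > 0)
})
## coefficient vectors psi_{1,k} ~ (1/2pi) int phi_1(s) e^{-iks} ds (negative s by conjugation)
psi1 <- sapply(-Lmax:Lmax, function(k)
  Re(Reduce(`+`, Map(function(a, t) a * exp(-1i * k * t), eig1, th))) / nth)
## first dynamic FPC scores Y_1t = sum_k Z'_{t-k} V psi_{1,k}
tt <- (Lmax + 1):(nrow(Zmat) - Lmax)
Y1 <- rowSums(sapply(seq(-Lmax, Lmax), function(k)
  Zmat[tt - k, , drop = FALSE] %*% V %*% psi1[, k + Lmax + 1]))
head(round(Y1, 2))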

Page 58

Implementation in a nutshell

(iv) Of course, all quantities involved need to be estimated (consistency results have been established).

Details can be found in the JRSS paper.

Implementation in R: package freqdom.

Page 59

Real data example

Figure 5: The sequence of the first static FPC scores (red) [73%] and the dynamic ones (black) [80%], plotted against days.

Page 60

Simulation

Setting: functional AR(1) process; autoregressive kernel proportional to κ; length n = 250.

Figure 6: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.5.

Page 61

Simulation

Figure 7: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.7.

Page 62

Simulation

Figure 8: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.9.

Page 63

Conclusion

Static PCA and FPCA (which are everyday practice) are inadequate in the context of time series: the KL decomposition does not involve basis processes that are mutually orthogonal at all leads and lags, and the optimality property is lost in non-i.i.d. cases.

Dynamic PCA and FPCA take into account serial dependence,and recover those properties.

Consistent empirical versions are computationally feasible.

Simulations and the real data example are quite encouraging.

Page 64

Conclusion (continued)

Moreover,

Dynamic principal components are a basic ingredient of the General Dynamic Factor methods used (see Forni, Hallin, Lippi, and Reichlin/Zaffaroni 2000, 2005, . . . , 2015, 2017) in the analysis of large panels of time series data;

in ongoing research (Hallin, Hörmann, Nisol), functional dynamic principal component methods are used in a functional extension of General Dynamic Factor methods, allowing for the analysis/prediction of large panels containing both scalar and functional time series.