Optimal Dimension Reduction for (Functional) Time Series
Marc Hallin
Université libre de Bruxelles (ULB)
IMA Workshop on Forecasting from Complexity
Institute for Mathematics and its Applications
University of Minnesota, Minneapolis, April 23–27, 2018
Partly based on
Hörmann, S., Kidziński, Ł., and Hallin, M. (2015). Dynamic functional principal components, Journal of the Royal Statistical Society Series B 77, 319–348.
Hallin, M., Hörmann, S., and Lippi, M. (2018). On optimal dimension reduction for high-dimensional and functional time series, Statistical Inference for Stochastic Processes, to appear.
Dimension reduction techniques are at the core of the statistical analysis of high-dimensional observations:
Xt high-dimensional "replaced with" Yt = (Yt1, . . . , YtK)′, K "small", t = 1, . . . , T, where the Ytk's are scalars:
• observations in Rp (p-dimensional, p large): Xt = (Xt1, . . . , Xtp)′;
• functional observations: Xt = Xt(u), u ∈ [0, 1] (any basis is infinite-dimensional);
"Replacing Xt (p-dimensional or functional) by Yt" (K-dimensional, K "small") means "linear approximations of Xt can be constructed from the Yt's", via linear combinations (static loading) or linear filtering (dynamic loading) ...
Dimension reduction techniques are at the core of the statisticalanalysis of high-dimensional observations
• observations in Rp, p large, 0 < K < p;
• functional observations, intrinsically infinite-dimensional, 0 < K < ∞.
The latter case, where each observation is a function, is increasingly frequent in a variety of applications: econometrics, finance, environmental studies ... "Tick by tick" time series of returns for p stocks; daily pollution or climatic data (e.g. with half-hour resolution); daily geophysical data (e.g. the horizontal component of the magnetic field with minute resolution; ocean surface salinity, velocities, etc.).
“Tick by tick” financial data
[Figure] Figure 1: S&P 100 market index (OEX) plotted for 10 consecutive trading days, 405 measurements per day.
Pollution data
[Figure] Figure 2: 10 days of PM10 data in Graz (Austria), 48 observations per day.
Geophysical data
[Figure] Figure 3: Horizontal component of the magnetic field measured in one-minute resolution at the Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT, 1440 measurements per day.
Functional time series
Each day yields one observation Xt = {Xt(u), u ∈ [0, 1]}, which is treated as a real-valued function (or the corresponding curve).
A common feature of all those datasets is that they do exhibit serial dependence, hence should be analyzed as stochastic processes: functional time series.
The intraday process {Xt(u), u ∈ [0, 1]} typically is not stationary, while the functional process {Xt, t ∈ Z} is.
We shall assume throughout that Xt ∈ L2([0, 1]).
Dimension reduction
Whether the data are in Rp or functional, the most popular dimension reduction technique is Principal Components, PCA (Functional Principal Components, FPCA).
PCA in Rp
Let µ := EX = 0 and C := Cov(X), with eigenvectors e1, . . . , ep and eigenvalues λ1 > . . . > λp. The p-dimensional X decomposes into (Karhunen–Loève expansion)

X = ∑_{k=1}^{p} Yk ek = ∑_{k=1}^{p} 〈X, ek〉 ek,

where
• Yk := 〈X, ek〉 is X's kth principal component (a scalar),
• Cov(Y) = diag(λ1, . . . , λp).

Interpretation: Y1 is the normed linear combination of X1, . . . , Xp with highest variance, etc.
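As a numerical illustration (mine, not from the slides), static PCA in Rp amounts to eigendecomposing the sample covariance matrix; the scores Yk = 〈X, ek〉 then have covariance diag(λ1, . . . , λp). A minimal sketch with arbitrary toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 5000
# toy mean-zero observations with unequal component variances
X = rng.standard_normal((n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5])

C = np.cov(X, rowvar=False)              # sample covariance matrix
lam, E = np.linalg.eigh(C)               # eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]           # reorder: lambda_1 >= ... >= lambda_p

Y = X @ E                                # scores Y_k = <X, e_k>
assert np.allclose(np.cov(Y, rowvar=False), np.diag(lam))   # Cov(Y) = diag(lambda)
assert np.all(np.diff(lam) <= 1e-12)                        # decreasing variances
```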
Optimality of PCA-based dimension reduction in Rp
The success of principal components as a dimension reduction technique is explained by the fact that the (Karhunen–Loève) expansion

X = ∑_{k=1}^{p} Yk ek = ∑_{k=1}^{p} 〈X, ek〉 ek

is such that, for any K ≤ p and any K-tuple v1, . . . , vK of p-dimensional vectors (wlog, normed),

E‖X − ∑_{k=1}^{K} 〈X, ek〉ek‖² ≤ E‖X − ∑_{k=1}^{K} 〈X, vk〉vk‖²,

the left-hand side being equal to ∑_{k=K+1}^{p} λk.
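This optimality can be checked numerically; in a sketch of mine (all data simulated, not from the slides), the K-term truncation error equals the sum of the trailing sample eigenvalues and beats an arbitrary competing orthonormal K-tuple:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, K = 5, 2000, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # toy mean-zero data

C = (X.T @ X) / n                       # sample covariance
lam, E = np.linalg.eigh(C)
lam, E = lam[::-1], E[:, ::-1]          # descending eigenvalues

resid = X - (X @ E[:, :K]) @ E[:, :K].T            # K-term KL truncation residual
err = np.mean(np.sum(resid**2, axis=1))
assert abs(err - lam[K:].sum()) < 1e-9             # error = sum of trailing eigenvalues

Q, _ = np.linalg.qr(rng.standard_normal((p, K)))   # a competing orthonormal K-tuple
resid_q = X - (X @ Q) @ Q.T
assert err <= np.mean(np.sum(resid_q**2, axis=1)) + 1e-12
```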
Optimality of PCA-based dimension reduction in Rp
For all 1 ≤ K ≤ p, the K first principal components (Y1, . . . , YK) thus
• provide the "best" (linear) reduction of X to dimension K in the sense that, for any 1 ≤ K < p, (Y1, . . . , YK) is the "best" K-dimensional approximation of X and, moreover,
•• they are mutually orthogonal (now in the sense of uncorrelatedness), which greatly simplifies the statistical analysis.
Functional PCA
All this easily extends to random variables in functional spaces.
Here we consider only variables in the Hilbert space L2 = L2([0, 1]) equipped with inner product

〈x, y〉 = ∫₀¹ x(u)y(u) du

and norm ‖x‖ = √〈x, x〉.
Basic ingredients for Functional PCA
Let X = X(u) be a functional random variable, with mean function

µ = µ(u) = EX = (EX(u) : u ∈ [0, 1])

(without loss of generality, let us assume µ(u) = 0 for all u) and covariance kernel

C = (C(u, v) := Cov(X(u), X(v)) : (u, v) ∈ [0, 1]²).
Instead of covariance matrices, we are dealing here with covariance operators. The covariance operator C maps f ∈ L2 to C(f) ∈ L2, with

C(f)(u) = ∫₀¹ C(u, v)f(v) dv, u ∈ [0, 1].
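On a discretization grid, the covariance operator becomes a kernel matrix and the integral a weighted matrix-vector product. A sketch of mine (not from the slides), using the Brownian-motion kernel C(u, v) = min(u, v), for which C(f) is available in closed form when f(u) = sin(πu):

```python
import numpy as np

r = 401
u = np.linspace(0.0, 1.0, r)
w = np.full(r, u[1] - u[0]); w[0] *= 0.5; w[-1] *= 0.5   # trapezoidal weights

C = np.minimum.outer(u, u)          # Brownian-motion kernel C(u, v) = min(u, v)
f = np.sin(np.pi * u)               # an element of L2([0, 1])

Cf = C @ (w * f)                    # C(f)(u) = int_0^1 C(u, v) f(v) dv
# closed form: int_0^1 min(u, v) sin(pi v) dv = sin(pi u)/pi^2 + u/pi
exact = np.sin(np.pi * u) / np.pi**2 + u / np.pi
assert np.max(np.abs(Cf - exact)) < 1e-4
```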
Functional PCA
Functional PCA uses the fact that C is a compact (in fact, "trace class") operator. For any f ∈ L2, the image C(f) of f admits an eigendecomposition

C(f) = ∑_{k≥1} λk 〈f, ek〉 ek,

with ∑_{k≥1} λk < ∞, where ek ∈ L2 is the kth eigenfunction of C, with eigenvalue λk, ‖ek‖ = 1 and ek ⊥ eℓ for k ≠ ℓ: the sequence e1, e2, . . . thus constitutes an orthonormal basis for L2[0, 1].
KL expansion
In that basis, X admits a representation (called the Karhunen–Loève expansion)

X(u) = ∑_{k≥1} 〈X, ek〉 ek(u) =: ∑_{k≥1} Yk ek(u), u ∈ [0, 1],

where Yk := 〈X, ek〉 is X's kth functional principal component (a scalar), with

Cov(Yk, Yℓ) = 0 for k ≠ ℓ and Var(Yk) = λk → 0, hence Yk → 0 in quadratic mean, as k → ∞.
Optimality
That Karhunen–Loève expansion is such that, for any 0 < K ∈ N and any L2[0, 1]-valued sequence v1, v2, . . . ,

E ∫₀¹ (X(u) − ∑_{k=1}^{K} Yk ek(u))² du ≤ E ∫₀¹ (X(u) − ∑_{k=1}^{K} 〈X, vk〉 vk(u))² du,

that is,

E‖X − ∑_{k=1}^{K} Yk ek‖² ≤ E‖X − ∑_{k=1}^{K} 〈X, vk〉 vk‖².
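Discretized on a grid, the functional statement reduces to the vector one and can be checked directly; in this sketch (curves, score variances, and competitor all arbitrary choices of mine), the K-term KL truncation is no worse than a random competing orthonormal system:

```python
import numpy as np

rng = np.random.default_rng(2)
r, n, K = 50, 2000, 2
u = np.linspace(0.0, 1.0, r)

# curves built from three smooth components with decreasing score variances
E0 = np.column_stack([np.sin(np.pi * u), np.sin(2 * np.pi * u), np.sin(3 * np.pi * u)])
X = (rng.standard_normal((n, 3)) * np.sqrt([4.0, 1.0, 0.25])) @ E0.T   # n discretized curves

C = (X.T @ X) / n                        # discretized covariance kernel
lam, E = np.linalg.eigh(C)
lam, E = lam[::-1], E[:, ::-1]

def trunc_err(B):
    """Mean squared error of projecting the curves on the span of B's columns."""
    P = B[:, :K] @ B[:, :K].T
    return np.mean(np.sum((X - X @ P) ** 2, axis=1))

Q, _ = np.linalg.qr(rng.standard_normal((r, K)))   # a competing orthonormal K-tuple
assert trunc_err(E) <= trunc_err(Q) + 1e-12
```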
KL expansion
Hence, just as in the vector case, the K first functional principal components (Y1, . . . , YK)
• provide the "best reduction to dimension K" of X in the sense that (Y1, . . . , YK) is the "best" K-dimensional approximation of X and, moreover,
•• Y1, . . . , YK are mutually orthogonal ...
Static KL expansion
Let us now assume {Xt | t ∈ Z} is a second-order stationary process. Then the K first functional principal components

(Y1t, . . . , YKt) := (〈Xt, e1〉, . . . , 〈Xt, eK〉), t ∈ Z

(a K-dimensional process) based on the eigenfunctions ek of the covariance operator C
• fail to exploit the information contained in Xt's leads and lags, hence
• only provide the "best" static K-dimensional approximation of Xt, that is, the best among the reductions involving instantaneous linear combinations only, of the form 〈Xt, ek〉, of the Xt's.
In a time series context, linear combinations involving the past and future values of Xt are likely to do a better job.
Static KL expansion
Moreover,
•• unless {Xt | t ∈ Z} is an uncorrelated process, the Ykt's are mutually orthogonal "at lag zero" only:
Yk1t = 〈Xt, ek1〉 is uncorrelated with Yk2t = 〈Xt, ek2〉 for k1 ≠ k2, but in general Yk1t = 〈Xt, ek1〉 and Yk2,t−ℓ = 〈Xt−ℓ, ek2〉 are (cross-)correlated.
Conclusion:
Principal components (whether classical or functional) in a time series context do not enjoy the two properties that make them an efficient statistical tool in the i.i.d. (uncorrelated) case.
Can we adapt traditional PCA (call it static PCA) to the dynamic context in order to recover its optimal dimension reduction and mutual orthogonality properties in the time-series context?
Brillinger’s Dynamic PCA
This is what Brillinger did in the multivariate time series case. Denote by {Xt := (X1t, . . . , Xpt)′, t ∈ Z} a second-order stationary process, with values in Rp, mean µ = 0, and autocovariances

Γk := E[Xt X′t−k], k ∈ Z.

Intuitively, informative linear approximations of Xt should exploit the autocovariance structure (all Γk's), hence involve all present, but also past and future values of the Xt's instead of restricting to contemporaneous ones.
Brillinger’s Dynamic PCA
This requires looking for the normed linear combinations of present, past and future observations maximizing (subject to orthogonality constraints) the variance, etc., and not just those involving the present observations: dynamic linear combinations (filters), not just static ones.
Brillinger’s Dynamic PCA
Brillinger (1981), in a pathbreaking but all-too-often overlooked contribution, has solved this problem for p-dimensional processes admitting a spectral density (a spectral density matrix) ...
... which does not mean that Brillinger's concept is a frequency-domain concept! Fourier transforms here are used as tools, and other spectral concepts probably can be considered as well.
Brillinger’s Dynamic PCA
Brillinger's dynamic principal components are based on a factorization of spectral density matrices instead of the factorization of the (instantaneous) covariance matrices.
Therefore, let us make the additional assumption that
• the spectral measure of {Xt, t ∈ Z} is absolutely continuous with respect to the Lebesgue measure on [−π, π], that is, {Xt} has a p × p spectral density matrix

Σ(θ) := ∑_{ℓ=−∞}^{∞} Γℓ e^{−iℓθ},

with entries σij(θ), θ ∈ [−π, π].
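As a toy illustration (parameter values mine, not from the slides), the spectral density matrix of a bivariate MA(1), Xt = εt + Θεt−1, has only three nonzero terms in the sum, and its Hermitian and symmetry properties can be verified directly:

```python
import numpy as np

Theta = np.array([[0.5, 0.2], [0.1, 0.4]])   # MA(1) coefficient (toy choice)
I2 = np.eye(2)

def Sigma(theta):
    """Sigma(theta) = Gamma_0 + Gamma_1 e^{-i theta} + Gamma_{-1} e^{i theta},
    with Gamma_0 = I + Theta Theta', Gamma_1 = Theta, Gamma_{-1} = Theta'."""
    z = np.exp(-1j * theta)
    return (I2 + Theta @ Theta.T) + Theta * z + Theta.T * np.conj(z)

theta = 0.7
S = Sigma(theta)
assert np.allclose(S, S.conj().T)                 # Hermitian
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)    # positive semidefinite
assert np.allclose(Sigma(-theta), S.conj())       # Sigma(-theta) = conj(Sigma(theta))
```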
Brillinger’s Dynamic PCA
The matrix Σ(θ) is Hermitian and positive semidefinite; denoting by Σ̄(θ) its complex conjugate, Σ̄(θ) = Σ′(θ) = Σ(−θ).
The eigenvalues λ1(θ) ≥ λ2(θ) ≥ . . . ≥ λp(θ) ≥ 0 of Σ(θ) are real; call them the dynamic eigenvalues of {Xt}.
The corresponding eigenvectors are called the dynamic eigenvectors of {Xt}; denote by ϕ′k(θ) the row eigenvector (of Σ(θ)) associated with λk(θ): then, ϕk(θ) is a column eigenvector of Σ(−θ) = Σ̄(θ) = Σ′(θ), still with eigenvalue λk(θ):

ϕ′k(θ)Σ(θ) = λk(θ)ϕ′k(θ),   Σ(−θ)ϕk(θ) = λk(θ)ϕk(θ).
Brillinger’s Dynamic PCA
Spectral densities always are defined up to a set of θ values with Lebesgue measure zero; rather than with functions, we are thus dealing with equivalence classes of a.e. equal functions; by Σ(θ) we tacitly mean a representative of such a class. The same comment applies to dynamic eigenvalues and eigenvectors.
Brillinger’s Dynamic PCA
For all θ,
• the eigenvalues λk(θ) are real,
• the p × p matrix of eigenvectors ϕ(θ) := (ϕ1(θ) . . . ϕp(θ)) is unitary, that is,

ϕ(θ)ϕ*(θ) = I = ϕ*(θ)ϕ(θ), hence ϕ(θ)ϕ̄′(θ) = I = ϕ̄′(θ)ϕ(θ)

(where ϕ*(θ) and ϕ̄(θ) stand for the adjoint of ϕ(θ) and its conjugate, respectively),
• Σ(−θ) = Σ̄(θ) implies we can impose ϕ(−θ) = ϕ̄(θ),
• ϕ′(θ)Σ(θ)ϕ̄(θ) = Λ(θ) and ϕ̄(θ)Λ(θ)ϕ′(θ) = Σ(θ).
Matrices and Filters
Any matrix or vector M(θ) with square-integrable measurable elements defined over [−π, π] can be expanded (componentwise) into a Fourier series

M(θ) = (1/2π) ∑_{ℓ=−∞}^{∞} [∫_{−π}^{π} M(θ)e^{iℓθ} dθ] e^{−iℓθ},

where the right-hand side converges in quadratic mean. That expansion creates a correspondence between the square-integrable matrix-valued function M(θ) and the square-summable filter

M(L) := (1/2π) ∑_{ℓ=−∞}^{∞} [∫_{−π}^{π} M(θ)e^{iℓθ} dθ] L^ℓ

(L, as usual, stands for the lag operator).
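The correspondence can be illustrated with a scalar example of mine (not from the slides): for M(θ) = 1 + 0.5e^{−iθ}, the filter is 1 + 0.5L, and its coefficients are recovered by numerically integrating M(θ)e^{iℓθ}:

```python
import numpy as np

# toy frequency response: M(theta) = 1 + 0.5 e^{-i theta}, i.e. the filter 1 + 0.5 L
N = 2000
thetas = np.linspace(-np.pi, np.pi, N, endpoint=False)   # uniform grid on [-pi, pi)
M = 1.0 + 0.5 * np.exp(-1j * thetas)

def coef(l):
    """(1/2pi) int M(theta) e^{i l theta} d theta, by the periodic rectangle rule."""
    return np.mean(M * np.exp(1j * l * thetas))

assert abs(coef(0) - 1.0) < 1e-12    # coefficient of L^0
assert abs(coef(1) - 0.5) < 1e-12    # coefficient of L^1
assert abs(coef(2)) < 1e-12          # all other lags vanish
```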
Brillinger’s Dynamic PCA
The (p × m)-dimensional matrix M(θ) and the filter M(L) then are strongly connected by the fact that
• if the p-dimensional process {Xt} has spectral density matrix Σ(θ),
• then the m-variate stochastic process {M′(L)Xt} has spectral density matrix

M′(θ)Σ(θ)M̄(θ),

where M̄ is the conjugate of M.
Brillinger’s Dynamic PCA
The p × 1 dynamic eigenvectors θ ↦ ϕk(θ), in particular, can be expanded into

ϕk(θ) = (1/2π) ∑_{ℓ=−∞}^{∞} [∫_{−π}^{π} ϕk(θ)e^{iℓθ} dθ] e^{−iℓθ} =: ∑_{ℓ=−∞}^{∞} ψkℓ e^{−iℓθ},

defining p × 1 square-summable filters of the form

ϕk(L) = (1/2π) ∑_{ℓ=−∞}^{∞} [∫_{−π}^{π} ϕk(θ)e^{iℓθ} dθ] L^ℓ =: ∑_{ℓ=−∞}^{∞} ψkℓ L^ℓ,

where ψkℓ := (1/2π) ∫_{−π}^{π} ϕk(θ)e^{iℓθ} dθ is a p-dimensional vector of Fourier coefficients; since ϕ(−θ) = ϕ̄(θ), the ψkℓ's are real.
Brillinger’s Dynamic PCA
It follows that the p-tuple

Yt := ϕ′(L)Xt

of (real-valued) univariate processes {Ykt | t ∈ Z}, where

Ykt := ϕ′k(L)Xt = ∑_{ℓ=−∞}^{∞} ψ′kℓ Xt−ℓ = ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉,

has diagonal spectral density

ϕ′(θ)Σ(θ)ϕ̄(θ) = Λ(θ),

with diagonal elements ϕ′k(θ)Σ(θ)ϕ̄k(θ) = λk(θ).
Hence, the {Ykt}'s are mutually orthogonal at all leads and lags, with

Var(Ykt) = λk := ∫_{−π}^{π} λk(θ) dθ.
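A sketch of the construction on the toy bivariate MA(1) used above (parameters mine, not a slide computation): eigendecompose Σ(θ) on a frequency grid, enforce ϕ(−θ) = ϕ̄(θ) through a phase convention, and recover the filter coefficients ψkℓ by an inverse discrete Fourier transform; they indeed come out real:

```python
import numpy as np

Theta = np.array([[0.5, 0.2], [0.1, 0.4]])
I2 = np.eye(2)

def Sigma(theta):
    """Spectral density matrix of the bivariate MA(1) X_t = eps_t + Theta eps_{t-1}."""
    z = np.exp(-1j * theta)
    return (I2 + Theta @ Theta.T) + Theta * z + Theta.T * np.conj(z)

N = 512
thetas = 2 * np.pi * np.arange(N) / N            # frequency grid on [0, 2*pi)
phis = np.empty((N, 2, 2), dtype=complex)
for j, th in enumerate(thetas):
    lam, V = np.linalg.eigh(Sigma(th))           # eigenvalues in ascending order
    V = V[:, ::-1]                               # dynamic eigenvectors, lambda_1 >= lambda_2
    # phase convention: first component of each eigenvector real and >= 0,
    # which enforces phi(-theta) = conj(phi(theta))
    V = V * np.exp(-1j * np.angle(V[0, :]))[None, :]
    phis[j] = V

# the dynamic eigenvectors diagonalize Sigma(theta) at each frequency
D = phis[37].conj().T @ Sigma(thetas[37]) @ phis[37]
assert abs(D[0, 1]) < 1e-12 and abs(D[1, 0]) < 1e-12

# psi_{k,l} = (1/2pi) int phi_k(theta) e^{i l theta} d theta: an inverse DFT
psi = np.fft.ifft(phis, axis=0)                  # lag index l = 0, 1, ..., N-1 (mod N)
assert np.max(np.abs(psi.imag)) < 1e-10          # real filter coefficients
```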
Brillinger’s Dynamic PCA
Definition. The univariate process {Ykt | t ∈ Z} is called {Xt}’skth dynamic principal component (k = 1, . . . , p).
Brillinger’s Dynamic PCA
The properties of {Xt}'s dynamic principal components extend to the time-series context the standard properties of traditional principal components associated with the eigenvalues and eigenvectors of {Xt}'s covariance matrix.
In particular, the variance λk of Ykt is such that

λk = max Var(∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ),

the maximum being taken over all aiℓ's with ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} a²iℓ = 1; for k = 1 that maximization is unconstrained, while for k = 2, . . . , p it is subject to ∑_{i=1}^{p} ∑_{ℓ=−∞}^{∞} aiℓ Xi,t−ℓ being orthogonal to Y1t, . . . , Yk−1,t.
Brillinger’s Dynamic PCA
Because ϕ(θ) is unitary, ϕ′⁻¹(θ) = ϕ̄(θ). Therefore,

ϕ̄(L)Yt = Xt,

since it has spectral density

ϕ̄(θ)Λ(θ)ϕ′(θ) = ϕ̄(θ)ϕ′(θ)Σ(θ)ϕ̄(θ)ϕ′(θ) = Σ(θ).

Developing ϕ̄(L)Yt (taking into account the fact that the ψkℓ's are real), we obtain

Xt = ϕ̄(L)Yt = ∑_{k=1}^{p} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ.
Brillinger’s Dynamic PCA
The dynamic principal components {(Y1t, . . . , Ypt)′} thus provide an expansion (the dynamic Karhunen–Loève expansion of {Xt}) of the form

Xt = ∑_{k=1}^{p} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ,

the truncation of which at K provides, for any 1 ≤ K ≤ p, the "best" reduction of {Xt} to a K-tuple of linear combinations of its present, past, and future values.
Brillinger’s Dynamic PCA
More precisely, for any sequence of p-dimensional vectors vkℓ, k = 1, . . . , p, ℓ ∈ Z, such that ∑_{ℓ=−∞}^{∞} ‖vkℓ‖ < ∞, letting

Ỹkt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, vkℓ〉,

we have, for all K = 1, . . . , p,

E‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ‖² ≤ E‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Ỹk,t+ℓ vkℓ‖².
Brillinger’s Dynamic PCA
Mutual orthogonality of Yk1t and Yk2s, k1 ≠ k2, already has been shown for all s and t.
Contrary to the dimension reduction based on the static principal components of {Xt}, the dimension reduction based on the dynamic principal components thus recovers, in the time series context, the desired properties of standard principal components-based dimension reduction in the i.i.d. case.
Dynamic FPCA
How can we extend this to the functional setting?
A neat treatment of the spectral approach to the analysis of functional time series has been proposed (only quite recently) by Panaretos & Tavakoli (AoS, 2013), who give the following definition.
Definition. Provided that ∑_ℓ ‖Cov(Xt, Xt−ℓ)‖_S < ∞ (a summability condition on Hilbert–Schmidt norms; details are skipped),

F^X_θ := ∑_{ℓ=−∞}^{∞} Cov(Xt, Xt−ℓ) e^{−iℓθ}

exists and is called the spectral density operator of the functional process {Xt}.
Dynamic FPCA
Based on the latter, instead of static functional transformations (mapping an L2-valued variable to an R^K-valued one; here K is an arbitrary integer)

Xt ↦ Yt = (Y1t . . . YKt)′ =: Ψ0(Xt), say, with Ψ0 : L2 → R^K,

we consider linear functional filters (mapping an L2-valued functional process to an R^K-valued variable)

(. . . , Xt−1, Xt, Xt+1, . . .) ↦ Yt = (Y1t . . . YKt)′ =: ∑_{ℓ=−∞}^{∞} Ψℓ(Xt−ℓ), Ψℓ : L2 → R^K.
Dynamic FPCA
Those Ψℓ's should be such that Ykt and Yk′t′ be uncorrelated unless k = k′ (at all leads and lags, not just contemporaneously: autocorrelations are admitted, but not cross-correlations) ...
Dynamic FPCA
... which holds if and only if the (traditional) spectral density matrix

Σ^Y_θ := (1/2π) ∑_{ℓ=−∞}^{∞} Cov(Yt, Yt−ℓ) e^{−iℓθ}

of the p-dimensional vector process {Yt} is diagonal for (almost) all θ ∈ [−π, π].
Accordingly, let us investigate the relation between the spectral density matrix Σ^Y_θ and the spectral density operator F^X_θ.
Dynamic FPCA
For fixed θ, the spectral density operator F^X_θ (for every θ, a non-negative self-adjoint Hilbert–Schmidt operator) has properties similar to those of a covariance operator.
In particular, for any f ∈ L2, the image F^X_θ(f) of f admits the eigendecomposition (just as for the usual covariance operator)

F^X_θ(f) := ∑_{k≥1} λk(θ)〈f, ϕk(θ)〉ϕk(θ),

where λk(θ) and ϕk(θ) are F^X_θ's eigenvalues (in descending order of magnitude) and eigenfunctions, respectively.
Dynamic FPCA
The relation between functional filters and frequency-indexed operators is similar to, but more delicate than, the relation between matrix filters and frequency-indexed matrices ...
Since each Ψℓ in the functional filter (providing a reduction to dimension p)

(. . . , Xt−1, Xt, Xt+1, . . .) ↦ ∑_{ℓ=−∞}^{∞} Ψℓ(Xt−ℓ) =: Yt, Ψℓ : L2 → Rp,

is linear and bounded, it has, for some ψ1ℓ, . . . , ψpℓ ∈ L2, the (Riesz) representation

Ψℓ(f) = (〈f, ψ1ℓ〉, . . . , 〈f, ψpℓ〉)′.

Then the following relation holds between the spectral density operator F^X_θ and the spectral density matrix Σ^Y_θ of the filtered p-dimensional process {Yt}.
Dynamic FPCA
Let

ψ*k(θ) := ∑_{ℓ=−∞}^{∞} ψkℓ e^{iℓθ}, k = 1, . . . , p

(the ψkℓ's thus are the Fourier coefficients of the ψ*k's). We have

Σ^Y_θ = (〈F^X_θ(ψ*j(θ)), ψ*i(θ)〉)_{1 ≤ i, j ≤ p},

that is, the p × p matrix with (i, j)th entry 〈F^X_θ(ψ*j(θ)), ψ*i(θ)〉.
Dynamic FPCA
Let us choose the ψkℓ's (equivalently, the functional filters Ψℓ) in such a way that

ψ*k(θ) = ∑_{ℓ∈Z} ψkℓ e^{iℓθ} = ϕk(θ).

That is, choose as ψkℓ's the Fourier coefficients of F^X_θ's kth eigenfunction ϕk(θ):

ψkℓ = (1/2π) ∫_{−π}^{π} ϕk(θ) e^{−iℓθ} dθ

(note again that those ψkℓ's, just as the ψ*k(θ)'s and ϕk(θ)'s, are functions, viz. ψkℓ(u) = (1/2π) ∫_{−π}^{π} ϕk(θ)(u) e^{−iℓθ} dθ).
Dynamic FPCA
Definition. The kth Dynamic Functional Principal Component of Xt is the univariate real-valued process

Ykt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉, t ∈ Z, k = 1, . . . , p

(formally, the same definition as for Rp-valued processes), with variance

∫_{−π}^{π} λk(θ) dθ.

The spectral density matrix Σ^Y_θ of the Ykt's then is diagonal, with diagonal elements λk(θ).
Elementary properties
Assume {Xt : t ∈ Z} is a functional process with summable autocovariances and let Ykt, t ∈ Z, be its kth dynamic FPC. Then the following hold:
(a) The series defining Ykt is mean square convergent.
(b) Ykt is real.
(c) If the Xt's are serially uncorrelated, then the dynamic FPCs coincide with the static FPCs.
(d) For k ≠ k′, the principal components Ykt and Yk′s are uncorrelated for all s, t.
Dynamic Karhunen-Loeve expansion
We then can recover the original process by means of a functional version of the dynamic Karhunen–Loève expansion previously obtained in Rp.
Definition. The dynamic Karhunen–Loève expansion of the functional process Xt is

Xt = ∑_{k=1}^{∞} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ, t ∈ Z,

with

Ykt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, ψkℓ〉

({Xt}'s dynamic functional principal components).
Optimality
For any L2-valued sequence vkℓ, k, ℓ ∈ Z, such that ∑_{ℓ=−∞}^{∞} ‖vkℓ‖ < ∞, letting

Ỹkt := ∑_{ℓ=−∞}^{∞} 〈Xt−ℓ, vkℓ〉,

we have, for all K ∈ N, the desired optimality property

E‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Yk,t+ℓ ψkℓ‖² ≤ E‖Xt − ∑_{k=1}^{K} ∑_{ℓ=−∞}^{∞} Ỹk,t+ℓ vkℓ‖².
The last proposition provides, as a natural measure of how well a functional time series can be represented in finite dimension K, the proportion of variance explained by the first K dynamic FPCs:

{∑_{k=1}^{K} ∫_{−π}^{π} λk(θ) dθ} / E‖X1‖².
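For the vector analogue this ratio can be checked on the toy MA(1) used earlier (an illustration of mine, not a slide computation): normalizing by 1/2π, the dynamic eigenvalues integrate to the total variance trace Γ0, so the explained-variance shares sum to one:

```python
import numpy as np

Theta = np.array([[0.5, 0.2], [0.1, 0.4]])
I2 = np.eye(2)
Gamma0 = I2 + Theta @ Theta.T                  # Var(X_t) for the MA(1); trace = E||X_t||^2

thetas = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
lams = np.array([np.linalg.eigvalsh((I2 + Theta @ Theta.T)
                                    + Theta * np.exp(-1j * t)
                                    + Theta.T * np.exp(1j * t))
                 for t in thetas])             # dynamic eigenvalues, ascending per row

total = lams.sum(axis=1).mean()                # (1/2pi) int trace Sigma(theta) d theta
assert abs(total - np.trace(Gamma0)) < 1e-10   # total variance recovered

share1 = lams[:, -1].mean() / total            # share explained by the 1st dynamic PC
assert 0.5 < share1 < 1.0                      # lambda_1 >= lambda_2 forces share >= 1/2
```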
Real data example (pollution data)
[Figure] Figure 4: 10 subsequent observations (left panel), the corresponding static KL expansion with one term (middle panel), and the dynamic KL expansion with one term (right panel); each panel plots sqrt(PM10) minus its mean against intraday time.
Estimation of Functional Dynamic Principal Components
In practice, covariance and spectral density operators are notknown.
Theorem (Consistency)
Let Ŷkt be the random variable defined by

Ŷkt := ∑_{ℓ=−L}^{L} 〈Xt−ℓ, ψ̂kℓ〉, k = 1, . . . , p and t = L + 1, . . . , n − L,

where the ψ̂kℓ's are estimated from the empirical covariance operator. Then (under appropriate technical assumptions), we have, for L = L(n) → ∞, that Ŷkt − Ykt → 0 in probability as n → ∞.
Implementation in a nutshell
(i) In practice, a function x(u) is observed on a grid 0 ≤ u1 < u2 < . . . < ur ≤ 1, and converted into a functional observation within the space spanned by a finite number of basis functions v1, . . . , vd as

x(u) = Z1 v1(u) + . . . + Zd vd(u) = v′(u)Z.

Commonly, Fourier bases, B-splines or wavelets are used. The coefficients Zk can be obtained, for example, via least squares fitting or some penalized form thereof.
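A minimal sketch of step (i) (grid, basis and curve are illustrative choices of mine), fitting the coefficients Z by ordinary least squares against a small Fourier basis:

```python
import numpy as np

r, d = 100, 5
u = np.linspace(0.0, 1.0, r)                           # observation grid

# small Fourier basis v_1, ..., v_d evaluated on the grid
B = np.column_stack([np.ones(r),
                     np.sin(2 * np.pi * u), np.sin(4 * np.pi * u),
                     np.cos(2 * np.pi * u), np.cos(4 * np.pi * u)])

x = 1.0 + 0.5 * np.sin(2 * np.pi * u) - 0.3 * np.cos(4 * np.pi * u)  # observed curve
Z, *_ = np.linalg.lstsq(B, x, rcond=None)              # least squares coefficients

assert np.allclose(B @ Z, x, atol=1e-10)               # the curve lies in the basis span
assert abs(Z[1] - 0.5) < 1e-8 and abs(Z[4] + 0.3) < 1e-8
```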
Implementation in a nutshell
(ii) We are now working on a finite-dimensional space, where operators reduce to matrices acting on Z := (Z1, . . . , Zd)′.
In particular, the spectral density operator

F^X_θ := ∑_{ℓ=−∞}^{∞} Cov(Xt, Xt−ℓ) e^{−iℓθ}

reduces to the matrix

G^X_θ = (1/2π) (∑_{h∈Z} C^Z_h e^{−ihθ}) V′,

where C^Z_h := E[Zh Z′0] and V := (〈vi, vj〉 : 1 ≤ i, j ≤ d).
Implementation in a nutshell
(iii) Relation between the eigenfunctions/values of F^X_θ and the eigenvectors/values of G^X_θ:
Assume λm(θ) is the mth eigenvalue of G^X_θ, with eigenvector φm(θ). Then λm(θ) is also an eigenvalue of F^X_θ, with eigenfunction v′φm(θ).
From there, we obtain

ψmk = (v′/2π) ∫_{−π}^{π} φm(s) e^{−iks} ds =: v′ψ̲mk

(where ψ̲mk is the corresponding d-dimensional vector of Fourier coefficients), and

Ymt = ∑_{k∈Z} ∫₀¹ Z′t−k v(u) v′(u) ψ̲mk du = ∑_{k∈Z} Z′t−k V ψ̲mk.
Implementation in a nutshell
(iv) Of course, all quantities involved need to be estimated(consistency results have been established).
Details can be found in the JRSS paper.
Implementation in R: package freqdom.
Real data example
[Figure] Figure 5: The sequence of the first static FPCs (red) [73%] and the dynamic ones (black) [80%]; first score sequences plotted against days.
Simulation
Setting: functional AR(1) process; autoregressive kernel proportional to κ; length n = 250.
[Figure] Figure 6: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.5.
Simulation
[Figure] Figure 7: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.7.
Simulation
[Figure] Figure 8: Boxplots of RMSEstat(K) and RMSEdyn(K) for K = 1, 2, 3 principal components, based on 500 iterations, n = 250 and κ = 0.9.
Conclusion
Static PCA and FPCA (which are everyday practice) are inadequate in the context of time series: the KL decomposition does not involve basis processes that are mutually orthogonal at all leads and lags, and the optimality property is lost in non-i.i.d. cases.
Dynamic PCA and FPCA take into account serial dependence, and recover those properties.
Consistent empirical versions are computationally feasible.
Simulations and the real data example are quite encouraging.
Conclusion (continued)
Moreover,
Dynamic principal components are a basic ingredient of the General Dynamic Factor methods used (see Forni, Hallin, Lippi, and Reichlin/Zaffaroni 2000, 2005, . . . , 2015, 2017) in the analysis of large panels of time series data;
in ongoing research (Hallin, Hörmann, Nisol), functional dynamic principal component methods are used in a functional extension of General Dynamic Factor methods, allowing for the analysis/prediction of large panels containing both scalar and functional time series.