
Sparse seasonal and periodic vector autoregressive modeling ∗†‡

Changryong Baek, Sungkyunkwan University

Richard A. Davis, Columbia University

Vladas Pipiras, University of North Carolina

October 16, 2015

Abstract

Seasonal and periodic vector autoregressions are two common approaches to modeling vector time series exhibiting cyclical variations. The total number of parameters in these models increases rapidly with the dimension and order of the model, making it difficult to interpret the model and questioning the stability of the parameter estimates. To address these and other issues, two methodologies for sparse modeling are presented in this work: first, based on regularization involving adaptive lasso and, second, extending the approach of Davis, Zang and Zheng (2015) for vector autoregressions based on partial spectral coherences. The methods are shown to work well on simulated data, and to perform well on several examples of real vector time series exhibiting cyclical variations.

1 Introduction

In this work, we introduce methodologies for sparse modeling of stationary vector (q–dimensional) time series data exhibiting cyclical variations. Sparse models are gaining traction in the time series literature for reasons similar to those for which sparse (generalized) linear models are used in the traditional setting of i.i.d. errors. Such models are particularly suitable in a high-dimensional context, for which the number of parameters often grows as q^2 (as for example with the vector autoregressive models considered below) and becomes prohibitively large compared to the sample size even for moderate q. Sparse models also ensure better interpretability of the fitted models and numerical stability of the estimates, and tend to improve prediction.

In the vector time series context, sparse modeling has been considered for the class of vector autoregressive (VAR) models:

X_n - \mu = A_1 (X_{n-1} - \mu) + \cdots + A_p (X_{n-p} - \mu) + \varepsilon_n, \qquad n \in \mathbb{Z}, \qquad (1.1)

where X_n = (X_{1,n}, . . . , X_{q,n})' is a q–vector time series, A_1, . . . , A_p are q × q matrices, µ is the overall constant mean vector and ε_n are white noise (WN) error terms. Regularization approaches based on lasso and its variants were taken in Hsu, Hung and Chang (2008), Shojaie and Michailidis (2010), Song and Bickel (2011), Medeiros and Mendes (2012), Basu and Michailidis (2015), Nicholson, Matteson and Bien (2015), Kock and Callot (2015), with applications to economics, neuroscience (e.g. functional connectivity among brain regions), biology (e.g. reconstructing gene regulatory networks from time course data), and environmental science (e.g. pollutant levels over time).

∗ AMS subject classification. Primary: 62M10, 62H12. Secondary: 62H20.
† Keywords and phrases: seasonal vector autoregressive (SVAR) model, periodic vector autoregressive (PVAR) model, sparsity, partial spectral coherence (PSC), adaptive lasso, variable selection.
‡ The work of the first author was supported in part by the Basic Science Research Program from the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1006025). The third author was supported in part by NSA grant H98230-13-1-0220.


As usual, the model (1.1) will be abbreviated as

\Phi(B)(X_n - \mu) = \varepsilon_n, \qquad n \in \mathbb{Z}, \qquad (1.2)

where Φ(B) = 1 − A_1 B − . . . − A_p B^p and B is the backshift operator.

In a different approach, Davis et al. (2015) introduced an alternative 2–stage procedure for sparse VAR modeling. In the first stage, all pairs of component series are ranked based on the estimated values of their partial spectral coherences (PSCs), defined as

\sup_\lambda |PSC^X_{jk}(\lambda)|^2 := \sup_\lambda \frac{|g^X_{jk}(\lambda)|^2}{g^X_{jj}(\lambda)\, g^X_{kk}(\lambda)}, \qquad j, k = 1, \ldots, q, \; j \neq k, \qquad (1.3)

where g^X(\lambda) = f^X(\lambda)^{-1} with f^X being the spectral density matrix of X. Then, the order p and the top M pairs are found which minimize the BIC(p, M) value, and the coefficients of the matrices A_r are set to 0 for all pairs of indices j, k not included in M. In the second stage, the estimates of the remaining non-zero coefficients are ranked according to their t–statistic values. Again, the top m^* of the coefficients are selected that minimize a suitable BIC, and then the rest of the coefficients are set to 0. As shown in Davis et al. (2015), this 2–stage procedure outperforms regular lasso. The basic idea of this approach is that small PSCs do not increase the likelihood sufficiently to warrant the inclusion of the respective coefficients of the matrices A_r in the model.
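As a concrete illustration of the screening quantity in (1.3), the sketch below (our own illustration, not code from Davis et al. (2015)) computes the squared PSCs on a frequency grid from an estimated spectral density matrix and takes the maximum over the grid; the toy spectral density only stands in for any consistent estimator, such as a smoothed periodogram.

```python
import numpy as np

def sup_squared_psc(f_hat):
    """f_hat: (n_freq, q, q) array with the estimated spectral density matrix
    f_X(lambda) on a grid of frequencies.  Returns a (q, q) matrix whose (j, k)
    entry approximates sup_lambda |PSC_{jk}(lambda)|^2 as in (1.3)."""
    n_freq, q, _ = f_hat.shape
    sup_psc2 = np.zeros((q, q))
    for i in range(n_freq):
        g = np.linalg.inv(f_hat[i])              # g_X(lambda) = f_X(lambda)^{-1}
        d = np.real(np.diag(g))                  # g_jj(lambda)
        psc2 = np.abs(g) ** 2 / np.outer(d, d)   # |g_jk|^2 / (g_jj * g_kk)
        sup_psc2 = np.maximum(sup_psc2, psc2)
    np.fill_diagonal(sup_psc2, 0.0)              # only off-diagonal pairs are ranked
    return sup_psc2

# Toy usage: build a Hermitian positive definite "spectral density" and rank pairs.
rng = np.random.default_rng(0)
q, n_freq = 4, 64
A = rng.standard_normal((n_freq, q, q)) + 1j * rng.standard_normal((n_freq, q, q))
f_toy = A @ np.conj(np.transpose(A, (0, 2, 1))) + 5 * np.eye(q)
psc2 = sup_squared_psc(f_toy)
pairs = sorted(((j, k) for j in range(q) for k in range(j + 1, q)),
               key=lambda jk: -psc2[jk])         # strongest conditional dependence first
```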

We shall extend here the regularization approach based on lasso and the approach of Davis et al. (2015) based on PSCs to sparse modeling of vector time series data exhibiting cyclical variations. The motivation here is straightforward. Consider, for example, the benchmark flu trends and pollutants series studied through sparse VAR models by Davis et al. (2015), and others. Figure 1 depicts the plots of (the logs of) their two component series with the respective sample ACFs and PACFs. The cyclical nature of the series can clearly be seen from the figure. The same holds for other component series (not illustrated here).

Cyclical features of component series are commonly built into a larger vector model by using one of the following two approaches. A seasonal VAR model (SVAR(p, P) model, for short; not to be confused with the so-called structural VAR) is one possibility, defined as

\Phi(B)\, \Phi_s(B^s)\, (X_n - \mu) = \varepsilon_n, \qquad n \in \mathbb{Z}, \qquad (1.4)

where Φ(B) and ε_n are as in (1.2), Φ_s(B^s) = 1 − A_{s,1} B^s − . . . − A_{s,P} B^{Ps} with q × q matrices A_{s,1}, . . . , A_{s,P}, µ denotes the overall mean and s denotes the period. This is the vector version of the multiplicative seasonal AR model proposed by Box and Jenkins (1976). Another possibility is a periodic VAR model (PVAR(p) model, for short) defined as

\Phi_m(B)\, (X_n - \mu_m) = \varepsilon_{m,n}, \qquad n \in \mathbb{Z}, \qquad (1.5)

where Φ_m(B) = 1 − A_{m,1} B − . . . − A_{m,p} B^p with q × q matrices A_{m,1}, . . . , A_{m,p} which depend on the season m = 1, . . . , s wherein the time n falls (that is, there are in fact sp matrices A of dimension q × q), and µ_m refers to the seasonal mean. One could also allow p to depend on the season m = 1, . . . , s. Note that whereas the overall mean µ and the covariance matrix Eε_n ε'_n = Σ are constant in (1.4), the mean µ_m in (1.5) and the covariance matrix Eε_{m,n} ε'_{m,n} = Σ_m are allowed to depend on the season m.

Both seasonal and periodic VAR models are widely used. For SVAR models, including the univariate case, see Brockwell and Davis (2009), Ghysels and Osborn (2001). These models form the basis for the U.S. Census X-12-ARIMA seasonal adjustment program.


Figure 1: Top: Monthly flu trend in NC. Bottom: 23-hour ozone levels at a CA location. Respective sample ACFs and PACFs are given.

PVAR models, again with the focus on univariate series, are considered in the monographs by Ghysels and Osborn (2001), Franses and Paap (2004), Lutkepohl (2005), Hurd and Miamee (2007). These references barely scratch the surface. The vast amount of work should not be surprising: most economic, environmental and other time series naturally exhibit cyclical variations.

Sparse modeling is proposed below for both SVAR and PVAR models. These are two different classes of models. Both classes are considered here because of their central role in the analysis of time series with cyclical variations, and because the real time series of the flu and pollutants data discussed above are, in fact, better modeled by one of the two types of models. Indeed, for example, the 1–step-ahead mean square prediction errors for the flu data are 0.0807 (best AR model), 0.0605 (best seasonal AR model), and 0.9759 (best periodic AR model). For the pollutants series, the prediction errors are smaller for periodic AR models when long (at least 2–step-ahead) horizons are considered. In this work, we thus decide between SVAR and PVAR models just based on how well they fit the data and perform in prediction. For more systematic approaches to choosing between seasonal and periodic models, see e.g. Lund (2011) and references therein.

The regularization approach to SVAR and PVAR models is based on the adaptive lasso, and is somewhat standard. The regular lasso of Tibshirani (1996) is well known to overestimate the number of non-zero coefficients (e.g. Buhlmann and van de Geer (2013)). The adaptive lasso of Zou (2006) corrects this tendency by estimating fewer non-zero coefficients. While the application of the adaptive lasso to PVAR models is straightforward, a linearization and an iterative version of the adaptive lasso are used for SVAR models, which are nonlinear by their construction.

Our extension of the Davis et al. (2015) approach based on PSCs to sparse SVAR and PVAR models involves the (qs)–vector series

Y_t = \begin{pmatrix} X_{(t-1)s+1} \\ X_{(t-1)s+2} \\ \vdots \\ X_{(t-1)s+s} \end{pmatrix},

where s is the period as above and t now refers to a cycle. For the PVAR model, the series {Y_t} is now (second-order) stationary. The Davis et al. (2015) approach can then be applied, though not directly since a VAR model for {Y_t} is too complex for our purposes. For SVAR models, it is natural to estimate first a sparse seasonal filter Φ_s(B^s) by considering the between-period (between-cycle) series X_{(t-1)s+m} as a series in t, for fixed season m = 1, . . . , s. A non-trivial and new issue is how to deal with the results across seasons m = 1, . . . , s. Once a sparse seasonal filter Φ_s(B^s) is estimated, its seasonal effect on X_n can be removed, and then the between-season filter Φ(B) can be estimated sparsely by following the approach of Davis et al. (2015).

In the simulations and data applications presented below, the adaptive lasso and the PSC approaches are found to perform similarly. But the latter approach provides great computational advantages, especially as the dimension increases. In the data applications, both of these sparse modeling approaches outperform non-seasonal (non-periodic) VAR alternatives, as well as non-sparse seasonal (periodic) models.

The rest of the paper is organized as follows. Preliminaries on vector time series, partial spectral coherences, SVAR and PVAR models can be found in Section 2. Our approaches to fit sparse SVAR and PVAR models are presented in Section 3. Finite sample properties of the proposed methods are studied in Section 4. An application to two real data sets is given in Section 5. Conclusions are in Section 6. Finally, Appendices A and B contain details on several estimation methods employed for our sparse SVAR and PVAR modeling.

2 Preliminaries

We focus throughout on q–vector time series models X_n = (X_{1,n}, . . . , X_{q,n})', n ∈ Z, with component univariate series {X_{j,n}}_{n∈Z}, j = 1, . . . , q. The prime above indicates transpose. If the series X = {X_n}_{n∈Z} is second-order stationary, it has constant mean EX_n =: µ_X and the autocovariance function Cov(X_n, X_{n+h}) = EX_n X'_{n+h} − EX_n EX'_{n+h} =: γ_X(h), h ∈ Z, does not depend on n. The spectral density, if it exists, is a complex- and matrix-valued function f_X(λ), λ ∈ (−π, π], characterized by

\int_{-\pi}^{\pi} e^{ih\lambda} f_X(\lambda)\, d\lambda = \gamma_X(h), \qquad h \in \mathbb{Z}.

For more information, see Hannan (1970), Lutkepohl (2005), Brockwell and Davis (2009).

As described in Section 1, in the approach of Davis et al. (2015), the quantity of interest in the initial step is the partial spectral coherence (PSC) between two component series X_{j,n} and X_{k,n}, defined as (see (1.3)):

PSC^X_{jk}(\lambda) = - \frac{g^X_{jk}(\lambda)}{\sqrt{g^X_{jj}(\lambda)\, g^X_{kk}(\lambda)}}, \qquad \lambda \in (-\pi, \pi], \qquad (2.1)

where g^X(\lambda) = (g^X_{jk}(\lambda))_{j,k=1,\ldots,q} satisfies g^X(\lambda) = f^X(\lambda)^{-1}, supposing the latter exists. PSCs are related to pairwise conditional correlations of the series X. Denote by X_{-jk,n} the (q − 2)–vector series obtained from X_n by removing its jth and kth components X_{j,n} and X_{k,n}, respectively. Set

\{D^{opt}_{j,m} \in \mathbb{R}^{q-2},\, m \in \mathbb{Z}\} = \underset{D_{j,m},\, m \in \mathbb{Z}}{\operatorname{argmin}}\; E\Big(X_{j,n} - \sum_{m=-\infty}^{\infty} D_{j,m} X_{-jk,n-m}\Big)^2 \qquad (2.2)


and consider the residual series

\varepsilon_{j,n} = X_{j,n} - \sum_{m=-\infty}^{\infty} D^{opt}_{j,m} X_{-jk,n-m}.

Define similarly the residual series {ε_{k,n}}. The conditional correlation between {X_{j,n}} and {X_{k,n}} is characterized by the correlation between the two residual series {ε_{j,n}} and {ε_{k,n}}. The component series {X_{j,n}} and {X_{k,n}} are called conditionally uncorrelated when Cov(ε_{j,n+m}, ε_{k,n}) = 0 for any lag m ∈ Z. It can be shown (Davis et al. (2015)) that

\{X_{j,n}\} \text{ and } \{X_{k,n}\}, \; j \neq k, \text{ are conditionally uncorrelated} \qquad (2.3)

if and only if PSC^X_{j,k}(λ) = 0 for all λ ∈ (−π, π].

In a sparse modeling of the VAR time series (1.1), Davis et al. (2015) set

A_r(j, k) = A_r(k, j) = 0, \qquad r = 1, \ldots, p, \qquad (2.4)

with A_r(j, k) denoting the entries of the matrix A_r, whenever

PSC^X_{j,k}(\lambda) = 0, \quad \text{for all } \lambda \in (-\pi, \pi]. \qquad (2.5)

From a practical perspective, the rule (2.4) is used when the corresponding PSCs in (2.5) are small. Strictly speaking, the relation between (2.4) and (2.5) is not true in either direction. But using (2.5) (or when the PSCs are small in practice) to select a sparse model according to (2.4) seems to be working very well in practice. As suggested in Davis et al. (2015) (see Section 5), this is because "if the PSCs are near zero, the corresponding AR coefficients do not increase the likelihood sufficiently to merit their inclusion in the model based on BIC."

A seasonal VAR model is defined as (see (1.4))

(1 - A_1 B - \cdots - A_p B^p)(1 - A_{s,1} B^s - \cdots - A_{s,P} B^{sP})\, X_n = \varepsilon_n, \qquad n \in \mathbb{Z}, \qquad (2.6)

where s is the period, A_1, . . . , A_p, A_{s,1}, . . . , A_{s,P} are q × q matrices, {ε_n} is a white noise series with Eε_n ε'_n = Σ and we assume for simplicity that the overall mean µ_X = EX_n = 0. The model (2.6) will be abbreviated as (1.4) and denoted SVAR(p, P)_s. We suppose that it is stationary causal, that is, det(I_q − \sum_{r=1}^{p} A_r z^r) ≠ 0 and det(I_q − \sum_{R=1}^{P} A_{s,R} z^R) ≠ 0 for z ∈ C, |z| ≤ 1. The elements of the matrices A_r and A_{s,R} are denoted as A_r(j, k) and A_{s,R}(j, k), j, k = 1, . . . , q, respectively.

A periodic VAR model is defined as (see (1.5))

(1 - A_{m,1} B - \cdots - A_{m,p_m} B^{p_m})(X_n - \mu_m) = \varepsilon_{m,n}, \qquad n \in \mathbb{Z}, \qquad (2.7)

where A_{m,r}, r = 1, . . . , p_m, m = 1, . . . , s, are q × q matrices, s is the period, n = m + ts for some cycle t ∈ Z so that m denotes which season from 1, 2, . . . , s the time instance n belongs to, {ε_{m,n}}_{n∈Z}, m = 1, . . . , s, are uncorrelated white noise series with Eε_{m,n} ε'_{m,n} = Σ_m, and µ_m, m = 1, . . . , s, denote the seasonal means. The model (2.7) will be abbreviated as (1.5) and denoted PVAR(p_1, . . . , p_s) (or PVAR(p)_s, when p_1 = . . . = p_s = p). Note that a PVAR model is not stationary. But as already indicated in Section 1, the (qs)–vector series

Y_t = \begin{pmatrix} X_{(t-1)s+1} \\ X_{(t-1)s+2} \\ \vdots \\ X_{ts} \end{pmatrix}, \qquad t \in \mathbb{Z}, \qquad (2.8)

is (second-order) stationary. The elements of the matrices A_{m,r} are denoted as A_{m,r}(j, k), j, k = 1, . . . , q.
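As a small illustration of the reshaping in (2.8), the utility below (our own sketch, not part of the paper) stacks a q–vector series observed over T full cycles of length s into the (qs)–vector series {Y_t}; it assumes the data are stored with rows as time points.

```python
import numpy as np

def stack_cycles(X, s):
    """X: array of shape (N, q) with N = T*s observations of a q-vector series.
    Returns Y of shape (T, q*s), where row t is vec(X_{(t-1)s+1}, ..., X_{ts})."""
    N, q = X.shape
    if N % s != 0:
        raise ValueError("sample size must be an integer number of cycles")
    T = N // s
    return X.reshape(T, s * q)   # consecutive seasons within a cycle are stacked

# Example with q = 2, s = 4, T = 3:
X = np.arange(24).reshape(12, 2)
Y = stack_cycles(X, s=4)         # Y.shape == (3, 8)
```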

We shall not provide any preliminaries regarding lasso and its variants. The interested reader can consult any of the available textbooks on the subject, including Hastie, Tibshirani and Friedman (2013), Buhlmann and van de Geer (2013), Giraud (2014).

3 Fitting sparse SVAR and PVAR models

In this section, we propose methods to fit sparse SVAR and PVAR models by using PSCs or regularization. The suggested approaches based on PSCs extend the 2–stage method for sparse VAR models considered by Davis et al. (2015). The regularization approaches also extend those previously used for sparse VAR models. For notational convenience, we assume throughout this section that the observed q–vector time series is X_1, . . . , X_N with the sample size

N = Ts, \qquad (3.1)

that is, the sample consists of T full cycles of length s, where s is the period. Each cycle has s seasons, denoted m = 1, . . . , s.

3.1 Sparse SVAR models

3.1.1 Two-stage approach based on PSCs

The idea to fit sparse SVAR models based on PSCs is straightforward. The seasonal filter Φ_s(B^s) in (1.4) (or (2.6)) can be thought to account for the between-cycle dependence structure, that is, the dependence structure in the series

Y^{(m)}_t = X_{(t-1)s+m}, \qquad t = 1, \ldots, T, \qquad (3.2)

where m = 1, . . . , s is a fixed season. The sparse filter Φ_s(B^s) can then be estimated by following the approach of Davis et al. (2015) applied to the series (3.2). One new issue arising here is how to combine the information across different seasons m. Once the sparse seasonal filter is estimated, its seasonal effect on X_n can be removed, and then the between-season filter Φ(B) can be estimated sparsely by following again the approach of Davis et al. (2015). We next give the details of the described method.

First, for each season m = 1, . . . , s, calculate the PSCs for the \binom{q}{2} pairs of component series of Y^{(m)}_t in (3.2). Denote the rank of the PSC as R^{(m)}(j, k) for the pair (j, k). We then define the rank of the seasonal PSC for the pair (j, k) as

r(j, k) = \sum_{m=1}^{s} R^{(m)}(j, k). \qquad (3.3)

We will use the r(j, k)'s to rank the conditional correlations in the between-cycle time series Y^{(m)}_t in (3.2), that is, the top seasonal ranks will indicate the strongest conditional between-cycle correlations. Somewhat surprisingly perhaps, we also investigated the possibility of considering the average PSCs across the seasons m = 1, . . . , s and then ranking them, but this approach did not lead to satisfactory results in practice.
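A minimal sketch of the rank aggregation in (3.3); the hypothetical input `sup_psc2_by_season` is an (s, q, q) array holding the season-wise sup squared PSCs of the between-cycle series (3.2), computed for instance as in the sketch following (1.3).

```python
import numpy as np
from scipy.stats import rankdata

def seasonal_psc_ranks(sup_psc2_by_season):
    """sup_psc2_by_season: array (s, q, q); entry (m, j, k) is the sup squared PSC
    of the pair (j, k) for the between-cycle series of season m.
    Returns r of shape (q, q): r[j, k] = sum over seasons of the within-season rank,
    where rank 1 goes to the largest PSC (so small r means strong dependence)."""
    s, q, _ = sup_psc2_by_season.shape
    iu = np.triu_indices(q, k=1)                    # the q(q-1)/2 unordered pairs
    r = np.zeros((q, q))
    for m in range(s):
        vals = sup_psc2_by_season[m][iu]
        ranks = rankdata(-vals, method="ordinal")   # largest PSC -> rank 1
        r[iu] += ranks
    r += r.T                                        # symmetric, for convenience
    return r
```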

Second, having the seasonal ranks of conditional correlations, we proceed as in Davis et al. (2015). A sparse SVAR filter Φ_s(B^s) is fitted as follows. For the M pairs (j, k) with the top M seasonal ranks, the coefficients A_{s,R}(j, k) and A_{s,R}(k, j) of the matrices A_{s,R}, R = 1, . . . , P, can be non-zero. The (j, k) and (k, j) coefficients are set to zero for all other \binom{q}{2} − M pairs (j, k) with smaller ranks. The order P and the top M pairs are chosen by minimizing the following BIC over a prespecified range (P, M) ∈ P × M,

BIC_S(P, M) = \sum_{m=1}^{s} \Big\{ -2 \log L(\hat{A}^{(m)}_{s,1}, \ldots, \hat{A}^{(m)}_{s,P}) \Big\} + (q + 2M) P \log N,

where L(\hat{A}^{(m)}_{s,1}, \ldots, \hat{A}^{(m)}_{s,P}) is the Gaussian maximum likelihood of the VAR(P) model based on the series Y^{(m)}_t in (3.2), assuming the model is sparse (with 2M non-zero off-diagonal elements in the matrices A^{(m)}_{s,R}), and (q + 2M)P is the number of non-zero coefficients.

With the sparse seasonal VAR model filter Φ_s(B^s) chosen, consider the "deseasonalized" series

Z_n = (1 - \bar{A}_{s,1} B^s - \cdots - \bar{A}_{s,P} B^{Ps})\, X_n, \qquad (3.4)

where \bar{A}_{s,R} = s^{-1} \sum_{m=1}^{s} \hat{A}^{(m)}_{s,R}, R = 1, . . . , P, are the average estimators of A_{s,R} across the seasons m = 1, . . . , s. The matrices \bar{A}_{s,R} have only 2M non-zero off-diagonal elements. The rest of the method is essentially the Davis et al. (2015) approach applied to the series {Z_n} in (3.4).
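The deseasonalizing step (3.4) amounts to a simple seasonal filter; the sketch below (ours, with hypothetical names) assumes the averaged estimates \bar{A}_{s,R}, R = 1, . . . , P, are already available as the list `A_bar`.

```python
import numpy as np

def deseasonalize(X, A_bar, s):
    """X: (N, q) observations; A_bar: list of P arrays of shape (q, q) with the
    averaged seasonal coefficient estimates; s: period.
    Returns Z_n = X_n - sum_R A_bar[R-1] X_{n - R*s}, defined for n > P*s."""
    N, q = X.shape
    P = len(A_bar)
    Z = X[P * s:].copy()
    for R, A in enumerate(A_bar, start=1):
        Z -= X[P * s - R * s: N - R * s] @ A.T   # row form: (A X_{n-Rs})' = X_{n-Rs}' A'
    return Z
```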

Third, in what is the first stage of the Davis et al. (2015) method, calculate and rank the \binom{q}{2} PSCs of the series {Z_n} in (3.4). Select the order p and the top m pairs (with the top m PSCs) to minimize the following BIC over (p, m) ∈ P × M:

BIC(p, m) = -2 \log L(\hat{A}_1, \ldots, \hat{A}_p) + (q + 2m)\, p \log(N - Ps),

where L(\hat{A}_1, \ldots, \hat{A}_p) is the Gaussian maximum likelihood of the VAR(p) model for the series Z_n in (3.4) assuming the model is sparse (with 2m non-zero off-diagonal elements in the matrices A_r), and (q + 2m)p is the number of non-zero coefficients.

Fourth and finally, we adapt the second stage of the Davis et al. (2015) approach as follows. We reestimate the sparse matrices A_1, . . . , A_p and A_{s,1}, . . . , A_{s,P} by using the estimated GLS procedure described in Appendix A. The t–statistic for a non-zero SVAR coefficient is

t(\hat{\gamma}_C(i)) = \frac{\hat{\gamma}_C(i)}{s.e.(\hat{\gamma}_C(i))}, \qquad (3.5)

where γ = Rα with the constraints matrix R (imposing sparsity), α := vec(A_1, . . . , A_p, A_{s,1}, . . . , A_{s,P}) is the (q^2(p + P)) × 1 vector, and \hat{\gamma}_C is the estimated GLS estimator for the constrained SVAR. By ranking the absolute values of the t–statistics |t(\hat{\gamma}_C(i))| from the highest to the lowest, we finally select the top r^* non-zero coefficients from the SVAR by finding the smallest BIC value given by

BIC_C(r) = -2 \log L_C(\hat{\alpha}_C) + r \log N,

where L_C(\hat{\alpha}_C) is the likelihood evaluated with the estimated GLS estimator \hat{\alpha}_C obtained by selecting the top r non-zero coefficients from the ranked t–statistics. In fact, to give a more balanced treatment to the non-seasonal and seasonal parts of the model, it is natural to consider the ranking of the t–statistics separately for the non-seasonal and seasonal coefficients, and to take the top r successively from the two lists of the coefficients. We found this ranking to perform better and use it throughout the paper.
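The refinement stage can be summarized generically as follows; `refit` and `neg2loglik` are hypothetical placeholders for the constrained GLS re-estimation of Appendix A and the corresponding Gaussian likelihood evaluation, so this is only a sketch of the ranking-and-BIC loop, not the authors' implementation.

```python
import numpy as np

def bic_refine(t_stats, refit, neg2loglik, N):
    """t_stats: dict mapping a coefficient index to its |t|-statistic from the
    constrained GLS fit.  refit(support) re-estimates the model keeping only the
    coefficients in `support`; neg2loglik(fit) returns -2 log L_C for that fit.
    Returns the support minimizing BIC_C(r) = -2 log L_C + r log N."""
    order = sorted(t_stats, key=lambda i: -abs(t_stats[i]))   # largest |t| first
    best_bic, best_support = np.inf, ()
    for r in range(1, len(order) + 1):
        support = tuple(order[:r])
        bic = neg2loglik(refit(support)) + r * np.log(N)
        if bic < best_bic:
            best_bic, best_support = bic, support
    return best_support
```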


3.1.2 Adaptive lasso approach

The adaptive lasso procedure of Zou (2006) applies to linear models. In order to apply it to the nonlinear SVAR models, we linearize the SVAR model (2.6) around A_r = \tilde{A}_r and A_{s,R} = \tilde{A}_{s,R} as in (A.5) of Appendix A:

Y = X\gamma + \varepsilon, \qquad (3.6)

where γ = vec(A_1, . . . , A_p, A_{s,1}, . . . , A_{s,P}) = (γ_1, . . . , γ_{(p+P)q^2}), Y = vec(Y_1, . . . , Y_N) with Y_n given by (A.4), and X is the design matrix determined by (A.3). For fixed \tilde{A}_r and \tilde{A}_{s,R}, the adaptive lasso solution for SVAR(p, P) in the linearized form (3.6) is given by

\underset{\gamma}{\operatorname{argmin}} \left\{ \frac{1}{N} \| Y - X\gamma \|^2 + \lambda_\ell \sum_{j=1}^{(p+P)q^2} w^{(\ell)}_j |\gamma_j| \right\}. \qquad (3.7)

The estimation (3.7) is applied iteratively in ℓ, where \tilde{A}_r = \hat{A}_r(\ell), \tilde{A}_{s,R} = \hat{A}_{s,R}(\ell) are the estimates of A_r and A_{s,R} from the previous step ℓ − 1, and the weights w^{(\ell)}_j are

w^{(\ell)}_j = \frac{1}{|\hat{\gamma}^{(\ell-1)}_j|}

with the estimator \hat{\gamma}^{(\ell-1)} of γ from the previous stage ℓ − 1. One additional consideration in applying the adaptive lasso for SVAR is that the components of the innovations ε_n in the SVAR model are not identically distributed. To incorporate possible correlations in ε_n, we modify the adaptive lasso as

\hat{\gamma}^{(\ell)}_{al} = \underset{\gamma}{\operatorname{argmin}} \left\{ \frac{1}{N} \big\| (I_N \otimes \hat{\Sigma}^{-1/2}_{(\ell)}) Y - (I_N \otimes \hat{\Sigma}^{-1/2}_{(\ell)}) X\gamma \big\|^2 + \lambda_\ell \sum_{j=1}^{(p+P)q^2} w^{(\ell)}_j |\gamma_j| \right\}, \qquad (3.8)

where \hat{\Sigma}_{(\ell)} is the estimated covariance matrix of the innovations ε_n from the previous step ℓ − 1. The 10–fold cross validation is used to select the tuning penalty parameter λ_ℓ. Finally, the iterative procedure is stopped when

\| \hat{\Sigma}_{(\ell+1)} - \hat{\Sigma}_{(\ell)} \| \leq \epsilon

for some predetermined tolerance ε. We have used the original lasso estimator with the 10–fold cross validation rule to select the initial estimator \hat{\gamma}^{(0)}. The order of the SVAR model is chosen by selecting

(\hat{p}, \hat{P}) = \underset{(p, P) \in \mathcal{P} \times \mathcal{P}}{\operatorname{argmin}}\; CV(p, P),

where CV(p, P) is the average 10–fold cross-validation error.
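One iteration of the weighted criterion (3.8) can be implemented with an off-the-shelf lasso solver using the standard column-rescaling trick for adaptive weights; the sketch below (our own, with hypothetical argument names) whitens the stacked response and design by a square root of \hat{\Sigma}_{(\ell)}^{-1} and is meant only to illustrate the computation, not to reproduce the authors' code. It is practical only for moderate N, since the whitening matrix is formed explicitly.

```python
import numpy as np
from numpy.linalg import cholesky, inv
from sklearn.linear_model import Lasso

def adaptive_lasso_step(Y, X, Sigma, gamma_prev, lam, eps=1e-8):
    """One iteration of (3.8).  Y: (N*q,) stacked response; X: (N*q, d) design from
    the linearized SVAR; Sigma: (q, q) current innovation covariance estimate;
    gamma_prev: (d,) previous-step coefficients used to form the weights."""
    N = Y.shape[0] // Sigma.shape[0]
    W = inv(cholesky(Sigma))                 # L^{-1} with L L' = Sigma, so ||W e||^2 = e' Sigma^{-1} e
    Wfull = np.kron(np.eye(N), W)            # blockwise whitening, I_N kron W
    Yw, Xw = Wfull @ Y, Wfull @ X            # whitened response and design
    w = 1.0 / (np.abs(gamma_prev) + eps)     # adaptive weights 1/|gamma_j^{(l-1)}|
    Xs = Xw / w                              # rescale columns so a plain lasso
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(Xs, Yw)
    return fit.coef_ / w                     # undo the rescaling to recover gamma
```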

3.2 Sparse PVAR models

3.2.1 Two-stage approach based on PSCs

Recall from Section 2 that PVAR models can be cast into the framework of stationary time series by considering the (qs)–vector time series {Y_t} in (2.8). Moreover, the series {Y_t} has a VAR representation (see e.g. Franses and Paap (2004), p. 105, in the case of quarterly data). The VAR representation, however, is a nonlinear and involved transformation of the PVAR coefficients, making a direct application of the Davis et al. (2015) approach not plausible. For the reader less familiar with the subject, the VAR representation of the series {Y_t} is more complex because, for example, the component of Y_t in season m ≥ 2 is regressed on the component of Y_t in season m − 1 for the same cycle t.

(m,j) \ (l,k)   (1,1)        (1,2)        (2,1)        (2,2)        (3,1)        (3,2)        (4,1)        (4,2)
(1,1)           ·            ·            ·            ·            ·            ·            A1,1(1,1)    A1,1(1,2)
(1,2)           ·            ·            ·            ·            ·            ·            A1,1(2,1)    A1,1(2,2)
(2,1)           A2,1(1,1)    A2,1(1,2)    ·            ·            ·            ·            ·            ·
(2,2)           A2,1(2,1)    A2,1(2,2)    ·            ·            ·            ·            ·            ·
(3,1)           ·            ·            A3,1(1,1)    A3,1(1,2)    ·            ·            ·            ·
(3,2)           ·            ·            A3,1(2,1)    A3,1(2,2)    ·            ·            ·            ·
(4,1)           ·            ·            ·            ·            A4,1(1,1)    A4,1(1,2)    ·            ·
(4,2)           ·            ·            ·            ·            A4,1(2,1)    A4,1(2,2)    ·            ·

Table 1: Corresponding PVAR(1) coefficients in the case of period s = 4 and dimension q = 2. First column: the indices of the response variable. First row: the indices of the regressor variable.

To fit a sparse PVAR model, we can nevertheless proceed straightforwardly by following the same principle as in Davis et al. (2015), described around (2.4)–(2.5). That is, if the PSC between two component series of {Y_t} is zero (or small), we would set the corresponding coefficient in the PVAR representation between the two components to zero, even if it accounts for the regression of one on the other in the same (or any) cycle t.

Before expressing this rule in the general case, we illustrate it through an example. Consider a PVAR(1) model with period s = 4 and dimension q = 2. Index the components of Y_t by (1,1), (1,2), (2,1), (2,2), (3,1), (3,2), (4,1) and (4,2), where the first index refers to the season m = 1, 2, 3, 4 and the second to the dimension j = 1, 2. Table 1 presents the coefficients of the matrices A_{m,1}, m = 1, 2, 3, 4, in the PVAR(1) model between two component series in the above indexing, where the indices of the response and regressors are given, respectively, in the first column and the first row. For example, in this PVAR(1) model, the component series Y_{(2,1),t} in season 2 is regressed on the component series Y_{(1,1),t} and Y_{(1,2),t} in season 1 of the same cycle t, with the respective coefficients A_{2,1}(1,1) and A_{2,1}(1,2). Similarly, Y_{(1,1),t} in season 1 is regressed on Y_{(4,1),t−1} and Y_{(4,2),t−1} in season 4 of the previous cycle t − 1, with the respective coefficients A_{1,1}(1,1) and A_{1,1}(1,2).

Then, for example, if the PSC between the component series {Y_{(2,1),t}} and {Y_{(1,1),t}} is small, we would set the coefficient A_{2,1}(1,1) to zero, even if the regression is in the same cycle t. Likewise, for example, if the PSC between {Y_{(1,1),t}} and {Y_{(4,2),t}} is small, we would set the coefficient A_{1,1}(1,2) to zero.

In general, we index the components of the vector Y_t through (m, j), where m = 1, . . . , s refers to the season and j = 1, . . . , q to the dimension of the q–vector time series X_n. That is,

Y_t = vec(X_{(t-1)s+1}, X_{(t-1)s+2}, \ldots, X_{(t-1)s+s}) = (Y_{(1,1),t}, \ldots, Y_{(1,q),t}, Y_{(2,1),t}, \ldots, Y_{(2,q),t}, \ldots, Y_{(s,1),t}, \ldots, Y_{(s,q),t})'.

The periodic PSC between the component series {Y_{(m,j)}} and {Y_{(l,k)}} is

\sup_\lambda \left| PSC^Y_{(m,j),(l,k)}(\lambda) \right|^2, \qquad m, l = 1, \ldots, s, \; j, k = 1, \ldots, q, \; (m, j) \neq (l, k).

The pairs of the component series are ranked according to their PSCs.

At the first stage, as in the VAR and SVAR models, we always take the "diagonal" coefficients of the PVAR model as non-zero. These are the coefficients

A_{m,r}(j, j) \neq 0, \qquad r = s, 2s, \ldots,

that is, the coefficients in the PVAR regression when Y_{(m,j),t} is regressed on its values Y_{(m,j),t−1}, Y_{(m,j),t−2}, . . . in the previous cycles t − 1, t − 2, . . .. If the pair (m, j) and (l, k), with m ≥ l, is included in the PVAR model based on the rank of their PSC, the following coefficients are then naturally set to non-zero: if m > l,

A_{m,r}(j, k) \neq 0, \qquad r = m - l, \; (m - l) + s, \; (m - l) + 2s, \ldots,

and, if m = l,

A_{m,r}(j, k) \neq 0, \qquad r = s, 2s, \ldots.
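The bookkeeping behind this selection rule can be sketched as follows, under the reading that the admissible lags for a selected pair step by the period s; `selected_pairs` is a hypothetical list of pairs ((m, j), (l, k)) with m ≥ l chosen by their PSC ranks (all indices 1-based).

```python
import numpy as np

def pvar_nonzero_mask(selected_pairs, s, q, p):
    """Boolean mask of shape (s, p, q, q); entry [m-1, r-1, j-1, k-1] is True when
    A_{m,r}(j, k) is allowed to be non-zero."""
    mask = np.zeros((s, p, q, q), dtype=bool)
    for m in range(1, s + 1):                    # "diagonal" terms: own previous cycles
        for j in range(1, q + 1):
            for r in range(s, p + 1, s):
                mask[m - 1, r - 1, j - 1, j - 1] = True
    for (m, j), (l, k) in selected_pairs:
        start = (m - l) if m > l else s          # first admissible lag for the pair
        for r in range(start, p + 1, s):         # subsequent lags step by the period s
            mask[m - 1, r - 1, j - 1, k - 1] = True
    return mask
```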

For a possible range of (p, m) ∈ P × M, the order p and the number m of non-zero coefficients in the PVAR model are selected by minimizing the BIC value

BIC(p, m) = -2 \log L_P(\hat{A}_{1,1}, \ldots, \hat{A}_{s,p}) + m \log N, \qquad (3.9)

where L_P is the likelihood computed with the GLS estimates \hat{A}_{1,1}, \ldots, \hat{A}_{s,p} for the constrained PVAR model. The calculation of the GLS estimators is detailed in Appendix B.

The refinement step can also be applied similarly to PVAR. By ranking the absolute values of the t–statistics for the non-zero PVAR coefficients, the top r^* non-zero coefficients are finally selected by the BIC criterion, as in Davis et al. (2015).

3.2.2 Adaptive lasso approach

An adaptive lasso procedure for PVAR is straightforward: it is applied to each season separately. Let X^*_n = X_n − µ_m be the seasonally centered observations and set X^* := (X^*_1, . . . , X^*_N). Let also Q_m, m = 1, . . . , s, be the N × T matrix operator extracting the observations falling into the mth season from the indices 1, . . . , N. Then, the PVAR model for the mth seasonal component can be written as

Y_m = B_m U_m + Z_m, \qquad (3.10)

where Y_m = X^* Q_m, B_m = (A_{m,1}, . . . , A_{m,p}), U_m = U Q_m, Z_m = Z Q_m with U = (U_1, . . . , U_N), Z = (ε_1, . . . , ε_N) and U_t = vec(X^*_{t-1}, \ldots, X^*_{t-p_m}). Vectorizing (3.10) gives the linear model for PVAR:

y_m = (U'_m \otimes I_q)\, \beta_m + z_m

with y_m = vec(Y_m), β_m = vec(B_m) and z_m = vec(Z_m).

Similarly to the case of SVAR, possible correlations of ε_{m,n} can be incorporated in the adaptive lasso by using the estimated covariance matrix \hat{\Sigma}_{(\ell)}. This leads to the adaptive lasso estimator

\hat{\beta}^{(\ell)}_m = \underset{\beta_m}{\operatorname{argmin}} \left\{ \frac{1}{T} \big\| (I_T \otimes \hat{\Sigma}^{-1/2}_{(\ell)})\, y_m - (U'_m \otimes \hat{\Sigma}^{-1/2}_{(\ell)})\, \beta_m \big\|^2 + \lambda_\ell \sum_{j=1}^{p_m q^2} w^{(\ell)}_j |\beta_{m,j}| \right\}, \qquad (3.11)

where β_{m,j} represents the jth component parameter of β_m and the weights are given by

w^{(\ell)}_j = \frac{1}{|\hat{\beta}^{(\ell-1)}_{m,j}|}.

The estimation (3.11) is iterated over ℓ. For the ℓth iteration, the covariance matrix is obtained as

\hat{\Sigma}_{(\ell)} = \frac{1}{T}\, (Y_m - \hat{B}^{(\ell-1)}_m U_m)(Y_m - \hat{B}^{(\ell-1)}_m U_m)',

where \hat{B}^{(\ell-1)}_m is the estimator from the previous step ℓ − 1. The 10–fold cross-validation rule is used to select the tuning penalty parameter λ_ℓ. The iterations are stopped when the covariance matrix is estimated within the specified margin of error. We used the original lasso estimator from the 10–fold cross validation rule as the initial estimator. The order of the PVAR model is selected by finding the orders that minimize the average 10–fold cross validation error, as in the SVAR model selection.
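As an illustration of the layout in (3.10), the sketch below (our own, with hypothetical names) builds the season-m response and regressor matrices from seasonally centered data; the coefficient matrix B_m could then be estimated by least squares or by the weighted lasso in (3.11).

```python
import numpy as np

def season_regression(X, s, m, p):
    """Build the season-m regression implied by (3.10).  X: (N, q) seasonally
    centered observations, rows are time points (0-based, so time n belongs to
    season (n % s) + 1); m: season in 1, ..., s; p: autoregressive order.
    Returns (Y_m, U_m): responses (T_m, q) and lagged regressors (T_m, p*q)."""
    N, q = X.shape
    idx = np.array([n for n in range(p, N) if n % s == m - 1])  # times in season m
    Y_m = X[idx]
    U_m = np.hstack([X[idx - r] for r in range(1, p + 1)])      # (X_{n-1}, ..., X_{n-p})
    return Y_m, U_m
```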

4 Finite sample properties

In this section, we study the finite sample properties of the proposed methods through a simulation study.

4.1 Sparse SVAR models

We examine here the performance of our proposed two-stage and adaptive lasso procedures for sparse SVAR models. Consider a 6–dimensional SVAR(1, 1)_{12} model with period s = 12 given by

(1 - A_1 B)(1 - A_{12,1} B^{12})\, X_n = \varepsilon_n,

where

A_1 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & .6 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & .7 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & .5 \\ 0 & 0 & 0 & 0 & -.3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad A_{12,1} = \begin{pmatrix} 0 & 0 & 0 & .8 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -.7 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ .6 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & .8 \end{pmatrix}. \qquad (4.1)

Note that the number of non-zero coefficients of A_1 and A_{12,1} is 8, so that 88.88% of the coefficients are set to zero in this model. To indicate the number of non-zero coefficients, we will also write the model as sparse SVAR(1,1;8)_{12}. A sequence of Gaussian i.i.d. innovations {ε_n} with zero mean and covariance matrix

\Sigma = \begin{pmatrix} \delta^2 & \delta/4 & \delta/6 & \delta/8 & \delta/10 & \delta/12 \\ \delta/4 & 1 & 0 & 0 & 0 & 0 \\ \delta/6 & 0 & 1 & 0 & 0 & 0 \\ \delta/8 & 0 & 0 & 1 & 0 & 0 \\ \delta/10 & 0 & 0 & 0 & 1 & 0 \\ \delta/12 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}

is considered with three different values of δ ∈ {1, 5, 10}. The order of the SVAR(p, P)_{12} model is searched within the pre-specified range p, P ∈ {0, 1, 2, 3}. The sample size is N = 80 × 12 = 960 (with T = 80) and all results are based on 500 replications.
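For concreteness (this is our own simulation sketch, not the authors' code), data from this model can be generated by expanding the multiplicative filter, X_n = A_1 X_{n-1} + A_{12,1} X_{n-12} − A_1 A_{12,1} X_{n-13} + ε_n, with the matrices of (4.1) and δ = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
q, s, T, burn = 6, 12, 80, 240
N = T * s                                      # N = 960

A1 = np.zeros((q, q)); A12 = np.zeros((q, q))
A1[1, 1], A1[2, 3], A1[3, 5], A1[4, 4] = .6, .7, .5, -.3        # non-zeros of A_1
A12[0, 3], A12[2, 4], A12[4, 0], A12[5, 5] = .8, -.7, .6, .8    # non-zeros of A_{12,1}

delta = 1.0
off = delta / np.array([4., 6., 8., 10., 12.])
Sigma = np.eye(q); Sigma[0, 0] = delta ** 2
Sigma[0, 1:], Sigma[1:, 0] = off, off

eps = rng.multivariate_normal(np.zeros(q), Sigma, size=N + burn)
X = np.zeros((N + burn, q))
for n in range(13, N + burn):
    X[n] = A1 @ X[n - 1] + A12 @ X[n - 12] - A1 @ A12 @ X[n - 13] + eps[n]
X = X[burn:]                                   # discard the burn-in, keep N observations
```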

We first evaluate the performance of our proposed two-stage approach based on PSCs and the adaptive lasso (A-LASSO) approach in Table 2 by considering the average values (over 500 replications) of the estimated orders p, P, the numbers of non-zero coefficients at the two stages, and the MSE. Observe from the table that our proposed methods find the correct order of the SVAR(p, P)_{12} model in all the cases considered. The true number of non-zero coefficients in stage 1 based on PSCs is 22, by including the diagonal and symmetric entries. It can also be seen that the two-stage approach based on PSCs finds non-zero coefficients reasonably well at both stages. For the adaptive lasso approach, first observe that it also estimates the orders p, P correctly.


                    p    P    # coeff 1   # coeff 2   Bias^2   Variance   MSE
δ = 1    PSC        1    1    22.520      8.012       0.006    0.152      0.158
         A-LASSO    1    1    --          11.392      0.009    0.154      0.163
δ = 5    PSC        1    1    20.428      6.496       0.004    0.722      0.726
         A-LASSO    1    1    --          9.71        0.049    0.681      0.730
δ = 10   PSC        1    1    20.584      6.056       0.005    0.730      0.735
         A-LASSO    1    1    --          6.25        0.074    0.642      0.716

Table 2: Estimated orders p and P with the number of non-zero coefficients at each stage, and MSE for sparse SVAR(1,1;8)_{12}.

Figure 2: The coefficients of stacked A_1 and A_{12,1} for the simulated sparse SVAR(1,1;8)_{12} (left), the relative frequency of non-zero coefficient selection (middle) and the estimated coefficients (right) for the two-stage approach based on PSCs, shown for δ = 1 and δ = 10.


Figure 3: The coefficients of stacked A_1 and A_{12,1} for the simulated sparse SVAR(1,1;8)_{12} (left), the relative frequency of non-zero coefficient selection (middle) and the estimated coefficients (right) for the adaptive lasso approach, shown for δ = 1 and δ = 10.

It performs quite similarly to the two-stage approach based on PSCs in terms of MSE, but it has slightly larger bias than the PSC approach and tends to find a larger number of non-zero coefficients. The sparse SVAR modeling based on both PSCs and the adaptive lasso works successfully in this simulation, but the PSC approach performs slightly better and is computationally less demanding than the adaptive lasso.

When the signal-to-noise ratio parameter δ is larger, our method tends to find a sparser model. This can be observed more clearly in both Figures 2 and 3.


                          h = 1    h = 2    h = 3    h = 4    h = 6    h = 8    h = 10   h = 12
sparse VAR                10.230   11.157   12.103   11.013   12.082   11.217   11.699   11.275
sparse PVAR (PSC)         53.227   62.488   53.507   55.354   46.170   43.552   49.275   49.832
SVAR                      6.389    7.376    7.827    7.543    7.774    7.398    7.761    7.558
sparse SVAR (PSC)         6.281    7.357    7.702    7.430    7.615    7.270    7.562    7.500
sparse SVAR (A-LASSO)     6.330    7.388    7.715    7.470    7.680    7.285    7.635    7.533

Table 3: MSPE(h) for sparse SVAR(1,1;8)_{12} and five different fitted models, namely, sparse VAR, sparse PVAR, SVAR and sparse SVAR based on the PSC and adaptive lasso approaches, when the innovations are generated with δ = 1.

The left panels of Figure 2 represent the true coefficients (4.1), where the top 6 × 6 matrix represents A_1 and the bottom 6 × 6 matrix is A_{12,1}, and where the size and the brightness of a circle are proportional to the absolute value of the average and the color corresponds to its sign. The middle panels show the proportion of (j, k) entries set to be non-zero out of 500 replications at stage 2, and the right panels are the averages of the estimated coefficients over the 500 replications. When δ = 1, the selected proportions are essentially 1 and the estimated coefficients are quite close to the true values. Even when δ is increased to 10, so that larger noise now obscures the underlying true dynamics, our method still finds the correct location of the non-zero coefficients together with good estimates, but tends to find a sparser model. For example, the diagonal term A_1(5, 5) and the term A_{12,1}(1, 4) are missed most of the time. For the latter, this might be because Var(ε_{1,n}) = 100 and Var(ε_{4,n}) = 1, so that the model will be almost the same assuming that A_{12,1}(1, 4) = 0.

We also compared forecasting errors among the following five models: a sparse VAR model (based on PSCs) not taking into account seasonal variations, a sparse PVAR model (based on PSCs), an unrestricted SVAR model and a sparse SVAR model with the proposed two-stage procedure and the adaptive lasso. The order of the sparse VAR(p) model is searched for p ∈ {0, 1, . . . , 8} as in Davis et al. (2015). The out-of-sample forecast performance is measured by computing the empirical h–step–ahead forecast mean squared prediction error (MSPE)

MSPE(h) = \frac{1}{N_r} \sum_{i=1}^{N_r} \big(X^{(i)}_{n+h} - \hat{X}^{(i)}_{n+h}\big)' \big(X^{(i)}_{n+h} - \hat{X}^{(i)}_{n+h}\big),

where \hat{X}_{n+h} is the best linear predictor based on {X_1, . . . , X_n}. Table 3 reports the MSPE of the sparse SVAR(1,1;8)_{12} model with δ = 1. Observe that the SVAR models outperform the sparse VAR and sparse PVAR models, as expected, in terms of forecasting. Note that our proposed sparse SVAR models and the SVAR model without coefficient constraints perform quite similarly, but sparse SVAR based on PSCs achieves the smallest MSPE for all lags h = 1, . . . , 12 (lags 5, 7, 9 and 11 are omitted for brevity). However, note that sparse SVAR uses only 8 non-zero coefficients on average with the PSC approach and 11 non-zero coefficients for the adaptive lasso, which constitute only 11% and 15%, respectively, of the SVAR(1,1)_{12} model coefficients. Sparse modeling makes the model interpretation much easier and improves forecasting performance.
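A small helper matching the MSPE definition above (a sketch; it assumes the realized and predicted q–vectors over the N_r replications are supplied as arrays):

```python
import numpy as np

def mspe(X_true, X_pred):
    """Empirical h-step MSPE: average over replications (rows) of the squared
    Euclidean distance between the realized and predicted q-vectors."""
    diff = np.asarray(X_true) - np.asarray(X_pred)      # shape (N_r, q)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```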

4.2 Sparse PVAR models

To evaluate the performance of the proposed methods on PVAR models, we consider a 3–dimensional PVAR(1)_4 model with period s = 4 given by

X_n = A_{m,1} X_{n-1} + \varepsilon_{m,n}, \qquad m = 1, \ldots, 4, \qquad (4.2)

where

A_{1,1} = \begin{pmatrix} 0 & .5 & .5 \\ 0 & 0 & .3 \\ 0 & .25 & .5 \end{pmatrix}, \quad A_{2,1} = \begin{pmatrix} .8 & 0 & 0 \\ .1 & 0 & 0 \\ 0 & 0 & -.3 \end{pmatrix}, \quad A_{3,1} = \begin{pmatrix} 0 & .6 & 0 \\ 0 & 0 & .2 \\ 0 & .1 & 0 \end{pmatrix}, \quad A_{4,1} = \begin{pmatrix} .7 & 0 & 0 \\ 0 & -.5 & 0 \\ 0 & -.2 & -.8 \end{pmatrix},

and the errors ε_{m,n} are i.i.d. Gaussian noise with zero mean and covariance matrix

\Sigma = \begin{pmatrix} \delta^2 & \delta/4 & \delta/6 \\ \delta/4 & 1 & 0 \\ \delta/6 & 0 & 1 \end{pmatrix}.

Note that the simulated model has an autoregressive order of one with 15 non-zero coefficients, and we write it as sparse PVAR(1;15)_4. The pre-specified order range is p ∈ {0, 1, 2, 3}, and three levels of δ ∈ {1, 5, 10} are considered. The sample size is N = 1,000 (and T = 250) and all results are based on 500 replications.

                    p    # coeff 1   # coeff 2   Bias^2   Variance   MSE
δ = 1    PSC        1    34.358      12.976      0.078    0.012      0.090
         A-LASSO    1    --          16.18       0.073    0.043      0.116
δ = 5    PSC        1    33.824      11.468      0.510    0.334      0.844
         A-LASSO    1    --          15.45       0.525    0.575      1.100
δ = 10   PSC        1    33.582      10.956      0.576    0.633      1.209
         A-LASSO    1    --          15.16       1.09     0.900      1.992

Table 4: Estimated order p, the number of non-zero coefficients at stage 1, the number of non-zero coefficients at stage 2 and MSE for sparse PVAR(1;15)_4.
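As with the SVAR experiment, the PVAR(1)_4 design above can be simulated directly (our own sketch, with δ = 1); the coefficient matrix applied at time n is chosen by the season in which n falls:

```python
import numpy as np

rng = np.random.default_rng(2)
q, s, N, burn = 3, 4, 1000, 100
A = [np.array([[0, .5, .5], [0, 0, .3], [0, .25, .5]]),      # A_{1,1}
     np.array([[.8, 0, 0], [.1, 0, 0], [0, 0, -.3]]),        # A_{2,1}
     np.array([[0, .6, 0], [0, 0, .2], [0, .1, 0]]),         # A_{3,1}
     np.array([[.7, 0, 0], [0, -.5, 0], [0, -.2, -.8]])]     # A_{4,1}
delta = 1.0
Sigma = np.array([[delta ** 2, delta / 4, delta / 6],
                  [delta / 4, 1, 0],
                  [delta / 6, 0, 1]])
eps = rng.multivariate_normal(np.zeros(q), Sigma, size=N + burn)
X = np.zeros((N + burn, q))
for n in range(1, N + burn):
    m = n % s                       # 0-based time n belongs to season (n % s) + 1
    X[n] = A[m] @ X[n - 1] + eps[n]
X = X[burn:]                        # burn-in is a multiple of s, so seasons stay aligned
```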

Table 4 presents the five measures analogous to those in Table 2. First, observe that our two-stage approach selects the true order of autoregression in all the cases considered. Regarding the estimated non-zero coefficients, stage 1 always includes the diagonal entries and the coefficients are set to be non-zero symmetrically, as explained in Section 3.2. Thus, the true number of such non-zero coefficients is 9 + 5 + 7 + 5 = 26, which is overestimated by the procedure of stage 1. However, the refinement stage 2 leads to a much smaller number of non-zero coefficients. The sparse PVAR modeling based on the adaptive lasso also successfully finds the true order, and the number of estimated non-zero coefficients is very close to the true one. However, it has more variability than the PSC approach, leading to larger MSE.

Figures 4 and 5 depict which non-zero coefficients are selected by our two-stage procedure and the adaptive lasso. Observe that the estimated non-zero coefficients are close to the true values. Observe from Table 4 that the MSE increases as δ gets larger. The effect of increasing δ is seen more clearly in Figure 4. In our simulated model, fewer non-zero coefficients are selected, while the estimated non-zero coefficients remain close to the true values.

Forecasting performance is also compared for sparse VAR (based on PSCs), sparse SVAR, unconstrained PVAR(1)_4 and sparse PVAR(1;15)_4 models in Table 5. The sparse PVAR(1;15)_4 model is used with δ = 1. Observe that our sparse PVAR models achieve the smallest MSPE in all the cases considered. The estimation based on PSCs performs slightly better than the adaptive lasso approach.


                          h = 1    h = 2    h = 3    h = 4
sparse VAR                3.508    5.019    4.223    4.460
sparse SVAR (PSC)         3.660    4.901    4.486    4.607
PVAR                      2.858    4.116    3.765    4.450
sparse PVAR (PSC)         2.792    4.127    3.758    4.441
sparse PVAR (A-LASSO)     2.797    4.133    3.757    4.448

Table 5: MSPE(h) for sparse PVAR(1;15)_4 and five different fitted models, namely, sparse VAR, sparse SVAR, PVAR and sparse PVARs based on PSCs and the adaptive lasso, when the innovations are generated with δ = 1.

Figure 4: The coefficients of stacked A_{1,1}, A_{2,1}, A_{3,1} and A_{4,1} for the simulated sparse PVAR(1;15)_4 (left), the relative frequency of non-zero coefficient selection (middle) and the corresponding estimated coefficients (right) for the two-stage PSC approach, shown for δ = 1 and δ = 10.


Figure 5: The coefficients of stacked A_{1,1}, A_{2,1}, A_{3,1} and A_{4,1} for the simulated sparse PVAR(1;15)_4 (left), the relative frequency of non-zero coefficient selection (middle) and the corresponding estimated coefficients (right) for the adaptive lasso approach, shown for δ = 1 and δ = 10.

As in the case of SVAR, unconstrained PVAR and sparse PVAR perform quite similarly in terms of forecasting. However, the sparse PVAR model uses only 36% of the non-zero coefficients compared to PVAR(1)_4. Thus, sparse PVAR makes the model interpretation simpler while keeping the same forecasting performance as unconstrained PVAR.


Figure 6: Time plot and sample ACF and PACF plots for the detrended NO concentration. The seasonal mean and standard deviation are also depicted.

5 Applications to real data

5.1 Air quality in CA

The air quality data observed hourly during the year 2006 at Azusa, California, is analyzed based on the methods proposed in Section 3. We have considered four concentration levels of air pollutants, namely, CO, NO, NO2 and ozone, as well as solar radiation as a natural force. All data can be downloaded from the Air Quality and Meteorological Information System (AQMIS). Since stations are not operating from 4 AM to 5 AM, there are only 23 observations for each day. We also applied linear interpolation for missing values. Thus, the series is 5–dimensional with N = 365 × 23 = 8395 observations in each component series.

Before fitting a model, we applied a log transformation and cubic polynomial regression to remove heteroscedasticity and the deterministic trend. Figure 6 shows the time plot of the detrended NO concentration together with several diagnostic plots. First, observe that the sample ACF and PACF indicate the presence of cyclical variations. Hence, a seasonal model seems plausible. However, as the bottom seasonal mean and standard deviation plots show, the seasons have varying mean and standard deviation, suggesting a periodic model for the air quality data. The other series have similar properties and are not considered here for brevity.

We applied our proposed two-stage method of Section 3.2 to fit a sparse PVAR model with the pre-specified order p ∈ {0, 1, 2, 3}. Figure 7 depicts the BIC curves for stages 1 and 2, showing that the best selected model is sparse PVAR(1;256)_{23}.


Figure 7: BIC plots for the two-stage procedure in finding the best sparse PVAR model for the air quality data (stage 1, for p = 1, 2, 3, and stage 2, for p = 1).

                                     h = 1   h = 2   h = 4   h = 8   h = 12   h = 13
sparse VAR(5;70)                     .201    .678    2.107   6.458   8.840    9.589
sparse SVAR(2,1;43)_{23} (PSC)       .602    1.309   2.521   4.595   6.339    6.801
PVAR(1;575)_{23}                     .189    .273    .356    .280    .269     .270
sparse PVAR(1;256)_{23} (PSC)        .180    .257    .290    .246    .235     .234
sparse PVAR(1;320)_{23} (A-LASSO)    .182    .249    .249    .238    .235     .232

Table 6: The h–step forecast MSE for the air quality data with sparse VAR, sparse SVAR based on PSCs, (non-sparse) PVAR and sparse PVAR models. The sparse PVAR(1;256)_{23} model achieves the smallest h–step forecast MSE in all cases considered.

When the adaptive lasso approach is used, sparse PVAR(1;320)_{23} is selected.

We also examined the fitted sparse PVAR model for the air quality data by comparing out-of-sample forecasts as in Davis et al. (2015). The h–step-ahead forecast mean squared error (MSE) is calculated as

MSE(h) = \frac{1}{q(T_t - h + 1)} \sum_{t=T}^{T + T_t - h} (Y_{t+h} - \hat{Y}_{t+h})'(Y_{t+h} - \hat{Y}_{t+h}),

where \hat{Y}_{t+h} is the h–step-ahead best linear forecast of Y_{t+h} with the training sample size T and test sample size T_t. In this analysis, we used the first 8386 observations as the training set and set T_t = 46. We compared five different models: sparse VAR, sparse SVAR based on PSCs, unconstrained PVAR, and sparse PVARs based on PSCs and the adaptive lasso. The best sparse VAR model is VAR(5;70), and the unconstrained PVAR of order 1 is selected, hence the number of non-zero coefficients is 575. For h = 1, 2, 4, 8, 12 and 13, the h–step-ahead forecast MSE is reported in Table 6. First, the sparse VAR model has increasing MSE as h increases. Thus, the sparse VAR model performs poorly since cyclical variations are not incorporated in the model. The best sparse SVAR model is SVAR(2,1)_{23} with 43 non-zero coefficients. The sparse SVAR model incorporates cyclical variations into the model, but its forecasting performance is considerably worse compared to the PVAR models. While the PVAR models provide similar and reasonably small MSEs, our proposed sparse PVARs achieve slightly smaller MSEs for all the considered lags h. This is in addition to the better model interpretability that a sparse PVAR model provides.


Figure 8: Monthly Google flu trend data (flu trends for CA, GA, IL, NJ and TX, and the sample ACF for CA).

5.2 Google flu trend

In this section, we consider the celebrated Google flu trend data on the weekly predicted number of influenza-like-illness (ILI) related visits per 100,000 outpatients in a US region. More specifically, we consider the monthly data obtained by aggregating weekly observations from the first week of 2004 to the last week of 2013. Though the Google flu data is available for the 50 states, the District of Columbia and 122 major cities over the US, we consider only 5 states (CA, GA, IL, NJ, TX) for illustration. Thus, the dimension of the data is q = 5 and N = 132. We also take a log transformation to make the series more stationary.

Figure 8 shows the time plot of the data and the bottom right plot shows the sample ACF for CA. Observe that the Google flu data exhibits cyclical behavior, which we model through a sparse SVAR model as proposed in Section 3.1. The best model is sparse SVAR(1,1)_{12} with 11 non-zero coefficients for the two-stage procedure based on PSCs, and SVAR(1,1;41)_{12} for the adaptive lasso approach. If we ignore cyclical variations, then the procedure of Davis et al. (2015) finds sparse VAR(1) as the best model, with 16 non-zero coefficients. Figure 9 shows the estimated coefficients for the sparse VAR(1) and sparse SVAR(1,1)_{12} models.

To see which model explains the observed data better, we conducted the out-of-sample forecasting comparison as described in the previous section. The results are more interesting here. Table 7 reports MSPE(h), h = 1, 2 and 3, for five different models, namely, sparse VAR, sparse PVAR based on PSCs, (non-sparse) SVAR and sparse SVAR models based on PSCs and the adaptive lasso.


[Figure 9 here: three 5 × 5 matrices of estimated coefficients, shown in panels titled "sparse VAR(1)", "sparse SVAR(1,1)-AR" and "sparse SVAR(1,1)-SAR", with rows and columns indexed 1-5.]

Figure 9: Estimated coefficients for sparse VAR(1) and sparse SVAR(1,1)12 for monthly flu data.

                                     h = 1    h = 2    h = 3
  sparse VAR(1;16)                    .370     .573     .813
  sparse PVAR(1;136)12 (PSC)          .451     .854     .586
  SVAR(1,1;50)12                      .241     .396     .439
  sparse SVAR(1,1;11)12 (PSC)         .205     .316     .425
  sparse SVAR(1,1;41)12 (A-LASSO)     .222     .360     .355

Table 7: The h–step forecast MSE for monthly flu data with sparse VAR, sparse PVAR based on PSCs, (non-sparse) SVAR and sparse SVAR models.

Note that the SVAR models outperform the sparse VAR model, showing the advantages of modeling cyclical variations. Also, our proposed sparse SVAR(1,1)12 models achieve the smallest forecasting errors, indicating that they provide not only an easier model interpretation but also better forecasting performance. The sparse PVAR(1;136)12 model performs the worst amongst the four fitted models incorporating cyclical variations. This may be due to the small sample size and the relatively large number of selected non-zero coefficients.

6 Conclusions

In this work, we study the estimation of sparse seasonal and periodic vector autoregressive models. These SVAR and PVAR models play a central role in modeling time series with cyclical variations. We consider two popular approaches available for the modeling of high-dimensional time series: first, the regularization approach based on the adaptive lasso and, second, the variable selection approach based on PSCs. However, neither approach applies directly to SVAR and PVAR models, because SVAR models are nonlinear in the parameters and aggregating results across different seasons in PVAR models is not immediate. These issues are resolved by using linearization and new ways of using the information from PSCs, as detailed in Section 3. Finite-sample simulations, reported in Section 4, show good performance of our methods. In particular, the numbers of non-zero coefficients, the parameter estimates and the model order estimates are reasonably close to the true values, and forecasting is shown to be superior to that of the other models considered. Our methods are illustrated on real data applications and shown to outperform other approaches, especially in terms of out-of-sample forecasting.


A Estimation in constrained SVAR models

In this appendix, we present an algorithm to estimate the coefficients of SVAR models with (sparsity) constraints. The SVAR(p, P)s model can be written as (assuming µ = 0)

\[
X_n = \sum_{r=1}^{p} A_r X_{n-r} + \sum_{R=1}^{P} A_{s,R} X_{n-Rs} - \sum_{r=1}^{p} \sum_{R=1}^{P} A_r A_{s,R} X_{n-Rs-r} + \varepsilon_n, \qquad (A.1)
\]

where $\{\varepsilon_n\}$ is WN(0, $\Sigma$). Due to the cross terms in (A.1), the usual estimation algorithm for VAR with constraints (e.g. Lutkepohl (2005), Chapter 5) cannot be applied. We shall adapt this approach after linearizing (A.1). Vectorizing (A.1) yields

\[
\mathrm{vec}(X_n) = \sum_{r=1}^{p} \mathrm{vec}(A_r X_{n-r}) + \sum_{R=1}^{P} \mathrm{vec}(A_{s,R} X_{n-Rs}) - \sum_{r=1}^{p} \sum_{R=1}^{P} \mathrm{vec}(A_r A_{s,R} X_{n-Rs-r}) + \mathrm{vec}(\varepsilon_n)
\]
\[
= \sum_{r=1}^{p} (X'_{n-r} \otimes I_q)\,\mathrm{vec}(A_r) + \sum_{R=1}^{P} (X'_{n-Rs} \otimes I_q)\,\mathrm{vec}(A_{s,R}) - \sum_{r=1}^{p} \sum_{R=1}^{P} (X'_{n-Rs-r} \otimes I_q)\,\mathrm{vec}(A_r A_{s,R}) + \mathrm{vec}(\varepsilon_n). \qquad (A.2)
\]

Now, linearize the cross term in (A.2) as

\[
\mathrm{vec}(A_r A_{s,R}) \approx \mathrm{vec}(\bar{A}_r \bar{A}_{s,R}) + \left.\frac{\partial\,\mathrm{vec}(A_r A_{s,R})}{\partial\,\mathrm{vec}(A_r)'}\right|_{\bar{A}_r, \bar{A}_{s,R}} \big(\mathrm{vec}(A_r) - \mathrm{vec}(\bar{A}_r)\big) + \left.\frac{\partial\,\mathrm{vec}(A_r A_{s,R})}{\partial\,\mathrm{vec}(A_{s,R})'}\right|_{\bar{A}_r, \bar{A}_{s,R}} \big(\mathrm{vec}(A_{s,R}) - \mathrm{vec}(\bar{A}_{s,R})\big)
\]

in some neighborhood of $\bar{A}_r, \bar{A}_{s,R}$. By using this linearization and the identities

\[
\mathrm{vec}(A_r A_{s,R}) = (A'_{s,R} \otimes I_q)\,\mathrm{vec}(A_r) = (I_q \otimes A_r)\,\mathrm{vec}(A_{s,R})
\]

to compute the derivatives in the linearization, the relation (A.2) can be linearized as

\[
Y_n = \sum_{r=1}^{p} \Big\{ (X'_{n-r} \otimes I_q) - \sum_{R=1}^{P} (X'_{n-Rs-r} \otimes I_q)(\bar{A}'_{s,R} \otimes I_q) \Big\}\,\mathrm{vec}(A_r) + \sum_{R=1}^{P} (X'_{n-Rs} \otimes I_q)\,\mathrm{vec}(A_{s,R}) - \sum_{r=1}^{p} \sum_{R=1}^{P} (X'_{n-Rs-r} \otimes I_q)(I_q \otimes \bar{A}_r)\,\mathrm{vec}(A_{s,R}) + \mathrm{vec}(\varepsilon_n), \qquad (A.3)
\]

where

\[
Y_n = \mathrm{vec}(X_n) + \sum_{r=1}^{p} \sum_{R=1}^{P} (X'_{n-Rs-r} \otimes I_q)\,\big\{ \mathrm{vec}(\bar{A}_r \bar{A}_{s,R}) - (\bar{A}'_{s,R} \otimes I_q)\,\mathrm{vec}(\bar{A}_r) - (I_q \otimes \bar{A}_r)\,\mathrm{vec}(\bar{A}_{s,R}) \big\}. \qquad (A.4)
\]

The model (A.3) is the vectorized and linearized version of the relation (A.1) around the points $\bar{A}_r$ and $\bar{A}_{s,R}$.
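As a quick numerical check of the vec/Kronecker identities and of the linearization above (our own illustration, not part of the paper), the following NumPy snippet verifies the identity for vec(A_r A_{s,R}) and evaluates the first-order approximation at nearby expansion points.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 3
A_r, A_sR = rng.normal(size=(q, q)), rng.normal(size=(q, q))
Iq = np.eye(q)

def vec(M):
    # column-stacking vectorization, matching the vec operator in the text
    return M.reshape(-1, order="F")

# vec(A_r A_{s,R}) = (A'_{s,R} kron I_q) vec(A_r) = (I_q kron A_r) vec(A_{s,R})
lhs = vec(A_r @ A_sR)
assert np.allclose(lhs, np.kron(A_sR.T, Iq) @ vec(A_r))
assert np.allclose(lhs, np.kron(Iq, A_r) @ vec(A_sR))

# first-order expansion of vec(A_r A_{s,R}) around nearby points (Abar_r, Abar_sR)
Abar_r = A_r + 0.01 * rng.normal(size=(q, q))
Abar_sR = A_sR + 0.01 * rng.normal(size=(q, q))
approx = (vec(Abar_r @ Abar_sR)
          + np.kron(Abar_sR.T, Iq) @ (vec(A_r) - vec(Abar_r))
          + np.kron(Iq, Abar_r) @ (vec(A_sR) - vec(Abar_sR)))
print("linearization error:", np.max(np.abs(approx - lhs)))  # small for nearby points
```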

Rewriting (A.3) in matrix form and viewing $Y_n$ as a response, we have

\[
Y = X\gamma + \varepsilon, \qquad (A.5)
\]


where $Y = \mathrm{vec}(Y_1, \ldots, Y_N)$, $\gamma = \mathrm{vec}(A_1, \ldots, A_p, A_{s,1}, \ldots, A_{s,P})$ and $\varepsilon = \mathrm{vec}(\varepsilon_1, \ldots, \varepsilon_N)$, with the corresponding design matrix $X$. Write also the sparsity constraints as

\[
\gamma = R\alpha, \qquad (A.6)
\]

so that $\alpha$ contains only the non-zero coefficients of the SVAR model. Then, the GLS estimator of $\gamma$ is given by

\[
\widehat{\gamma} = R\big((XR)'\Sigma^{-1}(XR)\big)^{-1}\big((XR)'\Sigma^{-1}Y\big) \qquad (A.7)
\]

with the covariance

\[
\mathrm{Var}(\widehat{\gamma}) = R\big((XR)'\Sigma^{-1}(XR)\big)^{-1}R' \qquad (A.8)
\]

(e.g. Lutkepohl (2005), p. 197). The estimated GLS (EGLS) estimator of $\gamma$ in (A.5) is then defined by replacing $\Sigma$ with a suitable estimate.
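For concreteness, a minimal NumPy sketch of the restricted GLS formulas (A.7)-(A.8) is given below (ours, not the authors' code); the only assumption beyond the text is that the covariance of the stacked errors enters as I_N ⊗ Σ, so that the weighting matrix is block diagonal.

```python
import numpy as np

def restricted_gls(X, Y, R, Sigma, n_blocks):
    """Restricted GLS as in (A.7)-(A.8): gamma_hat = R((XR)'W(XR))^{-1}(XR)'W Y.

    X        : stacked design matrix, Y : stacked response vector,
    R        : restriction matrix with gamma = R alpha,
    Sigma    : innovation covariance; the stacked-error covariance is assumed
               to be block diagonal, I_{n_blocks} kron Sigma.
    """
    W = np.kron(np.eye(n_blocks), np.linalg.inv(Sigma))   # inverse error covariance
    XR = X @ R
    M = np.linalg.inv(XR.T @ W @ XR)
    gamma_hat = R @ M @ (XR.T @ W @ Y)
    gamma_cov = R @ M @ R.T                                # as in (A.8)
    return gamma_hat, gamma_cov
```

In the SVAR setting, X and Y would be built from the linearized relation (A.3)-(A.4), and R encodes the zero pattern selected in the sparse modeling stage.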

Turning back to the multiplicative constrained SVAR model (A.1), we propose to estimate its coefficients through the following iterative procedure.

STEP 0: First, we obtain initial estimates for the constrained SVAR based on the approximation of (A.1) by discarding the cross terms, namely,

\[
X_n = \sum_{r=1}^{p} A_r X_{n-r} + \sum_{R=1}^{P} A_{s,R} X_{n-Rs} + z_n.
\]

The model can be written in a compact form as

\[
W = (U' \otimes I_q)\gamma + Z,
\]

where $W = \mathrm{vec}(X_{Ps+1}, \ldots, X_N)$, $U = (U_{Ps+1}, \ldots, U_N)$ with $U_t = \mathrm{vec}(X_{t-1}, \ldots, X_{t-p}, X_{t-s}, \ldots, X_{t-Ps})$, $\gamma = \mathrm{vec}(A_1, \ldots, A_p, A_{s,1}, \ldots, A_{s,P})$ and $Z = \mathrm{vec}(z_{Ps+1}, \ldots, z_N)$. Then, the least squares (LS) estimator of $\gamma$ is given by

\[
\widehat{\gamma}^{(0)} := \mathrm{vec}(\widehat{A}^{(0)}_1, \ldots, \widehat{A}^{(0)}_p, \widehat{A}^{(0)}_{s,1}, \ldots, \widehat{A}^{(0)}_{s,P}) = R\big(R'(UU' \otimes I_q)R\big)^{-1} R'(U \otimes I_q)W \qquad (A.9)
\]

(e.g. Lutkepohl (2005), Section 5.2). Set $i = 0$.

STEP 1: For a given estimate $\widehat{\gamma}^{(i)} = \mathrm{vec}(\widehat{A}^{(i)}_1, \ldots, \widehat{A}^{(i)}_p, \widehat{A}^{(i)}_{s,1}, \ldots, \widehat{A}^{(i)}_{s,P})$, define $\widehat{\Sigma}^{(i+1)}$ by

\[
\widehat{\Sigma}^{(i+1)} = \frac{1}{N} \sum_{n=\max(p,Ps)+1}^{N} \big(X_n - \widehat{X}^{(i)}_n\big)\big(X_n - \widehat{X}^{(i)}_n\big)', \qquad (A.10)
\]

where

\[
\widehat{X}^{(i)}_n = \sum_{r=1}^{p} \widehat{A}^{(i)}_r X_{n-r} + \sum_{R=1}^{P} \widehat{A}^{(i)}_{s,R} X_{n-Rs} - \sum_{r=1}^{p} \sum_{R=1}^{P} \widehat{A}^{(i)}_r \widehat{A}^{(i)}_{s,R} X_{n-Rs-r}. \qquad (A.11)
\]

Then, update the EGLS estimates of the constrained SVAR model through (A.7) as

\[
\widehat{\gamma}^{(i+1)} = R\Big(\big(X^{(i)}R\big)'\big(\widehat{\Sigma}^{(i+1)}\big)^{-1}\big(X^{(i)}R\big)\Big)^{-1}\Big(\big(X^{(i)}R\big)'\big(\widehat{\Sigma}^{(i+1)}\big)^{-1}Y^{(i)}\Big),
\]

where $X^{(i)}$ and $Y^{(i)}$ are obtained by applying the linearization at the points $\bar{A}_r = \widehat{A}^{(i)}_r$ and $\bar{A}_{s,R} = \widehat{A}^{(i)}_{s,R}$ in (A.5).

STEP 2: Stop if

\[
\big\|\widehat{\gamma}^{(i+1)} - \widehat{\gamma}^{(i)}\big\| \leq \epsilon
\]

for a predetermined tolerance level $\epsilon$. Otherwise, set $i$ to $i+1$ and go back to STEP 1.
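To make STEPs 0-2 concrete, a self-contained NumPy sketch of the iterative procedure is given below. It is our illustration rather than the authors' code: it assumes the series has zero mean, takes the stacked-error covariance as I ⊗ Σ when forming the GLS weights, and makes no attempt at numerical efficiency.

```python
import numpy as np

def vec(M):
    # column-stacking vectorization
    return M.reshape(-1, order="F")

def svar_egls(X, p, P, s, R, tol=1e-6, max_iter=50):
    """Iterative EGLS for the constrained multiplicative SVAR model (A.1),
    following STEPs 0-2 (a sketch). X is an (N, q) zero-mean data array and
    R is the restriction matrix with gamma = R alpha."""
    N, q = X.shape
    Iq = np.eye(q)
    times = range(P * s + p, N)          # 0-based indices with all lags available
    nobs = len(times)

    def unpack(gamma):
        # gamma = vec(A_1, ..., A_p, A_{s,1}, ..., A_{s,P})
        mats = gamma.reshape(q, q * (p + P), order="F")
        return ([mats[:, j*q:(j+1)*q] for j in range(p)],
                [mats[:, (p+j)*q:(p+j+1)*q] for j in range(P)])

    def gls(Xd, Yd, Sigma):
        # restricted GLS (A.7) with block-diagonal weights I kron Sigma^{-1}
        W = np.kron(np.eye(nobs), np.linalg.inv(Sigma))
        XR = Xd @ R
        return R @ np.linalg.inv(XR.T @ W @ XR) @ (XR.T @ W @ Yd)

    # STEP 0: constrained LS on the additive approximation (cross terms dropped)
    rows, ys = [], []
    for n in times:
        lags = [X[n - r] for r in range(1, p + 1)] + [X[n - Rr*s] for Rr in range(1, P + 1)]
        rows.append(np.hstack([np.kron(x[None, :], Iq) for x in lags]))
        ys.append(vec(X[n]))
    gamma = gls(np.vstack(rows), np.concatenate(ys), np.eye(q))

    for _ in range(max_iter):
        A, As = unpack(gamma)

        # STEP 1: residual covariance (A.10)-(A.11) at the current estimates
        resid = []
        for n in times:
            fit = sum(A[r-1] @ X[n - r] for r in range(1, p + 1))
            fit = fit + sum(As[Rr-1] @ X[n - Rr*s] for Rr in range(1, P + 1))
            fit = fit - sum(A[r-1] @ As[Rr-1] @ X[n - Rr*s - r]
                            for r in range(1, p + 1) for Rr in range(1, P + 1))
            resid.append(X[n] - fit)
        resid = np.array(resid)
        Sigma = resid.T @ resid / N

        # EGLS step on the linearized relation (A.3)-(A.4) around (A, As)
        rows, ys = [], []
        for n in times:
            blocks, yn = [], vec(X[n])
            for r in range(1, p + 1):                  # blocks multiplying vec(A_r)
                blk = np.kron(X[n - r][None, :], Iq)
                for Rr in range(1, P + 1):
                    blk = blk - np.kron((As[Rr-1] @ X[n - Rr*s - r])[None, :], Iq)
                blocks.append(blk)
            for Rr in range(1, P + 1):                 # blocks multiplying vec(A_{s,R})
                blk = np.kron(X[n - Rr*s][None, :], Iq)
                for r in range(1, p + 1):
                    blk = blk - np.kron(X[n - Rr*s - r][None, :], A[r-1])
                blocks.append(blk)
            for r in range(1, p + 1):                  # response correction (A.4)
                for Rr in range(1, P + 1):
                    K = np.kron(X[n - Rr*s - r][None, :], Iq)
                    yn = yn + K @ (vec(A[r-1] @ As[Rr-1])
                                   - np.kron(As[Rr-1].T, Iq) @ vec(A[r-1])
                                   - np.kron(Iq, A[r-1]) @ vec(As[Rr-1]))
            rows.append(np.hstack(blocks))
            ys.append(yn)
        gamma_new = gls(np.vstack(rows), np.concatenate(ys), Sigma)

        # STEP 2: stopping rule
        if np.linalg.norm(gamma_new - gamma) <= tol:
            gamma = gamma_new
            break
        gamma = gamma_new
    return unpack(gamma), Sigma
```

As a usage sketch, with monthly data X, s = 12, p = P = 1 and a restriction matrix R built from the selected zero pattern, `svar_egls(X, 1, 1, 12, R)` would return the estimated A_r and A_{s,R} matrices together with the innovation covariance estimate.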


B Estimated GLS for constrained PVAR models

The estimated GLS (EGLS) procedure for constrained PVAR models applies EGLS to the constrained VAR for each season. Suppose that the constraint for the mth season component coefficients is represented as

\[
\delta_m = R_m \beta_m, \qquad m = 1, \ldots, s,
\]

where $\delta_m := \mathrm{vec}(B_m) := \mathrm{vec}(A_{m,1}, \ldots, A_{m,p})$. Let $X^*_n = X_n - \mu_m$ be the seasonally centered observations and write $X^* := (X^*_1, \ldots, X^*_N)$. Let also $Q_m$ be the matrix operation extracting the observations falling into the mth season from the indices $1, \ldots, N$. Then, for the mth season component, the PVAR model can be represented as

\[
Y_m = B_m U_m + Z_m,
\]

where $Y_m = X^* Q_m$, $U_m = U Q_m$, $Z_m = Z Q_m$ with $U = (U_1, \ldots, U_N)$, $Z = (\varepsilon_1, \ldots, \varepsilon_N)$ and $U_t = \mathrm{vec}(X^*_{t-1}, \ldots, X^*_{t-p})$. The GLS estimator of $\delta_m$ is given by

\[
\widehat{\delta}_m = R_m\big(R'_m(U_m U'_m \otimes \Sigma^{-1}_m)R_m\big)^{-1} R'_m(U_m \otimes \Sigma^{-1}_m)\,y_m,
\]

where $y_m = \mathrm{vec}(Y_m)$ and $\Sigma_m$ is the covariance matrix of the mth season component innovations. The EGLS estimator is then obtained by using estimates of $\Sigma_m$ iteratively until a prespecified tolerance level is achieved. For an initial estimator of $\Sigma_m$, the OLS estimator $\widehat{\delta}^{(0)}_m := \mathrm{vec}(\widehat{B}^{(0)}_m) := \big((U_m U'_m)^{-1} U_m \otimes I_q\big)\,y_m$ can be used to obtain

\[
\widehat{\Sigma}^{(0)}_m = \frac{1}{T}\big(Y_m - \widehat{B}^{(0)}_m U_m\big)\big(Y_m - \widehat{B}^{(0)}_m U_m\big)',
\]

where $T$ is the number of observations falling into the mth season. The EGLS estimator is consistent and asymptotically normal, and its variance can be estimated through

\[
\mathrm{Var}\big(\widehat{\delta}^{\,C}_m\big) = R_m\big(R'_m(U_m U'_m \otimes (\widehat{\Sigma}^{C}_m)^{-1})R_m\big)^{-1}R'_m.
\]
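A compact NumPy sketch of this per-season EGLS step is given below (our illustration, not the authors' code); it assumes the data are already seasonally centered, that time t belongs to season ((t-1) mod s) + 1, and performs a single EGLS step starting from OLS, with the restriction matrix R_m taken as given.

```python
import numpy as np

def pvar_egls_season(Xstar, m, s, p, R_m):
    """EGLS for the m-th season (m = 1, ..., s) of a constrained PVAR(p) model.
    Xstar : (N, q) seasonally centered data; R_m : restriction matrix with
    vec(B_m) = R_m beta_m, where B_m = (A_{m,1}, ..., A_{m,p})."""
    N, q = Xstar.shape
    Iq = np.eye(q)
    # 0-based time indices falling into season m with all p lags available
    # (season convention assumed: time t = n+1 belongs to season ((t-1) mod s)+1)
    idx = [n for n in range(p, N) if n % s == (m - 1) % s]
    Y_m = Xstar[idx].T                                     # q x T_m responses
    U_m = np.column_stack([np.concatenate([Xstar[n - r] for r in range(1, p + 1)])
                           for n in idx])                  # pq x T_m lagged regressors
    y_m = Y_m.reshape(-1, order="F")                       # vec(Y_m)

    # initial OLS estimate and residual covariance
    delta0 = np.kron(np.linalg.inv(U_m @ U_m.T) @ U_m, Iq) @ y_m
    B0 = delta0.reshape(q, q * p, order="F")
    E0 = Y_m - B0 @ U_m
    Sigma_m = E0 @ E0.T / Y_m.shape[1]

    # one EGLS step with the estimated Sigma_m (iterate until convergence in practice)
    Sig_inv = np.linalg.inv(Sigma_m)
    M = np.linalg.inv(R_m.T @ np.kron(U_m @ U_m.T, Sig_inv) @ R_m)
    delta_m = R_m @ M @ R_m.T @ np.kron(U_m, Sig_inv) @ y_m
    var_delta = R_m @ M @ R_m.T                            # estimated covariance
    return delta_m.reshape(q, q * p, order="F"), var_delta
```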

References

Basu, S. and Michailidis, G. (2015), ‘Regularized estimation in sparse high-dimensional time series models’, The Annals of Statistics 43(4), 1535–1567.

Box, G. and Jenkins, G. (1976), Time Series Analysis: Forecasting and Control, Holden-Day.

Brockwell, P. and Davis, R. A. (2009), Time Series: Theory and Methods, Springer Series in Statistics, Springer, New York. Reprint of the second (1991) edition.

Buhlmann, P. and van de Geer, S. (2013), Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Series in Statistics, Springer Berlin Heidelberg.

Davis, R. A., Zang, P. and Zheng, T. (2015), ‘Sparse vector autoregressive modeling’, Journal of Computational and Graphical Statistics, To appear.

Franses, P. H. and Paap, R. (2004), Periodic Time Series Models, Advanced Texts in Econometrics, Oxford University Press, Oxford.

Ghysels, E. and Osborn, D. R. (2001), The Econometric Analysis of Seasonal Time Series, Themes in Modern Econometrics, Cambridge University Press, Cambridge.


Giraud, C. (2014), Introduction to High-Dimensional Statistics, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Taylor & Francis.

Hannan, E. J. (1970), Multiple Time Series, John Wiley and Sons, New York-London-Sydney.

Hastie, T., Tibshirani, R. and Friedman, J. (2013), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer Series in Statistics, Springer.

Hsu, N.-J., Hung, H.-L. and Chang, Y.-M. (2008), ‘Subset selection for vector autoregressive processes using lasso’, Computational Statistics & Data Analysis 52(7), 3645–3657.

Hurd, H. L. and Miamee, A. (2007), Periodically Correlated Random Sequences, Wiley Series in Probability and Statistics, Wiley-Interscience, Hoboken, NJ.

Kock, A. B. and Callot, L. (2015), ‘Oracle inequalities for high dimensional vector autoregressions’, Journal of Econometrics 186(2), 325–344.

Lund, R. (2011), Choosing seasonal autocovariance structures: PARMA or SARMA, in W. Bell, S. Holan and T. McElroy, eds, ‘Economic Time Series: Modelling and Seasonality’, Chapman and Hall, pp. 63–80.

Lutkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Springer-Verlag, Berlin.

Medeiros, M. C. and Mendes, E. (2012), ‘Estimating high-dimensional time series models’, CREATES Research Paper 37.

Nicholson, W. B., Matteson, D. S. and Bien, J. (2015), ‘VARX-L: Structured regularization for large vector autoregressions with exogenous variables’, Preprint.

Shojaie, A. and Michailidis, G. (2010), ‘Discovering graphical Granger causality using the truncating lasso penalty’, Bioinformatics 26(18), i517–i523.

Song, S. and Bickel, P. J. (2011), ‘Large vector autoregressions’, arXiv preprint arXiv:1106.3915.

Tibshirani, R. (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society, Series B 58, 267–288.

Zou, H. (2006), ‘The adaptive lasso and its oracle properties’, Journal of the American Statistical Association 101(476), 1418–1429.

Changryong Baek
Dept. of Statistics
Sungkyunkwan University
25-2, Sungkyunkwan-ro, Jongno-gu
Seoul, 110-745, Korea
[email protected]

Richard A. Davis
Dept. of Statistics
Columbia University
1255 Amsterdam Avenue, MC 4690
New York, NY 10027, USA
[email protected]

Vladas Pipiras
Dept. of Statistics and Operations Research
UNC at Chapel Hill
CB#3260, Hanes Hall
Chapel Hill, NC 27599, USA
[email protected]
