Sparse seasonal and periodic vector autoregressive modeling


    Changryong Baek, Sungkyunkwan University

    Richard A. Davis, Columbia University

    Vladas Pipiras, University of North Carolina

    October 16, 2015

    Abstract

    Seasonal and periodic vector autoregressions are two common approaches to modeling vector time series exhibiting cyclical variations. The total number of parameters in these models increases rapidly with the dimension and order of the model, making it difficult to interpret the model and calling into question the stability of the parameter estimates. To address these and other issues, two methodologies for sparse modeling are presented in this work: the first is based on regularization involving adaptive lasso, and the second extends the approach of Davis, Zang and Zheng (2015) for vector autoregressions based on partial spectral coherences. The methods are shown to work well on simulated data, and to perform well on several examples of real vector time series exhibiting cyclical variations.

    1 Introduction

    In this work, we introduce methodologies for sparse modeling of stationary vector ($q$-dimensional) time series data exhibiting cyclical variations. Sparse models are gaining traction in the time series literature for the same reasons that sparse (generalized) linear models are used in the traditional setting of i.i.d. errors. Such models are particularly suitable in a high-dimensional context, for which the number of parameters often grows as $q^2$ (as, for example, with the vector autoregressive models considered below) and becomes prohibitively large compared to the sample size even for moderate $q$. Sparse models also ensure better interpretability of the fitted models and numerical stability of the estimates, and tend to improve prediction.

    In the vector time series context, sparse modeling has been considered for the class of vector autoregressive (VAR) models:

    $$X_n - \mu = A_1 (X_{n-1} - \mu) + \ldots + A_p (X_{n-p} - \mu) + \epsilon_n, \quad n \in \mathbb{Z}, \qquad (1.1)$$

    where $X_n = (X_{1,n}, \ldots, X_{q,n})'$ is a $q$-vector time series, $A_1, \ldots, A_p$ are $q \times q$ matrices, $\mu$ is the overall constant mean vector and $\epsilon_n$ are white noise (WN) error terms. Regularization approaches based on lasso and its variants were taken in Hsu, Hung and Chang (2008), Shojaie and Michailidis (2010), Song and Bickel (2011), Medeiros and Mendes (2012), Basu and Michailidis (2015), Nicholson, Matteson and Bien (2015), Kock and Callot (2015), with applications to economics, neuroscience (e.g. functional connectivity among brain regions), biology (e.g. reconstructing gene regulatory networks from time course data), and environmental science (e.g. pollutant levels over time).

    AMS subject classification. Primary: 62M10, 62H12. Secondary: 62H20.
    Keywords and phrases: seasonal vector autoregressive (SVAR) model, periodic vector autoregressive (PVAR) model, sparsity, partial spectral coherence (PSC), adaptive lasso, variable selection.
    The work of the first author was supported in part by the Basic Science Research Program from the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1006025). The third author was supported in part by NSA grant H98230-13-1-0220.

    As usual, the model (1.1) will be abbreviated as

    $$\Phi(B)(X_n - \mu) = \epsilon_n, \quad n \in \mathbb{Z}, \qquad (1.2)$$

    where $\Phi(B) = I - A_1 B - \ldots - A_p B^p$ and $B$ is the backshift operator.

    In a different approach, Davis et al. (2015) introduced an alternative 2-stage procedure for sparse VAR modeling. In the first stage, all pairs of component series are ranked based on the estimated values of their partial spectral coherences (PSCs), defined as

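    $$\sup_{\lambda} |\mathrm{PSC}^X_{jk}(\lambda)|^2 := \sup_{\lambda} \frac{|g^X_{jk}(\lambda)|^2}{g^X_{jj}(\lambda)\, g^X_{kk}(\lambda)}, \quad j, k = 1, \ldots, q, \ j \neq k, \qquad (1.3)$$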

    where $g^X(\lambda) = f_X(\lambda)^{-1}$ with $f_X$ being the spectral density matrix of $X$. Then, the order $p$ and the top $M$ pairs are found which minimize the BIC$(p, M)$ value, and the coefficients of matrices $A_r$ are set to 0 for all pairs of indices $j, k$ not included among the top $M$ pairs. In the second stage, the estimates of the remaining non-zero coefficients are ranked according to their $t$-statistic values. Again, the top $m$ of the coefficients are selected that minimize a suitable BIC, and then the rest of the coefficients are set to 0. As shown in Davis et al. (2015), this 2-stage procedure outperforms regular lasso. The basic idea of this approach is that small PSCs do not increase the likelihood sufficiently to warrant the inclusion of the respective coefficients of matrices $A_r$ in the model.
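To illustrate the quantity in (1.3) that drives the first-stage ranking, the following minimal sketch evaluates the population PSCs of a small VAR(1) model. It is not the estimation procedure of Davis et al. (2015), which works with an estimated spectral density. The sketch assumes a VAR(1) with identity noise covariance, in which case the inverse spectral density is proportional to $\Phi(e^{-i\lambda})^* \Phi(e^{-i\lambda})$ and the constant cancels in the ratio. The coefficient matrix `A`, the frequency grid and the function names are hypothetical choices made for illustration.

```python
import cmath
import math


def inverse_spectrum(A, lam):
    """Return Phi(e^{-i lam})^H Phi(e^{-i lam}) for a VAR(1) with coefficient A.

    Assumption: identity noise covariance, so this equals the inverse
    spectral density g(lam) up to the factor 2*pi, which cancels in the PSC.
    """
    q = len(A)
    z = cmath.exp(-1j * lam)
    # Phi(z) = I - A z
    phi = [[(1.0 if r == c else 0.0) - A[r][c] * z for c in range(q)]
           for r in range(q)]
    return [[sum(phi[l][j].conjugate() * phi[l][k] for l in range(q))
             for k in range(q)] for j in range(q)]


def sup_squared_psc(A, n_grid=200):
    """Approximate sup over lam in [0, pi] of |PSC_jk(lam)|^2 on a grid."""
    q = len(A)
    sup = [[0.0] * q for _ in range(q)]
    for i in range(n_grid + 1):
        g = inverse_spectrum(A, math.pi * i / n_grid)
        for j in range(q):
            for k in range(q):
                if j != k:
                    val = abs(g[j][k]) ** 2 / (g[j][j].real * g[k][k].real)
                    sup[j][k] = max(sup[j][k], val)
    return sup


# Hypothetical sparse VAR(1): series 3 does not interact with series 1 and 2.
A = [[0.5, 0.3, 0.0],
     [0.0, 0.4, 0.0],
     [0.0, 0.0, 0.6]]
psc = sup_squared_psc(A)

# Rank the pairs (j, k), j < k, from largest to smallest sup |PSC|^2.
ranked = sorted(((psc[j][k], (j + 1, k + 1))
                 for j in range(3) for k in range(j + 1, 3)), reverse=True)
print(ranked)
```

In this example the pair (1, 2) has sup $|\mathrm{PSC}|^2$ equal to 0.2 (attained at $\lambda = 0$), while both pairs involving series 3 have zero PSC, so a ranking of pairs would place (1, 2) first and a BIC-type criterion could discard the other two.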

    We shall extend here the regularization approach based on lasso and the approach of Davis et al. (2015) based on PSCs to sparse modeling of vector time series data exhibiting cyclical variations. The motivation here is straightforward. Consider, for example, the benchmark flu trends and pollutants series studied through sparse VAR models by Davis et al. (2015), and others. Figure 1 depicts the plots of (the logs of) their two component series with the respective sample ACFs and PACFs. The cyclical nature of the series can clearly be seen from the figure. The same holds for other component series (not illustrated here).

    Cyclical features of component series are commonly built into a larger vector model by using one of the following two approaches. A seasonal VAR model (SVAR($P$, $p$) model, for short; not to be confused with the so-called structural VAR) is one possibility, defined as

    $$\Phi(B)\Phi_s(B^s)(X_n - \mu) = \epsilon_n, \quad n \in \mathbb{Z}, \qquad (1.4)$$

    where $\Phi(B)$ and $\epsilon_n$ are as in (1.2), $\Phi_s(B^s) = I - A_{s,1} B^s - \ldots - A_{s,P} B^{Ps}$ with $q \times q$ matrices $A_{s,1}, \ldots, A_{s,P}$, $\mu$ denotes the overall mean and $s$ denotes the period. This is the vector version of the multiplicative seasonal AR model proposed by Box and Jenkins (1976). Another possibility is a periodic VAR model (PVAR($p$) model, for short) defined as

    $$\Phi_m(B)(X_n - \mu_m) = \epsilon_{m,n}, \quad n \in \mathbb{Z}, \qquad (1.5)$$

    where $\Phi_m(B) = I - A_{m,1} B - \ldots - A_{m,p} B^p$ with $q \times q$ matrices $A_{m,1}, \ldots, A_{m,p}$ which depend on the season $m = 1, \ldots, s$ wherein the time $n$ falls (that is, there are in fact $sp$ matrices $A$ of dimension $q \times q$), and $\mu_m$ refers to the seasonal mean. One could also allow $p$ to depend on the season $m = 1, \ldots, s$. Note that whereas the overall mean $\mu$ and the covariance matrix $E\epsilon_n \epsilon_n' = \Sigma$ are constant in (1.4), the mean $\mu_m$ in (1.5) and the covariance matrix $E\epsilon_{m,n} \epsilon_{m,n}' = \Sigma_m$ are allowed to depend on the season $m$.
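To see how the multiplicative structure in (1.4) constrains an ordinary VAR, it may help to expand the simplest case. The following worked expansion takes $p = P = 1$, a special case chosen here purely for illustration:

```latex
\begin{align*}
\Phi(B)\Phi_s(B^s) &= (I - A_1 B)(I - A_{s,1} B^s) \\
                   &= I - A_1 B - A_{s,1} B^s + A_1 A_{s,1} B^{s+1},
\end{align*}
so that (1.4) reads
\begin{equation*}
X_n - \mu = A_1 (X_{n-1} - \mu) + A_{s,1} (X_{n-s} - \mu)
            - A_1 A_{s,1} (X_{n-s-1} - \mu) + \epsilon_n .
\end{equation*}
```

In other words, this SVAR model is a VAR model of order $s + 1$ whose coefficient matrices vanish except at lags 1, $s$ and $s + 1$, and whose lag-$(s+1)$ matrix is tied to the other two through the product $A_1 A_{s,1}$. It has $2q^2$ free coefficients and is nonlinear in them, whereas a PVAR(1) model with the same period has $sq^2$ coefficients that enter linearly.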

    Both seasonal and periodic VAR models are widely used. For SVAR models, including the univariate case, see Brockwell and Davis (2009), Ghysels and Osborn (2001). These models form the basis for the U.S. Census X-12-ARIMA seasonal adjustment program. PVAR models, again with the focus on univariate series, are considered in the monographs by Ghysels and Osborn (2001), Franses and Paap (2004), Lütkepohl (2005), Hurd and Miamee (2007). These references barely scratch the surface. The vast amount of work should not be surprising: most economic, environmental and other time series naturally exhibit cyclical variations.

    [Figure 1 panels. Top row: Monthly Flu NC (y against Time, 2004 to 2014), SACF Monthly Flu NC (ACF against Lag), SPACF Monthly Flu NC (PACF against Lag). Bottom row: Ozone (y against Time), SACF Ozone (ACF against Lag), SPACF Ozone (PACF against Lag).]

    Figure 1: Top: Monthly flu trend in NC. Bottom: 23 hour ozone levels at a CA location. Respective sample ACFs and PACFs are given.

    Sparse modeling is proposed below for both SVAR and PVAR models. These are two different classes of models. Both classes are considered here because of their central role in the analysis of time series with cyclical variations, and because the real time series of the flu and pollutants data discussed above are, in fact, better modeled by one of the two types of models. Indeed, for example, the 1-step-ahead mean square prediction errors for the flu data are 0.0807 (best AR model), 0.0605 (best seasonal AR model), and 0.9759 (best periodic AR model). For the pollutants series, the prediction errors are smaller for periodic AR models when long (at least 2-step-ahead) horizons are considered. In this work, we thus decide between SVAR and PVAR models just based on how well they fit the data and perform in prediction. For more systematic approaches to choosing between seasonal and periodic models, see e.g. Lund (2011) and references therein.
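As a rough illustration of the kind of comparison reported above, the sketch below computes a 1-step-ahead mean square prediction error on a hold-out segment. The text does not spell out how its errors were computed, so everything here is an assumption: a univariate AR(1) fitted once by least squares on the training part, a fixed hold-out of the last `n_test` points, and the hypothetical function names `fit_ar1` and `one_step_mspe`.

```python
def fit_ar1(x):
    """Least-squares AR(1) fit around the sample mean; returns (mean, phi)."""
    mu = sum(x) / len(x)
    d = [v - mu for v in x]
    num = sum(d[i] * d[i - 1] for i in range(1, len(d)))
    den = sum(v * v for v in d[:-1])
    return mu, num / den


def one_step_mspe(x, n_test):
    """Mean squared 1-step-ahead forecast error over the last n_test points.

    Assumption: the model is fitted once on x[:-n_test] and not re-estimated
    as the forecast origin moves forward.
    """
    mu, phi = fit_ar1(x[:-n_test])
    errors = []
    for i in range(len(x) - n_test, len(x)):
        forecast = mu + phi * (x[i - 1] - mu)  # uses the observed x[i-1]
        errors.append((x[i] - forecast) ** 2)
    return sum(errors) / n_test


# Toy series with a perfect period-2 pattern, which an AR(1) with phi = -1
# predicts exactly, so the hold-out error is zero.
x = [1.0, 2.0] * 7
print(one_step_mspe(x, n_test=4))
```

The same loop applies to any competing model by swapping the fitting and forecasting steps, which is how errors for AR, seasonal AR and periodic AR fits could be placed side by side.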

    The regularization approach to SVAR and PVAR models is based on adaptive lasso, and is somewhat standard. The regular lasso of Tibshirani (1996) is well known to overestimate the number of nonzero coefficients (e.g. Bühlmann and van de Geer (2013)). The adaptive lasso of Zou (2006) corrects this tendency by estimating fewer nonzero coefficients. While the application of adaptive lasso to PVAR models is straightforward, a linearization and an iterative version of adaptive lasso are used for SVAR models, which are nonlinear by their construction.
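The contrast between the two penalties is easiest to see in the textbook special case of an orthonormal design, where both estimators have a closed form: each least-squares coefficient $b_j$ is soft-thresholded, at the common level $\lambda$ for lasso and at the coefficient-specific level $\lambda / |b_j|^{\gamma}$ for adaptive lasso. The sketch below assumes this special case only and is not the SVAR or PVAR fitting procedure of this work. The function names, the choice $\gamma = 1$ and the numbers are illustrative.

```python
def soft_threshold(b, t):
    """Shrink b toward zero by t; return 0.0 when |b| <= t."""
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0


def lasso_orthonormal(ols, lam):
    """Lasso solution for an orthonormal design: one threshold for all."""
    return [soft_threshold(b, lam) for b in ols]


def adaptive_lasso_orthonormal(ols, lam, gamma=1.0):
    """Adaptive lasso for an orthonormal design with weights 1/|b|^gamma."""
    out = []
    for b in ols:
        # A zero initial estimate gets an infinite penalty and stays at zero.
        weight = float("inf") if b == 0 else 1.0 / abs(b) ** gamma
        out.append(soft_threshold(b, lam * weight))
    return out


ols = [2.0, 0.5, -0.1]   # hypothetical least-squares estimates
lam = 0.3
print(lasso_orthonormal(ols, lam))
print(adaptive_lasso_orthonormal(ols, lam))
```

With these numbers lasso keeps two nonzero coefficients (about 1.7 and 0.2), while adaptive lasso keeps only the largest one and shrinks it less (about 1.85), which is the behavior described above: small initial estimates are penalized heavily and large ones lightly.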

    Our extension of the Davis et al. (2015) approach based on PSCs to sparse SVAR and PVAR models involves the $(qs)$-vector series

    $$Y_t = \begin{pmatrix} X_{(t-1)s+1} \\ X_{(t-1)s+2} \\ \vdots \\ X_{(t-1)s+s} \end{pmatrix},$$

    where $s$ is the period as above and $t$ now refers to a cycle. For the PVAR model, the series $\{Y_t\}$ is now (second-order) stationary. The Davis et al. (2015) approach can then be applied, though not directly since a VAR model for $\{Y_t\}$ is too complex for our purposes. For SVAR models, it is natural to estimate first a sparse seasonal filter $\Phi_s(B^s)$ by considering the between-period (between-cycle) series $X_{(t-1)s+m}$ as a series in