
     Journal of Statistical Computation and Simulation

    ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20

    A Bayesian wavelet approach to estimation of a change-point in a nonlinear multivariate time series

    Robert M. Steward & Steven E. Rigdon

    To cite this article: Robert M. Steward & Steven E. Rigdon (2015): A Bayesian wavelet approach to estimation of a change-point in a nonlinear multivariate time series, Journal of Statistical Computation and Simulation, DOI: 10.1080/00949655.2015.1116535

    To link to this article: http://dx.doi.org/10.1080/00949655.2015.1116535

    Published online: 16 Dec 2015.



    A Bayesian wavelet approach to estimation of a change-point in a nonlinear multivariate time series

    Robert M. Steward (a) and Steven E. Rigdon (b)

    (a) Department of Mathematics and Computer Science, Saint Louis University, Saint Louis, MO, USA; (b) Department of Biostatistics, College for Public Health and Social Justice, Saint Louis University, Saint Louis, MO, USA

    ABSTRACT

    We propose a semiparametric approach to estimate the existence and location of a statistical change-point in a nonlinear multivariate time series contaminated with an additive noise component. In particular, we consider a p-dimensional stochastic process of independent multivariate normal observations where the mean function varies smoothly except at a single change-point. Our approach involves conducting a Bayesian analysis on the empirical detail coefficients of the original time series after a wavelet transform. If the mean function of our time series can be expressed as a multivariate step function, we find our Bayesian-wavelet method performs comparably with classical parametric methods such as maximum likelihood estimation. The advantage of our multivariate change-point method is seen in how it applies to a much larger class of mean functions that require only general smoothness conditions.

    ARTICLE HISTORY

    Received 29 August 2014
    Accepted 2 November 2015

    KEYWORDS
    Semiparametric; scaling coefficient; detail coefficient; discrete wavelet transform; Haar wavelet

    1. Introduction

    The change-point problem has been studied in a variety of settings since at least the 1920s, when, in an effort to improve quality control, Walter Shewhart developed his now ubiquitous statistical control charts to detect various statistical changes in industrial processes.[1] Although control chart methods proved useful in practice, more theoretically grounded approaches involving maximum likelihood estimation (MLE) [2] and Bayesian techniques [3] later allowed the practitioner to rigorously associate confidence intervals with their conclusions. While initially the univariate case of a single change-point in the mean function was the focus, efforts expanded to include various other related problems such as multiple statistical change-points,[4–6] change in variance,[7] and a simultaneous change in mean and variance.[8] The case where the error component is not from a normal distribution has also been studied by various authors.[9,10] While many of these methods have proven to be valuable diagnostic data analysis tools, they generally either apply only in a single dimension or require strict assumptions on the time series model.

    There appears to be a gap in the change-point literature regarding the change-point problem for nonlinear multivariate time series. Classical parametric approaches such as MLE and Bayesian methods exist to detect and estimate the location of one or more statistical change-points in multivariate time series.[8,11,12] Many variations of such parametric approaches exist for detecting multivariate statistical change-points,[13–16] but invariably these methods require strict assumptions on the time series mean function. Müller [17] developed an approach to detect discontinuities in derivatives using left and right one-sided kernel smoothers for one-dimensional smooth functions.

    CONTACT  Steven E. Rigdon   [email protected]

    © 2015 Taylor & Francis


    More recently, Ogden and Lynch,[18] Ciuperca,[19] and Battaglia and Protopapas [20] all have results for estimating change-point locations in one-dimensional nonlinear time series. Matteson and James [21] developed a fully nonparametric approach for estimating the locations of multiple change-points in multivariate data. While their work is perhaps the method most relevant to the change-point problem in this article, their method still only applies to data sets where the mean function is piecewise constant.

    The multivariate change-point problem is an important problem that has direct applications in a surprising number of otherwise seemingly unrelated fields. In statistical process control (SPC), the multivariate change-point problem is important to quickly detect and estimate changes in many industrial processes.[22] The US Department of Transportation has applied the multivariate change-point problem to estimate statistical change-points around a speed limit increase from 55 to 65 mph.[13] Additional applications occur in such unrelated fields as biosurveillance, financial market analysis, and hydrology, to name a few.[23,24] In practice, however, imposing strict assumptions on the time series may be impractical when encountering the change-point problem for real world multivariate data. Unfortunately, in the multivariate time series setting there have not been many other good options. In this article we propose a method that attempts to bridge this gap by developing a generalization of the approach from Ogden and Lynch.[18] The method we propose detects and estimates the location of a statistical change-point for multivariate data through a Bayesian analysis on empirical wavelet detail coefficients and applies even when strict assumptions about the true underlying mean function cannot be made.

    2. Background: why wavelets in the change-point problem?

    The attractiveness of wavelets springs from both their simplicity in theory and flexibility in application.[25] From a statistical point of view, wavelets offer alternative methods in data smoothing, density estimation, and multiscale time series analysis.[26] The multiscale characteristic of wavelets is particularly important in our setting because of the flexibility it offers to various change-point problems we might encounter. The wavelet transform divides the original time series into 'scaling coefficient' and 'detail coefficient' components. The scaling coefficients represent varying degrees of time series smoothing or averaging while the details contain information pertaining to how much the function is changing at a certain resolution level. Figure 1 illustrates this concept by displaying a time series along with two example plots of corresponding scaling and detail coefficients. In some sense, wavelets offer a method to analyse the change of a time series through a lens that may be readily 'zoomed in' or 'zoomed out' as required.[27]

    All wavelets we consider in this article constitute an orthonormal basis of the usual inner product space of square integrable functions, $L^2(\mathbb{R})$. In particular, we may approximate any $f \in L^2(\mathbb{R})$

    [Figure 1 (image not reproduced). Panels: 'Sample function with two frequencies' (signal vs. time); 'Level 5 Scaling Coefficients' (coefficient value vs. time); 'Level 6 Detail Coefficients' (magnitude vs. time).]

    Figure 1. Original function (left), scaling coefficients at level 5 smoothing the time series (middle), and detail coefficients capturing how the time series is changing at detail level 6 (right).


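    arbitrarily closely in the $L^2(\mathbb{R})$ sense as

    $$ f(t) = \sum_{k=-\infty}^{\infty} w_{0,k}\,\phi_{0,k}(t) + \sum_{j=j_0}^{\infty} \sum_{k=-\infty}^{\infty} d_{j,k}\,\psi_{j,k}(t), \tag{1} $$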

    where $w_{j,k}$ and $d_{j,k}$ are called the scaling and detail coefficients, respectively. Through an integration which we define below, each $w_{j,k}$ and $d_{j,k}$ coefficient is associated with a particular scaling function $\phi_{j,k}$ and wavelet function $\psi_{j,k}$, respectively. The $\phi_{j,k}$ and $\psi_{j,k}$ are in turn generated from the so-called 'father' wavelet $\phi$ and 'mother' wavelet $\psi$ as

    $$ \{\phi_{j,k}(t) = 2^{-j/2}\,\phi(2^{-j}t - k)\}_{j,k}, \qquad \{\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k)\}_{j,k}. \tag{2} $$

    Explicitly, $w_{j,k}$ and $d_{j,k}$ are given by

    $$ w_{j,k} = \langle f, \phi_{j,k} \rangle = \int_{-\infty}^{\infty} f(t)\,\phi_{j,k}(t)\,dt, \qquad d_{j,k} = \langle f, \psi_{j,k} \rangle = \int_{-\infty}^{\infty} f(t)\,\psi_{j,k}(t)\,dt. $$

    From Equation (2), we see wavelets by definition are simply systems of dilations and translations. The simplest wavelet that we may explicitly express in closed form is the Haar wavelet given by

    $$ \phi(t) = \begin{cases} 1, & 0 \le t < 1, \\ 0, & \text{otherwise.} \end{cases} $$


    […] the finest level of scaling ($w_{jk}$) and detail ($d_{jk}$) coefficients are computed by the formulas

    $$ w_{(J-1),k} = (x_{2k} + x_{2k-1})/\sqrt{2} \quad \text{and} \quad d_{(J-1),k} = (x_{2k} - x_{2k-1})/\sqrt{2}. \tag{3} $$

    We then compute all subsequent levels of scaling and detail coefficients recursively by the formulas

    $$ w_{(j-1),k} = (w_{j,2k} + w_{j,2k-1})/\sqrt{2} \quad \text{and} \quad d_{(j-1),k} = (w_{j,2k} - w_{j,2k-1})/\sqrt{2}. \tag{4} $$

    This process terminates when $j = 0$ and we produce just $2^0 = 1$ additional scaling and detail coefficient. After we take the DWT, we have in total $N - 1$ scaling coefficients and $N - 1$ detail coefficients. Of course, if we chose a different wavelet, the above formulas would also be different. We refer the reader to Daubechies,[25] Nason,[26] and Mallat [28] for in-depth treatments of wavelet theory.
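As a rough illustration of Equations (3) and (4), the Haar pyramid can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function name `haar_dwt`, the use of NumPy, and the dictionary return format are assumptions made here for illustration only.

```python
import numpy as np


def haar_dwt(x):
    """Haar pyramid following Equations (3) and (4).

    Returns two dicts mapping level j (J-1 down to 0) to the arrays of
    scaling and detail coefficients at that level.
    """
    current = np.asarray(x, dtype=float)
    n = current.size
    if n < 2 or n & (n - 1):
        raise ValueError("length must be dyadic, N = 2^J")
    J = n.bit_length() - 1
    scaling, detail = {}, {}
    for j in range(J - 1, -1, -1):
        first = current[0::2]   # x_{2k-1} (or w_{j+1,2k-1}) in 1-based indexing
        second = current[1::2]  # x_{2k}   (or w_{j+1,2k})
        scaling[j] = (second + first) / np.sqrt(2.0)
        detail[j] = (second - first) / np.sqrt(2.0)
        current = scaling[j]    # recurse on the smoothed series
    return scaling, detail


scaling, detail = haar_dwt([1, 2, 3, 4, 5, 6, 7, 8])
n_details = sum(d.size for d in detail.values())  # 4 + 2 + 1 = N - 1
```

For the input (1, 2, ..., 8) the sketch yields 4 + 2 + 1 = 7 = N − 1 detail coefficients. Because the transform is orthonormal, the sum of squares of all detail coefficients plus the squared level-0 scaling coefficient equals the sum of squares of the input.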

    Given a discrete noisy signal, we can take its DWT and analyse the resulting detail coefficients to distinguish the signal from the statistical noise. Recall that a function is said to be smooth if it possesses derivatives of all orders. Donoho and Johnstone [29] showed the DWT of a noisy signal with an underlying smoothly varying mean function results in a sparse representation of the detail coefficients provided the signal-to-noise ratio is sufficiently high. In particular, the contribution of the signal to the high-level detail coefficient magnitudes should be close to zero, leaving the energy of the true signal concentrated in a relatively sparse number of low-level detail coefficients representing overall signal change. The noise component of the original signal, however, does not have a sparse representation; rather, it is again transformed to noise after the DWT and spread throughout all resolution levels. We will exploit this difference between signal and noise detail representation in our change-point detection and estimation method described below. Explicitly we model our original time series as

    $$ x_i = g(i) + \varepsilon_i, $$

    where $g(\cdot)$ is our true underlying smooth (except possibly at a change-point) mean function observed for a discrete number of equally spaced time intervals and $\varepsilon_i$ is some additive noise component. Next, we take the DWT of our time series and obtain a 'transformed' data model of the following form:

    $$ d^*_{jk} = d_{jk} + \eta_{jk}, $$

    where $d^*_{jk}$ is the empirical detail coefficient we actually observe. In the case of the Haar wavelet, $d^*_{jk}$ would be the result of recursively applying Equations (3) and (4). Next, $d_{jk}$ is the true (but unknown) detail coefficient of the underlying smooth mean function we wish to estimate. Finally, $\eta_{jk}$ is the transformed additive noise component from the original time series, which transforms again to noise.[26]

    If we assume $\varepsilon_i$ is generated from a Gaussian process, then $\eta_{jk}$ will also be Gaussian.[30] Wang [31] connected these properties of the DWT to the change-point problem by recognizing that, under suitable conditions, the largest detail coefficients arise where the time series is changing most rapidly and are probably not attributable to noise. Wang then hypothesized that the places where the time series is most rapidly changing may be due to a statistical change-point. While Wang's method works well for change-point problems with relatively high signal-to-noise ratios, it becomes much less reliable as the additive noise is increased. Additionally, there is the issue of determining how best to combine the information from different detail levels to use in the analysis. In the following section we develop a method that capitalizes on these statistical properties while addressing the shortcomings of Wang's method in a complete Bayesian model framework.

    3. Bayesian-wavelet approach to the change-point problem

    From Section 2 we know any additive noise component of a time series is again transformed to anadditive noise component after a DWT. Ogden and Lynch [18] exploited these properties of the DWT


    [Figure 2 (image not reproduced). Top panels: 'One dimensional step function' and 'Sine function with a shift' (signal vs. units of time). Bottom panels: 'Detail Coefficients (step function)' and 'Detail Coefficients (sine function with a shift)' (resolution level vs. translate; panel label 'Standard transform Daub cmpct on ext. phase N=10').]

    Figure 2. Two example mean functions with a change-point at time point 81 (top) along with their respective detail coefficients (bottom). Each detail level is normalized by its $l^\infty$-norm. Notice at the finest four resolution levels the detail coefficients are essentially identical to each other.

    by proposing a method for estimating the change-point location of a one-dimensional time series by applying Bayesian techniques in the wavelet domain. In this section, we generalize a similar methodology now to an arbitrary dimensional time series and extend the approach to answer the inference question.

    The DWT allows us to analyse a time series at varying resolution levels and stores the resulting details of smooth functions in a similar way. Figure 2 displays two examples of a time series mean function that is smooth except at a change-point at time point 81 (top), along with the respective detail coefficient values (bottom). Observe that the detail coefficient values are essentially identical for the finest three resolution levels (levels 4, 5, and 6), despite the fact that the mean functions are quite different. While some coefficient values at the lowest four resolution levels do begin to diverge, at least 112 of the total 127 detail coefficients in this 128 element time series very closely agree. This suggests that, from the wavelet perspective, the two change-points in Figure 2 are equally difficult to detect when using just the highest detail coefficient levels.

    The phenomenon in Figure 2 illustrates the sparsity property of the DWT and holds in general for any smoothly varying mean functions which share a common change-point. In particular, any otherwise smooth function with a change-point should have a similar detail coefficient representation at the finest levels as a step function with a change-point at the same location. This observation provides the intuition behind why an analysis of wavelet detail coefficients may be an effective approach in estimating the change-point location of time series with otherwise smooth functions such as those shown in Figure 2.

    Consider a multi-dimensional time series of independent observations $\{\mathbf{x}_i\}_{i=1}^{N}$ for $N \in \mathbb{N}$, where $\mathbf{x}_i$ is a $p$-dimensional vector, such that

    $$ \mathbf{x}_i \sim N_p(\boldsymbol{\mu}_i, \Sigma). \tag{5} $$


    In the typical case where Bayesian or likelihood techniques are applied to the multivariate change-point problem, $\boldsymbol{\mu}_i$ is assumed to be a $p$-dimensional step function. For our more general analysis, however, $\boldsymbol{\mu}_i$ is assumed to be generated by a $p$-dimensional function, $g(\cdot)$, smoothly changing except at a single point in time where the shift occurs. Throughout this article, we denote the unknown time series change-point location with the symbol $\tau$. We also assume $\Sigma$ is an unknown but constant $p \times p$ covariance matrix throughout our time series. A particular observation of the time series takes the form

    $$ \mathbf{x}_i = g(i) + \boldsymbol{\varepsilon}_i, $$

    where

    $$ \boldsymbol{\varepsilon}_i \sim N_p(\mathbf{0}, \Sigma). $$

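    Next, we let the $N \times p$ matrix $X$ represent our time series, where each row represents an observation at a particular time. Additionally, we introduce the idealized $N \times p$ matrix, $H$, which we compare against $X$:

    $$ X = \begin{pmatrix}
    x_{11} & x_{12} & \cdots & x_{1p} \\
    x_{21} & x_{22} & \cdots & x_{2p} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{\tau 1} & x_{\tau 2} & \cdots & x_{\tau p} \\
    x_{\tau+1,1} & x_{\tau+1,2} & \cdots & x_{\tau+1,p} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{N1} & x_{N2} & \cdots & x_{Np}
    \end{pmatrix}_{N \times p}
    \quad \text{and} \quad
    H = \begin{pmatrix}
    0 & 0 & \cdots & 0 \\
    0 & 0 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & 0 \\
    1 & 1 & \cdots & 1 \\
    \vdots & \vdots & \ddots & \vdots \\
    1 & 1 & \cdots & 1
    \end{pmatrix}_{N \times p} \tag{6} $$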

    The zero rows in $H$ represent those observations before the change-point and the one rows indicate observations after the change-point. We assume for now that our time series is of dyadic length, that is, of length $N = 2^J$ for some $J \in \mathbb{N}$. While this appears to be a restrictive requirement, in practice there are several 'padding' techniques that remedy this apparent difficulty.[26] For example, we might simply concatenate low-level statistical noise to the front end of the time series to achieve the required dyadic length if we have data available from the in-control time series state. Another method is to reflect enough elements of the time series to obtain the required dyadic length. For example, a data set with six elements $(x_1, x_2, x_3, x_4, x_5, x_6)$ could be modified as $(x_3, x_2, x_1, x_2, x_3, x_4, x_5, x_6)$ to achieve the required dyadic length. The latter approach is what we will apply for our practical example in Section 6.2.

    We now take a one-dimensional discrete wavelet transform (DWT) of both $X$ and $H$ column by column, which produces two $(N-1) \times p$ matrices in the wavelet domain, $D^*$ and $Q$. We can normalize each detail level by its $l^\infty$ norm, which has the effect of weighting coefficients from different resolution levels equally. With $l^\infty$ normalized detail levels our subsequent analysis becomes less sensitive to change information contained in the lowest resolution levels. In Section 6 we apply our algorithm both with and without normalized detail coefficients. When the rows of zeroes and ones of $H$ exactly correspond to the rows of $X$ before and after the change-point, the rows of $D^*$ and $Q$ will closely relate to each other in a meaningful manner as we describe below. Since the statistical properties of the additive noise component of the time series are retained after a one-dimensional DWT, it can easily be shown using the linearity of the DWT that the expected covariance matrix after the transform remains $\Sigma$.

    Notationally, we index our detail matrices to emphasize the detail levels of each row. More explicitly, supposing our time series is of length $2^J$, we denote a $p$-dimensional detail coefficient as $\mathbf{d}_{jk} = (d_{jk,1}, d_{jk,2}, \ldots, d_{jk,p})$ where $j$ represents a particular detail level and $k$ the translation index at the given


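    detail level. We then express the DWT of $X$ and $H$ as the matrices $D^*$ and $Q$, where

    $$ D^* = \begin{pmatrix}
    d^*_{01,1} & d^*_{01,2} & \cdots & d^*_{01,p} \\
    d^*_{11,1} & d^*_{11,2} & \cdots & d^*_{11,p} \\
    d^*_{12,1} & d^*_{12,2} & \cdots & d^*_{12,p} \\
    d^*_{21,1} & d^*_{21,2} & \cdots & d^*_{21,p} \\
    \vdots & \vdots & \ddots & \vdots \\
    d^*_{jk,1} & d^*_{jk,2} & \cdots & d^*_{jk,p} \\
    \vdots & \vdots & \ddots & \vdots \\
    d^*_{(J-1)\frac{J}{2},1} & d^*_{(J-1)\frac{J}{2},2} & \cdots & d^*_{(J-1)\frac{J}{2},p}
    \end{pmatrix}_{(N-1) \times p} $$

    and

    $$ Q = \begin{pmatrix}
    q_{01,1} & q_{01,2} & \cdots & q_{01,p} \\
    q_{11,1} & q_{11,2} & \cdots & q_{11,p} \\
    q_{12,1} & q_{12,2} & \cdots & q_{12,p} \\
    q_{21,1} & q_{21,2} & \cdots & q_{21,p} \\
    \vdots & \vdots & \ddots & \vdots \\
    q_{jk,1} & q_{jk,2} & \cdots & q_{jk,p} \\
    \vdots & \vdots & \ddots & \vdots \\
    q_{(J-1)\frac{J}{2},1} & q_{(J-1)\frac{J}{2},2} & \cdots & q_{(J-1)\frac{J}{2},p}
    \end{pmatrix}_{(N-1) \times p}. \tag{7} $$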
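The construction of $D^*$ and $Q$ can be sketched as follows. This is only an illustrative sketch under stated assumptions: it uses the Haar wavelet for simplicity (the text also considers other wavelets), orders rows from the coarsest level to the finest as in Equation (7), and the helper names `reflect_pad_front`, `haar_details`, and `step_matrix` are hypothetical. The padding relies on NumPy's `np.pad` with `mode="reflect"`, which reproduces the six-element reflection example given in the text.

```python
import numpy as np


def reflect_pad_front(X):
    """Reflect leading rows of an N x p array until N is a power of two."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    target = 1 << (n - 1).bit_length()  # smallest power of two >= n
    return np.pad(X, ((target - n, 0), (0, 0)), mode="reflect")


def haar_details(X):
    """Column-by-column Haar detail coefficients of an N x p array.

    Returns an (N-1) x p array with rows ordered d_01, d_11, d_12, d_21, ...
    """
    current = np.asarray(X, dtype=float)
    n = current.shape[0]
    if n < 2 or n & (n - 1):
        raise ValueError("number of rows must be a power of two")
    levels = []
    while current.shape[0] > 1:
        first, second = current[0::2], current[1::2]
        levels.append((second - first) / np.sqrt(2.0))  # detail, Eqs (3)-(4)
        current = (second + first) / np.sqrt(2.0)       # scaling, Eqs (3)-(4)
    return np.vstack(levels[::-1])                      # coarsest level first


def step_matrix(n, p, tau):
    """Idealized matrix H of Equation (6): rows 1..tau are 0, the rest are 1."""
    H = np.zeros((n, p))
    H[tau:, :] = 1.0
    return H


X = np.arange(1.0, 7.0).reshape(-1, 1)      # six observations, p = 1
X_pad = reflect_pad_front(X)                # rows: 3, 2, 1, 2, 3, 4, 5, 6
D_star = haar_details(X_pad)                # (N-1) x p empirical details
Q = haar_details(step_matrix(8, 1, tau=3))  # details of the idealized step
```

With the Haar wavelet, a step column of $H$ produces at most one nonzero detail coefficient per resolution level, which is what makes $Q$ a convenient template for the change-point location.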

    Next, we define $\Delta = [\delta_1, \delta_2, \ldots, \delta_p]$ as the amount our mean function shifts at the unknown change-point. It is important to note that here $\Delta$ is not a vector, but rather a set of coefficients. We use the [ ] notation to distinguish this from, say, $\mathbf{q}_{11} = (q_{11,1}, q_{11,2}, \ldots, q_{11,p})$, which is a $p$-dimensional vector. So in particular, we define $\Delta\mathbf{q}_{11} = (\delta_1 q_{11,1}, \delta_2 q_{11,2}, \ldots, \delta_p q_{11,p})$ using element-by-element scalar multiplication.

    We know the additive noise component of the original time series is again transformed to an additive noise component after the dimension-by-dimension DWT is taken of the original time series. Furthermore, as illustrated in Figure 2, we know, at least for the finest level detail vectors, that the true detail vector values should very closely match the detail vectors of $Q$. In the case when the mean function of our time series is a multivariate step function, all true detail vectors will match the detail vectors of $Q$. In the more general case where the true underlying mean function is unknown, we will ultimately retain only the finest level detail vectors in our final analysis. With these properties in mind, those retained empirical detail coefficient vectors, $\mathbf{d}^*_{jk}$, may therefore be modelled as

    $$ \mathbf{d}^*_{jk} \sim N_p(\mathbf{d}_{jk}, \Sigma) = N_p(\Delta\mathbf{q}_{jk}, \Sigma), $$

    where $\mathbf{d}_{jk}$ is the true detail vector while $\mathbf{d}^*_{jk} = (d^*_{jk,1}, d^*_{jk,2}, \ldots, d^*_{jk,p})$ and $\mathbf{q}_{jk} = (q_{jk,1}, q_{jk,2}, \ldots, q_{jk,p})$ are the $jk$ rows of the matrices $D^*$ and $Q$, respectively. Using Bayes' theorem, our posterior distribution of $\tau$, $\Delta$, and $\Sigma$ takes the form of the product of our likelihood and prior distribution; that is,

    $$ p(\tau, \Delta, \Sigma \mid D^*) \propto \prod_j \prod_k f(\mathbf{d}^*_{jk} \mid \tau, \Delta, \Sigma)\, p_0(\tau, \Delta, \Sigma), \tag{8} $$

    where $f$ is a $p$-dimensional multivariate normal probability density function.

    In Equation (8) we use the double index notation to emphasize that we are taking the product over distinct detail coefficients by their resolution and translation indices. In our model $\Sigma$ is a constant but unknown covariance matrix. Following the discussion above, any prior covariance matrix


    information we have in our original time series directly applies after our transform. For example, we could put a Wishart distribution as an informative prior on $\Sigma$ if we have sufficient prior knowledge of $\Sigma$. For the most general case, however, we will apply Jeffreys' noninformative prior given as $p_0(\Delta, \Sigma, \tau) \propto |\Sigma|^{-1/2}$. We also note that implicit in this prior is the assignment of a uniform prior to the change-point location throughout the time series. Our posterior distribution takes the form

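    $$ p(\tau, \Delta, \Sigma \mid D^*) \propto |\Sigma|^{-m/2} \exp\left\{ -\frac{1}{2} \sum_j \sum_k (\mathbf{d}^*_{jk} - \Delta\mathbf{q}_{jk})^T \Sigma^{-1} (\mathbf{d}^*_{jk} - \Delta\mathbf{q}_{jk}) \right\} |\Sigma|^{-1/2}, $$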

    where $m$ represents the actual number of detail coefficients used in the analysis. In the appendix, we provide details of the calculations where we integrate out $\Delta$ and $\Sigma$ to arrive at the marginalized posterior distribution function that we apply in Sections 6 and 7,

    $$ p(\tau \mid D^*) \propto C^{-1/2} \left| \sum_j \sum_k \mathbf{d}^*_{jk}\mathbf{d}^{*T}_{jk} - \frac{1}{C} B B^T \right|^{-(m-p-1)/2}, \tag{9} $$

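    where

    $$ A = \sum_j \sum_k \mathbf{d}^{*T}_{jk} \Sigma^{-1} \mathbf{d}^*_{jk}, \quad B = \sum_j \sum_k q_{jk}\,\mathbf{d}^*_{jk}, \quad B^T = \sum_j \sum_k q_{jk}\,\mathbf{d}^{*T}_{jk}, \quad \text{and} \quad C = \sum_j \sum_k q_{jk}^2. $$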

    Formally we estimate the change-point of the time series as $\arg\max_{\tau} p(\tau \mid D^*)$. In particular, there are $N - 1$ possible values of $\tau$ and with probability one a maximum value always exists. Notice that Equation (9) is neither wavelet nor detail level specific. Depending on what we know (or do not know) about the time series, different wavelet- and detail-level combinations may be more appropriate. Depending on the true underlying mean function of the time series, we found through simulation studies that the choice of wavelet had a minor, but noticeable, effect on correctly estimating the change-point location. In the simplest case, when the mean function is represented by a multivariate step function, studies show it is also the simplest wavelet (i.e. the Haar wavelet) that performs marginally better. In the case of a smoothly varying mean function, the Daubechies 10-tap wavelet became the best choice for correctly estimating the change-point location.

    We also need to decide which detail levels to apply. This decision is fairly straightforward depending on what is known about the true mean function. In general the more applicable detail vectors that we can use in Equation (9), the more confidence we will be able to attribute to our conclusions. So long as the mean function is smooth except at the change-point location then our model assumptions apply and at least the finest three or four detail levels should be applied. If more information about the mean is available, it may be optimal to use more detail levels. For example in the case of a multivariate step function, all detail levels should be applied.
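To make the estimation step concrete, the following sketch evaluates Equation (9) over all $N - 1$ candidate values of $\tau$ and takes the arg max. It is a minimal sketch under several assumptions, not the authors' implementation: it uses the Haar wavelet and, by default, all detail levels (the step-function case); it reads $q_{jk}$ as a scalar, since every column of $H$, and hence of $Q$, is identical; candidates for which $C = 0$ (which can occur when only fine Haar levels are kept) are given zero posterior mass; and the names `changepoint_posterior`, `min_level`, and the simulated data are hypothetical.

```python
import numpy as np


def haar_details(X):
    """Column-wise Haar detail coefficients, rows ordered coarsest level first."""
    current = np.asarray(X, dtype=float)
    n = current.shape[0]
    if n < 2 or n & (n - 1):
        raise ValueError("number of rows must be a power of two")
    levels = []
    while current.shape[0] > 1:
        first, second = current[0::2], current[1::2]
        levels.append((second - first) / np.sqrt(2.0))
        current = (second + first) / np.sqrt(2.0)
    return np.vstack(levels[::-1])


def changepoint_posterior(X, min_level=0):
    """Normalized p(tau | D*) from Equation (9) for tau = 1, ..., N-1."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    start = 2 ** min_level - 1          # first row belonging to level min_level
    D = haar_details(X)[start:]         # retained empirical detail vectors
    m = D.shape[0]                      # number of detail vectors used
    S = D.T @ D                         # sum of d* d*^T over retained rows
    log_post = np.full(n - 1, -np.inf)
    for tau in range(1, n):
        h = np.zeros((n, 1))
        h[tau:] = 1.0                   # one column of H for this tau
        q = haar_details(h)[start:, 0]  # scalar q_jk per retained row
        C = q @ q
        if C <= 1e-12:
            continue                    # Equation (9) undefined; leave mass at 0
        B = D.T @ q
        sign, logdet = np.linalg.slogdet(S - np.outer(B, B) / C)
        if sign > 0:
            log_post[tau - 1] = -0.5 * np.log(C) - 0.5 * (m - p - 1) * logdet
    post = np.exp(log_post - log_post.max())
    return np.arange(1, n), post / post.sum()


# Hypothetical simulated data: a two-dimensional step function plus noise.
rng = np.random.default_rng(0)
n, tau_true = 64, 40
X = rng.normal(scale=0.1, size=(n, 2))
X[tau_true:] += np.array([3.0, -2.0])   # shift after observation 40
taus, post = changepoint_posterior(X)
tau_hat = int(taus[np.argmax(post)])    # posterior mode of tau
```

Working on the log scale with `np.linalg.slogdet` avoids overflow, since the determinant in Equation (9) is raised to a power of order $m/2$. Passing a larger `min_level` restricts the analysis to the finer detail levels, as the text recommends when the mean function is only known to be smooth away from the change-point.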

    4. Bayesian-wavelet approach to detecting the existence of a change-point

    We determine the existence of a change-point by taking a model selection approach and applying a form of the Schwarz information criterion (SIC). Let $M_1$ denote the model in which a single change-point occurs in the mean function of our time series and let $M_2$ denote the model where no change occurs. We first compute the likelihood of the observed data under each of these two models. Since it is unclear which constants will cancel in the ratio of these two models, we must retain them throughout the calculations. Then similar to our previous derivation of Equation (9), we obtain the following


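    likelihood for $M_1$:

    $$ P(D^* \mid M_1) = K (2\pi)^{(p-mp)/2}\, 2^{mp/2}\, \Gamma_p\!\left(\frac{m}{2}\right) C^{-1/2} \left| \sum_j \sum_k \mathbf{d}^*_{jk}\mathbf{d}^{*T}_{jk} - \frac{1}{C} B B^T \right|^{-(m-p-1)/2}, \tag{10} $$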

    where $\Gamma_p(\cdot)$ is the multivariate gamma function defined as

    $$ \Gamma_p(x) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma[x + (1 - i)/2], $$

    $K$ is a constant common to both models, and all other terms are as previously defined.

    In $M_2$, calculations are simplified since $\Delta$ is assumed to be the $p$-dimensional zero vector. Once again adopting a similar approach as before we obtain the likelihood of observing our data under $M_2$:

    $$ P(D^* \mid M_2) = K (2\pi)^{-mp/2}\, 2^{(m+1)p/2}\, \Gamma_p\!\left(\frac{m+1}{2}\right) \left| \sum_j \sum_k \mathbf{d}^*_{jk}\mathbf{d}^{*T}_{jk} \right|^{-(m-p)/2}. \tag{11} $$
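The multivariate gamma function appearing in Equations (10) and (11) grows very quickly with its argument, so in practice it is usually evaluated on the log scale. The following is a small sketch using only the Python standard library; the function name `log_multivariate_gamma` is an assumption made for illustration.

```python
import math


def log_multivariate_gamma(x, p):
    """log Gamma_p(x) = p(p-1)/4 * log(pi) + sum_{i=1}^{p} log Gamma(x + (1-i)/2)."""
    if x <= (p - 1) / 2.0:
        raise ValueError("requires x > (p - 1) / 2")
    log_pi_term = p * (p - 1) / 4.0 * math.log(math.pi)
    log_gammas = sum(math.lgamma(x + (1 - i) / 2.0) for i in range(1, p + 1))
    return log_pi_term + log_gammas


# For p = 1 the multivariate gamma reduces to the ordinary gamma function.
value = log_multivariate_gamma(5.0, 1)  # equals log(4!) = log(24)
```

As a hand check for $p = 2$: $\Gamma_2(3/2) = \pi^{1/2}\,\Gamma(3/2)\,\Gamma(1) = \pi/2$.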

    We note the difference in the number of free parameters in $M_1$ and $M_2$ is $k_2 - k_1 = p$, namely the dimension of $\Delta$. This suggests a form of the SIC:

    $$ \Delta(\mathrm{SIC}) = -2\left(\log P(D^* \mid M_2) - \log P(D^* \mid M_1)\right) + (k_2 - k_1) \log N. $$

    For our multi-dimensional change-point problem, we maximize Equation (10) over $\tau$ to obtain our final result,

    $$ \Delta(\mathrm{SIC}) = -2\left(\log P(D^* \mid M_2) - \log P(D^* \mid M_1)\right) + p \log(N), \tag{12} $$

    where Equation (12) implicitly assumes equal prior probability of realizing either $M_1$ or $M_2$. In certain instances the modeller may have reason to favour one model over the other and so the prior odds ratio of the two models would not be 1. Recall, the posterior odds ratio may be expressed as

    $$ \frac{P(M_1 \mid D^*)}{P(M_2 \mid D^*)} = \frac{P(D^* \mid M_1)}{P(D^* \mid M_2)} \cdot \frac{P(M_1)}{P(M_2)} = \text{Bayes Factor} \times \frac{P(M_1)}{P(M_2)}. \tag{13} $$

    We may modify Equation (12) to incorporate a prior belief that favours one model over the other a priori. In our setting this may be accomplished by substituting the data-dependent terms in Equation (12) with $-2$ times the log of Equation (13). For the later examples and simulations we provide, we note that each model is given equal weight and Equation (12) is implemented in its present form.

    Our selection process is now a straightforward calculation of $\Delta(\mathrm{SIC})$. We select the no-change model when $\Delta(\mathrm{SIC}) < 0$ and the change-point model when $\Delta(\mathrm{SIC}) > 0$. We note that slightly positive values (e.g. $\Delta(\mathrm{SIC}) \le 3$) should be treated with caution. Although the change-point model is favoured in such cases, the evidence is not particularly strong. Values computed farther from zero (i.e. $\Delta(\mathrm{SIC}) > 3$) denote strong evidence of the existence of a change-point, with more assurance obtained with larger computed values.
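The selection rule can be sketched as a short helper that evaluates Equation (12) as printed, given the two log-likelihoods. This is an illustrative sketch only: the function names `delta_sic` and `interpret` are hypothetical, the log-likelihood inputs are assumed to have been computed elsewhere from Equations (10) and (11), and assigning a value of exactly zero to the no-change model is an assumption, since the text does not cover that case.

```python
import math


def delta_sic(log_p_m1, log_p_m2, p, n):
    """Equation (12): -2 (log P(D*|M2) - log P(D*|M1)) + p log N."""
    return -2.0 * (log_p_m2 - log_p_m1) + p * math.log(n)


def interpret(value, caution=3.0):
    """Map a Delta(SIC) value to the qualitative reading given in the text."""
    if value <= 0.0:
        # Assumption: a value of exactly zero is assigned to the no-change model.
        return "no change"
    if value <= caution:
        return "change-point (weak evidence)"
    return "change-point (strong evidence)"


# Hypothetical log-likelihood values, purely for illustration.
score = delta_sic(log_p_m1=-100.0, log_p_m2=-110.0, p=2, n=128)
label = interpret(score)
```

Keeping both likelihoods on the log scale matters here because the constants retained in Equations (10) and (11) involve multivariate gamma functions and determinants raised to large powers.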

    5. Extending to the case of multiple change-points

    The preceding methods may be extended to the case of time series containing multiple statistical change-points. In this section we demonstrate how an application of the so-called binary segmentation algorithm may be applied in conjunction with the methods developed in Sections 3 and 4 to


    (1) estimate the number of change-points in a nonlinear multivariate time series and (2) estimate the locations of these change-points. In Section 6 we also provide an illustrative example of how this may be applied to a data set containing multiple change-points.

    Assume we observe a $p$-dimensional time series, $X = \{x_i\}_{i=1}^{N}$, where $N \in \mathbb{N}$, such that

    $$x_i \sim N_p(\mu_i, \Sigma).$$

    We assume $\Sigma$ is an unknown constant covariance matrix throughout the time series, while $\mu_i$ is determined by an unknown multivariate mean function $g(\cdot)$ that varies smoothly except at the set of points $\{\tau_i\}_{i=1}^{M}$. We focus our attention on determining $M$ and each $\tau_i$. The binary segmentation algorithm may now be applied as follows:

    (1) Apply Equation (12) to the time series $X$. If $\Delta(\mathrm{SIC})$


    Figure 3. A three-dimensional time series where the mean function is a three-dimensional step function. In particular, a shift occurs at time point 85 in the first and third dimensions. (Panels: Dimension 1, Dimension 2, Dimension 3; horizontal axis: units of time, 0 to 120; vertical axis: signal.)

    Figure 4. Marginal posterior distribution of Equation (9) applied to the time series in Figure 3 with all details used (left) and only the four highest detail levels used (right). Notice in each case the concentrated probability is correctly centred at time point 85, but with a slightly wider credible interval for the case on the right when not all detail coefficients are used. (Panel titles: "Probability of Change-point Location (all detail levels used)" and "Probability of Change-point Location (detail levels 3, 4, 5, and 6 used)"; horizontal axis: units of time, 0 to 120; vertical axis: probability of a change-point.)

    In this case, since the true underlying mean function is a multivariate step function, all detail levels should be applied. In practice, we may not know the structure of the mean function and so would only apply the four highest detail levels. In both cases the Bayesian-wavelet method correctly estimates the change-point location, but with different 95% credible intervals. With all detail levels used we obtain a 95% credible interval of [84, 86], and in the second case, with only the four highest detail levels used, we obtain a slightly less precise 95% credible interval of [82, 89] (see Figure 4).
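    The point estimate and credible interval quoted above are summaries of a discrete posterior over candidate change-point locations. The sketch below shows one common way to compute such summaries from unnormalized log-posterior values. The function name is hypothetical, and the equal-tailed construction is an assumption; the text does not state which interval construction was used.

```python
import numpy as np

def summarize_posterior(log_post, level=0.95):
    """Return (MAP index, (lower index, upper index), pmf) for a discrete posterior.

    log_post holds unnormalized log-posterior values, one per candidate
    location. Indices are 0-based positions in that array.
    """
    log_post = np.asarray(log_post, dtype=float)
    weights = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    pmf = weights / weights.sum()
    cdf = np.cumsum(pmf)
    alpha = 1.0 - level
    lower = int(np.searchsorted(cdf, alpha / 2.0, side="left"))
    upper = int(np.searchsorted(cdf, 1.0 - alpha / 2.0, side="left"))
    return int(np.argmax(pmf)), (lower, upper), pmf

# Toy posterior over five candidate locations (illustrative numbers only).
toy = np.log([0.01, 0.02, 0.90, 0.05, 0.02])
map_index, interval, pmf = summarize_posterior(toy)
```

    The caller is responsible for mapping the returned array positions back to time points or years.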

    To illustrate the power of the Bayesian-wavelet approach, suppose we now impose one period of a sine wave on the same data set in each dimension. This new data set now represents the scenario where the mean function of our time series is nonlinear. Figure 5 depicts this new time series, where we see the change-point at time point 85 is much more obscured. Applying either the likelihood or the pure Bayesian approach to this time series with a nonlinear mean function returns meaningless results, as the assumptions upon which they are based are now violated. Directly inputting the new time series into the MLE algorithm, for example, incorrectly estimates the change-point location at time 63.

    Our Bayesian-wavelet approach, however, easily adapts to this more complicated situation. Using the four highest detail coefficient levels we calculate an SIC of 12.5, indicating the presence of a change-point in the time series. Maximizing Equation (9) for $\tau$ correctly estimates the change-point location once again at time point 85. Figure 6 displays the relative probabilities for the change-point location with a slightly less concentrated 95% credible interval of [82, 88].

    Figure 5. This is the same data set as in Figure 3, only now with one period of the trigonometric function $\sin(2\pi t/128)$ added to the elements in each dimension. (Panels: Dimension 1, Dimension 2, Dimension 3; horizontal axis: units of time, 0 to 120; vertical axis: signal.)

    Figure 6. Marginal posterior distribution from the time series in Figure 5 with concentrated probability at the correct change-point at time point 85. (Panel title: "Probability of Change-point Location"; horizontal axis: units of time, 0 to 120; vertical axis: probability of a change-point.)

    As a final illustrative example, we generate a five-dimensional time series, now with multiple change-points at time points 50, 100, 150, and 200. Figure 7 illustrates the first dimension of this time series, where segments 1, 2, 3, 4, and 5 are centred around mean vectors $\mu_1^T = (0,0,0,0,0)$, $\mu_2^T = (-1,-1,-1,-1,-1)$, $\mu_3^T = (.5,.5,.5,.5,.5)$, $\mu_4^T = (2,2,2,2,2)$, and $\mu_5^T = (-.5,-.5,-.5,-.5,-.5)$, respectively. Applying Equation (12) to the original time series returns a value of 70.5, indicating with near certainty the presence of a statistical change-point; we therefore apply Equation (9) to estimate the location of the change-point. The first application of Equation (9) estimates the change-point location at time point 200, corresponding to the largest shift of the time series. Next we 'segment' the time series into time points 1–200 and 201–256 and repeat this process. Continuing in this way until all segments terminate, the algorithm correctly estimates the presence of four statistical change-points, at time points 51, 100, 151, and 200, each with an associated 95% credible interval lying within ±5 time units of the actual change-point location.
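    The detect, locate, segment, and repeat procedure just described can be sketched as a short recursion. In the sketch below, `detect` and `locate` are hypothetical callables standing in for the Equation (12) decision and the maximization of Equation (9). The toy versions supplied here use a simple squared-error split criterion on noise-free data, purely so that the example runs; they are not the Bayesian-wavelet quantities.

```python
import numpy as np

def split_gains(seg):
    """Squared-error reduction for each split of seg (n x p) into seg[:k] and seg[k:]."""
    n = len(seg)
    k = np.arange(1, n)                       # candidate left-segment lengths
    csum = np.cumsum(seg, axis=0)
    left_mean = csum[:-1] / k[:, None]
    right_mean = (csum[-1] - csum[:-1]) / (n - k)[:, None]
    return k * (n - k) / n * np.sum((left_mean - right_mean) ** 2, axis=1)

def binary_segmentation(x, detect, locate, min_len=2, offset=0):
    """Recursively split x at estimated change-points.

    detect(segment) -> bool   : is there evidence of a change-point?
    locate(segment) -> int k  : split position, segment = segment[:k] + segment[k:]
    Returns sorted split positions relative to the full series.
    """
    n = len(x)
    if n < 2 * min_len or not detect(x):
        return []                             # this segment terminates
    k = locate(x)
    left = binary_segmentation(x[:k], detect, locate, min_len, offset)
    right = binary_segmentation(x[k:], detect, locate, min_len, offset + k)
    return left + [offset + k] + right

# Toy stand-ins (assumptions for illustration, NOT Equations (9) and (12)).
detect = lambda seg: split_gains(seg).max() > 1.0
locate = lambda seg: int(np.argmax(split_gains(seg))) + 1

# Noise-free two-dimensional step signal with shifts after observations 20 and 40.
signal = np.repeat([0.0, 3.0, -1.0], 20)
x = np.column_stack([signal, signal])
change_points = binary_segmentation(x, detect, locate)
```

    As in the five-dimensional example above, the first split lands on the largest shift (here after observation 40) and the remaining change-point is found when the left segment is processed in turn.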

    Figure 7. The left figure represents the first dimension of a five-dimensional time series with change-points at time points 50, 100, 150, and 200. The right figure delineates the time series into segments as estimated by the binary segmentation algorithm in conjunction with Equations (9) and (12). (Panel title: Dimension 1; horizontal axis: units of time, 0 to 250; vertical axis: signal.)

    6.2. Practical example

    We present a practical example implementing the methods developed in this article, involving six hydrological sequences in the Northern Québec Labrador region as represented in Figure 8. In particular, we analyse the streamflow in units of 1/(km$^2$ × s) measured in the springs from 1957 to 1995. It has been noted that a perceptible general decrease in streamflow seemed to occur in the 1980s in this region. The regional proximity of the rivers suggests a likely relationship between the rivers, but the specific covariance structure is unclear a priori. Hence, a multivariate analysis appears more appropriate than six individual univariate river studies. The assertion is that, due to causes attributed perhaps to climate change or other regional factors, a change-point in streamflow has occurred. We would like to determine whether our methods support this assertion and, if so, estimate the change-point year.

    Figure 8. Plots of riverflows of six rivers in the Northern Québec Labrador region. The dashed lines for À la Baleine mark the years for which riverflows are estimated from a linear regression, since the actual data are unavailable. (Panels: Romain, Churchill Falls, Outardes, Manicouagan, Sainte-Marguerite, À la Baleine; horizontal axis: year, 1960 to 1990; vertical axis: streamflow in 1/(km$^2$ s).)

    Perreault et al. [15] originally applied a retrospective Bayesian change-point analysis to this data set. The principal advantage of our Bayesian-wavelet method over Perreault's pure Bayesian approach to this data set is that our method applies even if the true underlying mean function is not a step function. Perreault spends considerable time justifying rather strict assumptions on the data and the choice of hyperparameters used in the model. While Perreault's analysis appears largely valid in this case, the strict assumptions required by such a purely Bayesian approach limit its applicability in more general contexts and often make conclusions less compelling. With the Bayesian-wavelet approach, however, we have no need to elicit informative priors for the mean vectors before and after the unknown change-point, nor for the covariance matrix, to construct our model. As discussed above, we require only that the true underlying mean function be smooth except at the single change-point and that the random component be normally distributed.

    Figure 9. Posterior distribution of a change-point for six hydrological sequences in the Northern Québec Labrador region. (Panel title: "Probability of change-point location"; horizontal axis: year, 1960 to 1990; vertical axis: probability of a change-point.)

    To begin our analysis we note that measurements for one river, À la Baleine, are unavailable for the years 1957–1962 inclusive. To handle this discrepancy we took two different approaches. In the first approach, we simply analysed the data for the common years from 1963 to 1995 inclusive. In the second approach, we treat river flows for À la Baleine as a dependent variable and perform a linear regression against the other five rivers for the years with complete data. With the linear model in hand, we estimate river flows for À la Baleine for the years 1957–1962 using the data from the other rivers, which have complete data sets. The dashed line in Figure 8 for À la Baleine represents these estimated values. After a comparison of our analyses, we find that very similar results are obtained in both cases. As such, we present results from only the latter case.
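    The regression-based fill-in described above can be sketched as follows. This is a minimal illustration using ordinary least squares with an intercept, which is an assumption; the text does not specify the exact form of the linear model. The function name and the synthetic data are hypothetical and do not reproduce the river measurements.

```python
import numpy as np

def impute_by_regression(y, X):
    """Fill missing entries of y (marked np.nan) from a linear fit on X.

    y : (N,) series with gaps, e.g. one river with missing years
    X : (N, q) complete predictor series, e.g. the other rivers
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(y)
    design = np.column_stack([np.ones(len(X)), X])      # intercept + predictors
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    filled = y.copy()
    filled[~observed] = design[~observed] @ coef         # predictions for the gaps only
    return filled

# Synthetic check: 39 "years", 5 predictors, first 6 values of y missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(39, 5))
y_true = 4.0 + X @ np.array([1.0, -2.0, 0.5, 3.0, 0.25])
y = y_true.copy()
y[:6] = np.nan
filled = impute_by_regression(y, X)
```

    Because the synthetic response is exactly linear in the predictors, the filled values match the withheld ones up to rounding; with real data the imputed values carry regression error that this sketch does not propagate.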

    We implement the Daubechies 10-tap wavelet since it has known properties particularly well suited to detecting abrupt time series change.[32] Based on Perreault's analysis, the mean function is some unknown multivariate step function. If this property actually holds, we should be able to apply all detail levels with the Bayesian-wavelet method and arrive at the same answer. Standardizing detail coefficients as described in Section 3, we thus apply all detail coefficients in our analysis. Finally, we note that the length of this time series is not a power of two, as required to apply any DWT. We remedy this situation by simply reflecting the beginning of the time series to achieve the required dyadic length, as described in Section 3.
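    A minimal sketch of padding to dyadic length by reflection is given below. It assumes the first observations are mirrored, edge value included, and prepended to the series; the precise reflection convention of Section 3 is not restated here, so this detail and the function name are assumptions.

```python
import numpy as np

def reflect_to_dyadic(x):
    """Prepend a mirrored copy of the first rows so that len(result) is a power of two."""
    x = np.asarray(x)
    n = len(x)
    target = 1 << (n - 1).bit_length()   # smallest power of two >= n
    pad = target - n                     # always smaller than n
    return np.concatenate([x[:pad][::-1], x], axis=0)

# 39 yearly observations on 6 series are extended to length 64.
series = np.arange(39 * 6, dtype=float).reshape(39, 6)
padded = reflect_to_dyadic(series)
```

    Any change-point index estimated on the padded series must be shifted back by the number of prepended rows before it is reported.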

    With our wavelet parameters in hand, we next must determine whether or not a statistical change-point in the mean vector even exists in our data set. A computation of the SIC returns a value of 14.53, which represents strong evidence for the existence of a statistical change-point. Next, we estimate the location of the change-point by maximizing the Bayesian-wavelet change-point equation for $\tau$. This returns the year 1984 as the change-point location estimate, with posterior probability of nearly 0.85. Furthermore, a 90% credible interval for the change-point location is [1983, 1986] (see Figure 9). We note these results are similar to those of Perreault, who also estimated the change-point year as 1984, but with a 90% credible interval of [1983, 1985].[15]


    7. Simulations

    In order to compare the performance of the Bayesian-wavelet method with a likelihood-based method, we ran simulations and compared how often the estimate of the change-point was within two time units of the true change-point.

    7.1. Multivariate step mean function

    For the simulations in this section we generate multivariate time series with an underlying mean function represented as a multivariate step function. The time series length in each case is 128, where the change-point is randomly selected somewhere in the middle 90% of the time series elements. Before the change-point, elements are centred around the zero mean vector, $\mu_\tau = (0, 0, \ldots, 0)$. After the change-point the mean vector shifts to $\mu_\tau = (\delta, \delta, \ldots, \delta)$. Furthermore, we generate simulated data for two separate covariance matrices: $\Sigma_1 = I$, the identity covariance matrix, and $\Sigma_2$, a covariance matrix with 1's along the diagonal and 0.5's on all off-diagonal elements. For each run of 1000 simulations, we record the percentage of times each method estimates the change-point location within two time units of the true change-point location.

    Before we can begin our simulations, we must decide which detail levels and which wavelet to apply. In the case of a stationary time series with a single change-point, there is no underlying trend contributing to time series change except the single change-point. In this case, the change information contained in the detail coefficients only pertains to the change-point itself. Hence, we apply all wavelet details in our simulations. For the wavelet function itself, we present results from the Haar wavelet, although applying the Daubechies 10-tap wavelet yielded similar results (Table 1).
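    The data-generating setup described above can be sketched as follows. The function names, the 0-based indexing convention (the first `tau` rows are pre-change), and the exact rounding of the "middle 90%" bounds are assumptions made for illustration; the change-point estimators themselves are not shown.

```python
import numpy as np

def simulate_step_series(p, delta, rho=0.0, n=128, rng=None):
    """Draw an n x p Gaussian series whose mean shifts from 0 to delta after row tau.

    rho = 0.0 gives the identity covariance; rho = 0.5 gives 1's on the
    diagonal and 0.5's off the diagonal.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = int(np.ceil(0.05 * n)), int(np.floor(0.95 * n))
    tau = int(rng.integers(lo, hi + 1))            # change-point in the middle 90%
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    x = rng.multivariate_normal(np.zeros(p), cov, size=n)
    x[tau:] += delta                               # shift every dimension by delta
    return x, tau

def hit_rate(estimates, truths, tol=2):
    """Fraction of estimates within tol time units of the true change-point."""
    estimates, truths = np.asarray(estimates), np.asarray(truths)
    return float(np.mean(np.abs(estimates - truths) <= tol))

x, tau = simulate_step_series(p=5, delta=100.0, rho=0.5, rng=np.random.default_rng(1))
```

    Repeating the draw 1000 times per configuration and passing the resulting estimates to `hit_rate` gives entries comparable in form to those reported in Table 1.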

    We compared the effectiveness of the Bayesian-wavelet and likelihood methods for estimating the change-point location by varying the dimension, jump size, and covariance matrix of our time series. The simulation results suggest there is very little difference between the likelihood and Bayesian-wavelet approaches in correctly estimating the change-point. Furthermore, what differences may exist become less important as the dimension increases. Thus, we obtain comparable results to the likelihood method with our Bayesian-wavelet method without the same stringent likelihood time series assumptions.

    Table 1. Proportion of change-point estimates within two time units of the actual change-point after 1000 simulations. In all cases the initial mean vector is $\mu_\tau = (0, 0, \ldots, 0)$ and then shifts to $\mu_\tau = (\delta, \delta, \ldots, \delta)$.

                            Time series dimension
    Method   δ     Σ     2      5      10     25     50     75     100
    BW       0.5   I     0.36   0.60   0.79   0.98   0.98   0.98   0.93
    MLE      0.5   I     0.37   0.59   0.79   0.94   0.98   0.98   0.92
    BW       1.0   I     0.77   0.96   0.99   1.00   1.00   1.00   1.00
    MLE      1.0   I     0.77   0.96   0.99   1.00   1.00   1.00   1.00
    BW       1.5   I     0.95   0.99   1.00   1.00   1.00   1.00   1.00
    MLE      1.5   I     0.96   0.99   1.00   1.00   1.00   1.00   1.00
    BW       2.0   I     0.99   1.00   1.00   1.00   1.00   1.00   1.00
    MLE      2.0   I     0.99   1.00   1.00   1.00   1.00   1.00   1.00
    BW       0.5   Σ2    0.21   0.14   0.20   0.13   0.07   0.06   0.05
    MLE      0.5   Σ2    0.22   0.16   0.21   0.13   0.08   0.07   0.05
    BW       1.0   Σ2    0.60   0.56   0.71   0.62   0.41   0.24   0.12
    MLE      1.0   Σ2    0.60   0.56   0.71   0.63   0.41   0.26   0.14
    BW       1.5   Σ2    0.85   0.81   0.89   0.87   0.76   0.57   0.32
    MLE      1.5   Σ2    0.84   0.81   0.90   0.88   0.76   0.57   0.31
    BW       2.0   Σ2    0.96   0.97   0.98   0.97   0.91   0.88   0.54
    MLE      2.0   Σ2    0.96   0.97   0.98   0.97   0.91   0.88   0.53

    Notes: BW indicates the Bayesian-wavelet approach and MLE indicates the maximum likelihood estimation approach. Simulations are conducted with two covariance matrices: the identity covariance matrix (I) and a covariance matrix with 1's along the diagonal and .5's on all off-diagonal elements (Σ2).


    Table 2. Proportion of runs in which each method estimates the change-point location within 2 time units of the true change-point location, where each run represents 1000 simulations. In all cases the initial mean vector is $\mu_\tau = (\sin(2\pi t/128), \sin(2\pi t/128), \ldots, \sin(2\pi t/128))$ and then shifts to $\mu_\tau = (\sin(2\pi t/128) + 1, \sin(2\pi t/128) + 1, \ldots, \sin(2\pi t/128) + 1)$.

                      Time series dimension
    Method   σ²    2      4      6      8      10     25     50
    BW       0.2   0.98   0.99   1.00   1.00   1.00   1.00   1.00
    MLE      0.2   0.00   0.00   0.00   0.00   0.00   0.00   0.00
    BW       0.4   0.88   0.99   0.99   1.00   1.00   1.00   1.00
    MLE      0.4   0.00   0.00   0.00   0.00   0.00   0.00   0.00
    BW       0.6   0.71   0.92   0.99   0.99   0.99   1.00   1.00
    MLE      0.6   0.00   0.00   0.00   0.00   0.00   0.00   0.00
    BW       0.8   0.60   0.87   0.94   0.97   0.99   1.00   1.00
    MLE      0.8   0.01   0.01   0.00   0.00   0.00   0.00   0.00
    BW       1.0   0.53   0.78   0.89   0.96   0.97   1.00   1.00
    MLE      1.0   0.02   0.01   0.00   0.00   0.00   0.00   0.00
    BW       1.2   0.44   0.70   0.89   0.90   0.95   0.99   1.00
    MLE      1.2   0.01   0.00   0.00   0.00   0.00   0.00   0.00
    BW       1.4   0.39   0.62   0.76   0.83   0.89   0.99   0.99
    MLE      1.4   0.01   0.01   0.00   0.00   0.00   0.00   0.00
    BW       1.6   0.31   0.54   0.69   0.79   0.87   0.98   0.99
    MLE      1.6   0.01   0.00   0.00   0.00   0.00   0.00   0.00

    Notes: BW indicates the Bayesian-wavelet approach and MLE indicates the maximum likelihood estimation approach. Throughout the simulations the covariance matrix used is the identity multiplied by σ².

    7.2. Multivariate piecewise smooth function with a single mean function shift 

    We next investigate how these methods perform when the underlying mean function does not conform to a multivariate step function. In particular, since the Bayesian-wavelet method requires only the underlying mean function to be smooth except at the change-point, we consider a multivariate time series with a nonconstant smoothly varying mean function.

    We generate time series with a smoothly varying mean function except at a single change-point. Specifically, we set the initial mean to $\mu_t = \sin(2\pi t/128)\mathbf{1}$, $t = 1, 2, \ldots, \tau$, and then after the change-point the mean vector becomes $\mu_t = \sin(2\pi t/128)\mathbf{1} + \mathbf{1}$, $t = \tau + 1, \tau + 2, \ldots, 128$. That is, the shift vector is $\boldsymbol{\Delta} = (1, 1, \ldots, 1)$ for all simulations. We then incrementally adjust the variance of the additive noise by changing the diagonal terms of the covariance matrix. We set our covariance matrix equal to the identity multiplied by the constant $\sigma^2$ as given in Table 2. The change-point is randomly selected from the middle 90% of the time series, and the Daubechies 10-tap wavelet is applied using the four highest detail coefficient levels.
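    A minimal sketch of this data-generating process is shown below. The function name, the 0-based indexing (the first `tau` rows correspond to $t = 1, \ldots, \tau$), and the rounding used for the "middle 90%" bounds are assumptions for illustration.

```python
import numpy as np

def simulate_sine_series(p, sigma2, n=128, rng=None):
    """Draw an n x p series with mean sin(2*pi*t/128) in every dimension,
    shifted up by 1 after time tau, plus independent N(0, sigma2) noise."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = int(np.ceil(0.05 * n)), int(np.floor(0.95 * n))
    tau = int(rng.integers(lo, hi + 1))
    t = np.arange(1, n + 1)                               # time points 1, ..., n
    mean = np.sin(2 * np.pi * t / 128)[:, None] * np.ones(p)
    mean[tau:] += 1.0                                     # shift vector (1, ..., 1)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n, p))  # covariance sigma2 * I
    return mean + noise, tau

# With sigma2 = 0 the draw reduces to the mean function itself.
x, tau = simulate_sine_series(p=4, sigma2=0.0, rng=np.random.default_rng(2))
```

    Setting `sigma2` to the values 0.2 through 1.6 reproduces the noise levels listed in Table 2.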

    Simulation results provide evidence that the Bayesian-wavelet method does well at seeing through the additive noise component of the time series and estimating the true change-point location. Applying Equation (9) exactly as we did in Section 7.1, only now with just the four highest detail levels and the Daubechies 10-tap wavelet, we have a method that easily adapts to estimate change-points in a very different time series. Methods such as MLE or a purely Bayesian approach that make strict assumptions on the true underlying mean function do not share this same flexibility. The oscillating mean function violates the likelihood assumptions in such a way that the MLE method has no ability to correctly estimate the change-point location. Only in the lower-dimensional cases with high variance, when the time series more closely resembles pure noise, does the MLE register a few correct estimates, and by chance alone. In the other cases the geometry of the time series forces the MLE method away from the true change-point location.

    8. Conclusion

    In this article we presented a methodology for both inferring the existence of one or more statistical change-points in a multivariate time series and estimating their location. We see this general approach is not limited to just changes in mean, but can also be adapted to estimate covariance structure change-point locations as well. Finally, it can be shown that Equation (5) is invariant to dimension-preserving linear transformations. This property suggests applications to the change-point problem for high-dimensional time series in conjunction with a dimension reduction through a random matrix multiplication. All these topics are currently under investigation.

    Another interesting aspect of this approach is how it may be used as an indirect tool to validate certain data set assumptions. When parametric methods such as MLE or purely Bayesian models are applied to infer and estimate the location of a single change-point in a multivariate time series, the true underlying mean function is typically assumed to be a multivariate step function. In principle, using all detail levels of our Bayesian-wavelet method should always return very nearly identical change-point location estimates in such cases. If a discrepancy exists between the above parametric methods and our Bayesian-wavelet method, then either the time series signal-to-noise ratio is not sufficiently high or the model assumptions are simply not valid.

    We found that our multivariate Bayesian-wavelet approach for detecting statistical change-points performs comparably with the classical likelihood method when the true mean function of the time series is a multivariate step function. The advantage of our approach is seen in how our method also easily extends to more general situations. The simulations demonstrate how the likelihood method fails when its model assumptions become invalid, but also show how the Bayesian-wavelet method still performs well. We chose a multivariate trigonometric function as an example in our simulations, but the approach applies equally well to any other such piecewise smooth multivariate function. We thus conclude that the Bayesian-wavelet method affords the modeller greater flexibility in much more general situations and potentially serves as a valuable diagnostic tool in the setting of the multivariate change-point problem.

    Acknowledgments

    We would like to thank both Professor Darrin Speegle and the anonymous referees for their careful consideration of this paper. Their suggestions and helpful advice certainly improved the final form of this paper.

    Disclosure statement

    No potential conflict of interest was reported by the authors.

    References

    [1] Montgomery D. Introduction to statistical quality control. 6th ed. Hoboken, NJ: Wiley; 2009.
    [2] Worsley K. On the likelihood ratio test for a shift in location of normal populations. J Amer Statist Assoc. 1979;74:365–367.
    [3] Smith AFM. A Bayesian approach to inference about a change-point in a sequence of random variables. Biometrika. 1975;62:407–416.
    [4] Barry D, Hartigan J. A Bayesian analysis for change point problems. J Amer Statist Assoc. 1993;88(421):309–319.
    [5] Chib S. Estimation and comparison of multiple change-point models. J Econ. 1998;86(2):221–241.
    [6] Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
    [7] Chen J, Gupta AK. Testing and locating variance change-points with application to stock prices. J Amer Statist Assoc. 1997;92:739–747.
    [8] Zamba K, Hawkins D. A multivariate change-point model for change in mean vector and/or covariance structure. J Quality Technol. 2009;41(3).
    [9] Carlin B, Gelfand A, Smith A. Hierarchical Bayesian analysis of changepoint problems. Appl Stat. 1992;389–405.
    [10] Pettitt A. A non-parametric approach to the change-point problem. Appl Stat. 1979;126–135.
    [11] Bai J. Estimation of a change point in multiple regression models. Rev Econ Stat. 1997;79(4):551–563.
    [12] Sullivan J, Woodall W. Change-point detection of mean vector or covariance matrix shifts using multivariate individual observations. IIE Trans. 2000;32(6):537–549.
    [13] Chen J, Gupta AK. Parametric statistical change point analysis. New York: Birkhauser; 2012.
    [14] Horváth L, Kokoszka P. Testing for changes in multivariate dependent observations with an application to temperature changes. J Multivariate Anal. 1999;68:96–119.
    [15] Perreault L, Parent E, Bernier J, Bobée B, Parent E. Retrospective multivariate Bayesian change-point analysis: A simultaneous single change in the mean of several hydrological sequences. J Multivariate Anal. 2000;235:221–241.
    [16] Son YS, Kim SW. Bayesian single change point detection in a sequence of multivariate normal observations. Statistics. 2005;39(5):373–387.
    [17] Müller HG. Change-points in nonparametric regression analysis. Ann Stat. 1992;20:737–761.
    [18] Ogden R, Lynch J. Bayesian analysis of change-point models. Lecture Notes Stat. 1999;141:67–82.
    [19] Ciuperca G. Estimating nonlinear regression with and without change-points by the LAD method. Ann Inst Stat Math. 2011;63:717–743.
    [20] Battaglia F, Protopapas MK. Multi-regime models for nonlinear nonstationary time series. Comput Stat. 2012;27:319–341.
    [21] Matteson DS, James NA. A nonparametric approach for multiple change point analysis of multivariate data. J Amer Statist Assoc. 2014;109:334–345.
    [22] Mason R, Young J. Multivariate statistical process control with industrial applications. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2002.
    [23] Perreault L, Bernier J, Bobée B, Parent E. Change-point analysis in hydrometeorological time series. Part 1. The normal model revisited. J Multivariate Anal. 2000;235:221–241.
    [24] Wagner M. Handbook of biosurveillance. Burlington, MA: Elsevier Academic Press; 2006.
    [25] Daubechies I. Ten lectures on wavelets. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1992.
    [26] Nason G. Wavelet methods in statistics with R. New York: Springer Science+Business Media, LLC; 2008.
    [27] Vidakovic B. Statistical modeling by wavelets. Danvers, MA: Wiley; 1999.
    [28] Mallat SG. Theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans Pattern Anal Mach Intell. 1989;11(7):674–693.
    [29] Donoho DL, Johnstone JM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.
    [30] Mardia K, Kent J, Bibby J. Multivariate analysis. New York: Academic Press; 1979.
    [31] Wang Y. Jump and sharp cusp detection by wavelets. Biometrika. 1995;82:385–97.
    [32] Jensen A, la Cour-Harbo A. Ripples in mathematics: the discrete wavelet transform. Berlin: Springer; 2001.

    Appendix

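    We derive Equation (9) beginning with the posterior distribution

    $$p(\tau, \boldsymbol{\Delta}, \Sigma \mid D^*) \propto |\Sigma|^{-m/2} \exp\left\{-\frac{1}{2}\sum_j \sum_k (\mathbf{d}_{jk} - \mathbf{q}_{jk})^T \Sigma^{-1} (\mathbf{d}_{jk} - \mathbf{q}_{jk})\right\} |\Sigma|^{-1/2}.$$

    Here, $m$ represents the actual number of detail coefficients used in the analysis. We integrate out $\boldsymbol{\Delta}$ and $\Sigma$ to obtain the marginal posterior distribution function

    $$p(\tau \mid D^*) \propto \int_{PD(p)} \int_{\mathbb{R}^p} |\Sigma|^{-(m+1)/2} \exp\left\{-\frac{1}{2}\sum_j \sum_k (\mathbf{d}_{jk} - \mathbf{q}_{jk})^T \Sigma^{-1} (\mathbf{d}_{jk} - \mathbf{q}_{jk})\right\} d\boldsymbol{\Delta}\, d\Sigma, \tag{A1}$$

    where $PD(p)$ represents the space of $p$-dimensional positive-definite matrices.

    Notice that, by how $Q$ is defined, all the elements of any $\mathbf{q}_{jk}$ are identical. With this observation in mind, we let $q_{jk}$ be a scalar representative for a given row of $Q$, corresponding to the value of each element in that particular row. Next, we let $\boldsymbol{\Delta}$ represent a vector of the mean function shift at the change-point in the natural way. With this change of notation in hand, we may equivalently write Equation (A1) as

    $$p(\tau \mid D^*) \propto \int_{PD(p)} \int_{\mathbb{R}^p} |\Sigma|^{-(m+1)/2} \exp\left\{-\frac{1}{2}\sum_j \sum_k (\mathbf{d}_{jk} - q_{jk}\boldsymbol{\Delta})^T \Sigma^{-1} (\mathbf{d}_{jk} - q_{jk}\boldsymbol{\Delta})\right\} d\boldsymbol{\Delta}\, d\Sigma. \tag{A2}$$

    Expanding the exponent of Equation (A2), we obtain

    $$\begin{aligned}
    p(\tau \mid D^*) &\propto \int_{PD(p)} \int_{\mathbb{R}^p} |\Sigma|^{-(m+1)/2} \exp\left\{-\frac{1}{2}\sum_j \sum_k \left(\mathbf{d}_{jk}^T \Sigma^{-1} \mathbf{d}_{jk} + q_{jk}\boldsymbol{\Delta}^T \Sigma^{-1} q_{jk}\boldsymbol{\Delta} - q_{jk}\boldsymbol{\Delta}^T \Sigma^{-1} \mathbf{d}_{jk} - \mathbf{d}_{jk}^T \Sigma^{-1} q_{jk}\boldsymbol{\Delta}\right)\right\} d\boldsymbol{\Delta}\, d\Sigma \\
    &= \int_{PD(p)} \int_{\mathbb{R}^p} |\Sigma|^{-(m+1)/2} \exp\left\{-\frac{1}{2}\left(A + C\,\boldsymbol{\Delta}^T \Sigma^{-1} \boldsymbol{\Delta} - \boldsymbol{\Delta}^T \Sigma^{-1} B - B^T \Sigma^{-1} \boldsymbol{\Delta}\right)\right\} d\boldsymbol{\Delta}\, d\Sigma,
    \end{aligned} \tag{A3}$$

    where

    $$A = \sum_j \sum_k \mathbf{d}_{jk}^T \Sigma^{-1} \mathbf{d}_{jk}, \quad B = \sum_j \sum_k q_{jk}\mathbf{d}_{jk}, \quad B^T = \sum_j \sum_k q_{jk}\mathbf{d}_{jk}^T, \quad \text{and} \quad C = \sum_j \sum_k q_{jk}^2.$$

    Continuing from Equation (A3), we provide the following detailed calculations:

    $$\begin{aligned}
    p(\tau \mid D^*) &= \int_{PD(p)} \int_{\mathbb{R}^p} |\Sigma|^{-(m+1)/2} \exp\left\{-\frac{C}{2}\left(\frac{A}{C} + \boldsymbol{\Delta}^T \Sigma^{-1} \boldsymbol{\Delta} - \boldsymbol{\Delta}^T \Sigma^{-1} \frac{B}{C} - \frac{B^T}{C} \Sigma^{-1} \boldsymbol{\Delta}\right)\right\} d\boldsymbol{\Delta}\, d\Sigma \\
    &= \int_{PD(p)} |\Sigma|^{-(m+1)/2} \exp\left\{-\frac{1}{2}\left(A - \frac{1}{C} B^T \Sigma^{-1} B\right)\right\} \int_{\mathbb{R}^p} \exp\left\{-\frac{C}{2}\left(\boldsymbol{\Delta} - \frac{B}{C}\right)^T \Sigma^{-1} \left(\boldsymbol{\Delta} - \frac{B}{C}\right)\right\} d\boldsymbol{\Delta}\, d\Sigma \\
    &= \int_{PD(p)} |\Sigma|^{-(m+1)/2} |\Sigma|^{1/2} C^{-1/2} \exp\left\{-\frac{1}{2}\left(A - \frac{1}{C} B^T \Sigma^{-1} B\right)\right\} d\Sigma \\
    &= \int_{PD(p)} |\Sigma|^{-m/2} C^{-1/2} \exp\left\{-\frac{1}{2}\left(\sum_j \sum_k \mathbf{d}_{jk}^T \Sigma^{-1} \mathbf{d}_{jk} - \frac{1}{C} B^T \Sigma^{-1} B\right)\right\} d\Sigma \\
    &= \int_{PD(p)} |\Sigma|^{-m/2} C^{-1/2} \exp\left\{-\frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\left(\sum_j \sum_k \mathbf{d}_{jk}\mathbf{d}_{jk}^T - \frac{1}{C} B B^T\right)\right]\right\} d\Sigma \\
    &\propto C^{-1/2} \left|\sum_j \sum_k \mathbf{d}_{jk}\mathbf{d}_{jk}^T - \frac{1}{C} B B^T\right|^{-(m-p-1)/2},
    \end{aligned} \tag{A4}$$

    where in the last step Equation (9) follows by dropping multiplicative constants and applying the known form of the Wishart distribution.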
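    As a reading aid, the standard identities behind three of the steps in (A4) are collected below. They are general facts rather than part of the original derivation, and the symbol $S$ is introduced here only as shorthand for the positive-definite matrix inside the trace. Here $C > 0$ is a scalar, $\Sigma$ is symmetric positive definite, and the last line requires $m > 2p$ for the integral to converge.

```latex
\begin{align*}
% Completing the square in \Delta
C\,\boldsymbol{\Delta}^T \Sigma^{-1} \boldsymbol{\Delta}
  - \boldsymbol{\Delta}^T \Sigma^{-1} B - B^T \Sigma^{-1} \boldsymbol{\Delta}
  &= C \left(\boldsymbol{\Delta} - \tfrac{1}{C} B\right)^T \Sigma^{-1}
       \left(\boldsymbol{\Delta} - \tfrac{1}{C} B\right)
     - \tfrac{1}{C}\, B^T \Sigma^{-1} B, \\
% Quadratic forms written as traces
\mathbf{d}^T \Sigma^{-1} \mathbf{d}
  &= \operatorname{tr}\!\left(\Sigma^{-1} \mathbf{d}\,\mathbf{d}^T\right), \\
% Normalizing integral of the inverse-Wishart kernel
\int_{PD(p)} |\Sigma|^{-m/2}
  \exp\left\{-\tfrac{1}{2} \operatorname{tr}\!\left(\Sigma^{-1} S\right)\right\} d\Sigma
  &\propto |S|^{-(m-p-1)/2}.
\end{align*}
```

    The first identity produces the Gaussian kernel in $\boldsymbol{\Delta}$ that is integrated over $\mathbb{R}^p$, the second converts the remaining exponent to trace form, and the third gives the final determinant expression.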