A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data
David S. Matteson
Department of Statistical Science
Cornell University
[email protected]/~matteson
Joint work with: Nicholas A. James, ORIE, Cornell University
Sponsorship: National Science Foundation
2014 October
David S. Matteson ([email protected]) Change Point Analysis 2014 October 1 / 40
Introduction
Change Point Analysis
The process of detecting distributional changes within time ordered data
Framework:
- Retrospective, offline analysis
- Multivariate observations
- Estimation: number of change points and their positions
- Hierarchical algorithms
Applications:
- Genetics
- Finance
- Emergency Medical Services
Change Point Analysis
Given independent, time ordered observations $X_1, X_2, \ldots, X_n \in \mathbb{R}^d$
Partition into $k$ homogeneous, temporally contiguous subsets
- $k$ is unknown
- Size of each subset is unknown
Cluster Analysis
Change point analysis is similar to cluster analysis
In cluster analysis we also wish to partition the observations into homogeneous subsets
- Subsets may not be contiguous in time without some constraints
Hierarchical Estimation
Apply methods from clustering to find change points
Exhaustive search is not practical: $O(n^k)$, in general.
May consider Dynamic Programming
We use a hierarchical or sequential approach: $O(kn^2)$
- Divisive: Clusters are divided until each observation is its own cluster
- Agglomerative: Clusters are merged until all observations belong to a single cluster
Hierarchical Estimation: Divisive Progression
Hierarchical Estimation: Agglomerative Progression
Multivariate Homogeneity
Measuring Multivariate Homogeneity
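Suppose $X, Y \in \mathbb{R}^d$ with $X \sim F_x \perp\!\!\!\perp Y \sim F_y$
Let $\phi_x(t) = E(e^{i\langle t, X\rangle})$ and $\phi_y(t) = E(e^{i\langle t, Y\rangle})$ denote the characteristic functions
Define a divergence between $F_x$ and $F_y$ as
$$\mathcal{E}(X, Y; w) = \int_{\mathbb{R}^d} |\phi_x(t) - \phi_y(t)|^2 \, w(t) \, dt,$$
where $w(t)$ denotes an arbitrary positive weight function for which $\mathcal{E}$ exists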
A Weight Function
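A convenient choice for $w(t) > 0$ (Szekely and Rizzo, 2005):
$$w(t; \alpha) = \left( \frac{2\pi^{d/2}\,\Gamma(1 - \alpha/2)}{\alpha 2^{\alpha}\,\Gamma((d + \alpha)/2)} \, |t|^{d+\alpha} \right)^{-1}$$
in which $\Gamma(x)$ is the gamma function
Note: for any fixed $(d, \alpha)$, $w(t; \alpha) \propto |t|^{-(d+\alpha)}$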
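Equivalent Divergence Measures
Let $X$ and $Y$ be independent, and $(X', Y')$ be an iid copy of $(X, Y)$
Theorem
Suppose that $E(|X|^\alpha + |Y|^\alpha) < \infty$ for some $\alpha \in (0, 2]$. Then
$$\begin{aligned}
\mathcal{E}(X, Y; \alpha) &= \int_{\mathbb{R}^d} |\phi_x(t) - \phi_y(t)|^2 \left( \frac{2\pi^{d/2}\,\Gamma(1 - \alpha/2)}{\alpha 2^{\alpha}\,\Gamma((d + \alpha)/2)} \, |t|^{d+\alpha} \right)^{-1} dt \\
&= 2E|X - Y|^\alpha - E|X - X'|^\alpha - E|Y - Y'|^\alpha \\
&< \infty
\end{aligned}$$
- If $0 < \alpha < 2$ then $\mathcal{E}(X, Y; \alpha) = 0$ if and only if $X$ and $Y$ are identically distributed
- If $\alpha = 2$ then $\mathcal{E}(X, Y; \alpha) = 0$ if and only if $EX = EY$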
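The $\alpha = 2$ case can be checked directly from the expectation form of the theorem. The sketch below assumes finite second moments and introduces $\mu_x, \mu_y$ for the means and $\Sigma_x, \Sigma_y$ for the covariance matrices of $X$ and $Y$; these symbols are not in the original slides.

```latex
\begin{aligned}
E|X - Y|^2 &= \operatorname{tr}\Sigma_x + \operatorname{tr}\Sigma_y + |\mu_x - \mu_y|^2, \\
E|X - X'|^2 &= 2\operatorname{tr}\Sigma_x, \qquad E|Y - Y'|^2 = 2\operatorname{tr}\Sigma_y, \\
\mathcal{E}(X, Y; 2) &= 2E|X - Y|^2 - E|X - X'|^2 - E|Y - Y'|^2 = 2\,|\mu_x - \mu_y|^2 .
\end{aligned}
```

So with $\alpha = 2$ the divergence vanishes exactly when the means agree, regardless of any other difference between the distributions, which is why $\alpha < 2$ is needed to detect general distributional changes.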
An Empirical Measure (U-statistics)
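Let $\mathbf{X}_n = \{X_i : i = 1, \ldots, n\}$ and $\mathbf{Y}_m = \{Y_j : j = 1, \ldots, m\}$ be independent iid samples from the distributions of $X, Y \in \mathbb{R}^d$, respectively, such that $E|X|^\alpha, E|Y|^\alpha < \infty$ for some $\alpha \in (0, 2)$
Define
$$\mathcal{E}(\mathbf{X}_n, \mathbf{Y}_m; \alpha) = \frac{2}{mn} \sum_{i=1}^{n} \sum_{j=1}^{m} |X_i - Y_j|^\alpha - \binom{n}{2}^{-1} \sum_{1 \le i < k \le n} |X_i - X_k|^\alpha - \binom{m}{2}^{-1} \sum_{1 \le j < k \le m} |Y_j - Y_k|^\alpha$$
and
$$\mathcal{Q}(\mathbf{X}_n, \mathbf{Y}_m; \alpha) = \frac{mn}{m + n} \, \mathcal{E}(\mathbf{X}_n, \mathbf{Y}_m; \alpha)$$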
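The empirical measure above translates almost term by term into code. The following is a minimal sketch in plain Python, assuming each observation is stored as a tuple of floats; the function names `e_divergence` and `q_statistic` are illustrative choices, not names from the slides or from any package.

```python
import math
from itertools import combinations


def dist_alpha(u, v, alpha):
    """Euclidean distance between points u and v, raised to the power alpha."""
    return math.dist(u, v) ** alpha


def e_divergence(xs, ys, alpha=1.0):
    """Empirical divergence between two samples (each needs at least 2 points)."""
    n, m = len(xs), len(ys)
    # Between-sample term: all n * m cross distances.
    between = sum(dist_alpha(x, y, alpha) for x in xs for y in ys)
    # Within-sample terms: all unordered pairs inside each sample.
    within_x = sum(dist_alpha(a, b, alpha) for a, b in combinations(xs, 2))
    within_y = sum(dist_alpha(a, b, alpha) for a, b in combinations(ys, 2))
    return (2.0 * between / (m * n)
            - within_x / math.comb(n, 2)
            - within_y / math.comb(m, 2))


def q_statistic(xs, ys, alpha=1.0):
    """Scaled divergence: mn / (m + n) times the empirical divergence."""
    n, m = len(xs), len(ys)
    return m * n / (m + n) * e_divergence(xs, ys, alpha)


# Two small, well separated samples in R^2.
xs = [(0.0, 0.0), (1.0, 0.0)]
ys = [(5.0, 0.0), (6.0, 0.0)]
q_value = q_statistic(xs, ys)
```

This direct version costs $O((m + n)^2)$ distance evaluations per call, which is acceptable for illustration but would usually be replaced by cached pairwise distances in practice.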
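Known Location: Two-Sample Homogeneity Test
By the strong law of large numbers for U-statistics (Hoeffding, 1961),
$$\mathcal{E}(\mathbf{X}_n, \mathbf{Y}_m; \alpha) \to \mathcal{E}(X, Y; \alpha)$$
almost surely, as $\min(m, n) \to \infty$.
Under the null hypothesis of equal distributions, i.e. $\mathcal{E}(X, Y; \alpha) = 0$,
$$\mathcal{Q}(\mathbf{X}_n, \mathbf{Y}_m; \alpha) \to \mathcal{Q}(X, Y; \alpha) = \sum_{i=1}^{\infty} \lambda_i Q_i$$
in distribution, as $\min(m, n) \to \infty$. Here, the $\lambda_i > 0$ are constants that depend on $\alpha$ and the distributions of $X$ and $Y$, and the $Q_i$ are iid $\chi^2_1$; see Rizzo and Szekely (2010).
Under the alternative hypothesis of unequal distributions, i.e. $\mathcal{E}(X, Y; \alpha) > 0$,
$$\mathcal{Q}(\mathbf{X}_n, \mathbf{Y}_m; \alpha) \xrightarrow{a.s.} \infty \quad \text{as } \min(m, n) \to \infty.$$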
Single Change Point
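Single Change Point: Unknown Location
Let $Z_1, \ldots, Z_T \in \mathbb{R}^d$ be an independent sequence.
Suppose a heterogeneous sample with observations from two distributions.
Let $\gamma \in (0, 1)$ denote the division of observations, such that $Z_1, \ldots, Z_{\lfloor \gamma T \rfloor} \sim F_x$ and $Z_{\lfloor \gamma T \rfloor + 1}, \ldots, Z_T \sim F_y$ for every sample of size $T$.
Define $\mathbf{X}_\tau = \{Z_1, Z_2, \ldots, Z_\tau\}$ and $\mathbf{Y}_\tau = \{Z_{\tau+1}, Z_{\tau+2}, \ldots, Z_T\}$.
A change point location $\hat\tau_T$ is then estimated as
$$\hat\tau_T = \operatorname*{argmax}_{\tau} \mathcal{Q}_T(\mathbf{X}_\tau, \mathbf{Y}_\tau; \alpha).$$
Theorem
If $\mathcal{E}(X, Y; \alpha) < \infty$ and $\gamma \in (0, 1)$, then
$$\hat\tau_T / T \xrightarrow{a.s.} \gamma, \quad \text{as } T \to \infty.$$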
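The argmax over $\tau$ can be illustrated with a brute-force scan over every admissible split. This is a sketch only: the names `q_statistic` and `estimate_single_change_point` are hypothetical, and the `min_size` parameter (a minimum of two observations per side, so the within-sample averages are defined) is an assumption not stated on the slide.

```python
import math
from itertools import combinations


def q_statistic(xs, ys, alpha=1.0):
    """Scaled empirical divergence Q between two samples of points (tuples)."""
    n, m = len(xs), len(ys)

    def d(u, v):
        return math.dist(u, v) ** alpha

    between = sum(d(x, y) for x in xs for y in ys)
    within_x = sum(d(a, b) for a, b in combinations(xs, 2))
    within_y = sum(d(a, b) for a, b in combinations(ys, 2))
    e = (2.0 * between / (m * n)
         - within_x / math.comb(n, 2)
         - within_y / math.comb(m, 2))
    return m * n / (m + n) * e


def estimate_single_change_point(zs, alpha=1.0, min_size=2):
    """Return (tau_hat, q_hat): the split zs[:tau] | zs[tau:] that maximizes Q."""
    T = len(zs)
    best_tau, best_q = None, -math.inf
    for tau in range(min_size, T - min_size + 1):
        q = q_statistic(zs[:tau], zs[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q


# Univariate toy series with a level shift after the 4th observation.
zs = [(0.0,), (0.2,), (0.1,), (0.3,), (5.0,), (5.2,), (5.1,), (5.3,)]
tau_hat, q_hat = estimate_single_change_point(zs)
```

Here `tau_hat` counts the observations assigned to the first segment, matching the definition of $\mathbf{X}_\tau$ as the first $\tau$ observations.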
Multiple Change Points
Multiple Change Points: Unknown Locations
A generalized bisection approach for sequential estimation
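For $1 \le \tau < \kappa \le T$, define:
$\mathbf{X}_\tau = \{Z_1, Z_2, \ldots, Z_\tau\}$ and $\mathbf{Y}_\tau(\kappa) = \{Z_{\tau+1}, Z_{\tau+2}, \ldots, Z_\kappa\}$
A change point location $\hat\tau$ is then estimated as
$$(\hat\tau, \hat\kappa) = \operatorname*{argmax}_{(\tau, \kappa)} \mathcal{Q}(\mathbf{X}_\tau, \mathbf{Y}_\tau(\kappa); \alpha).$$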
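The joint search over $(\tau, \kappa)$ can be sketched as a double loop in which the right-hand sample ends at $\kappa$ rather than at $T$. The function name `estimate_tau_kappa` and the `min_size` floor on each sample are assumptions made for this illustration.

```python
import math
from itertools import combinations


def q_statistic(xs, ys, alpha=1.0):
    """Scaled empirical divergence Q between two samples of points (tuples)."""
    n, m = len(xs), len(ys)

    def d(u, v):
        return math.dist(u, v) ** alpha

    between = sum(d(x, y) for x in xs for y in ys)
    within_x = sum(d(a, b) for a, b in combinations(xs, 2))
    within_y = sum(d(a, b) for a, b in combinations(ys, 2))
    e = (2.0 * between / (m * n)
         - within_x / math.comb(n, 2)
         - within_y / math.comb(m, 2))
    return m * n / (m + n) * e


def estimate_tau_kappa(zs, alpha=1.0, min_size=2):
    """Return (tau_hat, kappa_hat, q_hat) maximizing Q(zs[:tau], zs[tau:kappa])."""
    T = len(zs)
    best_tau, best_kappa, best_q = None, None, -math.inf
    for tau in range(min_size, T - min_size + 1):
        for kappa in range(tau + min_size, T + 1):
            q = q_statistic(zs[:tau], zs[tau:kappa], alpha)
            if q > best_q:
                best_tau, best_kappa, best_q = tau, kappa, q
    return best_tau, best_kappa, best_q


# The series shifts up after observation 4 and back down after observation 8.
low = [(0.0,), (0.2,), (0.1,), (0.3,)]
high = [(5.0,), (5.2,), (5.1,), (5.3,)]
zs = low + high + low
tau_hat, kappa_hat, q_hat = estimate_tau_kappa(zs)
```

Because the toy series returns to its initial level, comparing the first segment against everything after it would mix two regimes on the right; letting the right-hand sample stop at $\kappa < T$ avoids that mixing.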
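Sequentially Estimating Multiple Change Points
Suppose $k - 1$ change points have been estimated: $\hat\tau_1 < \cdots < \hat\tau_{k-1}$
This partitions the observations into $k$ clusters $C_1, C_2, \ldots, C_k$
Given these clusters, we then apply the single change point procedure within each of the $k$ clusters.
For the $i$th cluster $C_i$, denote the proposed change point location $\hat\tau(i)$ and the associated constant $\hat\kappa(i)$
Now let
$$i^* = \operatorname*{argmax}_{i \in \{1, \ldots, k\}} \mathcal{Q}[\mathbf{X}_{\hat\tau(i)}, \mathbf{Y}_{\hat\tau(i)}(\hat\kappa(i)); \alpha],$$
in which $\mathbf{X}_{\hat\tau(i)}$ and $\mathbf{Y}_{\hat\tau(i)}(\hat\kappa(i))$ are defined with respect to $C_i$
Denote the test statistic as
$$\hat q_k = \mathcal{Q}(\mathbf{X}_{\hat\tau_k}, \mathbf{Y}_{\hat\tau_k}(\hat\kappa_k); \alpha),$$
where $\hat\tau_k = \hat\tau(i^*)$ is the $k$th estimated change point, located within cluster $C_{i^*}$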
The E-Divisive Algorithm Estimation
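The E-Divisive Algorithm: Estimating Location
$A_\tau = \{Z_1, Z_2, \ldots, Z_\tau\}$ and $B_\tau(\kappa) = \{Z_{\tau+1}, Z_{\tau+2}, \ldots, Z_\kappa\}$
Recall, a change point location $\hat\tau$ is estimated as
$$(\hat\tau, \hat\kappa) = \operatorname*{argmax}_{(\tau, \kappa)} \mathcal{Q}(A_\tau, B_\tau(\kappa); \alpha)$$
Thus, we maximize $\frac{mn}{n+m}\,\mathcal{E}(A, B; \alpha)$ over all such subsets $A$ and $B$.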
The E-Divisive Algorithm Inference
The E-Divisive Algorithm: Inference via Permutation Test
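The distribution of the test statistic $q^* = \mathcal{Q}(A_\tau, B_\tau(\kappa); \alpha)\big|_{\tau = \hat\tau}$ is unknown
Significance of the proposed change point is measured via a permutation test
Randomly permute the series, maximize $\frac{mn}{n+m}\,\mathcal{E}(A, B; \alpha)$, record the maximum, and repeat.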
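The permute, maximize, record, repeat loop can be sketched as follows for a first proposed change point. Several details are assumptions made for this illustration and are not taken from the slides: the helper names, the number of permutations `n_perm`, the fixed `seed`, the `min_size` floor, and the p-value convention `(exceed + 1) / (n_perm + 1)`.

```python
import math
import random
from itertools import combinations


def q_statistic(xs, ys, alpha=1.0):
    """Scaled empirical divergence Q between two samples of points (tuples)."""
    n, m = len(xs), len(ys)

    def d(u, v):
        return math.dist(u, v) ** alpha

    between = sum(d(x, y) for x in xs for y in ys)
    within_x = sum(d(a, b) for a, b in combinations(xs, 2))
    within_y = sum(d(a, b) for a, b in combinations(ys, 2))
    e = (2.0 * between / (m * n)
         - within_x / math.comb(n, 2)
         - within_y / math.comb(m, 2))
    return m * n / (m + n) * e


def max_q(zs, alpha=1.0, min_size=2):
    """Largest Q over all admissible (tau, kappa) pairs for the series zs."""
    T = len(zs)
    best = -math.inf
    for tau in range(min_size, T - min_size + 1):
        for kappa in range(tau + min_size, T + 1):
            best = max(best, q_statistic(zs[:tau], zs[tau:kappa], alpha))
    return best


def permutation_p_value(zs, alpha=1.0, min_size=2, n_perm=199, seed=0):
    """Approximate p-value of the observed maximum under random reordering."""
    rng = random.Random(seed)
    q_obs = max_q(zs, alpha, min_size)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(zs)      # copy, so the original order is preserved
        rng.shuffle(shuffled)
        if max_q(shuffled, alpha, min_size) >= q_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


zs = [(0.0,), (0.2,), (0.1,), (0.3,), (5.0,), (5.2,), (5.1,), (5.3,)]
p_value = permutation_p_value(zs)
```

The statistic is re-maximized on every permuted series, so the reference distribution accounts for the fact that the observed value was itself selected as a maximum.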
The E-Divisive Algorithm: Multiple Change Points
If $q^* = Q(A_\tau, B_\tau(\kappa); \alpha)\big|_{\tau = \hat{\tau}}$ is insignificant: STOP
If significant, condition on location, and repeat within clusters
Once again, perform permutation test
However, only permute within each cluster:
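The hierarchical procedure (test the best new split, stop if it is insignificant, otherwise condition on it and permute only within the current clusters) might look like the following sketch. The names `e_divisive_sketch`, `q_stat` and `best_split` are hypothetical, the divergence takes the same assumed form as the scaled statistic above, and each candidate split compares the two halves of one existing cluster, which simplifies the role of κ.

```python
import math
import random
from itertools import combinations


def q_stat(a, b, alpha=1.0):
    """Scaled divergence mn/(n+m) * E(A, B; alpha); assumed form of E."""
    n, m = len(a), len(b)
    between = 2.0 * sum(math.dist(x, y) ** alpha for x in a for y in b) / (n * m)
    within_a = sum(math.dist(x, y) ** alpha for x, y in combinations(a, 2)) / math.comb(n, 2)
    within_b = sum(math.dist(x, y) ** alpha for x, y in combinations(b, 2)) / math.comb(m, 2)
    return n * m / (n + m) * (between - within_a - within_b)


def best_split(series, bounds, min_size, alpha):
    """Best (location, statistic) over all splits inside the current clusters."""
    best = (None, -math.inf)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        for tau in range(lo + min_size, hi - min_size + 1):
            q = q_stat(series[lo:tau], series[tau:hi], alpha)
            if q > best[1]:
                best = (tau, q)
    return best


def e_divisive_sketch(series, sig_lvl=0.05, n_perm=19, min_size=4, alpha=1.0, seed=0):
    """Return estimated change point locations (min_size must be at least 2)."""
    rng = random.Random(seed)
    bounds = [0, len(series)]            # cluster boundaries found so far
    while True:
        tau, q_obs = best_split(series, bounds, min_size, alpha)
        if tau is None:                  # no cluster is large enough to split
            break
        exceed = 0
        for _ in range(n_perm):
            shuffled = []
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                block = list(series[lo:hi])
                rng.shuffle(block)       # permute within each cluster only
                shuffled.extend(block)
            exceed += best_split(shuffled, bounds, min_size, alpha)[1] >= q_obs
        if (exceed + 1) / (n_perm + 1) > sig_lvl:
            break                        # insignificant: STOP
        bounds = sorted(bounds + [tau])  # condition on the new location
    return bounds[1:-1]


# Toy series with three levels (0, 10, 20), 12 observations each.
series = [(level + float(i % 3),) for level in (0.0, 10.0, 20.0) for i in range(12)]
change_points = e_divisive_sketch(series)
print(change_points)
```

Permuting within clusters keeps the change points already accepted fixed, so each new test only asks whether additional structure remains inside the existing clusters.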
The E-Divisive Algorithm ecp Package
The ‘ecp’ R package (CRAN)
Signature:
e.divisive(X, sig.lvl=0.05, R=199, k=NULL, min.size=30, alpha=1)
Arguments:
I X - A T × d matrix representation of a length T time series, with d-dimensional observations.
I sig.lvl - The significance level used for the permutation test.
I R - The maximum number of permutations to perform in the permutation test.
I k - The number of change points to return. If this is NULL, only the statistically significant estimated change points are returned.
I min.size - The minimum number of observations between change points.
I alpha - The index for the test statistic.
Returned list:
I k.hat - Number of clusters created by the estimated change points.
I order.found - The order in which the change points were estimated.
I estimates - Locations of the statistically significant change points.
I considered.last - Location of the last change point that was not found to be statistically significant at the given significance level.
I permutations - The number of permutations performed by each of the sequential permutation tests.
I cluster - The estimated cluster membership vector.
I p.values - Approximate p-values estimated from each permutation test.
Complexity is $O(kT^2)$
Simulation
Simulation Study: Rand Index
Compare E-Divisive with a generalized Wilcoxon/Mann-Whitney approach: the MultiRank procedure of Lung-Yut-Fong et al. (2011)
For two partitions U and V, the Rand index considers all pairs of observations:
Define
{A} Pairs in the same cluster under U and in the same cluster under V
{B} Pairs in different clusters under U and in different clusters under V
$$\text{Rand index} = \frac{\#A + \#B}{\binom{T}{2}}$$
An equivalent definition of the Rand index can be found in Hubert and Arabie (1985)
$$\text{Adjusted Rand} = \frac{\text{Index} - \text{Expected Index}}{\text{Max Index} - \text{Expected Index}} = \frac{\text{Rand} - \text{Expected Rand}}{1 - \text{Expected Rand}}$$
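Both indices are straightforward to compute from two membership vectors. The sketch below counts the pairs in {A} and {B} directly for the Rand index and uses the contingency-table form of Hubert and Arabie (1985) for the adjusted version; the function names are illustrative only.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(u, v):
    """Fraction of observation pairs on which partitions u and v agree."""
    agree = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u == same_v:        # pair belongs to {A} or to {B}
            agree += 1
    return agree / comb(len(u), 2)


def adjusted_rand_index(u, v):
    """Adjusted Rand index from the contingency table of u against v.

    Undefined (division by zero) when both partitions are a single cluster.
    """
    n = len(u)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(u).values())
    sum_cols = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # Expected Index
    max_index = (sum_rows + sum_cols) / 2         # Max Index
    return (sum_cells - expected) / (max_index - expected)


u = [0, 0, 0, 1, 1, 1]   # e.g. true cluster membership
v = [0, 0, 1, 1, 2, 2]   # e.g. estimated cluster membership
print(rand_index(u, v), adjusted_rand_index(u, v))
```

In this example 2 pairs fall in {A} and 8 in {B} out of 15, so the Rand index is 10/15, while the adjusted value is lower because it subtracts the agreement expected by chance.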
A change in variance for univariate normal data

| Method | Correct k | Average Adjusted Rand |
|---|---|---|
| MultiRank | 22/100 | 0.504 |
| E-Divisive | 95/100 | 0.909 |
A change in correlation for bivariate normal data

| Method | Correct k | Average Adjusted Rand |
|---|---|---|
| MultiRank | 72/100 | 0.166 |
| E-Divisive | 92/100 | 0.997 |
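1,000 simulations, 2 CP: $N(0,1)$, $N(\mu,1)$, $N(0,1)$

| T | $\mu$ | Avg. Rand (MultiRank) | Avg. Rand (E-Divisive) | Avg. Adj. Rand (MultiRank) | Avg. Adj. Rand (E-Divisive) |
|---|---|---|---|---|---|
| 150 | 1 | 0.940 | 0.948 | 0.867 | 0.885 |
| 150 | 2 | 0.977 | 0.991 | 0.949 | 0.981 |
| 150 | 4 | 0.981 | 1.000 | 0.958 | 1.000 |
| 300 | 1 | 0.970 | 0.972 | 0.933 | 0.937 |
| 300 | 2 | 0.989 | 0.996 | 0.975 | 0.991 |
| 300 | 4 | 0.991 | 1.000 | 0.979 | 1.000 |
| 600 | 1 | 0.986 | 0.986 | 0.968 | 0.969 |
| 600 | 2 | 0.994 | 0.998 | 0.987 | 0.996 |
| 600 | 4 | 0.995 | 1.000 | 0.990 | 1.000 |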
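A single replicate of this design can be generated as follows. This is only a sketch: the slide does not state where the two change points sit, so three equal-length segments are assumed here, and `simulate_mean_shift` is a hypothetical helper name.

```python
import random


def simulate_mean_shift(T, mu, seed=0):
    """One series N(0,1), N(mu,1), N(0,1) with its true cluster labels.

    Assumption: the two change points split the series into three
    (near-)equal segments; the slides do not give their locations.
    """
    rng = random.Random(seed)
    third = T // 3
    sizes = [third, third, T - 2 * third]
    means = [0.0, mu, 0.0]
    x, labels = [], []
    for cluster, (size, mean) in enumerate(zip(sizes, means)):
        x.extend(rng.gauss(mean, 1.0) for _ in range(size))
        labels.extend([cluster] * size)
    return x, labels


x, labels = simulate_mean_shift(T=150, mu=2.0)
print(len(x), labels[49], labels[50], labels[100])
```

The returned labels play the role of the true partition when the Rand and adjusted Rand indices are computed against an estimated segmentation.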
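1,000 simulations, 2 CP: $N(0,1)$, $N(0,\sigma^2)$, $N(0,1)$

| T | $\sigma^2$ | Avg. Rand (MultiRank) | Avg. Rand (E-Divisive) | Avg. Adj. Rand (MultiRank) | Avg. Adj. Rand (E-Divisive) |
|---|---|---|---|---|---|
| 150 | 2 | 0.731 | 0.902 | 0.471 | 0.785 |
| 150 | 5 | 0.764 | 0.976 | 0.521 | 0.948 |
| 150 | 10 | 0.764 | 0.989 | 0.519 | 0.975 |
| 300 | 2 | 0.744 | 0.924 | 0.490 | 0.834 |
| 300 | 5 | 0.759 | 0.990 | 0.511 | 0.978 |
| 300 | 10 | 0.759 | 0.995 | 0.512 | 0.989 |
| 600 | 2 | 0.742 | 0.970 | 0.488 | 0.933 |
| 600 | 5 | 0.753 | 0.996 | 0.500 | 0.990 |
| 600 | 10 | 0.753 | 0.998 | 0.501 | 0.995 |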
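1,000 simulations, 2 CP: $N(0,1)$, $t_\nu(0,1)$, $N(0,1)$

| T | $\nu$ | Avg. Rand (MultiRank) | Avg. Rand (E-Divisive) | Avg. Adj. Rand (MultiRank) | Avg. Adj. Rand (E-Divisive) |
|---|---|---|---|---|---|
| 150 | 16 | 0.632 | 0.798 | 0.327 | 0.564 |
| 150 | 8 | 0.651 | 0.830 | 0.353 | 0.631 |
| 150 | 2 | 0.679 | 0.846 | 0.395 | 0.666 |
| 300 | 16 | 0.640 | 0.755 | 0.341 | 0.492 |
| 300 | 8 | 0.639 | 0.769 | 0.338 | 0.522 |
| 300 | 2 | 0.680 | 0.809 | 0.396 | 0.596 |
| 600 | 16 | 0.655 | 0.735 | 0.365 | 0.469 |
| 600 | 8 | 0.653 | 0.727 | 0.359 | 0.458 |
| 600 | 2 | 0.697 | 0.813 | 0.420 | 0.608 |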
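1,000 simulations, 2 CP: $N_2(0, I)$, $N_2(\mu, I)$, $N_2(0, I)$

| T | $\mu$ | Avg. Rand (MultiRank) | Avg. Rand (E-Divisive) | Avg. Adj. Rand (MultiRank) | Avg. Adj. Rand (E-Divisive) |
|---|---|---|---|---|---|
| 300 | 1 | 0.656 | 0.698 | 0.363 | 0.406 |
| 300 | 2 | 0.713 | 0.732 | 0.446 | 0.468 |
| 300 | 3 | 0.743 | 0.778 | 0.489 | 0.549 |
| 600 | 1 | 0.991 | 0.994 | 0.981 | 0.987 |
| 600 | 2 | 0.995 | 1.000 | 0.989 | 0.999 |
| 600 | 3 | 0.996 | 1.000 | 0.990 | 1.000 |
| 900 | 1 | 0.994 | 0.996 | 0.987 | 0.991 |
| 900 | 2 | 0.997 | 1.000 | 0.993 | 0.999 |
| 900 | 3 | 0.997 | 1.000 | 0.993 | 1.000 |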
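1,000 simulations, 2 CP: $N_2(0, \Sigma)$, $N_2(0, I)$, $N_2(0, \Sigma)$
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

| T | $\rho$ | Avg. Rand (MultiRank) | Avg. Rand (E-Divisive) | Avg. Adj. Rand (MultiRank) | Avg. Adj. Rand (E-Divisive) |
|---|---|---|---|---|---|
| 300 | 0.5 | 0.663 | 0.729 | 0.373 | 0.455 |
| 300 | 0.7 | 0.712 | 0.728 | 0.444 | 0.462 |
| 300 | 0.9 | 0.745 | 0.743 | 0.491 | 0.488 |
| 600 | 0.5 | 0.674 | 0.676 | 0.391 | 0.386 |
| 600 | 0.7 | 0.724 | 0.672 | 0.462 | 0.370 |
| 600 | 0.9 | 0.745 | 0.834 | 0.492 | 0.673 |
| 900 | 0.5 | 0.692 | 0.635 | 0.415 | 0.322 |
| 900 | 0.7 | 0.724 | 0.678 | 0.464 | 0.398 |
| 900 | 0.9 | 0.747 | 0.966 | 0.494 | 0.928 |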
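1,000 simulations, 2 CP: $N_d(0, \Sigma)$, $N_d(0, I)$, $N_d(0, \Sigma)$
$$\Sigma_{\text{w.o./noise}} = \begin{pmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \rho & \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{pmatrix} \qquad \Sigma_{\text{w/noise}} = \begin{pmatrix} 1 & \rho & 0 & \cdots & 0 \\ \rho & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

| T | d | Avg. Rand (without noise) | Avg. Adj. Rand (without noise) | Avg. Rand (with noise) | Avg. Adj. Rand (with noise) |
|---|---|---|---|---|---|
| 300 | 2 | 0.767 | 0.522 | 0.774 | 0.543 |
| 300 | 5 | 0.912 | 0.816 | 0.736 | 0.463 |
| 300 | 9 | 0.970 | 0.935 | 0.736 | 0.459 |
| 600 | 2 | 0.817 | 0.648 | 0.836 | 0.816 |
| 600 | 5 | 0.993 | 0.984 | 0.631 | 0.626 |
| 600 | 9 | 0.998 | 0.995 | 0.666 | 0.648 |
| 900 | 2 | 0.970 | 0.937 | 0.968 | 0.933 |
| 900 | 5 | 0.998 | 0.996 | 0.644 | 0.342 |
| 900 | 9 | 0.999 | 0.999 | 0.612 | 0.284 |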
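The two covariance structures differ only in which coordinates carry the correlation ρ. A small sketch that builds both as nested lists (the function names are illustrative, and d is assumed to be at least 2):

```python
def sigma_without_noise(d, rho):
    """d x d matrix with ones on the diagonal and rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(d)] for i in range(d)]


def sigma_with_noise(d, rho):
    """Identity matrix except that the first two coordinates have correlation rho.

    The remaining d - 2 coordinates are uncorrelated and act as noise.
    """
    sigma = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    sigma[0][1] = sigma[1][0] = rho
    return sigma


print(sigma_without_noise(3, 0.5))
print(sigma_with_noise(3, 0.5))
```

In the "with noise" design the signal stays in two coordinates while d grows, which is consistent with the lower indices reported in the right-hand columns for d = 5 and d = 9.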
Applications Genetics
Genetics Data
We applied E-Divisive to the aCGH micro-array dataset of 43 individuals with a bladder tumor (Bleakley and Vert, 2011); relative hybridization intensity profile for one individual.
MultiRank (Lung-Yut-Fong et al., 2011): k = 17, adjRand = 0.677
KCPA (Arlot et al., 2012): k = 41, adjRand = 0.658
PELT (Killick et al., 2012): k = 47, adjRand = 0.853
Figure: Four panels, top to bottom: MultiRank, KCPA, PELT, E-Divisive. Each panel plots Signal against Index (0 to 2000).
Applications Finance
Financial Data: Cisco Systems
The E-Divisive procedure was applied to the monthly log returns of the Dow 30
Marginal analysis of Cisco Systems Inc. from April 1990 to January 2010. The procedure found change points at April 2000 and October 2002.
Financial Data: S&P 500 Index
S&P 500: May 20, 1999 − April 25, 2011
Figure: S&P 500 log returns plotted against date (2000 to 2010).
Agglomerative Algorithm
An Agglomerative Algorithm
Given a partition of k clusters $C = \{C_1, C_2, \ldots, C_k\}$, clusters may or may not be single observations
Consider combining a pair of adjacent clusters
The partition that maximizes the goodness-of-fit statistic determines change point locations
An Agglomerative Algorithm: Goodness-of-Fit
Goodness-of-fit statistic $S(k)$: sum the E-distances between adjacent clusters
Given clusters $C = \{C_1, C_2, \ldots, C_k\}$ with $n_i = \#C_i$, define
$$S(k) = \sum_{i=1}^{k-1} \left(\frac{n_i n_{i+1}}{n_i + n_{i+1}}\right) \mathcal{E}^{\alpha}_{n_i, n_{i+1}}(C_i, C_{i+1})$$
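As a worked illustration, S(k) can be evaluated for a partition given by its cluster boundaries. This sketch makes one assumption that the slide does not spell out: within-cluster averages are taken over all ordered pairs (dividing by the squared cluster size), so that a single-observation cluster contributes zero. The function names are hypothetical.

```python
import math


def e_distance(a, b, alpha=1.0):
    """E-distance between two clusters of points (tuples).

    Assumption: within-cluster means run over all n^2 ordered pairs,
    which keeps the quantity defined for single-observation clusters.
    """
    n, m = len(a), len(b)
    between = 2.0 * sum(math.dist(x, y) ** alpha for x in a for y in b) / (n * m)
    within_a = sum(math.dist(x, y) ** alpha for x in a for y in a) / n ** 2
    within_b = sum(math.dist(x, y) ** alpha for x in b for y in b) / m ** 2
    return between - within_a - within_b


def goodness_of_fit(series, bounds, alpha=1.0):
    """S(k): weighted E-distances summed over adjacent clusters.

    `bounds` lists cluster boundaries, e.g. [0, 2, 4] for clusters
    series[0:2] and series[2:4].
    """
    clusters = [series[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
    total = 0.0
    for left, right in zip(clusters[:-1], clusters[1:]):
        n, m = len(left), len(right)
        total += n * m / (n + m) * e_distance(left, right, alpha)
    return total


series = [(0.0,), (0.0,), (10.0,), (10.0,)]
s_two_clusters = goodness_of_fit(series, [0, 2, 4])       # {0, 0} | {10, 10}
s_singletons = goodness_of_fit(series, [0, 1, 2, 3, 4])   # every point alone
print(s_two_clusters, s_singletons)
```

Here the two-cluster partition scores 20 and the all-singletons partition scores 10, so the statistic prefers the partition that matches the change in level.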
The partition that maximizes S(k) is then used to estimate the change point locations.
Figure: Progression of the goodness of fit statistic, and where it is maximized.
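The progression of the statistic can be traced with a naive greedy loop: start from single observations, repeatedly merge the adjacent pair that leaves S largest, and remember the best partition seen. This is a sketch under stated assumptions (greedy merge rule, within-cluster means over all ordered pairs, full recomputation of S at every step), not the ecp implementation, and the names are hypothetical.

```python
import math


def e_distance(a, b, alpha=1.0):
    """E-distance with within-cluster means over all ordered pairs (assumed)."""
    n, m = len(a), len(b)
    between = 2.0 * sum(math.dist(x, y) ** alpha for x in a for y in b) / (n * m)
    within_a = sum(math.dist(x, y) ** alpha for x in a for y in a) / n ** 2
    within_b = sum(math.dist(x, y) ** alpha for x in b for y in b) / m ** 2
    return between - within_a - within_b


def goodness_of_fit(series, bounds, alpha=1.0):
    """S(k) for the partition described by the boundary list `bounds`."""
    clusters = [series[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
    return sum(
        len(a) * len(b) / (len(a) + len(b)) * e_distance(a, b, alpha)
        for a, b in zip(clusters[:-1], clusters[1:])
    )


def agglomerate(series, alpha=1.0):
    """Greedy merging of adjacent clusters; returns (change_points, best_S)."""
    bounds = list(range(len(series) + 1))       # every observation alone
    best_bounds = bounds[:]
    best_s = goodness_of_fit(series, bounds, alpha)
    while len(bounds) > 2:
        # Removing an interior boundary merges the two clusters around it.
        candidates = [bounds[:i] + bounds[i + 1:] for i in range(1, len(bounds) - 1)]
        scores = [goodness_of_fit(series, c, alpha) for c in candidates]
        s = max(scores)
        bounds = candidates[scores.index(s)]
        if s > best_s:                          # track where S is maximized
            best_s, best_bounds = s, bounds[:]
    return best_bounds[1:-1], best_s


series = [(0.0,)] * 3 + [(10.0,)] * 3
change_points, best_s = agglomerate(series)
print(change_points, best_s)
```

Recomputing S from scratch for every candidate merge is wasteful; it is kept here only so that each step mirrors the definition directly.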
Agglomerative Algorithm Application: EMS
EMS Priority One Response for Toronto 2007
![Page 92: A Nonparametric Approach for Multiple Change Point ... · Introduction Change Point Analysis The process of detectingdistributionalchanges within time ordered data Framework: I Retrospective,](https://reader033.fdocuments.in/reader033/viewer/2022053016/5f171b7ebd88a33c5042d597/html5/thumbnails/92.jpg)
![Page 93: A Nonparametric Approach for Multiple Change Point ... · Introduction Change Point Analysis The process of detectingdistributionalchanges within time ordered data Framework: I Retrospective,](https://reader033.fdocuments.in/reader033/viewer/2022053016/5f171b7ebd88a33c5042d597/html5/thumbnails/93.jpg)
Bibliography
http://www.stat.cornell.edu/~matteson/
Bleakley, K., and Vert, J.-P. (2011), “The group fused Lasso for multiple change-point detection,” Technical Report HAL-00602121, Bioinformatics Center (CBIO).
Hoeffding, W. (1961), “The Strong Law of Large Numbers for U-Statistics,” Technical Report 302, North Carolina State University, Dept. of Statistics.
Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2(1), 193–218.
James, N. A., and Matteson, D. S. (2013), “ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data,” arXiv:1309.3295.
Lung-Yut-Fong, A., Levy-Leduc, C., and Cappe, O. (2011), “Homogeneity and change-point detection tests for multivariate data using rank statistics.”
Matteson, D. S., and James, N. A. (2013), “A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data,” Journal of the American Statistical Association, to appear.
Rizzo, M. L., and Szekely, G. J. (2010), “Disco Analysis: A Nonparametric Extension of Analysis of Variance,” The Annals of Applied Statistics, 4(2), 1034–1055.
Szekely, G. J., and Rizzo, M. L. (2005), “Hierarchical Clustering via Joint Between-Within Distances: Extending Ward’s Minimum Variance Method,” Journal of Classification, 22(2), 151–183.