Learning Time-Series Shapelets
Josif Grabocka, Nicolas Schilling, Martin Wistuba and Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
SIGKDD 2014, 26/08/2014
Grabocka et al., ISMLL, University of Hildesheim, Germany
SIGKDD 2014, 26/08/2014 1 / 1
What are Time-Series Shapelets? (I)
Definition:

- Patterns whose minimum distances to time-series yield discriminative predictors [Ye and Keogh (2009), Lines et al. (2012)]

Problem:

1. Learn K discriminative shapelets of length L (denoted as S ∈ R^{K×L}),
2. from a dataset that has I time-series instances of length M (denoted as T ∈ R^{I×M}), where each series is divided into J = M − L sliding-window segments.
What are Time-Series Shapelets? (II)
The minimum distances M ∈ R^{I×K} between shapelets and time-series:

$$M_{i,k} = \min_{j=1,\dots,J} \frac{1}{L} \sum_{l=1}^{L} \left(T_{i,j+l-1} - S_{k,l}\right)^2 \qquad (1)$$
... yield discriminative predictors:
Figure 1: Left: two shapelets S1 and S2; Middle: the closest segments within series T1–T4; Right: the shapelet-transformed data, plotting ||T∗ − S1||² against ||T∗ − S2||².
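In code, the shapelet transform of Eq. (1) amounts to a sliding-window minimum. The sketch below is illustrative (the function name and toy data are ours, not the paper's) and follows the slide's definition J = M − L literally:

```python
import numpy as np

def min_distances(T, S):
    """Shapelet transform per Eq. (1): M[i, k] is the minimum mean squared
    distance between shapelet S[k] and any length-L segment of series T[i]."""
    I, M_len = T.shape
    K, L = S.shape
    J = M_len - L  # number of sliding-window segments, as defined on the slide
    M = np.empty((I, K))
    for i in range(I):
        # All J segments of series i, stacked into shape (J, L)
        segments = np.stack([T[i, j:j + L] for j in range(J)])
        for k in range(K):
            dists = np.mean((segments - S[k]) ** 2, axis=1)  # D_{i,k,j} over j
            M[i, k] = dists.min()
    return M
```

The resulting I × K matrix M is the "shapelet-transformed" representation plotted on the right of Figure 1.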
Related Shapelets Work
All possible segments of all time-series are potential shapelet candidates (exhaustive search):

- Compute the prediction accuracy of the minimum distances of each candidate, then rank the top-K candidates by prediction accuracy (feature ranking)
- Build a decision tree from the top-K features [Ye and Keogh (2009), Lines et al. (2012)]
- Alternatively, use the new K-dimensional feature representation with standard classifiers [Hills et al. (2013)]

Exhaustive search is expensive; speed-ups have been proposed [Mueen et al. (2011), Rakthanmanon and Keogh (2013)].
Proposed Method (I)
A linear model with predictors M ∈ R^{I×K}, weights W ∈ R^K, and bias W_0 ∈ R can be used to estimate the target Ŷ ∈ R^I:

$$\hat{Y}_i = W_0 + \sum_{k=1}^{K} M_{i,k} W_k, \quad \forall i \in \{1, \dots, I\} \qquad (2)$$

The risk of estimating the true target Y ∈ {0, 1}^I by the approximated target Ŷ ∈ R^I is the logistic loss $\mathcal{L}(Y, \hat{Y})$:

$$\mathcal{L}(Y_i, \hat{Y}_i) = -Y_i \ln \sigma(\hat{Y}_i) - (1 - Y_i) \ln\!\big(1 - \sigma(\hat{Y}_i)\big), \quad \forall i \in \{1, \dots, I\} \qquad (3)$$
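A minimal sketch of the linear model (2) and the logistic loss (3); the function names are illustrative, and binary targets in {0, 1} are assumed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(M, W, W0):
    """Eq. (2): Y_hat[i] = W0 + sum_k M[i, k] * W[k]."""
    return W0 + M @ W

def logistic_loss(Y, Y_hat):
    """Eq. (3): per-instance logistic loss for targets Y in {0, 1}."""
    p = sigmoid(Y_hat)
    return -Y * np.log(p) - (1.0 - Y) * np.log(1.0 - p)
```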
Proposed Method (II)
The objective function F ∈ R is a regularized loss function:

$$\underset{S, W}{\operatorname{argmin}}\ \mathcal{F}(S, W) = \underset{S, W}{\operatorname{argmin}} \sum_{i=1}^{I} \mathcal{L}(Y_i, \hat{Y}_i) + \lambda_W \|W\|^2 \qquad (4)$$

The objective function F can be decomposed into per-instance objectives F_i:

$$\mathcal{F}_i = \mathcal{L}(Y_i, \hat{Y}_i) + \frac{\lambda_W}{I} \sum_{k=1}^{K} W_k^2, \quad \forall i \in \{1, \dots, I\} \qquad (5)$$
Mission of This Paper: Learn S ,W that minimize F .
Differentiable Minimum Function (I)
Approximate the true minimum M with the soft-minimum version M̂:

$$M_{i,k} \approx \hat{M}_{i,k} = \frac{\sum_{j=1}^{J} D_{i,k,j}\, e^{\alpha D_{i,k,j}}}{\sum_{j'=1}^{J} e^{\alpha D_{i,k,j'}}}, \quad \alpha \in (-\infty, 0],\ \forall i \in \{1, \dots, I\},\ \forall k \in \{1, \dots, K\} \qquad (6)$$

$$D_{i,k,j} := \frac{1}{L} \sum_{l=1}^{L} \left(T_{i,j+l-1} - S_{k,l}\right)^2, \quad \forall i \in \{1, \dots, I\},\ \forall k \in \{1, \dots, K\},\ \forall j \in \{1, \dots, J\} \qquad (7)$$
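Eq. (6) is a weighted mean with weights e^{αD}. A direct implementation underflows for strongly negative α, so the sketch below shifts the exponent by the row minimum, which cancels in the ratio; this stabilization is an implementation choice of ours, not from the slides:

```python
import numpy as np

def soft_min(D, alpha=-100.0):
    """Soft minimum of Eq. (6) over the last axis of D.
    Approaches the hard minimum as alpha -> -infinity."""
    # Shifting the exponent by the minimum cancels in the ratio but
    # prevents exp underflow when alpha * D is strongly negative.
    w = np.exp(alpha * (D - D.min(axis=-1, keepdims=True)))
    return (D * w).sum(axis=-1) / w.sum(axis=-1)
```

For α = −100 and well-separated distances the result is numerically indistinguishable from the hard minimum, while remaining differentiable in every D_{i,k,j}.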
Differentiable Minimum Function (II)

The smooth approximation of the minimum function allows only the minimum segment to contribute as α → −∞.
Figure 2: Illustration of the soft minimum between a shapelet (green) and all the segments of a series (black) from the FaceFour dataset; panels show the series and shapelet, the Euclidean distances per segment, and the soft-min weights for α = −20 and α = −100.
Learning Algorithm (I)

The partial derivative of the per-instance objective function F_i with respect to the l-th point of the k-th shapelet S_{k,l} is computed using the chain rule:

$$\frac{\partial \mathcal{F}_i}{\partial S_{k,l}} = \frac{\partial \mathcal{L}(Y_i, \hat{Y}_i)}{\partial \hat{Y}_i} \frac{\partial \hat{Y}_i}{\partial \hat{M}_{i,k}} \sum_{j=1}^{J} \frac{\partial \hat{M}_{i,k}}{\partial D_{i,k,j}} \frac{\partial D_{i,k,j}}{\partial S_{k,l}} \qquad (8)$$

$$\frac{\partial \mathcal{F}_i}{\partial W_k} = \frac{\partial \mathcal{L}(Y_i, \hat{Y}_i)}{\partial \hat{Y}_i} \frac{\partial \hat{Y}_i}{\partial W_k} + \frac{\partial \mathrm{Reg}(W)}{\partial W_k}, \qquad \frac{\partial \mathcal{F}_i}{\partial W_0} = \frac{\partial \mathcal{L}(Y_i, \hat{Y}_i)}{\partial \hat{Y}_i} \qquad (9)$$

All the components of the partial derivatives are computable as follows:

$$\frac{\partial \mathcal{L}(Y_i, \hat{Y}_i)}{\partial \hat{Y}_i} = -\big(Y_i - \sigma(\hat{Y}_i)\big), \quad \frac{\partial \hat{Y}_i}{\partial \hat{M}_{i,k}} = W_k, \quad \frac{\partial \hat{Y}_i}{\partial W_k} = \hat{M}_{i,k}, \quad \frac{\partial \mathrm{Reg}(W)}{\partial W_k} = \frac{2\lambda_W}{I} W_k \qquad (10)$$

$$\frac{\partial \hat{M}_{i,k}}{\partial D_{i,k,j}} = \frac{e^{\alpha D_{i,k,j}}\big(1 + \alpha\,(D_{i,k,j} - \hat{M}_{i,k})\big)}{\sum_{j'=1}^{J} e^{\alpha D_{i,k,j'}}}, \qquad \frac{\partial D_{i,k,j}}{\partial S_{k,l}} = \frac{2}{L}\big(S_{k,l} - T_{i,j+l-1}\big) \qquad (11)$$
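The soft-min derivative in Eq. (11) can be sanity-checked against central finite differences; the sketch below is ours, not the paper's code, and uses a shifted exponent for numerical stability (the shift cancels in the ratio):

```python
import numpy as np

def soft_min(D, alpha):
    # Soft minimum of Eq. (6); shifting by D.min() cancels in the ratio.
    w = np.exp(alpha * (D - D.min()))
    return (D * w).sum() / w.sum()

def soft_min_grad(D, alpha):
    """Eq. (11): d soft_min / d D_j =
    exp(alpha D_j) * (1 + alpha * (D_j - M_hat)) / sum_j' exp(alpha D_j')."""
    m_hat = soft_min(D, alpha)
    w = np.exp(alpha * (D - D.min()))
    return w * (1.0 + alpha * (D - m_hat)) / w.sum()
```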
Learning Algorithm (II)
Require: T ∈ R^{I×M}, number of shapelets K, shapelet length L, regularization λ_W, learning rate η, number of iterations maxIter
Ensure: shapelets S ∈ R^{K×L}, classification weights W ∈ R^K, bias W_0 ∈ R

1: for iteration = 1, ..., maxIter do
2:   for i = 1, ..., I do
3:     W_0 ← W_0 − η ∂F_i/∂W_0
4:     for k = 1, ..., K do
5:       W_k ← W_k − η ∂F_i/∂W_k
6:       for l = 1, ..., L do
7:         S_{k,l} ← S_{k,l} − η ∂F_i/∂S_{k,l}
8: return S, W, W_0
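The SGD loop above can be sketched end-to-end as follows. Everything here is an illustrative assumption on our part: the function name, the hyper-parameter defaults, and the random initialization (the back-up slide notes the paper actually initializes shapelets from K-means centroids):

```python
import numpy as np

def learn_shapelets(T, Y, K=2, L=10, lam=0.01, eta=0.01, alpha=-25.0,
                    max_iter=200, seed=0):
    """Per-instance SGD on F_i, updating W0, W, and S as in steps 1-8."""
    rng = np.random.default_rng(seed)
    I, M_len = T.shape
    J = M_len - L
    S = rng.normal(scale=0.1, size=(K, L))  # random init (paper: K-means centroids)
    W = rng.normal(scale=0.1, size=K)
    W0 = 0.0
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(max_iter):
        for i in range(I):
            segs = np.stack([T[i, j:j + L] for j in range(J)])       # (J, L)
            D = np.mean((segs[None] - S[:, None]) ** 2, axis=2)      # (K, J), Eq. (7)
            w = np.exp(alpha * (D - D.min(axis=1, keepdims=True)))   # stable weights
            M_hat = (D * w).sum(1) / w.sum(1)                        # (K,), Eq. (6)
            err = sigmoid(W0 + W @ M_hat) - Y[i]                     # dL/dY_hat, Eq. (10)
            dM_dD = w * (1.0 + alpha * (D - M_hat[:, None])) / w.sum(1, keepdims=True)
            dD_dS = (2.0 / L) * (S[:, None] - segs[None])            # (K, J, L), Eq. (11)
            grad_S = err * W[:, None] * (dM_dD[:, :, None] * dD_dS).sum(axis=1)
            S -= eta * grad_S                                        # step 7
            W -= eta * (err * M_hat + 2.0 * lam / I * W)             # step 5
            W0 -= eta * err                                          # step 3
    return S, W, W0
```

Note how the soft-min of Eq. (6) makes the inner update differentiable in S, which is what distinguishes this loop from exhaustive candidate search.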
Illustration of Learning
Figure 3: Learning two shapelets on the Gun-Point dataset (L = 40, η = 0.01, λ_W = 0.01, α = −100); each panel shows the shapelet-transformed data (M1 vs. M2, classes Y = 0 and Y = 1 with the learned weights W) alongside the shapelets S1 and S2 at iterations 0, 400, and 800.
Advantage over Feature Ranking
1. Discover hidden/latent shapelets
2. Interactions of Variables
Figure 4: Interactions among shapelets enable individually unsuccessful shapelets (left plots: the distributions of M1 and M2 alone do not separate the classes) to excel in cooperation (right plot: jointly, M1 vs. M2 separates the positive and negative classes).
Experimental Setup
Our method, denoted LS, is compared against 13 baselines on 28 time-series datasets:

- Baselines: quality criteria: Information Gain (IG), Kruskal-Wallis (KW), F-Stats (FT), Mood's Median (MM); classifiers on shapelet-transformed data: Nearest Neighbors (NN), Naive Bayes (NB), C4.5 tree (C4), Bayesian Networks (BN), Random Forest (RF), Rotation Forest (RO), Support Vector Machines (SV); other related methods: Fast Shapelets (FS), Dynamic Time Warping (DT).
- Datasets: 28 time-series datasets from the UCR and UEA collections, spanning diverse domains, with the provided train/test splits.
- Hyper-parameters: found by grid search, tested on a validation split of the training data.
Results
| | IG | KW | FT | MM | DT | C4 | NN | NB | BN | RF | RO | SV | FS | LS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total Wins | 0.0 | 0.0 | 1.2 | 0.3 | 0.0 | 0.0 | 1.5 | 0.0 | 1.6 | 0.2 | 0.0 | 4.7 | 1.3 | 17.3 |
| LS Wins | 28 | 27 | 26 | 26 | 28 | 28 | 23 | 27 | 23 | 26 | 24 | 20 | 26 | - |
| LS Draws | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 0 | 2 | 1 | 1 | 2 | 1 | - |
| LS Losses | 0 | 1 | 1 | 1 | 0 | 0 | 3 | 1 | 3 | 1 | 3 | 6 | 1 | - |
| Rank Mean | 9.8 | 9.9 | 9.1 | 9.5 | 9.8 | 10.0 | 6.2 | 7.7 | 5.5 | 6.3 | 5.4 | 4.6 | 9.2 | 1.9 |
| Rank C.I. | 1.0 | 1.3 | 1.3 | 1.3 | 1.9 | 0.8 | 1.2 | 1.1 | 1.2 | 0.7 | 0.9 | 1.2 | 1.5 | 0.5 |
| Rank S.D. | 2.7 | 3.4 | 3.6 | 3.4 | 5.0 | 2.1 | 3.2 | 2.9 | 3.1 | 2.0 | 2.4 | 3.2 | 4.1 | 1.4 |
| W.t. p-val. | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | - |
Table 1: Comparative Figures of Classification Accuracies over 28 Time-seriesDatasets
Conclusions
1. Learning shapelets to directly optimize the classification objective improves classification accuracy
2. Supervised shapelet learning is more accurate than exhaustive shapelet discovery:
   - Shapelets are not restricted to observed series segments
   - Interactions among the minimum-distance features are considered
3. Results against 13 baselines on 28 datasets validate the claim
References
J. Hills, J. Lines, E. Baranauskas, J. Mapp, and A. Bagnall. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery, 2013.

J. Lines, L. Davis, J. Hills, and A. Bagnall. A shapelet transform for time series classification. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.

A. Mueen, E. Keogh, and N. Young. Logical-shapelets: an expressive primitive for time series classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011.

T. Rakthanmanon and E. Keogh. Fast shapelets: A scalable algorithm for discovering time series shapelets. In Proceedings of the 13th SIAM International Conference on Data Mining, 2013.

L. Ye and E. Keogh. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
Back-up Slide: Dependence on Initialization
Figure 5: Sensitivity to shapelet initialization on the Gun-Point dataset (L = 30, η = 0.01, λ_W = 0.01, 3000 iterations, α = −100); panels show a) train loss and b) train error across initializations of S_{1:15} and S_{16:30}.
- We used K-Means centroids as initial shapelets.