Learning Time-Series Shapelets - Semantic Scholar

17
Learning Time-Series Shapelets Learning Time-Series Shapelets Josif Grabocka, Nicolas Schilling, Martin Wistuba and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany SIGKDD 2014, 26/08/2014 Grabocka et al., ISMLL, University of Hildesheim, Germany SIGKDD 2014, 26/08/2014 1/1

Transcript of Learning Time-Series Shapelets - Semantic Scholar

Page 1: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Learning Time-Series Shapelets

Josif Grabocka, Nicolas Schilling, Martin Wistuba and LarsSchmidt-Thieme

Information Systems and Machine Learning Lab (ISMLL)University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 1 / 1

Page 2: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

What are Time-Series Shapelets? (I)

I Definition:

I Patterns whose minimum distances to time-series yielddiscriminative predictors [Ye and Keogh(2009),Lines et al.(2012)Lines, Davis, Hills, and Bagnall]

I Problem:

1. Learn K discriminative shapelets of length L (denoted as S ∈ RK×L).2. From a dataset that has I time-series instances of length M (denoted

as T ∈ RI×M), where each series is divided into J = M − L slidingwindow segments.

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 1 / 1

Page 3: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

What are Time-Series Shapelets? (II)

The minimum distances M ∈ RI×K between shapelets and time-series:

Mi ,k = minj=1,...,J

1

L

L∑l=1

(Ti ,j+l−1 − Sk,l)2 (1)

... yield discriminative predictors:

0 30−1

0

1

S1

0 30−2

0

2

S2

0 100 200−2

0

2

T1

0 100 200−2

0

2

T2

0 100 200−2

0

2

T3

0 100 200−2

0

2

T4

0.1 0.14 0.18

0.03

0.06

0.09

||T∗ − S1||2

||T∗−S2||2

Shapelet−transformed data

T4

T3

T2

T1

Figure 1: Left: Two shapelets, Middle: Closest Segments, Right:’Shapelet-Transform’ed data

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 2 / 1

Page 4: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Related Shapelets Work

All the possible segments (exhaustively) of all time-series candidates arepotential shapelet candidates:

I Compute the prediction accuracy of minimum distances of eachcandidate and then rank the top-K by prediction accuracy (featureranking)

I Build a decision tree from the top-K features [Ye and Keogh(2009),Lines et al.(2012)Lines, Davis, Hills, and Bagnall]

I Alternatively, use the new K-dimensional feature representation anduse standard classifiers[Hills et al.(2013)Hills, Lines, Baranauskas, Mapp, and Bagnall].

Exhaustive search is expensive, speed-ups were proposed[Mueen et al.(2011)Mueen, Keogh, and Young,Rakthanmanon and Keogh(2013)].

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 3 / 1

Page 5: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Proposed Method (I)

A linear model of predictors M ∈ RI×K and weights W ∈ RK ,W0 ∈ R:can be used to estimate the target Y ∈ RI :

Yi = W0 +K∑

k=1

Mi ,kWk ∀i ∈ {1, . . . , I} (2)

The risk of estimating the true target Y ∈ {−1,+1}I from approximatedtarget Y ∈ RI is the logistic loss L(Y , Y ) ∈ RI :

L(Yi , Yi ) = −Yi lnσ(Yi )− (1− Yi ) ln(

1− σ(Yi )), ∀i ∈ {1, . . . , I} (3)

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 4 / 1

Page 6: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Proposed Method (II)

The objective function F ∈ R is a regularized loss function:

argminS,W

F(S ,W ) = argminS ,W

I∑i=1

L(Yi , Yi ) + λW ||W ||2 (4)

The objective function F can be decomposed into per-instance objectivesFi :

Fi = L(Yi , Yi ) +λWI

K∑k=1

Wk2, ∀i ∈ {1, . . . , I} (5)

Mission of This Paper: Learn S ,W that minimize F .

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 5 / 1

Page 7: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Differentiable Minimum Function (I)

Approximate the true minimum M with the soft-minimum version M:

Mi ,k ≈ Mi ,k =

∑Jj=1 Di ,k,j e

αDi,k,j∑Jj ′=1 e

αDi,k,j′,

α ∈ (−∞, 0] ∀i ∈ {1, . . . , I},∀k ∈ {1, . . . ,K} (6)

Di ,k,j :=1

L

L∑l=1

(Ti ,j+l−1 − Sk,l)2 ,

∀i ∈ {1, . . . , I},∀k ∈ {1, . . . ,K},∀j ∈ {1, . . . , J} (7)

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 6 / 1

Page 8: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Differentiable Minimum Function (II)The smooth approximation of the minimum function, allows only theminimum segment to contribute for α→ −∞.

50 100 150 200 250 300 350−5

−2.50

52.5

Series Shapelet

50 100 150 200 250 300 3500

0.5

1

Euclidean

50 100 150 200 250 300 3500246

x 10−3

Soft−min, alpha=−20

0 50 100 150 200 250 300 3500

1

2x 10

−3

Soft−min, alpha=−100

Figure 2: Illustration of the soft minimum between a shapelet (green) and all thesegments of a series (black) from the FaceFour dataset

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 7 / 1

Page 9: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Learning Algorithm (I)The partial derivative of the per-instance objective function Fi withrespect to the l-th point of the k-th shapelet Sk,l is computed using thechain rule of derivation:

∂Fi

∂Sk,l=

∂L(Yi , Yi )

∂Yi

∂Yi

∂Mi ,k

J∑j=1

∂Mi ,k

∂Di ,k,j

∂Di ,k,j

∂Sk,l(8)

∂Fi

∂Wk=

∂L(Yi , Yi )

∂Yi

∂Yi

∂Wk

+∂Reg(W )

∂Wk,

∂Fi

∂W0=∂L(Yi , Yi )

∂Yi

(9)

All the components of the partial derivative are computable as follows:

∂L(Yi , Yi )

∂Yi

= −(Yi − σ

(Yi

)),

∂Yi

∂Mi,k

= Wk ,∂Yi

∂Wk

= Mi,k ,∂Reg(W )

∂Wk=

2λW

IWk (10)

∂Mi,k

∂Di,k,j=

eαDi,k,j

(1 + α

(Di,k,j − Mi,k

))∑J

j′=1 eαDi,k,j′

,∂Di,k,j

∂Sk,l=

2

L

(Sk,l − Ti,j+l−1

)(11)

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 8 / 1

Page 10: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Learning Algorithm (II)

Require: T ∈ RI×Q , Number of Shapelets K , Length of a shapelet L, Regularization λW ,Learning Rate η, Number of iterations: maxIter

Ensure: Shapelets S ∈ RK×L, Classification weights W ∈ RK , Bias W0 ∈ R1: for iteration=1, . . . ,maxIter do2: for i = 1, . . . , I do3: W0 ←W0 − η ∂Fi

∂W0

4: for k = 1, . . . ,K do5: Wk ←Wk − η ∂Fi

∂Wk

6: for L = 1, . . . , L do7: Sk,l ← Sk,l − η ∂Fi

∂Sk,l

8: return S ,W ,W0

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 9 / 1

Page 11: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Illustration of Learning

0.2 0.4 0.6 0.80

0.2

0.4

0.6

M1

M2

a) Iteration 0

Y=0Y=1W

5 15 25 35

−0.3

0

0.3

S1 (It. 0)

Time5 15 25 35

−1.5

0

1.5

S2 (It. 0)

Time

0.2 0.4 0.6 0.80.1

0.3

0.5

0.7

M1

M2

b) Iteration 400

Y=0Y=1W

5 15 25 35

−2

−0.5

1

S1 (It. 400)

Time5 15 25 35

−2

0

2

S2 (It. 400)

Time

0.2 0.53 0.87 1.20.2

0.4

0.6

0.8

M1

M2

c) Iteration 800

Y=0Y=1W

5 15 25 35

−2

−0.5

1

S1 (It. 800)

Time5 15 25 35

−2

0

2

S2 (It. 800)

Time

Figure 3: Learning Two Shapelets on the Gun-Point Dataset,(L = 40, η = 0.01, λW = 0.01, α = −100 )

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 10 / 1

Page 12: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Advantage over Feature Ranking

1. Discover hidden/latent shapelets

2. Interactions of Variables

0 1 2 3 4 5

0

M1

0 1 2 3 4 5

0

M2 0 1 2 3 4 5

0

1

2

3

4

5

6

M1

M2

Pos. ClassNeg. Class

Figure 4: Interactions among Shapelets Enable Individually UnsuccessfulShapelets (left plots) to Excel in Cooperation (right plot)

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 11 / 1

Page 13: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Experimental Setup

Our method, denoted LS, is compared against 13 baselines using 28time-series dataset:

I Baselines: Quality Criteria: Information gain (IG), Kruskall-Wallis(KW), F-Stats (FT), Mood’s Median Criterion (MM); Usingshapelet-transformed data: Nearest Neighbors (1NN), Naive Bayes(NB), C4.5 tree (C4), Bayesian Networks (BN), Random Forest (RA),Rotation Forest (RO), Support Vector Machines (SV); OtherRelated: Fast Shapelets (FS), Dynamic Time Warping (DT).

I Datasets: 28 time-series datasets from the UCR and UEA collectionsfrom diverse domains; Provided train/test splits

I Hyper-parameters: are found using grid-search, by testing over avalidation split from the training data

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 12 / 1

Page 14: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Results

IG KW FT MM DT C4 NN NB BN RF RO SV FS LS

Total Wins 0.0 0.0 1.2 0.3 0.0 0.0 1.5 0.0 1.6 0.2 0.0 4.7 1.3 17.3

LS Wins 28 27 26 26 28 28 23 27 23 26 24 20 26 -LS Draws 0 0 1 1 0 0 2 0 2 1 1 2 1 -LS Losses 0 1 1 1 0 0 3 1 3 1 3 6 1 -

Rank Mean 9.8 9.9 9.1 9.5 9.8 10.0 6.2 7.7 5.5 6.3 5.4 4.6 9.2 1.9Rank C.I. 1.0 1.3 1.3 1.3 1.9 0.8 1.2 1.1 1.2 0.7 0.9 1.2 1.5 0.5Rank S.D. 2.7 3.4 3.6 3.4 5.0 2.1 3.2 2.9 3.1 2.0 2.4 3.2 4.1 1.4

W.t. p-val. 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -

Table 1: Comparative Figures of Classification Accuracies over 28 Time-seriesDatasets

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 13 / 1

Page 15: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Conclusions

1. Learning shapelets to directly optimize the classification objectiveimproves classification accuracy

2. Supervised Shapelet Learning more accurate than ExhaustiveShapelet Discovery

I Shapelets are not restricted to series segmentsI Considers interactions among minimum distances features

3. Results against 13 baselines using 28 datasets validate the claim

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 14 / 1

Page 16: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

References

J. Hills, J. Lines, E. Baranauskas, J. Mapp, and A. Bagnall.Classification of time series by shapelet transformation.Data Mining and Knowledge Discovery, 2013.

J. Lines, L. Davis, J. Hills, and A. Bagnall.A shapelet transform for time series classification.In Proceedings of the 18th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, 2012.

A. Mueen, E. Keogh, and N. Young.Logical-shapelets: an expressive primitive for time series classification.In Proceedings of the 17th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, 2011.

T. Rakthanmanon and E. Keogh.Fast shapelets: A scalable algorithm for discovering time series shapelets.Proceedings of the 13th SIAM International Conference on Data Mining, 2013.

L. Ye and E. Keogh.Time series shapelets: a new primitive for data mining.In Proceedings of the 15th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, 2009.

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 15 / 1

Page 17: Learning Time-Series Shapelets - Semantic Scholar

Learning Time-Series Shapelets

Back-up Slide: Dependence on Initialization

S1:15

S16

:30

a) Train Loss

−1 0 1−1

−0.5

0

0.5

1

20

25

30

35

40

S1:15

S16

:30

b) Train Error

−1 0 1−1

−0.5

0

0.5

1

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

Figure 5: Sensitivity of Shapelet Initialization, Gun-Point dataset, Parameters:L = 30, η = 0.01, λW = 0.01, Iterations= 3000, α = −100

I We used K-Means centroids as initial shapelets.

Grabocka et al., ISMLL, University of Hildesheim, Germany

SIGKDD 2014, 26/08/2014 16 / 1