Making Time-series Classification More Accurate Using Learned
Constraints
© Chotirat “Ann” Ratanamahatana
Eamonn Keogh
2004 SIAM International Conference on Data Mining (SDM '04)
April 22, 2004
Roadmap
• Time series and their similarity measures
• Euclidean distance and its limitation
• Dynamic time warping (DTW)
• Global constraints
• R-K band
• Experimental Evaluation
• Conclusions and future work
Important Note! You are free to use any slides in this talk for teaching purposes, provided that the authorship of the slides is clearly attributed to Ratanamahatana and Keogh.
You may not use any text or images contained here in a paper (including tech reports or unpublished works) or tutorial without the express permission of Dr. Keogh.
Chotirat Ann Ratanamahatana and Eamonn Keogh. Making Time-series Classification More Accurate Using Learned Constraints. In Proceedings of the SIAM International Conference on Data Mining (SDM '04), Lake Buena Vista, Florida, April 22-24, 2004, pp. 11-22.
Classification in Time Series
Classification, in general, maps data into predefined groups (supervised learning).
Pattern recognition is a type of supervised classification in which an input pattern is assigned to one of the predefined classes based on its similarity to them.
[Figure: example series from Class A and Class B. Which class does the query series belong to?]
Age   Income   Student   Credit Rating   Class: buys computer
28    High     No        Fair            No
25    High     No        Excellent       No
35    High     No        Fair            Yes
45    Medium   No        Excellent      No
18    Low      Yes       Fair            Yes
49    High     No        Fair            ??

Will this person buy a computer?
Euclidean Distance Metric
Given two time series
Q = q1, …, qn and
C = c1, …, cn,
their Euclidean distance is defined as

D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}
[Figure: the two time series Q and C plotted on the same axes]
Limitations of Euclidean Metric
The Euclidean metric is very sensitive to distortions in the data.
The training data consists of 10 instances from each of the 3 classes.
We perform 1-nearest-neighbor classification with leave-one-out evaluation, averaged over 100 runs.
Euclidean distance error rate: 29.77%
DTW error rate: 3.33%
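The evaluation protocol above can be sketched as follows; this is an illustrative Python sketch (function and variable names are ours, not from the paper's code), where `dist` can be any distance function such as Euclidean or DTW.

```python
# Sketch of 1-nearest-neighbor classification with leave-one-out
# evaluation. `dataset` is a list of (series, label) pairs; each
# instance is classified by its nearest neighbor among the others.
def loo_error_rate(dataset, dist):
    errors = 0
    for i, (q, label) in enumerate(dataset):
        # nearest neighbor among all *other* instances
        best = min((dist(q, c), lab)
                   for j, (c, lab) in enumerate(dataset) if j != i)
        if best[1] != label:
            errors += 1
    return errors / len(dataset)
```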
Dynamic Time Warping (DTW)
Euclidean Distance: one-to-one alignments
Time Warping Distance: non-linear alignments are allowed
How Is DTW Calculated? (II)
Each warping path w can be found using dynamic programming to evaluate the following recurrence:

γ(i, j) = d(q_i, c_j) + min{ γ(i-1, j-1), γ(i-1, j), γ(i, j-1) }

where γ(i, j) is the cumulative distance: the local distance d(i, j) plus the minimum cumulative distance among the three adjacent cells (i-1, j-1), (i-1, j), and (i, j-1).
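The recurrence above translates directly into a dynamic program. A minimal Python sketch (names are ours), using the squared point-wise difference as the local distance d and taking a square root of the final cumulative cost to mirror the Euclidean formula:

```python
import math

def dtw_distance(Q, C):
    """Unconstrained DTW between two series of lengths n and m."""
    n, m = len(Q), len(C)
    INF = float("inf")
    # (n+1) x (m+1) table with an infinite border so the recurrence
    # never steps outside the valid region; gamma[0][0] seeds the path.
    gamma = [[INF] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (Q[i - 1] - C[j - 1]) ** 2      # local distance d(q_i, c_j)
            gamma[i][j] = d + min(gamma[i - 1][j - 1],   # diagonal
                                  gamma[i - 1][j],       # above
                                  gamma[i][j - 1])       # left
    return math.sqrt(gamma[n][m])
```

Note how the non-linear alignment lets DTW absorb a small time shift that Euclidean distance would penalize.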
Global Constraints (I)
[Figure: warping paths between Q and C under the Sakoe-Chiba Band and the Itakura Parallelogram]
Both constraints prevent unreasonable warping.
Global Constraints (II)
[Figure: the width R_i under the Sakoe-Chiba Band and the Itakura Parallelogram]
A global constraint for a sequence of size m is defined by R, where R_i = d, 0 ≤ d ≤ m, 1 ≤ i ≤ m.
R_i defines the freedom of warping above and to the right of the diagonal at any given point i in the sequence.
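A band-constrained DTW restricts the recurrence to cells within R_i of the diagonal. A minimal sketch, assuming equal-length series and a symmetric band (our simplification; the names are illustrative). A constant R_i = r recovers the Sakoe-Chiba Band, while a per-point R is the kind of arbitrary-shape band discussed on the following slides:

```python
import math

def dtw_banded(Q, C, R):
    """DTW where cell (i, j) is filled only if |i - j| <= R[i-1].
    Assumes len(Q) == len(C) and a band wide enough to admit a path."""
    n = len(Q)
    INF = float("inf")
    gamma = [[INF] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # only cells within R[i-1] of the diagonal are ever filled
        lo = max(1, i - R[i - 1])
        hi = min(n, i + R[i - 1])
        for j in range(lo, hi + 1):
            d = (Q[i - 1] - C[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return math.sqrt(gamma[n][n])
```

With R_i = 0 everywhere, only the diagonal is allowed and the result collapses to the Euclidean distance.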
Is a Wider Band Always Better?
[Figure: warping alignments of the same pair of sequences under different band widths]
Euclidean distance = 2.4836
DTW dist = 1.6389 (R = 1)
DTW dist = 1.0204 (R = 10)
DTW dist = 1.0204 (R = 25, identical to R = 10)
Wider Isn't Always Better
[Figure: CPU time (msec) vs. warping window size for the auslan, gun, digit, trace, and wordspotting datasets]
A larger warping window is not always a good thing.
[Figure: accuracy (%) vs. warping window size for the auslan, gun, digit, trace, and wordspotting datasets, with the Euclidean distance baseline]
Recall this example: most accuracies peak at a smaller window size.
Ratanamahatana-Keogh Band (R-K Band)
Solution: we create a band of arbitrary shape and size that is appropriate for the data we want to classify.
How Many Bands Do We Need?
• Of course, we could use ONE band to classify all the classes, as almost all researchers do.
• But the width of the band depends on the characteristics of the data within each class. A single band for classification is unlikely to generalize.
• Our proposed solution: we create an arbitrary band (R-K Band) for each class and use it accordingly for classification.
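The per-class scheme can be sketched as follows: each class c keeps its own learned band bands[c], and the distance from a query to a training instance is computed under the band of that instance's class. `banded_dist(Q, C, R)` stands for any band-constrained DTW; all names here are illustrative, not from the paper's code.

```python
def classify_with_rk_bands(query, train, bands, banded_dist):
    """1-NN classification where the distance to each training instance
    is computed under its own class's R-K Band.
    train: list of (series, label); bands: {label: R array}."""
    best_dist, best_label = float("inf"), None
    for series, label in train:
        d = banded_dist(query, series, bands[label])
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label
```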
How Do We Create an R-K Band?
First attempt: we could look at the data and manually create the shape of the bands (then adjust the width of each band until we get a good result).
[Figure: two manually drawn bands in the warping matrix, with the corresponding time series from each class]
100% Accuracy!
Learning an R-K Band Automatically
[Figure: two automatically learned bands in the warping matrix, with the corresponding time series from each class]
Our heuristic search algorithm automatically learns the bands from the data. (Sometimes we even get an unintuitive shape that gives a good result.)
100% Accuracy as well!
R-K Band Learning With Heuristic Search
[Diagram: calculate h(1); calculate h(2); is h(2) > h(1)? Yes / No]
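A minimal sketch of the hill-climbing idea in the diagram: start from a flat band, propose a local change, and keep it only if the heuristic h (for example, leave-one-out accuracy) improves. The paper's actual search is more elaborate; this illustration, and all names in it, are our simplification.

```python
import random

def learn_band(m, h, steps=100, max_width=10, seed=0):
    """Greedy hill climbing over per-point band widths.
    m: sequence length; h: scores a candidate band (higher is better)."""
    rng = random.Random(seed)
    band = [0] * m                     # start with no warping allowed
    best = h(band)
    for _ in range(steps):
        i = rng.randrange(m)           # pick a point of the band
        delta = rng.choice([-1, 1])    # widen or narrow it by one cell
        cand = list(band)
        cand[i] = min(max_width, max(0, cand[i] + delta))
        score = h(cand)
        if score > best:               # greedy: keep only improvements
            band, best = cand, score
    return band
```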
Experiment: Datasets
1. Gun Problem
2. Trace (transient classification benchmark)
3. Handwritten Word Spotting data
[Figure: example time series from the Trace and Gun datasets]
Experimental Design
We measure the accuracy and CPU time on each dataset using the following methods:
1. Euclidean distance
2. Uniform warping window (size 1 to 100)
3. Learning a different R-K Band for each class and performing classification based on them

Leave-one-out 1-nearest-neighbor classification is used to measure the accuracy.
The lower bounding method is also used to prune off unnecessary calculations of DTW.
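The lower-bounding idea can be sketched in the style of LB_Keogh (this sketch and its names are illustrative): build an envelope of the candidate series under the band, then sum how far the query falls outside that envelope. The bound never exceeds the banded DTW distance, so any candidate whose bound already exceeds the best-so-far distance can be skipped without computing DTW at all.

```python
import math

def lb_keogh(Q, C, R):
    """LB_Keogh-style lower bound on banded DTW.
    Assumes equal-length series; R[i] is the band width at point i."""
    n = len(Q)
    total = 0.0
    for i in range(n):
        lo = max(0, i - R[i])
        hi = min(n - 1, i + R[i])
        window = C[lo:hi + 1]
        U, L = max(window), min(window)   # envelope of C around point i
        if Q[i] > U:                      # query escapes above the envelope
            total += (Q[i] - U) ** 2
        elif Q[i] < L:                    # query escapes below the envelope
            total += (Q[i] - L) ** 2
    return math.sqrt(total)
```

Since the bound is far cheaper than DTW (linear rather than quadratic), pruning with it accounts for the large CPU-time gap between the "with LB" and "no LB" rows in the result tables that follow.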
Experimental Results (I)
[Figure: the two classes of the Gun dataset, Gun Draw and Point, with their warping matrices]

                  Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)    5.5         1.0 (width = 4)      4.5 (width = 15)    0.5 (max width = 4)
CPU Time (msec)   N/A         2,440                5,430               1,440
CPU Time (no LB)  60          11,820               17,290              9,440
Experimental Results (II)
[Figure: the four classes of the Trace dataset with their learned R-K Bands]

                  Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)    11          0 (width = 8)        0 (width = 27)      0 (max width = 7)
CPU Time (msec)   N/A         16,020               34,980              7,420
CPU Time (no LB)  210         144,470              185,460             88,630
Conclusions
• Different shapes and widths of the band contribute to the classification accuracy.
• Each class can be better recognized using its own individual R-K Band.
• The heuristic search algorithm is a good approach to R-K Band learning.
• Combining the R-K Band with the lower bounding technique yields higher accuracy and makes classification much faster.
Future Work
• Investigate other choices that may make envelope learning more accurate:
  – Heuristic functions
  – Search algorithm (refining the search)
• Is there a way to always guarantee an optimal solution?
• Examine the best way to deal with multivariate time series.
• Consider a more generalized form of our framework, i.e., a single R-K Band learned for a particular domain.
• Explore the utility of the R-K Band on real-world problems: music, bioinformatics, biomedical data, etc.

All datasets are publicly available at the UCR Time Series Data Mining Archive: http://www.cs.ucr.edu/~eamonn/TSDMA
Contact: [email protected] [email protected]
Homepage: http://www.cs.ucr.edu/~ratana