Time series data mining techniques
AN OVERVIEW ON TIME SERIES DATA MINING
OUTLINE
1. Introduction
2. Similarity Search in Time Series Data
3. Feature-based Dimensionality Reduction
4. Discretization
5. Other Time Series Data Mining Tasks
6. Conclusions
Introduction
A time series is a collection of observations made sequentially in time.
Examples: financial time series, scientific time series.
[Figure: CNX IT returns; sample values 6145.45, 6128.75, 6142.7, 6201.2, 6151.9, ..., 6492.7, 6546.75]
TIME SERIES SIMILARITY SEARCH
Some examples:
- Identifying companies with similar patterns of growth.
- Determining products with similar selling patterns
- Discovering stocks with similar movement in stock prices.
- Finding out whether a musical score is similar to one of a set of copyrighted scores.
Major Time Series Data Mining Tasks
• Indexing
• Clustering
• Classification
• Prediction
• Anomaly Detection
Indexing and clustering make explicit use of a distance measure. The others make implicit use of a distance measure.
TIME SERIES SIMILARITY SEARCH
DISTANCE MEASURES
Euclidean distance
Dynamic Time Warping
Other distance measures
o Threshold query based similarity search (TQuEST)
o Minkowski Distance
Euclidean Distance Metric
Given two time series Q = q1…qn and C = c1…cn, their Euclidean distance is defined as:
$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$
[Figure: time series Q and C and their distance D(Q,C)]
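The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `euclidean_distance` is not from the slides, and the sketch assumes both series have the same length n.

```python
import math


def euclidean_distance(q, c):
    """Euclidean distance between two equal-length series (illustrative helper)."""
    if len(q) != len(c):
        raise ValueError("Euclidean distance needs series of equal length")
    # Sum the squared point-wise differences, then take the square root.
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```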
What’s wrong with Euclidean Distance?
- Similar sequences may be shifted and have different scales. Remedy (Goldin and Kanellakis, 1995): normalize the time series before measuring the distance between them:
  $x_i' = (x_i - \mu) / \sigma$
  where μ is the mean and σ the standard deviation of the series.
- What if a sequence is stretched or compressed along the time axis?
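A minimal sketch of this normalization, assuming the population standard deviation is meant (the slides do not say which estimator is used) and mapping a constant series to zeros. The name `z_normalize` is illustrative.

```python
from statistics import mean, pstdev


def z_normalize(series):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)  # assumption: population standard deviation
    if sigma == 0:
        # A constant series has no shape to compare; return zeros.
        return [0.0] * len(series)
    return [(v - mu) / sigma for v in series]


# The same shape at a different offset and scale normalizes to the same values.
a = z_normalize([1, 2, 3])
b = z_normalize([110, 120, 130])
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```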
Fixed Time Axis: sequences are aligned “one to one”.
“Warped” Time Axis: nonlinear alignments are possible.
Dynamic Time Warping (Berndt et al.)
Dynamic Time Warping is a technique that finds the optimal
alignment between two time series if one time series may be
“warped” non-linearly by stretching or shrinking it along its time
axis.
This warping between two time series can then be used to determine
the similarity between the two time series.
DYNAMIC TIME WARPING [BERNDT, CLIFFORD, 1994]
Allows acceleration-deceleration of signals along the time
dimension
Basic idea
X = (x1, x2, ..., xN), N ∈ ℕ
Y = (y1, y2, ..., yM), M ∈ ℕ
*Data sequences should be sampled at equidistant points in time
The algorithm starts by building the distance matrix C ∈ R^(N×M)
representing all pairwise distances between X and Y.
This distance matrix is also called the local cost matrix:
c(i,j) = ||xi - yj||, i ∈ [1 : N], j ∈ [1 : M]
Once the local cost matrix is built, the algorithm finds the
alignment path that runs through the low-cost areas (the 'valleys')
of the cost matrix.
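As a small illustration of the local cost matrix, the sketch below builds all pairwise distances for two short scalar sequences. It assumes one-dimensional data, so the norm reduces to an absolute difference; the helper name `local_cost_matrix` is made up for this example.

```python
def local_cost_matrix(x, y):
    """Return the N x M matrix of pairwise distances c(i, j) = |x_i - y_j|."""
    return [[abs(xi - yj) for yj in y] for xi in x]


# N = 2 rows (one per element of x), M = 3 columns (one per element of y).
for row in local_cost_matrix([1, 3], [2, 2, 5]):
    print(row)
# [1, 1, 4]
# [1, 1, 2]
```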
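HOW IS DTW CALCULATED?
[Figure: cost matrix for sequences Q and C with the warping path w]
γ(i,j) = d(qi,cj) + min{ γ(i-1,j-1), γ(i-1,j), γ(i,j-1) }
where γ(i,j) is the cumulative distance up to cell (i,j).
Warping path w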
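One way to make the recurrence and the warping path concrete is the sketch below. It fills the cumulative-distance table and then walks back from the last cell to the first, always moving to the cheapest of the three predecessor cells. The absolute difference as the point distance d, the 1-based path indices, and the function names are assumptions made for this example.

```python
def accumulated_cost(q, c):
    """Cumulative-distance table with an extra border row and column of infinities."""
    n, m = len(q), len(c)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(q[i - 1] - c[j - 1])  # assumed point distance
            g[i][j] = d + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g


def warping_path(q, c):
    """Backtrack from (n, m) to (1, 1) through the cheapest predecessors."""
    if not q or not c:
        raise ValueError("both sequences must be non-empty")
    g = accumulated_cost(q, c)
    i, j = len(q), len(c)
    path = [(i, j)]
    while (i, j) != (1, 1):
        candidates = [
            (g[i - 1][j - 1], i - 1, j - 1),
            (g[i - 1][j], i - 1, j),
            (g[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)  # border cells are infinite, so never chosen
        path.append((i, j))
    path.reverse()
    return path


print(warping_path([0, 0, 1, 2], [0, 1, 2, 2]))
# [(1, 1), (2, 1), (3, 2), (4, 3), (4, 4)]
```

In this toy case the second sequence is a time-shifted copy of the first, so the path absorbs the shift and the cumulative distance in the last cell is 0.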
CONSTRAINTS
Boundary condition
The starting and ending points of the warping path must be the first and the
last points of the aligned sequences, i.e. w1 = (1,1) and wK = (N,M)
Monotonicity condition
n1 ≤ n2 ≤ ... ≤ nK and m1 ≤ m2 ≤ ... ≤ mK, where wk = (nk, mk).
This condition preserves the time-ordering of points.
Step size condition
This criterion prevents the warping path from making long jumps (shifts in time) while aligning sequences,
i.e. for cell (i,j) we look only at the values w(i-1,j-1), w(i-1,j), w(i,j-1)
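The three conditions can be checked mechanically for a candidate path. The sketch below is illustrative only: it assumes the path is a list of 1-based (i, j) index pairs for sequences of lengths n and m, and it tests monotonicity in its non-strict form, since the step size condition allows one index to stay fixed between consecutive points.

```python
def check_warping_path(path, n, m):
    """Report which of the three warping-path conditions a path satisfies."""
    steps = list(zip(path, path[1:]))
    allowed_steps = {(1, 1), (1, 0), (0, 1)}
    return {
        # Path must start at the first cell and end at the last cell.
        "boundary": bool(path) and path[0] == (1, 1) and path[-1] == (n, m),
        # Indices may never decrease along the path.
        "monotonicity": all(i1 >= i0 and j1 >= j0 for (i0, j0), (i1, j1) in steps),
        # Each move advances by at most one cell in each direction.
        "step_size": all((i1 - i0, j1 - j0) in allowed_steps
                         for (i0, j0), (i1, j1) in steps),
    }


print(check_warping_path([(1, 1), (2, 1), (3, 2), (4, 3), (4, 4)], 4, 4))
# {'boundary': True, 'monotonicity': True, 'step_size': True}
print(check_warping_path([(1, 1), (2, 2), (4, 4)], 4, 4))
# {'boundary': True, 'monotonicity': True, 'step_size': False}
```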
Shanmukha Sreenivas P , DoMS
CONSTRAINT VISUALIZATION
a) Admissible path satisfying constraints
b) Violation of boundary condition
c) Violation of monotonicity
d) Violation of step size
Global constraints: Sakoe-Chiba Band, Itakura Parallelogram
A global constraint constrains the indices of the warping path wk = (i,j)k such that j-r ≤ i ≤ j+r,
where r is a term defining the allowed range of warping for a given point in a sequence.
r =
STEP SIZE CONDITION
EXAMPLE
s1 s2 s3 s4 s5 s6 s7 s8 s9
q1 3.76 8.07 1.64 1.08 2.86 0.00 0.06 1.88 1.25
q2 2.02 5.38 0.58 2.43 4.88 0.31 0.59 3.57 2.69
q3 6.35 11.70 3.46 0.21 1.23 0.29 0.11 0.62 0.29
q4 16.8 25.10 11.90 1.28 0.23 4.54 3.69 0.64 1.10
q5 3.20 7.24 1.28 1.42 3.39 0.04 0.16 2.31 1.61
q6 3.39 7.51 1.39 1.30 3.20 0.02 0.12 2.16 1.49
q7 4.75 9.49 2.31 0.64 2.10 0.04 0.00 1.28 0.77
q8 0.96 3.53 0.10 4.00 7.02 1.00 1.46 5.43 4.33
q9 0.02 1.08 0.27 8.07 12.18 3.39 4.20 10.05 8.53
Matrix of the pair-wise distances for element si with qj
EXAMPLE
s1 s2 s3 s4 s5 s6 s7 s8 s9
q1 3.76 11.83 13.47 14.55 17.41 17.41 17.47 19.35 20.60
q2 5.78 9.14 9.72 12.15 17.03 17.34 17.93 21.04 22.04
q3 12.13 17.48 12.60 9.93 11.16 11.45 11.56 12.18 12.47
q4 29.02 37.23 24.50 11.21 10.16 14.70 15.14 12.20 13.28
q5 32.22 36.26 25.78 12.63 13.55 10.20 10.36 12.67 13.81
q6 35.61 39.73 27.17 13.93 15.83 10.22 10.32 12.48 13.97
q7 40.36 45.10 29.48 14.57 16.03 10.26 10.22 11.50 12.27
q8 41.32 43.89 29.58 18.57 21.59 11.26 11.68 15.65 15.83
q9 41.34 42.40 29.85 26.64 30.75 14.65 15.46 21.73 24.18
Matrix computed with dynamic programming based on the recurrence:
dist(i,j) = dist(si, qj) + min{ dist(i-1,j-1), dist(i,j-1), dist(i-1,j) }
Window size = 2
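The accumulated matrix can be reproduced from the pairwise distances with a few lines of code. The sketch below uses only the top-left 3 x 3 block of the example (rows q1 to q3, columns s1 to s3) and applies no window; the function name `accumulate` is illustrative.

```python
# Top-left 3 x 3 block of the pairwise distance matrix from the example.
local = [
    [3.76, 8.07, 1.64],
    [2.02, 5.38, 0.58],
    [6.35, 11.70, 3.46],
]


def accumulate(local):
    """Add to each cell the cheapest of its upper, left and diagonal neighbours."""
    rows, cols = len(local), len(local[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            prev = []
            if i > 0:
                prev.append(acc[i - 1][j])
            if j > 0:
                prev.append(acc[i][j - 1])
            if i > 0 and j > 0:
                prev.append(acc[i - 1][j - 1])
            acc[i][j] = local[i][j] + (min(prev) if prev else 0.0)
    return acc


acc = [[round(v, 2) for v in row] for row in accumulate(local)]
for row in acc:
    print(row)
# [3.76, 11.83, 13.47]
# [5.78, 9.14, 9.72]
# [12.13, 17.48, 12.6]
```

These values agree with the corresponding cells of the accumulated matrix in the example.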
FORMULATION
Let D(i, j) refer to the dynamic time warping
distance between the subsequences
x1, x2, …, xi
y1, y2, …, yj
D(i, j) = | xi – yj | + min{ D(i – 1, j), D(i – 1, j – 1), D(i, j – 1) }
SOLUTION BY DYNAMIC PROGRAMMING
Basic implementation = O(n^2), where n is the length of the sequences
- we have to solve the problem for each (i, j) pair
If a warping window w is specified, then O(nw)
- only solve for the (i, j) pairs where |i – j| <= w
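A minimal dynamic-programming sketch of the formulation, with an optional warping window. The function name `dtw_distance` and the `window` parameter are illustrative, and the window is widened to at least the length difference of the two sequences so that the last cell stays reachable (a choice made for this sketch, not something stated in the slides).

```python
import math


def dtw_distance(x, y, window=None):
    """DTW distance D(n, m); cells with |i - j| > window are skipped."""
    n, m = len(x), len(y)
    # No window means every (i, j) pair is solved: O(n * m) cells.
    w = max(n, m) if window is None else max(window, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # With a window only about 2w + 1 cells per row are visited: O(n * w).
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]


x = [0, 0, 1, 2]
y = [0, 1, 2, 2]
print(dtw_distance(x, y))            # 0.0, the shift is fully absorbed
print(dtw_distance(x, y, window=1))  # 0.0, a shift of one step fits the window
print(dtw_distance(x, y, window=0))  # 2.0, only the diagonal is allowed
```

With `window=0` the path is forced onto the diagonal, so the result is the sum of point-wise absolute differences.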
FEATURE-BASED DIMENSIONALITY REDUCTION
• Time series databases are often extremely large.
Searching directly on these data is complex and
inefficient.
• To overcome this problem, we should use
transformation methods that reduce the dimensionality
of the time series.
• These transformation methods are called
dimensionality reduction techniques.
An Example of a Dimensionality Reduction Technique I
[Figure: time series C, n = 128 points, time axis from 0 to 140]
The graphic shows a time series with 128 points.
The raw data used to produce the graphic is also reproduced as a column of numbers (just the first 30 or so points are shown).
Raw data: 0.4995, 0.5264, 0.5523, 0.5761, 0.5973, 0.6153, 0.6301, 0.6420, 0.6515, 0.6596, 0.6672, 0.6751, 0.6843, 0.6954, 0.7086, 0.7240, 0.7412, 0.7595, 0.7780, 0.7956, 0.8115, 0.8247, 0.8345, 0.8407, 0.8431, 0.8423, 0.8387, …
An Example of a Dimensionality Reduction Technique II
[Figure: time series C (n = 128) and its Fourier approximation, time axis from 0 to 140]
Raw data: same series as in Technique I
Fourier coefficients: 1.5698, 1.0485, 0.7160, 0.8406, 0.3709, 0.4670, 0.2667, 0.1928, 0.1635, 0.1602, 0.0992, 0.1282, 0.1438, 0.1416, 0.1400, 0.1412, 0.1530, 0.0795, 0.1013, 0.1150, 0.1801, 0.1082, 0.0812, 0.0347, 0.0052, 0.0017, 0.0002, ...
Truncated Fourier coefficients: 1.5698, 1.0485, 0.7160, 0.8406, 0.3709, 0.4670, 0.2667, 0.1928
n = 128
N = 8
Cratio = 1/16
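The idea of truncating Fourier coefficients can be sketched as follows: compute the discrete Fourier transform, keep only the lowest-frequency coefficients, and reconstruct an approximation from them. With n = 128 and N = 8 the compression ratio is N/n = 1/16. The sketch is a plain O(n^2) DFT for illustration; the function names are made up, and keeping each retained coefficient together with its conjugate mirror (so the reconstruction stays real-valued) is an assumption about how the coefficients are counted, not something the slides specify.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative series)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def truncated_reconstruction(x, n_keep):
    """Zero all but the n_keep lowest frequencies, then invert the transform."""
    n = len(x)
    coeffs = dft(x)
    kept = [0j] * n
    for k in range(min(n_keep, n)):
        kept[k] = coeffs[k]
        # Keep the mirrored coefficient too so the result is real-valued.
        kept[(n - k) % n] = coeffs[(n - k) % n]
    return [
        (sum(kept[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


n = 32
slow = [1 + math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
approx = truncated_reconstruction(slow, 4)
print(max(abs(a - b) for a, b in zip(slow, approx)) < 1e-9)  # True
```

A smooth series such as `slow` is recovered almost exactly from a handful of coefficients, whereas a component at a frequency above the cut-off would be removed entirely.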
Fourier Analysis of Time Series using R
No. of observations (n) = 11
Max freq = (n-1)/2 = 5w
No. of cosines = {(n-1)/2}+1 = 6
No. of sines = {(n-1)/2} = 5
[Figure: the same time series approximated by six representations: DFT, DWT, SVD, APCA, PAA, PLA; time axis from 0 to 120 in each panel]
DISCRETIZATION
• Discretization of a time series is transforming it into a
symbolic string.
• The main benefit of this discretization is that there is an
enormous wealth of existing algorithms and data structures
that allow the efficient manipulation of symbolic
representations.
• Lin, Keogh et al. (2003) proposed a method called
Symbolic Aggregate Approximation (SAX), which allows
the discretization of the original time series into symbolic
strings.
SYMBOLIC AGGREGATE APPROXIMATION (SAX) [LIN ET AL. 2003]
The first symbolic representation of time series that allows
discretization of time series into symbolic strings
Example SAX word: baabccbc
HOW DO WE OBTAIN SAX
[Figure: time series C, its PAA approximation, and the resulting SAX word baabccbc; time axis from 0 to 120]
First convert the time series to PAA representation, then
convert the PAA to symbols
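The two steps can be sketched as below. Several details are assumptions made for this illustration rather than statements from the slides: the series is z-normalized first, its length is a multiple of the word size, and the symbol breakpoints are chosen so that each symbol is equally likely under a standard normal distribution. The function names are made up.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def z_normalize(series):
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(v - mu) / sigma for v in series]


def paa(series, word_size):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    if len(series) % word_size != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of word_size")
    seg = len(series) // word_size
    return [sum(series[k * seg:(k + 1) * seg]) / seg for k in range(word_size)]


def sax(series, word_size, alphabet_size):
    """Map each PAA segment to a letter, 'a' being the lowest range."""
    # Assumed breakpoints: equiprobable regions of the standard normal.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    segments = paa(z_normalize(series), word_size)
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in segments)


print(sax([0, 0, 5, 5, 10, 10], word_size=3, alphabet_size=3))  # abc
```

Here `word_size` and `alphabet_size` correspond to the two parameter choices of SAX: the number of PAA segments and the number of distinct symbols.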
TWO PARAMETER CHOICES
[Figure: time series C with PAA segments numbered 1 to 8 along the time axis and symbol levels numbered 1 to 3]
The word size, in this case 8
The alphabet size (cardinality), in this case 3
Structural representations help in understanding time series through data analysis and visualization.
SAX is claimed to be a landmark representation of time series. It is symbolic and therefore allows the use of discrete data
structures and their corresponding algorithms for analysis.
It also helps with visualization.