Time series data mining techniques


IT'S ABOUT TIME !!

Presented By-

P.SHANMUKHA SREENIVAS

M.MGT 1

AN OVERVIEW ON TIME SERIES DATA MINING

OUTLINE


1. Introduction

2. Similarity Search in Time Series Data

3. Feature-based Dimensionality Reduction

4. Discretization

5. Other Time Series Data Mining Tasks

6. Conclusions

INTRODUCTION

A time series is a collection of observations made sequentially in time.

Examples: financial time series, scientific time series.

Sample data (CNX IT returns): 6145.45, 6128.75, 6142.7, 6201.2, 6151.9, 6050.95, 5917.75, 5855.95, 5984, 5993.9, 5934.8, 5920.05, 5950, 5950.7, 5963.8, 6141.15, .., .., 6471.4, 6511.7, 6563.25, 6558.45, 6492.7, 6546.75

TIME SERIES SIMILARITY SEARCH


Some examples:

- Identifying companies with similar patterns of growth.

- Determining products with similar selling patterns

- Discovering stocks with similar movement in stock prices.

- Finding out whether a musical score is similar to one of a set of copyrighted scores.

Major Time Series Data Mining Tasks

• Indexing
• Clustering
• Classification
• Prediction
• Anomaly Detection

Indexing and clustering make explicit use of a distance measure. The others make implicit use of a distance measure.

TIME SERIES SIMILARITY SEARCH

DISTANCE MEASURES

• Euclidean distance

• Dynamic Time Warping

• Other distance measures
  o Threshold query based similarity search (TQuEST)
  o Minkowski distance

Euclidean Distance Metric

Given two time series Q = q1…qn and C = c1…cn, their Euclidean distance is defined as:

$$D(Q,C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$

[Figure: time series Q and C and the distance D(Q,C) between them]

What’s wrong with Euclidean Distance?

• Two sequences can be similar in shape but shifted and on different scales. Remedy (Goldin and Kanellakis, 1995): normalize the time series before measuring the distance between them, where μ and σ are the mean and standard deviation of the series:

$$x_i' = \frac{x_i - \mu}{\sigma}$$

• What if a sequence is stretched or compressed along the time axis?
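The effect of normalization on the Euclidean distance can be sketched in a few lines of Python. This is only an illustration: the function names, the toy sequences and the use of the population standard deviation are assumptions, not part of the original slides.

```python
import math
from statistics import mean, pstdev

def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant series has no shape to compare; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

def euclidean_distance(q, c):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

# Hypothetical data: c has the same shape as q, shifted by 10 and scaled by 5.
q = [1.0, 2.0, 3.0, 2.0, 1.0]
c = [10.0 + 5.0 * x for x in q]

raw = euclidean_distance(q, c)
normalized = euclidean_distance(z_normalize(q), z_normalize(c))
print(f"raw distance: {raw:.2f}, after normalization: {normalized:.2e}")
```

The raw distance is large even though the two shapes are identical; after normalization it collapses to (numerically) zero. Normalization does nothing for stretching along the time axis, which is the case Dynamic Time Warping addresses.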

Fixed time axis: sequences are aligned “one to one”.

“Warped” time axis: nonlinear alignments are possible.

Dynamic Time Warping (Berndt et al.)

Dynamic Time Warping is a technique that finds the optimal alignment between two time series when one of them may be “warped” non-linearly by stretching or shrinking it along its time axis.

This warping between two time series can then be used to determine the similarity between them.

DYNAMIC TIME WARPING [BERNDT, CLIFFORD, 1994]

Allows acceleration-deceleration of signals along the time dimension.

Basic idea

X = (x1, x2, ..., xN), N ∈ ℕ
Y = (y1, y2, ..., yM), M ∈ ℕ

* Data sequences should be sampled at equidistant points in time.

The algorithm starts by building the distance matrix C ∈ ℝ^(N×M), representing all pairwise distances between X and Y. This distance matrix is also called the local cost matrix:

c(i,j) = ||xi - yj||, i ∈ [1 : N], j ∈ [1 : M]

Once the local cost matrix is built, the algorithm finds the alignment path that runs through the low-cost areas (“valleys”) of the cost matrix.
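As a rough illustration of the local cost matrix, the sketch below builds it for two short one-dimensional sequences. The function name and the sample values are hypothetical, and the absolute difference is assumed as the norm because the samples are scalars.

```python
def local_cost_matrix(x, y):
    """Return the N x M matrix of pairwise distances c(i, j) = |x_i - y_j|."""
    return [[abs(xi - yj) for yj in y] for xi in x]

x = [1.0, 3.0, 4.0]        # N = 3
y = [1.0, 2.0, 4.0, 4.0]   # M = 4

cost = local_cost_matrix(x, y)
for row in cost:
    print(row)
```

Row i holds the distances from x_i to every point of Y, so the low values in each row mark the "valleys" that a good alignment path should pass through.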

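HOW IS DTW CALCULATED?

$$\gamma(i,j) = d(q_i, c_j) + \min\{\gamma(i-1,j-1),\ \gamma(i-1,j),\ \gamma(i,j-1)\}$$

Here γ(i,j) is the cumulative distance up to cell (i,j) and d(qi,cj) is the local distance between the two aligned points.

[Figure: sequences Q and C along the axes of the cost matrix, with the warping path w]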

CONSTRAINTS

The warping path is w = w1, w2, ..., wK, with wk = (nk, mk).

Boundary condition
The starting and ending points of the warping path must be the first and the last points of the aligned sequences, i.e. w1 = (1,1) and wK = (N,M).

Monotonicity condition
n1 ≤ n2 ≤ ... ≤ nK and m1 ≤ m2 ≤ ... ≤ mK.
This condition preserves the time-ordering of points.

Step size condition
This criterion prevents the warping path from making long jumps (shifts in time) while aligning sequences, i.e. for each cell (i,j) we look only at the neighbouring values w(i-1,j-1), w(i-1,j) and w(i,j-1).
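The three conditions can be checked mechanically for any candidate path. The following is a minimal sketch, assuming a path is given as a list of 1-based (i, j) index pairs; the function name and the example paths are made up for illustration.

```python
def is_admissible_path(path, n, m):
    """Check the boundary, monotonicity and step-size conditions.

    `path` is a list of 1-based (i, j) index pairs; `n` and `m` are the
    lengths of the two aligned sequences.
    """
    if not path:
        return False
    # Boundary condition: start at (1, 1) and end at (n, m).
    if path[0] != (1, 1) or path[-1] != (n, m):
        return False
    for (i_prev, j_prev), (i, j) in zip(path, path[1:]):
        di, dj = i - i_prev, j - j_prev
        # Monotonicity: neither index may decrease.
        if di < 0 or dj < 0:
            return False
        # Step size: only the moves (1, 0), (0, 1) and (1, 1) are allowed.
        if (di, dj) not in {(1, 0), (0, 1), (1, 1)}:
            return False
    return True

print(is_admissible_path([(1, 1), (2, 2), (2, 3), (3, 4)], 3, 4))           # admissible
print(is_admissible_path([(1, 2), (2, 3), (3, 4)], 3, 4))                   # boundary violated
print(is_admissible_path([(1, 1), (2, 2), (1, 3), (2, 4), (3, 4)], 3, 4))   # monotonicity violated
print(is_admissible_path([(1, 1), (3, 2), (3, 4)], 3, 4))                   # step size violated
```

With this step set the step-size test already implies monotonicity; the separate check is kept only so that each condition is visible in the code.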

Shanmukha Sreenivas P , DoMS


CONSTRAINT VISUALIZATION

a) Admissible path satisfying constraints

b) Violation of boundary condition

c) Violation of monotonicity

d) Violation of step size


Global constraints: Sakoe-Chiba Band and Itakura Parallelogram

A global constraint restricts the indices of the warping path wk = (i,j)k such that j - r ≤ i ≤ j + r, where r is a term defining the allowed range of warping for a given point in a sequence.

r =

STEP SIZE CONDITION

DYNAMIC TIME WARPING


Advantages:

EXAMPLE

         s1     s2     s3     s4     s5     s6     s7     s8     s9
q1     3.76   8.07   1.64   1.08   2.86   0.00   0.06   1.88   1.25
q2     2.02   5.38   0.58   2.43   4.88   0.31   0.59   3.57   2.69
q3     6.35  11.70   3.46   0.21   1.23   0.29   0.11   0.62   0.29
q4     16.8  25.10  11.90   1.28   0.23   4.54   3.69   0.64   1.10
q5     3.20   7.24   1.28   1.42   3.39   0.04   0.16   2.31   1.61
q6     3.39   7.51   1.39   1.30   3.20   0.02   0.12   2.16   1.49
q7     4.75   9.49   2.31   0.64   2.10   0.04   0.00   1.28   0.77
q8     0.96   3.53   0.10   4.00   7.02   1.00   1.46   5.43   4.33
q9     0.02   1.08   0.27   8.07  12.18   3.39   4.20  10.05   8.53

Matrix of the pair-wise distances for element si with qj

EXAMPLE

         s1     s2     s3     s4     s5     s6     s7     s8     s9
q1     3.76  11.83  13.47  14.55  17.41  17.41  17.47  19.35  20.60
q2     5.78   9.14   9.72  12.15  17.03  17.34  17.93  21.04  22.04
q3    12.13  17.48  12.60   9.93  11.16  11.45  11.56  12.18  12.47
q4    29.02  37.23  24.50  11.21  10.16  14.70  15.14  12.20  13.28
q5    32.22  36.26  25.78  12.63  13.55  10.20  10.36  12.67  13.81
q6    35.61  39.73  27.17  13.93  15.83  10.22  10.32  12.48  13.97
q7    40.36  45.10  29.48  14.57  16.03  10.26  10.22  11.50  12.27
q8    41.32  43.89  29.58  18.57  21.59  11.26  11.68  15.65  15.83
q9    41.34  42.40  29.85  26.64  30.75  14.65  15.46  21.73  24.18

Matrix computed with dynamic programming, based on the recurrence:

dist(i,j) = dist(si, qj) + min{ dist(i-1,j-1), dist(i,j-1), dist(i-1,j) }

Window size = 2
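The recurrence can be tried on the top-left 3 x 3 corner of the pairwise-distance table. The sketch below is illustrative only: the function names are hypothetical, no window is applied, and the first row and column are assumed to accumulate along the border, which is consistent with the values shown in the table.

```python
def accumulate(cost):
    """Accumulated cost matrix for a local cost matrix given as a list of rows."""
    rows, cols = len(cost), len(cost[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = acc[0][j - 1]      # first row: only a horizontal move
            elif j == 0:
                best = acc[i - 1][0]      # first column: only a vertical move
            else:
                best = min(acc[i - 1][j - 1], acc[i][j - 1], acc[i - 1][j])
            acc[i][j] = cost[i][j] + best
    return acc

def backtrack(acc):
    """Recover a warping path (0-based cells) by walking back from the last cell."""
    i, j = len(acc) - 1, len(acc[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)         # cheapest predecessor
        path.append((i, j))
    path.reverse()
    return path

# Rows q1..q3 and columns s1..s3 of the pairwise-distance table.
local = [
    [3.76, 8.07, 1.64],
    [2.02, 5.38, 0.58],
    [6.35, 11.70, 3.46],
]
acc = accumulate(local)
path = backtrack(acc)
for row in acc:
    print([round(v, 2) for v in row])
print(path)
```

The rounded result matches the corresponding corner of the accumulated table (3.76, 11.83, 13.47 / 5.78, 9.14, 9.72 / 12.13, 17.48, 12.60), and for this small corner the recovered path runs along the diagonal.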

FORMULATION

Let D(i, j) refer to the dynamic time warping distance between the subsequences x1, x2, …, xi and y1, y2, …, yj:

D(i, j) = | xi – yj | + min{ D(i – 1, j), D(i – 1, j – 1), D(i, j – 1) }

SOLUTION BY DYNAMIC PROGRAMMING

• Basic implementation: O(n^2), where n is the length of the sequences, because the problem has to be solved for each (i, j) pair.

• If a warping window w is specified: O(nw), because only the (i, j) pairs where | i – j | <= w are solved.
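A compact way to see both complexity cases is a single function with an optional window. This is a sketch under stated assumptions: the function name and test sequences are hypothetical, the base case D(0, 0) = 0 with an infinite border is assumed (the slides do not state one), and the window is widened to |n - m| when the sequences differ in length so that the final cell stays reachable.

```python
import math

def dtw_distance(x, y, window=None):
    """DTW distance using D(i, j) = |x_i - y_j| + min of the three neighbours.

    If `window` is given, only cells with |i - j| <= window are evaluated.
    """
    n, m = len(x), len(y)
    if window is not None:
        # The band must be wide enough to reach the corner cell (n, m).
        window = max(window, abs(n - m))
    inf = math.inf
    # Extra row and column of infinities so that D[0][0] = 0 is the base case.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        if window is None:
            j_lo, j_hi = 1, m
        else:
            j_lo, j_hi = max(1, i - window), min(m, i + window)
        for j in range(j_lo, j_hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]

x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # same shape, first point repeated

print(dtw_distance(x, y))             # full matrix, O(n^2) cells
print(dtw_distance(x, y, window=1))   # banded, O(nw) cells
```

Both calls return 0.0 here, because repeating a point is exactly the kind of local stretching that warping absorbs. Note that this sketch still allocates the full matrix; only the number of evaluated cells drops to O(nw).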

FEATURE-BASED DIMENSIONALITY REDUCTION

• Time series databases are often extremely large. Searching directly on these data is complex and inefficient.

• To overcome this problem, we should use transformation methods that reduce the size (number of points) of the time series.

• These transformation methods are called dimensionality reduction techniques.

An Example of a Dimensionality Reduction Technique I

[Figure: time series C plotted on a time axis from 0 to 140]

The graphic shows a time series with 128 points (n = 128). The raw data used to produce the graphic is also reproduced as a column of numbers (just the first 30 or so points are shown).

Raw data: 0.4995, 0.5264, 0.5523, 0.5761, 0.5973, 0.6153, 0.6301, 0.6420, 0.6515, 0.6596, 0.6672, 0.6751, 0.6843, 0.6954, 0.7086, 0.7240, 0.7412, 0.7595, 0.7780, 0.7956, 0.8115, 0.8247, 0.8345, 0.8407, 0.8431, 0.8423, 0.8387

An Example of a Dimensionality Reduction Technique II

[Figure: time series C plotted on a time axis from 0 to 140, shown next to its raw data and its Fourier coefficients]

Fourier coefficients: 1.5698, 1.0485, 0.7160, 0.8406, 0.3709, 0.4670, 0.2667, 0.1928, 0.1635, 0.1602, 0.0992, 0.1282, 0.1438, 0.1416, 0.1400, 0.1412, 0.1530, 0.0795, 0.1013, 0.1150, 0.1801, 0.1082, 0.0812, 0.0347, 0.0052, 0.0017, 0.0002, ...

Truncated Fourier coefficients: 1.5698, 1.0485, 0.7160, 0.8406, 0.3709, 0.4670, 0.2667, 0.1928

n = 128, N = 8, Cratio = 1/16

Excellent approximation, with only 2 frequencies!
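The idea of keeping only the first few Fourier coefficients can be sketched with a naive transform written in plain Python. Everything here is an assumption made for illustration: the function names, the synthetic two-frequency signal, and the choice to count `keep` as the number of lowest complex frequencies retained (the slides do not say how the N = 8 values are counted).

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def truncate(coeffs, keep):
    """Keep the `keep` lowest frequencies and zero out the rest."""
    n = len(coeffs)
    out = [0j] * n
    for k in range(keep):
        out[k] = coeffs[k]
        if k > 0:
            out[n - k] = coeffs[n - k]   # conjugate mirror keeps the result real
    return out

def inverse_dft(coeffs):
    """Inverse transform; returns the real part of each reconstructed point."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

n, keep = 128, 8
# Hypothetical signal built from two low frequencies.
series = [math.sin(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
          for t in range(n)]

approx = inverse_dft(truncate(dft(series), keep))
max_error = max(abs(a - b) for a, b in zip(series, approx))
compression_ratio = keep / n
print(f"max reconstruction error: {max_error:.2e}, ratio: {compression_ratio}")
```

Because the synthetic signal contains only two low frequencies, the truncated representation reconstructs it almost exactly; real data would lose whatever energy sits in the discarded coefficients.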

Fourier Analysis of Time Series using R

• No. of observations (n) = 11

• Max freq = (n-1)/2 = 5w

• No. of cosines = {(n-1)/2} + 1 = 6

• No. of sines = {(n-1)/2} = 5
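The counts of cosines and sines follow from the standard Fourier representation of a series with an odd number of observations. The equation below is a sketch of that representation; the coefficient names a_k and b_k are introduced here for illustration, and reading "w" as the fundamental frequency 2π/n is an assumption.

```latex
x_t = a_0 + \sum_{k=1}^{(n-1)/2} \left[ a_k \cos\!\left(\frac{2\pi k t}{n}\right)
      + b_k \sin\!\left(\frac{2\pi k t}{n}\right) \right],
      \qquad t = 0, 1, \dots, n-1
```

For n = 11 the sum runs over k = 1, ..., 5. Counting the constant a_0 as the cosine at frequency 0 gives 6 cosine terms, and there are 5 sine terms, so 6 + 5 = 11 coefficients describe the 11 observations exactly.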

[Figure: six panels, each on a time axis from 0 to 120, labelled DFT, DWT, SVD, APCA, PAA and PLA]

DISCRETIZATION

• Discretization of a time series means transforming it into a symbolic string.

• The main benefit of discretization is that there is an enormous wealth of existing algorithms and data structures that allow the efficient manipulation of symbolic representations.

• Lin, Keogh et al. (2003) proposed a method called Symbolic Aggregate Approximation (SAX), which allows the discretization of the original time series into symbolic strings.

SYMBOLIC AGGREGATE APPROXIMATION (SAX) [LIN ET AL. 2003]

The first symbolic representation of time series that allows discretization of time series into symbolic strings, for example: baabccbc

HOW DO WE OBTAIN SAX?

First convert the time series to PAA representation, then convert the PAA to symbols.

[Figure: time series C and its PAA approximation on a time axis from 0 to 120; each PAA segment is mapped to one of the symbols a, b, c, giving the word baabccbc]
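The two steps can be sketched with the Python standard library. This is a minimal illustration rather than a reference implementation: the function names and the toy series are hypothetical, the series length is assumed to be a multiple of the word size, and the breakpoints are taken as equiprobable quantiles of the standard normal distribution applied to the z-normalized series.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev

def paa(series, word_size):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % word_size != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of word_size")
    seg = n // word_size
    return [mean(series[k * seg:(k + 1) * seg]) for k in range(word_size)]

def sax(series, word_size, alphabet_size):
    """Convert a series to a SAX word over the letters a, b, c, ..."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalized")
    z = [(x - mu) / sigma for x in series]
    # Breakpoints that cut the standard normal curve into equal-probability bins.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"   # supports alphabet_size <= 26
    return "".join(letters[bisect_left(breakpoints, v)] for v in paa(z, word_size))

# Hypothetical series: low, middle, high, middle.
series = [-2, -2, 0, 0, 2, 2, 0, 0]
word = sax(series, word_size=4, alphabet_size=3)
print(word)
```

With a word size of 4 and an alphabet of 3, each pair of points becomes one letter: the lowest segment maps to a, the middle segments to b and the highest to c.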

TWO PARAMETER CHOICES

• The word size, in this case 8

• The alphabet size (cardinality), in this case 3

[Figure: time series C and its PAA approximation on a time axis from 0 to 120, with the 8 segments numbered 1 to 8 and the 3 symbol levels numbered 1 to 3]

• Structural representations help in understanding time series through data analysis and visualization.

• SAX is claimed to be a landmark representation of time series.

• It is symbolic and therefore allows the use of discrete data structures and their corresponding algorithms for analysis.

• It also helps with visualization.

THANK YOU

Datasets and code used in this presentation can be found at www.cs.ucr.edu/~eamonn/TSDMA/index.html