Periodic pattern mining

Periodic Pattern Mining in Time Series Databases. Ashis Kumar Chanda, Swapnil Saha. Department of Computer Science and Engineering, University of Dhaka.

Transcript of Periodic pattern mining

Page 1: Periodic pattern mining


Periodic Pattern Mining in Time Series Databases

Ashis Kumar Chanda, Swapnil Saha

Department of Computer Science and Engineering, University of Dhaka

Page 2: Periodic pattern mining

Topics to be covered

- Introduction
- Time Series Database
- Key Terms
- Suffix Tree Generation
- Periodic Pattern Detection
- Conclusion

Page 3: Periodic pattern mining


Introduction

What is a time-series database?

A time-series database consists of sequences of values or events obtained over repeated measurements of time, typically at fixed time intervals (e.g., hourly, daily, weekly).

Page 4: Periodic pattern mining

MATHEMATICAL REPRESENTATION

A time series is a set of observations taken at specified times.

Consider a time series involving a variable Y. If the series is defined by the values y1, y2, y3, ... observed at times t1, t2, t3, ..., then we can write Y as a function of time: Y = F(t)

Page 5: Periodic pattern mining

CATEGORIES OF TIME SERIES

- Long term movements
- Cyclic movements
- Seasonal movements
- Irregular or random movements

We can denote these movements by the variables L, C, S, and I respectively.

The time series variable is then Y = L+C+S+I (additive model) or Y = L*C*S*I (multiplicative model).
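The two ways of combining the movements can be illustrated with a small sketch. The component values below are made up for illustration, and the function name `compose` is an assumption, not something from the slides.

```python
import math

def compose(L, C, S, I, multiplicative=False):
    """Combine the four movement components into one observed value Y."""
    return L * C * S * I if multiplicative else L + C + S + I

# Hypothetical components for time steps t = 0..7 (illustrative values only).
series = []
for t in range(8):
    L = 10.0 + 0.5 * t                        # long term movement (trend)
    C = 2.0 * math.sin(2 * math.pi * t / 8)   # cyclic movement
    S = 1.0 if t % 4 == 0 else 0.0            # seasonal movement (every 4 steps)
    I = 0.0                                   # irregular movement (noise left out)
    series.append(compose(L, C, S, I))        # additive model Y = L+C+S+I

print(series)
```

Passing `multiplicative=True` switches the same call to Y = L*C*S*I.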

Page 6: Periodic pattern mining

TYPES OF PERIODICITY

Symbol periodicity: axy apq amn

Sequence periodicity: abxy abpq abmn

Segment periodicity: abxy abxy abxy

Page 7: Periodic pattern mining

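KEY TERMS

Perfect Periodicity

abxy abpq abmn: the pattern ab occurs at every expected position (period 4, starting at position 0), so its periodicity is perfect.

abxy acpq abmn: the pattern ab occurs at only 2 of the 3 expected positions.

Here conf(4, 0, ab) = 2/3 = 0.67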
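The confidence value above can be reproduced with a minimal sketch. It assumes that confidence is the number of expected positions (stPos, stPos+p, stPos+2p, ...) where the pattern actually occurs, divided by the number of expected positions that fit in the series. The function name `confidence` and its parameters are illustrative.

```python
def confidence(text, pattern, period, st_pos=0):
    """Fraction of expected positions st_pos, st_pos+p, ... where pattern occurs."""
    last_start = len(text) - len(pattern)          # last index where pattern can fit
    expected = range(st_pos, last_start + 1, period)
    if len(expected) == 0:
        return 0.0
    hits = sum(1 for i in expected if text.startswith(pattern, i))
    return hits / len(expected)

print(confidence("abxyabpqabmn", "ab", 4))            # 1.0  (perfect periodicity)
print(round(confidence("abxyacpqabmn", "ab", 4), 2))  # 0.67
```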

Page 8: Periodic pattern mining

KEY TERMS

Periodicity in a Subsection of a Time Series

T = gbxy asdf abpq abmn

stPos = 8

endPos = 15

So the subsection is abpq abmn, the part of T from position 8 to position 15.

Page 9: Periodic pattern mining

KEY TERMS

Periodicity with Time Tolerance

We cannot always get noise-free time series data, so we check a few positions around the target position as well.

This extra margin is known as the time tolerance (tt).

If X is a pattern with period p in T, then we check at stPos, stPos+p±tt, stPos+2p±tt, ...
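A minimal sketch of the tolerance check follows. It assumes the expected positions stay fixed at stPos + k*p (they are not re-anchored after a shifted match), and the function name, parameters, and the noisy example string are all hypothetical.

```python
def match_with_tolerance(text, pattern, period, st_pos=0, tt=0):
    """For each expected slot, return the matched position or None."""
    matches = []
    k = 0
    while st_pos + k * period + len(pattern) <= len(text):
        expected = st_pos + k * period
        limit = tt if k > 0 else 0  # the first slot is checked exactly at st_pos
        # Try offset 0 first, then +1, -1, +2, -2, ... up to the tolerance.
        offsets = [0] + [sign * d for d in range(1, limit + 1) for sign in (1, -1)]
        found = None
        for off in offsets:
            pos = expected + off
            if pos >= 0 and text.startswith(pattern, pos):
                found = pos
                break
        matches.append(found)
        k += 1
    return matches

noisy = "abxyzabpqabmn"  # hypothetical series: a stray 'z' shifts later symbols by one
print(match_with_tolerance(noisy, "ab", 4, tt=0))  # [0, None, None]
print(match_with_tolerance(noisy, "ab", 4, tt=1))  # [0, 5, 9]
```

With tt = 0 the stray symbol hides two of the three occurrences; with tt = 1 all three are found.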

Page 10: Periodic pattern mining

KEY TERMS

A period in a time series may be represented by a 5-tuple (S, p, stPos, endPos, Conf)

S = the sequence that forms the periodic pattern

p = the period: the pattern is checked after every p characters

stPos, endPos = the starting and ending positions of the segment where the pattern is matched

Conf = confidence

Page 11: Periodic pattern mining

KEY TERMS

Suppose T = abxy acpq abdd abmn. Then (ab, 4, 0, 11, 1) means: find the pattern ab in T from position 0 to position 11, checking after every 4 characters.

a b x y a c p q a b d  d  a b m n
0 1 2 3 4 5 6 7 8 9 10 11

Page 12: Periodic pattern mining

KEY TERMS

Occurrence Vector:

a b c a b b a b b a $
0 1 2 3 4 5 6 7 8 9

Occurrence vector of a: (0 3 6 9)

Occurrence vector of ab: (0 3 6)
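As a small illustration, an occurrence vector can be computed directly by scanning the string. This brute-force scan is only a sketch (the function name is illustrative); the slides obtain the same vectors from a suffix tree instead.

```python
def occurrence_vector(text, pattern):
    """All 0-based start positions at which pattern occurs (overlaps included)."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]

T = "abcabbabba$"
print(occurrence_vector(T, "a"))   # [0, 3, 6, 9]
print(occurrence_vector(T, "ab"))  # [0, 3, 6]
```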

Page 13: Periodic pattern mining

KEY TERMS

Difference Vector:

a b c a b b a b b a $
0 1 2 3 4 5 6 7 8 9

Occurrence vector of a: 0 3
Difference vector: 3

Occurrence vector of bb: 4 7
Difference vector: 3
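The difference vector is simply the list of gaps between consecutive entries of an occurrence vector. A minimal sketch, with an illustrative function name:

```python
def difference_vector(occurrences):
    """Gaps between consecutive positions of a sorted occurrence vector."""
    return [b - a for a, b in zip(occurrences, occurrences[1:])]

print(difference_vector([0, 3]))        # [3]
print(difference_vector([4, 7]))        # [3]
print(difference_vector([0, 3, 6, 9]))  # [3, 3, 3]
```

Each entry of the difference vector is a candidate period for the pattern.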

Page 14: Periodic pattern mining

How do we get a string representation from a transactional database?

Discretization Technique

Page 15: Periodic pattern mining


DISCRETIZATION TECHNIQUE

Page 16: Periodic pattern mining

DISCRETIZATION TECHNIQUE

We need to define ranges or groups over the database values and represent each range by a unique ASCII character.

Suppose, in our previous example:

log in is denoted by a
log out is denoted by x
before log in is denoted by b
before log out is denoted by c
after log out is denoted by d

Page 17: Periodic pattern mining


DISCRETIZATION TECHNIQUE

Page 18: Periodic pattern mining


DISCRETIZATION TECHNIQUE

accx acxd axdd bacx
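The mapping from states to symbols can be sketched as a lookup table. The state names and symbols come from the slides; the idea that each record is a day split into four slots is an assumption made only to reproduce the first string above.

```python
# Symbol assigned to each state, as defined on the earlier slide.
SYMBOLS = {
    "log in": "a",
    "log out": "x",
    "before log in": "b",
    "before log out": "c",
    "after log out": "d",
}

def discretize(states):
    """Turn a sequence of state names into a string of symbols."""
    return "".join(SYMBOLS[s] for s in states)

# Hypothetical record: one state per time slot.
day = ["log in", "before log out", "before log out", "log out"]
print(discretize(day))  # accx
```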

Page 19: Periodic pattern mining

SUFFIX TREE GENERATION

'abcabbabb$' has the following ten suffixes. We can ignore the 10th suffix when generating the suffix tree.

1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
9. b$
10. $
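The list of suffixes can be generated with a one-line slice per starting position. This is a minimal sketch; the function name is illustrative.

```python
def suffixes(text):
    """Return every suffix of text, longest first."""
    return [text[i:] for i in range(len(text))]

for number, suffix in enumerate(suffixes("abcabbabb$"), start=1):
    print(f"{number}. {suffix}")
```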

Page 20: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$

[Figure: suffix tree for the suffixes listed above]

Page 21: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$

[Figure: suffix tree for the suffixes listed above]

Page 22: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$

[Figure: suffix tree for the suffixes listed above]

Page 23: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$

[Figure: suffix tree for the suffixes listed above]

Page 24: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$

[Figure: suffix tree for the suffixes listed above]

Page 25: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$

[Figure: suffix tree for the suffixes listed above]

Page 26: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$

[Figure: suffix tree for the suffixes listed above]

Page 27: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$

[Figure: suffix tree for the suffixes listed above]

Page 28: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$

[Figure: suffix tree for the suffixes listed above]

Page 29: Periodic pattern mining

SUFFIX TREE GENERATION

Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
9. b$

[Figure: suffix tree for the suffixes listed above]

Page 30: Periodic pattern mining

SUFFIX TREE: abcabbabb$

Each leaf node holds a number that represents the starting position of its suffix.

Each intermediate node holds a number that is the length of the substring read from the root to that node.

[Figure: suffix tree for abcabbabb$ with the leaf and intermediate node numbers]

Page 31: Periodic pattern mining

SUFFIX TREE: abcabbabb$

Find Occurrence Vector

[Figure: annotated suffix tree for abcabbabb$; the node for abb is labelled with the occurrence vector (3,6)]

Page 32: Periodic pattern mining

SUFFIX TREE: abcabbabb$

Find Occurrence Vector

[Figure: annotated suffix tree for abcabbabb$; the node for abb is labelled (3,6) and the node for ab is labelled (0,3,6)]

Page 33: Periodic pattern mining

SUFFIX TREE: abcabbabb$

Find Occurrence Vector

[Figure: annotated suffix tree for abcabbabb$; the nodes are labelled abb (3,6), ab (0,3,6), bb (4,7), and b (1,5,8,4,7)]
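The tree and its occurrence vectors can be reproduced with a short sketch. It inserts the suffixes one at a time, as on the earlier slides, using a naive quadratic-time construction rather than a linear-time suffix tree algorithm. The class and function names are assumptions. Each leaf stores the starting position of its suffix, each intermediate node stores the length of the substring from the root, and the occurrence vector of an intermediate node is the set of leaf numbers below it.

```python
class Node:
    def __init__(self, depth=0, start=None):
        self.children = {}  # first character of edge -> (edge label, child node)
        self.depth = depth  # length of the substring read from the root
        self.start = start  # starting position of the suffix (leaves only)

def common_prefix_len(a, b):
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

def build_suffix_tree(text):
    """Naive construction; text must end with a unique terminator such as '$'."""
    root = Node()
    for i in range(len(text)):
        node, rest = root, text[i:]
        while True:
            entry = node.children.get(rest[0])
            if entry is None:  # no edge starts with this character: add a leaf
                node.children[rest[0]] = (rest, Node(node.depth + len(rest), i))
                break
            label, child = entry
            k = common_prefix_len(label, rest)
            if k == len(label):  # whole edge matched: continue below the child
                node, rest = child, rest[k:]
                continue
            # Partial match: split the edge at k and hang the new leaf there.
            mid = Node(node.depth + k)
            mid.children[label[k]] = (label[k:], child)
            mid.children[rest[k]] = (rest[k:], Node(node.depth + len(rest), i))
            node.children[rest[0]] = (label[:k], mid)
            break
    return root

def collect(node, path, out):
    """Return the leaf positions under node; record them for intermediate nodes."""
    if not node.children:
        return [node.start]
    positions = []
    for label, child in node.children.values():
        positions += collect(child, path + label, out)
    if path:  # skip the root
        out[path] = sorted(positions)
    return positions

vectors = {}
collect(build_suffix_tree("abcabbabb$"), "", vectors)
print(vectors)
```

For abcabbabb$ this yields the vectors (0,3,6) for ab, (3,6) for abb, (4,7) for bb, and the positions 1, 4, 5, 7, 8 for b, printed here in sorted order.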

Page 34: Periodic pattern mining

PERIODICITY DETECTION

Input: a time series of size n
Output: positions of periodic patterns

Process:
for each occurrence vector of size k
    find p
    for 0 to k
        check each position after p characters
    count the confidence
    add to the list if it is greater than the threshold

Page 35: Periodic pattern mining

STEPS

abcabbabb$
ab - (0,3,6)
abb - (3,6)
bb - (4,7)
b - (1,5,8,4,7)

stPos = 0
endPos = 6
p = 3-0 = 3

Now check the occurrence vector of ab: if a difference equals p, increment the count.

Check the confidence. Add to the pattern list if confidence >= Θ.

Page 36: Periodic pattern mining

STEPS

abcdabcabcab$
ab - (0,4,7,10)

stPos = 0
endPos = 10
p = 4-0 = 4

Now check the occurrence vector of ab: if a difference equals p, increment the count.

Only one difference equals p, so only one match is found from position 0 to 10 with p = 4.

Page 37: Periodic pattern mining

STEPS

abcdabcabcab$
ab - (0,4,7,10)

stPos = 4
endPos = 10
p = 7-4 = 3

Now check the occurrence vector of ab: if a difference equals p, increment the count.

Three occurrences of the pattern are found from position 4 to 10 with p = 3 (at positions 4, 7, and 10).
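The detection loop and the worked examples above can be sketched as follows. This is an illustrative simplification, not the authors' exact algorithm: each difference between consecutive occurrences is taken as a candidate period p, the occurrences at stPos, stPos+p, stPos+2p, ... are counted, and the confidence is that count divided by the number of such positions that fit in the series. The function name, the parameter names, and the rule for skipping repeated candidates are assumptions.

```python
def detect_periodicity(occ, pattern_len, n, threshold):
    """Return (p, stPos, endPos, conf) candidates for one occurrence vector.

    occ: sorted occurrence vector of the pattern
    n:   length of the series without the '$' terminator
    """
    present = set(occ)
    seen = set()      # (period, phase) pairs already reported
    results = []
    for j in range(len(occ) - 1):
        st_pos = occ[j]
        p = occ[j + 1] - occ[j]  # candidate period from the difference vector
        if (p, st_pos % p) in seen:
            continue             # same period and phase as an accepted candidate
        expected = range(st_pos, n - pattern_len + 1, p)
        hits = [pos for pos in expected if pos in present]
        conf = len(hits) / len(expected)
        if conf >= threshold:
            results.append((p, st_pos, hits[-1], conf))
            seen.add((p, st_pos % p))
    return results

# ab in abcabbabb$ (n = 9): period 3 from position 0 to 6.
print(detect_periodicity([0, 3, 6], 2, 9, 0.7))       # [(3, 0, 6, 1.0)]
# ab in abcdabcabcab$ (n = 12): only p = 3 starting at position 4 passes 0.7.
print(detect_periodicity([0, 4, 7, 10], 2, 12, 0.7))  # [(3, 4, 10, 1.0)]
```

Lowering the threshold to 0.5 also reports the p = 4 candidate starting at position 0, whose confidence under this definition is 2/3.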

Page 38: Periodic pattern mining


ALGORITHM

Page 39: Periodic pattern mining

DISCUSS

- Elfeky proposed two separate algorithms, CONV and WARP, to detect symbol and segment periodicity. However, they do not handle periodicity in a sub-sequence, and their complexities are O(n log n) and O(n^2).

- The algorithm in Han's paper works on sub-sequences, but it needs user input.

Page 40: Periodic pattern mining

DISCUSS

- From this perspective, the algorithm discussed here is better than the previous ones

- Complexity O(n log n)

- Works online

Page 41: Periodic pattern mining

References

- Periodic pattern mining using suffix tree, by Rasheed, Al-Shalalfa, and Alhajj, 2011

- Effective periodic pattern mining in time series database, by Nishi, Farhan, Samiullah, and Jeong

- Data Mining: Concepts and Techniques, by J. Han and M. Kamber

- Database System Concepts, by Abraham Silberschatz, Korth, and Sudarshan

Page 42: Periodic pattern mining


Questions

Page 43: Periodic pattern mining


Thank You