Periodic pattern mining
1 I NAME OF PRESENTER
Periodic Pattern Mining in Time Series Databases
Ashis Kumar Chanda, Swapnil Saha
Department of Computer Science and Engineering, University of Dhaka
2
Topics to be covered
- Introduction
- Time Series Database
- Key Terms
- Suffix Tree Generation
- Periodic Pattern Detection
- Conclusion
3
Introduction
What is a time-series database?
A time-series database consists of sequences of values or events obtained over repeated measurements of time, typically at fixed time intervals (e.g., hourly, daily, weekly).
4
MATHEMATICAL REPRESENTATION
A time series is a set of observations taken at specified times.
Consider a time series involving a variable Y: if the series is defined by the values y1, y2, y3, ... at times t1, t2, t3, ..., then we can write Y as a function of time, Y = F(t).
5
CATEGORIES OF TIME SERIES
Long-term movements, cyclic movements, seasonal movements, and irregular or random movements
We denote these movements by the variables L, C, S and I, respectively.
The time series variable Y can then be modeled as Y = L + C + S + I (additive) or Y = L * C * S * I (multiplicative).
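A minimal Python sketch of the two models, assuming the four components are already known as equal-length lists. The component values and the helper names `additive` and `multiplicative` are invented for illustration. In the additive form the components are offsets in the units of Y; in the multiplicative form C, S and I act as relative factors around 1.

```python
def additive(L, C, S, I):
    """Y_t = L_t + C_t + S_t + I_t at every time point t."""
    return [l + c + s + i for l, c, s, i in zip(L, C, S, I)]

def multiplicative(L, C, S, I):
    """Y_t = L_t * C_t * S_t * I_t; C, S and I are relative factors around 1."""
    return [l * c * s * i for l, c, s, i in zip(L, C, S, I)]

trend = [100.0, 102.0, 104.0, 106.0]   # long-term movement L (invented values)

# Additive: C, S, I are offsets in the same units as the trend.
y_add = additive(trend, [0, 1, 0, -1], [5, -5, 5, -5], [0.5, 0, -0.5, 0])

# Multiplicative: C, S, I are scaling factors.
y_mul = multiplicative(trend, [1.0, 1.01, 1.0, 0.99],
                       [1.05, 0.95, 1.05, 0.95], [1.0, 1.0, 1.0, 1.0])

print(y_add)  # [105.5, 98.0, 108.5, 100.0]
print(y_mul)
```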
6
TYPES OF PERIODICITY
Symbol periodicity: axy apq amn
Sequence periodicity: abxy abpq abmn
Segment periodicity: abxy abxy abxy
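The three types can be checked in their strictest (perfect) form with a short Python sketch. The function names are hypothetical, and the series are written without the spaces used above, which only mark period boundaries.

```python
def symbol_periodic(T, symbol, p, st_pos=0):
    """Symbol periodicity: `symbol` appears at st_pos, st_pos + p, st_pos + 2p, ..."""
    return all(T[i] == symbol for i in range(st_pos, len(T), p))

def sequence_periodic(T, X, p, st_pos=0):
    """Sequence periodicity: pattern X starts at st_pos, st_pos + p, ... while it fits in T."""
    return all(T.startswith(X, i) for i in range(st_pos, len(T) - len(X) + 1, p))

def segment_periodic(T, p):
    """Segment periodicity: every symbol equals the symbol p positions later."""
    return all(T[i] == T[i + p] for i in range(len(T) - p))

print(symbol_periodic("axyapqamn", "a", 3))        # True
print(sequence_periodic("abxyabpqabmn", "ab", 4))  # True
print(segment_periodic("abxyabxyabxy", 4))         # True
print(segment_periodic("abxyabpqabmn", 4))         # False
```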
7
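KEY TERMS
Perfect periodicity: abxy abpq abmn (ab starts every period)
Imperfect periodicity: abxy acpq abmn (ab is missing in the second period)
Here conf(4, 0, ab) = 2/3 ≈ 0.67: with period 4 starting at position 0, ab occurs at 2 of the 3 expected positions (0 and 8, but not 4).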
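A sketch of the confidence computation, assuming conf(p, stPos, X) is the number of positions stPos, stPos + p, stPos + 2p, ... at which X actually occurs, divided by the number of such positions at which X could still fit in T. The exact normalization used in the referenced papers may differ; `conf` is a hypothetical helper.

```python
from fractions import Fraction

def conf(T, p, st_pos, X):
    """conf(p, stPos, X): hits / expected over positions st_pos, st_pos + p, ...
    at which X could still fit in T."""
    expected = range(st_pos, len(T) - len(X) + 1, p)
    if len(expected) == 0:
        return Fraction(0)
    hits = sum(1 for i in expected if T.startswith(X, i))
    return Fraction(hits, len(expected))

perfect = "abxyabpqabmn"    # abxy abpq abmn
imperfect = "abxyacpqabmn"  # abxy acpq abmn

print(conf(perfect, 4, 0, "ab"))                     # 1
print(conf(imperfect, 4, 0, "ab"))                   # 2/3
print(round(float(conf(imperfect, 4, 0, "ab")), 2))  # 0.67
```

Using `Fraction` keeps the ratio exact, so 2/3 can be compared without floating-point rounding.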
8
KEY TERMS
Periodicity in a Subsection of a Time Series
T = gbxy asdf abpq abmn, stPos = 8, endPos = 15
So the periodic subsection is abpq abmn (positions 8 to 15), where ab repeats every 4 characters.
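Restricting the same count to [stPos, endPos] shows why subsection periodicity matters: over the whole series ab is found at only half of the expected positions, while inside the subsection it is perfect. This is a sketch; `conf_in_segment` is a hypothetical helper and endPos is treated as inclusive.

```python
from fractions import Fraction

def conf_in_segment(T, p, st_pos, end_pos, X):
    """Confidence of X with period p inside T[st_pos..end_pos] (end_pos inclusive)."""
    last_start = end_pos - len(X) + 1          # X must end at or before end_pos
    expected = range(st_pos, last_start + 1, p)
    if len(expected) == 0:
        return Fraction(0)
    hits = sum(1 for i in expected if T.startswith(X, i))
    return Fraction(hits, len(expected))

T = "gbxyasdfabpqabmn"   # gbxy asdf abpq abmn
print(conf_in_segment(T, 4, 0, 15, "ab"))  # 1/2 over the whole series
print(conf_in_segment(T, 4, 8, 15, "ab"))  # 1 over the subsection abpq abmn
```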
9
KEY TERMS
Periodicity with Time Tolerance
We cannot always get noise-free time series data, so we also check a few positions around each expected position of the target sequence. This extra slack is known as the time tolerance (tt).
If X is a pattern with period p in T, then we check for X at stPos, stPos + p ± tt, stPos + 2p ± tt, ...
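A sketch of the tolerance check, assuming the expected positions stay on the fixed grid stPos + k·p and any occurrence within ±tt of a grid point counts as a hit for that point. The noisy example series (with an extra z inserted) and the helper name are invented. With tt ≥ p/2 one occurrence could satisfy two neighbouring windows; a fuller implementation would guard against that.

```python
from fractions import Fraction

def conf_with_tolerance(T, p, st_pos, X, tt):
    """Check X at st_pos, st_pos + p ± tt, st_pos + 2p ± tt, ... and return hits / expected.
    A hit anywhere in the window around a target counts once for that target."""
    last_start = len(T) - len(X)               # last index at which X still fits
    targets = range(st_pos, last_start + 1, p)
    if len(targets) == 0:
        return Fraction(0)
    hits = 0
    for target in targets:
        window = range(max(0, target - tt), min(last_start, target + tt) + 1)
        if any(T.startswith(X, i) for i in window):
            hits += 1
    return Fraction(hits, len(targets))

T = "abxyzabpqabmn"   # abxy abpq abmn with a noise symbol z after the first period
print(conf_with_tolerance(T, 4, 0, "ab", tt=0))  # 1/3
print(conf_with_tolerance(T, 4, 0, "ab", tt=1))  # 1
```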
10
KEY TERMS
A period in a time series may be represented by a 5-tuple (S, p, stPos, endPos, Conf):
- S = the sequence (periodic pattern)
- p = the period: the pattern is checked every p characters
- stPos, endPos = the starting and ending positions of the segment in which the pattern matches
- Conf = the confidence
11
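KEY TERMS
Suppose T = abxy acpq abdd abmn. Then (ab, 4, 0, 11, Conf) means: find the pattern ab in T every 4 characters, from position 0 to position 11. Within this segment ab occurs at positions 0 and 8 but not at 4, so Conf = 2/3 ≈ 0.67, i.e., the tuple is (ab, 4, 0, 11, 0.67).
T:        a b x y a c p q a b d  d  a  b  m  n
Position: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15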
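The 5-tuple maps naturally onto a small record type. In this sketch, `Period` and `make_period` are hypothetical names, and Conf is computed as hits divided by the number of expected positions stPos, stPos + p, ... whose match would end at or before endPos (the same assumption as for conf(p, stPos, X)).

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Period:
    """The 5-tuple (S, p, stPos, endPos, Conf)."""
    S: str          # periodic pattern
    p: int          # period: the pattern is checked every p characters
    st_pos: int     # segment start (inclusive)
    end_pos: int    # segment end (inclusive)
    conf: Fraction  # hits / expected positions inside the segment

def make_period(T, S, p, st_pos, end_pos):
    expected = range(st_pos, end_pos - len(S) + 2, p)  # S must end at or before end_pos
    if len(expected) == 0:
        raise ValueError("segment too short for this pattern and period")
    hits = sum(1 for i in expected if T.startswith(S, i))
    return Period(S, p, st_pos, end_pos, Fraction(hits, len(expected)))

T = "abxyacpqabddabmn"   # abxy acpq abdd abmn
print(make_period(T, "ab", 4, 0, 11))
# Period(S='ab', p=4, st_pos=0, end_pos=11, conf=Fraction(2, 3))
```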
12
KEY TERMS
Occurrence Vector:
T:        a b c a b b a b b a $
Position: 0 1 2 3 4 5 6 7 8 9
Occurrence vector of a: (0, 3, 6, 9)
Occurrence vector of ab: (0, 3, 6)
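A naive way to obtain an occurrence vector is to test every start position. This Python sketch runs in O(n·|X|) time per pattern and stands in for the suffix-tree lookup described later; `occurrence_vector` is a hypothetical helper.

```python
def occurrence_vector(T, X):
    """Start positions of every (possibly overlapping) occurrence of X in T."""
    return [i for i in range(len(T) - len(X) + 1) if T.startswith(X, i)]

T = "abcabbabba$"
print(occurrence_vector(T, "a"))   # [0, 3, 6, 9]
print(occurrence_vector(T, "ab"))  # [0, 3, 6]
```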
13
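KEY TERMS
Difference Vector: the differences between consecutive entries of an occurrence vector
T:        a b c a b b a b b a $
Position: 0 1 2 3 4 5 6 7 8 9
Occurrence vector of a: (0, 3, 6, 9); difference vector: (3, 3, 3)
Occurrence vector of bb: (4, 7); difference vector: (3)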
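Computing a difference vector is a single pass over consecutive pairs. In this sketch the occurrence vector is again found by naive scanning, an assumption made to keep the example self-contained; both helper names are hypothetical.

```python
def occurrence_vector(T, X):
    """Start positions of every (possibly overlapping) occurrence of X in T."""
    return [i for i in range(len(T) - len(X) + 1) if T.startswith(X, i)]

def difference_vector(occ):
    """Differences between consecutive entries of a sorted occurrence vector."""
    return [b - a for a, b in zip(occ, occ[1:])]

T = "abcabbabba$"
print(difference_vector(occurrence_vector(T, "a")))   # [3, 3, 3]
print(difference_vector(occurrence_vector(T, "bb")))  # [3]
```

A difference vector whose entries are all equal signals a candidate period; here both a and bb suggest p = 3.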
14
How do we get a string representation from a transactional database?
Discretization technique
15
DISCRETIZATION TECHNIQUE
16
DISCRETIZATION TECHNIQUE
We need to define ranges or groups over the DB and characterize each range by a unique ASCII character.
Suppose, in our previous example:
log in: a
log out: x
before log in: b
before log out: c
after log out: d
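A sketch of both kinds of discretization. The event-to-character table copies the mapping above; the event sequence, the numeric cut points and the function names are invented for illustration. Numeric values are assigned to ranges with the standard-library `bisect` module.

```python
from bisect import bisect_right

# Mapping from the slide: each event group gets its own character.
EVENT_SYMBOLS = {
    "log in": "a",
    "log out": "x",
    "before log in": "b",
    "before log out": "c",
    "after log out": "d",
}

def discretize_events(events, symbols=EVENT_SYMBOLS):
    """Replace each categorical event by its character."""
    return "".join(symbols[e] for e in events)

def discretize_values(values, cut_points, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each number to the letter of its range; cut_points must be sorted.
    Range 0 is below cut_points[0]; range k is [cut_points[k-1], cut_points[k])."""
    return "".join(alphabet[bisect_right(cut_points, v)] for v in values)

print(discretize_events(["log in", "before log out", "before log out", "log out"]))  # accx
print(discretize_values([1, 5, 12, 7], cut_points=[4, 10]))                         # abcb
```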
17
DISCRETIZATION TECHNIQUE
18
DISCRETIZATION TECHNIQUE
accx acxd axdd bacx
19
SUFFIX TREE GENERATION
‘abcabbabb$’ has the following ten suffixes. We can ignore the 10th suffix ($) when generating the suffix tree.
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
9. b$
10. $
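Enumerating the suffixes is a one-liner in Python. This sketch pairs each suffix with its 0-based start position, which is the number a suffix-tree leaf stores; `suffixes` is a hypothetical helper.

```python
def suffixes(T):
    """(start position, suffix) for every suffix of T, longest first."""
    return [(i, T[i:]) for i in range(len(T))]

for i, s in suffixes("abcabbabb$"):
    print(f"{i + 1}. {s}")   # numbered from 1, as in the list above
```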
20
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
[Figure: suffix tree after inserting suffix 1]
21
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
[Figure: suffix tree after inserting suffixes 1 and 2]
22
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
[Figure: suffix tree after inserting suffixes 1 to 3]
23
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
[Figure: suffix tree after inserting suffixes 1 to 4]
24
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
[Figure: suffix tree after inserting suffixes 1 to 4, with the b branch expanded]
25
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
[Figure: suffix tree after inserting suffixes 1 to 5]
26
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
[Figure: suffix tree after inserting suffixes 1 to 6]
27
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
[Figure: suffix tree after inserting suffixes 1 to 7]
28
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
[Figure: suffix tree after inserting suffixes 1 to 8]
29
SUFFIX TREE GENERATION
Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
9. b$
[Figure: suffix tree after inserting suffixes 1 to 9]
30
SUFFIX TREE for abcabbabb$
Each leaf node holds a number that represents the starting position of its suffix.
Each intermediate node holds a number that is the length of the substring read from the root to that node.
[Figure: complete suffix tree of abcabbabb$; the leaves are numbered 0 to 8 by suffix start position, and intermediate nodes carry their path lengths]
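The construction can be sketched with a naive, uncompressed suffix trie: each suffix is inserted character by character, every node records its depth (the length of the string read from the root), and the node reached at the end of a suffix becomes a leaf holding the suffix's start position. This is a stand-in under stated assumptions: it takes O(n²) time and space, whereas a compressed suffix tree merges single-child chains into edges and can be built in linear time (e.g., with Ukkonen's algorithm, not shown). `Node`, `build_suffix_trie` and `walk` are hypothetical names.

```python
class Node:
    """Suffix-trie node."""
    def __init__(self, depth):
        self.children = {}   # next character -> child Node
        self.depth = depth   # length of the string read from the root to this node
        self.start = None    # suffix start position; set only on leaves

def build_suffix_trie(T):
    """Insert suffixes 0..n-2 of T one at a time (the lone '$' suffix is skipped).
    T must end with a unique terminator such as '$', so every suffix ends at its own leaf."""
    root = Node(0)
    for start in range(len(T) - 1):
        node = root
        for ch in T[start:]:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = Node(node.depth + 1)
            node = child
        node.start = start   # the node reached after '$' is a new leaf
    return root

def walk(root, X):
    """Return the node reached by reading X from the root, or None if X is not a substring."""
    node = root
    for ch in X:
        node = node.children.get(ch)
        if node is None:
            return None
    return node

root = build_suffix_trie("abcabbabb$")
print(walk(root, "abb").depth)  # 3
print(walk(root, "b$").start)   # 8
```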
31
SUFFIX TREE for abcabbabb$: Find Occurrence Vector
[Figure: suffix tree of abcabbabb$]
Occurrence vector of abb: (3, 6)
32
SUFFIX TREE for abcabbabb$: Find Occurrence Vector
[Figure: suffix tree of abcabbabb$]
Occurrence vector of abb: (3, 6)
Occurrence vector of ab: (0, 3, 6)
33
SUFFIX TREE for abcabbabb$: Find Occurrence Vector
[Figure: suffix tree of abcabbabb$]
Occurrence vector of abb: (3, 6)
Occurrence vector of ab: (0, 3, 6)
Occurrence vector of bb: (4, 7)
Occurrence vector of b: (1, 5, 8, 4, 7), in leaf order
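Each occurrence vector is the set of leaf numbers below the node for that substring. This sketch collects them for every substring that occurs at least twice, using one depth-first traversal of a naive suffix trie stored as nested dicts (the key None marks a leaf's start position, a hypothetical encoding). Because the trie is uncompressed, it also reports substrings such as a that a compressed tree would place on the same edge as ab. Vectors are returned sorted, so b appears as [1, 4, 5, 7, 8] rather than in leaf order.

```python
def build_trie(T):
    """Naive suffix trie as nested dicts; a leaf stores its suffix start under the key None.
    T must end with a unique terminator such as '$'."""
    root = {}
    for start in range(len(T) - 1):        # skip the lone '$' suffix
        node = root
        for ch in T[start:]:
            node = node.setdefault(ch, {})
        node[None] = start
    return root

def occurrence_vectors(T, min_count=2):
    """Map every substring occurring at least min_count times to its sorted
    occurrence vector, i.e. the leaf numbers below the substring's node."""
    result = {}

    def collect(node, path):
        if None in node:                    # leaf reached
            return [node[None]]
        starts = []
        for ch, child in node.items():
            starts.extend(collect(child, path + ch))
        if path and len(starts) >= min_count:
            result[path] = sorted(starts)
        return starts

    collect(build_trie(T), "")
    return result

vectors = occurrence_vectors("abcabbabb$")
for X in ("abb", "ab", "bb", "b"):
    print(X, vectors[X])
# abb [3, 6]
# ab [0, 3, 6]
# bb [4, 7]
# b [1, 4, 5, 7, 8]
```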
34
PERIODICITY DETECTION
Input: a time series of size n
Output: positions of periodic patterns
Process:
for each occurrence vector of size k
    for each starting occurrence, 0 to k
        find p
        check each position after every p characters
    compute the confidence; add to the list if it is at least the threshold
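A Python sketch of the process above under stated assumptions: occurrence vectors are found by naive scanning rather than from the suffix tree, p is the distance from stPos to the next occurrence, endPos is the last occurrence of the pattern, and the confidence is the fraction of the positions stPos, stPos + p, ..., up to endPos at which the pattern occurs. Start positions already explained by an accepted period are skipped, so that, for example, stPos = 7 in abcdabcabcab$ is not reported again after stPos = 4. `detect_periods` and the tuple layout (S, p, stPos, endPos, Conf) are illustrative.

```python
from fractions import Fraction

def occurrence_vector(T, X):
    """Start positions of every occurrence of X in T (stands in for the suffix-tree lookup)."""
    return [i for i in range(len(T) - len(X) + 1) if T.startswith(X, i)]

def detect_periods(T, patterns, theta):
    """Return (X, p, stPos, endPos, conf) tuples with conf >= theta."""
    found = []
    for X in patterns:
        occ = occurrence_vector(T, X)
        occ_set = set(occ)
        covered = set()                      # (p, position) pairs already explained
        for idx in range(len(occ) - 1):
            st_pos, end_pos = occ[idx], occ[-1]
            p = occ[idx + 1] - st_pos        # candidate period: distance to the next occurrence
            if (p, st_pos) in covered:       # inside an accepted period; skip
                continue
            expected = range(st_pos, end_pos + 1, p)
            hits = [pos for pos in expected if pos in occ_set]
            c = Fraction(len(hits), len(expected))
            if c >= theta:
                found.append((X, p, st_pos, end_pos, c))
                covered.update((p, pos) for pos in hits)
    return found

print(detect_periods("abcabbabb$", ["ab", "abb", "bb"], theta=0.9))
print(detect_periods("abcdabcabcab$", ["ab"], theta=0.9))
```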
35
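STEPS
abcabbabb$
ab - (0, 3, 6)
abb - (3, 6)
bb - (4, 7)
b - (1, 5, 8, 4, 7)
For ab: stPos = 0, endPos = 6, p = 3 - 0 = 3
Now scan the occurrence vector of ab: each time the difference between consecutive occurrences equals p, increment the count. Here both differences (3 - 0 and 6 - 3) equal p.
Check the confidence and add the pattern to the list if confidence >= Θ.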
36
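STEPS
abcdabcabcab$
ab - (0, 4, 7, 10)
stPos = 0, endPos = 10, p = 4 - 0 = 4
Now scan the occurrence vector of ab: each time the difference equals p, increment the count.
Only one difference (4 - 0) equals p = 4, so only one periodic step is found from 0 to 10 with p = 4 in abcdabcabcab$.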
37
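STEPS
abcdabcabcab$
ab - (0, 4, 7, 10)
stPos = 4, endPos = 10, p = 7 - 4 = 3
Now scan the occurrence vector of ab: each time the difference equals p, increment the count.
Both differences (7 - 4 and 10 - 7) equal p = 3, so ab occurs 3 times (at 4, 7 and 10) from 4 to 10 with p = 3 in abcdabcabcab$.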
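The difference-checking step used in these walkthroughs can be sketched on its own: fix stPos, take p as the distance to the next occurrence, then keep every occurrence whose difference from the last kept one equals p. `follow_period` is a hypothetical helper; it reproduces the two cases above for ab - (0, 4, 7, 10).

```python
def follow_period(occ, st_idx):
    """stPos = occ[st_idx], p = next occurrence - stPos; walk the vector and keep
    each occurrence whose difference from the last kept one equals p."""
    st_pos = occ[st_idx]
    p = occ[st_idx + 1] - st_pos
    kept = [st_pos]
    for pos in occ[st_idx + 1:]:
        if pos - kept[-1] == p:   # difference equals p: count it
            kept.append(pos)
    return p, kept

occ_ab = [0, 4, 7, 10]            # ab in abcdabcabcab$
print(follow_period(occ_ab, 0))   # (4, [0, 4])
print(follow_period(occ_ab, 1))   # (3, [4, 7, 10])
```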
38
ALGORITHM
39
DISCUSS
- Elfeky proposed two separate algorithms to detect symbol and segment periodicity (CONV and WARP, respectively), but they cannot detect periodicity in subsequences, and their complexities are O(n log n) and O(n^2).
- Han's ParPer algorithm detects periodicity in subsequences, but it needs user input (the period).
40
DISCUSS
- From this perspective, the algorithm discussed here is better than the previous ones
- Complexity O(n log n)
- Works online
41
References
- Rasheed, Al-Shalalfa, and Alhajj, "Periodic pattern mining using suffix tree", 2011
- Nishi, Farhan, Samiullah, and Jeong, "Effective periodic pattern mining in time series database"
- Han and Kamber, Data Mining: Concepts and Techniques
- Silberschatz, Korth, and Sudarshan, Database System Concepts
42
Questions
43
Thank You