A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams
![Page 1: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/1.jpg)
A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams
Joong Hyuk Chang and Won Suk Lee, Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'03)
Adviser: Jia-Ling Koh Speaker: Yu-ting Kung
![Page 2: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/2.jpg)
Introduction
• Most mining algorithms and frequency approximation algorithms for data streams are not able to adaptively extract recent changes in the information carried by a data stream.
![Page 3: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/3.jpg)
Introduction (Cont.)
• This paper:
– Proposes a sliding window method for finding recently frequent itemsets over an online data stream
![Page 4: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/4.jpg)
Sliding Window Method
• Idea:
– Define a significant itemset:
  • An itemset whose current support is greater than or equal to an error parameter ε is a significant itemset
– Monitor only significant itemsets
![Page 5: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/5.jpg)
SW Method (Cont.)
• Two different phases
– Window initialization phase:
  • Active while the number of transactions generated so far in the data stream is less than or equal to a predefined window size
  • Each new transaction is inserted into the CTL (current transaction list)
  • No transaction is extracted
– Window sliding phase:
  • Active after the CTL becomes full
  • Each new transaction is inserted into the CTL
  • The oldest transaction is extracted from the CTL
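The two phases can be sketched with a double-ended queue standing in for the CTL. This is a minimal illustration, not the authors' implementation: the function name `append_transaction`, the `(tid, items)` tuple layout, and the window size of 3 are all assumptions made for the example.

```python
from collections import deque

def append_transaction(ctl, transaction, window_size):
    """Insert a transaction into the CTL and return the extracted one, if any.

    While the CTL holds at most `window_size` transactions (window
    initialization phase) nothing is extracted. Once an insert would
    exceed the window (window sliding phase) the oldest transaction is
    removed and returned.
    """
    ctl.append(transaction)
    if len(ctl) > window_size:       # window sliding phase
        return ctl.popleft()         # oldest transaction leaves the window
    return None                      # window initialization phase

ctl = deque()
evicted = [append_transaction(ctl, (tid, items), window_size=3)
           for tid, items in enumerate(["AB", "D", "AB", "A"], start=1)]
```

With a window of 3, the first three inserts extract nothing and the fourth pushes out the transaction with TID 1.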
![Page 6: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/6.jpg)
SW Method (Cont.)
• Five steps:
1. Appending a transaction
2. Count updating and insertion of new itemsets
3. Extracting a transaction
4. Pruning of itemsets
5. Frequent itemset selection
![Page 7: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/7.jpg)
Step 1: Appending a transaction
• Content
– The transaction Tk is appended to the current transaction list CTL
![Page 8: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/8.jpg)
Step 2: Count updating and insertion of new itemsets
• Content
– Each itemset e that appears in Tk is handled through its entry (e, f, t) in the monitoring lattice:
  f: count of the itemset
  t: TID of the transaction that caused the itemset to be newly inserted into the monitoring lattice
– Case 1, the corresponding node is in the monitoring lattice: e.f = e.f + 1
– Case 2, the corresponding node is not in the monitoring lattice: e is inserted into the monitoring lattice with the entry (e, 1, k)
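A minimal sketch of Step 2, assuming the monitoring lattice is a plain dictionary from itemset to `[f, t]` and that every non-empty subset of the transaction is a candidate itemset. The slides do not say how candidates are enumerated, so the subset enumeration and the name `update_counts` are assumptions.

```python
from itertools import combinations

def update_counts(lattice, items, k):
    """Apply Step 2 for transaction T_k; lattice maps itemset -> [f, t]."""
    items = sorted(set(items))
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            e = frozenset(subset)
            if e in lattice:
                lattice[e][0] += 1      # Case 1: node exists, increment f
            else:
                lattice[e] = [1, k]     # Case 2: insert the entry (e, 1, k)

lattice = {}
update_counts(lattice, "AB", 1)   # T1
update_counts(lattice, "D", 2)    # T2
update_counts(lattice, "AB", 3)   # T3
```

After these three transactions the lattice holds A, B and AB with count 2 and t = 1, and D with count 1 and t = 2.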
![Page 9: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/9.jpg)
Step 3: Extracting a transaction
• When is this step performed?
– Only in the window sliding phase
• Content
– Extract the oldest transaction from the CTL
– For each itemset e of the extracted transaction, update its entry (e, f, t) in the monitoring lattice:
  If t <= wfirst, then e.f = e.f - 1
  Else e.f = e.f (unchanged)
  wfirst: the TID of the first transaction of the current window
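The extraction rule can be sketched as follows. This is an illustration under stated assumptions: the lattice is a dictionary from itemset to `[f, t]`, `w_first` is taken to be the TID of the transaction being extracted, and the function name `extract_transaction` is hypothetical.

```python
from itertools import combinations

def extract_transaction(lattice, items, w_first):
    """Apply Step 3 for the oldest transaction (TID w_first, assumed)."""
    items = sorted(set(items))
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            entry = lattice.get(frozenset(subset))
            # Only undo the count if the entry was already being
            # monitored when the extracted transaction arrived.
            if entry is not None and entry[1] <= w_first:
                entry[0] -= 1

lattice = {
    frozenset("A"): [9, 1],
    frozenset("B"): [3, 1],
    frozenset("AB"): [3, 1],
    frozenset("D"): [1, 11],
}
extract_transaction(lattice, "AB", w_first=1)  # T1 leaves the window
extract_transaction(lattice, "D", w_first=2)   # T2 leaves the window
```

The entry for D was inserted at TID 11, so extracting the older transaction T2 leaves its count untouched: that occurrence was never counted in the current entry.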
![Page 10: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/10.jpg)
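Step 4: Pruning of itemsets
• Theorem:
– Given an error parameter ε, the maximum possible count $C_k^{max}(e)$ of an itemset e with its entry (e, f, t) is found as follows:

$$C_k^{max}(e) = \begin{cases} f & \text{if } t \le w_{first} \\ f + \lfloor \varepsilon \cdot (t - w_{first}) \rfloor & \text{otherwise} \end{cases}$$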
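As a small sketch, the bound in the theorem can be written as a function. The otherwise-branch is assumed here to be f + ⌊ε · (t − wfirst)⌋; the rounding is an assumption, chosen because it reproduces the pruning of D after T5 in the worked example. The name `max_possible_count` is hypothetical.

```python
import math

def max_possible_count(f, t, w_first, eps):
    """Upper bound on the count of an itemset within the current window.

    If the entry was inserted at or before the window start, f is exact.
    Otherwise the occurrences missed between w_first and t are bounded
    by eps * (t - w_first), rounded down (assumed rounding).
    """
    if t <= w_first:
        return f
    return f + math.floor(eps * (t - w_first))

exact = max_possible_count(f=8, t=1, w_first=1, eps=0.25)    # monitored since window start
bounded = max_possible_count(f=2, t=6, w_first=1, eps=0.25)  # inserted at TID 6
```

For the second call, 0.25 · (6 − 1) = 1.25 rounds down to 1, so the bound is 3.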
![Page 11: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/11.jpg)
Step 4: Pruning of itemsets
• When is this step performed?
– Periodically, or whenever it is needed
• Content
– For an itemset e with entry (e, f, t) in the monitoring lattice: if $C_k^{max}(e) / |D_k|_w < \varepsilon$, where $|D_k|_w$ is the number of transactions in the current window, it can be regarded as an insignificant itemset and is pruned
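The pruning test can be sketched as below. The lattice layout (itemset name to `(f, t)`), the function name `prune`, and the rounded-down form of the maximum possible count are assumptions; the lattice contents correspond to the example stream after T5.

```python
import math

def prune(lattice, w_first, window_count, eps):
    """Drop entries whose maximum possible support is below eps."""
    for e, (f, t) in list(lattice.items()):
        # Maximum possible count (assumed form: floor of eps * gap).
        c_max = f if t <= w_first else f + math.floor(eps * (t - w_first))
        if c_max / window_count < eps:
            del lattice[e]          # insignificant itemset
    return lattice

lattice = {"A": (4, 1), "B": (3, 1), "AB": (3, 1), "D": (1, 2)}
prune(lattice, w_first=1, window_count=5, eps=0.25)
```

Here D has a maximum possible support of 1/5 = 0.2, which is below ε = 0.25, so it is removed while A, B and AB stay.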
![Page 12: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/12.jpg)
Step 5: Frequent itemset selection
• When is this step performed?
– When the up-to-date set of recently frequent itemsets is requested
• Content
– For an itemset e with an entry (e, f, t) in the monitoring lattice: if $C_k^{max}(e) / |D_k|_w \ge S_{min}$, it is a frequent itemset
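Selection uses the same bound as pruning but compares against Smin. The sketch below assumes a dictionary lattice from itemset name to `(f, t)` and the rounded-down maximum possible count; the function name and the lattice contents (chosen to be consistent with the example stream after T10) are illustrative.

```python
import math

def frequent_itemsets(lattice, w_first, window_count, eps, s_min):
    """Return {itemset: estimated support} for entries meeting s_min."""
    result = {}
    for e, (f, t) in lattice.items():
        c_max = f if t <= w_first else f + math.floor(eps * (t - w_first))
        support = c_max / window_count
        if support >= s_min:
            result[e] = support
    return result

lattice = {"A": (8, 1), "B": (3, 1), "AB": (3, 1), "C": (2, 6),
           "AC": (2, 6), "E": (3, 7), "AE": (2, 7)}
result = frequent_itemsets(lattice, w_first=1, window_count=10,
                           eps=0.25, s_min=0.5)
```

Only A reaches the minimum support of 0.5 in this lattice, with an estimated support of 0.8.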
![Page 13: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/13.jpg)
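For Example
• Data Stream

(a) D1

| Tid | 1 |
|---|---|
| Items | AB |

(b) D5

| Tid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Items | AB | D | AB | AB | A |

(c) D10

| Tid | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Items | AB | D | AB | AB | A | AC | AE | AC | E | AE |

(d) D11

| Tid | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Items | AB | D | AB | AB | A | AC | AE | AC | E | AE | AD |

(e) D15

| Tid | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Items | AB | D | AB | AB | A | AC | AE | AC | E | AE | AD | B | A | B | A |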
![Page 14: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/14.jpg)
For Example (Cont.)
• Initial values
– Smin = 0.5
– ε = 0.25 (0.5 x Smin)
– Window size = 10
– Step 4 is performed every 5 transactions
![Page 15: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/15.jpg)
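Step 1, 2

(a) After T1 (AB)

| e | A | B | AB |
|---|---|---|---|
| f | 1 | 1 | 1 |
| t | 1 | 1 | 1 |

Step 1, 2

(b) After T2 (D)

| e | A | B | AB | D |
|---|---|---|---|---|
| f | 1 | 1 | 1 | 1 |
| t | 1 | 1 | 1 | 2 |

Step 1, 2

(b.1) After T3 (AB)

| e | A | B | AB | D |
|---|---|---|---|---|
| f | 2 | 2 | 2 | 1 |
| t | 1 | 1 | 1 | 2 |

Step 1, 2

(b.2) After T4 (AB)

Step 1, 2

(b.3) After T5 (A)

Step 4

(b.3) After pruning

D is pruned from the monitoring lattice because its maximum possible support is less than ε.

Step 1, 2 repeatedly

(c) After T10 (AE)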
![Page 16: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/16.jpg)
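| e | A | B | C | D | E | AB | AC | AE | AD |
|---|---|---|---|---|---|---|---|---|---|
| f | 9 | 3 | 2 | 1 | 3 | 3 | 2 | 2 | 1 |
| t | 1 | 1 | 6 | 11 | 7 | 1 | 6 | 7 | 11 |

Step 1, 2, then Step 3

(d) After T11 (AD)

Step 4

Step 1, 2, 3, 4

(e) After Step 4 for T15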
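The walkthrough above can be replayed end to end with a short script. This is a sketch under several assumptions rather than the authors' implementation: every non-empty subset of a transaction is treated as a candidate itemset, wfirst in Step 3 is the TID of the extracted transaction, and the maximum possible count rounds ε · (t − wfirst) down. All function and variable names are hypothetical. Under these assumptions the state recorded after Steps 1 and 2 for T11 matches the table shown for (d).

```python
import math
from collections import deque
from itertools import combinations

STREAM = ["AB", "D", "AB", "AB", "A", "AC", "AE", "AC", "E", "AE",
          "AD", "B", "A", "B", "A"]
S_MIN, EPS, WINDOW, PRUNE_EVERY = 0.5, 0.25, 10, 5

def subsets(items):
    """Yield every non-empty subset of a transaction as a sorted string."""
    items = sorted(set(items))
    for n in range(1, len(items) + 1):
        for combo in combinations(items, n):
            yield "".join(combo)

def c_max(f, t, w_first):
    """Maximum possible count (assumed rounding: floor)."""
    return f if t <= w_first else f + math.floor(EPS * (t - w_first))

def replay(stream):
    ctl, lattice, snapshots = deque(), {}, {}
    for k, items in enumerate(stream, start=1):
        ctl.append((k, items))                       # Step 1
        for e in subsets(items):                     # Step 2
            if e in lattice:
                lattice[e][0] += 1
            else:
                lattice[e] = [1, k]
        # Record the lattice as it stands after Steps 1 and 2.
        snapshots[k] = {e: tuple(v) for e, v in lattice.items()}
        if len(ctl) > WINDOW:                        # Step 3, sliding phase only
            old_tid, old_items = ctl.popleft()
            for e in subsets(old_items):
                if e in lattice and lattice[e][1] <= old_tid:
                    lattice[e][0] -= 1
        if k % PRUNE_EVERY == 0:                     # Step 4
            w_first = ctl[0][0]
            for e, (f, t) in list(lattice.items()):
                if c_max(f, t, w_first) / len(ctl) < EPS:
                    del lattice[e]
    return snapshots, lattice, ctl

snapshots, lattice, ctl = replay(STREAM)

# Step 5: select frequent itemsets from the final window.
w_first = ctl[0][0]
frequent = {e: c_max(f, t, w_first) / len(ctl)
            for e, (f, t) in lattice.items()
            if c_max(f, t, w_first) / len(ctl) >= S_MIN}
```

Keeping the per-transaction snapshots makes it easy to compare any intermediate state against the slides; only the snapshot for T11 is checked against a table here, since the other intermediate tables did not survive in the transcript.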
![Page 17: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/17.jpg)
Experiment Result
• Data sources
– T5.I4.D1000K-I
– T5.I4.D1000K-II
![Page 18: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/18.jpg)
Experiment (Cont.)
• Memory usage in the window sliding phase
![Page 19: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/19.jpg)
Experiment (Cont.)
• Average support error
– Measures the relative accuracy of the proposed method
– When two sets of mining results $R_1 = \{(e_i, S_1(e_i)) \mid S_1(e_i) \ge S_{min}\}$ and $R_2 = \{(e_j, S_2(e_j)) \mid S_2(e_j) \ge S_{min}\}$ are given for the same data set, the average support error ASE(R2|R1) is defined as:

$$ASE(R_2 \mid R_1) = \frac{\sum_{e_m \in R_1 - R_2} S_1(e_m) + \sum_{e_m \in R_1 \cap R_2} \left| S_1(e_m) - S_2(e_m) \right| + \sum_{e_m \in R_2 - R_1} S_2(e_m)}{|R_1|}$$
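A minimal sketch of the average support error, assuming the definition sums the supports of itemsets found in only one result set and the absolute support differences of itemsets found in both, divided by |R1|. Each result set is represented as a dictionary from itemset to support; the function name and the sample values are hypothetical.

```python
def average_support_error(r1, r2):
    """Compute ASE(R2|R1) for two dicts mapping itemset -> support."""
    only_r1 = sum(s for e, s in r1.items() if e not in r2)
    both = sum(abs(r1[e] - r2[e]) for e in r1.keys() & r2.keys())
    only_r2 = sum(s for e, s in r2.items() if e not in r1)
    return (only_r1 + both + only_r2) / len(r1)

r1 = {"A": 0.75, "B": 0.5}   # reference result
r2 = {"A": 0.5, "C": 0.5}    # result under evaluation
ase = average_support_error(r1, r2)
```

In this sample, B is missed (0.5), C is a false positive (0.5) and A differs by 0.25, giving (0.5 + 0.25 + 0.5) / 2 = 0.625. Two identical result sets give an error of 0.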
![Page 20: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/20.jpg)
Experiment (Cont.)
• Average support error of the mining result of the proposed method with respect to that of the Apriori algorithm on the transactions within the current window
![Page 21: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/21.jpg)
Experiment (Cont.)
• The average processing time (Step 1 to Step 4) of the sliding window method in each interval
![Page 22: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/22.jpg)
Experiment (Cont.)
• The average processing time for Step 5
![Page 23: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/23.jpg)
Experiment (Cont.)
• The memory usage of the window sliding phase when varying the size of the window
![Page 24: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/24.jpg)
Experiment (Cont.)
• The average processing time of the sliding window method when varying the size of the window
![Page 25: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams](https://reader036.fdocuments.in/reader036/viewer/2022062816/56815591550346895dc36b31/html5/thumbnails/25.jpg)
Conclusion
• The result of the proposed method guarantees the following:
– All itemsets whose true supports are greater than or equal to a minimum support Smin are found
– No itemset whose true support is less than (Smin - ε) is found as a recently frequent itemset
– For each itemset, the difference between its estimated support and its true support is less than ε