Horizontal format data mining with extended bitmaps
Uploaded by denis-weerasiri
![Page 1: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/1.jpg)
![Page 2: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/2.jpg)
Question?
• Is it possible to combine the benefits of vertical data formats with the efficiency of bitmap operations to mine association rules in a distributed environment?
![Page 3: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/3.jpg)
Association Rule Mining?
• Finding interesting relationships between variables.
• Finding the subsets of items that are common to at least a chosen minimum number of itemsets (the minimum support) in a collection of itemsets.
• A form of pattern recognition.
• Commonly explained through market basket analysis.
![Page 4: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/4.jpg)
Sample (Toy) Data Set

| TID | Item IDs |
|---|---|
| T100 | I1, I2, I5 |
| T200 | I2, I4 |
| T300 | I1, I2 |
| T400 | I2, I5 |
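To make the notion of support concrete, here is a minimal Python sketch over the toy data set. The names `TRANSACTIONS` and `support` are illustrative assumptions, not part of the original slides.

```python
# Toy data set from the slide, keyed by transaction ID (horizontal format).
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I1", "I2"},
    "T400": {"I2", "I5"},
}


def support(itemset, transactions=TRANSACTIONS):
    """Count the transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in transactions.values() if wanted <= items)


print(support({"I2"}))        # 4
print(support({"I1", "I2"}))  # 2
print(support({"I1", "I5"}))  # 1
```

An itemset is called frequent when its support reaches the chosen minimum support.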
![Page 5: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/5.jpg)
Apriori
• Fundamental algorithm for association rule mining.
• Mines frequent patterns from a horizontal data format, in which items are grouped by transaction.
• The i-th stage identifies all frequent i-element sets.
• Two steps per stage:
• > Candidate generation.
• > Candidate counting.
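The two steps can be sketched as follows. This is a textbook-style, unoptimized outline of Apriori written for the toy data set, not the implementation benchmarked in these slides; the function name `apriori` and the list-of-sets input are assumptions.

```python
from itertools import combinations

TRANSACTIONS = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I1", "I2"},
    {"I2", "I5"},
]


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets, level by level."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Candidate counting: one full pass over the data set per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets
        # and keep only those whose every k-subset is itself frequent.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


result = apriori(TRANSACTIONS, min_support=2)
```

Note the repeated pass over `transactions` in the counting step; this is the cost the vertical format aims to avoid.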
![Page 6: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/6.jpg)
Vertical Form
• Transactions are grouped by item: each item maps to the transactions that contain it.
• Vertical format data mining only has to parse the data set once to get the itemsets.
• From the 2-itemsets onward, itemset generation only needs to refer to the itemsets of the previous round.
• This eliminates parsing the whole data set in every round to count itemset frequencies.
• Generally more efficient than the horizontal form.
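As a rough illustration of the vertical form, the sketch below converts the toy data set into per-item transaction-ID sets in one pass and then counts a 2-itemset by intersection alone. The names `HORIZONTAL` and `to_vertical` are assumptions made for this example.

```python
from collections import defaultdict

HORIZONTAL = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I1", "I2"},
    "T400": {"I2", "I5"},
}


def to_vertical(transactions):
    """Single pass over the data: map each item to the set of TIDs containing it."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


vertical = to_vertical(HORIZONTAL)

# Support of {I1, I2} comes from intersecting two TID sets;
# the original transactions are not read again.
support_i1_i2 = len(vertical["I1"] & vertical["I2"])
print(support_i1_i2)  # 2
```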
![Page 7: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/7.jpg)
Bitmaps
• Store individual bits compactly.
• Exploit bit-level parallelism effectively.
• Each position holds a 0 or a 1.
• A 1 indicates existence.
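A small sketch of the idea, assuming one bit per transaction and plain Python integers as bitmaps. The helper names `to_bitmap` and `popcount` are illustrative; a single bitwise AND intersects all transactions at once.

```python
ALL_TIDS = ["T100", "T200", "T300", "T400"]


def to_bitmap(tids, all_tids=ALL_TIDS):
    """Bit i is 1 when the i-th transaction contains the item."""
    bits = 0
    for i, tid in enumerate(all_tids):
        if tid in tids:
            bits |= 1 << i
    return bits


def popcount(bits):
    """Number of 1 bits, i.e. the number of transactions represented."""
    return bin(bits).count("1")


i1 = to_bitmap({"T100", "T300"})                  # 0b0101
i2 = to_bitmap({"T100", "T200", "T300", "T400"})  # 0b1111
i5 = to_bitmap({"T100", "T400"})                  # 0b1001

print(popcount(i1 & i2))  # 2: I1 and I2 occur together in two transactions
print(popcount(i1 & i5))  # 1
```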
![Page 8: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/8.jpg)
Combined?
• The algorithm takes a horizontal data set.
• With one pass over the data set, it constructs a bitmap-based data structure.
• This structure is in vertical format.
• The structure facilitates efficient mining of association rules.
![Page 9: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/9.jpg)
![Page 10: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/10.jpg)
Sample (Toy) Data Set: Horizontal Format

| TID | Item IDs |
|---|---|
| T100 | I1, I2, I5 |
| T200 | I2, I4 |
| T300 | I1, I2 |
| T400 | I2, I5 |
![Page 11: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/11.jpg)
[Figure: the sample data set beside the Ordered Item Array: I1, I2, I4, I5.]
![Page 12: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/12.jpg)
[Figure: build animation over the sample data set. Ordered item array: I1, I2, I4, I5. The first counts (1, 1) appear, together with an associated item I2 with bit 1.]
![Page 13: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/13.jpg)
[Figure: build animation, continued. Counts shown: 1, 1, 1. Associated-item entries I2, I5 and I5, each with bit 1.]
![Page 14: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/14.jpg)
[Figure: the same structure, with the item array and its counts labelled "Master Array".]
![Page 15: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/15.jpg)
[Figure: the same structure, with labels "Master Array" and "Associated Items".]
![Page 16: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/16.jpg)
[Figure: the same structure, with labels "Master Array", "Associated Items" and "Bitmap".]
![Page 17: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/17.jpg)
[Figure: build animation, continued. Counts shown: 1, 2, 1.]
![Page 18: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/18.jpg)
[Figure: build animation, continued. I4 is added as an associated item with bitmap rows "0" and "0 1". Counts shown: 1, 2, 1, 1.]
![Page 19: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/19.jpg)
[Figure: build animation, continued. Counts shown: 2, 2, 1, 1.]
![Page 20: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/20.jpg)
[Figure: build animation, continued. A bitmap row "1 0" is added. Counts shown: 2, 3, 1, 2.]
![Page 21: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/21.jpg)
Final Data Structure

[Figure: the completed structure for the sample data set. Master array counts: I1 = 2, I2 = 4, I4 = 1, I5 = 2, each entry linked to its associated items and bitmap rows.]
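The slides do not spell out the exact layout, so the following is only one plausible reading of the walkthrough: each master array entry holds a count, a list of associated items (the items that follow it in a transaction, in ordered-item order) and one bitmap row per transaction containing the item. The class `Entry`, the function `build` and the use of lexicographic sorting as the item order are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    count: int = 0
    associated: list = field(default_factory=list)  # items seen after this one
    bitmaps: list = field(default_factory=list)     # one int per transaction


def build(transactions):
    """Build the structure in a single pass over a horizontal data set."""
    master = {}
    for items in transactions:
        ordered = sorted(items)  # assumed stand-in for the ordered item array
        for pos, item in enumerate(ordered):
            entry = master.setdefault(item, Entry())
            entry.count += 1
            bits = 0
            for other in ordered[pos + 1:]:
                if other not in entry.associated:
                    entry.associated.append(other)
                # Bit i of the row refers to entry.associated[i].
                bits |= 1 << entry.associated.index(other)
            entry.bitmaps.append(bits)
    return master


TRANSACTIONS = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I1", "I2"},
    {"I2", "I5"},
]
master = build(TRANSACTIONS)
print({item: e.count for item, e in master.items()})
```

Under this reading the counts come out as I1 = 2, I2 = 4, I4 = 1, I5 = 2, which agrees with the final slide of the walkthrough.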
![Page 22: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/22.jpg)
Counting Frequent Item Sets

[Figure: the final data structure, as on the previous slide.]

Minimum Support = 2

| No. of Items | Frequent Item Sets |
|---|---|
| 1 | I1, I2, I5 |
| 2 | I1-I2, I2-I5 |
| 3 | - |
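The counting step might look like the sketch below. It assumes a layout of item -> (count, associated items, one bitmap row per transaction), with the rows for the toy data written out by hand; the names `STRUCTURE`, `mine_entry` and `mine` are hypothetical, and the subset enumeration is deliberately naive.

```python
from itertools import combinations

# Assumed layout: item -> (count, associated items, bitmap rows).
# Bit i of a row refers to the i-th associated item.
STRUCTURE = {
    "I1": (2, ["I2", "I5"], [0b11, 0b01]),
    "I2": (4, ["I5", "I4"], [0b01, 0b10, 0b00, 0b01]),
    "I4": (1, [], [0b0]),
    "I5": (2, [], [0b0, 0b0]),
}


def mine_entry(item, entry, min_support):
    """Frequent itemsets whose first item (in item order) is `item`."""
    count, associated, bitmaps = entry
    if count < min_support:
        return {}
    found = {(item,): count}
    for size in range(1, len(associated) + 1):
        grew = False
        for combo in combinations(range(len(associated)), size):
            mask = sum(1 << i for i in combo)
            # A row supports the combination when all masked bits are set.
            n = sum(1 for bits in bitmaps if (bits & mask) == mask)
            if n >= min_support:
                names = tuple(sorted([item] + [associated[i] for i in combo]))
                found[names] = n
                grew = True
        if not grew:
            break  # no frequent set of this size, so none larger either
    return found


def mine(structure, min_support):
    result = {}
    for item, entry in structure.items():
        result.update(mine_entry(item, entry, min_support))
    return result


print(mine(STRUCTURE, min_support=2))
```

With a minimum support of 2 this yields I1, I2, I5, I1-I2 and I2-I5, matching the table above.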
![Page 23: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/23.jpg)
[Figure: repeat of the previous slide (final data structure, frequent item set table, Minimum Support = 2), with an additional column of bits (1, 0, 0, 0, 0) shown during counting.]
![Page 24: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/24.jpg)
Results
![Page 25: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/25.jpg)
![Page 26: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/26.jpg)
Insights
• The algorithm performs better than Apriori in most scenarios.
• Data structure generation dominates the total time in most cases.
• As an aside: can this be turned into a distributed mining algorithm?
![Page 27: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/27.jpg)
It turns out this can be done rather easily.
The algorithm lends itself to MapReduce-like distributed processing.
Each master array index is self-contained, so it can be mined in parallel.
Data structure generation -> Map phase
Result accumulation -> Reduce phase
[Figure: a single master array entry (I1, count 2) with its associated items I2 and I5 and their bitmap rows.]
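The decomposition can be simulated in a single process as below. This is only a sketch of the idea, not a real MapReduce job: the function names `map_phase`, `reduce_phase` and `run` are assumptions, tuples of following items stand in for bitmap rows, and the subset enumeration is exponential in transaction length.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Emit (item, items that follow it) for each item of one transaction."""
    ordered = sorted(transaction)
    return [(item, tuple(ordered[i + 1:])) for i, item in enumerate(ordered)]


def reduce_phase(item, suffixes, min_support):
    """Accumulate counts for itemsets whose first item is `item`."""
    counts = defaultdict(int)
    for suffix in suffixes:
        for size in range(len(suffix) + 1):
            for combo in combinations(suffix, size):
                counts[(item,) + combo] += 1
    return {k: n for k, n in counts.items() if n >= min_support}


def run(transactions, min_support):
    grouped = defaultdict(list)  # shuffle step: group emitted rows by item
    for t in transactions:
        for item, suffix in map_phase(t):
            grouped[item].append(suffix)
    result = {}
    # Each key is self-contained, so these calls could run on separate workers.
    for item, suffixes in grouped.items():
        result.update(reduce_phase(item, suffixes, min_support))
    return result


TRANSACTIONS = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I1", "I2"},
    {"I2", "I5"},
]
print(run(TRANSACTIONS, min_support=2))
```

Because no reducer needs data held under another key, the final merge is a plain union of the per-item results.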
![Page 28: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/28.jpg)
What Does the Future Hold?
• Make this distributed.
• Java is not the best option; use C so that memory allocation can be controlled directly.
• Experiment with bitmap compression techniques.
![Page 29: Horizontal format data mining with extended bitmaps](https://reader033.fdocuments.in/reader033/viewer/2022042715/5584d9acd8b42a25468b482c/html5/thumbnails/29.jpg)
Summary