Post on 18-Dec-2015
Our New Progress on Frequent/Sequential Pattern Mining
We have developed new frequent/sequential pattern mining methods. Performance studies on both synthetic and real data sets show that our methods outperform conventional ones by wide margins:
Our new methods vs. conventional methods:
- Frequent pattern mining: FP-growth (new) vs. Apriori, TreeProjection (conventional)
- Sequential pattern mining: PrefixSpan, FreeSpan (new) vs. GSP (conventional)
- Frequent closed pattern mining: CLOSET (new) vs. A-close, CHARM (conventional)
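To make the comparison concrete, here is a minimal sketch of the level-wise candidate-generate-and-test approach that Apriori takes (and that the pattern-growth methods avoid). The function name and data representation are illustrative, not the original implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: level-wise candidate generation and support
    counting. transactions is a list of item collections; min_support is an
    absolute count. Returns {frozenset(itemset): support}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # any candidate with an infrequent (k-1)-subset
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one scan of the database per level
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
    return result
```

The repeated database scans and the combinatorial candidate sets at each level are exactly the costs that show up in the Apriori curves below.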
[Figure: Mining the complete set of frequent patterns on T10I4D100k. Runtime in seconds vs. support threshold (0.00% to 0.15%), comparing Apriori, TreeProjection, and FP-growth.]
[Figure: Mining the complete set of frequent patterns on T25I20D100k. Runtime in seconds vs. support threshold (0.00% to 1.50%), comparing Apriori, TreeProjection, and FP-growth.]
[Figure: Mining the complete set of frequent patterns on Connect-4. Runtime in seconds vs. support threshold (70% to 95%), comparing Apriori, TreeProjection, and FP-growth.]
[Figure: Mining sequential patterns on C10T4S16I4. Runtime in seconds vs. support threshold (0.00% to 2.00%), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
[Figure: Mining sequential patterns on C10T8S8I8. Runtime in seconds vs. support threshold (0.00% to 2.00%), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
[Figure: Scalability of mining sequential patterns on C10-100T8S8I8. Runtime in seconds vs. number of sequences (0 to 100,000), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
[Figure: Scalability of mining sequential patterns on C10-100T4S16I4. Runtime in seconds vs. number of sequences (0 to 100,000), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
Why Is PrefixSpan Faster Than GSP?
[Figure: Number of candidates per pattern in GSP vs. runtime per projected database in PrefixSpan, on a log scale from 0.001 to 100, over support thresholds 0.00% to 2.00%, for datasets C10T4S16I4 (left) and C10T8S8I8 (right).]
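The figures above point at the reason: GSP must generate and test many candidates for each pattern it finds, while PrefixSpan only scans projected databases that shrink as the prefix grows. A minimal prefix-projection sketch, restricted to single-item events (the published PrefixSpan also handles itemset events and pseudo-projection; names here are illustrative):

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan sketch over sequences of single items.
    Grows patterns by scanning projected databases instead of generating
    and testing candidates. Returns a list of (pattern, support) pairs."""
    results = []

    def project(db, item):
        # Projected database: the suffix of each sequence after the
        # first occurrence of item (sequences without item drop out)
        projected = []
        for seq in db:
            for i, x in enumerate(seq):
                if x == item:
                    projected.append(seq[i + 1:])
                    break
        return projected

    def mine(prefix, db):
        # Count, per sequence, which items occur in the projected database
        counts = {}
        for seq in db:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        # Extend the prefix by each frequent item and recurse
        for item, support in sorted(counts.items()):
            if support >= min_support:
                pattern = prefix + [item]
                results.append((pattern, support))
                mine(pattern, project(db, item))

    mine([], sequences)
    return results
```

Each recursive call touches only the suffixes that can still extend the current prefix, which is why the per-projected-database cost in the figure stays so low.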
[Figure: Mining frequent closed itemsets on T25I20D100k. Runtime in seconds vs. support threshold (0.7% to 1.5%), comparing A-close, CLOSET, and CHARM.]
[Figure: Mining frequent closed itemsets on Connect-4. Runtime in seconds (log scale) vs. support threshold (40% to 100%), comparing A-close, CLOSET, and CHARM.]
[Figure: Mining frequent closed itemsets on Pumsb. Runtime in seconds vs. support threshold (75% to 95%), comparing A-close, CLOSET, and CHARM.]
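A frequent itemset is closed if no proper superset has the same support. As a brute-force illustration of that definition only (not the CLOSET or CHARM algorithms, which prune the search space rather than filtering all frequent itemsets after the fact), a hypothetical helper that filters a precomputed frequent-itemset table might look like:

```python
def closed_itemsets(frequent):
    """Filter a {frozenset: support} table down to closed itemsets: those
    with no proper frequent superset of equal support. Brute-force sketch
    for illustration; real closed-pattern miners avoid enumerating the
    full frequent set in the first place."""
    closed = {}
    for itemset, support in frequent.items():
        # itemset < other is a proper-subset test on frozensets
        if not any(
            itemset < other and support == frequent[other]
            for other in frequent
        ):
            closed[itemset] = support
    return closed
```

Since the closed itemsets determine the supports of all frequent itemsets, mining them directly gives a lossless but much smaller result, which is where the speedups in the figures come from.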
References

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), to appear, 2000.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, September 1994.

J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.

J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.

R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3-17, Avignon, France, March 1996.

N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.

M. J. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.
DBMiner Version 2.5 (Beta)
DBMiner Technology Inc., B.C., Canada
What we had for DBMiner 2.0…
- Association module on data cubes
- Classification module on data cubes
- Clustering module on data cubes
- OLAP browser
- 3D cube browser
What we will do in DBMiner 2.5…
- Keep the existing association and classification modules from version 2.0
- Change the existing clustering module
- Add a new visual classification module, on both SQL Server and OLAP
- Add new sequential pattern modules on SQL Server using the FP algorithm
What we have done…
- We have incorporated the existing association module and added the OLAP browser module
- We have added the visual classification module
- We have changed the existing clustering module
- We have added the sequential pattern module
We are still in the development stage:
- Association module on data cubes
- New sequential pattern module on SQL Server
- New visual classification module on data cubes
- New clustering module on data cubes