Post on 18-Dec-2015
Our New Progress on Frequent/Sequential Pattern Mining
We have developed new frequent/sequential pattern mining methods. Performance studies on both synthetic and real data sets show that our methods outperform conventional ones by wide margins:
Our new methods vs. conventional methods:
- Frequent pattern mining: FP-growth (new) vs. Apriori, TreeProjection (conventional)
- Sequential pattern mining: PrefixSpan, FreeSpan (new) vs. GSP (conventional)
- Frequent closed pattern mining: CLOSET (new) vs. A-close, CHARM (conventional)
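To make the comparison concrete, here is a minimal sketch of the level-wise candidate-generate-and-test approach that Apriori takes (and that the pattern-growth methods avoid). The function name and data representation are illustrative, not the original implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: level-wise candidate generation and support
    counting. transactions is a list of item collections; min_support is an
    absolute count. Returns {frozenset(itemset): support}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # any candidate with an infrequent (k-1)-subset
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one scan of the database per level
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
    return result
```

The repeated database scans and the combinatorial candidate sets at each level are exactly the costs that show up in the Apriori curves below.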
[Figure: Mining the complete set of frequent patterns on T10I4D100k. Runtime in seconds vs. support threshold (0.00% to 0.15%), comparing Apriori, TreeProjection, and FP-growth.]
[Figure: Mining the complete set of frequent patterns on T25I20D100k. Runtime in seconds vs. support threshold (0.00% to 1.50%), comparing Apriori, TreeProjection, and FP-growth.]
[Figure: Mining the complete set of frequent patterns on Connect-4. Runtime in seconds vs. support threshold (70% to 95%), comparing Apriori, TreeProjection, and FP-growth.]
[Figure: Mining sequential patterns on C10T4S16I4. Runtime in seconds vs. support threshold (0.00% to 2.00%), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
[Figure: Mining sequential patterns on C10T8S8I8. Runtime in seconds vs. support threshold (0.00% to 2.00%), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
[Figure: Scalability of mining sequential patterns on C10-100T8S8I8. Runtime in seconds vs. number of sequences (0 to 100,000), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
[Figure: Scalability of mining sequential patterns on C10-100T4S16I4. Runtime in seconds vs. number of sequences (0 to 100,000), comparing PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]
Why Is PrefixSpan Faster Than GSP?
[Figure: Number of candidates per pattern in GSP vs. runtime per projected database in PrefixSpan, on a log scale from 0.001 to 100, over support thresholds 0.00% to 2.00%, for datasets C10T4S16I4 (left) and C10T8S8I8 (right).]
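The figures above point at the reason: GSP must generate and test many candidates for each pattern it finds, while PrefixSpan only scans projected databases that shrink as the prefix grows. A minimal prefix-projection sketch, restricted to single-item events (the published PrefixSpan also handles itemset events and pseudo-projection; names here are illustrative):

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan sketch over sequences of single items.
    Grows patterns by scanning projected databases instead of generating
    and testing candidates. Returns a list of (pattern, support) pairs."""
    results = []

    def project(db, item):
        # Projected database: the suffix of each sequence after the
        # first occurrence of item (sequences without item drop out)
        projected = []
        for seq in db:
            for i, x in enumerate(seq):
                if x == item:
                    projected.append(seq[i + 1:])
                    break
        return projected

    def mine(prefix, db):
        # Count, per sequence, which items occur in the projected database
        counts = {}
        for seq in db:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        # Extend the prefix by each frequent item and recurse
        for item, support in sorted(counts.items()):
            if support >= min_support:
                pattern = prefix + [item]
                results.append((pattern, support))
                mine(pattern, project(db, item))

    mine([], sequences)
    return results
```

Each recursive call touches only the suffixes that can still extend the current prefix, which is why the per-projected-database cost in the figure stays so low.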
[Figure: Mining frequent closed itemsets on T25I20D100k. Runtime in seconds vs. support threshold (0.7% to 1.5%), comparing A-close, CLOSET, and CHARM.]
[Figure: Mining frequent closed itemsets on Connect-4. Runtime in seconds (log scale) vs. support threshold (40% to 100%), comparing A-close, CLOSET, and CHARM.]
[Figure: Mining frequent closed itemsets on Pumsb. Runtime in seconds vs. support threshold (75% to 95%), comparing A-close, CLOSET, and CHARM.]
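A frequent itemset is closed if no proper superset has the same support. As a brute-force illustration of that definition only (not the CLOSET or CHARM algorithms, which prune the search space rather than filtering all frequent itemsets after the fact), a hypothetical helper that filters a precomputed frequent-itemset table might look like:

```python
def closed_itemsets(frequent):
    """Filter a {frozenset: support} table down to closed itemsets: those
    with no proper frequent superset of equal support. Brute-force sketch
    for illustration; real closed-pattern miners avoid enumerating the
    full frequent set in the first place."""
    closed = {}
    for itemset, support in frequent.items():
        # itemset < other is a proper-subset test on frozensets
        if not any(
            itemset < other and support == frequent[other]
            for other in frequent
        ):
            closed[itemset] = support
    return closed
```

Since the closed itemsets determine the supports of all frequent itemsets, mining them directly gives a lossless but much smaller result, which is where the speedups in the figures come from.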
References

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), to appear, 2000.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, September 1994.

J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.

J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.

R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3-17, Avignon, France, March 1996.

N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.

M. J. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.
DBMiner Version 2.5 (Beta)
DBMiner Technology Inc., B.C., Canada
What we had for DBMiner 2.0…
- Association module on data cubes
- Classification module on data cubes
- Clustering module on data cubes
- OLAP browser
- 3D cube browser
What we will do in DBMiner 2.5…
- Keep the existing association and classification modules from version 2.0
- Change the existing clustering module
- Add a new visual classification module, on both SQL Server and OLAP
- Add new sequential pattern modules on SQL Server using the FP algorithm
What we have done…
- We have incorporated the existing association module and added the OLAP browser module
- We have added the visual classification module
- We have changed the existing clustering module
- We have added the sequential pattern module
We are still in the development stage:
- Association module on data cubes
- New sequential pattern module on SQL Server
- New visual classification module on data cubes
- New clustering module on data cubes