The Concept of Maximal Frequent Itemsets

NCU CSIE Database Laboratory
Kuo-Yu Huang
2002-04-15

Outline

• Introduction

• Max-Miner

• MAFIA

• GenMax

• Conclusion

Introduction (1/2)

• Interesting datasets with long patterns
  – Questionnaire results
  – Transaction databases
    • Contain many frequently occurring items
    • A wide average record length
• Apriori-like algorithms are inadequate
  – They enumerate every single frequent itemset
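To see why enumerating every frequent itemset breaks down on long patterns: every non-empty subset of a frequent itemset is itself frequent, so one frequent pattern of length k implies 2^k - 1 frequent itemsets. A minimal sketch of that count follows; the function name is illustrative and not from the slides.

```python
def frequent_subsets_implied(k: int) -> int:
    """Number of non-empty subsets of a single frequent itemset of length k."""
    return 2 ** k - 1


# The count grows exponentially with the pattern length.
for k in (5, 10, 20, 40):
    print(k, frequent_subsets_implied(k))
```

For k = 10 this already gives 1023 itemsets, which is why longer patterns quickly become impractical to enumerate one by one.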

Introduction (2/2)

• Maximal frequent itemsets
  – A frequent itemset is maximal if it has no superset that is frequent.
  – e.g.
    • Items: a, b, c, d, e
    • Frequent itemset: {a, b, c}
    • {a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not frequent itemsets.
    • Maximal frequent itemset: {a, b, c}
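The definition can be checked by brute force on a toy database. The sketch below is only an illustration of the definition, not one of the algorithms covered in this talk; the transactions and the threshold `min_count` are made-up assumptions chosen so that {a, b, c} is the only maximal frequent itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute force: every itemset contained in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_count:
                frequent.append(cand)
    return frequent


def maximal_itemsets(frequent):
    """Keep only the itemsets that have no proper superset among the frequent ones."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy database over the items a..e.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("ce")]
frequent = frequent_itemsets(transactions, min_count=2)
maximal = maximal_itemsets(frequent)
print(len(frequent), maximal)
```

Here seven itemsets are frequent (a, b, c, ab, ac, bc, abc), but the single maximal itemset {a, b, c} summarizes all of them.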

Max-Miner (1/4)

• Efficiently mining long patterns from databases
  – R. J. Bayardo
  – ACM SIGMOD’98
• Max-Miner
  – Abandons a bottom-up traversal
  – Attempts to “look ahead”
  – Once a long frequent itemset is identified, prunes all its subsets.

Max-Miner (2/4)

• Set-enumeration tree

• Breadth-first search

Max-Miner (3/4)

• Candidate group g
  – Head: h(g)
    • The itemset enumerated by the node.
  – Tail: t(g)
    • An ordered set that contains all items not in h(g) that can potentially appear in a sub-node.
  – e.g. node {1}
    • h(g): {1}
    • t(g): {2, 3, 4}
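The head/tail representation can be sketched as a small data structure together with the set-enumeration expansion rule. This is a hedged illustration: the class and function names are assumptions, and the sketch leaves out the removal of infrequent items from the tail that Max-Miner performs during support counting.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CandidateGroup:
    head: Tuple[int, ...]  # h(g): the itemset enumerated by the node
    tail: Tuple[int, ...]  # t(g): ordered items that may still extend the head


def expand(g: CandidateGroup) -> List[CandidateGroup]:
    """Child groups: each tail item joins the head in turn, and only the
    items that follow it in the ordering stay in the child's tail."""
    return [
        CandidateGroup(g.head + (item,), g.tail[pos + 1:])
        for pos, item in enumerate(g.tail)
    ]


root = CandidateGroup(head=(), tail=(1, 2, 3, 4))
node_1 = expand(root)[0]
print(node_1)  # CandidateGroup(head=(1,), tail=(2, 3, 4))
for child in expand(node_1):
    print(child)
```

Because each child keeps only the later items in its tail, every itemset is enumerated exactly once in the tree.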

Max-Miner (4/4)

• Support counting
  – Count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g)
  – If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
  – If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
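The two pruning rules above can be sketched for a single candidate group as follows. This is a simplified illustration under stated assumptions, not Bayardo's implementation: the helper names, the toy transactions and the absolute threshold `min_count` are all hypothetical, and supports are recounted naively rather than in one database pass.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def process_group(head, tail, transactions, min_count):
    """Apply both pruning rules to one candidate group g.

    Returns (long_itemset, new_tail). long_itemset is h(g) ∪ t(g) when that
    union is frequent (no sub-node needs expanding), otherwise None.
    new_tail keeps only the items i for which h(g) ∪ {i} is frequent.
    """
    head = frozenset(head)
    union = head | frozenset(tail)
    if support(union, transactions) >= min_count:
        return union, []
    new_tail = [i for i in tail
                if support(head | {i}, transactions) >= min_count]
    return None, new_tail


# Hypothetical toy database.
tx = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 3}]
print(process_group([1], [2, 3, 4], tx, min_count=2))  # item 4 is dropped
print(process_group([1], [2, 3], tx, min_count=2))     # look-ahead succeeds
```

In the first call {1, 2, 3, 4} is infrequent, so only the tail is trimmed; in the second call {1, 2, 3} is frequent, so the whole subtree below node {1} can be skipped.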

MAFIA (1/4)

• MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
  – D. Burdick, M. Calimlim, and J. Gehrke
  – ICDE’01
• MAFIA
  – Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms

MAFIA (2/4)

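MAFIA (3/4)

• HUTMFI (head union tail in MFI)
  – Check whether head ∪ tail is already in the MFI
    • If so, stop searching and return
• PEP (parent equivalence pruning)
  – newNode = C ∪ {i}
  – Check whether newNode.support == C.support
    • If so, move i from C.tail to C.head
• FHUT (frequent head union tail)
  – newNode = C ∪ {i}
  – Check whether i is the leftmost child in the tail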
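A depth-first search with the HUTMFI and PEP checks can be sketched as below. This is a simplified illustration, not the MAFIA implementation: plain Python sets of transaction ids stand in for the vertical representation, FHUT is omitted, and the function name and the absolute threshold `min_count` are assumptions.

```python
def mine_maximal(transactions, min_count):
    """Depth-first search for maximal frequent itemsets with HUTMFI and PEP."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_count)
    mfi = []  # maximal frequent itemsets found so far

    def dfs(head, head_tids, tail):
        # HUTMFI: if head ∪ tail is contained in a known maximal itemset,
        # nothing new can be found below this node.
        hut = head | set(tail)
        if any(hut <= m for m in mfi):
            return
        # Keep only the tail items that extend the head to a frequent itemset.
        extensions = []
        for i in tail:
            tids = head_tids & tidsets[i]
            if len(tids) >= min_count:
                extensions.append((i, tids))
        # PEP: an item whose extension has the same support as the head occurs
        # in every transaction containing the head, so it moves into the head.
        head = head | {i for i, tids in extensions
                       if len(tids) == len(head_tids)}
        extensions = [(i, tids) for i, tids in extensions
                      if len(tids) < len(head_tids)]
        for pos, (i, tids) in enumerate(extensions):
            dfs(head | {i}, tids, [j for j, _ in extensions[pos + 1:]])
        # After all children are explored, record the head unless a known
        # maximal itemset already contains it.
        if head and not any(head <= m for m in mfi):
            mfi.append(frozenset(head))

    dfs(frozenset(), set(range(len(transactions))), items)
    return mfi


# Hypothetical toy database.
tx = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "e"}]
print(mine_maximal(tx, min_count=2))
```

On this toy input, PEP moves b into the head at node {a} because {a, b} has the same support as {a}, and HUTMFI then cuts the subtrees of b and c once {a, b, c} is known.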

MAFIA (4/4)

GenMax (1/2)

• Efficiently Mining Maximal Frequent Itemsets
  – Karam Gouda and Mohammed J. Zaki
  – ICDM’01
• GenMax
  – A backtrack-search-based algorithm for mining maximal frequent itemsets.

GenMax (2/2)

• Superset checking techniques
  – Do the superset check only for I_{l+1} ∪ P_{l+1}
  – Using the check_status flag
  – Local maximal frequent itemsets
• Reordering the combine set
• Diffsets propagation
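Diffset propagation can be illustrated with plain set arithmetic. Instead of storing the tidset t(PX) of an extended itemset, only the difference from its prefix is kept: d(PX) = t(P) - t(X), so support(PX) = support(P) - |d(PX)|, and deeper levels use d(PXY) = d(PY) - d(PX). The sketch below is a minimal illustration; the function names and the toy tidsets are assumptions.

```python
def diffset_from_tidsets(t_prefix, t_item):
    """d(PX) = t(P) - t(X): transactions that contain P but not X."""
    return t_prefix - t_item


def extend_diffset(d_px, d_py):
    """d(PXY) = d(PY) - d(PX): transactions that contain PX but not Y."""
    return d_py - d_px


# Hypothetical tidsets for items a, b, c; the prefix P is {a}.
t_a, t_b, t_c = {0, 1, 2, 3}, {0, 1, 2}, {0, 1, 3}
sup_a = len(t_a)

d_ab = diffset_from_tidsets(t_a, t_b)
d_ac = diffset_from_tidsets(t_a, t_c)
sup_ab = sup_a - len(d_ab)

d_abc = extend_diffset(d_ab, d_ac)
sup_abc = sup_ab - len(d_abc)
print(d_ab, d_ac, d_abc, sup_ab, sup_abc)
```

The result agrees with the direct computation: t(abc) = {0, 1}, so the support of {a, b, c} is 2, yet only the small difference sets had to be carried down the search.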

Conclusion (1/4)

Type      Database      # of items  Average length  # of records  Maximal pattern length (min. support)
Type I    Chess         76          37              3196          23 (20%)
Type I    Pumsb         7117        74              49046         27 (40%)
Type II   Connect       130         43              67557         31 (2.5%)
Type II   Pumsb*        7117        50              49046         43 (2.5%)
Type III  T10I4D100K    1000        10              100,000       13 (0.01%)
Type III  T40I10D100K   1000        40              100,000       25 (0.1%)

• Type I:
  – Normal MFI distribution with not too long maximal patterns.
• Type II:
  – Left-skewed distribution with longer patterns.
• Type III:
  – Exponential decay distribution with short maximal patterns.

Conclusion (2/4)

Conclusion (3/4)

Conclusion (4/4)