Data Mining Association Rules Yao Meng Hongli Li 91.574 Database II Fall 2002.

Data Mining Association Rules

Yao Meng, Hongli Li
91.574 Database II
Fall 2002

Outline

  • Overview
      • Apriori
      • AprioriTid
      • DIC
  • Data Structure
  • Experiment Environment
  • Experiment Result and Analysis

Overview – Apriori Algorithm

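1. Generate the large 1-itemsets.
2. Merge and prune the large itemsets to generate the candidates of the next size.
3. Has candidate? If no, end.
4. If yes, go through the whole DB to count support.
5. Candidates with support > MinSupport become large itemsets; return to step 2.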
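The loop above can be sketched in Java as follows. This is a minimal illustration, not the authors' implementation: the class and method names (`AprioriSketch`, `mine`, `generate`, `keepLarge`) are hypothetical, items are assumed to be integers, `minSupport` is assumed to be an absolute count with an inclusive boundary, and candidates are checked against each transaction by a plain linear scan rather than a hash tree.

```java
import java.util.*;

class AprioriSketch {
    /** Returns every large itemset mapped to its support count. */
    static Map<Set<Integer>, Integer> mine(List<Set<Integer>> db, int minSupport) {
        Map<Set<Integer>, Integer> large = new LinkedHashMap<>();
        // Generate the large 1-itemsets with one pass over the DB.
        Map<Set<Integer>, Integer> counts = new HashMap<>();
        for (Set<Integer> t : db)
            for (Integer item : t)
                counts.merge(Collections.singleton(item), 1, Integer::sum);
        Set<Set<Integer>> current = keepLarge(counts, minSupport, large);
        while (!current.isEmpty()) {
            Set<Set<Integer>> candidates = generate(current);
            if (candidates.isEmpty()) break;          // no candidate: end
            counts = new HashMap<>();
            for (Set<Integer> t : db)                 // go through the whole DB
                for (Set<Integer> c : candidates)
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
            current = keepLarge(counts, minSupport, large);
        }
        return large;
    }

    /** Merge step plus prune step: build candidates one item larger. */
    static Set<Set<Integer>> generate(Set<Set<Integer>> large) {
        Set<Set<Integer>> candidates = new HashSet<>();
        for (Set<Integer> a : large)
            for (Set<Integer> b : large) {
                Set<Integer> union = new TreeSet<>(a);
                union.addAll(b);
                // Merge only pairs that differ in exactly one item.
                if (union.size() != a.size() + 1) continue;
                if (allSubsetsLarge(union, large)) candidates.add(union);
            }
        return candidates;
    }

    /** Prune test: every subset with one item removed must itself be large. */
    static boolean allSubsetsLarge(Set<Integer> candidate, Set<Set<Integer>> large) {
        for (Integer item : candidate) {
            Set<Integer> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!large.contains(subset)) return false;
        }
        return true;
    }

    /** Records itemsets whose count reaches minSupport and returns them. */
    static Set<Set<Integer>> keepLarge(Map<Set<Integer>, Integer> counts, int minSupport,
                                       Map<Set<Integer>, Integer> out) {
        Set<Set<Integer>> kept = new HashSet<>();
        for (Map.Entry<Set<Integer>, Integer> e : counts.entrySet())
            if (e.getValue() >= minSupport) {
                kept.add(e.getKey());
                out.put(e.getKey(), e.getValue());
            }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = new ArrayList<>();
        db.add(new TreeSet<>(Arrays.asList(1, 3, 4)));
        db.add(new TreeSet<>(Arrays.asList(2, 3, 5)));
        db.add(new TreeSet<>(Arrays.asList(1, 2, 3, 5)));
        db.add(new TreeSet<>(Arrays.asList(2, 5)));
        System.out.println(mine(db, 2));
    }
}
```

Each iteration of the `while` loop costs one full scan of `db`, which is the I/O cost the later slides compare against.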

Overview – AprioriTid

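1. Generate the large 1-itemsets.
2. Merge and prune the large itemsets to generate the candidates of the next size.
3. Has candidate? If no, end.
4. If yes, go through DB_Sub to count support and generate the new DB_Sub.
5. Candidates with support > MinSupport become large itemsets; return to step 2.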
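A possible Java sketch of the DB_Sub idea follows. It is an illustration under stated assumptions, not the authors' code: the names `AprioriTidSketch`, `dbSub`, and the helper methods are hypothetical, items are integers, and `minSupport` is an absolute count with an inclusive boundary. Each DB_Sub entry holds the candidates of the previous size found in one transaction; a new candidate is taken to be contained in that transaction when all of its subsets with one item removed appear in the entry.

```java
import java.util.*;

class AprioriTidSketch {
    static Map<Set<Integer>, Integer> mine(List<Set<Integer>> db, int minSupport) {
        Map<Set<Integer>, Integer> large = new LinkedHashMap<>();
        // The only pass over the raw DB: count single items and build the
        // first DB_Sub, one entry of 1-itemsets per transaction.
        List<Set<Set<Integer>>> dbSub = new ArrayList<>();
        Map<Set<Integer>, Integer> counts = new HashMap<>();
        for (Set<Integer> t : db) {
            Set<Set<Integer>> entry = new HashSet<>();
            for (Integer item : t) {
                Set<Integer> single = Collections.singleton(item);
                entry.add(single);
                counts.merge(single, 1, Integer::sum);
            }
            dbSub.add(entry);
        }
        Set<Set<Integer>> current = keepLarge(counts, minSupport, large);
        while (!current.isEmpty()) {
            Set<Set<Integer>> candidates = generate(current);
            if (candidates.isEmpty()) break;
            counts = new HashMap<>();
            List<Set<Set<Integer>>> nextSub = new ArrayList<>();
            for (Set<Set<Integer>> entry : dbSub) {   // scan DB_Sub, not the DB
                Set<Set<Integer>> nextEntry = new HashSet<>();
                for (Set<Integer> c : candidates)
                    if (entry.containsAll(subsets(c))) {
                        nextEntry.add(c);
                        counts.merge(c, 1, Integer::sum);
                    }
                // Transactions that contain no candidate drop out of DB_Sub.
                if (!nextEntry.isEmpty()) nextSub.add(nextEntry);
            }
            dbSub = nextSub;
            current = keepLarge(counts, minSupport, large);
        }
        return large;
    }

    /** All subsets of c obtained by removing exactly one item. */
    static List<Set<Integer>> subsets(Set<Integer> c) {
        List<Set<Integer>> result = new ArrayList<>();
        for (Integer item : c) {
            Set<Integer> s = new TreeSet<>(c);
            s.remove(item);
            result.add(s);
        }
        return result;
    }

    /** Merge and prune: candidates one item larger than the large itemsets. */
    static Set<Set<Integer>> generate(Set<Set<Integer>> large) {
        Set<Set<Integer>> candidates = new HashSet<>();
        for (Set<Integer> a : large)
            for (Set<Integer> b : large) {
                Set<Integer> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() == a.size() + 1 && large.containsAll(subsets(union)))
                    candidates.add(union);
            }
        return candidates;
    }

    static Set<Set<Integer>> keepLarge(Map<Set<Integer>, Integer> counts, int minSupport,
                                       Map<Set<Integer>, Integer> out) {
        Set<Set<Integer>> kept = new HashSet<>();
        for (Map.Entry<Set<Integer>, Integer> e : counts.entrySet())
            if (e.getValue() >= minSupport) {
                kept.add(e.getKey());
                out.put(e.getKey(), e.getValue());
            }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = new ArrayList<>();
        db.add(new TreeSet<>(Arrays.asList(1, 3, 4)));
        db.add(new TreeSet<>(Arrays.asList(2, 3, 5)));
        db.add(new TreeSet<>(Arrays.asList(1, 2, 3, 5)));
        db.add(new TreeSet<>(Arrays.asList(2, 5)));
        System.out.println(mine(db, 2));
    }
}
```

Because `dbSub` shrinks as entries without candidates are dropped, later passes touch less data than a full DB scan would.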

Overview – DIC

1. Read M transactions.
2. Increment the counts of the itemsets that are currently being counted.
3. If all the children (subsets) of an itemset have turned large, begin counting that itemset.
4. If an itemset has been counted through all the transactions, remove it from the current counting list.
5. If at the end of the DB, continue from the first step at the start of the DB.
6. Stop when no itemsets remain to be counted.
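The steps above might look like this in Java. This is a simplified sketch rather than the authors' implementation or the exact algorithm from the DIC paper: the names `DicSketch`, `seen`, and `start` are hypothetical, the item universe is passed in as an assumed `items` parameter, "children" is read as the subsets with one item removed, and `minSupport` is an absolute count with an inclusive boundary.

```java
import java.util.*;

class DicSketch {
    static Map<Set<Integer>, Integer> mine(List<Set<Integer>> db, Set<Integer> items,
                                           int minSupport, int m) {
        if (m < 1) throw new IllegalArgumentException("m must be at least 1");
        Map<Set<Integer>, Integer> result = new HashMap<>();
        if (db.isEmpty()) return result;
        final int total = db.size();
        Map<Set<Integer>, Integer> count = new HashMap<>(); // support counted so far
        Map<Set<Integer>, Integer> seen = new HashMap<>();  // counting list: transactions read per itemset
        Set<Set<Integer>> large = new HashSet<>();
        for (Integer i : items) start(Collections.singleton(i), count, seen);
        int pos = 0;
        while (!seen.isEmpty()) {                           // stop when nothing is being counted
            for (int k = 0; k < m; k++) {                   // read M transactions
                Set<Integer> t = db.get(pos);
                pos = (pos + 1) % total;                    // wrap around at the end of the DB
                for (Map.Entry<Set<Integer>, Integer> e : seen.entrySet()) {
                    if (e.getValue() == total) continue;    // already saw every transaction
                    e.setValue(e.getValue() + 1);
                    if (t.containsAll(e.getKey())) count.merge(e.getKey(), 1, Integer::sum);
                }
            }
            // Itemsets whose count has reached minSupport turn large.
            List<Set<Integer>> newlyLarge = new ArrayList<>();
            for (Set<Integer> s : seen.keySet())
                if (count.get(s) >= minSupport && large.add(s)) newlyLarge.add(s);
            // Begin counting a superset once all of its children are large.
            for (Set<Integer> s : newlyLarge)
                for (Integer i : items) {
                    if (s.contains(i)) continue;
                    Set<Integer> sup = new TreeSet<>(s);
                    sup.add(i);
                    if (!count.containsKey(sup) && allChildrenLarge(sup, large))
                        start(sup, count, seen);
                }
            // Remove itemsets that have been counted through all transactions.
            seen.values().removeIf(n -> n == total);
        }
        for (Set<Integer> s : large) result.put(s, count.get(s));
        return result;
    }

    static void start(Set<Integer> s, Map<Set<Integer>, Integer> count,
                      Map<Set<Integer>, Integer> seen) {
        count.put(s, 0);
        seen.put(s, 0);
    }

    static boolean allChildrenLarge(Set<Integer> s, Set<Set<Integer>> large) {
        for (Integer item : s) {
            Set<Integer> child = new TreeSet<>(s);
            child.remove(item);
            if (!large.contains(child)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = new ArrayList<>();
        db.add(new TreeSet<>(Arrays.asList(1, 3, 4)));
        db.add(new TreeSet<>(Arrays.asList(2, 3, 5)));
        db.add(new TreeSet<>(Arrays.asList(1, 2, 3, 5)));
        db.add(new TreeSet<>(Arrays.asList(2, 5)));
        Set<Integer> items = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(mine(db, items, 2, 2));
    }
}
```

When `m` equals the number of transactions, new itemsets can only start at the end of a full pass, so the sketch proceeds level by level in the same way as Apriori.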

Hypothesis of Performance Analysis

  • Given a memory size, AprioriTid generally has better performance than Apriori due to I/O saving.
  • DIC has better performance than Apriori in a fairly homogeneous data environment.
  • DIC performance should approach that of Apriori as M approaches the total number of transactions.

Experiment Environment

  • Data sets
      • IBM Synthetic Dataset Generation Code for Association Rules
  • Environment
      • Operating system: Microsoft Windows XP Professional
      • Computer: Intel Pentium III processor, 550 MHz
      • RAM: 384 MB
  • Source code written in Java

Data Structure

  • Apriori and DIC
      • Candidate itemsets are stored in a hash tree
      • Each internal node is a hash table
      • The leaves store the candidate itemsets
  • AprioriTid
      • Uses an array to keep the candidates
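A hash tree of this kind could be sketched in Java as shown below. This is an assumption-laden illustration, not the authors' data structure: the names `HashTreeSketch`, `LEAF_CAPACITY`, `insert`, and `subsetsOf` are hypothetical, itemsets and transactions are assumed to be sorted `int` arrays without duplicates, and each internal node uses a `HashMap` keyed directly by the item instead of a fixed number of hash buckets.

```java
import java.util.*;

class HashTreeSketch {
    static final int LEAF_CAPACITY = 2; // hypothetical split threshold

    static class Node {
        Map<Integer, Node> children;               // non-null for internal nodes
        List<int[]> itemsets = new ArrayList<>();  // candidates held by a leaf
    }

    final Node root = new Node();
    final int k;                                   // size of the stored candidates

    HashTreeSketch(int k) { this.k = k; }

    void insert(int[] sortedItemset) { insert(root, sortedItemset, 0); }

    private void insert(Node node, int[] c, int depth) {
        if (node.children != null) {
            // Internal node: hash on the item at this depth and descend.
            Node child = node.children.computeIfAbsent(c[depth], x -> new Node());
            insert(child, c, depth + 1);
            return;
        }
        node.itemsets.add(c);
        // An overfull leaf becomes an internal node while items remain to hash on.
        if (node.itemsets.size() > LEAF_CAPACITY && depth < k) {
            List<int[]> old = node.itemsets;
            node.itemsets = new ArrayList<>();
            node.children = new HashMap<>();
            for (int[] s : old) insert(node, s, depth);
        }
    }

    /** Collects the stored candidates contained in the sorted transaction t. */
    List<int[]> subsetsOf(int[] t) {
        List<int[]> found = new ArrayList<>();
        collect(root, t, 0, found);
        return found;
    }

    private void collect(Node node, int[] t, int start, List<int[]> found) {
        if (node.children == null) {
            for (int[] c : node.itemsets)
                if (contains(t, c)) found.add(c);
            return;
        }
        // Hash on every remaining item of the transaction.
        for (int i = start; i < t.length; i++) {
            Node child = node.children.get(t[i]);
            if (child != null) collect(child, t, i + 1, found);
        }
    }

    /** True if sorted array c is a subset of sorted array t. */
    static boolean contains(int[] t, int[] c) {
        int i = 0;
        for (int item : t)
            if (i < c.length && c[i] == item) i++;
        return i == c.length;
    }

    public static void main(String[] args) {
        HashTreeSketch tree = new HashTreeSketch(2);
        int[][] candidates = {{1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}};
        for (int[] c : candidates) tree.insert(c);
        for (int[] c : tree.subsetsOf(new int[]{2, 3, 5}))
            System.out.println(Arrays.toString(c));
    }
}
```

Only the branches that match items of the transaction are visited, so most candidates are never compared against it.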

Size vs. Execution Time

  • Number of items = 8
  • Average transaction length = 5
  • M = 500

Support Threshold

  • Size = 16410 transactions
  • Number of items = 8
  • Average length per transaction = 5
  • M = 500

DIC – Different M value

  • Size = 12291 transactions
  • Number of items = 8
  • Average length per transaction = 5

DIC – “Non-Homogeneous” Dataset

  • Size = 6000 transactions
  • Number of items = 8
  • M = 500

Conclusions

  • AprioriTid is the best in our experiment
      • I/O saving
      • AprioriTid uses a small data structure
  • Apriori and DIC are very similar
      • Apriori is a special case of DIC
      • They use the same data structure
  • DIC
      • Sensitive to data
      • M affects performance

References

1. Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Mining Association Rules between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.

2. Rakesh Agrawal, Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases (VLDB), pp. 487-499, 1994.

3. Ashok Savasere, Edward Omiecinski, Shamkant Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. Proc. of the 21st VLDB Conf., pp. 432-444, 1995.

4. Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur. Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data. Tucson, Arizona, USA. 1997.

5. J. Hipp, U. Güntzer, G. Nakhaeizadeh. Mining Association Rules: Deriving a Superior Algorithm by Analysing Today's Approaches. Proceedings of the 4th European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD '00), Lyon, France. 2000.

6. Jochen Hipp, Ulrich Güntzer, Gholamreza Nakhaeizadeh. Algorithms for Association Rule Mining – A General Survey and Comparison. SIGKDD Explorations. 2(1): 58-64. 2000.

7. R. Srikant, R. Agrawal. Mining Generalized Association Rules. Proc. of the VLDB Conference, September 1995.