ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)


Page 1: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

Association Rule Mining

Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business

Page 2: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Main Expectations

• Knowledge pattern in focus
• Definitions and examples
• A basic method
• How to tune the method
• Decision support applications
• When to use association rule mining
• Reading: T2, pp. 225-236

Page 3: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association

• Under a given condition, a set of objects is associated with (implies) another set of objects

Examples
• Retail items purchased together
• Services subscribed to by the same customer
• Web pages a user accesses in a session
• Courses taken by the same student
• Medications prescribed by a doctor for a patient visit
• Genes that are expressed at the same level

Page 4: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Decision Support Applications

• Customer relationship management
• Retail merchandise placement
• Online retail catalog design
• Website link re-organization
• Fraud detection
• Gene analysis for cancer prevention

Page 5: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Preliminary

• Set Theory
– A set is a collection of objects. E.g., set A = {3,5} and set B = {1,3,5}
– The elements of a set are the objects that belong to it. E.g., 3 ∈ {3,5}, 3 ∈ {1,3,5}, 3 ∈ A and 3 ∈ B
– Set X is a subset of set Y if every element of X belongs to Y, denoted X ⊆ Y. E.g., A ⊆ B or {3,5} ⊆ {1,3,5}

Page 6: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Preliminary

• Two properties of sets
– An element in a set is counted only once. E.g., {1,3,5} = {1,3,3,5}
– The elements of a set have no order. E.g., {3,1,5} = {1,3,5}
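The set notions in this preliminary map directly onto Python's built-in set type. The sketch below is an illustration only; the variable names are assumptions, and the values are the sets A = {3,5} and B = {1,3,5} from the slides.

```python
# Sets from the slide example.
A = {3, 5}
B = {1, 3, 5}

# Membership: 3 is an element of both A and B.
three_in_both = 3 in A and 3 in B

# Subset: every element of A also belongs to B (same as A.issubset(B)).
a_subset_of_b = A <= B

# An element is counted only once, and the order of elements does not matter.
no_duplicates = {1, 3, 3, 5} == {1, 3, 5}
no_order = {3, 1, 5} == {1, 3, 5}

print(three_in_both, a_subset_of_b, no_duplicates, no_order)
```

The subset test `<=` is the operation the later support and confidence sketches rely on: an itemset is "in" a transaction when it is a subset of the transaction's items.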

Page 7: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Given: a database of transactions

Examples of transactions:
• a customer's visit to a grocery store
• an online purchase at a virtual store such as Amazon.com

Format of transactions:

date    transaction ID  customer ID  item
1/1/99  001             001          egg
1/1/99  001             001          milk

Page 8: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Find: patterns in the form of association rules

Association rules correlate the presence of one set of items (X) with the presence of another set of items (Y), denoted X → Y

Example: {egg, milk} → {bread}

How to measure correlations in association rules?

Page 9: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Itemset: a set of items, e.g., {egg, milk}

Size of an itemset: the number of items in that itemset. An itemset of size k is called a k-itemset.

Support of an itemset: the ratio of the number of transactions that contain all items in the itemset to the total number of transactions.

Page 10: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Example:

TID  CID  Item       Price  Date
101  201  Computer   1500   1/4/99
101  201  MS Office  300    1/4/99
101  201  MCSE Book  100    1/4/99
102  201  Hard disk  500    1/8/99
102  201  MCSE Book  100    1/8/99
103  202  Computer   1500   1/21/99
103  202  Hard disk  500    1/21/99
103  202  MCSE Book  100    1/21/99

Page 11: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

In this example:

The support of the 2-itemset {Computer,Hard disk} is 1/3=33.3%.

What is the support of 1-itemset {Computer}?
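The support calculation can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides: the `transactions` dictionary (TID mapped to the set of items in that transaction) and the helper name `support` are illustrative assumptions.

```python
# Transactions from the example table, grouped by TID.
transactions = {
    101: {"Computer", "MS Office", "MCSE Book"},
    102: {"Hard disk", "MCSE Book"},
    103: {"Computer", "Hard disk", "MCSE Book"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

# Only TID 103 contains both Computer and Hard disk, so the support is 1/3.
print(f"{support({'Computer', 'Hard disk'}, transactions):.1%}")
```

Calling `support({"Computer"}, transactions)` answers the question posed on the slide in the same way.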

Page 12: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Two important metrics for association rules:

If two itemsets X and Y co-occur in a transaction database, the association rule X → Y holds with

support s, which is the ratio of the number of transactions containing both X and Y to the total number of transactions

confidence c, which is the ratio of the number of transactions containing both X and Y to the number of transactions containing X.

Page 13: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Association rule: {Computer} → {Hard disk}

Support: 1/3 = 33.3%
Confidence: 1/2 = 50%

How about {Computer} → {MCSE Book} and {Computer, MCSE Book} → {Hard disk}?
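Both metrics come from two counts: the transactions containing X and Y together, and the transactions containing X. The sketch below computes them for the three rules on this slide; the helper names `count` and `rule_metrics` are assumptions made for illustration, and the code assumes X occurs in at least one transaction.

```python
# Each transaction from the example table, as the set of items it contains.
transactions = [
    {"Computer", "MS Office", "MCSE Book"},   # TID 101
    {"Hard disk", "MCSE Book"},               # TID 102
    {"Computer", "Hard disk", "MCSE Book"},   # TID 103
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions if itemset <= items)

def rule_metrics(x, y):
    """Return (support, confidence) of the rule x -> y."""
    both = count(x | y)
    return both / len(transactions), both / count(x)

for x, y in [
    ({"Computer"}, {"Hard disk"}),
    ({"Computer"}, {"MCSE Book"}),
    ({"Computer", "MCSE Book"}, {"Hard disk"}),
]:
    s, c = rule_metrics(x, y)
    print(f"{sorted(x)} -> {sorted(y)}: support={s:.1%}, confidence={c:.1%}")
```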

Page 14: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rule Mining

Association rule mining: find all association rules in a database whose support is no less than a user-specified minimum support and whose confidence is no less than a user-specified minimum confidence.

• For small problems, mining association rules is not that complex.

• What about a transaction database with 1 billion transactions and 1 million different items?

• An efficient algorithm is needed!

Page 15: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rules

Two steps in association rule mining:

1. Find all large (frequent) itemsets, i.e., itemsets whose support is no less than the user-specified minimum support.

2. For each large itemset L, find all association rules of the form a → (L − a), where a and (L − a) are non-empty subsets of L, whose confidence is no less than the user-specified minimum confidence.

Example: find all association rules in the example with minimum support 60% and minimum confidence 80%.
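Step 2 can be sketched as follows for the Computer / Hard disk / MCSE Book example. The two large itemsets are hard-coded on the assumption that step 1 has already run at 60% minimum support (each appears in 2 of the 3 transactions); the function name `rules_from` is hypothetical.

```python
from itertools import combinations

transactions = [
    {"Computer", "MS Office", "MCSE Book"},   # TID 101
    {"Hard disk", "MCSE Book"},               # TID 102
    {"Computer", "Hard disk", "MCSE Book"},   # TID 103
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions if itemset <= items)

def rules_from(large_itemset, min_confidence):
    """Yield (a, L - a, confidence) for each rule a -> (L - a) meeting min_confidence."""
    L = frozenset(large_itemset)
    for size in range(1, len(L)):                 # non-empty proper subsets of L
        for a in map(frozenset, combinations(sorted(L), size)):
            confidence = count(L) / count(a)
            if confidence >= min_confidence:
                yield a, L - a, confidence

# Assumed output of step 1 (large itemsets with two or more items).
large_itemsets = [{"Computer", "MCSE Book"}, {"Hard disk", "MCSE Book"}]
rules = [rule for L in large_itemsets for rule in rules_from(L, 0.8)]

for a, b, confidence in rules:
    print(f"{sorted(a)} -> {sorted(b)} (confidence {confidence:.0%})")
```

Only itemsets with at least two items are passed in, because a 1-itemset cannot be split into two non-empty parts.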

Page 16: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rule Mining

Step 2 is trivial compared to step 1. Step 1 is hard because of:

• Exponential search space

• Size of the transaction database
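To make the exponential search space concrete: n distinct items give 2^n − 1 possible non-empty itemsets. The short sketch below prints that count for a few small n and, for the 1 million items mentioned earlier, only the number of decimal digits of the count. The function name `itemset_count` is an assumption for illustration.

```python
import math

def itemset_count(n_items):
    """Number of non-empty itemsets that can be formed from n_items distinct items."""
    return 2 ** n_items - 1

for n in (5, 10, 20, 30):
    print(n, itemset_count(n))

# For 1 million items the count is far too large to print,
# so report how many decimal digits it has instead.
digits = math.floor(1_000_000 * math.log10(2)) + 1
print(digits)
```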

Page 17: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Apriori Algorithm

• Apriori is an efficient algorithm for discovering all large itemsets in a huge database with a large number of items.

• Apriori was developed by two researchers at the IBM Almaden Research Center.

Page 18: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Apriori Algorithm

• The Apriori algorithm is based on the Apriori property.

• The Apriori property: any subset of a large itemset must also be large. Equivalently, if an itemset is not large, none of its supersets can be large.
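The property holds because every transaction that contains an itemset also contains each of its subsets, so a subset's support can never be lower than the itemset's. The sketch below checks this exhaustively on the three-transaction Computer example; it is a brute-force illustration with assumed names, not part of Apriori itself.

```python
from itertools import combinations

transactions = [
    {"Computer", "MS Office", "MCSE Book"},   # TID 101
    {"Hard disk", "MCSE Book"},               # TID 102
    {"Computer", "Hard disk", "MCSE Book"},   # TID 103
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions if itemset <= items) / len(transactions)

# Every non-empty itemset over the four distinct items (15 in total).
items = sorted(set().union(*transactions))
all_itemsets = [
    frozenset(c) for k in range(1, len(items) + 1) for c in combinations(items, k)
]

# For each itemset, every non-empty proper subset must have at least its support.
property_holds = all(
    support(subset) >= support(itemset)
    for itemset in all_itemsets
    for k in range(1, len(itemset))
    for subset in map(frozenset, combinations(sorted(itemset), k))
)
print(property_holds)
```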

Page 19: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Apriori Algorithm

• Step 1: Scan the DB once to find all large 1-itemsets.

• Step 2: Generate candidate k-itemsets from the large (k-1)-itemsets.

• Step 3: Find all large k-itemsets among the candidate k-itemsets by scanning the DB once.

• Go back to step 2 with k increased by 1, and repeat until no candidate itemsets can be generated.
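The level-wise loop described in these steps can be sketched as below. This is a simplified illustration under stated assumptions, not the original authors' implementation: candidates are formed by taking unions of large (k-1)-itemsets rather than by a sorted prefix join, and the names `apriori` and `large_among` are hypothetical. It is run on the Computer example with 60% minimum support.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all large itemsets, found level by level."""
    n = len(transactions)

    def large_among(candidates):
        # One scan of the database: count each candidate, keep those meeting min_support.
        counts = {c: 0 for c in candidates}
        for items in transactions:
            for c in candidates:
                if c <= items:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Step 1: large 1-itemsets.
    singles = {frozenset([item]) for items in transactions for item in items}
    current = large_among(singles)
    result = {}
    k = 2
    while current:
        result.update(current)
        prev = set(current)
        # Step 2: build candidate k-itemsets from large (k-1)-itemsets and keep
        # only those whose (k-1)-subsets are all large.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Step 3: one more scan to find the large k-itemsets.
        current = large_among(candidates)
        k += 1
    return result

transactions = [
    {"Computer", "MS Office", "MCSE Book"},
    {"Hard disk", "MCSE Book"},
    {"Computer", "Hard disk", "MCSE Book"},
]
large = apriori(transactions, 0.6)
for itemset, s in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.1%}")
```

The loop stops as soon as a level yields no large itemsets, since no candidates can then be generated for the next level.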

Page 20: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Apriori Algorithm

• Step 2
– Candidate k-itemsets are k-itemsets that could be large.
– Why generate candidate k-itemsets only from large (k-1)-itemsets?
– How to generate?

• Step 2-1: Join: two large (k-1)-itemsets L1 and L2, with the items in each itemset listed in sorted order, are joinable if they satisfy the following conditions:

– L1(1)=L2(1) and L1(2)=L2(2) and … and L1(k-2)=L2(k-2)
– L1(k-1)<L2(k-1)

Joining L1 and L2 yields the candidate k-itemset {L1(1), …, L1(k-1), L2(k-1)}.

• Step 2-2: Prune: remove every itemset generated in step 2-1 that has a (k-1)-subset that is not large.
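The join and prune steps can be sketched with itemsets stored as sorted tuples. The function names `join` and `prune` are assumptions for illustration, and the input is the list of five large 2-itemsets from the worked example later in the deck. Pruning is justified by the Apriori property: a candidate with a subset that is not large cannot itself be large.

```python
from itertools import combinations

def join(large_prev):
    """Step 2-1: join sorted (k-1)-tuples that agree on all but their last item."""
    large_prev = sorted(large_prev)
    joined = []
    for i, l1 in enumerate(large_prev):
        for l2 in large_prev[i + 1:]:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                joined.append(l1 + (l2[-1],))
    return joined

def prune(candidates, large_prev):
    """Step 2-2: drop candidates that have a (k-1)-subset that is not large."""
    large_prev = set(large_prev)
    return [
        c for c in candidates
        if all(s in large_prev for s in combinations(c, len(c) - 1))
    ]

large_2 = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 5)]
joined = join(large_2)                 # itemsets produced by the join step
candidates_3 = prune(joined, large_2)  # itemsets that survive the prune step

print(joined)
print(candidates_3)
```

Here the join produces (1, 3, 4) and (2, 3, 5); the prune step removes (1, 3, 4) because its subset (3, 4) is not among the large 2-itemsets.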

Page 21: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Apriori Algorithm

Minimum support = 40%
Minimum confidence = 70%

Transaction ID  Items
100             1, 3, 4, 6
200             2, 3, 5, 7
300             1, 2, 3, 5, 8
400             2, 5, 9, 10
500             1, 4

Page 22: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rule Mining

Minimum support: 40%

Tid  Items
100  1, 3, 4, 6
200  2, 3, 5, 7
300  1, 2, 3, 5, 8
400  2, 5, 9, 10
500  1, 4

Large 1-itemsets:
{1} support = 3/5 = 60%
{2} support = 3/5 = 60%
{3} support = 3/5 = 60%
{4} support = 2/5 = 40%
{5} support = 3/5 = 60%

Page 23: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rule Mining

Large 1-itemsets:
{1} support = 3/5 = 60%
{2} support = 3/5 = 60%
{3} support = 3/5 = 60%
{4} support = 2/5 = 40%
{5} support = 3/5 = 60%

Candidate 2-itemsets:
{1, 2} {1, 3} {1, 4} {1, 5}
{2, 3} {2, 4} {2, 5}
{3, 4} {3, 5}
{4, 5}

Page 24: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rule Mining

Candidate 2-itemsets:
{1, 2} {1, 3} {1, 4} {1, 5}
{2, 3} {2, 4} {2, 5}
{3, 4} {3, 5}
{4, 5}

Large 2-itemsets:
{1, 3} support = 2/5 = 40%
{1, 4} support = 2/5 = 40%
{2, 3} support = 2/5 = 40%
{2, 5} support = 3/5 = 60%
{3, 5} support = 2/5 = 40%
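Counting all candidate 2-itemsets takes a single pass over the transactions. The sketch below is one way to do it, assuming the five transactions and the large items 1 to 5 from this example; `pair_counts` and `large_pairs` are illustrative names. Candidate pairs that never occur, such as {2, 4} and {4, 5}, simply receive no count.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {1, 3, 4, 6},      # Tid 100
    {2, 3, 5, 7},      # Tid 200
    {1, 2, 3, 5, 8},   # Tid 300
    {2, 5, 9, 10},     # Tid 400
    {1, 4},            # Tid 500
]
large_items = {1, 2, 3, 4, 5}
MIN_SUPPORT = 0.4

# One scan: within each transaction, count every pair of large items it contains.
pair_counts = Counter()
for items in transactions:
    kept = sorted(items & large_items)
    pair_counts.update(combinations(kept, 2))

# Keep the pairs whose support is no less than the minimum support.
large_pairs = {
    pair: n / len(transactions)
    for pair, n in pair_counts.items()
    if n / len(transactions) >= MIN_SUPPORT
}
for pair in sorted(large_pairs):
    print(pair, f"{large_pairs[pair]:.0%}")
```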

Page 25: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

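Association Rule Mining

Large 2-itemsets:
{1, 3} support = 2/5 = 40%
{1, 4} support = 2/5 = 40%
{2, 3} support = 2/5 = 40%
{2, 5} support = 3/5 = 60%
{3, 5} support = 2/5 = 40%

Candidate 3-itemsets (from the join step):
{1, 3, 4} (removed by the prune step, because its subset {3, 4} is not large)
{2, 3, 5}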

Page 26: ACCTG 6910 Building Enterprise &  Business Intelligence Systems (e.bis)

Association Rule Mining

Candidate 3-itemsets (from the join step):
{1, 3, 4} (removed by the prune step, because its subset {3, 4} is not large)
{2, 3, 5}

Large 3-itemsets:
{2, 3, 5} support = 2/5 = 40%

Candidate 4-itemsets:
No candidate 4-itemsets. Stop.
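The slides stop once the large itemsets are found. As a hedged continuation, the sketch below applies the rule-generation step with the 70% minimum confidence stated at the start of the example. The large itemsets are hard-coded from the results above, and the variable names are assumptions.

```python
from itertools import combinations

transactions = [
    {1, 3, 4, 6},      # Tid 100
    {2, 3, 5, 7},      # Tid 200
    {1, 2, 3, 5, 8},   # Tid 300
    {2, 5, 9, 10},     # Tid 400
    {1, 4},            # Tid 500
]
# Large itemsets with at least two items, as found in the example.
large_itemsets = [{1, 3}, {1, 4}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}]
MIN_CONFIDENCE = 0.7

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions if itemset <= items)

rules = []
for L in map(frozenset, large_itemsets):
    for size in range(1, len(L)):                      # non-empty proper subsets
        for a in map(frozenset, combinations(sorted(L), size)):
            confidence = count(L) / count(a)
            if confidence >= MIN_CONFIDENCE:
                rules.append((sorted(a), sorted(L - a), confidence))

for a, b, confidence in rules:
    print(f"{a} -> {b} (confidence {confidence:.0%})")
```

On this data the surviving rules are {4} → {1}, {2} → {5}, {5} → {2}, {2, 3} → {5} and {3, 5} → {2}, each with 100% confidence; every other split has confidence 2/3, which falls below the 70% threshold.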