Association Rule.. Association rule mining It is an important data mining model studied extensively...


Association Rule.


Association rule mining

It is an important data mining model studied extensively by the database and data mining community.

Assume all data are categorical.

Initially used for Market Basket Analysis to find how items purchased by customers are related.


Transaction data: supermarket data

Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}

Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: items purchased in a basket; it may have TID (transaction ID)
• A transactional dataset: a set of transactions


The model: rules

A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.

An association rule is an implication of the form X → Y, where X, Y ⊂ I, and X ∩ Y = ∅.

An itemset is a set of items.
• E.g., X = {milk, bread, cereal} is an itemset.

A k-itemset is an itemset with k items.
• E.g., {milk, bread, cereal} is a 3-itemset.


Rule strength measures

Support: The rule X → Y holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
• sup = Pr(X ∪ Y) = count(X ∪ Y) / total count.

Confidence: The rule holds in T with confidence conf if conf% of transactions that contain X also contain Y.
• conf = Pr(Y | X) = support(X ∪ Y) / support(X).

An association rule is a pattern that states that when X occurs, Y occurs with a certain probability.
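
The two measures above can be sketched in a few lines of Python. This is only a minimal illustration, not the tool used in these slides; the function names `support` and `confidence` and the toy `baskets` list are assumptions invented for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Estimate Pr(Y | X) as support(X ∪ Y) / support(X).

    Assumes X occurs in at least one transaction; otherwise the
    ratio is undefined and Python raises ZeroDivisionError.
    """
    x, y = frozenset(x), frozenset(y)
    return support(x | y, transactions) / support(x, transactions)


# Toy baskets made up for illustration (not the retail data set).
baskets = [
    frozenset({"bread", "cheese", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"eggs", "milk"}),
    frozenset({"bread", "eggs"}),
]

# {bread, milk} occurs in 2 of 4 baskets.
print(support({"bread", "milk"}, baskets))  # 0.5
# bread occurs in 3 baskets, 2 of which also contain milk.
print(confidence({"bread"}, {"milk"}, baskets))
```

Note that support is symmetric in X and Y, while confidence is not: swapping the two sides of the rule changes the denominator.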


Goal of Association Rule.

Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
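
One way to make this goal concrete is a brute-force search that tries every itemset and every way of splitting it into a left-hand and right-hand side. The slides do not name a mining algorithm, so the sketch below is an assumption chosen for clarity, not the method used here; it is exponential in the number of items and only suitable for tiny examples. The name `mine_rules` and the toy baskets are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, minsup, minconf):
    """Return (X, Y, sup, conf) for every rule X -> Y meeting both thresholds."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of transactions containing every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            joint = count(itemset)
            # Skip itemsets that never occur or fall below minsup.
            if joint == 0 or joint / n < minsup:
                continue
            # Split the itemset into a non-empty X and a non-empty Y.
            for r in range(1, k):
                for body in combinations(combo, r):
                    x = frozenset(body)
                    y = itemset - x
                    conf = joint / count(x)
                    if conf >= minconf:
                        rules.append((x, y, joint / n, conf))
    return rules


# Toy data invented for illustration.
toy = [{"a", "b"}, {"a", "b"}, {"a"}, {"b", "c"}]
for x, y, sup, conf in mine_rules(toy, minsup=0.5, minconf=0.6):
    print(sorted(x), "->", sorted(y), f"sup={sup:.2f}", f"conf={conf:.2f}")
```

Practical miners avoid this exhaustive enumeration by pruning: any superset of an infrequent itemset is also infrequent, so it never needs to be counted.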


An Example.

Transaction data:

• t1: Beef, Chicken, Milk
• t2: Beef, Cheese
• t3: Cheese, Boots
• t4: Beef, Chicken, Cheese
• t5: Beef, Chicken, Clothes, Cheese, Milk
• t6: Chicken, Clothes, Milk
• t7: Chicken, Milk, Clothes

Assume:
minsup = 30%
minconf = 80%

An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]

Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
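
The figures in this example can be checked by counting directly over the seven transactions. The snippet below is a small verification sketch; the helper name `count` is an assumption, and exact fractions are used so the results match the slide's 3/7 and 3/3 notation.

```python
from fractions import Fraction

transactions = [
    {"Beef", "Chicken", "Milk"},                       # t1
    {"Beef", "Cheese"},                                # t2
    {"Cheese", "Boots"},                               # t3
    {"Beef", "Chicken", "Cheese"},                     # t4
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},  # t5
    {"Chicken", "Clothes", "Milk"},                    # t6
    {"Chicken", "Milk", "Clothes"},                    # t7
]
n = len(transactions)


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


itemset = {"Chicken", "Clothes", "Milk"}
sup = Fraction(count(itemset), n)

# Clothes -> Milk, Chicken
conf_a = Fraction(count(itemset), count({"Clothes"}))
# Clothes, Chicken -> Milk
conf_b = Fraction(count(itemset), count({"Clothes", "Chicken"}))

print(sup)     # 3/7
print(conf_a)  # 1  (i.e. 3/3)
print(conf_b)  # 1  (i.e. 3/3)

# Both thresholds from the example are met.
print(sup >= Fraction(30, 100), conf_a >= Fraction(80, 100))  # True True
```

Only t5, t6 and t7 contain Clothes, and all three also contain Chicken and Milk, which is why both rules reach a confidence of 100%.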


Data set.

This data set relates to the retail industry. It contains information on each transaction, along with the transaction IDs. Each row represents a single transaction, i.e., the purchases of a single customer.

For example, if a row presents the data {Bread sandwich, Milk, Egg, Butter}, it means that this customer bought those items in a single transaction.


Objective.

Our main objective here is to find buying patterns in this large database.

The discovery of such association rules can help people develop marketing strategies by giving insight into which items are frequently purchased together by customers.

We have used the following parameters:
Minsup = 0.08
Minconf = 0.40
Mincorr = 0.30


Analysis.

The spreadsheet shows the frequent itemsets with their support values.

From the table it is clear that Fluid milk has the highest frequency, followed by Bananas, Salad vegetables, Eggs, etc.

This means these are the items that customers most often put into their baskets.


The fifth rule has the highest confidence value, 58.83424%, which means that over 58% of customers who buy Eggs also buy Fluid milk.

Similarly, 54% of customers who buy Tomatoes also buy Salad vegetables.

In the same way, 52% of customers who buy Bread sandwiches also buy Fluid milk.


Rule Graph.

This represents the full set of association rules graphically, which helps us understand the entire process in a single snapshot.

In this graph, the support values for the Body and Head portions of each association rule are indicated by the sizes and colors of each circle.

The thickness of each line indicates the confidence value (conditional probability of Head given Body) for the respective association rule.

The sizes and colors of the circles in the center, above the Implies label, indicate the joint support (for the co-occurrences) of the respective Body and Head components of the respective association rules.


In the graphical summary the strongest support value was found for Fluid milk associated with Bananas, Bread sandwiches, and Eggs.

From the graph it is also clear that the rule linking Fluid milk and Eggs has the highest confidence value (its line is the thickest).


3D Rule Graph.

The above graph is the 3D version of the earlier graph.

From the graph it is clear that Fluid milk and Eggs have the highest confidence value compared to any other items.


Conclusion.

According to the rules, Fluid milk, Bananas, Bread sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice are the items customers most frequently put into their baskets.

The rules also suggest that more than 50% of customers who buy Eggs or Bread sandwiches also buy Fluid milk.

All of the above information can be used to build better marketing strategies.

For example, a retailer can place frequently bought items close to each other in the supermarket so that customers can find them all easily.

Some new products (related to these items) can also be placed nearby to attract customers.


Thank You.

Krishnendu Kundu

(Statistician)

StatSoft India.


Mobile - +919873119520