Association Rules. Presented by: Anilkumar Panicker.
Date posted: 15-Jan-2016
What is Data Mining?
• Search for valuable information in large volumes of data.
• A step in knowledge discovery in databases.
• It helps companies focus on customer satisfaction and corporate profits, and determine the impact of various parameters on sales.
Association Rule
• Association rules are used to show the relationships between data items.
• Association rules detect common usage of data items.
• E.g., the purchase of one product whenever another product is purchased represents an association rule.
Example 1
• Grocery store.
• Association rules have their most direct application in retail businesses.
• Association rules are used to assist in marketing, advertising, floor placement, and inventory control.
• From the transaction history several association rules can be derived.
• E.g. 100% of the time that PeanutButter is purchased, so is bread.
• 33% of the time PeanutButter is purchased, Jelly is also purchased.
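The transaction table behind these percentages is not reproduced in the transcript. As a minimal sketch, the five transactions below are a hypothetical history (an assumption, not the slide's data) chosen so that the same figures come out; `share_also_bought` is likewise an illustrative helper name.

```python
# Hypothetical transaction history (assumed for illustration, not from the slides).
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def share_also_bought(transactions, given, also):
    """Fraction of the transactions containing `given` that also contain `also`."""
    with_given = [t for t in transactions if given in t]
    if not with_given:
        return 0.0
    return sum(1 for t in with_given if also in t) / len(with_given)

pb_bread = share_also_bought(transactions, "PeanutButter", "Bread")
pb_jelly = share_also_bought(transactions, "PeanutButter", "Jelly")
print(f"PeanutButter -> Bread: {pb_bread:.0%}")
print(f"PeanutButter -> Jelly: {pb_jelly:.0%}")
```

With this assumed history, PeanutButter appears in three transactions, all three of which contain Bread and one of which contains Jelly.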
Example 2
• A telephone company.
• A telephone company must ensure that all calls are completed within an acceptable period of time.
• In this environment, a potential data mining problem would be to predict the failure of a node.
• This can be done by finding association rules of the type $X \Rightarrow \text{Failure}$.
• If rules of this type hold with high confidence, failures can be predicted, even though the support might be low because the condition X does not occur frequently.
Association rule
• Given a set of items $I = \{I_1, I_2, \ldots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$, an association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ are sets of items called itemsets and $X \cap Y = \emptyset$.
• Support (s):
The support (s) for an association rule $X \Rightarrow Y$ is the percentage of transactions in the database that contain $X \cup Y$.
E.g., if bread occurs together with peanutbutter in 60% of all transactions, then the support for bread $\Rightarrow$ peanutbutter is 60%.
• Confidence or Strength (α):
The confidence or strength (α) for an association rule $X \Rightarrow Y$ is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions that contain $X$.
E.g., if the support for bread $\Rightarrow$ peanutbutter is 60% and bread occurs in 80% of all transactions, then the confidence for bread $\Rightarrow$ peanutbutter is 60% / 80% = 75%.
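The two measures can be written compactly as follows. The notation $\sigma(Z)$ for the fraction of transactions containing an itemset $Z$, and $|D|$ for the number of transactions, is introduced here for illustration and does not appear on the slides.

```latex
\begin{align*}
\sigma(Z) &= \frac{|\{\, t \in D : Z \subseteq t \,\}|}{|D|} \\
s(X \Rightarrow Y) &= \sigma(X \cup Y) \\
\alpha(X \Rightarrow Y) &= \frac{\sigma(X \cup Y)}{\sigma(X)} \\
\alpha(\text{bread} \Rightarrow \text{peanutbutter}) &= \frac{0.60}{0.80} = 0.75
\end{align*}
```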
Selecting Association Rules
• The selection of association rules is based on Support and Confidence.
• Confidence measures the strength of the rule, whereas support measures how often it occurs in the database.
• Typically, a large minimum confidence and a smaller minimum support are used.
• Rules that satisfy both minimum support and minimum confidence are called strong rules.
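A minimal sketch of this selection step is shown below. The `Rule` type, the `strong_rules` helper, the threshold values, and the second and third rules' numbers are assumptions made up for illustration; only the 60% / 75% figures for the first rule come from the slides.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    antecedent: frozenset   # X
    consequent: frozenset   # Y
    support: float          # fraction of transactions containing X and Y together
    confidence: float       # support of (X and Y) divided by support of X

def strong_rules(rules, min_support, min_confidence):
    """Keep only the rules that meet both thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]

# Illustrative rules; supports and confidences other than 0.60 / 0.75 are assumed.
candidates = [
    Rule(frozenset({"bread"}), frozenset({"peanutbutter"}), 0.60, 0.75),
    Rule(frozenset({"peanutbutter"}), frozenset({"jelly"}), 0.20, 0.33),
    Rule(frozenset({"X"}), frozenset({"Failure"}), 0.02, 0.95),
]

strong = strong_rules(candidates, min_support=0.10, min_confidence=0.70)
relaxed = strong_rules(candidates, min_support=0.01, min_confidence=0.70)
print(len(strong), len(relaxed))
```

Lowering the assumed minimum support from 0.10 to 0.01 lets the rare but high-confidence failure rule through, which is the reason a smaller support threshold is often paired with a large confidence threshold.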
Association Rule Problem
• Given a set of items $I = \{I_1, I_2, \ldots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$, the association rule problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence. These values $(s, \alpha)$ are given as input to the problem.
Large Itemsets
• A large itemset (or frequent itemset) is an itemset whose number of occurrences meets or exceeds a threshold, s (support).
• Finding large itemsets is conceptually easy but computationally very costly.
• The naive approach would be to count all itemsets that appear in any transaction.
• Given a set of items of size m, there are $2^m$ subsets. Ignoring the empty set, we are still left with $2^m - 1$ subsets.
• E.g., in the retail store example, if the set of items has size 5 (i.e., the store sells 5 products), then the number of possible itemsets is $2^5 - 1 = 31$.
• If the 5 products sold are bread, peanutbutter, milk, beer and jelly, then the 31 possible itemsets are:
• Bread
• Peanutbutter
• Milk
• Beer
• Jelly
• Bread, peanutbutter
• Bread, milk
• Bread, beer
• Bread, jelly
• Peanutbutter, milk
• Peanutbutter, beer
• Peanutbutter, jelly
• Milk, beer
• Milk, jelly
• Beer, jelly
• Bread, peanutbutter, milk
• Bread, peanutbutter, beer
and so on.
• For m = 30, the number of potential itemsets becomes $2^{30} - 1 = 1073741823$.
• The challenge in solving an association rule problem is therefore to determine all large itemsets efficiently.
• Most association rule algorithms are based on smart ways to reduce the number of itemsets to be counted.
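The itemset counts quoted above can be checked with a short enumeration. This is only a sketch of the naive approach; `all_itemsets` is an illustrative helper name, not something defined on the slides.

```python
from itertools import combinations

def all_itemsets(items):
    """Yield every non-empty subset of `items`, smallest subsets first."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

products = ["bread", "peanutbutter", "milk", "beer", "jelly"]
itemsets = list(all_itemsets(products))

print(len(itemsets))  # number of non-empty itemsets for m = 5
print(2**30 - 1)      # number of non-empty itemsets for m = 30
```

Enumerating the subsets is feasible for five products, but the count doubles with every added item, which is why counting every possible itemset does not scale.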
Large Itemsets
• The most common approach to finding association rules is to break up the problem into two parts:
1. Finding large itemsets, and
2. Generating rules from these itemsets.
• Any subset of a large itemset is also large.
• Once the large itemsets have been found, we know that any interesting association rule $X \Rightarrow Y$ must have $X \cup Y$ in this set of frequent itemsets.
• When all large itemsets are found, generating the association rules is straightforward.
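One way to sketch the rule-generation step: split each large itemset into an antecedent X and a consequent Y in every possible way, and keep the splits whose confidence reaches the minimum. The helper names and the minimum confidence of 1.0 are assumptions for illustration; the four transactions are the ones used in the Apriori example in these slides.

```python
from itertools import combinations

# Four-transaction database from the Apriori example in these slides.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(large_itemset, transactions, min_confidence):
    """Return (X, Y, confidence) for each split X => Y reaching min_confidence.

    Assumes `large_itemset` occurs in at least one transaction, so every
    subset X has a non-zero count.
    """
    large_itemset = frozenset(large_itemset)
    whole = count(large_itemset, transactions)
    rules = []
    for size in range(1, len(large_itemset)):
        for combo in combinations(sorted(large_itemset), size):
            x = frozenset(combo)
            confidence = whole / count(x, transactions)
            if confidence >= min_confidence:
                rules.append((set(x), set(large_itemset - x), confidence))
    return rules

rules = rules_from_itemset({2, 3, 5}, transactions, min_confidence=1.0)
for x, y, conf in rules:
    print(f"{sorted(x)} => {sorted(y)} (confidence {conf:.0%})")
```

No further database knowledge is needed beyond the support counts of the large itemsets and their subsets, which is why this second step is cheap compared with finding the large itemsets.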
Apriori Algorithm
• The Apriori algorithm is the best-known association rule algorithm.
• Apriori algorithm is used to efficiently discover large itemsets.
• Apriori algorithm uses the property that any subset of a large itemset must be large.
• Inputs: the set of items, the database of transactions, and the minimum support. Output: the large itemsets.
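The subset property is what lets Apriori skip most itemsets: a k-itemset only needs to be counted if all of its (k-1)-subsets are already known to be large. The slides do not spell out candidate generation, so the function below is one possible sketch (join, then prune) under that assumption; `generate_candidates` is an illustrative name.

```python
from itertools import combinations

def generate_candidates(large_prev, k):
    """Build candidate k-itemsets from the large (k-1)-itemsets.

    Join step: union pairs of large (k-1)-itemsets that give exactly k items.
    Prune step: drop a candidate if any of its (k-1)-subsets is not large.
    """
    large_prev = set(large_prev)
    candidates = set()
    for a in large_prev:
        for b in large_prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in large_prev
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

# Large 2-itemsets taken from the worked example in these slides.
large_2 = {frozenset(p) for p in [(1, 3), (2, 3), (2, 5), (3, 5)]}
candidates_3 = generate_candidates(large_2, 3)
print([sorted(c) for c in candidates_3])
```

Here {1,2,3} and {1,3,5} are never counted against the database, because {1,2} and {1,5} are not large.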
Apriori Algorithm Example
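Database of transactions:

| TID | Items |
| --- | --- |
| 100 | 1, 3, 4 |
| 200 | 2, 3, 5 |
| 300 | 1, 2, 3, 5 |
| 400 | 2, 5 |

Candidate 1-itemsets with their support counts:

| Itemset | Support |
| --- | --- |
| {1} | 2 |
| {2} | 3 |
| {3} | 3 |
| {4} | 1 |
| {5} | 3 |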
Support threshold = 2
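Large 1-itemsets:

| Itemset | Support |
| --- | --- |
| {1} | 2 |
| {2} | 3 |
| {3} | 3 |
| {5} | 3 |

Candidate 2-itemsets:

| Itemset |
| --- |
| {1,2} |
| {1,3} |
| {1,5} |
| {2,3} |
| {2,5} |
| {3,5} |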
Support threshold = 2
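Candidate 2-itemsets with their support counts:

| Itemset | Support |
| --- | --- |
| {1,2} | 1 |
| {1,3} | 2 |
| {1,5} | 1 |
| {2,3} | 2 |
| {2,5} | 3 |
| {3,5} | 2 |

Large 2-itemsets:

| Itemset | Support |
| --- | --- |
| {1,3} | 2 |
| {2,3} | 2 |
| {2,5} | 3 |
| {3,5} | 2 |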
Candidate 3-itemsets:

| Itemset |
| --- |
| {2,3,5} |

Large 3-itemsets:

| Itemset | Support |
| --- | --- |
| {2,3,5} | 2 |
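The level-by-level tables of this example can be reproduced with a compact sketch of the Apriori loop. This is an illustrative implementation under stated assumptions, not the exact pseudocode from the reference: `apriori` and `min_count` are names chosen here, and an itemset is treated as large when its count is at least the threshold, as the example's tables do.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every large itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    large = {}
    k = 1
    while candidates:
        # One pass over the database per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        large.update(level)
        k += 1
        # Join large (k-1)-itemsets, then prune any candidate that has a
        # (k-1)-subset which is not large.
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return large

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
large_itemsets = apriori(transactions, min_count=2)
ordered = sorted(large_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
for itemset, n in ordered:
    print(sorted(itemset), n)
```

The loop stops at the fourth level because a single large 3-itemset cannot be joined into any 4-itemset candidate.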
References
• Data Mining by Margaret Dunham.
• Wikipedia
Q & A
Thanks.