Association Mining
www.edureka.co/r-for-analytics
Know The Science Behind Product Recommendation
Slide 2
Objectives
At the end of this session, you will be able to understand:
What is data mining
What is Business Analytics
Stages of Analytics / Data Mining
What is R
Overview of Machine Learning
What is Association Rule Mining
Use-case
Slide 3
Business Analytics
Why is Business Analytics becoming popular these days?
Cost of storing data
Cost of processing data
Slide 4
Stages of Analytics / Data Mining
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Slide 5
What is R
R is a programming language
R is an environment for statistical analysis
R is data analysis software
Slide 6
R : Characteristics
An effective and fast data handling and storage facility
A suite of operators for calculations on arrays, lists, vectors, etc.
A large, integrated collection of tools for data analysis and visualization
Graphical facilities for data analysis and display, either on screen or on paper
A well-developed and effective programming language, based on the ‘S’ language
A wide range of packages to extend and enrich the functionality of R
Slide 7
Who Uses R : Domains
Telecom
Pharmaceuticals
Financial Services
Life Sciences
Education, etc
Slide 8
Common Machine Learning Algorithms
Types of Learning
Supervised Learning
Algorithms: Naïve Bayes, Support Vector Machines, Random Forests, Decision Trees
Unsupervised Learning
Algorithms: K-means, Fuzzy Clustering, Hierarchical Clustering, Gaussian mixture models, Self-organizing maps
Slide 9
Association Rule Mining
Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars
Customers who purchase maintenance agreements are very likely to purchase large appliances
When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners
Slide 10
What is Association Rule Mining?
In data mining, Association Rule Mining is a popular and well-researched method for discovering interesting relations between variables in large databases.
It is intended to identify strong rules discovered in databases using different measures of interestingness.
For example, a rule found in the sales data of a supermarket might indicate that if a customer buys onions and potatoes together, he or she is also likely to buy hamburger meat.
Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placement.
Slide 11
How good is an Association Rule?
Here we have 5 customers. Each customer has a basket, and their purchases are as follows:
Customer   Items Purchased
1          OJ, soda
2          Milk, OJ, window cleaner
3          OJ, detergent
4          OJ, detergent, soda
5          Window cleaner, soda
Here, customer 1 purchases OJ (orange juice) and soda.
Customer 2 purchases milk, OJ and window cleaner.
Customer 3 purchases OJ and detergent.
Customer 4 purchases OJ, detergent and soda.
Customer 5 purchases window cleaner and soda.
Now let's form a co-occurrence matrix to analyze the above data and draw inferences.
Slide 12
How good is an Association Rule?
Co-occurrence of Products
                 OJ   Window cleaner   Milk   Soda   Detergent
OJ               4    1                1      2      2
Window cleaner   1    2                1      1      0
Milk             1    1                1      0      0
Soda             2    1                0      3      1
Detergent        2    0                0      1      2
Simple patterns derived from the above observation:
OJ and soda are purchased together more often than any other pair of items except OJ and detergent, which co-occur equally often (2 baskets each)
Detergent is never purchased with milk or window cleaner
Milk is never purchased with soda or detergent
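The co-occurrence matrix above can be reproduced by counting, for every pair of items, the baskets that contain both. The following is a minimal sketch in plain Python (the session itself uses R); the names `baskets`, `items` and `co_occurrence` are illustrative and not part of any package.

```python
# The five baskets from the example (one set of items per customer).
baskets = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window cleaner", "Soda"},
]
# Row/column order used in the matrix.
items = ["OJ", "Window cleaner", "Milk", "Soda", "Detergent"]


def co_occurrence(baskets, items):
    """Count, for every ordered pair of items, the baskets containing both.

    A diagonal entry (item, item) is the number of baskets containing
    that single item.
    """
    counts = {(a, b): 0 for a in items for b in items}
    for basket in baskets:
        for a in items:
            for b in items:
                if a in basket and b in basket:
                    counts[(a, b)] += 1
    return counts


matrix = co_occurrence(baskets, items)
for row in items:
    # Print one matrix row: the row label followed by its counts.
    print(f"{row:<15}", *(f"{matrix[(row, col)]:>3}" for col in items))
```

The matrix is symmetric, so only the upper triangle strictly needs to be computed; the full version is kept here because it mirrors the table directly.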
Slide 13
Association Rule Mining
The following three measures are the key constraints used to select association rules:
Support
Supp(X) = the proportion of transactions in the data set that contain the itemset X.
Confidence
The confidence of a rule: Conf(X => Y) = Supp(X ∪ Y) / Supp(X)
Lift
The lift of a rule: Lift(X => Y) = Supp(X ∪ Y) / (Supp(X) × Supp(Y))
Now let's calculate the Support, Confidence and Lift for our five-customer grocery data:
Rule              Support   Confidence   Lift
{Soda} => {OJ}    0.4       0.6667       0.8333
{OJ} => {Soda}    0.4       0.5          0.8333
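The three measures follow directly from their definitions. Below is a small Python sketch (not the R code used in the session) that evaluates them on the five example baskets; the function names `support`, `confidence` and `lift` are chosen here for illustration.

```python
# The five baskets from the example (one set of items per customer).
baskets = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window cleaner", "Soda"},
]


def support(itemset, baskets):
    """Proportion of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(x, y, baskets):
    """Conf(X => Y) = Supp(X u Y) / Supp(X)."""
    return support(set(x) | set(y), baskets) / support(x, baskets)


def lift(x, y, baskets):
    """Lift(X => Y) = Supp(X u Y) / (Supp(X) * Supp(Y))."""
    joint = support(set(x) | set(y), baskets)
    return joint / (support(x, baskets) * support(y, baskets))


for x, y in [({"Soda"}, {"OJ"}), ({"OJ"}, {"Soda"})]:
    print(
        f"{sorted(x)} => {sorted(y)}: "
        f"support={support(x | y, baskets):.4f} "
        f"confidence={confidence(x, y, baskets):.4f} "
        f"lift={lift(x, y, baskets):.4f}"
    )
```

Support and lift are symmetric in X and Y, while confidence depends on the direction of the rule. A lift below 1, as for soda and OJ here, means the two items appear together less often than would be expected if they were purchased independently.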
Slide 14
Association Rule Mining
The Groceries data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories.
‘arules’ provides the infrastructure for representing, manipulating and analyzing transaction data and patterns.
‘arulesViz’ provides various visualization techniques for association rules and itemsets. This package extends the ‘arules’ package.
Slide 15
Association Rule Mining
Syntax - apriori(data, parameter = NULL, appearance = NULL, control = NULL)
apriori() - The apriori function is provided by the ‘arules’ package. It employs a level-wise search for frequent itemsets.
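To make "level-wise search" concrete, here is a simplified Python sketch of the idea: find the frequent 1-itemsets, join them into 2-item candidates, keep the frequent ones, and so on. It is only an illustration on the five-customer example and is not the implementation behind apriori() in ‘arules’; the function name `frequent_itemsets` and the `min_support` threshold of 0.4 are assumptions made for this example.

```python
from itertools import combinations

# The five baskets from the earlier example.
baskets = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window cleaner", "Soda"},
]


def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(baskets)
    # Level 1: every single item is a candidate.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates at this level that are frequent enough.
        level = {}
        for candidate in candidates:
            supp = sum(candidate <= basket for basket in baskets) / n
            if supp >= min_support:
                level[candidate] = supp
        frequent.update(level)
        # Build the next level by joining frequent k-itemsets. A candidate
        # survives only if all of its k-item subsets are frequent, because
        # a superset can never be more frequent than any of its subsets.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


result = frequent_itemsets(baskets, min_support=0.4)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

The pruning step is what makes the level-wise approach practical: {OJ, Soda, Detergent} is never counted against the data, because its subset {Soda, Detergent} is already known to be infrequent.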
Slide 16
Association Rule Mining
Going through 1098 rules manually is not an efficient option.
Let us make use of the ‘Viz’ in arulesViz and visualize the rules instead.
Slide 17
Association Rule Mining
Now let's plot the rules using a scatter plot.
A scatter plot is a mathematical diagram that displays the values of two variables for a set of data.
The data is displayed as a collection of points.
A scatter plot is often used when one of the variables is under the control of the experimenter.
Conclusion:
It can be seen that rules with high lift have relatively low support.
The most interesting rules reside on the support/confidence border.
Slide 18
Association Rule Mining
Now, after applying the Association Rules, the Support, Confidence and Lift values for the Groceries data are as shown below:
Slide 19
Association Rule Mining
Slide 20
Association Rule Mining
Let us zoom into the plot to observe the significant inferences:
Conclusion:
The most interesting rules according to ‘lift’ can be seen at the top-center.
There are 3 rules with “butter” and 1 other item in the antecedent and “whipped/sour cream” as the consequent.