ITM Slides
Uploaded by: aashima-grover
Market Basket Analysis
What is Analytics/Analytical CRM?
Automated extraction of interesting patterns from large databases
e.g. extracts in-depth customer history, preferences and profitability information
Allows you to analyze, predict and derive customer behavior and forecast demand
Lets you approach customers with relevant information and offers that are tailored to their needs
Types of Patterns
• Associations – Coffee buyers usually also purchase sugar
• Sequence Patterns – After seeing Superman, people usually see Star Wars
• Clustering – Segments of customers requiring different promotion strategies
• Classification – Customers expected to be loyal
Association Rules
D (transaction database):
Transaction ID    Items
1                 Tomato, Potato, Onions
2                 Tomato, Potato, Brinjal, Pumpkin
3                 Tomato, Potato, Onions, Chilly
4                 Lemon, Tamarind
Support(X) = |transactions containing X| / |D|
Rule: Tomato, Potato => Onions (confidence: 67%, support: 50%)
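The support and confidence quoted for the Tomato, Potato => Onions rule can be checked directly against the four transactions in D. The following is a minimal sketch, not part of the original slides; the helper names `support` and `confidence` are illustrative assumptions.

```python
# Transactions copied from database D in the table above.
D = [
    {"Tomato", "Potato", "Onions"},
    {"Tomato", "Potato", "Brinjal", "Pumpkin"},
    {"Tomato", "Potato", "Onions", "Chilly"},
    {"Lemon", "Tamarind"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"Tomato", "Potato", "Onions"}, D)
rule_confidence = confidence({"Tomato", "Potato"}, {"Onions"}, D)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Two of the four transactions contain all three items (support 2/4), and two of the three transactions containing Tomato and Potato also contain Onions (confidence 2/3).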
Support: A measure of how frequently the rule occurs in the database. Support (%) for A=>B is the percentage of all customers who purchased both A and B.
Confidence: Confidence (%) for A=>B is the number of customers who purchased both A and B, divided by the number of customers who purchased A, expressed as a percentage.
Ex: A supermarket database has 100,000 point-of-sale transactions. Of these transactions, 2,000 include both orange juice and flu medication, and 800 of these also include soup. What are the support and confidence for the following rule?
Orange juice, Flu medication => Soup
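One way to work the supermarket exercise, assuming "800 of these" means 800 of the 2,000 transactions that contain both orange juice and flu medication. The variable names are illustrative; only the three counts come from the exercise.

```python
total_transactions = 100_000  # all point-of-sale transactions in the database
oj_and_flu = 2_000            # transactions with orange juice and flu medication
oj_flu_and_soup = 800         # of those, transactions that also include soup

# Support: share of all transactions containing antecedent and consequent together.
support = oj_flu_and_soup / total_transactions

# Confidence: share of antecedent transactions that also contain the consequent.
confidence = oj_flu_and_soup / oj_and_flu

print(f"support = {support:.1%}, confidence = {confidence:.0%}")
```

Under that reading, support is 800/100,000 and confidence is 800/2,000.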
Lift: Lift for A=>B is a measure of the strength of the association. It is a ratio, not a percentage. If Lift = 2 for the rule A=>B, then a customer having A is twice as likely to have B as a customer chosen at random.
Benchmark Confidence = no. of transactions containing the consequent item set / no. of transactions in the database
Lift Ratio = confidence / benchmark confidence
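As an illustration of the lift ratio, the sketch below applies the two formulas to the rule Tomato, Potato => Onions on the four-transaction vegetable database D from the Association Rules slide. The slides do not compute this value themselves; the helper `count` is an illustrative assumption.

```python
# Transactions from the vegetable database D on the Association Rules slide.
D = [
    {"Tomato", "Potato", "Onions"},
    {"Tomato", "Potato", "Brinjal", "Pumpkin"},
    {"Tomato", "Potato", "Onions", "Chilly"},
    {"Lemon", "Tamarind"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


antecedent = {"Tomato", "Potato"}
consequent = {"Onions"}

# Confidence of the rule: transactions with both sides / transactions with the antecedent.
confidence = count(antecedent | consequent, D) / count(antecedent, D)

# Benchmark confidence: how often the consequent appears in the whole database.
benchmark_confidence = count(consequent, D) / len(D)

lift_ratio = confidence / benchmark_confidence
print(f"lift ratio = {lift_ratio:.2f}")
```

A lift ratio above 1 suggests that buyers of Tomato and Potato are more likely to buy Onions than a customer chosen at random; with only four transactions this is purely illustrative.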
Class Assignment:
Transaction    Faceplate Colors Purchased
1              red white green
2              white orange
3              white blue
4              red white orange
5              red blue
6              white blue
7              red blue
8              red white blue green
9              red white blue
10             yellow
List the item sets with support of at least 20% (a support count of at least 2 out of the 10 transactions)
Item Set               Support (Count)
{red}                  6
{white}                7
{blue}                 6
{orange}               2
{green}                2
{red, white}           4
{red, blue}            4
{red, green}           2
{white, blue}          4
{white, orange}        2
{white, green}         2
{red, white, blue}     2
{red, white, green}    2
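The item-set counts for the faceplate assignment can be reproduced by brute force: enumerate every combination of colors and keep those appearing in at least 2 of the 10 transactions. This is a minimal sketch under that assumption, not the Apriori algorithm, and it is only practical here because there are just six colors.

```python
from itertools import combinations

# The ten faceplate transactions from the class assignment.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]
min_count = 2  # 20% of 10 transactions

items = sorted(set().union(*transactions))
frequent = {}  # maps an item set (as a sorted tuple) to its support count
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        n = sum(set(candidate) <= t for t in transactions)
        if n >= min_count:
            frequent[candidate] = n

for itemset, n in frequent.items():
    print("{" + ", ".join(itemset) + "}", n)
```

Item sets are stored as alphabetically sorted tuples, so {red, white, green} appears as `("green", "red", "white")`.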
The Process of Rule Selection
Rule 1: {red, white} => {green} with confidence = support of {red, white, green} / support of {red, white} = 2/4 = 50%
Rule 2: {red, green} => {white} with confidence = support of {red, white, green} / support of {red, green} = 2/2 = 100%
Rule 3: {white, green} => {red} with confidence = support of {red, white, green} / support of {white, green} = 2/2 = 100%
Rule 4: {red} => {white, green} with confidence = support of {red, white, green} / support of {red} = 2/6 = 33%
Rule 5: {white} => {red, green} with confidence = support of {red, white, green} / support of {white} = 2/7 = 29%
Rule 6: {green} => {red, white} with confidence = support of {red, white, green} / support of {green} = 2/2 = 100%
If the desired confidence is 70%, only the 2nd, 3rd and 6th rules are recommended.
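The rule-selection step can be sketched in code: split the item set {red, white, green} into every antecedent/consequent pair, compute each confidence from the support counts, and keep the rules that meet the 70% threshold. The names `count`, `rules` and `recommended` are illustrative assumptions, not from the slides.

```python
from itertools import combinations

# The ten faceplate transactions from the class assignment.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


itemset = ("red", "white", "green")
min_confidence = 0.70

rules = []
for size in range(1, len(itemset)):
    for antecedent in combinations(itemset, size):
        consequent = tuple(i for i in itemset if i not in antecedent)
        # Confidence = support count of the full item set / support count of the antecedent.
        conf = count(itemset) / count(antecedent)
        rules.append((antecedent, consequent, conf))

recommended = [r for r in rules if r[2] >= min_confidence]
for antecedent, consequent, conf in recommended:
    print(f"{set(antecedent)} => {set(consequent)}: {conf:.0%}")
```

The loop produces the same six candidate rules as the slide, though not in the same order, and the filter keeps the three with 100% confidence.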
Traditional Measures
• Confidence: Likelihood of a rule being true
• Support:
– Statistical significance: Data supports rule
– Applicability: Rule with high support is applicable in large number of transactions
Applications
• E-commerce – People who have bought Sundara Kandam have also bought Srimad Bhagavatham
• Census analysis – Immigrants are usually male
• Sports – A chess end-game configuration with “white pawn on A7” and “white knight dominating black rook” typically results in a “win for white”.
• Medical diagnosis – Allergy to latex rubber usually co-occurs with allergies to banana and tomato
Recommendation Systems
• People who listen to the songs you listen to have also listened to these other songs…
• People who have bought these books have also bought these other books…