Data Mining Association Analysis Stu (1)
-
8/10/2019 Data Mining Association Analysis Stu (1)
1/17
ERP 345 5410
Business Intelligence: Association Analysis
Source: Business Intelligence, 3rd ed., Sharda et al., 2014, Prentice Hall; SAP University Alliance, BI workshop
-
2/17
Association Rule Mining
Finds interesting relationships (affinities) between variables (items or events)
Part of the machine learning family
Employs unsupervised learning
  There is no output variable
Also known as market basket analysis
Often used as an example to describe data mining (DM) to ordinary people, such as the famous relationship between diapers and beer!
-
3/17
Association Analysis - Example: Diapers implies Beers?
[Figure: Data Mining / Association Analysis derives cross-selling rules from customers' purchases of products A-E. Question posed: What products / services are typically bought together? Uses shown: export rules to Web Shop; use in merchandising.]
-
4/17
An urban legend: Beer Implies Diapers??
Pattern: An analysis of the behavior of supermarket shoppers discovered that customers who buy beer also tend to buy diapers??
Rationale: Men in their 20s who purchase beer on Fridays after receiving their paycheck are also likely to buy a pack of diapers for the young kids in the family.
-
5/17
Association Rule Mining
Input: simple point-of-sale transaction data
Output: the most frequent affinities among items
Example: according to the transaction data, "Customers who bought a laptop computer and virus protection software also bought an extended service plan 70 percent of the time"
-
6/17
How to Apply Market Basket Analysis Results?
Put the items next to each other for ease of finding
Promote the items as a package
Place items far apart from each other so that the customer has to walk the aisles to search for them, and by doing so potentially see and buy other items
Direct marketers can use this information to determine which new products to offer to their current customers.
Inventory policies can be improved if reorder points reflect the demand for the complementary products.
-
7/17
Association Rules for Market Basket Analysis
Rules are written in the form "left-hand side implies right-hand side" or A => B
  Green Peppers IMPLIES Bananas
  Oranges IMPLIES Apples
  Milk IMPLIES Bread
To make effective use of a rule, three numeric measures about that rule must be considered: (1) support, (2) confidence and (3) lift
-
8/17
Measures of Predictive Ability -- Confidence
Confidence measures what percentage of baskets that contained item A also contained item B.
Confidence of the rule A => B = (# of transactions containing A & B) / (# of transactions containing A)
Discussion:
What does Confidence measure?
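The confidence formula above can be sketched in Python, assuming each transaction is stored as a set of item names. The function name `confidence` and the sample baskets are hypothetical, chosen only for illustration.

```python
def confidence(transactions, a, b):
    """Share of the transactions containing `a` that also contain `b`."""
    with_a = [t for t in transactions if a in t]
    if not with_a:
        raise ValueError(f"no transaction contains {a!r}")
    with_a_and_b = [t for t in with_a if b in t]
    return len(with_a_and_b) / len(with_a)


# Hypothetical baskets (assumption), not taken from the slides.
baskets = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "Eggs"},
    {"Milk"},
    {"Bread"},
]

# 2 of the 3 baskets that contain Milk also contain Bread.
milk_to_bread = confidence(baskets, "Milk", "Bread")
print(f"Confidence of Milk => Bread: {milk_to_bread:.2f}")
```

Note that the denominator counts only baskets containing A, so confidence says nothing about how common the rule is across all baskets.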
-
9/17
Measures of Predictive Ability -- Support
Support refers to the percentage of baskets where the rule was true (both items A and B were present).
Support of the rule A => B = (# of transactions containing A & B) / (# of all transactions)
Note: We could also define the Support of any item the same way, i.e. Support of item A = (# of transactions containing A) / (# of all transactions)
Questions: What is Support? Why Support?
So we would only like to retain rules with large support.
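A minimal Python sketch of both support definitions above (for a rule and for a single item), assuming transactions are sets of item names. The function name `support` and the baskets are hypothetical.

```python
def support(transactions, *items):
    """Share of all transactions that contain every item in `items`."""
    if not transactions:
        raise ValueError("no transactions given")
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


# Hypothetical baskets (assumption), not taken from the slides.
baskets = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "Eggs"},
    {"Milk"},
    {"Bread"},
]

rule_support = support(baskets, "Milk", "Bread")  # 2 of 4 baskets hold both items
item_support = support(baskets, "Milk")           # 3 of 4 baskets hold Milk
print(rule_support, item_support)
```

Because the rule's support counts baskets containing both items, it is the same for Milk => Bread and Bread => Milk.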
-
10/17
Measures of Predictive Ability -- Lift
Lift measures how much more frequently item B is found in baskets that contain item A than in baskets overall.
Lift of the rule A => B = (Confidence of the rule A => B) / (Support of B)
= [(# of transactions containing A & B) x (# of all transactions)] / [(# of transactions containing A) x (# of transactions containing B)]
Questions: What does lift measure? Why Lift?
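The two forms of lift above are the same quantity. Writing N for the number of all transactions and n(X) for the number of transactions containing X (notation introduced here for brevity, not taken from the slides), the substitution is:

```latex
\[
\mathrm{Lift}(A \Rightarrow B)
  = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)}
  = \frac{n(A \,\&\, B) / n(A)}{n(B) / N}
  = \frac{n(A \,\&\, B) \times N}{n(A) \times n(B)}
\]
```

The last form is symmetric in A and B, so A => B and B => A have the same lift. A lift of 1 is what independence between A and B would give; values above 1 mean the items appear together more often than that.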
-
11/17
Small Example
Rule: Diapers -> Beer
Support: 60% (3/5)
  60% of all purchases have diapers and beer
Confidence: 75% (3/4)
  If diapers are purchased, 75% chance of buying beer
Lift: 1.25 (75%/60%) (Note: this 60% is not the 60% Support of the rule, but the Support of Beer)
  If diapers are purchased, a person is 1.25 times more likely to purchase beer
url: http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
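The arithmetic above can be reproduced with a short Python sketch. The five baskets are an assumption: the slide's transaction table did not survive in this transcript, so the data below is hypothetical and was chosen only so that the counts match the quoted figures (4 baskets with diapers, 3 with beer, 3 with both, 5 in total).

```python
from fractions import Fraction

# Hypothetical baskets (assumption): they match the counts quoted above
# but are not the slide's actual transaction table.
baskets = [
    {"Diapers", "Beer"},
    {"Diapers", "Beer", "Milk"},
    {"Diapers", "Beer", "Bread"},
    {"Diapers", "Milk"},
    {"Bread", "Milk"},
]


def count(*items):
    """Number of baskets that contain every listed item."""
    return sum(1 for b in baskets if set(items) <= b)


n = len(baskets)
support_rule = Fraction(count("Diapers", "Beer"), n)                     # 3/5
confidence_rule = Fraction(count("Diapers", "Beer"), count("Diapers"))   # 3/4
support_beer = Fraction(count("Beer"), n)                                # 3/5
lift_rule = confidence_rule / support_beer                               # 5/4

print(f"Support:    {float(support_rule):.0%}")
print(f"Confidence: {float(confidence_rule):.0%}")
print(f"Lift:       {float(lift_rule):.2f}")
```

Using `Fraction` keeps the intermediate values exact, which makes it easy to see that the 60% in the lift denominator comes from the 3 beer baskets out of 5, not from the rule's own support.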
-
12/17
Using the Results
The tabulations can immediately be translated into association rules and the numerical measures computed.
Comparing this week's table to last week's table can immediately show the effect of this week's promotional activities.
Some rules are going to be trivial. But you may discover some interesting facts/patterns from the data
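One way to go from tabulations to rules, sketched in Python under assumptions: the tabulation is taken to be a table of pair co-occurrence counts, and the helper name `pair_rules` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def pair_rules(transactions):
    """Return (A, B, support, confidence, lift) for every rule A => B seen in the data."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()  # the co-occurrence tabulation
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (x, y), both in pair_counts.items():
        # Each co-occurring pair yields two rules: x => y and y => x.
        for a, b in permutations((x, y)):
            support = both / n
            confidence = both / item_counts[a]
            lift = confidence / (item_counts[b] / n)
            rules.append((a, b, support, confidence, lift))
    return sorted(rules)  # ordered by A, then by B


# Hypothetical baskets (assumption), not taken from the slides.
baskets = [
    {"Plant", "Soil"},
    {"Plant", "Soil", "Clay Pot"},
    {"Soil"},
    {"Plant"},
]
rules = pair_rules(baskets)
for a, b, support, confidence, lift in rules:
    print(f"{a} => {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Running the same function on two weeks of transactions and comparing the resulting tables is one way to carry out the week-over-week comparison mentioned above.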
-
13/17
Limitations to Market Basket Analysis
A large number of real transactions are needed to do an effective basket analysis, but the data's accuracy is compromised if all the products do not occur with similar frequency.
The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).
-
14/17
Performing Analysis with Virtual Items
The sales data can be augmented with the addition of virtual items. For example, we could record that the customer was new to us, or had children.
The transaction record might look like:
  Item 1: Sweater    Item 2: Jacket    Item 3: New customer
This might allow us to see what patterns new customers have versus old customers.
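A minimal Python sketch of adding virtual items to a transaction, assuming the customer attributes are available as simple flags. The function name, the flag names and the label "Has children" are hypothetical; only "New customer" appears in the example above.

```python
def add_virtual_items(items, is_new_customer=False, has_children=False):
    """Return the basket as a set, with virtual items added for customer attributes."""
    basket = set(items)
    if is_new_customer:
        basket.add("New customer")
    if has_children:
        basket.add("Has children")  # hypothetical label for the second attribute
    return basket


record = add_virtual_items(["Sweater", "Jacket"], is_new_customer=True)
print(sorted(record))
```

Because a virtual item is just another member of the basket, the usual support, confidence and lift calculations apply unchanged to rules such as New customer => Jacket.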
-
15/17
Multidimensional Market Basket Analysis
Rules can involve more than two items, for example Plant and Clay Pot IMPLIES Soil.
These rules are built iteratively. First, pairs are found, then relevant sets of three or four.
These are then pruned by removing those that occur infrequently.
In an environment like a grocery store, where customers commonly buy over 100 items, rules could involve as many as 10 items.
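The iterative build-up described above can be sketched as a level-wise search in Python. This is a simplified illustration in the spirit of the Apriori algorithm, which the slides do not name; the `min_support` threshold, the `max_size` cap, the function name and the baskets are all assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=4):
    """Grow itemsets one item at a time, keeping only those that occur often enough."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def is_frequent(itemset):
        return sum(1 for b in baskets if itemset <= b) / n >= min_support

    # Level 1: single items that meet the threshold.
    items = {item for b in baskets for item in b}
    level = {frozenset([i]) for i in items if is_frequent(frozenset([i]))}
    result = {1: level}

    size = 2
    while level and size <= max_size:
        # Candidates: unions of two frequent sets from the previous level.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune the candidates that occur too infrequently.
        level = {c for c in candidates if is_frequent(c)}
        if level:
            result[size] = level
        size += 1
    return result


# Hypothetical baskets (assumption), not taken from the slides.
baskets = [
    {"Plant", "Clay Pot", "Soil"},
    {"Plant", "Clay Pot", "Soil"},
    {"Plant", "Soil"},
    {"Clay Pot"},
]
found = frequent_itemsets(baskets, min_support=0.5)
for size, itemsets in found.items():
    print(size, sorted(sorted(s) for s in itemsets))
```

Building each level only from the frequent sets of the previous level keeps the number of candidates manageable, since a set cannot be frequent if one of its subsets is infrequent.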
-
16/17
Lab 5 Association Analysis with Excel
Use the transactions from slide # 11 as the input.
Develop your own Excel spreadsheet (download the sample Excel file from Blackboard) to conduct association analysis (with the formulas from slides 8, 9 & 10).
Report Confidence, Support and Lift for all possible first-level (A implies B) rules.
Lab Questions
(1) What conclusions can you make from the association analysis of this lab? Explain.
(2) What suggestions can you provide to the store manager? Explain.
-
17/17
Lab 5 report
Turn in a lab report with the following:
A cover page
Summary of the lab
A screenshot of your Excel table including all the association measures of your rules. Sort your rules first by Product A, then by Product B (ascending order)
Two lab questions with your answers