Data Mining & Knowledge Discovery
Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing
Bhagi Narahari
Outline of Lecture
What and Why of Data Mining and KDD?
Importance and Applications to E-commerce
How? Personalization: personalized one-to-one business on the internet
Part 1: Overview of Personalization
Part 2: The Data Mining Process
Predictive Modelling
A “black box” that makes predictions about the future based on information from the past and present
Example: inputs such as age, balance, and income feed a model (a crystal ball?) that answers “How much will the customer spend on the next catalog order?”
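To make the “black box” concrete, here is a minimal sketch that uses a single input (income) and an ordinary least-squares line in place of the slide's three-input model. The function names `fit_line` and `predict_spend` and all of the data are hypothetical, chosen only for illustration.

```python
# A one-input stand-in for the slide's predictive model: fit
# spend = slope * income + intercept by ordinary least squares.
# All data here is invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical past customers: income in $1000s, spend on last catalog order in $.
income = [30, 45, 60, 75, 90]
spend = [40, 55, 70, 85, 100]

slope, intercept = fit_line(income, spend)

def predict_spend(customer_income):
    """The "black box": predicted spend on the next catalog order."""
    return slope * customer_income + intercept

print(predict_spend(50))  # → 60.0
```

A real model would take age and balance as well and would be fitted on far more history. The point is only that the “crystal ball” is a function learned from past data.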
What is Data Mining?
It is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.
Why now? (A historical perspective)
Because data is now available (it wasn’t always)
Distributed sources
Technology evolution
Competition (do what you can to outdo competitors)
Why DM?
CRM (Customer Relationship Management) is an important success factor in e-commerce: price differentiation is no longer enough, and customer service is more important
Links with suppliers already exist (B2B) - JIT, joint forecasting, planning, procurement
Current emphasis on links with customers - feedback, input in design, etc.
CRM
Identifying profitable customers
Better service for more valued customers
Retaining profitable customers
Getting a new customer costs a lot more than retaining an existing one
It takes 5X as much to acquire a new customer (Peppers & Rogers)
An increase from 75% to 80% in retention reduces costs by about 10%
Larger share of the customer pool
CRM
Product differentiation based on “price” and “quality” is increasingly difficult; businesses need to differentiate based on relationships
Increasingly sophisticated mass marketing increases the probability of success
The cost of mass marketing is driven down by the reach of the internet
CRM
Goal: positively interact with your customers and prospects
Define customer segments
Lights-out (fully automated) execution of campaigns against segments
Attribution and evaluation of responses
Personalization in Ecommerce
Positive: much better chance of personalization
Customer identification
Tracking across visits and within a visit
Ability to do ‘what if’ experiments
Negative:
The cost of switching is much lower
Is web-based shopping good for ‘touchy-feely’ things?
Price differentiation across geographies is not easy
Personalization
[Figure: personalization opportunities along the customer chain (Product Discovery, Product Evaluation, Terms Negotiation, Order Placement, Order Payment, Customer Service & Support) and the producer chain (Market Research, Market Stimulation/Education, Terms Negotiations, Order Receipt, Order Billing and Payment Management, Customer Service & Support)]
B2C Personalization Objectives
Know the customer profile - registration, cookies
Determine what the customer wants
Ask: questionnaires (what is the incentive for truthfulness?)
Deduce: click streams, history, collaborative filtering (Amazon!!)
Deliver: customize the look and feel, offer special promotions, offer customized products (the Holy Grail)
Use of Personalization
In addition to storing and retrieving information on the individual’s profile “on the fly”, a site can also use mining software to analyze the information in the database and make recommendations or comments specific to the individual
Impact of Personalization
Customer relationship: learn more about customers
Learn and understand why and how they prefer to do business with your organization
In tandem with tracking, personalization gives you a tool to monitor your website: what works, what doesn’t, what makes your audience “click”
Security and Privacy as Barrier to Personalization
A large number of customers are concerned about personalization (DoubleClick!)
Will they pay more to preserve privacy?
Some falsify information to preserve privacy
Customers give more information to a trusted site
Sites need to be secure, with clear privacy policies stated on the site
Personalization
Know the customer: build a profile from questionnaires, past history, click streams
Identify the customer: login, credit card #
Predict the wants: mapping to “peers”; extrapolation from the past; extrapolation from peers (firefly.com)
Give the customer his/her wants: look & feel; product selection & promotions; new products
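“Extrapolation from peers” is what collaborative filtering does. Below is a minimal user-based sketch, assuming hypothetical 1-5 ratings and cosine similarity as the peer measure. Neither is confirmed as the method used by firefly.com or Amazon.

```python
import math

# Hypothetical purchase ratings (1-5) per customer; absent products count as 0.
ratings = {
    "howard": {"ties": 5, "shirts": 4, "cufflinks": 5},
    "ann": {"ties": 4, "shirts": 5, "belts": 4},
    "amy": {"shirts": 2, "dresses": 5, "belts": 3},
    "ray": {"toys": 5, "gadgets": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[p] * b[p] for p in set(a) & set(b))
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, ratings, top_n=3):
    """Rank products the target has not rated by similarity-weighted peer ratings."""
    mine = ratings[target]
    scores = {}
    for peer, theirs in ratings.items():
        if peer == target:
            continue
        sim = cosine(mine, theirs)
        if sim == 0:
            continue  # no overlap with this peer, so it carries no evidence
        for product, rating in theirs.items():
            if product not in mine:
                scores[product] = scores.get(product, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("howard", ratings))  # → ['belts', 'dresses']
print(recommend("ray", ratings))     # → []
```

Howard's closest peer is Ann, so her belts outrank Amy's dresses. Ray shares nothing with anyone, which illustrates why peer-based methods struggle with new or unusual customers.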
Know the customer
Cookies: backlash (users do not trust them)
OPS: Open Profiling Standard combined with eTrust certification
Registration
User certificates: logons
Key question: how do you know that this customer is the same one who visits your storefront? You need standard warehouse techniques such as address resolution, credit card resolution, etc.
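A minimal sketch of address and credit-card resolution follows. The matching rule and the abbreviation table are hypothetical simplifications: the rule treats two records as the same customer when their normalized addresses match or their card digits match. Production systems use far richer normalization and scoring.

```python
import re

# Illustrative abbreviation table; a real system would use a much fuller one.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "apt": "apartment"}

def normalize_address(address):
    """Lowercase, drop punctuation, and expand common street abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def card_key(card_number):
    """Compare card numbers on their digits only (ignoring spaces and dashes)."""
    return re.sub(r"\D", "", card_number)

def same_customer(a, b):
    """Hypothetical rule: same normalized address OR same card number."""
    return (normalize_address(a["address"]) == normalize_address(b["address"])
            or card_key(a["card"]) == card_key(b["card"]))

web_visitor = {"address": "12 Main St.", "card": "4111-1111-1111-1111"}
store_buyer = {"address": "12 main street", "card": "5500 0000 0000 0004"}
print(same_customer(web_visitor, store_buyer))  # → True (addresses match)
```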
Know the Customer: OPS
Two drivers:
Users should not have to retype basic information again and again
Data is used in a fashion users trust (not leaked, not seen by other parties, etc.)
Two parts:
Common data: demographics (country, zip, age, gender); contact (name, address, credit card…); user agent preferences
Per-site sections (can be shared across sites, if the user allows)
What if there is no profile?
Deduce: collect information such as purchase history and time spent on pages; ask questions (offer rewards); combine with database marketing data
Predict behaviour: buying probabilities
Build the customer relationship
Mining is key!
Personalization: Actions to take- Look and feel
Personalized pages: specific data, specific presentation and design, sent through various media
Manage customers, not products: 1-to-1 marketing
Strategy.com: delivers personalized pages, e.g., stock portfolio, personal info including alarms, travel reservations
Uses different media, e.g., WAP-enabled phones (Sprint PCS Web)
Storefront Personalization
Customers visit the store website: Howard buys ties, Rob buys baby products, Ray buys toys, Amy buys clothes
Provide each customer with a view of the store that presents what they are likely to buy:
Howard: ties and men’s formal wear
Ray: toys and gadgets
Rob: infant and toddler section
Amy: women’s clothes section
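One simple way to pick each customer's store view is sketched below. It assumes the view is chosen from the most frequently purchased category, using a hypothetical `SECTION_FOR` mapping. Real storefront personalization would combine many more signals and business rules.

```python
from collections import Counter

# Hypothetical mapping from purchased category to the store section to feature.
SECTION_FOR = {
    "ties": "Men's formal wear",
    "toys": "Toys and gadgets",
    "baby products": "Infant and toddler",
    "clothes": "Women's clothes",
}

def store_view(purchases, default="Home page"):
    """Pick the section matching the customer's most frequently bought category."""
    if not purchases:
        return default
    category, _ = Counter(purchases).most_common(1)[0]
    return SECTION_FOR.get(category, default)

# Hypothetical purchase histories.
history = {
    "Howard": ["ties", "ties", "toys"],
    "Rob": ["baby products"],
    "Ray": ["toys"],
    "Amy": ["clothes", "clothes"],
}
for customer, items in history.items():
    print(customer, "->", store_view(items))
```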
More Actions: Product Presentations & Promotions
Basic Storefront Product Hierarchy
Clothes
  Men’s: Shirts, Pants
  Women’s: Casuals, Evening
  Children’s: Infants, Kids
[Figure: John’s view and Mary’s view of the same hierarchy]
BroadVision.com
The BroadVision One-to-One application allows businesses to develop and manage personalized web sites that interactively profile each visitor and dynamically match information to the visitor based on their profile and on business rules specified by the providers of the site and services
Users do not have to jump through hoops to find relevant data
DM Terminology
OLAP
ROLAP
Data Warehouse
Data Marts
Data Stores
Neural Networks
Genetic Algorithms
Data Mining
Rule Based Systems
SQL
How?
Determine the probability of buying as a function of customer attributes such as age, income, past buying patterns, …
Target customers by ranking them from highest to lowest probability
Other techniques: decision trees, neural networks, …
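The ranking idea can be sketched with a logistic function, which turns a weighted sum of attributes into a probability between 0 and 1. The coefficients below are hypothetical placeholders. In practice they would be fitted, for example by logistic regression, on past responses.

```python
import math

# Hypothetical coefficients; in practice fitted on past campaign responses.
COEF = {"intercept": -4.0, "age": 0.02, "income": 0.03, "past_orders": 0.5}

def buy_probability(c):
    """Logistic model: P(buy) = 1 / (1 + exp(-(b0 + b1*age + b2*income + b3*orders)))."""
    z = (COEF["intercept"] + COEF["age"] * c["age"]
         + COEF["income"] * c["income"] + COEF["past_orders"] * c["past_orders"])
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical customers; income in $1000s.
customers = [
    {"id": "A", "age": 25, "income": 40, "past_orders": 0},
    {"id": "B", "age": 45, "income": 90, "past_orders": 3},
    {"id": "C", "age": 35, "income": 60, "past_orders": 1},
]

# Target customers from highest to lowest probability of buying.
ranked = sorted(customers, key=buy_probability, reverse=True)
print([c["id"] for c in ranked])  # → ['B', 'C', 'A']
```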
KDD
Knowledge Discovery in Databases: the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
It involves data preparation, pattern extraction, knowledge evaluation, and refinement, repeated iteratively
KDD
Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns
Steps in the KDD process: select data; data cleansing and pre-processing; data mining; results interpretation; implementation
Pre-processing in KDD
80-90% of the KDD process is spent here. Why?
Operational data is incomplete, inconsistent, in different formats across systems
DM techniques might require data in a specific format
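A small illustration of why pre-processing dominates follows. It is a minimal sketch with hypothetical records and field names, and it normalizes inconsistent formats across two systems and fills missing values with the mean. Mean imputation is one simple choice among many.

```python
# Hypothetical raw extracts from two operational systems with different conventions.
raw = [
    {"cust_id": "001", "gender": "M", "income": "45,000"},
    {"cust_id": "2", "gender": "female", "income": ""},
    {"cust_id": "3", "gender": "F", "income": "52000"},
]

GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def clean(record):
    """Convert one raw record to a common format; missing income becomes None."""
    income = record.get("income", "").replace(",", "").strip()
    return {
        "cust_id": int(record["cust_id"]),
        "gender": GENDER_CODES.get(record.get("gender", "").strip().lower()),
        "income": float(income) if income else None,
    }

def impute_mean(rows, field):
    """Fill missing values of a numeric field with the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    mean = sum(known) / len(known) if known else None
    for r in rows:
        if r[field] is None:
            r[field] = mean
    return rows

rows = impute_mean([clean(r) for r in raw], "income")
print(rows[1])  # → {'cust_id': 2, 'gender': 'F', 'income': 48500.0}
```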
Data Mining Problems
Classification/segmentation: binary (yes/no) or multiple category (large/medium/small)
Forecasting (how much?)
Association rule extraction (market basket analysis)
Sequence detection, e.g., balance increase -> missed payment -> default
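Sequence detection asks whether events occur in a given order, possibly with other events in between. Below is a minimal sketch that checks the slide's balance-increase pattern against hypothetical account histories. Real sequence-mining algorithms also discover which patterns are frequent, rather than only testing a known one.

```python
def contains_sequence(events, pattern):
    """True if the pattern's events occur in this order in the history (gaps allowed)."""
    remaining = iter(events)
    # any() consumes the iterator up to and including the first match,
    # so each pattern step is searched for only after the previous one.
    return all(any(e == step for e in remaining) for step in pattern)

PATTERN = ["balance increase", "missed payment", "default"]

# Hypothetical account histories, oldest event first.
histories = {
    "acct1": ["balance increase", "payment made", "missed payment", "default"],
    "acct2": ["missed payment", "balance increase", "default"],
    "acct3": ["balance increase", "payment made", "payment made"],
}
matches = [a for a, h in histories.items() if contains_sequence(h, PATTERN)]
print(matches)  # → ['acct1']
```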
Typical DM tasks
Prediction and classification (directed)
Techniques: decision trees, neural networks, memory-based reasoning, logistic regression
Examples:
How many units will be sold on a given day?
What will the stock price be on a given day?
Will a customer buy the product or not?
DM tasks
Affinity grouping (undirected)
Which products go together naturally? The beer-diaper syndrome?
Technique: market basket analysis
Example: which products peak in demand simultaneously?
DM tasks
Clustering (undirected)
Segmenting records into clusters of similar items; unlike classification, there are no predefined classes
Examples: customers with similar buying profiles; products with similar demand patterns
DM success factors
Integration with data warehouses and DSS
Users should develop a good understanding of the techniques
Recognize that these tools cannot automatically find patterns without being told what to do
Most methods now used are extensions of analytical methods that have been around for decades
Legal and Ethical Issues
Privacy concerns are becoming more important and will affect the way data can be used and analyzed
Ownership issues
European data laws have implications for the US
Data included in the data warehouse often cannot legally be used in the decision-making process (e.g., race, gender, age)
Data contamination will become critical
Making Decisions
[Figure: data from many sources feeds a data warehouse, which feeds models that support decisions]
Data Warehouse
Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”
It is managed data situated after and outside the operational systems
Data Warehousing
There is an increasing need to find, summarize, and interpret large amounts of data effectively, especially when the data is distributed across many different databases
Transaction processing (TP) systems are not easily accessible to other systems
In addition, TP systems have time constraints
Enter the Data Warehouse
Goal: deliver decision data to decision makers by integrating data from various TP systems into a single store, which can then feed a range of decision support applications through an OLAP interface!
Data Complications
Noise
Missing data
Transformation: numeric data, text
Need to differentiate between variables you can control and those you cannot:
Actionable: size of discount, number of offers, etc.
Non-actionable: age, income, …
Data Mining Techniques
Market Basket Analysis
Memory Based Reasoning
Cluster Detection
Link Analysis
Decision Trees and Rule Induction
Neural Networks
Genetic Algorithms
OLAP
OLAP: On Line Analytical Processing
While a data warehouse brings data together, OLAP lets you look at the data and manipulate it interactively
OLAP allows users to “slice and dice” data and to drill down into detailed data
Relational vs Multidimensional
Consolidations
Multidimensional Terminology
East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.
Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.
Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
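A minimal sketch of these terms in code follows. It stores a two-dimensional cube of Sales cells keyed by (Region, Product) input members, computes a consolidation such as Total Region, and takes a slice. The cell values are invented, and dedicated OLAP servers precompute and index such aggregates rather than looping as shown here.

```python
from collections import defaultdict

# Hypothetical Sales cells keyed by (region, product); input members only.
sales = {
    ("East", "Nuts"): 50, ("East", "Bolts"): 30,
    ("West", "Nuts"): 20, ("West", "Screws"): 40,
    ("Central", "Washers"): 10,
}

def consolidate(cells, dim):
    """Sum over one dimension (0 = Region, 1 = Product), keeping the other.
    Summing over Region yields the Total Region output member per product."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[key[1 - dim]] += value
    return dict(totals)

def slice_region(cells, region):
    """Slice: fix Region to one member and keep the Product dimension."""
    return {prod: v for (reg, prod), v in cells.items() if reg == region}

total_region = consolidate(sales, 0)   # Total Region, per product
total_product = consolidate(sales, 1)  # Total (all products), per region
print(total_region, total_product, slice_region(sales, "East"))
```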
Steps in DW and OLAP
[Figure: data from several sources passes through a data loader, data converter, data scrubber, and data transformer into the data warehouse, which serves an OLAP server and OLAP interface]
Cluster Detection
Undirected data mining: finds records that are similar to each other (clusters)
Clusters are found using geometric methods, statistical methods, and neural networks
A good way to start any analysis
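k-means is one common geometric method. The minimal sketch below assumes two clusters, hypothetical (annual spend, visits) points, and, for reproducibility, the first k points as starting centroids. Real implementations usually pick starting points randomly or with a smarter seeding scheme.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # deterministic start: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical customers: (annual spend in $100s, visits per year).
customers = [(1, 2), (30, 40), (2, 1), (31, 42), (1.5, 1.5), (29, 41)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)  # → [(1.5, 1.5), (30.0, 41.0)]
```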
Market Basket Analysis
Form of clustering used for finding items that occur together (in a transaction or market basket)
Expresses the likelihood of different products being purchased together as rules
Uses: planning store layouts, limiting specials to one of the products in a set, …
Transaction data
Customer  Products
1         Milk, Soda
2         Milk, Beer, Diapers
3         Milk, Cleaner
4         Beer, Diapers, Soda
5         Beer, Soda
Co-occurrence matrix
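         Beer  Cleaner  Milk  Soda  Diapers
Beer        3        0     1     2        2
Cleaner     0        1     1     0        0
Milk        1        1     3     1        1
Soda        2        0     1     3        1
Diapers     2        0     1     1        2
(Diagonal entries count the transactions containing each item; off-diagonal entries count the transactions containing both items.)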
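The matrix can be computed directly from the five sample transactions. The sketch below assumes only that each transaction is treated as a set of items.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]
items = ["Beer", "Cleaner", "Milk", "Soda", "Diapers"]

counts = Counter()
for t in transactions:
    for a in t:
        counts[(a, a)] += 1  # diagonal: transactions containing the item
    for a, b in combinations(sorted(t), 2):
        counts[(a, b)] += 1  # off-diagonal: transactions containing both items
        counts[(b, a)] += 1  # keep the matrix symmetric

matrix = [[counts[(r, c)] for c in items] for r in items]
for name, row in zip(items, matrix):
    print(f"{name:8}", row)
```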
Support and confidence
For a rule that says “if A then B”:
Support is the ratio of the number of transactions that include both A and B to the total number of transactions
Confidence is the ratio of the number of transactions that include both A and B to the number of transactions that include A
How do you specify ‘significant’ support and confidence?
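Both definitions translate directly into counts over the sample transactions. Below is a minimal sketch. The thresholds are the 25% support and 50% confidence values used in the worked example on the sample data.

```python
transactions = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b):
    """Support of 'if A then B': share of all transactions containing A and B."""
    return count(a | b) / len(transactions)

def confidence(a, b):
    """Confidence of 'if A then B': transactions with A and B over transactions with A."""
    return count(a | b) / count(a)

MIN_SUPPORT, MIN_CONFIDENCE = 0.25, 0.50
beer, diapers = {"Beer"}, {"Diapers"}
s, c = support(beer, diapers), confidence(beer, diapers)
print(s, c, s >= MIN_SUPPORT and c >= MIN_CONFIDENCE)
```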
Algorithm for Finding Association Rules
Input: Min-Support and Min-Confidence
Find all sets of items with at least Min-Support (frequent itemsets)
Frequent itemset property: every subset of a frequent itemset must also be a frequent itemset
Iterative algorithm: start with frequent itemsets containing one item, and construct larger itemsets using only smaller frequent itemsets
Finally, generate rules from the frequent itemsets and keep those with at least Min-Confidence
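This level-wise procedure is usually known as Apriori. The minimal sketch below follows the steps listed above and is run on the sample transactions. It rescans the data for every candidate, which is fine for illustration but not for large databases.

```python
from itertools import combinations

transactions = [
    {"Milk", "Soda"}, {"Milk", "Beer", "Diapers"}, {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"}, {"Beer", "Soda"},
]

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    n = len(transactions)
    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_support}
    frequent = {s: sup(s) for s in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if sup(c) >= min_support}
        frequent.update({c: sup(c) for c in current})
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Split each frequent itemset into LHS -> RHS rules meeting min_confidence."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # lhs is frequent by the subset property
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

frequent = apriori(transactions, min_support=0.25)
for lhs, rhs, conf in rules(frequent, min_confidence=0.5):
    print(lhs, "->", rhs, round(conf, 2))
```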
MBA example
Using the sample data, create a co-occurrence table
Let the minimum support = 25% and minimum confidence = 50%:
Beer and diapers appear together in 2 of 5 transactions: support = 2/5 = 40%
Beer appears in 3 transactions, so “if beer then diapers” has confidence 2/3 = 67%
Thus “if a customer buys beer, then the customer buys diapers” satisfies 25% support and 50% confidence
Conclusion drawn by the mining system: customers who buy beer also buy diapers
Applying MBA Results
Is the relationship useful? Beer and diapers may not be of use
Victoria’s Secret: transaction mining led to specific apparel being sent to specific stores (MicroStrategy software)
Who defines “usefulness”? Mining is only as good as the rules specified by humans (the marketing workforce)
NBA mining: the software’s designers did not include height mismatches at first; coaches made the correction
Data Mining Algorithms
Four algorithms are commonly cited:
Association rules (used in over 90% of cases!)
Nearest neighbor: quick and easy, but models get large
Decision trees
Neural networks: difficult to interpret, and they take a long time to train
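The phrase “models get large” makes sense once you see that a nearest-neighbor model is simply the stored training records. The minimal k-nearest-neighbor sketch below uses hypothetical (age, income) records and plain Euclidean distance.

```python
import math
from collections import Counter

# The "model" is the stored training data itself, so it grows with every record.
# Hypothetical records: (age, income in $1000s) -> responded to the offer?
training = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
    ((40, 75), "yes"),
]

def knn_predict(x, training, k=3):
    """Majority label among the k stored records closest to x."""
    neighbors = sorted(training, key=lambda rec: math.dist(x, rec[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((48, 85), training))  # → yes
print(knn_predict((27, 32), training))  # → no
```

The attributes here are left unscaled. In practice they are usually normalized first so that one attribute, such as income, does not dominate the distance.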
Decision Trees
A series of if/then rules: easy to understand, but complex to implement
[Figure: example decision tree with splits on Balance (< 10K / > 10K) and Age (> 48 / < 48), ending in yes/no leaves]
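A tree can be stored as nested tests and read back as a series of if/then rules. This minimal sketch uses the Balance 10K and Age 48 splits from the slide. The assignment of yes/no leaves, and the handling of values exactly at the thresholds, are assumptions, because the original figure's layout was lost.

```python
# Hypothetical tree: leaf labels and boundary handling are assumed, not from the slide.
TREE = {
    "test": ("balance", 10_000),  # go to "high" if balance > 10,000
    "low": {"test": ("age", 48), "low": "no", "high": "yes"},
    "high": "yes",
}

def classify(tree, record):
    """Walk from the root, following one branch per test, until reaching a leaf."""
    while isinstance(tree, dict):
        field, threshold = tree["test"]
        tree = tree["high"] if record[field] > threshold else tree["low"]
    return tree

def to_rules(tree, conditions=()):
    """Flatten the tree into one IF ... THEN ... rule per leaf."""
    if not isinstance(tree, dict):
        return [f"IF {' AND '.join(conditions)} THEN {tree}"]
    field, threshold = tree["test"]
    return (to_rules(tree["low"], conditions + (f"{field} <= {threshold}",))
            + to_rules(tree["high"], conditions + (f"{field} > {threshold}",)))

for rule in to_rules(TREE):
    print(rule)
print(classify(TREE, {"balance": 5_000, "age": 30}))  # → no
```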
CRM and Data Mining
Recall: customer segmentation is key in CRM
Data mining can help improve understanding of customer behaviour and helps locate meaningful segments in customer data
Users want to turn that understanding into automated interactions with their customers
Integrating Data Mining & CRM
Data mining application owns the modelling process
CRM application owns the campaign execution process
Goals: minimize the pain involved in using models in campaigns; score records only when and where necessary
Integrating Mining & CRM
Step 1: an analytic user creates a model using the mining system; the model is then exported into the campaign management system
Step 2: a marketing user creates a campaign that includes predictive models; when the campaign executes, the data mining engine scores customers dynamically
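Step 2 scores customers only at execution time and only within the campaign's segment. The sketch below makes that concrete. `exported_model` is a hypothetical scorecard standing in for a model built in the mining system, and the segment filter, threshold, and customer records are all invented.

```python
def exported_model(customer):
    """Hypothetical stand-in for a model exported from the mining system (score 0..1)."""
    return min(1.0, 0.005 * customer["income"] + 0.1 * customer["past_orders"])

def run_campaign(customers, segment_filter, model, threshold=0.5):
    """Score only the customers in the campaign's segment, at execution time."""
    scored = 0
    targets = []
    for c in customers:
        if not segment_filter(c):
            continue  # records outside the segment are never scored
        scored += 1
        if model(c) >= threshold:
            targets.append(c["id"])
    return targets, scored

# Hypothetical customer records; income in $1000s.
customers = [
    {"id": 1, "region": "East", "income": 80, "past_orders": 2},
    {"id": 2, "region": "East", "income": 20, "past_orders": 0},
    {"id": 3, "region": "West", "income": 90, "past_orders": 5},
]
targets, scored = run_campaign(customers, lambda c: c["region"] == "East", exported_model)
print(targets, scored)  # → [1] 2
```

Only two of the three records are scored, which is the efficiency gain the integration is meant to deliver.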
Benefits of Integration
Pre-generated model selection
Scores defined segments “on the fly”, eliminating the need to score the entire database and improving campaign efficiency
Reduces manual intervention and error
Accelerates the market cycle, increasing the likelihood of reaching customers before competitors
Improves campaign results and lowers costs
Summary
“Using the new media of the one-to-one future, you will be able to communicate directly with customers individually…..” - Don Peppers & Martha Rogers (One-to-One Future)
“What are you afraid of?…..Even if you’re not afraid of these things, the beauty is, with proper marketing, we can make you afraid”-- Michael Saylor, CEO MicroStrategy.