Using Text Mining to Infer Semantic Attributes for Retail Data Mining
Authors: Rayid Ghani & Andrew E. Fano
Presenter: Vishal Mahajan (INFS 795)
Agenda
- Drawbacks of current data mining techniques
- Purpose
- Assumptions and constraints
- Methodology / approach
- Extraction of the feature set
- Labeling
- Classification techniques: Naïve Bayes, EM
- Experimental results
- Recommender system
Drawbacks of Current Data Mining Techniques
- Semantic features are not automatically considered.
- Transactional data is analyzed without analyzing the customer.
- Trend analysis is only partial.
- Retail items are treated as objects with no associated semantics.
- Common data mining techniques (association rules, decision trees, neural networks) ignore the meaning of items and the semantics associated with them.
Purpose of the Presentation
- Describe a system that extracts semantic features.
- Populate a knowledge base with the semantic features.
- Show how text mining can be used in retailing to extract semantic features from retailers' websites.
- Show how profiles of customers or groups of customers can be built using text mining.
Assumptions & Constraints
- Focus on the apparel retail segment only.
- Results focus on extracting the semantic features deemed important by CRM or retail experts.
- Data is extracted from retailers' websites.
- The models generated can be extended beyond the apparel retail segment.
Approach
- Collect information about products.
- Define the set of features to be extracted.
- Label the data with values of the features.
- Train a classifier/extractor on the labeled training data to extract features from unseen data.
- Extract semantic features from new products using the trained classifier.
- Populate a knowledge base with the products and their corresponding features.
Data Collection Methodology
- A web crawler extracts the following from large retailers' websites: names, URLs, descriptions, prices, and categories of all available products.
- Wrappers are used for the extraction.
- The extracted information is stored in a database and a subset is chosen.
Extraction of the Feature Set
- Feature selection is based on expert knowledge, drawing on extensive domain knowledge and designed with the retail apparel segment in mind.
- Features selected for the project: age group, functionality, price, formality, degree of conservativeness, degree of sportiness, degree of trendiness, degree of brand appeal.
Labeling Training Data
- A database was created with data collected from the retailer's website.
- A subset of 600 products was chosen and labeled.
- Labeling guidelines were provided.
Details of Features Extracted from Each Product Description
- Age Group: Juniors, Teens, GenX, Mature, All Ages. (For what ages is this item most appropriate?)
- Functionality: Loungewear, Sportswear, Eveningwear, Business Casual, Business Formal. (How will the item be used?)
- Pricepoint: Discount, Average, Luxury. (Compared to other items of this kind, is this item cheap or expensive?)
- Formality: Informal, Somewhat Formal, Very Formal. (How formal is this item?)
- Conservativeness: 1 (gray suits) to 5 (loud, flashy clothes). (Does this suggest the person is conservative or flashy?)
- Sportiness: 1 to 5.
- Trendiness: 1 (timeless classic) to 5 (current favorite). (Is this item popular now but likely to go out of style, or is it more timeless?)
- Brand Appeal: 1 (brand makes the product unappealing) to 5 (high brand appeal). (Is the brand well known, and does it make the item appealing?)
Verifying Training Data
- Labeling was done by different individuals on disjoint subsets of the data.
- Association rules between features were used to check the consistency of the labeled data.
- The Apriori algorithm was implemented with single- and two-feature antecedents and consequents.
- The desired consistency in labeling was achieved by applying the association rules.
Apriori Algorithm
- Find the frequent itemsets: the sets of items that have minimum support.
- Any subset of a frequent itemset must also be a frequent itemset, i.e., if {A B} is a frequent itemset, both {A} and {B} must be frequent itemsets.
- Use the frequent itemsets to generate association rules.
The Apriori Algorithm — Example

Database D:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D for C1 (candidate 1-itemsets with support):
{1}:2  {2}:3  {3}:3  {4}:1  {5}:3

L1 (frequent 1-itemsets, minimum support 2):
{1}:2  {2}:3  {3}:3  {5}:3

C2 (candidate 2-itemsets): {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}
Scan D for supports: {1 2}:1  {1 3}:2  {1 5}:1  {2 3}:2  {2 5}:3  {3 5}:2

L2 (frequent 2-itemsets):
{1 3}:2  {2 3}:2  {2 5}:3  {3 5}:2

C3 (candidate 3-itemsets): {2 3 5}
Scan D: {2 3 5}:2, so L3 = { {2 3 5} }
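The level-wise search above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, run on the example database D with a minimum support of 2:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: frequent k-itemsets seed the candidate (k+1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)
    items = sorted({i for t in transactions for i in t})
    current = [fs for fs in (frozenset([i]) for i in items)
               if support(fs) >= min_support]
    frequent = {fs: support(fs) for fs in current}
    while current:
        k = len(current[0]) + 1
        # join step: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in current})
    return frequent

# Database D from the example (TIDs 100, 200, 300, 400)
D = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
freq = apriori(D, min_support=2)
# freq contains {2 3 5} with support 2, matching L3 in the example
```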
Training from Labeled Data
- The learning problem is treated as a text classification problem.
- One text classifier is trained per semantic feature, e.g., the price of a product is classified as discount, average, or luxury; the age group is classified as Juniors, Teens, GenX, Mature, or All Ages.
- Classification was performed using a naïve Bayes classifier.
Sample Association Rules

Rule                                  Support   Confidence
Informal <- Sportswear                24.5%     93.6%
Informal <- Loungewear                16.1%     82.3%
Informal <- Juniors                   12.1%     89.4%
PricePoint=Average <- BrandAppeal=2    8.8%     79.0%
BrandAppeal=5 <- Trendy=5             16.3%     91.2%
Sportswear <- Sporty=4                 9.0%     85.7%
AgeGroup=Mature <- Trendy=1            9.4%     78.8%
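For a rule B <- A, support is the fraction of products carrying both labels and confidence is that count divided by the count of the antecedent A. A minimal sketch of the computation (the products and labels below are hypothetical, not the paper's data):

```python
def rule_stats(rows, antecedent, consequent):
    """Support and confidence of the rule: antecedent -> consequent.
    Each row is the set of feature=value labels assigned to one product."""
    n = len(rows)
    both = sum(1 for r in rows if antecedent in r and consequent in r)
    ante = sum(1 for r in rows if antecedent in r)
    return both / n, (both / ante if ante else 0.0)

# four hypothetical labeled products
rows = [
    {"Functionality=Sportswear", "Formality=Informal"},
    {"Functionality=Sportswear", "Formality=Informal"},
    {"Functionality=Sportswear", "Formality=Somewhat Formal"},
    {"Functionality=Loungewear", "Formality=Informal"},
]
support, confidence = rule_stats(rows, "Functionality=Sportswear", "Formality=Informal")
# support = 2/4 = 0.5, confidence = 2/3
```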
Naïve Bayes
- A simple but effective text classification method.
- The class with the highest posterior probability is selected, combining the class prior with the word likelihoods.
- The model assumes each word in a document is generated independently of the others, given the class.
The smoothed word probabilities are estimated as (Equation 1):

Pr(w_t | c_j) = (1 + Σ_{i=1}^{|D|} N(w_t, d_i) Pr(c_j | d_i)) / (|V| + Σ_{s=1}^{|V|} Σ_{i=1}^{|D|} N(w_s, d_i) Pr(c_j | d_i))

where N(w_t, d_i) is the number of times word w_t occurs in document d_i, and Pr(c_j | d_i) ∈ {0, 1} for labeled documents.
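Equation 1 can be sketched directly in Python for the labeled case, where Pr(c_j | d_i) is simply 0 or 1. A minimal illustration; the documents, labels, and vocabulary below are made up:

```python
from collections import defaultdict

def estimate_word_probs(docs, labels, classes, vocab):
    """Laplace-smoothed Pr(w_t | c_j) per Equation 1:
    (1 + sum_i N(w_t, d_i) Pr(c_j | d_i)) /
    (|V| + sum_s sum_i N(w_s, d_i) Pr(c_j | d_i))."""
    probs = {}
    for c in classes:
        counts = defaultdict(int)   # N(w_t, d_i) summed over docs in class c
        total = 0
        for doc, label in zip(docs, labels):
            if label != c:          # Pr(c_j | d_i) = 0 for this document
                continue
            for w in doc:
                counts[w] += 1
                total += 1
        probs[c] = {w: (1 + counts[w]) / (len(vocab) + total) for w in vocab}
    return probs

docs = [["silk", "chemise"], ["denim", "jeans", "denim"]]
labels = ["luxury", "discount"]
vocab = {"silk", "chemise", "denim", "jeans"}
p = estimate_word_probs(docs, labels, ["luxury", "discount"], vocab)
# p["discount"]["denim"] = (1 + 2) / (4 + 3) = 3/7
```

Note that the probabilities for each class sum to one over the vocabulary, as the smoothing in Equation 1 guarantees.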
Incorporating Unlabeled Data
- The initial labeled sample covered only 600 products.
- Unlabeled products must be incorporated to make meaningful predictions.
- Semi-supervised learning algorithms are used; they have been shown to reduce classification error considerably.
- The Expectation-Maximization (EM) algorithm is used as the semi-supervised technique.
Expectation-Maximization (EM) Method
- EM is an iterative statistical technique for maximum likelihood estimation with incomplete data.
- In the retail classification problem, the unlabeled data is treated as the incomplete data.
- EM locally maximizes the likelihood of the parameters and gives estimates for the missing values.
Expectation-Maximization (EM) Method (cont.)
- EM is a two-step process.
- Initial parameters are set using naïve Bayes on just the labeled documents.
- Subsequent iterations alternate the E- and M-steps:
  - E-step: calculates a probabilistically weighted class label Pr(c_j | d_i) for every unlabeled document.
  - M-step: estimates new classifier parameters using all documents (Equation 1).
- The E- and M-steps are iterated until the classifier converges.
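The alternation above can be sketched as a small semi-supervised naïve Bayes loop. This is a minimal illustration, not the authors' implementation; the documents, classes, and vocabulary are made up. Labeled documents keep hard 0/1 weights while unlabeled ones receive fractional weights in the E-step:

```python
import math

def nb_train(docs, weights, vocab):
    """M-step: class priors and Laplace-smoothed word probabilities from
    fractionally weighted documents; weights[c][i] = Pr(c | d_i)."""
    model = {}
    total_w = sum(sum(w) for w in weights.values())
    for c, w in weights.items():
        prior = (1 + sum(w)) / (len(weights) + total_w)
        counts = {t: 1.0 for t in vocab}            # Laplace smoothing
        for doc, wi in zip(docs, w):
            for t in doc:
                counts[t] += wi
        denom = sum(counts.values())
        model[c] = (prior, {t: counts[t] / denom for t in vocab})
    return model

def nb_posteriors(doc, model):
    """E-step for one document: Pr(c | d) by Bayes' rule, in log space."""
    logs = {c: math.log(prior) + sum(math.log(pw[t]) for t in doc)
            for c, (prior, pw) in model.items()}
    m = max(logs.values())
    z = sum(math.exp(v - m) for v in logs.values())
    return {c: math.exp(v - m) / z for c, v in logs.items()}

labeled = [(["silk", "chemise"], "luxury"), (["denim", "jeans"], "discount")]
unlabeled = [["silk", "gown"], ["denim", "tee"]]
vocab = {"silk", "chemise", "denim", "jeans", "gown", "tee"}
classes = ["luxury", "discount"]

docs = [d for d, _ in labeled] + unlabeled
# labeled documents get fixed 0/1 weights; unlabeled ones start uniform
weights = {c: [1.0 if lab == c else 0.0 for _, lab in labeled]
              + [1.0 / len(classes)] * len(unlabeled)
           for c in classes}
for _ in range(5):
    model = nb_train(docs, weights, vocab)            # M-step
    for j, doc in enumerate(unlabeled):               # E-step on unlabeled only
        post = nb_posteriors(doc, model)
        for c in classes:
            weights[c][len(labeled) + j] = post[c]
```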
Experimental Results

Algorithm     Age Group  Functionality  Formality  Conservativeness  Sportiness  Trendiness  Brand Appeal
Baseline      29%        24%            68%        39%               49%         29%         36%
Naïve Bayes   66%        57%            76%        80%               70%         69%         82%
EM            78%        70%            82%        84%               78%         80%         91%
Experimental Results: most predictive words per feature value

Brand Appeal = 5 (high): Lauren, Ralph, DKNY, Kenneth, Cole, imported
Conservative = 5 (high): Lauren, Ralph, Breasted, Seasonless, Trouser, Jones, Sport, Classic, blazer
Conservative = 1 (low): Rose, Special, Leopard, Chemise, Straps, Flirty, Spray, Silk, platform
Formality = Informal: Jean, Tommy, Jeans, Denim, Sweater, Pocket, Neck, Tee, Hilfiger
Formality = Somewhat Formal: Jacket, Fully, Button, Skirt, Lines, York, Seam, Crepe, leather
AgeGroup = Junior: Jrs, Dkny, Jeans, Tee, Colligate, Logo, Tommy, Polo, Short, sneaker
Functionality = Loungewear: Chemise, Silk, Kimono, Calvin, Klein, August, Lounge, Hilfiger, Robe, gown
Functionality = Partywear: Rock, Dress, Sateen, Length, Skirt, Shirtdress, Open, Platform, Plaid, flower
Sportiness = 5 (high): Sneaker, Camp, Base, Rubber, Sole, White, Miraclesuite, Athletic, Nylon, Mesh
Trendiness = 1 (low): Lauren, Seasonless, Breasted, Trouser, Pocket, Carefree, Ralph, Blazer, button
Results on a New Data Set
- The subset of data used earlier came from a single retailer.
- Another sample of data was collected from a variety of retailers.
- The results remain strong across this mixed-retailer sample.
Algorithm     Age Group  Functionality  Formality  Conservativeness  Sportiness  Trendiness  Brand Appeal
Naïve Bayes   83%        45%            61%        70%               81%         80%         87%
Recommender System
- Customer profiles can be created in real time by analyzing the text associated with products and mapping it to the pre-defined semantic features.
- The customer's identity and prior transaction history need not be known.
- Semantic features are inferred from the customer's browsing pattern.
- This helps in suggesting new products to the customer.
Recommender System (cont.)
- Mathematically, the classifiers give P(A_ij | Product), where A_ij is the j-th value of the i-th semantic attribute (i indexes the semantic attributes, j their possible values).
- The user profile is constructed by averaging over the past N items browsed:

  Pr(U_ij | past N items) = (1/N) Σ_{k=1}^{N} Pr(A_ij | Item_k)
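In code, the profile is just an average of the per-item attribute probabilities over the last N items browsed. A minimal sketch; the attribute values and probabilities below are hypothetical classifier outputs, not the paper's numbers:

```python
def build_profile(item_attr_probs):
    """Pr(U_ij | past N items) = (1/N) * sum_k Pr(A_ij | Item_k),
    averaged over the N items in the browsing history."""
    n = len(item_attr_probs)
    profile = {}
    for probs in item_attr_probs:           # one dict per browsed item
        for attr_value, p in probs.items():
            profile[attr_value] = profile.get(attr_value, 0.0) + p / n
    return profile

# hypothetical Pr(A_ij | Item_k) for two browsed items
browsed = [
    {"Formality=Informal": 0.9, "Formality=Very Formal": 0.1},
    {"Formality=Informal": 0.7, "Formality=Very Formal": 0.3},
]
profile = build_profile(browsed)
# profile["Formality=Informal"] = (0.9 + 0.7) / 2 = 0.8
```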
Types of Recommender Systems
- Collaborative filtering
  - Collects user feedback in the form of ratings.
  - Exploits similarities and differences among customers to recommend items.
  - Issues: the sparsity problem; new items have no ratings yet.
- Content filtering
  - Recommends items by comparing product content.
  - Issues: narrow in scope; recommends only similar products.
Conclusions
- The system learns through supervised and semi-supervised techniques.
- Major assumption: product descriptions accurately convey the semantic attributes.
- A small sample of data was used to infer the results.
- Practical applications have not been verified.
- The system is bootstrapped from a small number of labeled training examples.
- An interesting application that could be evolved to generate trends for retail marketers.