Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products
Jianfu Chen, David S. Warren
Stony Brook University
Classification is a fundamental problem in information management.
[Figure: two example classification tasks. Left: spam vs. ham filtering of email content. Right: mapping a textual product description into the four-level UNSPSC taxonomy (Segment → Family → Class → Commodity), e.g. Segment "Vehicles and their Accessories and Components (25)" → Family "Motor vehicles (10)" → Class "Passenger motor vehicles (15)" → Commodity "Automobiles or cars (03)"; sibling families include "Marine transport (11)" and "Aerospace systems (20)", sibling segments include "Food Beverage and Tobacco Products (50)" and "Office Equipment and Accessories and Supplies (44)".]
How should we design a classifier for a given real world task?
Method 1. No design
[Figure: pipeline — training set → f(x) → test set.]
Try off-the-shelf classifiers: SVM, logistic regression, decision tree, neural network, ...
Implicit Assumption: We are trying to minimize error rate, or equivalently, maximize accuracy
What’s the use of the classifier?
How do we evaluate the performance of a classifier according to our interests?
Method 2. Optimize what we really care about
Quantify what we really care about
Optimize what we care about
Hierarchical classification of commercial products
[Figure: a textual product description is mapped into the four-level UNSPSC taxonomy (Segment → Family → Class → Commodity), e.g. Segment "Vehicles and their Accessories and Components (25)" → Family "Motor vehicles (10)" → Class "Passenger motor vehicles (15)" → Commodity "Automobiles or cars (03)".]
Product taxonomy helps customers to find desired products quickly.
• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis
[Figure: looking for gift ideas for a kid? Browse Toys & Games → dolls, building toys, puzzles, ...]
We assume misclassification of products leads to revenue loss.
[Figure: the textual description of a computer mouse. Classified correctly under "Desktop computer and accessories → mouse", the vendor realizes an expected annual revenue; misclassified (e.g., under "pet"), the vendor loses part of the potential revenue.]
What do we really care about?
A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss
Observation 1: the misclassification cost of a product depends on its potential revenue.
Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.
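One plausible way to make Observation 2 concrete is a tree distance on the 4-level UNSPSC hierarchy. The sketch below assumes each class is written as its dot-separated code path (segment.family.class.commodity) and measures how many levels lie below the lowest common ancestor, so the distance runs from 0 (same commodity) to 4 (different segments); the code-path encoding is an illustrative assumption, not the authors' implementation.

```python
def hierarchical_distance(y: str, y_prime: str) -> int:
    """Levels below the lowest common ancestor of two leaf classes,
    each given as a dot-separated UNSPSC code path."""
    a, b = y.split("."), y_prime.split(".")
    lca_depth = 0
    for u, v in zip(a, b):           # length of the common code prefix
        if u != v:
            break
        lca_depth += 1
    return len(a) - lca_depth

# "Automobiles or cars (03)" vs. "Buses (02)" differ only at the commodity level:
print(hierarchical_distance("25.10.15.03", "25.10.15.02"))  # -> 1
# a vehicle vs. a food product: different segments, maximal distance:
print(hierarchical_distance("25.10.15.03", "50.20.30.40"))  # -> 4
```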
The proposed performance evaluation metric: average revenue loss
• The example weight is the potential annual revenue of product x.
• The error function is the loss ratio – the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y′. It is a non-decreasing monotonic function of the hierarchical distance between y and y′: f(d(y, y′)).

d(y, y′)   0    1    2    3    4
f(d)       0    0.2  0.4  0.6  0.8

Revenue loss of product x: v(x) · f(d(y, y′))
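The metric can be sketched directly from its definition: weight each example by its potential revenue v(x) and multiply by the loss ratio. The f(d) = 0.2·d schedule follows the table in the slide; the revenues and distances below are made-up illustrative inputs.

```python
def loss_ratio(d: int) -> float:
    """Fraction of potential revenue lost at hierarchical distance d
    (0.2 per level, as in the table above)."""
    return 0.2 * d

def average_revenue_loss(revenues, distances):
    """Mean of v(x) * f(d(y, y')) over the test examples."""
    losses = [v * loss_ratio(d) for v, d in zip(revenues, distances)]
    return sum(losses) / len(losses)

# three products: classified correctly, off by one level, off by four levels
print(average_revenue_loss([1000.0, 5000.0, 200.0], [0, 1, 4]))  # ≈ 386.67
```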
Learning – minimizing average revenue loss
Minimize convex upper bound
Multi-class SVM with margin re-scaling
min_{θ,ξ}  (1/2)‖θ‖² + (C/m) Σ_{i=1..m} ξ_i
s.t.  ∀i, ∀y′:  θ_{y_i}ᵀ x_i − θ_{y′}ᵀ x_i ≥ L(x_i, y_i, y′) − ξ_i,   ξ_i ≥ 0
Choices of loss function L:
• 0-1: error rate (standard multi-class SVM)
• VALUE: product revenue
• TREE: hierarchical distance
• REVLOSS: revenue loss
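The four plug-in losses can be sketched as functions of a product's potential revenue v and the hierarchical distance d between the true and predicted classes. The names mirror the slide; the 0.2-per-level loss ratio comes from the earlier table, and the exact form of VALUE (full revenue on any error) is an assumed reading, not the authors' code.

```python
def loss_01(v: float, d: int) -> float:
    """0-1: plain error rate (standard multi-class SVM)."""
    return 0.0 if d == 0 else 1.0

def loss_tree(v: float, d: int) -> float:
    """TREE: hierarchical distance only, ignoring revenue."""
    return float(d)

def loss_value(v: float, d: int) -> float:
    """VALUE: product revenue only, ignoring how far off the prediction is."""
    return 0.0 if d == 0 else v

def loss_revloss(v: float, d: int) -> float:
    """REVLOSS: revenue times the loss ratio f(d) = 0.2 * d."""
    return v * 0.2 * d
```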
Multi-class SVM with margin re-scaling

min_{θ,ξ}  (1/2)‖θ‖² + (C/m) Σ_{i=1..m} ξ_i
s.t.  ∀i, ∀y′:  θ_{y_i}ᵀ x_i − θ_{y′}ᵀ x_i ≥ L(x_i, y_i, y′) − ξ_i,   ξ_i ≥ 0

Plug in any loss function L: the objective is a convex upper bound of the average loss.
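The slack variable attached to one example under margin re-scaling can be written out directly: it is the most violated constraint, max over y′ of L(y, y′) minus the score margin, clipped at zero, and it upper-bounds the loss of the predicted class. The sketch below is an illustrative stand-in (names, shapes, and the toy loss are assumptions, not the authors' implementation).

```python
import numpy as np

def margin_rescaled_slack(theta, x, y, L):
    """Slack for one example: max_{y'}[ L(y, y') - (theta_y - theta_{y'})^T x ]_+ ,
    which upper-bounds L(y, y_hat) for y_hat = argmax_{y'} theta_{y'}^T x."""
    scores = theta @ x                       # score of each class on x
    margins = scores[y] - scores             # theta_y^T x - theta_{y'}^T x
    violations = np.array([L(y, yp) for yp in range(len(scores))]) - margins
    return max(0.0, float(violations.max()))

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))              # 4 classes, 3 features (toy sizes)
x = rng.normal(size=3)
L = lambda y, yp: 0.0 if y == yp else 2.0    # a toy plugged-in loss
y = 1
y_hat = int(np.argmax(theta @ x))
assert margin_rescaled_slack(theta, x, y, L) >= L(y, y_hat)  # convex upper bound
```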
Dataset
• UNSPSC (United Nations Standard Products and Services Code) dataset
• Product revenues are simulated: revenue = price × sales

data source:        multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL)
taxonomy structure: 4-level balanced tree (the UNSPSC taxonomy)
#examples:          1.4M
#leaf classes:      1,073
Experimental results
[Bar chart: average revenue loss (in K$) of the different algorithms (0-1, TREE, VALUE, REVLOSS) under IDENTITY and UNIT loss scaling; y-axis 0–60 K$. Recovered bar values: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082.]
What’s wrong?
The raw loss v(x_i) · L_{y_i y′} ranges from a few thousand to several million dollars, so a handful of high-revenue products dominates the objective.
Loss normalization
• Linearly scale the loss function to a fixed range, say [1, 10].
The objective now upper bounds both 0-1 loss and the average normalized loss.
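The rescaling itself is a one-line linear map. The sketch below assumes it is applied to the raw nonzero misclassification losses (correct predictions keep loss 0, so every normalized error costs at least 1, which is why the objective also upper-bounds the 0-1 loss); the [1, 10] endpoints follow the slide, the rest is an assumed reading.

```python
def normalize_range(raw_losses, lo=1.0, hi=10.0):
    """Linearly map raw nonzero losses into [lo, hi]."""
    m, M = min(raw_losses), max(raw_losses)
    if M == m:                         # degenerate case: all losses equal
        return [lo] * len(raw_losses)
    scale = (hi - lo) / (M - m)
    return [lo + (v - m) * scale for v in raw_losses]

# a $2K, a $50K, and a $3M raw loss now differ by at most a factor of 10:
print(normalize_range([2_000.0, 50_000.0, 3_000_000.0]))
```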
Final results
[Bar chart: average revenue loss (in K$) of the different algorithms (0-1, TREE, VALUE, REVLOSS) under IDENTITY, UNIT, and RANGE loss normalization; y-axis 0–60 K$. Recovered bar values: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082, 4.387, 4.371; the RANGE-normalized bars (4.387, 4.371) are the lowest.]
7.88% reduction in average revenue loss!
Conclusion

What do we really care about for this task? Minimize error rate, or minimize revenue loss?
• Performance evaluation metric: the empirical risk, i.e., the average misclassification cost.
• Model + tractable loss function: approximate the performance evaluation metric to make it tractable, via regularized empirical risk minimization.
• Optimization: find the best parameters.

A general method: multi-class SVM with margin re-scaling and loss normalization.
Thank you!
Questions?