Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products
Jianfu Chen, David S. Warren
Stony Brook University
Classification is a fundamental problem in information management.
[Figure: two example classification tasks. Left: spam vs. ham filtering of email content. Right: mapping a textual product description into the four-level UNSPSC taxonomy (Segment → Family → Class → Commodity), e.g. Segment "Vehicles and their Accessories and Components (25)" → Family "Motor vehicles (10)" → Class "Passenger motor vehicles (15)" → Commodity "Automobiles or cars (03)"; sibling families include "Marine transport (11)" and "Aerospace systems (20)", sibling segments include "Food Beverage and Tobacco Products (50)" and "Office Equipment and Accessories and Supplies (44)".]
How should we design a classifier for a given real world task?
Method 1. No design
[Figure: pipeline — training set → f(x) → test set.]
Try off-the-shelf classifiers: SVM, logistic regression, decision tree, neural network, ...
Implicit Assumption: We are trying to minimize error rate, or equivalently, maximize accuracy
What’s the use of the classifier?
How do we evaluate the performance of a classifier according to our interests?
Method 2. Optimize what we really care about
Quantify what we really care about
Optimize what we care about
Hierarchical classification of commercial products
[Figure: a textual product description is mapped into the four-level UNSPSC taxonomy (Segment → Family → Class → Commodity), e.g. Segment "Vehicles and their Accessories and Components (25)" → Family "Motor vehicles (10)" → Class "Passenger motor vehicles (15)" → Commodity "Automobiles or cars (03)".]
Product taxonomy helps customers to find desired products quickly.
• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis
[Figure: looking for gift ideas for a kid? Browse Toys & Games → dolls, building toys, puzzles, ...]
We assume misclassification of products leads to revenue loss.
[Figure: the textual description of a computer mouse. Classified correctly under "Desktop computer and accessories → mouse", the vendor realizes an expected annual revenue; misclassified (e.g., under "pet"), the vendor loses part of the potential revenue.]
What do we really care about?
A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss
Observation 1: the misclassification cost of a product depends on its potential revenue.
Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.
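One plausible way to make Observation 2 concrete is a tree distance on the 4-level UNSPSC hierarchy. The sketch below assumes each class is written as its dot-separated code path (segment.family.class.commodity) and measures how many levels lie below the lowest common ancestor, so the distance runs from 0 (same commodity) to 4 (different segments); the code-path encoding is an illustrative assumption, not the authors' implementation.

```python
def hierarchical_distance(y: str, y_prime: str) -> int:
    """Levels below the lowest common ancestor of two leaf classes,
    each given as a dot-separated UNSPSC code path."""
    a, b = y.split("."), y_prime.split(".")
    lca_depth = 0
    for u, v in zip(a, b):           # length of the common code prefix
        if u != v:
            break
        lca_depth += 1
    return len(a) - lca_depth

# "Automobiles or cars (03)" vs. "Buses (02)" differ only at the commodity level:
print(hierarchical_distance("25.10.15.03", "25.10.15.02"))  # -> 1
# a vehicle vs. a food product: different segments, maximal distance:
print(hierarchical_distance("25.10.15.03", "50.20.30.40"))  # -> 4
```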
The proposed performance evaluation metric: average revenue loss
• The example weight is the potential annual revenue of product x.
• The error function is the loss ratio – the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y′. It is a non-decreasing monotonic function of the hierarchical distance between y and y′: f(d(y, y′)).

d(y, y′)   0    1    2    3    4
f(d)       0    0.2  0.4  0.6  0.8

Revenue loss of product x: v(x) · f(d(y, y′))
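The metric can be sketched directly from its definition: weight each example by its potential revenue v(x) and multiply by the loss ratio. The f(d) = 0.2·d schedule follows the table in the slide; the revenues and distances below are made-up illustrative inputs.

```python
def loss_ratio(d: int) -> float:
    """Fraction of potential revenue lost at hierarchical distance d
    (0.2 per level, as in the table above)."""
    return 0.2 * d

def average_revenue_loss(revenues, distances):
    """Mean of v(x) * f(d(y, y')) over the test examples."""
    losses = [v * loss_ratio(d) for v, d in zip(revenues, distances)]
    return sum(losses) / len(losses)

# three products: classified correctly, off by one level, off by four levels
print(average_revenue_loss([1000.0, 5000.0, 200.0], [0, 1, 4]))  # ≈ 386.67
```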
Learning – minimizing average revenue loss
Minimize convex upper bound
Multi-class SVM with margin re-scaling
min_{θ,ξ}  (1/2)‖θ‖² + (C/m) Σ_{i=1..m} ξ_i
s.t.  ∀i, ∀y′:  θ_{y_i}ᵀ x_i − θ_{y′}ᵀ x_i ≥ L(x_i, y_i, y′) − ξ_i,   ξ_i ≥ 0
Choices of loss function L:
• 0-1: error rate (standard multi-class SVM)
• VALUE: product revenue
• TREE: hierarchical distance
• REVLOSS: revenue loss
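The four plug-in losses can be sketched as functions of a product's potential revenue v and the hierarchical distance d between the true and predicted classes. The names mirror the slide; the 0.2-per-level loss ratio comes from the earlier table, and the exact form of VALUE (full revenue on any error) is an assumed reading, not the authors' code.

```python
def loss_01(v: float, d: int) -> float:
    """0-1: plain error rate (standard multi-class SVM)."""
    return 0.0 if d == 0 else 1.0

def loss_tree(v: float, d: int) -> float:
    """TREE: hierarchical distance only, ignoring revenue."""
    return float(d)

def loss_value(v: float, d: int) -> float:
    """VALUE: product revenue only, ignoring how far off the prediction is."""
    return 0.0 if d == 0 else v

def loss_revloss(v: float, d: int) -> float:
    """REVLOSS: revenue times the loss ratio f(d) = 0.2 * d."""
    return v * 0.2 * d
```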
Multi-class SVM with margin re-scaling

min_{θ,ξ}  (1/2)‖θ‖² + (C/m) Σ_{i=1..m} ξ_i
s.t.  ∀i, ∀y′:  θ_{y_i}ᵀ x_i − θ_{y′}ᵀ x_i ≥ L(x_i, y_i, y′) − ξ_i,   ξ_i ≥ 0

Plug in any loss function L: the objective is a convex upper bound of the average loss.
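The slack variable attached to one example under margin re-scaling can be written out directly: it is the most violated constraint, max over y′ of L(y, y′) minus the score margin, clipped at zero, and it upper-bounds the loss of the predicted class. The sketch below is an illustrative stand-in (names, shapes, and the toy loss are assumptions, not the authors' implementation).

```python
import numpy as np

def margin_rescaled_slack(theta, x, y, L):
    """Slack for one example: max_{y'}[ L(y, y') - (theta_y - theta_{y'})^T x ]_+ ,
    which upper-bounds L(y, y_hat) for y_hat = argmax_{y'} theta_{y'}^T x."""
    scores = theta @ x                       # score of each class on x
    margins = scores[y] - scores             # theta_y^T x - theta_{y'}^T x
    violations = np.array([L(y, yp) for yp in range(len(scores))]) - margins
    return max(0.0, float(violations.max()))

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))              # 4 classes, 3 features (toy sizes)
x = rng.normal(size=3)
L = lambda y, yp: 0.0 if y == yp else 2.0    # a toy plugged-in loss
y = 1
y_hat = int(np.argmax(theta @ x))
assert margin_rescaled_slack(theta, x, y, L) >= L(y, y_hat)  # convex upper bound
```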
Dataset
• UNSPSC (United Nations Standard Products and Services Code) dataset
• Product revenues are simulated: revenue = price × sales

data source:        multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL)
taxonomy structure: 4-level balanced tree (the UNSPSC taxonomy)
#examples:          1.4M
#leaf classes:      1,073
Experimental results
[Bar chart: average revenue loss (in K$) of the different algorithms (0-1, TREE, VALUE, REVLOSS) under IDENTITY and UNIT loss scaling; y-axis 0–60 K$. Recovered bar values: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082.]
What’s wrong?
The raw loss v(x_i) · L_{y_i y′} ranges from a few thousand to several million dollars, so a handful of high-revenue products dominates the objective.
Loss normalization
• Linearly scale the loss function to a fixed range, say [1, 10].
The objective now upper bounds both 0-1 loss and the average normalized loss.
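The rescaling itself is a one-line linear map. The sketch below assumes it is applied to the raw nonzero misclassification losses (correct predictions keep loss 0, so every normalized error costs at least 1, which is why the objective also upper-bounds the 0-1 loss); the [1, 10] endpoints follow the slide, the rest is an assumed reading.

```python
def normalize_range(raw_losses, lo=1.0, hi=10.0):
    """Linearly map raw nonzero losses into [lo, hi]."""
    m, M = min(raw_losses), max(raw_losses)
    if M == m:                         # degenerate case: all losses equal
        return [lo] * len(raw_losses)
    scale = (hi - lo) / (M - m)
    return [lo + (v - m) * scale for v in raw_losses]

# a $2K, a $50K, and a $3M raw loss now differ by at most a factor of 10:
print(normalize_range([2_000.0, 50_000.0, 3_000_000.0]))
```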
Final results
[Bar chart: average revenue loss (in K$) of the different algorithms (0-1, TREE, VALUE, REVLOSS) under IDENTITY, UNIT, and RANGE loss normalization; y-axis 0–60 K$. Recovered bar values: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082, 4.387, 4.371; the RANGE-normalized bars (4.387, 4.371) are the lowest.]
7.88% reduction in average revenue loss!
Conclusion

What do we really care about for this task? Minimize error rate, or minimize revenue loss?
• Performance evaluation metric: the empirical risk, i.e., the average misclassification cost.
• Model + tractable loss function: approximate the performance evaluation metric to make it tractable, via regularized empirical risk minimization.
• Optimization: find the best parameters.

A general method: multi-class SVM with margin re-scaling and loss normalization.
Thank you!
Questions?