Machine Learning Applied in Product Classification

Machine Learning Applied in Product Classification Jianfu Chen Computer Science Department Stony Brook University



Page 1: Machine Learning Applied in Product Classification

Machine Learning Applied in Product Classification

Jianfu Chen

Computer Science Department

Stony Brook University

Page 2: Machine Learning Applied in Product Classification

Machine learning learns an idealized model of the real world.

1 + 1 = 2

Page 3: Machine Learning Applied in Product Classification

Prod1 -> class1

Prod2 -> class2

...

Prod3 -> ?

f(x) -> y

x: Kindle Fire HD 8.9" 4G LTE Wireless -> 0 ... 1 1 ... 1 ... 1 ... 0 ...
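The mapping from a product title to a 0/1 vector can be sketched as a binary bag-of-words encoding. This is a minimal illustration, assuming whitespace tokenization and a vocabulary built from the titles themselves; the function names and the second title are hypothetical and do not come from the slides.

```python
def tokenize(title):
    """Lowercase a product title and split it on whitespace."""
    return title.lower().split()


def build_vocabulary(titles):
    """Map each distinct token to a feature index, in first-seen order."""
    vocab = {}
    for title in titles:
        for token in tokenize(title):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_binary_features(title, vocab):
    """Return a 0/1 vector with a 1 wherever the title contains a vocabulary word."""
    x = [0] * len(vocab)
    for token in tokenize(title):
        if token in vocab:
            x[vocab[token]] = 1
    return x


# Hypothetical mini-catalog; only the Kindle title appears on the slide.
titles = [
    'Kindle Fire HD 8.9" 4G LTE Wireless',
    "Wireless optical mouse",
]
vocab = build_vocabulary(titles)
x = to_binary_features(titles[0], vocab)
```

With a realistic vocabulary the vector is long and mostly zeros, which matches the sparse pattern shown for x.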

Page 4: Machine Learning Applied in Product Classification

Components of the magic box f(x)

Representation

• Give a score to each class
• $s(y; x) = w_y^\top x$

Inference

• Predict the class with the highest score

Learning

• Estimate the parameters from data

Page 5: Machine Learning Applied in Product Classification

Representation

Linear Model

• $s(y; x) = w_y^\top x$

Probabilistic Model

• P(x,y): Naive Bayes
• P(y|x): Logistic Regression

Algorithmic Model

• Decision Tree
• Neural Networks

Given an example, a model gives a score to each class.

Page 6: Machine Learning Applied in Product Classification

Linear Model

• A linear combination of the feature values.
• A hyperplane.
• Use one weight vector to score each class.

[Figure: weight vectors w1, w2, w3]

Page 7: Machine Learning Applied in Product Classification

Example

• Suppose we have 3 classes and 2 features.
• Weight vectors: one per class, each with 2 components.
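A small sketch of this setup, scoring an example against one weight vector per class and predicting the highest-scoring class. The weight values, the class names, and the example x are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def score(w_y, x):
    """Linear score of one class: the dot product of its weight vector with x."""
    return sum(w * v for w, v in zip(w_y, x))


def predict(weights, x):
    """Score every class and return the best class together with all scores."""
    scores = {y: score(w_y, x) for y, w_y in weights.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical weight vectors for 3 classes over 2 features.
weights = {
    "class1": [1.0, 0.0],
    "class2": [0.0, 1.0],
    "class3": [-1.0, -1.0],
}
x = [2.0, 0.5]
label, scores = predict(weights, x)
```

Here class1 scores 2.0, class2 scores 0.5 and class3 scores -2.5, so the prediction is class1.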

Page 8: Machine Learning Applied in Product Classification

Probabilistic model

• Gives a probability to class y given example x: P(y|x)

• Two ways to do this:

– Generative model: P(x,y) (e.g., Naive Bayes)

– Discriminative model: P(y|x) (e.g., Logistic Regression)
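The two routes to P(y|x) can be sketched as follows. A generative model yields joint values P(x,y) that are normalized over classes (Bayes' rule); a discriminative model such as multiclass logistic regression turns linear scores into probabilities with a softmax. Both function names and all numbers are illustrative assumptions, not taken from the slides.

```python
import math


def posterior_from_joint(joint):
    """Generative route: P(y|x) = P(x,y) / sum over y' of P(x,y')."""
    total = sum(joint.values())
    return {y: p / total for y, p in joint.items()}


def softmax(scores):
    """Discriminative route: map per-class scores to probabilities."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


# Hypothetical joint probabilities P(x, y) for one fixed example x.
posterior = posterior_from_joint({"class1": 0.1, "class2": 0.3})

# Hypothetical linear scores s(y; x) for the same example.
probs = softmax({"class1": 2.0, "class2": 0.5, "class3": -2.5})
```

Either way, inference is unchanged: predict the class with the highest probability.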

Page 9: Machine Learning Applied in Product Classification

Components of the magic box f(x)

Representation

• Give a score to each class
• $s(y; x) = w_y^\top x$

Inference

• Predict the class with the highest score

Learning

• Estimate the parameters from data

Page 10: Machine Learning Applied in Product Classification

Learning

• Parameter estimation

– w's in a linear model

– parameters for a probabilistic model

• Learning is usually formulated as an optimization problem.

Page 11: Machine Learning Applied in Product Classification

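Define an optimization objective: average misclassification cost

• The misclassification cost of classifying a single example x from class y into class y': $L(x, y, y')$

– formally called the loss function

• The average misclassification cost on the training set: $R_{em} = \frac{1}{N} \sum_{i=1}^{N} L(x_i, y_i, y'_i)$, where $y'_i$ is the class predicted for $x_i$

– formally called the empirical risk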

Page 12: Machine Learning Applied in Product Classification

Define misclassification cost

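• 0-1 loss: $L(x, y, y') = [y \neq y']$ (1 if the predicted class y' differs from the true class y, otherwise 0)

The average 0-1 loss is the error rate = 1 – accuracy: $R_{em} = \frac{1}{N} \sum_{i=1}^{N} [y_i \neq y'_i]$

• Revenue loss: $L(x, y, y') = v(x) \, L_{yy'}$ (the potential revenue of product x times the fraction lost by misclassifying class y as class y')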
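As a quick check that the average 0-1 loss is the error rate, here is a minimal sketch. The helper names and the four (true class, predicted class) pairs are made up for illustration.

```python
def zero_one_loss(y_true, y_pred):
    """Cost 1 for a wrong prediction, 0 for a correct one."""
    return 0 if y_true == y_pred else 1


def empirical_risk(pairs, loss):
    """Average a loss over (true class, predicted class) pairs."""
    return sum(loss(y, y_hat) for y, y_hat in pairs) / len(pairs)


# Hypothetical predictions on four training examples.
pairs = [
    ("mouse", "mouse"),
    ("mouse", "keyboard"),
    ("car", "car"),
    ("truck", "car"),
]
error_rate = empirical_risk(pairs, zero_one_loss)
accuracy = 1 - error_rate
```

Two of the four predictions are wrong, so the error rate and the accuracy are both 0.5.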

Page 13: Machine Learning Applied in Product Classification

Do the optimization: minimize a convex upper bound of the average misclassification cost

• Directly minimizing the average misclassification cost is intractable, since the objective is non-convex.

• Minimize a convex upper bound instead.
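One familiar convex upper bound is the hinge loss, shown here as a function of the margin (the score of the true class minus the best score among the other classes). This is a sketch of the general idea under that assumption; the slides do not say which bound they have in mind at this point, and the margin values are arbitrary.

```python
def zero_one(margin):
    """0-1 loss: an example counts as an error when its margin is not positive."""
    return 1.0 if margin <= 0 else 0.0


def hinge(margin):
    """Hinge loss: convex in the margin and never below the 0-1 loss."""
    return max(0.0, 1.0 - margin)


# Arbitrary margins covering wrong, borderline, and confident predictions.
margins = [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]
bound_holds = all(hinge(m) >= zero_one(m) for m in margins)
```

Because the hinge loss is at least the 0-1 loss at every margin, driving the average hinge loss down also caps the error rate.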

Page 14: Machine Learning Applied in Product Classification

A taste of SVM

• Minimizes a convex upper bound of the 0-1 loss.

• C is a hyperparameter: the regularization parameter.
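For reference, one standard way to write such an objective is the Crammer-Singer multiclass SVM, which LIBLINEAR offers as one of its solvers. Treating this as the form meant on the slide is an assumption; the symbols $w_y$, $\xi_i$ and $N$ are introduced here for illustration.

```latex
\min_{w,\,\xi} \quad \frac{1}{2} \sum_{y} \lVert w_y \rVert^2 \;+\; C \sum_{i=1}^{N} \xi_i
\qquad \text{s.t.} \quad
w_{y_i}^\top x_i - w_y^\top x_i \;\ge\; 1 - \xi_i \quad \forall i,\ \forall y \ne y_i,
\qquad \xi_i \ge 0 .
```

Each slack variable $\xi_i$ is at least 1 whenever example $i$ is misclassified, so $\sum_i \xi_i$ upper-bounds the number of training errors, while C trades that bound off against the size of the weights.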

Page 15: Machine Learning Applied in Product Classification

Machine learning in practice

• Feature extraction: { (x, y) }

• Select a model/classifier: SVM

• Set up the experiment: training : development : test = 4 : 2 : 4

• Call a package to do the experiments

– LIBLINEAR: http://www.csie.ntu.edu.tw/~cjlin/liblinear/

– Find the best C on the development set

– Test the final performance on the test set
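The experimental protocol above can be sketched independently of any particular package. In this outline, `fit` and `error_rate` are hypothetical callables standing in for whatever training and evaluation routine is used (for example a wrapper around LIBLINEAR); they are not LIBLINEAR's API.

```python
import random


def split_4_2_4(examples, seed=0):
    """Shuffle and split into training : development : test = 4 : 2 : 4."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (4 * n) // 10
    n_dev = (2 * n) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def select_best_c(train, dev, candidates, fit, error_rate):
    """Train one model per candidate C and keep the one with the lowest dev error."""
    best_c, best_model, best_err = None, None, float("inf")
    for c in candidates:
        model = fit(train, c)          # hypothetical training routine
        err = error_rate(model, dev)   # hypothetical evaluation routine
        if err < best_err:
            best_c, best_model, best_err = c, model, err
    return best_c, best_model


# Placeholder data: 100 dummy examples.
train, dev, test = split_4_2_4(range(100))
```

The test set is touched only once, after C has been fixed on the development set, so the reported performance is not biased by the hyperparameter search.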

Page 16: Machine Learning Applied in Product Classification

Cost-sensitive learning

• Standard classifier learning optimizes the error rate by default, assuming every misclassification incurs the same cost.

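• In product taxonomy classification, the cost depends on which classes are confused and on which product is misclassified.

Example classes: keyboard, mouse, truck, car

Example products: iPhone 5, Nokia 3720 Classic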

Page 17: Machine Learning Applied in Product Classification

Minimize average revenue loss

$R_{em} = \frac{1}{N} \sum_{i=1}^{N} v(x_i) \, L_{y_i y'_i}$

where $v(x)$ is the potential annual revenue of product x if it is correctly classified, and $L_{yy'}$ is the fraction of that revenue lost by misclassifying a product from class y into class y'.
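A minimal sketch of computing the average revenue loss: each misclassified product contributes its potential revenue times the loss ratio for the confused pair of classes. All revenues, loss ratios, and names below are hypothetical illustration values, not figures from the talk.

```python
def average_revenue_loss(examples, loss_ratio):
    """Average of revenue * loss ratio over examples; correct predictions cost 0.

    examples:   list of (potential revenue, true class, predicted class)
    loss_ratio: dict mapping (true class, predicted class) to a fraction in [0, 1]
    """
    total = 0.0
    for revenue, y_true, y_pred in examples:
        if y_true != y_pred:
            total += revenue * loss_ratio[(y_true, y_pred)]
    return total / len(examples)


# Hypothetical loss ratios: confusing similar classes loses less revenue.
loss_ratio = {
    ("mouse", "keyboard"): 0.2,
    ("truck", "mouse"): 0.9,
}

# Hypothetical (revenue, true class, predicted class) triples.
examples = [
    (1000.0, "mouse", "keyboard"),
    (50.0, "truck", "mouse"),
    (500.0, "car", "car"),
    (200.0, "mouse", "mouse"),
]
avg_loss = average_revenue_loss(examples, loss_ratio)
```

Unlike the error rate, which would score these predictions as 2 mistakes out of 4, this objective weights the high-revenue mistake much more heavily than the low-revenue one.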

Page 18: Machine Learning Applied in Product Classification

Conclusion

• Machine learning learns an idealized model of the real world.

• The model can be applied to predict unseen data.

• Classifier learning minimizes average misclassification cost.

• It is important to define an appropriate misclassification cost.