Aggressive Double Sampling for Reducing Multi-class ......Aggressive Double Sampling for Reducing...

27
Aggressive Double Sampling for Reducing Multi-class Classification to Binary Classification Bikash Joshi (PhD Student) AMA team, LIG Supervised By: Prof. Massih-Reza Amini and Dr. Franck Iutzeler March 20, 2017 Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 1 / 27

Transcript of Aggressive Double Sampling for Reducing Multi-class ......Aggressive Double Sampling for Reducing...

Aggressive Double Sampling for Reducing Multi-classClassification to Binary Classification

Bikash Joshi (PhD Student)AMA team, LIG

Supervised By:

Prof. Massih-Reza Amini and Dr. Franck Iutzeler

March 20, 2017

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 1 / 27

Outline

1 Introduction

2 Multiclass to Binary Reduction

3 Double-Sampled Multiclass to Binary Reduction

4 Experimental Results

5 Conclusion

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 2 / 27

Outline

1 Introduction

2 Multiclass to Binary Reduction

3 Double-Sampled Multiclass to Binary Reduction

4 Experimental Results

5 Conclusion

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 3 / 27

Multiclass Classification: Introduction

Figure : DigitClassification

Figure : ImageClassification

Figure : TextClassification

Finite set of categories (K > 2)

Popular applications: image and text classification.

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 4 / 27

Multiclass classification: Related Work

1 Combined approaches based on binary classification:I One-Vs-Rest

F One binary problem for each classF K binary problemsF O(K × d)

I One-Vs-OneF One binary problem for each pair of classesF O(K 2× d)

2 Uncombined ApproachesI for example: multiclass SVM, MLPI One scoring function per class

3 Logarithmic Time AlgorithmsI For example: logTree, Recall-TreeI Each leaf node represents a classI O(logK)

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 5 / 27

Multiclass classification : Challenges

The number of classes, K, in new emerging multiclass problems, forexample in text and image classification, may reach 105 to 106

categories.

For example:

I 4 × 106 sites

I 106 categories

I 105 editors

I Imbalanced natureof hierarchies

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 6 / 27

Multiclass classification : Challenges

Class imbalance problem

Majority of classes have few representative examples

Long tailed distribution

0

500

1000

1500

2000

2500

3000

3500

4000

2-5 6-10 11-30 31-100 101-200 >200

# C

lasses

# Documents

DMOZ-7500

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 7 / 27

Text Classification:

Task: Automatic classification of an example text to one of fixed set ofcategories.Feature Representation:

Bag of Words:I From training corpus extract vocabulary.I Represent each terms as 0 or 1I Highly sparse

Document-class joint feature representation:I Inspired by learning to rankI Similarity features between an example and class of examplesI For example: ∑

t∈y∩x

1

Where,x → One documenty → Class of documents

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 8 / 27

Outline

1 Introduction

2 Multiclass to Binary Reduction

3 Double-Sampled Multiclass to Binary Reduction

4 Experimental Results

5 Conclusion

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 9 / 27

Motivation of our work

Baselines: Model complexity increases with classes(K) and featuredimension (d).

Algorithm that scales well for large scale data

Does not suffer from class imbalance problem

Less complex model

Competitive with the state of the art approaches

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 10 / 27

Framework

X ⊆ Rd : Input Space

Y = 1,...,K : Output Space

S = (xyii )mi=1 : Training set of i.i.d. pairs

G = g : X × Y → R : Class of predictors

Instantaneous Loss

e(g , xy ) =1

K − 1

∑y ′∈Y\y

1g(xy )≤g(xy

′)

(1)

1π is the indicator function (Value is 0 or 1)

Average number of classes that get greater scoring by g than trueclass

Ranking loss used in Multiclass-SVM a

aWeston et. al. (1998)

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 11 / 27

Framework

Empirical Loss

Empirical error of g ∈ G over S is:

Lm(g ,S) =1

m(K − 1)

m∑i=1

∑y ′∈Y \yi

1g(x

yii )≤g(xy

′i )

(2)

=1

m(K − 1)

m∑i=1

∑y ′∈Y \yi

1

h(xyii , xy′

i )︸ ︷︷ ︸g(x

yii

)−g(xy′

i)

≤0(3)

Resembles to binary-classification-loss based risk

Selection of a hypothesis in G minimizing risk over S is equivalent tosearch a hypothesis in H minimizing risk over T(S) of size m× (K − 1)

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 12 / 27

Multiclass to binary reduction example

We consider the following transformation

T (S) =

({ (zj =

(xki , x

yii

), yj = −1

)if k < yi(

zj =(xyii , x

ki

), yj = +1

)elsewhere

)j.=(i−1)(K−1)+k

,

|T (S)| = m × (K - 1)

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 13 / 27

Multiclass to binary reduction algorithm

[Bikash et al. 2015]

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 14 / 27

Improvements and New challenges

Improvements:

One parameter vector for all classes.

Low-dimensional feature space.

Overcome class imbalance.

New Challenges:

Number of transformations huge for larger K

Large computational overhead

Large memory requirement

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 15 / 27

Outline

1 Introduction

2 Multiclass to Binary Reduction

3 Double-Sampled Multiclass to Binary Reduction

4 Experimental Results

5 Conclusion

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 16 / 27

Aggressive double sampling

1 Drawing uniformly µ examples per class, in order to form practical setSµ;

I Reduce redundancy in examplesI Emphasizing rare classes

2 For each example xy in Sµ, drawing uniformly κ adversarial classes inY\{y}.

I Reduces time complexityI Low memory requirement

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 17 / 27

Double Sampled Multi to Binary Reduction

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 18 / 27

Outline

1 Introduction

2 Multiclass to Binary Reduction

3 Double-Sampled Multiclass to Binary Reduction

4 Experimental Results

5 Conclusion

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 19 / 27

Experimental Setup

Datasets:

Application: Text Classification

DMOZ and Wikipedia datasets. (LSHTC challenge)

Pre-processed with stop word removal and stemming.

Random samples of 1000, 2000, 3000, 4000, 5000, 7500, 10000,20000.

Comparison:

DS-m2b: Proposed double sampled multiclass to binary algorithm

OVA: One-Vs-All algorithm

M-SVM: Crammar-Singer implementation of multiclass SVM

Recall Tree: Hierarchical One-Vs-Some algorithm

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 20 / 27

Feature representation Φ(xy)

Features

1.∑

t∈y∩x

ln(1 + yt) 2.∑

t∈y∩x

ln(1 +lSSt

)

3.∑

t∈y∩x

It 4.∑

t∈y∩x

ln(1 +yt|y | )

5.∑

t∈y∩x

ln(1 +yt|y | .It) 6.

∑t∈y∩x

ln(1 +yt|y | .

lSSt

)

7.∑

t∈y∩x

1 8.∑

t∈y∩x

yt|y | .It

9. BM25 10. d(xy , centroid(y))

xt : number of occurrences of terme t in document x ,

V: Number of distinct terms in S,

yt =∑

x∈y xt , |y | =∑

t∈V yt , St =∑

x∈S xt , lS =∑

t∈V St .It : idf of the term t,

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 21 / 27

Results: Runtime Comparison

1000 3000 5000 7500 10000 20000# of classes

101

102

103

104

105

106To

tal r

untim

e (s

econ

ds)

OVAMSVMRecall TreeDS-m2b

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 22 / 27

Results: Memory Comparison

1000 3000 5000 7500 10000 20000# of classes

0

10

20

30

40

50

60To

tal m

emor

y us

age

(GB)

16GB Limit

32GB Limit

OVAMSVMRecall TreeDS-m2b

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 23 / 27

Results: Prediction Performance Comparison

1000 3000 5000 7500 10000 20000# of classes

0.0

0.1

0.2

0.3

0.4

0.5

0.6M

AFOVAMSVMRecall TreeDS-m2b

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 24 / 27

Outline

1 Introduction

2 Multiclass to Binary Reduction

3 Double-Sampled Multiclass to Binary Reduction

4 Experimental Results

5 Conclusion

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 25 / 27

Conclusion:

Multiclass to binary reduction to handle large-class scenario andovercome class imbalance problem.

Use of double sampling to further improve computational complexityand memory usage.

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 26 / 27

Questions?

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 27 / 27