From Binary to Multiclass Classification · 2020. 1. 14. · From Binary to Multiclass...

CS 6355: Structured Prediction

From Binary to Multiclass Classification


We have seen binary classification

• We have seen linear models
• Learning algorithms
  – Perceptron
  – SVM
  – Logistic regression
• Prediction is simple
  – Given an example $\mathbf{x}$, output $= \operatorname{sgn}(\mathbf{w}^T\mathbf{x})$
  – Output is a single bit

What if we have more than two labels?

Reading for next lecture:

Erin L. Allwein, Robert E. Schapire, Yoram Singer, Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers, ICML 2000.

Multiclass classification

• Introduction
• Combining binary classifiers
  – One-vs-all
  – All-vs-all
  – Error correcting codes
• Training a single classifier
  – Multiclass SVM
  – Constraint classification


What is multiclass classification?

• An input can belong to one of K classes
• Training data: examples associated with a class label (a number from 1 to K)
• Prediction: given a new input, predict the class label

Each input belongs to exactly one class. Not more, not less.
• Otherwise, the problem is not multiclass classification
• If an input can be assigned multiple labels (think tags for emails rather than folders), it is called multi-label classification

Example applications: Images

– Input: hand-written character; Output: which character?
  [Figure: several handwritten glyphs that all map to the letter A]
– Input: a photograph of an object; Output: which of a set of categories of objects is it?
  • E.g., the Caltech 256 dataset
  [Figure: example photographs labeled car tire, car tire, duck, laptop]

Example applications: Language

• Input: a news article
• Output: which section of the newspaper should it be in?

• Input: an email
• Output: which folder should the email be placed into?

• Input: an audio command given to a car
• Output: which of a set of actions should be executed?


Binary to multiclass

• Can we use an algorithm for training binary classifiers to construct a multiclass classifier?
  – Answer: decompose the prediction into multiple binary decisions
• How to decompose?
  – One-vs-all
  – All-vs-all
  – Error correcting codes

General setting

• Input $\mathbf{x} \in \Re^n$
  – The inputs are represented by their feature vectors
• Output $\mathbf{y} \in \{1, 2, \cdots, K\}$
  – These classes represent domain-specific labels
• Learning: given a dataset $D = \{(\mathbf{x}_i, \mathbf{y}_i)\}$
  – Need a learning algorithm that uses D to construct a function that can predict $\mathbf{y}$ from $\mathbf{x}$
  – Goal: find a predictor that does well on the training data and has low generalization error
• Prediction/Inference: given an example $\mathbf{x}$ and the learned function, compute the class label for $\mathbf{x}$

1. One-vs-all classification

$\mathbf{x} \in \Re^n$, $\mathbf{y} \in \{1, 2, \cdots, K\}$

• Assumption: each class is individually separable from all the others
• Learning: given a dataset $D = \{(\mathbf{x}_i, \mathbf{y}_i)\}$
  – Decompose into K binary classification tasks
  – For class k, construct a binary classification task as:
    • Positive examples: elements of D with label k
    • Negative examples: all other elements of D
  – Train K binary classifiers $\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K$ using any learning algorithm we have seen


1. One-vs-all classification

$\mathbf{x} \in \Re^n$, $\mathbf{y} \in \{1, 2, \cdots, K\}$

• Assumption: each class is individually separable from all the others
• Learning: given a dataset $D = \{(\mathbf{x}_i, \mathbf{y}_i)\}$
  – Train K binary classifiers $\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K$ using any learning algorithm we have seen
• Prediction: "Winner Takes All"
  $\operatorname{argmax}_i \mathbf{w}_i^T\mathbf{x}$

Question: What is the dimensionality of each $\mathbf{w}_i$?
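The one-vs-all recipe can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the perceptron trainer, the toy data, and the 0-based labels are all assumptions made here, and any binary learner could be substituted. Note that each weight vector has the same length as the input.

```python
# Minimal one-vs-all sketch (illustrative assumptions: perceptron learner,
# toy data, labels numbered from 0 instead of 1).

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(examples, epochs=20):
    """examples: list of (x, y) with y in {+1, -1}. Returns a weight vector."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:  # mistake: move w toward y * x
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_all(data, num_classes):
    """data: list of (x, label). Trains one binary classifier per class."""
    weights = []
    for k in range(num_classes):
        # Class k is positive; every other class is negative.
        binary = [(x, 1 if label == k else -1) for x, label in data]
        weights.append(train_perceptron(binary))
    return weights

def predict_one_vs_all(weights, x):
    """Winner takes all: the label whose classifier gives the highest score."""
    scores = [dot(w, x) for w in weights]
    return max(range(len(scores)), key=lambda k: scores[k])

# Toy data: three well-separated clusters; the last feature is a constant bias.
data = [
    ([2.0, 0.0, 1.0], 0), ([3.0, 0.5, 1.0], 0),
    ([-2.0, 2.0, 1.0], 1), ([-3.0, 2.5, 1.0], 1),
    ([-2.0, -2.0, 1.0], 2), ([-2.5, -3.0, 1.0], 2),
]
weights = train_one_vs_all(data, num_classes=3)
predictions = [predict_one_vs_all(weights, x) for x, _ in data]
```

On this toy data every class is separable from the rest, so the assumption of the method holds and winner-takes-all recovers the training labels.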


Visualizing One-vs-all

From the full dataset, construct three binary classifiers, one for each class

[Figure: a dataset with blue, red, and green points]
• $\mathbf{w}_{blue}^T\mathbf{x} > 0$ for blue inputs
• $\mathbf{w}_{red}^T\mathbf{x} > 0$ for red inputs
• $\mathbf{w}_{green}^T\mathbf{x} > 0$ for green inputs

Notation: $\mathbf{w}_{blue}^T\mathbf{x}$ is the score for the blue label

Winner Take All will predict the right answer. Only the correct label will have a positive score

One-vs-all may not always work

Black points are not separable with a single binary classifier

The decomposition will not work for these cases!

[Figure: a dataset with blue, red, green, and black points]
• $\mathbf{w}_{blue}^T\mathbf{x} > 0$ for blue inputs
• $\mathbf{w}_{red}^T\mathbf{x} > 0$ for red inputs
• $\mathbf{w}_{green}^T\mathbf{x} > 0$ for green inputs
• ??? for black inputs

One-vs-all classification: Summary

• Easy to learn
  – Use any binary classifier learning algorithm
• Problems
  – No theoretical justification
  – Calibration issues
    • We are comparing scores produced by K classifiers trained independently. No reason for the scores to be in the same numerical range!
  – Might not always work
    • Yet, works fairly well in many cases, especially if the underlying binary classifiers are tuned, regularized
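The calibration issue can be seen in a small made-up example. The weights and input below are invented for illustration. Rescaling one classifier by a positive constant changes none of its own binary decisions, yet it can flip the winner-takes-all prediction.

```python
# Illustration of the calibration issue; all numbers are made up.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def winner_takes_all(weights, x):
    scores = [dot(w, x) for w in weights]
    return max(range(len(scores)), key=lambda k: scores[k])

w_a = [1.0, 0.0]
w_b = [0.0, 1.0]
x = [2.0, 1.0]

before = winner_takes_all([w_a, w_b], x)        # scores 2.0 vs 1.0

# Scaling w_b by a positive constant keeps the sign of its score on every
# input, so classifier b is the same binary classifier as before.
w_b_scaled = [10.0 * v for v in w_b]
after = winner_takes_all([w_a, w_b_scaled], x)  # scores 2.0 vs 10.0

same_binary_decision = (dot(w_b, x) > 0) == (dot(w_b_scaled, x) > 0)
```

Since independent training gives no control over the scale of each classifier, the comparison between scores is only as meaningful as that scale happens to be.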


2. All-vs-all classification

Sometimes called one-vs-one

$\mathbf{x} \in \Re^n$, $\mathbf{y} \in \{1, 2, \cdots, K\}$

• Assumption: every pair of classes is separable
• Learning: given a dataset $D = \{(\mathbf{x}_i, \mathbf{y}_i)\}$,
  – For every pair of labels (j, k), create a binary classifier with:
    • Positive examples: all examples with label j
    • Negative examples: all examples with label k
  – Train $\binom{K}{2} = \frac{K(K-1)}{2}$ classifiers to separate every pair of labels from each other

2. All-vs-all classification

Sometimes called one-vs-one

$\mathbf{x} \in \Re^n$, $\mathbf{y} \in \{1, 2, \cdots, K\}$

• Assumption: every pair of classes is separable
• Learning: given a dataset $D = \{(\mathbf{x}_i, \mathbf{y}_i)\}$,
  – Train $\binom{K}{2} = \frac{K(K-1)}{2}$ classifiers to separate every pair of labels from each other
• Prediction: more complex, each label can get up to K-1 votes
  – How to combine the votes? Many methods
    • Majority: pick the label with maximum votes
    • Organize a tournament between the labels
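Majority voting over the pairwise classifiers might look like the sketch below. The pairwise weight vectors are hand-picked for illustration rather than learned, labels are numbered from 0, and ties are broken toward the smallest label, which is an arbitrary choice of this sketch.

```python
from itertools import combinations

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_all_vs_all(pair_weights, x, num_classes):
    """pair_weights[(j, k)] scores label j (positive) against label k (negative)."""
    votes = {label: 0 for label in range(num_classes)}
    for j, k in combinations(range(num_classes), 2):
        winner = j if dot(pair_weights[(j, k)], x) > 0 else k
        votes[winner] += 1
    most = max(votes.values())
    # Tie-break toward the smallest label index (arbitrary).
    predicted = min(label for label, count in votes.items() if count == most)
    return predicted, votes

# Hand-picked weights for K = 3 labels; K(K-1)/2 = 3 classifiers in total.
pair_weights = {
    (0, 1): [1.0, 0.0, 0.0],
    (0, 2): [1.0, 0.0, 0.0],
    (1, 2): [0.0, 1.0, 0.0],
}
predicted, votes = predict_all_vs_all(pair_weights, [-2.0, 2.0, 1.0], num_classes=3)
```

Here label 1 wins both of its pairwise contests, so it collects the maximum possible K-1 = 2 votes.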

All-vs-all classification

• Every pair of labels is linearly separable here
  – When a pair of labels is considered, all others are ignored
• Problems
  1. $O(K^2)$ weight vectors to train and store
  2. Size of training set for a pair of labels could be very small, leading to overfitting of the binary classifiers
  3. Prediction is often ad-hoc and might be unstable
     E.g.: What if two classes get the same number of votes? For a tournament, what is the sequence in which the labels compete?

3. Error correcting output codes (ECOC)

• Each binary classifier provides one bit of information
• With K labels, we only need $\log_2 K$ bits to represent the label
  – One-vs-all uses K bits (one per classifier)
  – All-vs-all uses $O(K^2)$ bits
• Can we get by with $O(\log K)$ classifiers?
  – Yes! Encode each label as a binary string
  – Or alternatively, if we do train more than $O(\log K)$ classifiers, can we use the redundancy to improve classification accuracy?

Using $\log_2 K$ classifiers

• Learning:
  – Represent each label by a bit string (i.e., its code)
  – Train one binary classifier for each bit
• Prediction:
  – Use the predictions from all the classifiers to create a $\log_2 K$-bit string that uniquely decides the output
• What could go wrong here?
  – Even if one of the classifiers makes a mistake, final prediction is wrong!

8 classes, code-length = 3

label#   Code
0        0 0 0
1        0 0 1
2        0 1 0
3        0 1 1
4        1 0 0
5        1 0 1
6        1 1 0
7        1 1 1

Example: For some example, if the three classifiers predict 0, 1 and 1, then the label is 3
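The fragility of the minimal code can be checked directly. The helper names below are invented for this sketch. With the 3-bit table, every bit string is a valid codeword, so a single wrong bit silently decodes to a different label.

```python
# Minimal binary code for 8 labels: label <-> 3-bit string (helper names
# are illustrative, not from the lecture).

def encode(label, num_bits):
    """Bits of `label`, most significant bit first."""
    return [(label >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]

def decode(bits):
    """Inverse of encode: read the bits as a binary number."""
    label = 0
    for bit in bits:
        label = (label << 1) | bit
    return label

code_for_3 = encode(3, 3)      # the row for label 3 in the table
clean = decode([0, 1, 1])      # all three classifiers correct
corrupted = decode([1, 1, 1])  # first classifier wrong: a different valid label
```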


Error correcting output coding

Answer: Use redundancy
• Assign a binary string to each label
  – Could be random
  – Length of the code word $L \geq \log_2 K$ is a parameter
• Train one binary classifier for each bit
  – Effectively, split the data into random dichotomies
  – We need only $\log_2 K$ bits
    • Additional bits act as an error correcting code

8 classes, code-length = 5

#   Code
0   0 0 0 0 0
1   0 0 1 1 0
2   0 1 0 1 1
3   0 1 1 0 1
4   1 0 0 1 1
5   1 0 1 0 0
6   1 1 0 0 0
7   1 1 1 1 1


How to predict?

• Prediction
  – Run all L binary classifiers on the example
  – Gives us a predicted bit string of length L
  – Output = label whose code word is "closest" to the prediction
  – Closest defined using Hamming distance
    • Longer code length is better, better error-correction
• Example
  – Suppose the binary classifiers here predict 11010
  – The closest label to this is 6, with code word 11000

8 classes, code-length = 5

#   Code
0   0 0 0 0 0
1   0 0 1 1 0
2   0 1 0 1 1
3   0 1 1 0 1
4   1 0 0 1 1
5   1 0 1 0 0
6   1 1 0 0 0
7   1 1 1 1 1

One-vs-all is a special case of this scheme. How?
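Hamming-distance decoding with the 5-bit table can be sketched as follows. The code words are copied from the table; the function names and the tie-breaking rule (smallest label wins) are assumptions of this sketch.

```python
# ECOC decoding by Hamming distance, using the 8-class, length-5 code table.
CODE = {
    0: "00000", 1: "00110", 2: "01011", 3: "01101",
    4: "10011", 5: "10100", 6: "11000", 7: "11111",
}

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def decode(predicted_bits):
    """Label whose code word is closest to the predicted bit string.

    Ties go to the smallest label, an arbitrary choice made here.
    """
    return min(sorted(CODE), key=lambda label: hamming(CODE[label], predicted_bits))

predicted_label = decode("11010")
distance = hamming(CODE[predicted_label], "11010")
```

For the slide's example, the prediction 11010 differs from 11000 in one position and from every other code word in at least two, so label 6 is returned.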


Error correcting codes: Discussion

• Assumes that columns are independent
  – Otherwise, ineffective encoding
• Strong theoretical results that depend on code length
  – If minimal Hamming distance between two rows is d, then the prediction can correct up to $\lfloor (d-1)/2 \rfloor$ errors in the binary predictions
• Code assignment could be random, or designed for the dataset/task
• One-vs-all and all-vs-all are special cases
  – All-vs-all needs a ternary code (not binary)

Exercise: Convince yourself that this is correct
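The distance bound can be explored numerically. The sketch below computes the minimum row distance d and the guaranteed number of correctable errors, (d-1)/2 rounded down, for three codes: the 5-bit table from the earlier slide, a one-hot code for K = 4 (one way to view one-vs-all as a code matrix), and a made-up 4-label code chosen here to have d = 3.

```python
from itertools import combinations

def hamming(a, b):
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def min_row_distance(codewords):
    """Smallest Hamming distance between any two rows of the code matrix."""
    return min(hamming(a, b) for a, b in combinations(codewords, 2))

def correctable_errors(codewords):
    """Guaranteed number of bit errors that nearest-codeword decoding fixes."""
    return (min_row_distance(codewords) - 1) // 2

slide_code = ["00000", "00110", "01011", "01101",
              "10011", "10100", "11000", "11111"]
one_hot_code = ["1000", "0100", "0010", "0001"]     # one-vs-all view, K = 4
toy_code = ["00000", "01011", "10101", "11110"]     # made up for illustration

d_slide = min_row_distance(slide_code)
d_one_hot = min_row_distance(one_hot_code)
d_toy = min_row_distance(toy_code)
```

By this computation the 5-bit table has d = 2, so the bound guarantees no corrections for it, even though particular errors (such as 11010) still decode uniquely. The made-up code with d = 3 is guaranteed to correct any single bit error.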

Decomposition methods: Summary

• General idea
  – Decompose the multiclass problem into many binary problems
  – We know how to train binary classifiers
  – Prediction depends on the decomposition
    • Constructs the multiclass label from the output of the binary classifiers
• Learning optimizes local correctness
  – Each binary classifier does not need to be globally correct
    • That is, the classifiers do not have to agree with each other
  – The learning algorithm is not even aware of the prediction procedure!
• Poor decomposition gives poor performance
  – Difficult local problems, can be "unnatural"
    • E.g., for ECOC, why should the binary problems be separable?


Motivation

• Decomposition methods
  – Do not account for how the final predictor will be used
  – Do not optimize any global measure of correctness
• Goal: to train a multiclass classifier that is "global"

Recall: Margin for binary classifiers

The margin of a hyperplane for a dataset: the distance between the hyperplane and the data point nearest to it

[Figure: positive (+) and negative (-) points on either side of a hyperplane, with the margin with respect to this hyperplane marked]


Multiclass margin

Defined as the score difference between the highest scoring label and the second one

[Figure: bar chart of the score for each label (Blue, Red, Green, Black), where the score for a label $= \mathbf{w}_{label}^T\mathbf{x}$, with the multiclass margin marked]
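As a small numerical illustration of this definition, the sketch below scores one input under four label-specific weight vectors and takes the gap between the top two scores. The weights and the input are made up.

```python
# Multiclass margin: gap between the best and second-best label scores.
# All numbers here are invented for illustration.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def multiclass_margin(weights_by_label, x):
    scores = {label: dot(w, x) for label, w in weights_by_label.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, scores[best] - scores[runner_up]

weights_by_label = {
    "blue": [1.0, 1.0],    # score 3.0
    "red": [0.0, 1.0],     # score 2.0
    "green": [1.0, -1.0],  # score -1.0
    "black": [-1.0, 0.0],  # score -1.0
}
best_label, margin = multiclass_margin(weights_by_label, [1.0, 2.0])
```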

Multiclass SVM (Intuition)

• Recall: Binary SVM
  – Maximize margin
  – Equivalently, minimize norm of weights such that the closest points to the hyperplane have a score ±1
• Multiclass SVM
  – Each label has a different weight vector (like one-vs-all)
  – Maximize multiclass margin
  – Equivalently, minimize total norm of the weights such that the true label is scored at least 1 more than the second best one

Multiclass SVM in the separable case

Recall hard binary SVM: $\min_{\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w}$ such that $y_i \mathbf{w}^T\mathbf{x}_i \geq 1$ for every example, with $y_i \in \{-1, +1\}$

The multiclass version minimizes $\mathrm{Regularizer}(\mathbf{w}_1, \cdots, \mathbf{w}_K)$ subject to $\mathrm{score}(\mathbf{y}_i) - \mathrm{score}(k) \geq 1$:

$$\min_{\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K} \frac{1}{2}\sum_{k} \mathbf{w}_k^T\mathbf{w}_k$$

$$\text{s.t.} \quad \mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i - \mathbf{w}_k^T\mathbf{x}_i \geq 1 \qquad \forall (\mathbf{x}_i, \mathbf{y}_i) \in D,\; k \in \{1, \cdots, K\},\; k \neq \mathbf{y}_i$$

• Objective: size of the weights. Effectively, regularizer
• Constraint: the score for the true label is higher than the score for any other label by 1

Problems with this?
• What if there is no set of weights that achieves this separation? That is, what if the data is not linearly separable?

Multiclass SVM: General case

$$\min_{\mathbf{w}_1, \cdots, \mathbf{w}_K,\, \xi} \frac{1}{2}\sum_{k} \mathbf{w}_k^T\mathbf{w}_k + C \sum_{i} \xi_i$$

$$\text{s.t.} \quad \mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i - \mathbf{w}_k^T\mathbf{x}_i \geq 1 - \xi_i \qquad \forall (\mathbf{x}_i, \mathbf{y}_i) \in D,\; k \in \{1, \cdots, K\},\; k \neq \mathbf{y}_i$$

$$\xi_i \geq 0 \qquad \forall i$$

• $\frac{1}{2}\sum_k \mathbf{w}_k^T\mathbf{w}_k$: size of the weights. Effectively, regularizer
• First constraint: the score for the true label is higher than the score for any other label by $1 - \xi_i$
• $\xi_i$: slack variables. Not all examples need to satisfy the margin constraint.
• $\sum_i \xi_i$: total slack. Don't allow too many examples to violate the margin constraint
• $\xi_i \geq 0$: slack variables must be non-negative

Multiclass SVM: General case

Solving the constrained problem with slack variables is equivalent to solving

$$\min_{\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K} \frac{1}{2}\sum_{k} \mathbf{w}_k^T\mathbf{w}_k + C \sum_{(\mathbf{x}_i, \mathbf{y}_i) \in D} \max\Bigl(0, \max_{k \neq \mathbf{y}_i} \bigl(\mathbf{w}_k^T\mathbf{x}_i - \mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i + 1\bigr)\Bigr)$$

Why?
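One way to answer the question, sketched under the assumption that the constrained problem is the standard slack formulation (slack $\xi_i \geq 0$ per example, margin constraints $\mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i - \mathbf{w}_k^T\mathbf{x}_i \geq 1 - \xi_i$, and $C > 0$): for fixed weights, each slack variable can be minimized on its own.

```latex
\begin{align*}
&\text{For fixed } \mathbf{w}_1, \cdots, \mathbf{w}_K \text{, the constraints on } \xi_i \text{ read} \\
&\qquad \xi_i \geq 0, \qquad
  \xi_i \geq \mathbf{w}_k^T\mathbf{x}_i - \mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i + 1
  \quad \forall k \neq \mathbf{y}_i . \\
&\text{The objective is increasing in } \xi_i \text{ (since } C > 0\text{), so the optimum takes the smallest feasible value:} \\
&\qquad \xi_i^{*} = \max\Bigl(0,\; \max_{k \neq \mathbf{y}_i}
  \bigl(\mathbf{w}_k^T\mathbf{x}_i - \mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i + 1\bigr)\Bigr). \\
&\text{Substituting } \xi_i^{*} \text{ into } \tfrac{1}{2}\sum_k \mathbf{w}_k^T\mathbf{w}_k + C\sum_i \xi_i
  \text{ removes the constraints and gives the unconstrained objective.}
\end{align*}
```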


Multiclass SVM: General case

$$\min_{\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K} \frac{1}{2}\sum_{k} \mathbf{w}_k^T\mathbf{w}_k + C \sum_{(\mathbf{x}_i, \mathbf{y}_i) \in D} \max\Bigl(0, \max_{k \neq \mathbf{y}_i} \bigl(\mathbf{w}_k^T\mathbf{x}_i - \mathbf{w}_{\mathbf{y}_i}^T\mathbf{x}_i + 1\bigr)\Bigr)$$

• $\frac{1}{2}\sum_k \mathbf{w}_k^T\mathbf{w}_k$: size of the weights. Effectively, regularizer
• $\max(0, \cdots)$: the multiclass hinge loss
• $C$: the tradeoff hyperparameter
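The objective can be evaluated term by term. The following sketch computes the multiclass hinge loss and the full objective for made-up weights and data; labels are numbered from 0 and the function names are invented here.

```python
# Multiclass SVM objective: regularizer + C * sum of multiclass hinge losses.
# Weights and data are made up for illustration.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def multiclass_hinge(weights, x, y):
    """max(0, max over k != y of (w_k.x - w_y.x + 1))."""
    true_score = dot(weights[y], x)
    worst_violation = max(
        dot(weights[k], x) - true_score + 1.0
        for k in range(len(weights)) if k != y
    )
    return max(0.0, worst_violation)

def svm_objective(weights, data, C):
    regularizer = 0.5 * sum(dot(w, w) for w in weights)
    total_loss = sum(multiclass_hinge(weights, x, y) for x, y in data)
    return regularizer + C * total_loss

weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
data = [([2.0, 0.0], 0),    # true label wins by 2: loss 0
        ([0.5, 0.25], 1)]   # label 0 outscores the true label: loss 1.25

losses = [multiclass_hinge(weights, x, y) for x, y in data]
objective = svm_objective(weights, data, C=1.0)
```

The first example satisfies the margin and contributes nothing; the second is misclassified and contributes its violation plus the required margin of 1.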

Multiclass SVM

• Generalizes binary SVM algorithm
  – If we have only two classes, this reduces to the binary SVM (up to scale)
• Comes with similar generalization guarantees as the binary SVM
• Can be trained using different optimization methods
  – Stochastic sub-gradient descent can be generalized
    • Try as exercise
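One possible shape for a stochastic sub-gradient step is sketched below. It is an assumption-laden illustration, not the lecture's solution to the exercise: it takes the per-example objective to be the regularizer plus C times the hinge loss on that example (some treatments rescale by the dataset size), uses 0-based labels, and the function name is invented.

```python
# One stochastic sub-gradient step for the multiclass hinge objective.
# Assumed per-example objective: 0.5 * sum_k ||w_k||^2 + C * hinge(x, y).

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(weights, x, y, C, lr):
    num_classes = len(weights)
    # Highest scoring wrong label: the one that defines the hinge loss.
    k_star = max((k for k in range(num_classes) if k != y),
                 key=lambda k: dot(weights[k], x))
    violated = dot(weights[k_star], x) - dot(weights[y], x) + 1.0 > 0

    # Regularizer sub-gradient is w_k itself, so every vector shrinks.
    new_weights = [[(1.0 - lr) * v for v in w] for w in weights]
    if violated:
        # Hinge sub-gradient: -x for the true label, +x for the violator.
        new_weights[y] = [v + lr * C * xi for v, xi in zip(new_weights[y], x)]
        new_weights[k_star] = [v - lr * C * xi
                               for v, xi in zip(new_weights[k_star], x)]
    return new_weights

start = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
updated = sgd_step(start, x=[1.0, 2.0], y=1, C=1.0, lr=0.5)
```

Only two weight vectors receive a data-dependent update per step: the true label's vector moves toward the input and the most violating label's vector moves away from it.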

Multiclass SVM: Summary

• Training:
  – Optimize the SVM objective
• Prediction:
  – Winner takes all: $\operatorname{argmax}_i \mathbf{w}_i^T\mathbf{x}$
• With K labels and inputs in $\Re^n$, we have nK weights in all
  – Same as one-vs-all
  – But comes with guarantees!

Questions?


Let us examine one-vs-all again

• Training:
  – Create K binary classifiers $\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K$
  – $\mathbf{w}_i$ separates class i from all others
• Prediction: $\operatorname{argmax}_i \mathbf{w}_i^T\mathbf{x}$
• Observations:
  1. At training time, we require $\mathbf{w}_i^T\mathbf{x}$ to be positive for examples of class i.
  2. Really, all we need is for $\mathbf{w}_i^T\mathbf{x}$ to be more than all others
     The requirement of being positive is more strict


Linear Separability with multiple classes

For examples with label i, we want $\mathbf{w}_i^T\mathbf{x} > \mathbf{w}_j^T\mathbf{x}$ for all $j \neq i$

Rewrite inputs and weight vector
• Stack all weight vectors into an nK-dimensional vector: $\mathbf{w} = (\mathbf{w}_1^T, \mathbf{w}_2^T, \cdots, \mathbf{w}_K^T)^T$
• Define a feature vector for label i being associated to input $\mathbf{x}$: $\phi(\mathbf{x}, i) = (\mathbf{0}^T, \cdots, \mathbf{x}^T, \cdots, \mathbf{0}^T)^T$, with $\mathbf{x}$ in the i-th block and zeros everywhere else

This is called the Kesler construction
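The construction is easy to check numerically. In the sketch below (function names invented, labels numbered from 0, numbers made up), the stacked weight vector applied to the block feature vector gives back exactly the per-label score.

```python
# Kesler construction: stacked weights and block feature vectors.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def stack(weights):
    """Concatenate K weight vectors of length n into one vector of length nK."""
    return [v for w in weights for v in w]

def phi(x, label, num_classes):
    """x in the block for `label`, zeros everywhere else."""
    n = len(x)
    features = [0.0] * (n * num_classes)
    features[label * n:(label + 1) * n] = list(x)
    return features

weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # K = 3, n = 2
x = [1.0, 2.0]
w = stack(weights)

stacked_scores = [dot(w, phi(x, i, 3)) for i in range(3)]
per_label_scores = [dot(w_i, x) for w_i in weights]
```

Because the two score lists agree, any statement about comparing the per-label scores can be restated as a statement about a single weight vector in nK dimensions.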

Linear Separability with multiple classes

For examples with label i, we want $\mathbf{w}_i^T\mathbf{x} > \mathbf{w}_j^T\mathbf{x}$ for all $j \neq i$

Equivalent requirement, with $\mathbf{w}$ the stacked weight vector and $\phi(\mathbf{x}, i)$ holding $\mathbf{x}$ in the i-th block and zeros everywhere else:
$\mathbf{w}^T\phi(\mathbf{x}, i) > \mathbf{w}^T\phi(\mathbf{x}, j)$

Or:
$\mathbf{w}^T[\phi(\mathbf{x}, i) - \phi(\mathbf{x}, j)] > 0$


Linear Separability with multiple classes

For examples with label i, we want $\mathbf{w}_i^T\mathbf{x} > \mathbf{w}_j^T\mathbf{x}$ for all $j \neq i$
Or equivalently: $\mathbf{w}^T[\phi(\mathbf{x}, i) - \phi(\mathbf{x}, j)] > 0$
(the difference vector has $\mathbf{x}$ in the i-th block and $-\mathbf{x}$ in the j-th block)

That is, the following binary task in nK dimensions should be linearly separable.
For every example $(\mathbf{x}, i)$ in the dataset and all other labels j:
• Positive examples: $\phi(\mathbf{x}, i) - \phi(\mathbf{x}, j)$
• Negative examples: $\phi(\mathbf{x}, j) - \phi(\mathbf{x}, i)$


Constraint Classification

• Training:
  – Given a dataset $\{(\mathbf{x}, y)\}$, create a binary classification task
    • Positive examples: $\phi(\mathbf{x}, y) - \phi(\mathbf{x}, y')$
    • Negative examples: $\phi(\mathbf{x}, y') - \phi(\mathbf{x}, y)$
    for every example, for every $y' \neq y$
  – Use your favorite algorithm to train a binary classifier
• Prediction: given an nK-dimensional weight vector $\mathbf{w}$ and a new example $\mathbf{x}$
  $\operatorname{argmax}_y \mathbf{w}^T\phi(\mathbf{x}, y)$

Exercise: What does the perceptron update rule look like in terms of the $\phi$s? Interpret the update step

Note: The binary classification task only expresses preferences over label assignments

This approach extends to training a ranker, can use partial preferences too, more on this later…
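A minimal end-to-end sketch of constraint classification follows. The perceptron is just one choice of "favorite algorithm", and the toy data, 0-based labels, and function names are assumptions of this sketch.

```python
# Constraint classification: build the binary task in nK dimensions, train a
# single binary classifier, and predict with argmax over labels.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def phi(x, label, num_classes):
    n = len(x)
    features = [0.0] * (n * num_classes)
    features[label * n:(label + 1) * n] = list(x)
    return features

def build_binary_task(data, num_classes):
    examples = []
    for x, y in data:
        for y_other in range(num_classes):
            if y_other == y:
                continue
            diff = [a - b for a, b in zip(phi(x, y, num_classes),
                                          phi(x, y_other, num_classes))]
            examples.append((diff, +1))                 # phi(x,y) - phi(x,y')
            examples.append(([-d for d in diff], -1))   # phi(x,y') - phi(x,y)
    return examples

def train_perceptron(examples, epochs=20):
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for z, label in examples:
            if label * dot(w, z) <= 0:  # mistake: move w toward label * z
                w = [wi + label * zi for wi, zi in zip(w, z)]
    return w

def predict(w, x, num_classes):
    return max(range(num_classes), key=lambda y: dot(w, phi(x, y, num_classes)))

data = [
    ([2.0, 0.0, 1.0], 0), ([3.0, 0.5, 1.0], 0),
    ([-2.0, 2.0, 1.0], 1), ([-3.0, 2.5, 1.0], 1),
    ([-2.0, -2.0, 1.0], 2), ([-2.5, -3.0, 1.0], 2),
]
binary_task = build_binary_task(data, num_classes=3)
w = train_perceptron(binary_task)
predictions = [predict(w, x, 3) for x, _ in data]
```

Each perceptron mistake here adds the input to the true label's block and subtracts it from the competing label's block, which is one way to read the update asked about in the exercise.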


A second look at the multiclass margin

Defined as the score difference between the highest scoring label and the second one

[Figure: bar chart of the score for each label (Blue, Red, Green, Black), with the multiclass margin marked]

In terms of Kesler construction:
$\min_{y' \neq y} \bigl(\mathbf{w}^T\phi(\mathbf{x}, y) - \mathbf{w}^T\phi(\mathbf{x}, y')\bigr)$
Here y is the label that has the highest score

Discussion

• The number of weights for multiclass SVM and constraint classification is still the same as one-vs-all, much less than for all-vs-all, which trains $K(K-1)/2$ classifiers
• But both still account for all pairwise label preferences
  – Multiclass SVM via the definition of the learning objective
  – Constraint classification by constructing a binary classification problem
• Both come with theoretical guarantees for generalization
• Important idea that is applicable when we move to arbitrary structures

Questions?

Training multiclass classifiers: Wrap-up

• Label belongs to a set that has more than two elements
• Methods
  – Decomposition into a collection of binary (local) decisions
    • One-vs-all
    • All-vs-all
    • Error correcting codes
  – Training a single (global) classifier
    • Multiclass SVM
    • Constraint classification
• Exercise: Which of these will work for this case?

Questions?

Next steps…

• Build up to structured prediction
  – Multiclass is really a simple structure
• Different aspects of structured prediction
  – Deciding the structure, training, inference
• Sequence models