Transcript of "Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks" (2020. 9. 20.)

Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks
Pedram Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum
Structured Prediction

• We are interested in learning a function F: X → Y
  • X: input variables
  • Y: output variables
• We can define F through an energy function E(x, y): F(x) = argmin_y E(x, y)
• For a Gibbs distribution: P(y | x) = exp(−E(x, y)) / Z(x)
StructuredPredictionEnergyNetworks(SPENs)
• Ifisparameterizedusingadifferentiablemodelsuchasadeepneuralnetwork:• WecanfindalocalminimumofEusinggradientdescent
• Theenergynetworksexpressthecorrelationamonginputandoutputvariables.• Traditionallygraphicalmodelsareusedforrepresentingthecorrelationamongoutputvariables.• Inference isintractable formostofexpressive graphicalmodels
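The gradient-descent inference mentioned above can be sketched as follows. This is a minimal illustration: the quadratic toy energy and all names here are stand-ins for a real deep energy network, not the paper's implementation.

```python
import numpy as np

def gd_inference(energy_grad, y0, steps=100, lr=0.1):
    """Minimize a differentiable energy over the relaxed output space [0, 1]^n
    by plain gradient descent, starting from y0."""
    y = y0.astype(float)
    for _ in range(steps):
        y = np.clip(y - lr * energy_grad(y), 0.0, 1.0)  # stay inside the box
    return y

# Toy energy E(y) = ||y - t||^2 with gradient 2*(y - t); its minimum is at t.
t = np.array([0.2, 0.8, 1.0])
y_hat = gd_inference(lambda y: 2 * (y - t), np.zeros(3))
```

With a deep network, `energy_grad` would instead be the network's gradient with respect to its output variables (e.g. via autodiff).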
Energy Models
[picture from Belanger (2016)]
[picture from Altinel (2018)]
Training SPENs

• Structural SVM (Belanger and McCallum, 2016)
• End-to-End (Belanger et al., 2017)
• Value-based training (Gygli et al., 2017)
• Inference Network (Lifu Tu and Kevin Gimpel, 2018)
• Rank-Based Training (Rooshenas et al., 2018)
Indirect Supervision

• Data annotation is expensive, especially for structured outputs.
• Domain knowledge as the source of supervision:
  • It can be written as a reward function R(x, y)
  • R(x, y) evaluates a pair of input and output configurations into a scalar value
  • For a given x, we are looking for the best y that maximizes R(x, y)
Search-Guided Training

[Figure: an animation builds up samples y0 … y5 on the energy surface, each paired with a search-improved neighbor on the reward function; a ranking violation is highlighted.]

• We have a reward function that provides indirect supervision.
• We want to learn a smooth version of the reward function such that we can use gradient-descent inference at test time.
• We sample a point from the energy function using noisy gradient-descent inference.
• Then we project the sample to the domain of the reward function (the sample is a point in the simplex, but the domain of the reward function is often discrete, i.e., the vertices of the simplex).
• Then the search procedure uses the sample as input and returns an output structure by searching the reward function.
• We expect that the two points have the same ranking on the reward function and on the negative of the energy function.
• When we find a pair of points that violates the ranking constraints (a ranking violation), we update the energy function towards reducing the violation.
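The steps above can be sketched end to end on a toy problem. Everything here is an illustrative assumption rather than the paper's setup: a bilinear energy stands in for the deep energy network, the "reward" is the simple separable rule y*(x) = 1[x > 0], the search operator is greedy bit-flipping, and the margin and learning rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.normal(scale=0.1, size=(n, n))  # parameters of the toy bilinear energy

def energy(x, y, W):
    # illustrative bilinear energy E(x, y) = -y . (W x); not the paper's network
    return -y @ (W @ x)

def reward(x, y):
    # indirect supervision: a hand-written rule plays the reward, y*(x) = 1[x > 0]
    ystar = (x > 0).astype(float)
    return -np.abs(y - ystar).sum()

def noisy_gd_sample(x, W, steps=20, lr=0.5, noise=0.3):
    # noisy gradient-descent inference on the relaxed output space [0, 1]^n
    y = rng.uniform(size=n)
    for _ in range(steps):
        grad = -(W @ x)  # dE/dy for the bilinear energy
        y = np.clip(y - lr * grad + noise * rng.normal(size=n), 0.0, 1.0)
    return y

def project(y):
    # project the relaxed sample to the discrete domain of the reward
    return (y > 0.5).astype(float)

def local_search(x, y):
    # greedy bit-flip search on the reward, starting from the projected sample
    best = y.copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best.copy()
            cand[i] = 1 - cand[i]
            if reward(x, cand) > reward(x, best):
                best, improved = cand, True
    return best

margin, lr_w = 1.0, 0.1
for _ in range(300):
    x = rng.normal(size=n)
    y_s = project(noisy_gd_sample(x, W))  # sample from the energy function
    y_b = local_search(x, y_s)            # search the reward function
    if reward(x, y_b) > reward(x, y_s):   # search found a strictly better point
        violation = margin - (energy(x, y_s, W) - energy(x, y_b, W))
        if violation > 0:  # ranking violation: E does not prefer y_b by a margin
            # hinge-loss gradient for the bilinear energy:
            # push E(x, y_b) down and E(x, y_s) up
            W += lr_w * (np.outer(y_b, x) - np.outer(y_s, x))
```

After training, gradient-descent inference on the learned energy should recover outputs that the reward ranks highly, even though no labeled (x, y) pairs were used.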
Task-Loss as Reward Function for Multi-Label Classification

• The simplest form of indirect supervision is to use the task loss as the reward function.
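For instance, when labels are available, the reward can simply be a task metric. The example below uses per-example F1 on binary label vectors, a common multi-label metric; the choice of F1 here is an assumption for illustration.

```python
import numpy as np

def f1_reward(y_pred, y_true):
    """Task metric as reward: per-example F1 between a predicted binary
    label vector and the ground-truth labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / y_pred.sum()
    recall = tp / y_true.sum()
    return 2 * precision * recall / (precision + recall)
```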
Domain Knowledge as Reward Function for Citation Field Extraction
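A domain-knowledge reward for citation field extraction can score a candidate tagging against hand-written rules. The specific rules, tags, and weights below are illustrative assumptions, not the exact constraints used in the talk.

```python
import re

def citation_reward(tokens, tags):
    """Score a (tokens, tags) pair with simple domain-knowledge rules.
    Rules and weights are hypothetical examples."""
    score = 0.0
    for tok, tag in zip(tokens, tags):
        # a 4-digit number like 1998 or 2016 is most likely a year
        if re.fullmatch(r"(19|20)\d{2}", tok):
            score += 1.0 if tag == "year" else -1.0
        # tokens like "pp." strongly suggest a pages field
        if tok.lower() in {"pp.", "pages"}:
            score += 1.0 if tag == "pages" else -1.0
    # structural rule: author tokens usually precede the title
    if "author" in tags and "title" in tags:
        last_author = max(i for i, t in enumerate(tags) if t == "author")
        first_title = min(i for i, t in enumerate(tags) if t == "title")
        if last_author < first_title:
            score += 1.0
    return score
```

A search operator can then flip tags to climb this score, with no labeled citations needed.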
Energy Model

[Figure: the energy network for citation field extraction — token input embeddings (e.g. "Wei Li.", "Deep Learning", "for", …) and the tag distribution (author, title, …) feed a convolutional layer with multiple filters and different window sizes, followed by max pooling and concatenation, and a multi-layer perceptron that outputs the energy.]
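The architecture in the figure can be sketched in plain numpy. All dimensions, parameter scales, and the exact layer wiring below are assumptions made for illustration; the real model is a trained deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, T, F = 100, 8, 6, 4  # vocab size, embed dim, #tags, filters per window size
emb = rng.normal(scale=0.1, size=(V, D))  # token embedding table

def conv_energy(token_ids, tag_dist, params):
    """Sketch of the figure's energy network: concatenate token embeddings
    with per-token tag distributions, apply 1-D convolutions with several
    window sizes, max-pool each filter over positions, and score the
    concatenated features with a multi-layer perceptron."""
    X = np.concatenate([emb[token_ids], tag_dist], axis=1)  # (L, D + T)
    pooled = []
    for w, K in params["filters"].items():  # K has shape (F, w * (D + T))
        L = X.shape[0] - w + 1
        windows = np.stack([X[i:i + w].ravel() for i in range(L)])
        pooled.append((windows @ K.T).max(axis=0))  # max pooling per filter
    h = np.concatenate(pooled)                      # concatenation
    h = np.tanh(h @ params["W1"] + params["b1"])    # MLP hidden layer
    return float(h @ params["w2"])                  # scalar energy

params = {
    "filters": {2: rng.normal(scale=0.1, size=(F, 2 * (D + T))),
                3: rng.normal(scale=0.1, size=(F, 3 * (D + T)))},
    "W1": rng.normal(scale=0.1, size=(2 * F, 8)),
    "b1": np.zeros(8),
    "w2": rng.normal(scale=0.1, size=8),
}
```

Because the energy is differentiable in `tag_dist`, gradient-descent inference over the tag distribution is possible at test time.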
Performance on Citation Field Extraction
Semi-Supervised Setting

• Alternately use the output of search and the ground-truth labels for training.
Shape Parser

[Figure: an input image I is parsed into a program, e.g. c(32,32,28) − c(32,32,24) + t(32,32,20); a graphic engine executes the predicted program to render an output image O, which can be compared with the input.]

• Parsing: predict the program that generated the input image.
• The graphic engine renders the predicted program, so the match between the rendered output and the input image can serve as the reward, without labeled programs.
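The render-and-compare reward can be sketched concretely. The circle-only rasterizer, the program format, and the IoU metric below are illustrative assumptions; the actual shape parser handles more primitives and a full constructive-geometry grammar.

```python
import numpy as np

def render_circle(cx, cy, r, size=64):
    """Rasterize a filled circle on a size x size boolean canvas."""
    yy, xx = np.mgrid[0:size, 0:size]
    return (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2

def execute(program, size=64):
    """Toy graphic engine: apply ('+', shape) / ('-', shape) ops left to right,
    where '+' unions the shape onto the canvas and '-' subtracts it."""
    canvas = np.zeros((size, size), dtype=bool)
    for op, (cx, cy, r) in program:
        mask = render_circle(cx, cy, r, size)
        canvas = (canvas | mask) if op == "+" else (canvas & ~mask)
    return canvas

def iou_reward(pred_program, target_image):
    """Reward = intersection-over-union between the rendered prediction
    and the target (input) image."""
    rendered = execute(pred_program)
    inter = np.logical_and(rendered, target_image).sum()
    union = np.logical_or(rendered, target_image).sum()
    return inter / union if union else 1.0
```

A search operator can mutate the predicted program (change a primitive, an argument, or an operator) and keep mutations that increase this reward.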
Shape Parser Energy Model

[Figure: the input image passes through a CNN; the output distribution over program tokens (e.g. circle(16,16,12), triangle(32,48,16), +, −) passes through a convolutional layer; the combined features feed a multi-layer perceptron that outputs the energy.]
Search Budget vs. Constraints

Performance on Shape Parser
ConclusionandFutureDirections
• Ifarewardfunctionexiststoevaluateeverystructuredoutputintoascalarvalue• Wecanuseunlabled datafortrainingstructuredpredictionenergynetworks
• Domainknowledgeornon-differentiablepipelinescanbeusedtodefinetherewardfunctions.• Themainingredientforlearningfromtherewardfunctionisthesearchoperator.• Hereweonlyusesimplesearchoperators,butmorecomplexsearchfunctionsderivedfromdomainknowledgecanbeusedforcomplicatedproblems.