Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks (slides, 2020-09-20)

Pedram Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum

Structured Prediction

• We are interested in learning a function f: X → Y
• X: input variables
• Y: output variables

• We can define f as: f(x) = argmin_y E(x, y)
• For a Gibbs distribution: P(y | x) = exp(-E(x, y)) / Z(x)

Structured Prediction Energy Networks (SPENs)

• If E is parameterized using a differentiable model such as a deep neural network, we can find a local minimum of E using gradient descent

• The energy network expresses the correlation among input and output variables.
• Traditionally, graphical models are used to represent the correlation among output variables.
• Inference is intractable for most expressive graphical models.
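As a toy illustration of gradient-descent inference (not the paper's network), consider a quadratic energy over a relaxed continuous output: following the negative energy gradient converges to the minimizer. All names and shapes here are illustrative.

```python
import numpy as np

def energy(y, W, b):
    """Toy quadratic energy E(y) = 0.5 * y^T W y + b^T y."""
    return 0.5 * y @ W @ y + b @ y

def grad_energy(y, W, b):
    """Gradient of the toy energy with respect to y."""
    return 0.5 * (W + W.T) @ y + b

def gradient_descent_inference(W, b, dim, steps=200, lr=0.1):
    """Find a local minimum of E over a relaxed continuous y."""
    y = np.zeros(dim)
    for _ in range(steps):
        y = y - lr * grad_energy(y, W, b)
    return y

# With W = I and b = -c, the minimizer is y* = c.
W = np.eye(2)
b = -np.array([1.0, 2.0])
y_star = gradient_descent_inference(W, b, dim=2)
```

For a deep energy network the same loop applies, with the gradient obtained by backpropagating E with respect to the (relaxed) output variables.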

Energy Models

[picture from Belanger (2016)]

[picture from Altinel (2018)]

Training SPENs

• Structural SVM (Belanger and McCallum, 2016)
• End-to-End (Belanger et al., 2017)
• Value-based training (Gygli et al., 2017)
• Inference Network (Lifu Tu and Kevin Gimpel, 2018)
• Rank-Based Training (Rooshenas et al., 2018)

Indirect Supervision

• Data annotation is expensive, especially for structured outputs.
• Domain knowledge can serve as the source of supervision.

• It can be written as a reward function R(x, y)
• R evaluates a pair of input and output configurations into a scalar value
• For a given x, we are looking for the best y that maximizes R(x, y)


Search-Guided Training

• We have a reward function that provides indirect supervision.
• We want to learn a smooth version of the reward function such that we can use gradient-descent inference at test time.
• We sample points (y0, y1, …, y5 in the animation) from the energy function using noisy gradient-descent inference.
• Then we project each sample to the domain of the reward function (the sample is a point in the simplex, but the domain of the reward function is often discrete, i.e., the vertices of the simplex).
• Then the search procedure uses the sample as input and returns an output structure by searching the reward function.
• We expect the two points to have the same ranking on the reward function and on the negative of the energy function.
• When we find a pair of points that violates the ranking constraints, we update the energy function towards reducing the violation.
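The steps above can be sketched end to end on a toy problem. The energy here is linear and the search operator is a greedy bit-flip pass; both are stand-ins for the paper's deep energy network and task-specific search procedure, and every name below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(y, y_true):
    """Task-loss reward: negative Hamming distance to the ground truth."""
    return -np.abs(y - y_true).sum()

def energy(y, w):
    """Toy linear energy E(x, y) = -w . y; the paper uses a deep network."""
    return -w @ y

def noisy_gd_sample(w, dim, steps=20, lr=0.5, noise=0.3):
    """Sample a relaxed point from the energy via noisy gradient descent."""
    y = rng.uniform(0, 1, dim)
    for _ in range(steps):
        grad = -w  # dE/dy for the linear toy energy
        y = np.clip(y - lr * (grad + noise * rng.normal(size=dim)), 0, 1)
    return y

def search(y_start, y_true):
    """Illustrative search operator: one greedy pass of bit flips that
    increase the reward (stands in for the paper's search procedure)."""
    y = y_start.copy()
    for i in range(len(y)):
        cand = y.copy()
        cand[i] = 1 - cand[i]
        if reward(cand, y_true) > reward(y, y_true):
            y = cand
    return y

def train(y_true, iters=300, lr=0.1, margin=1.0):
    dim = len(y_true)
    w = np.zeros(dim)
    for _ in range(iters):
        y_sample = noisy_gd_sample(w, dim)
        y_proj = (y_sample > 0.5).astype(float)  # project to a vertex
        y_better = search(y_proj, y_true)        # search the reward
        if reward(y_better, y_true) > reward(y_proj, y_true):
            # ranking violation: -E must rank y_better above y_proj
            violation = margin - (energy(y_proj, w) - energy(y_better, w))
            if violation > 0:
                w += lr * (y_better - y_proj)    # reduce the violation
    return w

y_true = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
w = train(y_true)
y_pred = (w > 0).astype(float)  # endpoint of gradient-descent inference
```

After training, gradient-descent inference on the learned energy recovers the high-reward structure, which is exactly the property the ranking updates enforce.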

Task-Loss as Reward Function for Multi-Label Classification

• The simplest form of indirect supervision is to use the task loss as the reward function:
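As a sketch, one plausible task-loss reward for multi-label classification is the F1 score between a predicted binary label vector and the ground truth (the specific loss is an assumption here; the slide does not fix one).

```python
def f1_reward(y_pred, y_true):
    """Task-loss reward: F1 between predicted and true binary labels."""
    tp = sum(p and t for p, t in zip(y_pred, y_true))          # true positives
    fp = sum(p and not t for p, t in zip(y_pred, y_true))      # false positives
    fn = sum(t and not p for p, t in zip(y_pred, y_true))      # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Any such reward only needs to score a full output configuration; it never needs to be differentiable, since the energy network is what gets trained.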

Domain Knowledge as Reward Function for Citation Field Extraction

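A domain-knowledge reward for citation field extraction could score a tagging against hand-written rules. The rules, names, and weights below are purely hypothetical illustrations, not the paper's actual constraints.

```python
def citation_reward(tokens, tags):
    """Hypothetical rule-based reward over a token/tag sequence."""
    score = 0.0
    # Rule: a four-digit number tagged as a date is rewarded.
    for tok, tag in zip(tokens, tags):
        if tok.isdigit() and len(tok) == 4 and tag == "date":
            score += 1.0
    # Rule: author fields tend to appear before the title.
    if "author" in tags and "title" in tags:
        if tags.index("author") < tags.index("title"):
            score += 1.0
    # Rule: a period usually ends a field, so penalize a tag that
    # continues across one.
    for i, tok in enumerate(tokens[:-1]):
        if tok.endswith(".") and tags[i] == tags[i + 1]:
            score -= 0.5
    return score
```

Such a reward needs no labeled citations at all; it only encodes regularities a domain expert already knows.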

Energy Model

[Figure: the energy network for citation field extraction. Token embeddings (e.g. for "Wei Li. Deep Learning for …") and the tag distribution (over author, title, …) feed a convolutional layer with multiple filters and different window sizes; max pooling and concatenation are followed by a multi-layer perceptron that outputs the scalar energy.]
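The architecture in the figure can be sketched with plain numpy. Shapes, filter counts, and window sizes below are illustrative and the weights are random; this only shows how the pieces (embeddings + tag distribution → multi-window convolution → max pooling → MLP → scalar energy) fit together.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_net(token_emb, tag_dist, windows=(2, 3), n_filters=4):
    """Sketch of the citation-extraction energy network."""
    # Concatenate token embeddings with the per-token tag distribution.
    x = np.concatenate([token_emb, tag_dist], axis=1)  # (seq_len, feat)
    pooled = []
    for w in windows:
        filt = rng.normal(size=(n_filters, w * x.shape[1])) * 0.1
        # Slide a window of size w over the sequence.
        convs = np.stack([
            filt @ x[i:i + w].ravel() for i in range(len(x) - w + 1)
        ])                                  # (positions, n_filters)
        pooled.append(convs.max(axis=0))    # max pool over positions
    h = np.concatenate(pooled)              # concatenated pooled features
    # Multi-layer perceptron producing the scalar energy.
    W1 = rng.normal(size=(8, h.size)) * 0.1
    w2 = rng.normal(size=8) * 0.1
    return float(w2 @ np.tanh(W1 @ h))

emb = rng.normal(size=(5, 6))    # 5 tokens, 6-dim embeddings
tags = rng.uniform(size=(5, 3))  # distribution over 3 tags per token
e = energy_net(emb, tags)
```

Because the tag distribution enters as a continuous input, the energy is differentiable with respect to it, which is what makes gradient-descent inference possible.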

Performance on Citation Field Extraction

Semi-Supervised Setting

• Alternately use the output of search and the ground-truth labels for training.

Shape Parser

[Figure: given an input image I, the model predicts a program, e.g. an expression tree over primitives such as c(32,32,28), c(32,32,24), and t(32,32,20) combined by + and - operators; parsing the predicted program and rendering it with the graphics engine produces an output image O, which can be compared against the input.]
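Since the graphics engine renders a predicted program back to an image, one plausible reward compares the rendering to the input image, e.g. by intersection-over-union. The tiny rasterizer below handles only circle primitives and is a hypothetical stand-in for the real engine, which also supports squares and triangles.

```python
import numpy as np

def render(program, size=64):
    """Toy graphics engine: rasterize c(x, y, r) circles onto a grid,
    combining them with + (union) and - (subtraction) operators."""
    yy, xx = np.mgrid[:size, :size]
    canvas = np.zeros((size, size), dtype=bool)
    for op, (cx, cy, r) in program:
        disk = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        canvas = (canvas | disk) if op == "+" else (canvas & ~disk)
    return canvas

def iou_reward(program, target):
    """Reward = IoU between the rendered program and the target image."""
    pred = render(program, size=target.shape[0])
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

# A target image produced by a known program.
target = render([("+", (32, 32, 20))])
```

A search operator can then edit the program (swap primitives, perturb parameters) and keep edits that raise this reward.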

Shape Parser Energy Model

[Figure: a CNN encodes the input image; together with the output distribution over program tokens (e.g. circle(16,16,12), triangle(32,48,16), +, circle(16,24,12), -), a convolutional layer and a multi-layer perceptron produce the scalar energy.]

Search Budget vs. Constraints

Performance on Shape Parser

Conclusion and Future Directions

• If a reward function exists that evaluates every structured output into a scalar value, we can use unlabeled data to train structured prediction energy networks.

• Domain knowledge or non-differentiable pipelines can be used to define the reward functions.
• The main ingredient for learning from the reward function is the search operator.
• Here we only use simple search operators, but more complex search functions derived from domain knowledge can be used for complicated problems.