Neural Module Networks for Reasoning Over Text
Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh & Matt Gardner
Presented by: Jigyasa Gupta
Neural Modules
• Introduced in the paper "Deep Compositional Question Answering with Neural Module Networks" by Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein for the Visual QA task
Slides on Neural Modules taken from Berthy Feng, a student at Princeton University
Motivation: Compositional Nature of VQA
Motivation: Combine Both Approaches
Modules
• Attention (Find)
• Re-Attention (Transform)
• Combination
• Classification (Describe)
• Measurement
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner
• Use Neural Module Networks (NMNs) to answer compositional questions against a paragraph of text.
• Such questions require multiple steps of reasoning: discrete, symbolic operations (as shown in the DROP dataset)
• NMNs are:
• Interpretable
• Modular
• Compositional
NEURAL MODULE NETWORKS FOR REASONING OVER TEXT
Example
NMN components
• Modules: differentiable modules that perform reasoning over text and symbols in a probabilistic manner
• Contextual token representations: Q ∈ R^{n×d} and P ∈ R^{m×d}, where n and m are the number of tokens in the question and paragraph and d is the embedding size (from a bidirectional GRU or pretrained BERT)
• Question Parser: an encoder-decoder model with attention that maps the question into an executable program
• Learning:
• likelihood of the program under the question-parser model, p(z|q)
• for any given program z, likelihood of the gold answer, p(y*|z)
[Figure: NMN architecture. The Question Parser (encoder-decoder) maps the question embedding to a program z; the program executor runs the program's modules (Module 1 ... Module 4) over the paragraph embedding to produce the answer y*; parser and executor are trained with joint learning.]
Learning Challenges
• Question Parser: free-form real-world questions have diverse grammar and lexical variability
• Program Executor: no intermediate feedback is available for the modules, so errors get propagated
• Joint Learning: supervision comes only from the gold answer, making it difficult to learn the question parser and program executor jointly
Modules
find(Q) → P
For question spans in the input, find similar spans in the passage (a sketch follows the example below)
• Compute a similarity matrix S between the question- and paragraph-token embeddings
• Normalize S to get an attention matrix
• Compute the expected paragraph attention
Input: question attention map
Output: paragraph attention map
find(Q) → P: Example
The question attention map is available from the encoder-decoder of the parser
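A minimal PyTorch sketch of find, assuming the additive similarity score w^T [q_i ; p_j ; q_i * p_j]; the function and parameter names are illustrative, not the authors' released code.

import torch
import torch.nn.functional as F

def find(Q_attn, Q_emb, P_emb, w):
    """find(Q) -> P: map a question attention to a paragraph attention.
    Q_attn: (n,)   question attention from the parser's decoder
    Q_emb:  (n, d) contextual question-token embeddings
    P_emb:  (m, d) contextual paragraph-token embeddings
    w:      (3d,)  learned similarity weights (assumed parameterization)"""
    n, d = Q_emb.shape
    m = P_emb.shape[0]
    q = Q_emb.unsqueeze(1).expand(n, m, d)
    p = P_emb.unsqueeze(0).expand(n, m, d)
    # Similarity matrix S in R^{n x m}
    S = torch.cat([q, p, q * p], dim=-1) @ w
    # Normalize over paragraph tokens to get the attention matrix A
    A = F.softmax(S, dim=-1)
    # Expected paragraph attention under the question attention
    return Q_attn @ A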
filter(Q, P) → P
Based on the question, select a subset of spans from the input (see the sketch below)
• Weighted sum of the question-token embeddings
• Compute a locally-normalized paragraph-token mask
• The output is a normalized, masked input paragraph attention
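A sketch of filter under the same assumptions; the sigmoid mask over [q ; p_j ; q * p_j] is an assumed parameterization.

import torch

def filter_module(Q_attn, Q_emb, P_attn, P_emb, w):
    """filter(Q, P) -> P: keep the part of the paragraph attention
    compatible with the question; shapes as in find(), w: (3d,)."""
    # Weighted sum of the question-token embeddings
    q = Q_attn @ Q_emb                                   # (d,)
    m = P_emb.shape[0]
    q_rep = q.unsqueeze(0).expand(m, -1)
    # Locally-normalized (per-token) paragraph mask M in [0, 1]^m
    M = torch.sigmoid(torch.cat([q_rep, P_emb, q_rep * P_emb], dim=-1) @ w)
    # Masked input attention, re-normalized to sum to 1
    P_out = M * P_attn
    return P_out / P_out.sum()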
filter(Q, P) → P: Example
relocate(Q, P) → P
Find the argument asked for in the question for the input paragraph spans (see the sketch below)
• Weighted sum of the question-token embeddings with the attention map
• Compute a paragraph-to-paragraph attention matrix R
• The output attention is a weighted sum of the rows of R, weighted by the input paragraph attention
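A sketch of relocate; the score w^T [q ; p_i ; p_j ; p_i * p_j] for the paragraph-to-paragraph matrix R is an assumed form.

import torch
import torch.nn.functional as F

def relocate(Q_attn, Q_emb, P_attn, P_emb, w):
    """relocate(Q, P) -> P: re-attend from the input spans to the
    argument asked for in the question; w: (4d,)."""
    m, d = P_emb.shape
    q = (Q_attn @ Q_emb).unsqueeze(0).unsqueeze(0).expand(m, m, d)
    pi = P_emb.unsqueeze(1).expand(m, m, d)   # source token i
    pj = P_emb.unsqueeze(0).expand(m, m, d)   # target token j
    # Paragraph-to-paragraph attention matrix R, one row per source token
    R = F.softmax(torch.cat([q, pi, pj, pi * pj], dim=-1) @ w, dim=-1)
    # Weighted sum of the rows of R, weighted by the input paragraph attention
    return P_attn @ R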
find-num(P) → N and find-date(P) → D
Find the number(s)/date(s) associated with the input paragraph spans (see the sketch below)
• Extract numbers and dates as a pre-processing step, e.g. [2, 2, 3, 4]
• Compute a token-to-number similarity matrix
• Compute an expected distribution over the number tokens
• Aggregate the probabilities for number tokens with the same value, e.g. [2, 2, 3, 4] aggregates to {2, 3, 4} with N = [0.5, 0.3, 0.2]
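A sketch of find-num; the bilinear token-to-number similarity W_sim is an assumed parameterization. The aggregation step reproduces the {2, 3, 4} example above.

import torch
import torch.nn.functional as F

def find_num(P_attn, P_emb, num_token_idxs, num_values, W_sim):
    """find-num(P) -> N: distribution over unique number values.
    num_token_idxs: paragraph positions of the number tokens, whose
    values are num_values, e.g. [2, 2, 3, 4]; W_sim: (d, d)."""
    # Token-to-number similarity, softmaxed over the k number tokens
    S = F.softmax(P_emb @ W_sim @ P_emb[num_token_idxs].T, dim=-1)  # (m, k)
    # Expected distribution over number tokens under the input attention
    T = P_attn @ S                                                  # (k,)
    # Aggregate probabilities of tokens mentioning the same value,
    # e.g. values [2, 2, 3, 4] with T = [0.3, 0.2, 0.3, 0.2]
    # aggregate to {2, 3, 4} with N = [0.5, 0.3, 0.2]
    values = sorted(set(num_values))
    N = torch.stack([sum(T[i] for i, v in enumerate(num_values) if v == val)
                     for val in values])
    return values, N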
find-num(P) → N: Example
count(P) → C
Count the number of input passage spans (see the sketch below)
• Count([0, 0, 0.3, 0.3, 0, 0.4]) = 2
• The module first scales the attention using the values [1, 2, 5, 10] to convert it into a matrix P_scaled ∈ R^{m×4}
• Pretraining this module on synthetically generated pairs of attention and count values helps
• The passage attention is normalized and passages are typically 400-500 tokens long, so individual attention values are small; scaling the attention by values > 1 helps the model differentiate among these small values
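A sketch of count; the GRU size is illustrative, and this version returns the expected count rather than a distribution over count values.

import torch
import torch.nn as nn

class Count(nn.Module):
    """count(P) -> C: scale the attention, run a bidirectional GRU,
    score each token as softly inside/outside a span, and sum."""
    SCALES = torch.tensor([1.0, 2.0, 5.0, 10.0])

    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(4, hidden, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, P_attn):                          # (m,)
        # Scale the small normalized attention values apart
        P_scaled = P_attn.unsqueeze(-1) * self.SCALES   # (m, 4)
        h, _ = self.gru(P_scaled.unsqueeze(0))          # (1, m, 2*hidden)
        # Soft 0/1 per-token score; the count is their sum
        return torch.sigmoid(self.score(h)).sum()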
compare-num-lt(P1, P2) → P
Output the span associated with the smaller number (see the sketch below)
• N1 = find-num(P1), N2 = find-num(P2)
• Compute two soft boolean values, p(N1 < N2) and p(N2 < N1)
• Output a weighted sum of the input paragraph attentions
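A sketch of compare-num-lt, assuming both number distributions N1 and N2 (from find-num) range over the same list of values sorted in ascending order.

import torch

def compare_num_lt(P1_attn, P2_attn, N1, N2):
    """compare-num-lt(P1, P2) -> P: soft selection of the paragraph
    attention whose associated number is smaller."""
    outer = torch.outer(N1, N2)                     # (k, k): N1[i] * N2[j]
    p_lt = torch.triu(outer, diagonal=1).sum()      # p(N1 < N2): pairs i < j
    p_gt = torch.tril(outer, diagonal=-1).sum()     # p(N2 < N1): pairs i > j
    # Weighted sum of the two input paragraph attentions
    return p_lt * P1_attn + p_gt * P2_attn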
time-diff(P1, P2) → TD
Difference between the dates associated with the paragraph spans (see the sketch below)
• The module internally calls the find-date module to get date distributions, D1 and D2, for the two paragraph attentions
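A sketch of time-diff, assuming dates are represented as numbers (e.g. years) so that differences can be computed directly.

import torch

def time_diff(D1, D2, dates):
    """time-diff(P1, P2) -> TD: distribution over date differences.
    D1, D2: distributions over the extracted `dates` (via find-date)."""
    diffs = {}
    for i, d1 in enumerate(dates):
        for j, d2 in enumerate(dates):
            # p(TD = d1 - d2) accumulates mass from every date pair
            diffs[d1 - d2] = diffs.get(d1 - d2, 0.0) + float(D1[i] * D2[j])
    return diffs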
find-max-num(P) → P, find-min-num(P) → P
Select the span associated with the largest (or smallest) number (see the sketch below)
• Compute an expected number-token distribution T using find-num
• Compute the expected probability that each number token is the one with the maximum value, T_max ∈ R^{n_tokens}
• Re-weight the contribution from the i-th paragraph token to the j-th number token
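A sketch of find-max-num. The probability that each (value-sorted) number token holds the maximum is taken as the distribution of the max of a few independent draws from T; the sample size is an assumption, and the re-weighting follows the bullet above.

import torch

def find_max_num(P_attn, T, S_tok2num, num_samples=3):
    """find-max-num(P) -> P. T: (k,) expected number-token distribution
    from find-num; S_tok2num: (m, k) token-to-number attention."""
    # P(token j is the max of `num_samples` draws) = F(j)^S - F(j-1)^S
    cdf = torch.cumsum(T, dim=0)
    cdf_prev = torch.cat([torch.zeros(1), cdf[:-1]])
    T_max = cdf ** num_samples - cdf_prev ** num_samples       # (k,)
    # Re-weight the contribution of paragraph token i to number token j
    contrib = P_attn.unsqueeze(-1) * S_tok2num                 # (m, k)
    return (contrib * (T_max / T.clamp_min(1e-12))).sum(dim=-1)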
span(P) → S
Identify a contiguous span from the attended tokens
• Only appears as the outermost module in a program
• Outputs two probability distributions, P_s and P_e ∈ R^m, denoting the start and end of a span
• This module is implemented similarly to the count module
Auxiliary Supervision
• An unsupervised auxiliary loss provides an inductive bias for the execution of the find-num, find-date, and relocate modules
• Heuristically-obtained supervision for the question program and intermediate module outputs is provided for a subset of questions (5-10%)
Unsupervised auxiliary loss for IE
• The find-num, find-date, and relocate modules perform information extraction
• The objective increases the sum of the attention probabilities for output tokens that appear within a window of W = 10 tokens around the attended input tokens (see the sketch below)
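A simplified sketch of this objective (the paper's exact loss is not reproduced here): for each paragraph token, the number-token attention is pushed toward number tokens within the window.

import torch

def ie_window_loss(S_tok2num, num_token_positions, W=10):
    """S_tok2num: (m, k) softmaxed token-to-number attention;
    num_token_positions: paragraph position of each number token."""
    m, _ = S_tok2num.shape
    loss = 0.0
    for i in range(m):
        # Attention mass on number tokens within W positions of token i
        in_window = [j for j, pos in enumerate(num_token_positions)
                     if abs(pos - i) <= W]
        if in_window:
            mass = S_tok2num[i, in_window].sum()
            loss = loss - torch.log(mass.clamp_min(1e-12))
    return loss / m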
Question Parse Supervision
• Heuristic patterns yield program and corresponding question-attention supervision for a subset of the training data (10%)
Intermediate Module Output Supervision
• Used for the find-num and find-date modules, for a subset of the questions (5%)
• E.g.: "how many yards was the longest/shortest touchdown?"
• Identify all instances of the token "touchdown"
• Assume the number closest to each instance should be an output of the find-num module
• Supervise this as a multi-hot vector N* and use an auxiliary loss (see the sketch below)
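A sketch of the auxiliary loss over the heuristic multi-hot vector N*: it maximizes the total probability that find-num assigns to the marked number tokens.

import torch

def intermediate_supervision_loss(T, N_star):
    """T: (k,) predicted number-token distribution from find-num;
    N_star: (k,) multi-hot vector marking heuristically chosen tokens."""
    return -torch.log((T * N_star).sum().clamp_min(1e-12))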
Dataset
20,000 questions for training/validation and 1,800 questions for testing (25% of DROP).
Questions within the scope of the model were extracted automatically based on their first n-gram.
RESULTS
RESULTS - Question Types
Effect of Auxiliary Supervision
Incorrect Program Predictions
• How many touchdown passes did Tom Brady throw in the season? - count(find)
The correct answer requires a simple lookup from the paragraph.
• Which happened last, the failed assassination attempt on Lenin, or the Red Terror? - date-compare-gt(find, find)
The correct answer requires natural language inference about the order of events, not a symbolic comparison between dates.
• Who caught the most touchdown passes? - relocate(find-max-num(find))
Requires nested counting, which is out of scope.
Future Work
• Design additional modules, e.g. for questions like:
• How many languages each had fewer than 115,000 speakers in the population?
• Which quarterback threw the most touchdown passes?
• How many points did the Packers fall behind during the game?
• Use the complete DROP dataset: in the current system, training the model on questions for which the modules cannot express the correct reasoning harms their ability to execute their intended operations
• Opens up avenues for transfer learning, where modules can be trained independently using indirect or distant supervision from different tasks
• Combine black-box operations with the interpretable modules to capture more expressivity
Review Comments - Pros
• Interesting idea [Atishya, Rajas, Keshav, Siddhant, Lovish]
• Interpretable and modular [Atishya, Rajas, Siddhant, Lovish, Vipul]
• Better than BERT for symbolic reasoning [Keshav]
• The auxiliary-loss formulation seems a very novel idea [Vipul]
• The question parser has a new role: parse the question to return a composition of modules [Pawan]
Review Comments - Cons
• Module descriptions are difficult to understand [Atishya, Siddhant]
• The auxiliary loss is not generalizable [Atishya, Rajas]
• The contribution of each module is not studied [Atishya, Rajas, Siddhant, Lovish, Pawan]
• Only 22% of the DROP dataset is used [Rajas, Keshav, Lovish]
• Compositional reasoning queries like "Who is the mother of the PM of India?" are not handled [Keshav]
• An endless number of modules would be required to achieve full reasoning capability [Vipul]
Review Comments - Extensions
• Study the contribution of each module [Atishya]
• Pre-train all the modules by collecting data using specific heuristics [Atishya, Rajas]
• An RL framework to predict whether a given question can be sufficiently reasoned about [Rajas]
• A module to predict open predicates of the type PM(India, x) & Mother(x, y) [Keshav, Vipul]
• Train multi-purpose modules (e.g. to predict citizen-of and president-of relationships) [Vipul]
• Combine an end-to-end neural system with an NMN [Keshav]
• Learn new modules from the dataset automatically; learn new SPARQL templates from data [Siddhant, Pawan]
• Curriculum learning [Siddhant]
• Meta-learning to automatically determine the modules [Lovish]