Neural Module Networks for Reasoning Over Text
Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh & Matt Gardner
Presented by: Jigyasa Gupta
Neural Modules
• Introduced in the paper "Deep Compositional Question Answering with Neural Module Networks" by Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein for the Visual QA task
Slides on Neural Modules taken from Berthy Feng, a student at Princeton University
Motivation: Compositional Nature of VQA
Motivation: Combine Both Approaches
Modules
• Attention (Find)
• Re-Attention (Transform)
• Combination
• Classification (Describe)
• Measurement
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner
• Use Neural Module Networks (NMNs) to answer compositional questions against a paragraph of text.
• Such questions require multiple steps of reasoning: discrete, symbolic operations (as shown in the DROP dataset)
• NMNs are:
• Interpretable
• Modular
• Compositional
NEURAL MODULE NETWORKS FOR REASONING OVER TEXT
Example
NMN components
• Modules: differentiable modules that perform reasoning over text and symbols in a probabilistic manner
• Contextual token representations: Q ∈ R^{n×d} and P ∈ R^{m×d}, where n and m are the number of tokens in the question and paragraph and d is the embedding size (from a bidirectional GRU or pretrained BERT)
• Question Parser: an encoder-decoder model with attention that maps the question into an executable program
• Learning:
• likelihood of the program under the question-parser model, p(z|q)
• for any given program z, likelihood of the gold answer, p(y*|z)
[Figure: NMN architecture. The Question Parser (encoder-decoder) maps the question embedding to a program z; the program executor runs the program's modules (Module 1 ... Module 4) over the paragraph embedding to produce the answer y*; parser and executor are trained with joint learning.]
Learning Challenges
• Question Parser: free-form real-world questions have diverse grammar and lexical variability
• Program Executor: no intermediate feedback is available for the modules, so errors get propagated
• Joint Learning: supervision comes only from the gold answer, making it difficult to learn the question parser and program executor jointly
Modules
find(Q) → P
For question spans in the input, find similar spans in the passage (a sketch follows the example below)
• Compute a similarity matrix S between the question- and paragraph-token embeddings
• Normalize S to get an attention matrix
• Compute the expected paragraph attention
Input: question attention map
Output: paragraph attention map
find(Q) → P: Example
The question attention map is available from the encoder-decoder of the parser
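A minimal PyTorch sketch of find, assuming the additive similarity score w^T [q_i ; p_j ; q_i * p_j]; the function and parameter names are illustrative, not the authors' released code.

import torch
import torch.nn.functional as F

def find(Q_attn, Q_emb, P_emb, w):
    """find(Q) -> P: map a question attention to a paragraph attention.
    Q_attn: (n,)   question attention from the parser's decoder
    Q_emb:  (n, d) contextual question-token embeddings
    P_emb:  (m, d) contextual paragraph-token embeddings
    w:      (3d,)  learned similarity weights (assumed parameterization)"""
    n, d = Q_emb.shape
    m = P_emb.shape[0]
    q = Q_emb.unsqueeze(1).expand(n, m, d)
    p = P_emb.unsqueeze(0).expand(n, m, d)
    # Similarity matrix S in R^{n x m}
    S = torch.cat([q, p, q * p], dim=-1) @ w
    # Normalize over paragraph tokens to get the attention matrix A
    A = F.softmax(S, dim=-1)
    # Expected paragraph attention under the question attention
    return Q_attn @ A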
filter(Q, P) → P
Based on the question, select a subset of spans from the input (see the sketch below)
• Weighted sum of the question-token embeddings
• Compute a locally-normalized paragraph-token mask
• The output is a normalized, masked input paragraph attention
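A sketch of filter under the same assumptions; the sigmoid mask over [q ; p_j ; q * p_j] is an assumed parameterization.

import torch

def filter_module(Q_attn, Q_emb, P_attn, P_emb, w):
    """filter(Q, P) -> P: keep the part of the paragraph attention
    compatible with the question; shapes as in find(), w: (3d,)."""
    # Weighted sum of the question-token embeddings
    q = Q_attn @ Q_emb                                   # (d,)
    m = P_emb.shape[0]
    q_rep = q.unsqueeze(0).expand(m, -1)
    # Locally-normalized (per-token) paragraph mask M in [0, 1]^m
    M = torch.sigmoid(torch.cat([q_rep, P_emb, q_rep * P_emb], dim=-1) @ w)
    # Masked input attention, re-normalized to sum to 1
    P_out = M * P_attn
    return P_out / P_out.sum()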
filter(Q, P) → P: Example
relocate(Q, P) → P
Find the argument asked for in the question for the input paragraph spans (see the sketch below)
• Weighted sum of the question-token embeddings with the attention map
• Compute a paragraph-to-paragraph attention matrix R
• The output attention is a weighted sum of the rows of R, weighted by the input paragraph attention
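A sketch of relocate; the score w^T [q ; p_i ; p_j ; p_i * p_j] for the paragraph-to-paragraph matrix R is an assumed form.

import torch
import torch.nn.functional as F

def relocate(Q_attn, Q_emb, P_attn, P_emb, w):
    """relocate(Q, P) -> P: re-attend from the input spans to the
    argument asked for in the question; w: (4d,)."""
    m, d = P_emb.shape
    q = (Q_attn @ Q_emb).unsqueeze(0).unsqueeze(0).expand(m, m, d)
    pi = P_emb.unsqueeze(1).expand(m, m, d)   # source token i
    pj = P_emb.unsqueeze(0).expand(m, m, d)   # target token j
    # Paragraph-to-paragraph attention matrix R, one row per source token
    R = F.softmax(torch.cat([q, pi, pj, pi * pj], dim=-1) @ w, dim=-1)
    # Weighted sum of the rows of R, weighted by the input paragraph attention
    return P_attn @ R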
find-num(P) → N and find-date(P) → D
Find the number(s)/date(s) associated with the input paragraph spans (see the sketch below)
• Extract numbers and dates as a pre-processing step, e.g. [2, 2, 3, 4]
• Compute a token-to-number similarity matrix
• Compute an expected distribution over the number tokens
• Aggregate the probabilities for number tokens with the same value, e.g. [2, 2, 3, 4] aggregates to {2, 3, 4} with N = [0.5, 0.3, 0.2]
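A sketch of find-num; the bilinear token-to-number similarity W_sim is an assumed parameterization. The aggregation step reproduces the {2, 3, 4} example above.

import torch
import torch.nn.functional as F

def find_num(P_attn, P_emb, num_token_idxs, num_values, W_sim):
    """find-num(P) -> N: distribution over unique number values.
    num_token_idxs: paragraph positions of the number tokens, whose
    values are num_values, e.g. [2, 2, 3, 4]; W_sim: (d, d)."""
    # Token-to-number similarity, softmaxed over the k number tokens
    S = F.softmax(P_emb @ W_sim @ P_emb[num_token_idxs].T, dim=-1)  # (m, k)
    # Expected distribution over number tokens under the input attention
    T = P_attn @ S                                                  # (k,)
    # Aggregate probabilities of tokens mentioning the same value,
    # e.g. values [2, 2, 3, 4] with T = [0.3, 0.2, 0.3, 0.2]
    # aggregate to {2, 3, 4} with N = [0.5, 0.3, 0.2]
    values = sorted(set(num_values))
    N = torch.stack([sum(T[i] for i, v in enumerate(num_values) if v == val)
                     for val in values])
    return values, N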
find-num(P) → N: Example
count(P) → C
Count the number of input passage spans (see the sketch below)
• Count([0, 0, 0.3, 0.3, 0, 0.4]) = 2
• The module first scales the attention using the values [1, 2, 5, 10] to convert it into a matrix P_scaled ∈ R^{m×4}
• Pretraining this module on synthetically generated pairs of attention and count values helps
• The passage attention is normalized and passages are typically 400-500 tokens long, so individual attention values are small; scaling the attention by values > 1 helps the model differentiate among these small values
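A sketch of count; the GRU size is illustrative, and this version returns the expected count rather than a distribution over count values.

import torch
import torch.nn as nn

class Count(nn.Module):
    """count(P) -> C: scale the attention, run a bidirectional GRU,
    score each token as softly inside/outside a span, and sum."""
    SCALES = torch.tensor([1.0, 2.0, 5.0, 10.0])

    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(4, hidden, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, P_attn):                          # (m,)
        # Scale the small normalized attention values apart
        P_scaled = P_attn.unsqueeze(-1) * self.SCALES   # (m, 4)
        h, _ = self.gru(P_scaled.unsqueeze(0))          # (1, m, 2*hidden)
        # Soft 0/1 per-token score; the count is their sum
        return torch.sigmoid(self.score(h)).sum()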
compare-num-lt(P1, P2) → P
Output the span associated with the smaller number (see the sketch below)
• N1 = find-num(P1), N2 = find-num(P2)
• Compute two soft boolean values, p(N1 < N2) and p(N2 < N1)
• Output a weighted sum of the input paragraph attentions
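A sketch of compare-num-lt, assuming both number distributions N1 and N2 (from find-num) range over the same list of values sorted in ascending order.

import torch

def compare_num_lt(P1_attn, P2_attn, N1, N2):
    """compare-num-lt(P1, P2) -> P: soft selection of the paragraph
    attention whose associated number is smaller."""
    outer = torch.outer(N1, N2)                     # (k, k): N1[i] * N2[j]
    p_lt = torch.triu(outer, diagonal=1).sum()      # p(N1 < N2): pairs i < j
    p_gt = torch.tril(outer, diagonal=-1).sum()     # p(N2 < N1): pairs i > j
    # Weighted sum of the two input paragraph attentions
    return p_lt * P1_attn + p_gt * P2_attn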
time-diff(P1, P2) → TD
Difference between the dates associated with the paragraph spans (see the sketch below)
• The module internally calls the find-date module to get date distributions, D1 and D2, for the two paragraph attentions
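A sketch of time-diff, assuming dates are represented as numbers (e.g. years) so that differences can be computed directly.

import torch

def time_diff(D1, D2, dates):
    """time-diff(P1, P2) -> TD: distribution over date differences.
    D1, D2: distributions over the extracted `dates` (via find-date)."""
    diffs = {}
    for i, d1 in enumerate(dates):
        for j, d2 in enumerate(dates):
            # p(TD = d1 - d2) accumulates mass from every date pair
            diffs[d1 - d2] = diffs.get(d1 - d2, 0.0) + float(D1[i] * D2[j])
    return diffs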
find-max-num(P) → P, find-min-num(P) → P
Select the span associated with the largest (or smallest) number (see the sketch below)
• Compute an expected number-token distribution T using find-num
• Compute the expected probability that each number token is the one with the maximum value, T_max ∈ R^{n_tokens}
• Re-weight the contribution from the i-th paragraph token to the j-th number token
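A sketch of find-max-num. The probability that each (value-sorted) number token holds the maximum is taken as the distribution of the max of a few independent draws from T; the sample size is an assumption, and the re-weighting follows the bullet above.

import torch

def find_max_num(P_attn, T, S_tok2num, num_samples=3):
    """find-max-num(P) -> P. T: (k,) expected number-token distribution
    from find-num; S_tok2num: (m, k) token-to-number attention."""
    # P(token j is the max of `num_samples` draws) = F(j)^S - F(j-1)^S
    cdf = torch.cumsum(T, dim=0)
    cdf_prev = torch.cat([torch.zeros(1), cdf[:-1]])
    T_max = cdf ** num_samples - cdf_prev ** num_samples       # (k,)
    # Re-weight the contribution of paragraph token i to number token j
    contrib = P_attn.unsqueeze(-1) * S_tok2num                 # (m, k)
    return (contrib * (T_max / T.clamp_min(1e-12))).sum(dim=-1)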
span(P) → S
Identify a contiguous span from the attended tokens
• Only appears as the outermost module in a program
• Outputs two probability distributions, P_s and P_e ∈ R^m, denoting the start and end of a span
• This module is implemented similarly to the count module
Auxiliary Supervision
• An unsupervised auxiliary loss provides an inductive bias for the execution of the find-num, find-date, and relocate modules
• Heuristically-obtained supervision for the question program and intermediate module outputs is provided for a subset of questions (5-10%)
Unsupervised auxiliary loss for IE
• The find-num, find-date, and relocate modules perform information extraction
• The objective increases the sum of the attention probabilities for output tokens that appear within a window of W = 10 tokens around the attended input tokens (see the sketch below)
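A simplified sketch of this objective (the paper's exact loss is not reproduced here): for each paragraph token, the number-token attention is pushed toward number tokens within the window.

import torch

def ie_window_loss(S_tok2num, num_token_positions, W=10):
    """S_tok2num: (m, k) softmaxed token-to-number attention;
    num_token_positions: paragraph position of each number token."""
    m, _ = S_tok2num.shape
    loss = 0.0
    for i in range(m):
        # Attention mass on number tokens within W positions of token i
        in_window = [j for j, pos in enumerate(num_token_positions)
                     if abs(pos - i) <= W]
        if in_window:
            mass = S_tok2num[i, in_window].sum()
            loss = loss - torch.log(mass.clamp_min(1e-12))
    return loss / m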
Question Parse Supervision
• Heuristic patterns yield program and corresponding question-attention supervision for a subset of the training data (10%)
Intermediate Module Output Supervision
• Used for the find-num and find-date modules, for a subset of the questions (5%)
• E.g.: "how many yards was the longest/shortest touchdown?"
• Identify all instances of the token "touchdown"
• Assume the number closest to each instance should be an output of the find-num module
• Supervise this as a multi-hot vector N* and use an auxiliary loss (see the sketch below)
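A sketch of the auxiliary loss over the heuristic multi-hot vector N*: it maximizes the total probability that find-num assigns to the marked number tokens.

import torch

def intermediate_supervision_loss(T, N_star):
    """T: (k,) predicted number-token distribution from find-num;
    N_star: (k,) multi-hot vector marking heuristically chosen tokens."""
    return -torch.log((T * N_star).sum().clamp_min(1e-12))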
Dataset
20,000 questions for training/validation and 1,800 questions for testing (25% of DROP).
Questions within the scope of the model were extracted automatically based on their first n-gram.
RESULTS
RESULTS - Question Types
Effect of Auxiliary Supervision
Incorrect Program Predictions
• How many touchdown passes did Tom Brady throw in the season? - count(find)
The correct answer requires a simple lookup from the paragraph.
• Which happened last, the failed assassination attempt on Lenin, or the Red Terror? - date-compare-gt(find, find)
The correct answer requires natural language inference about the order of events, not a symbolic comparison between dates.
• Who caught the most touchdown passes? - relocate(find-max-num(find))
Requires nested counting, which is out of scope.
Future Work
• Design additional modules, e.g. for questions like:
• How many languages each had fewer than 115,000 speakers in the population?
• Which quarterback threw the most touchdown passes?
• How many points did the Packers fall behind during the game?
• Use the complete DROP dataset: in the current system, training the model on questions for which the modules cannot express the correct reasoning harms their ability to execute their intended operations
• Opens up avenues for transfer learning, where modules can be trained independently using indirect or distant supervision from different tasks
• Combine black-box operations with the interpretable modules to capture more expressivity
Review Comments - Pros
• Interesting idea [Atishya, Rajas, Keshav, Siddhant, Lovish]
• Interpretable and modular [Atishya, Rajas, Siddhant, Lovish, Vipul]
• Better than BERT for symbolic reasoning [Keshav]
• The auxiliary-loss formulation seems a very novel idea [Vipul]
• The question parser has a new role: parse the question to return a composition of modules [Pawan]
Review Comments - Cons
• Module descriptions are difficult to understand [Atishya, Siddhant]
• The auxiliary loss is not generalizable [Atishya, Rajas]
• The contribution of each module is not studied [Atishya, Rajas, Siddhant, Lovish, Pawan]
• Only 22% of the DROP dataset is used [Rajas, Keshav, Lovish]
• Compositional reasoning queries like "Who is the mother of the PM of India?" are not handled [Keshav]
• An endless number of modules would be required to achieve full reasoning capability [Vipul]
Review Comments - Extensions
• Study the contribution of each module [Atishya]
• Pre-train all the modules by collecting data using specific heuristics [Atishya, Rajas]
• An RL framework to predict whether a given question can be sufficiently reasoned about [Rajas]
• A module to predict open predicates of the type PM(India, x) & Mother(x, y) [Keshav, Vipul]
• Train multi-purpose modules (e.g. to predict citizen-of and president-of relationships) [Vipul]
• Combine an end-to-end neural system with an NMN [Keshav]
• Learn new modules from the dataset automatically; learn new SPARQL templates from data [Siddhant, Pawan]
• Curriculum learning [Siddhant]
• Meta-learning to automatically determine the modules [Lovish]