10417/10617 Intermediate Deep Learning: Fall2019rsalakhu/10417/Lectures/... · reflect the nature...

Post on 08-Oct-2020

1 views 0 download

Transcript of 10417/10617 Intermediate Deep Learning: Fall2019rsalakhu/10417/Lectures/... · reflect the nature...

10417/10617IntermediateDeepLearning:

Fall2019RussSalakhutdinov

Machine Learning Departmentrsalakhu@cs.cmu.edu

https://deeplearning-cmu-10417.github.io/

Representation Learning for Reading Comprehension

• Disclaimer:SomeofthematerialandslidesforthislecturewereborrowedfromBhuwanDhingra’stalk.

2

ReadingComprehension

ReadingComprehension

Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.

Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”

Answer:RodBlagojevich

Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)

Ø  isadocumentØ  isaquestionoverthecontentsofthatdocumentØ  istheanswertothisquery

• Theanswercomesfromafixedvocabulary.

TASK:Givenadocumentquerypairfindwhichanswers.

ReadingComprehension

Ø  mightconsistofalltokens/spansoftokensinthedocument(ExtractiveQuestionAnswering)

•  QuestionAnswering/InformationExtraction•  Testfortextrepresentationmodels

–  Abetterrepresentationcanhelpanswermorequestions

Approach--SupervisedLearning

ModelPr(c|d, q) = f✓(d, q, c) 8 c 2 A

L(✓) =X

(d,q,a)2D

� logPr(a|d, q) Loss

✓̂ = argmin✓

L(✓) Training

Whatarchitecturalbiasescanwebuildintothemodel?

(Aneuralnetwork)

D = {(d, q, a)}Ni=1 Dataset

ArchitecturalBias

•  CNNs,RNNshavearchitecturalbiasestowardsimages/sequences

•  Forreadingcomprehensionwhatbiasescanwebuildtoreflectlinguisticphenomena?– Alignment,Paraphrasing,Aggregation– Coreference,SyntacticandSemanticDependencies

DesigningtheconnectivitypatternofaNeuralNetworktoreflectthenatureoftheproblembeingsolved.

(Part1)

(Part2)

Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.

Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”

TextPhenomena

Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)

Alignment

Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.

Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”

TextPhenomena

Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)

Alignment Paraphrasing

Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.

Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”

TextPhenomena

Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)

Alignment AggregationParaphrasing

Alignment

WordVectors+RNNstorepresentDocumentandQuery

Multiplepassesoverthedocument

+PointerSumAttention

AggregationParaphrasing

MultiplicativeAttention

Biases

•  Ascompositionsofwordvectors

arrested Illinois governor Rod Blagojevich

Thingswhichhelp:Ø  PretrainedGloveembeddingsØ  RandomvectorsforOOVtokensattesttime.

•  Betterthantrained“UNK”embedding.Ø  Characterembeddings

*Dhingra,Liu,Salakhutdinov,Cohen(preprint,2017)**Yang,Dhingra,Yuan,Hu,Cohen,Salakhutdinov(ICLR,2017)

RepresentingDocument/Query

•  BidirectionalGatedRecurrentUnitsprocessthetokensfromlefttorightandrighttoleft

arrested Illinois governor Rod Blagojevich

RepresentingDocument/Query

dt = [�!ht ; �ht ]

•  Bothdocumentandqueryarerepresentedasmatrices

RepresentingDocument/Query

arrested Illinois governor Rod Blagojevich corruption by X who was

Q 2 R2h⇥|Q|D 2 R2h⇥|D|

h–StatesizeofeachGRU

•  ForeachtokeninD,weformatokenspecificrepresentationofthequery

Blagojevich corruption by X who was

di

GatedAttentionMechanism

Dhingra et al (ACL 2017)

↵ij = softmax(qTj di)q̃i =

|Q|X

j=1

↵ijqj

document query

qj

GatedAttentionMechanism•  Useelement-wisemultiplicationtogatethe

interactionbetweendocumenttokensandquery

corruption by X who wasBlagojevich

di

d0i = di ⇤ q̃i

Dhingra et al (ACL 2017)

q̃i =

|Q|X

j=1

↵ijqj

1. Findfeaturesinthequerywhichmatchthecontextualrepresentationofthedocument

2. Gatethedocumentrepresentationbymultiplyingwiththesefeatures

MultiHopArchitecture•  Performseveralpassesoverthedocument

–  Allowmodeltocombineevidencefrommultiplesentences

Dhingra et al (ACL 2017)corruption by X who wasBlagojevicharrested

RepeatforKlayers

GatedAttention

GatedAttention

Multi-hopArchitecture•  Performseveralpassesoverthedocument

–  Allowmodeltocombineevidencefrommultiplesentences

Dhingra et al (ACL 2017)

OutputModel• Probabilitythataparticulartokeninthedocumentanswersthequery:

Ø  Takeaninnerproductbetweenthequeryembeddingandtheoutputofthelastlayer:

• Theprobabilityofaparticularcandidateisthenaggregatedoveralldocumenttokenswhichappearinc:

setofpositionswhereatokenincappearsinthedocumentd.

PointerSumAttentionofKadlecetal.,2016

si =exp (hq(K), d(K)

i i)P

i0 exp (hq(K), d(K)i0 i)

, i = 1, . . . , |D|

• Theprobabilityofaparticularcandidateisthenaggregatedoveralldocumenttokenswhichappearinc:

• Thecandidatewithmaximumprobabilityisselectedasthepredictedanswer:

• Usecross-entropylossbetweenthepredictedprobabilitiesandthetrueanswers.

OutputModel

Results

Datasets

Westudied5datasets:

Results

Datasets

Westudied5datasets:Stateoftheart

Dhingra et al (ACL 2017)

AnalysisofAttention

Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.

Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”

Blagojevich corruption by X who was

di

↵ij = softmax(qTj di)q̃i =

|Q|X

j=1

↵ijqj

AnalysisofAttention

Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.

Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”

Layer1 Layer2

Dhingra et al (ACL 2017)

Summarysofar

•  Multiplicativeattentionfordocumentandqueryalignment

•  Multiplelayersallowmodeltofocusondifferentsalientaspectsofthequery

Code+Data:https://github.com/bdhingra/ga-reader

Wordsvs.Characters• Word-levelrepresentationsaregoodatlearningthesemanticsofthetokens

• Character-levelrepresentationsaremoresuitableformodelingsub-wordmorphologies(“cat”vs.“cats”)

Ø  Word-levelrepresentationsareobtainedfromalearnedlookuptable

Ø  Character-levelrepresentationsareusuallyobtainedbyapplyingRNNorCNN

• Hybridword-charactermodelshavebeenshowntobesuccessfulinvariousNLPtasks(Yangetal.,2016a,Miyamoto&Cho(2016),Lingetal.,2015)

• Commonlyusedmethodistoconcatenatethesetworepresentations

Fine-GrainedGating• Fine-grainedgatingmechanism:

Word-levelrepresentation

Character-levelrepresentation

Gating

Additionalfeatures:namedentitytags,part-of-speechtags,documentfrequencyvectors,wordlook-uprepresentations

Yang et al, ICLR 2017

Children’sBookTest(CBC)Dataset

Wordsvs.Characters• Highgatevalues:character-levelrepresentations• Lowgatevalues:word-levelrepresentations.

TalkRoadmap

•  MultiplicativeandFine-grainedAttention•  LinguisticKnowledgeasExplicitMemoryforRNNs

Broad-ContextLanguageModelingHer plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.'' She gave me a quick nod and turned back to X

Task:FindX.

LAMBADA dataset, Paperno et al., ACL 2016

(Xcanbeanywordinthevocabulary)

Passagesarefilteredsuchthathumanscancorrectlyanswerwhengiventhewholecontextbutnotwhengivenonlythelastsentence.

Broad-ContextLanguageModeling(recastasReadingComprehension)

Query:She gave me a quick nod and turned back to XFindX.(NowXisassumedtobeinthedocument)

Document:Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.''

Chu et al, EACL 2017

•  ReadingComprehensionapproachesperformbestonthistask

TextPhenomenaDocument:Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.''

Query:

She gave me a quick nod and turned back to XFindX.

conjnsubj nmod

A0:turner A1:destination

SyntacticDependency

SemanticDependency

TextPhenomenaDocument:Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.''

Query:

She gave me a quick nod and turned back to XFindX.

SyntacticDependency

SemanticDependency

X=Terry

Coreference

ArchitecturalBias

1.  Dependent/Coreferentmentionsmaybefarapartinthedocument–  RNNs(includingGRU/LSTM)tendto“forget”long-terminteractions

Usememory-augmentedarchitecture.2.  Off-the-shelfNLPtoolsavailablewhichextract

thesedependencies–  Example:StanfordCoreNLP

Uselinguisticannotationstoguidememorypropagationinthenetwork.

LinguisticKnowledge

there ball the left and kitchen the to went she football the got mary

CoreNLP / SENNA

there ball the left and kitchen the to went she football the got mary

1. Coreference2. SyntacticDependencies3. SemanticRoles+Inverserelations

•  Bothdocumentandqueryarerepresentedasmatrices

RepresentingDocument/Query

arrested Illinois governor Rod Blagojevich corruption by X who was

Q 2 R2h⇥|Q|D 2 R2h⇥|D|

h–StatesizeofeachGRU

•  Bothdocumentandqueryarerepresentedasmatrices

RepresentingDocument/QueryGraphs

Q 2 R2h⇥|Q|D 2 R2h⇥|D|

there ball the left and kitchen the to went she football the got mary there ball the left and kitchen the to went she football the got mary

GraphRepresentation GraphRepresentation

Forward/BackwardDAGs•  Topologicalordergivenbythesequenceitself

there ball the left and kitchen the to went she football the got mary

there ball the left and kitchen the to went she football the got mary

ForwardDAG

BackwardDAG

*Peng et al (TACL, 2017) use similar decomposition for Relation Extraction

MemoryasAcyclicGraphEncoding(MAGE)RNN

there ball the left and kitchen the to went she football the got mary

went

she

she

left

to

kitchenh3t

h2th1t

m1t

m3t

MemoryasAcyclicGraphEncoding(MAGE)RNN

there ball the left and kitchen the to went she football the got mary

went

Updateequations:

mt = m1tkm2

tk . . . km|Ef |t ,

rt = �(Wrxst + Urmt + br),

zt = �(Wzxst + Uzmt + bz),

h̃t = tanh(Whxst + rt � Uhmt + bh),

ht = (1� zt)� ht�1 + zt � h̃t,

Memory

GRUupdate

Performance•  LAMBADAdataset

– Onlyonquestionswhereanswerisincontext

Performance•  LAMBADAdataset

– OnlyonquestionswhereanswerisincontextStateoftheart

Document:marytravelledtothekitchen.danielgotthemilk.hewenttothekitchen.johnpickeduptheapple.marywenttothehallway.johnmovedtothebedroom.sandrajourneyedtothekitchen.danielwentbacktothebedroom.johnwenttotheoffice.danielputdownthemilk.hepickedupthemilk.sandrawenttothebedroom.johndroppedtheapple.marywentbacktothebathroom.

Performance•  FacebookbAbIdataset

–  Task3(1000trainingexamples)

Query:wherewastheapplebeforetheoffice?

Answer:bedroom

Onlycoreferenceannotationsextractedforthisdataset

Performance•  FacebookbAbIdataset

–  Task3

Performance•  FacebookbAbIdataset

–  Task3Onehotindicatorfeatureforentitymentionappendedtoinput

Performance•  FacebookbAbIdataset

–  Task3

Summarysofar

•  LinguisticannotationscanhelpguidememorypropagationinRNNs– Bettermodelingoflongtermdependencies

AnnotatorNoise

•  Performancedropssharplywithaddednoiseinlinguisticannotations

•  Canwejointlytrainannotationandreadingcomprehensionmodels?–  Canwelearntask-specificknowledge?

“Common”Sense?

Query:“Whowasthefemalememberofthe1980’spopmusicduoEurythmics?”

Document:“EurythmicswereaBritishmusicduoconsistingofmembersAnnieLennoxandDavidA.Stewart”

Answer:AnnieLennox

TriviaQA Dataset (Joshi et al, ACL 2017)

•  Wherecanwefindsuchknowledge?•  Needmoredatasetsfortestingthisaspecttoo.

HerplainfacebrokeintoahugesmilewhenshesawTerry.“Terry!”shecalledout.Sherushedtomeethimandtheyembraced.“Hon,Iwantyoutomeetanoldfriend,OwenMcKenna.Owen,pleasemeetEmily.'’ShegavemeaquicknodandturnedbacktoX

CoreferenceDependencyParses

Entityrelations

Wordrelations

CoreNLP

Freebase

WordNet

RecurrentNeuralNetwork

TextRepresentation

IncorporatingPriorKnowledge

Search-and-ReadforopendomainQA•  RCassumespassagecontaininganswerisalreadyknown

•  ForrealQAweneedtosearchforthepassageaswell

7-ElevenstoresweretemporarilyconvertedintoKwikE-martstopromotethereleaseofwhatmovie?

QUASAR Dataset (Dhingra, Mazaitis, Cohen, in submission)

InJuly2007,7-ElevenredesignedsomestorestolooklikeKwik-E-MartsinselectcitiestopromoteTheSimpsonsMovie.

Corpus

Tie-inpromotionsweremadewithseveralcompanies,including7-Eleven,whichtransformedselectedstoresintoKwik-E-Marts.

Retrieval

7-ElevenBecomesKwik-E-MartforSimpsonsMoviePromotion

Excerpts

Thankyou