Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ......

37
Texts as Knowledge Bases ChristopherManning Joint work with Gabor Angeli and Danqi Chen Stanford NLP Group @chrmanning · @stanfordnlp AKBC 2016

Transcript of Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ......

Page 1: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

TextsasKnowledgeBases

ChristopherManningJointworkwithGaborAngeliandDanqiChen

StanfordNLPGroup@chrmanning·@stanfordnlp

AKBC2016

Page 2: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear
Page 3: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

MachineComprehension=MachinehasanAugmentedKnowledgeBase

“Amachinecomprehends apassageoftext if,foranyquestion regardingthattextthatcanbeansweredcorrectlybyamajorityofnativespeakers,thatmachinecanprovideastringwhichthosespeakerswouldagreebothanswersthatquestion,anddoesnotcontaininformationirrelevanttothatquestion.”

3

Page 4: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

HowfardocurrentdeeplearningreadingcomprehensionsystemsgoinachievingChrisBurges’sgoal?

4

Twocasestudies…previewsofACL2016

Howcanweusenaturallogicandshallowreasoningtobettertreattextsasaknowledgebase?

Page 5: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

DeepMindRCdataset[Hermannetal.2015]

5

Page 6: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

DeepMindRCdataset

Largedataset

Reallanguage

GoodforDLtraining!

“Artificial”pre-processing(coref,anonymization)

Howhard?

Isitagoodtask?

6

Page 7: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

ResultsonDeepMindRCwhenwebegan[Hermannetal.2015;Hilletal.2016]

SystemCNNDev

CNNTest

DailyMailDev

DailyMailTest

Frame-semanticmodel 36.3 40.2 35.5 35.5Worddistancemodel 50.5 50.9 56.4 55.5DeepLSTMReader 55.0 57.0 63.3 62.2AttentiveReader 61.6 63.0 70.5 69.0ImpatientReader 61.8 63.8 69.0 68.0MemNNwindowmemory 58.0 60.6MemNN window+selfsup 63.4 66.8MemNN win,ss,ens,no-c 66.2 69.4

7

Page 8: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Framesemanticsorsimplesyntax?

Frame-semanticparsingattemptstoidentifypredicatesandtheirsemanticarguments– shouldbegoodforquestionanswering!

Hermannetal.usea“state-of-the-art frame-semanticparser”–Googleversionof[Dasetal.2013,Hermannetal.2014]

Butframesemanticsystemshavecoverageproblems,notrepresentingpertinentrelationsnotmappedontoverbalframes

Howaboutagoodoldfeature-basedsystem,usingasyntacticdependencyparser?

8

Page 9: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

SystemI:StandardEntity-CentricClassifier[Chen,Bolton,&Manning,ACL2016]

• Buildasymbolic featurevectorforeachentity:• Thegoalistolearnfeatureweightssuchthatthecorrectanswer

rankshigherthantheotherentities• TrainlogisticregressionandMARTclassifier(boosteddecision

trees– thesedobetterandarereported)

9

• Whethere isinthepassage• Whethere isinthequestion• Frequencyof e inpassage• Firstpositionofe inpassage• n-gramexactmatch (featuresformatchingL/R1/2words)• Worddistance ofquestionwordsinpassage• Whethereco-occurswithq verboranotherentity• Syntacticdependencyparsetriplematcharounde

Page 10: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Competent(traditional)statisticalNLP…

SystemCNNDev

CNNTest

DailyMailDev

DailyMailTest

Frame-semanticmodel 36.3 40.2 35.5 35.5ImpatientReader 61.8 63.8 69.0 68.0Competent statisticalNLP 67.1 67.9 69.1 68.3MemNN window+selfsup 63.4 66.8MemNN win,ss,ens,no-c 66.2 69.4

10

Page 11: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Ablatingindividualfeatures

11

Page 12: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

SystemII:End-to-EndNeuralNetwork[Chen,Bolton,&Manning,ACL2016]

12

Page 13: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

SystemII:End-to-EndNeuralNetwork

Nomagicatall;wemakeourmodelassimple aspossible• Learnedwordembeddingsfeedinto• Bi-directionalshallowLSTMsforpassageandquestion• Questionrepresentationusedforsoftattentionoverpassagewithsimplebilinearattentionfunction

• Afinalsoftmax layerpredictstheanswerentity• SGD,dropout(0.2),batchsize=32,hiddensize=128,…

13

Page 14: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Competentnew-fangledNLP…

System CNNDev CNNTest DMDev DMTestImpatientReader 61.8 63.8 69.0 68.0Competent statisticalNLP 67.1 67.9 69.1 68.3OurLSTMwithattention 72.4 72.4 76.9 75.8MemNN window+selfsup 63.4 66.8MemNN win,ss,ensem,no-c 66.2 69.4

14

Differences:Simplebilinearattention[Luong,Pham,&Manning2015]Hermannetal.hadanextra,unnecessarylayerjoiningoandqWepredictamongentities,notallwords(butdoesn’tmakeadifference)Maybewe’rebetterattuningneuralnets?Beendoingitforawhile.

Page 15: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

OurResults

Wearequitehappywiththenumbers

[and,BTW,severalotherpeoplehavenowgottensimilarnumbers]

… but what do they really mean?

15

• Whatleveloflanguageunderstandingisneeded?• Whathavethemodelsactuallylearned?

Page 16: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

DataAnalysis

Abreakdown oftheexamples

Exactmatch

Sentence-levelparaphrasing/textualentailment

PartialclueMultiplesentences

Coreferenceerrors

Ambiguousortoohard

16

Page 17: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear
Page 18: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

DataAnalysis

• 25%:coreferenceerrors+hardcases• Only2% requiremultiplesentences

18

Page 19: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Data Analysis

19

Page 20: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Discussion

• TheDeepMindRCdataisquitenoisy• Therequiredreasoningandinferencelevelisquitelimited• Thereisn’tmuchroomleftforimprovement• However,thescaleandeaseofdataproductionisappealing• CanwemakeuseofthisdatainsolvingmorerealisticRCtasks?

• Neuralnetworksaregreat forlearningsemanticmatchesacrosslexicalvariationorparaphrasing!

• LSTMswith(simplebilinear)attentionaregreat!• NotyetprovenwhetherNNscandomorechallengingRCtasks20

Page 21: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

AI24th GradeScienceQuestionAnswering[Angeli,Nayak,&Manning,ACL2016]

Our“knowledge”:

Ovariesarethefemalepartoftheflower,whichproduceseggsthatareneededformakingseeds.

Thequestion:

Whichpartofaplantproducestheseeds?

Theanswerchoices:

theflowertheleavesthestemtheroots

21

Page 22: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Howcanwerepresentandreasonwithbroad-coverageknowledge?

1. Rigid-schemaknowledgebaseswithwell-definedlogicalinference

2. Open-domainknowledgebases(OpenIE)– noclearontologyorinference[Etzionietal.2007ff]

3. HumanlanguagetextKB–Norigidschema,butwith“Naturallogic”candoformalinferenceoverhumanlanguagetext

22

Page 23: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

TextasKnowledgeBase

Storingknowledgeastextiseasy!

Doinginferencesovertextmightbehard

Don’t want to run inference over every fact!

Don’t want to store all the inferences!

Page 24: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Inferences…ondemandfromaquery…[AngeliandManning2014]

Page 25: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

…usingtextasthemeaning representation

Page 26: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

NaturalLogic:logicalinferenceovertext

WearedoinglogicalinferenceThecatateamouse⊨ ¬Nocarnivoreseatanimals

WedoitwithnaturallogicIfImutateasentenceinthisway,doIpreserveitstruth?Post-DealIranAsksifU.S.IsStill‘GreatSatan,’orSomethingLess ⊨ACountryAsksifU.S.IsStill‘GreatSatan,’orSomethingLess

• Asoundandcompleteweaklogic[Icard andMoss2014]• Expressiveforcommonhumaninferences*• “Semantic”parsingisjustsyntacticparsing• Tractable:Polynomialtimeentailmentchecking• Playsnicelywithlexicalmatchingback-offmethods

Page 27: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

#1.Commonsensereasoning

PolarityinNaturalLogic

Weorderphrasesinpartialorders(notjustis-a-kind-of,canalsodogeographicalcontainment,etc.)

Polarity isthedirectionaphrasecanmoveinthisorder

Page 28: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Exampleinferences

Quantifiersdeterminethepolarity ofphrases

Validmutationsconsider polarity

Successfultoyinference:

Allcatseatmice ⊨ Allhousecatsconsumerodents

Page 29: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

“Soft”NaturalLogic

Wealsowanttomakelikely(butnotcertain)inferences

• SamemotivationasMarkovlogic,probabilisticsoftlogic,etc.

• Eachmutationedgetemplatehasacostθ ≥0

• Costofanedgeisθi ·fi

• Costofapathisθ ·f

• Canlearnparametersθ

• Inferenceisthengraphsearch

Page 30: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

#2. Dealingwithreal,long sentences

Naturallogicworkswithfactsliketheseintheknowledgebase:ObamawasborninHawaii

Butreal-worldsentencesarecomplex:BorninHonolulu,Hawaii,Obama isagraduateofColumbiaUniversityandHarvardLawSchool,whereheservedaspresidentoftheHarvardLawReview.

Approach:1. Classifieryieldsentailedclausesfromalongsentence2. Shortenclauseswithnaturallogicinference

Page 31: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

UniversalDependencies(UD)http://universaldependencies.github.io/docs/

Asingleleveloftypeddependencysyntaxthatgivesasimple,human-friendlyrepresentationofsentencestructureandmeaning

Betterthanaphrase-structuretreeformachineinterpretation–it’salmostasemanticnetwork

UDaimstobelinguisticallybetteracrosslanguagesthanearlier,common,simpleNLPrepresentations,suchasCoNLLdependencies

Page 32: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Generationofminimalclauses

1. Classificationproblem:givenadependencyedge,isitaclause?

2. Isitmissingacontrolledsubjectfromsubj/object?

3. Shortenclauseswhilepreservingvalidity!• Allyoungrabbits drinkmilk⊭

Allrabbits drinkmilk

• OK: SJC,thebayarea’sthirdlargestairport,isexperiencingdelaysduetoweather.

• Oftenbetter:SJCisexperiencingdelays.

Usingnaturallogic

Page 33: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

#3.Addalexicalalignmentclassifier

• Sometimeswecan’tquitemaketheinferencesthatwewouldliketomake:

• Weuseasimplelexicalmatchback-offclassifierwithfeatures:• Matchingwords,mismatchedwords,unmatchedwords• Thesealwaysworkprettywell– the lessonofRTEevaluations

Page 34: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Thefullsystem

• Werunourusualsearchoversplitup,shortenedclauses• Ifwefindapremise,great!• Ifnot,weusethelexicalclassifierasanevaluationfunction

• Weworktodothisquickly• Visit1Mnodes/second,don’trefeaturize,justdelta• 32bytesearchstates(thanksGabor!)

Page 35: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Solving4th gradescience(AllenAIdatasets)

Multiplechoicequestionsfromreal4th gradescienceexamsWhichactivityisanexampleofagoodhealthhabit?

(A)Watchingtelevision(B)Smokingcigarettes(C)Eatingcandy(D)Exercisingeveryday

Inourcorpus knowledgebase:• PlasmaTV’scandisplayupto16millioncolors...greatfor

watchingTV...alsomakeagoodscreen.• Notsmokingordrinkingalcoholisgoodforhealth,regardless of

whetherclothingiswornornot.• Eatingcandyfordinerisanexampleofapoorhealthhabit.• Healthyisexercising

Page 36: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Solving4th gradescience(AllenAINDMC)

System Dev TestKnowBot[Hixon etal.NAACL2015] 45 –KnowBot(Oracle– humaninloop) 57 –IRbaseline(Lucene) 49 42NaturalLI 52 51Moredata+IRbaseline 62 58Moredata+NaturalLI 65 61NaturalLI+🔔 +(lex.classifier) 74 67Aristo[Clarketal.2016]6systems,evenmoredata 71

Testset:NewYorkRegents4thGradeScienceexammultiple-choice questions fromAI2Training:BasicisBarron’sstudyguide;moredataisSciText corpus fromAI2.Score:%correct

Page 37: Texts as Knowledge Bases · Frame-semantic parsing attempts to identify predicates and their ... • Question representation used for soft attention over passage with simple bilinear

Envoi

Canourknowledgebasejustbetext?

Naturallogicprovidesauseful,formal(weak)logicfortextualinference

Naturallogiciseasilycombinablewithlexicalmatchingmethods,includingneuralnetmethods

Theresultingsystemisusefulfor:

• Common-sensereasoning

• QuestionAnswering

• Also,OpenInformationExtraction