State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a...
State-of-the-art Parsing
State-of-the-art Parsers
‣ 2005: Eisner algorithm graph-based parser was SOTA (~91 UAS)
‣ 2010: Better graph-based parsers using “parent annotation” (~93 UAS)
‣ 2012: Transition-based MaltParser achieved good results (~90 UAS)
‣ 2014: Stanford neural dependency parser (Chen and Manning) got 92 UAS with transition-based neural model
‣ 2016: Improvements to Chen and Manning
‣ Unlabeled attachment score (UAS): fraction of words with correct parent
‣ Labeled attachment score (LAS): fraction of words with correct parent and correct edge label (but labeling isn't that hard: noun before verb -> nsubj in most contexts)
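The two attachment scores can be computed in a few lines. The sketch below is only illustrative: the function name `attachment_scores`, the (head, label) encoding, and the example parse with its UD-style labels are assumptions made here, not something taken from the slides.

```python
from typing import List, Tuple

# Each word is encoded as (head_position, label); head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions over the words of one sentence."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    # UAS only checks the predicted parent of each word.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS requires both the parent and the edge label to match.
    correct_arcs = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return correct_heads / n, correct_arcs / n


# Hypothetical parses of "Lady Gaga performed a concert" (1-based heads).
gold = [(2, "compound"), (3, "nsubj"), (0, "root"), (5, "det"), (3, "obj")]
pred = [(2, "compound"), (3, "obj"), (0, "root"), (3, "det"), (3, "obj")]
uas, las = attachment_scores(gold, pred)
```

In this made-up prediction, one word has the wrong parent and one has the right parent but the wrong label, so LAS comes out lower than UAS.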
Stanford Dependency Parser
Chen and Manning (2014)
‣ Feedforward neural network on top of feature vector extracted from stack and buffer
Features: 1st in stack, 2nd in stack, 1st in buffer, POS of leftmost child of 1st in stack, …
‣ MSTParser: “graph-based” parser (like CKY) from 2005, so Chen and Manning's parser isn't much better but is much faster!
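As a rough illustration of extracting features from the stack and buffer, the sketch below reads off the four features named on the slide from a parser state. The state encoding, the `<NULL>` placeholder, and the helper names are assumptions for this example. In Chen and Manning's model such symbols are mapped to embeddings before being fed to the feedforward network, and that step is omitted here.

```python
from typing import Dict, List, Optional

NULL = "<NULL>"  # placeholder used when a position is empty


def leftmost_child(head: int, arcs: Dict[int, int]) -> Optional[int]:
    """Return the smallest-index dependent of `head`, or None.

    `arcs` maps dependent index -> head index for the arcs built so far.
    """
    children = [dep for dep, h in arcs.items() if h == head]
    return min(children) if children else None


def extract_features(stack: List[int], buffer: List[int],
                     words: List[str], tags: List[str],
                     arcs: Dict[int, int]) -> List[str]:
    """Collect four features: 1st/2nd in stack, 1st in buffer, POS of
    the leftmost child of the 1st item in the stack."""
    s1 = stack[-1] if len(stack) >= 1 else None
    s2 = stack[-2] if len(stack) >= 2 else None
    b1 = buffer[0] if buffer else None
    lc = leftmost_child(s1, arcs) if s1 is not None else None
    return [
        words[s1] if s1 is not None else NULL,
        words[s2] if s2 is not None else NULL,
        words[b1] if b1 is not None else NULL,
        tags[lc] if lc is not None else NULL,
    ]


words = ["Lady", "Gaga", "performed", "a", "concert"]
tags = ["NNP", "NNP", "VBD", "DT", "NN"]
# Hypothetical mid-parse state: "Lady" (0) is already attached to "Gaga" (1).
features = extract_features(stack=[1], buffer=[2, 3, 4],
                            words=words, tags=tags, arcs={0: 1})
```

Here the stack holds only "Gaga", so the second stack feature falls back to the placeholder.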
Parsey McParseFace (a.k.a. SyntaxNet)
Andor et al. (2016)
‣ Close to state-of-the-art, released by Google publicly
‣ 94.61 UAS on the Penn Treebank using a transition-based system
‣ Same feature set as Chen and Manning (2014), Google fine-tuned it
‣ Additional data harvested via “tri-training”, form of self-training
https://github.com/tensorflow/models/tree/master/research/syntaxnet
AllenNLP
‣ Very nice and usable web demo
‣ Reimplementation of graph-based, state-of-the-art parser
‣ Some fancy tricks we haven't discussed yet
https://demo.allennlp.org/dependency-parsing
Other languages
‣ Annotate dependencies with the same representation in many languages
http://universaldependencies.org/
Example languages shown: English, Bulgarian, Czech, Swedish
Semantic Role Labeling
Semantic Role Labeling
Lady Gaga performed a concert for students
‣ Performing event
‣ Subject: Lady Gaga
‣ Object: a concert
‣ Audience: students
A concert was performed by Lady Gaga for students
‣ Same event described but the representation looks different
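Active: [ARG0 Lady Gaga] [V performed] [ARG1 a concert] [ARG2 for students]
Passive: [ARG1 A concert] was [V performed] [ARG0 by Lady Gaga] [ARG2 for students]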
VerbNet
verbs.colorado.edu
‣ Defines the semantics of verbs, arguments for every verb in English
Semantic Roles
‣ Related to theta roles in linguistics
‣ Agent (~subject), patient/theme (~object), goal (~indirect object)
‣ “Postprocessing” layer on top of dependency parsing that exposes useful information, canonicalizes across grammatical constructions
ARG0, ARG1, ARG2+ (semantics vary)
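To make "canonicalizes across grammatical constructions" concrete, the sketch below writes the role assignments for the active and passive Lady Gaga sentences as plain dictionaries and checks that they describe the same event. The role assignments are hand-written for illustration. A real SRL system would predict them, and the `normalize` helper is a hypothetical convenience.

```python
from typing import Dict

# Hand-written role assignments, listed in surface order for each sentence.
# "Lady Gaga performed a concert for students"
active: Dict[str, str] = {
    "ARG0": "Lady Gaga",
    "V": "performed",
    "ARG1": "a concert",
    "ARG2": "students",
}
# "A concert was performed by Lady Gaga for students"
passive: Dict[str, str] = {
    "ARG1": "A concert",
    "V": "performed",
    "ARG0": "Lady Gaga",
    "ARG2": "students",
}


def normalize(frame: Dict[str, str]) -> Dict[str, str]:
    """Lowercase the argument text so sentence-initial casing is ignored."""
    return {role: text.lower() for role, text in frame.items()}


# Dictionary equality ignores insertion order, so only the role -> filler
# mapping matters, not where each argument appeared in the sentence.
same_event = normalize(active) == normalize(passive)
```

The surface order of the arguments differs between the two sentences, but the role-to-filler mapping is identical, which is the point of the representation.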
Semantic Role Labeling
‣ Identify predicate, disambiguate it, identify that predicate's arguments
‣ Verb roles from PropBank (Palmer et al., 2005)
Figure from He et al. (2017)
quicken:
SRL for QA
Shen and Lapata (2007)
‣ Question and several answer candidates
Q: Who discovered prions?
AC1: In 1997, Stanley B. Prusiner, a scientist in the United States, discovered prions…
AC2: Prions were researched by…
‣ Score by matching expected answer phrase (EAP) against answer candidate (AC)
Answer: Prusiner
More on SRL
‣ Emma Strubell from UMass Amherst: “Neural Network Architectures for Fast and Robust NLP”
‣ Tuesday, 11am GDC main auditorium
‣ Includes discussion of work on neural SRL system
‣ Even complex neural network models for SRL benefit from dependency information
Relation Extraction
Relation Extraction
Tim Cook is the CEO of Apple.
Apple CEO Tim Cook said that…
Apple shares have taken a beating, much to the chagrin of its CEO, Tim Cook
Cook's tenure as CEO of Apple…
Wozniak's desire to be CEO…
Relation Extraction
ACE (2003-2005)
‣ Extract entity-relation-entity triples from a fixed inventory
During the war in Iraq, American journalists were sometimes caught in the line of fire
Relations in this example: Located_In, Nationality
‣ Use NER-like system to identify entity spans, classify relations between entity pairs with a classifier
‣ Systems can be feature-based or neural, look at surface words, dependency path features, semantic roles
‣ Problem: limited data for scaling to big ontologies
Distant Supervision
Mintz et al. (2009)
‣ Lots of relations in our knowledge base already (e.g., 23,000 film-director relations); use these to bootstrap more training data
‣ If two entities in a relation appear in the same sentence, assume the sentence expresses the relation
[Steven Spielberg]'s film [Saving Private Ryan] is loosely based on the brothers' story => Director
Allison co-produced the Academy Award-winning [Saving Private Ryan], directed by [Steven Spielberg] => Director
‣ Learn decently accurate classifiers for ~100 Freebase relations
‣ Could be used to crawl the web and expand our knowledge base
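The distant supervision assumption can be sketched as a simple labeling loop: any sentence containing both entities of a known relation becomes a (noisy) training example for that relation. The toy knowledge base, the plain substring matching, and the third corpus sentence are assumptions added here for illustration. A real pipeline would use an entity tagger rather than string search.

```python
from typing import Dict, List, Tuple

# Toy knowledge base: (entity1, entity2) -> relation. Contents are illustrative.
KB: Dict[Tuple[str, str], str] = {
    ("Steven Spielberg", "Saving Private Ryan"): "Director",
}


def distant_labels(sentences: List[str]) -> List[Tuple[str, str, str, str]]:
    """Label a sentence with a KB relation when both entities occur in it."""
    examples = []
    for sent in sentences:
        for (e1, e2), rel in KB.items():
            # Substring match stands in for real entity recognition.
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples


corpus = [
    "Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story",
    "Allison co-produced the Academy Award-winning Saving Private Ryan, directed by Steven Spielberg",
    "Saving Private Ryan was released in 1998",  # only one entity: no label
]
training = distant_labels(corpus)
```

Note that the first sentence gets the Director label even though it never says who directed the film, which is the kind of noise this assumption introduces.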
Open IE
Open Information Extraction
‣ Typically no fixed relation inventory
‣ “Open”ness: want to be able to extract all kinds of information from open-domain text
‣ Acquire commonsense knowledge just from “reading” about it, but need to process lots of text (“machine reading”)
TextRunner
Banko et al. (2007)
‣ Extract positive examples of (e, r, e) triples via parsing and heuristics
Barack Obama, 44th president of the United States, was born on August 4, 1961 in Honolulu
=> Barack_Obama, was born in, Honolulu
‣ Train a Naive Bayes classifier to filter triples from raw text: uses features on POS tags, lexical features, stopwords, etc.
‣ 80x faster than running a parser (which was slow in 2007…)
‣ Use multiple instances of extractions to assign probability to a relation
Exploiting Redundancy
Banko et al. (2007)
‣ 9M web pages / 133M sentences
‣ 2.2 tuples extracted per sentence, filter based on probabilities
‣ Concrete: definitely true; Abstract: possibly true but underspecified
‣ Hard to evaluate: can assess precision of extracted facts, but how do we know recall?
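One simple way to turn repeated extractions into a probability and then filter on it is a noisy-or combination, sketched below. This is a stand-in chosen for illustration and is not claimed to be the redundancy model used by Banko et al. The confidences, the second triple, and the 0.7 threshold are all made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def aggregate(extractions: Iterable[Tuple[Triple, float]]) -> Dict[Triple, float]:
    """Combine per-instance confidences for the same triple with noisy-or:
    the triple is wrong only if every instance of it is wrong."""
    prob_all_wrong: Dict[Triple, float] = defaultdict(lambda: 1.0)
    for triple, confidence in extractions:
        prob_all_wrong[triple] *= (1.0 - confidence)
    return {t: 1.0 - p for t, p in prob_all_wrong.items()}


# Hypothetical per-sentence classifier confidences.
instances: List[Tuple[Triple, float]] = [
    (("Barack_Obama", "was born in", "Honolulu"), 0.5),
    (("Barack_Obama", "was born in", "Honolulu"), 0.5),
    (("Honolulu", "was born in", "1961"), 0.25),
]
scores = aggregate(instances)

THRESHOLD = 0.7  # arbitrary cutoff for this sketch
kept = [t for t, p in scores.items() if p >= THRESHOLD]
```

A triple seen twice at moderate confidence clears the threshold, while a triple seen once at low confidence is filtered out.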
ReVerb
Fader et al. (2011)
‣ More constraints: open relations have to begin with verb, end with preposition, be contiguous (e.g., was born on)
‣ Extract more meaningful relations, particularly with light verbs
‣ For each verb, identify the longest sequence of words following the verb that satisfy a POS regex (V .* P) and which satisfy heuristic lexical constraints on specificity
‣ Find the nearest arguments on either side of the relation
‣ Annotators labeled relations in 500 documents to assess recall
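The POS-pattern step can be approximated with an ordinary regular expression over a string of one-letter tags. The sketch below makes several assumptions: the coarse tag set is invented for this example, the middle of the pattern is restricted to nouns and a few other word classes rather than the slide's unrestricted `.*`, only the first match in a sentence is returned, and the lexical specificity constraints are left out entirely.

```python
import re
from typing import List, Optional, Tuple

# Coarse one-letter tags (an assumption for this sketch):
# V = verb, P = preposition, N = noun,
# W = adjective/adverb/pronoun/determiner, O = anything else.
# One character per token, so string offsets equal token indices.
RELATION = re.compile(r"V+[NW]*P")


def extract(tokens: List[str], tags: List[str]) -> Optional[Tuple[str, str, str]]:
    """Return (arg1, relation, arg2) for the first pattern match, or None."""
    match = RELATION.search("".join(tags))
    if match is None:
        return None
    start, end = match.span()
    # Nearest noun to the left and to the right of the relation phrase.
    left = [i for i in range(start) if tags[i] == "N"]
    right = [i for i in range(end, len(tags)) if tags[i] == "N"]
    if not left or not right:
        return None
    return tokens[left[-1]], " ".join(tokens[start:end]), tokens[right[0]]


born = extract(["Obama", "was", "born", "in", "Honolulu"],
               ["N", "V", "V", "P", "N"])
# Light-verb case: the greedy match keeps "a deal" inside the relation.
deal = extract(["Faust", "made", "a", "deal", "with", "the", "devil"],
               ["N", "V", "W", "N", "P", "W", "N"])
```

Because the match is greedy, the light verb "made" pulls in its noun phrase, giving a more meaningful relation than the bare verb would.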
QA from Open IE
Choi et al. (2015)
Takeaways
‣ SRL/AMR: handle a bunch of phenomena, but more or less like syntax++ in terms of what they represent
‣ Relation extraction: can collect data with distant supervision, use this to expand knowledge bases
‣ Open IE: extracts lots of things, but hard to know how good or useful they are
‣ Can combine with standard question answering
‣ Add new facts to knowledge bases
‣ Many, many applications and techniques
Roadmap
Roadmap
‣ Classification: conventional and neural, word representations (3 weeks)
‣ Linear and neural classification
‣ How to build effective word vectors
‣ Text analysis: tagging, parsing, information extraction (3.5 weeks)
‣ Structured models for sequences, trees (HMMs, PCFGs), as well as unstructured approaches (transition-based parsing)
‣ Lots of NLP tasks can be formulated as tagging
Applications of Tagging
Portnoff et al. (2017), Durrett et al. (2017)
‣ Extract product occurrences in cybercrime forums, but not everything that looks like a product is a product
[Figure label: “Not a product in this context”]
Roadmap
‣ Classification: conventional and neural, word representations (3 weeks)
‣ Linear and neural classification
‣ How to build effective word vectors
‣ Text analysis: tagging, parsing, information extraction (3.5 weeks)
‣ Structured models for sequences, trees (HMMs, PCFGs), as well as unstructured approaches (transition-based parsing)
‣ Generation, applications: language modeling, machine translation, dialogue (4 weeks)
‣ Other applications: question answering, TBD (3 weeks)
‣ Missing: structured neural models. These are a bit beyond this class but we'll see one way to do this after spring break