State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a...

8
State-of-the-art Parsing State-of-the-art Parsers 2012: Transi5on-based Maltparser achieved good results (~90 UAS) 2010: BeEer graph-based parsers using “parent annota5on” (~93 UAS) 2014: Stanford neural dependency parser (Chen and Manning) got 92 UAS with transi5on-based neural model 2005: Eisner algorithm graph-based parser was SOTA (~91 UAS) 2016: Improvements to Chen and Manning Labeled aEachment score: have to label each edge correctly (but this isn’t that hard — noun before verb -> nsubj in most contexts) Unlabeled aEachment score: frac5on of words with correct parent Stanford Dependency Parser Chen and Manning (2014) Feedforward neural network on top of feature vector extracted from stack and buffer 1st in stack 2nd in stack 1st in buf POS of le\most child of 1st in stack Stanford Dependency Parser Chen and Manning (2014)

Transcript of State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a...

Page 1: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

State-of-the-artParsing

State-of-the-artParsers

‣ 2012:Transi5on-basedMaltparserachievedgoodresults(~90UAS)

‣ 2010:BeEergraph-basedparsersusing“parentannota5on”(~93UAS)

‣ 2014:Stanfordneuraldependencyparser(ChenandManning)got92UASwithtransi5on-basedneuralmodel

‣ 2005:Eisneralgorithmgraph-basedparserwasSOTA(~91UAS)

‣ 2016:ImprovementstoChenandManning

‣ LabeledaEachmentscore:havetolabeleachedgecorrectly(butthisisn’tthathard—nounbeforeverb->nsubjinmostcontexts)

‣ UnlabeledaEachmentscore:frac5onofwordswithcorrectparent

StanfordDependencyParser

ChenandManning(2014)

‣ Feedforwardneuralnetworkontopoffeaturevectorextractedfromstackandbuffer

1stinstack 2ndinstack 1stinbuf POSofle\mostchildof1stinstack… …

StanfordDependencyParser

ChenandManning(2014)

Page 2: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

StanfordDependencyParser

ChenandManning(2014)

‣MSTParser:“graph-based”parser(likeCKY)from2005—soChen+Manning’sparserisn’tmuchbeEerbutismuchfaster!

ParseyMcParseFace

Andoretal.(2016)

‣ Closetostate-of-the-art,releasedbyGooglepublicly

‣ 94.61UASonthePennTreebankusingatransi5on-basedsystem

‣ SamefeaturesetasChenandManning(2014),Googlefine-tunedit

‣ Addi5onaldataharvestedvia“tri-training”,formofself-training

(a.k.a.SyntaxNet)

hEps://github.com/tensorflow/models/tree/master/research/syntaxnet

AllenNLP

‣ Veryniceandusablewebdemo

‣ Reimplementa5onofgraph-based,state-of-the-artparser

‣ Somefancytrickswehaven’tdiscussedyet

hEps://demo.allennlp.org/dependency-parsing

Otherlanguages‣ Annotatedependencieswiththesamerepresenta5oninmanylanguages

hEp://universaldependencies.org/

English

Bulgarian

Czech

Swiss

Page 3: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

Seman5cRoleLabeling

Seman5cRoleLabeling

LadyGagaperformedaconcertforstudents

‣ Performingevent

‣ Subject:LadyGaga‣Object:aconcert‣ Audience:students

AconcertwasperformedbyLadyGagaforstudents

‣ Sameeventdescribedbuttherepresenta5onlooksdifferent

VARG1 ARG0 ARG2

VARG0 ARG1 ARG2

VerbNet

verbs.colorado.edu

‣ Definestheseman5csofverbs,argumentsforeveryverbinEnglish

Seman5cRoles

‣ Relatedtothetarolesinlinguis5cs

‣ Agent(~subject),pa5ent/theme(~object),goal(~indirectobject)

‣ “Postprocessing”layerontopofdependencyparsingthatexposesusefulinforma5on,canonicalizesacrossgramma5calconstruc5ons

ARG0 ARG1 ARG2+(seman5csvary)

Page 4: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

Seman5cRoleLabeling

‣ Iden5fypredicate,disambiguateit,iden5fythatpredicate’sarguments

‣ VerbrolesfromPropbank(Palmeretal.,2005)

FigurefromHeetal.(2017)

quicken:

SRLforQA

ShenandLapata(2007)

‣ Ques5onandseveralanswercandidates

Q:Whodiscoveredprions?

AC1:In1997,StanleyB.Prusiner,ascienEstintheUnitedStates,discoveredprions…

AC2:Prionswereresearchedby…

Scorebymatchingexpectedanswerphrase(EAP)againstanswercandidate(AC)Prusiner

MoreonSRL

‣ EmmaStrubellfromUMassAmherst:“NeuralNetworkArchitecturesforFastandRobustNLP”

‣ EvencomplexneuralnetworkmodelsforSRLbenefitfromdependencyinforma5on

‣ Tuesday,11amGDCmainauditorium

‣ IncludesdiscussionofworkonneuralSRLsystem

Rela5onExtrac5on

Page 5: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

Rela5onExtrac5on

TimCookistheCEOofApple.

AppleCEOTimCooksaidthat…

AppleshareshavetakenabeaEng,muchtothechagrinofitsCEO,TimCook

Cook’stenureasCEOofApple…

Wozniak’sdesiretobeCEO…

Rela5onExtrac5on‣ Extracten5ty-rela5on-en5tytriplesfromafixedinventory

ACE(2003-2005)

DuringthewarinIraq,AmericanjournalistsweresomeEmescaughtinthelineoffire

Located_In

‣ Systemscanbefeature-basedorneural,lookatsurfacewords,dependencypathfeatures,seman5croles

Na5onality

‣ Problem:limiteddataforscalingtobigontologies

‣ UseNER-likesystemtoiden5fyen5tyspans,classifyrela5onsbetweenen5typairswithaclassifier

DistantSupervision

Mintzetal.(2009)

[StevenSpielberg]’sfilm[SavingPrivateRyan]islooselybasedonthebrothers’story

‣ Iftwoen55esinarela5onappearinthesamesentence,assumethesentenceexpressestherela5on

‣ Lotsofrela5onsinourknowledgebasealready(e.g.,23,000film-directorrela5ons);usethesetobootstrapmoretrainingdata

Allisonco-producedtheAcademyAward-winning[SavingPrivateRyan],directedby[StevenSpielberg]

Director

Director

DistantSupervision

Mintzetal.(2009)

‣ Learndecentlyaccurateclassifiersfor~100Freebaserela5ons‣ Couldbeusedtocrawlthewebandexpandourknowledgebase

Page 6: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

OpenIE

OpenInforma5onExtrac5on

‣ Typicallynofixedrela5oninventory

‣ “Open”ness—wanttobeabletoextractallkindsofinforma5onfromopen-domaintext

‣ Acquirecommonsenseknowledgejustfrom“reading”aboutit,butneedtoprocesslotsoftext(“machinereading”)

TextRunner‣ Extractposi5veexamplesof(e,r,e)triplesviaparsingandheuris5cs

BarackObama,44thpresidentoftheUnitedStates,wasbornonAugust4,1961inHonolulu

=>Barack_Obama,wasbornin,Honolulu

‣ TrainaNaiveBayesclassifiertofiltertriplesfromrawtext:usesfeaturesonPOStags,lexicalfeatures,stopwords,etc.

‣ 80xfasterthanrunningaparser(whichwasslowin2007…)

Bankoetal.(2007)

‣ Usemul5pleinstancesofextrac5onstoassignprobabilitytoarela5on

Exploi5ngRedundancy

Bankoetal.(2007)

‣ Concrete:definitelytrueAbstract:possiblytruebutunderspecified

‣ Hardtoevaluate:canassessprecision ofextractedfacts,buthowdoweknowrecall?

‣ 9Mwebpages/133Msentences

‣ 2.2tuplesextractedpersentence,filterbasedonprobabili5es

Page 7: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

ReVerb

Faderetal.(2011)

‣ Moreconstraints:openrela5onshavetobeginwithverb,endwithpreposi5on,becon5guous(e.g.,wasbornon)

‣ Extractmoremeaningfulrela5ons,par5cularlywithlightverbs

ReVerb

Faderetal.(2011)

‣ Foreachverb,iden5fythelongestsequenceofwordsfollowingtheverbthatsa5sfyaPOSregex(V.*P)andwhichsa5sfyheuris5clexicalconstraintsonspecificity

‣ Findthenearestargumentsoneithersideoftherela5on

‣ Annotatorslabeledrela5onsin500documentstoassessrecall

QAfromOpenIE

Choietal.(2015)

Takeaways

‣ Rela5onextrac5on:cancollectdatawithdistantsupervision,usethistoexpandknowledgebases

‣OpenIE:extractslotsofthings,buthardtoknowhowgoodorusefultheyare‣ Cancombinewithstandardques5onanswering

‣ Addnewfactstoknowledgebases

‣ SRL/AMR:handleabunchofphenomena,butmoreorlesslikesyntax++intermsofwhattheyrepresent

‣ Many,manyapplica5onsandtechniques

Page 8: State-of-the-art Parsersgdurrett/courses/sp2019/... · 2019. 3. 7. · Lady Gaga performed a concert for students ‣Performing event ‣Subject: Lady Gaga ‣Object: a concert ‣Audience:

Roadmap

Roadmap‣ Classifica5on:conven5onalandneural,wordrepresenta5ons(3weeks)

‣ Textanalysis:tagging,parsing,informa5onextrac5on(3.5weeks)

‣ Structuredmodelsforsequences,trees(HMMs,PCFGs),aswellasunstructuredapproaches(transi5on-basedparsing)

‣ LotsofNLPtaskscanbeformulatedastagging

‣ Linearandneuralclassifica5on

‣ Howtobuildeffec5vewordvectors

Applica5onsofTagging‣ Extractproductoccurrencesincybercrimeforums,butnoteverythingthatlookslikeaproductisaproduct

Portnoffetal.(2017),DurreEetal.(2017)Notaproductinthiscontext

Roadmap‣ Classifica5on:conven5onalandneural,wordrepresenta5ons(3weeks)

‣ Textanalysis:tagging,parsing,informa5onextrac5on(3.5weeks)

‣ Structuredmodelsforsequences,trees(HMMs,PCFGs),aswellasunstructuredapproaches(transi5on-basedparsing)

‣ Linearandneuralclassifica5on

‣ Howtobuildeffec5vewordvectors

‣ Genera5on,applica5ons:languagemodeling,machinetransla5on,dialogue(4weeks)

‣ Otherapplica5ons:ques5onanswering,TBD(3weeks)

‣ Missing:structuredneuralmodels.Theseareabitbeyondthisclassbutwe’llseeonewaytodothisa\erspringbreak