CS388: Natural Language Processing Lecture 10: Syntax I


Transcript of CS388: Natural Language Processing Lecture 10: Syntax I (gdurrett/courses/fa2018/lectures/lec10-4pp.pdf)

Page 1:

CS388: Natural Language Processing Lecture 10: Syntax I

Greg Durrett

Slides adapted from Dan Klein, UC Berkeley

Recall: CNNs vs. LSTMs

‣ Both LSTMs and convolutional layers transform the input using context

[Diagrams over “the movie was good”: a convolutional layer maps the n x k input to O(n) x c using c filters of size m x k; a BiLSTM with hidden size c maps the n x k input to n x 2c]

‣ LSTM: “globally” looks at the entire sentence (but local for many problems)

‣ CNN: local depending on filter width + number of layers

Recall: CNNs

[Diagram: “the movie was good” as an n x k input; c filters of size m x k give n x c activations; max pooling over the sentence gives a c-dimensional vector; projection W + softmax gives P(y|x)]

‣ Max pooling: return the max activation of a given filter over the entire sentence; like a logical OR (sum pooling is like a logical AND)
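A minimal NumPy sketch of this convolution + max-pooling step (the dimensions and random inputs below are illustrative, not from the slides):

```python
import numpy as np

n, k, c, m = 7, 50, 100, 3          # sentence length, embedding dim, num filters, filter width
x = np.random.randn(n, k)            # word embeddings for "the movie was good ..."
filters = np.random.randn(c, m, k)   # c filters, each m x k

# Convolution: one activation per filter per window position -> (n - m + 1) x c
# (no padding here; the slides' n x c assumes padding)
acts = np.array([[np.sum(x[i:i + m] * f) for f in filters] for i in range(n - m + 1)])

# Max pooling over the sentence: the strongest activation of each filter (like a logical OR)
pooled = acts.max(axis=0)            # c-dimensional vector, fed to projection + softmax
print(pooled.shape)                  # (100,)
```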

Recall: Neural CRFs

Barack/B-PER Obama/I-PER will/O travel/O to/O Hangzhou/B-LOC today/O for/O the/O G20/B-ORG meeting/O ./O   (entity types: PERSON, LOC, ORG)

1) Compute f(x)

2) Run forward-backward

3) Compute error signal

4) Backprop (no knowledge of sequential structure required)

Page 2:

This Lecture

‣ Constituency formalism

‣ Context-free grammars and the CKY algorithm

‣ Refining grammars

‣ Discriminative parsers

Constituency

Syntax

‣ Study of word order and how words form sentences

‣ Why do we care about syntax?

‣ Recognize verb-argument structures (who is doing what to whom?)

‣ Multiple interpretations of words (noun or verb? Fed raises… example)

‣ Higher level of abstraction beyond words: some languages are SVO, some are VSO, some are SOV; parsing can canonicalize

Constituency Parsing

‣ Tree-structured syntactic analyses of sentences

‣ Common things: noun phrases, verb phrases, prepositional phrases

‣ Bottom layer is POS tags

‣ Examples will be in English. Constituency makes sense for a lot of languages, but not all

Page 3:

Constituency Parsing

[Tree examples, annotated with “sentential complement”, “whole embedded sentence”, and “adverbial phrase”: “The rat the cat chased squeaked”; “I raced to Indianapolis, unimpeded by traffic”]

Challenges

‣ PP attachment

§ If we do no annotation, these trees differ only in one rule:
  § VP → VP PP
  § NP → NP PP
§ Parse will go one way or the other, regardless of words
§ Lexicalization allows us to be sensitive to specific words

same parse as “the cake with some icing”

Challenges

‣ NP internal structure: tags + depth of analysis

Page 4:

Constituency

‣ How do we know what the constituents are?

‣ Constituency tests:
  ‣ Substitution by proform (e.g., pronoun)
  ‣ Clefting (It was with a spoon that…)
  ‣ Answer ellipsis (What did they eat? the cake) (How? with a spoon)

‣ Sometimes constituency is not clear, e.g., coordination: she went to and bought food at the store

Context-Free Grammars, CKY

CFGs and PCFGs

§ Write symbolic or logical rules:
§ Use deduction systems to prove parses from words
§ Minimal grammar on “Fed raises” sentence: 36 parses
§ Simple 10-rule grammar: 592 parses
§ Real-size grammar: many millions of parses
§ This scaled very badly, didn’t yield broad-coverage tools

Grammar (CFG):
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon:
NN → interest
NNS → raises
VBP → interest
VBZ → raises

‣ Context-free grammar: symbols which rewrite as one or more symbols

‣ Lexicon consists of “preterminals” (POS tags) rewriting as terminals (words)

‣ CFG is a tuple (N, T, S, R): N = nonterminals, T = terminals, S = start symbol (generally a special ROOT symbol), R = rules

‣ PCFG: probabilities associated with rewrites, normalized by source symbol

[The slide attaches a probability to each rule (values shown include 0.2, 0.3, 0.5, 0.7, and 1.0), normalized over rules with the same source symbol]

Estimating PCFGs

‣ Maximum likelihood PCFG: count and normalize! Same as HMMs / Naive Bayes

S → NP VP    1.0
NP → PRP     0.5
NP → DT NN   0.5

‣ Tree T is a series of rule applications r:  P(T) = ∏_{r ∈ T} P(r | parent(r))
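A minimal sketch of count-and-normalize PCFG estimation and the tree probability above, assuming trees are stored as (symbol, children) tuples (the data structure and toy treebank are illustrative):

```python
from collections import Counter

def rules(tree):
    """Yield (parent, children) rule applications from a (symbol, children) tree."""
    symbol, children = tree
    if isinstance(children, str):                 # preterminal rewriting as a word
        yield (symbol, (children,))
        return
    yield (symbol, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)

def estimate_pcfg(treebank):
    """Maximum likelihood PCFG: count rules, normalize by the source (parent) symbol."""
    rule_counts, parent_counts = Counter(), Counter()
    for tree in treebank:
        for parent, children in rules(tree):
            rule_counts[(parent, children)] += 1
            parent_counts[parent] += 1
    return {r: c / parent_counts[r[0]] for r, c in rule_counts.items()}

def tree_prob(tree, pcfg):
    """P(T) = product over rule applications r in T of P(r | parent(r))."""
    p = 1.0
    for r in rules(tree):
        p *= pcfg.get(r, 0.0)
    return p

# Toy treebank with one tree: (S (NP (PRP He)) (VP (VBD slept)))
t = ("S", [("NP", [("PRP", "He")]), ("VP", [("VBD", "slept")])])
pcfg = estimate_pcfg([t])
print(tree_prob(t, pcfg))   # 1.0 for this single-tree treebank
```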

Page 5:

Binarization

‣ To parse efficiently, we need our PCFGs to be at most binary (not CNF)

(VP (VBD sold) (NP the book) (PP to her) (PP for $3))

P(VP → VBD NP PP PP) = 0.2,  P(VP → VBZ PP) = 0.1, …

‣ Lossless: (VP VBD (VP-[NP PP PP] NP (VP-[PP PP] PP PP)))
‣ Lossy: (VP VBD (VP NP (VP PP PP)))

Chomsky Normal Form

‣ Lossless: P(VP → VBD VP-[NP PP PP]) = 0.2,  P(VP-[NP PP PP] → NP VP-[PP PP]) = 1.0,  P(VP-[PP PP] → PP PP) = 1.0
‣ Deterministic symbols make this the same as before

‣ Lossy: P(VP → VBD VP) = 0.2,  P(VP → NP VP) = 0.03,  P(VP → PP PP) = 0.001
‣ Makes different independence assumptions, not the same PCFG
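A minimal sketch of the lossless transform described above, collapsing the remaining children of an n-ary rule into intermediate symbols (the comma-separated symbol-naming scheme is illustrative):

```python
def binarize(parent, children):
    """Turn parent -> c1 c2 ... cn (n > 2) into a chain of binary rules, losslessly:
    VP -> VBD NP PP PP becomes VP -> VBD VP-[NP,PP,PP], VP-[NP,PP,PP] -> NP VP-[PP,PP], ..."""
    rules = []
    left = parent
    rest = list(children)
    while len(rest) > 2:
        new_symbol = parent + "-[" + ",".join(rest[1:]) + "]"   # records the remaining children
        rules.append((left, (rest[0], new_symbol)))
        left, rest = new_symbol, rest[1:]
    rules.append((left, tuple(rest)))
    return rules

print(binarize("VP", ["VBD", "NP", "PP", "PP"]))
# [('VP', ('VBD', 'VP-[NP,PP,PP]')),
#  ('VP-[NP,PP,PP]', ('NP', 'VP-[PP,PP]')),
#  ('VP-[PP,PP]', ('PP', 'PP'))]
```

Because each new symbol rewrites deterministically (probability 1.0), the binarized PCFG assigns the same tree probabilities as the original n-ary PCFG.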

CKY

He wrote a long report on Mars  [chart with NP, PP, NP brackets over spans]

‣ Find argmax_T P(T|x) = argmax_T P(T, x)

‣ Dynamic programming: chart maintains the best way of building symbol X over span (i, j)

‣ Loop over all split points k, apply rules X → Y Z to build X in every possible way

‣ CKY = Viterbi; there is also an algorithm called inside-outside, analogous to forward-backward (CKY = Cocke-Kasami-Younger)

[Diagram: X over span (i, j) built from Y over (i, k) and Z over (k, j)]
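A minimal Viterbi CKY sketch over a binarized PCFG (the dictionary representation of the grammar/lexicon and the toy example are illustrative):

```python
from collections import defaultdict

def cky(words, lexicon, grammar):
    """Viterbi CKY: chart[(i, j)][X] = probability of the best way of building X over span (i, j).
    lexicon: {(tag, word): prob}; grammar: {(X, Y, Z): prob} for binary rules X -> Y Z."""
    n = len(words)
    chart = defaultdict(dict)    # (i, j) -> {symbol: best probability}
    back = {}                    # (i, j, X) -> (k, Y, Z) backpointer
    for i, w in enumerate(words):                     # width-1 spans from the lexicon
        for (tag, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1)][tag] = p
    for width in range(2, n + 1):                     # build wider spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # all split points
                for (X, Y, Z), p in grammar.items():  # all binary rules X -> Y Z
                    if Y in chart[(i, k)] and Z in chart[(k, j)]:
                        score = p * chart[(i, k)][Y] * chart[(k, j)][Z]
                        if score > chart[(i, j)].get(X, 0.0):
                            chart[(i, j)][X] = score
                            back[(i, j, X)] = (k, Y, Z)
    return chart, back

# Toy run: "the rat squeaked"
lex = {("DT", "the"): 1.0, ("NN", "rat"): 1.0, ("VBD", "squeaked"): 1.0}
gram = {("NP", "DT", "NN"): 0.5, ("S", "NP", "VBD"): 1.0}
chart, back = cky(["the", "rat", "squeaked"], lex, gram)
print(chart[(0, 3)])   # {'S': 0.5}
```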

Unary Rules

[Tree for “the rat the cat chased squeaked” with S, SBAR, and NP nodes; unary chain example: NP → NNS → mice]

‣ Unary productions in the treebank need to be dealt with by parsers

‣ Binary trees over n words have at most n-1 nodes, but you can have an unlimited number of nodes with unaries (S → SBAR → NP → S → …)

‣ In practice: enforce at most one unary over each span, modify CKY accordingly

Page 6:

Results

Klein and Manning (2003)

‣ Standard dataset for English: Penn Treebank (Marcus et al., 1993)

‣ Evaluation: F1 over labeled constituents of the sentence

‣ Vanilla PCFG: ~75 F1

‣ Best PCFGs for English: ~90 F1

‣ Other languages: results vary widely depending on annotation + complexity of the grammar

‣ SOTA: 95 F1

Refining Generative Grammars

PCFG Independence Assumptions

[Charts: how NP rewrites are distributed in different positions]
              NP PP   DT NN   PRP
All NPs        11%      9%     6%
NPs under S     9%      9%    21%
NPs under VP   23%      7%     4%

‣ Language is not context-free: NPs in different contexts rewrite differently

‣ Can we make the grammar “less context-free”?

Rule Annotation

‣ Vertical (parent) annotation: add the parent symbol to each node, can do grandparents too

§ Vertical Markov order: rewrites depend on the past k ancestor nodes (cf. parent annotation)
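A minimal sketch of parent (order-2 vertical) annotation on a small tree (the tree representation is illustrative; real systems often leave POS tags unannotated):

```python
def parent_annotate(tree, parent="ROOT"):
    """Append the parent symbol to each node: NP under S becomes NP^S."""
    symbol, children = tree
    if isinstance(children, str):                      # preterminal: leave the word alone
        return (symbol + "^" + parent, children)
    return (symbol + "^" + parent,
            [parent_annotate(child, symbol) for child in children])

t = ("S", [("NP", [("PRP", "He")]), ("VP", [("VBD", "slept")])])
print(parent_annotate(t))
# ('S^ROOT', [('NP^S', [('PRP^NP', 'He')]), ('VP^S', [('VBD^VP', 'slept')])])
```

Estimating a PCFG over these annotated symbols lets NP^S and NP^VP rewrite with different distributions, addressing the independence problem above.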

[Charts: parsing F1 (72%–79% axis) and number of grammar symbols (0–25,000 axis) vs. vertical Markov order (1, 2v, 2, 3v, 3); example trees shown for Order 1 vs. Order 2]

Klein and Manning (2003)

‣ Like a trigram HMM tagger, incorporates more context

[Charts: parsing F1 (70%–74% axis) and number of grammar symbols (0–12,000 axis) vs. horizontal Markov order (0, 1, 2v, 2, ∞); example binarizations shown for Order 1 vs. Order ∞]

Klein and Manning (2003)

‣ Horizontal annotation: remember the states of multi-arity rules during binarization

Page 7:

Annotated Tree

Klein and Manning (2003)

‣ 75 F1 with a basic PCFG => 86.3 F1 with this highly customized PCFG (SOTA was 90 F1 at the time, but with more complex methods)

Lexicalized Parsers

§ What’s different between basic PCFG scores here?
§ What (lexical) correlations need to be scored?

‣ Even with parent annotation, these trees have the same rules. Need to use the words

Lexicalized Parsers

§ Add “head words” to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules, e.g.:
  § NP:
    § Take leftmost NP
    § Take rightmost N*
    § Take rightmost JJ
    § Take right child
  § VP:
    § Take leftmost VB*
    § Take leftmost VP
    § Take left child

‣ Annotate each grammar symbol with its “head word”: most important word of that constituent

‣ Rules for identifying head words (e.g., the last word of an NP before a preposition is typically the head)

‣ Collins and Charniak (late 90s): ~89 F1 with these
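A minimal sketch of priority-list head finding; the rules below just mirror the NP/VP lists above and are illustrative, not the full Collins head table:

```python
# Each entry: (direction to search, child-label prefixes to try, in priority order)
HEAD_RULES = {
    "NP": [("left", ["NP"]), ("right", ["NN"]), ("right", ["JJ"])],   # then default: right child
    "VP": [("left", ["VB"]), ("left", ["VP"])],                        # then default: left child
}

def find_head(parent, children):
    """Return the index of the head child under simple priority-list head rules."""
    for direction, prefixes in HEAD_RULES.get(parent, []):
        order = range(len(children)) if direction == "left" else reversed(range(len(children)))
        for i in order:
            if any(children[i].startswith(p) for p in prefixes):
                return i
    return len(children) - 1 if parent == "NP" else 0   # fallback: right child for NP, left otherwise

print(find_head("NP", ["DT", "JJ", "NN"]))   # 2 (rightmost N*)
print(find_head("VP", ["VBD", "NP", "PP"]))  # 0 (leftmost VB*)
```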

Discriminative Parsers

Page 8:

CRF Parsing

He wrote a long report on Mars.

[Figure: two candidate parses of the sentence, one with the PP “on Mars” attached to the NP (“report — on Mars”) and one with it attached to the VP (“wrote — on Mars”)]

CRF Parsing

Taskar et al. (2004); Hall, Durrett, and Klein (2014); Durrett and Klein (2015)

score(NP → NP PP anchored at (2, 5, 7)) = w^T f(NP → NP PP, (2, 5, 7), “He wrote a long report on Mars .”)

‣ Example feature: Left child last word = report ∧ NP → NP PP

‣ Can learn that we report [PP], which is common due to reporting on things
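A minimal sketch of this kind of discrete, anchored-rule scoring, score = w^T f(rule, span); the feature templates and weights below are illustrative:

```python
def features(rule, i, k, j, words):
    """Features for applying rule over span (i, j) with split point k,
    e.g. 'left child last word = report' conjoined with the rule identity."""
    return [
        ("rule", rule),
        ("rule + left child last word", rule, words[k - 1]),
        ("rule + span first word", rule, words[i]),
        ("rule + span last word", rule, words[j - 1]),
    ]

def score(rule, i, k, j, words, weights):
    """w^T f(rule, span): sum the learned weights of the fired (indicator) features."""
    return sum(weights.get(f, 0.0) for f in features(rule, i, k, j, words))

words = "He wrote a long report on Mars .".split()
w = {("rule + left child last word", "NP -> NP PP", "report"): 2.5}
print(score("NP -> NP PP", 2, 5, 7, words, w))   # 2.5
```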

‣ Can “neuralize” this as well, like neural CRFs for NER

Joint Discrete and Continuous Parsing

Durrett and Klein (ACL 2015)

Parsing a sentence (discrete + continuous):

He wrote a long report on Mars  [chart with NP and PP brackets]

‣ Chart remains discrete!
‣ Feedforward pass on nets
‣ Run CKY dynamic program
‣ Discrete feature computation

Neural CRF Parsing

Stern et al. (2017), Kitaev et al. (2018)

‣ Simpler version: score constituents rather than rule applications

score(NP, 2, 7) = w^T f(NP, 2, 7)

[Example: the NP span (2, 7) “a long report on Mars” in “He wrote a long report on Mars.”, with NP and PP sub-spans at positions 2, 5, 7]

‣ Use BiLSTMs (Stern) or self-attention (Kitaev) to compute span embeddings

‣ 91-93 F1, 95 F1 with ELMo (SOTA). Great on other languages too!
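A minimal PyTorch sketch of constituent (span) scoring from BiLSTM span embeddings; the layer sizes and the endpoint-difference span representation are illustrative simplifications of the Stern et al. recipe:

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Score (label, span) pairs: BiLSTM over the sentence, span embedding from the
    difference of endpoint states, then a linear layer giving one score per label."""
    def __init__(self, emb_dim=50, hidden=64, num_labels=10, vocab=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_labels)

    def forward(self, word_ids, i, j):
        states, _ = self.lstm(self.embed(word_ids))    # (1, n, 2*hidden)
        span = states[0, j - 1] - states[0, i]         # crude representation of span (i, j)
        return self.out(span)                          # one score per constituent label

scorer = SpanScorer()
sentence = torch.randint(0, 1000, (1, 8))              # e.g. "He wrote a long report on Mars ."
print(scorer(sentence, 2, 7))                          # scores for NP, PP, ... over span (2, 7)
```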

Page 9:

Takeaways

‣ PCFGs estimated generatively can perform well if sufficiently engineered

‣ Neural CRFs work well for constituency parsing

‣ Next time: revisit lexicalized parsing as dependency parsing

Survey

‣ Write one thing you like about the class

‣ Write one thing you don’t like about the class