CS378: Natural Language Processing
Lecture 10: Seq 3 / Syntax I
Greg Durrett
Announcements
‣ Midterm: list of topics next week. Covers content up to March 7
‣ A2 due today
‣ CRFs will NOT be on the midterm, a couple other topics too
‣ A3 out tomorrow
Today
‣ Conditional random fields
‣ Named entity recognition
‣ Syntax and constituency parsing
CRFs and NER
Named Entity Recognition
Barack Obama will travel to Hangzhou today for the G20 meeting.
[PERSON: Barack Obama; LOC: Hangzhou; ORG: G20]
B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ Frame as a sequence problem with a BIO tagset: begin, inside, outside
‣ Why might an HMM not do so well here?
‣ Lots of O's, so tags aren't as informative about context
‣ Need sub-word features on unknown words
‣ CRFs are discriminative models that will solve these problems
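The BIO framing above can be sketched as a small conversion routine. The `(start, end, label)` span format (end exclusive) is an illustrative assumption, not something from the slides:

```python
def spans_to_bio(tokens, spans):
    """Convert labeled entity spans to a BIO tag sequence.

    spans: list of (start, end, label) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # continuation tokens
    return tags

tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
spans = [(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "ORG")]
print(spans_to_bio(tokens, spans))
# ['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC', 'O', 'O', 'O', 'B-ORG', 'O', 'O']
```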
Conditional Random Fields
‣ HMMs are expressible as Bayes nets (factor graphs):
[factor graph: chain y1 → y2 → … → yn, each yi emitting xi]
‣ This reflects the following decomposition:
P(y, x) = P(y_1) P(x_1 | y_1) P(y_2 | y_1) P(x_2 | y_2) …
‣ Locally normalized model: each factor is a probability distribution that normalizes
Conditional Random Fields
‣ CRFs: discriminative models with the following globally-normalized form:
P(y | x) = ∏_k exp(φ_k(x, y)) / ∑_{y'} ∏_k exp(φ_k(x, y'))
‣ Each φ_k is any real-valued scoring function of its arguments
‣ The denominator is the normalizer Z
‣ How do we max over y? Requires considering an exponential number of sequences in general
‣ HMMs: P(y, x) = P(y_1) P(x_1 | y_1) P(y_2 | y_1) P(x_2 | y_2) …
‣ Naive Bayes : logistic regression :: HMMs : CRFs
  (local vs. global normalization ↔ generative vs. discriminative)
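The globally-normalized form can be illustrated by brute force on a toy example: score a label sequence with arbitrary real-valued φ functions, then divide by Z, the sum over every possible label sequence. All potential values below are made-up toy numbers, and the enumeration is exactly the exponential cost that motivates dynamic programming:

```python
import itertools
import math

LABELS = ["O", "B", "I"]

# Toy log-potentials (illustrative made-up values, not from the slides)
PHI_T = {("O", "B"): 1.0, ("B", "I"): 1.5}   # transition scores
PHI_E = {("B", "Hangzhou"): 2.0}             # emission scores

def score(y, x):
    """Sum of transition and emission scores phi_k for one label sequence."""
    s = sum(PHI_T.get((y[i - 1], y[i]), 0.0) for i in range(1, len(y)))
    s += sum(PHI_E.get((yi, xi), 0.0) for yi, xi in zip(y, x))
    return s

def prob(y, x):
    """P(y|x) = exp(score(y, x)) / Z, with Z summed over ALL label sequences."""
    Z = sum(math.exp(score(yp, x))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / Z

x = ["to", "Hangzhou"]
total = sum(prob(y, x) for y in itertools.product(LABELS, repeat=len(x)))
# total is 1.0: globally normalized over all 3^2 = 9 label sequences
```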
Sequential CRFs
[factor graph: chain of label nodes y1 … yn over observations x1 … xn, with transition factors φ_t between adjacent labels, emission factors φ_e linking each y_i to x_i, and an initial factor φ_o on y1]
P(y | x) ∝ ∏_k exp(φ_k(x, y))
‣ Specializing to initial, transition, and emission factors:
P(y | x) ∝ exp(φ_o(y_1)) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(x_i, y_i))
‣ HMMs: P(y, x) = P(y_1) P(x_1 | y_1) P(y_2 | y_1) P(x_2 | y_2) …
‣ CRFs: P(y | x) ∝ exp(φ_o(y_1)) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(x_i, y_i))
Sequential CRFs
[factor graph repeated: chain over y1 … yn with factors φ_o, φ_t, φ_e]
‣ We condition on x, so every factor can depend on all of x:
∏_{i=1}^{n} exp(φ_e(y_i, i, x))
‣ The token index i lets us look at the current word
‣ y can't depend arbitrarily on x in a generative model
Sequential CRFs
‣ Don't include an initial distribution; it can be baked into the other factors
P(y | x) = (1/Z) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(y_i, i, x))
Sequential CRFs: Feature Functions
P(y | x) = (1/Z) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(y_i, i, x))
‣ Phis can be almost anything! Here we use linear functions of sparse features:
φ_t(y_{i-1}, y_i) = w⊤ f_t(y_{i-1}, y_i)        φ_e(y_i, i, x) = w⊤ f_e(y_i, i, x)
P(y | x) ∝ exp w⊤ [ ∑_{i=2}^{n} f_t(y_{i-1}, y_i) + ∑_{i=1}^{n} f_e(y_i, i, x) ]
‣ Looks like our single-weight-vector multiclass logistic regression model
Basic Features for NER
Barack Obama will travel to Hangzhou today for the G20 meeting.
P(y | x) ∝ exp w⊤ [ ∑_{i=2}^{n} f_t(y_{i-1}, y_i) + ∑_{i=1}^{n} f_e(y_i, i, x) ]
Transitions: f_t(y_{i-1}, y_i) = Ind[y_{i-1} & y_i] = Ind[O & B-LOC]
Emissions: f_e(y_6, 6, x) = Ind[B-LOC & Current word = Hangzhou], Ind[B-LOC & Prev word = to]
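The emission indicator features above can be sketched as a feature extractor. The string encoding of feature names is an illustrative assumption; any scheme that maps each active indicator to a weight index works:

```python
def emission_features(tag, i, tokens):
    """Sparse indicator features f_e(y_i, i, x): each string names one
    active feature conjoining the tag with a property of the input."""
    feats = [f"{tag}&curr={tokens[i]}"]          # Ind[tag & current word]
    if i > 0:
        feats.append(f"{tag}&prev={tokens[i-1]}")  # Ind[tag & previous word]
    return feats

tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
print(emission_features("B-LOC", 5, tokens))
# ['B-LOC&curr=Hangzhou', 'B-LOC&prev=to']
```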
CRFs Outline
‣ Model: P(y | x) = (1/Z) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(y_i, i, x))
          P(y | x) ∝ exp w⊤ [ ∑_{i=2}^{n} f_t(y_{i-1}, y_i) + ∑_{i=1}^{n} f_e(y_i, i, x) ]
‣ Inference: argmax P(y | x) from Viterbi
‣ Learning: requires running forward-backward (the sum-product analogue of Viterbi) to compute posterior probabilities P(y_i | x) at each step i
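The Viterbi inference step can be sketched in pure Python. The dict-based representation of the φ_t and φ_e log-potentials is an assumption for illustration (missing entries score 0):

```python
def viterbi(tags, trans, emit, n):
    """Find argmax_y of the sequential CRF score.

    trans[(y_prev, y)] holds phi_t values; emit[(y, i)] holds phi_e values.
    """
    delta = {y: emit.get((y, 0), 0.0) for y in tags}   # best prefix score per tag
    back = [{} for _ in range(n)]                      # backpointers
    for i in range(1, n):
        new_delta = {}
        for y in tags:
            best_prev = max(tags, key=lambda yp: delta[yp] + trans.get((yp, y), 0.0))
            back[i][y] = best_prev
            new_delta[y] = (delta[best_prev]
                            + trans.get((best_prev, y), 0.0)
                            + emit.get((y, i), 0.0))
        delta = new_delta
    # follow backpointers from the best final tag
    y = max(tags, key=lambda t: delta[t])
    seq = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        seq.append(y)
    return seq[::-1]
```

This runs in O(n T²) for T tags, versus the Tⁿ cost of enumerating all sequences.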
Features for NER
‣ Context features (can't use in HMM!)
  ‣ Words before/after
  ‣ Tags before/after
‣ Word features (can use in HMM)
  ‣ Capitalization
  ‣ Word shape
  ‣ Prefixes/suffixes
  ‣ Lexical indicators
‣ Gazetteers (e.g., Leicestershire, Boston)
‣ Word clusters
Apple released a new version…
According to the New York Times…
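One of the word features above, word shape, can be sketched as a simple character-class collapsing. The exact scheme varies by system; this runs-collapsed variant is one common choice, shown as an assumption:

```python
import re

def word_shape(word):
    """Collapse runs of uppercase, lowercase, and digit characters
    into X, x, and d respectively; other characters pass through."""
    shape = re.sub(r"[A-Z]+", "X", word)
    shape = re.sub(r"[a-z]+", "x", shape)
    shape = re.sub(r"[0-9]+", "d", shape)
    return shape

print(word_shape("Hangzhou"))   # Xx
print(word_shape("G20"))        # Xd
```

Shapes like these let the model generalize from seen capitalized words to unknown ones.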
Evaluating NER
Barack Obama will travel to Hangzhou today for the G20 meeting.
B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ Predicting all Os still gets 66% accuracy on this example!
‣ What we really want to know: how many named entity chunk predictions did we get right?
‣ Precision: of the ones we predicted, how many are right?
‣ Recall: of the gold named entities, how many did we find?
‣ F-measure: harmonic mean of these two
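Chunk-level precision, recall, and F1 can be sketched as follows. This is a simplified scorer assuming well-formed BIO input, not the official CoNLL evaluation script:

```python
def bio_to_chunks(tags):
    """Extract (start, end, label) chunks from a BIO tag sequence."""
    chunks, start = set(), None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes last chunk
        if start is not None and not tag.startswith("I-"):
            chunks.add((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return chunks

def chunk_f1(gold_tags, pred_tags):
    """Chunk-level F1: a predicted chunk counts only if its span AND label
    both exactly match a gold chunk."""
    gold, pred = bio_to_chunks(gold_tags), bio_to_chunks(pred_tags)
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Note that the all-O baseline with 66% token accuracy gets 0 F1: it predicts no chunks, so precision and recall are both zero.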
NER
‣ CRF with lexical features can get around 85 F1 on this problem
‣ Other pieces of information that many systems capture
‣ World knowledge:
The delegation met the president at the airport, Tanjug said.  (Tanjug: ORG? PER?)
Nonlocal Features
The delegation met the president at the airport, Tanjug said.
The news agency Tanjug reported on the outcome of the meeting.
‣ More complex factor graph structures can let you capture this, or just decode sentences in order and use features on previous sentences
Finkel and Manning (2008), Ratinov and Roth (2009)
How well do NER systems do?
‣ Ratinov and Roth (2009)
‣ Lample et al. (2016)
‣ BiLSTM-CRF + ELMo, Peters et al. (2018): 92.2 F1
Takeaways
‣ CRFs are structured feature-based models
‣ Efficient to do inference and learning using dynamic programs
‣ Looks like logistic regression, but requires more effort to implement
Constituency Parsing
Syntax
‣ Study of word order and how words form sentences
‣ Why do we care about syntax?
‣ Recognize verb-argument structures (who is doing what to whom?)
‣ Multiple interpretations of words (noun or verb? Fed raises… example)
‣ Higher level of abstraction beyond words: some languages are SVO, some are VSO, some are SOV; parsing can canonicalize
Constituency Parsing
‣ Tree-structured syntactic analyses of sentences
‣ Common things: noun phrases, verb phrases, prepositional phrases
‣ Bottom layer is POS tags
‣ Examples will be in English. Constituency makes sense for a lot of languages but not all
[Tree example labels: sentential complement (whole embedded sentence), adverbial phrase]
Constituency Parsing: Examples
Challenges
‣ PP attachment
‣ If we do no annotation, these trees differ only in one rule: VP → VP PP vs. NP → NP PP
‣ Parse will go one way or the other, regardless of words
‣ Lexicalization allows us to be sensitive to specific words
(same parse as "the cake with some icing")
Challenges
‣ NP internal structure: tags + depth of analysis
Constituency
‣ How do we know what the constituents are?
‣ Constituency tests:
‣ Substitution by proform (e.g., pronoun)
‣ Clefting (It was with a spoon that…)
‣ Answer ellipsis (What did they eat? the cake) (How? with a spoon)
‣ Sometimes constituency is not clear, e.g., coordination: she went to and bought food at the store
Context-Free Grammars, CKY
Survey
‣ 1. The pace of the first few lectures (naive Bayes, logistic regression, perceptron, etc.) was [too fast / too slow / just right]
‣ 2. The pace of the last few lectures (tagging, Viterbi, parsing) was [too fast / too slow / just right]
‣ 3. The homeworks overall are [too hard / too easy / just right]
‣ 4. I would prefer A3 be due on [Friday March 8 / Monday March 11] (midterm is on Thursday, March 14)
‣ 5. Other comments (likes/dislikes)