TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM...
TTIC 31210: Advanced Natural Language Processing
Kevin Gimpel, Spring 2017
Lecture 14: Finish up Bayesian/Unsupervised NLP, Start Structured Prediction
• Today and Wednesday: structured prediction
• No class Monday May 29 (Memorial Day)
• Final class is Wednesday May 31

• Assignment 3 has been posted, due Thursday June 1
• Final project report due Friday, June 9
Key Quantities

Our data is a set of samples:

Gibbs Sampling Template

LDA
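Expectation Maximization (EM)
• EM is an algorithmic template that finds a local maximum of the marginal likelihood of the observed data

EM
• "E" step:
  – compute posteriors over latent variables given the current parameters
• "M" step:
  – update parameters given posteriors (by maximizing the expected complete-data log-likelihood)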
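To make the two steps concrete, here is a minimal sketch of EM on a toy latent-variable model that is not from the lecture: each observation is the number of heads in `n` flips of one of two coins with unknown biases, and which coin was flipped is the latent variable. The function names, the initial values, and the data are hypothetical. The E step computes each trial's posterior over coins; the M step re-estimates the biases and mixing weight from posterior-weighted counts.

```python
import math

def binom_loglik(heads: int, n: int, p: float) -> float:
    """log P(heads | n, p) without the binomial coefficient (it cancels in posteriors)."""
    return heads * math.log(p) + (n - heads) * math.log(1.0 - p)

def marginal_loglik(counts, n, p_a, p_b, mix):
    """log p(data) with the latent coin summed out (up to binomial coefficients)."""
    return sum(
        math.log(mix * math.exp(binom_loglik(h, n, p_a))
                 + (1.0 - mix) * math.exp(binom_loglik(h, n, p_b)))
        for h in counts)

def em_two_coins(counts, n, p_a=0.6, p_b=0.5, mix=0.5, iters=50):
    """Hypothetical example: EM for a mixture of two coins; mix = P(coin A)."""
    for _ in range(iters):
        # E step: posterior probability that each trial used coin A
        resp = []
        for h in counts:
            la = math.log(mix) + binom_loglik(h, n, p_a)
            lb = math.log(1.0 - mix) + binom_loglik(h, n, p_b)
            m = max(la, lb)  # subtract the max for numerical stability
            wa, wb = math.exp(la - m), math.exp(lb - m)
            resp.append(wa / (wa + wb))
        # M step: closed-form maximizer of the expected complete-data log-likelihood
        ra = sum(resp)
        rb = len(counts) - ra
        p_a = sum(r * h for r, h in zip(resp, counts)) / (ra * n)
        p_b = sum((1.0 - r) * h for r, h in zip(resp, counts)) / (rb * n)
        mix = ra / len(counts)
    return p_a, p_b, mix

counts = [5, 9, 8, 4, 7]  # toy data: heads in 10 flips per trial
p_a, p_b, mix = em_two_coins(counts, n=10)
print(p_a, p_b, mix)
```

Since each EM iteration never decreases the marginal log-likelihood, comparing `marginal_loglik` before and after is a simple sanity check; where EM ends up still depends on the initialization.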
Different Views of the Dirichlet Process (DP)
• last time we discussed the "stick-breaking" view of the DP
• today we'll briefly discuss the "Chinese Restaurant Process" view
• with both views, we still have the same DP hyperparameters (base distribution & concentration parameter)
Base Distribution for DP
• our unbounded distribution over items will choose them from the base distribution
• the base distribution usually has infinite support
• simple example base distribution for our morph lexicon:
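The slide's own formula did not survive extraction, so the following is only one plausible example, not necessarily the lecture's: a base distribution with infinite support over morph strings that draws a length from a geometric distribution and then each character uniformly from a fixed alphabet. The alphabet and the stopping probability `p_stop` are hypothetical choices.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase  # hypothetical alphabet (26 letters)

def base_logprob(morph: str, p_stop: float = 0.3) -> float:
    """log P0(morph): length ~ Geometric(p_stop) on {1, 2, ...}, characters uniform.

    Every nonempty string over ALPHABET gets positive probability,
    so the support is infinite.
    """
    if not morph or any(c not in ALPHABET for c in morph):
        return float("-inf")
    length = len(morph)
    log_len = (length - 1) * math.log(1.0 - p_stop) + math.log(p_stop)
    return log_len - length * math.log(len(ALPHABET))

def base_sample(rng: random.Random, p_stop: float = 0.3) -> str:
    """Draw one morph from P0: emit a character, then continue with prob. 1 - p_stop."""
    chars = [rng.choice(ALPHABET)]
    while rng.random() > p_stop:
        chars.append(rng.choice(ALPHABET))
    return "".join(chars)

print(base_sample(random.Random(0)), base_logprob("ing"))
```

A more realistic base distribution might use a character n-gram model instead of uniform characters; the geometric length keeps the example short while still giving every string nonzero probability.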
Concentration Parameter
• in the stick-breaking process, the concentration parameter determines how much of the stick we break off each time
• high concentration == small parts of stick
[Figure: full stick]
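In the standard stick-breaking construction, each break takes a fraction β_k ~ Beta(1, α) of the remaining stick, so piece k has length π_k = β_k ∏_{j<k} (1 − β_j), where α is the concentration parameter (the symbol is ours; the slide's notation was lost). Since E[β_k] = 1/(1 + α), a larger α breaks off smaller pieces, as the bullet above says. The sketch truncates after a hypothetical number of pieces `k_max`.

```python
import random

def stick_breaking(alpha: float, k_max: int, rng: random.Random) -> list[float]:
    """First k_max stick-breaking weights (a truncated GEM(alpha) draw)."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(k_max):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights

rng = random.Random(0)
for alpha in (0.5, 10.0):
    first = [stick_breaking(alpha, 1, rng)[0] for _ in range(2000)]
    # the average first piece should be roughly 1 / (1 + alpha)
    print(alpha, sum(first) / len(first))
```

The truncated weights sum to slightly less than 1; the leftover mass is what the remaining (infinitely many) pieces would share.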
• the stick-breaking construction of the DP is useful for specifying models and defining inference algorithms
• another useful way of representing a draw from a DP is with the Chinese Restaurant Process (CRP)
  – the CRP provides a distribution over partitions with an unbounded number of parts
• imagine a Chinese restaurant with an infinite number of tables…
[Figure: tables numbered 1, 2, 3, 4, …]
• first customer sits at the first table
• second customer enters and chooses a table:
  – chooses table 1 with probability proportional to the number of customers already there (1)
  – chooses a new table with probability proportional to the concentration parameter
  – here, the second customer chooses table 1
• third customer enters and chooses by the same rule:
  – table 1 (now with 2 customers) or a new table
  – here, the third customer chooses a new table
• fourth customer enters:
  – p(choose table 1) ∝ 2, p(choose table 2) ∝ 1, p(choose new table) ∝ concentration parameter
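Under the standard CRP rule with concentration parameter α (symbol assumed), a new customer joins occupied table k with probability n_k / (n + α), where n_k is the number of customers already at table k and n is the total seated so far, and opens a new table with probability α / (n + α). In the walkthrough, after three customers the table counts are [2, 1], so the fourth customer's probabilities are 2/(3 + α), 1/(3 + α), and α/(3 + α). A small sketch with hypothetical function names:

```python
import random

def crp_probs(counts: list[int], alpha: float) -> list[float]:
    """Seating probabilities for the next customer.

    counts[k] = customers already at table k; the last entry of the
    result is the probability of opening a new table.
    """
    n = sum(counts)
    return [c / (n + alpha) for c in counts] + [alpha / (n + alpha)]

def crp_seat(n_customers: int, alpha: float, rng: random.Random) -> list[int]:
    """Seat customers one at a time; return each customer's table index."""
    counts: list[int] = []
    assignments = []
    for _ in range(n_customers):
        probs = crp_probs(counts, alpha)
        table = rng.choices(range(len(probs)), weights=probs)[0]
        if table == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments

print(crp_probs([2, 1], alpha=1.0))  # [0.5, 0.25, 0.25]
print(crp_seat(10, alpha=1.0, rng=random.Random(0)))
```

The first customer always opens table 0, because with no one seated the only entry is α / α = 1.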
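• large value of concentration parameter: customers more often start new tables (many tables, each with few customers)
• small value of concentration parameter: customers tend to join existing tables (few tables, each with many customers)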
A Draw G from a DP (Stick-Breaking Representation)
• draw infinitely many probabilities from the stick-breaking process with the concentration parameter
• draw atoms from the base distribution (atoms can be repeated!)
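Combining the two steps, a truncated draw of G pairs each stick-breaking weight with an atom drawn independently from the base distribution. The base distribution below is a hypothetical small uniform one, chosen so that repeated atoms are easy to see; when an atom repeats, its total probability under G is the sum of its weights. The truncation level `k_max` is an approximation we introduce.

```python
import random
from collections import defaultdict

def toy_base(rng: random.Random) -> str:
    """Hypothetical base distribution: uniform over a few morphs."""
    return rng.choice(["un", "re", "ing", "ed", "s"])

def draw_G(alpha, base_sampler, k_max, rng):
    """Approximate draw G ~ DP(alpha, P0), truncated to k_max stick pieces.

    Returns a dict atom -> probability mass (mass of repeated atoms is merged).
    """
    G = defaultdict(float)
    remaining = 1.0
    for _ in range(k_max):
        beta = rng.betavariate(1.0, alpha)  # stick-breaking fraction
        atom = base_sampler(rng)            # atom ~ P0, independent of the weights
        G[atom] += remaining * beta         # repeated atoms accumulate mass
        remaining *= 1.0 - beta
    return dict(G)

G = draw_G(alpha=2.0, base_sampler=toy_base, k_max=100, rng=random.Random(0))
print(G)  # total mass is just below 1 because of truncation
```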
A Representation of G Drawn from a DP (Chinese Restaurant Process Representation)
• draw table assignments for n customers from the CRP with the concentration parameter
• for each occupied table, draw an atom from the base distribution
  – so the number of atoms drawn is the number of tables occupied
• each draw from G is an atom, whose probability comes from the number of customers at its table
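A sketch of this representation, assuming the standard CRP seating rule with concentration parameter α: seat n customers one at a time, draw one atom from the base distribution whenever a table is opened, and let each customer's draw be its table's atom. The function names and the toy base distribution are hypothetical.

```python
import random
from collections import Counter

def toy_base(rng: random.Random) -> str:
    """Hypothetical base distribution: uniform over a few morphs."""
    return rng.choice(["un", "re", "ing", "ed", "s"])

def crp_represent(n, alpha, base_sampler, rng):
    """Return (draws, table_atoms) for n draws from G ~ DP(alpha, P0) via the CRP."""
    counts = []       # customers per occupied table
    table_atoms = []  # one atom per occupied table, drawn from P0
    draws = []
    for _ in range(n):
        # existing table k has weight counts[k]; a new table has weight alpha
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)
            table_atoms.append(base_sampler(rng))
        counts[k] += 1
        draws.append(table_atoms[k])
    return draws, table_atoms

draws, atoms = crp_represent(50, 1.0, toy_base, random.Random(0))
print(len(atoms), "tables occupied")
print(Counter(draws))  # how many draws landed on each atom
```

Two tables can receive the same atom from the base distribution, in which case their customers contribute to the same point of G.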
When to be Bayesian?
• if you're doing unsupervised learning or learning with latent variables
• if you want to marginalize out some model parameters
• if you want to learn the structure/architecture of your model
• if you want to learn a potentially-unbounded lexicon (Bayesian nonparametrics)
What is Structured Prediction?

Modeling, Inference, Learning
• Modeling: How do we assign a score to an (x, y) pair using parameters?
  – modeling: define the score function
• Inference: How do we efficiently search over the space of all labels?
  – inference: solve for the highest-scoring y
• Learning: How do we choose the parameters?
  – learning: choose the parameters

Structured Prediction: the size of the output space is exponential in the size of the input or is unbounded (e.g., machine translation), so we can't just enumerate all possible outputs
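A toy sketch (not from the lecture) that separates the three pieces: a linear score over hypothetical emission and transition features for a tag sequence (modeling), and inference by brute-force enumeration. Enumeration visits |L|^n outputs for n words and label set L, which is exactly why structured prediction needs smarter inference. Learning (choosing the weights `w`) is left out; the weights here are set by hand.

```python
import itertools

LABELS = ["D", "N", "V"]  # hypothetical tiny tag set

def score(x, y, w):
    """Modeling: linear score of an (x, y) pair; w maps feature strings to weights."""
    s = 0.0
    prev = "<s>"
    for word, tag in zip(x, y):
        s += w.get(f"emit:{word}:{tag}", 0.0) + w.get(f"trans:{prev}:{tag}", 0.0)
        prev = tag
    return s

def brute_force_argmax(x, w):
    """Inference by enumeration: scores all len(LABELS) ** len(x) candidate outputs."""
    return max(itertools.product(LABELS, repeat=len(x)), key=lambda y: score(x, y, w))

w = {"emit:the:D": 2.0, "emit:dog:N": 2.0, "emit:barks:V": 2.0,
     "trans:D:N": 1.0, "trans:N:V": 1.0}
x = ["the", "dog", "barks"]
print(brute_force_argmax(x, w))  # ('D', 'N', 'V')
print(len(LABELS) ** len(x))     # 27
```

For a 20-word sentence the same loop would score 3^20 (about 3.5 billion) sequences.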
Part-of-Speech Tagging
Some/determiner questioned/verb (past) if/prep. Tim/proper noun Cook/proper noun ’s/poss. first/adj. product/noun
would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/proper noun ./punc.

Simplest kind of structured prediction: Sequence Labeling
Named Entity Recognition
Formulating segmentation tasks as sequence labeling via B-I-O labeling:
Some/O questioned/O if/O Tim/B-PERSON Cook/I-PERSON ’s/O first/O product/O
would/O be/O a/O breakaway/O hit/O for/O Apple/B-ORGANIZATION ./O
B = "begin", I = "inside", O = "outside"
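A minimal sketch of B-I-O encoding and decoding, assuming entity spans are given as (start, end, type) with an exclusive end index (our convention): the first token of each span gets B-type, the rest get I-type, and all other tokens get O. The function names are hypothetical.

```python
def spans_to_bio(n_tokens, spans):
    """Encode non-overlapping (start, end, type) spans, end exclusive, as B-I-O tags."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

def bio_to_spans(tags):
    """Decode B-I-O tags to spans; an I- tag without a matching open span is ignored."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # the current span keeps going
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

tokens = "Some questioned if Tim Cook 's first product would be a breakaway hit for Apple .".split()
spans = [(3, 5, "PERSON"), (14, 15, "ORGANIZATION")]
tags = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, tags)))
print(bio_to_spans(tags))  # [(3, 5, 'PERSON'), (14, 15, 'ORGANIZATION')]
```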
Constituent Parsing
the man walked to the park
(S (NP the man) (VP walked (PP to (NP the park))))
[Figure: parse tree with nodes S, NP, VP, PP, NP over the preterminals DT NN VBD IN DT NN]
Key: S = sentence, NP = noun phrase, VP = verb phrase, PP = prepositional phrase, DT = determiner, NN = noun, VBD = verb (past tense), IN = preposition
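To connect the bracketed notation to a set of parts, the sketch below reads a bracketed tree and lists its labeled constituent spans (label, start, end) with an exclusive end index. Both function names are hypothetical; the span-set view is one common way to represent a constituent parse.

```python
def parse_brackets(s):
    """Parse '(S (NP the man) ...)' into nested (label, children) tuples; words are strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(node())
            pos += 1  # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return node()

def labeled_spans(tree, start=0):
    """Return (list of (label, start, end) spans in post-order, end position)."""
    if isinstance(tree, str):
        return [], start + 1  # a word covers one position
    label, children = tree
    spans, pos = [], start
    for child in children:
        child_spans, pos = labeled_spans(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos

tree = parse_brackets("(S (NP the man) (VP walked (PP to (NP the park))))")
spans, _ = labeled_spans(tree)
print(spans)
# [('NP', 0, 2), ('NP', 4, 6), ('PP', 3, 6), ('VP', 2, 6), ('S', 0, 6)]
```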
source: $ konnten sie es übersetzen ?
reference: $ could you translate it ?
($ is the "wall" symbol)

Dependency Parsing
Coreference Resolution
As we head towards training camp, the Philadelphia Eagles have finally filled most of their needs on offense. One of the main goals for this off-season was to find weapons for the team’s franchise quarterback, Carson Wentz. The Eagles needed a wide receiver who could stretch the field and give Wentz the opportunity to throw the long ball. They signed receiver Torrey Smith to a 3-year deal.
While the signing of Smith was huge for the team, the biggest signing the Eagles made was former Chicago Bears receiver Alshon Jeffery. He had a solid 5-year stint in Chicago, but as the team started to fall apart, Jeffery was forced to explore other options.
Coreference Resolution
input: a document
output: a set of "mentions" (textual spans in the document), and memberships of those mentions in clusters
Semantic Role Labeling
Applications:
• Question & answer systems: Who did what to whom at where?
The police officer detained the suspect at the scene of the crime
• The police officer: ARG0 (Agent)
• detained: V (Predicate)
• the suspect: ARG2 (Theme)
• at the scene of the crime: AM-loc (Location)
(slide from J&M/SLP3)
input: a sentence
output: one span in the sentence identified as a predicate, and a set of other spans identified as particular roles for that predicate
Supervised Word Alignment
• given parallel sentences, predict word alignments (Brown et al., 1990)

Machine Translation
• phrase-based model (Koehn et al., 2003):
  – input: a sentence in the source language
  – output: a segmentation of the source sentence into segments, a translation of each segment, and an ordering of the translations
[Figure: phrase pairs for "konnten sie es übersetzen ?" → "could you translate it ?": konnten : could; konnten sie : could you; sie : you; es übersetzen : translate it; sie es übersetzen : you translate it; übersetzen : translate; es : it; ? : ?]
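One way to picture the phrase-based output structure, as a sketch with hypothetical names: a derivation is a list of (source span, target phrase) pairs in target order, with end indices exclusive. The segments must cover each source word exactly once, and the translation is the concatenation of target phrases in that order. The phrase pairs used here are ones listed for the "konnten sie es übersetzen ?" example; the second derivation shows reordering at the segment level.

```python
def check_and_render(source, derivation):
    """derivation: list of ((start, end), target_phrase) in target order, end exclusive.

    Verifies that the segments partition the source sentence and returns the
    target string formed by concatenating the segment translations in order.
    """
    covered = sorted(i for (start, end), _ in derivation for i in range(start, end))
    if covered != list(range(len(source))):
        raise ValueError("segments must cover every source word exactly once")
    return " ".join(target for _, target in derivation)

source = "konnten sie es übersetzen ?".split()
# three segments, translated monotonically
d1 = [((0, 2), "could you"), ((2, 4), "translate it"), ((4, 5), "?")]
# five one-word segments; "übersetzen" is translated before "es"
d2 = [((0, 1), "could"), ((1, 2), "you"), ((3, 4), "translate"), ((2, 3), "it"), ((4, 5), "?")]
print(check_and_render(source, d1))  # could you translate it ?
print(check_and_render(source, d2))  # could you translate it ?
```

The two derivations yield the same target string, which is one reason phrase-based models must score (or sum over) many derivations per output.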
Key Categories of Structured Prediction
• I think of structured prediction methods in two primary categories: score-based and search-based
Score-Based Structured Prediction
• focus on defining the score function of the structured input/output pair
• in dependency parsing, this is called "graph-based parsing" because maximum spanning tree algorithms can be used to find the globally-optimal max-scoring tree
Search-Based Structured Prediction
• focus on the procedure for searching through the structured output space (usually involves simple greedy or beam search)
• design a classifier to score a small number of decisions at each position in the search
• this classifier can use information about the current state as well as the entire history of the search
• in dependency parsing, this is called "transition-based parsing" because it consists of greedily, sequentially deciding what parsing decision to make
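A minimal sketch of the search-based view applied to tagging: tags are chosen left to right, and at each position a classifier scores the few possible decisions using the current word and the history of earlier decisions. The "classifier" here is a hypothetical hand-set weight table (in practice it would be learned), and this one only looks at the last decision, although it is given the whole history. Greedy search commits to each decision, so an early mistake cannot be undone.

```python
LABELS = ["D", "N", "V"]  # hypothetical tag set

def decision_score(word, history, tag, w):
    """Hypothetical classifier: score one decision given the word and the search history."""
    prev = history[-1] if history else "<s>"
    return w.get(f"emit:{word}:{tag}", 0.0) + w.get(f"trans:{prev}:{tag}", 0.0)

def greedy_tag(x, w):
    """Greedy left-to-right search: pick the best local decision and commit to it."""
    history = []
    for word in x:
        best = max(LABELS, key=lambda tag: decision_score(word, history, tag, w))
        history.append(best)
    return history

w = {"emit:the:D": 2.0, "emit:dog:N": 2.0, "emit:barks:V": 2.0,
     "trans:D:N": 1.0, "trans:N:V": 1.0}
print(greedy_tag(["the", "dog", "barks"], w))  # ['D', 'N', 'V']
```

Beam search generalizes this by keeping the k best partial histories at each position instead of one.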
Structured Prediction
• to make SP practical, we need to decompose the SP problem into parts
• this is true whether we are going to use search-based or score-based SP
  – score-based: score function decomposes additively into scores of parts
  – search-based: search factors into a sequence of decisions, each one adding a part to the final output structure
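When the score decomposes additively into parts, dynamic programming can find the exact argmax without enumerating outputs. As a sketch, assume a first-order sequence labeling score whose parts are one emission score per position and one transition score per adjacent tag pair; the Viterbi algorithm below then runs in O(n·|L|²) time. The score tables are hypothetical inputs.

```python
def viterbi(n, labels, emit, trans, start="<s>"):
    """Exact argmax over tag sequences of sum_i emit(i, y_i) + trans(y_{i-1}, y_i).

    emit(i, tag) and trans(prev_tag, tag) return part scores; assumes n >= 1.
    """
    # best[i][t]: best score of any prefix ending at position i with tag t
    best = [{t: trans(start, t) + emit(0, t) for t in labels}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in labels:
            prev = max(labels, key=lambda p: best[i - 1][p] + trans(p, t))
            best[i][t] = best[i - 1][prev] + trans(prev, t) + emit(i, t)
            back[i][t] = prev
    # follow back-pointers from the best final tag
    last = max(labels, key=lambda t: best[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1], best[n - 1][last]

EMIT = {(0, "N"): 1.0, (0, "V"): 1.1, (1, "N"): 0.0, (1, "V"): 1.0}
TRANS = {("N", "V"): 2.0, ("V", "V"): -2.0}
result = viterbi(2, ["N", "V"], lambda i, t: EMIT[(i, t)], lambda p, t: TRANS.get((p, t), 0.0))
print(result)  # (['N', 'V'], 4.0)
```

With these scores, a greedy left-to-right decoder would pick V at the first position (1.1 > 1.0), then N at the second, for a total of only 1.1; the exact search finds N V with score 4.0.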