Transcript of Lecture 12: Algorithms for HMMs
Source slides: people.cs.georgetown.edu/nschneid/cosc572/f16/12_viterbi_slides.pdf
Lecture 12:
Algorithms for HMMs

Nathan Schneider
(some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes)

ENLP | 17 October 2016
updated 9 September 2017

Recap: tagging

• POS tagging is a sequence labelling task.
• We can tackle it with a model (HMM) that uses two sources of information:
  – The word itself
  – The tags assigned to surrounding words
• The second source of information means we can't just tag each word independently.

Local Tagging

Words:          <s>  one  dog  bit  </s>
Possible tags:  <s>  CD   NN   NN   </s>
(ordered by          NN   VB   VBD
frequency for        PRP
each word)

• Choosing the best tag for each word independently, i.e. not considering tag context, gives the wrong answer (<s> CD NN NN </s>).
• Though NN is more frequent for 'bit', tagging it as VBD may yield a better sequence (<s> CD NN VBD </s>)
  – because P(VBD|NN) and P(</s>|VBD) are high.

Recap: HMM

• Elements of HMM:
  – Set of states (tags)
  – Output alphabet (word types)
  – Start state (beginning of sentence)
  – State transition probabilities P(t_i | t_{i−1})
  – Output probabilities from each state P(w_i | t_i)

Recap: HMM

• Given a sentence W = w_1…w_n with tags T = t_1…t_n, compute P(W,T) as:

  P(W,T) = ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i−1})

• But we want to find argmax_T P(T|W) without enumerating all possible tag sequences T:
  – Use a greedy approximation, or
  – Use Viterbi algorithm to store partial computations.

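The product above can be computed in a few lines of code; the `trans`/`emit` dictionaries and their toy values here are hypothetical illustrations, not numbers from the lecture:

```python
# Joint probability of a tagged sentence under a bigram HMM:
# P(W,T) = prod_i P(w_i | t_i) * P(t_i | t_{i-1}), closed off with </s>.

trans = {("<s>", "DT"): 0.5, ("DT", "NN"): 0.8, ("NN", "</s>"): 0.4}  # P(t | t')
emit = {("DT", "the"): 0.7, ("NN", "dog"): 0.1}                       # P(w | t)

def joint_prob(words, tags):
    p = 1.0
    prev = "<s>"
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
        prev = t
    return p * trans.get((prev, "</s>"), 0.0)  # end-of-sentence transition

print(joint_prob(["the", "dog"], ["DT", "NN"]))  # .5 * .7 * .8 * .1 * .4 = 0.0112
```

Any transition or emission not listed gets probability 0, which makes whole sequences containing it score 0.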
Greedy Tagging

Words:          <s>  one  dog  bit  </s>
Possible tags:  <s>  CD   NN   NN   </s>
(ordered by          NN   VB   VBD
frequency for        PRP
each word)

• For i = 1 to N: choose the tag that maximizes
  – transition probability P(t_i | t_{i−1}) ×
  – emission probability P(w_i | t_i)
• This uses tag context but is still suboptimal. Why?
  – It commits to a tag before seeing subsequent tags.
  – It could be the case that ALL possible next tags have low transition probabilities. E.g., if a tag is unlikely to occur at the end of the sentence, that is disregarded when going left to right.

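The greedy procedure above, as a minimal sketch. The toy `trans`/`emit` values are hypothetical, chosen so that greedy repeats the slide's mistake on 'bit':

```python
def greedy_tag(words, tags, trans, emit):
    """Left to right: commit to the locally best tag at each position."""
    prev, out = "<s>", []
    for w in words:
        best = max(tags, key=lambda t: trans.get((prev, t), 0.0)
                                       * emit.get((t, w), 0.0))
        out.append(best)
        prev = best
    return out

# Hypothetical toy parameters (not the lecture's numbers):
trans = {("<s>", "CD"): .9, ("CD", "NN"): .8, ("NN", "NN"): .5, ("NN", "VBD"): .4}
emit = {("CD", "one"): .6, ("NN", "dog"): .3, ("NN", "bit"): .2, ("VBD", "bit"): .1}

print(greedy_tag(["one", "dog", "bit"], ["CD", "NN", "VBD"], trans, emit))
```

With these numbers greedy tags 'bit' as NN (.5 × .2 beats .4 × .1 locally), because nothing after 'bit', such as an end-of-sentence factor, can influence the choice.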
Greedy vs. Dynamic Programming

• The greedy algorithm is fast: we just have to make one decision per token, and we're done.
  – Runtime complexity?
  – O(TN) with T tags, length-N sentence
• But subsequent words have no effect on each decision, so the result is likely to be suboptimal.
• Dynamic programming search gives an optimal global solution, but requires some bookkeeping (= more computation). It postpones the decision about any tag until we can be sure it's optimal.

Viterbi Tagging: intuition

Words:          <s>  one  dog  bit  </s>
Possible tags:  <s>  CD   NN   NN   </s>
(ordered by          NN   VB   VBD
frequency for        PRP
each word)

• Suppose we have already computed
  a) The best tag sequence for <s> … bit that ends in NN.
  b) The best tag sequence for <s> … bit that ends in VBD.
• Then, the best full sequence would be either
  – sequence (a) extended to include </s>, or
  – sequence (b) extended to include </s>.

Viterbi Tagging: intuition

Words:          <s>  one  dog  bit  </s>
Possible tags:  <s>  CD   NN   NN   </s>
(ordered by          NN   VB   VBD
frequency for        PRP
each word)

• But similarly, to get
  a) The best tag sequence for <s> … bit that ends in NN.
• We could extend one of:
  – The best tag sequence for <s> … dog that ends in NN.
  – The best tag sequence for <s> … dog that ends in VB.
• And so on…

Viterbi: high-level picture

• Intuition: the best path of length i ending in state t must include the best path of length i−1 to the previous state. So,
  – Find the best path of length i−1 to each state.
  – Consider extending each of those by 1 step, to state t.
  – Take the best of those options as the best path to state t.

Viterbi: high-level picture

• Want to find argmax_T P(T|W)
• Intuition: the best path of length i ending in state t must include the best path of length i−1 to the previous state. So,
  – Find the best path of length i−1 to each state.
  – Consider extending each of those by 1 step, to state t.
  – Take the best of those options as the best path to state t.

Viterbi algorithm

• Use a chart to store partial results as we go
  – T × N table, where v(t, i) is the probability* of the best state sequence for w_1…w_i that ends in state t.

*Specifically, v(t,i) stores the max of the joint probability P(w_1…w_i, t_1…t_{i−1}, t_i=t | λ)

Viterbi algorithm

• Use a chart to store partial results as we go
  – T × N table, where v(t, i) is the probability* of the best state sequence for w_1…w_i that ends in state t.
• Fill in columns from left to right, with

  v(t, i) = max_{t′} v(t′, i−1) · P(t | t′) · P(w_i | t)

  – The max is over each possible previous tag t′
• Store a backtrace to show, for each cell, which state at i − 1 we came from.

*Specifically, v(t,i) stores the max of the joint probability P(w_1…w_i, t_1…t_{i−1}, t_i=t | λ)

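The recurrence and backtrace above can be sketched as follows, assuming dictionary encodings `trans[(t1, t2)]` = P(t2|t1) and `emit[(t, w)]` = P(w|t); the toy numbers at the end are hypothetical:

```python
def viterbi(words, tags, trans, emit):
    """Best tag sequence under a bigram HMM, via the chart recurrence
    v(t,i) = max_{t'} v(t',i-1) * P(t|t') * P(w_i|t), with a backtrace."""
    # First column: transition from <s> times emission of the first word.
    v = [{t: trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0)
          for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({}); back.append({})
        for t in tags:
            # The emission factor is constant over t', so argmax over
            # v(t',i-1) * P(t|t') suffices to pick the best predecessor.
            best = max(tags, key=lambda tp: v[i-1][tp] * trans.get((tp, t), 0.0))
            v[i][t] = v[i-1][best] * trans.get((best, t), 0.0) \
                      * emit.get((t, words[i]), 0.0)
            back[i][t] = best
    # Close off with the end-of-sentence transition, then follow backtraces.
    last = max(tags, key=lambda t: v[-1][t] * trans.get((t, "</s>"), 0.0))
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))

# Hypothetical toy parameters (not the lecture's numbers):
trans = {("<s>", "CD"): .9, ("CD", "NN"): .8, ("NN", "NN"): .5,
         ("NN", "VBD"): .4, ("NN", "</s>"): .1, ("VBD", "</s>"): .9}
emit = {("CD", "one"): .6, ("NN", "dog"): .3, ("NN", "bit"): .2, ("VBD", "bit"): .1}

print(viterbi(["one", "dog", "bit"], ["CD", "NN", "VBD"], trans, emit))
```

With these numbers a greedy left-to-right pass would tag 'bit' as NN, but the end-of-sentence factor P(</s>|VBD) makes VBD win globally, mirroring the lecture's point.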
Transition and Output Probabilities

Transition matrix P(t_i | t_{i−1}):

       Noun  Verb  Det   Prep  Adv   </s>
<s>    .3    .1    .3    .2    .1    0
Noun   .2    .4    .01   .3    .04   .05
Verb   .3    .05   .3    .2    .1    .05
Det    .9    .01   .01   .01   .07   0
Prep   .4    .05   .4    .1    .05   0
Adv    .1    .5    .1    .1    .1    .1

Emission matrix P(w_i | t_i):

       a     cat   doctor  in    is    the   very
Noun   0     .5    .4      0     .1    0     0
Verb   0     0     .1      0     .9    0     0
Det    .3    0     0       0     0     .7    0
Prep   0     0     0       1.0   0     0     0
Adv    0     0     0       .1    0     0     .9

Example

Suppose W = the doctor is in. Our initially empty table:

v      w1=the  w2=doctor  w3=is  w4=in  </s>
Noun
Verb
Det
Prep
Adv

Filling in the first column

Suppose W = the doctor is in.

v      w1=the  w2=doctor  w3=is  w4=in  </s>
Noun   0
Verb   0
Det    .21
Prep   0
Adv    0

v(Noun, the) = P(Noun|<s>) P(the|Noun) = .3(0) …
v(Det, the) = P(Det|<s>) P(the|Det) = .3(.7)

The second column

v      w1=the  w2=doctor  w3=is  w4=in  </s>
Noun   0       ?
Verb   0
Det    .21
Prep   0
Adv    0

P(Noun|Det) P(doctor|Noun) = .9(.4)
v(Noun, doctor) = max_{t′} v(t′, the) · P(Noun|t′) · P(doctor|Noun)

The second column

v      w1=the  w2=doctor  w3=is  w4=in  </s>
Noun   0       .0756
Verb   0
Det    .21
Prep   0
Adv    0

P(Noun|Det) P(doctor|Noun) = .9(.4)
v(Noun, doctor) = max_{t′} v(t′, the) · P(Noun|t′) · P(doctor|Noun)
                = max{0, 0, .21(.36), 0, 0} = .0756

The second column

v      w1=the  w2=doctor  w3=is  w4=in  </s>
Noun   0       .0756
Verb   0       .00021
Det    .21
Prep   0
Adv    0

P(Verb|Det) P(doctor|Verb) = .01(.1)
v(Verb, doctor) = max_{t′} v(t′, the) · P(Verb|t′) · P(doctor|Verb)
                = max{0, 0, .21(.001), 0, 0} = .00021

The second column

v      w1=the  w2=doctor  w3=is  w4=in  </s>
Noun   0       .0756
Verb   0       .00021
Det    .21     0
Prep   0       0
Adv    0       0

P(Verb|Det) P(doctor|Verb) = .01(.1)
v(Verb, doctor) = max_{t′} v(t′, the) · P(Verb|t′) · P(doctor|Verb)
                = max{0, 0, .21(.001), 0, 0} = .00021

The third column

v      w1=the  w2=doctor  w3=is    w4=in  </s>
Noun   0       .0756      .001512
Verb   0       .00021
Det    .21     0
Prep   0       0
Adv    0       0

P(Noun|Noun) P(is|Noun) = .2(.1) = .02
P(Noun|Verb) P(is|Noun) = .3(.1) = .03
v(Noun, is) = max_{t′} v(t′, doctor) · P(Noun|t′) · P(is|Noun)
            = max{.0756(.02), .00021(.03), 0, 0, 0} = .001512

The third column

v      w1=the  w2=doctor  w3=is    w4=in  </s>
Noun   0       .0756      .001512
Verb   0       .00021     .027216
Det    .21     0          0
Prep   0       0          0
Adv    0       0          0

P(Verb|Noun) P(is|Verb) = .4(.9) = .36
P(Verb|Verb) P(is|Verb) = .05(.9) = .045
v(Verb, is) = max_{t′} v(t′, doctor) · P(Verb|t′) · P(is|Verb)
            = max{.0756(.36), .00021(.045), 0, 0, 0} = .027216

The fourth column

v      w1=the  w2=doctor  w3=is    w4=in     </s>
Noun   0       .0756      .001512  0
Verb   0       .00021     .027216  0
Det    .21     0          0        0
Prep   0       0          0        .005443
Adv    0       0          0

P(Prep|Noun) P(in|Prep) = .3(1.0)
P(Prep|Verb) P(in|Prep) = .2(1.0)
v(Prep, in) = max_{t′} v(t′, is) · P(Prep|t′) · P(in|Prep)
            = max{.001512(.3), .027216(.2), 0, 0, 0} = .005443

The fourth column

v      w1=the  w2=doctor  w3=is    w4=in     </s>
Noun   0       .0756      .001512  0
Verb   0       .00021     .027216  0
Det    .21     0          0        0
Prep   0       0          0        .005443
Adv    0       0          0        .000272

P(Adv|Noun) P(in|Adv) = .04(.1)
P(Adv|Verb) P(in|Adv) = .1(.1)
v(Adv, in) = max_{t′} v(t′, is) · P(Adv|t′) · P(in|Adv)
           = max{.001512(.004), .027216(.01), 0, 0, 0} = .000272

End of sentence

v      w1=the  w2=doctor  w3=is    w4=in     </s>
Noun   0       .0756      .001512  0
Verb   0       .00021     .027216  0
Det    .21     0          0        0
Prep   0       0          0        .005443
Adv    0       0          0        .000272

P(</s>|Prep) = 0
P(</s>|Adv) = .1
v(</s>) = max_{t′} v(t′, in) · P(</s>|t′)
        = max{0, 0, 0, .005443(0), .000272(.1)} = .0000272

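The chart arithmetic above can be checked directly; the probabilities are copied from the transition and emission matrices on the earlier slide, and each `max` keeps only the nonzero competitors, as in the slides:

```python
# Viterbi chart for "the doctor is in", cell by cell.
v_det_the  = .3 * .7                                    # P(Det|<s>) P(the|Det) = .21
v_noun_doc = v_det_the * .9 * .4                        # via Det               = .0756
v_verb_doc = v_det_the * .01 * .1                       # via Det               = .00021
v_noun_is  = max(v_noun_doc * .2, v_verb_doc * .3) * .1   # = .001512
v_verb_is  = max(v_noun_doc * .4, v_verb_doc * .05) * .9  # = .027216
v_prep_in  = max(v_noun_is * .3, v_verb_is * .2) * 1.0    # = .0054432
v_adv_in   = max(v_noun_is * .04, v_verb_is * .1) * .1    # = .00027216
v_end      = max(v_prep_in * 0, v_adv_in * .1)            # = .0000272
```

Note that the winning end-of-sentence transition comes through Adv: the Prep path, although larger in its own column, is multiplied by P(</s>|Prep) = 0.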
Completed Viterbi Chart

v      w1=the  w2=doctor  w3=is    w4=in     </s>
Noun   0       .0756      .001512  0
Verb   0       .00021     .027216  0
Det    .21     0          0        0
Prep   0       0          0        .005443
Adv    0       0          0        .000272

v(</s>) = .0000272

Following the Backtraces

v      w1=the  w2=doctor  w3=is    w4=in     </s>
Noun   0       .0756      .001512  0
Verb   0       .00021     .027216  0
Det    .21     0          0        0
Prep   0       0          0        .005443
Adv    0       0          0        .000272

v(</s>) = .0000272

Best tag sequence (following backtraces from </s>): Det Noun Verb Adv

Implementation and efficiency

• For sequence length N with T possible tags,
  – Enumeration takes O(T^N) time and O(N) space.
  – Bigram Viterbi takes O(T^2 N) time and O(TN) space.
  – Viterbi is exhaustive: further speedups might be had using methods that prune the search space.
• As with N-gram models, chart probs get really tiny really fast, causing underflow.
  – So, we use costs (neg log probs) instead.
  – Take minimum over sum of costs, instead of maximum over product of probs.

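The underflow problem, and the neg-log-cost fix, can be seen directly; the step probability 1e-5 here is just an arbitrary small value for demonstration:

```python
import math

# Multiplying many small probabilities eventually underflows to 0.0 ...
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p           # drifts below the smallest representable float

# ... but the equivalent sum of costs (negative log probs) is well-behaved.
cost = sum(-math.log(p) for p in probs)   # about 1151.3

print(product, cost)
```

This substitution is safe for Viterbi because log is monotone, so the max over products and the min over summed costs pick the same path.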
Higher-order Viterbi

• For a tag trigram model with T possible tags, we effectively need T^2 states
  – n-gram Viterbi requires T^(n−1) states, takes O(T^n N) time and O(T^(n−1) N) space.

[Diagram: states are tag pairs, e.g. (Noun, Verb), (Prep, Verb), (Verb, Verb), with a transition to (Verb, </s>)]

HMMs: what else?

• Using Viterbi, we can find the best tags for a sentence (decoding), and get P(W,T).
• We might also want to
  – Compute the likelihood P(W), i.e., the probability of a sentence regardless of its tags (a language model!)
  – learn the best set of parameters (transition & emission probs.) given only an unannotated corpus of sentences.

Computing the likelihood

• From probability theory, we know that

  P(W) = Σ_T P(W, T)

• There are an exponential number of Ts.
• Again, by computing and storing partial results, we can solve efficiently.
• (Advanced slides show the algorithm for those who are interested!)

Summary

• HMM: a generative model of sentences using hidden state sequence
• Greedy tagging: fast but suboptimal
• Dynamic programming algorithms to compute
  – Best tag sequence given words (Viterbi algorithm)
  – Likelihood (forward algorithm; see advanced slides)
  – Best parameters from unannotated corpus (forward-backward algorithm, an instance of EM; see advanced slides)

Advanced Topics

(the following slides are just for people who are interested)

Notation

• Sequence of observations over time o1, o2, …, oN
  – here, words in sentence
• Vocabulary size V of possible observations
• Set of possible states q1, q2, …, qT (see note, next slide)
  – here, tags
• A, a T×T matrix of transition probabilities
  – aij: the prob of transitioning from state i to j.
• B, a T×V matrix of output probabilities
  – bi(ot): the prob of emitting ot from state i.

Note on notation

• J&M use q1, q2, …, qN for the set of states, but also use q1, q2, …, qN for the state sequence over time.
  – So, just seeing q1 is ambiguous (though usually disambiguated from context).
  – I'll instead use qi for state names, and qn for the state at time n.
  – So we could have qn = qi, meaning: the state we're in at time n is qi.

HMM example w/ new notation

• States {q1, q2} (or {<s>, q1, q2}): think NN, VB
• Output symbols {x, y, z}: think chair, dog, help

[Diagram: Start → q1; transitions q1→q1 = .7, q1→q2 = .3, q2→q1 = .5, q2→q2 = .5; emissions from q1: x .6, y .1, z .3; from q2: x .1, y .7, z .2]

Adapted from Manning & Schuetze, Fig 9.2

HMM example w/ new notation

• A possible sequence of outputs for this HMM:
  z y y x y z x z z
• A possible sequence of states for this HMM:
  q1 q2 q2 q1 q1 q2 q1 q1 q1
• For these examples, N = 9, q3 = q2 and o3 = y

Transition and Output Probabilities

• Transition matrix A: aij = P(qj | qi)

        q1   q2
  <s>   1    0
  q1    .7   .3
  q2    .5   .5

  Ex: P(qn = q2 | qn−1 = q1) = .3

• Output matrix B: bi(o) = P(o | qi)

        x    y    z
  q1    .6   .1   .3
  q2    .1   .7   .2

  Ex: P(on = y | qn = q1) = .1

Forward algorithm

• Use a table with cells α(j,t): the probability of being in state j after seeing o1…ot (forward probability).

  α(j, t) = P(o1, o2, … ot, qt = j | λ)

• Fill in columns from left to right, with

  α(j, t) = Σ_{i=1}^{T} α(i, t−1) · a_ij · b_j(o_t)

  – Same as Viterbi, but sum instead of max (and no backtrace).

Note: because there's a sum, we can't use the trick that replaces probs with costs. For implementation info, see http://digital.cs.usu.edu/~cyan/CS7960/hmm-tutorial.pdf and http://stackoverflow.com/questions/13391625/underflow-in-forward-algorithm-for-hmms .

Example

• Suppose O = x z y. Our initially empty table:

        o1=x  o2=z  o3=y
  q1
  q2

Filling the first column

        o1=x  o2=z  o3=y
  q1    .6
  q2    0

α(1,1) = a_{<s>1} · b1(x) = 1(.6)
α(2,1) = a_{<s>2} · b2(x) = 0(.1)

Starting the second column

        o1=x  o2=z  o3=y
  q1    .6    .126
  q2    0

α(1,2) = Σ_{i=1}^{T} α(i,1) · a_{i1} · b1(z)
       = α(1,1) · a_{11} · b1(z) + α(2,1) · a_{21} · b1(z)
       = .6(.7)(.3) + 0(.5)(.3)
       = .126

Finishing the second column

        o1=x  o2=z  o3=y
  q1    .6    .126
  q2    0     .036

α(2,2) = Σ_{i=1}^{T} α(i,1) · a_{i2} · b2(z)
       = α(1,1) · a_{12} · b2(z) + α(2,1) · a_{22} · b2(z)
       = .6(.3)(.2) + 0(.5)(.2)
       = .036

Third column and finish

• Add up all probabilities in the last column to get the probability of the entire sequence:

  P(O|λ) = Σ_{i=1}^{T} α(i, N)

        o1=x  o2=z  o3=y
  q1    .6    .126  .01062
  q2    0     .036  .03906

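The forward chart above can be reproduced with a short sketch; the parameters are copied from the A and B matrices on the earlier slides, with states written as the integers 1 and 2:

```python
# Forward algorithm on the two-state toy HMM from the slides.
a = {("<s>", 1): 1.0, ("<s>", 2): 0.0,
     (1, 1): .7, (1, 2): .3, (2, 1): .5, (2, 2): .5}    # a_ij = P(qj | qi)
b = {1: {"x": .6, "y": .1, "z": .3},                     # b_i(o) = P(o | qi)
     2: {"x": .1, "y": .7, "z": .2}}

def forward(obs):
    """One chart column per observation; sum over predecessors, no max."""
    alpha = [{j: a[("<s>", j)] * b[j][obs[0]] for j in (1, 2)}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * a[(i, j)] for i in (1, 2)) * b[j][o]
                      for j in (1, 2)})
    return alpha

alpha = forward(["x", "z", "y"])
# Columns match the chart: .6/.0, .126/.036, .01062/.03906.
likelihood = sum(alpha[-1].values())   # P(O|lambda) = .04968
```

Summing the last column gives the likelihood of the whole observation sequence, marginalizing over all state sequences.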
Learning

• Given only the output sequence, learn the best set of parameters λ = (A, B).
• Assume 'best' = maximum-likelihood.
• Other definitions are possible, won't discuss here.

Unsupervised learning

• Training an HMM from an annotated corpus is simple.
  – Supervised learning: we have examples labelled with the right 'answers' (here, tags): no hidden variables in training.
• Training from an unannotated corpus is trickier.
  – Unsupervised learning: we have no examples labelled with the right 'answers': all we see are outputs, the state sequence is hidden.

Circularity

• If we know the state sequence, we can find the best λ.
  – E.g., use MLE:  P(qj | qi) = C(qi → qj) / C(qi)
• If we know λ, we can find the best state sequence.
  – use Viterbi
• But we don't know either!

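The MLE estimate above can be illustrated on a known state sequence (here, the example sequence q1 q2 q2 q1 q1 q2 q1 q1 q1 from the earlier slide):

```python
from collections import Counter

# Supervised MLE sketch: P(qj|qi) = C(qi -> qj) / C(qi).
states = ["q1", "q2", "q2", "q1", "q1", "q2", "q1", "q1", "q1"]

bigrams = Counter(zip(states, states[1:]))   # transition counts C(qi -> qj)
sources = Counter(states[:-1])               # C(qi): the last state emits
                                             # no outgoing transition

p_q1_given_q1 = bigrams[("q1", "q1")] / sources["q1"]   # 3/5 = .6
p_q2_given_q1 = bigrams[("q1", "q2")] / sources["q1"]   # 2/5 = .4
```

Note the estimates out of each source state sum to 1, as a conditional distribution must.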
Expectation-maximization (EM)

As in spelling correction, we can use EM to bootstrap, iteratively updating the parameters and hidden variables.
• Initialize parameters λ(0)
• At each iteration k,
  – E-step: Compute expected counts using λ(k−1)
  – M-step: Set λ(k) using MLE on the expected counts
• Repeat until λ doesn't change (or other stopping criterion).

Expected counts??

Counting transitions from qi → qj:
• Real counts:
  – count 1 each time we see qi → qj in true tag sequence.
• Expected counts:
  – With current λ, compute probs of all possible tag sequences.
  – If sequence Q has probability p, count p for each qi → qj in Q.
  – Add up these fractional counts across all possible sequences.

Example

• Notionally, we compute expected counts as follows:

  Observs:             x  z  y

  Possible sequence    Probability of sequence
  Q1 = q1 q1 q1        p1
  Q2 = q1 q2 q1        p2
  Q3 = q1 q1 q2        p3
  Q4 = q1 q2 q2        p4

Example

• Notionally, we compute expected counts as follows:

  Observs:             x  z  y

  Possible sequence    Probability of sequence
  Q1 = q1 q1 q1        p1
  Q2 = q1 q2 q1        p2
  Q3 = q1 q1 q2        p3
  Q4 = q1 q2 q2        p4

  Ê[C(q1 → q1)] = 2 p1 + p3

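The expected counts above can be computed by brute-force enumeration for O = x z y, with the toy HMM parameters from the earlier slides. Sequences not starting in q1 get probability 0 here because a(<s> → q2) = 0, leaving exactly the four sequences in the table:

```python
from itertools import product

a = {("<s>", "q1"): 1.0, ("<s>", "q2"): 0.0,
     ("q1", "q1"): .7, ("q1", "q2"): .3, ("q2", "q1"): .5, ("q2", "q2"): .5}
b = {"q1": {"x": .6, "y": .1, "z": .3}, "q2": {"x": .1, "y": .7, "z": .2}}
obs = ["x", "z", "y"]

expected = 0.0   # expected (unnormalized) count of q1 -> q1
total = 0.0      # sums to P(O|lambda)
for seq in product(["q1", "q2"], repeat=3):
    p = a[("<s>", seq[0])] * b[seq[0]][obs[0]]
    for i in (1, 2):
        p *= a[(seq[i - 1], seq[i])] * b[seq[i]][obs[i]]
    total += p
    # count p once for every q1 -> q1 transition in this sequence
    expected += p * sum(1 for i in (1, 2) if seq[i - 1] == seq[i] == "q1")
# total = .04968 (agrees with the forward algorithm);
# expected = 2*p1 + p3 = .0441; dividing by total gives the posterior
# expectation used in the M-step.
```

Forward-Backward computes the same quantity without the exponential enumeration.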
Forward-Backward algorithm

• As usual, avoid enumerating all possible sequences.
• Forward-Backward (Baum-Welch) algorithm computes expected counts using forward probabilities and backward probabilities:

  β(j, t) = P(qt = j, ot+1, ot+2, … oN | λ)

  – Details, see J&M 6.5
• EM idea is much more general: can use for many latent variable models.

Guarantees

• EM is guaranteed to find a local maximum of the likelihood.
• Not guaranteed to find global maximum.
• Practical issues: initialization, random restarts, early stopping.

Fact is, it doesn't work well for learning POS taggers!

[Figure: likelihood P(O|λ) plotted over values of λ, with local and global maxima]