Transcript of Lecture 11: Viterbi and Forward Algorithms · 2019-04-14
Lecture 11: Viterbi and Forward Algorithms
Kai-Wei Chang, CS @ University of Virginia
Course webpage: http://kwchang.net/teaching/NLP16
Quiz 1
- Max: 24; Mean: 18.1; Median: 18; SD: 3.36
[Histogram of Quiz 1 scores, bins [0-5], [6-10], [11-15], [16-20], [21-25]; y-axis counts 0 to 30]
This lecture
- Two important algorithms for inference
  - Forward algorithm
  - Viterbi algorithm
Three basic problems for HMMs
- Likelihood of the input: the forward algorithm
  (How likely is it that the sentence "I love cat" occurs?)
- Decoding (tagging) the input: the Viterbi algorithm
  (What are the POS tags of "I love cat"?)
- Estimation (learning): find the best model parameters
  (How do we learn the model?)
  - Case 1: supervised (tags are annotated): maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text): the forward-backward algorithm
Likelihood of the input
- How likely is it that the sentence "I love cat" occurs?
  - Compute $P(\boldsymbol{w} \mid \lambda)$ for the input $\boldsymbol{w}$ and HMM $\lambda$
- Remember, we model $P(\boldsymbol{t}, \boldsymbol{w} \mid \lambda)$
- $P(\boldsymbol{w} \mid \lambda) = \sum_{\boldsymbol{t}} P(\boldsymbol{t}, \boldsymbol{w} \mid \lambda)$
  (Marginal probability: sum over all possible tag sequences $\boldsymbol{t}$)
Likelihood of the input
- How likely is it that the sentence "I love cat" occurs?
- $P(\boldsymbol{w} \mid \lambda) = \sum_{\boldsymbol{t}} P(\boldsymbol{t}, \boldsymbol{w} \mid \lambda) = \sum_{\boldsymbol{t}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
- Assume we have 2 tags, N and V. Then
  $P(\text{“I love cat”} \mid \lambda)$
  $= P(\text{“I love cat”}, \text{NNN} \mid \lambda) + P(\text{“I love cat”}, \text{NNV} \mid \lambda)$
  $+ P(\text{“I love cat”}, \text{NVN} \mid \lambda) + P(\text{“I love cat”}, \text{NVV} \mid \lambda)$
  $+ P(\text{“I love cat”}, \text{VNN} \mid \lambda) + P(\text{“I love cat”}, \text{VNV} \mid \lambda)$
  $+ P(\text{“I love cat”}, \text{VVN} \mid \lambda) + P(\text{“I love cat”}, \text{VVV} \mid \lambda)$
- Now, let's write down $P(\text{“I love cat”} \mid \lambda)$ with 45 tags…
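To see why this direct enumeration explodes, here is a minimal brute-force sketch; the toy probabilities and dict layout are hypothetical, chosen only for illustration. It touches all $T^n$ tag sequences, so 45 tags on even a short sentence are already hopeless:

```python
from itertools import product

# Hypothetical toy HMM with 2 tags; these numbers are made up for illustration.
tags = ["N", "V"]
init = {"N": 0.7, "V": 0.3}                        # P(t_1 = q | t_0)
trans = {"N": {"N": 0.4, "V": 0.6},                # P(t_i = q | t_{i-1} = q')
         "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"I": 0.3, "love": 0.1, "cat": 0.6},  # P(w_i | t_i = q)
        "V": {"I": 0.1, "love": 0.5, "cat": 0.4}}

def brute_force_likelihood(words):
    """P(w) by summing P(t, w) over all |tags|^n tag sequences."""
    total = 0.0
    for seq in product(tags, repeat=len(words)):  # T^n sequences!
        p = init[seq[0]] * emit[seq[0]][words[0]]
        for i in range(1, len(words)):
            p *= trans[seq[i - 1]][seq[i]] * emit[seq[i]][words[i]]
        total += p
    return total

print(brute_force_likelihood(["I", "love", "cat"]))  # sums 2^3 = 8 paths
```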
Trellis diagram
- Goal: $P(\boldsymbol{w} \mid \lambda) = \sum_{\boldsymbol{t}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$

[Trellis diagram: one column of tag states per position $i = 1, 2, 3, 4$; edges carry transition probabilities such as $P(t_2 = 2 \mid t_1 = 1)$, and nodes carry terms such as $P(t_3 = 1 \mid t_2 = 1)\, P(w_3 \mid t_3 = 1)$]

Note: $\lambda$ is the parameter set of the HMM. Let's ignore it in some slides for simplicity's sake.
Trellis diagram
- $P(\text{“I eat a fish”}, \text{NVVA})$ corresponds to a single path through the trellis

[Trellis diagram: columns $i = 1, \dots, 4$ with states N, V, A; the path N→V→V→A has probability $P(N \mid {<}S{>})\, P(I \mid N) \cdot P(V \mid N)\, P(eat \mid V) \cdot P(V \mid V)\, P(a \mid V) \cdot P(A \mid V)\, P(fish \mid A)$]
Trellis diagram
- $\sum_{\boldsymbol{t}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$: sum over all paths

[Trellis diagram: columns $i = 1, \dots, 4$ with states N, V, A; the likelihood sums the probabilities of all paths through the trellis]
Dynamic programming
- Recursively decompose a problem into smaller sub-problems
- Similar to mathematical induction
  - Base step: initial values for $i = 1$
  - Inductive step: assume we know the values for $i = k$; compute $i = k + 1$
Forward algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\boldsymbol{t}_k$: tag sequence of length $k$; $\boldsymbol{w}_k = w_1, w_2, \dots, w_k$
  - $\sum_{\boldsymbol{t}_k} P(\boldsymbol{t}_k, \boldsymbol{w}_k) = \sum_{q} \sum_{\boldsymbol{t}_{k-1}} P(\boldsymbol{t}_{k-1}, \boldsymbol{w}_k, t_k = q) = \sum_{q} P(\boldsymbol{w}_k, t_k = q)$
    (sums over tag sequences $\boldsymbol{t}_k$, over the tag $q$ at $i = k$, and over tag sequences $\boldsymbol{t}_{k-1}$)

[Trellis diagram: columns $i = 1, \dots, 4$ with states N, V, A]
Forward algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\sum_{\boldsymbol{t}_k} P(\boldsymbol{t}_k, \boldsymbol{w}_k) = \sum_{q} P(\boldsymbol{w}_k, t_k = q)$
- $P(\boldsymbol{w}_k, t_k = q) = \sum_{q'} P(\boldsymbol{w}_k, t_{k-1} = q', t_k = q)$
  $= \sum_{q'} P(\boldsymbol{w}_{k-1}, t_{k-1} = q')\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Forward algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\sum_{\boldsymbol{t}_k} P(\boldsymbol{t}_k, \boldsymbol{w}_k) = \sum_{q} P(\boldsymbol{w}_k, t_k = q)$
- $P(\boldsymbol{w}_k, t_k = q) = \sum_{q'} P(\boldsymbol{w}_k, t_{k-1} = q', t_k = q)$
  $= \sum_{q'} P(\boldsymbol{w}_{k-1}, t_{k-1} = q')\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$
- Let's call $P(\boldsymbol{w}_k, t_k = q)$ the forward value $\alpha_k(q)$; the term $P(\boldsymbol{w}_{k-1}, t_{k-1} = q')$ is then $\alpha_{k-1}(q')$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Forward algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\alpha_k(q) = \sum_{q'} \alpha_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Forward algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\alpha_k(q) = \sum_{q'} \alpha_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$
    $= P(w_k \mid t_k = q) \sum_{q'} \alpha_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Forward algorithm
- Base step: $i = 1$
  - $\alpha_1(q) = P(w_1 \mid t_1 = q)\, P(t_1 = q \mid t_0)$, where $P(t_1 = q \mid t_0)$ is the initial probability $p(t_1 = q)$

[Trellis diagram: the first column of states, scored by initial and emission probabilities]
Implementation using an array
- Use an $n \times T$ table to store $\alpha_k(q)$, where $n$ is the sentence length and $T$ is the number of tags

(From Julia Hockenmaier, Intro to NLP)
Implementation using an array
Initial: Trellis[1][q] $= P(w_1 \mid t_1 = q)\, P(t_1 = q \mid t_0)$
Implementation using an array
Induction: Trellis[k][q] $= \alpha_k(q) = P(w_k \mid t_k = q) \sum_{q'} \alpha_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')$
The forward algorithm (Pseudo Code)
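The slide's pseudocode boils down to a few lines. Here is a minimal Python sketch of the forward algorithm following the recurrences above; the `forward` name and dict-based parameters mirror the toy layout used earlier and are illustrative, not the lecture's exact pseudocode:

```python
def forward(words, tags, init, trans, emit):
    """Return P(words) = sum over all tag sequences t of P(t, words).

    init[q]      = P(t_1 = q | t_0)          (initial probability)
    trans[qp][q] = P(t_k = q | t_{k-1} = qp) (transition probability)
    emit[q][w]   = P(w | t = q)              (emission probability)
    """
    n = len(words)
    # n x T trellis; alpha[k][q] here is the slides' alpha_{k+1}(q) (0-based)
    alpha = [{q: 0.0 for q in tags} for _ in range(n)]

    # Base step: alpha_1(q) = P(w_1 | t_1 = q) P(t_1 = q | t_0)
    for q in tags:
        alpha[0][q] = emit[q][words[0]] * init[q]

    # Inductive step:
    # alpha_k(q) = P(w_k | t_k = q) * sum_{q'} alpha_{k-1}(q') P(q | q')
    for k in range(1, n):
        for q in tags:
            alpha[k][q] = emit[q][words[k]] * sum(
                alpha[k - 1][qp] * trans[qp][q] for qp in tags
            )

    # P(w) = sum_q alpha_n(q): marginalize over the tag in the last column
    return sum(alpha[n - 1].values())
```

Each position looks at all $T^2$ transitions once, so this runs in $O(nT^2)$ time instead of the $O(T^n)$ of brute-force enumeration.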
Jason’s ice cream
- P("1, 2, 1")?

|        | p(…∣C) | p(…∣H) | p(…∣START) |
|--------|--------|--------|------------|
| p(1∣…) | 0.5    | 0.1    |            |
| p(2∣…) | 0.4    | 0.2    |            |
| p(3∣…) | 0.1    | 0.7    |            |
| p(C∣…) | 0.8    | 0.2    | 0.5        |
| p(H∣…) | 0.2    | 0.8    | 0.5        |

[Trellis over three days with hidden states C (cold) and H (hot); edges carry the transition probabilities 0.8/0.2 and the initial probabilities 0.5, nodes carry the emission probabilities of the observed #cones]
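Plugging the ice-cream parameters into the `forward` sketch above (encoding the cone counts as strings is just a convenience of this sketch):

```python
tags = ["C", "H"]
init = {"C": 0.5, "H": 0.5}
trans = {"C": {"C": 0.8, "H": 0.2},
         "H": {"C": 0.2, "H": 0.8}}
emit = {"C": {"1": 0.5, "2": 0.4, "3": 0.1},
        "H": {"1": 0.1, "2": 0.2, "3": 0.7}}

print(forward(["1", "2", "1"], tags, init, trans, emit))
# alpha_1 = (C: 0.25, H: 0.05), alpha_2 = (C: 0.084, H: 0.018),
# alpha_3 = (C: 0.0354, H: 0.00312), so P("1,2,1") should come out to
# about 0.03852 (ignoring any stop probability, which the table omits)
```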
Three basic problems for HMMs
- Likelihood of the input: the forward algorithm
  (How likely is it that the sentence "I love cat" occurs?)
- Decoding (tagging) the input: the Viterbi algorithm
  (What are the POS tags of "I love cat"?)
- Estimation (learning): find the best model parameters
  (How do we learn the model?)
  - Case 1: supervised (tags are annotated): maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text): the forward-backward algorithm
Prediction in generative model
- Inference: what is the most likely sequence of tags for the given sequence of words $\boldsymbol{w}$?
- What are the latent states that most likely generate the sequence of words $\boldsymbol{w}$?
Tagging the input
- Find the best tag sequence for "I love cat"
- Remember, we model $P(\boldsymbol{t}, \boldsymbol{w} \mid \lambda)$
- $\boldsymbol{t}^* = \arg\max_{\boldsymbol{t}} P(\boldsymbol{t}, \boldsymbol{w} \mid \lambda)$
  (Find the best one among all possible tag sequences)
Tagging the input
- Assume we have 2 tags, N and V. Which one is the best?
  $P(\text{“I love cat”}, \text{NNN} \mid \lambda)$, $P(\text{“I love cat”}, \text{NNV} \mid \lambda)$,
  $P(\text{“I love cat”}, \text{NVN} \mid \lambda)$, $P(\text{“I love cat”}, \text{NVV} \mid \lambda)$,
  $P(\text{“I love cat”}, \text{VNN} \mid \lambda)$, $P(\text{“I love cat”}, \text{VNV} \mid \lambda)$,
  $P(\text{“I love cat”}, \text{VVN} \mid \lambda)$, $P(\text{“I love cat”}, \text{VVV} \mid \lambda)$
- Again, we need an efficient algorithm!
Trellis diagram
- Goal: $\arg\max_{\boldsymbol{t}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$

[Trellis diagram: columns $i = 1, \dots, 4$; edges labeled with transition probabilities such as $P(t_2 = 2 \mid t_1 = 1)$ and terms such as $P(t_3 = 1 \mid t_2 = 1)\, P(w_3 \mid t_3 = 1)$]
Trellis diagram
- Goal: $\arg\max_{\boldsymbol{t}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
- Find the best path!

[Trellis diagram: columns $i = 1, \dots, 4$ with states N, V, A; decoding picks the single highest-scoring path]
Dynamic programming again!
- Recursively decompose a problem into smaller sub-problems
- Similar to mathematical induction
  - Base step: initial values for $i = 1$
  - Inductive step: assume we know the values for $i = k$; compute $i = k + 1$
Viterbi algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\boldsymbol{t}_k$: tag sequence of length $k$; $\boldsymbol{w}_k = w_1, w_2, \dots, w_k$
  - $\max_{\boldsymbol{t}_k} P(\boldsymbol{t}_k, \boldsymbol{w}_k) = \max_{q} \max_{\boldsymbol{t}_{k-1}} P(\boldsymbol{t}_{k-1}, t_k = q, \boldsymbol{w}_k)$
    (outer max: over the tag $q$ at $i = k$; inner max: over tag sequences $\boldsymbol{t}_{k-1}$)

[Trellis diagram: columns $i = 1, \dots, 4$ with states N, V, A]
Viterbi algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\max_{\boldsymbol{t}_{k-1}} P(\boldsymbol{t}_{k-1}, t_k = q, \boldsymbol{w}_k)$
    $= \max_{q'} \max_{\boldsymbol{t}_{k-2}} P(\boldsymbol{t}_{k-2}, t_k = q, t_{k-1} = q', \boldsymbol{w}_k)$
    $= \max_{q'} \max_{\boldsymbol{t}_{k-2}} P(\boldsymbol{t}_{k-2}, t_{k-1} = q', \boldsymbol{w}_{k-1})\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$
- Let's call $\max_{\boldsymbol{t}_{k-1}} P(\boldsymbol{t}_{k-1}, t_k = q, \boldsymbol{w}_k)$ the Viterbi value $\delta_k(q)$; the inner term $\max_{\boldsymbol{t}_{k-2}} P(\boldsymbol{t}_{k-2}, t_{k-1} = q', \boldsymbol{w}_{k-1})$ is then $\delta_{k-1}(q')$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Viterbi algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\delta_k(q) = \max_{q'} \delta_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Viterbi algorithm
- Inductive step: from $i = k$ to $i = k + 1$
  - $\delta_k(q) = \max_{q'} \delta_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')\, P(w_k \mid t_k = q)$
    $= P(w_k \mid t_k = q) \max_{q'} \delta_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')$

[Trellis diagram: columns $i = k - 1$ and $i = k$ with states N, V, A]
Viterbi algorithm
- Base step: $i = 1$
  - $\delta_1(q) = P(w_1 \mid t_1 = q)\, P(t_1 = q \mid t_0)$, where $P(t_1 = q \mid t_0)$ is the initial probability $p(t_1 = q)$

[Trellis diagram: the first column of states, scored by initial and emission probabilities]
Implementation using an array
Initial: Trellis[1][q] $= P(w_1 \mid t_1 = q)\, P(t_1 = q \mid t_0)$
Implementation using an array
Induction: Trellis[k][q] $= \delta_k(q) = P(w_k \mid t_k = q) \max_{q'} \delta_{k-1}(q')\, P(t_k = q \mid t_{k-1} = q')$
Retrieving the best sequence
- Keep one backpointer per cell: record which previous tag $q'$ achieved the max, then follow the backpointers from the best final state to recover the best tag sequence
The Viterbi algorithm (Pseudo Code)
Same as the forward algorithm, but with max instead of sum.
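A minimal Python sketch of Viterbi with backpointers, mirroring the `forward` sketch from earlier (again an illustrative sketch, not the lecture's exact pseudocode):

```python
def viterbi(words, tags, init, trans, emit):
    """Return (best probability, best tag sequence) for the input words.

    Same parameterization as forward(); max replaces sum, and
    backpointers record which previous tag achieved each max.
    """
    n = len(words)
    delta = [{q: 0.0 for q in tags} for _ in range(n)]
    bp = [{q: None for q in tags} for _ in range(n)]  # backpointers

    # Base step: delta_1(q) = P(w_1 | t_1 = q) P(t_1 = q | t_0)
    for q in tags:
        delta[0][q] = emit[q][words[0]] * init[q]

    # Inductive step: max instead of sum, remembering the argmax
    for k in range(1, n):
        for q in tags:
            best_qp = max(tags, key=lambda qp: delta[k - 1][qp] * trans[qp][q])
            delta[k][q] = (emit[q][words[k]]
                           * delta[k - 1][best_qp] * trans[best_qp][q])
            bp[k][q] = best_qp

    # Follow backpointers from the best final state
    last = max(tags, key=lambda q: delta[n - 1][q])
    path = [last]
    for k in range(n - 1, 0, -1):
        path.append(bp[k][path[-1]])
    path.reverse()
    return delta[n - 1][last], path
```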
Jason’s ice cream
- What is the best tag sequence for "1, 2, 1"?

|        | p(…∣C) | p(…∣H) | p(…∣START) |
|--------|--------|--------|------------|
| p(1∣…) | 0.5    | 0.1    |            |
| p(2∣…) | 0.4    | 0.2    |            |
| p(3∣…) | 0.1    | 0.7    |            |
| p(C∣…) | 0.8    | 0.2    | 0.5        |
| p(H∣…) | 0.2    | 0.8    | 0.5        |

[Trellis over three days with hidden states C and H, as before; Viterbi keeps only the best incoming edge at each node]
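Running the `viterbi` sketch on the same ice-cream parameters as before:

```python
best_prob, best_path = viterbi(["1", "2", "1"], tags, init, trans, emit)
print(best_prob, best_path)
# delta_3(C) = 0.5 * max(0.08 * 0.8, 0.01 * 0.2) = 0.032 via C -> C -> C,
# so this should print roughly 0.032 ['C', 'C', 'C']
```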
Trick: computing everything in log space
- Homework: write the forward and Viterbi algorithms in log space
- Hint: you need a function to compute $\log(a + b)$ from $\log a$ and $\log b$
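One common way to build such a function is the log-sum-exp trick: factor out the larger argument so `exp` never overflows. A small sketch of how the hint might be implemented (an assumption, not the assigned solution):

```python
import math

def log_add(log_a, log_b):
    """Return log(a + b) given log_a = log(a) and log_b = log(b).

    log(a + b) = log_a + log(1 + exp(log_b - log_a)) for log_a >= log_b,
    so the exp() argument is always <= 0 and cannot overflow.
    """
    if log_a < log_b:
        log_a, log_b = log_b, log_a
    if log_b == float("-inf"):  # adding a zero probability
        return log_a
    return log_a + math.log1p(math.exp(log_b - log_a))

# Two tiny probabilities combined safely in log space:
print(log_add(math.log(1e-300), math.log(1e-300)))  # ~ log(2) + log(1e-300)
```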
Three basic problems for HMMs
- Likelihood of the input: the forward algorithm
  (How likely is it that the sentence "I love cat" occurs?)
- Decoding (tagging) the input: the Viterbi algorithm
  (What are the POS tags of "I love cat"?)
- Estimation (learning): find the best model parameters
  (How do we learn the model?)
  - Case 1: supervised (tags are annotated): maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text): the forward-backward algorithm