Transcript of "Unit 3: Natural Language Learning" (Oregon State University)
Source: web.engr.oregonstate.edu/~huanlian/teaching/nlp/2014fall/lec-EM.pdf

Language Technology, Fall 2014
Liang Huang (liang.huang.sh@gmail.com)

Unit 3: Natural Language Learning: Unsupervised Learning
(EM, forward-backward, inside-outside)
CS 562 - EM

Review of Noisy-Channel Model
Example 1: Part-of-Speech Tagging

• use tag bigram as a language model
• channel model is context-independent
Ideal vs. Available Data

[figure: "ideal" vs. "available" training data]
Ideal vs. Available Data

HW2 (ideal: alignments given):
  EY B AH L  ->  A B E R U    (1 2 3 4 4)
  AH B AW T  ->  A B A U T O  (1 2 3 3 4 4)
  AH L ER T  ->  A R A A T O  (1 2 3 3 4 4)
  EY S       ->  E E S U      (1 1 2 2)

HW4 (realistic: no alignments):
  EY B AH L  ->  A B E R U
  AH B AW T  ->  A B A U T O
  AH L ER T  ->  A R A A T O
  EY S       ->  E E S U
Incomplete Data / Model
EM: Expectation-Maximization
How to Change m? 1) Hard
How to Change m? 2) Soft
Fractional Counts

• distribution over all possible hallucinated hidden variables
• example: W AY N -> W A I N has three possible alignments:

  z'  : W -> W,   AY -> A,   N -> I N
  z   : W -> W,   AY -> A I, N -> N
  z'' : W -> W A, AY -> I,   N -> N

                        z'      z      z''
  hard-EM counts         1      0       0
  fractional counts    0.333  0.333   0.333
    => AY -> A: 0.333, A I: 0.333, I: 0.333;  W -> W: 0.667, W A: 0.333;  N -> N: 0.667, I N: 0.333
  next iteration       0.25   0.5     0.25
    => AY -> A I: 0.500, A: 0.250, I: 0.250;  W -> W: 0.750, W A: 0.250;  N -> N: 0.750, I N: 0.250
  eventually             0      1       0
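The fractional counts above can be reproduced with a minimal sketch (the `(epron, jseg)` pair encoding of alignments is illustrative, not HW code): one E-step from the uniform model, where every alignment gets posterior 1/3.

```python
from collections import defaultdict

# The three alignments of W AY N -> W A I N, as (epron, jseg) pairs.
alignments = [
    [("W", ("W",)), ("AY", ("A", "I")), ("N", ("N",))],   # z
    [("W", ("W",)), ("AY", ("A",)), ("N", ("I", "N"))],   # z'
    [("W", ("W", "A")), ("AY", ("I",)), ("N", ("N",))],   # z''
]

table = defaultdict(lambda: 1.0)   # uniform init: all alignments tie

def joint(z):
    """p(x, z): product of the model probs of the edges in z."""
    p = 1.0
    for epron, jseg in z:
        p *= table[(epron, jseg)]
    return p

px = sum(joint(z) for z in alignments)   # p(x) = sum_z p(x, z)
frac = defaultdict(float)
for z in alignments:
    for epron, jseg in z:
        frac[(epron, jseg)] += joint(z) / px   # posterior p(z|x) = 1/3 each
```

With the uniform table, `frac[("W", ("W",))]` comes out to 0.667 and `frac[("W", ("W", "A"))]` to 0.333, matching the table on the slide.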
Fractional Counts

• how about:
  W EH T -> W E T O
  B IY B IY -> B I I B I I
• so EM can possibly: (1) learn something correct; (2) learn something wrong; (3) learn nothing
• but with lots of data => likely to learn something good
EM: slow version (non-DP)

• initialize the conditional prob. table to uniform
• repeat until converged:
  • E-step:
    • for each training example x (here: an (e...e, j...j) pair):
      • for each hidden z: compute p(x, z) from the current model
      • p(x) = sum_z p(x, z)   [debug: corpus prob p(data) *= p(x)]
      • for each hidden z = (z1 z2 ... zn): for each i:
        • fraccount(zi) += p(x, z) / p(x)
  • M-step: count-n-divide on fraccounts => new model

(e.g., for W AY N -> W A I N the hidden alignments are:
  z   = (z1 z2 z3): W -> W,   AY -> A I, N -> N
  z'  :             W -> W,   AY -> A,   N -> I N
  z'' :             W -> W A, AY -> I,   N -> N)
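A runnable sketch of this slow EM on the single example W AY N -> W A I N, with the three hidden alignments hard-coded (in real HW code, z would be enumerated from the data):

```python
from collections import defaultdict

alignments = [
    [("W", ("W",)), ("AY", ("A", "I")), ("N", ("N",))],   # z
    [("W", ("W",)), ("AY", ("A",)), ("N", ("I", "N"))],   # z'
    [("W", ("W", "A")), ("AY", ("I",)), ("N", ("N",))],   # z''
]

def em(alignments, iters=50):
    table = defaultdict(lambda: 1.0)          # uniform init: every option ties
    for _ in range(iters):
        # E-step: p(x, z) for each z, then fractional counts p(z | x)
        joint = []
        for z in alignments:
            p = 1.0
            for epron, jseg in z:
                p *= table[(epron, jseg)]
            joint.append(p)
        px = sum(joint)                       # p(x) = sum_z p(x, z)
        frac = defaultdict(float)
        for p, z in zip(joint, alignments):
            for epron, jseg in z:
                frac[(epron, jseg)] += p / px
        # M-step: count-n-divide => new conditional prob. table
        denom = defaultdict(float)
        for (epron, jseg), c in frac.items():
            denom[epron] += c
        table = defaultdict(float,
                            {k: c / denom[k[0]] for k, c in frac.items()})
    return table

table = em(alignments)
```

After a few iterations the posterior mass concentrates on the AY -> A I alignment, so `table[("AY", ("A", "I"))]` approaches 1.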
EM: fast version (DP)

• initialize the conditional prob. table to uniform
• repeat until converged:
  • E-step:
    • for each training example x (here: an (e...e, j...j) pair):
      • forward from s to t; note: forw[t] = p(x) = sum_z p(x, z)
      • backward from t to s; note: back[t] = 1; back[s] = forw[t]
      • for each edge (u, v) in the DP graph with label(u, v) = zi:
        • fraccount(zi) += forw[u] * back[v] * prob(u, v) / p(x)
  • M-step: count-n-divide on fraccounts => new model

[figure: path s ... u -> v ... t; forw[u] * prob(u, v) * back[v] = sum over z with (u, v) in z of p(x, z); forw[t] = back[s] = p(x) = sum_z p(x, z)]
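The E-step above can be sketched end-to-end for the epron/jpron model (a minimal sketch with illustrative names; `table` maps `(epron, jseg)` pairs to probabilities, and each epron covers 1 to 3 jprons):

```python
from collections import defaultdict

def estep_counts(eprons, jprons, table, maxseg=3):
    """One DP E-step: forward, backward, then edge-posterior counts."""
    n, m = len(eprons), len(jprons)
    # forward: forw[i][j] = prob. of aligning the first i eprons to j jprons
    forw = [defaultdict(float) for _ in range(n + 1)]
    forw[0][0] = 1.0
    for i in range(n):
        for j, f in forw[i].items():
            for k in range(1, min(m - j, maxseg) + 1):
                forw[i + 1][j + k] += f * table[(eprons[i], tuple(jprons[j:j + k]))]
    px = forw[n][m]                          # p(x) = sum_z p(x, z)
    # backward: back[i][j] = prob. of aligning eprons[i:] to jprons[j:]
    back = [defaultdict(float) for _ in range(n + 1)]
    back[n][m] = 1.0
    for i in reversed(range(n)):
        for j in range(m + 1):
            for k in range(1, min(m - j, maxseg) + 1):
                if (j + k) in back[i + 1]:
                    back[i][j] += table[(eprons[i], tuple(jprons[j:j + k]))] * back[i + 1][j + k]
    # fractional counts: forw[u] * prob(u, v) * back[v] / p(x) per edge
    frac = defaultdict(float)
    for i in range(n):
        for j, f in forw[i].items():
            for k in range(1, min(m - j, maxseg) + 1):
                jseg = tuple(jprons[j:j + k])
                if (j + k) in back[i + 1]:
                    frac[(eprons[i], jseg)] += f * table[(eprons[i], jseg)] * back[i + 1][j + k] / px
    return frac, px

# demo on the running example with a uniform table (1/3 per entry)
uniform = defaultdict(lambda: 1.0 / 3)
frac, px = estep_counts(["W", "AY", "N"], ["W", "A", "I", "N"], uniform)
```

On this example the counts reproduce the 0.667 / 0.333 values from the fractional-counts slide, and `back[0][0] == forw[n][m] == p(x)` is a handy sanity check.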
How to avoid enumeration?

• dynamic programming: the forward-backward algorithm
• forward is just like Viterbi, replacing max by sum
• backward is like reverse Viterbi (also with sum)

[figure labels: forward-backward: POS tagging, crypto, ...; alignment, edit-distance, ...; inside-outside: PCFG, SCFG, ...]
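The max-vs-sum point can be seen on a toy lattice (a hypothetical three-edge example, not from the slides): the recurrence is identical, only the combining operator changes.

```python
# Toy lattice: two paths from s to t.  Viterbi keeps the best path;
# forward sums over all paths.
edges = {  # edges[u] = list of (v, prob)
    "s": [("a", 0.6), ("b", 0.4)],
    "a": [("t", 0.5)],
    "b": [("t", 0.9)],
}
order = ["s", "a", "b", "t"]  # topological order

def sweep(op):
    """One DP sweep; op combines the scores of incoming paths."""
    score = {u: 0.0 for u in order}
    score["s"] = 1.0
    for u in order:
        for v, p in edges.get(u, []):
            score[v] = op(score[v], score[u] * p)
    return score["t"]

viterbi = sweep(max)                  # best path: max(0.6*0.5, 0.4*0.9) = 0.36
forward = sweep(lambda a, b: a + b)   # total prob: 0.6*0.5 + 0.4*0.9 = 0.66
```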
Example Forward Code

• for HW5; this example shows forward only.

```python
# forward[i][j]: prob. of aligning the first i eprons to the first j jprons
# (assumes forward is a row-indexed table of dicts, e.g. defaultdicts)
n, m = len(eprons), len(jprons)
forward[0][0] = 1
for i in xrange(0, n):
    epron = eprons[i]
    for j in forward[i]:
        for k in range(1, min(m - j, 3) + 1):   # each epron covers 1-3 jprons
            jseg = tuple(jprons[j:j + k])
            score = forward[i][j] * table[epron][jseg]
            forward[i + 1][j + k] += score
totalprob *= forward[n][m]
```

[figure: forward chart for W AY N -> W A I N; rows 0-3 (eprons W, AY, N), columns 0-4 (jprons W, A, I, N)]
Example Forward Code

• same code, annotated on the chart: cell forw[i][j] connects to back[i + 1][j + k] via the edge for jseg = jprons[j:j+k]

[figure: chart with forw[i][j] and back[i+1][j+k] marked on the ... A I ... span; forw[s] = back[t] = 1.0; forw[t] = back[s] = p(x)]
EM
Why does EM increase p(data) iteratively?
[figure: EM maximizes a convex auxiliary function, separated from log p(data) by a KL-divergence gap; it converges to a local maximum]
How to maximize the auxiliary?

• posteriors over the three alignments of W AY N -> W A I N:
  z   (W -> W,   AY -> A I, N -> N):   p(z|x)  = 0.5
  z'  (W -> W,   AY -> A,   N -> I N): p(z'|x) = 0.3
  z'' (W -> W A, AY -> I,   N -> N):   p(z''|x) = 0.2
• just count-n-divide on the fractional data! (as if MLE on complete data)
• i.e., as if the corpus contained z 5x, z' 3x, z'' 2x
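The M-step can be sketched directly from these posteriors (hypothetical `(epron, jseg)` encoding; the 0.5 / 0.3 / 0.2 weights are the slide's): treat each alignment as fractional data and count-n-divide, exactly as MLE on complete data.

```python
from collections import defaultdict

# each entry: (posterior weight p(z|x), alignment as (epron, jseg) pairs)
posterior = [
    (0.5, [("W", ("W",)), ("AY", ("A", "I")), ("N", ("N",))]),    # z
    (0.3, [("W", ("W",)), ("AY", ("A",)), ("N", ("I", "N"))]),    # z'
    (0.2, [("W", ("W", "A")), ("AY", ("I",)), ("N", ("N",))]),    # z''
]

count = defaultdict(float)   # fractional count of each (epron, jseg) edge
total = defaultdict(float)   # fractional count of each epron (the divisor)
for w, z in posterior:
    for epron, jseg in z:
        count[(epron, jseg)] += w
        total[epron] += w

# count-n-divide: p(jseg | epron) = count / total
prob = {k: c / total[k[0]] for k, c in count.items()}
```

This gives e.g. p(W | W) = (0.5 + 0.3) / 1.0 = 0.8 and p(A I | AY) = 0.5, the same result as MLE on a corpus containing z 5x, z' 3x, z'' 2x.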