Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell...
![Page 1: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/1.jpg)
Dual Decomposition Inference for Graphical Models over Strings
Nanyun (Violet) Peng, Ryan Cotterell, Jason Eisner
Johns Hopkins University
1
![Page 2: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/2.jpg)
Attention!
• Don’t care about phonology?
• Listen anyway. This is a general method for
inferring strings from other strings (if you have a probability model).
• So if you haven’t yet observed all the words of your noisy or complex language, try it!
2
![Page 3: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/3.jpg)
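A Phonological Exercise
3
Tenses (columns): 1P Pres. Sg. | 3P Pres. Sg. | Past Tense | Past Part.
Verbs (rows):
TALK: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK: [hæk] | [hæks] | [hækt] | [hækt]
CRACK: [kɹæks], [kɹækt] observed; other cells blank
SLAP: [slæp], [slæpt] observed; other cells blank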
![Page 4: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/4.jpg)
Matrix Completion: Collaborative Filtering
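Users (rows) × Movies (columns); "?" marks an unobserved rating.

| | Movie 1 | Movie 2 | Movie 3 | Movie 4 |
|---|---|---|---|---|
| User 1 | -37 | 29 | 19 | 29 |
| User 2 | -36 | 67 | 77 | 22 |
| User 3 | -24 | 61 | 74 | 12 |
| User 4 | ? | -79 | ? | -41 |
| User 5 | -52 | ? | -39 | ? |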
![Page 5: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/5.jpg)
Matrix Completion: Collaborative Filtering
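5
Users (rows) × Movies (columns), each with a latent vector; "?" marks an unobserved rating.

| Users ↓ / Movies → | [-6 -3 2] | [9 -2 1] | [9 -7 2] | [4 3 -2] |
|---|---|---|---|---|
| [4 1 -5] | -37 | 29 | 19 | 29 |
| [7 -2 0] | -36 | 67 | 77 | 22 |
| [6 -2 3] | -24 | 61 | 74 | 12 |
| [-9 1 4] | ? | -79 | ? | -41 |
| [3 8 -5] | -52 | ? | -39 | ? |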
![Page 6: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/6.jpg)
Matrix Completion: Collaborative Filtering
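6
Prediction! (predicted ratings in bold)

| Users ↓ / Movies → | [-6 -3 2] | [9 -2 1] | [9 -7 2] | [4 3 -2] |
|---|---|---|---|---|
| [4 1 -5] | -37 | 29 | 19 | 29 |
| [7 -2 0] | -36 | 67 | 77 | 22 |
| [6 -2 3] | -24 | 61 | 74 | 12 |
| [-9 1 4] | **59** | -79 | **-80** | -41 |
| [3 8 -5] | -52 | **6** | -39 | **46** |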
![Page 7: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/7.jpg)
Matrix Completion: Collaborative Filtering
[1,-4,3] · [-5,2,1]
Dot Product: -10
Gaussian Noise: -11
7
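The matrix-completion analogy can be checked numerically. The sketch below assumes the latent vectors read off the earlier slides, with users as rows and movies as columns (an assumption about the figure layout); the function names are illustrative and not from the talk. Each rating is modeled as a dot product, optionally perturbed by Gaussian noise.

```python
import random

# Latent vectors as they appear on the slides. Treating the five vectors as
# users (rows) and the four as movies (columns) is an assumption.
USERS = [[4, 1, -5], [7, -2, 0], [6, -2, 3], [-9, 1, 4], [3, 8, -5]]
MOVIES = [[-6, -3, 2], [9, -2, 1], [9, -7, 2], [4, 3, -2]]

def dot(u, v):
    """Noise-free rating: dot product of two latent vectors."""
    return sum(a * b for a, b in zip(u, v))

def predict_all(users, movies):
    """Fill every cell of the rating matrix, observed or not."""
    return [[dot(u, m) for m in movies] for u in users]

def noisy_rating(u, v, sigma=1.0, rng=random):
    """Sample an observed rating: dot product plus Gaussian noise."""
    return dot(u, v) + rng.gauss(0.0, sigma)

ratings = predict_all(USERS, MOVIES)
print(ratings[3])  # the fourth user's row, including two cells missing on the slide
```

The observed cells are reproduced exactly here because the slide's example is noise-free; `noisy_rating` shows where the Gaussian noise would enter.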
![Page 8: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/8.jpg)
A Phonological Exercise
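Tenses (columns): 1P Pres. Sg. | 3P Pres. Sg. | Past Tense | Past Part.
Verbs (rows):
TALK: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK: [hæk] | [hæks] | [hækt] | [hækt]
CRACK: [kɹæks], [kɹækt] observed; other cells blank
SLAP: [slæp], [slæpt] observed; other cells blank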
8
![Page 9: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/9.jpg)
A Phonological Exercise
Suffixes (columns): /Ø/ (1P Pres. Sg.) | /s/ (3P Pres. Sg.) | /t/ (Past Tense) | /t/ (Past Part.)
Stems (rows):
TALK /tɔk/: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK /θeɪŋk/: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK /hæk/: [hæk] | [hæks] | [hækt] | [hækt]
CRACK /kɹæk/: [kɹæks], [kɹækt] observed; other cells blank
SLAP /slæp/: [slæp], [slæpt] observed; other cells blank
9
![Page 10: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/10.jpg)
A Phonological Exercise
Suffixes (columns): /Ø/ (1P Pres. Sg.) | /s/ (3P Pres. Sg.) | /t/ (Past Tense) | /t/ (Past Part.)
Stems (rows):
TALK /tɔk/: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK /θeɪŋk/: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK /hæk/: [hæk] | [hæks] | [hækt] | [hækt]
CRACK /kɹæk/: [kɹæks], [kɹækt] observed; other cells blank
SLAP /slæp/: [slæp], [slæpt] observed; other cells blank
10
![Page 11: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/11.jpg)
A Phonological Exercise
Suffixes (columns): /Ø/ (1P Pres. Sg.) | /s/ (3P Pres. Sg.) | /t/ (Past Tense) | /t/ (Past Part.)
Stems (rows):
TALK /tɔk/: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK /θeɪŋk/: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK /hæk/: [hæk] | [hæks] | [hækt] | [hækt]
CRACK /kɹæk/: [kɹæk] | [kɹæks] | [kɹækt] | [kɹækt]
SLAP /slæp/: [slæp] | [slæps] | [slæpt] | [slæpt]
Prediction!
11
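The prediction step on this slide amounts to filling every cell of the paradigm by concatenating a stem with a suffix. A minimal sketch, assuming the stems and suffixes shown on the slide and ignoring phonology entirely; the names `STEMS`, `SUFFIXES` and `complete_paradigm` are illustrative.

```python
# Underlying stems and suffixes as shown on the slide ("" stands for /Ø/).
STEMS = {"TALK": "tɔk", "THANK": "θeɪŋk", "HACK": "hæk",
         "CRACK": "kɹæk", "SLAP": "slæp"}
SUFFIXES = {"1P Pres. Sg.": "", "3P Pres. Sg.": "s",
            "Past Tense": "t", "Past Part.": "t"}

def complete_paradigm(stems, suffixes):
    """Predict every verb form as stem + suffix (no phonology applied)."""
    return {verb: {tense: stem + suffix for tense, suffix in suffixes.items()}
            for verb, stem in stems.items()}

paradigm = complete_paradigm(STEMS, SUFFIXES)
print(paradigm["CRACK"])
print(paradigm["SLAP"])
```

This is the string analogue of the dot product in matrix completion: once the row and column factors are known, any missing cell can be filled in.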
![Page 12: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/12.jpg)
A Model of Phonology
tɔk + s
Concatenate
tɔks
“talks”
12
![Page 13: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/13.jpg)
A Phonological Exercise
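Suffixes (columns): /Ø/ (1P Pres. Sg.) | /s/ (3P Pres. Sg.) | /t/ (Past Tense) | /t/ (Past Part.)
Stems (rows):
TALK /tɔk/: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK /θeɪŋk/: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK /hæk/: [hæk] | [hæks] | [hækt] | [hækt]
CRACK /kɹæk/: [kɹæks], [kɹækt] observed; other cells blank
SLAP /slæp/: [slæp], [slæpt] observed; other cells blank
CODE /koʊd/: [koʊdz], [koʊdɪt] observed; other cells blank
BAT /bæt/: [bæt], [bætɪt] observed; other cells blank
13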
![Page 14: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/14.jpg)
A Phonological Exercise
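Suffixes (columns): /Ø/ (1P Pres. Sg.) | /s/ (3P Pres. Sg.) | /t/ (Past Tense) | /t/ (Past Part.)
Stems (rows):
TALK /tɔk/: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK /θeɪŋk/: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK /hæk/: [hæk] | [hæks] | [hækt] | [hækt]
CRACK /kɹæk/: [kɹæks], [kɹækt] observed; other cells blank
SLAP /slæp/: [slæp], [slæpt] observed; other cells blank
CODE /koʊd/: [koʊdz], [koʊdɪt] observed; other cells blank
BAT /bæt/: [bæt], [bætɪt] observed; other cells blank
z instead of s
ɪt instead of t
14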
![Page 15: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/15.jpg)
A Phonological Exercise
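Suffixes (columns): /Ø/ (1P Pres. Sg.) | /s/ (3P Pres. Sg.) | /t/ (Past Tense) | /t/ (Past Part.)
Stems (rows):
TALK /tɔk/: [tɔk] | [tɔks] | [tɔkt] | [tɔkt]
THANK /θeɪŋk/: [θeɪŋk] | [θeɪŋks] | [θeɪŋkt] | [θeɪŋkt]
HACK /hæk/: [hæk] | [hæks] | [hækt] | [hækt]
CRACK /kɹæk/: [kɹæks], [kɹækt] observed; other cells blank
SLAP /slæp/: [slæp], [slæpt] observed; other cells blank
CODE /koʊd/: [koʊdz], [koʊdɪt] observed; other cells blank
BAT /bæt/: [bæt], [bætɪt] observed; other cells blank
EAT /it/: [it], [eɪt], [itən] observed; other cells blank
eɪt instead of itɪt
15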
![Page 16: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/16.jpg)
A Model of Phonology
koʊd + s
Concatenate
koʊd#s
Phonology (stochastic)
koʊdz
“codes”
16
Modeling word forms using latent underlying morphs and phonology. (Cotterell et al., TACL 2015)
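The two-step pipeline on this slide (concatenate, then apply phonology) can be sketched as follows. Note the assumptions: the talk's phonology is a stochastic, learned PFST, whereas this toy version is a deterministic pair of hand-written rules chosen only to reproduce the "z instead of s" and "ɪt instead of t" examples from the exercise. The set `VOICELESS` and both function names are hypothetical.

```python
# Assumption: a small illustrative set of voiceless sounds, not a full inventory.
VOICELESS = set("ptkfθsʃ")

def concatenate(stem, suffix):
    """Build the underlying word, marking the morpheme boundary with '#'."""
    return stem + "#" + suffix

def toy_phonology(underlying):
    """Map an underlying word to a surface form with two hand-written rules.

    This is a deterministic stand-in for the stochastic phonology factor.
    """
    stem, suffix = underlying.split("#")
    last = stem[-1]
    if suffix == "s" and last not in VOICELESS:
        suffix = "z"    # voicing: koʊd#s -> koʊdz
    elif suffix == "t" and last in "td":
        suffix = "ɪt"   # epenthesis: bæt#t -> bætɪt
    return stem + suffix

for stem, suffix in [("tɔk", "s"), ("koʊd", "s"), ("bæt", "t"), ("koʊd", "t")]:
    print(stem, suffix, "->", toy_phonology(concatenate(stem, suffix)))
```

In the actual model each underlying string induces a distribution over surface strings, so inference has to score many candidate underlying forms rather than apply a fixed rule.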
![Page 17: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/17.jpg)
A Model of Phonology
rizaign + ation
Concatenate
rizaign#ation
Phonology (stochastic)
rεzɪgneɪʃn
“resignation”
17
![Page 18: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/18.jpg)
Fragment of Our Graph for English
18
1) Morphemes: dæmn | eɪʃən | z | rizaign
Concatenation
2) Underlying words → Phonology → 3) Surface words:
rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn (“resignation”)
rizajn#z → riz’ajnz (“resigns”)
dæmn#eɪʃən → d,æmn’eɪʃn (“damnation”)
dæmn#z → d’æmz (“damns”)
3rd-person singular suffix: very common!
![Page 19: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/19.jpg)
Limited to concatenation? No, could extend to templatic morphology …
19
![Page 20: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/20.jpg)
Outline
20
● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results
![Page 21: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/21.jpg)
Graphical Models over Strings?
● Joint distribution over many strings
● Variables
  o Range over Σ*, the infinite set of all strings
● Relations among variables
  o Usually specified by (multi-tape) FSTs
21
A probabilistic approach to language change (Bouchard-Côté et al. NIPS 2008)
Graphical models over multiple strings. (Dreyer and Eisner. EMNLP 2009)
Large-scale cognate recovery (Hall and Klein. EMNLP 2011)
![Page 22: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/22.jpg)
Graphical Models over Strings?
● Strings are the basic units in natural languages.
● Use
  o Orthographic (spelling)
  o Phonological (pronunciation)
  o Latent (intermediate steps not observed directly)
● Size
  o Morphemes (meaningful subword units)
  o Words
  o Multi-word phrases, including “named entities”
  o URLs
22
![Page 23: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/23.jpg)
What relationships could you model?
● spelling → pronunciation
● word → noisy word (e.g., with a typo)
● word → related word in another language
(loanwords, language evolution, cognates)
● singular → plural (for example)
● root → word
● underlying form → surface form
23
![Page 24: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/24.jpg)
Factor Graph for phonology
25
1) Morpheme URs: z | rizajgn | eɪʃən | dæmn
Concatenation (e.g.)
2) Word URs: rεzɪgn#eɪʃən | rizajn#z | dæmn#eɪʃən | dæmn#z
Phonology (PFST)
3) Word SRs: r,εzɪgn’eɪʃn | riz’ajnz | d,æmn’eɪʃn | d’æmz
log-probability: Let’s maximize it!
![Page 25: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/25.jpg)
Contextual Stochastic Edit Process
26
Stochastic contextual edit distance and probabilistic FSTs. (Cotterell et al. ACL 2014)
![Page 26: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/26.jpg)
Inference on a Factor Graph
28
1) Morpheme URs: ? | ? | ? | ?
2) Word URs: ? | ? | ?
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
![Page 27: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/27.jpg)
Inference on a Factor Graph
29
1) Morpheme URs: foo | s | da | bar
2) Word URs: ? | ? | ?
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
![Page 28: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/28.jpg)
Inference on a Factor Graph
30
1) Morpheme URs: foo | s | da | bar
2) Word URs: bar#s | bar#foo | bar#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
![Page 29: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/29.jpg)
Inference on a Factor Graph
8e-3 0.01 0.05 0.02
31
1) Morpheme URs: foo | s | da | bar
2) Word URs: bar#s | bar#foo | bar#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
![Page 30: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/30.jpg)
Inference on a Factor Graph
8e-3 0.01 0.05 0.02
32
1) Morpheme URs: foo | s | da | bar
2) Word URs: bar#s | bar#foo | bar#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
6e-1200 | 2e-1300 | 7e-1100
![Page 31: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/31.jpg)
Inference on a Factor Graph
8e-3 0.01 0.05 0.02
33
1) Morpheme URs: foo | s | da | bar
2) Word URs: bar#s | bar#foo | bar#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
6e-1200 | 2e-1300 | 7e-1100
![Page 32: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/32.jpg)
Inference on a Factor Graph
34
1) Morpheme URs: foo | s | da | far
2) Word URs: far#s | far#foo | far#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
?
![Page 33: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/33.jpg)
Inference on a Factor Graph
35
1) Morpheme URs: foo | s | da | size
2) Word URs: size#s | size#foo | size#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
?
![Page 34: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/34.jpg)
Inference on a Factor Graph
36
1) Morpheme URs: foo | s | da | …
2) Word URs: …#s | …#foo | …#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
?
![Page 35: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/35.jpg)
Inference on a Factor Graph
37
1) Morpheme URs: foo | s | da | rizajn
2) Word URs: rizajn#s | rizajn#foo | rizajn#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
![Page 36: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/36.jpg)
Inference on a Factor Graph
38
1) Morpheme URs: foo | s | da | rizajn
2) Word URs: rizajn#s | rizajn#foo | rizajn#da
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
0.01 | 2e-5 | 0.008
![Page 37: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/37.jpg)
Inference on a Factor Graph
39
1) Morpheme URs: eɪʃn | s | d | rizajn
2) Word URs: rizajn#s | rizajn#eɪʃn | rizajn#d
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
0.01 | 0.001 | 0.015
![Page 38: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/38.jpg)
Inference on a Factor Graph
40
1) Morpheme URs: eɪʃn | s | d | rizajgn
2) Word URs: rizajgn#s | rizajgn#eɪʃn | rizajgn#d
3) Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
0.008 | 0.008 | 0.013
![Page 39: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/39.jpg)
Inference on a Factor Graph
41
Morpheme URs: eɪʃn | s | d | rizajgn
Word URs: rizajgn#s | rizajgn#eɪʃn | rizajgn#d
Word SRs: riz’ajnz | r,εzɪgn’eɪʃn | riz’ajnd
0.008 | 0.008 | 0.013
![Page 40: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/40.jpg)
Challenges in Inference
42
• Global discrete optimization problem.
• Variables range over an infinite set … cannot be solved by ILP or even brute force. Undecidable!
• Our previous papers used approximate algorithms: Loopy Belief Propagation, or Expectation Propagation.
Q: Can we do exact inference?
A: If we can live with 1-best and not marginal inference, then we can use Dual Decomposition … which is exact.
(if it terminates! the problem is undecidable in general …)
![Page 41: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/41.jpg)
Outline
43
● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results
![Page 42: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/42.jpg)
Graphical Model for Phonology
44
Jointly decide the values of the inter-dependent latent variables, which range over an infinite set.
1) Morpheme URs: z | rizajgn | eɪʃən | dæmn
Concatenation (e.g.)
2) Word URs: rεzɪgn#eɪʃən | rizajn#z | dæmn#eɪʃən | dæmn#z
Phonology (PFST)
3) Word SRs: r,εzɪgn’eɪʃn | riz’ajnz | d,æmn’eɪʃn | d’æmz
rεzign eɪʃən
![Page 43: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/43.jpg)
General Idea of Dual Decomp
45
Morpheme URs: z | rizajgn | eɪʃən | dæmn
Word URs: rεzɪgn#eɪʃən | rizajn#z | dæmn#eɪʃən | dæmn#z
Word SRs: r,εzɪgn’eɪʃn | riz’ajnz | d,æmn’eɪʃn | d’æmz
rεzign eɪʃən
![Page 44: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/44.jpg)
General Idea of Dual Decomp
Subproblem 1: rεzɪgn + eɪʃən → rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn
Subproblem 2: rizajn + z → rizajn#z → riz’ajnz
Subproblem 3: dæmn + eɪʃən → dæmn#eɪʃən → d,æmn’eɪʃn
Subproblem 4: dæmn + z → dæmn#z → d’æmz
46
![Page 45: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/45.jpg)
General Idea of Dual Decomp
Subproblem 1: rεzɪgn + eɪʃən → rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn (“I prefer rεzɪgn”)
Subproblem 2: rizajn + z → rizajn#z → riz’ajnz (“I prefer rizajn”)
Subproblem 3: dæmn + eɪʃən → dæmn#eɪʃən → d,æmn’eɪʃn
Subproblem 4: dæmn + z → dæmn#z → d’æmz
47
![Page 46: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/46.jpg)
Outline
48
● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results
![Page 47: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/47.jpg)
Subproblem 1: rεzɪgn + eɪʃən → rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn
Subproblem 2: rizajn + z → rizajn#z → riz’ajnz
Subproblem 3: dæmn + eɪʃən → dæmn#eɪʃən → d,æmn’eɪʃn
Subproblem 4: dæmn + z → dæmn#z → d’æmz
49
![Page 48: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/48.jpg)
Substring Features and Active Set
Subproblem 1: rεzɪgn + eɪʃən → rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn
“I prefer rεzɪgn.” Less ε, ɪ, g; more i, a, j (to match others)
Subproblem 2: rizajn + z → rizajn#z → riz’ajnz
“I prefer rizajn.” Less i, a, j; more ε, ɪ, g (to match others)
Subproblem 3: dæmn + eɪʃən → dæmn#eɪʃən → d,æmn’eɪʃn
Subproblem 4: dæmn + z → dæmn#z → d’æmz
50
![Page 49: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/49.jpg)
Features: “Active set” method
• How many features?
• Infinitely many possible n-grams!
• Trick: Gradually increase feature set as needed.
  – Like Paul & Eisner (2012), Cotterell & Eisner (2015)
  1. Only add features on which strings disagree.
  2. Only add abcd once abc and bcd already agree.
  – Exception: Add unigrams and bigrams for free.
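The two rules and the exception can be sketched as a feature-selection step. This is one possible reading of the slide, not the authors' implementation: "agree" is taken to mean that every subproblem's current string contains the n-gram the same number of times, strings are padded with `^` and `$` edge markers, and the function names are illustrative.

```python
from collections import Counter

def ngram_counts(s, n):
    """Counts of length-n substrings of s, with ^ and $ as edge markers."""
    padded = "^" + s + "$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def agrees(feature, strings):
    """True if every string contains `feature` the same number of times."""
    n = len(feature)
    return len({ngram_counts(s, n)[feature] for s in strings}) == 1

def expand_active_set(strings, active, max_n=4):
    """Return the features to activate next, under the reading stated above."""
    new = set()
    for n in range(1, max_n + 1):
        candidates = set()
        for s in strings:
            candidates.update(ngram_counts(s, n))
        for f in candidates:
            if f in active or agrees(f, strings):
                continue  # rule 1: only features on which strings disagree
            if n <= 2 or (agrees(f[:-1], strings) and agrees(f[1:], strings)):
                new.add(f)  # rule 2; unigrams and bigrams are added for free
    return new

print(sorted(expand_active_set(["gris", "griz", "griz"], set())))
```

With one copy of the stem at `gris` and the others at `griz`, only the unigram and bigram features around the final consonant are activated; the trigrams that differ are held back because their sub-n-grams still disagree.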
51
![Page 50: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/50.jpg)
Fragment of Our Graph for Catalan
52
Surface words: grizos | gris | grize | grizes
Latent strings (morpheme URs, word URs): ?
Stem of “grey”: ?
Separate these 4 words into 4 subproblems as before …
![Page 51: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/51.jpg)
53
Surface words: grizos | gris | grize | grizes
Latent strings (stem, other morpheme URs, word URs): ?
Redraw the graph to focus on the stem …
![Page 52: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/52.jpg)
54
Surface words: grizos | gris | grize | grizes
Stem copies: ? | ? | ? | ?
Other latent strings: ?
Separate into 4 subproblems: each gets its own copy of the stem
![Page 53: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/53.jpg)
55
Surface words: grizos | gris | grize | grizes
Stem copies: ε | ε | ε | ε
Other latent strings: ?
Nonzero features: { }
Iteration: 1
![Page 54: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/54.jpg)
56
Surface words: grizos | gris | grize | grizes
Stem copies: g | g | g | g
Other latent strings: ?
Nonzero features: { }
Iteration: 3
![Page 55: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/55.jpg)
57
Surface words: grizos | gris | grize | grizes
Stem copies: gris | griz | griz | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$}
Iteration: 4
Feature weights (dual variable)
![Page 56: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/56.jpg)
58
Surface words: grizos | gris | grize | grizes
Stem copies: gris | griz | grizo | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Iteration: 5
Feature weights (dual variable)
![Page 57: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/57.jpg)
59
Surface words: grizos | gris | grize | grizes
Stem copies: gris | griz | grizo | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Iteration: 6 … 13
Feature weights (dual variable)
![Page 58: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/58.jpg)
60
Surface words: grizos | gris | grize | grizes
Stem copies: griz | griz | grizo | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Iteration: 14
Feature weights (dual variable)
![Page 59: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/59.jpg)
61
Surface words: grizos | gris | grize | grizes
Stem copies: griz | griz | griz | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$}
Iteration: 17
Feature weights (dual variable)
![Page 60: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/60.jpg)
62
Surface words: grizos | gris | grize | grizes
Stem copies: griz | grize | griz | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Iteration: 18
Feature weights (dual variable)
![Page 61: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/61.jpg)
63
Surface words: grizos | gris | grize | grizes
Stem copies: griz | grize | griz | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Iteration: 19 … 29
Feature weights (dual variable)
![Page 62: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/62.jpg)
64
Surface words: grizos | gris | grize | grizes
Stem copies: griz | griz | griz | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Iteration: 30
Feature weights (dual variable)
![Page 63: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/63.jpg)
65
Surface words: grizos | gris | grize | grizes
Stem copies: griz | griz | griz | griz
Other latent strings: ?
Nonzero features: {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}
Iteration: 30
Converged!
![Page 64: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/64.jpg)
Why n-gram features?
66
• Positional features don’t understand insertion:
giz vs. griz: “I’ll try to arrange for r not i at position 2, i not z at position 3, z not at position 4.”
• In contrast, our “z” feature counts the number of “z” phonemes, without regard to position.
giz vs. griz: “I need more r’s.”
These solutions already agree on “g”, “i”, “z” counts … they’re only negotiating over the “r” count.
![Page 65: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/65.jpg)
Why n-gram features?
67
• Adjust weights λ until the “r” counts match:
giz vs. griz: “I need more r’s … somewhere.”
• Next iteration agrees on all our unigram features:
girz vs. griz: “I need more gr, ri, iz, less gi, ir, rz.”
– Oops! Features matched only counts, not positions
– But bigram counts are still wrong … so bigram features get activated to save the day
– If that’s not enough, add even longer substrings …
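The giz/griz and girz/griz examples can be reproduced by comparing n-gram counts directly. A small sketch, assuming plain substring counts without edge markers; `count_gap` is an illustrative helper, where a negative value means the first string needs more of that n-gram to match the second.

```python
from collections import Counter

def count_ngrams(s, n):
    """Counts of all length-n substrings of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def count_gap(a, b, n):
    """N-grams whose counts differ: count in a minus count in b."""
    ca, cb = count_ngrams(a, n), count_ngrams(b, n)
    return {f: ca[f] - cb[f] for f in set(ca) | set(cb) if ca[f] != cb[f]}

print(count_gap("giz", "griz", 1))   # unigram disagreement: only "r"
print(count_gap("girz", "griz", 1))  # unigram counts now match
print(count_gap("girz", "griz", 2))  # bigram counts still differ
```

Matching unigram counts leaves `girz` and `griz` indistinguishable, which is why the bigram features have to be activated next.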
![Page 66: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/66.jpg)
Outline
68
● A motivating example: phonology
● General framework:
  o Graphical models over strings
  o Inference on graphical models over strings
● Dual decomposition inference
  o The general idea
  o Substring features and active set
● Experiments and results
![Page 67: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/67.jpg)
7 Inference Problems (graphs)
EXERCISE (small)
o 4 languages: Catalan, English, Maori, Tangale
o 16 to 55 underlying morphemes.
o 55 to 106 surface words.
CELEX (large)
o 3 languages: English, German, Dutch
o 341 to 381 underlying morphemes.
o 1000 surface words for each language.
69
# vars (unknown strings)
# subproblems
![Page 68: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/68.jpg)
Experimental Questions
o Is exact inference by DD practical?
o Does it converge?
o Does it get better results than approximate inference methods?
o Does exact inference help EM?
71
![Page 69: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/69.jpg)
● DD seeks best λ via subgradient algorithm → reduce dual objective → tighten upper bound on primal objective
● If λ gets all sub-problems to agree (x1 = … = xK) → constraints satisfied → dual value is also value of a primal solution → which must be max primal! (and min dual)
72
primal (function of strings x) ≤ dual (function of weights λ)
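The subgradient loop can be illustrated on a toy problem. The sketch below makes several simplifying assumptions that are not in the talk: each subproblem chooses from a small finite candidate list with made-up log-scores (the real subproblems search over all of Σ*), the feature set is fixed to unigrams and bigrams rather than grown by the active-set method, and the step size is constant. All names and numbers are hypothetical.

```python
from collections import Counter

def features(s, max_n=2):
    """N-gram count features of s (n <= max_n), with ^ and $ edge markers."""
    padded = "^" + s + "$"
    return Counter(padded[i:i + n]
                   for n in range(1, max_n + 1)
                   for i in range(len(padded) - n + 1))

def dual_decompose(local_scores, step=0.5, max_iters=100):
    """local_scores: one {candidate: log-score} dict per subproblem (toy setup).

    Returns (agreed string, iterations used), or (None, max_iters).
    """
    k = len(local_scores)
    lambdas = [Counter() for _ in range(k)]  # dual weights; they sum to zero
    for iteration in range(1, max_iters + 1):
        # Each subproblem maximizes its own score plus its weighted features.
        picks = []
        for scores, lam in zip(local_scores, lambdas):
            def adjusted(x):
                return scores[x] + sum(lam[f] * c for f, c in features(x).items())
            picks.append(max(sorted(scores), key=adjusted))
        if len(set(picks)) == 1:
            return picks[0], iteration  # agreement: dual value equals primal value
        # Subgradient step: push each copy's feature counts toward the mean.
        feats = [features(x) for x in picks]
        for f in set().union(*feats):
            mean = sum(fc[f] for fc in feats) / k
            for lam, fc in zip(lambdas, feats):
                lam[f] -= step * (fc[f] - mean)
    return None, max_iters

# Hypothetical log-scores: one subproblem prefers "gris", two prefer "griz".
TOY_SCORES = [{"gris": -1.0, "griz": -1.5},
              {"gris": -3.0, "griz": -0.5},
              {"gris": -3.0, "griz": -0.5}]
best, iterations = dual_decompose(TOY_SCORES)
print(best, iterations)
```

In this toy run the first subproblem initially picks `gris`; the update then penalizes its `s`, `is`, `s$` features and rewards `z`, `iz`, `z$`, mirroring the "less of this, more of that" messages on the earlier slides.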
![Page 70: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/70.jpg)
Convergence behavior (full graph)
Panels: Catalan | Maori | English | Tangale
73
Curves: dual (tighten upper bound) | primal (improve strings)
Annotation: optimal!
![Page 71: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/71.jpg)
Comparisons
● Compare DD with two types of Belief Propagation (BP) inference.
Approximate MAP inference (max-product BP) (baseline)
Approximate marginal inference (sum-product BP) (TACL 2015)
Exact MAP inference (dual decomposition) (this paper)
74
Exact marginal inference (we don’t know how!)
variational approximation
Viterbi approximation
![Page 72: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/72.jpg)
Inference accuracy
75

| Inference method | Model 1, EXERCISE | Model 1, CELEX | Model 2S, CELEX | Model 2E, EXERCISE |
|---|---|---|---|---|
| Approximate MAP inference (max-product BP) (baseline) | 90% | 84% | 99% | 91% |
| Approximate marginal inference (sum-product BP) (TACL 2015) | 95% | 86% | 96% | 95% |
| Exact MAP inference (dual decomposition) (this paper) | 97% | 90% | 99% | 98% |

Model 1 – trivial phonology
Model 2S – oracle phonology
Model 2E – learned phonology (inference used within EM)
Annotations: improves | improves more! | worse
![Page 73: Dual Decomposition Inference for Graphical Models over Strings Nanyun (Violet) Peng Ryan Cotterell Jason Eisner Johns Hopkins University 1.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f1d5503460f94c34698/html5/thumbnails/73.jpg)
Conclusion
• A general DD algorithm for MAP inference on graphical models over strings.
• On the phonology problem, terminates in practice, guaranteeing the exact MAP solution.
• Improved inference for supervised model; improved EM training for unsupervised model.
• Try it for your own problems generalizing to new strings!
76