Penalized EP for Graphical Models Over Strings
Ryan Cotterell and Jason Eisner
Natural Language is Built from Words
Can store info about each word in a table

| Index | Spelling | Meaning | Pronunciation | Syntax |
|---|---|---|---|---|
| 123 | ca | | [si.ei] | NNP (abbrev) |
| 124 | can | | [kɛɪn] | NN |
| 125 | can | | [kæn], [kɛn], … | MD |
| 126 | cane | | [keɪn] | NN (mass) |
| 127 | cane | | [keɪn] | NN |
| 128 | canes | | [keɪnz] | NNS |
Problem: Too Many Words!
• Technically speaking, # words = ∞
• Really, the set of (possible) words is Σ*
• Names
• Neologisms
• Typos
• Productive processes:
  – friend → friendless → friendlessness → friendlessnessless → …
  – hand + bag → handbag (sometimes can iterate)
Solution: Don’t model every cell separately
[Figure: groups labeled "Noble gases" and "Positive ions"]
Ultimate goal: Probabilistically reconstruct all missing entries of this infinite multilingual table, given some entries and some text.
Approach: Linguistics + generative modeling + statistical inference.
Modeling ingredients: Finite-state machines + graphical models.
Inference ingredients: Expectation Propagation (this talk).
Predicting Pronunciations of Novel Words (Morpho-Phonology)
[Figure: the words damns, damnation, resigns, and resignation are built from the underlying morphemes /dæmn/, /rizajgn/, /z/, and /eɪʃən/, giving the underlying forms /dæmnz/, /dæmneɪʃən/, /rizajgnz/, and /rizajgneɪʃən/. Their surface pronunciations are: damns ????, damnation [dˌæmnˈeɪʃən], resigns [rizˈajnz], resignation [rˌɛzɪgnˈeɪʃən].]
How do you pronounce this word?
[Figure (continued): the missing surface pronunciation is filled in: damns → [dˈæmz].]
Graphical Models over Strings
• Use the graphical model framework to model many strings jointly!

A factor ψ1 connects the variables X1 and X2. Restricted to three strings, ψ1 is a table (rows: X2, columns: X1):

| X2 \ X1 | ring | rang | rung |
|---|---|---|---|
| ring | 2 | 4 | 0.1 |
| rang | 7 | 1 | 2 |
| rung | 8 | 1 | 3 |

Passing the message (ring 1, rang 2, rung 2) from X1 through ψ1 yields the message (ring 10.2, rang 13, rung 16) to X2.
Over strings, the table is infinite: its rows and columns range over all of Σ* (aardvark, …, rang, ring, rung, …). So ψ1 is instead encoded compactly as a weighted finite-state transducer.
[Figure: ψ1 drawn as a weighted finite-state transducer relating X1 and X2, with character arcs such as r, i, n, g, u, e, s, h, a, and ε]
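For a finite vocabulary, the message from ψ1 to X2 is a matrix-vector product: each X2 row of the table is weighted by the incoming message on X1 and summed. Below is a minimal Python sketch that reproduces the numbers above, assuming the row/column orientation inferred from those numbers; the helper name `factor_to_variable_message` is hypothetical.

```python
# Sum-product message from a pairwise factor to one of its variables,
# for a factor stored as an explicit table over a small vocabulary.

# PSI1[x2][x1]: rows are values of X2, columns are values of X1
# (orientation inferred from the slide's numbers; an assumption).
PSI1 = {
    "ring": {"ring": 2.0, "rang": 4.0, "rung": 0.1},
    "rang": {"ring": 7.0, "rang": 1.0, "rung": 2.0},
    "rung": {"ring": 8.0, "rang": 1.0, "rung": 3.0},
}


def factor_to_variable_message(psi, incoming):
    """m(x2) = sum over x1 of psi(x1, x2) * incoming(x1)."""
    return {
        x2: sum(row[x1] * incoming[x1] for x1 in row)
        for x2, row in psi.items()
    }


msg_from_x1 = {"ring": 1.0, "rang": 2.0, "rung": 2.0}
msg_to_x2 = factor_to_variable_message(PSI1, msg_from_x1)
print({k: round(v, 6) for k, v in msg_to_x2.items()})
# {'ring': 10.2, 'rang': 13.0, 'rung': 16.0}
```

With string-valued variables, the sum over x1 ranges over all of Σ*, which is why the factor and the messages have to be represented as finite-state machines instead of tables.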
Zooming in on a WFSA
• Compactly represents an (unnormalized) probability distribution over all strings in Σ*
• Marginal belief: How do we pronounce damns?
• Possibilities: /damz/, /dams/, /damnIz/, etc.
[Figure: a WFSA for this belief, with arcs labeled d/1, a/1, m/1, z/.5, s/.25, n/.25, z/1, I/1, z/1]
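A WFSA assigns each string the sum, over its accepting paths, of the product of the arc weights. Below is a minimal Python sketch, assuming one plausible reading of the figure's topology (after m, the arcs z/.5 and s/.25 end the word, while n/.25 continues with I/1 and z/1); the state numbers and the `string_weight` helper are hypothetical.

```python
from collections import defaultdict

# Arcs: (source_state, symbol, weight, target_state).
# The topology is an assumed reading of the slide's figure.
ARCS = [
    (0, "d", 1.0, 1),
    (1, "a", 1.0, 2),
    (2, "m", 1.0, 3),
    (3, "z", 0.5, 6),
    (3, "s", 0.25, 6),
    (3, "n", 0.25, 4),
    (4, "I", 1.0, 5),
    (5, "z", 1.0, 6),
]
START, FINALS = 0, {6}


def string_weight(word):
    """Sum over all accepting paths for `word` of the product of arc weights."""
    out = defaultdict(list)
    for src, sym, w, dst in ARCS:
        out[(src, sym)].append((w, dst))
    current = {START: 1.0}  # state -> total weight of paths reading the prefix
    for sym in word:
        nxt = defaultdict(float)
        for state, acc in current.items():
            for w, dst in out[(state, sym)]:
                nxt[dst] += acc * w
        current = nxt
    return sum(acc for state, acc in current.items() if state in FINALS)


for w in ["damz", "dams", "damnIz", "damn"]:
    print(w, string_weight(w))
```

Under this reading, /damz/ gets weight .5 and /dams/ and /damnIz/ get .25 each, while "damn" ends in a non-final state and gets 0. This particular machine happens to be normalized (.5 + .25 + .25 = 1), but in general a WFSA's weights need not sum to 1.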
Log-Linear Approximation
• Given a WFSA distribution p, find a log-linear approximation q
  – $\min_q \mathrm{KL}(p \,\|\, q)$, the "inclusive" KL divergence
  – q corresponds to a smaller/tidier WFSA
• Two approaches:
  – Gradient-based optimization (discussed here)
  – Closed-form optimization
ML Estimation = Moment Matching
[Figure: (1) broadcast the n-gram counts observed in the data (fo = 3, bar = 2, az = 4, foo = 1); (2) fit a model (foo 1.2, bar 0.5, baz 4.3) that predicts the same counts]
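The first step of moment matching summarizes the data (or a distribution) by its expected n-gram counts; the fitted log-linear model must then predict exactly these counts. Below is a minimal Python sketch of the counting step, assuming a distribution small enough to enumerate string by string (a toy stand-in for p; the example distribution and helper names are made up for illustration).

```python
from collections import Counter


def ngram_counts(s, n):
    """Counts of the (overlapping) character n-grams of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def expected_ngram_counts(dist, n):
    """E_p[count of each n-gram], for p given as {string: probability}."""
    total = Counter()
    for s, prob in dist.items():
        for gram, c in ngram_counts(s, n).items():
            total[gram] += prob * c
    return dict(total)


# A made-up distribution over three strings, standing in for p.
p = {"foo": 0.5, "foobar": 0.25, "baz": 0.25}
print(expected_ngram_counts(p, 3))
# {'foo': 0.75, 'oob': 0.25, 'oba': 0.25, 'bar': 0.25, 'baz': 0.25}
```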
FSA Approx. = Moment Matching
[Figure: the same recipe, but the n-gram counts are now expected counts under the distribution defined by a finite-state machine (e.g., xx = 0.1, zz = 0.1, fo = 3, bar = 2, az = 4, foo = 1); fit a model (foo 1.2, bar 0.5, baz 4.3) that predicts the same counts]
Compute the expected counts with forward-backward!
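On an acyclic WFSA, forward-backward gives the expected count of each arc as α(source) · weight · β(target) / Z. Below is a minimal Python sketch that computes expected symbol (unigram) counts, reusing an assumed reading of the damns machine; it assumes the states are numbered in topological order, and the function name is hypothetical. Higher-order n-gram counts would need a machine whose states remember the preceding symbols.

```python
from collections import defaultdict

# Acyclic WFSA as (source, symbol, weight, target) arcs. States are numbered
# in topological order; this topology is an assumed reading of the figure.
ARCS = [
    (0, "d", 1.0, 1),
    (1, "a", 1.0, 2),
    (2, "m", 1.0, 3),
    (3, "z", 0.5, 6),
    (3, "s", 0.25, 6),
    (3, "n", 0.25, 4),
    (4, "I", 1.0, 5),
    (5, "z", 1.0, 6),
]
START, FINAL, NUM_STATES = 0, 6, 7


def forward_backward(arcs, start, final, num_states):
    """Return the total weight Z and the expected count of each arc symbol."""
    alpha = [0.0] * num_states  # weight of all paths from start to each state
    beta = [0.0] * num_states   # weight of all paths from each state to final
    alpha[start] = 1.0
    beta[final] = 1.0
    # Forward: process arcs in increasing order of their source state.
    for src, _, w, dst in sorted(arcs, key=lambda arc: arc[0]):
        alpha[dst] += alpha[src] * w
    # Backward: process arcs in decreasing order of their source state.
    for src, _, w, dst in sorted(arcs, key=lambda arc: arc[0], reverse=True):
        beta[src] += w * beta[dst]
    z = alpha[final]
    counts = defaultdict(float)
    for src, sym, w, dst in arcs:
        counts[sym] += alpha[src] * w * beta[dst] / z
    return z, dict(counts)


z, counts = forward_backward(ARCS, START, FINAL, NUM_STATES)
print(z, counts)
```

Here Z = 1 and the expected counts are d 1, a 1, m 1, z 0.75, s 0.25, n 0.25, I 0.25: these are the moments that a unigram q would have to match.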
Gradient-Based Minimization
• Objective: $\min_\theta \mathrm{KL}(p \,\|\, q_\theta)$, where $q_\theta(x) = \exp(\theta \cdot f(x)) / Z(\theta)$
• Gradient with respect to θ: $\nabla_\theta \mathrm{KL}(p \,\|\, q_\theta) = \mathbb{E}_{x \sim q_\theta}[f(x)] - \mathbb{E}_{x \sim p}[f(x)]$
• The gradient is a difference between two expectations of feature counts (under q_θ and under p); the features are determined by the weighted DFA q
• Features are just n-gram counts!
Arc weights are determined by a parameter vector, just like a log-linear model.
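Below is a minimal Python sketch of this gradient step under a loud simplifying assumption: p and q_θ live on a small, explicitly enumerated set of strings, so the expectations are plain sums rather than forward-backward computations over Σ*. The feature set (character unigrams and bigrams with ^ and $ as boundary markers), the step size, and all function names are illustrative choices, not the talk's implementation.

```python
import math
from collections import Counter


def features(s, n_max=2):
    """Counts of character n-grams of orders 1..n_max, with ^ and $ boundaries."""
    padded = "^" + s + "$"
    return Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )


def log_linear(theta, feats):
    """q_theta(x) = exp(theta . f(x)) / Z, restricted to the strings in feats."""
    scores = {s: sum(theta.get(g, 0.0) * c for g, c in f.items()) for s, f in feats.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(unnorm.values())
    return {s: u / z for s, u in unnorm.items()}


def expected_features(dist, feats):
    """E_dist[f(x)] as a Counter over n-grams."""
    total = Counter()
    for s, prob in dist.items():
        for g, c in feats[s].items():
            total[g] += prob * c
    return total


def fit_q(p, steps=500, lr=0.5):
    """Gradient descent on KL(p || q_theta); the gradient is E_q[f] - E_p[f]."""
    feats = {s: features(s) for s in p}
    target = expected_features(p, feats)
    theta = {}
    for _ in range(steps):
        model = expected_features(log_linear(theta, feats), feats)
        for g in target.keys() | model.keys():
            theta[g] = theta.get(g, 0.0) - lr * (model[g] - target[g])
    return theta, log_linear(theta, feats)


p = {"damz": 0.5, "dams": 0.25, "damnIz": 0.25}
theta, q = fit_q(p)
print({s: round(v, 4) for s, v in q.items()})
```

Because the gradient is E_q[f] − E_p[f], a stationary point is exactly a moment-matching solution. In this toy case q recovers p itself, since each string has a bigram the others lack (mz, ms, mn); with a coarser feature set, q would match only p's expected counts.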
Does q need a lot of features?
• Game: what order of n-grams do we need to put probability 1 on a string?
• Word 1: noon
  – Bigram model? No. Trigram model.
• Word 2: papa
  – Trigram model? No. 4-gram model: very big!
• Word 3: abracadabra
  – 6-gram model: way too big!
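The game's answers can be checked mechanically. An n-gram model can put probability 1 on a word exactly when each (n−1)-symbol history in the word, padded with boundary symbols, is always followed by the same next symbol; every conditional probability along the word can then be set to 1. Below is a small Python sketch; the function name and the ^/$ padding convention are our own choices.

```python
def min_order_for_certainty(word, bos="^", eos="$"):
    """Smallest n such that an n-gram model can give `word` probability 1."""
    n = 1
    while True:
        padded = bos * (n - 1) + word + eos
        next_symbol = {}  # history -> the symbol that followed it
        consistent = True
        for i in range(n - 1, len(padded)):
            history = padded[i - (n - 1):i]
            if next_symbol.setdefault(history, padded[i]) != padded[i]:
                consistent = False  # same history, two different continuations
                break
        if consistent:
            return n
        n += 1


for w in ["noon", "papa", "abracadabra"]:
    print(w, min_order_for_certainty(w))
```

For example, a trigram model fails on papa because the history "pa" is followed once by p and once by the end symbol; for abracadabra, the 4-symbol history "abra" is followed by both c and the end symbol, so a 6-gram model is needed.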
Variable Order Approximations
• Intuition: in NLP, marginals are often peaked
  – Probability mass is mostly on a few similar strings!
• q should reward a few long n-grams
  – It also needs short n-gram features for backoff
[Figure: a full 6-gram table (entries such as abraca 5.0, …, zzzzzz −500) is too big! A variable-order table (entries such as abra 5.0, ^a 5.0, b 4.3) is very small!]
• Moral: Use only the n-grams you really need!
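A log-linear q with a variable-order feature set scores a string by summing the weights of whichever table n-grams occur in it. Only the listed n-grams carry parameters, so the table stays small. Below is a minimal Python sketch; the `score` helper and the ^/$ padding are assumptions, and the weights are illustrative, in the style of the slide's example table.

```python
def score(word, table, bos="^", eos="$"):
    """Unnormalized log-score: the sum of the weights of every table n-gram,
    counted once per (possibly overlapping) occurrence in the padded word."""
    padded = bos + word + eos
    total = 0.0
    for gram, weight in table.items():
        n = len(gram)
        occurrences = sum(
            1 for i in range(len(padded) - n + 1) if padded[i:i + n] == gram
        )
        total += weight * occurrences
    return total


# A tiny table mixing a long n-gram with short backoff n-grams (illustrative).
table = {"abra": 5.0, "^a": 5.0, "b": 4.3}
print(score("abracadabra", table))
print(score("zebra", table))
```

"abracadabra" collects abra twice, ^a once, and b twice, while "zebra" only collects b once. Normalizing these scores over Σ* would require compiling the table into a weighted DFA, which this sketch does not attempt.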
Belief Propagation (BP) in a Nutshell
[Figure: a factor graph over the string-valued variables X1–X6]
[Figure (continued): each message passed along an edge of this factor graph is a WFSA, such as the one with arcs d/1, a/1, m/1, z/.5, s/.25, n/.25, z/1, I/1, z/1]
Computing Marginal Beliefs
[Figure: a factor graph over the variables X1, X2, X3, X4, X5, and X7]
[Figure: the BP factor graph over X1–X6 again, now with its factors drawn as weighted finite-state machines]
[Figure: the factor graph over X1–X5 and X7, with the messages into one variable drawn as finite-state machines]
[Figure: the finite-state messages into the variable are combined into its belief]
Computing the belief (the point-wise product, i.e., the intersection, of the incoming finite-state messages) results in a large state space.
What a hairball!
Approximation required!
BP over String-Valued Variables
• In fact, with a cyclic factor graph, messages and marginal beliefs grow unboundedly complex!
[Figure: a cycle X1 – ψ1 – X2 – ψ2 – X1 whose factors are small machines over the letter a (arc labels include a, aa, and εa); on each trip around the cycle, the messages accept longer and longer strings of a's]
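Below is a toy Python sketch of why cycles cause trouble. It uses a stand-in cycle rather than the slide's exact factors: one hypothetical factor optionally appends an a (like an optional ε:a arc), the other copies its input, and the message, a dict from strings to weights, is pushed around the loop repeatedly. Each trip adds a longer string, so the message's support, and the number of states a finite-state machine needs to represent it, keeps growing.

```python
def optional_insert_a(message):
    """Toy factor: sends string x to x and to x + "a", each with weight 1
    (like a transducer with an optional epsilon:a arc at the end)."""
    out = {}
    for s, w in message.items():
        for t in (s, s + "a"):
            out[t] = out.get(t, 0.0) + w
    return out


def identity(message):
    """Toy factor that passes every string through unchanged."""
    return dict(message)


message = {"a": 1.0}
for trip in range(1, 5):
    # One trip around the cycle X1 -> psi2 -> X2 -> psi1 -> X1.
    message = identity(optional_insert_a(message))
    print(trip, "support size:", len(message), "longest:", max(map(len, message)))
```

After k trips, the support is {a, aa, …, a^(k+1)}, with binomial-coefficient weights. Nothing in exact BP stops this growth, which is what motivates approximating the messages.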
Expectation Propagation (EP) in a Nutshell
[Figure: the factor graph over X1–X5 and X7, with finite-state messages into one variable]
[Figure (continued): EP replaces the finite-state messages, one at a time, with log-linear approximations, each drawn as a small table of n-gram weights (foo 1.2, bar 0.5, baz 4.3)]
EP in a Nutshell
[Figure: all four incoming messages are now n-gram weight tables (foo 1.2, bar 0.5, baz 4.3 each); their product is the belief foo 4.8, bar 2.0, baz 17.2]
The approximate belief is now a table of n-gram weights.
The point-wise product is now super easy: multiplying log-linear messages just adds their weights (4 × 1.2 = 4.8, 4 × 0.5 = 2.0, 4 × 4.3 = 17.2).
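Because each approximate message has the form exp(θ_i · f(x)), their point-wise product is exp((Σ_i θ_i) · f(x)). Below is a minimal Python sketch that reproduces the belief above by adding the weight tables; the helper name is hypothetical, and normalization constants are ignored because they only rescale the product.

```python
from collections import Counter


def multiply_log_linear(*messages):
    """Point-wise product of messages exp(theta_i . f(x)):
    the result is exp((sum of theta_i) . f(x)), so the weight tables add."""
    belief = Counter()
    for theta in messages:
        belief.update(theta)  # Counter.update adds values key by key
    return dict(belief)


msg = {"foo": 1.2, "bar": 0.5, "baz": 4.3}
belief = multiply_log_linear(msg, msg, msg, msg)
print({g: round(w, 6) for g, w in belief.items()})
# {'foo': 4.8, 'bar': 2.0, 'baz': 17.2}
```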
How to approximate a message?
Minimize, with respect to the parameters θ,
$\mathrm{KL}\big(\tilde{q}(x)\, m(x) \;\|\; \tilde{q}(x) \exp(\theta \cdot f(x))\big)$ (both arguments normalized),
where $\tilde{q}$ is the product of the other incoming messages (the table foo 1.2, bar 0.5, baz 4.3), m is the exact finite-state message, and exp(θ · f(x)) is its log-linear replacement (in the figure, θ = foo 0.2, bar 1.1, baz −0.3).
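Below is a minimal Python sketch of this message update under stated assumptions. The exact message m is given as positive weights on an explicitly enumerated set of strings (instead of a WFSA), the features are character unigrams and bigrams with ^ and $ boundaries, and θ is fit by plain gradient descent while the other messages (the table $\tilde{q}$, called `cavity` in the code) stay fixed. All names and numbers are hypothetical.

```python
import math
from collections import Counter


def features(s, n_max=2):
    """Character n-gram counts of orders 1..n_max, with ^ and $ boundaries."""
    padded = "^" + s + "$"
    return Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )


def dot(theta, feats):
    """theta . f(x) for a sparse weight table and a feature Counter."""
    return sum(theta.get(g, 0.0) * c for g, c in feats.items())


def softmax(log_weights):
    top = max(log_weights.values())
    unnorm = {s: math.exp(v - top) for s, v in log_weights.items()}
    z = sum(unnorm.values())
    return {s: u / z for s, u in unnorm.items()}


def expected(dist, feats):
    total = Counter()
    for s, prob in dist.items():
        for g, c in feats[s].items():
            total[g] += prob * c
    return total


def ep_message_update(cavity, exact_message, steps=500, lr=0.5):
    """Fit theta to minimize KL(cavity * exact || cavity * exp(theta . f))."""
    feats = {s: features(s) for s in exact_message}
    # Target distribution: the other messages times the exact message.
    p = softmax({s: dot(cavity, f) + math.log(exact_message[s]) for s, f in feats.items()})
    target = expected(p, feats)
    theta = {}
    for _ in range(steps):
        q = softmax({s: dot(cavity, f) + dot(theta, f) for s, f in feats.items()})
        model = expected(q, feats)
        # The gradient with respect to theta is again E_q[f] - E_p[f].
        for g in target.keys() | model.keys():
            theta[g] = theta.get(g, 0.0) - lr * (model[g] - target[g])
    return theta


cavity = {"z": 0.3, "mn": -0.2}
exact = {"damz": 2.0, "dams": 1.0, "damnIz": 1.0}
theta = ep_message_update(cavity, exact)
```

The cavity appears on both sides of the KL, so the fitted message only has to be accurate where the other messages put their mass. In this toy case the features can represent m exactly, so exp(θ · f(x)) comes out proportional to m(x) on the enumerated strings.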
Results
• Question 1: Does EP work in general (comparison to baseline)?
• Question 2: Do variable-order approximations improve over fixed n-grams?
Methods compared (curve colors in the results plot):
• Unigram EP (Green): fast, but inaccurate
• Bigram EP (Blue): also fast and inaccurate
• Trigram EP (Cyan): slow and accurate
• Penalized EP (Red): fast and accurate
• Baseline (Black): accurate but slow (pruning-based)
Fin
Thanks for your attention!
For more information on structured models and belief propagation, see the Structured Belief Propagation Tutorial at ACL 2015 by Matt Gormley and Jason Eisner.