Marginal Inference in MRFs using Frank-Wolfe
David Belanger, Daniel Sheldon, Andrew McCallum
School of Computer Science
University of Massachusetts, Amherst
{belanger,sheldon,mccallum}@cs.umass.edu
December 10, 2013
Table of Contents
1 Markov Random Fields
2 Frank-Wolfe for Marginal Inference
3 Optimality Guarantees and Convergence Rate
4 Beyond MRFs
5 Fancier FW
Markov Random Fields
$\Phi_\theta(x) = \sum_{c \in C} \theta_c(x_c)$
$P(x) = \exp\left(\Phi_\theta(x) - \log Z\right)$
$x \to \mu$
$\Phi_\theta(x) \to \langle \theta, \mu \rangle$
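To make the notation concrete, here is a minimal Python sketch that enumerates a toy chain MRF by brute force. The three binary variables, the clique set, and every potential value are invented for illustration and do not come from the talk. It checks that the score Φθ(x) equals the inner product ⟨θ, µ⟩ once an assignment x is mapped to its indicator vector µ.

```python
import itertools
import math

# Hypothetical toy model: three binary variables in a chain with cliques
# {0}, {1}, {2}, {0,1}, {1,2}. All potential values are made up.
theta = {
    (0,): {(0,): 0.0, (1,): 0.5},
    (1,): {(0,): 0.0, (1,): -0.2},
    (2,): {(0,): 0.0, (1,): 0.1},
    (0, 1): {(a, b): (1.0 if a == b else 0.0) for a in (0, 1) for b in (0, 1)},
    (1, 2): {(a, b): (1.0 if a == b else 0.0) for a in (0, 1) for b in (0, 1)},
}

def score(x):
    """Phi_theta(x): sum over cliques c of theta_c(x_c)."""
    return sum(table[tuple(x[i] for i in clique)] for clique, table in theta.items())

def indicator(x):
    """mu(x): one indicator entry per (clique, clique assignment) pair."""
    return {(clique, assign): float(tuple(x[i] for i in clique) == assign)
            for clique, table in theta.items() for assign in table}

def inner(th, mu):
    """<theta, mu> over all (clique, assignment) coordinates."""
    return sum(th[c][a] * mu[(c, a)] for (c, a) in mu)

states = list(itertools.product((0, 1), repeat=3))
log_z = math.log(sum(math.exp(score(x)) for x in states))
prob = {x: math.exp(score(x) - log_z) for x in states}
```

Brute-force enumeration is only feasible for tiny models; it is used here purely to check the definitions.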
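Marginal Inference
$\mu_{\mathrm{MARG}} = \mathbb{E}_{P_\theta}[\mu]$
$\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu)$
$\bar{\mu}_{\mathrm{approx}} = \arg\max_{\mu \in \mathcal{L}} \langle \mu, \theta \rangle + H_B(\mu)$
$H_B(\mu) = \sum_{c \in C} W_c H(\mu_c)$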
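The variational form of marginal inference can be checked by hand in the simplest possible case. The sketch below assumes a single discrete variable, so the marginal polytope is the probability simplex and the entropy is plain Shannon entropy. In that case the maximizer of ⟨µ, θ⟩ + H(µ) is the softmax of θ and the maximum equals log Z. The potential values are made up.

```python
import math

def softmax(theta):
    """Exact marginals of a single discrete variable with scores theta."""
    m = max(theta)
    w = [math.exp(t - m) for t in theta]
    z = sum(w)
    return [v / z for v in w]

def entropy(mu):
    """Shannon entropy H(mu) = -sum_i mu_i log mu_i."""
    return -sum(p * math.log(p) for p in mu if p > 0.0)

def objective(mu, theta):
    """Variational objective <mu, theta> + H(mu)."""
    return sum(p * t for p, t in zip(mu, theta)) + entropy(mu)

theta = [0.3, -1.0, 2.0]  # illustrative values only
mu_marg = softmax(theta)
log_z = math.log(sum(math.exp(t) for t in theta))
uniform = [1.0 / 3.0] * 3
```

Evaluating `objective(mu_marg, theta)` recovers `log_z`, while any other point of the simplex, such as `uniform`, scores strictly lower.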
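MAP Inference
$\mu_{\mathrm{MAP}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle$
Black Box MAP Solver: $\theta \to \mu_{\mathrm{MAP}}$
Gray Box MAP Solver: $\theta \to \mu_{\mathrm{MAP}}$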
Marginal → MAP Reductions
Hazan and Jaakkola [2012]
Ermon et al. [2013]
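Generic FW with Line Search
$y_t = \arg\min_{x \in X} \langle x, \nabla f(x_{t-1}) \rangle$
$\gamma_t = \arg\min_{\gamma \in [0,1]} f((1-\gamma) x_{t-1} + \gamma y_t)$
$x_t = (1-\gamma_t) x_{t-1} + \gamma_t y_t$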
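The generic loop can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the feasible set is the probability simplex, the objective is a made-up quadratic, and the line search is a golden-section search, which is one of several reasonable choices.

```python
import math

def golden_section(phi, lo=0.0, hi=1.0, iters=60):
    """Approximately minimize a unimodal function phi on [lo, hi]."""
    inv = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv * (b - a), a + inv * (b - a)
    for _ in range(iters):
        if phi(c) < phi(d):
            b, d = d, c
            c = b - inv * (b - a)
        else:
            a, c = c, d
            d = a + inv * (b - a)
    return 0.5 * (a + b)

def simplex_lmo(g):
    """Linear minimization oracle on the simplex: the vertex minimizing <x, g>."""
    i = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == i else 0.0 for k in range(len(g))]

def frank_wolfe(f, grad, lmo, x0, iters=200):
    x = list(x0)
    for _ in range(iters):
        y = lmo(grad(x))  # y_t = argmin_x <x, grad f(x_{t-1})>
        # Line search over the segment between x_{t-1} and y_t.
        gamma = golden_section(
            lambda s: f([(1.0 - s) * xi + s * yi for xi, yi in zip(x, y)]))
        x = [(1.0 - gamma) * xi + gamma * yi for xi, yi in zip(x, y)]
    return x

# Made-up convex objective: squared distance to a point b inside the simplex.
b = [0.2, 0.5, 0.3]
f = lambda x: 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
x_star = frank_wolfe(f, grad, simplex_lmo, [1.0, 0.0, 0.0])
```

Every iterate is a convex combination of simplex vertices, so feasibility is maintained without any projection step.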
Generic FW with Line Search
[Diagram: Compute Gradient → $\nabla f(x_{t-1})$ → Linear Minimization Oracle → $y_t$ → Line Search → $x_t$ → back to Compute Gradient]
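FW for Marginal Inference
[Diagram: Compute Gradient, $\nabla F(\mu_t) = \theta + \nabla H(\mu_t)$ → $\tilde{\theta}$ → MAP Inference Oracle → $\tilde{\mu}_{\mathrm{MAP}}$ → Line Search → $\mu_{t+1}$ → back to Compute Gradient]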
Subproblem Parametrization
$F(\mu) = \langle \mu, \theta \rangle + \sum_{c \in C} W_c H(\mu_c)$
$\tilde{\theta} = \nabla F(\mu_t) = \theta + \sum_{c \in C} W_c \nabla H(\mu_c)$
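The reparametrization can be illustrated on the smallest possible model. The following Python sketch assumes a single discrete variable (one clique with weight 1), so the MAP oracle reduces to picking the highest-scoring vertex of the simplex. The line search is a bisection on the directional derivative of the concave objective F, chosen here for simplicity. The function names and the potentials are hypothetical.

```python
import math

def entropy_grad(mu):
    """Gradient of H(mu) = -sum_i mu_i log mu_i; requires every mu_i > 0."""
    return [-(1.0 + math.log(p)) for p in mu]

def map_oracle(theta_tilde):
    """MAP on the simplex: the vertex with the largest reparametrized score."""
    best = max(range(len(theta_tilde)), key=lambda i: theta_tilde[i])
    return [1.0 if i == best else 0.0 for i in range(len(theta_tilde))]

def line_search(mu, vertex, theta, steps=60):
    """Bisection on d/dgamma F(mu + gamma (vertex - mu)), decreasing in gamma."""
    d = [v - m for v, m in zip(vertex, mu)]
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        point = [m + mid * di for m, di in zip(mu, d)]
        slope = sum(di * (t + g)
                    for di, t, g in zip(d, theta, entropy_grad(point)))
        if slope > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

def fw_marginals(theta, iters=20):
    # Start in the interior, where the entropy gradient is finite.
    mu = [1.0 / len(theta)] * len(theta)
    for _ in range(iters):
        theta_tilde = [t + g for t, g in zip(theta, entropy_grad(mu))]
        vertex = map_oracle(theta_tilde)        # MAP call with theta_tilde
        gamma = line_search(mu, vertex, theta)  # step size in [0, 1)
        mu = [(1.0 - gamma) * m + gamma * v for m, v in zip(mu, vertex)]
    return mu

theta = [0.0, 1.0]  # made-up potentials for one binary variable
mu = fw_marginals(theta)
```

For this binary example the iterates settle on the exact marginals, that is, the softmax of `theta`.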
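Line Search
[Figure: $\mu_{t+1}$ lies on the segment between $\mu_t$ and $\tilde{\mu}_{\mathrm{MAP}}$.]
The cost of evaluating the line search objective can scale with:
Bad: the number of possible values in the cliques.
Good: the number of cliques in the graph.
(see paper)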
Experiment #1
Convergence Rate
Convergence Rate of Frank-Wolfe [Jaggi, 2013]
$F(\mu^*) - F(\mu_t) \le \frac{2 C_f}{t+2}(1+\delta)$
Here $\frac{\delta C_f}{t+2}$ is the MAP suboptimality at iteration $t$; solving MAP exactly is NP-hard.
How to deal with MAP hardness?
Use MAP solver and hope for the best [Hazan and Jaakkola, 2012].
Relax to the local polytope.
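As a quick worked example of the bound, the following Python sketch evaluates the right-hand side 2 C_f (1 + δ) / (t + 2) and inverts it to get an iteration count for a target accuracy. The function names and the numbers plugged in are hypothetical; in particular, the curvature constant is simply assumed to be known and finite.

```python
import math

def fw_gap_bound(curvature, t, delta=0.0):
    """Right-hand side of the bound: 2 * C_f * (1 + delta) / (t + 2)."""
    return 2.0 * curvature * (1.0 + delta) / (t + 2)

def iterations_for(curvature, eps, delta=0.0):
    """Smallest t for which the bound is at most eps."""
    return max(0, math.ceil(2.0 * curvature * (1.0 + delta) / eps - 2.0))

# Assumed curvature constant C_f = 2 and target accuracy 0.25.
t_needed = iterations_for(2.0, 0.25)
bound_at_t = fw_gap_bound(2.0, t_needed)
```

With these values the bound reaches 0.25 after 14 iterations. Doubling δ from 0 to 1 doubles the bound at a fixed iteration, which is why the accuracy of the MAP oracle matters.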
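Curvature + Convergence Rate
$C_f = \sup_{x, s \in D;\ \gamma \in [0,1];\ y = x + \gamma(s - x)} \frac{2}{\gamma^2}\left(f(y) - f(x) - \langle y - x, \nabla f(x) \rangle\right)$
[Figure: $\mu_{t+1}$ on the segment between $\mu_t$ and $\tilde{\mu}_{\mathrm{MAP}}$.]
[Figure: entropy (vertical axis, 0 to 0.7) plotted against prob x = 1 (horizontal axis, 0 to 1).]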
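The curvature issue with entropy can be seen numerically. The Python sketch below evaluates the quantity inside the supremum for f equal to the negative entropy of a single binary variable, with s fixed at the vertex prob x = 1 and γ = 0.5. These particular choices of s, γ, and the sample points are assumptions made for illustration only.

```python
import math

def neg_entropy(p):
    """f(p) = p log p + (1 - p) log(1 - p) for p strictly inside (0, 1)."""
    return p * math.log(p) + (1.0 - p) * math.log(1.0 - p)

def neg_entropy_grad(p):
    """f'(p) = log p - log(1 - p); unbounded as p approaches 0 or 1."""
    return math.log(p) - math.log(1.0 - p)

def curvature_term(x, s, gamma):
    """(2 / gamma^2) (f(y) - f(x) - <y - x, grad f(x)>), y = x + gamma (s - x)."""
    y = x + gamma * (s - x)
    gap = neg_entropy(y) - neg_entropy(x) - (y - x) * neg_entropy_grad(x)
    return 2.0 / gamma ** 2 * gap

# Move the starting point x toward the boundary prob x = 1 -> 0.
terms = [curvature_term(x, s=1.0, gamma=0.5) for x in (1e-2, 1e-4, 1e-8)]
```

The values in `terms` keep growing as x approaches the boundary, because the gradient of the entropy is unbounded there. This suggests that the supremum defining C_f is not finite unless the iterates are kept away from the boundary.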
Experiment #2
Beyond MRFs
Question
Are MRFs the right Gibbs distribution for Frank-Wolfe?

| Problem Family | MAP Algorithm | Marginal Algorithm |
| --- | --- | --- |
| tree-structured graphical models | Viterbi | Forward-Backward |
| loopy graphical models | Max-Product BP | Sum-Product BP |
| Directed Spanning Tree | Chu-Liu-Edmonds | Matrix Tree Theorem |
| Bipartite Matching | Hungarian Algorithm | × |
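Norm-regularized marginal inference
$\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu) + \lambda R(\mu)$
Harchaoui et al. [2013].
Local linear oracle for MRFs?
$\tilde{\mu}_t = \arg\max_{\mu \in \mathcal{M} \cap B_r(\mu_t)} \langle \mu, \theta \rangle$
Garber and Hazan [2013]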
Conclusion
We need to figure out how to handle the entropy gradient.
There are plenty of extensions to further Gibbs distributions and regularizers.
Further Reading I
Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 334–342, 2013.
D. Garber and E. Hazan. A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization. ArXiv e-prints, January 2013.
Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. arXiv preprint arXiv:1302.2325, 2013.
Tamir Hazan and Tommi S Jaakkola. On the Partition Function and Random Maximum A-Posteriori Perturbations. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 991–998, 2012.
Bert Huang and Tony Jebara. Approximating the permanent with belief propagation. arXiv preprint arXiv:0908.1769, 2009.
Further Reading II
Mark Huber. Exact sampling from perfect matchings of dense regular bipartite graphs. Algorithmica, 44(3):183–193, 2006.
Martin Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 427–435, 2013.
James Petterson, Tiberio Caetano, Julian McAuley, and Jin Yu. Exponential family graph matching and ranking. 2009.
Tim Roughgarden and Michael Kearns. Marginals-to-models reducibility. In Advances in Neural Information Processing Systems, pages 1043–1051, 2013.
Maksims Volkovs and Richard S Zemel. Efficient sampling for bipartite matching problems. In Advances in Neural Information Processing Systems, pages 1322–1330, 2012.
Pascal O Vontobel. The Bethe permanent of a non-negative matrix. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 341–346. IEEE, 2010.
Finding the Marginal Matching
Sampling
Expensive, but doable [Huber, 2006, Volkovs and Zemel, 2012].
Used for maximum-likelihood learning [Petterson et al., 2009].
Sum-Product
Also requires Bethe approximation. Works well:
In practice [Huang and Jebara, 2009]
In theory [Vontobel, 2010]
Frank-Wolfe
Basically the same algorithm as for graphical models.
Same issue with curvature.