Transcript of Clamping Variables and Approximate Inference
mlg.eng.cam.ac.uk/adrian/slides_msr2.pdf

Page 1:

Clamping Variables and Approximate Inference

Adrian Weller, University of Cambridge

MSR Cambridge, Mar 18, 2016

Work with Tony Jebara and Justin Domke

For more information, see http://mlg.eng.cam.ac.uk/adrian/

1 / 21

Page 2:

Motivation: undirected graphical models

Powerful way to represent relationships across variables

Many applications including: computer vision, social network analysis, deep belief networks, protein folding...

In this talk, focus on binary pairwise (Ising) models

Example: Grid for computer vision (attractive)

2 / 21

Page 3:

Motivation: undirected graphical models

Example: Part of epinions social network (mixed)

Figure courtesy of N. Ruozzi

3 / 21

Page 4:

Motivation: undirected graphical models

[Figure: Restricted Boltzmann machine, hidden units x4, x5, x6, x7, x8 over visible units x1, x2, x3]

Example: Restricted Boltzmann machine (mixed)

A fundamental problem is marginal inference

Estimate marginal probability distribution of one variable

p(x1) = ∑_{x2,...,xn} p(x1, x2, ..., xn)

Closely related to computing the partition function

Computationally intractable, focus on approximate methods

Our theme: combining approximate inference with clamping can be very fruitful as a proof technique, and in practice

4 / 21

Page 5:

Background: Binary pairwise models

Binary variables X1, ..., Xn ∈ {0, 1}
Singleton and pairwise potentials θ

Write θ · x for the total score of a complete configuration

Probability distribution given by

p(x) = (1/Z) exp(θ · x)

To ensure probabilities sum to 1, need normalizing constant

Z = ∑_x exp(θ · x)

Z is the partition function, a fundamental quantity we'd like to compute or approximate
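As a concrete check of these definitions, here is a minimal sketch that computes Z by brute-force enumeration. The model and all the potential values are made up for illustration; the slides do not specify a particular model:

```python
import itertools
import math

# Hypothetical tiny model: three binary variables with made-up
# singleton scores theta_single[i] and pairwise scores theta_pair[(i, j)].
theta_single = {0: 0.5, 1: -0.3, 2: 0.2}
theta_pair = {(0, 1): 1.0, (1, 2): -0.8}

def score(x):
    """Total score theta . x of a complete configuration x."""
    s = sum(t * x[i] for i, t in theta_single.items())
    s += sum(w * x[i] * x[j] for (i, j), w in theta_pair.items())
    return s

# Z = sum over all 2^n states of exp(theta . x)
states = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(score(x)) for x in states)

# With Z in hand, p(x) = exp(theta . x) / Z sums to 1 over all states.
```

This enumeration is exponential in n, which is exactly why the talk turns to approximate methods.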

5 / 21

Page 6:

Background: A variational approximation

Recall p(x) = (1/Z) exp(θ · x)

Exact inference may be viewed as optimization,

log Z = max_{µ∈M} [ θ · µ + S(µ) ]

M is the space of marginals that are globally consistent, S is the (Shannon) entropy

Bethe makes two pairwise approximations,

log ZB = max_{q∈L} [ θ · q + SB(q) ]

L is the space of marginals that are pairwise consistent, SB is the Bethe entropy approximation

Loopy Belief Propagation finds stationary points of Bethe

For models with no cycles (acyclic), Bethe is exact: ZB = Z

6 / 21

Page 7:

Background: When is Bethe a good approximation?

We know that Bethe is exact for acyclic models, ZB = Z

When else does Bethe perform well?

‘Tree-like models’: models with long cycles or weak potentials

Also: attractive models (all edges attractive)

Sudderth, Wainwright and Willsky (NIPS 2007) used loop series to show that for a subclass of attractive binary pairwise models, ZB ≤ Z

Conjectured ZB ≤ Z for all attractive binary pairwise models

Proved true by Ruozzi (NIPS 2012) using graph covers

Here we provide a separate proof building from first principles, and also derive an upper bound for Z in terms of ZB

We use the idea of clamping variables

7 / 21

Page 8:

Background: What is clamping?

[Figure: example model on variables x1, ..., x10]

To compute the partition function Z, we can enumerate all states and sum

x1 x2 ... x10   score   exp(score)
 0  0 ...  0      1        2.7
 0  0 ...  1      2        7.4
 ...
 0  1 ...  1      1.3      3.7
 1  0 ...  0     -1        0.4
 1  0 ...  1      0.2      1.2
 ...
 1  1 ...  1      1.8      6.0

Total Z = 47.1

8 / 21

Page 9:

Background: What is clamping?

[Figure: the example model with X1 highlighted for clamping]

Can split Z in two: clamp variable X1 to each of {0, 1}, then add the two sub-partition functions:

Z = Z |X1=0 + Z |X1=1

After we clamp a variable, it may be removed

x1 x2 ... x10   score   exp(score)
 0  0 ...  0      1        2.7
 0  0 ...  1      2        7.4
 ...
 0  1 ...  1      1.3      3.7    → Z|X1=0 = 27.5
 1  0 ...  0     -1        0.4
 1  0 ...  1      0.2      1.2
 ...
 1  1 ...  1      1.8      6.0    → Z|X1=1 = 19.6

Total Z = 47.1

p(X1 = 1) = Z|X1=1 / Z
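The split above is easy to verify numerically. A minimal sketch (with hypothetical potentials, not the 10-variable model on the slide): clamp X1 to each value, sum the two exact sub-partition functions, and recover both Z and p(X1 = 1):

```python
import itertools
import math

# Hypothetical small model (made-up numbers, not the slide's example).
theta_single = {0: 0.5, 1: -0.3, 2: 0.2}
theta_pair = {(0, 1): 1.0, (1, 2): -0.8}
n = 3

def score(x):
    s = sum(t * x[i] for i, t in theta_single.items())
    s += sum(w * x[i] * x[j] for (i, j), w in theta_pair.items())
    return s

def Z_clamped(i, v):
    """Sub-partition function with X_i clamped: sum only states where x[i] == v."""
    return sum(math.exp(score(x))
               for x in itertools.product([0, 1], repeat=n) if x[i] == v)

Z_full = sum(math.exp(score(x)) for x in itertools.product([0, 1], repeat=n))
Z_split = Z_clamped(0, 0) + Z_clamped(0, 1)   # Z = Z|X1=0 + Z|X1=1
p_x1 = Z_clamped(0, 1) / Z_split              # p(X1 = 1) = Z|X1=1 / Z
```

Z_split equals Z_full exactly, since clamping merely partitions the state space in two.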

9 / 21

Page 10:

Background: What is clamping?

[Figure: the example model with X1 clamped and removed]

Can split Z in two: clamp variable X1 to each of {0, 1}, then add the two sub-partition functions:

Z = Z |X1=0 + Z |X1=1

After we clamp a variable, it may be removed

After removing the clamped variable, if the remaining sub-models are acyclic then we can find sub-partition functions efficiently (BP; the Bethe approximation is exact on trees)

If not,

Can repeat: clamp and remove variables until acyclic, or
Settle for approximate inference on sub-models

Z(i)B := ZB |Xi=0 + ZB |Xi=1

Will this lead to a better estimate than approximate inference on the original model? Always? Often but not always

10 / 21


Page 14:

A variational perspective on clamping

Bethe approximation

log ZB = max_{q∈L} [ θ · q + SB(q) ]

Observe that when Xi is clamped, we optimize over a subset

log ZB|Xi=0 = max_{q∈L: qi=0} [ θ · q + SB(q) ]

⇒ ZB |Xi=0 ≤ ZB , similarly ZB |Xi=1 ≤ ZB

Recap of Notation

Z        true partition function
ZB       Bethe optimum partition function
Z(i)B := ZB|Xi=0 + ZB|Xi=1  (≤ 2 ZB)   approximation obtained when we clamp Xi and sum approximate sub-partition functions

11 / 21

Page 15:

Clamping variables: an upper bound on Z

From before,

Z(i)B := ZB |Xi=0 + ZB |Xi=1 ≤ 2ZB

Repeat: clamp and remove variables, until the remaining model is acyclic, where Bethe is exact

For example, if we must delete 2 variables Xi, Xj, we obtain

Z(ij)B := ∑_{a,b∈{0,1}} ZB|Xi=a, Xj=b ≤ 2² ZB

But sub-partition functions are exact, hence LHS = Z
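The identity used here, that clamping a set of variables and summing the exact sub-partition functions over all their joint assignments recovers Z, can be sketched as follows. The 4-variable model and its weights are hypothetical, and brute-force enumeration stands in for the exact acyclic computation:

```python
import itertools
import math

# Hypothetical 4-variable cycle model (made-up weights).
theta_single = [0.5, -0.3, 0.2, 0.1]
theta_pair = {(0, 1): 1.0, (1, 2): -0.8, (2, 3): 0.6, (3, 0): 0.4}
n = 4

def score(x):
    s = sum(t * xi for t, xi in zip(theta_single, x))
    s += sum(w * x[i] * x[j] for (i, j), w in theta_pair.items())
    return s

def Z_sub(clamps):
    """Exact sub-partition function under the given {variable: value} clamps."""
    return sum(math.exp(score(x))
               for x in itertools.product([0, 1], repeat=n)
               if all(x[i] == v for i, v in clamps.items()))

Z = Z_sub({})  # no clamps: the full partition function

# Clamp X_i and X_j (here i=0, j=1) and sum the four sub-partition functions.
Z_ij = sum(Z_sub({0: a, 1: b}) for a, b in itertools.product([0, 1], repeat=2))
# Z_ij equals Z exactly, matching the slide's "LHS = Z" argument.
```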

[Figure: the example model; then with X1 clamped and removed; then with X1 and X2 clamped and removed]

12 / 21

Page 16:

Clamping variables: an upper bound on Z

Z(i)B := ZB |Xi=0 + ZB |Xi=1 ≤ 2ZB

Repeat: clamp and remove variables, until the remaining model is acyclic, where Bethe is exact

Let k(G ) be the minimum size of a feedback vertex set

Theorem (result is tight in a sense)

Z ≤ 2^k ZB

13 / 21


Page 18:

Attractive models: a lower bound on Z

An attractive model is one with all edges attractive

Recall definition,

Z(i)B := ZB |Xi=0 + ZB |Xi=1

Theorem (actually show a stronger result, ask if interested)

For an attractive binary pairwise model and any Xi , ZB ≤ Z(i)B

Repeat as before: ZB ≤ Z(i)B ≤ Z(ij)B ≤ · · · ≤ Z

Corollary (similar proof to earlier result; first proved Ruozzi, 2012)

For an attractive binary pairwise model, ZB ≤ Z

⇒ each clamp and sum can only improve ZB

15 / 21


Page 20:

Recap of results so far

We have used clamping as a proof technique

Derived lower and upper bounds on Z for attractive models

ZB ≤ Z ≤ 2^k ZB
(the first inequality for attractive models only; the second for attractive and mixed)

⇔  Z / 2^k ≤ ZB ≤ Z
(the second inequality for attractive models only)

We also proved that for attractive models, clamping and summing (optimum) Bethe sub-partition functions can only improve the estimate

How about for mixed models?

16 / 21

Page 21:

Example: here clamping any variable worsens ZB estimate

[Figure: 4-cycle on x1, x2, x3, x4]

Blue edges are attractive with edge weight +2
Red edges are repulsive with edge weight −2
No singleton potentials

(performance is only slightly worse with clamping)

In practice, if we pick a good variable to clamp, then clamping is usually helpful

17 / 21

Page 22:

New work: what does clamping do for MF and TRW?

Mean field (MF) approximation assumes independent variables, yields a lower bound, ZM ≤ Z

Tree-reweighted (TRW) is a pairwise approximation similar to Bethe but allows a convex optimization and yields an upper bound, Z ≤ ZT. Together: ZM ≤ Z ≤ ZT

Earlier, we showed that for Bethe, clamping always improves the approximation for attractive models; often but not always improves for mixed models

How about for MF and TRW? ZM ≤ ZB ≤ ZT

Theorem

For both MF and TRW, for attractive and mixed models, clamping and summing approximate sub-partition functions can only improve the respective approximation and bound (any number of labels).

18 / 21


Page 24:

Error in log Z vs number of clamps: grids

[Figure: error in log Z vs number of clamps (0 to 5), curves for best, worst, pseudo, and greedy clamp selection; top row: large (9x9) grids, bottom row: small (5x5) grids; left: attractive grids [0, 6], right: mixed grids [−6, 6]]

19 / 21

Page 25:

Conclusions for practitioners

Typically Bethe performs very well

Clamping can be very helpful, more so for denser models with stronger edge weights, a setting where inference is often hard

We provide fast methods to select a good variable to clamp

MF and TRW provide useful bounds on Z and ZB

Thank you

For more information, see http://mlg.eng.cam.ac.uk/adrian/

20 / 21

Page 26:

References

N. Ruozzi. The Bethe partition function of log-supermodular graphical models. In NIPS, 2012.

E. Sudderth, M. Wainwright, and A. Willsky. Loop series and Bethe variational bounds in attractive graphical models. In NIPS, 2007.

A. Weller and J. Domke. Clamping improves TRW and mean field approximations. To appear in AISTATS, 2016.

A. Weller and T. Jebara. Bethe bounds and approximating the global optimum. In AISTATS, 2013.

A. Weller and T. Jebara. Clamping variables and approximate inference. In NIPS, 2014.

J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In IJCAI, Distinguished Lecture Track, 2001.

21 / 21

Page 27:

Supplementary material

Extra slides for questions or further explanation

22 / 21

Page 28:

Error in log Z vs number of clamps: complete graphs

[Figure: error in log Z vs number of clamps (0 to 5), curves for best, worst, pseudo, and greedy clamp selection; left: attractive K15, [0, 6], right: mixed K15, [−6, 6]]

For dense mixed models (many edges), MF can be better than Bethe

What happens if we increase edge strength?

23 / 21

Page 29:

Error in log Z vs number of clamps: complete graphs

[Figure: error in log Z vs number of clamps (0 to 5), curves for best, worst, pseudo, and greedy clamp selection; left: mixed K15, [−6, 6], right: mixed K15, [−12, 12]]

With stronger edges, MF is much better than Bethe!

But MF assumes variables are independent, what’s going on?

Frustrated cycles cause Bethe to overestimate by a lot
TRW is even worse
MF behaves much better (in marginal polytope)

24 / 21


Page 31:

Time (secs) vs error in log Z for various methods

Mixed models, Wij ∼ U[−6, 6]
Time shown on a log scale

[Figure: error in log Z vs time in seconds (log scale), curves for TRW, Bethe (B), MF, maxW+c+TRE, pseudo, and greedy; left: 7x7 grid, right: complete K10]

Clamping can make the subsequent optimization problems easier, hence sometimes total time with clamping is lower while also being more accurate

25 / 21

Page 32:

Clamping variables: strongest result for attractive models

log ZB = max_{q∈L} [ θ · q + SB(q) ]

For any variable Xi and x ∈ [0, 1], let qi = q(Xi = 1) and

log ZBi(x) = max_{q∈L: qi=x} [ θ · q + SB(q) ]

ZBi (x) is ‘Bethe partition function constrained to qi = x ’

Note: ZBi(0) = ZB|Xi=0, ZBi(x*) = ZB (at the unconstrained optimum x*), ZBi(1) = ZB|Xi=1

Define new function,

Ai(x) := log ZBi(x) − Si(x)

Theorem (implies all other results for attractive models)

For an attractive binary pairwise model, Ai (x) is convex

Builds on derivatives of Bethe free energy from [WJ13]

26 / 21


Page 34:

Experiments: Which variable to clamp?

Compare the error | log Z − log Z(i)B | to the original error | log Z − log ZB | for various ways to choose which variable Xi to clamp:

best    clamp for the best improvement in error of Z in hindsight
worst   clamp for the worst improvement in error of Z in hindsight
avg     average performance over clamping choices
maxW    max sum of incident edge weights, ∑_{j∈N(i)} |Wij|
Mpower  more sophisticated, based on powers of a related matrix
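The maxW heuristic is simple to implement. A minimal sketch, where W is a hypothetical symmetric edge-weight map with made-up numbers:

```python
# Hypothetical edge weights W[(i, j)] = Wij for an undirected model.
W = {(0, 1): 2.0, (1, 2): -1.5, (0, 2): 0.5, (2, 3): 3.0}

def maxw_choice(W):
    """Pick the variable with the largest sum of absolute incident edge weights."""
    strength = {}
    for (i, j), w in W.items():
        strength[i] = strength.get(i, 0.0) + abs(w)
        strength[j] = strength.get(j, 0.0) + abs(w)
    return max(strength, key=strength.get)

# Here variable 2 touches edges of weight |-1.5| + |0.5| + |3.0| = 5.0,
# the largest total, so it would be clamped first.
```

Mpower, by contrast, looks beyond immediate neighbours via powers of a related matrix, which the later 'lamp' graph experiments show can matter.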

[Figure: example 'lamp' graph on x1, ..., x10]

27 / 21

Page 35:

Experiments: attractive random graph n = 10, p = 0.5

unary θi ∼ U[−2, 2], edge Wij ∼ U[0, Wmax]

Error of estimate of logZ

Observe

Clamping any variable helps significantly

Our selection methods perform well

[Figure: left, error of estimate of log Z vs max interaction strength W; right, avg ℓ1 error of singleton marginals, using Frank-Wolfe to optimize the Bethe free energy; curves: Original, all Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower]

28 / 21

Page 36:

Experiments: mixed random graph n = 10, p = 0.5

unary θi ∼ U[−2, 2], edge Wij ∼ U[−Wmax, Wmax]

Error of estimate of logZ

Results remain promising for higher n

[Figure: left, error of estimate of log Z vs max interaction strength W; right, avg ℓ1 error of singleton marginals, using Frank-Wolfe to optimize the Bethe free energy; curves: Original, avg Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower]

29 / 21

Page 37:

Experiments: attractive complete graph n = 10, TRW

unary θi ∼ U[−0.1, 0.1], edge Wij ∼ U[0, Wmax]

Error of estimate of logZ

Note low unary potentials

Clamping a variable 'breaks symmetry' and overcomes the TRW advantage

[Figure: left, error of estimate of log Z vs max interaction strength W; right, avg ℓ1 error of singleton marginals; curves: Original, all Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower, TRW]

30 / 21

Page 38:

Experiments: mixed complete graph n = 10, TRW

unary θi ∼ U[−2, 2], edge Wij ∼ U[−Wmax, Wmax]

Error of estimate of logZ

Note regular singleton potentials

[Figure: left, error of estimate of log Z vs max interaction strength W; right, avg ℓ1 error of singleton marginals; curves: Original, avg Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower, TRW]

31 / 21

Page 39:

Experiments: attractive random graph n = 50, p = 0.1

unary θi ∼ U[−2, 2], edge Wij ∼ U[0, Wmax]

Error of estimate of logZ

'worst Clamp' performs worse here due to suboptimal solutions found by Frank-Wolfe

[Figure: left, error of estimate of log Z vs max interaction strength W; right, avg ℓ1 error of singleton marginals]

32 / 21

Page 40:

Experiments: mixed random graph n = 50, p = 0.1

unary θi ∼ U[−2, 2], edge Wij ∼ U[−Wmax, Wmax]

Error of estimate of logZ

Performance still good for clamping just one variable

[Figure: left, error of estimate of log Z vs max interaction strength W; curves: Original, avg Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower; right, avg ℓ1 error of singleton marginals; curves: Original, all Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower]

33 / 21

Page 41:

Experiments: attractive ‘lamp’ graph

unary θi ∼ U[−2, 2], edge Wij ∼ U[0, Wmax]

Error of estimate of logZ

Mpower performs well, significantly better than maxW

[Figure: the 'lamp' graph on x1, ..., x10; left, error of estimate of log Z vs max interaction strength W; right, avg ℓ1 error of singleton marginals; curves: Original, all Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower]

34 / 21

Page 42:

Experiments: mixed ‘lamp’ graph

unary θi ∼ U[−2, 2], edge Wij ∼ U[−Wmax, Wmax]

Error of estimate of logZ

Mpower performs well, significantly better than maxW

[Figure: the 'lamp' graph on x1, ..., x10; left, error of estimate of log Z vs max interaction strength W; curves: Original, avg Clamp, maxW Clamp, best Clamp, worst Clamp, Mpower; right, avg ℓ1 error of singleton marginals]

35 / 21