Free Energy Rates for Parameter Rich Optimization Problems...

Free Energy Rates for ParameterRich Optimization Problems

Joachim M. Buhmann1, Julien Dumazert1,Alexey Gronskiy1, Wojciech Szpankowski2

1ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch,[email protected],

2Purdue University, [email protected]

AofA WorkshopPrinceton, 20th June 2017

Roadmap

Application

Genericproblem

Case study:“sparse” minimum bisection

Corollary

Main result formulation

Proof outline

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 1/24

Generic Problem

Genericproblem



Proof outline

Application

Corollary


Generic Problem: Gibbs Sampling

• We search for solutions:X → c

• Input X (e. g. a graph) israndom!

• Want solution c still to bestable

⇓define a Maximum EntropyGibbs measure over c:

p(c|X ) ∝ exp(−β · cost(c,X)) solutions

Gib

bs

me

asu

res


Generic Problem: Gibbs Sampling

• We search for solutions:X → c

• Input X (e. g. a graph) israndom!

• Want solution c still to bestable

⇓define a Maximum EntropyGibbs measure over c:

p(c|X ) ∝ exp(−β · cost(c,X)) solutionsG

ibb

sm

ea

sure

s


Generic Problem: “free energy”

Nowp(c|X ) = exp

(−β · cost−F(X )

),

where the following is free energy:

F(X ) = log Z (X ) (here Z (X ) is partition function).

Goal: compute expected “free energy” — hard:

EXF(X ) = EX log Z (X ).


Case Study: “Sparse” Minimum Bisection

Application

Genericproblem


Corollary


Proof outline



Find two subgraphs of some small size d withminimum edge cost between them.

2

3

4

1

5

10

11

12

9

8

6

7



• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!

• solutions are dependent oneach other — not RandomEnergy Model (next slide)!

• contribution: free energyasymptotics solved!

• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7


Free Energy: History and Advances

• [Derrida’80] established REM — nodependencies;

• [Talagrand’02]: systematization; simple proof ofREM free energy phase transition and beyond:

limn→∞

E[log Z (β,X )]

n=

{β2

4 + log 2 β < 2√

log 2,β√

log 2 β ≥ 2√

log 2.


Main Result

Genericproblem



Proof outline

Application

Corollary


Main Result: Setting• m is number of solutions

• N is size of one solution(number of parameters)

• require to be “parameterrich”

log m = o(N)

23

4

1

5

10

11

129

86

7

N

log m = o(N)


Main Result: 2nd Order Phase Transition

Main theoremConsider sparse MBP on a complete graph; edgeweights mutually independent within any givensolution; have mean µ and variance σ2. Then, thefollowing holds:

limn→∞

E[log Z ] + βµ√

N log mlog m

=

{1 + β2σ2

2 , β <√

2σ,

βσ√

2, β ≥√

2σ

provided log n� d � n2/7√log n

.


Proof Outline

Genericproblem



Proof outline

Application

Corollary


Proof Outline – I

• Introduce “solution overlap” D:average intersection of two bisections

• D is key to understand dependencies(remember we are no REM!)

Lemma 1The following holds

Erand choiceD = O(d4/n).

Computedependencies

P(``Z close to EZ’’)viaVar Z

Breaking E log Z into two cond’s

Final boundingvia adjustingparameter


Proof Outline – II• Introduce event A: happens when Z is

close to EZ , i. e. A := {Z ≥ εEZ}• Goal is to compute P(A)

Fact 1P(A) can be bounded by VarZ viaChebychev.

Lemma 2 [Buhmann et al., 2014]VarZ can be asymptotically approximatedvia Erand choiceD:

VarZ ∼ (EZ )2(σ2β2Erand choiceD).

Computedependencies





Proof Outline – III

• Break E log Z into

E log Z = E[log Z | A] · P(A) + E[log Z1(A)]

≥ (logEZ + log ε)P(A) + E[log Z1(A)]

Fact 3Can expand logEZ via Taylor expansion(used assumptions of Theorem) andbound P(A) from previous.

Fact 4E[log Z1(A)]: enough to bound loosely

Computedependencies





Proof Outline – IV

• Finally, the right choice of ε for tworegimes of β gives the phase transitionin lower bound

limn→∞

E[log Z ] + · · ·· · ·

>

{1 + β2σ2

2 , β <√

2σ,

βσ√

2, β ≥√

2σ.

• The same phase transition happensfor upper bound, — easier to prove(no computing dependencies)

Computedependencies





Possible Application

Genericproblem



Proof outline

Application

Corollary


Application Setting:Gibbs Sampling Regularizer

• Remember our approach:

• Q: what is regularizing here?A: choice of β — essentially controls “width”

• Q: how?A: maximize expected log-convolution:

β∗ = arg maxβ

EX ′,X ′′

[log

∑c

pβ(c|X ′)pβ(c|X ′′)]


Application Setting: Intuition• Log-convolution is “almost” cross entropy;

• It stabilizes solution output:β = 0.25: underfitting

c opt(X

′ )

c opt(X

′′)

pβ(·|X ′) pβ(·|X ′′)

β = 2.8: near-optimal β = 19.5: overfitting

2 6 10 14 18

β = 0.25

β = 2.8

β = 19.5

β

solutions solutions solutions


Application Setting: Intuition• Log-convolution is “almost” cross entropy;• It stabilizes solution output:

β = 0.25: underfitting

c opt(X

′ )

c opt(X

′′)

pβ(·|X ′) pβ(·|X ′′)

β = 2.8: near-optimal β = 19.5: overfitting

2 6 10 14 18

β = 0.25

β = 2.8

β = 19.5

β

solutions solutions solutions


Corollary and Application

Genericproblem



Proof outline

Application

Corollary


Corollary: Log-Convolution• The log-convolution score can be rewritten:

E log∑

c

pβ(c|X ′)pβ(c|X ′′)

= E log Z (β,X ′ + X ′′)︸︷︷︸Thm

−2E log Z (β,X )︸︷︷︸Thm

• Thus possible to analytically compute score:

noise-to-signal

ratio

less noise

greater beta

more noise

less beta


Conclusion

• We computed asymptotically precisely the valueof E log Z with small dependencies

• It has applications to model validation(submitted to J. of Theor. CS) in machinelearning: method involves comparing twofluctuating Gibbs distributions

• Still not found a general way (and probablythere is no such way)


Thanks for your attention!


References I

E. T. Jaynes,Information Theory and Statistical Mechanics.Phys. Rev., 106, 1957, p. 620.

W. Szpankowski,Combinatorial optimization problems for whichalmost every algorithm is asymptotically optimal.Optimization, 33, 1995, pp. 359–368.

M. Talagrand.Spin Glasses: A Challenge for Mathematicians.Springer, NY, 2003.


References II

J. Vannimenus and M. Mezard,On the Statistical Mechanics of OptimizationProblems of the traveling Salesman Type.J. de Physique Lettres, 45, 1984, pp. 1145–1153.

J. M. Buhmann, A. Gronskiy and W. SzpankowskiFree Energy Rates for a Class of CombinatorialOptimization ProblemsAofA’14, Paris

J. M. Buhmann, A. Gronskiy, J. Dumazert andW. SzpankowskiPhase Transitions in Parameter Rich OptimizationProblemsSODA/ANALCO’17, Barcelona


Free Energy Rates for Parameter Rich Optimization Problems...

Documents

Transcript of Free Energy Rates for Parameter Rich Optimization Problems...