Free Energy Rates for Parameter Rich Optimization Problems...
Transcript of Free Energy Rates for Parameter Rich Optimization Problems...
Free Energy Rates for ParameterRich Optimization Problems
Joachim M. Buhmann1, Julien Dumazert1,Alexey Gronskiy1, Wojciech Szpankowski2
1ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch,[email protected],
2Purdue University, [email protected]
AofA WorkshopPrinceton, 20th June 2017
Roadmap
Application
Genericproblem
Case study:“sparse” minimum bisection
Corollary
Main result formulation
Proof outline
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 1/24
Generic Problem
Genericproblem
Case study:“sparse” minimum bisection
Main result formulation
Proof outline
Application
Corollary
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 2/24
Generic Problem: Gibbs Sampling
• We search for solutions:X → c
• Input X (e. g. a graph) israndom!
• Want solution c still to bestable
⇓define a Maximum EntropyGibbs measure over c:
p(c|X ) ∝ exp(−β · cost(c,X)) solutions
Gib
bs
me
asu
res
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 3/24
Generic Problem: Gibbs Sampling
• We search for solutions:X → c
• Input X (e. g. a graph) israndom!
• Want solution c still to bestable
⇓define a Maximum EntropyGibbs measure over c:
p(c|X ) ∝ exp(−β · cost(c,X)) solutions
Gib
bs
me
asu
res
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 3/24
Generic Problem: Gibbs Sampling
• We search for solutions:X → c
• Input X (e. g. a graph) israndom!
• Want solution c still to bestable
⇓define a Maximum EntropyGibbs measure over c:
p(c|X ) ∝ exp(−β · cost(c,X)) solutionsG
ibb
sm
ea
sure
s
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 3/24
Generic Problem: “free energy”
Nowp(c|X ) = exp
(−β · cost−F(X )
),
where the following is free energy:
F(X ) = log Z (X ) (here Z (X ) is partition function).
Goal: compute expected “free energy” — hard:
EXF(X ) = EX log Z (X ).
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 4/24
Case Study: “Sparse” Minimum Bisection
Application
Genericproblem
Case study:“sparse” minimum bisection
Corollary
Main result formulation
Proof outline
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 5/24
Case Study: “Sparse” Minimum Bisection
Find two subgraphs of some small size d withminimum edge cost between them.
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Case Study: “Sparse” Minimum Bisection
Find two subgraphs of some small size d withminimum edge cost between them.
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Case Study: “Sparse” Minimum Bisection
Find two subgraphs of some small size d withminimum edge cost between them.
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Case Study: “Sparse” Minimum Bisection
• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!
• solutions are dependent oneach other — not RandomEnergy Model (next slide)!
• contribution: free energyasymptotics solved!
• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Case Study: “Sparse” Minimum Bisection
• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!
• solutions are dependent oneach other — not RandomEnergy Model (next slide)!
• contribution: free energyasymptotics solved!
• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Case Study: “Sparse” Minimum Bisection
• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!
• solutions are dependent oneach other — not RandomEnergy Model (next slide)!
• contribution: free energyasymptotics solved!
• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Case Study: “Sparse” Minimum Bisection
• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!
• solutions are dependent oneach other — not RandomEnergy Model (next slide)!
• contribution: free energyasymptotics solved!
• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
2
3
4
1
5
10
11
12
9
8
6
7
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24
Free Energy: History and Advances
• [Derrida’80] established REM — nodependencies;
• [Talagrand’02]: systematization; simple proof ofREM free energy phase transition and beyond:
limn→∞
E[log Z (β,X )]
n=
{β2
4 + log 2 β < 2√
log 2,β√
log 2 β ≥ 2√
log 2.
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 7/24
Free Energy: History and Advances
• [Derrida’80] established REM — nodependencies;
• [Talagrand’02]: systematization; simple proof ofREM free energy phase transition and beyond:
limn→∞
E[log Z (β,X )]
n=
{β2
4 + log 2 β < 2√
log 2,β√
log 2 β ≥ 2√
log 2.
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 7/24
Main Result
Genericproblem
Case study:“sparse” minimum bisection
Main result formulation
Proof outline
Application
Corollary
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 8/24
Main Result: Setting• m is number of solutions
• N is size of one solution(number of parameters)
• require to be “parameterrich”
log m = o(N)
23
4
1
5
10
11
129
86
7
N
log m = o(N)
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 9/24
Main Result: Setting• m is number of solutions
• N is size of one solution(number of parameters)
• require to be “parameterrich”
log m = o(N)
23
4
1
5
10
11
129
86
7
N
log m = o(N)
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 9/24
Main Result: Setting• m is number of solutions
• N is size of one solution(number of parameters)
• require to be “parameterrich”
log m = o(N)
23
4
1
5
10
11
129
86
7
N
log m = o(N)
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 9/24
Main Result: 2nd Order Phase Transition
Main theoremConsider sparse MBP on a complete graph; edgeweights mutually independent within any givensolution; have mean µ and variance σ2. Then, thefollowing holds:
limn→∞
E[log Z ] + βµ√
N log mlog m
=
{1 + β2σ2
2 , β <√
2σ,
βσ√
2, β ≥√
2σ
provided log n� d � n2/7√log n
.
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 10/24
Proof Outline
Genericproblem
Case study:“sparse” minimum bisection
Main result formulation
Proof outline
Application
Corollary
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 11/24
Proof Outline – I
• Introduce “solution overlap” D:average intersection of two bisections
• D is key to understand dependencies(remember we are no REM!)
Lemma 1The following holds
Erand choiceD = O(d4/n).
Computedependencies
P(``Z close to EZ’’)viaVar Z
Breaking E log Z into two cond’s
Final boundingvia adjustingparameter
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 12/24
Proof Outline – II• Introduce event A: happens when Z is
close to EZ , i. e. A := {Z ≥ εEZ}• Goal is to compute P(A)
Fact 1P(A) can be bounded by VarZ viaChebychev.
Lemma 2 [Buhmann et al., 2014]VarZ can be asymptotically approximatedvia Erand choiceD:
VarZ ∼ (EZ )2(σ2β2Erand choiceD).
Computedependencies
P(``Z close to EZ’’)viaVar Z
Breaking E log Z into two cond’s
Final boundingvia adjustingparameter
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 13/24
Proof Outline – III
• Break E log Z into
E log Z = E[log Z | A] · P(A) + E[log Z1(A)]
≥ (logEZ + log ε)P(A) + E[log Z1(A)]
Fact 3Can expand logEZ via Taylor expansion(used assumptions of Theorem) andbound P(A) from previous.
Fact 4E[log Z1(A)]: enough to bound loosely
Computedependencies
P(``Z close to EZ’’)viaVar Z
Breaking E log Z into two cond’s
Final boundingvia adjustingparameter
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 14/24
Proof Outline – IV
• Finally, the right choice of ε for tworegimes of β gives the phase transitionin lower bound
limn→∞
E[log Z ] + · · ·· · ·
>
{1 + β2σ2
2 , β <√
2σ,
βσ√
2, β ≥√
2σ.
• The same phase transition happensfor upper bound, — easier to prove(no computing dependencies)
Computedependencies
P(``Z close to EZ’’)viaVar Z
Breaking E log Z into two cond’s
Final boundingvia adjustingparameter
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 15/24
Possible Application
Genericproblem
Case study:“sparse” minimum bisection
Main result formulation
Proof outline
Application
Corollary
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 16/24
Application Setting:Gibbs Sampling Regularizer
• Remember our approach:
• Q: what is regularizing here?A: choice of β — essentially controls “width”
• Q: how?A: maximize expected log-convolution:
β∗ = arg maxβ
EX ′,X ′′
[log
∑c
pβ(c|X ′)pβ(c|X ′′)]
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 17/24
Application Setting:Gibbs Sampling Regularizer
• Remember our approach:
• Q: what is regularizing here?A: choice of β — essentially controls “width”
• Q: how?A: maximize expected log-convolution:
β∗ = arg maxβ
EX ′,X ′′
[log
∑c
pβ(c|X ′)pβ(c|X ′′)]
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 17/24
Application Setting:Gibbs Sampling Regularizer
• Remember our approach:
• Q: what is regularizing here?A: choice of β — essentially controls “width”
• Q: how?A: maximize expected log-convolution:
β∗ = arg maxβ
EX ′,X ′′
[log
∑c
pβ(c|X ′)pβ(c|X ′′)]
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 17/24
Application Setting: Intuition• Log-convolution is “almost” cross entropy;
• It stabilizes solution output:β = 0.25: underfitting
c opt(X
′ )
c opt(X
′′)
pβ(·|X ′) pβ(·|X ′′)
β = 2.8: near-optimal β = 19.5: overfitting
2 6 10 14 18
β = 0.25
β = 2.8
β = 19.5
β
solutions solutions solutions
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 18/24
Application Setting: Intuition• Log-convolution is “almost” cross entropy;• It stabilizes solution output:
β = 0.25: underfitting
c opt(X
′ )
c opt(X
′′)
pβ(·|X ′) pβ(·|X ′′)
β = 2.8: near-optimal β = 19.5: overfitting
2 6 10 14 18
β = 0.25
β = 2.8
β = 19.5
β
solutions solutions solutions
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 18/24
Corollary and Application
Genericproblem
Case study:“sparse” minimum bisection
Main result formulation
Proof outline
Application
Corollary
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 19/24
Corollary: Log-Convolution• The log-convolution score can be rewritten:
E log∑
c
pβ(c|X ′)pβ(c|X ′′)
= E log Z (β,X ′ + X ′′)︸ ︷︷ ︸Thm
−2E log Z (β,X )︸ ︷︷ ︸Thm
• Thus possible to analytically compute score:
noise-to-signal
ratio
less noise
greater beta
more noise
less beta
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 20/24
Corollary: Log-Convolution• The log-convolution score can be rewritten:
E log∑
c
pβ(c|X ′)pβ(c|X ′′)
= E log Z (β,X ′ + X ′′)︸ ︷︷ ︸Thm
−2E log Z (β,X )︸ ︷︷ ︸Thm
• Thus possible to analytically compute score:
noise-to-signal
ratio
less noise
greater beta
more noise
less beta
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 20/24
Conclusion
• We computed asymptotically precisely the valueof E log Z with small dependencies
• It has applications to model validation(submitted to J. of Theor. CS) in machinelearning: method involves comparing twofluctuating Gibbs distributions
• Still not found a general way (and probablythere is no such way)
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 21/24
Thanks for your attention!
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 22/24
References I
E. T. Jaynes,Information Theory and Statistical Mechanics.Phys. Rev., 106, 1957, p. 620.
W. Szpankowski,Combinatorial optimization problems for whichalmost every algorithm is asymptotically optimal.Optimization, 33, 1995, pp. 359–368.
M. Talagrand.Spin Glasses: A Challenge for Mathematicians.Springer, NY, 2003.
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 23/24
References II
J. Vannimenus and M. Mezard,On the Statistical Mechanics of OptimizationProblems of the traveling Salesman Type.J. de Physique Lettres, 45, 1984, pp. 1145–1153.
J. M. Buhmann, A. Gronskiy and W. SzpankowskiFree Energy Rates for a Class of CombinatorialOptimization ProblemsAofA’14, Paris
J. M. Buhmann, A. Gronskiy, J. Dumazert andW. SzpankowskiPhase Transitions in Parameter Rich OptimizationProblemsSODA/ANALCO’17, Barcelona
AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 24/24