Transcript of Fast MCMC Algorithms on Polytopes
Raaz Dwivedi, Department of EECS
Joint work with Yuansi Chen, Martin Wainwright, and Bin Yu
Random Sampling
• Consider the problem of drawing random samples from a given density π* (known up to proportionality):
  X_1, X_2, …, X_m ∼ π*
Applications
• Probabilities of Events
• Rare Event Simulations
• Bayesian Posterior Mean
• Volume Computation (polynomial time)

Given X_1, X_2, …, X_m ∼ π*:
  E[g(X)] = ∫ g(x) π*(x) dx ≈ (1/m) Σ_{i=1}^m g(X_i)
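As a toy illustration of the estimator above, one can take π* to be the uniform density on [0, 1] and g(x) = x² (both choices are assumptions made purely for this sketch), so that E[g(X)] = 1/3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: pi* = uniform on [0, 1], g(x) = x^2, so E[g(X)] = 1/3.
m = 100_000
samples = rng.uniform(0.0, 1.0, size=m)

# Monte Carlo estimate: E[g(X)] = integral of g(x) pi*(x) dx is
# approximated by the sample average (1/m) * sum_i g(X_i).
estimate = (samples ** 2).mean()
print(estimate)  # close to 1/3
```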
Applications
• Zeroth order optimization of min_{x ∈ K} g(x): polynomial time algorithms based on random walks
• Convex optimization: Bertsimas and Vempala 2004, Kalai and Vempala 2006, Kannan and Narayanan 2012, Hazan et al. 2015
• Non-convex optimization, Simulated Annealing: Aarts and Korst 1989, Rakhlin et al. 2015
Uniform Sampling on Polytopes
• Integration of arbitrary functions under linear constraints
• Mixed Integer Programming
• Sampling non-negative integer matrices with specified row and column sums (contingency tables)
• Connections between optimization and sampling algorithms
Goal
Given A and b, and a starting distribution μ_0, design an MCMC algorithm that generates a random sample from the uniform distribution on
  X = { x ∈ R^d | Ax ≤ b }
in as few steps as possible!

Convergence rate (mixing time in total variation): the smallest k such that ‖μ_0 P^k − π*‖_TV ≤ ε.
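The mixing-time criterion can be made concrete on a toy discrete chain; the 3-state lazy random walk below is chosen purely for illustration and is not a polytope walk. We iterate μ_0 P^k until the total variation distance to π* drops below ε:

```python
import numpy as np

# Toy 3-state chain: symmetric and doubly stochastic, so pi* is uniform.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi_star = np.full(3, 1.0 / 3.0)
mu = np.array([1.0, 0.0, 0.0])   # point-mass starting distribution mu_0

# Mixing time: smallest k with ||mu_0 P^k - pi*||_TV <= eps.
eps = 1e-3
k = 0
while 0.5 * np.abs(mu - pi_star).sum() > eps:
    mu = mu @ P
    k += 1
print(k)
```

Here the deviation μ_0 P^k − π* shrinks by a factor 1/4 per step (the second eigenvalue of P), so the loop terminates after a handful of iterations.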
Markov Chain Monte Carlo
• Design a Markov chain which can converge to the desired distribution
• Metropolis–Hastings Algorithms (1950s), Gibbs Sampling (1980s)
• Simulate the Markov chain for several steps to get a sample
Markov Chain Monte Carlo
• Sampling on convex sets: Ball Walk (Lovász et al. 1990), Hit-and-run (Smith et al. 1993, Lovász 1999)
• Sampling on polytopes: Dikin Walk (Kannan and Narayanan 2012, Narayanan 2015, Sachdeva and Vishnoi 2016), Geodesic Walk (Lee and Vempala 2016)
Ball Walk [Lovász and Simonovits 1990]
• Propose a uniform point in a ball around x:
  z ∼ U[ B(x, c/√d) ]
• Reject if z is outside the polytope, else move to it
Ball Walk [Lovász and Simonovits 1990]
• Mixing time depends on the conditioning of the set, R_max/R_min (radii of the smallest circumscribed and largest inscribed balls):
  #steps = O( d² R²_max / R²_min ),  per-step cost = nd
• The number of steps can be exponential in d
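A minimal numpy sketch of the ball walk on a polytope {x : Ax ≤ b}; the ball radius (c/√d in the slides) is left as a free parameter, and the unit square is an illustrative test polytope:

```python
import numpy as np

def ball_walk_step(x, A, b, radius, rng):
    """One ball-walk step: propose z uniform in the ball B(x, radius);
    stay at x if z falls outside the polytope {y : A y <= b}."""
    d = x.shape[0]
    # Uniform draw from a d-ball: random direction scaled by radius * U^(1/d).
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    z = x + radius * rng.uniform() ** (1.0 / d) * direction
    return z if np.all(A @ z <= b) else x

# Example: 1000 steps inside the square [-1, 1]^2 written as Ax <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(1000):
    x = ball_walk_step(x, A, b, radius=0.5, rng=rng)
```

Every iterate stays inside the polytope because out-of-set proposals are rejected, which is exactly the mechanism whose rejection rate degrades for badly conditioned sets.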
Dikin Walk [Kannan and Narayanan 2012]
• Proposal:
  z ∼ N( x, (r²/d) D_x⁻¹ )
• Another variant:
  z ∼ U[ D_x(r) ]  (uniform over the Dikin ellipsoid of radius r)
• Accept–Reject:
  P(accept z) = min{ 1, P(z → x) / P(x → z) }
Dikin Walk [Kannan and Narayanan 2012]
• Proposal: z ∼ N( x, (r²/d) D_x⁻¹ ), with
  D_x = Σ_{i=1}^n a_i a_i^T / (b_i − a_i^T x)²,
where a_1^T, …, a_n^T are the rows of A and K = { x ∈ R^d | Ax ≤ b }.
• D_x is the Hessian of the log barrier, as in the Log Barrier Method from optimization [Dikin 1967, Nemirovski 1990].
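A minimal numpy sketch of one Metropolis-adjusted Dikin walk step, using the Gaussian-proposal variant above; the step size r and the square test polytope are illustrative choices, not values from the talk:

```python
import numpy as np

def dikin_matrix(x, A, b):
    """Log-barrier Hessian D_x = sum_i a_i a_i^T / (b_i - a_i^T x)^2."""
    S = A / (b - A @ x)[:, None]    # row i: a_i^T / (b_i - a_i^T x)
    return S.T @ S

def dikin_step(x, A, b, r, rng):
    """One Metropolis-adjusted step with proposal z ~ N(x, (r^2/d) D_x^{-1})."""
    d = x.shape[0]
    Dx = dikin_matrix(x, A, b)
    # Sample from N(x, (r^2/d) Dx^{-1}) via the Cholesky factor of Dx.
    L = np.linalg.cholesky(Dx)
    z = x + (r / np.sqrt(d)) * np.linalg.solve(L.T, rng.standard_normal(d))
    if not np.all(A @ z < b):
        return x                    # proposal left the polytope: reject
    Dz = dikin_matrix(z, A, b)
    # Accept with probability min{1, P(z -> x) / P(x -> z)}: the two Gaussian
    # densities differ in both their covariances and their normalizations.
    v = z - x
    log_ratio = 0.5 * (np.linalg.slogdet(Dz)[1] - np.linalg.slogdet(Dx)[1]) \
        + (d / (2.0 * r ** 2)) * (v @ Dx @ v - v @ Dz @ v)
    return z if np.log(rng.uniform()) < log_ratio else x

# Example: run the walk inside the square [-1, 1]^2.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(1000):
    x = dikin_step(x, A, b, r=0.5, rng=rng)
```

Note the state-dependent covariance: the acceptance ratio must compare det(D_z) against det(D_x), which is where the smoothness of x ↦ D_x (used in the proofs) enters.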
Upper bounds (n = #constraints, d = #dimensions, n > d):

                 Ball Walk             Dikin Walk
#Steps           d² R²_max / R²_min    nd
Per-step cost    nd                    nd²
Slow mixing of Dikin Walk
[Figure: Dikin walk iterates for #constraints = 4 and #constraints = 128; mixing visibly slows as the number of constraints grows.]
–Lovász’s Lemma
“If any two points that are Δ apart have ρ overlap in their transition distributions, then the chain mixes in O( 1 / (Δ²ρ²) ) steps.”
(Distance and overlap measured in appropriate metrics.)

For any fixed overlap ρ, we want far-apart points to have overlapping transition regions; hence large ellipsoids (contained within the polytope) are useful.
Improving Dikin Walk
Dikin Proposal [Kannan and Narayanan 2012], from the Log Barrier Method [Dikin 1967, Nemirovski 1990]:
  z ∼ N( x, (r²/d) D_x⁻¹ ),  D_x = Σ_{i=1}^n a_i a_i^T / (b_i − a_i^T x)²
Idea: importance weighting of the constraints,
  D_x → Σ_{i=1}^n w_i(x) a_i a_i^T / (b_i − a_i^T x)²
Improving Dikin Walk: sampling meets optimization (again!!)

Dikin Proposal [Kannan and Narayanan 2012], from the Log Barrier Method [Dikin 1967, Nemirovski 1990]:
  z ∼ N( x, (r²/d) D_x⁻¹ ),  D_x = Σ_{i=1}^n a_i a_i^T / (b_i − a_i^T x)²

Vaidya Proposal, from the Volumetric Barrier Method [Vaidya 1993]:
  z ∼ N( x, (r²/√(nd)) V_x⁻¹ ),  V_x = Σ_{i=1}^n ( σ_{x,i} + d/n ) a_i a_i^T / (b_i − a_i^T x)²,
  σ_{x,i} = a_i^T D_x⁻¹ a_i / (b_i − a_i^T x)²

Vaidya Walk [Chen, Dwivedi, Wainwright and Yu 2017]
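The covariance matrix V_x can be sketched directly from the formulas above; the σ_{x,i} are leverage scores of the slack-scaled constraint rows. A minimal numpy version (the square test polytope is an illustrative choice):

```python
import numpy as np

def vaidya_matrix(x, A, b):
    """V_x = sum_i (sigma_{x,i} + d/n) a_i a_i^T / (b_i - a_i^T x)^2, where
    sigma_{x,i} = a_i^T D_x^{-1} a_i / (b_i - a_i^T x)^2 (leverage scores)."""
    n, d = A.shape
    S = A / (b - A @ x)[:, None]        # slack-scaled constraint rows
    Dx = S.T @ S                        # log-barrier Hessian
    # sigma_{x,i} = s_i^T Dx^{-1} s_i for each scaled row s_i of S.
    sigma = np.einsum('ij,ij->i', S @ np.linalg.inv(Dx), S)
    return (S * (sigma + d / n)[:, None]).T @ S

# Example: for the square [-1, 1]^2 at the origin, every constraint has
# sigma_{x,i} = 1/2, so the weights are 1/2 + 2/4 = 1 and V_x = A^T A = 2 I.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
Vx = vaidya_matrix(np.zeros(2), A, b)
```

The Vaidya proposal then draws z ∼ N( x, (r²/√(nd)) V_x⁻¹ ); the d/n term keeps every constraint's weight bounded away from zero even when its leverage score is tiny.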
[Figure: Dikin and Vaidya walk iterates for #constraints = 4 and #constraints = 128.]
Convergence Rates (n = #constraints, d = #dimensions, n > d):

                 Ball Walk             Dikin Walk   Vaidya Walk
#Steps           d² R²_max / R²_min    nd           n^0.5 d^1.5
Per-step cost    nd                    nd²          nd²
[Figure: initial distribution and target (uniform) distribution.]
Dikin Walk vs Vaidya Walk (#experiments = 200, #dimensions = 2, #constraints = 64)
[Figure: empirical distributions of Dikin and Vaidya walk iterates at k = 0, 1, 10, 100, 500, 1000 iterations.]
Small number of constraints: No Winner!
Dikin Walk vs Vaidya Walk (#experiments = 200, #constraints = 2048)
[Figure: empirical distributions of Dikin and Vaidya walk iterates at k = 10, 100, 500, 1000 iterations.]
Large number of constraints: Vaidya walk wins!
Dikin Walk vs Vaidya Walk: approximate mixing time vs #constraints (n)
[Figure: log-log plot of the estimated mixing time k̂_mix against n: the Dikin walk scales as ∝ n^0.9 (bound: O(nd)), the Vaidya walk as ∝ n^0.45 (bound: O(n^0.5 d^1.5)).]
[Figure: sample snapshots at k = 0, 10, 100, 500, 1000 for #constraints = 64 and #constraints = 2048.]
Can we improve further?

Dikin Proposal [Kannan and Narayanan 2012], from the Log Barrier Method [Dikin 1967, Nemirovski 1990]:
  z ∼ N( x, (r²/d) D_x⁻¹ ),  D_x = Σ_{i=1}^n a_i a_i^T / (b_i − a_i^T x)²

Vaidya Proposal, from Vaidya’s Volumetric Barrier Method [Vaidya 1993]:
  z ∼ N( x, (r²/√(nd)) V_x⁻¹ ),  V_x = Σ_{i=1}^n ( σ_{x,i} + d/n ) a_i a_i^T / (b_i − a_i^T x)²,
  σ_{x,i} = a_i^T D_x⁻¹ a_i / (b_i − a_i^T x)²
John Walk [Chen, Dwivedi, Wainwright and Yu 2017]
John Proposal, from John’s Ellipsoidal Algorithm [Fritz John 1948, Lee and Sidford 2015]:
  z ∼ N( x, (r²/d^1.5) J_x⁻¹ ),  J_x = Σ_{i=1}^n j_{x,i} a_i a_i^T / (b_i − a_i^T x)²,
where the weights j_{x,i} are given by the solution of a convex program.
Mixing Times (n = #constraints, d = #dimensions, n > d):

                 Dikin Walk   Vaidya Walk    John Walk
#Steps           nd           n^0.5 d^1.5    d^2.5 log^4(n/d)
Per-step cost    nd²          nd²            nd² log² n
Conjecture (n = #constraints, d = #dimensions, n > d):

                 Dikin Walk   Vaidya Walk    John Walk
#Steps           nd           n^0.5 d^1.5    d² log^c(n/d)
Per-step cost    nd²          nd²            nd² log² n
Proof Idea
• The proof relies on Lovász’s Lemma
• Need to establish that nearby points have similar transition distributions
• Have to show that the weight matrices are sufficiently smooth; the use of state-dependent weights makes this involved