Mixing

Dana Randall, Georgia Tech

A tutorial on Markov chains

( Slides at: www.math.gatech.edu/~randall )

Outline

Fundamentals for designing a Markov chain

Bounding running times (convergence rates)

Connections to statistical physics

Markov chains for sampling

Given: A large set (matchings, colorings, independent sets, …)

Main Q: What do typical elements look like?

Random sampling can be used to:

• Determine properties of “typical” elements
• Evaluate thermodynamic properties (such as free energy, entropy, …)
• Estimate the cardinality of the set

“Markov chain Monte Carlo”

Andrei Andreyevich Markov 1856-1922

Markov chains

Sampling using Markov chains

State space Ω ( $|\Omega| \sim c^n$ )

Step 1. Connect the state space.

E.g., if Ω = indep. sets of a graph G, connect I and I’ iff $|I \oplus I'| = 1$ (they differ in exactly one vertex).

Basics of Markov chains

Transitions P: Random walk on H (the graph connecting the state space).

Starting at x:
- Pick a neighbor y.
- Move to y with prob. P(x,y) = 1/∆, where ∆ is the max deg in H.
- With all remaining prob. stay at x.

Def’n: A MC is ergodic if it is:
• irreducible - for all x,y ∈ Ω, ∃ t: $P^t(x,y) > 0$; (connected)
• aperiodic - for all x,y ∈ Ω, g.c.d.{ t: $P^t(x,y) > 0$ } = 1. (not bipartite)

( $P^t(x,y)$ is the “t step” transition prob. )

The stationary distribution

Thm: Any finite, ergodic MC converges to a unique stationary distribution π.

Thm: If π satisfies the detailed balance condition

π(x) P(x,y) = π(y) P(y,x) for all x,y ∈ Ω,

then π is the stationary distribution.

P symmetric ⇒ π is uniform.
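As a small sanity check of the two theorems above, the following sketch builds the walk with P(x,y) = 1/∆ on the independent sets of a 3-vertex path and iterates a starting distribution. The example graph and the helper names (`independent_sets`, `walk_matrix`, `step`) are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

def independent_sets(n, edges):
    """Enumerate all independent sets of a graph on vertices 0..n-1."""
    found = []
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            s = frozenset(combo)
            if all(not (u in s and v in s) for u, v in edges):
                found.append(s)
    return found

def walk_matrix(states):
    """P(x,y) = 1/Delta for neighbors (sets differing in one vertex); rest stays at x."""
    nbrs = {x: [y for y in states if len(x ^ y) == 1] for x in states}
    delta = max(len(v) for v in nbrs.values())
    idx = {x: i for i, x in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for x in states:
        for y in nbrs[x]:
            P[idx[x]][idx[y]] = 1.0 / delta
        P[idx[x]][idx[x]] = 1.0 - len(nbrs[x]) / delta
    return P

def step(dist, P):
    """One step of the chain applied to a row vector of probabilities."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

edges = [(0, 1), (1, 2)]                      # path on 3 vertices (assumed example)
states = independent_sets(3, edges)
P = walk_matrix(states)
dist = [1.0] + [0.0] * (len(states) - 1)      # start at the empty set
for _ in range(200):
    dist = step(dist, P)
print(dist)                                   # each entry should be close to 1/len(states)
```

Because P is symmetric here, the iterated distribution flattens out to the uniform one, as the last line of the slide predicts.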

Sampling from non-uniform distributions

Q: What if we want to sample from some other distribution?

E.g., for λ > 0, sample ind. set I w/ prob. $\pi(I) = \lambda^{|I|}/Z$, where $Z = \sum_J \lambda^{|J|}$.

Step 2. Carefully define the transition probabilities.

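The Metropolis Algorithm (MRRTT ’53)

Propose a move from x to y as before, but accept with probability min(1, π(y)/π(x)) (with remaining probability stay at x).

If π(x) ≥ π(y), then $P(x,y) = \frac{1}{\Delta}\,\frac{\pi(y)}{\pi(x)}$ and $P(y,x) = \frac{1}{\Delta}$, so

π(x) P(x,y) = π(y) P(y,x).

For independent sets, when y adds one vertex to x = I:

$$\frac{\pi(y)}{\pi(x)} = \frac{\lambda^{|I|+1}/Z}{\lambda^{|I|}/Z} = \lambda,$$

so additions are accepted with prob. min(1, λ) and removals with prob. min(1, $\lambda^{-1}$).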
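The detailed balance claim for the Metropolis rule can be checked numerically on a tiny instance. The sketch below assumes the target π(I) proportional to λ^|I| on the independent sets of a 4-cycle, with λ = 2; the graph, the value of λ, and the function names are assumptions made for illustration.

```python
from itertools import combinations

def independent_sets(n, edges):
    """Enumerate all independent sets of a graph on vertices 0..n-1."""
    found = []
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            s = frozenset(combo)
            if all(not (u in s and v in s) for u, v in edges):
                found.append(s)
    return found

def metropolis_matrix(states, weight):
    """Propose each neighbor w.p. 1/Delta, accept w.p. min(1, w(y)/w(x))."""
    nbrs = {x: [y for y in states if len(x ^ y) == 1] for x in states}
    delta = max(len(v) for v in nbrs.values())
    P = {x: {y: 0.0 for y in states} for x in states}
    for x in states:
        for y in nbrs[x]:
            P[x][y] = min(1.0, weight(y) / weight(x)) / delta
        P[x][x] = 1.0 - sum(P[x].values())    # remaining probability: stay at x
    return P

lam = 2.0                                      # assumed value of lambda
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]       # 4-cycle (assumed example)
states = independent_sets(4, edges)
weight = lambda I: lam ** len(I)
Z = sum(weight(I) for I in states)
pi = {I: weight(I) / Z for I in states}
P = metropolis_matrix(states, weight)

# Largest violation of pi(x) P(x,y) = pi(y) P(y,x) over all pairs.
max_violation = max(abs(pi[x] * P[x][y] - pi[y] * P[y][x])
                    for x in states for y in states)
print(max_violation)
```

Only ratios of weights enter the acceptance rule, so the normalizing constant Z is never needed to run the chain; it is computed here only to verify the result.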

Basics continued…

Step 1. Connect the state space.
Step 2. Carefully define the transition probabilities.

Starting at any state $x_0$, take a random walk for some number of steps . . . and output the final state (from π?).

Q: But for how long do we walk?

Step 3. Bound the mixing time. This tells us the number of steps to take.

The mixing rate

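Def’n: The total variation distance is

$$\|P^t, \pi\| = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|.$$

Def’n: Given ε, the mixing time is

$$\tau(\varepsilon) = \min \{\, t : \|P^{t'}, \pi\| < \varepsilon \ \ \forall\, t' \ge t \,\}.$$

A Markov chain is rapidly mixing if τ(ε) is poly(n, log($\varepsilon^{-1}$)).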
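For a chain small enough to write down as a matrix, both definitions can be evaluated directly. The sketch below uses a lazy walk on a 6-cycle as an assumed example and returns the first t at which the worst-case distance drops below ε; this equals the mixing time because the worst-case total variation distance to stationarity never increases with t. All names are illustrative.

```python
def matmul(A, B):
    """Plain-Python product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv_distance(Pt, pi):
    """max over start states x of (1/2) * sum_y |P^t(x,y) - pi(y)|."""
    return max(0.5 * sum(abs(row[y] - pi[y]) for y in range(len(pi)))
               for row in Pt)

def mixing_time(P, pi, eps, t_max=10_000):
    """First t with worst-case TV distance below eps."""
    Pt = P
    for t in range(1, t_max + 1):
        if tv_distance(Pt, pi) < eps:
            return t
        Pt = matmul(Pt, P)
    raise RuntimeError("did not mix within t_max steps")

def lazy_cycle(m):
    """Stay w.p. 1/2, move to each cycle neighbor w.p. 1/4 (assumed example chain)."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][i] = 0.5
        P[i][(i + 1) % m] += 0.25
        P[i][(i - 1) % m] += 0.25
    return P

m = 6
P = lazy_cycle(m)
pi = [1.0 / m] * m
print(tv_distance(P, pi), mixing_time(P, pi, 0.25), mixing_time(P, pi, 0.01))
```

This brute-force approach costs time polynomial in |Ω|, which is exponential in n for the state spaces in this talk; that is why the analytic bounds in the following slides are needed.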

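Spectral gap

Let $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_{|\Omega|}|$ be the eigenvalues of P.

Def’n: Gap(P) = $1 - |\lambda_2|$ is the spectral gap.

Mixing rate ⟷ Spectral gap

Thm: (Alon, Alon-Milman, Sinclair) With $\pi_* = \min_x \pi(x)$,

$$\tau(\varepsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\Big(\frac{1}{\pi_* \varepsilon}\Big), \qquad \tau(\varepsilon) \ge \frac{|\lambda_2|}{2\,\mathrm{Gap}(P)} \log\Big(\frac{1}{2\varepsilon}\Big).$$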
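As a worked instance, take the walk on $\{0,1\}^n$ that picks a coordinate uniformly at random and resets it to a uniform random bit (the hypercube walk used as an example later in the talk). The calculation is a sketch that assumes the upper bound in the form $\tau(\varepsilon) \le \mathrm{Gap}(P)^{-1} \ln(1/(\pi_* \varepsilon))$ with $\pi_* = \min_x \pi(x)$. The eigenvalues follow because P is the average of n commuting single-coordinate resampling operators, each with eigenvalues 1 and 0; an eigenvector is indexed by the set S of coordinates on which it picks the eigenvalue 0.

```latex
\[
\lambda_S = 1 - \frac{|S|}{n} \quad (S \subseteq [n]), \qquad
\mathrm{Gap}(P) = 1 - \Big(1 - \frac{1}{n}\Big) = \frac{1}{n}, \qquad
\pi_* = 2^{-n},
\]
\[
\tau(\varepsilon) \;\le\; n \ln\!\Big(\frac{2^{n}}{\varepsilon}\Big)
  = n\big(n \ln 2 + \ln \varepsilon^{-1}\big)
  = O\big(n^{2} + n \ln \varepsilon^{-1}\big).
\]
```

The gap gives a polynomial bound, but the factor $\ln(1/\pi_*) = n \ln 2$ makes it looser than what a direct argument on this chain can achieve.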

Outline

Fundamentals for designing a Markov chain

Bounding running times (convergence rates)

Connections to statistical physics

Outline for rest of talk

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Coupling

Simulate 2 processes:
• Start at any $x_0$ and $y_0$.
• Couple moves, but each simulates the MC.
• Once they agree, they move in sync ($x_t = y_t \Rightarrow x_{t+1} = y_{t+1}$).

Def’n: A coupling is a MC on Ω x Ω:
1) Each process $X_t$, $Y_t$ is a faithful copy of the original MC,
2) If $X_t = Y_t$, then $X_{t+1} = Y_{t+1}$.

Coupling

The coupling time T is:

$$T = \max_{x,y} E[\,T^{x,y}\,], \quad \text{where } T^{x,y} = \min \{\, t : X_t = Y_t \mid X_0 = x,\ Y_0 = y \,\}.$$

Thm: $\tau(\varepsilon) \le T\, e \ln \varepsilon^{-1}$. (Aldous’81)

Ex1: Walk on the hypercube

MCCUBE:
• Start at $v_0$ = (0,0,…,0).
• Repeat:
  - Pick i ∈ [n], b ∈ {0,1}.
  - Set $v_i$ = b.

Symmetric, ergodic ⇒ π is uniform.

Mixing time? Use coupling (both copies use the same i and b):

            x0 = 0 1 1 0 0 1    y0 = 1 1 1 0 0 0
i=2, b=0:   x1 = 0 0 1 0 0 1    y1 = 1 0 1 0 0 0
i=6, b=1:   x2 = 0 0 1 0 0 1    y2 = 1 0 1 0 0 1
i=1, b=1:   . . .
            xt = 1 0 1 1 1 0    yt = 1 0 1 1 1 0

so T ≈ n ln n (coupon collecting) and τ(ε) = O( n ln(n $\varepsilon^{-1}$) ).
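The coupon-collecting estimate can be checked by simulation. The sketch below runs the coupling in which both copies use the same (i, b), starting from two strings that disagree everywhere (the worst case), and compares the average coupling time with n·H_n, the expected coupon collector time. The function names, n = 6, and the number of trials are assumptions chosen for illustration.

```python
import random

def coupling_time(n, rng):
    """Steps until two MCCUBE copies driven by the same (i, b) agree."""
    x = [0] * n
    y = [1] * n                      # worst case: every coordinate disagrees
    t = 0
    while x != y:
        i = rng.randrange(n)         # shared coordinate choice
        b = rng.randrange(2)         # shared bit choice
        x[i] = b
        y[i] = b                     # coordinate i agrees from now on
        t += 1
    return t

def average_coupling_time(n, trials, seed=0):
    rng = random.Random(seed)
    return sum(coupling_time(n, rng) for _ in range(trials)) / trials

n = 6
estimate = average_coupling_time(n, trials=2000)
coupon_bound = n * sum(1.0 / k for k in range(1, n + 1))   # n * H_n
print(estimate, coupon_bound)
```

Each copy on its own is a faithful run of MCCUBE, since (i, b) is uniform for both; the coupling only correlates the two runs.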

Outline

Techniques:

•Coupling - path coupling

•Flows and paths

•Indirect methods

Problems:

•Colorings

•Matchings

•Independent sets

Ex 2: Colorings

Given: A graph G (max deg d), k > 1.
Goal: Find a random k-coloring of G.

MCCOL: (Single point replacement)
• Starting at some k-coloring $C_0$
• Repeat:
  - With prob 1/2 do nothing. (The “lazy” chain)
  - Pick v ∈ V, c ∈ [k];
  - Recolor v with c, if possible.

Note: k ≥ d + 1 ⇒ colorings exist. (Greedy)

If k ≥ d + 2, then the state space is connected. (Therefore π is uniform.)
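A minimal sketch of MCCOL follows, with a greedy routine to produce the starting coloring. The adjacency-list representation, the 4-cycle example, and all function names are assumptions for illustration; "if possible" is read as "no neighbor of v currently has color c".

```python
import random

def greedy_coloring(adj, k):
    """Color vertices in order with the smallest free color (needs k >= d + 1)."""
    coloring = [None] * len(adj)
    for v in range(len(adj)):
        used = {coloring[u] for u in adj[v]}
        coloring[v] = next(c for c in range(k) if c not in used)
    return coloring

def is_proper(coloring, adj):
    return all(coloring[v] != coloring[u]
               for v in range(len(adj)) for u in adj[v])

def mccol_step(coloring, adj, k, rng):
    """One step of the lazy single-point-replacement chain (mutates coloring)."""
    if rng.random() < 0.5:
        return                                   # lazy step: do nothing
    v = rng.randrange(len(adj))
    c = rng.randrange(k)
    if all(coloring[u] != c for u in adj[v]):    # recolor only if still proper
        coloring[v] = c

adj = [[1, 3], [0, 2], [1, 3], [2, 0]]           # 4-cycle, max degree d = 2
k = 4                                            # k >= d + 2
rng = random.Random(0)
coloring = greedy_coloring(adj, k)
for _ in range(1000):
    mccol_step(coloring, adj, k, rng)
print(coloring, is_proper(coloring, adj))
```

How many steps are enough is exactly the mixing-time question; the 1000 used here is arbitrary.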

Path Coupling [Bubley, Dyer, Greenhill ’97-8]

Coupling: Show for all x,y ∈ Ω, E[ ∆(dist(x,y)) ] < 0.

Path coupling: Show for all u,v s.t. dist(u,v)=1, that E[ ∆(dist(u,v)) ] < 0.

Consider a shortest path: x = $z_0, z_1, z_2, \dots, z_r$ = y, with dist($z_i, z_{i+1}$) = 1 and dist(x,y) = r. Then

$$E[\Delta(\mathrm{dist}(x,y))] \le \sum_i E[\Delta(\mathrm{dist}(z_i, z_{i+1}))] \le 0.$$

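Path coupling for MCCOL

Thm: MCCOL is rapidly mixing if k ≥ 3d. (Jerrum ‘95)

Pf: Use path coupling: dist(x,y) = 1, so x and y differ at a single vertex w. Both copies pick the same v and c.

Cases:
• v = w, c ∈ C \ { colors on N(w) }: ∆dist = -1,
• v ∈ N(w), c ∈ { x(w), y(w) }: ∆dist = +1 (or 0),
• o.w.: ∆dist = 0.

$$E[\Delta \mathrm{dist}] \le \frac{1}{2nk}\big( (k-d)(-1) + 2d(+1) \big) = \frac{1}{2nk}(3d - k) \le 0.$$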

Summary: Coupling

Pros: Can yield very easy proofs

Cons: Demands a lot from the chain

Extensions:
• Careful coupling (k ≥ 2d) (Jerrum’95)
• Change the MC (Luby-R-Sinclair’95)
• “Macromoves”
  - burn in (Dyer-Frieze’01, Molloy’02)
  - non-Markovian couplings (Hayes-Vigoda’03)

Outline

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Colorings

•Matchings

•Independent sets

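Conductance and flows

Conductance (Jerrum-Sinclair’88):

$$\Phi = \min_{S \subseteq \Omega,\ \pi(S) \le 1/2} \Phi(S), \qquad \Phi(S) = \frac{\sum_{s \in S,\ s' \in S^C} \pi(s) P(s,s')}{\sum_{s \in S} \pi(s)}.$$

Thm: $\frac{\Phi^2}{2} \le \mathrm{Gap}(P) \le 2\Phi$.

Min cut ⟷ Max flow

Canonical paths (Sinclair’92): Make $|\Omega|^2$ canonical paths $\gamma_{xy}$: from x ∈ Ω, to y ∈ Ω, x ≠ y, carrying π(x)π(y) units of flow.

Capacity of e=(u,v): Q(e) = π(u) P(u,v) = π(v) P(v,u).

The congestion of these paths is:

$$\rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y), \qquad \bar\rho = \min_\Gamma \rho(\Gamma)\,\ell(\Gamma)$$

( $\ell(\Gamma)$ is the max path length ).

Thm: $\tau_x(\varepsilon) \le \bar\rho\, \log\big( (\varepsilon\, \pi(x))^{-1} \big)$, for the walk started at x.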

Ex 3: Back to the hypercube

- Define a canonical path from s to t (fix the differing bits from left to right):

  s = 0 1 1 0 0 1
      1 1 1 0 0 1
      1 1 0 0 0 1
  t = 1 1 0 0 0 0

- Bound the number of paths through (u,v) ∈ E.

  (Figure: an edge (u,v) on the path together with its complementary pair (u’,v’).)

- The complementary pair (u’,v’) determines (s,t), so $|\{\gamma_{xy} \ni e\}| = 2^{n-1}$.

$$\rho(\Gamma) = \max_e \frac{\sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)}{Q(e)} = \frac{2^{n-1}\, 2^{-2n}}{2^{-n}\,(1/2n)} = n,$$

and $\ell = n \Rightarrow \bar\rho = \tilde{O}(n^2)$.
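The path count and the congestion value can be confirmed by brute force for small n. The sketch below assumes the left-to-right bit-fixing paths shown in the example, π uniform on the 2^n strings, and Q(e) = 2^{-n}·1/(2n) for the MCCUBE transition; the function names are illustrative.

```python
from itertools import product

def canonical_path(s, t):
    """Bit-fixing path from s to t: flip differing bits left to right.

    Returns the list of directed edges (u, v) that the path uses."""
    edges = []
    cur = list(s)
    for i in range(len(s)):
        if cur[i] != t[i]:
            nxt = cur.copy()
            nxt[i] = t[i]
            edges.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return edges

def congestion(n):
    """Return (max number of paths through an edge, rho) for the n-cube."""
    pi = 2.0 ** -n                       # uniform stationary probability
    Q = pi / (2 * n)                     # capacity: pi(u) * P(u,v), P(u,v) = 1/(2n)
    load = {}
    cube = list(product((0, 1), repeat=n))
    for s in cube:
        for t in cube:
            if s != t:
                for e in canonical_path(s, t):
                    load[e] = load.get(e, 0) + 1
    max_paths = max(load.values())
    return max_paths, max_paths * pi * pi / Q

print(canonical_path((0, 1, 1, 0, 0, 1), (1, 1, 0, 0, 0, 0)))
print(congestion(4))
```

For n = 4 the enumeration visits only 16·15 pairs, so this is instant; the point of the complementary-pair argument is that the same count holds for every n without enumeration.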

Outline

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Colorings

•Matchings

•Independent sets

Ex 4: Sampling matchings

MCMATCH:

Starting at $M_0$, repeat: Pick e = (u,v) ∈ E
- If e ∈ M, remove e;
- If u and v unmatched in M, add e;
- If u matched (by e’) and v unmatched (or vice versa), add e and remove e’;
- Otherwise do nothing.
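The four cases of MCMATCH translate almost line by line into code. In the sketch below a matching is a set of `frozenset({u, v})` edges; the example graph, the step count, and the names `mcmatch_step` and `is_matching` are assumptions for illustration.

```python
import random

def mcmatch_step(matching, edges, rng):
    """One MCMATCH move on a set of frozenset({u, v}) edges (mutates matching)."""
    u, v = rng.choice(edges)
    e = frozenset((u, v))
    if e in matching:                        # e in M: remove e
        matching.remove(e)
        return
    at_u = [f for f in matching if u in f]   # matching edge at u, if any
    at_v = [f for f in matching if v in f]   # matching edge at v, if any
    if not at_u and not at_v:                # both endpoints free: add e
        matching.add(e)
    elif not at_u or not at_v:               # exactly one endpoint matched by e'
        matching.remove((at_u or at_v)[0])   # remove e' ...
        matching.add(e)                      # ... and add e
    # otherwise both endpoints are matched: do nothing

def is_matching(matching):
    """True if no vertex is covered by two edges."""
    covered = [x for f in matching for x in f]
    return len(covered) == len(set(covered))

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # 4-cycle plus a chord (assumed)
rng = random.Random(0)
matching = set()                                   # M_0: the empty matching
for _ in range(1000):
    mcmatch_step(matching, edges, rng)
print(sorted(tuple(sorted(f)) for f in matching))
```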

Thm: Coupling won’t work! (Kumar-Ramesh’99)

Mixing time of MCMATCH

Paths using (u,v) are determined by u’ . . . as before.

Outline

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Colorings

•Matchings

•Independent sets

Ex 5: Independent Sets

Goal: Given λ, sample ind. set I with prob: $\pi(I) = \lambda^{|I|}/Z$, where $Z = \sum_J \lambda^{|J|}$.

MCIND: Starting at $I_0$, Repeat:
- Pick v ∈ V and b ∈ {0,1};
- If v ∈ I, b=0, remove v w.p. min(1, $\lambda^{-1}$);
- If v ∉ I, b=1, add v w.p. min(1, λ) if possible;
- O.w. do nothing.

Slow mixing of MCIND (large λ)

(Figure: the state space split by a cut; one side is labeled “Even”.)

For large λ there is a “bad cut,” . . . so MCIND is slowly mixing.
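The bad cut can be seen numerically on a tiny grid. The sketch below is built on assumptions: it takes the m x m grid graph, lets S be the independent sets with more vertices on the even sublattice than on the odd one, and evaluates Φ(S) for MCIND exactly by enumeration. The choice of graph, of S, and all names are illustrative, not taken from the slides.

```python
def grid_independent_sets(m):
    """Return (cells, all independent sets) of the m x m grid graph."""
    cells = [(r, c) for r in range(m) for c in range(m)]
    found = []
    def extend(i, chosen):
        if i == len(cells):
            found.append(frozenset(chosen))
            return
        extend(i + 1, chosen)                        # leave cell i empty
        r, c = cells[i]
        if (r - 1, c) not in chosen and (r, c - 1) not in chosen:
            chosen.add((r, c))                       # occupy cell i
            extend(i + 1, chosen)
            chosen.remove((r, c))
    extend(0, set())
    return cells, found

def even_minus_odd(I):
    """Number of vertices on the even sublattice minus number on the odd one."""
    return sum(1 if (r + c) % 2 == 0 else -1 for r, c in I)

def cut_conductance(m, lam):
    """Phi(S) for S = {I : more even than odd vertices} under MCIND."""
    cells, sets = grid_independent_sets(m)
    valid = set(sets)
    n = len(cells)
    Z = sum(lam ** len(I) for I in sets)
    mass = flow = 0.0
    for I in sets:
        if even_minus_odd(I) <= 0:
            continue                                 # I is not in S
        p = lam ** len(I) / Z
        mass += p
        for v in cells:
            J = I ^ {v}                              # add or remove v
            if J not in valid or even_minus_odd(J) > 0:
                continue                             # illegal move, or stays in S
            accept = min(1.0, 1.0 / lam) if v in I else min(1.0, lam)
            flow += p * accept / (2 * n)             # pi(I) * P(I, J)
    return flow / mass

for lam in (1.0, 10.0, 100.0, 1e6):
    print(lam, cut_conductance(4, lam))
```

Since Gap(P) ≤ 2Φ ≤ 2Φ(S) whenever π(S) ≤ 1/2, a cut whose conductance shrinks as λ grows forces the spectral gap to shrink with it.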

Summary: Flows

Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing.

Cons: Requires global knowledge of the chain to spread out paths.

Extensions: Balanced flows (Morris-Sinclair’99)

MCMC -- Major highlights:
- The permanent (Jerrum-Sinclair-Vigoda’02)
- Volume of a convex polytope (Dyer-Frieze-Kannan’89, +… )

Outline

Techniques:

•Coupling

•Flows and paths

•Indirect methods - Comparison - Decomposition

Problems:

•Colorings

•Matchings

•Independent sets

Comparison (Diaconis, Saloff-Coste’93)

Comparison, aka . . . The (Adjacency) Matrix Reloaded

unknown: P        known: $\tilde P$

For each edge (x,y) ∈ $\tilde P$, make a path $\gamma_{xy}$ using edges in P.

Let Γ(z,w) be the set of paths $\gamma_{xy}$ using (z,w).

Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\tilde P)$, where

$$A = \max_{e=(z,w)} \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} |\gamma_{xy}|\, \pi(x) \tilde P(x,y).$$

$(S, S^C)$ cannot be a bad cut in P if it isn’t in $\tilde P$.

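Disjoint decomposition (Madras-R.’96, Martin-R.’00)

Partition Ω into disjoint pieces $A_1, \dots, A_m$.

Projection: a chain $\bar P$ on the pieces $a_1, \dots, a_m$ with

$$\bar\pi(a_i) = \pi(A_i), \qquad \bar P(a_i, a_j) = \frac{1}{\pi(A_i)} \sum_{x \in A_i,\ y \in A_j} \pi(x) P(x,y).$$

Restrictions: chains $P_i$, the restriction of P to each $A_i$.

Thm: $\mathrm{Gap}(P) \ge \frac{1}{2}\, \mathrm{Gap}(\bar P)\, \big(\min_i \mathrm{Gap}(P_i)\big)$.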
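The projection chain is simple to compute once P, π, and the partition are given. The sketch below uses a lazy walk on a 6-cycle split into two arcs as an assumed example; `projection_chain` and `lazy_cycle` are hypothetical helper names.

```python
def projection_chain(P, pi, parts):
    """Project P onto a partition: one state a_i per block A_i.

    pi_bar(a_i)     = pi(A_i)
    P_bar(a_i, a_j) = (1 / pi(A_i)) * sum_{x in A_i, y in A_j} pi(x) P(x,y)
    """
    m = len(parts)
    pi_bar = [sum(pi[x] for x in A) for A in parts]
    P_bar = [[sum(pi[x] * P[x][y] for x in parts[i] for y in parts[j]) / pi_bar[i]
              for j in range(m)]
             for i in range(m)]
    return pi_bar, P_bar

def lazy_cycle(m):
    """Stay w.p. 1/2, move to each cycle neighbor w.p. 1/4 (assumed example chain)."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][i] = 0.5
        P[i][(i + 1) % m] += 0.25
        P[i][(i - 1) % m] += 0.25
    return P

P = lazy_cycle(6)
pi = [1.0 / 6] * 6
parts = [[0, 1, 2], [3, 4, 5]]            # two arcs of the cycle
pi_bar, P_bar = projection_chain(P, pi, parts)
print(pi_bar, P_bar)
```

The projection inherits reversibility from P, so the earlier tools (gap, conductance, coupling) can be applied to it and to each restriction separately.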

Ex 6: MCIND on small ind. sets

For G=(V,E): Let Ω = ind. sets of G; $\Omega_k$ = ind. sets of size k.

Thm: MCIND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$, where K = |V|/2(∆+1).

* Consider first the “swap” chain:

MCSWAP: Starting at $I_0$, Repeat:
- Pick (u,v,b) ∈ V x V x {0,1,2};
- If b=0 and u ∈ I, remove u w.p. min(1, $\lambda^{-1}$);
- If b=1 and u ∉ I, add u w.p. min(1, λ) if possible;
- If b=2, remove u and add v (if possible);
- O.w. do nothing.

Ind. sets w/bounded size (cont.)

Restrictions: $\Omega_0,\ \Omega_1,\ \Omega_2,\ \dots,\ \Omega_{K-1},\ \Omega_K$
Projection:   $a_0,\ a_1,\ a_2,\ \dots,\ a_{K-1},\ a_K$

$|\Omega_k|$ is logconcave, . . . so $\bar P$ is rapidly mixing.

The Restrictions of MCSWAP

Thm: MCSWAP is rapidly mixing on $\Omega_k$, k < K. (Bubley-Dyer’97)

Thm: MCSWAP is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Decomposition)

Cor: MCIND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Comparison)

Summary: Indirect methods

Pros: Offer a top down approach; allow hybrid methods to be used.

Cons: Can increase the complexity.

Extensions:
- Comparison thm for log-Sobolev (Diaconis-Saloff-Coste’96)
- Comparison for Glauber dynamics (R.-Tetali ‘98)
- Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda ‘02)

Outline

Techniques:

•Coupling

•Flows and paths

•Hybrid methods

Problems:

•Colorings

•Matchings

•Independent sets

Why Statistical Physics?

• They have a need for sampling
• Use many interesting heuristics
• Great intuition
• Experts on “large data sets”

Microscopic details → Macroscopic behavior (i.e., phase transitions)

Models from statistical physics

• Potts model (3-colorings)
• Hardcore model (Independent sets)
• Dimer model (Matchings)
• Ising model (Min cut)

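Models (cont.)

Independent sets: $\pi(I) = \lambda^{|I|}/Z$

Matchings: $\pi(M) = \lambda^{|M|}/Z$

Ising model: $\pi(\sigma) = \lambda^{|E^{=}|}/Z$, where $E^{=} = \{ (u,v) \in E : \sigma(u) = \sigma(v) \}$ ( $E = E^{=} \cup E^{\neq}$ )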

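Models: (The physics perspective)

Given: A physical system Ω = {σ}.
Define: A Gibbs measure as follows:

$$\pi(\sigma) = e^{-\beta H(\sigma)} / Z,$$

where H(σ) is the Hamiltonian, β = 1/kT is the inverse temperature, and $Z = \sum_{\sigma} e^{-\beta H(\sigma)}$ is the normalizing constant or partition function.

Independent sets: H(σ) = -|I|. If $\lambda = e^{\beta}$ then $\pi(\sigma) = \lambda^{|I|}/Z$.

Ising model: $H(\sigma) = -\sum_{(u,v) \in E} \sigma_u \sigma_v$. If $\lambda = e^{2\beta}$ then $\pi(\sigma) = \lambda^{|E^{=}|}/Z$.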
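The Ising identity takes one line to verify. The derivation below is a sketch that assumes spins $\sigma_u \in \{-1,+1\}$, writes β for the inverse temperature and λ for $e^{2\beta}$, and uses $E^{=}$ and $E^{\neq}$ for the edges whose endpoints agree and disagree, so that $|E| = |E^{=}| + |E^{\neq}|$.

```latex
\[
H(\sigma) = -\sum_{(u,v)\in E} \sigma_u \sigma_v
          = -\big(|E^{=}| - |E^{\neq}|\big)
          = |E| - 2\,|E^{=}|,
\]
\[
e^{-\beta H(\sigma)} = e^{-\beta |E|}\,\big(e^{2\beta}\big)^{|E^{=}|}
                     = e^{-\beta |E|}\,\lambda^{|E^{=}|}.
\]
```

The factor $e^{-\beta |E|}$ does not depend on σ, so it cancels against the same factor in the partition function and leaves $\pi(\sigma) = \lambda^{|E^{=}|}/Z$ with $Z = \sum_\sigma \lambda^{|E^{=}(\sigma)|}$.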

Physics perspective (cont.)

Q: What about on the infinite lattice? Use conditional probabilities:

But there can be boundary effects !!!

Phase transitions: Ind. sets

High temperature: ∂ effects die out

Low temperature: long range effects

$T_C$ indicates a “phase transition.”

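Slow mixing of MCIND revisited

$$\pi(S_i) = \sum_{s \in S_i} \pi(s) = \sum_{s \in S_i} e^{-\beta H(s)} / Z$$

(“Entropy term” and “Energy term” label the two factors: the number of configurations in the sum and the weight of each.)

Group by # of “fault lines”

Fault lines are vacant paths of width 2 from top to bottom (or left to right).

“Peierls Argument”

1. Identify horizontal or vertical fault line.
2. Shift right of fault by 1 and flip colors.
3. Remove rt column; add points along fault line, if possible.

For fixed path length ℓ, this maps $S_1$ into $S_B$ x $2^{n/2}$ x $3^{\ell}$: at most $2^{n/2}\, 3^{\ell}$ configurations share an image, and the image has ≥ ℓ - n/2 more points.

Peierls Argument cont.

$$\pi(S_1) = \sum_{\sigma \in S_1} \pi(\sigma) \le \sum_{\ell} \sum_{\sigma \in S_B} \pi(\sigma)\, 2^{n/2}\, 3^{\ell}\, \lambda^{n/2 - \ell} \le \pi(S_B)\, 2^{n/2}\, 3^{n}\, \lambda^{-n/2}\, \mathrm{poly}(n) = \pi(S_B) \Big(\frac{18}{\lambda}\Big)^{n/2} \mathrm{poly}(n),$$

which is small if λ > 18 (and similarly for $S_2, S_3, \dots$).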

Conclusions

Techniques:
• Coupling: can be easy when it works
• Flows: requires global knowledge of chain; very useful for slow mixing
• Indirect methods: top down approach; often increases complexity
• Connection to physics: can offer tremendous insights

Open problems:
• Sampling 4,5,6-colorings on the grid.
• Sampling perfect matchings on non-bipartite graphs.
• Sampling acyclic orientations in a graph.
• Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
• How can we further exploit phase transitions? Other physical intuition?
