
Convergence of MCMC Algorithms in Finite Samples

Anna Kormilitsina and Denis Nekipelov

SMU and UC Berkeley

September 2009

Kormilitsina, Nekipelov Divergence of MCMC September 2009

Introduction

Motivation

MCMC: widely used Bayesian method, often applied in a frequentist context

Due to simplicity of coding and the promise of convergence to the global extremum, popular in structural estimation

In general requires verification of a set of "regularity conditions"; practitioners rarely check them

These assumptions can be violated in very common structural models

Violation can lead to divergence of the algorithm

We use the example of a macro DSGE model: erroneous inference can lead to misinterpretation of policy parameters


Introduction

Our approach

MCMC chain: a complex dynamic system; in general, stability in such systems can be an issue

We use a continuous-time approximation for the Markov chain; this allows us to use results on Lyapunov stability

Lyapunov instability implies divergence of the MCMC chain (with probability 1)

We formulate requirements on the objective function that guarantee stability

If stability is only local, convergence will not occur from some regions of the parameter space


Introduction

Our results: preview

MCMC can diverge even when the structural model is identified

Create a test for stability of a chain initialized in a particular subset of the parameter space, based on Lyapunov stability of its continuous-time approximation

The test creates the potential for an "automatic" choice of the support of the prior distribution

Results are illustrated using the commonly used model of Christiano, Eichenbaum, and Evans (2005)

Find that even in this simple case MCMC does not have global convergence


Theory

Our definition of “MCMC”

MCMC: a very large class of algorithms

We analyze the narrow class of quasi-Bayesian procedures in Chernozhukov and Hong (2003)

Based on using the objective for M-estimation to form a quasi-density (Laplace quasi-posterior)

Idea: convergence of statistics of the quasi-posterior to extremum estimates (Bernstein-von Mises theorem) leads to convergence of the quasi-posterior moments to the M-estimator

Study this problem in the context of sampling based on Metropolis-Hastings
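As a concrete illustration of this setup, a minimal random-walk Metropolis-Hastings sketch targeting the Laplace quasi-posterior $f(\theta) \propto \exp(-Q_n(\theta))$ might look like the following (function and variable names are illustrative, not from the paper):

```python
import math
import random

def metropolis_hastings(q_n, theta0, n_draws, prop_std, seed=0):
    """Random-walk Metropolis-Hastings targeting the Laplace
    quasi-posterior f(theta) proportional to exp(-q_n(theta))."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, prop_std)
        # Accept with probability min(1, exp(Q_n(old) - Q_n(new)))
        log_alpha = q_n(theta) - q_n(proposal)
        if math.log(rng.random()) < log_alpha:
            theta = proposal
        draws.append(theta)
    return draws

# Example: a quadratic objective, so the quasi-posterior is standard normal
draws = metropolis_hastings(lambda t: 0.5 * t * t, theta0=3.0,
                            n_draws=5000, prop_std=1.0)
```

With a well-behaved objective such as this quadratic, the chain mixes around the M-estimator; the rest of the talk concerns objectives where this breaks down.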


Theory

Characterization of Markov chains

Create a sample of parameter draws from the quasi-posterior

The procedure can be treated as a dynamic system

Elements: proposal density, objective function + tuning parameters; output $\{\theta_t\}$

Usually have large samples; proposals can be chosen normal/truncated normal

The sequence of draws can be approximated by a diffusion-based stochastic process


Theory

Characterization of Markov chains

Result from the theory of SDEs: form the stochastic differential equation for the Langevin diffusion process $L_t$,
$$ dL_t = \tfrac{1}{2} \nabla \log f(L_t)\, dt + dW_t, $$
where $W_t$ is a standard Brownian motion.

$f$ will be the stationary distribution of the solution to the Langevin equation
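The Langevin diffusion can be simulated with a simple Euler-Maruyama discretization; a hedged sketch under the assumption that $f$ is the standard normal density, so $\nabla \log f(x) = -x$ and the diffusion is an Ornstein-Uhlenbeck process with stationary variance 1:

```python
import math
import random

def euler_maruyama_langevin(grad_log_f, x0, dt, n_steps, seed=0):
    """Euler-Maruyama discretization of the Langevin SDE
    dL_t = (1/2) grad log f(L_t) dt + dW_t."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Drift term (1/2) grad log f, plus Brownian increment sqrt(dt) * N(0,1)
        x = x + 0.5 * grad_log_f(x) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# For f = standard normal density, grad log f(x) = -x.
path = euler_maruyama_langevin(lambda x: -x, x0=0.0, dt=0.01, n_steps=20000)
```

After a burn-in, the trajectory's empirical variance should hover near the stationary variance of 1, consistent with $f$ being the stationary distribution.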

Powerful tool: the continuous mapping theorem; can look at the cumulative means process
$$ \frac{1}{\sqrt{t}} \sum_{k=0}^{t} \theta_k. $$
By the functional continuous mapping theorem, the rescaled draws converge weakly,
$$ \sqrt{\tau_t}\, \theta_{\tau_k} \Rightarrow L_t. $$
This motivates us to use the continuous-time approximation

Theory

Lyapunov stability

Dynamic system

d θt =1

2∇ log f (θt) dt + G (t, θt) dwt ,

θ0 = θ0.

Maximum of objective ⇒ equilibrium

Stochastic stability: once neighborhood of is equilibrium reached,probability of large deviations is small

Use notion of Lyapunov function

Lyapunov function V (θ, t) is non-negative continuous function, inneighborhood of equilibrium point it is bounded from above bypositive-definite function and

lim|θ|→∞

inft≥0

V (θ, t) = ∞.


Theory

Characterization of Markov chains

Define $V(\cdot)$ for the stochastic process $\theta_t$ (representation using Itô's lemma)

Definition

The stochastic dynamic system is Lyapunov stable if there exist $\epsilon > 0$ and $0 < \tau < \epsilon$ such that for each $t \in [\epsilon, T - \epsilon]$, the expectation of the sample path satisfies
$$ E\left[ \int_{t-\tau}^{t+\tau} dV_t \,\Big|\, \mathcal{F}_\tau \right] \le 0. \qquad (1) $$


Theory

Stability result

Theorem

Suppose $\theta^{(0)}$ is a unique equilibrium of the MCMC stochastic process in $\Theta \subset \mathbb{R}^k$. Assume that there exists a function $v : \mathbb{R}_+ \times \Theta \to \mathbb{R}$, twice continuously differentiable on its support except possibly at the equilibrium point $\theta^{(0)}$, such that
$$ \frac{\partial v(t,\theta)}{\partial t} + \sum_i \tfrac{1}{2} \nabla_i \log f(\theta) \frac{\partial v(t,\theta)}{\partial \theta_i} + \tfrac{1}{2} \sum_{i,j} \{G(t,\theta) G(t,\theta)'\}_{i,j} \frac{\partial^2 v(t,\theta)}{\partial \theta_i\, \partial \theta_j} < 0 $$
with strict inequality for all $(t, \theta) \in \mathbb{R}_+ \times \Theta$. Then the equilibrium point $\theta^{(0)}$ is asymptotically stochastically stable.


Theory

Divergence result

Theorem

Suppose θ(0) is a unique equilibrium point of MCMC stochastic process in

Θ ∈ Rk . Assume that there exists a function v : R+ × Θ → R which is

twice continuously differentiable on its support except possibly the

equilibrium point θ(0), and v is such that limθ→θ(0)

inft∈R+

v(t, θ)=∞ while

supt∈R+, θ∈Θ\Bǫ(θ(0))

{

∂v(θ, t)∂t

+∑

i

12∇ log f i (θt)

∂v(t,θt )∂θt,i

+ 12

i,j

{G(t,θ)G(t,θ)}i,j∂2v(t,θ)∂θi ∂θj

}

≥0.

for Bǫ

(

θ(0)

)

={

θ∣

∣ ‖θ − θ(0)‖ < ǫ}

. Then the equilibrium point θ(0) is

asymptotically stochastically unstable and

PFt

{

supt∈R+

‖θt‖<ρ

∣ θ0

}

=0,

for all θ0 ∈ Θ and 0 < ρ < diam(

Θ)

.


Theory

Implications for MCMC

Parameter chain

dθt =1

2

{

∆1T (θt) +1

π (θt)

∂π (θt)

∂θ

}

dt + dWt ,

where ∆L,T (·) is local (mean-square) gradient of quasi-likelihood

Convenient choice of Lyapunov function

v(θ) = 1{

rk−1 ≤(

θ − θ(0)

)′Σk

(

θ − θ(0)

)

≤ rk

}

ak

× exp(

αk

(

θ − θ(0)

)′Σk

(

θ − θ(0)

)

.)

Can use Σ−1 in lieu of Σk


Theory

Formal procedure

Test
$$ H_0 : \sup_{\theta \in \Theta} \left\{ \left\{ \Delta_{1T}(\theta_t) + \frac{1}{\pi(\theta_t)} \frac{\partial \pi(\theta_t)}{\partial \theta} \right\}' \frac{\partial v(\theta_t)}{\partial \theta} + e' \frac{\partial^2 v(\theta_t)}{\partial \theta\, \partial \theta'}\, e \right\} < 0 $$

Formal test statistic
$$ T_s = \sup_{t \le s} \left[ \sum_{i,j=1}^{p} \left[ \Delta^i_{1T}(\theta_t) + \frac{1}{\pi(\theta_t)} \frac{\partial \pi(\theta_t)}{\partial \theta^i_t} \right] \Sigma^{-1}_{\{i,j\}} (\theta^j_t - \theta^j_0) + a_k \alpha_k^2 \sum_{i,j} \Sigma^{-1}_{\{i,j\}} \right]. $$


Example

Example

Can construct a simple example where MCMC has an absorbing state outside the solution

Minimize the objective
$$ Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( |x_i - \theta|\, \mathbf{1}_{|x_i - \theta| < a} + a\, \mathbf{1}_{|x_i - \theta| \ge a} \right), \qquad x \sim U[-2a, 2a] $$

For a well-defined $a$ the objective has a well-defined minimum

Population objective is smooth

Sample objective does not sufficiently penalize outliers

Sequence of MCMC draws becomes "too stable" if the draw is far from the true minimum
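A direct implementation of this objective (a hedged sketch; `q_n` is an illustrative name) makes the problem visible: far from the data every term is capped at $a$, so the sample objective is exactly the constant $a$ and carries no gradient information pointing back to the minimum:

```python
import random

def q_n(theta, xs, a):
    """Capped-absolute-deviation objective:
    Q_n(theta) = (1/n) * sum(min(|x_i - theta|, a))."""
    return sum(min(abs(x - theta), a) for x in xs) / len(xs)

rng = random.Random(0)
a = 1.0
xs = [rng.uniform(-2 * a, 2 * a) for _ in range(500)]  # x ~ U[-2a, 2a]

flat = q_n(10 * a, xs, a)   # far from all data: every term equals a
near = q_n(0.0, xs, a)      # near the true minimum: strictly below a
```

Evaluating the objective at a point far outside the data support returns exactly $a$, while near the minimum it is strictly smaller.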


Example

Figure: Objective function, sample (x-axis: θ from −3 to 3; y-axis: 0 to 1)


Example

Figure: Objective function, population (x-axis: θ from −3 to 3; y-axis: 0.5 to 1)


Example

Example

Look at a very simple case: accept draws if the objective grows; accept with probability proportional to $\exp(Q_n(\theta_2) - Q_n(\theta_1))$ if the objective diminishes

Statement: can choose the variance of the proposal draws and $a$ such that the "flat region" of the objective becomes an absorbing state

Idea: if the variance of the proposal draws is large relative to $a$, the Markov chain moves to the flat area

There will be chains staying far from 0 for all draws
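A sketch of the absorbing-state mechanism, under the stated assumptions (random-walk proposals, the accept rule above; names illustrative): in the flat region both objective values equal $a$ exactly, so the acceptance probability is 1 and the chain wanders as a driftless random walk with nothing pulling it back toward the minimum:

```python
import math
import random

def q_n(theta, xs, a):
    """Sample objective: mean of min(|x_i - theta|, a)."""
    return sum(min(abs(x - theta), a) for x in xs) / len(xs)

def mh_chain(xs, a, theta0, prop_std, n_draws, seed=1):
    """Metropolis chain minimizing q_n: always accept a lower objective,
    otherwise accept with probability exp(q_n(old) - q_n(new))."""
    rng = random.Random(seed)
    theta, draws = theta0, []
    for _ in range(n_draws):
        prop = theta + rng.gauss(0.0, prop_std)
        if rng.random() < math.exp(min(0.0, q_n(theta, xs, a) - q_n(prop, xs, a))):
            theta = prop
        draws.append(theta)
    return draws

rng = random.Random(0)
a = 1.0
xs = [rng.uniform(-2 * a, 2 * a) for _ in range(200)]

# Far from the data both objectives equal a exactly, so the
# acceptance probability for any local move is 1.
flat_accept = math.exp(min(0.0, q_n(20 * a, xs, a) - q_n(21 * a, xs, a)))

# A chain started deep in the flat region just random-walks there.
draws = mh_chain(xs, a, theta0=20 * a, prop_std=0.3, n_draws=200)
```

A chain initialized far into the flat region remains far from the true minimum at 0 for the entire run, illustrating the "too stable" behavior.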


Application

Empirical application: structural model

Use a DSGE model similar to Christiano, Eichenbaum, and Evans (2005)
$$ E_0 \sum_{t=0}^{\infty} \beta^t u(c_t - b c_{t-1}, 1 - l_t), \qquad (2) $$
where
$$ u(c_t, 1 - l_t) = \phi \log(c_t - b c_{t-1}) + (1 - \phi) \log(1 - l_t), \qquad (3) $$
and $c_t - b c_{t-1}$ is consumption adjusted for habit formation


Application

Structural model

WageW i

t = W it−1π

χw

t−1, (4)

where χw ∈ [0, 1] is the parameter of partial wage indexation

Investment

Φ

(

it

it−1

)

2

(

it

it−1− 1

)2

, (5)

Budget

Etrt,t+1xht+1 + ct + it =

xht

Πt+ trt + rtkt +

∫ 1

0w

jt

(

wjt

wt

)−η

hdt dj + φt

where trt - net transfer from government to household, φt - incomefrom ownership


Application

Structural model

Productionyi ,t ≤ ztk

1−θi ,t hi ,t

θ, (6)

Technology

log(zt

z

)

= ρz log(zt−1

z

)

+ ǫz,t , (7)

PricesP i

t = P it−1π

χp

t−1, (8)

Monetary policy

log(Rt

R) = αR log

(

Rt−1

R

)

+ απ log(πt

π

)

+ αy log

(

yt

yt−1

)

, (9)

where αR , απ, and αy - monetary policy parameters

Budget is balanced


Application

Solving structural model and generating data

Form the system of Euler equations implied by equilibrium

Set parameters: discount factor β = .9902; depreciation rate δ = 2.5%; investment adjustment costs κ = 3; habit parameter b = .6; technology parameter θ = 0.7; price and wage rigidity α_p = .6, α_w = .8; indexation χ_p = χ_w = .5; elasticities of substitution η_w = η_p = 6; monetary policy rule coefficients α_R = .7, α_π = .45, α_y = .15; inflation target π = 1.005; steady-state labor 0.3; steady-state shadow price of capital q = 1

Generate equilibrium data for the dynamics of the macro variables


Application

Estimator

Estimate
$$ \Theta = \{\alpha_p, \alpha_w, \chi_p, \chi_w, b, \kappa, \theta, \beta, \alpha_R, \alpha_\pi, \alpha_Y, \rho_z, \sigma_z\}. \qquad (10) $$

Objective
$$ L_T(\theta) = (X(\theta) - X_T)' V_T (X(\theta) - X_T), \qquad (11) $$

$X(\theta)$: impulse responses generated by the model; $X_T$: impulse responses estimated from the data with $T$ observations

$V_T$: weighting matrix

Use impulse responses for 20 steps of each variable: 140 points to match; $X(\theta)$ and $X_T$ are $140 \times 1$ vectors, and $V_T$ is $140 \times 140$.
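The quadratic-form objective in (11) is straightforward to evaluate; a minimal sketch (names illustrative, not the authors' code), using plain lists for the vectors and the weighting matrix:

```python
def irf_distance(x_model, x_data, v):
    """L_T(theta) = (X(theta) - X_T)' V_T (X(theta) - X_T),
    where x_model and x_data are length-m vectors and
    v is an m x m weighting matrix (list of rows)."""
    d = [xm - xd for xm, xd in zip(x_model, x_data)]          # X(theta) - X_T
    vd = [sum(v[i][j] * d[j] for j in range(len(d)))           # V_T (X(theta) - X_T)
          for i in range(len(d))]
    return sum(d[i] * vd[i] for i in range(len(d)))            # quadratic form

# With the identity weighting matrix this reduces to the squared
# Euclidean distance between the model and data impulse responses.
x_t = [1.0, 2.0, 3.0]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
dist = irf_distance([1.0, 2.0, 4.0], x_t, identity)  # (4 - 3)^2 = 1.0
```

In the application the vectors would be the 140-point stacked impulse responses and $V_T$ the $140 \times 140$ weighting matrix.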


Application

Results

Smaller sample sizes lead to more frequent divergence

Compute the test statistic for sample sizes 100, 200, 500, 1000, and 10,000

Use a two-step procedure: (i) run the MCMC chain to find $\theta_0$ and $\Sigma^{-1}$; (ii) calibrate $a_k \alpha_k^2 = 0.026$

Find that convergence quality depends on

the choice of starting value
the choice of support

Results are obtained given identification in the population


Application

Figure: Empirically evaluated expected distance function (panels for b, κ, α_p, α_w, χ_p, χ_w, α_R, α_π, α_y, ρ_z, β, θ, σ_z)


Application

Figure: Divergence statistics (panels for sample sizes 10000, 1000, 500, 200, 100)


Application

Figure: Dependence of estimates on starting values, 100 obs (panels for b, κ, α_p, α_w, χ_p, χ_w, α_R, α_π, α_Y, ρ, β, θ, σ_z)


Application

Figure: Dependence of estimates on starting values, 200 obs (panels for b, κ, α_p, α_w, χ_p, χ_w, α_R, α_π, α_Y, ρ, β, θ, σ_z)


Application

Figure: Dependence of estimates on starting values, 500 obs (panels for b, κ, α_p, α_w, χ_p, χ_w, α_R, α_π, α_Y, ρ, β, θ, σ_z)


Application

Figure: Dependence of estimates on starting values, 1000 obs (panels for b, κ, α_p, α_w, χ_p, χ_w, α_R, α_π, α_Y, ρ, β, θ, σ_z)


Application

Figure: Dependence of estimates on starting values, sample of 10000 observations (panels for b, κ, α_p, α_w, χ_p, χ_w, α_R, α_π, α_Y, ρ, β, θ, σ_z)
