Accelerated approximate Bayesian computation with applications to protein folding data


Transcript of Accelerated approximate Bayesian computation with applications to protein folding data

Page 1

Accelerating inference for complex stochastic models using Approximate Bayesian Computation

with an application to protein folding

Umberto Picchini
Centre for Mathematical Sciences, Lund University

joint work with Julie Forman (Dept. Biostatistics, Copenhagen University)

Dept. Mathematics, Uppsala, 4 Sept. 2014

Page 2

Outline

I’ll use protein folding data as a motivating example; most of the talk will be about statistical inference issues.

I will introduce our data and the protein folding problem.

I will introduce a model based on stochastic differential equations.

I will mention some theoretical and computational issues related to (Bayesian) inference.

I will introduce approximate Bayesian computation (ABC) to alleviate methodological and computational problems.

Page 3

A motivating example

Proteins are synthesized in the cell on ribosomes as linear, unstructured polymers...

...which then self-assemble into specific and functional three-dimensional structures.

Page 4

This self-assembly process is called protein folding. It’s the last and crucial step in the transformation of genetic information, encoded in DNA, into functional protein molecules.

Page 5

Protein folding is also associated with a wide range of human diseases. In many neurodegenerative diseases, such as Alzheimer’s disease, proteins misfold into toxic protein structures.

Protein folding has been named “the Holy Grail of biochemistry and biophysics” (!).

Page 6

Modelling the time dynamics is difficult (large number of atoms in a 3D space); atom coordinates are usually projected onto a single dimension called the reaction coordinate; see the figure below.


Figure: Data time-course projected onto a single coordinate: 25,000 measurements of the L-reaction coordinate of the small Trp-zipper protein at sampling frequency ∆⁻¹ = 1/nsec.

Here the L-reaction coordinate was used, i.e. the total distance to a folded reference. Notice the random switching between folded/unfolded states.

Page 8

Forman and Sørensen¹ proposed to consider sums of diffusions:

$$\underbrace{Z_t}_{\text{observable process}} = \underbrace{Y_t}_{\text{latent state}} + \underbrace{U_t}_{\text{autocorrelated error term}}$$

they considered diffusion processes to model both Yt and Ut

they found that i.i.d. errors were not really giving satisfactory results. So let’s introduce some autocorrelation:

$$dU_t = -\kappa U_t\,dt + \sqrt{2\kappa\gamma^2}\,dW_t, \qquad U_0 = 0,$$

a zero-mean Ornstein-Uhlenbeck process with stationary variance γ² and autocorrelation ρU(t) = exp(−κt). Here dWt ∼ N(0, dt).

¹ Forman and Sørensen, A transformation approach to modelling multi-modal diffusions. J. Statistical Planning and Inference, 2014.
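Since the OU transition density is Gaussian and known in closed form, the error process can be simulated exactly on a time grid. A minimal Python sketch (function name and interface are mine, not from the talk):

```python
import numpy as np

def simulate_ou(kappa, gamma2, dt, n, u0=0.0, rng=None):
    """Exact simulation of dU_t = -kappa*U_t dt + sqrt(2*kappa*gamma2) dW_t.

    Uses the exact Gaussian transition: U_{t+dt} | U_t is normal with
    mean U_t*exp(-kappa*dt) and variance gamma2*(1 - exp(-2*kappa*dt)).
    """
    rng = rng or np.random.default_rng()
    u = np.empty(n + 1)
    u[0] = u0
    a = np.exp(-kappa * dt)              # autoregressive coefficient
    sd = np.sqrt(gamma2 * (1.0 - a * a)) # conditional standard deviation
    for i in range(n):
        u[i + 1] = a * u[i] + sd * rng.standard_normal()
    return u
```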

Page 9

Regarding the “signal” part Yt: the data clearly show a bimodal marginal structure:

[Figure: data time course and a histogram showing the bimodal marginal distribution.]

so we want a stochastic process that is able to switch between the two “modes” (i.e. the folded/unfolded states). One possible option:

an OU process Xt with zero mean and unit variance,

$$dX_t = -\theta X_t\,dt + \sqrt{2\theta}\,dB_t, \qquad X_0 = x_0$$

plug each Xt into the cdf of its stationary N(0,1) distribution ⇒ take Φ(Xt), where Φ is the cdf of N(0,1).

Page 10

now build a Gaussian mixture and take the percentile corresponding to the area Φ(Xt).

To summarize: simulate Xt ⇒ compute Φ(Xt) ⇒ find the percentile Yt from a 2-component Gaussian mixture with cdf

$$F(y) = \alpha\,\Phi\!\left(\frac{y-\mu_1}{\sigma_1}\right) + (1-\alpha)\,\Phi\!\left(\frac{y-\mu_2}{\sigma_2}\right)$$
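The mixture cdf F has no closed-form inverse, so in practice τ(x) = F⁻¹(Φ(x)) must be computed numerically, e.g. by root finding. A sketch under that assumption (the bracketing interval is an illustrative choice):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def mixture_cdf(y, alpha, mu1, mu2, s1, s2):
    """Two-component Gaussian mixture cdf F(y)."""
    return alpha * norm.cdf((y - mu1) / s1) + (1 - alpha) * norm.cdf((y - mu2) / s2)

def tau(x, alpha, mu1, mu2, s1, s2):
    """tau(x) = F^{-1}(Phi(x)): the mixture percentile at level Phi(x)."""
    p = norm.cdf(x)                         # area Phi(x) in (0, 1)
    lo = min(mu1 - 10 * s1, mu2 - 10 * s2)  # bracket well below both components
    hi = max(mu1 + 10 * s1, mu2 + 10 * s2)  # and well above them
    return brentq(lambda y: mixture_cdf(y, alpha, mu1, mu2, s1, s2) - p, lo, hi)
```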

Page 11

So in conclusion we have:

$$\underbrace{Z_t}_{\text{data}} = \underbrace{Y_t}_{\text{latent state}} + \underbrace{U_t}_{\text{autocorrelated error term}}$$

$$Z_t = Y_t + U_t, \qquad Y_t := \tau(X_t),$$
$$dX_t = -\theta X_t\,dt + \sqrt{2\theta}\,dB_t,$$
$$dU_t = -\kappa U_t\,dt + \sqrt{2\kappa\gamma^2}\,dW_t,$$
$$\tau(x) = (F^{-1}\circ\Phi)(x), \qquad F(y) = \alpha\,\Phi\!\left(\frac{y-\mu_1}{\sigma_1}\right) + (1-\alpha)\,\Phi\!\left(\frac{y-\mu_2}{\sigma_2}\right).$$

We are interested in conducting (Bayesian) inference for

η = (θ, κ, γ, α, µ1, µ2, σ1, σ2).
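Putting the pieces together, forward simulation of the observable process (the only model access ABC will need) can be sketched as follows, reusing simulate_ou and tau from the earlier sketches; the packing of η into a tuple is an illustrative convention:

```python
import numpy as np

def simulate_z(eta, dt, n, rng=None):
    """Forward-simulate Z_t = tau(X_t) + U_t on a grid of n + 1 points.

    Relies on simulate_ou and tau from the sketches above; eta collects
    (theta, kappa, gamma2, alpha, mu1, mu2, s1, s2).
    """
    theta, kappa, gamma2, alpha, mu1, mu2, s1, s2 = eta
    rng = rng or np.random.default_rng()
    # X_t: unit-variance OU (dX = -theta*X dt + sqrt(2*theta) dB),
    # started from its stationary N(0,1) distribution
    x = simulate_ou(theta, 1.0, dt, n, u0=rng.standard_normal(), rng=rng)
    y = np.array([tau(xi, alpha, mu1, mu2, s1, s2) for xi in x])
    u = simulate_ou(kappa, gamma2, dt, n, rng=rng)  # autocorrelated error, U_0 = 0
    return y + u
```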

Page 12

Difficulties with exact Bayesian inference

A non-exhaustive list of the difficulties of using exact Bayesian inference via MCMC and SMC in our application:

our dataset is “large” (25,000 observations...), which is not “terribly” large in an absolute sense, but it is when dealing with diffusion processes...

...in fact, even when a proposed parameter value is in the bulk of the posterior distribution, generated trajectories might still be too distant from the data (⇒ high rejection rate!)

a high rejection rate implies poor exploration of the posterior surface, poor inferential results and increased computational time.

some of these issues can be mitigated using bridging techniques (Beskos et al. ’13): not trivial in our case (the transformation τ(x) is unknown in closed form).

Page 13

Before going into approximated methods... you should trust (!) that we have put lots of effort into trying to avoid approximations! Still...

Besides theoretical difficulties, currently existing methods do not scale well for large data:

e.g. we attempted to use particle MCMC (Andrieu et al. ’10), and even with a few particles (only 10!) it would require weeks to obtain results...

it is expensive to simulate from our model, as percentiles of mixture models are unknown in closed form.

we had to use some approximate strategy, and we considered ABC (approximate Bayesian computation).

Page 14

Notation

Data: z = (z0, z1, ..., zn)

Unknown parameters: η = (θ, κ, γ, α, µ1, µ2, σ1, σ2)

Likelihood function: p(z|η)

Prior density: π(η) (our a priori knowledge of η)

Posterior density:

π(η|z) ∝ π(η) p(z|η)

Ideally we would like to use/sample from the posterior. We assume this is either theoretically difficult or computationally expensive (it is in our case!).

Page 15

Approximate Bayesian computation (ABC)

ABC gives a way to approximate a posterior distribution π(η|z).

Key to the success of ABC is the ability to bypass the explicit calculation of the likelihood p(z|η)... only forward simulation from the model is required!

ABC is in fact a likelihood-free method that works by simulating pseudo-data zsim from the model:

zsim ∼ p(z|η)

It has had incredible success in genetic studies since the mid ’90s (Tavaré et al. ’97, Pritchard et al. ’99). Lots of hype in recent years: see Christian Robert’s excellent blog.

Page 16

Basic rejection sampler (NO approximations here!)

for r = 1 to R do
    repeat
        Generate parameter η′ from its prior distribution π(η)
        Generate zsim from the likelihood p(z|η′) (!! no need to know p(·) analytically !!)
    until zsim = z (simulated data = actual data)
    set ηr = η′
end for

The algorithm produces R samples from the exact posterior π(η|z).

:( It won’t work for continuous data or large amounts of data because Pr(zsim = z) ≈ 0 ⇒ substitute zsim = z with zsim ≈ z
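For concreteness, a Python sketch of this exact sampler; sample_prior and simulate are assumed user-supplied, and the equality test only makes sense for discrete data:

```python
import numpy as np

def exact_rejection(R, z, sample_prior, simulate):
    """Exact ABC rejection: keep prior draws whose simulated data equal z.

    Only sensible for discrete, low-dimensional z; for continuous data
    Pr(z_sim = z) is essentially zero, motivating the tolerance version.
    """
    accepted = []
    while len(accepted) < R:
        eta = sample_prior()              # eta' ~ pi(eta)
        if np.array_equal(simulate(eta), z):
            accepted.append(eta)
    return np.array(accepted)
```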

Page 17

...⇒ substitute zsim = z with zsim ≈ z

Introduce some distance ‖z − zsim‖ to measure the proximity of zsim to the data z [Pritchard et al. 1999].

Introduce a tolerance value δ > 0.

An ABC rejection sampler:

for r = 1 to R do
    repeat
        Generate parameter η′ from its prior distribution π(η)
        Generate zsim from the likelihood p(z|η′)
    until ‖z − zsim‖ < δ [or alternatively ‖S(z) − S(zsim)‖ < δ]
    set ηr = η′
end for

for some “summary statistics” S(·).
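A minimal Python sketch of this sampler, assuming user-supplied sample_prior, simulate and summaries functions (all names illustrative) and a Euclidean distance on the summaries:

```python
import numpy as np

def abc_rejection(R, delta, z, sample_prior, simulate, summaries):
    """Keep R prior draws whose simulated summaries fall within delta of S(z)."""
    s_obs = summaries(z)          # summary statistics of the observed data
    accepted = []
    while len(accepted) < R:
        eta = sample_prior()      # eta' ~ pi(eta)
        z_sim = simulate(eta)     # z_sim ~ p(z | eta'): forward simulation only
        if np.linalg.norm(summaries(z_sim) - s_obs) < delta:
            accepted.append(eta)
    return np.array(accepted)
```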

Page 18

The previous algorithm samples from the approximate posteriors π(η | ‖z − zsim‖ < δ) or π(η | ‖S(z) − S(zsim)‖ < δ).

It is useful to consider statistics S(·) when dealing with large datasets, to increase the probability of acceptance.

The key result of ABC

When S(·) is “sufficient” for η and δ ≈ 0, sampling from the posterior is (almost) exact!

When S(·) is sufficient for the parameter ⇒ π(η | ‖S(zsim) − S(z)‖ < δ) ≡ π(η | ‖zsim − z‖ < δ); when δ = 0 and S is sufficient we accept only parameter draws for which zsim ≡ z ⇒ π(η|z), the exact posterior.

This is all good and nice, but such conditions rarely hold.

Page 20

A central problem is how to choose the statistics S(·): outside the exponential family we typically cannot derive sufficient statistics. [A key work on obtaining statistics “semi-automatically” is Fearnhead-Prangle ’12 (discussion paper in JRSS-B; very much recommended).]

Substitute with the loose concept of an informative (enough) statistic, then choose a small (enough) threshold δ.

We now go back to our model and (large) data. We propose some tricks to accelerate the inference.

We will use an ABC within MCMC approach (ABC-MCMC).

Page 22

ABC-MCMC

Example: weigh the discrepancy between observed data and simulated trajectories using a uniform 0-1 kernel Kδ(·), e.g.

$$K_\delta(S(z_{\mathrm{sim}}), S(z)) = \begin{cases} 1 & \text{if } \|S(z_{\mathrm{sim}}) - S(z)\| < \delta \\ 0 & \text{otherwise} \end{cases}$$

complete freedom to choose a different criterion...

Use such a measure in place of the typical conditional density of the data given the latent states. See next slide.
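As a function, the 0-1 kernel is one line; a sketch, with the Euclidean norm as one possible choice of ‖·‖:

```python
import numpy as np

def kernel_01(s_sim, s_obs, delta):
    """Uniform 0-1 kernel K_delta: 1 if ||S(z_sim) - S(z)|| < delta, else 0."""
    return 1.0 if np.linalg.norm(np.asarray(s_sim) - np.asarray(s_obs)) < delta else 0.0
```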

Page 23

Zt = Yt + Ut with Yt := τ(Xt)

$$dX_t = -\theta X_t\,dt + \sqrt{2\theta}\,dB_t, \qquad dU_t = -\kappa U_t\,dt + \sqrt{2\kappa\gamma^2}\,dW_t$$

ABC-MCMC acceptance ratio: given the current value of the parameter η ≡ ηold, generate a Markov chain via Metropolis-Hastings:

Algorithm 1: a generic iteration of ABC-MCMC (fixed threshold δ)
At the r-th iteration:
1. generate η′ ∼ u(η′|ηold), e.g. using a Gaussian random walk
2. generate zsim|η′ from the model (forward simulation)
3. generate ω ∼ U(0, 1)
4. accept η′ if

$$\omega < \min\left(1,\; \frac{\pi(\eta')\,K(S(z_{\mathrm{sim}}),S(z))\,u(\eta_{\mathrm{old}}|\eta')}{\pi(\eta_{\mathrm{old}})\,K(S(z_{\mathrm{old}}),S(z))\,u(\eta'|\eta_{\mathrm{old}})}\right),$$

then set ηr = η′, else ηr = ηold.

Samples are from π(η, zsim | ‖S(zsim) − S(z)‖ < δ).
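A minimal Python sketch of Algorithm 1, under two simplifying assumptions that the talk also exploits later: a symmetric Gaussian random-walk proposal (so the u-ratio cancels) and an admissible starting point (so the kernel in the denominator equals 1). The helpers log_prior, simulate and summaries are assumed user-supplied:

```python
import numpy as np

def abc_mcmc(R, delta, z, eta0, log_prior, simulate, summaries, prop_sd, rng=None):
    """ABC-MCMC with a 0-1 kernel and a symmetric Gaussian random-walk proposal."""
    rng = rng or np.random.default_rng()
    s_obs = summaries(z)
    eta = np.asarray(eta0, dtype=float)
    chain = np.empty((R, eta.size))
    for r in range(R):
        eta_prop = eta + prop_sd * rng.standard_normal(eta.size)
        z_sim = simulate(eta_prop)                      # forward simulation
        within = np.linalg.norm(summaries(z_sim) - s_obs) < delta
        # symmetric proposal: u(eta_old|eta')/u(eta'|eta_old) cancels
        if within and np.log(rng.uniform()) < log_prior(eta_prop) - log_prior(eta):
            eta = eta_prop
        chain[r] = eta
    return chain
```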

Page 24

Algorithm 2: a generic iteration of ABC-MCMC (random threshold δ)
At the r-th iteration:
1. generate η′ ∼ u(η′|ηold) and δ′ ∼ v(δ|δold)
2. generate zsim|η′ from the model (forward simulation)
3. generate ω ∼ U(0, 1)
4. accept η′ if

$$\omega < \min\left(1,\; \frac{\pi(\eta')\,K_{\delta'}(S(z_{\mathrm{sim}}),S(z))\,u(\eta_{\mathrm{old}}|\eta')\,v(\delta_{\mathrm{old}}|\delta')}{\pi(\eta_{\mathrm{old}})\,K_{\delta_{\mathrm{old}}}(S(z_{\mathrm{old}}),S(z))\,u(\eta'|\eta_{\mathrm{old}})\,v(\delta'|\delta_{\mathrm{old}})}\right),$$

then set (ηr, δr) = (η′, δ′), else (ηr, δr) = (ηold, δold).

Samples are from π(η, δ, zsim | ‖S(zsim) − S(z)‖ < δ).

Page 25

by using a (not too!) small threshold δ we might obtain a decent acceptance rate for the approximate posterior...

however, ABC does not save us from having to produce computationally costly “long” trajectories zsim (n ≈ 25,000) at each step of ABC-MCMC.

However, in ABC the relevant info about our simulations is encoded into S(·)... do we really have to simulate a zsim having the same length as our data z??

...simulate “short” zsim that are still qualitatively representative! (see next slide...)

Page 26

Top row: full dataset of 25,000 observations. Bottom row: every 30th observation is reported.

[Figure: top row, time course and histogram of the full data; bottom row, time course and histogram of the subsampled data.]

The dataset is 30 times smaller but the qualitative features are still there!

Page 27

Strategy for large datasets (Picchini-Forman ’14)

We have a dataset z of about 25,000 observations. Prior to starting ABC-MCMC, construct S(z) to contain:

1. the 15th, 30th, ..., 90th percentiles of the marginal distribution of the full data → to identify the Gaussian-mixture parameters µ1, µ2, etc.

2. values of the autocorrelation function of the full data z at lags (60, 300, 600, ..., 2100) → to identify the dynamics-related parameters θ, γ, κ.

During ABC-MCMC we simulate shorter trajectories zsim of size 25000/30 ≈ 800. We take as summary statistics S(zsim) the 15th, 30th, ..., 90th percentiles of the simulated data and autocorrelations at lags (2, 10, 20, ..., 70) (recall zsim is 30x shorter than z). We then compare S(zsim) with S(z) within ABC-MCMC. This is fast! S(·) for the large data can be computed before ABC-MCMC starts; a sketch of such a summary function follows.
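A sketch of how such a summary function could look in Python. The percentile grid in steps of 15 is my reading of the slide’s “15th-30th...-90th”, and the intermediate lags elided by the slide’s “...” are left for the user to fill in:

```python
import numpy as np

def summaries(z, lags):
    """Percentiles of the marginal distribution plus autocorrelations at given lags."""
    z = np.asarray(z, dtype=float)
    pct = np.percentile(z, [15, 30, 45, 60, 75, 90])  # assumed 15th, 30th, ..., 90th
    zc = z - z.mean()
    denom = zc @ zc
    acf = np.array([(zc[:-k] @ zc[k:]) / denom for k in lags])
    return np.concatenate([pct, acf])

# observed data: long lags (60, 300, 600, ..., 2100); simulated data: the
# 30x-shorter lags (2, 10, 20, ..., 70) -- fill in the slide's elided values
# s_obs = summaries(z, lags=(60, 300, 600, 2100))
# s_sim = summaries(z_sim, lags=(2, 10, 20, 70))
```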

Page 30

So the first strategy to accelerate computations was simulating a smaller set of artificial data.

Our second strategy is to perform so-called “early rejection” of proposed parameters.

Page 31

ABC-MCMC acceptance ratio:

$$\text{accept } \eta' \text{ if } \omega < \min\left(1,\; \frac{\pi(\eta')\,K(|S(z_{\mathrm{sim}}) - S(z)|)\,u(\eta_{\mathrm{old}}|\eta')}{\pi(\eta_{\mathrm{old}})\,K(|S(z_{\mathrm{old}}) - S(z)|)\,u(\eta'|\eta_{\mathrm{old}})}\right)$$

however, remember K(·) is a 0/1 kernel → let’s start the algorithm at an admissible ηstart such that at the first iteration S(zold) ≈ S(z) →

$$\text{accept } \eta' \text{ if } \omega < \min\left(1,\; \frac{\pi(\eta')}{\pi(\eta_{\mathrm{old}})}\,K(|S(z_{\mathrm{sim}}) - S(z)|)\,\frac{u(\eta_{\mathrm{old}}|\eta')}{u(\eta'|\eta_{\mathrm{old}})}\right)$$

Notice η′ will surely be rejected if

$$\omega > \frac{\pi(\eta')}{\pi(\eta_{\mathrm{old}})}\,\frac{u(\eta_{\mathrm{old}}|\eta')}{u(\eta'|\eta_{\mathrm{old}})},$$

REGARDLESS of the value of K(·) ∈ {0, 1} → do NOT simulate trajectories if the above is satisfied!

Page 32

Algorithm 3: Early-Rejection ABC-MCMC (Picchini ’13)

1. At the (r + 1)-th ABC-MCMC iteration:
2. generate η′ ∼ u(η|ηr) from its proposal distribution;
3. generate ω ∼ U(0, 1);
   if ω > π(η′)u(ηr|η′) / (π(ηr)u(η′|ηr)) (=: “ratio”) then
       (ηr+1, S(zsim,r+1)) := (ηr, S(zsim,r))   ▷ proposal early-rejected
   else
       generate xsim ∼ π(x|η′) conditionally on the η′ from step 2; determine ysim = τ(xsim), generate zsim ∼ π(z|ysim, η′) and calculate S(zsim);
       if K(|S(zsim) − S(z)|) = 0 then
           (ηr+1, S(zsim,r+1)) := (ηr, S(zsim,r))   ▷ proposal rejected
       else if ω ≤ ratio then
           (ηr+1, S(zsim,r+1)) := (η′, S(zsim))   ▷ proposal accepted
       else
           (ηr+1, S(zsim,r+1)) := (ηr, S(zsim,r))   ▷ proposal rejected
       end if
   end if
4. increment r to r + 1. If r > R stop, else go to step 2.
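In code, early rejection simply reorders the checks so the costly forward simulation is skipped whenever the prior/proposal ratio alone already forces rejection. A sketch of one iteration under the same assumptions as the earlier ABC-MCMC sketch (symmetric proposal, user-supplied helpers):

```python
import numpy as np

def early_rejection_step(eta, s_obs, delta, log_prior, simulate, summaries,
                         prop_sd, rng):
    """One early-rejection ABC-MCMC iteration (symmetric random-walk proposal)."""
    eta_prop = eta + prop_sd * rng.standard_normal(eta.size)
    log_omega = np.log(rng.uniform())
    log_ratio = log_prior(eta_prop) - log_prior(eta)
    if log_omega > log_ratio:
        return eta                 # early rejection: nothing was simulated
    z_sim = simulate(eta_prop)     # pay for the forward simulation only now
    if np.linalg.norm(summaries(z_sim) - s_obs) < delta:
        return eta_prop            # kernel = 1 and omega <= ratio: accept
    return eta                     # kernel = 0: reject
```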

Page 33

Notice “early rejection” works only with 0-1 kernels.

There is no reason not to use it. Early rejection per se is not an approximation, it’s just a trick.

It saved us between 40 and 50% of computing time (Picchini ’13).

Page 34

Results after 2 million ABC-MCMC iterations. Acceptance rate of 1% and 6 hrs of computation with MATLAB on a common desktop PC.

Table: Protein folding data experiment: posterior means from the ABC-MCMC output and 95% posterior intervals.

          ABC posterior mean   95% posterior interval
log θ     –6.454               [–6.898, –5.909]
log κ     –0.651               [–1.424, 0.246]
log γ      0.071               [–0.313, 0.378]
log µ1     3.24                [3.22, 3.26]
log µ2     3.43                [3.39, 3.45]
log σ1    –0.959               [–2.45, 0.38]
log σ2    –0.424               [–2.26, 0.76]
log α     –0.663               [–1.035, –0.383]

Page 35


Figure: Data (top), process Yt (middle), process Zt (bottom).

Here Zt = Yt + Ut is evaluated at η = the posterior mean.

Page 36

A simulation study (Picchini-Forman ’14)

Here we want to compare ABC against (computationally intensive) exact Bayesian inference (via particle MCMC, pMCMC).

In order to do so we consider a very small dataset of 360 simulated observations.

We use a parallel strategy for pMCMC devised in Drovandi ’14 (4 chains run in parallel using 100 particles for each chain).

C. Drovandi (2014). Pseudo-marginal algorithms with multiple CPUs. Queensland University of Technology, http://eprints.qut.edu.au/61505/

U.P. and Forman (2014). Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation. arXiv:1310.0973.

Page 37

Comparison of ABC-MCMC (*) vs exact Bayes (pMCMC)

      True value   pMCMC [95% interval]      ABC-MCMC* [95% interval]
θ     0.0027       0.0023 [0.0013, 0.0041]   0.0024* [0.0013, 0.0039]
κ     0.538        0.444 [0.349, 0.558]      0.553* [0.386, 0.843]
γ     1.063        1.040 [0.943, 1.158]      0.982* [0.701, 1.209]
µ1    25.52        25.68 [25.08, 26.61]      25.72* [25.10, 26.71]
µ2    30.92        32.12 [29.15, 35.42]      32.17* [29.46, 34.96]
σ1    0.540        0.421 [0.203, 0.844]      0.523* [0.248, 0.972]
σ2    0.624        0.502 [0.232, 1.086]      0.511* [0.249, 1.041]
α     0.537        0.510 [0.345, 0.755]      0.508* [0.346, 0.721]

Page 38

[Figure: marginal posterior densities for (a) log θ, (b) log κ, (c) log γ.]

Figure: Exact Bayesian (solid); ABC-MCMC (dashed); true value (vertical lines); uniform priors.

Page 39

[Figure: marginal posterior densities for (a) log µ1, (b) log µ2, (c) log σ1, (d) log σ2, (e) log α.]

Figure: Exact Bayesian (solid); ABC-MCMC (dashed); true value (vertical lines); uniform priors.

Page 40

Conclusions

As long as we manage to “compress” information into summary statistics, ABC is a useful inferential tool for complex models and large datasets.

1,000 ABC-MCMC iterations are performed in 6 sec, versus about 20 min with exact Bayesian sampling (pMCMC).

...the problem is that ABC requires lots of tuning (choose S(·), δ, K(·)...).

A MATLAB implementation is available at http://sourceforge.net/projects/abc-sde/ with a 50+ page manual.

Page 41

References

U.P. (2014). Inference for SDE models via Approximate Bayesian Computation. J. Comp. Graph. Stat.

U.P. and J. Forman (2013). Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation. arXiv:1310.0973.

U.P. (2013). abc-sde: a MATLAB toolbox for approximate Bayesian computation (ABC) in stochastic differential equation models. http://sourceforge.net/projects/abc-sde/

Page 42

Appendix

Page 43

Proof that the basic ABC algorithm works

The proof is straightforward. We know that a draw (η′, zsim) produced by the algorithm is such that (i) η′ ∼ π(η), and (ii) zsim = z, where zsim ∼ π(zsim | η′).

Thus let’s call f(η′) the (unknown) density of such an η′; then because of (i) and (ii)

$$f(\eta') \propto \sum_{z_{\mathrm{sim}}} \pi(\eta')\,\pi(z_{\mathrm{sim}}|\eta')\,\mathbb{I}_z(z_{\mathrm{sim}}) = \sum_{z_{\mathrm{sim}}=z} \pi(\eta', z_{\mathrm{sim}}) \propto \pi(\eta'|z).$$

Therefore η′ ∼ π(η|z).

Page 44

A theoretical motivation to consider ABC

An important (known) result

A fundamental consequence is that if S(·) is a sufficient statistic for θ then limδ→0 πδ(θ | y) = π(θ | y), the exact (marginal) posterior!!!

uh?!

Otherwise (in general) the algorithm draws from the approximation π(θ | ρ(S(x), S(y)) < δ).

Also, by introducing the class of quadratic losses

$$L(\theta_0, \theta; A) = (\theta_0 - \theta)^T A\,(\theta_0 - \theta),$$

we have:

Another relevant result

If S(y) = E(θ | y) then the minimal expected quadratic loss E(L(θ0, θ; A) | y) is achieved via θ = E_ABC(θ | S(y)) as δ → 0.

Page 45

The straightforward motivation is the following: consider the (ABC) posterior πδ(θ | y); then

$$\pi_\delta(\theta \mid y) = \int \pi_\delta(\theta, x \mid y)\,dx \propto \pi(\theta) \int \frac{1}{\delta}\,K\!\left(\frac{|S(x) - S(y)|}{\delta}\right) \pi(x \mid \theta)\,dx \;\to\; \pi(\theta)\,\pi(S(x) = S(y) \mid \theta) \quad (\delta \to 0).$$

Therefore if S(·) is a sufficient statistic for θ then

$$\lim_{\delta \to 0} \pi_\delta(\theta \mid y) = \pi(\theta \mid y),$$

the exact (marginal) posterior!!!

Page 46

Acceptance probability in Metropolis-Hastings

Suppose at a given iteration of Metropolis-Hastings we are in the (augmented) state position (θ#, x#) and wonder whether to move (or not) to a new state (θ′, x′). The move is generated via a proposal distribution “q((θ#, x#) → (θ′, x′))”.

e.g. “q((θ#, x#) → (θ′, x′))” = u(θ′|θ#) v(x′ | θ′); the move “(θ#, x#) → (θ′, x′)” is accepted with probability

$$\alpha_{(\theta^\#,x^\#)\to(\theta',x')} = \min\left(1,\; \frac{\pi(\theta')\,\pi(x'|\theta')\,\pi(y|x',\theta')\,q((\theta',x')\to(\theta^\#,x^\#))}{\pi(\theta^\#)\,\pi(x^\#|\theta^\#)\,\pi(y|x^\#,\theta^\#)\,q((\theta^\#,x^\#)\to(\theta',x'))}\right)$$

$$= \min\left(1,\; \frac{\pi(\theta')\,\pi(x'|\theta')\,\pi(y|x',\theta')\,u(\theta^\#|\theta')\,v(x^\#|\theta^\#)}{\pi(\theta^\#)\,\pi(x^\#|\theta^\#)\,\pi(y|x^\#,\theta^\#)\,u(\theta'|\theta^\#)\,v(x'|\theta')}\right)$$

now choose v(x | θ) ≡ π(x | θ): the terms π(x′|θ′) and π(x#|θ#) cancel, giving

$$= \min\left(1,\; \frac{\pi(\theta')\,\pi(y|x',\theta')\,u(\theta^\#|\theta')}{\pi(\theta^\#)\,\pi(y|x^\#,\theta^\#)\,u(\theta'|\theta^\#)}\right)$$

This is likelihood-free! And we only need to know how to generate x′ (not a problem...).

Page 49

Generation of δ’s

[Figure: trace plot of the generated δ chain.]

Here we generate a chain for log δ using a (truncated) Gaussian random walk with support (−∞, log δmax]. We let log δmax decrease during the simulation.

Page 50

HOWTO: post-hoc selection of δ (the “precision” parameter) [Bortot et al. 2007]

During ABC-MCMC we let δ vary (according to a Metropolis random walk): at the r-th iteration δr = δr−1 + ∆, with ∆ ∼ N(0, ν²). After the end of the MCMC we have a sequence {θr, δr}r=0,1,2,... and for each parameter {θj,r}r=0,1,2,... we produce a plot of the parameter chain vs δ:

[Figure: scatter plot of a parameter chain against the bandwidth δ.]

Page 51

Post-hoc selection of the bandwidth δ, cont’d...

Therefore in practice:

we filter out of the analyses those draws {θr}r=0,1,2,... corresponding to a “large” δ, for statistical precision; we retain only those {θr}r=0,1,2,... corresponding to a low δ. In the example we retain {θr : δr < 1.5}.

PRO: this is useful as it allows an ex-post selection of δ, i.e. we do not need to know a suitable value for δ in advance (a sketch of the filtering step follows).

CON: by filtering out some of the draws, a disadvantage of the approach is the need to run very long MCMC simulations in order to have enough “material” on which to base our posterior inference.

PRO: also notice that by letting δ vary we are almost considering a global optimization method (similar to simulated tempering).
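A minimal sketch of that filtering step, assuming the ABC-MCMC run stored the parameter and δ chains as arrays (names hypothetical; the cutoff 1.5 is the one from the example above):

```python
import numpy as np

def posthoc_select(theta_chain, delta_chain, delta_cut=1.5):
    """Retain only the draws generated while the bandwidth delta was small."""
    theta_chain = np.asarray(theta_chain)
    delta_chain = np.asarray(delta_chain)
    return theta_chain[delta_chain < delta_cut]
```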
