ABC in Roma
Likelihood-free Methods

Christian P. Robert
Université Paris-Dauphine & CREST
http://www.ceremade.dauphine.fr/~xian
February 16, 2012
Outline

Introduction
The Metropolis-Hastings Algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
ABC for model choice
ABC model choice consistency

Introduction
  Monte Carlo basics
  Population Monte Carlo
  AMIS
Monte Carlo basics

General purpose

Given a density π known up to a normalizing constant, and an integrable function h, compute

  Π(h) = ∫ h(x) π̄(x) µ(dx) = ∫ h(x) π(x) µ(dx) / ∫ π(x) µ(dx)

when ∫ h(x) π(x) µ(dx) is intractable.

Monte Carlo basics

Generate an iid sample x_1, …, x_N from π and estimate Π(h) by

  Π^MC_N(h) = N^{-1} Σ_{i=1}^N h(x_i).

LLN: Π^MC_N(h) → Π(h) almost surely.
If Π(h²) = ∫ h²(x) π̄(x) µ(dx) < ∞,
CLT: √N ( Π^MC_N(h) − Π(h) ) ⇝ N( 0, Π{[h − Π(h)]²} ).

Caveat
Often impossible or inefficient to simulate directly from Π
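The plain Monte Carlo estimator above can be sketched in a few lines (a Python illustration, not from the slides; the target and test function are chosen for convenience):

```python
import random

def mc_estimate(h, sample, N=100_000, seed=42):
    # Pi_N^MC(h) = N^{-1} sum_i h(x_i) with x_i iid from pi
    rng = random.Random(seed)
    return sum(h(sample(rng)) for _ in range(N)) / N

# h(x) = x^2 under pi = N(0, 1), so the exact value is Pi(h) = 1
est = mc_estimate(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0))
```

By the CLT on the slide, the error here is of order √(2/N) ≈ 0.005.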
Importance Sampling

For Q a proposal distribution such that Q(dx) = q(x) µ(dx), alternative representation

  Π(h) = ∫ h(x) {π/q}(x) q(x) µ(dx).

Principle
Generate an iid sample x_1, …, x_N ∼ Q and estimate Π(h) by

  Π^IS_{Q,N}(h) = N^{-1} Σ_{i=1}^N h(x_i) {π/q}(x_i).
Importance Sampling (convergence)

Then
LLN: Π^IS_{Q,N}(h) → Π(h) almost surely, and if Q((hπ/q)²) < ∞,
CLT: √N ( Π^IS_{Q,N}(h) − Π(h) ) ⇝ N( 0, Q{(hπ/q − Π(h))²} ).

Caveat
If the normalizing constant is unknown, impossible to use Π^IS_{Q,N}
Generic problem in Bayesian Statistics: π(θ|x) ∝ f(x|θ) π(θ).
Self-Normalised Importance Sampling

Self-normalised version

  Π^SNIS_{Q,N}(h) = ( Σ_{i=1}^N {π/q}(x_i) )^{-1} Σ_{i=1}^N h(x_i) {π/q}(x_i).

LLN: Π^SNIS_{Q,N}(h) → Π(h) almost surely,
and if Π((1 + h²)(π/q)) < ∞,
CLT: √N ( Π^SNIS_{Q,N}(h) − Π(h) ) ⇝ N( 0, π{(π/q)(h − Π(h))²} ).

© The quality of the SNIS approximation depends on the choice of Q
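A key practical point in the SNIS formula above is that the ratio π/q need only be known up to constants, since they cancel in the normalisation. A minimal Python sketch (illustrative choices of target, proposal and h):

```python
import random, math

def snis(h, log_target, sample_q, log_q, N=200_000, seed=1):
    # self-normalised IS: weights pi/q up to any multiplicative constant
    rng = random.Random(seed)
    xs = [sample_q(rng) for _ in range(N)]
    ws = [math.exp(log_target(x) - log_q(x)) for x in xs]
    return sum(w * h(x) for w, x in zip(ws, xs)) / sum(ws)

# unnormalised target exp(-x^2/2), i.e. N(0,1) up to a constant;
# proposal Q = N(0, 2^2); estimate Pi(h) for h(x) = x^2 (exact: 1).
# Both log densities drop their normalizing constants: they cancel.
est = snis(lambda x: x * x,
           lambda x: -0.5 * x * x,
           lambda rng: rng.gauss(0.0, 2.0),
           lambda x: -0.5 * (x / 2.0) ** 2)
```

The proposal is deliberately wider than the target, keeping the weights π/q bounded, which is exactly the "choice of Q" issue the slide flags.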
Iterated importance sampling

Introduction of an algorithmic temporal dimension:

  x_i^(t) ∼ q_t(x | x_i^(t−1)),  i = 1, …, n,  t = 1, …

and

  I_t = (1/n) Σ_{i=1}^n ϱ_i^(t) h(x_i^(t))

is still unbiased for

  ϱ_i^(t) = π_t(x_i^(t)) / q_t(x_i^(t) | x_i^(t−1)),  i = 1, …, n
Population Monte Carlo

PMC: Population Monte Carlo Algorithm

At time t = 0:
  Generate (x_{i,0})_{1≤i≤N} iid ∼ Q_0
  Set ω_{i,0} = {π/q_0}(x_{i,0})
  Generate (J_{i,0})_{1≤i≤N} iid ∼ M(1, (ω_{i,0})_{1≤i≤N})
  Set x_{i,0} = x_{J_{i,0},0}

At time t (t = 1, …, T):
  Generate x_{i,t} ind ∼ Q_{i,t}(x_{i,t−1}, ·)
  Set ω_{i,t} = π(x_{i,t}) / q_{i,t}(x_{i,t−1}, x_{i,t})
  Generate (J_{i,t})_{1≤i≤N} iid ∼ M(1, (ω_{i,t})_{1≤i≤N})
  Set x_{i,t} = x_{J_{i,t},t}.

[Cappé, Douc, Guillin, Marin, & CPR, 2009, Stat. & Comput.]
Notes on PMC

After T iterations of PMC, the PMC estimator of Π(h) is given by

  Π^PMC_{N,T}(h) = (1/T) Σ_{t=1}^T Σ_{i=1}^N ω̄_{i,t} h(x_{i,t}).

1. ω̄_{i,t} means normalising over the whole sequence of simulations
2. the Q_{i,t}'s are chosen arbitrarily under a support constraint
3. the Q_{i,t}'s may depend on the whole sequence of simulations
4. a natural solution for ABC
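The weight-resample-move cycle of the PMC algorithm can be sketched as follows (a Python toy with a fixed Gaussian move kernel; the slides allow the Q_{i,t}'s to adapt, and all names here are illustrative):

```python
import random, math

def pmc(log_target, N=2000, T=10, sigma=1.0, seed=3):
    # One simple PMC run: importance weighting, multinomial resampling,
    # then a move from a Gaussian kernel q(x'|x) = N(x, sigma^2).
    # Additive constants in the log densities cancel in the
    # self-normalised weights and are dropped throughout.
    rng = random.Random(seed)
    # t = 0: sample from Q0 = N(0, 3^2), weight by pi/q0
    xs = [rng.gauss(0.0, 3.0) for _ in range(N)]
    logw = [log_target(x) + 0.5 * (x / 3.0) ** 2 for x in xs]
    for _ in range(T):
        # resampling step: draw J_i from M(1, normalised weights)
        m = max(logw)
        ws = [math.exp(lw - m) for lw in logw]
        prev = rng.choices(xs, weights=ws, k=N)
        # move step, then recompute w = pi(x) / q(x | prev)
        xs = [p + sigma * rng.gauss(0.0, 1.0) for p in prev]
        logw = [log_target(x) + 0.5 * ((x - p) / sigma) ** 2
                for x, p in zip(xs, prev)]
    m = max(logw)
    ws = [math.exp(lw - m) for lw in logw]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

# target N(0,1) up to a constant; the weighted mean should be near 0
post_mean = pmc(lambda x: -0.5 * x * x)
```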
Improving quality

The efficiency of the SNIS approximation depends on the choice of Q, ranging from optimal

  q(x) ∝ |h(x) − Π(h)| π(x)

to useless

  var Π^SNIS_{Q,N}(h) = +∞

Example (PMC = adaptive importance sampling)
Population Monte Carlo produces a sequence of proposals Q_t aiming at improving efficiency:

  Kull(π, q_t) ≤ Kull(π, q_{t−1})   or   var Π^SNIS_{Q_t,∞}(h) ≤ var Π^SNIS_{Q_{t−1},∞}(h)

[Cappé, Douc, Guillin, Marin, Robert, 04, 07a, 07b, 08]
AMIS

Multiple Importance Sampling

Recycling: given several proposals Q_1, …, Q_T, for 1 ≤ t ≤ T generate an iid sample

  x_1^t, …, x_N^t ∼ Q_t

and estimate Π(h) by

  Π^MIS_{Q,N}(h) = T^{-1} Σ_{t=1}^T N^{-1} Σ_{i=1}^N h(x_i^t) ω_i^t

where

  ω_i^t = π(x_i^t) / q_t(x_i^t)

correct...
Multiple Importance Sampling

Recycling: given several proposals Q_1, …, Q_T, for 1 ≤ t ≤ T generate an iid sample

  x_1^t, …, x_N^t ∼ Q_t

and estimate Π(h) by

  Π^MIS_{Q,N}(h) = T^{-1} Σ_{t=1}^T N^{-1} Σ_{i=1}^N h(x_i^t) ω_i^t

where

  ω_i^t = π(x_i^t) / ( T^{-1} Σ_{ℓ=1}^T q_ℓ(x_i^t) )

still correct!
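The deterministic-mixture weight above, i.e. dividing by the average of all proposal densities rather than by the generating one, can be sketched in Python (illustrative target, proposals and h):

```python
import random, math

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mis_estimate(h, target_pdf, proposals, N=50_000, seed=7):
    # Owen & Zhou deterministic mixture weights: every draw is weighted
    # by pi(x) over the *average* of all T proposal densities at x
    rng = random.Random(seed)
    T = len(proposals)
    total = 0.0
    for sample, _ in proposals:
        for _ in range(N):
            x = sample(rng)
            mix = sum(pdf(x) for _, pdf in proposals) / T
            total += h(x) * target_pdf(x) / mix
    return total / (T * N)

# target N(0,1); two proposals N(-1, 2^2) and N(1, 2^2); h(x) = x^2 (exact: 1)
props = [(lambda rng: rng.gauss(-1.0, 2.0), lambda x: norm_pdf(x, -1.0, 2.0)),
         (lambda rng: rng.gauss(1.0, 2.0), lambda x: norm_pdf(x, 1.0, 2.0))]
est = mis_estimate(lambda x: x * x, lambda x: norm_pdf(x, 0.0, 1.0), props)
```

Because the denominator is the full mixture, the weight stays bounded even where a single q_t is tiny, which is the stabilisation the next slide lists.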
Mixture representation

Deterministic mixture correction of the weights proposed by Owen and Zhou (JASA, 2000):

- The corresponding estimator is still unbiased [if not self-normalised]
- All particles are on the same weighting scale rather than their own
- Large-variance proposals Q_t do not take over
- Variance reduction thanks to weight stabilization & recycling
- [K.o.] removes the randomness in the component choice [= Rao-Blackwellisation]
Global adaptation

At iteration t = 1, …, T,

1. For 1 ≤ i ≤ N_1, generate x_i^t ∼ T_3(µ^{t−1}, Σ^{t−1})
2. Calculate the mixture importance weight of particle x_i^t

  ω_i^t = π(x_i^t) / δ_i^t

where

  δ_i^t = Σ_{l=0}^{t−1} q_{T(3)}(x_i^t; µ^l, Σ^l)
Backward reweighting

3. If t ≥ 2, actualise the weights of all past particles x_i^l, 1 ≤ l ≤ t − 1:

  ω_i^l = π(x_i^l) / δ_i^l

where

  δ_i^l = δ_i^l + q_{T(3)}(x_i^l; µ^{t−1}, Σ^{t−1})

4. Compute IS estimates of the target mean and variance, µ^t and Σ^t, where

  µ_j^t = Σ_{l=1}^t Σ_{i=1}^{N_1} ω_i^l (x_j)_i^l / Σ_{l=1}^t Σ_{i=1}^{N_1} ω_i^l, …
A toy example

[Figure] Banana-shape benchmark: marginal distribution of (x_1, x_2) for the parameters σ_1² = 100 and b = 0.03. Contours represent 60% (red), 90% (black) and 99.9% (blue) confidence regions in the marginal space.
A toy example

[Figure] Banana-shape example: boxplots of 10 replicate ESSs for the AMIS scheme (left) and the NOT-AMIS scheme (right), for p = 5, 10, 20.
The Metropolis-Hastings Algorithm

Introduction
The Metropolis-Hastings Algorithm
  Monte Carlo Methods based on Markov Chains
  The Metropolis–Hastings algorithm
  The random walk Metropolis-Hastings algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
ABC for model choice
ABC model choice consistency
Monte Carlo Methods based on Markov Chains

Running Monte Carlo via Markov Chains

Epiphany! It is not necessary to use a sample from the distribution f to approximate the integral

  I = ∫ h(x) f(x) dx.

Principle: obtain X_1, …, X_n ∼ f (approximately) without directly simulating from f, by using an ergodic Markov chain with stationary distribution f
[Metropolis et al., 1953]
Running Monte Carlo via Markov Chains (2)

Idea
For an arbitrary starting value x^(0), an ergodic chain (X^(t)) is generated using a transition kernel with stationary distribution f

- Ensures the convergence in distribution of (X^(t)) to a random variable from f.
- For a "large enough" T_0, X^(T_0) can be considered as distributed from f
- Produces a dependent sample X^(T_0), X^(T_0+1), …, which is generated from f, sufficient for most approximation purposes.

Problem: how can one build a Markov chain with a given stationary distribution?
The Metropolis–Hastings algorithm

Basics
The algorithm uses the objective (target) density f and a conditional density q(y|x), called the instrumental (or proposal) distribution
The MH algorithm

Algorithm (Metropolis–Hastings)
Given x^(t),

1. Generate Y_t ∼ q(y|x^(t)).
2. Take

  X^(t+1) = Y_t with prob. ρ(x^(t), Y_t),  and X^(t+1) = x^(t) with prob. 1 − ρ(x^(t), Y_t),

where

  ρ(x, y) = min{ [f(y)/f(x)] [q(x|y)/q(y|x)], 1 }.
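The two-step algorithm above is a few lines of code. A Python sketch with a deliberately asymmetric proposal, so that the q-ratio in ρ actually matters (target, proposal and names are illustrative):

```python
import random, math

def mh(log_f, log_q, sample_q, x0, n_iter=20_000, seed=11):
    # Metropolis-Hastings with the full acceptance ratio
    # rho = min{ [f(y)/f(x)] [q(x|y)/q(y|x)], 1 }, computed in logs
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_iter):
        y = sample_q(rng, x)
        log_rho = (log_f(y) - log_f(x)) + (log_q(x, y) - log_q(y, x))
        if math.log(rng.random()) < log_rho:
            x = y
        chain.append(x)
    return chain

# target N(2, 1) up to a constant; asymmetric proposal q(y|x) = N(x + 0.5, 1)
chain = mh(lambda x: -0.5 * (x - 2.0) ** 2,
           lambda a, b: -0.5 * (a - b - 0.5) ** 2,   # log q(a|b), up to const
           lambda rng, x: x + 0.5 + rng.gauss(0.0, 1.0),
           x0=0.0)
burn = chain[5000:]
mean_mh = sum(burn) / len(burn)
```

Note that f and q enter only through ratios, matching the "independent of normalizing constants" feature on the next slide.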
Features

- Independent of normalizing constants for both f and q(·|x) (i.e., those constants independent of x)
- Never moves to values with f(y) = 0
- The chain (x^(t))_t may take the same value several times in a row, even though f is a density wrt Lebesgue measure
- The sequence (y_t)_t is usually not a Markov chain
Convergence properties

The M-H Markov chain is reversible, with invariant/stationary density f, since it satisfies the detailed balance condition

  f(y) K(y, x) = f(x) K(x, y)

If q(y|x) > 0 for every (x, y), the chain is Harris recurrent, and

  lim_{T→∞} (1/T) Σ_{t=1}^T h(X^(t)) = ∫ h(x) f(x) dx   a.e. f,

and

  lim_{n→∞} ‖ ∫ K^n(x, ·) µ(dx) − f ‖_TV = 0

for every initial distribution µ
The random walk Metropolis-Hastings algorithm

Random walk Metropolis–Hastings

Use of a local perturbation as proposal:

  Y_t = X^(t) + ε_t,

where ε_t ∼ g, independent of X^(t).
The instrumental density is now of the form g(y − x), and the Markov chain is a random walk if we take g to be symmetric: g(x) = g(−x)
Algorithm (Random walk Metropolis)
Given x^(t),

1. Generate Y_t ∼ g(y − x^(t))
2. Take

  X^(t+1) = Y_t with prob. min{ 1, f(Y_t)/f(x^(t)) },  and X^(t+1) = x^(t) otherwise.
RW-MH on mixture posterior distribution

[Figure] Random walk MCMC output for .7 N(µ_1, 1) + .3 N(µ_2, 1)
Acceptance rate

A high acceptance rate is not an indication of efficiency, since the random walk may be moving "too slowly" on the target surface.

If x^(t) and y_t are "too close", i.e. f(x^(t)) ≈ f(y_t), then y_t is accepted with probability

  min( f(y_t)/f(x^(t)), 1 ) ≈ 1,

and the acceptance rate is high.
If the average acceptance rate is low, the proposed values f(y_t) tend to be small with respect to f(x^(t)), i.e. the random walk [not the algorithm!] moves quickly on the target surface, often reaching its boundaries.
Rule of thumb

In small dimensions, aim at an average acceptance rate of 50%. In large dimensions, aim at an average acceptance rate of 25%.
[Gelman, Gilks and Roberts, 1995]
Noisy AR(1)

Target distribution of x given x_1, x_2 and y is

  exp{ −(1/2τ²) [ (x − ϕx_1)² + (x_2 − ϕx)² + (τ²/σ²)(y − x²)² ] }.

For a Gaussian random walk with scale ω small enough, the random walk never jumps to the other mode. But if the scale ω is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
Noisy AR(2)

[Figure] Markov chain based on a random walk with scale ω = .1.

Noisy AR(3)

[Figure] Markov chain based on a random walk with scale ω = .5.
simulation-based methods in Econometrics

Meanwhile, in a Gaulish village...

Similar exploration of simulation-based techniques in Econometrics:

- Simulated method of moments
- Method of simulated moments
- Simulated pseudo-maximum-likelihood
- Indirect inference

[Gourieroux & Monfort, 1996]
Simulated method of moments

Given observations y°_{1:n} from a model

  y_t = r(y_{1:(t−1)}, ε_t, θ),  ε_t ∼ g(·),

simulate ε*_{1:n}, derive

  y*_t(θ) = r(y_{1:(t−1)}, ε*_t, θ)

and estimate θ by

  arg min_θ Σ_{t=1}^n ( y°_t − y*_t(θ) )²

or by the global version

  arg min_θ { Σ_{t=1}^n y°_t − Σ_{t=1}^n y*_t(θ) }²
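The first criterion can be sketched in Python on a deliberately simple hypothetical model y_t = θ + ε_t (not from the slides), where fixing one draw of simulation noise ε* and minimising over a grid recovers θ:

```python
import random

def smm_estimate(y_obs, simulate, grid, seed=5):
    # simulated method of moments sketch: fix one draw of simulation
    # noise eps*, then pick theta minimising sum_t (y_t^o - y_t^*(theta))^2
    rng = random.Random(seed)
    eps_star = [rng.gauss(0.0, 1.0) for _ in range(len(y_obs))]
    def crit(theta):
        y_star = simulate(theta, eps_star)
        return sum((yo - ys) ** 2 for yo, ys in zip(y_obs, y_star))
    return min(grid, key=crit)

# hypothetical location model: r(., eps_t, theta) = theta + eps_t
def loc_model(theta, eps):
    return [theta + e for e in eps]

rng = random.Random(0)
y_obs = loc_model(0.5, [rng.gauss(0.0, 1.0) for _ in range(500)])
grid = [i / 100 for i in range(-100, 101)]
theta_hat = smm_estimate(y_obs, loc_model, grid)
```

Keeping ε* fixed across θ is what makes the criterion a smooth function of θ and the minimisation well behaved.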
Method of simulated moments

Given a statistic vector K(y) with

  E_θ[ K(Y_t) | y_{1:(t−1)} ] = k(y_{1:(t−1)}; θ),

find an unbiased estimator of k(y_{1:(t−1)}; θ),

  k̃(ε_t, y_{1:(t−1)}; θ)

Estimate θ by

  arg min_θ ‖ Σ_{t=1}^n [ K(y_t) − Σ_{s=1}^S k̃(ε_t^s, y_{1:(t−1)}; θ) / S ] ‖

[Pakes & Pollard, 1989]
Indirect inference

Minimise (in θ) the distance between estimators β̂ based on pseudo-models for genuine observations and for observations simulated under the true model and the parameter θ.

[Gourieroux, Monfort, & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
Indirect inference (PML vs. PSE)

Example of the pseudo-maximum-likelihood (PML):

  β̂(y) = arg max_β Σ_t log f*(y_t | β, y_{1:(t−1)})

leading to

  arg min_θ ‖ β̂(y°) − β̂(y¹(θ), …, y^S(θ)) ‖²

when

  y^s(θ) ∼ f(y|θ),  s = 1, …, S
Example of the pseudo-score-estimator (PSE):

  β̂(y) = arg min_β { Σ_t (∂ log f*/∂β)(y_t | β, y_{1:(t−1)}) }²

leading to

  arg min_θ ‖ β̂(y°) − β̂(y¹(θ), …, y^S(θ)) ‖²

when

  y^s(θ) ∼ f(y|θ),  s = 1, …, S
Consistent indirect inference

"...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier..."

Consistency depends on the criterion and on the asymptotic identifiability of θ.
[Gourieroux & Monfort, 1996, p. 66]
AR(2) vs. MA(1) example

True (MA) model

  y_t = ε_t − θ ε_{t−1}

and [wrong!] auxiliary (AR) model

  y_t = β_1 y_{t−1} + β_2 y_{t−2} + u_t

R code:

x=eps=rnorm(250)
x[2:250]=x[2:250]-0.5*x[1:249]
simeps=rnorm(250)
propeta=seq(-.99,.99,le=199)
dist=rep(0,199)
bethat=as.vector(arima(x,c(2,0,0),include.mean=FALSE)$coef)
for (t in 1:199)
  dist[t]=sum((as.vector(arima(c(simeps[1],simeps[2:250]-propeta[t]*simeps[1:249]),
    c(2,0,0),include.mean=FALSE)$coef)-bethat)^2)
One sample:

[Figure] Distance between auxiliary estimates as a function of θ, for a single simulated sample.
Many samples:

[Figure]
Choice of pseudo-model

Pick the pseudo-model such that

1. β̂(θ) is not flat (i.e. sensitive to changes in θ)
2. β̂(θ) is not dispersed (i.e. robust against changes in y^s(θ))

[Frigessi & Heggland, 2004]
ABC using indirect inference

"We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference (...) In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning (...)"
[Drovandi, Pettitt & Faddy, 2011]

© Indirect inference provides summary statistics for ABC...
"...the above result shows that, in the limit as h → 0, ABC will be more accurate than an indirect inference method whose auxiliary statistics are the same as the summary statistic that is used for ABC (...) Initial analysis showed that which method is more accurate depends on the true value of θ."
[Fearnhead and Prangle, 2012]

© Indirect inference provides estimates rather than global inference...
Genetics of ABC

Genetic background of ABC

ABC is a recent computational technique that only requires being able to sample from the likelihood f(·|θ).

This technique stemmed from population genetics models about 15 years ago, and population geneticists still contribute significantly to methodological developments of ABC.
[Griffith & al., 1997; Tavaré & al., 1999]
Population genetics

[Part derived from the teaching material of Raphaël Leblois, ENS Lyon, November 2010]

- Describe the genotypes, estimate the allele frequencies, determine their distribution among individuals, within populations and between populations;
- Predict and understand the evolution of gene frequencies in populations as a result of various factors.

© Analyses the effect of various evolutionary forces (mutation, drift, migration, selection) on the evolution of gene frequencies in time and space.
A[B]ctors

Towards an explanation of the mechanisms of evolutionary change, mathematical models of evolution started to be developed in the years 1920-1930.

Ronald A. Fisher (1890-1962), Sewall Wright (1889-1988), John B.S. Haldane (1892-1964)
Wright-Fisher model

- A population of constant size, in which individuals reproduce at the same time.
- Each gene in a generation is a copy of a gene of the previous generation.
- In the absence of mutation and selection, allele frequencies drift (increase and decrease) inevitably until the fixation of an allele; drift thus leads to the loss of genetic variation within populations.
Coalescent theory

[Kingman, 1982; Tajima, Tavaré, etc.]

Coalescence theory is interested in the genealogy of a sample of genes, back in time to the common ancestor of the sample.
Common ancestor

Modelling the genetic drift process by "going back in time" to the common ancestor of a sample of genes.

The different lineages merge (coalesce) as one goes back into the past.
Neutral mutations

- Under the assumption of neutrality of the genetic markers, the mutations are independent of the genealogy, i.e. the genealogy depends only on the demographic processes.
- The genealogy is thus constructed according to the demographic parameters (e.g. N), then the mutations are added a posteriori on the different branches, from the MRCA to the leaves of the tree.
- This produces polymorphism data under the demographic and mutational models considered.
Demo-genetic inference

Each model is characterized by a set of parameters θ that cover historical (divergence time, admixture time, ...), demographic (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors.

The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time.

Problem: most of the time, we cannot calculate the likelihood of the polymorphism data, f(y|θ).
Intractable likelihood

Missing (too much missing!) data structure:

  f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG

The genealogies G are considered as nuisance parameters.

This setting thus differs from the phylogenetic approach, where the tree is the parameter of interest.
A genuine example of application

Pygmy populations: do they have a common origin? Are there many exchanges between pygmy and non-pygmy populations?
Alternative scenarios

Different possible scenarios; scenario choice by ABC.
[Verdu et al., 2009]
Simulation results

Scenario 1a is strongly supported over the others, arguing for a common origin of the pygmy populations of West Africa.
[Verdu et al., 2009]

© Scenario 1A is chosen.
Most likely scenario

Evolutionary scenario: a story is "told" from these inferences.
[Verdu et al., 2009]
Approximate Bayesian computation

Introduction
The Metropolis-Hastings Algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
  ABC basics
  Alphabet soup
  Automated summary statistic selection
ABC for model choice
ABC model choice consistency
ABC basics

Intractable likelihoods

Cases when the likelihood function f(y|θ) is unavailable, and when the completion step

  f(y|θ) = ∫_Z f(y, z|θ) dz

is impossible or too costly because of the dimension of z
© MCMC cannot be implemented!
Illustrations

Example (stochastic volatility model): for t = 1, …, T,

  y_t = exp(z_t) ε_t,  z_t = a + b z_{t−1} + σ η_t;

T very large makes it difficult to include z within the simulated parameters.

[Figure] Highest weight trajectories.
Example (Potts model): if y takes values on a grid Y of size k^n and

  f(y|θ) ∝ exp{ θ Σ_{l∼i} I_{y_l = y_i} }

where l∼i denotes a neighbourhood relation, n moderately large prohibits the computation of the normalising constant.
Example (inference on CMB): in cosmology, study of the Cosmic Microwave Background via likelihoods immensely slow to compute (e.g. WMAP, Planck), because of numerically costly spectral transforms. [The data is a Fortran program.]
[Kilbinger et al., 2010, MNRAS]
Example (phylogenetic tree): in population genetics, reconstitution of a common ancestor from a sample of genes via a phylogenetic tree that is close to impossible to integrate out. [100 processor days with 4 parameters]
[Cornuet et al., 2009, Bioinformatics]
The ABC method

Bayesian setting: target is π(θ) f(x|θ).
When the likelihood f(x|θ) is not in closed form, likelihood-free rejection technique:

ABC algorithm
For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating

  θ′ ∼ π(θ),  z ∼ f(z|θ′),

until the auxiliary variable z is equal to the observed value, z = y.
[Tavaré et al., 1997]
Why does it work?!

The proof is trivial:

  f(θ_i) ∝ Σ_{z∈D} π(θ_i) f(z|θ_i) I_y(z)
         ∝ π(θ_i) f(y|θ_i)
         = π(θ_i | y).

[Accept–Reject 101]
Earlier occurrence

'Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset'
[Don Rubin, Annals of Statistics, 1984]

Note that Rubin (1984) does not promote this algorithm for likelihood-free simulation but as a frequentist intuition on posterior distributions: parameters from posteriors are more likely to be those that could have generated the data.
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
A as A...pproximative
When y is a continuous random variable, equality z = y is replaced with a tolerance condition,

%(y, z) ≤ ε

where % is a distance

Output distributed from

π(θ) Pθ{%(y, z) < ε} ∝ π(θ | %(y, z) < ε)
[Pritchard et al., 1999]
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler

for i = 1 to N do
  repeat
    generate θ′ from the prior distribution π(·)
    generate z from the likelihood f (·|θ′)
  until ρ{η(z), η(y)} ≤ ε
  set θi = θ′
end for

where η(y) defines a (not necessarily sufficient) statistic
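Algorithm 1 can be sketched directly. The following Python version (an illustrative stand-in for the slides' R examples; the normal-mean model, η = sample mean, prior scale, and tolerance are assumptions for the demo) accepts θ′ whenever the summary discrepancy falls below ε:

```python
import random, statistics

def abc_reject(y, simulate, prior_sample, eta, eps, n_keep, seed=1):
    """Likelihood-free rejection sampler: repeat (theta' ~ prior,
    z ~ f(.|theta')) until rho{eta(z), eta(y)} <= eps, then keep theta'."""
    rng = random.Random(seed)
    s_obs = eta(y)
    kept = []
    while len(kept) < n_keep:
        theta = prior_sample(rng)
        z = simulate(theta, rng)
        if abs(eta(z) - s_obs) <= eps:   # rho = absolute difference on eta
            kept.append(theta)
    return kept

# Normal-mean toy model: eta is the sample mean (sufficient here)
rng0 = random.Random(42)
y = [rng0.gauss(2.0, 1.0) for _ in range(50)]
post = abc_reject(
    y,
    simulate=lambda th, r: [r.gauss(th, 1.0) for _ in range(50)],
    prior_sample=lambda r: r.gauss(0.0, 3.0),
    eta=statistics.mean,
    eps=0.1, n_keep=200)
```

With a sufficient η and a small ε, the accepted θ′ are approximately distributed from π(θ|y); here their mean tracks the sample mean of y.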
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Output
The likelihood-free algorithm samples from the marginal in z of:

πε(θ, z|y) = π(θ) f (z|θ) I_{Aε,y}(z) / ∫_{Aε,y×Θ} π(θ) f (z|θ) dz dθ ,

where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}.

The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:

πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|y) .
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Convergence of ABC (first attempt)
What happens when ε → 0?
If f (·|θ) is continuous in y, uniformly in θ [!], given an arbitrary δ > 0, there exists ε0 such that ε < ε0 implies

π(θ) ∫ f (z|θ) I_{Aε,y}(z) dz / ∫_{Aε,y×Θ} π(θ) f (z|θ) dz dθ  ∈  π(θ) f (y|θ)(1 ∓ δ) µ(Bε) / ∫_Θ π(θ) f (y|θ) dθ (1 ± δ) µ(Bε)

and the µ(Bε) factors cancel in the ratio.

[Proof extends to other continuous-in-0 kernels Kε]
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Convergence of ABC (second attempt)

What happens when ε → 0?

For B ⊂ Θ, we have

∫_B [ ∫_{Aε,y} f (z|θ) dz / ∫_{Aε,y×Θ} π(θ) f (z|θ) dz dθ ] π(θ) dθ
 = ∫_{Aε,y} [ ∫_B f (z|θ) π(θ) dθ / ∫_{Aε,y×Θ} π(θ) f (z|θ) dz dθ ] dz
 = ∫_{Aε,y} [ ∫_B f (z|θ) π(θ) dθ / m(z) ] [ m(z) / ∫_{Aε,y×Θ} π(θ) f (z|θ) dz dθ ] dz
 = ∫_{Aε,y} π(B|z) [ m(z) / ∫_{Aε,y×Θ} π(θ) f (z|θ) dz dθ ] dz

which indicates convergence for a continuous π(B|z).
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
I plasma glucose concentration in oral glucose tolerance test
I diastolic blood pressure
I diabetes pedigree function
I presence/absence of diabetes

Probability of diabetes as a function of the above variables:

P(y = 1|x) = Φ(x1β1 + x2β2 + x3β3) ,

Test of H0 : β3 = 0 for the 200 observations of Pima.tr based on a g-prior modelling:

β ∼ N3(0, n (XᵀX)−1)

Use of an importance function inspired by the MLE estimate distribution

β ∼ N (β̂, Σ̂)
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Pima Indian benchmark

Figure: Comparison between density estimates of the marginals on β1 (left), β2 (center) and β3 (right) from ABC rejection samples (red) and MCMC samples (black)
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
MA example
Back to the MA(q) model

xt = εt + ∑_{i=1}^{q} ϑi εt−i

Simple prior: uniform over the inverse [real and complex] roots of

Q(u) = 1 − ∑_{i=1}^{q} ϑi u^i

under the identifiability conditions, i.e. a uniform prior over the identifiability zone, e.g. the triangle for MA(2)
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
MA example (2)
ABC algorithm thus made of

1. picking a new value (ϑ1, ϑ2) in the triangle
2. generating an iid sequence (εt)−q<t≤T
3. producing a simulated series (x′t)1≤t≤T

Distance: basic distance between the series

ρ((x′t)1≤t≤T , (xt)1≤t≤T ) = ∑_{t=1}^{T} (xt − x′t)²

or distance between summary statistics like the q autocorrelations

τj = ∑_{t=j+1}^{T} xt xt−j
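A compact sketch of this MA(2) scheme (Python rather than the slides' R; the true values ϑ1 = 0.6 and ϑ2 = 0.2, the series length, and the 2% quantile used as tolerance are illustrative assumptions):

```python
import random

def ma2_series(th1, th2, T, rng):
    """x_t = eps_t + th1*eps_{t-1} + th2*eps_{t-2}, eps_t iid N(0,1)."""
    eps = [rng.gauss(0, 1) for _ in range(T + 2)]
    return [eps[t + 2] + th1 * eps[t + 1] + th2 * eps[t] for t in range(T)]

def autocovs(x, q=2):
    """Summary statistics tau_j = sum_{t>j} x_t x_{t-j} (scaled by 1/T)."""
    return [sum(x[t] * x[t - j] for t in range(j, len(x))) / len(x)
            for j in range(1, q + 1)]

def sample_triangle(rng):
    """Uniform draw over the MA(2) identifiability triangle:
    -2 < th1 < 2, th1 + th2 > -1, th1 - th2 < 1."""
    while True:
        th1, th2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if th1 + th2 > -1 and th1 - th2 < 1:
            return th1, th2

rng = random.Random(3)
x_obs = ma2_series(0.6, 0.2, 250, rng)
s_obs = autocovs(x_obs)
sims = []
for _ in range(6000):
    th1, th2 = sample_triangle(rng)
    s = autocovs(ma2_series(th1, th2, 250, rng))
    d = sum((a - b) ** 2 for a, b in zip(s, s_obs))
    sims.append((d, th1, th2))
sims.sort()                      # keep the 2% closest simulations
abc = sims[: len(sims) // 50]
```

Sorting on the summary distance and keeping a small quantile is the usual practical substitute for fixing ε in advance.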
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Comparison of distance impact
Evaluation of the tolerance on the ABC sample against both distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
Homonomy
The ABC algorithm is not to be confused with the ABC algorithm
The Artificial Bee Colony algorithm is a swarm-based meta-heuristic algorithm that was introduced by Karaboga in 2005 for optimizing numerical problems. It was inspired by the intelligent foraging behavior of honey bees. The algorithm is specifically based on the model proposed by Tereshko and Loengarov (2005) for the foraging behaviour of honey bee colonies. The model consists of three essential components: employed and unemployed foraging bees, and food sources. The first two components, employed and unemployed foraging bees, search for rich food sources (...) close to their hive. The model also defines two leading modes of behaviour (...): recruitment of foragers to rich food sources resulting in positive feedback and abandonment of poor sources by foragers causing negative feedback.
[Karaboga, Scholarpedia]
Likelihood-free Methods
Approximate Bayesian computation
ABC basics
ABC advances
Simulating from the prior is often poor in efficiency.
Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y...
[Marjoram et al, 2003; Bortot et al., 2007; Sisson et al., 2007]
...or view the problem as conditional density estimation and develop techniques to allow for a larger ε
[Beaumont et al., 2002]
...or even include ε in the inferential framework [ABCµ]
[Ratmann et al., 2009]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP
Better usage of [prior] simulations by adjustment: instead of throwing away θ′ such that ρ(η(z), η(y)) > ε, replace the θ’s with locally regressed transforms
(use with BIC)

θ∗ = θ − {η(z) − η(y)}ᵀ β̂   [Csillery et al., TEE, 2010]

where β̂ is obtained by [NP] weighted least squares regression on (η(z) − η(y)) with weights

Kδ {ρ(η(z), η(y))}
[Beaumont et al., 2002, Genetics]
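A sketch of this local-linear adjustment for a one-dimensional summary (Python; the normal-mean model, the Epanechnikov kernel, and the bandwidth δ = 1 are illustrative assumptions, not from the slides):

```python
import random, statistics

def regression_adjust(pairs, s_obs, delta):
    """Beaumont-style adjustment: weighted least squares of theta on the
    summary s, then theta* = theta - beta_hat (s - s_obs), with an
    Epanechnikov kernel K_delta on |s - s_obs| as weights."""
    sel = [(th, s, 1 - ((s - s_obs) / delta) ** 2)
           for th, s in pairs if abs(s - s_obs) < delta]
    W = sum(w for _, _, w in sel)
    sbar = sum(w * s for _, s, w in sel) / W
    tbar = sum(w * th for th, _, w in sel) / W
    beta = (sum(w * (s - sbar) * (th - tbar) for th, s, w in sel)
            / sum(w * (s - sbar) ** 2 for _, s, w in sel))
    return [th - beta * (s - s_obs) for th, s, _ in sel]

rng = random.Random(7)
# Normal-mean toy model: eta(z) = mean of 20 points, observed eta(y) = 1.5
pairs = []
for _ in range(5000):
    th = rng.gauss(0, 5)                                   # prior draw
    s = statistics.mean([rng.gauss(th, 1) for _ in range(20)])
    pairs.append((th, s))
adjusted = regression_adjust(pairs, s_obs=1.5, delta=1.0)
raw = [th for th, s in pairs if abs(s - 1.5) < 1.0]
```

The adjusted sample concentrates much more tightly around the posterior than the raw accepted θ’s, which is exactly why the adjustment permits a larger tolerance.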
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (regression)
Also found in the subsequent literature, e.g. in Fearnhead–Prangle (2012): weight the simulation directly by

Kδ {ρ(η(z(θ)), η(y))}

or

(1/S) ∑_{s=1}^{S} Kδ {ρ(η(zs(θ)), η(y))}

[consistent estimate of f (η|θ)]

Curse of dimensionality: poor estimate when d = dim(η) is large...
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (density estimation)
Use of the kernel weights

Kδ {ρ(η(z(θ)), η(y))}

leads to the NP estimate of the posterior expectation

∑i θi Kδ {ρ(η(z(θi )), η(y))} / ∑i Kδ {ρ(η(z(θi )), η(y))}
[Blum, JASA, 2010]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (density estimation)
Use of the kernel weights
Kδ {ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior conditional density

∑i K̃b(θi − θ) Kδ {ρ(η(z(θi )), η(y))} / ∑i Kδ {ρ(η(z(θi )), η(y))}
[Blum, JASA, 2010]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NP (density estimations)
Other versions incorporating regression adjustments

∑i K̃b(θ∗i − θ) Kδ {ρ(η(z(θi )), η(y))} / ∑i Kδ {ρ(η(z(θi )), η(y))}

In all cases, error

E[ĝ(θ|y)] − g(θ|y) = cb² + cδ² + O_P(b² + δ²) + O_P(1/nδ^d)

var(ĝ(θ|y)) = c/(nbδ^d) (1 + o_P(1))

[Blum, JASA, 2010]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NCH
Incorporating non-linearities and heteroscedasticities:

θ∗ = m̂(η(y)) + [θ − m̂(η(z))] σ̂(η(y)) / σ̂(η(z))

where
I m̂(η) estimated by non-linear regression (e.g., neural network)
I σ̂(η) estimated by non-linear regression on the residuals

log{θi − m̂(ηi )}² = log σ²(ηi ) + ξi
[Blum & Francois, 2009]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-NCH (2)
Why neural network?
I fights curse of dimensionality
I selects relevant summary statistics
I provides automated dimension reduction
I offers a model choice capability
I improves upon multinomial logistic
[Blum & Francois, 2009]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-MCMC
Markov chain (θ(t)) created via the transition function

θ(t+1) = θ′ ∼ Kω(θ′|θ(t))   if x ∼ f (x |θ′) is such that x = y
                            and u ∼ U(0, 1) ≤ π(θ′)Kω(θ(t)|θ′) / π(θ(t))Kω(θ′|θ(t)) ,
         θ(t)               otherwise,

has the posterior π(θ|y) as stationary distribution
[Marjoram et al, 2003]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-MCMC (2)
Algorithm 2 Likelihood-free MCMC sampler

Use Algorithm 1 to get (θ(0), z(0))
for t = 1 to N do
  Generate θ′ from Kω(·|θ(t−1)),
  Generate z′ from the likelihood f (·|θ′),
  Generate u from U[0,1],
  if u ≤ [π(θ′)Kω(θ(t−1)|θ′) / π(θ(t−1))Kω(θ′|θ(t−1))] I_{Aε,y}(z′) then
    set (θ(t), z(t)) = (θ′, z′)
  else
    (θ(t), z(t)) = (θ(t−1), z(t−1)),
  end if
end for
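A runnable sketch of Algorithm 2 with a symmetric random-walk proposal, so the Hastings ratio reduces to the prior ratio (Python; the single-observation normal model, prior scale, tolerance, and step size are illustrative assumptions):

```python
import math, random

def abc_mcmc(y, simulate, log_prior, eps, n_iter, theta0, step, seed=11):
    """Likelihood-free MCMC: move to theta' only when the simulated z
    lands within eps of y AND the Metropolis-Hastings test on the prior
    passes; otherwise repeat the current value (Marjoram et al., 2003)."""
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0, step)     # symmetric proposal K_omega
        z = simulate(prop, rng)
        if (abs(z - y) <= eps
                and math.log(rng.random()) <= log_prior(prop) - log_prior(theta)):
            theta = prop
        chain.append(theta)
    return chain

# z | theta ~ N(theta, 1), prior theta ~ N(0, 10^2), observed y = 2
chain = abc_mcmc(
    y=2.0,
    simulate=lambda th, r: r.gauss(th, 1.0),
    log_prior=lambda th: -th * th / 200.0,
    eps=0.3, n_iter=20000, theta0=0.0, step=1.0)
```

With a small ε the chain targets π(θ | |z − y| ≤ ε) ≈ π(θ|y); repeating the current value on rejection is what keeps the stationary distribution right.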
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Why does it work?

Acceptance probability does not involve calculating the likelihood, since

πε(θ′, z′|y) / πε(θ(t−1), z(t−1)|y) × q(θ(t−1)|θ′) f (z(t−1)|θ(t−1)) / q(θ′|θ(t−1)) f (z′|θ′)

 = [ π(θ′) f (z′|θ′) I_{Aε,y}(z′) / π(θ(t−1)) f (z(t−1)|θ(t−1)) I_{Aε,y}(z(t−1)) ] × [ q(θ(t−1)|θ′) f (z(t−1)|θ(t−1)) / q(θ′|θ(t−1)) f (z′|θ′) ]

 = π(θ′) q(θ(t−1)|θ′) / π(θ(t−1)) q(θ′|θ(t−1)) × I_{Aε,y}(z′)

after cancellation of f (z′|θ′), f (z(t−1)|θ(t−1)) and I_{Aε,y}(z(t−1)) [= 1 for the current state].
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example
Case of

x ∼ ½N (θ, 1) + ½N (−θ, 1)

under prior θ ∼ N (0, 10²)

ABC sampler

thetas=rnorm(N,sd=10)
zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1)
eps=quantile(abs(zed-x),.01)
abc=thetas[abs(zed-x)<eps]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example

Case of

x ∼ ½N (θ, 1) + ½N (−θ, 1)

under prior θ ∼ N (0, 10²)

ABC-MCMC sampler

metas=rep(0,N)
zed=rep(0,N)       # eps taken from the ABC rejection run above
metas[1]=rnorm(1,sd=10)
zed[1]=x
for (t in 2:N){
  metas[t]=rnorm(1,mean=metas[t-1],sd=5)
  zed[t]=rnorm(1,mean=(1-2*(runif(1)<.5))*metas[t],sd=1)
  if ((abs(zed[t]-x)>eps)||(runif(1)>dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){
    metas[t]=metas[t-1]
    zed[t]=zed[t-1]}
}
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A toy example

x = 2  [Figure: posterior density estimate of θ over (−4, 4)]

x = 10  [Figure: posterior density estimate of θ over (−10, 10)]

x = 50  [Figure: posterior density estimate of θ over (−40, 40)]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ
[Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
Use of a joint density

f (θ, ε|y) ∝ ξ(ε|y, θ) × πθ(θ) × πε(ε)

where y is the data, and ξ(ε|y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and x when z ∼ f (z|θ)

Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel approximation.
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ details
Multidimensional distances ρk (k = 1, . . . ,K ) and errors εk = ρk(ηk(z), ηk(y)), with

εk ∼ ξk(ε|y, θ) ≈ ξ̂k(ε|y, θ) = (1/Bhk) ∑_b K[{εk − ρk(ηk(zb), ηk(y))}/hk]

then used in replacing ξ(ε|y, θ) with mink ξ̂k(ε|y, θ)

ABCµ involves acceptance probability

[π(θ′, ε′)/π(θ, ε)] × [q(θ′, θ)q(ε′, ε)/q(θ, θ′)q(ε, ε′)] × [mink ξ̂k(ε′|y, θ′)/mink ξ̂k(ε|y, θ)]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ multiple errors
[© Ratmann et al., PNAS, 2009]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABCµ for model choice
[© Ratmann et al., PNAS, 2009]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Questions about ABCµ
For each model under comparison, the marginal posterior on ε is used to assess the fit of the model (HPD includes 0 or not).
I Is the data informative about ε? [Identifiability]
I How is the prior π(ε) impacting the comparison?
I How is using both ξ(ε|x0, θ) and πε(ε) compatible with a standard probability model? [remindful of Wilkinson]
I Where is the penalisation for complexity in the model comparison?
[X, Mengersen & Chen, 2010, PNAS]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-PRC
Another sequential version producing a sequence of Markov transition kernels Kt and of samples (θ1(t), . . . , θN(t)) (1 ≤ t ≤ T )

ABC-PRC Algorithm

1. Select θ⋆ at random from the previous θi(t−1)’s with probabilities ωi(t−1) (1 ≤ i ≤ N).
2. Generate θi(t) ∼ Kt(θ|θ⋆) , x ∼ f (x |θi(t)) ,
3. Check that %(x , y) < ε, otherwise start again.
[Sisson et al., 2007]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Why PRC?
Partial rejection control: resample from a population of weighted particles by pruning away particles with weights below a threshold C, replacing them by new particles obtained by propagating an existing particle by an SMC step, and modifying the weights accordingly.
[Liu, 2001]

PRC justification in ABC-PRC:

Suppose we then implement the PRC algorithm for some c > 0 such that only identically zero weights are smaller than c

Trouble is, there is no such c...
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-PRC weight
Probability ωi(t) computed as

ωi(t) ∝ π(θi(t)) Lt−1(θ⋆|θi(t)) {π(θ⋆) Kt(θi(t)|θ⋆)}−1 ,

where Lt−1 is an arbitrary transition kernel.

In case

Lt−1(θ′|θ) = Kt(θ|θ′) ,

all weights are equal under a uniform prior.

Inspired from Del Moral et al. (2006), who use backward kernels Lt−1 in SMC to achieve unbiasedness
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-PRC bias
Lack of unbiasedness of the method

Joint density of the accepted pair (θ(t−1), θ(t)) proportional to

π(θ(t−1)|y) Kt(θ(t)|θ(t−1)) f (y|θ(t)) ,

For an arbitrary function h(θ), E[ωt h(θ(t))] proportional to

∫∫ h(θ(t)) [π(θ(t)) Lt−1(θ(t−1)|θ(t)) / π(θ(t−1)) Kt(θ(t)|θ(t−1))] π(θ(t−1)|y) Kt(θ(t)|θ(t−1)) f (y|θ(t)) dθ(t−1) dθ(t)
 ∝ ∫∫ h(θ(t)) [π(θ(t)) Lt−1(θ(t−1)|θ(t)) / π(θ(t−1)) Kt(θ(t)|θ(t−1))] π(θ(t−1)) f (y|θ(t−1)) × Kt(θ(t)|θ(t−1)) f (y|θ(t)) dθ(t−1) dθ(t)
 ∝ ∫ h(θ(t)) π(θ(t)|y) {∫ Lt−1(θ(t−1)|θ(t)) f (y|θ(t−1)) dθ(t−1)} dθ(t) .
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A mixture example (1)
Toy model of Sisson et al. (2007): if

θ ∼ U(−10, 10) , x |θ ∼ 0.5N (θ, 1) + 0.5N (θ, 1/100) ,

then the posterior distribution associated with y = 0 is the normal mixture

θ|y = 0 ∼ 0.5N (0, 1) + 0.5N (0, 1/100)

restricted to [−10, 10].
Furthermore, the true target is available as

π(θ | |x | < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)) .
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
“Ugly, squalid graph...”
[Figure: ten ABC-PRC density fits of θ over (−3, 3)]

Comparison of τ = 0.15 and τ = 1/0.15 in Kt
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A PMC version
Use of the same kernel idea as ABC-PRC but with IS correction (check PMC).
Generate a sample at iteration t by

π̂t(θ(t)) ∝ ∑_{j=1}^{N} ωj(t−1) Kt(θ(t)|θj(t−1))

modulo acceptance of the associated xt , and use an importance weight associated with an accepted simulation θi(t)

ωi(t) ∝ π(θi(t)) / π̂t(θi(t)) .

© Still likelihood free
[Beaumont et al., 2009]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Our ABC-PMC algorithm
Given a decreasing sequence of approximation levels ε1 ≥ . . . ≥ εT ,

1. At iteration t = 1,
   For i = 1, ...,N
     Simulate θi(1) ∼ π(θ) and x ∼ f (x |θi(1)) until %(x , y) < ε1
     Set ωi(1) = 1/N
   Take τ² as twice the empirical variance of the θi(1)’s

2. At iteration 2 ≤ t ≤ T ,
   For i = 1, ...,N, repeat
     Pick θ⋆i from the θj(t−1)’s with probabilities ωj(t−1)
     generate θi(t)|θ⋆i ∼ N (θ⋆i , σt²) and x ∼ f (x |θi(t))
   until %(x , y) < εt
   Set ωi(t) ∝ π(θi(t)) / ∑_{j=1}^{N} ωj(t−1) ϕ(σt−1 {θi(t) − θj(t−1)})
   Take τt+1² as twice the weighted empirical variance of the θi(t)’s
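The steps above can be sketched as follows (Python; the normal model, prior, tolerance schedule, and particle number are illustrative assumptions, with ϕ the normal density used in the mixture weight):

```python
import math, random

def abc_pmc(y_stat, sim_stat, prior_rng, prior_logpdf, eps_seq, N, seed=5):
    """ABC-PMC: rejection at eps_1, then at each later level move a picked
    particle with a N(., tau_t^2) kernel (tau^2 = twice the weighted
    variance) and reweight by prior(theta) / mixture density."""
    rng = random.Random(seed)
    parts, w = [], []
    while len(parts) < N:                       # t = 1: plain rejection
        th = prior_rng(rng)
        if abs(sim_stat(th, rng) - y_stat) < eps_seq[0]:
            parts.append(th); w.append(1.0 / N)
    for eps in eps_seq[1:]:
        mean = sum(wi * p for wi, p in zip(w, parts))
        tau = math.sqrt(2.0 * sum(wi * (p - mean) ** 2 for wi, p in zip(w, parts)))
        new_p, new_w = [], []
        while len(new_p) < N:
            th = rng.choices(parts, weights=w)[0] + rng.gauss(0, tau)
            if abs(sim_stat(th, rng) - y_stat) >= eps:
                continue                        # reject, keep proposing
            mix = sum(wi * math.exp(-((th - p) / tau) ** 2 / 2)
                      for wi, p in zip(w, parts))
            new_p.append(th)
            new_w.append(math.exp(prior_logpdf(th)) / mix)
        tot = sum(new_w)
        parts, w = new_p, [x / tot for x in new_w]
    return parts, w

# eta = mean of 10 N(theta,1) draws, observed eta(y) = 1.0, prior N(0, 5^2)
parts, w = abc_pmc(
    y_stat=1.0,
    sim_stat=lambda th, r: sum(r.gauss(th, 1) for _ in range(10)) / 10,
    prior_rng=lambda r: r.gauss(0, 5),
    prior_logpdf=lambda th: -th * th / 50.0,
    eps_seq=[2.0, 1.0, 0.5], N=300)
wmean = sum(wi * p for wi, p in zip(w, parts))
```

The kernel's normalising constant is the same for every component, so it cancels once the weights are renormalised.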
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Sequential Monte Carlo
SMC is a simulation technique to approximate a sequence of related probability distributions πn with π0 “easy” and πT as target.
Iterated IS as PMC: particles moved from time n−1 to time n via kernel Kn and use of a sequence of extended targets π̃n

π̃n(z0:n) = πn(zn) ∏_{j=0}^{n−1} Lj(zj+1, zj)

where the Lj ’s are backward Markov kernels [check that πn(zn) is a marginal]
[Del Moral, Doucet & Jasra, Series B, 2006]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Sequential Monte Carlo (2)
Algorithm 3 SMC sampler

sample zi(0) ∼ γ0(x) (i = 1, . . . ,N)
compute weights wi(0) = π0(zi(0))/γ0(zi(0))
for t = 1 to N do
  if ESS(w(t−1)) < NT then
    resample N particles z(t−1) and set weights to 1
  end if
  generate zi(t) ∼ Kt(zi(t−1), ·) and set weights to

  wi(t) = wi(t−1) πt(zi(t)) Lt−1(zi(t), zi(t−1)) / πt−1(zi(t−1)) Kt(zi(t−1), zi(t))

end for
[Del Moral, Doucet & Jasra, Series B, 2006]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-SMC
[Del Moral, Doucet & Jasra, 2009]
True derivation of an SMC-ABC algorithm
Use of a kernel Kn associated with target πεn and derivation of the backward kernel

Ln−1(z , z′) = πεn(z′) Kn(z′, z) / πεn(z)

Update of the weights

win ∝ wi(n−1) [∑_{m=1}^{M} I_{Aεn}(x_in^m)] / [∑_{m=1}^{M} I_{Aεn−1}(x_i(n−1)^m)]

when x_in^m ∼ K (xi(n−1), ·)
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
ABC-SMCM
Modification: makes M repeated simulations of the pseudo-data z given the parameter, rather than using a single [M = 1] simulation, leading to a weight proportional to the number of accepted zi ’s

ω(θ) = (1/M) ∑_{i=1}^{M} I_{ρ(η(y),η(zi ))<ε}
[limit in M means exact simulation from (tempered) target]
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
Properties of ABC-SMC
The ABC-SMC method properly uses a backward kernel L(z , z′) to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. The update of the importance weights reduces to the ratio of the proportions of surviving particles.
Major assumption: the forward kernel K is supposed to be invariant against the true target [a tempered version of the true posterior].
Adaptivity in the ABC-SMC algorithm is only found in the on-line construction of the thresholds εt , slowly enough to keep a large number of accepted transitions
Likelihood-free Methods
Approximate Bayesian computation
Alphabet soup
A mixture example (2)
Recovery of the target, whether using a fixed standard deviation of \tau = 0.15 or \tau = 1/0.15, or a sequence of adaptive \tau_t's.
[Figure: five density panels over \theta \in (-3, 3), vertical scale 0.0 to 1.0, showing recovery of the target under the different choices of \tau]
Yet another ABC-PRC
Another version of ABC-PRC, called PRC-ABC, with a proposal distribution M_t, S replications of the pseudo-data, and a PRC step:

PRC-ABC Algorithm

1. Select \theta^\star at random from the \theta_i^{(t-1)}'s with probabilities \omega_i^{(t-1)}

2. Generate \theta_i^{(t)} \sim K_t, x_1, \ldots, x_S \overset{iid}{\sim} f(x|\theta_i^{(t)}), set

\omega_i^{(t)} = \sum_s K_t\{\eta(x_s) - \eta(y)\} \Big/ K_t(\theta_i^{(t)}),

and accept with probability p(i) = \omega_i^{(t)}/c_{t,N}

3. Set \omega_i^{(t)} \propto \omega_i^{(t)}/p(i) \quad (1 \le i \le N)
[Peters, Sisson & Fan, Stat & Computing, 2012]
Yet another ABC-PRC
I Use of a NP estimate M_t(\theta) = \sum \omega_i^{(t-1)} \psi(\theta|\theta_i^{(t-1)}) (with a fixed variance?!)

I S = 1 recommended for “greatest gains in sampler performance”
I ct,N upper quantile of non-zero weights at time t
Wilkinson’s exact BC
ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, the convolution of the true posterior with a kernel function

\pi_\epsilon(\theta, z|y) = \frac{\pi(\theta) f(z|\theta) K_\epsilon(y - z)}{\int \pi(\theta) f(z|\theta) K_\epsilon(y - z) \, dz \, d\theta},

with K_\epsilon a kernel parameterised by the bandwidth \epsilon.
[Wilkinson, 2008]
Theorem
The ABC algorithm based on the assumption of a randomised observation y = \tilde{y} + \xi, \xi \sim K_\epsilon, and an acceptance probability of

K_\epsilon(y - z)/M
gives draws from the posterior distribution π(θ|y).
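A minimal sketch of this kernel-acceptance scheme, assuming a Gaussian kernel (so the bound M = K_\epsilon(0) is attained at zero) and a toy conjugate normal model; all names and values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel_abc(y_obs, prior_sample, simulate, eps, n_iter=5000):
    """Rejection ABC with a Gaussian kernel: accept (theta, z) with
    probability K_eps(y_obs - z)/M, where M = K_eps(0) bounds the
    (unnormalised) kernel -- cf. the acceptance rule in the theorem."""
    K = lambda u: np.exp(-0.5 * (u / eps) ** 2)  # unnormalised, max 1 at 0
    accepted = []
    for _ in range(n_iter):
        theta = prior_sample()
        z = simulate(theta)
        if rng.uniform() < K(y_obs - z):         # here M = K(0) = 1
            accepted.append(theta)
    return np.array(accepted)

# Toy setup (assumption): theta ~ N(0, 1), y | theta ~ N(theta, 0.5^2)
draws = kernel_abc(0.8, lambda: rng.normal(),
                   lambda th: rng.normal(th, 0.5), eps=0.2)
```

Under Wilkinson's reading, the accepted draws are exact for the posterior of the noisy model y = \tilde{y} + \xi rather than approximate for the original one.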
How exact a BC?
“Using ε to represent measurement error is straightforward, whereas using ε to model the model discrepancy is harder to conceptualize and not as commonly used”
[Richard Wilkinson, 2008]
How exact a BC?
Pros
I Pseudo-data from the true model and observed data from the noisy model

I Interesting perspective in that the outcome is completely controlled

I Link with ABC\mu and assuming y is observed with a measurement error with density K_\epsilon

I Relates to the theory of model approximation [Kennedy & O'Hagan, 2001]
Cons
I Requires Kε to be bounded by M
I True approximation error never assessed
I Requires a modification of the standard ABC algorithm
ABC for HMMs
Specific case of a hidden Markov model
X_{t+1} \sim Q_\theta(X_t, \cdot), \quad Y_{t+1} \sim g_\theta(\cdot|x_{t+1})

where only y^0_{1:n} is observed.
[Dean, Singh, Jasra, & Peters, 2011]
Use of specific constraints, adapted to the Markov structure:

\{y_1 \in B(y^0_1, \epsilon)\} \times \cdots \times \{y_n \in B(y^0_n, \epsilon)\}
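The componentwise ball constraints can be checked with early rejection while simulating the hidden chain forward; the Gaussian AR(1) hidden model below is an assumption for illustration, not the model of Dean et al.

```python
import numpy as np

rng = np.random.default_rng(2)

def hmm_abc_accept(theta, y_obs, eps):
    """Check the constraint y_t in B(y^0_t, eps) for every t along one
    simulated trajectory of an assumed toy hidden chain
    X_{t+1} ~ N(theta X_t, 1), Y_t ~ N(X_t, 0.1^2)."""
    x = 0.0
    for y0_t in y_obs:
        x = theta * x + rng.normal()        # propagate hidden state
        y = x + 0.1 * rng.normal()          # emit pseudo-observation
        if abs(y - y0_t) >= eps:            # leaves the ball: reject early
            return False
    return True

ok = hmm_abc_accept(0.5, np.zeros(5), eps=50.0)
```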
ABC-MLE for HMMs
ABC-MLE defined by
\hat\theta^\epsilon_n = \arg\max_\theta \mathbb{P}_\theta\big(Y_1 \in B(y^0_1, \epsilon), \ldots, Y_n \in B(y^0_n, \epsilon)\big)

Exact MLE for the likelihood of the perturbed model, on the same basis as Wilkinson!

p^\epsilon_\theta(y^0_1, \ldots, y^0_n)

corresponding to the perturbed process

(x_t, y_t + \epsilon z_t)_{1 \le t \le n}, \quad z_t \sim \mathcal{U}(B(0, 1))
[Dean, Singh, Jasra, & Peters, 2011]
ABC-MLE is biased
I ABC-MLE is asymptotically (in n) biased with target

l^\epsilon(\theta) = \mathbb{E}_{\theta^*}[\log p^\epsilon_\theta(Y_1|Y_{-\infty:0})]

I but ABC-MLE converges to the true value in the sense

l^{\epsilon_n}(\theta_n) \to l^\epsilon(\theta)

for all sequences (\theta_n) converging to \theta and \epsilon_n \searrow \epsilon
Noisy ABC-MLE
Idea: modify instead the data from the start

(y^0_1 + \epsilon\zeta_1, \ldots, y^0_n + \epsilon\zeta_n)

[see Fearnhead-Prangle] and noisy ABC-MLE estimate

\arg\max_\theta \mathbb{P}_\theta\big(Y_1 \in B(y^0_1 + \epsilon\zeta_1, \epsilon), \ldots, Y_n \in B(y^0_n + \epsilon\zeta_n, \epsilon)\big)

[Dean, Singh, Jasra, & Peters, 2011]
Consistent noisy ABC-MLE
I Degrading the data improves the estimation performances:

I Noisy ABC-MLE is asymptotically (in n) consistent

I under further assumptions, the noisy ABC-MLE is asymptotically normal

I increase in variance of order \epsilon^{-2}

I likely degradation in precision or computing time due to the lack of summary statistic
SMC for ABC likelihood
Algorithm 4 SMC ABC for HMMs

Given \theta
for k = 1, \ldots, n do
  generate proposals (x^1_k, y^1_k), \ldots, (x^N_k, y^N_k) from the model
  weigh each proposal with \omega^l_k = \mathbb{I}_{B(y^0_k + \epsilon\zeta_k, \epsilon)}(y^l_k)
  renormalise the weights and sample the x^l_k's accordingly
end for
approximate the likelihood by

\prod_{k=1}^n \left( \sum_{l=1}^N \omega^l_k \Big/ N \right)
[Jasra, Singh, Martin, & McCoy, 2010]
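Algorithm 4 can be sketched as follows for a toy Gaussian hidden chain; the model, noise scales, and particle number are assumptions, while the indicator weighting, resampling, and product-of-mean-weights likelihood estimate follow the slide.

```python
import numpy as np

rng = np.random.default_rng(3)

def smc_abc_loglik(theta, y0, eps, N=500):
    """SMC estimate of the (noisy) ABC log-likelihood for an assumed
    chain X_{t+1} ~ N(theta X_t, 1), Y_t ~ N(X_t, 0.5^2)."""
    zeta = rng.uniform(-1, 1, size=len(y0))     # noisy-ABC perturbations
    x = np.zeros(N)
    loglik = 0.0
    for k in range(len(y0)):
        x = theta * x + rng.normal(size=N)      # propagate hidden states
        y = x + 0.5 * rng.normal(size=N)        # propose pseudo-observations
        w = (np.abs(y - (y0[k] + eps * zeta[k])) < eps).astype(float)
        if w.sum() == 0:                        # every particle rejected
            return -np.inf
        loglik += np.log(w.mean())              # factor (sum_l w_k^l / N)
        idx = rng.choice(N, size=N, p=w / w.sum())   # resample survivors
        x = x[idx]
    return loglik

ll = smc_abc_loglik(0.5, np.zeros(4), eps=5.0)
```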
Which summary?
Fundamental difficulty in the choice of the summary statistic when there is no non-trivial sufficient statistic [except when done by the experimenters in the field]

Starting from a large collection of available summary statistics, Joyce and Marjoram (2008) consider their sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test.

I Does not take into account the sequential nature of the tests

I Depends on the parameterisation

I Order of inclusion matters.
A connected Monte Carlo study
I Repeating simulations of the pseudo-data per simulated parameter does not improve the approximation

I The tolerance level \epsilon does not seem to be highly influential

I Choice of distance / summary statistics / calibration factors are paramount to successful approximation

I ABC-SMC outperforms ABC-MCMC
[Mckinley, Cook, Deardon, 2009]
Likelihood-free Methods
Approximate Bayesian computation
Automated summary statistic selection
Semi-automatic ABC
Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson's proposal

ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes. Use of a randomised (or ‘noisy’) version of the summary statistics

\tilde\eta(y) = \eta(y) + \tau\epsilon

Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic. [calibration constraint: ABC approximation with the same posterior mean as the true randomised posterior.]
Summary statistics
Optimality of the posterior expectation E[\theta|y] of the parameter of interest as summary statistic \eta(y)!

Use of the standard quadratic loss function

(\theta - \theta_0)^T A (\theta - \theta_0).
Details on Fearnhead and Prangle (F&P) ABC
Use of a summary statistic S(\cdot), an importance proposal g(\cdot), a kernel K(\cdot) \le 1 and a bandwidth h > 0 such that

(\theta, y_{sim}) \sim g(\theta) f(y_{sim}|\theta)

is accepted with probability (hence the bound)

K[\{S(y_{sim}) - s_{obs}\}/h]

and the corresponding importance weight defined by

\pi(\theta) \big/ g(\theta)
[Fearnhead & Prangle, 2012]
Errors, errors, and errors
Three levels of approximation
I \pi(\theta|y_{obs}) by \pi(\theta|s_{obs}): loss of information [ignored]

I \pi(\theta|s_{obs}) by

\pi_{ABC}(\theta|s_{obs}) = \frac{\int \pi(s) K[\{s - s_{obs}\}/h] \, \pi(\theta|s) \, ds}{\int \pi(s) K[\{s - s_{obs}\}/h] \, ds}

: noisy observations

I \pi_{ABC}(\theta|s_{obs}) by importance Monte Carlo based on N simulations, represented by var(a(\theta)|s_{obs})/N_{acc} [expected number of acceptances]
[M. Twain/B. Disraeli]
Average acceptance asymptotics
For the average acceptance probability/approximate likelihood

p(\theta|s_{obs}) = \int f(y_{sim}|\theta) \, K[\{S(y_{sim}) - s_{obs}\}/h] \, dy_{sim},

the overall acceptance probability is

p(s_{obs}) = \int p(\theta|s_{obs}) \pi(\theta) \, d\theta = \pi(s_{obs}) h^d + o(h^d)
[F&P, Lemma 1]
Optimal importance proposal
Best choice of importance proposal in terms of effective sample size
g^\star(\theta|s_{obs}) \propto \pi(\theta) \, p(\theta|s_{obs})^{1/2}

[Not particularly useful in practice]

I note that p(\theta|s_{obs}) is an approximate likelihood

I reminiscent of parallel tempering

I could be approximately achieved by attrition of half of the data
Calibration of h
“This result gives insight into how S(\cdot) and h affect the Monte Carlo error. To minimize Monte Carlo error, we need h^d to be not too small. Thus ideally we want S(\cdot) to be a low dimensional summary of the data that is sufficiently informative about \theta that \pi(\theta|s_{obs}) is close, in some sense, to \pi(\theta|y_{obs})” (F&P, p.5)

I turns h into an absolute value while it should be context-dependent and user-calibrated

I only addresses one term in the approximation error and acceptance probability (“curse of dimensionality”)

I h large prevents \pi_{ABC}(\theta|s_{obs}) from being close to \pi(\theta|s_{obs})

I d small prevents \pi(\theta|s_{obs}) from being close to \pi(\theta|y_{obs}) (“curse of [dis]information”)
Calibrating ABC
“If \pi_{ABC} is calibrated, then this means that probability statements that are derived from it are appropriate, and in particular that we can use \pi_{ABC} to quantify uncertainty in estimates” (F&P, p.5)

Definition

For 0 < q < 1 and a subset A, the event E_q(A) is made of those s_{obs} such that Pr_{ABC}(\theta \in A|s_{obs}) = q. Then ABC is calibrated if

Pr(\theta \in A|E_q(A)) = q

I unclear meaning of conditioning on E_q(A)
Calibrated ABC
Theorem (F&P)
Noisy ABC, where

s_{obs} = S(y_{obs}) + h\epsilon, \quad \epsilon \sim K(\cdot)

is calibrated
[Wilkinson, 2008]

No condition on h!!
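A noisy-ABC sketch, perturbing the observed summary once before running plain rejection with the same bandwidth; the normal kernel K, the toy model, and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_abc(y_obs, S, prior_sample, simulate, h, n_iter=4000):
    """Noisy ABC: perturb the observed summary once,
    s_obs = S(y_obs) + h * eps with eps ~ K (here standard normal),
    then accept parameters whose simulated summary lands within h."""
    s_obs = S(y_obs) + h * rng.normal()
    out = [theta for theta in (prior_sample() for _ in range(n_iter))
           if abs(S(simulate(theta)) - s_obs) < h]
    return np.array(out)

# Toy setup (assumption): theta ~ N(0, 2^2), y | theta ~ N(theta, 1)^40
y_obs = rng.normal(1.0, 1.0, size=40)
draws = noisy_abc(y_obs, np.mean, lambda: rng.normal(0, 2),
                  lambda th: rng.normal(th, 1.0, size=40), h=0.5)
```

The calibration statement holds for any h here because the target is, by construction, the posterior given the randomised summary.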
Calibrated ABC
Consequence: when h =∞
Theorem (F&P)
The prior distribution is always calibrated
is this a relevant property then?
More about calibrated ABC
“Calibration is not universally accepted by Bayesians. It is even more questionable here as we care how statements we make relate to the real world, not to a mathematically defined posterior.” R. Wilkinson

I Same reluctance about the prior being calibrated

I Property depending on prior, likelihood, and summary

I Calibration is a frequentist property (almost a p-value!)

I More sensible to account for the simulator's imperfections than using noisy-ABC against a meaningless base measure
[Wilkinson, 2012]
Converging ABC
Theorem (F&P)
For noisy ABC, the expected noisy-ABC log-likelihood,
\mathbb{E}\{\log[p(\theta|s_{obs})]\} = \int\int \log[p(\theta|S(y_{obs}) + \epsilon)] \, \pi(y_{obs}|\theta_0) K(\epsilon) \, dy_{obs} \, d\epsilon,

has its maximum at \theta = \theta_0.

True for any choice of summary statistic? Even ancillary statistics?! [Imposes at least identifiability...]
Relevant in asymptotia and not for the data
Converging ABC
Corollary
For noisy ABC, the ABC posterior converges onto a point mass on the true parameter value as m \to \infty.

For standard ABC, not always the case (unless h goes to zero).

Strength of regularity conditions (c1) and (c2) in Bernardo & Smith, 1994? [out-of-reach constraints on likelihood and posterior]
Again, there must be conditions imposed upon summary statistics...
Loss motivated statistic
Under the quadratic loss function,

Theorem (F&P)

(i) The minimal posterior error E[L(\theta, \hat\theta)|y_{obs}] occurs when \hat\theta = E(\theta|y_{obs}) (!)

(ii) When h \to 0, E_{ABC}(\theta|s_{obs}) converges to E(\theta|y_{obs})

(iii) If S(y_{obs}) = E[\theta|y_{obs}], then for \hat\theta = E_{ABC}[\theta|s_{obs}],

E[L(\theta, \hat\theta)|y_{obs}] = \mathrm{trace}(A\Sigma) + h^2 \int x^T A x \, K(x) \, dx + o(h^2).

Measure-theoretic difficulties? The dependence of s_{obs} on h makes me uncomfortable [inherent to noisy ABC]. Relevant for the choice of K?
Optimal summary statistic
“We take a different approach, and weaken the requirement for \pi_{ABC} to be a good approximation to \pi(\theta|y_{obs}). We argue for \pi_{ABC} to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (F&P, p.5)

From this result, F&P

I derive their choice of summary statistic,

S(y) = E(\theta|y)

[almost sufficient: E_{ABC}[\theta|S(y_{obs})] = E[\theta|y_{obs}]]

I suggest

h = O(N^{-1/(2+d)}) and h = O(N^{-1/(4+d)})

as optimal bandwidths for noisy and standard ABC.
Caveat
Since E(\theta|y_{obs}) is most usually unavailable, F&P suggest

(i) use a pilot run of ABC to determine a region of non-negligible posterior mass;

(ii) simulate sets of parameter values and data;

(iii) use the simulated sets of parameter values and data to estimate the summary statistic; and

(iv) run ABC with this choice of summary statistic.
where is the assessment of the first stage error?
Approximating the summary statistic
As in Beaumont et al. (2002) and Blum and Francois (2010), F&P use a linear regression to approximate E(\theta|y_{obs}):

\theta_i = \beta^{(i)}_0 + \beta^{(i)} f(y_i) + \epsilon_i

with f made of standard transforms. [Further justifications?]
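The pilot-regression construction can be sketched as follows, assuming a toy normal model, a uniform prior, and f(y) = (1, mean, sd); these modelling choices are illustrative, not F&P's.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_summary(n_sim=2000, n_obs=30):
    """Pilot stage (sketch): simulate pairs (theta_i, y_i), regress
    theta_i on transforms f(y_i), and return the fitted linear
    predictor as the summary statistic S(y) ~ E[theta | y]."""
    thetas = rng.uniform(-2, 2, size=n_sim)                 # assumed prior
    Y = rng.normal(thetas[:, None], 1.0, size=(n_sim, n_obs))
    F = np.column_stack([np.ones(n_sim), Y.mean(axis=1), Y.std(axis=1)])
    beta, *_ = np.linalg.lstsq(F, thetas, rcond=None)       # least squares
    return lambda y: float(beta @ np.array([1.0, y.mean(), y.std()]))

S = fit_summary()
y_new = rng.normal(1.0, 1.0, size=30)   # data generated at theta = 1
```

The fitted S(y) then replaces the raw summaries in a second, regular ABC run.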
[my]questions about semi-automatic ABC
I dependence on h and S(·) in the early stage
I reduction of Bayesian inference to point estimation
I approximation error in step (i) not accounted for
I not parameterisation invariant
I practice shows that a proper approximation to genuine posterior distributions stems from using a (much) larger number of summary statistics than the dimension of the parameter

I the validity of the approximation to the optimal summary statistic depends on the quality of the pilot run

I important inferential issues like model choice are not covered by this approach.
[Robert, 2012]
More about semi-automatic ABC
[End of section derived from comments on the Read Paper, to appear]

“The apparently arbitrary nature of the choice of summary statistics has always been perceived as the Achilles heel of ABC.” M. Beaumont

I “Curse of dimensionality” linked with the increase of the dimension of the summary statistic

I Connection with principal component analysis [Itan et al., 2010]

I Connection with partial least squares [Wegman et al., 2009]

I Beaumont et al. (2002) postprocessed output is used as input by F&P to run a second ABC
Wood’s alternative
Instead of a non-parametric kernel approximation to the likelihood

\frac{1}{R} \sum_r K_\epsilon\{\eta(y_r) - \eta(y_{obs})\}

Wood (2010) suggests a normal approximation

\eta(y(\theta)) \sim N_d(\mu_\theta, \Sigma_\theta)

whose parameters can be approximated based on the R simulations (for each value of \theta).

I Parametric versus non-parametric rate [Uh?!]

I Automatic weighting of components of \eta(\cdot) through \Sigma_\theta

I Dependence on the normality assumption (pseudo-likelihood?)
[Cornebise, Girolami & Kosmidis, 2012]
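A sketch of this normal (synthetic-likelihood) approximation, with an assumed toy model supplying a two-dimensional summary; the model and summary choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def synthetic_loglik(theta, s_obs, simulate_summary, R=200):
    """Wood-style synthetic likelihood (sketch): fit a Gaussian to R
    simulated summary vectors at theta and evaluate its log-density
    at the observed summary s_obs."""
    S = np.array([simulate_summary(theta) for _ in range(R)])
    mu = S.mean(axis=0)
    Sigma = np.cov(S, rowvar=False) + 1e-8 * np.eye(S.shape[1])  # jitter
    d = s_obs - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = d @ np.linalg.solve(Sigma, d)
    return -0.5 * (logdet + quad + len(d) * np.log(2 * np.pi))

# Toy model (assumption): y ~ N(theta, 1)^50, eta(y) = (mean, log var)
def sim_summary(theta):
    y = rng.normal(theta, 1.0, size=50)
    return np.array([y.mean(), np.log(y.var())])

s_obs = sim_summary(0.0)
ll = synthetic_loglik(0.0, s_obs, sim_summary)
```

The estimated \Sigma_\theta automatically downweights noisy summary components, which is the appeal noted in the bullets above.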
Reinterpretation and extensions
Reinterpretation of the ABC output as joint simulation from

\pi(x, y|\theta) = f(x|\theta) \, \pi_{Y|X}(y|x)

where \pi_{Y|X}(y|x) = K_\epsilon(y - x)

Reinterpretation of noisy ABC:

if y|y^{obs} \sim \pi_{Y|X}(\cdot|y^{obs}), then marginally

y \sim \pi_{Y|\theta}(\cdot|\theta_0)

© Explains the consistency of Bayesian inference based on y and \pi
Reinterpretation and extensions
Also fits within the pseudo-marginal framework of Andrieu and Roberts (2009)

Algorithm 5 Rejuvenating GIMH-ABC

At time t, given \theta_t = \theta:
Sample \theta' \sim g(\cdot|\theta).
Sample z_{1:N} \sim \pi^{\otimes N}_{Y|\Theta}(\cdot|\theta').
Sample x_{1:N-1} \sim \pi^{\otimes(N-1)}_{Y|\Theta}(\cdot|\theta).
With probability

\min\left\{1, \frac{\pi(\theta')g(\theta|\theta')}{\pi(\theta)g(\theta'|\theta)} \times \frac{\sum_{i=1}^N \mathbb{I}_{B_h(z_i)}(y)}{1 + \sum_{i=1}^{N-1} \mathbb{I}_{B_h(x_i)}(y)}\right\}

set \theta_{t+1} = \theta'. Otherwise set \theta_{t+1} = \theta.
[Lee, Andrieu & Doucet, 2012]
Reinterpretation and extensions
Also fits within the pseudo-marginal framework of Andrieu and Roberts (2009)

Algorithm 6 One-hit MCMC-ABC

At time t, with \theta_t = \theta:
Sample \theta' \sim g(\cdot|\theta).
With probability 1 - \min\left\{1, \frac{\pi(\theta')g(\theta|\theta')}{\pi(\theta)g(\theta'|\theta)}\right\}, set \theta_{t+1} = \theta and go to time t + 1.
Sample z_i \sim \pi_{Y|\Theta}(\cdot|\theta') and x_i \sim \pi_{Y|\Theta}(\cdot|\theta) for i = 1, \ldots until y \in B_h(z_i) and/or y \in B_h(x_i).
If y \in B_h(z_i), set \theta_{t+1} = \theta' and go to time t + 1.
If y \in B_h(x_i), set \theta_{t+1} = \theta and go to time t + 1.
[Lee, Andrieu & Doucet, 2012]
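One step of Algorithm 6 can be sketched as follows, assuming a standard normal prior, a symmetric Gaussian proposal (so the proposal ratio cancels), and a scalar toy simulator; all these choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def one_hit_step(theta, y, simulate, h, scale=0.3):
    """One-hit MCMC-ABC step (sketch): prior/proposal MH rejection
    first, then race pseudo-data from theta' and theta until one
    first hits the ball B_h(y)."""
    theta_p = theta + scale * rng.normal()        # symmetric proposal
    log_a = -0.5 * (theta_p**2 - theta**2)        # N(0,1) prior ratio
    if np.log(rng.uniform()) >= min(0.0, log_a):
        return theta                              # early rejection
    while True:
        if abs(simulate(theta_p) - y) < h:
            return theta_p                        # proposal hits first
        if abs(simulate(theta) - y) < h:
            return theta                          # current value hits first

simulate = lambda th: th + 0.5 * rng.normal()     # assumed toy model
chain = [0.0]
for _ in range(300):
    chain.append(one_hit_step(chain[-1], y=1.0, simulate=simulate, h=0.5))
```

The race replaces the fixed-N indicator sums of Algorithm 5 by a random stopping time, which is where the efficiency comparison in Lee, Andrieu & Doucet arises.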
Reinterpretation and extensions
Also fits within the pseudo-marginal framework of Andrieu and Roberts (2009)

Validation

© The invariant distribution of \theta is \pi_{\theta|Y}(\cdot|y) for both algorithms.
[Lee, Andrieu & Doucet, 2012]
ABC for Markov chains
Rewriting the posterior as
\pi(\theta)^{1-n} \, \pi(\theta|x_1) \prod_{t=2}^n \pi(\theta|x_{t-1}, x_t)

where \pi(\theta|x_{t-1}, x_t) \propto f(x_t|x_{t-1}, \theta) \, \pi(\theta)

I Allows for a stepwise ABC, replacing each \pi(\theta|x_{t-1}, x_t) by an ABC approximation

I Similarity with F&P's multiple sources of data (and also with Dean et al., 2011)
[White et al., 2010, 2012]
Back to sufficiency
Difference between regular sufficiency, equivalent to

\pi(\theta|y) = \pi(\theta|\eta(y))

for all \theta's and all priors \pi, and marginal sufficiency, stated as

\pi(\mu(\theta)|y) = \pi(\mu(\theta)|\eta(y))

for all \theta's, the given prior \pi and a subvector \mu(\theta)
[Basu, 1977]

Relates to F&P's main result, but could even be reduced to conditional sufficiency

\pi(\mu(\theta)|y_{obs}) = \pi(\mu(\theta)|\eta(y_{obs}))

(if feasible at all...)
[Dawson, 2012]
ABC and BCa’s
Arguments in favour of a comparison of ABC with the bootstrap in the event \pi(\theta) is not defined from prior information but as a conventional non-informative prior.
[Efron, 2012]
Predictive performances
Instead of posterior means, other aspects of the posterior to explore. E.g., look at minimising the loss of information

\int p(\theta, y) \log \frac{p(\theta, y)}{p(\theta)p(y)} \, d\theta \, dy \; - \; \int p(\theta, \eta(y)) \log \frac{p(\theta, \eta(y))}{p(\theta)p(\eta(y))} \, d\theta \, d\eta(y)

for the selection of summary statistics.
for selection of summary statistics.[Filippi, Barnes, & Stumpf, 2012]
Auxiliary variables
The auxiliary variable method avoids computing the intractable constant in the likelihood

f(y|\theta) = \tilde f(y|\theta) \big/ Z_\theta

[with \tilde f the unnormalised likelihood]. Introduce pseudo-data z with artificial target g(z|\theta, y).
Generate \theta' \sim K(\theta, \theta') and z' \sim f(z|\theta')
[Møller, Pettitt, Berthelsen, & Reeves, 2006]
Accept with probability

\frac{\pi(\theta')f(y|\theta')g(z'|\theta', y)}{\pi(\theta)f(y|\theta)g(z|\theta, y)} \times \frac{K(\theta', \theta)f(z|\theta)}{K(\theta, \theta')f(z'|\theta')} \wedge 1
[Møller, Pettitt, Berthelsen, & Reeves, 2006]
For Gibbs random fields, existence of a genuine sufficient statistic \eta(y).
[Møller, Pettitt, Berthelsen, & Reeves, 2006]
Auxiliary variables and ABC
Special case of ABC when

I g(z|\theta, y) = K_\epsilon(\eta(z) - \eta(y))

I f(y|\theta')f(z|\theta)/f(y|\theta)f(z'|\theta') replaced by one [or not?!]

Consequences

I likelihood-free (ABC) versus constant-free (AVM)

I in ABC, K_\epsilon(\cdot) should be allowed to depend on \theta

I for Gibbs random fields, the auxiliary approach should be preferred to ABC
[Møller, 2012]
ABC and BIC
Idea of applying BIC during the local regression:

I Run regular ABC

I Select summary statistics during the local regression

I Recycle the prior simulation sample (reference table) with those summary statistics

I Rerun the corresponding local regression (low cost)
[Pudlo & Sedki, 2012]
A Brave New World?!
Likelihood-free Methods
ABC for model choice
ABC for model choice
Introduction
The Metropolis-Hastings Algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
ABC for model choice
Model choice
Gibbs random fields
Generic ABC model choice
ABC model choice consistency
Likelihood-free Methods
ABC for model choice
Model choice
Bayesian model choice
Several models M_1, M_2, \ldots are considered simultaneously for a dataset y, and the model index M is part of the inference.
Use of a prior distribution \pi(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, \pi_m(\theta_m).
Goal is to derive the posterior distribution of M, a challenging computational target when models are complex.
Generic ABC for model choice
Algorithm 7 Likelihood-free model choice sampler (ABC-MC)

for t = 1 to T do
  repeat
    Generate m from the prior \pi(M = m)
    Generate \theta_m from the prior \pi_m(\theta_m)
    Generate z from the model f_m(z|\theta_m)
  until \rho\{\eta(z), \eta(y)\} < \epsilon
  Set m^{(t)} = m and \theta^{(t)} = \theta_m
end for
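Algorithm 7 can be sketched for two assumed toy models with equal prior weight (normal versus Laplace likelihoods, mean summary); all modelling choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

def abc_model_choice(y_obs, eps, T=1000):
    """ABC-MC sketch: jointly sample (m, theta_m, z) from the prior and
    model, keep the model index whenever the summary distance is
    below eps.  M1: y ~ N(theta, 1); M2: y ~ Laplace(theta, 1);
    theta ~ N(0, 2) under both (assumed)."""
    n = len(y_obs)
    s_obs = y_obs.mean()
    accepted = []
    while len(accepted) < T:
        m = int(rng.integers(2))                     # prior pi(M = m) = 1/2
        theta = rng.normal(0.0, np.sqrt(2.0))
        z = rng.normal(theta, 1.0, n) if m == 0 else rng.laplace(theta, 1.0, n)
        if abs(z.mean() - s_obs) < eps:              # accept (m, theta)
            accepted.append(m)
    return np.bincount(accepted, minlength=2) / T    # approx. pi(M = m | y)

y_obs = rng.normal(0.2, 1.0, size=25)
freq = abc_model_choice(y_obs, eps=0.5)
```

The returned frequencies are exactly the acceptance-frequency estimate of \pi(M = m|y) discussed on the next slide.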
ABC estimates
Posterior probability \pi(M = m|y) approximated by the frequency of acceptances from model m

\frac{1}{T} \sum_{t=1}^T \mathbb{I}_{m^{(t)} = m}.

Issues with implementation:

I should tolerances \epsilon be the same for all models?

I should summary statistics vary across models (incl. their dimension)?

I should the distance measure \rho vary as well?
Likelihood-free Methods
ABC for model choice
Model choice
ABC estimates
Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m:
T⁻¹ ∑_{t=1}^{T} I(m(t) = m) .
Extension to a weighted polychotomous logistic regression estimate of π(M = m|y), with non-parametric kernel weights
[Cornuet et al., DIYABC, 2009]
Likelihood-free Methods
ABC for model choice
Model choice
The Great ABC controversy
On-going controversy in phylogeographic genetics about the validity of using ABC for testing
Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!)
Replies: Fagundes et al., 2008, Beaumont et al., 2010, Berger et al., 2010, Csillery et al., 2010 point out that the criticisms are addressed at [Bayesian] model-based inference and have nothing to do with ABC...
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Gibbs random fields
Gibbs distribution
The rv y = (y1, . . . , yn) is a Gibbs random field associated with the graph G if
f(y) = (1/Z) exp{ −∑_{c∈C} Vc(yc) } ,
where Z is the normalising constant, C is the set of cliques of G, and Vc is any function, also called potential
U(y) = ∑_{c∈C} Vc(yc) is the energy function
© Z is usually unavailable in closed form
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Potts model
Potts model
Vc(y) is of the form
Vc(y) = θ S(y) = θ ∑_{l∼i} δ_{yl = yi}
where l∼i denotes a neighbourhood structure
In most realistic settings, the summation
Zθ = ∑_{x∈X} exp{θᵀ S(x)}
involves too many terms to be manageable and numerical approximations cannot always be trusted
[Cucala, Marin, CPR & Titterington, 2009]
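The intractability of Zθ is easy to see by brute force. A sketch of ours (not from the slides) enumerating all colourings of a tiny lattice: the loop runs over colors^(rows·cols) configurations, which is exactly why Zθ is unmanageable beyond toy grids.

```python
import math
from itertools import product

def potts_Z(theta, rows, cols, colors=2):
    """Brute-force normalising constant Z_theta = sum_x exp{theta * S(x)}
    for a Potts model on a rows x cols grid with 4-neighbour structure."""
    edges = []  # neighbouring pairs l ~ i on the lattice
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((r * cols + c, r * cols + c + 1))
            if r + 1 < rows:
                edges.append((r * cols + c, (r + 1) * cols + c))
    Z = 0.0
    for x in product(range(colors), repeat=rows * cols):
        S = sum(x[l] == x[i] for l, i in edges)  # S(x) = sum of delta_{x_l = x_i}
        Z += math.exp(theta * S)
    return Z

# single edge (1 x 2 grid): two matching configs, two mismatching,
# so Z = 2 e^theta + 2, checkable by hand
Z12 = potts_Z(1.0, 1, 2)
```

Even a 5×5 binary grid already requires 2²⁵ ≈ 3×10⁷ terms, and realistic image sizes are hopeless, motivating the ABC and auxiliary-variable workarounds.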
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Bayesian Model Choice
Comparing a model with potential S0 taking values in R^{p0} versus a model with potential S1 taking values in R^{p1} can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space
Bm0/m1(x) = ∫ exp{θ0ᵀ S0(x)}/Zθ0,0 π0(dθ0) / ∫ exp{θ1ᵀ S1(x)}/Zθ1,1 π1(dθ1)
Use of Jeffreys' scale to select the most appropriate model
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Neighbourhood relations
Choice to be made between M neighbourhood relations
i ∼m i′ (0 ≤ m ≤ M − 1)
with
Sm(x) = ∑_{i ∼m i′} I{xi = xi′}
driven by the posterior probabilities of the models.
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Model index
Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm(θm)
Computational target:
P(M = m|x) ∝ ∫_{Θm} fm(x|θm) πm(θm) dθm π(M = m) ,
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Sufficient statistics
By definition, if S(x) is a sufficient statistic for the joint parameters (M, θ0, . . . , θM−1),
P(M = m|x) = P(M = m|S(x)) .
For each model m, with its own sufficient statistic Sm(·), the concatenation S(·) = (S0(·), . . . , SM−1(·)) is also sufficient.
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Sufficient statistics in Gibbs random fields
For Gibbs random fields,
x|M = m ∼ fm(x|θm) = f¹m(x|S(x)) f²m(S(x)|θm)
           = (1/n(S(x))) f²m(S(x)|θm)
where
n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)}
© S(x) is therefore also sufficient for the joint parameters [Specific to Gibbs random fields!]
Likelihood-free Methods
ABC for model choice
Gibbs random fields
ABC model choice Algorithm
ABC-MC
- Generate m* from the prior π(M = m).
- Generate θ*m* from the prior πm*(·).
- Generate x* from the model fm*(·|θ*m*).
- Compute the distance ρ(S(x⁰), S(x*)).
- Accept (θ*m*, m*) if ρ(S(x⁰), S(x*)) < ε.
Note: when ε = 0 the algorithm is exact
Likelihood-free Methods
ABC for model choice
Gibbs random fields
ABC approximation to the Bayes factor
Frequency ratio:
BFm0/m1(x⁰) = [P(M = m0|x⁰) / P(M = m1|x⁰)] × [π(M = m1) / π(M = m0)]
            = [#{m_{i*} = m0} / #{m_{i*} = m1}] × [π(M = m1) / π(M = m0)] ,
replaced with
BFm0/m1(x⁰) = [(1 + #{m_{i*} = m0}) / (1 + #{m_{i*} = m1})] × [π(M = m1) / π(M = m0)]
to avoid indeterminacy (also a Bayes estimate).
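The corrected ratio is one line of code. A sketch (the function name and prior arguments are our own conventions):

```python
def abc_bayes_factor(accepted_models, m0, m1, prior_m0=0.5, prior_m1=0.5):
    """ABC approximation of BF_{m0/m1} from the list of accepted model
    indices, with the (1 + count) correction to avoid a 0/0 indeterminacy."""
    n0 = sum(1 for m in accepted_models if m == m0)
    n1 = sum(1 for m in accepted_models if m == m1)
    return (1 + n0) / (1 + n1) * (prior_m1 / prior_m0)

# three acceptances of model 0 against one of model 1, uniform model prior:
bf = abc_bayes_factor([0, 0, 0, 1], 0, 1)  # (1 + 3)/(1 + 1) = 2.0
```

Note that with no acceptances at all the estimate is 1, a neutral value, instead of being undefined.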
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Toy example
iid Bernoulli model versus two-state first-order Markov chain, i.e.
f0(x|θ0) = exp( θ0 ∑_{i=1}^{n} I{xi = 1} ) / {1 + exp(θ0)}^n ,
versus
f1(x|θ1) = (1/2) exp( θ1 ∑_{i=2}^{n} I{xi = xi−1} ) / {1 + exp(θ1)}^{n−1} ,
with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by "phase transition" boundaries).
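This toy comparison can be run end to end. In the sketch below the implementation choices are ours (n = 100, a reference table of 3,000 prior simulations, Euclidean distance on the pair of per-model sufficient statistics, tolerance taken as a distance quantile); pseudo-observed data are drawn from the Markov model with θ1 = 3.

```python
import math
import random

def sim_bernoulli(theta0, n, rng):
    # f0: iid with P(x_i = 1) = exp(theta0) / (1 + exp(theta0))
    p = math.exp(theta0) / (1 + math.exp(theta0))
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sim_markov(theta1, n, rng):
    # f1: x_1 uniform on {0,1}, then P(x_i = x_{i-1}) = exp(theta1)/(1+exp(theta1))
    q = math.exp(theta1) / (1 + math.exp(theta1))
    x = [rng.randint(0, 1)]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < q else 1 - x[-1])
    return x

def S(x):
    # concatenation of the per-model sufficient statistics (S0, S1)
    return (sum(x), sum(x[i] == x[i - 1] for i in range(1, len(x))))

rng = random.Random(7)
n = 100
y = sim_markov(3.0, n, rng)      # pseudo-observed data from the Markov model
s_obs = S(y)

sims = []
for _ in range(3000):            # reference table from the joint prior
    if rng.random() < 0.5:
        m, x = 0, sim_bernoulli(rng.uniform(-5, 5), n, rng)
    else:
        m, x = 1, sim_markov(rng.uniform(0, 6), n, rng)
    sims.append((math.dist(S(x), s_obs), m))

sims.sort()                      # tolerance = distance quantile: keep 100 closest
accepted = [m for _, m in sims[:100]]
post_m1 = sum(accepted) / len(accepted)   # estimate of P(M = 1|y)
```

With strong persistence (θ1 = 3) the Bernoulli model cannot match S0 ≈ n/2 and S1 ≈ 0.95 n simultaneously, so the accepted indices concentrate on the Markov model.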
Likelihood-free Methods
ABC for model choice
Gibbs random fields
Toy example (2)
[Figure: (left) comparison of the true BFm0/m1(x⁰) with its ABC approximation (in logs) over 2,000 simulations and 4×10⁶ proposals from the prior; (right) same when using the tolerance ε corresponding to the 1% quantile on the distances.]
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Back to sufficiency
‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’
[Scott Sisson, Jan. 31, 2011, X.’Og]
If η1(x) is a sufficient statistic for model m = 1 and parameter θ1 and η2(x) is a sufficient statistic for model m = 2 and parameter θ2, (η1(x), η2(x)) is not always sufficient for (m, θm)
© Potential loss of information at the testing level
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Limiting behaviour of B12 (T →∞)
ABC approximation:
B̂12(y) = ∑_{t=1}^{T} I(mt = 1) I(ρ{η(zt), η(y)} ≤ ε) / ∑_{t=1}^{T} I(mt = 2) I(ρ{η(zt), η(y)} ≤ ε) ,
where the (mt, zt)'s are simulated from the (joint) prior
As T goes to infinity, limit
Bε12(y) = ∫ I(ρ{η(z), η(y)} ≤ ε) π1(θ1) f1(z|θ1) dz dθ1 / ∫ I(ρ{η(z), η(y)} ≤ ε) π2(θ2) f2(z|θ2) dz dθ2
        = ∫ I(ρ{η, η(y)} ≤ ε) π1(θ1) f1^η(η|θ1) dη dθ1 / ∫ I(ρ{η, η(y)} ≤ ε) π2(θ2) f2^η(η|θ2) dη dθ2 ,
where f1^η(η|θ1) and f2^η(η|θ2) are the distributions of η(z)
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Limiting behaviour of B12 (ε→ 0)
When ε goes to zero,
Bη12(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 ,
© Bayes factor based on the sole observation of η(y)
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Limiting behaviour of B12 (under sufficiency)
If η(y) is a sufficient statistic for both models,
fi(y|θi) = gi(y) fi^η(η(y)|θi)
Thus
B12(y) = ∫_{Θ1} π(θ1) g1(y) f1^η(η(y)|θ1) dθ1 / ∫_{Θ2} π(θ2) g2(y) f2^η(η(y)|θ2) dθ2
       = [ g1(y) ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 ] / [ g2(y) ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 ]
       = [g1(y)/g2(y)] Bη12(y) .
[Didelot, Everitt, Johansen & Lawson, 2011]
© No discrepancy only under cross-model sufficiency
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Poisson/geometric example
Sample x = (x1, . . . , xn) from either a Poisson P(λ) or from a geometric G(p). Then
S = ∑_{i=1}^{n} xi = η(x)
is a sufficient statistic for either model, but not simultaneously.
Discrepancy ratio:
g1(x)/g2(x) = [ S! n^{−S} / ∏i xi! ] / [ 1 / C(n+S−1, S) ]
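The discrepancy ratio is directly computable; a sketch using exact integer combinatorics (the function name is ours):

```python
from math import comb, factorial

def discrepancy_ratio(x):
    """g1(x)/g2(x) for Poisson (model 1) versus geometric (model 2), with
    S = sum(x) the shared statistic: under Poisson, x | S is multinomial, so
    g1(x) = S! n^{-S} / prod_i x_i!; under the geometric model, x | S is
    uniform over the comb(n+S-1, S) compositions, so g2(x) = 1/comb(n+S-1, S)."""
    n, S = len(x), sum(x)
    prod_fact = 1
    for xi in x:
        prod_fact *= factorial(xi)
    g1 = factorial(S) / (n ** S * prod_fact)
    g2 = 1 / comb(n + S - 1, S)
    return g1 / g2

r = discrepancy_ratio([1, 1])  # g1 = 2/4 = 1/2, g2 = 1/3, ratio = 1.5
```

Because the ratio depends on x beyond S (through ∏ xi!), it varies wildly across samples sharing the same S, which is the source of the divergence shown next.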
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Poisson/geometric discrepancy
[Figure: range of B12(x) versus Bη12(x): the values produced have nothing in common.]
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Formal recovery
Creating an encompassing exponential family
f(x|θ1, θ2, α1, α2) ∝ exp{θ1ᵀ η1(x) + θ2ᵀ η2(x) + α1 t1(x) + α2 t2(x)}
leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x))
[Didelot, Everitt, Johansen & Lawson, 2011]
In the Poisson/geometric case, if ∏i xi! is added to S, no discrepancy
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Formal recovery
Creating an encompassing exponential family
f(x|θ1, θ2, α1, α2) ∝ exp{θ1ᵀ η1(x) + θ2ᵀ η2(x) + α1 t1(x) + α2 t2(x)}
leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x))
[Didelot, Everitt, Johansen & Lawson, 2011]
Only applies in genuine sufficiency settings...
© Inability to evaluate the loss brought by summary statistics
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Meaning of the ABC-Bayes factor
‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’
[Scott Sisson, Jan. 31, 2011, X.’Og]
In the Poisson/geometric case, if E[yi] = θ0 > 0,
lim_{n→∞} Bη12(y) = [(θ0 + 1)² / θ0] e^{−θ0}
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
MA(q) divergence
[Figure] Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(2) model with θ1 = 0.6, θ2 = 0.2. True Bayes factor equal to 17.71.
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
MA(q) divergence
[Figure] Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(1) model with θ1 = 0.6. True Bayes factor B21 equal to .004.
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Further comments
‘There should be the possibility that for the same model, but different (non-minimal) [summary] statistics (so different η's: η1 and η1*) the ratio of evidences may no longer be equal to one.’
[Michael Stumpf, Jan. 28, 2011, ’Og]
Using different summary statistics [on different models] may indicate the loss of information brought by each set, but agreement does not lead to trustworthy approximations.
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
A population genetics evaluation
Population genetics example with
- 3 populations
- 2 scenarios
- 15 individuals
- 5 loci
- single mutation parameter
- 24 summary statistics
- 2 million ABC proposals
- importance [tree] sampling alternative
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Stability of importance sampling
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Comparison with ABC
Use of 24 summary statistics and DIY-ABC logistic correction
[Scatter plot: ABC estimates of P(M = 1|y) against importance sampling estimates of P(M = 1|y).]
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Comparison with ABC
Use of 15 summary statistics and DIY-ABC logistic correction
[Scatter plot: ABC estimates of P(M = 1|y) against importance sampling estimates of P(M = 1|y).]
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
Comparison with ABC
Use of 24 summary statistics and DIY-ABC logistic correction
[Scatter plot: ABC estimates of P(M = 1|y) against importance sampling estimates of P(M = 1|y).]
Likelihood-free Methods
ABC for model choice
Generic ABC model choice
The only safe cases???
Besides specific models like Gibbs random fields,
using distances over the data itself escapes the discrepancy... [Toni & Stumpf, 2010; Sousa & al., 2009]
...and so does the use of more informal model fitting measures [Ratmann & al., 2009]
Likelihood-free Methods
ABC model choice consistency
ABC model choice consistency
Introduction
The Metropolis-Hastings Algorithm
simulation-based methods in Econometrics
Genetics of ABC
Approximate Bayesian computation
ABC for model choice
ABC model choice consistency
Likelihood-free Methods
ABC model choice consistency
Formalised framework
The starting point
Central question to the validation of ABC for model choice:
When is a Bayes factor based on an insufficient statistic T(y) consistent?
Note: the conclusion drawn on T(y) through BT12(y) necessarily differs from the conclusion drawn on y through B12(y)
Likelihood-free Methods
ABC model choice consistency
Formalised framework
A benchmark toy example
Comparison suggested by a referee of the PNAS paper [thanks]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ1, 1), opposed to model M2: y ∼ L(θ2, 1/√2), the Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one).
Likelihood-free Methods
ABC model choice consistency
Formalised framework
A benchmark toy example
Four possible statistics:
1. sample mean ȳ (sufficient for M1 if not M2);
2. sample median med(y) (insufficient);
3. sample variance var(y) (ancillary);
4. median absolute deviation mad(y) = med(|y − med(y)|)
Likelihood-free Methods
ABC model choice consistency
Formalised framework
A benchmark toy example
[Figure: density of the posterior probabilities.]
Likelihood-free Methods
ABC model choice consistency
Formalised framework
A benchmark toy example
[Boxplots: posterior probabilities for Gauss and Laplace samples, n = 100.]
Likelihood-free Methods
ABC model choice consistency
Formalised framework
A benchmark toy example
[Figure: densities of the posterior probabilities (two panels).]
Likelihood-free Methods
ABC model choice consistency
Formalised framework
A benchmark toy example
[Boxplots: Gauss versus Laplace samples, n = 100 (two panels).]
Likelihood-free Methods
ABC model choice consistency
Formalised framework
Framework
Starting from the sample
y = (y1, . . . , yn) ,
the observed sample, not necessarily iid, with true distribution
y ∼ Pn
Summary statistics
T(y) = Tn = (T1(y), T2(y), · · · , Td(y)) ∈ R^d
with true distribution Tn ∼ Gn.
Likelihood-free Methods
ABC model choice consistency
Formalised framework
Framework
c© Comparison of
– under M1, y ∼ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ Rp1
– under M2, y ∼ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ Rp2
turned into
– under M1, T(y) ∼ G1,n(·|θ1), and θ1|T(y) ∼ π1(·|Tn)
– under M2, T(y) ∼ G2,n(·|θ2), and θ2|T(y) ∼ π2(·|Tn)
Likelihood-free Methods
ABC model choice consistency
Consistency results
Assumptions
A collection of asymptotic “standard” assumptions:
[A1] There exist a sequence {vn} converging to +∞, an a.c. distribution Q with continuous bounded density q(·), a symmetric d × d positive definite matrix V0, and a vector µ0 ∈ R^d such that
vn V0^{−1/2} (Tn − µ0) → Q as n → ∞, under Gn
and for all M > 0,
sup_{vn|t−µ0|<M} | |V0|^{1/2} vn^{−d} gn(t) − q( vn V0^{−1/2} {t − µ0} ) | = o(1)
Likelihood-free Methods
ABC model choice consistency
Consistency results
Assumptions
A collection of asymptotic “standard” assumptions:
[A2] For i = 1, 2, there exist d × d symmetric positive definite matrices Vi(θi) and µi(θi) ∈ R^d such that
vn Vi(θi)^{−1/2} (Tn − µi(θi)) → Q as n → ∞, under Gi,n(·|θi) .
Likelihood-free Methods
ABC model choice consistency
Consistency results
Assumptions
A collection of asymptotic “standard” assumptions:
[A3] For i = 1, 2, there exist sets Fn,i ⊂ Θi and constants εi, τi, αi > 0 such that for all τ > 0,
sup_{θi∈Fn,i} Gi,n[ |Tn − µi(θi)| > τ (|µi(θi) − µ0| ∧ εi) | θi ] ≲ vn^{−αi} sup_{θi∈Fn,i} (|µi(θi) − µ0| ∧ εi)^{−αi}
with
πi(F^c_{n,i}) = o(vn^{−τi}) .
Likelihood-free Methods
ABC model choice consistency
Consistency results
Assumptions
A collection of asymptotic “standard” assumptions:
[A4] For u > 0, let
Sn,i(u) = { θi ∈ Fn,i ; |µi(θi) − µ0| ≤ u vn^{−1} }
If inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, there exist constants di < τi ∧ αi − 1 such that
πi(Sn,i(u)) ∼ u^{di} vn^{−di} , ∀ u ≲ vn
Likelihood-free Methods
ABC model choice consistency
Consistency results
Assumptions
A collection of asymptotic “standard” assumptions:
[A5] If inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, there exists U > 0 such that for any M > 0,
sup_{vn|t−µ0|<M} sup_{θi∈Sn,i(U)} | |Vi(θi)|^{1/2} vn^{−d} gi(t|θi) − q( vn Vi(θi)^{−1/2} (t − µi(θi)) ) | = o(1)
and
lim_{M→∞} lim sup_n  πi( Sn,i(U) ∩ { ‖Vi(θi)^{−1}‖ + ‖Vi(θi)‖ > M } ) / πi(Sn,i(U)) = 0 .
Likelihood-free Methods
ABC model choice consistency
Consistency results
Assumptions
A collection of asymptotic “standard” assumptions:
[A1]–[A2] are standard central limit theorems ([A1] redundant when one model is "true")
[A3] controls the large deviations of the estimator Tn from the estimand µ(θ)
[A4] is the standard prior mass condition found in Bayesian asymptotics (di effective dimension of the parameter)
[A5] controls convergence more tightly, esp. when µi is not one-to-one
Likelihood-free Methods
ABC model choice consistency
Consistency results
Effective dimension
[A4] Understanding d1, d2: defined only when µ0 ∈ {µi(θi), θi ∈ Θi},
πi(θi : |µi(θi) − µ0| < n^{−1/2}) = O(n^{−di/2})
and di is the effective dimension of the model Θi around µ0
Likelihood-free Methods
ABC model choice consistency
Consistency results
Asymptotic marginals
Asymptotically, under [A1]–[A5],
mi(t) = ∫_{Θi} gi(t|θi) πi(θi) dθi
is such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
Cl vn^{d−di} ≤ mi(Tn) ≤ Cu vn^{d−di}
and
(ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0,
mi(Tn) = oPn[ vn^{d−τi} + vn^{d−αi} ] .
Likelihood-free Methods
ABC model choice consistency
Consistency results
Within-model consistency
Under the same assumptions, if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, the posterior distribution of µi(θi) given Tn is consistent at rate 1/vn provided αi ∧ τi > di.
Note: di can truly be seen as an effective dimension of the model under the posterior πi(·|Tn), since if µ0 ∈ {µi(θi); θi ∈ Θi},
mi(Tn) ∼ vn^{d−di}
Likelihood-free Methods
ABC model choice consistency
Consistency results
Between-model consistency
Consequence of the above is that the asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value of Tn under both models. And only by this mean value!
Indeed, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
Cl vn^{−(d1−d2)} ≤ m1(Tn)/m2(Tn) ≤ Cu vn^{−(d1−d2)} ,
where Cl, Cu = OPn(1), irrespective of the true model.
© Only depends on the difference d1 − d2
Likelihood-free Methods
ABC model choice consistency
Consistency results
Between-model consistency
Else, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
m1(Tn)/m2(Tn) ≥ Cu min( vn^{−(d1−α2)}, vn^{−(d1−τ2)} ) ,
Likelihood-free Methods
ABC model choice consistency
Consistency results
Consistency theorem
If
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0 ,
the Bayes factor
BT12 = O( vn^{−(d1−d2)} )
irrespective of the true model. It is consistent iff Pn is within the model with the smallest dimension.
Likelihood-free Methods
ABC model choice consistency
Consistency results
Consistency theorem
If Pn belongs to one of the two models and if µ0 cannot be attained by the other one:
0 = min( inf{|µ0 − µi(θi)|; θi ∈ Θi}, i = 1, 2 )
  < max( inf{|µ0 − µi(θi)|; θi ∈ Θi}, i = 1, 2 ) ,
then the Bayes factor BT12 is consistent
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Consequences on summary statistics
Bayes factor driven by the means µi(θi) and the relative position of µ0 wrt both sets {µi(θi); θi ∈ Θi}, i = 1, 2.
For ABC, this implies the most likely statistics Tn are ancillary statistics with different mean values under both models
Else, if Tn asymptotically depends on some of the parameters of the models, it is quite likely that there exists θi ∈ Θi such that µi(θi) = µ0 even though model M1 is misspecified
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Toy example: Laplace versus Gauss [1]
If
Tn = n^{−1} ∑_{i=1}^{n} Xi⁴ , µ1(θ) = 3 + θ⁴ + 6θ² , µ2(θ) = 6 + · · ·
and the true distribution is Laplace with mean θ0 = 1, under the Gaussian model the value θ* = 2√3 − 3 leads to µ0 = µ(θ*)
[here d1 = d2 = d = 1]
© a Bayes factor associated with such a statistic is inconsistent
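The two mean functions follow from standard moment identities: E[Z⁴] = 3 for a standard normal, and E[Z⁴] = 24b⁴ = 6 for a centred Laplace with scale b = 1/√2, so that µ1(1) = 10 and µ2(1) = 13 under the "6 + · · ·" expansion. A quick Monte Carlo sanity check of these two values (our own verification, not part of the slides):

```python
import math
import random

def mc_fourth_moment(sampler, n=200_000, seed=0):
    """Monte Carlo estimate of E[X^4] under the given sampler."""
    rng = random.Random(seed)
    return sum(sampler(rng) ** 4 for _ in range(n)) / n

theta, b = 1.0, 1 / math.sqrt(2)   # mean one; this Laplace scale gives variance one

def gauss(rng):
    return rng.gauss(theta, 1.0)

def laplace(rng):
    # inverse-CDF sampler for the Laplace(theta, b) distribution
    u = rng.random() - 0.5
    return theta - b * math.copysign(math.log(1 - 2 * abs(u)), u)

mu1 = mc_fourth_moment(gauss)    # close to 3 + 6*theta**2 + theta**4 = 10
mu2 = mc_fourth_moment(laplace)  # close to 6 + 6*theta**2 + theta**4 = 13
```

Since the two curves θ ↦ µ1(θ) and θ ↦ µ2(θ) differ only by a constant shift of 3, a fourth-moment value reachable under one model is also reachable under the other for a different θ, which is the mechanism behind the inconsistency.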
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Toy example: Laplace versus Gauss [1]
[Figure: posterior probabilities based on the fourth-moment statistic.]
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Toy example: Laplace versus Gauss [1]
[Boxplots: posterior probabilities under M1 and M2, n = 1000, fourth-moment statistic.]
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Toy example: Laplace versus Gauss [1]
Caption: comparison of the distributions of the posterior probabilities that the data is from a normal model (as opposed to a Laplace model) with unknown mean, when the data is made of n = 1000 observations either from a normal (M1) or Laplace (M2) distribution with mean one, and when the summary statistic in the ABC algorithm is restricted to the empirical fourth moment. The ABC algorithm uses proposals from the prior N(0, 4) and selects the tolerance as the 1% distance quantile.
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Toy example: Laplace versus Gauss [0]
When
T(y) = { y_n^{(4)}, y_n^{(6)} }
(the empirical fourth and sixth moments) and the true distribution is Laplace with mean θ0 = 0, then µ0 = 6, µ1(θ1*) = 6 with θ1* = 2√3 − 3
[d1 = 1 and d2 = 1/2]
thus
B12 ∼ n^{−1/4} → 0 : consistent
Under the Gaussian model, µ0 = 3 and µ2(θ2) ≥ 6 > 3 ∀θ2, so
B12 → +∞ : consistent
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Toy example: Laplace versus Gauss [0]
[Figure: density of the posterior probabilities based on the fourth and sixth moments.]
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Embedded models
When M1 is a submodel of M2, and if the true distribution belongs to the smaller model M1, the Bayes factor is of order
vn^{−(d1−d2)}
If the summary statistic is only informative on a parameter that is the same under both models, i.e. if d1 = d2, then
© the Bayes factor is not consistent
Likelihood-free Methods
ABC model choice consistency
Summary statistics
Embedded models
When M1 is a submodel of M2, and if the true distribution belongs to the smaller model M1, the Bayes factor is of order
vn^{−(d1−d2)}
Else, d1 < d2 and the Bayes factor is consistent under M1. If the true distribution is not in M1, then
© the Bayes factor is consistent only if µ1 ≠ µ2 = µ0