
Bayesian Indirect Inference using a Parametric Auxiliary Model

Dr Chris Drovandi, Queensland University of Technology, Australia


Collaborators: Tony Pettitt and Anthony Lee

July 25, 2014

Chris Drovandi QUT Seminar 2014


Bayesian Indirect Likelihood

Bayesian inference is based on the posterior distribution

$p(\theta|y) \propto p(y|\theta)p(\theta)$.

Thus Bayesian inference requires the likelihood function $p(y|\theta)$

Many models have an intractable likelihood function

Replace it with some alternative tractable ‘likelihood’, $p_A(y|\theta)$:

$p_A(\theta|y) \propto p_A(y|\theta)p(\theta)$.

Referred to as Bayesian Indirect Likelihood (BIL)



BIL Methods

Composite Likelihood (e.g. Pauli et al 2011)

Emulation (e.g. Gaussian process, Wilkinson 2014)

parametric BIL (uses the likelihood of an alternative parametric model, e.g. Gallant and McCulloch 2009)

non-parametric BIL (recovers the traditional ABC method, e.g. Beaumont et al 2002)



pBII methods

pBII - Bayesian Indirect Inference methods that use a ‘p’arametric auxiliary model in some way

Requires a parametric model with parameter φ and a tractable likelihood, $p_A(y|\phi)$

There need not be an obvious connection between θ and φ

Assume throughout that dim(φ) ≥ dim(θ)

Two classes of pBII methods:

Use the auxiliary model to form summary statistics for ABC (ABC II)

Use the auxiliary model to form a replacement likelihood (pBIL)



ABC II (Intro to ABC)

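ABC assumes that simulation from the model, $x \sim p(\cdot|\theta)$, is straightforward

It compares observed and simulated data on the basis of summary statistics, $s(\cdot)$

Based on the following target distribution

$p_\epsilon(\theta|y) \propto p(\theta)p_\epsilon(y|\theta)$, where ε is the ABC tolerance and

$p_\epsilon(y|\theta) = \int_x p(x|\theta)K_\epsilon(\|s(y) - s(x)\|)\,dx$, where $K_\epsilon$ is a kernel weighting function. This can be estimated unbiasedly by simulating from the model, and this is enough for MCMC to target $p_\epsilon(\theta|y)$ (Andrieu and Roberts 2009)

If $s(\cdot)$ is sufficient and $\epsilon \to 0$ then $p_\epsilon(\theta|y) \equiv p(\theta|y)$ (this does not happen in practice)

Thus there are two sources of error: insufficient summary statistics and a non-zero tolerance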
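The unbiased estimate of $p_\epsilon(y|\theta)$ is simply a Monte Carlo average of the kernel over datasets simulated at θ. A minimal Python/NumPy sketch, assuming a hypothetical Poisson(θ) generative model, the sample mean as $s(\cdot)$ and an unnormalised uniform kernel $K_\epsilon(u) = 1(u \le \epsilon)$; all function names are illustrative only.

```python
import numpy as np


def simulate_poisson(lam, size, rng):
    # Hypothetical generative model for illustration: iid Poisson(lam) draws.
    return rng.poisson(lam, size=size)


def summarise(data):
    # Hypothetical summary statistic s(.): the sample mean, as a 1-vector.
    return np.atleast_1d(np.mean(data))


def abc_likelihood_estimate(theta, y, eps, n_sims, rng):
    """Monte Carlo estimate of p_eps(y | theta) with a uniform kernel.

    Averaging K_eps(||s(y) - s(x)||) over n_sims independent simulations
    x ~ p(. | theta) is unbiased for the (unnormalised) ABC likelihood.
    """
    s_y = summarise(y)
    hits = 0
    for _ in range(n_sims):
        x = simulate_poisson(theta, len(y), rng)    # x ~ p(. | theta)
        dist = np.linalg.norm(summarise(x) - s_y)   # ||s(y) - s(x)||
        hits += int(dist <= eps)                    # K_eps(u) = 1(u <= eps)
    return hits / n_sims


rng = np.random.default_rng(1)
y = simulate_poisson(30.0, 100, rng)                # pseudo-observed data
print(abc_likelihood_estimate(30.0, y, eps=0.5, n_sims=200, rng=rng))
print(abc_likelihood_estimate(45.0, y, eps=0.5, n_sims=200, rng=rng))
```

Because the uniform kernel is left unnormalised, the estimate is only proportional to $p_\epsilon(y|\theta)$; the constant does not depend on θ and cancels in MCMC acceptance ratios.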



ABC II (Intro to ABC Cont...)

ABC has connections with kernel density estimation (Blum, 2009)

The choice of summary statistics involves a trade-off between dimensionality and information loss

The non-parametric aspect favours summaries that are as low-dimensional as possible

But decreasing the dimension means losing information

The choice of summary statistics is the most crucial aspect of the ABC approximation



ABC II Methods

In some applications an alternative parametric auxiliary model can be proposed

Use the auxiliary model to form summary statistics (ABC II)

Auxiliary parameter estimate as summary statistic (ABC IP and ABC IL)

Score of auxiliary model as summary statistic (ABC IS)



ABC IP (Drovandi et al 2011)

The parameter estimates of the auxiliary model are the summary statistics

Simulate data from the generative model, $x \sim p(\cdot|\theta)$

Estimate the auxiliary parameter

$$\phi(\theta, x) = \arg\max_{\phi} p_A(x|\phi).$$

Compare with $\phi(y)$ via

$$\rho(s(x), s(y)) = \sqrt{(\phi(x) - \phi(y))^T I(\phi(y))(\phi(x) - \phi(y))}.$$

The observed Fisher information $I(\phi(y))$ gives an efficient weighting of the summary statistics
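To make the ABC IP discrepancy concrete, here is a sketch assuming the Normal(µ, τ) auxiliary model of Toy Example 1 (τ taken as the variance), for which both the MLE and the observed Fisher information at the MLE have closed forms. The helper names are hypothetical.

```python
import numpy as np


def fit_normal(data):
    """MLE of the auxiliary model Normal(mu, tau), taking tau as the variance."""
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    tau = np.mean((data - mu) ** 2)
    return np.array([mu, tau])


def observed_info_normal(data):
    """Observed Fisher information of Normal(mu, tau) at the MLE of `data`.

    At the MLE the cross term vanishes, leaving diag(n / tau, n / (2 tau^2)).
    """
    n = len(data)
    _, tau = fit_normal(data)
    return np.diag([n / tau, n / (2.0 * tau ** 2)])


def abc_ip_discrepancy(x, y):
    """rho = sqrt((phi(x) - phi(y))^T I(phi(y)) (phi(x) - phi(y)))."""
    diff = fit_normal(x) - fit_normal(y)
    return float(np.sqrt(diff @ observed_info_normal(y) @ diff))


y = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 3.0, 4.0, 5.0]
print(abc_ip_discrepancy(x, y))  # sqrt(3.2): only mu differs, by 1
```

Note that every simulated dataset must be fitted (here cheaply, in closed form), which is the source of the cost of ABC IP for richer auxiliary models.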



ABC IL (Gleim and Pigorsch 2013)

Same as ABC IP, but uses the auxiliary likelihood in the discrepancy function

The parameter estimates of the auxiliary model are the summary statistics

$$\phi(\theta, x) = \arg\max_{\phi} p_A(x|\phi).$$

Compare with $\phi(y)$ via

$$\rho(s(x), s(y)) = \log p_A(y|\phi(y)) - \log p_A(y|\phi(x)).$$
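The same illustrative Normal(µ, τ) auxiliary model (an assumption, as is every function name here) gives a short sketch of the ABC IL discrepancy, which is non-negative because $\phi(y)$ maximises $p_A(y|\phi)$.

```python
import numpy as np


def fit_normal(data):
    """MLE (mu, tau) of the auxiliary model Normal(mu, tau), tau = variance."""
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    return np.array([mu, np.mean((data - mu) ** 2)])


def normal_loglik(data, phi):
    """log p_A(data | phi) for iid Normal(mu, tau) observations."""
    data = np.asarray(data, dtype=float)
    mu, tau = phi
    n = len(data)
    return -0.5 * n * np.log(2.0 * np.pi * tau) - np.sum((data - mu) ** 2) / (2.0 * tau)


def abc_il_discrepancy(x, y):
    """rho = log p_A(y | phi(y)) - log p_A(y | phi(x)); >= 0 since phi(y) is the MLE."""
    return float(normal_loglik(y, fit_normal(y)) - normal_loglik(y, fit_normal(x)))


y = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 3.0, 4.0, 5.0]
print(abc_il_discrepancy(x, y))  # 1.6, up to floating-point rounding
```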


ABC IS (Gleim and Pigorsch 2013)

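Uses the scores of the auxiliary model as summary statistics

$$S_A(y, \phi) = \left(\frac{\partial \log p_A(y|\phi)}{\partial \phi_1}, \cdots, \frac{\partial \log p_A(y|\phi)}{\partial \phi_{\dim(\phi)}}\right)^T.$$

Evaluate the scores of the auxiliary model at the simulated data and $\phi(y)$, giving $S_A(x, \phi(y))$. Note that $S_A(y, \phi(y)) = 0$, since $\phi(y)$ maximises the auxiliary likelihood of the observed data at an interior point. Uses the following discrepancy function

$$\rho(s(x), s(y)) = \sqrt{S_A(x, \phi(y))^T I(\phi(y))^{-1} S_A(x, \phi(y))}.$$

Very fast when the scores are analytic (no fitting of the auxiliary model)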
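A sketch of the ABC IS discrepancy under the same illustrative Normal(µ, τ) auxiliary model. The score is analytic, so the simulated data are never fitted; $\phi(y)$ and $I(\phi(y))$ are computed once from the observed data. Function names are hypothetical.

```python
import numpy as np


def fit_normal(data):
    """MLE (mu, tau) of the auxiliary model Normal(mu, tau), tau = variance."""
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    return np.array([mu, np.mean((data - mu) ** 2)])


def normal_score(data, phi):
    """Analytic score S_A(data, phi) of iid Normal(mu, tau) observations."""
    data = np.asarray(data, dtype=float)
    mu, tau = phi
    n = len(data)
    d_mu = np.sum(data - mu) / tau
    d_tau = -n / (2.0 * tau) + np.sum((data - mu) ** 2) / (2.0 * tau ** 2)
    return np.array([d_mu, d_tau])


def abc_is_discrepancy(x, phi_y, info_y):
    """rho = sqrt(S_A(x, phi(y))^T I(phi(y))^{-1} S_A(x, phi(y))).

    phi_y and info_y come from the observed data only, so no auxiliary
    model is fitted to the simulated data x.
    """
    s = normal_score(x, phi_y)
    return float(np.sqrt(s @ np.linalg.solve(info_y, s)))


y = np.array([1.0, 2.0, 3.0, 4.0])
phi_y = fit_normal(y)
# Observed information of Normal(mu, tau) at the MLE: diag(n/tau, n/(2 tau^2)).
info_y = np.diag([len(y) / phi_y[1], len(y) / (2.0 * phi_y[1] ** 2)])
x = np.array([2.0, 3.0, 4.0, 5.0])
print(normal_score(y, phi_y))              # approximately [0, 0]
print(abc_is_discrepancy(x, phi_y, info_y))
```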



ABC II Assumptions

Assumption (ABC IP Assumptions)

The estimator of the auxiliary parameter, $\phi(\theta, x)$, is unique for all θ with positive prior support.

Assumption (ABC IL Assumptions)

The auxiliary likelihood evaluated at the auxiliary estimate, $p_A(y|\phi(\theta, x))$, is unique for all θ with positive prior support.

Assumption (ABC IS Assumptions)

The MLE of the auxiliary model fitted to the observed data, $\phi(y)$, is an interior point of the parameter space of φ and $J(\phi(y))$ is positive definite. The log-likelihood of the auxiliary model, $\log p_A(\cdot|\phi)$, is differentiable and the score, $S_A(x, \phi(y))$, is unique for any x that may be drawn according to any θ that has positive prior support.


ABC II vs Traditional ABC for Summary Statistics

Advantages

Can check whether the auxiliary model fits the data well

Natural ABC discrepancy functions

Control the dimensionality of the summary statistics via a parsimonious auxiliary model (e.g. compare auxiliary models via AIC)

Disadvantages

ABC II can be expensive (ABC IP and ABC IL fit the auxiliary model to every simulated dataset)

Restricted to applications where an auxiliary model can be proposed



pdBIL (Gallant and McCulloch 2009, Reeves and Pettitt 2005)

Replaces the true likelihood with the auxiliary likelihood (full data level)

Simulate n independent datasets from the generative model, $x_{1:n} \overset{iid}{\sim} p(\cdot|\theta)$, and fit the auxiliary model

$$\phi(\theta, x_{1:n}) = \arg\max_{\phi} p_A(x_{1:n}|\phi).$$

Provides an estimate of the mapping $\phi(\theta)$

Evaluate the auxiliary likelihood $p_A(y|\phi(\theta, x_{1:n}))$

Not ABC: no summary statistics or ABC tolerance

Theoretically behaves very differently from ABC II



pdBIL (Cont...)

Ultimate target of the pdBIL method

$$p_A(\theta|y) \propto p_A(y|\phi(\theta))p(\theta).$$

Unfortunately the binding function $\phi(\theta)$ is unknown; estimate it via n iid simulations from the generative model, $\phi_{\theta,n} = \phi(\theta, x_{1:n})$

The target distribution of the method becomes

$$p_{A,n}(\theta|y) \propto p_{A,n}(y|\theta)p(\theta),$$

where

$$p_{A,n}(y|\theta) = \int_{x_{1:n}} p_A(y|\phi_{\theta,n})\,p(x_{1:n}|\theta)\,dx_{1:n}.$$

$p_{A,n}(y|\theta)$ can be estimated unbiasedly, which is enough (Andrieu and Roberts 2009)
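One draw of the unbiased estimator $p_A(y|\phi_{\theta,n})$ can be sketched for the Poisson/Normal setting of Toy Example 1, assumed here purely for illustration: the n simulated datasets are pooled and the auxiliary model is fitted once. The sketch returns the log of the estimate; unbiasedness holds on the natural scale, not the log scale. Function names are hypothetical.

```python
import numpy as np


def fit_normal(data):
    """MLE (mu, tau) of the auxiliary model Normal(mu, tau), tau = variance."""
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    return np.array([mu, np.mean((data - mu) ** 2)])


def normal_loglik(data, phi):
    """log p_A(data | phi) for iid Normal(mu, tau) observations."""
    data = np.asarray(data, dtype=float)
    mu, tau = phi
    return -0.5 * len(data) * np.log(2.0 * np.pi * tau) - np.sum((data - mu) ** 2) / (2.0 * tau)


def pdbil_loglik_estimate(lam, y, n, rng):
    """Return (log p_A(y | phi_{theta,n}), phi_{theta,n}) for the toy setup.

    Simulates n datasets of the same size as y from Poisson(lam), pools
    them into x_{1:n}, fits the auxiliary model once and evaluates the
    auxiliary log-likelihood of the observed data.
    """
    x = rng.poisson(lam, size=(n, len(y)))   # x_{1:n} ~ p(. | theta), iid
    phi = fit_normal(x.ravel())              # phi(theta, x_{1:n})
    return normal_loglik(y, phi), phi


rng = np.random.default_rng(2)
y = rng.poisson(30.0, size=100)
loglik, phi = pdbil_loglik_estimate(30.0, y, n=1000, rng=rng)
print(loglik, phi)   # phi is close to the binding function value (30, 30)
```

Larger n makes $\phi_{\theta,n}$ a less noisy estimate of $\phi(\theta)$, which is why the acceptance probabilities reported later increase with n.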



pdBIL (Cont...)

Comparing $p_A(\theta|y)$ with $p(\theta|y)$: under suitable conditions pdBIL is exact as $n \to \infty$ if the generative model is a special case of the auxiliary model

This suggests that, in this context, the auxiliary model needs to be flexible enough to mimic the behaviour of the generative model

Empirical evidence suggests that, for a good approximation, the auxiliary likelihood must perform well in regions of non-negligible posterior support



Synthetic Likelihood as a psBIL Method

Reduces the data to a set of summary statistics. Target distribution:

$$p(\theta|s(y)) \propto p(s(y)|\theta)p(\theta).$$

But $p(s(y)|\theta)$ is unknown

The synthetic likelihood approach of Wood (2010) assumes a parametric model, $p_A(s(y)|\phi(\theta))$, so it is a pBIL method

Wood takes $p_A$ to be multivariate normal, $N(\mu(\theta), \Sigma(\theta))$

For some θ, simulate n iid replicates, $x_{1:n} = (s(x_1), \ldots, s(x_n))$, an iid sample from $p(s(y)|\theta)$. Fit the auxiliary model to this dataset:

$$\mu(x_{1:n}, \theta) = \frac{1}{n}\sum_{i=1}^{n} s(x_i),$$

$$\Sigma(x_{1:n}, \theta) = \frac{1}{n}\sum_{i=1}^{n} \left(s(x_i) - \mu(x_{1:n}, \theta)\right)\left(s(x_i) - \mu(x_{1:n}, \theta)\right)^T.$$
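A sketch of the synthetic log-likelihood with the 1/n covariance estimate above. The summary statistics (sample mean and variance of Poisson data) and all function names are hypothetical; the Gaussian log-density is written out with a Cholesky factor so that only NumPy is needed.

```python
import numpy as np


def gaussian_logpdf(s, mu, sigma):
    """log N(s; mu, sigma), using sigma = L L^T so the quadratic form is z^T z."""
    d = len(s)
    chol = np.linalg.cholesky(sigma)
    z = np.linalg.solve(chol, s - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def synthetic_loglik(theta, s_y, simulate_summary, n, rng):
    """Synthetic log-likelihood log p_A(s(y) | phi(theta)) with a normal p_A.

    mu and Sigma are the MLEs from n iid simulated summary vectors
    (1/n normalisation of the covariance).
    """
    sims = np.array([simulate_summary(theta, rng) for _ in range(n)])
    mu = sims.mean(axis=0)
    centred = sims - mu
    sigma = centred.T @ centred / n
    return gaussian_logpdf(s_y, mu, sigma)


def poisson_summary(lam, rng, size=100):
    # Hypothetical summaries: sample mean and variance of 100 Poisson(lam) draws.
    x = rng.poisson(lam, size=size)
    return np.array([x.mean(), x.var()])


rng = np.random.default_rng(3)
s_y = poisson_summary(30.0, rng)     # pseudo-observed summaries
for lam in (25.0, 30.0, 35.0):
    print(lam, synthetic_loglik(lam, s_y, poisson_summary, n=200, rng=rng))
```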



ABC as BIL with Non-Parametric Auxiliary Model (npBIL)

ABC is recovered as a BIL method by selecting a non-parametric auxiliary likelihood

Full data (npdBIL): choose $\phi(\theta, x_{1:n}) = x_{1:n}$ and

$$p_A(y|\phi(\theta, x_{1:n})) = p_A(y|x_{1:n}) = \frac{1}{n}\sum_{i=1}^{n} K_\epsilon(\rho(y, x_i)),$$

Data reduced to a summary statistic (npsBIL):

$$\frac{1}{n}\sum_{i=1}^{n} K_\epsilon(\rho(s(y), s(x_i))).$$

ABC II is a special npsBIL method in which the summary statistics come from a parametric auxiliary model



pBII Methods

Figure: BIL framework

BIL
  pBIL
    pdBIL
    psBIL
  npBIL = ABC
    npdBIL
    npsBIL
      ABC II
        ABC IP, ABC IL, ABC IS

Key:

BIL - ‘B’ayesian ‘I’ndirect ‘L’ikelihood
pBIL - ‘p’arametric BIL
pdBIL - pBIL based on full ‘d’ata
psBIL - pBIL based on ‘s’ummary statistic
ABC - ‘A’pproximate ‘B’ayesian ‘C’omputation
npBIL - ‘n’on-‘p’arametric BIL (equivalent to ABC)
npdBIL - npBIL based on full ‘d’ata
npsBIL - npBIL based on ‘s’ummary statistic
ABC II - ABC ‘I’ndirect ‘I’nference
ABC IP - ABC ‘I’ndirect ‘P’arameter
ABC IL - ABC ‘I’ndirect ‘L’ikelihood
ABC IS - ABC ‘I’ndirect ‘S’core



Comparison of pdBIL and ABC II

Theoretical Comparison

pdBIL is exact when the true model is contained within the auxiliary model (as n → ∞). This suggests choosing a flexible auxiliary model

Even when the true model is a special case of the auxiliary model, ABC II statistics are not sufficient in general

When ABC II has sufficient statistics, pdBIL is not exact in general, even when n → ∞

Some toy examples later...

Choice of Auxiliary Model

ABC II requires a good summarisation of the data (independent of the specification of the generative model)

The auxiliary model for pdBIL does not necessarily have to fit the data well (if the generative model is mis-specified). It requires flexibility with respect to the chosen generative model

If the generative model is (approximately) correct and fits the data well, then the same auxiliary model may be chosen



Sampling from ABC II Target - MCMC ABC

Algorithm 1 MCMC ABC algorithm of Marjoram et al 2003.

1: Set θ_0
2: for i = 1 to T do
3:   Draw θ* ∼ q(·|θ_{i−1})
4:   Simulate x* ∼ p(·|θ*)
5:   Compute r = [p(θ*) q(θ_{i−1}|θ*)] / [p(θ_{i−1}) q(θ*|θ_{i−1})] × 1(ρ(s(x*), s(y)) ≤ ε)
6:   if uniform(0, 1) < r then
7:     θ_i = θ*
8:   else
9:     θ_i = θ_{i−1}
10:  end if
11: end for
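The MCMC ABC algorithm above can be sketched in Python for a toy problem. Everything problem-specific is an assumption: a Poisson(θ) model, a Uniform(0, 100) prior, the sample mean as $s(\cdot)$, the absolute difference as ρ and a Gaussian random-walk proposal, which is symmetric so the q terms in r cancel. Function names are hypothetical.

```python
import numpy as np


def prior_pdf(theta):
    # Hypothetical prior for illustration: Uniform(0, 100).
    return 0.01 if 0.0 < theta < 100.0 else 0.0


def mcmc_abc(y, theta0, T, eps, prop_sd, rng):
    """MCMC ABC for a toy Poisson(theta) model; theta0 needs positive prior density."""
    s_y = np.mean(y)
    chain = np.empty(T + 1)
    chain[0] = theta0
    for i in range(1, T + 1):
        prev = chain[i - 1]
        prop = prev + prop_sd * rng.normal()          # theta* ~ q(.|theta_{i-1})
        accept = False
        if prior_pdf(prop) > 0.0:                     # otherwise r = 0, skip simulating
            x = rng.poisson(prop, size=len(y))        # x* ~ p(.|theta*)
            close = abs(np.mean(x) - s_y) <= eps      # 1(rho(s(x*), s(y)) <= eps)
            r = prior_pdf(prop) / prior_pdf(prev) * close
            accept = rng.uniform() < r
        chain[i] = prop if accept else prev
    return chain


rng = np.random.default_rng(4)
y = rng.poisson(30.0, size=100)
chain = mcmc_abc(y, theta0=np.mean(y), T=5000, eps=0.5, prop_sd=0.5, rng=rng)
print(chain.mean(), np.mean(chain[1:] != chain[:-1]))   # posterior mean, acceptance rate
```

Starting the chain at a value already consistent with the data (here $\bar{y}$) avoids a long initial phase in which every proposal is rejected because its simulated summary falls outside the tolerance.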



Sampling from pdBIL Target - MCMC pdBIL

The acceptance probability increases as n increases

Algorithm 2 MCMC BIL algorithm (see also Gallant and McCulloch 2009).

1: Set θ_0
2: Simulate x*_{1:n} ∼ p(·|θ_0)
3: Compute φ_0 = argmax_φ p_A(x*_{1:n}|φ)
4: for i = 1 to T do
5:   Draw θ* ∼ q(·|θ_{i−1})
6:   Simulate x*_{1:n} ∼ p(·|θ*)
7:   Compute φ(x*_{1:n}) = argmax_φ p_A(x*_{1:n}|φ)
8:   Compute r = [p_A(y|φ(x*_{1:n})) p(θ*) q(θ_{i−1}|θ*)] / [p_A(y|φ_{i−1}) p(θ_{i−1}) q(θ*|θ_{i−1})]
9:   if uniform(0, 1) < r then
10:    θ_i = θ*
11:    φ_i = φ(x*_{1:n})
12:  else
13:    θ_i = θ_{i−1}
14:    φ_i = φ_{i−1}
15:  end if
16: end for
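A matching sketch of the MCMC pdBIL algorithm above, assuming a Poisson(θ) generative model, the Normal(µ, τ) auxiliary model, a Uniform(0, 100) prior and a symmetric random-walk proposal; all of these and the function names are illustrative choices. The ratio r is computed on the log scale, and the auxiliary log-likelihood of the current state is carried along rather than recomputed, mirroring the storage of $\phi_{i-1}$.

```python
import numpy as np


def fit_normal(data):
    """MLE (mu, tau) of the auxiliary model Normal(mu, tau), tau = variance."""
    mu = data.mean()
    return np.array([mu, np.mean((data - mu) ** 2)])


def normal_loglik(data, phi):
    """log p_A(data | phi) for iid Normal(mu, tau) observations."""
    mu, tau = phi
    return -0.5 * len(data) * np.log(2.0 * np.pi * tau) - np.sum((data - mu) ** 2) / (2.0 * tau)


def fit_auxiliary(lam, size, n, rng):
    """phi(x*_{1:n}): fit the auxiliary model to n pooled Poisson(lam) datasets."""
    return fit_normal(rng.poisson(lam, size=n * size).astype(float))


def mcmc_pdbil(y, theta0, T, n, prop_sd, rng, lower=0.0, upper=100.0):
    """MCMC pdBIL on the log scale with a Uniform(lower, upper) prior.

    With a flat prior and symmetric proposal, log r reduces to the difference
    of auxiliary log-likelihoods inside the prior support.
    """
    y = np.asarray(y, dtype=float)
    theta = theta0
    loglik = normal_loglik(y, fit_auxiliary(theta, len(y), n, rng))   # log p_A(y | phi_0)
    chain, accepted = np.empty(T), 0
    for i in range(T):
        prop = theta + prop_sd * rng.normal()                          # theta* ~ q(.|theta_{i-1})
        if lower < prop < upper:                                       # zero prior density otherwise
            loglik_prop = normal_loglik(y, fit_auxiliary(prop, len(y), n, rng))
            if np.log(rng.uniform()) < loglik_prop - loglik:
                theta, loglik = prop, loglik_prop                      # keep the new phi's log-lik
                accepted += 1
        chain[i] = theta
    return chain, accepted / T


rng = np.random.default_rng(5)
y = rng.poisson(30.0, size=100)
chain, acc = mcmc_pdbil(y, theta0=np.mean(y), T=3000, n=10, prop_sd=0.5, rng=rng)
print(chain.mean(), acc)
```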



Toy Example 1 (Drovandi and Pettitt 2013)

True model Poisson(λ) and auxiliary model Normal(µ, τ)

ABC II ‘gets lucky’, as the auxiliary estimate $\hat{\mu} = \bar{y}$ is sufficient for λ

As n → ∞, pdBIL approximates the Poisson(λ) likelihood with the Normal(λ, λ) likelihood. Not exact.

Figure: left, posterior densities for n = 1, n = 10, n = ∞ and the true posterior; right, true and auxiliary log-likelihoods against λ

Acceptance probabilities increase with n: 46% for n = 1, 67% for n = 10, 72% for n = 100 and 73% for n = 1000
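As n → ∞ the binding function in this example is φ(λ) = (λ, λ), so pdBIL effectively uses the Normal(λ, λ) likelihood. The sketch below, on hypothetical simulated data, compares the maximisers of the two log-likelihoods over a grid: the Poisson one is $\bar{y}$, while setting the derivative of the Normal(λ, λ) log-likelihood to zero gives $\lambda^2 + \lambda = \overline{y^2}$. The two generally differ slightly, which is one way to see that the approximation is not exact.

```python
import math

import numpy as np


def poisson_loglik(lam, y):
    """True log-likelihood of iid Poisson(lam) data; lam may be an array."""
    lam = np.asarray(lam, dtype=float)
    log_fact = sum(math.lgamma(v + 1) for v in y)
    return np.sum(y) * np.log(lam) - len(y) * lam - log_fact


def limiting_aux_loglik(lam, y):
    """Normal(lam, lam) log-likelihood (variance lam); lam may be an array."""
    lam = np.asarray(lam, dtype=float)
    ss = np.sum(y ** 2) - 2.0 * lam * np.sum(y) + len(y) * lam ** 2   # sum of (y - lam)^2
    return -0.5 * len(y) * np.log(2.0 * np.pi * lam) - ss / (2.0 * lam)


rng = np.random.default_rng(6)
y = rng.poisson(30.0, size=100)
grid = np.linspace(20.0, 40.0, 20001)        # step 0.001
true_mle = grid[np.argmax(poisson_loglik(grid, y))]
aux_mle = grid[np.argmax(limiting_aux_loglik(grid, y))]
print(true_mle, np.mean(y))                  # Poisson MLE is the sample mean
print(aux_mle, (-1.0 + np.sqrt(1.0 + 4.0 * np.mean(y ** 2))) / 2.0)
```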



Toy Example 1

Results for pdBIL when the auxiliary model is mis-specified

Normal(µ, 9) (underdispersed) and Normal(µ, 49) (overdispersed)

ABC II will still work by chance, since $\hat{\mu} = \bar{y}$ whatever the value of τ

Figure: left, posterior densities for the true model, τ unspecified, τ = 16 and τ = 49; right, the corresponding log-likelihoods against λ



Toy Example 2

True model t-distribution(µ, σ, ν = 1) and auxiliary model t-distribution(µ, σ, ν)

Here pdBIL is exact as n → ∞, since the true model is a special case of the auxiliary model

ABC II does not produce sufficient statistics (the full set of order statistics is minimal sufficient)

Figure: two panels of posterior densities for pdBIL with n = 1, n = 10 and n = 100, compared with the true posterior



Toy Example 2 cont

Figure: two panels of posterior densities for ABC IP, ABC IL and ABC IS, compared with the true posterior



Quantile Distribution Example

Model of Interest: g-and-k quantile distribution

Defined in terms of its quantile function:

$$Q(z(p);\theta) = a + b\left(1 + c\,\frac{1 - \exp(-g z(p))}{1 + \exp(-g z(p))}\right)\left(1 + z(p)^2\right)^{k} z(p). \qquad (1)$$

p - quantile, z(p) - standard normal quantile, θ = (a, b, g, k), c = 0.8 (see Rayner and MacGillivray 2003).

Numerical likelihood evaluation is possible

Simulation is easier, via the inversion method

The data consist of 10000 independent draws with a = 3, b = 1, g = 2 and k = 0.5
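Simulation by inversion is straightforward because $z(p)$ for $p \sim U(0,1)$ is just a standard normal draw. A NumPy sketch of equation (1) and the resulting sampler, using the settings above; the function names are illustrative.

```python
import numpy as np


def gk_quantile(z, a, b, g, k, c=0.8):
    """g-and-k quantile function Q(z(p); theta) of equation (1).

    (1 - exp(-g z)) / (1 + exp(-g z)) equals tanh(g z / 2), which is used
    here as a numerically stable form of the same quantity.
    """
    z = np.asarray(z, dtype=float)
    return a + b * (1.0 + c * np.tanh(g * z / 2.0)) * (1.0 + z ** 2) ** k * z


def simulate_gk(size, a, b, g, k, rng, c=0.8):
    """Inversion sampling: z(p) with p ~ U(0, 1) is a N(0, 1) draw."""
    return gk_quantile(rng.standard_normal(size), a, b, g, k, c)


rng = np.random.default_rng(7)
y = simulate_gk(10000, a=3.0, b=1.0, g=2.0, k=0.5, rng=rng)
print(np.median(y))   # close to a, since Q(0; theta) = a
```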



Quantile Distribution Example (Cont...)

The auxiliary model is a 3-component normal mixture model

Flexible and fits the data well

But breaks the assumption of ABC IP (parameter estimates are not unique, e.g. under relabelling of the mixture components)

Figure: estimated density of y and the fitted 3-component mixture density



pdBIL Results

Figure: pdBIL posteriors for n = 1, n = 10 and n = 50 compared with the posterior; panels (a) a, (b) b, (c) g, (d) k

Acceptance probabilities: 2.8% for n = 1, 13.1% for n = 10 and 20.8% for n = 50


ABC II (no regression adjustment)

Figure: posterior compared with ABC IP, ABC IL and ABC IS; panels (e) a, (f) b, (g) g, (h) k



Compare all (note ABC has regression adjustment)

Figure: ABC, ABC IS, ABC IL, ABC IP and pdBIL (n = 50) compared with the posterior; panels (i) a, (j) b, (k) g, (l) k



4 Component Auxiliary Mixture Model

Figure: ABC, ABC IS, ABC IL, ABC IP and pdBIL (n = 50) compared with the posterior; panels (m) a, (n) b, (o) g, (p) k



Macroparasite Immunity Example

Estimate the parameters of a Markov process model explaining macroparasite population development with host immunity

212 hosts (cats), i = 1, . . . , 212. Each cat is injected with $l_i$ juvenile Brugia pahangi larvae (approximately 100 or 200)

At time $t_i$ the host is sacrificed and the number of matures is recorded

The host is assumed to develop an immunity

Three-variable problem: M(t) matures, L(t) juveniles, I(t) immunity

Only L(0) and $M(t_i)$ are observed for each host

No tractable likelihood



Figure: proportion of matures against time



Trivariate Markov Process of Riley et al (2003)

Figure: transition diagram for mature parasites M(t), juvenile parasites L(t) and immunity I(t), with transition rates

Maturation (juvenile to mature): γL(t)

Gain of immunity: νL(t)

Loss of immunity: µ_I I(t)

Death of juveniles due to immunity: βI(t)L(t)

Natural death of juveniles: µ_L L(t)

Natural death of matures: µ_M M(t)
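The process can be simulated exactly with Gillespie's stochastic simulation algorithm. The sketch below assumes the transitions read off the diagram (each event changes M, L or I by one unit), starts from M(0) = I(0) = 0 and L(0) = l_i, and uses hypothetical parameter values chosen only to exercise the code; it is not the simulator used in the talk, and the function and table names are illustrative.

```python
import numpy as np

# Assumed transition table read off the diagram: change in (M, L, I) per event.
TRANSITIONS = np.array([
    [+1, -1, 0],   # maturation, rate gamma * L
    [0, -1, 0],    # natural death of a juvenile, rate mu_L * L
    [0, -1, 0],    # death of a juvenile due to immunity, rate beta * I * L
    [0, 0, +1],    # gain of immunity, rate nu * L
    [0, 0, -1],    # loss of immunity, rate mu_I * I
    [-1, 0, 0],    # natural death of a mature, rate mu_M * M
])


def simulate_host(l0, t_end, nu, mu_i, mu_l, beta, gamma, mu_m, rng):
    """Gillespie simulation of (M, L, I) from (0, l0, 0); returns M(t_end)."""
    state = np.array([0, l0, 0])           # M, L, I
    t = 0.0
    while True:
        m, l, i = state
        rates = np.array([gamma * l, mu_l * l, beta * i * l,
                          nu * l, mu_i * i, mu_m * m], dtype=float)
        total = rates.sum()
        if total == 0.0:                   # absorbing state: no further events
            return int(m)
        t += rng.exponential(1.0 / total)  # waiting time to the next event
        if t > t_end:                      # host sacrificed before that event
            return int(m)
        event = rng.choice(len(rates), p=rates / total)
        state = state + TRANSITIONS[event]


rng = np.random.default_rng(8)
# Hypothetical parameter values for illustration only.
print(simulate_host(l0=100, t_end=400.0, nu=0.001, mu_i=1.0, mu_l=0.01,
                    beta=1.5, gamma=0.04, mu_m=0.0015, rng=rng))
```

Only $M(t_i)$ is returned, matching what is observed for each host; L and I remain latent.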



Auxiliary Beta-Binomial model

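The data show too much variation for a Binomial model

A Beta-Binomial model has an extra parameter to capture dispersion

$$p(m_i|\alpha_i, \beta_i) = \binom{l_i}{m_i}\frac{B(m_i + \alpha_i, l_i - m_i + \beta_i)}{B(\alpha_i, \beta_i)},$$

A useful reparameterisation is $p_i = \alpha_i/(\alpha_i + \beta_i)$ and $\theta_i = 1/(\alpha_i + \beta_i)$

Relate the proportion and overdispersion parameters to the covariates time, $t_i$, and initial number of larvae, $l_i$:

$$\mathrm{logit}(p_i) = \beta_0 + \beta_1\log(t_i) + \beta_2(\log(t_i))^2,$$

$$\log(\theta_i) = \begin{cases} \eta_{100}, & \text{if } l_i \approx 100, \\ \eta_{200}, & \text{if } l_i \approx 200. \end{cases}$$

Five parameters $\phi = (\beta_0, \beta_1, \beta_2, \eta_{100}, \eta_{200})$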
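A sketch of the auxiliary Beta-Binomial log-likelihood using the reparameterisation above, inverted as $\alpha_i = p_i/\theta_i$ and $\beta_i = (1-p_i)/\theta_i$. It uses SciPy's `betaln` and `gammaln`; the threshold of 150 separating the $l_i \approx 100$ and $l_i \approx 200$ groups, the example data and the φ values are all hypothetical.

```python
import numpy as np
from scipy.special import betaln, gammaln


def betabinom_logpmf(m, l, p, theta):
    """log p(m | alpha, beta) with alpha = p / theta and beta = (1 - p) / theta."""
    alpha = p / theta
    beta = (1.0 - p) / theta
    log_choose = gammaln(l + 1) - gammaln(m + 1) - gammaln(l - m + 1)
    return log_choose + betaln(m + alpha, l - m + beta) - betaln(alpha, beta)


def aux_loglik(phi, m, l, t, split=150):
    """Auxiliary log-likelihood for hosts with matures m, larvae l and times t.

    split is a hypothetical threshold separating the l ~ 100 and l ~ 200 groups.
    """
    b0, b1, b2, eta100, eta200 = phi
    m, l, t = (np.asarray(v, dtype=float) for v in (m, l, t))
    log_t = np.log(t)
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * log_t + b2 * log_t ** 2)))   # inverse logit
    theta = np.exp(np.where(l < split, eta100, eta200))
    return float(np.sum(betabinom_logpmf(m, l, p, theta)))


# Hypothetical data for three hosts and a hypothetical parameter vector.
m = [40, 55, 12]
l = [100, 200, 100]
t = [24.0, 100.0, 500.0]
phi = (0.0, -0.2, 0.0, -3.0, -3.0)
print(aux_loglik(phi, m, l, t))
```

In pdBIL this function would be maximised over φ for each simulated dataset and then evaluated at the observed $m_i$.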



pdBIL Results

Figure: pdBIL posteriors for n = 1, n = 20 and n = 50; panels (q) ν (axis scale ×10⁻³), (r) µ_I, (s) µ_L, (t) β



pBII results (ABC II no regression adjustment)

Figure: ABC IS, ABC IP, ABC IL and pdBIL (n = 50); panels (u) ν (axis scale ×10⁻³), (v) µ_I, (w) µ_L, (x) β



pBII results (ABC has regression adjustment)

Figure: ABC IS REG, ABC IP REG, ABC IL REG and pdBIL (n = 50); panels (y) ν (axis scale ×10⁻³), (z) µ_L



Discussion

pdBIL is theoretically very different from ABC II

pdBIL needs a good auxiliary model. ABC II might be useful if the auxiliary model is mis-specified

ABC II is more flexible; it can incorporate other summary statistics



Key References

Drovandi et al. (2011). Approximate Bayesian Computation using Indirect Inference. JRSS C.

Drovandi, Pettitt and Lee (2014). Bayesian Indirect Inference using a Parametric Auxiliary Model. http://eprints.qut.edu.au/63767/

Gleim and Pigorsch (2013). Approximate Bayesian Computation with Indirect Summary Statistics.

Gallant and McCulloch (2009). On the determination of general scientific models with application to asset pricing. JASA.

Reeves and Pettitt (2005). A theoretical framework for approximate Bayesian computation. 20th IWSM.