Page 1

How to compute posterior model probabilities ... and why that doesn’t mean that we have solved the problem of model choice

Peter Green

School of Mathematics, University of Bristol

15 March 2011, Groningen: All models are wrong...

Page 2

Outline

1 Ideal Bayes

2 Across-model methodology

3 Examples

4 Alternatives to across-model sampling

5 Methodological relatives

6 Better Bayes factors

7 Real Bayes

Page 3

Ideal Bayes

Hierarchical model

Suppose given a prior p(k) over models k in a countable set K, and for each k a prior distribution p(θk | k), and a likelihood p(Y | k, θk) for the data Y.

For definiteness and simplicity, suppose that p(θk | k) is a density in nk dimensions, and that there are no other parameters, so that where there are parameters common to all models these are subsumed into each θk ∈ R^{nk}.

Additional parameters, perhaps in additional layers of a hierarchy, are easily dealt with. All probability distributions are proper.

Page 5

Ideal Bayes

Hierarchical model as DAG

[DAG: k → θk → Y. Node labels: k, the model indicator, with prior p(k); θk, the model-specific parameter, with prior p(θk | k); Y, the data, with likelihood p(Y | k, θk).]

The generic hierarchical model for model choice

Page 6

Ideal Bayes

The joint posterior

$$p(k, \theta_k \mid Y) = \frac{p(k)\, p(\theta_k \mid k)\, p(Y \mid k, \theta_k)}{\sum_{k' \in K} \int p(k')\, p(\theta'_{k'} \mid k')\, p(Y \mid k', \theta'_{k'})\, d\theta'_{k'}}$$

can always be factorised as

p(k, θk | Y) = p(k | Y) p(θk | k, Y)

– the product of posterior model probabilities and model-specific parameter posteriors.
– very often the basis for reporting the inference, and in some of the methods mentioned below is also the basis for computation.

Page 7

Ideal Bayes

Bayes factors, Evidence

Perhaps you didn’t think you wanted to report p(k | Y)? Of course, with these posterior probabilities, we can report Bayes factors

$$B_{kl} = \frac{p(Y \mid k)}{p(Y \mid l)} = \frac{p(k \mid Y)}{p(l \mid Y)} \div \frac{p(k)}{p(l)}$$

for pairwise comparison of models.
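As a made-up numerical illustration (not on the slide): if p(k | Y)/p(l | Y) = 3 while the prior odds were p(k)/p(l) = 1/2, then

$$B_{kl} = 3 \div \tfrac{1}{2} = 6,$$

so the data favour model k six-fold even though the prior leaned towards l.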

For some, the Evidence (= marginal likelihood) p(Y | k) has an intrinsic meaning and interpretation.

Page 8

Ideal Bayes

Bayesian model averaging, prediction

We can do model averaging

$$E(F \mid Y) = \sum_k \int F(k, \theta_k)\, p(k, \theta_k \mid Y)\, d\theta_k$$

for any function F with the same interpretation in each model; the expectation can be estimated simply by averaging F along the entire run, essentially ignoring the model indicator k.

As an example, we can do prediction:

$$p(Y_+ \mid Y) = \sum_k p(Y_+ \mid k, Y)\, p(k \mid Y)$$

a posterior-weighted mixture of the within-model-k predictions

$$p(Y_+ \mid k, Y) = \int p(Y_+ \mid k, \theta_k)\, p(\theta_k \mid k, Y)\, d\theta_k$$
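As a minimal sketch of what this looks like in code (my code, not the talk’s; all names are hypothetical), given stored across-model output {(k_t, θ_t)} both quantities are just averages along the run:

```python
import numpy as np

def bma_estimate(ks, thetas, F):
    """Estimate E(F | Y) by averaging F(k, theta) over the whole run,
    ignoring the model indicator except as an argument to F."""
    return np.mean([F(k, th) for k, th in zip(ks, thetas)])

def posterior_model_probs(ks):
    """Empirical p(k | Y): the proportion of sweeps spent in each model."""
    ks = np.asarray(ks)
    return {int(k): float(np.mean(ks == k)) for k in np.unique(ks)}
```

The predictive mixture p(Y+ | Y) would be estimated the same way, by averaging the within-model predictive density p(Y+ | k_t, θ_t) along the run.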

Page 10

Ideal Bayes

Note the generality of this basic formulation: it embraces both genuine model-choice situations, where the variable k indexes the collection of discrete models under consideration, and settings where there is really a single model, but one with a variable-dimension parameter, for example a functional representation such as a series whose number of terms is not fixed (in which case, k is unlikely to be of direct inferential interest).

Page 11

Ideal Bayes

Splines

[Scatter plot of data points (x, y), x ∈ [0, 1], y ∈ [−1, 1], illustrating spline fitting.]

Page 12

Ideal Bayes

Polynomials

[The same scatter plot, illustrating polynomial fitting.]

Page 13

Ideal Bayes

Compatibility across models

Some would argue that responsible adoption of this Bayesian hierarchical model presupposes that, e.g., p(θk | k) should be compatible in that inference about functions of parameters that are meaningful in several models should be approximately invariant to k.

Such compatibility could in principle be exploited in the construction of MCMC methods (how?).

But it is philosophically tenable that no such compatibility is present, and we can’t assume it.

Page 14

Across-model methodology

Across- and within-model simulation

How to compute p(k, θk | Y)?

Two main approaches using MCMC:

across: one MCMC simulation with states of the form (k, θk) ∼ p(k, θk | Y)

within: separate simulations of θk ∼ p(θk | k, Y) for each k.

And beyond straight MCMC:

particle filter: SMC in place of MCMC

ABC: ‘likelihood-free’ methods where Y | k, θk can be simulated but p(Y | k, θk) cannot be evaluated

nested sampling

variational methods

Page 16

Across-model methodology

Across-model simulation: Reversible jump MCMC

The across-model MCMC simulator follows the ideal Bayesian, and treats (k, θk) as a single unknown. The state space for an across-model simulation is {(k, θk)} = ⋃_{k∈K} ({k} × R^{nk}).

Mathematically, this is not a particularly awkward object. But at least a little non-standard, when nk varies with k.

We use Metropolis-Hastings to build a suitable reversible chain.

On the face of it, this requires measure-theoretic notation, which may be unwelcome! The point of the ‘reversible jump’ framework is to render the measure theory invisible, by means of a construction using only ordinary densities. Even the fact that we are jumping dimensions becomes essentially invisible.

Page 18

Across-model methodology

Modelling the code

Let’s avoid all abstraction!

Consider how the transition will be implemented.

Assume X ⊂ R^d, and that π has a density (also denoted π).

At the current state x, we generate, say, r random numbers u from a known joint density g, and then form the proposed new state as a deterministic function of the current state and the random numbers: x′ = h(x, u), say.

The reverse transition from x′ to x would be made with the aid of random numbers u′ ∼ g′, giving x = h′(x′, u′).

Page 19

Across-model methodology

Modelling the code

[Diagram (‘Reversible jump’): the forward move uses u ∼ g to form x′ = h(x, u); the reverse move uses u′ ∼ g′ to form x = h′(x′, u′).]

Page 20

Across-model methodology

The equilibrium probability of jumping from A to B is then an integral with respect to (x, u):

$$\int_{(x,\, x' = h(x,u)) \in A \times B} \pi(x)\, g(u)\, \alpha(x, x')\, dx\, du.$$

The equilibrium probability of jumping from B to A is an integral with respect to (x′, u′):

$$\int_{(x = h'(x',u'),\, x') \in A \times B} \pi(x')\, g'(u')\, \alpha(x', x)\, dx'\, du'.$$

If the transformation from (x, u) to (x′, u′) is a diffeomorphism (the transformation and its inverse are differentiable), then we can apply the standard change-of-variable formula to write this as an integral with respect to (x, u).

Page 21

Across-model methodology

Detailed balance says the two integrals must be equal: it holds if

$$\alpha(x, x') = \min\left\{1,\; \frac{\pi(x')\, g'(u')}{\pi(x)\, g(u)} \left|\frac{\partial(x', u')}{\partial(x, u)}\right|\right\},$$

involving only ordinary joint densities.
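As a schematic sketch of one such move in code (mine, not from the talk; every argument is a user-supplied function, and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

def rj_step(x, log_pi, sample_u, h, log_g_ratio, log_jacobian):
    """One Metropolis-Hastings step built from a deterministic map.

    x:            current state
    log_pi:       log target density, log pi(.)
    sample_u:     draws the r random numbers u ~ g
    h:            deterministic map (x, u) -> (x', u'), returning both
    log_g_ratio:  computes log g'(u') - log g(u)
    log_jacobian: computes log |d(x', u') / d(x, u)|
    """
    u = sample_u()
    x_new, u_new = h(x, u)
    log_alpha = (log_pi(x_new) - log_pi(x)
                 + log_g_ratio(u, u_new)
                 + log_jacobian(x, u))
    if np.log(rng.uniform()) < log_alpha:
        return x_new   # accept the proposal
    return x           # reject: stay put
```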

Page 22

Across-model methodology

What’s the point?

Perhaps a little indirect! – but it’s a flexible framework for constructing quite complex moves using only elementary calculus.

The possibility that r < d covers the typical case that, given x ∈ X, only a lower-dimensional subset of X is reachable in one step.

(The Gibbs sampler is the best-known example of this, since in that case only some of the components of the state vector are changed at a time, although the formulation here is more general as it allows the subset not to be parallel to the coordinate axes.)

Page 24

Across-model methodology

The trans-dimensional case

But the main benefit of this formalism is that

$$\alpha(x, x') = \min\left\{1,\; \frac{\pi(x')\, g'(u')}{\pi(x)\, g(u)} \left|\frac{\partial(x', u')}{\partial(x, u)}\right|\right\}$$

applies, without change, in a variable-dimension context.

(Use the same symbol π(x) for the target density whatever the dimension of x in different parts of X.)

Provided that the transformation from (x, u) to (x′, u′) remains a diffeomorphism, the individual dimensions of x and x′ can be different. The dimension-jumping is ‘invisible’.

Page 25

Across-model methodology

Dimension matching

Suppose the dimensions of x, x′, u and u′ are d, d′, r and r′ respectively; then we have functions h : R^d × R^r → R^{d′} and h′ : R^{d′} × R^{r′} → R^d, used respectively in x′ = h(x, u) and x = h′(x′, u′).

For the transformation from (x, u) to (x′, u′) to be a diffeomorphism requires that d + r = d′ + r′, so-called ‘dimension-matching’; if this equality failed, the mapping and its inverse could not both be differentiable.

Dimension matching is necessary but not sufficient.
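For instance, in move (3) of the toy example on Page 30 below, x ∈ R is augmented by one random number u, while the reverse move from (x′1, x′2) ∈ R² needs none, so

$$d + r = 1 + 1 = 2 + 0 = d' + r'.$$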

Page 26

Across-model methodology

Dimension matching

[Diagram: (x, u1, u2) ↔ (x′1, x′2, u′); here d = 1, r = 2, d′ = 2, r′ = 1, and d + r = d′ + r′.]

Page 27

Across-model methodology

Dimension matching

[Diagram: (x, u) ↔ (x′1, x′2); here d = 1, r = 1, d′ = 2, r′ = 0, and d + r = d′ + r′.]

Page 28

Across-model methodology

Details of application to model-choice

We wish to use these reversible jump moves to sample the space X = ⋃_{k∈K} ({k} × R^{nk}) with invariant distribution π, which here is p(k, θk | Y).

Just as in ordinary MCMC, we typically need multiple types of moves to traverse the whole space X. Each move is a transition kernel reversible with respect to π, but only in combination do we obtain an ergodic chain.

The moves will be indexed by m: consider a particular move m that proposes to take x = (k, θk) to x′ = (k′, θ′_{k′}), or vice versa, for a specific pair (k, k′).

Page 29

Across-model methodology

The acceptance probability is derived as before, and yields

$$\alpha_m(x, x') = \min\left\{1,\; \frac{\pi(x')}{\pi(x)}\, \frac{j_m(x')}{j_m(x)}\, \frac{g'_m(u')}{g_m(u)} \left|\frac{\partial(x', u')}{\partial(x, u)}\right|\right\}.$$

Here j_m(x) is the probability of choosing move type m when at x; the variables x, x′, u, u′ are of dimensions d_m, d′_m, r_m, r′_m respectively, with d_m + r_m = d′_m + r′_m; we have x′ = h_m(x, u) and x = h′_m(x′, u′); and the Jacobian has a form correspondingly depending on m.

Page 30

Examples

Toy example

..... of no statistical use at all! Suppose x lies in R ∪ R²; π(x) is a mixture:

with probability p1, x ∼ Beta(2, 3),
with probability p2, (x1, x2, 1 − x1 − x2) ∼ Dirichlet(2, 3, 4).

I will use three moves:

(1) within R: x → U(x − ε, x + ε).
(2) within R²: (x1, x2) → (x2, x1).
(3) between R and R².

In R, choose (1) or (3) with probabilities 1 − r1, r1. In R², choose (2) or (3) with probabilities 1 − r2, r2. Thus j3(x) = r1 for all x ∈ R and j3(x′) = r2 for all x′ ∈ R².

Page 31

Examples

Dimension-changing with move (3)

Proposal: To go from x ∈ R to (x1, x2) ∈ R², draw u from U(0, 1) [so g3(u) = 1 if 0 < u < 1] and propose (x1, x2) = (x, u). For the reverse move, no u′ is required [write g′3(u′) ≡ 1] and set x = x1. This certainly gives a bijection: (x, u) ↔ (x1, x2), with Jacobian = 1.

Acceptance decision (writing fB and fD for the Beta(2, 3) and Dirichlet(2, 3, 4) densities):

$$\alpha = \min\left\{1,\; \frac{\pi(x')}{\pi(x)}\, \frac{j_3(x')}{j_3(x)}\, \frac{g'_3(u')}{g_3(u)} \left|\frac{\partial(x', u')}{\partial(x, u)}\right|\right\} = \min\left\{1,\; \frac{p_2 f_D(x, u)}{p_1 f_B(x)} \cdot \frac{r_2}{r_1} \cdot \frac{1}{1} \cdot |1|\right\} = \min\left\{1,\; \frac{p_2 r_2 f_D(x, u)}{p_1 r_1 f_B(x)}\right\}$$

For the reverse move, α = min{1, (p1 r1 fB(x)) / (p2 r2 fD(x, u))}.
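Here is a runnable sketch of the whole toy sampler (my own code, not shown in the talk; the values of p1, p2, r1, r2 and ε are arbitrary choices, and move (2) is implemented as a Metropolis proposal since the Dirichlet target is not symmetric in (x1, x2)):

```python
import numpy as np
from scipy.stats import beta, dirichlet

rng = np.random.default_rng(1)
p1, p2 = 0.5, 0.5    # mixture weights of the two components of pi
r1, r2 = 0.5, 0.5    # probabilities of proposing move (3) from R and R^2
eps = 0.1            # half-width of the random-walk move (1)

def f_B(x):          # Beta(2,3) density: the R component
    return beta.pdf(x, 2, 3)

def f_D(x1, x2):     # Dirichlet(2,3,4) density at (x1, x2, 1 - x1 - x2)
    if min(x1, x2, 1 - x1 - x2) <= 0:
        return 0.0
    return dirichlet.pdf([x1, x2, 1 - x1 - x2], [2, 3, 4])

def sweep(state):
    if len(state) == 1:                  # currently in R
        (x,) = state
        if rng.uniform() < r1:           # move (3): propose R -> R^2
            u = rng.uniform()            # g3(u) = 1 on (0,1)
            if rng.uniform() < p2 * r2 * f_D(x, u) / (p1 * r1 * f_B(x)):
                return (x, u)
        else:                            # move (1): random walk on R
            y = x + rng.uniform(-eps, eps)
            if 0 < y < 1 and rng.uniform() < f_B(y) / f_B(x):
                return (y,)
    else:                                # currently in R^2
        x1, x2 = state
        if rng.uniform() < r2:           # move (3): propose R^2 -> R
            if rng.uniform() < p1 * r1 * f_B(x1) / (p2 * r2 * f_D(x1, x2)):
                return (x1,)
        else:                            # move (2): propose swapping coordinates
            if rng.uniform() < f_D(x2, x1) / f_D(x1, x2):
                return (x2, x1)
    return state                         # proposal rejected

state, visits = (0.5,), {1: 0, 2: 0}
for _ in range(10_000):
    state = sweep(state)
    visits[len(state)] += 1
print(visits)   # time spent in R vs R^2 should approach the ratio p1 : p2
```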

Pages 32–52

Examples

Toy example (continued)

[Figure sequence: snapshots of the states visited by the sampler, plotted on [0, 1] × [−0.2, 1.0], after 1, 2, …, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500 and 1000 sweeps; the plotted points accumulate as the run proceeds.]

Page 53

Examples

[Three data-versus-model plots (axes 0–1), checking the sampler output against the known marginals of the target: x1 ∼ Beta(2, 3); x2[1] ∼ Beta(2, 7); x2[2] ∼ Beta(3, 6).]

Page 54

Examples

Cyclones in Bay of Bengal

Multiple change-point analysis

A Bayesian approach to change-point analysis for point processes (y1, y2, …, yn) combines:

a Poisson-process likelihood:

$$p(y \mid x) = \exp\left\{\sum_{i=1}^n \log x(y_i) - \int_0^L x(t)\, dt\right\}$$

a prior model for the step function x(t), 0 ≤ t ≤ L, representing the intensity.

Prior model: represent the step function by (k, {sj}j=1..k, {hj}j=0..k), with x(t) = Σj hj I[sj, sj+1)(t):

number of steps k: Poisson(λ)

step heights hj: Gamma(α, β)

step positions sj: p(s | k) ∝ Πj (sj+1 − sj)

all independent.

MCMC for step functions

The state is x = (k, {sj}j=1..k, {hj}j=0..k, α, β). I will use four moves:

(a) Metropolis change to a randomly chosen step height hj.

(b) Metropolis change to a randomly chosen step position sj.

(c) Jump move: birth/death of steps
– birth: choose a new step position s* at random, split the current step height h into two: (h1, h2)
– death: choose a step at random to kill, combine the current step heights (h1, h2) into one: h

(d) Update the hyperparameters α, β.

Birth and death of steps

[Diagram: the step of height h over (sj−1, sj) is split at s* into heights h1 and h2 over widths w1 and w2; the birth move is the bijection (h, w, s*, u) ↔ (h1, h2, w1, w2), with the split preserving the weighted geometric mean, h1^{w1} · h2^{w2} = h^w, w = w1 + w2.]

Example: cyclones hitting the Bay of Bengal

141 cyclones over a period of 100 years (a cyclone is a storm with winds ≥ … km h⁻¹).

[Event-time plot: the 141 cyclone occurrence times over 0–100 years.]

A point process – of cyclone events – in time: modelled here as an inhomogeneous Poisson process, with piecewise constant intensity.
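As a small illustration (my sketch, not code from the talk), the step-function log-likelihood above is straightforward to evaluate:

```python
import numpy as np

def loglik_step_intensity(y, s, h, L):
    """Log-likelihood of an inhomogeneous Poisson process with
    piecewise-constant intensity.

    y: event times in [0, L]
    s: change-point positions s_1 < ... < s_k inside (0, L)
    h: step heights h_0, ..., h_k (len(h) == len(s) + 1)
    L: length of the observation window
    """
    edges = np.concatenate(([0.0], np.sort(s), [L]))
    h = np.asarray(h, dtype=float)
    # sum of log-intensities at the event times
    idx = np.searchsorted(edges, y, side="right") - 1
    idx = np.clip(idx, 0, len(h) - 1)   # an event at t == L falls in the last step
    log_sum = np.sum(np.log(h[idx]))
    # integral of the intensity over [0, L]
    integral = np.sum(h * np.diff(edges))
    return log_sum - integral
```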

Page 55

Examples

Cyclones in Bay of Bengal

[The same embedded material as Page 54, repeated.]

Illustrating the model-jumping move: splitting and merging steps.

Page 56

Examples

Cyclones in Bay of Bengal

Choices of hyperparameters:

Prior on k: Poisson(λ), with λ = ….

Prior on hj: Gamma(α, β), with α = … and β = … (expressed in terms of n/L).

[Plot: a sample of step functions from the posterior – intensity against time, 0–100.]

Posterior for the number of change points k: [plot of p(k | y) for k = 0, 1, …, 12]. Zero change points is ruled out; k = … or … more probable than under the prior.

Posterior density estimates for change-point positions: [density against time, 0–100].

Model-averaged estimate E(x(·) | y): [intensity against time, 0–100, with the cyclone event times marked]. (The expectation of a random step function is not a step function.)

A small sample from the posterior

Page 57

Examples

Cyclones in Bay of Bengal

[The same embedded material as Page 56, repeated.]

Posterior of the number of steps

Page 58

Examples

Cyclones in Bay of Bengal

[The same embedded material as Page 56, repeated.]

An example of inference conditional on k: posterior densities of step positions

Page 59

Examples

Cyclones in Bay of Bengal

[The same embedded material as Page 56, repeated.]

An example of Bayesian model averaging: posterior expectation of the intensity

Page 60

Alternatives to across-model sampling

Alternatives to joint model-parameter sampling

When marginal likelihoods are tractable, it’s usually a good idea to compute them (thus integrating out θk) and then conduct search/sampling only over the model indicator k.

This direct approach has been taken a little further by Godsill (2001), who considers cases of ‘partial analytic structure’, where some of the parameters in θk may be integrated out, and the others left unchanged in the move that updates the model, to give an across-model sampler with probably superior performance.

Marginal likelihoods via within-model sampling: Nial’s talk!

Page 61

Alternatives to across-model sampling

Some issues in choosing a sampler

How many models are there?
Do we want results across k, within each k, or for one k of interest?
Do we need the Evidence (marginal likelihood) values p(Y | k) absolutely, or only relatively?
Jumping between models as an aid to mixing (c.f. simulated tempering: mixing may be better in the ‘other’ model)
Are samplers for individual models already written and tested?
Are standard strategies like split/merge likely to work?
Trade-off between remembering and forgetting θk when leaving model k

Page 62

Alternatives to across-model sampling

The different ways that models can interconnect

completely unrelated
nested in some irregular way
mixture models (nesting and exchangeability)
variable selection (factorial structure)
graphical models (possibly with constraints such as decomposability)

Page 63

Methodological relatives

Methodological connections and extensions

Jump–diffusion (Grenander & Miller, 1994; Phillips & Smith, 1996)
Point process representations and samplers (Geyer & Møller, 1994; Stephens, 2000)
Product-space samplers (Carlin & Chib, 1995; Green & O’Hagan, 1998; Dellaportas et al., 2002)
Delayed rejection (Green & Mira, 2001)
Connections between reversible jump and continuous-time birth-and-death samplers (Cappé, Robert & Rydén, 2001)
Composite model space framework (Godsill, 2001)
Efficient construction of proposals (Brooks, Giudici & Roberts, 2003)
Automatic RJ sampler (Hastie, 2005)
Population RJMCMC and Interacting SMC (Jasra et al., 2007/8)

Page 64

Better Bayes factors

Bartolucci, Scaccia and Mira

The obvious estimator of p(k | Y) is the empirical frequency

$$\hat{p}(k \mid Y) = \frac{\#\{t : k^{(t)} = k\}}{N},$$

the proportion of the MCMC run spent in model k; but Bartolucci, Scaccia and Mira (Biometrika, 2006) showed that we can do better using one of several ‘Rao-Blackwellised’ estimators, exploiting the fact that the sampler moves into each model with a probability that is known at the time.

We can also use this in BMA, etc.

Page 65

Better Bayes factors

Bartolucci, Scaccia and Mira

The basic idea of the simplest version. Let α^{(t)}_{kl} be the Metropolis–Hastings acceptance ratio for moving from model k to model l on sweep t, or 0 if we are not in k at time t − 1 or do not propose such a move. Define

$$\hat{B}_{kl} = \frac{\sum_{t=1}^N \alpha^{(t)}_{lk} / n_l}{\sum_{t=1}^N \alpha^{(t)}_{kl} / n_k}$$

where n_k = #{t : k^{(t)} = k} is the number of visits to model k. This derives from Meng–Wong-type bridge-sampling ideas, somewhat analogous to Chib–Jeliazkov.
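A sketch of this estimator from stored sampler output (my code; the array layout is a hypothetical convention: alpha[t, i, j] holds α^{(t)}_{ij}, zero whenever no i → j move was proposed at sweep t):

```python
import numpy as np

def rb_bayes_factor(alpha, ks, k, l):
    """Rao-Blackwellised estimate of the Bayes factor B_kl.

    alpha: array of shape (N, K, K); alpha[t, i, j] is the acceptance
           ratio for an i -> j move at sweep t (0 if none was proposed)
    ks:    length-N array of model indicators k^(t)
    k, l:  indices of the two models being compared (both must be visited)
    """
    ks = np.asarray(ks)
    n_k = np.sum(ks == k)
    n_l = np.sum(ks == l)
    return (alpha[:, l, k].sum() / n_l) / (alpha[:, k, l].sum() / n_k)
```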

Page 66

Better Bayes factors

Bartolucci, Scaccia and Mira

[Trace plot: running estimates of p(k = 2 | Y) against sweep (0–4000), on the range 0.56–0.64.]

Page 67

Real Bayes

In reality, it’s not that easy...

Prior model probabilities may be fictional. The ideal Bayesian (or her/his scientist colleague) has real prior probabilities reflecting scientific judgement/belief across the model space; not very common in practice!

Arbitrariness in prior model probabilities may not affect Bayes factors, but it sabotages Bayesian model averaging!

Page 70

Real Bayes

In reality, it’s not that easy...

No chance of passing the test of a sensitivity analysis. In ordinary parametric problems we commonly find that inferences are rather insensitive to moderately large variations in prior assumptions, except when there are very few data (indeed, the opposite case, of high sensitivity, poses a challenge to the non-Bayesian – perhaps the data carry less information than hoped?). But it’s clear that a test of sensitivity to model probabilities will always fail:

$$\frac{p^\star(k \mid Y)}{p^\star(l \mid Y)} = \frac{p(k \mid Y)}{p(l \mid Y)} \times \left(\frac{p^\star(k)}{p^\star(l)} \div \frac{p(k)}{p(l)}\right)$$
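As a one-line gloss (mine, not the slide’s): if the prior odds are doubled, so that p⋆(k)/p⋆(l) = 2 p(k)/p(l), then p⋆(k | Y)/p⋆(l | Y) = 2 p(k | Y)/p(l | Y), whatever the data say – the change in prior odds feeds straight through to the posterior odds.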

Page 73

Real Bayes

In reality, it’s not that easy...

Improper parameter priors problems. In ordinary parametric problems it is commonly true that it is safe to use improper priors – when posterior distributions are well-defined as limits based on a sequence of approximating proper priors (and not usually sensitive to what that sequence is).

But improper parameter priors make Bayes factors indeterminate (since improper priors can only be defined up to arbitrary normalising constants, which persist into marginal likelihoods).

And proper but vague/diffuse priors fail to solve the problem, since the Bayes factors will depend on the arbitrary degree of vagueness used.

Page 77

Real Bayes

In reality, it’s not that easy...

Improper parameter priors problems, continued

In certain circumstances, ideas such as Intrinsic or Fractional Bayes factors, or Expected Posterior priors, can be applied, essentially based on tying together improper priors across different models. These ideas lose much of the appeal of ideal Bayes arguments, have arbitrary aspects, and are not widely accepted.

Page 79

Real Bayes

Model uncertainty? Yes, but do we have to choose?

Model uncertainty is a fact of life.
When can we quantify it?
When can we eliminate it?
When can we accommodate it?

Why choose?
Prediction
Scientific understanding
Presentation
Policy
Defence

Page 80

Real Bayes

An illusion of unity?

Is ‘model’ too much of a catch-all?
different scientific mechanisms
selection of predictors in regression
number of components in a mixture
order of an AR model
complexity of a polynomial or spline

Page 81

Real Bayes

All the other criteria

Bayesian hypothesis testing, and ‘alternative-prior carpentry’
AIC, BIC, DIC, DIC+, MDL, Cp
Decision theory
Bayesian p-values
Posterior predictive checks

Page 82

Acknowledgements

Thanks to all my collaborators on trans-dimensional MCMC problems: Carmen Fernández, Paolo Giudici, Miles Harkness, David Hastie, Matthew Hodgson, Antonietta Mira, Agostino Nobile, Marco Pievatolo, Sylvia Richardson, Luisa Scaccia and Claudia Tarantola.

References and preprints available from

http://www.stats.bris.ac.uk/∼peter/Research.html
http://www.stats.bris.ac.uk/∼peter/papers/
[email protected]

© University of Bristol, 2011
