How to compute posterior model probabilities ... and why that doesn’t mean that we have solved the
problem of model choice
Peter Green
School of Mathematics, University of Bristol
15 March 2011, Groningen: All models are wrong...
Peter Green (Bristol) Computing posterior model probabilities Groningen, March 2011 1 / 69
Outline
1 Ideal Bayes
2 Across-model methodology
3 Examples
4 Alternatives to across-model sampling
5 Methodological relatives
6 Better Bayes factors
7 Real Bayes
Ideal Bayes
Hierarchical model
Suppose given a prior p(k) over models k in a countable set K, and for each k
a prior distribution p(θk | k), and a likelihood p(Y | k, θk) for the data Y.
For definiteness and simplicity, suppose that p(θk | k) is a density in nk dimensions, and that there are no other parameters, so that where there are parameters common to all models these are subsumed into each θk ∈ R^{nk}.
Additional parameters, perhaps in additional layers of a hierarchy, are easily dealt with. All probability distributions are proper.
Ideal Bayes
Hierarchical model as DAG
[Figure: DAG of the generic hierarchical model for model choice: the model indicator k (prior p(k)) points to the model-specific parameter θk (prior p(θk | k)), and k and θk together point to the data Y (likelihood p(Y | k, θk)).]
Ideal Bayes
The joint posterior
p(k, θk | Y) = p(k) p(θk | k) p(Y | k, θk) / ∑_{k′∈K} ∫ p(k′) p(θ′k′ | k′) p(Y | k′, θ′k′) dθ′k′
can always be factorised as
p(k , θk |Y ) = p(k |Y )p(θk |k ,Y )
– the product of posterior model probabilities and model-specific parameter posteriors.
– very often the basis for reporting the inference, and in some of the methods mentioned below is also the basis for computation.
Ideal Bayes
Bayes factors, Evidence
Perhaps you didn’t think you wanted to report p(k | Y)? Of course, with these posterior probabilities, we can report Bayes Factors
Bkl = p(Y | k) / p(Y | l) = [p(k | Y) / p(l | Y)] ÷ [p(k) / p(l)]
for pairwise comparison of models.
For some, the Evidence (= marginal likelihood) p(Y | k) has an intrinsic meaning and interpretation.
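As a quick numerical illustration of the identity above (not from the slides; the values are made up), the Bayes factor is the ratio of posterior odds to prior odds:

```python
def bayes_factor(post_k, post_l, prior_k, prior_l):
    """Bayes factor B_kl = (posterior odds of k vs l) / (prior odds of k vs l)."""
    return (post_k / post_l) / (prior_k / prior_l)

# With equal prior weights, a posterior split of 0.8 vs 0.2 gives B_kl close to 4:
print(bayes_factor(0.8, 0.2, 0.5, 0.5))
```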
Ideal Bayes
Bayesian model averaging, prediction
We can do model averaging
E(F | Y) = ∑_k ∫ F(k, θk) p(k, θk | Y) dθk
for any function F with the same interpretation in each model; the expectation can be estimated simply by averaging F along the entire run, essentially ignoring the model indicator k.
As an example, we can do prediction:
p(Y+ | Y) = ∑_k p(Y+ | k, Y) p(k | Y)
a posterior-weighted mixture of the within-model-k predictions
p(Y+ | k, Y) = ∫ p(Y+ | k, θk) p(θk | k, Y) dθk
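A minimal sketch of how such a model-averaged expectation is estimated in practice, with a fabricated chain standing in for real across-model MCMC output (the two models, their marginals and the 0.6/0.4 weighting are purely illustrative):

```python
import random

random.seed(0)

# Hypothetical across-model MCMC output: a list of (k, theta_k) states,
# fabricated here purely for illustration.
chain = []
for _ in range(10_000):
    if random.random() < 0.6:                      # "model 1": scalar theta
        chain.append((1, random.betavariate(2, 3)))
    else:                                          # "model 2": 2-vector theta
        chain.append((2, (random.betavariate(2, 7), random.betavariate(3, 6))))

# F must have the same interpretation in every model; here, the first coordinate.
def F(k, theta):
    return theta if k == 1 else theta[0]

# Model-averaged estimate of E(F | Y): average F along the entire run,
# ignoring the model indicator k.
estimate = sum(F(k, th) for k, th in chain) / len(chain)
print(round(estimate, 3))
```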
Ideal Bayes
Note the generality of this basic formulation: it embraces both genuine model-choice situations, where the variable k indexes the collection of discrete models under consideration, and also settings where there is really a single model, but one with a variable-dimension parameter, for example a functional representation such as a series whose number of terms is not fixed (in which case, k is unlikely to be of direct inferential interest).
Ideal Bayes
Splines
[Figure: scatter plot of the example data (x, y), with x ∈ [0, 1] and y ∈ [−1, 1].]
Ideal Bayes
Polynomials
[Figure: the same scatter plot of (x, y) data, x ∈ [0, 1], y ∈ [−1, 1].]
Ideal Bayes
Compatibility across models
Some would argue that responsible adoption of this Bayesian hierarchical model presupposes that, e.g., p(θk | k) should be compatible, in that inference about functions of parameters that are meaningful in several models should be approximately invariant to k.
Such compatibility could in principle be exploited in the construction of MCMC methods (how?).
But it is philosophically tenable that no such compatibility is present, and we can’t assume it.
Across-model methodology
Across- and within-model simulation
How to compute p(k , θk |Y )?
Two main approaches using MCMC:
across: one MCMC simulation with states of the form (k, θk) ≈ p(k, θk | Y)
within: separate simulations of θk ≈ p(θk | k, Y) for each k.
And beyond straight MCMC:
particle filter: SMC in place of MCMC
ABC: ‘likelihood-free’ methods where Y | k, θk can be simulated but p(Y | k, θk) cannot be evaluated
nested sampling
variational methods
Across-model methodology
Across-model simulation: Reversible jump MCMC
The across-model MCMC simulator follows the ideal Bayesian, and treats (k, θk) as a single unknown. The state space for an across-model simulation is {(k, θk)} = ⋃_{k∈K} ({k} × R^{nk}).
Mathematically, this is not a particularly awkward object, but it is at least a little non-standard when nk varies with k.
We use Metropolis–Hastings to build a suitable reversible chain.
On the face of it, this requires measure-theoretic notation, which may be unwelcome! The point of the ‘reversible jump’ framework is to render the measure theory invisible, by means of a construction using only ordinary densities. Even the fact that we are jumping dimensions becomes essentially invisible.
Across-model methodology
Modelling the code
Let’s avoid all abstraction!
Consider how the transition will be implemented.
Assume X ⊂ R^d, and that π has a density (also denoted π).
At the current state x, we generate, say, r random numbers u from a known joint density g, and then form the proposed new state as a deterministic function of the current state and the random numbers: x′ = h(x, u), say.
The reverse transition from x′ to x would be made with the aid of random numbers u′ ∼ g′, giving x = h′(x′, u′).
Across-model methodology
Modelling the code
[Figure: ‘Reversible jump’ schematic: the map h takes (x, u) to x′ = h(x, u), and h′ takes (x′, u′) back to x = h′(x′, u′).]
Across-model methodology
The equilibrium probability of jumping from A to B is then an integral with respect to (x, u):
∫_{(x, x′=h(x,u)) ∈ A×B} π(x) g(u) α(x, x′) dx du.
The equilibrium probability of jumping from B to A is an integral with respect to (x′, u′):
∫_{(x=h′(x′,u′), x′) ∈ A×B} π(x′) g′(u′) α(x′, x) dx′ du′.
If the transformation from (x, u) to (x′, u′) is a diffeomorphism (the transformation and its inverse are differentiable), then we can apply the standard change-of-variable formula to write this as an integral with respect to (x, u).
Across-model methodology
Detailed balance says the two integrals must be equal: it holds if
α(x, x′) = min{1, [π(x′) g′(u′) / (π(x) g(u))] |∂(x′, u′)/∂(x, u)|},
involving only ordinary joint densities.
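As a sketch, this acceptance probability can be evaluated directly from the four density values and the Jacobian (the numbers below are arbitrary, purely to show the arithmetic):

```python
def rj_accept(pi_x, pi_xp, g_u, gp_up, jac):
    """alpha(x, x') = min{1, [pi(x') g'(u') / (pi(x) g(u))] * |Jacobian|},
    built entirely from ordinary densities."""
    return min(1.0, (pi_xp * gp_up) / (pi_x * g_u) * abs(jac))

# A proposal into a region of higher target density is always accepted:
print(rj_accept(pi_x=0.2, pi_xp=0.5, g_u=1.0, gp_up=1.0, jac=1.0))
```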
Across-model methodology
What’s the point?
Perhaps a little indirect! But it’s a flexible framework for constructing quite complex moves using only elementary calculus.
The possibility that r < d covers the typical case that, given x ∈ X, only a lower-dimensional subset of X is reachable in one step.
(The Gibbs sampler is the best-known example of this, since in that case only some of the components of the state vector are changed at a time, although the formulation here is more general as it allows the subset not to be parallel to the coordinate axes.)
Across-model methodology
The trans-dimensional case
But the main benefit of this formalism is that
α(x, x′) = min{1, [π(x′) g′(u′) / (π(x) g(u))] |∂(x′, u′)/∂(x, u)|}
applies, without change, in a variable-dimension context.
(Use the same symbol π(x) for the target density whatever the dimension of x in different parts of X.)
Provided that the transformation from (x, u) to (x′, u′) remains a diffeomorphism, the individual dimensions of x and x′ can be different. The dimension-jumping is ‘invisible’.
Across-model methodology
Dimension matching
Suppose the dimensions of x, x′, u and u′ are d, d′, r and r′ respectively; then we have functions h : R^d × R^r → R^{d′} and h′ : R^{d′} × R^{r′} → R^d, used respectively in x′ = h(x, u) and x = h′(x′, u′).
For the transformation from (x, u) to (x′, u′) to be a diffeomorphism requires that d + r = d′ + r′, so-called ‘dimension matching’; if this equality failed, the mapping and its inverse could not both be differentiable.
Dimension matching is necessary but not sufficient.
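A trivial check of the dimension-matching condition (the example dimensions are illustrative):

```python
def dimension_matching(d, r, d_prime, r_prime):
    """Necessary (not sufficient) condition for (x, u) -> (x', u')
    to be a diffeomorphism: d + r = d' + r'."""
    return d + r == d_prime + r_prime

# Jump from R^1 (with one innovation u) to R^2 (with none):
print(dimension_matching(d=1, r=1, d_prime=2, r_prime=0))   # True
print(dimension_matching(d=1, r=0, d_prime=2, r_prime=0))   # False
```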
[Figure: dimension matching, with (x, u1, u2) mapping to (x′1, x′2, u′): d + r = d′ + r′.]
[Figure: dimension matching in the degenerate case r′ = 0, with (x, u) mapping to (x′1, x′2).]
Across-model methodology
Details of application to model-choice
We wish to use these reversible jump moves to sample the space X = ⋃_{k∈K} ({k} × R^{nk}) with invariant distribution π, which here is p(k, θk | Y).
Just as in ordinary MCMC, we typically need multiple types of moves to traverse the whole space X. Each move is a transition kernel reversible with respect to π, but only in combination do we obtain an ergodic chain.
The moves will be indexed by m: consider a particular move m that proposes to take x = (k, θk) to x′ = (k′, θ′k′), or vice versa, for a specific pair (k, k′).
Across-model methodology
The acceptance probability is derived as before, and yields
αm(x, x′) = min{1, [π(x′)/π(x)] [jm(x′)/jm(x)] [g′m(u′)/gm(u)] |∂(x′, u′)/∂(x, u)|}.
Here jm(x) is the probability of choosing move type m when at x; the variables x, x′, u, u′ are of dimensions dm, d′m, rm, r′m respectively, with dm + rm = d′m + r′m; we have x′ = hm(x, u) and x = h′m(x′, u′); and the Jacobian has a form correspondingly depending on m.
Examples
Toy example
..... of no statistical use at all!
Suppose x lies in R ∪ R²: π(x) is a mixture:
with probability p1, x ∼ Beta(2, 3);
with probability p2, (x1, x2, 1 − x1 − x2) ∼ Dirichlet(2, 3, 4).
I will use three moves:
(1) within R: x → U(x − ε, x + ε);
(2) within R²: (x1, x2) → (x2, x1);
(3) between R and R².
In R, choose (1) or (3) with probabilities 1 − r1, r1.
In R², choose (2) or (3) with probabilities 1 − r2, r2.
Thus j3(x) = r1 for all x ∈ R and j3(x′) = r2 for all x′ ∈ R².
Examples
Dimension-changing with move (3)
Proposal: To go from x ∈ R to (x1, x2) ∈ R², draw u from U(0, 1) [so g3(u) = 1 if 0 < u < 1] and propose (x1, x2) = (x, u). For the reverse move, no u′ is required [write g′3(u′) ≡ 1] and set x = x1. This certainly gives a bijection: (x, u) ↔ (x1, x2), with Jacobian = 1.
Acceptance decision:
α = min{1, [π(x′)/π(x)] [j3(x′)/j3(x)] [g′3(u′)/g3(u)] |∂(x′, u′)/∂(x, u)|}
  = min{1, [p2 fD(x, u) / (p1 fB(x))] [r2/r1] [1/1] |1|}
  = min{1, p2 r2 fD(x, u) / (p1 r1 fB(x))}
For the reverse move, α = min{1, (p1 r1 fB(x)) / (p2 r2 fD(x, u))}.
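Putting the whole toy example together, a sketch implementation in Python; the values of p1, p2, r1, r2 and ε are not specified on the slides, so the ones below are illustrative:

```python
import random
from math import gamma

random.seed(1)

# Illustrative settings (the slides leave p1, p2, r1, r2 and eps unspecified):
p1, p2 = 0.5, 0.5    # mixture weights of the R and R^2 components
r1, r2 = 0.3, 0.3    # probability of attempting the jump move (3)
eps = 0.1            # half-width of the random-walk move (1)

def f_B(x):          # Beta(2,3) density on (0,1)
    return gamma(5) / (gamma(2) * gamma(3)) * x * (1 - x) ** 2 if 0 < x < 1 else 0.0

def f_D(x1, x2):     # Dirichlet(2,3,4) density of (x1, x2)
    x3 = 1 - x1 - x2
    if min(x1, x2, x3) <= 0:
        return 0.0
    return gamma(9) / (gamma(2) * gamma(3) * gamma(4)) * x1 * x2 ** 2 * x3 ** 3

def sweep(state):
    if isinstance(state, float):                     # currently in R
        x = state
        if random.random() < r1:                     # move (3): R -> R^2
            u = random.random()                      # g3(u) = 1 on (0,1)
            alpha = min(1.0, p2 * r2 * f_D(x, u) / (p1 * r1 * f_B(x)))
            return (x, u) if random.random() < alpha else x
        xp = x + random.uniform(-eps, eps)           # move (1): random walk
        alpha = min(1.0, f_B(xp) / f_B(x))
        return xp if random.random() < alpha else x
    x1, x2 = state                                   # currently in R^2
    if random.random() < r2:                         # move (3): R^2 -> R
        alpha = min(1.0, p1 * r1 * f_B(x1) / (p2 * r2 * f_D(x1, x2)))
        return x1 if random.random() < alpha else (x1, x2)
    alpha = min(1.0, f_D(x2, x1) / f_D(x1, x2))      # move (2): swap coordinates
    return (x2, x1) if random.random() < alpha else (x1, x2)

state, in_R = 0.5, 0
for _ in range(20_000):
    state = sweep(state)
    in_R += isinstance(state, float)

# In equilibrium the chain should spend about p1 of its time in R:
print(in_R / 20_000)
```

With p1 = p2, the fraction of sweeps spent in R should settle near one half, which is a simple check that the trans-dimensional move is balanced correctly.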
Examples
Toy example
[Figure slides: scatter plots of the sampled states on [0, 1] × [−0.2, 1.0] after 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500 and 1000 sweeps, gradually filling out both components of the target.]
Examples
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
x1 ~ Beta(2,3)
data
mod
el
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
x2[1] ~ Beta(2,7)
data
mod
el
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
x2[2] ~ Beta(3,6)
data
mod
el
Peter Green (Bristol) Computing posterior model probabilities Groningen, March 2011 48 / 69
Examples
Cyclones in Bay of Bengal
Multiple change-point analysis
A Bayesian approach to change-point analysis for point processes (y1, y2, ..., yn) combines:
– Poisson-process likelihood: p(y | x) = exp{∑_{i=1}^n log x(yi) − ∫_0^L x(t) dt}
– prior model for step function x(t), 0 ≤ t ≤ L, representing intensity.
Prior model: represent the step function by (k, {sj}_{j=1}^k, {hj}_{j=0}^k), with x(t) = ∑_j hj I[sj, sj+1)(t):
– number of steps k: Poisson(λ),
– step heights hj : Gamma(α, β),
– step positions sj : p(s | k) ∝ ∏_j (sj+1 − sj),
all independent.
MCMC for step functions
x = (k, {sj}_{j=1}^k, {hj}_{j=0}^k, α, β)
I will use four moves:
(a) Metropolis change to a randomly chosen step height hj.
(b) Metropolis change to a randomly chosen step position sj.
(c) Jump move: birth/death of steps
– birth: choose new step position s* at random, split current step height h into two: (h1, h2)
– death: choose step at random to kill, combine current step heights (h1, h2) into one: h
(d) Update hyperparameters α, β.
[Figure: birth and death of steps: a step of height h over an interval of width w containing s* is split into heights (h1, h2) over widths (w1, w2), preserving the weighted geometric mean h1^{w1} h2^{w2} = h^{w1+w2}; the mapping is (h, w, s*, u) ↔ (h1, h2, w1, w2).]
Example: cyclones hitting the Bay of Bengal
141 cyclones over a period of 100 years (a cyclone is a storm with winds ≥ … km h⁻¹).
[Figure: the cyclone event times plotted along a time axis from 0 to 100 years.]
A point process – of cyclone events – in time: modelled here as an inhomogeneous Poisson process, with piecewise constant intensity.
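A sketch of the Poisson-process log-likelihood for a step-function intensity, as used in this example; the change points, heights and event times below are invented for illustration:

```python
from math import log

def loglik(events, s, h, L):
    """log p(y | x) = sum_i log x(y_i) - int_0^L x(t) dt, for a step-function
    intensity with change points s = [s_1 < ... < s_k] and heights
    h = [h_0, ..., h_k] on [0, L]."""
    knots = [0.0] + list(s) + [L]
    # Integral term: sum of height * width over the steps.
    integral = sum(h[j] * (knots[j + 1] - knots[j]) for j in range(len(h)))
    # Intensity at a given time t.
    def x_at(t):
        for j in range(len(h)):
            if knots[j] <= t < knots[j + 1]:
                return h[j]
        return h[-1]
    return sum(log(x_at(y)) for y in events) - integral

# One change point at t = 50: intensity 2.0 before, 0.8 after (made-up data).
events = [1.0, 10.0, 30.0, 49.0, 60.0, 90.0]
print(round(loglik(events, s=[50.0], h=[2.0, 0.8], L=100.0), 3))
```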
Examples
Cyclones in Bay of Bengal
Illustrating the model-jumping move: splitting and merging steps.
Examples
Cyclones in Bay of Bengal
Choices of hyperparameters:
– Prior on k: Poisson(λ), with λ = ….
– Prior on hj : Gamma(α, β), with
– α = …
– β = … (n/L)
Sample of step functions from the posterior:
[Figure: sampled step-function intensities against time 0–100, intensity range 0–3.]
Posterior for the number of change points k:
[Figure: posterior probabilities for k = 0, 1, ..., 12, on a 0.0–0.20 probability scale.]
Zero change points is ruled out; k = … or … more probable than under the prior.
Posterior density estimates for change-point positions:
[Figure: density against time 0–100, on a 0.0–0.15 scale.]
Model-averaged estimate: E(x(·) | y)
[Figure: posterior mean intensity against time 0–100, range 0.0–2.5, with the cyclone event times marked along the axis.]
(The expectation of a random step function is not a step function.)
A small sample from the posterior
Examples
Cyclones in Bay of Bengal
Posterior of the number of steps
Examples
Cyclones in Bay of Bengal
An example of inference conditional on k: posterior densities of step positions
Examples
Cyclones in Bay of Bengal
An example of Bayesian model averaging: posterior expectation of the intensity
Alternatives to across-model sampling
Alternatives to joint model-parameter sampling
When marginal likelihoods are tractable, it’s usually a good idea to compute them (thus integrating out θk), then conduct search/sampling only over the model indicator k.
This direct approach has been taken a little further by Godsill (2001), who considers cases of ‘partial analytic structure’, where some of the parameters in θk may be integrated out, and the others left unchanged in the move that updates the model, to give an across-model sampler with probably superior performance.
Marginal likelihoods via within-model sampling: Nial’s talk!
Alternatives to across-model sampling
Some issues in choosing a sampler
How many models are there?
Do we want results across k, within each k, or for one k of interest?
Do we need the Evidence (marginal likelihood) values p(Y|k) absolutely, or only relatively?
Jumping between models as an aid to mixing (c.f. simulated tempering: mixing may be better in the 'other' model)
Are samplers for individual models already written and tested?
Are standard strategies like split/merge likely to work?
Trade-off between remembering and forgetting θk when leaving model k
Alternatives to across-model sampling
The different ways that models can interconnect
completely unrelated
nested in some irregular way
mixture models (nesting and exchangeability)
variable selection (factorial structure)
graphical models (possibly with constraints such as decomposability)
Methodological relatives
Methodological connections and extensions
Jump–diffusion (Grenander & Miller, 1994; Phillips & Smith, 1996)
Point process representations and samplers (Geyer & Møller, 1994; Stephens, 2000)
Product-space samplers (Carlin & Chib, 1995; Green & O'Hagan, 1998; Dellaportas et al., 2002)
Delayed rejection (Green & Mira, 2001)
Connections between reversible jump and continuous-time birth-and-death samplers (Cappé, Robert & Rydén, 2001)
Composite model space framework (Godsill, 2001)
Efficient construction of proposals (Brooks, Giudici & Roberts, 2003)
Automatic RJ sampler (Hastie, 2005)
Population RJMCMC and interacting SMC (Jasra et al., 2007/8)
Better Bayes factors
Bartolucci, Scaccia and Mira
The obvious estimator of p(k|Y) is the empirical frequency

p̂(k|Y) = #{t : k(t) = k} / N,

the proportion of the MCMC run spent in model k, but Bartolucci, Scaccia and Mira (Biometrika, 2006) showed that we can do better using one of several 'Rao-Blackwellised' estimators, exploiting the fact that the sampler moves into each model with a probability that is known at the time.
We can also use this in BMA, etc.
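The empirical-frequency estimator is simple bookkeeping over the sampled model indicators; a minimal sketch, with a hypothetical chain of visited models:

```python
from collections import Counter

def empirical_model_probs(k_chain):
    """Empirical estimator p_hat(k|Y) = #{t : k(t) = k} / N,
    the proportion of the run spent in each model k."""
    n = len(k_chain)
    return {k: count / n for k, count in Counter(k_chain).items()}

# hypothetical sequence of model indicators k(1), ..., k(8) from an RJMCMC run
chain = [1, 2, 2, 1, 2, 3, 2, 2]
probs = empirical_model_probs(chain)  # e.g. model 2 visited 5/8 of the time
```

The Rao-Blackwellised alternatives replace these 0/1 visit indicators with the known move probabilities, reducing variance.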
Better Bayes factors
Bartolucci, Scaccia and Mira
The basic idea of the simplest version.
Let α(t)kl be the Metropolis–Hastings acceptance ratio for moving from model k to model l on sweep t, or 0 if we are not in k at time t − 1 or do not propose such a move. Define

B̂kl = ( ∑t=1..N α(t)lk / nl ) / ( ∑t=1..N α(t)kl / nk ),

where nk = #{t : k(t) = k} is the number of visits to model k. This derives from Meng–Wong type bridge-sampling ideas, somewhat analogous to Chib–Jeliazkov.
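A sketch of the bookkeeping this estimator needs, assuming the acceptance ratios α(t)lk and α(t)kl are recorded per sweep (with zeros where no such move was proposed); the function name and the toy numbers are hypothetical:

```python
def bridge_bayes_factor(alpha_lk, alpha_kl, n_l, n_k):
    """Bartolucci-Scaccia-Mira bridge-type estimator B_hat_kl.

    alpha_lk[t]: MH acceptance ratio for a proposed move l -> k on sweep t
                 (0 if the chain was not in l, or no such move was proposed);
    alpha_kl[t]: likewise for k -> l;
    n_l, n_k:    numbers of sweeps the chain spent in models l and k.
    """
    return (sum(alpha_lk) / n_l) / (sum(alpha_kl) / n_k)

# toy, hypothetical recorded acceptance ratios over N = 5 sweeps
alpha_lk = [0.0, 0.8, 0.0, 0.5, 0.0]
alpha_kl = [0.2, 0.0, 0.1, 0.0, 0.0]
B_kl = bridge_bayes_factor(alpha_lk, alpha_kl, n_l=2, n_k=3)
```

Because the acceptance ratios are known exactly at proposal time, every sweep contributes information, not just the accepted jumps.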
Better Bayes factors
Bartolucci, Scaccia and Mira
[Figure: estimates of p(k = 2 | Y) plotted against sweep number (0–4000); vertical scale 0.56–0.64.]
Real Bayes
In reality, it’s not that easy...
Prior model probabilities may be fictional
The ideal Bayesian (or her/his scientist colleague) has real prior probabilities reflecting scientific judgement/belief across the model space; not very common in practice!
Arbitrariness in prior model probabilities may not affect Bayes factors, but it sabotages Bayesian model averaging!
Real Bayes
In reality, it’s not that easy...
No chance of passing the test of a sensitivity analysis
In ordinary parametric problems we commonly find that inferences are rather insensitive to moderately large variations in prior assumptions, except when there are very few data (indeed, the opposite case, of high sensitivity, poses a challenge to the non-Bayesian – perhaps the data carry less information than hoped?). But it's clear that a test of sensitivity to model probabilities will always fail:

p⋆(k|Y) / p⋆(l|Y) = ( p(k|Y) / p(l|Y) ) × ( (p⋆(k)/p⋆(l)) ÷ (p(k)/p(l)) )
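A quick numerical check of this identity: replacing the prior model probabilities moves the posterior odds by exactly the ratio of prior odds, so any sensitivity test is guaranteed to show sensitivity. The Bayes factor value below is hypothetical.

```python
def posterior_odds(bayes_factor_kl, prior_k, prior_l):
    """Posterior odds p(k|Y)/p(l|Y) = B_kl x prior odds p(k)/p(l)."""
    return bayes_factor_kl * prior_k / prior_l

B_kl = 4.0                               # hypothetical Bayes factor for k vs l
odds_a = posterior_odds(B_kl, 0.5, 0.5)  # one choice of prior model probs
odds_b = posterior_odds(B_kl, 0.9, 0.1)  # another, equally arbitrary choice

# the posterior odds change by exactly the ratio of the prior odds:
ratio = odds_b / odds_a                  # (0.9/0.1) / (0.5/0.5) = 9
```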
Real Bayes
In reality, it’s not that easy...
Problems with improper parameter priors
In ordinary parametric problems it is commonly true that it is safe to use improper priors – when posterior distributions are well-defined as limits based on a sequence of approximating proper priors (and not usually sensitive to what that sequence is).
But improper parameter priors make Bayes factors indeterminate (since improper priors can only be defined up to arbitrary normalising constants, which persist into marginal likelihoods).
And proper but vague/diffuse priors fail to solve the problem, since the Bayes factors will depend on the arbitrary degree of vagueness used.
Real Bayes
In reality, it’s not that easy...
Problems with improper parameter priors, continued
In certain circumstances, ideas such as Intrinsic or Fractional Bayes factors, or Expected Posterior priors, can be applied, essentially based on tying together improper priors across different models. These ideas lose much of the appeal of ideal Bayes arguments, have arbitrary aspects, and are not widely accepted.
Real Bayes
Model uncertainty? Yes, but do we have to choose?
Model uncertainty is a fact of life.
When can we quantify it?
When can we eliminate it?
When can we accommodate it?

Why choose?
Prediction
Scientific understanding
Presentation
Policy
Defence
Real Bayes
An illusion of unity?
Is 'model' too much of a catch-all?
different scientific mechanisms
selection of predictors in regression
number of components in mixture
order of AR model
complexity of polynomial or spline
Real Bayes
All the other criteria
Bayesian hypothesis testing, and 'alternative-prior carpentry'
AIC, BIC, DIC, DIC+, MDL, Cp
Decision theory
Bayesian p-values
Posterior predictive checks
Acknowledgements
Thanks to all my collaborators on trans-dimensional MCMC problems: Carmen Fernández, Paolo Giudici, Miles Harkness, David Hastie, Matthew Hodgson, Antonietta Mira, Agostino Nobile, Marco Pievatolo, Sylvia Richardson, Luisa Scaccia and Claudia Tarantola.
References and preprints available from
http://www.stats.bris.ac.uk/~peter/Research.html
http://www.stats.bris.ac.uk/~peter/papers/
[email protected]

© University of Bristol, 2011