Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology with Population Monte Carlo. Monthly Notices Royal Astronomical Soc. 405 (4), 2381 - 2390, 2010. Christian P. Robert, Université Paris Dauphine & CREST, http://www.ceremade.dauphine.fr/~xian. Joint works with D., Benabed K., Cappé O., Cardoso J.F., Fort G., Kilbinger M., [Marin J.-M., Mira A.,] Prunet S., Wraith D.


Talk at JSM 2010, Vancouver, B.C.

Transcript of Bayesian model choice in cosmology

Page 1: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Bayesian Model Comparison in Cosmology with Population Monte Carlo

Monthly Notices Royal Astronomical Soc. 405 (4), 2381 - 2390, 2010

Christian P. Robert

Université Paris Dauphine & CREST
http://www.ceremade.dauphine.fr/~xian

Joint works with D., Benabed K., Cappé O., Cardoso J.F., Fort G., Kilbinger M., [Marin J.-M., Mira A.,] Prunet S., Wraith D.

Page 2: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Outline

1 Cosmology background

2 Importance sampling

3 Application to cosmological data

4 Evidence approximation

5 Cosmology models

6 lexicon

Page 3: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology background

Cosmology

A large part of the data used to answer some of the major questions in cosmology comes from studying the Cosmic Microwave Background (CMB) radiation (fossil heat released circa 380,000 years after the Big Bang).

The CMB is hugely uniform. Only very sensitive instruments such as WMAP (NASA, 2001) can detect fluctuations in the CMB temperature, e.g. minute temperature variations: one part of the sky has a temperature of 2.7251 Kelvin (degrees above absolute zero), while another part of the sky has a temperature of 2.7249 Kelvin.

Page 4: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology background

Cosmology

[Figure: CMB data; plot content not recoverable from the transcript]

[Marin & CPR, Bayesian Core, 2007]

Page 5: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology background

Planck

Temperature variations are related to fluctuations in the density of matter in the early universe and thus carry information about the initial conditions for the formation of cosmic structures such as galaxies, clusters, and voids.

Planck: joint mission between the European Space Agency (ESA) and NASA, launched in May 2009. The Planck mission plans to provide datasets of nearly 5 × 10^10 observations to settle many open questions with CMB temperature data. Rather than scalar-valued observations, Planck will provide tensor-valued data and is thus likely to open up this area of statistical research as well.

Page 8: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology background

Some questions in cosmology

Will the universe expand forever, or will it collapse?

Is the universe dominated by exotic dark matter and what is its concentration?

What is the shape of the universe?

Is the expansion of the universe accelerating rather than decelerating?

Is the “flat ΛCDM paradigm” appropriate or is the curvature different from zero?

[Adams, The Guide [a.k.a. H2G2], 1979]

Page 9: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology background

Statistical problems in cosmology

Potentially high dimensional parameter space [Not considered here]

Immensely slow computation of likelihoods, e.g. WMAP, CMB, because of numerically costly spectral transforms [Data is a Fortran program]

Nonlinear dependence and degeneracies between parameters introduced by physical constraints or theoretical assumptions

[Figure: two panels illustrating parameter degeneracies, one in the (Ωm, w0) plane and one in the (−M, α) plane]

Page 10: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Importance sampling solutions

1 Cosmology background

2 Importance sampling
    Adaptive importance sampling
    Adaptive multiple importance sampling

3 Application to cosmological data

4 Evidence approximation

5 Cosmology models

6 lexicon

Page 11: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Importance sampling 101

Importance sampling is based on the fundamental identity

$$\pi(f) = \int f(x)\,\pi(x)\,dx = \int f(x)\,\frac{\pi(x)}{q(x)}\,q(x)\,dx$$

If $x_1, \ldots, x_N$ are drawn independently from q,

$$\hat{\pi}(f) = \sum_{n=1}^{N} f(x_n)\,w_n; \qquad w_n = \frac{\pi(x_n)/q(x_n)}{\sum_{m=1}^{N} \pi(x_m)/q(x_m)},$$

provides a converging approximation to π(f) (independent of the normalisation of π).
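The self-normalised estimator above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `self_normalised_is`, the unnormalised N(1, 1) target and the N(0, 2²) proposal are hypothetical choices, not taken from the talk.

```python
import numpy as np

def self_normalised_is(f, log_target, sample_q, log_q, n, rng):
    """Estimate pi(f) from n draws of q, with pi known up to a constant."""
    x = sample_q(n, rng)                      # x_1, ..., x_N ~ q
    log_w = log_target(x) - log_q(x)          # log of pi(x_n) / q(x_n)
    w = np.exp(log_w - log_w.max())           # subtract the max for numerical stability
    w /= w.sum()                              # normalised weights w_n
    return np.sum(w * f(x)), w

# Hypothetical example: unnormalised N(1, 1) target, N(0, 2^2) proposal.
rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * (x - 1.0) ** 2                      # constant omitted on purpose
log_q = lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
sample_q = lambda n, rng: 2.0 * rng.standard_normal(n)

# Estimate of E[x] under the target (true value 1).
estimate, weights = self_normalised_is(lambda x: x, log_target, sample_q, log_q, 50_000, rng)
```

Working with log-weights and subtracting the maximum before exponentiating leaves the normalised weights unchanged, which is why the missing normalising constant of π does not matter.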

Page 12: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Initialising importance sampling

PMC/AIS offers a solution to the difficulty of picking q through adaptivity.

Given a target π, PMC produces a sequence $q_t$ of importance functions (t = 1, . . . , T) aimed at approximating π.

The first sample is produced by a regular importance sampling scheme, $x^1_1, \ldots, x^1_N \sim q_1$, associated with importance weights

$$w^1_n = \frac{\pi(x^1_n)}{q_1(x^1_n)}$$

and their normalised counterparts $\bar{w}^1_n$, providing a first approximation to a sample from π.

Moments of π can then be approximated to construct an updated importance function $q_2$, etc.

Page 13: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Adaptive importance sampling

Optimality criterion?

The quality of approximation can be measured in terms of the Kullback divergence from the target,

$$D(\pi \| q_t) = \int \log\left(\frac{\pi(x)}{q_t(x)}\right) \pi(x)\,dx,$$

and the density $q_t$ can be adjusted incrementally to minimize this divergence.

Page 14: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

PMC – Some papers

Cappé et al (2004) - J. Comput. Graph. Stat.
Outline of Population Monte Carlo but missed main point

Celeux et al (2005) - Comput. Stat. & Data Analysis
Rao-Blackwellisation for importance sampling and missing data problems

Douc et al (2007) - ESAIM Prob. & Stat. and Annals of Statistics
Convergence issues proving adaptation is positive where q is a mixture density of random-walk proposals (mixture weights varied)

Cappé et al (2007) - Stat. & Computing
Adaptation of q (mixture density of independent proposals), where weights and parameters vary

Wraith et al (2009) - Physical Review D
Application of Cappé et al (2007) to cosmology and comparison with MCMC

Beaumont et al (2009) - Biometrika
Application of Cappé et al (2007) to ABC settings

Kilbinger et al (2010) - Month. N. Royal Astro. Soc.
Application of Cappé et al (2007) to model choice in cosmology

Page 15: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Adaptive importance sampling (2)

Use of mixture densities

$$q_t(x) = q(x; \alpha^t, \theta^t) = \sum_{d=1}^{D} \alpha^t_d\,\varphi(x; \theta^t_d)$$

[West, 1993]

where

$\alpha^t = (\alpha^t_1, \ldots, \alpha^t_D)$ is a vector of adaptable weights for the D mixture components

$\theta^t = (\theta^t_1, \ldots, \theta^t_D)$ is a vector of parameters which specify the components

φ is a parameterised density (usually taken to be multivariate Gaussian or Student-t, the latter preferred)

Page 16: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Cappe et al (2007) optimal scheme

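Update $q_t$ using an integrated EM approach minimising the KL divergence at each iteration

$$D(\pi \| q_t) = \int \log\left(\frac{\pi(x)}{\sum_{d=1}^{D} \alpha^t_d\,\varphi(x; \theta^t_d)}\right) \pi(x)\,dx,$$

equivalent to maximising

$$\ell(\alpha, \theta) = \int \log\left(\sum_{d=1}^{D} \alpha_d\,\varphi(x; \theta_d)\right) \pi(x)\,dx$$

in α, θ.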
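The equivalence follows from splitting the logarithm. A short derivation, written for a normalised target π (an assumption made here so that the first term is a finite constant):

```latex
\begin{aligned}
D(\pi \| q) &= \int \log \pi(x)\,\pi(x)\,dx
              - \int \log\Big(\sum_{d=1}^{D} \alpha_d\,\varphi(x;\theta_d)\Big)\,\pi(x)\,dx \\
            &= \text{const} - \ell(\alpha, \theta).
\end{aligned}
```

The first integral does not involve (α, θ), so minimising D over (α, θ) amounts to maximising ℓ.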

Page 17: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

PMC updates

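Maximization of $L^t(\alpha, \theta)$ leads to closed-form solutions in exponential families (and for the t distributions).

For instance, for $N_p(\mu_d, \Sigma_d)$:

$$\alpha^{t+1}_d = \int \rho_d(x; \alpha^t, \mu^t, \Sigma^t)\,\pi(x)\,dx,$$

$$\mu^{t+1}_d = \frac{\int x\,\rho_d(x; \alpha^t, \mu^t, \Sigma^t)\,\pi(x)\,dx}{\alpha^{t+1}_d},$$

$$\Sigma^{t+1}_d = \frac{\int (x - \mu^{t+1}_d)(x - \mu^{t+1}_d)^T\,\rho_d(x; \alpha^t, \mu^t, \Sigma^t)\,\pi(x)\,dx}{\alpha^{t+1}_d},$$

where $\rho_d(x; \alpha, \mu, \Sigma) = \alpha_d\,\varphi(x; \mu_d, \Sigma_d) \big/ \sum_{d'=1}^{D} \alpha_{d'}\,\varphi(x; \mu_{d'}, \Sigma_{d'})$ is the probability that x belongs to component d of the mixture.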

Page 18: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Empirical updates

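And empirical versions, with $\bar{w}^t_n$ the normalised importance weights,

$$\alpha^{t+1}_d = \sum_{n=1}^{N} \bar{w}^t_n\,\rho_d(x^t_n; \alpha^t, \mu^t, \Sigma^t)$$

$$\mu^{t+1}_d = \frac{\sum_{n=1}^{N} \bar{w}^t_n\,x^t_n\,\rho_d(x^t_n; \alpha^t, \mu^t, \Sigma^t)}{\alpha^{t+1}_d}$$

$$\Sigma^{t+1}_d = \frac{\sum_{n=1}^{N} \bar{w}^t_n\,(x^t_n - \mu^{t+1}_d)(x^t_n - \mu^{t+1}_d)^T\,\rho_d(x^t_n; \alpha^t, \mu^t, \Sigma^t)}{\alpha^{t+1}_d}$$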
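One pass of these empirical updates for a Gaussian mixture can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `gauss_logpdf` and `pmc_update` are hypothetical, ρ_d is taken to be the usual component-membership probability, and no safeguard is included for a component whose updated weight collapses to zero.

```python
import numpy as np

def gauss_logpdf(x, mu, cov):
    """Log density of N_p(mu, cov) at each row of x (shape (N, p))."""
    p = mu.size
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mu).T)              # whitened residuals, shape (p, N)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(z ** 2, axis=0) + log_det + p * np.log(2.0 * np.pi))

def pmc_update(x, w_bar, alpha, mu, cov):
    """One empirical update of (alpha, mu, Sigma).

    x: samples (N, p); w_bar: normalised weights (N,);
    alpha: (D,), mu: (D, p), cov: (D, p, p) are the current mixture parameters.
    """
    D = alpha.size
    log_comp = np.stack([np.log(alpha[d]) + gauss_logpdf(x, mu[d], cov[d])
                         for d in range(D)], axis=1)   # (N, D)
    log_comp -= log_comp.max(axis=1, keepdims=True)
    rho = np.exp(log_comp)
    rho /= rho.sum(axis=1, keepdims=True)              # rho_d(x_n); each row sums to 1
    wr = w_bar[:, None] * rho                          # w_bar_n * rho_d(x_n)
    new_alpha = wr.sum(axis=0)
    new_mu = (wr.T @ x) / new_alpha[:, None]
    new_cov = np.empty_like(cov)
    for d in range(D):
        r = x - new_mu[d]
        new_cov[d] = (wr[:, d, None] * r).T @ r / new_alpha[d]
    return new_alpha, new_mu, new_cov
```

With a single component (D = 1) the update reduces to the weighted mean and weighted covariance of the sample, which gives a quick sanity check.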

Page 19: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Banana benchmark

Twisted $N_p(0, \Sigma)$ target with $\Sigma = \mathrm{diag}(\sigma_1^2, 1, \ldots, 1)$, changing the second co-ordinate $x_2$ to $x_2 + b(x_1^2 - \sigma_1^2)$

[Figure: banana-shaped target in the ($x_1$, $x_2$) plane]

$p = 10$, $\sigma_1^2 = 100$, $b = 0.03$

[Haario et al. 1999]
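The benchmark is easy to code, which is part of its appeal. The sketch below assumes the usual reading of the twist (the Gaussian density is evaluated at the twisted point, and the change of variable has unit Jacobian); the function names are hypothetical.

```python
import numpy as np

def banana_logpdf(x, sigma1_sq=100.0, b=0.03):
    """Unnormalised log density of the twisted N_p(0, diag(sigma1^2, 1, ..., 1))."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = x.copy()
    y[:, 1] = x[:, 1] + b * (x[:, 0] ** 2 - sigma1_sq)   # twist the second coordinate
    return -0.5 * (y[:, 0] ** 2 / sigma1_sq + np.sum(y[:, 1:] ** 2, axis=1))

def banana_sample(n, p=10, sigma1_sq=100.0, b=0.03, rng=None):
    """Exact draws: sample the untwisted Gaussian, then invert the twist."""
    rng = np.random.default_rng() if rng is None else rng
    y = rng.standard_normal((n, p))
    y[:, 0] *= np.sqrt(sigma1_sq)
    x = y.copy()
    x[:, 1] = y[:, 1] - b * (y[:, 0] ** 2 - sigma1_sq)
    return x
```

Under this reading, E(x2) = 0 and var(x2) = 1 + 2 b² σ1⁴, which equals 19 for b = 0.03 and σ1² = 100. Exact draws of this kind are only useful for checking a sampler; PMC itself only needs `banana_logpdf`.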

Page 20: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Simulation

[Figure: six panels of simulation output on the banana benchmark; horizontal axes span −40 to 40, vertical axes about −40 to 20]

Page 21: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Monitoring by perplexity

Stop iterations when further adaptations do not improve $D(\pi \| q_t)$.

The transform $\exp[-D(\pi \| q_t)]$ may be estimated by the normalised perplexity $p = \exp(H^t_N)/N$, where

$$H^t_N = -\sum_{n=1}^{N} \bar{w}^t_n \log \bar{w}^t_n$$

is the Shannon entropy of the normalised weights $\bar{w}^t_n$.

Thus, minimization of the Kullback divergence can be approximately connected with the maximization of the normalised perplexity (values closer to 1 indicating good agreement between q and π).
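A minimal sketch of the normalised perplexity, computed from log-weights for numerical stability. The function name is hypothetical, and the convention 0 · log 0 = 0 is an assumption made here to handle weights that underflow.

```python
import numpy as np

def normalised_perplexity(log_w):
    """exp(H)/N, with H the Shannon entropy of the normalised weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())
    w_bar = w / w.sum()
    nz = w_bar > 0                                   # convention: 0 * log 0 = 0
    entropy = -np.sum(w_bar[nz] * np.log(w_bar[nz]))
    return np.exp(entropy) / log_w.size
```

Equal weights give a value of 1, while a single dominant weight among N gives 1/N.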

Page 22: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Monitoring by ESS

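A second criterion is the effective sample size (ESS)

$$\mathrm{ESS}^t_N = \left(\sum_{n=1}^{N} \left(\bar{w}^t_n\right)^2\right)^{-1}$$

with $\bar{w}^t_n$ the normalised weights, which can be interpreted as the number of equivalent iid sample points.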
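The ESS is equally short to compute; the sketch below (hypothetical function name) again starts from log-weights.

```python
import numpy as np

def effective_sample_size(log_w):
    """Inverse of the sum of squared normalised importance weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())
    w_bar = w / w.sum()
    return 1.0 / np.sum(w_bar ** 2)
```

It ranges from 1 (one particle carries all the weight) to N (equal weights); dividing by N gives the normalised ESS.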

Page 23: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Simulation

[Figure: NPERPL (top panel) and NESS (bottom panel) against iteration, 1 to 10]

Normalised perplexity (top panel) and normalised effective sample size (ESS/N) (bottom panel) estimates for the first 10 iterations of PMC

Page 24: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Comparison to MCMC

Adaptive MCMC: the proposal is a multivariate Gaussian with Σ updated based on previous values in the chain. Scale and update times chosen for optimal results.

[Figure: four panels, fa (top) and fb (bottom), for PMC (left) and MCMC (right)]

Evolution of π(fa) (top panels) and π(fb) (bottom panels) from 10k points to 100k points for both PMC (left panels) and MCMC (right panels).

Page 25: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive importance sampling

Simulation

[Figure: proportion of points inside, for d10, d2 and d1, PMC against MCMC; further panels fc, fe, fh and fd, fg, fi, each comparing PMC with MCMC]

Results showing the distributions of the PMC and the MCMC estimates. All estimates are based on 500 simulation runs.

Page 26: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Adaptive multiple importance sampling

Full recycling:

At iteration t, design a new proposal $q_t$ based on all previous samples

$$x^1_1, \ldots, x^1_N, \ldots, x^{t-1}_1, \ldots, x^{t-1}_N$$

At each stage, the whole past can be used: if un-normalised weights $\omega_{i,t}$ are preserved along iterations, then all $x^t_i$'s can be pooled together

Page 28: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Caveat

When using several importance functions at once, $q_0, \ldots, q_T$, with samples $x^0_1, \ldots, x^0_{N_0}, \ldots, x^T_1, \ldots, x^T_{N_T}$ and importance weights $\omega^t_i = \pi(x^t_i)/q_t(x^t_i)$, merging through the empirical distribution

$$\sum_{t,i} \omega^t_i\,\delta_{x^t_i}(x) \Big/ \sum_{t,i} \omega^t_i \approx \pi(x)$$

fails to cull poor proposals: very large weights do remain large in the cumulated sample and poorly performing samples overwhelmingly dominate other samples in the final outcome.

Raw mixing of importance samples may be harmful, compared with a single sample, even when most proposals are efficient.

Page 30: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Deterministic mixtures

Owen and Zhou (2000) propose a stabilising recycling of the weights via deterministic mixtures by modifying the importance density $q_t(x^t_i)$ under which $x^t_i$ was truly simulated to a mixture of all the densities that have been used so far

$$\frac{1}{\sum_{j=0}^{T} N_j} \sum_{l=0}^{T} N_l\,q_l(x^t_i),$$

resulting in the deterministic mixture weight

$$\omega^t_i = \pi(x^t_i) \Big/ \frac{1}{\sum_{j=0}^{T} N_j} \sum_{l=0}^{T} N_l\,q_l(x^t_i).$$
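A minimal sketch of the deterministic mixture weights, assuming each proposal is available as a log-density callable; the function name and argument layout are hypothetical.

```python
import numpy as np

def deterministic_mixture_weights(log_target, samples, log_qs):
    """Weights pi(x) / [(1 / sum_j N_j) * sum_l N_l q_l(x)] for every pooled sample.

    samples: list of arrays, samples[t] drawn from proposal t.
    log_qs:  list of callables, log_qs[t](x) returning log q_t(x).
    Returns one weight array per sample, in the same order.
    """
    sizes = np.array([len(s) for s in samples], dtype=float)
    total = sizes.sum()
    weights = []
    for x in samples:
        # log of N_l q_l(x) for every proposal l, evaluated at every point of x
        log_terms = np.stack([np.log(n_l) + lq(x) for n_l, lq in zip(sizes, log_qs)])
        m = log_terms.max(axis=0)
        log_mix = m + np.log(np.exp(log_terms - m).sum(axis=0)) - np.log(total)
        weights.append(np.exp(log_target(x) - log_mix))
    return weights
```

Because the denominator is at least (N_l / Σ_j N_j) q_l(x) for every l, a single well-matched proposal in the sequence is enough to keep all weights bounded, which is the stabilising effect described above.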

Page 31: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Unbiasedness

Potential to exploit the most efficient proposals in the sequence $Q_0, \ldots, Q_T$ without rejecting any simulated value nor sample.

Poorly performing importance functions are simply eliminated through the erosion of their weights

$$\pi(x^t_i) \Big/ \frac{1}{\sum_{j=0}^{T} N_j} \sum_{l=0}^{T} N_l\,q_l(x^t_i)$$

as T increases.

Paradoxical feature of competing acceptable importance weights for the same simulated value, well understood in the cases of Rao-Blackwellisation and of Population Monte Carlo. More intricate here in that only unbiasedness remains [fake mixture]

Page 33: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

AMIS

AMIS (or Adaptive Multiple Importance Sampling) uses importance sampling functions ($q_t$) that are constructed sequentially and adaptively, using the past t − 1 weighted samples.

i the weights of all present and past variables $x^l_i$ ($1 \le l \le t$, $1 \le i \le N_l$) are modified, based on the current proposals

ii the entire collection of importance samples is used to build the next importance function.

[Parallel with IMIS: Raftery & Bao, 2010]

Page 34: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

The AMIS algorithm

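Adaptive Multiple Importance Sampling

At iteration t = 1, . . . , T

1) Independently generate $N_t$ particles $x^t_i \sim q(x|\theta^{t-1})$

2) For $1 \le i \le N_t$, compute the mixture at $x^t_i$

$$\delta^t_i = N_0\,q_0(x^t_i) + \sum_{l=1}^{t} N_l\,q(x^t_i; \theta^{l-1})$$

and derive the weight of $x^t_i$,

$$\omega^t_i = \pi(x^t_i) \Big/ \Big[\delta^t_i \Big/ \Big(N_0 + \sum_{j=1}^{t} N_j\Big)\Big].$$

3) For $0 \le l \le t-1$ and $1 \le i \le N_l$, update past weights as

$$\delta^l_i = \delta^l_i + N_t\,q(x^l_i; \theta^{t-1}) \quad \text{and} \quad \omega^l_i = \pi(x^l_i) \Big/ \Big[\delta^l_i \Big/ \Big(N_0 + \sum_{j=1}^{t} N_j\Big)\Big].$$

4) Compute the parameter estimate $\theta^t$ based on

$$(x^0_1, \omega^0_1, \ldots, x^0_{N_0}, \omega^0_{N_0}, \ldots, x^t_1, \omega^t_1, \ldots, x^t_{N_t}, \omega^t_{N_t})$$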

[Cornuet, Marin, Mira & CPR, 2009, arXiv:0907.1254]
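The four steps can be sketched compactly in NumPy. This is an illustration under loud assumptions rather than the authors' code: the proposal family is a single Gaussian (not the Student's t used in the talk), all iterations use the same sample size n, θ is updated by weighted moment matching, and densities are exponentiated directly, which is only reasonable in low dimension. The names `gauss_logpdf` and `amis` are hypothetical.

```python
import numpy as np

def gauss_logpdf(x, mu, cov):
    """Log density of N_p(mu, cov) at each row of x."""
    p = mu.size
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mu).T)
    return (-0.5 * (np.sum(z ** 2, axis=0) + p * np.log(2.0 * np.pi))
            - np.sum(np.log(np.diag(chol))))

def amis(log_target, mu0, cov0, n, n_iter, rng):
    """AMIS sketch with a Gaussian proposal family and equal sample sizes."""
    mu, cov = np.asarray(mu0, float), np.asarray(cov0, float)
    xs = rng.multivariate_normal(mu, cov, size=n)        # initial sample from q_0
    log_pi = log_target(xs)
    delta = n * np.exp(gauss_logpdf(xs, mu, cov))        # N_0 q_0(x_i^0)
    thetas = [(mu, cov)]                                 # q_0, then theta^0, theta^1, ...
    for t in range(1, n_iter + 1):
        # step 4 of the previous round: moment-matching estimate theta^{t-1}
        w = np.exp(log_pi) / delta
        w_bar = w / w.sum()
        mu = w_bar @ xs
        r = xs - mu
        cov = (w_bar[:, None] * r).T @ r
        thetas.append((mu, cov))
        # step 1: new particles from q(. | theta^{t-1})
        new = rng.multivariate_normal(mu, cov, size=n)
        # step 2: mixture of all proposals used so far, at the new particles
        new_delta = sum(n * np.exp(gauss_logpdf(new, m, c)) for m, c in thetas)
        # step 3: add the newest proposal to the mixture of the past particles
        delta = delta + n * np.exp(gauss_logpdf(xs, mu, cov))
        xs = np.vstack([xs, new])
        log_pi = np.concatenate([log_pi, log_target(new)])
        delta = np.concatenate([delta, new_delta])
    total = n * (n_iter + 1)                             # sum of all sample sizes
    weights = np.exp(log_pi) / (delta / total)
    return xs, weights
```

All particles are returned with their final deterministic-mixture weights, so the whole history can be reused for estimation, as in the full-recycling idea above.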

Page 35: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Studentised AMIS

When the proposal distribution $q_t$ is a Student's t proposal,

$$T_3(\mu, \Sigma)$$

mean µ and covariance Σ parameters can be updated by estimating the first two moments of the target distribution Π

$$\mu^t = \frac{\sum_{l=0}^{t} \sum_{i=1}^{N_l} \omega^l_i\,x^l_i}{\sum_{l=0}^{t} \sum_{i=1}^{N_l} \omega^l_i} \quad \text{and} \quad \Sigma^t = \frac{\sum_{l=0}^{t} \sum_{i=1}^{N_l} \omega^l_i\,(x^l_i - \mu^t)(x^l_i - \mu^t)^T}{\sum_{l=0}^{t} \sum_{i=1}^{N_l} \omega^l_i},$$

i.e. using the optimal update of Cappé et al. (2007)

Obvious extension to mixtures [and again optimal update of Cappé et al. (2007)]
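The two moment estimates amount to a weighted mean and a weighted covariance over the pooled particles. A short sketch, with a hypothetical function name and the pooled samples stacked row-wise:

```python
import numpy as np

def weighted_moments(x, w):
    """Weighted mean and covariance of the rows of x, for unnormalised weights w."""
    w_bar = np.asarray(w, dtype=float) / np.sum(w)
    mu = w_bar @ x
    r = x - mu
    sigma = (w_bar[:, None] * r).T @ r
    return mu, sigma
```

Only the ratios of the weights matter here, so the normalising constant of the target is not needed.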

Page 37: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Simulations

Same banana benchmark

Target function              p     AMIS      Cappé'07
E(x1) = 0                    5     0.06558   0.06879
                             10    0.06388   0.11051
                             20    0.09167   0.17912
E(x2) = 0                    5     0.10215   0.11583
                             10    0.21421   0.22557
                             20    0.25316   0.29087
Σ_{i=3}^{5} E(xi) = 0        5     0.00478   0.00927
Σ_{i=3}^{10} E(xi) = 0       10    0.00902   0.02099
Σ_{i=3}^{20} E(xi) = 0       20    0.01666   0.04208
var(x1) = 100                5     2.60672   3.92650
                             10    7.06686   7.48877
                             20    8.20020   9.71725
var(x2) = 19                 5     2.10682   2.96132
                             10    3.76660   5.08474
                             20    4.85407   5.98031
Σ_{i=3}^{5} var(xi) = 3      5     0.00645   0.01196
Σ_{i=3}^{10} var(xi) = 8     10    0.01370   0.02636
Σ_{i=3}^{20} var(xi) = 18    20    0.04609   0.06424

Root mean square errors calculated over 10 replications for different target functions and dimensions p.

Page 38: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Simulation (cont’d)

10 replicate ESSs for AMIS (left) and PMC (right) for p = 5, 10, 20.

Page 39: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Simulation (cont’d)

10 replicate absolute errors associated to the estimations of E(x1) (left column), E(x2) (center column) and $\sum_{i=3}^{p} E(x_i)$ (right column) using AMIS (left in each block) and PMC (right) for p = 5, 10, 20.

Page 40: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Importance sampling

Adaptive multiple importance sampling

Simulation (cont’d)

10 replicate absolute errors associated to the estimations of var(x1) (left column), var(x2) (center column) and $\sum_{i=3}^{p} \mathrm{var}(x_i)$ (right column) using AMIS (left in each block) and PMC (right) for p = 5, 10, 20.

Page 41: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Application to cosmological data

Cosmological data

Posterior distribution of cosmological parameters for recent observational data of CMB anisotropies (differences in temperature between directions) [WMAP], SNIa, and cosmic shear.

Combination of three likelihoods, some of which are available as public (Fortran) code, and of a uniform prior on a hypercube.

Page 42: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Application to cosmological data

Cosmology parameters

Parameters for the cosmology likelihood (C=CMB, S=SNIa, L=lensing)

Symbol    Description                     Minimum   Maximum   Experiment
Ωb        Baryon density                  0.01      0.1       C L
Ωm        Total matter density            0.01      1.2       C S L
w         Dark-energy eq. of state        -3.0      0.5       C S L
ns        Primordial spectral index       0.7       1.4       C L
∆²R       Normalization (large scales)                        C
σ8        Normalization (small scales)                        C L
h         Hubble constant                                     C L
τ         Optical depth                                       C
M         Absolute SNIa magnitude                             S
α         Colour response                                     S
β         Stretch response                                    S
a, b, c   Galaxy z-distribution fit                           L

For WMAP5, σ8 is a deduced quantity that depends on the other parameters

Page 43: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Application to cosmological data

Adaptation of importance function

Page 44: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Application to cosmological data

Estimates

Parameter   PMC                        MCMC
Ωb          0.0432 (+0.0027/−0.0024)   0.0432 (+0.0026/−0.0023)
Ωm          0.254 (+0.018/−0.017)      0.253 (+0.018/−0.016)
τ           0.088 (+0.018/−0.016)      0.088 (+0.019/−0.015)
w           −1.011 ± 0.060             −1.010 (+0.059/−0.060)
ns          0.963 (+0.015/−0.014)      0.963 (+0.015/−0.014)
10^9 ∆²R    2.413 (+0.098/−0.093)      2.414 (+0.098/−0.092)
h           0.720 (+0.022/−0.021)      0.720 (+0.023/−0.021)
a           0.648 (+0.040/−0.041)      0.649 (+0.043/−0.042)
b           9.3 (+1.4/−0.9)            9.3 (+1.7/−0.9)
c           0.639 (+0.084/−0.070)      0.639 (+0.082/−0.070)
−M          19.331 ± 0.030             19.332 (+0.029/−0.031)
α           1.61 (+0.15/−0.14)         1.62 (+0.16/−0.14)
−β          −1.82 (+0.17/−0.16)        −1.82 ± 0.16
σ8          0.795 (+0.028/−0.030)      0.795 (+0.030/−0.027)

Means and 68% credible intervals using lensing, SNIa and CMB

Page 45: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Application to cosmological data

Advantage of AIS and PMC?

Parallelisation of the posterior calculations
- For the cosmological examples, we used up to 100 CPUs on a computer cluster to explore the cosmology posteriors using AIS/PMC, reducing the computational time from several days for MCMC to a few hours using PMC.

Low variance of Monte Carlo estimates
- For PMC and q closely matched to π, significant reductions in the variance of the Monte Carlo estimates are possible compared to estimates using MCMC. This also translates into a computational saving, with further savings possible by combining samples across iterations.

Simple diagnostics of ‘convergence’ (perplexity)
- For PMC, the perplexity provides a relatively simple measure of sampling adequacy to the target density of interest.

Page 46: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Evidence approximation

Evidence/Marginal likelihood/Integrated Likelihood ...

Central quantity of interest in (Bayesian) model choice

$$E = \int \pi(x)\,dx = \int \frac{\pi(x)}{q(x)}\,q(x)\,dx,$$

expressed as an expectation under any density q with large enough support.

Importance sampling provides a sample $x_1, \ldots, x_N \sim q$ and an approximation of the above integral,

$$E \approx \frac{1}{N} \sum_{n=1}^{N} w_n$$

where the $w_n = \pi(x_n)/q(x_n)$ are the (unnormalised) importance weights.
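A minimal sketch of this evidence estimate, averaging the unnormalised weights on the log scale. The function name and the toy example (target exp(−x²/2), whose integral is √(2π), with a N(0, 2²) proposal) are hypothetical.

```python
import numpy as np

def log_evidence(log_target, sample_q, log_q, n, rng):
    """Log of the average of the unnormalised importance weights."""
    x = sample_q(n, rng)
    log_w = log_target(x) - log_q(x)                 # log of pi(x_n) / q(x_n)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))    # stable log-mean-exp

# Hypothetical check: the true log-evidence is 0.5 * log(2 * pi).
rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * x ** 2
log_q = lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
sample_q = lambda n, rng: 2.0 * rng.standard_normal(n)
log_E = log_evidence(log_target, sample_q, log_q, 50_000, rng)
```

Unlike the self-normalised estimator of π(f), this estimate needs the proposal density q with its normalising constant, since any missing constant would shift ln E.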

Page 48: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Evidence approximation

Back to the banana ...

Centred d-multivariate normal, $x \sim N_d(0, \Sigma)$ with covariance $\Sigma = \mathrm{diag}(\sigma_1^2, 1, \ldots, 1)$, which is slightly twisted in the first two dimensions by changing $x_2$ to be $x_2 + \beta(x_1^2 - \sigma_1^2)$, where $\sigma_1^2 = 100$ and β controls the degree of curvature.

We integrate over the unnormalised target density

$$E = \int \pi(\beta)\,f(x|\beta, \Sigma)\,d\beta$$

or

$$E = \int \pi(x|\beta, \Sigma)\,dx.$$

Page 49: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Evidence approximation

Simulation results (1)

[Figure: two panels in the ($x_1$, $x_2$) plane; posterior mean of β after the 10th iteration; log-evidence after the 10th iteration]

β unknown

Page 50: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Evidence approximation

Simulation results (2)

[Figure: perplexity, NESS and log-evidence against iteration (1 to 10); log-evidence from the final sample after the 10th iteration]

β = 0.015 known

Page 51: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Back to cosmology questions

Standard cosmology successful in explaining recent observations, such as CMB, SNIa, galaxy clustering, cosmic shear, galaxy cluster counts, and Lyα forest clustering.

Flat ΛCDM model with only six free parameters (Ωm, Ωb, h, ns, τ, σ8)

Extensions to ΛCDM may be based on independent evidence (massive neutrinos from oscillation experiments), predicted by compelling hypotheses (primordial gravitational waves from inflation) or reflect ignorance about fundamental physics (dynamical dark energy).

Testing for dark energy, curvature, and inflationary models

Page 53: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Extended models

Focus on the dark energy equation-of-state parameter, modeled as

w = −1                  ΛCDM
w = w0                  wCDM
w = w0 + w1(1 − a)      w(z)CDM

In addition, the curvature parameter ΩK for each of the above is either ΩK = 0 (‘flat’) or ΩK ≠ 0 (‘curved’).

This choice represents the simplest models beyond a “cosmological constant” model able to explain the observed, recent accelerated expansion of the Universe.

Page 54: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Cosmology priors

Prior ranges for dark energy and curvature models. In case of w(a) models, the prior on w1 depends on w0

Parameter   Description                  Min.       Max.
Ωm          Total matter density         0.15       0.45
Ωb          Baryon density               0.01       0.08
h           Hubble parameter             0.5        0.9
ΩK          Curvature                    −1         1
w0          Constant dark-energy par.    −1         −1/3
w1          Linear dark-energy par.      −1 − w0    (−1/3 − w0)/(1 − aacc)

Page 55: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Cosmology priors (2)

Component to the matter-density tensor with w(a) < −1/3 for values of the scale factor a > aacc = 2/3. To limit the state equation from below, we impose the condition w(a) > −1 for all a, thereby excluding phantom energy.

Natural limit on the curvature is that of an empty Universe, i.e. upper boundary on the curvature ΩK = 1. A lower boundary corresponds to an upper limit on the total matter-energy density: ΩK > −1, excluding high-density Universe(s) which are ruled out by the age of the oldest observed objects.

Alternative prior on ΩK could be derived from the paradigm of inflation, but most scenarios imply the curvature to be on the order of 10^−60. The likelihood over such a prior on ΩK is essentially flat for any current and future experiments, hence cannot be assessed.
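The w0-dependent prior range on w1 follows from the two conditions above: w(a) > −1 at a = 0 gives the lower bound, and w(a) < −1/3 at a = aacc gives the upper bound. A small sketch with hypothetical helper names, using aacc = 2/3 as quoted:

```python
A_ACC = 2.0 / 3.0   # scale factor a_acc quoted above

def w_of_a(a, w0, w1):
    """Linear equation-of-state parameter w(a) = w0 + w1 (1 - a)."""
    return w0 + w1 * (1.0 - a)

def w1_prior_range(w0, a_acc=A_ACC):
    """Bounds on w1 given w0: (-1 - w0, (-1/3 - w0) / (1 - a_acc))."""
    return -1.0 - w0, (-1.0 / 3.0 - w0) / (1.0 - a_acc)

def in_prior(w0, w1, a_acc=A_ACC):
    """True when (w0, w1) lies inside the prior box for the w(z)CDM model."""
    lo, hi = w1_prior_range(w0, a_acc)
    return (-1.0 <= w0 <= -1.0 / 3.0) and (lo <= w1 <= hi)
```

For w0 = −1 the range is approximately [0, 2], and it shrinks linearly to [−2/3, 0] at w0 = −1/3, so the prior region in the (w0, w1) plane is bounded by straight lines.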

Page 57: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

PMC setup

q0 is a Gaussian mixture model with D components randomly shifted away from the MLE and covariance equal to the information matrix.

For the dark-energy and curvature models, number of iterations T equal to 10, unless perplexity indicated the contrary. Average number of points sampled under an individual mixture component, N/D, controlled for stable updating of the component (N = 7500 and D = 10).

For the primordial models T = 5, N = 10000 and D between 7 and 10, depending on the dimensionality.

Parameters controlling the initial mixture means and covariances chosen as fshift = 0.02 and fvar between 1 and 1.5. Final iteration run with a five-times larger sample.

Page 58: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Results

In most cases evidence in favour of the standard model, especially when more datasets/experiments are combined.

Largest evidence is ln B12 = 1.8, for the w(z)CDM model and CMB alone. Case where a large part of the prior range is still allowed by the data, and a region of comparable size is excluded. Hence weak evidence that both w0 and w1 are required, but excluded when adding SNIa and BAO datasets.

Results on the curvature are compatible with current findings: non-flat Universe(s) strongly disfavoured for the three dark-energy cases.
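For scale, the Bayes factor is the ratio of two evidences. The worked line below reads B12 as the evidence of the extended model over that of the flat ΛCDM reference, which is an assumption about the slides' convention:

```latex
B_{12} = \frac{E_1}{E_2}, \qquad
\ln B_{12} = \ln E_1 - \ln E_2, \qquad
\ln B_{12} = 1.8 \;\Longrightarrow\; B_{12} = e^{1.8} \approx 6.05 .
```

Under that reading, the largest value reported corresponds to odds of about 6 to 1, consistent with the "weak evidence" wording above.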

Page 59: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Evidence

[Figure: Evidence (reference model ΛCDM flat). ln B12 (from −8 to 4) against npar (4 to 6), with bands labelled inconclusive, weak, moderate and strong; three groups of points (CMB, CMB+SN, CMB+SN+BAO), each for Λ curved, w0 flat, w0 curved, w(z) flat and w(z) curved]

Page 60: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

Posterior outcome

Posterior on dark-energy parameters w0 and w1 as 68% and 95% credible regions for WMAP (solid blue lines), WMAP+SNIa (dashed green) and WMAP+SNIa+BAO (dotted red curves). Allowed prior range as red straight lines.

[Figure: credible regions in the (w0, w1) plane, w0 from −1.0 to −0.4 and w1 from −0.5 to 2.0]

Page 61: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

PMC stability

[Figure: ln E against iteration for flat wCDM (iterations 1 to 10) and curved wCDM (iterations 1 to 19)]

Distribution of 25 PMC samplings of two dark-energy models, flat wCDM (left panel) and curved wCDM (right panel). Log-evidence

Page 62: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

Cosmology models

PMC stability

[Figure: perplexity against iteration for flat wCDM (iterations 1 to 10) and curved wCDM (iterations 1 to 19)]

Distribution of 25 PMC samplings of two dark-energy models, flat wCDM (left panel) and curved wCDM (right panel). Perplexity

Page 63: Bayesian model choice in cosmology

Bayesian Model Comparison in Cosmology

lexicon

lexicon

BAO, baryon acoustic oscillations

CMB, cosmic microwave background radiation

COBE, cosmic background explorer

ΛCDM, lambda-cold dark matter

Lyα, Lyman-alpha

SNIa, type Ia supernovae

WMAP, Wilkinson microwave anisotropy probe