CMB Power Spectrum Estimation with Hamiltonian Sampling

J.F. Taylor, M.A.J. Ashdown, M.P. Hobson

March 17, 2008

Outline

- Aim of power spectrum estimation
- Standard methods
- Sampling
  - Gibbs
  - Hamiltonian Monte Carlo
  - Tests on simulated data
- Extension to polarisation
  - What are the new challenges?
  - Preliminary results

The aim of power spectrum estimation

[Figures: WMAP science team]

- Pixelised CMB map $t(x_p)$:

$$t(x_p) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(x_p)$$

- For an isotropic Gaussian CMB:

$$\langle a_{\ell m} a^*_{\ell' m'} \rangle = C_\ell \, \delta_{\ell \ell'} \, \delta_{m m'}$$

- Observed data $d = s + n$, where $s = Rt = RYa$
  - pixelised map: $R = WB$
  - time-ordered data: $R = AB$

Aim of power spectrum estimation

- non-stationary, correlated noise
- beams
- cut-sky / partial coverage
- foregrounds
- systematics
- large data-sets
  - WMAP: $N_{\rm pix} \approx 3\times 10^6$, $\ell_{\max} \approx 1000$
  - Planck: $N_{\rm pix} \approx 5\times 10^7$, $\ell_{\max} \approx 2500$--$3000$

[Figures: WMAP science team]

Pseudo-$C_\ell$ estimators

- Frequentist approach (Peebles 1973; Hivon et al. 2002)
- Compute the spherical harmonic coefficients $a_{\ell m}$ of the data map (see the sketch below):

$$C_\ell = \frac{1}{2\ell+1} \sum_m |a_{\ell m}|^2$$

- Clearly different from the true $C_\ell$, so apply corrections for the
  1. cut
  2. noise
  3. beams
  4. filtering ...
  (Hivon et al. 2002)
- Fast
- But sub-optimal, particularly at low $\ell$
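A minimal sketch of this estimator in Python, assuming the healpy package is available; the input spectrum, map and mask are toy stand-ins, and dividing by $f_{\rm sky}$ is only the crudest version of the cut-sky corrections of Hivon et al. (2002).

```python
import numpy as np
import healpy as hp

nside, lmax = 256, 512

# Toy input spectrum, purely illustrative
ell = np.arange(lmax + 1)
cl_true = 1.0 / (ell * (ell + 1.0) + 1.0)

cmb_map = hp.synfast(cl_true, nside, lmax=lmax)   # Gaussian realisation

# Toy equatorial band cut as a stand-in for a galactic mask
npix = hp.nside2npix(nside)
theta, _ = hp.pix2ang(nside, np.arange(npix))
mask = np.ones(npix)
mask[np.abs(theta - np.pi / 2) < 0.2] = 0.0
f_sky = mask.mean()

# Pseudo-C_l: C_l = (1/(2l+1)) sum_m |a_lm|^2 of the masked map,
# followed by the leading-order 1/f_sky correction for the cut
cl_pseudo = hp.anafast(cmb_map * mask, lmax=lmax) / f_sky
```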

Maximum-likelihood

- Likelihood: $\mathcal{L} = \Pr(d | C_\ell) = \int \Pr(d | s) \, \Pr(s | C_\ell) \, \mathrm{d}s$
- For Gaussian noise and signal:

$$\mathcal{L} \propto \int e^{-\frac{1}{2}(d-s)^T N^{-1}(d-s)} \, e^{-\frac{1}{2} s^T S^{-1} s} \, \mathrm{d}s$$

  where $N = \langle n n^T \rangle$ and $S = \langle s s^T \rangle$ are non-sparse
- Complete the square and integrate:

$$\ln \mathcal{L} = \text{constant} - \frac{1}{2}\left\{ \ln|S + N| + d^T (S + N)^{-1} d \right\}$$

- Obtain the ML estimate using an iterative algorithm, e.g. Newton--Raphson
- Basic method requires storage $O(N_{\rm pix}^2)$, operations $O(N_{\rm pix}^3)$ (see the toy evaluation below)
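To make the scalings concrete, here is a toy numpy/scipy evaluation of $\ln \mathcal{L}$ via a Cholesky factorisation, which is the $O(N_{\rm pix}^3)$ step; the covariances are random stand-ins, not a real CMB model.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
npix = 200

# Stand-in covariances (both symmetric positive definite)
A = rng.standard_normal((npix, npix))
S = A @ A.T / npix          # dense "signal" covariance
N = 0.1 * np.eye(npix)      # white noise

C = S + N
d = rng.multivariate_normal(np.zeros(npix), C)   # fake data vector

# Cholesky factorisation: O(npix^3) operations, O(npix^2) storage
cf = cho_factor(C)
log_det = 2.0 * np.sum(np.log(np.diag(cf[0])))   # ln |S + N|
chi2 = d @ cho_solve(cf, d)                      # d^T (S + N)^{-1} d

lnL = -0.5 * (log_det + chi2)   # up to an additive constant
```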

More ML

There are a number of shortcuts:

- Solve matrix equations with conjugate gradient: $O(N_{\rm iter} N_{\rm pix}^2)$
- Compute traces with Monte Carlo: $O(N_{\rm MC} N_{\rm iter} N_{\rm pix}^2)$
- Ring torus method: for some (i.e. Planck) scanning strategies, $S$ and $N$ take the same block-diagonal form: $O(N_{\rm pix}^2)$

Feasible only up to $N_{\rm pix} \sim 10^4$ or $10^5$

Sampling as an alternative approach

We would actually like to know about the posterior distribution

$$\Pr(C_\ell | d)$$

It is possible to sample from the joint density

$$\Pr(C_\ell, a | d)$$

and then marginalise over $a$.

- Need to sample in an extremely high-dimensional space
- Most conventional Monte Carlo methods move through parameter space by a random walk

Gibbs sampling

- For sampling multi-dimensional distributions
- Joint distribution $\Pr(\{x_i\})$ hard to sample
- Conditional distributions $\Pr(x_i | \{x_j\}_{j \neq i})$ tractable
- Sample from each conditional distribution in turn to build up the joint density (see the sketch below)
- No parameters to tune
- Explores parameter space by a random walk
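A minimal sketch of the idea for a correlated two-dimensional Gaussian, where both conditionals are one-dimensional Gaussians; nothing here is specific to the CMB problem.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.95                      # strong correlation -> slow random walk
nsteps = 5000

x = np.zeros(2)
samples = np.empty((nsteps, 2))

for i in range(nsteps):
    # Pr(x0 | x1) and Pr(x1 | x0) are both 1-D Gaussians for this target
    x[0] = rng.normal(rho * x[1], np.sqrt(1.0 - rho**2))
    x[1] = rng.normal(rho * x[0], np.sqrt(1.0 - rho**2))
    samples[i] = x
```

With $\rho$ close to 1 the chain crawls along the degeneracy direction by a random walk; this is exactly the behaviour that motivates Hamiltonian Monte Carlo below.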

Gibbs sampling

Collect samples by alternately drawing from the conditional distributions for $C_\ell$ and $a$:

$$a^{i+1} \leftarrow \Pr(a | C_\ell^i, d) \propto \Pr(d | a) \, \Pr(a | C_\ell^i)$$

$$C_\ell^{i+1} \leftarrow \Pr(C_\ell | a^{i+1}, d) \propto \Pr(a^{i+1} | C_\ell) \, \Pr(C_\ell)$$

- The $C_\ell$ step is simple... but slow for low signal-to-noise... so bin
- The $a$ step is hard
- Limited to Gaussian noise and signal

Gibbs sampling

Signal sample

- The distribution is a multivariate Gaussian in $a$ space:

$$\bar{a} = \left(S^{-1} + R^T N^{-1} R\right)^{-1} R^T N^{-1} d$$

$$V = \left(S^{-1} + R^T N^{-1} R\right)^{-1}$$

- Performed using transformed white-noise sampling
- Solved using the conjugate gradient method (toy example below)

Computational cost

- Write the matrix equations in the form of SHTs: storage $O(N_{\rm pix})$, operations $O(N_{\rm pix}^{3/2})$
- Need to construct a preconditioner
- Hence the approach requires $\sim$100--200 SHTs per $a$ sample
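The sketch below mimics the signal-sample draw in a toy basis where $S$, $N$ and $R$ are diagonal, using one common form of the transformed white-noise trick with scipy's conjugate gradient. The real solver applies these operators with SHTs and needs the preconditioner mentioned above; the diagonal operators and the specific right-hand side here are illustrative assumptions, not the paper's implementation.

```python
# Toy signal-sample draw: solve
#   (S^-1 + R^T N^-1 R) a = R^T N^-1 d + S^-1/2 w1 + R^T N^-1/2 w2
# with w1, w2 unit white noise, so that a is a draw from the Gaussian
# with the mean and covariance quoted above.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(2)
n = 1000
S = rng.uniform(0.5, 2.0, n)         # diagonal signal covariance
N = rng.uniform(0.05, 0.2, n)        # diagonal noise covariance
R = np.ones(n)                       # trivial response for the toy

d = rng.normal(0.0, np.sqrt(S + N))  # fake data with the right variance

def apply_A(a):
    # (S^-1 + R^T N^-1 R) a, applied element-wise in the diagonal toy
    return a / S + R * (R * a) / N

A = LinearOperator((n, n), matvec=apply_A)
b = (R * d / N
     + rng.standard_normal(n) / np.sqrt(S)
     + R * rng.standard_normal(n) / np.sqrt(N))

a_sample, info = cg(A, b)
assert info == 0                     # 0 means CG converged
```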

Hamiltonian Monte Carlo

- Proposed by Duane et al. (1987)
- Draws parallels between sampling and classical dynamics
- Introduces persistent motion into the Markov chain → moves through high-dimensional spaces efficiently
- For each parameter $x_i$ we introduce a momentum $p_i$ and define the Hamiltonian

$$H = \sum_i \frac{p_i^2}{2 m_i} + \psi(x)$$

  where $\psi(x) = -\log \Pr(x)$ and $m_i$ is a fictional mass associated with each variable.

Hamiltonian Monte Carlo

- Draw momenta $p_i$ from a Gaussian with variance $m_i$
- Move $(x, p)$ along a trajectory according to Hamilton's equations using a simple iterative scheme
- After a random time, test the candidate point with the Metropolis rule
- If using the exact Hamiltonian and the trajectory is accurate, the acceptance rate will be 100%
- Explores correlations and degeneracies with relative ease

Hamiltonian Sampling

- Draw samples for $C_\ell$ and $a$ simultaneously
- 'Potential':

$$\psi = \frac{1}{2} (d - Ra)^T N^{-1} (d - Ra) + \sum_\ell \left(\ell + \frac{1}{2}\right) \left( \ln C_\ell + \frac{\sigma_\ell}{C_\ell} \right)$$

- Gradient:

$$\nabla_a \psi = -R^T N^{-1} (d - Ra) + \left(\ell + \frac{1}{2}\right) \frac{a}{C_\ell}$$

$$\nabla_{C_\ell} \psi = \left(\ell + \frac{1}{2}\right) \frac{1}{C_\ell} \left( 1 - \frac{\sigma_\ell}{C_\ell} \right)$$

  where $\sigma_\ell = \frac{1}{2\ell+1} \sum_m |a_{\ell m}|^2$ is the power spectrum of the signal
- Recall $R = BY$
- One SHT for the potential
- Two SHTs for the gradient (toy version below)
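The following toy reproduces the structure of $\psi$ and its gradients for a model with one real mode per $\ell$ (so $\sigma_\ell = a_\ell^2$ and the per-mode weight is $1/2$), trivial response and white noise. The gradients are derived directly from this toy $\psi$ and checked by finite differences; it is a sketch of the bookkeeping, not the spherical-harmonic implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
lmax = 32
ells = np.arange(2, lmax + 1)
noise_var = 0.1

C_true = 1.0 / ells**2
a_true = rng.normal(0.0, np.sqrt(C_true))
d = a_true + rng.normal(0.0, np.sqrt(noise_var), a_true.size)

def psi(a, C):
    """-log posterior up to a constant; (2l+1)/2 -> 1/2, one mode per l."""
    sigma = a**2
    chi2 = np.sum((d - a)**2) / noise_var
    return 0.5 * chi2 + np.sum(0.5 * (np.log(C) + sigma / C))

def grad_a(a, C):
    # Derived directly from psi above (note the sign of the data term)
    return -(d - a) / noise_var + a / C

def grad_C(a, C):
    sigma = a**2
    return 0.5 / C * (1.0 - sigma / C)

# Finite-difference check of grad_a at a random point
a0, eps, i = rng.normal(size=ells.size), 1e-6, 5
num = (psi(a0 + eps * np.eye(a0.size)[i], C_true) - psi(a0, C_true)) / eps
assert abs(num - grad_a(a0, C_true)[i]) < 1e-4
```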

WMAP

- Simulated W-band map
- $N_{\rm side} = 512$, i.e. $3\times 10^6$ pixels
- Noise and beams as for a combined W-band map
- Kp2 mask ($\sim$15% of the sky)
- Currently takes a day on a workstation to process up to $\ell_{\max} = 512$
- Initially we had problems with long correlation lengths, but we now better understand how to keep these low

Polarisation

- Small signal
- Low multipoles of particular interest
- Dominant foregrounds... large mask
- Ambiguity between E and B

Possible to separate E/B in maps, but we consider estimating spectra directly from the data.

PolSpice

Estimates the power spectrum using correlation functions (Chon et al. 2004; Szapudi, Prunet & Colombi 2001).

[Figure: BB spectrum with full sky (top) and with WMAP cut (bottom)]

Sampler

[Figure: BB spectrum with full sky (blue) and with WMAP cut (red). Mean spectrum rather than maximum likelihood.]

Conclusions

- Sampling provides a fast and optimal framework for performing power spectrum estimation
- Hamiltonian Monte Carlo is a good candidate for performing the sampling... fast and flexible
- Optimal estimates are needed for polarisation

Next...

- Incorporate component separation?

Scaling with problem size

Suppressing the random walk → increased sampling efficiency for high-dimensional problems

[Figure: sampling efficiency vs. problem size for Metropolis--Hastings, Gibbs and Hamiltonian MC]

Blackwell-Rao

We can use the $a$ samples to form a fast likelihood code (Wandelt et al. 2004; Chu et al. 2004).

- Allows us to compute $\Pr(C_\ell | d)$ for arbitrary values of $C_\ell$ given our samples:

$$\Pr(C_\ell | d) \approx \frac{1}{N_{\rm samples}} \sum_i^{N_{\rm samples}} \Pr\left(C_\ell | \sigma_\ell^i\right)$$

  For a Gaussian field,

$$\Pr(C_\ell | \sigma_\ell) \propto \prod_{\ell=0}^{\infty} \frac{1}{\sigma_\ell} \left( \frac{\sigma_\ell}{C_\ell} \right)^{\frac{2\ell+1}{2}} \exp\left( -\frac{2\ell+1}{2} \frac{\sigma_\ell}{C_\ell} \right)$$

- Requires large numbers of samples to analyse high-resolution data exactly
- But certainly useful at low $\ell$ (sketch below)
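A sketch of the Blackwell-Rao estimate at a single $\ell$: average the analytic $\Pr(C_\ell | \sigma_\ell)$ over chain samples $\sigma_\ell^i$. The "chain" here is faked by drawing $\sigma_\ell$ from the exact full-sky, noise-free distribution, purely so the snippet runs stand-alone; a real analysis would use the $\sigma_\ell^i$ recorded from the sampler.

```python
import numpy as np

rng = np.random.default_rng(4)
ell, C_fid = 10, 1.0
nu = 2 * ell + 1                 # number of modes at this ell

# Stand-in chain: for a full-sky, noise-free Gaussian field,
# sigma_l | C_l ~ C_l * chi^2_nu / nu
sigma_samples = C_fid * rng.chisquare(nu, size=500) / nu

def pr_cl_given_sigma(cl, sigma):
    """Unnormalised Pr(C_l | sigma_l) for a Gaussian field (one ell)."""
    return (sigma / cl) ** (nu / 2.0) / sigma * np.exp(-nu * sigma / (2.0 * cl))

# Blackwell-Rao: average the analytic conditional over the samples
cl_grid = np.linspace(0.2, 3.0, 400)
pdf = np.mean([pr_cl_given_sigma(cl_grid, s) for s in sigma_samples], axis=0)
pdf /= pdf.sum() * (cl_grid[1] - cl_grid[0])   # normalise on the grid
```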

A convergence diagnostic

Hanson (2001) proposed the following diagnostic, which makes use of gradient information: compare two estimates of the variance that depend differently on where our samples lie in the distribution.

$$\mathrm{var}_1(x) = \int_{-\infty}^{\infty} (x - \bar{x})^2 \Pr(x) \, \mathrm{d}x$$

$$\mathrm{var}_2(x) = \frac{1}{3} \int_{-\infty}^{\infty} (x - \bar{x})^3 \Pr(x) \frac{\partial \psi(x)}{\partial x} \, \mathrm{d}x + \frac{1}{3} \left[ (x - \bar{x})^3 \Pr(x) \right]_{-\infty}^{\infty}$$

For most 'interesting' distributions the second term is zero, so we compute the ratio from a set of samples $\{x_k\}$:

$$R = \frac{\sum_k (x_k - \bar{x})^3 \left. \frac{\partial \psi(x)}{\partial x} \right|_{x_k}}{3 \sum_k (x_k - \bar{x})^2}$$

which should be close to unity for a converged chain (implementation below).
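A direct implementation of the ratio $R$, checked on a unit Gaussian where $\psi = x^2/2$ and $\partial\psi/\partial x = x$, so $R$ should come out near 1. The helper name is ours, not from the paper.

```python
import numpy as np

def hanson_ratio(samples, grad_psi_at_samples):
    """Hanson's R from samples x_k and dpsi/dx evaluated at each x_k."""
    dx = samples - samples.mean()
    return np.sum(dx**3 * grad_psi_at_samples) / (3.0 * np.sum(dx**2))

# Sanity check on a unit Gaussian: psi = x^2/2, so dpsi/dx = x
rng = np.random.default_rng(5)
x = rng.standard_normal(20000)
R = hanson_ratio(x, x)        # expect R close to 1 for a converged chain
```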

Hamiltonian Monte Carlo

Proposed by Duane et al. (1987)

- Draws parallels between sampling and classical dynamics
- Introduces persistent motion through the parameter space

For each parameter $x_i$ we introduce a momentum $p_i$ and define the Hamiltonian

$$H = \sum_i \frac{p_i^2}{2 m_i} + \psi(x)$$

where $\psi(x) = -\log \Pr(x)$ and $m_i$ is a fictional mass associated with each variable. (In general we can have a mass matrix $M$.)

Hamiltonian Monte Carlo

1. Draw new momenta $p_i$ from a Gaussian with variance $m_i$
2. Propagate $(x, p)$ along a trajectory in $(x, p)$ space from Hamilton's equations:

$$\frac{\partial p}{\partial t} = -\frac{\partial H}{\partial x} \qquad \frac{\partial x}{\partial t} = \frac{\partial H}{\partial p}$$

3. After some (randomised) length of time, halt and accept the new point according to the Metropolis rule
4. Discard the $p$ variables; the $x$ values sample $\Pr(x)$

- We can use any Hamiltonian we like to define our trajectory, as long as we use the correct Hamiltonian to make the accept/reject decision
- If we use the true Hamiltonian and simulate the dynamics exactly, then every proposed point will be accepted, by conservation of energy

The leapfrog method

- Simple method for following the dynamics
- Robust to numerical errors
- Reversible
- Iterate for $T = n\tau$
- Randomise $\tau$ and $n$ to avoid resonance conditions

$$p(t + \tau/2) = p(t) - \frac{\tau}{2} \left. \nabla_x \psi(x) \right|_{x = x(t)}$$

$$x(t + \tau) = x(t) + \frac{\tau}{m} p(t + \tau/2)$$

$$p(t + \tau) = p(t + \tau/2) - \frac{\tau}{2} \left. \nabla_x \psi(x) \right|_{x = x(t + \tau)}$$

A complete sampler built from these updates is sketched below.
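Putting the pieces together, here is a compact, self-contained HMC sampler using the leapfrog updates above, with unit masses and a randomised number of steps ($\tau$ could be jittered too); the target is an arbitrary $\psi = -\log \Pr(x)$ supplied with its gradient. The function names are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)

def hmc_step(x, psi, grad_psi, tau=0.1, n_max=50):
    """One HMC update: leapfrog trajectory + Metropolis accept/reject."""
    p = rng.standard_normal(x.size)          # draw momenta, m_i = 1
    H0 = 0.5 * p @ p + psi(x)

    n = rng.integers(1, n_max + 1)           # randomised trajectory length
    x_new = x.copy()
    p = p - 0.5 * tau * grad_psi(x_new)      # initial half-step in p
    for _ in range(n):
        x_new = x_new + tau * p              # full step in x
        p = p - tau * grad_psi(x_new)        # full step in p
    p = p + 0.5 * tau * grad_psi(x_new)      # trim last p update to a half

    H1 = 0.5 * p @ p + psi(x_new)
    if rng.random() < np.exp(min(0.0, H0 - H1)):   # Metropolis rule
        return x_new
    return x

# Example target: a strongly correlated 2-D Gaussian
icov = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))
psi = lambda x: 0.5 * x @ icov @ x           # -log Pr(x) up to a constant
grad = lambda x: icov @ x

x = np.zeros(2)
samples = np.empty((2000, 2))
for i in range(2000):
    x = hmc_step(x, psi, grad)
    samples[i] = x
```

Because the trajectories move coherently through parameter space, the sampler traverses the narrow degeneracy far faster than the Gibbs random walk sketched earlier.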