Post on 20-Dec-2015
Cosmic Microwave Background Data Analysis :
From Time-Ordered Data To Power Spectra
Julian Borrill
Computational Research Division, Berkeley Lab
& Space Sciences Laboratory, UC Berkeley
CMB Science - I
The CMB is a snapshot of the Universe when it first became neutral 400,000 years after the Big Bang.
Cosmic - filling all of space.
Microwave - redshifted by the expansion of the Universe from 3000K to 3K.
Background - primordial photons coming from “behind” all astrophysical sources.
CMB Science - II
The CMB is a unique probe of the very early Universe. Its tiny (1:10^5 - 10^8) fluctuations carry information about
- the fundamental parameters of cosmology
- ultra-high-energy physics beyond the Standard Model
CMB Science - III
The new frontier in CMB research is polarization:
• consistency check of temperature results
• re-ionization history of the Universe
• gravity wave production during inflation
But polarization fluctuations are up to 3 orders of magnitude fainter than temperature (we think), requiring:
• many more detectors
• much longer observation times
• very careful analysis of very large datasets
The Planck Satellite
• A joint ESA/NASA mission due to launch in fall 2007.
• An 18+ month all-sky survey at 9 microwave frequencies from 30 to 857 GHz.
• O(10^12) observations, O(10^8) sky pixels, O(10^4) spectral multipoles.
Overview (I)
CMB analysis moves from the time domain (observations), to the pixel domain (maps), to the multipole domain (power spectra), calculating the compressed data and their reduced error bars at each step.
Overview (II)
CMB data analysis typically proceeds in 4 steps:
- Pre-processing (deglitching, pointing, calibrating).
- Estimating the time-domain noise statistics.
- Estimating the map (and its errors).
- Estimating the power spectra (and their errors).
iterating & looping as we learn more about the data.
Then we can ask about the likelihoods of the parameters of any particular class of cosmologies.
Goals
To cover the
(a) basic mathematical formalism
(b) algorithms & their scaling behaviours
(c) example implementation issues
for map-making and power spectrum estimation.
To consider how to extract the maximum amount of information from the data, subject to practical computational constraints.
To illustrate some of the computational issues faced when analyzing very large datasets (eg. Planck).
Data Compression
CMB data analysis is an exercise in data compression:
1. Time-ordered data: #samples = #detectors x sampling rate x duration ~ 70 x 200Hz x 18 months for Planck
2. (HEALPixelized) Maps: #pixels = #components x sky fraction x 12 nside^2 ~ (3 - 6) x 1 x 12 x 4096^2 for Planck
3. Power Spectra: #bins = #spectra x #multipoles / bin resolution ~ 6 x (3 x 10^3) / 1 for Planck
Data Parameters

Symbol   Description               Planck
N_t      Number of samples         5 x 10^11
λ        Noise bandwidth           O(10^4)
N_p      Number of pixels          6 x 10^8
N_s      Number of spectra         6
l_max    Maximum multipole         3 x 10^3
N_b      Number of spectral bins   2 x 10^4
N_i      Number of iterations      -
N_r      Number of realizations    -
Computational Constraints
• A 1 GHz processor running at 100% efficiency for 1 day performs O(10^14) operations.
• 1 Gbyte of memory can hold an O(10^8) element vector, or an O(10^4 x 10^4) matrix, in 64-bit precision.
• Parallel (multiprocessor) computing increases the operation count and memory limits.
• Challenges to computational efficiency & scaling:
- load balancing (work & memory)
- data-delivery, including communication & I/O
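These back-of-envelope numbers are worth automating when planning an analysis. A minimal Python sketch, using the slide's own figures; the constants and function names here are illustrative, not from any CMB package:

```python
# Feasibility checks using the slide's numbers:
# ~1e14 operations per processor-day, 8 bytes per 64-bit element.

OPS_PER_PROC_DAY = 1e14        # 1 GHz processor at 100% efficiency for 1 day

def processor_days(op_count, efficiency=1.0):
    """Processor-days needed for op_count operations at a sustained efficiency."""
    return op_count / (OPS_PER_PROC_DAY * efficiency)

def matrix_gbytes(n):
    """Memory for a dense n x n matrix in 64-bit precision."""
    return 8.0 * n * n / 1e9

# Naive Planck map-making (~2e32 ops) is hopeless even at 100% efficiency;
# the FFT-based method (~2e14 ops) is about two processor-days.
naive_days = processor_days(2e32)
fft_days = processor_days(2e14)
```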
Map-Making - Formalism (I)
Consider data consisting of noise and sky-signal

d_t = n_t + A_tp s_p

where the pointing matrix A encodes the weight of each pixel p in each sample t - for a total power temperature observation

A_tp = 1 if pixel p is observed at time t, and 0 otherwise

and the sky-signal s_p is both beam & pixel smoothed.
Map-Making - Formalism (II)
Assume Gaussian noise with probability distribution

P(n) ∝ exp( - n^T N^-1 n / 2 ) / √(det N)

and a time-time noise correlation matrix

N_tt' = < n_t n_t' >

whose inverse is (piecewise) stationary & band-limited:

N^-1_tt' = N^-1(|t - t'|),  with  N^-1(|t - t'| > λ) ≈ 0
Aside : Noise Estimation
To make the map we need the inverse time-time noise correlations. Approximate these from the data themselves, which requires the pure noise timestream:
i) Assume n_t = d_t.
ii) Solve for the map: d_p ~ s_p.
iii) Subtract the map from the data: n_t = d_t - A_tp d_p.
iv) Iterate.
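The loop above can be sketched in a few lines of numpy. This toy assumes white noise, so the map solve in step (ii) reduces to a per-pixel average; all sizes and noise levels are invented for illustration:

```python
import numpy as np

# Sketch of the noise-estimation loop (i)-(iv), assuming white noise.
rng = np.random.default_rng(0)

n_samp, n_pix = 1000, 10
pix = rng.integers(0, n_pix, size=n_samp)        # pointing: pixel hit per sample
signal = rng.standard_normal(n_pix)              # "sky" we pretend not to know
d = signal[pix] + 0.1 * rng.standard_normal(n_samp)

noise = d.copy()                                  # (i) first guess: n_t = d_t
for _ in range(3):                                # (iv) iterate
    # (ii) solve for the map -- per-pixel average under white noise
    hits = np.bincount(pix, minlength=n_pix)
    m = np.bincount(pix, weights=d, minlength=n_pix) / hits
    # (iii) subtract the map from the data to get a pure-noise timestream
    noise = d - m[pix]
sigma = noise.std()                               # estimated noise level (~0.1)
```

With white noise this converges after one pass; with correlated noise, each pass would instead feed the updated N^-1 estimate back into the full map solve.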
Map-Making - Formalism (III)
Writing the noise in terms of the data and signal & maximizing its likelihood over all possible signals gives the minimum variance map

d_p = (A^T N^-1 A)^-1 A^T N^-1 d_t

with pixel-pixel noise correlations

N_pp'^-1 = (A^T N^-1 A)_pp'

Taken together, these are a complete description of the data.
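On a dataset small enough to hold dense matrices, the minimum-variance map m = (A^T N^-1 A)^-1 A^T N^-1 d and its inverse pixel-pixel covariance can be formed directly. A numpy toy (the sizes and noise level are invented; real codes never build A or N^-1 densely):

```python
import numpy as np

# Dense illustration of the maximum-likelihood (GLS) map.
rng = np.random.default_rng(1)

n_samp, n_pix = 200, 5
A = np.zeros((n_samp, n_pix))
A[np.arange(n_samp), rng.integers(0, n_pix, n_samp)] = 1.0   # one hit per row

s = rng.standard_normal(n_pix)                # true sky
Ninv = np.diag(np.full(n_samp, 4.0))          # white noise, sigma = 0.5
d = A @ s + 0.5 * rng.standard_normal(n_samp)

Npp_inv = A.T @ Ninv @ A                      # pixel-pixel inverse covariance
m = np.linalg.solve(Npp_inv, A.T @ Ninv @ d)  # minimum-variance map
```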
Map-Making - Algorithms (I)
We want to solve the system:

d_p = (A^T N^-1 A)^-1 A^T N^-1 d_t

Naive operation count: O(N_t^2 N_p).
Eg. (5 x 10^11)^2 x (6 x 10^8) ~ 2 x 10^32 for Planck !
Map-Making - Algorithms (II)
a) Exploit the structure of the matrices
- Pointing matrix is sparse
- Inverse noise correlation matrix is band-Toeplitz
Associated matrix-matrix & -vector multiplication scalings reduced from O(N_t^2 N_p) to O(N_t λ).
Eg. (5 x 10^11) x 10^4 ~ 5 x 10^15 for Planck.
Map-Making - Algorithms (III)
b) Replace explicit matrix inversion with an iterative solver (eg. preconditioned conjugate gradient) using repeated matrix-vector multiplications, reducing the scaling from O(N_p^3) to O(N_i N_p^2).
N_i depends on the required solution accuracy and the quality of the preconditioner (white noise works well).
Eg. 30 x (6 x 10^8)^2 ~ 10^19 for Planck.
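A hand-rolled preconditioned conjugate gradient for the map system, with the diagonal (white-noise) preconditioner, might look like this numpy sketch. The sizes and the band-limited N^-1 are invented; a production code applies the matrices implicitly rather than forming M:

```python
import numpy as np

# PCG solve of (A^T N^-1 A) x = A^T N^-1 d with a diagonal preconditioner.
rng = np.random.default_rng(2)
n_samp, n_pix = 300, 8
A = np.zeros((n_samp, n_pix))
A[np.arange(n_samp), rng.integers(0, n_pix, n_samp)] = 1.0
# Band-limited Toeplitz stand-in for N^-1 (symmetric positive definite)
Ninv = np.eye(n_samp) - 0.4 * np.eye(n_samp, k=1) - 0.4 * np.eye(n_samp, k=-1)
d = A @ rng.standard_normal(n_pix) + 0.3 * rng.standard_normal(n_samp)

M = A.T @ Ninv @ A                          # formed densely only for this toy
b = A.T @ Ninv @ d
precond = 1.0 / np.diag(M)                  # white-noise (diagonal) preconditioner

x = np.zeros(n_pix)
r = b - M @ x
z = precond * r
p = z.copy()
for _ in range(50):
    Mp = M @ p
    alpha = (r @ z) / (p @ Mp)
    x += alpha * p
    r_new = r - alpha * Mp
    if np.linalg.norm(r_new) < 1e-10:
        break
    z_new = precond * r_new
    beta = (r_new @ z_new) / (r @ z)
    p = z_new + beta * p
    r, z = r_new, z_new
```

Each iteration costs one matrix-vector multiply, which is where the O(N_i N_p^2) scaling comes from.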
Map-Making - Algorithms (IV)
c) Leave the inverse pixel-pixel noise matrix in implicit form, solving

(A^T N^-1 A) d_p = A^T N^-1 d_t

iteratively. Now each multiplication takes O(N_t λ) operations in pixel space, or O(N_t log2 λ) operations in Fourier space.
Eg. 30 x (5 x 10^11) x log2 10^4 ~ 2 x 10^14 for Planck.
But this gives no information about the pixel-pixel noise correlations (ie. errors).
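The Fourier-space trick rests on stationary (Toeplitz, here approximated as circulant) correlations diagonalizing under the FFT. A small numpy demonstration, with an invented symmetric band-limited kernel standing in for a row of N^-1:

```python
import numpy as np

# O(n^2) dense circulant multiply vs O(n log n) via FFT.
rng = np.random.default_rng(3)
n = 256
kernel = np.zeros(n)
kernel[0], kernel[1], kernel[-1] = 1.0, -0.4, -0.4   # symmetric band-limited row

# Dense circulant matrix multiply, O(n^2)
C = np.array([np.roll(kernel, i) for i in range(n)])
v = rng.standard_normal(n)
dense = C @ v

# Same product via FFT, O(n log n): circulants diagonalize in Fourier space
# (kernel symmetry makes circular correlation equal circular convolution)
fft = np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(v)).real
```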
Map-Making - Extensions (I)
For polarization data the signal can be written in terms of the i, q & u Stokes parameters and the angle φ of the polarizer relative to some chosen coordinate frame

s_t = i_p + q_p cos 2φ_t + u_p sin 2φ_t

where A_tp now has 3 non-zero entries per row. We need at least 3 observation-angles to separate i, q & u.
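The 3-angle requirement is easy to verify: three distinct polarizer angles give an invertible 3x3 system for i, q & u. A numpy toy with invented Stokes values and angles:

```python
import numpy as np

# Recover i, q, u from observations s_t = i + q cos(2 phi_t) + u sin(2 phi_t).
i_true, q_true, u_true = 1.0, 0.3, -0.2
phi = np.array([0.0, np.pi / 3, 2 * np.pi / 3])       # three observation angles
A = np.column_stack([np.ones(3), np.cos(2 * phi), np.sin(2 * phi)])
s = A @ np.array([i_true, q_true, u_true])            # noiseless observations

iqu = np.linalg.solve(A, s)                           # 3 angles -> unique i, q, u
```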
Map-Making - Extensions (II)
If the data includes a sky-asynchronous contaminating signal (eg. MAXIMA's chopper-synchronous signal), we can solve simultaneously for the map and the contaminant's amplitudes, projecting it out of the data. This can be extended to any number of contaminants, including relative offsets between sub-maps.
Map-Making - Extensions (III)
If the data includes a sky-synchronous contaminating signal then more information is needed. Given multi-frequency data with known foreground frequency-scalings, we can solve simultaneously for the CMB and the foreground components. With detectors at d different frequencies this can be extended to (d-1) different foregrounds.
Map-Making - Implementation (I)
We want to be able to calculate the inverse pixel-pixel noise correlation matrix

N_pp'^-1 = A^T N^-1 A

because it
- encodes the error information
- is sparse, so can be saved even for huge datasets
- provides a good (block) diagonal preconditioner & i/q/u pixel degeneracy test with white noise.
Map-Making - Implementation (II)
Both the time- & pixel-domain data must be distributed.
We want to balance simultaneously both the work- & the memory-load per processor.
Balanced distribution in one basis is unbalanced in the other:
(i) uniform work per observation but varied work per pixel;
(ii) unequal numbers of pixels observed per interval.
No perfect solution - have to accept the least limiting imbalance (pixel memory).
Map-Making - Implementation (III)
Pseudo-code:

Initialize N_pp'^-1 = 0
For each time t
  For each pixel p observed at time t with weight w
    For each time t' within λ of t
      For each pixel p' observed at time t' with weight w'
        Accumulate N_pp'^-1 += w N^-1_tt' w'

Doesn't work well - pixel-memory access too random.
Map-Making - Implementation (IV)
Alternative pseudo-code:

For each pixel p find the times at which it is observed.
For each pixel p
  Initialize N_pp'^-1 = 0 for all p'
  For each time t at which p is observed with weight w
    For each time t' within λ of t
      For each pixel p' observed at time t' with weight w'
        Accumulate N_pp'^-1 += w N^-1_tt' w'
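The pixel-ordered accumulation can be prototyped directly in Python, with the sparse matrix held as a dictionary keyed by pixel pairs. All sizes, weights and the N^-1 band values below are invented; the dense construction at the end is only a correctness cross-check:

```python
import numpy as np

# Pixel-ordered accumulation of the sparse N_pp'^-1 = A^T N^-1 A for a
# band-limited N^-1 of half-bandwidth lam, with unit pointing weights.
rng = np.random.default_rng(5)
n_samp, n_pix, lam = 120, 6, 3
pix = rng.integers(0, n_pix, size=n_samp)             # pointing
ninv = {0: 1.0, 1: -0.3, 2: -0.1, 3: -0.05}           # N^-1(|t - t'|), |t-t'|<=lam

# For each pixel, the times at which it is observed
times_of = {p: np.flatnonzero(pix == p) for p in range(n_pix)}

npp_inv = {}
for p in range(n_pix):
    for t in times_of[p]:
        for dt in range(-lam, lam + 1):               # t' within lam of t
            t2 = t + dt
            if 0 <= t2 < n_samp:
                key = (p, pix[t2])
                npp_inv[key] = npp_inv.get(key, 0.0) + ninv[abs(dt)]

# Cross-check against the dense A^T N^-1 A
A = np.zeros((n_samp, n_pix))
A[np.arange(n_samp), pix] = 1.0
Ninv = sum(ninv[k] * (np.eye(n_samp, k=k) + (np.eye(n_samp, k=-k) if k else 0))
           for k in ninv)
dense = A.T @ Ninv @ A
```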
Map-Making - Implementation (V)
Distribute time-ordered data equally over processors.
Each processor accumulates part of Npp’-1 in sparse form
- only over pixels observed in its time interval
- only over some of the observations of these pixels
Then all of these partial matrices, each stored in its own sparse form:

(p, z, (p1, n1), … , (pz, nz)), … , (p', z', (p'1, n'1), … , (p'z', n'z'))

have to be combined.
Map-Making - Conclusions
• The maximum-likelihood map can only be calculated in O(N_i N_t log2 λ) operations - O(10^14) for Planck.
• The sparse inverse pixel-pixel correlation matrix can be calculated in O(N_t λ) operations - O(10^15) for Planck.
• A single column of the pixel-pixel correlation matrix can be calculated (just like a map) in O(N_i N_t log2 λ) operations - O(10^14) for Planck.
• Computational efficiency is only ~5% for serial/small data, ~0.5% for parallel/large data.
Power Spectrum Estimation
From the map-making we have:
(i) the pixelized data-vector d_p containing the CMB signal(s) plus any other separated components.
(ii) some representation of the pixel-pixel noise correlation matrix N_pp' - explicit, explicit inverse, or implicit inverse.
Truncation of N_pp' (only) is equivalent to marginalization.
Power Spectrum - Formalism (I)
Given a map of noise + Gaussian CMB signal

d_p = n_p + s_p

with correlations

D_pp' = < d_p d_p' > = N_pp' + S_pp'

and - assuming a uniform prior - probability distribution

P(d|C) ∝ exp( - d^T D^-1 d / 2 ) / √(det D)
Power Spectrum - Formalism (II)
Assuming azimuthal symmetry of the CMB, its pixel-pixel correlations depend only on the angle between the pair.
3 components (T, E, B) give 6 spectra (3 auto: TT, EE, BB + 3 cross: TE, TB, EB).
Theory predicts C_l^TB = C_l^EB = 0, but this should still be calculated/confirmed (test of systematics).
Power Spectrum - Formalism (III)
Eg: temperature signal correlations:

S_pp' = Σ_l [(2l+1)/4π] B_l^2 C_l P_l(cos θ_pp')

with beam+pixel window B_l.
For binned multipoles, C_l = C_b for all l in bin b.
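The signal correlation S_pp' = Σ_l [(2l+1)/4π] B_l^2 C_l P_l(cos θ) is directly computable with numpy's Legendre series. This sketch uses an invented toy spectrum and Gaussian beam rather than real CMB values:

```python
import numpy as np
from numpy.polynomial import legendre

# Temperature signal correlation as a Legendre series in cos(theta).
lmax = 64
ell = np.arange(lmax + 1)
C_l = np.zeros(lmax + 1)
C_l[2:] = 1.0 / (ell[2:] * (ell[2:] + 1))            # toy CMB-like spectrum
B_l = np.exp(-0.5 * ell * (ell + 1) * 0.05 ** 2)     # Gaussian beam+pixel window

coeffs = (2 * ell + 1) / (4 * np.pi) * B_l ** 2 * C_l

# Pixel-pair separations (radians) -> correlations via P_l(cos theta)
theta = np.array([0.0, 0.1, 0.5, 1.0, np.pi])
S = legendre.legval(np.cos(theta), coeffs)           # correlation peaks at theta=0
```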
Power Spectrum - Formalism (IV)
No analytic likelihood maximization - use iterative search.
Newton-Raphson requires two derivatives wrt bin powers:

dL/dC_b = [ d^T D^-1 (dS/dC_b) D^-1 d - Tr(D^-1 dS/dC_b) ] / 2

d2L/dC_b dC_b' ~ - F_bb' = - Tr[ D^-1 (dS/dC_b) D^-1 (dS/dC_b') ] / 2
Power Spectrum - Algorithms (I)
For each NR iteration we want to solve the system:

δC_b = F_bb'^-1 dL/dC_b'

Naive operation count: O(N_b N_p^3), with 8 N_p^2 bytes per matrix.
Eg. (2 x 10^4) x (6 x 10^8)^3 ~ 4 x 10^30 operations/iteration & 8 x (6 x 10^8)^2 ~ 3 x 10^18 bytes/matrix for Planck !
Power Spectrum - Algorithms (II)
a) Exploiting the signal derivative matrices' structures
- reduces the operation count by up to a factor of 10.
- increases the memory needed from 2 to 3 matrices.
Power Spectrum - Algorithms (III)
b) Using PCG for D^-1 & a stochastic trace approximant for the dL/dC & d2L/dC2 terms.
Eg. 5 x 10^4 x (2 x 10^4) x 10^3 x (5 x 10^11) x 10 ~ 5 x 10^24 for Planck.
Power Spectrum - Algorithms (IV)
c) Abandoning maximum likelihood altogether !
Eg. Take the SHT of the noisy, cut-sky map and build a pseudo-spectrum

C~_l = Σ_m |a~_lm|^2 / (2l+1)

Assume spectral independence & an invertible linear transfer function

< C~_l > = Σ_l' T_ll' C_l'^s + < C~_l^n >
Power Spectrum - Algorithms (V)
Now use Monte-Carlo simulations of the experiment's observations of signal+noise & noise-only to determine the transfer matrix T_ll' & the noise spectrum C_l^n, and hence the signal spectrum C_l^s.
Scales as the number of realizations times the per-realization simulation & mapping cost.
Eg. 10^4 x 30 x 5 x 10^11 x 10 ~ 10^18 for Planck.
And provided we simulate time-ordered data, any other abuses (filtering etc) can be included in the Monte Carlo.
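The Monte-Carlo logic can be illustrated in one dimension, with the FFT standing in for the spherical transform and a binary mask for the sky cut. Everything here - spectrum, mask, noise level - is an invented toy, and only the diagonal of T_ll' is estimated:

```python
import numpy as np

# 1D periodic toy of the MASTER-style Monte-Carlo pipeline.
rng = np.random.default_rng(6)
n, n_mc = 128, 100
mask = np.ones(n)
mask[: n // 4] = 0.0                                 # "cut sky"
true_cl = 1.0 / (1.0 + np.arange(n // 2 + 1))        # toy signal spectrum

def draw_signal():
    # white stream filtered to the toy spectrum
    return np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * np.sqrt(true_cl), n)

def pseudo_cl(x):
    return np.abs(np.fft.rfft(x * mask)) ** 2 / n    # masked power spectrum

# Monte Carlo: mean pseudo-spectra of signal-only and noise-only simulations
sig = np.mean([pseudo_cl(draw_signal()) for _ in range(n_mc)], axis=0)
noise_bias = np.mean([pseudo_cl(0.2 * rng.standard_normal(n))
                      for _ in range(n_mc)], axis=0)

T = sig / true_cl                                    # diagonal transfer estimate
# Debias an "observed" signal+noise stream
obs = pseudo_cl(draw_signal() + 0.2 * rng.standard_normal(n))
cl_hat = (obs - noise_bias) / T
```

In the same spirit, any filtering or other time-domain processing applied to the simulations is automatically folded into T and the noise bias.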
Power Spectrum - Implementation (I)
Even with supercomputers, full maximum likelihood analysis is unfeasible for more than O(10^5) pixels.
However, it is still useful for
(a) low-resolution all-sky maps (eg. for low multipoles where Monte Carlo methods are less reliable).
(b) high-resolution cut-sky maps (eg. for consistency & systematics cross-checks).
(c) hierarchical resolution/sky-fraction analyses.
Power Spectrum - Implementation (II)
(i) Build the dS/dC_b matrices.
(ii) Make an initial guess C_b.
(iii) Calculate D = N + Σ_b C_b dS/dC_b & invert.
(iv) For each b, calculate W_b = D^-1 dS/dC_b.
(v) For each b, calculate dL/dC_b ~ d^T W_b z - Tr[W_b], with z = D^-1 d.
(vi) For each b, b', calculate F_bb' ~ Tr[W_b W_b'].
(vii) Calculate the new C_b += F_bb'^-1 dL/dC_b'.
(viii) Iterate to convergence.
Using as many processors as possible, as efficiently as possible.
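Steps (i)-(viii) collapse nicely in a one-bin toy: estimating the power C of a white signal on top of known white noise. The matrices are dense and tiny, purely for illustration:

```python
import numpy as np

# One-bin Newton-Raphson spectral estimation on a toy white-signal model.
rng = np.random.default_rng(7)
n_pix = 800
C_true, sigma_n = 2.0, 1.0
d = rng.standard_normal(n_pix) * np.sqrt(C_true + sigma_n ** 2)

dSdC = np.eye(n_pix)                        # (i) white signal: S = C * I
N = sigma_n ** 2 * np.eye(n_pix)

C = 1.0                                     # (ii) initial guess
for _ in range(10):                         # (viii) iterate to convergence
    D = N + C * dSdC                        # (iii)
    Dinv = np.linalg.inv(D)
    W = Dinv @ dSdC                         # (iv)
    z = Dinv @ d
    dLdC = 0.5 * (d @ W @ z - np.trace(W))  # (v)
    F = 0.5 * np.trace(W @ W)               # (vi)
    step = dLdC / F                         # (vii)
    C += step
    if abs(step) < 1e-10:
        break
```

For this diagonal case Newton-Raphson lands on the ML solution C = |d|^2/N_pix - sigma_n^2 in essentially a single step; correlated signal and noise make the iteration genuinely necessary.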
Power Spectrum - Implementation (III)
The limiting calculation is

W_b = D^-1 dS/dC_b  for each bin b

so it sets the constraints for the overall implementation.
Maximum efficiency requires minimum communication, so keep the data as dense as possible on the processors: 3 matrices fill the available memory.
Check the other steps - none requires more than 3 matrices. Out-of-core implementation, so I/O efficiency is critical.
Power Spectrum - Implementation (IV)
Dense data requirement sets the processor count:

#PE ~ 3 x 8 x N_p^2 / bytes per processor

Eg. 3 x 8 x (10^5)^2 / 10^9 = 240, but we want to scale to very large systems.
Each bin multiplication is independent - divide the processors into gangs & do many W_b simultaneously.
Derivative terms are bin-parallel too:
- dL/dC_b is trivial.
- d2L/dC_b dC_b' requires some load-balancing.

Aside : Fisher Matrix (I)

Use Tr[AB] = Σ_ij A_ij B_ji = Σ_ij (A^T)_ji B_ji

For all my gang's b
  Read W_b
  Transpose -> W^T_b
  Multiply W^T_b & W_b -> F_bb
  For all b' > b
    Read W_b' over W_b
    Multiply W^T_b & W_b' -> F_bb'

2 matrices in memory; N_b(N_b+1)/2 reads.

Aside : Fisher Matrix (II)

For all my gang's b
  Read W_b
  Transpose -> W^T_b
  Multiply W^T_b & W_b -> F_bb
  Read W_b+1 over W_b
  Multiply W^T_b & W_b+1 -> F_b,b+1
  Transpose -> W^T_b+1
  For all b' > b+1
    Read W_b' over W_b+1
    Multiply W^T_b & W_b' -> F_bb'
    Multiply W^T_b+1 & W_b' -> F_b+1,b'

3 matrices in memory; N_b(N_b+2)/4 reads.
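The Tr[AB] identity behind both schedules is a one-liner to check in numpy: the trace of a product needs only an elementwise multiply against the transpose, never the O(n^3) product itself.

```python
import numpy as np

# Tr[A B] = sum_ij A_ij B_ji = sum_ij (A^T)_ij B_ij
rng = np.random.default_rng(8)
Wb = rng.standard_normal((50, 50))
Wb2 = rng.standard_normal((50, 50))

F_direct = np.trace(Wb @ Wb2)          # O(n^3): forms the full product
F_cheap = np.sum(Wb.T * Wb2)           # O(n^2): elementwise with the transpose
```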
Power Spectrum - Implementation (V)
All gangs need dS/dC_b & D^-1 first, but these terms do not bin-parallelize.
Either
- calculate them using all processors & re-map, or
- calculate them on 1 gang & leave the others idle.
Balance the re-mapping/idling cost against the multi-gang benefit; the optimal trade-off depends on the architecture & data parameters.
For large data with many bins & an efficient communication network, this sustains 50%+ of peak on thousands of processors.
Aside : Parameterizing Architectures
• Portability is crucial, but architectures differ.
• Identify critical components & parameterize limits.
• Tune codes to each architecture.
Eg. ScaLAPACK - block-cyclic matrix distribution is the most efficient data layout (mapped to the re-use pattern). The hardware limit is cache size - set the blocksize so that the requisite number of blocks fit & fill the cache.
Eg. Input/Output - how many processors can efficiently read/write simultaneously (contention vs idling) ?
Conclusions
• Maximum likelihood spectral estimation cannot handle high-resolution all-sky observations, but Monte Carlo methods can (albeit with questions at low-l).
• For sufficiently cut-sky or low-resolution maps ML methods are feasible, and allow complete error-tracking.
• Dense matrix-matrix kernels achieve 50%+ of peak performance (cf. 5% for FFT & SHT).
• Inefficient implementation of data-delivery can reduce these by an order of magnitude or more; re-use each data unit as much as possible before moving on to the next.
• Things are only going to get harder !