Transcript of: Advanced Lectures on Bayesian Analysis (Heavens_Lecture_5_GLM.pdf)

Page 1

Advanced Lectures on Bayesian Analysis

Alan Heavens

Imperial Centre for Inference and Cosmology (ICIC), Imperial College, London

[email protected]

November 23, 2016

Page 2

Overview

1 General linear models

2 Wiener filtering

3 Messenger Fields

4 Further Reading

Page 3

General linear models

Many problems are linear, in the sense that the measured data are linear combinations of the parameters of interest, i.e.

y = Ax + n,

where A is a matrix and n is noise.

Note that A may not be square (and hence not invertible).

Page 4

Map making

An example of this is map-making in the CMB. x would represent the pixel temperatures, y is the ‘time-ordered data’, and A is a very sparse matrix of 1s and 0s, being 1 if the telescope is pointing at the pixel and zero otherwise.

The model can be extended so that x can contain anything else on which the data depend linearly, which might involve calibration uncertainties or systematic effects.
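As a concrete illustration, here is a minimal sketch of this forward model in Python (the sizes, noise level and random pointing are invented for illustration; this is not code from the lecture):

```python
import numpy as np
from scipy import sparse

# Toy map-making setup: x holds pixel temperatures, A is a sparse pointing matrix
# with a single 1 per time sample, n is white noise, and y is the time-ordered data.
rng = np.random.default_rng(0)
n_pix, n_samp, sigma = 100, 5000, 0.5

x_true = rng.normal(size=n_pix)                          # underlying pixel map
hits = rng.integers(0, n_pix, size=n_samp)               # pixel observed at each time step
A = sparse.csr_matrix((np.ones(n_samp), (np.arange(n_samp), hits)),
                      shape=(n_samp, n_pix))
y = A @ x_true + sigma * rng.normal(size=n_samp)         # y = Ax + n
```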

Page 5

Generalised Linear Models

From a Bayesian perspective, we are most interested in the posterior distribution of x, given the data y, but let us think about making a map, i.e. we want an estimator for x, given the data.

We further assume that we know the noise covariance matrix N, i.e.:

⟨n⟩ = 0;   ⟨n n^T⟩ ≡ N.

N will have off-diagonal terms if the noise is correlated.

Later we also assume that we know the signal power spectrum or, equivalently, correlation function:

⟨x⟩ = 0;   ⟨x n^T⟩ = 0;   ⟨x x^T⟩ ≡ S.

Page 6

Generalised Linear Models

We are after the posterior (conditional on A, assumed known). If we assume nothing about the signal covariance, we want

p(x|y, N) ∝ p(y|x, N) p(x|N).

Since the signal is not dependent on the noise properties, p(x|N) = p(x), and we take it to be uniform. Hence, marginalising over the noise n,

p(x|y, N) ∝ p(y|x, N) ∝ ∫ p(y, n|x, N) dn ∝ ∫ p(y|x, n) p(n|N) dn.

In the last integral, we use the linear model, p(y|x, n) = δ(y − Ax − n), so

p(x|y, N) ∝ p(n = y − Ax | N) ∝ exp[−(1/2) (y − Ax)^T N^{-1} (y − Ax)].

The last equation applies if the noise is Gaussian, p(n|N) = exp(−n^T N^{-1} n / 2) / √|2πN|.

Page 7

Generalised Linear Models

p(x|y, N) ∝ exp[−(1/2) (y − Ax)^T N^{-1} (y − Ax)].

The maximum (i.e. the maximum likelihood [ML] estimate) of this distribution is given by differentiating w.r.t. an element of x, yielding two identical terms that give

A^T N^{-1} (y − Ax) = 0,

which we solve to give the ML estimate:

x_ML = W y,

where

W = (A^T N^{-1} A)^{-1} A^T N^{-1}.
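As a numerical sketch (a toy problem with invented sizes and white noise; not code from the lecture), the estimator can be applied directly:

```python
import numpy as np

# Minimal sketch of the ML (generalised least squares) map estimate:
#   x_ML = (A^T N^-1 A)^-1 A^T N^-1 y
rng = np.random.default_rng(1)
n_pix, n_samp, sigma = 50, 2000, 0.3

x_true = rng.normal(size=n_pix)
A = np.zeros((n_samp, n_pix))
A[np.arange(n_samp), rng.integers(0, n_pix, n_samp)] = 1.0   # toy pointing matrix
y = A @ x_true + sigma * rng.normal(size=n_samp)             # y = Ax + n

N_inv = np.eye(n_samp) / sigma**2                            # N^-1 for white noise
W = np.linalg.solve(A.T @ N_inv @ A, A.T @ N_inv)            # W = (A^T N^-1 A)^-1 A^T N^-1
x_ml = W @ y                                                 # ML estimate of the map
```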

Page 8

Generalised Linear Models

x_ML = W y;   W = (A^T N^{-1} A)^{-1} A^T N^{-1}.

There are several things to note about this estimate:

First, WA = I, which means that the error in x is independent of the value of the field:

ε ≡ x_ML − x
ε = W(Ax + n) − x
ε = (WA − I)x + W n = W n.

Second, it minimizes χ² = (y − Ax)^T N^{-1} (y − Ax) (evidently).

Third (exercise), it minimizes the mean square error ⟨|ε|²⟩ subject to the constraint WA = I.

Page 9

These are potentially desirable properties, so from a frequentist point of view, this is a useful estimator. For Gaussian n, it also happens to be the ML estimator of x.

You may like to show that the noise covariance in the map is

⟨ε ε^T⟩ = (A^T N^{-1} A)^{-1}.
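As a quick numerical sanity check (a sketch with an invented small problem; not from the lecture), this covariance can be verified by Monte Carlo:

```python
import numpy as np

# Monte Carlo check that the map noise covariance is <eps eps^T> = (A^T N^-1 A)^-1.
rng = np.random.default_rng(5)
n_pix, n_samp, sigma, n_real = 5, 200, 0.4, 20000

A = np.zeros((n_samp, n_pix))
A[np.arange(n_samp), rng.integers(0, n_pix, n_samp)] = 1.0
N_inv = np.eye(n_samp) / sigma**2
W = np.linalg.solve(A.T @ N_inv @ A, A.T @ N_inv)

eps = (W @ (sigma * rng.normal(size=(n_samp, n_real)))).T    # eps = W n for many noise draws
emp_cov = eps.T @ eps / n_real                               # empirical <eps eps^T>
pred_cov = np.linalg.inv(A.T @ N_inv @ A)                    # predicted (A^T N^-1 A)^-1
print(np.max(np.abs(emp_cov - pred_cov)))                    # should be close to zero
```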

Page 10

Wiener filtering

So far we have not exploited any knowledge of the two-point properties of the signal.

Let us suppose that we know S.

Then we can compute the posterior for x given y, N and S as well.

The treatment is similar:

p(x|y, N, S) ∝ p(y|x, N, S) p(x|N, S).

Since the signal is not dependent on the noise properties, p(x|N, S) = p(x|S), but now we assume it is Gaussian:

p(x|S) = exp[−(1/2) x^T S^{-1} x] / √|2πS|.

Again, we use the linear model, p(y|x, n) = δ(y − Ax − n), and marginalise over n, so

p(x|y, N, S) ∝ exp[−(1/2) (y − Ax)^T N^{-1} (y − Ax) − (1/2) x^T S^{-1} x].

Page 11

Generalised Linear Models

p(x|y, N, S) ∝ exp[−(1/2) (y − Ax)^T N^{-1} (y − Ax) − (1/2) x^T S^{-1} x].

The quadratic form in x in the exponent (multiplied by −2) can be manipulated to

x^T (A^T N^{-1} A + S^{-1}) x − x^T A^T N^{-1} y − y^T N^{-1} A x + y^T N^{-1} y,

and we can complete the square, giving (up to terms independent of x)

(x − x_WF)^T (A^T N^{-1} A + S^{-1}) (x − x_WF),

where ensuring agreement with the terms linear in x requires the Wiener-filtered map to be

x_WF = W_WF y,   W_WF = (A^T N^{-1} A + S^{-1})^{-1} A^T N^{-1}.

This is the maximum posterior estimate of x, given Gaussian noise and signal.
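A minimal numerical sketch of this filter, side by side with the ML estimate (the toy signal covariance, sizes and noise level are invented for illustration):

```python
import numpy as np

# Wiener filter x_WF = (A^T N^-1 A + S^-1)^-1 A^T N^-1 y, compared with the ML map.
rng = np.random.default_rng(2)
n_pix, n_samp, sigma = 50, 500, 1.0

S = np.diag(1.0 / (1.0 + np.arange(n_pix)))                   # toy signal covariance (prior)
x_true = rng.multivariate_normal(np.zeros(n_pix), S)
A = np.zeros((n_samp, n_pix))
A[np.arange(n_samp), rng.integers(0, n_pix, n_samp)] = 1.0    # toy pointing matrix
y = A @ x_true + sigma * rng.normal(size=n_samp)

N_inv = np.eye(n_samp) / sigma**2
F = A.T @ N_inv @ A                                           # A^T N^-1 A
x_ml = np.linalg.solve(F, A.T @ N_inv @ y)                    # ML estimate
x_wf = np.linalg.solve(F + np.linalg.inv(S), A.T @ N_inv @ y) # Wiener-filtered map
# The Wiener solution is pulled towards the prior mean (zero), suppressing noisy peaks.
```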

Page 12

Generalised Linear Models

From an estimator point of view, this also minimizes ⟨|ε|²⟩, without the condition WA = I.

The reconstruction error is no longer independent of x: the filter tends to suppress peaks.

Note that from a Bayesian perspective, the complete output of the experiment is the full posterior, not an estimator.

If one wants to do inference with the map, one should also include the Gaussian uncertainty around the Wiener filter solution, with covariance matrix

C_WF = (A^T N^{-1} A + S^{-1})^{-1}.

We can draw samples from the posterior for x, since it is a multivariate Gaussian, N(x_WF, C_WF).
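For instance (a sketch assuming x_wf, F = A^T N^-1 A and S from the previous snippet; the function name and draw count are made up), samples can be drawn with a Cholesky factor of C_WF:

```python
import numpy as np

def sample_posterior(x_wf, F, S_inv, n_draws, rng):
    """Draw n_draws samples from N(x_wf, (F + S_inv)^-1) via a Cholesky factor."""
    C_wf = np.linalg.inv(F + S_inv)          # posterior covariance C_WF
    L = np.linalg.cholesky(C_wf)             # C_WF = L L^T
    u = rng.normal(size=(n_draws, len(x_wf)))
    return x_wf + u @ L.T                    # each row is one posterior sample

# Example usage (with the quantities from the Wiener-filter sketch above):
# samples = sample_posterior(x_wf, F, np.linalg.inv(S), 1000, np.random.default_rng(3))
```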

Page 13

Wiener filtered images

Figure: Wiener-filtered map (d) from Pogrebnyak & Lukin (2003).

Page 14

Summary

There are powerful linear algebra tools for linear models; a lot is known.

Solutions that are most probable from a Bayesian perspective coincide with estimator-based solutions that are optimised subject to certain conditions, provided that the fields are Gaussian.

From a Bayesian point of view, the most probable (maximum a posteriori, or MAP) solution is not the whole story; we want and need the full posterior.

For Gaussian fields we can sample from the posterior if we can compute S^{-1} and (A^T N^{-1} A)^{-1}.

Page 15

Messenger Fields

Sometimes we do not know, or cannot compute, some of the conditional distributions in Bayesian hierarchical models (BHMs).

One elegant trick is Data Augmentation, where we introduce additional latent variables with conditional distributions that we can sample from.

These extra variables are sometimes called Messenger Fields

Example: Gaussian fields. We know that the posterior for the field is a Gaussian field, with mean given by the Wiener-filtered map and known covariance (we take the response matrix to be A = I for simplicity):

x ∼ N(x_WF, C_WF),

x_WF = (N^{-1} + S^{-1})^{-1} N^{-1} y,   C_WF = (N^{-1} + S^{-1})^{-1}.

Page 16

Messenger Fields (Elsner, Wandelt 2012)

Typical cosmology case: maps are large, so N^{-1} and S^{-1} are only computable if they are diagonal.

N is typically diagonal in pixel space (uncorrelated noise)

S is typically diagonal in the Fourier basis (statistical homogeneity or isotropy)

There is no basis in which N and S are both diagonal (unless N ∝ I).

Conclusion: we cannot compute x_WF = (N^{-1} + S^{-1})^{-1} N^{-1} y or C_WF = (N^{-1} + S^{-1})^{-1}.

Solution: make the problem harder: introduce another large (fictitious) map, the Messenger Field t. A sketch of the resulting iteration is given below.

t carries the white-noise contribution to N, so its covariance matrix T ∝ I, which is diagonal in both bases.
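A minimal sketch of that iteration for A = I (following the scheme described on these slides, cf. Elsner & Wandelt 2012; the 1-D toy field, power spectrum, noise levels and iteration count are all invented, and T is taken slightly below the minimum noise variance so that the non-white part stays positive):

```python
import numpy as np

# Messenger-field iteration for the Wiener filter, alternating between bases:
# pixel space, where Nbar and T are diagonal, and Fourier space, where S and T are diagonal.
rng = np.random.default_rng(4)
n_pix = 256

k = np.fft.rfftfreq(n_pix)                        # Fourier modes of a 1-D periodic map
S_k = 1.0 / (0.01 + k**2)                         # toy signal power spectrum (diagonal S)
noise_var = rng.uniform(0.5, 2.0, size=n_pix)     # diagonal N in pixel space

x_true = np.fft.irfft(np.sqrt(S_k) * rng.normal(size=k.size), n=n_pix, norm="ortho")
y = x_true + np.sqrt(noise_var) * rng.normal(size=n_pix)   # data y = x + n

tau = 0.9 * noise_var.min()                       # T = tau * I, diagonal in both bases
Nbar = noise_var - tau                            # N = Nbar + T, with Nbar > 0 here

s = np.zeros(n_pix)
for _ in range(200):
    # Pixel space: messenger field given the data and the current signal estimate
    t = (y / Nbar + s / tau) / (1.0 / Nbar + 1.0 / tau)
    # Fourier space: signal given the messenger field (orthonormal FFT keeps T = tau * I)
    t_k = np.fft.rfft(t, norm="ortho")
    s = np.fft.irfft(S_k / (S_k + tau) * t_k, n=n_pix, norm="ortho")
# s now approximates the Wiener-filtered map without inverting (N^-1 + S^-1) in any one basis.
```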

Page 17

Forward Model or Generative Model

Notation change for this diagram: (s → x; d → y; C → S).

[Figure: hierarchical-model diagrams. Original model: nodes P(C), C, P(s|C), s, P(d|s, N), N, d. Augmented model with the messenger field: nodes P(t|s, T), T, t, P(d|t, N̄), N̄, d.]

Cannot sample from s conditioned on the data d because we cannot compute the inverses of C and N in the same basis.

Page 18

Messenger Fields

[Figure: the same pair of hierarchical-model diagrams, with the messenger field t inserted between s and d: nodes P(C), C, P(s|C), s, P(t|s, T), T, t, P(d|t, N̄), N̄, d.]

N̄ is the non-white noise. The top half of the hierarchy is diagonal in Fourier space (C, T); the bottom half is diagonal in pixel space (N̄, T).

Page 19

Messenger Fields in Weak lensing

Alsing, AFH et al. (2016a). ∼130,000 parameters; Gibbs sampling.

Page 20

Messenger Fields in Weak lensing

Alsing, AFH et al. (2016b). ∼130,000 parameters; Gibbs sampling.

Page 21

Mega BHM, including more levels of the hierarchy

Page 22

Summary

There are powerful linear algebra tools for linear models; a lot is known.

Solutions that are most probable from a Bayesian perspective coincide with estimator-based solutions that are optimised subject to certain conditions, provided that the fields are Gaussian.

From a Bayesian point of view, the most probable (maximum a posteriori, or MAP) solution is not the whole story; we want and need the full posterior.

For Gaussian fields we can sample from the posterior if we can compute S^{-1} and (A^T N^{-1} A)^{-1}.

If we cannot, introducing extra latent variables (data augmentation; Messenger Fields) may allow a solution.

Very high-dimensional parameter spaces (e.g. ∼10^6 parameters) can in some cases be sampled with Gibbs sampling or HMC.

Page 23

Further Reading

Bayes in the Sky (Roberto Trotta, arXiv:0803.4089)

Bayesian Data Analysis (Andrew Gelman et al., CRC Press)

Information Theory, Inference and Learning Algorithms (David MacKay, CUP)

Berkeley course on Bayesian Modeling and Inference (Michael I. Jordan). This is an excellent resource, on which I have drawn for some bits of the material. Making these publicly available is acknowledged and appreciated.

http://www.cs.berkeley.edu/~jordan/courses/260-spring10/lectures/
