
Approximate Bayesian Computation for Astrostatistics

Jessi Cisewski, Department of Statistics

Yale University

October 24, 2016

SAMSI Undergraduate Workshop

Our goals

Introduction to Bayesian methods

Likelihoods, priors, posteriors

Learn our ABCs...

Approximate Bayesian Computation

Image from http://livingstontownship.org/lycsblog/?p=241


Suppose we randomly select a sample of n women and measure their heights: x_1, x_2, x_3, ..., x_n

Let’s assume this follows a Normal distribution: X ∼ N(µ, σ)

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Stellar Initial Mass Function


Probability density function (pdf) → the function f(x, µ, σ), with µ fixed and x variable

$$f(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: Normal density over Height (in), centered at µ; each observation is a draw from this density.]

Likelihood → the function f(x, µ, σ), with µ variable and x fixed

$$f(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: the likelihood as a function of µ.]


Consider a random sample of size n (assuming independence, and a known σ): X_1, ..., X_n ∼ N(3, 2)

$$f(x, \mu, \sigma) = f(x_1, \ldots, x_n, \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

[Figure: normalized likelihood of µ for n = 50, 100, 250, 500; the curves concentrate around µ as n grows.]
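To make the figure concrete, here is a minimal sketch (my code, not from the talk; the data are simulated here) that evaluates the normalized likelihood of µ on a grid and shows it tightening around the true value µ = 3 as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                                   # known standard deviation
mu_grid = np.linspace(1.0, 5.0, 400)

for n in (50, 100, 250, 500):
    x = rng.normal(3.0, sigma, size=n)        # X_1, ..., X_n ~ N(3, 2)
    # log-likelihood on the grid (constants not involving mu drop out)
    loglik = -0.5 * ((x[None, :] - mu_grid[:, None]) ** 2).sum(axis=1) / sigma**2
    lik = np.exp(loglik - loglik.max())       # stabilize before exponentiating
    lik /= lik.sum() * (mu_grid[1] - mu_grid[0])   # normalize to integrate to 1
    print(n, mu_grid[np.argmax(lik)])         # the peak approaches mu = 3
```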


Likelihood Principle

All of the information in a sample is contained in the likelihood function, a density or distribution function.

The data are modeled by a likelihood function.

How do we infer µ? Or any unknown parameter(s) θ?


Maximum likelihood estimation

The parameter value, θ̂, that maximizes the likelihood:

$$\hat{\theta} = \arg\max_{\theta}\, f(x_1, \ldots, x_n, \theta)$$

[Figure: the likelihood of µ with the MLE marked.]

$$\max_{\mu} f(x_1, \ldots, x_n, \mu, \sigma) = \max_{\mu} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

Hence, $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$.
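Filling in the intermediate step, which the slide skips: maximize the log-likelihood instead of the likelihood.

$$\log L(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2, \qquad \frac{d}{d\mu}\log L(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \;\Rightarrow\; \hat{\mu} = \bar{x}$$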


Bayesian framework

Suppose θ is our parameter(s) of interest

Classical or Frequentist methods for inference consider θ to be fixed and unknown
→ performance of methods is evaluated by repeated sampling
→ considers all possible data sets

Bayesian methods consider θ to be random
→ considers only the observed data set and prior information


Posterior distribution

$$\pi(\theta \mid \overbrace{x}^{\text{Data}}) = \frac{\overbrace{f(x \mid \theta)}^{\text{Likelihood}} \cdot \overbrace{\pi(\theta)}^{\text{Prior}}}{f(x)} = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_{\Theta} f(x \mid \theta)\,\pi(\theta)\,d\theta} \propto f(x \mid \theta)\,\pi(\theta)$$

The prior distribution allows you to “easily” incorporate your beliefs about the parameter(s) of interest

The posterior is a distribution on the parameter space given the observed data


Data: y_1, ..., y_4 ∼ N(µ = 3, σ = 2), ȳ = 1.278

Prior: N(µ0 = −3, σ0 = 5)

Posterior: N(µ1 = 1.114, σ1 = 0.981)
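These numbers follow from the standard Normal–Normal conjugate update; a quick sketch to verify them (the helper name is mine):

```python
import math

def normal_posterior(ybar, n, sigma, mu0, sigma0):
    """Posterior N(mu1, sigma1) for mu, given ybar from N(mu, sigma) and prior N(mu0, sigma0)."""
    prec = 1 / sigma0**2 + n / sigma**2                   # posterior precision
    mu1 = (mu0 / sigma0**2 + n * ybar / sigma**2) / prec  # precision-weighted mean
    return mu1, math.sqrt(1 / prec)

print(normal_posterior(1.278, 4, 2, -3, 5))   # ~(1.114, 0.981), this slide
print(normal_posterior(1.278, 4, 2, -3, 1))   # ~(-0.861, 0.707), next slide
```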

[Figure: likelihood, prior, and posterior densities for µ.]

Data: y_1, ..., y_4 ∼ N(µ = 3, σ = 2), ȳ = 1.278

Prior: N(µ0 = −3, σ0 = 1)

Posterior: N(µ1 = −0.861, σ1 = 0.707)

[Figure: likelihood, prior, and posterior densities for µ.]

Data: y_1, ..., y_200 ∼ N(µ = 3, σ = 2), ȳ = 2.999

Prior: N(µ0 = −3, σ0 = 1)

Posterior: N(µ1 = 2.881, σ1 = 0.140)

[Figure: likelihood, prior, and posterior densities for µ.]

Prior distribution

The prior distribution allows you to “easily” incorporate your beliefs about the parameter(s) of interest

If one has a specific prior in mind, then it fits nicely into the definition of the posterior

But how do you go from prior information to a prior distribution?

And what if you don’t actually have prior information?

And what if you don’t know the likelihood??


Approximate Bayesian Computation

“Likelihood-free” approach to approximating π(θ | x_obs), i.e. f(x_obs | θ) is not specified

Proceeds via simulation of the forward process

Why would we not know f(x_obs | θ)?

1. Physical model too complex to write as a closed-form function of θ

2. Strong dependency in the data

3. Observational limitations

ABC in Astronomy: Ishida et al. (2015); Akeret et al. (2015); Weyant et al. (2013); Schafer and Freeman (2012); Cameron and Pettitt (2012)


Basic ABC algorithm

For the observed data xobs and prior π(θ):

Algorithm∗

1. Sample θ_prop from the prior π(θ)

2. Generate x_prop from the forward process f(x | θ_prop)

3. Accept θ_prop if x_obs = x_prop

4. Return to step 1

∗Introduced in Pritchard et al. (1999) (population genetics)


Step 3: Accept θ_prop if x_obs = x_prop

Waiting for proposals such that x_obs = x_prop would be computationally prohibitive

Instead, accept proposals with ∆(x_obs, x_prop) ≤ ε, for some distance ∆ and some tolerance threshold ε

When x is high-dimensional, ε has to be made too large in order to keep the acceptance probability reasonable.

Instead, reduce the dimension by comparing summaries S(x_prop) and S(x_obs)


Gaussian illustration

Data x_obs consists of 25 iid draws from Normal(µ, 1)

Summary statistic S(x) = x̄

Distance function ∆(S(x_prop), S(x_obs)) = |x̄_prop − x̄_obs|

Tolerance ε = 0.50 and 0.10

Prior π(µ) = Normal(0,10)
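Putting the pieces together, a minimal rejection-ABC sketch of this illustration (my code; the “observed” data are simulated here with an arbitrary true µ = 0.7):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, eps = 25, 1000, 0.1
x_obs = rng.normal(0.7, 1.0, size=n)           # observed data, mu unknown to us
s_obs = x_obs.mean()                           # summary statistic S(x) = xbar

accepted = []
while len(accepted) < N:
    mu_prop = rng.normal(0, 10)                # 1. sample from the prior N(0, 10)
    x_prop = rng.normal(mu_prop, 1.0, size=n)  # 2. forward-simulate a data set
    if abs(x_prop.mean() - s_obs) <= eps:      # 3. accept if summaries are close
        accepted.append(mu_prop)

print(np.mean(accepted), np.std(accepted))     # ABC posterior mean and sd for mu
```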


Gaussian illustration: posteriors for µ

[Figure: true posterior vs. ABC posterior for µ (N = 1000, n = 25), at tolerance ε = 0.5 (left) and ε = 0.1 (right), with the input value marked.]

How to pick a tolerance, ε?

Instead of starting the ABC algorithm over with a smaller tolerance (ε), decrease the tolerance and use the already sampled particle system as a proposal distribution rather than drawing from the prior distribution.

Particle system: (1) retained sampled values, (2) importance weights

Beaumont et al. (2009); Del Moral et al. (2011); Bonassi and West (2015)
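A compact sketch of this sequential scheme for the Gaussian illustration, in the spirit of Beaumont et al. (2009) (my simplification, not code from the talk): at each stage, resample the previous particles, perturb with a Gaussian kernel of scale τ, and reweight by prior density over the kernel mixture.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 25, 1000
x_obs = rng.normal(0.7, 1.0, size=n)
s_obs = x_obs.mean()
eps_seq = [1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06]

def prior_pdf(mu):                              # prior N(0, 10)
    return np.exp(-0.5 * (mu / 10.0) ** 2) / (10.0 * np.sqrt(2 * np.pi))

particles, weights, tau = None, None, None
for eps in eps_seq:
    new = []
    while len(new) < N:
        if particles is None:
            mu = rng.normal(0, 10)              # first stage: draw from the prior
        else:                                   # later stages: move a past particle
            mu = rng.choice(particles, p=weights) + rng.normal(0, tau)
        if abs(rng.normal(mu, 1.0, size=n).mean() - s_obs) <= eps:
            new.append(mu)
    new = np.array(new)
    if particles is None:
        w = np.ones(N) / N
    else:                                       # importance weight: prior / proposal mixture
        kern = np.exp(-0.5 * ((new[:, None] - particles[None, :]) / tau) ** 2)
        w = prior_pdf(new) / (kern * weights[None, :]).sum(axis=1)
        w /= w.sum()
    particles, weights = new, w
    m = np.average(particles, weights=weights)
    tau = np.sqrt(2 * np.average((particles - m) ** 2, weights=weights))
    print(eps, m)                               # weighted posterior mean per stage
```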


Gaussian illustration: sequential posteriors

[Figure: sequential ABC posterior for µ (N = 1000, n = 25) against the true posterior and the input value.]

Tolerance sequence ε_{1:10}: 1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06


Stellar Initial Mass Function


Stellar Initial Mass Function: the distribution of star masses after a star formation event within a specified volume of space

Molecular cloud → Protostars → Stars

Image: adapted from http://www.astro.ljmu.ac.uk


Broken power law (Kroupa, 2001)

$$\Phi(M) \propto M^{-\alpha_i}, \quad M_{1i} \leq M \leq M_{2i}$$

α_1 = 0.3 for 0.01 ≤ M/M_Sun∗ ≤ 0.08 [Sub-stellar]

α_2 = 1.3 for 0.08 ≤ M/M_Sun ≤ 0.50

α_3 = 2.3 for 0.50 ≤ M/M_Sun ≤ M_max

Many other models, e.g. Salpeter (1955); Chabrier (2003)

∗1 M_Sun = 1 Solar Mass (the mass of our Sun)


ABC for the Stellar Initial Mass Function

[Figure: histogram of masses (left) and of log10 masses (right).]

Image (left): Adapted from http://www.astro.ljmu.ac.uk


IMF Likelihood

Start with a power-law distribution: each star's mass is independently drawn from a power-law distribution with density

$$f(m) = \left(\frac{1-\alpha}{M_{\max}^{1-\alpha} - M_{\min}^{1-\alpha}}\right) m^{-\alpha}, \quad m \in (M_{\min}, M_{\max})$$

Then the likelihood is

$$L(\alpha \mid m_{1:n_{\rm tot}}) = \left(\frac{1-\alpha}{M_{\max}^{1-\alpha} - M_{\min}^{1-\alpha}}\right)^{n_{\rm tot}} \times \prod_{i=1}^{n_{\rm tot}} m_i^{-\alpha}$$

n_tot = total number of stars in the cluster
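Since f(m) has a closed-form CDF, the forward process can draw masses by inverting it. A small sketch (the helper name is mine):

```python
import numpy as np

def sample_powerlaw(n, alpha, m_min, m_max, rng):
    """Draw n masses with density proportional to m**(-alpha) on (m_min, m_max)."""
    a = 1.0 - alpha                                   # assumes alpha != 1
    u = rng.uniform(size=n)
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

rng = np.random.default_rng(3)
masses = sample_powerlaw(1000, 2.35, 2.0, 60.0, rng)  # e.g. the simulation settings below
```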


Observational limitations: aging

The lifecycle of a star depends on its mass → more massive stars die faster

Cluster age of τ Myr → only observe stars with masses < T_age ≈ τ^(−2/5) × 10^(8/5)

If the age is 30 Myr, then the aging cutoff is T_age ≈ 10 M_Sun

Then the likelihood is

$$L(\alpha \mid m_{1:n_{\rm obs}}, n_{\rm tot}) = \left(\frac{1-\alpha}{T_{\rm age}^{1-\alpha} - M_{\min}^{1-\alpha}}\right)^{n_{\rm obs}} \left(\prod_{i=1}^{n_{\rm obs}} m_i^{-\alpha}\right) \times P(M > T_{\rm age})^{\,n_{\rm tot}-n_{\rm obs}}$$

n_tot = # of stars in the cluster

n_obs = # of stars observed in the cluster

Image: http://scioly.org
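In log form this truncated likelihood is straightforward to evaluate; a sketch under the slide's assumptions (the function name is mine):

```python
import numpy as np

def loglik_aging(alpha, m_obs, n_tot, t_age, m_min, m_max):
    """log L(alpha | m_obs, n_tot) with the aging cutoff t_age, per the formula above."""
    a = 1.0 - alpha                                         # assumes alpha != 1
    n_obs = len(m_obs)
    log_norm = np.log(a / (t_age**a - m_min**a))            # normalizer on (m_min, t_age)
    p_dead = (m_max**a - t_age**a) / (m_max**a - m_min**a)  # P(M > t_age)
    return (n_obs * log_norm
            - alpha * np.log(m_obs).sum()
            + (n_tot - n_obs) * np.log(p_dead))
```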


Observational limitations: completeness

Completeness function:

$$P(\text{observing star} \mid m) = \begin{cases} 0, & m < C_{\min} \\[2pt] \dfrac{m - C_{\min}}{C_{\max} - C_{\min}}, & m \in [C_{\min}, C_{\max}] \\[2pt] 1, & m > C_{\max} \end{cases}$$

Probability of observing a particular star given its mass

Depends on the flux limit, stellar crowding, etc.

Image: NASA, J. Trauger (JPL), J. Westphal (Caltech)


Observational limitations: measurement error

Incorporating log-normal measurement error gives our final likelihood:

$$\begin{aligned}
L(\alpha \mid m_{1:n_{\rm obs}}, n_{\rm tot}) ={}& \left( P(M > T_{\rm age}) + \left(\frac{1-\alpha}{M_{\max}^{1-\alpha} - M_{\min}^{1-\alpha}}\right) \int_{C_{\min}}^{C_{\max}} M^{-\alpha} \left(1 - \frac{M - C_{\min}}{C_{\max} - C_{\min}}\right) dM \right)^{n_{\rm tot}-n_{\rm obs}} \\
&\times \prod_{i=1}^{n_{\rm obs}} \Bigg\{ \int_{2}^{T_{\rm age}} (2\pi\sigma^2)^{-\frac{1}{2}}\, m_i^{-1}\, e^{-\frac{1}{2\sigma^2}\left(\log(m_i) - \log(M)\right)^2} \left(\frac{1-\alpha}{M_{\max}^{1-\alpha} - M_{\min}^{1-\alpha}}\right) M^{-\alpha} \\
&\hspace{3em} \times \left( I\{M > C_{\max}\} + \left(\frac{M - C_{\min}}{C_{\max} - C_{\min}}\right) I\{C_{\min} \leq M \leq C_{\max}\} \right) dM \Bigg\}
\end{aligned}$$


[Figure: the IMF density (left) and the mass function after aging, completeness, and measurement error (right).]

Sample size = 1000 stars, [C_min, C_max] = [2, 4], σ = 0.25


Simulation Study: forward model

Draw from

$$f(m) = \left(\frac{1-\alpha}{60^{1-\alpha} - 2^{1-\alpha}}\right) m^{-\alpha}, \quad m \in (2, 60)$$

Aged 30 Myr

Observational completeness:

$$P(\text{obs} \mid m) = \begin{cases} 0, & m < 2 \\[2pt] \dfrac{m-2}{2}, & m \in [2, 4] \\[2pt] 1, & m > 4 \end{cases}$$

Uncertainty: log M = log m + 0.25η (with η ∼ N(0, 1))

Prior: α ∼ U[0, 6]∗
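A sketch of this forward model (one ABC proposal in, one simulated catalog out), reusing the sample_powerlaw helper from the earlier sketch:

```python
import numpy as np

def forward_model(alpha, n_stars, rng, m_min=2.0, m_max=60.0,
                  age_myr=30.0, c_min=2.0, c_max=4.0, sigma=0.25):
    """Simulate observed masses for a proposed IMF slope alpha."""
    m = sample_powerlaw(n_stars, alpha, m_min, m_max, rng)    # true masses
    t_age = age_myr ** (-2.0 / 5.0) * 10.0 ** (8.0 / 5.0)     # aging cutoff, ~10 Msun at 30 Myr
    m = m[m < t_age]                                          # more massive stars have died
    p_obs = np.clip((m - c_min) / (c_max - c_min), 0.0, 1.0)  # completeness function
    m = m[rng.uniform(size=m.size) < p_obs]                   # incompleteness thinning
    return m * np.exp(sigma * rng.normal(size=m.size))        # log M = log m + sigma * eta

rng = np.random.default_rng(4)
alpha_prop = rng.uniform(0, 6)                                # prior alpha ~ U[0, 6]
m_sim = forward_model(alpha_prop, 1000, rng)
```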


Simulation Study: summary statistics

We want to account for the following with our summary statistics and distance functions:

1. Shape of the observed Mass Function

$$\rho_1(m_{\rm sim}, m_{\rm obs}) = \left[\int \left\{f_{\log m_{\rm sim}}(x) - f_{\log m_{\rm obs}}(x)\right\}^2 dx\right]^{1/2}$$


2. Number of stars observed

$$\rho_2(m_{\rm sim}, m_{\rm obs}) = \left|1 - n_{\rm sim}/n_{\rm obs}\right|$$

m_sim = masses of the stars simulated from the forward model
m_obs = masses of the observed stars
n_sim = number of stars simulated from the forward model
n_obs = number of observed stars
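A sketch of these two distances (my implementation: a Gaussian KDE for the density of the log masses, with base-10 logs and a fixed grid, neither of which is specified on the slide):

```python
import numpy as np
from scipy.stats import gaussian_kde

def rho1(m_sim, m_obs):
    """L2 distance between density estimates of the log masses."""
    grid = np.linspace(-1.0, 2.0, 512)                 # log10-mass grid
    f_sim = gaussian_kde(np.log10(m_sim))(grid)
    f_obs = gaussian_kde(np.log10(m_obs))(grid)
    dx = grid[1] - grid[0]
    return np.sqrt(((f_sim - f_obs) ** 2).sum() * dx)  # Riemann sum of the integral

def rho2(m_sim, m_obs):
    """Relative mismatch in the number of stars."""
    return abs(1.0 - len(m_sim) / len(m_obs))
```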


Simulation Study

1. Draw n = 10³ stars

2. IMF slope α = 2.35 with M_min = 2 and M_max = 60

3. N = 10³ particles

4. T = 30 sequential time steps

[Figure: the observed mass function (density over mass).]


[Figure: kernel density estimates of the IMF (N = 1000, bandwidth 0.4971) and the observed MF (N = 510, bandwidth 0.5625).]

Sample size = 1000 stars, [C_min, C_max] = [2, 4], σ = 0.25


Simulation Study results

1. Draw n = 10³ stars

2. IMF slope α = 2.35 with M_min = 2 and M_max = 60

3. N = 10³ particles

4. T = 30 sequential time steps

[Figure: (A) the observed mass function on the log10 scale with the posterior median and a 95% credible band; (B) the initial, intermediate, and final ABC posteriors for α, with the true posterior and the input value.]


Summary

ABC can be a useful tool when data are too complex to define a reasonable likelihood,

E.g. the Stellar Initial Mass Function

Selection of good summary statistics is crucial for the ABC posterior to be meaningful

THANK YOU!


Bibliography I

Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), “Approximate Bayesian computation for forward modeling in cosmology,” Journal of Cosmology and Astroparticle Physics, 2015, 043.

Bate, M. R. (2012), “Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulation of star cluster formation,” Monthly Notices of the Royal Astronomical Society, 419, 3115–3146.

Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), “Adaptive approximate Bayesian computation,” Biometrika, 96, 983–990.

Bonassi, F. V. and West, M. (2015), “Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation,” Bayesian Analysis, 10, 171–187.

Cameron, E. and Pettitt, A. N. (2012), “Approximate Bayesian Computation for Astronomical Model Analysis: A Case Study in Galaxy Demographics and Morphological Transformation at High Redshift,” Monthly Notices of the Royal Astronomical Society, 425, 44–65.

Chabrier, G. (2003), “The galactic disk mass function: Reconciliation of the Hubble Space Telescope and nearby determinations,” The Astrophysical Journal Letters, 586, L133.

Ishida, E. E. O., Vitenti, S. D. P., Penna-Lima, M., Cisewski, J., de Souza, R. S., Trindade, A. M. M., Cameron, E., and Busti, V. C. (2015), “cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation,” Astronomy and Computing, 13, 1–11.

Kroupa, P. (2001), “On the variation of the initial mass function,” Monthly Notices of the Royal Astronomical Society, 322, 231–246.

Del Moral, P., Doucet, A., and Jasra, A. (2011), “An adaptive sequential Monte Carlo method for approximate Bayesian computation,” Statistics and Computing, 22, 1009–1020.

Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), “Population Growth of Human Y Chromosomes: A study of Y Chromosome Microsatellites,” Molecular Biology and Evolution, 16, 1791–1798.

Salpeter, E. E. (1955), “The luminosity function and stellar evolution.” The Astrophysical Journal, 121, 161.

Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1, Lecture Notes in Statistics, pp. 3–19.

Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), “Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty,” The Astrophysical Journal, 764, 116.
