Introductory Bayesian Analysis

Jaya M. Satagopan
Memorial Sloan-Kettering Cancer Center

April 2, 2008

Bayesian Inference

• Fit probability model to observed data

• Unknown parameters
  – Summarize using probability distribution

• Use prior information

Outline

• Distributions

• Bayes Rule
  – Example

• Prior and posterior distributions
  – Some models
  – Examples: Binomial, Normal

• Hierarchical Mixture model

Part I

• Joint, conditional and marginal distributions

• Posterior density and Bayes Rule

• Examples

General Notations

• Outcome: y
  – Disease status (y = 0, 1)
  – Tumor size

• Parameter: θ
  – Mutation carrier status (θ = 0, 1)
  – Effect of treatment on tumor size

• Covariate: X
  – Treatment dosage

Distributions

• Model: joint probability distribution (density of y and θ)
  – p(y, θ) = p(θ) p(y|θ)

• p(θ) = prior density of θ

• p(y|θ) = conditional (sampling) density of y given θ

Example (Gelman et al., 1995)

• Hemophilia: X-chromosome linked recessive disorder

• Affected males (single copy of X chromosome)

• Unaffected female (two copies of X chromosome)

Example

• A single male
  – y = hemophilia status
  – θ = “bad” hemophilia gene carrier status

• p(θ) = probability of “bad” hemophilia gene

• p(y|θ) = probability of hemophilia given gene carrier status

Bayes Rule

• Obtain posterior density from prior and sampling densities.

• p(y) = marginal probability of data – Average over all possible values of θ

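$$p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta)\, p(y \mid \theta)}{p(y)}$$

$$p(y) = \sum_{\theta} p(\theta)\, p(y \mid \theta) \quad \text{(when } \theta \text{ is discrete)}$$

$$p(y) = \int p(\theta)\, p(y \mid \theta)\, d\theta \quad \text{(when } \theta \text{ is continuous)}$$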

Example (Hemophilia)

• θ = carrier status in woman

• Find p(θ=1|y1=0, y2=0)

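• Data: y1 = 0, y2 = 0

$$p(\theta = 1) = 0.5 = p(\theta = 0)$$

$$p(y_1 = 0, y_2 = 0 \mid \theta = 1) = 0.5 \times 0.5 = 0.25$$

$$p(y_1 = 0, y_2 = 0 \mid \theta = 0) = 1 \times 1 = 1$$

$$p(y_1 = 0, y_2 = 0) = p(y_1 = 0, y_2 = 0 \mid \theta = 1)\, p(\theta = 1) + p(y_1 = 0, y_2 = 0 \mid \theta = 0)\, p(\theta = 0) = 0.625$$

$$p(\theta = 1 \mid y_1 = 0, y_2 = 0) = \frac{p(y_1 = 0, y_2 = 0 \mid \theta = 1)\, p(\theta = 1)}{p(y_1 = 0, y_2 = 0)} = \frac{0.125}{0.625} = 0.20$$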
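The hemophilia calculation is a direct application of Bayes rule for a discrete parameter. A minimal Python sketch follows; the function name `posterior_discrete` and the dictionary layout are illustrative choices, not part of the original slides.

```python
def posterior_discrete(prior, likelihood):
    """Bayes rule for a discrete parameter.

    prior[theta]      = p(theta)
    likelihood[theta] = p(y | theta) for the observed data y
    Returns p(theta | y) for every value of theta.
    """
    joint = {theta: prior[theta] * likelihood[theta] for theta in prior}
    marginal = sum(joint.values())  # p(y): sum over all values of theta
    return {theta: joint[theta] / marginal for theta in joint}


# Values from the hemophilia example: theta = carrier status of the woman
prior = {1: 0.5, 0: 0.5}
# p(y1 = 0, y2 = 0 | theta): 0.5 * 0.5 for a carrier, 1 * 1 for a non-carrier
likelihood = {1: 0.5 * 0.5, 0: 1.0 * 1.0}

posterior = posterior_discrete(prior, likelihood)
print(round(posterior[1], 3))  # posterior probability that the woman is a carrier
```

The same function applies to any discrete θ, as long as the prior values sum to one.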

Example (Breast Cancer)

• Case-control study

• Sample individuals with y=1 (case), and y=0 (control)

• Genotype for a BRCA mutation (θ=1 if carrier, 0 if non-carrier)

• Observed distribution: p(θ|y) !!

• Want to estimate risk: p(y=1|θ=1)


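• Odds ratio:

$$\varphi = \frac{p(\theta = 1 \mid y = 1)}{p(\theta = 1 \mid y = 0)}$$

• Using Bayes rule:

$$p(y = 1 \mid \theta = 1) = \frac{p(y = 1, \theta = 1)}{p(\theta = 1)} = \frac{p(\theta = 1 \mid y = 1)\, p(y = 1)}{p(\theta = 1 \mid y = 1)\, p(y = 1) + p(\theta = 1 \mid y = 0)\, p(y = 0)} = \frac{\varphi\, p(y = 1)}{\varphi\, p(y = 1) + p(y = 0)}$$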


• p(θ=1|y=1) = 25/204 = 0.1225
• p(θ=1|y=0) = 23/1113 = 0.0207
• φ = 0.1225/0.0207 = 5.9

• p(y=1): disease incidence in a given age group (40-49)
  – from SEER database
  – p(y=1) = 0.0138

• p(y=0) = 1-p(y=1)
• p(y=1|θ=1) = 7.6%

BRCA Mutation

              Case   Control
Carrier         25        23
Non-carrier    179      1090

• Age group 40-49
• Satagopan et al. (2001)
• SEER Registry: http://seer.cancer.gov
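The risk calculation for the breast cancer example can be reproduced in a few lines of Python. This is a sketch under the assumptions stated on the slides (carrier counts from the case-control table, incidence 0.0138 from SEER); the function name `carrier_risk` is hypothetical. With unrounded inputs the result is close to 0.077, while the 7.6% reported above follows from the rounded value φ = 5.9.

```python
def carrier_risk(p_carrier_given_case, p_carrier_given_control, p_disease):
    """p(y=1 | theta=1) from case-control carrier frequencies and incidence."""
    phi = p_carrier_given_case / p_carrier_given_control
    # Bayes rule rewritten in terms of phi and the disease incidence p(y=1)
    return phi * p_disease / (phi * p_disease + (1.0 - p_disease))


p_carrier_given_case = 25 / 204       # p(theta=1 | y=1)
p_carrier_given_control = 23 / 1113   # p(theta=1 | y=0)
incidence = 0.0138                    # p(y=1), age group 40-49

risk = carrier_risk(p_carrier_given_case, p_carrier_given_control, incidence)
print(f"p(y=1 | theta=1) is about {risk:.3f}")
```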

Summary

• Probability distributions
  – Sampling density of observed data
  – Probability density of parameter of interest

• Bayes rule helps determine posterior density of interest

Part II

• Probability models for data analysis

• Specifying prior and posterior densities

• Example:
  – Binomial-Beta
  – Normal

Bayesian Inference

• Fit model to data (specify likelihood)

• Specify prior density for unknown parameters

• Derive posterior density

• Summaries: posterior mean, variance, posterior (credible) interval ...

Example (Gelman et al., 1995)

• Placenta previa: unusual pregnancy condition

• Study n=980 births, y=437 female births

• Goal: Bayesian inference on probability of female births among placenta previa pregnancies

• θ = probability of a female birth in a single placenta previa pregnancy case

Sampling Density

• The n placenta previa pregnancies are independent.

• p(y|θ) is a Binomial density.

• θ is a probability. Hence, θ ∈ [0,1].

• Goal: Posterior density

• Need to specify prior density p(θ).

$$p(y \mid \theta) = \binom{n}{y}\, \theta^{y} (1 - \theta)^{n - y}$$

$$p(\theta \mid y) = \frac{p(\theta)\, p(y \mid \theta)}{\int p(\theta)\, p(y \mid \theta)\, d\theta}$$

Prior Density

• p(θ) for θ ∈ [0,1].

• θ ~ Uniform [0,1]
  – All values in [0,1] are equally likely

• θ ~ Beta (a,b)
  – mean = a/(a+b)
  – variance = ab / [ (a+b)² (a+b+1) ]

$$p(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)}\, \theta^{a - 1} (1 - \theta)^{b - 1}$$

[Figure: four panels showing the Beta(a, b) prior density p(θ) plotted against θ]

a        b        mean     var
1        1        0.5      0.083
2.425    2.575    0.485    0.042
4.85     5.15     0.485    0.023
97       103      0.485    0.001

Posterior Density

• p(θ|y) = Beta (a+y, b+n-y)

• Conjugacy property: Posterior has the same distributional form as the prior

• Summaries:
  – Mean: E(θ|y) = (a+y) / (a+b+n)
  – Variance: V(θ|y) = (a+y)(b+n-y) / [ (a+b+n)² (a+b+n+1) ]
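The conjugacy result follows by multiplying the Beta prior by the Binomial sampling density and dropping factors that do not involve θ. A short derivation, using only the symbols defined above:

```latex
\begin{aligned}
p(\theta \mid y) &\propto p(\theta)\, p(y \mid \theta) \\
                 &\propto \theta^{a-1} (1-\theta)^{b-1} \times \theta^{y} (1-\theta)^{n-y} \\
                 &= \theta^{(a+y)-1} (1-\theta)^{(b+n-y)-1},
\end{aligned}
```

which is the kernel of a Beta(a+y, b+n-y) density. The mean and variance then follow from the Beta formulas with a replaced by a+y and b replaced by b+n-y.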

Example (Placenta Previa)

• n=980, y=437. p(y|θ) is Binomial.
• θ = probability of female birth for a single placenta previa case.

• Prior: θ ~ Uniform [0,1] = Beta (1,1)
• Posterior: p(θ|y) is Beta (438, 544).

• Summaries:
  – Posterior mean = 0.446
  – Posterior variance = 0.00025


• Study other summaries of posterior density
  – Posterior median
  – Sample θ values from p(θ|y) using computer simulation
  – Calculate median of these sampled values

• What happens if we choose different a, b values for prior p(θ) ?

Posterior Median

• Draw 1000 samples θ from p(θ|y) = Beta (438, 544).

• Can use computer programs to generate such random samples.

• Posterior median = 0.445
• 95% interval = [0.414, 0.475]

[Figure: posterior density p(θ|y) plotted against θ]

95% interval = two values such that 95% of the samples are between these two values.
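The simulation described on this slide can be sketched with the Python standard library. The seed, the helper name `posterior_summary`, and the simple order-statistic rule for the interval endpoints are assumptions made for illustration; different seeds give slightly different values close to those reported above.

```python
import random
import statistics


def posterior_summary(a, b, n_draws=1000, seed=2008):
    """Median and central 95% interval from random draws of a Beta(a, b) density."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    median = statistics.median(draws)
    # Empirical 2.5% and 97.5% points, taken directly from the sorted draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return median, (lower, upper)


# Posterior for the placenta previa data under the Uniform prior
median, interval = posterior_summary(438, 544)
print(f"median = {median:.3f}, 95% interval = [{interval[0]:.3f}, {interval[1]:.3f}]")
```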

Sensitivity to Choice of Prior

a        b        E(θ|y)   V(θ|y)    Post. Med.   95% Int.
1        1        0.446    0.00025   0.445        0.414, 0.475
2.425    2.575    0.446    0.00025   0.447        0.414, 0.447
4.85     5.15     0.446    0.00025   0.447        0.414, 0.478
97       103      0.452    0.00021   0.453        0.425, 0.482

These results are not sensitive to choice of the above a, b.
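The posterior mean and variance columns of the sensitivity table can be recomputed in closed form from the conjugate Beta update. A minimal sketch, assuming the placenta previa data (n = 980, y = 437) and the four priors listed above; the function name `posterior_summaries` is illustrative.

```python
def posterior_summaries(a, b, y=437, n=980):
    """Posterior Beta parameters, mean and variance for a Beta(a, b) prior."""
    a_post = a + y          # prior a plus number of female births
    b_post = b + n - y      # prior b plus number of male births
    total = a_post + b_post
    mean = a_post / total
    variance = a_post * b_post / (total ** 2 * (total + 1))
    return a_post, b_post, mean, variance


priors = [(1, 1), (2.425, 2.575), (4.85, 5.15), (97, 103)]
for a, b in priors:
    a_post, b_post, mean, variance = posterior_summaries(a, b)
    print(f"a={a:<6} b={b:<6} E={mean:.3f} V={variance:.5f}")
```

Because n is much larger than a + b for all four priors, the data dominate the posterior and the summaries barely move.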


• Can address other questions:
  – Do these data provide evidence that the proportion of female births in placenta previa pregnancies is smaller than 0.485?
  – Calculate p(θ < 0.485 | y)

• p(θ|y) is a Beta density.
• Obtain 1000 random samples from this density.
• What proportion of these samples are smaller than 0.485?
• When a = 1 = b, p(θ < 0.485 | y) is approximately 0.995
• 99.5% evidence that ...
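The tail probability can be approximated by the proportion of posterior draws below the threshold. The sketch below assumes the Beta(438, 544) posterior from the Uniform prior; the seed and the name `prob_below` are illustrative, and the estimate varies slightly from run to run when the seed changes.

```python
import random


def prob_below(threshold, a, b, n_draws=1000, seed=1):
    """Monte Carlo estimate of p(theta < threshold | y) for a Beta(a, b) posterior."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a, b) < threshold for _ in range(n_draws))
    return hits / n_draws


p = prob_below(0.485, 438, 544)
print(f"p(theta < 0.485 | y) is about {p:.3f}")
```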

Some comments on Posterior Density

• p(θ|y) will not always have a nice form.

• Need to approximate p(θ|y).

• Sampling θ from p(θ|y) not straightforward.

• Need advanced computations.
  – Stochastic simulations

Example (Advanced Computation)
Gene Mapping in Experimental Crosses

• Quantitative Trait Loci (QTL) associated with flowering time in mustard plants (Satagopan et al., 1996)

• y = log(flowering time)
• M = (M1, M2, ..., M9) : 9 RFLP markers on chromosome 9
  – Binary coding

• Data: (yi, Mi) for 105 plants (backcross-type)

Sampling Density

• Goal: Find loci affecting flowering time
• Locus Q can be between two markers !!

• Model: y = α + βQ + ε
• ε ~ N(0, σ²)
• θ = (α, β, σ²)

• Sampling density: p(y|Q, θ) is N(α+βQ, σ²)

QTL Genotypes

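• Observe 9 marker data for each plant
• QTL genotype unobserved !!

• Assume: QTL is somewhere between two markers

• Can write probability QTL genotype is 1 (or 0) given the flanking marker genotype

[Figure: putative QTL Q located between markers M6 and M7, with recombination r1 to the left marker and r2 to the right marker]

$$p(Q = 1 \mid M_k = 1, M_{k+1} = 1) = \frac{(1 - r_1)(1 - r_2)}{1 - r}$$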

Recombination

• r = Recombination between two adjacent markers.
  – Assumed known

• r1 = Recombination between left flanking marker and putative QTL

• r2 can be estimated for given r and r1
  – r2 = (r - r1) / (1 - 2 r1) [under some assumptions]
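The recombination formulas can be checked numerically. The sketch below covers only the case given on the slides, in which both flanking markers have genotype 1; the function names and the example values r = 0.2, r1 = 0.1 are hypothetical.

```python
def r2_from(r, r1):
    """Recombination between the putative QTL and the right flanking marker."""
    if not 0.0 <= r1 <= r < 0.5:
        raise ValueError("expected 0 <= r1 <= r < 0.5")
    return (r - r1) / (1.0 - 2.0 * r1)


def prob_q1_given_markers_11(r, r1):
    """p(Q = 1 | M_k = 1, M_{k+1} = 1) for a QTL between the two markers."""
    r2 = r2_from(r, r1)
    return (1.0 - r1) * (1.0 - r2) / (1.0 - r)


r, r1 = 0.2, 0.1                       # illustrative values, not from the study
print(r2_from(r, r1))                  # recombination QTL to right marker
print(prob_q1_given_markers_11(r, r1))
```

When r1 = 0 the QTL sits on the left marker, so r2 equals r and the probability is 1, which is a quick sanity check on the formula.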

Plan

• Go to a random locus between any two markers

• Calculate probability of QTL genotype given marker genotypes at that locus

• Actual sampling density:

$$p(y \mid M_k, M_{k+1}, \theta) = p(y \mid Q = 1, \theta)\, p(Q = 1 \mid M_k, M_{k+1}) + p(y \mid Q = 0, \theta)\, p(Q = 0 \mid M_k, M_{k+1})$$
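The sampling density at a locus is a two-component Normal mixture, weighted by the probability of each QTL genotype given the flanking markers. A minimal sketch follows; `q1` stands for p(Q = 1 | Mk, Mk+1) and is passed in as an input, and the function names are illustrative.

```python
import math


def normal_pdf(y, mean, variance):
    """Density of N(mean, variance) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def sampling_density(y, q1, alpha, beta, sigma2):
    """p(y | M_k, M_{k+1}, theta), with q1 = p(Q = 1 | M_k, M_{k+1}).

    Under the model y = alpha + beta * Q + error, the component for Q = 1
    has mean alpha + beta and the component for Q = 0 has mean alpha.
    """
    density_q1 = normal_pdf(y, alpha + beta, sigma2)
    density_q0 = normal_pdf(y, alpha, sigma2)
    return q1 * density_q1 + (1.0 - q1) * density_q0


# Illustrative values only
print(sampling_density(y=3.2, q1=0.9, alpha=3.0, beta=0.5, sigma2=0.25))
```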

Target

• At any locus, obtain p(θ, r1, Q1, ..., Qn | y1, ..., yn, M1, ..., Mn)

• Prior for θ: p(θ) = p(α) p(β) p(σ²)
• Prior for Q: p(Q | Mk, Mk+1)
• Prior for r1: p(r1) = Uniform between two markers

• Standard priors in statistics literature:
  – p(α) = N(α0, τ²)
  – p(β) = N(β0, δ²)
  – p(1/σ²) = Gamma(a, b)

Posterior Sampling

• Sampling from smaller pieces
  – easy if the densities are “nice”

• Normal, Gamma, Binomial etc.
  – Gibbs sampling

• When the smaller pieces are not nice
  – sampling is complex
  – advanced sampling algorithms
  – Markov chain Monte-Carlo methods
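To illustrate sampling from "nice" smaller pieces, here is a minimal Gibbs sampler for a deliberately simplified model: y_i ~ N(α, σ²) with priors α ~ N(α0, τ²) and 1/σ² ~ Gamma(a, b). This is not the QTL sampler of Satagopan et al. (1996): the QTL effect β, the genotypes Q and the locus r1 are omitted, b is assumed to be a rate parameter, and the data and hyperparameter values are made up for the sketch.

```python
import random
import statistics


def gibbs_normal(y, alpha0=0.0, tau2=100.0, a=1.0, b=1.0,
                 n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for (alpha, sigma2) in the simplified Normal model."""
    rng = random.Random(seed)
    n = len(y)
    sum_y = sum(y)
    alpha = sum_y / n      # starting value
    precision = 1.0        # starting value for 1 / sigma2
    draws = []
    for iteration in range(n_iter):
        # alpha | precision, y is Normal (conjugate update)
        post_precision = 1.0 / tau2 + n * precision
        post_mean = (alpha0 / tau2 + precision * sum_y) / post_precision
        alpha = rng.gauss(post_mean, post_precision ** -0.5)

        # precision | alpha, y is Gamma(shape = a + n/2, rate = b + SS/2)
        sum_sq = sum((yi - alpha) ** 2 for yi in y)
        shape = a + n / 2.0
        rate = b + sum_sq / 2.0
        # random.gammavariate takes a scale parameter, i.e. 1 / rate
        precision = rng.gammavariate(shape, 1.0 / rate)

        if iteration >= burn_in:
            draws.append((alpha, 1.0 / precision))
    return draws


# Simulated data for illustration only
data_rng = random.Random(42)
y = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

draws = gibbs_normal(y)
alpha_mean = statistics.mean(d[0] for d in draws)
sigma2_mean = statistics.mean(d[1] for d in draws)
print(f"posterior mean of alpha = {alpha_mean:.2f}, of sigma2 = {sigma2_mean:.2f}")
```

Each step draws one block of unknowns from its full conditional density given the current values of the others, which is possible here because both conditionals are standard distributions.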

Posterior density of r1

Satagopan et al., 1996

Posterior Sampling

• Simply sample from posterior

• Gibbs sampling

• Importance sampling, Rejection sampling

• Metropolis-Hastings method

• Other Markov chain Monte-Carlo Sampling

• Gilks et al., (1996): Markov chain Monte Carlo in Practice. Chapman and Hall, New York.

Summary

• Fit probability model to observed data

• Specify prior for unknown parameters

• Calculate posterior probability of unknowns

• Sample from posterior

• Summaries from posterior samples

• Check for sensitivity to choice of prior

Part III

• Hierarchical Mixture Model

• Empirical Bayes solution

• Application to Gene Expression Data Analysis
  – Summary of Kendziorski et al., 2003

Gene Expression Example

• Two (or more) groups of subjects

• Expression (log scale) of gene g
  – primary cancer: y1g, y2g, ..., yn1,g
  – metastatic cancer: x1g, x2g, ..., xn2,g

• Sampling distribution: Normal

Hierarchical Model

• Sampling distribution: p(yig|μg) = N(μg, σ²)
  – Likewise for p(xjg|μg)

• Prior for μg: p(μg) = N(μ, τ²)

• Unknown parameters: (μ, τ², σ²)

• Earlier: fixed parameters of prior density

Hierarchical Structure

[Diagram: two-level hierarchy]

(μ, τ²) → μg, with μg ~ N(μ, τ²)

(μg, σ²) → yig, with yig ~ N(μg, σ²)

Hypothesis Testing

• H0,g (null): equivalent expression
• HA,g (alternative): differential expression

• Calculate posterior probability
  – P(H0,g | expression data of gene g)

• Small posterior probability → declare differential expression

Posterior Probability

• Each gene obeys H0,g or HA,g

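• π = Prob. gene obeys H0,g
• 1-π = Prob. gene obeys HA,g

• Goal:

$$p(H_{0,g} \mid \text{data}_g) = \frac{p(H_{0,g})\, p(\text{data}_g \mid H_{0,g})}{p(\text{data}_g)} \quad \text{[Bayes Rule]}$$

$$p(H_{0,g} \mid \text{data}_g) = \frac{\pi\, p(\text{data}_g \mid H_{0,g})}{\pi\, p(\text{data}_g \mid H_{0,g}) + (1 - \pi)\, p(\text{data}_g \mid H_{A,g})}$$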
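The posterior probability of the null for one gene is a one-line computation once the two marginal densities are available. In this sketch `p0` and `pA` stand for p(data_g | H0,g) and p(data_g | HA,g) and are supplied by the caller; the function name and the numeric values are illustrative.

```python
def posterior_null_probability(pi, p0, pA):
    """p(H0 | data) for one gene, given the mixing proportion and both densities."""
    numerator = pi * p0
    return numerator / (numerator + (1.0 - pi) * pA)


# Illustrative values: the data are nine times more likely under the alternative
print(posterior_null_probability(pi=0.9, p0=0.1, pA=0.9))
```

A small value of this probability is the basis for declaring the gene differentially expressed.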

Mixture Model

• The data for a gene arises as a mixture of the probability under H0,g and HA,g

$$p(\text{data}_g) = p(y_g, x_g)$$

$$p(y_g, x_g) = \pi\, p(y_g, x_g \mid H_{0,g}) + (1 - \pi)\, p(y_g, x_g \mid H_{A,g})$$

$$p(y_g, x_g) = \pi\, p_0(y_g, x_g) + (1 - \pi)\, p_A(y_g, x_g)$$

Probability Density Under the Null

• Under the null, treat data as (yg, xg) = wg = (w1,g, w2,g, ..., wn,g)

• Marginal density under the null:

$$p_0(y_g, x_g) = p(w_g \mid H_{0,g}) = \int p(w_g \mid H_{0,g}, \mu_g)\, p(\mu_g)\, d\mu_g = \int \left\{ \prod_{i=1}^{n} p(w_{i,g} \mid \mu_g) \right\} p(\mu_g)\, d\mu_g$$

Depends upon μ, τ² and σ²

Probability Density Under the Alternative

• Product of marginal densities of the two groups

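$$p_A(y_g, x_g) = p(y_g, x_g \mid H_{A,g}) = p(y_g \mid H_{A,g})\, p(x_g \mid H_{A,g})$$

$$p_A(y_g, x_g) = \int p(y_g \mid H_{A,g}, \mu_g)\, p(\mu_g)\, d\mu_g \times \int p(x_g \mid H_{A,g}, \mu_g)\, p(\mu_g)\, d\mu_g$$

$$p_A(y_g, x_g) = \left\{ \int \prod_{i=1}^{n_1} p(y_{i,g} \mid \mu_g)\, p(\mu_g)\, d\mu_g \right\} \times \left\{ \int \prod_{j=1}^{n_2} p(x_{j,g} \mid \mu_g)\, p(\mu_g)\, d\mu_g \right\}$$

Depends upon μ, τ² and σ²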

Marginal Densities

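• p0(yg, xg) = Normal (mean, var0)
• pA(yg, xg) = Normal (mean, var1)

• mean = (μ, μ, ..., μ) [n = n1+n2 times]

Under the null all n observations share one μg, so var0 is a single n × n matrix:

$$\mathrm{var}_0 = \begin{pmatrix} \sigma^2 + \tau^2 & \tau^2 & \cdots & \tau^2 \\ \tau^2 & \sigma^2 + \tau^2 & \cdots & \tau^2 \\ \vdots & \vdots & \ddots & \vdots \\ \tau^2 & \tau^2 & \cdots & \sigma^2 + \tau^2 \end{pmatrix}_{n \times n}$$

Under the alternative the two groups have separate means, so var1 is block diagonal:

$$\mathrm{var}_1 = \begin{pmatrix} V_{n_1} & 0 \\ 0 & V_{n_2} \end{pmatrix}$$

where V_m denotes the m × m matrix with σ² + τ² on the diagonal and τ² elsewhere.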
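Both marginal densities can be evaluated without building the covariance matrices explicitly. The sketch below assumes the covariance implied by the hierarchical model, σ² on top of a shared τ² (σ² + τ² on the diagonal, τ² off the diagonal within a block), for which the determinant and inverse have closed forms. The function names and example values are illustrative.

```python
import math


def log_marginal(values, mu, tau2, sigma2):
    """Log density of observations that share one mean mu_g ~ N(mu, tau2).

    Each observation is N(mu_g, sigma2) given mu_g, so marginally the vector
    is multivariate Normal with covariance sigma2 * I + tau2 * J.
    """
    n = len(values)
    deviations = [v - mu for v in values]
    sum_sq = sum(d * d for d in deviations)
    sum_dev = sum(deviations)
    total = sigma2 + n * tau2
    # log determinant of sigma2 * I + tau2 * J
    log_det = (n - 1) * math.log(sigma2) + math.log(total)
    # quadratic form using the closed-form inverse (Sherman-Morrison)
    quad = (sum_sq - tau2 * sum_dev ** 2 / total) / sigma2
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def log_p0(y, x, mu, tau2, sigma2):
    """Null: both groups share a single mean, so pool the observations."""
    return log_marginal(list(y) + list(x), mu, tau2, sigma2)


def log_pA(y, x, mu, tau2, sigma2):
    """Alternative: each group has its own mean, so the densities multiply."""
    return log_marginal(y, mu, tau2, sigma2) + log_marginal(x, mu, tau2, sigma2)


# Illustrative log-expression values for one gene
y = [5.1, 4.9, 5.3]
x = [6.8, 7.1, 7.0]
print(log_p0(y, x, mu=6.0, tau2=1.0, sigma2=0.1))
print(log_pA(y, x, mu=6.0, tau2=1.0, sigma2=0.1))
```

Working on the log scale avoids numerical underflow when n is large.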

Estimation

• Likelihood for gene g: p(datag)

• Likelihood for the full data:

$$p(\text{data}) = \prod_{g=1}^{G} p(\text{data}_g) = \prod_{g=1}^{G} \left\{ \pi\, p_0(y_g, x_g) + (1 - \pi)\, p_A(y_g, x_g) \right\}$$

• Maximize likelihood to estimate π, μ, τ², σ².
• Empirical Bayes solution
• Note: not interested in posterior density of μg
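The full-data likelihood is usually maximized on the log scale. The sketch below evaluates the log-likelihood for a given π from per-gene log densities under the null and the alternative; those log densities are assumed to be computed elsewhere for the current (μ, τ², σ²), the function name is hypothetical, and the optimizer itself is left out.

```python
import math


def mixture_log_likelihood(pi, log_p0_values, log_pA_values):
    """Sum over genes of log{ pi * p0 + (1 - pi) * pA }, for 0 < pi < 1."""
    total = 0.0
    for log_p0, log_pA in zip(log_p0_values, log_pA_values):
        term_null = math.log(pi) + log_p0
        term_alt = math.log1p(-pi) + log_pA
        # log-sum-exp keeps the computation stable for very small densities
        largest = max(term_null, term_alt)
        total += largest + math.log(math.exp(term_null - largest)
                                    + math.exp(term_alt - largest))
    return total


# Two genes with made-up densities: p0 = (2, 4), pA = (1, 1)
value = mixture_log_likelihood(0.3, [math.log(2.0), math.log(4.0)], [0.0, 0.0])
print(value)
```

An empirical Bayes fit would pass this function (together with the density calculations) to a numerical optimizer over π, μ, τ² and σ².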

Discussion of Results from Kendziorski et al., (2003)

Summary

• Specify distributions in a hierarchical manner

• Samples from posterior density

• Alternatively, empirical Bayes solution

• Gene expression setting: calculate posterior probability of null hypothesis for each gene

Wrap-up

• Specify sampling and prior densities

• Derive posterior density

• Inference based on posterior densities

Some References

• A Gelman, JB Carlin, HS Stern, DB Rubin (1995). Bayesian data analysis. Chapman and Hall, New York.

• BP Carlin, TA Louis (1996). Bayes and empirical Bayes methods for data analysis. Chapman and Hall, New York.

• WR Gilks, S Richardson, DJ Spiegelhalter (1996). Markov chain Monte Carlo in practice. Chapman and Hall, New York.

• SC Heath (1997). Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics 61: 748-760.

• CM Kendziorski, MA Newton, H Lan, MN Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22:3899-3914.

• S Lin (1999). Monte Carlo Bayesian methods for quantitative traits. Computational Statistics and Data Analysis 31: 89-108.

• MA Newton and Y Lee (2000). Inferring the location and effect of tumor suppressor genes by instability-selection modeling of allelic-loss data. Biometrics 56: 1088-1097.

• JM Satagopan, BS Yandell, MA Newton, TC Osborn (1996). A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144: 805-816.

• JM Satagopan, K Offit, W Foulkes, ME Robson, S Wacholder, CM Eng, SE Karp, CB Begg (2001). The lifetime risks of breast cancer in Ashkenazi Jewish carriers of BRCA1 and BRCA2 mutations. Cancer Epidemiology, Biomarkers and Prevention 10: 467-473.

• EA Thompson (2000). MCMC estimation of multi-locus genome sharing and multipoint gene location scores. International Statistical Review 68: 53-73.

• EA Thompson (2000). Statistical inference from genetic data on pedigrees. Institute of Mathematical Statistics Monograph. Volume 6.
