Introductory Bayesian Analysis
Jaya M. Satagopan
Memorial Sloan-Kettering Cancer Center
April 2, 2008
Bayesian Inference
• Fit probability model to observed data
• Unknown parameters
  – Summarize using probability distribution
• Use prior information
Outline
• Distributions
• Bayes Rule
  – Example
• Prior and posterior distributions
  – Some models
  – Examples: Binomial, Normal
• Hierarchical Mixture model
Part I
• Joint, conditional and marginal distributions
• Posterior density and Bayes Rule
• Examples
General Notations
• Outcome: y
  – Disease status (y = 0, 1)
  – Tumor size
• Parameter: θ
  – Mutation carrier status (θ = 0, 1)
  – Effect of treatment on tumor size
• Covariate: X
  – Treatment dosage
Distributions
• Model: joint probability distribution (density of y and θ)
  – p(y, θ) = p(θ) p(y|θ)
• p(θ) = prior density of θ
• p(y|θ) = conditional (sampling) density of y given θ
Example (Gelman et al., 1995)
• Hemophilia: X-chromosome linked recessive disorder
• Affected males (single copy of X chromosome)
• Unaffected female (two copies of X chromosome)
Example
• A single male
  – y = hemophilia status
  – θ = “bad” hemophilia gene carrier status
• p(θ) = probability of “bad” hemophilia gene
• p(y|θ) = probability of hemophilia given gene carrier status
Bayes Rule
• Obtain posterior density from prior and sampling densities.
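• p(y) = marginal probability of data
  – Average over all possible values of θ

$$
p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta)\, p(y \mid \theta)}{p(y)}
$$

$$
p(y) = \sum_{\theta} p(\theta)\, p(y \mid \theta) \quad \text{(when } \theta \text{ is discrete)}
$$

$$
p(y) = \int p(\theta)\, p(y \mid \theta)\, d\theta \quad \text{(when } \theta \text{ is continuous)}
$$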
Example (Hemophilia)
• θ = carrier status in woman
• Find p(θ=1|y1=0, y2=0)
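[Figure: pedigree with two sons, y1 = 0 and y2 = 0 (both unaffected)]

$$
\begin{aligned}
p(\theta = 1) &= p(\theta = 0) = 0.5 \\
p(y_1 = 0, y_2 = 0 \mid \theta = 1) &= 0.5 \times 0.5 = 0.25 \\
p(y_1 = 0, y_2 = 0 \mid \theta = 0) &= 1 \times 1 = 1 \\
p(y_1 = 0, y_2 = 0) &= p(y_1 = 0, y_2 = 0 \mid \theta = 1)\, p(\theta = 1) + p(y_1 = 0, y_2 = 0 \mid \theta = 0)\, p(\theta = 0) = 0.625 \\
p(\theta = 1 \mid y_1 = 0, y_2 = 0) &= \frac{p(y_1 = 0, y_2 = 0 \mid \theta = 1)\, p(\theta = 1)}{p(y_1 = 0, y_2 = 0)} = \frac{0.125}{0.625} = 0.20
\end{aligned}
$$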
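The hemophilia calculation is a direct application of Bayes rule for a discrete parameter, so it can be checked numerically. The following is a minimal Python sketch using only the prior and sampling probabilities from the slide; the function name `posterior_discrete` is illustrative and not part of the original talk.

```python
def posterior_discrete(prior, likelihood):
    """Bayes rule for a discrete parameter.

    prior:      dict mapping each value of theta to p(theta)
    likelihood: dict mapping each value of theta to p(y | theta)
    Returns a dict mapping each value of theta to p(theta | y).
    """
    # Marginal probability of the data: sum over all values of theta.
    p_y = sum(prior[t] * likelihood[t] for t in prior)
    return {t: prior[t] * likelihood[t] / p_y for t in prior}


# Carrier status of the woman: theta = 1 (carrier), theta = 0 (non-carrier).
prior = {1: 0.5, 0: 0.5}
# Probability that y1 = 0 and y2 = 0, given theta.
likelihood = {1: 0.5 * 0.5, 0: 1.0 * 1.0}

post = posterior_discrete(prior, likelihood)
print(round(post[1], 3))  # 0.2
```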
Example (Breast Cancer)
• Case-control study
• Sample individuals with y=1 (case), and y=0 (control)
• Genotype for a BRCA mutation (θ=1 if carrier, 0 if non-carrier)
• Observed distribution: p(θ|y) !!
• Want to estimate risk: p(y=1|θ=1)
Example (Breast Cancer)
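• Odds ratio:

$$
\varphi = \frac{p(\theta = 1 \mid y = 1)}{p(\theta = 1 \mid y = 0)}
$$

• Using Bayes rule:

$$
p(y = 1 \mid \theta = 1) = \frac{p(y = 1, \theta = 1)}{p(\theta = 1)}
= \frac{p(\theta = 1 \mid y = 1)\, p(y = 1)}{p(\theta = 1 \mid y = 1)\, p(y = 1) + p(\theta = 1 \mid y = 0)\, p(y = 0)}
= \frac{\varphi\, p(y = 1)}{\varphi\, p(y = 1) + p(y = 0)}
$$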
Example (Breast Cancer)
• Age group 40-49 (Satagopan et al., 2001)

BRCA Mutation    Case    Control
Carrier            25         23
Non-carrier       179       1090

• p(θ=1|y=1) = 25/204 = 0.1225
• p(θ=1|y=0) = 23/1113 = 0.0207
• φ = 0.1225/0.0207 = 5.9
• p(y=1): disease incidence in a given age group (40-49)
  – from SEER database
  – p(y=1) = 0.0138
• p(y=0) = 1 - p(y=1)
• p(y=1|θ=1) = 7.6%
• SEER Registry: http://seer.cancer.gov
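The risk estimate combines the case-control carrier frequencies with the external incidence figure. A small Python sketch of that arithmetic follows; the function name `risk_given_carrier` is hypothetical, and the inputs are the counts and the SEER incidence quoted on the slide.

```python
def risk_given_carrier(p_carrier_case, p_carrier_control, p_disease):
    """p(y=1 | theta=1) from Bayes rule, written in terms of the ratio phi."""
    phi = p_carrier_case / p_carrier_control
    return phi * p_disease / (phi * p_disease + (1.0 - p_disease))


p_carrier_case = 25 / 204      # p(theta=1 | y=1)
p_carrier_control = 23 / 1113  # p(theta=1 | y=0)
p_disease = 0.0138             # p(y=1), incidence for ages 40-49

risk = risk_given_carrier(p_carrier_case, p_carrier_control, p_disease)
print(f"p(y=1 | theta=1) = {risk:.4f}")
```

With unrounded inputs the result is slightly above the value obtained from φ rounded to 5.9, so small differences from the 7.6% on the slide are a matter of rounding.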
Summary
• Probability distributions
  – Sampling density of observed data
  – Probability density of parameter of interest
• Bayes rule helps determine posterior density of interest
Part II
• Probability models for data analysis
• Specifying prior and posterior densities
• Example:
  – Binomial-Beta
  – Normal
Bayesian Inference
• Fit model to data (specify likelihood)
• Specify prior density for unknown parameters
• Derive posterior density
• Summaries: posterior mean, variance, credible interval ...
Example (Gelman et al., 1995)
• Placenta previa: unusual pregnancy condition
• Study n=980 births, y=437 female births
• Goal: Bayesian inference on probability of female births among placenta previa pregnancies
• θ = probability of a female birth in a single placenta previa pregnancy case
Sampling Density
• The n placenta previa pregnancies are independent.
• p(y|θ) is a Binomial density.

$$
p(y \mid \theta) = \binom{n}{y}\, \theta^{y} (1 - \theta)^{n - y}
$$

• θ is a probability. Hence, θ ∈ [0,1].
• Goal: Posterior density

$$
p(\theta \mid y) = \frac{p(\theta)\, p(y \mid \theta)}{\int p(\theta)\, p(y \mid \theta)\, d\theta}
$$

• Need to specify prior density p(θ).
Prior Density
• p(θ) for θ ∈ [0,1].
• θ ~ Uniform [0,1]
  – All values in [0,1] are equally likely
• θ ~ Beta (a,b)
  – mean = a/(a+b)
  – variance = ab / [ (a+b)^2 (a+b+1) ]

$$
p(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)}\, \theta^{a - 1} (1 - \theta)^{b - 1}
$$

[Figure: Beta(a, b) prior densities, p(θ) plotted against θ, four panels]
• a = 1, b = 1: mean = 0.5, var = 0.083
• a = 2.425, b = 2.575: mean = 0.485, var = 0.042
• a = 4.85, b = 5.15: mean = 0.485, var = 0.023
• a = 97, b = 103: mean = 0.485, var = 0.001
Posterior Density
• p(θ|y) = Beta (a+y, b+n-y)
• Conjugacy property: Posterior has the same distributional form as the prior
• Summaries:
  – Mean: E(θ|y) = (a+y) / (a+b+n)
  – Variance: V(θ|y) = (a+y)(b+n-y) / [ (a+b+n)^2 (a+b+n+1) ]
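The conjugacy result follows in one line from Bayes rule once factors that do not depend on θ are dropped. A short derivation, assuming the Binomial sampling density and the Beta(a, b) prior from the preceding slides:

```latex
p(\theta \mid y) \;\propto\; p(\theta)\, p(y \mid \theta)
\;\propto\; \theta^{a-1} (1-\theta)^{b-1} \cdot \theta^{y} (1-\theta)^{n-y}
\;=\; \theta^{a+y-1} (1-\theta)^{b+n-y-1}
```

The right-hand side is the kernel of a Beta(a+y, b+n-y) density, which is why the posterior stays in the Beta family.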
Example (Placenta Previa)
• n=980, y=437. p(y|θ) is Binomial.
• θ = probability of female birth for a single placenta previa case.
• Prior: θ ~ Uniform [0,1] = Beta (1,1)
• Posterior: p(θ|y) is Beta (438, 544).
• Summaries:
  – Posterior mean = 0.446
  – Posterior variance = 0.00025
Example (Placenta Previa)
• Study other summaries of posterior density
  – Posterior Median
  – Sample θ values from p(θ|y) using computer simulation
  – Calculate median of these sampled values
• What happens if we choose different a, b values for prior p(θ) ?
Posterior Median
• Draw 1000 samples θ from p(θ|y) = Beta (438, 544).
• Can use computer programs to generate such random samples.
• Posterior median = 0.445
• 95% interval = [0.414, 0.475]
  – Two values such that 95% of the samples are between these two values
[Figure: p(θ|y) plotted against θ]
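The simulation described on this slide can be sketched with Python's standard library. The seed is arbitrary and only there for reproducibility; because the summaries come from 1000 random draws, they will differ slightly from the slide's values and from run to run with other seeds.

```python
import random
import statistics

random.seed(2008)  # arbitrary seed, chosen only for reproducibility

a_post, b_post = 438, 544  # Beta(a + y, b + n - y) with a = b = 1
draws = sorted(random.betavariate(a_post, b_post) for _ in range(1000))

n_draws = len(draws)
median = statistics.median(draws)
# Empirical cut points leaving 2.5% of the sorted draws in each tail.
lower = draws[n_draws * 25 // 1000]
upper = draws[n_draws * 975 // 1000 - 1]

print(f"median = {median:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```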
Sensitivity to Choice of Prior
a        b        E(θ|y)    V(θ|y)     Post. Med.    95% Int.
1        1        0.446     0.00025    0.445         0.414, 0.475
2.425    2.575    0.446     0.00025    0.447         0.414, 0.447
4.85     5.15     0.446     0.00025    0.447         0.414, 0.478
97       103      0.452     0.00021    0.453         0.425, 0.482

These results are not sensitive to choice of the above a, b.
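The posterior mean and variance columns come from the closed-form Beta(a+y, b+n-y) summaries, so they can be recomputed for each prior. A minimal sketch, with `beta_posterior_summary` as an illustrative helper name:

```python
def beta_posterior_summary(a, b, y, n):
    """Posterior mean and variance of theta under a Beta(a, b) prior."""
    a_post, b_post = a + y, b + n - y
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total ** 2 * (total + 1))
    return mean, var


y, n = 437, 980
priors = [(1, 1), (2.425, 2.575), (4.85, 5.15), (97, 103)]
summaries = {ab: beta_posterior_summary(*ab, y, n) for ab in priors}

for (a, b), (mean, var) in summaries.items():
    print(f"a = {a}, b = {b}: E = {mean:.3f}, V = {var:.5f}")
```

Only the last prior, which carries as much weight as 200 prior observations, moves the posterior mean noticeably.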
Example (Placenta Previa)
• Can address other questions:
  – Do these data provide evidence that the proportion of female births in placenta previa pregnancies is smaller than 0.485?
  – Calculate p(θ < 0.485 | y)
• p(θ|y) is a Beta density.
• Obtain 1000 random samples from this density.
• What proportion of these samples are smaller than 0.485?
• When a = 1 = b, p(θ < 0.485 | y) is approximately 0.995
• 99.5% evidence that ...
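The tail probability can be approximated the same way, by counting the share of posterior draws below 0.485. A brief sketch under the uniform prior (a = b = 1); the seed is arbitrary, and with only 1000 draws the estimate is accurate to roughly two decimal places.

```python
import random

random.seed(1)  # arbitrary seed, chosen only for reproducibility

a_post, b_post = 438, 544  # posterior under the Uniform[0, 1] prior
n_draws = 1000
draws = [random.betavariate(a_post, b_post) for _ in range(n_draws)]

# Monte Carlo estimate of p(theta < 0.485 | y).
prob_below = sum(d < 0.485 for d in draws) / n_draws
print(f"p(theta < 0.485 | y) is approximately {prob_below:.3f}")
```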
Some comments on Posterior Density
• p(θ|y) will not always have a nice form.
• Need to approximate p(θ|y).
• Sampling θ from p(θ|y) not straightforward.
• Need advanced computations.
  – Stochastic simulations
Example (Advanced Computation): Gene Mapping in Experimental Crosses
• Quantitative Trait Loci (QTL) associated with flowering time in mustard plants (Satagopan et al., 1996)
• y = log(flowering time)
• M = (M1, M2, ..., M9): 9 RFLP markers on chromosome 9
  – Binary coding
• Data: (yi, Mi) for 105 plants (backcross-type)
Sampling Density
• Goal: Find loci affecting flowering time
• Locus Q can be between two markers !!
• Model: y = α + βQ + ε
• ε ~ N(0, σ²)
• θ = (α, β, σ²)
• Sampling density: p(y|Q, θ) is N(α+βQ, σ²)
QTL Genotypes
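• Observe 9 marker data for each plant
• QTL genotype unobserved !!
• Assume: QTL is somewhere between two markers
• Can write probability QTL genotype is 1 (or 0) given the flanking marker genotype

[Figure: putative QTL Q between flanking markers M6 and M7, with recombination r1 between M6 and Q and r2 between Q and M7]

$$
p(Q = 1 \mid M_k = 1, M_{k+1} = 1) = \frac{(1 - r_1)(1 - r_2)}{1 - r}
$$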
Recombination
• r = Recombination between two adjacent markers.
  – Assumed known
• r1 = Recombination between left flanking marker and putative QTL
• r2 can be estimated for given r and r1
  – r2 = (r – r1) / (1 – 2 r1) [under some assumptions]
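The recombination relation and the genotype probability p(Q = 1 | Mk = 1, Mk+1 = 1) = (1 - r1)(1 - r2) / (1 - r) translate directly into code. In the sketch below the function names and the numeric values of r and r1 are hypothetical, chosen only for illustration.

```python
def r2_from(r, r1):
    """Recombination between the putative QTL and the right flanking marker."""
    return (r - r1) / (1.0 - 2.0 * r1)


def prob_q1_given_markers_11(r, r1):
    """p(Q = 1 | M_k = 1, M_{k+1} = 1) = (1 - r1)(1 - r2) / (1 - r)."""
    r2 = r2_from(r, r1)
    return (1.0 - r1) * (1.0 - r2) / (1.0 - r)


r, r1 = 0.20, 0.05  # hypothetical recombination fractions
r2 = r2_from(r, r1)
p_q1 = prob_q1_given_markers_11(r, r1)
print(f"r2 = {r2:.4f}, p(Q=1 | 1, 1) = {p_q1:.4f}")
```

Rearranging the slide's formula for r2 gives r = r1 + r2 - 2 r1 r2, so 1 - r = (1 - r1)(1 - r2) + r1 r2. The complementary probability p(Q = 0 | Mk = 1, Mk+1 = 1) is therefore r1 r2 / (1 - r), and the two genotype probabilities sum to one.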
Plan
• Go to a random locus between any two markers
• Calculate probability of QTL genotype given marker genotypes at that locus
• Actual sampling density:
$$
p(y \mid M_k, M_{k+1}, \theta) = p(y \mid Q = 1, \theta)\, p(Q = 1 \mid M_k, M_{k+1}) + p(y \mid Q = 0, \theta)\, p(Q = 0 \mid M_k, M_{k+1})
$$
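Because the QTL genotype is unobserved, the sampling density of y given the flanking markers is a two-component Normal mixture: N(α + β, σ²) when Q = 1 and N(α, σ²) when Q = 0. A minimal sketch follows; the function names and the parameter values are assumptions made for illustration, and `p_q1` stands for p(Q = 1 | Mk, Mk+1).

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sampling_density(y, p_q1, alpha, beta, sigma2):
    """p(y | M_k, M_{k+1}, theta), averaging over the unobserved genotype Q."""
    return (p_q1 * normal_pdf(y, alpha + beta, sigma2)
            + (1.0 - p_q1) * normal_pdf(y, alpha, sigma2))


# Hypothetical values: y on the log scale, theta = (alpha, beta, sigma2).
density = sampling_density(y=3.2, p_q1=0.9, alpha=3.0, beta=0.3, sigma2=0.04)
print(f"{density:.4f}")
```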
Target
• At any locus, obtain p(θ, r1, Q1, ..., Qn | y1, ..., yn, M1, ..., Mn)
• Prior for θ: p(θ) = p(α) p(β) p(σ²)
  – p(α) = N(α0, τ²)
  – p(β) = N(β0, δ²)
  – p(1/σ²) = Gamma(a, b)
  – Standard priors in statistics literature
• Prior for Q: p(Q | Mk, Mk+1)
• Prior for r1: p(r1) = Uniform between two markers
Posterior Sampling
• Complex
• Break into smaller pieces
  – p(α | others), p(β | others), p(σ² | others)
  – p(r1 | others)
  – p(Q1 | others), ..., p(Qn | others)
• Sample from these smaller pieces
Posterior Sampling
• Sampling from smaller pieces
  – easy if the densities are “nice”
    • Normal, Gamma, Binomial etc.
  – Gibbs sampling
• When the smaller pieces are not nice
  – sampling is complex
  – advanced sampling algorithms
  – Markov chain Monte-Carlo methods
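To illustrate sampling from "nice" pieces, here is a Gibbs sampler for a deliberately simplified model, not the QTL model from the talk: yi ~ N(α, σ²) with priors α ~ N(α0, τ²) and 1/σ² ~ Gamma(a, b). The sketch assumes b is a rate parameter (the slides do not state the parameterization), and the simulated data, the prior settings, and the function name `gibbs_normal` are all hypothetical.

```python
import random


def gibbs_normal(y, alpha0, tau2, a, b, n_iter=2000, seed=0):
    """Alternate draws from p(alpha | sigma2, y) and p(1/sigma2 | alpha, y)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    alpha, prec = ybar, 1.0  # starting values; prec = 1 / sigma2
    samples = []
    for _ in range(n_iter):
        # alpha | prec, y is Normal (conjugate update).
        post_prec = 1.0 / tau2 + n * prec
        post_mean = (alpha0 / tau2 + prec * n * ybar) / post_prec
        alpha = rng.gauss(post_mean, post_prec ** -0.5)
        # 1/sigma2 | alpha, y is Gamma(shape a + n/2, rate b + SS/2).
        ss = sum((yi - alpha) ** 2 for yi in y)
        rate = b + 0.5 * ss
        prec = rng.gammavariate(a + 0.5 * n, 1.0 / rate)  # second argument is a scale
        samples.append((alpha, 1.0 / prec))
    return samples


# Hypothetical data: 50 observations from N(3.0, 0.5^2).
data_rng = random.Random(42)
y = [data_rng.gauss(3.0, 0.5) for _ in range(50)]

draws = gibbs_normal(y, alpha0=0.0, tau2=100.0, a=0.01, b=0.01)
kept = draws[500:]  # discard the first 500 iterations as burn-in
alpha_mean = sum(d[0] for d in kept) / len(kept)
sigma2_mean = sum(d[1] for d in kept) / len(kept)
print(f"E(alpha | y) = {alpha_mean:.3f}, E(sigma2 | y) = {sigma2_mean:.3f}")
```

The full QTL sampler would add analogous conditional draws for β, r1 and each Qi; those steps are omitted here.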
[Figure: posterior density of r1 (Satagopan et al., 1996)]
Posterior Sampling
• Simply sample from posterior
• Gibbs sampling
• Importance sampling, Rejection sampling
• Metropolis-Hastings method
• Other Markov chain Monte-Carlo Sampling
• Gilks et al., (1996): Markov chain Monte Carlo in Practice. Chapman and Hall, New York.
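As a small illustration of the Metropolis-Hastings idea, the sketch below runs a random-walk Metropolis sampler on the placenta previa posterior (Binomial likelihood, Uniform[0, 1] prior), where the exact answer Beta(438, 544) is known and can serve as a check. The step size, chain length, burn-in and seed are arbitrary choices made for this example.

```python
import math
import random


def log_post(theta, y=437, n=980):
    """Log of the unnormalized posterior: Binomial likelihood, Uniform[0, 1] prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return y * math.log(theta) + (n - y) * math.log(1.0 - theta)


def metropolis(n_iter=20000, step=0.03, start=0.5, seed=0):
    """Random-walk Metropolis with a symmetric Normal proposal."""
    rng = random.Random(seed)
    theta, lp = start, log_post(start)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal | y) / p(theta | y)).
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if math.log(u) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        chain.append(theta)
    return chain


chain = metropolis()
kept = chain[2000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
print(f"posterior mean is approximately {post_mean:.3f}")
```

Only ratios of the posterior enter the acceptance step, so the normalizing constant p(y) never has to be computed.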
Summary
• Fit probability model to observed data
• Specify prior for unknown parameters
• Calculate posterior probability of unknowns
• Sample from posterior
• Summaries from posterior samples
• Check for sensitivity to choice of prior
Part III
• Hierarchical Mixture Model
• Empirical Bayes solution
• Application to Gene Expression Data Analysis
  – Summary of Kendziorski et al., 2003
Gene Expression Example
• Two (or more) groups of subjects
• Expression (log scale) of gene g
  – primary cancer: y1g, y2g, ..., yn1,g
  – metastatic cancer: x1g, x2g, ..., xn2,g
• Sampling distribution: Normal
Hierarchical Model
• Sampling distribution: p(yig|μg) = N(μg, σ²)
  – Likewise for p(xjg|μg)
• Prior for μg: p(μg) = N(μ, τ²)
• Unknown parameters: (μ, τ², σ²)
• Earlier: fixed parameters of prior density
Hierarchical Structure
[Diagram: (μ, τ²) → μg → yig ← σ², with μg ~ N(μ, τ²) and yig ~ N(μg, σ²)]
Hypothesis Testing
• H0,g (null): equivalent expression
• HA,g (alternative): differential expression
• Calculate posterior probability
  – P(H0,g | expression data of gene g)
• Small posterior probability ⇒ declare differential expression
Posterior Probability
• Each gene obeys H0,g or HA,g
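• π = Prob. gene obeys H0,g
• 1-π = Prob. gene obeys HA,g
• Goal:

$$
p(H_{0,g} \mid \text{data}_g) = \frac{p(\text{data}_g \mid H_{0,g})\, p(H_{0,g})}{p(\text{data}_g)} \quad \text{[Bayes Rule]}
$$

$$
p(H_{0,g} \mid \text{data}_g) = \frac{\pi\, p(\text{data}_g \mid H_{0,g})}{\pi\, p(\text{data}_g \mid H_{0,g}) + (1 - \pi)\, p(\text{data}_g \mid H_{A,g})}
$$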
Mixture Model
• The data for a gene arises as a mixture of the probability under H0,g and HA,g
$$
\begin{aligned}
p(\text{data}_g) &= p(y_g, x_g) \\
&= \pi\, p(y_g, x_g \mid H_{0,g}) + (1 - \pi)\, p(y_g, x_g \mid H_{A,g}) \\
&= \pi\, p_0(y_g, x_g) + (1 - \pi)\, p_A(y_g, x_g)
\end{aligned}
$$
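Given the mixing proportion π and the two marginal densities for a gene, the posterior probability of the null is the first mixture term divided by the whole mixture. A minimal sketch, with hypothetical numeric inputs and an illustrative function name:

```python
def posterior_null(pi, p0, pA):
    """P(H0 | data) for one gene.

    pi: prior probability that the gene obeys the null
    p0: marginal density of the gene's data under the null
    pA: marginal density of the gene's data under the alternative
    """
    mixture = pi * p0 + (1.0 - pi) * pA  # p(data_g)
    return pi * p0 / mixture


# Hypothetical values: most genes are null, but this gene's data
# are far more likely under the alternative.
prob = posterior_null(pi=0.9, p0=0.02, pA=0.5)
print(f"P(H0 | data) = {prob:.3f}")
```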
Probability Density Under the Null
• Under the null, treat data as (yg, xg) = wg = (w1,g, w2,g, ..., wn,g)
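• Marginal density under the null:

$$
\begin{aligned}
p_0(y_g, x_g) &= p(w_g \mid H_{0,g}) \\
&= \int p(w_g \mid H_{0,g}, \mu_g)\, p(\mu_g)\, d\mu_g \\
&= \int \left\{ \prod_{i=1}^{n} p(w_{i,g} \mid \mu_g) \right\} p(\mu_g)\, d\mu_g
\end{aligned}
$$

Depends upon μ, τ² and σ²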
Probability Density Under the Alternative
• Product of marginal densities of the two groups
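$$
\begin{aligned}
p_A(y_g, x_g) &= p(y_g, x_g \mid H_{A,g}) \\
&= p(y_g \mid H_{A,g}) \times p(x_g \mid H_{A,g}) \\
&= \int p(y_g \mid H_{A,g}, \mu_g)\, p(\mu_g)\, d\mu_g \times \int p(x_g \mid H_{A,g}, \mu_g)\, p(\mu_g)\, d\mu_g \\
&= \int \left\{ \prod_{i=1}^{n_1} p(y_{i,g} \mid \mu_g) \right\} p(\mu_g)\, d\mu_g \times \int \left\{ \prod_{j=1}^{n_2} p(x_{j,g} \mid \mu_g) \right\} p(\mu_g)\, d\mu_g
\end{aligned}
$$

Depends upon μ, τ² and σ²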
Marginal Densities
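• p0(yg, xg) = Normal (mean, var0)
• pA(yg, xg) = Normal (mean, var1)
• mean = (μ, μ, ..., μ) [n = n1+n2 times]
• Integrating out μg, each observation has variance σ² + τ², and two observations that share the same μg have covariance τ²
• Under the null all n observations share one μg:

$$
\mathrm{var}_0 =
\begin{pmatrix}
\sigma^2 + \tau^2 & \tau^2 & \cdots & \tau^2 \\
\tau^2 & \sigma^2 + \tau^2 & \cdots & \tau^2 \\
\vdots & \vdots & \ddots & \vdots \\
\tau^2 & \tau^2 & \cdots & \sigma^2 + \tau^2
\end{pmatrix}_{n \times n}
$$

• Under the alternative the two groups have separate means, so the covariance is block diagonal:

$$
\mathrm{var}_1 =
\begin{pmatrix}
V_{n_1 \times n_1} & 0 \\
0 & V_{n_2 \times n_2}
\end{pmatrix},
\qquad
V_{m \times m} =
\begin{pmatrix}
\sigma^2 + \tau^2 & \tau^2 & \cdots & \tau^2 \\
\tau^2 & \sigma^2 + \tau^2 & \cdots & \tau^2 \\
\vdots & \vdots & \ddots & \vdots \\
\tau^2 & \tau^2 & \cdots & \sigma^2 + \tau^2
\end{pmatrix}_{m \times m}
$$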
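Under the hierarchical model yig | μg ~ N(μg, σ²), μg ~ N(μ, τ²), a group of observations sharing one μg is jointly Normal with covariance σ²I + τ²J, where J is the all-ones matrix. That matrix has determinant σ^(2(n-1)) (σ² + nτ²) and inverse (1/σ²)[I - τ²/(σ² + nτ²) J], so the log marginal densities can be evaluated without any matrix library. The sketch below uses those identities; the function names and the example expression values are hypothetical.

```python
import math


def log_marginal(w, mu, tau2, sigma2):
    """Log density of N(mu * 1, sigma2 * I + tau2 * J) at the vector w."""
    n = len(w)
    d = [wi - mu for wi in w]
    sum_d = sum(d)
    sum_d2 = sum(di * di for di in d)
    s = sigma2 + n * tau2
    log_det = (n - 1) * math.log(sigma2) + math.log(s)
    quad = (sum_d2 - tau2 * sum_d ** 2 / s) / sigma2
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def log_p0(y, x, mu, tau2, sigma2):
    """Null: all n1 + n2 observations share a single mu_g."""
    return log_marginal(list(y) + list(x), mu, tau2, sigma2)


def log_pA(y, x, mu, tau2, sigma2):
    """Alternative: each group has its own mean, so the marginals multiply."""
    return log_marginal(y, mu, tau2, sigma2) + log_marginal(x, mu, tau2, sigma2)


# Hypothetical log-expression values for one gene with well separated groups.
y_g, x_g = [1.0, 1.2], [3.0, 3.1]
lp0 = log_p0(y_g, x_g, mu=2.0, tau2=1.0, sigma2=0.01)
lpA = log_pA(y_g, x_g, mu=2.0, tau2=1.0, sigma2=0.01)
print(f"log p0 = {lp0:.2f}, log pA = {lpA:.2f}")
```

For this gene the alternative has the larger marginal density, as expected when the group means are far apart relative to σ.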
Estimation
• Likelihood for gene g: p(datag)
• Likelihood for the full data:

$$
\prod_{g=1}^{G} p(\text{data}_g) = \prod_{g=1}^{G} \left\{ \pi\, p_0(y_g, x_g) + (1 - \pi)\, p_A(y_g, x_g) \right\}
$$

• Maximize likelihood to estimate π, μ, τ², σ².
• Empirical Bayes solution
• Note: not interested in posterior density of μg
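The slide maximizes the likelihood jointly over π, μ, τ² and σ². As a partial illustration only, the sketch below holds the marginal densities p0 and pA fixed for every gene and maximizes over π alone, using the standard EM update for a mixing proportion. This is a simplification chosen for the example, not the procedure of Kendziorski et al.; the function name and inputs are hypothetical.

```python
def estimate_pi(p0, pA, pi=0.5, n_iter=200):
    """EM updates for the mixing proportion pi with p0 and pA held fixed.

    p0, pA: per-gene marginal densities under the null and the alternative.
    """
    for _ in range(n_iter):
        # E-step: posterior probability of the null for each gene.
        z = [pi * a / (pi * a + (1.0 - pi) * b) for a, b in zip(p0, pA)]
        # M-step: pi is the average of those posterior probabilities.
        pi = sum(z) / len(z)
    return pi


# Hypothetical densities for five genes.
p0 = [0.80, 0.75, 0.90, 0.01, 0.02]
pA = [0.10, 0.20, 0.15, 0.60, 0.70]
pi_hat = estimate_pi(p0, pA)
print(f"estimated pi = {pi_hat:.3f}")
```

A full empirical Bayes fit would alternate this step with updates of μ, τ² and σ², or pass the whole log-likelihood to a numerical optimizer.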
Discussion of Results from Kendziorski et al., (2003)
Summary
• Specify distributions in a hierarchical manner
• Samples from posterior density
• Alternatively, empirical Bayes solution
• Gene expression setting: calculate posterior probability of null hypothesis for each gene
Wrap-up
• Specify sampling and prior densities
• Derive posterior density
• Inference based on posterior densities
Some References
• A Gelman, JB Carlin, HS Stern, DB Rubin (1995). Bayesian data analysis. Chapman and Hall, New York.
• BP Carlin, TA Louis (1996). Bayes and empirical Bayes methods for data analysis. Chapman and Hall, New York.
• WR Gilks, S Richardson, DJ Spiegelhalter (1996). Markov chain Monte Carlo in practice. Chapman and Hall, New York.
• SC Heath (1997). Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics 61: 748-760.
• CM Kendziorski, MA Newton, H Lan, MN Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22:3899-3914.
• S Lin (1999). Monte Carlo Bayesian methods for quantitative traits. Computational Statistics and Data Analysis 31: 89-108.
• MA Newton and Y Lee (2000). Inferring the location and effect of tumor suppressor genes by instability-selection modeling of allelic-loss data. Biometrics 56: 1088-1097.
• JM Satagopan, BS Yandell, MA Newton, TC Osborn (1996). A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144: 805-816.
• JM Satagopan, K Offit, W Foulkes, ME Robson, S Wacholder, CM Eng, SE Karp, CB Begg (2001). The lifetime risks of breast cancer in Ashkenazi Jewish carriers of BRCA1 and BRCA2 mutations. Cancer Epidemiology, Biomarkers and Prevention 10: 467-473.
• EA Thompson (2000). MCMC estimation of multi-locus genome sharing and multipoint gene location scores. International Statistical Review 68: 53-73.
• EA Thompson (2000). Statistical inference from genetic data on pedigrees. Institute of Mathematical Statistics Monograph. Volume 6.