
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

Part 1: Mixture Priors for Linear Settings

Marina Vannucci

Rice University, USA

PASI-CIMAT, 04/28-30/2010

Marina Vannucci (Rice University, USA) Bayesian Variable Selection (Part 1) PASI-CIMAT 04/28-30/2010 1 / 36


Mixture Priors for Linear Settings

Part 1: Mixture Priors for Linear Settings

Linear regression models (univariate and multivariate responses)

Matlab code on simulated data

Extensions to categorical responses and survival outcomes

Applications to high-throughput data from bioinformatics

Models that incorporate biological information


Mixture Priors for Linear Settings Linear Regression Models

Regression Model

Y(n×1) = 1α + X(n×p) β(p×1) + ε,  ε ~ N(0, σ²I)

Introduce a latent vector γ = (γ1, . . . , γp)′ to select variables:
γj = 1 if variable j is included in the model; γj = 0 otherwise.

Specify priors for model parameters:

βj | σ² ~ (1 − γj) δ0(βj) + γj N(0, σ² hj)

α | σ² ~ N(α0, h0 σ²)

σ² ~ IG(ν/2, λ/2)

p(γ) = ∏_{j=1}^{p} w^{γj} (1 − w)^{1−γj}

where δ0(·) is the Dirac delta at zero.
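The mixture (spike-and-slab) prior on the βj's can be simulated directly. A minimal Python sketch (the course code is in Matlab; the values of w, σ² and the hj's below are arbitrary illustrations):

```python
import random

def sample_spike_slab(p, w, sigma2, h):
    """Draw (gamma, beta) from the mixture prior:
    gamma_j ~ Bernoulli(w); beta_j = 0 when gamma_j = 0 (the delta_0
    spike), beta_j ~ N(0, sigma2 * h_j) when gamma_j = 1 (the slab)."""
    gamma = [1 if random.random() < w else 0 for _ in range(p)]
    beta = [random.gauss(0.0, (sigma2 * hj) ** 0.5) if g == 1 else 0.0
            for g, hj in zip(gamma, h)]
    return gamma, beta

random.seed(1)
gamma, beta = sample_spike_slab(p=10, w=0.2, sigma2=1.0, h=[1.0] * 10)
# beta_j is exactly zero wherever gamma_j = 0
assert all(b == 0.0 for g, b in zip(gamma, beta) if g == 0)
```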


Posterior Distribution

Combining the data and prior information gives the posterior distribution of interest:

p(γ | Y, X) ∝ p(γ) ∫ f(Y | X, α, β, σ) p(α | σ) p(β | σ, γ) p(σ) dα dβ dσ

p(γ | Y, X) ∝ g(γ) = |X̃(γ)′X̃(γ)|^(−1/2) (νλ + S²γ)^(−(n+ν)/2) p(γ)

with the augmented quantities

X̃(γ) = ( X(γ)H(γ)^(1/2) ; I_pγ ),  Ỹ = ( Y ; 0 )

S²γ = Ỹ′Ỹ − Ỹ′X̃(γ) (X̃(γ)′X̃(γ))^(−1) X̃(γ)′Ỹ,

the residual sum of squares from the least squares regression of Ỹ on X̃(γ). Fast updating schemes use Cholesky or QR decompositions with efficient algorithms to remove or add columns.
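For a given γ, S²γ is just the residual sum of squares of an ordinary least squares fit. A small pure-Python illustration (solving the normal equations directly; real implementations update a QR or Cholesky factorization instead, as noted above):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rss(X, Y, gamma):
    """S^2_gamma: residual sum of squares from the least squares
    regression of Y on the columns of X selected by gamma."""
    cols = [j for j, g in enumerate(gamma) if g == 1]
    Xg = [[row[j] for j in cols] for row in X]
    k = len(cols)
    XtX = [[sum(r[a] * r[b] for r in Xg) for b in range(k)] for a in range(k)]
    XtY = [sum(r[a] * y for r, y in zip(Xg, Y)) for a in range(k)]
    beta = solve(XtX, XtY)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in Xg]
    return sum((y - f) ** 2 for y, f in zip(Y, fitted))

# columns: intercept, x, x^2; Y is exactly 2 + 3x
X = [[1.0, float(x), float(x * x)] for x in range(1, 6)]
Y = [2.0 + 3.0 * x for x in range(1, 6)]
print(rss(X, Y, [1, 1, 0]) < 1e-8)  # → True
```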


Model Fitting via MCMC

With p variables there are 2^p distinct values of γ. We use a Metropolis algorithm as a stochastic search. At each MCMC iteration we generate a candidate γnew by randomly choosing one of these moves:

(i) Add or delete: randomly choose one of the indices in γold and change its value.

(ii) Swap: choose independently and at random a 0 and a 1 in γold and switch their values.

The proposed γγγnew is accepted with probability

min{ p(γnew | X, Y) / p(γold | X, Y), 1 }.
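The add/delete/swap search can be sketched in a few lines of Python. Here log_post stands for the log of the unnormalized posterior g(γ)p(γ); the toy target below is a hypothetical stand-in, not the model's actual marginal likelihood:

```python
import math
import random

def propose(gamma):
    """Generate a candidate by add/delete or swap."""
    g = gamma[:]
    ones = [j for j, v in enumerate(g) if v == 1]
    zeros = [j for j, v in enumerate(g) if v == 0]
    if ones and zeros and random.random() < 0.5:
        # swap: switch a randomly chosen 1 and a randomly chosen 0
        g[random.choice(ones)] = 0
        g[random.choice(zeros)] = 1
    else:
        # add or delete: flip one randomly chosen index
        j = random.randrange(len(g))
        g[j] = 1 - g[j]
    return g

def metropolis(log_post, gamma0, n_iter):
    """Stochastic search over gamma; accepts with prob min{p_new/p_old, 1}."""
    chain, g, lp = [], gamma0[:], log_post(gamma0)
    for _ in range(n_iter):
        cand = propose(g)
        lp_cand = log_post(cand)
        if lp_cand >= lp or random.random() < math.exp(lp_cand - lp):
            g, lp = cand, lp_cand
        chain.append(g[:])
    return chain

# toy log-target: rewards including variable 0, penalizes model size
random.seed(2)
chain = metropolis(lambda g: 3.0 * g[0] - sum(g), [0] * 5, 2000)
```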


Posterior inference

The stochastic search results in a list of visited models (γ(0), γ(1), . . .) and their corresponding relative posterior probabilities

p(γ(0) | X, Y), p(γ(1) | X, Y), . . .

Select variables: in the “best” models, i.e. the γ’s with highest p(γ | X, Y), or with largest marginal posterior probabilities

p(γj = 1 | X, Y) = ∫ p(γj = 1, γ(−j) | X, Y) dγ(−j) ≈ Σ_{γ(t): γj=1} p(Y | X, γ(t)) p(γ(t))

or more simply by empirical frequencies in the MCMC output

p(γj = 1 | X, Y) = E(γj | X, Y) ≈ #{t : γj(t) = 1} / M
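The empirical-frequency estimate is simply a column average over the visited γ(t) vectors; a minimal sketch:

```python
def marginal_inclusion(chain):
    """Estimate p(gamma_j = 1 | X, Y) as the fraction of MCMC draws
    in which variable j is included."""
    M = len(chain)
    p = len(chain[0])
    return [sum(g[j] for g in chain) / M for j in range(p)]

# three visited models over p = 4 variables
visited = [[1, 0, 0, 1],
           [1, 1, 0, 0],
           [1, 0, 0, 1]]
print([round(v, 2) for v in marginal_inclusion(visited)])  # → [1.0, 0.33, 0.0, 0.67]
```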


Multivariate Response

Y(n×q) = 1α′ + X(n×p) B(p×q) + E,  Ei ~ N(0, Σ)

Variable selection via γγγ as

Bj | Σ ~ (1 − γj) I0 + γj N(0, hj Σ),

with Bj the j-th row of B and I0 a vector of point masses at 0.

Need to work with matrix-variate distributions (Dawid, 1981):

Y − 1α′ − XB ~ N(In, Σ)

α − α0 ~ N(h0, Σ)

B(γ) − B0(γ) ~ N(H(γ), Σ)

Σ ~ IW(δ, Q),

with IW an inverse-Wishart with parameters δ and Q to be specified.


Posterior Distribution

Combining the data and prior information gives the posterior distribution of interest:

p(γ | Y, X) ∝ p(γ) ∫ f(Y | X, α, B, Σ) p(α | Σ) p(B | Σ, γ) p(Σ) dα dB dΣ

p(γ | Y, X) ∝ g(γ) = |X̃(γ)′X̃(γ)|^(−q/2) |Qγ|^(−(n+δ+q−1)/2) p(γ)

with the augmented quantities

X̃(γ) = ( X(γ)H(γ)^(1/2) ; I_pγ ),  Ỹ = ( Y ; 0 )

Qγ = Q + Ỹ′Ỹ − Ỹ′X̃(γ) (X̃(γ)′X̃(γ))^(−1) X̃(γ)′Ỹ

It can be calculated via the QR decomposition (Seber, 1984, ch. 10). Use qrdelete and qrinsert algorithms to remove or add a column.


Prediction

Prediction of a future Yf given the corresponding Xf can be done:

as posterior weighted average of model predictions (BMA)

p(Yf | X, Y) = Σ_γ p(Yf | X, Y, γ) p(γ | X, Y)

with p(Yf | X, Y, γ) a matrix-variate T distribution with mean Xf Bγ

Ŷf = Σ_γ ( Xf(γ) Bγ ) p(γ | X, Y),  Bγ = (X′γ Xγ + H⁻¹γ)⁻¹ X′γ Y

as LS or Bayes predictions on single best models

as LS or Bayes predictions with “threshold” models (e.g., the “median” model) obtained from estimated marginal probabilities of inclusion.
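The BMA prediction above is a posterior-probability-weighted average of single-model predictions; schematically (toy numbers, pure Python):

```python
def bma_predict(preds, weights):
    """Posterior-weighted average of model-specific predictions:
    preds[m][i] is model m's prediction for observation i."""
    total = sum(weights)  # normalize in case weights are only relative
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total
            for i in range(len(preds[0]))]

# two models' predictions for three future observations
preds = [[1.0, 2.0, 3.0],   # model gamma_1
         [3.0, 2.0, 1.0]]   # model gamma_2
weights = [0.75, 0.25]      # posterior probabilities p(gamma | X, Y)
print(bma_predict(preds, weights))  # → [1.5, 2.0, 2.5]
```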


Prior Specification

Priors on ααα and ΣΣΣ vague and largely uninformative

α′ − α′0 ~ N(h, Σ),  α0 ≡ 0,  h → ∞

Σ ~ IW(δ, Q),  δ = 3,  Q = kI

Choices for Hγγγ :

Hγ = c (X′γ Xγ)⁻¹ (Zellner g-prior)

Hγ = c diag(X′γ Xγ)⁻¹

Hγ = c Iγ

Choice of wj = p(γj = 1): wj = w, with w ~ Beta(a, b) (sparsity). Also, choices that reflect prior information (e.g., gene networks).
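The diagonal and identity choices for Hγ are straightforward to construct from X′γXγ; a small sketch (the full g-prior form additionally requires a matrix inverse, omitted here):

```python
def h_identity(XtX, c):
    """H_gamma = c * I_gamma."""
    k = len(XtX)
    return [[c if a == b else 0.0 for b in range(k)] for a in range(k)]

def h_diagonal(XtX, c):
    """H_gamma = c * diag(X'X)^{-1}: each coefficient's prior variance
    is scaled by the inverse sum of squares of its column."""
    k = len(XtX)
    return [[c / XtX[a][a] if a == b else 0.0 for b in range(k)]
            for a in range(k)]

XtX = [[4.0, 1.0],
       [1.0, 2.0]]
print(h_diagonal(XtX, 1.0))  # → [[0.25, 0.0], [0.0, 0.5]]
```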


Advantages of Bayesian Approach

Past and collateral information through priors

n << p

Rich modeling via Markov chain Monte Carlo (MCMC) (for p large)

Optimal model averaging prediction

Extends to multivariate response


Main References

GEORGE, E.I. and MCCULLOCH, R.E. (1993). Variable Selection via Gibbs Sampling. Journal of the American Statistical Association, 88, 881–889.

GEORGE, E.I. and MCCULLOCH, R.E. (1997). Approaches for Bayesian Variable Selection. Statistica Sinica, 7, 339–373.

MADIGAN, D. and YORK, J. (1995). Bayesian Graphical Models for Discrete Data. International Statistical Review, 63, 215–232.

BROWN, P.J., VANNUCCI, M. and FEARN, T. (1998). Multivariate Bayesian Variable Selection and Prediction. Journal of the Royal Statistical Society, Series B, 60, 627–641.

BROWN, P.J., VANNUCCI, M. and FEARN, T. (2002). Bayes model averaging with selection of regressors. Journal of the Royal Statistical Society, Series B, 64(3), 519–536.


Additional References

Use of g-priors: LIANG, F., PAULO, R., MOLINA, G., CLYDE, M. and BERGER, J. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–423.

Improving MCMC mixing: BOTTOLO, L. and RICHARDSON, S. (2009). Evolutionary stochastic search. Journal of Computational and Graphical Statistics, under revision. The authors propose an evolutionary Monte Carlo scheme combined with a parallel tempering approach that prevents the chain from getting stuck in local modes.

Multiplicity: SCOTT, J. and BERGER, J. (2008). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, to appear. The marginal prior on γ contains a non-linear penalty which is a function of p; therefore, as p grows with the number of true variables remaining fixed, the posterior distribution of w concentrates near 0.


Mixture Priors for Linear Settings Matlab Code

Code from my Website

bvsme fast: Bayesian Variable Selection with fast form of QR updating

Metropolis search

gPrior or diagonal and non-diagonal selection prior

Bernoulli priors or Beta-Binomial prior

Predictions by LS, BMA and BMA with selection

http://stat.rice.edu/~marina


Mixture Priors for Linear Settings Categorical Outcomes

Probit Models with Binary Response

Response with G = 2 classes: zi ∈ {0, 1} associated with a set of p predictors Xi, i = 1, . . . , n.

Data augmentation: latent (unobserved) yi linearly associated with the Xi’s

yi = α + X′i β + εi,  εi ~ N(0, σ² = 1),  i = 1, . . . , n.

with intercept α and coefficient vector βββp×1.

Association:

zi = 0 if yi < 0;  zi = 1 otherwise
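In the Gibbs step, each latent yi is drawn from a normal truncated to the side implied by zi. A simple rejection-sampling sketch (inefficient when the mean is far from the truncation point, but adequate as an illustration):

```python
import random

def sample_latent(mean, z):
    """Draw y ~ N(mean, 1) truncated to y >= 0 if z = 1 and to y < 0
    if z = 0, by simple rejection sampling."""
    while True:
        y = random.gauss(mean, 1.0)
        if (z == 1) == (y >= 0):
            return y

random.seed(3)
draws = [sample_latent(0.5, 1) for _ in range(500)]
# every accepted draw lies on the side implied by z = 1
assert min(draws) >= 0
```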


Probit Models with Multinomial Response

Response with G classes: zi ∈ {0, 1, . . . , G − 1} associated with a set of p predictors Xi, i = 1, . . . , n (gene expressions).

Data augmentation: latent (unobserved) vector Yi linearly associated with the Xi’s

Yi = α′ + X′i B + Ei,  Ei ~ N(0, Σ),  i = 1, . . . , n.

with intercepts α((G−1)×1) and coefficient matrix B(p×(G−1)).

Association:

zi = 0 if yig < 0 for each g;  zi = g if yig = max_{1≤g≤G−1} {yig}


Variable Selection

We introduce a binary latent vector γ for variable selection: γj = 1 if variable j discriminates the samples; γj = 0 otherwise.

A mixture prior is placed on the j-th row of B, given γ

Bj ~ (1 − γj) I0 + γj N(0, cΣ)

Assume γj ’s are independent Bernoulli variables

Combine data and priors into the posterior p(γ | X, Y). Inference is complicated because the response variable is latent.

Σ ~ IW(δ, Q),  α ~ N(0, h0 Σ), with h0 large.


MCMC Algorithm

We sample (γγγ,Y) by Metropolis within Gibbs

Metropolis step to update γ from [γ | X, Z, Y]. We update γ(old) to γ(new) by:

(a) Add/delete: randomly choose a γj and change its value.
(b) Swap: randomly choose a 0 and a 1 in γ(old) and switch their values.

The new candidate γγγ(new) is accepted with probability

min{ p(γ(new) | X, Z, Y) / p(γ(old) | X, Z, Y), 1 }

We sample (Y | γ, X, Z) from a truncated normal or t distribution with truncation based on Z.


Posterior Inference

Select variables that are in the “best” models

γ* = argmax_{1≤t≤M} { p(γ(t) | X, Z, Ȳ) },  with Ȳ = (1/M) Σ_{t=1}^{M} Y(t)

Select variables with largest marginal probabilities

p(γj = 1 | X, Z, Ȳ)

Predict future Yf by a posterior predictive mean

Ŷf = Σ_γ Yf(γ) π(γ | Ȳ, X, Z)

with Yf(γ) = 1α′ + Xf(γ) Bγ and α, Bγ based on Ȳ


Code from my website

bvsme prob: Bayesian Variable Selection for classification with fast form of QR updating

binary/multinomial/ordinal response

Metropolis search

gPrior or diagonal and non-diagonal selection prior

Bernoulli priors or Beta-Binomial prior

Predictions by LS, BMA and BMA with selection

http://stat.rice.edu/~marina


Logit Models

More naturally interpretable in terms of odds ratios. Marginalization is not possible. For binary data, a data-augmented model is

zi = x′i β + εi,

with εi a scale mixture of normals with marginal logistic distribution:

εi ~ N(0, λi),  λi = (2ψi)²,  ψi ~ KS,

with KS the Kolmogorov-Smirnov distribution. Variable selection is achieved by imposing mixture priors on the βj’s. Sampling schemes improve mixing by joint updates of correlated parameters, i.e., (γ, β), using a Metropolis-Hastings step with the full conditional of β as proposal and the add-delete-swap Metropolis for γ. Also, (z, λ) are updated from truncated logistic distributions and by rejection sampling.


Mixture Priors for Linear Settings Survival Outcomes

Accelerated Failure Time models

We use accelerated failure time (AFT) models

log(Ti) = α + X′i β + εi,  i = 1, . . . , n.

Observe yi = min(ti, ci) and δi = I{ti ≤ ci}, where ci is the censoring time.

We introduce augmented data W = (w1, . . . , wn)′ to impute the censored survival times:

wi = log(yi) if δi = 1;  wi > log(yi) if δi = 0

We consider different distributional assumptions for εi .
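Under normal errors, the imputation step draws each censored wi from a normal truncated below at log(yi). A rejection-sampling sketch (the mean and sd arguments are hypothetical stand-ins for α + x′iβ and the error scale):

```python
import math
import random

def impute_w(log_y, delta, mean, sd):
    """Augmented datum w_i: the observed log time when the event was
    seen (delta = 1); otherwise a draw from N(mean, sd^2) truncated
    to w > log_y, since the true failure time exceeds the censoring time."""
    if delta == 1:
        return log_y
    while True:  # rejection sampling from the truncated normal
        w = random.gauss(mean, sd)
        if w > log_y:
            return w

random.seed(4)
w_obs = impute_w(math.log(2.0), 1, 1.0, 1.0)   # uncensored: returned as-is
w_cen = impute_w(math.log(2.0), 0, 1.0, 1.0)   # censored: drawn above log(2)
assert w_obs == math.log(2.0) and w_cen > math.log(2.0)
```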


Introduce a latent vector γ for variable selection. MCMC steps consist of:
(1) Metropolis search to update γ from f(γ | X, W).
(2) Impute censored failure times, wi with δi = 0, from f(wi | W−i, X, γ).

Inference on variables based on p(γj = 1 | X, W) or p(γ | X, W).

Prediction of survival time for future patients

Ŵf = Σ_γ ( 1α′ + Xf(γ) βγ ) p(γ | X, W).

Predictive survivor function

P(Tf > t | Xf, X, W) ≈ Σ_γ P(W > w | Xf, X, W, γ) p(γ | X, W).


Code from my website

bvsme surv: Bayesian Variable Selection for AFT models with right censoring

Metropolis search

diagonal selection prior

Bernoulli priors or Beta-Binomial prior

http://stat.rice.edu/~marina


Main References

ALBERT, J.H. and CHIB, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. JASA, 88(422), 669–679.

SHA, N., VANNUCCI, M., TADESSE, M.G., BROWN, P.J., DRAGONI, I., DAVIES, N., ROBERTS, T.C., CONTESTABILE, A., SALMON, N., BUCKLEY, C. and FALCIANI, F. (2004). Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60, 812–819.

HOLMES, C.C. and HELD, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1), 145–166.

SHA, N., TADESSE, M.G. and VANNUCCI, M. (2006). Bayesian variable selection for the analysis of microarray data with censored outcome. Bioinformatics, 22(18), 2262–2268.


Mixture Priors for Linear Settings Applications to High-Throughput Data

DNA microarrays

DNA fragments arrayed on glass slide or chip

Parallel quantification of thousands of genes in a single experiment

Identify biomarkers for treatment strategies and diagnostic tools


Statistical Analyses

Identification of differentially expressed genes (gene selection for sample classification)

Discovery of subtypes of tissue/disease that respond differently to treatment (gene selection and sample clustering)

Prediction of continuous responses (clinical outcome, survival time)

The major challenge is the high-dimensionality of the data: p ≫ n.

Widely used approaches: t-test, ANOVA, or the Cox model on single genes (ignores the joint effect of genes; multiple testing issue), or dimension reduction techniques such as PCA and PLS (lead to linear combinations; cannot assess the original variables). Lately, emphasis on subset selection methods (LASSO, Bayesian models).


Identification of Biomarkers of Disease Stage

Data consist of 11 early stage (duration less than 2 years) and 9 late stage (over 15 years) rheumatoid arthritis patients.

mRNA samples extracted from peripheral blood and hybridized to custom-made cDNA arrays.

755 gene expressions. Data were logged and standardized.

Bernoulli prior with expected model size 10

We ran six MCMC chains with very different starting γ vectors.


[Figure: (a) trace plot of the number of included variables (number of ones in γ) over 100,000 MCMC iterations; (b) trace plot of the log posterior probabilities; (c) estimated marginal posterior probabilities of inclusion for the 755 genes.]


Selected genes by best 10 models of each chain and of their union

Small sets of functionally related genes involved in cytoskeleton remodeling and motility, and with lymphocytes’ ability to respond to activation.

0.05 (1/20) misclassification error.

[Figure: heatmap of expression of the selected genes (rows) across the 20 patients (early stage 1–11, late stage 12–20).]


Mixture Priors for Linear Settings Protein Mass Spectrometry

Peak Selection for Protein Mass spectra

Cancer classification based on mass spectra at 15,000 m/z ratios. The x-axis is the ratio of the weight of a molecule to its electrical charge (m/z); the y-axis is intensity, proportional to the abundance of that molecule in the sample. Goal: identification of peaks (proteins or protein fragments) related to a clinical outcome or disease status.


[Figure: three serum mass spectra, panels (A), (B), (C); x-axis m/z (2,000–14,000), y-axis intensity (0–300).]

Serum spectra on 50 (10+11+29) subjects (SELDI-TOF). Ordinal response: tumor grade (ovarian cancer).


Data Processing

Preprocessing:

- Baseline subtraction
- Denoising (often by wavelets)
- Peak identification
- Normalization
- Alignment

Analysis:

- Model fitting
- Validation
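As a toy illustration of the baseline-subtraction step, a moving-minimum baseline (a crude stand-in for the morphological or loess-type baselines typically used in practice):

```python
def subtract_baseline(intensity, half_window):
    """Estimate the baseline at each m/z point as the minimum intensity
    in a sliding window, then subtract it (floored at zero)."""
    n = len(intensity)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(max(0.0, intensity[i] - min(intensity[lo:hi])))
    return out

# flat baseline of 10 with a single peak at index 3
spectrum = [10.0, 10.0, 10.0, 60.0, 10.0, 10.0, 10.0]
print(subtract_baseline(spectrum, 2))  # → [0.0, 0.0, 0.0, 50.0, 0.0, 0.0, 0.0]
```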


Data processing results in 39 identified peaks. The probit model with Bayesian variable selection was applied to the 39 peaks. “Best” models contain around 7 peptides (6 in common). Misclassification errors: (2/10, 8/11, 9/29).

[Figure: four panels of estimated marginal posterior probabilities of inclusion for the 39 peaks.]


Case Study on Breast Cancer (van’t Veer et al. (2002))

Microarray data on 76 patients: 33 who developed distant metastases within 5 years and 43 who did not (censored). Training and test sets (38+38 patients). About 5,000 genes. MSE = 1.9 (with 11 genes).

[Figure: left, estimated marginal inclusion frequencies of γj = 1 across the gene indices; right, predictive survival probability as a function of log survival time.]


Main References

SHA, N., VANNUCCI, M., TADESSE, M.G., BROWN, P.J., DRAGONI, I., DAVIES, N., ROBERTS, T.C., CONTESTABILE, A., SALMON, N., BUCKLEY, C. and FALCIANI, F. (2004). Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60, 812–819.

KWON, D.W., TADESSE, M.G., SHA, N., PFEIFFER, R.M. and VANNUCCI, M. (2007). Identifying biomarkers from mass spectrometry data with ordinal outcome. Cancer Informatics, 3, 19–28.

SHA, N., TADESSE, M.G. and VANNUCCI, M. (2006). Bayesian variable selection for the analysis of microarray data with censored outcome. Bioinformatics, 22(18), 2262–2268.

TADESSE, M.G., SHA, N., KIM, S. and VANNUCCI, M. (2006). Identification of biomarkers in classification and clustering of high-throughput data. In Bayesian Inference for Gene Expression and Proteomics, K.-A. Do, P. Müller and M. Vannucci (Eds). Cambridge University Press.
