Bayesian dynamic modeling of latent trait distributions

Duke University Machine Learning Group

Presented by Kai Ni

Jan. 25, 2007

Paper by David B. Dunson,

Biostatistics, 2006

Outline

• Introduction

• Measurement model

• Dynamic mixture of Dirichlet processes

• Inference

• Results & Conclusion

Motivation

• The general problem – The primary response variable of interest cannot be measured directly, and one must rely on multiple surrogates.

– The different measured outcomes are assumed to be manifestations of a latent variable, which may depend on covariates.

• Example – The frequency of DNA strand breaks cannot be measured directly, but gel electrophoresis provides surrogates. The distribution of DNA damage across cells may have different shapes depending on the level of oxidative stress.

Motivation (2)

• The paper focuses on developing an approach for assessing dynamic changes in the latent response distribution across levels of a predictor.

• Dynamic mixture of Dirichlet processes (DMDP) – The latent response distribution in group h is represented as a mixture of the distribution in group h-1 and an unknown innovation distribution, which is assigned a DP prior.

Measurement Model

• Let yhi = (yhi1,…,yhip)’ denote a p x 1 vector of surrogate measurements for the latent response of the ith (i = 1,…,nh) subject in group h (h = 1,…,d).

• For example, in the DNA damage study, yhi denotes surrogates of DNA damage for the ith cell in dose group h.

• The vector yhi has both continuous and categorical elements. A mapping function links each element to an underlying continuous variable; these are collected in the vector y*hi.
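The slide does not state the mapping function itself. As a minimal sketch, assuming an identity link for continuous elements and a threshold (cut-point) link for ordered categorical elements, the mapping could look like the following. The function name `to_observed` and the cut-points are hypothetical and chosen only for illustration.

```python
import numpy as np

def to_observed(y_star, thresholds=None):
    """Map an underlying continuous value to an observed element.

    thresholds=None  -> continuous element, identity link (assumption).
    thresholds=[...] -> ordered categorical element: the category is the
                        number of cut-points lying at or below y_star.
    """
    if thresholds is None:
        return y_star
    cuts = np.asarray(thresholds, dtype=float)
    return int(np.searchsorted(cuts, y_star, side="right"))

# Hypothetical cut-points defining three ordered categories 0, 1, 2.
cuts = [0.0, 1.0]
categories = [to_observed(v, cuts) for v in (-0.5, 0.3, 2.0)]
```

With the cut-points above, values below 0 fall in category 0, values in [0, 1) in category 1, and values at or above 1 in category 2.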

Measurement Model (2)

• Relate the underlying continuous variables to the latent response through a measurement model:

– The model combines four ingredients: the latent variable, intercept parameters, factor loadings, and measurement errors.

– A scale mixture of normal distributions is assumed for the residual distribution.
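The equation itself did not survive in this transcript. One linear factor form consistent with the four ingredients named above is sketched below. The symbols (mu_j for the intercepts, lambda_j for the factor loadings, eta_hi for the latent response, epsilon_hij for the measurement errors, kappa_hij for the mixing variable) are assumptions made for illustration and may differ from the paper's notation.

```latex
% Assumed linear measurement model for surrogate j of subject i in group h
y^{*}_{hij} = \mu_j + \lambda_j \, \eta_{hi} + \epsilon_{hij},
\qquad j = 1, \dots, p

% A generic scale mixture of normals for the residuals:
% conditionally normal, with a random precision multiplier kappa_{hij}
\epsilon_{hij} \mid \kappa_{hij} \sim N\!\left(0, \; \kappa_{hij}^{-1} \sigma_j^{2}\right)
```

Under this form, a heavier-tailed residual distribution is obtained by placing a distribution on the mixing variable rather than fixing it at 1.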

• The primary goal is to assess how the latent response distribution changes between groups.

Dynamic mixture of Dirichlet processes

• First, the latent response distribution for group 1, G1, is assumed to be drawn from a DP, which in turn determines the predictive density of the latent response for group 1.

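• Assume the distribution G2 for group 2 shares features with G1 but that innovation may have occurred. So $G_2 = (1 - \pi_1) G_1 + \pi_1 H_1$, where the innovation distribution $H_1$ is assigned a DP prior with precision $\alpha_1$ and base $H_{01}$.

• G2 is randomly modified from G1 by (1) reducing the probabilities allocated to the atoms in G1 by a factor $(1 - \pi_1)$ and (2) incorporating new atoms drawn from the base $H_{01}$.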
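The two-step modification described above can be simulated with a truncated stick-breaking representation of the DP. This is a sketch under stated assumptions, not the paper's sampler: the truncation level, the standard normal base distribution, and the values chosen for the innovation weight `pi` and the precision `alpha` are all illustrative, and the function names are hypothetical.

```python
import numpy as np

def stick_breaking(alpha, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw from DP(alpha * base)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    weights = v * remaining
    atoms = base_sampler(rng, n_atoms)
    return atoms, weights

def dmdp_step(atoms_prev, weights_prev, pi, alpha, base_sampler, n_atoms, rng):
    """One dynamic step: G_h = (1 - pi) * G_{h-1} + pi * H, H ~ DP(alpha * base)."""
    new_atoms, new_weights = stick_breaking(alpha, base_sampler, n_atoms, rng)
    # (1) shrink the old atoms' probabilities, (2) append the innovation atoms
    atoms = np.concatenate((atoms_prev, new_atoms))
    weights = np.concatenate(((1.0 - pi) * weights_prev, pi * new_weights))
    return atoms, weights

def normal_base(rng, n):
    """Assumed base distribution: standard normal (illustration only)."""
    return rng.normal(0.0, 1.0, size=n)

rng = np.random.default_rng(0)
atoms1, w1 = stick_breaking(1.0, normal_base, 50, rng)          # group 1
atoms2, w2 = dmdp_step(atoms1, w1, 0.3, 1.0, normal_base, 50, rng)  # group 2
```

Because the atoms of G1 are carried over into G2 with shrunken weights, subjects in different groups can share the same atom, which is what allows clustering across groups.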

DMDP

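• For any measurable set $B$, the difference between G1 and G2 has mean and variance

$E\{G_2(B) - G_1(B) \mid G_1, \pi_1, \alpha_1\} = \pi_1 \{H_{01}(B) - G_1(B)\}$

$V\{G_2(B) - G_1(B) \mid G_1, \pi_1, \alpha_1\} = \pi_1^2 \, H_{01}(B) \{1 - H_{01}(B)\} / (1 + \alpha_1)$

• The hyperparameters $\pi_1$ (innovation weight), $\alpha_1$ (DP precision), and $H_{01}$ (innovation base distribution) control the magnitude of the expected change from G1 to G2:

$G_2 \to G_1$ as $\pi_1 \to 0$

$E\{G_2 \mid G_1, \pi_1, \alpha_1\} \to H_{01}$ as $\pi_1 \to 1$

$V\{G_2(B) \mid G_1, \pi_1, \alpha_1\} \to 0$ as $\alpha_1 \to \infty$ or $\pi_1 \to 0$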
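The mean and variance of the change from G1 to G2 follow from standard DP properties. The derivation below is a sketch that assumes the mixture form $G_2 = (1 - \pi_1) G_1 + \pi_1 H_1$ with $H_1 \sim \mathrm{DP}(\alpha_1 H_{01})$; the symbol names are assumptions where the transcript lost them.

```latex
% Step 1: the difference depends on the innovation distribution only
G_2(B) - G_1(B) = \pi_1 \{ H_1(B) - G_1(B) \}

% Step 2: marginal of a DP on a measurable set B
H_1(B) \sim \mathrm{Beta}\big( \alpha_1 H_{01}(B), \; \alpha_1 \{ 1 - H_{01}(B) \} \big)
\;\Rightarrow\;
E\{H_1(B)\} = H_{01}(B), \quad
V\{H_1(B)\} = \frac{H_{01}(B)\{1 - H_{01}(B)\}}{1 + \alpha_1}

% Step 3: condition on G_1, \pi_1, \alpha_1 (G_1(B) is then a constant)
E\{ G_2(B) - G_1(B) \mid G_1, \pi_1, \alpha_1 \} = \pi_1 \{ H_{01}(B) - G_1(B) \}

V\{ G_2(B) - G_1(B) \mid G_1, \pi_1, \alpha_1 \}
  = \pi_1^{2} \, \frac{H_{01}(B)\{1 - H_{01}(B)\}}{1 + \alpha_1}
```

The variance shrinks to zero when the innovation weight goes to 0 or the precision grows without bound, which matches the limiting behavior of G2 in those two cases.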

Correlation

• In the special case where the base distributions coincide for all l, so that the same base distribution is chosen for each component in the mixture, the correlation between consecutive G's is available in closed form.

• The prior probability of clustering together two subjects (h, i) and (h', i'), in the same or different groups, can also be written explicitly.

• For the hyperparameters, a beta distribution is chosen for the mixture weights and a gamma distribution is chosen for the DP precision parameters.

Sampling in the latent response model

Sampling in the measurement model

Inference on the latent response distribution

• Inference is based on collecting draws from the conditional predictive distribution of the latent response for a future subject in each dose group.

• After convergence, the sampled latent responses for the future subject (index nh + 1) in group h represent draws from the predictive density of the latent response in group h, and inferences can be based on comparing these densities between groups.
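Given post-convergence draws for the future subject in two groups, the comparison of predictive densities can be sketched as follows. This is an illustration, not the paper's procedure: the Gaussian kernel smoother, the bandwidth, and the all-pairs comparison (which treats the two sets of draws as independent) are assumptions, and the draws are synthetic stand-ins rather than output from the DNA damage analysis.

```python
import numpy as np

def predictive_density(draws, grid, bandwidth):
    """Gaussian kernel estimate of a predictive density from sampler draws."""
    z = (grid[:, None] - draws[None, :]) / bandwidth
    norm = draws.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def prob_greater(draws_a, draws_b):
    """Fraction of (a, b) pairs in which the group-b draw exceeds the group-a draw."""
    return float((draws_b[None, :] > draws_a[:, None]).mean())

# Stand-in draws only: NOT results from the DNA damage study.
rng = np.random.default_rng(0)
draws_g1 = rng.normal(0.0, 1.0, size=2000)  # hypothetical group 1 draws
draws_g2 = rng.normal(1.0, 1.0, size=2000)  # hypothetical group 2 draws

grid = np.linspace(-6.0, 7.0, 400)
dens_g1 = predictive_density(draws_g1, grid, bandwidth=0.3)
dens_g2 = predictive_density(draws_g2, grid, bandwidth=0.3)
shift = prob_greater(draws_g1, draws_g2)
```

Plotting `dens_g1` and `dens_g2` against `grid` shows how the shape of the latent response density changes between groups, while `shift` summarizes the change in a single number.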

DNA damage study

• The study assessed the effect of oxidative stress on the frequency of DNA strand breaks using single-cell gel electrophoresis.

• 500 human lymphoblastoid cells drawn from an immortalized cell line were randomized to one of the five dose groups (0, 5, 20, 50, or 100 micromoles H2O2).

• There are p=5 surrogate measures of DNA damage, including (1) % tail DNA, (2) tail extent divided by head extent, (3) extent tail moment, (4) Olive tail moment, and (5) tail extent.

Conclusion

• The author proposed a Bayesian semiparametric latent response model in which the latent variable density can shift dynamically across groups.

• Using a linear regression model to infer the latent variables may fail in many applications, whereas the measurement model proposed by the author is quite flexible.

• The DMDP should prove useful when interest focuses on clustering of observations within and across groups.
