Alignment and classification of time series gene expression in clinical studies Tien-ho Lin, Naftali Kaminski and Ziv Bar-Joseph

Alignment and classification of time series gene expression in clinical studies

Tien-ho Lin, Naftali Kaminski and Ziv Bar-Joseph

Outline

• Introduction

• HMM for aligning time series gene expression
  – Generative training of HMM
  – Discriminative training of HMM

Introduction

• A growing number of expression datasets are measured as time series. Methods for analyzing them should:
  – Utilize their unique features (temporal evolution of profiles)
  – Address their unique challenges (different response rates of patients in the same class)

Introduction

• We use HMMs with fewer states than time points, which leads to an alignment of the different patient response rates.

• We develop a discriminative HMM classifier instead of the traditional generative HMM.

HMM for aligning time series gene expression

• Because each patient responds at a different and varying rate, we align each patient's time series gene expression to a common profile.

• For a classification task we generate two such HMMs, one for good responders and one for poor responders.

• To avoid overfitting, the covariance matrix is assumed to be diagonal.

HMM for aligning time series gene expression

• Three state space topologies

A state i has transitions to states i+1, i+2, …, i+J, where J is the maximum jump step.

The first and third topologies can be used to align patients by modifying their transition probabilities based on the observed expression data.
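
The jump-limited topology described above can be sketched as a mask of allowed transitions. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the self-loop on each state is an assumption (some way of staying in a state is needed when there are fewer states than time points).

```python
def allowed_transitions(n_states, max_jump, self_loop=True):
    """Return a 0/1 matrix; entry [i][j] == 1 means the move i -> j is allowed."""
    mask = [[0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if self_loop:  # assumption: a patient may stay in the same state
            mask[i][i] = 1
        for step in range(1, max_jump + 1):
            j = i + step
            if j < n_states:  # jumps past the last state are dropped
                mask[i][j] = 1
    return mask


def uniform_init(mask):
    """Spread probability uniformly over the allowed transitions of each row."""
    rows = []
    for row in mask:
        total = sum(row)
        rows.append([x / total if total else 0.0 for x in row])
    return rows


mask = allowed_transitions(n_states=4, max_jump=2)
init = uniform_init(mask)
```

Transitions that start at zero stay at zero during Baum-Welch re-estimation, so a mask like this is enough to fix the topology before training.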

HMM for aligning time series gene expression

• Time series gene expression of K patients: $\{O_1, O_2, \ldots, O_K\}$

• We measure the expression of G genes for each patient at T time points: $O_k = (O_{k1}, O_{k2}, \ldots, O_{kT})$

• An HMM with multivariate Gaussian emission probabilities, $\lambda_m$, $m \in \{1, 2\}$, is trained for each class.

HMM for aligning time series gene expression

• $\{a^m_{ij}\}$ is the transition probability from state i to state j.

• $\{\mu^m_j\}, \{\sigma^m_j\}$ are the mean and SD for the Gaussian distribution of state j.

• The mean and SD of gene g in state j are denoted as $\mu^m_{jg}$ and $\sigma^m_{jg}$.

• $\gamma^m_{kt}(j)$ is the posterior probability of state j at time t of observation $O_k$.

• $\xi^m_{kt}(i, j)$ is the probability of a transition from state i to state j at time t of observation $O_k$.

Generative training of HMM

• Given labeled expression data, we can learn the parameters of an HMM using the Baum-Welch algorithm.

• Class assignment is based on maximum conditional likelihood.

• MLE is optimal if the true model is indeed the assumed HMM and there is infinite data.

• When these conditions do not hold, it can be better to focus on the differences between positive and negative data, rather than on the most visible features.
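
The generative classification rule can be sketched as follows: score a patient under each class HMM with the forward algorithm and pick the class with the highest posterior. This is a minimal sketch with assumptions: the dictionary layout (`start`, `trans`, `mean`, `sd`), the toy parameters and the class priors are all illustrative, and the parameters are taken as already trained.

```python
import math


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_gauss_diag(x, mean, sd):
    """Log density of a diagonal-covariance Gaussian at gene vector x."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mean, sd)
    )


def log_likelihood(obs, hmm):
    """Forward algorithm in log space; obs is a list of T gene vectors."""
    n = len(hmm["start"])
    alpha = [
        safe_log(hmm["start"][j]) + log_gauss_diag(obs[0], hmm["mean"][j], hmm["sd"][j])
        for j in range(n)
    ]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(hmm["trans"][i][j]) for i in range(n)])
            + log_gauss_diag(x, hmm["mean"][j], hmm["sd"][j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify(obs, hmms, priors):
    """Pick the class m maximizing log P(O | model m) + log P(m)."""
    scores = {m: log_likelihood(obs, hmms[m]) + math.log(priors[m]) for m in hmms}
    return max(scores, key=scores.get)


# Toy models: one gene, two states, left-to-right transitions (illustrative values).
good = {"start": [1.0, 0.0], "trans": [[0.5, 0.5], [0.0, 1.0]],
        "mean": [[0.0], [1.0]], "sd": [[0.5], [0.5]]}
poor = {"start": [1.0, 0.0], "trans": [[0.5, 0.5], [0.0, 1.0]],
        "mean": [[0.0], [-1.0]], "sd": [[0.5], [0.5]]}
hmms = {1: good, 2: poor}
priors = {1: 0.5, 2: 0.5}
series = [[0.1], [0.9], [1.1]]
predicted = classify(series, hmms, priors)
```

Here the series rises toward 1, so it is better explained by the toy "good responder" model.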

Discriminative training of HMM

• Discriminative models: a class of models used in machine learning for modeling the dependence of an unobserved variable y on an observed variable x. Within a statistical framework, this is done by modeling the conditional probability distribution P(y|x), which can be used for predicting y from x.

• The HMMs for both classes are learned concurrently and parameters in one of the models are affected by the parameters estimated for the other model.

Discriminative training of HMM

• To model the difference between positive and negative examples, we need to optimize a discriminative criterion.

• We use the maximum mutual information estimation (MMIE) objective function:

$C_k$ is the class (1 or 2) of patient k.
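
The slide's equation is not preserved in this transcript. As a point of reference, the sketch below computes the MMIE objective in its usual textbook form: the sum over patients of the log posterior of the correct class, i.e. log P(O_k | model C_k) P(C_k) minus the log of the same quantity summed over both classes. The function name, the input layout and the class priors are assumptions made for illustration.

```python
import math


def mmie_objective(loglik, labels, priors):
    """Sum over patients of log P(C_k | O_k).

    loglik[k][m] is log P(O_k | model m); labels[k] is the class C_k.
    """
    total = 0.0
    for scores, c in zip(loglik, labels):
        joint = {m: scores[m] + math.log(priors[m]) for m in scores}
        top = max(joint.values())
        # log of the denominator: total likelihood over all classes
        log_den = top + math.log(sum(math.exp(v - top) for v in joint.values()))
        total += joint[c] - log_den
    return total


loglik = [{1: -10.0, 2: -12.0}, {1: -9.0, 2: -8.0}]
labels = [1, 2]
value = mmie_objective(loglik, labels, priors={1: 0.5, 2: 0.5})
```

The objective is never positive, and it approaches zero as each patient's correct class dominates the denominator.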

Discriminative training of HMM

• The denominator of the objective is represented by the likelihood of a combined HMM, $\lambda_{den}$.

• $\lambda_{den}$ is called the denominator model.

• During training, the denominator model is constructed in each iteration after the HMMs $\lambda_1$ and $\lambda_2$ are updated. While one class is being updated, the HMM for that class is called the numerator model.

Discriminative training of HMM

• E-step: this estimation is similar to the one in the Baum-Welch algorithm.

Discriminative training of HMM

• M-step: MMIE updates the parameters by moving them toward the positive examples and away from the denominator model.

Discriminative training of HMM

– Smoothing constants $D_E$ and $D_T$ need to be added to both the numerator terms and the denominator terms to avoid negative transition probabilities or negative variances in emission probabilities.

– If the smoothing constants are too small, an update may not increase the (discriminative) objective function, but if they are too large, convergence will be too slow.

– Empirically, twice the lower bound leads to fast convergence.
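
The role of the smoothing constant can be sketched for one row of the transition matrix. The slide's update equations are not preserved, so this uses the commonly cited extended Baum-Welch form as an assumption: new a_ij is proportional to (numerator count - denominator count + D_T * old a_ij). The function name, the `floor` guard and the way the lower bound is computed are illustrative and may differ from the authors' method.

```python
def ebw_transition_row(num_counts, den_counts, old_row, scale=2.0, floor=1e-8):
    """Update one transition row; return (new_row, smoothing constant used)."""
    active = [j for j, a in enumerate(old_row) if a > 0]
    # Smallest D_T that keeps every updated entry non-negative.
    lower = max([0.0] + [(den_counts[j] - num_counts[j]) / old_row[j] for j in active])
    # The slides report that twice the lower bound works well empirically.
    d_t = max(scale * lower, floor)
    raw = [
        num_counts[j] - den_counts[j] + d_t * old_row[j] if old_row[j] > 0 else 0.0
        for j in range(len(old_row))
    ]
    total = sum(raw)
    return [r / total for r in raw], d_t


new_row, d_t = ebw_transition_row(
    num_counts=[3.0, 1.0],   # expected transitions under the numerator model
    den_counts=[1.0, 2.0],   # expected transitions under the denominator model
    old_row=[0.5, 0.5],
)
```

In this toy case the second entry would go negative without smoothing (1 - 2 < 0); with the constant set to twice the lower bound, the row stays a valid probability distribution.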

Gene selection for time series expression classification

• Gene selection is critical in clinical expression classification:
  – The number of patients is small compared to the number of genes, resulting in overfitting.
  – The small subset of genes that discriminate between the classes can lead to biomarker discovery.

Gene selection for time series expression classification

• There are two primary approaches:
  – The “wrapper” approach
    • The wrapper approach evaluates the classifier on different feature subsets, and searches in the space of all possible feature subsets using the ...
  – The “filter” approach
    • The filter approach does not rely on the underlying classifier, but instead uses a simpler criterion to filter out irrelevant features.

Gene selection for time series expression classification

• We use a backward stepwise feature selection method that utilizes the alignment to the HMM profiles. It is based on the recursive feature elimination (RFE) algorithm and is termed HMM-RFE.
  – Train the classifier, eliminate the feature whose contribution to the discrimination is minimal, and repeat iteratively until the stopping criterion is met.
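
The elimination loop described above can be sketched generically. This is an illustration of the RFE pattern only: `train` and `contribution` are hypothetical callables standing in for HMM training and for the per-gene score, and stopping at a fixed number of genes is an assumed stopping criterion.

```python
def recursive_feature_elimination(features, train, contribution, n_keep):
    """Repeatedly retrain and drop the feature that contributes least."""
    selected = list(features)
    while len(selected) > n_keep:
        model = train(selected)  # retrain on the surviving features
        weakest = min(selected, key=lambda g: contribution(model, g))
        selected.remove(weakest)
    return selected


# Toy stand-ins: the "model" is just a table of fixed per-gene scores.
toy_scores = {"g1": 0.9, "g2": 0.1, "g3": 0.5, "g4": 0.05}
kept = recursive_feature_elimination(
    features=["g1", "g2", "g3", "g4"],
    train=lambda genes: {g: toy_scores[g] for g in genes},
    contribution=lambda model, g: abs(model[g]),
    n_keep=2,
)
```

Retraining inside the loop is what distinguishes this from a one-pass filter: a gene's contribution is re-evaluated each time the feature set shrinks.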

Gene selection for time series expression classification

• Since the covariance matrix is diagonal, gene-expression levels are independent given the hidden states. Thus, if the states are known, the likelihood can be decomposed into terms involving each gene separately.

• We define the contribution to the log odds of a gene g, $d_g$, as
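
The definition of the contribution is not preserved in this transcript. The sketch below only illustrates the decomposition argument: with a diagonal covariance and known state paths, the emission log-likelihood is a sum of per-gene terms, so a per-gene log-odds between the two class models can be read off directly. The function names and the model layout (`mean`, `sd` indexed by state, then gene) are assumptions, and this is not presented as the authors' exact formula.

```python
import math


def gene_log_lik(series, path, mean, sd, g):
    """Log-likelihood of gene g along a known state path (diagonal Gaussian)."""
    total = 0.0
    for x, j in zip(series, path):
        m, s = mean[j][g], sd[j][g]
        total += -0.5 * math.log(2 * math.pi * s * s) - (x[g] - m) ** 2 / (2 * s * s)
    return total


def gene_log_odds(series, path1, path2, model1, model2, g):
    """Per-gene log odds of model 1 versus model 2 for one patient."""
    return (gene_log_lik(series, path1, model1["mean"], model1["sd"], g)
            - gene_log_lik(series, path2, model2["mean"], model2["sd"], g))


# Two genes, one state per model; only gene 1 differs between the classes.
model1 = {"mean": [[0.0, 1.0]], "sd": [[1.0, 1.0]]}
model2 = {"mean": [[0.0, -1.0]], "sd": [[1.0, 1.0]]}
series = [[0.0, 1.0], [0.0, 1.0]]
path = [0, 0]
odds = [gene_log_odds(series, path, path, model1, model2, g) for g in range(2)]
```

Gene 0 has identical parameters in both toy models and contributes nothing, while gene 1 carries all of the discrimination.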

Gene selection for time series expression classification

Results

• The average expression of four genes in MS patients treated with IFNB

Results

• Simulated dataset
  – 100 patients: 50 in Class 1 (good responders) and 50 in Class 2 (poor responders).
  – 100 genes were measured for each patient, with a maximum of 8 time points per patient.
  – For each gene g, generate the Class 1 response profile by randomly selecting a segment of a sine wave, of length between 0 and $4\pi$, denoted as a function $f_g^{(1)}(t)$.

Results

– 10 out of the 100 genes were selected to be differential; the other 90 were assigned the same values in Class 2 as in Class 1.

$a_g$ is +5 or -5, and $b_g$ is selected uniformly at random between -0.1 and 0.3.

Results

– A scaling value $s_k$ between 0.5 and 1.5 is selected for each patient k.

Results

Results

Results

• MS dataset