Infinite Latent Process Decomposition

infinite Latent Process Decomposition, Tomonari MASADA (正田備也), Nagasaki University (長崎大學)

Description: A topic model for analyzing microarray data


Page 1: Infinite Latent Process Decomposition

infinite Latent Process Decomposition

Tomonari MASADA (正田備也)

Nagasaki University (長崎大學)

Page 2: Infinite Latent Process Decomposition

Problem

From array data, extract gene clusters sample by sample.

[Intuition] Different samples may show different groupings of gene expressions.

Page 3: Infinite Latent Process Decomposition

What We Do

Neither gene clustering nor sample clustering.

Clustering of gene-sample pairs.

Page 4: Infinite Latent Process Decomposition

Previous Work

LPD: Latent Process Decomposition [Rogers et al. 05]

• Bayesian modeling

• Assignment of each gene-sample pair to a process (process = cluster)
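To make "assignment of each gene-sample pair to a process" concrete, here is a minimal sketch of a plain mixture-model responsibility calculation for a single expression value. It is an illustration only, not the inference procedure of [Rogers et al. 05]; the function names, the Gaussian form, and all numbers are assumptions made for this example.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def process_responsibilities(x, weights, means, variances):
    """Return p(process = k | x) for one gene-sample pair.

    weights[k]   : hypothetical mixing weight of process k for this sample
    means[k]     : hypothetical Gaussian mean of this gene under process k
    variances[k] : hypothetical Gaussian variance of this gene under process k
    """
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(scores)
    return [s / total for s in scores]

# One expression value and three candidate processes (all numbers are made up).
resp = process_responsibilities(
    x=1.9,
    weights=[0.5, 0.3, 0.2],
    means=[0.0, 2.0, 5.0],
    variances=[1.0, 1.0, 1.0],
)
best = max(range(len(resp)), key=resp.__getitem__)
print(best, [round(r, 3) for r in resp])
```

Because the weights belong to the sample and the Gaussian parameters belong to the gene, the same gene can fall into different processes in different samples.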

Page 5: Infinite Latent Process Decomposition

Weakness

[Ying et al. 08]

• K (# processes) should be given as an input.

• LPD is inefficient when K is large.

In many cases, we don't know the optimal K.

Page 6: Infinite Latent Process Decomposition

Our New Method

iLPD: infinite Latent Process Decomposition

• Bayesian nonparametrics (K → ∞)

Page 7: Infinite Latent Process Decomposition

Merits

• K can be truncated. (K → ∞ only theoretically.)

• Memory size is fixed.

• Parallelization is easy.

• K can be set with little thought.
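One way to see why a truncated K may need little thought: under stick-breaking with Beta(1, γ) proportions (the form that appears on the model slide), the expected mass left over after K sticks is (γ / (1 + γ))^K, which shrinks geometrically. The sketch below compares that formula with simulated draws. The value of γ and the function names are assumptions for illustration.

```python
import random

def expected_leftover_mass(gamma, K):
    """Expected stick mass not covered by the first K sticks: (gamma / (1 + gamma)) ** K."""
    return (gamma / (1.0 + gamma)) ** K

def sampled_leftover_mass(gamma, K, rng):
    """Draw K stick proportions from Beta(1, gamma) and return the remaining mass."""
    remaining = 1.0
    for _ in range(K):
        remaining *= 1.0 - rng.betavariate(1.0, gamma)
    return remaining

rng = random.Random(0)
gamma = 2.0  # made-up concentration value
for K in (10, 20, 40):
    draws = [sampled_leftover_mass(gamma, K, rng) for _ in range(20000)]
    print(K, expected_leftover_mass(gamma, K), sum(draws) / len(draws))
```

With these assumed numbers the leftover mass is already below 2% at K = 10, so raising K further mostly adds processes that carry almost no weight.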

Page 8: Infinite Latent Process Decomposition

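Model Details

\begin{align*}
\gamma &\sim \mathrm{Gamma}(a_\gamma, b_\gamma) \\
\alpha &\sim \mathrm{Gamma}(a_\alpha, b_\alpha) \\
\rho &\sim \mathrm{Gamma}(a_\rho, b_\rho) \\
\tilde{\pi}_k &\sim \mathrm{Beta}(1, \gamma) \\
\pi_k &= \tilde{\pi}_k \prod_{l=1}^{k-1} (1 - \tilde{\pi}_l), \quad \text{i.e. } \pi \sim \text{truncated GEM}(\gamma) \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha \pi) \\
\mu_{gk} &\sim \mathrm{Gauss}(\mu_0, \rho) \\
\lambda_{gk} &\sim \mathrm{Gamma}(a_0, b_0) \\
z_{dg} &\sim \mathrm{Multi}(\theta_d) \\
x_{dg} &\sim \mathrm{Gauss}(\mu_{g z_{dg}}, \lambda_{g z_{dg}}) \\
k &= 1, \ldots, K
\end{align*}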
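The generative process on this slide can be sketched as a small forward sampler. Several choices here are assumptions, not statements from the slide: γ, α, and ρ are passed in as fixed values instead of being drawn from their Gamma priors, ρ and λ are read as precisions, b0 is read as a rate, the last stick takes all remaining mass at the truncation level, and every default value is made up.

```python
import math
import random

def truncated_stick_breaking(gamma, K, rng):
    """pi_k = v_k * prod_{l<k} (1 - v_l), with v_k ~ Beta(1, gamma)."""
    pi, remaining = [], 1.0
    for k in range(K):
        # Assumption: the last stick takes all remaining mass so pi sums to 1.
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, gamma)
        pi.append(v * remaining)
        remaining *= 1.0 - v
    return pi

def dirichlet(shapes, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(shape, 1) draws."""
    # The tiny floor only keeps gammavariate's shape argument strictly positive.
    draws = [rng.gammavariate(max(s, 1e-12), 1.0) for s in shapes]
    total = sum(draws)
    return [g / total for g in draws]

def sample_dataset(D, G, K, gamma=1.0, alpha=5.0, rho=1.0,
                   mu0=0.0, a0=2.0, b0=1.0, seed=0):
    """Sample expression values x[d][g] and process assignments z[d][g]."""
    rng = random.Random(seed)
    pi = truncated_stick_breaking(gamma, K, rng)
    # Per-gene, per-process Gaussian parameters (rho and lam read as precisions).
    mu = [[rng.gauss(mu0, 1.0 / math.sqrt(rho)) for _ in range(K)] for _ in range(G)]
    lam = [[rng.gammavariate(a0, 1.0 / b0) for _ in range(K)] for _ in range(G)]
    x, z = [], []
    for _ in range(D):
        # Each sample d gets its own process proportions theta_d.
        theta = dirichlet([alpha * p for p in pi], rng)
        z_d = [rng.choices(range(K), weights=theta)[0] for _ in range(G)]
        x_d = [rng.gauss(mu[g][k], 1.0 / math.sqrt(lam[g][k])) for g, k in enumerate(z_d)]
        z.append(z_d)
        x.append(x_d)
    return pi, z, x

pi, z, x = sample_dataset(D=4, G=6, K=5)
print([round(p, 3) for p in pi])
print(z[0])
```

Here D is the number of samples, G the number of genes, and K the truncation level; the sampler only runs the model forward and performs no inference.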

Page 9: Infinite Latent Process Decomposition

Inference (CVB)

Collapsed Variational Bayesian Inference

○ Fixed memory size

○ Easy parallelization

× Special function evaluation: digamma, trigamma, and tetragamma functions
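The three special functions named on this slide are the first three derivatives of log Γ(x). As a rough illustration of what evaluating them involves, the sketch below uses the standard upward recurrence followed by an asymptotic series. It is not taken from the iLPD implementation; the shift threshold of 10 and the number of series terms are assumptions chosen for roughly 1e-10 accuracy on positive arguments.

```python
import math

def digamma(x):
    """psi(x) for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x                 # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return acc + series

def trigamma(x):
    """psi'(x) for x > 0."""
    acc = 0.0
    while x < 10.0:
        acc += 1.0 / (x * x)           # psi1(x) = psi1(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))
    return acc + series

def tetragamma(x):
    """psi''(x) for x > 0."""
    acc = 0.0
    while x < 10.0:
        acc -= 2.0 / (x ** 3)          # psi2(x) = psi2(x + 1) - 2/x^3
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = -inv2 - inv2 / x - inv2 * inv2 * (0.5 - inv2 * (1 / 6 - inv2 / 6))
    return acc + series

print(digamma(1.0), trigamma(1.0), tetragamma(1.0))
```

Each call costs a short loop plus a logarithm or a few divisions, so the cost adds up if the functions are evaluated for many entries in every iteration.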

Page 10: Infinite Latent Process Decomposition

Experiment

http://www.gems-system.org/

| Dataset name | Samples | Genes | Diagnostic task |
|---|---|---|---|
| 11_Tumors | 174 | 12,534 | 11 various human tumor types |
| 14_Tumors | 308 | 15,010 | 14 various human tumor types and 12 normal tissue types |
| 9_Tumors | 60 | 5,727 | 9 various human tumor types |
| Brain_Tumor1 | 90 | 5,921 | 5 human brain tumor types |
| Brain_Tumor2 | 50 | 10,368 | 4 malignant glioma types |
| Leukemia1 | 72 | 5,328 | AML, ALL B-cell, and ALL T-cell |
| Leukemia2 | 72 | 11,226 | AML, ALL, and mixed-lineage leukemia (MLL) |
| Lung_Cancer | 203 | 12,601 | 4 lung cancer types and normal tissues |
| SRBCT | 83 | 2,309 | Small, round blue cell tumors (SRBCT) of childhood |
| Prostate_Tumor | 102 | 10,510 | Prostate tumor and normal tissues |
| DLBCL | 77 | 5,470 | DLBCL and follicular lymphomas |

Page 11: Infinite Latent Process Decomposition

Evaluation

• Compare iLPD with LPD [Ying et al. 08]

• Train iLPD on a randomly selected 90% of the data

• Evaluate the posterior density on the remaining 10% (test data) and calculate its geometric mean

• Average over 25 runs
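The evaluation protocol above can be sketched as follows. The model fitting step is left out, `density_fn` is a hypothetical stand-in for the posterior density a fitted model would assign to a held-out entry, and treating the data as a flat list of entries is an assumption of this sketch.

```python
import math
import random

def split_indices(n, test_fraction, rng):
    """Randomly split range(n) into train and test index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def geometric_mean(densities):
    """exp of the mean log density; avoids underflow of a long product."""
    return math.exp(sum(math.log(p) for p in densities) / len(densities))

def evaluate(n_entries, density_fn, runs=25, seed=0):
    """Average the test-set geometric mean density over several random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = split_indices(n_entries, 0.10, rng)
        # A real run would fit the model on `train` here; this sketch skips fitting.
        scores.append(geometric_mean([density_fn(i) for i in test]))
    return sum(scores) / len(scores)

# Hypothetical constant density, used only to exercise the code path.
print(evaluate(100, lambda i: 0.25))
```

Working in log space matters because a direct product of thousands of densities below 1 would underflow to zero.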

Page 12: Infinite Latent Process Decomposition

Results

• iLPD is more efficient for a large K than LPD.

• There is a dataset that is not well analyzed.

- LPD-type methods may not be a panacea.

Cf. BMC Bioinformatics 2010, 11:552: Nonparametric Bayesian method based on Indian Buffet Processes

Page 13: Infinite Latent Process Decomposition

Future Work

• Practical evaluation

• Result interpretation

• GPGPU acceleration

• Visualization

Page 14: Infinite Latent Process Decomposition

[Figure: Brain_Tumor1. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.270 to 0.300.]

Page 15: Infinite Latent Process Decomposition

[Figure: Brain_Tumor2. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.225 to 0.255.]

Page 16: Infinite Latent Process Decomposition

[Figure: DLBCL. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.250 to 0.280.]

Page 17: Infinite Latent Process Decomposition

[Figure: Leukemia1. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.230 to 0.260.]

Page 18: Infinite Latent Process Decomposition

[Figure: Leukemia2. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.300 to 0.330.]

Page 19: Infinite Latent Process Decomposition

[Figure: Lung_Cancer. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.340 to 0.360.]

Page 20: Infinite Latent Process Decomposition

[Figure: Prostate_Tumor. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.425 to 0.485.]

Page 21: Infinite Latent Process Decomposition

[Figure: SRBCT. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.230 to 0.280.]

Page 22: Infinite Latent Process Decomposition

[Figure: 11_Tumors. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.305 to 0.315.]

Page 23: Infinite Latent Process Decomposition

[Figure: 14_Tumors. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.470 to 0.500.]

Page 24: Infinite Latent Process Decomposition

[Figure: 9_Tumors. iLPD vs. LPD at 10, 20, and 40 processes; axis ticks from 0.140 to 0.190.]
