Infinite Latent Process Decomposition
Tomonari MASADA (正田備也) [email protected]
Nagasaki University (長崎大學)
Problem
From array data, extract gene clusters sample-by-sample.
[Intuition] Different samples may show different groupings of gene expressions.
What We Do
• Neither gene clustering nor sample clustering
• Clustering of gene-sample pairs
Previous Work
LPD: Latent Process Decomposition [Rogers et al. 05] [Ying et al. 08]
• Bayesian modeling
• Assignment of each gene-sample pair to a process (process = cluster)
Weakness
• K (# processes) should be given as an input.
• LPD is inefficient when K is large.
In many cases, we don't know the optimal K.
Our New Method
iLPD: infinite Latent Process Decomposition
• Bayesian nonparametrics (K → ∞)
Merits
• K can be truncated. (K → ∞ only theoretically.)
• Memory size is fixed.
• Parallelization is easy.
• K can be set with little thought.
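Model Details
\[
\begin{aligned}
\gamma &\sim \mathrm{Gamma}(a_\gamma, b_\gamma) \\
\alpha &\sim \mathrm{Gamma}(a_\alpha, b_\alpha) \\
\rho &\sim \mathrm{Gamma}(a_\rho, b_\rho) \\
\pi &\sim \mathrm{truncated\ GEM}(\gamma): \quad
  \tilde{\pi}_k \sim \mathrm{Beta}(1, \gamma), \quad
  \pi_k = \tilde{\pi}_k \prod_{l=1}^{k-1} (1 - \tilde{\pi}_l) \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha \pi) \\
\mu_{gk} &\sim \mathrm{Gauss}(\mu_0, \rho) \\
\lambda_{gk} &\sim \mathrm{Gamma}(a_0, b_0) \\
z_{dg} &\sim \mathrm{Multi}(\theta_d) \\
x_{dg} &\sim \mathrm{Gauss}(\mu_{g z_{dg}}, \lambda_{g z_{dg}}) \\
k &= 1, \ldots, K
\end{aligned}
\]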
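The generative process above can be sketched as a small sampler. This is an illustrative sketch only, not the authors' implementation, and it rests on several assumptions that the slides do not state: the hyperparameters γ, α, ρ are fixed instead of drawn from their Gamma priors; Gamma(a, b) is read as shape a and rate b; the second argument of Gauss is read as a precision; and the truncation sets the last stick proportion to 1 so that the K weights sum to 1. All function and parameter names are hypothetical.

```python
import math
import random

def stick_breaking(gamma, K, rng):
    """Truncated GEM(gamma): return K weights that sum to 1."""
    weights, remaining = [], 1.0
    for k in range(K):
        # Assumption: the last stick proportion is fixed to 1 (truncation).
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, gamma)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def dirichlet(shapes, rng):
    """Draw from Dirichlet(shapes) by normalizing Gamma(shape, 1) draws."""
    draws = [rng.gammavariate(s, 1.0) if s > 0 else 0.0 for s in shapes]
    total = sum(draws)
    if total == 0.0:
        # Numerical fallback for underflow, not part of the model.
        norm = sum(shapes)
        return [s / norm for s in shapes]
    return [x / total for x in draws]

def sample_ilpd(D, G, K, gamma=1.0, alpha=5.0, rho=1.0,
                mu0=0.0, a0=2.0, b0=1.0, seed=0):
    """Sample process weights pi, assignments z[d][g], and data x[d][g]."""
    rng = random.Random(seed)
    pi = stick_breaking(gamma, K, rng)
    # Per-gene, per-process Gaussian parameters (precision assumed).
    mu = [[rng.gauss(mu0, 1.0 / math.sqrt(rho)) for _ in range(K)]
          for _ in range(G)]
    lam = [[rng.gammavariate(a0, 1.0 / b0) for _ in range(K)]
           for _ in range(G)]
    z, x = [], []
    for d in range(D):
        theta = dirichlet([alpha * p for p in pi], rng)  # theta_d
        z_d = rng.choices(range(K), weights=theta, k=G)   # z_dg
        x_d = [rng.gauss(mu[g][z_d[g]], 1.0 / math.sqrt(lam[g][z_d[g]]))
               for g in range(G)]                         # x_dg
        z.append(z_d)
        x.append(x_d)
    return pi, z, x
```

For example, `sample_ilpd(D=3, G=4, K=5)` returns five process weights, a 3-by-4 table of process assignments, and a 3-by-4 table of synthetic expression values.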
Inference (CVB)
Collapsed Variational Bayesian Inference
○ Fixed memory size
○ Easy parallelization
× Special function evaluation: digamma, trigamma, tetragamma functions
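The three special functions named above are the polygamma functions of order 0, 1, and 2. The slides do not say how they are evaluated; one common approach, sketched here as an assumption, is to shift the argument upward with the recurrence relation and then apply the asymptotic series. The function names and the shift threshold of 10 are choices made for this sketch.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via recurrence plus asymptotic series."""
    acc = 0.0
    while x < 10.0:              # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return acc + math.log(x) - 0.5 / x - series

def trigamma(x):
    """First derivative of digamma, for x > 0."""
    acc = 0.0
    while x < 10.0:              # psi1(x) = psi1(x + 1) + 1/x^2
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return acc + inv + 0.5 * inv2 + series

def tetragamma(x):
    """Second derivative of digamma, for x > 0."""
    acc = 0.0
    while x < 10.0:              # psi2(x) = psi2(x + 1) - 2/x^3
        acc -= 2.0 / (x ** 3)
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * inv2 * (0.5 - inv2 * (1 / 6 - inv2 * (1 / 6 - inv2 * 0.3)))
    return acc - inv2 - inv2 / x - series
```

Where SciPy is available, `scipy.special.polygamma(n, x)` with n = 0, 1, 2 computes the same quantities and is the more robust choice.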
Experiment
Datasets from http://www.gems-system.org/
Dataset name     Samples  Genes   Diagnostic task
11_Tumors        174      12,534  11 various human tumor types
14_Tumors        308      15,010  14 various human tumor types and 12 normal tissue types
9_Tumors         60       5,727   9 various human tumor types
Brain_Tumor1     90       5,921   5 human brain tumor types
Brain_Tumor2     50       10,368  4 malignant glioma types
Leukemia1        72       5,328   AML, ALL B-cell, and ALL T-cell
Leukemia2        72       11,226  AML, ALL, and mixed-lineage leukemia (MLL)
Lung_Cancer      203      12,601  4 lung cancer types and normal tissues
SRBCT            83       2,309   Small, round blue cell tumors (SRBCT) of childhood
Prostate_Tumor   102      10,510  Prostate tumor and normal tissues
DLBCL            77       5,470   DLBCL and follicular lymphomas
Evaluation
• Compare iLPD with LPD [Ying et al. 08]
• Train iLPD on 90% randomly selected data
• Evaluate posterior density at 10% test data and calculate geometric mean
• Average over 25 runs
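The evaluation protocol above can be sketched as three small helpers: a random 90/10 split, a geometric mean of test densities, and an average over repeated runs. This is a hedged sketch: the slides do not say what unit is split (entries, samples, or genes), so `split_indices` works on generic indices, and `score_one_run` is a hypothetical callback standing in for training a model and scoring its test densities. The geometric mean is computed in log space, a choice made here to avoid underflow when many small densities are multiplied.

```python
import math
import random

def split_indices(n, train_fraction=0.9, seed=0):
    """Randomly split range(n) into train and test index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]

def geometric_mean(densities):
    """Geometric mean of positive values, computed via the mean of logs."""
    logs = [math.log(p) for p in densities]
    return math.exp(math.fsum(logs) / len(logs))

def average_over_runs(score_one_run, n_runs=25):
    """Average a per-run score; each run gets its own seed (hypothetical callback)."""
    scores = [score_one_run(seed) for seed in range(n_runs)]
    return math.fsum(scores) / n_runs
```

A run would call `split_indices`, fit on the training part, pass the test densities to `geometric_mean`, and let `average_over_runs` repeat this with different seeds.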
Results
• iLPD is more efficient for a large K than LPD.
• There is a dataset that is not well analyzed.
  - LPD-type methods may not be a panacea.
  - Cf. BMC Bioinformatics 2010, 11:552 (nonparametric Bayesian method based on Indian Buffet Processes)
Future Work
• Practical evaluation
• Result interpretation
• GPGPU acceleration
• Visualization
[Figure: eleven plot panels, one per dataset (Brain_Tumor1, Brain_Tumor2, DLBCL, Leukemia1, Leukemia2, Lung_Cancer, Prostate_Tumor, SRBCT, 11_Tumors, 14_Tumors, 9_Tumors), each comparing iLPD and LPD at 10, 20, and 40 processes.]