1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using...
-
date post
15-Jan-2016 -
Category
Documents
-
view
212 -
download
0
Transcript of 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using...
![Page 1: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/1.jpg)
1Machine Learning in Systems Biology, Sep 25th
Identification of overlapping biclusters using Probabilistic Relational Models
Tim Van den Bulcke, Hui Zhao, Kristof Engelen,
Bart De Moor, Kathleen Marchal
![Page 2: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/2.jpg)
2Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 3: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/3.jpg)
3Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology– What is biclustering?
– Why biclustering?
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 4: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/4.jpg)
4Machine Learning in Systems Biology, Sep 25th
Biclustering and biology
What is biclustering?
• Definition in the context of gene expression data:
genes
conditions
A bicluster is a subset of genes which show a similar expression profile under a subset of conditions.
![Page 5: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/5.jpg)
5Machine Learning in Systems Biology, Sep 25th
Biclustering and biology
Why bi-clustering?*
• Only a small set of the genes participates in a cellular process.
• A cellular process is active only in a subset of the conditions.
• A single gene may participate in multiple pathways that may or may not be coactive under all conditions.
* From: Madeira et al. (2004) Biclustering Algorithms for Biological Data Analysis: A Survey
![Page 6: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/6.jpg)
6Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 7: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/7.jpg)
7Machine Learning in Systems Biology, Sep 25th
Probabilistic Relational Models (PRMs)
Patient
Treatment
Virus strain Contact
Image: free interpretation from Segal et al. Rich probabilistic models
![Page 8: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/8.jpg)
8Machine Learning in Systems Biology, Sep 25th
Probabilistic Relational Models (PRMs)
• Traditional approaches “flatten” relational data– Causes bias
– Centered around one view of the data
– Loose relational structure
• PRM models– Extension of Bayesian networks
– Combine advantages of probabilistic reasoning with relational logic
Patient
flatten
Contact
![Page 9: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/9.jpg)
9Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 10: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/10.jpg)
10Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model: notation
• g: gene
• c: condition
• e: expression
• g.Bk: gene-bicluster assignment for gene g to bicluster k (unknown, 0 or 1)
• c.Bk: condition-bicluster assignment for condition c to bicluster k (unknown, 0 or 1)
• e.Level: expression level value (known, continuous value)
![Page 11: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/11.jpg)
11Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
• Dataset instance
GeneGene
ExpressionExpression
ConditionCondition
ID B1 B2
g1 ? (0 or 1) ? (0 or 1)
g2 ? (0 or 1) ? (0 or 1)
ID B1 B2
c1 ? (0 or 1) ? (0 or 1)
c2 ? (0 or 1) ? (0 or 1)
g.ID c.ID level
g1 c1 -2.4
g1 c2 (missing value)
g2 c1 1.6
g2 c2 0.5
![Page 12: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/12.jpg)
12Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
• Relational schema and PRM model
Notation:
• g: gene
• c: condition
• e: expression
• g.Bk: gene-bicluster assignment for gene g to bicluster k (0 or 1, unknown)
• c.Bk: condition-bicluster assignment for condition c to bicluster k: (0 or 1, unknown)
• e.Level: expression level value (continuous, known)
GeneGene
ExpressionExpression
ConditionCondition
B1 B2
level
B1 B2
P(e.level | g.B1,g.B2,c.B1,c.B2,c.ID)=
Normal( μg.B,c.B,c.ID, σg.B,c.B,c.ID )
ID
P(e.level | g.B1,g.B2,c.B1,c.B2,c.ID)=
Normal( μg.B,c.B,c.ID, σg.B,c.B,c.ID )
1
2
3
![Page 13: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/13.jpg)
13Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
GeneGene
ExpressionExpression
ConditionCondition
? (0 or 1)? (0 or 1)g2
? (0 or 1)? (0 or 1)g1
B2B1ID
? (0 or 1)? (0 or 1)c2
? (0 or 1)? (0 or 1)c1
B2B1ID
1.6c1g2
(missing value)c2g1
0.5c2g2
-2.4c1g1
levelc.IDg.ID
g1.B1
g1.B2 level1,1
c1.B1 c1.B2
g2.B1
g2.B2 level2,
2
c2.B1 c2.B2
level2,1
c1.ID c2.ID
PRM modelDatabase instance
ground Bayesian network
GeneGene
ExpressionExpression
ConditionCondition
B1 B2B1 B2
P(e.level | g.B1,g.B2,c.B1,c.B2, c.ID)=
Normal( μg.B,c.B, c.ID, σg.B,c.B,c.ID )
ID
level
![Page 14: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/14.jpg)
14Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
• Joint Probability Distribution is defined as a product over each of the node in Bayesian Network:
• ProBic posterior ( ~ likelihood x prior ):
Expression level
conditional probabilitiesExpression level prior
(μ, σ)’s
Prior condition to bicluster assignmentsPrior gene to bicluster assignment
![Page 15: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/15.jpg)
15Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 16: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/16.jpg)
16Machine Learning in Systems Biology, Sep 25th
Algorithm: choices
• Different approaches possible
• Only approximative algorithms are tractable:– MCMC methods (e.g. Gibbs sampling)
– Expectation-Maximization (soft, hard assignment)
– Variational approaches
– simulated annealing, genetic algorithms, …
• We chose a hard assignment Expectation-Maximization algorithm (E.-M.)– Natural decomposition of the model in E.-M. steps
– Efficient
– Extensible
– Relatively good convergence properties for this model
![Page 17: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/17.jpg)
17Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Maximization step:– Maximize posterior w.r.t. μ, σ values (model parameters),
given the current gene-bicluster and condition-bicluster assignments (=the hidden variables)
• Expectation step:– Maximize posterior w.r.t. gene-bicluster and condition-bicluster
assignments, given the current model parameters
– Two-step approach:
• Step 1: max. posterior w.r.t. C.B, given G.B and μ, σ values
• Step 2: max. posterior w.r.t. G.B, given C.B and μ, σ values
![Page 18: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/18.jpg)
18Machine Learning in Systems Biology, Sep 25th
Algorithm: Simple Example
![Page 19: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/19.jpg)
19Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results– Noise sensitivity
– Bicluster shape
– Overlap
– Missing values
• Conclusion
![Page 20: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/20.jpg)
20Machine Learning in Systems Biology, Sep 25th
Results: noise sensitivity
• Setup: – Simulated dataset: 500 genes x 200 conditions
– Background distribution: Normal(0,1)
– Bicluster distributions: Normal( rnd(N(0,1)), σ ), varying sigma
– Shapes: three 50x50 biclusters
![Page 21: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/21.jpg)
21Machine Learning in Systems Biology, Sep 25th
Results: noise sensitivity
Precision (genes) Recall (genes)
Precision (conditions) Recall (conditions)
A BA B
A
B
A B
σ σ
σ σ
A
…
B
…
Precision = TP / (TP+FP) Recall = TP / (TP+FN)
![Page 22: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/22.jpg)
22Machine Learning in Systems Biology, Sep 25th
Results: bicluster shape independence
• Setup:– Dataset: 500 genes x 200 conditions
– Background distribution: N(0,1)
– Bicluster distributions: N( rnd(N(0,1)), 0.2 )
– Shapes: 80x10, 10x80, 20x20
![Page 23: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/23.jpg)
23Machine Learning in Systems Biology, Sep 25th
Results: bicluster shape independence
![Page 24: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/24.jpg)
24Machine Learning in Systems Biology, Sep 25th
Results: Overlap examples
• Two biclusters (50 genes, 50 conditions)
• Overlap:25 genes, 25 conditions
• Two biclusters (10 genes, 80 conditions)
• Overlap: 2 genes, 40 conditions
![Page 25: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/25.jpg)
25Machine Learning in Systems Biology, Sep 25th
Results: Missing values
• ProBic model has no concept of ‘missing values’
No prior missing value estimations which could bias the result
![Page 26: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/26.jpg)
26Machine Learning in Systems Biology, Sep 25th
Results: Missing values – one example
• 500 genes x 200 conditions
• Noise std: bicluster 0.2, background 1.0
• Missing value: 70%
• One bicluster 50x50
![Page 27: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/27.jpg)
27Machine Learning in Systems Biology, Sep 25th
Results: Missing values
Precision (genes) Recall (genes)
Precision (conditions) Recall (conditions)% missing values
% missing values
![Page 28: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/28.jpg)
28Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 29: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/29.jpg)
29Machine Learning in Systems Biology, Sep 25th
Conclusion
• Noise robustness
• Naturally deals with missing values
• Relatively independent of bicluster shape
• Simultaneous identification of multiple overlapping biclusters
• Can be used query-driven
• Extensible
![Page 30: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/30.jpg)
30Machine Learning in Systems Biology, Sep 25th
Acknowledgements
KULeuven:
• whole BioI group, ESAT-SCD
– Tim Van den Bulcke
– Thomas Dhollander
• whole CMPG group(Centre of Microbial and Plant Genetics)
– Kristof Engelen
– Kathleen Marchal
UGent:
• whole Bioinformatics & Evolutionary Genomics group
– Tom Michoel
![Page 31: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/31.jpg)
31Machine Learning in Systems Biology, Sep 25th
Thank you for your attention!
![Page 32: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/32.jpg)
32Machine Learning in Systems Biology, Sep 25th
Near future
• Automated definition of algorithm parameter settings
• Application biological datasets– Dataset normalization
• Extend model with different overlap models
• Model extension from biclusters to regulatory modulesinclude motif + ChIP-chip data
PromoterPromoter
ExpressionExpression
ArrayArray
S1 S2 S3 S4
R1 R2 R3
M1 M2
level
M1P1 P2 P3
Gene
M2
TTCAATACAGG
R1
R2
…
![Page 33: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/33.jpg)
33Machine Learning in Systems Biology, Sep 25th
Results: Missing values
• ProBic model has no concept of ‘missing values’
GeneGene
ExpressionExpression
ConditionCondition
? (0 or 1)? (0 or 1)g2
? (0 or 1)? (0 or 1)g1
B2B1ID
? (0 or 1)? (0 or 1)c2
? (0 or 1)? (0 or 1)c1
B2B1ID
1.6c1g2
(missing value)c2g1
0.5c2g2
-2.4c1g1
levelc.IDg.ID
g1.B1
g1.B2 level1,
1
c1.B1 c1.B2
g2.B1
g2.B2 level2,2
c2.B1 c2.B2
level2,
1
c1.ID c2.ID
PRM model Database instance
ground Bayesian network
GeneGene
ExpressionExpression
ConditionCondition
B1 B2B1 B2
P(e.level | g.B1,g.B2,c.B1,c.B2, c.ID)
=Normal( μg.B,c.B, c.ID, σg.B,c.B,c.ID )
ID
level
![Page 34: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/34.jpg)
34Machine Learning in Systems Biology, Sep 25th
Algorithm: example
![Page 35: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/35.jpg)
35Machine Learning in Systems Biology, Sep 25th
Algorithm properties
• Speed:– 500 genes, 200 conditions, 2 biclusters: 2 min.
– Scaling:
• ~ #genes . #conditions . 2#biclusters (worse case)
• ~ #genes . #conditions . (#biclusters)p (in practice), p=1..3
![Page 36: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/36.jpg)
40Machine Learning in Systems Biology, Sep 25th
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Discussion
• Conclusion
![Page 37: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/37.jpg)
41Machine Learning in Systems Biology, Sep 25th
Discussion: Expectation-Maximization
• Initialization: – initialization with (almost) all genes and condition:
convergence to good local optimum
– multiple random initializations: many initializations required (speed!)
• E.-M. steps:– limit changes in gene/condition-bicluster assignments in both E-
steps results in higher stability (at cost of slower convergence)
![Page 38: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/38.jpg)
42Machine Learning in Systems Biology, Sep 25th
Probabilistic Relational Models (PRMs)
• Relational extension to Bayesian Networks (BNs):– BNs ~ a single flat table– PRMs ~ relational data structure
• A relational scheme (implicitly) defines a constrained Bayesian network
• In PRMs, probability distributions are shared among all objects of the same class
• Likelihood function:(very similar to chain-rule in Bayesian networks)
• Learning PRM model e.g. using maximum likelihood principle
)).(|.(),S,|( ,.
AxparentsAxPP Sx Ax
I
AttributesObjects
![Page 39: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/39.jpg)
43Machine Learning in Systems Biology, Sep 25th
Algorithm: user-defined parameters
• P(μg.B,c.B,c, σg.B,c.B,c): prior distributions for μ, σ
– Conjugate:
• Normal-Inverse-χ2 distribution or
• Normal distribution with pseudocount
– Makes extreme distributions less likely
• P(a.Bk): prior probability that a condition is in bicluster k
– Prevents background conditions to be in biclusters
– If no prior distribution P(μ, σ): conditions are always more likely to be in a bicluster due to statistical variations.
• P(g.Bk): prior probability that a gene is in bicluster k
– Initialize biclusters with seed genes: query-driven biclustering
• P(a.Bk) and P(g.Bk):
– Both have impact on the preference for certain bicluster shapes
![Page 40: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/40.jpg)
44Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Expectation step 2:
– argmaxG.B log(posterior)
constant
without prior: independent per gene
Approximation based on previous iteration
quasi independence if small changes in assignment
![Page 41: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/41.jpg)
45Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
GeneGene
ExpressionExpression
ConditionCondition
B1 B2B1 B2
P(e.level | g.B1,g.B2,c.B1,c.B2, c.ID)=
Normal( μg.B,c.B, c.ID, σg.B,c.B,c.ID )
ID
PRM model
level
g.B c.B c μ σ
1 1,2 44 -1 .5
2 1,2 44 +2 .6
- * 44 0 1
* - 44 0 1
1 1 44 -1 .5
2 2 44 +2 .6
1
2
c=44
![Page 42: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/42.jpg)
46Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
• ground Bayesian network
g1.B1
g1.B2 level1,1
c1.B1 c1.B2
g2.B1
g2.B2 level2,2
c2.B1 c2.B2
level2,1
c1.ID c2.ID
![Page 43: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/43.jpg)
47Machine Learning in Systems Biology, Sep 25th
ProBic biclustering model
• Likelihood function:(~ chain-rule in Bayesian networks)
• In PRMs, probability distributions are shared among all objects of the same class
)).(|.(),S,|( ,.
AxparentsAxPP Sx Ax
I
AttributesObjects
![Page 44: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/44.jpg)
48Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Maximization step:
– argmaxμ,σ log(posterior)
constant
independent per condition
+
analytic solution based on
sufficient statistics
![Page 45: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/45.jpg)
49Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Expectation step 1: assign conditions
– argmaxC.B log(posterior)
constant
independent per condition
![Page 46: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/46.jpg)
51Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Expectation step 1:– Evaluate function for every condition and for
every bicluster assignment e.g. 200 conditions, 30 biclusters: 200 * 230 = 200 billion ~ a lot
– But can be performed very efficiently:
• Partial solutions can be reused among different bicluster assignments
1
2
3
c.B = 1 ?c.B = 1,2 ?c.B = 1,2,3 ?
![Page 47: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/47.jpg)
52Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Expectation step 1:– Evaluate function for every condition and for
every bicluster assignment e.g. 200 conditions, 30 biclusters: 200 * 230 = 200 billion ~ a lot
– But can be performed very efficiently:
• Partial solutions can be reused among different bicluster assignments
• Only evaluate potential good solutions: use Apriori-like approach.
1
2
3
a.B = 1 ?a.B = 2 ?a.B = 1,2 ?
![Page 48: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/48.jpg)
53Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Expectation step 1:– Evaluate function for every condition and for
every bicluster assignment e.g. 200 conditions, 30 biclusters: 200 * 230 = 200 billion ~ a lot
– But can be performed very efficiently:
• Partial solutions can be reused among different bicluster assignments
• Only evaluate potential good solutions: use Apriori-like approach.
• Avoid background evaluations
1
2
3
![Page 49: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/49.jpg)
54Machine Learning in Systems Biology, Sep 25th
Algorithm: Expectation-Maximization
• Expectation step 2:– Analogous approach as in step 1
![Page 50: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/50.jpg)
55Machine Learning in Systems Biology, Sep 25th
Acknowledgements
KULeuven:
• whole BioI group, ESAT-SCD
– Tim Van den Bulcke
– Thomas Dhollander
• whole CMPG group(Centre of Microbial and Plant Genetics)
– Kristof Engelen
– Kathleen Marchal
UGent:
• whole Bioinformatics & Evolutionary Genomics group
– Tom Michoel BIO I..
![Page 51: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/51.jpg)
56Machine Learning in Systems Biology, Sep 25th
Algorithm: iteration strategy
• Many implementation choices in generalized E.-M.:– Single E-step per M-step
– Iterate E-steps until convergence per M-step
– Iteratively perform E-step 1, M-step, E-step 2, M-step
– …
• Our choice: – Iteratively perform E-step 1, M-step, E-step 2, M-step
– fast and good convergence properties
![Page 52: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/52.jpg)
57Machine Learning in Systems Biology, Sep 25th
Results: 9 bicluster dataset (15 genes x 80 conditions)
![Page 53: 1 Machine Learning in Systems Biology, Sep 25th Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke, Hui.](https://reader036.fdocuments.in/reader036/viewer/2022070412/56649d455503460f94a22173/html5/thumbnails/53.jpg)
58Machine Learning in Systems Biology, Sep 25th
Results: Missing values
• 500 genes, 300 conditions
• Noise stdev: 0.2 (bicluster), 1.0 background
• 1 50x50 biclusterPrecision (genes) Recall (genes)
Precision (conditions) Recall (conditions)% missing values
% missing values