Using Bayesian Networks to Analyze Expression Data
![Page 1: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/1.jpg)
Using Bayesian Networks to Analyze Expression Data
N. Friedman, M. Linial, I. Nachman, D. Pe’er
Hebrew University, Jerusalem
![Page 2: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/2.jpg)
Central Dogma
Gene → (Transcription) → mRNA → (Translation) → Protein
Cells express different subsets of the genes in different tissues and under different conditions.
![Page 3: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/3.jpg)
Microarrays (aka “DNA chips”)
New technological breakthrough:
- Measure RNA expression levels of thousands of genes in one experiment
- Measure expression on a genomic scale
- Opens up new experimental designs
- Many major labs are using, or will use, this technology in the near future
![Page 4: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/4.jpg)
The Problem
[Figure: expression matrix A, with genes indexed by j and experiments indexed by i]
A_ij: the mRNA level of gene j in experiment i
Goal:
- Learn regulatory/metabolic networks
- Identify causal sources of the biological phenomena of interest
![Page 5: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/5.jpg)
Analysis Approaches
- Clustering of expression data
  - Groups together genes with similar expression patterns
  - Does not reveal structural relations between genes
- Boolean networks
  - Deterministic models of the logical interactions between genes
  - Deterministic, impractical for real data
![Page 6: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/6.jpg)
Example: Cell-Cycle Data [Spellman et al]
[Figure: cell-cycle expression data; labels: clusters, cell cycle stages]
![Page 7: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/7.jpg)
Our Approach
- Characterize statistical relationships between expression patterns of different genes
- Beyond pair-wise interactions
  - Many interactions are explained by intermediate factors
  - Regulation involves combined effects of several gene products
- We build on the language of Bayesian networks
![Page 8: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/8.jpg)
Network: Example
Noisy stochastic process:
- Example: Pedigree
- A node represents an individual’s genotype
Modeling assumptions:
- Ancestors can affect descendants' genotype only by passing genetic materials through intermediate generations
[Figure: pedigree network with nodes Homer, Marge, Bart, Lisa, Maggie]
![Page 9: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/9.jpg)
Network Structure
Generalizing to DAGs:
- A child is conditionally independent of its non-descendants, given the value of its parents
- Often a natural assumption for causal processes, if we believe that we capture the relevant state of each intermediate stage
[Figure: DAG around a node X, with nodes Y1 and Y2 and labels Ancestor, Parent, Descendant, Non-descendant]
![Page 10: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/10.jpg)
Local Probabilities
Associated with each variable $X_i$ is a conditional probability distribution $P(X_i \mid Pa_i)$
- Discrete variables: multinomial distribution
- Continuous variables: a choice of model, for example linear Gaussian
Table for P(Y | X):
X = 0: 0.9, 0.1
X = 1: 0.3, 0.7
[Figure: graph X → Y, with a plot of P(Y | X) over X and Y]
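The two kinds of local model can be sketched in a few lines of Python. This is only an illustration: it assumes the table rows 0.9/0.1 and 0.3/0.7 correspond to X = 0 and X = 1 and the columns to Y = 0 and Y = 1, and the linear Gaussian coefficients (`intercept`, `slope`, `sigma`) are hypothetical defaults, not values from the slides.

```python
import math

# Multinomial (table) CPD for a binary edge X -> Y.
# Assumption: rows are X = 0, 1 and columns are Y = 0, 1.
CPT_Y_GIVEN_X = {0: [0.9, 0.1], 1: [0.3, 0.7]}

def p_y_given_x_table(y, x):
    """Look up P(Y = y | X = x) in the table."""
    return CPT_Y_GIVEN_X[x][y]

def p_y_given_x_linear_gaussian(y, x, intercept=0.0, slope=1.0, sigma=1.0):
    """Density of Y ~ Normal(intercept + slope * x, sigma^2).

    The coefficients are hypothetical; in practice they are learned from data.
    """
    mean = intercept + slope * x
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

Each row of the table sums to one, while the linear Gaussian model needs only three numbers per parent-child pair regardless of how many values X can take.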
![Page 11: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/11.jpg)
Bayesian Network Semantics
Qualitative part (DAG specifies conditional independence statements) + Quantitative part (local probability models) = Unique joint distribution over domain
[Figure: network over nodes B, E, R, A, C]
$P(C,A,R,E,B) = P(B)\,P(E \mid B)\,P(R \mid E,B)\,P(A \mid R,B,E)\,P(C \mid A,R,B,E)$
versus
$P(C,A,R,E,B) = P(B)\,P(E)\,P(R \mid E)\,P(A \mid B,E)\,P(C \mid A)$
Compact & efficient representation:
- k parents: $O(2^k n)$ vs. $O(2^n)$ params
- parameters pertain to local interactions
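To make the factorization concrete, here is a minimal sketch that builds the joint distribution from the five local models and counts free parameters. All variables are assumed binary, and every probability value below is hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical local models for P(B) P(E) P(R|E) P(A|B,E) P(C|A).
# Each entry is the probability that the child equals 1.
P_B1 = 0.01
P_E1 = 0.02
P_R1_GIVEN_E = {0: 0.001, 1: 0.9}
P_A1_GIVEN_BE = {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}
P_C1_GIVEN_A = {0: 0.05, 1: 0.8}

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(c, a, r, e, b):
    """P(C,A,R,E,B) as the product of the local conditional probabilities."""
    return (bern(P_B1, b) * bern(P_E1, e) * bern(P_R1_GIVEN_E[e], r)
            * bern(P_A1_GIVEN_BE[(b, e)], a) * bern(P_C1_GIVEN_A[a], c))

# The product of local models is a proper distribution: it sums to one.
total = sum(joint(*values) for values in product([0, 1], repeat=5))

# Free parameters: one per parent configuration of each binary variable.
factored_params = 1 + 1 + 2 + 4 + 2   # B, E, R|E, A|B,E, C|A
full_params = 2 ** 5 - 1              # unrestricted joint over 5 binary variables
```

With five variables the saving is modest (10 versus 31 parameters), but the gap grows exponentially with the number of variables when each node has few parents.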
![Page 12: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/12.jpg)
Why Bayesian Networks?
Bayesian Networks:
- Flexible representation of dependency structure of multivariate distributions
- Natural for modeling processes with local interactions
Learning of Bayesian Networks:
- Can learn dependencies from observations
- Handles stochastic processes:
  - “true” stochastic behavior
  - noise in measurements
![Page 13: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/13.jpg)
Modeling Biological Regulation
Variables of interest:
- Expression levels of genes
- Concentration levels of proteins
- Exogenous variables: nutrient levels, metabolite levels, temperature, phenotype information, …
Bayesian Network Structure:
- Capture dependencies among these variables
![Page 14: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/14.jpg)
Examples
Interactions are represented by a graph:
- Each gene is represented by a node in the graph
- Edges between the nodes represent direct dependency
Measured expression level of each gene → Random variables
Gene interaction → Probabilistic dependencies
[Figure: example graphs over nodes A, B, and X]
![Page 15: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/15.jpg)
More Complex Examples
- Dependencies can be mediated through other nodes
- Common effects can imply conditional dependence
[Figures: three-node graphs over A, B, and C, with panels labeled "Common cause" and "Intermediate gene"]
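The claim that common effects can imply conditional dependence can be checked by brute-force enumeration on a tiny network. The sketch below assumes a common-effect structure A → B ← C over binary variables; the probability values are hypothetical.

```python
from itertools import product

# Hypothetical common-effect network A -> B <- C (all variables binary).
P_A1 = 0.5
P_C1 = 0.5
P_B1_GIVEN_AC = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}

def joint(a, b, c):
    """P(A, B, C) = P(A) P(C) P(B | A, C)."""
    pa = P_A1 if a else 1.0 - P_A1
    pc = P_C1 if c else 1.0 - P_C1
    pb1 = P_B1_GIVEN_AC[(a, c)]
    pb = pb1 if b else 1.0 - pb1
    return pa * pc * pb

def prob(**fixed):
    """Probability of a partial assignment, summing out the other variables."""
    total = 0.0
    for a, b, c in product([0, 1], repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total

# Marginally, knowing A does not change our belief about C.
p_c = prob(c=1)
p_c_given_a = prob(a=1, c=1) / prob(a=1)

# Once the common effect B is observed, A and C become dependent.
p_c_given_b = prob(b=1, c=1) / prob(b=1)
p_c_given_ab = prob(a=1, b=1, c=1) / prob(a=1, b=1)
```

Here `p_c` and `p_c_given_a` coincide, while `p_c_given_b` and `p_c_given_ab` differ: observing B = 1 and then learning A = 1 partly "explains away" the need for C = 1.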
![Page 16: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/16.jpg)
Outline of Our Approach
Expression data → Bayesian Network Learning Algorithm → learned network
[Figure: learned network over nodes E, R, B, A, C]
Use learned network to make predictions about structure of the interactions between genes
![Page 17: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/17.jpg)
Learning With Many Variables
Sparse Candidate algorithm: efficient heuristic search that relies on sparseness
- Choose candidate set for direct influence for each gene
- Find optimal BN constrained on candidates
- Iteratively improve candidate set
[Figure: candidates vs. parents in BN]
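The first step, choosing a small candidate set for each gene, might look like the sketch below. It assumes discretized data and uses pairwise mutual information as the relevance score; the slides do not say which score is used, so treat that choice, and the names `mutual_information` and `choose_candidates`, as illustrative. The constrained search and the iterative refinement are not shown.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), n_xy in count_xy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (n_xy / n) * math.log(n_xy * n / (count_x[x] * count_y[y]))
    return mi

def choose_candidates(data, k):
    """data: dict gene -> list of discrete levels, one per experiment.

    Returns dict gene -> list of the k other genes with the highest score.
    """
    candidates = {}
    for gene, values in data.items():
        scored = [(mutual_information(values, other_values), other)
                  for other, other_values in data.items() if other != gene]
        scored.sort(reverse=True)
        candidates[gene] = [other for _, other in scored[:k]]
    return candidates

# Toy data: g2 copies g1 exactly, g3 just alternates.
toy = {
    "g1": [0, 0, 1, 1, 2, 2, 0, 1],
    "g2": [0, 0, 1, 1, 2, 2, 0, 1],
    "g3": [1, 0, 1, 0, 1, 0, 1, 0],
}
cands = choose_candidates(toy, k=1)
```

Restricting each gene to k candidates shrinks the space of networks the search has to consider, which is what makes learning feasible with hundreds of genes.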
![Page 18: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/18.jpg)
Experiment
Data from Spellman et al. (Mol. Biol. of the Cell, 1998).
- Contains 76 samples of the entire yeast genome:
  - Different methods for synchronizing the cell cycle in yeast
  - Time series at intervals of a few minutes (5-20 min)
- Spellman et al. identified 800 cell-cycle regulated genes
![Page 19: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/19.jpg)
Methods
Experiment 1:
- Discretized data into 3 levels
- Learn multinomial probabilities
[Figure: discretization of Log(ratio to control) into levels -, 0, + with thresholds at -0.5 and 0.5]
Experiment 2:
- Learn linear interactions (w/ Gaussian noise)
No prior biological knowledge was used
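The three-level discretization in Experiment 1 can be sketched as a simple threshold function. The thresholds of -0.5 and 0.5 come from the slide; mapping values that fall exactly on a threshold to the middle level is an assumption.

```python
def discretize(log_ratio, threshold=0.5):
    """Map a log expression ratio to -1 (under-expressed), 0 (baseline), or +1 (over-expressed).

    Assumption: values exactly at +/- threshold are treated as baseline.
    """
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0

levels = [discretize(v) for v in [-1.2, -0.5, 0.0, 0.49, 0.8]]
# levels == [-1, 0, 0, 0, 1]
```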
![Page 20: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/20.jpg)
Network Learned
![Page 21: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/21.jpg)
Challenge: Statistical Significance
Sparse Data
- Small number of samples
- “Flat posterior”: many networks fit the data
Solution
- Estimate confidence in network features
- Two types of features
  - Markov neighbors: X directly interacts with Y
  - Order relations: X is an ancestor of Y
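Both feature types can be read off a learned DAG. The sketch below represents a network as a mapping from each node to its set of parents. It interprets "directly interacts" as "connected by an edge or parents of a common child", which is an assumption about what the slide means; the function names are hypothetical.

```python
def is_ancestor(parents, x, y):
    """True if there is a directed path from x to y.

    parents: dict mapping each node to the set of its parents.
    """
    stack = list(parents.get(y, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False

def are_markov_neighbors(parents, x, y):
    """True if x and y share an edge or are both parents of some node."""
    if x in parents.get(y, ()) or y in parents.get(x, ()):
        return True
    return any(x in ps and y in ps for ps in parents.values())

# Example DAG: B -> A <- E, E -> R, A -> C
dag = {"A": {"B", "E"}, "R": {"E"}, "C": {"A"}, "B": set(), "E": set()}
```

For this example, `is_ancestor(dag, "B", "C")` holds because of the path B → A → C, and B and E count as Markov neighbors because both are parents of A.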
![Page 22: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/22.jpg)
Confidence Estimates
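Bootstrap approach [FGW, UAI99]
[Figure: the dataset D is resampled into D1, D2, ..., Dm, and a network is learned from each resampled dataset]
Estimate: $C(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$
where $f(G_i)$ is 1 if the feature f holds in the network $G_i$ learned from $D_i$, and 0 otherwise.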
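The bootstrap estimate, that is, the fraction of resampled datasets whose learned network contains a given feature, can be sketched generically. The structure learner is passed in as a function, since the learning algorithm itself is not shown here; `bootstrap_confidence`, `learn`, and `feature` are hypothetical names, and the toy learner below is only a stand-in.

```python
import random

def bootstrap_confidence(data, learn, feature, m=200, seed=0):
    """Fraction of m bootstrap replicates whose learned model exhibits the feature.

    data: list of samples
    learn: function(list of samples) -> model
    feature: function(model) -> bool
    """
    rng = random.Random(seed)
    n = len(data)
    hits = 0
    for _ in range(m):
        # Resample n items with replacement from the original dataset.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        if feature(learn(resample)):
            hits += 1
    return hits / m

# Toy stand-in for structure learning: the "model" is just the sample mean.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
conf = bootstrap_confidence(samples,
                            learn=lambda d: sum(d) / len(d),
                            feature=lambda mean: mean > 0.0)
```

In the real setting `learn` would return a network and `feature` would test for a Markov or order relation between two genes.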
![Page 23: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/23.jpg)
Testing for Significance
We run our procedure on randomized data, in which the order of values for each gene was reshuffled.
[Figure: Markov w/ Gaussian Models. Number of features with confidence above t, plotted against t (0.1 to 1), for Random vs. Real data; two panels at different vertical scales]
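The randomization used as a baseline, reshuffling the order of values for each gene, can be sketched as follows. The data layout (a dict from gene name to its list of values across experiments) and the function name are assumptions made for illustration.

```python
import random

def randomize_expression(data, seed=0):
    """Return a copy of the data with each gene's values shuffled independently.

    data: dict gene -> list of expression values, one per experiment.
    Each gene keeps its own distribution of values, but any dependence
    between genes is destroyed.
    """
    rng = random.Random(seed)
    shuffled = {}
    for gene, values in data.items():
        permuted = list(values)   # copy, so the input is left untouched
        rng.shuffle(permuted)
        shuffled[gene] = permuted
    return shuffled

original = {"g1": [0.1, 0.7, -0.4, 1.2], "g2": [-0.9, 0.0, 0.3, 0.5]}
randomized = randomize_expression(original)
```

Features that receive high confidence on the real data but not on data randomized this way are less likely to be artifacts of the learning procedure.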
![Page 24: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/24.jpg)
Testing for Significance
[Figure: Markov w/ Multinomial Models. Number of features with confidence above t, plotted against t (0.1 to 1), for Random vs. Real data; two panels at different vertical scales]
![Page 25: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/25.jpg)
Local Map
![Page 26: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/26.jpg)
Finding Key Genes
Key gene: a gene that precedes many other genes
- YLR183C
- MCD1: Mitotic Chromosome Determinant
- RAD27: DNA repair protein
- CLN2: role in cell cycle START
- SRO4: involved in cellular polarization during budding
- YOX1: Homeodomain protein that binds leu-tRNA gene
- POL30: required for DNA replication and repair
- YLR467W
- CDC5
- MSH6: Homolog of the human GTBP protein
- YML119W
- CLN1: role in cell cycle START
![Page 27: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/27.jpg)
Strong Markov Relations
- YKL163W-PIR3, YKL164C-PIR1: Close location
- YKR013W-PRY2, YKR012C: Close location
- MCD1, MSH6: Bind to DNA during mitosis
- PHO11, PHO12: Acid phosphatases
- HHT1, HTB1: Histones
- FAR1, ASH1: Mating type switch, expression uncorrelated
- CLN2, SVS1: Unknown function (SVS1)
- STE2, MFA2: Mating factor & receptor
![Page 28: Using Bayesian Networks to Analyze Expression Data](https://reader035.fdocuments.in/reader035/viewer/2022062217/56815a33550346895dc7732d/html5/thumbnails/28.jpg)
Future Work
- Finding suitable local distribution models
- Temporal aspect: DBN
- Correct handling of hidden variables
  - Can we recognize hidden causes of coordinated regulation events?
- Incorporating prior knowledge
  - Incorporate large mass of biological knowledge, and insight from sequence/structure databases
- Abstraction
  - Combine with cluster analysis