Using Bayesian Networks to Analyze Expression Data
![Page 1: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/1.jpg)
Using Bayesian Networks to Analyze Expression Data
N. Friedman, M. Linial, I. Nachman, D. Pe’er
Hebrew University, Jerusalem
![Page 2: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/2.jpg)
Central Dogma
Gene → (Transcription) → mRNA → (Translation) → Protein
Cells express different subsets of the genes in different tissues and under different conditions
![Page 3: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/3.jpg)
Microarrays (aka “DNA chips”)
New technological breakthrough:
  Measure RNA expression levels of thousands of genes in one experiment
  Measure expression on a genomic scale
  Opens up new experimental designs
Many major labs are using, or will use, this technology in the near future
![Page 4: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/4.jpg)
The Problem
[Figure: expression matrix A, with rows indexed by experiments (i) and columns indexed by genes (j)]
Aij: the mRNA level of gene j in experiment i
Goal:
  Learn regulatory/metabolic networks
  Identify causal sources of the biological phenomena of interest
![Page 5: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/5.jpg)
Our Approach
Characterize statistical relationships between expression patterns of different genes
Beyond pair-wise interactions:
  Many interactions are explained by intermediate factors
  Regulation involves combined effects of several gene products
We build on the language of Bayesian networks
![Page 6: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/6.jpg)
Network: Example
Noisy stochastic process. Example: Pedigree
A node represents an individual’s genotype
Modeling assumptions: Ancestors can affect descendants' genotype only by passing genetic materials through intermediate generations
[Figure: pedigree network in which Homer and Marge are the parents of Bart, Lisa, and Maggie]
![Page 7: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/7.jpg)
Network Structure
Generalizing to DAGs: A child is conditionally independent of its non-descendants, given the value of its parents
Often a natural assumption for causal processes, if we believe that we capture the relevant state of each intermediate stage
[Figure: DAG containing nodes X, Y1, and Y2, with labels Ancestor, Parent, Descendant, and Non-descendant]
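The independence assumption on this slide can be read off a graph mechanically. The following is a minimal sketch, assuming a DAG stored as a dictionary from each node to its list of parents; the five-node graph and its node names are hypothetical and are not taken from the slide's figure.

```python
def descendants(children, node):
    """Return every node reachable from `node` by following edges."""
    seen = set()
    stack = list(children[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def local_independencies(parents):
    """For each node, list the non-descendants it is independent of given its parents."""
    children = {node: [] for node in parents}
    for node, pa in parents.items():
        for p in pa:
            children[p].append(node)
    statements = {}
    for node, pa in parents.items():
        # Exclude the node itself, its parents (the conditioning set), and its descendants.
        excluded = descendants(children, node) | set(pa) | {node}
        statements[node] = set(parents) - excluded
    return statements


# Hypothetical DAG: Ancestor -> Parent -> X -> Descendant, and Parent -> Sibling.
parents = {
    "Ancestor": [],
    "Parent": ["Ancestor"],
    "Sibling": ["Parent"],
    "X": ["Parent"],
    "Descendant": ["X"],
}
statements = local_independencies(parents)
```

Here `statements["X"]` contains `Ancestor` and `Sibling`: given `Parent`, X is independent of both.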
![Page 8: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/8.jpg)
Local Probabilities
Associated with each variable Xi is a conditional probability distribution P(Xi | Pai)
Discrete variables: multinomial distribution
  [Table: P(Y | X) for a binary X, one row per value of X, with entries (0.9, 0.1) and (0.3, 0.7)]
Continuous variables: a choice of model, for example linear Gaussian
  [Figure: P(Y | X) plotted over X and Y]
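The two kinds of local probability model named on this slide can be sketched in a few lines. This is an illustration only: the table reuses the slide's numbers, but which row belongs to which value of X is an assumption, and the linear Gaussian weights, bias, and noise level are made up.

```python
import random

# Conditional probability table P(Y | X) for binary X and Y.
# The numbers come from the slide; assigning the rows to X=0 and X=1 is an assumption.
cpt_y_given_x = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.3, 1: 0.7},
}


def sample_discrete(dist, rng):
    """Draw one value from a dict mapping value -> probability."""
    u = rng.random()
    cumulative = 0.0
    for value, p in dist.items():
        cumulative += p
        if u < cumulative:
            return value
    return value  # guard against floating-point round-off


def linear_gaussian_sample(parent_values, weights, bias, sigma, rng):
    """Draw Y ~ Normal(bias + sum_i w_i * x_i, sigma^2)."""
    mean = bias + sum(w * x for w, x in zip(weights, parent_values))
    return rng.gauss(mean, sigma)


rng = random.Random(0)
draws = [sample_discrete(cpt_y_given_x[1], rng) for _ in range(10_000)]
freq_y1 = sum(draws) / len(draws)  # empirical estimate of P(Y=1 | X=1)

# Illustrative parameters: Y = 0.5 + 1.5 * X + noise, evaluated at X = 2.0.
y = linear_gaussian_sample([2.0], weights=[1.5], bias=0.5, sigma=0.1, rng=rng)
```

The discrete model needs one row of probabilities per parent configuration, while the linear Gaussian model needs only one weight per parent plus a bias and a variance.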
![Page 9: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/9.jpg)
Bayesian Network Semantics
Qualitative part (DAG specifies conditional independence statements) + Quantitative part (local probability models) = Unique joint distribution over domain
[Figure: DAG over nodes E, R, B, A, C]
P(C,A,R,E,B) = P(B) * P(E|B) * P(R|E,B) * P(A|R,B,E) * P(C|A,R,B,E)
versus
P(C,A,R,E,B) = P(B) * P(E) * P(R|E) * P(A|B,E) * P(C|A)
Compact & efficient representation:
  k parents: $O(2^k n)$ vs. $O(2^n)$ params
  parameters pertain to local interactions
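The factored form P(B) * P(E) * P(R|E) * P(A|B,E) * P(C|A) and the parameter-count comparison can be checked numerically. This is a minimal sketch assuming binary variables; the parent sets follow the factorization on the slide, but every probability value is invented for illustration.

```python
from itertools import product

# Parent sets read off the factored form of the joint distribution.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

# P(var = 1 | parent assignment). All numbers are illustrative, not from the slides.
p_true = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "R": {(0,): 0.001, (1,): 0.9},
    "A": {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
    "C": {(0,): 0.05, (1,): 0.8},
}


def joint(assignment):
    """Multiply the local conditional probabilities for one full assignment."""
    prob = 1.0
    for var, pa in parents.items():
        key = tuple(assignment[p] for p in pa)
        p1 = p_true[var][key]
        prob *= p1 if assignment[var] == 1 else 1.0 - p1
    return prob


variables = list(parents)
total = sum(
    joint(dict(zip(variables, values)))
    for values in product([0, 1], repeat=len(variables))
)

# One free parameter per parent configuration of each binary variable.
n_factored = sum(2 ** len(pa) for pa in parents.values())
# A full joint table over n binary variables has 2^n - 1 free parameters.
n_full = 2 ** len(variables) - 1
```

For this five-node network the factored form needs 10 parameters against 31 for the unrestricted joint table, and the product of local terms still sums to 1 over all assignments.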
![Page 10: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/10.jpg)
Why Bayesian Networks?
Bayesian Networks:
  Flexible representation of dependency structure of multivariate distributions
  Natural for modeling processes with local interactions
Learning of Bayesian Networks:
  Can learn dependencies from observations
  Handles stochastic processes: “true” stochastic behavior, noise in measurements
![Page 11: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/11.jpg)
Modeling Regulatory Interactions
Variables of interest:
  Expression levels of genes
  Concentration levels of proteins (proteomics!)
  Exogenous variables: nutrient levels, metabolite levels, temperature, phenotype information, …
Bayesian Network Structure:
  Capture dependencies among these variables
![Page 12: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/12.jpg)
Examples
Interactions are represented by a graph:
  Each gene is represented by a node in the graph
  Edges between the nodes represent direct dependency
Measured expression level of each gene: random variables
Gene interaction: probabilistic dependencies
[Figure: example graphs over genes A and B]
![Page 13: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/13.jpg)
More Complex Examples
Dependencies can be mediated through other nodes: common cause, intermediate gene
Common effects can imply conditional dependence
[Figure: three-node graphs over A, B, and C illustrating these cases]
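The difference between a mediated dependency and a common effect can be verified by brute-force enumeration. The sketch below assumes two specific three-node graphs, a chain A -> B -> C for the intermediate gene and A -> B <- C for the common effect; the edge directions and all probabilities are assumptions chosen for illustration.

```python
from itertools import product


def bern(p, x):
    """Probability that a binary variable with P(1) = p takes value x."""
    return p if x == 1 else 1.0 - p


def chain(a, b, c):
    # Intermediate gene: A -> B -> C
    return bern(0.5, a) * bern(0.9 if a else 0.1, b) * bern(0.8 if b else 0.2, c)


def collider(a, b, c):
    # Common effect: A -> B <- C, where B is likely on if either parent is on
    p_b = 0.95 if (a or c) else 0.05
    return bern(0.5, a) * bern(0.5, c) * bern(p_b, b)


def joint_table(p_fn):
    return {(a, b, c): p_fn(a, b, c) for a, b, c in product([0, 1], repeat=3)}


def prob(table, **fixed):
    """Sum the joint over all entries that agree with the fixed values."""
    idx = {"a": 0, "b": 1, "c": 2}
    return sum(
        p for key, p in table.items()
        if all(key[idx[name]] == val for name, val in fixed.items())
    )


def dependence(table, given_b=None):
    """Return |P(a=1, c=1 | cond) - P(a=1 | cond) * P(c=1 | cond)|."""
    cond = {} if given_b is None else {"b": given_b}
    z = prob(table, **cond)
    p_ac = prob(table, a=1, c=1, **cond) / z
    p_a = prob(table, a=1, **cond) / z
    p_c = prob(table, c=1, **cond) / z
    return abs(p_ac - p_a * p_c)


chain_table = joint_table(chain)
collider_table = joint_table(collider)
chain_marginal = dependence(chain_table)
chain_given_b = dependence(chain_table, given_b=1)
collider_marginal = dependence(collider_table)
collider_given_b = dependence(collider_table, given_b=1)
```

In the chain, A and C are dependent until B is observed; in the common-effect graph the pattern reverses, and A and C become dependent only once B is observed.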
![Page 14: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/14.jpg)
Outline of Our Approach
Expression data → Bayesian Network Learning Algorithm → learned network
[Figure: learned DAG over nodes E, R, B, A, C]
Use the learned network to make predictions about the structure of the interactions between genes
![Page 15: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/15.jpg)
Experiment
Data from Spellman et al. (Mol. Biol. of the Cell, 1998)
Contains 76 samples covering the whole yeast genome:
  Different methods for synchronizing the cell cycle in yeast
  Time series at intervals of a few minutes (5-20 min)
Spellman et al. identified 800 cell-cycle regulated genes.
![Page 16: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/16.jpg)
Methods
Treat samples as IID (ignoring temporal order)
Experiment 1: Discretized into three levels of expression (-, 0, +), with thresholds at -0.5 and 0.5 on Log(ratio to control)
  Learn multinomial probabilities
Experiment 2: Learn linear interactions (w/ Gaussian noise)
No prior biological knowledge was used
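The three-level discretization used in Experiment 1 amounts to thresholding the log ratio at -0.5 and 0.5. A minimal sketch follows; the function name and the integer coding of the levels are assumptions, and the input is taken to be already on the log scale, so the base of the logarithm is not fixed here.

```python
def discretize(log_ratio, threshold=0.5):
    """Map a log expression ratio to -1 (under-expressed), 0 (unchanged), or +1 (over-expressed)."""
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0


# Hypothetical log ratios for one gene across five experiments.
log_ratios = [-1.2, -0.3, 0.0, 0.49, 0.8]
levels = [discretize(value) for value in log_ratios]
```

Discretizing discards magnitude information but lets the multinomial model capture non-linear dependencies, which is the trade-off against the linear Gaussian model of Experiment 2.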
![Page 17: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/17.jpg)
Network Learned
![Page 18: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/18.jpg)
Challenge: Statistical Significance
Sparse data:
  Small number of samples
  “Flat posterior”: many networks fit the data
Solution: estimate confidence in network features
Two types of features:
  Markov neighbors: X directly interacts with Y
  Order relations: X is an ancestor of Y
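Both feature types are simple graph queries on a learned DAG. The sketch below assumes the DAG is stored as a dictionary from node to parent list, and it reads "directly interacts" as Markov-blanket membership (an edge in either direction, or a shared child); that reading is an assumption, not something the slide states.

```python
def ancestors(parents, node):
    """Return every node with a directed path into `node`."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_ancestor(parents, x, y):
    """Order relation: x is an ancestor of y."""
    return x in ancestors(parents, y)


def markov_neighbors(parents, x, y):
    """Markov feature (assumed definition): x and y share an edge or a common child."""
    if x in parents[y] or y in parents[x]:
        return True
    return any(x in pa and y in pa for pa in parents.values())


# Five-node example DAG: B -> A <- E, E -> R, A -> C.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
order_e_c = is_ancestor(parents, "E", "C")
order_c_e = is_ancestor(parents, "C", "E")
markov_b_e = markov_neighbors(parents, "B", "E")
markov_b_c = markov_neighbors(parents, "B", "C")
```

In this example E precedes C through A, and B and E count as Markov neighbors because they are both parents of A even though no edge joins them.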
![Page 19: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/19.jpg)
Confidence Estimates
Bootstrap approach [FGW, UAI99]
[Figure: the data set D is resampled into D1, D2, ..., Dm, and a network over E, R, B, A, C is learned from each resampled data set]
Estimate: $C(f) = \frac{1}{m} \sum_{i=1}^{m} 1\{f \in G_i\}$
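The bootstrap estimate is the fraction of resampled data sets whose learned model contains a given feature. The sketch below shows that loop under loud assumptions: `toy_learn` is a stand-in that links variables by thresholded correlation, not a Bayesian network structure learner, and the data are synthetic.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def toy_learn(samples, threshold=0.5):
    """Stand-in learner (assumption): the set of column pairs with |correlation| > threshold."""
    n_vars = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_vars)]
    return {
        (i, j)
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if abs(correlation(columns[i], columns[j])) > threshold
    }


def bootstrap_confidence(data, learn, features, m=100, seed=0):
    """Estimate C(f) as the fraction of m bootstrap models in which feature f holds."""
    rng = random.Random(seed)
    n = len(data)
    counts = {name: 0 for name in features}
    for _ in range(m):
        # Resample n rows with replacement, then relearn from scratch.
        resampled = [data[rng.randrange(n)] for _ in range(n)]
        model = learn(resampled)
        for name, holds in features.items():
            counts[name] += 1 if holds(model) else 0
    return {name: count / m for name, count in counts.items()}


# Synthetic data: variable 1 tracks variable 0, variable 2 is independent noise.
data_rng = random.Random(1)
data = []
for _ in range(50):
    x0 = data_rng.gauss(0, 1)
    data.append((x0, x0 + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1)))

features = {"0-1": lambda g: (0, 1) in g, "0-2": lambda g: (0, 2) in g}
confidence = bootstrap_confidence(data, toy_learn, features, m=100)
```

Swapping `toy_learn` for a real structure learner and the lambdas for Markov or order features gives the procedure described on the slide; the resampling loop itself stays the same.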
![Page 20: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/20.jpg)
Testing for Significance
We run our procedure on randomized data, where we reshuffled the order of values for each gene
[Figure: histograms of the number of Markov features at each confidence level; panels: Original Data, Randomized Data]
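The randomization step permutes each gene's values across experiments independently, which keeps every gene's own distribution while breaking the relationships between genes. A minimal sketch, assuming the data are a list of rows with one row per experiment and one column per gene; the function name and the small matrix are hypothetical.

```python
import random


def shuffle_each_gene(matrix, seed=0):
    """Return a copy of matrix[i][j] (experiment i, gene j) with each column permuted independently."""
    rng = random.Random(seed)
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = []
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        rng.shuffle(column)  # a separate permutation for every gene
        columns.append(column)
    return [[columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


# Hypothetical expression matrix: 4 experiments by 3 genes.
original = [
    [0.1, 1.0, -0.5],
    [0.2, 2.0, -0.4],
    [0.3, 3.0, -0.3],
    [0.4, 4.0, -0.2],
]
randomized = shuffle_each_gene(original)
```

Running the same learning and confidence procedure on `randomized` gives a baseline for how many high-confidence features appear when no real dependencies exist.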
![Page 21: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/21.jpg)
Testing for Significance: Markov w/ Gaussian Models
[Figure: number of features with confidence above t, plotted against t (0.1 to 1), for Random and Real data; two panels, with vertical axes running to 4000 and to 500]
![Page 22: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/22.jpg)
Testing for Significance: Markov w/ Multinomial Models
[Figure: number of features with confidence above t, plotted against t (0.1 to 1), for Random and Real data; two panels, with vertical axes running to 1400 and to 250]
![Page 23: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/23.jpg)
Local Map
![Page 24: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/24.jpg)
Finding Key Genes
Key gene: a gene that precedes many other genes
  YLR183C
  MCD1: Mitotic Chromosome Determinant
  RAD27: DNA repair protein
  CLN2: role in cell cycle START
  SRO4: involved in cellular polarization during budding
  YOX1: Homeodomain protein that binds leu-tRNA gene
  POL30: required for DNA replication and repair
  YLR467W
  CDC5
  MSH6: Homolog of the human GTBP protein
  YML119W
  CLN1: role in cell cycle START
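One simple way to operationalize "precedes many other genes" is to count, for each gene, the order relations in which it is the ancestor with confidence above a cutoff. This is a sketch of that idea only and may differ from the scoring actually used in the work; the gene labels, confidence values, and the 0.8 cutoff are all hypothetical.

```python
def rank_key_genes(order_confidence, threshold=0.8):
    """Rank genes by how many other genes they precede with confidence >= threshold.

    order_confidence maps (x, y) to the estimated confidence that x is an ancestor of y.
    """
    counts = {}
    for (x, y), conf in order_confidence.items():
        counts.setdefault(x, 0)
        counts.setdefault(y, 0)
        if conf >= threshold:
            counts[x] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical confidences over placeholder gene labels.
order_confidence = {
    ("g1", "g2"): 0.95,
    ("g1", "g3"): 0.85,
    ("g2", "g3"): 0.90,
    ("g3", "g1"): 0.10,
}
ranking = rank_key_genes(order_confidence)
```

With these inputs `g1` ranks first because it precedes two genes above the cutoff, followed by `g2` with one and `g3` with none.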
![Page 25: Using Bayesian Networks to Analyze Expression Data](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140a7550346895dac6500/html5/thumbnails/25.jpg)
Future Work
Finding suitable local distribution models
Correct handling of hidden variables
  Can we recognize hidden causes of coordinated regulation events?
Incorporating prior knowledge
  Incorporate large mass of biological knowledge, and insight from sequence/structure databases
Abstraction
  Combine with cluster analysis of higher confidence conclusions