Using Bayesian Networks to Analyze Expression Data

Page 1: Using Bayesian Networks to Analyze Expression Data

N. Friedman, M. Linial, I. Nachman, D. Pe’er
Hebrew University, Jerusalem

Page 2: Using Bayesian Networks to Analyze Expression Data

Central Dogma

Gene -> (Transcription) -> mRNA -> (Translation) -> Protein

Cells express different subsets of their genes in different tissues and under different conditions

Page 3: Using Bayesian Networks to Analyze Expression Data

Microarrays (aka “DNA chips”)

New technological breakthrough:
  Measure RNA expression levels of thousands of genes in one experiment
  Measure expression on a genomic scale
  Opens up new experimental designs

Many major labs are using, or will use, this technology in the near future

Page 4: Using Bayesian Networks to Analyze Expression Data

The Problem

[Figure: expression matrix A, with experiments as rows (index i) and genes as columns (index j)]

Aij - the mRNA level of gene j in experiment i

Goal:
  Learn regulatory/metabolic networks
  Identify causal sources of the biological phenomena of interest

Page 5: Using Bayesian Networks to Analyze Expression Data

Analysis Approaches

Clustering of expression data
  Groups together genes with similar expression patterns
  Does not reveal structural relations between genes

Boolean networks
  Deterministic models of the logical interactions between genes
  Deterministic, impractical for real data

Page 6: Using Bayesian Networks to Analyze Expression Data

Example: Cell-Cycle Data [Spellman et al]

[Figure: clustered expression data; axis labels: clusters, cell cycle stages]

Page 7: Using Bayesian Networks to Analyze Expression Data

Our Approach

Characterize statistical relationships between expression patterns of different genes

Beyond pair-wise interactions:
  Many interactions are explained by intermediate factors
  Regulation involves combined effects of several gene products

We build on the language of Bayesian networks

Page 8: Using Bayesian Networks to Analyze Expression Data

Network: Example

Noisy stochastic process

Example: Pedigree
  A node represents an individual’s genotype

[Figure: pedigree network with nodes Homer, Marge, Bart, Lisa, Maggie]

Modeling assumption: ancestors can affect descendants' genotype only by passing genetic material through intermediate generations

Page 9: Using Bayesian Networks to Analyze Expression Data

Network Structure

Generalizing to DAGs:
  A child is conditionally independent of its non-descendants, given the value of its parents
  Often a natural assumption for causal processes, if we believe that we capture the relevant state of each intermediate stage

[Figure: DAG with nodes X, Y1, Y2; labels: Ancestor, Parent, Descendant, Non-descendant]

Page 10: Using Bayesian Networks to Analyze Expression Data

Local Probabilities

Associated with each variable Xi is a conditional probability distribution P(Xi | Pai)

Discrete variables: multinomial distribution

  X    P(Y | X)
  0    0.9   0.1
  1    0.3   0.7

Continuous variables: choice of model, for example linear Gaussian

[Figure: density P(Y | X) plotted over X and Y]
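A minimal sketch of the two kinds of local model named on this slide. The table values are the ones shown on the slide, but reading the first column as P(Y = 0 | X) is an assumption, as are the linear Gaussian coefficients and all function names.

```python
import random

# Multinomial CPT for a binary parent X and a binary child Y.
# Rows follow the slide's table; treating the first column as
# P(Y = 0 | X) is an assumption made for illustration.
CPT_Y_GIVEN_X = {
    0: [0.9, 0.1],
    1: [0.3, 0.7],
}

def sample_discrete(x, rng):
    """Draw Y from the multinomial distribution P(Y | X = x)."""
    probs = CPT_Y_GIVEN_X[x]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_linear_gaussian(x, rng, a=0.5, b=1.2, sigma=0.3):
    """Draw Y ~ Normal(a + b * x, sigma^2); a, b and sigma are made-up values."""
    return rng.gauss(a + b * x, sigma)

rng = random.Random(0)
draws = [sample_discrete(0, rng) for _ in range(10_000)]
frac_zero = draws.count(0) / len(draws)   # empirical estimate of P(Y = 0 | X = 0)
y = sample_linear_gaussian(2.0, rng)      # centred on 0.5 + 1.2 * 2.0
```

In both cases the child's distribution depends only on the value of its parent, which is what makes the model local.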

Page 11: Using Bayesian Networks to Analyze Expression Data

Bayesian Network Semantics

Qualitative part: DAG specifies conditional independence statements
+
Quantitative part: local probability models
=
Unique joint distribution over domain

[Figure: network over nodes B, E, R, A, C]

Full chain rule:
P(C,A,R,E,B) = P(B) * P(E|B) * P(R|E,B) * P(A|R,B,E) * P(C|A,R,B,E)
versus the network factorization:
P(C,A,R,E,B) = P(B) * P(E) * P(R|E) * P(A|B,E) * P(C|A)

Compact & efficient representation:
  k parents: O(2^k n) vs. O(2^n) parameters
  Parameters pertain to local interactions
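A small sketch of the factorization on this slide, assuming binary variables. The parent sets are read off the factored form P(B) P(E) P(R|E) P(A|B,E) P(C|A); every probability value below is invented for illustration only.

```python
from itertools import product

# Parent sets taken from the slide's factorization.
PARENTS = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

# P(node = 1 | parent values). All numbers are illustrative assumptions.
P_TRUE = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "R": {(0,): 0.001, (1,): 0.9},
    "A": {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
    "C": {(0,): 0.05, (1,): 0.8},
}

def joint(assignment):
    """Probability of a full assignment as a product of local terms."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[q] for q in parents)
        p_true = P_TRUE[node][key]
        p *= p_true if assignment[node] == 1 else 1.0 - p_true
    return p

nodes = list(PARENTS)
total = sum(
    joint(dict(zip(nodes, values)))
    for values in product([0, 1], repeat=len(nodes))
)

# Free parameters for binary variables: one per parent configuration.
factored_params = sum(2 ** len(ps) for ps in PARENTS.values())  # 1 + 1 + 2 + 4 + 2
full_table_params = 2 ** len(nodes) - 1                         # unrestricted joint table
```

For these five binary variables the factored form needs 10 free parameters where an unrestricted joint table needs 31, and the gap grows exponentially with the number of variables.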

Page 12: Using Bayesian Networks to Analyze Expression Data

Why Bayesian Networks?

Bayesian networks:
  Flexible representation of the dependency structure of multivariate distributions
  Natural for modeling processes with local interactions

Learning of Bayesian networks:
  Can learn dependencies from observations
  Handles stochastic processes: “true” stochastic behavior and noise in measurements

Page 13: Using Bayesian Networks to Analyze Expression Data

Modeling Biological Regulation

Variables of interest:
  Expression levels of genes
  Concentration levels of proteins
  Exogenous variables: nutrient levels, metabolite levels, temperature, phenotype information …

Bayesian network structure:
  Capture dependencies among these variables

Page 14: Using Bayesian Networks to Analyze Expression Data

Examples

Interactions are represented by a graph:
  Each gene is represented by a node in the graph
  Edges between the nodes represent direct dependency

Measured expression level of each gene -> Random variables
Gene interaction -> Probabilistic dependencies

[Figure: example graphs over nodes A, B and X]

Page 15: Using Bayesian Networks to Analyze Expression Data

More Complex Examples

Dependencies can be mediated through other nodes:
  Common cause
  Intermediate gene

Common effects can imply conditional dependence

[Figure: three example graphs over nodes A, B, C, one per case]
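The common-effect case can be checked by direct enumeration. The sketch below assumes a hypothetical labelling in which A and C are independent causes and B is their common effect; all probabilities are invented for illustration.

```python
from itertools import product

P_A = 0.3   # P(A = 1), assumed
P_C = 0.3   # P(C = 1), assumed

def p_b_given(a, c):
    """P(B = 1 | A = a, C = c): B tends to be on if either cause is on."""
    return 0.9 if (a or c) else 0.05

def joint(a, b, c):
    """Joint probability under the structure A -> B <- C."""
    pa = P_A if a else 1 - P_A
    pc = P_C if c else 1 - P_C
    pb = p_b_given(a, c) if b else 1 - p_b_given(a, c)
    return pa * pc * pb

def prob_a_given(**evidence):
    """P(A = 1 | evidence), summing the joint over all consistent states."""
    num = den = 0.0
    for a, b, c in product([0, 1], repeat=3):
        state = {"a": a, "b": b, "c": c}
        if any(state[name] != value for name, value in evidence.items()):
            continue
        den += joint(a, b, c)
        if a == 1:
            num += joint(a, b, c)
    return num / den

marginal = prob_a_given()               # prior belief in A
given_c = prob_a_given(c=1)             # unchanged: A and C are marginally independent
given_b = prob_a_given(b=1)             # raised: the effect was observed
given_b_and_c = prob_a_given(b=1, c=1)  # lowered again: C already accounts for B
```

Observing C alone tells us nothing about A, but once B is known, learning C changes the belief in A. This is the conditional dependence the slide refers to.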

Page 16: Using Bayesian Networks to Analyze Expression Data

Outline of Our Approach

Expression data -> Bayesian Network Learning Algorithm -> learned network

[Figure: learned network over nodes B, E, R, A, C]

Use the learned network to make predictions about the structure of the interactions between genes

Page 17: Using Bayesian Networks to Analyze Expression Data

Learning With Many Variables

Sparse Candidate algorithm - efficient heuristic search that relies on sparseness

[Figure: candidates versus parents in BN]

Steps:
  Choose candidate set for direct influence for each gene
  Find optimal BN constrained on candidates
  Iteratively improve candidate set
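A rough sketch of the first step only, choosing a candidate set for each gene. It assumes discretized data and uses empirical mutual information as the relevance measure, which is an assumption here rather than something stated on the slide. The constrained search and the iterative refinement are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def choose_candidates(data, k):
    """For each gene, keep the k other genes with the highest mutual information.

    `data` maps a gene name to its list of discretized expression values.
    """
    candidates = {}
    for gene, values in data.items():
        scored = sorted(
            ((mutual_information(values, other_values), other)
             for other, other_values in data.items() if other != gene),
            reverse=True,
        )
        candidates[gene] = [other for _, other in scored[:k]]
    return candidates

# Toy data: g2 copies g1, while g3 is unrelated to both.
toy = {
    "g1": [0, 1, 0, 1, 1, 0, 0, 1],
    "g2": [0, 1, 0, 1, 1, 0, 0, 1],
    "g3": [0, 0, 1, 1, 0, 0, 1, 1],
}
cands = choose_candidates(toy, k=1)
```

Restricting each gene to k candidate parents keeps the search space small, which is the sparseness the heuristic relies on.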

Page 18: Using Bayesian Networks to Analyze Expression Data

Experiment

Data from Spellman et al. (Mol. Bio. of the Cell 1998).

Contains 76 samples covering the whole yeast genome:
  Different methods for synchronizing the cell cycle in yeast.
  Time series at intervals of a few minutes (5-20 min).

Spellman et al. identified 800 cell-cycle regulated genes.

Page 19: Using Bayesian Networks to Analyze Expression Data

Methods

Experiment 1: discretized data into 3 levels (-, 0, +) of log(ratio to control), with cut points at -0.5 and 0.5
  Learn multinomial probabilities

Experiment 2: learn linear interactions (w/ Gaussian noise)

No prior biological knowledge was used
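The three-level discretization of Experiment 1 might look like the sketch below. The cut points of -0.5 and 0.5 on log(ratio to control) come from the slide's figure; the function name, the integer coding of the levels, and the treatment of values exactly at a cut point are assumptions.

```python
def discretize(log_ratio, threshold=0.5):
    """Map a log expression ratio to -1 (under), 0 (unchanged) or +1 (over)."""
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0   # values exactly at a cut point are treated as unchanged (assumption)

levels = [discretize(v) for v in [-1.3, -0.2, 0.0, 0.4, 0.9]]
```

Each gene then becomes a three-valued variable, for which multinomial conditional distributions can be learned.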

Page 20: Using Bayesian Networks to Analyze Expression Data

Network Learned

Page 21: Using Bayesian Networks to Analyze Expression Data

Challenge: Statistical Significance

Sparse data:
  Small number of samples
  “Flat posterior”: many networks fit the data

Solution: estimate confidence in network features

Two types of features:
  Markov neighbors: X directly interacts with Y
  Order relations: X is an ancestor of Y

Page 22: Using Bayesian Networks to Analyze Expression Data

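Confidence Estimates

Bootstrap approach [FGW, UAI99]:
  Resample the data D to obtain datasets D1, D2, ..., Dm
  Learn a network Gi from each Di
  Estimate: $C(f) = \frac{1}{m} \sum_{i=1}^{m} 1\{f \in G_i\}$

[Figure: D -> resample -> D1, D2, ..., Dm -> Learn -> one network over nodes E, R, B, A, C per resampled dataset]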
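The bootstrap estimate can be sketched as follows. The `learn` argument stands in for the network learning procedure and is assumed to return the set of features found in the learned network. `toy_learn` is a hypothetical placeholder used only to make the example runnable; it is not a Bayesian network learner.

```python
import random

def bootstrap_confidence(data, learn, m, rng):
    """Estimate C(f) = (1/m) * sum_i 1{f in G_i} over m bootstrap replicates."""
    counts = {}
    for _ in range(m):
        # Resample with replacement, keeping the original sample count.
        resampled = [rng.choice(data) for _ in data]
        for feature in learn(resampled):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: c / m for feature, c in counts.items()}

def toy_learn(samples):
    """Hypothetical stand-in learner: report pairs of columns that always agree."""
    n_vars = len(samples[0])
    return {
        (i, j)
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if all(s[i] == s[j] for s in samples)
    }

data = [(0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0), (1, 1, 1)]
conf = bootstrap_confidence(data, toy_learn, m=200, rng=random.Random(1))
```

A feature that appears in every replicate gets confidence 1, while one that shows up only for a few lucky resamples gets a value near 0.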

Page 23: Using Bayesian Networks to Analyze Expression Data

Testing for Significance

Markov w/ Gaussian Models

[Figure: number of features with confidence above t (y-axis) against t from 0.1 to 1 (x-axis), for Real and Random data; two panels, with y-axis ranges 0-4000 and 0-500]

We ran our procedure on randomized data, in which we reshuffled the order of values for each gene
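The randomization used as a baseline on this slide can be sketched as below, assuming the data is held as a list of rows with one column per gene (the Aij layout from the problem statement). The function name is hypothetical.

```python
import random

def shuffle_each_gene(matrix, rng):
    """Return a copy of `matrix` with every gene's values permuted independently.

    `matrix[i][j]` is the level of gene j in experiment i, so a gene is a column.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    shuffled = [row[:] for row in matrix]
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        rng.shuffle(column)                 # permute this gene's values in place
        for i in range(n_rows):
            shuffled[i][j] = column[i]
    return shuffled

real = [[1, 10], [2, 20], [3, 30], [4, 40]]
randomized = shuffle_each_gene(real, random.Random(0))
```

Shuffling each column separately preserves every gene's own distribution of values while breaking the relationships between genes, so features found with high confidence in the shuffled data indicate the level expected by chance.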

Page 24: Using Bayesian Networks to Analyze Expression Data

Testing for Significance

Markov w/ Multinomial Models

[Figure: number of features with confidence above t (y-axis) against t from 0.1 to 1 (x-axis), for Real and Random data; two panels, with y-axis ranges 0-1400 and 0-250]

Page 25: Using Bayesian Networks to Analyze Expression Data

Local Map

Page 26: Using Bayesian Networks to Analyze Expression Data

Finding Key Genes

Key gene: a gene that precedes many other genes

  YLR183C
  MCD1      Mitotic Chromosome Determinant
  RAD27     DNA repair protein
  CLN2      Role in cell cycle START
  SRO4      Involved in cellular polarization during budding
  YOX1      Homeodomain protein that binds leu-tRNA gene
  POL30     Required for DNA replication and repair
  YLR467W
  CDC5
  MSH6      Homolog of the human GTBP protein
  YML119W
  CLN1      Role in cell cycle START

Page 27: Using Bayesian Networks to Analyze Expression Data

Strong Markov Relations

YKL163W-PIR3 YKL164C-PIR1 Close location

YKR013W-PRY2 YKR012C Close location

MCD1 MSH6 Bind to DNA during mitosis

PHO11 PHO12 Acid phosphatases

HHT1 HTB1 Histones

FAR1 ASH1 Mating type switch, expression uncorrelated

CLN2 SVS1 Unknown function - SVS1

STE2 MFA2 Mating factor & receptor

Page 28: Using Bayesian Networks to Analyze Expression Data

Future Work

Finding suitable local distribution models

Temporal aspect - DBN

Correct handling of hidden variables
  Can we recognize hidden causes of coordinated regulation events?

Incorporating prior knowledge
  Incorporate large mass of biological knowledge, and insight from sequence/structure databases

Abstraction
  Combine with cluster analysis