
Using Bayesian Networks to Analyze Expression Data

N. Friedman, M. Linial, I. Nachman, D. Pe’er @ Hebrew University

What I will cover

• Domain background
• Overview of their work
• Causal networks vs. Bayes networks
• Application
• Results

BACKGROUND INFORMATION

• What is gene expression?
  – The process by which the information in a gene is used to synthesize a functional gene product (a protein or RNA).

• Think of it as the menu for a holiday dinner.
  – You need the right ingredients in the right amounts to pull it off.
  – Too much or too little of something can lead to odd results.

• Advances in technology led to DNA microarrays.
  – A snapshot of the inside of a cell at a given moment in time.
  – No longer necessary to examine one gene at a time for comparison.
• Most computational analysis has focused on clustering algorithms.
  – Group genes with similar expression patterns together.
  – Useful for finding co-regulated genes, but not for uncovering the structure of the regulation process.

OVERVIEW

Overview

• How can we discover key relations in cellular systems from large amounts of microarray data?

• They propose a Bayesian network framework for discovering gene interactions from microarray data.
  – Model the statistical dependencies between genes.
  – Infer interactions from many expression measurements.

Overview

• Uncover properties of the network by examining the dependencies and conditional dependencies in the gene data.
  – How does one gene interact with another?
  – This information can then be used to infer causal influence.

BAYES NETS

Bayesian Network


• Useful for several reasons:
  – Well suited to describing locally interacting entities.
  – A well-understood set of algorithms, used successfully in many areas.
  – Can be used to infer a causal network, even though Bayesian networks are not causal by definition.
  – Handle noise fairly well.
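To make "locally interacting entities" concrete: a Bayesian network writes the joint distribution as a product of local terms, one per variable given its parents, P(X1, ..., Xn) = ∏ P(Xi | Pa(Xi)). The sketch below uses a hypothetical three-gene network (A → B, A → C) with made-up probabilities; the names `parents`, `p_high`, and `joint` are illustrative only.

```python
from itertools import product

# Hypothetical 3-gene network: A -> B, A -> C. States: 0 = low, 1 = high.
parents = {"A": [], "B": ["A"], "C": ["A"]}

# P(gene = 1 | parent values). Made-up numbers, for illustration only.
p_high = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.1},
}

def joint(assignment):
    """P(x_1, ..., x_n) as the product of each P(x_i | parents(x_i))."""
    prob = 1.0
    for gene, pa in parents.items():
        key = tuple(assignment[p] for p in pa)
        p1 = p_high[gene][key]
        prob *= p1 if assignment[gene] == 1 else 1.0 - p1
    return prob

genes = list(parents)
# The local factors define a proper distribution: summing over all 2^3 states gives 1.
total = sum(joint(dict(zip(genes, values))) for values in product([0, 1], repeat=len(genes)))
print(round(total, 10))  # 1.0
```

Each gene only needs a table conditioned on its own parents, which is what keeps the model local.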

Causal Network

• Very similar to a typical Bayesian network.
• A Bayesian network with the strict requirement that the relationships are causal.
  – An edge X → Y means X directly causes some change in Y.

• If many learned networks share the same directed path from X to Y, that may indicate a causal relationship between X and Y.

Bayes vs Causal

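• A Bayesian network encodes (conditional) dependencies.

• A causal network encodes strict cause-and-effect relationships.
• Different Bayesian networks can be equivalent.
  – X → Y is equivalent to Y → X: both only say that X and Y are dependent.

• In a causal network, this equivalence does not hold.
  – By definition, X → Y and Y → X make different claims about which variable causes the other.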
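The equivalence above is an instance of a standard result: two DAGs are Markov equivalent (they encode the same independencies) exactly when they have the same skeleton (undirected edges) and the same v-structures (X → Z ← Y with X and Y not adjacent). The sketch below checks that criterion for small edge lists; the function names are illustrative, not from the paper.

```python
def skeleton(edges):
    """Undirected version of the edge list."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    result = set()
    for c, ps in parents.items():
        for a in ps:
            for b in ps:
                # a < b so each unordered parent pair is counted once.
                if a < b and frozenset((a, b)) not in skel:
                    result.add((a, c, b))
    return result

def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

print(markov_equivalent([("X", "Y")], [("Y", "X")]))  # True
# X -> Z <- Y is a v-structure; Y -> Z -> X is not, so these differ.
print(markov_equivalent([("X", "Z"), ("Y", "Z")], [("Z", "X"), ("Y", "Z")]))  # False
```

A Bayesian network learner can only recover the equivalence class; a causal reading has to pick a direction, which is why extra assumptions or interventions are needed.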

Learning Causal Patterns

• Need to determine a causal interpretation of the network.

• Observation– Passive domain measurement.

• Intervention– Setting variable values using outside forces.
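A two-gene example shows why interventions carry information that passive observation does not. The model is hypothetical: X → Y with made-up probabilities. Observing Y = 1 changes our belief about its cause X (Bayes' rule), whereas setting Y = 1 from outside cuts the edge into Y, so X keeps its original distribution.

```python
# Hypothetical causal model X -> Y (made-up numbers).
p_x1 = 0.5                       # P(X = 1)
p_y1_given_x = {0: 0.1, 1: 0.9}  # P(Y = 1 | X = x)

def p_x1_given_observed_y1():
    """Observation: seeing Y = 1 is evidence about X (Bayes' rule)."""
    num = p_x1 * p_y1_given_x[1]
    den = num + (1 - p_x1) * p_y1_given_x[0]
    return num / den

def p_x1_given_do_y1():
    """Intervention: Y is set externally, so the edge X -> Y is cut and
    Y says nothing about X; X keeps its marginal distribution."""
    return p_x1

print(round(p_x1_given_observed_y1(), 6))  # 0.9
print(p_x1_given_do_y1())                  # 0.5
```

Under the reversed model Y → X, intervening on Y would change X, so interventional data can distinguish the two directions that observational data alone cannot.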

Causal Markov assumption

• Given the values of a variable's immediate causes, it is independent of its earlier causes.
  – Once we know the state of a gene's parents, its more distant ancestors tell us nothing more about that gene.
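A numeric check of this assumption on a hypothetical chain A → B → C (made-up parameters): once B is known, also conditioning on the more distant cause A leaves P(C | ·) unchanged. The helper `p_c1_given` is illustrative.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary genes (made-up parameters).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]

# Joint distribution built from the local factors.
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product([0, 1], repeat=3)
}

def p_c1_given(b, a=None):
    """P(C = 1 | B = b), or P(C = 1 | A = a, B = b) when a is given."""
    match = [k for k in joint if k[1] == b and (a is None or k[0] == a)]
    den = sum(joint[k] for k in match)
    num = sum(joint[k] for k in match if k[2] == 1)
    return num / den

for b in (0, 1):
    # The values for a = 0 and a = 1 match P(C = 1 | B = b).
    print(b, [round(p_c1_given(b, a), 6) for a in (0, 1)], round(p_c1_given(b), 6))
```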

Analyzing Expression Data

• Consider a probability distribution over all possible states of the system (which can include environmental conditions, etc.).

• The state of the system is described by a set of random variables.
  – Each random variable denotes the expression level of one gene.
• Build the joint distribution over all of these variables.

• This joint distribution is difficult to learn from expression data, which involve transcript levels of thousands of genes!

• However, gene networks are sparse (each gene is directly influenced by only a few others), so Bayesian networks are still well suited.
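A back-of-the-envelope count shows why sparsity helps. Assuming each gene is discretized to a few states and has at most k parents (both assumptions for illustration), an unrestricted joint table grows exponentially in the number of genes, while the factored network grows only linearly. The function names are hypothetical.

```python
def full_joint_params(n, states=2):
    """Free parameters of an unrestricted joint table over n discrete genes."""
    return states ** n - 1

def sparse_bn_params(n, max_parents, states=2):
    """Upper bound for a Bayesian network: each gene has (states - 1) free
    parameters per configuration of at most max_parents parents."""
    return n * (states ** max_parents) * (states - 1)

print(sparse_bn_params(100, 3))         # 800
print(full_joint_params(100) > 10**30)  # True
```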

Learning the model

• Markov relations are a feature that indicates whether two genes are related in some joint biological process (one gene is in the other's Markov blanket).

• Order relations are a feature that captures a global property of the network (whether X is an ancestor of Y).
  – Used as an indication that X may causally influence Y, though this is not certain.
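Both features can be read off a learned graph. The sketch below computes them on a single hypothetical DAG: the Markov blanket (parents, children, and the children's other parents) and the ancestor relation via a directed-path search. This is a simplification: the paper's order relation asks whether X is an ancestor of Y across the whole equivalence class, which this single-DAG check does not handle.

```python
def markov_blanket(edges, x):
    """Parents, children, and the children's other parents of x."""
    parents = {a for a, b in edges if b == x}
    children = {b for a, b in edges if a == x}
    spouses = {a for a, b in edges if b in children and a != x}
    return parents | children | spouses

def is_ancestor(edges, x, y):
    """True if there is a directed path x -> ... -> y (depth-first search)."""
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == y:
                    return True
                seen.add(b)
                stack.append(b)
    return False

# Hypothetical gene network.
g = [("G1", "G2"), ("G2", "G4"), ("G3", "G4"), ("G4", "G5")]
print(sorted(markov_blanket(g, "G2")))                  # ['G1', 'G3', 'G4']
print(is_ancestor(g, "G1", "G5"), is_ancestor(g, "G5", "G1"))  # True False
```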

Confidence of features

• Produce m different networks G_1, ..., G_m and, for each feature f of interest, calculate its confidence:

$$\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$$

• where f(G) is 1 if f is a feature of G, and 0 otherwise.
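The confidence formula is just the fraction of networks that contain the feature. In the paper the m networks come from re-running the learner on bootstrap resamples of the data; here they are given directly as hypothetical edge sets, and the edge feature is an illustrative choice.

```python
def confidence(networks, feature):
    """conf(f) = (1/m) * sum_i f(G_i), where feature(G) returns True/False."""
    m = len(networks)
    return sum(1 if feature(g) else 0 for g in networks) / m

# Hypothetical networks (directed edge sets) from m = 4 bootstrap runs.
nets = [
    {("G1", "G2"), ("G2", "G3")},
    {("G1", "G2")},
    {("G2", "G1"), ("G2", "G3")},
    {("G1", "G2"), ("G3", "G2")},
]

def has_edge_g1_g2(g):
    """Example feature: does the network contain the edge G1 -> G2?"""
    return ("G1", "G2") in g

print(confidence(nets, has_edge_g1_g2))  # 0.75
```

Any of the features above (Markov or order relations) can be plugged in as `feature`, as long as it maps a network to 0 or 1.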

Learning the network structure

• Issues
  – Extremely large search space (super-exponential in the number of variables).
• Need to identify potential (candidate) parents for each gene, using simple statistics, before building the network.
  – Restricts the search to networks in which each variable Xi takes its parents only from its own candidate set.
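The slide only says candidates are chosen with "simple statistics". As one such statistic, the sketch below keeps, for each gene, the k other genes with the highest absolute Pearson correlation; this measure is an assumption standing in for whatever score the authors' sparse search actually uses, and the data are made up.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def candidate_parents(data, k):
    """For each gene, keep the k other genes with the highest |correlation|."""
    cands = {}
    for g in data:
        scores = [(abs(pearson(data[g], data[h])), h) for h in data if h != g]
        scores.sort(reverse=True)
        cands[g] = [h for _, h in scores[:k]]
    return cands

# Hypothetical expression values across 5 measurements.
data = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.1, 3.9, 6.2, 8.1, 9.8],  # tracks G1 closely
    "G3": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related
}
print(candidate_parents(data, 1)["G2"])  # ['G1']
```

The network search then only considers edges from each gene's candidate list, which shrinks the space it has to explore.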

Different local probability models

• Multinomial model
  – Treat each variable as discrete and learn a multinomial distribution describing the possible states of each child given the states of its parents.

• Linear Gaussian model
  – A linear regression model of the child given its parents, with Gaussian noise.
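A minimal sketch of fitting each local model, assuming plain maximum-likelihood estimates (a Bayesian treatment would, for example, add Dirichlet pseudo-counts to the multinomial counts). Discrete states are coded -1/0/1 here and the linear Gaussian case uses a single parent for brevity; all data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_multinomial(child, parents):
    """MLE of P(child state | parent configuration) from discrete samples."""
    counts = defaultdict(Counter)
    for c, pa in zip(child, parents):
        counts[pa][c] += 1
    return {pa: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for pa, cnt in counts.items()}

def fit_linear_gaussian(child, parent):
    """Least-squares fit of child = a + b * parent + Gaussian noise (one parent).
    Returns (a, b, noise variance)."""
    n = len(child)
    mp, mc = sum(parent) / n, sum(child) / n
    b = (sum((p - mp) * (c - mc) for p, c in zip(parent, child))
         / sum((p - mp) ** 2 for p in parent))
    a = mc - b * mp
    var = sum((c - (a + b * p)) ** 2 for p, c in zip(parent, child)) / n
    return a, b, var

# Discrete example: child state observed with one parent per sample.
child = [1, 1, 0, -1, -1, 0]
parent_cfgs = [(1,), (1,), (1,), (-1,), (-1,), (0,)]
cpt = fit_multinomial(child, parent_cfgs)
print(cpt[(1,)])  # child is 1 in 2 of 3 samples with parent state 1

# Continuous example: child = 1 + 2 * parent exactly.
print(fit_linear_gaussian([1.0, 3.0, 5.0, 7.0], [0.0, 1.0, 2.0, 3.0]))  # (1.0, 2.0, 0.0)
```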

Results

• Applied to cell-cycle expression patterns.
• 76 gene expression measurements.
• Treat each measurement as an independent sample.
• Ran the bootstrap procedure together with the sparse search algorithm to extract learned features.
  – Performed on only 250 genes.

Test robustness

• Tested their confidence assessment on a randomized data set, created by permuting the order of the experiments independently for each gene.
  – On the random data, features received low confidence, since no real dependencies remain in the data to be found.

  – This indicates that the features learned from the real data are not artifacts of the bootstrap estimation.
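The randomization described above can be sketched as follows (the function name `permute_per_gene` and the data are hypothetical). Shuffling each gene's values independently keeps every gene's own distribution of values but breaks the alignment across experiments that any gene-gene dependence relies on.

```python
import random

def permute_per_gene(data, seed=0):
    """Shuffle the order of experiments independently for each gene.
    Per-gene value distributions are preserved; cross-gene dependence is destroyed."""
    rng = random.Random(seed)
    shuffled = {}
    for gene, values in data.items():
        vals = list(values)  # copy so the original data are left untouched
        rng.shuffle(vals)
        shuffled[gene] = vals
    return shuffled

# Hypothetical data in which G2 is a deterministic function of G1.
data = {"G1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "G2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]}
randomized = permute_per_gene(data)
print(sorted(randomized["G1"]) == data["G1"])  # True: same values, new order
```

Running the same bootstrap and confidence computation on `randomized` and on the real data, then comparing the resulting confidence values, gives the kind of control described on this slide.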

• Extracted plausible biological knowledge without using prior biological knowledge.

• The framework extracts a much “richer” structure from the data than clustering techniques do.

• Capable of discovering causal relationships between genes from expression data.