Gene Network Modeling

Post on 06-Jan-2016



23/2/2008

Gene Network Modeling

Prof. Yasser Kadah
Eng. Fadhl Al-Akwaa

OUTLINES

What is the Gene Regulatory Network (GRN)?
Application of GRN
GRN Construction Methodology
GRN Modeling Steps
GRN Models
GRN Software
Next Work
Reference

From The Last Lecture

DNA sequence over the alphabet {A, T, C, G}: ATCGAATCGA

Protein sequence over the alphabet {all letters except B, J, O, U, X, Z}: KMLSLLMARTYW

The Central Dogma: Protein Synthesis

Genome → (Transcription) → Transcriptome → (Translation) → Proteome → Cell Function

Bioinformatics Important Challenges

Gene Prediction
Gene Function
Protein Function
Protein 3D Structure

[Figure: the Transcription and Translation pathway from DNA sequence {A,T,C,G} to protein sequence (KMLSLLMARTYW), labeled with Public Database, Microarray and Gene Expression Level.]

Microarray Technology

[Figure, built up over several slides: for GENE A and GENE B, the Transcription Rate, Gene Expression Level, Translation Rate and Protein Level are linked by + and - influences; question marks mark the unknown links between the two genes.]


What is a Gene Regulatory Network (GRN)?

[Figure: Gene A, Gene B, Gene C and Gene D joined by links marked "?".]

GRN, an example: fission yeast

Lackner DH, 2007: http://www.sanger.ac.uk/Info/News-releases/2007/070413.shtml

http://en.wikipedia.org/wiki/Metabolic_network_modelling

http://www.enm.bris.ac.uk/anm/summerschools/complexity/imagery/191.html


Why build a Gene Network?

Functional Genomics: a gene network allows researchers to make predictions about gene function that can then be tested at the bench.

The focus is gradually shifting to functional genomics.

Application of GRN: Translational Genomics

We can study the effects of a compound (such as a drug) on the expression level of many genes.

The mission of translational genomics is to translate genomic discoveries into advances in human health.

Application of GRN: Understanding Experimental Data

Biologists expect powerful computational tools to extract functional information from experimental data.

GRN Model Objective

Construct a gene network model that:
• Describes known gene interactions well
• Predicts interactions not known so far
• Allows for simulation of drug effects
• Helps to understand the etiology of the disease

GRN Construction Methodology

Forward Engineering
Inverse Engineering
“Traditional methodology”

Forward Engineering

Hard

Reverse Engineering

Model (gene network) → microarray data: the forward problem, possible
Microarray data → model (gene network): the inverse problem, very difficult

Reverse Engineering

Boolean networks → Boolean data: easy
Boolean data → Boolean networks: easy

Data Required: DNA Microarray

[Figure: microarray assays of gene 1, gene 2 and gene 3 plotted against time, 0 to 60 min.]

Data Required: Gene Expression Matrix

      t1  t2  t3  t4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Data Required: Gene Expression Matrix

Snapshot (one column per assay):

      a1  a2  a3  a4
g1     0   3   1   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Time series (one column per time point):

      t1  t2  t3  t4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Overview of steps in modeling and control of Probabilistic Boolean networks (Ranadip Pal, 2007)

Main stages: Prior Biological Knowledge, Data Extraction, Discretization, Gene Selection, Network Generation, Control of Network.

[Figure: flowchart with panels for Microarray Image, Grid Alignment, Segmentation, Gene Expression Extraction, Discretization (up-regulated or down-regulated between t1 and t2, hypothesis testing at 99%), Gene Selection (Seed Algorithm), BN generation (PBN steady state matched), Design of Optimal Control Policy ((I) Penalty Assignment, (II) Formulation of Optimal Control Problem), Dynamic Programming, Optimal Control Policy, Application of Stationary Policy, and a comparison of the Original Steady State with the Steady State using Control.]

GRN Modeling Steps: Discretization

Assume that genes exist in two states: on and off.

If the expression of gene i is above level i (a threshold specific to that gene), consider it on; otherwise, consider it off.

[Figure, built up over several slides: expression of gene 1, gene 2 and gene 3 against time (0 to 60 min), each with its own threshold level (1, 2, 3), and every time point labeled on or off.]

GRN Modeling Steps: Discretization

We obtain the following discretized gene expression data:

time (min)   0   5  10  15  20  25  30  35  40  45  50  55
gene 1       0   0   0   0   0   0   1   1   1   1   1   1
gene 2       0   0   0   0   0   0   0   1   1   0   0   0
gene 3       1   1   1   1   1   1   1   0   0   0   0   0

The gene expression data is now in the form of bit streams.
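The thresholding rule described on these slides can be sketched in a few lines. This is only an illustration: the measurements and the per-gene threshold levels below are hypothetical values, not data from the lecture, and the function name `discretize` is an assumed helper.

```python
def discretize(expression, thresholds):
    """Map each measurement to 1 (on) if it exceeds its gene's threshold, else 0 (off)."""
    return [
        [1 if value > level else 0 for value in row]
        for row, level in zip(expression, thresholds)
    ]

# Hypothetical continuous measurements: one row per gene, one column per time point.
expression = [
    [0.2, 0.3, 0.9, 1.4],
    [1.1, 0.4, 0.2, 0.1],
]
thresholds = [0.5, 0.5]  # assumed "level i" for each gene

bits = discretize(expression, thresholds)
for row in bits:
    print("".join(str(b) for b in row))  # each gene becomes a bit stream
```

In practice the threshold for each gene would have to be chosen from the data, for example with the hypothesis test mentioned in the overview slide.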

GRN Modeling Steps: Discretization

Assume that genes exist in three states:

Unchanged: 0
Up-regulated: 1
Down-regulated: -1

GRN Modeling Steps: Gene Selection (Clustering)

      a1  a2  a3  a4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Clustering Steps: Correlation

Choose a similarity metric to compare the transcriptional responses or the expression profiles:
• Pearson correlation
• Spearman correlation
• Euclidean distance

Clustering Steps: Correlation Algorithm

       g1     g2     g3     g4     g5
g1      1   0.23   0.00   0.95  -0.63
g2             1   0.91   0.56   0.56
g3                    1   0.32   0.77
g4                           1  -0.36
g5                                  1

The matrix is symmetric, so only the upper triangle is shown.

Correlation coefficients are values from –1 to 1, with 1 indicating a similar behavior, –1 indicating an opposite behavior and 0 indicating no direct relation.
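As a rough illustration of the first metric, the sketch below computes Pearson correlations for the four-gene expression matrix shown earlier in the lecture. The helper name `pearson` is an assumption, and the resulting numbers are not the ones in the five-gene table above, which come from data not reproduced in the slides.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long expression profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)

# Gene expression matrix from the "Data Required" slide (rows g1..g4, columns t1..t4).
expression = {
    "g1": [0, 1, 2, 1],
    "g2": [1, 2, 1, 0],
    "g3": [0, 1, 1, 1],
    "g4": [1, 2, 1, 0],
}

corr = {
    (p, q): pearson(expression[p], expression[q])
    for p in expression
    for q in expression
}
for p in expression:
    print(p, [round(corr[(p, q)], 2) for q in expression])
```

Genes g2 and g4 have identical profiles, so their coefficient is 1, which is the "similar behavior" case described above.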

Clustering Steps: Clustering Algorithm

Choose a clustering algorithm:
• Hierarchical
• K-means
• …

Hierarchical Clustering

       g1     g2     g3     g4     g5
g1          0.23   0.00   0.95  -0.63
g2                 0.91   0.56   0.56
g3                        0.32   0.77
g4                              -0.36
g5

• Find the largest value in the similarity matrix.
• Join clusters together.
• Recompute the matrix and iterate.

Dendrogram so far: g1 and g4 are joined.

Hierarchical Clustering

          g1,g4     g2     g3     g5
g1,g4             0.37   0.16  -0.52
g2                       0.91   0.56
g3                              0.77
g5

• Find the largest value in the similarity matrix.
• Join clusters together.
• Recompute the matrix and iterate.

Dendrogram so far: g1 joined with g4, and g2 joined with g3.

Hierarchical Clustering

          g1,g4   g2,g3     g5
g1,g4              0.27  -0.52
g2,g3                     0.68
g5

• Find the largest value in the similarity matrix.
• Join clusters together.
• Recompute the similarity matrix and iterate.

Dendrogram so far: g1 joined with g4, g2 joined with g3, and then g5 joined to the g2, g3 cluster.
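The three-step loop can be sketched as follows, using the pairwise similarities from the first hierarchical clustering slide. The slides do not say how the matrix is recomputed after a merge, so this sketch assumes average linkage (the mean over all cross-cluster pairs); the recomputed values it prints are therefore close to, but not identical with, the numbers on the slides. All function names are illustrative.

```python
import itertools

# Pairwise similarities copied from the upper triangle on the slides.
SIM = {
    ("g1", "g2"): 0.23, ("g1", "g3"): 0.00, ("g1", "g4"): 0.95, ("g1", "g5"): -0.63,
    ("g2", "g3"): 0.91, ("g2", "g4"): 0.56, ("g2", "g5"): 0.56,
    ("g3", "g4"): 0.32, ("g3", "g5"): 0.77,
    ("g4", "g5"): -0.36,
}

def pair_similarity(a, b):
    """Look up the similarity of two genes regardless of their order."""
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]

def cluster_similarity(c1, c2):
    """Average linkage (an assumption): mean similarity over all cross pairs."""
    values = [pair_similarity(a, b) for a in c1 for b in c2]
    return sum(values) / len(values)

def agglomerate(genes):
    """Repeatedly join the two most similar clusters; return the merges in order."""
    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(itertools.combinations(clusters, 2),
                     key=lambda pair: cluster_similarity(*pair))
        merges.append((c1, c2, cluster_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = agglomerate(["g1", "g2", "g3", "g4", "g5"])
for left, right, score in merges:
    print(left, "+", right, "at similarity", round(score, 3))
```

The merge order reproduces the dendrogram built up on the slides: g1 with g4, then g2 with g3, then g5 with the g2, g3 cluster.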


Clustering Example

Eisen et al. (1998), PNAS, 95(25): 14863-14868

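GRN Modeling Steps: GRN Generation

      t1  t2  t3  t4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Gene expression matrix → Statistical Signal Processing Technique → Gene network

[Figure: gene network over g1, g2, g3 and g4 whose edges carry + and - signs; one edge is marked "?".]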

GRN Models

• Directed and undirected graphs
• Bayesian networks
• Boolean networks
• Generalized logical networks
• Non-linear ordinary differential equations
• Piecewise linear differential equations
• Qualitative differential equations
• Partial differential equations
• Stochastic master equations
• Rule based formalisms

GRN Models

Hidde de Jong, Modeling and simulation of genetic regulatory systems: a literature review; J Comput Biol. 2002;9(1):67-103. Review.

[Table comparing the models; column headings: Node States, Computation, Data, Complexity, Dynamics.]

What class of models should be chosen?

The selection should be made in view of:
• data requirements
• goals of modeling and analysis

Classical Tradeoff

A “fine” model with many parameters:
• may be able to capture detailed “low-level” phenomena (protein concentrations, reaction kinetics);
• requires very large amounts of data for inference, lest the model be “overfit”.

A “coarse” model with lower complexity:
• may succeed in capturing “high-level” phenomena (which genes are ON/OFF);
• requires smaller amounts of data.

Occam’s Razor

Model Reliability and Adequacy

P is the set of all possible observations.
S is the set of all observations made on the study system.
M is the set of all model outputs.
Q = S ∩ M

[Figure: Venn diagram of S and M inside P, overlapping in Q.]

Model Reliability and Adequacy

[Figure: four Venn diagrams of S and M inside P, labeled Useless Model, Dream Model, Incomplete model, and Complete, but erring model.]

Model reliability: |Q|/|M|
Model adequacy: |Q|/|S|
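The two ratios are easy to try out with Python sets. The observation and model-output sets below are hypothetical, and the mapping of the four model names to set relations (disjoint, equal, M inside S, S inside M) is an interpretation of the Venn diagrams, not something the slides state in words.

```python
def reliability(S, M):
    """|Q| / |M| with Q = S ∩ M: the share of model outputs that were actually observed."""
    return len(S & M) / len(M)

def adequacy(S, M):
    """|Q| / |S| with Q = S ∩ M: the share of observations the model reproduces."""
    return len(S & M) / len(S)

S = {1, 2, 3, 4}  # hypothetical observations made on the study system

# Hypothetical model output sets, one per case named on the slides (assumed reading).
cases = {
    "useless model": {7, 8},                          # M and S do not overlap
    "dream model": {1, 2, 3, 4},                      # M equals S
    "incomplete model": {1, 2},                       # M lies inside S
    "complete, but erring model": {1, 2, 3, 4, 5, 6}, # S lies inside M
}

for name, M in cases.items():
    print(f"{name}: reliability={reliability(S, M):.2f}, adequacy={adequacy(S, M):.2f}")
```

Under this reading, an incomplete model is fully reliable but not fully adequate, and a complete but erring model is the reverse.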

Directed and Undirected Graphs

• Probably the most straightforward way to model a GRN
• G = <V, E>
• V: set of vertices
• E: set of edges <i, j>, where i, j ∈ V are the head and tail of the edge
• Additional labels denote positive/negative influence

Directed and Undirected Graphs

Advantages:
• Intuitive way of visualization
• Common and well-explored graph algorithms can make biologically relevant predictions about genetic regulatory systems (GRSes):
  • paths between genes may reveal missing regulatory interactions or provide clues about redundancy
  • cycles in the network point at feedback relations
  • connectivity characteristics give an indication of the complexity
  • loosely connected subgraphs point at functional modules

Disadvantages:
• Time does not play a role
• Too much abstraction: a very simplified model, far from reality
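A minimal sketch of the graph view, assuming a small hypothetical network: edges are stored with a +1 or -1 label, and a depth-first search looks for paths between genes and for cycles, which correspond to the feedback relations mentioned above. The gene names, edges and function names are all made up for illustration.

```python
# Hypothetical signed edges: (regulator, target) -> +1 (activation) or -1 (repression).
EDGES = {
    ("g1", "g2"): +1,
    ("g2", "g3"): -1,
    ("g3", "g1"): +1,
    ("g3", "g4"): -1,
}

def successors(node):
    """Genes directly regulated by the given gene."""
    return [target for (source, target) in EDGES if source == node]

def find_path(start, goal, visited=None):
    """Depth-first search; returns one path of genes from start to goal, or None."""
    visited = (visited or set()) | {start}
    for nxt in successors(start):
        if nxt == goal:
            return [start, goal]
        if nxt not in visited:
            tail = find_path(nxt, goal, visited)
            if tail:
                return [start] + tail
    return None

def feedback_loop(node):
    """A path leading from a gene back to itself, if one exists."""
    return find_path(node, node)

def path_sign(path):
    """Product of the edge labels along a path: +1 net activation, -1 net repression."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= EDGES[(source, target)]
    return sign

loop = feedback_loop("g1")
print(loop, "sign:", path_sign(loop))
print(find_path("g4", "g1"))  # g4 regulates nothing, so no path exists
```

As the slide notes, such a graph says nothing about timing: it records which gene influences which, not when or how strongly.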

GRN Models

On the list of GRN models, some entries are marked as more popular and efficient.

Boolean Network Model

• A Boolean network is defined by a set of nodes, V = {x1, x2, ..., xn}, and a list of Boolean functions, F = {f1, f2, ..., fn}.

• Each xk represents the state (expression) of a gene gk, where xk = 1 means the gene is expressed and xk = 0 means the gene is not expressed.

Boolean Network

At any given time, combining the gene states gives a gene activity pattern (GAP).

t     0  1  2  3  4
x1    1  1  0  1  1
x2    1  0  0  0  0
x3    1  0  1  1  0

Each column of the table is one GAP.

Boolean Network Example

t     0  1  2  3  4
x1    1  1  0  1  1
x2    1  0  0  0  0
x3    1  0  1  1  0

• Given a GAP at time t, a deterministic function (a set of logical rules) provides the GAP at time t+1.

[Figure, built up over several slides: wiring diagram from the nodes x1, x2 and x3 at time t to the same nodes at time t+1, with the Boolean functions or, nor and nand.]

For each node with k inputs there are 2^(2^k) possible Boolean functions.
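A small simulation makes the update rule concrete. The slides name the functions or, nor and nand but the exact wiring is not recoverable from the transcript, so the three rules below are an assumption: they were chosen because they use those functions and reproduce the table with x2 = 1 0 0 0 0 and x3 = 1 0 1 1 0 when started from the GAP (1, 1, 1).

```python
def step(state):
    """One synchronous update of the network under the assumed rules."""
    x1, x2, x3 = state
    return (
        int(x2 or x3),          # x1(t+1) = x2 OR x3   (assumed wiring)
        int(not (x1 or x3)),    # x2(t+1) = x1 NOR x3  (assumed wiring)
        int(not (x1 and x3)),   # x3(t+1) = x1 NAND x3 (assumed wiring)
    )

def simulate(initial, steps):
    """Return the list of gene activity patterns GAP(0), ..., GAP(steps)."""
    gaps = [initial]
    for _ in range(steps):
        gaps.append(step(gaps[-1]))
    return gaps

def count_boolean_functions(k):
    """Number of distinct Boolean functions of k inputs: 2^(2^k)."""
    return 2 ** (2 ** k)

gaps = simulate((1, 1, 1), 4)
for name, row in zip(("x1", "x2", "x3"), zip(*gaps)):
    print(name, *row)

print(count_boolean_functions(2))  # candidate functions per node with two inputs
```

The count grows very quickly with k, which is one reason inferring the rules from data becomes expensive for highly connected nodes.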

Boolean Network Example

I. Shmulevich et al., Bioinformatics (2002), 18 (2): 261-274

[Figure: example network using AND, NOT and NAND functions.]

Boolean Networks: Summary

Advantages:
• Efficient analysis of large regulatory networks
• Positive/negative feedback cycles can be modeled with BNs

Disadvantages:
• Strong simplifying assumptions: a gene is either on or off, with no in-between states
• The computation time to construct large-scale gene networks is very high, often to the point of being impractical
• Very susceptible to noise
• There are situations where the Boolean idealisation is not appropriate and more general methods are required

Bayesian Networks

A gene regulatory network is represented by a directed acyclic graph:
• Vertices correspond to genes.
• Edges correspond to direct influence or interaction.

For each gene xi, a conditional distribution p(xi | parents(xi)) is defined.

The graph and the conditional distributions uniquely specify the joint probability distribution.

Bayesian Network Example

[Figure: directed acyclic graph with edges x2 → x3, x1 → x4, x2 → x4 and x4 → x5.]

Conditional distributions: p(x1), p(x2), p(x3 | x2), p(x4 | x1, x2), p(x5 | x4)

p(X) = p(x1) p(x2) p(x3 | x2) p(x4 | x1, x2) p(x5 | x4)
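The factorization can be checked numerically. The sketch below treats each gene as binary (on or off) and uses made-up conditional probabilities; only the structure of the product comes from the slide. Summing the joint probability over all 32 assignments should give 1.

```python
from itertools import product

# Hypothetical probabilities that each gene is on (1), given the state of its parents.
P_X1 = 0.6
P_X2 = 0.3
P_X3_GIVEN_X2 = {0: 0.2, 1: 0.9}
P_X4_GIVEN_X1_X2 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95}
P_X5_GIVEN_X4 = {0: 0.05, 1: 0.8}

def bernoulli(p_on, value):
    """Probability of a binary value when the probability of 1 is p_on."""
    return p_on if value == 1 else 1.0 - p_on

def joint(x1, x2, x3, x4, x5):
    """p(X) = p(x1) p(x2) p(x3 | x2) p(x4 | x1, x2) p(x5 | x4)."""
    return (
        bernoulli(P_X1, x1)
        * bernoulli(P_X2, x2)
        * bernoulli(P_X3_GIVEN_X2[x2], x3)
        * bernoulli(P_X4_GIVEN_X1_X2[(x1, x2)], x4)
        * bernoulli(P_X5_GIVEN_X4[x4], x5)
    )

total = sum(joint(*assignment) for assignment in product((0, 1), repeat=5))
print(joint(1, 1, 1, 1, 1), total)
```

The five small tables hold 1 + 1 + 2 + 4 + 2 = 10 numbers, compared with 31 for an unrestricted joint distribution over five binary variables.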

Learning Bayesian Models

• Using gene expression data, the goal is to find the Bayesian network that best matches the data.
• Recovering the optimal conditional probability distributions when the graph is known is “easy”.
• Recovering the structure of the graph is NP-hard (NP: non-deterministic polynomial time).
• But good statistics are available:
  • What is the likelihood of a specific assignment?
  • What is the distribution of xi given xj?
  • …

Issues with Bayesian Models

• Computationally intensive.
• Requires lots of data.
• Does not allow for feedback loops, which are known to play an important role in biological networks.
• Does not make use of the temporal aspect of the data.
• Dynamic Bayesian networks aim at solving some of these issues, but they require even more data.

Differential Equations

Typically uses linear differential equations to model the gene trajectories:

dxi(t) / dt = ai,0 + ai,1 x1(t) + ai,2 x2(t) + … + ai,n xn(t)

Several reasons for that choice:
• a lower number of parameters implies that we are less likely to overfit the data
• sufficient to model complex interactions between the genes

Small Network Example

dx1(t) / dt = 0.491 - 0.248 x1(t)
dx2(t) / dt = -0.473 x3(t) + 0.374 x4(t)
dx3(t) / dt = -0.427 + 0.376 x1(t) - 0.241 x3(t)
dx4(t) / dt = 0.435 x1(t) - 0.315 x3(t) - 0.437 x4(t)

Each coefficient multiplying an xj(t) is an interaction coefficient (for example 0.376 in the third equation); the terms without an xj(t) (0.491 and -0.427) are constant coefficients.

[Figure: the corresponding network over x1, x2, x3 and x4, with each edge signed + or - according to its coefficient.]
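To see what trajectories these four equations produce, they can be integrated numerically. The sketch below uses forward Euler, a deliberately simple scheme chosen for illustration; the initial expression levels, step size and number of steps are hypothetical.

```python
# Constant terms and interaction coefficients copied from the four equations.
A0 = [0.491, 0.0, -0.427, 0.0]
A = [
    [-0.248, 0.0,  0.0,    0.0],
    [ 0.0,   0.0, -0.473,  0.374],
    [ 0.376, 0.0, -0.241,  0.0],
    [ 0.435, 0.0, -0.315, -0.437],
]

def derivative(x):
    """Right-hand side of the linear model: dx_i/dt = a_i0 + sum_j a_ij * x_j."""
    return [a0 + sum(a * xj for a, xj in zip(row, x)) for a0, row in zip(A0, A)]

def simulate(x0, dt=0.01, steps=10_000):
    """Forward Euler integration; returns the list of states, one per time step."""
    trajectory = [list(x0)]
    for _ in range(steps):
        x = trajectory[-1]
        dx = derivative(x)
        trajectory.append([xi + dt * dxi for xi, dxi in zip(x, dx)])
    return trajectory

x0 = [1.0, 1.0, 1.0, 1.0]  # hypothetical initial expression levels
traj = simulate(x0)
print([round(v, 3) for v in traj[-1]])
print("steady state of x1:", round(0.491 / 0.248, 3))
```

Because the first equation involves only x1, that gene settles at 0.491 / 0.248 regardless of the others. The second equation has no self-term, so x2 keeps drifting once x3 and x4 have settled.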

Problem Revisited

      ai,0    ai,1    ai,2    ai,3    ai,4
x1   0.491  -0.248   0       0       0
x2   0       0       0      -0.473   0.374
x3  -0.427   0.376   0      -0.241   0
x4   0       0.435   0      -0.315  -0.437

Given the time-series data, can we find the interaction coefficients?
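One simple way to approach this question is ordinary least squares on finite differences. The sketch below is an idealized illustration, not the lecture's method: it generates noise-free synthetic time series from the example system with forward Euler steps (so forward differences reproduce the derivatives exactly), starts from several hypothetical initial conditions, and then solves for the coefficients with `numpy.linalg.lstsq`. Real microarray data are noisy and far scarcer.

```python
import numpy as np

# Coefficients of the small network example (constants and interactions).
A0 = np.array([0.491, 0.0, -0.427, 0.0])
A = np.array([
    [-0.248, 0.0,  0.0,    0.0],
    [ 0.0,   0.0, -0.473,  0.374],
    [ 0.376, 0.0, -0.241,  0.0],
    [ 0.435, 0.0, -0.315, -0.437],
])

rng = np.random.default_rng(0)
dt = 0.5  # assumed sampling interval
states, slopes = [], []
for _ in range(6):                        # six hypothetical experiments
    x = rng.uniform(0.0, 2.0, size=4)     # random initial expression levels
    for _ in range(4):                    # four sampling intervals each
        x_next = x + dt * (A0 + A @ x)    # synthetic data via a forward Euler step
        states.append(x)
        slopes.append((x_next - x) / dt)  # forward-difference estimate of dx/dt
        x = x_next

# Design matrix [1, x1, x2, x3, x4]; one least-squares problem per gene.
X = np.column_stack([np.ones(len(states)), np.array(states)])
dX = np.array(slopes)
coef, *_ = np.linalg.lstsq(X, dX, rcond=None)
estimated = coef.T  # row i holds [a_i0, a_i1, a_i2, a_i3, a_i4]

print(np.round(estimated, 3))
```

With 24 noise-free samples for 5 unknowns per gene the coefficients are recovered essentially exactly; the next slide explains why this breaks down at realistic scales.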

Issues with Differential Equations

• Even under the simplest linear model, there are m(m+1) unknown parameters to estimate:
  • m(m-1) directional effects
  • m self effects
  • m constant effects
• The number of data points is mn, and we typically have n << m (few time points).
• To avoid overfitting, extra constraints must be incorporated into the model, such as:
  • smoothness of the equations
  • sparseness of the network (few non-null interaction coefficients)
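A quick count shows how lopsided the problem is. The numbers of genes and time points below are hypothetical, chosen only to illustrate the n << m situation described above.

```python
def unknown_parameters(m):
    """m(m-1) directional effects + m self effects + m constant effects = m(m+1)."""
    return m * (m - 1) + m + m

def data_points(m, n):
    """One measurement per gene and time point."""
    return m * n

m, n = 1000, 12  # hypothetical: 1000 genes measured at 12 time points
print(unknown_parameters(m), "unknowns versus", data_points(m, n), "data points")
```

For the four-gene example the count is 4 × 5 = 20 parameters, which matches the 4 × 5 coefficient table on the "Problem Revisited" slide.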


GRN Software

GNA: Genetic Network Analyzer

Helix Bioinformatics

http://www-helix.inrialpes.fr/article122.html

GRN Software

Probabilistic Boolean Networks (PBN) Matlab Toolbox
Ilya Shmulevich
Institute for Systems Biology

Future Work: Literature Review

• Study the noisy nature of microarray data.
• Study in depth the existing modeling methodology.
• Focus on a specialized problem such as cancer.

Future Work: GSP Statistics Books

• Genomics signal processing and statistics, Edward, 2006
• Introduction to genomics signal processing with control, Ily, 2006
• Computational and Statistical Approaches to Genomics (Springer, 2006), Ily

Future Work: Statistics Books

• Handbook of Computational Statistics
• An Introduction to Statistical Signal Processing, Robert M. Gray, 2007
• Fundamentals of Statistical Signal Processing: Estimation Theory, Steven Kay
• Nonlinear Signal Processing: A Statistical Approach, Gonzalo R, 2005
• Inference in HMM, Olivier Cappe, 2005

Future Work: Modeling Books

• Modeling and Control of Complex Systems (Control Engineering), by Petros A. Ioannou, Andreas Pitsillides, 2008
• Modeling Biological Systems: Principles and Applications, 2005
• Gene Regulation and Metabolism: Postgenomic Computational Approaches, Julio, 2000

Future Work: Resources

• IEEE Transactions on Computational Biology and Bioinformatics
• IEEE International Workshop on Genomic Signal Processing and Statistics
• IEEE Journal of Selected Topics in Signal Processing: Special Issue on Genomic and Proteomic Signal Processing
• Special issue of the EURASIP Journal of Bioinformatics and Systems Biology on Genetic Regulatory Networks
• Special issue of the IEEE Signal Processing Magazine on Signal Processing Methods in Genomics and Proteomics
• Special Genomic Signal Processing issue of the IEEE Transactions on Signal Processing
• Workshop on Discrete Models for Genetic Regulatory Networks

Reference

• Hidde de Jong. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67-103. Review.
• Ranadip Pal, Aniruddha Datta, Edward R. Dougherty. Bayesian Robustness in the Control of Gene Regulatory Networks.
• Anastassiou, D. (2001). Genomic Signal Processing. IEEE Signal Processing
• Dougherty, E. R. and A. Datta (2005). "Genomic signal processing: diagnosis and therapy." Signal Processing Magazine, IEEE 22(1): 107-112.
• Vaidyanathan, P. P. (2004). Genomics and Proteomics: A Signal Processor's Tour. Circuits and Systems Magazine, IEEE. 4: 1-1.
• Vaidyanathan, P. P. and B.-J. Yoon (2004). "The role of signal-processing concepts in genomics and proteomics." Journal of the Franklin Institute (Special Issue on Genomics).