Gene Network Modeling

Page 1: Gene Network Modeling
Page 2: Gene Network Modeling

23/2/2008

Gene Network Modeling

Prof. Yasser Kadah, Eng. Fadhl Al-Akwaa

Page 3: Gene Network Modeling

OUTLINES

• What is a Gene Regulatory Network (GRN)?
• Applications of GRN
• GRN Construction Methodology
• GRN Modeling Steps
• GRN Models
• GRN Software
• Future Work
• References

Page 4: Gene Network Modeling

From the Last Lecture

DNA sequence (alphabet {A, T, C, G}): ATCGAATCGA

Protein sequence (alphabet: all letters except B, J, O, U, X, Z): KMLSLLMARTYW
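The two alphabets can be checked mechanically. This is a minimal sketch (the names `DNA_ALPHABET`, `PROTEIN_ALPHABET` and `is_valid` are our own, hypothetical helpers) that builds the protein alphabet exactly as the slide describes, the 26 letters minus B, J, O, U, X and Z, which leaves the 20 standard amino-acid codes.

```python
import string

# Nucleotide alphabet from the slide
DNA_ALPHABET = set("ATCG")
# 26 capital letters minus the six the slide excludes -> 20 amino-acid codes
PROTEIN_ALPHABET = set(string.ascii_uppercase) - set("BJOUXZ")


def is_valid(sequence, alphabet):
    """True if every symbol of the sequence belongs to the given alphabet."""
    return set(sequence) <= alphabet


print(len(PROTEIN_ALPHABET))                       # 20
print(is_valid("ATCGAATCGA", DNA_ALPHABET))        # True
print(is_valid("KMLSLLMARTYW", PROTEIN_ALPHABET))  # True
print(is_valid("KMLSLLMARTYW", DNA_ALPHABET))      # False
```

Note that a DNA string such as ATCG also passes the protein check, since A, T, C and G are amino-acid letters too; the alphabet alone does not tell the two sequence types apart.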

Page 5: Gene Network Modeling

The Central Dogma: Protein Synthesis

[Figure: Genome → (transcription) → Transcriptome → (translation) → Proteome → Cell function]

Page 6: Gene Network Modeling

Page 7: Gene Network Modeling

Important Bioinformatics Challenges

• Gene prediction
• Gene function
• Protein function
• Protein 3D structure

Page 8: Gene Network Modeling

Public Databases

[Figure: DNA sequence {A, T, C, G} → (transcription) → gene expression level, measured by microarray → (translation) → protein sequence KMLSLLMARTYW]

Page 9: Gene Network Modeling

Gene Expression

Page 10: Gene Network Modeling

Microarray Technology

Page 11: Gene Network Modeling

[Figure: regulation of Gene A: transcription rate, gene expression level, translation rate and protein level, linked by three positive (+) and one negative (-) influence]

Page 12: Gene Network Modeling

[Figure: the same diagram for Gene A and Gene B, with unknown (?) interactions between the two genes]

Page 13: Gene Network Modeling

[Figure: the Gene A and Gene B diagrams with additional unknown (?) interactions between them]

Page 14: Gene Network Modeling

Page 15: Gene Network Modeling

What is a Gene Regulatory Network (GRN)?

[Figure: Genes A, B, C and D connected by unknown (?) regulatory interactions]

Page 16: Gene Network Modeling

GRN example: fission yeast

Lackner DH, 2007. http://www.sanger.ac.uk/Info/News-releases/2007/070413.shtml

Page 17: Gene Network Modeling

http://en.wikipedia.org/wiki/Metabolic_network_modelling

Page 18: Gene Network Modeling

http://www.enm.bris.ac.uk/anm/summerschools/complexity/imagery/191.html

Page 19: Gene Network Modeling

Page 20: Gene Network Modeling

Why Build a Gene Network?

Functional genomics: gene networks allow researchers to make predictions about gene function that can then be tested at the bench.

The focus is gradually shifting to functional genomics.

Page 21: Gene Network Modeling

Application of GRN: Translational Genomics

We can study the effects of a compound (such as a drug) on the expression levels of many genes.

The mission of translational genomics is to translate genomic discoveries into advances in human health.

Page 22: Gene Network Modeling

Application of GRN: Understanding Experimental Data

Biologists expect powerful computational tools that extract functional information from experimental data.

Page 23: Gene Network Modeling

GRN Model Objective

Construct a gene network model that:
• describes known gene interactions well
• predicts interactions not known so far
• allows drug effects to be simulated
• helps to understand the etiology of disease

Page 24: Gene Network Modeling

Page 25: Gene Network Modeling

GRN Construction Methodology

• Forward engineering
• Inverse engineering (“traditional methodology”)

Page 26: Gene Network Modeling

Forward Engineering

Hard

Page 27: Gene Network Modeling

Reverse Engineering

Microarray data ↔ model (gene network)

• Forward problem (model → data): possible
• Inverse problem (data → model): very difficult

Page 28: Gene Network Modeling

Reverse Engineering

[Figure: Boolean data and Boolean networks, with both the forward and the inverse direction labeled "easy"]

Page 29: Gene Network Modeling

Data Required: DNA Microarray

[Figure: expression of genes 1, 2 and 3 measured by microarray assay over time, 0 to 60 min]

Page 30: Gene Network Modeling

Data Required: Gene Expression Matrix

      t1  t2  t3  t4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Page 31: Gene Network Modeling

Data Required: Gene Expression Matrix

Snapshot (columns a1 to a4):

      a1  a2  a3  a4
g1     0   3   1   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Time series (columns t1 to t4 are time points):

      t1  t2  t3  t4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Page 32: Gene Network Modeling

Page 33: Gene Network Modeling

Overview of steps in modeling and control of probabilistic Boolean networks (Ranadip Pal, 2007)

[Figure: pipeline from data extraction (microarray image, grid alignment, segmentation, gene expression extraction) to discretization (up-/down-regulation between t1 and t2 by hypothesis testing at 99%), gene selection (seed algorithm, prior biological knowledge), network generation (BN generation, PBN steady state matched) and control of the network (penalty assignment, formulation of the optimal control problem, dynamic programming, optimal control policy, application of a stationary policy, comparing the original steady state with the steady state using control)]

Page 34: Gene Network Modeling

GRN Modeling Steps: Discretization

[Figure: expression of genes 1, 2 and 3 over time, 0 to 60 min]

Assume that genes exist in two states, on and off: if the expression of gene i is above its threshold level, consider it on; otherwise, consider it off.

Page 35: Gene Network Modeling

GRN Modeling Steps: Discretization

Page 36: Gene Network Modeling

GRN Modeling Steps: Discretization

[Figure: time points of an expression curve labeled on or off against its threshold]

Page 37: Gene Network Modeling

GRN Modeling Steps: Discretization

[Figure: expression of genes 1, 2 and 3 over time (0 to 60 min), each with its own threshold]

Page 38: Gene Network Modeling

GRN Modeling Steps: Discretization

[Figure: every time point of genes 1, 2 and 3 labeled on or off against the gene's threshold]

Page 39: Gene Network Modeling

GRN Modeling Steps: Discretization

We obtain the following discretized gene expression data:

time (min)  0  5 10 15 20 25 30 35 40 45 50 55
gene 1      0  0  0  0  0  0  1  1  1  1  1  1
gene 2      0  0  0  0  0  0  0  1  1  0  0  0
gene 3      1  1  1  1  1  1  1  0  0  0  0  0

The gene expression data is now in the form of bit streams.
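The on/off rule is a per-gene threshold test. The sketch below applies it to hypothetical expression values sampled every 5 min; both the values and the thresholds are invented for illustration and were chosen so that the output matches the bit streams on the slide.

```python
def binarize(profile, threshold):
    """1 (on) where expression exceeds the gene's threshold, otherwise 0 (off)."""
    return "".join("1" if value > threshold else "0" for value in profile)


# Hypothetical expression levels at t = 0, 5, ..., 55 min and hypothetical per-gene thresholds
profiles = {
    "gene 1": ([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 1.2, 1.5, 1.6, 1.7, 1.8, 1.8], 1.0),
    "gene 2": ([0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 1.1, 1.0, 0.6, 0.4, 0.3], 0.9),
    "gene 3": ([2.0, 1.9, 1.9, 1.8, 1.7, 1.6, 1.5, 0.9, 0.7, 0.5, 0.4, 0.3], 1.2),
}

bits = {gene: binarize(values, threshold) for gene, (values, threshold) in profiles.items()}
for gene, stream in bits.items():
    print(gene, stream)
```

Choosing the threshold is the real modeling decision; a value placed in the middle of a gene's range is only one simple option.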

Page 40: Gene Network Modeling

GRN Modeling Steps: Discretization

Alternatively, assume that genes exist in three states:

• Unchanged: 0
• Up-regulated: 1
• Down-regulated: -1
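The slides do not say how the three states are assigned. One common choice, assumed here, is a symmetric fold-change cutoff on the ratio of treatment to control expression; the 2-fold default and the example ratios are hypothetical. The pipeline overview from Pal (2007) uses hypothesis testing at 99% instead, which would replace the fixed cutoff with a statistical decision.

```python
import math


def ternarize(ratio, fold=2.0):
    """Map an expression ratio (treatment / control) to -1, 0 or +1 with a symmetric fold-change cutoff."""
    log_ratio = math.log2(ratio)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return 1    # up-regulated
    if log_ratio <= -cutoff:
        return -1   # down-regulated
    return 0        # unchanged


ratios = [4.0, 1.1, 0.25, 2.0, 0.6]   # hypothetical ratios for five genes
print([ternarize(r) for r in ratios])
```

Working on the log scale makes the rule symmetric: a 4-fold increase and a 4-fold decrease are equally far from "unchanged".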

Page 41: Gene Network Modeling

GRN Modeling Steps: Gene Selection (Clustering)

      a1  a2  a3  a4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

Page 42: Gene Network Modeling

Clustering Steps: Correlation

Choose a similarity metric to compare the transcriptional responses or expression profiles:

• Pearson correlation
• Spearman correlation
• Euclidean distance
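The three metrics can be written in a few lines of plain Python. This is a sketch rather than a library-grade implementation: Spearman correlation is computed as the Pearson correlation of ranks, with tied values given their average rank, and the example profiles are rows g1 to g4 of the matrix above.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


g1, g2, g3, g4 = [0, 1, 2, 1], [1, 2, 1, 0], [0, 1, 1, 1], [1, 2, 1, 0]
print(pearson(g1, g3), spearman(g1, g3), euclidean(g1, g3))
print(pearson(g2, g4), euclidean(g2, g4))
```

Correlations are similarities (larger means more alike), while the Euclidean distance is a dissimilarity (smaller means more alike), so a clustering step must know which kind of metric it is given.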

Page 43: Gene Network Modeling

Clustering Steps: Correlation Algorithm

      g1     g2     g3     g4     g5
g1  1.00   0.23   0.00   0.95  -0.63
g2         1.00   0.91   0.56   0.56
g3                1.00   0.32   0.77
g4                       1.00  -0.36
g5                              1.00

Correlation coefficients range from –1 to 1: 1 indicates similar behavior, –1 opposite behavior, and 0 no linear relationship.

Page 44: Gene Network Modeling

Clustering Steps: Clustering Algorithm

Choose a clustering algorithm:

• Hierarchical
• K-means
• …

Page 45: Gene Network Modeling

Hierarchical Clustering

      g1     g2     g3     g4     g5
g1           0.23   0.00   0.95  -0.63
g2                  0.91   0.56   0.56
g3                         0.32   0.77
g4                               -0.36
g5

[Figure: dendrogram joining g1 and g4]

• Find the largest value in the similarity matrix.
• Join the corresponding clusters together.
• Recompute the matrix and iterate.

Page 46: Gene Network Modeling

Hierarchical Clustering

        g1,g4   g2     g3     g5
g1,g4           0.37   0.16  -0.52
g2                     0.91   0.56
g3                            0.77
g5

[Figure: dendrogram joining g1 with g4, and g2 with g3]

• Find the largest value in the similarity matrix.
• Join the corresponding clusters together.
• Recompute the matrix and iterate.

Page 47: Gene Network Modeling

Hierarchical Clustering

        g1,g4   g2,g3   g5
g1,g4            0.27  -0.52
g2,g3                   0.68
g5

[Figure: dendrogram over g1, g4, g2, g3 and g5]

• Find the largest value in the similarity matrix.
• Join the corresponding clusters together.
• Recompute the similarity matrix and iterate.
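The three steps can be run on the slide's similarity matrix. The slides do not name the rule used to recompute the matrix. Their recomputed entries (0.37, 0.16, -0.52, 0.27, 0.68) are close to, but not exactly, averages over the merged genes, so this sketch assumes average linkage. Its intermediate values therefore differ slightly from the slide's (for example 0.395 instead of 0.37), while the merge order is the same: (g1, g4), then (g2, g3), then g5 joins (g2, g3).

```python
from itertools import combinations

# Upper triangle of the similarity (correlation) matrix from the slide
SIM = {
    ("g1", "g2"): 0.23, ("g1", "g3"): 0.00, ("g1", "g4"): 0.95, ("g1", "g5"): -0.63,
    ("g2", "g3"): 0.91, ("g2", "g4"): 0.56, ("g2", "g5"): 0.56,
    ("g3", "g4"): 0.32, ("g3", "g5"): 0.77,
    ("g4", "g5"): -0.36,
}


def sim(a, b):
    """Look up a pairwise similarity regardless of argument order."""
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]


def average_link(c1, c2):
    """Average linkage (assumed): mean similarity over all cross-cluster gene pairs."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(genes):
    """Repeatedly join the most similar pair of clusters; return the merge history."""
    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: average_link(*pair))
        merges.append((c1, c2, average_link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = agglomerate(["g1", "g2", "g3", "g4", "g5"])
for left, right, s in merges:
    print(left, "+", right, "at", round(s, 3))
```

Single linkage (maximum similarity) or complete linkage (minimum similarity) could be swapped in by changing only `average_link`; they can produce different trees on the same matrix.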

Page 48: Gene Network Modeling

Clustering Example

Eisen et al. (1998), PNAS, 95(25): 14863-14868

Page 49: Gene Network Modeling

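GRN Modeling Steps: GRN Generation

      t1  t2  t3  t4
g1     0   1   2   1
g2     1   2   1   0
g3     0   1   1   1
g4     1   2   1   0

[Figure: a statistical signal processing technique turns the gene expression matrix into a gene network over g1 to g4 with activating (+) and inhibiting (-) edges, one of them uncertain (?)]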

Page 50: Gene Network Modeling

Page 51: Gene Network Modeling

GRN Models

• Directed and undirected graphs
• Bayesian networks
• Boolean networks
• Generalized logical networks
• Non-linear ordinary differential equations
• Piecewise-linear differential equations
• Qualitative differential equations
• Partial differential equations
• Stochastic master equations
• Rule-based formalisms

Page 52: Gene Network Modeling

GRN Models

Hidde de Jong, "Modeling and simulation of genetic regulatory systems: a literature review", J Comput Biol. 2002;9(1):67-103. Review.

[Figure: comparison of model classes by node states, computation, data complexity and dynamics]

Page 53: Gene Network Modeling

What class of models should be chosen?

The selection should be made in view of the data requirements and the goals of modeling and analysis.

Page 54: Gene Network Modeling

Classical Tradeoff

• A “fine” model with many parameters may be able to capture detailed “low-level” phenomena (protein concentrations, reaction kinetics), but requires very large amounts of data for inference, lest the model be “overfit”.
• A “coarse” model with lower complexity may succeed in capturing “high-level” phenomena (which genes are ON/OFF) and requires smaller amounts of data.

Page 55: Gene Network Modeling

Occam’s Razor

Page 56: Gene Network Modeling

Model Reliability and Adequacy

• P is the set of all possible observations.
• S is the set of all observations made on the study system.
• M is the set of all model outputs.
• Q = S ∩ M

[Figure: Venn diagram of S, M and their intersection Q inside P]

Page 57: Gene Network Modeling

Model Reliability and Adequacy

[Figure: Venn diagrams of S and M inside P for a "useless model" and a "dream model"]

Page 58: Gene Network Modeling

Model Reliability and Adequacy

[Figure: Venn diagrams of S, M and Q inside P for an "incomplete model" and a "complete, but erring model"]

Model reliability: |Q| / |M|
Model adequacy: |Q| / |S|
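Both ratios are set computations on Q = S ∩ M. In the sketch below, the observation labels are hypothetical, and the two example models are our reading of the figure: an "incomplete" model whose outputs all lie inside S, and a "complete, but erring" model whose outputs contain all of S plus wrong predictions.

```python
def reliability(S, M):
    """|Q| / |M|: fraction of model outputs that are actually observed (M must be non-empty)."""
    return len(S & M) / len(M)


def adequacy(S, M):
    """|Q| / |S|: fraction of the observations that the model reproduces (S must be non-empty)."""
    return len(S & M) / len(S)


S = {"o1", "o2", "o3", "o4"}             # hypothetical observations of the study system
incomplete = {"o1", "o2"}                # M inside S: everything predicted is observed
erring = S | {"o5", "o6", "o7", "o8"}    # S inside M: everything observed is predicted, plus errors

print(reliability(S, incomplete), adequacy(S, incomplete))
print(reliability(S, erring), adequacy(S, erring))
```

The incomplete model scores reliability 1.0 and adequacy 0.5; the erring model scores the reverse. Only a model with M = S (the "dream model") scores 1.0 on both.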

Page 59: Gene Network Modeling

Page 60: Gene Network Modeling

Directed and Undirected Graphs

• Probably the most straightforward way to model a GRN: G = <V, E>
• V is the set of vertices (genes).
• E is the set of edges <i, j>, where i, j ∈ V are the endpoints of the edge (from i to j).
• Additional labels denote positive/negative influence.

Page 61: Gene Network Modeling

Directed and Undirected Graphs

Advantages:
• Intuitive way of visualization
• Common, well-explored graph algorithms can make biologically relevant predictions about genetic regulatory systems:
  - paths between genes may reveal missing regulatory interactions or provide clues about redundancy
  - cycles in the network point at feedback relations
  - connectivity characteristics give an indication of the complexity
  - loosely connected subgraphs point at functional modules

Disadvantages:
• Time does not play a role
• Too much abstraction: a very simplified model, far from reality
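As an illustration of "cycles in the network point at feedback relations", this sketch stores a hypothetical signed graph (genes A to D and their edges are invented for the example, not taken from the slides), enumerates its simple cycles by depth-first search, and classifies each cycle as positive or negative feedback by multiplying its edge signs. The exhaustive search is exponential in the worst case, so it is meant only for small networks like this one.

```python
# Hypothetical signed regulatory graph: (regulator, target) -> +1 activation, -1 repression
EDGES = {("A", "B"): +1, ("B", "C"): -1, ("C", "A"): -1, ("C", "D"): +1}


def successors(edges):
    """Adjacency lists built from the edge keys; every node gets an entry."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    return succ


def simple_cycles(edges):
    """Elementary cycles, each reported once, starting from its smallest node."""
    succ = successors(edges)
    cycles = []

    def dfs(start, node, path):
        for nxt in succ[node]:
            if nxt == start:
                cycles.append(path[:])
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])

    for s in sorted(succ):
        dfs(s, s, [s])
    return cycles


def loop_sign(cycle, edges):
    """Product of edge signs around the cycle: +1 positive feedback, -1 negative feedback."""
    sign = 1
    for u, v in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= edges[(u, v)]
    return sign


for cycle in simple_cycles(EDGES):
    kind = "positive" if loop_sign(cycle, EDGES) > 0 else "negative"
    print(" -> ".join(cycle), kind, "feedback")
```

Here the loop A → B → C → A contains two repressions, so its overall sign is positive; the edge C → D does not lie on any cycle.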

Page 62: Gene Network Modeling

GRN Models

• Directed and undirected graphs
• Bayesian networks
• Boolean networks
• Generalized logical networks
• Non-linear ordinary differential equations
• Piecewise-linear differential equations
• Qualitative differential equations
• Partial differential equations
• Stochastic master equations
• Rule-based formalisms

More popular and efficient

Page 63: Gene Network Modeling

Boolean Network Model

• A Boolean network is defined by a set of nodes, V = {x1, x2, . . . , xn}, and a list of Boolean functions, F = {f1, f2, . . . , fn}.
• Each xk represents the state (expression) of a gene gk: xk = 1 means the gene is expressed, and xk = 0 means it is not expressed.

Page 64: Gene Network Modeling

Boolean Network

At any given time, combining the gene states gives a gene activity pattern (GAP).

t     0  1  2  3  4
x1    1  1  0  1  1
x2    1  0  0  0  1
x3    1  0  1  1  1

Each column is a GAP.

Page 65: Gene Network Modeling

Boolean Network

t     0  1  2  3  4
x1    1  1  0  1  1
x2    1  0  0  0  0
x3    1  0  1  1  0

• Given a GAP at time t, a deterministic function (a set of logical rules) provides the GAP at time t + 1: GAP(t) → GAP(t+1).

Page 66: Gene Network Modeling

Boolean Network

Page 67: Gene Network Modeling

Boolean Network Example

Page 68: Gene Network Modeling

Boolean Network

Page 69: Gene Network Modeling

Boolean Network Example

[Figure: wiring from x1, x2 and x3 at time t to x1 at time t+1]

Page 70: Gene Network Modeling

Boolean Network Example

[Figure: the same wiring, with x1 at time t+1 computed by an OR gate]

Page 71: Gene Network Modeling

Boolean Network Example

Page 72: Gene Network Modeling

Boolean Network Example

Page 73: Gene Network Modeling

Boolean Network Example

[Figure: the same wiring, with x1 at time t+1 computed by an OR gate]

For a node with k inputs there are 2^(2^k) possible Boolean functions.
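To make the count 2^(2^k) concrete, this sketch enumerates every Boolean function of k inputs as a truth table and keeps the ones consistent with the observed transitions of x1. It assumes, from our reading of the figure residue, that x1(t+1) depends on x2(t) and x3(t). Only three of the four input combinations ever occur in the table, so two functions fit: OR, and the function that simply copies x3. This shows how few time points leave the inverse problem underdetermined.

```python
from itertools import product


def all_boolean_functions(k):
    """Every Boolean function of k inputs, as a tuple of 2**k output bits (one per input row)."""
    return list(product((0, 1), repeat=2 ** k))


def row_index(inputs):
    """Truth-table row of an input tuple, reading the bits as a binary number."""
    index = 0
    for bit in inputs:
        index = index * 2 + bit
    return index


def consistent(table, samples):
    """True if the truth table reproduces every observed (inputs, output) pair."""
    return all(table[row_index(inputs)] == output for inputs, output in samples)


# Rows of the GAP table, t = 0..4
X1 = [1, 1, 0, 1, 1]
X2 = [1, 0, 0, 0, 0]
X3 = [1, 0, 1, 1, 0]

# Assumed wiring: x1(t+1) = f(x2(t), x3(t))
samples = [((X2[t], X3[t]), X1[t + 1]) for t in range(4)]
fits = [f for f in all_boolean_functions(2) if consistent(f, samples)]
print(len(all_boolean_functions(2)), "candidate functions,", len(fits), "consistent:", fits)
```

Truth tables are listed in the row order (x2, x3) = 00, 01, 10, 11, so (0, 1, 1, 1) is OR and (0, 1, 0, 1) copies x3. The number of candidates grows very fast: 16 for k = 2, 256 for k = 3, 65,536 for k = 4.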

Page 74: Gene Network Modeling

Boolean Network Example

[Figure: wiring from time t to time t+1 for x1, x2 and x3, using OR, NOR and NAND gates]
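A minimal sketch of the synchronous update GAP(t) → GAP(t+1). The gate assignment is our reading of the figure residue and is not confirmed by the slides: x1(t+1) = x2(t) OR x3(t), x2(t+1) = NOR(x1(t), x3(t)), x3(t+1) = NAND(x1(t), x3(t)). We checked by hand that it reproduces every transition in the t = 0..4 table of the earlier Boolean network slides. The `attractor` helper is our own addition: it iterates from a start state until a state repeats and returns the cycle that the network settles into.

```python
def step(state):
    """One synchronous update of all three genes (assumed rules, see lead-in)."""
    x1, x2, x3 = state
    return (
        int(x2 or x3),          # x1(t+1) = x2 OR x3
        int(not (x1 or x3)),    # x2(t+1) = NOR(x1, x3)
        int(not (x1 and x3)),   # x3(t+1) = NAND(x1, x3)
    )


def simulate(state, steps):
    """Sequence of GAPs (x1, x2, x3) starting from the given state."""
    gaps = [state]
    for _ in range(steps):
        gaps.append(step(gaps[-1]))
    return gaps


def attractor(state):
    """Iterate until a state repeats; return the repeating cycle of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


for t, gap in enumerate(simulate((1, 1, 1), 4)):
    print(t, gap)
print("attractor:", attractor((1, 1, 1)))
```

Because the state space of n Boolean genes is finite (2^n states), every trajectory must eventually enter such a cycle or a fixed point; from (1, 1, 1) this network falls into a three-state cycle.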

Page 75: Gene Network Modeling

Boolean Network Example

I. Shmulevich et al., Bioinformatics (2002), 18 (2): 261-274

[Figure: example network built from AND, NOT and NAND gates]

Page 76: Gene Network Modeling

Boolean Networks: Summary

Advantages:
• Efficient analysis of large regulatory networks (RNs)
• Positive/negative feedback cycles can be modeled with BNs

Disadvantages:
• Strong simplifying assumptions: a gene is either on or off, with no intermediate states
• Constructing (inferring) large-scale gene networks takes a very long computation time and is often impractical
• Very susceptible to noise
• There are situations where the Boolean idealization is not appropriate, and more general methods are required

Page 77: Gene Network Modeling

Bayesian Networks

A gene regulatory network is represented by a directed acyclic graph:
• Vertices correspond to genes.
• Edges correspond to direct influence or interaction.

For each gene xi, a conditional distribution p(xi | parents(xi)) is defined.

The graph and the conditional distributions uniquely specify the joint probability distribution.

Page 78: Gene Network Modeling

Bayesian Network Example

[Figure: directed acyclic graph with edges x2 → x3, x1 → x4, x2 → x4 and x4 → x5]

Conditional distributions: p(x1), p(x2), p(x3 | x2), p(x4 | x1, x2), p(x5 | x4)

p(X) = p(x1) p(x2) p(x3 | x2) p(x4 | x1, x2) p(x5 | x4)
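The factorization can be evaluated directly once each conditional distribution is given as a table. In this sketch the genes are binary (1 = expressed) and every probability value is hypothetical; summing the joint over all 2^5 assignments returns 1, and summing out the other genes gives marginals such as P(x5 = 1).

```python
from itertools import product

# Hypothetical conditional probability tables: probability that a gene is expressed (= 1)
P_X1 = 0.3
P_X2 = 0.6
P_X3 = {0: 0.1, 1: 0.8}                                        # P(x3=1 | x2)
P_X4 = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}   # P(x4=1 | x1, x2)
P_X5 = {0: 0.2, 1: 0.7}                                        # P(x5=1 | x4)


def bern(p_one, value):
    """Probability of a binary value given the probability of 1."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x1, x2, x3, x4, x5):
    """p(X) = p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4)"""
    return (bern(P_X1, x1) * bern(P_X2, x2) * bern(P_X3[x2], x3)
            * bern(P_X4[(x1, x2)], x4) * bern(P_X5[x4], x5))


assignments = list(product((0, 1), repeat=5))
total = sum(joint(*x) for x in assignments)
p_x5 = sum(joint(*x) for x in assignments if x[4] == 1)
print(total, p_x5, joint(1, 1, 1, 1, 1))
```

The factorized form needs 1 + 1 + 2 + 4 + 2 = 10 numbers here, against 2^5 - 1 = 31 for an unrestricted joint distribution over five binary genes; that saving is what the graph structure buys.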

Page 79: Gene Network Modeling

Learning Bayesian Models

Using gene expression data, the goal is to find the Bayesian network that best matches the data.

• Recovering the optimal conditional probability distributions when the graph is known is “easy”.
• Recovering the structure of the graph is NP-hard (NP: nondeterministic polynomial time).
• Nevertheless, useful statistics are available: What is the likelihood of a specific assignment? What is the distribution of xi given xj? …
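For discretized data and a known graph, the "easy" part reduces to counting. This sketch computes the maximum-likelihood estimate of P(child = 1 | parents) as a relative frequency for each observed parent configuration; the six samples are hypothetical.

```python
from collections import Counter


def mle_cpt(samples, child, parents):
    """Maximum-likelihood estimate of P(child = 1 | parents) by counting, per parent configuration."""
    seen = Counter()      # how often each parent configuration occurs
    expressed = Counter() # how often the child is 1 in that configuration
    for sample in samples:
        key = tuple(sample[p] for p in parents)
        seen[key] += 1
        expressed[key] += sample[child]
    return {key: expressed[key] / seen[key] for key in seen}


# Hypothetical binary observations of x1, x2 and x4
samples = [
    {"x1": 0, "x2": 0, "x4": 0},
    {"x1": 0, "x2": 0, "x4": 0},
    {"x1": 0, "x2": 1, "x4": 1},
    {"x1": 0, "x2": 1, "x4": 0},
    {"x1": 1, "x2": 1, "x4": 1},
    {"x1": 1, "x2": 1, "x4": 1},
]
print(mle_cpt(samples, "x4", ["x1", "x2"]))
```

The configuration (x1, x2) = (1, 0) never occurs, so it gets no estimate at all. Each added parent doubles the number of configurations to cover, which is one concrete reason why Bayesian models require lots of data.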

Page 80: Gene Network Modeling

Issues with Bayesian Models

• Computationally intensive.
• Requires lots of data.
• Does not allow for feedback loops, which are known to play an important role in biological networks.
• Does not make use of the temporal aspect of the data.
• Dynamic Bayesian networks aim at solving some of these issues, but they require even more data.

Page 81: Gene Network Modeling

Differential Equations

Linear differential equations are typically used to model the gene trajectories (genes i = 1, …, m):

dxi(t) / dt = ai,0 + ai,1 x1(t) + ai,2 x2(t) + … + ai,m xm(t)

Several reasons for that choice:
• A lower number of parameters means we are less likely to overfit the data.
• The model is still sufficient to capture complex interactions between the genes.

Page 82: Gene Network Modeling

Small Network Example

dx1(t) / dt = 0.491 - 0.248 x1(t)

dx2(t) / dt = -0.473 x3(t) + 0.374 x4(t)

dx3(t) / dt = -0.427 + 0.376 x1(t) - 0.241 x3(t)

dx4(t) / dt = 0.435 x1(t) - 0.315 x3(t) - 0.437 x4(t)

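[Figure: network of x1 to x4; activating (+) edges x1 → x3, x1 → x4 and x4 → x2; inhibiting (-) edges x3 → x2 and x3 → x4, plus negative self-regulation of x1, x3 and x4]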

Page 83: Gene Network Modeling

Small Network Example

[Figure: the equations and network above, with one interaction coefficient highlighted]

Page 84: Gene Network Modeling

Small Network Example

[Figure: the equations and network above, with the constant coefficients highlighted]

Page 85: Gene Network Modeling

Problem Revisited

      ai,0    ai,1    ai,2    ai,3    ai,4
x1    0.491  -0.248   0       0       0
x2    0       0       0      -0.473   0.374
x3   -0.427   0.376   0      -0.241   0
x4    0       0.435   0      -0.315  -0.437

Given the time-series data, can we find the interaction coefficients?
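One standard way to approach this question, sketched under our own assumptions: simulate the slide's four equations with forward Euler from several hypothetical initial conditions (standing in for separate time-series experiments), approximate dxi/dt by finite differences, and fit each row of the table by ordinary least squares through the normal equations. With noise-free Euler data the fit recovers the table up to rounding. Real microarray series are noisy and short, and that is where the parameter count and overfitting become the problem.

```python
# Coefficients from the table: row i = [a_i0, a_i1, a_i2, a_i3, a_i4]
TRUE = [
    [0.491, -0.248, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -0.473, 0.374],
    [-0.427, 0.376, 0.0, -0.241, 0.0],
    [0.0, 0.435, 0.0, -0.315, -0.437],
]


def derivative(coef, x):
    """dx_i/dt = a_i0 + sum_j a_ij x_j for every gene i."""
    return [row[0] + sum(a * xj for a, xj in zip(row[1:], x)) for row in coef]


def euler(coef, x0, dt, steps):
    """Forward-Euler trajectory [x(0), x(dt), ..., x(steps * dt)]."""
    xs = [[float(v) for v in x0]]
    for _ in range(steps):
        x = xs[-1]
        xs.append([xi + dt * di for xi, di in zip(x, derivative(coef, x))])
    return xs


def solve(mat, rhs):
    """Solve mat @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def fit(trajectories, dt):
    """Least-squares estimate of [a_i0, ..., a_im] for each gene i."""
    m = len(trajectories[0][0])
    rows, targets = [], [[] for _ in range(m)]
    for xs in trajectories:
        for x, x_next in zip(xs, xs[1:]):
            rows.append([1.0] + x)                               # constant term, then x_1..x_m
            for i in range(m):
                targets[i].append((x_next[i] - x[i]) / dt)       # finite-difference slope
    p = m + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    return [solve(xtx, [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)])
            for ys in targets]


dt = 0.1
starts = [[0, 0, 0, 0], [2, 1, 0, 1], [1, 0, 2, 0], [0, 2, 1, 3], [3, 1, 1, 0]]  # hypothetical
estimate = fit([euler(TRUE, s, dt, 30) for s in starts], dt)
for name, row in zip(["x1", "x2", "x3", "x4"], estimate):
    print(name, [round(a, 3) for a in row])
```

Several distinct starting points are used because a single trajectory of this system contains two nearly equal decay rates (0.248 and 0.241), which makes a one-series fit poorly conditioned.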

Page 86: Gene Network Modeling

Issues with Differential Equations

• Even under the simplest linear model, with m genes there are m(m+1) unknown parameters to estimate:
  - m(m-1) directional effects
  - m self effects
  - m constant effects
• The number of data points is mn, where n is the number of time points, and we typically have n << m (few time points).
• To avoid overfitting, extra constraints must be incorporated into the model, such as:
  - smoothness of the equations
  - sparseness of the network (few non-zero interaction coefficients)
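The parameter count can be checked term by term. The worked numbers use the four-gene example from the earlier slides and a hypothetical experiment with m = 100 genes and n = 10 time points (illustrative values, not from the slides).

```latex
\begin{align*}
\underbrace{m(m-1)}_{\text{directional}} + \underbrace{m}_{\text{self}} + \underbrace{m}_{\text{constant}}
  &= m^2 + m = m(m+1) \\
m = 4:\quad m(m+1) &= 4 \cdot 5 = 20 \quad \text{(the 20 entries of the coefficient table)} \\
m = 100,\ n = 10:\quad m(m+1) &= 10\,100 \;\gg\; mn = 1\,000
\end{align*}
```

With ten times more unknowns than data points, infinitely many coefficient sets fit the data exactly, which is why constraints such as sparseness are needed to single one out.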

Page 87: Gene Network Modeling

Page 88: Gene Network Modeling

GRN Software

GNA: Genetic Network Analyzer

Helix Bioinformatics

http://www-helix.inrialpes.fr/article122.html

Page 89: Gene Network Modeling

GRN Software

Probabilistic Boolean Networks (PBN) MATLAB Toolbox, Ilya Shmulevich, Institute for Systems Biology

Page 90: Gene Network Modeling

Page 91: Gene Network Modeling

Future Work: Literature Review

• Study the noisy nature of microarray data.
• Study the existing modeling methodologies in depth.
• Focus on a specialized problem such as cancer.

Page 92: Gene Network Modeling

Future Work: GSP Statistics Books

• Genomic Signal Processing and Statistics, Edward, 2006
• Introduction to Genomic Signal Processing with Control, Ily, 2006
• Computational and Statistical Approaches to Genomics (Springer, 2006), Ily

Page 93: Gene Network Modeling

Future Work: Statistics Books

• Handbook of Computational Statistics
• An Introduction to Statistical Signal Processing, Robert M. Gray, 2007
• Fundamentals of Statistical Signal Processing: Estimation Theory, Steven Kay
• Nonlinear Signal Processing: A Statistical Approach, Gonzalo R., 2005
• Inference in Hidden Markov Models, Olivier Cappé, 2005

Page 94: Gene Network Modeling

Future Work: Modeling Books

• Modeling and Control of Complex Systems (Control Engineering), by Petros A. Ioannou, Andreas Pitsillides, 2008
• Modeling Biological Systems: Principles and Applications, 2005
• Gene Regulation and Metabolism: Postgenomic Computational Approaches, Julio, 2000

Page 95: Gene Network Modeling

Future Work: Resources

• IEEE Transactions on Computational Biology and Bioinformatics
• IEEE International Workshop on Genomic Signal Processing and Statistics
• IEEE Journal of Selected Topics in Signal Processing: Special Issue on Genomic and Proteomic Signal Processing
• EURASIP Journal of Bioinformatics and Systems Biology: Special issue on Genetic Regulatory Networks
• IEEE Signal Processing Magazine: Special issue on Signal Processing Methods in Genomics and Proteomics
• IEEE Transactions on Signal Processing: Special Genomic Signal Processing issue
• Workshop on Discrete Models for Genetic Regulatory Networks

Page 96: Gene Network Modeling

Page 97: Gene Network Modeling

References

• de Jong, H. (2002). Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 9(1): 67-103. Review.
• Pal, R., A. Datta and E. R. Dougherty. Bayesian robustness in the control of gene regulatory networks.
• Anastassiou, D. (2001). Genomic Signal Processing. IEEE Signal Processing
• Dougherty, E. R. and A. Datta (2005). Genomic signal processing: diagnosis and therapy. IEEE Signal Processing Magazine 22(1): 107-112.
• Vaidyanathan, P. P. (2004). Genomics and Proteomics: A Signal Processor's Tour. IEEE Circuits and Systems Magazine 4: 1-1.

Page 98: Gene Network Modeling

References

• Vaidyanathan, P. P. and B.-J. Yoon (2004). The role of signal-processing concepts in genomics and proteomics. Journal of the Franklin Institute (Special Issue on Genomics).

Page 99: Gene Network Modeling