Contact Sambit Samal (sambits) for additional information on Benchmarks.
Areejit Samal Randomizing metabolic networks
-
Upload
areejit-samal -
Category
Education
-
view
654 -
download
7
Transcript of Areejit Samal Randomizing metabolic networks
Areejit Samal
Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and UnivParis-Sud, Orsay, France
andMax Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Areejit Samal
Large-scale structure of metabolic networks: Role of Biochemical and Functional
constraints
Areejit Samal
Outline
Architectural features of metabolic networks
Issues with existing null models for metabolic networks
Our method to generate realistic randomized metabolic networks using Markov Chain Monte Carlo (MCMC) sampling and Flux Balance Analysis (FBA)
Biological function is a main driver of large-scale structure in metabolic networks
Areejit Samal
Architectural features of real-world networks
Small Average Path Length & High Local Clustering
Watts and Strogatz, Nature (1998)
Small-world
Power law degree distribution
Barabasi and Albert, Science (1999)
Scale-free
Scale-free graphs can be generated using a simple model incorporating: Growth & Preferential attachment scheme
Areejit Samal
Architectural features of real-world networks
Small Average Path Length & High Local Clustering
Watts and Strogatz, Nature (1998)
Small-world
Power law degree distribution
Barabasi and Albert, Science (1999)
Scale-free
Scale-free graphs can be generated using a simple model incorporating: Growth & Preferential attachment scheme
Areejit Samal
Large-scale structure of metabolic networks
Ma and Zeng, Bioinformatics (2003)
Wagner & Fell (2001) Jeong et al (2000) Ravasz et al (2002)
Small-world, Scale-free and Hierarchical Organization
Bow-tie architecture
Bow-tie architecture of the metabolism is similar to that found for the WWW by Broder et al.
Small Average Path Length, High Local Clustering, Power-law degree distribution
Directed graph with a giant component of strongly connected nodes along with associated IN and OUT component
Areejit Samal
Large-scale structure of metabolic networks
Ma and Zeng, Bioinformatics (2003)
Wagner & Fell (2001) Jeong et al (2000) Ravasz et al (2002)
Small-world, Scale-free and Hierarchical Organization
Bow-tie architecture
Bow-tie architecture of the metabolism is similar to that found for the WWW by Broder et al.
Small Average Path Length, High Local Clustering, Power-law degree distribution
Directed graph with a giant component of strongly connected nodes along with associated IN and OUT component
Areejit Samal
Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004)
Network motifs and Evolutionary design
Sub-graphs that are over-represented in real networks compared to randomized networks with same degree sequence.
Certain network motifs were shown to perform important information processing tasks.
For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals.
Areejit Samal
Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004)
Network motifs and Evolutionary design
Sub-graphs that are over-represented in real networks compared to randomized networks with same degree sequence.
Certain network motifs were shown to perform important information processing tasks.
For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals.
Areejit Samal
Chosen Property
0.60 0.62 0.64 0.66 0.68 0.70
Freq
uenc
y
0
50
100
150
200
250
300
“Null hypothesis” based approach for identifying design principles in real-world networks
Statistically significant network properties are deduced as follows:
1) Measure the ‘chosen property’ (E.g., motif significance profile, average clustering coefficient, etc.) in the investigated real network.
2) Generate randomized networks with structure similar to the investigated real network using an appropriate null model.
3) Use the distribution of the ‘chosen property’ for the randomized networks to estimate a p-value.
Investigated network
p-value
Areejit Samal
Edge-exchange algorithm: commonly-used null model
Randomized networks preserve the degree at
each node and the degree sequence.
Investigated Network
After 1 exchange
After 2 exchanges
Reference: Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004); Maslov & Sneppen Science (2002)
Areejit Samal
Artzy-Randrup et al Science (2004)
Carefully posed null models are required for deducing design principles with confidence
C. elegans Neuronal Network: Close by neurons have a greater chance of forming a connection than distant neurons
A null model incorporating this spatial constraint gives different results compared to generic edge-exchange algorithm.
Areejit Samal
Artzy-Randrup et al Science (2004)
Carefully posed null models are required for deducing design principles with confidence
C. elegans Neuronal Network: Close by neurons have a greater chance of forming a connection than distant neurons
A null model incorporating this spatial constraint gives different results compared to generic edge-exchange algorithm.
Areejit Samal
Edge-exchange algorithm: case of bipartite metabolic networks
ASPT CITL
fum
asp-L cit
ac oaanh4
ASPT: asp-L → fum + nh4CITL: cit → oaa + ac
Areejit Samal
Edge-exchange algorithm: case of bipartite metabolic networks
ASPT CITL
fum
asp-L cit
ac oaanh4
ASPT* CITL*
fum
asp-L cit
ac oaanh4
ASPT: asp-L → fum + nh4CITL: cit → oaa + ac
ASPT*: asp-L → ac + nh4 CITL*: cit → oaa + fum
Preserves degree of each node in the networkBut generates fictitious reactions that violatemass, charge and atomic balance satisfied by real chemical reactions!!
Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.
Null-model used in many studies including: Guimera & Amaral Nature (2005); Samal et al BMC Bioinformatics (2006)
Areejit Samal
Edge-exchange algorithm: case of bipartite metabolic networks
ASPT CITL
fum
asp-L cit
ac oaanh4
ASPT* CITL*
fum
asp-L cit
ac oaanh4
ASPT: asp-L → fum + nh4CITL: cit → oaa + ac
ASPT*: asp-L → ac + nh4 CITL*: cit → oaa + fum
Preserves degree of each node in the networkBut generates fictitious reactions that violatemass, charge and atomic balance satisfied by real chemical reactions!!
Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.
Null-model used in many studies including: Guimera & Amaral Nature (2005); Samal et al BMC Bioinformatics (2006)
Areejit Samal
Edge-exchange algorithm: case of bipartite metabolic networks
ASPT CITL
fum
asp-L cit
ac oaanh4
ASPT* CITL*
fum
asp-L cit
ac oaanh4
ASPT: asp-L → fum + nh4CITL: cit → oaa + ac
ASPT*: asp-L → ac + nh4 CITL*: cit → oaa + fum
Preserves degree of each node in the networkBut generates fictitious reactions that violatemass, charge and atomic balance satisfied by real chemical reactions!!
Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.
Null-model used in many studies including: Guimera & Amaral Nature (2005); Samal et al BMC Bioinformatics (2006)
A common generic null model (edge exchange algorithm) is unsuitable for different types of real-world networks.
Areejit Samal
Null models for metabolic networks: Blind-watchmaker network
Areejit Samal
Null models for metabolic networks: Blind-watchmaker network
Areejit Samal
Randomizing metabolic networks
We propose a new method using Markov Chain Monte Carlo (MCMC) sampling and Flux Balance Analysis (FBA) to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints.
Ensemble RGiven no. of valid
biochemical reactions
Ensemble RmFix the no. of metabolites
Ensemble uRmExclude Blocked Reactions
Ensemble uRm-V1Functional constraint of
growth on specified environments
Increasing level of constraints
Constraint I
Constraint II
Constraint III
Constraint IV
+
+
+
Areejit Samal
Comparison with E. coli: Large-scale structural properties of interest
• Metabolite Degree distribution
• Clustering Coefficient
• Average Path Length
• Pc: Probability that a path exists between two nodes in the directed graph
• Largest strongly component (LSC) and the union of LSC, IN and OUT components
E. coli metabolic network has m (~750) metabolites and r (~900) reactions.
In this work, we will generate randomized metabolic networks using only reactions in KEGG and compare the structural properties of randomized networks with those of E. coli.
We study the following large-scale structural properties:
Bow-tie architecture of the Internet and metabolism
Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)
Scale Free and Small World Ref: Jeong et al (2000) Wagner & Fell (2001)
HEX1
PGI
atp
adph
PFK
g6p
glc-D
f6p
fdp
glc-D
g6p
f6p
fdp
HEX1: atp + glc-D → adp + g6p + hPGI: g6p ↔ f6pPFK: atp + f6p → adp + fdp + h
CurrencyMetabolite
OtherMetabolite
ReversibleReaction
IrreversibleReaction
(a) Bipartite Representation (b) Unipartite Representation
Diff
eren
t Gra
ph-th
eore
tic R
epre
sent
atio
ns o
f the
M
etab
olic
Net
wor
k
HEX1
PGI
atp
adph
PFK
g6p
glc-D
f6p
fdp
glc-D
g6p
f6p
fdp
HEX1: atp + glc-D → adp + g6p + hPGI: g6p ↔ f6pPFK: atp + f6p → adp + fdp + h
CurrencyMetabolite
OtherMetabolite
ReversibleReaction
IrreversibleReaction
(a) Bipartite Representation (b) Unipartite Representation
Degree Distribution
Clustering coefficientAverage Path Length
Largest Strong Component
Diff
eren
t Gra
ph-th
eore
tic R
epre
sent
atio
ns o
f the
M
etab
olic
Net
wor
k
HEX1
PGI
atp
adph
PFK
g6p
glc-D
f6p
fdp
glc-D
g6p
f6p
fdp
HEX1: atp + glc-D → adp + g6p + hPGI: g6p ↔ f6pPFK: atp + f6p → adp + fdp + h
CurrencyMetabolite
OtherMetabolite
ReversibleReaction
IrreversibleReaction
(a) Bipartite Representation (b) Unipartite Representation
Degree Distribution
Clustering coefficientAverage Path Length
Largest Strong Component
Diff
eren
t Gra
ph-th
eore
tic R
epre
sent
atio
ns o
f the
M
etab
olic
Net
wor
k
The values obtained for different network measures depend on the list of currency metabolites used to generate the unipartite graph of metabolites starting from the bipartite graph. In this work, we have used a well defined biochemically sound list of currency metabolites to construct unipartite graph corresponding to each metabolic network.
Areejit Samal
Constraint I: Given number of valid biochemical reactions
Instead of generating fictitious reactions using edge exchange mechanism, we can restrict to >6000 valid reactions contained within KEGG database.
We use the database of 5870 mass balanced reactions derived from KEGG by Rodrigues and Wagner. For simplicity, I will refer to this database in the talk as ‘KEGG’
Areejit Samal
Global Reaction Set
+
Biomass composition
The E. coli biomass composition of iJR904 was used to determine viability of genotypes.
Nutrient metabolites
The set of external metabolites in iJR904were allowed for uptake/excretion in the global reaction set.
KEGG Database + E. coli iJR904
We can implement Flux Balance Analysis (FBA)
within this database.
Areejit Samal
Genome-scale metabolic models: Flux Balance Analysis (FBA)
List of metabolic reactions with stoichiometric coefficients
Biomass composition
Set of nutrients in theenvironment
Flux Balance Analysis (FBA)
Growth rate for the given medium‘Biomass Yield’
Fluxes of all reactions
AdvantagesFBA does not require enzyme kinetic information which is not known for most reactions.
DisadvantagesFBA cannot predict internal metabolite concentrations and is restricted to steady states.Basic models do not account for metabolic regulation.
Reference: Varma and Palsson, Biotechnology (1994); Price et al Nature Reviews Microbiology (2004)
Areejit Samal
Bit string representation of metabolic networks within the global reaction set
6000900C
KEGG Database E. coli
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
101....
The E. coli metabolic network or any random network of reactions within KEGG can be represented as a bit string of length N with exactly r entries equal to 1. r is the number of reactions in the E. coli metabolic network.
Present - 1Absent - 0
N = 5870 reactions within KEGG (or global reaction set)
Contains r reactions (1’s in the bit string)
Areejit Samal
Ensemble R: Random networks within KEGG with same number of reactions as E. coli
100....
To compare with E. coli, we generate an ensemble R of random networks as follows:
Pick random subsets with exactly r (~900) reactions within KEGG database of N(~6000) reactions. r is the no. of reactions in E. coli.
Huge no. of networks possible within KEGG with exactly r reactions ~
Fixing the no. of reactions is equivalent to fixing genome size.
6000900C
KEGG with N=5870
reactions
Random subsets or networks with r reactions
001....
111....
101....
Each random network or bit string contains exactly r reactions (1’s in the string) like in E. coli.
Areejit Samal
Metabolite degree distribution
k
1 10 100 1000
P(k)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coli
E. coli: = 2.17; Most organisms: is 2.1-2.3
Reference: Jeong et al (2000);
Wagner and Fell (2001)
A degree of a metabolite is the number of reactions in which the metabolite participates in the network.
Areejit Samal
Metabolite degree distribution
k
1 10 100 1000
P(k)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coli
k
1 10 100 1000
P(k)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coliRandom
E. coli: = 2.17; Most organisms: is 2.1-2.3
Reference: Jeong et al (2000);
Wagner and Fell (2001)
Random networks in ensemble R: = 2.51
A degree of a metabolite is the number of reactions in which the metabolite participates in the network.
Areejit Samal
Metabolite degree distribution
k
1 10 100 1000
P(k)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coli
Complete KEGG: = 2.31
k
1 10 100 1000
P(k)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coliRandom
E. coli: = 2.17; Most organisms: is 2.1-2.3
Reference: Jeong et al (2000);
Wagner and Fell (2001)
Random networks in ensemble R: = 2.51
Any (large enough) random subset of reactions in KEGG has Power Law degree distribution !!
k
1 10 100 1000
P(k)
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coliRandomKEGG
A degree of a metabolite is the number of reactions in which the metabolite participates in the network.
Areejit Samal
Reaction Degree Distribution
Number of substrates in a reaction
0 2 4 6 8 10 12
Freq
uenc
y
0
500
1000
1500
2000
2500
30000 Currency1 Currency2 Currency3 Currency4 Currency5 Currency6 Currency7 Currency
In each reaction, we have counted the number of currency and other metabolites which is shown in the figure. Most reactions involve 4 metabolites of which 2 are currency metabolites and 2 are other metabolites.
The degree of a reaction is the number of metabolites that participate in it.
The reaction degree distribution is bell shaped with typical reaction in the network involving 4 metabolites. The nature of reaction degree distribution is very different from the metabolite degree distribution which follows a power law.
R
atp
A
adp
B
Areejit Samal
Scale-free versus Scale-rich network: Origin of Power laws in metabolic networks
Tanaka and Doyle have suggested a classification of metabolites into three categories based on their biochemical roles (a) ‘Carriers’ (very high degree)(b) ‘Precursors’ (intermediate degree)(c) ‘Others’ (low degree)
It has been argued that gene duplication and divergence at the level of protein interaction networks can lead to observed power laws in metabolic networks. However, the presence of ubiquitous (high degree) ‘currency’ metabolites along with low degree ‘other’ metabolites in each reaction can explain the observed fat tail in the metabolite distribution.
Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)
R
atp
A
adp
B
Areejit Samal
Network Properties: Given number of valid biochemical reactions
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Network Properties: Given number of valid biochemical reactions
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Constraint II: Fix the number of metabolites
Direct sampling of randomized networks with same no. of reactions and metabolites as in E. coli is not possible.
Accept/Reject Criterion of reaction swap:Accept if the no. of metabolites in the new network is less than or equal to E. coli.
KEGG Database
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
10101..
N = 5870 reactions within KEGG reaction universe
11001..
10110..
11100..
00101..
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store storeReaction swap keeps the number of reactions in network constant
Erase memory of
initial network
Saving Frequency
Ensemble RM
Markov Chain Monte Carlo (MCMC) sampling of randomized networks
Areejit Samal
Network Properties: After fixing number of metabolites
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Network Properties: After fixing number of metabolites
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Constraint III: Exclude Blocked Reactions
KEGG
Unblocked KEGG
Large-scale metabolic networks typically have certain dead-end or blocked reactions which cannot contribute to growth of the organism.
Blocked reactions have zero flux for every investigated chemical environment under any steady-state condition and do not contribute to biomass growth flux.
We next exclude blocked reactions from our biochemical universe used for sampling randomized networks.
For Blocked reactions, see Burgard et al Genome Research (2004)
Areejit Samal
MCMC Sampling: Exclusion of blocked reactions
Accept/Reject Criterion of reaction swap:
Restrict the reaction swaps to the set of unblocked reactions within KEGG.
Accept if the no. of metabolites in the new network is less than or equal to E. coli.
KEGG Database
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
10101..
N = 5870 reactions within KEGG reaction universe
11001..
10110..
11100..
00101..
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store store
Erase memory of
initial network
Saving Frequency
Ensemble uRM
Areejit Samal
Network Properties: After excluding blocked reactions
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Network Properties: After excluding blocked reactions
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Constraint IV: Growth Phenotype
011....
001....
GA
GB
xDeletedreaction
FluxBalanceAnalysis
(FBA)
Viable genotype
Unviable genotype
Bit string and equivalent network representation of genotypes
Genotypes Ability to synthesize Viability biomass components
Growth
No Growth
Definition of Phenotype
For FBA, see Price et al (2004) for a review
Areejit Samal
MCMC Sampling: Growth phenotype in one environment
Accept/Reject Criterion of reaction swap:
Restrict the reaction swaps to the set of unblocked reactions within KEGG.
Accept if (a) No. of metabolites in the new network is less than or equal to E. coli(b) New network is able to grow on the specified environment
KEGG Database
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
10101..
N = 5870 reactions within KEGG reaction universe
11001..
10110..
11100..
00101..
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store store
Erase memory of
initial network
Saving Frequency
Ensemble uRM-V1
Areejit Samal
Network Properties: Growth phenotype in one environment
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Network Properties: Growth phenotype in one environment
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Network Properties: Growth phenotype in multiple environments
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Network Properties: Growth phenotype in multiple environments
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
P c
0.0
0.2
0.4
0.6
0.8
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Frac
tion
of n
odes
0.0
0.2
0.4
0.6
0.8
1.0LSCLSC + IN + OUT
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< C
>
0.00
0.02
0.04
0.06
0.08
0.10
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
< L
>
0
2
4
6
8
10
Clustering Coefficient Path Length
Probability that a path exists between two nodes
Largest Strong Component
Areejit Samal
Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints
Incr
easi
ng le
vel o
f con
stra
ints
Samal and Martin, PLoS ONE (2011) 6(7): e22295
Conjectured by: A. Wagner (2007) and
B. Papp et al. (2008)
Areejit Samal
Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints
Incr
easi
ng le
vel o
f con
stra
ints
Samal and Martin, PLoS ONE (2011) 6(7): e22295
Conjectured by: A. Wagner (2007) and
B. Papp et al. (2008)
Areejit Samal
High level of genetic diversity in our randomized ensembles
Any two random networks in our most constrained ensemble uRM-v10 differ in ~ 60% of their reactions.
101....
100....
E. coli Random network in ensemble uRM-V10
versus
Hamming distance between the two networks is ~ 60% of the maximum possible between two bit strings.
There is a set of ~ 100 reactions which are present in all of our random networks.
101....
100....
Random network in ensemble uRM-V10
versus
Random network in ensemble uRM-V10
Areejit Samal
Necessity of MCMC sampling
Reference: Samal et al, BMC Systems Biology (2010)
Ensemble R
Ensemble Rm
Ensemble uRm
Number of reactions in genotype2000 2200 2400 2600 2800
Frac
tion
of g
enot
ypes
that
are
via
ble 100
10-5
10-10
10-15
10-20
GlucoseAcetateSuccinateAnalytic prefactor
Ensemble uRm-V1Ensemble uRm-V5
Ensemble uRm-V10
Embedded sets like Russian dolls
Including the viability constraint on the first chemical environment leads to a reduction by at least a factor 1022
Areejit Samal
Summary
We have proposed a new method based on MCMC sampling and FBA to generate randomized metabolic networks by successively imposing few macroscopic biochemical and functional constraints into our sampling criteria.
By imposing a few macroscopic constraints into our sampling criteria, we were able to approach the properties of real metabolic networks.
We conclude that biological function is a main driver of structure in metabolic networks.
Areejit Samal
Limitations and Future Outlook
We can extend the study using the SEED database to many organisms.
The list of reactions in KEGG is incomplete and its use may reflect evolutionary constraints associated with natural organisms.
From a synthetic biology context, there could be many more reactions that can occur which are not in KEGG.
Alternate approach proposed by Basler et al, Bioinformatics, 2011 which generates new mass-balanced reactions but does not impose functionality constraint.
Automated reconstruction of metabolic models for >140 organisms on which FBA can be performed.
Generate new mass-balanced reactions but no constraint of biomass production.
Areejit Samal
Genotype networks: A many-to-one genotype to phenotype map
Genotype
Phenotype
RNA Sequence
Secondary structure of pairings
folding
Reference: Fontana & Schuster (1998) Lipman & Wilbur (1991) Ciliberti, Martin & Wagner (2007)
Genotype network refers to the set of (many) genotypes that have a given phenotype, and their “organization” in genotype space. In the context of RNA, genotype networks are commonly referred to as Neutral networks.
Protein Sequence
Three dimensional structure
Gene network
Gene expression levels
Areejit Samal
Properties of the genotype-to-phenotype map
Are there many genotypes that have a given phenotype?
Do the genotypes with a given phenotype form a significant fraction of all possible genotypes?
Are all the genotypes with a given phenotype rather similar?
Are biological genotypes atypically robust compared to random genotypes with similar phenotype?
Are biological genotypes atypically innovative?
Can genotypes change gradually in response to selective pressures?
A. Wagner, Nat. Rev. Genet. (2008)
Areejit Samal
Properties of the genotype-to-phenotype map
Are there many genotypes that have a given phenotype?
Do the genotypes with a given phenotype form a significant fraction of all possible genotypes?
Are all the genotypes with a given phenotype rather similar?
Are biological genotypes atypically robust compared to random genotypes with similar phenotype?
Are biological genotypes atypically innovative?
Can genotypes change gradually in response to selective pressures?
We have considered these questions in the context of metabolic networks.
Genotype = lists of metabolic enzymes or reactionsPhenotype = growth or not in a given environment
A. Wagner, Nat. Rev. Genet. (2008)
Areejit Samal
Computational tool for Evolutionary Systems Biology
Areejit Samal
Acknowledgement
CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a Postdoctoral Fellowship.
Reference
Other Work
Genotype networks in metabolic reaction spacesAreejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*BMC Systems Biology 2010, 4:30
Environmental versatility promotes modularity in genome-scale metabolic networksAreejit Samal, Andreas Wagner* andOlivier C Martin*BMC Systems Biology 2011, 5:135
Funding
Andreas Wagner João Rodrigues University of Zurich
Jürgen JostMax Planck Institute for Mathematics in the Sciences
Olivier C. MartinUniversité Paris-Sud