Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN]...

17
arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimer` a and Lu´ ıs A. Nunes Amaral NICO and Dept. Chemical and Biological Engineering Northwestern University, Evanston, IL 60208, USA High-throughput techniques are leading to an explosive growth in the size of biological databases and creating the opportunity to revolutionize our under- standing of life and disease. Interpretation of these data remains, however, a major scientific challenge. Here, we propose a methodology that enables us to extract and display information contained in complex networks 1,2,3 . Specif- ically, we demonstrate that one can (i) find functional modules 4,5 in complex networks, and (ii) classify nodes into universal roles according to their pat- tern of intra- and inter-module connections. The method thus yields a “car- tographic representation” of complex networks. Metabolic networks 6,7,8 are among the most challenging biological networks and, arguably, the ones with more potential for immediate applicability 9 . We use our method to analyze the metabolic networks of twelve organisms from three different super-kingdoms. We find that, typically, 80% of the nodes are only connected to other nodes within their respective modules, and that nodes with different roles are af- fected by different evolutionary constraints and pressures. Remarkably, we 1

Transcript of Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN]...

Page 1: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

arX

iv:q

-bio

/050

2035

v1 [

q-bi

o.M

N]

23 F

eb 2

005

Functional cartography

of complex metabolic networks

Roger Guimera and Luıs A. Nunes Amaral

NICO and Dept. Chemical and Biological Engineering

Northwestern University, Evanston, IL 60208, USA

High-throughput techniques are leading to an explosive growth in the size of

biological databases and creating the opportunity to revolutionize our under-

standing of life and disease. Interpretation of these data remains, however, a

major scientific challenge. Here, we propose a methodology that enables us

to extract and display information contained in complex networks1,2,3. Specif-

ically, we demonstrate that one can (i) find functional modules4,5 in complex

networks, and (ii) classify nodes into universal roles according to their pat-

tern of intra- and inter-module connections. The method thus yields a “car-

tographic representation” of complex networks. Metabolicnetworks6,7,8 are

among the most challenging biological networks and, arguably, the ones with

more potential for immediate applicability 9. We use our method to analyze the

metabolic networks of twelve organisms from three different super-kingdoms.

We find that, typically, 80% of the nodes are only connected toother nodes

within their respective modules, and that nodes with different roles are af-

fected by different evolutionary constraints and pressures. Remarkably, we

1

Page 2: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

find that low-degree metabolites that connect different modules are more con-

served than hubs whose links are mostly within a single module.

If one is to extract the significant information from the topology of a large complex net-

work, the knowledge of the role of each node is of crucial importance. A cartographic analogy

is helpful to illustrate this point. Consider the network formed by all cities and towns in a

country—the nodes—and all the roads that connect them—the links. It is clear that a map in

which each city and town is represented by a circle of fixed size and each road is represented

by a line of fixed width is hardly useful. Rather, real maps emphasize capitals and important

communication lines so that one can obtain scale-specific information at a glance. Similarly, it

is difficult, if not impossible, to obtain information from anetwork with hundreds or thousands

of nodes and links, unless the information about nodes and links is conveniently summarized.

This is particularly true for biological networks.

Here, we propose a methodology, which is based on the connectivity of the nodes, that

yields a “cartographic representation” of a complex network. The first step in our method is

to identify the functional modules4,5 in the network. In the cartographic picture, modules are

analogous to countries or regions, and enable a coarse-grained, and thus simplified, description

of the network. Then, we classify the nodes in the network into a small number ofsystem-

independent“universal roles.”

Modules.It is a matter of common experience that social networks havecommunities of highly

interconnected nodes that are less connected to nodes in other communities. Such modular

structures have been reported not only in social networks5,10,11,12, but also in food webs13 and

biochemical networks4,14,15,16. It is widely believed that the modular structure of complexnet-

works plays a critical role in their functionality4,14,16. There is therefore a clear need to develop

algorithms to identify modules accurately5,11,17,18,19,20.

We identify modules by maximizing the network’smodularity11,18,21 using simulated an-

nealing22 (see Methods). Simulated annealing enables us to carry out an exhaustive search and

to minimize the problem of finding sub-optimal partitions. It is noteworthy that, in our method,

2

Page 3: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

one does not need to specify a priori the number of modules; rather, this number is an outcome

of the algorithm. Our algorithm, which significantly outperforms the best algorithm in the lit-

erature, is able to reliably identify modules in a network whose nodes have as many as 50% of

their connections outside their own module (Fig. 1).

Roles in modular networks.It is plausible to surmise that the nodes in a network are connected

according to therole they fulfill. This fact has been long recognized in the analysis of so-

cial networks23. For example, in a classical hierarchical organization, the CEO is not directly

connected to plant employees but is connected to the membersof the board of directors. Impor-

tantly, such a statement holds for virtually any organization, that is, the role of CEO is defined

irrespective of the particular organization one considers.

We propose a new method to determine the role of a node in a complex network. Our

approach is based on the idea that nodes with the same role should have similar topological

properties24 (see Supplementary Information for a discussion on how our approach relates to

previous work). We hypothesize that the role of a node can be determined, to a great extent,

by its within-module degreeand itsparticipation coefficient, which define how the node is

positioned in its own module and with respect to other modules25,26 (see Methods). These two

properties are easily computed once the modules of a networkare known.

The within-module degreezi measures how “well-connected” nodei is to other nodes in

the module. High values ofzi indicate high within-module degrees and vice versa. The partic-

ipation coefficientPi measures how “well-distributed” the links of nodei are among different

modules. The participation coefficientPi is close to one if its links are uniformly distributed

among all the modules and zero if all its links are within its own module.

We define heuristically seven different “universal roles,”each defined by a different re-

gion in thezP parameters-space (Fig. 2). According to the within-moduledegree, we classify

nodes withz ≥ 2.5 as module hubs and nodesz < 2.5 as non-hubs. Both hub and non-hub

nodes are then more finely characterized by using the values of the participation coefficient (see

Supplementary Information for a detailed justification of this classification scheme, and for a

3

Page 4: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

discussion on possible alternatives).

We find that non-hub nodes can be naturally divided into four different roles: (R1)ultra-

peripheral nodes, i.e., nodes with all its links within their module (P ≤ 0.05); (R2) peripheral

nodes, i.e., nodes with most links within their module (0.05 < P ≤ 0.62); (R3) non-hub

connector nodes, i.e., nodes with many links to other modules (0.62 < P ≤ 0.80); and (R4)

non-hub kinless nodes, i.e., nodes with links homogeneously distributed among all modules

(P > 0.80). We find that hub nodes can be naturally divided into three different roles: (R5)

provincial hubs. i.e., hub nodes with the vast majority of links within theirmodule (P ≤ 0.30);

(R6) connector hubs, i.e., hubs with many links to most of the other modules (0.30 < P ≤

0.75); and (R7)kinless hubs, i.e., hubs with links homogeneously distributed among allmodules

(P > 0.75).

Cartographic representation of metabolic networks.To test the applicability of our approach to

complex biological networks, we consider the metabolic network6,7,14,8,9of twelve organisms:

four bacteria (E. coli, B. subtilis, L. lactis, andT. elongatus), four eukaryotes (S. cerevisiae, C.

elegans, P. falciparum, andH. sapiens), and four archaea (P. furiosus, A. pernix, A. fulgidus,

andS. solfataricus). In metabolic networks, nodes represent metabolites and two nodesi and

j are connected by a link if there is a chemical reaction in which i is a substrate andj a prod-

uct, or vice versa. In our analysis, we use the database developed by Ma and Zeng8 (MZ)

from the Kyoto Encyclopedia of Genes and Genomes27 (KEGG). Importantly, the results we

report are not altered if we consider the complete KEGG database instead (Figs. 2c and 4b, and

Supplementary Information).

First, we identify the functional modules in the different metabolic networks (Fig. 3). Find-

ing modules in metabolic networks based on purely topological properties is an extremely im-

portant task. For example, Schusteret al. have reported on the impossibility of obtaining

elementary flux modes28 from complete metabolic networks due to the combinatorial explosion

of the number of such modes29. Our algorithm identifies an average of 15 different modulesin

each metabolic network—with a maximum of 19 forE. coli andH. sapiens, and a minimum of

4

Page 5: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

11 forA. fulgidus. As expected, the density of links within each of the modulesis significantly

larger than between modules, typically 100-1000 times larger (see Supplementary Information).

To assess how each of the modules is related to the pathways traditionally defined in biol-

ogy, we use the classification scheme proposed in KEGG, whichincludes nine major pathways:

carbohydrate metabolism, energy metabolism, lipid metabolism, nucleotide metabolism, amino

acid metabolism, glycan biosynthesis and metabolism, metabolism of cofactors and vitamins,

biosynthesis of secondary metabolites, and biodegradation of xenobiotics. Each metabolite in

the KEGG database is assigned to, at least, one pathway; thus, we can determine to which path-

ways the metabolites in a given module belong. We find that most modules contain metabolites

mostly from one major pathway. For example, in 17 of the 19 modules identified forE. coli,

more than one third of the metabolites belong to a single pathway. Interestingly, some other

modules—two in the case ofE. coli—cannot be trivially associated with a single traditional

pathway. These modules are typically central in the metabolism and contain, mostly, metabo-

lites that are classified in KEGG as belonging to carbohydrate and amino acid metabolism.

Next, we identify the role of each metabolite. In Fig. 2b we show the roles identified in

the metabolic network ofE. coli. Remarkably, other organisms display a similar distribution of

the nodes in the different roles, even though they correspond to organisms that are very distant

from an evolutionary standpoint (see Supplementary Information). Role R1, which contains

ultra-peripheral metabolites with small degree and no between-module links, comprises 76-

86% of all the metabolites in the networks. This considerably simplifies the coarse-grained

representation of the network as these nodes do not need to beidentified separately. Note that

this finding alone represents an important step towards the goal of extracting scale-specific

information from complex networks.

Metabolite role and inter-species conservation.The information about modules and roles en-

ables us to build a “cartographic representation” of the metabolic network of, for example,E.

coli (Fig. 3). This representation enables us to recover relevant biological information. For

instance, we find that the metabolism is mostly organized around the module containing pyru-

5

Page 6: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

vate, which, in turn, is strongly connected to the module whose hub is acetyl-CoA. These two

molecules are key to connect the metabolism of carbohydrates, amino acids, and lipids to the

TCA cycle from which ATP is obtained. These two modules are connected to more peripheral

ones by key metabolites such as D-glyceraldehyde 3-phosphate and D-fructose 6-phosphate

(which connect to the glucose and galactose metabolisms), D-ribose 5-phosphate (which con-

nects to the metabolism of certain nucleotides), and glycerone phosphate (which connects to

the metabolism of certain lipids).

Importantly, our analysis also uncovers nodes with key connector roles that take part in only

a small but fundamental set of reactions. For example, N-carbamoyl-L-aspartate takes part in

only three reactions but is vital because it connects the pyrimidine metabolism, whose hub is

uracil, to the core of the metabolism through the alanine andaspartate metabolism. The po-

tential importance of such non-hub connectors points to another consideration. It is a plausible

hypothesisthat nodes with different roles are under different evolutionary constraints and pres-

sures. In particular, one expects that nodes with structurally relevant roles are more necessary

and therefore more conserved across species.

To quantify the relation between roles and conservation, wedefine the loss rateplost(R)

(see Methods). Structurally relevant roles are expected tohave low values ofplost(R) and vice

versa. Remarkably, we find that the different roles have, indeed, different loss rates (Fig. 4). As

expected, ultra-peripheral nodes (role R1) have the highest loss rate while connector hubs (role

R6) are the most conserved across all species considered.

The results for the comparison ofplost(R) for ultra-peripheral nodes and connector hubs is

illustrative, but hardly surprising. The comparison ofplost(R) for non-hub connectors (role R3)

and provincial hubs (role R5), however, yields a surprisingand remarkable finding. The metabo-

lites in the provincial hubs class have many within-module connections, sometimes as much as

five standard deviations more connections than the average node in the module. Conversely,

non-hub connector metabolites have few links relative to other nodes in their modules—and

fewer total connections than the metabolites in role R5 (seeSupplementary Figs. S12b,c). The

6

Page 7: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

links of non-hub connectors, however, are distributed among several different modules, while

the links of provincial hubs are mainly within their modules. We find that non-hub connectors

are systematically and significantly more conserved than provincial hub metabolites (Fig. 4).

A possible explanation for the high degree of conservation of non-hub connectors is the

following. Connector nodes are responsible for inter-module fluxes. These modules are, oth-

erwise, poorly connected or not connected at all to each other, so the elimination of connector

metabolites will likely have a large impact on the global structure of fluxes in the network. On

the contrary, the pathways in which provincial hubs are involved may be backed up within the

module, in such a way that elimination of these metabolites may have a comparatively smaller

impact, which, in addition, would likely be confined to the module containing the provincial

hub.

Our results therefore point to the need to consider each complex biological network as a

whole, instead of focusing on local properties. In protein networks, for example, it has been

reported that hubs are more essential than non-hubs30. Notwithstanding the relevance of such

a finding, our results suggest that the global role of nodes inthe network might be a better

indicator of their importance than degree26.

Our “cartography” provides a scale-specific method to process the information contained in

the structure of complex networks, and to extract knowledgeabout the function carried out by

the network and its constituents. An open question is how to adapt current module-detection

algorithms to networks with a hierarchical structure.

For metabolic networks, a comparatively well studied and well understood case, our method

allows us to recover firmly established biological facts, and to uncover important new results,

such as the significant conservation of non-hub connector metabolites. Similar results can be

expected when our method is applied to other complex networks that are not as well studied

as metabolic networks. Among those, protein interaction and gene regulation networks may be

the most significant.

7

Page 8: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

Methods

Modularity

For a given partition of the nodes of a network into modules, the modularityM of this partition

is11,18,21

M ≡NM∑

s=1

ls

L−

(

ds

2L

)2

, (1)

whereNM is the number of modules,L is the number of links in the network,ls is the number

of links between nodes in modules, andds is the sum of the degrees of the nodes in modules.

The rationale for this definition of modularity is the following. A good partition of a network

into modules must comprise many within-module links and as few as possible between-module

links. However, if one just tries to minimize the number of between-module links (or, equiva-

lently, maximize the number of within-module links) the optimal partition consists of a single

module and no between-module links. Equation (1) addressesthis difficulty by imposing that

M = 0 if nodes are placed at random into modulesor if all nodes are in the same cluster11,18,21.

The objective of a module identification algorithm is to find the partition with largest mod-

ularity, and several methods have been proposed to attain such a goal. Most of them rely on

heuristic procedures and useM—or a similar measure—only to assess their performance. In

contrast, we use simulated annealing22 to find the partition with the largest modularity.

Simulated annealing for module identification

Simulated annealing22 is a stochastic optimization technique that enables one to find “low cost”

configurations without getting trapped in “high-cost” local minima. This is achieved by intro-

ducing acomputational temperatureT . WhenT is high, the system can explore configurations

of high cost while at lowT the system only explores low cost regions. By starting at high

T and slowly decreasingT , the system descends gradually toward deep minima, eventually

overcoming small cost barriers.

8

Page 9: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

When identifying modules, the objective is to maximize the modularity and, thus, the cost

isC = −M , whereM is the modularity as defined in Eq. (1). At each temperature, we perform

a number of random updates and accept them with probability

p =

1 if Cf ≤ Ci

exp(

−Cf−Ci

T

)

if Cf > Ci

(2)

whereCf is the cost after the update andCi is the cost before the update.

Specifically, at eachT we proposeni = fS2 individual node movements from one module

to another, whereS is the number of nodes in the network. We also proposenc = fS collective

movements, which involve either the merging two modules or splitting a module. Forf we

typically choosef = 1. After the movements are evaluated at a certainT , the system is cooled

down toT ′ = cT , with c = 0.995.

Within-module degree and participation coefficient

Each module can be organized in very different ways, rangingfrom totally centralized—with

one or a few nodes connected to all the others—to totally decentralized—with all nodes having

similar connectivities. Nodes with similar roles are expected to have similar relative within-

module connectivity. Ifκi is the number of links of nodei to other nodes in its modulesi, κsi

is the average ofκ over all the nodes insi, andσκsiis the standard deviation ofκ in si, then

zi =κi − κsi

σκsi

(3)

is the so-calledz-score. The within-module degreez-score measures how “well-connected”

nodei is to other nodes in the module.

Different roles can also arise because of the connections ofa node to modules other than its

own. For example, two nodes with the samez-score will play different roles if one of them is

connected to several nodes in other modules while the other is not. We define the participation

coefficientPi of nodei as

Pi = 1−NM∑

s=1

(

κis

ki

)2

(4)

9

Page 10: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

whereκis is the number of links of nodei to nodes in modules, andki is the total degree of

nodei. The participation coefficient of a node is therefore close to one if its links are uniformly

distributed among all the modules and zero if all its links are within its own module.

Loss rate

To quantify the relation between roles and conservation, wecalculate to which extent metabo-

lites are conserved in the different species depending on the role they play. Specifically, for a

pair of species,A andB, we define the loss rate as the probabilityp(RA = 0|RB = R) ≡

plost(R) that a metabolite is not present in one of the species (RA = 0) given that it plays role

R in the other species (RB = R). Structurally relevant roles are expected to have low values of

plost(R) and vice versa.

References

1. Amaral, L. A. N., Scala, A., Barthelemy, M. & Stanley, H. E. Classes of small-world

networks.Proc. Natl. Acad. Sci. USA97, 11149–11152 (2000).

2. Albert, R. & Barabasi, A.-L. Statistical mechanics of complex networks.Rev. Mod. Phys.

74, 47–97 (2002).

3. Amaral, L. A. N. & Ottino, J. Complex networks: Augmentingthe framework for the study

of complex systems.Eur. Phys. J. B38, 147–162 (2004).

4. Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W.From molecular to modular

biology. Nature402, C47–C52 (1999).

5. Girvan, M. & Newman, M. E. J. Community structure in socialand biological networks.

Proc. Natl. Acad. Sci. USA99, 7821–7826 (2002).

10

Page 11: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

6. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabasi, A. L. The large-scale organi-

zation of metabolic networks.Nature407, 651–654 (2000).

7. Wagner, A. & Fell, D. A. The small world inside large metabolical networks.Proc. Roy.

Soc. B268, 1803–1810 (2001).

8. Ma, H. & Zeng, A.-P. Reconstruction of metabolic networksfrom genome data and analysis

of their global structure for various organisms.Bioinformatics19, 270–277 (2003).

9. Hatzimanikatis, V., Li, C., Ionita, J. A. & Broadbelt, L. Metabolic networks: enzyme

function and metabolite structure.Curr. Opin. Struc. Biol.14, 300–306 (2004).

10. Guimera, R., Danon, L., Dıaz-Guilera, A., Giralt, F. &Arenas, A. Self-similar community

structure in a network of human interactions.Phys. Rev. E68, art. no. 065103 (2003).

11. Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks.

Phys. Rev. E69, art. no. 026113 (2004).

12. Arenas, A., Danon, L., Dıaz-Guilera, A., Gleiser, P. M.& Guimera, R. Community analysis

in social networks.Eur. Phys. J. B38, 373–380 (2004).

13. Krause, A. E., Frank, K. A., Mason, D. M., Ulanowicz, R. E.& Taylor, W. W. Compart-

ments revealed in food-web structure.Nature426, 282–285 (2003).

14. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabasi, A.-L. Hierarchical

organization of modularity in metabolic networks.Science297, 1551–1555 (2002).

15. Holme, P. & Huss, M. Subnetwork hierarchies of biochemical pathways.Bioinformatics

19, 532–538 (2003).

16. Papin, J. A., Reed, J. L. & Palsson, B. O. Hierarchical thinking in network biology: the un-

biased modularization of biochemical networks.Trends Biochem. Sci.29, 641–647 (2004).

11

Page 12: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

17. Eriksen, K. A., Simonsen, I., Maslov, S. & Sneppen, K. Modularity and extreme edges of

the Internet.Phys. Rev. Lett.90, art. no. 148701 (2003).

18. Newman, M. E. J. Fast algorithm for detecting community structure in networks.Phys.

Rev. E69, art. no. 066133 (2004).

19. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V. & Parisi, D. Defining and identifying

communities in networks.Proc. Natl. Acad. Sci. USA101, 2658–2663 (2004).

20. Donetti, L. & Munoz, M. A. Detecting network communities: A new systematic and

efficient algorithm.J. Stat. Mech. Theor. Exp.P10012 (2004).

21. Guimera, R., Sales-Pardo, M. & Amaral, L. A. N. Modularity from fluctuations in random

graphs and complex networks.Phys. Rev. E70, art. no. 025101 (2004).

22. Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing.Science

220, 671–680 (1983).

23. Wasserman, S. & Faust, K.Social Network Analysis(Cambridge University Press, Cam-

bridge, U.K., 1994).

24. Guimera, R. & Amaral, L. A. N.J. Stat. Mech. Theor. Exp.submitted (2004).

25. Rives, A. W. & Galitski, T. Modular organization of cellular networks.Proc. Natl. Acad.

Sci. USA100, 1128–1133 (2003).

26. Han, J.-D. J.et al. Evidence for dinamically organized modularity in the yeastprotein-

protein interaction network.Nature430, 88–93 (2004).

27. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genesand Genomes.Nucleic

Acids Res.28, 27–30 (2000).

12

Page 13: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

28. Schuster, S., Fell, D. A. & Dandekar, T. A general definition of metabolic pathways useful

for systematic organization and analysis of complex metabolic networks.Nat. Biotechnol.

18, 326–332 (2000).

29. Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I. & Dandekar, T. Exploring the pathway

structure of metabolism: decomposition into subnetworks and application toMicroplasma

pneumoniae. Bioinformatics18, 351–361 (2002).

30. Jeong, H., Mason, S. P., Barabasi, A.-L. & Oltvai, Z. N. Lethality and centrality in protein

networks.Nature41–42 (2001).

Acknowledgments We thank L. Broadbelt, V. Hatzimanikatis, A. A. Moreira, E. T. Papout-

sakis, M. Sales-Pardo, and D. B. Stouffer for stimulating discussions and helpful suggestions,

and H. Ma and A. P. Zeng for providing us with their metabolic networks’ database. R.G. thanks

the Fulbright Program and the Spanish Ministry of Education, Culture & Sports. L.A.N.A.

gratefully acknowledges the support of a Searle LeadershipFund Award and of a NIH/NIGMS

K-25 award.

13

Page 14: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

0 0.1 0.2 0.3 0.4 0.5 0.6Fraction of inter-community edges, kout / k

0.0

0.2

0.4

0.6

0.8

1.0

Per

form

ance

PresentGirvan-Newman

e1.0

0.0

0.5

1 32 64 96 128

1

32

64

96

128

Nodes

Nod

es

b ca

d

Figure 1: Performance of module identification methods. To test the performance of themethod, we build “random networks” with known module structure. Each test network com-prises 128 nodes divided into 4 modules of 32 nodes. Each nodeis connected to the other nodesin its module with probabilitypi, and to nodes in other modules with probabilitypo < pi. Onaverage, thus, each node is connected tokout = 96 po nodes in other modules and tokin = 31 piin the same module. Additionally,pi andpo are selected so that the average degree of the nodesis k = 16. We display networks with:a, kin = 15 andkout = 1; b, kin = 11 andkout = 5; andc, kin = kout = 8. d, The performance of a module identification algorithm is typically definedas the fraction of correctly classified nodes. We compare ouralgorithm to the Girvan-Newmanalgorithm5,18, which is the reference algorithm for module identification11,18,19. Note that ourmethod is 90% accurate even when half of a node’s links are to nodes in outside modules.e,Our module-identification algorithm is stochastic, so different runs yield, in principle, differentpartitions. To test the robustness of the algorithm, we obtain 100 partitions of the network de-picted inc and plot, for each pair of nodes in the network, the fraction of times that they areclassified in the same module. As shown in the figure, most pairs of nodes are either alwaysclassified in the same module (red) or never classified in the same module (dark blue), whichindicates that the solution is robust.

14

Page 15: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

0.0 0.2 0.4 0.6 0.8 1.0Participation coefficient, P

-2

0

2

4

6

8

With

in-m

odul

e de

gree

, z

R1

R2 R3 R4

R5 R6 R7

0.0 0.2 0.4 0.6 0.8 1.0Participation coefficient, P

-2

0

2

4

6

8

With

in-m

odul

e de

gree

, z

0.0 0.2 0.4 0.6 0.8 1.0Participation coefficient, P

-2

0

2

4

6

8

With

in-m

odul

e de

gree

, za b c

Figure 2: Roles and regions in thezP parameters-space.a, Each node in a network can becharacterized by its within-module degree and its participation coefficient (see Methods fordefinitions.) We classify nodes withz ≥ 2.5 as module hubs and nodesz < 2.5 as non-hubs.We find that non-hub nodes can be naturally assigned into fourdifferent roles: (R1)ultra-peripheral nodes, i.e., nodes with all its links within their module; (R2)peripheral nodes, i.e.,nodes with most links within their module; (R3)non-hub connector nodes, i.e., nodes with manylinks to other modules; and (R4)non-hub kinless nodes, i.e., nodes with links homogeneouslydistributed among all modules. We find that hub nodes can be naturally assigned into threedifferent roles: (R5)provincial hubs. i.e., hub nodes with the vast majority of links withintheir module; (R6)connector hubs, i.e., hubs with many links to most of the other modules;and (R7)kinless hubs, i.e., hubs with links homogeneously distributed among allmodules.(Supplementary Information.)b, Metabolite role determination for the metabolic networkE.coli, as obtained from the MZ database. Each metabolite is represented as a point in thezPparameters-space, and is colored according to its role.c, Same asb but for the complete KEGGdatabase.

15

Page 16: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

����

��������

������

������

����

���

���

������

������

��������

���

���

����

������

������

������

������

��������

����

���

���

������

������

��������

��������

����

��������

���

���

��������

���

���

������

���������

���

��������

���

���

����

�����������

���

��������

������

���������

���

����

������

������������

������

��������

��������

����

������

������

������

������

������

������

����

���

���

����

�������� ���

���

���

���

������

������

��������

���

���

�������

���

������

������

��������

���

���

���

���

��������

��������

����

���

���

���

���

����

���

���

����

������

������

����

������

������

����

���

���

�����������

���

������

������

��������

�������� ������

������

��������

���

�����������

��������

����

������

������

������

����������

���

���

��������

���� ��

������

��������

������

������

�������������������

���

������

������

������

������

��������

������

������

������

�������

���

���

���

��������������

������

��������

��������

����

����

����

����

������

������

����

���

������

���

������

������

��������

���

���

������

������

����

������

������

������

������

���

���

����

����

��������

������

������

��������

���

���

����

��������

���������

��

���

���

�������

���

���

����

����

��������

���

���

��������

����

����

����

���

���

������

������

����

���

���

���

������

���

������

������

����

����

������

������

������

����������

����

������������

��������

����

Carbohydrate metabolism

Metabolism of cofactors & vitamins

Nucleotide metabolism

Glycan biosynthesis & metabolism

Energy metabolismLipid metabolism

Amino acid metabolismBiosynthesis of secondary metabolites

Biodegradation of xenobiotics

Module−moduleModule−nodeNode−node

Non−hub connectorConnector hubProvincial hub

AdenineD−Glucose

D−Galactose

D−Ribose 5−phosphate

Pyruvate

D−Glyceraldehyde 3−phosphate

D−fructose 1,6−biphosphateGlycerone phosphate

Acetyl−CoA

D−Fructose 6−phosphate

L−Glutamate

N−Carbamoyl L−aspartate

Uracil

UDP−N−acetyl−D−glucosamine

Glycine

Glutathione

Figure 3: “Cartographic representation” of the metabolic network ofE. coli. Each circle repre-sents a module and is colored according to the KEGG pathway classification of the metabolitesit contains. Certain important nodes are depicted as triangles (non-hub connectors), hexagons(connector hubs), and squares (provincial hubs). Interactions between modules and nodes aredepicted using lines, whith thickness proportional to the number of actual links. (Inset) Pajek-obtained representation of the entire metabolic network ofE. coli contains 473 metabolites and574 links. Each node is colored according to the “main” colorof its module, as obtained fromthe “cartographic representation.”

16

Page 17: Functional cartography of complex metabolic networks arXiv ... · arXiv:q-bio/0502035v1 [q-bio.MN] 23 Feb 2005 Functional cartography of complex metabolic networks Roger Guimera and

Letter to Nature Guimera and Amaral

1 2 3 5 6Role, R

0.0

0.2

0.4

0.6

0.8

p lost (

R)

aN

on-h

ubs

Hub

s

1 2 3 5 6Role, R

0.0

0.2

0.4

0.6

0.8

p lost (

R)

b

Non

-hub

s

Hub

s

Figure 4: Roles of metabolites and inter-species conservation. To quantify the relation be-tween roles and conservation, we calculate the loss rateplost(R) of each metabolite (see Meth-ods). Each thin line in the graph corresponds to a comparisonbetween two species. Since weare interested in metabolites that are present in some species but missing in others, metabolicnetworks of species within the same super-kingdom—bacteria, eukaryotes, and archaea—areusually too similar to provide statistically sound information, especially for roles containingonly a few metabolites. Therefore, we consider in our analysis only pairs of species that be-long to different super-kingdoms. The thick line is the average over all pairs of species. Theloss rateplost(R) is maximum for ultra-peripheral (R1) nodes and minimum for connector hubs(R6). Remarkably, provincial hubs (R5) have a significantlyand consistently higherplost(R)than non-hub connectors (R3), even though the within-module degree and the total degree ofprovincial hubs is larger. Note that, out of the total 48 paircomparisons, only in two casesplost(R) is lower for provincial hubs than for non-hub connectors, while the opposite is true in44 cases.a, Results obtained for the MZ database andb, the complete KEGG database.

17