Resilience of protein–protein interaction networks as determined by their large-scale topological...


This journal is © The Royal Society of Chemistry 2011. Mol. BioSyst., 2011, 7, 1263–1269

Cite this: Mol. BioSyst., 2011, 7, 1263–1269

Resilience of protein–protein interaction networks as determined by their large-scale topological features†

Francisco A. Rodrigues,*a Luciano da Fontoura Costa b and André Luiz Barbieri b

Received 1st November 2010, Accepted 7th January 2011

DOI: 10.1039/c0mb00256a

The relationship between the structure and function of biological networks constitutes a fundamental issue in systems biology. In particular, the structure of protein–protein interaction networks is related to important biological functions. In this work, we investigated how the resilience of these networks is determined by their large-scale features. Four species were taken into account, namely the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, the fly Drosophila melanogaster and Homo sapiens. We adopted two entropy-related measurements (degree entropy and dynamic entropy) to quantify the overall robustness of these networks. We verified that, while the networks exhibit similar structural variations under random node removal, they differ significantly when subjected to intentional attacks (hub removal). In fact, more complex species tended to exhibit more robust networks. More specifically, we quantified how six important measurements of network topology (namely clustering coefficient, average degree of neighbors, average shortest path length, diameter, assortativity coefficient, and slope of the power-law degree distribution) correlated with the two entropy measurements. Our results revealed that the fraction of hubs and the average neighbor degree contribute significantly to the resilience of networks. In addition, the topological analysis of the removed hubs indicated that the presence of alternative paths between the proteins connected to hubs tends to reinforce resilience. This analysis helps to explain how resilience arises in networks and can be applied to the development of protein network models.

Introduction

The study of molecular biology has received much interdisciplinary attention focused on complex interactions in biological systems, which has led to a new perspective, i.e. holism instead of reductionism.1 Biological processes depend on the interactions between proteins as well as between proteins and other molecules,1 which generate networks of interactions. Though protein–protein interaction networks have been studied for a long time (e.g. ref. 2), only recently have graph-theoretical concepts and measurements been applied to their analysis (e.g. ref. 3). Proteins can interact to form a cellular complex, where each protein can carry another protein, or interact briefly with a target protein in order to modify it. Protein interaction maps can be naturally represented as networks, where the nodes represent proteins and the edges represent interactions.4

One particularly important object of investigation in systems biology has been the relationship between the structure of biological networks and their respective functions that control the information flow and regulation of cellular signals.5,6 Indeed, it has been verified that the structure of protein–protein interaction networks differs from that of random networks of the Erdős–Rényi type,7 presenting instead a structured organization that follows the scale-free paradigm,8 in which the distribution of the number of connections follows a power law. More precisely, the distribution of connections of protein–protein interaction networks follows a power law with an exponential cutoff.9,10 This type of topology is particularly interesting because it accounts for the existence of hubs, i.e. nodes with particularly high degrees, which are known to play an important role in networks.11 In fact, hubs have an intrinsic relationship with lethality, as verified in recent works (e.g. ref. 9 and 12).

The relationship between network organization and function can be investigated by gene deletion. In recent experiments, the robustness of biological networks was analyzed through the direct response to gene deletions13 or RNA interference.14 These experiments are fundamental approaches to understanding the global organization of gene functions.13 Computationally, perturbations can be examined by random or preferential removal of proteins in protein interaction networks.15 In the former case, all proteins have the same probability of being removed, while in the latter, proteins are selected according to a given attribute, such as their number of connections. A previous work analyzed the robustness of the protein networks of Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster, describing the variation of network diameter under failures, i.e. random removal, and attacks, achieved by deleting the most connected proteins in decreasing order of their number of connections.16 However, that work provided only a partial picture of network resilience, since it did not explain which topological features contribute to the network's robustness. In addition, only one particular network property, i.e. the diameter, was analyzed. In fact, since the organization of the networks is a product of natural selection, it is expected that the structure of biological networks of different species should differ in some way. Here, we present a quantification of how several important structural properties contribute to the resilience of protein–protein interaction networks.

a Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo-Campus de São Carlos, Caixa Postal 668, 13560-970 São Carlos, SP, Brazil. E-mail: [email protected]

b Instituto de Física de São Carlos, Universidade de São Paulo, Av. Trabalhador São Carlense 400, Caixa Postal 369, CEP 13560-970, São Carlos, São Paulo, Brazil

† Electronic supplementary information (ESI) available. See DOI: 10.1039/c0mb00256a

Molecular BioSystems, www.rsc.org/molecularbiosystems. Published on 04 February 2011.

To verify this relationship between resilience and network organization, we performed a comparative analysis of different species by calculating the entropy of the connectivity distribution17 and the dynamic entropy based on a Markov process,18 which are robustness-related measurements. In particular, the dynamic entropy of a Markov process was recently applied to the analysis of a protein network of the yeast S. cerevisiae,19 revealing that proteins with a large contribution to network entropy are frequently lethal. The adopted measurements proved to be consistent with our simulations. The relationship between the structure and function of networks can be quantified in terms of the correlation between these entropy-based measurements and network topological features. This analysis can reveal which structural properties of networks are most determinant for network resilience. We verified that two topological features appear to contribute particularly to this resilience, namely the coefficient of the power-law degree distribution and the average neighbor degree. In addition, we verified that the networks of S. cerevisiae, C. elegans, D. melanogaster and Homo sapiens present similar behavior when subjected to random failures. This result agrees with experimental observations, in which a large number of perturbations did not result in any phenotypic variation under a given experimental condition.13,14 On the other hand, the network topologies behave differently under intentional attacks. In fact, the networks of H. sapiens and D. melanogaster proved to be more robust under intentional attacks than the networks of S. cerevisiae and C. elegans. Therefore, the more complex an organism is, the higher its resilience to attacks. Since we identified which features are mainly responsible for increased network resilience, these results help to understand the organization and function of protein interactions, and can be applied in the development of more precise protein–protein interaction network models.

In the next sections, we present the methodologies for describing network topology and quantifying network resilience. The results and discussion are presented subsequently, followed by a description of the simulations considering random failures and intentional attacks.

Protein–protein interaction databases

In order to compare networks of different species, we considered two databases from different sources for each species. This choice allowed a more reliable comparison, since the adopted databases provided networks with different numbers of proteins and connections (i.e. different levels of completeness). For the yeast S. cerevisiae, we adopted the database of the Center for Cancer Systems Biology (CCSB), which provides high-quality binary interaction information.20 This database is considered a “second-generation” high-quality high-throughput Y2H dataset covering about 20% of all yeast binary interactions.20 It covers 1809 interactions among 1278 proteins (we name this network Sce-CCSB). The second database is based on the Ito-core and Uetz-screens and is composed of 2930 interactions among 2018 proteins (here named Sce-Union).21 In the case of the worm C. elegans, we took into account two databases: (i) the Worm Interactome,22 from the CCSB (named Cel-Wi), which is composed of 1496 proteins and 1816 interactions (from Vital Lab 2007), and (ii) the worm database by the Biogrid,23 version 3.0.65 (named Cel-BG), composed of 3518 proteins and 6672 interactions. The databases of D. melanogaster were obtained from the Biogrid,23 also version 3.0.65 (named Dme-BG), formed by 7282 nodes and 27 671 connections, and from the Drosophila Interaction Database (DroiD), which includes protein interaction data generated in the Finley laboratory using the LexA yeast two-hybrid system, mostly from high-throughput screens.24 This latter database resulted in a network (named Dme-FI) composed of 1338 nodes and 3161 interactions. Finally, for H. sapiens, we took into account the database by Biogrid,23 also version 3.0.65 (named Hsa-BG), composed of 9527 proteins and 36 877 interactions, and the Human Protein Reference Database release 9,25 formed by 9612 proteins and 39 128 interactions (named Hsa-HPRD). For all these databases, only the largest component was taken into account while performing the resilience analysis.

Methods

Topological measurements

Protein–protein interaction networks are types of undirected

complex networks formed by a set of N nodes (or vertices)

connected by E edges. A network can be represented by its

adjacency matrix A, whose elements aij and aji are equal to one

whenever there is a connection between the proteins i and j,

and equal to zero otherwise. The topology of protein–protein

interaction networks can be described by a set of network

measurements,26 which can quantify the network connectivity,

presence of cycles and distances between a pair of nodes.


The degree of a protein i, henceforth k_i, is equal to the number of edges connected to that protein and can be calculated as

$$k_i = \sum_j a_{ij} = \sum_j a_{ji}. \tag{1}$$

The average degree of a network is the average of ki for all

vertices in the network,

$$\langle k \rangle = \frac{1}{N} \sum_i k_i = \frac{1}{N} \sum_{ij} a_{ij}. \tag{2}$$

The average degree quantifies the density of connections. The

degree distribution P(k) yields the probability of finding a

node with degree k in a network.
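As a minimal sketch of eqn (1) and (2) and of the degree distribution P(k), the following Python code computes these quantities from an undirected edge list. The function names and the five-node toy network are hypothetical illustrations and are not taken from the data sets used in this work.

```python
from collections import Counter


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj):
    """k_i = number of edges attached to node i, as in eqn (1)."""
    return {node: len(neigh) for node, neigh in adj.items()}


def average_degree(adj):
    """<k> = (1/N) * sum_i k_i, as in eqn (2)."""
    k = degrees(adj)
    return sum(k.values()) / len(k)


def degree_distribution(adj):
    """P(k): fraction of nodes that have degree k."""
    k = degrees(adj)
    counts = Counter(k.values())
    return {deg: c / len(k) for deg, c in sorted(counts.items())}


# Hypothetical toy network (assumption, for illustration only).
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
toy_adj = build_adjacency(toy_edges)
```

Storing neighbors in sets rather than in a dense matrix is only a convenience for sparse networks; the results are the same as summing the rows of the adjacency matrix A.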

Protein–protein interaction networks present cycles of order three, i.e. triangles formed by three mutually connected vertices. One way to characterize the presence of loops of this order is through the clustering coefficient. The clustering coefficient cc of a protein i is defined as the number of links l_i between the nodes within its neighborhood divided by the number of edges that could possibly exist between them, k_i(k_i − 1)/2.
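The definition above can be sketched in Python as follows. Assigning a coefficient of zero to nodes with fewer than two neighbors is an assumption of this sketch (the text does not state a convention), and the toy network is hypothetical.

```python
def local_clustering(adj, i):
    """cc_i = l_i / (k_i (k_i - 1) / 2), with l_i the links among neighbors of i."""
    neigh = adj[i]
    k = len(neigh)
    if k < 2:
        # Assumed convention: the coefficient is zero when no pair of neighbors exists.
        return 0.0
    # Count each neighbor pair once (u < v) and keep the pairs that are linked.
    links = sum(1 for u in neigh for v in neigh if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Average of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Hypothetical toy network: node 0 is linked to 1, 2 and 3; nodes 1 and 2 are linked.
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
```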

The average neighbor connectivity, r(i), measures the average

degree of the neighbors of each vertex i in the network.27 This

amount is calculated by finding the neighbors of given proteins

and averaging their connectivities. The global respective

measurement is given by averaging the neighbor degrees for

all vertices, represented by r.
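A minimal Python sketch of the average neighbor connectivity is given below, assuming the network is stored as a dictionary of neighbor sets. The function names and the toy network are hypothetical.

```python
def average_neighbor_degree(adj, i):
    """r(i): mean degree of the neighbors of node i."""
    neigh = adj[i]
    if not neigh:
        return 0.0  # assumed convention for isolated nodes
    return sum(len(adj[j]) for j in neigh) / len(neigh)


def global_average_neighbor_degree(adj):
    """r: average of r(i) over all nodes of the network."""
    return sum(average_neighbor_degree(adj, i) for i in adj) / len(adj)


# Hypothetical toy network (assumption, for illustration only).
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
```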

The average shortest path length (sp) is calculated by taking

into account the shortest distance between each pair of vertices

in the network. The diameter of a network corresponds to the

longest length among all shortest paths.
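For unweighted networks, both quantities can be obtained with one breadth-first search per node. The sketch below assumes a connected network with at least two nodes (as when only the largest component is analyzed); the names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest distances (in number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (average shortest path length, diameter) of a connected network."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


# Hypothetical toy network (assumption, for illustration only).
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
```

Each pair of nodes is visited twice (once from each end), which leaves the average and the maximum unchanged.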

The assortativity coefficient, henceforth s, measures the correlation between vertex degrees.28 More specifically, it is obtained by calculating the Pearson correlation coefficient between the degrees of the nodes at the two ends of each edge. If s > 0 the network is assortative; if s < 0, the network is disassortative; for s = 0 there are no correlations between vertex degrees.

While in assortative networks nodes with a similar degree tend

to connect to each other, in disassortative structures the highly

connected nodes tend to be attached to poorly connected

nodes.28
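One way to sketch this coefficient in Python is to compute the Pearson correlation over the degrees found at the two ends of every edge, counting each undirected edge in both directions so that the result is symmetric. This is an illustrative reading of the definition, not necessarily the exact procedure used here; returning zero for networks in which all end-point degrees are equal is an assumption of the sketch.

```python
from math import sqrt


def assortativity(adj):
    """Pearson correlation between the degrees at the two ends of each edge."""
    x, y = [], []
    for u in adj:
        for v in adj[u]:
            # Every undirected edge appears as (u, v) and as (v, u).
            x.append(len(adj[u]))
            y.append(len(adj[v]))
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention when all end-point degrees coincide
    return cov / sqrt(var_x * var_y)


# Hypothetical toy network (assumption, for illustration only).
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
```

For this toy network the coefficient is negative, because the single hub (node 0) is attached only to nodes of lower degree.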

Resilience-related measurements

The resilience of protein networks can be quantified by entropy-related measurements.17,19 The entropy of the degree distribution has proved to be an effective measure of a network's resilience under perturbations.17 This measurement provides an average measure of the network's heterogeneity, since it measures the diversity of the link distribution. The entropy of the degree distribution P(k) can be defined as follows:

$$H = -\sum_k P(k) \log[P(k)]. \tag{3}$$

In the case of protein–protein interaction networks, which present a power-law degree distribution with an exponential cutoff, as the scaling exponent γ increases, the network becomes less heterogeneous and, as a result, a lower entropy value should be observed.
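The entropy of the degree distribution can be sketched in Python as below. The natural logarithm is an assumption of this sketch, since the base of the logarithm is not stated in the text, and the toy networks are hypothetical.

```python
import math
from collections import Counter


def degree_entropy(adj):
    """H = -sum_k P(k) log P(k), with P(k) estimated from the node degrees."""
    counts = Counter(len(neigh) for neigh in adj.values())
    n = len(adj)
    # Natural logarithm assumed; another base only rescales H.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical toy networks (assumption, for illustration only).
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
```

A regular network such as the ring, where every node has the same degree, has zero entropy, whereas the heterogeneous toy network has a positive value.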

The other adopted resilience-related measurement is the dynamic entropy, which is based on the stochastic dynamics of random walks. This particular dynamics has been used for modeling sequential protein activations, such as signal transduction,29 as described in ref. 12. The dynamic entropy of a Markov process characterizes the diversity of possible pathways and is related to the response of the system to perturbations.30 This measurement was used for the characterization of proteins in the networks of the yeast, revealing that proteins with a large contribution to network entropy are preferentially lethal.19 The dynamic entropy is calculated by the following equation:18

$$H_d = -\sum_{ij} \pi_i \, p_{ij} \log(p_{ij}), \tag{4}$$

where p_ij is an element of the transition probability matrix P, given by p_ij = a_ij/k_i, so that each row of P sums to one. That is, each element p_ij represents the probability that a random walker moves from node i to node j in one step. The term π_i represents the components of the stationary distribution, i.e. π is defined as the left-hand eigenvector associated with the largest eigenvalue, 1, of the stochastic matrix P, πP = π. Since the entropy of the degree distribution and the dynamic entropy are well-established measurements,17,19 we used both in the quantification of network resilience.
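Eqn (4) can be sketched in Python under two assumptions that are stated here rather than in the text: the transition matrix is row-normalized (the probability of stepping from i to j is a_ij divided by the degree of i), and the stationary distribution is taken in its known closed form for random walks on undirected networks, π_i = k_i/(2E), instead of being obtained by an eigenvector computation. The natural logarithm and all names are likewise assumptions of this sketch.

```python
import math


def transition_matrix(adj):
    """p[i][j] = a_ij / k_i: probability that a walker at i steps to j."""
    return {i: {j: 1.0 / len(neigh) for j in neigh} for i, neigh in adj.items()}


def stationary_distribution(adj):
    """pi_i = k_i / (2E), valid for an undirected network without isolated nodes."""
    two_e = sum(len(neigh) for neigh in adj.values())
    return {i: len(neigh) / two_e for i, neigh in adj.items()}


def dynamic_entropy(adj):
    """H_d = -sum_ij pi_i p_ij log(p_ij), as in eqn (4)."""
    p = transition_matrix(adj)
    pi = stationary_distribution(adj)
    return -sum(pi[i] * p_ij * math.log(p_ij)
                for i in p for p_ij in p[i].values())


# Hypothetical toy network (assumption, for illustration only).
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
```

With these assumptions the double sum reduces to the sum over i of π_i log(k_i), which offers a quick consistency check on any implementation.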

Discussion

The large-scale organization of protein–protein interaction networks

The cumulative degree distribution for all protein–protein interaction networks follows a power law with an exponential cut-off, which is due to the finite size effect,31 described by

$$P(k) \sim k^{\gamma} e^{-k/k_c}. \tag{5}$$

Fig. 1 shows the cumulative distribution ($P_{\mathrm{cum}}(k) = \int_k^{\infty} P(x)\,dx$) obtained for all eight networks. The values of γ, obtained by using the least squares method, are shown in Table 1.

Fig. 1 Cumulative degree distribution of the protein–protein interaction networks of S. cerevisiae (Sce-CCSB and Sce-Union), C. elegans (Cel-Wi and Cel-BG), D. melanogaster (Dme-BG and Dme-Fl) and H. sapiens (Hsa-BG and Hsa-CCSB).

The clustering coefficient turned out to be relatively small for all networks, indicating that two connected proteins tend not to share the same set of neighbors. In addition, the small values of the average shortest path length indicate that all considered protein networks are small-world, which can be a consequence of the presence of hubs. It is interesting to note that smaller sp values tended to be obtained for more complex species. In fact, the smaller the shortest path, the faster the communication between nodes, an aspect that can be related to resilience.32 The diameter of the networks seems to be preserved across species, varying from 12 to 16. The average neighbor degree also tends to be higher for more complex species, with H. sapiens and D. melanogaster presenting the highest values. All investigated protein networks are disassortative, with smaller coefficients s for the yeast and worm. Similar behavior was observed for the coefficient of the degree distribution: the protein interaction networks of S. cerevisiae and C. elegans have the lowest degree exponents γ, which means that the topologies of these networks are mostly dominated by the highly connected proteins. In the next sections, we verify that this property seems to be fundamental for defining the level of network resilience.

Random failures

We start our dynamic analysis by investigating the effect of random removal of proteins from the networks of the four considered species. This task was carried out without any preferential rule, i.e. all proteins had the same probability of being selected for removal. Fig. 2 presents the variation of the size of the largest component, diameter and average shortest path length for each network. These measurements are those most frequently adopted in the literature for the analysis of resilience (e.g. ref. 11), since they quantify the network's ability to keep its normal behavior under external or internal perturbations. Note that each network measurement was normalized with respect to its original value, i.e. the measurement calculated from the network without perturbations. This approach was adopted because the networks present different numbers of vertices and edges, and the studied measurements depend on the network size and density of connections. The normalization therefore allowed us to investigate only the variation of each measurement while avoiding effects imposed by the network size and connection density. In Fig. 2, it can be observed that all networks presented similar behavior. The main difference occurred for the largest component of S. cerevisiae and C. elegans. Furthermore, the diameter and the average shortest path length of all networks present very small variations. Therefore, the networks of all species exhibit similar resilience against random failures, regardless of the complexity of the organism. Indeed, this is a consequence of the fact that all these networks are scale-free, presenting a degree distribution that follows a power law with an exponential cutoff, as shown in Fig. 1. These results corroborate experimental observations, which showed that a large number of perturbations does not result in any phenotypic variation under a given experimental condition.13,14

Table 1 Set of measurements obtained from the networks of the four considered species: average clustering coefficient (cc), average shortest path length (sp), network diameter (d), average neighbor connectivity (r), assortative coefficient (s), the scaling exponent of the power law (γ), entropy of the degree distribution (H) and dynamic entropy (Hd)

Species                      cc     sp    d   r     s      γ      H     Hd
S. cerevisiae (Sce-CCSB)     0.056  5.35  14  3.08  −0.16  −2.47  2.56  2.47
S. cerevisiae (Sce-Union)    0.057  5.61  14  3.06  −0.11  −2.47  2.59  2.41
C. elegans (Cel-Wi)          0.017  5.72  16  2.71  −0.25  −2.53  2.20  2.37
C. elegans (Cel-BG)          0.060  4.32  13  3.85  −0.17  −2.17  2.55  3.50
D. melanogaster (Dme-BG)     0.046  4.32  11  6.93  −0.04  −1.77  3.88  3.79
D. melanogaster (Dme-Fl)     0.043  4.50  12  5.02  −0.15  −2.05  3.09  3.53
H. sapiens (Hsa-BG)          0.157  4.48  14  6.76  −0.07  −1.96  3.79  3.86
H. sapiens (Hsa-CCSB)        0.106  4.21  14  8.00  −0.04  −1.91  4.09  4.11

Fig. 2 Random fails in protein–protein interaction networks. The variation of the size of largest component (a), diameter (b) and average shortest

path length (c) as a function of the fraction of removed vertices for the protein–protein interaction networks of S. cerevisiae (Sce-CCSB and

Sce-Union), C. elegans (Cel-Wi and Cel-BG), D. melanogaster (Dme-BG and Dme-Fl) and H. sapiens (Hsa-BG and Hsa-CCSB). Each point is an

average over 1000 simulations. The lines are only guides for the eye.


Intentional attack

The resilience of protein–protein interaction networks was also investigated with respect to intentional attacks. In this case, the most connected nodes were removed sequentially, according to their degree, and the respective measurements were calculated. We performed this analysis by removing up to 5% of the total number of vertices and obtained the relative variation of the giant component, network diameter and average shortest path length. As in the case of random failures, the relative variation was obtained as the ratio between the measurement calculated when a fraction f of nodes was removed and the original measurement. Fig. 3 presents the obtained results. Unlike the case of random failures, the structural variation of the networks under attacks depends on the complexity of the organism. In all cases, the more complex an organism, the more resilient it tends to be to intentional attacks. The protein–protein interaction networks of D. melanogaster and H. sapiens are the most robust, while the network of S. cerevisiae is the least resilient. In the next section, we investigate which topological properties are mainly responsible for protein network robustness.
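The removal procedure for both random failures and intentional attacks can be sketched as follows for the relative size of the largest component. Ranking the nodes by their degree in the original network (rather than recomputing degrees after each removal) is an assumption of this sketch, as are the function names, the `strategy` and `seed` parameters and the toy network.

```python
import random


def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def removal_curve(adj, fraction, strategy="attack", seed=0):
    """Largest-component size, relative to the intact network, after each removal."""
    n_remove = int(fraction * len(adj))
    if strategy == "attack":
        # Assumption: hubs are ranked once, by their degree in the original network.
        order = sorted(adj, key=lambda i: len(adj[i]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)  # every node equally likely
    original = largest_component_size(adj, set())
    removed, curve = set(), []
    for node in order[:n_remove]:
        removed.add(node)
        curve.append(largest_component_size(adj, removed) / original)
    return curve


# Hypothetical toy network: hub 0 with five neighbors, plus the link 1-2.
toy_adj = {0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {0}}
```

In this toy network, deleting the hub fragments the network, while deleting any other single node leaves the remaining nodes connected. Averaging the random curve over many seeds would play the role of the averages over simulations reported in the figure captions.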

Relationship between network structure and resilience

Entropy-related measurements, namely the structural17 and dynamic entropy,19 have been used to quantify network robustness. We adopted these measurements in order to identify which properties of networks contribute more decisively to the resilience of the protein–protein interaction networks. Table 1 presents the obtained entropy values for each network. Comparing these values with the curves in Fig. 2 and 3, we can observe a good agreement between them, i.e. more resilient networks tend to have higher dynamic and structural entropies. We therefore considered a set of six measurements and calculated their Pearson correlation coefficients with the entropy of the connectivity distribution and the dynamic entropy. This approach allowed the identification of clear relationships between each structural feature and the overall network robustness. Fig. 4 presents the obtained relationships between the dynamic entropy and the adopted measurements. The correlations between the entropy of the degree distribution and each measurement (r), as well as between the dynamic entropy and each network measurement, are also given in this figure. By comparing the correlation values, it can be observed that the average neighbor degree, r, and the coefficient of the power law, γ, are the measurements that present the highest correlations with the entropies. These two properties potentially contribute in a particularly strong way to resilience. In order to verify the difference between the average neighbor degrees of all considered species, we calculated the distributions of this measurement by taking into account only the removed hubs, which represent 5% of the total number of proteins. Fig. 5 shows the obtained results. The median values for each distribution are also indicated in this figure. We took into account the median instead of the mean because the distributions are not symmetric.33 It can be observed that H. sapiens and D. melanogaster exhibit hubs with larger average neighbor degree than S. cerevisiae and C. elegans. Therefore, our results suggest that this mechanism of reinforcing the connections of the neighbors of hubs is particularly accountable for the higher resilience of more evolved organisms. Indeed, large values of the average neighbor degree of hubs indicate alternative connections between neighbors of such nodes, which favor network resilience. As a result, the high density of connections between nodes connected to hubs tends to minimize the effects of hub removal.
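The correlation analysis described in this section can be illustrated with a plain Pearson coefficient applied to the values listed in Table 1. This is only a sketch of the calculation: the list names are hypothetical, and the numbers it produces are not claimed to reproduce the correlation values reported in Fig. 4.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Values transcribed from Table 1, in the same network order for every list.
avg_neighbor_degree = [3.08, 3.06, 2.71, 3.85, 6.93, 5.02, 6.76, 8.00]  # r
shortest_path = [5.35, 5.61, 5.72, 4.32, 4.32, 4.50, 4.48, 4.21]        # sp
dynamic_entropy = [2.47, 2.41, 2.37, 3.50, 3.79, 3.53, 3.86, 4.11]      # Hd

r_vs_hd = pearson(avg_neighbor_degree, dynamic_entropy)
sp_vs_hd = pearson(shortest_path, dynamic_entropy)
```

For these eight networks, the average neighbor degree is positively correlated with the dynamic entropy and the average shortest path length is negatively correlated with it, in line with the discussion above.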

Although the average neighbor degree and the coefficient of the power
law presented the highest correlations with the entropies, the other
measurements also exhibit significant correlations. The average
shortest path length is strongly and negatively correlated with the
dynamic entropy, indicating that short path lengths favor resilience.
Indeed, a high density of connections among the neighbors of each node
tends to create alternative paths between pairs of nodes, which
reduces the average shortest path length of the network. The network
diameter is less correlated with the entropy measurements than the
average shortest path length. Interestingly, the assortativity
coefficient is not strongly correlated with the resilience-related
measurements, indicating that this feature does not contribute
significantly to resilience. Even so, the smaller this coefficient,
the less robust the network. This result corroborates other works,
which verified that the

Fig. 3 Intentional attacks in protein–protein interaction networks. The variation of the size of the largest component (a), diameter (b) and average
shortest path length (c) as a function of the fraction of removed vertices for the protein–protein interaction networks of S. cerevisiae (Sce-CCSB
and Sce-Union), C. elegans (Cel-Wi and Cel-BG), D. melanogaster (Dme-BG and Dme-Fl) and H. sapiens (Hsa-BG and Hsa-CCSB). Each point is
an average over 1000 simulations. The lines are only guides for the eye.

Published on 04 February 2011. Downloaded by New York University on 26/10/2014 22:01:23.


assortative networks are considerably more robust against the
removal of vertices than disassortative networks.28
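For reference, the assortativity coefficient discussed above can be sketched as the Pearson correlation between the degrees found at the two ends of each edge, following the usual definition for undirected networks.28 The function name and the edge-list representation are illustrative assumptions, and the graph is assumed to be simple (no self-loops or repeated edges).

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean (and variance) by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("assortativity is undefined when all degrees are equal")
    return cov / var


# Toy example (not from the paper): a star is maximally disassortative.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_assortativity(star_edges))  # -1.0
```

Negative values indicate that hubs attach preferentially to low-degree nodes, which is the disassortative case referred to in the text.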

Conclusion

The relationship between structure and dynamics is a fundamental
issue in complex network theory. In order to investigate the behavior
of protein–protein interaction networks under random failures and
intentional attacks, we compared eight networks, corresponding to four
species, namely the yeast S. cerevisiae, the worm C. elegans, the fly
D. melanogaster and H. sapiens. Computer simulations revealed that the
diameter, the average shortest path length and the size of the giant
component vary similarly in all species when their respective networks
are subjected to random removal of proteins. These results are in
agreement with previous experimental observations, which showed that a
large number of perturbations do not result in any phenotypic
variation under a given experimental condition.13,14 On the other
hand, these networks revealed different topological variations when
subjected to intentional attacks. Specifically, the networks of
H. sapiens and D. melanogaster are more robust under hub removal than
the networks of S. cerevisiae and C. elegans. Thus, more complex
species tend to have higher resilience against intentional attacks on
their respective networks.
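The two removal scenarios compared above can be sketched as follows. This is only an illustration under stated assumptions: intentional attacks remove nodes in decreasing order of their initial degree (degrees are not recomputed after each removal, which may differ from the procedure used in the simulations), random failures remove nodes in a uniformly random order, and only the size of the largest component is tracked. All names and the toy graph are hypothetical.

```python
import random


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best


def removal_curve(adj, n_remove, targeted, rng=None):
    """Largest-component size after removing 0, 1, ..., n_remove nodes."""
    if targeted:
        # Intentional attack: highest initial degree first (degrees are not recomputed).
        order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    else:
        # Random failure: uniformly random removal order.
        order = list(adj)
        (rng or random).shuffle(order)
    return [largest_component_size(adj, order[:k]) for k in range(n_remove + 1)]


# Toy graph (not from the paper): two stars joined through their hubs 0 and 5.
toy = {0: {1, 2, 3, 4, 5}, 5: {0, 6, 7, 8, 9}}
for hub, leaves in ((0, (1, 2, 3, 4)), (5, (6, 7, 8, 9))):
    for leaf in leaves:
        toy[leaf] = {hub}

print(removal_curve(toy, 2, targeted=True))  # [10, 5, 1]
print(removal_curve(toy, 2, targeted=False, rng=random.Random(0)))
```

In practice the random curve would be averaged over many independent runs, as is done for each point of the simulations reported here.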

The relationship between resilience and network organization was
analyzed by taking into account robustness-related measurements
(namely the entropy of the degree distribution and the dynamic entropy
based on the Markov process) and topological measurements. We
calculated the correlation between six structural measurements and
these entropy-related quantities. We verified that two topological
features contributed significantly to the resilience, namely the
density of hubs, quantified by the coefficient of the power law degree
distribution, and the average neighbor degree. In particular, the
analysis of the distribution of the average neighbor degree of the
removed hubs for all species shows that this measurement is higher for
more complex species. This indicates that the presence of alternative
paths between the proteins

Fig. 4 The obtained correlations between the entropy of the degree distribution and each measurement are indicated by r. The points in black are

related to the dynamic entropy Hd, and those in gray to the entropy of the degree distribution H. The structural measurements are the average

clustering coefficient cc, the average shortest path length sp, the network diameter d, the average neighbor degree r, the assortativity coefficient s

and the scale coefficient of the power law degree distribution, g. P-values for testing the hypothesis of no correlation against the alternative that

there is a nonzero correlation are indicated by the numbers in brackets.

Fig. 5 Distribution of the average neighbor degree for the removed

hubs of S. cerevisiae (Sce-CCSB and Sce-Union), C. elegans (Cel-Wi

and Cel-BG), D. melanogaster (Dme-BG and Dme-Fl) and H. sapiens

(Hsa-BG and Hsa-CCSB). The median of each distribution, m, is also

indicated in the respective plots.


connected to hubs reinforces resilience. The remaining four
measurements studied also contributed to resilience, but less
decisively.
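As a rough illustration of the two entropy-related quantities mentioned above, the sketch below computes the Shannon entropy of the degree distribution and, as a stand-in for the dynamic entropy, the entropy rate of an unbiased random walk on the network. The random-walk choice is an assumption made only for this example: the Markov process adopted in the paper may use different transition probabilities, in which case the values would differ. The function names and toy graphs are likewise hypothetical.

```python
from collections import Counter
from math import log


def degree_entropy(adj):
    """Shannon entropy H = -sum_k P(k) log P(k) of the degree distribution."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return -sum((c / n) * log(c / n) for c in counts.values())


def random_walk_entropy(adj):
    """Entropy rate of an unbiased random walk on a connected undirected graph.

    With transition probabilities p_ij = 1/k_i and stationary distribution
    pi_i = k_i / (2m), the rate -sum_i pi_i sum_j p_ij log p_ij reduces to
    sum_i (k_i / 2m) log k_i.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return sum(len(nbrs) / two_m * log(len(nbrs)) for nbrs in adj.values() if nbrs)


# Toy graphs (not from the paper): a 6-node ring and a star with four leaves.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

for name, graph in (("ring", ring), ("star", star)):
    print(name, degree_entropy(graph), random_walk_entropy(graph))
```

In the ring every node has the same degree, so the degree entropy vanishes, whereas the star has a heterogeneous degree distribution and therefore a positive degree entropy.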

The results presented in this paper are fundamental for understanding
the relationship between the large-scale organization of
protein–protein interaction networks and their respective biological
functions. The role investigated here, i.e. resilience, is related to
biological evolution and organism organization.34 Thus, identifying
which role is favored during evolution can help the development of
computational models of protein network evolution, such as that of
ref. 35, and clarify which mechanisms are used during natural
evolution to produce robust architectures. Further investigation can
be performed by taking into account other topological measurements and
other removal dynamics, such as the deletion of nodes belonging to a
given class of proteins. Edge removal can also be investigated by a
similar approach. Other measurements for quantifying the resilience of
protein interaction networks can also be developed. The application of
resilience analysis to metabolic and transcriptional regulatory
networks constitutes another promising line of investigation.

Acknowledgements

Luciano da F. Costa thanks CNPq (301303/06-1) and FAPESP

(05/00587-5) for sponsorship. Andre L. Barbieri thanks CAPES

for the financial support.

References

1 A.-L. Barabasi and Z. N. Oltvai, Nat. Rev. Genet., 2004, 5, 101.
2 E. M. Marcotte, M. Pellegrini, H. Ng, D. W. Rice, T. O. Yeates and D. Eisenberg, Science, 1999, 285, 751.
3 L. d. F. Costa, F. A. Rodrigues and A. S. Cristino, Genet. Mol. Biol., 2008, 31, 591.
4 B. Schwikowski, P. Uetz and S. Fields, Nat. Biotechnol., 2000, 18, 1257.
5 A. Vazquez, A. Flammini, A. Maritan and A. Vespignani, Nat. Biotechnol., 2003, 21, 697.
6 R. Sharan, I. Ulitsky and R. Shamir, Mol. Syst. Biol., 2007, 3.
7 P. Erdos and A. Renyi, Publ. Math., 1959, 6, 290.
8 R. Albert and A. L. Barabasi, Rev. Mod. Phys., 2002, 74, 47–97.
9 H. Jeong, S. P. Mason, A.-L. Barabasi and Z. N. Oltvai, Nature, 2001, 411, 41.
10 L. d. F. Costa, F. A. Rodrigues and G. Travieso, Appl. Phys. Lett., 2006, 89, 41–42.
11 S. Boccaletti, V. Latora, Y. Moreno, M. Chaves and D.-U. Hwang, Phys. Rep., 2006, 424, 175.
12 F. A. Rodrigues and L. F. Costa, Mol. BioSyst., 2009, 5, 385–390.
13 G. Giaever, A. Chu, L. Ni, C. Connelly, L. Riles, S. Veronneau, S. Dow, A. Lucau-Danila, K. Anderson and B. Andre, et al., Nature, 2002, 418, 387.
14 R. Kamath, A. Fraser, Y. Dong, G. Poulin, R. Durbin, M. Gotta, A. Kanapin, N. Le Bot, S. Moreno and M. Sohrmann, et al., Nature, 2003, 421, 231.
15 R. Albert, H. Jeong and A.-L. Barabasi, Nature, 2000, 406, 378.
16 D. Li, J. Li, S. Ouyang, J. Wang, S. Wu, P. Wan, Y. Zhu, X. Xu and F. He, Proteomics, 2006, 6, 456.
17 B. Wang, H. Tang, C. Guo and Z. Xiu, Phys. A, 2006, 363, 591.
18 L. Demetrius and T. Manke, Phys. A, 2005, 346, 682.
19 T. Manke, L. Demetrius and M. Vingron, J. R. Soc. Interface, 2006, 3, 843.
20 H. Yu, P. Braun, M. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li and N. Simonis, et al., Science, 2008, 322, 104.
21 T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori and Y. Sakaki, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 4569.
22 S. Li, C. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P. Vidalain, J. Han, A. Chesneau and T. Hao, et al., Science, 2004, 303, 540.
23 C. Stark, B. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz and M. Tyers, Nucleic Acids Res., 2006, 34, D535.
24 J. Yu, S. Pacifico, G. Liu and R. Finley, BMC Genomics, 2008, 9, 461.
25 G. Mishra, M. Suresh, K. Kumaran, N. Kannabiran, S. Suresh, P. Bala, K. Shivakumar, N. Anuradha, R. Reddy and T. Raghavan, et al., Nucleic Acids Res., 2006, 34, D411.
26 L. da F. Costa, F. A. Rodrigues, G. Travieso and P. R. V. Boas, Adv. Phys., 2007, 56, 167.
27 R. Pastor-Satorras, A. Vazquez and A. Vespignani, Phys. Rev. Lett., 2001, 87, 258701.
28 M. E. J. Newman, Phys. Rev. Lett., 2002, 89, 208701.
29 A. J. Shaywitz, S. L. Dove, M. E. Greenberg and A. Hochschild, Science, 2002, 2002, DOI: 10.1126/stke.2002.142.pl11.
30 P. Billingsley, Ergodic theory and information, Wiley, New York, 1965.
31 M. E. J. Newman, Contemp. Phys., 2005, 46, 323.
32 A. Rives and T. Galitski, Proc. Natl. Acad. Sci. U. S. A., 2003, 100, 1128.
33 G. Box, J. Hunter and W. Hunter, Statistics for Experimenters. Design, Innovation, and Discovery, Wiley Series in Probability and Statistics, 2005.
34 H. Kitano, Nat. Rev. Genet., 2004, 5, 826.
35 R. V. Sole, R. Pastor-Satorras, E. Smith and T. B. Kepler, Adv. Complex Syst., 2002, 5, 43.
