Modular Organization of Protein Interaction Network Feng Luo, Ph.D. Department of Computer Science Clemson University
Date posted: 20-Jan-2016

Page 1:

Modular Organization of Protein Interaction Network

Feng Luo, Ph.D., Department of Computer Science, Clemson University

Page 2:

Outline

Background.

Network module definition.

Algorithm for identifying modules in a network.

Page 3:

Biological Networks

Biological Systems

Made of many non-identical elements connected by diverse interactions.

Biological Networks

Biological networks as framework for the study of biological systems

Page 4:

Protein Interaction Network

Nodes: proteins. Links: physical interactions (Jeong et al., 2001).

Page 5:

Metabolic Network

Nodes: chemicals (substrates). Links: chemical reactions (Ravasz et al., 2002).

Page 6:

Biological Systems are Modular

There is increasing evidence that the cell system is composed of modules.

A “module” in a biological system is a discrete unit whose function is separable from those of other modules.

Modules defined based on functional criteria reflect the critical level of biological organization (Hartwell et al.).

A modular system can reuse existing, well-tested modules.

Functional modules will be reflected in the topological structures of biological networks.

Identifying functional modules and their relationships from biological networks will help in understanding the organization, evolution, and interaction of the cellular systems they represent.

Page 7:

Biological Modules in Biological Networks

[Figure: example network with three modules labeled 1, 2, and 3]

Page 8:

Background: Identify Modules from Biological Networks

Most efforts focused on detecting highly connected clusters.

Peripheral proteins are ignored.

Modules with other topologies are not identified.

Modules are isolated, and no interrelationship between them is revealed.

Page 9:

Background: Identify Modules from Biological Networks (continued)

Traditional clustering algorithms have been applied to protein interaction networks (PIN) to find biological modules.

These algorithms require transforming the PIN into a weighted network:

Weight the protein interactions based on the number of experiments that support the interaction (Pereira-Leal et al.).

Weight with shortest path length (Rives et al. and Arnau et al.).

Drawbacks:

Weights are artificial.

“Tie in proximity” problem in hierarchical agglomerative clustering (HAC).

Page 10:

Background: Identify Modules from Biological Networks (continued)

Radicchi et al. (PNAS, 2004) proposed two new definitions of a module in a network.

For a subgraph $V \subset G$, the degree of a vertex $i \in V$ in an undirected graph splits into an inside part and an outside part:

$$k_i^{in}(V) = \sum_{j \in V} A_{i,j}, \qquad k_i^{out}(V) = \sum_{j \notin V} A_{i,j}$$

where $A_{i,j}$ is equal to 1 if i and j are directly connected; it is equal to zero otherwise.

Strong definition of module:

$$k_i^{in}(V) > k_i^{out}(V), \quad \forall i \in V$$

Weak definition of module:

$$\sum_{i \in V} k_i^{in}(V) > \sum_{i \in V} k_i^{out}(V)$$
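The two Radicchi et al. definitions can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary from each vertex to its set of neighbours; the function names and the toy graph are illustrative and do not come from the slides.

```python
def in_out_degrees(adj, members):
    """Return {i: (k_in, k_out)} for every vertex i in `members`.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    members = set(members)
    degrees = {}
    for i in members:
        k_in = len(adj[i] & members)   # neighbours inside V
        k_out = len(adj[i] - members)  # neighbours outside V
        degrees[i] = (k_in, k_out)
    return degrees


def is_strong_module(adj, members):
    """Strong definition: k_in > k_out for every vertex of V."""
    return all(k_in > k_out for k_in, k_out in in_out_degrees(adj, members).values())


def is_weak_module(adj, members):
    """Weak definition: the sum of k_in over V exceeds the sum of k_out over V."""
    degrees = list(in_out_degrees(adj, members).values())
    return sum(k_in for k_in, _ in degrees) > sum(k_out for _, k_out in degrees)


# Hypothetical toy graph: a triangle a-b-c, with c also linked to d and e.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
triangle = {"a", "b", "c"}
print(is_strong_module(toy_adj, triangle))  # False: c has k_in = 2 and k_out = 2
print(is_weak_module(toy_adj, triangle))    # True: total k_in = 6, total k_out = 2
```

In this toy case the triangle passes the weak test but fails the strong one, because a single boundary vertex is enough to violate the strong condition.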

Page 11:

Background: Identify Modules from Biological Networks (continued)

The two module definitions do not exactly follow the intuitive concept of a module.

Page 12:

Summary of our work

A new formal definition of network modules.

A new agglomerative algorithm for assembling modules.

Application to a yeast protein interaction dataset.

Page 13:

Degree of Subgraph

Given a graph G, let S be a subgraph of G ($S \subset G$).

The adjacency matrix of subgraph S and its neighbors N can be given as:

$$S_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are connected and either } i \text{ or } j \text{ belongs to } S \\ 0 & \text{otherwise} \end{cases}$$

Indegree of S, Ind(S):

$$\mathrm{ind}(S) = \sum_{i,j} S_{ij}\,\theta(i,j)$$

where $\theta(i,j)$ is 1 if both vertex i and vertex j are in subgraph S and 0 otherwise.

Outdegree of S, Outd(S):

$$\mathrm{outd}(S) = \sum_{i,j} S_{ij}\,\lambda(i,j)$$

where $\lambda(i,j)$ is 1 if only one of vertex i and vertex j belongs to subgraph S and 0 otherwise.
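The indegree and outdegree of a subgraph amount to counting edges by how many of their endpoints lie in S. A minimal sketch, assuming the graph is given as a list of undirected edges and that each edge is counted once (the slides do not say whether the sum runs over ordered or unordered pairs); `subgraph_degrees` and the toy edges are hypothetical names for illustration.

```python
def subgraph_degrees(edges, members):
    """Return (ind, outd) for the subgraph induced by `members`.

    Assumption of this sketch: each undirected edge is counted once.
    """
    members = set(members)
    ind = outd = 0
    for u, v in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            ind += 1   # both endpoints in S
        elif inside == 1:
            outd += 1  # exactly one endpoint in S
    return ind, outd


# Hypothetical toy graph: triangle a-b-c attached to triangle c-d-e.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
print(subgraph_degrees(toy_edges, {"a", "b", "c"}))  # (3, 2)
```

Counting every edge twice instead (once per ordered pair) would double both values and leave their ratio unchanged.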

Page 14:

Degree of Subgraph: Example

[Figure: example network with three subgraphs labeled 1, 2, and 3]

Ind(1) = 16, Outd(1) = 5

Ind(2) = 7, Outd(2) = 4

Ind(3) = 8, Outd(3) = 5

Page 15:

Modularity

The modularity M of a subgraph S in a given graph G is defined as the ratio of its indegree, ind(S), to its outdegree, outd(S):

$$M = \frac{\mathrm{ind}(S)}{\mathrm{outd}(S)}$$

Page 16:

New Network Module Definition

A subgraph $S \subset G$ is a module if M > 1.

[Figure: example network with three subgraphs labeled 1, 2, and 3]

Ind(1) = 16, Outd(1) = 5, M = 3.2

Ind(2) = 7, Outd(2) = 4, M = 1.75

Ind(3) = 8, Outd(3) = 5, M = 1.6
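The three M values on this slide follow directly from the ratio of indegree to outdegree. A small sketch that redoes the arithmetic; the function names are illustrative, and the handling of a zero outdegree is an assumption of this sketch, since the slides do not cover that case.

```python
def modularity(ind, outd):
    """M = ind / outd. A zero outdegree is treated as infinite M (assumption)."""
    if outd == 0:
        return float("inf")
    return ind / outd


def is_module(ind, outd):
    """A subgraph is a module when M > 1."""
    return modularity(ind, outd) > 1


# (ind, outd) pairs read off the slide for subgraphs 1, 2, and 3.
slide_values = {1: (16, 5), 2: (7, 4), 3: (8, 5)}
for label, (ind, outd) in slide_values.items():
    print(label, modularity(ind, outd), is_module(ind, outd))
# 1 3.2 True
# 2 1.75 True
# 3 1.6 True
```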

Page 17:

Comparison to Radicchi’s Module Definitions

This sample network is a strong module, but it is not a module under the new definition, which is based on the indegree vs. outdegree criterion.

Page 18:

Agglomerative Algorithm for Identifying Network Modules

Flow chart of the agglomerative algorithm

Page 19:

The Order of Merging

Edge betweenness (Girvan-Newman, 2002):

Defined as the number of shortest paths between all pairs of vertices that run through the edge.

Edges between modules have higher betweenness values.

[Figure: example edge with betweenness = 20]
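Edge betweenness can be computed straight from its definition. The sketch below is one simple way to do it, assuming an unweighted, undirected graph stored as a neighbour dictionary: it counts shortest paths from every vertex with breadth-first search, then, for each vertex pair, credits each edge with the fraction of that pair's shortest paths passing through it (the usual convention when a pair has several shortest paths). Function names and the toy graph are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, paths): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]
    return dist, paths


def edge_betweenness(adj):
    """Return {(u, v): betweenness} with each edge keyed by its sorted endpoints."""
    info = {v: bfs_counts(adj, v) for v in adj}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    score = {e: 0.0 for e in edges}
    for s, t in combinations(sorted(adj), 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for u, v in edges:
            # An edge is on a shortest s-t path in at most one orientation.
            for a, b in ((u, v), (v, u)):
                if a in dist_s and b in dist_t and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    score[(u, v)] += paths_s[a] * paths_t[b] / paths_s[t]
    return score


# Hypothetical toy graph: two triangles joined by the bridge c-d.
toy_adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = edge_betweenness(toy_adj)
print(scores[("c", "d")])  # 9.0: all 3 x 3 cross-triangle pairs use the bridge
print(scores[("a", "b")])  # 1.0: only the pair (a, b) itself
```

The bridge between the two triangles gets the highest score, which matches the slide's point that edges between modules have higher betweenness.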

Page 20:

The Order of Merging (continued)

Gradually deleting the edge with the highest betweenness will generate an ordering of the edges.

Edges between modules will be deleted earlier.

Edges inside modules will be deleted later.

Reverse the deletion order of edges and use it as the merging order.
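The deletion-then-reversal procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes betweenness is recomputed after every deletion (as in the Girvan-Newman procedure), uses a Brandes-style accumulation to compute it, and breaks ties by endpoint name, which is an arbitrary choice made here. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an undirected, unweighted graph."""
    score = {tuple(sorted((u, v))): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, paths = {source: 0}, {source: 1}
        preds = {v: [] for v in adj}
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    paths[w] = 0
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    paths[w] += paths[u]
                    preds[w].append(u)
        # Walk back from the farthest vertices, sharing credit among predecessors.
        credit = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                share = paths[u] / paths[w] * (1.0 + credit[w])
                score[tuple(sorted((u, w)))] += share
                credit[u] += share
    return score  # each pair is counted from both ends; the ranking is unaffected


def merging_order(edges):
    """Delete the highest-betweenness edge repeatedly, then reverse the order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deleted = []
    while any(adj.values()):
        score = edge_betweenness(adj)
        # Ties are broken by endpoint names, an arbitrary choice made here.
        edge = max(score, key=lambda e: (score[e], e))
        adj[edge[0]].discard(edge[1])
        adj[edge[1]].discard(edge[0])
        deleted.append(edge)
    return deleted[::-1]


# Hypothetical toy graph: two triangles joined by the bridge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
order = merging_order(toy_edges)
print(order[-1])  # ('c', 'd'): the bridge is deleted first, so it is merged last
```

Because the bridge is deleted first, it appears last in the merging order, so the vertices inside each triangle are brought together before the two triangles are considered for merging.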

Page 21:

When Does Merging Occur?

Between two non-modules.

Between a non-module and a module.

Not between two modules.
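The merging rule can be turned into a short agglomerative loop. This is a sketch under stated assumptions rather than the algorithm from the flow chart: every vertex starts as its own group, edges are visited in a given merging order, module status is tested as ind > outd (equivalent to M > 1 whenever outd is positive), and two groups are joined unless both are already modules. The names `assemble`, `is_module`, and the hand-written merging order are hypothetical.

```python
def subgraph_degrees(edges, members):
    """Return (ind, outd), counting each undirected edge once (assumption)."""
    ind = outd = 0
    for u, v in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            ind += 1
        elif inside == 1:
            outd += 1
    return ind, outd


def is_module(edges, members):
    """M > 1, written as ind > outd so that outd == 0 needs no special case."""
    ind, outd = subgraph_degrees(edges, members)
    return ind > outd


def assemble(edges, merge_order):
    """Merge groups along `merge_order`, never joining two modules."""
    group = {v: frozenset([v]) for edge in edges for v in edge}
    for u, v in merge_order:
        left, right = group[u], group[v]
        if left == right:
            continue  # already in the same group
        if is_module(edges, left) and is_module(edges, right):
            continue  # rule: no merging between two modules
        merged = left | right
        for x in merged:
            group[x] = merged
    return set(group.values())


# Hypothetical toy graph: two triangles joined by the bridge c-d,
# with a hand-written merging order that visits the bridge last.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
toy_order = [("a", "b"), ("a", "c"), ("b", "c"),
             ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
result = assemble(toy_edges, toy_order)
print(sorted(sorted(g) for g in result))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

By the time the bridge is reached, each triangle has ind = 3 and outd = 1, so both are modules and the final merge is refused.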

Page 22:

Testing Data Set

Yeast core protein interaction network (PIN).

The yeast core PIN from the Database of Interacting Proteins (DIP) (version ScereCR20041003).

Total: 2609 proteins; 6355 links.

Large component: 2440 proteins, 6401 interactions.

Page 23:

86 Modules Obtained from DIP Yeast core PIN

Page 24:

Robustness of Modules

Page 25:

Robustness of Modules

Page 26:

Validation of modules

Annotated each protein with Gene Ontology™ (GO) terms from the Saccharomyces Genome Database (SGD) (Cherry et al., 1998; Balakrishna et al.).

Quantified the co-occurrence of GO terms using the hypergeometric distribution analysis supported by the Gene Ontology Term Finder of SGD (Balakrishna et al.).

The results show that each module has statistically significant co-occurrence of bioprocess GO categories.
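The hypergeometric test asks how likely it is that a module of n genes contains at least k genes annotated with a term, when K of the N genes in the genome carry that term. A minimal sketch of the upper-tail probability using only the standard library; it illustrates the statistic itself and does not model any multiple-testing correction that the GO Term Finder may apply, so it is not expected to reproduce the reported probabilities. The function name and the toy numbers are illustrative.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / total


# Hypothetical toy numbers: 20 genes, 5 annotated, a module of 4 with 2 annotated.
p = hypergeom_pvalue(k=2, n=4, K=5, N=20)
print(round(p, 4))  # 0.2487
```

The smaller this probability, the less plausible it is that the shared annotation arose by chance, which is why the slides report it on a -log10 scale.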

Page 27:

Validation of modules

Modules with 100% GO frequency

| Module # | GOID | GO term | Frequency | Genome frequency | Probability |
|---|---|---|---|---|---|
| 134 | 45851 | pH reduction | 14 out of 14 genes, 100% | 21 out of 7274 | 2.79E-36 |
| 140 | 6402 | mRNA catabolism | 14 out of 14 genes, 100% | 55 out of 7274 | 1.99E-30 |
| 23 | 6267 | pre-replicative complex formation and maintenance | 7 out of 7 genes, 100% | 13 out of 7272 | 5.83E-20 |
| 99 | 6617 | SRP-dependent cotranslational protein-membrane targeting, signal sequence recognition | 6 out of 6 genes, 100% | 7 out of 7274 | 7.94E-19 |
| 109 | 6207 | 'de novo' pyrimidine base biosynthesis | 5 out of 5 genes, 100% | 5 out of 7274 | 1.53E-16 |
| 54 | 42147 | retrograde transport, endosome to Golgi | 5 out of 5 genes, 100% | 10 out of 7272 | 4.91E-15 |
| 108 | 6303 | double-strand break repair via nonhomologous end-joining | 5 out of 5 genes, 100% | 19 out of 7274 | 1.21E-13 |
| 96 | 96 | sulfur amino acid metabolism | 5 out of 5 genes, 100% | 31 out of 7274 | 1.40E-12 |
| 55 | 6896 | Golgi to vacuole transport | 4 out of 4 genes, 100% | 18 out of 7272 | 3.75E-11 |
| 84 | 6109 | regulation of carbohydrate metabolism | 4 out of 4 genes, 100% | 26 out of 7274 | 1.63E-10 |

Page 28:

Validation of modules

Most significant GO term in top 10 largest modules

| Module # | Module size | GOID | GO term | Frequency | Genome frequency | Probability |
|---|---|---|---|---|---|---|
| 202 | 201 | 6913 | nucleocytoplasmic transport | 62 out of 201 genes, 30.8% | 105 out of 7274 | 5.48E-63 |
| 199 | 111 | 30163 | protein catabolism | 46 out of 111 genes, 41.4% | 175 out of 7274 | 2.85E-44 |
| 193 | 93 | 16071 | mRNA metabolism | 58 out of 93 genes, 62.3% | 184 out of 7274 | 4.69E-68 |
| 189 | 76 | 7028 | cytoplasm organization and biogenesis | 56 out of 76 genes, 73.6% | 250 out of 7274 | 5.81E-65 |
| 187 | 59 | 30036 | actin cytoskeleton organization and biogenesis | 31 out of 59 genes, 52.5% | 101 out of 7274 | 9.93E-42 |
| 182 | 50 | 6366 | transcription from RNA polymerase II promoter | 34 out of 50 genes, 68% | 270 out of 7274 | 6.35E-37 |
| 185 | 45 | 16573 | histone acetylation | 17 out of 45 genes, 37.7% | 28 out of 7274 | 8.90E-30 |
| 188 | 45 | 6364 | rRNA processing | 34 out of 45 genes, 75.5% | 175 out of 7274 | 7.18E-46 |
| 175 | 44 | 48193 | Golgi vesicle transport | 36 out of 44 genes, 81.8% | 137 out of 7274 | 1.20E-54 |
| 194 | 42 | 6338 | chromatin remodeling | 18 out of 42 genes, 42.8% | 128 out of 7274 | 6.18E-21 |

Page 29:

Validation of modules

Comparison with the module definitions of Radicchi et al., running the agglomerative algorithm based on the different definitions.

| Definition | Average lowest P value (-log10) | Number of modules (larger than 3) |
|---|---|---|
| Our | 16.77497 | 86 |
| Weak | 12.28661 | 157 |
| Strong | 13.5531 | 33 |

[Figure: bar chart of average P values (-log10) for the Our, Weak, and Strong definitions]

Page 30:

Validation of modules

[Figure: modules in each P value (-log10) bin (%), for bins 0-5 through 65-70, plotted for the Weak, our, and Strong definitions]

Page 31:

Validation of modules

P values of modules obtained with our definition, plotted against P values of the corresponding weak modules (the line is y = x).

[Figure: scatter plot; x-axis: P value (-log10) of our modules; y-axis: P value (-log10) of corresponding weak modules; both axes from 0 to 70]

Page 32:

Constructing the Network of Modules

Assembling the 86 MoNet modules to form an interconnected network of modules.

For each adjacent module pair, the edge deleted last by the G-N algorithm was selected, from all the edges that connect the two modules, to represent the link between them.

[Figure: two diagrams of modules 1, 2, and 3]
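Choosing one representative link per pair of adjacent modules can be sketched as a single pass over the G-N deletion order. This is an illustration under assumptions: `deletion_order` lists edges in the order they were deleted, `module_of` maps each vertex to a module label, and vertices missing from `module_of` belong to no module. These names and the toy data are hypothetical.

```python
def module_network(deletion_order, module_of):
    """Return {(module_a, module_b): edge}, keeping the edge deleted last per pair."""
    links = {}
    for u, v in deletion_order:
        mu, mv = module_of.get(u), module_of.get(v)
        if mu is None or mv is None or mu == mv:
            continue  # edge is inside one module or touches an unassigned vertex
        pair = tuple(sorted((mu, mv)))
        links[pair] = (u, v)  # later deletions overwrite earlier ones
    return links


# Hypothetical toy data: three modules and four edges in deletion order.
toy_modules = {"a": 1, "b": 1, "d": 2, "e": 2, "g": 3}
toy_deletions = [("a", "d"), ("b", "e"), ("e", "g"), ("a", "b")]
print(module_network(toy_deletions, toy_modules))
# {(1, 2): ('b', 'e'), (2, 3): ('e', 'g')}
```

Modules 1 and 2 are joined by two edges in the toy data, and only the one deleted later, b-e, is kept as their link.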

Page 33:

A Section of Module Network of 30 Largest Modules

Page 34:

Conclusions

Provide a framework for decomposing the protein interaction network into functional modules.

The modules obtained appear to be biological functional modules, based on clustering of Gene Ontology terms.

The network of modules provides a plausible way to understand the interactions between these functional modules.

With the increasing amounts of protein interaction data available, our approach will help construct a more complete view of interconnected functional modules to better understand the organization of the whole cellular system.

Page 35:

Questions?

Page 36:

Limitation of Global Algorithms

Biological networks are incomplete.

Each vertex can only belong to one module.

Page 37:

Local Optimization Algorithm

Page 38:

139 Modules Obtained from DIP Yeast core PIN

Page 39:

Example of Module Overlap

Page 40:

Interconnected Module Network