
A Method for Protein Functional Flow Configuration and Validation

Woo-Hyuk Jang 1, Suk-Hoon Jung 1, Dong-Soo Han 1

toraim@icu.ac.kr jsh@icu.ac.kr dshan@icu.ac.kr

1 School of Engineering, Information and Communications University, 119 Munjiro, Yuseong-gu, Daejeon, 305-714, Korea

ABSTRACT

Background

Early studies of protein-protein interaction (PPI) prediction relied mainly on a few features of a protein (e.g., domain frequency in the interacting protein pair), so they suffered from low prediction accuracy. More recent research increasingly considers the physicochemical properties of proteins and produces high-resolution results by integrating experimental data. Given this trend, a complete PPI network for each organism is likely to be available in the near future. Signal transduction is the process by which a cell changes in response to an external stimulus, and it plays an important role in most fundamental cellular processes. Most signal transduction is initiated by an extracellular signal; ligand-receptor binding is then followed by a cascade of intracellular activities such as protein phosphorylation and dephosphorylation, PPI, and protein-small molecule interaction. Because a signal transduction pathway is a cascade of protein activities, identifying the participants in a specific signal transduction and their relationships from the whole PPI network is essential. However, even with a completely configured PPI network, extracting a signal transduction pathway is difficult, because a PPI network is only a set of co-expressed proteins or gene products, and a network link denotes simple physical binding rather than in-depth knowledge of a biological process. Consequently, most signal transduction pathways have so far been discovered manually. To overcome this problem, we suggest a protein functional flow model that can serve as a constraint for extracting meaningful sub-paths from the whole PPI network. The suggested model is a directed network based on the relations among protein functions in signal transduction pathways.

With PPI databases growing explosively, computational prediction and configuration of PPI networks has become a major stream in bioinformatics. Recent research increasingly considers the physicochemical properties of proteins and produces high-resolution results by integrating experimental data. Given this trend, a complete PPI network for each organism is likely to be available in the near future. However, applying a PPI network directly to practical problems is difficult, because a PPI network is only a set of co-expressed proteins or gene products, and a network link denotes simple physical binding rather than in-depth knowledge of a biological process. In this paper, we suggest a protein functional flow model, a directed network based on the relations among protein functions in signal transduction pathways. Each vertex of the model is a molecular function annotated by Gene Ontology (GO), and the relations among the vertices are the edges. The model therefore makes it easy to trace the transition of a specific function, and it can serve as a constraint for extracting a meaningful sub-path from the whole PPI network. To evaluate the model, 11 functional flow models of Homo sapiens were built from KEGG, and Cronbach's alpha values were measured (average alpha = 0.67). Among 1023 functional flows, 765 showed alpha values of 0.6 or higher.

Motivation & Related Work

▣ A conventional PPI network cannot be applied directly to practical tasks such as signal transduction pathway prediction or metabolic pathway prediction, because it lacks in-depth biological knowledge annotations.

▣ Researchers' focus is moving from inspecting the interaction possibility of a single protein pair to extracting meaningful sub-networks from the whole protein interaction network.

Figure 1. To meet the needs of applied areas, researchers' focus is moving from single protein pairs to functionally related sub-networks. In this situation, it is essential to develop a reference model or rules for navigating the paths.

▣ Protein pairs show specific functional patterns on the PPI network.

▣ Interacting genes were correlated with GO annotations: ~12% of interacting genes had exactly the same annotations, and 27% had very similar annotations [1].

▣ Functional patterns were found in the PPI network and compared with random patterns with respect to MIPS and KEGG, respectively [2].

▣ Functional templates were modeled by abstracting enzyme functions [3].

Figure 2. Gene Ontology (GO) annotations are related hierarchically, so a function can be replaced by its parent function. Using this strategy, the authors of [3] derive the most general functional template, the Pathway Functionality Template (PFT).
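The parent-replacement idea can be sketched in a few lines. This is only an illustration and not the procedure of [3]: the `PARENT` map is a small hand-written fragment of the GO hierarchy assumed for the example (it is not loaded from the real ontology), and `generalize` and `generalize_template` are hypothetical helper names.

```python
# Assumed toy fragment of the GO "is_a" hierarchy (child -> parent).
# A real implementation would read these links from the ontology itself.
PARENT = {
    "GO:0004888": "GO:0004872",  # transmembrane receptor activity -> receptor activity
    "GO:0005113": "GO:0005102",  # patched binding -> receptor binding (assumed)
    "GO:0005102": "GO:0005515",  # receptor binding -> protein binding (assumed)
}


def generalize(term, levels=1):
    """Replace a GO term by its ancestor `levels` steps up.

    Stops early when the term has no parent in the toy map.
    """
    for _ in range(levels):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term


def generalize_template(template, levels=1):
    """Generalize every function in a template (a list of GO terms)."""
    return [generalize(term, levels) for term in template]


template = ["GO:0004888", "GO:0005113"]
one_level_up = generalize_template(template)
```

Applying `generalize_template` repeatedly makes a template match more pathways at the cost of specificity, which is the trade-off behind a "most general" template.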

Functional Flow

▣ Concept 1: Meaningful protein interaction paths, such as signal transduction pathways, contain functional flow patterns.

▣ Concept 2: If a protein pair shows a functional pattern, the relation type of that pair (e.g., activation) is likely to be found in other protein pairs that have the same functional pattern.

▣ Concept 3: A protein is usually annotated with multiple molecular functions, but only one of them is selected in an interaction with another protein.
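A minimal sketch of how these concepts could be represented in code, assuming a weighted directed graph whose vertices are GO molecular functions. The class name `FunctionalFlowModel` and its methods are hypothetical, and choosing the single highest-weight function pair is one possible reading of Concepts 2 and 3, not necessarily the authors' procedure. The two example flows and their counts are taken from the Validation tables.

```python
from collections import defaultdict


class FunctionalFlowModel:
    """Directed network: vertices are GO functions, edges carry relation types."""

    def __init__(self):
        # edges[(source_function, target_function)][relation_type] = weight
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_flow(self, src, dst, relation, count=1):
        """Record `count` observations of the flow src -> dst with this relation."""
        self.edges[(src, dst)][relation] += count

    def predict_relation(self, src_functions, dst_functions):
        """Return (weight, src, dst, relation) for the strongest known flow
        between any function of the first protein and any function of the
        second, or None when no flow matches."""
        best = None
        for src in src_functions:
            for dst in dst_functions:
                for relation, weight in self.edges.get((src, dst), {}).items():
                    if best is None or weight > best[0]:
                        best = (weight, src, dst, relation)
        return best


model = FunctionalFlowModel()
model.add_flow("GO:0004707", "GO:0003700", "phosphorylation", 5)
model.add_flow("GO:0046332", "GO:0003700", "activation", 4)

# A hypothetical protein pair: the first protein carries two annotations,
# but only one function pair is selected (Concept 3).
prediction = model.predict_relation(["GO:0004707", "GO:0046332"], ["GO:0003700"])
```

Keeping the relation type on the edge rather than on the vertex lets the same pair of functions carry several relation types with separate weights.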

[Figure 3 diagram: the proteins RAB23, SMO (SMOH), SHH (HPE3, HLP3), PTCH2, and LRP2, linked by the relations inhibition, dissociation, activation, and binding/association. The proteins are labeled with the accessions Q68DJ6, Q9ULC3; A4D1K5, Q99835; Q13635, Q9Y6C5; Q14623, Q43323, … (4); and P98164, and with the GO annotation sets {GO:0005515, GO:0004872}, {GO:0005515, GO:0004872, GO:0004888}, {GO:0015485, GO:0005113, GO:0043237}, and {GO:0005515, Unknown}. The same relation types are drawn between the GO terms.]

GO:0005515, protein binding

GO:0004872, receptor activity

GO:0004888, transmembrane receptor activity

GO:0015485, cholesterol binding

GO:0005113, patched binding

GO:0043237, laminin-1 binding

Figure 3. Some proteins and their relationships in the Hedgehog signaling pathway.

▣ Figure 3 shows some proteins of the Hedgehog signal transduction pathway and their relationships. Each protein has general molecular functions, and the functions have relations such as inhibition or dissociation.

[1] Tong et al., “Global mapping of the yeast genetic interaction network”, Science, 2004.

[2] Mehmet E. Turanalp and Tolga Can, “Discovering functional interaction patterns in protein-protein interaction networks”, BMC Bioinformatics, 2008.

[3] Ali Cakmak and Gultekin Ozsoyoglu (CS, USA), “Mining biological networks for unknown pathways”, Bioinformatics, 2007, 23(20).

▣ This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the Korea Industrial Technology Foundation (KOTEF) through the Human Resource Training Project for Regional Innovation.

▣ Based on the “binding/association” relation between LRP2 and SHH, we consider that the functions protein binding (GO:0005515), cholesterol binding (GO:0015485), patched binding (GO:0005113), and laminin-1 binding (GO:0043237) have a “binding/association” relation. Proteins whose function is unknown were removed manually, and the number of times a functional flow recurred was used as its weight score. In the same way, a total of 1023 functional flows were extracted from 11 H. sapiens signal transduction pathways in the KEGG database.
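The extraction step described above can be sketched as follows. The function name `extract_flows` and the input layout are assumptions made for illustration; the assignment of GO terms to LRP2 and SHH, and the LRP2-to-SHH direction, are read from the text and Figure 3 and may not match the original data exactly.

```python
from collections import Counter
from itertools import product


def extract_flows(relations, annotations):
    """Turn protein-level relations into weighted functional flows.

    relations:   iterable of (source_protein, target_protein, relation_type)
    annotations: dict mapping a protein to its list of GO molecular functions
    Returns a Counter keyed by (function_1, function_2, relation_type);
    the count of repeated flows acts as the weight score.
    """
    flows = Counter()
    for src, dst, relation in relations:
        src_functions = annotations.get(src, [])
        dst_functions = annotations.get(dst, [])
        if not src_functions or not dst_functions:
            continue  # proteins with unknown function are skipped
        for f1, f2 in product(src_functions, dst_functions):
            flows[(f1, f2, relation)] += 1
    return flows


# Assumed annotations for the LRP2 / SHH example.
annotations = {
    "LRP2": ["GO:0005515"],
    "SHH": ["GO:0015485", "GO:0005113", "GO:0043237"],
}
relations = [("LRP2", "SHH", "binding/association")]
flows = extract_flows(relations, annotations)
```

Running the same function over every relation in each pathway and summing the counters would give per-flow weights of the kind reported in the Validation section.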

Validation

▣ Top 10 functional flows including and excluding the general function protein binding (GO:0005515)

Including GO:0005515:

| Function 1 | Function 2 | Sub Type | Count |
|---|---|---|---|
| GO:0005515 | GO:0005515 | phosphorylation | 35 |
| GO:0005515 | GO:0005515 | activation | 29 |
| GO:0005515 | GO:0005515 | compound | 16 |
| GO:0004435 | GO:0005515 | compound | 13 |
| GO:0005515 | GO:0005515 | inhibition | 12 |
| GO:0005515 | GO:0005515 | binding/association | 11 |
| GO:0004722 | GO:0005515 | compound | 10 |
| GO:0005515 | GO:0004674 | phosphorylation | 9 |
| GO:0005515 | GO:0003700 | phosphorylation | 9 |
| GO:0005515 | GO:0004672 | phosphorylation | 8 |

Excluding GO:0005515:

| Function 1 | Function 2 | Sub Type | Count |
|---|---|---|---|
| GO:0004722 | GO:0004708 | inhibition/dephosphorylation | 5 |
| GO:0000287 | GO:0004708 | phosphorylation | 5 |
| GO:0004707 | GO:0003700 | phosphorylation | 5 |
| GO:0046332 | GO:0043565 | activation | 4 |
| GO:0046332 | GO:0003700 | activation | 4 |
| GO:0004722 | GO:0005545 | compound | 4 |
| GO:0004722 | GO:0005545 | compound | 4 |
| GO:0004722 | GO:0005158 | compound | 4 |
| GO:0004722 | GO:0016303 | compound | 4 |
| GO:0004722 | GO:0019903 | compound | 4 |
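A ranking with and without the general protein binding term could be produced along the following lines. This is a sketch under assumptions: `top_flows` is a hypothetical helper, and `counts` holds only four rows copied from the tables, not the full set of 1023 flows.

```python
def top_flows(flow_counts, n=10, exclude=None):
    """Return the n most frequent flows as (flow, count) pairs.

    flow_counts: dict keyed by (function_1, function_2, sub_type)
    exclude:     optional GO term; flows with it on either side are skipped
    """
    items = [
        (flow, count)
        for flow, count in flow_counts.items()
        if exclude is None or exclude not in flow[:2]
    ]
    # sorted() is stable, so ties keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)[:n]


# A few rows from the tables, for illustration only.
counts = {
    ("GO:0005515", "GO:0005515", "phosphorylation"): 35,
    ("GO:0004435", "GO:0005515", "compound"): 13,
    ("GO:0004722", "GO:0004708", "inhibition/dephosphorylation"): 5,
    ("GO:0046332", "GO:0043565", "activation"): 4,
}

with_general = top_flows(counts, n=2)
without_general = top_flows(counts, n=2, exclude="GO:0005515")
```

Filtering before ranking keeps the very frequent protein binding term from crowding out more specific flows.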

▣ The internal consistency of the functional flows was measured with Cronbach's alpha. Cronbach's alpha checks the consistency, or similarity, of questionnaire items when a single concept is probed by many different items. Cronbach's alpha and its variables are defined as follows.

\[
\alpha = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N}\sigma_i^{2}}{\sigma_X^{2}}\right)
\]

• $N$ = total count of functional flows extracted from a specific signal transduction pathway

• $\sigma_i^{2}$ = variance of a specific functional flow across the 11 signal transduction pathways

• $\sigma_X^{2}$ = variance of all functional flows in a specific signal transduction pathway

▣ Note that when the relation type of a functional flow conflicts with its type in another signal transduction pathway, we decrease its appearance value, which ranges from 11 down to zero. The average alpha over all functional flows was 0.67, and 765 functional flows had alpha values of 0.6 or higher.
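For reference, Cronbach's alpha in its standard form, alpha = N/(N-1) * (1 - sum of item variances / variance of the total score), can be computed as in the sketch below. The data layout is an assumption: each row is one observation, each column is one item, and the total-score variance is taken over row sums. How the poster maps functional flows and pathways onto items and observations is not fully specified, so the toy matrices are illustrative only.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Standard Cronbach's alpha for a score matrix.

    scores: list of rows (observations); each row holds N item values.
    Population variance is used for both terms; using sample variance
    for both would give the same ratio.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_variances = [
        pvariance([row[i] for row in scores]) for i in range(n_items)
    ]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: two items that always agree, then two unrelated items.
alpha_consistent = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
alpha_unrelated = cronbach_alpha([[1, 0], [0, 1], [1, 1], [0, 0]])
```

Items that move together push alpha toward 1, while unrelated items push it toward 0, which is why a value of 0.6 or higher is read here as a sign of consistency.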

[Chart: Cronbach's alpha (vertical axis "Alpha", 0 to 0.9) for each KEGG signaling pathway: 04010, 04012, 04310, 04330, 04340, 04350, 04370, 04630, 04020, 04070, 04150.]

[Chart: GO term distance from root (vertical axis "Distance", 0 to 7) for each KEGG signaling pathway: 04010, 04012, 04310, 04330, 04340, 04350, 04370, 04630, 04020, 04070, 04150.]

Intelligent Service Integration Laboratory, http://isilab.icu.ac.kr