1 Eukaryotic Secretome Prediction and Knowledge-Base Development Xiang-Jia “Jack” Min Ph.D., Assistant Professor 2 nd International Conferences on Proteomics & Bioinformatics. Las Vegas, July 2 - 4, 2012


Page 1:

Eukaryotic Secretome Prediction and Knowledge-Base Development

Xiang-Jia “Jack” Min

Ph.D., Assistant Professor

2nd International Conferences on Proteomics & Bioinformatics. Las Vegas, July 2 - 4, 2012

Page 2:

DNA → RNA → protein → phenotype

Page 3:

Genome → (Transcription) → Transcriptome → (Translation) → Proteome → (Secretion) → Secretome

• Transcriptome: mRNA (protein-coding DNA sequences)

• Proteome: protein sequences

• Secretome: proteins with a secretory signal peptide

Page 4:

Günter Blobel


Page 8:

Fungi: yeasts, moulds, mushrooms

Secreted enzymes

Enzymes, biomaterials, small molecules, bio-fuels

Page 9:

How to identify secreted proteins?

Genome → (Transcription) → Transcriptome → (Translation) → Proteome → (Secretion) → Secretome

(1) Direct identification using proteomics methods (Tsang et al. 2009)

(2) Computational prediction from the predicted proteome

(3) EST data mining

Page 10:

Secreted Proteins

• Classical secreted proteins have a signal peptide at the N-terminus;

• Not all proteins that have a signal peptide are secreted;

• Signal peptide ≠ secreted protein.

Page 11:

SignalP: a program to predict whether a protein contains a signal peptide.

Phobius: signal peptide and transmembrane domain prediction.

WolfPsort: a multiple subcellular location predictor.

TargetP: detects proteins targeted to mitochondria.

TMHMM: transmembrane domain prediction.

PS-Scan: detects ER-retention signals.
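PS-Scan is used here to flag ER-retention signals, which mark proteins that carry a signal peptide but stay in the ER. A minimal Python sketch of such a check, assuming the PROSITE pattern PS00014 (ER_TARGET, `[KRHQSA]-[DENQ]-E-L>`, where `>` anchors the motif at the C-terminus). The function name is hypothetical, and this is a regular-expression stand-in, not PS-Scan itself:

```python
import re

# Assumed PROSITE PS00014 (ER_TARGET) pattern [KRHQSA]-[DENQ]-E-L>;
# the PROSITE ">" anchor becomes "$" so the motif must end the sequence.
ER_RETENTION = re.compile(r"[KRHQSA][DENQ]EL$")


def has_er_retention_signal(protein_seq: str) -> bool:
    """Return True if the sequence ends in a KDEL-like ER-retention motif.

    Hypothetical helper: a regex stand-in for a PS-Scan search.
    """
    # Remove whitespace/newlines, normalize case, and drop a trailing stop symbol.
    cleaned = "".join(protein_seq.split()).upper().rstrip("*")
    return ER_RETENTION.search(cleaned) is not None


if __name__ == "__main__":
    for seq in ["MKWVTFISLLFLFSSAYSKDEL", "MSKGEELHDEL*", "MAAKDELA", "MKTLLLAVAVLASA"]:
        print(seq, has_er_retention_signal(seq))
```

Under this assumption, a protein with a predicted signal peptide that also ends in such a motif would be set aside as ER-resident rather than counted as secreted.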


Page 14:

Human cytochrome c oxidase subunit 1 (COX1)


Page 16:

Data (number of proteins)

Kingdom | Secreted | Non-secreted
Fungi | 241 | 5,992
Animals | 5,568 | 19,048
Plants | 216 | 7,528
Protists | 32 | 1,979

Page 17:

Method

• Sensitivity (%) = TP/(TP + FN) x 100

• Specificity (%) = TN/(TN + FP) x 100

• Matthews correlation coefficient (MCC):

$$\mathrm{MCC}\ (\%) = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \times 100$$
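The three measures above are simple functions of the confusion-matrix counts. A minimal Python sketch (the function names are illustrative, not from the original work), checked against the SignalP row of Table 1 (fungi):

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Sn (%) = TP / (TP + FN) x 100."""
    return tp / (tp + fn) * 100


def specificity(tn: int, fp: int) -> float:
    """Sp (%) = TN / (TN + FP) x 100."""
    return tn / (tn + fp) * 100


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient, expressed as a percentage."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom * 100


if __name__ == "__main__":
    # SignalP on the fungal data set (Table 1): TP=232, FP=329, TN=5663, FN=9
    tp, fp, tn, fn = 232, 329, 5663, 9
    print(f"Sn={sensitivity(tp, fn):.1f} Sp={specificity(tn, fp):.1f} MCC={mcc(tp, fp, tn, fn):.1f}")
    # prints: Sn=96.3 Sp=94.5 MCC=61.2
```

MCC is the headline measure in the tables because, unlike Sn and Sp alone, it stays informative when non-secreted proteins vastly outnumber secreted ones.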

Page 18:

Table 1. Prediction accuracies of secreted proteins in fungi

Method | TP | FP | TN | FN | Sn (%) | Sp (%) | MCC (%)
SignalP | 232 | 329 | 5663 | 9 | 96.3 | 94.5 | 61.2
Phobius | 226 | 203 | 5789 | 15 | 93.8 | 96.6 | 68.8
TargetP | 228 | 583 | 5409 | 13 | 94.6 | 90.3 | 48.6
WolfPsort | 230 | 167 | 5825 | 11 | 95.4 | 97.2 | 73.1
SignalP/TMHMM | 228 | 168 | 5824 | 13 | 94.6 | 97.2 | 72.6
Phobius/TMHMM | 224 | 200 | 5792 | 17 | 92.9 | 96.7 | 68.6
TargetP/TMHMM | 224 | 265 | 5727 | 17 | 92.9 | 95.6 | 63.5
WolfPsort/TMHMM | 227 | 135 | 5857 | 14 | 94.2 | 97.7 | 75.8
SignalP/TMHMM/WolfPsort | 226 | 86 | 5906 | 15 | 93.8 | 98.6 | 81.6
SignalP/TMHMM/WolfPsort/Phobius | 222 | 69 | 5923 | 19 | 92.1 | 98.8 | 83.1
SignalP/TMHMM/WolfPsort/Phobius/PS-Scan | 222 | 67 | 5925 | 19 | 92.1 | 98.9 | 83.4
SignalP/TMHMM/WolfPsort/Phobius/TargetP/PS-Scan | 218 | 66 | 5926 | 23 | 90.5 | 98.9 | 82.6

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient.

Min XJ (2010) JPB 3:143-147.

Page 19:

Table 2. Prediction accuracies of secreted proteins in animals

Method | TP | FP | TN | FN | Sn (%) | Sp (%) | MCC (%)
SignalP | 5307 | 4108 | 14940 | 261 | 95.3 | 78.4 | 63.5
Phobius | 5157 | 1167 | 17881 | 411 | 92.6 | 93.9 | 82.8
TargetP | 5313 | 5412 | 13636 | 255 | 95.4 | 71.6 | 56.5
WolfPsort | 5135 | 1762 | 17286 | 433 | 92.2 | 90.7 | 77.3
SignalP/TMHMM | 5217 | 1383 | 17665 | 351 | 93.7 | 92.7 | 81.6
Phobius/TMHMM | 5148 | 1142 | 17906 | 420 | 92.5 | 94.0 | 82.9
TargetP/TMHMM | 5222 | 1369 | 17679 | 346 | 93.8 | 92.8 | 81.8
WolfPsort/TMHMM | 5093 | 1084 | 17964 | 475 | 91.5 | 94.3 | 82.8
Phobius/WolfPsort | 4959 | 555 | 18493 | 609 | 89.1 | 97.1 | 86.4
Phobius/WolfPsort/TMHMM | 4952 | 544 | 18504 | 616 | 88.9 | 97.1 | 86.5
Phobius/WolfPsort/TMHMM/SignalP | 4952 | 544 | 18504 | 616 | 88.9 | 97.1 | 86.5
Phobius/WolfPsort/TMHMM/TargetP | 4934 | 505 | 18543 | 634 | 88.6 | 97.3 | 86.7
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan | 4931 | 482 | 18566 | 637 | 88.6 | 97.5 | 86.9
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan/SignalP | 4931 | 482 | 18566 | 637 | 88.6 | 97.5 | 86.9

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient.

Min XJ (2010) JPB 3:143-147.

Page 20:

Table 3. Prediction accuracies of secreted proteins in plants

Method | TP | FP | TN | FN | Sn (%) | Sp (%) | MCC (%)
SignalP | 199 | 364 | 7164 | 17 | 92.1 | 95.2 | 55.4
Phobius | 188 | 638 | 6890 | 28 | 87.0 | 91.5 | 41.9
TargetP | 198 | 442 | 7086 | 18 | 91.7 | 94.1 | 51.3
WolfPsort | 108 | 70 | 7458 | 108 | 50.0 | 99.1 | 53.9
SignalP/TMHMM | 197 | 237 | 7291 | 19 | 91.2 | 96.9 | 63.0
Phobius/TMHMM | 188 | 636 | 6892 | 28 | 87.0 | 91.6 | 42.0
TargetP/TMHMM | 195 | 256 | 7272 | 21 | 90.3 | 96.6 | 61.1
WolfPsort/TMHMM | 106 | 45 | 7483 | 110 | 49.1 | 99.4 | 57.7
SignalP/TMHMM/TargetP | 195 | 149 | 7379 | 21 | 90.3 | 98.0 | 70.6
Phobius/TargetP/TMHMM | 183 | 122 | 7406 | 33 | 84.7 | 98.4 | 70.4
SignalP/TMHMM/WolfPsort | 106 | 35 | 7493 | 110 | 49.1 | 99.5 | 59.9
SignalP/TMHMM/Phobius | 188 | 183 | 7345 | 28 | 87.0 | 97.6 | 65.2
SignalP/TMHMM/Phobius/TargetP | 183 | 113 | 7415 | 33 | 84.7 | 98.5 | 71.5
SignalP/TMHMM/Phobius/TargetP/PS-Scan | 183 | 100 | 7428 | 33 | 84.7 | 98.7 | 73.2
SignalP/TMHMM/Phobius/TargetP/WolfPsort/PS-Scan | 102 | 29 | 7499 | 114 | 47.2 | 99.6 | 59.8

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient.

Min XJ (2010) JPB 3:143-147.

Page 21:

Summary

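• Different prediction tools achieve different accuracies when predicting secretomes in different kingdoms;

• Combining tools often increases prediction accuracy; however, the best combination differs between kingdoms;

• Optimal methods are proposed: in Tables 1 to 3, the highest MCC is reached by SignalP/TMHMM/WolfPsort/Phobius/PS-Scan in fungi (83.4%), Phobius/WolfPsort/TMHMM/TargetP/PS-Scan in animals (86.9%), and SignalP/TMHMM/Phobius/TargetP/PS-Scan in plants (73.2%).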
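One way to read a tool combination such as SignalP/TMHMM/WolfPsort is as a consensus filter: a protein is called secreted only if it passes every tool in the combination. The Python sketch below assumes exactly that reading and that each tool's output has already been reduced to the set of protein IDs it accepts; the per-tool sets, the IDs, and the function names are hypothetical toy data, not results from the study:

```python
# Hypothetical toy data: six proteins, three of which are truly secreted.
ALL_IDS = {"P1", "P2", "P3", "P4", "P5", "P6"}
SECRETED = {"P1", "P2", "P3"}
# For each tool, the IDs that pass its secretion criterion (illustrative only).
PREDICTIONS = {
    "SignalP": {"P1", "P2", "P3", "P4", "P5"},  # signal peptide predicted
    "TMHMM": {"P1", "P2", "P3", "P6"},  # assumed: no transmembrane helix outside the signal peptide
    "WolfPsort": {"P1", "P2", "P4"},  # assumed: extracellular location predicted
}


def consensus(predictions: dict[str, set[str]], tools: list[str]) -> set[str]:
    """IDs accepted by every tool in the combination (set intersection)."""
    if not tools:
        raise ValueError("a combination needs at least one tool")
    return set.intersection(*(predictions[t] for t in tools))


def confusion(predicted: set[str], secreted: set[str], all_ids: set[str]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a predicted secretome against known labels."""
    tp = len(predicted & secreted)
    fp = len(predicted - secreted)
    tn = len(all_ids - predicted - secreted)
    fn = len(secreted - predicted)
    return tp, fp, tn, fn


if __name__ == "__main__":
    for combo in (["SignalP"], ["SignalP", "TMHMM"], ["SignalP", "TMHMM", "WolfPsort"]):
        print("/".join(combo), confusion(consensus(PREDICTIONS, combo), SECRETED, ALL_IDS))
```

In this toy run each added tool removes false positives and, eventually, some true positives, which mirrors the sensitivity/specificity trade-off seen in the tables.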


Page 25:

FunSecKB

• Databases: UniProt, RefSeq

• Prediction tools: SignalP, Phobius, WolfPsort, TargetP, TMHMM, PS-SCAN, fragAnchor

• Manual curation and user inputs

• Views: gi, accession, UniProt ID, keywords, species, subcellular location

• External links

Lum G & Min XJ (2011) Database.

Page 26:

Summary of FunSecKB

• The database currently contains a total of 478,073 fungal protein sequences;

• 23,878 of these are predicted and/or curated secreted proteins;

• The sequences come from 118 fungal species, 52 of which have a complete proteome.

Page 27:

Lum G & Min XJ (2011) Database.

Page 28:

Lum G & Min XJ (2011) Database.

Page 29:

Lum G & Min XJ (2011) Database.


Page 36:

Plant secretomes and other subcellular proteins (number of proteins, with % of total proteins in parentheses)

Category | Vitis vinifera | Populus trichocarpa | Arabidopsis thaliana | Oryza sativa | Sorghum bicolor
Total proteins | 29836 | 41794 | 32214 | 39997 | 32796
Secreted proteins | 1892 (6.3) | 2487 (6.0) | 2835 (8.8) | 3085 (7.7) | 2394 (7.3)
Mitochondria, membrane | 490 (1.6) | 566 (1.4) | 415 (1.3) | 832 (2.1) | 666 (2.0)
Mitochondria, non-membrane | 3877 (13.0) | 5238 (12.5) | 3729 (11.6) | 7187 (18.0) | 5768 (17.6)
Chloroplast, membrane | 565 (1.9) | 601 (1.4) | 671 (2.1) | 720 (1.8) | 610 (1.9)
Chloroplast, non-membrane | 3675 (12.3) | 4850 (11.6) | 4865 (15.1) | 6318 (15.8) | 5385 (16.4)
ER proteins | 29 (0.1) | 37 (0.1) | 60 (0.2) | 32 (0.1) | 25 (0.1)
Other membrane proteins | 3251 (10.9) | 4532 (10.8) | 3649 (11.3) | 3672 (9.2) | 2900 (8.8)
Others (unknown) | 16057 (53.8) | 23483 (56.2) | 15990 (49.6) | 18151 (45.4) | 15048 (45.9)
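Each percentage in this table is a category's share of the species' total proteins, and the categories together account for the whole proteome. A short Python check, using the Vitis vinifera counts exactly as they appear in the table; the dictionary and function names are illustrative:

```python
# Vitis vinifera column of the plant table (protein counts per category).
VITIS_VINIFERA = {
    "Secreted proteins": 1892,
    "Mitochondria, membrane": 490,
    "Mitochondria, non-membrane": 3877,
    "Chloroplast, membrane": 565,
    "Chloroplast, non-membrane": 3675,
    "ER proteins": 29,
    "Other membrane proteins": 3251,
    "Others (unknown)": 16057,
}
TOTAL_PROTEINS = 29836


def category_percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of the total proteome in each category, rounded to one decimal."""
    # The categories should partition the proteome, so their counts must sum to the total.
    if sum(counts.values()) != total:
        raise ValueError("categories do not add up to the total protein count")
    return {name: round(n / total * 100, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in category_percentages(VITIS_VINIFERA, TOTAL_PROTEINS).items():
        print(f"{name}: {pct}")
```

The same check applied to the other four columns also reproduces their totals, which is a quick way to catch transcription errors in tables like this one.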


Page 39:

Acknowledgements

Gengkon Lum (M.S. Graduate)

Jessica Orr (Undergraduate)

Docylyne Shelton (Undergraduate)

Braden Walters (Undergraduate)