Improving the Sensitivity of Peptide Identification by Meta-Search, Grid-Computing, and Machine-Learning
Nathan Edwards, Georgetown University Medical Center
Searching under the street-light…
Tandem mass spectrometry doesn’t discriminate against novel peptides...
...but protein sequence databases do!
Searching traditional protein sequence databases biases the results in favor of well-understood and/or computationally predicted proteins and protein isoforms!
Lost peptide identifications
Missing from the sequence database: build exhaustive peptide sequence databases; build evidence for unannotated proteins and protein isoforms.
Search engine strengths, weaknesses, quirks: use multiple search engines and combine the results.
Poor score or statistical significance: use search-engine consensus to boost confidence; use machine-learning to distinguish true from false.
Thorough search takes too long: harness the power of heterogeneous computational grids.
Unannotated Splice Isoform
Human Jurkat leukemia cell line
Lipid-raft extraction protocol, targeting T cells
von Haller et al., MCP 2003
Peptide Atlas: raftflow, raftapr, raftaug
LIME1 gene: LCK interacting transmembrane adaptor 1
LCK gene: leukocyte-specific protein tyrosine kinase; proto-oncogene; chromosomal aberration involving LCK in leukemias
Multiple significant peptide identifications
Splice Isoform Anomaly
Human erythroleukemia K562 cell line
Depth-of-coverage study: Resing et al., Anal. Chem. 2004
Peptide Atlas: A8_IP
SULT1A2 gene: sulfotransferase family, cytosolic, 1A, member 2
2 ESTs, 1 mRNA; the mRNA is from a lung small-cell carcinoma sample
Single (significant) peptide identification
Five agreeing search engines; PepArML FDR < 1%
All source engines have non-significant E-values
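The slide reports a PepArML FDR below 1%, but the deck does not spell out how FDR is computed. A common approach, and one consistent with the target-and-decoy searches described for the A8_IP run (target + 2 decoys), is target-decoy estimation. The minimal sketch below assumes each peptide-spectrum match (PSM) is a `(score, is_decoy)` pair with higher scores better, and that every decoy database is the same size as the target; the function names are hypothetical and this is not presented as PepArML's own procedure.

```python
def estimated_fdr(psms, threshold, n_decoy_dbs=1):
    """Estimated FDR among target PSMs scoring at or above `threshold`.

    psms: list of (score, is_decoy) pairs; higher score = better match.
    Assumes each decoy database has the same size as the target database,
    so (decoy hits / n_decoy_dbs) estimates the number of false target hits.
    """
    targets = sum(1 for s, is_decoy in psms if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in psms if s >= threshold and is_decoy)
    if targets == 0:
        return 1.0  # nothing would be accepted; treat as maximally unreliable
    return min(1.0, (decoys / n_decoy_dbs) / targets)


def threshold_for_fdr(psms, max_fdr=0.01, n_decoy_dbs=1):
    """Lowest observed score whose estimated FDR is <= max_fdr, else None.

    Scans thresholds from high to low. The estimate is not monotone, so the
    lowest qualifying threshold is kept instead of stopping at the first
    failure (equivalent to accepting PSMs with q-value <= max_fdr).
    """
    best = None
    for threshold in sorted({s for s, _ in psms}, reverse=True):
        if estimated_fdr(psms, threshold, n_decoy_dbs) <= max_fdr:
            best = threshold
    return best
```

For example, with `psms = [(10, False), (9, False), (8, True), (7, False), (6, True), (5, False)]` and two decoy databases, `threshold_for_fdr(psms, max_fdr=0.2, n_decoy_dbs=2)` returns `7`: three targets and one decoy pass, giving an estimated FDR of 0.5 / 3.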
Peptide Sequence Databases
All amino-acid sequences of at most 30 amino acids from:
IPI and all IPI constituent protein sequences (IPI, HInvDB, VEGA, UniProt, EMBL, RefSeq, GenBank)
Swiss-Prot variants, conflicts, splices, and annotated signal-peptide truncations
GenBank and RefSeq mRNA sequences, 3-frame translation
GenBank EST and HTC sequences, 6-frame translation, keeping peptides found in at least 2 sequences
Grouped by gene/UniGene cluster and compressed.
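To make the EST rule concrete, here is a minimal Python sketch: six-frame translation with the standard genetic code, enumeration of stop-free peptides up to 30 residues, and retention of peptides seen in at least two ESTs. The minimum length (`min_len`), the treatment of ambiguous codons as `X`, and all function names are assumptions for illustration; the slides do not describe how PepSeqDB is actually built or compressed.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base slowest).
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, frame: int) -> str:
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def six_frame_translate(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(_COMPLEMENT)[::-1]
    return [translate(dna, f) for f in range(3)] + [translate(rev, f) for f in range(3)]


def peptides(protein: str, min_len: int, max_len: int) -> set[str]:
    """All substrings of min_len..max_len residues that contain no stop or 'X'."""
    found = set()
    for segment in protein.replace("X", "*").split("*"):
        for start in range(len(segment)):
            for length in range(min_len, max_len + 1):
                if start + length > len(segment):
                    break
                found.add(segment[start:start + length])
    return found


def supported_peptides(ests, min_len=6, max_len=30, min_support=2) -> set[str]:
    """Peptides occurring in the six-frame translations of >= min_support ESTs."""
    support = Counter()
    for est in ests:
        per_est = set()
        for frame in six_frame_translate(est):
            per_est |= peptides(frame, min_len, max_len)
        support.update(per_est)  # each EST counts at most once per peptide
    return {pep for pep, n in support.items() if n >= min_support}
```

The number of enumerated substrings grows roughly as sequence length times `max_len`, which is one reason grouping by cluster and compressing the result matters at genome scale.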
Formatted as a FASTA sequence database: easy integration with search engines.
One entry per gene/cluster; automated rebuild every few months.
Organism | Size (AA) | Size (entries)
Human | 248 Mb | 74,976
Mouse | 171 Mb | 55,887
Rat | 76 Mb | 42,372
Zebrafish | 94 Mb | 40,490
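One entry per gene/cluster can be produced with a simple FASTA writer. The sketch below is hypothetical: it removes peptides already contained in a longer peptide of the same cluster (a lossless redundancy step) and joins the rest with a separator character. The separator, the function names, and the 60-residue line width are assumptions, and this is not the compression scheme used for the databases above, which the slides do not describe.

```python
def drop_contained(peptides):
    """Drop peptides that occur inside a longer peptide of the same cluster."""
    kept = []
    for pep in sorted(set(peptides), key=len, reverse=True):
        if not any(pep in longer for longer in kept):
            kept.append(pep)
    return kept


def to_fasta(clusters, sep="X", width=60):
    """One FASTA record per gene/cluster; surviving peptides joined by `sep`.

    clusters: dict mapping cluster id -> list of peptide strings.
    `sep` is a hypothetical separator; whether a given search engine treats
    it as a peptide boundary has to be checked for that engine.
    """
    lines = []
    for cluster_id in sorted(clusters):
        seq = sep.join(sorted(drop_contained(clusters[cluster_id])))
        lines.append(f">{cluster_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For instance, `to_fasta({"GENE1": ["PEPTIDE", "PTIDE", "KR"]})` yields `">GENE1\nKRXPEPTIDE\n"`, since `PTIDE` is contained in `PEPTIDE`. Because peptides are concatenated, an engine may also score spurious peptides that span a separator.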
Peptide evidence, in context
Statistically significant peptide identifications can still be misleading:
Isobaric amino-acid/PTM substitutions
Unsubstantiated peptide termini
Few b-ions or y-ions, suggesting a “random” mass match
Single amino acids on upstream or downstream exons
Peptides in the 5’ UTR with no upstream Met
We need tools to quickly check the corroborating (genomic, transcript, SNP) evidence.
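The “few b-ions or y-ions” check can be approximated by computing singly charged b- and y-ion m/z values and counting how many land near an observed peak. This sketch assumes unmodified residues, singly charged fragments, a ±0.5 m/z tolerance, and a hypothetical 30% coverage cutoff; none of these values come from the slides.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide):
    """Singly charged b1..b(n-1) m/z values (N-terminal fragments)."""
    ions, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide):
    """Singly charged y1..y(n-1) m/z values; y1 is the C-terminal residue."""
    ions, total = [], WATER
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def matched_ions(ions, peaks, tol=0.5):
    """Number of theoretical ions with an observed peak within +/- tol m/z."""
    return sum(1 for ion in ions if any(abs(ion - p) <= tol for p in peaks))


def few_fragment_ions(peptide, peaks, tol=0.5, min_fraction=0.3):
    """True if fewer than min_fraction of the b- and y-ions are observed."""
    ions = b_ions(peptide) + y_ions(peptide)
    if not ions:
        return True
    return matched_ions(ions, peaks, tol) / len(ions) < min_fraction
```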
PeptideMapper Web Service
Counts: by gene and by evidence type (EST, mRNA, protein)
Sequences: accessions by gene; UniProt variants; nucleotide sequence & link to BLAT alignment
Genomic loci: one-click projection onto the UCSC genome browser
Peptides with cSNPs too!
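PeptideMapper's internals are not shown, but its evidence counts can be illustrated with a naive lookup: find every source sequence containing the peptide and tally hits by gene and evidence type. The `SourceSequence` record, the gene names in the example, the linear scan, and the optional I/L equivalence (I and L are isobaric, one of the pitfalls listed under peptide evidence) are assumptions; a real service would index the sequences rather than scan them.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSequence:
    accession: str
    gene: str
    evidence: str   # e.g. "EST", "mRNA" or "Protein"
    sequence: str   # amino-acid sequence (translated, for nucleotide evidence)


def evidence_counts(peptide, sources, equate_il=True):
    """Count source sequences containing `peptide`, by gene and evidence type.

    With equate_il=True, I and L are treated as the same residue.
    """
    def norm(s):
        return s.replace("I", "L") if equate_il else s

    target = norm(peptide)
    counts = defaultdict(Counter)
    for src in sources:
        if target in norm(src.sequence):
            counts[src.gene][src.evidence] += 1
    return {gene: dict(c) for gene, c in counts.items()}
```

With hypothetical records such as `SourceSequence("E2", "GENE_B", "EST", "PEPTLDE")`, a query for `PEPTIDE` matches only when `equate_il` is left on.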
PeptideMapper Web Service
I’m Feeling Lucky
Combining search engine results: harder than it looks!
Consensus boosts confidence, but...
How do we assess statistical significance?
We gain specificity, but lose sensitivity!
Incorrect identifications are correlated too!
How do we handle weak identifications?
Consensus vs. disagreement vs. abstention
Threshold at some significance level?
We apply unsupervised machine learning...
Lots of related work, unified in a single framework.
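One way to expose consensus, disagreement, and abstention to a machine-learning combiner is to encode each engine's call relative to the majority peptide for a spectrum. The sketch below does that with +1/-1/0 features; the majority vote, the alphabetical tie-break, and the encoding itself are assumptions for illustration, not PepArML's documented feature set. As the slide warns, incorrect identifications are correlated, so these features are inputs to a model rather than a significance estimate on their own.

```python
from collections import Counter


def consensus_features(calls, engines):
    """Encode one spectrum's per-engine calls relative to the majority peptide.

    calls: dict engine -> peptide string, or None if the engine abstained
    (engines missing from the dict also count as abstaining).
    Returns (consensus_peptide, features) with one feature per engine, in
    order: +1 = agrees with consensus, -1 = disagrees, 0 = abstains.
    """
    votes = Counter(p for p in calls.values() if p is not None)
    if not votes:
        return None, [0] * len(engines)
    # Majority vote; ties broken alphabetically so the result is deterministic.
    consensus = min(votes, key=lambda p: (-votes[p], p))
    features = []
    for engine in engines:
        peptide = calls.get(engine)
        if peptide is None:
            features.append(0)
        elif peptide == consensus:
            features.append(1)
        else:
            features.append(-1)
    return consensus, features
```

For the five engines on the meta-search slides, a spectrum where Mascot and X!Tandem report `PEPTIDE`, K-Score reports `PEPTLDE`, and OMSSA and MyriMatch report nothing becomes `("PEPTIDE", [1, 1, -1, 0, 0])`.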
PepArML – Peptide identification Arbiter by Machine-Learning
Peptide Atlas A8_IP LTQ Dataset
Peptide Atlas Halobacterium Dataset
Running many search engines
Search engine configuration can be difficult:
Correct spectral format
Search parameter files and command line
Pre-processed sequence databases
Tracking spectrum identifiers
Extracting peptide identifications, especially modifications and protein identifiers
Peptide Identification Meta-Search Parameters
Instrument: precursor tolerance, fragment tolerance, max. charge
Sequence database: target and # of decoys
Modification: fixed/variable, amino acids, position, delta
Proteolytic agent: motif
Peptide candidates: termini specificity, precursor tolerance, missed cleavages, charge-state handling, # 13C peaks
Search engines: Mascot, X!Tandem, K-Score, OMSSA, MyriMatch
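The parameter form maps naturally onto a single configuration object shared by all engines. This hypothetical `MetaSearchParams` dataclass mirrors the slide's groups (instrument, sequence database, modifications, proteolytic agent, peptide candidates, search engines) and validates a few obvious constraints. Field names, defaults, units, the motif syntax, and the decoy naming scheme are all assumptions; translating the object into each engine's own parameter file is not shown.

```python
from dataclasses import dataclass, field

# Engines named on the slide.
ENGINES = ("Mascot", "X!Tandem", "K-Score", "OMSSA", "MyriMatch")


@dataclass
class Modification:
    amino_acids: str        # residues it can occur on, e.g. "C"
    delta: float            # mass shift in Da
    fixed: bool = True      # fixed vs. variable modification
    position: str = "any"   # hypothetical values: "any", "N-term", "C-term"


@dataclass
class MetaSearchParams:
    precursor_tol: float                 # instrument; assumed unit: Da
    fragment_tol: float                  # instrument; assumed unit: Da
    target_db: str                       # sequence database (target)
    n_decoys: int = 1                    # number of decoy databases
    max_charge: int = 3                  # instrument: max. precursor charge
    modifications: list[Modification] = field(default_factory=list)
    protease_motif: str = "[KR]|[^P]"    # hypothetical syntax: after K/R, not before P
    specific_termini: int = 2            # 2 = fully, 1 = semi-specific, 0 = none
    missed_cleavages: int = 1
    c13_peaks: int = 0                   # the slide's "# 13C Peaks" setting
    engines: tuple[str, ...] = ENGINES

    def __post_init__(self):
        if self.precursor_tol <= 0 or self.fragment_tol <= 0:
            raise ValueError("tolerances must be positive")
        if self.n_decoys < 0 or self.missed_cleavages < 0 or self.c13_peaks < 0:
            raise ValueError("counts must be non-negative")
        if self.specific_termini not in (0, 1, 2):
            raise ValueError("specific_termini must be 0, 1 or 2")
        unknown = set(self.engines) - set(ENGINES)
        if unknown:
            raise ValueError(f"unsupported engines: {sorted(unknown)}")

    def databases(self) -> list[str]:
        """Target plus decoy databases; the decoy naming scheme is hypothetical."""
        return [self.target_db] + [
            f"decoy{i}:{self.target_db}" for i in range(1, self.n_decoys + 1)
        ]
```

A tryptic search with fixed carbamidomethyl cysteine might be written as `MetaSearchParams(2.0, 0.5, "human.fasta", n_decoys=2, modifications=[Modification("C", 57.021464)])`.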
Peptide Identification Meta-Search
Simple unified search interface for: Mascot, X!Tandem, K-Score, OMSSA, MyriMatch
Automatic decoy searches
Automatic spectrum file "chunking"
Automatic scheduling: serial, multi-processor, cluster, grid
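Automatic decoy searches and spectrum-file chunking multiply into many independent jobs. The sketch below, with hypothetical function names and job layout, splits spectrum identifiers into fixed-size chunks and emits one job per engine, database, charge group, and chunk. Keeping the identifiers with each job is what lets results be mapped back to spectra afterwards.

```python
from itertools import product


def chunk(spectrum_ids, chunk_size):
    """Split spectrum identifiers into consecutive chunks, preserving order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [spectrum_ids[i:i + chunk_size]
            for i in range(0, len(spectrum_ids), chunk_size)]


def search_jobs(spectrum_ids, engines, databases, charge_groups, chunk_size):
    """One job per (engine, database, charge group, chunk) combination.

    Each job carries its own spectrum identifiers so that results can be
    mapped back to the original spectra after the job finishes.
    """
    return [
        {"engine": e, "database": d, "charges": c, "spectra": ids}
        for e, d, c, ids in product(engines, databases, charge_groups,
                                    chunk(spectrum_ids, chunk_size))
    ]
```

For example, 10 spectra in chunks of 4 (three chunks), 2 engines, a target plus 2 decoy databases, and the two charge groups `"1+"` and `"2+/3+"` give 2 × 3 × 2 × 3 = 36 jobs.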
PepArML Meta-Search Engine
Figure: system architecture. Components: a single, simple search request; the Edwards Lab scheduler (with 48+ CPUs); secure communication; heterogeneous compute resources, NSF TeraGrid (1000+ CPUs) and UMIACS (250+ CPUs). Scales easily to 250+ simultaneous searches.
Engine sets shown for the resources: X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core); X!Tandem, KScore, OMSSA, MyriMatch; X!Tandem, KScore, OMSSA.
Peptide Atlas A8_IP LTQ Dataset
Tryptic search of human ESTs using PepSeqDB
107,084 spectra (145 files) searched ~26 times:
Target + 2 decoys, 5 engines, 1+ vs. 2+/3+ charge
8,685 search jobs, 25.7 days of CPU time
5,211 TeraGrid TKO (X!Tandem, K-Score, OMSSA) jobs, < 2 hours
Using 143 different machines
Total elapsed time < 26 hours
Bottleneck: Mascot license (1 core, 4 CPUs)
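The run statistics can be turned into a few derived figures with simple arithmetic. The script uses only numbers quoted on the slide. The per-job CPU time and average concurrency are derived values, and because the elapsed time is given only as an upper bound (< 26 hours), the concurrency figure is a lower bound.

```python
# Figures quoted on the slide.
spectra = 107_084
searches_per_spectrum = 26      # "~26 times", so the product is approximate
jobs = 8_685
cpu_days = 25.7
elapsed_hours = 26              # upper bound: "< 26 hours"

cpu_hours = cpu_days * 24
spectrum_searches = spectra * searches_per_spectrum
minutes_per_job = cpu_hours * 60 / jobs
# Elapsed time is an upper bound, so this is a lower bound on average concurrency.
min_avg_concurrency = cpu_hours / elapsed_hours

print(f"~{spectrum_searches:,} spectrum searches")
print(f"{cpu_hours:.1f} CPU hours, ~{minutes_per_job:.1f} CPU minutes per job")
print(f"average concurrency > {min_avg_concurrency:.1f} CPUs")
```

Run as-is, it reports about 2,784,184 spectrum searches, 616.8 CPU hours (roughly 4.3 CPU minutes per job), and an average concurrency above 23.7 CPUs.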
PepArML Meta-Search Engine
Access to high-performance computing resources for the proteomics community:
NSF TeraGrid Community Portal
University/institute HPC clusters
Individual lab compute resources
Contribute cycles to the community and get access to others’ cycles in return.
Centralized scheduler:
Compute capacity can still be exclusive or prioritized.
Compute client plays well with HPC grid schedulers.
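The slide describes a centralized scheduler where contributed capacity can stay exclusive or be prioritized, and the architecture slide shows that different resources run different engines (Mascot on a single licensed core). A minimal sketch of one dispatch decision under those constraints follows. The `Resource` fields, the owner-first preference, and the tie-break on free slots are hypothetical policy choices, not PepArML's actual scheduler protocol.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    engines: frozenset        # search engines installed on this resource
    free_slots: int
    owner: str = ""           # lab or site that contributed the resource
    exclusive: bool = False   # if True, only the owner's jobs may run here


def pick_resource(job_engine, job_owner, resources):
    """Reserve a slot for one job and return the chosen Resource, or None.

    Eligible: engine installed, a free slot, and (if exclusive) same owner.
    Preference: the job owner's own resources first, then the most free slots.
    """
    eligible = [
        r for r in resources
        if job_engine in r.engines and r.free_slots > 0
        and (not r.exclusive or r.owner == job_owner)
    ]
    if not eligible:
        return None
    best = max(eligible, key=lambda r: (r.owner == job_owner, r.free_slots))
    best.free_slots -= 1
    return best
```

With an exclusive lab resource holding the only Mascot slot, another lab's Mascot job gets `None` and waits, while its OMSSA jobs go to shared capacity.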
Conclusions
Improve the sensitivity of peptide identification using:
Exhaustive peptide sequence databases
Machine learning for combining search engine results
Meta-search tools to maximize consensus
Grid computing for thorough search
Tools & cycles available to the community...
http://edwardslab.bmcb.georgetown.edu
Acknowledgements
Dr. Catherine Fenselau, University of Maryland, Biochemistry
Dr. Rado Goldman, Georgetown University Medical Center
Dr. Chau-Wen Tseng & Dr. Xue Wu, University of Maryland, Computer Science
Funding: NIH/NCI