Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center
• Application of meta-search, grid-computing, and
machine-learning can significantly improve the
sensitivity of peptide identification.
• The PepArML meta-search engine is publicly
available, free of charge, on-line from:
http://edwardslab.bmcb.georgetown.edu
Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning.
Introduction
Automatic search engine configuration and execution,
parameterized by:
• Instrument & proteolytic agent
• Fixed and variable modifications
• Protein sequence database & MS/MS spectra file
• Peptide candidate selection
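As a rough illustration of what "parameterized by" could look like in practice, the sketch below collects those settings into one engine-neutral record and stamps out one job description per engine. The class name, field names, defaults, and file names are hypothetical; they are not PepArML's actual configuration format.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SearchConfig:
    """Engine-neutral description of one MS/MS search (hypothetical schema)."""
    instrument: str                  # e.g. "LTQ"
    protease: str                    # proteolytic agent, e.g. "trypsin"
    database: str                    # protein sequence database file
    spectra: str                     # MS/MS spectra file
    fixed_mods: list[str] = field(default_factory=list)
    variable_mods: list[str] = field(default_factory=list)
    missed_cleavages: int = 1        # illustrative default, not PepArML's
    semi_specific: bool = False      # specific vs. semi-specific proteolysis

    def for_engine(self, engine: str) -> dict:
        """Return the shared settings tagged with one engine's name.

        A real adapter would translate these fields into that engine's own
        parameter-file syntax; that translation is not shown here.
        """
        return {"engine": engine, **asdict(self)}


# Hypothetical usage: one job description per search engine.
ENGINES = ["Mascot", "X!Tandem", "KScore", "OMSSA", "MyriMatch"]
cfg = SearchConfig("LTQ", "trypsin", "proteins.fasta", "run.mzXML",
                   fixed_mods=["Carbamidomethyl (C)"],
                   variable_mods=["Oxidation (M)"])
jobs = [cfg.for_engine(e) for e in ENGINES]
```

Keeping a single shared record and translating it per engine means the user describes the search once, which is the point of a unified interface.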
Nathan J. Edwards, Georgetown University Medical Center
Unified MS/MS Search Interface
MS/MS Spectra Reformatting
Peptide Identification Meta-Search via Grid-Computing
PeptideMapper Web-Service
Conclusions
References
The PepArML meta-search engine provides:
• A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch,
• Search job scheduling on multiple large-scale
heterogeneous computational grids,
• Unsupervised, model-free result combining
using machine-learning (PepArML [1])
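Because every engine reports against the same spectrum identifiers, per-spectrum results can be merged directly. The sketch below shows the simplest possible combiner, a majority vote across engines. It is a swapped-in baseline for illustration only, not the unsupervised machine-learning combiner of PepArML [1]; the function name and input layout are assumptions.

```python
from collections import Counter, defaultdict


def combine_by_vote(results: dict[str, dict[str, str]],
                    min_votes: int = 2) -> dict[str, tuple[str, int]]:
    """Merge per-engine peptide calls by simple agreement (NOT PepArML).

    results maps engine name -> {spectrum_id: best peptide}.
    Returns spectrum_id -> (peptide, votes) where at least min_votes
    engines agree on the same peptide for that spectrum.
    """
    calls: dict[str, Counter] = defaultdict(Counter)
    for engine_hits in results.values():
        for spectrum_id, peptide in engine_hits.items():
            calls[spectrum_id][peptide] += 1
    combined = {}
    for spectrum_id, votes_per_peptide in calls.items():
        peptide, votes = votes_per_peptide.most_common(1)[0]
        if votes >= min_votes:
            combined[spectrum_id] = (peptide, votes)
    return combined
```

A vote ignores score magnitudes and engine reliability; learning how to weigh those without labelled training data is what a machine-learning combiner adds over this baseline.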
The PepArML meta-search engine improves peptide
identification sensitivity, significantly increasing the
number of peptide identifications at 10% FDR.
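Identification counts "at 10% FDR" come from a target-decoy estimate. A minimal sketch, assuming a single decoy search whose hits are counted one-for-one against target hits; the function name and the (score, is_decoy) input format are hypothetical.

```python
def ids_at_fdr(psms: list[tuple[float, bool]], max_fdr: float = 0.10) -> int:
    """Count target identifications accepted at a target-decoy FDR cutoff.

    psms: (score, is_decoy) pairs, higher score = better.
    FDR above a score threshold is estimated as decoys / targets.
    Returns the largest target count whose estimated FDR <= max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = best = 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only a position where the score changes is a real threshold.
        at_boundary = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if at_boundary and targets and decoys / targets <= max_fdr:
            best = targets  # targets never decreases, so this is the max
    return best
```

Thresholds are only evaluated between distinct scores, so tied identifications are accepted or rejected together.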
Georgetown University
1. N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5(1) (2009).
2. N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology 3:102 (2007).
PepArML – Unsupervised Machine-Learning Combiner
NSF TeraGrid: 1000+ CPUs
Edwards Lab Scheduler & 48+ CPUs
Meta-search with five search engines; automatic target & decoy searches.
Secure communication
Heterogeneous compute resources
Scales to 250+ simultaneous searches
Free, instant registration
X!Tandem, KScore, OMSSA, MyriMatch.
X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core).
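The "Mascot (1 core)" label hints at why a single engine can throttle a whole run. The sketch below is a hypothetical greedy dispatcher that sends each job only to a resource that runs its engine and still has a free slot; the resource names, engine sets, and slot counts are illustrative assumptions, not the actual PepArML scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    engines: set[str]                # search engines installed on this resource
    slots: int                       # concurrent jobs it will accept
    running: list = field(default_factory=list)


def dispatch(jobs, resources):
    """Greedily place (job_id, engine) pairs on the first resource that runs
    that engine and has a free slot; return (placements, still_waiting)."""
    placed, waiting = [], []
    for job_id, engine in jobs:
        for res in resources:
            if engine in res.engines and len(res.running) < res.slots:
                res.running.append(job_id)
                placed.append((job_id, res.name))
                break
        else:  # no resource could take the job this round
            waiting.append((job_id, engine))
    return placed, waiting
```

With one Mascot slot, every further Mascot job waits no matter how many grid CPUs are idle, which is consistent with the "Mascot bottleneck" reported for the A8_IP run.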
Simple search description
• Charge and precursor enumeration for peptide
candidate selection (for charge & ¹³C peak correction)
• Search engine formatting constraints (MGF/mzXML)
• Consistent MS/MS spectrum identifier tracking
• Spectrum file “chunking”
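A minimal sketch of two of these steps, charge enumeration and file chunking, writing Mascot Generic Format (MGF) text. It assumes spectra without an assigned charge are tried as both 2+ and 3+, and that the original spectrum identifier plus the enumerated charge is written into the MGF TITLE so results can be traced back. The Spectrum class, the "<id>.<charge>" title convention, and the chunk size are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spectrum:
    spectrum_id: str                  # stable ID tracked through every search
    precursor_mz: float
    charge: Optional[int]             # None when no charge was assigned
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs


def enumerate_charges(spectra, guesses=(2, 3)):
    """Yield one copy of each spectrum per candidate charge state."""
    for s in spectra:
        for z in ([s.charge] if s.charge is not None else guesses):
            yield Spectrum(s.spectrum_id, s.precursor_mz, z, s.peaks)


def chunk(spectra, size):
    """Split spectra into consecutive chunks of at most `size`."""
    spectra = list(spectra)
    return [spectra[i:i + size] for i in range(0, len(spectra), size)]


def to_mgf(spectra) -> str:
    """Render spectra as MGF, encoding ID and charge in TITLE."""
    lines = []
    for s in spectra:
        lines += ["BEGIN IONS",
                  f"TITLE={s.spectrum_id}.{s.charge}",
                  f"PEPMASS={s.precursor_mz:.4f}",
                  f"CHARGE={s.charge}+"]
        lines += [f"{mz:.4f} {inten:.1f}" for mz, inten in s.peaks]
        lines.append("END IONS")
    return "\n".join(lines) + "\n"


# Hypothetical usage: expand charges, then write two-spectrum chunk files.
spectra = [Spectrum("scan1", 500.25, None, [(100.0, 10.0)]),
           Spectrum("scan2", 600.0, 2, [(150.0, 5.0)])]
files = [to_mgf(c) for c in chunk(enumerate_charges(spectra), 2)]
```

Writing one record per charge keeps each engine's input unambiguous, at the cost of searching unassigned spectra more than once.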
Peptide Candidate Selection
• Missed cleavages, specific or semi-specific proteolysis
• Precursor matching parameters, including:
- Precursor mass tolerance & ¹³C peak correction
- Charge state guessing and/or enumeration
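These candidate-selection parameters can be sketched as a fully specific tryptic digest with missed cleavages, followed by a precursor-mass check that also accepts matches where the instrument selected a ¹³C isotope peak instead of the monoisotopic one. Semi-specific digestion is not shown. The function names, the default tolerance, and the limit of one extra ¹³C peak are assumptions for illustration; the masses are standard monoisotopic values.

```python
import re

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # added once per peptide (termini)
PROTON = 1.00728
C13_DELTA = 1.00336   # 13C minus 12C mass difference


def tryptic_peptides(protein: str, missed_cleavages: int = 1):
    """Fully specific trypsin digest: cut after K/R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    for i in range(len(pieces)):
        # join up to `missed_cleavages` further pieces onto piece i
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            yield "".join(pieces[i:j + 1])


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass implied by a precursor m/z observed at the given charge."""
    return charge * (mz - PROTON)


def mass_matches(theoretical: float, observed: float, tol_da: float,
                 max_c13: int = 1) -> bool:
    """Within tol_da of the monoisotopic mass, or of up to max_c13 13C peaks above it."""
    return any(abs(observed - (theoretical + k * C13_DELTA)) <= tol_da
               for k in range(max_c13 + 1))


def select_candidates(protein, precursor_mz, charges=(2, 3), tol_da=0.5,
                      missed_cleavages=1):
    """Return sorted (peptide, charge) pairs whose mass matches the precursor."""
    hits = []
    for pep in set(tryptic_peptides(protein, missed_cleavages)):
        mass = peptide_mass(pep)
        for z in charges:
            if mass_matches(mass, neutral_mass(precursor_mz, z), tol_da):
                hits.append((pep, z))
    return sorted(hits)
```

Trying each guessed charge and each ¹³C offset widens the candidate set, which trades extra scoring work for fewer missed identifications.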
Job management
Result combining
Annual Meeting, 2009
• 8685 search jobs, 25.7 days of total CPU time.
• 5211 TeraGrid TKO jobs in < 2 hours (143 nodes)
• Total elapsed time (Mascot bottleneck): < 26 hours.
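A back-of-the-envelope reading of these figures, treating all CPUs as equivalent (they are not, on heterogeneous resources), so the values are averages only:

```latex
\begin{align*}
\text{total CPU time} &= 25.7\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} = 616.8\ \text{CPU-h} \\
\text{mean CPU time per job} &\approx \frac{616.8\ \text{CPU-h}}{8685} \approx 0.0710\ \text{h} \approx 4.3\ \text{min} \\
\text{effective parallelism} &> \frac{616.8\ \text{CPU-h}}{26\ \text{h}} \approx 23.7 \\
\text{TeraGrid TKO jobs per node} &\approx \frac{5211}{143} \approx 36.4
\end{align*}
```

The parallelism figure is a lower bound because the elapsed time is reported as under 26 hours.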
Peptide Atlas A8_IP LTQ MS/MS Dataset
• Tryptic search of Human ESTs using PepSeqDB [2]
• 107084 spectra searched ~ 26 times:
- Target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge
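The poster does not say how the two decoy databases were generated. As a hedged illustration only, the sketch below builds one reversed and one shuffled decoy per target protein, two common choices; the accession prefixes, seeding scheme, and function names are hypothetical.

```python
import random


def reversed_decoy(protein: str) -> str:
    """Decoy made by reversing the target sequence."""
    return protein[::-1]


def shuffled_decoy(protein: str, seed: int) -> str:
    """Decoy made by a reproducible (seeded) shuffle of the residues."""
    rng = random.Random(seed)
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)


def decoy_databases(proteins: dict[str, str]) -> dict[str, dict[str, str]]:
    """Target database plus two decoy databases with prefixed accessions."""
    return {
        "target": dict(proteins),
        "decoy_rev": {f"REV_{acc}": reversed_decoy(seq)
                      for acc, seq in proteins.items()},
        "decoy_shuf": {f"SHUF_{acc}": shuffled_decoy(seq, seed=i)
                       for i, (acc, seq) in enumerate(proteins.items())},
    }
```

Both constructions preserve amino-acid composition and sequence length, so decoy peptides have realistic mass distributions.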
• Ad-hoc, one-click mapping of peptides to protein
and transcript sequence evidence, and genomic
loci.
• Interactive, SOAP, HTTP → CSV, XML, BED
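For the BED output, a minimal sketch assuming each peptide maps to one contiguous genomic interval given in 1-based, inclusive coordinates (the PeptideLocus class and to_bed function are hypothetical). BED itself uses 0-based, half-open coordinates, hence the start - 1 conversion.

```python
from dataclasses import dataclass


@dataclass
class PeptideLocus:
    peptide: str
    chrom: str
    start: int    # 1-based, inclusive genomic start of the peptide's codons
    end: int      # 1-based, inclusive genomic end
    strand: str   # "+" or "-"


def to_bed(loci) -> str:
    """Render peptide-to-genome mappings as BED6 lines (0-based, half-open)."""
    rows = []
    for loc in loci:
        rows.append("\t".join([loc.chrom, str(loc.start - 1), str(loc.end),
                               loc.peptide, "0", loc.strand]))
    return "\n".join(rows) + "\n"
```

Peptides spanning a splice junction would need the block columns of BED12 rather than the six columns shown.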