



Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center

• Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification.

• The PepArML meta-search engine is publicly available, free of charge, on-line from: http://edwardslab.bmcb.georgetown.edu

Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning.

Introduction

Automatic search engine configuration and execution, parameterized by:

• Instrument & proteolytic agent

• Fixed and variable modifications

• Protein sequence database & MS/MS spectra file

• Peptide candidate selection
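
The parameter list above can be pictured as a single engine-neutral record that is expanded into one job per search engine and sequence database. The following is only a minimal sketch under stated assumptions: the class name `SearchDescription`, its field names, the example modification strings, and the `expand_jobs` helper are all hypothetical and are not taken from the PepArML software.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SearchDescription:
    """Hypothetical engine-neutral search description (field names are illustrative)."""
    instrument: str
    enzyme: str
    fixed_mods: tuple = ()
    variable_mods: tuple = ()
    spectra_file: str = ""
    missed_cleavages: int = 1
    precursor_tol_da: float = 2.0


def expand_jobs(desc, engines, databases):
    """Return one (engine, database, description) job per engine/database pair."""
    return [(engine, db, desc) for engine, db in product(engines, databases)]


desc = SearchDescription(
    instrument="LTQ",
    enzyme="Trypsin",
    fixed_mods=("Carbamidomethyl (C)",),   # assumed example modification
    variable_mods=("Oxidation (M)",),      # assumed example modification
    spectra_file="spectra.mgf",            # placeholder file name
)
engines = ["Mascot", "X!Tandem", "KScore", "OMSSA", "MyriMatch"]
databases = ["target", "decoy1", "decoy2"]  # one target plus two decoy databases
jobs = expand_jobs(desc, engines, databases)
print(len(jobs))
```

Keeping the description immutable means every engine is configured from exactly the same parameters, which is the point of a unified interface.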

Nathan J. Edwards, Georgetown University Medical Center

Unified MS/MS Search Interface

MS/MS Spectra Reformatting

Peptide Identification Meta-Search via Grid-Computing

PeptideMapper Web-Service

Conclusions

References

The PepArML meta-search engine provides:

• A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch,

• Search job scheduling on multiple large-scale heterogeneous computational grids,

• Unsupervised, model-free result combining using machine-learning (PepArML [1])

The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide ids at 10% FDR.
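
Counting identifications at a fixed FDR is commonly done with a target-decoy estimate: hits are ranked by score, and the false discovery rate at each score threshold is estimated from the number of decoy hits above it. The sketch below illustrates that general idea only; it is not the PepArML combiner's procedure, which the poster does not describe. The function name, the `(score, is_decoy)` input format, and the division by the number of decoy databases are assumptions.

```python
def count_ids_at_fdr(psms, max_fdr=0.10, n_decoy_dbs=1):
    """Return the largest number of target hits whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    The FDR at each rank is estimated as (decoys / n_decoy_dbs) / targets.
    Tied scores are not treated specially in this sketch.
    """
    best = 0
    targets = decoys = 0
    for _score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and (decoys / n_decoy_dbs) / targets <= max_fdr:
            best = targets
    return best


# Toy data: ten confident target hits, then a mix of decoy and target hits.
psms = [(s, False) for s in range(20, 10, -1)]
psms += [(10, True), (9, False), (8, True), (7, True), (6, False)]
print(count_ids_at_fdr(psms, max_fdr=0.10))
```

A more sensitive search or combiner shows up here as a larger count at the same `max_fdr`.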

Georgetown University

1. N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5.1 (2009).

2. N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology 3.102 (2007).

PepArML – Unsupervised Machine-Learning Combiner

NSF TeraGrid: 1000+ CPUs

Edwards Lab: Scheduler & 48+ CPUs

Meta-search with five search engines; automatic target & decoy searches.

Secure communication

Heterogeneous compute resources

Scales to 250+ simultaneous searches

Free, instant registration

Tandem, KScore, OMSSA, MyriMatch.

X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core).

Simple search description

• Charge and precursor enumeration for peptide candidate selection (for charge & 13C peak correction)

• Search engine formatting constraints (MGF/mzXML)

• Consistent MS/MS spectrum identifier tracking

• Spectrum file “chunking”
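
Chunking and identifier tracking go together: once a spectrum file is split into pieces for separate search jobs, each result has to be traceable back to the original spectrum. The following is a rough sketch of one way to do this for MGF text, assuming well-formed `BEGIN IONS` / `END IONS` blocks. The function names and the `(chunk index, position in chunk)` mapping are illustrative and do not describe the PepArML implementation.

```python
def split_mgf(text):
    """Return the list of 'BEGIN IONS' ... 'END IONS' blocks found in an MGF string."""
    spectra, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            current = [stripped]
        elif stripped == "END IONS" and current is not None:
            current.append(stripped)
            spectra.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(stripped)
    return spectra


def chunk_spectra(spectra, chunk_size):
    """Group spectra into chunks and map (chunk, position) back to the global index."""
    chunks, id_map = [], {}
    for start in range(0, len(spectra), chunk_size):
        chunk = spectra[start:start + chunk_size]
        for local in range(len(chunk)):
            id_map[(len(chunks), local)] = start + local
        chunks.append(chunk)
    return chunks, id_map


# Five toy spectra, split into chunks of at most two spectra each.
example = "\n".join(
    f"BEGIN IONS\nTITLE=scan{i}\nPEPMASS=500.{i}\nCHARGE=2+\n100.0 1.0\nEND IONS"
    for i in range(5)
)
spectra = split_mgf(example)
chunks, id_map = chunk_spectra(spectra, chunk_size=2)
print(len(chunks), id_map[(2, 0)])
```

Because the mapping is built at split time, results from any engine can be merged on the global index regardless of how that engine numbers spectra within a chunk.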

Peptide Candidate Selection

• Missed cleavages, specific or semi-specific proteolysis

• Precursor matching parameters, including:

- Precursor mass tolerance & 13C peak correction

- Charge state guessing and/or enumeration
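
Charge enumeration and 13C peak correction can both be viewed as generating several candidate neutral masses for one observed precursor m/z. For charge z and k assumed 13C atoms, the candidate monoisotopic mass is z × (m/z − proton mass) − k × (13C − 12C mass difference). The sketch below illustrates this under stated assumptions: the function names, the default charges (2+ and 3+), and the default tolerance are illustrative choices rather than PepArML settings.

```python
PROTON_MASS = 1.007276   # Da
C13_DELTA = 1.003355     # Da, mass difference between 13C and 12C


def candidate_masses(precursor_mz, charges=(2, 3), max_c13=1):
    """Enumerate (charge, n_13C, neutral monoisotopic mass) for one precursor m/z."""
    out = []
    for z in charges:
        for k in range(max_c13 + 1):
            mass = z * (precursor_mz - PROTON_MASS) - k * C13_DELTA
            out.append((z, k, mass))
    return out


def matching_assignments(peptide_mass, precursor_mz, tol_da=0.5, **kwargs):
    """Return the (charge, n_13C) assignments under which the peptide mass matches."""
    return [
        (z, k)
        for z, k, mass in candidate_masses(precursor_mz, **kwargs)
        if abs(mass - peptide_mass) <= tol_da
    ]


for z, k, mass in candidate_masses(500.0):
    print(z, k, round(mass, 4))
```

Each enumerated combination widens the set of peptide candidates considered for a spectrum, which is why these settings need to be applied consistently across engines.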

Job management

Result combining

Annual Meeting, 2009

• 8685 search jobs, 25.7 days of total CPU time.

• 5211 TeraGrid TKO jobs in < 2 hours (143 nodes)

• Total elapsed time (Mascot bottleneck): < 26 hours.

Peptide Atlas A8_IP LTQ MS/MS Dataset

• Tryptic search of Human ESTs using PepSeqDB [2]

• 107084 spectra searched ~ 26 times:

- Target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge
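
The figures reported for this dataset imply a few derived quantities, worked out below as simple arithmetic. The inputs are the poster's numbers; because "~ 26" is approximate and "< 26 hours" is an upper bound on elapsed time, the results are rough, and the last one is a lower bound on the ratio of CPU time to elapsed time.

```python
spectra = 107084
searches_per_spectrum = 26    # "~ 26" on the poster, so approximate
search_jobs = 8685
cpu_days = 25.7
elapsed_hours_bound = 26      # elapsed time was reported as "< 26 hours"

spectrum_searches = spectra * searches_per_spectrum
cpu_hours = cpu_days * 24
minutes_per_job = cpu_hours * 60 / search_jobs
min_cpu_to_elapsed_ratio = cpu_hours / elapsed_hours_bound

print(f"spectrum searches:          {spectrum_searches:,}")
print(f"total CPU hours:            {cpu_hours:.1f}")
print(f"average CPU minutes per job: {minutes_per_job:.2f}")
print(f"CPU time / elapsed time:    > {min_cpu_to_elapsed_ratio:.1f}")
```

In other words, roughly 2.8 million spectrum searches were run at an average of a little over four CPU minutes per job, and the grid delivered more than 23 CPU hours of work per elapsed hour.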

• Ad-hoc, one-click mapping of peptides to protein and transcript sequence evidence, and genomic loci.

• Interactive, SOAP, HTTP → CSV, XML, BED