Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center

• Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification.

• The PepArML meta-search engine is publicly available, free of charge, on-line from: http://edwardslab.bmcb.georgetown.edu

Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning.

Introduction

Automatic search engine configuration and execution, parameterized by:

• Instrument & proteolytic agent

• Fixed and variable modifications

• Protein sequence database & MS/MS spectra file

• Peptide candidate selection
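
The parameters listed above could be captured in one engine-independent description that is then expanded into one job per search engine. The following is a minimal sketch under stated assumptions, not the PepArML implementation: the class name, field names, and `expand_jobs` helper are hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass

# The five search engines named on the poster.
ENGINES = ("Mascot", "X!Tandem", "KScore", "OMSSA", "MyriMatch")


@dataclass(frozen=True)
class SearchDescription:
    """Hypothetical engine-independent description of one search."""
    instrument: str
    proteolytic_agent: str
    fixed_modifications: tuple
    variable_modifications: tuple
    sequence_database: str
    spectra_file: str
    missed_cleavages: int
    precursor_tolerance_da: float


def expand_jobs(description, engines=ENGINES):
    """Return one job record per search engine for the same description."""
    return [{"engine": engine, "description": description} for engine in engines]


# Illustrative values only; none of these come from the poster.
example = SearchDescription(
    instrument="ion-trap",
    proteolytic_agent="trypsin",
    fixed_modifications=("Carbamidomethyl (C)",),
    variable_modifications=("Oxidation (M)",),
    sequence_database="proteins.fasta",
    spectra_file="spectra.mgf",
    missed_cleavages=1,
    precursor_tolerance_da=2.0,
)
jobs = expand_jobs(example)
```

Each engine-specific configuration file would then be generated from the shared description, so a user specifies the search once.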

Nathan J. Edwards, Georgetown University Medical Center

Unified MS/MS Search Interface

MS/MS Spectra Reformatting

Peptide Identification Meta-Search via Grid-Computing

PeptideMapper Web-Service

Conclusions

References

The PepArML meta-search engine provides:

• A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch,

• Search job scheduling on multiple large-scale heterogeneous computational grids,

• Unsupervised, model-free result combining using machine-learning (PepArML [1])

The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide ids at 10% FDR.
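
The number of ids at a given FDR is usually estimated from target and decoy search results. Below is a minimal sketch of one common target-decoy estimate, assuming a single decoy database of the same size as the target database. The poster does not state how the two decoy searches are used, so the function name `count_ids_at_fdr` and its input format are illustrative assumptions only.

```python
def count_ids_at_fdr(psms, max_fdr=0.10):
    """Count target identifications accepted at an estimated FDR.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    Assumption: FDR at a score cutoff is estimated as
    (decoy hits above cutoff) / (target hits above cutoff).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest target count whose estimated FDR is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best
```

Comparing this count between a single search engine and the combined result, at the same FDR, is one way to quantify a gain in sensitivity.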

1. N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5.1 (2009).

2. N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology 3.102 (2007).

PepArML – Unsupervised Machine-Learning Combiner

NSF TeraGrid: 1000+ CPUs

Edwards Lab: Scheduler & 48+ CPUs

Meta-search with five search engines; automatic target & decoy searches.

Secure communication

Heterogeneous compute resources

Scales to 250+ simultaneous searches

Free, instant registration

Tandem, KScore, OMSSA, MyriMatch.

X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core).

Simple search description

• Charge and precursor enumeration for peptide candidate selection (for charge & 13C peak correction)

• Search engine formatting constraints (MGF/mzXML)

• Consistent MS/MS spectrum identifier tracking

• Spectrum file “chunking”
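
Spectrum file chunking with consistent identifier tracking can be sketched as follows. This is an illustration under stated assumptions rather than the PepArML code: it assumes MGF input in which each spectrum is delimited by `BEGIN IONS` and `END IONS`, and the function name `chunk_mgf` and the shape of the returned index are hypothetical.

```python
def chunk_mgf(mgf_text, chunk_size):
    """Split MGF text into chunks of at most chunk_size spectra.

    Returns (chunks, index): chunks is a list of MGF strings, and index maps
    (chunk_number, position_in_chunk) to the spectrum's position in the
    original file, so results can be traced back to the original spectrum.
    """
    spectra, current = [], None
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            current = [stripped]
        elif stripped == "END IONS" and current is not None:
            current.append(stripped)
            spectra.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(stripped)

    chunks, index = [], {}
    for start in range(0, len(spectra), chunk_size):
        block = spectra[start:start + chunk_size]
        for pos in range(len(block)):
            index[(len(chunks), pos)] = start + pos
        chunks.append("\n".join(block) + "\n")
    return chunks, index
```

Each chunk can be submitted as an independent search job, and the index lets identifications from any chunk be reported against the original spectrum numbering.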

Peptide Candidate Selection

• Missed cleavages, specific or semi-specific proteolysis

• Precursor matching parameters, including:

- Precursor mass tolerance & 13C peak correction

- Charge state guessing and/or enumeration
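
Charge-state enumeration and 13C peak correction amount to generating several neutral-mass hypotheses for each observed precursor m/z and matching peptides against each one. The sketch below illustrates the idea; the function names, the default charge states, and the number of 13C offsets are assumptions, not parameters taken from the poster.

```python
PROTON_MASS = 1.007276466   # Da
C13_C12_DELTA = 1.0033548   # Da, mass difference between 13C and 12C


def candidate_neutral_masses(precursor_mz, charges=(2, 3), max_c13=1):
    """Enumerate (charge, n_13C, monoisotopic neutral mass) hypotheses.

    Each hypothesis assumes the observed precursor peak carries `charge`
    protons and is the n-th 13C isotope peak of the peptide.
    """
    hypotheses = []
    for z in charges:
        observed_mass = (precursor_mz - PROTON_MASS) * z
        for n in range(max_c13 + 1):
            hypotheses.append((z, n, observed_mass - n * C13_C12_DELTA))
    return hypotheses


def within_tolerance(candidate_mass, peptide_mass, tol_da):
    """True if a peptide mass matches a hypothesis within tol_da daltons."""
    return abs(candidate_mass - peptide_mass) <= tol_da
```

A peptide is a candidate for the spectrum if its mass falls within tolerance of any enumerated hypothesis, which is why a spectrum may effectively be searched more than once.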

Job management

Result combining

Annual Meeting, 2009

• 8685 search jobs, 25.7 days of total CPU time.

• 5211 TeraGrid TKO jobs in < 2 hours (143 nodes)

• Total elapsed time (Mascot bottleneck): < 26 hours.

Peptide Atlas A8_IP LTQ MS/MS Dataset

• Tryptic search of Human ESTs using PepSeqDB [2]

• 107084 spectra searched ~26 times:

- Target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge

• Ad-hoc, one-click mapping of peptides to protein and transcript sequence evidence, and genomic loci.

• Interactive, SOAP, HTTP → CSV, XML, BED