Posted on 20-Jan-2020
…
IPA: an Informed Proteomics Analysis Tool for Improved Peptide Identifications
Sangtae Kim, Gordon W. Slysz, Kevin L. Crowell, Samuel H. Payne, Gordon A. Anderson, and Richard D. Smith
Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA
Introduction
Overview Methods Results
Acknowledgements
Portions of this research were supported by the NIH National Center for Research Resources (RR18522) and National Institute of General Medical Sciences (8 P41 GM103493-10), and by the U.S. Department of Energy Office of Biological and Environmental Research (DOE/BER) Genome Sciences Program. Samples were analyzed using capabilities developed under the support of the NIH National Institute of General Medical Sciences (8 P41 GM103493-10) and DOE/BER.
Significant portions of the work were performed in the Environmental Molecular Sciences Laboratory, a DOE/BER national scientific user facility at Pacific Northwest National Laboratory in Richland, Washington.
References
1. Kessner D et al. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24: 2534-6 (2008).
2. Mayampurath AM et al. DeconMSn: a software tool for accurate parent ion monoisotopic mass determination for tandem mass spectra. Bioinformatics 24: 1021-3 (2008).
3. Kim S et al. MS-GF+: Universal database search tool for mass spectrometry. Submitted (2013).
4. Eng J et al. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5: 976-89 (1994).
5. Nesvizhskii AI et al. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75: 4646-58 (2003).
6. Cox J and Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26: 1367-72 (2008).
CONTACT: Sangtae Kim, Ph.D.
Biological Sciences Division
Pacific Northwest National Laboratory
E-mail: sangtae.kim@pnnl.gov
Conclusions
• A new approach to using MS1 and MS/MS spectra for identifying peptides.
• IPA, an informed proteomics analysis tool, provides ~15% more peptide identifications.
• IPA better handles co-eluted peptides.
• IPA identified ~20% and ~10% more peptides than MSConvert and DeconMSn, respectively.
• IPA identified 20-40% more peptides than Sequest/PeptideProphet and MaxQuant.
• For the phosphorylation-enriched data set, IPA identified a number of peptides comparable to MaxQuant and DeconMSn/MS-GF+.
Data sets
• Shewanella: 34,342 CID spectra (High-Low)
• Human-iTRAQ: 22,806 iTRAQ-labeled HCD spectra
# identified peptides (1% FDR)
Examples of peptides exclusively identified by IPA
Existing approach
• Precursor MS: determine the “accurate precursor” (precursor m/z: 761.3433, charge 2).
• Protein database: score only the peptides matching the accurate precursor, i.e., peptides with masses in [1520.6557, 1520.6861].
• Report the best-scoring peptide.
IPA approach
• MS/MS isolation window: [760.34, 762.34].
• Protein database, with charge-specific mass ranges: Charge 1: [759.33, 763.35]; Charge 2: [1518.66, 1526.7]; Charge 3: [2277.99, 2290.05]; …
• Score all peptides whose ion m/z values are within the isolation window, and list the top k scoring peptides using MS-GF+:
Peptide | Charge | Score (SpecEValue)
GETASVADNTTENGR | 2 | 1.41288E-18
ASEWAAK | 1 | 1.99072E-07
EPLLYDFVVRDR | 2 | 2.85083E-07
NTIYAAGRVGTETLGVYRINL | 3 | 3.15152E-07
DGRDGIVVDRKPEFKVGARVEVEAKFK | 4 | 4.1451E-07
GKVWGRQM+15.995AKLVPPQENKAK | 3 | 4.85142E-07
LNPVFPLPHEVAFWYSGQASSSYDFGQ | 4 | 6.69304E-07
VM+15.995DGGAFVKPNTTQFPNDAQK | 3 | 9.16367E-07
WADTAAK | 1 | 9.84701E-07
DETLTIDKELAARVVEGDHGDVLMDVAK | 4 | 1.07169E-06
• For each peptide, compare its theoretical isotopomer profile with MS1 features across multiple scans.
• Keep the peptides with a good MS1 feature “fit”:
Peptide | Charge | Score (SpecEValue)
GETASVADNTTENGR | 2 | 1.41288E-18
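The two candidate-selection strategies in the flowchart can be contrasted with a short Python sketch. It assumes a proton mass of 1.007276 Da and, for the existing approach, a ±10 ppm precursor tolerance (the [1520.6557, 1520.6861] range in the figure spans roughly ±10 ppm around 1520.67 Da). The function names and the peptides PEP_A, PEP_B, and PEP_C with their masses are hypothetical; this is an illustration, not the code of IPA or MS-GF+.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (assumed constant for this sketch)


def ion_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion for a neutral monoisotopic mass M."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass implied by a precursor m/z and charge."""
    return charge * (mz - PROTON_MASS)


def candidates_in_window(peptides, window, charges=range(1, 5)):
    """IPA-style selection (sketch): every (peptide, charge) pair whose
    monoisotopic ion m/z falls inside the MS/MS isolation window."""
    lo, hi = window
    hits = []
    for name, mass in peptides.items():
        for z in charges:
            if lo <= ion_mz(mass, z) <= hi:
                hits.append((name, z))
    return hits


def candidates_for_precursor(peptides, precursor_mz, charge, tol_ppm=10.0):
    """Existing-approach selection (sketch): only peptides whose mass matches
    one pre-determined "accurate precursor" within tol_ppm (assumed 10 ppm)."""
    target = neutral_mass(precursor_mz, charge)
    tol = target * tol_ppm * 1e-6
    return [name for name, mass in peptides.items() if abs(mass - target) <= tol]


# Hypothetical neutral monoisotopic masses (Da), chosen to fall in one window.
PEPTIDES = {"PEP_A": 1520.672, "PEP_B": 1519.670, "PEP_C": 2280.000}

if __name__ == "__main__":
    print(candidates_in_window(PEPTIDES, (760.34, 762.34)))
    # [('PEP_A', 2), ('PEP_B', 2), ('PEP_C', 3)]
    print(candidates_for_precursor(PEPTIDES, 761.3433, 2))
    # ['PEP_A']
```

All three hypothetical peptides are co-isolated in the window, but only PEP_A matches the single accurate precursor, which is the co-elution case where pre-determining one precursor can miss peptides. The sketch only checks the monoisotopic ion, so it ignores peptides whose higher isotope peaks, but not monoisotopic peak, fall in the window.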
Software Tools Used
• MSConvert (MC): determines precursors based on the ions selected in the raw file [1]
• DeconMSn (DM): determines precursors based on the Averagine model [2]
• Sequest (SQ): database search engine [4]
• PeptideProphet (PP): re-scores Sequest identifications [5]
• MS-GF+ (MG): database search engine [3]
• MaxQuant (MQ): database search with precursor refinement [6]
• Informed Proteomics Analysis (IPA): integrated search and precursor refinement
Figure: Number of identified peptides (1% FDR) for DM/SQ/PP, MQ, MC/MG, DM/MG, and IPA on the Shewanella, Human-iTRAQ, and Human-iTRAQ-Phos data sets.
• Determining the accurate monoisotopic precursor mass-to-charge ratio (m/z) and charge is important.
• A simple method that uses the ion selected by the instrument (e.g., msconvert in ProteoWizard [1]) is not effective.
• A more complex method using the Averagine model (e.g., DeconMSn [2]) works better. However, it does not work well when multiple co-eluted peptides are present in the isolation window.
• We present a new Informed Proteomics Analysis (IPA) approach that addresses this problem.
• Without pre-determining the accurate precursor, IPA scores all peptides whose ion m/z values fall within the isolation window.
• IPA then uses MS1 spectra to filter out peptides based on the fit between their theoretical isotopomer profiles and the corresponding MS1 peaks.
• IPA also assigns a score to each peptide using its MS/MS identification scores (SpecEValue) and the correlation of the extracted ion chromatograms (XICs) of the MS1 features associated with the peptide.
• With IPA, database search parameters are streamlined. For MS-GF+ [3], the required parameters are: 1) spectrum file(s), 2) database file, and 3) modification file.
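The MS1 filtering step can be sketched in Python. The poster does not specify the fit measure, so this sketch assumes cosine similarity between a precomputed theoretical isotopomer profile and the most intense MS1 peak found within ±10 ppm of each expected isotope m/z (peaks spaced by 1.003355/z). The function names, tolerance, and `min_fit` threshold are hypothetical. The sketch also scores a single MS1 scan, whereas IPA compares features across multiple scans.

```python
import math

PROTON_MASS = 1.007276   # Da, mass of a proton
C13_C12_DIFF = 1.003355  # Da, 13C - 12C mass difference (isotope spacing at charge 1)


def isotope_mzs(mono_mass, charge, n_peaks):
    """Expected m/z of the first n_peaks isotope peaks of the [M + zH]^z+ ion."""
    mono_mz = (mono_mass + charge * PROTON_MASS) / charge
    return [mono_mz + k * C13_C12_DIFF / charge for k in range(n_peaks)]


def observed_envelope(ms1_peaks, expected_mzs, tol_ppm=10.0):
    """For each expected m/z, the most intense (m/z, intensity) MS1 peak
    within tol_ppm; 0.0 when no peak matches."""
    envelope = []
    for target in expected_mzs:
        tol = target * tol_ppm * 1e-6
        matches = [inten for mz, inten in ms1_peaks if abs(mz - target) <= tol]
        envelope.append(max(matches, default=0.0))
    return envelope


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def isotopomer_fit(theoretical, mono_mass, charge, ms1_peaks, tol_ppm=10.0):
    """Fit between a theoretical isotopomer profile and one MS1 scan (assumed metric)."""
    expected = isotope_mzs(mono_mass, charge, len(theoretical))
    observed = observed_envelope(ms1_peaks, expected, tol_ppm)
    return cosine_similarity(theoretical, observed)


def passes_ms1_filter(theoretical, mono_mass, charge, ms1_peaks, min_fit=0.9):
    """Hypothetical filter: keep the peptide if the fit reaches min_fit."""
    return isotopomer_fit(theoretical, mono_mass, charge, ms1_peaks) >= min_fit


if __name__ == "__main__":
    theoretical = [1.0, 0.8, 0.4]  # hypothetical relative isotope abundances
    # Hypothetical MS1 peaks; 761.6000 is an unrelated peak that is never matched.
    ms1 = [(761.3433, 100.0), (761.6000, 500.0), (761.8450, 80.0), (762.3466, 40.0)]
    print(passes_ms1_filter(theoretical, 1520.672, 2, ms1))  # True
```

Cosine similarity ignores absolute intensity, so a low-abundance peptide with the right envelope shape still fits well. A missing isotope peak pulls the score down.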
Running Time (Shewanella)
IPA peptide-centric scoring
For each peptide passing the filters, assign a score based on:
1) its MS/MS identification scores (SpecEValue) and
2) the correlation of the XICs of its MS1 features.
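Below is a rough Python sketch of peptide-centric scoring. It computes the Pearson correlation between XICs sampled on the same scans and combines it with the best SpecEValue among the peptide's matched spectra. The combination rule (best -log10 SpecEValue plus a weighted mean correlation), the `weight` parameter, and all function names are hypothetical placeholders. The poster gives no formula and notes that this scoring is still under development. The SpecEValues in the usage example are the four listed for AADLGLETVIVER; the XIC intensities are made up.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length XICs (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def xic_correlation(xics):
    """Mean pairwise correlation of the XICs of a peptide's MS1 features
    (e.g., its charge states). A single XIC gives no evidence, so 0.0."""
    pairs = list(combinations(xics, 2))
    if not pairs:
        return 0.0
    return mean(pearson(a, b) for a, b in pairs)


def peptide_score(spec_evalues, xics, weight=1.0):
    """HYPOTHETICAL combination: best (smallest) SpecEValue on a -log10 scale
    plus weight * mean XIC correlation. Not IPA's actual formula."""
    best = -math.log10(min(spec_evalues))
    return best + weight * xic_correlation(xics)


if __name__ == "__main__":
    spec_evalues = [2.0e-14, 4.4e-14, 6.5e-12, 5.3e-12]  # AADLGLETVIVER (from the poster)
    xic_charge1 = [1.0, 3.0, 9.0, 4.0, 1.0]   # hypothetical intensities per scan
    xic_charge2 = [2.0, 6.0, 18.0, 8.0, 2.0]  # hypothetical, co-eluting with charge 1
    print(round(peptide_score(spec_evalues, [xic_charge1, xic_charge2]), 3))  # 14.699
```

Correlating XICs rewards features that rise and fall together across scans, as the charge 1 and charge 2 ions of one peptide should. A co-eluted but unrelated ion tends to correlate less.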
Example: AADLGLETVIVER matched to 4 spectra
Scan# | SpecEValue | Charge
15831 | 2.0E-14 | 2
15850 | 4.4E-14 | 1
16178 | 6.5E-12 | 2
16172 | 5.3E-12 | 2
Figure: Intensity vs. scan number (scans 15700-16400) for charge 1 and charge 2, annotated “Monoisotopic ion of KWEQITSGTAPFYIDPAR” and “Co-eluted peptide AADLGLETVIVER”.
(Note: Peptide-centric scoring is still under development.)
• Human-iTRAQ-Phos: 29,212 iTRAQ-labeled HCD spectra from a phosphorylation-enriched human sample
• All data sets were generated with a Thermo LTQ Orbitrap Velos.
Co-eluted peptides
• m/z: 890.93, Charge 3, SpecEValue: 8.7E-15
• m/z: 890.78, Charge 2, SpecEValue: 9.0E-15
“Busy” isolation window
• Isolation window target and selected ion
• Correct m/z: 1068.97, Charge 2, SpecEValue: 5.2E-21
Figure: Running time on the Shewanella data set for DM/SQ/PP, MQ, MC/MG, DM/MG, and IPA (legend: IPA, MG, MC, MQ, PP, SQ, DM).
(Note: DeconMSn does not support multi-threading.)