Download - IPA: an Informed Proteomics Analysis Tool for Improved Peptide … · 2016-01-06 · IPA: an Informed Proteomics Analysis Tool for Improved Peptide Identifications Sangtae Kim, Gordon

IPA: an Informed Proteomics Analysis Tool for Improved Peptide Identifications

Sangtae Kim, Gordon W. Slysz, Kevin L. Crowell, Samuel H. Payne, Gordon A. Anderson, and Richard D. Smith

Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA

Introduction

Overview Methods Results

Acknowledgements

Portions of this research were supported by the NIH National Center for Research Resources (RR18522) and National Institute of General Medical Sciences (8 P41 GM103493-10), and by the U.S. Department of Energy Office of Biological and Environmental Research (DOE/BER) Genome Sciences Program. Samples were analyzed using capabilities developed under the support of NIH National Institute of General Medical Sciences (8 P41 GM103493-10) and DOE/BER.

Significant portions of the work were performed in the Environmental Molecular Science Laboratory, a DOE/BER national scientific user facility at Pacific Northwest National Laboratory in Richland, Washington.

References

1. Kessner D et al. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24: 2534-6 (2008).
2. Mayampurath AM et al. DeconMSn: a software tool for accurate parent ion monoisotopic mass determination for tandem mass spectra. Bioinformatics 24: 1021-3 (2008).
3. Kim S et al. MS-GF+: Universal database search tool for mass spectrometry. Submitted (2013).
4. Eng J et al. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5: 976-989 (1994).
5. Nesvizhskii AI et al. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75: 4646-58 (2003).
6. Cox J and Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26: 1367-72 (2008).

Conclusions

Data sets

CONTACT: Sangtae Kim, Ph.D.
Biological Sciences Division, Pacific Northwest National Laboratory
E-mail: [email protected]

• A new approach to using MS1 and MS/MS spectra for identifying peptides.
• IPA, an informed proteomics analysis tool, provides ~15% more peptide identifications.

• IPA better handles co-eluted peptides.
• IPA identified ~20% and ~10% more peptides compared to MSConvert and DeconMSn, respectively.
• IPA identified 20-40% more peptides than Sequest/PeptideProphet and MaxQuant.
• For the phosphorylation-enriched dataset, IPA identified a number of peptides comparable to MaxQuant and DeconMSn/MS-GF+.

• Shewanella: 34,342 CID spectra (High-Low)
• Human-iTRAQ: 22,806 iTRAQ-labeled HCD spectra

# identified peptides (1% FDR)

Examples of peptides exclusively identified by IPA

IPA approach:

1) Start from the MS/MS isolation window, e.g. [760.34, 762.34] (MS2 precursor).
2) Score all peptides in the protein database whose ion m/z's are within the isolation window. Protein Database: Charge 1: [759.33, 763.35]; Charge 2: [1518.66, 1526.7]; Charge 3: [2277.99, 2290.05]; …
3) List top k good scoring peptides using MS-GF+:

Peptide                           Charge  Score (SpecEValue)
GETASVADNTTENGR                   2       1.41288E-18
ASEWAAK                           1       1.99072E-07
EPLLYDFVVRDR                      2       2.85083E-07
NTIYAAGRVGTETLGVYRINL             3       3.15152E-07
DGRDGIVVDRKPEFKVGARVEVEAKFK       4       4.1451E-07
GKVWGRQM+15.995AKLVPPQENKAK       3       4.85142E-07
LNPVFPLPHEVAFWYSGQASSSYDFGQ       4       6.69304E-07
VM+15.995DGGAFVKPNTTQFPNDAQK      3       9.16367E-07
WADTAAK                           1       9.84701E-07
DETLTIDKELAARVVEGDHGDVLMDVAK      4       1.07169E-06

4) For each peptide, compare its theoretical isotopomer profile with MS1 features across multiple scans.
5) Filter peptides with good MS1 feature “fit”:

Peptide                           Charge  Score (SpecEValue)
GETASVADNTTENGR                   2       1.41288E-18

Existing approach:

1) Determine the “accurate precursor” from the precursor MS scan: Precursor m/z: 761.3433, Charge 2.
2) Score peptides in the protein database matching the accurate precursor: masses [1520.6557, 1520.6861].
3) Report the best scoring peptide.
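The two candidate-selection strategies in the workflow can be illustrated with a minimal sketch. It computes a peptide's monoisotopic mass from standard residue masses, tests whether the peptide's ion m/z at a given charge falls inside an isolation window, and builds a mass window around a single precursor. The function names, the 10 ppm tolerance, and the use of the proton mass for the charge carrier are assumptions made for illustration; none of them are taken from IPA or MS-GF+. The mass ranges printed in the workflow figure are wider at the upper end than this strict conversion would give, so the tool presumably applies a margin that is not described here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the peptide termini (Da)
PROTON = 1.007276  # mass of the charge carrier (Da)


def peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def ion_mz(mass, charge):
    """m/z of the protonated ion for a neutral mass and charge state."""
    return (mass + charge * PROTON) / charge


def in_isolation_window(sequence, charge, window):
    """True if the peptide ion at this charge lies inside the window."""
    low, high = window
    return low <= ion_mz(peptide_mass(sequence), charge) <= high


def precursor_mass_window(mz, charge, ppm=10.0):
    """Neutral-mass window around one precursor (ppm is an assumed value)."""
    mass = charge * (mz - PROTON)
    tolerance = mass * ppm * 1e-6
    return mass - tolerance, mass + tolerance


# Values taken from the workflow figure.
window = (760.34, 762.34)
print(in_isolation_window("GETASVADNTTENGR", 2, window))
print(precursor_mass_window(761.3433, 2))
```

With the isolation-window test, every charge state of every database peptide is a potential candidate, which is why the IPA path yields a ranked list while the precursor path yields a single narrow mass window.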

Software Tools Used

• MSConvert (MC): determines precursors based on the selected ions in the raw file
• DeconMSn (DM): determines precursors based on the Averagine model
• Sequest (SQ): database search engine [4]
• PeptideProphet (PP): re-scores Sequest identifications [5]
• MS-GF+ (MG): database search engine
• MaxQuant (MQ): database search with precursor refinement [6]
• Informed Proteomics Analysis (IPA): integrated search and precursor refinement

[Figure: three bar charts comparing DM/SQ/PP, MQ, MC/MG, DM/MG, and IPA; panels: Shewanella, Human-iTRAQ, Human-iTRAQ-Phos.]

• Determining the accurate monoisotopic precursor mass-to-charge ratio (m/z) and charge is important.
• A simple method that uses the ion selected by the instrument (e.g., msconvert in ProteoWizard [1]) is not effective.
• A more complex method using the Averagine model (e.g., DeconMSn [2]) works better. However, it does not work well if multiple co-eluted peptides are present in the isolation window.
• We present a new Informed Proteomics Analysis (IPA) approach that addresses this problem.
• Without pre-determining the accurate precursor, IPA scores all peptides whose ion m/z values are within the isolation window.
• Afterwards, IPA uses MS1 spectra to filter out peptides based on the fit between their isotopomer profiles and the corresponding MS1 peaks.
• IPA also assigns a score to each peptide using its MS/MS identification scores (SpecEValue) and the correlation of the extracted ion chromatograms (XICs) of the MS1 features associated with the peptide.
• With IPA, database search parameters are streamlined. For MS-GF+ [3], the required parameters are: 1) spectrum file(s), 2) database file, and 3) modification file.
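The MS1 filtering step can be sketched as a comparison between a theoretical isotopomer profile and observed MS1 peak intensities. The sketch below is a rough illustration under stated assumptions, not IPA's scoring function: the profile is approximated with a carbon-only Poisson model scaled by the commonly quoted averagine composition, the fit is measured with cosine similarity, and the 0.9 threshold and all function names are hypothetical.

```python
import math

AVERAGINE_MASS = 111.1254    # average mass of one averagine unit (Da)
AVERAGINE_CARBONS = 4.9384   # carbon atoms per averagine unit
C13_ABUNDANCE = 0.0107       # natural abundance of carbon-13


def theoretical_envelope(mass, n_peaks=4):
    """Approximate relative isotopomer intensities for a peptide mass.

    Only carbon-13 is modelled (Poisson approximation), which is a crude
    simplification of a full isotope distribution.
    """
    expected_c13 = mass / AVERAGINE_MASS * AVERAGINE_CARBONS * C13_ABUNDANCE
    return [
        math.exp(-expected_c13) * expected_c13 ** k / math.factorial(k)
        for k in range(n_peaks)
    ]


def cosine_fit(theoretical, observed):
    """Cosine similarity between two intensity vectors of equal length."""
    dot = sum(t * o for t, o in zip(theoretical, observed))
    norm = math.sqrt(sum(t * t for t in theoretical)) * math.sqrt(
        sum(o * o for o in observed)
    )
    return dot / norm if norm > 0 else 0.0


def passes_ms1_filter(mass, observed, threshold=0.9):
    """Keep a peptide if its profile fits the observed MS1 peaks.

    The threshold is a hypothetical value chosen for this example.
    """
    theoretical = theoretical_envelope(mass, len(observed))
    return cosine_fit(theoretical, observed) >= threshold


# Made-up observed intensities for the monoisotopic peak and three isotopes.
good_peaks = [4.8e6, 3.6e6, 1.2e6, 0.3e6]
poor_peaks = [0.0, 0.1e6, 1.0e6, 0.2e6]
print(passes_ms1_filter(1520.67, good_peaks))
print(passes_ms1_filter(1520.67, poor_peaks))
```

Because the comparison is made per candidate peptide rather than per spectrum, two co-eluted peptides in the same isolation window can each be checked against their own MS1 peaks.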

Existing approach IPA approach

Isolation window

Running Time (Shewanella)

IPA peptide-centric scoring

For each peptide passing filters, assign a score based on:

1) MS/MS identification scores (SpecEValue) and
2) Correlation of XICs of MS1 features
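The second ingredient, correlation of XICs, can be illustrated with a Pearson correlation between the intensity traces of two MS1 features of the same peptide (for example, two charge states) sampled over the same scans. This is a sketch only: the intensities below are made up, the function name is hypothetical, and how IPA combines the correlation with SpecEValue is not specified in the source, so no combined score is attempted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long intensity traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traces must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace carries no shape information
    return cov / math.sqrt(var_x * var_y)


# Made-up XIC intensities sampled at the same scans.
xic_charge1 = [0.0, 2.0e6, 9.0e6, 3.5e7, 1.2e7, 1.0e6]
xic_charge2 = [0.0, 1.0e6, 4.0e6, 1.8e7, 6.0e6, 0.5e6]
xic_unrelated = [3.0e7, 1.0e7, 2.0e6, 0.0, 1.0e6, 5.0e6]

print(pearson(xic_charge1, xic_charge2))    # similar elution shape
print(pearson(xic_charge1, xic_unrelated))  # different elution shape
```

Features that belong to the same peptide elute together, so their traces rise and fall in step and the correlation is close to 1; a feature from a different, co-eluted peptide generally has a different elution shape.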

AADLGLETVIVER matched to 4 spectra

Scan#: 15831 SpecEValue: 2.0E-14 Charge 2

Scan#: 15850 SpecEValue: 4.4E-14 Charge 1

Scan#: 16178 SpecEValue: 6.5E-12 Charge 2

Scan#: 16172 SpecEValue: 5.3E-12 Charge 2

Monoisotopic ion of KWEQITSGTAPFYIDPAR

Co-eluted peptide AADLGLETVIVER

[Figure: intensity vs. scan number (scans 15700 to 16400) for Charge 1 and Charge 2.]

(Note: Peptide-centric scoring is still under development.)

• Human-iTRAQ-Phos: 29,212 iTRAQ-labeled HCD spectra from a phosphorylation-enriched human sample
• All data sets were generated with a Thermo LTQ-Orbitrap Velos

Co-eluted peptides

m/z: 890.93 SpecEValue: 8.7E-15 Charge 3

m/z: 890.78 SpecEValue: 9.0E-15 Charge 2

“Busy” isolation window

Isolation window target and selected ion

Correct m/z

m/z: 1068.97 SpecEValue: 5.2E-21 Charge 2

[Figure: bar chart for DM/SQ/PP, MQ, MC/MG, DM/MG, and IPA; legend: IPA, MG, MC, MQ, PP, SQ, DM.]

(Note: DeconMSn does not support multi-threading.)