Conclusion

The comprehensive workflow identified approximately 70% more high-confidence peptides than the general search strategy.

The comprehensive workflow increased the number of high-confidence protein identifications and high-confidence grouped protein identifications by approximately 63% and 44%, respectively, compared with the general search approach.

The comprehensive workflow identifies a large number of high-confidence peptides with multiple PTMs.

The percentage of matched spectra is substantially higher with the comprehensive search workflow: at FDR≤0.01 it rose from 8.0-26.0% to 16.8-38.5% across the four samples.

References

1. Khoury GA, Baliban RC, Floudas CA. Proteome-wide post-translational modification statistics: frequency analysis and curation of the Swiss-Prot database. Sci Rep. 2011 Sep 13;1.

2. Schandorff S, Olsen JV, Bunkenborg J, Blagoev B, Zhang Y, Andersen JS, Mann M. A mass spectrometry-friendly database for cSNP identification. Nat Methods. 2007 Jun;4(6):465-6.

Overview

Purpose: Develop a comprehensive protein identification workflow that yields more high-confidence peptide/protein IDs, including post-translational modifications, than traditional workflows.

Methods: Combinations of multiple search engines (e.g., SEQUEST and Mascot) were used, with the combination of PTMs for each search node chosen according to UniProtKB-relative PTM abundances from high-quality, manually curated, proteome-wide data [1].

Results: The number of high-confidence, Percolator-validated peptide/protein identifications increased markedly compared with the standard SEQUEST and Mascot workflows (approximately 70% more peptides and 63% more proteins at FDR≤0.01).

Introduction

Mass spectrometry has become an established method for protein identification and characterization in recent years. The number of proteins identified from complex biological samples depends on many factors, ranging from the data acquisition strategy to the MS/MS data searching method. Unfortunately, for any complex biological sample only a fraction of the spectra generated have confident peptide matches. Several factors are overlooked by many users in their data searching strategy, including an appropriate combination of post-translational modifications (PTMs), coding SNPs [2], protein isoforms and iterative searching, all of which can help identify these unmatched spectra. Here we develop a comprehensive protein identification workflow that identifies a higher number of high-confidence peptide/protein IDs and also identifies multiple PTMs and partially cleaved peptides in a single run.

Methods

Comprehensive workflow development

We developed a comprehensive MS/MS search workflow within Proteome Discoverer that uses a combination of multiple search engines (Figure 1) in an iterative fashion to maximise the number of protein/peptide identifications by considering the most frequently found PTMs [1], sequence isoforms of proteins, and partially cleaved peptides. The effects of several factors on peptide identification were explored and built into the workflow: protein isoforms, missed cleavage sites, semi-tryptic digestion and, most importantly, an appropriate combination of PTMs in each search node. The combinations of PTMs were chosen based on the UniProtKB-relative abundance of each PTM, found experimentally and putatively, from high-quality, manually curated, proteome-wide data [1]. The workflows were tested on plasma and urine samples acquired on a hybrid Orbitrap mass spectrometer.
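The iterative, multi-node idea can be illustrated with a small Python sketch. It assumes one common reading of iterative searching, namely that spectra left unmatched by one node are passed to the next node with a different PTM set; the poster does not spell out this detail. The `SearchNode` class, the engine names and the `toy_search` function are hypothetical stand-ins and do not correspond to the Proteome Discoverer interface.

```python
from dataclasses import dataclass


@dataclass
class SearchNode:
    """One search pass: an engine label plus the PTMs it considers (hypothetical)."""
    engine: str
    ptms: tuple
    semi_tryptic: bool = False


def run_iterative_search(spectra, nodes, search_fn):
    """Pass spectra through the nodes in order; only unmatched spectra move on.

    search_fn(spectrum, node) returns a peptide string or None. It stands in
    for a real search engine call, which is not modelled here.
    """
    matches = {}
    remaining = list(spectra)
    for node in nodes:
        still_unmatched = []
        for spectrum in remaining:
            peptide = search_fn(spectrum, node)
            if peptide is None:
                still_unmatched.append(spectrum)
            else:
                matches[spectrum] = (peptide, node.engine, node.ptms)
        remaining = still_unmatched
    return matches, remaining


# Toy usage with made-up spectrum IDs and answers.
nodes = [
    SearchNode("engine_a", ("Oxidation",)),
    SearchNode("engine_b", ("Phospho", "Acetyl"), semi_tryptic=True),
]


def toy_search(spectrum, node):
    toy_answers = {
        ("s1", "engine_a"): "PEPTIDEK",
        ("s2", "engine_b"): "LLIYAASSLETGVPSR",
    }
    return toy_answers.get((spectrum, node.engine))


matches, unmatched = run_iterative_search(["s1", "s2", "s3"], nodes, toy_search)
print(matches)
print(unmatched)
```

Keeping each node's PTM list short limits the search space per pass, which is presumably why the PTM combinations are chosen per node rather than all at once.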

FIGURE 2. The comprehensive workflow increases the number of peptide identifications

Results

Peptide Identification

We compared the results from our comprehensive search workflow with those of the general search. On average, the number of high-confidence peptide identifications (FDR≤0.01) increased by approximately 70% with the comprehensive workflow, whereas the number of medium-confidence peptide identifications (FDR≤0.05) roughly doubled relative to the general searches (Figure 2).
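As a minimal sketch of how such a comparison can be computed, the Python below counts identifications at a given FDR threshold from lists of q-values and reports the percent increase. The q-value lists are made-up illustrations, not the poster's data, and the function names are hypothetical.

```python
def count_at_fdr(q_values, threshold):
    """Number of identifications whose q-value is at or below the threshold."""
    return sum(1 for q in q_values if q <= threshold)


def percent_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return 100.0 * (improved - baseline) / baseline


# Made-up q-values for illustration only.
general_q = [0.001, 0.004, 0.02, 0.2]
comprehensive_q = [0.001, 0.002, 0.004, 0.008, 0.03, 0.04, 0.3]

for threshold in (0.01, 0.05):
    general_n = count_at_fdr(general_q, threshold)
    comprehensive_n = count_at_fdr(comprehensive_q, threshold)
    gain = percent_increase(general_n, comprehensive_n)
    print(f"FDR<={threshold}: {general_n} -> {comprehensive_n} ({gain:.0f}% more)")
```

Here the FDR≤0.05 count includes everything that also passes FDR≤0.01, which follows the way the thresholds are written on the poster; a tool that reports medium confidence as a separate band would need the counts adjusted accordingly.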

SEQUEST and Percolator are registered trademarks of University of Washington. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries.

This information is not intended to encourage use of these products in any manner that might infringe the intellectual property rights of others.

FIGURE 4. The comprehensive workflow increases the number of matched spectra.

Table 1. Examples of peptides containing multiple PTMs from the comprehensive search.

Improving mass spectrometry data searching workflow to maximize protein Identifications

Shadab Ahmad 1, Amol Prakash 1, David Sarracino 1, Bryan Krastins 1, MingMing Ning 2, Barbara Frewen 1, Scott Peterman 1, Gregory Byram 1, Maryann S. Vogelsang 1, Gouri Vadali 1, Jennifer Sutton 1, Mary F. Lopez 1

1 Thermo Fisher Scientific, BRIMS (Biomarker Research in Mass Spectrometry), Cambridge, MA
2 Massachusetts General Hospital, Boston, MA

The data for plasma samples 3 and 4 were acquired on a Thermo Q Exactive benchtop mass spectrometer with top-15 data-dependent MS/MS using HCD fragmentation.

Data Analysis

The acquired data were searched with Proteome Discoverer 1.4 (Thermo Fisher Scientific) using the comprehensive workflow and also using a general SEQUEST workflow with standard PTMs (oxidation of methionine as a dynamic modification and alkylation as a static modification), coupled with Percolator validation (General Search).

FIGURE 1. Structure of the comprehensive workflow

Sample Preparation

To evaluate the performance of the comprehensive workflow, we used four human samples from two sources: one urine sample and three plasma samples. The urine and plasma samples were collected with full consent and approval. The samples were reduced and alkylated, then digested with trypsin.

Liquid Chromatography and Mass Spectrometry

The digested samples were separated on a C18 column with a 5-45% acetonitrile gradient in 0.1% formic acid using a nano-LC system. The urine sample (sample 1) and one plasma sample (sample 2) were run for 140 minutes and 90 minutes, respectively, and the data were acquired on an LTQ Orbitrap Velos MS with top-11 and top-10 data-dependent MS/MS, respectively, using CID fragmentation. The other two plasma samples (samples 3 and 4) were run for 250 minutes and 240 minutes, respectively.

FIGURE 3. The comprehensive workflow increases the number of grouped protein identifications (with at least two peptide hits per protein)

The comprehensive workflow was found to increase the number of high-confidence proteins (FDR≤0.01) by 63% and the number of high-confidence grouped proteins by 44% relative to the general search. Moreover, the comprehensive workflow increased the number of high-confidence protein groups with at least two high-confidence peptides for every protein in the group by 15% (Figure 3).
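The "at least two high-confidence peptides per protein" criterion can be sketched in Python as below. This is a simplified illustration under stated assumptions: it treats each protein independently and does not model protein grouping (shared peptides), and the protein IDs, peptides and q-values are invented.

```python
from collections import defaultdict


def proteins_with_min_peptides(psms, q_threshold=0.01, min_peptides=2):
    """Return protein IDs supported by enough distinct confident peptides.

    psms is an iterable of (protein_id, peptide_sequence, q_value) tuples.
    """
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide, q_value in psms:
        if q_value <= q_threshold:
            # A set keeps repeated hits of the same peptide from counting twice.
            peptides_by_protein[protein_id].add(peptide)
    return sorted(
        protein_id
        for protein_id, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    )


# Invented example data.
psms = [
    ("P1", "AAAK", 0.001),
    ("P1", "CCCK", 0.005),
    ("P1", "AAAK", 0.002),
    ("P2", "DDDK", 0.001),
    ("P2", "EEEK", 0.03),   # fails the 0.01 threshold
    ("P3", "FFFK", 0.2),
]
print(proteins_with_min_peptides(psms))
```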

File | Total Spectra | Matched Spectra, General Search (FDR≤0.05) | Matched Spectra, Comprehensive Search (FDR≤0.05) | Matched Spectra, General Search (FDR≤0.01) | Matched Spectra, Comprehensive Search (FDR≤0.01)
Sample1 | 27215 | 27.9% | 43.5% | 26.0% | 38.5%
Sample2 | 14005 | 15.5% | 34.4% | 14.5% | 30.1%
Sample3 | 30026 | 19.9% | 32.8% | 19.1% | 30.1%
Sample4 | 60770 | 8.2% | 18.1% | 8.0% | 16.8%

Sequence | Modification | q-Value
RATTVTGTPCQDWAAQEPHR | R1(ADP-Ribosyl); G7(Myristoyl); C10(Carboxymethyl) | ≤0.001
VSHSPPPKQRSSPVTK | S2(Phospho); S4(Phospho); K8(Methyl); R10(Methyl) | ≤0.001
LLIYAASSLETGVPSR | Y4(Phospho); A6(Acetyl) | 0.007
LVRPEVDVMCTAFHDNEETFLK | M9(Oxidation); C10(Carboxymethyl); F13(Amidated); E17(Carboxy); F20(Amidated) | ≤0.001

Moreover, the comprehensive workflow identified several high-confidence peptides with multiple PTMs, which shows the importance of choosing the right combination of PTMs in each search node (Table 1).
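The modification strings in Table 1 follow a residue-position-name pattern such as Y4(Phospho). The Python sketch below parses that notation and checks that each annotated residue matches the peptide sequence at the stated 1-based position. The regular expression and function names are illustrative assumptions based only on the notation shown in the table.

```python
import re

# One entry looks like "C10(Carboxymethyl)": residue letter, position, name.
MOD_PATTERN = re.compile(r"([A-Z])(\d+)\(([^)]+)\)")


def parse_modifications(mod_string):
    """Split a Table 1 style string into (residue, position, name) tuples."""
    return [
        (residue, int(position), name)
        for residue, position, name in MOD_PATTERN.findall(mod_string)
    ]


def sites_match_sequence(sequence, mods):
    """True if every modification sits on the residue it names (1-based)."""
    return all(
        1 <= position <= len(sequence) and sequence[position - 1] == residue
        for residue, position, _ in mods
    )


# Rows copied from Table 1.
TABLE1 = [
    ("RATTVTGTPCQDWAAQEPHR", "R1(ADP-Ribosyl); G7(Myristoyl); C10(Carboxymethyl)"),
    ("VSHSPPPKQRSSPVTK", "S2(Phospho); S4(Phospho); K8(Methyl); R10(Methyl)"),
    ("LLIYAASSLETGVPSR", "Y4(Phospho); A6(Acetyl)"),
    ("LVRPEVDVMCTAFHDNEETFLK",
     "M9(Oxidation); C10(Carboxymethyl); F13(Amidated); E17(Carboxy); F20(Amidated)"),
]

for sequence, mod_string in TABLE1:
    mods = parse_modifications(mod_string)
    print(sequence, len(mods), sites_match_sequence(sequence, mods))
```

A check like this only confirms that the annotation is internally consistent; it says nothing about whether a given modification is biologically plausible at that site.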

We further investigated the matched and unmatched spectra from the general search and our comprehensive search. The percentage of matched spectra was higher with the comprehensive search workflow in all four samples (Figure 4, Table 2).

Table 2. Comparison of matched spectra
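To make the size of the improvement concrete, the Python sketch below takes the Table 2 values and computes, for each sample at FDR≤0.01, the gain in percentage points, the ratio of comprehensive to general matching, and an approximate number of additional matched spectra. The spectrum counts are estimates because the table reports rounded percentages; the variable and function names are illustrative.

```python
# Rows copied from Table 2:
# (file, total spectra, general 0.05, comprehensive 0.05, general 0.01, comprehensive 0.01)
TABLE2 = [
    ("Sample1", 27215, 27.9, 43.5, 26.0, 38.5),
    ("Sample2", 14005, 15.5, 34.4, 14.5, 30.1),
    ("Sample3", 30026, 19.9, 32.8, 19.1, 30.1),
    ("Sample4", 60770, 8.2, 18.1, 8.0, 16.8),
]


def matched_gain(row):
    """Gain at FDR<=0.01: (file, percentage points, ratio, approx. extra spectra)."""
    name, total, _gen05, _comp05, gen01, comp01 = row
    gain_points = comp01 - gen01
    # Approximate, since the percentages in the table are rounded to one decimal.
    extra_spectra = round(total * gain_points / 100)
    return name, gain_points, comp01 / gen01, extra_spectra


for row in TABLE2:
    name, points, ratio, extra = matched_gain(row)
    print(f"{name}: +{points:.1f} points, x{ratio:.2f}, about {extra} more spectra")
```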