
Efficient and Accurate Glycopeptide Identification Pipeline for High-Throughput Site-Specific N‑Glycosylation Analysis

Mingqi Liu,†,‡ Yang Zhang,‡ Yaohan Chen,†,‡ Guoquan Yan,†,‡ Chengping Shen,§ Jing Cao,†,‡ Xinwen Zhou,†,‡ Xiaohui Liu,†,‡ Lei Zhang,†,‡ Huali Shen,‡ Haojie Lu,†,‡ Fuchu He,‡,∥ and Pengyuan Yang*,†,‡

†Department of Chemistry, Fudan University, 220 Han Dan Road, Shanghai 200433, P. R. China
‡Institutes of Biomedical Sciences, Fudan University, 138 YiXueYuan Road, Shanghai 200032, P. R. China
§Cloudscientific Technology Co., Ltd., 585 Long Hua West Road, Xuhui District, Shanghai 200232, P. R. China
∥State Key Laboratory of Proteomics, 33 Life Science Park, Beijing 102206, P. R. China

*S Supporting Information

ABSTRACT: Study of site-specific N-glycosylation in complex samples remains a huge analytical challenge because protein glycosylation is a structurally diverse post-translational modification, resulting in an intricate mixture of N-glycopeptides. Here we have developed a novel approach for high-throughput N-glycopeptide profiling based on a network-centric algorithm for deciphering glycan fragmentation in mass spectrometry. We performed an extensive validation and a high-throughput N-glycosylation study on serum and identified thousands of N-glycopeptide spectra with high confidence. The results revealed a level of glycan microheterogeneity similar to that of the conventional glycomics approach on individual proteins and provided unique in-depth site-specific information that could only be studied through glycopeptide profiling.

KEYWORDS: N-glycosylation, CID, tandem mass spectrometry, site-specific, post-translational modification, spectral interpretation, bioinformatics

■ INTRODUCTION

The four fundamental molecular building blocks of life are nucleic acids, proteins, glycans, and lipids, and the interaction of these components is essential for cell development and function.1 The linkage between glycans and proteins, or glycosylation, represents not only the most abundant post-translational modification but also by far the most structurally diverse.2

Glycosylation can be studied by different approaches: once the glycan is released from the protein, identification of glycosylation sites or of the overall glycan pattern is a routine technique; however, the all-important site-specific information (glycan microheterogeneity on different glycosylation sites of each protein) is lost.3 The intact glycopeptide, the major target of site-specific glycosylation studies, remains a huge analytical challenge for both proteomics and glycomics.4 Currently no method is freely available for analyzing spectral data in batch and unambiguously determining glycopeptide compositions in a complex sample.5

We have thus developed a high-throughput glycopeptide profiling method based on mass spectrometry (MS). A novel network-centric algorithm coupled to a data-dependent database was specifically designed for glycopeptide spectral analysis. Extensive validation and a high-throughput glycosylation study on a complex sample were performed. We identified thousands of glycopeptide spectra with high confidence and revealed in-depth glycosylation microheterogeneity in serum for dozens of glycans, with site-specific information on many glycoproteins.

■ METHODS

Routine glycoproteomics techniques were used in sample processing and MS spectral collection; details are given in Supplementary Methods S1 in the Supporting Information.

Spectral Extraction and Recalibration

RAW files from mass spectrometry were converted into mzXML format, and MS/MS spectra in “dta” format were extracted from mzXML, by msconvert and MzXML2Search, respectively (Trans-Proteomic Pipeline, or TPP, v4.5.1).24 The recalibration of MS/MS spectra was performed using SEQUEST (Bioworks v3.3.1) and FTDR.25 First, database searching with a wide precursor mass tolerance (100 ppm) was performed by SEQUEST. After that, FTDR performed automatic recalibration. Other parameters for SEQUEST were as follows. Protein database: Swissprot v14.9, Homo sapiens, 20 343 entries, with the same number of reversed protein sequences attached as a decoy database; enzyme specificity: semitrypsin; missed cleavages permitted: 2; fixed modification: carbamidomethylation (C + 57.02150); variable modifications: oxidation (M + 15.9949) for all samples and deamidation (N + 0.9840) for samples with PNGase F treatment; mass tolerance for fragment ions: 1 Da.

Received: December 22, 2013
Published: April 25, 2014
Technical Note, pubs.acs.org/jpr
© 2014 American Chemical Society. dx.doi.org/10.1021/pr500238v | J. Proteome Res. 2014, 13, 3121−3129

Identification of Nonglycosylated Peptides

SEQUEST and PeptideProphet (TPP, v4.5.1) were used to identify nonglycosylated peptides. All spectra after recalibration were searched by SEQUEST using the parameters described above, except for a much narrower precursor mass tolerance (10 ppm). The search results were converted into “PEPXML” format by Out2XML (TPP, v4.5.1), and subsequent analyses were performed by PeptideProphet (with the decoy-assisted semiparametric model). A stringent threshold score (p ≥ 0.99) was used to ensure that the FPR at the spectral level was well below 1%.

Generation of Data-Dependent Deglycopeptide Database

All peptide identifications (PeptideProphet score, p ≥ 0.99) with deamidation on N-glycosylation motifs (N-X-S/T, X ≠ P) were classified as deglycopeptides. A data-dependent deglycopeptide database was generated from the deglycopeptide identifications of the given sample. A typical example of one deglycopeptide in the database is given as:

Accession: sp|P01871|IGHM_HUMAN
Peptide: R.GLTFQQN#ASSM$C%VPDQDTAIR.V
Start: 203
End: 223
Seq_feature: N_GLYCAN
Fix_modification: C%
Varible_modification: M$
Molecular_weight: N_GLYCAN

In this example, # marks the N-glycosylation site, % marks carbamidomethylation on C, and $ marks oxidation on M in the “Peptide” column. The molecular weight of the peptide includes a hydrogen ion. The deglycopeptide database used in this report is given in Supplementary Data S1 and S2 in the Supporting Information.

Generation of Knowledge-Based Glycan Database

The N-glycome of human serum has been extensively studied. We generated a knowledge-based glycan database from existing knowledge: the three largest plausible N-glycans corresponding to high-mannose, hybrid, and complex-type sugar chains in human serum were used. GlycoWorkbench was used to generate all Y-type fragments of the three N-glycans; 365 different glycan compositions were included in the glycan database for human serum.26 Here is an example of one glycan in the database:

Composition: 3 2 0 0 0
Molecular_Weight: 892.3172

The “Composition” column consists of five digits, corresponding to five categories of monosaccharides with different masses:

Glc/Man/Gal or Hex: D-glucose/D-mannose/D-galactose

GlcNAc/GalNAc or HexNAc: N-acetyl-D-glucosamine/N-acetylgalactosamine

NeuAc: N-acetylneuraminic acid

Xyl: xylose

Fuc: fucose

For example, the common core of N-glycan is glycan “32000” in the database. The elements of the database can easily be changed or extended according to the user’s requirements. The glycan database used in this report is given in Supplementary Data S3 in the Supporting Information.
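As an illustrative check of the example entry, the molecular weight of a composition can be recomputed by summing monosaccharide residue masses. The sketch below assumes standard monoisotopic residue masses (these values are not listed in the paper) and assumes that the five digits follow the order of the category list above (Hex, HexNAc, NeuAc, Xyl, Fuc). The function names are hypothetical.

```python
# Standard monoisotopic residue masses in Da (an assumption; not from the paper).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "NeuAc": 291.0954,
    "Xyl": 132.0423,
    "Fuc": 146.0579,
}

# Assumed digit order of the "Composition" column.
ORDER = ("Hex", "HexNAc", "NeuAc", "Xyl", "Fuc")


def parse_composition(text):
    """Turn a string such as '3 2 0 0 0' into a {residue: count} mapping."""
    counts = [int(token) for token in text.split()]
    if len(counts) != len(ORDER):
        raise ValueError("a composition needs exactly five counts")
    return dict(zip(ORDER, counts))


def composition_mass(text):
    """Sum residue masses for a composition string."""
    composition = parse_composition(text)
    return sum(RESIDUE_MASS[name] * n for name, n in composition.items())


core_mass = composition_mass("3 2 0 0 0")
print(round(core_mass, 4))  # 892.3172, matching the example entry
```

Under these assumptions the common core (three Hex, two HexNAc) reproduces the listed value of 892.3172, which suggests that the database stores the sum of residue masses.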

Construction of Glycopeptide Database

The data-dependent glycopeptide database is a peptide-glycan linkage multiplication between the deglycopeptide and glycan databases: deglycopeptides in the deglycopeptide database were linked exhaustively with glycans in the glycan database. A glycopeptide is named by directly joining the peptide and the glycan. For example, “R.EEQYN#STYR.V32000” means that peptide “R.EEQYN#STYR.V” has the common core of N-glycan (sugar composition 32000) attached to N#. For peptides with more than one glycosylation site, we do not differentiate the site of glycan attachment in this paper: for such glycopeptides, glycan attachment at either site generates an almost identical CID−MS/MS spectrum. One possible solution to this problem is to use MS3 or even MS4 for peptide cleavage, which arguably has much lower sensitivity than MS2. In addition, glycopeptides with the same glycan and different peptides of exactly the same mass cannot be differentiated by CID−MS/MS: for example, glycopeptide R.EEQYN#STFR.V32000 from protein IGHG3 and R.EEQFN#STYR.V32000 from protein IGHG4 have almost identical spectra. Therefore, N-glycosylation of IGHG3 and IGHG4 is reported together as a group in this paper.

Two different kinds of glycopeptide databases were generated in this paper. The first is the glycopeptide composition database: only information on the deglycopeptide and the composition of the glycan was considered, which means that isomeric glycan structures shared the same entry. We used the glycopeptide composition database for high-throughput glycosylation analysis. The second is the glycopeptide structure database: isomeric glycan structures with the same composition were separated in this database, and we used it for structural analysis.
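The exhaustive linkage and the naming rule described above can be sketched as a Cartesian product. This is a minimal illustration, not the GRIP implementation: the function name is hypothetical, the peptide masses are placeholders, and the sketch ignores details such as the mass shift between a deamidated deglycopeptide and the glycosylated peptide.

```python
from itertools import product


def build_glycopeptide_db(deglycopeptides, glycans):
    """Link every deglycopeptide with every glycan (exhaustive product).

    deglycopeptides: iterable of (peptide_string, peptide_mass)
    glycans: iterable of (composition_string, glycan_mass)
    Returns a list of (glycopeptide_name, summed_mass) tuples.
    """
    database = []
    for (peptide, pep_mass), (composition, gly_mass) in product(deglycopeptides, glycans):
        # The name is the direct join of peptide and glycan,
        # e.g. R.EEQYN#STYR.V + 32000 -> R.EEQYN#STYR.V32000
        name = peptide + composition.replace(" ", "")
        database.append((name, pep_mass + gly_mass))
    return database


# Peptide masses below are placeholders for illustration only.
peptides = [("R.EEQYN#STYR.V", 1000.0), ("R.EEQFN#STYR.V", 1001.0)]
# Glycan masses are sums of standard residue masses (an assumption).
glycans = [("3 2 0 0 0", 892.3172), ("3 2 0 0 1", 1038.3751)]

db = build_glycopeptide_db(peptides, glycans)
print(len(db), db[0][0])  # 4 R.EEQYN#STYR.V32000
```

Because the product grows as (number of deglycopeptides) × (number of glycans), restricting the peptide side to deglycopeptides actually identified in the sample keeps the search space small.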

Identification of Glycopeptide

For high-throughput interpretation of glycopeptide composition, the algorithm needs three inputs: the MS/MS spectrum, the deglycopeptide database, and the glycan database. In the interpretation process, GRIP forms the glycopeptide database in computer memory, as previously mentioned. The search parameters of GRIP were: mass tolerance for precursor, 10 ppm; mass tolerance for fragment, 1 Da; peak filtering, top 100 in intensity; marker mass, the masses of four categories of glycans in the database: 146.1, 162.1, 203.1, and 291.1. The core equation is a hypergeometric distribution for the completeness of Y-fragments in glycopeptide spectra

$$\mathrm{score} = -2 \times \log_{10}\left(1 - \sum_{i=0}^{N_n-1} \frac{C_{P_s}^{i}\, C_{N_a-P_s}^{N_p-i}}{C_{N_a}^{N_p}}\right) \qquad (1)$$

where Ps is the number of peaks selected, Na is the number of all nodes in a glycan database, Nn is the experimental number of network nodes for the glycan linkage network, and Np is the predicted number of network nodes for the glycan linkage network.


Detailed interpretation procedures of GRIP are described in Supplementary Methods S2 in the Supporting Information.
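To make the scoring idea concrete, here is a minimal sketch of a hypergeometric completeness score. It assumes the score is −2 × log10 of the upper-tail probability of observing at least Nn of the Np predicted nodes when Ps peaks are selected from Na possible nodes. This reading is inferred from the symbol definitions above and may differ from the exact form used by GRIP. The function names are hypothetical.

```python
from math import comb, log10


def hypergeom_tail(Nn, Np, Ps, Na):
    """P(X >= Nn) for a hypergeometric draw.

    Na: all nodes in the glycan database (population size)
    Ps: number of peaks selected
    Np: predicted number of network nodes
    Nn: experimentally observed number of network nodes
    """
    if not (0 <= Nn <= Np <= Na and 0 <= Ps <= Na):
        raise ValueError("expected 0 <= Nn <= Np <= Na and 0 <= Ps <= Na")
    total = comb(Na, Np)
    upper = min(Np, Ps)
    # Summing the upper tail directly equals 1 - sum_{i < Nn} and avoids
    # floating-point cancellation when the tail is very small.
    hits = sum(comb(Ps, i) * comb(Na - Ps, Np - i) for i in range(Nn, upper + 1))
    return hits / total


def completeness_score(Nn, Np, Ps, Na):
    """-2 * log10 of the tail probability; larger means less likely by chance."""
    tail = hypergeom_tail(Nn, Np, Ps, Na)
    if tail == 0.0:
        return float("inf")
    return -2.0 * log10(tail)


# Illustrative inputs: 365 nodes and top-100 peaks follow the text;
# Np = 20 and the Nn values are made up for the example.
low = completeness_score(5, 20, 100, 365)
high = completeness_score(15, 20, 100, 365)
print(low < high)  # True: matching more predicted nodes gives a higher score
```

Exact integer binomials from `math.comb` keep the computation stable even when the tail probability is many orders of magnitude below 1.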

Poisson Estimation of False Positive Matches

We applied a Poisson model to estimate the expected number of false-positive matches. We postulated that a fragment MS/MS spectrum is derived from a given glycopeptide possibly present in the constructed glycopeptide database and that a number of false-positive matches with a similar fragment pattern but a random intensity sequence might occur across the glycopeptide database. Then, the false-positive match-specific probability, p(x), that M or more matches will be seen in an experiment is

$$p(x) = \sum_{x=M}^{\infty} \frac{\lambda^{x}}{x!}\, e^{-\lambda} \qquad (2)$$

where x is the number of false-positive matches in the results, λ is the expected mean number of false-positive matches, and p(x) is the probability at a given x.

It may not be feasible to perform enough experiments on the glycoproteome of a serum sample to obtain statistical values of x from which to calculate the mean λ. Here we suggest an approximation of λ: the expected mean λ should not exceed the total number of spectra, Ti, in the ith search (with a maximum of T0 spectra in the experiment) and in practice would be limited by Ti × FDRi, where FDR is the false-positive rate (the number of matches, Si, in the decoy glycopeptide database over the number, Ti, in the real glycopeptide database) in our validation experiment. Si, Ti, and FDRi are values that depend on the ith search score: for example, S0 = T0 and FDR0 = 1 at the zeroth score of 0, and Si = 0 and FDRi = 0 at a very high ith score. We could perform an iteration process to estimate an expected M = x at a given λ for a 95% confidence level (p = 0.05), as follows

$$\sum_{x=M}^{\infty} \frac{\lambda^{x}}{x!}\, e^{-\lambda} = 0.05 \qquad (3)$$

with λt = 0.5(λt + Ti × FDRi) in the iteration.

The calculated M can then be used to evaluate the expected false-positive matches at the 95% confidence level and to compare with the Si selected in the validation experiment for the GRIP algorithm.
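The Poisson tail estimate can be illustrated with a short sketch. It assumes λ is already known and finds the smallest M whose upper-tail probability is at most 0.05. The iteration used to refine λ is not reproduced here, and the function names are hypothetical.

```python
from math import exp


def poisson_tail(M, lam):
    """P(X >= M) for X ~ Poisson(lam), computed as 1 - P(X <= M - 1)."""
    term = exp(-lam)  # P(X = 0); underflows to 0.0 for very large lam
    cdf_below = 0.0
    for x in range(M):
        cdf_below += term
        term *= lam / (x + 1)  # P(X = x + 1) from P(X = x)
    return max(0.0, 1.0 - cdf_below)


def smallest_M(lam, alpha=0.05):
    """Smallest M such that P(X >= M) <= alpha."""
    M = 0
    while poisson_tail(M, lam) > alpha:
        M += 1
    return M


# With an expected mean of one false-positive match, four or more
# false positives would be seen with probability below 0.05.
print(smallest_M(1.0))  # 4
```

In this reading, a number of decoy matches at or above M would be unexpected at the 95% confidence level for the given λ.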

■ RESULTS AND DISCUSSION

Network-Centric Algorithm for Automated, High-Throughput Glycopeptide Spectral Analysis

We describe a novel approach for efficient, reliable glycosylation study. One of the major obstacles in MS-based glycopeptide analysis is the compromise between spectral quality and quantity. At the moment, no single MS/MS method can provide de novo interpretation of the glycopeptide spectrum.6 Most glycosylation studies on complex samples either use multiple MS/MS methods, at the cost of lower sensitivity, with overall high quality and low quantity, or use a single MS/MS method and complicated spectral analysis, with overall low quality and high quantity.5−7 Our workflow therefore aims to achieve high-confidence, high-throughput glycopeptide analysis by incorporating a data-dependent database and a network-centric algorithm, as demonstrated later.

The core of our approach is an algorithm specifically designed for glycopeptide spectra, as shown in Figures 1 and 2. Glycopeptides show extensive glycosidic bond cleavage in the MS/MS spectrum (in an ion-trap-based instrument with collision-induced dissociation (CID), see Methods), while the peptide backbone remains intact (Figure 1a), generating a very informative fragmentation network for glycans only (Figure 1b). Because each node in this network includes information on both the peptide backbone and the glycan, we reckoned that by reconstructing the fragmentation network we could interpret glycopeptide spectra in a high-throughput manner using a single MS/MS spectrum. On the basis of this idea, we developed a workflow for automated glycopeptide spectral analysis, named “GRIP” (Glycopeptide Revealing and Interpretation Pipeline).

Figure 1. Fragmentation behavior of glycopeptide spectrum. (a) High-quality MS/MS glycopeptide spectrum. (b) Corresponding fragmentation network used by the GRIP algorithm, with glycan branches marked for the fragments in panel a.

In the sample preparation procedures of GRIP, the data-dependent database (DDD) is an important feature for accurate glycosylation analysis of complex samples. As demonstrated in Figure 1, the peptide backbone, or deglycopeptide, usually remains intact in the MS/MS spectrum; therefore, its sequence cannot be directly elucidated. We designed the data-dependent glycopeptide database with the knowledge of glycosylation studies on complex samples. We first created DDDs of deglycopeptides and of glycans from deglycopeptide and glycan analysis, respectively (Figure 2). The deglycopeptides in a given sample were identified using routine techniques, while glycans were derived from a previous report8 (see Methods and Supplementary Data S1−3 in the Supporting Information). We could then perform glycopeptide spectral collection and interpretation (Figure 2). Note that the two parallel experiments shown in Figure 2 started from the same sample and were conducted using almost identical procedures except for deglycosylation (Supplementary Methods S1 in the Supporting Information).

In the data interpretation procedures of GRIP, our homemade software automatically searches all collected MS/MS spectra against the data-dependent glycopeptide database. Our algorithm analyzes each spectrum in four steps, as shown in Figure 2 with a decision tree: (1) precursor matching: the precursor of the spectrum is compared against the glycopeptide database; (2) glyco-mass detection: peaks in the spectrum are filtered for possible pairs of glyco-mass fragmentation; (3) network construction and relation tree searching: we use the remaining peaks to generate a glycopeptide fragmentation network and calculate a relation tree (consecutive fragmentation) in this network; (4) scoring: two equations are used for the final score of the proposed network. Spectra with a score above a threshold are classified as glycopeptide identifications. The details of the GRIP algorithm and the manual for the corresponding software are given in Supplementary Methods S2 in the Supporting Information. We tested our workflow using asialofetuin, a standard glycoprotein. Twenty-five nonredundant glycopeptides on three glycosylation sites were identified (Supplementary Figure 4 and Supplementary Data S1, S3, and S4 in the Supporting Information).
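The first two steps of the decision tree can be sketched as follows. This is a simplified illustration, not the GRIP software: it assumes a list of (name, mass) database entries, treats the 1 Da fragment tolerance as the tolerance on peak-to-peak differences, and assumes a single known charge state for all fragment peaks. The function names are hypothetical; the marker masses and the 10 ppm tolerance come from the Methods.

```python
MARKER_MASSES = (146.1, 162.1, 203.1, 291.1)  # marker masses listed in the Methods


def within_ppm(observed, theoretical, tol_ppm=10.0):
    """True if the relative mass error is within tol_ppm parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def match_precursor(precursor_mass, database, tol_ppm=10.0):
    """Step 1: names of database entries whose mass matches the precursor."""
    return [name for name, mass in database
            if within_ppm(precursor_mass, mass, tol_ppm)]


def glyco_mass_pairs(peaks, charge=1, tol=1.0):
    """Step 2: peak pairs whose mass difference equals a marker mass.

    peaks: iterable of m/z values; charge is assumed equal for both peaks.
    Returns (low_mz, high_mz, marker) triples.
    """
    mzs = sorted(peaks)
    pairs = []
    for i, low in enumerate(mzs):
        for high in mzs[i + 1:]:
            delta = (high - low) * charge
            for marker in MARKER_MASSES:
                if abs(delta - marker) <= tol:
                    pairs.append((low, high, marker))
    return pairs


# Made-up values for illustration only.
database = [("A", 2000.00), ("B", 2000.05)]
print(match_precursor(2000.01, database))  # ['A'] (5 ppm error vs 20 ppm)
print(len(glyco_mass_pairs([1000.0, 1162.1, 1365.2, 1500.0])))  # 2
```

Peaks that take part in no such pair carry no glycan-ladder information under this model, which is one plausible reason for filtering them before the network is constructed.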

Extensive Validation by Complementary, Independent MS/MS Technique

We applied our method to N-glycosylation analysis of undepleted human serum (the same sample for all experiments on serum in this study) and carried out extensive validation to demonstrate the performance of GRIP. As illustrated in Figure 2, we first generated the DDD of serum glycopeptides from deglycosylated samples; after that, conventional two-dimensional liquid chromatography (2D-LC) MS/MS analysis was performed, followed by automated spectral interpretation by GRIP using the aforementioned DDD of glycopeptides (Supplementary Data S1 and S2 in the Supporting Information).

For the validation of high-throughput analysis, we designed paired CID− and HCD−MS/MS experiments (HCD refers to higher-energy C-trap dissociation9,10), which is possibly a convincing measure for verifying glycopeptide analysis at the moment. Unlike MS-based peptide sequence analysis, which uses a decoy database and statistical analysis to assess data quality, the study of glycopeptide spectral data quality likely has no widely accepted decoy database or other proven statistical measures.3,5 Here we applied the paired MS/MS technique for result validation: as previously reported, glycopeptides show distinct and complementary fragmentation behavior in CID− and HCD−MS/MS; therefore, we can use information in HCD−MS/MS to verify the results from automated CID−MS/MS analysis (Figure 3, Supplementary Method S3 in the Supporting Information).

We carried out two runs of 2DLC−CID/HCD−MS/MS on serum, and 67 142 pairs of CID− and HCD−MS/MS spectra were collected. We performed a GRIP search on all CID spectra for glycopeptide analysis, conducted a manual interpretation of paired HCD glycopeptide spectra, and completed a SEQUEST/PeptideProphet search for peptide analysis with all CID/HCD spectra. The cutoff score for the GRIP search was determined against a target-decoy spectral search and a Poisson distribution (see Methods and Supplementary Method S3 in the Supporting Information). As demonstrated in Figure 3b, 1053 spectra were identified by GRIP (corresponding to 183 nonredundant glycopeptides, 37 core peptides, 24 proteins), 797 of them (75.5%) showed sufficient information for CID result validation in paired HCD spectra, and 795 of the 797 HCD spectra showed the same glycopeptide composition as the GRIP search result from CID spectra, indicating a false-positive rate (FPR) of only 0.3% (Supplementary Data S5 in the Supporting Information). To our knowledge, such high accuracy had hardly been achieved in automated glycopeptide-based analysis of complex samples.

Figure 2. Overview of the proposed strategy for automated glycopeptide analysis. A given sample is prepared in two parallel experiments (with and without deglycosylation) to generate the glycopeptide database and collect glycopeptide spectra. After that, homemade software elucidates glycopeptide spectra in several steps, as shown by the decision tree.

We noticed that most paired HCD glycopeptide spectra (561 of 797) showed a complete series of Y1-related ions (Figure 3b; Y1 denotes the peptide backbone with one monosaccharide). Three high-quality CID/HCD glycopeptide spectra are shown in Figure 3c: the CID spectra on the left show extensive fragmentation of the glycan, while the HCD spectra on the right show distinct Y1-related ions. Note that the “Y1 + 146” ion appears only when fucose is present. More examples of paired CID/HCD glycopeptide spectra are shown in Supplementary Figure S1 in the Supporting Information.

We further analyzed whether different glycopeptide compositions affect spectral quality and found a significantly lower quality of HCD spectra due to sialic acid (Supplementary Figure S2 in the Supporting Information). In addition, the overall parallelism efficiency of CID− and HCD−MS/MS in our instrument was measured using peptide identification: 20 458 peptide CID spectra were identified, and 13 774 of them (67.3%) showed the same identification in paired HCD spectra, indicating that the ratio of elucidated paired HCD glycopeptide spectra (75.5%) was reasonable.

Performance Evaluation of Glycosylation Study on Serum

Finally, we performed a high-throughput N-glycosylation analysis of human serum to explore the analytical capability of the GRIP algorithm. Three runs of 2DLC−CID−MS/MS were performed with the same spectral interpretation procedures as in the previous section. In total, we identified more than 5000 spectra with glycopeptide information (corresponding to 1145 nonredundant glycopeptides, 225 core peptides, 95 proteins), which is likely to be by far the most extensive glycopeptide-based analysis of serum.7,11 Compared with the CID− and HCD−MS/MS runs used for validation, the overall sensitivity of spectral collection increased significantly, mainly because of the reduced MS cycle time. An original high-quality glycopeptide spectrum with annotation is shown in Figure 4: GRIP elucidated nearly every major peak in this spectrum. Another 12 glycopeptide spectra with detailed annotation are shown in Supplementary Figure S5 in the Supporting Information.

A large proportion of glycopeptide spectra were derived from the immunoglobulin (IG) family, which consists of several high-abundance glycoproteins in serum. Figure 5 shows site-specific glycosylation of the IG family from 2852 glycopeptide spectra (see spectral information in Supplementary Data S6 in the Supporting Information), including IGHG1/2/4 and IGJ (one site each), IGHA1/2 (two sites each), and IGHM (four sites), implying obvious and distinct site-specific glycosylation differences on a single protein. For example, 45 different glycans were attached to N-46 of IGM, while for the other three sites of this protein 26 different glycans were identified in total.

To demonstrate the performance of GRIP in microheterogeneity coverage on individual proteins, we also compared our results against a recent serum glycan library.12 Note that a conventional low-throughput glycomics approach was used for this library, and glycan microheterogeneity of only seven proteins was provided (without site-specific information because the glycans were released first), while our method identified dozens of glycoproteins in serum with site-specific glycosylation information. As shown in Figure 5, we identified more glycans on IgG, IgM, and alpha-1-acid glycoprotein 1. We found that the poorer performance of GRIP for serotransferrin and alpha-2-macroglobulin could result from the relatively long length of their tryptic glycopeptides. For example, the potential glycopeptide on serotransferrin (glycosylation site N-432) is likely to have a molecular weight (MW) above 5000 Da (the corresponding deglycopeptide, with an MW of 3529 Da, was identified in the parallel experiment), which is hardly detectable using conventional MS/MS at present. In general, GRIP revealed a level of microheterogeneity coverage on individual proteins similar to that of classical glycomics approaches. To our knowledge, previous glycopeptide-based approaches had barely achieved such performance on complex samples.

Figure 3. Demonstration of CID/HCD−MS/MS glycopeptide spectral validation. (a) Workflow of validation for glycopeptide analysis. (b) Validation result of CID glycopeptide spectra using paired HCD spectra. (c) An example of paired CID/HCD spectra. Glycopeptide CID spectra with annotation of major peaks are on the left, and HCD spectra with annotation of Y1-related ions are on the right.

Automated Glycopeptide Structure Analysis

Another feature of our method is that GRIP is capable of providing partial structure information on the site-specific glycan in a high-throughput manner, which seems not to have been reported previously. In principle, the glycan database was extended to all probable glycoform isomers of the same glycan composition. GRIP then estimated relative scores of the isomers using the network-centric algorithm and assigned a glycopeptide structure to a spectrum; an example is shown in Figure 6a. For each of the first three possible structures listed in Figure 6a, a fragmentation network score was calculated from the matching of the same 15 spectral peaks; then, different scores were given to each structure based on its correlation with the theoretical network. We found that the third possible structure, with no core fucose, matches only 10 peaks to the theoretical network and therefore received the lowest score. Two examples of high-throughput interpretation of glycopeptide structure are illustrated in Figure 6b, showing the relative confidence values of particular glycoforms with 19 high-quality glycopeptide spectra of protein IGHG1 (upper) and with 5 spectra of IGHG4 (lower), respectively. We found that the glycoform with the highest relative confidence value seemed to be the most probable, and the glycan structure with no core fucose in both examples received a very low score (<50% confidence level on average) and could be considered barely existent. These results are consistent with previous studies of these proteins with glycomics approaches.13,14

■ CONCLUSIONS

A high-throughput glycosylation analysis based on glycopeptide spectral identification was demonstrated, and an extensive validation on complex samples was performed, with a 0.3% FPR achieved in serum glycosylation analysis. Extensive site-specific glycosylation microheterogeneity could be revealed in a single shotgun-type LC−MS/MS analysis (Figure 5).

Compared with existing methods for glycosylation analysis, GRIP has several advantages: (1) Outstanding efficiency in spectral identification: we identified thousands of glycopeptide spectra in undepleted serum with very high accuracy, while most glycopeptide-based approaches could identify only a few hundred spectra,15−23 showing the efficiency of the network-centric GRIP algorithm. (2) Ability to reveal abundant glycosylation microheterogeneity: GRIP is the first glycopeptide-based approach to reveal a level of glycan microheterogeneity similar to that of conventional glycomics approaches (Figure 5), and it provides the unique in-depth site-specific information, as shown in this report, that could only be studied through glycopeptide profiling. In addition, GRIP could perhaps perform high-throughput O-glycosylation studies with the help of a corresponding high-quality DDD of glycopeptides.

Figure 4. Demonstration of high-quality glycopeptide spectrum. The upper spectrum is the original spectrum without any modification. Spectral information such as glycopeptide composition, GRIP score, and mass accuracy is listed at the top left. The potential glycan structure is shown at the top right. The lower table lists elucidated glyco-Y-fragments with sufficient intensity. Corresponding peaks of each of the fragments are labeled in the spectrum in red.

An important issue related to our approach is its possible application to other types of MS instruments. Glycopeptide analysis in this report was performed on ion-trap CID−MS/MS; we therefore further investigated glycopeptide fragmentation on quadrupole CID−MS/MS. Glycopeptide from horseradish peroxidase was systematically measured using different collision energy levels on a Q-Star XL instrument (Supplementary Figure S3 in the Supporting Information); we observed a glycopeptide fragmentation pattern similar to that in the ion trap. In summary, we have illustrated that, with proper instrument tuning, GRIP is capable of identifying glycopeptide spectra from quadrupole CID−MS/MS as well.

Our strategy for glycopeptide profiling could open up an exciting possibility for high-throughput site-specific glycosylation comparison between multiple biological samples (e.g., human serum from healthy individuals and patients) for potential biomarkers. All sample preparation procedures in our approach are widely used techniques in routine glycoproteomics, while

data processing procedures are as easy as conventional

proteomics data analysis. GRIP software as well as raw MS

data shown in this report are available upon request.

Figure 5. Result of glycosylation analysis on serum. Upper table: site-specific glycosylation information on the immunoglobulin family. Hex/HexNAc/NeuAc/Fuc gives the composition of the identified glycans; the remaining columns show the numbers of identified glycopeptide spectra corresponding to the specified glycosylation site. Lower figure: glycans revealed on seven proteins by GRIP and in a recent serum glycan library.


■ ASSOCIATED CONTENT

*S Supporting Information

Examples of paired CID/HCD glycopeptide spectra. Y1-related information in HCD glycopeptide spectra. Possible application to other types of MS instruments. LC of standard glycoprotein and XIC of identified de-glycopeptides and glycopeptides. Examples of 12 high-quality glycopeptide spectra with annotation. Experimental procedures for sample processing. Interface to GRIP (GUI version). Detailed interpretation procedures of GRIP (batch version, for multiple spectra). Interpretation and validation of paired CID/HCD−MS/MS spectra. Core peptides for standard glycoprotein and serum analysis, respectively. Glycans obtained from literature. Detailed information for each glycopeptide spectrum from standard glycoprotein and serum CID−MS/MS analysis. Detailed information for each glycopeptide spectrum from serum CID/HCD−MS/MS analysis. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: 86-21-65642009. Fax: 86-21-54237961.

Notes

The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS

The study was supported by S973/863 projects (2012AA020203, 2010CB912700), National Natural Science Foundation of China project 21227805.

■ REFERENCES

(1) Marth, J. D. A unified vision of the building blocks of life. Nat. Cell Biol. 2008, 10, 1015−1016.
(2) Marino, K.; Bones, J.; Kattla, J. J.; Rudd, P. M. A systematic approach to protein glycosylation analysis: a path through the maze. Nat. Chem. Biol. 2010, 6, 713−723.
(3) Pan, S.; Chen, R.; Aebersold, R.; Brentnall, T. A. Mass spectrometry based glycoproteomics − from a proteomics perspective. Mol. Cell. Proteomics. 2011, DOI: 10.1074/mcp.R110.003251.
(4) Doerr, A. Glycoproteomics. Nat. Methods. 2012, 9, 36.
(5) Dallas, D. C.; Martin, W. F.; Hua, S.; German, J. B. Automated glycopeptide analysis-review of current state and future directions. Briefings Bioinf. 2012, 14, 361−374.
(6) Dodds, E. D. Gas-phase dissociation of glycosylated peptide ions. Mass Spectrom. Rev. 2012, 31, 666−682.
(7) Joenvaara, S.; Ritamo, I.; Peltoniemi, H.; Renkonen, R. GlyBio N-Glycoproteomics − An automated workflow approach. Glycobiology. 2008, 18, 339−349.
(8) Kronewitter, S. R.; An, H. J.; Leoz, M. L.; Lebrilla, C. B.; Miyamoto, S.; Leiserowitz, G. S. The development of retrosynthetic glycan libraries to profile and classify the human serum N-linked glycome. Proteomics 2009, 9, 2986−2994.
(9) Segu, Z. M.; Mechref, Y. Characterizing protein glycosylation sites through higher-energy C-trap dissociation. Rapid Commun. Mass Spectrom. 2010, 24, 1217−1225.

Figure 6. Structural verification of glycopeptide. (a) Example of glycopeptide structure interpretation by GRIP for a high-quality glycopeptide CID−MS/MS spectrum of R.EEQFN#STFR.V-33001 (top figure), with three deduced structures from upper to lower figures with final-score/intensity-score/network score of 6.03/64.2/9.4, 4.75/64.2/7.4, and 2.43/38.0/6.4, respectively. The existing and theoretical fragmentation networks of the three possible structures differed significantly from each other, and GRIP used this information to distinguish different glycoforms. (b) Result of high-throughput glycopeptide structure analysis. Upper right: possible glycoforms with the same monosaccharide composition for protein IGHG1 and normalized scores across 19 spectra for these glycoforms; lower right: possible glycoforms for protein IGHG4 and normalized scores across five spectra. The relative confidence value is the ratio between the score of the current candidate and the highest score of all candidates. The horizontal axis lists all scores of multiple spectra.


(10) Singh, C.; Zampronio, C. G.; Creese, A. J.; Cooper, H. J. Higher Energy Collision Dissociation (HCD) Product Ion-Triggered Electron Transfer Dissociation (ETD) Mass Spectrometry for the Analysis of N-Linked Glycoproteins. J. Proteome Res. 2012, 11, 4517−4525.
(11) Wang, H.; Wong, C. H.; Chin, A.; Taguchi, A.; Taylor, A.; Hanash, S.; Sekiya, S.; Takahashi, H.; Murase, M.; Kajihara, S.; Iwamoto, S.; Tanaka, K. Integrated mass spectrometry−based analysis of plasma glycoproteins and their glycan modifications. Nat. Protoc. 2011, 6, 253−269.
(12) Aldredge, D.; An, H. J.; Tang, N.; Waddell, K.; Lebrilla, C. B. Annotation of a Serum N-Glycan Library for Rapid Identification of Structures. J. Proteome Res. 2012, 11, 1958−1968.
(13) Mittermayr, S.; Bones, J.; Doherty, M.; Guttman, A.; Rudd, P. M. Multiplexed Analytical Glycomics Rapid and Confident IgG N-Glycan Structural Elucidation. J. Proteome Res. 2011, 10, 3820−3829.
(14) Pucic, M.; Knezevic, A.; Vidic, J.; Adamczyk, B.; Novokemet, M.; Polasek, O.; Gornik, O.; Goreta, S. S.; Wormald, M. R.; Redzic, I.; Campbell, H.; Wright, A.; Hastie, N. D.; Wilson, J. F.; Rudan, I.; Wuhrer, M.; Rudd, P. M.; Josic, D.; Lauc, G. High Throughput Isolation and Glycosylation Analysis of IgG−Variability and Heritability of the IgG Glycome in Three Isolated Human Populations. Mol. Cell. Proteomics. 2011, DOI: 10.1074/mcp.M111.010090.
(15) Goldberg, D.; Bern, M.; Parry, S.; Smith, M. S.; Panico, M.; Morris, H. R.; Dell, A. Automated N-Glycopeptide Identification Using a Combination of Single- and Tandem-MS. J. Proteome Res. 2007, 6, 3995−4005.
(16) Ozohanics, O.; Kreyacz, J.; Ludanyi, K.; Pollreisz, F.; Vekey, K.; Draho, L. GlycoMiner a new software tool to elucidate glycopeptide composition. Rapid Commun. Mass Spectrom. 2008, 22, 3245−3254.
(17) Alley, W. R.; Mechref, Y.; Novotny, M. V. Characterization of glycopeptides by combining collision-induced dissociation and electron-transfer dissociation mass spectrometry data. Rapid Commun. Mass Spectrom. 2009, 23, 161−170.
(18) Wu, Y.; Mechref, Y.; Klouckova, I.; Mayampurath, A.; Novotny, M. V.; Tang, H. Mapping site-specific protein N-glycosylations through liquid chromatography-mass spectrometry and targeted tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2010, 24, 965−972.
(19) Froehlich, J. W.; Barboza, M.; Chu, C.; Lerno, L. A. J.; Clowers, B. H.; Zivkovic, A. M.; German, J. B.; Lebrilla, C. B. Nano-LC−MSMS of Glycopeptides Produced by Nonspecific Proteolysis Enables Rapid and Extensive Site-Specific Glycosylation Determination. Anal. Chem. 2011, 83, 5541−5547.
(20) Nwosu, C. C.; Seipert, R. R.; Strum, J. S.; Hua, S. S.; An, H. J.; Zivkovic, A. M.; German, B. J.; Lebrilla, C. B. Simultaneous and Extensive Site-specific N- and O-Glycosylation Analysis in Protein Mixtures. J. Proteome Res. 2011, 10, 2612−2624.
(21) Halim, A.; Nilsson, J.; Ruetschi, U.; Hesse, C.; Larson, G. Human Urinary Glycoproteomics Attachment Site Specific Analysis of N- and O-Linked Glycosylations by CID and ECD. Mol. Cell. Proteomics. 2012, DOI: 10.1074/mcp.M111.013649.
(22) Pompach, P.; Chandler, K. B.; Lan, R.; Edwards, N.; Goldman, R. Semi-Automated Identification of N-Glycopeptides by Hydrophilic Interaction Chromatography, nano-Reverse-Phase LC−MSMS, and Glycan Database Search. J. Proteome Res. 2012, 11, 1728−1740.
(23) Kolarich, D.; Jensen, P. H.; Altmann, F.; Packer, N. H. Determination of site-specific glycan heterogeneity on glycoproteins. Nat. Protoc. 2012, 7, 1285−1298.
(24) Deutsch, E. W.; Mendoza, L.; Shteynberg, D.; Farrah, T.; Lam, H.; Tasman, N.; Sun, Z.; Nilsson, E.; Pratt, B.; Prazen, B.; Eng, J. K.; Martin, D. B.; Nesvizhskii, A. I.; Aebersold, R. A guided tour of the Trans-Proteomic Pipeline. Proteomics 2010, 10, 1150−1159.
(25) Zhang, J.; Ma, J.; Dou, L.; Wu, S.; Qian, X.; Xie, H.; Zhu, Y.; He, F. Mass Measurement Errors of Fourier-Transform Mass Spectrometry (FTMS) Distribution, Recalibration, and Application. J. Proteome Res. 2009, 8, 849−859.
(26) Ceroni, A.; Maass, K.; Geyer, H.; Geyer, R.; Dell, A.; Haslam, S. M. GlycoWorkbench A Tool for the Computer-Assisted Annotation of Mass Spectra of Glycans. J. Proteome Res. 2008, 7, 1650−1659.
