
Proc. Natl. Acad. Sci. USA
Vol. 93, pp. 3647-3652, April 1996
Biochemistry

Mass spectrometric amino acid sequencing of a mixture of seed storage proteins (napin) from Brassica napus, products of a multigene family

PETER M. GEHRIG*†, ANDRZEJ KRZYZANIAK‡, JAN BARCISZEWSKI‡, AND KLAUS BIEMANN*§

*Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139; and ‡Institute of Bioorganic Chemistry of the Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland

Contributed by Klaus Biemann, December 22, 1995

ABSTRACT The amino acid sequences of a number of closely related proteins ("napin") isolated from Brassica napus were determined by mass spectrometry without prior separation into individual components. Some of these proteins correspond to those previously deduced (napA, BngNAP1, and gNa), chiefly from DNA sequences. Others were found to differ to a varying extent (BngNAP1', BngNAP1A, BngNAP1B, BngNAP1C, gNa', and gNaA). The short chains of gNa and gNa' and of BngNAP1 and BngNAP1' differ by the replacement of N-terminal proline by pyroglutamic acid; the long chains of gNaA and BngNAP1B contain a six amino acid stretch, MQGQQM, which is present in gNa (according to its DNA sequence) but absent from BngNAP1 and BngNAP1C. These alternations of sequences between napin isoforms are most likely due to homologous recombination of the genetic material, but some of the changes may also be due to RNA editing. The amino acids that follow the untruncated C termini of those napin chains for which the DNA sequences are known (napA, BngNAP1, and gNa) are aromatic amino acids. This suggests that the processing of the proprotein leading to the C termini of the two chains is due to the action of a protease that specifically cleaves a G/S-F/Y/W bond.

Napin, a member of the 2S albumin class of proteins, is one of the major seed storage proteins in Brassica napus, constituting about 20% of the total protein content in mature rape seeds (1, 2). These proteins are expressed during seed development as precursors, undergo co- and posttranscriptional modifications, and are then transported to membranous organelles (protein bodies) where they accumulate in large quantities. There is increasing interest in napin proteins and their genes because they represent a good model for studying both the expression of a multigene family and protein maturation processes in plant cells (3-7). DNA-binding studies provide some evidence for a role of napin in the regulation of its own high level of synthesis (A.K. and J.B., unpublished data). Napin contains a short track rich in basic amino acids typical of nuclear localization signals (8), and also glutamine-rich domains that are characteristic of one group of transcription factors (9, 10). Furthermore, x-ray studies of phaseolin, another seed storage protein, showed a domain with structural similarity to the helix-turn-helix motif found in certain DNA-binding proteins (11).

Mature napin consists of two polypeptide chains that are linked by disulfide bonds (2). It appears from comparison of these polypeptide chains with the corresponding DNA sequences that the initial translation product is a 20-kDa precursor protein (2-4) that contains both the peptide chains of mature napin as well as peptide stretches that are removed during maturation. First, an N-terminal signal peptide is removed from the precursor, which is subsequently cleaved at four other points, yielding two disulfide-linked peptide chains with molecular masses in the range of 4-10 kDa (2).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Several different napin polypeptides can be expected as products of expression of a gene family of at least 10 (4) and perhaps more than 16 members (5). Four genomic sequences, napA (4), napB (6), gNa (5), and BngNAP1 (12), and three cDNA sequences, pN1 and pN2 (3), and pNAP1 (2), were published. Mature napin protein was sequenced by Edman degradation without prior separation of isoforms (2). The sequence of one single main component was found to correspond to the cDNA sequence of pNAP1 and the gene sequence of napA (which are identical), and only minor sequence heterogeneity was observed.

In another study, the isolation of five napin proteins from B. napus was described (13). The amino acid compositions of the three major components, napins nII, nIII, and nIV, the sequences of the N- and C-terminal ends of both chains (14), and the sequence of the entire small chain of nIII were reported (15). However, these napin proteins could not be unambiguously correlated with known gene or cDNA sequences, and the results obtained for the processing sites of the napin chains were only partially consistent with those proposed previously (2).

To assess the function of plant storage proteins and eventually allow genetic engineering of improved grain crops, the detailed primary structure of these proteins must be known. For this reason, we have carried out, mainly by mass spectrometry, the complete characterization of the amino acid sequences and the truncated termini of all relatively abundant 14-kDa napin forms from B. napus. Because of the high degree of homology among napin variants and due to their ragged ends, separation and isolation of all isoforms are difficult. For this reason, and in order to obtain a complete picture of the expressed napin proteins, the structural analysis was done on the mixture of the 14-kDa napins that all eluted in one HPLC peak. Using a strategy involving different mass spectrometric methods, napin protein sequences corresponding exactly to known DNA sequences, as well as others differing by a few amino acids, were identified and their expression and processing patterns were analyzed in detail. The environment of the N and C termini of different isoforms showed common structural features indicating processing by one or two specific proteases.

Abbreviations: MALDI, matrix-assisted laser desorption ionization; TOF, time-of-flight; MS, mass spectrometer(try); PSD, postsource decay; RP-HPLC, reversed-phase HPLC; TFA, trifluoroacetic acid; CID, collision-induced dissociation.
†Present address: Brain Research Institute, University of Zurich, August Forel-Strasse 1, CH-8029, Zurich, Switzerland.
§To whom reprint requests should be addressed.

MATERIALS AND METHODS

Isolation of Napin. Rape seeds (B. napus var. Bor; 500 g) were homogenized and extracted with buffer A (50 mM NaH2PO4, pH 7.0/1 mM EDTA) and centrifuged for 1 h [all centrifugations were at 4000 rpm (2504 × g)]. To 400 ml of supernatant, solid (NH4)2SO4 was added to 30% of saturation and centrifuged. The supernatant was again brought to 75% saturation of (NH4)2SO4 and centrifuged; the pellet was then dissolved in 50 ml of buffer A, dialyzed against the same buffer, and passed over a Sephadex G-50 column. Fractions containing napin were collected, precipitated with (NH4)2SO4 (75% of saturation), and centrifuged. Crude napin was dissolved in buffer B (same as buffer A, but pH 7.4), dialyzed against the same buffer, loaded onto a Sephadex 50 coarse medium column, and eluted with a gradient of 0.15-0.35 M NaCl in buffer B. Fractions containing napin were pooled, precipitated with (NH4)2SO4, redissolved in buffer A, and dialyzed. Finally, 1.4 g of crude napin was obtained. For the work described here, 5 mg was further purified by reversed-phase (RP)-HPLC on an Aquapore-C4 column (Brownlee Lab) using a gradient of 0-80% acetonitrile in H2O containing 0.05% trifluoroacetic acid (TFA).

Reduction and S-Ethylpyridylation of Napin. Napin (0.28 mg, 20 nmol) was dissolved in 100 µl NH4HCO3 buffer (0.01 M, pH 8.3), reduced, and S-ethylpyridylated by a 10-fold excess of triethylphosphine (Aldrich; 1 wt-% in 2-propanol) and a 100-fold excess of 4-vinylpyridine (Sigma). The mixture was incubated under argon at 37°C for 2 h and then lyophilized.

Enzymatic Digestion of Napin. S-Ethylpyridylated napin (0.22 mg, 16 nmol) was sequentially digested with Endo-Lys-C (Wako Pure Chemical, Osaka), 150:1 substrate:enzyme, at 30°C for 6 h followed by trypsin (Boehringer Mannheim), 100:2 substrate:enzyme, at 37°C for 3 h in 100 µl NH4HCO3 buffer (0.01 M, pH 8.3). S-Alkylated napin (0.11 mg, 8 nmol) was digested with 2 wt-% of α-chymotrypsin in 250 µl NH4HCO3 buffer (0.1 M, pH 7.8) using otherwise identical conditions as for trypsin, and another 0.22 mg (16 nmol) was treated with 2 wt-% of Endo-Glu-C in 0.1 M sodium phosphate buffer and 0.02% sodium azide (Pierce) at 37°C for 2 h. Finally, 0.22 mg (16 nmol) S-alkylated napin was digested with 2 wt-% of thermolysin in 100 µl of NH4HCO3 buffer (0.01 M, pH 8.3) at 37°C for 2 h. These digests were partially fractionated by RP-HPLC for mass spectrometric analyses.
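The mass and molar amounts quoted for each digest can be cross-checked against the roughly 14-kDa size of the napins described in the introduction. The following is a minimal sketch, assuming a nominal molecular mass of 14,000 g/mol (a round illustrative figure, not a value stated in this section); the function name is hypothetical.

```python
# Nominal molecular mass assumed for illustration only; the text
# refers to "14-kDa napins" without giving a single exact value.
NAPIN_MW_G_PER_MOL = 14_000.0


def mg_to_nmol(mass_mg: float, mw_g_per_mol: float = NAPIN_MW_G_PER_MOL) -> float:
    """Convert a protein mass in milligrams to an amount in nanomoles."""
    grams = mass_mg * 1e-3
    moles = grams / mw_g_per_mol
    return moles * 1e9


# Sample amounts quoted in the reduction and digestion protocols.
for mass_mg in (0.28, 0.22, 0.11):
    print(f"{mass_mg:.2f} mg is about {mg_to_nmol(mass_mg):.1f} nmol")
```

Under this assumption the three amounts round to the 20, 16, and 8 nmol given in the text.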

Relatively small amounts (


Table 1. Summary of MALDI mass spectral data of S-ethylpyridylated short and long napin chains

Columns: HPLC fraction*; Protein chain; Sequence; Position; Calc. [M + H]+; Measured [M + H]+

Sequence column entries:
IP...QSGGGPS
RIP...QSGGGPS
GPFRIP...QSGGGPS
AGPFRIP...QSGGGPS
SAGPFRIP...QSGGGPS
IP...QSGSGPS
RIP...QSGSGPS
GPFRIP...QSGSGPS
AGPFRIP...QSGSGPS
PAGPFRIP...QSGSGPS
QAGPFRIP...QSGSGPS
IP...QSGSGPS
RIP...QSGSGPS
GPFRIP...QSGSGPS
PAGPFRIP...QSGSGPS
RIP...GGGSGPS
PAGPFRIP...GGGSGPS


Short chains:

napA, BngNAP1, BngNAP1', BngNAP1A, gNa, gNa'

   1       10        20        30        40
(D)SAGPFRIPKCRKEFQQAQHLRACQQWLHKQAMQSGGGPS(W)
(N)P ...................K ..............S...(W)


Table 5. [M + H]+ ions of peptides obtained from clostripain digestion of fractions 3-6 (long chains) of Fig. 1

HPLC       Measured     Calc.        Position*
fraction   [M + H]+     [M + H]+     (columns 6-10)

3           810.7        810.9       1-7
           2130.1       2129.4       39-60
           2428.3       2427.8       36-60
           3311.4       3313.0       61-88
           3409.6       3410.1       61-89
           3499.2       3497.1       61-90
                        3501.2       8-35

4           810.6        810.9       1-7
            829.0        829.0       36-42
           1885.7       1885.1       43-60
           3313.9       3313.0       61-88
           3410.8       3410.1       61-89
           3500.9       3497.2       61-90
                        3501.2       8-35
           4293.9       4293.1       1-35
           4311.0       4311.2       8-42

5           709.2        709.9       71-75
            828.8        829.0       36-42
            953.2        953.0       0-7
           1200.6       1200.4       61-70
           1695.7       1696.1       76-90
           2132.4       2132.4       43-60
           3500.6       3501.2       8-35
           4307.7       4311.2       8-42
           4432.9       4435.2       0-35

6           810.7        810.9       1-7
            828.2        829.0       36-42, 36-42
           1856.1       1855.0       43-60, 43-60
           3314.4       3313.0       61-88, 61-88
           3410.0       3410.1       61-89, 61-89
           3500.7       3497.2       61-90, 61-90
                        3501.2       8-35, 8-35
           4310.4       4311.2       8-42, 8-42

Molecular weights of the protonated peptide ions were determined by MALDI-MS.
*See legend of Table 3 for heavy chain sequence positions. See Fig. 1.
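As a reading aid for Table 5, the sketch below tabulates the difference between measured and calculated [M + H]+ values for the fraction 5 peptides. The numbers are copied from the table; the variable and function names are illustrative choices, not part of the original work.

```python
# (measured [M + H]+, calculated [M + H]+, position) for fraction 5,
# transcribed from Table 5.
FRACTION_5 = [
    (709.2, 709.9, "71-75"),
    (828.8, 829.0, "36-42"),
    (953.2, 953.0, "0-7"),
    (1200.6, 1200.4, "61-70"),
    (1695.7, 1696.1, "76-90"),
    (2132.4, 2132.4, "43-60"),
    (3500.6, 3501.2, "8-35"),
    (4307.7, 4311.2, "8-42"),
    (4432.9, 4435.2, "0-35"),
]


def deviations(rows):
    """Return (position, measured minus calculated) for each peptide."""
    return [(pos, measured - calc) for measured, calc, pos in rows]


for pos, delta in deviations(FRACTION_5):
    print(f"{pos:>6}: {delta:+.1f} Da")

# The peptide with the largest absolute deviation.
worst = max(deviations(FRACTION_5), key=lambda item: abs(item[1]))
print("Largest deviation:", worst[0])
```

The deviations grow with peptide mass, with the largest values falling on the peptides above m/z 4000.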

The MALDI-MS data obtained from fraction 6 similarly indicate that it consists chiefly of BngNAP1 (12), as full-length, C-terminally truncated (loss of S and PS), and N-terminally truncated (loss of PQGPQ) versions. There is also another isoform, termed BngNAP1C, that is 29.7 Da heavier than BngNAP1. Pairs of peaks differing by about the same mass were also found for the C-terminally processed forms, but not for those starting with amino acid 6. Thus, the only difference between these two isoforms must be near the N terminus. Edman sequencing of fraction 6 indeed revealed the presence of both glycine and serine at position 3, which causes the observed mass difference of ~30 Da. This finding explains a similar earlier observation (14).

The peptides produced from fraction 4 by clostripain (Table 5) are almost identical to those from BngNAP1, with the exception of m/z 1885.7 (fraction 4) vs. m/z 1856.1 (fraction 6). Because the latter corresponds to peptide 43-58 of BngNAP1, the former most likely represents the same region but differs in mass by +29.6 Da. A similar difference (+31.0 Da) is observed between the corresponding Endo-Lys-C peptides of m/z 3519.8 (from BngNAP1, fraction 6) and m/z 3550.8 in fraction 4 (Table 3). Finally, the set of [M + H]+ ions also differs by an average of 30.1 Da. All these data point to the region 43-60 for the sequence difference. While the peptide ion of m/z 1885.7 is a minor component in the clostripain digest, more of it had also been isolated from an Endo-Lys-C/trypsin digest. A combination of micro-Edman sequencing (in the presence of a considerable excess of peptide IYQTATHLPK, m/z 1171.8) and MALDI-PSD data suggested the sequence QQQGMQGQQMQHVISR for the segment 43-60 (Table 6) of this new napin, which we designate BngNAP1B.
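The mass arithmetic behind these assignments can be sketched in a few lines. The example below uses standard average residue masses (reference values, not taken from this paper) to compute an [M + H]+ value for the peptide IYQTATHLPK and to list which single-residue replacements would produce a shift close to the 29.7 Da separating BngNAP1C from BngNAP1. The function names and the 0.5-Da tolerance are illustrative assumptions, and S-ethylpyridylated cysteine is not modeled.

```python
# Standard average residue masses (Da) of the 20 common amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER = 18.0153   # average mass of H2O added for the free termini
PROTON = 1.0073   # mass of the ionizing proton


def average_mh(sequence: str) -> float:
    """Average-mass [M + H]+ of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def substitutions_matching(delta: float, tol: float = 0.5):
    """List (old, new, shift) replacements whose mass shift is within tol of delta."""
    hits = []
    for old, m_old in AVERAGE_RESIDUE_MASS.items():
        for new, m_new in AVERAGE_RESIDUE_MASS.items():
            shift = m_new - m_old
            if old != new and abs(shift - delta) <= tol:
                hits.append((old, new, round(shift, 2)))
    return hits


print(f"IYQTATHLPK [M + H]+ (average mass): {average_mh('IYQTATHLPK'):.1f}")
print("Replacements near +29.7 Da:", substitutions_matching(29.7))
```

Several replacements besides Gly to Ser fall within this tolerance, which is consistent with the text relying on Edman sequencing rather than on the mass shift alone to place glycine and serine at position 3.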

Fraction 5 also seems to represent a new long chain and its N-truncated analogues. The N-terminal clostripain peptide 0-7 of [M + H]+ = m/z 953.2 (Table 5, fraction 5) as well as peptides 43-60, 71-75, and 76-90 fit the amino acid sequences of the corresponding portions of gene gNa (5). However, clostripain peptides [M + H]+ = m/z 828.8 and m/z 4307.7 seem to match m/z 828.2 and m/z 4310.4 in the same digest of BngNAP1 (fraction 6), corresponding to peptides 36-42 and 8-42, respectively. From the latter, we conclude that in this napin, which we term gNaA, amino acids 10 and 11 are both leucine (as in BngNAP1) and not proline and that the amino acid at position 38 is lysine and not arginine, as the gene sequence of gNa would require. This is further corroborated by the Endo-Lys-C peptide, [M + H]+ = m/z 4091.1, obtained from fraction 5 (Table 3), which corresponds to the region 0-31 with Leu [10]-Leu [11], and by the presence of the Endo-Lys-C peptide 39-84. This Leu-Leu placement is redundantly proven by the detection of an Endo-Glu-C peptide 0-17, [M + H]+ of m/z 2289.4, calculated 2289.6, and a thermolysin peptide 0-10, [M + H]+ of m/z 1259.0, calculated 1260.5 (data not shown). The [M + H]+ ion of m/z 1200.6 of the clostripain digest of fraction 5 does not match any known napin DNA sequence. The high-energy CID spectrum of the corresponding Endo-Lys-C/trypsin peptide indicates that it represents aa 61-70 and that 61 is not valine, as the gene sequence of gNa would require, but isoleucine as in all other napin variants. The sequences (determined by CID-MS or MALDI-PSD) of these and other proteolytic peptides that further confirm the assignment of the structures of the napin chains differing from those previously known are listed in Table 6.

While the identification of the short and long chains and their primary structures present in the napin isolated from B. napus was now complete, it remained to be determined which of these were linked together by disulfide bonds. The increase in sensitivity and resolution resulting from recent modifications ("delayed ion extraction") of the MALDI-TOF instrument and its operation (17) allowed the assignment of at least the major short chain-long chain combinations based on molecular weight measurements (Table 7).

Table 6. Sequences (determined by CID or PSD) supporting new napin structures

Measured [M + H]+   Δ*    Sequence   Napin   Position†   Digest
996.5               0.0

Table 7. Molecular weights of intact napin proteins

Short chain   Position   Long chain    Position   Calc.* [M + H]+
napA          7-39       napA          1'-89'     13233.3 (1.6)
napA          6-39       napA          1'-89'     13389.3 (1.3)
napA          6-39       napA          1'-90'     13476.4 (2.9)
BngNAP1/1'    6-39       BngNAP1       1'-89'     13628.5 (2.9)
BngNAP1/1'    6-39       BngNAP1       1'-90'     13715.6 (2.3)
BngNAP1A      6-39       BngNAP1B/C    1'-90'     13772.5 (2.1)
gNa/gNa'      7-41       gNaA          0'-90'     14202.3 (3.1)
gNa/gNa'      6-41       gNaA          0'-90'     14358.4 (6.5)
gNa'          1-41       gNaA          0'-90'     14842.3 (1.6)

*Values in parentheses = measured (by MALDI-MS) minus calculated.
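The calculated masses in Table 7 can be checked for internal consistency: entries that differ only by one terminal residue should differ in mass by that residue. The sketch below takes pairs of entries from Table 7 and reports the residue whose standard average mass lies closest to each difference. The residue masses are reference values, and the dictionary labels and function name are illustrative.

```python
# Standard average residue masses (Da) of the 20 common amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}

# Calculated [M + H]+ values transcribed from Table 7.
TABLE_7 = {
    "napA 7-39 / napA 1'-89'": 13233.3,
    "napA 6-39 / napA 1'-89'": 13389.3,
    "napA 6-39 / napA 1'-90'": 13476.4,
    "gNa 7-41 / gNaA 0'-90'": 14202.3,
    "gNa 6-41 / gNaA 0'-90'": 14358.4,
}


def nearest_residue(delta: float):
    """Return (residue, mass) of the amino acid closest in mass to delta."""
    aa = min(AVERAGE_RESIDUE_MASS, key=lambda r: abs(AVERAGE_RESIDUE_MASS[r] - delta))
    return aa, AVERAGE_RESIDUE_MASS[aa]


# Pairs of Table 7 entries that differ by a single terminal residue.
PAIRS = [
    ("napA 7-39 / napA 1'-89'", "napA 6-39 / napA 1'-89'"),
    ("napA 6-39 / napA 1'-89'", "napA 6-39 / napA 1'-90'"),
    ("gNa 7-41 / gNaA 0'-90'", "gNa 6-41 / gNaA 0'-90'"),
]

for shorter, longer in PAIRS:
    delta = TABLE_7[longer] - TABLE_7[shorter]
    aa, mass = nearest_residue(delta)
    print(f"{shorter} -> {longer}: {delta:.1f} Da, closest residue {aa} ({mass:.2f})")
```

Extending the short chain from position 7 to position 6 adds a mass matching arginine, in line with the RIP... and IP... forms listed in Table 1, and extending the long chain from 89' to 90' adds a mass matching serine, in line with the C-terminal loss of S described for fraction 6.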

DISCUSSION

Napin isolated from B. napus was found to be a complex mixture of closely related proteins (Fig. 2). Some of these proteins are identical with those previously identified (by translation of the genomic DNA or cDNA), namely, napA (4), BngNAP1 (12), and gNa (short chain only) (5). Additionally, we found a number of new variants: the short chains termed BngNAP1', BngNAP1A, and gNa', and the long chains termed BngNAP1B, BngNAP1C, and gNaA. No evidence for the presence of the long chain of gNa was found. The molecular weight data summarized in Table 7 indicate that the short chain of BngNAP1A is connected via the two disulfide bonds to either the long chain of BngNAP1B and/or BngNAP1C, and similarly gNa' is connected to gNaA. The mixture is further complicated by the varying degrees of N- and/or C-terminal truncation listed in Table 1.

From Fig. 2, it is evident that BngNAP1 and BngNAP1', as well as gNa and gNa', differ only by a replacement of the N-terminal proline by pyroglutamic acid. Because it is unlikely that the same C → A mutation occurred independently in both the gene of BngNAP1 and of gNa, one must conclude that it took place during the successive duplication and divergence of the ancestral protogene (7), and that the other sequence differences between BngNAP1 and gNa were copied most probably by homologous recombination of the genetic material. A similar process may have incorporated the gene stretch leading to the amino acid region 49-58 (MQGQQMQQVI) from gNa into gNaA and BngNAP1B (where Q-56 is further converted to H). However, in BngNAP1C, this region remains identical to that of BngNAP1. Whereas some of these differences could be due to an editing process at the mRNA level (23-25), the codon changes are too varied beyond the known C → U, U → C (26), and A → I (i.e., G) (27) cases of editing.
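The C → A argument can be made concrete with the standard genetic code: proline is encoded by CCN, and replacing the second base by A gives CAN, which encodes glutamine or histidine. An N-terminal glutamine can cyclize to pyroglutamic acid. The sketch below is illustrative only; the text does not state which proline codon is involved, so all four are shown, and the assumption that the change sits at the second codon position is labeled in the code.

```python
# Relevant subset of the standard genetic code (DNA alphabet).
CODON_TABLE = {
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "CAT": "His", "CAC": "His", "CAA": "Gln", "CAG": "Gln",
}


def second_base_c_to_a(codon: str) -> str:
    """Apply a C -> A change at the second codon position (assumed position)."""
    if codon[1] != "C":
        raise ValueError(f"second base of {codon} is not C")
    return codon[0] + "A" + codon[2]


for codon in ("CCT", "CCC", "CCA", "CCG"):
    mutated = second_base_c_to_a(codon)
    print(f"{codon} ({CODON_TABLE[codon]}) -> {mutated} ({CODON_TABLE[mutated]})")
```

Only the CCA and CCG codons give glutamine after this single change, so under these assumptions the proline codon in the ancestral gene would have been one of those two.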

It should be noted that no evidence for the long chain of gNa as derived from the published genomic DNA sequence (5) could be found, nor any other sequence containing P, P, R, and V in positions 10, 11, 38, and 61.

The mass spectrometric evidence indicates that most of the napin chains are also present partially truncated at either or both termini (see Table 1). The N termini of the full-length chains of those derived from a DNA sequence are in agreement with the specificity of a recently discovered enzyme (28), which cleaves C-terminal to asparagine (amino acid shown in parentheses at the N termini in Fig. 2). The sequence of napA previously determined by the Edman method (2) begins with isoleucine, which according to the DNA sequence is preceded by arginine. However, now this turns out to be an N- and C-terminally truncated form (aa 7-35) and is thus not in disagreement with the enzyme specificity. According to the DNA sequence corresponding to napA (4), position -1 is Asp (see Fig. 1) indicating that its close relationship to Asn is sufficient for proper processing of the proprotein.

Upon inspection of the DNA-derived known amino acid sequences, we noticed that the amino acid following the completed, nontruncated C terminus is always an aromatic amino acid (shown in parentheses in Fig. 2). Thus, the second set of peptide-bond cleavages necessary to generate the short and long chains seems to be also due to an enzyme that specifically cleaves N-terminal to Phe, Tyr, or Trp, particularly when preceded by Ser (or Gly, in the case of the long chains of gNa and gNaA).
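The proposed G/S-F/Y/W specificity can be illustrated by scanning a sequence for bonds where Gly or Ser is immediately followed by Phe, Tyr, or Trp. The sketch below uses the napA short chain together with its flanking residues, (D) and (W), as shown in Fig. 2. The regular expression is a simplification chosen for illustration; the actual protease may recognize additional sequence context.

```python
import re

# napA short chain (positions 1-39) with the flanking residues shown in
# parentheses in Fig. 2: Asp before position 1 and Trp after position 39.
NAPA_SHORT_WITH_FLANKS = "D" + "SAGPFRIPKCRKEFQQAQHLRACQQWLHKQAMQSGGGPS" + "W"

# Zero-width match at a bond with G or S on the left and F, Y, or W on the right.
GS_AROMATIC_BOND = re.compile(r"(?<=[GS])(?=[FYW])")


def candidate_cleavage_sites(sequence: str):
    """Return 0-based string indices of the residue that follows each matching bond."""
    return [match.start() for match in GS_AROMATIC_BOND.finditer(sequence)]


sites = candidate_cleavage_sites(NAPA_SHORT_WITH_FLANKS)
for index in sites:
    left = NAPA_SHORT_WITH_FLANKS[index - 1]
    right = NAPA_SHORT_WITH_FLANKS[index]
    print(f"bond {left}-{right} between string indices {index - 1} and {index}")
```

Because the string begins with the flanking Asp, string index 39 corresponds to Ser-39 of the chain. In this fragment the only matching bond is the one between Ser-39 and the following Trp; internal aromatic residues such as the Phe in GPF and the Trp in QQW are not preceded by Gly or Ser and are not reported.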

The authors are indebted to J. A. Vath for the PSD spectra, to P. Juhasz for the MALDI mass spectrum of intact napin, and to A. Rich for stimulating discussions. P.M.G. thanks the Swiss National Science Foundation for a fellowship. This work was funded by National Institutes of Health Grant GM05472 (to K.B.). A.K. and J.B. were supported by a grant from the Polish Scientific Committee (KBN).

1. Lonnerdahl, B. & Jansson, J. C. (1972) Biochim. Biophys. Acta 278, 175-183.
2. Ericson, M., Rodin, J., Lenman, M., Glimelius, K., Josefsson, L.-G. & Rask, L. (1986) J. Biol. Chem. 261, 14576-14581.
3. Crouch, M. L., Tenbarge, K. M., Simon, A. E. & Ferl, R. (1983) J. Mol. Appl. Genet. 2, 273-283.
4. Josefsson, L.-G., Lenman, M., Ericson, M. L. & Rask, L. (1987) J. Biol. Chem. 262, 12196-12201.
5. Scofield, R. & Crouch, M. L. (1987) J. Biol. Chem. 262, 12202-12208.
6. Ericson, M., Murén, E., Gustavsson, H.-O., Josefsson, L.-G. & Rask, L. (1991) Eur. J. Biochem. 197, 741-746.
7. Raynal, M., Depigny, D., Grellet, F. & Delseny, M. (1991) Gene 99, 77-86.
8. Gerace, L. (1995) Cell 82, 341-344.
9. Schneuwly, S., Kuroiwa, A., Baumgartner, P. & Gehring, W. J. (1986) EMBO J. 5, 733-739.
10. Gerber, H.-P., Seipel, K., Georgiev, O., Höfferer, M., Hug, M., Rusconi, S. & Schaffner, W. (1994) Science 263, 808-811.
11. Lawrence, M. C., Izard, T., Beuchat, M., Blagrove, R. J. & Colman, P. M. (1994) J. Mol. Biol. 238, 748-776.
12. Baszczynski, C. L. & Fallis, L. (1990) Plant Mol. Biol. 14, 633-635.
13. Monsalve, R. I. & Rodriguez, R. (1990) J. Exp. Bot. 41, 89-94.
14. Monsalve, R. I., Menéndez-Arias, L., López-Otín, C. & Rodriguez, R. (1990) FEBS Lett. 263, 209-212.
15. Monsalve, R. I., Villalba, M., López-Otín, C. & Rodriguez, R. (1991) Biochim. Biophys. Acta 1078, 265-272.
16. Karas, M. & Hillenkamp, F. (1988) Anal. Chem. 60, 2299-2301.
17. Vestal, M. L., Juhasz, P. & Martin, S. A. (1995) Rapid Commun. Mass Spectrom. 9, 1044-1050.
18. Kaufmann, R., Spengler, B. & Lützenkirchen, F. (1993) Rapid Commun. Mass Spectrom. 7, 902-910.
19. Rouse, J. C., Yu, W. & Martin, S. A. (1995) J. Am. Soc. Mass Spectrom. 6, 822-835.
20. Johnson, R. S. & Biemann, K. (1987) Biochemistry 26, 1209-1214.
21. Sato, K., Asada, T., Ishihara, M., Kunihiro, F., Kammei, Y., Kubota, E., Costello, C. E., Martin, S. A., Scoble, H. A. & Biemann, K. (1987) Anal. Chem. 59, 1652-1659.
22. Desrosiers, R. & Tanguay, R. M. (1988) J. Biol. Chem. 263, 4686-4692.
23. Cattaneo, R. (1991) Annu. Rev. Genet. 25, 71-88.
24. Chan, L. (1993) BioEssays 15, 33-41.
25. Scott, J. (1995) Cell 81, 833-836.
26. Hiesel, R., Wissinger, B., Schuster, W. & Brennicke, A. (1989) Science 246, 1632-1634.
27. Herbert, A., Lowenhaupt, K., Spitzner, J. & Rich, A. (1995) Proc. Natl. Acad. Sci. USA 92, 7550-7554.
28. Hara-Nishimura, I., Inoue, K. & Nishimura, M. (1991) FEBS Lett. 294, 89-93.
