Estimates of the loss of main-chain conformational entropy of different residues on protein folding

8
Estimates of the Loss of Main-Chain Conformational Entropy of Different Residues on Protein Folding Debnath Pal and Pinak Chakrabarti* Department of Biochemistry, Bose Institute, Calcutta, India ABSTRACT The average contribution of confor- mational entropy for individual amino acid residues towards the free energy of protein folding is not well understood. We have developed empirical scales for the loss of the main-chain (torsion angles, f and c) conformational entropy by taking its side-chain into account. The analysis shows that the main-chain component of the total conformational entropy loss for a residue is significant and reflects intrinsic characteristics associated with individual residues. The values have direct correlation with the hydro- phobicity values and this has important bearing on the folding process. Proteins 1999;36:332–339. r 1999 Wiley-Liss, Inc. Key words: conformation; conformational entropy; hydrophobicity; protein folding; pro- tein engineering INTRODUCTION When a protein folds into a compact globule, the resi- dues lose degrees of freedom as lesser number of conforma- tions can be accessed by the main- and the side-chain— this reduction in conformational entropy opposes the folding process. 1,2 On the other hand, water molecules released from contact with nonpolar side-chains (that get buried in the folded state) gain in entropy. These are the source of the hydrophobic effect, which is a major driving force for folding. 3,4 However, as most globular proteins are only marginally stable (the free energy for a folding-unfolding reaction is around 5–20 kcal/mol) 5 it appears that the conformational entropy is the prime deterrent to folding. 6 While the calculation/measurement of the free energy associated with hydrophobicity has received wide atten- tion, Cornette et al. 7 reporting a comparison of 38 different scales, the quantification of the conformational entropy changes has been rare and usually computation-inten- sive. 1,8–13 Some recent work in this area have determined scales for the conformational entropy change of side- chains during protein folding. 14–19 The backbone gets neglected in such studies, presumably with the assump- tion that the restriction on the conformation of the main- chain on folding is the same for all residues, and hence its inclusion will result in the addition of a constant term to all the entropy values such that the relative scale remains the same. D’Aquino et al. 20 have recently shown experimen- tally that this assumption is not valid and that the presence of the methyl group in alanine reduces the conformational entropy of the peptide backbone by 2.5 cal/K.mol with respect to that of glycine. They further substantiated their result by computing the energy profile of the backbone conformations as a function of the main- chain dihedral angles, f and c. However, as discussed below, the f,c angles can be influenced by the conforma- tion of the side-chain of individual residues, and this needs to be taken into account while calculating the conforma- tional entropy of the main-chain. Here we report such a calculation to show that the main-chain of each residue makes a distinct contribution towards the loss of conforma- tional entropy on folding, and that the values are related to hydrophobicity. Influence of the Side-Chain That each amino acid residue leaves its imprint on the protein structure on the basis of the chemical nature of its side-chain, is undisputed. However, at a more subtle level the side-chain can also influence the main-chain conforma- tion of the residue, 21 and the assumption that the f,c space is independent of the residue type while calculating the relative entropy values 15 is not justified. Because of the interdependence between the main-chain and the side- chain torsion angles, 21–23 a residue is represented not by a single f,c distribution, but by distributions at three discreet values of x 1 (260, 180 and 60°, which are desig- nated as g 1 , t and g 2 conformational states, respectively), or better yet, one should consider the distribution in the three-dimensional space (f,c,x 1 ). Additionally, instead of considering the influence of the side-chain up to the g position only (x 1 -torsion), the effect of the whole side-chain can also be taken into account by calculating the f,c distribution for all the rotameric states of the side-chain. All these methods to incorporate the effect of the side- chain while calculating the main-chain conformational entropy have been implemented by us. Folded and Unfolded States If one attempts to calculate the change in conforma- tional entropy in bringing a particular residue to the folded state, the unique conformation of the residue re- quires its entropy in the folded state to be set to zero. 15 On the other hand, instead of dealing with a residue in specific location in a protein structure, if we are interested in the *Correspondence to: Pinak Chakrabarti, Department of Biochemis- try, P-1/12 CIT Scheme VIIM, Bose Institute, Calcutta, 700 054, India. E-mail: [email protected] Received 17 December 1998; Accepted 8 April 1999 PROTEINS: Structure, Function, and Genetics 36:332–339 (1999) r 1999 WILEY-LISS, INC.

Transcript of Estimates of the loss of main-chain conformational entropy of different residues on protein folding

Page 1: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

Estimates of the Loss of Main-Chain ConformationalEntropy of Different Residues on Protein FoldingDebnath Pal and Pinak Chakrabarti*Department of Biochemistry, Bose Institute, Calcutta, India

ABSTRACT The average contribution of confor-mational entropy for individual amino acid residuestowards the free energy of protein folding is not wellunderstood. We have developed empirical scales forthe loss of the main-chain (torsion angles, f and c)conformational entropy by taking its side-chain intoaccount. The analysis shows that the main-chaincomponent of the total conformational entropy lossfor a residue is significant and reflects intrinsiccharacteristics associated with individual residues.The values have direct correlation with the hydro-phobicity values and this has important bearing onthe folding process. Proteins 1999;36:332–339.r 1999 Wiley-Liss, Inc.

Key words: conformation; conformational entropy;hydrophobicity; protein folding; pro-tein engineering

INTRODUCTION

When a protein folds into a compact globule, the resi-dues lose degrees of freedom as lesser number of conforma-tions can be accessed by the main- and the side-chain—this reduction in conformational entropy opposes the foldingprocess.1,2 On the other hand, water molecules releasedfrom contact with nonpolar side-chains (that get buried inthe folded state) gain in entropy. These are the source ofthe hydrophobic effect, which is a major driving force forfolding.3,4 However, as most globular proteins are onlymarginally stable (the free energy for a folding-unfoldingreaction is around 5–20 kcal/mol)5 it appears that theconformational entropy is the prime deterrent to folding.6

While the calculation/measurement of the free energyassociated with hydrophobicity has received wide atten-tion, Cornette et al.7 reporting a comparison of 38 differentscales, the quantification of the conformational entropychanges has been rare and usually computation-inten-sive.1,8–13 Some recent work in this area have determinedscales for the conformational entropy change of side-chains during protein folding.14–19 The backbone getsneglected in such studies, presumably with the assump-tion that the restriction on the conformation of the main-chain on folding is the same for all residues, and hence itsinclusion will result in the addition of a constant term toall the entropy values such that the relative scale remainsthe same. D’Aquino et al.20 have recently shown experimen-tally that this assumption is not valid and that thepresence of the methyl group in alanine reduces theconformational entropy of the peptide backbone by 2.5

cal/K.mol with respect to that of glycine. They furthersubstantiated their result by computing the energy profileof the backbone conformations as a function of the main-chain dihedral angles, f and c. However, as discussedbelow, the f,c angles can be influenced by the conforma-tion of the side-chain of individual residues, and this needsto be taken into account while calculating the conforma-tional entropy of the main-chain. Here we report such acalculation to show that the main-chain of each residuemakes a distinct contribution towards the loss of conforma-tional entropy on folding, and that the values are related tohydrophobicity.

Influence of the Side-Chain

That each amino acid residue leaves its imprint on theprotein structure on the basis of the chemical nature of itsside-chain, is undisputed. However, at a more subtle levelthe side-chain can also influence the main-chain conforma-tion of the residue,21 and the assumption that the f,cspace is independent of the residue type while calculatingthe relative entropy values15 is not justified. Because of theinterdependence between the main-chain and the side-chain torsion angles,21–23 a residue is represented not by asingle f,c distribution, but by distributions at threediscreet values of x1 (260, 180 and 60°, which are desig-nated as g1, t and g2 conformational states, respectively),or better yet, one should consider the distribution in thethree-dimensional space (f,c,x1). Additionally, instead ofconsidering the influence of the side-chain up to the gposition only (x1-torsion), the effect of the whole side-chaincan also be taken into account by calculating the f,cdistribution for all the rotameric states of the side-chain.All these methods to incorporate the effect of the side-chain while calculating the main-chain conformationalentropy have been implemented by us.

Folded and Unfolded States

If one attempts to calculate the change in conforma-tional entropy in bringing a particular residue to thefolded state, the unique conformation of the residue re-quires its entropy in the folded state to be set to zero.15 Onthe other hand, instead of dealing with a residue in specificlocation in a protein structure, if we are interested in the

*Correspondence to: Pinak Chakrabarti, Department of Biochemis-try, P-1/12 CIT Scheme VIIM, Bose Institute, Calcutta, 700 054, India.E-mail: [email protected]

Received 17 December 1998; Accepted 8 April 1999

PROTEINS: Structure, Function, and Genetics 36:332–339 (1999)

r 1999 WILEY-LISS, INC.

Page 2: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

average value for all residues of the same type then wehave to consider the distribution of the conformationalangles of the amino acid residue both in the folded and theunfolded states. For example, Creamer and Rose,14 whileinvestigating the difference in conformational entropybetween a series of apolar side-chains in the unfolded stateand in the a-helical state, obtained these two distributionsusing Monte Carlo calculations. Although the rotationalstates that are being sampled by the main-chain in thefolded state are directly available from the statisticalanalysis of known structures, the information on theunfolded state is not directly available. As a result, scalesof main-chain entropy changes during folding have beenproposed relative to the value for Gly (considering f,cdistributions in native states only).8,24 Below we propose aprocedure to fill in the lacuna on the unfolded state.

We have recently shown21 that amino acid residues(barring Gly, Ala and Pro) can be classified into five classesdepending on how the f,c distribution of individual resi-dues get affected by a change of the side-chain torsionangle x1. These are: Class I: Ser, Cys, Met, Glu, Gln, Lysand Arg; II: Leu; III: Asp and Asn; IV: His, Phe, Tyr andTrp; and V: Val, Ile and Thr. The classification is alsochemically intuitive. Residues in a given class have similartopological arrangements at least up to the g-position,beyond which the side-chain atoms do not significantlyinteract with the main-chain atoms, and are non-invasiveas regard to the f,c distribution.21 For example, Class Ihas linear side-chains (up to g-position), whereas Class Vhas aliphatic chains branched at Cb. As steric clash is thelimiting factor that affects the f,c angles in the unfoldedstate, one can safely assume that the ranges of the f,cangles that all residues in a given class span in theunfolded state are identical. Because of the additionalshort range interactions (like hydrogen bonding) that arebrought into play in the folded state (for example, due tothe presence of different atoms, as in Val and Thr), thedistribution of angles within this allowed f,c space (in theunfolded state) becomes different for different residues in aclass. Conversely then, the combined distribution of thef,c (and x1) values for all the members in a class can betaken as the representation of the distribution in theunfolded state for any member of the class. This translatesto the assumption that in the unfolded state a residuecannot assume a (f,c,x1) value not accessible to anymember of the class in the native state, and that thedistribution is random. Further justification for this as-sumption is provided by the energy calculations given inthe next section.

Theory

The change in the conformational entropy, DS, for amain-chain taken from the unfolded state (U) to the folded(F) state is given by:

DS 5 S(F) 2 S(U).

In one of the earliest studies Nemethy et al.8 estimated therestrictions on folding of either the main-chain or theside-chain using:

DS 5

2R ln 1accessible conformational space unfolded

accessible conformational space folded 2 . (1)

Although they used the area spanned by the torsion anglesf, c to determine the main-chain entropy changes duringfolding, we have used the equation by considering thevolume occupied by different residues in the (f,c,x1) space.Depending on the distribution of points, the 3-dimensionalspace is divided into a number of regions, such that eachregion encompasses a cluster of points (Fig. 1). For a givenresidue the mean and the standard deviation (s) of the f,cand x1 values in a cluster are calculated.21 The product ofthe three s values gives the volume of an ellipsoid thatprovides an estimate of the dispersion of points in thecluster. The summation of these ellipsoidal volumes overall the regions give the conformational space available tothe residue in the folded state. The same procedurerepeated using the combined distribution of points for allthe members in a class gives the value for the unfoldedstate of a member residue.

In a second method, we have directly used the distribu-tion of points based on the definition of entropy25 as the

Fig. 1. Distribution of points corresponding to all Ile residues in thethree-dimensional space. For one cluster the standard deviations (magni-fied four times) of the distribution of points are shown, which are theellipsoidal axes used for the calculation of volume in Method 1.

333LOSS OF MAIN-CHAIN CONFORMATIONAL ENTROPY

Page 3: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

Boltzmann sampling over all accessible states,

S 5 2R oi51

N

pi ln pi (2)

where pi is the probability of the main-chain in state i(where each state is represented by a grid of fixed size inthe (f,c,x1) space (Fig. 1), the sum being performed overall the grid points (N) in the range of 360° along the threeaxes); R is the gas constant. As earlier, S(U) is obtainedfrom the combined distribution of all the residues belong-ing to a class.

As Leu is the only member of the class, the aboveprocedure of combining the f,c,x1 distributions of theconstituent class members to generate the unfolded statefails. Moreover, two residues (Gly and Ala) are left out inthe above procedure (besides Pro, for which the main-chain f and the side-chain torsion angles are restrictedbecause of the pyrrolidine ring, and is not consideredhere). As a result, there is a need for an independentmethod to simulate the unfolded state for all the residues,and for this we have taken recourse to energy calculationin the f,c space.

The conformational features of the unfolded state canalso be obtained from the fact that when unfolded, theconformation of a residue is influenced only by short-rangeinteractions which are adequately quantified through force-field calculations. The f,c map obtained by Ramachan-dran and coworkers26 is based on the simple assumptionthat the torsion angles that lead to steric clash are notpermissible. This assumption is equally applicable both forthe native and the denatured state, so that short rangesteric clash makes a portion of the whole map inaccessibleeven in the denatured state. Because of various secondaryand tertiary interactions, residues in the native state arenot spread all over the allowed region in the f,c map, butoccupy only a part of it. However, in the denatured statethe main-chain torsion angles can be expected to span thewhole space within the bounds of steep energy gradient.Consequently, we have calculated the f,c energy maps forall the different combinations of the side-chain torsionangles (rotamers) of a given residue (X) in a tripeptide,Ala-X-Ala. For a given map, starting from the minimumenergy value, contours are drawn at a fixed increment ofenergy and the area of the map enclosed by each contourlevel is calculated and compared with the area in theprevious level. This process is repeated until a steepenergy gradient is reached and no significant increase inarea is observed. This area can then be taken as themaximum area that is accessible to the residue for therotameric state under consideration. The area accessiblein the folded state is obtained from the distribution ofcrystallographic data points for the residue in the samerotameric state (Fig. 2). Equation (1) then gives the DSvalue for the main-chain of the rotamer, and the averageDS value can be obtained by considering all the rotamers.

METHODS

Two hundred ninety-four X-ray crystallographic proteincoordinate files taken from Brookhaven Protein DataBank (PDB),27 1998 version, having resolution #2.0 Å andR-factor #0.2 were selected conforming to ,25% sequencehomology criterion.28 For proteins having more than onesubunit, only those subunits were taken as prescribed inthe list available at http://www.sander.embl-heidelberg.de.Torsion angles of the main-chain (f and c) and side-chain (x)were calculated using the program DIHDRLprovided by PDB.Small screening factors, discussed below, depending on thedensity/number of data points in a region/grid, were applied toget consistent entropy values in different methods.

Method 1

It was found that for all the residues there were amaximum of twelve distinct regions in the (f,c,x1) spacewhere the points are clustered. The dispersion of thepoints in each region can be approximated by the standarddeviation associated with the mean of each variable withinthe region (Fig. 1). The regions are defined21 by thecombination of the following ranges of f,c and x1:

f(°) c(°) x1(°)

2180,0 2120,60 2240,2120 (t)60,240 2120,0 (g1)

0,120 (g2)0,180 290,90 2240,2120 (t)

90,270 2120,0 (g1)0,120 (g2)

Fig. 2. Boundary obtained by energy calculation (Method 3, for theunfolded state) shown superimposed on the distribution of points corre-sponding to Ile in the rotameric state g1t.

334 D. PAL AND P. CHAKRABARTI

Page 4: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

The product of standard deviations for each variable in agiven range gave the volume dispersion of that cluster. Thesum of all the clusters gave the total accessible volume forthe residue in the 3-dimensional space. The dispersionvolume for all the residues individually (except Ala, Glyand Pro) and combining them into classes (for the unfoldedstate) were obtained and the entropy values calculatedusing equation (1).

While combining the occupied regions in the (f,c,x1)space of different residues to get the value for a class it wasassumed that in the unfolded state these regions wererandomly populated. As such, even though the distributionof points in the individual 10° 3 10° 3 10° grids within theoccupied regions of a given residue may vary, only onepoint was retained per occupied grid per residue whilegetting the distribution (and subsequent calculation of thestandard deviations) for the class in the unfolded state.

If a region was sparsely populated (containing ,1% ofthe total data points), it was not considered; this cut-offwas found to be necessary as each region gets equal weightin this calculation and a region with a very negligiblenumber of widely dispersed points lead to a spuriouslylarge volume contribution. As an example, for the aromaticresidues in Class IV, only 7 out of 12 regions contributed tothe volume calculation for the class, as well as the indi-vidual residues.

Method 2

Torsion angles f,c,x1 of all residues were plotted in a3-dimensional Cartesian axes having ranges 0 through360°. The fraction of the number of points (Ni) in a 10° 310° 3 10° grid to the total number (Nt) of points of aresidue gave the probability pi for the evaluation ofentropy in the folded state using equation (2). The choice ofa 10° grid size has been found to be suitable in an earlierstudy.24 As discussed under Method 1, the condition thatthe whole occupied region is randomly accessible to a givenresidue in the denatured state was imposed by assumingall occupied grids to have just one point. This modifieddistribution was then combined to get the distribution forthe class for the calculation of entropy in the denaturedstate.

Method 3

The nonbonded-energy calculations over the whole f,crange were performed for the sequence Ala-X-Ala (thepeptide terminals being capped by protons), X being anyresidue, using the CVFF29 forcefield as implemented in theprogram DISCOVER_3 (Biosym/MSI, San Diego, CA).Acidic and basic residues were considered with appropri-ate charges. Calculations were carried out by restrainingthe side-chain at discreet values of x1 (and beyond); all therotameric states were considered. As x1 can have slightlydifferent values at different regions of the f,c map,21 wefound out the average of all the x torsion angles of aresidue when the main-chain conformation is in one of the4 regions given above; however, if the mean value could notevaluated because of the paucity of data, the torsion angle

was placed at the canonical value of 660 or 180°. Next, theenergy values within the individual regions were com-puted (at 5° intervals) by fixing the side-chain at theseaverage values, and the total map was constructed.

Starting with the minimum energy (Em) value in a map,contours were drawn at 5 kcal/mol intervals (this value isapproximately 1/10th the value of the standard deviationof the average of all Em values), and areas enclosed (as thepercentage of the total map-area) were found out. If agiven increment increased the area by less than 10% of themap area then the area at that step was accepted as theaccessible area for the rotamer in the denatured state (Fig.2). For Cys, Ser and Thr, this area was found to beprofoundly affected by the orientation of the -SH or -OHproton in the side-chain. As there is no way of ascertainingthe position of the proton from crystallographic method,we have used six idealized orientations (given by thetorsion angle, Ca-Cb-O/Sg-H, with values 0, 660, 6120 and180°) for each value of x1 and the average value of theareas was accepted.

For the folded state of a residue, the f,c values for eachrotamer were plotted and the map was divided intosquares 10° on a side. If a grid contained more than 0.25%of the data points it was assumed to be occupied (for Gly,however, as the points are distributed in all the fourquadrants of the map, a threshold value of 0.1% was used)and the total number of such grids gave the accessiblearea. (On average, about 96% data points are enclosed inthis area). If the population of a rotamer is less than 0.5%of the total number (78,662) of residues in our database,the size of the grid was increased to 20° and the thresholdvalue for occupancy was changed to 1%.

The change in the conformational entropy for eachrotamer of a residue was then calculated using equation(1), and the weighted DS value,

DS 5

or

Nr DSr

or

Nr

was found out; Nr is the number of data points for therotamer that defined the accessible area; the summation isover all rotamers.

RESULTS AND DISCUSSION

Scales of the Main-Chain ConformationalEntropy Values

Tables I and II provide the values for the average loss ofmain-chain conformational entropy, obtained by threedifferent methods. As different considerations have goneinto the calculation of each set of values the concurrencebetween them is remarkable (Table III). While the side-chain torsion, x1, is directly used in Methods 1 and 2, theeffect of the whole side-chain is taken into account, albeitindirectly, in Method 3. Also, the agreement between Set 3and the other two sets suggests that the unfolded state

335LOSS OF MAIN-CHAIN CONFORMATIONAL ENTROPY

Page 5: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

TABLE I. Conformational Entropy Values Using Methods 1 and 2

Residue Set 1 Set 2 Entropy TDS (kcal/mol at 300 K)Name % of total populationa Volumeb occupied Entropy (DS/R) Absolute (S/R) Relative (DS/R) Set 1 Set 2

a) Class I 4.13 7.94Cys 1.42 1.99 20.73 6.23 21.72 20.43 21.02Ser 6.13 2.66 20.44 6.84 21.10 20.26 20.66Met 1.97 2.10 20.68 5.98 21.97 20.41 21.17Glu 5.72 2.43 20.53 6.41 21.53 20.32 20.91Gln 3.67 2.17 20.64 6.26 21.68 20.30 21.00Lys 5.83 2.33 20.57 6.62 21.32 20.34 20.79Arg 4.36 2.28 20.59 6.49 21.45 20.35 20.86

b) Class II 2.50 7.72Leuc 8.10 0.97 20.95 6.24 21.48 20.57 20.88

c) Class III 4.89 7.68Asp 6.18 2.74 20.58 6.62 21.06 20.35 20.63Asn 4.79 3.04 20.47 6.80 20.88 20.28 20.52

d) Class IV 2.71 7.41His 2.26 2.22 20.20 6.47 20.95 20.12 20.57Phe 4.10 1.62 20.51 6.36 21.05 20.30 20.63Tyr 3.81 1.69 20.47 6.45 20.96 20.28 20.57Trp 1.60 1.59 20.53 6.09 21.33 20.32 20.79

e) Class V 3.09 7.46Val 7.02 1.47 20.74 5.81 21.65 20.44 20.98Ile 5.43 1.00 21.13 5.60 21.86 20.67 21.11Thr 5.96 2.04 20.41 6.32 21.14 20.24 20.68

aThe values for Gly, Ala and Pro are 8.23, 8.67 and 4.75, respectively.bGiven by S sf p sc p sx1 , where ss (in deg) are the standard deviations of different clusters in the (f, c, x1) space, the summation being over all theclusters, as given in Methods. The individual values are divided by that of Ile which has the smallest volume (not considering Leu, for reasonsgiven below).cAs Leu is the only member of class II, its unfolded state has been simulated by putting it along with all other members of class I and excluding theg2 conformation from the calculation. This is because as the branching in the side-chain is at the g-position the x1-dependence of the f, c map ofLeu resembles that of the unbranched residues of class I, except that its side-chain is hardly found in the g2 conformation.21

TABLE II. Conformational Entropy Values (Set 3) Using Method 3

Residue

No. ofpossiblerotamers

Most populous rotamer

Entropy(DS/R)

TDS(kcal/mol at 300 K)

Name% of total

population% of the (f, c) map occupied

Observed Calculated Calculated Scaleda

Gly 17.90 57.93 21.17 20.70 21.41Ala 5.63 19.19 21.23 20.73 21.45Cys 3 g1 57 7.87 14.63 20.69 20.41 21.03Ser 3 g2 47 6.10 17.61 20.95 20.57 21.24Met 27 g1g1g1 16 6.17 14.93 20.76 20.45 21.08Glu 27 g1tg1 18 7.64 13.46 20.72 20.43 21.05Gln 27 g1tg2 14 4.94 14.24 20.86 20.51 21.16Lys 81 g1ttt 18 5.56 10.8 20.57 20.34 20.93Arg 81 g1ttt 11 8.02 11.25 20.59 20.35 20.95Leu 9 g1t 56 6.10 13.45 20.80 20.48 21.12Asp 9 g1g1 43 5.79 10.19 20.67 20.40 21.01Asn 9 g1g1 33 6.64 12.81 20.63 20.38 20.99His 9 g1g1 28 7.48 9.39 20.17 20.10 20.61Phe 6 g1g1/2 42 8.02 13.37 20.58 20.35 20.95Tyr 6 g1g1/2 42 7.33 13.29 20.61 20.36 20.96Trp 9 g1g2 34 5.40 13.39 20.67 20.40 21.01Val 3 g1 73 4.48 11.57 20.87 20.52 21.17Ile 9 g1t 58 3.78 9.41 20.78 20.46 21.09Thr 3 g2 47 6.48 13.62 20.84 20.50 21.15aScaled to the values of Set 2, using the equation y 5 1.33x 2 0.48, obtained by regression analysis between Set 2 (y) and Set 3 (x) (after removingSer, Cys and Thr, as given in Table III).

336 D. PAL AND P. CHAKRABARTI

Page 6: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

obtained from the energy calculation is similar to the onederived from the combined distribution of a class ofresidues, thereby lending a justification, based on energet-ics, for the earlier classification21 of amino acid residues.

Ser, Thr and Cys have maximum discrepancies in theSet 3 values compared to the other sets. This is because theresults of the energy calculation for these residues arehighly sensitive to the location of the terminal proton ofthe side-chain. The calculated area can change by as muchas 100% depending on the conformation (that defines theposition of the proton) selected (see Methods). In theensuing discussion we use the Set 2 values (along withthose of Gly and Ala from Set 3 scaled to Set 2) as thesevalues are obtained directly from the observed distribu-tion.

Although not used in the discussion, our aim for employ-ing Method 1 was to see if a simple volume calculationusing three s values can give result comparable to othermethods. One may argue that this ellipsoid enclosing thedistribution of f,c,x1 points is aligned with the three axis,whereas a more sophisticated method would be to calcu-late intertial tensor of the points and then use the deducedprincipal axes of an enclosing nonaligned ellipsoid for thevolume calculation. As can be seen, DS values calculatedby Method 1 are reasonable, and besides the volumes givenin Table I, are also correlated with other parameters likethe b-sheet propensities of residues (manuscript underpreparation). This method has the added advantage thatone may also include the influence of other side-chaintorsion angles, say x2 by considering a hypervolume consist-ing of f,c,x1 and x2.

Substitution of Gly by Ala

Gly and Ala are bypassed in all work dealing with theside-chain conformational entropy. But even without anyside-chain torsion angle they do have large and compa-rable negative DS values, which has important connota-tion for protein engineering experiments aimed at enhanc-ing protein thermostability. Matthews et al.30 have proposedthat substitutions which decrease the chain entropy of theunfolded state might shift the equilibrium to the foldedstate, provided there are no enthalpic penalties for thesechanges in the folded state. It was assumed that Glyresidues would lose the maximum conformational entropyupon folding and its substitution, even by a slightly largerresidue like Ala, would be stabilizing, as indeed was thecase in many instances.20,30–37 However, inasmuch as DS

for Gly and Ala are nearly the same, the degree to whichthe conformational flexibility of the main-chain is reduceddue to a change from a Gly to an Ala residue is similar bothin the denatured and the folded states. Consequently, ourdata do not support the hypothesis that the higher proteinstability arising out of such mutations is due to entropiceffect; rather the enthalpic contribution involving themethyl group may also be important.

This result needs to be put in proper perspective. Ourmethod of finding out the maximum area that is accessibleto a residue in the unfolded state is similar to Ramachan-dran’s method of delineating the contact map,26 i.e., theregion devoid of any steric clash. Gly residues in knownstructures, however, cover only a limited portion of thisallowed region as can be seen in the diagrams of Ra-makrishnan and Srinivasan.38 Even for Ala, the points arenot distributed for the whole allowed region and indeedStites and Pranata24 have commented that of all residues(except Gly and Pro), Ala is the most ordered. Our resultquantifies the reduction in the area over which the f,cpoints in the known structures are distributed as com-pared to the total allowed region, and the values obtainedfor Gly and Ala are nearly identical. The relative DS valuesenumerated here provide guidelines for the judicioussubstitutions that would decrease the conformational en-tropy of unfolding and may thereby stabilize the nativestructure.

Main-Chain Conformational Entropy,Hydrophobicity and Protein Folding

Inasmuch as the contribution of the side-chain torsion x1

is taken into account in our calculation, the entropy valuesreported here should approximate the value for the wholeresidue if it does not have any torsion angle beyond x1.Moreover, it is the main-chain that determines the proteinfold, and it is highly plausible that at the initial stages offolding, the main-chain atoms and the close side-chainatoms (x1 torsion) get restrained, leaving the side-chainatoms further off to retain their full flexibility characteris-tic of the unfolded state. As such, even for the longerresidues (with x2 and beyond) the estimated entropyvalues should correspond closely to the loss in the confor-mational entropy for the whole residue as these are foldingto make the protein core. Additionally, side-chains ofresidues on protein surface are highly mobile (so much sothat the distribution of their side-chain conformations maybe what one expects in the unfolded state15) and for theseresidues, our DS values would be the major component ofthe overall value.

In order to analyse if the entropy values reflect theresidue hydrophobicities, we have compared the 2TDSvalues with a few hydrophobicity scales (Table IV). Valuesare in good agreement with the scales due to Kyte-Doolittle and Wolfenden (Fig. 3). Some workers haveconsidered hydrophobicity and side-chain entropy to bedistinct entities.19 However, as there is a substantialcorrelation between the main-chain conformation entropyand hydrophobicity it can be surmised that both get amajor contribution from a common factor. In case of

TABLE III. CorrelationCoefficients Between the

Three Sets of ConformationalEntropy Values

Set 2 Set 3

Set 1 0.73 0.49 (0.73)Set 2 0.44 (0.66)

Values in brackets are obtained whenSer, Thr and Cys are omitted fromcalculations.

337LOSS OF MAIN-CHAIN CONFORMATIONAL ENTROPY

Page 7: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

hydrophobicity it is water, and it is likely that water alsocontributes to DS values, albeit in an indirect manner.Hydrophobicity is determined by the chemical nature ofthe side-chain and its surface area, which in turn alsocontrol the (f,c,x1) values that a particular residue canadopt, and thus the main-chain entropy.

Residues that show large deviations from the fitted linesin Figure 3 are Arg, Lys, Glu and Gln. Interestingly,however, these are the very residues for which the hydro-phobicity values derived by various methods also showwide variations.39 These residues have large componentsof both polar and nonpolar atoms which may contributedifferently depending on the systems used to generate thehydrophobicity scales, whereas in our calculations, it is thenearby nonpolar part that is likely to affect the main-chainconformation and thereby the resultant entropy values.

It is remarkable that the hydrophobic residues (Fig. 3),and Gly and Ala (Table II) show the maximum reduction ofthe main-chain conformational entropy on folding, unlikethe earlier studies concerned only with the side-chainentropies which showed, as expected, that the entropyvalues depend on the length of the side-chain. As theseconstitute a major proportion of amino-acid residues in aprotein (Table I), DS values will have a pronounced influ-ence in the folding process. In the hydrophobic collapsemodel,5 it is the hydrophobic residues which initiate thefolding process, but our study indicates that the main-chain of these same residues get the maximum resistance,owing to conformational entropy to take up the nativeform. This may be beneficial for a productive foldingprocess as it means that specific non-covalent interactions(like hydrogen bonding) must start to be formed at thisstage to overcome the entropic cost. Without the counter-ing influence from DS the hydrophobic residues mightaggregate, leading to non-native like forms. Seen from thisangle the term ‘‘hydrophobic collapse’’ may be a too severeterm, as it does not adequately convey the opposition of themain-chain entropy of the hydrophobic residues.

CONCLUDING REMARKS

In summary, any attempt to elucidate thermodynamicparameters neglecting the main-chain on account of itsinvariance within the naturally occurring amino acid

residues may not be correct; although chemically equiva-lent, the backbones from different residues have varyingdegree of conformational restrictions imposed by the side-chain. The main hindrance for the calculation of the loss inthe main-chain conformational entropy has been the lackof any concrete data on the possible conformations in thedenatured state. We have overcome this by energy calcula-tions, and by combining the distribution of residues thatshow similar x1-dependence of the f,c plot.21 Hydrophobicresidues, and Gly and Ala have larger entropy values withimportant implications for the folding process. The derivedentropy values can be used for protein stabilization byrational amino acid replacements based on the concept ofentropic stabilization.30

ACKNOWLEDGMENTS

We thank the Council of Scientific and Industrial Re-search for a fellowship to D.P., the Department of Scienceand Technology for a grant to P.C., and the Department of

Fig. 3. Plot of the loss of main-chain conformational entropy (2TDSvalues, Set 2 1 Gly 1 Ala, explained in Table IV footnote) against thehydrophobicity values (both in kcal/mol) due to (a) Kyte and Doolittle, and(b) Wolfenden. The least-squares lines [(a) y 5 10.30x 27.64; (b)y 5 15.93x 216.35] are obtained by excluding Gly, Ala, Arg, Lys, Glu andGln from the regression analysis; the corresponding correlation coeffi-cients are provided in Table IV, third column.

TABLE IV. Correlation Coefficients BetweenConformational Entropy Values (2DS) and Some

Representative Hydrophobicity Data

Hydrophobicity scale of Set 1 Set 2Set 2 1

Gly 1 Alaa

Fauchere and Pliska40 0.44 (0.53) 0.32 (0.60) 0.20 (0.30)Kyte and Doolittle41 0.65 (0.78) 0.47 (0.77) 0.46 (0.63)Miller et al.42 0.38 (0.59) 0.30 (0.77) 0.32 (0.59)Ponnuswamy et al.43 0.55 (0.63) 0.52 (0.79) 0.36 (0.46)Wolfenden et al.44 0.55 (0.77) 0.37 (0.75) 0.51 (0.80)

Values obtained on exclusion of Arg, Lys, Glu and Gln are given inparentheses.aValues for Gly and Ala are from Set 3 scaled to Set 2 (last column ofTable II).

338 D. PAL AND P. CHAKRABARTI

Page 8: Estimates of the loss of main-chain conformational entropy of different residues on protein folding

Biotechnology for the National Facility for BiomolecularModeling.

REFERENCES1. Finkelstein AV, Janin J. The price of lost freedom: entropy of

bimolecular complex formation. Protein Eng 1989;3:1–3.2. Brady GP, Sharp KA. Entropy in protein folding and in protein-

protein interactions. Curr Opin Struct Biol 1997;7:215–221.3. Kauzmann W. Some factors in the interpretation of protein

denaturation. Adv Protein Chem 1959;14:1–63.4. Chothia C. Hydrophobic bonding and accessible surface area in

proteins. Nature (London) 1974;248:338–339.5. Dill KA. Dominant forces in protein folding. Biochemistry 1990;29:

7133–7155.6. Privalov PL, Gill SJ. Stability of protein structure and hydropho-

bic interaction. Adv Protein Chem 1988;39:191–234.7. Cornette JL, Cease KB, Margalit H, Spouge JL, Berzsofsky JA,

DeLisi C. Hydrophobicity scales and computational techniques fordetecting amphipathic structures in proteins. J Mol Biol 1987;195:659–685.

8. Nemethy G, Leach SJ, Scheraga HA. The influence of amino acidside-chains on the free energy of helix-coil transitions. J PhysChem 1966;70:998–1004.

9. Karplus M, Kushick JN. Method for estimating the configura-tional entropy of macromolecules. Macromolecules 1981;14:325–332.

10. Di Nola A, Berendsen HJC, Edholm O. Free energy determinationof polypeptide conformations generated by molecular dynamics.Macromolecules 1984;17:2044–2050.

11. Brady J, Karplus M. Configuration entropy of the alanine dipep-tide in vacuum and in solution: a molecular dynamics study. J AmChem Soc 1985;107:6013–6015.

12. Meirovitch H, Kitson DH, Hagler AT. Computer simulation of theentropy of polypeptides using the local states method: applicationto cyclo-(ala-pro-D-phe)2 in vacuum and in the crystal. J Am ChemSoc 1992;114:5386–5399.

13. Novotny J, Bruccoleri RE, Saul FA. On the attribution of bindingenergy in antigen-antibody complexes McPC 603, D1.3 and Hy-HEL-5. Biochemistry 1989;28:4735–4749.

14. Creamer TP, Rose GD. Side-chain entropy opposes a-helix forma-tion but rationalizes experimentally determined helix formingpropensities. Proc Natl Acad Sci USA 1992;89:5937–5941.

15. Pickett SD, Sternberg MJE. Empirical scale of side-chain confor-mational entropy in protein folding. J Mol Biol 1993;231:825–839.

16. Lee KH, Xie D, Freire E, Amzel LM. Estimation of changes inside-chain configurational entropy in binding and folding: Generalmethods and application to helix formation. Proteins 1994;20:68–84.

17. Blaber M, Zhang X, Lindstorm JD, Pepiot SD, Baase WA, MathewsBW. Determination of a-helix propensity within the context of afolded protein. Sites 44 and 131 in bacteriophage T4 lysozyme. JMol Biol 1994;235:600–624.

18. Koehl P, Delarue M. Application of a self-consistent mean fieldtheory to predict protein side-chains conformation and estimatetheir conformational entropy. J Mol Biol 1994;239:249–275.

19. Sternberg MJE, Chickos JS. Protein side-chain conformationalentropy derived from fusion data—comparison with other empiri-cal scales. Protein Eng 1994;7:149–155.

20. D’Aquino JA, Gomes J, Hilser VJ, Lee KH, Amzel LM, Freire E.The magnitude of the backbone conformational entropy change inprotein folding. Proteins 1996;25:143–156.

21. Chakrabarti P, Pal D. Main-chain conformational features atdifferent conformations of the side-chains in proteins. Protein Eng1998;11:631–647.

22. Dunbrack Jr RL, Karplus M. Backbone dependent rotamer libraryfor proteins: application to side-chain prediction. J Mol Biol1993;230:543–571.

23. Dunbrack Jr RL, Karplus M. Conformational analysis of thebackbone dependent rotamer preferences of protein side-chains.Nature Struct Biol 1994;1:334–340.

24. Stites WE, Pranata J. Empirical evaluation of the influence of sidechains on the conformational entropy of the polypeptide backbone.Proteins 1995;22:132–140.

25. Hill TL. Statistical mechanics. New York: McGraw Hill, 1956.26. Ramachandran GN, Sasisekharan V. Conformation of polypep-

tides and proteins. Adv Protein Chem 1968;23:283–437.27. Bernstein FC, Koetzle TF, Williams GJB, Meyer Jr EF, Brice MD,

Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The proteindata bank: a computer based archival file for macromolecularstructures. J Mol Biol 1977;112:535–542.

28. Hobohm U, Sander C. Enlarged representative set of proteinstructures. Protein Sci 1994;3:522–524.

29. Dauber-Osguthorpe P, Roberts VA, Osguthorpe DJ, Wolff J, GenestM, Hagler AT. Structure and energetics of ligand binding toproteins: E. coli dihydrofolate reductase-trimethoprim, a drug-receptor system. Proteins 1988;4:31–47.

30. Matthews BW, Nicholson H, Becktel WJ. Enhanced proteinthermostability from site-directed mutations that decrease theentropy of unfolding. Proc Natl Acad Sci USA 1987;84:6663–6667.

31. Ganter C, Pluckthun A. Glycine to alanine substitutions in helicesof glyceraldehyde-3-phosphate dehydrogenase: effects on stability.Biochemistry 1990;29:9395–9402.

32. Green SM, Meeker AK, Shortle D. Contributions of the polar,uncharged amino acids to the stability of staphylococcal nuclease:evidence for mutational effects on the free energy of the denaturedstate. Biochemistry 1992;31:5717–5728.

33. Hecht MH, Sturtevant JM, Sauer RT. Stabilization of l repressoragainst thermal denaturation by site-directed Gly = Ala changesin a-helix. Proteins 1986;1:43–46.

34. Imanaka T, Shibazaki M, Takagi M. A new way of enhancing thethermostability of proteases. Nature (London) 1986;324:695–697.

35. Margarit I, Campagnoli S, Frigerio F, Grandi G, De FV, FontanaA. Cumulative stabilizing effects of glycine to alanine substitu-tions in Bacillus subtilis neutral protease. Protein Eng 1992;5:543–550.

36. Shortle D, Stites WE, Meeker AK. Contributions of the largehydrophobic amino acids to the stability of staphylococcal nucle-ase. Biochemistry 1990;29:8033–8041.

37. Takagi M, Imanaka T. Addition of a methyl group changes bothcatalytic velocity and thermostability of the neutral protease fromBacillus stearothermophilus. FEBS Lett 1989;254:43–46.

38. Ramakrishnan C, Srinivasan N. Glycyl residues in proteins andpeptides: an analysis. Curr Sci 1990;59:851–861.

39. Karplus PA. Hydrophobicity regained. Protein Sci 1997;6:1302–1307.

40. Fauchere J, Pliska V. Hydrophobic parameters p of amino acidside-chains from the partitioning of N-acetyl-amino-acid amides.Eur J Med Chem 1983;18:369–375.

41. Kyte J, Doolittle RF. A simple method for displaying the hydro-pathic character of a protein. J Mol Biol 1982;157:105–132.

42. Miller S, Janin J, Lesk AM, Chothia C. Interior and surface ofmonomeric proteins. J Mol Biol 1987;196:641–656.

43. Ponnuswamy PK, Prabhakaran M, Manavalan P. Hydrophobicpacking and spatial arrangement of amino acid residues inglobular proteins. Biochim Biophys Acta 1980;623:301–316.

44. Wolfenden R, Andersson L, Cullis PM, Southgate CCB. Affinitiesof amino acid side-chains for solvent water. Biochemistry 1981;20:849–855.

339LOSS OF MAIN-CHAIN CONFORMATIONAL ENTROPY