STRUCTURE CALCULATIONS OF PROTEIN SURFACE SEGMENTS: MONTE CARLO SIMULATED ANNEALING



STRUCTURE CALCULATIONS OF PROTEIN SURFACE SEGMENTS: MONTE CARLO SIMULATED ANNEALING WITH SCALED COLLECTIVE VARIABLES AND FORCE CONSTANT ANNEALING

Sergio A. Hassan, Ernest L. Mehler and Harel Weinstein; Dept. Physiology and Biophysics, Mount Sinai School of Medicine, New York, NY 10029

A new algorithm for modeling segments in proteins (in particular loops) is presented. It first finds conformations representative of segment structures tethered to the protein at the N-terminus only; the free end of the segment is then driven towards its attachment point using a reversed force constant simulated annealing scheme with scaled collective variables (SCV). The segment peptide is initially placed in an extended conformation with the N-terminus covalently bound to the attachment point in the protein, and simulated annealing Monte Carlo (MC) calculations [1] are carried out. The resulting families of conformations prepare the peptide for attachment of the C-terminus. In the second stage a hierarchical protocol drives the segment’s C-terminus towards its final position in the protein. In this second part of the calculation the complete force field, i.e., including the protein’s tertiary structure, is considered. The free C-terminus is attached to a dummy residue identical to the target residue where the segment will be connected. Successive MC simulations are carried out using the SCV method [2] with increasingly larger values of the harmonic force constant to ensure the correct orientation of the segment relative to the rest of the protein at the attachment point. The method was evaluated for eight different segments in the α-subunit of transducin, using PARAM22 CHARMM and the recently developed screened Coulomb potential based implicit solvent model (SCP-ISM) [3].

INTRODUCTION

Loops are important in many biological functions of proteins and fluctuate considerably from their equilibrium structures in solution, which is problematic for their structure determination by experimental methods or for homology modeling. Structural flexibility of loops plays an important role in protein-protein, protein-peptide and protein-DNA recognition by allowing adaptation of loop conformation during interaction. In G-protein coupled receptors, for example, the extracellular loops are involved in binding of various ligands, whereas intracellular loops are important for triggering subsequent steps of the cellular response upon activation.

A segment is defined as a loop portion plus the elements of secondary structure that immediately precede and follow it. Segment structure prediction is therefore a more challenging problem than loop prediction alone, since it includes the task of reproducing the specific folding properties observed at the ends of the segment (i.e., specific secondary structure) and the proper H-bond interactions.

The method developed here, for the calculation of segments connecting elements of secondary structure motifs in proteins, consists of two successive steps:

1. A simulated annealing Monte Carlo simulation (SA-MC) of the segment peptide tethered at its N-terminus only.

2. A Monte Carlo simulation in the space of the scaled collective variables (SCV-MC) of the segment, with an increasing harmonic constraint that drives the C-terminus towards its attachment point.
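The first step above can be illustrated with a minimal sketch of Metropolis Monte Carlo under a cooling schedule. This is only a toy under stated assumptions: the one-dimensional `toy_energy`, the geometric cooling schedule, and all parameter values are hypothetical and are not taken from the calculations reported here, which use a full force field on the tethered segment peptide.

```python
import math
import random


def toy_energy(phi):
    """Hypothetical 1-D torsion-like energy with several minima (not a force field)."""
    return 1.0 + math.cos(3.0 * phi) + 0.5 * (1.0 - math.cos(phi - 1.0))


def simulated_annealing(energy, x0, t_start=5.0, t_end=0.05, n_temps=40,
                        sweeps=200, step=0.5, seed=0):
    """Metropolis MC with geometric cooling; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_temps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (i / (n_temps - 1))
        for _ in range(sweeps):
            x_new = x + rng.uniform(-step, step)
            e_new = energy(x_new)
            # Metropolis criterion: always accept downhill moves,
            # accept uphill moves with Boltzmann probability.
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    phi, e = simulated_annealing(toy_energy, x0=3.0)
    print(f"lowest energy visited: {e:.4f} at phi = {phi:.3f}")
```

At high temperature the walker crosses barriers between minima freely; as the temperature drops it settles into one basin, which is the behaviour the SA-MC stage relies on to generate families of candidate conformations.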

The rationale for this combined process is based on the assumption that even segments connecting elements of secondary structural motifs have an intrinsic propensity for a particular set of conformations, i.e., there is a specific folding encoded in their amino acid sequence. In addition, it is assumed that the side chains of the segment are predominantly exposed to the solvent. The second part of the process takes the segment from the initially determined structures and closes it in the presence of the rest of the protein (Fig.1).
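The second step samples in scaled collective variables [2]. The idea can be sketched as follows, assuming a fixed quadratic energy with a known 2x2 Hessian: trial moves are taken along the eigenvectors of the Hessian, with each amplitude scaled by the inverse square root of the corresponding eigenvalue, so that stiff and soft directions cost comparable energy per move. The Hessian values, the helper names (`eigen_sym2`, `scv_move`, `run_scv_mc`) and the parameters are hypothetical; the actual method works in the conformational variables of the segment.

```python
import math
import random


def eigen_sym2(a, b, c):
    """Eigenpairs (value, unit vector) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    diff = 0.5 * (a - c)
    r = math.hypot(diff, b)
    theta = 0.5 * math.atan2(b, diff)  # rotation angle of the principal axes
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return [(mean + r, v1), (mean - r, v2)]


def energy(x, h):
    """Quadratic toy energy 0.5 * x^T H x with H = [[a, b], [b, c]]."""
    a, b, c = h
    return 0.5 * (a * x[0] ** 2 + 2.0 * b * x[0] * x[1] + c * x[1] ** 2)


def scv_move(x, modes, scale, rng):
    """Trial move: random amplitude along each mode, scaled by 1/sqrt(eigenvalue)."""
    dx0 = dx1 = 0.0
    for lam, v in modes:
        amp = rng.uniform(-1.0, 1.0) * scale / math.sqrt(lam)
        dx0 += amp * v[0]
        dx1 += amp * v[1]
    return (x[0] + dx0, x[1] + dx1)


def run_scv_mc(h, x0, temperature=1.0, scale=1.0, n_steps=2000, seed=0):
    """Metropolis MC using scaled collective moves; returns state, energy, acceptance."""
    rng = random.Random(seed)
    modes = eigen_sym2(*h)
    x, e = x0, energy(x0, h)
    accepted = 0
    for _ in range(n_steps):
        x_new = scv_move(x, modes, scale, rng)
        e_new = energy(x_new, h)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            accepted += 1
    return x, e, accepted / n_steps


if __name__ == "__main__":
    H = (100.0, 30.0, 10.0)  # hypothetical, strongly anisotropic Hessian
    print(run_scv_mc(H, x0=(0.5, -0.5)))
```

For this quadratic toy, a unit-scale displacement along any single scaled mode raises the energy by the same amount (0.5 in these units), whichever mode is chosen. Unscaled Cartesian moves on the same surface would either be rejected along the stiff direction or be too small to explore the soft one.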

[Figure 1 plot. Axis: dummy-anchor residue distance (0 marked). Curves labeled k1, k2, k3, k4, with k1 < k2 < k3 < k4.]

Figure 1: Schematic representation of four successive destabilizations of a local minimum of the energy surface by an external constraint. The upper curve shows a minimum corresponding to a large distance of the C-terminus from the attachment point, obtained with a relatively small force constant k1 of the harmonic constraint. The lower curve represents the shift of the minimum for complete segment closure, achieved by a larger value k4 of the force constant. The relaxation around each local minimum is carried out using a Monte Carlo simulation in the space of the scaled collective variables of the segment.
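The shift of the minimum described in the Figure 1 caption can be reproduced with a one-dimensional toy model. The sketch below assumes a hypothetical intrinsic well at an open distance `D0` plus a harmonic restraint 0.5·k·d² on the dummy-anchor distance d; the restrained minimum then sits at d* = A·D0/(A + k) and moves towards zero as k grows. A short Metropolis relaxation is run at each force constant, starting from the state reached at the previous one. Plain one-dimensional moves stand in for the SCV moves, and all names and values (`A`, `D0`, the k schedule, the temperature) are illustrative assumptions only.

```python
import math
import random

A, D0 = 1.0, 8.0  # hypothetical well stiffness and open-state distance (arbitrary units)


def total_energy(d, k):
    """Toy intrinsic well centred at D0 plus harmonic restraint 0.5*k*d**2."""
    return 0.5 * A * (d - D0) ** 2 + 0.5 * k * d ** 2


def analytic_minimum(k):
    """Solve dE/dd = A*(d - D0) + k*d = 0 for d."""
    return A * D0 / (A + k)


def relax(d, k, rng, temperature=0.01, n_steps=2000, step=0.2):
    """Low-temperature Metropolis relaxation at a fixed force constant k."""
    e = total_energy(d, k)
    for _ in range(n_steps):
        d_new = d + rng.uniform(-step, step)
        e_new = total_energy(d_new, k)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            d, e = d_new, e_new
    return d


def force_constant_annealing(d_start, k_schedule, seed=0):
    """Relax successively with increasing k; each stage starts from the previous result."""
    rng = random.Random(seed)
    d, trace = d_start, []
    for k in k_schedule:
        d = relax(d, k, rng)
        trace.append((k, d))
    return trace


if __name__ == "__main__":
    for k, d in force_constant_annealing(8.0, [0.1, 1.0, 10.0, 100.0]):
        print(f"k = {k:6.1f}  relaxed d = {d:6.3f}  analytic d* = {analytic_minimum(k):6.3f}")
```

Raising k in stages, rather than applying the largest value at once, lets the system relax around each intermediate minimum before the next perturbation is applied.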

CALCULATION OF SEGMENTS IN THE α-SUBUNIT OF TRANSDUCIN

One of the best-characterized G-protein signaling pathways in humans is in the rod cells of the retina, where the conversion of light (external stimulus) into a nerve impulse is mediated by the α-subunit of transducin, which binds to rhodopsin, a transmembrane photoreceptor.

The α-subunit is composed of two domains (Fig.2): one containing six β-strands (β1-β6) surrounded by six α-helices (α1-α5 and αG), and another composed mainly of helical structures. The two domains are connected by two short segments, labeled linkers 1 and 2. The eight segments considered in this study are shown in Figures 3 and 4.

Figure 2: α-subunit of transducin (PDB entry 1got) showing the two domains connected by two linkers (shown in black). All β-strands are located within the domain shown at the right of the figure. Transducin is a heterotrimeric G-protein involved in the visual cascade in the human retina and binds to the photoreceptor rhodopsin; the α-subunit is a 39 kDa protein.

Figure 4: Schematic representation of the secondary structure parsing of the sequence of the α-subunit of transducin, showing the location of the eight segments studied.

Figure 3: Segments s1 to s8 (black ribbons) in the α-subunit of transducin. Three segments (s2-s4) belong to one domain of the protein and four segments (s5-s8) belong to the second domain; Segment s1 contains (but is longer than) linker 1 (see Fig.2). Together, the segments comprise about 25% of the whole protein.

The calculations were carried out with the all-atom representation of the CHARMM force field [4] and the recently proposed screened Coulomb potential based implicit solvent model [3].

Figure 6: Upper view of the superposition of the calculated (blue ribbon) and native (black ribbon) conformations of Segment s1 (rest of the protein not shown); the initial extended conformation is also shown. Arrows indicate the two ends of the segment.

[Figure 8 plot. Axes: RMSD and Relative Free Energy (kcal/mol). Labeled points: native structure, best scored conformation.]

Figure 8: Scatter plot of the RMSD of backbone heavy atoms (with exact superposition of the protein excluding the segment) versus the energy of the system (relative to the native structure) for Segment s1. In this preliminary study, the lowest energy was used as the quantitative criterion for selecting the predicted conformation of each segment.

Figure 7: Calculated (blue) and native (black) structures of the isolated Segment s1, showing the orientations of the side chains. Arrows indicate the two ends of the segment.

Figure 9: Three main stages of the structure calculation for Segment 2 (see legend of Fig.5 for details).

Figure 10: Superposition of the calculated (black ribbon) and crystal structures of Segment 2 (rest of the protein not shown). Arrows indicate the ends of the segment.

Figure 11: Comparison of the orientations of side chains in the crystal structure and calculated conformation of the segment shown in Fig.10.

Figure 12: Detailed views of Segment 3 and the regions of the protein connected by this segment. The formation of critical H-bonds that define the α-helical motifs is observed throughout the simulation. Stages (A), (B) and (C) are as in Figs.5 and 9. In stage (A) no H-bond is present between the segment and the rest of the protein; in (B) the correct orientation of the peptide bond at the N-terminal attachment is reproduced, but no H-bond is formed yet; in (C) an H-bond at the N-terminus and two H-bonds at the C-terminus complete the closure of the segment (indicated by arrows). This pathway for the formation of critical H-bonds is characteristic of the method. Hydrogen bonding interactions are described by the SCP-ISM reported in [3].

CONCLUSIONS

The method developed was shown to reproduce the qualitative features of the structures of the segments in the α-subunit of transducin. Quantitative agreement with the crystal structure, in terms of RMSD of backbone atoms, was satisfactory for s1, s2, s3, s6 and s8. For s7 the RMSD was larger, but the overall conformation of the calculated structure was still in reasonable agreement with the native structure.

Notably, the known conformations in the crystal were found to have lower energies than any of the computed structures. This indicates that the energy function with the SCP-ISM is probably correct but that the sampling method must be improved.

Several options are available to improve the quality of the results. One approach to be explored is to introduce the protein structure in a more gradual and systematic way, especially for segments such as s4 and s5 that are buried in the protein. For these segments many steric clashes were observed when the structures obtained in the SA-MC phase were combined with the rest of the protein; the sudden introduction of the protein at the start of the SCV-MC phase introduces a large perturbation that forces the structure to leave the initial conformation in an uncontrolled and irreversible way. Gradually switching on the force field of the protein around the segment would help avoid this problem.

ID   Seq(a)    N(b)  RMSD(c)  rmsd(d)  sec.struct(e)  Function
s1   46-54      9     2.09     1.75    α1  αA         Contains linker 1
s2   80-90     11     1.60     0.85    αA  αB
s3   102-112   11     2.97     1.63    αB  αC
s4   132-142   11   >10.00    <3.00    αD  αE
s5   222-234   13   >10.00    <3.00    β4  α3         Part of the Switch III
s6   276-286   11     2.70     2.50    αG  α4
s7   301-310   10     4.20     2.70    α4  β6         Receptor binding region
s8   314-321    8     2.85     1.79    β6  α5         Receptor binding region

a) sequence defined in [5], PDB entry 1got
b) number of residues in the segments
c) root-mean-square deviations of main chain heavy atoms of the segments, in Å; note that the proteins are superimposed, excluding the segments.
d) same as c) but superimposing the calculated and native segments only
e) elements of secondary structure connected by the segments (see Fig.4)
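The RMSD of footnote c) can be sketched as a plain coordinate RMSD, assuming the calculated and native structures have already been placed in a common frame by superimposing the protein excluding the segment. The function name `rmsd` and the coordinates in the usage example are hypothetical. The measure of footnote d) would additionally require an optimal superposition of the two segments onto each other before this step, which is not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) atoms.

    No fitting is performed: both structures are assumed to share one frame already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical main-chain atom positions (in Å), for illustration only.
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 2.0)]
    print(f"RMSD = {rmsd(native, model):.3f} Å")
```

Because no fitting is applied, this value reflects both the internal shape error of the segment and any misorientation relative to the rest of the protein, which is why it is never smaller than the segment-only value in the table.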

References:

[1] S Kirkpatrick, C D Gelatt Jr and M P Vecchi; Science 220, 671 (1983)
[2] T Noguti and N Go; Biopolymers 24, 527 (1985)
[3] S A Hassan, F Guarnieri and E L Mehler; J. Phys. Chem. B 104, 6478 (2000); ibid 104, 6490 (2000)
[4] B R Brooks et al.; J. Comp. Chem. 4, 187 (1983); MacKerell et al.; J. Phys. Chem. B 102, 3586 (1998)
[5] A Fiser, R Kinh Gian Do and A Sali; Protein Sci. 9, 1753 (2000)
[6] D G Lambright et al.; Nature 379, 311 (1996)

Figure 5: The three main stages of the calculation of Segment s1: (A) initial extended conformation of the segment (note that initially the C-terminus points in a direction opposite to where it appears in the crystal structure); (B) structure obtained after the SA-MC phase, showing the partial folding towards a helical motif (N-terminus covalently bonded to the helix, rest of the protein not included); (C) segment completely closed after the SCV-MC phase, where the correct α-helix motifs at the two ends of the segment are reproduced (rest of the protein included in the calculation). The closure behavior depicted in this figure is typical of all the segments studied.

RESULTS OF THE CALCULATIONS

SUMMARY OF THE RESULTS

The summary table shows the quantitative results for the eight segments calculated. The results compare favorably to the best results reported to date on loop modeling [5], except for segments s4 and s5. Unlike the other segments, s4 and s5 are not completely solvent exposed and interact with each other. Although the qualitative folding at the ends of these two segments is correctly reproduced, the middle part of each segment adopts a different conformation that increases the RMSD values.

In general all the segments reproduce well the qualitative characteristics of the native structure: proper folding, correct formation of H-bonds and proper side chain orientation.