Coarse Grained Molecular Dynamics

with

Domain Movements of Large Proteins

Iwona Siuda

Progress Report

December 2010

Department of Molecular Biology

Bioinformatics Research Centre (BiRC)

Membrane Pumps in Cells and Disease (PUMPKIN)

Aarhus University

Denmark

CONTENTS

Preface

1. Introduction

2. Multi-domain Proteins

2.1. Test Set

2.2. Periplasmic Leucine Binding Protein

2.3. SERCA

3. Methods

3.1. Modelling Setups

3.2. Molecular Dynamics Simulations

3.3. All-Atom MD

3.4. MARTINI CG MD

3.5. ELNEDIN MD

3.6. domELNEDIN MD

4. Results and Discussion

4.1. AA Simulations

4.1.1. Leucine Interactions

4.1.2. Intra-domain Changes

4.2. CG Simulations

4.3. ELNEDIN

4.3.1. Model A and Model B

4.3.2. Model C and Model D

4.4. domELNEDIN

4.4.1. Model A and Model B

4.4.2. Model C and Model D

5. Conclusions and Future Perspectives

References

Appendix: SLEU Parameterization

Preface

I am a Ph.D. student at the Department of Molecular Biology, within the field of

bioinformatics, and a member of the Structural Bioinformatics group at the

Bioinformatics Research Centre (BiRC). I started my Ph.D. studies on November 1st, 2009,

under the supervision of Christian N. S. Pedersen and Lea Thøgersen at BiRC. My Ph.D.

project is concerned with developing and applying coarse grained molecular dynamics

methods to protein pumps, of interest to the research centre for Membrane Pumps in Cells

and Disease (PUMPKIN). Thus, during the first part of my studies I have focused mainly

on the development and application of the domELNEDIN model - a “protein domain

version” of the ELNEDIN model (1), which is based on the established MARTINI coarse

grained force field (2-4).

In this report I will briefly introduce multiscale methods for molecular dynamics

simulations and steps that were taken to collect a set of proteins for the purposes of the

development and application of the domELNEDIN model. Then I will present the

methods with which simulations were set up and carried out at the all-atom level,

the MARTINI CG level, the ELNEDIN level, and the domELNEDIN level. Thereafter, the

obtained results will be discussed and the methods used will be evaluated. I will finish

this report with conclusions about the presented models, and a discussion about future

plans for the second part of my Ph.D. studies.

Iwona Siuda

1. Introduction

The main objective of my Ph.D. project is the development and application of a

method and a protocol to model the dynamics of the domain motion in the P-type

ATPases and other large multi-domain proteins. There are many different modelling

techniques, each suited to a particular time scale and level of detail of the simulated

system. One of the atomic resolution techniques is all-atom (AA) molecular dynamics

(MD) simulation. This method remains a powerful tool for investigating the structure,

dynamics and function of many important biomolecular systems, like proteins and lipid

bilayers. However, the properties that make atomically detailed MD simulation such a

powerful tool are also its limitations. For a system of average size, the number of

simulated atoms can easily reach thousands to millions, and the time scale

currently accessible is limited (by the available computing power) to hundreds of

nanoseconds (5). This means that the most relevant dynamics and interactions within

cells (like protein-protein docking or rearrangement upon ligand binding), which occur

on the micro- and millisecond time scale, are currently out of reach for AA simulations.

As the fast and slow molecular dynamics are sufficiently independent, it should be

possible to ignore fast vibrations for the study of slow dynamics. Thus, to study the

mechanisms, dynamics and structural changes of ATPase pumps (6), some level of

coarse grained (CG) description of the system can be applied.

In CG models, molecules are described by the interaction sites representing groups

of atoms, providing a reduced resolution description of a given system. These models

are expected to be highly computationally efficient, both because mapping atoms into

interaction sites reduces the number of degrees of freedom, and because high-frequency

intramolecular vibrations are incorporated into averaged effective interactions

between sites. Consequently, it is possible to choose a larger time step Δt, and therefore

speed up computations. Very different levels of CG models have been considered (7-9)

ranging from the “united atoms” approach, where only non-polar hydrogen atoms are

ignored, to mesoscale models using the rigid regions of well-defined equilibrium

structures identified within a biomolecule as the natural coarse grained elements (10).

Another level of system description is the “residue” resolution level represented by the

MARTINI CG model (2-4), where the atoms of each residue are mapped into two to five

beads. In CG models with “residue” level resolution, the lack of proper hydrogen bond

description can make it necessary to combine the CG model with an elastic network to

maintain the overall shape of the protein (1).

In the group of models named elastic network models (ENM) (7,9), the structure of a

macromolecule is described as a network of point masses connected to each other

by springs whenever the distance between two point masses is less than a predefined

cut-off distance Rc. The strength of the springs, and therefore the rigidity of the network,

is characterized by the spring force constant Ks. Network models are used in different

ways, e.g. combined with normal mode analysis (NMA) for the analysis of the principal

modes of a large variety of different systems (11), or combined with CG

models to maintain the overall shape of the protein during MD simulations (1). The latter is

done in the extension to the MARTINI CG force field called ELNEDIN (1). The use of an ENM

with MARTINI allows the study of large complex proteins and their interactions with their

surroundings and each other on the microsecond time scale, at a resolution that describes

specific residue interactions. However, as the ENM restrains the protein to its initial structure,

conformational shifts cannot be observed.

With my project I therefore wished to experiment with a modification of the

ELNEDIN approach, where the ENM scaffold is put on each domain separately,

restraining movements inside domains, while at the same time allowing inter-domain

movements. Depending on the generality of such a model, it will make a whole new range

of theoretical studies of the ATPase pumps as well as other multi-domain proteins

possible. This proposed approach is called domELNEDIN, and the preliminary testing of

this model is the core of the work presented in this report.
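The difference between a global elastic network and a domain-wise one can be sketched in a few lines. This is a minimal illustration, not the ELNEDIN or domELNEDIN implementation: the function name `build_elastic_network`, the toy coordinates, the domain labels and the parameter values are all hypothetical, and refinements such as excluding already bonded neighbours are left out.

```python
import math
from itertools import combinations

def build_elastic_network(coords, r_cut, k_spring, domains=None):
    """Return elastic bonds as (i, j, rest_length, k_spring) tuples.

    coords   : list of (x, y, z) bead positions in nm
    r_cut    : cut-off distance Rc in nm
    k_spring : spring force constant in kJ mol^-1 nm^-2
    domains  : optional list with one domain label per bead; if given,
               only beads carrying the same label are connected
    """
    bonds = []
    for i, j in combinations(range(len(coords)), 2):
        if domains is not None and domains[i] != domains[j]:
            continue  # skip inter-domain pairs (domain-wise scaffold)
        d = math.dist(coords[i], coords[j])
        if d < r_cut:
            # the rest length of each spring is the distance in the input structure
            bonds.append((i, j, d, k_spring))
    return bonds

# Hypothetical toy system: four beads on a line, 0.5 nm apart,
# the first two assigned to domain "D1", the last two to "D2".
coords = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
domains = ["D1", "D1", "D2", "D2"]

global_net = build_elastic_network(coords, r_cut=0.9, k_spring=500.0)
domain_net = build_elastic_network(coords, r_cut=0.9, k_spring=500.0, domains=domains)
print(len(global_net), len(domain_net))  # prints: 3 2
```

In the toy system the global network contains the spring between beads 1 and 2, which ties the two domains together, whereas the domain-wise network drops it and leaves the relative position of the domains unrestrained.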

2. Multi-domain Proteins

The main targets of my Ph.D. project are the cell membrane pumps belonging to the

family of P-type ATPases, with a special focus on the sarco-endoplasmic reticulum

Ca2+-ATPase (SERCA). These cell membrane pumps are made of multiple functional domains

that may undergo substantial displacements essential for their functioning. Exploring the

conformational space of large-scale domain rearrangements may be useful, for example,

for testing hypotheses about the possibility of interactions between the multi-domain

protein and a small molecule, often a ligand, in a docking procedure. There are two main

types of motions that can produce domain movements, the “shear” and

“hinge-bending” motions (12). An individual “shear” motion is small (due to involvement of

amino acid residues that are distributed over extended areas of the protein), and thus a

single one is usually not sufficient to produce a large domain movement (12). In the

“hinge-bending” movement a relatively small number of residues of a polypeptide chain

significantly changes the mutual position of the domains (12). As multiple conformations of SERCA are

known, it can be seen that domain movements in this membrane pump are of the

second type. For all the proteins that have two domains connected by linking hinge

regions, a few large torsion angle changes are sufficient to produce almost the whole

domain motion. The rest of the protein rotates essentially as a rigid body, thus the intra-

domain structure remains unchanged between the conformations. In order to develop

and test the domELNEDIN model for describing domain movements, a test set of multi-

domain proteins was built.

2.1. Test Set

To test the domELNEDIN model and confirm that for many proteins during

conformational shifts the protein intra-domain structure remains essentially unchanged,

a set of 40 protein structures resolved in at least two conformations has been compiled.

After further investigation eight proteins were selected (Fig. 1), where the root-mean-

square deviation (RMSD) between all Cα atoms for the two conformations of the same

protein was higher than 1.6 Å (Tab. 1).
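For reference, the RMSD between two conformations A and B can be written as below. This is the standard definition; it is an assumption that the values in this report were computed in this way, over the N Cα atoms after optimal rigid-body superposition of the two structures. The symbols N, A, B and x are introduced only for this illustration.

```latex
\mathrm{RMSD}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{x}_i^{A} - \mathbf{x}_i^{B} \right\rVert^{2}}
```

Here the vectors x_i are the positions of the i-th Cα atom in the two superimposed structures.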

Figure 1 A selection of the proteins in the test set collected for studying protein domain interfaces. For all of the proteins the structure is known in at least two conformations (Figure created using VMD (13)).

Table 1 The RMSD between the Cα atoms of two different conformations of the same protein.

| No. | Molecule | PDB structure 1 | PDB structure 2 | RMSD [Å] |
|---|---|---|---|---|
| 1 | Leucine Binding Protein | 1USK (14) | 1USG (14) | 7.04 |
| 2 | Aspartate Aminotransferase | 1AMA (15) | 9AAT (16) | 1.66 |
| 3 | Citrate Synthase | 4CTS (17) | 1CTS (18) | 2.37 |
| 4 | Fibronectin | 1E8B (19) | 1E88 (19) | 2.79 |
| 5 | Phosphotransferase System, Enzyme I | 2EZA (20) | 3EZA (21) | 1.86 |
| 6 | Cyanovirin-N | 1L5B (22) | 1L5E (22) | 6.51 |
| 7 | Cbl | 1B47 (23) | 2CBL (23) | 1.87 |
| 8 | HIV-1 Reverse Transcriptase | 2HMI (24) | 1HVU (25) | 5.73 |

To develop and test a procedure for coarse graining at the protein domain level,

each domain had to be defined. In general, the functional domains can fold and function

independently. However, when it comes to dividing a protein into domains there is no

unambiguous definition of how to do it. There are several methods for protein

domain identification, but for the purposes of this project an automatic domain-parsing

procedure called DDOMAIN (26), available as a web server at

http://sparks.informatics.iupui.edu/hzhou/ddomain.html, was used. It is based on the principle

that the inter-domain interaction is weak under a correct domain assignment. The residues of

a structure submitted to this server are numbered consecutively from 1 to Nr, where Nr is

the number of residues. Then the structure is divided into two candidate domains, residues

1 to i and residues i + 1 to Nr, with the only constraint that a domain must be 40 residues

or longer. The domain-domain interactions are calculated either from the residue-residue

contacts or from a normalized residue-based or distance-based energy profile. The lowest value of the

energy profile indicates a boundary point between the domains. After defining two

domains, each of them is inspected in order to see if it can be further divided into

smaller domains. To optimize the results, energy profile parameters are trained, tested,

and compared to the annotations of the three following data sets: AUTHORS (domain

definitions are given by the authors who solved protein structures), CATH (37) and SCOP

(38). Results obtained from the DDOMAIN server were further investigated by visual

inspection using VMD (13). If the two conformations of the same protein were not assigned the

same domain definition, the definition was adjusted manually to the best fit based on visual inspection.
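The idea behind the boundary scan can be illustrated with a small sketch. It is not the DDOMAIN algorithm: the score used here (number of inter-domain contacts divided by the size of the smaller candidate domain), the contact cut-off of 0.8 nm, the function names and the toy coordinates are assumptions made for illustration, whereas DDOMAIN uses trained energy profiles. Only the minimum domain length of 40 residues is taken from the description above.

```python
import math

def count_contacts(coords, split, contact_cut):
    """Count residue pairs (a, b) with a < split <= b that are closer than contact_cut."""
    n = 0
    for a in range(split):
        for b in range(split, len(coords)):
            if math.dist(coords[a], coords[b]) < contact_cut:
                n += 1
    return n

def best_split(coords, min_len=40, contact_cut=0.8):
    """Scan all boundaries and return (i, score) for the weakest inter-domain interaction.

    The first i residues form one candidate domain, the remaining ones the other.
    The score is the contact count normalized by the smaller domain (illustrative choice).
    """
    n_res = len(coords)
    best = None
    for i in range(min_len, n_res - min_len + 1):
        score = count_contacts(coords, i, contact_cut) / min(i, n_res - i)
        if best is None or score < best[1]:
            best = (i, score)
    return best

def blob(offset_x):
    """Hypothetical compact 'domain' of 50 residues on a small grid (coordinates in nm)."""
    return [(offset_x + 0.1 * (k % 5), 0.1 * ((k // 5) % 5), 0.1 * (k // 25))
            for k in range(50)]

# Two compact blobs, 5 nm apart: the boundary is expected after residue 50.
coords = blob(0.0) + blob(5.0)
boundary, score = best_split(coords)
print(boundary, score)  # prints: 50 0.0
```

As in the procedure described above, each of the two resulting domains could then be passed to `best_split` again to check whether it can be divided further.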

Based on the domain definitions for the eight selected proteins, the RMSDs between

the same domains in different protein conformations were computed (Tab. 2).

Comparing the results in Table 2 with those in Table 1 shows that for some of the PDB

structures in the test set, like 1B47 and 2CBL, or 1USK and 1USG, each domain remains

essentially unchanged during the conformational shifts. Thus, it seems promising to

model conformational changes by coarse graining at the protein domain level.

Table 2 The RMSDs and average RMSDs [Å] computed for the same domains in different protein conformations.

| No. | PDB structures | Domain 1 | Domain 2 | Domain 3 | Domain 4 | Average |
|---|---|---|---|---|---|---|
| 1 | 1USK - 1USG | 0.73 | 0.63 | - | - | 0.68 |
| 2 | 1AMA - 9AAT | 1.64 | 0.33 | 0.98 | - | 0.98 |
| 3 | 4CTS - 1CTS | 1.14 | 1.61 | - | - | 1.38 |
| 4 | 1E8B - 1E88 | 1.23 | 1.12 | 1.07 | - | 1.14 |
| 5 | 2EZA - 3EZA | 1.31 | 0.99 | - | - | 1.15 |
| 6 | 1L5B - 1L5E | 1.72 | 0.45 | - | - | 1.09 |
| 7 | 1B47 - 2CBL | 0.62 | 0.46 | - | - | 0.54 |
| 8 | 2HMI - 1HVU | 1.87 | 2.09 | 0.77 | 0.90 | 1.41 |
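The comparison between Table 1 and Table 2 can be made explicit with a short script. The numbers are copied from the two tables; the ratio of whole-protein RMSD to average per-domain RMSD is used here only as an illustrative measure of how "rigid-body-like" a domain movement is, and is not a quantity defined in this report.

```python
# (structure pair, whole-protein RMSD from Table 1, per-domain RMSDs from Table 2,
#  average reported in Table 2); all values in Angstrom
test_set = [
    ("1USK-1USG", 7.04, [0.73, 0.63], 0.68),
    ("1AMA-9AAT", 1.66, [1.64, 0.33, 0.98], 0.98),
    ("4CTS-1CTS", 2.37, [1.14, 1.61], 1.38),
    ("1E8B-1E88", 2.79, [1.23, 1.12, 1.07], 1.14),
    ("2EZA-3EZA", 1.86, [1.31, 0.99], 1.15),
    ("1L5B-1L5E", 6.51, [1.72, 0.45], 1.09),
    ("1B47-2CBL", 1.87, [0.62, 0.46], 0.54),
    ("2HMI-1HVU", 5.73, [1.87, 2.09, 0.77, 0.90], 1.41),
]

ratios = {}
for pair, whole, per_domain, reported_avg in test_set:
    avg = sum(per_domain) / len(per_domain)
    ratios[pair] = whole / avg
    print(f"{pair}: whole = {whole:.2f}, domain average = {avg:.2f}, ratio = {ratios[pair]:.1f}")

# Pair with the largest whole-protein RMSD relative to its intra-domain RMSD
best_pair = max(ratios, key=ratios.get)
print("largest ratio:", best_pair)
```

With these numbers the largest ratio is obtained for 1USK - 1USG, i.e. a large overall change combined with nearly unchanged domains.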

2.2. Periplasmic Leucine Binding Protein

Based on the large RMSD between the two conformations (1USK and 1USG) of the

periplasmic Leucine Binding Protein (LBP) (14) (7.04 Å), in contrast to the small RMSD

between the same domains in the two conformations (0.68 Å on average), the LBP

was chosen as the main test example for further method development.

The LBP is the primary receptor for the leucine transport system in E. coli, and

undergoes hinge movements associated with large conformational changes upon ligand

binding. The structure was resolved and refined in an open ligand-free form (PDB: 1USG

(14)) (Fig. 2B) to a resolution of 1.5 Å, and in a closed form with leucine bound (PDB:

1USK (14)) (Fig. 2A) to a resolution of 2.4 Å. LBP is a 346 residue protein made of two

domains each consisting of a central β-sheet flanked by α-helices. The first domain

contains residues 1-120 and 251-329 and the second contains residues 121-250 and

330-345 (14). The domains are linked by a three-stranded hinge (Fig. 2) spanning residues

117-121 for connection I, 248-252 for connection II and 325-331 for connection III (14).

Most of the changes in the main-chain torsion angles that determine the observed

motions occur in the connections I and III, which form the direct links between the β-

sheets of two domains. Connection II merely adapts, as a short helix (residues 251-255)

is placed next to the hinge region. In general, the residues in helices are subject to more

severe hydrogen-bonding and steric constraints than those in sheets; thus the possible

changes in helix torsion angles are correspondingly smaller than those of the residues

in sheets (12). In the holo form leucine binds to the LBP in a cleft formed between the

two domains involving hydrogen-bonding and non-polar contributions.

The domains in the LBP and the cytoplasmic domains in SERCA share similar features,

as both contain α-helical and β-sheet elements and are subject to hinge

movements. This, as well as the straightforward coarse graining of the ligand leucine,

makes this protein a relevant and convenient test example.

Figure 2 The two conformations of LBP, domain 1 shown in blue, domain 2 in red, and hinge region in grey. A: the holo form (1USK) with leucine positioned in the cleft between the two domains; B: the apo form (1USG). Comparison of the Cα backbone of 1USK (blue) and 1USG (red) C: the whole structures, D: domain 2 from the two conformations, E: domain 1 from the two conformations (Figure created using VMD (13)).

2.3. SERCA

From the superfamily of P-type ATPases, SERCA is by far the most studied, and more

than 30 structures representing several conformational states with different inhibitors

and substrates along the catalytic cycle are known. SERCA is therefore an obvious test

protein for the domELNEDIN model, before it is applied to the other membrane pumps.

SERCA is a 994 residue protein consisting of one transmembrane and three

cytoplasmic domains. The schematic figure (Fig. 3A) of the first-determined high-

resolution (2.6 Å) structure of SERCA (27) presents the cytoplasmic headpiece consisting

of three well defined domains: A (Actuator), N (Nucleotide-binding), and P

(Phosphorylation), and ten transmembrane α-helices denoted as TM1-TM10. The

functional cycle (Fig. 3B) is denoted by E1 and E2 states that refer to the binding and

active transport of cytoplasmic Ca2+ and the counter transport of luminal H+ to the

cytoplasm, respectively (6). The Ca2+ transport cycle starts in the E2 state, where 2‐3

protons are expected to be bound in the Ca2+ ion binding sites (the helices TM4-TM6 and

TM8 have been considered to form a binding site (27) for Ca2+ ions), and ATP is proposed

to be bound to the N‐domain. After dephosphorylation of ATP and phosphorylation of

the protein, the transition from the E2 to the Ca2E1~P state, associated with Ca2+

binding and release of protons, occurs. The energy derived from the ATP hydrolysis is

used to translocate the two Ca2+ ions from the cytoplasm to the sarcoplasmic reticulum

lumen against a steep concentration gradient. Consequently, after domain

rearrangement, the protein changes conformation to the E2P state in which the

transmembrane region opens and the Ca2+ ion binding sites are exposed to the luminal

pathway, allowing release of the Ca2+ ions. Next, 2-3 protons and ATP are taken up by the

cation binding sites and the transmembrane region closes off, causing a conformational

change to the E2‐Pi state. The cycle is then completed with release of the inorganic

phosphate (6).

Figure 3 A: The structure of SERCA. The three cytoplasmic domains are labelled A, N and P, and TM helices are labelled from 1-10. The colour changes gradually from blue in the N-terminus to red in the C-terminus. An ATP analogue bound to the N-domain is shown in CPK (Figure from (27)). B: The functional cycle of SERCA illustrated with four cornerstones. Domain A is shown in yellow, domain N in red, domain P in blue, TM1‐2 in purple, TM3‐4 in green, TM5‐6 in wheat and TM7‐10 in grey, Ca2+ ions are shown as grey spheres (Figure from (6)).

As mentioned before, SERCA is an obvious test protein for the domELNEDIN model

before it is applied to the other membrane pumps. However, this protein is very

complex, and the method proposed in this project has not yet been tested with SERCA.

3. Methods

The LBP was studied in four different simulation setups. As the interactions between

the CG sites are parameterized from atomic interactions, the results from the CG

simulations must be consistent with results obtained from the atomistic model. Thus, all

setups were studied at the AA level (28), the MARTINI CG level (2-4), the ELNEDIN

level (1), where an elastic network model is put on top of the MARTINI model, and

the domELNEDIN level, where the elastic network is set up only within the

domains.

3.1. Modelling Setups

The four modelling setups of the LBP (Tab. 3) are as follows: Model A – the crystal

structure of the apo form, Model B – the crystal structure of the holo form, Model C –

the apo form with leucine bound, and Model D – the holo form without leucine bound.

In Model C, leucine was positioned in the cleft between the two domains of the crystal

structure of the apo form of the LBP by superimposing Cα atoms of the apo and holo

forms. In this manner the structures were aligned and the coordinates for leucine were

combined with the apo form and its crystal water molecules. Model D was built by

removing the leucine coordinates from the PDB file. All crystal water molecules were kept in

the initial structures.

Table 3 The four modelling setups of the LBP.

| No. | Model | Description |
|---|---|---|
| 1 | Model A | apo form (1USG) |
| 2 | Model B | holo form (1USK) |
| 3 | Model C | apo form (1USG) with leucine bound |
| 4 | Model D | holo form (1USK) without leucine bound |

3.2. Molecular Dynamics Simulations

The simulations described in this report were carried out using the classical molecular

dynamics technique, in which Newton’s equations of motion {1} are integrated with

respect to time to calculate the trajectories of the particles.

$$\mathbf{F}_i = m_i \mathbf{a}_i = m_i \frac{d^2 \mathbf{r}_i}{dt^2}$$ {1}

The acceleration $\mathbf{a}_i$, together with the prior position and velocity of each atom i {1},

determines its new position after a small time step. Since most simulations start from

a static structure in which the velocities are not known, the velocities are assigned

randomly to the atoms from a Maxwell-Boltzmann distribution at a predetermined

temperature (29). The force $\mathbf{F}_i$ acting on each atom is determined from the negative

gradient {2} of the potential energy surface V {3}, which describes energy terms from the

bonded and non-bonded interactions between the particles.

$$\mathbf{F}_i = -\nabla_{\mathbf{r}_i} V$$ {2}

$$V = \left( V_{\text{bond}} + V_{\text{angle}} + V_{\text{dihedral}} + V_{\text{improper}} \right) + \left( V_{\text{LJ}} + V_{\text{el}} \right)$$ {3}
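How a force obtained as the negative gradient of a potential is turned into a trajectory can be illustrated with a one-dimensional toy system. The sketch below uses the velocity Verlet scheme for brevity, which is an assumption of this example (the default `md` integrator in GROMACS is a leap-frog scheme), and the harmonic potential, mass and starting values are hypothetical. Units follow the GROMACS convention (nm, ps, u, kJ/mol).

```python
K = 1000.0   # force constant, kJ mol^-1 nm^-2 (hypothetical)
X0 = 0.10    # equilibrium position, nm (hypothetical)
MASS = 12.0  # particle mass, u (hypothetical)

def potential(x):
    """Harmonic potential energy V(x)."""
    return 0.5 * K * (x - X0) ** 2

def force(x):
    """Force as the negative gradient of the potential, F = -dV/dx."""
    return -K * (x - X0)

def velocity_verlet(x, v, dt, n_steps):
    """Propagate position and velocity for n_steps time steps of length dt."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / MASS   # half-step velocity update with the old force
        x += dt * v                # full-step position update
        f = force(x)               # force at the new position
        v += 0.5 * dt * f / MASS   # second half-step velocity update
    return x, v

def total_energy(x, v):
    return potential(x) + 0.5 * MASS * v * v

x_start, v_start = 0.12, 0.0
x_end, v_end = velocity_verlet(x_start, v_start, dt=0.002, n_steps=1000)
e_start = total_energy(x_start, v_start)
e_end = total_energy(x_end, v_end)
print(f"relative energy drift: {abs(e_end - e_start) / e_start:.1e}")
```

A small relative energy drift indicates that the time step is short compared with the period of the fastest motion in the system, which is the reason why removing fast vibrations allows a larger time step.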

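The bonded interactions are described by the following set of potential energy

functions {4-7} acting between the bonded particles i, j, k, and l, with the equilibrium

distance $r_0$, the equilibrium angle $\theta_0$, the dihedral angle $\phi$ (Fig. 4) and the improper

dihedral angle $\omega$ with reference value $\omega_0$, and where K indicates the force constants (29).

$$V_{\text{bond}}(r_{ij}) = \frac{1}{2} K_r \left( r_{ij} - r_0 \right)^2$$ {4}

$$V_{\text{angle}}(\theta_{ijk}) = \frac{1}{2} K_\theta \left( \theta_{ijk} - \theta_0 \right)^2$$ {5}

$$V_{\text{dihedral}}(\phi_{ijkl}) = \frac{V_n}{2} \left[ 1 + \cos\left( n \phi_{ijkl} - \gamma \right) \right]$$ {6}

$$V_{\text{improper}}(\omega_{ijkl}) = \frac{1}{2} K_\omega \left( \omega_{ijkl} - \omega_0 \right)^2$$ {7}

The stretching {4} and bending {5} energy equations are based on Hooke’s law, and they

estimate the energy associated with vibrations about the equilibrium bond length and bond

angle, respectively. The dihedral angle potential {6} describes bond rotation. The phase angle

of the rotation is described by $\gamma$, the torsional barrier by $V_n$, and the periodicity, which is

the number of energy minima during a full rotation (360°), by n. The improper dihedral

angle potential {7} is used to prevent out-of-plane distortions of planar groups.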
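The bonded terms {4-7} can be sketched as plain functions. The functional forms used here (harmonic stretching, bending and improper terms, and a periodic cosine term for proper dihedrals) are the common textbook forms and are an assumption of this example; the function names and all parameter values are hypothetical. The last lines count the energy minima of a dihedral term with periodicity n = 3 over a full rotation.

```python
import math

def v_bond(r, r0, k_r):
    """Harmonic bond stretching (Hooke's law) about the equilibrium length r0."""
    return 0.5 * k_r * (r - r0) ** 2

def v_angle(theta, theta0, k_theta):
    """Harmonic angle bending about the equilibrium angle theta0 (radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2

def v_dihedral(phi, v_n, n, gamma):
    """Periodic dihedral term with barrier v_n, periodicity n and phase gamma (radians)."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def v_improper(omega, omega0, k_omega):
    """Harmonic improper dihedral term keeping planar groups planar."""
    return 0.5 * k_omega * (omega - omega0) ** 2

# Sample a dihedral term with n = 3 every degree over a full rotation
# (barrier height and phase are hypothetical values).
energies = [v_dihedral(math.radians(deg), v_n=10.0, n=3, gamma=0.0) for deg in range(360)]

# A grid point is a minimum if it lies below both of its (cyclic) neighbours.
n_minima = sum(
    1 for k in range(360)
    if energies[k] < energies[k - 1] and energies[k] < energies[(k + 1) % 360]
)
print(n_minima)  # prints: 3
```

The three minima found for n = 3 illustrate the statement that the periodicity equals the number of energy minima during a full rotation.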

The non-bonded interactions can be described by two terms: the Lennard-Jones (LJ)

potential representing the van der Waals interactions and the Coulomb potential

representing electrostatic interactions. All particle pairs i and j at the distance

$r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$ interact via the LJ potential {8} (29).

$$V_{\text{LJ}}(r_{ij}) = 4 \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]$$ {8}

The strength of the interaction between the particles i and j is determined by the

value of $\varepsilon_{ij}$. The distance $\sigma_{ij}$ represents the effective minimum distance of approach

between the two particles. The first term in {8} describes the repulsion between two

particles and the second term the attraction between them. In addition to the LJ term,

the electrostatic interactions between charged groups of atoms bearing a charge q are

described by a Coulombic energy function {9} with a relative dielectric constant $\varepsilon_{\text{rel}}$ (29).

$$V_{\text{el}}(r_{ij}) = \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_{\text{rel}} r_{ij}}$$ {9}

Figure 4 Internal coordinates for bonded interactions: r governs bond stretching; θ represents the bond angle; φ gives the dihedral angle (Figure created using VMD (13)).

These energy functions {4-9}, together with the set of parameters required to

describe the behaviour of different kinds of atoms and bonds and fitted to experimental

data, are known as a force field. There are several types of force fields, often with a specific

focus on a subgroup of the biomolecules: proteins, nucleic acids, lipids, and

carbohydrates. The MD simulations presented in this report were performed using the

AMBER03 (30) and MARTINI-2.1 CG force fields (2-4).

3.3. All-Atom MD

The AA simulations were performed using the GROMACS simulation package (28)

version 4.0.7 with the AMBER03 force field (30) for the protein and the SPC water model

for solvent (31). The protonation of the protein was handled automatically by the GROMACS

pdb2gmx program, which takes the most common protonation state for an amino acid

residue in solvent at pH 7. Thus, Lys was protonated, Asp and Glu were unprotonated,

and His was kept neutral. For simulations including the ligand leucine, parameters for

zwitterionic leucine had to be defined (see Appendix).

For each setup the protein structure was solvated in a cubic box with dimensions

10 × 10 × 10 nm and counter ions were added (9 Na+). For each group in the system (ions,

water, protein) the temperature (300 K) and isotropic pressure (1 bar) were kept

constant using the Berendsen coupling algorithm (32) with time constants τt = 0.1 ps and

τp = 1 ps, respectively. A twin-range cut-off was used for the non-bonded interactions.

Interactions within the short-range cut-off (1.0 nm) were evaluated every time step (2

fs), whereas interactions within the long-range cut-off (1.4 nm) were updated every 10

steps together with the pair-list. Electrostatic interactions were modelled using the particle

mesh Ewald (PME) method (33). Bond lengths in the protein were constrained using the

LINCS algorithm (34). The setups were energy-minimized, followed by a relaxation of the

solvent and ions with position restraints (1000 kJ·mol⁻¹·nm⁻²) applied to all heavy atoms

of the protein. The setups were then simulated for 100 ns without any restraints.

3.4. MARTINI CG MD

MARTINI (2-4) is a CG force field which has become very popular due to its success in

parameterizing a large library of the biologically relevant building blocks. In this model

each residue is mapped to a backbone bead and zero (Ala) to four (Trp) side chain beads

(Fig. 5A). On average, one bead represents four heavy atoms. A backbone bead for each

residue is placed at the center of mass (COM) of the backbone atoms: N, Cα, C, O. There

are four main particle types: polar (P), nonpolar (N), apolar (C), and charged (Q), and

they can be further divided denoting the hydrogen-bonding capabilities: d – donor, a –

acceptor, da – both, 0 – none, or by a number indicating the degree of polarity (from 1 –

low to 5 – high) (4) (Fig. 5B).
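The placement of a backbone bead at the centre of mass of the backbone atoms can be sketched as follows. This is an illustration only, not the MARTINI mapping script: the function name, the use of standard atomic masses for the heavy atoms (hydrogens ignored) and the coordinates of the example residue are assumptions of this sketch.

```python
import math

# Standard atomic masses in u; the key "CA" denotes the C-alpha atom
ATOMIC_MASS = {"N": 14.007, "CA": 12.011, "C": 12.011, "O": 15.999}

def center_of_mass(atoms):
    """Return the mass-weighted mean position of a dict {atom name: (x, y, z)}."""
    total_mass = sum(ATOMIC_MASS[name] for name in atoms)
    return tuple(
        sum(ATOMIC_MASS[name] * pos[axis] for name, pos in atoms.items()) / total_mass
        for axis in range(3)
    )

# Hypothetical backbone coordinates of one residue, in nm
backbone = {
    "N": (0.000, 0.000, 0.000),
    "CA": (0.146, 0.000, 0.000),
    "C": (0.200, 0.142, 0.000),
    "O": (0.130, 0.243, 0.000),
}

bead = center_of_mass(backbone)
offset = math.dist(bead, backbone["CA"])
print(f"backbone bead at ({bead[0]:.3f}, {bead[1]:.3f}, {bead[2]:.3f}) nm")
print(f"distance from the C-alpha atom: {offset:.3f} nm")
```

For this example residue the bead ends up roughly 0.1 nm away from the Cα atom, which shows that a COM-based bead position and a Cα-based one are not interchangeable.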

Figure 5 A: The representation of all protein amino acids mapped into beads (4). B: The scheme shows different bead types (Figure from http://md.chem.rug.nl/cgmartini/).

The bonded and non-bonded interactions {4-9} between the beads are described (4) in a

manner similar to an AA force field. However, the strength of an interaction in the LJ

potential {8}, determined by the well depth $\varepsilon_{ij}$, depends on the interacting particle types and

ranges from $\varepsilon_{ij}$ = 5.6 kJ/mol for the interactions between strongly polar groups to

$\varepsilon_{ij}$ = 2.0 kJ/mol for interactions mimicking the hydrophobic effect. The effective bead size

is $\sigma$ = 0.47 nm for normal particle types and $\sigma$ = 0.43 nm for the particles used to model

ring-like molecules (4). In the Coulombic energy function {9}, which describes interactions

between charged (Q type) beads, the relative dielectric constant is set to $\varepsilon_{\text{rel}}$ = 15 for

explicit screening, as CG water is a neutral bead and therefore does not possess any

screening capabilities. Both of these potentials are cut off and smoothly shifted to avoid

noise in the simulations.
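The effect of these parameter values can be checked with a short sketch of the two non-bonded terms. It uses the plain, unshifted LJ and Coulomb expressions, so the cut-off and shift functions mentioned above are deliberately left out; the function names are hypothetical, and the electric conversion factor is the approximate value of 1/(4πε0) in GROMACS units.

```python
COULOMB_CONST = 138.935  # approx. 1/(4 pi eps0) in kJ mol^-1 nm e^-2

def v_lj(r, epsilon, sigma):
    """Unshifted 12-6 Lennard-Jones energy in kJ/mol for a distance r in nm."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def v_coulomb(r, q_i, q_j, eps_rel):
    """Unshifted Coulomb energy in kJ/mol for charges in units of e and r in nm."""
    return COULOMB_CONST * q_i * q_j / (eps_rel * r)

SIGMA = 0.47                   # nm, normal MARTINI particle types
R_MIN = 2 ** (1 / 6) * SIGMA   # position of the LJ minimum

# Well depth for the strongest and for the most hydrophobic-like interaction level
for eps in (5.6, 2.0):
    print(f"epsilon = {eps} kJ/mol: V_LJ(r_min) = {v_lj(R_MIN, eps, SIGMA):.2f} kJ/mol")

# Attraction between two oppositely charged beads at 0.5 nm with and without screening
screened = v_coulomb(0.5, +1.0, -1.0, eps_rel=15.0)
unscreened = v_coulomb(0.5, +1.0, -1.0, eps_rel=1.0)
print(f"Coulomb energy: {screened:.1f} kJ/mol (eps_rel = 15), {unscreened:.1f} kJ/mol (eps_rel = 1)")
```

At the minimum the LJ energy equals the negative well depth, and the relative dielectric constant of 15 simply scales every charge-charge interaction down by a factor of 15.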

The CG simulations were also performed using the GROMACS (28) software package

version 4.0.7. The protein structure, as it appeared after 100 ns of atomistic simulation

for the four setups, was used as input to the CG simulations. All scripts used to

generate the topology files were obtained from the MARTINI home page:

http://md.chem.rug.nl/cgmartini/. To generate the protein topology, the protein sequence

from the Protein Data Bank (35) and the secondary structure found with DSSP (36) were

given as an input file to the seq2itp.pl script, which was modified to assign the correct

charges on the C- and N-termini. After mapping atoms into beads (Fig. 6) each structure was

energy-minimized in vacuum and then solvated with CG water beads (each representing

four water molecules). Next, counter ion beads were added to neutralize the system (9 Na+).

The setups were energy-minimized and the solvent and ions were relaxed with position

restraints (1000 kJ·mol⁻¹·nm⁻²) applied to all beads of the protein for 1.25 ns.

Figure 6 The CG representation of the holo form of the LBP (1USK). Residues in the all-atom representation (upper snapshot) were mapped into beads (lower snapshot), where the backbone beads are shown in green, and side chain beads in yellow (Figure created using VMD (13)).

The temperature and pressure settings were as in the atomistic simulations, only with time

constants τt = 1 ps and τp = 5 ps (4). Non-bonded interactions were cut off at 1.2 nm and

shifted from 0.9 nm for the LJ potential and from 0.0 nm for the electrostatic potential

(2,3). Neighbour lists were updated every 10 steps. The systems were simulated for 25

ns using a 25 fs time step. In order to have comparable time scales of the MARTINI CG

simulations and the AA simulations, a scaling factor of 4, which is the speed up factor in

the diffusional dynamics of CG water compared to real water, was proposed (2). Thus, 25

ns of CG simulation correspond to 100 ns of AA simulation.
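The bookkeeping behind these numbers is simple arithmetic, sketched below. The helper name `n_steps` is hypothetical; the simulation lengths, time steps and the scaling factor of 4 are the values quoted in the text.

```python
def n_steps(sim_time_ns, dt_fs):
    """Number of integration steps needed for sim_time_ns with a time step of dt_fs."""
    return sim_time_ns * 1_000_000 // dt_fs  # 1 ns = 1 000 000 fs

CG_TIME_SCALING = 4  # speed-up factor proposed for MARTINI dynamics (2)

aa_steps = n_steps(100, 2)    # all-atom: 100 ns with a 2 fs time step
cg_steps = n_steps(25, 25)    # MARTINI CG: 25 ns with a 25 fs time step
effective_cg_time_ns = 25 * CG_TIME_SCALING

print(aa_steps)               # prints: 50000000
print(cg_steps)               # prints: 1000000
print(effective_cg_time_ns)   # prints: 100
print(aa_steps // cg_steps)   # prints: 50
```

The CG setup therefore needs 50 times fewer integration steps than the AA setup to cover the same effective time, in addition to each step being cheaper because there are fewer particles.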

3.5. ELNEDIN MD

In the ELNEDIN model (1) an elastic network is put on top of a modified MARTINI CG

model to maintain the tertiary structure of the protein. The MARTINI model is modified in

two ways. Firstly, the backbone beads are positioned at the Cα atoms and not at the COM.

Secondly, there is a difference in how amino acid ring structures are represented. For

both Phe and Tyr an extra bond is used to maintain the ring structure, and in the case of

His and Trp the asymmetry in their rings is considered (Fig. 7). Those differences in the

Trp and Tyr side chain ring structures, as well as their movement upon ligand binding

(for Trp and Tyr residues positioned in and near the cleft), caused the simulations

to be unstable with the 25 fs time step. Thus, the time step was decreased to 10 fs.

Figure 7 The CG representation of residues Phe, Tyr, His and Trp in the ELNEDIN model, showing structural

mapping and bond connectivity (Figure from supplementary information for (1)).

The conversion from the AA model to the ELNEDIN representation (Fig. 8A) includes

two additional parameters for setting up the structural scaffold. Those parameters are:

the cut-off distance between the backbone beads Rc [nm], which describes the range of

beads that can be connected with the additional elastic bonds (Fig. 8B), and the spring

force constant Kspring [kJ·mol-1·nm-2], which describes the stiffness of the elastic bonds. The default parameters, which seem to work best for the cases presented so far, are Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2 (1).
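How the two scaffold parameters enter can be illustrated with a minimal sketch that connects backbone beads closer than Rc with springs of stiffness Kspring. This is not the ELNEDIN topology script: the function name, the coordinate input and the `min_seq_sep` argument (skipping near neighbours along the chain, which are assumed to be covered by ordinary bonded terms) are assumptions made for illustration.

```python
import numpy as np


def build_elastic_network(bead_xyz, rc=0.9, k_spring=500.0, min_seq_sep=3):
    """Return a list of (i, j, r0, k) elastic bonds between backbone beads.

    bead_xyz    : (n, 3) backbone-bead coordinates in nm
    rc          : cut-off distance Rc in nm
    k_spring    : spring force constant Kspring in kJ mol^-1 nm^-2
    min_seq_sep : smallest sequence separation that receives a spring
                  (assumed value, hypothetical parameter)
    """
    xyz = np.asarray(bead_xyz, dtype=float)
    bonds = []
    for i in range(len(xyz)):
        for j in range(i + min_seq_sep, len(xyz)):
            r0 = float(np.linalg.norm(xyz[i] - xyz[j]))
            if r0 < rc:
                # The equilibrium length is the distance in the input structure.
                bonds.append((i, j, r0, k_spring))
    return bonds


if __name__ == "__main__":
    # Five beads on a straight line, 0.25 nm apart (toy coordinates).
    toy = [[0.25 * n, 0.0, 0.0] for n in range(5)]
    for bond in build_elastic_network(toy):
        print(bond)
```

A larger Rc adds more springs and a larger Kspring stiffens each of them, so both parameters reduce the flexibility of the scaffold.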

Figure 8 The ELNEDIN representation. A: After the conversion from the AA to the CG model (left, the backbone beads are shown in green, and side chain beads in yellow) additional restraints are put on top of the CG beads (right). The ELNEDIN scaffold built with Rc = 0.9 nm is shown as red lines, and the protein is shown as a Licorice representation of backbone beads in green (Figure created using VMD (13)). B: Three ENM scaffolds built with different cut-off distances (Rc = 0.6, 0.9 and 1.2 nm) (Figure from (1)).

The protein structures used as an input were the same as those used for the

MARTINI CG simulations. The parameters for the elastic network scaffold for Model A and Model B were varied with cut off distances Rc [nm] ∈ {0.8, 0.9, 1.0} and spring force constants Kspring [kJ·mol-1·nm-2] ∈ {50, 500, 5000}. For Model C and Model D, ELNEDIN simulations were done for Rc [nm] ∈ {0.8, 0.9} and Kspring = 500 kJ·mol-1·nm-2, as it was observed that those parameters provide reasonable overlap between the CG and atomistic models for the LBP protein. The scripts used to generate the ELNEDIN topology files were modified in the same way as the standard CG scripts, to include charges on the C- and N-termini. After conversion from AA to ELNEDIN representation, the

proteins were energy-minimized in vacuum, then solvated as in MARTINI CG, and

counter ions were added. The setups were energy-minimized and the solvent and ions were relaxed while all protein beads were restrained (1000 kJ·mol-1·nm-2) for 50 ps using a 1 fs time step, followed by a 1 ns equilibration using a 10 fs time step with restraints put only on the backbone beads of the protein. The temperature and pressure settings were as in the atomistic simulations, only with time constants τt = 0.5 ps and τp = 1.2 ps. The non-bonded interactions were treated with the same shifts and cut offs as applied in the MARTINI CG model. Models A and B were simulated for 25 ns using a 10 fs time step, whereas Models C and D were simulated for 1 μs with a 10 fs time step.

3.6. domELNEDIN MD

As domain movements in general are essential for the function of proteins, and as

this cannot be described within the ELNEDIN model, a modified version of ELNEDIN is

here proposed. In this model, named domELNEDIN, the ENM scaffold is put on each

domain separately, restraining movements inside domains, while at the same time

allowing complete freedom for inter-domain movements.

The setups for domELNEDIN simulations are the same as described for the ELNEDIN

method, with one crucial exception, that all network bonds connecting the two domains

of the LBP were left out (Fig. 9). Based on the domain predictions (described in 3.1), the

final domain definition of the LBP assigned residues 1 – 120 and 250 – 330 to domain 1,

and residues 121 – 249 and 331 – 345 to domain 2. In order to generate the elastic

springs connecting only atoms from the same domain, the script generating the


topology file for the ELNEDIN method was modified. A section that parses an additional file with the domain definitions obtained from the domain predictions has been added to the script, which allows the elastic network to be set up exclusively inside the domains.
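The effect of the domain definition on the scaffold can be sketched as a simple filter on a list of elastic bonds. The residue ranges are the ones given above; the data layout and the function names are hypothetical and do not reproduce the modified topology script.

```python
# Minimal sketch of the domELNEDIN filtering step: keep only elastic bonds
# whose two residues belong to the same domain. The residue ranges are those
# quoted in the text; everything else is an illustrative assumption.
DOMAINS = {
    1: [(1, 120), (250, 330)],
    2: [(121, 249), (331, 345)],
}


def domain_of(resid, domains=DOMAINS):
    """Return the domain number of a residue, or None if it is unassigned."""
    for dom, ranges in domains.items():
        if any(lo <= resid <= hi for lo, hi in ranges):
            return dom
    return None


def intra_domain_bonds(bonds, domains=DOMAINS):
    """Keep bonds (resid_i, resid_j, ...) with both ends in the same domain."""
    kept = []
    for bond in bonds:
        dom_i = domain_of(bond[0], domains)
        dom_j = domain_of(bond[1], domains)
        if dom_i is not None and dom_i == dom_j:
            kept.append(bond)
    return kept


if __name__ == "__main__":
    example = [(10, 50), (100, 130), (260, 300), (200, 340), (120, 121)]
    print(intra_domain_bonds(example))
```

In this toy list the bonds (100, 130) and (120, 121) span the two domains and are therefore dropped.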

Additional simulations were performed for Model D where 2, 4 and 6 residues around each of the three linkers forming the hinge were unlocked. The simulations were run for 1 μs with a 10 fs time step.
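One possible reading of "unlocking" residues around the linkers is sketched below. It assumes that the three linkers sit at the domain boundaries implied by the domain definition (after residues 120, 249 and 330) and that the unlocked residues are split evenly on both sides of each boundary. Both points are assumptions, as are the function names.

```python
# Minimal sketch: remove elastic bonds that touch "unlocked" hinge residues.
# HINGE_BOUNDARIES holds the last residue before each domain switch; this
# placement is inferred from the domain definition and is an assumption.
HINGE_BOUNDARIES = [120, 249, 330]


def unlocked_residues(n_unlock, boundaries=HINGE_BOUNDARIES):
    """Residues in a window of n_unlock residues centred on each boundary."""
    half = n_unlock // 2
    unlocked = set()
    for b in boundaries:
        unlocked.update(range(b - half + 1, b + half + 1))
    return unlocked


def drop_unlocked_bonds(bonds, unlocked):
    """Keep bonds (resid_i, resid_j, ...) that do not involve an unlocked residue."""
    return [b for b in bonds if b[0] not in unlocked and b[1] not in unlocked]


if __name__ == "__main__":
    free = unlocked_residues(2)
    print(sorted(free))
    print(drop_unlocked_bonds([(10, 50), (118, 121), (119, 200)], free))
```

Under these assumptions, unlocking 2, 4 and 6 residues per linker frees 6, 12 and 18 residues in total.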

Figure 9 The domELNEDIN representation of the 1USG (A) and 1USK (B) structures. The scaffold built with Rc = 0.9 nm is shown as red lines, and the protein is shown as a Licorice representation of backbone beads in green. The elastic springs that were present in the ELNEDIN model and are left out in the domELNEDIN model are shown in blue (Figure created using VMD (13)).

4. Results and Discussion

The work presented here is based on studying the use of the standard MARTINI CG

and the ELNEDIN model on the LBP and extending it to a “protein domain version” – the

domELNEDIN. To compare the structural and dynamical properties of the CG models

based on the MARTINI-2.1 force field, AA simulations were used as benchmarks.

The presentation and discussion of the results start with the AA simulations, where the influence of the ligand leucine on the conformational changes is investigated. Next, the CG

models are presented starting from the standard MARTINI-2.1 CG, through the ELNEDIN

and ending with the domELNEDIN model. Evaluation of all four models was based on

three physical quantities: the RMSD, the RMSD per residue, RMSD_res, and the root-

mean-square fluctuation per residue, RMSF_res.
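The three quantities can be written down as a minimal NumPy sketch using their usual definitions. It assumes one position per residue (for example the Cα atom or the backbone bead) and a trajectory that has already been superimposed on the reference structure. The array layout and function names are assumptions and not the analysis tools used for this report.

```python
import numpy as np


def rmsd_per_frame(traj, ref):
    """RMSD of every frame with respect to ref.

    traj : (n_frames, n_res, 3) fitted coordinates, ref : (n_res, 3).
    """
    d2 = np.sum((traj - ref) ** 2, axis=2)  # squared deviation per frame and residue
    return np.sqrt(d2.mean(axis=1))         # average over residues


def rmsd_per_residue(traj, ref):
    """RMSD_res: deviation of each residue from ref, averaged over frames."""
    d2 = np.sum((traj - ref) ** 2, axis=2)
    return np.sqrt(d2.mean(axis=0))          # average over frames


def rmsf_per_residue(traj):
    """RMSF_res: fluctuation of each residue around its own mean position."""
    mean_pos = traj.mean(axis=0)
    d2 = np.sum((traj - mean_pos) ** 2, axis=2)
    return np.sqrt(d2.mean(axis=0))


if __name__ == "__main__":
    ref = np.zeros((2, 3))
    traj = np.array([[[1.0, 0, 0], [0, 0, 0]],
                     [[-1.0, 0, 0], [0, 0, 0]]])
    print(rmsd_per_frame(traj, ref))
    print(rmsd_per_residue(traj, ref))
    print(rmsf_per_residue(traj))
```

RMSD_res measures how far a residue has moved away from the reference, whereas RMSF_res measures how much it moves around its own average position, so the two can differ for a residue that has shifted but is otherwise rigid.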

4.1. AA Simulations

All AA simulations were run for 100 ns for each of the four Models. To compute

RMSD_res and RMSF_res the last 80 ns of simulation were used. As shown in Fig. 10 all

models are stable during the AA simulation, which is important as the final structure at 100 ns will be used as the input structure for the CG models.

Figure 10 The RMSD plots of Model A in blue, Model B in red, Model C in green, and Model D in purple, for the AA simulations.

As expected, Model A (the apo form of the LBP, blue) shows the highest deviation in the structure. When the ligand leucine is bound to the apo form (Model C, green), the RMSD plot shows smaller deviations than for the apo form without leucine bound, at the level of around 2 Å, indicating that the creation of contacts between the protein and the ligand stabilizes the structure. Also as expected, Model B (the holo form with leucine bound, red) is more stable than the apo form due to the additional connections between the protein and the ligand, and the inter-domain interactions. However, at around 45 ns some deviations in the structure of Model B are observed. They are caused by an upward movement of the ligand in the cleft, which causes a small opening of the protein. At around 65 ns leucine moves back toward the bottom of the cleft and the protein closes, returning to the initial form. For the same starting structure of the closed form, without leucine,

[Plot: RMSD [Å] versus time [ns]; panel title: RMSD plots for Models A-D]

Model D (purple), the structure remains very stable at a level of around 1.5 Å. The closed form of the LBP, with or without leucine, seems to be a very stable conformation. As LBP

is the primary receptor for the leucine transport system in E. coli, the interaction with

another protein or some other factor not included in the simulations might be necessary

to observe the closed form opening associated with ligand release.

The deviations and flexibility of each residue are plotted as the RMSD and RMSF per

residue (Fig. 11). As the AA simulations will be used as a benchmark for the CG models,

those measurements will indicate if the CG models can reproduce patterns observed in

the AA approach.

Figure 11 The plots of RMSD_res and RMSF_res for Model C in the AA simulations.

The plots above (Fig. 11) show deviations (on the left), and fluctuations (on the right) of

the residues in Model C (as an example). The distribution of the peaks and valleys across

the sequence is very similar for all the models (data not shown). The two large peaks observed in the middle of the plot indicate movements of residues 160-169 and

178-181, which are loop regions. These residues are not placed in the hinge regions, but

on the surface of the protein in domain 2, and have no direct effect on the ligand

binding or hinge movement (14).

4.1.1. Leucine Interactions

The conformational changes in the LBP are associated with leucine binding and

release (14). The interactions between the ligand and LBP in the holo form (1USK),

involve both hydrogen-bonds and non-polar contributions (Fig. 12). The amino group of

the ligand forms hydrogen bonds with Gly100, Thr102, and Glu226, and the carboxylate

group of the ligand forms the hydrogen bonds with Ser79, Thr102, and Tyr202. The

hydrogen-bonding distances are ≤ 3.5 Å. Additionally, the Trp18, Tyr150, and Tyr276 residues are within an inter-atomic distance of ≤ 4 Å and make van der Waals contacts with the hydrophobic side chain of the ligand (Fig. 12) (14).
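The two distance criteria can be applied to coordinates with a minimal sketch like the one below. It is purely distance-based: a full hydrogen-bond analysis would also check donor and acceptor identity and geometry, which is not done here. The function name and input layout are assumptions.

```python
import numpy as np

HBOND_CUTOFF = 3.5  # Å, hydrogen-bonding distance quoted in the text
VDW_CUTOFF = 4.0    # Å, van der Waals contact distance quoted in the text


def pairs_within(xyz_ligand, xyz_protein, cutoff):
    """Return index pairs (i, j) with |ligand_i - protein_j| <= cutoff (in Å)."""
    a = np.asarray(xyz_ligand, dtype=float)
    b = np.asarray(xyz_protein, dtype=float)
    # Pairwise distance matrix via broadcasting: shape (n_ligand, n_protein).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return [(int(i), int(j)) for i, j in zip(*np.where(dist <= cutoff))]


if __name__ == "__main__":
    ligand = [[0.0, 0.0, 0.0]]
    protein = [[3.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 0.0, 0.0]]
    print(pairs_within(ligand, protein, HBOND_CUTOFF))
    print(pairs_within(ligand, protein, VDW_CUTOFF))
```

Running the same check on frames of a trajectory would show at which point individual ligand contacts are lost.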

When the interactions between leucine and the holo form of the LBP (Model B) are investigated after 100 ns of AA simulation, it appears that the hydrogen bonds between the amino group of the ligand and Gly100, Thr102, and Glu226, and the interactions between the carboxylate group of leucine and Ser79 and Thr102, are lost. The loss of contacts between the ligand and the protein might be the result of an oversimplified parameterization

[Plots: RMSD_res [Å] and RMSF_res [Å] versus residue number]

of the zwitterionic leucine for the AMBER03 force field. The partial charges were assigned in a way that decreases the positive and negative contributions of the amino and carboxylate groups, respectively (see Appendix). As this might very well influence ligand

binding and conformational changes of the LBP, the leucine ligand should be re-

parameterized and the simulations should be rerun.

Figure 12 Interactions between the ligand leucine and the holo form of LBP (1USK). Oxygen and nitrogen are shown in red and blue, respectively. The hydrogen bonds between ligand and protein are shown as dashed lines. Residues that make van der Waals contacts with the hydrophobic side chain of the ligand are shown in grey (Figure created using VMD (13)).

4.1.2. Intra-domain Changes

The main assumption of the proposed domELNEDIN method is that the structural

changes inside the domains are relatively small compared to the inter-domain

movements (as shown in 3.1). Thus, the elastic network can be put on each domain separately, restraining the intra-domain movements. To check whether this assumption is still adequate, the RMSD between the same domains in different models at 100 ns of the AA simulations was computed. The results (Tab. 4) show that the differences between domains in different models are higher than those obtained for the crystal structures of the LBP

(Tab. 2), but still much smaller than the differences between the two conformations of

the LBP protein (Tab. 1).

Table 4 The RMSD computed for the same domains in different Models.

Model A Model B Model C Model D

Model A 1.41 1.37 1.49 domain 1

Model B 1.47 1.49 1.52 domain 2

Model C 1.45 1.26 1.73

Model D 1.75 1.32 1.47

The average RMSD between domain 1 in the different models is 1.55 Å, slightly higher than the value for domain 2, which is 1.44 Å. This is expected, as most of the interactions upon ligand binding to the LBP are contributed by domain 1 (14).


4.2. CG Simulations

All CG simulations were run for 25 ns for each model, which corresponds to 100 ns of real time. The RMSD for the protein structures shows large structural deformations for all models (Fig. 13). Moreover, for each model it has an upward trend, indicating that the protein cannot find a stable conformation. The protein structures at 25 ns have RMSDs ranging from 5.5 to 7.5 Å. Visual inspection of those structures showed that the models involving the open form of the LBP (Models A and C) changed their conformation, to some extent, to a more closed form, while the models of the closed form (Models B and D) remained in a closed conformation. To check whether the closing of the open forms of the LBP is related to conformational changes, the RMSD between them and the closed crystal structure of the LBP (1USK) was computed. The results in both cases were higher than 7.1 Å, which suggests that the observed changes are associated with a collapse of the tertiary structure rather than with a shift from one conformation to another. Thus, the LBP cannot be studied with this model in its current form, as it is not able to keep the protein structure stable.
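A comparison of this kind requires an optimal superposition of the two structures before the RMSD is evaluated. A common way to do this is the Kabsch algorithm, sketched below with NumPy. Whether this exact procedure was used for the values above is not stated, so the sketch only illustrates the general technique. It assumes two coordinate sets of equal length with matching order, for example backbone beads and the corresponding Cα atoms.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q after optimal rigid superposition of P on Q.

    P, Q : (n, 3) arrays with matching point order; the result has the same
    length unit as the input (1 nm = 10 Å).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove the translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc            # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T         # rotate the centred P onto the centred Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


if __name__ == "__main__":
    P = np.array([[0.0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]])
    Rz = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])  # 90 degree rotation about z
    Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
    print(kabsch_rmsd(P, Q))  # close to zero: Q is a rotated and shifted copy of P
```

A value that stays large after superposition, as found above, means that the two structures differ in shape and not merely in position or orientation.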

Figure 13 The RMSD plot for 25 ns of CG simulations for four models. Model A in blue, Model B in red, Model C in green, Model D in purple.

The RMSD_res and RMSF_res were computed for the last 20 ns (last 80 ns in real time)

(Fig. 14). These plots also show that the MARTINI CG model is too flexible; residues in the CG

model (in blue) fluctuate much more than the residues represented in the AA model (in

red). However, even if the system is much less stable and more flexible than observed

from the AA approach, the same patterns, like the two biggest peaks corresponding to

the large fluctuations of residues 160-169 and 178-181, can still be observed (Fig. 14).

[Plot: RMSD [Å] versus time [ns]; panel title: RMSD for MARTINI CG model]

Figure 14 The plots of RMSD_res and RMSF_res for Model C, showing that the same patterns of residue deviations (left) and fluctuations (right) as in the AA model (red) can be observed with the MARTINI CG model (blue).

4.3. ELNEDIN

The simulations presented in this section were used to get familiar with the ELNEDIN

method, as it is modified to establish the domELNEDIN model. Simulations are divided

into two groups. In the first one, simulations were carried out for Model A and Model B

which are the crystal structures of the apo and holo form of the LBP, respectively. In the

second group, simulations were carried out for the modified structures of the LBP,

where Model C is the apo form with leucine bound and Model D is the holo form

without leucine bound. For the first group various structural scaffold parameters were

tested and the optimal parameters then were used for the simulations in the second

group.

4.3.1. Model A and Model B

In order to test the ELNEDIN method, nine simulations for Models A and B were set up with different parameters for the elastic network, with the cut off distance Rc [nm] ∈ {0.8, 0.9, 1.0} and the spring force constant Kspring [kJ·mol-1·nm-2] ∈ {50, 500, 5000}. Simulations were run for 25 ns and the results are shown in the RMSD plots (Fig. 15), using Model B (Run 1; see footnote 1) as an example. The RMSD plots show that the protein is very stable, and as the ENM restrains the initial protein structure, no conformational shifts are observed. In these simulations, leucine stays in the cleft between the two domains. However, as the model is now described by beads, the specificity of leucine, and thus all interactions with the protein that involve hydrogen bonding, are lost. The RMSD plots also show that the global changes of the protein decrease when both Rc and Kspring are increased. For both Model A (data not shown) and Model B the deviations in the protein structure are highest using a combination of different values of Rc and the smallest value of Kspring = 50 kJ·mol-1·nm-2, or the

Footnote 1: For Model B (the holo form of the LBP), two AA simulations were in fact carried out. For the first one, denoted as Run 1, some deviations were observed in the RMSD after 100 ns of simulation associated with leucine flipping in the binding pocket. Thus, for investigating the protein flexibility dependency on the structural scaffold parameters the structure at 50 ns of Run 1 was used, as the protein structure was already stabilized at this point. However, to be consistent with the rest of the models, the AA simulation for Model B was rerun (denoted as Run 2). All results presented in this report for Model B are based on Run 2, unless clearly noted.

[Plots: RMSD_res [Å] and RMSF_res [Å] versus residue number]

smallest value for cut off distance Rc = 0.8 nm, with different Kspring values. This

behaviour indicates that Rc and Kspring compensate each other to maintain the overall

structure of the protein, and is in agreement with the results presented in the ELNEDIN

paper (1).
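The nine scaffold settings form a 3 × 3 grid of Rc and Kspring values, which can be enumerated with a short sketch. The dictionary keys and run labels are hypothetical naming choices, not the names used for the actual simulations.

```python
from itertools import product

RC_VALUES_NM = (0.8, 0.9, 1.0)     # cut off distances Rc [nm]
K_SPRING_VALUES = (50, 500, 5000)  # spring force constants [kJ mol^-1 nm^-2]


def scaffold_grid(rc_values=RC_VALUES_NM, k_values=K_SPRING_VALUES):
    """Return one settings dictionary per (Rc, Kspring) combination."""
    return [
        {"rc_nm": rc, "k_spring": k, "label": f"rc{rc:.1f}_k{k}"}
        for rc, k in product(rc_values, k_values)
    ]


if __name__ == "__main__":
    for setting in scaffold_grid():
        print(setting["label"])
```

Scanning the full grid is what makes the compensation between Rc and Kspring visible, since each parameter is varied while the other is held fixed.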

Figure 15 RMSD plots showing effect of the Kspring and Rc values on the structure and dynamics of Model B (Run 1).

The influence of the Rc value on the flexibility of the protein is depicted in the figure below (Fig. 16). Red indicates the bonds common to both cut off values, Rc = 0.8 nm and Rc = 0.9 nm (1602 bonds). The additional bonds (622) generated for Rc = 0.9 nm are shown in blue. This illustrates the effect of increasing Rc by 0.1 nm: the protein is further constrained, which results in reduced protein flexibility.

Figure 16 Model A with a visualization of elastic bonds for Rc = 0.8 nm in red, and additional bonds for the Rc = 0.9 nm cut off in blue. A: side view; B: top view (Figure created using VMD (13)).

The protein deformation dependence on scaffold parameters is also observed when

inspecting RMSD_res (Fig. 17) and RMSF_res (Fig. 18). Both measurements were

computed for the last 20 ns of simulation (last 80 ns in real time). To evaluate the

[Plot grid: RMSD [Å] versus time [ns], 0–25 ns; columns Rc = 0.8, 0.9 and 1.0 nm; rows Kspring = 50, 500 and 5000 kJ·mol-1·nm-2]

residue fluctuations, deformations and amplitudes, the results are compared to those

obtained from the AA simulation.

Figure 17 The RMSD_res plots showing the effect of the Kspring and Rc values on the structure and dynamics of Model B, Run 1. The ELNEDIN simulations are shown in blue, AA simulations shown in red.

Figure 18 The RMSF_res plots showing the effect of the Kspring and Rc values on the structure and dynamics of Model B, Run 1. The ELNEDIN simulations are shown in blue, AA simulations shown in red.

[Plot grids: RMSD_res [Å] and RMSF_res [Å] versus residue number; columns Rc = 0.8, 0.9 and 1.0 nm; rows Kspring = 50, 500 and 5000 kJ·mol-1·nm-2]

Testing different combinations of the scaffold parameters made it possible to choose the set of parameters that gave optimal agreement with the flexibility observed in the AA simulations. As shown for Model B, only small differences are observed between the scaffolds built with the cut offs Rc = 0.8 nm and Rc = 0.9 nm, with Kspring = 500 kJ·mol-1·nm-2. The default setting for the ELNEDIN model is Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2; however, it may vary for different proteins (1). Thus, simulations for Models C and D were set up with both cut off values, Rc = 0.8 nm and Rc = 0.9 nm, and the spring force constant Kspring = 500 kJ·mol-1·nm-2.

4.3.2. Model C and Model D

Simulations for Models C and D were run for 1 μs (which corresponds to 4 μs of real time). This time range allowed the study of long time-scale behaviour for

Models C and D, where it could be expected to observe the closing of the open form and

opening of the closed form. The RMSD plots for Model C are shown below (Fig. 19).

Figure 19 The RMSD of the 1 μs simulation of Model C. The plot on the left shows the ELNEDIN model with Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2, and the plot on the right with Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2. Dashed lines 1 and 2 indicate structural changes in the system.

In the RMSD (Fig. 19) plot for Model C (the open form with leucine bound) with the

parameters Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2 some significant changes in the

RMSD are observed. After 70 ns leucine escapes from the cleft but still oscillates on the

surface of the first domain. Around 120 ns leucine escapes out of the vicinity of the

protein (Fig. 19; line 1) and after around 380 ns the protein starts to close (Fig. 19; line

2). When the structure of Model C at 1 μs of ELNEDIN simulation is superimposed on the

crystal structure of the holo form of LBP (PDB: 1USK (14)), significant differences in the

structures are observed (Fig. 20) with the RMSD between the structures equal to 7.20 Å.

In this simulation Model C folds in a manner not comparable to the known closed

conformation and the changes observed during the simulation are therefore not

associated with a true conformational shift, from the open to closed form, of the

protein.

[Plots: RMSD [Å] versus time [ns], 0–1000 ns; panels RMSD (0.8/500) and RMSD (0.9/500); line labels 1 and 2]

Figure 20 Superposition of the backbone beads of the protein in Model C and the Cα atoms of the crystal structure of the holo form of the LBP. The tertiary structure of the protein in Model C (red), for the ELNEDIN simulation with scaffold parameters Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2, differs significantly from the crystal structure of LBP (blue). A: top view; B: side view (Figure created using VMD (13)).

The RMSD plots for Model D (the closed form without leucine; Fig. 21) indicate that

putting a structural scaffold on the initial structure of a very stable protein conformation

will keep its tertiary structure stable, just as expected, and thus no conformational shifts

will be observed.

Figure 21 The RMSD of the 1 μs simulation of Model D. The plot on the left shows the ELNEDIN model with Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2, and the plot on the right with Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2.

4.4. domELNEDIN

The analysis of the domELNEDIN simulations was carried out in a similar manner to that of the ELNEDIN simulations. Thus, for Models A and B, nine simulations were set up with different parameters for the elastic network, with the cut off distance Rc [nm] ∈ {0.8, 0.9, 1.0} and the spring force constant Kspring [kJ·mol-1·nm-2] ∈ {50, 500, 5000}, and run for 25 ns. The dependence of the protein flexibility on the scaffold parameters in Models A and B shows the same trends as for the ELNEDIN model. Thus, for simulations of Models C and D only Rc = 0.8 nm and Rc = 0.9 nm were considered.

4.4.1. Model A and Model B

The RMSD plots for Model A (data not shown), and Model B with Rc = 0.8 nm (Fig. 22)

did not indicate any conformational changes of the protein. However, the RMSD of

Model B (Fig. 22), with a structural scaffold built with the cut off Rc = 0.9 nm, showed

some deviations.

[Plots: RMSD [Å] versus time [ns], 0–1000 ns; panels RMSD (0.8/500) and RMSD (0.9/500)]

Figure 22 The RMSD plots for Model B. The plot on the left shows the domELNEDIN model with Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2, and the plot on the right with Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2.

Further investigation showed that after around 4 ns (Fig. 22, right) leucine moves upward in the cleft and the protein opens slightly, remaining in this form until the end of the simulation. As this might suggest opening of the closed form, the simulation was rerun for 1 μs to check if the behaviour was reproducible. The result is shown in the RMSD plot (Fig. 23). In this simulation leucine escapes from the cleft after 38 ns and the protein starts to close slightly, and after around 250 ns it stays in this form for the rest of the simulation.

Figure 23 The RMSD plot for the 1 μs rerun of Model B, using the domELNEDIN model with Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2.

Again, this behaviour may suggest the opening of the closed form of the LBP, but the

release of leucine is obtained without the protein changing to an open conformation.

However, it should be noted that as the structure of the closed form without leucine

was observed to be very stable, in the AA simulation, the total opening of the closed

form of the LBP could involve the interaction with another protein or some other factor

not included in the simulations.

4.4.2. Model C and Model D

As in the ELNEDIN simulations, Models C and D were set up with a cut off distance

either of Rc = 0.8 nm or Rc = 0.9 nm and with force constant Kspring = 500 kJ·mol-1·nm-2.

Simulations were run for 1 μs in order to check if conformational changes were observed

within these models. Both the RMSD plots for Model C (Fig. 24) show a conformational

shift.

[Plots: RMSD [Å] versus time [ns]; panels RMSD (0.8/500) and RMSD (0.9/500) over 0–25 ns, and RMSD (0.9/500) over 0–1000 ns]

Figure 24 RMSD of Model C. The plot on the left shows the domELNEDIN model with Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2, and the plot on the right with Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2.

In both cases leucine stays in the cleft between the two domains. Moreover, the

RMSD between the structure of Model C at 1 μs of simulation, compared to the crystal

structure of the holo form of the LBP (1USK) is 5.54 Å for Rc = 0.8 nm and 3.44 Å for Rc =

0.9 nm. The RMSD values suggest that the structure in Model C changes its

conformation towards the closed form (Fig. 25) for both scaffold cut offs Rc = 0.8 nm and

Rc = 0.9 nm, and thus indicates that conformational changes can be studied with the

domELNEDIN model. The closed structure observed for the scaffold with cut off distance

Rc = 0.9 nm gives a final structure with great similarity to the known closed structure

(Fig. 25 B).

Figure 25 Superposition of the backbone beads of Model C and the Cα atoms of the crystal structure of the holo form of LBP. Changes observed in the tertiary structure of the protein in Model C (red) for domELNEDIN simulations with scaffold parameters A: Rc = 0.8 nm (top view on the left, side view on the right) and B: Rc = 0.9 nm (top view on the left, side view on the right), with Kspring = 500 kJ·mol-1·nm-2, compared to the holo structure (blue) (Figure created using VMD (13)).

For Model D, both RMSD plots (Fig. 26) show some deviations, however as this structure

is the most stable of all the models and there is no leucine bound that might initiate

structure rearrangements, only small structural changes are observed.

Figure 26 RMSD of Model D. The plot on the left shows the domELNEDIN model with Rc = 0.8 nm and Kspring = 500 kJ·mol-1·nm-2, and the plot on the right with Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2.

[Plots: RMSD [Å] versus time [ns], 0–1000 ns; two sets of panels RMSD (0.8/500) and RMSD (0.9/500)]

As mentioned before, Model D is a very stable form and even with a scaffold put only on

each domain separately (Fig. 27A), a switch from one conformation to another is not

observed. Thus, to test if unlocking more residues at the hinge region may lead to

protein opening, additional simulations were carried out, where elastic bonds for 2, 4,

and 6 residues at each of the hinge connections were left out (Fig. 27B). However, no

significant changes that would suggest conformational shifts were observed.

Figure 27 The domELNEDIN representation of Model D. The scaffold built with Rc = 0.9 nm is shown as red lines, and the protein is shown as a Licorice representation of backbone beads in green. A: The ENM is put on each domain separately. B: As in A, but with 4 residues unlocked at each connection of the hinge region (Figure created using VMD (13)).

5. Conclusions and Future Perspectives

The main goal of this report was to describe and present the steps and methods

used for establishing the domELNEDIN method for modelling of protein domain

rearrangements. Thus, different CG methods were investigated. The first approach, the MARTINI CG model, provides a relatively detailed description of the studied structure at amino acid resolution, while at the same time covering a time scale of several microseconds. However, as shown with the example of the two-domain LBP, this model is too flexible and is not capable of keeping the overall structure of the protein. This led to the test of another approach called ELNEDIN, where an elastic network was put on

the top of the MARTINI CG model of the LBP to keep its tertiary structure stable. Within

this model, no conformational shifts were observed. Thus, the domELNEDIN approach

was proposed, where the elastic network bonds between domains are left out. In this

manner, the tertiary structure of the protein is still kept stable, allowing at the same

time for inter-domain movements. The Model C example where the apo form of the LBP

changes its conformation to a closed form shows that conformational changes can be

studied within this model. However, for the two closed forms of the LBP with (Model B)

and without (Model D) leucine bound, conformational shifts are not observed. Thus, for

Model D, which is a very stable form, additional simulations were carried out, where the

elastic bonds for 2, 4, and 6 residues at the hinge connections were left out. However,

even in this case no opening of the closed form of LBP was observed. This might suggest that this very stable closed form of LBP needs an additional stimulating factor, such as a transporter, to change its conformation. The combination of Rc = 0.9 nm and Kspring = 500 kJ·mol-1·nm-2 for the structural scaffold gives the best agreement with the flexibility of the protein observed in the AA simulations. The domELNEDIN method seems to be a better alternative than the ELNEDIN model, as it shows the same overall stability while allowing domain movement. Still, more tests should be done on proteins with

different types of domain movement.

In the further method development, the way of assigning residues to the domains

and hinges, should be reconsidered. Knowing exactly which residues are involved in

domain movements, different parts of the protein could be restrained with different

strengths of the elastic network. An applied structural scaffold would then keep the

tertiary structure of the protein stable and at the same time allow the regions crucial for

the protein conformational changes (even inside the domains), to be flexible enough to

rearrange.

The application of the domELNEDIN model should also include reverse CG

simulations (39,40), which reintroduce the atomic details from the CG description. This would make it possible to test the stability of the structure at the more established AA MD level

after the conformational changes have taken place in the domELNEDIN simulation.


At this point in my Ph.D., the domELNEDIN model needs further investigation and
improvement. Tests should be made with more proteins, including proteins with more
than two domains. This, together with repeated simulations, would give a better
understanding of the strengths and weaknesses of the domELNEDIN model and suggest
how to improve it further. Membrane proteins should also be included in the testing,
approaching a state where the domELNEDIN model can be used to simulate the
dynamics and structural rearrangements associated with the conformational changes in
the catalytic cycle of P-type ATPases. I will have a great opportunity to continue this
work during my research stay in Peter Tieleman's group in Calgary, Canada, which I will
be visiting from February to July 2011.

References

(1) Periole X, Cavalli M, Marrink S, Ceruso MA. Combining an Elastic Network With a Coarse-Grained

Molecular Force Field: Structure, Dynamics, and Intermolecular Recognition. Journal of Chemical Theory

and Computation 2009 SEP;5(9):2531-2543.

(2) Marrink SJ, de Vries AH, Mark AE. Coarse grained model for semiquantitative lipid simulations. J Phys

Chem B 2004 JAN 15;108(2):750-760.

(3) Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, de Vries AH. The MARTINI force field: Coarse grained

model for biomolecular simulations. J Phys Chem B 2007 JUL 12;111(27):7812-7824.

(4) Monticelli L, Kandasamy SK, Periole X, Larson RG, Tieleman DP, Marrink S. The MARTINI coarse-grained

force field: Extension to proteins. Journal of Chemical Theory and Computation 2008 MAY;4(5):819-834.

(5) Dror RO, Jensen MO, Borhani DW, Shaw DE. Exploring atomic resolution physiology on a femtosecond

to millisecond timescale using molecular dynamics simulations. J.Gen.Physiol. 2010 JUN;135(6):555-562.

(6) Olesen C, Picard M, Winther AL, Gyrup C, Morth JP, Oxvig C, et al. The structural basis of calcium

transport by the calcium pump. Nature 2007 DEC 13;450(7172):1036-1042.

(7) Tozzini V. Coarse-grained models for proteins. Curr.Opin.Struct.Biol. 2005 APR;15(2):144-150.

(8) Tozzini V. Multiscale Modeling of Proteins. Acc.Chem.Res. 2010 FEB;43(2):220-230.

(9) Sherwood P, Brooks BR, Sansom MSP. Multiscale methods for macromolecular simulations.

Curr.Opin.Struct.Biol. 2008 OCT;18(5):630-640.

(10) Gohlke H, Thorpe MF. A natural coarse graining for simulating large biomolecular motion. Biophys.J.

2006 SEP;91(6):2115-2120.

(11) Tirion MM. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis.

Phys.Rev.Lett. 1996 AUG 26;77(9):1905-1908.

(12) Gerstein M, Lesk AM, Chothia C. Structural Mechanisms for Domain Movements in Proteins.

Biochemistry (N.Y.) 1994 JUN 7;33(22):6739-6749.

(13) Humphrey W, Dalke A, Schulten K. VMD: Visual molecular dynamics. J.Mol.Graph. 1996 FEB;14(1):33-38.

(14) Magnusson U, Salopek-Sondi B, Luck LA, Mowbray SL. X-ray structures of the leucine-binding protein

illustrate conformational changes and the basis of ligand specificity. J.Biol.Chem. 2004 MAR

5;279(10):8747-8752.

(15) Mcphalen CA, Vincent MG, Picot D, Jansonius JN, Lesk AM, Chothia C. Domain Closure in

Mitochondrial Aspartate-Aminotransferase. J.Mol.Biol. 1992 SEP 5;227(1):197-213.

(16) Mcphalen CA, Vincent MG, Jansonius JN. X-Ray Structure Refinement and Comparison of 3 Forms of

Mitochondrial Aspartate-Aminotransferase. J.Mol.Biol. 1992 MAY 20;225(2):495-517.

(17) Wiegand G, Remington S, Deisenhofer J, Huber R. Crystal-Structure Analysis and Molecular-Model of a

Complex of Citrate Synthase with Oxaloacetate and S-Acetonyl-Coenzyme A. J.Mol.Biol. 1984;174(1):205-

219.

(18) Remington S, Wiegand G, Huber R. Crystallographic Refinement and Atomic Models of 2 Different

Forms of Citrate Synthase at 2.7-Å and 1.7-Å Resolution. J.Mol.Biol. 1982;158(1):111-152.

(19) Pickford AR, Smith SP, Staunton D, Boyd J, Campbell ID. The hairpin structure of the (6)F1(1)F2(2)F2

fragment from human fibronectin enhances gelatin binding. EMBO J. 2001 APR 2;20(7):1519-1529.

(20) Tjandra N, Garrett DS, Gronenborn AM, Bax A, Clore GM. Defining long range order in NMR structure

determination from the dependence of heteronuclear relaxation times on rotational diffusion anisotropy.

Nat.Struct.Biol. 1997 JUN;4(6):443-449.

(21) Garrett DS, Seok YJ, Peterkofsky A, Gronenborn AM, Clore GM. Solution structure of the 40,000 M-r

phosphoryl transfer complex between the N-terminal domain of enzyme I and HPr. Nat.Struct.Biol. 1999

FEB;6(2):166-173.

(22) Barrientos LG, Gronenborn AM. The domain-swapped dimer of cyanovirin-N contains two sets of

oligosaccharide binding sites in solution. Biochem.Biophys.Res.Commun. 2002 NOV 8;298(4):598-602.

(23) Meng WY, Sawasdikosol S, Burakoff SJ, Eck MJ. Structure of the amino-terminal domain of Cbl

complexed to its binding site on ZAP-70 kinase. Nature 1999 MAR 4;398(6722):84-90.

(24) Ding J, Das K, Tantillo C, Zhang W, Clark AD, Jessen S, et al. Structure of HIV-1 Reverse-Transcriptase in

a Complex with the Nonnucleoside Inhibitor Alpha-Apa-R-95845 at 2.8-Angstrom Resolution. Structure

1995 APR 15;3(4):365-379.

(25) Jaeger J, Restle T, Steitz TA. The structure of HIV-1 reverse transcriptase complexed with an RNA

pseudoknot inhibitor. EMBO J. 1998 AUG 3;17(15):4535-4542.

(26) Zhou H, Xue B, Zhou Y. DDOMAIN: Dividing structures into domains using a normalized domain-

domain interaction profile. Protein Sci. 2007 MAY;16(5):947-955.

(27) Toyoshima C, Nakasako M, Nomura H, Ogawa H. Crystal structure of the calcium pump of

sarcoplasmic reticulum at 2.6 angstrom resolution. Nature 2000 JUN 8;405(6787):647-655.

(28) Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC. GROMACS: Fast, flexible,

and free. Journal of Computational Chemistry 2005 DEC;26(16):1701-1718.

(29) Leach AR. Molecular modelling: principles and applications. 2nd ed. Essex: Pearson; 2001.

(30) Duan Y, Wu C, Chowdhury S, Lee MC, Xiong GM, Zhang W, et al. A point-charge force field for

molecular mechanics simulations of proteins based on condensed-phase quantum mechanical

calculations. J.Comput.Chem. 2003 DEC;24(16):1999-2012.

(31) Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J. Interaction Models for Water in Relation
to Protein Hydration. In: Pullman B, editor. Intermolecular Forces. D. Reidel Publishing Company; 1981. p. 331-338.

(32) Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR. Molecular-Dynamics with Coupling

to an External Bath. J.Chem.Phys. 1984;81(8):3684-3690.

(33) Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A Smooth Particle Mesh Ewald

Method. J.Chem.Phys. 1995 NOV 15;103(19):8577-8593.

(34) Hinsen K. Analysis of domain motions by approximate normal mode calculations. Proteins 1998 NOV

15;33(3):417-429.

(35) Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank.

Nucleic Acids Res. 2000 JAN 1;28(1):235-242.

(36) Kabsch W, Sander C. Dictionary of Protein Secondary Structure - Pattern-Recognition of Hydrogen-

Bonded and Geometrical Features. Biopolymers 1983;22(12):2577-2637.

(37) Murzin AG, Brenner SE, Hubbard T, Chothia C. Scop - a Structural Classification of Proteins Database

for the Investigation of Sequences and Structures. J.Mol.Biol. 1995 APR 7;247(4):536-540.

(38) Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH - a hierarchic

classification of protein domain structures. Structure 1997 AUG 15;5(8):1093-1108.

(39) Thogersen L, Schiott B, Vosegaard T, Nielsen NC, Tajkhorshid E. Peptide Aggregation and Pore

Formation in a Lipid Bilayer: A Combined Coarse-Grained and All Atom Molecular Dynamics Study.

Biophys.J. 2008 NOV 1;95(9):4337-4347.

(40) Rzepiela AJ, Schafer LV, Goga N, Risselada HJ, De Vries AH, Marrink SJ.
Reconstruction of Atomistic Details from Coarse-Grained Structures. J.Comput.Chem. 2010 APR

30;31(6):1333-1343.

Appendix: SLEU Parameterization

In the AMBER03 force field, residues are named according to their position in the
sequence. For C- and N-terminal amino acids a C or N prefix, respectively, has to be
included, so the C-terminal leucine is CLEU and the N-terminal leucine is NLEU. Based on
the definitions of the C-terminal and N-terminal leucine (shown below), the single
zwitterionic leucine residue SLEU was defined.

CLEU

N     amber99_34   -0.3821
H     amber99_17    0.2681
CA    amber99_11   -0.2847
HA    amber99_19    0.1346
CB    amber99_11   -0.2469
HB1   amber99_18    0.0974
HB2   amber99_18    0.0974
CG    amber99_11    0.3706
HG    amber99_18   -0.0374
CD1   amber99_11   -0.4163
HD11  amber99_18    0.1038
HD12  amber99_18    0.1038
HD13  amber99_18    0.1038
CD2   amber99_11   -0.4163
HD21  amber99_18    0.1038
HD22  amber99_18    0.1038
HD23  amber99_18    0.1038
C     amber99_2     0.8326
OC1   amber99_45   -0.8199
OC2   amber99_45   -0.8199

NLEU

N     amber99_39    0.101
H1    amber99_17    0.2148
H2    amber99_17    0.2148
H3    amber99_17    0.2148
CA    amber99_11    0.0104
HA    amber99_28    0.1053
CB    amber99_11   -0.0244
HB1   amber99_18    0.0256
HB2   amber99_18    0.0256
CG    amber99_11    0.3421
HG    amber99_18   -0.038
CD1   amber99_11   -0.4106
HD11  amber99_18    0.098
HD12  amber99_18    0.098
HD13  amber99_18    0.098
CD2   amber99_11   -0.4104
HD21  amber99_18    0.098
HD22  amber99_18    0.098
HD23  amber99_18    0.098
C     amber99_2     0.6123
O     amber99_41   -0.5713

In the NLEU and CLEU definitions the first column gives the atom name, the
second the atom type, and the last the atomic partial charge δ. In the simple case,
δ for SLEU was calculated as the average of δ for the same atom in the two
definitions, NLEU and CLEU. However, when one atom in one definition corresponds to
several atoms in the other, e.g. H in CLEU corresponds to the three atoms H1, H2 and H3
in NLEU, δ of the single atom was first divided by the number of corresponding atoms,
and this value was then used to calculate the average δ for the SLEU representation.
The scheme for computing δ for SLEU is shown below.

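\[
\begin{aligned}
\delta_{\mathrm{SLEU}}(\mathrm{N}) &= \frac{\delta_{\mathrm{NLEU}}(\mathrm{N}) + \delta_{\mathrm{CLEU}}(\mathrm{N})}{2} \\
\delta_{\mathrm{SLEU}}(\mathrm{H1}) &= \frac{\delta_{\mathrm{NLEU}}(\mathrm{H1}) + \delta_{\mathrm{CLEU}}(\mathrm{H})/3}{2} \\
\delta_{\mathrm{SLEU}}(\mathrm{OC1}) &= \frac{\delta_{\mathrm{NLEU}}(\mathrm{O})/2 + \delta_{\mathrm{CLEU}}(\mathrm{OC1})}{2}
\end{aligned}
\]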
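
The averaging scheme can be reproduced with a few lines of Python. The sketch below is only an illustration of the arithmetic described above. The dictionary layout and the function name `sleu_charges` are assumptions introduced for the example, while the charges are copied from the NLEU and CLEU definitions.

```python
# Partial charges copied from the NLEU and CLEU definitions above.
NLEU = {
    "N": 0.1010, "H1": 0.2148, "H2": 0.2148, "H3": 0.2148,
    "CA": 0.0104, "HA": 0.1053, "CB": -0.0244, "HB1": 0.0256, "HB2": 0.0256,
    "CG": 0.3421, "HG": -0.0380,
    "CD1": -0.4106, "HD11": 0.0980, "HD12": 0.0980, "HD13": 0.0980,
    "CD2": -0.4104, "HD21": 0.0980, "HD22": 0.0980, "HD23": 0.0980,
    "C": 0.6123, "O": -0.5713,
}
CLEU = {
    "N": -0.3821, "H": 0.2681,
    "CA": -0.2847, "HA": 0.1346, "CB": -0.2469, "HB1": 0.0974, "HB2": 0.0974,
    "CG": 0.3706, "HG": -0.0374,
    "CD1": -0.4163, "HD11": 0.1038, "HD12": 0.1038, "HD13": 0.1038,
    "CD2": -0.4163, "HD21": 0.1038, "HD22": 0.1038, "HD23": 0.1038,
    "C": 0.8326, "OC1": -0.8199, "OC2": -0.8199,
}

AMINE_H = ("H1", "H2", "H3")      # three NLEU atoms matching the single CLEU H
CARBOXYL_O = ("OC1", "OC2")       # two CLEU atoms matching the single NLEU O


def sleu_charges(nleu, cleu):
    """Average NLEU and CLEU charges atom by atom to obtain SLEU charges."""
    sleu_atoms = [a for a in nleu if a != "O"] + list(CARBOXYL_O)
    charges = {}
    for atom in sleu_atoms:
        if atom in AMINE_H:
            n_part, c_part = nleu[atom], cleu["H"] / len(AMINE_H)
        elif atom in CARBOXYL_O:
            n_part, c_part = nleu["O"] / len(CARBOXYL_O), cleu[atom]
        else:
            n_part, c_part = nleu[atom], cleu[atom]
        charges[atom] = (n_part + c_part) / 2
    return charges


sleu = sleu_charges(NLEU, CLEU)
for atom, q in sleu.items():
    print(f"{atom:5s} {q: .6f}")
print(f"net charge: {sum(sleu.values()): .6f}")
```

For N the plain average is -0.14055, whereas the SLEU entry listed further down gives -0.14054. The 0.00001 difference is consistent with compensating for the rounding of the three amine hydrogen charges so that the net charge stays zero, although that reading is an assumption.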

The calculated partial charges and atom types were assigned to each numbered atom of
SLEU. In addition, bonds, dihedrals and impropers were defined in accordance with the
AMBER03 nomenclature. The resulting definition of SLEU is shown below (Fig. 27).

[ SLEU ]

[ atoms ]

N amber99_39 -0.14054 1

H1 amber99_17 0.15208 2

H2 amber99_17 0.15208 3

H3 amber99_17 0.15208 4

CA amber99_11 -0.13715 5

HA amber99_28 0.11995 6

CB amber99_11 -0.13565 7

HB1 amber99_18 0.0615 8

HB2 amber99_18 0.0615 9

CG amber99_11 0.35635 10

HG amber99_18 -0.0377 11

CD1 amber99_11 -0.41345 12

HD11 amber99_18 0.1009 13

HD12 amber99_18 0.1009 14

HD13 amber99_18 0.1009 15

CD2 amber99_11 -0.41335 16

HD21 amber99_18 0.1009 17

HD22 amber99_18 0.1009 18

HD23 amber99_18 0.1009 19

C amber99_2 0.72245 20

OC1 amber99_45 -0.552775 21

OC2 amber99_45 -0.552775 22

[ bonds ]

N H1

N H2

N H3

N CA

CA HA

CA CB

CA C

CB HB1

CB HB2

CB CG

CG HG

CG CD1

CG CD2

CD1 HD11

CD1 HD12

CD1 HD13

CD2 HD21

CD2 HD22

CD2 HD23

C OC1

C OC2

[ dihedrals ]

H1 N CA CB backbone_prop_3

H1 N CA C backbone_prop_4

[ impropers ]

CA OC1 C OC2

Figure 28 L-Leucine in zwitterionic form.
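
A zwitterion carries no net charge, so a simple consistency check on the SLEU definition is to sum the partial charges of its [ atoms ] section. The following sketch does this on a copy of the entries listed above. The parser `parse_atoms` is a hypothetical helper written for this example and is not part of GROMACS.

```python
# Copy of the [ atoms ] section of the SLEU definition:
# atom name, atom type, partial charge, atom number.
SLEU_ATOMS = """\
N     amber99_39  -0.14054    1
H1    amber99_17   0.15208    2
H2    amber99_17   0.15208    3
H3    amber99_17   0.15208    4
CA    amber99_11  -0.13715    5
HA    amber99_28   0.11995    6
CB    amber99_11  -0.13565    7
HB1   amber99_18   0.0615     8
HB2   amber99_18   0.0615     9
CG    amber99_11   0.35635   10
HG    amber99_18  -0.0377    11
CD1   amber99_11  -0.41345   12
HD11  amber99_18   0.1009    13
HD12  amber99_18   0.1009    14
HD13  amber99_18   0.1009    15
CD2   amber99_11  -0.41335   16
HD21  amber99_18   0.1009    17
HD22  amber99_18   0.1009    18
HD23  amber99_18   0.1009    19
C     amber99_2    0.72245   20
OC1   amber99_45  -0.552775  21
OC2   amber99_45  -0.552775  22
"""


def parse_atoms(text):
    """Parse 'name type charge number' lines into a list of tuples."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, atom_type, charge, number = fields
        atoms.append((name, atom_type, float(charge), int(number)))
    return atoms


atoms = parse_atoms(SLEU_ATOMS)
net_charge = sum(charge for _, _, charge, _ in atoms)
print(f"{len(atoms)} atoms, |net charge| = {abs(net_charge):.6f}")
```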