Molecular Dynamics


Page 1: Molecular Dynamics

Molecular Dynamics · What can we calculate? · Building up Model Systems · Parallel Molecular Dynamics

Introduction to Molecular Dynamics

Mario G. Del Pópolo

Instituto de Ciencias Básicas, Universidad Nacional de Cuyo

[email protected] [email protected]

Mendoza, 23rd of July 2013

Mario G. Del Pópolo Introduction to Molecular Dynamics 1 / 40

Page 2: Molecular Dynamics

Outline

1 Molecular Dynamics: Molecular Dynamics · Integrators and MD Codes · General Considerations
2 What can we calculate?: Microscopic structure · Thermodynamic properties · Dynamical properties
3 Building up Model Systems: Periodic boundary conditions · Interatomic Potentials · Neighbour Lists and Linked Cells
4 Parallel Molecular Dynamics

Page 3: Molecular Dynamics

Outline

1 Molecular Dynamics: Molecular Dynamics · Integrators and MD Codes · General Considerations
2 What can we calculate?: Microscopic structure · Thermodynamic properties · Dynamical properties
3 Building up Model Systems: Periodic boundary conditions · Interatomic Potentials · Neighbour Lists and Linked Cells
4 Parallel Molecular Dynamics

Page 4: Molecular Dynamics

What is Molecular Dynamics?

- Molecular Dynamics (MD) is the numerical solution of the classical equations of motion of atoms and molecules, used to obtain the time evolution of the system
- When dealing with systems of many interacting particles, analytical solutions of the equations of motion are not possible, so we must resort to computers and numerical methods
- MD is primarily performed in the micro-canonical ensemble (NVE) → the total energy is a conserved quantity
- Other ensembles can be sampled (thermostats and barostats)
- The result of an MD simulation is a series of causally connected snapshots (atomic positions, velocities and forces) → a trajectory
- The trajectory is always of finite length
- MD allows the calculation of dynamical, structural and thermodynamic properties.

Page 5: Molecular Dynamics


Basic Molecular Dynamics - 2D fluid

http://www.ph.biu.ac.il/~rapaport/java-apps/mddisk.html


Page 6: Molecular Dynamics


Sparkling Application of MD in Biosciences

Largest protein-folding simulation ever

- Atomic-Level Characterization of the Structural Dynamics of Proteins. David E. Shaw et al., Science 330, 341, 2010
- Several small proteins investigated: villin, FiP35 and BPTI. System sizes ∼15,000 atoms.
- 1 ms of simulation; several folding events observed.
- Custom-built supercomputer, Anton: based on application-specific integrated circuits (ASICs) interconnected by a specialised high-speed three-dimensional torus network.
- The performance of a 512-node Anton machine is over 17,000 nanoseconds of simulated time per day for a protein-water system consisting of 23,558 atoms.


Page 7: Molecular Dynamics

Equations of motion

Hamiltonian: H(r^N, p^N) = K_N(p^N) + V_N(r^N)

with

K_N(p^N) = ∑_{i=1}^{N} |p_i|² / (2m)   and   V_N(r^N) = ∑_{i=1}^{N} ∑_{j>i} u(|r_ij|)

Equations of motion:

dr_i/dt = ∂H/∂p_i = p_i / m = v_i   (1)

dp_i/dt = −∂H/∂r_i = −∂V_N/∂r_i = f_i  →  dv_i/dt = f_i / m   (2)

We need to find a numerical scheme to solve these equations.

Page 8: Molecular Dynamics

The Verlet algorithm (1967)

Forward Taylor expansion of the atomic positions:

r_i(t + τ) = r_i(t) + v_i(t) τ + (1/2) (f_i(t)/m_i) τ² + ···   (3)

Backward Taylor expansion of the atomic positions:

r_i(t − τ) = r_i(t) − v_i(t) τ + (1/2) (f_i(t)/m_i) τ² − ···   (4)

Adding these equations (truncated at 2nd order) gives:

r_i(t + τ) = 2 r_i(t) − r_i(t − τ) + (f_i(t)/m_i) τ²

where f_i = −∂V/∂r_i is the force on particle i and m_i is its mass.

Velocities are needed to calculate the kinetic energy (to check energy conservation and to calculate the temperature):

v_i(t) = [r_i(t + τ) − r_i(t − τ)] / (2τ)
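The position-update rule above can be sketched in a few lines of Python. This is a minimal illustration for a single particle in one dimension, with a harmonic force f(x) = −x standing in for the interatomic forces (so the exact trajectory is x(t) = cos t, which the scheme should track closely):

```python
import math

def verlet_trajectory(x0, v0, force, mass, tau, nsteps):
    """Stoermer-Verlet: x(t + tau) = 2 x(t) - x(t - tau) + f(t)/m * tau^2."""
    # Bootstrap the missing x(t - tau) with a backward Taylor step, Eq. (4).
    x_prev = x0 - v0 * tau + 0.5 * force(x0) / mass * tau ** 2
    x, xs = x0, [x0]
    for _ in range(nsteps):
        x_next = 2.0 * x - x_prev + force(x) / mass * tau ** 2
        x_prev, x = x, x_next
        xs.append(x)
    return xs

# Harmonic oscillator in reduced units (m = k = 1); exact solution cos(t)
tau, nsteps = 0.01, 1000
xs = verlet_trajectory(1.0, 0.0, force=lambda x: -x, mass=1.0, tau=tau, nsteps=nsteps)
```

After t = nsteps·τ = 10 the numerical position agrees with cos(10) to better than 10⁻³; velocities, if needed, follow from the centred difference above.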

Page 9: Molecular Dynamics

Velocity Verlet algorithm

In practice, a more accurate scheme is the Velocity Verlet algorithm.

Stage 1: Update positions

r_i(t + τ) = r_i(t) + v_i(t) τ + (f_i(t)/(2m_i)) τ²

Stage 2: Calculate the forces for the new configuration r_i(t + τ)

Stage 3: Update velocities

v_i(t + τ) = v_i(t) + (τ/(2m_i)) [f_i(t) + f_i(t + τ)]
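The three stages translate directly into code. A minimal single-particle sketch (again with a harmonic test force, not a real interatomic potential) also illustrates the stability requirement discussed next: the total energy oscillates but does not drift.

```python
def velocity_verlet(x, v, force, mass, tau, nsteps):
    """One particle, 1D. Stages 1-3 of the Velocity Verlet algorithm."""
    f = force(x)
    for _ in range(nsteps):
        x = x + v * tau + f / (2.0 * mass) * tau ** 2    # Stage 1: positions
        f_new = force(x)                                 # Stage 2: new forces
        v = v + tau / (2.0 * mass) * (f + f_new)         # Stage 3: velocities
        f = f_new
    return x, v

# Harmonic oscillator (m = k = 1): the exact total energy is E = (x^2 + v^2)/2 = 0.5
x, v = velocity_verlet(1.0, 0.0, force=lambda x: -x, mass=1.0, tau=0.01, nsteps=10_000)
energy = 0.5 * (x * x + v * v)
```

Over 10⁴ steps the energy error here stays below ~10⁻⁴; a non-symplectic scheme such as forward Euler would instead drift steadily.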

Page 10: Molecular Dynamics

Integrators: essential requirements

Integrators must satisfy the following conditions:

- Computational speed: the bottleneck is the calculation of f_i
- Low memory demand
- Accuracy
- Stability (energy conservation, no drift)
- Time reversibility
- Symplecticity: each step of the simulation conserves the volume of phase space

It is possible to add thermostats and barostats (extended Hamiltonians) in order to sample the NVT and NPT ensembles.

Page 11: Molecular Dynamics

General considerations on MD

- Set the initial configuration (positions): avoid overlaps between atoms
- Set the initial velocities: draw them from a Maxwell-Boltzmann distribution at the specified temperature, and remove the centre-of-mass motion
- Choose an appropriate integration time-step δt = τ
- Monitor conservation of the total momentum P = ∑_i m_i v_i and the total energy H = K + V
- The instantaneous temperature is related to the kinetic energy through: T(t) = 2K(t) / (3N k_B)
- To avoid temperature drift, velocities can be scaled at the beginning of the simulation:

K(t) = (1/2) ∑_i m_i |s v_i|²   with   s = [T_set(t) / T(t)]^{1/2}

where T_set(t) is the target temperature for the simulation.
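The velocity-scaling step can be sketched as follows (a minimal sketch in reduced units with k_B = 1; the Maxwell-Boltzmann draw is simply Gaussian velocity components):

```python
import random

KB = 1.0  # Boltzmann constant in reduced units

def instantaneous_temperature(masses, velocities):
    """T(t) = 2 K(t) / (3 N kB), with K(t) the total kinetic energy."""
    kinetic = 0.5 * sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * KB)

def rescale_velocities(masses, velocities, t_set):
    """Multiply every velocity by s = [T_set / T(t)]^(1/2) so that T(t) -> T_set."""
    s = (t_set / instantaneous_temperature(masses, velocities)) ** 0.5
    return [[s * c for c in v] for v in velocities]

random.seed(42)
masses = [1.0] * 100
vels = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in masses]
vels = rescale_velocities(masses, vels, t_set=2.5)
```

One rescaling brings the kinetic temperature to T_set exactly (up to round-off), which is why repeated rescaling works as a crude thermostat during equilibration.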

Page 12: Molecular Dynamics

Standard Molecular Dynamics Flow-Chart

Implementation of barostats requires calculating the stress tensor.

Flow-chart annotations:
→ Build neighbour lists and linked-cell arrays
→ Update neighbour lists and linked-cell arrays

Page 13: Molecular Dynamics

Outline

1 Molecular Dynamics: Molecular Dynamics · Integrators and MD Codes · General Considerations
2 What can we calculate?: Microscopic structure · Thermodynamic properties · Dynamical properties
3 Building up Model Systems: Periodic boundary conditions · Interatomic Potentials · Neighbour Lists and Linked Cells
4 Parallel Molecular Dynamics

Page 14: Molecular Dynamics

The radial distribution function

Chandler, chap. 7

ρ g(r) = ⟨ (1/N) ∑_{i=1}^{N} ∑_{j≠i} δ(r − r_j + r_i) ⟩

- measured by X-ray or neutron scattering
- coordination number n_c:

n_c(r) = 4πρ ∫_0^r g(r′) r′² dr′   (5)

Note: for a liquid,

- g(r) → 1 as r → ∞: absence of long-range order
- g(r) → 0 as r → 0: repulsive forces at short distances

Page 15: Molecular Dynamics

Calculating g(r)

histo = 0
for i = 1 to N−1 do
  for j = i+1 to N do
    rx = x(j) − x(i)   (idem ry, rz)
    apply the minimum image convention
    rsq = rx·rx + ry·ry + rz·rz
    r = sqrt(rsq)
    index = int(r/δ) + 1
    histo(index) = histo(index) + 2
  end for
end for

Normalization (nideal is the number of ideal-gas particles in the spherical shell between rlow and rup, of width δ):
for j = 1 to Mbins do
  rlow = (j − 1)·δ
  rup = rlow + δ
  nideal = (rup³ − rlow³)·4πρ/3
  histo(j) = histo(j)/N/nconfs/nideal
end for
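The pseudocode maps almost line for line onto numpy. The sketch below assumes a cubic box of side `box` with periodic boundaries and a single configuration (nconfs = 1); for an ideal gas of non-interacting uniform points it should return g(r) ≈ 1 at all distances, which makes a convenient sanity check:

```python
import numpy as np

def radial_distribution(positions, box, delta, nbins):
    """g(r) from one configuration in a cubic periodic box of side `box`."""
    n = len(positions)
    rho = n / box ** 3
    histo = np.zeros(nbins)
    for i in range(n - 1):
        rij = positions[i + 1:] - positions[i]
        rij -= box * np.rint(rij / box)            # minimum image convention
        r = np.sqrt((rij * rij).sum(axis=1))
        idx = (r / delta).astype(int)
        inside = idx < nbins
        np.add.at(histo, idx[inside], 2.0)         # +2: once for i, once for j
    edges = delta * np.arange(nbins + 1)
    nideal = 4.0 * np.pi / 3.0 * rho * (edges[1:] ** 3 - edges[:-1] ** 3)
    return histo / (n * nideal)

rng = np.random.default_rng(0)
box = 10.0
pos = rng.uniform(0.0, box, size=(1000, 3))        # ideal gas: uniform points
g = radial_distribution(pos, box, delta=0.25, nbins=20)   # r up to box/2
```

Binning only up to box/2 keeps every pair distance compatible with the minimum image convention.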

Page 16: Molecular Dynamics

Connection with thermodynamics

Relations valid for uniform fluids with a potential energy given by a sum of pair interactions, V_N(r^N) = ∑_{i=1}^{N} ∑_{j>i} v(r_ij):

Energy equation:

U_ex/N = 2πρ ∫_0^∞ v(r) g(r) r² dr   (6)

Pressure equation:

βP/ρ = 1 − (2πβρ/3) ∫_0^∞ v′(r) g(r) r³ dr   (7)

Reversible work theorem

At constant (N, V, T) the reversible work (Helmholtz free energy) necessary to bring two tagged particles from infinite separation to a relative distance r is given by:

w(r) = −k_B T ln g(r)   (8)
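As a minimal illustration of the reversible work theorem, Eq. (8) turns any sampled g(r) into a potential of mean force in one line (sketch in reduced units with k_B = 1; bins where g(r) = 0 have no defined w(r)):

```python
import math

KB = 1.0  # Boltzmann constant in reduced units

def potential_of_mean_force(g_values, temperature):
    """w(r) = -kB T ln g(r); returns None where g(r) vanishes."""
    return [None if g <= 0.0 else -KB * temperature * math.log(g)
            for g in g_values]

# A peak g = 2 gives w = -kT ln 2 (effective attraction); g = 1/2 gives +kT ln 2.
w = potential_of_mean_force([0.0, 0.5, 1.0, 2.0], temperature=1.0)
```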

Page 17: Molecular Dynamics

Thermodynamics

- Phase diagrams (big challenge)
- Free energy differences and potentials of mean force


Figure: Potential of mean force, calculated by molecular dynamics, for a sodium ion moving across a cyclic peptide nanotube in water. Hwang et al., J. Phys. Chem. B, 2006, 110, 26448.

Page 18: Molecular Dynamics

Time auto-correlation functions

Time average:

C_AA(t) = ⟨A(t)A(0)⟩ = lim_{τ→∞} (1/τ) ∫_0^τ A(t + t′) A(t′) dt′   (9)

For a system in equilibrium f_0(r^N, p^N) is independent of time, and correlation functions depend only on the time lag t = t′ − t″.

Short- and long-time limits:

lim_{t→0} ⟨A(t)A(0)⟩ = ⟨A²⟩    lim_{t→∞} ⟨A(t)A(0)⟩ = ⟨A⟩²   (10)

It is possible to show that the power absorbed by the system under a monochromatic perturbation of frequency ω is given by:

P_abs(ω) = (βω² |f_ω|² / 4) ∫_0^∞ dt ⟨δA(0) δA(t)⟩ cos(ωt)   (11)
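In a simulation the ensemble average in Eq. (9) is estimated by averaging over time origins along a discrete trajectory. A direct (non-FFT) sketch, checked on a signal whose autocorrelation is known analytically:

```python
import math

def autocorrelation(samples, max_lag):
    """C(lag) = <A(t0 + lag) A(t0)>, averaged over all available origins t0."""
    n = len(samples)
    c = []
    for lag in range(max_lag + 1):
        acc = sum(samples[t0] * samples[t0 + lag] for t0 in range(n - lag))
        c.append(acc / (n - lag))
    return c

# For A(t) = cos(omega t), C(t) converges to (1/2) cos(omega t).
dt, omega = 0.1, 1.0
a = [math.cos(omega * k * dt) for k in range(10_000)]
c = autocorrelation(a, max_lag=50)
```

The direct double loop costs O(N·t_max); production codes typically switch to an FFT-based estimator when many lags are needed.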

Page 19: Molecular Dynamics

The velocity auto-correlation function

C_{VxVx}(t) = (1/3) ⟨V(t)·V(0)⟩

C_{VxVx}(0) = ⟨V_x²⟩ = k_B T / m

lim_{t→∞} C_{VxVx}(t) = ⟨V_x⟩² = 0

- C_{VxVx}(t) measures the projection of the particle velocity at time t onto its initial value
- Fast initial decay followed by back-scattering

Source: Allen and Tildesley

Page 20: Molecular Dynamics

Molecular DynamicsWhat can we calculate ?

Building up Model SystemsParallel Molecular Dynamics

Microscopic structureThermodynamic propertiesDynamical properties

The mean-square displacement

R^2(t) = \langle |\mathbf{r}(t) - \mathbf{r}(0)|^2 \rangle

Short- and long-time limits:

R^2(t) \propto t^2 \quad (t\to 0) \qquad R^2(t) \propto t \quad (t\to\infty)

Relation with C_{V_xV_x}(t):

R^2(t) = 6t \int_0^{t} (1 - s/t)\, C_{V_xV_x}(s)\, ds

Einstein relation for the diffusion coefficient D:

D = \lim_{t\to\infty} \frac{1}{6t}\,\langle |\mathbf{r}(t) - \mathbf{r}(0)|^2 \rangle

[Figure: a) mean-square displacement Δr²(t) / Å² vs t / ps for cation and anion, with a log–log inset showing limiting slopes 2.0 and 1.0; b) β(t) vs t / ps]

Del Pópolo et al., J. Phys. Chem. B 108, 1744

Page 21: Molecular Dynamics

Outline

1 Molecular Dynamics
   Molecular Dynamics
   Integrators and MD Codes
   General Considerations

2 What can we calculate ?
   Microscopic structure
   Thermodynamic properties
   Dynamical properties

3 Building up Model Systems
   Periodic boundary conditions
   Interatomic Potentials
   Neighbour Lists and Linked Cells

4 Parallel Molecular Dynamics

Page 22: Molecular Dynamics

Boundary conditions

Periodic boundary conditions:

Avoid surface effects
Periodicity introduces correlations

System size N:

Finite-size effects depend on the correlation length and the range of interactions
Dependence on cell symmetry and shape

Configurational energy:

The Ewald method (truly periodic b.c.)
Truncation of the intermolecular forces: minimum image & cut-off

Minimum image along x, for a cubic box of side L:

r_{ji}^{x} = r_{j}^{x} - r_{i}^{x}

r_{ji}^{x} = r_{ji}^{x} - L \times \mathrm{nint}\left(r_{ji}^{x}/L\right)

Page 23: Molecular Dynamics

The Magic Ingredient: Models for Molecular Systems

Simulation of Molecular Systems: Force-Fields

V = \sum_{b} \frac{k_{r,b}}{2}\,(r_b - r_{0,b})^2 + \sum_{a} \frac{k_{\theta,a}}{2}\,(\theta_a - \theta_{0,a})^2

\;+\; \sum_{d} \sum_{m=1}^{n} \frac{V_{m,d}}{2}\left[1 + (-1)^{m+1}\cos(m\varphi_d)\right]

\;+\; \sum_{i} \sum_{j} \left\{ 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{1}{4\pi\varepsilon_0}\,\frac{q_i q_j}{r_{ij}} \right\}

Page 24: Molecular Dynamics

Intermolecular Short-Range Forces

Lennard-Jones potential

v_{12}(r_{ij}) = 4\varepsilon\left[\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right]

Used in conjunction with a cut-off and a switching function

In this example σ = 0.34 nm and ε/k_B = 119.8 K (argon parameters)

Embedded Atom Method:

v_{i,\alpha} = F_\alpha\left(\sum_{j\ne i} \rho_\beta(r_{ij})\right) + \frac{1}{2}\sum_{j\ne i} \phi_{\alpha\beta}(r_{ij})

Page 25: Molecular Dynamics

Long-Range Forces. Coulomb’s Potential

Coulombic energy:

E = \frac{1}{4\pi\varepsilon_0} \sum_{ij} \frac{q_i q_j}{|\mathbf{r}_i - \mathbf{r}_j|}

Page 26: Molecular Dynamics

Long-Range Forces. Coulomb’s Potential

E = E_{real} + E_{neut} + E_{reci} + E_{self}

E_{real} = \frac{1}{2} \sum_{\mathbf{L}} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j\ne i \text{ when } \mathbf{L}=0}}^{N} \frac{q_i q_j\, \mathrm{erfc}(\alpha |\mathbf{r}_i - (\mathbf{r}_j + \mathbf{L})|)}{|\mathbf{r}_i - (\mathbf{r}_j + \mathbf{L})|}   (short ranged)

E_{neut} = \frac{\pi q_{tot}^2}{2V\alpha^2}   (constant)

E_{self} = -\frac{\alpha}{\sqrt{\pi}} \sum_{i=1}^{N} q_i^2   (constant)

E_{reci} = \frac{2\pi}{V} \sum_{\mathbf{k}\ne 0} \frac{1}{k^2} \exp\left(-\frac{k^2}{4\alpha^2}\right) \left[\, \left|\sum_{i=1}^{N} q_i \cos(\mathbf{k}\cdot\mathbf{r}_i)\right|^2 + \left|\sum_{i=1}^{N} q_i \sin(\mathbf{k}\cdot\mathbf{r}_i)\right|^2 \right]

Sum over reciprocal-space vectors

Page 27: Molecular Dynamics

Standard Molecular Dynamics Flow-Chart

Implementation of barostats requires the calculation of the stress tensor

→ Build neighbour lists and linked-cell arrays
→ Update neighbour lists and linked-cell arrays

Page 28: Molecular Dynamics

Neighbour Lists

Built at the beginning of the simulation and updated with a certain frequency
Computational cost of the force calculation: O[N(N* − 1)], with N* the average number of neighbours
Requires two arrays, LIST and POINT

POINT points to positions in the array LIST: POINT(I+1) points to the first neighbour of molecule I+1, and POINT(I+1)−1 points to the last neighbour of molecule I.

Building the list is O(N^2)

Page 29: Molecular Dynamics

Linked Cells

Convenient for large systems

A cubic cell is divided into M × M × M cells, with side l = L/M > r_cut

A list of molecules within each cell is kept updated during the simulation

Forces are calculated by looping over all cells and checking neighbouring cells

A linked list is used to sweep through the particles in each cell

Page 30: Molecular Dynamics

Outline

1 Molecular Dynamics
   Molecular Dynamics
   Integrators and MD Codes
   General Considerations

2 What can we calculate ?
   Microscopic structure
   Thermodynamic properties
   Dynamical properties

3 Building up Model Systems
   Periodic boundary conditions
   Interatomic Potentials
   Neighbour Lists and Linked Cells

4 Parallel Molecular Dynamics

Page 31: Molecular Dynamics

Parallelisation Strategies

Bottleneck: the force calculation.

Short-range forces
Ewald sums
Intramolecular components

Two models of parallelism:

Particle Decomposition (PD)

Particles are assigned to different Processor Elements (PEs) at the beginning of the simulation
Global communication of data
Assignment needs to be revised as the simulation progresses (load balancing, LB)

Domain Decomposition (DD)

Geometrical domains are assigned to different PEs
Each processor communicates only with its neighbours
Revise the assignment during the run (LB)

Page 32: Molecular Dynamics

Popular MD Codes

NAMD. http://www.ks.uiuc.edu/Research/namd/

GROMACS. http://www.gromacs.org/

LAMMPS. http://lammps.sandia.gov/

AMBER. http://ambermd.org/

DLPOLY. http://www.stfc.ac.uk/cse/25526.aspx

ESPRESSO. http://espressomd.org/

Page 33: Molecular Dynamics

Molecular Dynamics in Parallel Computers

MD performance comes from a fast processor clock cycle and little, fast communication between processors (PEs)

Two platforms are considered here:

Each PE has its own private memory
Message-passing methods (MPI, for example); involves communication over a network
Communication overheads: i) the time to initiate a message transfer (latency) and ii) the transfer time (proportional to message length)

All PEs share a common memory
Message-passing or threading methods (OpenMP)
Threading: replica copies accessing a common region of memory

Page 34: Molecular Dynamics

Domain Decomposition I

At the beginning, particles are distributed among the PEs

Particles are free to move from one domain (PE) to another. Need to update the PE info during the run

If L_cell > R_cut one has to communicate information only between neighbouring processors (26 PEs in 3D)

Build a list of the δN particles which can interact with particles in other PEs

Perform (6) send/receive operations to communicate δN to the PEs along the left/right, up/down, in/out directions

Force calculation:

For non-δN particles: use the local neighbour list or local linked cells + the action/counteraction principle
For δN particles: compute forces twice and avoid the action/counteraction principle. No communication of forces between PEs

17.3 Techniques for parallel processing, 449

Fig. 17.1. The portion of the simulation region (for a two-dimensional subdivision) represented by the square outline is handled by a single processor; it contains shaded areas denoting subregions whose atoms interact with atoms in adjacent processors, and is surrounded by shaded areas denoting subregions from adjacent processors whose interactions must be taken into account; arrows indicate the flow of data between processors.

communication are required to handle interactions between atoms assigned to different processors. For long-range interactions this may not be a problem, but for the short-range case the third choice turns out to be far more efficient.

The third scheme subdivides space and assigns each processor a particular subregion [rap91b]. All the atoms that are in a given subregion at some moment in time reside in the processor responsible, and when an atom moves between subregions all the associated variables are explicitly transferred from one processor to another. Thus there is economy insofar as memory is concerned, and also in the communication required to allow atoms to transfer between processors, since comparatively few atoms make such a move during a single timestep. More importantly, assuming there are some 10^4 or more atoms per subregion (in three dimensions) and a relatively short-ranged potential, most of the interactions will occur among atoms in the subregion and relatively few between atoms in adjacent subregions – see Figure 17.1. In order to accommodate the latter, copies of the coordinates of atoms close to any subregion boundary are transferred to the processor handling the adjacent subregion prior to the interaction computation. This transfer also involves only a small fraction of the atoms.

It is this third scheme that will be described here. The only requirement is that communication be reasonably efficient, with the associated system overheads –

Page 35: Molecular Dynamics

Domain Decomposition II

Domain Decomposition Scaling with Number of Particles

Ewald sums:

The real-space contribution is parallelized following the previous scheme
The reciprocal-space contribution (in PME) is performed by a parallel 3D FFT
Some codes (GROMACS 4) reserve a number of CPUs for the parallel 3D FFT

Page 36: Molecular Dynamics

Particle Decomposition I

Atoms or molecules are assigned to a particular processor during the simulation

Based on the construction of neighbour lists

Can be done with message-passing or multithreading approaches

When message-passing is used, large amounts of communication are needed to handle interactions. This is a problem for short-range potentials.

The following examples: i) parallelization of the integration routines (threads act over different sets of particles); ii) parallelization of the force computation routine

Page 37: Molecular Dynamics

Multithreading Particle Decomposition. Example I

when processors do not share common memory. The overall efficiency depends on the processor architecture and the nature of the problem, and it is quite possible that performance will not scale as efficiently with the number of processors as the message-passing approach†.

Use of computational threads

The example used here! is, once again, a simple soft-sphere MD simulation. We will show how threads can be introduced into two key parts of the program. The first is the leapfrog integrator, included on account of its simplicity rather than because of its heavy computational requirements. The second involves the neighbor-list construction and force evaluation procedures, which is where most of the computation time is spent and which, therefore, stand to benefit most from thread usage. Several macro definitions are used to conceal programming details.

The leapfrog integration function is changed so that now it calls another function LeapfrogStepT. There will be one such call for each thread and it is this new function that does the real work.

void LeapfrogStep (int part)
{
  int ip;

  THREAD_PROC_LOOP (LeapfrogStepT, part);
}

void *LeapfrogStepT (void *tr)
{
  int ip, n;

  QUERY_THREAD ();
  switch (QUERY_STAGE) {
    case 1:
      THREAD_SPLIT_LOOP (n, nMol) {
        VVSAdd (mol[n].rv, 0.5 * deltaT, mol[n].ra);
        VVSAdd (mol[n].r, deltaT, mol[n].rv);
      }
      break;
    case 2:
      THREAD_SPLIT_LOOP (n, nMol)
        VVSAdd (mol[n].rv, 0.5 * deltaT, mol[n].ra);
      break;
  }
  return (NULL);
}

† Parallel compilers can, in principle, produce the kind of code described here, although it is often simpler and more efficient (and sometimes essential) to introduce the changes into the MD code manually.

! pr_17_2

Page 38: Molecular Dynamics

Multithreading Particle Decomposition. Example I


The above functions need explanation. When LeapfrogStep is called, it spawns several threads that can execute concurrently on separate processors. It does this by running through a loop that starts each one of nThread processes separately. This is carried out by THREAD_PROC_LOOP. The first argument is the name of the function (here LeapfrogStepT) to be executed by the thread; the second (part) is a value to be passed across to the thread. The function LeapfrogStep returns only after all the threads have completed their work.

The function LeapfrogStepT is executed by each of the threads and, since the threads execute in parallel and in a totally unsynchronized manner, care is required to ensure that data are handled properly; in particular, any data written by one thread should not be accessed (either for reading or for writing) by another concurrent thread. The form of the function header and the return statement are a requirement of the thread functions.

The reference to the macro QUERY_THREAD produces the serial number of the thread, a value between 0 and nThread-1, which is placed in ip. It is used in THREAD_SPLIT_LOOP, which is a loop over a subset of the atoms defined as

#define THREAD_SPLIT_LOOP(j, jMax) \
    for (j = ip * jMax / nThread; \
         j < (ip + 1) * jMax / nThread; j ++)

The value of the argument part is supplied by the macro QUERY_STAGE. The outcome of executing all the threads is that the leapfrog update is applied to all atoms. A similar approach can be used, for example, in ApplyBoundaryCond, although functions that are responsible for only a small fraction of the overall workload may not warrant conversion to use threads.

The force computation is a little more intricate. It requires multiple neighbor lists, one for each thread, containing distinct subsets of atom pairs; individual atoms will generally appear in more than one of these lists. It also requires additional storage for acceleration and potential energy values that are computed separately for each of the lists and subsequently combined to produce the correct values. Note that allowing the different threads to update a common array of acceleration values would violate the restrictions on what threads are permitted to do with data and would be guaranteed to produce incorrect results.

Replacing the function BuildNebrList of §3.4 involves the following alterations:

void BuildNebrList ()
{
  int ip, n;

  for (n = nMol; n < nMol + VProd (cells); n ++) cellList[n] = -1;
  THREAD_PROC_LOOP (BuildNebrListT, 1);

contributions, where, for brevity,

#define THREAD_LOOP \
    for (iq = 0; iq < nThread; iq ++)

Additional quantities declared here are

pthread_t *pThread;
VecR **raP;
real *uSumP;
int **nebrTabP, *nebrTabLenP, funcStage, nThread;

where the elements pThread are required by the thread processing; the necessary array allocations (in AllocArrays – the usual nebrTab is not required, having been replaced by corresponding arrays nebrTabP private to each thread) and input data item are

AllocMem (pThread, nThread, pthread_t);
AllocMem (uSumP, nThread, real);
AllocMem (nebrTabLenP, nThread, int);
AllocMem2 (raP, nThread, nMol, VecR);
AllocMem2 (nebrTabP, nThread, 2 * nebrTabMax / nThread, int);

NameI (nThread),

Finally, the remaining definitions used in the program are

#define QUERY_THREAD() ip = (int) tr
#define QUERY_STAGE funcStage
#define THREAD_PROC_LOOP(tProc, fStage) \
    funcStage = fStage; \
    for (ip = 1; ip < nThread; ip ++) \
      pthread_create (&pThread[ip], NULL, tProc, \
         (void *) ip); \
    tProc ((void *) 0); \
    for (ip = 1; ip < nThread; ip ++) \
      pthread_join (pThread[ip], NULL);

Further information about the functions prefixed with pthread_ can be found in the documentation of the thread library functions. It is left to the reader to explore the performance benefits that can be obtained by this approach (a prerequisite being a computer with multiple processors and shared memory).

Page 39: Molecular Dynamics

Multithreading Particle Decomposition. Example II

The corresponding function ComputeForces becomes

void ComputeForces ()
{
  int ip, iq;

  THREAD_PROC_LOOP (ComputeForcesT, 1);
  THREAD_PROC_LOOP (ComputeForcesT, 2);
  uSum = 0.;
  THREAD_LOOP uSum += uSumP[iq];
}

void *ComputeForcesT (void *tr)
{
  ...
  int ip, iq;

  QUERY_THREAD ();
  switch (QUERY_STAGE) {
    case 1:
      rrCut = Sqr (rCut);
      DO_MOL VZero (raP[ip][n]);
      uSumP[ip] = 0.;
      for (n = 0; n < nebrTabLenP[ip]; n ++) {
        j1 = nebrTabP[ip][2 * n];
        j2 = nebrTabP[ip][2 * n + 1];
        ...
        if (rr < rrCut) {
          ...
          VVSAdd (raP[ip][j1], fcVal, dr);
          VVSAdd (raP[ip][j2], - fcVal, dr);
          uSumP[ip] += uVal;
        }
      }
      break;
    case 2:
      THREAD_SPLIT_LOOP (n, nMol) {
        VZero (mol[n].ra);
        THREAD_LOOP VVAdd (mol[n].ra, raP[iq][n]);
      }
      break;
  }
  return (NULL);
}

In the first call to ComputeForcesT, the particular neighbor list associated with thread ip is processed, with acceleration values being stored in the array raP[ip][] and potential energies in uSumP[ip]. The second call accumulates these separate

Page 40: Molecular Dynamics

Bibliography

"Computer Simulations of Liquids" , by M. P. Allen, D. J. Tildesley.Oxford University Press, 1987"Understanding Molecular Simulations" , by D. Frenkel and B.Smit. Academic Press, 2002”Molecular Dynamics Algorithms for Massively ParallelComputers”, N. Attig et al., Workshop on Molecular Dynamics onParallel Computers. NIC Series, 1999”The Art of Molecular Dynamics Simulation ”, D. C. Rapaport,Cambridge University Press, 2004
