Welcome to Daresbury Laboratory Alice’s Wonderland (1865) Lewis Carroll (Charles Lutwidge Dodgson)

Post on 18-Jan-2016

218 views 3 download

Tags:

Transcript of Welcome to Daresbury Laboratory Alice’s Wonderland (1865) Lewis Carroll (Charles Lutwidge Dodgson)

Welcome to Daresbury Laboratory

Alice’s Wonderland (1865)Lewis Carroll (Charles Lutwidge

Dodgson)

DL_SOFTWARE TUTORIALSoftware Solutions in Atomistic

ModellingILIAN TODOROV, CHIN YONGMICHAEL SEATON, JOHN PURTONDAVID GUNN, ANDREY BRUKHNO

WILLIAM SMITH

SCD, STFC DARESBURY LABORATORY, DARESBURYWARRINGTON WA4 4AD, CHESHIRE, ENGLAND, UK

Multiple Scales of Materials Modelling

MS&MD via DL_POLY

DPD & LB via DL_MESO

KMC via DL_AKMC

FF mapping

via DL_FIELD

MC

via

D

L_M

ON

TE

Coarse graining

via DL_CGMAP

Part 1Part 1

DL_POLY Project BackgroundDL_POLY Project Background

DL_POLY Trivia

• General purpose parallel (classical) MD simulation software • It was conceived to meet needs of CCP5 - The Computer Simulation of Condensed Phases (academic collaboration community)• Written in modularised Fortran90 (NagWare & FORCHECK compliant) with MPI2 (MPI1+MPI-I/O) & fully self-contained

• 1994 – 2010: DL_POLY_2 (RD) by W. Smith & T.R. Forester (funded for 6 years by EPSRC at DL). In 2010 moved to a BSD open source licence as DL_POLY_Classic.

• 2003 – 2010: DL_POLY_3 (DD) by I.T. Todorov & W. Smith (funded for 4 years by NERC at Cambridge). Up-licensed to DL_POLY_4 in 2010 – free of charge to academic researchers and at cost to industry (provided as source).

• ~ 15,500 licences taken out since 1994 (~1,600 annually)• ~ 2,600 e-mail list and ~1,300 FORUM members (since 2005)

Written in modularised free formatted F90 (+MPI) with rigorous code syntax (FORCHECK and NAGWare verified) and no external library dependencies• DL_POLY_4 (version 5.1)

– Domain Decomposition parallelisation, based on domain decomposition (no dynamic load balancing), limits: up to ≈2.1×109 atoms with inherent parallelisation

– Parallel I/O (amber netCDF) and radiation damage features

– Free format (flexible) reading with some fail-safe features and basic reporting (but not fully fool-proofed)

• DL_POLY_Classic (version 1.9)– Replicated Data parallelisation, limits up to ≈30,000

atoms with good parallelisation up to 100 (system dependent) processors (running on any processor count)

– Hyper-dynamics, Temperature Accelerated Dynamics, Solvation Dynamics, Path Integral MD

– Free format reading (somewhat rigid)

Current Versions

DL_POLY on the Web

WWW:WWW:

http://www.ccp5.ac.uk/DL_POLY/http://www.ccp5.ac.uk/DL_POLY/

FTP:ftp://ftp.dl.ac.uk/ccp5/DL_POLY/

DEV:http://ccpforge.cse.rl.ac.uk/gf/project/dl-poly/

http://ccpforge.cse.rl.ac.uk/gf/project/dl_poly_classic/

FORUM:http://www.cse.stfc.ac.uk/disco/forums/

ubbthreads.php/

W. Smith and T.R. Forester,J. Molec. Graphics (1996), 14, 136

W. Smith, C.W. Yong, P.M. Rodger,Molecular Simulation (2002), 28, 385

I.T. Todorov, W. Smith, K. Trachenko, M.T. Dove,J. Mater. Chem. (2006), 16, 1611-1618

W. Smith (Guest Editor),Molecular Simulation (2006), 32, 933

I.J. Bush, I.T. Todorov and W. Smith,Comp. Phys. Commun. (2006), 175, 323-329

Further Information

DL_POLY_DD Development Statistics

Lines [x 1,000]Comment 4.0Blank 5.6Total 36.5Manual

178 p

DL_POLY_3.01

DL_POLY_4.05

Lines [x 1,000]excluding CUDA portComment 14.5Blank 31.5Total 128.9Manual317 p

-----------s

ingle

developer-----------

reengineered

reen

gin

eere

d

reengineering

DL_POLY Licence Statistics

Valid e-mail list at end of2010 :: DL_POLY (2+3+MULTI) - 1,000 (list end)2012 :: DL_POLY_4 - 2,000 (list start 2011)

web-registration

Part Part 22

The Molecular Dynamics MethodThe Molecular Dynamics Method

Molecular Dynamics: Definitions

• Theoretical tool for modelling the detailed microscopic behaviour of many different types of systems, including; gases, liquids, solids, polymers, surfaces and clusters.• In an MD simulation, the classical equations of motion governing the microscopic time evolution of a many body system are solved numerically, subject to the boundary conditions appropriate for the geometry or symmetry of the system.• Can be used to monitor the microscopic mechanisms of energy and mass transfer in chemical processes, and dynamical properties such as absorption spectra, rate constants and transport properties can be calculated.• Can be employed as a means of sampling from a statistical mechanical ensemble and determining equilibrium properties. These properties include average thermodynamic quantities (pressure, volume, temperature, etc.), structure, and free energies along reaction paths.

MD simulations are used for:

• Microscopic insight: we can follow the motion of a single molecule (glass of water)

• Investigation of phase change (NaCl)• Understanding of complex systems

like polymers (plastics – hydrophilic and hydrophobic behaviour)

Molecular Dynamics for Beginners

Example: Simulation of Argon

rrcutcut

612

4)(rr

rV

Pair Potential:Pair Potential:

Lagrangian:Lagrangian:

L r v m v V ri i i ii

N

ijj ii

N

( , ) ( )

1

22

1

Lennard -Jones Potential

612

4)(rr

rV

V(r)V(r)

rr

rrcutcut

Pair-wise radial distancePair-wise radial distance

Equations of Motion

d

dt

L

v

L

ri i

)( ijiij

N

ijiji

iii

rVf

fF

Fam

Lagrange Equation Lagrange Equation – time evolution– time evolution

Force Evaluation – Force Evaluation – particle particle interactionsinteractions

Boundary Conditions

• None – biopolymer simulations

• Stochastic boundaries – biopolymers

• Hard wall boundaries – pores, capillaries

• Periodic boundaries – most MD simulations

2D cubic periodic2D cubic periodic

Periodic Boundary Conditions

TriclinicTriclinic

Truncated octahedronTruncated octahedron

Hexagonal prismHexagonal prism

Rhombic dodecahedronRhombic dodecahedron

• Kinetic Energy:

• Temperature:

• Configuration Energy:

• Pressure:

• Specific heat:

K E m vi ii

N

. . 12

2

TNk

K EB

2

3. .

U V rc ijj i

N

i

( )

N

iiiB frTNkPV

3

1

v

BBNVEc C

NkTNkU

2

31

2

3)( 222

System Properties: Static (1)

Structural Properties

– Pair correlation (Radial Distribution Function):

– Structure factor:

– Note: S(k) available from x-ray diffraction

System Properties: Static (2)

g rn r

r r

V

Nr rij

j i

N

i

( )( )

( )

4 2 2

drrrgkr

krkS 2

01)(

)sin(41)(

RR

RR

Radial Distribution Function (RDF)

g(r)g(r)

separation (r)separation (r)

1.01.0

Typical RDF

Single correlation functions:

Mean squared displacement (Einstein relation)

Velocity Autocorrelation (Green-Kubo relation)

2Dt= 13

⟨∣r i( t )−r i(0 )∣2 ⟩

D=130

∞⟨v i(t )⋅v i(0 )⟩dt

System Properties: Dynamic (1)

Collective Correlation Functions: DL_POLY GUI

• General van Hove correlation function

• van Hove self-correlation function

• van Hove distinct correlation function

N

jiji trrr

NtG

1,

)]()0([1

),( r

N

iiis trrr

NtG )]()0([

1),( r

N

i

N

ijjid trrr

NtG )]()0([

1),( r

System Properties: Dynamic (2)

Correlation Function Uses

• Complete description of bulk dynamical properties

• Space-time Fourier Transform of van Hove function

• Elastic properties of materials• Energy dissipation• Sound propagationObtained directly from neutron scattering

Part 2Part 2

DL_POLY Basics & AlgorithmsDL_POLY Basics & Algorithms

Supported Molecular Entities

Point ionsand atoms

Polarisable ions

(core+shell)Flexible

moleculesConstraint

bonds

Rigidmolecules

Flexiblylinked rigidmolecules

Rigid bondlinked rigidmolecules

Force Field Definitions – I

• particle: a rigid ion or atom (charged or not), a core or a shell of a polarisable ion (with or without associated degrees of freedom), a massless charged site. A particle is a countable object and has a global ID index.• site: a particle prototype that serves to define the chemical & physical nature (topology/connectivity/stoichiometry) of a particle (mass, charge, frozen-ness). Sites are not atoms they are prototypes!• Intra-molecular interactions: chemical bonds, bond angles, dihedral angles, improper dihedral angles, inversions. Usually, the members in a unit do not interact via an inter-molecular term. However, this can be overridden for some interactions. These are defined by site.• Inter-molecular interactions: van der Waals, metal (2B/E/EAM, Gupta, Finnis-Sinclair, Sutton-Chen), Tersoff, three-body, four-body. Defined by species.

Force Field Definitions – II

• Electrostatics: Standard Ewald*, Hautman-Klein (2D) Ewald*, SPM Ewald (3D FFTs), Force-Shifted Coulomb, Reaction Field, Fennell damped FSC+RF, Distance dependent dielectric constant, Fuchs correction for non charge neutral MD cells.

• Ion polarisation via Dynamic (Adiabatic) or Relaxed shell model.

• External fields: Electric, Magnetic, Gravitational, Oscillating & Continuous Shear, Containing Sphere, Repulsive Wall.

• Intra-molecular like interactions: tethers, core shells units, constraint and PMF units, rigid body units. These are also defined by site.

• Potentials: parameterised analytical forms defining the interactions. These are always spherically symmetric!

• THE CHEMICAL NATURE OF PARTICLES DOES NOT CHANGE IN SPACE AND TIME!!! *

Force Field by Sums

i

N

1iexternal

N

ijishell-coreshell-core

N

i0tttethertether

N

idcbainversinvers

N

idcbadiheddihed

N

icbaangleangle

N

ibabondbond

N'

ji,

N'

ji,jiij

N

ijipairmetal

N'

nk,j,i,nkjibody4

N'

kj,i,kjibody3

N'

kj,i,kjiTersoff

N'

ji, ji

ji

0

N'

ji,jipairN21

rΦ|rr|,iUr,r,iU

r,r,r,r,iUr,r,r,r,iU

r,r,r,iUr,r,iU

)|rr|(ρF|)rr(|Vε

r,r,r,rUr,r,rUr,r,rU

|rr|

qq

1|)rr(|U)r,.....,r,rV(

-shellcore

-shellcore

tether

tether

invers

invers

dihed

dihed

angle

angle

bond

bond

Boundary Conditions

• None (e.g. isolated macromolecules)

• Cubic periodic boundaries

• Orthorhombic periodic boundaries

• Parallelepiped (triclinic) periodic boundaries

• Truncated octahedral periodic boundaries*

• Rhombic dodecahedral periodic boundaries*

• Slabs (i.e. x,y periodic, z non-periodic)

DL_POLY is designed for homogeniousdistributed parallel machines

M1 P1

M2 P2

M3 P3

M0 P0 M4P4

M5P5

M6P6

M7P7

Assumed Parallel Architecture

InitializeInitialize

ForcesForces

MotionMotion

StatisticsStatistics

SummarySummary

InitializeInitialize

ForcesForces

MotionMotion

StatisticsStatistics

SummarySummary

InitializeInitialize

ForcesForces

MotionMotion

StatisticsStatistics

SummarySummary

InitializeInitialize

ForcesForces

MotionMotion

StatisticsStatistics

SummarySummary

AA BB CC DD

Replicated Data Strategy – I

• Every processor sees the full system

• No memory distribution (performance overheads and limitations increase with increasing system size)

• Functional/algorithmic decomposition of the workload

• Cutoff ≤ 0.5 min system width

• Extensive global communications (extensive overheads increase with increasing system size)

Replicated Data Strategy – II

1,2 1,3 1,4 1,5 1,6 1,7

2,3 2,4 2,5 2,6 2,7 2,8

3,4 3,5 3,6 3,7 3,8 3,9

4,5 4,6 4,7 4,8 4,9 4,10

5,6 5,7 5,8 5,9 5,10 5,11

6,7 6,8 6,9 6,10 6,11 6,12

7,8 7,9 7,10 7,11 7,12

8,9 8,10 8,11 8,12 8,1

9,10 9,11 9,12 9,1 9,2

10,11 10,12 10,1 10,2 10,3

11,12 11,1 11,2 11,3 11,4

12,1 12,2 12,3 12,4 12,5

PP00

PP00

PP00

Distributed list!

Parallel (RD) Verlet List

AA BB

CC DD

Domain Decomposition MD

• Linked lists provide an elegant way to scale short-ranged two body interactions from O(N2/2) to ≈O(N). The efficiency increases with increasing link cell partitioning – as a rule of thumb best efficacy is achieved for cubic-like partitioning with number of link-cells per domain ≥ 4 for any dimension.

• Linked lists can be used with the same efficiency for 3-body (bond-angles) and 4-body (dihedral & improper dihedral & inversion angles) interactions. For these the linked cell halo is double-layered and since cutoff3/4-body ≤ 0.5*cutoff2-body which makes the partitioning more effective than that for the 2-body interactions.

• The larger the particle density and/or the smaller the cutoff with respect to the domain width, (the larger the sub-selling and the better the spherical approximation of the search area), the shorter the Verlet neighbour-list search.

Linked Cell Lists

6

1 2 3 4 5

LinkCell 2

6

10

12

16

17

Head of ChainList

1 2 3 4 5 6 7 8 9 2019181716151413121110

10 12 16 17 0LinkList

Atom number

Cell number

Linked Cell List Idea

• Provides dynamically adjustable workload for variable local density and VNL speed up of ≈ 30% (45% theoretically).

• Provides excellent serial performance, extremely close to that of Brode-Ahlrichs method for construction of the Verlet neighbour-list when system sizes are smaller < 5000 particles.

1 2 3 4 5 6 7

Sub-celling of LCs

• Bonded forces:- Algorithmic decomposition for DL_POLY_C- Interactions managed by bookkeeping

arrays, i.e. explicit bond definition!!!- Shared bookkeeping arrays

• Non-bonded forces:– Distributed Verlet neighbour list (pair forces)– Link cells (3,4-body forces)

• Implementations differ between DL_POLY_4 & C!

Parallel Force Calculation

Molecular Molecular force fieldforce fielddefinitiondefinition

Glo

bal Fo

rce F

ield

Glo

bal Fo

rce F

ield

PP00LocalLocalforceforcetermsterms

PP11LocalLocalforceforcetermsterms

PP22LocalLocalforceforcetermsterms

Pro

cess

ors

Pro

cess

ors

DL_POLY_C and Bonded Forces

Glo

bal fo

rce fi

eld

Glo

bal fo

rce fi

eld

PP00LocalLocalatomicatomicindicesindices

PP11LocalLocalatomicatomicindicesindices

PP22LocalLocalatomicatomicindicesindices

Pro

cess

or

Dom

ain

sPro

cess

or

Dom

ain

s

Tricky!Tricky!Molecular Molecular force fieldforce fielddefinitiondefinition

DL_POLY_4 and Bonded Forces

EE

AA

BB

CC

DDFF

GG

HH 11

22

33

44

55

66

77

Replicated dataReplicated data Domain decompositionDomain decomposition

Topology Distribution

AA11 AA33 AA55 AA77 AA99 AA1111 AA1313 AA1515 AA1717

AA22 AA44 AA66 AA88 AA1010 AA1212 AA1414 AA1616

PP00 PP11 PP22 PP33 PP44 PP55 PP66 PP77 PP88

RD Distribution Scheme: Bond Forces

AA BB

CC DD

AA11

AA33

AA55

AA77

AA99

AA1111

AA1313

AA1515

AA1717

AA22

AA44

AA66

AA88

AA1010

AA1212

AA1414

AA1616

DD Distribution Scheme: Bond Forces

Ensembles and Algorithms

Integration:

Available as velocity Verlet (VV) or leapfrog Verlet (LFV) generating flavours of the following ensembles• NVE

• NVT (Ekin) Evans

• NVT Andersen^, Langevin^, Berendsen, Nosé-Hoover, GST• NPT Langevin^, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein^

• NT/NPnAT/NPnT Langevin^, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein^

Constraints & Rigid Body Solvers: • VV dependent – RATTLE, No_Squish, QSHAKE*• LFV dependent – SHAKE, Euler-Quaternion, QSHAKE*

Integration Algorithms

Essential Requirements:

• Computational speed• Low memory demand• Accuracy• Stability (energy conservation, no drift)• Useful property - time reversibility• Extremely useful property –

symplecticness = time reversibility + long term stability

r (t)

r (t+t)v (t)t

f(t)t2/m

Net

displacement

r’ (t+t)

[r (t), v(t), f(t)] [r (t+t), v(t+t), f(t+t)]

Integration: Essential Idea

Simulation Cycle and Integration Schemes

SetupSetup

ForcesForces

MotionMotion

Stats.Stats.

ResultsResults

Set up initial Set up initial systemsystem

Calculate Calculate forcesforces

Calculate Calculate motionmotion

Accumulate Accumulate statistical datastatistical data

Summarise Summarise simulationsimulation

Taylor expansion:

)()()(.3

)()()(.2

)(.1

)(),(.0

21

21

21

21

ttvttxttx

m

tftttvttv

afreshcalculatedtf

ttvtx

iii

i

iii

i

ii

Leapfrog Verlet (LFV)

Velocity Verlet (VV)

i

iii

i

iii

i

iii

iii

m

ttftttvttv

afreshcalculatedttf

ttvt

txttx

m

tfttvttv

tftvtx

)(

2)()(.VV2.1

)(.0VV2.

)(2

)()(.2VV1.

)(

2)()(.1VV1.

)(),(),(.0VV1.

21

21

21

32

1 2tO

m

ftvtrr nnnn

Integration Algorithms: Leapfrog Verlet

Discrete timeDiscrete time

rrn+1n+1rrnnrrn-1n-1rrn-2n-2

vvn+1/2n+1/2

f f nnf f n-1n-1f f n-2n-2

vvn-1/2n-1/2vvn-3/2n-3/2

)(

)(

42/11

32/12/1

tvtrr

tFm

tvv

ni

ni

ni

ni

i

ni

ni

Application in Practice

2

2/12/1

2/11

2/12/1

ni

nin

i

ni

ni

ni

ni

i

ni

ni

vvv

vtrr

Fm

tvv

Integration Algorithms: Velocity Verlet

Discrete timeDiscrete time

rrn+1n+1rrnnrrn-1n-1rrn-2n-2

vvn+1n+1vvnnvvn-1n-1vvn-2n-2

f f n+1n+1f f nnf f n-1n-1f f n-2n-2

12/11

2/11

2/1

2

2

ni

i

ni

ni

ni

ni

ni

ni

i

ni

ni

Fm

tvv

vtrr

Fm

tvv

Application in Practice

)()(2

)(2

211

42

1

tFFm

tvv

tFm

tvtrr

ni

ni

i

ni

ni

ni

i

ni

ni

ni

Constraint Solvers

SHAKE

RATTLERATTLE_R (SHAKE)

Taylor expansions: 3

2

1 2tO

m

gftvtrr nnnnn

31 22

1 tOm

hftvv nnnn

jiij

uij

oij

uijijij

ij

oijijjiij

mm

dd

dd

tg

dgGG

111

)(

2

22

2

uij

oij

uij

oijij

ijdd

dd

tg

)(

22

2

RATTLE_Vijd

i

joiv

oiv

2

)(

ij

oijjiij

ij

oijijjiij

d

dvv

th

dhHH

oijd

ijd

uijd

oioj

ui i

ujj

jiG

ijG

MUMU11

MUMU22

MUMU33

MUMU44

Replicated Data SHAKE

Proc AProc A Proc BProc B

Extended Ensembles in VV casting

Velocity Verlet integration algorithms can be naturally derived from the non-commutable Liouvile evolution operator by using a second order Suzuki-Trotter expansion. Thus they are symplectic/true ensembles (with conserved quantities) warranting conservation of the phase-space volume, time-reversibility and long term numerical stability…

Examplary VV Expansion of NVE to NVEkin, NVT, NPT & NσT

ttttRRATTLE

tttvt

txttx

tm

tfttvttv

tttttThermostat

ttttBarostat

ttttThermostat

tftvtx

iii

i

iii

iii

:)(_

:)(2

)()(

:)(

2)()(

:)(

:)(

:)(

)(),(),(

:VV1

21

21

21

41

21

41

21

21

41

41

tttttThermostat

tttttBarostat

tttttThermostat

tttttVRATTLE

tm

ttftttvttv

afreshttfttvttx

i

iii

iii

41

23

21

21

41

43

21

21

21

21

21

:)(

:)(

:)(

:)(_

:)(

2)()(

)(),(),(

:VV2

Other Integration Algorithms

• Gear Predictor-Corrector – generally easily extendable to any high order of accuracy. It is used in satellite trajectory calculations/corrections. However, lacking long term stability.

• Trotter derived evolution algorithms – generally easily extendable to any high order of accuracy. Symplectic.

Base Functionality

• Molecular dynamics of polyatomic systems with options to save the micro evolution trajectory at regular intervals

• Optimisation by conjugate gradients method or zero Kelvin annealing

• Statistics of common thermodynamic properties (temperature, pressure, energy, enthalpy, volume) with options to specify collection intervals and stack size for production of rolling and final averages

• Calculation of RDFs and Z-density profiles• Temperature scaling, velocity re-Gaussing• Force capping in equilibration

• Radiation damage driven features:• defects analysis• boundary thermostats• volumetric expansion• replay history• variable time step algorithm

• Extra ensembles:• Langevin, Andersen, MTK, GST• extensions of NsT to NPnAT and NPnT

• Infrequent k-space Ewald evaluation• Direct VdW• Direct Metal• Force shifted VdW• I/O driven features Parallel I/O & netCDF• Extra Reporting

DL_POLY_4 Specials

Part 3Part 3

DL_POLY I/O FilesDL_POLY I/O Files

I/O Files

• Crystallographic (Dynamic) data• Simulation Control data• Molecular / Topological Data• Tabulated two-body potentials• EAM potential data• Reference data for DEFECTS

• Restart data

• Final configuration• Simulation summary data• Trajectory Data

• Defects data

• MSD & T inst data

• Statistics data• Best CGM configuration

• RDF data

• Z density data

• Restart data

Internally, DL_POLY uses atomic scale units:

• Mass- mass of H atom (D) [Daltons]• Charge - charge on proton (e)• Length - Angstroms (Å)• Time - picoseconds (ps)• Force - D Å ps-2

• Energy - D Å2 ps-2 [10 J mol-1]

pressure is expressed in k-atm for I/O

DL_POLY Units

UNITS directive in FIELD file allows to opt for the following energy units

•Internal DL_POLY units - 10 J mol-1 •Electron-volts - eV•Kilo calories per mol - k-cal mol-1

•Kilo Joules per mol - k-J mol-1

•Kelvin per Boltzmann - K Boltzmann-1

All interaction MUST have the same energy units!

Acceptable DL_POLY Units

• SIMULATION CONTROL

• Free Format• Mandatory• Driven by keywords:

keyword [options] {data}

e.g.:

ensemble NPT Hoover 1.0 8.0

CONTROL File

CONFIG [REVCON,CFGMIN] File

• Initial atomic coordinates• Format

- Integers (I10)- Reals (F20)- Names (A8)

• Mandatory• Units:

- Position - Angstroms (Å)- Velocity – Å ps-1

- Force - D Å ps-2

• Construction:- Some kind of GUI

essential for complex systems

• Force Field specification

• Mandatory• Format:

- Integers (I5)- Reals (F12)- Names (A8)- Keywords (A4)

• Maps on to CONFIG file structure

• Construction- Small systems - by hand- Large systems – nfold or

GUI!

FIELD File

• Defines non-analytic pair (vdw) potentials

• Format- Integers (I10)- Reals (F15)- Names (A8)

• Conditional, activated by FIELD file option

• Potential & Force • NB force (here) is:

)()( rUr

rrG

TABLE File

• Defines embedded atom potentials• Format

- Integers (I10)- Reals (F15)- Names (A8)

• Conditional, activated by FIELD file option• Potentials only • pair, embed & dens keywords for atom

types followed by data records (4 real numbers per record)

• Individual interpolation arrays

TABEAM File

• Provides program restart capability

• File is unformatted (not human readable)

• Contains thermodynamic accumulators, RDF data, MSD data and other checkpoint data

• REVIVE (output file) ---> REVOLD (input file)

REVOLD [REVIVE] File

• Provides Job Summary (mandatory!)• Formatted to be human readable• Contents:

- Summary of input data- Instantaneous thermodynamic data at selected intervals- Rolling averages of thermodynamic data- Statistical averages- Final configuration- Radial distribution data- Estimated mean-square displacements and 3D diffusion

coefficient• Plus:

- Timing data, CFG and relaxed shell model iteration data- Warning & Error reports

OUTPUT File

• System properties at intervals selected by user

• Optional• Formatted (I10,E14)• Intended use: statistical

analysis (e.g. error) and plotting vs. time.

• Recommend use with GUI!

• Header:- Title- Units

• Data:- Time step, time, #

entries- System data

STATIS File

Configuration data at user selected intervals•Formatted•Optional

Header:•Title•Data level, cell key, number

Configuration data:•Time step and data keys•Cell Matrix•Atom name, mass, charge•X,Y,Z coordinates (level 0)•X,Y,Z velocities (level 1)•X,Y,Z forces (level 2)

HISTORY File

• Formatted (A8,I10,E14)• Plotable• Optional• RDFs from pair forces• Header:

- Title- No. plots & length of plot

• RDF data:- Atom symbols (2)- Radius (A) & RDF- Repeated……

• ZDNDAT file has same format

RDFDAT [ZDNDAT] File

• REFERENCE file– Reference structure to compare against

• DEFECTS file– Trajectory file of vacancies and interstitials

migration

• MSDTMP file– Trajectory like file containing the each

particle’s Sqrt(MSDmean) and Tmean

DL_POLY_4 Extra Files

Part 5Part 5

DL_POLY_4 PerformanceDL_POLY_4 Performance

Proof of Concept on IBM p575 2005

300,763,000 NaCl with full SPME electrostatics evaluation on 1024 CPU cores

HECToR (2013)Start-up time 60 min 15 min Timestep time 68 sec 23 secFFT evaluation 55 sec 18 sec

In theory ,the system can be seen by the eye. Although you would need a very good microscope – the MD cell size for this system is 2μm along the side and as the wavelength of the visible light is 0.5μm so it should be theoretically possible.

2000 4000 6000 8000 10000 12000 14000 16000

2000

4000

6000

8000

10000

12000

14000

16000

14.6 million particle Gd2Zr

2O

7 system

Sp

ee

d G

ain

Processor count

Perfect MD step total Link cells van der Waals Ewald real Ewald k-space

Benchmarking BG/L Jülich 2007

0 200 400 600 800 1000

0

200

400

600

800

1000

max load 700'000 atoms per 1GB/CPUmax load 220'000 ions per 1GB/CPUmax load 210'000 ions per 1GB/CPU

Solid Ar (32'000 atoms per CPU) NaCl (27'000 ions per CPU) SPC Water (20'736 ions per CPU)

21 million atoms

28 million atoms

33 million atoms

Sp

ee

d G

ain

Processor Count

Weak Scaling

DL_POLY_4 RB v/s CB

HECToR (Cray XE6) 2013

Weak Scaling and Cost of Complexity

HECToR (Cray XE6) 2013

I/O Solutions

1.Serial read and write (sorted/unsorted) – where only a single MPI task, the master, handles it all and all the rest communicate in turn to or get broadcasted to while the master completes writing a configuration of the time evolution.

2.Parallel write via direct access or MPI-I/O (sorted/unsorted) – where ALL / SOME MPI tasks print in the same file in some orderly manner so (no overlapping occurs using Fortran direct access printing. However, it should be noted that the behaviour of this method is not defined by the Fortran standard, and in particular we have experienced problems when disk cache is not coherent with the memory).

3.Parallel read via MPI-I/O or Fortran

4.4.Serial NetCDF read and writeSerial NetCDF read and write using NetCDF libraries using NetCDF libraries for machine-independent data formats of array-based, for machine-independent data formats of array-based, scientific data (widely used by various scientific scientific data (widely used by various scientific communities).communities).

MX1-1

PX0-1

M1

P1

The Advanced Parallel I/O Strategy

MX0+1

PX0+1

MX1-1

PX1-1

M0

P0

MXn+1

PXn+1

MN-1

PN-1

MXn

PXn

MX0

PX0

I/O Group 0

I/O Group 1

I/O Group n=M-1

I/O

B

ATC

HI/

O

BA

TC

HI/

O

BA

TC

HD

I S

KD

I S

K

Memory

PHEAD Pslave I/O WRITE COMMS

I/O READ COMMS

HECToR (Cray XE6) 2012

•72 I/O NODES

•READ ~ 50-300 Mbyte/s with best performance on 16 to 128 I/O Groups

•WRITE ~ 50-150 Mbyte/s with best performance on 64 to 512 I/O Groups

•Performance depends on user defined number of I/O groups, and I/O batch (memory CPU to disk) and buffer (memory of comms transactions between CPUs)

•Reasonable defaults as a function of all MPI tasks are provided

N compute cores of which M < N N compute cores of which M < N do I/Odo I/O

Part 4Part 4

Obtaining & Building DL_POLYObtaining & Building DL_POLY

DL_POLY Licensing and Support

• Online Licence Facility at http://www.ccp5.ac.uk/DL_POLY/• The licence is

• To protect copyright of Daresbury Laboratory• To reserve commercial rights• To provide documentary evidence justifying continued support by UK Research Councils

• It covers only the DL_POLY_4 package• Registered users are entered on the DL_POLY e-mailing list• Support is available (under CCP5 and EPSRC service level agreement to CSED) only to UK academic researchers• For the rest of the world there is the online user forum• Last but not least there is a detailed, interactive, self-referencing PDF (LaTeX) user manual

• Register at http://www.ccp5.ac.uk/DL_POLY/

• Registration provides the decryption - procedure and password (sent by e-mail)

• Source is supplied by anonymous FTP (in an invisible directory)

• Source is in a tarred, gzipped and encrypted form

• Successful unpacking produces a unix directory structure

• Test and benchmarking data are also available

Supply of DL_POLY_4

DL_POLY_Classic Support

• Full documentation of software supplied with source• Support is available through the online user forum or the CCP5 user community

WWW:WWW:

http://www.ccp5.ac.uk/DL_POLY_CLASSIC/http://www.ccp5.ac.uk/DL_POLY_CLASSIC/

FTP:ftp://ftp.dl.ac.uk/ccp5/DL_POLY/

FORUM:http://www.cse.stfc.ac.uk/disco/forums/

ubbthreads.php/

• Downloads are available from CCPForge at http://ccpforge.cse.rl.ac.uk/gf/project/dl_poly_classic/

• No registration required – BSD licence

• Download source from: CCPForge: Projects: DL_POLY Classic: Files: dl_poly_classic: dl_poly_classic1.9

• Sources is a in tarred and gzipped form

• Successful unpacking produces a unix directory structure

• Test data are also available

Supply of DL_POLY_Classic

DL_POLYDL_POLY

buildbuild

sourcesource

executeexecute

javajava

utilityutility

Home of makefilesHome of makefiles

DL_POLY source codeDL_POLY source code

Home of executable Home of executable &&Working DirectoryWorking Directory

Java GUI source codeJava GUI source code

Utility codesUtility codes

data Test dataTest data

DL_POLY Directory Structure

1. Note differences in capabilities (e.g. linked rigid bodies) !!!2. Less than 10,000 atoms (if in parallel)? – DL_POLY Classic3. More than 30,000 atoms? – DL_POLY_44. Ratio cell_width/rcut < 3 (in any direction)? – DL_POLY_Classic

5. Less than 500 particles per processor? – DL_POLY_Classic

DL_POLY_C v/s DL_POLY_4 Usage

DL_POLY_ClassicSimple molecules (no SHAKE)• 8 or less, 10,000 atoms• 16 or less, 20,000 atoms• 32 or less, 30,000 atomsSimple ionics• 16 or less, 10,000 atoms• 64 or less, 20,000 atoms• 128 or less, 30,000 atomsMolecules (with SHAKE)• 64 max!

DL_POLY_4• Golden Rule 1: No fewer than

3x3x3 link cells per processor (if in parallel)

• Golden Rule 2: No fewer than 500 particles per processor (if in parallel)!

Part 6Part 6

DL_POLY_Classic FunctionalityDL_POLY_Classic FunctionalityW. SmithW. Smith

Special Algorithms

• Hyperdynamics– Bias potential dynamics– Temperature accelerated dynamics– Nudged elastic band

• Solvation properties:– Energy decomposition– Spectroscopic solvent shifts– Free energy of solution

• Metadynamics

Standard Input Special Input Standard Output Special Output

CONFIG REVOLD OUTPUT HISTORY

FIELD TABLE STATIS RDFDAT

CONTROL TABEAM REVIVE ZDNDAT

REVCON

HYPOLD HYPRES

EVENTS

CFGBSNnn

CFGTRAnn

PROnn.XY

SOLVAT

FREENG

STEINHARDT METADYNAMICS

ZETA

CFGMIN

Operation Type:

Standard use

Hyperdyn./TAD

Solvation

Metadynamics

Optimisation

I/O Files

Solvation Features

• Molecular Solvation Energy• Energy decomposition• Energy distribution functions

• Free Energy of Solvation• Mixed Hamiltonian method• Thermodynamic Integration

• Solution Spectroscopy• Solvent induced shifts• Solvation relaxation

• SOLVAT – Breakdown of system energy based on

molecular types– Energies of ground and excited states

• FREENG – Energy data for thermodynamic

integration

Solvation Files

Bias Potential Dynamics

Temperature Accelerated Dynamics

Metadynamics

Hyperdynamics

• Construct bias potential to reduce well depth of state A.

• Bias potential is zero at saddle point.

• Ratios of rates from state A to states B, C, etc. preserved:

• Suitable bias potential:

Bias Potential Dynamics

OriginalPotential

Bias Potential

ModifiedPotential

State A

Bias Potential Dynamics 2

b

b

bb

bb

b

b

A

Nb

ATSTTST

*bA

NbANTST

A

NbA

NbNTST

ANTST

A

Nb

A

Nb

N

A

NNb

Nb

N

NNb

Nb

NN

A

NN

NNN

A

RVkk

RVRVRVk

RVRVRVk

RVk

RV

RVff

dRVRVH

dRVRVHff

dH

dHff

exp/or

0 if exp/ and

exp/exp So

Now

exp

exp

exp

exp

exp

exp

*

*

*

Temperature Accelerated Dynamics

First order reactions:

Hopping probability:

P dt = k exp(-kt) dt

Lifetime of state: =1/k

Arrhenius:

k = A exp(-E*/RT)

log(1/) = log A - E*/RT 1/RTh 1/RTl1/RToo

log(1

/)

p1

p2

E1

E2

increasing simulationtime

stop time tend

Temperature Accelerated Dynamics 2

• Simulate system at high T & watch for transitions• When transition found, stop simulation and:

– Determine activation energy using nudged elastic band

– Record transition time, save `new’ state configuration

• Restart simulation in original state with new velocities.

• Search for new transitions. Hence build `library’ of transition data.

• Stop searching after time tend given by:

tend=exp[E2+(Th-Tl)(E2-)/Th]

• Commence new search from `first’ low T state.

Nudged Elastic Band

A

B

E

R

A B

C1 CN-1

C2 C3 C4

C0 CN

• N+1 configs (C0…CN) linearly interpolated From A to B

• Connect by spring (stiffness K)

• Remove `off tangent’ forces• Minimise all configs subject to presence of spring forces

• Resulting path is reaction path through saddle point

Kinetic Monte Carlo

• Simulate set of competing processes• Rate of process is (make a list).• Define sum of rates

• Generate random number• Select process

• Advance time • Repeat!

Nipi ,,1; ip ir

N

iirR

1

10 : uu

i

jj

i

jji ruRrp

1

1

1

:

R

uRip

Rut /)log(

Invoking the Hyperdynamics Options

In the CONTROL file:

tadunits kJnum_block 500num_track 10blackout 1000catch_radius 3.5neb_spring 10.0deltad 6.91low_temp 40.0force 0.0025endtad

bpd pathunits eVvmin -3.9935E03ebias -3.5000E03num_block 300num_track 10catch_radius 3.5neb_spring 1.0force 0.00025endbpd

OR

Additional files for TAD and Bias potential dynamics:

•HYPRES/HYPOLD – restart files•EVENTS –program activity report•CFGBSNnn – Basin CONFIG files (new states)•PROnn.XY – Reaction path profiles•CFGTRAnn – Tracking CONFIG files

Subdirectories required in execute directory: BASINS, PROFILES, TRACKS

Hyperdynamics Files

TAD – DL_POLY Test Case 32

• Atoms `hop’ into vacancies• Each vacancy has 12 nearest

neighbour atoms• So 12 possible escapes from PE basin• Use TAD to find them!• Use NEB to find activation energy• Extrapolate to low temperature for

low T rate• Put results into KMC simulation

255 L-J Argon atoms FCC crystal + 1 vacancy

EVENTS file extract:

Event nt Basins Nt E Time(ps) Extrap.(ps) Stop time(ps)TRA 38500 0 1 1 7.28338E+00 3.82250E+01 4.31244E+07 2.04398E+03TRA 55500 0 2 1 7.20808E+00 5.49650E+01 5.36891E+07 2.04398E+03TRA 127500 0 3 1 7.28160E+00 1.26145E+02 1.41830E+08 2.04398E+03TRA 750500 0 4 1 7.19597E+00 7.47515E+02 7.13444E+08 2.04398E+03

BPD – DL_POLY Test Case 33

998 NaCl ions rocksalt crystal + 2 vacancies

Event nt Basins Nt E Time(ps) Extrap.(ps)TRA 4500 0 1 1 6.74301E-01 4.39500E+00 7.34793E+03TRA 399300 1 2 1 1.11127E+00 3.99185E+02 6.45155E+05TRA 466500 2 3 1 6.57466E-01 4.66495E+02 7.53837E+05

EVENTS file extract:

• Overall neutral system• Ions `hop’ into vacancies• Escapes from PE basin unknown

(a priori)• Use BPD to find them!• Use NEB to find activation energy• Extrapolate hopping time for zero

bias• Put results into KMC simulation

NEB Reaction Profiles

Lennard Jones Argon

Sodium Chloride

Metadynamics

Metadynamics is a method devised by Alessandro Laio and Michele Parrinello for accelerating the exploration of a free energy landscape as the function of collective variables.

Method:• The system potential energy is augmented by a time-

dependent bias potential consisting of Gaussian functions of the collective variables

• The longer a simulation remains in a particular free energy minimum, the larger the bias potential becomes – thus forcing the system to seek out a new thermodynamic state.

• The accumulated bias potential provides a description of the free energy surface

Metadynamics in 1D

A. Laio & M. Parrinello, PNAS 99 (2002) 12562

Collective Variables?

A collective variable is a single number that defines an atomic structure (i.e. it is a function of ).

Most often they are called Order Parameters. Particular examples used in metadynamics are:

•The system potential energy:

•Simulation cell vectors:

•The Steinhardt order parameters:

•Tetrahedral order parameters:

and are maximum for particular structures.

Defining the bias potential in terms of order parameters

allows destabilization of particular structural phases.

)( NrUQ

Nr

Q

),,( cbah

Metadynamics Formulae

Order parameter vector:

System Hamiltonian:

Bias Potential:

and are chosen to `fill’ surface at acceptable rate

Force:

Free Energy Surface:

)(,),()( 1N

MNNM rsrsrs

]),([)(21

2

trsVrUm

pH NMN

N

i i

i

gN

k

Mk

MnM htssWtrsV1

22

2/)()(exp]),([

)()(1

Nji

M

j j

Nii

rss

VrUf

]),([lim

)( trsVt

sF NMMg

W h

• METADYNAMICS – Data defining the metadynamics hypersurface

• STEINHARDT – Defines the Steinhardt order parameters

• ZETA – Defines the tetrahedral order parameters

Metadynamics Files

Steinhardt Order Parameters

if 0

if 1)(

)(cos

2

1

if 1

)(

and and typesatom connecting vectors allover runs where

,)(

with, and typesatomfor

1

12

4

2

2112

1

1

1

2/12

rr

rrrrr

rr

rr

rf

Nb

YrfQ

QNN

Q

c

b

bbmb

N

bcm

mm

C

b

) typeof are atoms all (assuming atom

tolinked atoms of pairs ofnumber theis and

species of atoms allover run ,, Where

)3/1)(cos()(1

1

2

c

N

i

N

ij

N

jkjikikcijc

c

N

Nkji

rfrfNN

T

Tetrahedral Order Parameters

Ice Nucleation and Growth 1

0.5ns

D Quigley and PM Rodger, Molec. Sim. 35 (2009) 613

Bias:Q4

OO, Q6OO, T & PE

Ice Nucleation and Growth 2

0.75ns

D Quigley and PM Rodger, Molec. Sim. 35 (2009) 613

Bias:Q4

OO, Q6OO, T & PE

Ice Nucleation and Growth 3

1.25ns

D Quigley and PM Rodger, Molec. Sim. 35 (2009) 613

Bias:Q4

OO, Q6OO, T & PE

Ice Nucleation and Growth 4

1.5ns

D Quigley and PM Rodger, Molec. Sim. 35 (2009) 613

Bias:Q4

OO, Q6OO, T & PE

Conclusions

• DL_POLY Classic is free

• It's very versatile with advanced features

• Go get it!

Part 7Part 7

The DL_POLY Java GUIThe DL_POLY Java GUIW. SmithW. Smith

• Java - Free!

• Facilitate use of code

• Selection of options (control of capability)

• Construct (model) input files

• Control of job submission

• Analysis of output

• Portable and easily extended by user

GUI Overview

• Edit source in java directory

• Edit using vi,emacs, whatever

• Compile in java directory:

javac *.java

jar -cfm GUI.jar manifesto *.class

• Executable is GUI.jar

• But.....

****Don't Panic!****

The GUI.jar file is provided in the download

Compiling/Editing the GUI

• Invoke the GUI from within the execute directory (or equivalent):

java -jar ../java/GUI.jar

• Colour scheme options:

java -jar ../java/GUI.jar –colourscheme

with colourscheme one of:monet, vangoch, picasso, cezanne, mondrian

(default picasso).

Invoking the GUI

MenusMenus

The Monitor Window

Using Menus

ShowShowEditorEditorOptionOption

GraphicsGraphicsButtonsButtons

The Molecular Viewer

GraphicsGraphicsWindowWindow

EditorEditorButtonButton

EditorEditorButtonsButtons

The Molecular Editor

EditorEditorWindowWindow

• File - Simple file manipulation, exit etc.• FileMaker - make input files:

– CONTROL, FIELD, CONFIG, TABLE

• Execute– Select/store input files, run job

• Analysis– Static,

dynamic,statistics,viewing,plotting• Information

– Licence, Force Field files, disclaimers etc.

Available Menus

ButtonsButtons

Text BoxesText Boxes

A Typical GUI Panel

• VMD is a free software package for visualising MD data.

• Website: http://www.ks.uiuc.edu/Research/vmd/

• Useful for viewing snapshots and movies.– A plug in is available for DL_POLY HISTORY files– Otherwise convert HISTORY to XYZ or PDB format

DL_POLY & VMD

xyzPDB

DL_FIELD

‘black box’FIELD CONFIG

DL_FIELD – http://www.ccp5.ac.uk/DL_FIELD/

• Orgainic Fields – AMBER, CHARM, OPLS-AA, PCFF, Drieding, CHARM19 (united atom)

• Inorganic Fields including a core-shell polarisation option• Solvation Features, Auto-CONNECT feature for mapping

complex random structures such as gels and random polymers

• input units freedom and molecular rigidification

Protonated 4382 atoms (excluding water)19400 bond interactions 7993 angles interactions13000 dihedral interactions 730 VDW intearctions

Developed by Developed by C.W. YongC.W. Yong

SOD

DL_POLY Hands-OnDL_POLY Hands-On

http://www.ccp5.ac.uk/DL_POLY/TUTORIAL/EXERCISES/http://www.ccp5.ac.uk/DL_POLY/TUTORIAL/EXERCISES/index.htmlindex.html

UV

k

kq ik rc

o kj j

j

N

1

2

42 2

20 1

2

exp( / )

exp

1

4

4

0

3 22

1

o

n j

jnn j

N

Rjn

oj

j

N

q q

R rerfc R r

q

/

k

Vm n 2

1 3

/ ( , , )with:with:

The Ewald Summation

Parallelised Ewald Summation

• Self interaction correction - as is• Real Space terms:

– Handle using parallel Verlet neighbour list– For excluded atom pairs replace erfc by -erf

• Reciprocal Space Terms:– Distribute over atoms for traditional RD

parallelisation; or– Use a grid method such as SPME (by taking

advantage of data distribution). This can be especially advantageous for DD parallelisation!

q ik rj jj N N P

N

exp/

1

q ik rj jj

N P

exp/

1

q ik rj jj N P

N P

exp/

( / )

1

2

q ik rj jj N P

N P

exp( / )

/ )

2 1

3(

q ik rj jj N P

N P

exp/ )

( / )

3( 1

4

q ik rj jj

Nexp

1

2

Partition overPartition overatoms:atoms:

Add to EwaldAdd to Ewaldsum on allsum on allprocessorsprocessors

Global SumGlobal Sum

exp( / )k

k

2 2

2

4

ppPP

pp33

pp22

pp11

pp00

Repeat for eachRepeat for eachk vectork vector

Strategy in Replicated Data

2

102

22

exp)4/exp(

2

1

N

jjj

korecip rkiq

k

k

VU

The crucial part of the SPME method is the conversion The crucial part of the SPME method is the conversion of the Reciprocal Space component of the Ewald sum of the Reciprocal Space component of the Ewald sum into a form suitable for Fast Fourier Transforms (FFT).into a form suitable for Fast Fourier Transforms (FFT).Thus:Thus:

),,(),,(2

1321

,,321

321

kkkQkkkGV

Ukkk

T

orecip

becomes:becomes:

wherewhere G and Q are 3D grid arrays (see later)G and Q are 3D grid arrays (see later)

Ref: Essmann Ref: Essmann et alet al., J. Chem. Phys. (1995) ., J. Chem. Phys. (1995) 103103 8577 8577

Smoothed Particle-Mesh Ewald

Central idea - share discrete charges on 3D grid:Central idea - share discrete charges on 3D grid:

Cardinal B-Splines MCardinal B-Splines Mnn(u) - in 1D:(u) - in 1D:

)1(1

)(1

)(

)0,max()!(!

!)1(

)!1(

1)(

/2exp)1()/)1(2exp()(

/2exp)()(/2exp

11

1

0

12

0

uMn

unuM

n

uuM

kuknk

n

nuM

KikMKknikb

KikuMkbLkiu

nnn

nn

k

kn

n

n

jnj

RecursionRecursionrelationrelation

SPME: Spline Scheme

321 ,,333322221111

1

321

)()()(

),,(

nnnjnjnjn

N

jj KnuMKnuMKnuMq

Q

GGT T (k(k11,k,k22,k,k33) is the discrete Fourier Transform of the) is the discrete Fourier Transform of thefunction:function:

*3213212

22

321 )),,()(,,()4/exp(

),,( kkkQkkkBk

kkkkG T

2

33

2

22

2

11321 )()()(),,( kbkbkbkkkB withwith

Is the charge array and QIs the charge array and QTT(k(k11,k,k22,k,k33) its discrete Fourier transform.) its discrete Fourier transform.

SPME: Building the Arrays

• SPME is generally faster then conventional Ewald SPME is generally faster then conventional Ewald sum in most applications. Algorithm scales as sum in most applications. Algorithm scales as O(NO(NloglogN)N)

• In DL_POLY_Classic the FFT array is built in pieces In DL_POLY_Classic the FFT array is built in pieces on each processor and made whole by a global sum on each processor and made whole by a global sum for the FFT operationfor the FFT operation

• In DL_POLY_4 the FFT array is built in pieces on In DL_POLY_4 the FFT array is built in pieces on each processor and kept that way for the distributed each processor and kept that way for the distributed FFT operation (DaFT)FFT operation (DaFT)

• The DaFT FFT `hides’ all the implicit The DaFT FFT `hides’ all the implicit communicationscommunications

SPME: Comments

FFTs areFFTs are

• Fast (!) - O(VlogV) operations where V is the Fast (!) - O(VlogV) operations where V is the number of points in the gridnumber of points in the grid

• Global operations - to perform a FFT you need all Global operations - to perform a FFT you need all the pointsthe points

• This makes it difficult to write an efficient, good This makes it difficult to write an efficient, good scaling FFTscaling FFT

Parallel FFTs - The Basics

• Distribute the data by planesDistribute the data by planes

• Each processor has a complete set of points in the Each processor has a complete set of points in the xx and and yy directions so can do those Fourier transforms directions so can do those Fourier transforms

• Redistribute data so that a processor holds all the Redistribute data so that a processor holds all the points in points in zz

• Do the z transformsDo the z transforms

• Allows efficient implementation of the serial FFTs Allows efficient implementation of the serial FFTs ( use a library routine )( use a library routine )

• In practice for large enough 3D FFTs can scale In practice for large enough 3D FFTs can scale reasonablyreasonably

• However, the distribution does not map onto However, the distribution does not map onto DL_POLY’s - large amounts of data redistributionDL_POLY’s - large amounts of data redistribution

Traditional Parallel FFTs

• Takes data distributed as in DL_POLY_4 (Domain Takes data distributed as in DL_POLY_4 (Domain Decomposition) thus avoiding a lot of Decomposition) thus avoiding a lot of communication in comparison to traditional 3D FFT communication in comparison to traditional 3D FFT routines (due to data redistribution) when used on routines (due to data redistribution) when used on more than 8-64 CPUsmore than 8-64 CPUs

• So do a distributed data FFT in the So do a distributed data FFT in the xx direction then direction then the the y y and finally the and finally the zz directions directions

• Disadvantage is that can not use the library routine Disadvantage is that can not use the library routine for the 1D FFT ( not quite true … )for the 1D FFT ( not quite true … )

• Scales quite well - e.g. on 512 procs, an 8x8x8 CPU Scales quite well - e.g. on 512 procs, an 8x8x8 CPU grid, a 1D FFT need only scale to 8 CPUgrid, a 1D FFT need only scale to 8 CPU

• Totally avoids data redistribution, exists as a stand Totally avoids data redistribution, exists as a stand alone F90 library.alone F90 library.

Daresbury advanced Fourier Transforms

U. Essmann, L. Perera, M.L. Berkowtz, T. Darden, H. Lee, L.G. Pedersen, J. Chem. Phys., (1995), 103, 8577

1. Calculate self interaction correction2. Initialise FFT routine (FFT – 3D FFT)3. Calculate B-spline coefficients4. Convert atomic coordinates to scaled fractional units5. Construct B-splines6. Construct charge array Q7. Calculate FFT of Q array8. Construct array G9. Calculate FFT of G array10. Calculate net Coulombic energy11. Calculate atomic forces

RD Scheme for long-ranged part of SPME

U. Essmann, L. Perera, M.L. Berkowtz, T. Darden, H. Lee, L.G. Pedersen, J. Chem. Phys., 103, 8577 (1995)

1. Calculate self interaction correction2. Initialise FFT routine (FFT – IJB’s DaFT: 3M2 1D FFT)3. Calculate B-spline coefficients4. Convert atomic coordinates to scaled fractional units5. Construct B-splines6. Construct partial charge array Q7. Calculate FFT of Q array8. Construct partial array G9. Calculate FFT of G array10. Calculate net Coulombic energy11. Calculate atomic forces

I.J. Bush, I.T. Todorov, W. Smith, Comp. Phys. Commun., 175, 323 (2006)

DD Scheme for long-ranged part of SPME

Simplest option:

ewald precision [] (tolerance)

Advanced option (for anisotropic systems, particularly with slabs):

ewald [] [kmax1] [kmax2] [kmax3]

with

= exp[-( rcut)2]/rcut (solve for )

= exp[-(km/2)2]km2 (solve for km)

kmax n=Lnkm/2 , where Ln is the MD box width in n direction

Choosing Ewald Sum Parameters