G.A.S. (Read, Write, Analyze, Manipulate): Generalized Atomic Systems: A Tool Kit for Atomistic Simulation Data

Michael Waters, Katie Sebeck

2/20/2013


Page 2

Overview

• Traditional Workflow in Molecular Dynamics
• Defining the Problem
• An Interchangeable Approach
• Aiding Analysis
• Current Usage

Page 3

Basics of Atomistic Simulations

• Atoms in boxes
    • Positions, updated by iteratively solving F = ma according to empirical force fields
    • Velocity
    • Type, charge, etc.
• System-wide data
    • Simulation box
    • Number of atoms
    • Temperature, energy, pair potentials…
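To make "iteratively solving F = ma" concrete, here is a minimal sketch of one common integration scheme, velocity Verlet; the slide does not say which integrator any particular code uses. The spring force is a placeholder standing in for an empirical force field, coordinates are one-dimensional for brevity, and the function names are hypothetical.

```python
def spring_forces(positions, k=1.0):
    """Placeholder force field: each coordinate tethered by a spring, F = -k x."""
    return [-k * x for x in positions]


def velocity_verlet(positions, velocities, masses, dt, n_steps, forces=spring_forces):
    """Advance 1D positions and velocities by n_steps steps of size dt."""
    f = forces(positions)
    for _ in range(n_steps):
        # Half-step velocity update using a = F / m.
        velocities = [v + 0.5 * dt * fi / m for v, fi, m in zip(velocities, f, masses)]
        # Full-step position update with the half-step velocities.
        positions = [x + dt * v for x, v in zip(positions, velocities)]
        # Recompute forces at the new positions, then finish the velocity update.
        f = forces(positions)
        velocities = [v + 0.5 * dt * fi / m for v, fi, m in zip(velocities, f, masses)]
    return positions, velocities


# With k = m = 1 the period is 2*pi, so 628 steps of 0.01 is about one oscillation.
x, v = velocity_verlet([1.0], [0.0], [1.0], dt=0.01, n_steps=628)
```

Velocity Verlet is a common choice because it is time-reversible and keeps the total energy bounded over long runs, which is why the oscillator above returns close to its starting point.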

Page 4

Molecular Dynamics Data

Input
• Initial system data
    • Coordinates, types, charges, mass
    • Interatomic bonds, angles
• Run-time instructions
• Interaction potential
    • Equations (form, coefficients)

Output
• System
    • Run data (CPU rate, memory usage)
    • System variables (pressure, stress, temperature)
• Per-atom
    • Atomic trajectories
    • Atomic characteristics (charge, type)
    • Force, PE, KE, stress
• Processed
    • Neighbor lists
    • Time averaged: mean squared displacement, radial distribution function

ALL molecular dynamics data can be contained in ASCII text files
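The time-averaged quantities listed above are computed from per-atom trajectories. Below is a minimal sketch of the mean squared displacement, MSD(t) = (1/N) Σ_i |r_i(t) − r_i(0)|². It assumes unwrapped coordinates stored as a list of frames, each a list of (x, y, z) tuples in the same atom order; this is a hypothetical in-memory layout, not a file format. It measures displacement from the first frame only, whereas production analyses usually also average over multiple time origins.

```python
def mean_squared_displacement(frames):
    """MSD of every frame relative to frame 0, averaged over atoms."""
    ref = frames[0]
    msd = []
    for frame in frames:
        total = 0.0
        for (x0, y0, z0), (x, y, z) in zip(ref, frame):
            total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        msd.append(total / len(ref))
    return msd


frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 3.0)],
]
print(mean_squared_displacement(frames))  # [0.0, 0.5, 4.0]
```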

Page 5

A Brief Guide to Atomistic File Types

pdb, xyz, mol, cfg, sfd, gro, mdl, LAMMPS read_data, ccm, xsd, cif, car…
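Of the formats listed, XYZ is among the simplest. It consists of an atom count, a free-form comment line, then one "element x y z" line per atom. Below is a minimal sketch of a single-frame reader and writer. The function names are hypothetical, and any columns beyond the fourth are ignored.

```python
def write_xyz(elements, coords, comment=""):
    """Return XYZ-format text for one frame."""
    lines = [str(len(elements)), comment]
    for el, (x, y, z) in zip(elements, coords):
        lines.append(f"{el} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


def read_xyz(text):
    """Parse one XYZ frame; returns (elements, coords, comment)."""
    lines = text.splitlines()
    n = int(lines[0].split()[0])
    comment = lines[1] if len(lines) > 1 else ""
    elements, coords = [], []
    for line in lines[2:2 + n]:
        fields = line.split()
        elements.append(fields[0])
        coords.append(tuple(float(v) for v in fields[1:4]))
    return elements, coords, comment


text = write_xyz(["Cu", "C"], [(0.0, 0.0, 0.0), (1.805, 1.805, 1.805)], "demo")
elements, coords, comment = read_xyz(text)
```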

Page 6

Through a Traditional Workflow

Generate input file(s)

Run simulation

Analyze output

Visualization/plotting analysis

• Control file
• Structure file
• Format depends on program

Structure file (LAMMPS read_data format):

n=16, 500 Chains, rho=0.7918

8000 atoms
3 atom types
7500 bonds
1 bond types
0 angles
0 dihedrals
0 impropers

0 92.055 xlo xhi
0 70.395 ylo yhi
0 37.905 zlo zhi

Masses

1 14.002
2 14.002
3 63.54

Atoms

1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
2 1 1 2.65313400000000 3.07841000000000 1.80500000000000

Control file (LAMMPS input script):

units real
timestep 1.0
atom_style bond
dimension 3
boundary p p p
#---------------Coordinates and Bonds --------------
lattice fcc 1.0
region 1 block -9.025 -1.805 0 70.395 0 37.905 #N=28
read_data n28lat
pair_style lj/cut 9.805
pair_coeff 1 1 0.1431 3.923
pair_coeff 2 2 0.1432 3.923
pair_coeff 3 3 4.72 2.616
pair_modify mix arithmetic
bond_style harmonic
bond_coeff 1 41.82 1.54
group alkane type 1 2
group copper type 3
neighbor 1.0 bin
thermo 1
thermo_style custom step temp pe ke etotal
#minimize 1.0e-4 1.0e-6 100 1000
fix hope all nve
run 100000
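Even a small structure file like the one above takes care to parse. It has header counts, box bounds, then keyword-delimited sections whose column layout depends on the atom style. Below is a minimal sketch that handles only what appears in that file, assuming atom_style bond (atom-ID, molecule-ID, atom-type, x, y, z). Bonds and other sections are skipped, and triclinic tilt factors are not handled, so it is not a complete read_data parser. The function names are hypothetical.

```python
def _is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_lammps_data(text):
    """Parse header counts, box bounds, Masses and Atoms (atom_style bond)."""
    data = {"counts": {}, "box": {}, "masses": {}, "atoms": []}
    section = None
    for raw in text.splitlines()[1:]:        # line 1 is a free-form title
        line = raw.split("#")[0].strip()     # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if not _is_number(fields[0]):        # keyword line such as "Masses"
            section = line
            continue
        if section is None:                  # header: counts and box bounds
            if fields[-1] in ("xhi", "yhi", "zhi"):
                data["box"][fields[-1][0]] = (float(fields[0]), float(fields[1]))
            else:
                data["counts"][" ".join(fields[1:])] = int(fields[0])
        elif section == "Masses":
            data["masses"][int(fields[0])] = float(fields[1])
        elif section == "Atoms":             # atom-ID molecule-ID type x y z
            atom_id, mol_id, atom_type = (int(v) for v in fields[:3])
            x, y, z = (float(v) for v in fields[3:6])
            data["atoms"].append((atom_id, mol_id, atom_type, x, y, z))
        # Other sections (Bonds, Velocities, ...) are skipped in this sketch.
    return data


SAMPLE = """n=16, 500 Chains, rho=0.7918

8000 atoms
3 atom types
7500 bonds
1 bond types

0 92.055 xlo xhi
0 70.395 ylo yhi
0 37.905 zlo zhi

Masses

1 14.002
2 14.002
3 63.54

Atoms

1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
2 1 1 2.65313400000000 3.07841000000000 1.80500000000000
"""

data = parse_lammps_data(SAMPLE)
```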

Page 7

Through a Traditional Workflow

Generate input file(s)

Run simulation

Analyze output

Visualization/plotting analysis

• Information about the simulation run is set in the control file
• Hardware and software version metadata; formatting depends on system configuration
• Produces output of overall run statistics

Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms

Pair time (%) = 1108.83 (31.5444)
Bond time (%) = 78.4225 (2.231)
Neigh time (%) = 162.274 (4.61645)
Comm time (%) = 1270 (36.1294)
Outpt time (%) = 523.248 (14.8856)
Other time (%) = 372.363 (10.5931)

Nlocal: 3344 ave 8049 max 0 min
Histogram: 16 0 0 0 0 0 2 6 3 5
Nghost: 7940.66 ave 15817 max 0 min
Histogram: 8 4 4 0 0 0 0 0 8 8
Neighs: 862976 ave 2.19776e+06 max 0 min
Histogram: 16 0 0 0 0 2 2 6 2 4
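Extracting numbers from a log like this is exactly the kind of one-off parsing routine this workflow demands. Below is a minimal sketch using regular expressions written against the lines shown above. More recent LAMMPS versions print the timing breakdown as a table instead, so these patterns are assumptions tied to this sample. The function name is hypothetical.

```python
import re

LOOP_RE = re.compile(
    r"Loop time of ([\d.]+) on (\d+) procs for (\d+) steps with (\d+) atoms")
# One line per timing category, e.g. "Pair time (%) = 1108.83 (31.5444)".
PART_RE = re.compile(r"^(\w+)\s+time \(%\) = ([\d.eE+-]+) \(([\d.eE+-]+)\)", re.M)


def parse_loop_timing(log_text):
    """Return the loop summary plus {category: (seconds, percent)}."""
    m = LOOP_RE.search(log_text)
    if m is None:
        raise ValueError("no 'Loop time of ...' line found")
    summary = {
        "loop_time": float(m.group(1)),
        "procs": int(m.group(2)),
        "steps": int(m.group(3)),
        "atoms": int(m.group(4)),
    }
    summary["breakdown"] = {
        name: (float(seconds), float(percent))
        for name, seconds, percent in PART_RE.findall(log_text)
    }
    return summary


LOG = """Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms

Pair time (%) = 1108.83 (31.5444)
Bond time (%) = 78.4225 (2.231)
Neigh time (%) = 162.274 (4.61645)
Comm time (%) = 1270 (36.1294)
Outpt time (%) = 523.248 (14.8856)
Other time (%) = 372.363 (10.5931)
"""

stats = parse_loop_timing(LOG)
```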

Page 8

Through a Traditional Workflow

Generate input file(s)

Run simulation

Analyze output

Visualization/plotting analysis

• Output files generally dictated by control file
    • Final structure file
    • System properties log
    • Other run-time analysis outputs
• HIGHLY VARIED FORMATTING!
• Quantitative analysis of output by scripting, MATLAB or Excel

Page 9

Through a Traditional Workflow

Generate input file(s)

Run simulation

Analyze output

Visualization/plotting analysis

• Output structure file may or may not be in a format that can be fed into visualization software
• Many software options available:
    • VMD
    • Avogadro
    • POV-Ray
    • VESTA
    • …
• Analysis output may or may not be in a format that can be parsed by plotting software

Page 10

An Endless Series of Parsing Problems

• Input file
    • Convert from something you can manipulate/generate to something the code can read
• Output analysis
    • Typically requires writing new parsing routines
    • Different codes require rewriting scripts
• Visualizations
    • May require extracting data from other files manually
    • Most visualization codes are already equipped to parse a variety of file types

Page 11

Data from Legacy Code

• Locally developed molecular dynamics code, FLX
• Trying to port data into another code, LAMMPS
• Ctrl+C, Ctrl+V and lots of manual editing…
    • Very time-consuming for each file

Page 12

Obstacles to Data Sharing and Reuse

• Energy barrier of converting file formats
    • Example: a file downloaded directly from the Protein Data Bank (.pdb) may not be readable by an MD code (LAMMPS)
• Extracting relevant quantities from available data sets
    • Parsing rules not always clear if unfamiliar with the format
    • Formats not always well documented

Page 13

Problem Statement

• Too much redundant work
• Too little documentation or code clarity
• Too much time spent manipulating data formats

How can we fix this?

Page 14

Our Approach: Interchangeable Libraries

• We created a Generalized Atomic System (GAS) class
• All file-read functions generate a GAS object
• GAS objects are accepted by:
    • Write-file functions
    • Analysis functions
    • Manipulation functions

G.A.S.
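One reason a shared object pays off is simple arithmetic. With N file formats, N readers plus N writers cover every pairwise conversion, instead of up to N(N-1) direct converters. Below is a minimal sketch of that hub pattern in Python. The AtomicSystem class, the registries and the two toy formats are hypothetical stand-ins, not the G.A.S. API.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicSystem:
    """Hypothetical stand-in for a GAS object: atom types and positions only."""
    types: list = field(default_factory=list)
    positions: list = field(default_factory=list)   # (x, y, z) tuples


READERS = {}   # format name -> function(text) -> AtomicSystem
WRITERS = {}   # format name -> function(AtomicSystem) -> text


def register(table, fmt):
    """Decorator that files a reader or writer under a format name."""
    def decorate(func):
        table[fmt] = func
        return func
    return decorate


@register(READERS, "xyz")
def read_xyz(text):
    lines = text.splitlines()
    rows = [ln.split() for ln in lines[2:2 + int(lines[0])]]
    return AtomicSystem([r[0] for r in rows],
                        [tuple(float(v) for v in r[1:4]) for r in rows])


@register(WRITERS, "xyz")
def write_xyz(system):
    rows = [f"{t} {x} {y} {z}" for t, (x, y, z) in zip(system.types, system.positions)]
    return "\n".join([str(len(rows)), ""] + rows) + "\n"


@register(WRITERS, "csv")
def write_csv(system):
    rows = [f"{t},{x},{y},{z}" for t, (x, y, z) in zip(system.types, system.positions)]
    return "\n".join(["type,x,y,z"] + rows) + "\n"


def convert(text, src, dst):
    """Every registered reader can feed every registered writer."""
    return WRITERS[dst](READERS[src](text))


print(convert("2\ncomment\nCu 0 0 0\nC 1.5 0 0\n", "xyz", "csv"), end="")
```

Adding a new format then means writing one reader and/or one writer, and every existing analysis or converter that accepts the shared object works with it immediately.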

Page 15

Examining Existing Standards for Commonalities

Positions, type, number of atoms

Page 16

Examining Existing Standards for Commonalities

Positions, type, number of atoms

Page 17

Examining Existing Standards for Commonalities

Positions, type, number of atoms / end of atoms section

Page 18

Creating a Common Data Structure

• GAS class contains:
    • System data
    • Internal functions
• Trivial ontology
    • Simplicity in data structure is flexibility
    • Internal functions should be as reliable as possible
    • Obvious and explicit naming schemes

Page 19

Ontological Details

GAS

System Data
    .number_of_atoms
    .x, .y, .z
    .atomic_number
    … and many more

Internal Functions
    .update_number_of_atoms
    .fill_id_list
    .sort_by_id
    … and many more
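The slide names only a handful of members. The sketch below shows how a container with those members might look. The attribute and method names come from the slide, but the method bodies and the per-atom id list are assumptions for illustration, not the actual G.A.S. code.

```python
class GAS:
    """Illustrative container; member names follow the slide, bodies are assumed."""

    PER_ATOM = ("id", "x", "y", "z", "atomic_number")   # assumed per-atom lists

    def __init__(self):
        self.number_of_atoms = 0
        self.id = []                                     # assumed ID list
        self.x, self.y, self.z = [], [], []
        self.atomic_number = []

    def update_number_of_atoms(self):
        """Assumed behavior: recount atoms from the coordinate list."""
        self.number_of_atoms = len(self.x)

    def fill_id_list(self):
        """Assumed behavior: give sequential 1-based IDs if the list is incomplete."""
        if len(self.id) != len(self.x):
            self.id = list(range(1, len(self.x) + 1))

    def sort_by_id(self):
        """Reorder every per-atom list consistently by ascending ID."""
        order = sorted(range(len(self.id)), key=self.id.__getitem__)
        for name in self.PER_ATOM:
            values = getattr(self, name)
            if len(values) == len(order):
                setattr(self, name, [values[i] for i in order])


system = GAS()
system.x, system.y, system.z = [2.0, 1.0], [0.0, 0.5], [0.0, 0.0]
system.atomic_number = [6, 29]
system.id = [7, 3]
system.update_number_of_atoms()
system.sort_by_id()
```

Keeping every per-atom quantity as a plain parallel list is one way to honor "simplicity in data structure is flexibility": any function that reorders atoms only has to apply the same permutation to each list.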

Page 20

User Time Savings

From read_data to xyz: timing comparisons
• Manual copy-paste, eliminating excess columns: 2.15 minutes
• Calling functions, including typing out calls: 1.05 minutes
• Actual function timing: ~6 seconds

Page 21

Aiding Analysis

• With all data in a standard structure:
    • Write all analysis based on this format
    • Analysis is independent of the input format
• Allows reuse of analysis functions
    • Reuse invites optimization
    • Intended reuse encourages documentation
• Nested analyses now possible
• Modularization saves time and effort and reduces errors
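As an example of a nested analysis, the sketch below reuses a minimum-image distance function inside a bond-length histogram. It assumes an orthorhombic periodic box and a bond list of atom-index pairs. The inputs and the function names are hypothetical.

```python
import math


def minimum_image_distance(p, q, box):
    """Distance between points p and q in an orthorhombic periodic box."""
    total = 0.0
    for a, b, length in zip(p, q, box):
        d = b - a
        d -= length * round(d / length)   # shift into [-L/2, L/2]
        total += d * d
    return math.sqrt(total)


def bond_length_histogram(positions, bonds, box, bin_width=0.1):
    """Histogram of bond lengths; keys are the lower edges of the bins."""
    counts = {}
    for i, j in bonds:
        r = minimum_image_distance(positions[i], positions[j], box)
        edge = round(math.floor(r / bin_width) * bin_width, 10)
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


positions = [(0.5, 0.0, 0.0), (9.66, 0.0, 0.0), (2.03, 0.0, 0.0)]
box = (10.0, 10.0, 10.0)
hist = bond_length_histogram(positions, [(0, 1), (0, 2)], box)
```

The first bond crosses the periodic boundary, so its length is about 0.84 rather than 9.16. The histogram gets that right without knowing anything about periodicity itself, because it delegates to the distance function.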

Page 22

Traditional Scripting Problems

• Scripts typically used for:
    • Quantitative analysis
    • Modifying files to be parsed by various software
• Rewriting input/output handling for each script
• MATLAB, sed, awk and grep are not the friendliest or fastest parsing tools
• Lack of commenting
• Can only be applied to specific file types or a single file

Page 23

Examples of Scripting

2.5 seconds

Page 24

The Python Version…

0.4 seconds

Once a function is written, it can be called in just a few lines on ANY GAS system containing sufficient information.
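In Python this kind of reuse comes naturally from duck typing. A function that only reads the attributes it needs works on any system object that carries them, whatever file it was read from. Below is a minimal sketch of a number-density profile along z. The attribute names z and box, and the box layout ((xlo, xhi), (ylo, yhi), (zlo, zhi)), are assumptions.

```python
from types import SimpleNamespace


def density_profile(system, n_bins):
    """Number density in equal-width slabs along z.

    Works on any object exposing .z (per-atom z coordinates) and
    .box = ((xlo, xhi), (ylo, yhi), (zlo, zhi)).
    """
    (xlo, xhi), (ylo, yhi), (zlo, zhi) = system.box
    width = (zhi - zlo) / n_bins
    slab_volume = (xhi - xlo) * (yhi - ylo) * width
    counts = [0] * n_bins
    for z in system.z:
        k = int((z - zlo) // width) % n_bins   # periodic wrap for stray atoms
        counts[k] += 1
    return [c / slab_volume for c in counts]


system = SimpleNamespace(z=[0.5, 1.5, 1.6, 3.9],
                         box=((0.0, 2.0), (0.0, 2.0), (0.0, 4.0)))
print(density_profile(system, 4))  # [0.25, 0.5, 0.0, 0.25]
```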

Page 25

CC BY-NC-SA http://www.flickr.com/photos/katieharbath/

Page 26

User Time Savings

• Open-source and custom function libraries, instead of MATLAB, allow brute-force parallelization and shifting of load to external resources
• Faster run times: 2.5 s using bash versus 0.4 s in Python
• Faster coding times
    • Functions can be reused without additional modifications
    • Redundant coding efforts are eliminated
• Use of a common language promotes code reusability
    • Writing code for our “future” selves as well as for others
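Brute-force parallelization over a series of files needs only the standard library, since each file is analyzed independently. Below is a minimal sketch with concurrent.futures.ProcessPoolExecutor, where the per-frame analysis is a hypothetical placeholder. In practice each task would more likely receive a file path and read the file inside the worker. The __main__ guard is needed on platforms that start workers by spawning a fresh interpreter.

```python
from concurrent.futures import ProcessPoolExecutor


def count_atoms_in_xyz(text):
    """Placeholder per-file analysis: the atom count on line 1 of an XYZ frame."""
    return int(text.splitlines()[0])


def analyze_series(texts, analysis, workers=None):
    """Apply `analysis` to every frame; results keep the input order.

    workers=1 runs serially in-process; otherwise a process pool is used
    (workers=None lets the executor pick a default worker count).
    """
    if workers == 1:
        return [analysis(t) for t in texts]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analysis, texts))


if __name__ == "__main__":
    frames = [f"{n}\nframe\n" + "C 0 0 0\n" * n for n in (1, 2, 3)]
    print(analyze_series(frames, count_atoms_in_xyz, workers=2))  # [1, 2, 3]
```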

Page 27

Ways We’re Using GAS

• Polymerization
    • Analyze pair-pair distances
    • Alter system topology
    • Automatically generate a system-readable file
• Iterative system analysis
    • Quantitative analysis of a series of files
    • Radial distribution functions
    • Density profiles
    • Bond length distributions
    • Automatically generates easily parsed output files
• Automatic movie rendering
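Radial distribution functions are among the per-file analyses listed. Below is a minimal sketch of g(r) for one frame. It assumes a single species in an orthorhombic periodic box, with r_max no larger than half the shortest box edge. Each shell's pair count is normalized by the count expected for an ideal gas at the same density. The O(N²) pair loop is fine for a sketch but slow for large systems, and the function name is hypothetical.

```python
import math


def radial_distribution(positions, box, r_max, n_bins):
    """Return (bin centers, g(r)) for one frame of (x, y, z) tuples."""
    n = len(positions)
    volume = box[0] * box[1] * box[2]
    dr = r_max / n_bins
    hist = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            r2 = 0.0
            for a, b, length in zip(positions[i], positions[j], box):
                d = b - a
                d -= length * round(d / length)       # minimum image
                r2 += d * d
            r = math.sqrt(r2)
            if r < r_max:
                hist[min(int(r / dr), n_bins - 1)] += 1
    pair_density = n * (n - 1) / 2 / volume           # ideal-gas pairs per volume
    g = []
    for k, count in enumerate(hist):
        shell = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        g.append(count / (pair_density * shell))
    r_mid = [(k + 0.5) * dr for k in range(n_bins)]
    return r_mid, g


# 3 x 3 x 3 simple cubic lattice, spacing 1, in a periodic box of edge 3.
lattice = [(float(i), float(j), float(k))
           for i in range(3) for j in range(3) for k in range(3)]
r_mid, g = radial_distribution(lattice, (3.0, 3.0, 3.0), r_max=1.5, n_bins=3)
```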

Page 28

Automatic Movie Rendering

Page 29

System Manipulation: Unwrapping Coordinates
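Unwrapping undoes periodic wrapping so that atoms crossing a box boundary trace continuous paths, which displacement analyses and movies need. Below is a minimal sketch for an orthorhombic box. It assumes no atom moves more than half a box length between consecutive frames, which is the usual assumption when image flags are not stored. The function name and data layout are hypothetical.

```python
def unwrap_trajectory(frames, box):
    """Unwrap frames (lists of (x, y, z) tuples) in an orthorhombic box."""
    unwrapped = [list(frames[0])]
    for prev_wrapped, cur_wrapped in zip(frames, frames[1:]):
        new_frame = []
        for p_w, c_w, p_u in zip(prev_wrapped, cur_wrapped, unwrapped[-1]):
            coords = []
            for a, b, u, length in zip(p_w, c_w, p_u, box):
                d = b - a
                d -= length * round(d / length)   # shortest periodic displacement
                coords.append(u + d)              # accumulate onto unwrapped position
            new_frame.append(tuple(coords))
        unwrapped.append(new_frame)
    return unwrapped


# One atom drifting in +x across the boundary of a box of edge 10.
frames = [[(9.5, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(1.5, 0.0, 0.0)]]
path = unwrap_trajectory(frames, (10.0, 10.0, 10.0))
```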

Page 30

Moving Forward

• More file formats
• More advanced analysis methods and functions
• Density functional theory support
• Non-spherical particles
• Collaboration with other groups
• Better metadata integration

Page 31

Final Thoughts

• Our lives are much better
• Our code is much more consistent
• Future users have a hope of understanding what we did
• If you want people to use it, it needs to be USEFUL and EASY

G.A.S.