EGEE-III INFSO-RI-031688 Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Ab...
-
Upload
charity-pope -
Category
Documents
-
view
217 -
download
1
Transcript of EGEE-III INFSO-RI-031688 Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Ab...
EGEE-III INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Ab initio electronic structure calculations on the Grid
M. Sterzel – ACC “Cyfronet” AGH
COST Training School on Molecular and Material Science Grid Applications
Trieste, 30 March 2010
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 2
Outlook
• Aim of this talk– To demonstrate benefits of Grid computing in chemistry
• Parts of the talk topics– Work on the Grid vs. cluster – comparison– Chemical Software packages available on the Grid
Parallel execution
– Typical Chemical Computations on the Grid Geometry optimizations Numerical Frequencies Chemical reactions
– In silico Lab– Final Remarks
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
What is a/the Grid ?
• A Grid is NOT– The Next generation Internet– A new Operating System– Just
(a) a way to exploit unused cycles (b) a new model of parallel computing (c) a new model of P2P networking
• Definition (Vaidy Sunderam)– A paradigm/infrastructure that enables the sharing, selection, & aggregation of
geographically distributed resources (computers, software, data(bases), people) [share (virtualized) resources]
– . . . depending on availability, capability, cost– . . . for solving large-scale problems/applications– . . . within virtual organizations [multiple administrative domains]
• Grid Checklist (Ian Foster): A Grid is a system that– Coordinates resources that are not subject to centralized control– Uses standard, open, general purpose protocols and interfaces– Delivers nontrivial qualities of service
3
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 4
Authorization
• On a cluster - passwordokrucyusz:~ sterzel$ ssh ymsterze@uiLast login: Wed Mar 5 18:40:24 2008 from ha2rtr.agh.edu.pl
[ymsterze@ui ymsterze]$
• On the grid - certificate (Grid identity card)– User has to be a member of Virtual Organization (at least one)– Virtual Organization determines the resources user can use– To access grid resources user needs to obtain proxy:
[ymsterze@ui ymsterze]$ voms-proxy-init -voms gaussianYour identity: /C=PL/O=GRID/O=Cyfronet/CN=Mariusz SterzelEnter GRID pass phrase:Creating temporary proxy .............................. DoneContacting voms.cyf-kr.edu.pl:15001[/C=PL/O=GRID/O=Cyfronet/
CN=voms.cyf-kr.edu.pl] "gaussian" ...................DoneCreating proxy ........................................ DoneYour proxy is valid until Thu Mar 6 07:20:50 2008
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
EGEE
5Mariusz Sterzel CGW'08 Kraków, 13 October 2008 5
EGEE
ArcheologyAstronomyAstrophysicsCivil ProtectionComp. ChemistryEarth SciencesFinanceFusionGeophysicsHigh Energy PhysicsLife SciencesMultimediaMaterial Sciences…
>250 sites48 countries>150,000 CPUs>50 PetaBytes>15,000 users>150 VOs>150,000 jobs/day
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 6
Job file
• Grid JDL file
Executable = "/bin/bash";
Arguments = $VO_GAUSSIAN_SW_DIR/g03/gaussian.x water.com";JobType = "MPICH";NodeNumber = 4;
StdOutput = "water.out";
StdError = "water.err";
InputSandbox = {"water.com"};
OutputSandbox= {"water.out",
"water.log", "water.err"
};
• PBS file
#!/bin/bash#PBS -l ncpus=4#PBS -q long#PBS -o water.out#PBS -e water.err#PBS -N my_job_name#PBS -M my@email#PBS -m eexport g03root=/somewhere. $g03root/g03/bsd/g03.profile$g03root/g03 water.com
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 7
Job management
• PBS– qsub – submit job to a queue– qdel – delete job from a queue– qstat – show job status in a queue
• EGEE Grid– glite-wms-job-submit – submit job to the Grid– glite-wms-job-delete – remove job from the Grid– glite-wms-job-status – show status of the job on the Grid– glite-wms-job-output – retrieve job files from the Grid
… just a few new commands…
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 8
• Freely available packages on EGEE Grid:
• Commercial packages on EGEE Grid:– Gaussian– Turbomole– Wien2k
Chemical software
– GAMESS
– DALTON
– CPMD
– Newton X
– DL_POLY
– NAMD
– RWAVEP
– GROMACS
– Autodock
– Tinker
– Solvate
– PIC-DMSC
– MCGBgrid
– QMC
– ABCtraj
– VENUS
– CRBS
– LM
– COLUMBUS
– DINX
– Abinit
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 9
Gaussian VO
• Why Gausian?– Large number of computational methods implemented– One of the first ab initio codes– The most popular among communities– User friendly– Available for many platforms along with GUI
• Gaussian VO– Invented and operated by ACC CYFRONET– All license issues confirmed with Gaussian Inc,– Open for every EGEE user– Any computing centre with site Gaussian license may support it
(4 supporting centres, another 3 in the line)– 30+ users since the start in September 2006– VO manager – Mariusz Sterzel ([email protected])– Enabled for parallel execution up to 8 processors
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 10
Turbomole – an alternative
Advantages:– Probably the fastest B3LYP implementation– Analytical gradients for excited states at DFT and CC2 levels– Variety of fitting approaches speeding up calculations– Very well scalability during parallel execution– Extremely fast and very well parallelised (ri)CC2 and (ri)MP2
Disadvantages:– Limited number of DFT functionals (only “good” ones available)– Lack of parallel version of analytical second derivatives– Lack of parallel version of TDDFT– Only NMR chemical shifts implemented, no spin-spin couplings
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 11
Gaussian VO – participation
As a user:• Register at:
https://voms.cyf-kr.edu.pl:8443/voms/gaussian• Wait for VOMS admin acceptance• voms-proxy-init --voms gaussian and you are ready to use the
program...
As a participating centre:• Just sent an e-mail concerning participation to VO manager• After confirmation of the license status at your centre with Gaussian Inc,
detailed information concerning set-up will be sent back to you
More details at:http://egee.grid.cyfronet.pl/Applications/gaussian-vo/
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 12
Grid and parallel execution
Past:– Serial jobs only– Job of MPICH type always enforced execution of mpirun
Present– mpirun no longer enforced– Instead a wrapper script mpistart can be executed which will
automatically set up environment for required MPI flavour– No possibility to request desired # of processors on a WN
… Unfortunately not all sites are set up…
Current work– Support for OpenMP jobs
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 13
Chemical software
• “Old codes” – mostly written in FORTRAN• Serial – parallel execution added later
(with exceptions)
• Different parallelization models used• Low scalability in many cases• Only selected computational methods parallelized
… all that makes parallel grid ports of chemical software even more complicated
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 14
Selected cases
• Gaussian– Parallelization via OpenMP or Linda– OpenMP – SMP machines or multiprocessor/core clusters up to # of
processors/cores on WN– Linda – allows the parallel execution between nodes. Requires equal #
of processors for each WN– For Linda additional expenses required (commercial package),
available only for specific platforms• Turbomole
– Uses MPI – currently HPMPI– No specific requirements
• GAMESS– Uses sockets but MPI execution possible (a little slower but more
convenient for execution on a grid)• ADF
– Uses MPI (MPICH, OpenMPI, HPMPI, …)– One of the best parallelized QC codes // to my knowledge ;-)
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 15
Grid solution
• Gaussian– Parallel execution via OpenMP on a single WN up to # of
processors/cores available on that worker node– Necessarily queue system set-up requires a Site admin help– Torque set-up:
Modification of /var/spool/pbs/torque.cfg
to: SUBMITFILTER /var/spool/pbs/submit_filter.pl
– Other settings -- typical Job has to be of MPICH type # of processors controlled via NodeNumber variable Gaussian %Nproc route is automatically set-up by script executing
Gaussian
– Execution with 8 processors per job possible.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 16
Sample script
Executable = “/bin/sh”;
Arguments = “$GAUSSIAN_SW_DIR/gaussian.run myfile.com”;
JobType = “MPICH”;
NodeNumber = 8;
InputSandbox = {“myfile.com”};
StdOut = “myfile.out”;
StdErr = “myfile.err”;
OutputSandbox = {“myfile.log”, “myfile.chk”,
“myfile.out”, “myfile.err”};
Requirements = other.GlueCEUniqueID==“ce.cyf-kr.edu.pl”
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 17
Grid Solution cont.
• Turbomole– No special set up except shared directory needed, # of
processors automatically discovered by Turbomole scripts
• NAMD– Similar to Turbomole. If the NAMD executing script was set-up
properly during installation the necessarily “node file” is created every time program is executed
• GAMESS– Depends on Grid port– In case of MPI no additional input needed– DDI case – may require WN reconfiguration especially if large
DDI memory is requested by a job
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
PL-Grid – Polish Grid Infradtructure
• In addition to above:– CPMD– Cfour– ACES III– Amber– OOMMF– ADF– Amber– Cadence– Fluent– ... any aplication needed by users...
18
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 19
Execution benchmarks
Scheduling time– MPI jobs
4 proc./job -- usually less than hour 8 proc./job -- waiting time even 3-4 hour
– OpenMP jobs Job waiting time much longer, heavily depends on site overload
• 4 proc./job -- from less than hour up to 6 hours
• 8 proc./job -- in some cases job waiting time exceeds 12 hours
Parallel job execution can be inefficient in case of short (less than 24 h) jobs
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 20
Grid application in chemistry
• Tasks to which Grid can be applied directly:– Conformational analysis– Numerical frequency computations– Zero Point Vibrational Averaging– Property computations for series of geometries from Molecular
Dynamics simulation– Determination of chemical reaction paths– Determination of potential energy surfaces (PES)
… all kind of “brute force” tasks, or tasks which operate on huge data sets
• Other tasks– Computations need to to be planned in order to maximize
benefits from the grid computing
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 21
An example
Geometry optimization:– Steep potential
Few steps needed
– Flat potential Many steps needed Energy and gradient
convergence have to be increased to high values
Start
Stop
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 22
PCP complex
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 23
PCP cont.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 24
PCP cont.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 25
PCP cont.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 26
PCP cont.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 27
PCP cont.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 28
PCP cont.
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 29
Plan
• Computations of the whole molecule are not possible• System needs to be modeled
– For this we need: Chlorophyll A and peridinin ground and excited states geometries
and normal modes The geometry of whole complex A little programming (tetramer model, energy transfer model)
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 30
Plan
• Computations of the whole molecule are not possible• System needs to be modeled
– For this we need: Chlorophyll A and peridinin ground and excited states geometries
and normal modes The geometry of whole complex A little programming (tetramer model, energy transfer model)
… a lot of luck
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 31
First step – Peridinin
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 32
First step – Peridinin
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 33
Peridinin geometry
• A long peridinin chain makes usual gradient based optimization inefficient
• Instead we propose following scheme:– MD for peridinin (force field level)– Geometry preoptimization for series of MD snapshots (semi
empirical or ~ RHF/STO-3G level)– “Final” geometry optimization for few lowest energy MD
snapshots (CASSCF/PT2 level)– Verification of the minima by frequency calculations– Excited state geometry optimization with ground state as a
starting point– Again, minima verification via vibrational analysis– Verification of obtained data by comparison with experiment
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 34
H—CN CN—H
Chemical reactions
... a process that results interconversion of molecules
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 35
Chemical reactions
Points of interest:– Structure of substrates and products– Structure of active complex at TS– Reaction path
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 36
Chemical reactions
Points of interest:– Structure of substrates and products– Structure of active complex at TS– Reaction path(s)
Reactionpath
substrate(s) product(s)
Transition State (TS)
Energy
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 37
Chemical reactions
Energy
substrate(s)product(s)
A+B C
DIntermediateproduct
TS1
TS2
E(TS1) > E(TS2)
Reactionpath
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 38
Chemical reactions
Energy
Possible reaction paths;More intermediate products
Reaction path
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 39
Chemical reactions
Main problem – TS determination
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 40
Chemical reactions
Main problem – TS determination
Artur MichalakArtur Michalak
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 41
Chemical reactions
Main problem – TS determination
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 42
Chemical reactions
Main problem – TS determination
E
‘reaction coordinate’
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 43
Chemical reactions
Main problem – TS determination
‘reaction coordinate’
E
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 44
Chemical reactions
TS verification – vibrational analysis – one imaginary frequency!
i1203 cm-1
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 45
Sample calculations
N2O braking on oxide surfaces – possible mechanisms
• Electron transfer
6. (O-)--X+--(O-) X + O2
1. N2O + X 2. N2O(s) + X X+--N2O-
4. X+--N2O- X+--(O-) + N25. (O-)--X+--(O-)
e- transfer
Filip ZasadaFilip Zasada
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 46
Sample calculations
N2O braking on oxide surfaces – possible mechanisms
• An Oxygen transfer
1. N2O(g) N2O(s) 2. N2O(s) + X X--O--N2
4. O-X-O 5. O--O-X 6. X + O2
4. X-O
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 47
Sample calculations
Vac
uu
m
Oxi
de
slab
Adsorbed N2O
Slab construction
Oxi
de
slab
Matrix generation
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 48
Sample calculations
• Reaction paths:
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 49
Sample calculations
• Reaction paths:
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 50
Sample calculations
Computational details:– Gaussian 03 D.01– BP86 functional– Basis set of double-ζ quality
Timings:– CPU time for SP calculation – approx 7 hours– 15 paths – 10-50 energy points on each path– In total about 250 energy points calculated
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 51
Conformational searchers
• To determine lowest energy structure
(1R,2R)-1,2-bis-(N’-cyklohexy-1’,4’,5’,8’-naphthalenetetracarboxydiimide)cyklohexane
• Possible issues– For short time jobs
it is convenient to
group several geometries
in to one job to minimize
average job scheduling time
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 52
Conformational searches
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 53
Harmonic frequencies
Numerical frequency computations for lycopene
– 96 atoms, 2·3·96+1=577 independent computation steps– Software: GAMESS version June 2005– VOCE VO resources used– Methodology: B3LYP, cc-pVDZ basis set– Computations done on approx. 100 processors (ia64 and i386)– CPU time for single computation:
Intel Xeon 2.8Gh - 14h 39’ Intel Itanium 2 1.3Gh - 12h 8’
– Total time: single CPU - 330 days (estimated) EGEE Grid - 3 days
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
In silico Lab• Computational chemists use first
principles chemistry packages available on the Grid (Gaussian, GAMESS, TURBOMOLE)
• Grid is still mainly available through command line interfaces (voms-proxy-init, glite-wms-job-submit, glite-wms-job-status)
• Adoption to the Grid environment by non-experts is difficult
• Lets build a web-based problem solving environment facilitating the use of Grid … but not only that
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
State-of-the-art
• WebMO
– Desktop and Web access
– Supports main computational chemistry applications
– Does not support Grid or local queues infrastructures
• ECCE – Extensible Computational Chemistry Environment
– Only desktop access
– Supports main computational chemistry applications
– Supports many queue management systems (PBS, LSF, NQE, etc.)
– Does not use Grid infrastructures
• CCG – Computational Chemistry Grid
– A virtual organization which is a part of Teragrid
– GridChem – Java client which can be run as a WebStart application
• P-GRADE – Framework for Development and Execution of Parallel Applications
– A Grid-oriented solution
55
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
Description of the solution - requirements
• Supports main chemistry packages
– Gaussian
– GAMESS
– TURBOMOLE
– ...
• Is accessible through a web interface
– All Grid-related operations should be embedded (proxy generation, job submission and status monitoring, LFC catalog operation)
– Persists information about executed jobs between web sessions
• Enables user-centric processing rather than grid-centric
– It should not be yet another Grid job submission tool
– Supports inter-application geometry passing
– Provides automated report summaries
– Covers Grid complexity
• Supports annotations through user free-text tagging
56
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688
Description of the solution – architecture (1/2)
• Uses gLite for Grid job management
– A set of Grid APIs is used (LFC, WMS, VOMS)
• Job submission and monitoring implemented as a separate layer for better error handling
– GridSpace platform used
– Each application backed by a separate script
• Application model realized by using SINT (Semantic Integration Tool)
– Basic concepts such as geometry, basic and detailed reports, input parameters, annotations are modeled
• Interactive graphical user interfaces are used
– GWT (Google Web Toolkit) is used as the user front-end technology
57
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 58
Description of the solution – architecture (2/2)
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 59
Demo
Switching to demo ...
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 60
References
T. Gubala, M. Bubak, P.M.A. Sloot: Semantic Integration of Collaborative Research Environments, In: M. Cannataro (Ed.) Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, 2009, IGI Global
Marian Bubak et al.,: Virtual Laboratory for Collaborative Applications, In: M. Cannataro (Ed.) Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, 2009, IGI Global
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-031688 61
Summary
• Access to the Grid is easy, does not differ to much from queue system usage
• EGEE Grid offers variety of software packages for chemical computations. A parallel execution can be made as simple for the user as a serial execution is now
• It is always possible to find solution for parallel execution even if computational platform does not directly support certain parallelization model
• With a little of planning every computational chemistry task may benefit from the Grid platform