Looking for a (standard) Common Format for (Quantum) Computational Chemistry
A WG activity within COST action 23 (WG D23/0006/01)
• Elda Rossi, Andrew Emerson (CINECA)
• Gian Luigi Bendazzoli, Antonio Monari (Università di Bologna)
• Renzo Cimiraglia, Celestino Angeli, Stefano Borini (Università di Ferrara)
• Daniel Maynau, Stefano Evangelisti (IRSAMC, Toulouse)
• José Sanchez-Marin (Universitat de Valencia)
• Peter Szalay (Eötvös Loránd University)
• Rosa Caballol (Universitat Rovira i Virgili, Tarragona)
Motivation · Vocabulary · Wrappers
Motivation for the work
To build a meta-system for supporting research collaboration in the field of
“Localised Orbitals in post-SCF methods …
Linear Scaling methods in a Multi-Reference context”
The scenario
• Different laboratories need to collaborate
• Different "home-made" codes need to be used together, since they give different views of the same problem
• General-purpose "basic" codes are needed to pre-compute data in a sort of pipeline
• Programmes should remain on their original sites, under the responsibility of their authors
• Different platforms
• Network connections (grid architecture)
• Workflow
The need of a Common Format
The first problem we faced: how can different codes (on different platforms) communicate?
We need a Common Format for (at least) Quantum Chemistry codes.
Preliminary steps
Looking around …
o CML has been available for a long time
o XML is used by Accelrys for internal files
o XML is used by ArgusLab for internal files
None of them is completely suited to computational chemistry: they mainly cover structural chemistry, with no Quantum Chemistry properties.
XML seems the best technology, so we decided to try another XML-based format.
HDF5 looked nice for storing the large binary data typical of QC.
How the engine should work
[Diagram elements: Data Repository (XML/HDF), IN-wrapper, IN-files, Program, OUT-files, OUT-wrapper]
• Leaves the program unchanged
• One wrapper for each program: if a code is added, only one wrapper has to be written
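The wrapper scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repository is a plain dictionary standing in for the XML/HDF data repository, and the "program" is a hypothetical toy function rather than a real chemistry code. The names WrapperPair, run_step, run_chain and toy_sum are invented for the example.

```python
from typing import Callable, Dict, NamedTuple

Repository = Dict[str, object]  # stand-in for the XML/HDF data repository


class WrapperPair(NamedTuple):
    in_wrapper: Callable[[Repository], str]         # repository -> native input text
    run: Callable[[str], str]                       # native input -> native output text
    out_wrapper: Callable[[str, Repository], None]  # native output -> repository


def run_step(pair: WrapperPair, repo: Repository) -> None:
    """Run one program through its wrappers; the program itself is untouched."""
    native_input = pair.in_wrapper(repo)
    native_output = pair.run(native_input)
    pair.out_wrapper(native_output, repo)


# Hypothetical toy "program": sums the numbers found on one input line.
def toy_in(repo: Repository) -> str:
    return " ".join(str(x) for x in repo["values"])


def toy_run(text: str) -> str:
    return f"TOTAL {sum(float(x) for x in text.split())}"


def toy_out(text: str, repo: Repository) -> None:
    repo["total"] = float(text.split()[1])


# Adding a new code to the meta-system means adding one entry here.
registry = {"toy_sum": WrapperPair(toy_in, toy_run, toy_out)}


def run_chain(names, repo: Repository) -> Repository:
    """Run the named programs in order, passing data through the repository."""
    for name in names:
        run_step(registry[name], repo)
    return repo
```

Because every program only talks to the repository, a chain of programs never needs a program-to-program converter.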
QCML: an XML format for QC
In order to be as general as possible we need to write down a hierarchical schema of Quantum Chemistry quantities
As a first approximation three domains can be identified
• FACTS (base facts): initial data describing the physics of the system
• DERIVED (derived facts): quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
• W-FLOW (parameters): which codes are in the pipeline, specific input data, …
Definitions:
• A base fact is a fact that is a given in the world and is remembered (stored) in the system.
• A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.
FACT: molecule
<system title date program author>
  <molecule nElectrons charge spinMultiplicity spaceSymmetry>
    <symmetry groupName/>
    <geometry type unit numAtoms symmetryRef>
      <atom symbol isotope x3 y3 z3/>
    <basis name type numOrbitals>
      <atomBase angularMomMAX symbol>
        <angularMom value symbol numOrbitals>
          <orbital id numPrimitives>
            <exps/>
            <coeffs/>
• Symmetry: group name and other symmetry data
• Geometry: Cartesian only, full or symmetry-unique
• Basis: by name or fully defined
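As an illustration of the molecule FACT, a document of this shape could be produced with Python's standard xml.etree.ElementTree module. The element and attribute names (system, molecule, geometry, atom, x3, y3, z3, …) come from the slide; the attribute values ("cartesian", "angstrom"), the function name build_molecule_fact, and the approximate water coordinates are assumptions made only for this sketch, and the result has not been checked against the QCML schema.

```python
import xml.etree.ElementTree as ET


def build_molecule_fact(title, atoms, charge=0, multiplicity=1, unit="angstrom"):
    """Build a <system>/<molecule>/<geometry> tree from (symbol, (x, y, z)) pairs."""
    system = ET.Element("system", title=title)
    molecule = ET.SubElement(
        system, "molecule",
        charge=str(charge), spinMultiplicity=str(multiplicity),
    )
    geometry = ET.SubElement(
        molecule, "geometry",
        type="cartesian", unit=unit, numAtoms=str(len(atoms)),
    )
    for symbol, (x, y, z) in atoms:
        # XML attributes are strings, so coordinates are serialised with repr().
        ET.SubElement(geometry, "atom", symbol=symbol,
                      x3=repr(x), y3=repr(y), z3=repr(z))
    return system


# Illustrative, approximate geometry (not taken from the slides).
water = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (0.0, 0.757, 0.587)),
    ("H", (0.0, -0.757, 0.587)),
]
doc = build_molecule_fact("water", water)
print(ET.tostring(doc, encoding="unicode"))
```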
DERIVED data: computedData
<system …>
  <computedData>
    <energy unit levelOfTheory quality value>
      <state spaceSymmetry spinMultiplicity excitationLevel/>
    <property unit levelOfTheory quality value>
      <state "bra" spaceSymmetry spinMultiplicity excitationLevel/>
      <state "ket" spaceSymmetry spinMultiplicity excitationLevel/>
      <operator order name/>
    <file address URL/>
A “schema” has been written for QCML
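A consumer of the DERIVED domain might read energies and file references roughly as follows. This is a sketch, not the project's code: the sample document is invented, its numeric values and attribute values ("hartree", "SCF", "FCI", "converged", "integrals.h5") are placeholders with no physical meaning, and only the element and attribute names are taken from the slide.

```python
import xml.etree.ElementTree as ET

# Invented sample; values are placeholders, not computed results.
SAMPLE = """<system title="demo">
  <computedData>
    <energy unit="hartree" levelOfTheory="SCF" quality="converged" value="-1.5">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
    <energy unit="hartree" levelOfTheory="FCI" quality="converged" value="-1.75">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
    <file address="integrals.h5"/>
  </computedData>
</system>"""


def energies_by_level(xml_text):
    """Return {levelOfTheory: value} for every <energy> element."""
    root = ET.fromstring(xml_text)
    return {e.get("levelOfTheory"): float(e.get("value"))
            for e in root.iter("energy")}


def referenced_files(xml_text):
    """Return the address of every <file> element (large data lives elsewhere)."""
    root = ET.fromstring(xml_text)
    return [f.get("address") for f in root.iter("file")]
```

The XML carries only small, structured values; bulky arrays are referenced through the file element rather than embedded.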
DERIVED : computedData/file
Two possible strategies:
1. Leave data in their native format and translate them only when needed. Maintain different versions (formats) of the same data.
2. Define a "standard" format for binary data and convert them anyway.
Problem with large binary datasets: include the reference, not the actual data.
The second was the solution of choice; HDF5 appears to be a good solution.
HDF Mission
To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
• Format and software for scientific data
• Stores images, multidimensional arrays, tables, etc.
• Emphasis on storage and I/O efficiency
• Free and commercial software support
• Emphasis on standards
• Users from many engineering and scientific fields
Example HDF5 file
[Diagram: root group "/" with groups "/AO" and "/MO"; inside them "/mono" (Overlap, Kinetic, Repulsion, Kinetic+Repulsion, Property), "/bi" (4-D array) and "/coefficients"; "/MO" also carries the table below]
Orb | occ | energy
----|-----|-------
1   | 0   | 0.35
2   | 0.5 | 0.26
3   | 2.  | 0.69
HDF file structure for QC
Root
  AO
    <i/j>
    <i/T/j>
    <i/Vnuc/j>
    <i/T/j>+<i/Vnuc/j>
    <ij/kl>
  MO
    <i/T/j>
    <i/V/j>
    <i/T/j>+<i/Vnuc/j>
    <ij/kl>
    coeff(i,j)
  Property
    <i/p/j>
Attributes: Name, QCML_ref, Norb; Norb, Spin Polar., Orb Classif (Core, Active, Virtual), Orb Energies, Orb Symm; [1-order]
+ format metadata (integer, binary, Endian-ism, …)
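To make the sizes behind this tree concrete, the sketch below maps dataset paths to array shapes for a given number of orbitals, using only the standard library. The path names ("/AO/mono/overlap", "/MO/bi", …) and the function names are hypothetical and chosen for illustration; the slide only fixes the groups and the kinds of integrals. Arrays are assumed to be stored unpacked (no permutational symmetry) in 8-byte reals.

```python
def qc_layout(norb):
    """Map hypothetical dataset paths to array shapes for `norb` orbitals."""
    one_electron = (norb, norb)   # <i/j>, <i/T/j>, <i/Vnuc/j>, coeff(i,j)
    two_electron = (norb,) * 4    # <ij/kl>, the 4-D array
    return {
        "/AO/mono/overlap": one_electron,
        "/AO/mono/kinetic": one_electron,
        "/AO/mono/nuclear": one_electron,
        "/AO/bi": two_electron,
        "/MO/mono/kinetic": one_electron,
        "/MO/mono/nuclear": one_electron,
        "/MO/bi": two_electron,
        "/MO/coefficients": one_electron,
    }


def size_in_bytes(shape, itemsize=8):
    """Storage needed for a dense array of this shape (8 bytes per element by default)."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


layout = qc_layout(100)
print(size_in_bytes(layout["/AO/mono/overlap"]))  # one-electron matrix
print(size_in_bytes(layout["/AO/bi"]))            # two-electron 4-D array
```

For 100 orbitals a one-electron matrix takes 80 000 bytes, while the unpacked two-electron array takes 800 000 000 bytes, which is why such data is kept in a binary container and only referenced from the XML.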
QCML processing: wrappers
• One pair of wrappers for each code in the metasystem
• They should be written and maintained by the authors of the chemical codes
• XML processing can be used (DOM), but in what language?
o Fortran: no easy and stable DOM available
o Scripting languages (Perl/Python/Java): not known by chemists
We tried both ways (Fortran & Python)
Fortran DOM: drawbacks
• The only problem is the Fortran binding
o It doesn't exist (at least as of last year …)
o DOM is OO and Fortran is not
• A C binding exists (Gdome2)
• Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux)
• We are currently converting it to Fortran, by adopting the DOM recommendations (simplified …)
Why Fortran
GOOD
• Users don't need to learn a new language
• Homogeneous environment
BAD
• Tricky: needs an external library (f77xml) built on top of gdome2
• Porting problems for gdome2/libxml2 may arise
F77xml library
• Still in development
o v0.4 is out (experimental, with limited features)
o v1.0 upcoming, API changed to be nearly DOM2 compliant
• Written in C on top of gdome2: http://gdome2.cs.unibo.it/index.html
• Designed for interfacing to F77 (also F90 soon)
• Reduced namespace pollution
Cons:
● F77 syntax is difficult (DOM2 + tricks)
● F90 syntax is simpler
● A pre-processor will convert F90 syntax to F77
http://freshmeat.net/projects/f77xml
F77xml library - V1.0 example
Gdome2 (C):
  GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);
F90:
  Call f77xml_el_firstChild(nodeCode, elemCode, exc)
  • First position: return value
  • nodeCode, elemCode, exc mapped to INTEGER
F77:
  Func='el_firstChild'
  Call xp3t1(nodeCode,func,elemCode,exc)
  • x: multiplexer function
  • p3: 3 parameters (+ function name)
  • t1: type 1 parameter schema (code/code/error)
Why Python
GOOD
• Very easy
• Object-oriented language
• Works well with strings
• Simple and efficient DOM interface for XML
• Present in almost all UNIX/Linux distributions
BAD
• Users do need to learn a new language
• Maybe less powerful than Perl
• Usually not used by chemists
Python Wrapper
At present, a prototype works with the MolPro-FCI chain:
• It takes information from the XML repository
• It writes the proper MOLPRO and FCI input
• It starts the two programs
With a different XML file, users only need to specify the file name and some simple parameters (orbital guess for FCI).
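The first step of such a wrapper, reading the geometry from the XML repository through the DOM, might look like the sketch below. It uses Python's standard xml.dom.minidom and the element and attribute names of the molecule FACT. The function name geometry_block and the plain "symbol x y z" output are assumptions for illustration; the output deliberately does not try to reproduce real MOLPRO or FCI input syntax, and launching the programs is left out.

```python
from xml.dom import minidom


def geometry_block(qcml_text):
    """Return one 'symbol x y z' line per <atom> of the first <geometry>."""
    doc = minidom.parseString(qcml_text)
    geometry = doc.getElementsByTagName("geometry")[0]
    lines = []
    for atom in geometry.getElementsByTagName("atom"):
        coords = [float(atom.getAttribute(k)) for k in ("x3", "y3", "z3")]
        lines.append("{:<2s} {:12.6f} {:12.6f} {:12.6f}".format(
            atom.getAttribute("symbol"), *coords))
    return "\n".join(lines)


# Invented two-atom sample document.
sample = (
    '<system><molecule><geometry>'
    '<atom symbol="H" x3="0.0" y3="0.0" z3="0.0"/>'
    '<atom symbol="H" x3="0.0" y3="0.0" z3="0.74"/>'
    '</geometry></molecule></system>'
)
print(geometry_block(sample))
```

A real IN-wrapper would embed this block in the target program's own input template, so only the template changes from one code to the next.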
Python or not
Python is very simple to learn and works very efficiently with xml
Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade
Possibility of a GUI could make our project much more user-friendly
What we have done …
• Single platform: IBM SP4
• Two code chains: MolPro to FCI, MolPro to CasDI
[Diagram of the MolPro-to-FCI chain. Elements: QCML Repository, HDF5 Repository, IN-wrapper, MolPro IN-file, MolPro, FCIDUMP, OUT-wrapper, IN-wrapper, FCI IN-file, bin file for FCI, FCI; markers "Start here" and "Stop here"]
In conclusion …
Two important hints on data…
1. Use some XML dialect for describing simple structured data
2. Use HDF5 for storing large arrays and binary data
Open points:
• Need of a good and easy API to XML & HDF
• How to manage the workflow
• How to manage the grid connection