Data-intensive profile for the VAMDC
C. Mendoza (IVIC, CeCalCULA) &
VAMDC Collaboration
3rd CDAMOP
University of Delhi
15 December 2011
Since the early 1980s, I have contributed to international
consortia for the production of massive atomic
data sets for astrophysical applications
1982-1997: Opacity Project (Opacity Project Team 1995)
Radiative atomic data (LS coupling) and opacities for
cosmically abundant elements
Led by Mike Seaton and Dimitri Mihalas
Contributors from France, Germany, UK, USA,
Venezuela
1992-present: IRON Project (Hummer et al. 1993)
Radiative and collisional data (intermediate coupling)
for Fe-group ions
Coordinated by David Hummer
Contributors from Canada, France, Germany, UK, USA
TOPbase was one of the first
online atomic databases
Source: Cunto & Mendoza (1992)
In 1995 TIPTOPbase was upgraded
with web technology
The OPserver is a good example of
database-centric computing
OPserver at OSC
Source: Mendoza et al. (2007)
Source: John R. Johnson, “HPC for data intensive science”, Pacific Northwest National Laboratory
A new scientific culture:
e-Science (John Taylor 1999)
Digital science
Multidisciplinary and collaborative (social
networks)
Virtualized on a 2nd generation Internet
(advanced networks)
Data intensive, open access (database centric)
HPC in distributed environments (grids, clouds)
and managed through services
New communication and publication pathways:
knowledge preservation & dissemination
(metadata)
Figure: the entire e-Science cycle, encompassing experimentation, analysis, publication, research and learning.
Diagram labels: Grid; E-Scientists; E-Experimentation; Institutional Archive; Local Web; Publisher Holdings; Digital Library; Virtual Learning Environment; Graduate Students; Undergraduate Students; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies
Source: David De Roure (Univ. Southampton, UK)
Data curation is rapidly becoming a crucial
step in the research cycle
Original image from Lord et al. (2004)
• The Virtual Atomic and Molecular Data Center (VAMDC)
aims at building an interoperable e-infrastructure for the
exchange of A&M data. VAMDC involves 15 administrative
partners representing 24 teams from 6 European Union
member states, Serbia, the Russian Federation and
Venezuela.
• VAMDC is supported by the EU in the framework of the FP7
"Research Infrastructures - INFRA-2008-1.2.2 - Scientific
Data Infrastructures" initiative. It started on the 1st of July
2009 for a duration of 42 months.
VAMDC integrates several research groups
mainly from the European Research Area
NIST
IVIC
CeCalCULA
UCL
U Cambridge
Open U
Queen’s U
U Uppsala
U Cologne
CNRS
INA Italia
AO Belgrade
U Vienna
RAS
RFNC
Outstanding problems in existing A&M databases
are interoperability and data interfaces
VAMDC intends to deploy an interoperable e-environment
for distributed A&M databases
database1
database2
database3
database4
Users will be able to navigate seamlessly
and retrieve data from 21 A&M databases
NIST
HITRAN
OPserver
XSTAR
TIPbase
TOPbase
W@DIS
SPECTRA
OZONE
CDSD
SpecW3
BELDATA
LASP
PAH
KIDA
UMIST
STSP
BASECOL
CDMS
CHIANTI
VALD
VAMDC
A&M data are used in a wide variety of
research and industrial fields
Astrophysics
Fusion plasmas
Lighting
VAMDC is conceived as a virtual warehouse
of A&M distributed data services
The first database integrations were carried
out by the IAEA by means of web portals
Database integration and data exchange
management are now performed with XML
Source: Freire & Benedict, 2004, Comp. Sc. Eng., 6, 12
Storage of XML in a database
Source: Freire & Benedict, 2004, Comp. Sc. Eng., 6, 12
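One simple way to keep XML in a relational database is to store each document whole as text and parse it on retrieval. The sketch below illustrates only that approach; it is not necessarily the scheme shown in the cited figure, and the table name `species`, its columns, and the helper `nuclear_charge` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with one row per XML document (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (symbol TEXT PRIMARY KEY, xml_doc TEXT NOT NULL)"
)

# Two small XML documents using element names from the XSAMS excerpt.
docs = {
    "H": "<ChemicalElement><NuclearCharge>1</NuclearCharge>"
         "<ElementSymbol>H</ElementSymbol></ChemicalElement>",
    "He": "<ChemicalElement><NuclearCharge>2</NuclearCharge>"
          "<ElementSymbol>He</ElementSymbol></ChemicalElement>",
}
conn.executemany("INSERT INTO species VALUES (?, ?)", docs.items())


def nuclear_charge(connection, symbol):
    """Fetch the stored XML for a symbol and read its <NuclearCharge>."""
    row = connection.execute(
        "SELECT xml_doc FROM species WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row is None:
        return None  # no document stored for this symbol
    return int(ET.fromstring(row[0]).findtext("NuclearCharge"))


charge_he = nuclear_charge(conn, "He")
```

Storing whole documents keeps the XML intact for exchange, at the cost of having to parse it before any value inside it can be searched.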
XSAMS is an XML schema for
A&M data exchange
XSAMS tree
XSAMS
  Methods
  Functions
  Data sources
  Objects
    Atoms
    Molecules
    Solids
    Particles
  Processes
    Nonrad.
    Radiative
    Collisions
VO-PDC Forum, Paris, November 2011
<ChemicalElement>
  <NuclearCharge>1</NuclearCharge>
  <ElementSymbol>H</ElementSymbol>
</ChemicalElement>
<Isotope>
  <IonState>
    <IonCharge>0</IonCharge>
    <IsoelectronicSequence>H</IsoelectronicSequence>
    <AtomicState stateID="S.0101.001">
      <AtomicNumericalData>
        <StateEnergy><Value units="1/cm">0.0000000E+00</Value></StateEnergy>
      </AtomicNumericalData>
      <AtomicQuantumNumbers>
        <Parity>even</Parity>
        <TotalAngularMomentum>0.5</TotalAngularMomentum>
      </AtomicQuantumNumbers>
      <AtomicComposition>
        <Component>
          <Configuration>
            <Shells>
              <Shell>
                <PrincipalQuantumNumber>1</PrincipalQuantumNumber>
                <OrbitalAngularMomentum><Value>0</Value></OrbitalAngularMomentum>
                <NumberOfElectrons>1</NumberOfElectrons>
              </Shell>
            </Shells>
            <ConfigurationLabel>1s_1/2</ConfigurationLabel>
          </Configuration>
          <Term>
            <LS>
              <L><Value>0</Value></L>
              <S>0.5</S>
              <Multiplicity>2</Multiplicity>
            </LS>
          </Term>
        </Component>
      </AtomicComposition>
    </AtomicState>
    <!-- upper state S.0101.002 is not shown in this excerpt -->
  </IonState>
</Isotope>
<!-- radiative transition between the two states (a Processes entry) -->
<RadiativeTransition>
  <EnergyWavelength>
    <Wavelength>
      <Theoretical><Value units="nm">1.215674E+02</Value></Theoretical>
    </Wavelength>
  </EnergyWavelength>
  <InitialStateRef>S.0101.002</InitialStateRef>
  <FinalStateRef>S.0101.001</FinalStateRef>
  <Probability>
    <TransitionProbabilityA><Value units="1/s">6.2684E+08</Value></TransitionProbabilityA>
  </Probability>
</RadiativeTransition>
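As a rough illustration of how a client might read such a record, the sketch below parses a trimmed copy of the excerpt with Python's standard library. The `<Fragment>` wrapper and the helper names `read_value` and `parse_fragment` are hypothetical; a real XSAMS document has its own root element and namespace, which this sketch does not attempt to handle.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt, wrapped in a hypothetical <Fragment> root
# so that it forms a single well-formed document.
FRAGMENT = """
<Fragment>
  <AtomicState stateID="S.0101.001">
    <AtomicNumericalData>
      <StateEnergy><Value units="1/cm"> 0.0000000E+00</Value></StateEnergy>
    </AtomicNumericalData>
  </AtomicState>
  <RadiativeTransition>
    <InitialStateRef>S.0101.002</InitialStateRef>
    <FinalStateRef>S.0101.001</FinalStateRef>
    <Probability>
      <TransitionProbabilityA><Value units="1/s"> 6.2684E+08</Value></TransitionProbabilityA>
    </Probability>
  </RadiativeTransition>
</Fragment>
"""


def read_value(element):
    """Return (number, units) from an element that wraps a <Value> child."""
    value = element.find("Value")
    return float(value.text.strip()), value.get("units")


def parse_fragment(xml_text):
    """Collect state energies by stateID and a list of transitions."""
    root = ET.fromstring(xml_text)
    states = {}
    for state in root.iter("AtomicState"):
        energy = state.find("AtomicNumericalData/StateEnergy")
        states[state.get("stateID")] = read_value(energy)
    transitions = []
    for tr in root.iter("RadiativeTransition"):
        a_value, a_units = read_value(tr.find("Probability/TransitionProbabilityA"))
        transitions.append({
            "initial": tr.findtext("InitialStateRef").strip(),
            "final": tr.findtext("FinalStateRef").strip(),
            "A": a_value,
            "A_units": a_units,
        })
    return states, transitions


states, transitions = parse_fragment(FRAGMENT)
```

Transitions refer to states only through their `stateID`, so a client resolves `InitialStateRef` and `FinalStateRef` against the `states` dictionary rather than finding the state data nested inside the transition.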
Deployment Strategy
• All data on the WWW
• Databases stay at their producers’ sites
• All data searchable
VAMDC registry based on IVOA
registry standards
Data Access: TAP-XSAMS
Based on the IVOA TAP (Table Access Protocol) standard
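A TAP-style synchronous request is an HTTP call carrying the query as URL parameters. The sketch below only builds such a URL and sends nothing. `REQUEST`, `LANG`, `FORMAT` and `QUERY` are standard TAP parameter names; the node address, the `VSS2` language label, the `AtomSymbol` keyword and the helper `build_tap_xsams_url` are assumptions made for illustration and should be checked against the VAMDC standards documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical node address; in practice a node's endpoint would be
# looked up in the registry rather than hard-coded.
NODE_BASE_URL = "http://example.org/tap"


def build_tap_xsams_url(base_url, query):
    """Build a synchronous TAP-style request URL that asks for XSAMS output."""
    params = {
        "REQUEST": "doQuery",  # standard TAP request type
        "LANG": "VSS2",        # assumed name of the VAMDC query language
        "FORMAT": "XSAMS",     # ask for the result as an XSAMS document
        "QUERY": query,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Assumed query syntax: select everything for atomic hydrogen.
url = build_tap_xsams_url(NODE_BASE_URL, "SELECT * WHERE AtomSymbol = 'H'")

# Decode the URL again to confirm the query survives URL encoding.
decoded = parse_qs(urlsplit(url).query)
```

Because every node would answer the same request shape, one client can send the same query to several databases and merge the XSAMS documents it gets back.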
Ingredients & structure of a
VAMDC node
VAMDC will provide a registry of A&M
web services
The XSTAR spectral modeling code is being
offered as a SOAP web service
XSTAR
uaDB
Command-based app
Once XSTAR is available as a web service,
it can be integrated in a web page
Workflows may be published in
scientific social network systems
A&M data producer-user communities are
being consolidated with Web 2.0 tools
Conclusions
Scientific research is becoming increasingly collaborative
and data-intensive (e-science)
Atomic data production must be scaled up to the extreme
requirements of diverse virtual organizations
Data repositories must be kept fit and integrated for
contemporary purposes, discovery and reuse (data curation)
Data provenance and preservation are of vital importance
Regarding A&M data, VAMDC is addressing most of these
issues
An A&M XML schema (XSAMS) has been released and is
being extended and maintained
If you have A&M databases, you are welcome to set up a
VAMDC node for publication