Data-intensive profile for the VAMDC

Talk given by Claudio Mendoza at 3d CDAMOP, 14-16 December 2011, Delhi University, Delhi, India.

Data-intensive profile for VAMDC

C. Mendoza (IVIC, CeCalCULA) &

VAMDC Collaboration

3d CDAMOP

University of Delhi

15 December 2011

Since the early 1980s, I have contributed to international consortia for the production of massive atomic data sets for astrophysical applications

1982-1997: Opacity Project (Opacity Project Team 1995)

Radiative atomic data (LS coupling) and opacities for cosmically abundant elements

Led by Mike Seaton and Dimitri Mihalas

Contributors from France, Germany, UK, USA and Venezuela

1992-present: IRON Project (Hummer et al. 1993)

Radiative and collisional data (intermediate coupling) for Fe-group ions

Coordinated by David Hummer

Contributors from Canada, France, Germany, UK, USA

TOPbase was one of the first online atomic databases

Source: Cunto & Mendoza (1992)

In 1995 TIPTOPbase was upgraded with web technology

The OPserver is a good example of database-centric computing

OPserver at OSC

Source: Mendoza et al. (2007)

Source: John R. Johnson, “HPC for data intensive science”, Pacific Northwest National Laboratory

A new scientific culture: e-Science (John Taylor 1999)

• Digital science

• Multidisciplinary and collaborative (social networks)

• Virtualized on a 2nd generation Internet (advanced networks)

• Data intensive, open access (database centric)

• HPC in distributed environments (grids, clouds) and managed through services

• New communication and publication pathways: knowledge preservation & dissemination (metadata)

[Figure: the entire e-Science cycle, encompassing experimentation, analysis, publication, research and learning. Labelled elements: Grid; E-Scientists; E-Experimentation; Institutional Archive; Local Web Publisher Holdings; Digital Library; Virtual Learning Environment; Graduate Students; Undergraduate Students; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]

Source: David De Roure (Univ. Southampton, UK)

Data curation is rapidly becoming a crucial step in the research cycle

Original image from Lord et al. (2004)

• The Virtual Atomic and Molecular Data Center (VAMDC) aims to build an interoperable e-infrastructure for the exchange of atomic and molecular (A&M) data. VAMDC involves 15 administrative partners representing 24 teams from 6 European Union member states, Serbia, the Russian Federation and Venezuela.

• VAMDC is supported by the EU in the framework of the FP7 "Research Infrastructures - INFRA-2008-1.2.2 - Scientific Data Infrastructures" initiative. It started on 1 July 2009 for a duration of 42 months.

VAMDC integrates several research groups, mainly from the European Research Area

NIST

IVIC

CeCalCULA

UCL

U Cambridge

Open U

Queen’s U

U Uppsala

U Cologne

CNRS

INA Italia

AO Belgrade

U Vienna

RAS

RFNC

Outstanding problems in existing A&M databases are interoperability and data interfaces

VAMDC intends to deploy an interoperable e-environment for distributed A&M databases

database1

database2

database3 database4

Users will be able to navigate seamlessly and retrieve data from 21 A&M databases

NIST

HITRAN

OPserver

XSTAR

TIPbase

TOPbase

W@DIS

SPECTRA

OZONE

CDSD

SpecW3

BELDATA

LASP

PAH

KIDA

UMIST

STSP

BASECOL

CDMS

CHIANTI

VALD

VAMDC

A&M data are used in a wide variety of research and industrial fields

Astrophysics

Fusion plasmas

Lighting

VAMDC is conceived as a virtual warehouse of distributed A&M data services

The first database integrations were carried out by the IAEA by means of web portals

Database integration and data exchange management are now performed with XML

Source: Freire & Benedict, 2004, Comp. Sc. Eng., 6, 12

Storage of XML in a database

Source: Freire & Benedict, 2004, Comp. Sc. Eng., 6, 12
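One common way to store XML in a relational database is to keep each document as text and copy ("shred") a few searchable fields into ordinary columns. The sketch below illustrates that idea with Python's built-in sqlite3 module; it is not necessarily the scheme shown in the Freire & Benedict figure, and the table name, column names and sample documents are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample documents; the element names mirror those used in XSAMS excerpts,
# but the table layout below is purely illustrative.
DOCS = [
    "<ChemicalElement><NuclearCharge>1</NuclearCharge>"
    "<ElementSymbol>H</ElementSymbol></ChemicalElement>",
    "<ChemicalElement><NuclearCharge>26</NuclearCharge>"
    "<ElementSymbol>Fe</ElementSymbol></ChemicalElement>",
]


def load(conn, docs):
    """Store each XML document whole, plus two shredded, searchable columns."""
    conn.execute(
        "CREATE TABLE element ("
        "nuclear_charge INTEGER PRIMARY KEY, "
        "symbol TEXT NOT NULL, "
        "xml TEXT NOT NULL)"
    )
    for doc in docs:
        node = ET.fromstring(doc)
        conn.execute(
            "INSERT INTO element VALUES (?, ?, ?)",
            (
                int(node.findtext("NuclearCharge")),
                node.findtext("ElementSymbol").strip(),
                doc,
            ),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, DOCS)

# Search on a shredded column, then return the original XML unchanged.
row = conn.execute(
    "SELECT xml FROM element WHERE symbol = ?", ("Fe",)
).fetchone()
print(row[0])
```

Searching on the shredded columns keeps queries in plain SQL, while the stored text preserves the full document for exchange.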

XSAMS is an XML schema for A&M data exchange

XSAMS tree

XSAMS

    Methods

    Functions

    Data sources

    Objects: Atoms, Molecules, Solids, Particles

    Processes: Nonradiative, Radiative, Collisions

VO-PDC Forum, Paris, November 2011

<ChemicalElement>

<NuclearCharge> 1</NuclearCharge>

<ElementSymbol>H </ElementSymbol>

</ChemicalElement>

<Isotope>

<IonState>

<IonCharge> 0</IonCharge>

<IsoelectronicSequence> H </IsoelectronicSequence>

<AtomicState stateID="S.0101.001">

<AtomicNumericalData>

<StateEnergy><Value units="1/cm"> 0.0000000E+00</Value></StateEnergy>

</AtomicNumericalData>

<AtomicQuantumNumbers>

<Parity>even</Parity>

<TotalAngularMomentum> 0.5</TotalAngularMomentum>

</AtomicQuantumNumbers>

<AtomicComposition>

<Component>

<Configuration>

<Shells>

<Shell>

<PrincipalQuantumNumber> 1</PrincipalQuantumNumber>

<OrbitalAngularMomentum><Value> 0</Value></OrbitalAngularMomentum>

<NumberOfElectrons>1</NumberOfElectrons>

</Shell>

</Shells>

<ConfigurationLabel>1s_1/2 </ConfigurationLabel>

</Configuration>

<Term>

<LS>

<L><Value> 0</Value></L>

<S>0.5</S>

<Multiplicity>2</Multiplicity>

</LS>

</Term>

</Component>

</AtomicComposition>

</AtomicState>

</IonState>

</Isotope>

<RadiativeTransition>

<EnergyWavelength>

<Wavelength>

<Theoretical><Value units="A"> 1.215674E+03</Value></Theoretical>

</Wavelength>

</EnergyWavelength>

<InitialStateRef>S.0101.002</InitialStateRef>

<FinalStateRef>S.0101.001</FinalStateRef>

<Probability>

<TransitionProbabilityA><Value units="1/s"> 6.2684E+08</Value></TransitionProbabilityA>

</Probability>

</RadiativeTransition>
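The excerpt shows how XSAMS links processes to states: a RadiativeTransition carries no state data itself and points to AtomicState elements through their stateID. A minimal sketch of reading such a fragment with Python's standard library follows. The `<Fragment>` wrapper and the empty placeholder for the upper state S.0101.002 are assumptions added so that the sample is well formed; real XSAMS documents also declare an XML namespace, which is omitted here as in the excerpt.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the excerpt. <Fragment> and the empty S.0101.002
# state are placeholders, not part of the original slide.
FRAGMENT = """
<Fragment>
  <AtomicState stateID="S.0101.001">
    <AtomicNumericalData>
      <StateEnergy><Value units="1/cm"> 0.0000000E+00</Value></StateEnergy>
    </AtomicNumericalData>
  </AtomicState>
  <AtomicState stateID="S.0101.002"/>
  <RadiativeTransition>
    <InitialStateRef>S.0101.002</InitialStateRef>
    <FinalStateRef>S.0101.001</FinalStateRef>
    <Probability>
      <TransitionProbabilityA><Value units="1/s"> 6.2684E+08</Value></TransitionProbabilityA>
    </Probability>
  </RadiativeTransition>
</Fragment>
"""


def parse_fragment(xml_text):
    """Return (states, transitions) from an XSAMS-like fragment."""
    root = ET.fromstring(xml_text)

    # Map each stateID to (energy, units), or None if no energy is given.
    states = {}
    for state in root.iter("AtomicState"):
        energy = state.find("AtomicNumericalData/StateEnergy/Value")
        states[state.get("stateID")] = (
            None if energy is None else (float(energy.text), energy.get("units"))
        )

    transitions = []
    for tr in root.iter("RadiativeTransition"):
        initial = tr.findtext("InitialStateRef").strip()
        final = tr.findtext("FinalStateRef").strip()
        # Every state reference must resolve to a declared state.
        for ref in (initial, final):
            if ref not in states:
                raise KeyError(f"unresolved state reference: {ref}")
        a_value = tr.find("Probability/TransitionProbabilityA/Value")
        transitions.append({
            "initial": initial,
            "final": final,
            "A": float(a_value.text),
            "A_units": a_value.get("units"),
        })
    return states, transitions


states, transitions = parse_fragment(FRAGMENT)
print(states)
print(transitions)
```

Resolving the references at load time catches fragments whose transitions point at states that were never declared.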

Deployment Strategy

• All data on the WWW

• Databases stay at their producers’ sites

• All data searchable

VAMDC registry based on IVOA registry standards

Data Access: TAP-XSAMS

Based on TAP standard
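As a rough illustration of TAP-style access, the sketch below builds the URL of a synchronous query. The parameter names REQUEST, LANG, FORMAT and QUERY come from the IVOA TAP standard; the node address is hypothetical, and the values `VSS1`, `XSAMS` and the `AtomSymbol` keyword reflect one reading of the VAMDC-TAP conventions that should be checked against the VAMDC standards documentation. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical node address; real node URLs are published in the registry.
NODE_URL = "http://node.example.org/tap"


def build_sync_query(node_url, query, lang="VSS1", fmt="XSAMS"):
    """Build the URL of a synchronous TAP query against one node."""
    params = {
        "REQUEST": "doQuery",  # TAP request type
        "LANG": lang,          # query language (assumed value)
        "FORMAT": fmt,         # requested output format (assumed value)
        "QUERY": query,
    }
    return node_url.rstrip("/") + "/sync?" + urlencode(params)


url = build_sync_query(NODE_URL, "SELECT ALL WHERE AtomSymbol = 'H'")
print(url)

# Round-trip check: the query string decodes back to the original query.
print(parse_qs(urlsplit(url).query)["QUERY"][0])
```

Because every node answers the same kind of request and returns XSAMS, a client can send one query to several databases and merge the results.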

Ingredients & structure of a VAMDC node

VAMDC will provide a registry of A&M web services

The XSTAR spectral modeling code is being offered as a SOAP web service

XSTAR

uaDB

Command-based app

Once XSTAR is available as a web service, it can be integrated in a web page
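To give a feel for what a web page would send to such a service, the sketch below assembles a SOAP 1.1 request envelope with Python's standard library. Only the envelope namespace is standard; the service namespace, the operation name `runModel`, the parameter names and their values are all hypothetical placeholders, since the actual XSTAR service interface is not described in the talk.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:xstar"  # hypothetical service namespace


def build_request(operation, params):
    """Return a SOAP 1.1 envelope calling `operation` with `params`."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Operation name, parameter names and values are placeholders only.
request_xml = build_request("runModel", {"temperature": 1.0, "density": 1.0e4})
print(request_xml)
```

A page or script would POST this text to the service endpoint and parse the envelope that comes back, so the command-based application never has to run on the user's machine.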

Workflows may be published in scientific social network systems

A&M data producer-user communities are being consolidated with Web 2.0 tools

Conclusions

Scientific research is becoming increasingly collaborative and data-intensive (e-science)

Atomic data production must be scaled up to the extreme requirements of diverse virtual organizations

Data repositories must be kept fit and integrated for contemporary purposes, discovery and reuse (data curation)

Data provenance and preservation are of vital importance

Regarding A&M data, VAMDC is addressing most of these issues

An A&M XML schema (XSAMS) has been released and is being extended and maintained

If you have A&M databases, you are welcome to set up a VAMDC node for publication