Data-intensive profile for the VAMDC

Data-intensive profile for VAMDC. C. Mendoza (IVIC, CeCalCULA) & VAMDC Collaboration, 3rd CDAMOP, University of Delhi, 15 December 2011


Talk given by Claudio Mendoza at the 3rd CDAMOP, 14-16 December 2011, Delhi University, Delhi, India.


Page 1: Data-intensive profile for the VAMDC

Data-intensive profile for VAMDC

C. Mendoza (IVIC, CeCalCULA) & VAMDC Collaboration

3rd CDAMOP

University of Delhi

15 December 2011

Page 2: Data-intensive profile for the VAMDC

Since the early 1980s, I have contributed to international consortia that produce massive atomic data sets for astrophysical applications

1982-1997: Opacity Project (Opacity Project Team 1995)

Radiative atomic data (LS coupling) and opacities for cosmically abundant elements

Led by Mike Seaton and Dimitri Mihalas

Contributors from France, Germany, UK, USA and Venezuela

1992-present: IRON Project (Hummer et al. 1993)

Radiative and collisional data (intermediate coupling) for Fe-group ions

Coordinated by David Hummer

Contributors from Canada, France, Germany, UK and USA

Page 3: Data-intensive profile for the VAMDC

TOPbase was one of the first online atomic databases

Source: Cunto & Mendoza (1992)

Page 4: Data-intensive profile for the VAMDC

In 1995, TIPTOPbase was upgraded with web technology

Page 5: Data-intensive profile for the VAMDC

The OPserver is a good example of database-centric computing

(Figure: OPserver at OSC)

Source: Mendoza et al. (2007)
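Database-centric computing moves the calculation to where the data live, so the user receives a derived quantity instead of the raw tables. A minimal Python sketch, assuming a hypothetical SQLite table `mono_opacity` of monochromatic opacities on a mesh in u = h nu/kT: it evaluates a Rosseland mean, 1/kappa_R = int F(u)/kappa(u) du / int F(u) du with F(u) = (15/4pi^4) u^4 e^u/(e^u - 1)^2, inside the data layer. The table layout, the opacity values and the trapezoidal integration are illustrative choices, not OPserver's actual implementation.

```python
import math
import sqlite3


def rosseland_weight(u: float) -> float:
    """Normalized Rosseland weight F(u) with u = h*nu/(k*T); its integral over (0, inf) is 1."""
    return 15.0 / (4.0 * math.pi**4) * u**4 * math.exp(u) / math.expm1(u) ** 2


def rosseland_mean(conn: sqlite3.Connection, grid_id: int) -> float:
    """Return kappa_R from 1/kappa_R = int F/kappa du / int F du, evaluated next to the data."""
    rows = conn.execute(
        "SELECT u, kappa FROM mono_opacity WHERE grid_id = ? ORDER BY u", (grid_id,)
    ).fetchall()
    num = den = 0.0
    # Trapezoidal rule on the tabulated (hypothetical) u-mesh.
    for (u0, k0), (u1, k1) in zip(rows, rows[1:]):
        f0, f1 = rosseland_weight(u0), rosseland_weight(u1)
        num += 0.5 * (u1 - u0) * (f0 / k0 + f1 / k1)
        den += 0.5 * (u1 - u0) * (f0 + f1)
    return den / num


# Hypothetical monochromatic opacities (cm^2/g) for two grid points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mono_opacity (grid_id INTEGER, u REAL, kappa REAL)")
mesh = [0.01 * i for i in range(1, 3001)]  # u from 0.01 to 30
conn.executemany(
    "INSERT INTO mono_opacity VALUES (1, ?, ?)",
    [(u, 2.0 + 8.0 * math.exp(-((u - 4.0) ** 2))) for u in mesh],  # opacity bump near the weight peak
)
conn.executemany("INSERT INTO mono_opacity VALUES (2, ?, 5.0)", [(u,) for u in mesh])
print(rosseland_mean(conn, 1), rosseland_mean(conn, 2))
```

For a constant opacity the mean reduces to that constant; because the average is harmonic, frequency windows of low opacity dominate kappa_R.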

Page 7: Data-intensive profile for the VAMDC

Source: John R. Johnson, “HPC for data intensive science”, Pacific Northwest National Laboratory

Page 8: Data-intensive profile for the VAMDC

A new scientific culture: e-Science (John Taylor 1999), or digital science

Multidisciplinary and collaborative (social networks)

Virtualized on a second-generation Internet (advanced networks)

Data-intensive, open access (database-centric)

HPC in distributed environments (grids, clouds), managed through services

New communication and publication pathways: knowledge preservation and dissemination (metadata)

Page 9: Data-intensive profile for the VAMDC

(Diagram: the entire e-Science cycle, encompassing experimentation, analysis, publication, research and learning. Elements shown: Grid, e-Scientists, graduate and undergraduate students, e-experimentation, virtual learning environment, institutional archive, local web publisher, holdings, digital library. Outputs: technical reports, reprints, peer-reviewed journal and conference papers, preprints and metadata, certified experimental results and analyses, data, metadata and ontologies.)

Source: David De Roure (Univ. Southampton, UK)

Page 10: Data-intensive profile for the VAMDC

Data curation is rapidly becoming a crucial step in the research cycle

Original image from Lord et al. (2004)

Page 11: Data-intensive profile for the VAMDC

• The Virtual Atomic and Molecular Data Center (VAMDC) aims to build an interoperable e-infrastructure for the exchange of A&M data. VAMDC involves 15 administrative partners representing 24 teams from 6 European Union member states, Serbia, the Russian Federation and Venezuela.

• VAMDC is supported by the EU within the FP7 "Research Infrastructures - INFRA-2008-1.2.2 - Scientific Data Infrastructures" initiative. It started on 1 July 2009 and runs for 42 months.

Page 12: Data-intensive profile for the VAMDC

VAMDC integrates several research groups, mainly from the European Research Area

NIST

IVIC

CeCalCULA

UCL

U Cambridge

Open U

Queen’s U

U Uppsala

U Cologne

CNRS

INAF Italia

AO Belgrade

U Vienna

RAS

RFNC

Page 13: Data-intensive profile for the VAMDC

The outstanding problems in existing A&M databases are interoperability and data interfaces

Page 14: Data-intensive profile for the VAMDC

VAMDC intends to deploy an interoperable e-environment for distributed A&M databases

(Diagram: database1, database2, database3, database4)
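The interoperability goal can be sketched in Python under simple assumptions: two hypothetical nodes keep their own local schemas and units, each exposes an adapter that maps its rows onto a shared record (`Line`, a made-up stand-in for a common exchange format such as XSAMS), and a federated query sends the same request to every node and merges the answers. Both nodes here hold the H I Lyman-alpha line, one in nm and one in Angstrom.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Line:
    """Hypothetical common record that every node maps its own schema onto."""
    element: str
    ion_charge: int
    wavelength_nm: float
    node: str


Adapter = Callable[[str], List[Line]]


def make_node_a() -> Adapter:
    # Node A: its own table layout, wavelengths stored in nm.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE lines (symbol TEXT, charge INTEGER, wl_nm REAL)")
    db.execute("INSERT INTO lines VALUES ('H', 0, 121.5674)")

    def query(element: str) -> List[Line]:
        rows = db.execute("SELECT symbol, charge, wl_nm FROM lines WHERE symbol = ?", (element,))
        return [Line(s, q, wl, "node A") for s, q, wl in rows]

    return query


def make_node_b() -> Adapter:
    # Node B: a different schema, wavelengths stored in Angstrom.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transitions (element TEXT, ion_charge INTEGER, wl_angstrom REAL)")
    db.execute("INSERT INTO transitions VALUES ('H', 0, 1215.674)")

    def query(element: str) -> List[Line]:
        rows = db.execute(
            "SELECT element, ion_charge, wl_angstrom FROM transitions WHERE element = ?", (element,)
        )
        return [Line(s, q, wl / 10.0, "node B") for s, q, wl in rows]  # 1 nm = 10 Angstrom

    return query


def federated_query(nodes: List[Adapter], element: str) -> List[Line]:
    """Send the same query to every node and merge the harmonized answers."""
    results: List[Line] = []
    for query in nodes:
        results.extend(query(element))
    return sorted(results, key=lambda line: (line.wavelength_nm, line.node))


for line in federated_query([make_node_a(), make_node_b()], "H"):
    print(line)
```

The data never leave their nodes; only the adapters need to agree on the shared record.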

Page 15: Data-intensive profile for the VAMDC

Users will be able to navigate seamlessly and retrieve data from 21 A&M databases

NIST

HITRAN

OPserver

XSTAR

TIPbase

TOPbase

W@DIS

SPECTRA

OZONE

CDSD

SpecW3

BELDATA

LASP

PAH

KIDA

UMIST

STSP

BASECOL

CDMS

CHIANTI

VALD

VAMDC

Page 16: Data-intensive profile for the VAMDC

A&M data are used in a wide variety of research and industrial fields

Astrophysics

Fusion plasmas

Lighting

Page 17: Data-intensive profile for the VAMDC

VAMDC is conceived as a virtual warehouse of distributed A&M data services

Page 18: Data-intensive profile for the VAMDC

The first database integrations were carried out by the IAEA through web portals

Page 19: Data-intensive profile for the VAMDC

Database integration and data exchange management are now performed with XML

Source: Freire & Benedict, 2004, Comp. Sc. Eng., 6, 12

Page 20: Data-intensive profile for the VAMDC

Storage of XML in a database

Source: Freire & Benedict, 2004, Comp. Sc. Eng., 6, 12
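One common way to store XML in a relational database is to "shred" it: each element of interest becomes a row and selected children become columns. A minimal sketch with Python's standard `xml.etree.ElementTree` and `sqlite3`, using the hydrogen ground state from the XSAMS example; the `atomic_state` table, the `shred_states` helper and the `<States>` wrapper are assumptions made for illustration, not part of XSAMS or of Freire & Benedict's description.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XSAMS-like fragment; the <States> wrapper simplifies the real XSAMS nesting.
XML_DOC = """<States>
  <AtomicState stateID="S.0101.001">
    <AtomicNumericalData>
      <StateEnergy><Value units="1/cm"> 0.0000000E+00</Value></StateEnergy>
    </AtomicNumericalData>
    <AtomicQuantumNumbers>
      <Parity>even</Parity>
      <TotalAngularMomentum> 0.5</TotalAngularMomentum>
    </AtomicQuantumNumbers>
  </AtomicState>
</States>"""


def shred_states(xml_text: str, conn: sqlite3.Connection) -> int:
    """Map each AtomicState element onto one row of a relational table; return rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS atomic_state ("
        "state_id TEXT PRIMARY KEY, energy REAL, energy_units TEXT, parity TEXT, j REAL)"
    )
    count = 0
    for state in ET.fromstring(xml_text).iter("AtomicState"):
        energy = state.find("AtomicNumericalData/StateEnergy/Value")
        conn.execute(
            "INSERT INTO atomic_state VALUES (?, ?, ?, ?, ?)",
            (
                state.get("stateID"),
                float(energy.text),  # float() tolerates the padded, Fortran-style text
                energy.get("units"),
                state.findtext("AtomicQuantumNumbers/Parity"),
                float(state.findtext("AtomicQuantumNumbers/TotalAngularMomentum")),
            ),
        )
        count += 1
    return count


conn = sqlite3.connect(":memory:")
shred_states(XML_DOC, conn)
print(conn.execute("SELECT * FROM atomic_state WHERE parity = 'even'").fetchall())
```

Once shredded, the states can be filtered with ordinary SQL, at the cost of fixing a mapping from the schema to tables in advance.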

Page 21: Data-intensive profile for the VAMDC

XSAMS (XML Schema for Atoms, Molecules and Solids) is an XML schema for A&M data exchange

Page 22: Data-intensive profile for the VAMDC

XSAMS tree

XSAMS: Data sources, Methods, Functions, Objects, Processes

Objects: Atoms, Molecules, Solids, Particles

Processes: Radiative, Nonradiative, Collisions

VO-PDC Forum, Paris, November 2011

Page 23: Data-intensive profile for the VAMDC

<ChemicalElement>
  <NuclearCharge> 1</NuclearCharge>
  <ElementSymbol>H </ElementSymbol>
</ChemicalElement>
<Isotope>
  <IonState>
    <IonCharge> 0</IonCharge>
    <IsoelectronicSequence> H </IsoelectronicSequence>

Page 24: Data-intensive profile for the VAMDC

<AtomicState stateID="S.0101.001">
  <AtomicNumericalData>
    <StateEnergy><Value units="1/cm"> 0.0000000E+00</Value></StateEnergy>
  </AtomicNumericalData>
  <AtomicQuantumNumbers>
    <Parity>even</Parity>
    <TotalAngularMomentum> 0.5</TotalAngularMomentum>
  </AtomicQuantumNumbers>
  <AtomicComposition>
    <Component>
      <Configuration>
        <Shells>
          <Shell>
            <PrincipalQuantumNumber> 1</PrincipalQuantumNumber>
            <OrbitalAngularMomentum><Value> 0</Value></OrbitalAngularMomentum>
            <NumberOfElectrons>1</NumberOfElectrons>
          </Shell>
        </Shells>
        <ConfigurationLabel>1s_1/2 </ConfigurationLabel>
      </Configuration>
      <Term>
        <LS>
          <L><Value> 0</Value></L>
          <S>0.5</S>
          <Multiplicity>2</Multiplicity>
        </LS>
      </Term>
    </Component>
  </AtomicComposition>
</AtomicState>

Page 25: Data-intensive profile for the VAMDC

<RadiativeTransition>
  <EnergyWavelength>
    <Wavelength>
      <Theoretical><Value units="nm"> 1.215674E+02</Value></Theoretical>
    </Wavelength>
  </EnergyWavelength>
  <InitialStateRef>S.0101.002</InitialStateRef>
  <FinalStateRef>S.0101.001</FinalStateRef>
  <Probability>
    <TransitionProbabilityA><Value units="1/s"> 6.2684E+08</Value></TransitionProbabilityA>
  </Probability>
</RadiativeTransition>
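The state and transition fragments are linked by identifiers: `FinalStateRef` points to the `stateID` "S.0101.001" of the ground state. A Python sketch that parses an abridged copy of these fragments, resolves the reference and derives two quantities: the wavenumber sigma = 10^7/lambda(nm) in cm^-1, and tau = 1/A, which is the radiative lifetime of the upper level only when this is its sole decay channel (true for H 2p, which decays radiatively only to 1s). The function name `summarize_transitions` and the `<Fragment>` wrapper are hypothetical; the wavelength is written as 121.5674 nm, the value of hydrogen Lyman alpha.

```python
import xml.etree.ElementTree as ET

# Abridged fragment adapted from the slides, wrapped in a hypothetical <Fragment> root.
XSAMS_FRAGMENT = """<Fragment>
  <AtomicState stateID="S.0101.001">
    <AtomicNumericalData>
      <StateEnergy><Value units="1/cm"> 0.0000000E+00</Value></StateEnergy>
    </AtomicNumericalData>
    <AtomicComposition><Component><Configuration>
      <ConfigurationLabel>1s_1/2 </ConfigurationLabel>
    </Configuration></Component></AtomicComposition>
  </AtomicState>
  <RadiativeTransition>
    <EnergyWavelength><Wavelength>
      <Theoretical><Value units="nm"> 1.215674E+02</Value></Theoretical>
    </Wavelength></EnergyWavelength>
    <InitialStateRef>S.0101.002</InitialStateRef>
    <FinalStateRef>S.0101.001</FinalStateRef>
    <Probability>
      <TransitionProbabilityA><Value units="1/s"> 6.2684E+08</Value></TransitionProbabilityA>
    </Probability>
  </RadiativeTransition>
</Fragment>"""


def summarize_transitions(xml_text: str) -> list:
    """Resolve each transition's lower state and derive a wavenumber and a lifetime."""
    root = ET.fromstring(xml_text)
    states = {s.get("stateID"): s for s in root.iter("AtomicState")}
    out = []
    for tr in root.iter("RadiativeTransition"):
        wl = tr.find("EnergyWavelength/Wavelength/Theoretical/Value")
        if wl.get("units") != "nm":
            raise ValueError("this sketch only handles wavelengths in nm")
        a_value = float(tr.findtext("Probability/TransitionProbabilityA/Value"))
        final = states.get(tr.findtext("FinalStateRef").strip())
        out.append({
            "wavenumber_cm": 1.0e7 / float(wl.text),  # 1 nm = 1e-7 cm
            "lifetime_s": 1.0 / a_value,  # equals the lifetime only for a single decay channel
            "lower_config": final.findtext(".//ConfigurationLabel").strip() if final is not None else None,
        })
    return out


print(summarize_transitions(XSAMS_FRAGMENT))
```

The upper state "S.0101.002" is not included in the fragment, so only the lower-state reference is resolved here.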

Page 26: Data-intensive profile for the VAMDC

Deployment strategy

• All data on the WWW

• Databases stay at their producers’ sites

• All data searchable

VO-PDC Forum, Paris, November 2011

Page 27: Data-intensive profile for the VAMDC

VAMDC registry based on IVOA registry standards

VO-PDC Forum, Paris, November 2011

Page 28: Data-intensive profile for the VAMDC

Data Access: TAP-XSAMS

VO-PDC Forum, Paris, November 2011

Based on the IVOA Table Access Protocol (TAP) standard
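A TAP-style data request is an HTTP call to a node's `/sync` endpoint that carries the query as parameters. A Python sketch that only builds such a URL, using the standard TAP parameter `REQUEST=doQuery`; the values `LANG=VSS1` and `FORMAT=XSAMS`, the restrictable keywords `AtomSymbol` and `AtomIonCharge`, the helper name and the node address are assumptions for illustration and should be checked against the VAMDC-TAP specification.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tap_xsams_sync_url(tap_base: str, query: str, lang: str = "VSS1") -> str:
    """Build a synchronous TAP-style request that asks a node for XSAMS output."""
    params = {
        "REQUEST": "doQuery",  # standard TAP parameter
        "LANG": lang,          # assumed VAMDC query-language identifier
        "FORMAT": "XSAMS",     # assumed value requesting XSAMS instead of a VOTable
        "QUERY": query,
    }
    return tap_base.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical node address and restrictable keyword names.
url = tap_xsams_sync_url(
    "http://node.example.org/tap/",
    "SELECT * WHERE AtomSymbol = 'Fe' AND AtomIonCharge = 1",
)
print(url)
print(parse_qs(urlsplit(url).query)["QUERY"])
```

URL-encoding the query keeps spaces, quotes and operators intact on the way to the node.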

Page 29: Data-intensive profile for the VAMDC

Ingredients & structure of a VAMDC node

Page 30: Data-intensive profile for the VAMDC

VAMDC will provide a registry of A&M web services
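A registry lets clients discover services by the standard they implement instead of by hard-coded addresses. A toy in-memory version in Python: the `Registry` and `Resource` classes, the `ivo://` identifiers, the standard ID `ivo://vamdc/std/VAMDC-TAP` and the URLs are hypothetical stand-ins, not the IVOA registry interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Resource:
    identifier: str  # IVOA-style ivo:// identifier
    title: str
    capabilities: Dict[str, str] = field(default_factory=dict)  # standard ID -> access URL


class Registry:
    """Toy in-memory registry: services register themselves, clients discover them by standard."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def register(self, resource: Resource) -> None:
        self._resources[resource.identifier] = resource  # re-registration replaces the old record

    def find_by_standard(self, standard_id: str) -> List[Tuple[str, str]]:
        return sorted(
            (r.title, r.capabilities[standard_id])
            for r in self._resources.values()
            if standard_id in r.capabilities
        )


TAP_XSAMS = "ivo://vamdc/std/VAMDC-TAP"  # assumed standard identifier
registry = Registry()
registry.register(
    Resource("ivo://example/topbase", "TOPbase node", {TAP_XSAMS: "http://topbase.example.org/tap/"})
)
registry.register(
    Resource("ivo://example/xstar", "XSTAR service", {"ivo://example/std/SOAP": "http://xstar.example.org/ws"})
)
print(registry.find_by_standard(TAP_XSAMS))
```

A client asking for TAP-XSAMS services gets only the nodes that declare that capability, whatever else they offer.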

Page 31: Data-intensive profile for the VAMDC

The XSTAR spectral modeling code is being offered as a SOAP web service

(Diagram: XSTAR, uaDB, command-based app)
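Exposing a command-based code as a SOAP web service means that clients send an XML envelope naming an operation and its inputs, and the server translates it into a run of the program. A Python sketch of the client side using `xml.etree.ElementTree` and the SOAP 1.1 envelope namespace; the operation `runXstar`, the parameter names `density` and `log_xi`, the helper `build_soap_request` and the service namespace are hypothetical and do not describe the real XSTAR service interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:xstar"  # hypothetical service namespace


def build_soap_request(operation: str, params: dict) -> str:
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    ET.register_namespace("xstar", SERVICE_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names standing in for command-line inputs.
request = build_soap_request("runXstar", {"density": 1.0e4, "log_xi": 2.0})
print(request)
```

The server side would map each child element onto the corresponding input of the command-based application and return its output in a response envelope.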

Page 32: Data-intensive profile for the VAMDC

Once XSTAR is available as a web service, it can be integrated into a web page

Page 33: Data-intensive profile for the VAMDC

Page 34: Data-intensive profile for the VAMDC

Workflows may be published in scientific social networking systems

Page 35: Data-intensive profile for the VAMDC

A&M data producer-user communities are being consolidated with Web 2.0 tools

Page 36: Data-intensive profile for the VAMDC

Conclusions

Scientific research is becoming increasingly collaborative and data-intensive (e-science)

Atomic data production must be scaled up to meet the extreme requirements of diverse virtual organizations

Data repositories must be kept fit and integrated for contemporary purposes, discovery and reuse (data curation)

Data provenance and preservation are of vital importance

Regarding A&M data, VAMDC is addressing most of these issues

An A&M XML schema (XSAMS) has been released and is being extended and maintained

If you have A&M databases, you are welcome to set up a VAMDC node to publish them