Transcript
Page 1: D4science-II Codata

D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Resource Management

Pasquale Pagano, National Research Council of Italy

[email protected]

22nd International CODATA, 24-27 October 2010

Cape Town (South Africa)

www.d4science.eu

Page 2: D4science-II Codata

www.d4science.eu | D4Science | 22nd International CODATA, Cape Town, 24-27 October 2010

Assumptions

Consolidated facts:

Very rich applications and data collections are currently maintained by a multitude of authoritative providers

Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …

Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …

Several standards are adopted in the same domain

Societal observations

• A rich variety of protocols, models, and formats
  • creates barriers to the use of resources
  • dramatically delays new exploitation patterns

Technical observations

Heterogeneity of protocols, models, and formats increases load; load increases failures

Page 3: D4science-II Codata

D4Science Vision

D4Science objectives:

hide heterogeneity, i.e. abstract over differences in location, protocol, and model;

embrace heterogeneity, i.e. allow for multiple locations, protocols, and models;

Technical goals

• no bottlenecks: scale no less than the interfaced resources
• no outages: keep failures partial and temporary
• autonomicity: system reacts and recovers
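The two objectives and the "no outages" goal can be pictured with a small sketch. It is not gCube code: the adapter functions, the `Record` model, and the repository names are hypothetical, chosen only to show how adapters hide differences behind a common model while a failing provider causes a partial, not a total, failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    """Common model that every adapter maps its native results onto."""
    source: str
    title: str


# Hypothetical adapters: each one hides a provider's location, protocol, and model.
def search_repository_a(term: str) -> List[Record]:
    native = [{"dc:title": "Tuna catch 2008"}, {"dc:title": "Cod stock report"}]
    return [Record("repo-a", r["dc:title"])
            for r in native if term.lower() in r["dc:title"].lower()]


def search_repository_b(term: str) -> List[Record]:
    # Simulated outage of one provider.
    raise TimeoutError("repo-b did not answer")


def federated_search(
    term: str, adapters: Dict[str, Callable[[str], List[Record]]]
) -> Tuple[List[Record], Dict[str, str]]:
    """Query every adapter; collect results and keep failures partial."""
    results: List[Record] = []
    failures: Dict[str, str] = {}
    for name, adapter in adapters.items():
        try:
            results.extend(adapter(term))
        except Exception as exc:  # one failing provider must not stop the others
            failures[name] = str(exc)
    return results, failures


results, failures = federated_search(
    "tuna", {"repo-a": search_repository_a, "repo-b": search_repository_b}
)
```

Adding a new provider means adding one adapter; callers keep working against `Record` only.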

Page 4: D4science-II Codata

From a testbed to a production ecosystem

[Timeline figure. Marks: Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11]

Page 5: D4science-II Codata

From a testbed to a production ecosystem

[Figure: functionality over time for gLite and gCube. Timeline marks: Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11]

Page 6: D4science-II Codata

Infrastructure Exploitation

Nodes
• 30 Nodes: CNR, NKUA, ESA, FAO, UNIBASEL
• 29 Nodes: CNR, NKUA, FAO, UNIBASEL

Collections
• 25 Data: EEA, MERIS, AATSR
• 69 Metadata: es, ISO19115, eiDB
• 15 Data: AquaMaps, Fact Sheets, Country Maps
• 28 Metadata: FARM_dc, aquamaps

Functionality
• Integration with gPod
• Geographical and text search
• Search by metadata
• Personal workspace
• Objects annotation
• Report generation
• Maps generation
• Time series management

Production

More than 500 autonomic Web Services

Page 7: D4science-II Codata

gCube as a Digital Library System

A Digital Library System is a possibly distributed system that collects, manages, and preserves for the long term rich digital content, and offers to its user communities specialised functionality on that content, of measurable quality and according to codified policies.

[The Digital Library Reference Model]

The gCube data infrastructure enabling framework provides DL functionality by:

• Federating existing digital content maintained in a variety of tailored repository systems
• Supporting the generation of new digital content by exploiting heterogeneous computational platforms
• Providing discovery and access capabilities on diversely described and modeled digital content

Page 8: D4science-II Codata

gCube as an e-Infrastructure ecosystem enabling framework

By bridging a number of well-established systems and standards from various domains, including high-energy physics, biodiversity, and fishery and aquaculture resources management, gCube realises an e-Infrastructure ecosystem.

Page 9: D4science-II Codata

How does it work?

Page 10: D4science-II Codata

Why is sharing through VREs key?

Through the VRE, groups of users have controlled access to distributed data and services integrated under a personalised interface.

Page 11: D4science-II Codata

Why is sharing through VREs key?

A Virtual Research Environment (VRE) supports cooperative activities:

• Metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies
• Process refinement and showcase implementation (restricted to a set of users)
• Data assessment (required to make data publicly exploitable by VO members)
• Expert-user validation of products generated through data elaboration or simulation

Page 12: D4science-II Codata

Why is sharing through VREs key?

A VRE's integrated environment provides a set of functionality to support and perform research activities:

• the ability to integrate heterogeneous data and services
• the ability to process information on demand, ingesting the results
• to share data and processes with other users
• to customize collections of information
• to store user actions and exploit them for further use
• to aggregate relevant information into ad hoc information sources and keep them updated

Page 13: D4science-II Codata

Building Virtual Research Environments

Page 19: D4science-II Codata

VRE Facilities

[Figure: VRE facilities. Facility labels: Workspace; Species Maps Generation; Time Series Management; Report Management. Captions: "Tools supporting specific tasks"; "A virtual live document to describe research results"; "A virtual desktop to organize the working environment". Service labels: Search, Annotation, Visualisation, Transformation, Storage]

Page 20: D4science-II Codata

Workspace

A collaboration-oriented suite providing:

• seamless access and organisation facilities on a rich array of objects (e.g. Information Objects, Queries, Files, Templates)
• mediation between external-world objects, systems, and infrastructures (import/export/publishing)
• common file manager features (drag & drop, contextual menu)
• an effective rich object sharing facility

Page 21: D4science-II Codata

Species Distribution Maps Generation

AquaMaps is an application*

• tailored to predict global distributions of marine species, initially designed for marine mammals and subsequently generalised to marine species,
• that generates color-coded species range maps using half-degree latitude and longitude blocks
• by interfacing several databases and repository providers

* Algorithm by Kaschner et al. 2006
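To make the half-degree blocks concrete, the sketch below maps a coordinate to a cell of a global 0.5° grid. The row and column numbering is a hypothetical convention chosen for illustration; it is not the cell coding AquaMaps itself uses.

```python
import math

CELL_SIZE_DEG = 0.5
COLS = int(360 / CELL_SIZE_DEG)  # 720 columns around the globe
ROWS = int(180 / CELL_SIZE_DEG)  # 360 rows from pole to pole


def cell_index(lat: float, lon: float) -> tuple:
    """Return (row, col) of the half-degree cell containing (lat, lon).

    Assumed convention: row 0 touches the North Pole, col 0 starts at 180°W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # min(...) folds the southern and eastern edges into the last row/column.
    row = min(int(math.floor((90.0 - lat) / CELL_SIZE_DEG)), ROWS - 1)
    col = min(int(math.floor((lon + 180.0) / CELL_SIZE_DEG)), COLS - 1)
    return row, col
```

Under this convention the whole globe holds `ROWS * COLS` = 259,200 cells, land included.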

Page 22: D4science-II Codata

Species Distribution Maps Generation

AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences

◦ to determine environmental envelopes (species tolerances)

◦ to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)

Very large volume of input and output data: HSPEC native range 56,468,301; HSPEC suitable range 114,989,360

Very large number of computations: one multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species requires 125 million computations (Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
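Matching species tolerances against local conditions can be sketched as follows. The slides do not describe the algorithm, so the trapezoidal response curve, the product across parameters, and all names and values here are illustrative assumptions, not the gCube suite's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    """Assumed tolerance of a species for one environmental parameter."""
    minimum: float
    preferred_min: float
    preferred_max: float
    maximum: float


def suitability(value: float, env: Envelope) -> float:
    """Trapezoidal response: 0 outside the range, 1 inside the preferred range."""
    if value < env.minimum or value > env.maximum:
        return 0.0
    if env.preferred_min <= value <= env.preferred_max:
        return 1.0
    if value < env.preferred_min:
        return (value - env.minimum) / (env.preferred_min - env.minimum)
    return (env.maximum - value) / (env.maximum - env.preferred_max)


def cell_suitability(conditions: Dict[str, float],
                     envelopes: Dict[str, Envelope]) -> float:
    """Combine per-parameter scores for one cell by multiplication (assumption)."""
    score = 1.0
    for parameter, env in envelopes.items():
        score *= suitability(conditions[parameter], env)
    return score


# Hypothetical envelopes and cell conditions, for illustration only.
envelopes = {
    "temperature": Envelope(0.0, 10.0, 20.0, 30.0),
    "salinity": Envelope(30.0, 33.0, 36.0, 40.0),
}
score = cell_suitability({"temperature": 5.0, "salinity": 34.0}, envelopes)
```

One such evaluation per species, per cell, and per parameter is the kind of workload that grows quickly with the number of cells and species quoted above.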

Page 23: D4science-II Codata

Time Series Management

Offers a set of tools to manage capture statistics

• Supports the complete TS lifecycle
• Supports validation, curation, and analysis
• Provides support for data reallocation
• Produces uniform data sets

Page 24: D4science-II Codata

Time Series

Offers a set of tools to operate on capture statistics

• Support for multiple key families
• Filtering, grouping, and aggregation
• Union
• Mining

Automatically produces provenance information
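The operations listed above, together with automatic provenance, can be sketched as below. The `TimeSeries` class, its method names, and the sample rows are hypothetical and only show how each operation could append a provenance entry to its result.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


@dataclass
class TimeSeries:
    rows: List[Row]
    provenance: List[str] = field(default_factory=list)

    def union(self, other: "TimeSeries") -> "TimeSeries":
        """Concatenate two series and record the step."""
        note = f"union(with {len(other.rows)} rows)"
        return TimeSeries(self.rows + other.rows, self.provenance + [note])

    def filter(self, description: str,
               predicate: Callable[[Row], bool]) -> "TimeSeries":
        """Keep the rows matching the predicate and record the step."""
        kept = [r for r in self.rows if predicate(r)]
        return TimeSeries(kept, self.provenance + [f"filter({description})"])

    def aggregate(self, keys: Sequence[str], value: str) -> "TimeSeries":
        """Group by the key columns and sum the value column."""
        totals: Dict[tuple, float] = defaultdict(float)
        for r in self.rows:
            totals[tuple(r[k] for k in keys)] += float(r[value])
        rows = [dict(zip(keys, group), **{value: total})
                for group, total in totals.items()]
        note = f"aggregate(by={list(keys)}, sum={value})"
        return TimeSeries(rows, self.provenance + [note])


# Hypothetical capture statistics.
a = TimeSeries([{"year": 2007, "species": "tuna", "catch": 10.0},
                {"year": 2007, "species": "cod", "catch": 5.0}])
b = TimeSeries([{"year": 2008, "species": "tuna", "catch": 7.0}])
result = (a.union(b)
           .filter("species == tuna", lambda r: r["species"] == "tuna")
           .aggregate(("species",), "catch"))
```

Because every operation returns a new series carrying the accumulated list, `result.provenance` describes how the final data set was derived without any extra effort from the user.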

Page 26: D4science-II Codata

Report Management

A collaboration-oriented suite providing:

• template-oriented, feature-rich, and flexible document format definition
• effective and infrastructure-integrated report compilation (drag & drop workspace items)
• collaborative and distributed editing (workspace based)
• standards-based report materialisation (HTML, OpenXML)

Page 27: D4science-II Codata

VREs, Workspaces and Report in Action

Page 28: D4science-II Codata

gCube and Humanities: the gMan case

JISC - King’s College London

• Look at new ways of integrating existing data resources for Classics and add services so that research work based on integrated resources can be published

Data sources

• The Heidelberger Gesamtverzeichnis (HGV) der griechischen Papyrusurkunden Aegyptens, a collection of metadata records for 55,000 Greek papyri from Egypt.
• Projet Volterra, a database of Roman legal texts, and associated metadata, from various sources (epigraphic, papyrological, or literary), currently in the low tens of thousands but very much in progress.
• The Inscriptions of Aphrodisias (InsAph), a corpus of about 2,000 ancient Greek inscriptions from the Roman city of Aphrodisias in Asia Minor, including transcribed texts and metadata marked up using EpiDoc TEI, as well as images of the physical objects.

Main functionality

• cross-collection search
• workspace
• annotation
• report creation

Early results in “AHM 2009 Phil. Trans. A special issue”

Page 29: D4science-II Codata

VRE Summary

D4Science approach:

• Heterogeneous resources are accessible in a common ecosystem of resources
  • regardless of their location, technology, and protocol

• Different communities have access to different views
  • according to the conditions under which the sharing can occur

• Each community can define its own virtual research environment to satisfy specific needs
  • for a limited timeframe and at no cost for the providers of the resource

• Several virtual research environments can coexist
  • without interfering with each other, even when competing for the same resources
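The idea of community-specific views over one shared pool can be sketched as below. The `Resource` and `VRE` types, the `shared_with` field, and the sample names are hypothetical; they only illustrate that a VRE is a filtered view and not a copy of the resources.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Resource:
    name: str
    provider: str
    # Communities the provider agreed to share with (assumed policy model).
    shared_with: FrozenSet[str]


@dataclass(frozen=True)
class VRE:
    name: str
    community: str
    wanted: FrozenSet[str]  # resource names this VRE asks for


def vre_view(vre: VRE, pool: List[Resource]) -> List[Resource]:
    """Return the resources the VRE asked for and is allowed to see."""
    return [r for r in pool
            if r.name in vre.wanted and vre.community in r.shared_with]


# Hypothetical pool and VREs; both VREs draw on the same pool.
pool = [
    Resource("species-maps", "provider-1", frozenset({"fisheries"})),
    Resource("satellite-data", "provider-2", frozenset({"environment"})),
    Resource("report-tool", "provider-3", frozenset({"fisheries", "environment"})),
]
fisheries_vre = VRE("fisheries-vre", "fisheries",
                    frozenset({"species-maps", "report-tool", "satellite-data"}))
environment_vre = VRE("environment-vre", "environment",
                      frozenset({"satellite-data", "report-tool"}))
```

Both VREs see `report-tool`, yet neither definition changes the pool, which is why several environments can coexist over the same resources.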

Page 30: D4science-II Codata

Conclusions

Facts

Very rich services and data collections are currently maintained by a multitude of authoritative providers

Several standards are adopted in the same domain

Interoperability approaches are key to exploit such richness

D4Science offers a variety of patterns, tools, and solutions

to interconnect
• heterogeneous digital content
• heterogeneous repository systems
• heterogeneous computation platforms

with a rich set of free-to-use tailored services
• to decrease the cost of adoption
• to reduce the time to market of new ideas
• to deal with the plethora of standards

Page 31: D4science-II Codata

Supported Standards

WS-* WSRF WS-BPEL

JDL JSDL Glue Schema (part)

X-* DC, TEI, ISO etc

JSR (several)

GSI-Security XACML SAML

OpenSearch

OGC related

Comply with: OAI-PMH OAI-ORE

Page 32: D4science-II Codata

Supported Standards

WSRF Specifications
• WS-ResourceProperties (WSRF-RP)
• WS-ResourceLifetime (WSRF-RL)
• WS-ServiceGroup (WSRF-SG)
• WS-BaseFaults (WSRF-BF)

JSR
• 168: Simple Portlets
• 286: 168 update
• 160: JMX

WSN Specifications
• WS-BaseNotification
• WS-Topics
• (WS-BrokeredNotification)
• ….

WS-* Standards
• SOAP
• WSDL
• WS-Addressing
• ….

ISO
• ISO3166 countries
• ISO4217 currencies
• ISO19115 geo-location
• ….

X-*
• XML
• XSD
• XSL
• XSLT
• XPath
• XQuery

OGC
• Web Coverage Processing Service
• Web Coverage Service
• Web Feature Service
• Web Map Context
• Web Map Service
• Web Map Tile Service
• Web Processing Service
• Web Service Common

OGF Standard
• Glue Schema (2)

……….

Comply with: OAI-PMH, OAI-ORE
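As one example of what compliance with OAI-PMH means for a client, the sketch below builds a `ListRecords` harvesting request. The verb and the `metadataPrefix` and `set` arguments come from the OAI-PMH protocol; the base URL and the set name are placeholders, not D4Science endpoints.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


# Placeholder repository address, for illustration only.
url = list_records_url("http://repository.example.org/oai", set_spec="fisheries")
```

The same request works against any compliant repository, which is what lets a harvester treat heterogeneous providers uniformly.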

Page 33: D4science-II Codata

Find us

www.gcube-system.org

www.d4science.eu

Donatella Castelli, D4Science-II Project [email protected]

Pasquale Pagano, D4Science-II Technical [email protected]

Thank You For Your Attention