www.d4science.eu gCube framework: the way to implement the D4Science vision Pasquale Pagano CNR-ISTI pasquale.pagano@isti.cnr.it ICIS Requirements Gathering Meeting Rome, Italy, 10-13 June 2008

Page 1: Www.d4science.eu gCube framework: the way to implement the D4Science vision Pasquale Pagano CNR-ISTI pasquale.pagano@isti.cnr.it ICIS Requirements Gathering.

www.d4science.eu

gCube framework: the way to implement the D4Science vision
Pasquale Pagano, CNR-ISTI, pasquale.pagano@isti.cnr.it

ICIS Requirements Gathering Meeting
Rome, Italy, 10-13 June 2008


D4Science vision

The D4Science vision calls for the realization of scientific e-Infrastructures that will remove all heterogeneity, sustainability, scalability, and other technical concerns from the minds of scientists, hide all related complexities from their perception, and enable them to focus on their science and collaborate on common research challenges.

gCube is a framework to manage distributed e-infrastructures where it is possible to define, host, and maintain dynamic Virtual Research Environments (VREs) capable of satisfying the collaboration needs of distributed Virtual Organizations (VOs).


e-Infrastructure

• An infrastructure comprises the basic physical and organizational structures and facilities (roads, power supplies, etc.) needed for the operation of a society or enterprise

• An e-Infrastructure provides support for effective consumption of shared resources:
  - hardware-bound resources (i.e. networks, storage, instruments, and computational resources),
  - system-level software resources (i.e. basic middleware services), and
  - application-level software resources (i.e. data sources and services).

These e-Infrastructures offer mechanisms that concurrently exploit networks, grids and data in a seamless fashion, and will thus enable scientific communities to operate within a coherent model, regardless of the location of their research facilities.


Virtual Organization (VO)

• A Virtual Organization (VO) models sets of users and resources, defining clearly and carefully what is shared, who is allowed to share, and the conditions under which sharing occurs, usually based on authentication and authorization policies.

VOs may have a limited lifetime, and they are dynamically created to satisfy transient needs of the constituent, potentially heterogeneous, actual Organizations.
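The VO definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `VirtualOrganization`, its fields, and the `may_access` check are hypothetical names chosen here, not part of gCube.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class VirtualOrganization:
    """Hypothetical VO model: who may share, what is shared, and for how long."""
    name: str
    members: Set[str] = field(default_factory=set)     # who is allowed to share
    resources: Set[str] = field(default_factory=set)   # what is shared
    expires: Optional[datetime] = None                  # a VO may have a limited lifetime

    def is_active(self, now: datetime) -> bool:
        return self.expires is None or now < self.expires

    def may_access(self, user: str, resource: str, now: datetime) -> bool:
        # Conditions under which sharing occurs: the VO is still alive,
        # the user is a member, and the resource was shared with the VO.
        return (
            self.is_active(now)
            and user in self.members
            and resource in self.resources
        )


vo = VirtualOrganization(
    name="demo-vo",
    members={"alice"},
    resources={"dataset-1"},
    expires=datetime(2008, 12, 31),
)
allowed = vo.may_access("alice", "dataset-1", datetime(2008, 6, 10))
not_member = vo.may_access("bob", "dataset-1", datetime(2008, 6, 10))
expired = vo.may_access("alice", "dataset-1", datetime(2009, 1, 1))
print(allowed, not_member, expired)
```

A real deployment would delegate the membership check to an authentication and authorization service; the in-memory sets only stand in for that.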


Virtual Research Environment

• A Virtual Research Environment (VRE) provides a framework of applications, services and data sources dynamically identified to support the underlying processes of research/collaboration/cooperation.

The purpose of a VRE is to help researchers* in all disciplines by managing the increasingly complex range of tasks involved in carrying out their activities.

*"Researcher" is to be understood in the broad sense, i.e. end users, decision-makers, resource and data providers, etc.


gCube Infrastructure

[Figure: gCube Infrastructure diagram. Labels: gHN, RI, service, port-type; scopes: VO, VRE; roles: ADMIN, DESIGNER, USERS, OWNER.]


VRE Highlights


VRE Advantages

• gCube infrastructures create new opportunities to change the VRE development model used by distributed and dynamic organisations and communities

• Using gCube-empowered infrastructures, organisations and communities are able to set up their own environment:
  - When and for the time they need it
  - Exploiting existing Grid-based services
  - Accessing and handling distributed multi-focused data and services
  - Orchestrating user-defined services, with defined QoS (w.r.t. scalability, reliability)
  - Profiting from a shared set of storage and computational resources
  - Sharing data and services in a collaborative and efficient way


VRE Definition Steps


gCube Service Management: System Administrator Measured Effort

SCOPE            ACTION                                 TIME
Infrastructure   install collective layer               < 1 day
                 install portal                         < 1 day
VO               install 1 DHN                          < 10 min
                 register resources (DHN, data)         < 1 min
                 approve resource (DHN, data)           < 1 min
                 data publishing (metadata, indexes)    hours
                 manage users                           < 10 min
VRE              define VRE                             < 10 min
                 approve VRE                            < 1 min
                 deploy VRE                             < 2 hours
                 modify VRE                             < 1 hour


Content Management


Content & Storage Management: Challenges

Store large volumes of digital content in the Grid

But there is much more:

Metadata for each object

  <DIMAP_DOCUMENT>
    <DATASET>MER_RR__2P</DATASET>
    <INSTRUMENT>MER</INSTRUMENT>
    <RESOLUTION_TYPE>RR</RESOLUTION_TYPE>
    <PRODUCT_LEVEL>2P</PRODUCT_LEVEL>
    <PRODUCT_NAME>MERIS</PRODUCT_NAME>
    <BAND_USED>___ALGAL_1</BAND_USED>
    <BAND_USED_NORM>ALGAL 1</BAND_USED_NORM>
    <START_DATE>2005-08-01</START_DATE>
    <END_DATE>2005-08-07</END_DATE>
    <LONMIN_INT>17000</LONMIN_INT>
    <LATMIN_INT>12000</LATMIN_INT>
    <LONMAX_INT>22000</LONMAX_INT>
    <LATMAX_INT>13500</LATMAX_INT>
    <COVER_REGIONS>World</COVER_REGIONS>
    <OVERLAP_REGIONS> World Europe Bigger_Europe Smaller_Europe Mediterranean Iberia North_Atlantic Africa North_Africa Middle_East Portugal </OVERLAP_REGIONS>
    <DATA_FILE_FORMAT>ENVI</DATA_FILE_FORMAT>
    ...
  </DIMAP_DOCUMENT>

Automatically extracted features per object (e.g., color histograms for images)

  203 236 172 210 78

Storage properties (e.g., size, etc.)

… and all that highly inter-connected
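As an illustration of what "automatically extracted features" can look like, here is a minimal color-histogram sketch. The function name `color_histogram`, the binning scheme, and the toy pixel data are assumptions made for this example; the slides do not say how gCube computes its features.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count how many pixels fall into each coarse (r, g, b) bin.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    Returns a flat list of bins_per_channel ** 3 counts.
    """
    n = bins_per_channel
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1
    return hist


# Toy "image": two reddish pixels, one blue pixel, one black pixel.
image = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 0)]
features = color_histogram(image, bins_per_channel=2)
print(features)  # [1, 1, 0, 0, 2, 0, 0, 0]
```

The resulting vector is the kind of compact numeric description that can be stored next to an object and later compared for similarity.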


gCube Data Management

• gCube Data Management supports researchers by providing means for
  - Persistent storage and physical structuring of content
  - Logical grouping of content in collections
  - Logical sharing of content among several collections
  - Sharing of collections in a VRE and among several VREs through a shared workspace
  - Management of complex content consisting of several parts and having multiple representations
  - Storage of structured and heterogeneous metadata compliant with different formats and schemas
  - Programmatic/manual annotation of content via text & images, e.g. data provenance
  - Content linking
  - Definition of ‘composite document’ templates


gCube Data Management [cont.]

• gCube Data Management supports application designers/providers by providing means for
  - Bulk upload and update of data and metadata
  - Manipulation of metadata and data through a powerful Data Transformation Engine (gDTS)
    - Metadata can be cleaned, enriched, and transformed into different formats by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies to facilitate data integration and discovery
    - Data can be transformed to offer different views of the same goods
  - Replication and partition of data
  - Subscription and notification
  - Generation and publishing of new data through workflow definition, optimisation, and execution

• gCube exploits the file-system-like functionalities of Grid technology to manage data storage
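The idea of cleaning and transforming metadata with a mapping schema and a controlled vocabulary can be sketched as follows. This is not the gDTS API: `MAPPING`, `VOCABULARY`, `transform`, and the target field names are hypothetical, and only the source field names and values are taken from the metadata example in these slides.

```python
# Hypothetical mapping schema: source field name -> target field name.
MAPPING = {
    "PRODUCT_NAME": "title",
    "START_DATE": "coverage_start",
    "END_DATE": "coverage_end",
    "BAND_USED": "subject",
}

# Hypothetical controlled vocabulary used to clean raw values.
VOCABULARY = {"___ALGAL_1": "ALGAL 1"}


def transform(record, mapping=MAPPING, vocabulary=VOCABULARY):
    """Rename mapped fields, drop unmapped ones, and normalize values."""
    out = {}
    for source_field, target_field in mapping.items():
        if source_field not in record:
            continue  # tolerate incomplete source records
        value = record[source_field].strip()
        out[target_field] = vocabulary.get(value, value)
    return out


source = {
    "PRODUCT_NAME": "MERIS",
    "START_DATE": "2005-08-01",
    "END_DATE": "2005-08-07",
    "BAND_USED": "___ALGAL_1",
    "INSTRUMENT": "MER",  # no mapping rule, so it is not carried over
}
result = transform(source)
print(result)
```

Keeping the rules in plain data rather than in code means a different source format only needs a different mapping table.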


Document Model

All entities in a VRE are information objects: collections, documents, metadata

Any information object can be stored or fetched independently of what it represents

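[Figure: document model diagram. Entities and attributes as labelled: Information Object (Info_Object_ID); Storage_Property (Property_Name, Property_Type, Property_Value); ObjectReference (Reference_Source, Reference_Target, Reference_Role, Reference_Propagation); relation: comprise; ISA specialization: BLOB object, File object.]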
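The entity names in the document model can be turned into a small set of record types. The sketch below is an interpretation, not the gCube schema: the attribute names come from the slide, but which attribute belongs to which entity, and the helper methods `link` and `targets`, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageProperty:
    property_name: str
    property_type: str
    property_value: str


@dataclass
class ObjectReference:
    reference_source: str       # Info_Object_ID of the referring object
    reference_target: str       # Info_Object_ID of the referenced object
    reference_role: str         # e.g. "has-member", "has-metadata" (made-up roles)
    reference_propagation: str  # kept as an opaque label in this sketch


@dataclass
class InformationObject:
    info_object_id: str
    properties: List[StorageProperty] = field(default_factory=list)
    references: List[ObjectReference] = field(default_factory=list)

    def link(self, target: "InformationObject", role: str,
             propagation: str = "none") -> ObjectReference:
        ref = ObjectReference(self.info_object_id, target.info_object_id,
                              role, propagation)
        self.references.append(ref)
        return ref

    def targets(self, role: str) -> List[str]:
        return [r.reference_target for r in self.references
                if r.reference_role == role]


# Collections, documents, and metadata are all information objects.
collection = InformationObject("col-1")
image = InformationObject("img-1", [StorageProperty("size", "int", "2048")])
metadata = InformationObject("md-1")
collection.link(image, "has-member")
image.link(metadata, "has-metadata")
print(collection.targets("has-member"), image.targets("has-metadata"))
```

Because every entity shares one type, storing or fetching an object never depends on whether it is a collection, a document, or a metadata record.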


Content example (Earth Observation)

[Figure: a Satellite Image linked to its Storage Properties, its Image Features, and its Metadata as XML Document. Components labelled: Content & Storage Management, Metadata Management, Feature Extraction.]

Metadata as XML Document:

  <DIMAP_DOCUMENT>
    <DATASET>MER_RR__2P</DATASET>
    <INSTRUMENT>MER</INSTRUMENT>
    <RESOLUTION_TYPE>RR</RESOLUTION_TYPE>
    <PRODUCT_LEVEL>2P</PRODUCT_LEVEL>
    <PRODUCT_NAME>MERIS</PRODUCT_NAME>
    <BAND_USED>___ALGAL_1</BAND_USED>
    <BAND_USED_NORM>ALGAL 1</BAND_USED_NORM>
    <START_DATE>2005-08-01</START_DATE>
    <END_DATE>2005-08-07</END_DATE>
    <LONMIN_INT>17000</LONMIN_INT>
    <LATMIN_INT>12000</LATMIN_INT>
    <LONMAX_INT>22000</LONMAX_INT>
    <LATMAX_INT>13500</LATMAX_INT>
    <COVER_REGIONS>World</COVER_REGIONS>
    <OVERLAP_REGIONS> World Europe Bigger_Europe Smaller_Europe Mediterranean Iberia North_Atlantic Africa North_Africa Middle_East Portugal </OVERLAP_REGIONS>
    <DATA_FILE_FORMAT>ENVI</DATA_FILE_FORMAT>
    ...
  </DIMAP_DOCUMENT>
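A metadata record of this shape can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the example (only some of the fields shown on the slide are included); the helper name `parse_metadata` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's metadata example (a subset of its fields).
DIMAP = """<DIMAP_DOCUMENT>
  <DATASET>MER_RR__2P</DATASET>
  <PRODUCT_NAME>MERIS</PRODUCT_NAME>
  <START_DATE>2005-08-01</START_DATE>
  <END_DATE>2005-08-07</END_DATE>
  <LONMIN_INT>17000</LONMIN_INT>
  <LATMIN_INT>12000</LATMIN_INT>
  <LONMAX_INT>22000</LONMAX_INT>
  <LATMAX_INT>13500</LATMAX_INT>
  <OVERLAP_REGIONS> World Europe Mediterranean Iberia </OVERLAP_REGIONS>
  <DATA_FILE_FORMAT>ENVI</DATA_FILE_FORMAT>
</DIMAP_DOCUMENT>"""


def parse_metadata(xml_text):
    """Turn the flat child elements of the root into a dictionary."""
    root = ET.fromstring(xml_text)
    record = {child.tag: (child.text or "").strip() for child in root}
    # The region list is whitespace-separated inside one element.
    record["OVERLAP_REGIONS"] = record.get("OVERLAP_REGIONS", "").split()
    return record


record = parse_metadata(DIMAP)
bbox = tuple(int(record[key]) for key in
             ("LONMIN_INT", "LATMIN_INT", "LONMAX_INT", "LATMAX_INT"))
print(record["PRODUCT_NAME"], bbox, record["OVERLAP_REGIONS"])
```

Once parsed, the date range, bounding box, and region list are the natural inputs for fielded and geospatial search.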


Data Management Architecture

[Figure: Data Management Architecture.
 Layers: Content Management Layer, Storage Management Layer, Base Layer.
 Services: Content Management Service, Collection Management Service, Notification Service, Archive Import Service, Storage Management Service, Replication Management Service.
 Base Layer back ends: any off-the-shelf DBMS, e.g. MySQL (JDBC; Properties & Relations), and any SRM-enabled SE, e.g. DPM (GFAL; Raw File Content).
 APIs: Cataloguing API (LFC), Storage API (SRM).]
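The split between "Properties & Relations" in a DBMS and "Raw File Content" in a storage element can be illustrated with a toy store. Everything here is a stand-in: `sqlite3` replaces the off-the-shelf DBMS, a local directory replaces the SRM-enabled storage element, and the class `StorageSketch` with its `store`/`fetch` methods is hypothetical, not the gCube Storage Management Service.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


class StorageSketch:
    """Toy storage layer: properties go to a database, raw bytes to files."""

    def __init__(self, root):
        self.root = Path(root)                  # stand-in for a storage element
        self.db = sqlite3.connect(":memory:")   # stand-in for the DBMS
        self.db.execute(
            "CREATE TABLE properties (object_id TEXT, name TEXT, value TEXT)"
        )

    def store(self, content: bytes, properties: dict) -> str:
        object_id = uuid.uuid4().hex
        (self.root / object_id).write_bytes(content)
        self.db.executemany(
            "INSERT INTO properties VALUES (?, ?, ?)",
            [(object_id, name, str(value)) for name, value in properties.items()],
        )
        self.db.commit()
        return object_id

    def fetch(self, object_id: str):
        content = (self.root / object_id).read_bytes()
        rows = self.db.execute(
            "SELECT name, value FROM properties WHERE object_id = ?",
            (object_id,),
        ).fetchall()
        return content, dict(rows)


with tempfile.TemporaryDirectory() as tmp:
    storage = StorageSketch(tmp)
    oid = storage.store(b"raw image bytes", {"size": 15, "format": "ENVI"})
    content, props = storage.fetch(oid)

print(content, props)
```

Callers only ever see an object identifier, so either back end could be swapped without changing the layers above.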


gCube Storage Model

[Figure: gCube Storage Model. Layers: Content Management Layer, Storage Management Layer, Base Layer (RDBMS, gLite SE). Entities: Document, Information Object, Content GUID; ISA specialization: BLOB object, File object.]


Bridging Data Sources

Hosted on the infrastructure

Data Sources are interfaced through ..

The bridges are managed by ..

.. the infrastructure


Managing Data Source Heterogeneity

Mapping rules


Managing Data Import

[Figure: data import into the gCube Infrastructure. Labels: DS import; MR (three instances); VRE 1, VRE 2, VRE 3, VRE 4.]


Search Management


gCube Search Engine Features

• An open, feature-rich distributed Search Engine
  - Composed of diverse, autonomous, pluggable elements
  - Capturing complex application scenarios by combining information retrieval and data processing procedures

• Maximization of resources placed at the disposal of VRE managers and users
  - Ease of sharing of resources, avoiding mis-utilization and misuse
  - Reduction of cost of ownership and use


Functional overview

• Search types
  - Structured data (fielded search / xml search)
  - Semi structured data (xml search)
  - Geospatial / temporal data (R-Tree)
  - Content based search
    - Full text search
    - Image similarity search

• Access
  - XML-based Query Language
  - Web user interface (portal / search portlets)
  - Command line UI

• Retrieval
  - Incremental result delivery
  - Automatic caching
  - Result persistence
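Incremental result delivery means a client can start consuming matches before the whole result set exists. A minimal way to show the idea is a generator; the `search` function and the in-memory `index` below are hypothetical and use naive substring matching rather than a real full-text index.

```python
from itertools import islice


def search(index, term):
    """Yield matching document ids one at a time instead of building a full list."""
    needle = term.lower()
    for doc_id, text in index.items():
        if needle in text.lower():
            yield doc_id


index = {
    "d1": "Satellite image of Iberia",
    "d2": "Vegetation index",
    "d3": "MERIS image, North Africa",
}

results = search(index, "image")
first_page = list(islice(results, 1))  # the client takes the first hit only
rest = list(results)                   # and later pulls the remainder
print(first_page, rest)
```

The same iterator is resumed for the second request, so no document is matched twice.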


Functional Overview [cont.]

• Automatic description of the Content Sources

• To cope with
  - Several content sources
    - Potentially thematically focused
  - Different ranking estimation
  - Different structures for managing metadata / information retrieval
  - Different content

• By offering the
  - Selection of the appropriate sources
  - Merging or fusing (re-ranking) the results in meaningful lists
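When sources rank with different score scales, their result lists have to be made comparable before they can be merged. The sketch below uses min-max normalization followed by a CombSUM-style sum of scores; this is a common textbook fusion scheme chosen for illustration, and the slides do not state which method gCube applies. The function names and sample scores are made up.

```python
def normalize(results):
    """Min-max normalize (doc_id, score) pairs from one source to [0, 1]."""
    if not results:
        return {}
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return {doc: 1.0 for doc, _ in results}
    return {doc: (score - lo) / (hi - lo) for doc, score in results}


def fuse(*sources):
    """Sum normalized scores per document and sort best-first."""
    fused = {}
    for results in sources:
        for doc, score in normalize(results).items():
            fused[doc] = fused.get(doc, 0.0) + score
    # Ties are broken by document id to keep the output deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


source_a = [("d1", 12.0), ("d2", 8.0), ("d3", 4.0)]  # scores on an arbitrary scale
source_b = [("d2", 0.9), ("d4", 0.5), ("d1", 0.1)]   # scores already in 0..1
ranking = fuse(source_a, source_b)
print([doc for doc, _ in ranking])  # ['d2', 'd1', 'd4', 'd3']
```

Document d2 wins because both sources rank it highly, which is the behaviour one usually wants from a fused list.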


Search in action


Process Management


Process Management Rationale

• VREs are large-scale collections of resources that provide dedicated services to manage and access digital data sources

• VRE applications need to integrate these (distributed) resources into a coherent whole = "Programming in the Large" (composition of services into processes)

[Figure: example services composed into a process: Query Reformulation, Index Access (Query Execution), Result Filtering.]


Process Management: Example Search

• Similarity search over multimedia documents
  1. Query
  2. Extract Features
  3. Query Index
  4. Access Content & Create Result Set
  5. Present Results

[Figure labels: Service, Process, Index; example feature vector: 203 236 172 210 78.]
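The five steps of the example search can be written as a chain of small functions, one per service. This is a sketch under assumptions: the function names, the in-memory `index` and `store`, and the use of Euclidean distance are illustrative choices, and the feature extractor is a placeholder that treats the query as an already-computed vector.

```python
import math


def extract_features(image):
    # Placeholder extractor: the "image" is already a list of numbers.
    return [float(value) for value in image]


def query_index(index, features, k):
    """Return the ids of the k nearest feature vectors (Euclidean distance)."""
    ranked = sorted(index, key=lambda doc_id: math.dist(index[doc_id], features))
    return ranked[:k]


def access_content(store, doc_ids):
    return [(doc_id, store[doc_id]) for doc_id in doc_ids]


def present(result_set):
    return "\n".join(f"{doc_id}: {title}" for doc_id, title in result_set)


def similarity_search(query_image, index, store, k=2):   # 1. Query
    features = extract_features(query_image)             # 2. Extract Features
    doc_ids = query_index(index, features, k)            # 3. Query Index
    result_set = access_content(store, doc_ids)          # 4. Access Content & Create Result Set
    return present(result_set)                           # 5. Present Results


index = {
    "img-1": [203, 236, 172, 210, 78],
    "img-2": [10, 20, 30, 40, 50],
    "img-3": [200, 230, 170, 205, 80],
}
store = {"img-1": "Algal bloom", "img-2": "Night sky", "img-3": "Coastal water"}
output = similarity_search([203, 236, 172, 210, 78], index, store)
print(output)
```

Each function could run as a separate service on a different node; the process is just the order in which they are invoked.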


Processes in gCube

• Following the SOA paradigm, processes are the first choice in gCube to define and execute applications on the basis of available services

• gCube’s approach to Process Management on the Grid consists of three main phases
  - Process Design and Verification
    - Through a graphical user interface for specification of processes
  - Process Execution and Reliability
    - Distributed process support in the infrastructure
    - Dynamic allocation of resources
    - Sophisticated failure handling
  - Process Optimization
    - Structural process modifications to maximise parallelism
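Two of the points above, failure handling and maximising parallelism, can be illustrated with Python's standard `concurrent.futures` module. The helpers `run_with_retry` and `run_parallel`, the retry count, and the toy steps are assumptions for this sketch; gCube's own process engine is not shown in the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retry(step, arg, attempts=3):
    """Simple failure handling: re-invoke a step that raised, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return step(arg)
        except Exception as error:
            last_error = error
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error


def run_parallel(steps, arg):
    """Run steps that do not depend on each other concurrently, keeping their order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_with_retry, step, arg) for step in steps]
        return [future.result() for future in futures]


calls = {"count": 0}


def flaky_double(x):
    # Fails on its first invocation to simulate a transient service error.
    calls["count"] += 1
    if calls["count"] == 1:
        raise IOError("transient failure")
    return x * 2


def square(x):
    return x * x


def process(x):
    # The two steps are independent, so they can be executed side by side.
    doubled, squared = run_parallel([flaky_double, square], x)
    return doubled + squared


total = process(3)
print(total, calls["count"])  # 15 2
```

Recognising that two steps are independent is the structural part; the executor then exploits it without changing the result.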


… Sample Application Process: MERIS Global Vegetation Index (MGVI)


Process Monitoring

• A monitoring interface makes it possible to keep track of running process instances


Conclusion

• gCube reduces the cost of managing any complex multi-domain e-Infrastructure.

• gCube offers a horizontal solution to manage and enrich VREs created on demand on complex e-Infrastructures.

• gCube is equipped with data and metadata management facilities that make heterogeneous data sources interoperable.

• gCube is compliant with consolidated and emerging standards.

• gCube offers an open family of frameworks that can be easily customised.


Thank you.