www.d4science.eu
gCube framework: the way to implement the D4Science vision
Pasquale Pagano, CNR-ISTI, [email protected]
ICIS Requirements Gathering Meeting
Rome, Italy, 10-13 June 2008
D4Science vision
The D4Science vision calls for the realization of scientific e-Infrastructures that will remove all heterogeneity, sustainability, scalability, and other technical concerns from the minds of scientists, hide all related complexities from their perception, and enable them to focus on their science and collaborate on common research challenges.
gCube is a framework to manage distributed e-infrastructures where it is possible to define, host, and maintain dynamic Virtual Research Environments (VREs) capable of satisfying the collaboration needs of distributed Virtual Organizations (VOs).
e-Infrastructure
• An infrastructure is the set of basic physical and organizational structures and facilities (roads, power supplies, ...) needed for the operation of a society or enterprise
• An e-Infrastructure provides support for effective consumption of shared resources:
  - hardware-bound resources (i.e. networks, storage, instruments, and computational resources),
  - system-level software resources (i.e. basic middleware services), and
  - application-level software resources (i.e. data sources and services).
These e-Infrastructures offer mechanisms that concurrently exploit networks, grids and data in a seamless fashion, and will thus enable scientific communities to operate within a coherent model, regardless of the location of their research facilities.
Virtual Organization (VO)
• A Virtual Organization (VO) models sets of users and resources, defining clearly and carefully what is shared, who is allowed to share, and the conditions under which sharing occurs, usually based on authentication and authorization policies.
VOs may have a limited lifetime and are dynamically created to satisfy transient needs of the constituent, potentially heterogeneous, actual organizations.
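A minimal sketch of how such a VO could be modelled. The names `VirtualOrganization`, `SharingRule` and `is_allowed` are hypothetical and not part of gCube; the sketch only illustrates "what is shared, who is allowed to share, and under which conditions", plus a limited lifetime.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class SharingRule:
    """Hypothetical rule: which role may perform which action on a resource."""
    resource: str
    role: str
    action: str


@dataclass
class VirtualOrganization:
    """Illustrative VO model; the names are assumptions, not the gCube API."""
    name: str
    expires: date                                 # VOs may have a limited lifetime
    members: dict = field(default_factory=dict)   # user -> role
    rules: set = field(default_factory=set)       # set of SharingRule

    def is_allowed(self, user: str, resource: str, action: str, today: date) -> bool:
        # Sharing stops once the VO has expired.
        if today > self.expires:
            return False
        # Only members of the VO can share at all.
        role = self.members.get(user)
        if role is None:
            return False
        return SharingRule(resource, role, action) in self.rules


vo = VirtualOrganization("demo-vo", expires=date(2008, 12, 31))
vo.members["alice"] = "researcher"
vo.rules.add(SharingRule("satellite-images", "researcher", "read"))
```

With this setup `alice` may read `satellite-images` while the VO is alive; any other user, action, or a date after `expires` is refused.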
Virtual Research Environment
• A Virtual Research Environment (VRE) provides a framework of applications, services and data sources dynamically identified to support the underlying processes of research/collaboration/cooperation.
The purpose of a VRE is to help researchers* in all disciplines by managing the increasingly complex range of tasks involved in carrying out their activities.
*"Researcher" is meant in the broad sense, i.e. end users, decision-makers, resource and data providers, etc.
gCube Infrastructure
[Diagram labels: gHN, service, RI, port-type, VO, VRE; roles: ADMIN, DESIGNER, OWNER, USERS]
VRE Highlights
VRE Advantages
• gCube infrastructures create new opportunities to change the VRE development model used by distributed and dynamic organisations and communities
• Using gCube-empowered infrastructures, organisations and communities are able to set up their own environment:
  - When and for the time they need it
  - Exploiting existing Grid-based services
  - Accessing and handling distributed multi-focused data and services
  - Orchestrating user-defined services, with defined QoS (w.r.t. scalability, reliability)
  - Profiting from a shared set of storage and computational resources
  - Sharing data and services in a collaborative and efficient way
VRE Definition Steps
gCube Service Management: System Administrator Measured Effort
SCOPE            ACTION                                TIME
Infrastructure   install collective layer              < 1 day
Infrastructure   install portal                        < 1 day
VO               install 1 DHN                         < 10 min
VO               register resources (DHN, data)        < 1 min
VO               approve resource (DHN, data)          < 1 min
VO               data publishing (metadata, indexes)   hours
VO               manage users                          < 10 min
VRE              define VRE                            < 10 min
VRE              approve VRE                           < 1 min
VRE              deploy VRE                            < 2 hours
VRE              modify VRE                            < 1 hour
Content Management
Content & Storage Management: Challenges
Store large volumes of digital content in the Grid
But there is much more:
• Metadata for each object
  <DIMAP_DOCUMENT>
    <DATASET>MER_RR__2P</DATASET>
    <INSTRUMENT>MER</INSTRUMENT>
    <RESOLUTION_TYPE>RR</RESOLUTION_TYPE>
    <PRODUCT_LEVEL>2P</PRODUCT_LEVEL>
    <PRODUCT_NAME>MERIS</PRODUCT_NAME>
    <BAND_USED>___ALGAL_1</BAND_USED>
    <BAND_USED_NORM>ALGAL 1</BAND_USED_NORM>
    <START_DATE>2005-08-01</START_DATE>
    <END_DATE>2005-08-07</END_DATE>
    <LONMIN_INT>17000</LONMIN_INT>
    <LATMIN_INT>12000</LATMIN_INT>
    <LONMAX_INT>22000</LONMAX_INT>
    <LATMAX_INT>13500</LATMAX_INT>
    <COVER_REGIONS>World</COVER_REGIONS>
    <OVERLAP_REGIONS> World Europe Bigger_Europe Smaller_Europe Mediterranean Iberia North_Atlantic Africa North_Africa Middle_East Portugal </OVERLAP_REGIONS>
    <DATA_FILE_FORMAT>ENVI</DATA_FILE_FORMAT>
    ...
  </DIMAP_DOCUMENT>
• Automatically extracted features per object (e.g., color histograms for images): 203 236 172 210 78
• Storage properties (e.g., size, etc.)
… and all that highly inter-connected
gCube Data Management
• gCube Data Management supports researchers by providing means for
  - Persistent storage and physical structuring of content
  - Logical grouping of content in collections
  - Logical sharing of content among several collections
  - Sharing of collections in a VRE and among several VREs through a shared workspace
  - Management of complex content consisting of several parts and having multiple representations
  - Storage of structured and heterogeneous metadata compliant with different formats and schemas
  - Programmatic/manual annotation of content via text & images, e.g. data provenance
  - Content linking
  - Definition of ‘composite documents’ templates
gCube Data Management [cont.]
• gCube Data Management supports application designers/providers by providing means for
  - Bulk upload and update of data and metadata
  - Manipulation of metadata and data through a powerful Data Transformation Engine (gDTS)
    - Metadata can be cleaned, enriched, and transformed into different formats by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies to facilitate data integration and discovery
    - Data can be transformed to offer different views of the same goods
  - Replication and partition of data
  - Subscription and notification
  - Generation and publishing of new data through workflow definition, optimisation, and execution
• gCube exploits Grid technology file-system-like functionalities to manage data storage
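To make the metadata transformation idea concrete, here is a rough sketch of cleaning and transforming one record with a mapping schema and a controlled vocabulary. It is not the gDTS interface: `transform_record`, `field_map` and `vocabulary` are hypothetical names, and the vocabulary entry mapping `mer` to `MERIS` is an assumption made only for illustration.

```python
def transform_record(record, field_map, vocabulary):
    """Rename fields via a mapping schema and normalise values via a vocabulary.

    All names here are hypothetical; this is not the gDTS interface.
    """
    result = {}
    for source_field, value in record.items():
        target_field = field_map.get(source_field)
        if target_field is None:
            continue  # fields without a mapping rule are dropped
        cleaned = value.strip()  # cleaning step: remove stray whitespace
        # Replace known variants by the preferred vocabulary term.
        result[target_field] = vocabulary.get(cleaned.lower(), cleaned)
    return result


field_map = {"INSTRUMENT": "instrument", "START_DATE": "start", "END_DATE": "end"}
vocabulary = {"mer": "MERIS"}  # illustrative entry, assumed for this sketch
source = {
    "INSTRUMENT": " MER ",
    "START_DATE": "2005-08-01",
    "END_DATE": "2005-08-07",
    "BAND_USED": "___ALGAL_1",
}
target = transform_record(source, field_map, vocabulary)
```

Here `target` holds only the three mapped fields, with the instrument value normalised to the vocabulary term.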
Document Model
All entities in a VRE are information objects: collections, documents, metadata
Any information object can be stored or fetched independently of what it represents
[Diagram: entity-relationship model]
• Information Object: Info_Object_ID
• Storage_Property: Property_Name, Property_Type, Property_Value
• ObjectReference: Reference_Source, Reference_Target, Reference_Role, Reference_Propagation
• Relationship label: comprise
• BLOB object and File object are specialisations (ISA)
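The document model can be sketched with a few data classes. This is only an illustration under assumptions: the class and method names (`InMemoryStore`, `link`, `targets`) and the role names (`has-member`, `has-metadata`) are hypothetical, and treating Reference_Source and Reference_Target as object IDs is an assumption; only the attribute names in the comments come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class StorageProperty:
    name: str    # Property_Name
    type: str    # Property_Type
    value: str   # Property_Value


@dataclass
class ObjectReference:
    source: str       # Reference_Source (assumed to be an Info_Object_ID)
    target: str       # Reference_Target (assumed to be an Info_Object_ID)
    role: str         # Reference_Role
    propagation: str  # Reference_Propagation


@dataclass
class InformationObject:
    object_id: str                                  # Info_Object_ID
    properties: list = field(default_factory=list)  # StorageProperty items


class InMemoryStore:
    """Hypothetical store: objects are saved and fetched by ID, whatever they represent."""

    def __init__(self):
        self.objects = {}
        self.references = []

    def store(self, obj):
        self.objects[obj.object_id] = obj

    def fetch(self, object_id):
        return self.objects[object_id]

    def link(self, source, target, role, propagation="none"):
        self.references.append(ObjectReference(source, target, role, propagation))

    def targets(self, source, role):
        return [r.target for r in self.references if r.source == source and r.role == role]


store = InMemoryStore()
store.store(InformationObject("collection-1"))
store.store(InformationObject("image-1", [StorageProperty("size", "integer", "1024")]))
store.store(InformationObject("metadata-1"))
store.link("collection-1", "image-1", role="has-member")
store.link("image-1", "metadata-1", role="has-metadata")
```

Collections, documents and metadata are all plain `InformationObject` instances here; what distinguishes them is only the references between them.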
Content example (Earth Observation)
[Diagram labels: Satellite Image, Storage Properties, Image Features (×3), Metadata as XML Document, Metadata Management, Content & Storage Management, Feature Extraction]
Metadata as XML Document:
<DIMAP_DOCUMENT>
  <DATASET>MER_RR__2P</DATASET>
  <INSTRUMENT>MER</INSTRUMENT>
  <RESOLUTION_TYPE>RR</RESOLUTION_TYPE>
  <PRODUCT_LEVEL>2P</PRODUCT_LEVEL>
  <PRODUCT_NAME>MERIS</PRODUCT_NAME>
  <BAND_USED>___ALGAL_1</BAND_USED>
  <BAND_USED_NORM>ALGAL 1</BAND_USED_NORM>
  <START_DATE>2005-08-01</START_DATE>
  <END_DATE>2005-08-07</END_DATE>
  <LONMIN_INT>17000</LONMIN_INT>
  <LATMIN_INT>12000</LATMIN_INT>
  <LONMAX_INT>22000</LONMAX_INT>
  <LATMAX_INT>13500</LATMAX_INT>
  <COVER_REGIONS>World</COVER_REGIONS>
  <OVERLAP_REGIONS> World Europe Bigger_Europe Smaller_Europe Mediterranean Iberia North_Atlantic Africa North_Africa Middle_East Portugal </OVERLAP_REGIONS>
  <DATA_FILE_FORMAT>ENVI</DATA_FILE_FORMAT>
  ...
</DIMAP_DOCUMENT>
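As an illustration of how such XML metadata could be turned into searchable fields, the sketch below parses a subset of the DIMAP elements shown on this slide with Python's standard library. The function name `parse_dimap` and the output field names are hypothetical; elements elided on the slide ("...") are simply left out, and the integer bounding-box values are kept as given without interpreting their unit.

```python
import xml.etree.ElementTree as ET

# Subset of the metadata document from the slide; elided parts are omitted.
DIMAP_XML = """
<DIMAP_DOCUMENT>
  <DATASET>MER_RR__2P</DATASET>
  <PRODUCT_NAME>MERIS</PRODUCT_NAME>
  <START_DATE>2005-08-01</START_DATE>
  <END_DATE>2005-08-07</END_DATE>
  <LONMIN_INT>17000</LONMIN_INT>
  <LATMIN_INT>12000</LATMIN_INT>
  <LONMAX_INT>22000</LONMAX_INT>
  <LATMAX_INT>13500</LATMAX_INT>
  <OVERLAP_REGIONS> World Europe Bigger_Europe Smaller_Europe Mediterranean
    Iberia North_Atlantic Africa North_Africa Middle_East Portugal </OVERLAP_REGIONS>
</DIMAP_DOCUMENT>
"""


def parse_dimap(xml_text):
    """Extract a few fields from a DIMAP-style metadata document (illustrative)."""
    root = ET.fromstring(xml_text.strip())

    def text(tag):
        return root.findtext(tag).strip()

    return {
        "dataset": text("DATASET"),
        "product": text("PRODUCT_NAME"),
        "period": (text("START_DATE"), text("END_DATE")),
        # Bounding box as (lon_min, lat_min, lon_max, lat_max), raw integer values.
        "bbox": tuple(int(text(t)) for t in
                      ("LONMIN_INT", "LATMIN_INT", "LONMAX_INT", "LATMAX_INT")),
        # Region names are whitespace-separated in the source element.
        "regions": text("OVERLAP_REGIONS").split(),
    }


record = parse_dimap(DIMAP_XML)
```

The resulting dictionary could then feed a fielded or geospatial index; that step is outside this sketch.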
Data Management Architecture
[Diagram: layered architecture]
• Layers: Content Management Layer, Storage Management Layer, Base Layer
• Services: Content Management Service, Collection Management Service, Notification Service, Archive Import Service, Storage Management Service, Replication Management Service
• Back ends: any off-the-shelf DBMS, e.g. MySQL (JDBC; Properties & Relations); any SRM-enabled SE, e.g. DPM (GFAL; Raw File Content)
• Other labels: Cataloguing API, Storage API, LFC, SRM
gCube Storage Model
[Diagram labels: Content Management Layer, Storage Management Layer, Base Layer; RDBMS, gLite SE; BLOB object, File object (ISA); Content GUID; Information Object; Document]
Bridging Data Sources
• Data sources are interfaced through bridges hosted on the infrastructure
• The bridges are managed by the infrastructure
Managing Data Source Heterogeneity
Mapping rules
Managing Data Import
[Diagram labels: gCube Infrastructure, DS import, MR (×3), VRE 1, VRE 2, VRE 3, VRE 4]
Search Management
gCube Search Engine Features
• An open, feature-rich distributed Search Engine
  - Composed of diverse, autonomous, pluggable elements
  - Capturing complex application scenarios by combining information retrieval and data processing procedures
• Maximization of resources placed at the disposal of VRE managers and users
  - Ease of sharing of resources, avoiding mis-utilization and misuse
  - Reduction of cost of ownership and use
Functional overview
• Search types
  - Structured data (fielded search / XML search)
  - Semi-structured data (XML search)
  - Geospatial / temporal data (R-Tree)
  - Content based search
    - Full text search
    - Image similarity search
• Access
  - XML-based Query Language
  - Web user interface (portal / search portlets)
  - Command line UI
• Retrieval
  - Incremental result delivery
  - Automatic caching
  - Result persistence
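Incremental result delivery and automatic caching can be illustrated with a generator and a small cache wrapper. This is a sketch under assumptions, not the gCube search interface: `search_incrementally`, `CachedSource` and the two toy sources are hypothetical.

```python
def search_incrementally(sources, query, page_size=2):
    """Yield pages of hits as they become available instead of one final list."""
    page = []
    for source in sources:
        for hit in source(query):
            page.append(hit)
            if len(page) == page_size:
                yield page
                page = []
    if page:
        yield page  # last, possibly shorter page


class CachedSource:
    """Wrap a lookup function and remember the results of each query."""

    def __init__(self, lookup):
        self.lookup = lookup
        self.cache = {}
        self.calls = 0  # number of times the underlying lookup actually ran

    def __call__(self, query):
        if query not in self.cache:
            self.calls += 1
            self.cache[query] = list(self.lookup(query))
        return self.cache[query]


# Toy content sources, assumed for this example only.
images = CachedSource(lambda q: [f"image:{q}:{i}" for i in range(3)])
texts = CachedSource(lambda q: [f"text:{q}:0"])

pages = list(search_incrementally([images, texts], "algae"))
```

A caller can start presenting the first page while later pages are still being produced, and repeating the same query is answered from the cache without calling the sources again.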
Functional Overview [cont.]
• Automatic description of the Content Sources
• To cope with
  - Several content sources
  - Potentially thematically focused
  - Different ranking estimation
  - Different structures for managing metadata / information retrieval
  - Different content
• By offering
  - Selection of the appropriate sources
  - Merging or fusing (re-ranking) the results in meaningful lists
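One generic way to fuse results from sources with different ranking estimations is to normalise each source's scores before merging. The sketch below uses min-max normalisation and keeps the best score per document; this is a common textbook technique chosen for illustration, not necessarily what the gCube search engine does, and the function names and sample scores are assumptions.

```python
def normalise(results):
    """Min-max normalise (doc_id, score) pairs of one source to the range [0, 1]."""
    scores = [score for _, score in results]
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lowest) / (highest - lowest)) for doc, score in results]


def fuse(result_lists):
    """Merge normalised lists; a document found by several sources keeps its best score."""
    best = {}
    for results in result_lists:
        if not results:
            continue  # an empty source contributes nothing
        for doc, score in normalise(results):
            best[doc] = max(score, best.get(doc, 0.0))
    # Highest score first; ties are broken by document ID for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Two sources scoring on different scales (illustrative numbers).
source_a = [("d1", 12), ("d2", 8), ("d3", 4)]
source_b = [("d3", 90), ("d4", 60), ("d5", 30)]
fused = fuse([source_a, source_b])
```

Document `d3` ranks last in the first source but first in the second, so after fusion it sits next to `d1` at the top of the merged list.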
Search in action
Process Management
Process Management Rationale
• VREs are large-scale collections of resources that provide dedicated services to manage and access digital data sources
• VRE applications need to integrate these (distributed) resources into a coherent whole: "Programming in the Large" (composition of services into processes)
[Diagram: example services composed into a process: Query Reformulation, Index Access (Query Execution), Result Filtering]
Process Management: Example Search
• Similarity search over multimedia documents
  1. Query
  2. Extract Features
  3. Query Index
  4. Access Content & Create Result Set
  5. Present Results
[Diagram labels: Service, Process, Index; example feature vector: 203 236 172 210 78]
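The five steps of the similarity search can be sketched as a small pipeline. Everything here is an illustrative assumption: the feature is a coarse grey-level histogram, similarity is L1 distance, and the index and content store are plain dictionaries; none of the function names come from gCube.

```python
def extract_features(pixels, bins=4):
    """Step 2: histogram (count per bin) of pixel values in the range 0-255."""
    histogram = [0] * bins
    for value in pixels:
        histogram[min(value * bins // 256, bins - 1)] += 1
    return histogram


def distance(a, b):
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def query_index(index, features, k):
    """Step 3: IDs of the k indexed objects closest to the query features."""
    return sorted(index, key=lambda object_id: distance(index[object_id], features))[:k]


def similarity_search(query_pixels, index, content, k=2):
    features = extract_features(query_pixels)   # 2. Extract Features
    hits = query_index(index, features, k)      # 3. Query Index
    # 4. Access Content & Create Result Set
    return [(object_id, content[object_id]) for object_id in hits]


# Toy content store and index (pixel lists stand in for images).
content = {
    "img-a": [0, 10, 200, 250],
    "img-b": [100, 110, 120, 130],
    "img-c": [5, 15, 210, 240],
}
index = {object_id: extract_features(pixels) for object_id, pixels in content.items()}

results = similarity_search([0, 20, 220, 255], index, content)  # 1. Query
for object_id, pixels in results:                                # 5. Present Results
    print(object_id, pixels)
```

In a process-based setting each function would be a separate service, and the body of `similarity_search` is the process that composes them.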
Processes in gCube
• Following the SOA paradigm, processes are the first choice in gCube to define and execute applications on the basis of available services
• gCube’s approach to Process Management on the Grid consists of three main phases
  - Process Design and Verification
    - Through a graphical user interface for specification of processes
  - Process Execution and Reliability
    - Distributed process support in the infrastructure
    - Dynamic allocation of resources
    - Sophisticated failure handling
  - Process Optimization
    - Structural process modifications to maximise parallelism
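To illustrate what "maximising parallelism" can mean structurally, the sketch below groups the steps of a process into stages so that all steps in one stage can run at the same time. It is a generic topological levelling of a dependency graph, used here as an assumption-laden illustration rather than a description of gCube's optimiser; the step names are hypothetical.

```python
def parallel_stages(dependencies):
    """Group steps into stages; steps within a stage do not depend on each other.

    `dependencies` maps each step to the set of steps it must wait for.
    """
    remaining = {step: set(deps) for step, deps in dependencies.items()}
    stages = []
    while remaining:
        # Steps whose prerequisites are all done can run now, in parallel.
        ready = sorted(step for step, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency between steps")
        stages.append(ready)
        for step in ready:
            del remaining[step]
        for deps in remaining.values():
            deps.difference_update(ready)
    return stages


# Hypothetical process: two independent preparation steps feed the index query.
process = {
    "query": set(),
    "extract_features": {"query"},
    "reformulate_query": {"query"},
    "query_index": {"extract_features", "reformulate_query"},
    "present_results": {"query_index"},
}
stages = parallel_stages(process)
```

A purely sequential process would need five stages here; levelling shows that `extract_features` and `reformulate_query` can share one stage, so four suffice.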
Sample Application Process: MERIS Global Vegetation Index (MGVI)
Process Monitoring
• A monitoring interface allows users to keep track of running process instances
Conclusion
• gCube reduces the cost of managing any complex multi-domain e-Infrastructure.
• gCube offers a horizontal solution to manage and enrich VREs created on demand on complex e-Infrastructures.
• gCube is equipped with data and metadata management facilities that make heterogeneous data sources interoperable.
• gCube is compliant with consolidated and emerging standards.
• gCube offers an open family of frameworks that can be easily customised.
Thank you.