OGSA-DAI data access and integration

Neil Chue Hong Project Manager, EPCC [email protected] +44 131 650 5957 OGSA-DAI data access and integration NERC GridGIS workshop eSI, 1 February 2006


Page 1: OGSA-DAI data access  and integration

Neil Chue Hong

Project Manager, EPCC

[email protected]

+44 131 650 5957

OGSA-DAI: data access and integration

NERC GridGIS workshop

eSI, 1 February 2006

Page 2: OGSA-DAI data access  and integration

NERC GridGIS workshop - 1 February 2006 2

Overview

• The Data Deluge
  – challenges of increasing data availability
  – benefits of bringing data together

• OGSA-DAI
  – overview
  – use as a data integration base layer

Page 3: OGSA-DAI data access  and integration


Data Services: challenges to management

• Scale
  – Many sites, large collections, many uses

• Longevity
  – Research requirements outlive technical decisions

• Diversity
  – No “one size fits all” solutions will work
  – Primary Data, Data Products, Meta Data, Administrative data, …

• Many Data Resources
  – Independently owned & managed
    – No common goals
    – No common design
    – Work hard for agreements on foundation types and ontologies
    – Autonomous decisions change data, structure, policy, …
  – Geographically distributed

• and I haven’t even mentioned security yet!

Page 4: OGSA-DAI data access  and integration


What is a data service?

• An interface to a stored collection of data
  – e.g. Google and Amazon
  – web services

• But the data could be:
  – replicated
  – shared
  – federated
  – virtual
  – incomplete

• Don’t care about the underlying representation
  – do care about the information it represents

• Adding a service layer to existing data sources can improve composability
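The composability point can be made concrete with a small sketch. It is not OGSA-DAI code: the `DataService` interface, the two wrapper classes and the `site-a` / `site-b` rows are all hypothetical, made up for illustration. It assumes only that each source can be wrapped so that it returns rows as dictionaries, after which the same consumer code works over a relational table and a CSV file alike.

```python
import csv
import io
import sqlite3
from typing import Iterable, Protocol


class DataService(Protocol):
    """Hypothetical minimal service interface: callers ask for rows as dicts."""

    def rows(self) -> Iterable[dict]: ...


class SqlService:
    """Wraps a relational source behind the common interface."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn, self.query = conn, query

    def rows(self) -> Iterable[dict]:
        cursor = self.conn.execute(self.query)
        names = [column[0] for column in cursor.description]
        for record in cursor:
            yield dict(zip(names, record))


class CsvService:
    """Wraps CSV text behind the same interface."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterable[dict]:
        yield from csv.DictReader(io.StringIO(self.text))


def station_names(service: DataService) -> list[str]:
    """Consumer code that only depends on the service layer, not the storage."""
    return sorted(row["name"] for row in service.rows())


# Synthetic example data, not taken from the talk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (name TEXT)")
conn.execute("INSERT INTO station VALUES ('site-a')")
sql_service = SqlService(conn, "SELECT name FROM station")
csv_service = CsvService("name\nsite-b\n")

print(station_names(sql_service), station_names(csv_service))
```

The consumer never learns which representation sits underneath, which is the sense in which the slide says the representation does not matter but the information does.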

Page 5: OGSA-DAI data access  and integration


Use Cases for Data Services

• Data Filtering:
  – Single source producing large amounts of data distributed to many sites downstream

• Data Discovery:
  – many sources, many query entry points in a linked system

• Data Translation:
  – source to sink, conversion of data model / structure

• Data Federation:
  – many sources, linked to provide view as a single source

• Data Replication:
  – full or partial copies to improve throughput

• Data Integration (model aggregation)
  – e.g. integration of time variant data, streams, files

• Data Integration (knowledge expansion)
  – forming links between databases to increase knowledge
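As an illustration of the federation use case above, here is a minimal sketch, assuming every source shares one schema and can answer the same SQL. The `reading` table, the helper names and the values are invented for the example. Two in-memory SQLite databases stand in for independently managed sources.

```python
import sqlite3
from itertools import chain


def make_source(rows):
    """Create one stand-alone source holding a 'reading' table (synthetic data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (site TEXT, value REAL)")
    conn.executemany("INSERT INTO reading VALUES (?, ?)", rows)
    return conn


def federated_query(sources, sql, params=()):
    """Run the same query on every source and present the rows as one result."""
    per_source = (conn.execute(sql, params).fetchall() for conn in sources)
    return list(chain.from_iterable(per_source))


north = make_source([("n1", 1.5), ("n2", 4.0)])
south = make_source([("s1", 3.0)])

result = federated_query(
    [north, south],
    "SELECT site, value FROM reading WHERE value > ?",
    (2.0,),
)
print(result)
```

A real federation layer also has to reconcile differing schemas, ownership and policies, which are the management challenges listed earlier in the talk. The sketch ignores all of that.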

Page 6: OGSA-DAI data access  and integration


OGSA-DAI In One Slide

• An extensible framework for data access and integration.

• Expose heterogeneous data resources to a grid through web services.

• Interact with data resources:
  – Queries and updates.
  – Data transformation / compression
  – Data delivery.

• Customise for your project using
  – Additional Activities
  – Client Toolkit APIs
  – Data Resource handlers

• A base for higher-level services
  – federation, mining, visualisation, …
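The query, transformation / compression and delivery steps listed above can be pictured as a chain of small activities. The sketch below shows only that idea and is not the OGSA-DAI API. Every function name is hypothetical, JSON is an arbitrary choice of transformation, and a Python list stands in for a remote delivery target.

```python
import gzip
import json
import sqlite3


def sql_query(conn, sql):
    """Query activity: return rows as dictionaries."""
    cursor = conn.execute(sql)
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


def to_json(rows):
    """Transformation activity: serialise rows to UTF-8 JSON bytes."""
    return json.dumps(rows, sort_keys=True).encode("utf-8")


def compress(payload):
    """Compression activity: gzip the payload."""
    return gzip.compress(payload)


def deliver(payload, sink):
    """Delivery activity: hand the payload to a sink and report bytes sent."""
    sink.append(payload)
    return len(payload)


def run_pipeline(conn, sql, sink):
    """Chain the activities: query, transform, compress, deliver."""
    return deliver(compress(to_json(sql_query(conn, sql))), sink)


# Synthetic example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, name TEXT)")
conn.execute("INSERT INTO sample VALUES (1, 'a')")
sink = []
sent = run_pipeline(conn, "SELECT id, name FROM sample", sink)
print(sent, json.loads(gzip.decompress(sink[0])))
```

Adding a project-specific step would mean writing one more function of the same shape and inserting it into the chain. That is the role the slide gives to additional activities.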

Page 7: OGSA-DAI data access  and integration


The OGSA-DAI Framework

[Figure: layered diagram. Applications call the OGSA-DAI service through the Client Toolkit. Inside the service, the Engine runs Activities (SQLQuery, XPath, XSLT, GZip, GridFTP, readFile) against Data Resources (JDBC, XMLDB, File). The databases shown are MySQL, DB2, SQLServer, XIndice and SWISSPROT.]

Page 8: OGSA-DAI data access  and integration


Intermediary

• Simple intermediary
  – potential to accelerate development, logging, or filtering

• Persistent intermediary
  – e.g. to allow efficient local indexing

[Figure: three copies of the chain Client, OGSA-DAI, Data Resource, with "Request & Response" between the client and OGSA-DAI and "DR messages" between OGSA-DAI and the data resource. The last copy adds an "OGSA-DAI Private Store".]

Page 9: OGSA-DAI data access  and integration


Redirector, Coordinator, Network

• Allowing composition and decentralisation

[Figure: diagrams of the redirector, coordinator and network arrangements. Recoverable labels: Client, consumer, OGSA-DAI, Data Resource, DR1, DR2, DR3, "Request & Response", "Request, Response & Data Transport", "DR messages", "Data delivery".]

Page 10: OGSA-DAI data access  and integration


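Extensibility Example

[Figure: an OGSA-DAI service with its Engine and the SQLQuery activity, reaching MySQL over JDBC, shown together with a "MultipleSQL GDS" that takes a SQLQuery and connects to several SQL databases, each over JDBC.]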

Page 11: OGSA-DAI data access  and integration


Map Retrieval: Current

[Figure labels: browser, Internet, OGC Service, GIS, Oracle, EDINA.]

Page 12: OGSA-DAI data access  and integration


Map Retrieval: Grid Prototype

[Figure labels: Client, OGSA-DAI 1, SO-OGC, OGC, GIS, Oracle, EDINA. Caption: "Basic client to demonstrate proof of concept".]

Page 13: OGSA-DAI data access  and integration


Map Retrieval: Security

• Exploit NGS infrastructure to provide secure access layer

[Figure labels: Portlet, NGS Authentication, "Allowed users dn", ODS 1, SO-OGC, OGC, GIS, Oracle, EDINA.]

Page 14: OGSA-DAI data access  and integration


Map Retrieval: Integration

• Exploit OGSA-DAI extensibility to add e.g. overlay

[Figure labels: Portlet, NGS Authentication, ODS 1 (Oracle, Census), ODS 2 (OGC, GIS, Oracle), ODS 3 (Application data), SO-OGC (twice), JDBC, SQL/XML.]

Page 15: OGSA-DAI data access  and integration


OGSA-DAI / EDINA prototyping work

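• Stage 1: Using existing OGSA-DAI technology

• Stage 2: Extending OGSA-DAI

[Figure: a GIS Client passes a URL and input parameters to an OGSA-DAI service. The DeliverFromURL activity sends an HTTP Request to a WMS Server, shown as an HTTP Data Resource, and the HTTP Response is an Image/XML File. A second GIS Client is drawn against "GIS Activities".]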
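The "URL" and "Input Parameters" handed to DeliverFromURL in the prototype are not spelled out on the slide. The sketch below assumes they amount to a standard OGC WMS 1.1.1 GetMap request and shows how such a URL could be assembled. The host `wms.example.org`, the layer name `coastline` and the helper `build_getmap_url` are hypothetical. Only the query parameter names come from the WMS specification.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Assemble a WMS 1.1.1 GetMap URL (parameter names per the OGC spec)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer; the bounding box is an arbitrary example.
url = build_getmap_url(
    "http://wms.example.org/map",
    ["coastline"],
    (-8.0, 49.0, 2.0, 61.0),
    400,
    480,
)
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
```

A client would pass a URL like this to the service and receive the image or XML document that the WMS server returns.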

Page 16: OGSA-DAI data access  and integration


Distributed Query Processing

• Higher level services building on OGSA-DAI
  – specialised metadata extraction

• Execute queries in parallel over multiple data resources

• Queries mapped to algebraic expressions for evaluation

• Parallelism represented by partitioning queries
  – Use exchange operators

• Equality based joins in current release
  – supported types: long, integer, string, double and float

[Figure: example partitioned query plan. Operators shown: table_scan(protein), table_scan(proteinTerm) with termID=S92, reduce (three times), hash_join(proteinId), op_call(Blast) and exchange (twice). Partitions are labelled 1, 2 and 3,4.]
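To show how exchange operators and an equality join fit together, here is a toy single-process sketch. It is not the DQP implementation. The function names, the row layout and the sample rows are assumptions, with the table and column names borrowed from the plan above. `exchange` routes rows to partitions by hashing the join key, so matching rows from both inputs land in the same partition, and each partition then runs an ordinary hash join.

```python
from collections import defaultdict


def exchange(rows, key, partitions):
    """Route each row to a partition chosen by hashing its join key."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        buckets[hash(row[key]) % partitions].append(row)
    return buckets


def hash_join(left, right, key):
    """Equality join: build a hash table on the left input, probe with the right."""
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def partitioned_join(left, right, key, partitions=2):
    """Join partition by partition; each pair could run on a separate evaluator."""
    left_parts = exchange(left, key, partitions)
    right_parts = exchange(right, key, partitions)
    joined = []
    for l_part, r_part in zip(left_parts, right_parts):
        joined.extend(hash_join(l_part, r_part, key))
    return joined


# Synthetic rows; only the table and column names echo the slide.
protein = [
    {"proteinId": "P1", "sequence": "AAA"},
    {"proteinId": "P2", "sequence": "CCC"},
    {"proteinId": "P3", "sequence": "GGG"},
]
protein_term = [
    {"proteinId": "P1", "termID": "S92"},
    {"proteinId": "P2", "termID": "S10"},
    {"proteinId": "P3", "termID": "S92"},
]

# table_scan with the termID=S92 predicate, then the partitioned join.
selected = [row for row in protein_term if row["termID"] == "S92"]
rows = sorted(partitioned_join(protein, selected, "proteinId"),
              key=lambda row: row["proteinId"])
print(rows)
```

Partitioning on the join key only works for equality conditions, which fits the slide's note that the current release supports equality based joins.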

Page 17: OGSA-DAI data access  and integration


DQP architecture

[Figure: one Co-ordinator and three Evaluators, with four OGSA-DAI boxes. Annotations: "Query: SQL & OQL", "OGSA-DAI activity", "WS-I only", "Using client toolkit", "All interfaces that are supported by toolkit".]

Page 18: OGSA-DAI data access  and integration


Contributing to OGSA-DAI

• Additional functionality:
  – Provide activities which implement specific functionality
  – Provide extra client functionality
  – Provide different security mechanisms
  – Provide higher level components and applications

• Different levels of contributions
  – Based on OGSA-DAI?
  – Works with OGSA-DAI?
  – Part of OGSA-DAI?

Page 19: OGSA-DAI data access  and integration


In the near future

• A new version of the OGSA-DAI Engine
  – should look mostly the same externally
  – better support for concurrency, sessions and monitoring

• Implementing new versions of specifications
  – DAIS Specifications

• Key things that we will be addressing:
  – Performance
  – A Security Model which can be applied across platforms
  – Full Transactions framework, distributed transactions
  – More data integration facilities
  – Better abstraction over DBMS variation

• Application centric queries
  – collaborating with other projects

• Research projects looking at:
  – schema mapping
  – extended data resources

Page 20: OGSA-DAI data access  and integration


Associated Meetings and Workshops

• DIALOGUE Workshops (http://www.datagrids.org)
  – Data Integration Applications: Linking Organisations to Gain Understanding and Experience
  – Bringing together Data Integration middleware and application providers with users
  – Next one at NeSC: 9-10th February 2006
  – http://www.nesc.ac.uk/esi/events/636/

• Next Generation Distributed Data Management (HPDC15, Paris)
  – http://www.isi.edu/~annc/distributedDataWorkshop.html

• Data Management on Grids (VLDB’06, Seoul)

Page 21: OGSA-DAI data access  and integration


Conclusions

• The benefits of trying to integrate data are hindered by challenges such as heterogeneity, scale and distribution

• A common data service layer should make data integration easier

• OGSA-DAI provides an extensible, data service based framework which makes it easier to implement data integration

• GIS data is amenable to integration using data services

Page 22: OGSA-DAI data access  and integration


Further information

• The OGSA-DAI Project Site:
  – http://www.ogsadai.org.uk

• The DAIS-WG site:
  – http://forge.gridforum.org/projects/dais-wg/

• OGSA-DAI Users Mailing list
  – [email protected]
  – General discussion on grid DAI matters

• Formal support for OGSA-DAI releases
  – http://bugs.ogsadai.org.uk/

• OGSA-DAI training courses