SEEK EcoGrid l Integrate diverse data networks from ecology, biodiversity, and environmental...

Post on 02-Jan-2016


SEEK EcoGrid

Integrate diverse data networks from ecology, biodiversity, and environmental sciences

Metacat, DiGIR, SRB, Xanthoria, ...
EML is the core for data documentation
Open programming interface

EcoGrid client interactions

Aims of EcoGrid

Which, Where, How, Who?
Share Data and Information
Relate Data from multiple projects/groups
Crosswalks across data structures
Develop Eco-related Finding Aids for Data
Global User: Authenticate and Authorize
Provide an infrastructure for “Archivable Collection-building” for SEEK scientists
Facilitate the A&M layer and the SMS layer

Challenges of EcoGrid

Data & User Diversity
    6000+ datasets & 1500+ scientists
    themes, methods, units, structures
    Small data sizes but high complexity - metadata
Multiple Data Organizations
    Biodiversity Surveys
    Population data
    GIS, Satellite Images, Weather Data, …
Ontologies & Taxonomies
Data Discovery: No single place to find
Data Entropy – rapid decline of information on data
Autonomy with Centralized access
Leverage Computational Grid work

Existing services

Metacat – syntactic and semantic metadata querying/inserting/updating/deleting, user registration/authentication, data replication, data/metadata versioning; supports any XML-based metadata

Xanthoria – common-schema mediator (currently 8 sites); metadata query/insert/update/delete for any XML schema to underlying metadatabase (SQL, native XML)

Existing Systems

DiGIR – querying arbitrary XML-describable resources (underlying data sources can be any type: RDB, XMLDB).

ClimDB – integrating (using wrapping at the data source) diverse format climate data. Access through web, common schema identified beforehand – tabular description

HyperLTER – summary ontology put in as metadata for images; image extraction, geographic subsetting, band-level subsetting; integration with MODIS images, hyperspectral images, TM images, air photos, …

Existing Systems

VegBank – 3 databases: co-occurrence records, a concept-driven species taxonomic database, and community classification. Distributed VegBank, querying by plots. Query/insert/update/annotate across three diverse databases that are described using XML

SRB – access to distributed data; querying based on syntactic, semantic, and user-defined (arbitrary relational) metadata. Annotations for data. Operations on data. Extraction of metadata. Ingest, bulk ingest, delete, and update of data/metadata

EcoGrid Services

Query: Search metadata and data, return result sets with ID

Read: Retrieve data objects by ID

Authentication: Verify user identity

Authorization: Record allowable interactions

Write: Write data objects by ID

Replication: Mirror objects for backup and efficiency

Computation: Execute models and simulations from AMS on various nodes
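
As a rough illustration of how the Query, Read, and Write services listed above might look as a programming interface, here is a minimal Python sketch. The names `EcoGridNode`, `InMemoryNode`, `query`, `read`, and `write` are hypothetical and are not taken from the EcoGrid WSDL; the "Bird survey" record is made-up sample data; authentication, authorization, replication, and computation are left out.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class EcoGridNode(ABC):
    """Hypothetical interface mirroring the Query, Read, and Write services."""

    @abstractmethod
    def query(self, predicate: Callable[[dict], bool]) -> List[str]:
        """Search metadata and return the IDs of matching objects."""

    @abstractmethod
    def read(self, object_id: str) -> dict:
        """Retrieve a data object by ID."""

    @abstractmethod
    def write(self, object_id: str, obj: dict) -> None:
        """Store a data object under the given ID."""


class InMemoryNode(EcoGridNode):
    """Toy implementation backed by a dictionary (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def query(self, predicate):
        # Return IDs in sorted order so results are deterministic.
        return sorted(i for i, o in self._objects.items() if predicate(o))

    def read(self, object_id):
        return self._objects[object_id]

    def write(self, object_id, obj):
        self._objects[object_id] = obj


node = InMemoryNode()
node.write("bar.1.23", {"title": "Soil data from West Valley, 1983"})
node.write("bar.2.1", {"title": "Bird survey"})  # made-up sample record
hits = node.query(lambda o: "soil" in o["title"].lower())
print(hits)
```

A wrapper for a system such as Metacat or DiGIR would, under this sketch, be another subclass that translates these three calls into that system's native requests.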

EcoGrid Search Interactions

Features
    Well-defined interfaces (e.g., WSDL)
    Standardized messaging formats
    Automated discovery of implementing services
    Aggregation/Indexing across nodes for efficiency
    Support heterogeneous data objects via metadata descriptions
    Lightweight to implement for various systems like DiGIR and Metacat

[Diagram: a Client, a Registry, and six QueryService nodes. Labeled interactions: 1. Register (appears twice); 2. Find Query Nodes; 3. Search (recursive); 4. Read (recursive); 5. Find Index Nodes]
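
The labeled interactions can be sketched in Python as follows, assuming that query nodes register themselves with a registry, that a client then discovers them, searches each one, and reads the matching objects, and that "recursive" means a node forwards the search to nodes below it. `Registry`, `QueryNode`, and `search_all` are hypothetical names, the data is made up, and in-process calls stand in for the web-service messages.

```python
from typing import Dict, List, Tuple


class QueryNode:
    """Hypothetical query service holding objects and optional child nodes."""

    def __init__(self, name: str, objects: Dict[str, str], children=None):
        self.name = name
        self.objects = objects            # object ID -> title (toy metadata)
        self.children = children or []    # nodes this node forwards searches to

    def search(self, term: str) -> List[Tuple["QueryNode", str]]:
        """Return (node, ID) pairs from this node and, recursively, its children."""
        hits = [(self, i) for i, t in self.objects.items() if term in t.lower()]
        for child in self.children:
            hits.extend(child.search(term))
        return hits

    def read(self, object_id: str) -> str:
        return self.objects[object_id]


class Registry:
    """Hypothetical registry: step 1 registers nodes, step 2 lists them."""

    def __init__(self):
        self._nodes: List[QueryNode] = []

    def register(self, node: QueryNode) -> None:
        self._nodes.append(node)

    def find_query_nodes(self) -> List[QueryNode]:
        return list(self._nodes)


def search_all(registry: Registry, term: str) -> List[str]:
    """Steps 2-4: find nodes, search each one, then read every hit."""
    results = []
    for node in registry.find_query_nodes():
        for owner, object_id in node.search(term):
            results.append(owner.read(object_id))
    return results


leaf = QueryNode("leaf", {"x.1": "Soil moisture"})
top = QueryNode("top", {"y.1": "Soil chemistry", "y.2": "Bird counts"}, [leaf])
registry = Registry()
registry.register(top)
titles = search_all(registry, "soil")
print(titles)
```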

EcoGrid Index Interactions

[Diagram: a Client, a Registry, six QueryService nodes, and an IndexedQueryService. Labeled interactions: 2. Find Query Nodes; 3. Search (recursive); 6. Search]
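
One way to read the IndexedQueryService is as a node that aggregates metadata from several query nodes, so that a single search can replace a fan-out across all of them. The Python sketch below assumes that reading; `build_index` and `indexed_search` are hypothetical names and the node contents are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy metadata held by two query nodes: node name -> {object ID: title}.
NODES: Dict[str, Dict[str, str]] = {
    "node-a": {"a.1": "Soil chemistry", "a.2": "Bird counts"},
    "node-b": {"b.1": "Soil moisture"},
}


def build_index(nodes: Dict[str, Dict[str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Aggregate titles from all nodes into a word -> {(node, ID)} index."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for node_name, objects in nodes.items():
        for object_id, title in objects.items():
            for word in title.lower().split():
                index[word].add((node_name, object_id))
    return index


def indexed_search(index, word: str) -> List[Tuple[str, str]]:
    """Answer a search from the index alone, without contacting the nodes."""
    return sorted(index.get(word.lower(), set()))


index = build_index(NODES)
print(indexed_search(index, "soil"))
```

The trade-off in such a design is freshness: the index answers quickly but must be rebuilt or updated when the underlying nodes change.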

Authentication and Authorization

KNB uses a simple LDAP system with referrals
    Leverages existing DB (e.g., LTER personnel DB)
    Not really scalable in terms of administration

Grid Security Infrastructure (GSI)
    Certificate-based authentication
    Proxy certificates allow transfer of rights
    Decentralized administration (i.e., multiple CAs)

Can we easily transition to GSI?

Native Range prediction workflow

Slide from D. Pennington

[Workflow diagram; only the labels are recoverable.
Data sources, each reached through an EcoGrid query: KNB abundance data (a1); DiGIR species presence & absence points (a2); SRB environmental layers (b).
Processing steps: layer integration; sample; data calculation; map; validation.
Products: integrated layers (native range) (c); training sample (d); test sample (d); GARP rule set (e); native range prediction map (f); model quality parameter (g).
Other labels: +A1, +A2, +A3; User; EcoGrid; Archive.]

Implementation

Short-term
    Define common WSDL services
    Simple service registry
    Wrappers for Metacat, DiGIR, SRB, Xanthoria, etc.

Medium-term
    Use OGSI-compliant interfaces (add methods to current WSDL)
    Grid Registry service

Timing

April 4
April 11 -- Design Diagrams
April 18 -- WSDL, Registry instance operational, query + read, RSIDS schema and examples.
April 25
May 2
May 9 Wrapper implementations + test client(s)
May 16 (SEEK Technical WG meeting)
May 23
May 30 -- Hard deadline for implementation of Eco-GRID alpha 1

Query Messages

<egq:query queryId="test.1.1" system="test"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>
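
To show how a node might interpret such a message, here is a small Python sketch that parses the query with the standard library and evaluates the AND/OR tree against flat records. Treating `LIKE` with `%` wildcards as a case-insensitive SQL-style match is an assumption, as are the `like` and `matches` helpers and the sample records; the EcoGrid implementation may handle these differently.

```python
import re
import xml.etree.ElementTree as ET

QUERY = """
<egq:query queryId="test.1.1" system="test"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>
"""


def like(pattern: str, value: str) -> bool:
    """SQL-style LIKE: '%' matches any run of characters (assumed case-insensitive)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def matches(node: ET.Element, record: dict) -> bool:
    """Evaluate an AND, OR, or condition element against one record."""
    if node.tag == "AND":
        return all(matches(child, record) for child in node)
    if node.tag == "OR":
        return any(matches(child, record) for child in node)
    if node.tag == "condition" and node.get("operator") == "LIKE":
        # A record maps a concept name to a list of values (hypothetical layout).
        values = record.get(node.get("concept"), [])
        return any(like(node.text, v) for v in values)
    raise ValueError(f"unsupported element: {node.tag}")


root = ET.fromstring(QUERY)
tree = root.find("AND")
records = [
    {"eml:title": ["Soil data from West Valley, 1983"], "eml:surName": ["Jones", "Smith"]},
    {"eml:title": ["Soil survey"], "eml:surName": ["Smith"]},  # made-up record
]
print([matches(tree, r) for r in records])
```

The first record satisfies both OR groups; the second matches on title only, so the enclosing AND rejects it.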

Result responses

<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs='ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1'>
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1"
      xmlns:eml='eml://ecoinformatics.org/eml-2.0.0'>
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator>
          <individualName><surName>Jones</surName></individualName>
        </creator>
        <creator>
          <individualName><surName>Smith</surName></individualName>
        </creator>
        <keywordSet>
          <keyword>aves</keyword>
          <keyword>ornithology</keyword>
          <keyword>biodiversity</keyword>
        </keywordSet>
      </eml:eml>
    </record>
  </records>
</rs:resultset>
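
A client could pull identifiers, titles, and creator surnames out of such a response with a namespace-aware parser. The following Python sketch uses only the standard library; the response is abbreviated (the keyword set is dropped) and the `summarize` helper is a hypothetical name used for illustration.

```python
import xml.etree.ElementTree as ET

RESPONSE = """
<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1"
      xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator><individualName><surName>Jones</surName></individualName></creator>
        <creator><individualName><surName>Smith</surName></individualName></creator>
      </eml:eml>
    </record>
  </records>
</rs:resultset>
"""

# Prefix map for the only namespaced element we look up (eml:eml).
NS = {"eml": "eml://ecoinformatics.org/eml-2.0.0"}


def summarize(xml_text: str) -> dict:
    """Extract the total record count and a short summary of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("record"):
        eml = record.find("eml:eml", NS)
        records.append({
            "identifier": record.get("identifier"),
            "title": eml.findtext("title"),
            "surnames": [s.text for s in eml.iter("surName")],
        })
    total = int(root.findtext("resultsetMetadata/recordCount"))
    return {"total": total, "records": records}


summary = summarize(RESPONSE)
print(summary)
```

Note that `recordCount` reports the total number of matches (86), while `startRecord` and `endRecord` describe the slice actually returned, so a client would compare the two to decide whether to request more records.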