SEEK EcoGrid
Integrate diverse data networks from ecology, biodiversity, and environmental sciences
  Metacat, DiGIR, SRB, Xanthoria, ...
  EML is the core for data documentation
  Open programming interface
EcoGrid client interactions
Aims of EcoGrid
  Which, Where, How, Who?
  Share Data and Information
  Relate Data from multiple projects/groups
  Crosswalks across data structures
  Develop Eco-related Finding Aids for Data
  Global User: Authenticate and Authorize
  Provide an infrastructure for “Archivable Collection-building” for SEEK scientists
  Facilitate the A&M layer and the SMS layer
Challenges of EcoGrid
  Data & User Diversity
    6000+ datasets & 1500+ scientists
    themes, methods, units, structures
    Small data sizes but high complexity - metadata
  Multiple Data Organizations
    Biodiversity Surveys
    Population data
    GIS, Satellite Images, Weather Data, …
  Ontologies & Taxonomies
  Data Discovery: No single place to find
  Data Entropy – rapid decline of information on data
  Autonomy with Centralized access
  Leverage Computational Grid work
Existing services
  Metacat – syntactic and semantic metadata querying/inserting/updating/deleting, user registration/authentication, data replication, data/metadata versioning; supports any XML-based metadata
  Xanthoria – common-schema mediator (currently 8 sites); metadata query/insert/update/delete for any XML schema to underlying metadatabase (SQL, native XML)
Existing Systems
  DiGIR – querying arbitrary XML-describable resources (underlying data sources can be any type: RDB, XMLDB).
  ClimDB – integrating climate data in diverse formats (using wrapping at the data source). Access through the web; common schema identified beforehand; tabular description.
  HyperLTER – summary ontology stored as metadata for images; image extraction, geographic subsetting, band-level subsetting; integration with MODIS images, hyperspectral images, TM images, air photos, …
Existing Systems
  VegBank – 3 databases: co-occurrence records, a concept-driven species taxonomic database, and community classification. Distributed VegBank, querying by plots. Query/insert/update/annotate across three diverse databases that are described using XML.
  SRB – access to distributed data; querying based on syntactic, semantic, and user-defined (arbitrary relational) metadata. Annotations for data. Operations on data. Extraction of metadata. Ingest, bulk ingest, delete, update of data/metadata.
EcoGrid Services
  Query: Search metadata and data, return result sets with ID
  Read: Retrieve data objects by ID
  Authentication: Verify user identity
  Authorization: Record allowable interactions
  Write: Write data objects by ID
  Replication: Mirror objects for backup and efficiency
  Computation: Execute models and simulations from AMS on various nodes
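
The Query, Read, and Write services listed above could be pictured as a small node interface. The following is only a sketch under assumptions: the names `EcoGridNode`, `InMemoryNode`, and `ResultSet` are hypothetical, matching is a toy substring test, and the real services were to be defined in WSDL rather than Python.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class ResultSet:
    """Hypothetical container for what Query returns: IDs of matching objects."""
    resultset_id: str
    record_ids: List[str] = field(default_factory=list)


class EcoGridNode(Protocol):
    """Assumed shape of a node offering the Query, Read, and Write services."""

    def query(self, term: str) -> ResultSet: ...
    def read(self, object_id: str) -> bytes: ...
    def write(self, object_id: str, data: bytes) -> None: ...


class InMemoryNode:
    """Toy node that keeps objects in a dictionary keyed by ID."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def write(self, object_id: str, data: bytes) -> None:
        # Write: store a data object under its ID.
        self._objects[object_id] = data

    def read(self, object_id: str) -> bytes:
        # Read: retrieve a data object by ID.
        return self._objects[object_id]

    def query(self, term: str) -> ResultSet:
        # Query: case-insensitive substring match, returning IDs only.
        needle = term.lower()
        ids = [
            oid
            for oid, data in self._objects.items()
            if needle in data.decode("utf-8").lower()
        ]
        return ResultSet(resultset_id="toy.1", record_ids=ids)


node: EcoGridNode = InMemoryNode()
node.write("bar.1.23", b"Soil data from West Valley, 1983")
result = node.query("soil")
```

Returning only IDs from `query` and fetching content with `read` mirrors the split between the Query and Read services in the list.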
EcoGrid Search Interactions
Features
  Well-defined interfaces (e.g., WSDL)
  Standardized messaging formats
  Automated discovery of implementing services
  Aggregation/Indexing across nodes for efficiency
  Support heterogeneous data objects via metadata descriptions
  Lightweight to implement for various systems like DiGIR and Metacat
[Diagram: a Client, a Registry, and six QueryService nodes]
  1. Register
  2. Find Query Nodes
  3. Search (recursive)
  4. Read (recursive)
  5. Find Index Nodes
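
The register, find, search, and read steps can be sketched in a few lines. This is an illustration under assumptions, not the EcoGrid implementation: `Registry`, `QueryNode`, and `client_search` are hypothetical names, "recursive" is taken to mean that a node forwards the request to its child nodes, and records are reduced to an ID and a title.

```python
from typing import Dict, List, Optional, Set


class QueryNode:
    """Toy query service holding records (id -> title) plus child nodes."""

    def __init__(self, name: str, records: Dict[str, str],
                 children: Optional[List["QueryNode"]] = None) -> None:
        self.name = name
        self.records = dict(records)
        self.children = list(children or [])

    def search(self, term: str, visited: Optional[Set[str]] = None) -> List[str]:
        # Step 3: search locally, then forward the search to child nodes.
        visited = set() if visited is None else visited
        if self.name in visited:  # never ask the same node twice
            return []
        visited.add(self.name)
        needle = term.lower()
        hits = [rid for rid, title in self.records.items()
                if needle in title.lower()]
        for child in self.children:
            hits.extend(child.search(term, visited))
        return hits

    def read(self, record_id: str, visited: Optional[Set[str]] = None) -> Optional[str]:
        # Step 4: answer locally if possible, otherwise ask child nodes.
        visited = set() if visited is None else visited
        if self.name in visited:
            return None
        visited.add(self.name)
        if record_id in self.records:
            return self.records[record_id]
        for child in self.children:
            found = child.read(record_id, visited)
            if found is not None:
                return found
        return None


class Registry:
    """Toy registry: nodes register themselves, clients look them up."""

    def __init__(self) -> None:
        self._nodes: List[QueryNode] = []

    def register(self, node: QueryNode) -> None:  # step 1
        self._nodes.append(node)

    def find_query_nodes(self) -> List[QueryNode]:  # step 2
        return list(self._nodes)


def client_search(registry: Registry, term: str) -> List[str]:
    # The visited set is shared so a node reachable twice answers only once.
    visited: Set[str] = set()
    hits: List[str] = []
    for node in registry.find_query_nodes():
        hits.extend(node.search(term, visited))
    return hits


leaf = QueryNode("leaf", {"bar.1.23": "Soil data from West Valley, 1983"})
root = QueryNode("root", {"foo.2.1": "Bird survey"}, [leaf])
registry = Registry()
registry.register(root)
hits = client_search(registry, "soil")
```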
EcoGrid Index Interactions
[Diagram: a Client, a Registry, six QueryService nodes, and an IndexedQueryService]
  2. Find Query Nodes
  3. Search (recursive)
  6. Search
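
One way to picture an index node is as a service that collects records from query nodes ahead of time and then answers a client search from its own index. The slide does not say how the index is built, so the sketch below assumes a simple inverted index over title words; the `harvest` method and the node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class IndexedQueryService:
    """Toy index node: maps each title word to (node name, record id) pairs."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def harvest(self, node_name: str, records: Dict[str, str]) -> None:
        # Assumed indexing step: split each title into lowercase words.
        for record_id, title in records.items():
            for word in title.lower().split():
                self._index[word.strip(".,;:")].add((node_name, record_id))

    def search(self, word: str) -> List[Tuple[str, str]]:
        # Answer from the index alone, without contacting any query node.
        return sorted(self._index.get(word.lower(), set()))


index = IndexedQueryService()
index.harvest("knb", {"bar.1.23": "Soil data from West Valley, 1983"})
index.harvest("lter", {"x.1": "Forest soil cores"})
matches = index.search("Soil")
```

The trade-off is the usual one for aggregation: searches avoid a round of recursive calls, at the cost of the index lagging behind the nodes it was built from.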
Authentication and Authorization
  KNB uses simple LDAP system with referrals
    Leverages existing DB (e.g., LTER personnel DB)
    Not really scalable in terms of administration
  Grid Security Infrastructure (GSI)
    Certificate-based authentication
    Proxy certificates allow transfer of rights
    Decentralized administration (i.e., multiple CAs)
  Can we easily transition to GSI?
Native Range prediction workflow
Slide from D. Pennington
[Workflow diagram. Labels recovered from the figure:]
  Sources, each reached through an EcoGrid Query: KNB Abundance Data (a1); DiGIR Species presence & absence points (a2); SRB Environmental layers (b)
  Steps: Layer Integration, Sample, Data Calculation, Map Validation
  Products: Integrated layers (native range) (c); Training sample (d); Test sample (d); GARP rule set (e); Native range prediction map (f); Model quality parameter (g)
  Other labels: User; EcoGrid Archive; +A1, +A2, +A3
Implementation
  Short-term
    Define common WSDL services
    Simple service registry
    Wrappers for Metacat, DiGIR, SRB, Xanthoria, etc.
  Medium-term
    Use OGSI-compliant interfaces (add methods to current WSDL)
    Grid Registry service
Timing
  April 4
  April 11 -- Design Diagrams
  April 18 -- WSDL, Registry instance operational, query + read, RSIDS schema and examples.
  April 25
  May 2
  May 9 -- Wrapper implementations + test client(s)
  May 16 (SEEK Technical WG meeting)
  May 23
  May 30 -- Hard deadline for implementation of Eco-GRID alpha 1
Query Messages
<egq:query queryId="test.1.1" system="test"
           xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>
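
A client could assemble the same query message programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper `build_query` and its `groups` argument are hypothetical, and the element and attribute names are copied from the example message rather than from a published schema.

```python
import xml.etree.ElementTree as ET

EGQ_NS = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1"
EML_NS = "eml://ecoinformatics.org/eml-2.0.0"


def build_query(query_id, system, title, groups):
    """Build a query element.

    groups is a list of (concept, [patterns]) pairs: the patterns inside one
    group are ORed together, and the groups themselves are ANDed.
    """
    ET.register_namespace("egq", EGQ_NS)  # serialize the root as egq:query
    root = ET.Element(f"{{{EGQ_NS}}}query",
                      {"queryId": query_id, "system": system})
    ET.SubElement(root, "namespace", {"prefix": "eml", "space": EML_NS})
    ET.SubElement(root, "title").text = title
    and_el = ET.SubElement(root, "AND")
    for concept, patterns in groups:
        or_el = ET.SubElement(and_el, "OR")
        for pattern in patterns:
            cond = ET.SubElement(or_el, "condition",
                                 {"operator": "LIKE", "concept": concept})
            cond.text = pattern
    return root


query = build_query(
    "test.1.1", "test", "Soils metadata query",
    [("eml:title", ["%soil%", "%dirt%"]),
     ("eml:surName", ["%Jones%", "%Vieglais%"])],
)
xml_text = ET.tostring(query, encoding="unicode")
```

The message reads as "(title like soil OR dirt) AND (surname like Jones OR Vieglais)", which is why the helper takes one list of patterns per concept.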
Result responses
<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
              xmlns:rs='ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1'>
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1" xmlns:eml='eml://ecoinformatics.org/eml-2.0.0'>
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator>
          <individualName><surName>Jones</surName></individualName>
        </creator>
        <creator>
          <individualName><surName>Smith</surName></individualName>
        </creator>
        <keywordSet>
          <keyword>aves</keyword>
          <keyword>ornithology</keyword>
          <keyword>biodiversity</keyword>
        </keywordSet>
      </eml:eml>
    </record>
  </records>
</rs:resultset>
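
On the client side, a result set like this can be read with Python's standard `xml.etree.ElementTree`. This is a minimal sketch: the function `summarize` is hypothetical, and `SAMPLE` is an abbreviated copy of the response above (one creator's surname per `creator`, keywords omitted).

```python
import xml.etree.ElementTree as ET

# Prefix map for namespaced lookups; URIs copied from the example response.
NS = {
    "rs": "ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1",
    "eml": "eml://ecoinformatics.org/eml-2.0.0",
}

SAMPLE = """
<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1"
      xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator><individualName><surName>Jones</surName></individualName></creator>
        <creator><individualName><surName>Smith</surName></individualName></creator>
      </eml:eml>
    </record>
  </records>
</rs:resultset>
"""


def summarize(xml_text):
    """Return (total record count, list of per-record summaries)."""
    root = ET.fromstring(xml_text)
    # recordCount is the total number of matches, not the records in this page.
    count = int(root.findtext("resultsetMetadata/recordCount"))
    records = []
    for rec in root.iter("record"):
        eml = rec.find("eml:eml", NS)  # only the eml element is namespaced
        records.append({
            "identifier": rec.get("identifier"),
            "title": eml.findtext("title"),
            "surnames": [s.text for s in eml.iter("surName")],
        })
    return count, records


count, records = summarize(SAMPLE)
```

The `identifier` of each record is what a client would pass to the Read service to fetch the full object.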