http://knb.ecoinformatics.org http://seek.ecoinformatics.org
Science Environment for Ecological Knowledge: EcoGrid
Matthew B. Jones
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara
Research Objectives
Access to ecological, environmental, and biodiversity data
Enable data sharing and re-use
Enhance data discovery at global scales
Scalable analysis and synthesis
Taxonomic, spatial, temporal, and conceptual integration of data
Address data heterogeneity issues
Enable communication and collaboration for analysis
Enable re-use of analytical components
Collaborators: NCEAS, UNM, SDSC, U Kansas, Vermont, Napier, ASU, UNC
SEEK Components
Kepler: modeling scientific workflows
EcoGrid: making diverse environmental data systems interoperate
Semantic Mediation System: “smart” data discovery and integration
Knowledge Representation WG; Taxon WG; BEAM WG; Education, Outreach, Training
Scientific Workflows
Model the way scientists work with their data now
Today, scientists mentally coordinate the export and import of data among software systems
Workflows emphasize data flow
Output generation includes creating appropriate metadata
The analysis workflow itself becomes metadata
The workflow describes the lineage of the data as it has been transformed
Derived data sets can be stored in EcoGrid with provenance
Query EcoGrid to find data
Archive output to EcoGrid with workflow metadata
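The idea that the workflow itself becomes metadata can be sketched in Python, assuming a workflow is simply an ordered list of named functions over a list of records. `run_workflow` and its provenance layout are hypothetical illustrations, not Kepler's or EcoGrid's actual formats.

```python
from __future__ import annotations

from typing import Callable


# Hypothetical helper: the step list doubles as the provenance record.
def run_workflow(data: list[dict], steps: list[tuple[str, Callable[[list], list]]]) -> dict:
    """Apply each (name, function) step in order and record the lineage."""
    lineage = []
    for name, func in steps:
        data = func(data)
        # Record which step ran and how many records it produced.
        lineage.append({"step": name, "records_out": len(data)})
    # Package the derived data with its provenance so both can be archived together.
    return {
        "data": data,
        "provenance": {"workflow": [step_name for step_name, _ in steps], "lineage": lineage},
    }


# Illustrative input records (hypothetical values).
rows = [
    {"name": "P. maniculatus", "lat": 34.4},
    {"name": "P. leucopus", "lat": None},
    {"name": "P. boylii", "lat": 35.1},
]
steps = [
    ("drop_missing_lat", lambda rs: [r for r in rs if r["lat"] is not None]),
    ("sort_by_name", lambda rs: sorted(rs, key=lambda r: r["name"])),
]
result = run_workflow(rows, steps)
print(result["provenance"])
```

Archiving `result` as a unit keeps the derived records and the steps that produced them together, which is the property the slide describes.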
SEEK EcoGrid
Goal: allow diverse environmental data systems to interoperate
Hides the complexity of underlying systems behind lightweight interfaces
We have standardized data via EML; now we need standard APIs
Integrate diverse data networks from the ecology, biodiversity, and environmental sciences
Data systems: any system can implement these interfaces
Prototyping using Metacat, SRB, DiGIR, Xanthoria, etc.
Supports multiple metadata standards, with EML and Darwin Core as the foci
EcoGrid client interactions
Modes of interaction: client-server, fully distributed, peer-to-peer
EcoGrid Registry: node discovery, service discovery
Aggregation services: centralized access, reliability, data preservation
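The registry's two discovery roles can be pictured with a small in-memory Python sketch. The `Node` and `Registry` classes, the service names, and the example.org endpoints are all hypothetical; the slides do not say how the EcoGrid Registry is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical registry entry for one EcoGrid node."""
    node_id: str
    endpoint: str
    services: set[str] = field(default_factory=set)


class Registry:
    """Hypothetical in-memory registry supporting node and service discovery."""

    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}

    def register(self, node: Node) -> None:
        # Re-registering a node_id replaces the earlier entry.
        self._nodes[node.node_id] = node

    def discover_nodes(self) -> list[str]:
        """Node discovery: list every registered node."""
        return sorted(self._nodes)

    def discover_service(self, service: str) -> list[str]:
        """Service discovery: endpoints of nodes offering the named service."""
        return sorted(n.endpoint for n in self._nodes.values() if service in n.services)


registry = Registry()
registry.register(Node("metacat-1", "http://metacat.example.org/ecogrid", {"query", "put"}))
registry.register(Node("digir-1", "http://digir.example.org/ecogrid", {"query"}))
print(registry.discover_service("put"))
```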
EcoGrid Query Interfaces
Provides a mechanism for search and retrieval of metadata and federated data
Supports third-party interaction with search results: result set identifiers can be forwarded to another service instance for retrieval
Different levels of compliance: a low barrier for participation, and the bulk of data will be accessible through Type I
Query Interfaces Implemented
Initial prototype to support query and retrieval from:
Storage Resource Broker (SRB)
Metacat
Distributed Generic Information Retrieval (DiGIR)
Xanthoria
Encourage additional experimentation with and feedback based on other system implementations
EcoGrid Query Level I
Basic, entry-level exposure of data and metadata for EcoGrid and SEEK
Response contains the data itself: intended for direct communication rather than third-party indirection
ResultsetType query(SessionID,QueryType)
byte[] get(SessionID,objectID)
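A minimal Python rendering of the two Level I operations, assuming `str` for SessionID, a dict of concept-to-value pairs for QueryType, a list of dicts for ResultsetType, and `bytes` for `byte[]`. `EcoGridLevelI` and `InMemoryNode` are hypothetical names, and the toy node does only exact matching.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class EcoGridLevelI(ABC):
    """Hypothetical Python view of the Level I operations."""

    @abstractmethod
    def query(self, session_id: str, query: dict[str, str]) -> list[dict]:
        """Return matching records directly (no resultset identifier)."""

    @abstractmethod
    def get(self, session_id: str, object_id: str) -> bytes:
        """Return the raw bytes of one object."""


class InMemoryNode(EcoGridLevelI):
    """Hypothetical toy node: a query is a dict of concept -> required value."""

    def __init__(self, records: list[dict], objects: dict[str, bytes]) -> None:
        self._records = records
        self._objects = objects

    def query(self, session_id: str, query: dict[str, str]) -> list[dict]:
        return [r for r in self._records if all(r.get(k) == v for k, v in query.items())]

    def get(self, session_id: str, object_id: str) -> bytes:
        return self._objects[object_id]


node = InMemoryNode(
    records=[{"identifier": "mvz1", "Genus": "Peromyscus"}, {"identifier": "mvz2", "Genus": "Microtus"}],
    objects={"mvz1": b"<record identifier='mvz1'/>"},
)
hits = node.query("session-1", {"Genus": "Peromyscus"})
data = node.get("session-1", hits[0]["identifier"])
```

Because the response carries the records themselves, the caller needs no further round trip, which matches the direct-communication intent of Level I.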
Query Conditions
Language-independent representation of a query structure
Transformed into the appropriate native language of the data store
Example:
<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
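To illustrate the translation step, here is a sketch that turns the condition tree into a parameterized SQL WHERE clause for a relational store. The concept-to-column mapping, the column names `sci_name` and `lat`, and the `EQUALS` operator are assumptions; a real node would use its own mapping and the operator set defined by the query schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from EcoGrid concepts to columns of one relational store.
COLUMN_FOR_CONCEPT = {"ScientificName": "sci_name", "DecimalLatitude": "lat"}
# Assumed operator vocabulary; only LIKE and NOT EQUALS appear in the slide.
SQL_OPERATOR = {"EQUALS": "=", "NOT EQUALS": "<>", "LIKE": "LIKE"}
NULL_TEST = {"EQUALS": "IS NULL", "NOT EQUALS": "IS NOT NULL"}


def to_sql(elem: ET.Element, params: list) -> str:
    """Translate a query condition tree into a parameterized SQL WHERE clause."""
    if elem.tag in ("AND", "OR"):
        parts = [to_sql(child, params) for child in elem]
        return "(" + f" {elem.tag} ".join(parts) + ")"
    if elem.tag == "condition":
        column = COLUMN_FOR_CONCEPT[elem.get("concept")]
        op = elem.get("operator")
        value = (elem.text or "").strip()
        if value == "NULL" and op in NULL_TEST:
            # SQL needs IS [NOT] NULL rather than = / <> for missing values.
            return f"{column} {NULL_TEST[op]}"
        params.append(value)  # bind values instead of splicing them into SQL
        return f"{column} {SQL_OPERATOR[op]} ?"
    raise ValueError(f"unsupported element: {elem.tag}")


query_xml = """<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>"""
params: list = []
where = to_sql(ET.fromstring(query_xml), params)
print(where, params)
```

Keeping values as bound parameters lets the same condition tree be reused safely against any DB-API driver that accepts `?` placeholders.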
Specifying the Resultset
Specify the list of concepts (fields) to be returned in the resultset
Simple paths used to identify elements or document subtrees
Effectively flattens the structure of the records, but allows generic representation
Example:
<returnfield>/ScientificName</returnfield>
<returnfield>/Longitude</returnfield>
<returnfield>/Latitude</returnfield>
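Resolving return-field paths against a nested record can be sketched with Python's `xml.etree.ElementTree`. The record layout and the `/location/...` paths are hypothetical, and treating a leading slash as relative to the record element is an assumption about the path semantics.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper: one flat row per record, one column per return field.
def flatten(record_xml: str, return_fields: list) -> dict:
    """Pull each returnfield path out of a record into one flat row."""
    root = ET.fromstring(record_xml)
    row = {}
    for path in return_fields:
        # Paths such as "/Latitude" are treated as relative to the record element.
        elem = root.find(path.lstrip("/"))
        key = path.rsplit("/", 1)[-1]  # last path step becomes the column name
        row[key] = None if elem is None else (elem.text or "").strip()
    return row


record = """<record>
  <ScientificName>Peromyscus leucopus</ScientificName>
  <location><Longitude>-72.5</Longitude><Latitude>41.3</Latitude></location>
</record>"""
row = flatten(record, ["/ScientificName", "/location/Longitude", "/location/Latitude", "/Elevation"])
print(row)
```

A path that matches nothing yields `None`, so every row has the same columns even when records differ in structure.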
Full Query Example
<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
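The same query document can be assembled programmatically. This sketch uses `xml.etree.ElementTree`; `build_query` is a hypothetical helper that handles only a single condition and omits the `xsi:schemaLocation` hint.

```python
import xml.etree.ElementTree as ET

EGQ = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
DARWIN = "http://digir.net/schema/conceptual/darwin/2003/1.0"
ET.register_namespace("egq", EGQ)  # serialize the root element as egq:query


# Hypothetical helper mirroring the structure of the full query example.
def build_query(query_id, system, title, return_fields, condition):
    """Assemble a single-condition EcoGrid query document as a string."""
    root = ET.Element(f"{{{EGQ}}}query", {"queryId": query_id, "system": system})
    ET.SubElement(root, "namespace", {"prefix": "darwin"}).text = DARWIN
    for path in return_fields:
        ET.SubElement(root, "returnfield").text = path
    ET.SubElement(root, "title").text = title
    operator, concept, value = condition
    ET.SubElement(root, "condition", {"operator": operator, "concept": concept}).text = value
    return ET.tostring(root, encoding="unicode")


doc = build_query(
    "query-digir.1.1",
    "http://knb.ecoinformatics.org",
    "Peromyscus genus query",
    ["/ScientificName", "/Longitude", "/Latitude"],
    ("LIKE", "Genus", "Peromyscus"),
)
print(doc)
```

Only the root element is namespace-qualified; the child elements stay unqualified, as in the example document.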
Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <startRecord>1</startRecord>
    <endRecord>2</endRecord>
    <recordCount>2</recordCount>
    <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
  …
</rs:resultset>
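A client might flatten a resultset of this shape into one dict per record, resolving each record's `system` reference to its source URL. `parse_resultset` is a hypothetical helper; the sample is trimmed to a single record (so its record counts are 1), and the `sendTime` and `namespace` metadata are ignored.

```python
import xml.etree.ElementTree as ET

RS = "ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"


# Hypothetical helper: (metadata, rows) from a resultset document.
def parse_resultset(xml_text: str) -> tuple:
    """Return (resultset metadata, list of flat record dicts)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{RS}}}resultset":
        raise ValueError("not an EcoGrid resultset")
    meta_elem = root.find("resultsetMetadata")
    # Map system ids (referenced by each record) to their source URLs.
    systems = {s.get("id"): s.text for s in meta_elem.findall("system")}
    meta = {
        "startRecord": int(meta_elem.findtext("startRecord")),
        "endRecord": int(meta_elem.findtext("endRecord")),
        "recordCount": int(meta_elem.findtext("recordCount")),
    }
    rows = []
    for rec in root.findall("record"):
        row = {f.get("name"): f.text for f in rec.findall("returnField")}
        row["identifier"] = rec.get("identifier")
        row["source"] = systems.get(rec.get("system"))
        rows.append(row)
    return meta, rows


sample = """<rs:resultset resultsetId="foo.1.1" xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <startRecord>1</startRecord><endRecord>1</endRecord><recordCount>1</recordCount>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
</rs:resultset>"""
meta, rows = parse_resultset(sample)
print(meta, rows)
```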
EcoGrid Query Level II
More detailed handling of results
Uses resultset identifiers (RSIDs): handles that can be passed to a third party
RSID search(SessionID,query)
Resultset retrieve(SessionID,RSID,start,numrecs)
query decodeResultsetIdentifier(SessionID,RSID)
statusinfo getResultStatus(SessionID)
int transfer(SessionID,sourceURL,destURL,ObjectID)
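The RSID mechanism can be sketched as a toy server that keeps hits server-side and hands out only an identifier, which any other party can then use to page through the results. `LevelIIServer` is hypothetical: it does not validate sessions, and `getResultStatus` and `transfer` are omitted.

```python
from __future__ import annotations

import itertools


class LevelIIServer:
    """Hypothetical toy server illustrating RSID-based search and paging."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records
        self._resultsets: dict[str, tuple[dict, list[dict]]] = {}
        self._counter = itertools.count(1)

    def search(self, session_id: str, query: dict) -> str:
        """Run the query, keep the hits server-side, and return only an RSID."""
        hits = [r for r in self._records if all(r.get(k) == v for k, v in query.items())]
        rsid = f"rs.{next(self._counter)}"
        self._resultsets[rsid] = (query, hits)
        return rsid

    def retrieve(self, session_id: str, rsid: str, start: int, numrecs: int) -> list[dict]:
        """Return up to numrecs records, counting start from 1 like startRecord."""
        _, hits = self._resultsets[rsid]
        return hits[start - 1 : start - 1 + numrecs]

    def decode_resultset_identifier(self, session_id: str, rsid: str) -> dict:
        """Recover the query that produced a resultset."""
        return self._resultsets[rsid][0]


records = [{"id": f"mvz{i}", "Genus": "Peromyscus"} for i in range(1, 6)]
records.append({"id": "mvz6", "Genus": "Microtus"})
server = LevelIIServer(records)
rsid = server.search("alice-session", {"Genus": "Peromyscus"})
# A third party holding only the RSID can page through the same results.
first_page = server.retrieve("bob-session", rsid, 1, 2)
last_page = server.retrieve("bob-session", rsid, 5, 2)
```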
EcoGrid Write
Used to push data back to sources (e.g. publishing EML documents)
Depends on the availability of an authentication and access control system
put(sessionID, objectID, object, type)
delete(sessionID,objectID)
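Because writes depend on authentication and access control, a sketch of `put` and `delete` checks the caller's session before touching any object. `WritableNode`, the session table, the single write permission, and the object identifier are hypothetical; a real node would delegate these checks to its access control system.

```python
from __future__ import annotations


class WritableNode:
    """Hypothetical node whose put/delete require a session with write access."""

    def __init__(self, sessions: dict[str, str], writers: set[str]) -> None:
        self._sessions = sessions  # session id -> authenticated user
        self._writers = writers    # users granted write permission
        self._objects: dict[str, tuple[str, bytes]] = {}

    def _require_write(self, session_id: str) -> None:
        user = self._sessions.get(session_id)
        if user is None or user not in self._writers:
            raise PermissionError(f"write denied for session {session_id!r}")

    def put(self, session_id: str, object_id: str, obj: bytes, obj_type: str) -> None:
        self._require_write(session_id)
        self._objects[object_id] = (obj_type, obj)

    def delete(self, session_id: str, object_id: str) -> None:
        self._require_write(session_id)
        del self._objects[object_id]

    def __contains__(self, object_id: str) -> bool:
        return object_id in self._objects


node = WritableNode(sessions={"s-alice": "alice", "s-guest": "guest"}, writers={"alice"})
node.put("s-alice", "example.1.1", b"<eml:eml/>", "EML")
try:
    node.delete("s-guest", "example.1.1")  # guest lacks write access
except PermissionError as err:
    print(err)
```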
Data Instance Query
New requirement: support direct query and retrieval of arbitrary data sets
Generally no common schemas between different instances
Could either:
push the data instance to a service that can query the object (e.g., the SRB), or
implement the interface at the data instance location
Simple JDBC / SQL interface?
dbSchema getDataSchema(sessionID,objectID)
dbResultset search(sessionID,objectID,SQL)
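If the JDBC/SQL route were taken, the two operations might look like this Python sketch over `sqlite3`, assuming each objectID names one table in a local database. `DataInstanceService` and the `plots` table are hypothetical. Running caller-supplied SQL is risky, so a real service would at least use a read-only connection.

```python
import sqlite3


class DataInstanceService:
    """Hypothetical: each objectID names one table in a local SQLite database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_data_schema(self, session_id, object_id):
        """Return (column name, declared type) pairs for the data instance."""
        # Double quotes make object_id an identifier; a real service would also
        # validate it against a catalog of known objects.
        quoted = object_id.replace('"', '""')
        info = self._conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return [(row[1], row[2]) for row in info]

    def search(self, session_id, object_id, sql, params=()):
        """Run caller-supplied SQL against an existing data instance."""
        if not self.get_data_schema(session_id, object_id):
            raise KeyError(object_id)
        cursor = self._conn.execute(sql, params)
        columns = [d[0] for d in cursor.description]
        return columns, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plots (plot_id TEXT, biomass REAL)")
conn.executemany("INSERT INTO plots VALUES (?, ?)", [("p1", 12.5), ("p2", 8.0), ("p3", 15.25)])
service = DataInstanceService(conn)
schema = service.get_data_schema("s1", "plots")
columns, rows = service.search(
    "s1", "plots", "SELECT plot_id FROM plots WHERE biomass > ? ORDER BY plot_id", (10,)
)
```

Returning the schema first lets a client discover column names and types for a data set it has never seen, which addresses the lack of common schemas noted above.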
Building the EcoGrid
[Figure: EcoGrid deployment. Node types: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, and legacy systems. Sites labeled: AND, LUQ, NTL, VCR, HBR. Networks: LTER Network (24), Natural History Collections (>> 100), Organization of Biological Field Stations (180), UC Natural Reserve System (36), Partnership for Interdisciplinary Studies of Coastal Oceans (4), Multi-agency Rocky Intertidal Network (60).]
Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)