October 2006 - Richard White, Andrew Jones & Frank Bisby - TDWG - St Louis
Federating taxonomic databases: progress with the Catalogue of Life Dynamic Checklist
Richard White, Andrew Jones, Computer Science, Cardiff University, UK
R.J.White@cs.cardiff.ac.uk, Andrew.C.Jones@cs.cardiff.ac.uk
Frank Bisby, Plant Sciences, University of Reading, UK
The Species 2000 programme
Species 2000, together with its partner ITIS, operates a federated environment which:
• gathers data from specialist species data providers
• delivers the Catalogue of Life:
  • Species 2000 global Dynamic Checklist (species; hierarchy);
  • regional species checklist for Europe (prototype for further regional hubs, etc.)
Plan to complete Catalogue in 2011
Main topics
• The Species 2000 federated environment
• Interoperability conventions and standards adopted
The federation: organisation
Species 2000 assembles sectors “side by side”:
• Taxonomic hierarchy (or hierarchies)
• Species
• Global species databases (GSDs) and interim checklists: the Catalogue of Life
• Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages etc.
Uses for the system
• on-line reference tool (available)
• index to further Web-based species resources (planned; rudiments implemented for some taxonomic groups)
• “synonymy server”, exposed as a Web service (available, but to be improved)
Species 2000 home page
User about to click on “Dynamic Checklist” …
Dynamic Checklist search page
User interested in Dwarf Gourami; knows its genus …
Found some species
User interested in Colisa lalia (Dwarf Gourami) and about to click on this name …
Colisa lalia standard data (1)
Scroll to bottom …
Colisa lalia standard data (2)
Follow further information link …
Colisa lalia in FishBase
Information from FishBase in this case
Spice for Species 2000
Currently provides Common Access System (CAS) for Species 2000
• implements a hub
• gathers data from providers via wrappers
• integrates and caches
• makes data available to users and other software
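The gather, integrate and cache steps listed above can be sketched roughly as follows. This is a minimal illustration, not the Spice implementation: `ToyHub`, `make_wrapper`, the database labels and the placeholder names ("Aus bus" etc.) are all assumptions made for the example, and real wrappers are remote services rather than in-process functions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wrapper signature: search text in, matching names out.
Wrapper = Callable[[str], List[str]]


class ToyHub:
    """Illustrative hub: gathers from wrappers, integrates, and caches."""

    def __init__(self, wrappers: Dict[str, Wrapper]) -> None:
        self.wrappers = wrappers
        self.cache: Dict[str, List[Tuple[str, str]]] = {}

    def search(self, text: str) -> List[Tuple[str, str]]:
        key = text.lower()
        if key in self.cache:  # serve repeated queries from the cache
            return self.cache[key]
        merged = [
            (name, source)
            for source, wrapper in self.wrappers.items()  # gather from each provider
            for name in wrapper(text)
        ]
        merged.sort()  # integrate into one ordered result list
        self.cache[key] = merged
        return merged


def make_wrapper(names: List[str]) -> Wrapper:
    """Build a stand-in wrapper over a fixed list of made-up names."""
    def wrapper(text: str) -> List[str]:
        return [n for n in names if text.lower() in n.lower()]
    return wrapper


hub = ToyHub({
    "database_a": make_wrapper(["Aus bus", "Aus cus"]),  # placeholder data
    "database_b": make_wrapper(["Bus aus"]),             # placeholder data
})
print(hub.search("aus"))
```

Each result keeps the label of the database it came from, so a client can still tell which provider supplied a name after the results have been merged.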
Recent progress with Species 2000 (1)
EuroCat project:
• added many new data providers for further taxonomic sectors
• improved Spice
• set up “Species 2000 europa” regional hub (using Spice)
• experimented with “cross-mapping”, using Litchi
• gained better understanding of the dynamics of developing and incorporating new GSDs
• New wrapper-writing resources made available
Recent progress with Species 2000 (2)
Current activities include:
• Secretariat at Reading
• At least 4 new databases have become available in the last few months: people are busy working on various sectors
• Annual checklist:
  • Long term plan: snapshot of dynamic checklist
  • Currently parallel development in Philippines
  • ≥ 8 new databases being added
  • 2007 expected ≥ 1,000,000 species
Components of the architecture
Main data and software components of the Catalogue:
• Autonomous species databases (GSDs)
• GSD wrappers
• “Hubs” (portals) to assemble data from wrappers; provide data to clients
• Interfaces
  • for users
  • for software
• Maintenance and administration software tools (e.g. metadatabase)
Species 2000 protocols – overview
How GSDs interoperate in this federation: four levels
1. Organisational model for a federation in which data providers provide data about “taxonomic sectors”; hub assembles complete catalogue (see above)
2. Framework for information exchange based on a number of defined requests
3. Human-readable Common Data Model (CDM): abstract definition for requests; responses; data exchanged
4. Specific computer-readable interface definitions, implementing CDM
Species 2000 protocols and data standards
Activities at the “federation” level 1 described above.
Levels 2, 3 and 4:
• Species 2000 defines internal data standards
• Intended to be open standards
Interoperability level 2: Informal data request and response model
Describes informally how information is exchanged:
• between federation components, including:
  • data providers, the hub and software clients of the hub
• by means of (currently six) requests defined for specific purposes
• with correspondingly defined response data
This model:
• avoids need for providers to handle general database queries
• treats GSDs as “black boxes”
Request types
The request types sent by the CAS to SPICE wrappers:
0: get version of CDM the wrapper implements
1: look up species name or ambiguous search string
2: get “standard data” for a given species name; includes accepted name, synonym(s), common name(s), distribution data, reference(s), latest taxonomic scrutiny, and links to other online resources about the species
3: obtain metadata concerning source database & data provider
4: move one step up taxonomic hierarchy
5: move one step down taxonomic hierarchy
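As a rough illustration of a wrapper that answers only the defined request types, the sketch below models the six numbered requests as an enumeration and dispatches on them. The member names, the `ToyWrapper` class and its made-up hierarchy are assumptions for the example; the real parameters and responses are those defined in the CDM. Only four of the six requests are modelled here.

```python
from enum import IntEnum
from typing import Optional


class RequestType(IntEnum):
    """The six request types listed above (member names are assumptions)."""
    CDM_VERSION = 0      # version of the CDM the wrapper implements
    SEARCH = 1           # look up a species name or ambiguous search string
    STANDARD_DATA = 2    # "standard data" for a given species name
    SOURCE_METADATA = 3  # metadata about the source database and provider
    MOVE_UP = 4          # one step up the taxonomic hierarchy
    MOVE_DOWN = 5        # one step down the taxonomic hierarchy


class ToyWrapper:
    """Black-box wrapper over made-up data; answers only the defined requests."""

    # child -> parent links of a tiny, invented hierarchy
    PARENT = {"Aus bus": "Aus", "Aus cus": "Aus", "Aus": "Aidae"}

    def handle(self, request_type: int, argument: Optional[str] = None):
        kind = RequestType(request_type)  # raises ValueError for unknown types
        text = argument or ""
        if kind is RequestType.CDM_VERSION:
            return "1.20"  # CDM version number mentioned in these slides
        if kind is RequestType.SEARCH:
            return sorted(n for n in self.PARENT if text.lower() in n.lower())
        if kind is RequestType.MOVE_UP:
            return self.PARENT.get(text)
        if kind is RequestType.MOVE_DOWN:
            return sorted(c for c, p in self.PARENT.items() if p == text)
        raise NotImplementedError(f"{kind.name} is not modelled in this toy")


wrapper = ToyWrapper()
print(wrapper.handle(1, "aus"))
print(wrapper.handle(5, "Aus"))
```

Because the caller can only pick from a fixed set of request numbers, the provider never has to accept a general database query.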
Interoperability level 3: Formal data and request/response model
Human-readable Common Data Model (CDM) for reference purposes
• provides abstract definition for the requests and responses, including parameters, etc
• candidate set of operations for retrieval of species-related data more generally
• defines the components of data transmitted and received
• Data model defined specifically for Species 2000 “standard data set”
• doesn’t define programming-language or technology-specific implementations
(Also available: “Species 2000 standard data set”, which summarises CDM briefly)
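To make the idea of a technology-neutral data model concrete, here is one possible rendering of the “standard data” components (accepted name, synonyms, common names, distribution, references, latest taxonomic scrutiny, links) as a Python data class, with a small synonym lookup of the kind a “synonymy server” might perform. The field names, the `accepted_name_for` helper and the placeholder names are assumptions; the CDM itself does not prescribe any programming-language form.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class StandardData:
    """Sketch of the 'standard data' components; field names are assumptions."""
    accepted_name: str
    synonyms: List[str] = field(default_factory=list)
    common_names: List[str] = field(default_factory=list)
    distribution: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    latest_scrutiny: Optional[str] = None           # latest taxonomic scrutiny
    links: List[str] = field(default_factory=list)  # other online resources


def accepted_name_for(name: str, records: Iterable[StandardData]) -> Optional[str]:
    """Return the accepted name for an accepted name or synonym, if known."""
    wanted = name.strip().lower()
    for record in records:
        candidates = [record.accepted_name, *record.synonyms]
        if wanted in (c.lower() for c in candidates):
            return record.accepted_name
    return None


# Placeholder record, not real taxonomic data
records = [StandardData("Aus bus", synonyms=["Cus bus"], common_names=["example fish"])]
print(accepted_name_for("cus bus", records))
```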
Interoperability level 4: Interface definitions
Computer-readable interface definitions, following the CDM, for use with particular implementations, including CORBA IDL, XML DTD and XML Schema for:
• requests from hub to wrappers
• requests from external client software to hub
Requests from hub to wrappers
Spice hub communicates with GSD wrappers using HTTP:
• “CGI” GET requests are sent to a wrapper, which returns an XML document in response
• An XML Schema (XSD) defines the specific XML requests and responses
• CORBA used within SPICE; corresponding IDL document
• NB CDM 1.20 is being updated to reflect minor modifications recently made to XSD, etc.
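The GET-request-in, XML-document-out exchange can be sketched as below. This is only an illustration of the pattern: the query parameter names (`requesttype`, `searchtext`), the XML element names (`response`, `name`) and the example.org URL are invented for the example, and the actual names are those defined by the Species 2000 XML Schema. No network call is made; the response is a hard-coded sample.

```python
from typing import List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_request_url(wrapper_url: str, request_type: int, **params: str) -> str:
    """Compose a CGI-style GET URL (parameter names are hypothetical)."""
    query = {"requesttype": request_type, **params}
    return f"{wrapper_url}?{urlencode(query)}"


def parse_names(xml_text: str) -> List[str]:
    """Extract names from a response document (element names are hypothetical)."""
    root = ET.fromstring(xml_text)
    return [element.text or "" for element in root.iter("name")]


url = build_request_url("http://example.org/wrapper", 1, searchtext="Aus b")
print(url)

# Stand-in for the XML document a wrapper would return
sample_response = "<response><name>Aus bus</name><name>Aus cus</name></response>"
print(parse_names(sample_response))
```

In a real client the sample string would be replaced by the body of the HTTP response, and the document would be validated against the published XSD before use.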
Requests from external client software to hub
A SOAP Web Service to allow programmatic access to dynamic checklist (including by the user interface), to interrogate Spice global & European hubs:
http://spice.sp2000europa.org/SPICE/services/CASWebService
(location and definition may change)
CAS Web Service version 1.0 informal definition & WSDL:
http://biodiversity.cs.cf.ac.uk/sp2000/protocol/
Further information
• Species 2000 programme and Species 2000 & ITIS Catalogue of Life: http://www.sp2000.org
• Species 2000 protocols and practices: http://biodiversity.cs.cf.ac.uk/sp2000/protocol/
• Spice: http://biodiversity.cs.cf.ac.uk/spice/
• Biodiversity Software Repository at Cardiff for access to Spice, other software and some wrappers: http://biodiversity.cs.cf.ac.uk/software/
Collaboration in open standards and software
We would like to see future progress as a community effort for developing
• data standards
• interoperable software
• Especially interoperation with emerging standards, e.g. TCS
Opportunities for standardisation
We would welcome consideration of the request/response model as a useful basis, independent of data representation, for interrogating sources of species-related information
Join us in enhancing SPICE & associated software
Areas for work include
• sophisticated management tools
• revision of SPICE code-base
• reusable software for wrapper writers
• addition of new protocols and schemas
Towards the Species Banks of the future
• Some Species 2000 GSDs currently provide “onward links” to rich species information
• Plan to investigate link-bases in which the Catalogue of Life can play an important part in the species banks of the future
Date for your diaries
1-day symposium to discuss Species 2000 Phase 2: progressing beyond 1 million species to the target 1.75 million
• The University of Reading, UK
• March 2007 (probably 29th)
Summary
• These protocols and standards are intended to be open and available for others to use when building similar federated information systems
• Our 6 operations are a candidate set for interchange of taxonomic data (possibly needing augmentation)
• They are described further in Species 2000 data standards documents at: http://biodiversity.cs.cf.ac.uk/sp2000/protocol/
Acknowledgements
• Funding: BBSRC, European Commission, GBIF
• Species 2000 Project Team and Directors
• Data providers