Global Biodiversity. Global Biodiversity Patterns and Processes.
GLOBAL BIODIVERSITY
GLOBAL BIODIVERSITY INFORMATION FACILITY
ECAT Programme Update
David Remsen & Markus Döring
ECAT Goals
GBIF provides a simple and extensible solution for publishing taxonomic checklists.
Published data are used to improve access and data interoperability within the portal.
Published data support taxonomic name services.
Name services support development of tools that meet national and regional needs.
SCOPE of ECAT publishing
Taxonomic Catalogues: Monographs / Flora / Fauna
Annotated Species Checklists: Regional, Thematic
Nomenclators: Name dictionaries, no taxonomy
Darwin Core Archive Format
Vocabularies.gbif.org
Community-driven
Internationalised
Vocabularies
Extensions
Tested
Ready for release
See Spanish Page
Extensions
Extend the DwC
For occurrence-level
For species-level
Draft
Add relevant vocabs.
Review
Publish!
Terms of Bionomenclature
Taxonomic Std Reference
Print Publication
Online Reference
Semantic
Supports vocabulary building
April
Go to website
Publishing Checklists to GBIF
Integrated Publishing Toolkit (next version): full and “lite”
Direct DwC output from sources
HIT adapters for existing sources
Spreadsheets
Desktop applications
Refactoring existing online tools (ITIS, EDIT)
HIT Adapters
Database                   Classification  Synonyms  Vernacular  Distrib.
Catalogue of Life 2009     Yes             Yes       Yes         Yes
ITIS                       Yes             Yes       Yes         Yes
Tree of Life               Yes             -         -           -
USDA Plants                Yes             Yes       Yes         Yes
GRIN GermPlasm Taxonomy    Yes             Yes       Yes         -
NCBI Taxonomy              Yes             Yes       Yes
Palaeobiology Database     Yes             Yes       -           -
See Example DWC Archive Output
View the Project Wiki page with links to all source Scripts
Publishing by Spreadsheet
Simple
Validated
Developing countries
Conforms to existing workflow
Publishing by Spreadsheet
Forms and auto-complete
Metadata and data
Occurrence data
Species Checklists
Embedded vocabularies
Desktop Application
Publishes DwCA
Currently used
GBIFS
~100 sources
600,000 records
90 languages
Could be deployed
DwCA Validating Tool
View the DwCA Validator
Published DWC Archive files
Current Status
Manually curated
82 ECAT sources: 14 taxonomic authority files, 64 vernacular name lists, 2 nomenclatural lists, 2 thematic lists
5,800 occurrence classifications
15M different usages
11,454,896 unique names assigned to 4.8M name groups
4,612,444 canonical names
Importing Data
ChecklistBank Command Line Tool
Bundles many tasks into one executable jar: adding, deleting and exporting resources; (pre)importing; lexical grouping; nub build
To be used by the HIT module
Importing in 3 steps:
1) preimport terms
2) import into an isolated db schema
3) accept the import into the public schema
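The idea behind steps 2 and 3 (load into an isolated area, then accept into the public one) can be sketched as follows. This is a minimal illustration, not the ChecklistBank implementation: it assumes SQLite in place of the actual database, and the table names `staging_usage` and `public_usage`, the function `staged_import`, and the parent-link check are all hypothetical. Step 1 (preimporting terms) is not shown.

```python
import sqlite3

def staged_import(conn, rows):
    """Load rows into a staging table, validate them, then promote them.

    rows: iterable of (id, parent_id, name) tuples (hypothetical layout).
    Returns True if the dataset was accepted, False if it was rejected.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS public_usage "
                "(id TEXT PRIMARY KEY, parent_id TEXT, name TEXT)")
    cur.execute("DROP TABLE IF EXISTS staging_usage")
    cur.execute("CREATE TABLE staging_usage "
                "(id TEXT PRIMARY KEY, parent_id TEXT, name TEXT)")

    # Step 2: import into the isolated staging table.
    cur.executemany("INSERT INTO staging_usage VALUES (?, ?, ?)", rows)

    # Validate: every parent_id must resolve within the staged dataset.
    orphans = cur.execute(
        "SELECT COUNT(*) FROM staging_usage s "
        "WHERE s.parent_id IS NOT NULL AND NOT EXISTS "
        "(SELECT 1 FROM staging_usage p WHERE p.id = s.parent_id)"
    ).fetchone()[0]
    if orphans:
        conn.rollback()  # discard the staged rows; public data is untouched
        return False

    # Step 3: accept the import into the public table.
    cur.execute("INSERT INTO public_usage SELECT * FROM staging_usage")
    cur.execute("DROP TABLE staging_usage")
    conn.commit()
    return True
```

Because a single broken record can damage the whole hierarchy, the sketch rejects the dataset as a whole rather than skipping individual rows.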
Checklist Data Qualities
1. Highly relational taxonomic data; almost all records are linked in a tree hierarchy, plus basionym links
2. Wrong or missing records destroy dataset integrity, not just a single record!
3. Different from flat, unrelated occurrence records
Syntactically damaged sources: wrong mappings; wrong character encodings; end-of-line breaks or tabs within data
Data quality: broken referential integrity; bad names (e.g. «Unallocated Family»); missing or unused controlled vocabularies, e.g. «art» for rank species
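Some of the field-level problems listed above can be detected with simple per-record checks. The following is a hedged sketch only: the function `check_record`, the rank list `KNOWN_RANKS` and the `PLACEHOLDER` pattern are illustrative assumptions, and a real rank vocabulary would come from a controlled vocabulary rather than a hard-coded set. Records are assumed to be dicts keyed by the Darwin Core terms `scientificName` and `taxonRank`.

```python
import re

# Assumed, illustrative rank vocabulary (not an official list).
KNOWN_RANKS = {"kingdom", "phylum", "class", "order", "family",
               "genus", "species", "subspecies", "variety"}

# Assumed pattern for placeholder names such as "Unallocated Family".
PLACEHOLDER = re.compile(r"\b(unallocated|unassigned|unplaced)\b", re.IGNORECASE)

def check_record(record):
    """Return a list of problem descriptions for one checklist record."""
    problems = []
    for field, value in record.items():
        if not isinstance(value, str):
            continue
        # Tabs or line breaks inside a value break delimited text files.
        if "\t" in value or "\n" in value or "\r" in value:
            problems.append(f"{field}: embedded tab or line break")
        # U+FFFD usually signals a failed character decoding.
        if "\ufffd" in value:
            problems.append(f"{field}: possible character encoding damage")
    if PLACEHOLDER.search(record.get("scientificName") or ""):
        problems.append("scientificName: placeholder name")
    rank = (record.get("taxonRank") or "").strip().lower()
    if rank and rank not in KNOWN_RANKS:
        problems.append(f"taxonRank: '{rank}' not in rank vocabulary")
    return problems
```

For example, a record with rank «art» is flagged because the value is not in the assumed vocabulary, even though it is a valid word for "species" in some languages.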
Names can be published in several ways:
ScientificName
ScientificName + Authorship
Genus + Authorship
Genus + SpeciesEpitheton (+ Rank + InfraspecificEpitheton) + Authorship
Classifications can be published in several ways:
Normalised via parentNameUsageID
Normalised via parentNameUsage
Denormalised via Kingdom, Phylum, Class, Order, Family, Genus
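To show how the denormalised form relates to the normalised one, the sketch below turns rows with Kingdom to Genus columns into parent links. It is an illustration under stated assumptions: the function `normalise` and the `(rank, name)` keys are hypothetical, and keying by rank and name alone would merge homonyms from different kingdoms.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

def normalise(rows):
    """Convert denormalised rows into {(rank, name): parent_key} links.

    rows: list of dicts with optional keys from RANKS.
    A taxon whose immediate higher rank is missing is linked to the
    nearest ancestor that is present in the same row.
    """
    parents = {}
    for row in rows:
        parent = None
        for rank in RANKS:
            name = row.get(rank)
            if not name:
                continue  # rank not supplied in this row
            key = (rank, name)
            # Keep the first parent seen; contradictions are not resolved here.
            parents.setdefault(key, parent)
            parent = key
    return parents
```

Usage: `normalise([{"kingdom": "Plantae", "family": "Fabaceae", "genus": "Vicia"}])` links the genus to the family and the family directly to the kingdom, since the intermediate ranks are absent.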
Checklist Bank Model
Lexical Group
Gerardia paupercula var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (A.Gray) Britton var. borealis (Pennell) Deam
Gerardia paupercula borealis
Gerardia paupercula borealis (Pennell) Deam
Nomenclatural Group
Gerardia paupercula var. borealis (Pennell) Deam
Agalinis paupercula var. borealis Pennell
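Lexical grouping can be approximated by reducing each name string to a canonical form and grouping on it. The sketch below is a deliberately naive stand-in, not the GBIF name parser: `canonical`, `lexical_groups` and the `RANK_MARKERS` set are hypothetical, and the heuristic (drop parenthesised authors, rank markers and any later token that is not all lowercase) would mis-handle lowercase author particles such as "de" or "ex".

```python
import re
from collections import defaultdict

# Assumed, incomplete set of infraspecific rank markers (without trailing dot).
RANK_MARKERS = {"var", "subsp", "ssp", "f", "forma"}

def canonical(name):
    """Reduce a name string to genus plus lowercase epithets."""
    no_parens = re.sub(r"\([^)]*\)", " ", name)  # drop "(Pennell)" etc.
    tokens = no_parens.split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # the genus is kept as written
    for tok in tokens[1:]:
        if tok.rstrip(".") in RANK_MARKERS:
            continue  # "var.", "subsp." and similar
        if re.fullmatch(r"[a-z-]+", tok):
            kept.append(tok)  # epithets are lowercase; authors are not
    return " ".join(kept)

def lexical_groups(names):
    """Group name strings that share the same canonical form."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical(name)].append(name)
    return dict(groups)
```

With the five Gerardia strings above, all fall into one group keyed by "Gerardia paupercula borealis". The Agalinis name lands in a different lexical group, which is why the nomenclatural group (linked through the basionym) is modelled separately.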
Taxonomic Backbone (Nub)
What it is
How it is built
Composite Taxonomic Backbone
Largest integrated taxonomy in the world
200 million occurrences
One taxonomic hierarchy
Nub Relevance
All biodiversity data is aligned via names
Considerable variation in higher taxa
Nub Management Classification is used for:
provide hierarchy of names
crosswalking between taxonomies
=> Maps & Statistics
External linkages, e.g. EOL maps
More details: http://livelink.gbif.org/gbif/livelink/overview/3233870
Cronquist classification
Mimosaceae: 3,200 species
Caesalpiniaceae: 2,000 species
Fabaceae: 14,000 species
“Modern” classification
Fabaceae: 19,200 species
Mimosoideae: 3,200 species
Caesalpinioideae: 2,000 species
Faboideae: 14,000 species
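The two classifications above differ only in rank: the three Cronquist families become subfamilies of a single Fabaceae, so the counts add up (3,200 + 2,000 + 14,000 = 19,200). A small crosswalk sketch makes this explicit. The `CROSSWALK` mapping and the function `to_modern` are illustrative names, and the mapping covers only the three families from the slide.

```python
# Species counts as given for the Cronquist classification.
CRONQUIST = {"Mimosaceae": 3200, "Caesalpiniaceae": 2000, "Fabaceae": 14000}

# Hypothetical crosswalk: Cronquist family -> (modern family, modern subfamily).
CROSSWALK = {
    "Mimosaceae": ("Fabaceae", "Mimosoideae"),
    "Caesalpiniaceae": ("Fabaceae", "Caesalpinioideae"),
    "Fabaceae": ("Fabaceae", "Faboideae"),
}

def to_modern(counts):
    """Re-aggregate species counts under the modern family and subfamilies."""
    families, subfamilies = {}, {}
    for old_family, n in counts.items():
        family, subfamily = CROSSWALK[old_family]
        families[family] = families.get(family, 0) + n
        subfamilies[subfamily] = subfamilies.get(subfamily, 0) + n
    return families, subfamilies
```

The same family name, Fabaceae, therefore covers 14,000 or 19,200 species depending on the source classification, which is the kind of higher-taxon variation a backbone has to reconcile.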
Nub Components
Nub Building
Regular checklist resource
Lexical grouping
Canonical homonyms
Authorship matching difficult => canonical names + kingdom
Ignore noisy, occurrence-derived-only names?
Nub Assembling
8 CoL kingdoms
Each LexGroup becomes a nub usage
Contradicting classifications
Intermediate rank synonyms
Select preferred, well-formed name
Stable IDs
Rated sources: nomenclatural resources for names, taxonomic for classification
Subphylum in ANIMALIA: Vertebrata; Vertebrate; Vertebrata Cuvier, 1812
Algae genus in PLANTAE: Vertebrata; Vertebrata Gray; Vertebrata S.F. Gray, 1821
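The "canonical names + kingdom" approach to homonyms can be sketched by keying each usage on the pair (canonical name, kingdom) rather than on authorship. This is an assumption-laden illustration: the record layout with `name`, `canonical` and `kingdom` keys and the function `nub_keys` are hypothetical, and the canonical name is taken as already parsed.

```python
from collections import defaultdict

def nub_keys(usages):
    """Group name usages by (canonical name, kingdom).

    usages: list of dicts with keys "name", "canonical" and "kingdom".
    Authorship is ignored on purpose, since matching it is unreliable.
    """
    groups = defaultdict(list)
    for usage in usages:
        key = (usage["canonical"], usage["kingdom"].upper())
        groups[key].append(usage["name"])
    return dict(groups)

# Vertebrata strings from the example above, with kingdom supplied by the source.
USAGES = [
    {"name": "Vertebrata", "canonical": "Vertebrata", "kingdom": "Animalia"},
    {"name": "Vertebrata Cuvier, 1812", "canonical": "Vertebrata", "kingdom": "Animalia"},
    {"name": "Vertebrata", "canonical": "Vertebrata", "kingdom": "Plantae"},
    {"name": "Vertebrata Gray", "canonical": "Vertebrata", "kingdom": "Plantae"},
    {"name": "Vertebrata S.F. Gray, 1821", "canonical": "Vertebrata", "kingdom": "Plantae"},
]
```

Calling `nub_keys(USAGES)` yields two groups, one for the animal subphylum and one for the algal genus, even though the bare string "Vertebrata" appears in both.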
Nub Building
Discovery: Portal and Services
Checklist Bank Portal
82 ECAT Resources: 14 taxonomic catalogues, 64 vernacular name lists, 2 thematic lists, 2 nomenclators
Go to Portal
Checklist Bank Web Services
Checklist Service
Name Usage Resolver
Name Usage Service
Name Usage Navigation Service
Name String Service
Image Service
Go to API Page
Name Parser
Uses: comparing, matching, GBIF Backbone, “Did you mean”
Try GBIF Name Parser
Name Recognition Services
View GBIF Name Recognition Tools
Updated service: March 2010
DwC API
Uses: IAIA parsing; adding names to metadata; checklists from documents
“TaxonTagger” tools
View TaxonTagger Sample document (Butterfly list)
Using Name Services: Data Entry
Google Docs: Live Example
Taxonomic indexing: mining names from publishing RSS feeds, IAIA reports, KNB Knowledge network
Mapping to species lists: “Any red-listed species in this set of IAIA reports.”
Name Parser API
TaxonFinder API
Checklist Bank API
Other 2010/11 Mapping Services
Linking a data collection to a specific taxonomic authority
Taxonomic Validation and Annotation of Occurrence data.
Linking to Community Species Pages