GLOBAL BIODIVERSITY

GLOBAL BIODIVERSITY INFORMATION FACILITY ECAT Programme Update David Remsen & Markus Döring


Transcript of GLOBAL BIODIVERSITY

Page 1: GLOBAL BIODIVERSITY

GLOBAL BIODIVERSITY INFORMATION FACILITY

ECAT Programme Update

David Remsen & Markus Döring

Page 2: GLOBAL BIODIVERSITY

ECAT Goals

GBIF provides a simple and extensible solution for publishing taxonomic checklists

Published data used to improve access and data interoperability within the portal

Published data supports taxonomic name services

Name services support development of tools that meet national and regional needs.

Page 3: GLOBAL BIODIVERSITY

SCOPE of ECAT publishing

Taxonomic Catalogues: Monographs/Flora/Fauna

Annotated Species Checklists: Regional, Thematic

Nomenclators: Name Dictionaries, No taxonomy

Page 4: GLOBAL BIODIVERSITY

Darwin Core Archive Format

Page 5: GLOBAL BIODIVERSITY

Vocabularies.gbif.org

Community-driven
Internationalised
Vocabularies
Extensions
Tested
Ready for release

See Spanish Page

Page 6: GLOBAL BIODIVERSITY

Extensions

Extend the DwC
For Occurrence-level
For Species-level
Draft
Add relevant vocabs.
Review
Publish!

Page 7: GLOBAL BIODIVERSITY

Terms of Bionomenclature

Taxonomic Std Reference

Print Publication
Online Reference
Semantic
Supports vocabulary building

April

Go to website

Page 8: GLOBAL BIODIVERSITY

Publishing Checklists to GBIF

Integrated Publishing Toolkit (next version): Full & “lite”

Direct DWC Output from Sources
  HIT Adapters for existing sources
  Spreadsheets
  Desktop Applications

Refactoring existing online Tools (ITIS, EDIT)

Page 9: GLOBAL BIODIVERSITY

HIT Adapters

Page 10: GLOBAL BIODIVERSITY

HIT Adapters

Database                   Classification  Synonyms  Vernacular  Distrib.
Catalogue of Life 2009     Yes             Yes       Yes         Yes
ITIS                       Yes             Yes       Yes         Yes
Tree of Life               Yes             -         -           -
USDA Plants                Yes             Yes       Yes         Yes
GRIN GermPlasm Taxonomy    Yes             Yes       Yes         -
NCBI Taxonomy              Yes             Yes       Yes
Palaeobiology Database     Yes             Yes       -           -

See Example DWC Archive Output

View the Project Wiki page with links to all source Scripts

Page 11: GLOBAL BIODIVERSITY

Publishing by Spreadsheet

Simple
Validated
Developing countries
Conforms to existing workflow

Page 12: GLOBAL BIODIVERSITY

Publishing by Spreadsheet

Forms and auto-complete
Metadata and data
Occurrence data
Species Checklists
Embedded vocabularies

Page 13: GLOBAL BIODIVERSITY

Desktop Application

Publishes DwCA
Currently used
GBIFS
~100 sources
600,000 records
90 languages
Could be deployed

Page 15: GLOBAL BIODIVERSITY

Published DWC Archive files

Current Status
Manually Curated

82 ECAT sources
  14 Taxonomic authority files
  64 Vernacular Name Lists
  2 Nomenclatural Lists
  2 Thematic Lists

5,800 occurrence classifications

15M different usages
11,454,896 unique names assigned to 4.8M name groups
4,612,444 canonical names

Page 16: GLOBAL BIODIVERSITY

Importing Data

Page 17: GLOBAL BIODIVERSITY

ChecklistBank Command Line Tool

Bundles many tasks into 1 executable jar

adding/deleting/exporting resources, (pre)importing, lexical grouping, nub build

* to be used by HIT module
* importing in 3 steps:
  1) preimport terms
  2) import into isolated db schema
  3) accepting import into public schema
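The three import steps can be illustrated with a minimal sketch. It is not the ChecklistBank tool itself: the function names, the table names, and the small rank vocabulary are all hypothetical, and SQLite table prefixes stand in for the isolated and public database schemas named on the slide.

```python
import sqlite3

# Assumed controlled vocabulary for the rank term (illustration only).
KNOWN_RANKS = {"kingdom", "family", "genus", "species"}


def preimport(records):
    """Step 1: report rank terms that the assumed vocabulary does not know."""
    return sorted({r["rank"] for r in records if r["rank"] not in KNOWN_RANKS})


def import_staging(conn, records):
    """Step 2: load the records into an isolated staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_usage "
        "(id TEXT PRIMARY KEY, name TEXT, rank TEXT)"
    )
    conn.executemany(
        "INSERT INTO staging_usage VALUES (:id, :name, :rank)", records
    )


def accept(conn):
    """Step 3: promote the staged records to the public table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS public_usage "
            "(id TEXT PRIMARY KEY, name TEXT, rank TEXT)"
        )
        conn.execute("INSERT INTO public_usage SELECT * FROM staging_usage")
        conn.execute("DELETE FROM staging_usage")
    return conn.execute("SELECT COUNT(*) FROM public_usage").fetchone()[0]


records = [
    {"id": "1", "name": "Fabaceae", "rank": "family"},
    {"id": "2", "name": "Vicia faba", "rank": "art"},
]
unknown_terms = preimport(records)
conn = sqlite3.connect(":memory:")
import_staging(conn, records)
accepted = accept(conn)
```

Keeping the staged data separate until the final step means a bad import can be inspected or discarded without touching what is already public.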

Page 18: GLOBAL BIODIVERSITY

Checklist Data Qualities

1. Highly relational taxonomic data, almost all records linked in a tree hierarchy + basionym
2. Wrong or missing records destroy dataset integrity, not just a single record!
3. Different to flat, unrelated occurrence records

Syntactically damaged sources
  wrong mappings
  wrong character encodings
  end of line breaks or tabs within data

Data Quality
  broken referential integrity
  bad names (e.g. «Unallocated Family»)
  missing or unused controlled vocabularies, e.g. «art» for rank species
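A broken-referential-integrity check can be sketched in a few lines. This is an illustration only: it assumes a tab-delimited core file with `taxonID`, `parentNameUsageID` and `scientificName` columns, and the sample rows are invented for the example.

```python
import csv
import io


def find_orphans(rows):
    """Return the IDs of records whose parentNameUsageID matches no record."""
    ids = {row["taxonID"] for row in rows}
    return [
        row["taxonID"]
        for row in rows
        if row["parentNameUsageID"] and row["parentNameUsageID"] not in ids
    ]


# Invented sample data: record 3 points at a parent (9) that does not exist.
sample = (
    "taxonID\tparentNameUsageID\tscientificName\n"
    "1\t\tPlantae\n"
    "2\t1\tFabaceae\n"
    "3\t9\tVicia faba\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
orphans = find_orphans(rows)
print(orphans)
```

Because the records form a tree, a single orphan detaches its whole subtree, which is why such a check is worth running before import.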

Names can be published in several ways
  ScientificName
  ScientificName + Authorship
  Genus + Authorship
  Genus + SpeciesEpitheton (+ Rank + InfraspecificEpitheton) + Authorship
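An importer that accepts the atomised form has to reassemble a full name string. A minimal sketch, assuming hypothetical parameter names and a simple space-joined layout:

```python
def build_name(genus, species=None, rank=None, infraspecific=None, authorship=None):
    """Join atomised name parts into one string; missing parts are skipped."""
    parts = [genus]
    if species:
        parts.append(species)
    if infraspecific:
        if rank:  # the rank marker only makes sense before an infraspecific epithet
            parts.append(rank)
        parts.append(infraspecific)
    if authorship:
        parts.append(authorship)
    return " ".join(parts)


full = build_name("Agalinis", "paupercula", "var.", "borealis", "Pennell")
genus_only = build_name("Gerardia", authorship="Benth.")
```

The same function covers the shorter variants in the list, since every part after the genus is optional.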

Classifications can be published in several ways
  Normalised via parentNameUsageID
  Normalised via parentNameUsage
  Denormalised via Kingdom, Phylum, Class, Order, Family, Genus
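To load all three variants into one tree, the denormalised form has to be turned into parent links. The sketch below is one possible approach, not the ChecklistBank implementation; the lowercase column names and the `(rank, name)` keys are assumptions made for the example.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def normalise(records):
    """Map each (rank, name) node to its parent node, or None for a root."""
    parents = {}
    for rec in records:
        parent = None
        for rank in RANKS:
            name = rec.get(rank)
            if not name:
                continue  # a missing rank is skipped, the chain stays connected
            key = (rank, name)
            parents.setdefault(key, parent)  # the first parent seen wins
            parent = key
    return parents


records = [
    {"kingdom": "Plantae", "family": "Fabaceae", "genus": "Vicia"},
    {"kingdom": "Plantae", "family": "Fabaceae", "genus": "Acacia"},
]
tree = normalise(records)
```

Keeping the first parent seen is a simplification: where sources disagree about a placement, a real importer would need to detect and resolve the conflict.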

Page 19: GLOBAL BIODIVERSITY

Checklist Bank Model

Lexical Group
  Gerardia paupercula var. borealis (Pennell) Deam
  Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
  Gerardia paupercula (A.Gray) Britton var. borealis (Pennell) Deam
  Gerardia paupercula borealis
  Gerardia paupercula borealis (Pennell) Deam

Nomenclatural Group
  Gerardia paupercula var. borealis (Pennell) Deam
  Agalinis paupercula var. borealis Pennell
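The lexical group above can be approximated by reducing each string to a canonical key. The heuristic below is a rough sketch rather than the GBIF name parser: it drops parenthesised authors, then keeps the first token plus any purely lowercase alphabetic tokens, which also discards rank markers such as "var." and capitalised author names. It would wrongly keep lowercase author particles such as "ex" or "von".

```python
import re
from collections import defaultdict


def canonical_key(name):
    """Reduce a non-empty name string to its genus plus lowercase epithets."""
    no_parens = re.sub(r"\([^)]*\)", " ", name)
    tokens = no_parens.split()
    epithets = [t for t in tokens[1:] if t.isalpha() and t.islower()]
    return " ".join([tokens[0]] + epithets)


def lexical_groups(names):
    """Group name strings that share the same canonical key."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return dict(groups)


names = [
    "Gerardia paupercula var. borealis (Pennell) Deam",
    "Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam",
    "Gerardia paupercula (A.Gray) Britton var. borealis (Pennell) Deam",
    "Gerardia paupercula borealis",
    "Gerardia paupercula borealis (Pennell) Deam",
    "Agalinis paupercula var. borealis Pennell",
]
groups = lexical_groups(names)
```

The Agalinis name lands in its own lexical group: linking it to the Gerardia names is a nomenclatural relationship (shared basionym), which string normalisation alone cannot detect.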

Page 20: GLOBAL BIODIVERSITY

Taxonomic Backbone (Nub)

What it is
How it is built

Page 21: GLOBAL BIODIVERSITY

Composite Taxonomic Backbone

Largest integrated taxonomy in the world

200 million occurrences
One taxonomic hierarchy

Page 22: GLOBAL BIODIVERSITY

Nub Relevance
Nub Management
Classification is used for
provide hierarchy of names
crosswalking between taxonomies
All biodiversity data is aligned via names
Considerable variation in higher taxa
=> Maps & Statistics
External linkages, e.g. EOL maps

More details: http://livelink.gbif.org/gbif/livelink/overview/3233870

Cronquist classification
  Mimosaceae: 3,200 species
  Caesalpiniaceae: 2,000 species
  Fabaceae: 14,000 species

“Modern” classification
  Fabaceae: 19,200 species
    Mimosoideae: 3,200 species
    Cæsalpinioideae: 2,000 species
    Faboideae: 14,000 species
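The two classifications describe the same species under different higher taxa, which a small crosswalk makes explicit. The species counts come from the slide; the mapping dictionary is an assumption written for this example, pairing each Cronquist family with the subfamily of matching size.

```python
# Species counts per family under the Cronquist classification (from the slide).
cronquist = {"Mimosaceae": 3200, "Caesalpiniaceae": 2000, "Fabaceae": 14000}

# Assumed crosswalk from Cronquist families to subfamilies of the broad Fabaceae.
crosswalk = {
    "Mimosaceae": "Mimosoideae",
    "Caesalpiniaceae": "Caesalpinioideae",
    "Fabaceae": "Faboideae",
}

modern_subfamilies = {crosswalk[fam]: n for fam, n in cronquist.items()}
modern_fabaceae_total = sum(modern_subfamilies.values())
print(modern_fabaceae_total)  # 3200 + 2000 + 14000 = 19200
```

A query for "Fabaceae" therefore returns 14,000 or 19,200 species depending on which classification the backbone follows.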

Page 23: GLOBAL BIODIVERSITY

Nub Components

Page 24: GLOBAL BIODIVERSITY

Nub Building
Regular Checklist Resource

Lexical Grouping
  Canonical homonyms
  Authorship matching difficult => canonical names + kingdom
  Ignore noisy occurrence derived only names?

Nub Assembling
  8 CoL kingdoms
  Each LexGroup becomes a nub usage
  Contradicting classifications
  Intermediate rank synonyms
  Select preferred, well-formed name
  Stable IDs
  Rated sources, nomenclatural resources for names, taxonomic for classification

Subphylum in ANIMALIA
  Vertebrata
  Vertebrate
  Vertebrata Cuvier, 1812

Algae genus in PLANTAE
  Vertebrata
  Vertebrata Gray
  Vertebrata S.F. Gray, 1821
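The "canonical names + kingdom" rule can be sketched as a composite grouping key. The function name and the tuple layout are hypothetical; the name strings are taken from the Vertebrata example on the slide.

```python
from collections import defaultdict


def nub_key(canonical, kingdom):
    """Key a usage by canonical name plus kingdom so homonyms stay apart."""
    return (canonical.strip(), kingdom.strip().upper())


# (canonical name, kingdom, full name string)
usages = [
    ("Vertebrata", "Animalia", "Vertebrata Cuvier, 1812"),
    ("Vertebrata", "Animalia", "Vertebrata"),
    ("Vertebrata", "Plantae", "Vertebrata Gray"),
    ("Vertebrata", "Plantae", "Vertebrata S.F. Gray, 1821"),
]

groups = defaultdict(list)
for canonical, kingdom, full_name in usages:
    groups[nub_key(canonical, kingdom)].append(full_name)

for key, members in sorted(groups.items()):
    print(key, members)
```

Grouping on the canonical name alone would merge the animal subphylum with the algal genus. Adding the kingdom separates them without having to match the differing author strings.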

Page 25: GLOBAL BIODIVERSITY

Nub Building

Page 26: GLOBAL BIODIVERSITY

Admin Console

View the Admin Console

Page 27: GLOBAL BIODIVERSITY

Discovery: Portal and Services

Page 28: GLOBAL BIODIVERSITY

Checklist Bank Portal

82 ECAT Resources
  14 Taxonomic Catalogues
  64 Vernacular Name lists
  2 Thematic Lists
  2 Nomenclators

Go to Portal

Page 29: GLOBAL BIODIVERSITY

Checklist Bank Web Services

Checklist Service
Name Usage Resolver
Name Usage Service
Name Usage Navigation Service
Name String Service
Image Service

Go to API Page

Page 30: GLOBAL BIODIVERSITY

Name Parser

Uses
  Comparing
  Matching
  GBIF Backbone
  “Did you mean”

Try GBIF Name Parser

Page 31: GLOBAL BIODIVERSITY

Name Recognition Services

View GBIF Name Recognition Tools

Updated Service
  March 2010
  DWC API

Uses
  IAIA parsing
  Adding names to metadata
  Checklists from documents

Page 32: GLOBAL BIODIVERSITY

“TaxonTagger” tools

View TaxonTagger Sample document (Butterfly list)

Page 33: GLOBAL BIODIVERSITY

Using Name Services: Data Entry

Google Docs: Live Example

Page 34: GLOBAL BIODIVERSITY

Taxonomic Indexing

Mining names from
  publishing RSS feeds
  IAIA reports
  KNB Knowledge network

Mapping to Species lists
  “Any red-listed species in this set of IAIA reports.”

Name Parser API
TaxonFinder API
Checklist Bank API

Page 35: GLOBAL BIODIVERSITY

Other 2010/11 Mapping Services

Linking a data collection to a specific taxonomic authority

Taxonomic Validation and Annotation of Occurrence data.

Linking to Community Species Pages