Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program...

NCBO Webinar series Knowledge Organization System (KOS) for biodiversity information resources - GBIF KOS work program Dag Endresen Knowledge Systems Engineer, Node manager for GBIF Norway Natural History Museum, University of Oslo Éamonn Ó Tuama Senior Programme Officer, Inventory, Discovery, Access (IDA) Global Biodiversity Information Facility (GBIF) 17 October 2012


Presentation of the Global Biodiversity Information Facility (GBIF) knowledge organization system (KOS) work program for the National Center for Biomedical Ontology (NCBO) Web seminar series in October 2012. Available at http://www.bioontology.org/GBIF-vocabulary-management-for-biodiversity-informatics

Transcript of Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program...

Page 1: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

NCBO Webinar series

Knowledge Organization System (KOS) for biodiversity information resources - GBIF KOS work program

Dag Endresen, Knowledge Systems Engineer, Node manager for GBIF Norway, Natural History Museum, University of Oslo

Éamonn Ó Tuama, Senior Programme Officer, Inventory, Discovery, Access (IDA), Global Biodiversity Information Facility (GBIF)

17 October 2012

Page 2: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

GBIF enables free and open access to biodiversity data online.

We’re an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.

WHAT IS THE GLOBAL BIODIVERSITY INFORMATION FACILITY?

Status data portal, October 2012

Presented by Éamonn

Page 3: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

The OECD origin…

OECD Global Science Forum recommendation (1999): “Establish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”

“This facility will enable users to navigate and put to use vast quantities of biodiversity information, thereby: advancing scientific research… serving the economic… providing a basis from which our knowledge of the natural world can grow rapidly…”

Presented by Éamonn

Page 4: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

1. Information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.

2. Community-developed tools, standards and protocols – the tools data providers need to format and share their data.

3. Capacity-building and training – and access to a global expert community.

GBIF PROVIDES THREE CORE SERVICES AND PRODUCTS:

Presented by Éamonn

Page 5: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

http://data.gbif.org/

Presented by Éamonn

Page 6: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Web services (REST)

Advanced search for occurrence records

• Scientific names and classification: http://data.gbif.org/ws/rest/taxon
• Species occurrence data: http://data.gbif.org/ws/rest/occurrence
• Species occurrence data aggregated, 1 degree cell: http://data.gbif.org/ws/rest/density
• Metadata on data providers: http://data.gbif.org/ws/rest/provider
• Metadata on datasets: http://data.gbif.org/ws/rest/resource
• Metadata on data networks: http://data.gbif.org/ws/rest/network

Open and free use of data!
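As a rough illustration of how a client might address the services listed above, the sketch below only assembles request URLs; it makes no network call. The base URL and service names come from the slide. The `list` action segment and the `scientificname` query parameter are assumptions made for the example and are not stated in the presentation.

```python
from urllib.parse import urlencode

# Base URL and service names are taken from the slide.
BASE_URL = "http://data.gbif.org/ws/rest"
SERVICES = {"taxon", "occurrence", "density", "provider", "resource", "network"}


def build_request_url(service: str, action: str = "list", **params: str) -> str:
    """Return a request URL for one of the listed services.

    The `action` path segment and every query parameter are assumptions
    for illustration; the slide only gives the service base paths.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    url = f"{BASE_URL}/{service}/{action}"
    if params:
        # Sort the parameters so the same input always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    # Hypothetical parameter name, shown only to illustrate the pattern.
    print(build_request_url("occurrence", scientificname="Puma concolor"))
```

Keeping the URL construction separate from the HTTP call makes it easy to swap in whichever parameters the service documentation actually defines.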

Presented by Éamonn

Page 7: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

GBIF’s unique role
• Registry of biodiversity data resources
• Tools and support for biodiversity data publication
• Network development at national, regional and global levels
• Global virtual natural history collection
• Cross-domain linkage between data from collections, ecology and genomics
• Access to biodiversity data for GIS analysis and environmental monitoring
– Aggregated presence data
– Site-based survey data (samples, presence/absence)

Slide developed by Donald Hobern

Presented by Éamonn

Page 8: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Unifying species data

Integrated access for records of the occurrence of any species:

• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record

Presence only

Collections

Ecological Monitoring

Genomics

Darwin Core

Slide developed by Donald Hobern

Page 9: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Darwin Core – a glossary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Page 10: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

TDWG Ontology

http://rs.tdwg.org/ontology

The TDWG Ontology was developed and maintained between 2007 and 2009.

Page 11: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

GBIF KOS task group report

http://links.gbif.org/gbif_kos_whitepaper_v1.pdf


Presented by Éamonn

Page 12: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

GBIF KOS work program, 2011 and 2012

Description of work:

• “a flexible, user-friendly ontology management environment, enabling users to create, define, extend and share their own terms and concepts where needed, providing options for discussions and annotation, while supporting re-use of terms from standardized ontologies wherever possible”.

• Extend the functionality of existing vocabulary services (like GBIF).

• Collaborative community interface for users and user-networks, bottom-up, user-friendly and non-technical.

• Flexibility for biologists to express their knowledge regardless of whether the terminology has been standardized yet or not.


Page 13: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Data standards

Important principle: Re-use of terms from standardized terminologies wherever possible.

The cartoon is from XKCD: http://xkcd.com/927/ CC-BY-NC

Page 14: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Term versus Concept

Dextre Clarke, S.G. and L. Zeng (2012). From ISO 2788 to ISO 25964: the evolution of thesaurus Standards towards Interoperability and data modeling. ISQ Information Standards Quarterly 24(1): 20-26.

“The SKOS (simple knowledge organization system) format is designed to present KOS data in a format that is suitable for machine inferencing and particularly for use in the Semantic Web (….) concepts – units of thought – and distinguishes these from the terms that are used to label these concepts.”

Will, L. (2012). The ISO 25964 Data Model for the Structure of an Information Retrieval Thesaurus. Bulletin of the American Society for Information Science and Technology 38(4): 48-51.
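The distinction between a concept and the terms that label it can be sketched as a small data structure. This is only an illustration of the idea, not the SKOS or ISO 25964 data model itself. The URI is the Darwin Core identifier for scientificName; the labels are illustrative values and are not taken from any published vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A unit of thought, identified by a URI rather than by any single term."""

    uri: str
    pref_labels: Dict[str, str] = field(default_factory=dict)  # language -> preferred term
    alt_labels: Dict[str, List[str]] = field(default_factory=dict)  # language -> other terms

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the preferred term for `lang`, else the fallback language, else the URI."""
        return self.pref_labels.get(lang) or self.pref_labels.get(fallback) or self.uri

    def matches(self, term: str) -> bool:
        """True if `term` is any known label of this concept (case-insensitive)."""
        wanted = term.casefold()
        labels = list(self.pref_labels.values())
        for alternatives in self.alt_labels.values():
            labels.extend(alternatives)
        return any(wanted == known.casefold() for known in labels)


# Illustrative labels only; the URI is what identifies the concept.
scientific_name = Concept(
    uri="http://rs.tdwg.org/dwc/terms/scientificName",
    pref_labels={"en": "Scientific Name", "es": "Nombre científico"},
    alt_labels={"en": ["Latin name"]},
)
```

Several terms, in several languages, resolve to the same concept, and the concept remains stable if a label is later changed.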


Page 15: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

• Maximize the reuse of terms, focus on the definition and labels for basic terms.

• Low threshold for non-technical biologists and biodiversity domain experts to access terms and contribute (compared to richer ontologies).

• Preferred technology: RDF (resource description framework) and SKOS (simple knowledge organization system).

• Construction and maintenance of OWL ontologies are demanding with respect to expertise, effort and costs.

• Maintaining SKOS vocabularies is less demanding.

• RDF resources are designed to be easily extended.

• Ontologies (OWL) can be based on (extend) terms declared by an RDF/SKOS vocabulary.

• SKOS became a W3C recommendation in 2009.

Why use a flat vocabulary?


Page 16: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

• OWL DL supports machine reasoning through machine accessible formal semantics.

• OWL provides by default a URI as identifier for classes, properties, relations and instances.

• OBO, for example, targets practical solutions in the biomedical / biology domain, while OWL is more generic and provides cross-domain interoperability.

• OWL 1.0 became a W3C recommendation in 2004, OWL 2.0 in 2009.
• http://www.w3.org/2007/OWL/

• Recommendation:
– REUSE terms declared by concept vocabularies…
– Start with SKOS - then explore OWL…

Why use OWL (web ontology language)?


Page 17: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

http://community.gbif.org/pg/groups/21382/


Governance structure (TDWG VoMaG)

Page 18: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)


http://kos.gbif.org

Page 19: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Concept Vocabulary (rdf, skos)

Wiki Vocabulary Management

ISOcat Vocabulary Management

Excel, text, etc… Template for Vocabularies

GBIF Resources Browser

Resources Repository

1. Mint and maintain concepts and terms, in domain-expert working groups.
2. Release final version as a Concept Vocabulary.
3. REUSE terms from published concept vocabularies and ontologies when designing new DwC-A controlled term and value vocabularies.
4. Publish at the GBIF Resources Repository.
5. Browse at the GBIF Resources Browser.

GBIF Vocabularies

DwC-A controlled vocabularies

Evaluation of collaborative management tools http://kos.gbif.org/

proposed template processor

GBIF Vocabularies as a collaborative management tool for Darwin Core Archive controlled term and value vocabularies.

Vocabulary management (work-flow)


Page 20: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

GBIF Vocabularies

Darwin Core Archive controlled terms and value vocabularies

GBIF Vocabularies as a collaborative management tool for Darwin Core Archive controlled terms and controlled value vocabularies.

Concept Vocabulary (rdf, skos)

Wiki Vocabulary Management

Resources Repository

ISOcat Vocabulary Management

MS Excel Template for Vocabularies

Evaluation of various tools for collaborative management of concept vocabularies.

DwC-A term and value vocabularies

GBIF IPT

GBIF Vocabulary Server (Drupal)

The GBIF Vocabulary Server is based on Drupal 6 / Scratchpads v1


Page 21: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Concept Vocabulary (rdf, skos)

Resources Repository

DwC-A term and value vocabularies

Wiki Forum for Terms

Wiki forum for terms as an open community platform for description and maintenance of existing terms.

Replacement tool also for the GBIF Vocabulary Server?

Semantic wiki forum for terms

Wiki Vocabulary Management

ISOcat Vocabulary Management

MS Excel Template for Vocabularies

Evaluation of various tools for collaborative management of concept vocabularies.

GBIF IPT

Page 22: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Concept Vocabulary (rdf, skos)

Resources Repository

The GBIF Term Browser allows a user to browse for terms defined in widely used concept vocabularies such as Darwin Core, Dublin Core, FOAF, etc., including, where available, translations. http://kos.gbif.org/termbrowser/

GBIF Term browser

Wiki Vocabulary Management

ISOcat Vocabulary Management

MS Excel Template for Vocabularies

Evaluation of various tools for collaborative management of concept vocabularies.

Concept vocabularies stored/deposited at http://rs.gbif.org/terms/

Page 23: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Concept Vocabulary (rdf, skos)

Wiki tool (incl. ontology development?)

Resources Repository (incl. ontologies?)

Ontologies (rdf, owl)

Biodiversity ontology management

Evaluation of tools for the development of biodiversity ontologies.

REUSE terms from concept vocabularies

Evaluation of biodiversity ontology repository solutions.

Page 24: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

BioPortal ontology repository

http://bioportal.bioontology.org/projects/168

Proposal: establish a biodiversity “slice” at the NCBO BioPortal.

• Loading biodiversity ontologies into the NCBO BioPortal promotes mapping (and reuse of terms) between bio-medical and biodiversity ontologies.

• An instance of the BioPortal software for biodiversity requires long-term obligations to host and maintain the resource – does e.g. GBIF have the resources to offer to host a BioPortal instance?


Page 25: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Concept vocabularies (skos:ConceptScheme, RDF)

• Darwin Core, Darwin Core “extensions”, NCD, GNA, Audubon Core (and other concept vocabularies).

as a basis and foundation for

Software application schema (XML, XML schema)

• Darwin Core Archive (DwC-A) controlled terminology and controlled value vocabularies.

• Resources such as the DwC-A controlled term and value vocabularies REUSE terms (by URI) from a concept vocabulary.


GBIF KOS resources

Page 26: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Biodiversity KOS (based on Darwin Core)

Darwin Core (DwC) provides a flat list of concepts and terms, expressed using RDF.

DwC “extensions” (vocabularies for the declaration of complementary and additional concepts).

Reuse concepts from other vocabularies whenever possible.

Darwin Core Archive (DwC-A) has a star schema model.

• DwC-A core(s), extensions and controlled value vocabularies are declared as XML lists of terms.

• DwC-A resources should always REUSE terms from Darwin Core and other concept vocabularies.

• New DwC-A core types (data types), e.g. sample? Formalize class entities (ontology). [Current types: Taxon & Occurrence]

Formalize a governance structure for maintaining KOS resources based on the principles established for Darwin Core (towards TDWG VoMaG).


Page 27: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Darwin Core Archive (DwC-A)

DwC-A publishes DwC records, including terms from DwC-A extensions. Simple text-based format. Zipped single-file archive.
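To make the "zipped archive of text files" idea concrete, here is a minimal sketch that assembles such an archive in memory: one core file, one extension file linked back to the core by identifier (the star schema), and a `meta.xml` descriptor mapping columns to term URIs. The Darwin Core term URIs are real; the extension row type, the `example.org` term and the file names are hypothetical, chosen only for the example.

```python
import io
import zipfile
from typing import Iterable, Sequence

# Descriptor mapping columns to term URIs. The extension rowType and its
# single field use hypothetical example.org URIs, for illustration only.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
             ignoreHeaderLines="1" rowType="http://example.org/terms/GermplasmSample">
    <files><location>germplasm.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://example.org/terms/biologicalStatus"/>
  </extension>
</archive>
"""


def to_tsv(header: Sequence[str], rows: Iterable[Sequence[str]]) -> str:
    """Serialise a header and rows as tab-separated text with a trailing newline."""
    lines = ["\t".join(header)] + ["\t".join(row) for row in rows]
    return "\n".join(lines) + "\n"


def build_archive(core_rows: Iterable[Sequence[str]],
                  extension_rows: Iterable[Sequence[str]]) -> bytes:
    """Return the bytes of a zip file holding meta.xml, the core and one extension."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr(
            "occurrence.txt",
            to_tsv(("id", "scientificName", "country"), core_rows),
        )
        archive.writestr(
            "germplasm.txt",
            to_tsv(("coreid", "biologicalStatus"), extension_rows),
        )
    return buffer.getvalue()
```

Each extension row points at a core record through the first column, so one occurrence can carry any number of extension rows.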

Germplasm.txt


Page 28: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Darwin Core Archive extension (XML term list)

http://rs.gbif.org/sandbox/extension/audubon.xml
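An XML term list of this kind can be read with a few lines of code. The sketch below parses an inline sample rather than the file at the URL above. The element and attribute names (`extension`, `property`, `qualName`, `required`) and the namespace are assumptions about the extension format, and the `example.org` entries are hypothetical. It also shows how reused terms (those whose URI lies outside the extension's own namespace) could be picked out.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

EXT_NS = "http://rs.gbif.org/extension/"  # assumed namespace of the term list format

# Inline sample; layout and names are assumptions, not a copy of a published file.
SAMPLE_EXTENSION = """\
<extension xmlns="http://rs.gbif.org/extension/"
           name="Example"
           namespace="http://example.org/terms/"
           rowType="http://example.org/terms/Example">
  <property name="title" namespace="http://purl.org/dc/terms/"
            qualName="http://purl.org/dc/terms/title" required="true"/>
  <property name="scientificName" namespace="http://rs.tdwg.org/dwc/terms/"
            qualName="http://rs.tdwg.org/dwc/terms/scientificName" required="false"/>
  <property name="sampleNote" namespace="http://example.org/terms/"
            qualName="http://example.org/terms/sampleNote" required="false"/>
</extension>
"""


class Term(NamedTuple):
    name: str
    uri: str
    required: bool


def read_terms(extension_xml: str) -> List[Term]:
    """Return every property declared in the term list, in document order."""
    root = ET.fromstring(extension_xml)
    return [
        Term(
            name=prop.get("name", ""),
            uri=prop.get("qualName", ""),
            required=prop.get("required") == "true",
        )
        for prop in root.findall(f"{{{EXT_NS}}}property")
    ]


def reused_terms(extension_xml: str) -> List[Term]:
    """Return the terms whose URI lies outside the extension's own namespace."""
    own_namespace = ET.fromstring(extension_xml).get("namespace", "")
    return [
        term
        for term in read_terms(extension_xml)
        if not own_namespace or not term.uri.startswith(own_namespace)
    ]
```

For the sample above, `reused_terms(SAMPLE_EXTENSION)` returns the Dublin Core and Darwin Core entries and leaves out the locally minted one.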

XSLT

Page 29: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

GBIF Vocabulary Server

The GBIF Vocabulary Server can assist a user to create and manage DwC-A extensions or controlled value vocabularies.

However, it is not designed to create RDF/SKOS concept vocabulary resources with reusable concepts.

It can export XML, but not RDF.

It is based on Scratchpads v1, which is built on Drupal 6.


XML export

edit interface

Page 30: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Concept vocabulary (RDF/SKOS)

http://rs.gbif.org/terms/geotime/geotimeConcept.rdf

In progress: XSLT -> HTML for human readable version.

Page 31: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Global Names Architecture (GNA)


Many of the GNA term URI identifiers do not resolve (404 not found).

The rowType identifiers simply resolve to the software application schema (to the DwC-A extension).

We propose to formalize the GNA concept declarations using RDF/SKOS for improved re-usability of the GNA terms and concepts.
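A check of this kind, listing which term URIs fail to resolve, could be scripted roughly as follows. This is a sketch and not the procedure used by the authors. The `fetch` argument is injectable so the logic can be exercised without network access; any URIs passed in are the caller's own.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(uri: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a HEAD request, or 0 if no response was received."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except urllib.error.URLError:
        # No HTTP answer at all (DNS failure, refused connection, timeout).
        return 0


def find_unresolved(uris: Iterable[str],
                    fetch: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Map each URI that fails to resolve to the status that was observed."""
    unresolved = {}
    for uri in uris:
        status = fetch(uri)
        if status == 0 or status >= 400:
            unresolved[uri] = status
    return unresolved
```

Some servers reject HEAD requests, so a URI reported here may still resolve with GET; a fuller check would retry before flagging it.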

Page 32: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Global Names Architecture (GNA)


The Global Names Architecture (GNA) terms were originally simply declared by the DwC-A extension. We propose to formalize the GNA concept declarations using RDF/SKOS for improved re-usability of the GNA terms.

RDF/SKOS

XML

Page 33: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Global Names Architecture (GNA)


We propose to formalize the GNA concept declarations using RDF/SKOS for improved re-usability of the GNA terms.

RDF/SKOS

Page 34: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Darwin Core Archive extensions


• Global Names Architecture (GNA)
• Audubon Core (multimedia)
• Invasive species (GISIN)
• Genetic Resources (Germplasm)
• EOL species profile
• Taxonomic Concept Schema (TCS)
• Genomics Standards Consortium (GSC)
• Meta-genomics (?)
• ABCD (?)
• …

Page 35: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

• Country codes
• Language
• Basis of record
• Taxonomic rank
• Nomenclatural status
• Life form
• Life stage
• Geological time periods
– chronostratigraphy
– magnetostratigraphy
• Species interactions
– saproxylic interactions
– pollinators
• …

Controlled value vocabularies
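One way a controlled value vocabulary might be applied is to normalise incoming values against the list and flag anything that does not match. The sketch below uses "basis of record" as the example. The five values are an assumed subset written in the Darwin Core style; the slide does not list the values, and the matching rule (ignore case, spaces and punctuation) is a design choice for this example.

```python
from typing import Iterable, List, Optional, Sequence

# Assumed subset of "basis of record" values; the slide does not enumerate them.
BASIS_OF_RECORD = (
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
)


def _key(value: str) -> str:
    """Reduce a value to lowercase letters and digits for tolerant matching."""
    return "".join(ch for ch in value.casefold() if ch.isalnum())


def normalise(value: str, vocabulary: Sequence[str] = BASIS_OF_RECORD) -> Optional[str]:
    """Return the controlled term matching `value`, or None if there is none."""
    lookup = {_key(term): term for term in vocabulary}
    return lookup.get(_key(value))


def invalid_values(values: Iterable[str],
                   vocabulary: Sequence[str] = BASIS_OF_RECORD) -> List[str]:
    """Return the input values that match no term in the vocabulary."""
    return [value for value in values if normalise(value, vocabulary) is None]
```

The same two functions work for any of the vocabularies listed above by passing a different `vocabulary` sequence.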


Page 36: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Example: master SKOS/RDF resource

http://rs.gbif.org/terms/dwc/dwc_translations.rdf

Languages shown: en, es, zh, ja
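A resource that carries labels in several languages could be read as sketched below. The inline sample is an assumption about how such a SKOS/RDF file may be laid out (one `skos:Concept` per term with language-tagged `skos:prefLabel` elements) and is not a copy of the file at the URL above. The term URI is the Darwin Core identifier for country; the label texts are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed layout with illustrative label texts.
SAMPLE_RDF = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://rs.tdwg.org/dwc/terms/country">
    <skos:prefLabel xml:lang="en">Country</skos:prefLabel>
    <skos:prefLabel xml:lang="es">País</skos:prefLabel>
    <skos:prefLabel xml:lang="zh">国家</skos:prefLabel>
    <skos:prefLabel xml:lang="ja">国</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>
"""


def labels_by_language(rdf_xml: str) -> Dict[str, Dict[str, str]]:
    """Map each concept URI to a {language code: preferred label} dictionary."""
    root = ET.fromstring(rdf_xml)
    result: Dict[str, Dict[str, str]] = {}
    for concept in root.iter(f"{{{SKOS_NS}}}Concept"):
        uri = concept.get(f"{{{RDF_NS}}}about", "")
        labels = result.setdefault(uri, {})
        for label in concept.findall(f"{{{SKOS_NS}}}prefLabel"):
            labels[label.get(XML_LANG, "")] = (label.text or "").strip()
    return result
```

Because every translation hangs off the same concept URI, a new language can be added without touching data that already references the term.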


Presented by Éamonn

Page 37: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Semantic MediaWiki

single term view

Presented by Éamonn

Page 38: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Recommendations for the GBIF KOS work programme

• GBIF Resources Repository (http://rs.gbif.org/)
– Further development of new DwC-A extensions and controlled value vocabularies.

• Workflow for the translation of term descriptions.

• Versioning of terms/vocabularies.

• Continue the evaluation of collaborative tools for management of flat vocabularies of terms (RDF/SKOS).
– Semantic MediaWiki, ISOcat, Protégé (web-protégé), …

• New semantic Wiki for the description of terms / glossary of terms / community-driven discussion forum (with JKI, Gregor Hagedorn).
– Discussion, discovery and REUSE of existing terms.

• NCBO BioPortal as a repository for biodiversity ontologies.
– Explore BFO based OWL version of Darwin Core…?

• KOS governance structure developed and formalized by the (TDWG) Vocabulary Management Task Group (VoMaG).

• Roadmap for KOS into the GBIF infrastructure, portal, tools…!


Page 39: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (2012)

Furthermore, I think that we need persistent identifiers!

Cato the Elder ended all his speeches in the senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").

Available at http://www.bioontology.org/GBIF-vocabulary-management-for-biodiversity-informatics