Biodiversity Data Publishing: Overview Fhatani Ranwashe

  • Biodiversity Data Publishing: Overview

    INDEX

    • Background

    • Data publishing landscape

    • Biodiversity data publishing

    o Data types

    o Data standards

    o Data quality

    o Data publishing methods

    • Data publishing in South Africa

  • Background

    “Free and open access to primary biodiversity data is

    essential for informed decision-making to achieve

    conservation of biodiversity and sustainable development.

    However, primary biodiversity data are neither easily

    accessible nor discoverable.” (Chavan and Penev, 2011)

    Biodiversity data publishing refers to making biodiversity

    data available to the public in a standard form via the

    internet.

  • Biodiversity Data Publishing: Overview

    Data publishing landscape

    Publishing and discovery of biodiversity data: the constraints and challenges

    • the lack of sustainable practices for data publishing;

    • the lack of easy-to-use tools and related guidelines for authoring metadata

    documents;

    • the difficulty of dealing with heterogeneity and diversity of standards;

    • the cost of creation and maintenance of infrastructure by small- and medium-scale

    data publishers; and

    • the lack of professional reward structures or incentives.

    Chavan and Penev, BMC Bioinformatics 2011, 12(Suppl 15):S2

  • Biodiversity Data Publishing: Overview

    Data publishing landscape

    “Currently the GBIF facilitates discovery of over 10,000 data resources,

    providing access to over 267 million primary biodiversity data records.” (Chavan

    and Penev, 2011)

    GBIF works in partnership with other international organisations such as the

    Catalogue of Life, Biodiversity Information Standards (TDWG), the Consortium

    for the Barcode of Life (CBOL), the Encyclopedia of Life (EOL), and the

    Integrated Taxonomic Information System (ITIS).

  • Data publishing landscape

    The data publishing area is in continuous evolution and expansion.

    Timeline (2008, 2009, 2010, 2011, 2012, 2014), milestones in the order shown:

    • Idea for a simple, compressed text-based file for publishing introduced at TDWG

    • DiGIR/TAPIR in high use to publish biodiversity data

    • GBIF introduces IPT 1.0

    • Darwin Core standards established by TDWG

    • GBIF redevelops IPT

    • GBIF introduces IPT 2.0

    • Data publishing taught at Nodes training

    • Nodes and aggregators begin to install and use IPTs

    • Occurrence and checklist type datasets show continued growth

    • GBIF introduces IPT 3.0 with Digital Object Identifier (DOI)

  • Data Types

    • Occurrences (observations, specimens, etc.)

    o 'collection event'

    o an observation in the field, a vouchered (labeled) specimen in a museum or

    herbarium, or other evidence.

  • Data Types

    • Checklists (names)

    o lists of scientific names of organisms grouped into taxonomic hierarchies,

    o provide taxonomic 'backbones' around which species information can be

    organized.

  • Data Types

    • Metadata (data about data)

    o structured descriptions of datasets

    o help to give context to datasets and enable users to assess whether data are

    fit for use in a particular research project or application

  • Data Types

    • Sampling-event (quantitative information)

    o records from thousands of different kinds of environmental, ecological, and

    natural resource monitoring and assessment investigations

    Source: http://www.gbif.org/publishing-data/summary#datatypes

  • Biodiversity Data Publishing: Overview

    Data Standards

    • ABCD: Access to Biological Collections Data (2005)

    • DwC: Darwin Core (2009)

    • AC: Audubon Core Multimedia Resources Metadata Schema (2013)

    • NCD: Natural Collections Descriptions (draft)

    http://www.tdwg.org

  • Biodiversity Data Publishing: Overview

    Darwin Core

  • Mapping cores

    • Taxon Core – Species information

    The category of information pertaining to taxonomic names, taxon

    name usages, or taxon concepts. 43 terms.

    • Occurrence Core – Collection event information

    The category of information pertaining to evidence of an

    occurrence in nature, in a collection, or in a dataset (specimen,

    observation, etc.). 169 terms.

    • Event Core – Sampling information

    The category of information pertaining to a sampling event. Issued

    29 May 2015. 95 terms.
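
    The three cores line up with the dataset types listed earlier (checklist, occurrence, sampling-event). A small sketch of that lookup follows. The row type URIs are the Darwin Core class identifiers; the function name and the dataset-type keys are assumptions made for illustration, and a metadata-only dataset is treated here as having no core data file.

```python
# Dataset type (as named on the "Data Types" slides) -> Darwin Core class URI
# used as the core row type. The dictionary keys are illustrative labels.
CORE_ROW_TYPES = {
    "checklist": "http://rs.tdwg.org/dwc/terms/Taxon",
    "occurrence": "http://rs.tdwg.org/dwc/terms/Occurrence",
    "sampling-event": "http://rs.tdwg.org/dwc/terms/Event",
}


def core_row_type(dataset_type: str):
    """Return the core row type URI for a dataset type, or None for metadata-only."""
    key = dataset_type.strip().lower()
    if key == "metadata":
        # Assumption: a metadata-only dataset carries no core data file.
        return None
    if key not in CORE_ROW_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    return CORE_ROW_TYPES[key]
```

    For example, core_row_type("Checklist") selects the Taxon core, while an unrecognised type raises ValueError rather than guessing.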

  • Biodiversity Data Publishing: Overview

    Extensions

    • Darwin Core does not provide terms for every possible type of data;

    extensions add the terms needed for other kinds of data.

    – 22 extensions registered

    – 25 extensions under development

    • Examples

    – Audubon Media Description (aka Audubon Core)

    – Darwin Core Identification History

    – Darwin Core Measurement or Facts

  • Biodiversity Data Publishing: Overview

    Darwin Core Archive

    • A Darwin Core Archive (DwCA) is the text

    representation of data formatted to Darwin

    Core.

    • A DwCA is a compressed file containing a

    minimum of three files.
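
    A minimal sketch of that three-file layout, assuming the three files are one core data file, a meta.xml descriptor and an eml.xml metadata document. The file name occurrence.txt, the sample record and the stub EML content are invented for illustration; the stub is not a valid EML document, only a placeholder for one.

```python
import io
import zipfile

# Core data file: tab-separated text with a header row and one invented record.
OCCURRENCE_TXT = (
    "occurrenceID\tscientificName\teventDate\n"
    "example:occ:1\tProtea cynaroides\t2015-05-29\n"
)

# Descriptor: tells a reader how the data file is delimited and which
# Darwin Core term each column holds.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" encoding="UTF-8">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>
"""

# Placeholder only: a real archive needs a complete EML metadata document here.
EML_XML = "<eml><dataset><title>Illustrative dataset</title></dataset></eml>"


def build_archive() -> bytes:
    """Zip the three files into one in-memory archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", OCCURRENCE_TXT)
        archive.writestr("meta.xml", META_XML)
        archive.writestr("eml.xml", EML_XML)
    return buffer.getvalue()


def list_members(data: bytes) -> list[str]:
    """Return the sorted file names stored in a zipped archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())
```

    Writing the bytes returned by build_archive() to a .zip file gives something that can be opened with any archive tool to inspect the three members.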

  • Biodiversity Data Publishing: Overview

    Star schema

    [Diagram: a checklist DwC Archive laid out as a star. The Taxon Core is

    linked to the extensions Literature, Description, Occurrences, Vernacular,

    Distribution and Types. DwC Archive = data files + meta.xml + EML.xml.]
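
    One way to read the star layout: each extension row points back to exactly one core row through the core record's identifier, and a core row may have any number of extension rows. The sketch below shows that join in plain Python. The rows, the coreid column name and the attach_extension helper are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

# Core table: one row per taxon (invented sample rows).
taxon_core = [
    {"taxonID": "t1", "scientificName": "Protea cynaroides"},
    {"taxonID": "t2", "scientificName": "Aloe ferox"},
]

# Extension table: zero or more rows per taxon, linked through "coreid".
vernacular_ext = [
    {"coreid": "t1", "vernacularName": "king protea", "language": "en"},
    {"coreid": "t2", "vernacularName": "bitter aloe", "language": "en"},
    {"coreid": "t2", "vernacularName": "Cape aloe", "language": "en"},
]


def attach_extension(core_rows, ext_rows, core_key, ext_name):
    """Return copies of the core rows, each with its extension rows nested under ext_name."""
    by_core = defaultdict(list)
    for row in ext_rows:
        # Drop the link column; it is implied by the nesting.
        by_core[row["coreid"]].append(
            {key: value for key, value in row.items() if key != "coreid"}
        )
    return [
        {**core, ext_name: by_core.get(core[core_key], [])}
        for core in core_rows
    ]


joined = attach_extension(taxon_core, vernacular_ext, "taxonID", "vernacularNames")
```

    Core rows without any matching extension rows keep an empty list, so the core table is never filtered by the join.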

  • Biodiversity Data Publishing: Overview

    Simple Darwin Core

    • Simple Darwin Core (SIMPLEDWC): a flat file structure showing how to

    use the taxon and occurrence Darwin Core terms.

    • Use it when someone asks you to "format your data according to the

    Darwin Core".
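
    A flat file in this sense is a single table: one row per record, with columns named after Darwin Core terms. A short sketch using Python's csv module follows. The column names are Darwin Core terms; the choice of columns, the record values and the to_simple_dwc helper are invented for illustration.

```python
import csv
import io

# Column headers are Darwin Core term names; this selection is illustrative.
FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "countryCode",
]

# Invented sample records.
records = [
    {
        "occurrenceID": "example:occ:1",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Protea cynaroides",
        "eventDate": "2015-05-29",
        "decimalLatitude": "-33.96",
        "decimalLongitude": "18.41",
        "countryCode": "ZA",
    },
    {
        "occurrenceID": "example:occ:2",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Aloe ferox",
        "eventDate": "2014-11-03",
        "decimalLatitude": "-33.30",
        "decimalLongitude": "26.52",
        "countryCode": "ZA",
    },
]


def to_simple_dwc(rows) -> str:
    """Serialise rows as CSV text with one header line of Darwin Core terms."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

    Because every column header is a term name, the same file can later be described by a descriptor and packaged into an archive without renaming anything.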

  • Biodiversity Data Publishing: Overview

    Data quality

    “Data quality and error in data are often neglected issues with environmental

    databases, modelling systems, GIS, decision support systems, etc. Too often, data are

    used uncritically without consideration of the error contained within, and this can lead

    to erroneous results, misleading information, unwise environmental decisions

    and increased costs.” (Chapman, 2005)

    Data quality can be lost at any of these stages:

    • Data capture and recording at the time of gathering,

    • Data manipulation prior to digitisation (label preparation, copying of data to a

    ledger, etc.),

    • Identification of the collection (specimen, observation) and its recording,

    • Digitisation of the data,

    • Documentation of the data (capturing and recording the metadata),

    • Data storage and archiving,

    • Data presentation and dissemination (paper and electronic publications, web-

    enabled databases, etc.),

    • Using the data (analysis and manipulation).
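
    Some of the errors introduced during capture and digitisation can be caught with simple automated checks before data are published. The sketch below is one possible set of checks, chosen for illustration and not taken from Chapman (2005). The field names are Darwin Core terms; the check_record function and its messages are assumptions.

```python
from datetime import date


def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []

    # A record without a name cannot be matched to a taxon.
    if not (record.get("scientificName") or "").strip():
        problems.append("missing scientificName")

    # Coordinates must be numeric and inside the valid ranges.
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        problems.append("coordinates missing or not numeric")
    else:
        if not -90 <= lat <= 90:
            problems.append("decimalLatitude out of range")
        if not -180 <= lon <= 180:
            problems.append("decimalLongitude out of range")

    # This sketch only accepts full calendar dates written as YYYY-MM-DD.
    try:
        event_date = date.fromisoformat(record.get("eventDate") or "")
    except (TypeError, ValueError):
        problems.append("eventDate is not an ISO 8601 calendar date")
    else:
        if event_date > date.today():
            problems.append("eventDate is in the future")

    return problems
```

    An empty list means the record passed these particular checks; it does not mean the record is correct, since an identification error or a plausible but wrong coordinate would pass unnoticed.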

  • Biodiversity Data Publishing: Overview

    Data publishing methods

  • Data publishing in South Africa

    South Africa is a country node for GBIF

    • SANBI IPT:

    o http://197.189.235.147:8080/iptsanbi/

    o 2,737,451 records published

    • ADU IPT

    o http://aduipt.uct.ac.za:8080/ipt-2.3.2/

    o 288,822 records published

    • SAIAB IPT

    o http://ipt.saiab.ac.za

    o 138,140 records published

    • ICLEI IPT

    o http://197.189.235.147:8080/ipticlei/

    • KwaZulu-Natal Museum IPT

    • Endangered Wildlife Trust IPT

  • SANBI and Data Partners by numbers

    [Infographic: SANBI-IPT (Trusted Data Hosting Centre). Figures shown: 2.1m,

    649k, 105k, 78k, 8.5m, 5k, 51k (Nematodes), 18k (Algae), 17k and 91k.]

  • Thank You!