GBIF BIFA mentoring, Day 2 Publish data, July 2016


Page 1: GBIF BIFA mentoring, Day 2 Publish data, July 2016
Page 2: GBIF BIFA mentoring, Day 2 Publish data, July 2016

TOPICS

•  Darwin Core standard
•  Darwin Core archives data exchange format
•  Occurrence, Event and Taxon cores
•  Data profiles in GBIF at rs.gbif.org
•  What is the GBIF Integrated data Publishing Toolkit?
•  Download data from GBIF

Page 3: GBIF BIFA mentoring, Day 2 Publish data, July 2016

1.   Information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.

2.   Community-developed tools, standards and protocols – the tools data providers need to format and share their data.

3.   Capacity-building and training – and access to a global expert community.

Page 4: GBIF BIFA mentoring, Day 2 Publish data, July 2016

GBIF provides a data discovery system

global registry
data portal

that is dependent on resolvable stable identifiers for efficient functionality

Page 5: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Darwin Core

Page 6: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA STANDARDS

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015 - http://www.tdwg.org/standards/

ABCD  Access to Biological Collection Data (2005)
DwC   Darwin Core (2009)
AC    Audubon Core Multimedia Resources Metadata Schema (2013)
NCD   Natural Collection Descriptions (Draft 2008)
EML   Ecological Metadata Language (Ecological Society of America)

Page 7: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Darwin Core – a glossary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Page 8: GBIF BIFA mentoring, Day 2 Publish data, July 2016

http://rs.tdwg.org/dwc/terms/

Record-level Terms: dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties

Occurrence: occurrenceID | catalogNumber | recordNumber | recordedBy | individualCount | organismQuantity | organismQuantityType | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | associatedMedia | associatedReferences | associatedSequences | associatedTaxa | otherCatalogNumbers | occurrenceRemarks

Organism: organismID | organismName | organismScope | associatedOccurrences | associatedOrganisms | previousIdentifications | organismRemarks

MaterialSample | LivingSpecimen | PreservedSpecimen | FossilSpecimen: materialSampleID

Event | HumanObservation | MachineObservation: eventID | parentEventID | fieldNumber | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | samplingProtocol | sampleSizeValue | sampleSizeUnit | samplingEffort | fieldNotes | eventRemarks

Location: locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks

GeologicalContext: geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed

Identification: identificationID | identifiedBy | typeStatus | identificationQualifier | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks

Taxon: taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks

ResourceRelationship (Auxiliary Terms): resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks

MeasurementOrFact (Auxiliary Terms): measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks

Page 9: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DARWIN CORE ARCHIVE (DWC-A)

•  DwC-A publishes DwC records, including terms from DwC-A extensions.
•  Simple text-based format.
•  Zipped single-file archive.

occurrence.txt
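Since a DwC-A is a zipped bundle of plain text files, it can be opened with standard tools. The following is a minimal Python sketch, assuming a tab-delimited core file named occurrence.txt with one header line. The helper name `read_core_rows` and the sample records are invented for illustration and are not GBIF data.

```python
import csv
import io
import zipfile


def read_core_rows(archive_bytes, core_name="occurrence.txt"):
    """Return the rows of a tab-delimited core file stored inside a zipped archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(core_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


# Build a tiny illustrative archive in memory (made-up records).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "occurrence.txt",
        "occurrenceID\tscientificName\tcountryCode\n"
        "occ-1\tHepatica nobilis\tNO\n"
        "occ-2\tHepatica nobilis\tSE\n",
    )

rows = read_core_rows(buffer.getvalue())
print(rows[0]["scientificName"])
```

A real archive would also carry meta.xml and the metadata document; this sketch reads only the core data file.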

Page 10: GBIF BIFA mentoring, Day 2 Publish data, July 2016

STAR SCHEMA

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015

[Diagram: a central Core table linked to the extension tables Ext1 to Ext5; together with meta.xml and EML.xml they make up the DwC Archive.]
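The star schema is described by the archive's meta.xml descriptor, which names the core file and each extension file. The sketch below reads such a descriptor with Python's standard library. The sample descriptor is hand-written for illustration, and the function name `describe_star` is an assumption, not part of any GBIF tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive descriptors.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hand-written illustrative descriptor: one core, one extension.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" fieldsTerminatedBy="\\t" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact" fieldsTerminatedBy="\\t" ignoreHeaderLines="1">
    <files><location>measurementorfact.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
  </extension>
</archive>"""


def describe_star(meta_xml):
    """Return (core, extensions), each table described by its rowType and file name."""
    root = ET.fromstring(meta_xml)

    def table(element):
        return {
            "rowType": element.get("rowType"),
            "file": element.find("dwc:files/dwc:location", NS).text,
        }

    core = table(root.find("dwc:core", NS))
    extensions = [table(ext) for ext in root.findall("dwc:extension", NS)]
    return core, extensions


core, extensions = describe_star(META_XML)
print(core["file"], [ext["file"] for ext in extensions])
```

Each extension row points back to a core row through its coreid column, which is what makes the layout a star rather than a set of unrelated tables.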

Page 11: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Data types

Page 12: GBIF BIFA mentoring, Day 2 Publish data, July 2016

MAPPING CORES – DATA TYPES

Occurrence core

The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.). Updated in July 2015, 169 terms.

Taxon core

The category of information pertaining to taxonomic names, taxon name usages, or taxon concepts. Updated in April 2015, 43 terms.

Event core

The category of information pertaining to a sampling event. Issued 29 May 2015, 95 terms.

http://www.gbif.org/publishing-data/summary - http://www.gbif.org/newsroom/news/sample-based-data - http://rs.gbif.org/core/

Occurrences

Checklists (of taxon names)

Page 13: GBIF BIFA mentoring, Day 2 Publish data, July 2016

BIODIVERSITY DATA TYPES – SAMPLE DATA

Slide source: GB23 Nodes Madagascar October 2015 - http://www.gbif.org/newsroom/news/sample-based-data

Sample data

Release and first use of the Event core in March-October 2015

Page 14: GBIF BIFA mentoring, Day 2 Publish data, July 2016

EVENT CORE

“Monitoring biodiversity change often requires repeated measures at the same place. This extension will enable data holders publishing through the GBIF network to share population abundance data (including time series population data) and presence/absence data, and also to document the sampling protocol”.

Henrique Pereira, chair of GEO BON

Slide source: GB23 Nodes Madagascar October 2015

Page 15: GBIF BIFA mentoring, Day 2 Publish data, July 2016

EXTENSIONS

Darwin Core does not provide terms for every possible type of data.

•  22 registered extensions (for Darwin Core Archive format)

Examples

•  Darwin Core Identification History
•  Darwin Core Measurement or Facts
•  Audubon Media Description (aka Audubon Core)
•  Darwin Core extension for germplasm genebanks

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015 - http://rs.gbif.org/extension/ - http://tools.gbif.org/dwca-validator/extensions.do

Page 16: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DARWIN CORE ARCHIVE EXTENSIONS

• Global Names Architecture (GNA)
• Audubon Core (multimedia)
• Invasive species (GISIN)
• Genetic Resources (Germplasm)
• EOL species profile
• Taxonomic Concept Schema (TCS)
• Genomics Standards Consortium (GSC)
• Meta-genomics
• ABCD
• …

Page 17: GBIF BIFA mentoring, Day 2 Publish data, July 2016

CONTROLLED VALUE VOCABULARIES

• Country codes
• Language
• Basis of record
• Taxonomic rank
• Nomenclatural status
• Life form
• Life stage
• Geological time periods
   •  chronostratigraphy
   •  magnetostratigraphy
• Species interactions
   •  saproxylic interactions
   •  pollinators
• …
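A controlled vocabulary can be used to check values before publishing. The Python sketch below flags records whose basisOfRecord is not in an allowed set. The set shown is only an illustrative subset taken from the Darwin Core class names listed earlier in this deck, not the full registered vocabulary, and the function name `invalid_values` and the sample records are invented for illustration.

```python
# Illustrative subset of allowed values (assumption: not the complete vocabulary).
BASIS_OF_RECORD = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "MaterialSample",
    "HumanObservation",
    "MachineObservation",
}


def invalid_values(records, term, vocabulary):
    """Return (row number, value) pairs whose value for `term` is not in the vocabulary."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        value = record.get(term, "")
        if value not in vocabulary:
            problems.append((row_number, value))
    return problems


# Made-up records: the second one uses a free-text value.
records = [
    {"occurrenceID": "occ-1", "basisOfRecord": "PreservedSpecimen"},
    {"occurrenceID": "occ-2", "basisOfRecord": "specimen"},
    {"occurrenceID": "occ-3", "basisOfRecord": "HumanObservation"},
]

problems = invalid_values(records, "basisOfRecord", BASIS_OF_RECORD)
print(problems)
```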

Page 18: GBIF BIFA mentoring, Day 2 Publish data, July 2016

STAR SCHEMA EXAMPLE - OCCURRENCE

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015

[Diagram: Occurrence Core linked to the extensions Media, Geographical, Determination and Germplasm; together with meta.xml and EML.xml they make up the DwC Archive (Occurrence).]

Page 19: GBIF BIFA mentoring, Day 2 Publish data, July 2016

STAR SCHEMA EXAMPLE - CHECKLIST

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015

[Diagram: Taxon Core linked to the extensions Literature, Description, Occurrences, Vernacular, Distribution and Types; together with meta.xml and EML.xml they make up the DwC Archive (Checklist).]

Page 20: GBIF BIFA mentoring, Day 2 Publish data, July 2016

STAR SCHEMA EXAMPLE - EVENT

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015

[Diagram: Event Core linked to the extensions Occurrences and Measurement or Fact; together with meta.xml and EML.xml they make up the DwC Archive (Samples, Relevé).]
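With an Event core, occurrence rows hang off sampling events, which is what allows abundance to be followed over repeated visits. The Python sketch below assumes the occurrence rows link to events through eventID and that individualCount holds the abundance. All records and the function name `abundance_per_event` are invented for illustration.

```python
from collections import defaultdict

# Made-up sampling events (core rows).
events = [
    {"eventID": "ev-1", "eventDate": "2015-06-01", "samplingProtocol": "pitfall trap"},
    {"eventID": "ev-2", "eventDate": "2015-07-01", "samplingProtocol": "pitfall trap"},
]

# Made-up occurrence rows (extension rows), each pointing at its event.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "ev-1", "scientificName": "Species A", "individualCount": "4"},
    {"occurrenceID": "occ-2", "eventID": "ev-1", "scientificName": "Species B", "individualCount": "1"},
    {"occurrenceID": "occ-3", "eventID": "ev-2", "scientificName": "Species A", "individualCount": "2"},
]


def abundance_per_event(events, occurrences):
    """Sum individualCount per event and return the totals keyed by eventDate."""
    totals = defaultdict(int)
    for occurrence in occurrences:
        totals[occurrence["eventID"]] += int(occurrence["individualCount"])
    # Events without any occurrence rows get a total of 0.
    return {event["eventDate"]: totals[event["eventID"]] for event in events}


series = abundance_per_event(events, occurrences)
print(series)
```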

Page 21: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Publish your biodiversity data with GBIF

Page 22: GBIF BIFA mentoring, Day 2 Publish data, July 2016

GETTING STARTED GBIF GUIDELINES

GBIF Secretariat (2015). Getting Started: An overview of data publishing in the GBIF network, version 1.1. Copenhagen: Global Biodiversity Information Facility, 17 pp. ISBN: 87-92020-28-3 (for version 1.0).
http://www.gbif.org/resource/80635
http://www.gbif.org/publishing-data/summary

Page 23: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Data publishing guidelines

http://www.gbif.org/resources?f[0]=gr_purpose%3A955

Page 24: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLICATION IN GBIF

Conversion to standardized format (Darwin Core, ABCD)

Quality assessment & clearance/endorsement

Publication in GBIF

Data accessed by scientists and other users

Photo CC-BY Dag Endresen, Oslo, June 2014

Page 25: GBIF BIFA mentoring, Day 2 Publish data, July 2016

1.   The first step to start publishing data in GBIF is to seek endorsement as a data node/publisher.

2.   Endorsement requests are forwarded to the appropriate GBIF member node, or evaluated by the GBIF Secretariat.

Page 26: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Institute (AHP reserve)

Biodiversity Conservation

GBIF portal

Global information systems

Scientific Research

MULTIPLE-PURPOSE DATA SERVICES

ACB Portal

Page 27: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Institute (AHP reserve)

Biodiversity Conservation

GBIF portal

Global information systems

Scientific Research

DATA PUBLISHING CLEARANCE AT ACB?

ACB Portal

Page 28: GBIF BIFA mentoring, Day 2 Publish data, July 2016

PLANT GENETIC RESOURCES NETWORK MODEL

•  Each dataset is shared from the holding gene bank.

•  The National Inventory (NI), National Focal Person (NFP) endorses all national gene bank datasets for EURISCO.

•  ECPGR Crop databases can access passport data from EURISCO and additional crop specific data from the gene bank IPT interface.

•  Standard data sharing tools ensure that the gene bank dataset is available to other relevant decentralized thematic, regional or global networks.

Illustration from the GBIF annual report 2009, page 47.

Page 29: GBIF BIFA mentoring, Day 2 Publish data, July 2016

PUBLISH DATA IN GBIF

Step 1: data-holding research institutes seek endorsement as approved data publishers.

Step 2: datasets are identified and converted to the standard Darwin Core format.

Step 3: datasets can be published directly from the data node and/or with assistance from a national GBIF node.

Citizen science data platforms also publish in GBIF.

Page 30: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLISHING METHODS

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015

Page 31: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLISHING METHODS

Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015

Page 32: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLISHING TOOLKITS

DiGIR (PHP, 2001-2006)

TapirLink (PHP, 2007→)

BioCASE (Python, 2001→)

GBIF IPT (Java, 2009→)

Page 33: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLISHING SOFTWARE: SPREADSHEETS

Metadata Primary Biodiversity data Species Checklists

Slide source: iDigBio Florida January 2015

Page 34: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Citizen Science platforms publish biodiversity observations in GBIF.

Amateur naturalists and self-taught citizens can report their own species observations.

951,235 occurrences published in GBIF

212 mill. occurrences published in GBIF

28,990 occurrences published in GBIF

14,719 occurrences published in GBIF

16 mill. occurrences published in GBIF

42 mill. occurrences published in GBIF

and many more...

Page 35: GBIF BIFA mentoring, Day 2 Publish data, July 2016

GBIF Integrated data Publishing Toolkit

Page 36: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLISHING LANDSCAPE

Timeline axis: 2007 2008 2009 2010 2011 2012 2013 2014 2015

•  DiGIR (2001), BioCASE (2001), TapirLink (2007) in use for publishing biodiversity data
•  Idea for simple, compressed text-based file for publishing introduced at TDWG
•  GBIF introduces IPT 1.0
•  GBIF redevelops IPT with less memory requirements
•  GBIF introduces IPT 2.0
•  IPT more than 100 installations and serving more than 800 datasets
•  Nodes and aggregators (including GBIF Norway) begin to install and use IPTs
•  Demo/test Event core developed by GBIF and EU BON
•  Event core is released for use (October 2015).
•  Dataset DOIs with DataCite (March 2015).
•  IPT becomes the dominant data-publishing solution in GBIF.

Slide source: modified from GB23 Nodes Madagascar October 2015

Page 37: GBIF BIFA mentoring, Day 2 Publish data, July 2016

DATA PUBLISHING LANDSCAPE - STATISTICS

Slide source: GB23 Nodes Madagascar October 2015 - http://www.gbif.org/ipt/stats

Page 38: GBIF BIFA mentoring, Day 2 Publish data, July 2016

The GBIF Integrated data Publishing Toolkit (IPT) is a free open source software tool written in Java that is used to publish and share biodiversity datasets through the GBIF network.

http://www.gbif.org/ipt

IPT User Manual:

https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki

Robertson T, Döring M, Guralnick R, Bloom D, Wieczorek J, Braak K, Otegui J, Russell L, Desmet P (2014). The GBIF integrated publishing toolkit: Facilitating the efficient publishing of biodiversity data on the internet. PLoS One 9(8). doi:10.1371/journal.pone.0102623

Page 39: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Download data

Page 40: GBIF BIFA mentoring, Day 2 Publish data, July 2016

GBIF DATA PORTAL

Page 41: GBIF BIFA mentoring, Day 2 Publish data, July 2016
Page 42: GBIF BIFA mentoring, Day 2 Publish data, July 2016
Page 43: GBIF BIFA mentoring, Day 2 Publish data, July 2016

SPECIES SEARCH

Page 44: GBIF BIFA mentoring, Day 2 Publish data, July 2016

GBIF PORTAL – DOWNLOAD DATA

Before downloading species occurrence data from GBIF, please take the time to register: http://www.gbif.org/user/register

Downloads from the GBIF portal are packaged as a Darwin Core Archive (DwC-A): http://www.gbif.org/faq/datause

The species occurrence data are found in the “occurrence.txt” data file. This tab-delimited text file can be imported into a spreadsheet such as Excel or into a database.

NOTE: the data files can become very large, so check the file size before opening them in MS Excel.
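For large downloads, one option is to check the file size first and then stream the tab-delimited file row by row instead of opening it in a spreadsheet. The Python sketch below illustrates this on a small made-up file. The function name `summarise_occurrences` and the sample rows are assumptions for illustration only.

```python
import csv
import os
import tempfile
from collections import Counter


def summarise_occurrences(path, term="countryCode"):
    """Return the file size in bytes and the number of records per value of `term`."""
    size_bytes = os.path.getsize(path)
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        # Stream one row at a time so the whole file is never held in memory.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row[term]] += 1
    return size_bytes, counts


# Made-up sample standing in for a downloaded occurrence.txt.
sample = (
    "occurrenceID\tscientificName\tcountryCode\n"
    "occ-1\tHepatica nobilis\tNO\n"
    "occ-2\tHepatica nobilis\tNO\n"
    "occ-3\tHepatica nobilis\tSE\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "occurrence.txt")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(sample)
    size_bytes, counts = summarise_occurrences(path)

print(size_bytes, dict(counts))
```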

Page 45: GBIF BIFA mentoring, Day 2 Publish data, July 2016

Log in to find your current and previous downloads

Page 46: GBIF BIFA mentoring, Day 2 Publish data, July 2016

GBIF DATA PORTAL API

An interface to access data published through the GBIF network using web services.
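As a rough illustration of what a web service call looks like, the Python sketch below builds an occurrence search URL for the public GBIF API (version 1, at api.gbif.org) without sending any request. The taxon key 12345 is a hypothetical placeholder, not a real species key, and the helper name `occurrence_search_url` is an assumption for illustration.

```python
from urllib.parse import urlencode

API_BASE = "https://api.gbif.org/v1"


def occurrence_search_url(taxon_key, has_coordinate=True, limit=20, offset=0):
    """Build an occurrence search URL; paging is controlled by limit and offset."""
    params = {
        "taxonKey": taxon_key,
        "hasCoordinate": str(has_coordinate).lower(),
        "limit": limit,
        "offset": offset,
    }
    return f"{API_BASE}/occurrence/search?{urlencode(params)}"


# 12345 is a placeholder key used only to show the URL shape.
url = occurrence_search_url(12345, limit=100)
print(url)
```

The response to such a request is JSON, so the URL can be opened in a browser or fetched with any HTTP client.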

Page 47: GBIF BIFA mentoring, Day 2 Publish data, July 2016

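ROPENSCI: RGBIF

library(rgbif)
# Look up the GBIF backbone species key for the name
key <- name_backbone(name='Hepatica nobilis', kingdom='Plantae')$speciesKey
# Fetch up to 1000 georeferenced occurrence records for that species
sp <- occ_search(taxonKey=key, return='data', hasCoordinate=TRUE, limit=1000)
# Plot the records on a map
gbifmap(sp)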

Page 48: GBIF BIFA mentoring, Day 2 Publish data, July 2016