Antabif training

Description: introduction presentation for the ANTABIF training.

Transcript of Antabif training

Page 1: Antabif training

ANTABIF Training: getting your data online

Bruno Danis, Anton Van de Putte and Nabil Youdjou

Wednesday 26 October 2011

Page 2: Antabif training

Objectives

• get familiar with ANTABIF

• learn about the architecture, functionalities, tools and standards we offer

• hands-on exercises with dummy and *real* data

• collect feedback on its fitness for use by this community

Page 3: Antabif training

On the Menu Today

• Background about ANTABIF

• Technical overview

• Standards, tools and resources

• Functionalities

• Future directions

• Hands-on

Page 4: Antabif training

Background

Page 5: Antabif training

Antarctic Treaty

« In order to promote international cooperation in scientific investigation in Antarctica, […], the Contracting Parties agree that, to the greatest extent feasible and practicable: […]

Scientific observations and results from Antarctica shall be exchanged and made freely available. »

Page 6: Antabif training

SCAR-MarBIN & ANTABIF

• www.scarmarbin.be

• www.antabif.be or www.biodiversity.aq

• Core funding: BELSPO.be

• International Polar Year 2007/08

• Census of Antarctic Marine Life

• Ocean Biogeographic Information System

• Global Biodiversity Information Facility

Page 7: Antabif training

General Philosophy

• Build an electronic ecosystem

• Offer free and open access to data and technology

• Expose all the (biodiversity) data and metadata, in multiple contexts

• Remain community-driven, and collaborative

• Adopt strong standardization

• Work for science, conservation, management

Page 8: Antabif training

Page 9: Antabif training

Achievements

• The first RAMS (Register of Antarctic Marine Species)

• Board of 60+ editors

• Feeds WoRMS, CoL and EoL

• 17,098 taxa (RAMS)

• Building a dynamic RAS (Register of Antarctic Species)

• 24,248 taxa (RAS)

Page 10: Antabif training

Achievements

• 1,288,441 records

• 198 datasets

• 5,235 taxa

• Feeds OBIS, GBIF

• Downloadable

• WebGIS

• Webservices

Page 11: Antabif training

Achievements

• Up since Oct 2005

• Open access

• 909,915 visitors

• 8,093,774 hits

• 51,416,196 downloaded records

• Citations: 183

• Cited Publications: 38

Page 12: Antabif training

Achievements

Records       SCAR-MarBIN (SMB)   ANTABIF     Progress (ANTABIF ÷ SMB)
Metadata      198                 7,200       ×36.4
Occurrence    1,288,441           2,659,392   ×2.1
Taxonomy      17,184              30,472      ×1.8
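Reading the Progress column as the ANTABIF count divided by the SCAR-MarBIN count (an interpretation; the slide does not say so explicitly), the figures can be reproduced with a few lines of Python:

```python
# Record counts from the "Achievements" table: (SCAR-MarBIN, ANTABIF).
COUNTS = {
    "Metadata": (198, 7_200),
    "Occurrence": (1_288_441, 2_659_392),
    "Taxonomy": (17_184, 30_472),
}


def progress(smb: int, antabif: int) -> float:
    """Growth factor of ANTABIF over SCAR-MarBIN, rounded to one decimal."""
    return round(antabif / smb, 1)


for category, (smb, antabif) in COUNTS.items():
    print(f"{category}: x{progress(smb, antabif)}")
# One line each: Metadata: x36.4, Occurrence: x2.1, Taxonomy: x1.8
```

The computed factors match the slide, which supports the ratio reading.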

Page 13: Antabif training

Nuts and Bolts

Page 14: Antabif training

100% Open Source

• Language: Ruby

• Framework: Rails (ActiveRecord) and YUI

• Search engine: (smart) full-text search (Elasticsearch/Lucene)

• Database/GIS server/spatial DB: PostgreSQL/GeoServer/PostGIS

• Mapping client: OpenLayers

• Web services: RESTish (all resources)

• Protocols/standards: DIF, DwC, DwC-A, TAPIR, etc.

• GBIF tools: HIT, IPT

• Hosting: BeBIF (ULB/VUB joint IT Center)

• Metadata systems: GCMD API (DIF)

Page 15: Antabif training

Data flow (your point of view)

Your data → standardize → DwC-A → upload → IPT → publish → ANTABIF

IPT → publish → Data Paper

Page 16: Antabif training

Data flow (our point of view)

Page 17: Antabif training

Standards, tools, resources

Page 18: Antabif training

Metadata

Information about datasets deteriorates over time!

Page 19: Antabif training

Metadata

• preferred metadata catalogue = Antarctic Master Directory (subset of GCMD)

• standard = DIF (Directory Interchange Format)

• used by the whole SCAR community

• crawled by Google, Scopus...

Page 20: Antabif training

DarwinCore

"A vocabulary of words that biologists, hackers, and citizen scientists use to broadly describe the biodiversity of life on earth."
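As an illustration, a single observation can be expressed as a set of Darwin Core term/value pairs. The term names below are standard Darwin Core terms; the values, the identifier and the helper `qualified_terms` are made up for this sketch.

```python
# One occurrence described with Darwin Core terms.
# Term names are real DwC terms; all values are dummy data.
record = {
    "occurrenceID": "urn:example:occ:0001",  # hypothetical identifier
    "basisOfRecord": "HumanObservation",
    "scientificName": "Euphausia superba Dana, 1850",
    "eventDate": "2011-01-15",               # ISO 8601 date
    "decimalLatitude": -62.2,
    "decimalLongitude": -58.9,
}

DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def qualified_terms(rec: dict) -> list[str]:
    """Return the full term URIs, the form used to map columns in a meta.xml."""
    return [DWC_NS + name for name in rec]


for uri in qualified_terms(record):
    print(uri)
```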

Page 21: Antabif training

DarwinCore Archive

• Complete package of data

–One file

–Multiple files

• Text Files…

• Self-documenting

• Intended to be shared/distributed

Page 22: Antabif training

DarwinCore Archive

The core data file is a text file.

Archives always have a ‘core’ data file

My_data.txt
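A minimal sketch of such a core data file, assuming a tab-delimited layout with Darwin Core term names in the header row (the rows are dummy data and both helper functions are hypothetical):

```python
import csv
import os
import tempfile

# Header row uses Darwin Core term names; the rows are dummy data.
FIELDS = ["occurrenceID", "scientificName", "decimalLatitude", "decimalLongitude"]
ROWS = [
    ["occ-1", "Euphausia superba", "-62.2", "-58.9"],
    ["occ-2", "Pleuragramma antarctica", "-77.5", "166.0"],
]


def write_core_file(path: str) -> None:
    """Write a tab-delimited core data file with one header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(FIELDS)
        writer.writerows(ROWS)


def read_core_file(path: str) -> list[dict]:
    """Read the core file back, one dict per row keyed by term name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


with tempfile.TemporaryDirectory() as tmp:
    core_path = os.path.join(tmp, "My_data.txt")
    write_core_file(core_path)
    records = read_core_file(core_path)

print(records[1]["scientificName"])  # Pleuragramma antarctica
```

Because it is plain text, the same file can be opened in a spreadsheet, a text editor or any scripting language.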

Page 24: Antabif training

DarwinCore Archive

meta.xml describes the mappings in the core data file (species.txt)

Darwin Core Archive (two files)
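The two-file archive can be sketched with the Python standard library. This is an illustration under stated assumptions, not the output of any GBIF tool: `species.txt` is a dummy checklist using the Darwin Core Taxon row type, and `build_meta_xml` and `build_archive` are hypothetical helpers.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TEXT = "http://rs.tdwg.org/dwc/text/"    # namespace of meta.xml elements
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"  # namespace of Darwin Core terms

# species.txt: a dummy one-row checklist, tab-delimited with one header line.
COLUMNS = ["taxonID", "scientificName", "taxonRank"]
SPECIES_TXT = "taxonID\tscientificName\ttaxonRank\n1\tEuphausia superba\tspecies\n"


def q(tag: str) -> str:
    """Qualify an element name with the DwC text namespace."""
    return f"{{{DWC_TEXT}}}{tag}"


def build_meta_xml(columns: list[str]) -> bytes:
    """Declare species.txt as the Taxon core; column i holds term columns[i]."""
    ET.register_namespace("", DWC_TEXT)  # serialize as xmlns="..." without a prefix
    archive = ET.Element(q("archive"))
    core = ET.SubElement(archive, q("core"), {
        "rowType": DWC_TERMS + "Taxon",
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as meta.xml expects
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(core, q("files"))
    ET.SubElement(files, q("location")).text = "species.txt"
    ET.SubElement(core, q("id"), {"index": "0"})  # taxonID identifies each row
    for index, name in enumerate(columns):
        ET.SubElement(core, q("field"), {"index": str(index), "term": DWC_TERMS + name})
    return ET.tostring(archive, encoding="utf-8", xml_declaration=True)


def build_archive() -> bytes:
    """Zip the core file and its meta.xml into an in-memory DwC-A."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("species.txt", SPECIES_TXT)
        zf.writestr("meta.xml", build_meta_xml(COLUMNS))
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_archive())) as zf:
    names = sorted(zf.namelist())
    meta = ET.fromstring(zf.read("meta.xml"))

print(names)                          # ['meta.xml', 'species.txt']
print(len(list(meta.iter(q("field")))))  # 3
```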

Page 25: Antabif training

DarwinCore Archive

Columns in extensions are mapped to Darwin Core using the meta.xml file

Multiple extensions are available
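To show how meta.xml ties the files together, the sketch below parses an illustrative meta.xml with one occurrence core and one multimedia extension and reports, per file, which column holds the row key and which term each column carries. The file names and the helper `column_mappings` are assumptions made for the example; `<coreid>` is the element an extension uses to point back to core records.

```python
import xml.etree.ElementTree as ET

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}

# Illustrative meta.xml: an occurrence core plus a multimedia extension file.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" fieldsTerminatedBy="\\t" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/Multimedia" fieldsTerminatedBy="\\t" ignoreHeaderLines="1">
    <files><location>multimedia.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://purl.org/dc/terms/identifier"/>
    <field index="2" term="http://purl.org/dc/terms/format"/>
  </extension>
</archive>"""


def column_mappings(meta_xml: str) -> dict[str, dict]:
    """For every data file, return its key column and {column index: term URI}."""
    root = ET.fromstring(meta_xml)
    result = {}
    for section in root.findall("dwca:core", NS) + root.findall("dwca:extension", NS):
        location = section.find("dwca:files/dwca:location", NS).text
        # Core rows are keyed by <id>; extension rows link back via <coreid>.
        key = section.find("dwca:id", NS)
        if key is None:
            key = section.find("dwca:coreid", NS)
        result[location] = {
            "key_column": int(key.get("index")),
            "fields": {int(f.get("index")): f.get("term")
                       for f in section.findall("dwca:field", NS)},
        }
    return result


for location, mapping in column_mappings(META_XML).items():
    print(location, mapping["key_column"], mapping["fields"])
```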

Page 26: Antabif training

DarwinCore Archive

http://rs.gbif.org/extension/

Many extensions are available

Page 27: Antabif training

Spreadsheet templates

• Metadata – describe a database or other data resource

• Species Occurrence – store basic species collections or observational data

• Species Checklists – record and store simple annotated species checklists

Page 28: Antabif training

Page 29: Antabif training

Page 30: Antabif training

Page 31: Antabif training

Page 32: Antabif training

Page 33: Antabif training

Spreadsheet processor

• Web application that converts an Excel spreadsheet into a DwC-A

• The Excel files contain data entry sheets and the GBIF metadata profile

• Worksheet supports publication of primary biodiversity data

• Processor performs data validation and transformation and returns a validated DwC-A
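The validate-and-transform step can be sketched roughly as follows. The checks are hypothetical (required columns, numeric coordinates within range, decimal commas converted to points), and, to stay within the standard library, the sketch reads a tab-delimited export instead of an Excel workbook.

```python
import csv
import io

# Hypothetical rules; the actual processor's checks are not listed on the slide.
REQUIRED = ("occurrenceID", "scientificName", "decimalLatitude", "decimalLongitude")


def to_float(value: str) -> float:
    """Parse a coordinate, accepting the decimal comma often typed in spreadsheets."""
    return float(value.strip().replace(",", "."))


def validate_rows(text: str) -> tuple[list[dict], list[str]]:
    """Return (clean rows, error messages) for a tab-delimited sheet export."""
    clean, errors = [], []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        try:
            lat = to_float(row["decimalLatitude"])
            lon = to_float(row["decimalLongitude"])
        except ValueError:
            errors.append(f"line {line_no}: coordinates are not numbers")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            errors.append(f"line {line_no}: coordinates out of range")
            continue
        # Transformation: store coordinates as numbers.
        clean.append({**row, "decimalLatitude": lat, "decimalLongitude": lon})
    return clean, errors


SHEET = (
    "occurrenceID\tscientificName\tdecimalLatitude\tdecimalLongitude\n"
    "occ-1\tEuphausia superba\t-62,2\t-58,9\n"
    "occ-2\t\t-70.0\t10.0\n"
    "occ-3\tSalpa thompsoni\t-95.0\t20.0\n"
)
rows, problems = validate_rows(SHEET)
print(rows[0]["decimalLatitude"])  # -62.2
print(problems)  # ['line 3: missing scientificName', 'line 4: coordinates out of range']
```

Reporting errors by line number, instead of stopping at the first one, lets a data provider fix the whole sheet in one pass.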

Page 34: Antabif training

Page 35: Antabif training

DwC-A validator

• tests Darwin Core Archives

• validates the content against the known extensions and terms registered within the GBIF network for sharing biodiversity data.
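A toy version of the term check follows. It assumes a deliberately small hard-coded list of known terms and row types in place of the GBIF registry, and `check_meta` is a hypothetical helper, not the validator's API.

```python
import xml.etree.ElementTree as ET

NS = "{http://rs.tdwg.org/dwc/text/}"

# Hypothetical stand-ins for the registry; a real validator uses the full
# set of terms and extensions registered in the GBIF network.
KNOWN_ROW_TYPES = {
    "http://rs.tdwg.org/dwc/terms/Occurrence",
    "http://rs.tdwg.org/dwc/terms/Taxon",
}
KNOWN_TERMS = {
    "http://rs.tdwg.org/dwc/terms/occurrenceID",
    "http://rs.tdwg.org/dwc/terms/scientificName",
    "http://rs.tdwg.org/dwc/terms/decimalLatitude",
    "http://rs.tdwg.org/dwc/terms/decimalLongitude",
}


def check_meta(meta_xml: str) -> list[str]:
    """Report row types and mapped terms in meta.xml that are not known."""
    root = ET.fromstring(meta_xml)
    issues = []
    for section in root.findall(NS + "core") + root.findall(NS + "extension"):
        row_type = section.get("rowType")
        if row_type not in KNOWN_ROW_TYPES:
            issues.append(f"unknown rowType: {row_type}")
        for field in section.findall(NS + "field"):
            term = field.get("term")
            if term not in KNOWN_TERMS:
                issues.append(f"unknown term: {term}")
    return issues


# The second field has a capitalisation typo; terms are case-sensitive.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimallatitude"/>
  </core>
</archive>"""

print(check_meta(META_XML))  # ['unknown term: http://rs.tdwg.org/dwc/terms/decimallatitude']
```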

Page 36: Antabif training

Page 37: Antabif training

IPT - Integrated Publishing Toolkit

• Publishing primary biodiversity data

• Resources, each with:

–Metadata

–Source Data (text, zip, SQL)

–Source Mappings

–Visibility

–Published Release
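The Source Mappings step amounts to declaring which Darwin Core term each source column corresponds to. A minimal sketch of that idea follows, with hypothetical column names and helper functions; it is not how the IPT itself is implemented.

```python
# Hypothetical mapping from a researcher's own column names to Darwin Core terms.
MAPPING = {
    "Sample": "occurrenceID",
    "Species": "scientificName",
    "Lat": "decimalLatitude",
    "Lon": "decimalLongitude",
    "Date": "eventDate",
}


def apply_mapping(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename mapped source columns to DwC terms; unmapped columns are left out."""
    return {term: row[column] for column, term in mapping.items() if column in row}


def unmapped_columns(row: dict[str, str], mapping: dict[str, str]) -> list[str]:
    """Source columns with no DwC term; this sketch simply drops them."""
    return [column for column in row if column not in mapping]


source_row = {
    "Sample": "ST-12", "Species": "Euphausia superba", "Lat": "-62.2",
    "Lon": "-58.9", "Date": "2011-01-15", "Notes": "net haul",
}
print(apply_mapping(source_row, MAPPING))
print(unmapped_columns(source_row, MAPPING))  # ['Notes']
```

Keeping the mapping separate from the data means the source file can stay in the provider's own format.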

Page 38: Antabif training

The Data Paper concept

• A scholarly journal publication whose primary purpose is to describe a dataset or group of datasets, rather than to report a research investigation.

• Benefits of the Data Paper

–Scholarly credit to Data Publishers

–Describe the data in a structured, human-readable form

–Bring the existence of the data to the attention of the scholarly community

Page 39: Antabif training

Data Paper: Incentivising Data Discovery

Page 40: Antabif training

Data Paper

Metadata document

Reward data publishing

Page 41: Antabif training

Step-by-Step

• Complete the metadata of a dataset using the metadata editor in IPT 2.0.2

• Generate the ‘Data Paper’ manuscript (menu: Manage Resource – RTF Download)

• Submit the manuscript for possible publication in one of the Pensoft journals (ZooKeys, PhytoKeys, BioRisks, NeoBiota)

• Revisions (if any) are carried out using the metadata editor in IPT 2.0.2, and the manuscript is re-submitted to the Pensoft Open Journal System

Page 42: Antabif training

Once the paper is accepted

• A Digital Object Identifier (DOI) is assigned to the Data Paper

• The paper is published in (a) print format, (b) PDF format and (c) semantically enhanced HTML, and (d) its XML is archived in PubMed Central

• The DOI of the Data Paper is linked with the persistent identifier of the metadata document in the GBIF Registry

• The Data Paper is indexed by Web of Knowledge (ISI), PubMed Central, Scopus, Zoological Record, Google Scholar, CAB Abstracts, the Directory of Open Access Journals (DOAJ) and EBSCO

Page 43: Antabif training

Important to consider

• The metadata is complete in all respects

• All claims are adequately substantiated

• The data described in the ‘Data Paper’ is freely available at the time the manuscript is submitted

Page 44: Antabif training

ORC

• GBIF’s Online Resource Center

• Provides access to documents, best practices, tools and links

• Wide thematic scope

• Different ways of accessing resources

• Enabling community contributions

• Different levels of resource access

• Multilanguage support

Page 45: Antabif training

Page 46: Antabif training

Functionalities

Page 47: Antabif training

www.biodiversity.aq

• general website

• latest news

• contact

• sponsors

• governance

• links

Page 48: Antabif training

www.biodiversity.aq

Page 49: Antabif training

data.biodiversity.aq

• find primary biodiversity data

• visualize occurrence data on map

• view taxonomic data

• download data

• view metrics

• send feedback

• access technical documentation

Page 50: Antabif training

data.biodiversity.aq

Page 51: Antabif training

ipt.biodiversity.aq

• prepare and clean your data

• publish primary biodiversity data

• publish metadata

• push data and metadata to ANTABIF & GBIF

• get a Data Paper

Page 52: Antabif training

ipt.biodiversity.aq

Page 53: Antabif training

afg.biodiversity.aq

• (nice-looking) Identification aid

• Publication/sharing platform for customized Field Guides

• High quality (useful) pictures

• Expert Descriptions

• Built dynamically from various sources

Page 54: Antabif training

afg.biodiversity.aq

Page 55: Antabif training

share.biodiversity.aq

• download shared resources

• reports, communication material

• original datasets, tools, resources

Page 56: Antabif training

share.biodiversity.aq

Page 57: Antabif training

PIC (Polar Information Commons)

• polarcommons.org

• Emergency solution for orphan datasets

• Setup of a commons

• IT cloud

• Set of norms

• All polar data (IPY)

• Simple procedure!

Page 58: Antabif training

www.polarcommons.org

Page 59: Antabif training

Future directions

Page 60: Antabif training

Architecture

• A network of IPTs

• Enhanced data flow

• Community involved in data management

• Enhanced interoperability

• Optimization of research efforts/resources

• Integrative, connected science

• Factual, adaptive conservation

Page 61: Antabif training

Challenges ahead

• Data intensive science

• Data deluge

• Digital divides

• Other data types and integration

• Orphan datasets

• Cultural change

Page 62: Antabif training

Hands-on now

Page 63: Antabif training

The rest of the day

• Using the portals

• Using data tools:

–templates

–data validation

–documentation

–publishing

Page 64: Antabif training

http://share.biodiversity.aq/training/
