ANTABIF Training: getting your data online
Bruno Danis, Anton Van de Putte and Nabil Youdjou
Wednesday 26 October 11
Objectives
• familiarize yourself with ANTABIF
• learn about the architecture, functionalities, tools and standards we offer
• hands-on exercises with dummy and *real* data
• collect feedback on ANTABIF's fitness for use by this community
On the Menu Today
• Background about ANTABIF
• Technical overview
• Standards, tools and resources
• Functionalities
• Future directions
• Hands-on
Background
Antarctic Treaty
« In order to promote international cooperation in scientific investigation in Antarctica, […], the Contracting Parties agree that, to the greatest extent feasible and practicable: […]
Scientific observations and results from Antarctica shall be exchanged and made freely available. »
SCAR-MarBIN & ANTABIF
• www.scarmarbin.be
• www.antabif.be or www.biodiversity.aq
• Core funding: BELSPO.be
• International Polar Year 2007/08
• Census of Antarctic Marine Life
• Ocean Biogeographic Information System
• Global Biodiversity Information Facility
General Philosophy
• Build an electronic ecosystem
• Offer free and open access to data and technology
• Expose all the (biodiversity) data and metadata, in multiple contexts
• Remain community-driven and collaborative
• Adopt strong standardization
• Work for science, conservation and management
Achievements
• The first RAMS (Register of Antarctic Marine Species)
• Board of 60+ editors
• Feeds WoRMS, CoL and EoL
• 17,098 taxa (RAMS)
• Building a dynamic RAS (Register of Antarctic Species)
• 24,248 taxa (RAS)
Achievements
• 1,288,441 records
• 198 datasets
• 5,235 taxa
• Feeds OBIS, GBIF
• Downloadable
• WebGIS
• Webservices
Achievements
• Up since Oct 2005
• open access
• 909,915 visitors
• 8,093,774 hits
• 51,416,196 downloaded records
• Citations: 183
• Cited Publications: 38
Achievements
Records       SCAR-MarBIN (SMB)   ANTABIF      Progress (×)
Metadata      198                 7,200        36.4
Occurrence    1,288,441           2,659,392    2.1
Taxonomy      17,184              30,472       1.8
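The Progress column appears to be the ratio of the ANTABIF count to the SCAR-MarBIN count (e.g. 7,200 / 198 ≈ 36.4). This reading is our interpretation, not stated on the slide; a quick check of it:

```python
# Record counts from the table above: (SCAR-MarBIN, ANTABIF).
COUNTS = {
    "Metadata": (198, 7_200),
    "Occurrence": (1_288_441, 2_659_392),
    "Taxonomy": (17_184, 30_472),
}


def progress_factor(before: int, after: int) -> float:
    """Growth factor from `before` to `after`, rounded to one decimal."""
    return round(after / before, 1)


for kind, (smb, antabif) in COUNTS.items():
    print(f"{kind:<11} x{progress_factor(smb, antabif)}")
```

All three factors (36.4, 2.1 and 1.8) match the slide, which supports this reading of the column.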
Nuts and Bolts
100% Open Source
• Language: Ruby
• Framework: Rails (ActiveRecord) and YUI
• (Smart) search engine: full text (Elasticsearch/Lucene)
• Database / GIS server / spatial DB: PostgreSQL / GeoServer / PostGIS
• Mapping client: OpenLayers
• Web services: RESTish (all resources)
• Protocols/Standards: DIF, DwC, DwC-A, TAPIR, etc.
• GBIF tools: HIT, IPT
• Hosting: BeBIF (ULB/VUB joint IT Center)
• Metadata systems: GCMD API (DIF)
Data flow (your point of view)
Your data → [standardize] → DwC-A → [upload] → IPT → [publish] → ANTABIF
IPT → [publish] → Data Paper
Data flow (our point of view)
Standards, tools, resources
Metadata
Information about datasets deteriorates over time!
Metadata
• preferred metadata catalogue: Antarctic Master Directory (a subset of the Global Change Master Directory, GCMD)
• standard: DIF (Directory Interchange Format)
• used by the whole SCAR community
• crawled by Google, Scopus...
DarwinCore
"A vocabulary of words that biologists, hackers, and citizen scientists use to broadly describe the biodiversity of life on earth."
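In practice, each Darwin Core term has a short name (such as scientificName) that is qualified by the namespace http://rs.tdwg.org/dwc/terms/. A minimal sketch of one occurrence record described with Darwin Core terms; the identifier and values are made up for illustration:

```python
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# One hypothetical occurrence record, keyed by Darwin Core term names.
record = {
    "occurrenceID": "urn:example:occurrence:0001",  # made-up identifier
    "basisOfRecord": "HumanObservation",
    "scientificName": "Euphausia superba",
    "eventDate": "2011-01-15",                     # ISO 8601 date
    "decimalLatitude": -62.5,                      # negative = south
    "decimalLongitude": -58.0,                     # negative = west
}


def qualify(rec: dict) -> dict:
    """Replace each short term name with its full Darwin Core term URI."""
    return {DWC_NS + term: value for term, value in rec.items()}


for uri, value in qualify(record).items():
    print(uri, "=", value)
```

Using the shared vocabulary, rather than local column names, is what lets different datasets be merged and searched together.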
DarwinCore Archive
• Complete package of data, as either:
– one file
– multiple files
• Text files
• Self-documenting
• Intended to be shared and distributed
DarwinCore Archive
Archives always have a ‘core’ data file.
The core data file is a text file (e.g. My_data.txt).
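A sketch of what such a core data file can look like: occurrence rows written as tab-delimited text, with Darwin Core term names in a single header row. The column selection and the sample row are hypothetical, and tab delimiting is one common choice rather than a requirement:

```python
import csv
import io

# Header row: Darwin Core term names (hypothetical column selection).
FIELDS = ["occurrenceID", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude"]


def core_file_text(rows: list[dict]) -> str:
    """Render rows as a tab-delimited core data file with one header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"occurrenceID": "occ-1", "scientificName": "Euphausia superba",
     "eventDate": "2011-01-15", "decimalLatitude": -62.5,
     "decimalLongitude": -58.0},
]
print(core_file_text(rows))
```

To save it as My_data.txt, write the returned string to that file with UTF-8 encoding.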
DarwinCore Archive
meta.xml describes the mappings in thecore data file (species.txt)
Darwin Core Archive (two files)
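A minimal sketch of the two-file archive, following the layout described in the Darwin Core text guide: an archive root in the http://rs.tdwg.org/dwc/text/ namespace, a core element with a rowType, the file location, an id column and one field element per mapped column. We assume a species checklist core (rowType Taxon) because the slide's file is species.txt; the column names are hypothetical:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(core_file: str, columns: list[str]) -> str:
    """Describe a tab-delimited core file whose column i holds term columns[i]."""
    archive = ET.Element("archive", xmlns=TEXT_NS)
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # written literally as \t in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": DWC_NS + "Taxon",  # assumption: a species checklist core
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    ET.SubElement(core, "id", index="0")  # column 0 identifies each row
    for i, term in enumerate(columns):
        ET.SubElement(core, "field", index=str(i), term=DWC_NS + term)
    return ET.tostring(archive, encoding="unicode")


def write_archive(target, core_file: str, core_text: str,
                  columns: list[str]) -> None:
    """Zip the core data file and its meta.xml into one archive."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(core_file, core_text)
        zf.writestr("meta.xml", build_meta_xml(core_file, columns))


buf = io.BytesIO()  # a path such as "checklist.zip" works too
write_archive(buf, "species.txt",
              "taxonID\tscientificName\n1\tEuphausia superba\n",
              ["taxonID", "scientificName"])
print(zipfile.ZipFile(buf).namelist())
```

Because meta.xml carries the mapping, the data file can keep whatever column order the provider prefers.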
DarwinCore Archive
Columns in extensions are mapped to Darwin Core using the meta.xml file
Multiple extensions are available
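In meta.xml, an extension section points back at the core through a coreid element, which names the extension column holding the core record's id. A sketch that reads a hypothetical checklist core with one vernacular-name extension (using the GBIF vernacular-name row type) and attaches the extension rows to their core records. File contents are inlined as made-up strings so the example is self-contained:

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://rs.tdwg.org/dwc/text/"}

# A hypothetical meta.xml: a checklist core plus one vernacular-name extension.
META = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon"
        fieldsTerminatedBy="\t" ignoreHeaderLines="1">
    <files><location>species.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/VernacularName"
             fieldsTerminatedBy="\t" ignoreHeaderLines="1">
    <files><location>vernacular.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
  </extension>
</archive>"""

# Made-up file contents, keyed by the <location> names above.
FILES = {
    "species.txt": "taxonID\tscientificName\n"
                   "1\tEuphausia superba\n2\tPleuragramma antarctica\n",
    "vernacular.txt": "taxonID\tvernacularName\n"
                      "1\tAntarctic krill\n2\tAntarctic silverfish\n1\tkrill\n",
}


def read_rows(section):
    """Return the data rows of a core/extension file, minus header lines."""
    name = section.find("t:files/t:location", NS).text
    skip = int(section.get("ignoreHeaderLines", "0"))
    # This sketch assumes tab-delimited files without quoted fields.
    return [line.split("\t") for line in FILES[name].splitlines()[skip:]]


def field_names(section):
    """Map column index -> short term name using the <field> mappings."""
    return {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
            for f in section.findall("t:field", NS)}


def load_with_extension(meta_xml):
    root = ET.fromstring(meta_xml)
    core = root.find("t:core", NS)
    ext = root.find("t:extension", NS)
    id_col = int(core.find("t:id", NS).get("index"))
    coreid_col = int(ext.find("t:coreid", NS).get("index"))
    core_fields, ext_fields = field_names(core), field_names(ext)

    records = {}
    for row in read_rows(core):
        rec = {term: row[i] for i, term in core_fields.items()}
        rec["extensions"] = []
        records[row[id_col]] = rec
    for row in read_rows(ext):
        # <coreid> names the extension column that points at the core id.
        records[row[coreid_col]]["extensions"].append(
            {term: row[i] for i, term in ext_fields.items()})
    return records


print(load_with_extension(META)["1"])
```

One core record can have many extension rows (taxon 1 has two vernacular names here), which is why extensions live in separate files rather than extra core columns.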
DarwinCore Archive
http://rs.gbif.org/extension/
Many extensions are available
Spreadsheet templates
• Metadata: describe a database or other data resource
• Species Occurrence: store basic species collection or observation data
• Species Checklists: record and store simple annotated species checklists
Spreadsheet processor
• Web application that converts an Excel spreadsheet into a DwC-A
• The Excel files contain data-entry sheets and the GBIF metadata profile
• The worksheet supports the publication of primary biodiversity data
• The processor validates and transforms the data and returns a validated DwC-A
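The validate-then-transform step can be sketched as follows. This is not the GBIF processor's code or rule set: we assume the sheet has been exported to CSV (reading .xlsx directly needs a third-party library), and the required columns and checks are hypothetical examples:

```python
import csv
import io

# Hypothetical set of columns this sketch treats as mandatory.
REQUIRED = ["occurrenceID", "scientificName", "eventDate",
            "decimalLatitude", "decimalLongitude"]


def process_sheet(csv_text: str):
    """Validate a sheet exported as CSV; return (tab-delimited core, errors)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return None, [f"missing column(s): {', '.join(missing)}"]

    good, errors = [], []
    # Line numbers assume one record per physical line; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        problems = [f"empty {c}" for c in REQUIRED if not (row[c] or "").strip()]
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except (TypeError, ValueError):
            problems.append("non-numeric coordinates")
        else:
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                problems.append("coordinates out of range")
        if problems:
            errors.append(f"line {line_no}: {'; '.join(problems)}")
        else:
            good.append(row)

    # Transformation step: keep only the required columns, tab-delimited.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=REQUIRED, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(good)
    return out.getvalue(), errors


sheet = ("occurrenceID,scientificName,eventDate,decimalLatitude,decimalLongitude\n"
         "occ-1,Euphausia superba,2011-01-15,-62.5,-58.0\n"
         "occ-2,Euphausia superba,2011-01-16,-162.5,-58.0\n")
core, errors = process_sheet(sheet)
print(errors)
```

Reporting errors by line number, rather than stopping at the first one, lets the provider fix the whole sheet in one pass.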
DwC-A validator
• tests Darwin Core Archives
• validates the content against the known extensions and terms registered within the GBIF network for sharing biodiversity data.
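The core of such a check is comparing every row type and term declared in meta.xml against a registry of known ones. A sketch with a deliberately tiny, hypothetical stand-in for the registry (the real validator consults the terms and extensions registered in the GBIF network):

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://rs.tdwg.org/dwc/text/"}
DWC = "http://rs.tdwg.org/dwc/terms/"

# Tiny hypothetical stand-in for the registry the real validator consults.
KNOWN_ROW_TYPES = {DWC + "Occurrence", DWC + "Taxon"}
KNOWN_TERMS = {DWC + t for t in (
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "taxonID", "taxonRank",
)}


def check_meta(meta_xml: str) -> list[str]:
    """List row types and field terms in meta.xml that are not known."""
    root = ET.fromstring(meta_xml)
    problems = []
    for section in root.findall("t:core", NS) + root.findall("t:extension", NS):
        row_type = section.get("rowType")
        if row_type not in KNOWN_ROW_TYPES:
            problems.append(f"unknown rowType: {row_type}")
        for fld in section.findall("t:field", NS):
            if fld.get("term") not in KNOWN_TERMS:
                problems.append(
                    f"unknown term at index {fld.get('index')}: {fld.get('term')}")
    return problems


META = ('<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">'
        '<files><location>occurrence.txt</location></files><id index="0"/>'
        '<field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>'
        '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificname"/>'
        '</core></archive>')
print(check_meta(META))  # flags the mis-cased "scientificname"
```

Term names are case-sensitive, so a mapping like scientificname silently fails to match scientificName unless something checks it.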
IPT - Integrated Publishing Toolkit
• Publishing primary biodiversity data
• Resources, each with:
– Metadata
– Source data (text, zip, SQL)
– Source mappings
– Visibility
– Published release
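To make the parts of a resource concrete, here is a hypothetical Python data model grouping them; it is not the IPT's own code. We assume three visibility states (private, public, registered), a version number that grows with each published release, and a rule that only a public, already published resource can be registered:

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    REGISTERED = "registered"


@dataclass
class Resource:
    """Hypothetical model of one IPT resource, grouping the parts above."""
    short_name: str
    metadata: dict = field(default_factory=dict)  # title, abstract, contacts
    sources: dict = field(default_factory=dict)   # name -> "text" | "zip" | "sql"
    mappings: dict = field(default_factory=dict)  # source column -> DwC term
    visibility: Visibility = Visibility.PRIVATE
    version: int = 0                              # 0 = never published

    def publish(self) -> int:
        """Cut a new published release and return its version number."""
        if not self.mappings:
            raise ValueError("map at least one source column before publishing")
        self.version += 1
        return self.version

    def register(self) -> None:
        """Assumed rule: only a public resource with a release can register."""
        if self.visibility is not Visibility.PUBLIC or self.version == 0:
            raise ValueError("publish and make the resource public first")
        self.visibility = Visibility.REGISTERED


res = Resource("krill-survey",
               sources={"survey.txt": "text"},
               mappings={"lat": "decimalLatitude", "lon": "decimalLongitude"})
res.publish()
res.visibility = Visibility.PUBLIC
res.register()
print(res.short_name, res.version, res.visibility.value)
```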
The Data Paper concept
• A scholarly journal publication whose primary purpose is to describe a dataset or group of datasets, rather than to report a research investigation.
• Benefits of the Data Paper
–Scholarly credit to Data Publishers
–Describe the data in structured human readable form
–Bring the existence of the data to the attention of the scholarly community
Data Paper: Incentivising Data Discovery
Data Paper / Metadata document
Reward data publishing
Step-by-Step
• Complete the metadata of a dataset using the metadata editor in IPT 2.0.2
• Generate the ‘Data Paper’ manuscript (menu: Manage Resource – RTF Download)
• Submit the manuscript for possible publication in one of the Pensoft journals (ZooKeys, PhytoKeys, BioRisks, NeoBiota)
• Revisions (if any) are made using the metadata editor in IPT 2.0.2, and the manuscript is re-submitted to the Pensoft Open Journal System
Once the paper is accepted
• A Digital Object Identifier (DOI) is assigned to the Data Paper
• The paper is published in (a) print format, (b) PDF format, (c) semantically enhanced HTML, and (d) XML, which is archived in PubMed Central
• The DOI of the Data Paper is linked with the persistent identifier of the metadata document in the GBIF Registry
• The Data Paper is indexed by Web of Knowledge (ISI), PubMed Central, Scopus, Zoological Record, Google Scholar, CAB Abstracts, the Directory of Open Access Journals (DOAJ) and EBSCO
Important to consider
• The metadata is complete in all respects
• All claims are adequately substantiated
• The data described in the ‘Data Paper’ are freely available when the manuscript is submitted
ORC
• GBIF’s Online Resource Center
• Provides access to documents, best practices, tools and links
• Wide thematic scope
• Different ways of accessing resources
• Enabling community contributions
• Different levels of resource access
• Multilanguage support
Functionalities
www.biodiversity.aq
• general website
• latest news
• contact
• sponsors
• governance
• links
data.biodiversity.aq
• find primary biodiversity data
• visualize occurrence data on map
• view taxonomic data
• download data
• view metrics
• send feedback
• access technical documentation
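Downloaded occurrence data often need a quick geographic filter before analysis. A sketch that keeps only records south of 60°S, the area covered by the Antarctic Treaty; we assume the download is tab-delimited text with a Darwin Core decimalLatitude column, which is an assumption about the export format:

```python
import csv
import io


def south_of_60(tsv_text: str) -> list[dict]:
    """Keep rows whose decimalLatitude lies south of 60°S."""
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        try:
            lat = float(row["decimalLatitude"])
        except (TypeError, ValueError):
            continue  # skip rows without a usable latitude
        if lat < -60.0:
            kept.append(row)
    return kept


download = ("scientificName\tdecimalLatitude\tdecimalLongitude\n"
            "Euphausia superba\t-62.5\t-58.0\n"
            "Euphausia superba\t-45.0\t-60.0\n"
            "Euphausia superba\t\t-58.0\n")
print([r["decimalLatitude"] for r in south_of_60(download)])
```

Rows with a missing latitude are dropped silently here; for a real analysis you may prefer to count or report them.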
data.biodiversity.aq
ipt.biodiversity.aq
• prepare and clean your data
• publish primary biodiversity data
• publish metadata
• push data and metadata to ANTABIF & GBIF
• get a Data Paper
ipt.biodiversity.aq
afg.biodiversity.aq
• (nice-looking) Identification aid
• Publication/sharing platform for customized Field Guides
• High quality (useful) pictures
• Expert Descriptions
• Built dynamically from various sources
afg.biodiversity.aq
share.biodiversity.aq
• download shared resources
• reports, communication material
• original datasets, tools, resources
share.biodiversity.aq
PIC (Polar Information Commons)
• polarcommons.org
• Emergency solution for orphan datasets
• Set-up of a commons
• IT cloud
• Set of norms
• All polar data (IPY)
• Simple procedure!
www.polarcommons.org
Future directions
Architecture
• A network of IPTs
• Enhanced data flow
• Community involved in data management
• Enhanced interoperability
• Optimization of research efforts/resources
• Integrative, connected science
• Factual, adaptive conservation
Challenges ahead
• Data intensive science
• Data deluge
• Digital divides
• Other data types and integration
• Orphan datasets
• Cultural change
Hands-on now
The rest of the day
• Using the portals
• Using data tools:
– templates
– data validation
– documentation
– publishing
http://share.biodiversity.aq/training/