Page 1

Find Research Data

b2find.eudat.eu

www.eudat.eu

B2FIND Integration

How to publish metadata in EUDAT’s B2FIND catalogue

This work is licensed under the Creative Commons CC-BY 4.0 licence

Version 2, December 2015

Page 2

EUDAT: A truly pan-European Infrastructure

EUDAT offers common data services to both research communities and individuals through a network of 35 European organisations.

EUDAT enables European researchers from any discipline and any geographic location to preserve, find, access, and process data in a trusted environment.

• European infrastructures
• Technology providers
• Research communities

Page 3

Community-Driven Solutions

PHYSICAL SCIENCES & ENGINEERING

SOCIAL SCIENCES & HUMANITIES

MATERIALS & ANALYTICAL FACILITIES

ENVIRONMENTAL SCIENCES

MAPPER

BIOMEDICAL & MEDICAL SCIENCES

EUDAT services (the so-called B2 Service Suite) are designed, built and implemented based on user community requirements.

Page 4

B2 Service Suite

Page 5

What is B2FIND?

• B2FIND is based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other repositories.
• B2FIND provides a simple and user-friendly discovery service for metadata steadily harvested from a wide range of research communities.

Page 6

Why should you publish your metadata in B2FIND?

Make your research data:
• searchable, visible and accessible to the public
• popular across disciplines and internationally

• Improve interoperability and re-use of your data
• Allow feedback and annotations on your research output
• Benefit from validation, quality assurance and added value of your metadata

Page 7

Data from a huge selection of subjects

• B2FIND has a truly cross-community approach.
• Metadata is mapped and offered for a wide range of communities:
  – from climate research to social sciences
  – from biodiversity to linguistics
  – from archaeology to seismology
• Transformation and homogenisation of the catalogue allow the use of a common vocabulary.

Page 8

B2FIND communities

Initially, B2FIND comprises communities in the EUDAT registered domain of data, which provide well-described and stable metadata. EUDAT is extending the service to other interested and reliable data and metadata providers. The list of currently integrated communities is available at http://b2find.eudat.eu/group/

Page 9

What will be covered

• How to get your metadata published in B2FIND
• Metadata generation
• Metadata repository and provider
• Metadata harvesting
• Metadata formats (excerpt)
• Metadata mapping
• B2FIND MD schema (excerpt)
• Metadata validation
• Support requests
• Appendix: OAI-PMH - What it is and how it works

Page 10

How to get your metadata published in B2FIND? - The Metadata (MD) Ingestion Roadmap

Data Provider on community site:
1. MD Generation
2. MD Repository and Provider

Service Provider on EUDAT site:
3. MD Harvesting
4. MD Mapping and Validation
5. MD Uploading and Indexation

Page 11

Metadata Generation

Metadata generation:
• must be done in close proximity to the data production
• should be part of the data management plan
• must be checked and possibly enhanced to aim for a comprehensive data description
• benefits from quality control at an early stage
• should be based on common ontologies and metadata formats
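Metadata generated close to data production can already be written in a common format. As an illustration, the Python sketch below serialises a record as simple Dublin Core inside the `oai_dc` container used by OAI-PMH, and rejects element names outside the 15 simple Dublin Core elements as a cheap early quality check. The helper name `make_oai_dc` and the sample values are hypothetical; a production record would typically also carry an `xsi:schemaLocation` attribute.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The 15 elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_oai_dc(fields: dict[str, list[str]]) -> str:
    """Serialise {element name: [values]} as an oai_dc XML record (hypothetical helper)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            # Early quality control: catch non-DC field names at generation time.
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(make_oai_dc({"title": ["Station records 2014"],
                   "creator": ["Doe, Jane", "Roe, Rick"],
                   "date": ["2014-06-30"]}))
```

Repeated elements (here two creators) are simply emitted as repeated `dc:` elements, which is how Dublin Core expresses multiple values.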

Page 12

Metadata repository and provider

• To be set up on the community site to allow harvesting
• OAI-PMH is the preferred protocol (for a detailed description of the protocol and an installation guide for the data provider tool, see the Appendix)
• Other data transfer techniques are also supported, if necessary
• EUDAT offers support for the installation
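Before asking the B2FIND team for a first harvest, a community can check that its endpoint answers the OAI-PMH `Identify` verb. The sketch below parses an `Identify` response and extracts a few key fields. The repository name and URLs in `SAMPLE` are made up, and fetching the live response (the base URL with `?verb=Identify` appended) is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up Identify response from a hypothetical community repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-12-01T10:00:00Z</responseDate>
  <request verb="Identify">https://repo.example.org/oai</request>
  <Identify>
    <repositoryName>Example Community Repository</repositoryName>
    <baseURL>https://repo.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2010-01-01T00:00:00Z</earliestDatestamp>
    <deletedRecord>persistent</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>"""


def summarise_identify(xml_text: str) -> dict[str, str]:
    """Extract key repository facts from an OAI-PMH Identify response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ident = root.find("oai:Identify", NS)
    if ident is None:
        raise ValueError("no Identify element in response")
    keys = ("repositoryName", "baseURL", "protocolVersion",
            "earliestDatestamp", "deletedRecord", "granularity")
    # Missing optional fields come back as empty strings.
    return {k: ident.findtext(f"oai:{k}", default="", namespaces=NS) for k in keys}


print(summarise_identify(SAMPLE))
```

The `deletedRecord` value (`no`, `transient` or `persistent`) is the deletion-persistence level discussed under the OAI shortcomings in the appendix, and `granularity` tells a harvester how precise its `from`/`until` dates may be.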

Page 13

Metadata Harvesting

• B2FIND harvests regularly and incrementally from OAI endpoints.
• Initially, the B2FIND team will try a first harvest on a given, accessible OAI endpoint.
• The harvesting frequency and the harvested sets will be negotiated with the community.
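Incremental harvesting in OAI-PMH usually means sending `ListRecords` with a `from` date equal to the last successful harvest, and following `resumptionToken`s until the list is complete. The following is a minimal sketch of such a loop, not B2FIND's actual harvester. The `fetch` callable is an assumption introduced so the loop can be exercised without a network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url: str,
            fetch: Callable[[str], Union[str, bytes]],
            metadata_prefix: str = "oai_dc",
            from_date: Optional[str] = None,
            set_spec: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield <record> elements from ListRecords, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed since the last harvest
    if set_spec:
        params["set"] = set_spec    # restrict to a negotiated set
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing new since from_date
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        records = root.find(f"{OAI}ListRecords")
        if records is None:
            raise ValueError("response has neither ListRecords nor error")
        yield from records.findall(f"{OAI}record")
        token = records.find(f"{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # an absent or empty token marks the last page
        # resumptionToken is exclusive: send only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

With a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read()` (after `import urllib.request`). The `from` value would be the datestamp of the previous successful harvest, in the granularity the repository reports via `Identify`.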

Page 14

Metadata Formats (excerpt)

Dublin Core
• Specification: see http://dublincore.org/specifications/ and the following standard documents: IETF RFC 5013, ISO Standard 15836:2009, NISO Standard Z39.85
• Description: The Dublin Core Schema is a small set of vocabulary terms that can be used to describe web resources (video, images, web pages, etc.), as well as physical resources such as books or CDs, and objects like artworks. The full set of Dublin Core metadata terms can be found on the Dublin Core Metadata Initiative (DCMI) website (see the specification link above).
• Used by B2FIND to harvest from: DataCite, NARCIS, PanData, TheEuropeanLibrary, SDL, DARIAH, IVOA, PDC

ISO 19115
• Specification: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53798
• Description: ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata. It provides information about the identification, extent, quality, spatial and temporal aspects, content, spatial reference, portrayal, distribution and other properties of digital geographic data and services.
• Used by B2FIND to harvest from: ENES, Earlinet

MarcXML
• Specification: http://www.loc.gov/standards/marcxml/
• Description: MARC (MAchine-Readable Cataloging) standards are a set of digital formats for describing items catalogued by libraries, such as books. MARC was developed by Henriette Avram at the US Library of Congress during the 1960s to create records that can be used by computers and shared among libraries.
• Used by B2FIND to harvest from: B2SHARE, ALEPH

CMDI
• Specification: http://www.clarin.eu/content/component-metadata
• Description: CMDI (Component MetaData Infrastructure) was initiated by CLARIN to provide a framework to describe and reuse metadata blueprints. Description building blocks (“components”, which include field definitions) can be grouped into a ready-made description format (a “profile”).
• Used by B2FIND to harvest from: CLARIN

DDI
• Specification: http://www.ddialliance.org
• Description: DDI (Data Documentation Initiative) is an effort to create an international standard for describing data from the social, behavioural and economic sciences.
• Used by B2FIND to harvest from: CESSDA

Page 15

Metadata Mapping

The community-specific ‘raw’ metadata are processed and homogenised to the B2FIND schema in the following steps:
1. Parse the harvested XML records and select entries by metadata-format-specific XPath rules.
2. Analyse and parse the values and map them onto key-value pairs (JSON), checked against given controlled vocabularies.
3. Check and validate the resulting JSON records against the B2FIND schema.
4. Use (community-specific) ontologies and thesauri.
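The first two steps can be illustrated with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. In the sketch below, the rule table `DC_RULES`, the toy vocabulary `DISCIPLINE_CV` and the function `map_record` are hypothetical simplifications written for this example, not B2FIND's actual mapping rules. They take one harvested `oai_dc` record and produce key-value pairs ready to be serialised as JSON.

```python
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical mapping rules: B2FIND field -> XPath into an oai_dc record.
DC_RULES = {
    "Title": ".//dc:title",
    "Description": ".//dc:description",
    "Creator": ".//dc:creator",
    "Source": ".//dc:identifier",
    "Discipline": ".//dc:subject",
    "PublicationYear": ".//dc:date",
}

# Toy controlled vocabulary: community keyword -> discipline term (illustrative only).
DISCIPLINE_CV = {"climate": "Climatology", "climatology": "Climatology",
                 "linguistics": "Linguistics", "seismology": "Seismology"}

MULTI_VALUED = {"Creator", "Discipline"}  # 0-n in the schema excerpt


def map_record(record_xml: str) -> dict:
    record = ET.fromstring(record_xml)
    result = {}
    for field, xpath in DC_RULES.items():
        # Step 1: select entries with the format-specific XPath rule.
        values = [e.text.strip() for e in record.findall(xpath, NS)
                  if e.text and e.text.strip()]
        # Step 2: normalise values and check them against the vocabulary.
        if field == "Discipline":
            values = sorted({DISCIPLINE_CV[v.lower()] for v in values
                             if v.lower() in DISCIPLINE_CV})
        elif field == "Source":
            values = [v for v in values if v.startswith(("http://", "https://"))]
        elif field == "PublicationYear":
            values = [v[:4] for v in values if v[:4].isdigit()]
        if values:
            result[field] = values if field in MULTI_VALUED else values[0]
    return result
```

`json.dumps(map_record(xml_text))` then gives the JSON record that step 3 checks against the schema. Subjects that do not match the vocabulary are dropped rather than guessed.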

Page 16

B2FIND Metadata Schema (excerpt)

| Metadata type | B2FIND field name | Semantic definition | Allowed values / CV | Level of obligation | Occurrence |
|---|---|---|---|---|---|
| General information | Title | A name or title by which a resource is known | Free text | Mandatory | 1 |
| | Description | All additional textual information | CKAN 2.0 only supports plain text | Recommended | 1 |
| Data access | Source | URI of the related resource | Valid URL | Mandatory | 1 |
| | PID | Persistent Identifier | | Recommended | 1 |
| | DOI | Digital Object Identifier | | Recommended | 1 |
| Provenance data | Creator | List of the main researchers involved in producing the data | Text field (‘;’-separated list of cited names, separately indexed) | Recommended | 0-n |
| | Discipline | Field of research | Text field (mapped and validated against CV) | Recommended | 0-n |
| | Publisher | The person or institution that publishes the data | | | |
| | PublicationYear | The year when the data was or will be made public | YYYY | Recommended | 1 |
| Data coverage | TemporalCoverage | Relation to or coverage of a specific interval in time | Interval between two UTC date timestamps: [BeginDateTime, EndDateTime] | Optional | 1 |
| | SpatialCoverage | The spatial limits of a place | A spatial point or box specification; CKAN representation: spatial={"type":"Polygon","coordinates":[[[minlat,minlon…]]} | Optional | 1 |
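Some fields in the excerpt have a prescribed shape. The sketch below builds three of them in Python: a ‘;’-separated Creator string split into separately indexed names, a TemporalCoverage interval of two UTC timestamps, and a SpatialCoverage bounding box as a GeoJSON polygon string. The helper names, the exact timestamp pattern (`YYYY-MM-DDThh:mm:ssZ`) and the coordinate order are assumptions. GeoJSON (RFC 7946) puts longitude before latitude, whereas the CKAN representation in the excerpt lists `minlat,minlon`, so the order expected by the B2FIND CKAN instance should be checked.

```python
import json
from datetime import datetime, timezone


def split_creators(text: str) -> list[str]:
    """Split a ';'-separated Creator field into individual names."""
    return [name.strip() for name in text.split(";") if name.strip()]


def temporal_coverage(begin: datetime, end: datetime) -> list[str]:
    """Return [BeginDateTime, EndDateTime] as UTC timestamps."""
    if begin.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    if begin > end:
        raise ValueError("BeginDateTime is after EndDateTime")
    return [t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            for t in (begin, end)]


def bbox_polygon(min_lon: float, min_lat: float,
                 max_lon: float, max_lat: float) -> str:
    """Closed polygon ring for a bounding box, GeoJSON axis order (lon, lat)."""
    ring = [[min_lon, min_lat], [max_lon, min_lat], [max_lon, max_lat],
            [min_lon, max_lat], [min_lon, min_lat]]  # first point repeated to close
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


print(split_creators("Doe, Jane; Roe, Rick"))
print(temporal_coverage(datetime(2014, 1, 1, tzinfo=timezone.utc),
                        datetime(2014, 12, 31, 23, 59, 59, tzinfo=timezone.utc)))
print(bbox_polygon(5.0, 47.0, 15.0, 55.0))
```

Requiring timezone-aware datetimes avoids silently treating local times as UTC.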

Page 17

Metadata Validation

Check each field for coverage, consistency and validity:
• ‘Technical’ checks, e.g.:
  – check date-time values against the UTC format
  – check spatial coverage by geonames.org and the consistency of lat/lon coordinates
• Semantic mapping:
  – using controlled vocabularies
  – using ISO standards, e.g. the iso639 library for ‘Language’
• Online checks of links to the data objects (‘Source’, ‘PID’ and ‘DOI’)
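The ‘technical’ checks can be sketched as small Python functions over a mapped record. This is an illustration, not B2FIND's validator. The function names and the record layout (field names from the schema excerpt, TemporalCoverage as a list of two timestamps, SpatialCoverage as a GeoJSON polygon string in longitude-latitude order) are assumptions. The geonames.org lookup and the online link checks are left out because they need network access.

```python
import json
import re
from datetime import datetime
from urllib.parse import urlparse

UTC_PATTERN = "%Y-%m-%dT%H:%M:%SZ"


def is_utc_timestamp(value: str) -> bool:
    """Strict YYYY-MM-DDThh:mm:ssZ with a real calendar date and time."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", value):
        return False
    try:
        datetime.strptime(value, UTC_PATTERN)  # rejects e.g. month 13
    except ValueError:
        return False
    return True


def is_valid_url(value: str) -> bool:
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate(record: dict) -> list[str]:
    """Return the problems found in a mapped record (empty list = passed)."""
    problems = []
    for field in ("Title", "Source"):  # Mandatory in the schema excerpt
        if not record.get(field):
            problems.append(f"missing mandatory field {field}")
    if record.get("Source") and not is_valid_url(record["Source"]):
        problems.append("Source is not a valid URL")
    year = record.get("PublicationYear")
    if year is not None and not re.fullmatch(r"\d{4}", year):
        problems.append("PublicationYear is not YYYY")
    stamps = record.get("TemporalCoverage", [])
    for stamp in stamps:
        if not is_utc_timestamp(stamp):
            problems.append(f"not a UTC timestamp: {stamp}")
    # Same-format UTC strings compare chronologically.
    if len(stamps) == 2 and all(map(is_utc_timestamp, stamps)) and stamps[0] > stamps[1]:
        problems.append("TemporalCoverage begins after it ends")
    if "SpatialCoverage" in record:
        ring = json.loads(record["SpatialCoverage"])["coordinates"][0]
        for lon, lat in ring:
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"coordinates out of range: {lon}, {lat}")
        if ring[0] != ring[-1]:
            problems.append("polygon ring is not closed")
    return problems
```

Returning a list of problems instead of raising on the first one lets a data provider fix all issues of a record in one round.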

Page 18

Support requests

www.eudat.eu/support-request?service=B2FIND

Page 19

For more info: https://eudat.eu/services/b2find

User documentation: https://www.eudat.eu/services/userdoc/b2find-integration

Page 20

Appendix: OAI-PMH - What it is and how it works

OAI-PMH (http://www.openarchives.org)
• stands for Open Archives Initiative Protocol for Metadata Harvesting
• aims at world-wide consolidation of scholarly archives
• enables free access to the archives (at least to their metadata)
• is a low-barrier mechanism for repository interoperability
• consists of a set of six verbs (services) that are invoked within HTTP
• provides consistent interfaces for data and service providers
• allows effortless implementation
• is based only on a few simple protocols (HTTP, XML, DC)

Page 21

Data/Service Provider setup

Page 22

Basic functioning of OAI-PMH

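Data Provider: metadata (documents)

Service Provider (EUDAT metadata catalogue): metadata harvester and local metadata storage

• The metadata harvester sends requests (based on HTTP) to the data provider.
• The data provider returns metadata (encoded in XML).
• On top of its local metadata storage, the service provider offers services such as search, access, commenting, …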

Page 23

OAI benefits

• Interoperability: it is not domain-specific and is based on common metadata schemas.
• Widely used: it is a quasi-standard tool for providing metadata, with more than 2800 registered data providers (repositories) worldwide; see e.g. https://www.openarchives.org/Register/BrowseSites
• Simple to install: this appendix offers a guideline for the jOAI software. See also the list of tools implemented by members of the Open Archives Initiative community at https://www.openarchives.org/pmh/tools/tools.php
• Simple to use: the OAI attached great importance to the simplicity of the protocol.

Page 24

OAI shortcomings

• Inefficiency: XML serialisation and deserialisation take time.
• Reference clash issue: if two records happen to have the same ID value, the envelope is not valid XML.
• Persistence of deletion: OAI-PMH allows three levels of persistence for deleted records (no, transient, persistent), but most providers promise none.
• Lack of SSL: by a strict reading, the OAI-PMH standard supports only http:, not https:.

Page 25

Software for OAI-PMH

jOAI software (http://www.dlese.org/dds/services/joai_software.jsp)
• is a Java-based data provider and harvester tool
• is open source, from the Open Archives Initiative community
• runs in a servlet container such as Apache Tomcat
• enables existing systems, archives and databases to provide metadata via OAI-PMH and to harvest metadata to the file system

Page 26

Installation overview

To install and run the jOAI software you need:
• oai.war, the jOAI software
• Apache Tomcat v5.5.x or v6.x
• Java Standard Edition (SE) (or JDK) version 6

For details, see the OAI-PMH tutorial at http://www.oaiforum.org/tutorial/

Page 27

Data provider

Configuration and customisation can be done directly on the jOAI data provider site:
1. Set up and configure the data provider (Data Provider: Setup and status; Repository Information and Administration)
2. Add metadata by adding directories of files (Metadata Files Configuration: Add metadata directory)
3. (Re)index added or changed directories ..
4. (Optional) Set configuration, access control, …

Page 28

OAI-PMH Harvester – Verbs and Parameters

Verbs that specify the service being invoked:
• Identify - used to retrieve information about the repository.
• ListIdentifiers - used to retrieve record headers from the repository.
• ListRecords - used to harvest full records from the repository.
• ListSets - used to retrieve the set structure of the repository.
• ListMetadataFormats - used to list the metadata formats available from the repository.
• GetRecord - used to retrieve an individual record from the repository.

Selective harvesting by parameters:
• identifier - specifies a specific record identifier.
• metadataPrefix - specifies the metadata format of the returned records.
• set - specifies the set that returned records must belong to.
• from/until - returns records created, updated or deleted on or after / on or before this date.
• resumptionToken - a token to resume a request where it last left off.
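The verb and argument rules can be captured in a small request builder. The sketch below (the name `oai_url` is hypothetical) encodes which arguments each of the six verbs requires, allows, or takes exclusively in OAI-PMH 2.0. It refuses to build a request that a data provider would reject with `badVerb` or `badArgument`. Arguments are passed as a dict because `from` is a reserved word in Python.

```python
from typing import Optional
from urllib.parse import urlencode

# OAI-PMH 2.0 argument rules per verb: (required, optional, exclusive).
VERB_ARGS = {
    "Identify": (set(), set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}, set()),
    "ListSets": (set(), set(), {"resumptionToken"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, {"resumptionToken"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, {"resumptionToken"}),
}


def oai_url(base_url: str, verb: str, args: Optional[dict] = None) -> str:
    """Build an OAI-PMH GET request URL after checking the arguments."""
    args = args or {}
    if verb not in VERB_ARGS:
        raise ValueError(f"badVerb: {verb}")
    required, optional, exclusive = VERB_ARGS[verb]
    given = set(args)
    if given & exclusive:
        # An exclusive argument (resumptionToken) may not be combined with others.
        if len(given) > 1:
            raise ValueError(f"badArgument: {sorted(given & exclusive)} must be used alone")
    else:
        missing = required - given
        unknown = given - required - optional
        if missing:
            raise ValueError(f"badArgument: missing {sorted(missing)}")
        if unknown:
            raise ValueError(f"badArgument: not allowed {sorted(unknown)}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


print(oai_url("https://repo.example.org/oai", "ListRecords",
              {"metadataPrefix": "oai_dc", "set": "climate", "from": "2015-11-01"}))
```

Checking arguments on the harvester side surfaces mistakes before any request is sent. The data provider still performs its own checks and returns the corresponding OAI-PMH error codes.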

Page 29

An example of an OAI Provider and Harvester