GESIS workshop, Bonn 2011 www.pangaea.de Publishing Scientific Data – the Role of the Digital Object Identifier Michael Diepenbroek PANGAEA / WDC-MARE


Page 2

DOI - Operational System

• International DOI Foundation (IDF) launched 1998
• Currently used by c. 4,000 naming authorities
  – assigners, e.g. 3,000 publishers, data repositories, EU
• Documents, science data sets, etc.
• ~43 million DOI names assigned to date
• ~60 million DOI resolutions per month
• Well established in professional information sector
  – best known applications are CrossRef (www.crossref.org) and DataCite (www.datacite.org)
• Draft International Standard (ISO TC46)

Page 3

DOI – Operational System

• Use of identifier syntax and network resolution mechanism (Handle System®)
• Persistence ensured through combination of
  – improved handle infrastructure (registry database, proxy support)
  – social infrastructure (obligations by Registration Agencies - RA)
• Use of a semantically interoperable data model and grouping mechanisms
  – multiple resolution, data typing, “Application Profiles”
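The identifier syntax and proxy-based resolution named above can be illustrated with a minimal Python sketch. It assumes the public proxy at https://doi.org/ (the slides only mention "proxy support") and uses a placeholder DOI whose suffix is invented for illustration; the helper names `split_doi` and `resolver_url` are hypothetical.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy; the slides do not name a specific proxy URL.
DOI_PROXY = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.strip().partition("/")
    # DOI prefixes start with the directory indicator "10."
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the proxy URL that redirects to the object's current location."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URL path; keep "/" as is.
    return DOI_PROXY + prefix + "/" + quote(suffix, safe="/")


# Placeholder DOI: the suffix is made up and does not refer to a real data set.
EXAMPLE_DOI = "10.1594/PANGAEA.000000"
print(split_doi(EXAMPLE_DOI))
print(resolver_url(EXAMPLE_DOI))
```

The point of the sketch is that the identifier itself never changes; only the location stored behind the proxy is updated, which is what makes the name persistent.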

Page 4

DOI - Organisation

[Organisation chart, labels as shown on the slide:]
• International DOI Foundation
• members
• Operating Federation
• Registration Agencies (RA)
• Clients

Page 5

DOI – Business Model

• IDF receives membership fees from RAs, contracts technical operator

• RAs also pay operational fees to IDF’s technical operator for registering and maintaining DOI names (sliding scale per volume)

• Assigners are customers of RAs
• RAs might have their own existing numbering scheme
• RAs are autonomous independent bodies. They offer services to assigners using DOI names
  – RAs’ business model with their customers is entirely autonomous
  – RAs’ only obligation to IDF is a licence/operating agreement

Page 6

DOI system – added value

• DOI is a brand
• DOI resolving infrastructure
• Offers the opportunity to build added-value services
• Strong linkage with academic publishing!
• DataCite as DOI registry for scientific data
  – Organised as an international association of libraries
  – Developed by ICSU World Data Centers and Services (German cluster) & the Technical Information Library in Hannover (TIB)
• Will be adopted by ICSU World Data System (associated member of DataCite)

Page 7

ICSU World Data Centers (WDC)
Geophysical Year 1957

• Nuclear Radiation: Tokyo, Japan
• WDC Co-ordination Offices: Washington DC, USA; Beijing, China
• Meteorology: Asheville NC, USA; Beijing, China; Obninsk, Russia
• Oceanography: Obninsk, Russia; Silver Spring MD, USA; Tianjin, China
• Paleoclimatology: Boulder CO, USA
• Marine Geology and Geophysics: Boulder CO, USA; Moscow, Russia
• Remotely Sensed Land Data: Sioux Falls SD, USA
• Renewable Resources and Environment: Beijing, China
• Recent Crustal Movements: Ondrejov, Czech Republic
• Airglow: Mitaka, Japan
• Astronomy: Beijing, China
• Atmospheric Trace Gases: Oak Ridge TN, USA
• Aurora: Tokyo, Japan
• Cosmic Rays: Toyokawa, Japan
• Geology: Beijing, China
• Human Interactions in the Environment: Palisades NY, USA
• Ionosphere: Tokyo, Japan
• Earth Tides: Brussels, Belgium
• Geomagnetism: Copenhagen, Denmark; Edinburgh, UK; Kyoto, Japan; Colaba, India
• Glaciology: Boulder CO, USA; Cambridge, UK; Lanzhou, China
• Marine Environmental Sciences: Germany (2001)
• Rotation of the Earth: Obninsk, Russia; Washington DC, USA
• Satellite Information: Greenbelt MD, USA
• Rockets and Satellites: Obninsk, Russia
• Seismology: Denver CO, USA; Beijing, China
• Solar Radio Emission: Nagano, Japan
• Space Science: Beijing, China
• Space Science Satellites: Kanagawa, Japan
• Solar Activity: Meudon, France
• Soils: Wageningen, The Netherlands
• Sunspot Index: Brussels, Belgium
• Solar Terrestrial Physics: Boulder CO, USA; Didcot Oxon, UK; Moscow, Russia; Haymarket, Australia
• Solid Earth Geophysics: Beijing, China; Boulder CO, USA; Moscow, Russia

Page 8

Initial position of WDS

Contra
• Insufficient funding (of course)
• Organisation and quality of data services are not consistent
• IT development is fast – no time for legacies
• Fragmentation of efforts

Pro
• Long-standing experience, know-how & motivation
• Good context with science
• Open access for all data resources
• As a whole, a very large global data management capacity
• Trans-disciplinary!

Slide annotations (Michael Diepenbroek):
– From the user perspective, data are not reliable
– In particular: data centers are big data sinks; you put something in but never get something out
– Not only technical handling and administration of data; most data centers have a clear scientific background and correspondingly skilled staff
Page 9

ICSU WDS - Roles & relations in a federated system

• Publishers: commercial, open access (e.g. ESSD journal), cross-referencing
• Data Collection & Processing Facilities: QA/QC, data products, also data rescue
• Data Archiving & Publication Facilities: certified repositories
• Related Networks & Programs: GEOSS, GMES, WMO-IS, IOC etc.
• Metadata & Data Services: web portals, catalogues
• Visualisation & Analysis: compute systems, virtual labs, GIS systems
• Research Institutions: universities, research institutes
• Research Projects / Programs: national, EU, international
• Libraries & Service Providers: DOI registry, interdisciplinary catalogues
• Research Facilities: satellites, vessels, observatories, alert systems etc.
• Education & Outreach
• Scientific Communities & Other Stakeholders

Page 10

WDS implementation

[Component diagram; labels recoverable from the slide:]
• Nodes: DataCite, WDC (several), Publisher (several), WDS Service Platform, Thomson Reuters, OCLC, Google Scholar, CrossRef
• Components: DOI registry, catalogue, data archive, editorial system, journal archive, service ticket system, data portal, portal, bibliometrics, certification
• Interfaces: REST, SOAP/REST, OAI-PMH, SOAP

Page 11

Why do we need publishing systems for scientific data?

• Good data availability fosters large-scale & complex science approaches.
• “Data recycling” is more effective than re-production.
• General data availability is low compared to data production.
• Available data are often not usable because the quality cannot be estimated.
• Prerequisite for the verification of scientific results.
• Benefit to data producers (publications = science currency)

Page 12

Data publishing - prerequisites and current status

• DOI assigners (agents) need to be certified
  – Certification agency (CA)
• following the OECD principles and guidelines for access to research data (2007)
• peer-review procedures
• citability
  – persistent identifiers (DOI)
  – ICSU SCOR & CODATA working groups on data citation
  – Science Citation Index -> Thomson Reuters Web of Knowledge
• Metadata/data standards & protocols
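To make the citability point concrete, here is a small Python sketch that renders a data set record as a citation string ending in its DOI link. The field layout is one plausible pattern and not a format prescribed by the slides; the `DatasetRecord` type, the `format_citation` helper and every value in the example record are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal citation metadata for a published data set (illustrative only)."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(rec: DatasetRecord) -> str:
    """Render 'Creators (Year): Title. Publisher, DOI link'."""
    authors = "; ".join(rec.creators)
    return (
        f"{authors} ({rec.year}): {rec.title}. "
        f"{rec.publisher}, https://doi.org/{rec.doi}"
    )


# Placeholder record: names, title and DOI suffix are invented for illustration.
example = DatasetRecord(
    creators=["Doe, J", "Roe, R"],
    year=2011,
    title="Example sediment core measurements",
    publisher="PANGAEA",
    doi="10.1594/PANGAEA.000000",
)
print(format_citation(example))
```

Because the DOI is part of the citation itself, a reference list entry for a data set can be resolved and counted in the same way as one for a journal article.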

Page 13

Data publishing - metadata

[Architecture diagram; labels recoverable from the slide:]
• Core system: PANGAEA, data management & long-term archiving, RDB, XSLT, index, marshaller, PANGAEA web frontend
• Metadata standards and formats: Dublin Core, STD-DOI, ISO 19115, DIF, Darwin Core, ISO 690, gml, kml
• Protocols and services: WS (SOAP/WSDL), OAI-PMH, OGC catalogue service, Geoserver (OGC), DIGIR, harvesters
• Frontends / portals, catalogues and other consumers: Elsevier, GeoPortal.Bund®, TIB Library (DOI registration, DOI registry, catalogues), Google, OCLC, Thomson Reuters, EUR-OCEANS, CARBOOCEAN, IODP, OBIS, GBIF, D-GRID, WDS
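OAI-PMH and Dublin Core both appear among the labels on this slide. As a rough illustration of what a harvester does, the Python sketch below builds a `ListRecords` request URL and extracts Dublin Core fields from a response. The base URL, the sample XML and the function names are assumptions made for this example; nothing here is taken from the actual PANGAEA service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hand-written sample response; record content is invented for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data set</dc:title>
          <dc:identifier>doi:10.1594/PANGAEA.000000</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text: str) -> list[dict[str, str]]:
    """Return the OAI identifier, title and dc:identifier of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "identifier": rec.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    return records


# Hypothetical endpoint; a real harvester would fetch this URL over HTTP.
print(list_records_url("https://example.org/oai"))
print(extract_records(SAMPLE_RESPONSE))
```

The sketch parses a local string rather than calling a server, so it runs offline; a production harvester would also need to follow resumption tokens for large result sets.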

Page 14

Data publishing - prerequisites and current status

• Collaborations with science publishers (Elsevier, Springer, Wiley, Oxford, AGU etc.)
  – data journals (ESSD)
  – cross-referencing supplementary data with traditional publications
  – published data as embedded content for traditional publications
  – combined peer-review between data archive and journal

Page 15

Data publishing - peer-review

[Workflow diagram with two lanes, DATA ARCHIVE and JOURNAL; labels recoverable from the slide:]
• Roles: author / data originator, data curator, editor, reviewers
• Steps: prepare article & related data sets; submit data sets; technical review; archive data sets; send DOI; submit article; peer review (incl. data); publish article; publish data sets
• Decisions: "accepted?" (yes / no) after technical review and after peer review
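The step labels in this workflow can be strung together as a small Python simulation. The ordering of the steps, and the choice to send the author back to the preparation step after any rejection, are assumptions read off the extracted labels and not something the slide text states; `run_workflow` is a hypothetical helper.

```python
from typing import Iterable


def run_workflow(technical_decisions: Iterable[bool],
                 peer_decisions: Iterable[bool]) -> list[str]:
    """Trace the combined data-archive / journal review until publication.

    Each iterable supplies successive "accepted?" outcomes. Missing outcomes
    are treated as accepted so that the loop always terminates.
    """
    tech = iter(technical_decisions)
    peer = iter(peer_decisions)
    trace: list[str] = []
    while True:
        # Author / data originator side.
        trace.append("prepare article & related data sets")
        trace.append("submit data sets")
        # Data archive side: the data curator checks the submission.
        trace.append("technical review")
        if not next(tech, True):
            continue  # assumed: rejection sends the author back to preparation
        trace += ["archive data sets", "send DOI", "submit article"]
        # Journal side: editor and reviewers assess article and data together.
        trace.append("peer review (incl. data)")
        if not next(peer, True):
            continue  # assumed: rejection sends the author back to preparation
        trace += ["publish article", "publish data sets"]
        return trace


# One technical rejection, then everything is accepted.
for step in run_workflow([False, True], [True]):
    print(step)
```

Whatever the exact ordering, the key design point on the slide is that the data set receives its DOI before the article is submitted, so reviewers can assess the data along with the manuscript.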

Page 16

Data publishing – cross-referencing

Page 17

Thank you!