PANGAEA Archiving and Publication of Scholarly Data for the Long Tail of Science


Page 1: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

PANGAEA: Archiving and Publication of Scholarly Data for the Long Tail of Science

Michael Diepenbroek

Page 2: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

What is PANGAEA?
• Information system for long-term archiving and publication of data from earth & environmental sciences (since 1993)
• Accredited by the World Meteorological Organization (WMO) as "World Radiation Monitoring Center" (WRMC) (since 2007)
• Accredited by the International Council for Science (ICSU) as World Data Center "Publisher for Earth & Environmental Science" (since 2001)

Page 3: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

PANGAEA - contents

[Figure: down-core profiles versus age (kyr; max. 233.55 kyr) for cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1. Parameters per core: IRD (grav/10 cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand), Smect (%/clay).]

[Figure: map covering 54°0' to 55°30' and 11° to 15°. Legend: World vector shore line; Grain size class KOLP A; Grain size class KOEHN2; Grain size class KOEHN; Geochemistry; Grain size class KOLP B; Grain size class KOLP DIN; 20 m. Scale: 1:2695194 at Latitude 0°. Source: Baltic Sea Research Institute, Warnemünde.]

• Integral part of science
  – More than 160 European to international projects since 1995 (www.pangaea.de/projects)
• Highly heterogeneous & dynamic
• Multidisciplinary

Hydrosphere, Lithosphere, Atmosphere, Cryosphere

Total number of data sets: ~350,000. Data volume: <2 PB. Increase: ~5% per year.

Page 4: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

PANGAEA - technical architecture

[Diagram labels]
• Editorial System
• Sybase ASE (RDB)
• Sybase IQ (warehouse), IQ interface
• Harddisk + tape (silo)
• Middleware
• Webserver
• PANGAEA search engine
• Various services
• Ticket System
• Curators
• Users

Page 5: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

PANGAEA - interoperability

• Portals: CARBOOCEAN, EUR-OCEANS, IODP - SEDIS, ICSU WDS portal, ESONET/EMSO
• Broker function: GBIF, OBIS
• Sensor webs: ESONET/EMSO, Statoil
• Conform to global standards: ISO19xxx, OGC, W3C, OAI

Page 6: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

PANGAEA - interoperability

[Diagram labels, grouped]
• Core: PANGAEA (data management & long-term archiving), RDB, Index, XSLT, marshaller, protocols
• Standards and schemas: Dublin Core, STD-DOI, ISO19115, ISO690, DIF, Darwin Core, INSPIRE, gml, kml
• Protocols and services: OAI-PMH, OGC CSW, Geoserver (OGC), WS (SOAP/WSDL), DIGIR
• DOI: DataCite, DOI registration, DOI registry
• Catalogues and harvesters: Dublin Core harvester, ISO19115 harvester, Google, OCLC, Thomson Reuters, Elsevier, Scopus …, PubMed, OpenAire, GEOSS, OBIS, GBIF
• Frontends / portals: PANGAEA web frontend, EUR-OCEANS, CARBOOCEAN, IODP, ICSU WDS
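As an illustration of the harvesting route named on this slide (OAI-PMH with Dublin Core), here is a minimal sketch of what a harvester does: build a `ListRecords` request and read titles out of the response. The base URL, the sample record, and the function names are assumptions made for this example; they are not taken from the slides, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


# Hand-written sample response; identifiers and title are invented.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example sediment core data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return the OAI identifier and Dublin Core title of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = rec.findtext(f".//{{{DC_NS}}}title")
        records.append({"identifier": ident, "title": title})
    return records


if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that part is left out of this sketch.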

Page 8: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

The Long Tail of Data

[Figure axes: Fitness of use; Total volume of scientific data]

• Professionally managed & published data: large-scale monitoring & computed data & disciplinary data centers
• Unmanaged & non-public data: data from individual scientists, labs, or smaller projects
• Unmanaged open access data

Page 9: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Publishing data with PANGAEA

[Diagram labels: file formats DOC, PDF, CSV, NetCDF, TXT, XML, XLSX, XLS, GRIB; "Data Set" (repeated nine times)]

• Citable & persistent (DOI)
• CC-BY License
• Quality data: QA/QC -> review procedures
• Efficient usage: (meta)data & interoperability standards (machine readable)
• FITNESS OF USE!

OECD principles and guidelines for access to research data (2007)
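To make "citable & persistent (DOI)" concrete, the following is a minimal sketch of turning data set metadata into a citation string with a resolvable DOI link. The record fields, the citation layout, and the DOI value are assumptions for illustration only; the slides do not specify PANGAEA's citation format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSetRecord:
    """Hypothetical minimal metadata for a published data set."""
    authors: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, e.g. "10.1234/example.0001" (placeholder, not a real DOI)


def doi_url(doi: str) -> str:
    """Turn a bare DOI (optionally prefixed with 'doi:') into a resolver URL."""
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return "https://doi.org/" + doi


def format_citation(rec: DataSetRecord) -> str:
    """Assumed layout: Authors (Year): Title. Publisher, DOI link."""
    authors = "; ".join(rec.authors)
    return f"{authors} ({rec.year}): {rec.title}. {rec.publisher}, {doi_url(rec.doi)}"


if __name__ == "__main__":
    example = DataSetRecord(
        authors=["Doe, J", "Roe, R"],
        year=2014,
        title="Example sediment core data set",
        publisher="Example Data Archive",
        doi="doi:10.1234/example.0001",
    )
    print(format_citation(example))
```

Because the DOI resolves through a registry rather than pointing at a fixed server address, the citation stays valid even if the archive moves the landing page.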

Page 10: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Data publication - citability

[Figure labels: time axis; Data; Article; Article + Data]

Page 11: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Publishing workflow - synchronized

[Flowchart labels]
• Lanes: DATA ARCHIVE, JOURNAL
• Roles: author / data originator; data curator; editor; reviewers
• Steps: prepare article & related data sets; submit data sets; technical review; accepted? (yes/no); archive data sets; send DOI; submit article; peer review (incl. data); accepted? (yes/no); publish article; publish data sets
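One way to read the flowchart on this slide is as a linear sequence with two acceptance gates. The sketch below encodes that reading. The step order and the "returned to author" outcomes are assumptions inferred from the labels, not something the slide states explicitly.

```python
def run_workflow(technical_review_ok: bool, peer_review_ok: bool) -> list:
    """Return the steps taken under one assumed reading of the flowchart."""
    steps = [
        "prepare article & related data sets",
        "submit data sets",
        "technical review",  # performed by the data curator
    ]
    if not technical_review_ok:
        # Assumed outcome of the "no" branch at the first gate.
        steps.append("data sets returned to author")
        return steps

    steps += [
        "archive data sets",
        "send DOI",  # the author can now cite the data in the article
        "submit article",
        "peer review (incl. data)",
    ]
    if not peer_review_ok:
        # Assumed outcome of the "no" branch at the second gate.
        steps.append("article returned to author")
        return steps

    # Article and data sets are published together ("synchronized").
    steps += ["publish article", "publish data sets"]
    return steps


if __name__ == "__main__":
    for step in run_workflow(technical_review_ok=True, peer_review_ok=True):
        print(step)
```

The point of the synchronization is visible in the ordering: the DOI is issued before the article is submitted, so reviewers can inspect the data, while the data sets only become public once the article is accepted.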

Page 12: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Impact on citation rates

35% to 69% more citations!

Courtesy of Jon Sears (AGU)

Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

Page 13: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Collaboration between data centers & science journals

• Linking editorial workflows
• Linking services

Page 14: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Data Publishing – Cross-referencing

Page 15: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Data Publishing – Cross-referencing

Page 17: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Linking infrastructure

[Diagram labels: Publishers; Bibliometrics; Catalogues; Data archive (repeated five times)]

Page 18: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

ICSU WDS perspective

Certified Data Archives

Registries

Bibliometric Services

Catalogues

Web of Knowledge, Google Scholar, Scopus

Thomson Reuters Citation Indexes

Crossref, DataCite, ORCID, CrossData

Journals

ICSU WDS

Page 19: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

WDS Certification & accreditation

• Trustworthiness of WDS data holders and service providers
• Evaluation criteria: based on a compilation of international standards and best practices
• Certification authority: WDS Scientific Committee
• 2014/03: 75 members

Page 20: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

WDS/RDA WGs and IGs

[Figure axes: Fitness of use; Total volume of scientific data. Labels: e-Infrastructures; Scientific research projects]

• Publishing workflows
• Publishing services
• Incentives (bibliometrics)
• Trusted repositories & services
• Cost compensation models

Page 21: PANGAEA Archiving  and  Publication  of  Scholarly Data  for the Long Tail of Science

Some conclusions

• Publishing data benefits providers and has a significant impact on data quality.
• "Fitness of use" is an important aspect of data quality and a prerequisite for integrating data from different sources.
• Certification is key to evaluating the quality of services and data.
• Scalable services are needed to embed data publications into the current scholarly publishing system.