Pilot Implementation: Publication and Citation of Scientific Primary Data

Result of CODATA WG, supported by DFG

Jan Brase, Learning Lab Lower Saxony, Uni. Hannover
Michael Lautenschlager, WDC for Climate, Model and Data / Max-Planck-Institute for Meteorology

ERPANET WS, Cork, Ireland, 17+18.06.04
IDF Member's Meeting, London, 22.06.04
J.Brase (L3S) + M.Lautenschlager (WDCC) / 18.06.04 / 2

Roots

CODATA (1) National Committee initiated the WG, grant-aided by DFG

Working period: September 2001 to May 2002

Result: Final report "Konzept zur Zitierfähigkeit wissenschaftlicher Primärdaten" or "Conception of Citing Scientific Primary Data", Hannover, 29.05.2002

Continuation: Two-year project for pilot implementation, funded by DFG, starting in October 2003

(1) CODATA: Committee on Data for Science and Technology

Northern Hemisphere temperature response for scenario IS92a

NH mean temperature anomaly relative to the 1961–1990 mean of the IPCC DDC greenhouse-gas-only experiments

ECHAM4 / 1: T = 0.7°C

ECHAM4 / 2: T = 2.5°C

ECHAM4 / 3: T = 4.3°C

Each curve is connected with approx. 1 TB of data (numbers)

ECHAM4 / 1: Temperature 2000

-8°C to -12°C

ECHAM4/OPYC greenhouse gas only according to IS92a

Corresponding to point 1 in NH temperature anomaly: CO2 = 370 ppmv

ECHAM4 / 2: Temperature 2050

ECHAM4/OPYC greenhouse gas only according to IS92a

Corresponding to point 2 in NH temperature anomaly: CO2 = 500 ppmv

-4°C to -8°C

ECHAM4 / 3: Temperature anomaly 2099

ECHAM4/OPYC greenhouse gas only according to IS92a

Corresponding to point 3 in NH temperature anomaly: CO2 = 690 ppmv

0°C to -4°C

Problem and Solution

Shortcomings in data provision and interdisciplinary use:
• Rules of good scientific practice are not taken into account in all cases.
• Data sources are widely unknown.
• Data are archived without context.
• Data cannot be cited as independent entities.

Method of solution: publication of primary data as independent entities
• Persistent identifier with global resolving mechanism for data archive and context referencing (scientific data model at archive level)
• Integration into library catalogues in order to find data together with articles
• STD-DOI application profile: metadata kernel + items for electronic publication (interface between scientific data archives and libraries)

Credits in Science

"Citation Index": Scientific efficiency is "measured" by publications.
• Extra work for data publication is currently not acknowledged: data processing, context documentation, quality assurance.

Recommendation: Data publications should be included in the standard scientific "Citation Index".
• Motivation of the individual scientist
• Connection between person and primary dataset

Citable data publications
• support the rules of good scientific practice,
• encourage interdisciplinary data utilisation,
• make data searchable in library catalogues together with articles,
• close the gap between scientific literature and related data sources.

Metadata for primary data 1

No. | Attribute      | Example
1.  | DOI            | 10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM
2.  | identifier     | URN:TIB:10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM
3.  | creator        | Monika Esch (Author)
4.  | publisher      | WDCC, World Data Center for Climate
5.  | title          | Climate Projection for the next Century calculated by the Global Climate Model ECHAM4-OPYC using the SRES B2 IPCC Scenario
6.  | language       | en
7.  | StructuralType | Digital
8.  | mode           | Abstract
9.  | resourceType   | Dataset

Metadata for primary data 2

No.     | Attribute                | Example
10.-12. | registration information | 10.1594 (RA) / 1 (issue no.) / 2004-07-18 (issue date)
13.     | creationDate             | 2001-12-31
14.     | publicationDate          | 2004-07-18
15.     | description              | These data represent results from the ECHAM4/OPYC climate model running the SRES-B2 scenario. The database tables contain monthly mean time series of ……
16.     | publicationPlace         | Hamburg
17.     | size                     | 614190228 Bytes
18.     | format                   | GRIB
19.     | edition                  | 1
20.     | relatedDOIs              | (none)

Criteria for Persistent Identifier Allocation

Critical points are securing of data quality and stable connection between identifier and data entity

Allocation is restricted to syntax control and completeness, i.e. expert data description and long-term archiving

Scientific quality assurance is expected by the author and will be reviewed during the allocation process.

Published primary data, like published articles, cannot be changed.

Stable connection between identifier reference and data entity as well as long-term availability of the primary data are essential and must be ensured (e.g. ICSU WDCs)
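
The syntax and completeness control named above could look roughly like the following sketch. It is an illustration only, not the TIB's actual procedure: the set of required attributes (taken from the metadata tables and the ISO 690-2 items mentioned in the slides) and the DOI pattern are assumptions.

```python
import re

# Hypothetical required set: attribute names from the metadata tables that
# correspond to the citation items named in the slides.
REQUIRED = ("DOI", "creator", "title", "publisher", "language",
            "publicationDate", "publicationPlace", "size", "edition")

# Assumed DOI shape: "10.", a numeric registrant code, "/", a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d+/\S+$")


def check_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing: {name}" for name in REQUIRED if not record.get(name)]
    doi = record.get("DOI", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"malformed DOI: {doi}")
    return problems


# Example values copied from the metadata tables in the slides.
record = {
    "DOI": "10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM",
    "creator": "Monika Esch",
    "title": "Climate Projection for the next Century calculated by the "
             "Global Climate Model ECHAM4-OPYC using the SRES B2 IPCC Scenario",
    "publisher": "WDCC, World Data Center for Climate",
    "language": "en",
    "publicationDate": "2004-07-18",
    "publicationPlace": "Hamburg",
    "size": "614190228 Bytes",
    "edition": "1",
}
print(check_record(record))
```

A check of this kind only confirms that the description is complete and well formed; it says nothing about the scientific quality of the data.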

DOI and URN

DOI (Digital Object Identifier)             | URN (Uniform Resource Name)
Non-profit, but membership fee              | Presently free of charge
Extended metadata support                   | Basic technical metadata
Infrastructure of registration agencies     | Anybody can register URN namespaces
Global resolving mechanism                  | Resolving at community level

DFG Project "Publication and Citation of Scientific Primary Data"

[Diagram of the project architecture. Labels: GFZ (Geophysics); M&D/MPIM (Climate Models); Marum/AWI (Observations); three boxes for data storage and long-term archiving, two of them marked "in WDC"; TIB Hannover (registration agency); International DOI Foundation; Global Handle System; DDB URN node; TIB-ORDER library catalogue.]

More Details of Pilot Implementation

Application Example

Primary data publication

• During her research for the World Data Center Climate (WDCC), the scientist Mrs. Weather collects primary data about the weather in Hannover in the year 2003.

• As usual, the primary data are tested, evaluated, stored and administered at the WDCC.

• In addition, Mrs. Weather registers the primary data at the TIB (primary data publication by STD-DOI/URN assignment).

Registration of primary data

• After quality assurance, the WDCC transmits to the TIB the URL where the data can be accessed, together with an XML file containing all relevant metadata (generated from the scientific data model).

• The XML file includes all information obligatory for the citing of electronic media (ISO 690-2):
  • author
  • title
  • size
  • edition
  • language
  • publisher
  • publishing date
  • publishing place
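
As a rough illustration of the registration message described above (a data URL plus an XML file with the citation metadata), the following sketch builds such a file with Python's standard library. The element names, the root element `resource` and the `example.org` URL are placeholders chosen for this example; they are not the STD-DOI schema.

```python
import xml.etree.ElementTree as ET


def build_registration(url, metadata):
    """Serialize the data URL and the citation metadata as one XML string."""
    root = ET.Element("resource")          # hypothetical root element
    ET.SubElement(root, "url").text = url  # where the data can be accessed
    for name, value in metadata.items():   # one child element per attribute
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Metadata values taken from the example record in the slides;
# the URL is a placeholder, not a real WDCC address.
xml_text = build_registration(
    "http://example.org/data/IPCC_EH4_OPYC_SRES_B2_MM",
    {
        "author": "Monika Esch",
        "title": "Climate Projection for the next Century calculated by the "
                 "Global Climate Model ECHAM4-OPYC using the SRES B2 IPCC Scenario",
        "size": "614190228 Bytes",
        "edition": "1",
        "language": "en",
        "publisher": "WDCC, World Data Center for Climate",
        "publishingDate": "2004-07-18",
        "publishingPlace": "Hamburg",
    },
)
print(xml_text)
```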

Identifier

• The TIB stores this information about the primary data and assigns the primary data a unique identifier for registration: a DOI.

• DOI (Digital Object Identifier) is a system for persistent and actionable identification and interoperable exchange of intellectual property on digital networks.

• Coordinated by the International DOI Foundation (IDF)

DOI: 10.1000/123456 (prefix: 10.1000, suffix: 123456)

Citing primary data

In her publications, Mrs. Weather now cites this primary data with its unique DOI, maintained by the TIB:

doi:10.1594/WDCC/W_Han_2003_MMB_2

10.1594 (prefix) stands for the TIB as the registration agency.

WDCC stands for the respective research institute.

W_Han_2003_MMB_2 is the internal name of the dataset.
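
The three parts of such a DOI can be separated mechanically. The sketch below assumes the layout shown on this slide (prefix, institute, internal name, separated by slashes); the function and type names are made up for this example.

```python
from typing import NamedTuple


class StdDoi(NamedTuple):
    prefix: str     # registration agency, e.g. 10.1594 for the TIB
    institute: str  # research institute, e.g. WDCC
    name: str       # internal name of the dataset


def parse_std_doi(text):
    """Split a DOI of the form prefix/institute/name into its parts."""
    doi = text.strip()
    if doi.lower().startswith("doi:"):  # tolerate the "doi:" label
        doi = doi[4:].strip()
    prefix, suffix = doi.split("/", 1)  # the first slash ends the prefix
    institute, name = suffix.split("/", 1)
    return StdDoi(prefix, institute, name)


parts = parse_std_doi("doi:10.1594/WDCC/W_Han_2003_MMB_2")
print(parts.prefix, parts.institute, parts.name)
```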

Resolving the DOI

These DOIs can be resolved (and the data can be cited) in every browser worldwide in three ways:
• http://dx.doi.org/10.1594/WDCC/W_Han_2003_MMB_2
• http://doi.tib-hannover.de:8000/10.1594/WDCC/W_Han_2003_MMB_2
• or by doi://10.1594/WDCC/W_Han_2003_MMB_2 (after installing a browser plugin)
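
The three forms listed above differ only in what is placed in front of the DOI, which a short sketch makes explicit. The resolver addresses are the ones given on the slide (as of 2004); the function name is made up for this example, and no network request is made.

```python
def resolver_urls(doi):
    """Return the three addresses the slide lists for one DOI."""
    return [
        f"http://dx.doi.org/{doi}",                # global DOI resolver
        f"http://doi.tib-hannover.de:8000/{doi}",  # TIB Handle server
        f"doi://{doi}",                            # needs the browser plugin
    ]


for url in resolver_urls("10.1594/WDCC/W_Han_2003_MMB_2"):
    print(url)
```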

Usage scenario 1

• Mr. Storm is reading publications by Mrs. Weather in a journal and would like to analyse her data from different perspectives.

• In his publication "Comparison of the weather from Hannover and Miami", Mr. Storm cites Mrs. Weather's data using its DOI, referring to the original data as a unique entity with its own identity.

• Citation example: Weather, 2003: Weather in Hannover for 2003. [doi: 10.1594/WDCC/W_Han_2003_MMB_2]
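
A citation in the pattern of this example can be assembled from four metadata items. The sketch below simply reproduces the pattern shown on the slide; it is not a prescribed citation standard, and the function name is made up for this example.

```python
def cite_dataset(author, year, title, doi):
    """Format a dataset citation in the pattern shown on the slide."""
    return f"{author}, {year}: {title}. [doi: {doi}]"


print(cite_dataset("Weather", 2003, "Weather in Hannover for 2003",
                   "10.1594/WDCC/W_Han_2003_MMB_2"))
```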

Usage scenario 2

• Mr. Nice is writing a paper about the sales figures of ice cream in Hannover in 2003, but he has no information about the weather.

• He uses the TIB as the central registration agency to start a metadata search over the registered primary data.

• The result is doi:10.1594/WDCC/W_Han_2003_MMB_2

• He resolves the DOI and finds the data sufficient.

• The metadata refers him to the WDCC as publisher and data archive.

• In his paper he cites the data, again using their DOI.

URN

In cooperation with the German Library (DDB) in Frankfurt, every dataset is also registered with a unique URN that has the same structure as the DOI:

DOI structure: 10.1594/WDCC/W_Han_2003_MMB_2

URN structure: URN:TIB:10.1594/WDCC/W_Han_2003_MMB_2
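
Because the URN repeats the DOI behind a fixed lead-in, either identifier can be derived from the other. The sketch below assumes exactly the `URN:TIB:` form shown on the slides and compares that lead-in without regard to case; the function names are made up for this example.

```python
URN_PREFIX = "URN:TIB:"  # lead-in as written on the slides


def doi_to_urn(doi):
    """Derive the URN by putting the fixed lead-in in front of the DOI."""
    return URN_PREFIX + doi


def urn_to_doi(urn):
    """Recover the DOI; the lead-in is matched case-insensitively."""
    if not urn.upper().startswith(URN_PREFIX):
        raise ValueError(f"not a TIB URN: {urn}")
    return urn[len(URN_PREFIX):]


doi = "10.1594/WDCC/W_Han_2003_MMB_2"
print(doi_to_urn(doi))
print(urn_to_doi("Urn:TIB:10.1594/WDCC/W_Han_2003_MMB_2"))
```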

Current situation

In cooperation with
• World Data Center Climate (WDCC), Max Planck Institute for Meteorology, Hamburg
• Geoforschungszentrum Potsdam
• World Data Center MARE, Uni. Bremen and Alfred Wegener Institute Bremerhaven
• Learning Lab Lower Saxony, Uni. Hannover

the TIB Hannover is now the world's first registration agency for scientific and technical data (STD-DOI).

Technical

• A Handle server is installed at the TIB Hannover, so TIB is able to register and resolve DOIs.

• The TIB officially received a DOI prefix (10.1594).

• The first data sets have been stored at the TIB by hand.

• The automatic registration process is under development.

Technical realization

[Diagram of the technical realization. Labels: GFZ, WDCs; data URL with XML file; Cocoon web server (XML-based); XSL transformation; Handle server; metadata storage; DOI registration; URN registration; International DOI Foundation; DDB; central library database Göttingen.]

Outlook

2004
• We expect about 10,000 datasets by the end of the year.

2005
• The system shall be extended to other fields of science.

2006
• The TIB Hannover shall become the central registration agency for scientific primary data.

Further information

Project webpage: http://www.std-doi.de

TIB Handle Server: http://doi.tib-hannover.de:8000

DOI Foundation: http://www.doi.org

URN registration of the DDB: http://www.persistent-identifier.de