Publishing Data Catherine Jones Library Systems Development Manager, STFC Rutherford Appleton Laboratory CLADDIER workshop, Chilworth, Southampton, UK 15th May 2007

Publishing Data

Catherine Jones

Library Systems Development Manager, STFC Rutherford Appleton Laboratory

CLADDIER workshop, Chilworth, Southampton, UK 15th May 2007

Contents

• Set the scene
• Definition of publication
• Complexities
• Making data permanently available
• Quality control
• User requirements
• Issues

Microsoft’s Science 2020 Report

Modern scientific communication relies on both journals and databases. At present these are not integrated.

By 2020, mutual linking between journals and databases will be commonplace, and publications containing only peer-reviewed data will become available.

http://research.microsoft.com/towards2020science/downloads.htm

Publication concept

In this context “publication” is defined as the process through which data is fixed and made retrievable over the long term, and may imply that there has been some quality control process.

Complexities of Data

These all show the same data at different levels of processing.

Making data permanently available

Three areas:
1. Defining what is to be kept: encapsulation
2. Ensuring that it is described effectively: metadata
3. Identifying who is responsible for the data management: trusted repository

Encapsulation

A method of identifying a fixed collection of meaningful data so that it can be preserved as a clearly defined unchanging entity.

• Datasets which are still growing
• Versions of datasets
• Format translations
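One way to picture encapsulation is as a frozen snapshot whose contents can later be checked for change. The sketch below is only an illustration under stated assumptions: the use of SHA-256 checksums, the function names `encapsulate` and `is_unchanged`, and the sample file name are all hypothetical and do not come from the talk or from any CLADDIER tooling.

```python
import hashlib
import json
from typing import Dict


def encapsulate(files: Dict[str, bytes], version: str) -> dict:
    """Record a fixed collection of data as a manifest of per-file checksums.

    Assumption: a checksum per file plus one overall fingerprint is enough
    to recognise the snapshot as a clearly defined, unchanging entity.
    """
    manifest = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sorted(files.items())
    }
    # The fingerprint covers file names and contents, not the version label.
    fingerprint = hashlib.sha256(
        json.dumps(manifest, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"version": version, "files": manifest, "fingerprint": fingerprint}


def is_unchanged(files: Dict[str, bytes], snapshot: dict) -> bool:
    """Return True if the files still match the encapsulated snapshot."""
    current = encapsulate(files, snapshot["version"])
    return current["fingerprint"] == snapshot["fingerprint"]


# Hypothetical sample data: a dataset that is still growing.
original = {"profile.csv": b"time,height\n1205,3.2\n"}
snapshot_v1 = encapsulate(original, "v1")

# Appending a record changes the contents, so it no longer matches v1
# and would need to be encapsulated as a new version.
grown = {"profile.csv": original["profile.csv"] + b"1210,3.4\n"}
```

In this sketch a growing dataset is handled by taking a new snapshot per version rather than by altering an existing one, which is one possible reading of the "fixed collection" idea.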

Metadata

Needs to be created to ensure that the data is usable now and over the long term. Semantic encapsulation is important as this is likely to be used in citation.

Trusted repository

To ensure that the data is available over the long term, the Data Centre needs to be on a secure footing and well managed.

Quality Control

• Usability of the dataset. This is one of the roles of the Data Centres.
• Usefulness of the dataset. This is the role of domain experts.

User requirements for citation

1. Need for an unambiguous reference to a well defined permanent entity
2. This reference/citation needs to be understandable for humans
3. Author and publication year, or equivalents, are important
4. An unambiguous data reference, in this area includes the activity or tool which produced the data
5. Source of the data (i.e. the repository) may be as important as the producer and needs to be unambiguous

Requirements from data producers

1. Traceable to the data provider/producer
2. Usable for usage metrics
3. To be recognised as intellectually equivalent to academic papers
4. Able to be used to search for papers citing data

Citation format

Author, title, [medium], publisher, publication date, identifier, feature, [access date, available at]

Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. British Atmospheric Data Centre (BADC), 1990- urn badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205 [http://featuretype.registry/VerticalProfile] [cited 2006 Apr 25. Available from http://badc.nerc.ac.uk/data/mst.]
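The field order above can be sketched as a small record plus a formatter. This is a minimal illustration, not the project's implementation: the `DataCitation` class, the `format_citation` function and the exact punctuation between fields are assumptions, and the output follows the slide's field order rather than reproducing its example character for character.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Fields in the order of the citation format; bracketed ones are optional."""
    author: str
    title: str
    publisher: str
    publication_date: str
    identifier: str
    feature: str
    medium: Optional[str] = None
    access_date: Optional[str] = None
    available_at: Optional[str] = None


def format_citation(c: DataCitation) -> str:
    """Join the fields in order, skipping optional fields that are absent."""
    head = f"{c.author}, {c.title}"
    if c.medium:
        head += f", [{c.medium}]"
    body = f"{c.publisher}, {c.publication_date}, {c.identifier}, feature {c.feature}"
    text = f"{head}. {body}"

    # Access details are grouped in one trailing bracket, as on the slide.
    access = []
    if c.access_date:
        access.append(f"cited {c.access_date}")
    if c.available_at:
        access.append(f"Available from {c.available_at}")
    if access:
        text += " [" + ". ".join(access) + "]"
    return text


# Values taken from the example citation on the slide.
example = DataCitation(
    author="Natural Environment Research Council",
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    medium="Internet",
    publisher="British Atmospheric Data Centre (BADC)",
    publication_date="1990-",
    identifier="urn badc.nerc.ac.uk/data/mst/v3/upd15032006",
    feature="200409031205",
    access_date="2006 Apr 25",
    available_at="http://badc.nerc.ac.uk/data/mst",
)

print(format_citation(example))
```

Keeping the identifier and feature as separate fields reflects the requirement that the reference be unambiguous while still readable by humans.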

Issues for consideration

• The ability to cite data is strongly linked to the definition of the data.
• Dynamic datasets pose additional issues for long-term accessibility.
• Versioning of the data and the processing/analysing software are big issues to resolve.
• Peer review of the data is important.
• Identification of datasets where a facility may provide data from a set of instruments is a complex decision.