Publishing Data
Catherine Jones
Library Systems Development Manager, STFC
Rutherford Appleton Laboratory
CLADDIER workshop, Chilworth, Southampton, UK 15th May 2007
Contents
• Set the scene
• Definition of publication
• Complexities
• Making data permanently available
• Quality control
• User requirements
• Issues
Microsoft’s Science 2020 Report
Modern scientific communication relies on both journals and databases. At present these are not integrated.
By 2020, mutual linking between journals and databases will be commonplace, and publications containing only peer-reviewed data will become available.
http://research.microsoft.com/towards2020science/downloads.htm
Publication concept
In this context “publication” is defined as the process through which data is fixed and made retrievable over the long term, and may imply that there has been some quality control process.
Making data permanently available
Three areas:
1. Defining what is to be kept: encapsulation
2. Ensuring that it is described effectively: metadata
3. Identifying who is responsible for the data management: trusted repository
Encapsulation
A method of identifying a fixed collection of meaningful data so that it can be preserved as a clearly defined unchanging entity.
• Datasets which are still growing
• Versions of datasets
• Format translations
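One possible way to make "a clearly defined unchanging entity" concrete is to record a content fingerprint for the files in a frozen snapshot, so that any later change (a growing dataset, a new version, a format translation) yields a different fingerprint. The sketch below is only an illustration under that assumption; the talk does not prescribe a mechanism, and the function name `snapshot_fingerprint` and the file names are hypothetical.

```python
import hashlib
from typing import Mapping


def snapshot_fingerprint(files: Mapping[str, bytes]) -> str:
    """Return a SHA-256 hex digest over a fixed set of named byte payloads.

    Hypothetical helper: file names are processed in sorted order so the
    result does not depend on the order in which files were listed.
    """
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between file name and content hash
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


# Hypothetical snapshot contents, for illustration only.
snapshot_v1 = {"profile_a.csv": b"height,wind\n1,2\n"}
snapshot_v2 = {
    "profile_a.csv": b"height,wind\n1,2\n",
    "profile_b.csv": b"height,wind\n3,4\n",  # the dataset has grown
}

fingerprint_v1 = snapshot_fingerprint(snapshot_v1)
fingerprint_v2 = snapshot_fingerprint(snapshot_v2)
print(fingerprint_v1 != fingerprint_v2)
```

A growing dataset would then be published as a sequence of frozen snapshots, each with its own fingerprint, rather than as a single moving target.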
Metadata
Needs to be created to ensure that the data is usable now and over the long term. Semantic encapsulation is important as this is likely to be used in citation.
Trusted repository
To ensure that the data is available over the long term, the Data Centre needs to be on a secure footing and well managed.
Quality Control
Usability of the dataset. This is one of the roles of the Data Centres.
Usefulness of the dataset. This is the role of domain experts.
User requirements for citation
1. Need for an unambiguous reference to a well-defined permanent entity
2. This reference/citation needs to be understandable for humans
3. Author and publication year, or equivalents, are important
4. An unambiguous data reference, in this area, includes the activity or tool which produced the data
5. Source of the data (i.e. the repository) may be as important as the producer and needs to be unambiguous
Requirements from data producers
1. Traceable to the data provider/producer
2. Usable for usage metrics
3. To be recognised as intellectually equivalent to academic papers
4. Able to be used to search for papers citing data
Citation format
Author, title, [medium], publisher, publication date, identifier, feature, [access date, available at]
Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. British Atmospheric Data Centre (BADC), 1990- urn badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205 [http://featuretype.registry/VerticalProfile] [cited 2006 Apr 25. Available from http://badc.nerc.ac.uk/data/mst.]
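The field order above can be sketched as a small formatter. This is a minimal illustration, not a CLADDIER or BADC tool: the class `DataCitation` and the function `format_citation` are hypothetical names, and the punctuation is simplified to commas rather than reproducing the example exactly. The field values are taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the citation fields listed above."""
    author: str
    title: str
    publisher: str
    publication_date: str
    identifier: str
    feature: str
    medium: Optional[str] = None        # optional, shown in brackets
    access_date: Optional[str] = None   # optional, shown with available_at
    available_at: Optional[str] = None


def format_citation(c: DataCitation) -> str:
    """Join the fields in the order: author, title, [medium], publisher,
    publication date, identifier, feature, [access date, available at]."""
    parts = [c.author, c.title]
    if c.medium:
        parts.append(f"[{c.medium}]")
    parts += [c.publisher, c.publication_date, c.identifier, f"feature {c.feature}"]
    text = ", ".join(parts)
    if c.access_date and c.available_at:
        text += f" [cited {c.access_date}. Available from {c.available_at}]"
    return text


mst = DataCitation(
    author="Natural Environment Research Council",
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    medium="Internet",
    publisher="British Atmospheric Data Centre (BADC)",
    publication_date="1990-",
    identifier="urn badc.nerc.ac.uk/data/mst/v3/upd15032006",
    feature="200409031205",
    access_date="2006 Apr 25",
    available_at="http://badc.nerc.ac.uk/data/mst",
)
citation_text = format_citation(mst)
print(citation_text)
```

Keeping the fields separate from the rendered string means the same record could be rendered for human readers and also used for searching or usage metrics, in line with the producer requirements listed earlier.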
Issues for consideration
• The ability to cite data is strongly linked to the definition of the data.
• Dynamic datasets pose additional issues for long-term accessibility.
• Versioning of the data and the processing/analysing software are big issues to resolve.
• Peer review of the data is important.
• Identification of datasets where a facility may provide data from a set of instruments is a complex decision.