Developing institutional RDM services

Developing institutional RDM services Michael Day Digital Curation Centre (DCC) UKOLN, University of Bath DCC Workshop, Cardiff University 14 May 2013

Slides from a presentation given at a Digital Curation Centre workshop, Cardiff University, 14 May 2013

Developing institutional RDM services

Michael Day

Digital Curation Centre (DCC)

UKOLN, University of Bath

DCC Workshop, Cardiff University 14 May 2013

Session outline

Managing active data: storage options

Long-term retention of data: selection criteria

Data repositories

Finding and citing data: data registries and metadata

Presentation based on: Sarah Jones, Graham Pryor and Angus Whyte, How to Develop Research Data Management Services – a guide for HEIs (DCC, 2013):

http://www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services

Some slides reused from RDMRose training materials:

http://rdmrose.group.shef.ac.uk/

Managing active data

Managing active data: key tasks

Researchers:

Have a duty to ensure that research data is stored securely and backed up on a regular basis

Have choices (e.g. network drives, laptops, external storage devices, online / cloud-based storage)

Need to take data security seriously

This should be considered as part of the data management planning process

Institutions:

Need to constantly review data holdings and RDM practices in order to evaluate whether current storage infrastructures are sufficient

May need to make a case for investing in the provision of additional data storage capability

Need procedures for the allocation and management of storage

Need to be flexible, taking account of a diverse range of research contexts and data storage requirements

Research data storage

Trend for some HEIs to enhance the capacity of research data storage facilities

Extending capacity of existing filestores (e.g. Bath)

Exploring secure cloud storage

Utilising High Performance Computing facilities

Managing storage

University of Bristol (data.bris) – registered researchers (data stewards) are allocated 5TB storage to manage, e.g. deciding how long data should be kept, who has access, etc.

http://data.blogs.ilrt.org

Options for managing active data

Cloud storage options

There may be benefits in terms of costs and expertise

There may also be risks (e.g. loss of control, jurisdictional issues)

Janet Brokerage - promoting the use of cloud and off-site data centre facilities

Academic Dropbox-like services

Dropbox is often used for sharing and syncing data between machines, but institutions are keen to retain control

Systems developed in-house

Typically developed with a disciplinary focus, e.g. BRISSkit (biomedicine)

Selection for the long-term retention of data

Selecting data for retention

RCUK, Common Principles on Data Policy (2011):

“Data with acknowledged long-term value should be preserved and remain accessible and usable for future research”

http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx

Institutions will need to establish clear criteria to guide decisions on what should be kept

It will not be possible to retain everything

Carefully considered selection processes are essential to help prioritise the data that has long-term value

Institutional selection processes will need to take account of:

Data that institutions are legally obliged to retain (or destroy), e.g. for contractual or regulatory reasons

Different disciplinary practices (e.g. some disciplines will have mature data sharing infrastructures and will already deposit data with third-party services)

Researcher sensitivities about losing control of data (deposit agreements)

Developing guidance on selection

Establishing guidelines, processes and good practice for data selection and deposit can be one of the more challenging aspects of an RDM service

There is a need for buy-in from researchers

There is a need for clarity on what kinds of data are within the remit of an institutional RDM service

There may be a need to apply different levels of curation, e.g. depending on the perceived value of the data accepted

DCC selection categories

DCC How to Select and Appraise Research Data for Curation (Whyte and Wilson, 2010) proposes seven main criteria:

Relevance to mission

Scientific or historic value

Uniqueness

Potential for redistribution

Non-replicability

Economic case

Full documentation

http://www.dcc.ac.uk/resources/how-guides/appraise-select-data

Data repositories

Data repositories

Focusing on how data will be preserved and made available for others

Main options:

Developing an institutional data repository

Building, where possible, on existing systems, e.g. IR, CRIS, etc.

Essex Research Data demo: http://researchdata.essex.ac.uk/

Liaising with external research data repositories (or data centres)

Often subject-based; some UK data centres are supported by funding bodies

Providing researchers with information on external services

Data catalogues

RCUK Common Principles

RCUK, Common Principles on Data Policy (2011):

“To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data”

Also EPSRC Principle 6

http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx

EPSRC Expectation V

“Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet; in each case the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it. Where the research data referred to in the metadata is a digital object it is expected that the metadata will include use of a robust digital object identifier (For example as available through the DataCite organisation - http://datacite.org).”

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx

May-13

Learning material produced by RDMRose

http://www.sheffield.ac.uk/is/research/projects/rdmrose

Some questions to consider

What metadata is required to adequately record datasets? What is “sufficient metadata” for discovery and re-use?

Does any of this metadata already exist? If so, where might it be found?

If not, how can the appropriate metadata be generated or captured?

Will there be a need to share this metadata, e.g. with third-party discovery services? National data services? If so, what standards exist to support metadata sharing?

Examples: UKOLN Scoping Study

Scientific Data Application Profile Scoping Study (UKOLN, 2009)

Building on work undertaken on the Scholarly Works Application Profile (SWAP)

Analysed the metadata used by UK data centres and repositories, and selected domain models (e.g. DDI, CCLRC Metadata Model, CIDOC CRM)

Concluded that:

Simple Dublin Core (e.g. as mandated by OAI-PMH) would be insufficient

There was sufficient convergence between the different schemas to suggest that a generic metadata profile could be constructed

A generic metadata profile would benefit interdisciplinary research and institution-based services (e.g. IRs)

http://www.ukoln.ac.uk/projects/sdapss/

Examples: DataCite metadata (1)

DataCite:

Organisation aiming to facilitate easier access to (and citation of) research data, e.g. through the use of persistent identifiers (DOIs)

DataCite Metadata Schema (currently v. 2.2, 2011) defines core metadata properties

Broadly based on Dublin Core concepts

http://schema.datacite.org

Examples: DataCite metadata (2)

Mandatory Properties:

Identifier

Creator

Title

Publisher

PublicationYear

Administrative Metadata:

LastMetadataUpdate

MetadataVersionNumber

Optional Properties:

Subject

Contributor

Date

Language

ResourceType

AlternateIdentifier

RelatedIdentifier

Size

Format

Version

Rights

Description
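The property lists above lend themselves to a simple completeness check before a DOI is requested. The following is a minimal sketch, assuming a record is held as a plain Python dictionary keyed by property name. The function names and the dictionary representation are hypothetical and chosen for illustration; the DataCite schema itself is expressed in XML, and the sample values are illustrative only.

```python
# Property names are taken from the DataCite Metadata Schema v2.2 lists above.
MANDATORY = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")
ADMINISTRATIVE = ("LastMetadataUpdate", "MetadataVersionNumber")
OPTIONAL = (
    "Subject", "Contributor", "Date", "Language", "ResourceType",
    "AlternateIdentifier", "RelatedIdentifier", "Size", "Format",
    "Version", "Rights", "Description",
)


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in `record`.

    `record` is a hypothetical dict representation, not DataCite's XML format.
    """
    missing = []
    for name in MANDATORY:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def unknown_properties(record):
    """Return keys that appear in none of the three property lists."""
    known = set(MANDATORY) | set(ADMINISTRATIVE) | set(OPTIONAL)
    return sorted(key for key in record if key not in known)


# Illustrative record with the Publisher deliberately left out.
example = {
    "Identifier": "10.1594/UoP.MS.298",
    "Creator": "University of Poppleton",
    "Title": "Precipitation measurements 1905-2010",
    "PublicationYear": "2011",
}
print(missing_mandatory(example))  # ['Publisher']
```

A check of this kind only confirms that the mandatory properties are present; it says nothing about whether the values are accurate or sufficient for re-use.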

Examples: University of Oxford

The DaMaRO project at the University of Oxford is developing a metadata schema for its DataFinder (Rumsey, 2012).

A three-tier metadata approach:

Mandatory minimal metadata to enable basic discovery, such as Creator, Title, Publisher, Date, Location, Access terms & conditions

Mandatory contextual metadata (mostly administrative and partly based on EPSRC expectations), such as Funding Agency, Grant Number, Last access request date, Project Information, Data Generation Process, Why the data was generated, Date (range) of data collection, Reasons for embargoes

Optional metadata (including discipline-specific metadata) to enable reuse, such as machine settings and the experimental conditions under which the data were gathered
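The layered approach can be sketched as a check that treats the first two tiers as required and leaves the third open-ended. This is an illustration only: the field names below paraphrase some of the examples on this slide and are assumptions, as are the function names; the actual DaMaRO schema may name or group its elements differently.

```python
# Hypothetical field names based on the examples listed on the slide.
TIERS = {
    "minimal": [
        "creator", "title", "publisher", "date", "location", "access_terms",
    ],
    "contextual": [
        "funding_agency", "grant_number", "project_information",
        "data_generation_process", "collection_date_range",
    ],
}


def missing_by_tier(record):
    """Map each mandatory tier to the fields that `record` leaves empty."""
    return {
        tier: [field for field in fields if not record.get(field)]
        for tier, fields in TIERS.items()
    }


def is_discoverable(record):
    """True when every minimal (first-tier) field has a value."""
    return not missing_by_tier(record)["minimal"]


# Illustrative record: complete first tier, partial second tier.
record = {
    "creator": "A. Researcher",
    "title": "Example dataset",
    "publisher": "Example University",
    "date": "2013",
    "location": "http://example.org/data/1",
    "access_terms": "Open",
    "funding_agency": "EPSRC",
    # Third-tier, discipline-specific metadata is optional and not checked.
    "machine_settings": "example instrument settings",
}
print(is_discoverable(record))
print(missing_by_tier(record)["contextual"])
```

Keeping the tiers in a single mapping means a repository could tighten or relax what is mandatory without changing the checking logic.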

Examples: University of Essex

RDE Metadata Profile for EPrints

Based on DataCite, INSPIRE, DDI 2.1 and DataShare

Mixture of generic schema and standards specific to social science data

http://data-archive.ac.uk/media/375386/rde_eprints_metadataprofile.pdf

There seems to be convergence on a layered approach

Some practical questions (1)

Technical choices for institutions:

Developing new institutional services, e.g. the approach taken by ANDS: http://www.ands.org.au/guides/metadata-stores-solutions.html

Defining metadata stores by their coverage, the granularity of data that they describe, and the specialisation of their descriptions (e.g. collection-level, object-level, local, institutional, national and discipline-specific)

Building upon existing infrastructures, e.g.:

Institutional Repositories

CRIS (e.g. Pure, Symplectic, Converis)

Some practical questions (2)

Research Information Management interaction?

There is interest in what RIM standards like CERIF can offer RDM (e.g. potentially richer metadata structures for linking research outputs with organisational groupings and funding streams, some level of buy-in from funding bodies), but implementation

CERIF for Datasets (C4D): http://cerif4datasets.wordpress.com

We need to think about how metadata can be shared with:

Discipline-based repositories and data centres

Emerging national (and international) discovery infrastructures

Australian National Data Service

Uses RIF-CS schema (based on ISO 2146:2010) as a data interchange format

Jisc and DCC are currently exploring the options for collating metadata about research data at national level

Data citation

Data Citation

Issues include (Ball & Duke, 2011a and b):

At what granularity should data be made citeable?

How to credit each contributor in a dataset that is assembled from very many contributions?

Where in a research paper should a data citation be given (e.g. a paper describing a dataset versus subsequent papers using it)?

What to do with frequently updated data?

DataCite

DataCite (http://www.datacite.org) is a not-for-profit organisation that aims to promote and support the sharing of research data

They are developing an infrastructure that supports methods of data citation, discovery, and access

They are currently leveraging the DOI (Digital Object Identifier) infrastructure, which is also used for research articles

They can provide DOIs for datasets

DataCite DOIs have to resolve to a public landing page with information about the dataset and a direct link to it

DataCite

Basic form:

Creator (PublicationYear): Title. Publisher. Identifier

Version and ResourceType are optional extra elements

For citation purposes, DataCite recommends that DOI names are displayed as linkable, permanent URLs

More info in DataCite (2011)

University of Poppleton (2011): Precipitation measurements 1905-2010 taken at Western Bank weather station. Meteorological service, The University of Poppleton. http://dx.doi.org/10.1594/UoP.MS.298
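The basic citation form can be assembled mechanically from the mandatory properties. The sketch below is one possible way to do it in Python; the function names are hypothetical, the resolver prefix is the one used in the example above, and the positions of the optional Version and ResourceType elements are an assumption that should be checked against DataCite (2011).

```python
def doi_to_url(doi, resolver="http://dx.doi.org/"):
    """Display a DOI name as a linkable URL.

    The default resolver prefix is the one that appears in the slide example.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if doi.startswith(("http://", "https://")):
        return doi  # already a URL, leave unchanged
    return resolver + doi


def format_citation(creator, year, title, publisher, identifier,
                    version=None, resource_type=None):
    """Build 'Creator (PublicationYear): Title. Publisher. Identifier'.

    Assumed placement of the optional elements: Version after Title,
    ResourceType after Publisher.
    """
    parts = [f"{creator} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    parts.append(doi_to_url(identifier))
    return " ".join(parts)


citation = format_citation(
    creator="University of Poppleton",
    year=2011,
    title="Precipitation measurements 1905-2010 taken at Western Bank weather station",
    publisher="Meteorological service, The University of Poppleton",
    identifier="10.1594/UoP.MS.298",
)
print(citation)
```

Run on the properties of the fictional Poppleton dataset, this reproduces the example citation above.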

References

Ball, A. (2009). Scientific Data Application Profile Scoping Study Report. Bath: UKOLN, University of Bath. Retrieved from http://www.ukoln.ac.uk/projects/sdapss/

Ball, A., & Duke, M. (2011a). Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking

Ball, A., & Duke, M. (2011b). How to Cite Datasets and Link to Publications. DCC How-To Guides. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/how-guides/cite-datasets

DataCite (2011). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 2.2. London: DataCite. Retrieved from http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf. doi:10.5438/0005

Rumsey, S. (2012). Just enough metadata: Metadata for research datasets in institutional data repositories [PowerPoint presentation]. Oxford: The University of Oxford. Retrieved from http://damaro.oucs.ox.ac.uk/docs/Just%20enough%20metadata%20v3-1.pdf

Questions?