RDMRose 2.5 Metadata and data citation

Metadata and data citation
7/18/22
Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose
Research Data Management Workshop 2.5

Apr 15, 2023

Learning Outcomes

By the end of this session you will be able to:

• Discuss the varying requirements of metadata that will enable researchers to identify the potential of a particular dataset

• Evaluate ways of citing data

• Articulate and reflect upon some of the issues involved with citing data and datasets

Session 2.5 overview

• EPSRC principles and expectations

• What is sufficient metadata?

• How to cite data?

EPSRC Principle 6

• “Sufficient metadata should be recorded and made openly available to enable other researchers to understand the potential for further research and re-use of the data. Published results should always include information on how to access the supporting data.”

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/principles.aspx

EPSRC Expectation 5

• “Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet; in each case the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it. Where the research data referred to in the metadata is a digital object it is expected that the metadata will include use of a robust digital object identifier (For example as available through the DataCite organisation - http://datacite.org).”

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx

Activity 1: Metadata

• What is “sufficient metadata” that enables “other researchers to understand the potential for further research and re-use of the data”?

Activity 1: Metadata

The University of Poppleton holds a dataset with meteorological observations, taken at the university’s weather station. In particular, it contains a set of precipitation measurements since the foundation of the university.

A climatologist, Jenny Fairweather, is interested in this dataset for her research into climate change. She is looking for trends in the weather.

A meteorologist, Wilson Rainbird, who works for the UK Met Office, wants to use these data for the purposes of weather prediction. He is mainly interested in combining these precipitation measurements with other similar datasets.

A researcher, Alice Snowe, from another university’s Accident Research Unit, conducts most of her research in the area of road traffic accidents. She would like to map the precipitation measurements to another dataset containing information on road accidents in order to analyse possible correlations.

Lastly, the university’s data repository manager, John Shower, is concerned with issues regarding data access and IPR.

Activity 1: Metadata

• What is “sufficient metadata” for each of these stakeholders “to understand the potential for further research and re-use of the data”?

Example

• The DaMaRO project at the University of Oxford has developed a metadata schema for its DataFinder (Rumsey, 2012).

• A three-tier metadata approach:

– Mandatory minimal metadata to enable basic discovery, such as Creator, Title, Publisher, Date, Location, Access terms & conditions

– Mandatory contextual metadata (mostly administrative and partly based on EPSRC expectations), such as Funding Agency, Grant Number, Last access request date, Project Information, Data Generation Process, Why the data was generated, Date (range) of data collection, Reasons for embargo

– Optional metadata (including discipline-specific metadata) to enable reuse, such as machine settings and experimental conditions under which the data were gathered
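
The three tiers can be pictured as a simple completeness check on a metadata record. The sketch below is illustrative only: the field labels come from the examples listed on this slide, but the key names, the dictionary layout, and the functions `missing_fields` and `check_record` are assumptions made for this example, not the actual DaMaRO schema.

```python
# Field names follow the examples on the slide; the snake_case keys and the
# checking logic are assumptions for illustration, not the DaMaRO schema.
MINIMAL_FIELDS = [
    "creator", "title", "publisher", "date", "location", "access_terms",
]
CONTEXTUAL_FIELDS = [
    "funding_agency", "grant_number", "last_access_request_date",
    "project_information", "data_generation_process", "why_generated",
    "collection_date_range", "reasons_for_embargo",
]


def missing_fields(record, required):
    """Return the required field names that are absent or empty in record."""
    return [name for name in required if not record.get(name)]


def check_record(record):
    """Report the missing mandatory fields, tier by tier.

    Optional (tier three) metadata is never reported as missing.
    """
    return {
        "minimal": missing_fields(record, MINIMAL_FIELDS),
        "contextual": missing_fields(record, CONTEXTUAL_FIELDS),
    }


# A hypothetical record for the Poppleton precipitation dataset.
example_record = {
    "creator": "University of Poppleton",
    "title": "Precipitation measurements 1905-2010 taken at Western Bank weather station",
    "publisher": "The University of Poppleton, Meteorological Service",
    "date": "2011",
    "location": "Poppleton",
    "access_terms": "",  # deliberately left empty to show a failed check
}

report = check_record(example_record)
print(report["minimal"])      # fields still needed for basic discovery
print(report["contextual"])   # fields still needed for context
```

A check like this only shows whether the mandatory tiers are filled in. Whether the metadata is "sufficient" for a particular stakeholder remains a matter of judgement, which is the point of Activity 1.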

Activity 2: Data citation

• How should data be cited?

• There are no established standards for data citation yet, although some style manuals such as the APA’s (in the 5th and 6th editions) and some repositories such as the UK Data Archive do provide instructions.

Activity 2: Data citation

• A researcher, Alice Snowe, from another university’s Accident Research Unit, is seeking to use the dataset of precipitation measurements going back to the foundation of the University. This dataset was deposited in 2011 by the University’s meteorologist, Christopher Oldman Frost, and covers all years up to and including 2010. It consists of data subsets organised per year, each consisting of several files, including Excel spreadsheets, Word files, and image files (digitised observations written down on paper). Of course, Mr Oldman Frost is not the only meteorologist who has been involved in taking the measurements that make up this dataset.

Activity 2: Data citation

• Alice Snowe is now writing a research paper for Science called ‘The correlation between bicycle accidents and precipitation in urban centres during the rush hour’. She needs to cite our institutional repository’s dataset. In particular she will need to refer to the precipitation measurements of 4 May 1979. Elsewhere in her article she also needs to refer to a subset covering the winter months of the years 1981-1985.

• Write down the references that Alice Snowe needs to give in her article.

APA

Basic form:

• Rightsholder. (Year). Title of data set (Version number) [Description of form]. Location: Name of producer.

or

Rightsholder. (Year). Title of data set (Version number) [Description of form]. Retrieved from http://

• University of Poppleton. (2011). Precipitation measurements 1905-2010 taken at Western Bank weather station [Data files and documentation]. Poppleton: The University of Poppleton, Meteorological Service.
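
As a rough illustration, the basic form can be assembled from its parts with a small helper. This is a sketch under stated assumptions: the function name `apa_dataset_citation` and its parameters are invented for this example, and it covers only the two variants of the basic form shown on this slide, not the full APA rules.

```python
def apa_dataset_citation(rightsholder, year, title, description,
                         location=None, producer=None, url=None,
                         version=None):
    """Build a data set reference following the slide's APA basic form.

    Hypothetical helper: give either `url` (the "Retrieved from" variant)
    or both `location` and `producer` (the "Location: Name of producer"
    variant).
    """
    # The version number is only included when one is supplied.
    version_part = f" ({version})" if version else ""
    head = f"{rightsholder}. ({year}). {title}{version_part} [{description}]."
    if url:
        return f"{head} Retrieved from {url}"
    if location and producer:
        return f"{head} {location}: {producer}."
    raise ValueError("give either url, or both location and producer")


apa_reference = apa_dataset_citation(
    rightsholder="University of Poppleton",
    year=2011,
    title="Precipitation measurements 1905-2010 taken at Western Bank weather station",
    description="Data files and documentation",
    location="Poppleton",
    producer="The University of Poppleton, Meteorological Service",
)
print(apa_reference)
```

The helper has no parameter for a subset or a single day of observations, which is one of the practical issues Activity 2 asks about.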

DataCite

• DataCite (http://www.datacite.org) is a not-for-profit organisation that aims to promote and support the sharing of research data

• They are developing an infrastructure that supports methods of data citation, discovery, and access

• They are currently leveraging the DOI (Digital Object Identifier) infrastructure, which is also used for research articles

• They can provide DOIs for datasets

• DataCite DOIs have to resolve to a public landing page with information about the dataset and a direct link to it

DataCite

Basic form:

• Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

• Version and ResourceType are optional elements

• For citation purposes, DataCite recommends that DOI names are displayed as linkable, permanent URLs

• More info in DataCite (2011)

• University of Poppleton (2011): Precipitation measurements 1905-2010 taken at Western Bank weather station. Meteorological service, The University of Poppleton. http://dx.doi.org/10.1594/UoP.MS.298
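
The DataCite basic form, with its two optional elements, can be sketched in the same way. The function name `datacite_citation` and its parameters are assumptions made for this example. The `resolver` default simply mirrors the URL prefix used in the Poppleton example on this slide.

```python
def datacite_citation(creator, year, title, publisher, doi,
                      version=None, resource_type=None,
                      resolver="http://dx.doi.org/"):
    """Build a citation in the slide's DataCite basic form.

    Hypothetical helper. Version and ResourceType are optional and are
    left out when not supplied. The DOI name is shown as a linkable URL.
    """
    parts = [f"{creator} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    parts.append(f"{resolver}{doi}")
    return " ".join(parts)


datacite_reference = datacite_citation(
    creator="University of Poppleton",
    year=2011,
    title="Precipitation measurements 1905-2010 taken at Western Bank weather station",
    publisher="Meteorological service, The University of Poppleton",
    doi="10.1594/UoP.MS.298",
)
print(datacite_reference)
```

Because the identifier refers to the whole dataset, citing a single day or a multi-year subset still depends on the granularity at which DOIs were assigned.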

Activity 2: Data citation

• What practical issues did you encounter when writing the references for Alice Snowe’s research paper? How could these issues be solved?

Data Citation

• Issues include (Ball & Duke, 2011a and b):

– At what granularity should data be made citeable?

– How to credit each contributor in a dataset that is assembled from very many contributions?

– Where in a research paper should a data citation be given (e.g. a paper describing a dataset versus subsequent papers using it)?

– What to do with frequently updated data?

References

• American Psychological Association (2010). Publication Manual of the American Psychological Association (6th edition). Washington, DC: American Psychological Association, pp. 210-211.

• Ball, A., & Duke, M. (2011a). Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking

• Ball, A., & Duke, M. (2011b). How to Cite Datasets and Link to Publications. DCC How-To Guides. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/how-guides/cite-datasets

• DataCite (2011). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 2.2. London: DataCite. Retrieved from http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf. doi:10.5438/0005

• DataCite (n.d.). Why cite data? Hannover. Retrieved from http://datacite.org/whycitedata

• Rumsey, S. (2012). Just enough metadata: Metadata for research datasets in institutional data repositories [PowerPoint presentation]. Oxford: The University of Oxford. Retrieved from http://damaro.oucs.ox.ac.uk/docs/Just%20enough%20metadata%20v3-1.pdf

• UK Data Archive (n.d.). Citing Data. Colchester. Retrieved from http://www.data-archive.ac.uk/conditions/citing-data