Preservation, access and re-use of Research Data The STM view on publishing datasets Presented at the DataCite Summer Meeting 2010 Hannover, 8 June 2010 Eefke Smit, International Association of STM publishers Director, Standards and Technology


Preservation, access and re-use of Research Data

The STM view on publishing datasets

Presented at the DataCite Summer Meeting 2010, Hannover, 8 June 2010

Eefke Smit, International Association of STM Publishers, Director, Standards and Technology


Context

“… increased availability of primary sources of data in digital form has the potential to shift the balance away from research based on secondary sources such as publications, thus positioning data as the central element in the scientific process.” (a statement from the Director of the Directorate General for Information Society and Media of the European Commission, 2008)

“If the raw data doesn’t form a central part of the scientific record then we perhaps need to start asking whether the usefulness of that record in its current form is starting to run out.” (from a blog called Science in the Open: http://blog.openwetware.org/scienceintheopen/2008/05/16/avoid-the-pain-and-embarassment-make-all-the-raw-data-available/)

“..let us get back to the days where observational scientists could justify peer reviewed publication primarily on the basis of collection, description and reporting of high quality data sets (usually with some basic level of interpretation..” (quote taken from a discussion paper called “The Risk-Reward Basis for Data Publication”, marine sciences, 2007)

“Problem = scientific community does not see online data as ‘publication’” (from a presentation called “How to motivate scientists to publish data online”, Mark J. Costello, June 2008)


What do scientists want?


How to locate data?


Where to submit data?


Some numbers

Preview of Parse Insight results

Researchers:
• Only 20 % of researchers share data
• 40 % have problems sharing it (distrust, legal and privacy issues)
• But 80 % of researchers like to use data from others

What publishers do:
• 70 % of publishers (= 90 % of journals) accept data and other supplementary material
• 95 % of publishers facilitate linking to datasets
• Less than 5 % of publishers have special facilities for datasets
• 60 % see the researcher and research institute as the responsible party to maintain and curate datasets


What do Publishers currently do?

Instructions to authors in “Tetrahedron”


Supplementary files are linked directly from an article’s abstract page.


Supplementary files are referenced within the article text and linked via the article’s abstract page using the DOI.


How do Publishers view research data in the context of “IP”?

The Publishing Industry (STM/ALPSP) position is:

“…..believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible to other scholars” (Statement from STM & ALPSP, June 2006)

It is also stated that:

“….articles published in scholarly journals often include tables and charts in which certain data points are included or expressed. Journal publishers often do seek the transfer of or ownership of the publishing rights in such illustrations.., but this does not amount to a claim to the underlying data itself..”


Research data and the Publisher’s Mission

Can we contribute to the data dissemination/retrieval process?

Storing, linking, search, discovery

Can we contribute to research workflows?

Metadata, collections, ontologies, visualization, mining, etc.

Can we meaningfully contribute to an “editorial” process for data?

Submission processes, editorial organization, review

Publishers are committed to making genuine contributions to the research communities:

• support to the scholarly communication process

• increased availability of research output

• increased citations to research output

• increased overall quality of research

• develop new means of knowledge discovery

• increase in the research efficiency


Support through the journal networks and publishing platforms

Move from:

• General instructions to make data available

• Data available as supplementary information with the online article

• Textual references to data repositories & datasets

• Verbal instructions, limited support by editorial team

To:

• “More granular” definition of research data and supplementary information

• Specific instructions on how, when and where to submit, and how to cite

• Specific sustainable destinations for research data

• Agreed formats & metadata requirements for data submission

• Expand editorial teams with a “data editor”

• Hyperlinking between articles and (final) dataset destinations, and vice versa

• “Federated searching”

• Intelligent (contextual) referencing of datasets in articles

Note: a successful implementation requires a combination of domain-specific and generic solutions


Working examples


Vice versa


What Publishers are busy solving

• Peer review practices
• Readability, navigation, accessibility, presentation
• Discoverability: search, metadata, linking, citability
• Copyright issues
• Preservation and long-term archiving
• Version control / dynamic data
• Access, permissions for re-use
• Editorial practice and support

See joint NISO/ NFAIS initiative: http://www.niso.org/topics/tl/supplementary/


Willingness in abundance among publishers. What we now need is:

• Good collaboration with all stakeholders in the chain: researchers, research institutes, safe data repositories, libraries, policymakers

• Standards and common practice building on what is in place already: from persistent identifiers and citation conventions to submission guidelines across scholarly journals

• Scalable solutions that work across disciplines

• Infrastructure: TIB and DataCite are excellent initiatives to get the right infrastructure in place

• To make solutions scalable and sustainable, we need: convergence


In conclusion

• Do Publishers recognise the importance of “data publishing”?

YES

• Can Publishers help to get research data in the open?

YES

• Will Publishers help to improve the discoverability of data?

YES

…..and YES:

• Solutions must be scalable & sustainable

• Existing capabilities should be used as much as possible

• We need close collaboration across the whole chain of researchers and research communities, libraries and data centres as well as the policy makers

...and support DataCite.