Elsevier’s RDM Program: Habits of Effective Data and the Bourne Ultimatum

Anita de Waard, VP Research Data Collaborations, Elsevier RDM Services, a.dewaard@elsevier.com, December 19, 2016. Elsevier’s RDM Program: Ten Habits of Highly Effective Data

Transcript of Elsevier’s RDM Program: Habits of Effective Data and the Bourne Ultimatum

Page 1:

Elsevier’s RDM Program: Ten Habits of Highly Effective Data

Anita de Waard, VP Research Data Collaborations, Elsevier RDM Services, a.dewaard@elsevier.com

December 19, 2016

Page 2:

A Maslow Hierarchy for Research Data:

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

Save, Share, Use

10. Integrate upstream and downstream: make metadata to serve use.
9. Re-usable (allow tools to run on it)
8. Reproducible
7. Trusted (e.g. reviewed)
6. Comprehensible (description / method is available)
5. Citable
4. Discoverable (data is indexed or data is linked from article)
3. Accessible
2. Preserved (long-term & format-independent)
1. Stored (existing in some form)
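One way to read the hierarchy is that each level builds on the ones below it. The sketch below illustrates that reading only; the slide does not define a scoring rule, so the `Aspect` enum, the `highest_level` function and the "no skipped levels" rule are all assumptions made for this example.

```python
from enum import IntEnum


class Aspect(IntEnum):
    """The ten aspects from the slide, numbered as on the pyramid."""
    STORED = 1
    PRESERVED = 2
    ACCESSIBLE = 3
    DISCOVERABLE = 4
    CITABLE = 5
    COMPREHENSIBLE = 6
    TRUSTED = 7
    REPRODUCIBLE = 8
    REUSABLE = 9
    INTEGRATED = 10


def highest_level(satisfied: set[Aspect]) -> int:
    """Return the highest level reached without skipping a lower one.

    Assumption (not from the slide): a level only counts when every
    level beneath it is also satisfied.
    """
    level = 0
    for aspect in Aspect:  # iterates in definition order, 1..10
        if aspect not in satisfied:
            break
        level = aspect.value
    return level


# A dataset that is citable but not discoverable stops at level 3.
dataset = {Aspect.STORED, Aspect.PRESERVED, Aspect.ACCESSIBLE, Aspect.CITABLE}
print(highest_level(dataset))  # prints 3
```

Under this reading, a DOI on a dataset that nobody can find does not lift it past "Accessible".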

Page 3:

Store, Preserve: Data Rescue Award

Page 4:

Store: Hivebench

www.hivebench.com

Page 5:

Store, Access: Mendeley Data

https://data.mendeley.com/

Linked to published papers, or not

Linked to GitHub, or not

Versioning and provenance tracking

Different licenses: GNU GPL, CC-BY, CC0, etc.

Page 6:

Access, Cite: Data Linking
• Integrated in paper submission process
• Supplementary data is never behind a firewall
• Closely integrated with more than 150 databases

Page 7:

Access, Discover: Scholix/DLIs
• ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot
• Cross-stakeholder, with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
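When links between articles and datasets arrive "from various sources", the same link can be asserted by more than one provider. The sketch below shows one possible way to merge such assertions. The slide does not describe the Scholix schema or the DLI service API, so the `DataLink` fields and the `merge_links` helper are hypothetical, and the dataset DOI is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLink:
    """A simplified article-to-dataset link (hypothetical field names)."""
    source_id: str      # identifier of the article, e.g. a DOI
    target_id: str      # identifier of the dataset
    relationship: str   # e.g. "references"
    provider: str       # who asserted the link


def merge_links(links: list[DataLink]) -> dict[tuple[str, str, str], set[str]]:
    """Group links by (source, target, relationship), collecting providers."""
    merged: dict[tuple[str, str, str], set[str]] = {}
    for link in links:
        # DOIs are case-insensitive, so compare them in lower case.
        key = (link.source_id.lower(), link.target_id.lower(), link.relationship)
        merged.setdefault(key, set()).add(link.provider)
    return merged


links = [
    DataLink("10.1371/journal.pcbi.1000976", "10.0000/example-dataset",
             "references", "publisher"),
    DataLink("10.1371/JOURNAL.PCBI.1000976", "10.0000/example-dataset",
             "references", "data-centre"),
]
merged = merge_links(links)
print(len(merged))  # prints 1: both assertions describe the same link
```

Keeping the set of providers per link preserves provenance while still counting each link once.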

Page 8:

Cite: Force11

https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier

Page 9:

Discover: Datasearch

https://datasearch.elsevier.com

Page 10:

Review: Research Elements

https://www.elsevier.com/books-and-journals/research-elements

Data articles
Software articles
Method articles
Protocols
Video articles
Hardware articles
Lab resources
Full research paper

• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals

Page 11:

Reproduce: Some Journal Efforts:

• Cortex Registered Reports:
- Method and proposed analysis are submitted for pre-registration
- Paper is conditionally accepted
- Research is executed
- Full paper submitted, accepted provided that protocol is followed

• Reproducibility Papers:
- Describe all the software and data used to derive the published results, and provide instructions on how to reproduce and validate such results.
- Using Mendeley Data, authors also submit their code, data, and optionally a ReproZip package or a Docker container to make the review process easier.
- Reviewers not only review the reproducibility paper, but also validate the results and claims published in the original manuscript.
- Once the paper is accepted, (non-blind) reviewers also become co-authors and are encouraged to add a section in the paper that states the extent to which the software is portable, is robust to changes, and is likely to be usable.

Page 12:

Metrics for Institutions: Data Lighthouse

Diagram labels: Research article published; Initial inquiry; Share, publish and link data; Monitor progress and provide guidance; Generate reports

What?

• Service for research institutes (esp. librarians) to engage with researchers throughout the research data life cycle.

How?

Offer a service for librarians to interact with researchers regarding the RDM process, in order to:

• Offer solutions to store, share, link and publish data

• Monitor progress: report on posting, citation and downloads of datasets

• Provide monthly reporting

Page 13:

In summary:

Elsevier Efforts, Collaborative Efforts

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

Save, Share, Use

10. Integrate upstream and downstream: make metadata to serve use.
9. Re-usable
8. Reproducible
7. Trusted
6. Comprehensible
5. Citable
4. Discoverable
3. Accessible
2. Preserved
1. Stored

Data at Risk

Reproducibility Initiative

Data Lighthouse

Page 14:

“Now show me how all of this works together… on one of my papers!”

• Phil Bourne, August 2016

See Demo

Page 15:

A Tale of (Ir)reproducibility: There once was a computational biology paper…

Kinnings et al. 2010, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000976

Page 16:

A Tale of (Ir)reproducibility: ... that couldn’t be (easily) reproduced.

Page 17:

A Tale of (Ir)reproducibility: Some brave souls did reproduce it …

Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278

Page 18:

A Tale of (Ir)reproducibility: … but it was a lot of work.

Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278

Page 19:

Some tools to improve this:
1. Store protocols in an Electronic Lab Notebook.

Keep collection of protocols online

Edit, export, share

Page 20:

Some tools to improve this:
2. Run experiments from this Lab Notebook.

Edit, export, share

Base on saved Protocols

Save and Export Outputs

Page 21:

Some tools to improve this:
3. Export results to a trusted data repository.

Describe how the experiment can be reproduced

Keep track of versions of dataset

Create DOI for citation

Link back to protocols

Store up to 5 GB of data in many formats

Page 22:

Some tools to improve this:
4. Publish in a data journal & link back.

Journal focuses on method reproduction

Link to protocols

Link to data

Fully OA

Page 23:

The Moral of this Story:

• How are we improving the ‘old way of working’?
- Methods and data can be stored by researchers directly during the experiment, so the 270 hours of reproduction shrink toward 0 (given that the protocol is stored for reuse during the experiment)
- Better reproducibility because tools and methods are stored innately, no need to recap, rebuild, and recover
- More accurate workflow representation because progress is tracked while it happens, not just afterwards

• Are we there yet?
- We’re getting somewhere: “Your tools […], layer a UI on top of a whole set of disjointed components; this is ultimately what people want!” Phil Bourne, ADDS NIH
- But we’re not quite there:
o Need to run code from the tools (planned)
o Even easier exporting/publishing workflows (planned)
o Integration with other tools: ELNs, (institutional) repositories, journals, sharing platforms (planned)

Page 24:

A development partnership proposal:

1. You try out our tools:
- Institutional install for Hivebench
- Installation of Mendeley Data (in the cloud, later on local service)
- If interested: Data Lighthouse pilot

2. In return, you help us explore what these tools will look like:
- Connect to PIs/postdocs/grad students who are interested in trying out Hivebench/Mendeley Data
- We ask them for feedback on the tools, help with any issues
- You explore Data Lighthouse, tell us what you would like to see in terms of reporting/emails etc.

3. Timeframe:
- Start by signing an MoU (no money changes hands; we provide services/software/support, you help connect us to researchers, provide feedback)
- We evaluate collaboration after 6 months, see if anything needs to change
- Tools are free for 24 months, no other obligations.

Page 25:

Hivebench Features:

Fully-fledged electronic online notebook. Allows researchers to manage:
• Experiments,
• Protocols,
• Reagents,
• Research Data (integrated with Mendeley Data, or not).

Collaborative and confidential:
• Researchers can keep results private, or collaborate with group, or world to publish protocols
• Secure location in the cloud

Institutional edition (planned):
• Hivebench installed locally, on institutional server in secure offline environment
• Log-in with institutional credentials
• Tracking and reporting of metrics at group/individual level

Page 26:

Mendeley Data Features (today and tomorrow)

Trusted Data Repository
• Publish data under embargo: full control of visibility of datasets before and after publication
• Once published, DOI is assigned
• Published datasets stored (and accessible) in perpetuity in the DANS archive
• Data Seal of Approval certification

Flexible and Easy to Use
• Simple and intuitive user interface (à la Dropbox, Google Docs)
• Version management for longitudinal studies: new DOI for each version, enabling version citation
• Customised metadata schemas for each research project
• Upload data directly from university file systems, other electronic lab notebooks, Dropbox etc.
• Automatic tagging of datasets with keywords using Elsevier Fingerprint Engine

Integrated into Research Ecosystem
• Integrated with Mendeley reference manager and social network used by over 3 million researchers
• Integrated with GitHub, versioning can be updated with software version
• Integrated with Hivebench ELN for end-to-end research lifecycle management
• Integrated with Elsevier publishing platform (Evise) used by over 1,000 scientific journals
• Link datasets with other research outputs (articles, datasets, software etc.) to increase findability and re-use
• Files can be stored in the cloud
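The "new DOI for each version" feature can be pictured with a small data model. This is a minimal sketch, not Mendeley Data's implementation: the `Dataset` class, the placeholder prefix `10.0000` and the "base DOI plus version number" suffix scheme are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset whose every published version gets its own DOI."""
    base_doi: str                       # hypothetical, e.g. "10.0000/abc123"
    versions: list[str] = field(default_factory=list)

    def publish(self, description: str) -> str:
        """Record a new version and return its version-specific DOI."""
        self.versions.append(description)
        return self.doi_for(len(self.versions))

    def doi_for(self, version: int) -> str:
        """Build the DOI for a 1-based version; the suffix scheme is assumed."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        return f"{self.base_doi}.{version}"


ds = Dataset("10.0000/abc123")
first = ds.publish("baseline measurements")
second = ds.publish("added follow-up wave")
print(first, second)  # prints 10.0000/abc123.1 10.0000/abc123.2
```

Because earlier versions keep their own identifiers, a paper citing the first wave of a longitudinal study still resolves to exactly the data it used.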

Page 27:

Mendeley Data Institutional Features (mostly tomorrow)

Customized for Institutions:
• Seamless integration with Pure to link research data to people, departments, publications and projects
• Customised workflows that fit the way each research project team works and the rules of your institution
• Files can be stored on institutional network file system
• Provide DOI minting using institutional prefix
• Showcase research datasets externally on a web page with institutional branding
• Provide single sign-on for researchers using existing institutional credentials

Reporting and Analysis Tools:
• Reporting on impact of datasets including views, downloads and citations
• Reporting on compliance with funder data mandates by Grant ID
• Reporting on storage space used by person, project and department to ensure operation within assigned quotas

Page 28:

Data Lighthouse pilot, some questions:

General Research Data Management questions:
1. How does RDM work in your institution?
2. What roles do libraries, the research office and researchers play, respectively?
3. Do you have an institutional data policy?
4. Which departments are the higher/lower adopters?
5. What are the RDM tools available for your researchers? How well are they used?
6. Are you aware of negative/positive factors that may influence adoption rates?

Engagement questions:
7. How do you currently engage with researchers in the RDM space?
8. What additional services do you need?
9. Does the Data Lighthouse project resonate with your needs?
10. Are there any use cases/scenarios and metrics that we haven’t thought of?
11. Can we work together to improve adoption rates of RDM tools by your researchers?
12. Where would information re RDM processes come from, what format should it have?

Pilot questions: would you be interested in e.g.:
13. Organizing a joint workshop between Research Data Management key personnel of your institution and the Elsevier RDM team to refine the current Data Lighthouse project scope and requirements?
14. Running a test emailing campaign within 1-2 departments/labs followed by phone interviews with a few librarians and active researchers?

Page 29:

Support for Research Data Management with Data Lighthouse (mockups)

Data Lighthouse Dashboard

Datasets shared

Datasets linked

Datasets curated

Data articles submitted

Data articles published

Datasets viewed

Datasets cited
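The dashboard counters above could be derived from a stream of dated events. The sketch below is only an illustration of that idea: the metric names follow the mockup, but the event format and the `monthly_report` function are hypothetical and do not describe how Data Lighthouse is built.

```python
from collections import Counter
from datetime import date

# Metric names taken from the dashboard mockup; the event format is hypothetical.
METRICS = (
    "datasets_shared", "datasets_linked", "datasets_curated",
    "data_articles_submitted", "data_articles_published",
    "datasets_viewed", "datasets_cited",
)


def monthly_report(events: list[tuple[date, str]], year: int, month: int) -> dict[str, int]:
    """Count events per metric for one calendar month."""
    counts = Counter(
        metric for day, metric in events
        if day.year == year and day.month == month and metric in METRICS
    )
    # Report every metric, including those with no activity that month.
    return {metric: counts[metric] for metric in METRICS}


events = [
    (date(2016, 11, 3), "datasets_shared"),
    (date(2016, 11, 17), "datasets_viewed"),
    (date(2016, 11, 21), "datasets_viewed"),
    (date(2016, 12, 1), "datasets_cited"),
]
report = monthly_report(events, 2016, 11)
print(report["datasets_viewed"], report["datasets_cited"])  # prints 2 0
```

Reporting zeros explicitly keeps the monthly report the same shape every time, which makes month-to-month comparison straightforward.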

Page 30:

Links:

• RDM Projects:
- https://www.hivebench.com
- https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
- http://www.journals.elsevier.com/softwarex/
- https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
- https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
- https://rd-alliance.org/bof-data-search.html
- https://data.mendeley.com/
- https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
- https://www.force11.org/
- http://www.nationaldataservice.org/
- https://rd-alliance.org/
- https://www.elsevier.com/about/open-science/research-data

• Bourne Demo: Original Materials:
- The original research paper: Kinnings et al., 2010
- The paper describing the earlier reproducibility effort: Garijo et al., 2013
- A wiki with the reproduction attempt: Gil/Garijo, 2012
- Background materials on the reproduction efforts: Garijo, 2012
- SMAP Tool: Xie, 2010
- Protocol in Hivebench: https://www.hivebench.com/protocols/16483
- Experiment in Hivebench: https://www.hivebench.com/notebooks/8524/experiments/20562
- Data in Mendeley Data: https://data.mendeley.com/datasets/r69mvkckmn/draft?preview=1
- MethodsX Paper, with links to protocols and data: http://www.articleofthefuture.com/methodsx.html