| 1
Anita de Waard, VP Research Data Collaborations, Elsevier RDM
December 19, 2016
Elsevier's RDM Program: Ten Habits of Highly Effective Data
| 2
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
10. Integrate upstream and downstream – make metadata to serve use.
Save
Share
Use
9. Re-usable (allow tools to run on it)
8. Reproducible
7. Trusted (e.g. reviewed)
6. Comprehensible (description / method is available)
5. Citable
4. Discoverable (data is indexed or data is linked from article)
3. Accessible
2. Preserved (long-term & format-independent)
1. Stored (existing in some form)
A Maslow Hierarchy for Research Data:
| 3
Store, Preserve: Data Rescue Award
| 5
https://data.mendeley.com/
Linked to published papers – or not
Linked to Github – or not
Versioning and provenance tracking
Store, Access: Mendeley Data
Different licenses: GNU GPL, CC-BY, CC0, etc.
| 6
Access, Cite: Data Linking
• Integrated in paper submission process
• Supplementary data is never behind a firewall
• Closely integrated with > 150 databases:
| 7
Access, Discover: Scholix/DLIs
• ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot
• Cross-stakeholder – with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
| 8
Cite: Force11
https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier
| 10
Data articles
Software articles
Method articles
Protocols
Video articles
Hardware articles
Lab resources
Full Research paper
• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals
https://www.elsevier.com/books-and-journals/research-elements
Review: Research Elements
| 11
• Cortex Registered Reports:
- Method and proposed analysis are submitted for pre-registration
- Paper is conditionally accepted
- Research is executed
- Full paper submitted, accepted provided that protocol is followed
• Reproducibility Papers:
- Describe all the software and data used to derive the published results, and provide instructions on how to reproduce and validate such results.
- Using Mendeley Data, authors also submit their code, data, and optionally a ReproZip package or a Docker container to make the review process easier.
- Reviewers not only review the reproducibility paper, but also validate the results and claims published in the original manuscript.
- Once the paper is accepted, (non-blind) reviewers also become co-authors and are encouraged to add a section in the paper that states the extent to which the software is portable, is robust to changes, and is likely to be usable.
Reproduce: Some Journal Efforts:
| 12
Research article
published
Initial inquiry
Share, publish and
link data
Monitor progress and
provide guidance
Generate reports
What?
• Service for Research Institutes (esp. librarians) to engage with researchers throughout the research data life cycle.
How?
Offer service for Librarians to interact with researchers regarding the RDM Process to:
• Offer solutions to store, share, link and publish data
• Monitor progress: report on posting, citations and downloads of datasets
• Provide monthly reporting
Metrics for Institutions: Data Lighthouse
| 13
10. Integrate upstream and downstream – make metadata to serve use.
Save
Share
Use
9. Re-usable
8. Reproducible
7. Trusted
6. Comprehensible
5. Citable
4. Discoverable
3. Accessible
2. Preserved
1. Stored
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
Data at Risk
Reproducibility Initiative
Data Lighthouse
In summary: Elsevier Efforts, Collaborative Efforts
| 14
“Now show me how all of this works together… on one of my papers!”
• Phil Bourne, August 2016
See Demo
| 15
A Tale of (Ir)reproducibility
There once was a computational biology paper…
Kinnings et al. 2010, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000976
| 16
A Tale of (Ir)reproducibility
... that couldn’t be (easily) reproduced.
| 17
A Tale of (Ir)reproducibility
Some brave souls did reproduce it …
Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
| 18
A Tale of (Ir)reproducibility
… but it was a lot of work.
Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
| 19
Some tools to improve this:
1. Store protocols in an Electronic Lab Notebook.
Keep collection of protocols online
Edit, export, share
| 20
Some tools to improve this:
2. Run experiments from this Lab Notebook.
Edit, export, share
Base on saved Protocols
Save and Export Outputs
| 21
Some tools to improve this:
3. Export results to a trusted data repository.
Describe how experiment can be reproduced
Keep track of versions of dataset
Create DOI for Citation
Link back to protocols
Store up to 5 GB of data in many formats
| 22
Some tools to improve this:
4. Publish in a data journal & link back.
Journal focuses on Method reproduction
Link to protocols
Link to Data
Fully OA
| 23
The Moral of this Story:
• How are we improving the ‘old way of working’?
- Methods and data can be stored by researchers directly during the experiment, so the 270 hours of reproduction → 0 (given that the protocol is stored for reuse during the experiment)
- Better reproducibility because tools and methods are stored innately, no need to recap, rebuild, and recover
- More accurate workflow representation because progress is tracked while it happens, not just afterwards
• Are we there yet?
- We’re getting somewhere: “Your tools […], layer a UI on top of a whole set of disjointed components; this is ultimately what people want!” Phil Bourne, ADDS NIH
- But we’re not quite there:
o Need to run code from the tools (planned)
o Even easier exporting/publishing workflows (planned)
o Integration with other tools: ELNs, (institutional) repositories, journals, sharing platforms (planned)
| 24
A development partnership proposal:
1. You try out our tools:
- Institutional install for Hivebench
- Installation of Mendeley Data (in the cloud, later on local service)
- If interested: Data Lighthouse pilot
2. In return, you help us explore what these tools will look like:
- Connect to PIs/Postdocs/Grad Students who are interested in trying out Hivebench/Mendeley Data
- We ask them for feedback on the tools, help with any issues
- You explore Data Lighthouse, tell us what you would like to see in terms of reporting/emails etc.
3. Timeframe:
- Start by signing an MoU (no money changes hands; we provide services/software/support, you help connect us to researchers, provide feedback)
- We evaluate collaboration after 6 months, see if anything needs to change
- Tools are free for 24 months, no other obligations.
| 25
Hivebench Features:
Fully-fledged electronic online notebook. Allows researchers to manage:
• Experiments,
• Protocols,
• Reagents,
• Research Data (integrated with Mendeley Data, or not).
Collaborative and confidential:
• Researchers can keep results private, or collaborate with group, or world to publish protocols
• Secure location in the cloud
Institutional edition (planned):
• Hivebench installed locally, on institutional server in secure offline environment
• Log-in with institutional credentials
• Tracking and reporting of metrics at group/individual level
| 26
Mendeley Data Features (today and tomorrow)
Trusted Data Repository
• Publish data under embargo: full control of visibility of datasets before and after publication
• Once published, DOI is assigned
• Published datasets stored (and accessible) in perpetuity in the DANS archive
• Data Seal of Approval certification
Flexible and Easy to Use
• Simple and intuitive user interface (à la Dropbox, Google Docs)
• Version management for longitudinal studies: new DOI for each version, enable version citation
• Customised metadata schemas for each research project
• Upload data directly from university file systems, other electronic lab notebooks, Dropbox etc.
• Automatic tagging of datasets with keywords using Elsevier Fingerprint Engine
Integrated into Research Ecosystem
• Integrated with Mendeley reference manager and social network used by over 3 million researchers
• Integrated with Github, versioning can be updated with software version
• Integrated with Hivebench ELN for end to end research lifecycle management
• Integrated with Elsevier publishing platform (Evise) used by over 1,000 scientific journals
• Link datasets with other research outputs (articles, datasets, software etc.) to increase findability and re-use
• Files can be stored in the cloud
| 27
Mendeley Data Institutional Features (mostly tomorrow)
Customized for Institutions:
• Seamless integration with Pure to link research data to people, departments, publications and projects
• Customised workflows that fit the way each research project team works and the rules of your institution
• Files can be stored on institutional network file system
• Provide DOI minting using institutional prefix
• Showcase research datasets externally on a web page with institutional branding
• Provide single sign-on for researchers using existing institutional credentials
Reporting and Analysis Tools:
• Reporting on impact of datasets including views, downloads and citations
• Reporting on compliance with funder data mandates by Grant ID
• Reporting on storage space used by person, project and department to ensure operation within assigned quotas
| 28
Data Lighthouse pilot, some questions:
General Research Data Management questions:
1. How does RDM work in your institution?
2. What role do libraries, research office, researchers play, respectively?
3. Do you have an institutional data policy?
4. Which departments are the higher/lower adopters?
5. What are the RDM tools available for your researchers? How well are they used?
6. Are you aware of negative/positive factors that may influence adoption rates?
Engagement questions:
7. How do you currently engage with researchers in the RDM space?
8. What additional services do you need?
9. Does the Data Lighthouse project resonate with your needs?
10. Are there any use cases/scenarios and metrics that we haven’t thought of?
11. Can we work together to improve adoption rates of RDM tools by your researchers?
12. Where would information re RDM processes come from, what format should it have?
Pilot questions: would you be interested in e.g.:
13. Organizing a joint workshop between Research Data Management key personnel of your institution and the Elsevier RDM team to refine the current Data Lighthouse project scope and requirements?
14. Running a test emailing campaign within 1-2 departments/labs followed by phone interviews with a few librarians and active researchers?
| 29
Support for Research Data Management with Data Lighthouse (mockups)
Datasets shared
Datasets linked
Datasets curated
Data articles submitted
Data articles published
Datasets viewed
Datasets cited
Data Lighthouse Dashboard
| 30
Links:
• RDM Projects:
- https://www.hivebench.com
- https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
- http://www.journals.elsevier.com/softwarex/
- https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
- https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
- https://rd-alliance.org/bof-data-search.html
- https://data.mendeley.com/
- https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
- https://www.force11.org/
- http://www.nationaldataservice.org/
- https://rd-alliance.org/
- https://www.elsevier.com/about/open-science/research-data
• Bourne Demo: Original Materials:
- The original research paper: Kinnings et al., 2010
- The paper describing the earlier reproducibility effort: Garijo et al., 2013
- A wiki with the reproduction attempt: Gil/Garijo, 2012
- Background materials on the reproduction efforts: Garijo, 2012
- SMAP Tool: Xie, 2010
- Protocol in Hivebench: https://www.hivebench.com/protocols/16483
- Experiment in Hivebench: https://www.hivebench.com/notebooks/8524/experiments/20562
- Data in Mendeley Data: https://data.mendeley.com/datasets/r69mvkckmn/draft?preview=1
- MethodsX Paper, with links to protocols and data: http://www.articleofthefuture.com/methodsx.html