The Open Science Framework (OSF) at Notre Dame: Connecting the Workflow and Supporting the Research Mission

CNI Fall 2015 Membership Meeting, Washington, DC

Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame

Transcript of The Open Science Framework (OSF) at Notre Dame: Connecting the Workflow and Supporting the Research Mission

Page 1:

The Open Science Framework (OSF) at Notre Dame: Connecting the Workflow and Supporting the Research Mission

CNI Fall 2015 Membership Meeting, Washington, DC

Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame

The OSF at Notre Dame, CNI Fall 2015

Page 2:

12/15/2015 CNI Fall Mtg https://osf.io/s5e2b/

Page 3:

OSF Extensions & Pilots @ ND

Page 4:

I want to preserve my simulation method and results so other people can try it out.

[Figure: data and output, with items labeled DOI:10.XXXX, DOI:10.ZZZZ, DOI:10.CCCC, DOI:10.YYYY]

mysim.exe -in data -out output -p 10

… and repeat this 1M times with different -p values.
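To give a sense of what repeating the run for many -p values involves, here is a minimal Python sketch that generates one command line per parameter value. The helper name `sweep_commands` and the per-run output naming (`output_p10`, `output_p11`, ...) are assumptions made for illustration; the slide only shows the single `mysim.exe` invocation.

```python
import shlex
from typing import Iterable, Iterator, List


def sweep_commands(
    p_values: Iterable[int],
    exe: str = "mysim.exe",
    data: str = "data",
    out_prefix: str = "output",
) -> Iterator[List[str]]:
    """Yield one argument list per -p value.

    Each run writes to its own output name (an assumed convention) so that
    every result can later be identified and cited separately.
    """
    for p in p_values:
        yield [exe, "-in", data, "-out", f"{out_prefix}_p{p}", "-p", str(p)]


if __name__ == "__main__":
    # Print the first three commands of a sweep instead of executing them.
    for argv in sweep_commands(range(10, 13)):
        print(shlex.join(argv))
```

Even in this toy form, every value of -p produces a distinct output that would need its own identifier, which is the scaling problem the slide points at.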

Page 5:

But it’s not that simple!

Page 6:

I want to preserve my simulation method and results so other people can try it out.

data output

mysim.exe -in data -out output -p 10

config

calib HTTP GET

Green Goat Linux 57.83.09.B

libsimruby

X86-64 CPU / 64GB RAM / 200GB Disk

SIM_MODE=clever
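The hidden dependencies on this slide (inputs, operating system, hardware, environment variables) can be recorded next to a run. The following is a minimal Python sketch, assuming a hypothetical helper `describe_run` and a record layout invented for illustration. It does not capture linked libraries such as libsimruby or data fetched over HTTP, which would need additional tooling.

```python
import hashlib
import json
import os
import platform
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_run(
    argv: List[str],
    input_files: Iterable[str],
    env_names: Iterable[str],
) -> Dict[str, object]:
    """Collect a small, illustrative record of the context a command runs in."""
    return {
        "command": list(argv),
        "hardware": {"arch": platform.machine()},
        "os": {"name": platform.system(), "release": platform.release()},
        # Only the named variables are recorded; unset ones appear as None.
        "environ": {name: os.environ.get(name) for name in env_names},
        # Checksums let a later reader verify they hold the same inputs.
        "inputs": {str(p): sha256_of(Path(p)) for p in input_files},
    }


if __name__ == "__main__":
    record = describe_run(
        ["mysim.exe", "-in", "data", "-out", "output", "-p", "10"],
        input_files=[],          # e.g. the config and calibration files
        env_names=["SIM_MODE"],
    )
    print(json.dumps(record, indent=2))
```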

Page 7:

Your application works perfectly today on your machine.
Will your application still work next month?
Will your application still work next year?
Will your application still work 10 years later?
Will your application still work today on another machine?

Challenges of Reproducible Computing

Page 8:

The DASPOS Project Team includes computer science experts from the University of Notre Dame and the University of Chicago, physicists from the ATLAS and CMS experiments at the LHC, the DØ experiment at the Tevatron, experts in other data-intensive fields such as bioinformatics and astrophysics, and digital librarians with broad experience in the preservation of large datasets in the sciences and humanities.

The DASPOS project has been funded in whole or in part with Federal funds from the National Science Foundation, under Award No. 1247316.

daspos.org

Page 9:

Goal and Scope of Project

The goal of DASPOS is to “scout out” solutions to the most pressing technical problems, and make them available to those constructing preservation systems. In particular, this project will:

• Establish a dialogue with other fields facing preservation and re-use issues with Big Data. Identify areas of commonality and outline where solutions diverge due to specific needs.

• Develop metadata to support the preservation and re-use of HEP data, and its related software and computational algorithms. Design the metadata so as to meet the needs of as many other fields as possible for wide re-use.

• Define a reference architecture for a data preservation system targeted for HEP but coordinated with other fields. Include decision points where policy choices impact the architectural structure.

• Develop a preservation validation test-bed on which a technical implementation of the reference architecture can be developed and constructed.

• Perform a Curation Challenge, where a physics data analysis is conducted based solely on curated and archived data.

daspos.org

Page 10:

VecNet’s Malaria Modelers - Share Simulations & Results

Page 11:

VecNet Digital Library

Our digital library software stack and features were first developed and presented for beta feedback in 2013:

Brower D, Lakshminarayanan B, Meyers N. Multiple Identities: Managing Authorities in Repositories and Digital Collections presented at American Library Association Annual Conference, Chicago, IL 2013.

and then again at last year’s ACM/IEEE JCDL conference:

Barker M, Brower D, and Meyers N. Vector-Borne Disease Network Digital Library presented at Digital Libraries 2014 IEEE (978-1-4799-5569-5), London, UK, Sept 9, 2014.

Page 12:

dl.vecnet.org Vector-borne Disease Network

VecNet digital library supports mathematical modeling of malaria transmission & eradication.

It is a repository for curating & sharing information about simulations used to model malaria transmission & the impact of interventions

Contains: field, lab, survey, climate, demographic, and simulation data, input file code snippets, input file sets for models, simulations, tagged bibliographic citations, articles, maps, reports and more on entomology, epidemiology, demography, climatology, and interventions

VecNet Digital Library

Page 13:

Boehm, R. and Meyers N. Repository Platforms for Research Data: VecNet Use Case presented at Research Data Alliance (RDA) 6th Plenary Meeting, Paris, Sept 25, 2015.

Dynamic Data Citation & Repositories for Research Data

Meyers N. Dynamic Data Citation: VecNet Use Case presented at Federation of Earth Science Information Partners’ Winter Meeting Dynamic Data Citation Workshop, Washington, D.C., Jan 8, 2015.

Meyers N. VecNet Digital Library & Data Citation for Simulations presented at Institute for Disease Modeling 3rd Annual Modeling Symposium, Bellevue, Washington, April 22, 2015

Page 14:

• Attended Andrew Sallans’ talk “Improving Integrity, Transparency, and Reproducibility Through Connection of the Scholarly Workflow” during NISO’s Virtual Conference: Scientific Data Management: Caring for Your Institution and its Intellectual Wealth, February 18, 2015.
• Attended Open Repositories ’15 and was attracted to OSF features.
• Hosted a Panel Presentation of the CoS Reproducibility Projects at Notre Dame’s Center for Digital Scholarship, Sept 9, 2015.

Getting to Know OSF

Page 15:

• Integrating our Institutional Repository w/OSF (CAS Authentication)
• Embarked on NDS Dashboard integration w/CRC & Ian Taylor
• Piloting registration of select VecNet malaria data files in OSF
• Testing Umbrella Software Preservation tool interactivity with OSF (OpenMalaria simulation execution Use Case)
• Working on a reproducible software engineering environment by creating and documenting a reproducible development environment for the OSF framework
  • OpenStack images to run OSF frontend/backend services on CRC resources, and Vagrant/VirtualBox files for use by developers on their laptops (ongoing)

OSF Related Ongoing Efforts at ND

Page 16:

Why OSF and an Institutional Repository?

1. Why integrate OSF w/CurateND? -> Start Staging Data for Preservation & initial sharing between collaborators

2. Institutional Branding and Central Authentication -> Fosters Ease of Use & Trust Among Institutional Researchers

3. Group Role Enhancements -> Hierarchical Lab Roles

4. Storage Source Configuration -> Flexibility of Resources

5. Integration with Computational Environment -> Access to HPC & Reuse

6. Metadata Enhancements to OSF -> Incrementally & automatically add Metadata prior to a preservation phase effort

7. Push OSF Project Snapshot (aka Registration) to CurateND -> EZ deposit to Institutional Repository preservation storage encourages institutional data preservation

Page 17:

CurateND Institutional Repository OSF Integration

Contact: Rick Johnson [email protected]

Page 18:

NDS OSF Dashboard integration

http://www.nationaldataservice.org/
http://ndspilot.com

Contact: Ian Taylor [email protected]

bitbucket.org/nds-org/nds-dashboard

Page 19:

Umbrella: Ensuring executable software preservation & reuse

http://ccl.cse.nd.edu/software/umbrella

A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids

Page 20:

• Specify the execution environment clearly
  -- Hardware, Kernel, OS, Software, Data, Environment Variables
• Materialize the execution environment at runtime automatically
  -- No need to configure environment manually
  -- Matching evaluation & choose minimal mechanism
• Loosely coupled with sandbox techniques:
  -- Parrot, chroot, VM, Docker
• Construct sandbox through mounting mechanisms without copying
  -- multiple namespaces can be constructed concurrently
• Utilize more computing resources:
  -- Local Machine, Grid, Cloud

Umbrella Features

Makes Applications Portable and Reproducible
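The idea of matching a specified environment against the host and choosing a minimal sandbox mechanism can be illustrated with a short Python sketch. This is not Umbrella's actual algorithm or specification format: the `Platform` fields, the "native" outcome, the lightest-to-heaviest ordering, and the matching rules are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Platform:
    """A deliberately tiny description of an execution environment."""
    arch: str
    kernel: str
    os_name: str


# Assumed ordering from lightest to heaviest mechanism (illustrative only).
LIGHT_TO_HEAVY = ("parrot", "chroot", "docker", "vm")


def choose_mechanism(required: Platform, host: Platform, available: Set[str]) -> str:
    """Pick the lightest available mechanism that can bridge host and spec."""
    if required == host:
        # Nothing to bridge: run directly on the host.
        return "native"
    if (required.arch, required.kernel) == (host.arch, host.kernel):
        # Only the OS userland differs, so mechanisms that share the host
        # kernel are candidates, with a VM as the last resort.
        candidates = LIGHT_TO_HEAVY
    else:
        # A different architecture or kernel is assumed to need a full VM.
        candidates = ("vm",)
    for name in candidates:
        if name in available:
            return name
    raise RuntimeError("no available mechanism can provide the environment")


if __name__ == "__main__":
    spec = Platform(arch="x86_64", kernel="linux-2.6", os_name="centos-6")
    host = Platform(arch="x86_64", kernel="linux-2.6", os_name="debian-8")
    print(choose_mechanism(spec, host, {"docker", "vm"}))
```

The point of the sketch is the decision order: compare the specification with the host first, and fall back to heavier isolation only when a lighter one cannot provide what is missing.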

Page 21:

Umbrella & OSF The Open Malaria Use Case

A Tool for Ensuring executable software preservation & reuse

Page 22:

Umbrella & Open Malaria Use Case Contacts

For more information about Umbrella: The Cooperative Computing Lab, http://ccl.cse.nd.edu

Haiyan Meng [email protected]

Douglas Thain [email protected]

About Open Malaria Use Case: The Center for Research Computing, http://crc.nd.edu

Alex [email protected]

Please use the following citation for Umbrella in a scientific publication: Haiyan Meng and Douglas Thain, Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids, Workshop on Virtualization Technologies in Distributed Computing (VTDC) at HPDC, June, 2015. DOI: 10.1145/2755979.2755982

Page 23:

Learning from DASPOS, Umbrella & NDS / OSF

• Repositories: Will they take provisional data, active data . . . ?
• Compatibility: Can we plug into existing tools? Diff? Jupyter?
• Software Preservation Layers: Preserve program binaries, or sources + compilers, or something else? (Parrot, Umbrella, Prune . . . )
• Naming: Tension between usability and durability: URLs, DOIs, PIDs, UUIDs, HMACs, . . .
• Complexity of Composition: Connect systems together? NDS? OSF? CurateND?
• Citation: Dynamic? Static? For publication? For reuse?
• Usability: Do users have to change behavior?
• Overhead: Tools must be close to native performance, or they won’t get used.
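The naming tension can be made concrete with a small Python sketch that produces three kinds of machine-generated names for the same bytes. The function names are hypothetical and the choice of SHA-256 is an assumption; the slides do not say which schemes the projects use.

```python
import hashlib
import hmac
import uuid


def content_id(data: bytes) -> str:
    """Content-derived name: identical bytes always yield the same name."""
    return hashlib.sha256(data).hexdigest()


def keyed_id(data: bytes, key: bytes) -> str:
    """HMAC name: content-derived, but only reproducible by key holders."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def random_id() -> str:
    """UUID name: unique and stable once assigned, but says nothing about content."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    payload = b"simulation output for -p 10"
    print("sha256:", content_id(payload))
    print("hmac:  ", keyed_id(payload, b"example-key"))
    print("uuid:  ", random_id())
```

All three are durable in the sense that they do not depend on where the file lives, and none is something a person would want to type, which is the usability side of the trade-off against URLs and DOIs.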

Page 24:

NDS dashboard enhancements, including backend container toolkit development:

• Fix bugs that cause exceptions for valid operations.
• Optimize the toolkit to reduce time taken to perform tasks.
• Implement post operation to support the uploading of files into OSF storage providers.
• Automate the uploading of the diff of the containers for each run into OSF storage.
• Support VNC working on a container: users can pull up a remote desktop to a container and view remote desktop apps, e.g. Pegasus workflow.
• Backend integration of Jupyter notebooks.
• Front end spawning of these which manages the state, i.e. spawn notebook, allow editing, and then copy edited content back into the OSF storage to update content.

OSF ND Immediate Next Efforts
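One way to think about uploading only the diff of a container for each run is to compare file manifests taken before and after the run. The sketch below is a minimal Python illustration under that assumption: manifests are plain dictionaries mapping a path to a checksum, the names `diff_manifests` and `files_to_upload` are hypothetical, and no OSF upload call is shown because the slides do not describe one.

```python
from typing import Dict, List, NamedTuple


class ManifestDiff(NamedTuple):
    added: List[str]
    changed: List[str]
    removed: List[str]


def diff_manifests(before: Dict[str, str], after: Dict[str, str]) -> ManifestDiff:
    """Compare two {path: checksum} manifests taken before and after a run."""
    added = sorted(p for p in after if p not in before)
    changed = sorted(p for p in after if p in before and after[p] != before[p])
    removed = sorted(p for p in before if p not in after)
    return ManifestDiff(added, changed, removed)


def files_to_upload(diff: ManifestDiff) -> List[str]:
    """Only new or modified files need to be sent to storage."""
    return diff.added + diff.changed


if __name__ == "__main__":
    before = {"bin/mysim.exe": "aaa", "data/in.dat": "bbb", "tmp/scratch": "ccc"}
    after = {"bin/mysim.exe": "aaa", "data/in.dat": "bbb2", "out/result.dat": "ddd"}
    diff = diff_manifests(before, after)
    print(diff)
    print(files_to_upload(diff))
```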

Page 25:

OSF can be useful in other projects:

• Spatial Repellents Trial: piloting use for master files management

• EU-funded Switch Project is considering use of the OSF: http://www.switchproject.eu/

• Collaboration with USC's Institute for Information Sciences on RACE: Repository and Workflows for Accelerating Circuit Realization.
  • RACE is developing a trusted repository for integrated circuit designs.
  • OSF / NDS Dashboard can be extended and integrated with the Pegasus Workflow system and interface to CurateND for long term circuit designs' preservation.

Potential Future OSF Projects

Page 26:

12/15/2015 CNI Fall Mtg https://osf.io/s5e2b/

Page 27:

Contact:

Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame

The OSF at Notre Dame, CNI Fall 2015

More Info:

osf.io
cos.io
crc.nd.edu
library.nd.edu
daspos.org
vecnet.org
ccl.cse.nd.edu
nationaldataservice.org

Link to this Presentation: https://osf.io/s5e2b/