The Research Object Initiative: Frameworks and Use Cases

The Research Object Initiative: Frameworks and Use Cases. Professor Carole Goble, The University of Manchester, UK. NIH BD2K BioCADDIE webinar, 11 June 2015

Transcript of The Research Object Initiative:Frameworks and Use Cases

Page 1: The Research Object Initiative:Frameworks and Use Cases

The Research Object Initiative: Frameworks and Use Cases

Professor Carole Goble, The University of Manchester, UK

NIH BD2K BioCADDIE webinar, 11 June 2015

Page 2: The Research Object Initiative:Frameworks and Use Cases

From Manuscripts to Research Objects

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

• Datasets, data collections
• Standard operating procedures
• Software, algorithms
• Configurations, tools and apps, services
• Codes, code libraries
• Workflows, scripts
• System software
• Infrastructure
• Compilers, hardware

Page 3: The Research Object Initiative:Frameworks and Use Cases

Scattered Assets

Slideshare

Github

figshare

Community db

Arxiv.org

Page 4: The Research Object Initiative:Frameworks and Use Cases
Page 5: The Research Object Initiative:Frameworks and Use Cases

Concept

Page 6: The Research Object Initiative:Frameworks and Use Cases

Drivers for Research Objects (1)

• Computational Workflows / Scripts
– Multi-step, nested
– Data, executable codes, services (remote and local), libraries
– Preservation, Repair
– Reproducibility

• Systems Biology
– Models, data (construction, validation, predicted), SOPs, samples
– Structured around Investigations, Studies, Assays
– Exchange
– Reproducibility

Page 7: The Research Object Initiative:Frameworks and Use Cases

Drivers for Research Objects (2)

• Computational Workflows Commons
– Projects and individuals
– myExperiment.org

• Systems Biology Commons
– Modellers and experimentalists
– Projects and Programs
– Catalogue of research assets
– Fairdomhub.org
– Fair-dom.org
– Seek4science.org

Page 8: The Research Object Initiative:Frameworks and Use Cases

"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al

Workflow Commons

Page 9: The Research Object Initiative:Frameworks and Use Cases

https://doi.org/10.15490/seek.1.investigation.56

Page 10: The Research Object Initiative:Frameworks and Use Cases
Page 11: The Research Object Initiative:Frameworks and Use Cases
Page 12: The Research Object Initiative:Frameworks and Use Cases

[Snoep, 2015]

https://doi.org/10.15490/seek.1.investigation.56

Penkler et al (2015) FEBSJ 282:1481-1511.

Page 13: The Research Object Initiative:Frameworks and Use Cases

https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/

Advert!!!

Page 14: The Research Object Initiative:Frameworks and Use Cases

[Diagram: the Commons as platform and haven, between Producers and Consumers]

Components: Local Repositories, LIMS, Public Repositories, Central repositories, Funding Agencies, Publishers, CRIS, Research Infrastructures, Catalogue, Search Index, Tools, Standards

Link labels: harvesting, linkage, deposit, execute, submission, access, metadata, results, gateway, catalogue, companion site

Page 15: The Research Object Initiative:Frameworks and Use Cases

Research Objects

1. Multi-various, citable research products

Page 16: The Research Object Initiative:Frameworks and Use Cases

Research Objects

2. Compound, nested, scattered, yet interconnected research products, structured investigations

Page 17: The Research Object Initiative:Frameworks and Use Cases

Research Objects

3. Preserved, portable research products, inter-platform exchange, reproducibility

Pop-up projects

Dynamic groups

Internal / external visibility

Commons

Page 18: The Research Object Initiative:Frameworks and Use Cases

Research Objects

4. Active research products: evolving, executable

• Fork
• Merge
• Version
• Cite
• Snapshot
• Live

[Martin Scharm]

Haus et al, BMC Systems Biology, 2011, 5:10

Solvent production by Clostridium acetobutylicum

Page 19: The Research Object Initiative:Frameworks and Use Cases

Bigger on the inside than the outside

cite? resolve? steward?

TARDIS: Time and Relative Dimension in Space

[Diagram labels]

Content: closed, embed, fixed, local; open, alien, refer, fluid

Scholarship: Multi, Span; type, steward, site, author; research, researchers, platforms, time

Contributions

Page 20: The Research Object Initiative:Frameworks and Use Cases


Page 21: The Research Object Initiative:Frameworks and Use Cases

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

[Diagram labels: time, transfer, Knowledge Turning, interpret, Commons, FAIR, Research Products, Reproducibility, Interpretation, Comparison, Preservation, Portability, Release, Active Research, Research Object, means, ends, driver]

http://ccrtypewriter.blogspot.co.uk/

Page 22: The Research Object Initiative:Frameworks and Use Cases

Framework

Page 23: The Research Object Initiative:Frameworks and Use Cases

Multi-various products, platforms, resources

First class citizens: id, manage, credit, track, profile, focus

A Framework to Bundle, Port and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context. Units of exchange.

Research Objects

http://www.researchobject.org

Page 24: The Research Object Initiative:Frameworks and Use Cases

The Research Object Framework

Desiderata

Standards. Machine-processable. Technology independent.

Multi-platform.

Incremental.

The least possible. The simplest feasible.

Graceful degradation.

Standard tooling

Page 25: The Research Object Initiative:Frameworks and Use Cases

Research Object Framework

Principles & Conventions

API specification

Metadata formats

RO Core model using standards

Annotation profiles, progressive extensions

Adobe UCF

ORE

ODF

OADM/PROV

Page 26: The Research Object Initiative:Frameworks and Use Cases

Research Object Framework

Principles & Conventions

API specification

Platform Profiles using legacy & commodity platforms

Metadata formats

Policies

Services

Tools

Lifecycle

Stewardship

Training

Commodity

Native

RO Core model using standards

Annotation profiles, progressive extensions

Adobe UCF

ORE

ODF

OADM/PROV

Page 27: The Research Object Initiative:Frameworks and Use Cases

Identity

Aggregation

Interpretation:

The objects

How they are linked together

RO Core Model

manifest

Refer to aggregations and their contents

Describe group & constituents

External ids

Local files

Attribution:

Who, when, where, why?

Metadata

Description

Page 28: The Research Object Initiative:Frameworks and Use Cases

RO Core Model

manifest: point of extendability

Identity: persistence and resolution, names, citation (DOIs, URIs, Handles, ORCID)

Aggregation: aggregations, resource maps, proxies (OAI-ORE)

Annotation: first class and stand-off (W3C OADM)
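The identity layer on this slide names DOIs, URIs, Handles and ORCID. As a rough sketch of how a tool might normalise such identifiers into resolvable URIs before writing them into a manifest: the function name `to_resolvable_uri` and the `doi:` / `hdl:` / `orcid:` prefix convention are assumptions made for illustration, not something the slides specify. The base URLs are the public resolvers for each scheme.

```python
# Hypothetical helper: map compact identifiers onto resolvable URIs.
# The prefix convention (doi:, hdl:, orcid:) is an assumption of this sketch.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "orcid": "https://orcid.org/",
}


def to_resolvable_uri(identifier: str) -> str:
    """Return an http(s) URI for a DOI, Handle, ORCID iD or plain URI."""
    identifier = identifier.strip()
    # Plain web URIs are already resolvable and pass through unchanged.
    if identifier.startswith(("http://", "https://")):
        return identifier
    scheme, sep, value = identifier.partition(":")
    if sep and scheme.lower() in RESOLVERS:
        return RESOLVERS[scheme.lower()] + value.strip()
    raise ValueError(f"unrecognised identifier scheme: {identifier!r}")
```

For example, `to_resolvable_uri("doi:10.5281/zenodo.10440")` yields the `https://doi.org/` form of the RO Bundle DOI cited later in the deck.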

Page 29: The Research Object Initiative:Frameworks and Use Cases

RO Core Platform Profiles

Identity: DOIs, URIs, Handles, ORCID, Data Citation Implementation

Aggregation: OAI-ORE

Annotation: W3C OADM

Page 30: The Research Object Initiative:Frameworks and Use Cases

RO Model Ontology

http://w3id.org/ro/

Defines core concepts of research objects, identity, aggregation, annotation. Used in the manifest

Page 31: The Research Object Initiative:Frameworks and Use Cases

Metadata Objects

Manifest

The Container Manifest: content and the relationships between the content
• RO metadata – id, title, creator, status…
• Aggregates – list of ids/links to resources
• Annotations – list of annotations about resources

The Objects
• Remote, through links
• Locally, embedded
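A minimal sketch of the three manifest parts listed above (RO metadata, aggregates, annotations), assuming a JSON manifest. The key names loosely follow the RO Bundle manifest as I understand it; `title` in particular, the function `build_manifest`, and the example paths are assumptions for illustration rather than the normative format.

```python
import json


def build_manifest(ro_id, title, creator, resources, annotations):
    """Assemble a manifest-like dict: RO metadata, aggregates, annotations.

    resources   -- list of ids/links (remote URIs or local paths)
    annotations -- list of (about, content) pairs, both ids/links
    """
    return {
        "@context": ["https://w3id.org/bundle/context"],
        # RO metadata
        "id": ro_id,
        "title": title,  # assumed key, for illustration only
        "createdBy": {"name": creator},
        # Aggregates: remote objects through links, local objects by path
        "aggregates": [{"uri": uri} for uri in resources],
        # Annotations: stand-off statements about aggregated resources
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


if __name__ == "__main__":
    manifest = build_manifest(
        "/",
        "Example research object",
        "A. Researcher",  # placeholder creator
        ["https://doi.org/10.15490/seek.1.investigation.56", "/model.xml"],
        [("/model.xml", "/.ro/annotations/model-notes.txt")],
    )
    print(json.dumps(manifest, indent=2))
```

The remote entry reuses the SEEK investigation DOI shown earlier in the deck; the local path and annotation file are hypothetical.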

Page 32: The Research Object Initiative:Frameworks and Use Cases

Manifest – remote and local

on my machine

Page 33: The Research Object Initiative:Frameworks and Use Cases

Container Machinery

Manifest

The Container

Packaging: Zip files, Docker images…

Catalogues & Commons: FAIRDOM SEEK, Farr Commons CKAN, myExperiment…

The Container Manifest content and the relationships between the content

Page 34: The Research Object Initiative:Frameworks and Use Cases

Export, archive, publish and transfer ROs.

File format for storage and distribution of ROs as a ZIP archive

Includes an RO’s manifest, annotations and some or all of its aggregated resources

Basis for more specific file formats

Backwards compatible: it's ZIP

Programmatic access: JSON and JSON-LD manifest, API

https://researchobject.github.io/specifications/bundle/

https://w3id.org/bundle/ doi:10.5281/zenodo.10440
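Because a bundle is an ordinary ZIP archive, standard tooling is enough to produce one. The sketch below uses Python's `zipfile` module and assumes the layout shown on the Taverna slide later in the deck (a `mimetype` entry plus `.ro/manifest.json`). Writing `mimetype` first and uncompressed is the usual convention for ZIP-based container formats and is assumed here; `write_bundle` is a hypothetical helper, not one of the RO libraries.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, manifest, files):
    """Write a bundle-like ZIP to a path or file object.

    manifest -- dict serialised to .ro/manifest.json
    files    -- mapping of archive path -> bytes or str content
    """
    with zipfile.ZipFile(target, "w") as zf:
        # Media type entry first and stored (uncompressed).
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(
            ".ro/manifest.json",
            json.dumps(manifest, indent=2),
            compress_type=zipfile.ZIP_DEFLATED,
        )
        # Aggregated resources that travel inside the bundle.
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_bundle(
        buffer,
        {"id": "/", "aggregates": [{"uri": "/outputA.txt"}]},
        {"outputA.txt": b"example output"},
    )
    print(len(buffer.getvalue()), "bytes written")
```

Any unzip tool can open the result, which is the "backwards compatible" point made on the slide.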

Page 35: The Research Object Initiative:Frameworks and Use Cases

https://researchobject.github.io/specifications/bundle/

https://w3id.org/bundle/ doi:10.5281/zenodo.10440

Page 36: The Research Object Initiative:Frameworks and Use Cases

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

RO Lifecycles, Resolution, Citation

• Defend it (snapshot)
• Locate it (most recent)
• Reuse it (a version, a component)
• Credit it (contributory authorship)
• Cross link it (connections)

PURL
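One way to picture the difference between a snapshot to defend, the most recent state to locate, and a specific version to reuse is a living object that accumulates frozen copies. This is a toy model only: the class `LivingRO`, its method names and the use of a SHA-256 digest as a fixity check are all assumptions of this sketch, not part of the Research Object specifications.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LivingRO:
    """Hypothetical living research object with frozen snapshots."""

    manifest: dict
    snapshots: list = field(default_factory=list)

    def snapshot(self) -> dict:
        """Freeze the current manifest so that it can be cited and defended."""
        frozen = copy.deepcopy(self.manifest)
        digest = hashlib.sha256(
            json.dumps(frozen, sort_keys=True).encode("utf-8")
        ).hexdigest()
        record = {
            "version": len(self.snapshots) + 1,
            "sha256": digest,
            "manifest": frozen,
        }
        self.snapshots.append(record)
        return record

    def latest(self) -> Optional[dict]:
        """Locate the most recent snapshot, if any exists."""
        return self.snapshots[-1] if self.snapshots else None

    def resolve(self, version: int) -> dict:
        """Reuse a specific version."""
        for record in self.snapshots:
            if record["version"] == version:
                return record
        raise KeyError(f"no snapshot with version {version}")
```

Later edits to the living manifest do not alter earlier snapshots, because each snapshot holds a deep copy.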

Page 37: The Research Object Initiative:Frameworks and Use Cases

Checklists

Versioning

Provenance

Dependencies

Annotation Profiles

Depth: how deeply described

Coverage: how much is covered

Progression levels

Semantic Framework

PID

The Manifest

The Object Metadata

PAV

VoID

VIVO-ISF

PAV

Mim Ontology

Puppet, Makefile

More detail, fewer stakeholders

Less detail, more stakeholders

Page 38: The Research Object Initiative:Frameworks and Use Cases

Checklists

Gamble M, Goble CA, Klyne G, Zhao J. Mim: A minimum information model vocabulary and framework for scientific linked data. IEEE 8th Intl Conf on eScience, pp. 1-8.

Zhao J, Klyne G, Gamble M, Goble CA. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proc Third Linked Science Workshop 2013, co-located with ISWC 2013.
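To give a feel for what a checklist over a research object might do, here is a small sketch that evaluates requirements against a manifest-like dict. It does not implement the Mim vocabulary from the papers above. The MUST / SHOULD levels, the three requirements, the result labels and every name in the code are illustrative assumptions.

```python
from typing import Callable, List, NamedTuple, Tuple


class Requirement(NamedTuple):
    level: str  # "MUST" or "SHOULD" (illustrative levels)
    description: str
    check: Callable[[dict], bool]


# Hypothetical checklist; real profiles would be community-defined.
CHECKLIST = [
    Requirement("MUST", "has an identifier", lambda m: bool(m.get("id"))),
    Requirement(
        "MUST",
        "aggregates at least one resource",
        lambda m: len(m.get("aggregates", [])) > 0,
    ),
    Requirement("SHOULD", "names a creator", lambda m: bool(m.get("createdBy"))),
]


def evaluate(manifest: dict, checklist=CHECKLIST) -> Tuple[str, List[str]]:
    """Return (status, descriptions of unmet requirements)."""
    failed = [req for req in checklist if not req.check(manifest)]
    if any(req.level == "MUST" for req in failed):
        status = "fail"
    elif failed:
        status = "pass_with_warnings"
    else:
        status = "pass"
    return status, [req.description for req in failed]
```

Separating MUST from SHOULD lets a sparse research object still pass while reporting what a fuller description would add.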

Page 39: The Research Object Initiative:Frameworks and Use Cases

Library

Publishers

Experiments

Type specific

PID

Citation

NISO-JATS

Dublin Core

ISA

MIAME

Wf-Desc

Checklist

Annotation Profiles

OBI

SBML, SED-ML

JERM

EXPO

Wf-prov

Gamble M, Goble CA, Klyne G, Zhao J. Mim: A minimum information model vocabulary and framework for scientific linked data. IEEE 8th Intl Conf on eScience, pp. 1-8.

Page 40: The Research Object Initiative:Frameworks and Use Cases

Use Cases

Page 41: The Research Object Initiative:Frameworks and Use Cases

Use case

• SEEK Commons for Systems Biology
• Natively RO
• Export/Import RO bundles

Page 42: The Research Object Initiative:Frameworks and Use Cases

SEEK Metadata framework link studies and link assets

Describes common elements and relationships between things produced and used in experiments.

Structured descriptions for consistency and comparison

Just Enough Results Model

Page 43: The Research Object Initiative:Frameworks and Use Cases

Snapshots & Living

Living ROs

Snapshot RO of investigation and all its parts

Page 44: The Research Object Initiative:Frameworks and Use Cases

Community Sys Bio Models metadata + packaging

Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014)

Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369

Combine with RO. Standardised metadata & API

http://co.mbine.org/documents/archive

OMEX

https://github.com/stain/ro-combine-archive

doi:10.5281/zenodo.10439

Page 45: The Research Object Initiative:Frameworks and Use Cases

Bridge from Research to FAIR publishing

Deposit

Run

2

Page 46: The Research Object Initiative:Frameworks and Use Cases

RO Unzip

Page 47: The Research Object Initiative:Frameworks and Use Cases
Page 48: The Research Object Initiative:Frameworks and Use Cases
Page 49: The Research Object Initiative:Frameworks and Use Cases

RO Query

Page 50: The Research Object Initiative:Frameworks and Use Cases

Use Case: Taverna Workflows

Page 51: The Research Object Initiative:Frameworks and Use Cases

Workflow Results

workflowrun.prov.ttl (RDF)

outputA.txt

outputC.jpg

outputB/

https://w3id.org/bundle

intermediates/

1.txt

2.txt

3.txt

de/def2e58b-50e2-4949-9980-fd310166621a.txt

inputA.txt

workflow

URI references

attribution

execution environment

Aggregating in Research Object

ZIP folder structure (RO Bundle)

mimetype

application/vnd.wf4ever.robundle+zip

.ro/manifest.json
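Given the folder structure above, a consumer can inspect a bundle with nothing more than a ZIP reader. The sketch below checks the `mimetype` entry, loads `.ro/manifest.json`, and separates aggregated resources found inside the archive from those that are only referenced by URI. It assumes each `aggregates` entry is a dict with a `uri` key and that local paths are written with a leading slash. `inspect_bundle` is a hypothetical name, not the API of the Java, Ruby or Python RO libraries.

```python
import io
import json
import zipfile

EXPECTED_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def inspect_bundle(source):
    """Return {'local': [...], 'remote': [...]} for a bundle's aggregates."""
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != EXPECTED_MIMETYPE:
            raise ValueError(f"unexpected mimetype: {mimetype!r}")
        manifest = json.loads(zf.read(".ro/manifest.json"))

    local, remote = [], []
    for entry in manifest.get("aggregates", []):
        ref = entry["uri"]
        # A reference counts as local when the archive contains that path.
        if ref.lstrip("/") in names:
            local.append(ref)
        else:
            remote.append(ref)
    return {"local": local, "remote": remote}


if __name__ == "__main__":
    # Build a tiny in-memory bundle using file names from the slide.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("mimetype", EXPECTED_MIMETYPE)
        zf.writestr("outputA.txt", "example output")
        zf.writestr("workflowrun.prov.ttl", "# provenance would go here")
        zf.writestr(
            ".ro/manifest.json",
            json.dumps(
                {
                    "id": "/",
                    "aggregates": [
                        {"uri": "/outputA.txt"},
                        {"uri": "/workflowrun.prov.ttl"},
                        {"uri": "https://w3id.org/bundle"},
                    ],
                }
            ),
        )
    buffer.seek(0)
    print(inspect_bundle(buffer))
```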

Page 52: The Research Object Initiative:Frameworks and Use Cases

Workflow Specification

Example data and config.

Components.

Plug-ins, Versions

Workflow System

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Portability

Preserving

Repair

Reproduce

Report

Asset specific Commons

Personal Notebook

Community Registry

General Publishing Repository

Page 53: The Research Object Initiative:Frameworks and Use Cases

Use case: ATLAS Collider Data Analytics

Portable, lightweight application runtime and packaging tool.

Image

ATLAS and CMS detector data

Charles Vardeman, Da Huo

All data and files of the execution + Instructions

convert

bundle

manifest

Relate files and layers

Add provenance and annotations

Link in other content

run

read

archive

Page 54: The Research Object Initiative:Frameworks and Use Cases

Use case: The Farr Institute

Commons

safe use of patient and research data for medical research

clinical study cohorts

Research Objects: scripts, data, samples…

different e-Labs, legacy data

http://www.farrinstitute.org/

Page 55: The Research Object Initiative:Frameworks and Use Cases

Use case: The Farr Institute

Commons

The open source data portal software

exchange

catalogue

deposit

Page 56: The Research Object Initiative:Frameworks and Use Cases


Page 57: The Research Object Initiative:Frameworks and Use Cases

Uses “code as a research object” functionality

Page 58: The Research Object Initiative:Frameworks and Use Cases

Baking RO Infrastructure

make, import, export, inspect, render, version, process, check, …

• Libraries
– Create and inspect RO Bundles and their metadata
– Java, Ruby and Python

• User tools
– RO Manager: command line tool to make ROs
– ROHUB: a prototype web app to manage ROs

• Platforms
– SEEK
– CKAN plug-in to build, import and export ROs

http://www.researchobject.org/specifications/

Page 59: The Research Object Initiative:Frameworks and Use Cases

NIH BD2K + Research Objects

Metadata Profiles

RO Model API

Community IDs*

RO Model Manifest Profile

Implementation Profiles

*BioMedBridges 10 Rules for Identifiers.

Page 60: The Research Object Initiative:Frameworks and Use Cases

Summary

FAIR Research Objects:
• Concept, model, framework, use cases
• Lightweight, Incremental

Challenges
• Multi-stewarding and lifecycles (OAIS)
• Policy, governance

Partnerships
• Figshare, Oxford Bodleian, Farr Institute
• BioCADDIE?

Page 61: The Research Object Initiative:Frameworks and Use Cases

Acknowledgements & Links

Stian Soiland-Reyes, Matt Gamble, Rob Haines, Sean Bechhofer, Norman Morrison, Phil Crouch, Finn Bacall, Stuart Owen, Carole Goble, Khalid Belhajjame

Graham Klyne, Jun Zhao

Daniel Garijo, Oscar Corcho

Esteban García Cuesta

Raul Palma

University of Manchester, University of Oxford, Lancaster University, UPM, iSOCO, PSNC, Paris 6

http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org