The Research Object Initiative: Frameworks and Use Cases

Post on 28-Jul-2015

Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk

NIH BD2K BioCADDIE webinar, 11 June 2015

From Manuscripts to Research Objects

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware

Scattered Assets

Slideshare, GitHub, figshare, Community db, arXiv.org

Concept

Drivers for Research Objects (1)

• Computational Workflows / Scripts
– Multi-step, nested
– Data, executable codes, services (remote and local), libraries
– Preservation, Repair
– Reproducibility

• Systems Biology
– Models, data (construction, validation, predicted), SOPs, samples
– Structured around Investigations, Studies, Assays
– Exchange
– Reproducibility

Drivers for Research Objects (2)

• Computational Workflows Commons
– Projects and individuals
– myExperiment.org

• Systems Biology Commons
– Modellers and experimentalists
– Projects and Programs
– Catalogue of research assets
– Fairdomhub.org
– Fair-dom.org
– Seek4science.org

"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al

Workflow Commons

https://doi.org/10.15490/seek.1.investigation.56

[Snoep, 2015]

Penkler et al (2015) FEBSJ 282:1481-1511.

https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/

Advert!!!

[Diagram: the research infrastructure landscape. Actors: Producers, Consumers, Publishers, Funding Agencies. Components: Local Repositories, LIMS, Public Repositories, Central repositories, Research Infrastructures, CRIS, Catalogue, Search Index, Tools, Standards, Commons (haven, platform, gateway, catalogue, companion site). Flows: harvesting, linkage, deposit, execute, submission, access, metadata, results.]

Research Objects
1. Multi-various, citable research products
2. Compound, nested, scattered, yet interconnected research products, structured investigations
3. Preserved, portable research products, inter-platform exchange, reproducibility

Pop-up projects

Dynamic groups

Internal / external visibility

Commons

Research Objects
4. Active research products: evolving, executable.

• Fork
• Merge
• Version
• Cite
• Snapshot
• Live

[Martin Scharm]

Solvent production by Clostridium acetobutylicum
Haus et al, BMC Systems Biology, 2011, 5:10

Bigger on the inside than the outside
cite? resolve? steward?

TARDIS: Time and Relative Dimension in Space

Content: closed or open, embedded or referred, fixed or fluid, local or alien

Scholarship, Contributions
Multi Span: type, steward, site, author, research, researchers, platforms, time

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

[Diagram: Knowledge Turning over time. Labels: transfer, interpret; Commons, FAIR; Research Products; Reproducibility, Interpretation, Comparison, Preservation, Portability; Release, Active Research.]

http://ccrtypewriter.blogspot.co.uk/

Research Object: means, ends, driver

Framework

Multi-various products, platforms, resources
First class citizens: id, manage, credit, track, profile, focus

A Framework to Bundle, Port and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context. Units of exchange.

Research Objects

http://www.researchobject.org

The Research Object Framework
Desiderata

Standards. Machine-processable.
Technology independent. Multi-platform.
Incremental. Graceful degradation.
The least possible. The simplest feasible.
Standard tooling.

Research Object Framework

Principles & Conventions
API specification
Metadata formats
RO Core model using standards: Adobe UCF, ODF, ORE, OADM/PROV
Annotation profiles: progressive extensions
Platform Profiles using legacy & commodity platforms: Commodity, Native
Policies, Services, Tools
Lifecycle, Stewardship, Training

RO Core Model: manifest

• Identity: refer to aggregations and their contents (external ids, local files)
• Aggregation: describe group & constituents (the objects, how they are linked together)
• Metadata description: interpretation; attribution (who, when, where, why?)

RO Core Model

Manifest: point of extensibility

• Identity: persistence and resolution, names, citation (DOIs, URIs, Handles, ORCID)
• Aggregation: aggregations, resource maps, proxies (OAI-ORE)
• Annotation: first class and stand-off (W3C OADM)

RO Core Platform Profiles

• Identity: DOIs, URIs, Handles, ORCID, Data Citation Implementation
• Aggregation: OAI-ORE
• Annotation: W3C OADM

RO Model Ontology

http://w3id.org/ro/

Defines the core concepts of research objects: identity, aggregation, annotation. Used in the manifest.

Metadata Objects

Manifest
The Container Manifest: content and the relationships between the content
• RO metadata: id, title, creator, status…
• Aggregates: list of ids/links to resources
• Annotations: list of annotations about resources

The Objects
• Remote, through links
• Locally, embedded

Manifest: remote and local (on my machine)
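
As a rough illustration of the manifest described above, the following Python sketch assembles RO metadata, a list of aggregated resources (one embedded locally, one referenced remotely) and one annotation, then serialises the result as JSON. The key names (`id`, `title`, `createdBy`, `aggregates`, `annotations`, `uri`, `about`, `content`) are modelled loosely on the RO Bundle manifest and should be checked against the specification; the helper names, file paths and the example.org URL are hypothetical.

```python
import json


def is_remote(uri):
    """Treat absolute http(s) URIs as remote; anything else is embedded locally."""
    return uri.startswith(("http://", "https://"))


def build_manifest(ro_id, title, creator, resources, annotations):
    """Assemble a manifest: RO metadata, aggregated resources, annotations.

    Key names are assumptions for illustration, not a normative schema.
    """
    return {
        "id": ro_id,
        "title": title,
        "createdBy": {"name": creator},
        # Each aggregate is an id/link to a resource.
        "aggregates": [{"uri": uri} for uri in resources],
        # Each annotation names the resource it is about and where its body lives.
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


manifest = build_manifest(
    ro_id="/",
    title="Example investigation",
    creator="A. Researcher",
    resources=[
        "data/results.csv",                        # embedded in the container
        "https://example.org/models/model-1.xml",  # stays remote, linked only
    ],
    annotations=[("data/results.csv", "annotations/results-provenance.ttl")],
)

local = [a["uri"] for a in manifest["aggregates"] if not is_remote(a["uri"])]
remote = [a["uri"] for a in manifest["aggregates"] if is_remote(a["uri"])]

print(json.dumps(manifest, indent=2))
```

The same manifest shape covers both cases: the only difference between an embedded and a remote object is whether its link is a relative path or an absolute URI.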

Container Machinery

Manifest
The Container Manifest: content and the relationships between the content

The Container
Packaging: ZIP files, Docker images…
Catalogues & Commons: FAIRDOM SEEK, Farr Commons CKAN, myExperiment…

Export, archive, publish and transfer ROs.

File format for storage and distribution of ROs as a ZIP archive

Includes an RO’s manifest, annotations and some or all of its aggregated resources

Basis for more specific file formats

Backwards compatible: it's ZIP
Programmatic access: JSON and JSON-LD manifest, API

https://researchobject.github.io/specifications/bundle/

https://w3id.org/bundle/ doi:10.5281/zenodo.10440
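
A minimal sketch of packaging such a bundle with Python's standard `zipfile` module follows. It assumes the layout given elsewhere in this talk (a `mimetype` entry holding `application/vnd.wf4ever.robundle+zip` and a manifest at `.ro/manifest.json`); writing `mimetype` first and uncompressed follows the UCF convention the format builds on. The function name, manifest keys and sample file are hypothetical, and this is not the API of the RO libraries.

```python
import io
import json
import zipfile

RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(manifest, files):
    """Return the bytes of a ZIP archive laid out like an RO bundle.

    `files` maps archive paths to their content (str or bytes).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The media type goes first and uncompressed so tools can sniff it.
        bundle.writestr("mimetype", RO_BUNDLE_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path, content in files.items():
            bundle.writestr(path, content)
    return buffer.getvalue()


# Hypothetical manifest and payload, for illustration only.
manifest = {"id": "/", "aggregates": [{"uri": "data/results.csv"}]}
bundle_bytes = write_bundle(manifest, {"data/results.csv": "x,y\n1,2\n"})

# Because the bundle is an ordinary ZIP, any ZIP tool can open it.
with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
    names = bundle.namelist()
    stored_manifest = json.loads(bundle.read(".ro/manifest.json"))
```

Reading the archive back with a plain ZIP reader is what "backwards compatible" buys: a consumer that knows nothing about research objects still gets the files, while an RO-aware consumer also gets the manifest.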

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

RO Lifecycles, Resolution, Citation

• Defend it (snapshot)
• Locate it (most recent)
• Reuse it (a version, a component)
• Credit it (contributory authorship)
• Cross link it (connections)

PURL, Checklists, Versioning, Provenance, Dependencies
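
One way to picture "defend it (snapshot)" versus "locate it (most recent)" is a history that freezes each released state under its own identifier, while a separate call always returns the newest one. The sketch below is an illustration only: the class, its methods and the use of a truncated SHA-256 content hash as a snapshot identifier are assumptions, not part of the RO specifications, where persistent identifiers such as PURLs or DOIs would play this role.

```python
import hashlib
import json


class ResearchObjectHistory:
    """Keeps immutable snapshots of a manifest next to the live copy."""

    def __init__(self, manifest):
        self.live = manifest    # the evolving, "living" research object
        self._snapshots = {}    # snapshot id -> frozen JSON text
        self._order = []        # snapshot ids, oldest first

    def snapshot(self):
        """Freeze the current live state and return its identifier."""
        frozen = json.dumps(self.live, sort_keys=True)
        snapshot_id = hashlib.sha256(frozen.encode("utf-8")).hexdigest()[:12]
        if snapshot_id not in self._snapshots:
            self._snapshots[snapshot_id] = frozen
            self._order.append(snapshot_id)
        return snapshot_id

    def resolve(self, snapshot_id):
        """Defend it: return exactly the state that was cited."""
        return json.loads(self._snapshots[snapshot_id])

    def latest(self):
        """Locate it: return the most recent snapshot."""
        return self.resolve(self._order[-1])


history = ResearchObjectHistory({"title": "Study", "aggregates": ["model.xml"]})
v1 = history.snapshot()
history.live["aggregates"].append("results.csv")  # the live RO keeps evolving
v2 = history.snapshot()
```

Citing `v1` keeps pointing at the one-file state even after the live object has grown, which is the property a snapshot needs in order to be defensible.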

Annotation Profiles.

Depth: how deeply described.
Coverage: how much is covered.

Progression levels
Semantic Framework

The Manifest, The Object Metadata
PID, PAV, VoID, VIVO-ISF, Mim Ontology, Puppet, Makefile

More detail, fewer stakeholders
Less detail, more stakeholders

Checklists

Gamble M, Goble CA, Klyne G, Zhao J. Mim: A minimum information model vocabulary and framework for scientific linked data. IEEE 8th Intl Conf on eScience, pp. 1-8.

Zhao J, Klyne G, Gamble M, Goble CA. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proc Third Linked Science Workshop 2013, co-located with ISWC 2013.
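
The checklist idea can be sketched as a list of required metadata properties evaluated against a research object's description. The following is a simplified illustration, not the Mim vocabulary or the authors' tooling: the `MUST`/`SHOULD` levels, the property names and the `coverage` score are assumptions chosen to echo "coverage: how much is covered".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    prop: str    # metadata property that should be present
    level: str   # "MUST" or "SHOULD" (hypothetical levels)


def assess(metadata, checklist):
    """Report which checklist requirements the metadata satisfies."""
    missing = [r for r in checklist if not metadata.get(r.prop)]
    return {
        # The checklist passes when no mandatory property is missing.
        "passes": all(r.level != "MUST" for r in missing),
        # Fraction of all requirements that are satisfied.
        "coverage": (len(checklist) - len(missing)) / len(checklist),
        "missing": [r.prop for r in missing],
    }


# Hypothetical checklist and metadata, for illustration only.
checklist = [
    Requirement("title", "MUST"),
    Requirement("creator", "MUST"),
    Requirement("license", "SHOULD"),
    Requirement("provenance", "SHOULD"),
]
metadata = {"title": "Example run", "creator": "A. Researcher",
            "provenance": "workflowrun.prov.ttl"}

report = assess(metadata, checklist)
```

Separating mandatory from recommended items lets a sparse description still pass a basic checklist while reporting what a stricter profile would ask for.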

Checklist Annotation Profiles.

Library, Publishers. Experiments. Type specific.
PID, Citation, NISO-JATS, Dublin Core, ISA, MIAME, OBI, JERM, EXPO, SBML, SED-ML, Wf-Desc, Wf-prov

Use Cases

Use case
• SEEK Commons for Systems Biology
• Natively RO
• Export/Import RO bundles

SEEK Metadata framework link studies and link assets

Describes common elements and relationships between things produced and used in experiments.

Structured descriptions for consistency and comparison

Just Enough Results Model

Snapshots & Living

Living ROs

Snapshot RO of investigation and all its parts

Community Sys Bio Models metadata + packaging

Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014)

Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369

Combine with RO. Standardised metadata & API

http://co.mbine.org/documents/archive

OMEX

https://github.com/stain/ro-combine-archive
doi:10.5281/zenodo.10439

Bridge from Research to FAIR publishing

Deposit

Run

2

RO Unzip

RO Query

Use Case: Taverna Workflows

Workflow Results
Aggregating in Research Object: https://w3id.org/bundle

ZIP folder structure (RO Bundle)

    mimetype (application/vnd.wf4ever.robundle+zip)
    .ro/manifest.json
    workflow
    inputA.txt
    outputA.txt
    outputB/ (1.txt, 2.txt, 3.txt)
    outputC.jpg
    workflowrun.prov.ttl (RDF)
    intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt

.ro/manifest.json: URI references, attribution, execution environment
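
To make the folder structure concrete, here is a small sketch that checks whether a ZIP archive follows this layout and lists what its manifest aggregates. It builds a tiny in-memory archive first so that it runs on its own. The `inspect_bundle` helper and the `aggregates`/`uri` manifest keys are assumptions for illustration, and treating a misplaced `mimetype` entry as an error is a deliberately strict choice; the file names are taken from the structure above.

```python
import io
import json
import zipfile

RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def inspect_bundle(data):
    """Check the bundle layout and return the URIs its manifest aggregates."""
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        names = bundle.namelist()
        if not names or names[0] != "mimetype":
            raise ValueError("mimetype must be the first entry")
        if bundle.read("mimetype").decode("ascii").strip() != RO_BUNDLE_MIMETYPE:
            raise ValueError("not an RO bundle media type")
        if ".ro/manifest.json" not in names:
            raise ValueError("missing .ro/manifest.json")
        manifest = json.loads(bundle.read(".ro/manifest.json"))
    return [entry["uri"] for entry in manifest.get("aggregates", [])]


# Build a tiny sample archive in memory, mirroring part of the layout.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as bundle:
    bundle.writestr("mimetype", RO_BUNDLE_MIMETYPE)
    bundle.writestr(".ro/manifest.json", json.dumps({
        "aggregates": [{"uri": "/inputA.txt"}, {"uri": "/outputA.txt"},
                       {"uri": "/workflowrun.prov.ttl"}],
    }))
    bundle.writestr("inputA.txt", "input")
    bundle.writestr("outputA.txt", "output")
    bundle.writestr("workflowrun.prov.ttl", "# provenance trace goes here\n")

aggregated = inspect_bundle(sample.getvalue())
```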

Workflow Specification

Example data and config.

Components.

Plug-ins, Versions

Workflow System

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Portability

Preserving

Repair

Reproduce

Report

Asset specific Commons

Personal Notebook

Community Registry

General Publishing Repository

Use case: ATLAS Collider Data Analytics

Portable, lightweight application runtime and packaging tool.

Image

ATLAS and CMS detector data

Charles Vardeman, Da Huo

All data and files of the execution + instructions

convert, bundle, manifest
Relate files and layers
Add provenance and annotations
Link in other content
run, read, archive

Use case: The Farr Institute

Commons

safe use of patient and research data for medical research
clinical study cohorts

Research Objects: scripts, data, samples…

different e-Labs, legacy data

http://www.farrinstitute.org/

Use case: The Farr Institute

Commons

The open source data portal software

exchange

catalogue

deposit

Uses “code as a research object” functionality

Baking RO Infrastructure
make, import, export, inspect, render, version, process, check, …

• Libraries
– Create and inspect RO Bundles and their metadata
– Java, Ruby and Python

• User tools
– RO Manager: command line tool to make ROs
– ROHUB: a prototype web app to manage ROs

• Platforms
– SEEK
– CKAN plug-in to build, import and export ROs

http://www.researchobject.org/specifications/

NIH BD2K + Research Objects

Metadata Profiles

RO Model API

Community IDs*

RO Model Manifest Profile

Implementation Profiles

*BioMedBridges 10 Rules for Identifiers.

Summary
FAIR Research Objects:
• Concept, model, framework, use cases
• Lightweight, incremental

Challenges
• Multi-stewarding and lifecycles (OAIS)
• Policy, governance

Partnerships
• Figshare, Oxford Bodleian, Farr Institute
• BioCADDIE?

Acknowledgements & Links

Stian Soiland-Reyes, Matt Gamble, Rob Haines, Sean Bechhofer, Norman Morrison, Phil Crouch, Finn Bacall, Stuart Owen, Carole Goble, Khalid Belhajjame, Graham Klyne, Jun Zhao, Daniel Garijo, Oscar Corcho, Esteban García Cuesta, Raul Palma

University of Manchester, University of Oxford, Lancaster University, UPM, iSOCO, PSNC, Paris 6

http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org