The Research Object Initiative: Frameworks and Use Cases
Professor Carole Goble, The University of Manchester
NIH BD2K BioCADDIE webinar, 11 June 2015
From Manuscripts to Research Objects
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
Datasets, data collections
Standard operating procedures
Software, algorithms
Configurations, tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Scattered Assets
Slideshare
Github
figshare
Community db
Arxiv.org
Concept
Drivers for Research Objects (1)
• Computational Workflows / Scripts
  – Multi-step, nested.
  – Data, executable codes, services (remote and local), libraries
  – Preservation, Repair
  – Reproducibility
• Systems Biology
  – Models, data (construction, validation, predicted), SOPs, samples
  – Structured around Investigations, Studies, Assays
  – Exchange
  – Reproducibility
Drivers for Research Objects (2)
• Computational Workflows Commons
  – Projects and individuals
  – myExperiment.org
• Systems Biology Commons
  – Modellers and experimentalists
  – Projects and Programs
  – Catalogue of research assets
  – Fairdomhub.org
  – Fair-dom.org
  – Seek4science.org
"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al
Workflow Commons
https://doi.org/10.15490/seek.1.investigation.56
[Snoep, 2015]
Penkler et al (2015) FEBSJ 282:1481-1511.
https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/
Advert!!!
[Figure: the research infrastructure landscape. Labels: Producers, Consumers, Publishers, Funding Agencies, Local Repositories, LIMS, Public Repositories, Central repositories, Research Infrastructures, Catalogue, Search, Index, Tools, Standards, CRIS, results, gateway, companion site, haven, platform, Commons. Links between them: harvesting, linkage, deposit, execute, submission, metadata, access.]
Research Objects 1. Multi-various, citable research products
Research Objects 2. Compound, nested, scattered, yet interconnected research products, structured investigations
Research Objects 3. Preserved, portable research products, inter-platform exchange, reproducibility
Pop-up projects
Dynamic groups
Internal / external visibility
Commons
Research Objects 4. Active research products: evolving, executable.
• Fork.
• Merge.
• Version.
• Cite.
• Snapshot.
• Live.
[Martin Scharm]
Haus et al., BMC Systems Biology, 2011, 5:10. Solvent production by Clostridium acetobutylicum
Bigger on the inside than the outside: cite? resolve? steward?
[Figure: TARDIS (Time and Relative Dimension in Space). Content labels: closed, embed, fixed, local versus open, refer, fluid, alien. Scholarship labels: Multi (type, steward, site, author), Span (research, researchers, platforms, time), Contributions.]
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013
[Figure: knowledge turning over time. Labels: transfer, interpret, Commons, FAIR Research Products, Release, Active Research. Goals: Reproducibility, Interpretation, Comparison, Preservation, Portability.]
http://ccrtypewriter.blogspot.co.uk/
Research Object: means, ends, driver
Framework
Multi-various products, platforms, resources.
First class citizens: id, manage, credit, track, profile, focus.
A Framework to Bundle, Port and Link (scattered) resources and related experiments.
Metadata Objects that carry Research Context.
Units of exchange.
Research Objects
http://www.researchobject.org
The Research Object Framework: Desiderata
Standards. Machine-processable. Technology independent.
Multi-platform.
Incremental.
The least possible. The simplest feasible.
Graceful degradation.
Standard tooling.
Research Object Framework
Principles & Conventions
API specification
Metadata formats
Platform Profiles, using legacy & commodity platforms (commodity, native)
Policies, services, tools
Lifecycle, stewardship, training, …
RO Core model, using standards: Adobe UCF, ORE, ODF, OADM/PROV
Annotation profiles, progressive extensions
RO Core Model
Identity
Aggregation
Interpretation: the objects, how they are linked together
manifest
Refer to aggregations and their contents
Describe group & constituents
External ids, local files
Attribution: who, when, where, why?
Metadata, description
RO Core Model
Aggregation: aggregations, resource maps, proxies (OAI-ORE)
Annotation: first class and stand-off (W3C OADM)
Identity: persistence and resolution, names, citation (DOIs, URIs, Handles, ORCID)
Manifest: point of extendability
RO Core, Platform Profiles
Identity: DOIs, URIs, Handles, ORCID, Data Citation Implementation
Aggregation: OAI-ORE
Annotation: W3C OADM
RO Model Ontology
http://w3id.org/ro/
Defines core concepts of research objects, identity, aggregation, annotation. Used in the manifest
Metadata Objects
Manifest
The Container
Manifest: content and the relationships between the content
• RO metadata – id, title, creator, status…
• Aggregates – list of ids/links to resources
• Annotations – list of annotations about resources
The Objects
• Remote, through links
• Locally, embedded
Manifest – remote and local
on my machine
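A minimal sketch of the manifest described above, written as a Python dictionary. The field names `@context`, `id`, `createdBy`, `aggregates`, `annotations`, `uri`, `about` and `content` follow one reading of the RO Bundle specification and should be checked against it. The `title` field, the example paths and the helper names are hypothetical and used only for illustration.

```python
import json
from urllib.parse import urlparse


def build_manifest(title, creator, resources, annotations=()):
    """Assemble RO metadata, the aggregated resources and their annotations."""
    return {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "title": title,  # hypothetical field, named after the slide's "title"
        "createdBy": {"name": creator},
        # Aggregates: a list of ids/links to resources.
        "aggregates": [{"uri": uri} for uri in resources],
        # Annotations: which resource is described, and where the description lives.
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


def is_remote(uri):
    """Treat http(s) URIs as remote links; anything else as locally embedded."""
    return urlparse(uri).scheme in ("http", "https")


manifest = build_manifest(
    title="Example investigation",
    creator="A. Researcher",
    resources=["/data/input.csv", "https://example.org/workflow.t2flow"],
    annotations=[("/data/input.csv", "annotations/input-description.ttl")],
)
remote = [a["uri"] for a in manifest["aggregates"] if is_remote(a["uri"])]
local = [a["uri"] for a in manifest["aggregates"] if not is_remote(a["uri"])]
print(json.dumps(manifest, indent=2))
```

The same manifest lists both kinds of object, so a consumer can decide which resources to read from the container and which to fetch through their links.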
Container Machinery
Manifest
The Container
Packaging: Zip files, Docker images…
Catalogues & Commons: FAIRDOM SEEK, Farr Commons CKAN, myExperiment…
Manifest: content and the relationships between the content
Export, archive, publish and transfer ROs.
File format for storage and distribution of ROs as a ZIP archive
Includes an RO’s manifest, annotations and some or all of its aggregated resources
Basis for more specific file formats
Backwards compatible: it's a ZIP
Programmatic access: JSON and JSON-LD manifest, API
https://researchobject.github.io/specifications/bundle/
https://w3id.org/bundle/ doi:10.5281/zenodo.10440
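As a rough illustration of "a ZIP archive with a JSON manifest", the sketch below writes a bundle-like archive with Python's standard `zipfile` module. It assumes the layout shown elsewhere in this deck (a `mimetype` entry and `.ro/manifest.json`). Writing `mimetype` first and uncompressed is a convention assumed here rather than confirmed by the slides, and `write_bundle` and the example file names are hypothetical.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, manifest, files):
    """Write a bundle-style ZIP to `target` (a path or a file-like object)."""
    with zipfile.ZipFile(target, "w") as zf:
        # Assumed convention: mimetype is the first entry, stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        # Embed some or all of the aggregated resources next to the manifest.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


manifest = {
    "@context": ["https://w3id.org/bundle/context"],
    "id": "/",
    "aggregates": [{"uri": "/data/input.csv"}],
}
buffer = io.BytesIO()
write_bundle(buffer, manifest, {"data/input.csv": "x,y\n1,2\n"})

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    stored_mimetype = zf.read("mimetype").decode("ascii")
print(names)
```

Because the result is an ordinary ZIP file, any unzip tool can open it even if it knows nothing about Research Objects, which is the backwards-compatibility point made above.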
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
RO Lifecycles, Resolution, Citation
• Defend it (snapshot)
• Locate it (most recent)
• Reuse it (a version, a component)
• Credit it (contributory authorship)
• Cross link it (connections)
Annotation Profiles
Depth: how deeply described.
Coverage: how much is covered.
Progression levels. Semantic framework.
PID, PURL, versioning, provenance, dependencies, checklists
The Manifest, the Object Metadata
PAV, VoID, VIVO-ISF, Mim Ontology, Puppet, Makefile
More detail, fewer stakeholders. Less detail, more stakeholders.
Checklists
Gamble M, Goble CA, Klyne G, Zhao J. Mim: A minimum information model vocabulary and framework for scientific linked data. IEEE 8th Intl Conf on eScience, pp. 1-8.
Zhao J, Klyne G, Gamble M, Goble CA. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proc. Third Linked Science Workshop 2013, co-located with ISWC 2013.
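The checklist idea can be sketched in a few lines: a checklist is a list of requirements, and a research object is assessed by how many of them its metadata satisfies. This is an illustrative toy, not the Mim vocabulary or the tooling from the papers cited above. The MUST/SHOULD levels, the metadata keys and the `evaluate` function are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    key: str    # hypothetical metadata key the requirement looks for
    level: str  # "MUST" or "SHOULD" (assumed levels)


def evaluate(checklist, described):
    """Return (passes, coverage, missing keys) for a set of described keys."""
    missing = [req for req in checklist if req.key not in described]
    # The object passes if no mandatory requirement is missing.
    passes = all(req.level != "MUST" for req in missing)
    # Coverage: the fraction of checklist items that are satisfied.
    coverage = (len(checklist) - len(missing)) / len(checklist)
    return passes, coverage, [req.key for req in missing]


checklist = [
    Requirement("title", "MUST"),
    Requirement("creator", "MUST"),
    Requirement("provenance", "SHOULD"),
    Requirement("licence", "SHOULD"),
]
described = {"title", "creator", "provenance"}
passes, coverage, missing = evaluate(checklist, described)
print(passes, coverage, missing)
```

Different stakeholders could apply different checklists to the same object, which is one way to read the "more detail, fewer stakeholders" trade-off.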
[Figure: Checklist Annotation Profiles. Audiences: library, publishers, experiments, type specific. Vocabularies shown: PID, Citation, NISO-JATS, Dublin Core, ISA, MIAME, OBI, EXPO, JERM, SBML, SED-ML, Wf-Desc, Wf-prov.]
Use Cases
Use case: SEEK Commons for Systems Biology
• Natively RO
• Export/Import RO bundles
SEEK metadata framework: link studies and link assets
Describes common elements and relationships between things produced and used in experiments.
Structured descriptions for consistency and comparison
Just Enough Results Model
Snapshots & Living
Living ROs
Snapshot RO of investigation and all its parts
Community Sys Bio Models metadata + packaging
Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014)
Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369
Combine with RO. Standardised metadata & API
http://co.mbine.org/documents/archive
OMEX
https://github.com/stain/ro-combine-archive
doi:10.5281/zenodo.10439
Bridge from Research to FAIR publishing
Deposit
Run
2
RO Unzip
RO Query
Use Case: Taverna Workflows
Workflow Results
workflowrun.prov.ttl (RDF)
outputA.txt
outputC.jpg
outputB/
https://w3id.org/bundle
intermediates/
1.txt
2.txt
3.txt
de/def2e58b-50e2-4949-9980-fd310166621a.txt
inputA.txt
workflow
URI references
attribution
execution environment
Aggregating in Research Object
ZIP folder structure (RO Bundle)
mimetype: application/vnd.wf4ever.robundle+zip
.ro/manifest.json
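A hedged sketch of how a consumer might open such a ZIP, check its `mimetype` entry and query the manifest. It builds a tiny in-memory bundle first so it runs without external files. The manifest field names (`aggregates`, `annotations`, `uri`, `about`, `content`) follow one reading of the RO Bundle specification, and `read_manifest` and `annotations_about` are hypothetical helpers, not part of any RO library.

```python
import io
import json
import zipfile

EXPECTED_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def read_manifest(source):
    """Check the mimetype entry, then return the parsed .ro/manifest.json."""
    with zipfile.ZipFile(source) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != EXPECTED_MIMETYPE:
            raise ValueError(f"not an RO Bundle: {mimetype!r}")
        return json.loads(zf.read(".ro/manifest.json"))


def annotations_about(manifest, uri):
    """List the annotation bodies that describe the given resource."""
    return [a["content"] for a in manifest.get("annotations", [])
            if a.get("about") == uri]


# Build a small demo bundle in memory, using file names from the workflow run above.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", EXPECTED_MIMETYPE)
    zf.writestr(".ro/manifest.json", json.dumps({
        "aggregates": [{"uri": "/outputA.txt"}, {"uri": "/workflowrun.prov.ttl"}],
        "annotations": [{"about": "/outputA.txt",
                         "content": "/workflowrun.prov.ttl"}],
    }))
    zf.writestr("outputA.txt", "example output")
    zf.writestr("workflowrun.prov.ttl", "")

manifest = read_manifest(demo)
aggregated = [a["uri"] for a in manifest["aggregates"]]
provenance_for_output = annotations_about(manifest, "/outputA.txt")
print(aggregated, provenance_for_output)
```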
Workflow Specification
Example data and config.
Components.
Plug-ins, Versions
Workflow System
Software package
Workflow Runs
Data and configs
Provenance logs
Study
Portability
Preserving
Repair
Reproduce
Report
Asset specific Commons
Personal Notebook
Community Registry
General Publishing Repository
Use case: ATLAS Collider Data Analytics
Portable, lightweight application runtime and packaging tool.
Image
ATLAS and CMS detector data
Charles Vardeman, Da Huo
All data and files of the execution + instructions
convert
bundle
manifest
Relate files and layers
Add provenance and annotations
Link in other content
run
read
archive
Use case: The Farr Institute
Commons
safe use of patient and research data for medical research
clinical study cohorts
Research Objects: scripts, data, samples…
different e-Labs, legacy data
http://www.farrinstitute.org/
Use case: The Farr Institute
Commons
The open source data portal software
exchange
catalogue
deposit
Uses “code as a research object” functionality
Baking RO Infrastructure: make, import, export, inspect, render, version, process, check, …
• Libraries
  – Create and inspect RO Bundles and their metadata
  – Java, Ruby and Python
• User tools
  – RO Manager: command line tool to make ROs
  – ROHUB: a prototype web app to manage ROs
• Platforms
  – SEEK
  – CKAN plug-in to build, import and export ROs
http://www.researchobject.org/specifications/
NIH BD2K + Research Objects
Metadata Profiles
RO Model API
Community IDs*
RO Model Manifest Profile
Implementation Profiles
*BioMedBridges 10 Rules for Identifiers.
Summary
FAIR Research Objects:
• Concept, model, framework, use cases
• Lightweight, incremental
Challenges
• Multi-stewarding and lifecycles (OAIS)
• Policy, governance
Partnerships
• Figshare, Oxford Bodleian, Farr Institute
• BioCADDIE?
Acknowledgements & Links
Stian Soiland-Reyes, Matt Gamble, Rob Haines, Sean Bechhofer, Norman Morrison, Phil Crouch, Finn Bacall, Stuart Owen, Carole Goble, Khalid Belhajjame, Graham Klyne, Jun Zhao, Daniel Garijo, Oscar Corcho, Esteban García Cuesta, Raul Palma
University of Manchester, University of Oxford, Lancaster University, UPM, iSOCO, PSNC, Paris 6
http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org