Bosc seek-2014-goble

SEEK for Science: A Data Management Platform to support Open and Reproducible Science. Professor Carole Goble, The University of Manchester, UK. BOSC 2014, 12th July 2014

description

SEEK for Science: A Data and Model Management Platform to support Open and Reproducible Science in Systems Biology

Transcript of Bosc seek-2014-goble

Page 1: Bosc seek-2014-goble

SEEK for Science: A Data Management Platform to support Open and Reproducible Science

Professor Carole Goble, The University of Manchester, UK

BOSC 2014, 12th July 2014

Page 2: Bosc seek-2014-goble

[Figure: the Systems Biology cycle, with a Modelling side and an Experimental side. Stages shown: Hypothesis Generation, Experiment and Data Generation, Experiment Analysis, Model Construction, Model Validation, Model Analysis. Public Data Acquisition and Biological insight appear at several points around the cycle.]

Page 3: Bosc seek-2014-goble

Sponsors and Motivation

• BMBF "Großprojekt" (large-scale project)
• ~45 organisations
• ~70 groups
• multiscale representation of the liver
• multiscale data, models
• imaging data

• EU ERANet programme
• 122 organisations
• 16 multi-institutional consortia
• independent projects in a two-round funding initiative

Page 4: Bosc seek-2014-goble

Funders
• Preserve results beyond projects.
• Organise & link data, models, processes.
• Exchange & search initiative's assets.
• Share & disseminate results.
• Improve standard curation practice.
• Pool capacities.
• Handle home-brewed solutions with mixed resourcing and no access

Page 5: Bosc seek-2014-goble

People

• Dynamic distributed groups of experimentalists and modellers
• Cherished own home-grown and unstable data solutions
  – wikis, CMS, databases, spreadsheets, files.
• Access & visibility control over shared content

Page 6: Bosc seek-2014-goble

Content
• Locally hosted private repositories
• Public archives
• From single-cell to human
• Samples, Specimens, Standard Operating Procedures
• Small Data: Reactome…: files, spreadsheets
• Big Data: NGS, Mass Spec…: Specialist repositories, files
• Models: ODE, SBML, Native Matlab, PDE, Multi-scale
• In progress: versioning, track provenance and parameters
• Published: citation, links to publications

Page 7: Bosc seek-2014-goble

Cataloguing

• Find my peers
• Creating and sharing SOPs across projects
• Track my specimens
• Yellow pages; manage SOPs and link them to investigations, studies, assays, specimens and samples
• Browse experimental data without downloading them
• How data, models and SOPs fit together
• Which data belong to which publication
• Data viewing functionality
• ISA: Link Studies to their data, models, SOPs, samples, publications
• Track different versions of my model

Page 8: Bosc seek-2014-goble

The Web-based SEEK Platform

Ruby on Rails 3.2, BSD licence, https://bitbucket.org/seek4science/seek

https://seek.sysmo-db.org/models/114

http://www.seek4science.org

Page 9: Bosc seek-2014-goble

Data

Models

Articles

External Databases

http://www.seek4science.org

Metadata

http://www.isatools.org

Aggregated Asset Infrastructure…. sharing and interlinking multi-stewarded, mixed methods, models, data, samples…

A Commons….

Page 10: Bosc seek-2014-goble

simulate models

project mgt, access control, reporting, citation, governance & policies

yellow pages of peers, projects, experts

catalogue, link and index data, models, samples, specimens, sops, experiments, publications using standards

curate & annotate data and models using standards with compliance tools

incorporate public data and model repositories & tools

deposition

manage, store and exchange different types and scales of data

Reproducibility Score Card

integrate local and project tools and data systems

scaled-out collection & analytics using third party platforms

differentiate construction, validation & predicted data

Page 11: Bosc seek-2014-goble

[Figure: SEEK architecture. Recoverable labels:]

Yellow Pages: Institutions, Projects, People

ISA: Investigation, Study, Assay

Asset Catalogue: Models, Datafiles, SOPs, Publications, Presentations, Events

Tags, Versions, Access Privileges

JERM: Extract, Harvest, Index

APIs and Links: BioModels, ChEBI, BioPortal, PubMed, JWS Online, GEO, SABIO-RK

Web Interface, REST API

Local SEEK, Wikis, CMS, Own DB, Direct Upload, Project DM, External SEEK, openBIS

Page 12: Bosc seek-2014-goble

Plug-in, Play nice, Don't reinvent

• Gateway plugin framework
  – Tight and loose coupling
  – Rails plugin or bundled gem
• Metadata framework
  – JERM and ISA
• Different instances
  – Single query across all model repositories
  – One click deposition

BioModels

Page 13: Bosc seek-2014-goble

Data….
• Public and new data
• Factors studied
  – Linked -> SABIO-RK and ChEBI
• Samples and Specimens
  – Extends EBI/NCBI BioSamples
• Treatment Extraction
• Tagging with vocabularies
• Spreadsheet-based data-view
• Big Data
  – Upload and by email, Share by trusted link, Link to external repository
• Access
  – DOIs and Temp links for reviews

Page 14: Bosc seek-2014-goble

Models….

Repositories
• BioModels, JWS Online, local SEEK

JWS Online Simulator
• SBML support
• Auto generation of SBGN schemas for user models
• SED-ML export

DataFuse
• Link and compare construction and validation data with models
• Run models with parameter values from spreadsheets

Cytoscape

Page 15: Bosc seek-2014-goble

Standard Formats and Vocabularies

[Figure labels: Models, Experiment Data, Exchange, Verification, Comparison, Construction, Prediction.]

Formats and vocabularies named: Just Enough Results Model; ISA-TAB; SBML, MIRIAM, SBGN, SemanticSBML, CellML; MIBBI Standards; OBO Controlled Vocabularies; SED-ML (Simulation Experiment Description Markup Language).

Page 16: Bosc seek-2014-goble

Standards, Structure, Interlink

[Figure: Construction and Validation data (Metabolomics, Mass Spec, Transcriptomics, Proteomics, Fluxomics) organised under Investigations, Studies and Assays.]

Towards Interoperable Bioscience Data, Nature Genetics, 2012
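
The Investigations, Studies, Assays nesting named on this slide can be pictured as a small tree with assets hanging off the assays. The following is a minimal Python sketch under that reading; every class, field and example title is a hypothetical name chosen for illustration, not SEEK's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes for illustration only; this is not SEEK's schema.
@dataclass
class Asset:
    kind: str   # e.g. "datafile", "model", "sop"
    title: str


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assets_of_kind(self, kind: str) -> List[Asset]:
        """Return every asset of the given kind under this investigation."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
            if asset.kind == kind
        ]


# Made-up example content, loosely echoing the assay types on the slide.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study(
            title="Example study",
            assays=[
                Assay("Metabolomics assay", [
                    Asset("datafile", "metabolite_levels.xlsx"),
                    Asset("sop", "extraction_protocol.pdf"),
                ]),
                Assay("Modelling assay", [Asset("model", "pathway_model.xml")]),
            ],
        )
    ],
)

datafiles = investigation.assets_of_kind("datafile")
```

Keeping assets at the assay level means a question such as "which data belong to which study" reduces to a walk down the tree.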

Page 20: Bosc seek-2014-goble

Just Enough Results Model

Describes and enriches the relationships between things produced and used in experiments.

http://bioportal.bioontology.org/ontologies/JERM

reuse community ontologies, markups, mim, identifiers

Page 21: Bosc seek-2014-goble

metadata sheets

sample sheets

data sheets

indexes

http://rightfield.org.uk/


Page 22: Bosc seek-2014-goble

Different types of data

Plugins to registered data repositories

Extract and auto-catalogue metadata

Define relationships, cross-link, aggregate, query

standard based templates

non-standard templates

Open Modelling Exchange Format archive

Page 23: Bosc seek-2014-goble

Sys Bio Research Objects: portable packaged research

Adobe UCF

Research Object Bundle

ORE, PROV, ODF

• Aggregation
• Annotations/provenance
• Ad-hoc domain-specific specification

OMEX archive

Systems Biology: A common archive format for reuse across tools

http://www.researchobject.org
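
To make the idea of a packaged archive concrete: an OMEX (COMBINE) archive is, as far as I understand the specification, a ZIP file with a `manifest.xml` that lists each entry and its format. The sketch below builds one in memory with Python's standard library. The format URIs are assumptions recalled from the COMBINE archive specification and should be checked against it; the file names and contents are placeholders invented for the example.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

# Identifier prefix as I understand it from the COMBINE archive specification;
# treat it as an assumption and verify before relying on it.
SPEC = "http://identifiers.org/combine.specifications/"


def build_omex(entries):
    """Build an OMEX-style ZIP in memory.

    entries maps an archive path to (format URI, file content as bytes).
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<omexManifest xmlns="{SPEC}omex-manifest">',
        f'  <content location="." format="{SPEC}omex"/>',
        f'  <content location="./manifest.xml" format="{SPEC}omex-manifest"/>',
    ]
    for path, (fmt, _data) in entries.items():
        lines.append(
            f'  <content location={quoteattr("./" + path)} format={quoteattr(fmt)}/>'
        )
    lines.append("</omexManifest>")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", "\n".join(lines))
        for path, (_fmt, data) in entries.items():
            archive.writestr(path, data)
    return buffer.getvalue()


omex_bytes = build_omex({
    # Placeholder content, not a valid SBML model.
    "model.xml": (SPEC + "sbml", b"<sbml/>"),
    # CSV format URI is an assumption; file content is made up.
    "data/timecourse.csv": ("http://purl.org/NET/mediatypes/text/csv",
                            b"time,glucose\n0,5.0\n"),
})

# Read the archive back to confirm what was packaged.
with zipfile.ZipFile(io.BytesIO(omex_bytes)) as archive:
    names = archive.namelist()
    manifest_text = archive.read("manifest.xml").decode("utf-8")
```

Because the container is an ordinary ZIP, any tool can open it, and the manifest is what tells a tool how to interpret each entry.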

Page 24: Bosc seek-2014-goble

Reproducible (Open?) Research

Data sharing, openness and careers incentive

See Titus and Phil talks

Page 25: Bosc seek-2014-goble

Open Research: Research Groups & Lifecycles
• Sharing policy
• Visibility, Downloadability
• Fine grained permissions
• Protocols for
  – Management transfer
  – Visibility feedback and sharing workflows
  – Publication data deposition in external public stores
  – Batch publishing

Within Project: Versions, Retractions

Across Projects: Versions

Public: Final version, No Retraction

Manager, Owner, Gatekeeper
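
The three sharing stages on this slide (within project, across projects, public) read like a one-way lifecycle. Below is a minimal Python sketch of that reading. The class, method and parameter names are hypothetical, and three rules are assumptions made for the example rather than documented SEEK behaviour: sharing can only be widened, retraction is possible only within a project, and a gatekeeper must approve the move to public.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Scope(Enum):
    WITHIN_PROJECT = 1
    ACROSS_PROJECTS = 2
    PUBLIC = 3


@dataclass
class SharedAsset:
    """Hypothetical asset record; names and rules are illustrative only."""
    title: str
    scope: Scope = Scope.WITHIN_PROJECT
    versions: List[int] = field(default_factory=lambda: [1])

    def add_version(self) -> int:
        # Slide: "Public: Final version", read here as no new versions once public.
        if self.scope is Scope.PUBLIC:
            raise ValueError("public assets are final")
        self.versions.append(self.versions[-1] + 1)
        return self.versions[-1]

    def can_retract(self) -> bool:
        # Slide lists retractions only within a project (assumption: nowhere else).
        return self.scope is Scope.WITHIN_PROJECT

    def widen(self, target: Scope, gatekeeper_approved: bool = False) -> None:
        # Assumption: sharing moves in one direction only.
        if target.value <= self.scope.value:
            raise ValueError("sharing can only be widened")
        # Assumption: the gatekeeper role signs off before anything goes public.
        if target is Scope.PUBLIC and not gatekeeper_approved:
            raise PermissionError("gatekeeper approval required")
        self.scope = target


asset = SharedAsset("Example model")
asset.add_version()                      # still editable within the project
asset.widen(Scope.ACROSS_PROJECTS)
retractable_after_sharing = asset.can_retract()
asset.widen(Scope.PUBLIC, gatekeeper_approved=True)
```

Encoding the stages as an ordered enum keeps the "no retraction once public" rule in one place instead of scattering it across permission checks.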

Page 26: Bosc seek-2014-goble

Open Source Customisable Platform

https://bitbucket.org/seek4science/seek

Vrije Universiteit, Amsterdam

Systems Science for Health (SSfH)

MACS

Yeast Glycolysis

Page 27: Bosc seek-2014-goble

Open Source Customisable Platform

https://bitbucket.org/seek4science/seek

Page 28: Bosc seek-2014-goble

Open Facility for European Systems Biology data & model management

seeded by EU programmes

• Platform
  – SEEK + openBIS + new features & styling
• Resource
  – EuroSEEK + pool of community resources (including established SEEKs).
  – Independent researchers. Secure data.
• Facility
  – Curation & support services, training

http://fair-dom.org/

Page 29: Bosc seek-2014-goble

Open Facility for European Systems Biology data & model management

seeded by EU programmes

• Community
  – workshops, user and developer forums, knowledge network, standards & policy, training, FAIRDOM Foundation, Model Carpentry.
• Sys Bio Developers Foundry workshop, 6-7 October, Heidelberg: http://fair-dom.org/wiki/Foundry_workshop
• RI
  – working with other EU RIs, an EU network of national facilities, funding models.

http://fair-dom.org/

Page 30: Bosc seek-2014-goble

Carole Goble

Stuart Owen

Jacky Snoep

Wolfgang Mueller

Olga Krebs

Quyen Nguyen

Natalie Stanford

Katy Wolstencroft

Peter Kunszt

Bernd Rinn

also contributing: VLN SEEK team

also contributing: UK SEEK team