SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium


SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium. Carole Goble, Uni of Manchester, UK; Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa; Isabel Rojas, EML Research gGmbH, Germany. 2nd Evaluation Conference, 19-20 May 2009, Vienna, Austria.

Transcript of SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium

Page 1: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium

Carole Goble, Uni of Manchester, UK
Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa
Isabel Rojas, EML Research gGmbH, Germany

2nd Evaluation Conference, 19-20 May 2009, Vienna, Austria

Page 2: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO-DB

Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites.

Sensitively retrofit a data access, model handling and data integration platform.

Support and manage the diversity of data, models and competencies.

Web-based solution:
• exchange of data, models and processes (intra- and inter-consortia).
• search for data, models and processes across the initiative.
• dissemination of results.

Page 3: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO-DB Team

University of Stellenbosch, South Africa / University of Manchester, UK

Jacky Snoep

EML Research gGmbH, Germany

Isabel Rojas

University of Manchester, UK

Olga Krebs

Wolfgang Müller

Sergejs Aleksejevs

Carole Goble

Stuart Owen

Katy Wolstencroft

Page 4: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Own solutions: Own data solutions and collaboration environments: wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.

Suspicion: Suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.

Data issues: Many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains.

Resource issues: No extra resources for the consortiums. 91 institutes, 11 consortiums, some overlapping.

Page 5: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Principles…

• A series of small victories
• Realistic
• Don't reinvent
• Sustainable and extensible
• Migrate to standards
• Provide instant gratification
• Address doubt and anxiety
• Build it rather than write about it.

Page 6: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Another view on the goal

• File management systems: Plone, Alfresco, PHProjekt, eGroupWare, Wikis
• Specialist databases that you make your own: BASE, maxD, myExperiment
• Specialist public databases you have a bit of: SABIO, JWS Online, myExperiment
• Specialist public databases: BRENDA, PDB, BioModels, WikiPathways, KEGG, UniProt, GenBank, SGD, PubMed

[Diagram labels: Personal; Pile of spreadsheets on my hard drive; Project; SysMO; Community Supported Data Sets; Public Reference Data Sets]

Page 7: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Some numbers & some consequences

• 1 Software Engineer, 1 Bioinformatician, 1 Bio-database specialist
• 11 projects, 91 institutes
• 20 person days/year/project
• 2.5 person days/year/institute
• “Just in case” approach impossible
• Focus on real needs: “just in time”, “just enough”
• The right 20%
• Help people help themselves
• Communication!

80-20 rule: 80% of the features won't be used anyway.

[Chart labels: 20%; 80%; Useful features]
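As a quick arithmetic check of the figures on this slide, the sketch below multiplies each per-unit figure by its count to see what yearly support budget each one implies. Only the four numbers from the slide are used; the variable names are illustrative.

```python
# Figures quoted on the slide.
projects = 11
institutes = 91
days_per_project = 20      # person days / year / project
days_per_institute = 2.5   # person days / year / institute

# Total yearly budget implied by each per-unit figure.
total_from_projects = projects * days_per_project
total_from_institutes = institutes * days_per_institute

print(total_from_projects)    # 220
print(total_from_institutes)  # 227.5
```

Both totals land between 220 and 230 person days per year, so the per-project and per-institute figures are consistent with each other.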

Page 8: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Social Approach

• Questionnaires
• PALS
  - 19 Postdocs and PhD students
  - All three kinds of people
  - Our design and technical collaboration team
  - Very intense face to face and virtual collaboration
  - UK and Continental PALS Chapters
• Audits and Sharing
  - Methods, data, models, standards, software, schemas, spreadsheets, SOPs…

Page 9: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Communication via PALs

Participants: DB team, PALS, Projects

[Diagram labels: Show what is there; Suggest what is possible; Ask for requirements; Give requirements; Tell priorities; Rate outcomes; Suggest improvements; Double check; Transmit; Disseminate; Collect answers]

Page 10: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO-DB PALs Meeting statistics

• 10 months
• 2 PAL all hands meetings
• 2 PAL chapter meetings
• 9 visits to 6 SysMO projects
• Numerous Skype chats, mails, telcons

Impact on development?

See later in talk

Page 11: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

“We need a way of collecting, structuring and sharing Standard Operating Procedures”

“Excel spreadsheets are our most common way of collecting and processing data”

“I need a kind of “yellow pages” that tells me who is in what project and what they are working on”

Page 12: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

[Diagram: Modellers, Experimentalists and Bioinformaticians, linked by “Exchange” arrows]

Page 13: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO Approach

[Architecture diagram. Recoverable labels: SysMO-SEEK web portal interface; Yellow Pages; Assets Catalogue; Search; SysMO DB; JERM; JWS Online; Workflow Management System; Spreadsheet Repository; SBML Models Repository; SOP Repository; Workflow Repository; Consortium Data; Models; Processes: SOPs and Workflows; Public data; SBML; Nature Protocols]

Page 14: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Discovery: SysMO-SEEK

• Single, web based, access point
• Access control & versioning management
• Yellow pages (“who is who”): People, Expertise, Equipment
• Assets catalogue (“who has what”): SOPs, Spreadsheets, pre-published models; metadata about data held by projects
• Access to other repositories: Models (JWS Online), Workflows (myExperiment), Public web services (BioCatalogue)
• Call out to external resources, e.g. PubMed

Does not hold results. Holds metadata on results and links to results.

A component for SysMO groups to incorporate in their own environments and applications.

Page 15: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Demo

Page 16: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium
Page 17: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Finding and Exchanging Project Data

“Just Enough” Exchange

Page 18: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Data Comparison and Exchange

• Public data sources
  - model organism databases (e.g. SGD)
  - BRENDA …
• Data produced by SysMO
  - SABIO-RK, iChiP, MeMo …
  - Local databases & files
  - Excel spreadsheets: the most common format for experimental data.

[Diagram labels: Metadata; Metabolomics; Microarray; Proteomics; Single Cell Data]

Page 19: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

COSMIC and BaCell (Alfresco, document management system)

Page 20: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO LAB Spreadsheet

Experiment  Measurement number  Glucose  Ethanol  Acetate  Lactate  Formiate  Succinate  Pyruvate  Acetoin  2,3 Butanediol
                                mM       mM       mM       mM       mM        mM         mM        mM       mM
1           1                   3,57     0        16,61    11,57    0         0          0         3,06     0
2           1                   0        0        32,85    7,03     5,73      0          0,56      4,21     0

Our Extra Work!!
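To illustrate the kind of extra work a spreadsheet like this needs before it can be compared with other data, here is a minimal sketch that reads the two rows shown on the slide and converts the decimal commas into numbers. The column names and values are taken from the slide; the function name `parse_row` and the choice of whitespace-separated input are assumptions made for the example, not part of SysMO-DB.

```python
# Column names and rows as shown on the slide; concentrations are in mM.
COLUMNS = ["Experiment", "Measurement number", "Glucose", "Ethanol", "Acetate",
           "Lactate", "Formiate", "Succinate", "Pyruvate", "Acetoin",
           "2,3 Butanediol"]

RAW_ROWS = [
    "1 1 3,57 0 16,61 11,57 0 0 0 3,06 0",
    "2 1 0 0 32,85 7,03 5,73 0 0,56 4,21 0",
]


def parse_row(line):
    """Turn one whitespace-separated row into a dict keyed by column name."""
    cells = line.split()
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    record = {
        COLUMNS[0]: int(cells[0]),
        COLUMNS[1]: int(cells[1]),
    }
    # Concentrations use a decimal comma, so swap it for a point first.
    for name, cell in zip(COLUMNS[2:], cells[2:]):
        record[name] = float(cell.replace(",", "."))
    return record


records = [parse_row(line) for line in RAW_ROWS]
print(records[0]["Glucose"])
print(records[1]["Acetate"])
```

Checking the cell count per row is a cheap guard against a shifted column, which is easy to miss when a header such as "2,3 Butanediol" itself contains a comma.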

Page 21: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Challenge

Aim: Maintain the independence of the projects
• Data registered in the SEEK Assets Catalogue
• Data remains at the host project site
• Data pulled from host project site on request

1. Need to map to a common metadata model for each data type (microarray, metabolomic…) so data can be found, understood and compared: Just Enough Results Models (JERM).
2. Need to create software that interfaces with the different existing project data management setups (Alfresco, eGroupWare, MediaWiki, BASE, Excel…): JERM Adapters and Extractors.
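The register, keep-at-host, pull-on-request pattern described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the SEEK implementation: the class names `AssetEntry` and `AssetsCatalogue`, the `fetch` hook and the in-memory `host_site` store are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AssetEntry:
    """Catalogue record: metadata plus a way to reach the data, not the data."""
    asset_id: str
    project: str
    data_type: str
    fetch: Callable[[], bytes]  # hypothetical adapter hook to the host site


@dataclass
class AssetsCatalogue:
    entries: Dict[str, AssetEntry] = field(default_factory=dict)

    def register(self, entry: AssetEntry) -> None:
        # Only the entry is stored centrally; the data stays at the project.
        self.entries[entry.asset_id] = entry

    def find(self, data_type: str):
        return [e for e in self.entries.values() if e.data_type == data_type]

    def pull(self, asset_id: str) -> bytes:
        # Data is requested from the host project only at this point.
        return self.entries[asset_id].fetch()


# Stand-in for a project-side store (e.g. a document management system).
host_site = {"run-1": b"glucose;3.57"}

catalogue = AssetsCatalogue()
catalogue.register(AssetEntry(
    asset_id="run-1",
    project="ExampleProject",      # hypothetical project name
    data_type="metabolomics",
    fetch=lambda: host_site["run-1"],
))

hits = catalogue.find("metabolomics")
print([e.asset_id for e in hits])
print(catalogue.pull("run-1"))
```

Because the catalogue holds a callable rather than a copy, a later change at the host site is what a subsequent `pull` returns, which is the point of leaving the data with the project.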

Page 22: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

JERM: Just Enough Results Model

• Way to “wrap” data sources to match our agreed common data model for each data type
• Minimum information needed to exchange data of each type

[Diagram labels: Databases; Content management systems; Excel spreadsheets; Data file store; JERM; Extract; Export; Import; Metadata; Metabolomics; Microarray; Proteomics; Single Cell Data]

Page 23: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

What is Metadata?

Information, additional to the raw/processed data itself.

What a potential user of the data would need to know to be able to make full and accurate use of the data in a subsequent scientific analysis.

Machine readable descriptions of Data, Models, Services, Resources, Applications

[COSMIC]

Page 24: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Minimum Information Initiatives

CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist

BioPAX: Biological Pathways Exchange, http://www.biopax.org/
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

MIBBI: Minimum Information for Biological and Biomedical Investigations, http://www.mibbi.org/index.php/MIBBI_portal

Page 25: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Just Enough Results Model

Inspired by MCISB Key Results initiative and SBRML [Paton et al]

• Harvested standards
• Analysed current practice and consortium schemas and spreadsheets
• Designing the corresponding JERMs
• Mapping data sources of the projects to JERMs.

Page 26: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

What does it cover?

Page 27: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Data Metadata

People

Projects

Assay

Study

Experimental conditions

Factors studied

Models

SOPs

Homogenised terminology and values in the datasets themselves

Workflows

ISA-TAB compliant

Investigation

Where is it used?

Page 28: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Just Enough Results Model

• Minimum metadata for SysMO exchange: what an experiment is.
• Find: extract metadata from datasets for the Assets catalogue.
• Access: expose data results through a JERM interface; access controlled by consortiums, groups and individuals.

[Diagram labels: Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet; JERM Web Service Access Interface; Access Control; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor]

Page 29: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

[Diagram labels. Projects: COSMIC; BaCell-SysMO; SysMOLab; MOSES; ANOTHER. Data stores: Alfresco; Alfresco; Wiki; Wiki; A DATASTORE]

Page 30: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

In Practice for Spreadsheets

[Diagram labels: Native; JERM Template; JERMed (joined by “+” signs)]
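One way to picture the step from a native spreadsheet to a JERMed one is a column mapping against a template. The sketch below is only that, a sketch: the template field names, the native headers and the function `to_jerm` are hypothetical and are not taken from the actual JERM definitions.

```python
# Hypothetical template: the field names a JERMed metabolomics sheet would use.
JERM_TEMPLATE = ["experiment_id", "measurement_no", "metabolite",
                 "concentration", "unit"]

# Hypothetical mapping from one project's native headers to template fields.
NATIVE_TO_JERM = {
    "Experiment": "experiment_id",
    "measurementnumber": "measurement_no",
    "Compound": "metabolite",
    "Value": "concentration",
    "Unit": "unit",
}


def to_jerm(native_row):
    """Rename native columns to template fields and report what is missing."""
    mapped = {NATIVE_TO_JERM[k]: v for k, v in native_row.items()
              if k in NATIVE_TO_JERM}
    missing = [f for f in JERM_TEMPLATE if f not in mapped]
    return mapped, missing


row = {"Experiment": 1, "measurementnumber": 1,
       "Compound": "Glucose", "Value": 3.57, "Notes": "batch A"}
mapped, missing = to_jerm(row)
print(mapped)
print(missing)
```

Columns the template does not know about (here "Notes") are dropped from the mapped record, and template fields the native sheet lacks (here the unit) are reported so that the metadata can be added by hand.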

Page 31: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Now

[Diagram labels: Register; Extract; Matched to the JERM; Adding metadata; browse; search; Whole record]

Page 32: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Near future

[Diagram labels: Register; Extract; Matched to the JERM; Adding metadata here; browse; search; Whole record; Filtered record; Enriched record]

Page 33: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Future

[Diagram labels: Register; Extract; Matched to the JERM; Adding metadata here; browse; search; Collections of Records; Meta-analysis]

Page 34: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Just Enough Results Model Tools

• JERM Source Extractor Generator
  - New spreadsheets adopt JERM template
  - Legacy spreadsheet JERM mapper
  - Databases have JERM mapper
• Spreadsheet Ontology Annotator
  - Restrict the values that a range of fields can have.

[Diagram labels: Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet; JERM Web Service Access Interface; Access Control; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor]
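Restricting the values that a range of fields can have amounts to checking cells against a controlled vocabulary. A minimal sketch, assuming made-up vocabularies and column names (`ALLOWED`, `check_rows` and the listed terms are illustrative, not the annotator's actual vocabularies):

```python
# Hypothetical controlled vocabularies, keyed by column name.
ALLOWED = {
    "unit": {"mM", "uM", "nM"},
    "data_type": {"metabolomics", "microarray", "proteomics", "single cell"},
}


def check_rows(rows):
    """Return (row index, column, value) for every value outside its vocabulary."""
    problems = []
    for i, row in enumerate(rows):
        for column, allowed in ALLOWED.items():
            if column in row and row[column] not in allowed:
                problems.append((i, column, row[column]))
    return problems


rows = [
    {"unit": "mM", "data_type": "metabolomics"},
    {"unit": "mmol/l", "data_type": "metabolomics"},
    {"unit": "mM", "data_type": "Metabolomic"},
]
print(check_rows(rows))
```

Reporting the offending row and column, rather than silently correcting the value, leaves the decision with the person who owns the spreadsheet.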

Page 35: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Models

Page 36: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Model

• JWS Online - database of curated models and a model simulator.
• ToBiN - platform for storage and analysis of genome scale metabolic networks (PSYSMO)
• Biomodels - database of curated models (EMBL-EBI)
• Copasi - Complex Pathway Simulator (Mendes et al)
• Pre-publication SEEK store
• Semantic SBML (TRANSLUCENT); SBRML (MCISB)

More After the Demo!

Page 37: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Processes

Page 38: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Processes

Protocols and SOPs
• SOPs assets deposited or linked to
• SOP gathering
• Nature Protocols format recommendation
• High level classification for indexing and tagging
• Got a few, need more.

Protocol: Title; Authors; Keywords; Abstract; Materials (Reagents, Reagent Set Up, Equipment); Time Taken; Procedure; Troubleshooting; Critical Steps; Anticipated Results; References
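The protocol sections listed on this slide lend themselves to a simple completeness check when SOPs are gathered. The sketch below assumes an SOP is held as a plain dictionary keyed by section name; that representation, the function `missing_sections` and the sample SOP are hypothetical.

```python
# Section names as listed on the slide.
PROTOCOL_SECTIONS = [
    "Title", "Authors", "Keywords", "Abstract",
    "Reagents", "Reagent Set Up", "Equipment",
    "Time Taken", "Procedure", "Troubleshooting",
    "Critical Steps", "Anticipated Results", "References",
]


def missing_sections(sop):
    """List the sections that are absent or empty in an SOP record."""
    return [s for s in PROTOCOL_SECTIONS if not str(sop.get(s, "")).strip()]


# Hypothetical, partly filled-in SOP.
sop = {
    "Title": "Chemostat sampling",
    "Authors": "A. Example",
    "Procedure": "1. Take sample. 2. Quench.",
    "Equipment": "  ",
}
print(missing_sections(sop))
```

A section that contains only whitespace counts as missing, so a template that was copied but never filled in is still flagged.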

Page 39: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Processes

Protocols and SOPs
• SOPs assets deposited or linked to
• SOP gathering
• Nature Protocols format recommendation
• High level classification for indexing and tagging
• Got a few, need more.

http://www.molmeth.org

http://openwetware.org

Page 40: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Workflow Management System

Bioinformatics Processes: Workflows
• Data preparation, annotation and analysis pipelines
• SBML model construction and population
• Linking together Data sets, Web Services, R scripts, BioMART, Java libraries, Grid Services, (MATLAB in beta)
• Free and Open Source

Page 41: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Data integration: workflows for model parameterisation and validation.

Building models using workflows

Manipulation of SBML models in workflows

LibSBML: data integration & constructing and annotating SBML models

[Li et al]

Page 42: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Ramp up when more data resources become workflow accessible

Libraries of SysMO workflows

Spreadsheet Smart.

Page 43: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

• Microarray Analysis
• SBML Model manipulation
• Pathway Analysis
• Chemical structure analysis
• Protein structure analysis
• Kinetic data
• Excel Spreadsheet handling
• Controlled vocabulary look-ups

http://myexperiment.org

Page 44: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Now…

Demo!!!!!!

Everyone contributed. But obviously we only have time for a few examples.

Page 45: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Models

JWS Online model interface:
http://jjj.mib.ac.uk
http://jjj.bio.vu.nl
http://jjj.biochem.sun.ac.za

• SysMO models interface at JWS Online
• SBML upload and web services
• JWS update, new interface (to be released soon), SBGN schemas

Page 46: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

JWS Online SysMO home

~/sysmo

Page 47: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

MOSES models selection

Page 48: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

MOSES models

Page 49: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

JWS Online interface MOSES model

link to localhost /sysmo

Page 50: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SBML model upload

Page 51: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

JWS Online access via web services

~/axis/services/QueryJWS?wsdl

{getRates, getAllModels, getAllBiomodels, getAllBiomodelsIds, getModelsByOrganism, getModelsByCategory, getModelInfo, getNmat, getKmat, getLmat, getSteadyStateTable, getTimecourse, getJacob, getEigenv, getCmat, getEmat, getRateEquations, getRateEquationFormulae, getExtVar, getExternalMetabValues, getInitMetabValues, getParamValues, hasFunction}
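For readers unfamiliar with WSDL-described services, the sketch below shows roughly what a request to one of the listed operations could look like on the wire. It only builds a SOAP 1.1 envelope as a string and sends nothing. The operation name `getModelInfo` comes from the list above; the service namespace, the parameter name `modelName` and the message style are assumptions, and the real values would have to be read from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
SERVICE_NS = "urn:example:QueryJWS"  # placeholder; the real one is in the WSDL


def build_request(operation, params):
    """Build a SOAP 1.1 envelope calling `operation` with simple parameters."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# "modelName" is a hypothetical parameter name.
xml_text = build_request("getModelInfo", {"modelName": "example"})
print(xml_text)
```

In practice a SOAP client library that reads the WSDL would generate such messages automatically; writing the envelope by hand is shown here only to make the structure visible.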

Page 52: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

JWS Online new interface (α)

Page 53: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

What we have done....

[Architecture diagram. Recoverable labels: SysMO-SEEK web portal interface; Yellow Pages; Assets Catalogue; Search; SysMO DB; JERM; JWS Online; Workflow Management System; Spreadsheet Repository; SBML Models Repository; SOP Repository; Workflow Repository; Consortium Data; Models; Processes: SOPs and Workflows; Public data; Standards; SBML; Nature Protocols]

Page 54: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Training, Know-how and Dissemination

• SysMO-DB Training
  - Kick-start toolkits, workflows and SOP templates
• SysMO consortium (esp. PALS)
  - Social networking for shared content, know-how and best practice
  - Contribution and best of breed solutions in place
• Outside consortium
  - 6 presentations
  - 2 tutorials
  - More in the pipeline

Page 55: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

SABIO-RK User Meeting

June 15-16, 2009

Heidelberg, Germany

Costs supported by SysMO

http://projects.eml.org/sdbv/events/SABIORK_UserMeeting/index.html

Page 56: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Future: more, more, more!

• Extend and stabilize software
• More JERM, more data in SEEK
  - More JERM extractors, data, search possibilities
• More Models
  - More data into JWS, integrate more tools to SysMO-SEEK
• More SOPs
• More Workflows
  - Facilitate workflow-ready solutions, data collection/analysis workflow, workflow player in SEEK
• More semantics
  - Closed vocabularies, ontologies
• More training

Page 57: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Timetable

SEEK Launch: June 2009
JERM Phase 1 demo: July 2009
Workflow with JWS-Online and SABIO-RK: July 2009
JERM model stabilised: Sept 2009
Spreadsheet tools: Nov 2009
Model comparison: Nov 2009
SEEK controlled vocabularies: Feb 2010
JERM tooling: Feb 2010
MIRIAM comparison: Mar 2010
Workflow authoring and harvesting: Mar 2010
Workflow Player in SEEK: June 2010
Training and Outreach: ongoing

Page 58: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

How to get there

• Update SEEK and share data
• Do not need to share full content
  - tell people about existence of data; help people avoid duplicate work; find contacts
• After publication, data ready for sharing with the scientific world
• SysMO-DB will sign an NDA where needed
• Retaining data at sites comes with responsibility
  - Reliability: sites available continuously and promptly
  - Support: must be proof against virus attacks, etc.
  - Archiving: beyond the lifetime of the project

Page 59: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Talk to your PAL
• Right requirements
• Right software
• Steer the project
• Lots of work under the hood

Make sure your PAL has a voice in your project.

Look at our wiki

Thanks!

Page 60: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Acknowledgements
• SysMO-DB Team
• SysMO-PALS
• myGrid, EML and JWS Online teams
• OMII-UK, Uni Southampton
• EMBL-EBI, MCISB

Page 61: SysMO-DB:  Towards “just enough” data exchange for the SysMO Consortium

Thank you! Questions?