SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology
Katy WolstencroftUniversity of Manchester
SysMO-DB
A data access, model handling and data integration platform for Systems Biology:
To support and manage the diversity of Data, Models and experimental protocols Local data management systems
That promotes shared understanding Using a common platform and common
technologies
DB
Systems Biology Challenges
Interdisciplinary work Heterogeneous data and models Modellers and experimentalists have different
skills, training, experience Modellers and experimentalists have different
vocabularies and jargon Working together
Pan European collaboration Eleven individual projects, 91 institutes
Different research outcomes A cross-section of microorganisms, incl.
bacteria, archaea and yeast
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how
Already running since April 2007 Runs for 3-5 years This year, 2 new projects join and 6 leave
http://www.sysmo.net
Systems Biology of Microorganisms
The Problem
No one concept of experimentation or modelling
No planned, shared infrastructure for pooling
Types of data
Multiple omics genomics, transcriptomics proteomics, metabolomics fluxomics, reactomics
Images Molecular biology Reaction Kinetics Models
Metabolic, gene network, kinetic Relationships between data sets/experiments
Procedures, experiments, data, results and models Analysis of data
Started in June 2008Web-based solution to facilitate:
exchange of data, models and processes (intra- and inter- consortia)
search for data, models and processes across the initiative
maximisation of the "shelf life" and utility of the data, models and processes generated
dissemination of results
DB SysMO-DB
SysMO-DB Team
University of Stellenbosch, South AfricaUniversity of Manchester, UK
Jacky Snoep
Hits, Germany
Isabel Rojas
University of Manchester, UK
Olga Krebs
Wolfgang Müller Carole Goble
Stuart Owen
Katy Wolstencroft
Finn Bacall
SABIO-RK
JWS Online
TavernamyExperiment
SysMO-DB PALS team Power Contributors.
21 Postdocs and PhD students Design and technical
collaboration team Intense collaboration UK and Continental PALS
Chapters Audits and Sharing.
Methods, data, models, standards, software, schemas, spreadsheets, SOPs…..
20 questions Deployment into Projects
Principles…
A series of small victories Realistic Don‘t reinvent Sustainable and extensible Migrate to standards
Provide instant gratification Incremental development Fitting in with normal lab
practices
The Lowest Hanging Fruit
SysMO SEEK – a catalogue of assets SysMO Yellow Pages The people and their expertise The institutions and their facilities Data – experimental data sets Data – analysed results Data – external reference data sets Models Processes – laboratory protocols and bioinformatics
analyses Publications
The catalogue references assets held elsewhere
SEEK screenshot?
COSMIC
BaCell-SysMO
SysMOLab
MOSES
Alfresco
Alfresco
Wiki
Wiki
ANOTHER
A DATASTORE
Harvesters
Why not a central Warehouse? Protective of models
in progress vs published models. Access and Version management Curator-Rival conflict
Reluctant to share data Even within their own projects Legacy spreadsheets dominate Curation practices vary Centralised archive take-up Point to Point Exchange
People don’t mind sharing methods People want to advertise publications
Nature 461, 145 (10 Sept09)
Access Permissions
Just Enough Sharing
Reusing myExperiment
Dat
a
Mod
els
Proc
esse
s
SysMO DB
SysMO-DB Architecture
SysMO-SEEK web interface
Assets and Yellow
Pages Catalogues
JERM
Making use of the Assets
Understanding the content of the data Linking assets together Linking assets to experimental context Running comparisons between data files Running model simulations Running data analysis pipelines
What is the JERM?
JERM “Just Enough Results Model” Minimum information to exchange data
What type of data is it Microarray, growth curve, enzyme activity…
What was measured Gene expression, OD, metabolite concentration….
What do the values in the datasets mean Units, time series, repeats….
Which experiment does it relate to? How does it relate to models? How was the data created
SOPs and protocols
CIMR Core Information for Metabolomics ReportingMIABE Minimal Information About a Bioactive Entity MIACA Minimal Information About a Cellular Assay MIAME Minimum Information About a Microarray Experiment MIAME/Env MIAME / Environmental transcriptomic experiment MIAME/Nutr MIAME / Nutrigenomics MIAME/Plant MIAME / Plant transcriptomics MIAME/Tox MIAME / Toxicogenomics MIAPA Minimum Information About a Phylogenetic Analysis MIAPAR Minimum Information About a Protein Affinity Reagent MIAPE Minimum Information About a Proteomics Experiment MIARE Minimum Information About a RNAi Experiment MIASE Minimum Information About a Simulation Experiment MIENS Minimum Information about an ENvironmental Sequence MIFlowCyt Minimum Information for a Flow Cytometry Experiment MIGen Minimum Information about a Genotyping Experiment MIGS Minimum Information about a Genome Sequence MIMIx Minimum Information about a Molecular Interaction Experiment MIMPP Minimal Information for Mouse Phenotyping Procedures MINI Minimum Information about a Neuroscience Investigation MINIMESS Minimal Metagenome Sequence Analysis Standard MINSEQE Minimum Information about a high-throughput SeQuencing Experiment MIPFE Minimal Information for Protein Functional Evaluation MIQAS Minimal Information for QTLs and Association Studies MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experimentMIRIAM Minimal Information Required In the Annotation of biochemical Models MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry
ExperimentsSTRENDA Standards for Reporting Enzymology DataTBC Tox Biology Checklist
BioPAX : Biological Pathways Exchange http://www.biopax.org/FuGE Functional Genomics Experiment MGED: Microarray Experimental Conditions
http://www.mibbi.org/index.php/MIBBI_portal
Minimum Information Models
The Idea
For each data type….. Transcriptomics Proteomics Metabolomics Single Cell Data
Generate and apply…. JERM template JERM extractor for data host Subset registered in SEEK Access / export through JERM interface / template
Define a JERM….. Top down analysis of standards Bottom up analysis of practice
1
2
3
ISA-TAB
Experimental Data
Metadata
People
ProjectsAssay
Study
Experimental conditions
Factors studied
Models
SOPs
Homogenised terminology and values in the datasets themselves
Workflows
Based on ISA-TAB
Investigation
SEEK + JERM
For publishing
JERM data needs to be related to SOPs, experimental context (ISA) and other data
JERM must be “MIBBI” compliant for exporting to public repositories e.g. Microarray data needs to be MIAME compliant
ISA-TAB
Relating data and its experimental context Investigation, Study, Assay
TAB = tabular A format suitable for spreadsheets
http://isatab.sourceforge.net/
ISA Provides....
A common framework for relating different types of data e.g. microarrays and proteomics
Facilitates submission to international public repositories of genomics, transcriptomics and proteomics studies
Identifying Biological Objects
What do you have in your data? Proteins/enzymes, genes/expression levels,
metabolites
Where/how do these objects interact? Pathways, flux, experimental conditions
What models describe these interactions
Possible when using common frameworks, naming schemes and controlled vocabularies
BioPortal Integration for Searching
Repository for submitting and sharing Biological ontologies http://bioportal.bioontology.org/
Search for concepts across all or selected ontologies
BioPortal provides a number of Restful Webservices Search Concept lookup Visualisation
Integrated within SEEK as a plugin
Tools to help manage data:Annotation standards by stealth
Controlled vocabulary plug inBioPortal
Following Standards
We recommend formats but we do not enforce them Protocols and SOPs – Nature Protocols Data – JERM models and community minimum
information models Models – SBML and related standards Publications – PubMed and DOI
If you follow the prescribed formats, you get more out, but if you don’t, you can still participate
lowering the adoption barrier
Off the shelf
Except for the JERM, we have only used community resources, vocabularies and services
You can get a long way by implementing community practices and providing ways to integrate them
SysMO-DB and Models
Nicolas Le Novere, Data Integration in the Life Sciences, Manchester, 2009
Models: Incentives for using Standards
Models can be shared in SysMO-SEEK in any format SBML is the recommended format We also recommend MIRIAM compliance and SBO annotation
If you use SBML, you can use JWS Online to run simulations in SEEK
Screenshot of JWS Online
JWS Online Plugin•online simulator, runs in your browser•upload models in SBML format•Web Service enabled•SBGN schemas, with annotations and external links
Falko Krause, Humboldt-University, Berlin http://www.semanticsbml.org/aym
Models Resources
Models can be published in public repositories JWS-Online, BioModels
Models can be annotated SBML, MIRIAM, SBO
No public resources currently for sharing models with associated data, or for loading new data into models
Linking Data to Models
Relating data and models Where did the data come from for developing the
model? Where did the data come from for validating the
model? What were the results of model simulations?
Current Functionality in SEEK
Show all data used for construction together with the model, such that process can be repeated
Uploaded models loaded with this data by default
Manually alter parameters and run simulations
Next Steps: Model Validation
Test/compare model with experimental data for complete system Find data in SEEK Upload data from elsewhere Automatically load into model Run simulations and compare with original results
JERM for models Mapping tools – allows you to identify
columns/rows in spreadsheets containing the right information
ISA for Models
Modelling and experimental work intersect Investigations, Study, Assay.....or modelling
analysis..... Modelling analysis types
Metabolic models, gene networks Modelling type
ODE, algebraic
Studies – combinations of experimental assays, modelling analyses, and informatics analyses
SysMO-DB the e-Laboratory
An e-Laboratory is an information system for bringing together people, data and analytical methods at the point of investigation or decision-making
Current Status
Finding things so that we can compare them Understanding who has what Understanding what can be compared with what
– the experimental context
Where we are going…
A dynamic resource for analysis as well as browsing
Automatic comparison of data from inside files Understanding where and how data and models
are linked Running simulations with new experimental data Running analyses and workflows over the data
and models
Workflows from myExperiment
Data preparation, annotation and analysis Systems Biology workflow Pack on myExperimentMicroarray analysis and text mining
Created by Afsaneh Maleki-Dizaji
from SUMO, University of Sheffield
Based on previous work by Paul Fisher, University of Manchester
http://www.myexperiment.org/workflows/187
SEEK as a data analysis and meta analysis service
SBML model construction and population Calibration workflow Data requirements
Parameterised SBML model Experimental data
Metabolite concentrations from key results database
Calibration by COPASI web service
Peter Li
Data analysis and meta analysis
SEEK Analysis Service with pre-cooked analysis tools.
Calibration workflow Data requirements
Parameterised SBML model Experimental data
Metabolite concentrations from key results database
Calibration by COPASI web service
Peter Li
Load model:
Load data:
GO
New Directions
Opening SysMO Out
Using SysMO as a dissemination space for the SysMO consortium Supplementary material in publications Data citation
Packaging software so that others can use it Easy to install a SEEK for yourself
Packaging and exchanging JERM Templates Helping with standardisation
Promotion and example work with SBRML and data and models linkage
SysMO-DB Approach in Other projects
SysMO2 – new projects and legacy EraSysBio+ Lungsys and SBCancer Virtual Liver
New Considerations
Eukaryotic organisms Interactions between host and pathogen Human disease
multicellular interactions, tissues, organs multiscale modelling
Outstanding Issues
Keeping data at project sites has responsibilities Reliability - Sites available continuously and promptly Support - Must be proof against virus attacks, etc. Archiving - Beyond the lifetime of the project.
How it works
Find a solution that fits in with current practices Start simple, show benefits, add more Engage with the people actually doing the work
PhD students, Post-docs Let the scientists retain control over their data
and who can see it Don’t reinvent. Use available vocabularies,
minimal model standards Help prevent people duplicating work by linking
the people as well as the resources
Acknowledgements
SysMO-DB Team SysMO-PALS
myGrid, Hits and JWS Online teams EMBL-EBI, MCISB
http://www.sysmo-db.org
Top Related