SysMO-DB: A Community-Based Approach to Data Sharing
Dr Katy Wolstencroft, University of Manchester
SysMO-DB
A data access, model handling and data integration platform for Systems Biology
A web-based resource that promotes shared understanding
Using a common platform and common technologies
Started July 2008
SysMO-DB Dev Team
University of Stellenbosch, South Africa
University of Manchester, UK
Jacky Snoep
Heidelberg Institute for Theoretical Studies, Germany
University of Manchester, UK
Olga Krebs
Wolfgang Müller
Sergejs Aleksejevs
Carole Goble
Stuart Owen
Katy Wolstencroft
Finn Bacall
Franco B du Preez
Pan-European collaboration
Eleven individual projects, 89 institutes
Different research outcomes
A cross-section of microorganisms, incl. bacteria, archaea and yeast
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how
Already running since April 2007
Runs for 3-5 years
This year, 2 new projects join and 6 leave
http://www.sysmo.net
Systems Biology of Microorganisms
Challenges
Heterogeneous data and models
Distributed groups of researchers
Modellers and experimentalists have different skills, training, experience
Scientists want to remain in control
Social and technical challenges
Social Challenge: Focus Group
DB team, Focus Group, Projects
Show what is there
Suggest what is possible
Ask for requirements
Give requirements
Tell priorities
Rate outcomes
Suggest improvements
Double check
Transmit
Disseminate
Collect answers
Focus Group SysMO-DB PALS
21 postdocs and PhD students
Modellers, experimentalists and bioinformaticians
Design and technical collaboration team
Intense collaboration
UK and Continental PALS Chapters
Audits and Sharing
Methods, data, models, standards, software, schemas, spreadsheets, SOPs…
20 questions
Deployment into Projects
Technical Challenge
Rapid and incremental development
Just enough and just in time, not just in case
No reinvention
Driven by the PALs
Sustainable and extensible
Migrate to standards
Fitting in with normal lab practices
What do we share
Protocol Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References
Methods + Data + Results
Nature Protocols
All SysMO Assets
Protocols for Models
Protocol Title, Authors, Keywords, Description, Assumptions, Equations, Numerical Methods/Algorithms, Computational Tools, Parameter Estimation Techniques, Limitations, References
What do we share
Methods + Data + Results + Models
All SysMO Assets
SOP
A Tree View of Assets
Investigation, Studies, Assay
Construction
Validation
SOP
SOP
ISA infrastructure provides a directory structure for experiments
http://isatab.sourceforge.net/
Expertise, tools
Coordinates, data
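The Investigation, Study, Assay tree described on this slide can be pictured as a small nested data structure. The following is a minimal sketch only: the class and field names are assumptions made for illustration and are not taken from the SysMO-SEEK code base or from the ISA-Tab specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    sops: List[str] = field(default_factory=list)        # SOP titles attached to this assay
    data_files: List[str] = field(default_factory=list)  # data file names attached to this assay


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Render the hierarchy as indented text, one asset per line."""
    lines = [f"Investigation: {inv.title}"]
    for study in inv.studies:
        lines.append(f"  Study: {study.title}")
        for assay in study.assays:
            lines.append(f"    Assay: {assay.title}")
            for sop in assay.sops:
                lines.append(f"      SOP: {sop}")
            for name in assay.data_files:
                lines.append(f"      Data: {name}")
    return lines
```

Calling tree_lines on an Investigation with one Study and one Assay yields one indented line per asset, which is roughly what a tree view of assets would display.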
How do we share
“Just Enough Results Model”
What type of data is it? Microarray, growth curve, enzyme activity…
What was measured? Gene expression, OD, metabolite concentration…
What do the values in the datasets mean? Units, time series, repeats…
Based on:
Minimum information models, e.g. MIAME, MIAPE, MIRIAM
Biological ontologies, e.g. Gene Ontology, MGED, SBO
BioPortal web service used in SysMO-SEEK for: concept lookup and visualisation
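The three JERM questions above amount to a small set of required metadata fields per dataset. As a rough sketch, assuming hypothetical field names (data_type, measured, units) that do not come from the actual JERM templates, a check for missing answers might look like this:

```python
from typing import Dict, List

# Hypothetical field names, one per JERM question on the slide:
# what type of data is it, what was measured, what do the values mean.
REQUIRED_FIELDS = ("data_type", "measured", "units")


def missing_jerm_fields(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not str(record.get(name, "")).strip()]
```

For example, a record describing a growth curve that measured OD but left the units blank would be reported as missing only the units field.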
How do we share
Share JERM templates developed by SysMO-DB, PALs and consortium
Spreadsheet templates
Database schemas
Encourage uptake throughout SysMO
transcriptomics, metabolomics, proteomics, etc.
Tools to help manage data: annotation standards by stealth
Controlled vocabulary plug-in
BioPortal
JERM Model
SysMO JERM: a ‘MIBBI’ for the SysMO-SEEK
What do we need to help you find stuff? Title, person, filename, class
What is experiment specific?
What is experiment specific, but helps us map between them?
Common biological elements: chemicals, genes, proteins, organisms, strains
Identifying Biological Objects
What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites
Where/how do these objects interact? Pathways, flux, experimental conditions
What models describe these interactions?
Possible when using common frameworks, naming schemes and controlled vocabularies
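To illustrate why common naming schemes matter, the sketch below indexes datasets by the biological objects they mention and reports objects that appear in more than one dataset. The dataset names and identifier strings are invented for the example, and the approach is an assumption about how such mapping could work, not a description of SysMO-SEEK internals.

```python
from collections import defaultdict
from typing import Dict, List, Set


def index_by_object(datasets: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each biological object identifier to the datasets that mention it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(datasets):  # sorted so the output order is deterministic
        for obj in datasets[name]:
            index[obj].append(name)
    return dict(index)


def shared_objects(datasets: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Keep only the objects that link two or more datasets."""
    return {obj: names for obj, names in index_by_object(datasets).items() if len(names) > 1}
```

The lookup only works if both datasets use exactly the same identifier for the same gene, protein or metabolite, which is the point the slide makes about controlled vocabularies.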
Following Standards
We recommend formats but we do not enforce them
Protocols and SOPs: Nature Protocols
Data: JERM models and community minimum information models
Models: SBML and related standards
Publications: PubMed and DOI
If you follow the prescribed formats, you get more out, but if you don’t, you can still participate
Lowering the adoption barrier
Access Permissions
Just Enough Sharing
...we don’t talk about security
COSMIC
SysMOLab
MOSES
Alfresco
Wiki
Wiki
ANOTHER
A DATASTORE
Just Enough sharing
SOP
Fetch on Request
Direct Upload
When do People Share
Data Collection: your own group and maybe your project
Pre-publication: project + maybe consortium
Post-publication: consortium and wider community
Collaboration
Discussion and criticism
Advertising
• Suspicion and fear of scooping
• Reputation
SysMO aims: sharing sooner
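The widening audiences on this slide (own group, project, consortium, wider community) can be modelled as nested sharing scopes. This is a minimal sketch under the assumption that the scopes are strictly nested; the names are hypothetical and the real SEEK access permissions may well be finer grained.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Audiences ordered from narrowest to widest."""
    GROUP = 1
    PROJECT = 2
    CONSORTIUM = 3
    PUBLIC = 4


def can_view(asset_scope: Scope, viewer_scope: Scope) -> bool:
    """Decide whether a viewer may see an asset.

    asset_scope is the widest audience the owner has chosen for the asset.
    viewer_scope is the closest relationship the viewer has to the owner,
    e.g. GROUP for a lab colleague or PUBLIC for an anonymous visitor.
    """
    return viewer_scope <= asset_scope
```

Under this model, sharing sooner simply means raising an asset's scope from GROUP towards PUBLIC earlier in the project.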
Incentives for sharing
Safe haven for data
Credit and attribution
Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.)
A repository for “supplementary materials” in publications
Linking publications and data
Access other resources through a SEEK gateway
SEEK as a Gateway
JWS Online Plugin
• online simulator, runs in SysMO-SEEK
• upload models in SBML format
• SBGN schemas, with annotations and external links
Incentives for sharing
Credit and attribution
SEEK records who owns what. If data, models, or protocols are reused, scientists get recognition
Accountability
SEEK records who owns what. If you take credit for others’ work, they will see
Data citation: formal credit for data published in SEEK
Data Citation
Persistent identifiers and URLs for the data
Linking people to the data
Safe haven for the data
Guarantees of sustainability
Data MUST be uploaded and archived
If cited, it must be public
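The two citation rules on this slide (data must be uploaded and archived, and cited data must be public) can be written down as a simple check. The sketch below is illustrative only: the Asset fields and the identifier are hypothetical and do not reflect how SEEK stores its records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    identifier: str  # persistent identifier (hypothetical format)
    uploaded: bool
    archived: bool
    public: bool


def citation_blockers(asset: Asset) -> List[str]:
    """List the reasons an asset cannot yet be formally cited (empty if none)."""
    blockers = []
    if not asset.uploaded:
        blockers.append("not uploaded")
    if not asset.archived:
        blockers.append("not archived")
    if not asset.public:
        blockers.append("not public")
    return blockers
```

An asset that is uploaded and archived but still private would return a single blocker, "not public", until its owner releases it.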
SEEK as a Safe Haven
HITS can archive SysMO data for 10 years
All SysMO software is open source and available
Distinction between sustaining the service and the software
Governance and Policy
What is required by SysMO members?
When should they share during their projects?
How long after the project can they keep data private to finish publications?
If their data is stored locally, what is the archive process?
Policy from DMG and funding agencies and NOT SysMO-DB
Governance and Policy
Proposals under discussion:
All data registered in SEEK should be uploaded and archived at the end of a SysMO project
All data from finished projects should be shared
How long after the end? 1 day, 6 months, 1 year?
Scientists can invoke “creator’s privilege” on SysMO assets produced near the end of the project
Extra time to write up and publish before release to the general public, respecting publication cycles
SysMO So Far…
People ARE sharing
Over 300 assets in SEEK
SOPs: 102, Models: 17, DataFiles: 95, Investigations: 13, Studies: 26, Assays: 53
PALs: a network of young SysBio researchers
Training and education in data and metadata management spreading through the consortium
Modellers and experimentalists communicating
SysMO Methods Spreading
Virtual Liver Mueller, via HITS
Lungsys
SBCancer
EraSysBio+
Eukaryotic organisms
Interactions between host and pathogen
Human disease
Multi-scale modelling
Why it works for us
A solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work: PhD students, post-docs
Build to the PALs’ requirements
Respect publication cycles
Respect cultural differences
Scientists stay in control
Acknowledgements
SysMO-DB Team
SysMO-PALS
myGrid, HITS and JWS Online
EMBL-EBI, MCISB
http://www.sysmo-db.org