Integrative Information Management for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
Data Integration in the Life Sciences, Gothenburg, Sweden
27 August 2010
The MCISB
• Pioneer the development of new experimental and computational technologies in systems biology
• Currently employs 9.5 multidisciplinary people
  • Mathematicians, informaticians, experimentalists, etc.
  • All share the same office and lab
• Develop kinetic models of yeast metabolism
Metabolism
Models
• Genome-scale SBML model of yeast metabolism
  • Not kinetic / quantitative!
  • Annotated model
    – All >2000 molecules have unique database references
    – MIRIAM standards have been followed (RDF)
    – Should be entirely unambiguous for third party users
    – Should be usable in third party tools
    – Should allow experimental data to be imported easily
– Herrgård MJ, Swainston N, et al. A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat Biotechnol. 2008, 26, 1155-60.
Bottom-up systems biology
• Steps in kinetic modeling:
• Identify the pathway or portion of a network that is to be modeled
• Associate the model with functions and parameter values that represent its dynamic behavior, either from databases or experimentation
• Analyze and/or simulate the resulting model to understand its properties
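The three steps above can be illustrated with a minimal sketch, assuming a single irreversible reaction that follows Michaelis-Menten kinetics. The rate law choice, the function names and all numeric values are assumptions made for illustration; they are not taken from the MCISB models.

```python
def michaelis_menten_rate(substrate, enzyme, kcat, km):
    """Reaction rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def simulate(substrate0, enzyme, kcat, km, t_end, dt=0.001):
    """Integrate dS/dt = -v and dP/dt = +v with fixed-step forward Euler."""
    substrate, product = substrate0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        v = michaelis_menten_rate(substrate, enzyme, kcat, km)
        substrate -= v * dt
        product += v * dt
    return substrate, product


# Hypothetical parameter values in arbitrary units (not from the talk).
s_final, p_final = simulate(substrate0=1.0, enzyme=0.01, kcat=100.0,
                            km=0.5, t_end=2.0)
print(f"substrate={s_final:.4f} product={p_final:.4f}")
```

Here KM and kcat play the role of model parameters, while the substrate, product and enzyme concentrations are the model variables.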
Bottom-up systems biology
• In common practice, model construction is a manual process, in which a modeler associates a model with experimental data for simulation
• Such an approach can give rise to good quality models, but is more a cottage industry than a highly scalable production process
• Can this be automated?
Automation of the process
• Experimental data is captured from instruments, and subject to primary analyses
• Experimental data and results of the primary analyses are archived in experimental data repositories
• The information required for modeling is extracted from the experimental data resources and stored in a Key Results Database (KRDB)
• A workflow obtains qualitative model information, represented using SBML, parameterizes this model with results in the KRDB, and analyses/simulates the resulting quantitative model
[Diagram: enzyme kinetics, quantitative metabolomics and quantitative proteomics feed an SBML model with parameters (KM, kcat) and variables (metabolite and protein concentrations). Labelled components: PRIDE XML, MeMo, SABIO-RK, KRDB, web services.]
From instrument to result
• Raw data typically needs to be analysed before use
• Experimental data is often managed in an ad hoc way
• Experimentalists are not keen to spend time on data curation for archiving or sharing
• Try to capture necessary metadata as part of primary data analysis
Requirements
• The experimental techniques share requirements:
  • perform analyses on the raw experimental data to derive the secondary quantitative parameters required in the model
  • store the raw experimental data along with relevant metadata and the derived parameters, thus providing the facility to trace back and reanalyze raw data should this be required
• Where possible, existing data standards and tools are reused, although in practice data standards tend to lag behind technique development, and tools tend to lag behind standards
Data capture
• Software wizards have been developed that step experimentalists through the analysis of primary data
  • QconCAT PrideWizard for proteomics
  • KineticsWizard for enzyme kinetics
• Metadata collected along the way, as unobtrusively as possible
  • Heavily reliant on database web services
KineticsWizard
QconCAT PrideWizard
QconCAT PrideWizard
[Diagram: QconCAT PrideWizard pipeline. Steps: Identify, Quantify, Format, Upload. Labelled components: Mascot, mzData, PRIDE Converter, PRIDE XML, eXist database, PRIDE, web / web service, browser.]
Web interfaces
From instrument to result
• All laboratories carry out primary analyses of experimental data
• All laboratories carry out some form of secondary analyses based on primary results
• Many laboratories struggle to manage the results of these processes in a systematic manner
• We see the key to obtaining manageable results as being to integrate data capture and management with necessary analyses
But…
• …MCISB has to manage “only” three types of experiment
  • Proteomics, metabolomics, enzyme kinetics
• Informatics team share office with experimentalists and modellers
• We’ve been doing this for years…
  • Lots of time, lots of people, lots of resource
  • Infrastructure development is part of our remit
And…
• …many projects are far more diverse
• Informatics team separated from experimentalists, who are separated from modellers
• Less informatics resource
• Heavyweight approach of MCISB (bespoke tools for each experiment) not always applicable…
So…
• …lightweight approach may be more suitable
• Store only secondary data necessary for modelling
  • Not raw data
• Key Results Database (KRDB)
  • More modeller-focussed
Key Results Database
• Who, what, some how and why?
• Measure “something” under “some conditions”
• Measurements are generally a number but may be some other artifact
• Conditions may apply across entire experiment (Static Factors)
• Conditions may change across measurements (Variable Factors)
• Measurements may take place at a certain time
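The data model described above might be sketched as follows. This is an illustrative assumption only: the class names, field names and example values are hypothetical and do not reproduce the actual KRDB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Measurements are usually a number, occasionally some other artifact.
Value = Union[float, str]


@dataclass
class Measurement:
    what: str                                   # the "something" measured
    value: Value
    unit: Optional[str] = None
    variable_factors: Dict[str, Value] = field(default_factory=dict)
    time: Optional[float] = None                # set only when time-resolved


@dataclass
class Experiment:
    who: str
    why: str
    how: str
    static_factors: Dict[str, Value] = field(default_factory=dict)
    measurements: List[Measurement] = field(default_factory=list)

    def conditions_for(self, m: Measurement) -> Dict[str, Value]:
        """Static factors, overridden by the measurement's variable factors."""
        return {**self.static_factors, **m.variable_factors}


# Hypothetical example record (values invented for illustration).
exp = Experiment(
    who="experimentalist A",
    why="starting concentrations for a kinetic model",
    how="quantitative metabolomics",
    static_factors={"strain": "example strain", "temperature_C": 30.0},
)
exp.measurements.append(
    Measurement(what="glucose", value=5.0, unit="mM",
                variable_factors={"dilution_rate": 0.1}, time=0.0)
)
print(exp.conditions_for(exp.measurements[0]))
```

Keeping static factors on the experiment and variable factors on each measurement avoids repeating experiment-wide conditions for every value.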
Key Results Database
KRDB structure
KRDB web interface
KRDB web interface
Key Results Database
• Deployed in Liverpool, MCISB, UCD
• Easily extensible interface
• eXist “lets it all hang out” as RESTful web services
Modelling infrastructure
Taverna
Modelling life-cycle workflows
Qualitative model construction
Input: list of ORFs
1. Get reaction info
2. Create compartments
3. Create species
4. Create reactions
Output: SBML file
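The four construction steps can be sketched with the Python standard library. This is a simplified illustration, not the Taverna workflow itself: the `REACTION_INFO` lookup table, the ORF identifier `ORF_1` and the element ids are hypothetical stand-ins for whatever the reaction-info step actually returns, and the output is a bare SBML Level 2 Version 4 skeleton without annotations.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical stand-in for step 1: reaction info keyed by ORF identifier.
REACTION_INFO = {
    "ORF_1": {
        "id": "R_hexokinase",
        "compartment": "cytoplasm",
        "reactants": ["glucose", "ATP"],
        "products": ["glucose_6_phosphate", "ADP"],
    },
}


def q(name):
    """Qualify an element name with the SBML namespace."""
    return f"{{{SBML_NS}}}{name}"


def build_qualitative_sbml(orfs, reaction_info=REACTION_INFO):
    # 1. Get reaction info (ORFs without a known reaction are skipped)
    reactions = [reaction_info[orf] for orf in orfs if orf in reaction_info]

    ET.register_namespace("", SBML_NS)
    sbml = ET.Element(q("sbml"), {"level": "2", "version": "4"})
    model = ET.SubElement(sbml, q("model"), {"id": "qualitative_model"})

    # 2. Create compartments
    compartments = ET.SubElement(model, q("listOfCompartments"))
    for name in sorted({r["compartment"] for r in reactions}):
        ET.SubElement(compartments, q("compartment"), {"id": name})

    # 3. Create species (each species once, in first-seen order)
    species = ET.SubElement(model, q("listOfSpecies"))
    seen = set()
    for r in reactions:
        for s in r["reactants"] + r["products"]:
            if s not in seen:
                seen.add(s)
                ET.SubElement(species, q("species"),
                              {"id": s, "compartment": r["compartment"]})

    # 4. Create reactions
    reaction_list = ET.SubElement(model, q("listOfReactions"))
    for r in reactions:
        rxn = ET.SubElement(reaction_list, q("reaction"), {"id": r["id"]})
        for role, key in (("listOfReactants", "reactants"),
                          ("listOfProducts", "products")):
            refs = ET.SubElement(rxn, q(role))
            for s in r[key]:
                ET.SubElement(refs, q("speciesReference"), {"species": s})

    return ET.tostring(sbml, encoding="unicode")


sbml_text = build_qualitative_sbml(["ORF_1", "ORF_unknown"])
print(sbml_text)
```

The result is qualitative only: it has no kinetic laws, parameters or initial concentrations.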
Qualitative model construction
Qual to quan: parameterisation
• Data requirements
  • Qualitative SBML model
  • Starting concentrations for enzymes and source metabolites
    – Key Results Database
  • Enzyme kinetics data
    – SABIO-RK database web service
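A rough sketch of the parameterisation step, assuming the starting concentrations and kinetic constants have already been fetched and are available as plain dictionaries. The web service calls to the KRDB and SABIO-RK are deliberately not shown, the example model and all values are hypothetical, and the rate-law `math` element that a complete kinetic law needs is omitted for brevity.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
NS = {"s": SBML_NS}

# Hypothetical qualitative model with one simplified reaction.
QUALITATIVE_SBML = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="qualitative_model">
    <listOfCompartments><compartment id="cytoplasm"/></listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cytoplasm"/>
      <species id="glucose_6_phosphate" compartment="cytoplasm"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R_hexokinase">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="glucose_6_phosphate"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def parameterise(sbml_text, concentrations, kinetics):
    """Attach starting concentrations and kinetic parameters to a model."""
    ET.register_namespace("", SBML_NS)
    root = ET.fromstring(sbml_text)

    # Variables: starting concentrations (in the talk, from the KRDB).
    for sp in root.findall(".//s:species", NS):
        if sp.get("id") in concentrations:
            sp.set("initialConcentration", str(concentrations[sp.get("id")]))

    # Parameters: KM, kcat per reaction (in the talk, from SABIO-RK).
    for rxn in root.findall(".//s:reaction", NS):
        params = kinetics.get(rxn.get("id"))
        if not params:
            continue
        law = ET.SubElement(rxn, f"{{{SBML_NS}}}kineticLaw")
        plist = ET.SubElement(law, f"{{{SBML_NS}}}listOfParameters")
        for pid, value in params.items():
            ET.SubElement(plist, f"{{{SBML_NS}}}parameter",
                          {"id": pid, "value": str(value)})

    return ET.tostring(root, encoding="unicode")


# Hypothetical values, invented for illustration.
quantitative_sbml = parameterise(
    QUALITATIVE_SBML,
    concentrations={"glucose": 5.0},
    kinetics={"R_hexokinase": {"KM": 0.5, "kcat": 100.0}},
)
print(quantitative_sbml)
```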
Qual to quan: parameterisation
Model parameterisation
Model calibration
• Optional modification of parameters in reaction kinetics until the output of the model produces results similar to those obtained from experimentation
• Data requirements
  • Parameterised SBML model
  • Experimental data
    – Metabolite concentrations from KRDB
• Calibration by COPASI web service
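The idea of calibration can be shown with a toy example: one parameter (KM) is adjusted until a simulated metabolite time course matches measured concentrations. This sketch uses a plain grid search over a Michaelis-Menten model and is a stand-in only; it is not COPASI's parameter estimation. The "observed" data are synthetic, generated from the same model with KM = 0.5, and every name and value is an assumption.

```python
def simulate_substrate(km, times, kcat=100.0, enzyme=0.01, s0=1.0, dt=0.001):
    """Forward-Euler substrate time course under Michaelis-Menten kinetics."""
    s, step, out = s0, 0, []
    for target in times:
        target_step = int(round(target / dt))
        while step < target_step:
            s -= kcat * enzyme * s / (km + s) * dt
            step += 1
        out.append(s)
    return out


def sum_squared_error(km, times, observed):
    """Distance between the model output and the measurements."""
    predicted = simulate_substrate(km, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def calibrate_km(times, observed, candidates):
    """Return the candidate KM whose simulation best matches the data."""
    return min(candidates, key=lambda km: sum_squared_error(km, times, observed))


times = [0.5, 1.0, 1.5, 2.0]
# Synthetic "measurements": generated with KM = 0.5, not real data.
observed = simulate_substrate(0.5, times)
candidates = [0.1 * i for i in range(1, 21)]  # 0.1, 0.2, ..., 2.0
best_km = calibrate_km(times, observed, candidates)
print(f"best KM = {best_km:.1f}")
```

A real calibration would fit several parameters at once with a proper optimiser and noisy data; the grid search only makes the objective explicit.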
COPASI web service
• Dada JO, Mendes P. Design and Architecture of Web Services for Simulation of Biochemical Systems. Data Integration in the Life Sciences, Manchester, UK (2009).
Model calibration
Model simulation
• The running of a parameterized (and calibrated?) model using a specified simulation operation
Model simulation
SBRML
• Simulation results are data too, and are represented in our case in SBRML
  • Systems Biology Results Markup Language
  • Developed by Joseph Dada, et al. (Manchester)
• Structured format for representing simulation results
  • And experimental data?
• Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.
Model simulation
Conclusion
• Classically, systems biology has been a cottage industry
• Experimental results are selected for use in modelling in an ad hoc manner
• Modellers develop and refine models using a time consuming and partially documented process
Conclusion
• Large scale experimentation should lead to more systematic behaviour
• Data integration to support the construction and parameterisation of models
• Large scale computational experimentation to support the comparison of models and their results
Thanks…