
Integrative Information Management for Systems Biology

Neil Swainston

Manchester Centre for Integrative Systems Biology

Data Integration in the Life Sciences, Gothenburg, Sweden, 27 August 2010

http://mcisb.org/index.html

The MCISB

• Pioneer the development of new experimental and computational technologies in systems biology

• Currently employs 9.5 multidisciplinary people

– Mathematicians, informaticians, experimentalists, etc.

– All share the same office and lab

• Develop kinetic models of yeast metabolism


Metabolism


Models

• Genome-scale SBML model of yeast metabolism

• Not kinetic / quantitative!

• Annotated model

– All >2000 molecules have unique database references

– MIRIAM standards have been followed (RDF)

– Should be entirely unambiguous for third party users

– Should be usable in third party tools

– Should allow experimental data to be imported easily

– Herrgård MJ, Swainston N, et al. A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat Biotechnol. 2008, 26, 1155-60.
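Unique, machine-readable database references are what let third-party tools interpret the model without ambiguity. SBML models of this period typically expressed MIRIAM annotations as URNs of the form urn:miriam:<data collection>:<identifier> (current practice favours identifiers.org URLs instead). A minimal sketch, assuming that URN form, of how a tool might split such a reference into a data collection and an accession; parse_miriam_urn is a hypothetical helper, not part of any SBML library.

```python
from urllib.parse import unquote


def parse_miriam_urn(urn: str) -> tuple[str, str]:
    """Split a MIRIAM URN into (data collection, identifier).

    Hypothetical helper. In 'urn:miriam:chebi:CHEBI%3A15422' the colon inside
    the identifier is percent-encoded (%3A) so that it is not confused with
    the URN's own separators; unquote() restores it.
    """
    prefix = "urn:miriam:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # Split on the first remaining colon only: collection names never contain
    # an unencoded colon, whereas identifiers may (once decoded).
    collection, sep, identifier = urn[len(prefix):].partition(":")
    if not sep or not collection or not identifier:
        raise ValueError(f"malformed MIRIAM URN: {urn!r}")
    return collection, unquote(identifier)


if __name__ == "__main__":
    for urn in ("urn:miriam:chebi:CHEBI%3A15422", "urn:miriam:kegg.compound:C00002"):
        print(parse_miriam_urn(urn))
```

Splitting on the first colon after the prefix keeps dotted collection names such as kegg.compound intact.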


Bottom-up systems biology

• Steps in kinetic modeling:

• Identify the pathway or portion of a network that is to be modeled

• Associate the model with functions and parameter values that represent its dynamic behavior, either from databases or experimentation

• Analyze and/or simulate the resulting model to understand its properties


Bottom-up systems biology

• In common practice, model construction is a manual process, in which a modeler associates a model with experimental data for simulation

• Such an approach can give rise to good-quality models, but is more a cottage industry than a highly scalable production process

• Can this be automated?


Automation of the process

• Experimental data is captured from instruments and subjected to primary analyses

• Experimental data and results of the primary analyses are archived in experimental data repositories

• The information required for modeling is extracted from the experimental data resources and stored in a Key Results Database (KRDB)

• A workflow obtains qualitative model information, represented using SBML, parameterizes this model with results in the KRDB, and analyses/simulates the resulting quantitative model


[Figure: enzyme kinetics, quantitative metabolomics and quantitative proteomics data (sources shown: PRIDE XML, MeMo, SABIO-RK) flow via web services and the KRDB into an SBML model, supplying its parameters (KM, kcat) and variables (metabolite and protein concentrations)]

From instrument to result

• Raw data typically needs to be analysed before use

• Experimental data is often managed in an ad hoc way

• Experimentalists are not keen to spend time on data curation for archiving or sharing

• Try to capture necessary metadata as part of primary data analysis


Requirements

• The experimental techniques share requirements:

– perform analyses on the raw experimental data to derive the secondary quantitative parameters required in the model

– store the raw experimental data along with relevant metadata and the derived parameters, thus providing the facility to trace back and reanalyze raw data should this be required

• Where possible, existing data standards and tools are reused, although in practice data standards tend to lag behind technique development, and tools tend to lag behind standards
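The traceability requirement can be illustrated by storing each derived parameter together with a content hash of the raw data it was computed from, so the exact raw file can later be located and verified before reanalysis. This is a sketch of one possible design, not the MCISB implementation: DerivedParameter, derive and matches_raw are hypothetical names, and the example metadata is invented.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DerivedParameter:
    """A secondary result (e.g. a KM value) that stays linked to its raw data."""
    name: str
    value: float
    unit: str
    raw_data_sha256: str                      # fingerprint of the raw data it came from
    metadata: dict = field(default_factory=dict, compare=False)


def fingerprint(raw: bytes) -> str:
    """SHA-256 content hash used to find and verify the raw data."""
    return hashlib.sha256(raw).hexdigest()


def derive(name: str, value: float, unit: str, raw: bytes, **metadata) -> DerivedParameter:
    """Record a derived value together with the hash of its raw input."""
    return DerivedParameter(name, value, unit, fingerprint(raw), dict(metadata))


def matches_raw(param: DerivedParameter, raw: bytes) -> bool:
    """True only if `raw` is byte-for-byte the data this parameter came from."""
    return param.raw_data_sha256 == fingerprint(raw)


if __name__ == "__main__":
    raw = b"time_s,absorbance\n0,0.10\n30,0.18\n"      # invented example data
    km = derive("KM", 0.8, "mM", raw, instrument="plate reader")
    print(km.raw_data_sha256[:12], matches_raw(km, raw))
```

Hashing the content rather than recording a file path means a moved or renamed raw file can still be matched, and an edited one is detected.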


Data capture

• Software wizards have been developed that step experimentalists through the analysis of primary data

– QconCAT PrideWizard for proteomics

– KineticsWizard for enzyme kinetics

• Metadata is collected along the way, as unobtrusively as possible

– Heavily reliant on database web services
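As an example of the primary analysis a kinetics wizard performs, KM and Vmax can be estimated from initial-rate measurements. The fitting method KineticsWizard actually uses is not described here, so the Hanes-Woolf linearisation is swapped in as a simple, self-contained illustration. Nonlinear least squares on the untransformed data is generally preferred for noisy measurements.

```python
def _fit_line(x, y):
    """Ordinary least-squares straight line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def hanes_woolf_fit(substrate, rates):
    """Estimate (KM, Vmax) from paired substrate concentrations and initial rates.

    Michaelis-Menten kinetics, v = Vmax*S/(KM + S), rearranges to
        S/v = S/Vmax + KM/Vmax,
    so a straight line of S/v against S has slope 1/Vmax and intercept KM/Vmax.
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two paired (S, v) observations")
    y = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = _fit_line(substrate, y)
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


if __name__ == "__main__":
    S = [0.5, 1.0, 2.0, 4.0, 8.0]                       # mM (synthetic)
    v = [10.0 * s / (2.0 + s) for s in S]                # generated with KM=2, Vmax=10
    print(hanes_woolf_fit(S, v))
```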


KineticsWizard


QconCAT PrideWizard


QconCAT PrideWizard

[Figure: QconCAT PrideWizard workflow (Identify, Quantify, Format, Upload), linking Mascot, mzData, PRIDE Converter, PRIDE XML, an eXist database and PRIDE, with browser and web / web service access]


Web interfaces


From instrument to result

• All laboratories carry out primary analyses of experimental data

• All laboratories carry out some form of secondary analyses based on primary results

• Many laboratories struggle to manage the results of these processes in a systematic manner

• We see the key to obtaining manageable results as integrating data capture and management with the necessary analyses


But…

• …MCISB has to manage “only” three types of experiment

– Proteomics, metabolomics, enzyme kinetics

• Informatics team share office with experimentalists and modellers

• We’ve been doing this for years…

– Lots of time, lots of people, lots of resource

– Infrastructure development is part of our remit


And…

• …many projects are far more diverse

• Informatics team separated from experimentalists, who are separated from modellers

• Less informatics resource

• Heavyweight approach of MCISB (bespoke tools for each experiment) not always applicable…


So…

• …lightweight approach may be more suitable

• Store only secondary data necessary for modelling

– Not raw data

• Key Results Database (KRDB)

– More modeller-focussed


Key Results Database

• Who, what, some how and why?

– Measure “something” under “some conditions”

– Measurements are generally a number but may be some other artifact

– Conditions may apply across entire experiment (Static Factors)

– Conditions may change across measurements (Variable Factors)

– Measurements may take place at a certain time
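The who/what/conditions model above can be sketched as two record types: an experiment carrying static factors, and measurements carrying variable factors and an optional time. This is a hypothetical in-memory illustration; the actual KRDB schema is not reproduced here, and all class, field and example names are assumptions. Letting a variable factor override a static one of the same name is also a design choice made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

# Usually a number, but may be some other artefact (e.g. a file reference).
Value = Union[float, str]


@dataclass
class Measurement:
    what: str                                             # the "something" measured
    value: Value
    unit: str = ""
    variable_factors: dict[str, Value] = field(default_factory=dict)
    time: Optional[float] = None                          # when it was taken, if relevant


@dataclass
class Experiment:
    who: str
    why: str
    static_factors: dict[str, Value] = field(default_factory=dict)   # apply to all measurements
    measurements: list[Measurement] = field(default_factory=list)

    def conditions(self, m: Measurement) -> dict[str, Value]:
        """All conditions for one measurement; variable factors override static ones."""
        return {**self.static_factors, **m.variable_factors}

    def select(self, what: str, **required: Value) -> list[Measurement]:
        """Measurements of `what` whose conditions include every given key=value pair."""
        return [
            m for m in self.measurements
            if m.what == what
            and all(self.conditions(m).get(k) == v for k, v in required.items())
        ]


if __name__ == "__main__":
    exp = Experiment(who="lab A", why="steady-state glucose", static_factors={"strain": "wild type"})
    exp.measurements.append(Measurement("glucose", 1.2, "mM", {"dilution_rate": 0.1}, 0.0))
    exp.measurements.append(Measurement("glucose", 0.8, "mM", {"dilution_rate": 0.2}, 0.0))
    print([m.value for m in exp.select("glucose", dilution_rate=0.1)])
```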


KRDB structure

KRDB web interface



• Deployed in Liverpool, MCISB, UCD

• Easily extensible interface

• eXist “lets it all hang out”: all stored data is exposed as RESTful web services
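Because eXist exposes its collections over REST, a client can run an XQuery with a plain HTTP GET. A minimal sketch, assuming a default eXist installation reachable at http://localhost:8080/exist/rest; the _query, _howmany and _start request parameters belong to eXist's REST interface, but the /db/krdb collection and the measurement element in the example query are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def exist_query_url(base: str, collection: str, xquery: str,
                    howmany: int = 10, start: int = 1) -> str:
    """Build a GET URL that runs `xquery` against an eXist collection via REST."""
    params = urlencode({"_query": xquery, "_howmany": howmany, "_start": start})
    return f"{base.rstrip('/')}/{quote(collection.strip('/'))}?{params}"


def run_query(url: str, timeout: float = 10.0) -> str:
    """Fetch the (XML) query result as text; needs a running eXist server."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    url = exist_query_url("http://localhost:8080/exist/rest", "/db/krdb",
                          "//measurement[@what='glucose']")
    print(url)
```

Keeping URL construction separate from the network call makes the query easy to log or paste into a browser, which fits the "everything is visible" style of access described above.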


Modelling infrastructure


Taverna

http://taverna.sourceforge.net


Taverna


Modelling life-cycle workflows


Qualitative model construction

Input: list of ORFs

Output: SBML file

1. Get reaction info

2. Create compartments

3. Create species

4. Create reactions
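The four steps can be sketched with the Python standard library, writing SBML Level 2 Version 4 directly with ElementTree. This is an illustration under stated assumptions, not the Taverna workflow itself: step 1 is replaced by a small hypothetical lookup table (a real workflow would call a database web service), and the species naming scheme (metabolite_compartment) is invented. No kinetic laws are written because the model is still qualitative.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
ET.register_namespace("", SBML_NS)          # serialise SBML as the default namespace

# Hypothetical stand-in for step 1: reaction information keyed by yeast ORF.
REACTION_INFO = {
    "YKL060C": {"name": "fructose-bisphosphate aldolase", "compartment": "cytosol",
                "reactants": ["fdp"], "products": ["dhap", "g3p"], "reversible": True},
    "YDR050C": {"name": "triose-phosphate isomerase", "compartment": "cytosol",
                "reactants": ["dhap"], "products": ["g3p"], "reversible": True},
}


def _el(parent, tag, **attrs):
    """Add an SBML child element; Python booleans become SBML 'true'/'false'."""
    attrib = {k: (str(v).lower() if isinstance(v, bool) else str(v)) for k, v in attrs.items()}
    return ET.SubElement(parent, f"{{{SBML_NS}}}{tag}", attrib)


def build_qualitative_model(orfs, model_id="qualitative_model"):
    """Input: list of ORFs. Output: an SBML Level 2 Version 4 document as a string."""
    # 1. Get reaction info
    reactions = [(orf, REACTION_INFO[orf]) for orf in orfs]

    sbml = ET.Element(f"{{{SBML_NS}}}sbml", {"level": "2", "version": "4"})
    model = _el(sbml, "model", id=model_id)

    # 2. Create compartments (one per distinct compartment named in step 1)
    comps = _el(model, "listOfCompartments")
    for comp in sorted({info["compartment"] for _, info in reactions}):
        _el(comps, "compartment", id=comp, size=1)

    # 3. Create species (each metabolite once per compartment)
    species, seen = _el(model, "listOfSpecies"), set()
    for _, info in reactions:
        for met in info["reactants"] + info["products"]:
            sid = f"{met}_{info['compartment']}"
            if sid not in seen:
                seen.add(sid)
                _el(species, "species", id=sid, compartment=info["compartment"])

    # 4. Create reactions (stoichiometry only; no kinetic laws yet)
    rxns = _el(model, "listOfReactions")
    for orf, info in reactions:
        rxn = _el(rxns, "reaction", id=f"r_{orf}", name=info["name"],
                  reversible=info["reversible"])
        for role, mets in (("listOfReactants", info["reactants"]),
                           ("listOfProducts", info["products"])):
            refs = _el(rxn, role)
            for met in mets:
                _el(refs, "speciesReference", species=f"{met}_{info['compartment']}")

    return ET.tostring(sbml, encoding="unicode")


if __name__ == "__main__":
    print(build_qualitative_model(["YKL060C", "YDR050C"]))
```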


Qualitative model construction


Qual to quan: parameterisation

• Data requirements

– Qualitative SBML model

– Starting concentrations for enzymes and source metabolites (from the Key Results Database)

– Enzyme kinetics data (from the SABIO-RK database web service)
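Parameterisation combines the three inputs: reaction structure from the qualitative model, kinetic constants from SABIO-RK and starting concentrations from the KRDB. A minimal sketch, assuming every reaction follows an irreversible Michaelis-Menten rate law (the rate laws actually used are not specified here) and with plain dictionaries standing in for the two web services; no SABIO-RK or KRDB API is called, and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MichaelisMenten:
    """Irreversible Michaelis-Menten rate law: v = kcat * [E] * [S] / (KM + [S])."""
    kcat: float   # turnover number, 1/s
    km: float     # Michaelis constant, mM

    def rate(self, enzyme: float, substrate: float) -> float:
        return self.kcat * enzyme * substrate / (self.km + substrate)


def parameterise(reactions, kinetics, concentrations):
    """Compute the initial rate of each reaction from looked-up parameters.

    reactions:      {reaction_id: (enzyme_id, substrate_id)}  from the qualitative model
    kinetics:       {reaction_id: (kcat, km)}                stand-in for SABIO-RK
    concentrations: {species_id: concentration in mM}        stand-in for the KRDB
    Missing data raises KeyError, so gaps are reported rather than silently filled.
    """
    rates = {}
    for rid, (enzyme, substrate) in reactions.items():
        kcat, km = kinetics[rid]
        law = MichaelisMenten(kcat, km)
        rates[rid] = law.rate(concentrations[enzyme], concentrations[substrate])
    return rates


if __name__ == "__main__":
    print(parameterise({"r1": ("E1", "S1")}, {"r1": (10.0, 1.0)}, {"E1": 2.0, "S1": 3.0}))
```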


Qual to quan: parameterisation


Model parameterisation

Model calibration

• Optional modification of parameters in reaction kinetics until the output of the model produces results similar to those obtained from experimentation

• Data requirements

– Parameterised SBML model

– Experimental data: metabolite concentrations from KRDB

• Calibration by COPASI web service
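Calibration is an optimisation problem: adjust parameters until the model output matches the experimental data as closely as possible. COPASI's parameter-estimation task offers a range of optimisation methods, and its web service interface is not reproduced here. Instead, this toy sketch fits the single rate constant k of a hypothetical first-order reaction S → P, whose output has the closed form S(t) = S0·exp(-k·t), by minimising the sum of squared errors (SSE) with golden-section search.

```python
import math


def simulate_decay(k, s0, times):
    """Toy model S -> P with first-order rate constant k: S(t) = S0 * exp(-k t)."""
    return [s0 * math.exp(-k * t) for t in times]


def sse(k, s0, times, observed):
    """Sum of squared differences between model output and experimental data."""
    return sum((m - o) ** 2 for m, o in zip(simulate_decay(k, s0, times), observed))


def calibrate(s0, times, observed, lo=1e-3, hi=10.0, tol=1e-8):
    """Golden-section search for the k in [lo, hi] that minimises the SSE.

    Assumes the SSE is unimodal in k on that interval (true for this model).
    """
    inv_phi = (math.sqrt(5) - 1) / 2                 # 1/golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = sse(c, s0, times, observed), sse(d, s0, times, observed)
    while b - a > tol:
        if fc < fd:                                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = sse(c, s0, times, observed)
        else:                                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = sse(d, s0, times, observed)
    return (a + b) / 2


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 4.0, 8.0]
    observed = simulate_decay(0.5, 10.0, times)      # synthetic "experimental" data
    print(calibrate(10.0, times, observed))
```

Each iteration reuses one of the two previous function evaluations, so only one new model run is needed per step; with real models, where each run is a full simulation, that economy matters.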

COPASI web service

Dada JO, Mendes P. Design and Architecture of Web Services for Simulation of Biochemical Systems. Data Integration in the Life Sciences, Manchester, UK, 2009.

Model calibration

Model simulation

• The running of a parameterized (and calibrated?) model using a specified simulation operation
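A deterministic time course is the most common simulation operation. The sketch below integrates a hypothetical mass-action pathway A → B → C with a hand-written fixed-step fourth-order Runge-Kutta scheme, so that it stays self-contained. It is not COPASI's solver, and the model and rate constants are invented for illustration.

```python
import math


def rk4(f, y0, t_end, dt):
    """Fixed-step fourth-order Runge-Kutta integration of dy/dt = f(t, y).

    Returns the time course as a list of (t, [y...]) pairs, starting at t = 0.
    """
    t, y = 0.0, list(y0)
    course = [(t, y[:])]
    for i in range(1, round(t_end / dt) + 1):
        k1 = f(t, y)
        k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t = i * dt                                   # avoids accumulating round-off in t
        course.append((t, y[:]))
    return course


def chain(k1, k2):
    """Hypothetical mass-action pathway A -> B -> C with rate constants k1 and k2."""
    def rates(_t, y):
        a, b, _c = y
        v1, v2 = k1 * a, k2 * b
        return [-v1, v1 - v2, v2]
    return rates


if __name__ == "__main__":
    course = rk4(chain(1.0, 0.5), [1.0, 0.0, 0.0], t_end=5.0, dt=0.01)
    t, (a, b, c) = course[-1]
    print(f"t={t:.2f}  A={a:.6f} (exact {math.exp(-t):.6f})  B={b:.6f}  C={c:.6f}")
```

Since the three rates sum to zero, total mass A + B + C is conserved up to rounding, which is a cheap sanity check on any simulation result.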

Model simulation

SBRML

• Simulation results are data too, and are represented in our case in SBRML

– Systems Biology Results Markup Language

– Developed by Joseph Dada, et al. (Manchester)

• Structured format for representing simulation results

– And experimental data?

• Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.

Model simulation

Conclusion

• Classically, systems biology has been a cottage industry

• Experimental results are selected for use in modelling in an ad hoc manner

• Modellers develop and refine models using a time-consuming and partially documented process

Conclusion

• Large scale experimentation should lead to more systematic behaviour

• Data integration to support the construction and parameterisation of models

• Large scale computational experimentation to support the comparison of models and their results

Thanks…
