Formal representation of models in systems biology

Post on 10-May-2015

1.986 views 5 download

Tags:

Transcript of Formal representation of models in systems biology

1 INRIA2011::Dumontier

Formal representation of models in systems biology

Michel Dumontier, Ph.D.

Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University

Professeur Associé, Département d’informatique et de génielogiciel, Université Laval

Ottawa Institute of Systems BiologyOttawa-Carleton Institute of Biomedical Engineering

Systems BiologyWe create and simulate biological models to :• gain insight into the structure and function of

biochemical networks• reveal metabolic and signalling capabilities so as

to predict phenotypes• undertake metabolic engineering to maximize

some desired product

To do this, we need • to integrate & manage our data & knowledge in

a coherent, scalable and machine understandable manner

• efficient software to execute computationally demanding simulations

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 2

3 INRIA2011::Dumontier

Repressilator: A self-regulating system

A synthetic oscillatory network of transcriptional regulators. Elowitz MB, Leibler S. (2000). Nature 403: 335-338.

4 INRIA2011::Dumontier

Biomodels are specified using SBML+ semantic annotation

Semantic annotations are primarily used for browse and search

• EBI managed resource with 600+ SBML models

• 300+ models are annotated with

• Pubmed - papers• ChEBI - chemicals• UniProt - proteins• KEGG - chemicals,

reactions• E.C. - reactions• Gene Ontology -

functions, reactions, compartments

• Taxonomy - organism

http://www.ebi.ac.uk/biomodels-main/

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 5

Bio-ontologies

• Provide rich human and machine understandable descriptions of the terms they purport to describe

• Are generally used for semantic annotation of data, which when reused, facilitate integration across domains (granularity, species, experimental methods)

• Facilitate granular and cross-domain queries • Can be used to obtain explanations for inferences drawn• Can be efficiently processed by algorithms and software

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 6

Additional annotations are specified using the Resource Description Framework (RDF)

   <species metaid="_525530" id="GLCi" compartment="cyto" initialConcentration="0.097652231064563">

           <annotation>        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">          <rdf:Description rdf:about="#_525530">            <bqbiol:is>              <rdf:Bag>                <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>                <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>              </rdf:Bag>            </bqbiol:is>          </rdf:Description>        </rdf:RDF>      </annotation>    </species>

object

predicate

The intent is to express that the species represents a substance composed of glucose moleculesWe also know from the SBML model that this substance is located in the cytosol and with a (initial) concentration of 0.09765M

The annotation element stores the RDF

subject

Implicit subject and xml attributes

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 7

RDF-based Linked Data

• Provides the basis for simple data syndication and syntactic data integrationo IRIso Statements (aka triples) take the form of o <subject> <predicate> <object>

• Easy to implemento stand-alone datasetso logical layer over databases

• Limited reasoningo class and property hierarchieso domain/range restrictionso can’t automatically discover inconsistency

• Standardized Queries - SPARQL• Scalable - to billions of triples

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 8

Objective:Computational Knowledge Discovery• Terminological resources increasingly being used to annotate

SBML-based biomolecular models o Makes it easier to explore or find models

• By converting models into formal representations of knowledge we get to:o validate the accuracy of the annotations o infer knowledge explicit in terminological resourceso discover biological implications inherent in the models and the

results of simulations.

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 9

10 INRIA2011::Dumontier

Approach• We formally represent semantically annotated

biomodels such that it becomes possible to:– Capture the semantics of models and the

biological systems they represent– Check the consistency of biological knowledge

represented through automated reasoning– query the results of simulations in the context of

the biological knowledge

11 COMBINE2011:Dumontier

Have you heard of OWL?

12 COMBINE2011:Dumontier

OWL - The Web Ontology Language

Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values

o quantifiers o existential, universal, cardinality restriction

o negationo disjunctiono property characteristics

o transitive, functional, inverse functional, symmetric, antisymmetric, reflexive, irreflexive

o complex classes in domain and range restrictions o property chains

12

13 COMBINE2011:Dumontier

Powerful Reasoning over OWL ontologies

• Consistency: determines whether the ontology contains contradictions.

• Satisfiability: determines whether classes can have instances.

• Subsumption: is class C1 implicitly a subclass of C2?

• Classification: repetitive application of subsumption to discover implicit subclass links between named classes

• Realization: find the most specific class that an individual belongs to.

13

Triples to axioms: Many possible formalizations – knowledge of logics and domain expertise comes in

handy here!We make an ontological commitment by converting RDF triples into OWL axioms. Triple in RDF:<C1 R C2>• C1 and C2 are classes, R a relation between 2 classes• intended meaning:

o C1 SubClassOf: C2o C1 SubClassOf: R some C2o C1 SubClassOf: R only C2o C2 SubClassOf: R some C1o C1 SubClassOf: S some C2o C1 DisjointFrom: C2 o C1 and C2 SubClassOf: owl:Nothingo R some C1 DisjointFrom: R some C2o C1 EquivalentClasses C2o ...

• in general: P(C1, C2), where P is an OWL axiom (template)

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 14

15 INRIA2011::Dumontier

Formalization: every element E of the SBML language represents a class Rep(E) and we assert that

E subClassOf: represents some Rep(E)

Conceptualization:Models and their components represent physical entities (material entities, processes)

Top-level ontologies can make additional commitment by enforcing disjointness

among basic types

Material object, Process, Function and Quality are mutually disjoint.

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 16

Relations impose additional constraints, such that inconsistencies arise when incorrectly used

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 17

OWL Axiom:Model SubClassOf: represents some MaterialEntity

Conversion rule: a Model annotated with class C represents:

If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C

If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C)

If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))

For each model annotation, we make a commitment to what it represents

BIOMODEL 82: Converting Model

Annotated with heterotrimeric G-protein complex cycle (GO:0031684):

• represents an object O1• O1 has a function F1• F1 is realized by processes of the type heterotrimeric G-protein

complex cycle

M SubClassOf: represents some O1O1 SubClassOf: (has-function some (realized-by only GO:0031684)

Compartment(x): a model component that represents a material entity which is part of the object represented by the model to which the component belongs

Compartment SubClassOf 'model component' and represents some 'Material object'

Conversion rule:• represents an object O2• part of the object represented by the model• compartment’s species represent objects that are located in O2• C SubClassOf: represents some A2• A2 SubClassOf: located-in some A1

Every compartment represents a material entity

21 INRIA2011::Dumontier

Models and their components represent physical entities (material entities, processes)

reactions

species Material entityrepresents

Process

represents

model

has part

IndividualClass

Datatype

SBMLHarvester

Robert Hoehndorf, Michel Dumontier, John H Gennari, Sarala Wimalaratne, Bernard de Bono, Daniel L Cook and Georgios V Gkoutos. Integrating systems biology models and biomedical ontologies. BMC Systems Biology 2011, 5:124.

compartments

model-ofPhysical system

Material entityrepresents

Material entity

Function

has-function

realized in only

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 22

SBML2OWL: Implementation

SBML Harvester http://code.google.com/p/sbmlharvester/• libSBML to access model structure & extract RDF annotations• Jena RDF API to parse RDF annotations• OWLAPI to create OWL axioms & reason with top-level ontology

Application to BioModels repository yields:• OWL ontology with more than 300,000 classes, 800,000 axioms• includes all referenced ontologies

o GO (functions, compartments, processes)o ChEBI (molecules)o Celltype (cell types)o FMA (anatomy)o PATO (qualities)

Answering questions

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 24

Model verification

After reasoning, we found 27 models to be inconsistent

reasons1. our representation - functions sometimes found in the place of

physical entities (e.g. entities that secrete insulin). better to constrain with appropriate relations

2. SBML abused – e.g. species used as a measure of time3. constraints in the ontologies themselves mean that the annotation

is simply not possible

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 25

Finding inconsistencies with axiomatically enhanced ontologiesrecent work treats function as process and axioms state that an ATPase activity (GO:0004002) is a Catalytic activity that has Water and ATP as input, ADP and phosphate as output and is a part of an ATP catabolic process.To this, we add:• GO: ATP + Water the only inputs (universal quantification)• ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all

different (disjointness)BIOMD0000000176 and BIOMD0000000177 models of anaerobic glycolysis in yeast.• “ATP” input to “ATPase” reaction, which is annotated with

ATPase activity. The species “ATP”, however, is mis-annotated with Alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP.

Disambiguation: Model annotations

Assertion:M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))

C SubClassOf: MaterialEntityThen: • represents some C is satisfiable• represents some (has-function some C) and represents

some (has-function some (realized-by only C)) are unsatisfiable

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 27

28 INRIA2011::Dumontier

Species are further described with ‘modifiers’ in the context of a reaction

essential activator

<listOfModifiers> <modifierSpeciesReference sboTerm="SBO:0000461" species="X"/> </listOfModifiers>

partial inhibitor <listOfModifiers> <modifierSpeciesReference sboTerm="SBO:0000536" species="PX"/> </listOfModifiers>

29 INRIA2011::Dumontier

Roles are realized in the context of processes by material entities

reaction

species

SBO: participant role

sio: realizes

ChEBI:substance

sio: role of

sio: represents

GO:Processsio: represents

sio: has direct partChEBI:molecule

model

sio: has proper part

IndividualClass

Datatype

SBMLHarvester+

sio: has participant

role chain: realizes o role of -> has participant

30 INRIA2011::Dumontier

Semanticscience Integrated Ontology (SIO)

• OWL2 ontology• 1000+ classes covering basic types (physical, processual, abstract,

informational) with an emphasis on biological entities• 183 basic relations (mereological, participatory, attribute/quality,

spatial, temporal and representational)• axioms can be used by reasoners to compute inferences for

consistency checking, classification and answering questions about life science knowledge

• embodies emerging ontology design patterns – specifies a data model

• dereferenceable URIs• searchable in the NCBO bioportal• Available at http://semanticscience.org/ontology/sio.owl

31 INRIA2011::Dumontier

Examining Mathematical Expressions

<parameter metaid="metaid_0000233" id="k_tl" name="k_tl" constant="false" sboTerm="SBO:0000016"> (unimolecular rate constant) <notes> <p xmlns="http://www.w3.org/1999/xhtml">Translation rate constant</p> </notes> </parameter>

<assignmentRule metaid="metaid_0400235" variable="k_tl"> <math> <apply> <divide/> <ci> eff </ci> <ci> t_ave </ci> </apply> </math> </assignmentRule>

<kineticLaw sboTerm="SBO:0000049"> <math> <apply> <times/> <ci> k_tl </ci> <ci> X </ci> </apply> </math> </kineticLaw>

<parameter metaid="metaid_0000025" id="eff" name="translation efficiency" value="20"> <notes> <p xmlns="http://www.w3.org/1999/xhtml">Average number of proteins per transcript</p> </notes> </parameter>

32 INRIA2011::Dumontier

SBO:Reaction

SBO:mathematicalexpression

Quantity

SBO: systems description parameter

sio:denotessio: is specified bysio:

has proper part

SBML Reactions may be specified by mathematical expressions, which contain

quantitative variables that denote quantities

sio: has value Literal

IndividualClass

Datatype

SBMLFarmersio:derives from

33 INRIA2011::Dumontier

When running a simulation, some attributes change with timespecies

attribute

sio:has attributesio

: has value double

uo:unitsio: measured at time

sio: has valuedatetime

sio: result of

simulationsio: has agent

software

sio: has participant

parametermodelIndividualClass expression

sio: has unit

Datatype

KISAO:algorithm

sio: conforms to

34 INRIA2011::Dumontier

35 INRIA2011::Dumontier

Copasi output: not machine understandable

36 INRIA2011::Dumontier

Query Answering over RDF/OWLFind those concentration measurements for species that represent molecular entities that contain ribonucleotide residues

‘concentration’ and (‘measured at’ some double[>20.0, <40.0])and ‘is attribute of’ some ( ‘species’ and ‘represents’ some (‘has part’ some ‘ribonucleotide residue’))

ChEBI ontology

37 INRIA2011::Dumontier

Curve Analysis: Elements of a Plot

A

B

C

D

Time

Conc

entr

ation

Curves Contain- Points- Line Segments- Curve SegmentsCurves Annotated As- Monotonic/Non-Monotonic- Overall Increasing/DecreasingCurve Segments- Contain line segments- Same properties as curveLine Segments- Contain points- Increasing/DecreasingPoints- Have attributes and values- Can be local minima/maxima- Can be inflection points

Curve: MonotonicityTEDDY_0000144

TEDDY_0000008

strictly increasing

TEDDY_0000009

strictly decreasing

Curve SegmentOverall Change (Slope)Constituent Points

Point: Stationary Point (Maximum)

Curve: Overall Change

38 INRIA2011::Dumontier

Queries

‘local maximum’ and ‘is attribute of’ some ( species and represents some ( ‘has function’ some ‘dna binding’))

Biomodel + UniProt + GO

39 INRIA2011::Dumontier

Get the non-monotonic curves for protein species

‘non-monotonic curve’and ‘has part’ some ( ‘concentration’ and ‘is attribute of’ some ( ‘species’ and ‘represents’ some ‘protein’)))

40 INRIA2011::Dumontier

ConclusionThe SBML-derived ontologies can be

i) checked for their consistency, thereby uncovering erroneous curations

ii) infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models

iii) answer sophisticated questions across a model knowledge base

iv) extended with modifiers, mathematical expressions and parameters, simulation Results (from tab files) to answer questions about simulation results with reference to the semantic annotations (GO) in biomodels, UniProt

INRIA2011::Dumontier41

Acknowledgements

Leonid Chepelev

Robert Hoehndorf

42 INRIA2011::Dumontier

Michel Dumontiermichel_dumontier@carleton.ca

Publications: http://dumontierlab.com Presentations: http://slideshare.com/micheldumontier