An (OBO) ontology is NOT a model of language, it is a model of reality. Words are ambiguous –...
An (OBO) ontology is NOT a model of language, it is a model of reality.
Words are ambiguous – especially in isolation. Take the word 'wing':
What types of things have wings?
birds, bats, flies, sphenoid bones*, planes, cars...
An (OBO) ontology uses words from natural language...
... but constrains their meaning to some subset of their possible meanings in language.
Constraints are written into definitions and relationships between terms and refer to the scientific literature.
* the lateral process of a sphenoid bone is referred to as a wing (Henderson's Dictionary of Biological Terms)
OBO Foundry approach to ontology - 1
OBO Foundry approach to ontology - 2
Problem with relying on the 'self-evident' meaning of words:
With no definitions for the nouns, these two statements both have obvious readings which are true:
1. eye part of craniofacial tissue
2. ommatidium part of eye
But, if we combine them:
ommatidium part of eye part of craniofacial tissue
... we can't get a true statement about biology, no matter what meaning we choose for the nouns.
An ontology with thousands of terms and relations makes an enormous number of statements. Only by defining terms can we hope to avoid problems like this.
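The trap is that part_of is transitive, so software will chain the two statements whether or not the intended meanings of 'eye' agree. This is a minimal Python sketch that assumes a toy set of (child, parent) pairs. The function name `transitive_closure` is hypothetical. It shows the bad inference being made mechanically:

```python
# Toy set of asserted part_of statements (child, parent), taken from the slide.
part_of = {
    ("eye", "craniofacial tissue"),  # true if 'eye' means a vertebrate eye
    ("ommatidium", "eye"),           # true if 'eye' means an insect compound eye
}

def transitive_closure(pairs):
    """Return every (a, c) pair implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (b2, c) in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    return closure

# Statements the reasoner adds on top of the asserted ones.
inferred = transitive_closure(part_of) - part_of
print(inferred)  # {('ommatidium', 'craniofacial tissue')}
```

Nothing in the data itself flags the conflict. Only a definition that fixes 'eye' to one meaning stops the two statements from being combined.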
OBO 101 Type – a classification of things
(instances) in the real world, based on some biologically significant similarity (specified by definition + relationships)
Relationships between instances: straightforward - each instance is just one thing:
my left index finger is part_of my left hand
Relationships between types: for any type, there can be many instances in the world.
Does a relationship between types apply to all or just some of these instances?
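OBO relations between types are commonly read as "all-some": 'X part_of Y' holds if every instance of X is part_of some instance of Y. The Python sketch below checks a type-level statement against instance-level facts. The instance names (`my_left_hand` etc.) and the helper functions are hypothetical, invented for illustration:

```python
# Hypothetical instance data: each instance and the type it instantiates.
instance_of = {
    "my_left_index_finger": "index finger",
    "my_right_index_finger": "index finger",
    "my_left_hand": "hand",
    "my_right_hand": "hand",
}

# Instance-level part_of facts: each instance is one thing, so these are unambiguous.
instance_part_of = {
    ("my_left_index_finger", "my_left_hand"),
    ("my_right_index_finger", "my_right_hand"),
}

def instances(type_name):
    """All instances recorded for a type."""
    return [i for i, t in instance_of.items() if t == type_name]

def all_some(child_type, parent_type, facts):
    """True if every instance of child_type is related to some instance of parent_type."""
    return all(
        any((c, p) in facts for p in instances(parent_type))
        for c in instances(child_type)
    )

print(all_some("index finger", "hand", instance_part_of))  # True
print(all_some("hand", "index finger", instance_part_of))  # False
```

The reading is directional. Every index finger is part of some hand, but the reverse statement fails, so it cannot be inferred from the first one.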
Basic relations
Why we build anatomy ontologies
Main use case: annotation of gene expression and phenotypes by model organism databases (MODs) and others.
This needs:
1. Terms with written definitions => consistent annotation despite variant usage of terminology in the literature.
2. Careful use of logically defined relations => allowing annotations with related terms to be grouped in biologically meaningful ways.
Use case 2: A controlled system for recording character matrices for use in homology calls (cTOL/NESCent).
Grouping annotations using an anatomy ontology
QUERY: list all axon tracts which are part of the adult midbrain
List all alleles causing phenotypes in these structures.
List all genes expressed in these structures.
Grouping annotations using an anatomy ontology
QUERY: list all neurons which develop from neuroblast NB 7-1
List all alleles causing phenotypes in these structures.
List all genes expressed in these structures.
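Both queries work the same way. They collect every term that reaches the query term through a chain of is_a links plus one chosen relation: part_of for the midbrain query, develops_from for the NB 7-1 query. They then pool the annotations made to any of those terms. The Python sketch below assumes a toy in-memory graph. Apart from 'adult midbrain' and 'neuroblast NB 7-1', all term and gene names are placeholders, not real annotations:

```python
from collections import defaultdict

# Toy ontology edges (subject, relation, object); most names are placeholders.
EDGES = [
    ("axon tract A", "part_of", "adult midbrain"),
    ("axon tract B", "part_of", "adult midbrain"),
    ("axon tract A1", "is_a", "axon tract A"),
    ("precursor P", "develops_from", "neuroblast NB 7-1"),
    ("neuron N1", "develops_from", "precursor P"),
]

# Placeholder annotations: term -> genes expressed (or alleles with phenotypes).
ANNOTATIONS = {
    "axon tract A1": {"gene_1"},
    "axon tract B": {"gene_2"},
    "neuron N1": {"gene_3"},
    "unrelated structure": {"gene_4"},
}

def descendants(query, relations):
    """Terms reaching `query` through any chain of edges whose relation is in `relations`."""
    children = defaultdict(set)
    for s, r, o in EDGES:
        if r in relations:
            children[o].add(s)
    seen, stack = set(), [query]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def grouped_annotations(query, relations):
    """Pool annotations made to the query term and to all of its descendants."""
    terms = descendants(query, relations) | {query}
    return set().union(*(ANNOTATIONS.get(t, set()) for t in terms))

print(sorted(grouped_annotations("adult midbrain", {"is_a", "part_of"})))  # ['gene_1', 'gene_2']
print(sorted(grouped_annotations("neuroblast NB 7-1", {"is_a", "develops_from"})))  # ['gene_3']
```

The sketch treats every chain of the chosen relations as valid for grouping. That is an assumption, and it only gives biologically meaningful groups if the relations have been used with their defined, logically consistent meanings.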
Why standardise the way we build anatomy ontologies?
So we don't all need to re-invent the wheel.
Use of common relations allows the development of common tools.
Best practice allows development of scalable and maintainable ontologies:
Written definitions make the reasons for graph structure clear => new editors can understand and build on the work of previous editors.
Reducing multiple inheritance makes avoiding true-path violations practical.
CARO (Common Anatomy Reference Ontology) is a standardisation exercise
CARO provides an upper-ontology template to use for constructing anatomy ontologies.
CARO provides a bridge between anatomy ontologies and the basic formal ontology (BFO). This is useful for putting constraints on relations.
anatomical structure maps to BFO: object and as such is defined as 'maximally connected'
this distinction makes part relationships easier to define and avoids confusion with is_a relations
disconnected anatomical systems (e.g., immune system) are classified under 'anatomical group'
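The connectivity criterion can be pictured as a graph test. A set of parts that forms a single connected component fits 'anatomical structure', and a set that does not fits 'anatomical group'. The Python sketch below uses placeholder part names and hypothetical adjacency data. It checks connectedness only, not maximality, and CARO classification is made by curators from definitions, not computed like this:

```python
def is_connected(parts, adjacent):
    """Graph search: True if every part is reachable from an arbitrary starting part."""
    parts = set(parts)
    if not parts:
        return False
    start = next(iter(parts))
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for a, b in adjacent:
            # Adjacency is symmetric, so check both directions of each pair.
            for x, y in ((a, b), (b, a)):
                if x == current and y in parts and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen == parts

def caro_class(parts, adjacent):
    """Hypothetical helper mapping the connectivity test onto the two CARO classes."""
    return "anatomical structure" if is_connected(parts, adjacent) else "anatomical group"

# Placeholder physical adjacency between parts p1..p4 (p4 touches nothing).
adjacency = {("p1", "p2"), ("p2", "p3")}
print(caro_class({"p1", "p2", "p3"}, adjacency))  # anatomical structure
print(caro_class({"p1", "p2", "p4"}, adjacency))  # anatomical group
```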
CARO is a purely structural classification.
CARO classifies structures from the bottom up – on the basis of granularity.
cell component, cell, portion of tissue, multi-tissue structure, ...
CARO classifies structures from the top down – as subdivisions of a whole organism:
organ system, appendage
CARO makes a clear distinction between cellular and acellular structures.
CARO associated standardisation
1. A system for defining structures by their developmental fate.
2. A system for recording developmental timings.
3. A system for recording part relationships which change over developmental time.
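One possible way to record part relationships that change over development is to attach a stage interval to each part_of assertion. The Python sketch below illustrates that idea. The `StagedPartOf` record, the integer stage numbers and the term names are all hypothetical and do not represent the CARO or FlyBase scheme:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StagedPartOf:
    """A part_of assertion that holds only between two stages (inclusive)."""
    part: str
    whole: str
    start_stage: int
    end_stage: int

# Placeholder assertions: the same structure belongs to different wholes over time.
ASSERTIONS = [
    StagedPartOf("structure S", "organ O1", 1, 5),
    StagedPartOf("structure S", "organ O2", 6, 10),
]

def wholes_at(part, stage, assertions=ASSERTIONS):
    """Return the wholes that `part` is part_of at the given stage."""
    return {a.whole for a in assertions
            if a.part == part and a.start_stage <= stage <= a.end_stage}

print(wholes_at("structure S", 3))  # {'organ O1'}
print(wholes_at("structure S", 8))  # {'organ O2'}
```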
Why build multi-species anatomy ontologies?
We need systems for cross species inference of gene function.
What do we need for inference? The development, differentiation or functioning of homologous structures is likely to involve conserved genetic pathways – especially if the species are closely related.
Lee Niswander Nature Reviews Genetics 4, 133-143 (February 2003)
Conservation of gene expression and role in limb development
Why do we care about homology when building an anatomy ontology?
Consider an anatomy ontology of vertebrates:
skeletal system
. cranial skeletal system
. . parietal bone (in_organism human)
. . parietal bone (in_organism zebrafish)
. . frontal bone (in_organism human)
. . frontal bone (in_organism zebrafish)
Homologous: frontal bone (zebrafish) and parietal bone (human)
Different genes and developmental processes may underlie the development of the zebrafish frontal and the human frontal, even though they have the same name and are similarly located.
How and where should homology information be captured?
Use case: Query for phenotypes affecting the human frontal bone and its homologous structure in other species.
Homologs = Synonyms
How should we not build multi-species anatomy ontologies?
It is not practical to write structural definitions which apply across many species and which apply to every structure in a homology group.
e.g. try to define parietal bone for all descendants of the common ancestor of zebrafish and human.
How should we not build multi-species anatomy ontologies?
Structural defs across many species may sometimes be impossible:
Structure Z def: Anatomical entity with structural attributes, A, B, C and D.
Structure X def: Anatomical entity with structural attributes C, D, E & F.
evolved_from structure Z
Structure Y def: Anatomical entity with structural attributes A, B, G & H.
evolved_from structure Z
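If each definition is treated as a set of attributes, the problem becomes concrete. A single structural definition covering both X and Y could only use attributes they share, and here they share none. This is a Python sketch using the attribute letters from the slide:

```python
# Attribute sets from the slide's definitions.
Z = {"A", "B", "C", "D"}
X = {"C", "D", "E", "F"}  # X evolved_from Z
Y = {"A", "B", "G", "H"}  # Y evolved_from Z

print(sorted(X & Y))  # []          nothing structural is common to X and Y
print(sorted(X & Z))  # ['C', 'D']  what X kept from Z
print(sorted(Y & Z))  # ['A', 'B']  what Y kept from Z
```

Each descendant keeps some of Z's attributes, but different ones, so no structural definition can cover the whole homology group.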
A definition of homology
X is homologous_to Y IFF:
all X derived_from some Z, AND
all Y derived_from some Z, AND
C is the most recent common ancestor of A and B,
where:
all instances of structure X are part_of some instance of species A;
all instances of structure Y are part_of some instance of species B;
all instances of structure Z are part_of some instance of species C.
(Figure: X and Y linked to Z by derived_from.)
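Given a species tree and derived_from links, the definition can be checked mechanically. The Python sketch below uses the slide's abstract species A, B and C and structures X, Y and Z. The data layout and function names are hypothetical, and derived_from is treated as transitive:

```python
# Species tree as child -> parent; C is the ancestor of A and B, as on the slide.
SPECIES_PARENT = {"A": "C", "B": "C", "C": None}

# The species whose instances each structure type is part_of.
SPECIES_OF = {"X": "A", "Y": "B", "Z": "C"}

# derived_from links between structure types.
DERIVED_FROM = {"X": {"Z"}, "Y": {"Z"}, "Z": set()}

def lineage(species):
    """The species itself followed by all of its ancestors, nearest first."""
    path = []
    while species is not None:
        path.append(species)
        species = SPECIES_PARENT[species]
    return path

def mrca(a, b):
    """Most recent common ancestor of two species in the tree."""
    ancestors_of_b = set(lineage(b))
    return next(s for s in lineage(a) if s in ancestors_of_b)

def derived_from_closure(structure):
    """Every structure reachable by following derived_from links."""
    seen, stack = set(), [structure]
    while stack:
        for z in DERIVED_FROM.get(stack.pop(), set()):
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen

def homologous(x, y):
    """x homologous_to y iff both derive from some z belonging to the MRCA of their species."""
    common = derived_from_closure(x) & derived_from_closure(y)
    target = mrca(SPECIES_OF[x], SPECIES_OF[y])
    return any(SPECIES_OF[z] == target for z in common)

print(mrca("A", "B"))         # C
print(homologous("X", "Y"))   # True
```

The final condition matters. If X and Y only shared a derived_from ancestor in some older species, the check would fail, which matches the "most recent common ancestor" clause.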
Some term definitions
forewing (MRCA Insecta)
def: Membranous dorsal appendage of the adult mesothoracic segment of MRCA Insecta.
hindwing (MRCA Insecta)
def: Membranous dorsal appendage of the adult metathoracic segment of MRCA Insecta.
elytra (MRCA Coleoptera)
def: Hard, chitinous dorsal appendage of the adult mesothoracic segment of MRCA Coleoptera.
haltere (MRCA Diptera)
def: Club-shaped dorsal appendage of the metathoracic segment of MRCA Diptera.
MRCA = most recent common ancestor species
appendage (CARO)
. is_a wing of MRCA Insecta
. . is_a forewing of MRCA Insecta
. . . derived_from wing of MRCA Diptera
. . . derived_from elytra of MRCA Coleoptera
. . is_a hindwing of MRCA Insecta
. . . derived_from haltere of MRCA Diptera
. . . derived_from wing (alae) of MRCA Coleoptera
. is_a limb of MRCA Tetrapoda
. . is_a forelimb of MRCA Tetrapoda
. . . derived_from wing of MRCA Aves
. . . derived_from wing of MRCA Chiroptera*
(* perhaps this should be derived_from autopod?)
A homology-based multi-species anatomy ontology
Relations within species-specific anatomy ontologies:
. wing (Drosophila melanogaster) derived_from wing of MRCA Diptera
. elytra (Tribolium castaneum) derived_from elytra of MRCA Coleoptera
combined with:
. forewing of MRCA Insecta
. . . derived_from wing of MRCA Diptera
. . . derived_from elytra of MRCA Coleoptera
is sufficient to deduce that:
elytra of Tribolium castaneum homologous_to wing of Drosophila melanogaster
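The deduction follows derived_from links upward from each species-specific term until the two chains meet. The Python sketch below uses only the statements on this slide. The helper names are hypothetical, and the sketch takes the shared term as the basis of homology without modelling the species tree:

```python
# derived_from statements from the slide (subject -> object).
DERIVED_FROM = {
    "wing (Drosophila melanogaster)": "wing of MRCA Diptera",
    "elytra (Tribolium castaneum)": "elytra of MRCA Coleoptera",
    "wing of MRCA Diptera": "forewing of MRCA Insecta",
    "elytra of MRCA Coleoptera": "forewing of MRCA Insecta",
}

def chain(term):
    """The term followed by every structure it is (transitively) derived_from."""
    result = [term]
    while result[-1] in DERIVED_FROM:
        result.append(DERIVED_FROM[result[-1]])
    return result

def homology_basis(a, b):
    """Nearest structure that both a and b are derived_from, or None if there is none."""
    ancestors_of_b = set(chain(b)[1:])
    return next((t for t in chain(a)[1:] if t in ancestors_of_b), None)

print(homology_basis("elytra (Tribolium castaneum)", "wing (Drosophila melanogaster)"))
# forewing of MRCA Insecta
```

Only four derived_from statements are needed. The homology between the two species-specific terms is never asserted directly.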
Reasoning homology
Acknowledgements
Michael Ashburner
CARO: Fabian Neuhaus (NIST), Paula Mabee (U. South Dakota / cTOL), Jose Mejino (U. Washington, FMA), Chris Mungall (Berkeley), Melissa Haendel (U. Oregon, ZFIN), Barry Smith (Buffalo)
Funding: FlyBase; The Newton Trust
Terms in the Danio AO refer to many evolutionarily conserved structures.
Key point
The names of terms in this anatomy ontology refer, in normal usage, to many conserved anatomical structures.
But the terms themselves refer to whatever their definitions say they refer to - this is considerably more restrictive than normal usage.
Fabian’s idea
(Figure: wing (bats) and forelimb (dogs) each derives_from a limb-like thing.)
bat wing is_homologous to dog forelimb
Homology grouping:
limb-like thing
. derives_from wing (bats)
. derives_from forelimb (dogs)
Question: what type of thing is a limb-like thing?
forelimb – MRCA Tetrapoda
def: The forelimb of the most recent common ancestor of the Tetrapoda. (attributes ....)
Approaches to recording homology - relations
1. Capture pairwise: X in species A homologous_to Y in species B.
Disadvantage – homology of X and Y to structures in species C, D, E, F ... must be captured separately => very many homology statements.
2. Capture with reference to some central multi-species ontology:
X is_a forelimb tetrapoda
X derives_from forelimb MRCA tetrapoda
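The bookkeeping difference can be counted. For n species-specific structures in one homology group, pairwise capture needs n(n-1)/2 statements. Linking each structure to a central multi-species term needs only n, and the pairwise statements can then be derived. The Python sketch below uses placeholder structure names:

```python
from itertools import combinations

# Placeholder species-specific terms, each linked to the same central multi-species term.
CENTRAL = "forelimb MRCA tetrapoda"
links = {f"forelimb (species {s})": CENTRAL for s in "ABCDEF"}

def derived_pairwise(links):
    """Pairwise homology statements implied by two terms sharing a central term."""
    return [(x, y) for x, y in combinations(sorted(links), 2) if links[x] == links[y]]

print(len(links))                    # 6 statements to the central ontology
print(len(derived_pairwise(links)))  # 15 = 6 * 5 / 2 pairwise statements
```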
Examples of term ambiguity:
Early stages in spermatogenesis
perineurial layer
David’s idea
(Figure: wing (bats) and forelimb (dogs) each is_a limb.)
bat wing is_homologous to dog forelimb
Homology group ontology:
forelimb - tetrapoda
. is_a wing - bat
. is_a forelimb - dog
CARO/other structural ontology:
cardinal organism part
. is_a appendage
forelimb – tetrapoda
def: An anatomical structure derived from the forelimb of the most recent common ancestor of the Tetrapoda (attributes...)
Fish group idea (may not be current)
(Figure: bat wing, bird wing and dog forelimb linked by homologous_to, with tetrapoda limb as the level of homology.)
bat wing is_homologous to dog forelimb as limb
bat wing is_homologous to bird wing as limb
bat wing NOT homologous_to bird wing as wing
CARO/multi-species AO:
appendage
. is_a wing (bats)
. is_a wing (birds)
. is_a forelimb (dogs)
homologous_to at level of tetrapoda limb + evidence + citation
Approaches to capturing homology – data format.
OBO 1.2 format does not allow references and evidence codes to be captured for relationships, although OBO 1.3 will.
Therefore, there is a good case for capturing this information separately in a flat file (TSV) for now. This is what the NESCent/cTOL project is doing.
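The sketch below shows what such a flat file might look like, written with Python's standard csv module. It is a minimal illustration. The column names, the `EVIDENCE_CODE` and `CITATION_ID` placeholders, and the 'level' column (borrowed from the "as limb" qualifier above) are assumptions, not the NESCent/cTOL file layout:

```python
import csv
import io

# Hypothetical columns: one row per relationship, with its own evidence and citation.
FIELDS = ["subject", "relation", "object", "level", "evidence", "citation"]

rows = [
    {"subject": "wing (bats)", "relation": "homologous_to", "object": "wing (birds)",
     "level": "tetrapoda limb", "evidence": "EVIDENCE_CODE", "citation": "CITATION_ID"},
]

# Write tab-separated values; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t")
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()

# Read the rows back; evidence and citation stay attached to each relationship.
parsed = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print(parsed[0]["level"], parsed[0]["evidence"])  # tetrapoda limb EVIDENCE_CODE
```

Because each row is independent of the ontology file, the same layout works for any relation, not just homologous_to.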
The same argument applies, no matter what the relation.