An (OBO) ontology is NOT a model of language, it is a model of reality. Words are ambiguous –...


Page 1:

An (OBO) ontology is NOT a model of language, it is a model of reality.

Words are ambiguous – especially in isolation. Take the word 'wing': what types of things have wings?

birds, bats, flies, sphenoid bones*, planes, cars...

An (OBO) ontology uses words from natural language...

... but constrains their meaning to some subset of their possible meanings in language.

Constraints are written into definitions and relationships between terms and refer to the scientific literature.

* The lateral process of a sphenoid bone is referred to as a wing (Henderson's Dictionary of Biological Terms).

OBO foundry approach to ontology - 1

Page 2:

OBO foundry approach to ontology - 2

Problem with relying on the 'self-evident' meaning of words:

With no definitions for the nouns, these two statements both have obvious readings which are true:

1. eye part of craniofacial tissue

2. ommatidium part of eye

But, if we combine them:

ommatidium part of eye part of craniofacial tissue

... we can't get a true statement about biology, no matter what meaning we choose for the nouns.

An ontology with thousands of terms and relations makes an enormous number of statements. Only by defining terms can we hope to avoid problems like this.
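The chaining problem above can be sketched in a few lines. This is an illustrative sketch only, assuming part_of is treated as transitive; the function name `transitive_closure` is hypothetical and the term labels are copied from the two statements on this slide.

```python
# Minimal sketch: chaining part_of statements, assuming part_of is transitive.
# Term labels come from the slide; the helper name is hypothetical.

def transitive_closure(pairs):
    """Return every (part, whole) pair implied by chaining the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

asserted = {
    ("eye", "craniofacial tissue"),
    ("ommatidium", "eye"),
}

# Statements that were never asserted but follow from chaining.
inferred = transitive_closure(asserted) - asserted
print(inferred)  # {('ommatidium', 'craniofacial tissue')}
```

Nothing in the bare labels stops this inference. Only a definition of 'eye' would show whether the two asserted statements are talking about the same type of thing.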

Page 3:

OBO 101

Type – a classification of things (instances) in the real world, based on some biologically significant similarity (specified by definition + relationships)


Page 5:

OBO 101

Type – a classification of things (instances) in the real world, based on some biologically significant similarity.

Relationships between instances: Straightforward - each instance is just one thing: my left index finger is part_of my left hand

Relationships between types: For any type, there can be many instances in the world. Does a relationship between types apply to all or just some of these instances?
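One common way to answer the all-or-some question is the "all-some" reading: a type-level statement "A part_of B" holds if every instance of A is part_of some instance of B. The sketch below assumes that reading. The instance names, the type labels and the helper `type_part_of` are hypothetical illustration data, not taken from any real ontology.

```python
# Sketch of the "all-some" reading of a type-level relation (an assumption):
#   A part_of B  holds if every instance of A is part_of some instance of B.

# Which type each instance belongs to (made-up instances).
instance_type = {
    "my left index finger": "index finger",
    "your left index finger": "index finger",
    "my left hand": "hand",
    "your left hand": "hand",
}

# Instance-level part_of statements: each instance is just one thing.
instance_part_of = {
    ("my left index finger", "my left hand"),
    ("your left index finger", "your left hand"),
}

def type_part_of(part_type, whole_type):
    """True if every instance of part_type is part_of some instance of whole_type."""
    parts = [i for i, t in instance_type.items() if t == part_type]
    wholes = {i for i, t in instance_type.items() if t == whole_type}
    return all(
        any((p, w) in instance_part_of for w in wholes)
        for p in parts
    )

print(type_part_of("index finger", "hand"))  # True
print(type_part_of("hand", "index finger"))  # False
```

Under this reading the type-level statement says nothing about every hand having an index finger; it only constrains instances of the first type.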

Page 6:

Basic relations

Page 7:

Why we build anatomy ontologies

Main use case: annotation of gene expression and phenotypes (MODs and others)

This needs:

1. Terms with written definitions => consistent annotation despite variant usage of terminology in the literature.

2. Careful use of logically defined relations => allowing annotations with related terms to be grouped in biologically meaningful ways.

Use case 2: A controlled system for recording character matrices for use in homology calls (cTOL/NESCent).

Page 8:

Grouping annotations using an anatomy ontology

QUERY: list all axon tracts which are part of the adult midbrain

List all alleles causing phenotypes in these structures.

List all genes expressed in these structures.

Page 9:

Grouping annotations using an anatomy ontology

QUERY: list all neurons which develop from neuroblast NB 7-1

List all alleles causing phenotypes in these structures.

List all genes expressed in these structures.
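A query of this kind can be sketched as a two-step lookup: first use the ontology relations to find the matching structures, then collect the annotations attached to them. Everything in the sketch below is placeholder data: the neuron names, the gene symbols, the second neuroblast and the helper functions are hypothetical, and only the neuroblast label NB 7-1 comes from the slide.

```python
# Sketch: grouping annotations through ontology relations.
# All terms, genes and edges are made-up placeholders except "neuroblast NB 7-1".

edges = [  # (child, relation, parent)
    ("neuron A", "develops_from", "neuroblast NB 7-1"),
    ("neuron B", "is_a", "neuron A"),
    ("neuron C", "develops_from", "neuroblast X"),
]

expression = {  # gene -> anatomy terms it is annotated to
    "gene1": {"neuron A"},
    "gene2": {"neuron B"},
    "gene3": {"neuron C"},
}

def subclasses(term):
    """All terms reachable from `term` by following is_a edges downwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, relation, parent in edges:
            if relation == "is_a" and parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def develops_from(target):
    """Terms asserted to develop from `target`, plus their is_a subclasses."""
    hits = set()
    for child, relation, parent in edges:
        if relation == "develops_from" and parent == target:
            hits.add(child)
            hits |= subclasses(child)
    return hits

structures = develops_from("neuroblast NB 7-1")
genes = sorted(g for g, terms in expression.items() if terms & structures)

print(sorted(structures))  # ['neuron A', 'neuron B']
print(genes)               # ['gene1', 'gene2']
```

The grouping is only as reliable as the relations: gene2 is returned because "neuron B" is asserted to be a subclass of "neuron A", which is why logically defined relations matter for this use case.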

Page 10:

Why standardise the way we build anatomy ontologies?

So we don't all need to re-invent the wheel.

Use of common relations allows the development of common tools.

Best practice allows development of scalable and maintainable ontologies:

Written definitions make the reasons for graph structure clear. => New editors can understand and build on the work of previous editors.

Reducing multiple inheritance makes avoiding true-path violations practical.

Page 11:

CARO is a standardisation exercise

CARO provides an upper-ontology template to use for constructing anatomy ontologies.

CARO provides a bridge between anatomy ontologies and the Basic Formal Ontology (BFO). This is useful for putting constraints on relations.

anatomical structure maps to BFO: object and as such is defined as 'maximally connected'

this distinction makes part relationships easier to define and avoids confusion with is_a relations

disconnected anatomical systems (e.g., immune system) are classified under 'anatomical group'

Page 12:

CARO is a purely structural classification.

CARO classifies structures from the bottom up – on the basis of granularity.

cell component, cell, portion of tissue, multi-tissue structure ...

CARO classifies structure from the top down – as subdivisions of a whole organism

organ system, appendage

CARO makes a clear distinction between cellular and acellular structures.

Page 13:

CARO associated standardisation

1. A system for defining structures by their developmental fate.

2. A system for recording developmental timings.

3. A system for recording part relationships which change over developmental time.

Page 14:

Why build multi-species anatomy ontologies?

We need systems for cross species inference of gene function.

What do we need for inference?

The development, differentiation or functioning of homologous structures is likely to involve conserved genetic pathways – especially if the species are closely related.

Page 15:

Lee Niswander Nature Reviews Genetics 4, 133-143 (February 2003)

Conservation of gene expression and role in limb development

Page 16:

Why do we care about homology when building an anatomy ontology?

Consider an anatomy ontology of vertebrates:

skeletal system
cranial skeletal system
parietal bone (in_organism human)
parietal bone (in_organism zebrafish)
frontal bone (in_organism human)
frontal bone (in_organism zebrafish)

Homologous: frontal bone (zebrafish) and parietal bone (human)

Different genes and developmental processes may underlie the development of the zebrafish frontal and the human frontal, even though they have the same name and are similarly located.

Page 17:

How and where should homology information be captured?

Use case: Query for phenotypes affecting the human frontal bone and its homologous structure in other species.

Homologs = Synonyms

Page 18:

How should we not build multi-species anatomy ontologies?

It is not practical to write structural definitions which apply across many species and which apply to every structure in a homology group.

e.g. try to define parietal bone for all descendants of the common ancestor of zebrafish and human.

Page 19:

How should we not build multi-species anatomy ontologies?

Structural defs across many species may sometimes be impossible:

Structure Z def: Anatomical entity with structural attributes, A, B, C and D.

Structure X def: Anatomical entity with structural attributes C, D, E & F.

evolved_from structure Z

Structure Y def: Anatomical entity with structural attributes A, B, G & H.

evolved_from structure Z
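The difficulty can be checked directly by treating each definition as a set of structural attributes. This is only a sketch of the example on this slide; the variable names are hypothetical and the attribute letters are the abstract ones used above.

```python
# Sketch: structural attributes of the three structures, as sets.
z = {"A", "B", "C", "D"}  # ancestral structure Z
x = {"C", "D", "E", "F"}  # structure X, evolved_from Z
y = {"A", "B", "G", "H"}  # structure Y, evolved_from Z

# Each descendant still shares something with the ancestor...
print(sorted(x & z))  # ['C', 'D']
print(sorted(y & z))  # ['A', 'B']

# ...but the two descendants share no structural attribute at all,
# so no single structural definition can cover X, Y and Z together.
print(x & y)      # set()
print(x & y & z)  # set()
```

A grouping of X and Y therefore has to rest on their shared origin in Z rather than on shared structure.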

Page 20:

A definition of homology

Where:
All instances of structure X part_of some instance of species A
All instances of structure Y part_of some instance of species B
All instances of structure Z part_of some instance of species C

X is homologous_to Y IFF
All X derived_from some Z
AND All Y derived_from some Z,
AND C is the most recent common ancestor of A and B

[Diagram labels: X, Y, Z; arrows: derived_from]

Page 21:

Some term definitions

forewing (MRCA Insecta)
def: Membranous dorsal appendage of the adult mesothoracic segment of MRCA Insecta.

hindwing (MRCA Insecta)
def: Membranous dorsal appendage of the adult metathoracic segment of MRCA Insecta.

elytra (MRCA Coleoptera)
def: Hard, chitinous dorsal appendage of the adult mesothoracic segment of MRCA Coleoptera.

haltere (MRCA Diptera)
def: Club-shaped dorsal appendage of the metathoracic segment of MRCA Diptera.

MRCA = most recent common ancestor species

Page 22:

appendage (CARO)
. is_a wing of MRCA Insecta
. . is_a forewing of MRCA Insecta
. . . derived_from wing of MRCA Diptera
. . . derived_from elytra of MRCA Coleoptera
. . is_a hindwing of MRCA Insecta
. . . derived_from haltere of MRCA Diptera
. . . derived_from wing (alae) of MRCA Coleoptera
. is_a limb of MRCA tetrapoda
. . is_a forelimb of MRCA tetrapoda
. . . derived_from wing of MRCA Aves
. . . derived_from wing of MRCA Chiroptera*

MRCA = most recent common ancestor species

(*? perhaps should be derived_from autopod?)

A homology based multi-species anatomy ontology

Page 23:

Relations within species specific anatomy ontologies:

wing (Drosophila melanogaster)
. derived_from wing of MRCA Diptera

elytra (Tribolium castaneum)
. derived_from elytra of MRCA Coleoptera

combined with:

forewing of MRCA Insecta
. derived_from wing of MRCA Diptera
. derived_from elytra of MRCA Coleoptera

is sufficient to deduce that:

elytra of Tribolium castaneum homologous_to wing of Drosophila melanogaster

Reasoning homology
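The deduction can be sketched as a walk up derived_from links. This is a simplified illustration, not a full implementation of the homology definition: it reads each indented derived_from line as "the indented term is derived_from the term above it", it only tests for a shared derived_from ancestor, and it does not check the most-recent-common-ancestor condition on species. The function names are hypothetical.

```python
# Sketch: deducing homology from derived_from links.
# Simplification (assumption): two structures are treated as homologous if
# they share any derived_from ancestor; the species MRCA condition is not checked.

derived_from = {  # term -> the term it is derived from
    "wing (Drosophila melanogaster)": "wing of MRCA Diptera",
    "elytra (Tribolium castaneum)": "elytra of MRCA Coleoptera",
    "wing of MRCA Diptera": "forewing of MRCA Insecta",
    "elytra of MRCA Coleoptera": "forewing of MRCA Insecta",
    "haltere of MRCA Diptera": "hindwing of MRCA Insecta",
}

def ancestors(term):
    """Every term reachable from `term` by following derived_from links."""
    found = set()
    while term in derived_from:
        term = derived_from[term]
        found.add(term)
    return found

def homologous(a, b):
    """True if two distinct terms share at least one derived_from ancestor."""
    return a != b and bool(ancestors(a) & ancestors(b))

print(homologous("elytra (Tribolium castaneum)",
                 "wing (Drosophila melanogaster)"))  # True
print(homologous("haltere of MRCA Diptera",
                 "wing (Drosophila melanogaster)"))  # False
```

The species-specific ontologies only need one derived_from link each; the shared ancestor comes from the central multi-species ontology.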

Page 24:

Acknowledgements

Michael Ashburner

CARO
Fabian Neuhaus (NIST)
Paula Mabee (U S.Dakota / cTOL)
Jose Mejino (U Washington, FMA)
Chris Mungall (Berkeley)
Melissa Haendel (U Oregon, ZFIN)
Barry Smith (Buffalo)

Funding: FlyBase; The Newton Trust

Page 25:

Page 26:

Terms in the Danio AO refer to many evolutionarily conserved structures.

Page 27:

Key point

The names of terms in this anatomy ontology refer, in normal usage, to many conserved anatomical structures.

But the terms themselves refer to whatever their definitions say they refer to - this is considerably more restrictive than normal usage.

Page 28:

Fabian’s idea

[Diagram labels: wing, forelimb, limb-like thing, bats, dogs; arrows: derives_from, derives_from]

bat wing is_homologous to dog forelimb

Homology grouping
limb-like thing
--d-wing-bats
--d-forelimb-dogs

Question, what type of thing is a limb-like thing?

forelimb – MRCA tetrapoda
def: The forelimb of the most recent common ancestor of the tetrapoda. (attributes ....)

Page 29:

Approaches to recording homology - relations

1. Capture pairwise: X in species A homologous_to Y in species B

Disadvantage – homology of X and Y to structures in species C, D, E, F .... must be captured separately => very many homology statements.

2. Capture with reference to some central multi-species ontology:

X is_a fore-limb tetrapoda
X derives_from fore-limb MRCA tetrapoda
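The size of the disadvantage in approach 1 can be estimated with a quick count. The sketch assumes one homologous structure per species and treats homologous_to as symmetric, so each unordered pair of species needs one statement; the function names are hypothetical.

```python
# Sketch: number of statements needed for one homology group.
# Assumptions: one structure per species; homologous_to is symmetric.
from math import comb

def pairwise_statements(n_species):
    """Approach 1: one homologous_to statement per unordered pair of species."""
    return comb(n_species, 2)

def central_statements(n_species):
    """Approach 2: one link per species to the central multi-species term."""
    return n_species

for n in (2, 6, 20):
    print(n, pairwise_statements(n), central_statements(n))
# 2 1 2
# 6 15 6
# 20 190 20
```

The pairwise count grows quadratically with the number of species, while the central-ontology count grows linearly.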

Page 30:

Examples of term ambiguity:

Early stages in spermatogenesis

perineurial layer

Page 31:

David’s idea

[Diagram labels: wing, forelimb, limb, bats, dogs, tetrapoda; arrows: is_a, is_a]

bat wing is_homologous to dog forelimb

Homology group ontology
forelimb-tetrapoda
--is_a wing-bat
--is_a forelimb-dog

CARO/other structural ontology
Cardinal organism part
--is_a appendage

forelimb – tetrapoda
def: An anatomical structure derived from the forelimb of the most recent common ancestor of the tetrapoda (attributes...)

Page 32:

Fish group idea (may not be current)

[Diagram labels: wing, forelimb, limb, bats, dogs, bird, wing, tetrapoda; arrows: homologous_to, homologous_to]

bat wing is_homologous to dog forelimb as limb
bat wing is_homologous to bird wing as limb
bat wing NOT homologous_to bird wing as wing

CARO/multi-species AO
appendage
--i-wing-bats
--i-wing-birds
--i-forelimb-dogs

homologous_to at level of tetrapoda limb + evidence + citation

Page 33:

Approaches to capturing homology – data format.

OBO 1.2 format does not allow references and evidence codes to be captured for relationships, although OBO 1.3 will.

Therefore – there is a good case for capturing this information separately in a flat file (tsv) for now. This is what the NESCent/cTOL project is doing.

The same argument applies, no matter what the relation.
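A flat file of this kind could look like the sketch below, which writes one relationship per row with its evidence and reference alongside. The column names are hypothetical and are not the NESCent/cTOL format; the evidence and reference values are placeholders, not real codes or citations.

```python
# Sketch: relationships plus evidence and citation in a tab-separated file.
# Column names are hypothetical; evidence/reference values are placeholders.
import csv
import io

columns = ["subject", "relation", "object", "evidence", "reference"]

rows = [
    {
        "subject": "wing (bats)",
        "relation": "homologous_to",
        "object": "forelimb (dogs)",
        "evidence": "EVIDENCE_CODE_PLACEHOLDER",
        "reference": "CITATION_PLACEHOLDER",
    },
]

# Write to an in-memory buffer; a real file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text)

# Reading the file back recovers the same records.
parsed = [dict(r) for r in csv.DictReader(io.StringIO(tsv_text), delimiter="\t")]
```

Because the relation is just another column, the same layout works for any relation, not only homologous_to.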