The Relation Ontology
Barry Smith
1
Concepts, Types and Frames
Concepts, Frames
Types, Relational Structures
2
Concepts, Types and Frames
Concepts, Frames: Linguistic Approach
Types, Relational Structures: Scientific Approach
3
4
[Figure: TLR-2 signalling pathway. Node labels: TLR2, MyD88, TIR domain, TLR2:MyD88 complex, LTA binding, TLR2-TLR2 ligand binding, TIR-TIR binding, TLR2-MyD88 binding, process. Relation labels: has_lower_level_granularity, has_participant, has_disposition, has_part, preceded_by, regulated_by, has_output.]
5
how to define relations such as this?
Uses of ‘ontology’ in PubMed abstracts
6
By far the most successful: The Gene Ontology
[Slide background: a long unannotated amino-acid sequence beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPS]
How to do biology across the genome?
[Slide background: a long unannotated amino-acid sequence beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPS]
9
10
what cellular component?
what molecular function?
what biological process?
11
GO used in curation of literature and in integration of databases
MouseEcotope, GlyProt, DiabetInGene, GluChem
sphingolipid transporter activity
12
The GO Idea
MouseEcotope, GlyProt, DiabetInGene, GluChem
Holliday junction helicase complex
13
The GO Idea
MouseEcotope, GlyProt, DiabetInGene, GluChem
sphingolipid transporter activity
14
Clark et al., 2005
part_of
is_a
15
GO used in reasoning
GO provides a controlled system of representations for use in annotating data
• multi-species
• multi-disciplinary
• multi-granularity, from molecules to population
18
Gene products involved in cardiac muscle development in humans
19
$100 million invested in literature curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
20
GO allows a new kind of biological research
based on analysis and comparison of the massive quantities of annotations linking GO terms to the gene products described in scientific literature and in scientific databases
21
GO is amazingly successful in overcoming data silo problems
but it covers only
– cellular components
– molecular functions
– biological processes
22
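RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
GRANULARITY: ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
GRANULARITY: CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
GRANULARITY: MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
23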
The OBO Foundry
– to extend the GO to enable intelligent integration of gigantic bodies of heterogeneous data across the entire domain of the life sciences, including clinical medicine
– to create an evolving, map-like, computable representation of the entire domain of biological and medical reality
24
Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology
25
The OBO Foundry
Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
– RO Relation Ontology
26
The OBO Foundry
A success story in top-down information integration
Ontologies configured as extensions of a single upper level ontology (BFO)
Used by hundreds of researchers to promote interoperability of experimental data in scores of high-throughput domains of biology and medicine via semantic annotation
27
The linguistic approach
Bottom-up, focused on linguistic properties manifested by the contents of a large corpus viewed from a cognitive perspective (mapping/modeling meanings or concepts rather than entities in reality)
28
Automatic mining of “associations” from MEDLINE
FACTA: Finding Associated Concepts with Text Analysis
– What diseases are related to a particular chemical?
– What proteins are related to a particular disease?
http://text0.mib.man.ac.uk/software/facta/
29
For the linguistic approach
fiction may be no less important than fact
English has no privileged status (the larger the corpus, the better)
consistency (and thus additivity) of annotations is not important, because cognitive perspectives differ
goal is automatic generation of semantic annotations via pattern-matching
30
For the scientific approach
factual discourse alone is important
English is the lingua franca
regimentation is allowed
goal of truth: to create a single computer-processable map of reality via painstaking Handarbeit (manual work)
truth is one: we strive for consistency of annotations
31
The linguistic approach is concerned with knowledge representation
The scientific approach is concerned with reality representation
32
OBO Relation Ontology (RO 1.0)
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
33
Relation Ontology
supports consistent linkage of OBO Foundry ontologies through a common system of formally defined relations
to enable reasoning both within and across ontologies, and thus also within and between the literature annotated in its terms
34
Relation Ontology
instance_of
is_a (= is a subtype of)
depends_on
part_of
inheres_in
has_input
has_participant
….
http://obofoundry.org/ro/
35
Basic Formal Ontology (BFO)
Continuant
– Independent Continuant
– Dependent Continuant
Occurrent (Process, Event)
36
http://ifomis.uni-saarland.de/bfo/
Fundamental Dichotomy
Continuants preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases
– have all their parts of necessity
37
instance_of
[Diagram: types in the upper tier, instances in the lower tier, linked by instance_of. Types: Continuant, divided into Independent Continuant (thing) and Dependent Continuant (quality); Occurrent (process, event).]
38
types vs. instances
compare OWL: T-box vs. A-box
(terminology vs. assertions)
39
3 kinds of (binary) relations
Between types
• human is_a mammal
• human heart part_of human
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type tamiflu
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
40
depends_on
[Diagram: Continuant, divided into Independent Continuant (thing) and Dependent Continuant (quality); Occurrent (process, event); instances beneath each type.]
quality depends_on bearer
41
Dependent continuants
the whiteness quality of this cheese
your role as lecturer
the disposition of this peach to ripen
42
depends_on
[Diagram: Continuant, divided into Independent Continuant (thing) and Dependent Continuant (quality); Occurrent (process); instances beneath each type.]
temperature depends_on bearer
43
depends_on
[Diagram: Continuant, divided into Independent Continuant (thing) and Dependent Continuant (quality, …); Occurrent (process, event); instances beneath each type.]
event depends_on participant
44
Type-level relations presuppose the underlying instance-level relations
A is_a B =def. A and B are types and all instances of A are instances of B
A part_of B =def. All instances of A are instance-level-parts-of some instance of B
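The two definitions above can be tried out on a small instance-level dataset. The following is only a sketch: the instance names, the `instance_of` and `instance_part_of` tables, and the helper functions are hypothetical, and a check over a finite sample can refute a type-level assertion but cannot establish that it holds universally.

```python
# Toy instance-level data; every name here is hypothetical.
instance_of = {
    "mary": {"human", "mammal"},
    "rex": {"dog", "mammal"},
    "marys_heart": {"human heart"},
}
# Instance-level parthood, stored as (part, whole) pairs.
instance_part_of = {("marys_heart", "mary")}


def instances(type_name):
    """Return the recorded instances of a type."""
    return {x for x, types in instance_of.items() if type_name in types}


def is_a(a, b):
    """A is_a B: all instances of A are instances of B."""
    return instances(a) <= instances(b)


def part_of(a, b):
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    return all(
        any((x, y) in instance_part_of for y in instances(b))
        for x in instances(a)
    )


print(is_a("human", "mammal"))         # True
print(is_a("mammal", "human"))         # False: rex is a mammal but not a human
print(part_of("human heart", "human")) # True
print(part_of("human", "human heart")) # False: mary is not part of any heart
```

Note that both checks are vacuously true for a type with no recorded instances, which is one reason the sample can only serve as a sanity check.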
45
The assertions linking terms in ontologies must hold universally
Hence all type-level relations in RO are provided with All-Some definitions
(For linguists, Some-Some relations are equally important)
47
Including only All-Some relations means:
All relations evaluable as
1. Transitive
2. Symmetric
3. Reflexive
4. Anti-Symmetric
All relations support logical reasoning
– as contrasted with: is_related_to, is_associated_with, is_narrower_in_meaning_than …
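To make "evaluable" concrete, the four properties can be checked mechanically once a relation is given as a finite set of ordered pairs. This is a minimal sketch with hypothetical toy relations; it does not reproduce how RO itself records these properties.

```python
from itertools import product


def is_reflexive(rel, domain):
    """Every element of the domain is related to itself."""
    return all((x, x) in rel for x in domain)


def is_symmetric(rel):
    """Whenever x R y holds, y R x holds too."""
    return all((y, x) in rel for (x, y) in rel)


def is_antisymmetric(rel):
    """x R y and y R x together imply x == y."""
    return all(x == y for (x, y) in rel if (y, x) in rel)


def is_transitive(rel):
    """x R y and y R z together imply x R z."""
    return all(
        (x, w) in rel
        for (x, y), (z, w) in product(rel, rel)
        if y == z
    )


# Hypothetical instance-level parthood, including the reflexive pairs.
domain = {"nucleus", "cell", "organism"}
part_of = {(d, d) for d in domain} | {
    ("nucleus", "cell"),
    ("cell", "organism"),
    ("nucleus", "organism"),
}
# Hypothetical instance-level adjacency between two objects.
adjacent_to = {("a", "b"), ("b", "a")}

print(is_reflexive(part_of, domain), is_antisymmetric(part_of), is_transitive(part_of))
print(is_symmetric(adjacent_to), is_transitive(adjacent_to))
```

A relation such as is_associated_with offers no comparable test, because nothing fixes which pairs it must or must not contain.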
49
Reasoning should be able to cascade from one relational assertion (A R1 B) to the next (B R2 C).
A query to find all DNA binding proteins should also find all transcription factor proteins, because
– Transcription factor is_a DNA binding protein
Only the All-Some structure guarantees such cascading of relational assertions
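The cascading query can be sketched as a traversal of is_a links. The gene product identifiers, the extra `protein` parent term and the `find_all` helper below are hypothetical; only the transcription factor example comes from the slide.

```python
# Type-level is_a assertions: child term -> set of parent terms.
is_a_edges = {
    "transcription factor": {"DNA binding protein"},
    "DNA binding protein": {"protein"},  # assumed parent, for illustration
}

# Hypothetical annotations: gene product -> most specific term.
annotations = {
    "geneP1": "transcription factor",
    "geneP2": "DNA binding protein",
    "geneP3": "protein",
}


def ancestors(term):
    """Collect every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a_edges.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_all(term):
    """Return gene products annotated to `term` or to any of its subtypes."""
    return sorted(
        gene
        for gene, annotated in annotations.items()
        if annotated == term or term in ancestors(annotated)
    )


print(find_all("DNA binding protein"))  # ['geneP1', 'geneP2']
```

The inference is sound only because is_a is read as an All-Some statement: every instance of the subtype is an instance of the parent, so an annotation to the subtype carries over to the parent.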
50
Organisms are continuants
they are entities which endure through time through gain and loss of parts
Processes are occurrents
they are entities which unfold through time, and have all their parts as a matter of necessity
53
human testis part_of adult human being
but not
human being has_part human testis
and not even
male human being has_part human testis
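The asymmetry between part_of and has_part at the type level follows from the All-Some reading, and a small sample shows it. Everything in this sketch is hypothetical toy data, including a male individual recorded without a testis, and the `has_part` helper is simply the All-Some pattern applied in the other direction.

```python
# Hypothetical individuals and the types they instantiate.
instance_of = {
    "adam": {"human being", "male human being", "adult human being"},
    "bob": {"human being", "male human being", "adult human being"},
    "eve": {"human being", "adult human being"},
    "adams_testis": {"human testis"},
}
# Instance-level parthood as (part, whole) pairs; bob has no recorded testis.
instance_part_of = {("adams_testis", "adam")}


def instances(type_name):
    return [x for x, types in instance_of.items() if type_name in types]


def part_of(a, b):
    """Every instance of A is part of some instance of B."""
    return all(
        any((x, y) in instance_part_of for y in instances(b))
        for x in instances(a)
    )


def has_part(a, b):
    """Every instance of A has some instance of B as part."""
    return all(
        any((y, x) in instance_part_of for y in instances(b))
        for x in instances(a)
    )


print(part_of("human testis", "adult human being"))  # True
print(has_part("human being", "human testis"))       # False: eve
print(has_part("male human being", "human testis"))  # False: bob
```

A single counterexample on the whole side is enough to defeat the has_part assertion, while the part_of assertion is untouched by it.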
54
part_of for continuant types
A part_of B =def.
For all x, t if x instance_of A at t then there is some y, y instance_of B at t and x instance_level_part_of y at t
cell membrane part_of cell
55
part_of for occurrent types
A part_of B =def.
For all x, if x instance_of A then there is some y, y instance_of B and x instance_level_part_of y
EVERY A IS PART OF SOME B
56
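Written in first-order notation, the two part_of definitions differ only in the time index. The predicate spellings and argument order below are a notational assumption, chosen to mirror the wording of the slides.

```latex
% Continuant types: instantiation and parthood are time-indexed.
A\ \mathit{part\_of}\ B \;=_{\mathrm{def}}\;
  \forall x\,\forall t\,\bigl[\mathit{instance\_of}(x, A, t) \rightarrow
  \exists y\,\bigl(\mathit{instance\_of}(y, B, t) \wedge
  \mathit{instance\_level\_part\_of}(x, y, t)\bigr)\bigr]

% Occurrent types: no time index.
A\ \mathit{part\_of}\ B \;=_{\mathrm{def}}\;
  \forall x\,\bigl[\mathit{instance\_of}(x, A) \rightarrow
  \exists y\,\bigl(\mathit{instance\_of}(y, B) \wedge
  \mathit{instance\_level\_part\_of}(x, y)\bigr)\bigr]
```

In both cases the universal quantifier ranges over instances of A and the existential over instances of B, which is the All-Some pattern.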
transformation_of
A transformation_of B =def.
Every instance of A was at some earlier time an instance of B
– adult transformation_of child
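Because continuants instantiate different types at different times, checking transformation_of needs time-indexed instantiation facts. The sketch below uses hypothetical individuals and integer time points, and it encodes only the definition as stated on the slide, with no further conditions.

```python
# Time-indexed instantiation facts: (instance, type, time). Toy data only.
facts = {
    ("mary", "child", 1),
    ("mary", "adult", 2),
    ("john", "child", 1),
    ("john", "adult", 3),
}


def transformation_of(a, b):
    """Every instance of A was at some earlier time an instance of B."""
    a_facts = [(x, t) for (x, typ, t) in facts if typ == a]
    return all(
        any(x2 == x and typ == b and t2 < t for (x2, typ, t2) in facts)
        for (x, t) in a_facts
    )


print(transformation_of("adult", "child"))  # True
print(transformation_of("child", "adult"))  # False
```

The same individual appears on both sides of the comparison, which is what separates transformation from derivation, where the earlier and later entities are distinct.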
59
transformation_of
60
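[Diagram: one and the same instance c, shown as c at t and c at t1 along a time axis, instantiating types C and C1 at different times.]
Examples shown: pre-RNA, mature RNA; adult, child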
[Diagram: distinct instances along a time axis. c at t (type C) and c' at t (type C') are followed by c1 at t1 (type C1).]
zygote derives_from ovum, sperm
derives_from
correction to original Genome Biology paper: derivation is never one-to-one
61
two continuants fuse to form a new continuant
[Diagram: fusion. c at t (type C) and c' at t (type C') are succeeded by c1 at t1 (type C1).]
derives_from
62
one initial continuant is replaced by two successor continuants
[Diagram: fission. c at t (type C) is succeeded at t1 by two instances, one of type C1 and one of type C2.]
derives_from
63
one continuant detaches itself from an initial continuant, which itself continues to exist
[Diagram: budding. c (type C) exists at both t and t1; c1 (type C1) detaches from it.]
derives_from combined with transformation_of
64
one continuant absorbs a second continuant while itself continuing to exist
[Diagram: capture. c (type C) exists at both t and t1; c' (type C') exists at t and is absorbed by c.]
derives_from combined with transformation_of
65
ISO “Concept logic” for mereology
Toronto part_of Ontario
brain part_of central nervous system
ISO, “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies” (ANSI/NISO Z39.19-2005), sees these as examples of the same part_of relation
66
Instances vs. types
Instance-level relations and type-level relations have logically distinct properties
Type relations are liftings of instance relations
67
What is symmetric on the level of instances need not be symmetric on the level of types
adjacency on the instance level is always symmetric
68
Not however on the level of types:
seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle
69
Similarly, on the level of types, while:
nucleus adjacent_to cytoplasm
it is not the case that
cytoplasm adjacent_to nucleus
70
continuous_with on the instance level is always symmetric
a continuous_with b on the instance level means: there is a fiat boundary between a and b
if a continuous_with b, then b continuous_with a
71
72
continuous_with as a relation between types
A continuous_with B =def.
for all x, if x instance_of A then there is some y such that y instance_of B and x continuous_with y
73
continuous_with as a relation between types is not symmetric
Consider lymph node and lymphatic vessel
Each lymph node is continuous with some lymphatic vessel, but there are lymphatic vessels (e.g. lymphs and lymphatic trunks) which are not continuous with any lymph nodes
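The lymph node example can be replayed on a toy dataset to show how a symmetric instance-level relation yields a non-symmetric type-level one. The instance names are hypothetical, and the single recorded continuity is an assumption made for illustration.

```python
# Hypothetical instances and their types.
instance_of = {
    "node1": "lymph node",
    "vessel1": "lymphatic vessel",
    "trunk1": "lymphatic vessel",
}
# Instance-level continuity stored as unordered pairs, so it is symmetric
# by construction.
continuous = {frozenset({"node1", "vessel1"})}


def continuous_with_instances(x, y):
    return frozenset({x, y}) in continuous


def continuous_with_types(a, b):
    """All-Some lifting: every instance of A is continuous with some instance of B."""
    a_instances = [x for x, t in instance_of.items() if t == a]
    b_instances = [y for y, t in instance_of.items() if t == b]
    return all(
        any(continuous_with_instances(x, y) for y in b_instances)
        for x in a_instances
    )


print(continuous_with_instances("node1", "vessel1"))  # True
print(continuous_with_instances("vessel1", "node1"))  # True
print(continuous_with_types("lymph node", "lymphatic vessel"))  # True
print(continuous_with_types("lymphatic vessel", "lymph node"))  # False: trunk1
```

The type-level assertion fails in one direction because of a single vessel with no adjoining node, even though every recorded instance-level continuity holds both ways.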
74
3 kinds of binary relations
Between types
• human is_a mammal
• cell nucleus part_of cell
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type penicillin
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
75
Linguistic vs. scientific approach to semantic annotation
Semantic annotation can provide support for logical reasoning across the content of scientific literature only if the distinctions between relations at the type level and relations at the instance level are taken into account.
(Many?) linguistic accounts of relations do not take account of this distinction.
76
Why not?
Because linguistic accounts (like dictionaries) focus on relations between meanings, not on instances in reality
Because linguistic accounts focus on what is meaningfully combinable, rather than on what is logically inferrable
Because linguistic accounts focus on relations captured grammatically, not on relations observed experimentally and captured in scientific theories
77
The Relation Ontology
Barry Smith
78
Sophia Ananiadou
UK National Centre for Text Mining
79
Do linguistics and biology truly ever meet?