Date posted: 19-Dec-2015

The Relation Ontology

Barry Smith

1

Concepts, Types and Frames

Concepts Frames

Types Relational Structures

2

Concepts, Types and Frames

Linguistic Approach: Concepts, Frames

Scientific Approach: Types, Relational Structures

3

4

[Diagram: TLR-2 signalling pathway. Node labels: TLR2-MyD88 binding, TLR2, LTA binding, TIR domain, TLR2-TLR2 ligand binding, TIR-TIR binding, process, TLR2:MyD88 complex, MyD88. Edge labels: has_lower_level_granularity, has_participant, has_disposition, has_part, preceded_by, regulated_by, has_output.]

How to define relations such as this?

5

Uses of ‘ontology’ in PubMed abstracts

6

By far the most successful: The Gene Ontology

[Slide shows a block of raw amino-acid sequence, beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES...]

How to do biology across the genome?

[Slide shows a longer block of raw amino-acid sequence, beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES...]

9

10

what cellular component?

what molecular function?

what biological process?

11

what cellular component?

what molecular function?

what biological process?

GO used in curation of literature

and in integration of databases

MouseEcotope, GlyProt, DiabetInGene, GluChem

sphingolipid transporter activity

12

The GO Idea

MouseEcotope, GlyProt, DiabetInGene, GluChem

Holliday junction helicase complex

13

The GO Idea

MouseEcotope, GlyProt, DiabetInGene, GluChem

sphingolipid transporter activity

14

[Figure from Clark et al., 2005, with edges labelled part_of and is_a]

15

GO used in reasoning

GO provides a controlled system of representations for use in annotating data

• multi-species

• multi-disciplinary

• multi-granularity, from molecules to populations

18

Gene products involved in cardiac muscle development in humans

19

$100 million invested in literature curation using GO

over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO

20

GO allows a new kind of biological research

based on analysis and comparison of the massive quantities of annotations linking GO terms to the gene products described in scientific literature and in scientific databases

21

GO is amazingly successful in overcoming data silo problems

but it covers only

– cellular components

– molecular functions

– biological processes

22

The Open Biomedical Ontologies (OBO) Foundry

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT. Rows (GRANULARITY):

ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)

MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)

23

The OBO Foundry

– to extend the GO to enable intelligent integration of gigantic bodies of heterogeneous data across the entire domain of the life sciences, including clinical medicine

– to create an evolving, map-like, computable representation of the entire domain of biological and medical reality

24

Initial Candidate Members

– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology

25

The OBO Foundry

Under development

– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
– RO Relation Ontology

26

The OBO Foundry

A success story in top-down information integration

Ontologies configured as extensions of a single upper level ontology (BFO)

Used by hundreds of researchers to promote interoperability of experimental data in scores of high-throughput domains of biology and medicine via semantic annotation

27

The linguistic approach

Bottom-up, focused on linguistic properties manifested by the contents of a large corpus viewed from a cognitive perspective (mapping/modeling meanings or concepts rather than entities in reality)

28

Automatic mining of “associations” from MEDLINE

FACTA: Finding Associated Concepts with Text Analysis

– What diseases are related to a particular chemical?

– What proteins are related to a particular disease?

http://text0.mib.man.ac.uk/software/facta/

29

For the linguistic approach

fiction may be no less important than fact

English has no privileged status (the larger the corpus, the better)

consistency (and thus additivity) of annotations is not important, because cognitive perspectives differ

goal is automatic generation of semantic annotations via pattern-matching

30

For the scientific approach

factual discourse alone is important

English is the lingua franca

regimentation is allowed

goal of truth: to create a single computer-processable map of reality via painstaking Handarbeit (manual work)

truth is one: we strive for consistency of annotations

31

The linguistic approach is concerned with knowledge representation

The scientific approach is concerned with reality representation

32

OBO Relation Ontology (RO 1.0)

Foundational: is_a, part_of

Spatial: located_in, contained_in, adjacent_to

Temporal: transformation_of, derives_from, preceded_by

Participation: has_participant, has_agent

33

Relation Ontology

supports consistent linkage of OBO Foundry ontologies through a common system of formally defined relations

to enable reasoning both within and across ontologies, and thus also within and between the literature annotated in its terms

34

Relation Ontology

instance_of

is_a (= is a subtype of)

depends_on

part_of

inheres_in

has_input

has_participant

….

http://obofoundry.org/ro/

35

Basic Formal Ontology (BFO)

Continuant
– Independent Continuant
– Dependent Continuant

Occurrent (Process, Event)

36

http://ifomis.uni-saarland.de/bfo/

Fundamental Dichotomy

Continuants preserve their identity through change

Occurrents (aka processes)

– have temporal parts

– unfold themselves in successive phases

– exist only in their phases

– have all their parts of necessity

37

instance_of

types:

Continuant
– Independent Continuant (thing)
– Dependent Continuant (quality)

Occurrent (process, event)

instances: .... ..... .......

38

types vs. instances

compare OWL: T-box vs. A-box

(terminology vs. assertions)

39

3 kinds of (binary) relations

Between types

• human is_a mammal

• human heart part_of human

Between an instance and a type

• this human instance_of the type human

• this human allergic_to the type tamiflu

Between instances

• Mary’s heart part_of Mary

• Mary’s aorta connected_to Mary’s heart

40
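The three kinds of binary relations can be made concrete in a small data model. The following is a minimal Python sketch, not part of RO itself: the classes `Type`, `Instance` and `Assertion` and the method `kind` are hypothetical names, introduced only to show that the kind of an assertion follows from whether each argument is a type or an instance.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Type:
    """A type (universal), e.g. human. Hypothetical helper class."""
    name: str


@dataclass(frozen=True)
class Instance:
    """A particular, e.g. Mary. Hypothetical helper class."""
    name: str


@dataclass(frozen=True)
class Assertion:
    subject: Union[Type, Instance]
    relation: str
    target: Union[Type, Instance]

    def kind(self) -> str:
        """Classify the assertion by the categories of its two arguments."""
        subject_is_type = isinstance(self.subject, Type)
        target_is_type = isinstance(self.target, Type)
        if subject_is_type and target_is_type:
            return "between types"
        if not subject_is_type and target_is_type:
            return "between an instance and a type"
        if not subject_is_type and not target_is_type:
            return "between instances"
        return "type-to-instance (not one of the three kinds listed)"


human, mammal = Type("human"), Type("mammal")
mary, marys_heart = Instance("Mary"), Instance("Mary's heart")

examples = [
    Assertion(human, "is_a", mammal),
    Assertion(mary, "instance_of", human),
    Assertion(marys_heart, "part_of", mary),
]
for a in examples:
    print(f"{a.subject.name} {a.relation} {a.target.name}: {a.kind()}")
```

Keeping types and instances as separate classes means a relation such as part_of can be given different definitions at the two levels without the two being confused.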

depends_on

Continuant
– Independent Continuant (thing)
– Dependent Continuant (quality)

Occurrent (process, event)

quality depends_on bearer

41

Dependent continuants

the whiteness quality of this cheese

your role as lecturer

the disposition of this peach to ripen

42

depends_on

Continuant
– Independent Continuant (thing)
– Dependent Continuant (quality)

Occurrent (process)

temperature depends_on bearer

43

depends_on

Continuant
– Independent Continuant (thing)
– Dependent Continuant (quality, …)

Occurrent (process, event)

event depends_on participant

44

Type-level relations presuppose the underlying instance-level relations

A is_a B =def. A and B are types and all instances of A are instances of B

A part_of B =def. All instances of A are instance-level-parts-of some instance of B
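These two definitions can be checked mechanically over a finite set of instance-level facts. The sketch below is an illustration under stated assumptions: the individuals (`mary`, `rex`, `marys_heart`) and the dictionaries `INSTANCE_OF` and `INSTANCE_PART_OF` are invented toy data, and in such a finite model a type with no recorded instances satisfies both checks vacuously.

```python
# Toy instance-level data; all names are illustrative assumptions.
INSTANCE_OF = {
    "mary": {"human", "mammal"},
    "rex": {"dog", "mammal"},
    "marys_heart": {"human heart"},
}
# Instance-level parthood as (part, whole) pairs.
INSTANCE_PART_OF = {("marys_heart", "mary")}


def instances(type_name):
    """All instances recorded for a type."""
    return {x for x, types in INSTANCE_OF.items() if type_name in types}


def is_a(a, b):
    """A is_a B: all instances of A are instances of B."""
    return instances(a) <= instances(b)


def part_of(a, b):
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in instances(b))
        for x in instances(a)
    )


print(is_a("human", "mammal"))          # True
print(is_a("mammal", "human"))          # False: rex is a mammal but not a human
print(part_of("human heart", "human"))  # True
print(part_of("human", "human heart"))  # False: mary is not part of any heart
```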

45

The assertions linking terms in ontologies must hold universally

Hence all type-level relations in RO

are provided with

All-Some definitions

(For linguists, Some-Some relations

are equally important)

47

Including only All-Some relations means:

All relations evaluable as

1. Transitive

2. Symmetric

3. Reflexive

4. Anti-Symmetric

All relations support logical reasoning

– as contrasted with: is_related_to, is_associated_with, is_narrower_in_meaning_than …
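For a relation given as a finite set of pairs, each of the four properties can be tested directly. This is a minimal sketch, not an RO tool: the function names and the sample data are assumptions, and the part_of example adopts the convention that everything is part of itself.

```python
def is_transitive(pairs):
    """If (a, b) and (b, d) hold, then (a, d) must hold."""
    return all(
        (a, d) in pairs
        for (a, b) in pairs
        for (c, d) in pairs
        if b == c
    )


def is_symmetric(pairs):
    """If (a, b) holds, then (b, a) must hold."""
    return all((b, a) in pairs for (a, b) in pairs)


def is_reflexive(pairs, domain):
    """(x, x) must hold for every x in the domain."""
    return all((x, x) in pairs for x in domain)


def is_antisymmetric(pairs):
    """(a, b) and (b, a) may both hold only when a == b."""
    return all(a == b or (b, a) not in pairs for (a, b) in pairs)


domain = {"finger", "hand", "arm"}
proper_parts = {("finger", "hand"), ("hand", "arm"), ("finger", "arm")}
# Assumed convention: everything is part of itself.
part_of = proper_parts | {(x, x) for x in domain}

# Instance-level adjacency between two hypothetical individuals.
adjacent_to = {("nucleus_1", "cytoplasm_1"), ("cytoplasm_1", "nucleus_1")}

print(is_transitive(part_of), is_symmetric(part_of),
      is_reflexive(part_of, domain), is_antisymmetric(part_of))
print(is_transitive(adjacent_to), is_symmetric(adjacent_to),
      is_antisymmetric(adjacent_to))
```

A relation such as is_associated_with has no definition against which checks of this kind could be run.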

49

Reasoning should be able to cascade from one relational assertion (A R1 B) to the next (B R2 C).

Find all DNA binding proteins should also Find all transcription factor proteins because

– Transcription factor is_a DNA binding protein

Only the All-Some structure guarantees such cascading of relational assertions
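The cascading query can be sketched as a walk up the is_a hierarchy. The code below is illustrative only: the annotation table and the gene product identifiers (`product_1` and so on) are hypothetical, and the hierarchy contains just the assertion from the slide plus one assumed parent.

```python
# Type-level is_a assertions: child type -> set of parent types.
IS_A = {
    "transcription factor": {"DNA binding protein"},
    "DNA binding protein": {"protein"},  # assumed extra level for illustration
}
# Hypothetical annotations: gene product -> most specific annotated type.
ANNOTATIONS = {
    "product_1": "transcription factor",
    "product_2": "DNA binding protein",
    "product_3": "protein",
}


def ancestors(term):
    """All types reachable from term by following is_a upward."""
    seen, stack = set(), [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_all(term):
    """Gene products annotated with term or with any of its subtypes."""
    return {
        product
        for product, annotated in ANNOTATIONS.items()
        if annotated == term or term in ancestors(annotated)
    }


print(sorted(find_all("DNA binding protein")))  # ['product_1', 'product_2']
```

The query returns `product_1` although it was never annotated directly as a DNA binding protein, because every transcription factor is a DNA binding protein.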

50

Organisms are continuants

they are entities which endure through time through gain and loss of parts

Processes are occurrents

they are entities which unfold through time, and have all their parts as a matter of necessity

53

human testis part_of adult human being

but not

human being has_part human testis

and not even

male human being has_part human testis
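Why part_of can hold where has_part fails becomes visible once both are read as All-Some statements over instances. A minimal sketch with invented individuals follows; `adam`, `bob`, `eve` and `testis_1` are hypothetical, and `bob` is assumed to be a male human being who has lost his testes.

```python
# Toy data; the individuals are hypothetical.
INSTANCE_OF = {
    "adam": {"adult human being", "human being", "male human being"},
    "bob": {"adult human being", "human being", "male human being"},
    "eve": {"adult human being", "human being"},
    "testis_1": {"human testis"},
}
INSTANCE_PART_OF = {("testis_1", "adam")}  # (part, whole) pairs


def instances(type_name):
    return {x for x, types in INSTANCE_OF.items() if type_name in types}


def part_of(a, b):
    """Every instance of A is part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in instances(b))
        for x in instances(a)
    )


def has_part(a, b):
    """Every instance of A has some instance of B as part."""
    return all(
        any((y, x) in INSTANCE_PART_OF for y in instances(b))
        for x in instances(a)
    )


print(part_of("human testis", "adult human being"))  # True
print(has_part("human being", "human testis"))       # False: eve, bob
print(has_part("male human being", "human testis"))  # False: bob
```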

54

part_of for continuant types

A part_of B =def.

For all x, t if x instance_of A at t then there is some y, y instance_of B at t and x instance_level_part_of y at t

cell membrane part_of cell

55
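The time-indexed definition for continuant types can be tested over facts that carry a time point. In this sketch the identifiers, the integer time points and the function signature are assumptions made for illustration.

```python
# Time-indexed facts; identifiers and time points are illustrative.
INSTANCE_OF_AT = {
    ("cell_1", "cell", 1), ("cell_1", "cell", 2),
    ("membrane_1", "cell membrane", 1), ("membrane_1", "cell membrane", 2),
}
PART_OF_AT = {("membrane_1", "cell_1", 1), ("membrane_1", "cell_1", 2)}


def part_of(a, b, instance_of_at, part_of_at):
    """For all x, t: if x instance_of A at t, then some y is
    instance_of B at t and x is an instance-level part of y at t."""
    for x, x_type, t in instance_of_at:
        if x_type != a:
            continue
        has_whole = any(
            y_type == b and t_y == t and (x, y, t) in part_of_at
            for y, y_type, t_y in instance_of_at
        )
        if not has_whole:
            return False
    return True


print(part_of("cell membrane", "cell", INSTANCE_OF_AT, PART_OF_AT))  # True
print(part_of("cell", "cell membrane", INSTANCE_OF_AT, PART_OF_AT))  # False

# Dropping the parthood fact at time 2 breaks the type-level assertion,
# because the definition quantifies over every time at which x is an A.
incomplete = PART_OF_AT - {("membrane_1", "cell_1", 2)}
print(part_of("cell membrane", "cell", INSTANCE_OF_AT, incomplete))  # False
```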

part_of for occurrent types

A part_of B =def.

For all x, if x instance_of A then there is some y, y instance_of B and x instance_level_part_of y

EVERY A IS PART OF SOME B

56

transformation_of

A transformation_of B =Def.

Every instance of A was at some earlier time an instance of B

– adult transformation_of child
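The definition can be checked against recorded life histories. The following sketch assumes a hypothetical table of (time, type) observations per individual; the names and ages are invented.

```python
# Hypothetical life histories: individual -> list of (time, type) observations.
HISTORY = {
    "mary": [(5, "child"), (30, "adult")],
    "tom": [(7, "child")],
}


def transformation_of(a, b):
    """Every instance of A was at some earlier time an instance of B."""
    for observations in HISTORY.values():
        for t, observed_type in observations:
            if observed_type != a:
                continue
            was_b_earlier = any(
                earlier < t and earlier_type == b
                for earlier, earlier_type in observations
            )
            if not was_b_earlier:
                return False
    return True


print(transformation_of("adult", "child"))  # True
print(transformation_of("child", "adult"))  # False
```

Because the check runs within one individual's own history, it reflects the point that the same instance persists through the change of type.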

59

transformation_of

60

[Diagram: along a time axis, the same instance c is shown at times t and t1, instantiating the types C and C1. Examples: pre-RNA and mature RNA; child and adult]

[Diagram: along a time axis, instance c of type C and instance c' of type C' at time t, and instance c1 of type C1 at time t1. Example: zygote derives_from ovum, sperm]

derives_from

correction to original Genome Biology paper: derivation is never one-to-one

61

two continuants fuse to form a new continuant

[Diagram (fusion): c, an instance of C, and c', an instance of C', at time t; c1, an instance of C1, at time t1]

derives_from

62

one initial continuant is replaced by two successor continuants

[Diagram (fission): c, an instance of C, at time t; successor instances of C1 and C2 at time t1]

derives_from

63

one continuant detaches itself from an initial continuant, which itself continues to exist

[Diagram (budding): c, an instance of C, at times t and t1; c1, an instance of C1, detached from c]

derives_from combined with transformation_of

64

one continuant absorbs a second continuant while itself continuing to exist

[Diagram (capture): c, an instance of C, at times t and t1; c', an instance of C', at time t]

derives_from combined with transformation_of

65

ISO “Concept logic” for mereology

Toronto part_of Ontario

brain part_of central nervous system

ISO (“Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies”, ANSI/NISO Z39.19-2005) sees these as examples of the same part_of relation

66

Instances vs. types

Instance-level relations and type-level relations have logically distinct properties

Type relations are liftings of instance relations

67

What is symmetric on the level of instances need not be symmetric on the level of types

adjacency on the instance level is always symmetric

68

Not however on the level of types:

seminal vesicle adjacent_to urinary bladder

Not: urinary bladder adjacent_to seminal vesicle
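The asymmetry appears as soon as the type-level relation is computed from symmetric instance-level facts. This is a toy sketch: the three individuals are hypothetical, standing for the organs of one male and one female body.

```python
# Toy anatomy; individuals are hypothetical (one male body, one female body).
INSTANCE_OF = {
    "vesicle_m": "seminal vesicle",
    "bladder_m": "urinary bladder",
    "bladder_f": "urinary bladder",
}
_observed = {("vesicle_m", "bladder_m")}
# Instance-level adjacency is symmetric, so store both directions.
ADJACENT = _observed | {(b, a) for a, b in _observed}


def instances(type_name):
    return {x for x, t in INSTANCE_OF.items() if t == type_name}


def adjacent_to(a, b):
    """Type-level lifting: every instance of A is adjacent to some instance of B."""
    return all(
        any((x, y) in ADJACENT for y in instances(b))
        for x in instances(a)
    )


print(adjacent_to("seminal vesicle", "urinary bladder"))  # True
print(adjacent_to("urinary bladder", "seminal vesicle"))  # False: bladder_f
```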

69

Similarly, on the level of types, while:

nucleus adjacent_to cytoplasm

it is not the case that

cytoplasm adjacent_to nucleus

70

continuous_with on the instance level is always symmetric

a continuous_with b on the instance level means: there is a fiat boundary between a and b

if a continuous_with b, then b continuous_with a

71

72

continuous_with as a relation between types

A continuous_with B =Def.

for all x, if x instance_of A then there is some y such that y instance_of B and x continuous_with y

73

continuous_with is not symmetric

Consider lymph node and lymphatic vessel

Each lymph node is continuous with some lymphatic vessel, but there are lymphatic vessels (e.g. lymphs and lymphatic trunks) which are not continuous with any lymph nodes

74

3 kinds of binary relations

Between types

• human is_a mammal

• cell nucleus part_of cell

Between an instance and a type

• this human instance_of the type human

• this human allergic_to the type penicillin

Between instances

• Mary’s heart part_of Mary

• Mary’s aorta connected_to Mary’s heart

75

Linguistic vs. scientific approach to semantic annotation

Semantic annotation can provide support for logical reasoning across the content of scientific literature only if the distinctions between relations at the type level and relations at the instance level are taken into account.

(Many?) linguistic accounts of relations do not take account of this distinction.

76

Why not?

Because linguistic accounts (like dictionaries) focus on relations between meanings, not on instances in reality

Because linguistic accounts focus on what is meaningfully combinable, rather than on what is logically inferrable

Because linguistic accounts focus on relations captured grammatically, not on relations observed experimentally and captured in scientific theories

77

The Relation Ontology

Barry Smith

78

Sophia Ananiadou

UK National Centre for Text Mining

79

Do linguistics and biology truly ever meet?