Towards Common Upper Ontology Barry Smith ontology.buffalo/smith September 25, 2009


Towards Common Upper Ontology

Barry Smith

http://ontology.buffalo.edu/smith

September 25, 2009

1

Overview

1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?

2


Uses of ‘ontology’ in PubMed abstracts

4

year    number of abstracts
2006    2260
2007    2968
2008    3236

5

By far the most successful: The Gene Ontology

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV

How to do biology across the genome?

7

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE

8

what cellular component?

what molecular function?

what biological process?

9

GO used to tag database entries

MouseEcotope, GlyProt, DiabetInGene, GluChem

sphingolipid transporter activity

10

GO used to tag database entries

MouseEcotope, GlyProt, DiabetInGene, GluChem

Holliday junction helicase complex

11


what cellular component?

what molecular function?

what biological process?

GO used in curation of literature

13

A new kind of scientific publishing

Biologist curators annotate experimental observations reported in the biomedical literature to link gene products (such as proteins) with GO terms

International Society of Biocurators http://www.biocurator.org/

14


Clark et al., 2005

part_of

converting journal articles into algorithmically processable artifacts

18

The logic of GO

OBO Format

http://oboedit.org/

OWL DL

http://www.co-ode.org/resources/papers/OBO2OWL.pdf

Common Logic http://www.berkeleybop.org/people/cjm/Mungall-bib.html#mungall_experiences_2009

19

$100 million invested in literature curation using GO

over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO

20

GO provides a controlled system of representations for use in annotating data and literature

• multi-species

• multi-disciplinary

• multi-granularity, from molecules to populations

21

Example of use of the GO

A study of 11 breast and 11 colorectal cancers found 13,023 genes

The GO tells you what is standard functioning for each of these genes

By searching for deviations from this standard in the sample, 189 genes were identified as being mutated at significant frequencies and thus as providing targets for diagnostic and therapeutic intervention.

Sjöblom T, et al. Science. 2006;314:268-74.

22

This kind of research only works if we have a common ontology

• Data is retrievable

• Data is comparable

• Data is integratable

only to the degree that it is annotated using a common controlled vocabulary (compare the role of seconds, meters, kilograms …)

23
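The point about retrievability and comparability can be sketched in a few lines of Python. This is only an illustration: the database names come from the earlier slides, while the gene product names and the term identifiers (`EX:0001`, `EX:0002`) are invented placeholders, not real GO identifiers.

```python
from collections import defaultdict

# Hypothetical annotations: (database, gene product, term ID).
# Gene product names and term IDs are placeholders, not real GO identifiers.
ANNOTATIONS = [
    ("MouseEcotope", "geneA", "EX:0001"),
    ("GlyProt", "protB", "EX:0001"),
    ("DiabetInGene", "geneC", "EX:0002"),
    ("GluChem", "protD", "EX:0001"),
]


def index_by_term(annotations):
    """Group (database, gene product) pairs under the shared term ID."""
    index = defaultdict(list)
    for database, product, term_id in annotations:
        index[term_id].append((database, product))
    return dict(index)


by_term = index_by_term(ANNOTATIONS)

# One lookup retrieves comparable entries from every database that used the term.
databases_for_ex1 = sorted(db for db, _ in by_term["EX:0001"])
```

If each database had used its own private label for the same function, the lookup would return one entry per label and the data would stay in separate silos.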

Overview

1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?

24

GO is amazingly successful in overcoming data silo problems

but it covers only

– cellular components

– molecular functions

– biological processes

25

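The Open Biomedical Ontologies (OBO) Foundry

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)

MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

26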

The OBO Foundry

– to extend the GO to enable intelligent integration of gigantic bodies of heterogeneous data across the entire domain of the life sciences, including clinical medicine

– to create an evolving, map-like, computable representation of the entire domain of biological and medical reality

Barry Smith, et al., “The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration”, Nature Biotechnology, 25 (11), 2007

27

Overview

1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?

28

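rationale of OBO Foundry coverage

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
  Occurrent: Cellular Process (GO)

MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)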

29

Basic Formal Ontology (BFO)

Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (Process, Event)

http://ontology.buffalo.edu/bfo/ 30

BFO

A simple top-level ontology to support information integration in scientific research

No abstracta

Nothing propositional

No overlap with domain ontologies (for society, for information, …) – built by populating downwards

31

Three Fundamental Dichotomies

Continuant vs. occurrent

Dependent vs. independent

Type vs. instance

32

Continuant

thing, quality …

Occurrent

process, event

33

depends_on

Continuant
  Independent Continuant: thing
  Dependent Continuant: quality
Occurrent: process, event

quality depends_on bearer

34

instance_of

types:
Continuant
  Independent Continuant: thing
  Dependent Continuant: quality
Occurrent: process, event

instances: .... ..... .......

35

3 kinds of (binary) relations

Between types

• human is_a mammal

• human heart part_of human

Between an instance and a type

• this human instance_of the type human

• this human allergic_to the type tamiflu

Between instances

• Mary’s heart part_of Mary

• Mary’s aorta connected_to Mary’s heart

36

depends_on

Continuant
  Independent Continuant: thing
  Dependent Continuant: quality
Occurrent: process

temperature depends_on bearer

37

depends_on

Continuant
  Independent Continuant: thing
  Dependent Continuant: quality, …
Occurrent: process, event

event depends_on participant

38


Clark et al., 2005

part_of

is_a

Definitions of relations

40

Barry Smith, et al., “Relations in Biomedical Ontologies”, Genome Biology 2005, 6 (5), R46.

Type-level relations presuppose the underlying instance-level relations

A is_a B =def. A and B are types and all instances of A are instances of B

A part_of B =def. All instances of A are instance-level-parts-of some instance of B

41

human testis part_of adult human being

but not

human being has_part human testis

and not even

male human being has_part human testis

42
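The definitions of is_a and part_of, and the testis example, can be checked against a toy set of instances. The sketch below is only an illustration under stated assumptions: the individuals (`john`, `mary`, `bob`, `johns_testis`) are invented, and `has_part` is written here as the mirror-image All-Some reading, which the slides imply but do not spell out.

```python
# Instance-level facts; all individuals are invented for illustration.
INSTANCE_OF = {
    "john": {"human being", "adult human being", "male human being"},
    "mary": {"human being", "adult human being"},
    "bob": {"human being", "male human being"},  # a male without a testis
    "johns_testis": {"human testis"},
}
# Instance-level parthood as (part, whole) pairs.
INSTANCE_PART_OF = {("johns_testis", "john")}


def instances(type_name):
    """All individuals that instantiate the given type."""
    return {x for x, types in INSTANCE_OF.items() if type_name in types}


def is_a(a, b):
    """A is_a B: all instances of A are instances of B."""
    return instances(a) <= instances(b)


def part_of(a, b):
    """A part_of B: every instance of A is part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in instances(b))
        for x in instances(a)
    )


def has_part(a, b):
    """A has_part B: every instance of A has some instance of B as part."""
    return all(
        any((y, x) in INSTANCE_PART_OF for y in instances(b))
        for x in instances(a)
    )
```

With these facts, `part_of("human testis", "adult human being")` holds, while `has_part("human being", "human testis")` and `has_part("male human being", "human testis")` both fail, because one counterexample instance is enough to defeat a universal claim.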

The assertions linking terms in ontologies must hold universally

Hence type-level relations such as part_of are provided with All-Some definitions

43

part_of for continuant types

A part_of B =def.

For all x, t if x instance_of A at t then there is some y, y instance_of B at t and x instance_level_part_of y at t

cell membrane part_of cell

44

part_of for occurrent types

A part_of B =def.

For all x, if x instance_of A then there is some y, y instance_of B and x instance_level_part_of y

EVERY A IS PART OF SOME B 45
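The time-indexed All-Some definition for continuant types can be written out as a small check. This is a sketch under assumptions: the individuals `m1` and `c1` and the two time points are invented, and the facts are passed in explicitly so the function can be tried on other data.

```python
# Time-indexed instance-level facts; individuals and times are invented.
# Maps (individual, time) to the set of types instantiated at that time.
INSTANCE_OF_AT = {
    ("m1", 0): {"cell membrane"},
    ("c1", 0): {"cell"},
    ("m1", 1): {"cell membrane"},
    ("c1", 1): {"cell"},
}
# Instance-level parthood as (part, whole, time) triples.
PART_OF_AT = {("m1", "c1", 0), ("m1", "c1", 1)}


def part_of_continuant(a, b, instance_of_at, part_of_at):
    """For all x, t: if x instance_of A at t, some y instance_of B at t has x as part at t."""
    for (x, t), types in instance_of_at.items():
        if a not in types:
            continue
        # Candidate wholes: everything that instantiates B at the same time t.
        wholes = [
            y for (y, t2), ts in instance_of_at.items() if t2 == t and b in ts
        ]
        if not any((x, y, t) in part_of_at for y in wholes):
            return False
    return True


membrane_part_of_cell = part_of_continuant(
    "cell membrane", "cell", INSTANCE_OF_AT, PART_OF_AT
)
```

The occurrent version is the same check with the time index dropped, since a process instance is not re-evaluated at each time.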

Instances vs. types

Instance-level relations and type-level relations have logically distinct properties

What is symmetric on the level of instances need not be symmetric on the level of types

46

seminal vesicle adjacent_to urinary bladder

Not: urinary bladder adjacent_to seminal vesicle

nucleus adjacent_to cytoplasm

Not: cytoplasm adjacent_to nucleus

47
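Why a symmetric instance-level relation can fail to be symmetric at the type level is easy to see with a toy data set. The sketch below assumes the All-Some reading used for part_of; the individuals are invented, and the second bladder stands for one plausible counterexample (a bladder in an organism that has no seminal vesicle).

```python
# Invented individuals; each maps to its type.
INSTANCE_OF = {
    "sv_1": "seminal vesicle",
    "bladder_1": "urinary bladder",  # bladder next to a seminal vesicle
    "bladder_2": "urinary bladder",  # bladder in an organism with no seminal vesicle
}
# Instance-level adjacency is symmetric, so it is stored as unordered pairs.
ADJACENT = {frozenset({"sv_1", "bladder_1"})}


def adjacent(x, y):
    """Instance-level adjacency; symmetric by construction."""
    return frozenset({x, y}) in ADJACENT


def type_adjacent_to(a, b):
    """A adjacent_to B: every instance of A is adjacent to some instance of B."""
    a_instances = [x for x, t in INSTANCE_OF.items() if t == a]
    b_instances = [y for y, t in INSTANCE_OF.items() if t == b]
    return all(any(adjacent(x, y) for y in b_instances) for x in a_instances)


forward = type_adjacent_to("seminal vesicle", "urinary bladder")
backward = type_adjacent_to("urinary bladder", "seminal vesicle")
```

Here `forward` holds and `backward` does not, even though `adjacent` itself gives the same answer in both directions for any pair of individuals.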

Overview

1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?

48

Blinding Flash of the Obvious

Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (Process, Event)

How to create an ontology from the top down

49

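The Open Biomedical Ontologies (OBO) Foundry

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)

MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

50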

Example: The Cell Ontology

Benefits of coordination

No need to reinvent the wheel

Can profit from lessons learned through mistakes made by others

Can more easily reuse what is made by others

Can more easily inspect and criticize results of others’ work (PATO)

Leads to innovations (e.g. MIREOT) in strategies for combining ontologies

52

Users of BFO

PharmaOntology (W3C HCLS SIG)

MediCognos / Microsoft Healthvault

Cleveland Clinic Semantic Database in Cardiothoracic Surgery

Major Histocompatibility Complex (MHC) Ontology (NIAID)

Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies

53

Users of BFO

Interdisciplinary Prostate Ontology (IPO)

Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research

Neural Electromagnetic Ontologies (NEMO)

ChemAxiom – Ontology for Chemistry

Ontology for Risks Against Patient Safety (RAPS/REMINE) (EU FP7)

IDO Infectious Disease Ontology (NIAID)

54

Users of BFO

National Cancer Institute Biomedical Grid Terminology (BiomedGT)

US Army Universal Core Semantic Layer (UCore SL)

US Army Biometrics Ontology

US Army Command and Control Ontology

Ontology for General Medical Science (OGMS)

55

Infectious Disease Ontology Consortium

• MITRE, Mount Sinai, UTSouthwestern – Influenza

• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)

• Colorado State University – Dengue Fever

• Duke University – Tuberculosis, Staph. aureus, HIV

• Case Western Reserve – Infective Endocarditis

• University of Michigan – Brucellosis

56

The OBO Foundry

Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Entities of Biological Interest
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology

57

The OBO Foundry

Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology

58


Blinding Flash of the Obvious

Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (Process, Event)

How to create an ontology from the top down

61

Continuant
  Independent Continuant
  Dependent Continuant
    Non-realizable Dependent Continuant (quality)
    Realizable Dependent Continuant (function, role, disposition)

62

Realizable dependent continuants

plan

function

role

disposition

capability

tendency

continuants

63

Their realizations

execution

expression

exercise

realization

application

course

occurrents

64
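The two lists above appear to line up item by item, each realizable dependent continuant with the kind of occurrent that realizes it. A minimal Python sketch of that pairing follows; matching the lists by position is an assumption about the slide layout, not something the transcript states.

```python
# Realizable dependent continuants and their realizations, paired by
# position in the two slide lists (the pairing itself is an assumption).
REALIZABLES = ["plan", "function", "role", "disposition", "capability", "tendency"]
REALIZATIONS = ["execution", "expression", "exercise", "realization", "application", "course"]

REALIZATION_OF = dict(zip(REALIZABLES, REALIZATIONS))


def category(term):
    """Return the top-level BFO side that a term from either list falls on."""
    if term in REALIZATION_OF:
        return "continuant"
    if term in REALIZATION_OF.values():
        return "occurrent"
    raise KeyError(term)
```

For example, `REALIZATION_OF["plan"]` is `"execution"`: the plan is a continuant that persists through time, and its execution is the process in which it is realized.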


realization depends_on realizable

Continuant
  Independent Continuant: bearer
  Dependent Continuant: disposition
Occurrent: Process of realization

66

Specific Dependence

on the instance level

a depends_on b =def. a is necessarily such that if b ceases to exist then a ceases to exist

on the type level

A specifically_depends_on B =def. for every instance a of A, there is some instance b of B such that a depends_on b.

67
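The two levels of specific dependence can be modelled with a toy world in which destroying a bearer also destroys whatever depends on it. This is a sketch, not a formalization: the `World` class and the individuals `cheese_1` and `whiteness_1` are invented, and the modal "necessarily" of the definition is simply built into how `destroy` behaves.

```python
class World:
    """Toy model: entities exist until destroyed; dependents cease with their bearer."""

    def __init__(self):
        self.types = {}        # entity -> type name
        self.depends_on = {}   # dependent entity -> the entity it depends on
        self.alive = set()

    def add(self, entity, type_name, depends_on=None):
        self.types[entity] = type_name
        self.alive.add(entity)
        if depends_on is not None:
            self.depends_on[entity] = depends_on

    def destroy(self, entity):
        """Instance level: if b ceases to exist, everything depending on b ceases too."""
        self.alive.discard(entity)
        for dependent, bearer in list(self.depends_on.items()):
            if bearer == entity and dependent in self.alive:
                self.destroy(dependent)

    def specifically_depends_on(self, a_type, b_type):
        """Type level: every instance of A depends_on some instance of B."""
        return all(
            self.types.get(self.depends_on.get(a)) == b_type
            for a, t in self.types.items()
            if t == a_type
        )


world = World()
world.add("cheese_1", "cheese")
world.add("whiteness_1", "whiteness", depends_on="cheese_1")
type_level_holds = world.specifically_depends_on("whiteness", "cheese")
world.destroy("cheese_1")
whiteness_survives = "whiteness_1" in world.alive
```

After the cheese is destroyed, `whiteness_survives` is false, which is the instance-level definition in action.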

depends_on

Continuant
  Independent Continuant: thing
  Dependent Continuant: quality
Occurrent: process, event

temperature depends_on bearer

68

Specifically dependent continuants

• the quality of whiteness of this cheese

• your role as lecturer

• the disposition of this patient to experience diarrhea

69

the particular case of redness (of a particular fly eye) instantiates the universal red

an instance of an eye (in a particular fly) instantiates the universal eye

the particular case of redness depends on the instance of an eye

70

the particular case of redness (of a particular fly eye) instantiates red

an instance of an eye (in a particular fly) instantiates eye

the particular case of redness depends on the instance of an eye

red is_a color

eye is_a anatomical structure

71


Specifically Dependent Continuants

Specifically Dependent Continuant
  Quality, Pattern
  Realizable Dependent Continuant

if the bearer ceases to exist, then its quality, function, role ceases to exist

the color of my skin

the function of my heart to pump blood

my weight

73

Generically Dependent Continuants

Generically Dependent Continuant
  Information Object
  Gene Sequence

if one bearer ceases to exist, then the entity can survive, because there are other bearers

(copyability)

the pdf file on my laptop

the DNA (sequence) in this chromosome 74
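Copyability can be sketched as an object that exists as long as at least one bearer carries a copy. The class below is a toy illustration only; the file name `slides.pdf` and the second bearer (`backup server`) are invented for the example.

```python
class InformationObject:
    """Toy generically dependent continuant: exists while some bearer carries a copy."""

    def __init__(self, name):
        self.name = name
        self.bearers = set()

    def copy_to(self, bearer):
        self.bearers.add(bearer)

    def bearer_destroyed(self, bearer):
        self.bearers.discard(bearer)

    @property
    def exists(self):
        return bool(self.bearers)


pdf = InformationObject("slides.pdf")  # hypothetical file name
pdf.copy_to("my laptop")
pdf.copy_to("backup server")           # hypothetical second bearer

pdf.bearer_destroyed("my laptop")
survives_first_loss = pdf.exists       # another bearer remains

pdf.bearer_destroyed("backup server")
survives_second_loss = pdf.exists      # no bearer left
```

The contrast with a specifically dependent continuant is that no single bearer is required: only the loss of the last copy ends the entity.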

Overview

1. The Rise of Applied Ontology
2. The OBO Foundry
3. Basic Formal Ontology
4. How to Build an Ontology
5. What is a Disease?

75

What is a Disease?

a state in which a function or part of the body is no longer in a healthy condition.

an illness

a process that is a hazard to health and/or longevity.

a pathological condition that is cross-culturally defined and recognized

76

Four distinct classificatory tasks

1. of people (patients, carriers, …)

2. of diseases (cases, instances, problems, …)

3. of courses of disease (symptoms, …)

4. of representations (data, diagnoses…)

77

Four distinct BFO categories

1. person (patient, carrier, …) – independent continuant

2. disease (case, instance, problem, …) – specifically dependent continuant

3. course of disease (symptom, treatment, …) – occurrent

4. representation (record, datum, diagnosis, …) – generically dependent continuant

78
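The mapping from the four classificatory tasks to BFO categories can be written down as a lookup table. This is a minimal sketch: the enum and the `classify` helper are invented names, and the table contains only the terms listed on the slide.

```python
from enum import Enum


class BFOCategory(Enum):
    INDEPENDENT_CONTINUANT = "independent continuant"
    SPECIFICALLY_DEPENDENT_CONTINUANT = "specifically dependent continuant"
    OCCURRENT = "occurrent"
    GENERICALLY_DEPENDENT_CONTINUANT = "generically dependent continuant"


# Terms taken from the slide; each maps to the category given there.
CATEGORY_OF = {
    "person": BFOCategory.INDEPENDENT_CONTINUANT,
    "patient": BFOCategory.INDEPENDENT_CONTINUANT,
    "carrier": BFOCategory.INDEPENDENT_CONTINUANT,
    "disease": BFOCategory.SPECIFICALLY_DEPENDENT_CONTINUANT,
    "course of disease": BFOCategory.OCCURRENT,
    "symptom": BFOCategory.OCCURRENT,
    "treatment": BFOCategory.OCCURRENT,
    "representation": BFOCategory.GENERICALLY_DEPENDENT_CONTINUANT,
    "record": BFOCategory.GENERICALLY_DEPENDENT_CONTINUANT,
    "datum": BFOCategory.GENERICALLY_DEPENDENT_CONTINUANT,
    "diagnosis": BFOCategory.GENERICALLY_DEPENDENT_CONTINUANT,
}


def classify(term):
    """Look up the BFO category of a term, ignoring case."""
    return CATEGORY_OF[term.lower()]
```

Keeping the four categories apart in this way is what prevents, say, a diagnosis (a representation) from being confused with the disease it is about.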

Disposition

Internally-Grounded Realizable Entity

A disposition is a realizable entity such that

(1) if it ceases to exist, then its bearer is physically changed, and

(2) its realization occurs, in virtue of the bearer’s physical make-up, when this bearer is in some special physical circumstances

79

Disorder

A part of an (extended) organism which serves as the bearer of a disposition of a certain sort

80

Big Picture

81

A disease is a disposition rooted in a physical disorder in the organism and realized in pathological processes.

etiological process produces disorder

disorder bears disposition

disposition realized_in pathological process

pathological process produces abnormal bodily features

abnormal bodily features recognized_as signs & symptoms

signs & symptoms used_in interpretive process

interpretive process produces diagnosis

82

Elucidation of Primitive Terms

‘bodily feature’ - an abbreviation for a physical component, a bodily quality, or a bodily process.

disposition - an attribute describing the propensity to initiate certain specific sorts of processes when certain conditions are satisfied.

clinically abnormal - some bodily feature that (1) is not part of the life plan for an organism of the relevant type (unlike aging or pregnancy), (2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and (3) is such that the elevated risk exceeds a certain threshold level.*

*Compare: baldness

83

Definitions - Foundational Terms

Disorder =def. – A physical component that is clinically abnormal.

Pathological Process =def. – A bodily process that is a realization of a disorder and is clinically abnormal.

Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.

84
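The three foundational definitions can be expressed as a small data model in which the constraints of the definitions are enforced when objects are built. This is a sketch only: the class and field names are invented, clinical abnormality is taken for granted rather than tested, and the cirrhosis values are borrowed from the worked example later in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disorder:
    """A physical component that is clinically abnormal."""
    physical_component: str


@dataclass
class PathologicalProcess:
    """A clinically abnormal bodily process that realizes a disorder."""
    description: str
    disorder: Disorder


@dataclass
class Disease:
    """A disposition to undergo pathological processes, grounded in disorders."""
    name: str
    disorders: List[Disorder]
    realized_in: List[PathologicalProcess] = field(default_factory=list)

    def __post_init__(self):
        # Clause (ii): the disposition exists because of one or more disorders.
        if not self.disorders:
            raise ValueError("a disease requires at least one disorder")

    def add_realization(self, process):
        # Clause (i): only processes realizing one of this disease's disorders count.
        if process.disorder not in self.disorders:
            raise ValueError("process does not realize a disorder of this disease")
        self.realized_in.append(process)


necrotic_liver = Disorder("necrotic liver")
cirrhosis = Disease("cirrhosis", [necrotic_liver])
cirrhosis.add_realization(
    PathologicalProcess("abnormal tissue repair with fibrosis", necrotic_liver)
)
```

Note that the disease object exists before any realization is added, which mirrors the claim that a disease is a disposition and not a process.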

Dispositions and Predispositions

All diseases are dispositions; not all dispositions are diseases.

A predisposition is a disposition.

Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X.

85

Cirrhosis - environmental exposure

Etiological process - phenobarbital-induced hepatic cell death
  produces
Disorder - necrotic liver
  bears
Disposition (disease) - cirrhosis
  realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
  produces
Abnormal bodily features
  recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out cirrhosis
  suggests
Laboratory tests
  produces
Test results - elevated liver enzymes in serum
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

86
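The chain of relations in this example can be stored as subject-relation-object triples and walked from cause to diagnosis. The sketch below is illustrative: the triple list and the `follow` helper are invented, and the label "second interpretive process" is added only to keep the two interpretive steps apart.

```python
# The diagnostic chain as (subject, relation, object) triples.
# "second interpretive process" is a label added here to distinguish the two steps.
DIAGNOSTIC_CHAIN = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition (disease)"),
    ("disposition (disease)", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "symptoms & signs"),
    ("symptoms & signs", "used_in", "interpretive process"),
    ("interpretive process", "produces", "hypothesis"),
    ("hypothesis", "suggests", "laboratory tests"),
    ("laboratory tests", "produces", "test results"),
    ("test results", "used_in", "second interpretive process"),
    ("second interpretive process", "produces", "result (diagnosis)"),
]


def follow(chain, start):
    """Walk the chain from a starting node and return the nodes visited in order."""
    next_step = {subject: obj for subject, _, obj in chain}
    path = [start]
    seen = {start}
    while path[-1] in next_step:
        obj = next_step[path[-1]]
        if obj in seen:  # guard against accidental cycles
            break
        path.append(obj)
        seen.add(obj)
    return path


path_to_diagnosis = follow(DIAGNOSTIC_CHAIN, "etiological process")
```

The same triples fit the influenza and Huntington’s disease examples; only the fillers attached to each node change.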

Influenza - infectious

Etiological process - infection of airway epithelial cells with influenza virus
  produces
Disorder - viable cells with influenza virus
  bears
Disposition (disease) - flu
  realized_in
Pathological process - acute inflammation
  produces
Abnormal bodily features
  recognized_as
Symptoms - weakness, dizziness
Signs - fever

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out influenza
  suggests
Laboratory tests
  produces
Test results - elevated serum antibody titers
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease flu

But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).

87

Huntington’s Disease - genetic

Etiological process - inheritance of >39 CAG repeats in the HTT gene
  produces
Disorder - chromosome 4 with abnormal mHTT
  bears
Disposition (disease) - Huntington’s disease
  realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
  produces
Abnormal bodily features
  recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out Huntington’s
  suggests
Laboratory tests
  produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

88

Benefits of coordination

No need to reinvent the wheel

Can profit from lessons learned through mistakes made by others

Can more easily reuse data collected by others

Can more easily resolve the silo problems created by multiple independent discipline-specific ontologies

89