Creating the CTSA Ontology Landscape: A Modular Strategy

Creating the CTSA Ontology Landscape: A Modular Strategy Barry Smith


Page 1: Creating the CTSA Ontology Landscape: A  Modular Strategy

Creating the CTSA Ontology Landscape: A Modular Strategy

Barry Smith

Page 2: Creating the CTSA Ontology Landscape: A  Modular Strategy

For modularity to work, developers must accept some basic principles

– for formulating definitions
– of modularity
– of user feedback for error correction and gap identification
– for ensuring compatibility between modules
– for using ontologies to annotate legacy data
– for using ontologies to create new data
– for developing user-specific views

Page 3: Creating the CTSA Ontology Landscape: A  Modular Strategy

The Modular Approach

• Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused
• Create ontologies incrementally
• Reuse existing ontology resources
• Use these ontologies incrementally in annotating heterogeneous data
• Annotating = arm's-length approach; the data and data models themselves remain as they are
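The arm's-length idea can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the record layout, the `annotate` helper, and the `EX:` term identifiers are placeholders, not identifiers from any real ontology.

```python
from dataclasses import dataclass

# Hypothetical legacy records; their structure is left exactly as it is.
legacy_records = [
    {"id": "r1", "dx_text": "flu"},
    {"id": "r2", "dx_text": "cirrhosis of liver"},
]


@dataclass(frozen=True)
class Annotation:
    record_id: str  # points at the legacy record
    field: str      # which field of the record is annotated
    term_id: str    # ontology term identifier (placeholder values below)


# Placeholder lookup from local strings to term IDs (not real ontology IDs).
TERM_LOOKUP = {"flu": "EX:0000001", "cirrhosis of liver": "EX:0000002"}


def annotate(records, field, lookup):
    """Return annotations kept separately from the records themselves."""
    return [
        Annotation(r["id"], field, lookup[r[field]])
        for r in records
        if r[field] in lookup
    ]


annotations = annotate(legacy_records, "dx_text", TERM_LOOKUP)
```

The annotations live in their own list, so the legacy data and its data model are never rewritten.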

Page 4: Creating the CTSA Ontology Landscape: A  Modular Strategy

Benefits of Modularity

• Brings a clean division of labor amongst domain experts, who can manage governance aspects pertaining to their own domains
• Automatic consistency of the results of the distributed efforts – no room for contradiction
• Additivity of annotations even when multiple independently developed ontologies are used
• Lessons learned in developing and using one module can be used by the developers and users of later modules

Page 5: Creating the CTSA Ontology Landscape: A  Modular Strategy

Benefits of Modularity

• Increased likelihood of reuse, since potential users will be aware that they are investing in the results of an authoritative coordinated approach of proven reliability

• Increased value and portability of training in any given module

• Incentivization of those responsible for individual modules


Page 6: Creating the CTSA Ontology Landscape: A  Modular Strategy

Benefits of Modularity

• All of those involved can more easily inspect and criticize the results of others’ work
• Creates a collaborative environment for ontology development
• Serves as a platform for innovations which can be easily propagated throughout the whole system
• Developing and using ontologies in a consistent fashion brings a number of network effects – the value of existing annotations increases as new annotations are added

Page 7: Creating the CTSA Ontology Landscape: A  Modular Strategy

You will need to embrace some strategy along these lines if you want to get funding for translational research

NIH Mandates for Sharing of Research Data

Investigators submitting an NIH application seeking $500,000 or more in any single year are expected to include a plan for data sharing

(http://grants.nih.gov/grants/policy/data_sharing)

Page 8: Creating the CTSA Ontology Landscape: A  Modular Strategy

Logical standards can be only part of the solution

OWL … bring benefits primarily on the side of syntax (language)

What we need are standards on the semantics (content) side (via top-level ontologies), including standards for
• top-level ontologies
• common relations (part_of …)
• relation of lower-level ontologies to each other and to the higher levels

Page 9: Creating the CTSA Ontology Landscape: A  Modular Strategy

BFO, DOLCE, SUMO

All exist in FOL and OWL versions

All have been tested in use

BFO: very small, truly domain-neutral

DOLCE: largely extends BFO, but built to support ‘linguistic and cognitive engineering’

SUMO: has its own tiny mathematics, tiny physics, tiny biology (‘body-covering’, ‘fruit-Or-vegetable’), …


Page 10: Creating the CTSA Ontology Landscape: A  Modular Strategy

120+ ontology projects using BFO

http://www.ifomis.org/bfo/

• Open Biomedical Ontologies Foundry
• Ontology for General Medical Science
• eagle-i, VIVO, CTSAconnect
• AstraZeneca
• Elsevier

Page 11: Creating the CTSA Ontology Landscape: A  Modular Strategy

How a common upper level ontology can help resist ontology chaos

• something to teach
• training (expertise) is portable
• each new ontology you confront will be more easily understood at the level of content
  – and more easily criticized, error-checked
• provides starting-point for domain-ontology development
• provides platform for tool-building and innovations
• lessons learned in building and using one ontology can potentially benefit other ontologies
• promotes shareability of data across disciplinary and other boundaries

Page 12: Creating the CTSA Ontology Landscape: A  Modular Strategy

OBO Foundry Modular Organization

top level
– Basic Formal Ontology (BFO)

mid-level
– Information Artifact Ontology (IAO)
– Ontology for Biomedical Investigations (OBI)
– Ontology of General Medical Science (OGMS)

domain level
– Anatomy Ontology (FMA*, CARO)
– Environment Ontology (EnvO)
– Infectious Disease Ontology (IDO*)
– Biological Process Ontology (GO*)
– Cell Ontology (CL)
– Cellular Component Ontology (FMA*, GO*)
– Phenotypic Quality Ontology (PaTO)
– Subcellular Anatomy Ontology (SAO)
– Sequence Ontology (SO*)
– Molecular Function (GO*)
– Protein Ontology (PRO*)

Page 13: Creating the CTSA Ontology Landscape: A  Modular Strategy

BFO

A simple top-level ontology to support information integration in scientific research
• No overlap with domain ontologies (organism, person, society, information, …)
• Based on realism
• No abstracta
• Tested in many natural science domains

Page 14: Creating the CTSA Ontology Landscape: A  Modular Strategy

Basic Formal Ontology

Continuant
– Independent Continuant: entity
– Dependent Continuant: property
Occurrent: process, event

property depends on bearer

Page 15: Creating the CTSA Ontology Landscape: A  Modular Strategy

depends_on

Continuant
– Independent Continuant: thing
– Dependent Continuant: property
Occurrent: process, event

event depends on participant

Page 16: Creating the CTSA Ontology Landscape: A  Modular Strategy

Basic Formal Ontology

continuant
– independent continuant: cellular component
– dependent continuant: molecular function
occurrent: biological processes

Page 17: Creating the CTSA Ontology Landscape: A  Modular Strategy

roles, qualities

Continuant
– Independent Continuant
– Dependent Continuant: Quality, Disposition
Occurrent: process, event

Page 18: Creating the CTSA Ontology Landscape: A  Modular Strategy

instance_of

types
Continuant
– Independent Continuant: thing
– Dependent Continuant: property
Occurrent: process, event

instances
.... ..... .......
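The distinction between types and instances can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IS_A` table repeats the top-level types named on the slides, while the three instances and both helper functions are invented for the example.

```python
# is_a links between the top-level types named on the slides.
IS_A = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "quality": "dependent continuant",
    "disposition": "dependent continuant",
    "process": "occurrent",
}

# instance_of links a particular to one type; these instances are invented.
INSTANCE_OF = {
    "this patient": "independent continuant",
    "this patient's body temperature": "quality",
    "this patient's course of influenza": "process",
}


def supertypes(type_name):
    """Walk is_a upward and return every ancestor type, nearest first."""
    found = []
    while type_name in IS_A:
        type_name = IS_A[type_name]
        found.append(type_name)
    return found


def is_instance_of(particular, type_name):
    """True if the particular instantiates type_name directly or via is_a."""
    direct = INSTANCE_OF[particular]
    return type_name == direct or type_name in supertypes(direct)
```

For example, `is_instance_of("this patient's body temperature", "continuant")` holds because quality is a dependent continuant, which in turn is a continuant.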

Page 19: Creating the CTSA Ontology Landscape: A  Modular Strategy

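rationale of OBO Foundry coverage

Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT).

ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
– Occurrent: Cellular Process (GO)

MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)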
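One way to read the coverage grid is as a lookup keyed by granularity and relation to time. The Python sketch below is an illustration only: the dictionary layout and the two helper names are assumptions, and the cell contents follow the slide as transcribed.

```python
# (granularity, BFO category) -> ontologies named in that cell of the grid.
COVERAGE = {
    ("organ and organism", "independent continuant"): ["NCBI Taxonomy", "FMA", "CARO"],
    ("organ and organism", "dependent continuant"): ["FMP", "CPRO", "PaTO"],
    ("organ and organism", "occurrent"): ["GO"],
    ("cell and cellular component", "independent continuant"): ["CL", "FMA", "GO"],
    ("cell and cellular component", "dependent continuant"): ["GO"],
    ("cell and cellular component", "occurrent"): ["GO"],
    ("molecule", "independent continuant"): ["ChEBI", "SO", "RNAO", "PRO"],
    ("molecule", "dependent continuant"): ["GO"],
    ("molecule", "occurrent"): ["GO"],
}


def ontologies_for(granularity, category):
    """Ontologies listed for one cell of the grid (empty list if none)."""
    return COVERAGE.get((granularity, category), [])


def cells_covered_by(ontology):
    """All grid cells in which the given ontology appears."""
    return sorted(cell for cell, names in COVERAGE.items() if ontology in names)
```

Calling `cells_covered_by("FMA")` returns the two independent-continuant cells at the cell and organ levels, which shows how one ontology can span several levels of granularity.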

Page 20: Creating the CTSA Ontology Landscape: A  Modular Strategy

Example: The Cell Ontology

Page 21: Creating the CTSA Ontology Landscape: A  Modular Strategy

Four distinct classificatory tasks

1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, treatments…)
4. of representations (records, observations, data, diagnoses…)

ICD confuses 1. & 2.
Most standard terminologies confuse 2. and 4.

Page 22: Creating the CTSA Ontology Landscape: A  Modular Strategy

Ontology for General Medical Science (OGMS)

1. person (patient, carrier, …) – independent continuant

2. disease (case, instance, problem, …) – specifically dependent continuant

3. course of disease (symptom, treatment…)– occurrent

4. representation (record, datum, diagnosis…)– generically dependent continuant

http://code.google.com/p/ogms/

Page 23: Creating the CTSA Ontology Landscape: A  Modular Strategy

Four distinct BFO categories

1. people (patients, carriers, …) – independent continuants

2. disease (case, instance, problem, condition …) – disposition

3. course of disease (symptom, episode, outbreak …)– realization of dispositions

4. representations (records, data, diagnoses…)– generically dependent continuants


Page 24: Creating the CTSA Ontology Landscape: A  Modular Strategy

Big Picture (Ontology for General Medical Science)


Page 25: Creating the CTSA Ontology Landscape: A  Modular Strategy

Elucidation of Primitive Terms

‘extended organism’ = the organism and all the material entities located within it

‘bodily feature’ = either a physical part of the extended organism, a bodily quality, or a bodily process.


Page 26: Creating the CTSA Ontology Landscape: A  Modular Strategy

Elucidation of Primitive Terms

clinically abnormal - some bodily feature that
(1) is not part of the life plan for an organism of the relevant type (unlike loss of milk teeth, aging or pregnancy),
(2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
(3) is such that the elevated risk exceeds a certain threshold level.*

*Compare: baldness

Page 27: Creating the CTSA Ontology Landscape: A  Modular Strategy

Disorder

A material entity (fiat object part) which is clinically abnormal and part of an extended organism

Compare: Downtown Santa Barbara, Mount Everest

Page 28: Creating the CTSA Ontology Landscape: A  Modular Strategy

Definitions - Foundational Terms

Pathological Process =def. – A bodily process that is clinically abnormal.

Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.


Page 29: Creating the CTSA Ontology Landscape: A  Modular Strategy

Big Picture (Ontology for General Medical Science)


Page 30: Creating the CTSA Ontology Landscape: A  Modular Strategy

http://code.google.com/p/ogms/

Page 31: Creating the CTSA Ontology Landscape: A  Modular Strategy

Disease Course =def. – The sum of processes through which a given disease instance is realized.

Page 32: Creating the CTSA Ontology Landscape: A  Modular Strategy

A disease is a disposition

etiological process
  produces
disorder
  bears
disposition
  realized_in
pathological process
  produces
abnormal bodily features
  recognized_as
signs & symptoms
  used_in
interpretive process
  produces
diagnosis
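The chain on this slide can be written down as subject, relation, object triples and followed from any starting point. The sketch below is a minimal Python illustration; the `TRIPLES` list restates the slide, while the `walk` helper is an assumption added for the example.

```python
# The slide's chain as (subject, relation, object) triples.
TRIPLES = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def walk(start, triples):
    """Follow the single outgoing relation from each node until none is left."""
    next_step = {subject: obj for subject, _relation, obj in triples}
    path = [start]
    seen = {start}
    while path[-1] in next_step:
        obj = next_step[path[-1]]
        if obj in seen:  # guard against accidental cycles
            break
        path.append(obj)
        seen.add(obj)
    return path
```

Starting at `"disorder"`, the walk passes through the disposition and the pathological process before ending at `"diagnosis"`, which mirrors the claim that a disease is a disposition borne by a disorder.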

Page 33: Creating the CTSA Ontology Landscape: A  Modular Strategy

Cirrhosis - environmental exposure

Etiological process - phenobarbital-induced hepatic cell death
  produces
Disorder - necrotic liver
  bears
Disposition (disease) - cirrhosis
  realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
  produces
Abnormal bodily features
  recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out cirrhosis
  suggests
Laboratory tests
  produces
Test results - elevated liver enzymes in serum
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

Page 34: Creating the CTSA Ontology Landscape: A  Modular Strategy

Influenza - infectious

Etiological process - infection of airway epithelial cells with influenza virus
  produces
Disorder - viable cells with influenza virus
  bears
Disposition (disease) - flu
  realized_in
Pathological process - acute inflammation
  produces
Abnormal bodily features
  recognized_as
Symptoms - weakness, dizziness
Signs - fever

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out influenza
  suggests
Laboratory tests
  produces
Test results - elevated serum antibody titers
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease flu

Page 35: Creating the CTSA Ontology Landscape: A  Modular Strategy

Huntington’s Disease - genetic

Etiological process - inheritance of >39 CAG repeats in the HTT gene
  produces
Disorder - chromosome 4 with abnormal mHTT
  bears
Disposition (disease) - Huntington’s disease
  realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
  produces
Abnormal bodily features
  recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out Huntington’s
  suggests
Laboratory tests
  produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

Page 36: Creating the CTSA Ontology Landscape: A  Modular Strategy

Dispositions and Predispositions

Some dispositions are predispositions to other dispositions.


Page 37: Creating the CTSA Ontology Landscape: A  Modular Strategy

HNPCC - genetic pre-disposition

Etiological process - inheritance of a mutant mismatch repair gene
  produces
Disorder - chromosome 3 with abnormal hMLH1
  bears
Disposition (disease) - Lynch syndrome
  realized_in
Pathological process - abnormal repair of DNA mismatches
  produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
  bears
Disposition (disease) - non-polyposis colon cancer
  realized_in
Symptoms (including pain)

Page 38: Creating the CTSA Ontology Landscape: A  Modular Strategy

Arterial Aneurysm

Disposition – atherosclerosis
  realized_in
Pathological process – fatty material collects within the walls of arteries
  produces
Disorder – artery with weakened wall
  bears
Disposition – of artery to become distended
  realized_in
Pathological process – process of distending
  produces
Disorder – arterial aneurysm
  bears
Disposition – of artery to rupture
  realized_in
Pathological process – (catastrophic event) of rupturing
  produces
Disorder – ruptured artery, arterial system with dangerously low blood pressure
  bears
Disposition – circulatory failure
  realized_in
Pathological process – exsanguination, failure of homeostasis
  produces
Death
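The arterial aneurysm chain above repeats one pattern: a disposition is realized in a pathological process, which produces a disorder, which bears a further disposition. A minimal Python sketch of a checker for that pattern follows. The `ALLOWED` table and the `follows_pattern` helper are assumptions made for illustration; they are not part of OGMS itself.

```python
# Which kind of entity may follow a given (kind, relation) pair.
ALLOWED = {
    ("Disposition", "realized_in"): {"Pathological process"},
    ("Pathological process", "produces"): {"Disorder", "Death"},
    ("Disorder", "bears"): {"Disposition"},
}

# The arterial aneurysm chain as (kind, relation to the next step) pairs.
ANEURYSM = [
    ("Disposition", "realized_in"),        # atherosclerosis
    ("Pathological process", "produces"),  # fatty material collects
    ("Disorder", "bears"),                 # artery with weakened wall
    ("Disposition", "realized_in"),        # of artery to become distended
    ("Pathological process", "produces"),  # process of distending
    ("Disorder", "bears"),                 # arterial aneurysm
    ("Disposition", "realized_in"),        # of artery to rupture
    ("Pathological process", "produces"),  # rupturing
    ("Disorder", "bears"),                 # ruptured artery
    ("Disposition", "realized_in"),        # circulatory failure
    ("Pathological process", "produces"),  # exsanguination
    ("Death", None),                       # end of the chain
]


def follows_pattern(chain):
    """True if every step is followed by a kind that ALLOWED permits."""
    for (kind, relation), (next_kind, _next_relation) in zip(chain, chain[1:]):
        if next_kind not in ALLOWED.get((kind, relation), set()):
            return False
    return True
```

A chain that skips a step, such as a disorder that directly "produces" death, is rejected, which makes the predisposition structure explicit.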

Page 39: Creating the CTSA Ontology Landscape: A  Modular Strategy

Systemic arterial hypertension

Etiological process – abnormal reabsorption of NaCl by the kidney
  produces
Disorder – abnormally large scattered molecular aggregate of salt in the blood
  bears
Disposition (disease) - hypertension
  realized_in
Pathological process – exertion of abnormal pressure against arterial wall
  produces
Abnormal bodily features
  recognized_as
Symptoms -
Signs – elevated blood pressure

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out hypertension
  suggests
Laboratory tests
  produces
Test results -
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease hypertension

Page 40: Creating the CTSA Ontology Landscape: A  Modular Strategy

Type 2 Diabetes Mellitus

Etiological process –
  produces
Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
  bears
Disposition (disease) – diabetes mellitus
  realized_in
Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
  produces
Abnormal bodily features
  recognized_as
Symptoms – polydipsia, polyuria, polyphagia, blurred vision
Signs – elevated blood glucose and hemoglobin A1c

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out diabetes mellitus
  suggests
Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c
  produces
Test results -
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus

Page 41: Creating the CTSA Ontology Landscape: A  Modular Strategy

Type 1 hypersensitivity to penicillin

Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
  produces
Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
  bears
Disposition (disease) – type I hypersensitivity
  realized_in
Pathological process – type I hypersensitivity reaction
  produces
Abnormal bodily features
  recognized_as
Symptoms – pruritus, shortness of breath
Signs – rash, urticaria, anaphylaxis

Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis -
  suggests
Laboratory tests –
  produces
Test results – occasionally, skin testing
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin

Page 42: Creating the CTSA Ontology Landscape: A  Modular Strategy

Early Onset Alzheimer’s Disease

Disorder – mutations in APP, PSEN1 and PSEN2
  bears
Disposition – impaired APP processing
  realized_in
Pathological process – accumulation of intra- and extracellular protein in the brain
  produces
Disorder – amyloid plaque and neurofibrillary tangles
  bears
Disposition – of neurons to die
  realized_in
Pathological process – neuronal loss
  produces
Disorder – cognitive brain regions damaged and reduced in size
  bears
Disposition (disease) – Alzheimer’s dementia
  realized_in
Symptoms – episodic memory loss and other cognitive domain impairment


Page 44: Creating the CTSA Ontology Landscape: A  Modular Strategy

Hemorrhagic stroke

• Disorder – cerebral arterial aneurysm
  – bears
• Disposition – of weakened artery to rupture
  – realized_in
• Pathological process – rupturing of weakened blood vessel
  – produces
• Disorder – Intraparenchymal cerebral hemorrhage
  – bears
• Disposition (disease) – to increased intra-cranial pressure
  – realized_in
• Pathological process – increasing intra-cranial pressure, compression of brain structures
  – produces
• Disorder – Cerebral ischemia, Cerebral neuronal death
  – bears
• Disposition (disease) – stroke
  – realized_in
• Symptoms – weakness/paralysis, loss of sensation, etc.

Page 45: Creating the CTSA Ontology Landscape: A  Modular Strategy

Ontology modules extending OGMS

• Sleep Domain Ontology (SDO)
• Infectious Disease Ontology (IDO)
• Ontology of Medically Relevant Social Entities (OMRSE)
• Vital Sign Ontology (VSO)
• Mental Disease Ontology (MD)
• Neurological Disease Ontology (ND)

Page 46: Creating the CTSA Ontology Landscape: A  Modular Strategy

Infectious Disease Ontology (IDO)

– IDO Core:
  • General terms in the ID domain.
  • A hub for all IDO extensions.
– IDO Extensions:
  • Disease specific.
  • Developed by subject matter experts.
• Provides:
  – Clear, precise, and consistent natural language definitions
  – Computable logical representations (OWL, OBO)

Page 47: Creating the CTSA Ontology Landscape: A  Modular Strategy

How IDO evolves

Modules shown: IDOCore, IDOSa, IDOHumanSa, IDORatSa, IDOStrep, IDORatStrep, IDOHumanStrep, IDOMRSa, IDOHumanBacterial, IDOAntibioticResistant, IDOMAL, IDOHIV, IDOFLU

CORE and SPOKES: Domain ontologies

SEMI-LATTICE: By subject matter experts in different communities of interest.
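A core-and-spokes arrangement that grows into a semi-lattice can be sketched as a graph in which a module may extend more than one parent. In the Python sketch below the module names come from the slide, but the edges are guesses made from those names, since the figure itself is not recoverable from the transcript; the `all_parents` helper is likewise an assumption.

```python
# module -> modules it directly extends.
# ASSUMPTION: these edges are inferred from the module names only.
EXTENDS = {
    "IDOCore": [],
    "IDOSa": ["IDOCore"],
    "IDOStrep": ["IDOCore"],
    "IDOAntibioticResistant": ["IDOCore"],
    "IDOHumanSa": ["IDOSa"],
    "IDOMRSa": ["IDOSa", "IDOAntibioticResistant"],  # two parents
}


def all_parents(module, extends=EXTENDS):
    """Every module reachable by following 'extends' links upward."""
    seen = set()
    stack = list(extends[module])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(extends[parent])
    return seen
```

Under these assumed edges every extension reaches `IDOCore`, which is what lets the core act as a hub, while `IDOMRSa` having two parents is what turns a plain tree of spokes into a semi-lattice.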

Page 48: Creating the CTSA Ontology Landscape: A  Modular Strategy

IDO Core

• Contains general terms in the ID domain:
  – E.g., ‘colonization’, ‘pathogen’, ‘infection’
• A contract between IDO extension ontologies and the datasets that use them.
• Intended to represent information along several dimensions:
  – biological scale (gene, cell, organ, organism, population)
  – discipline (clinical, immunological, microbiological)
  – organisms involved (host, pathogen, and vector types)

Page 49: Creating the CTSA Ontology Landscape: A  Modular Strategy

Sample IDO Definitions

• Host of Infectious Agent (BFO Role): A role borne by an organism in virtue of the fact that its extended organism contains an infectious agent.

• Extended Organism (OGMS): An object aggregate consisting of an organism and all material entities located within the organism, overlapping the organism, or occupying sites formed in part by the organism.

• Infectious Agent: A pathogen whose pathogenic disposition is an infectious disposition.

Page 50: Creating the CTSA Ontology Landscape: A  Modular Strategy

IDO and IDOSa

• Scale of the infection (disorder)

from Shetty, Tang, and Andrews, 2009

Page 51: Creating the CTSA Ontology Landscape: A  Modular Strategy

Staphylococcus aureus (Sa)

Differentiated by:
– Antibiotic Resistance: MSSa, MRSa
– Pathogenesis Location Type: HA-MRSa, CA-MRSa
– Geographic Region: UK CA-MRSa, Australian CA-MRSa
– Various Differentia: Specific Strains

Page 52: Creating the CTSA Ontology Landscape: A  Modular Strategy

Sample Application: A lattice of infectious disease application ontologies from NARSA isolate data

Network on Antimicrobial Resistance in Staphylococcus aureus – http://www.narsa.net/content/staphLinks.jsp

True personalized medicine – YourDiseaseOntology

Page 53: Creating the CTSA Ontology Landscape: A  Modular Strategy

Ways of differentiating Staphylococcus aureus infectious diseases

• Infectious Disease
  – By host type
  – By (sub-)species of pathogen
  – By antibiotic resistance
  – By anatomical site of infection
• Bacterial Infectious Disease
  – By PFGE (Strain)
  – By MLST (Sequence Type)
  – By BURST (Clonal Complex)
• Sa Infectious Disease
  – By SCCmec type
    • By ccr type
    • By mec class
  – spa type

http://www.sccmec.org/Pages/SCC_ClassificationEN.html
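Differentiating along several axes at once amounts to grouping isolates by a tuple of attribute values. The Python sketch below illustrates this under loud assumptions: the three isolates, their attribute values, and the `differentiate` helper are all invented for the example and are not NARSA data.

```python
from collections import defaultdict

# Invented isolates; attribute names echo the axes listed above.
ISOLATES = [
    {"id": "isolate-1", "host_type": "human", "sccmec_type": "IV"},
    {"id": "isolate-2", "host_type": "human", "sccmec_type": "II"},
    {"id": "isolate-3", "host_type": "rat", "sccmec_type": None},
]


def differentiate(isolates, *axes):
    """Group isolate IDs by their values along the chosen axes."""
    groups = defaultdict(list)
    for isolate in isolates:
        key = tuple(isolate[axis] for axis in axes)
        groups[key].append(isolate["id"])
    return dict(groups)
```

Grouping by `"host_type"` alone gives two classes, while adding `"sccmec_type"` splits the human class further; each added axis yields a finer set of application ontology classes.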

Page 54: Creating the CTSA Ontology Landscape: A  Modular Strategy

ido.owl

narsa.owl

narsa-isolates.owl

ndf-rt

NRS701’s resistance to clindamycin

Page 55: Creating the CTSA Ontology Landscape: A  Modular Strategy

BFO: The Very Top

continuant
– independent continuant
– dependent continuant: quality, function, role, disposition
occurrent

Page 56: Creating the CTSA Ontology Landscape: A  Modular Strategy

Basic Formal Ontology

types
Continuant
– Independent Continuant: thing
– Dependent Continuant: quality
Occurrent: process, event

instances
.... ..... .......

Page 57: Creating the CTSA Ontology Landscape: A  Modular Strategy

Basis of BFO in GO

Continuant
– Independent Continuant: cellular component
– Dependent Continuant: molecular function
Occurrent: biological process
