Creating the CTSA Ontology Landscape: A Modular Strategy

Post on 23-Feb-2016


Barry Smith

For modularity to work, developers must accept some basic principles

– for formulating definitions
– of modularity
– of user feedback for error correction and gap identification
– for ensuring compatibility between modules
– for using ontologies to annotate legacy data
– for using ontologies to create new data
– for developing user-specific views

The Modular Approach
• Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused
• Create ontologies incrementally
• Reuse existing ontology resources
• Use these ontologies incrementally in annotating heterogeneous data
• Annotating = arm's-length approach; the data and data models themselves remain as they are

Benefits of Modularity
• Brings a clean division of labor amongst domain experts, who can manage governance aspects pertaining to their own domains
• Automatic consistency of the results of the distributed efforts – no room for contradiction
• Additivity of annotations even when multiple independently developed ontologies are used
• Lessons learned in developing and using one module can be used by the developers and users of later modules

Benefits of Modularity

• Increased likelihood of reuse, since potential users will be aware that they are investing in the results of an authoritative coordinated approach of proven reliability

• Increased value and portability of training in any given module

• Incentivization of those responsible for individual modules


Benefits of Modularity
• All of those involved can more easily inspect and criticize the results of others’ work
• Creates a collaborative environment for ontology development
• Serves as a platform for innovations which can be easily propagated throughout the whole system
• Developing and using ontologies in a consistent fashion brings a number of network effects – the value of existing annotations increases as new annotations are added

You will need to embrace some strategy along these lines if you want to get funding for translational research

NIH Mandates for Sharing of Research Data

Investigators submitting an NIH application seeking $500,000 or more in any single year are expected to include a plan for data sharing

(http://grants.nih.gov/grants/policy/data_sharing)


Logical standards can be only part of the solution

OWL … bring benefits primarily on the side of syntax (language)

What we need are standards on the semantics (content) side (via top-level ontologies), including standards for
• top-level ontologies
• common relations (part_of …)
• relation of lower-level ontologies to each other and to the higher levels

BFO, DOLCE, SUMO

All exist in FOL and OWL versions
All have been tested in use

BFO: very small, truly domain-neutral

DOLCE: largely extends BFO, but built to support ‘linguistic and cognitive engineering’

SUMO: has its own tiny mathematics, tiny physics, tiny biology (‘body-covering’, ‘fruit-Or-vegetable’), …


120+ ontology projects using BFO

http://www.ifomis.org/bfo/

• Open Biomedical Ontologies Foundry
• Ontology for General Medical Science
• eagle-i, VIVO, CTSAconnect
• AstraZeneca
• Elsevier

How a common upper level ontology can help resist ontology chaos

• something to teach
• training (expertise) is portable
• each new ontology you confront will be more easily understood at the level of content
  – and more easily criticized, error-checked
• provides starting-point for domain-ontology development
• provides platform for tool-building and innovations
• lessons learned in building and using one ontology can potentially benefit other ontologies
• promotes shareability of data across disciplinary and other boundaries

OBO Foundry Modular Organization

top level
– Basic Formal Ontology (BFO)

mid-level
– Information Artifact Ontology (IAO)
– Ontology for Biomedical Investigations (OBI)
– Ontology of General Medical Science (OGMS)

domain level
– Anatomy Ontology (FMA*, CARO)
– Environment Ontology (EnvO)
– Infectious Disease Ontology (IDO*)
– Biological Process Ontology (GO*)
– Cell Ontology (CL)
– Cellular Component Ontology (FMA*, GO*)
– Phenotypic Quality Ontology (PaTO)
– Subcellular Anatomy Ontology (SAO)
– Sequence Ontology (SO*)
– Molecular Function (GO*)
– Protein Ontology (PRO*)

BFO

A simple top-level ontology to support information integration in scientific research
• No overlap with domain ontologies (organism, person, society, information, …)
• Based on realism
• No abstracta
• Tested in many natural science domains

Basic Formal Ontology

Continuant
  Independent Continuant: entity
  Dependent Continuant: property
Occurrent: process, event

property depends on bearer

depends_on

Continuant
  Independent Continuant: thing
  Dependent Continuant: property
Occurrent: process, event

event depends on participant

Basic Formal Ontology

continuant
  independent continuant: cellular component
  dependent continuant: molecular function; roles, qualities
occurrent: biological processes

Continuant
  Independent Continuant
  Dependent Continuant: Quality, Disposition
Occurrent: process, event

instance_of

types
  Continuant
    Independent Continuant: thing
    Dependent Continuant: property
  Occurrent: process, event

instances
  .... ..... .......

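Rationale of OBO Foundry coverage

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM, CELL AND CELLULAR COMPONENT, MOLECULE

ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
– Occurrent: Cellular Process (GO)

MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)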

Example: The Cell Ontology

Four distinct classificatory tasks

1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, treatments…)
4. of representations (records, observations, data, diagnoses…)

ICD confuses 1. & 2.
Most standard terminologies confuse 2. and 4.

Ontology for General Medical Science (OGMS)

1. person (patient, carrier, …) – independent continuant

2. disease (case, instance, problem, …) – specifically dependent continuant

3. course of disease (symptom, treatment…)– occurrent

4. representation (record, datum, diagnosis…)– generically dependent continuant

http://code.google.com/p/ogms/

Four distinct BFO categories

1. people (patients, carriers, …) – independent continuants

2. disease (case, instance, problem, condition …) – disposition

3. course of disease (symptom, episode, outbreak …)– realization of dispositions

4. representations (records, data, diagnoses…)– generically dependent continuants
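
The four-way distinction above can be read as a simple lookup from kind of entity to BFO category. The following is only a minimal sketch: the names `BFO_CATEGORY`, `category_of` and `is_category_mistake` are hypothetical and not part of OGMS or BFO, and the entries merely restate the slide.

```python
# Hypothetical lookup restating the slide: kind of entity -> BFO category.
BFO_CATEGORY = {
    "person": "independent continuant",
    "disease": "disposition",
    "course of disease": "realization of dispositions",
    "representation": "generically dependent continuant",
}


def category_of(kind: str) -> str:
    """Return the BFO category for one of the four kinds (KeyError otherwise)."""
    return BFO_CATEGORY[kind.strip().lower()]


def is_category_mistake(kind_a: str, kind_b: str) -> bool:
    """True when the two kinds fall under different BFO categories,
    so that treating them as one thing would conflate distinct categories."""
    return category_of(kind_a) != category_of(kind_b)


if __name__ == "__main__":
    # A terminology that files a diagnosis (representation) under disease
    # mixes kinds 2 and 4.
    print(is_category_mistake("disease", "representation"))
```

Keeping the mapping explicit makes it easy to flag a term that has been filed under the wrong kind, for example a diagnosis record treated as a disease.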


Big Picture (Ontology for General Medical Science)


Elucidation of Primitive Terms

‘extended organism’ = the organism and all the material entities located within it

‘bodily feature’ = either a physical part of the extended organism, a bodily quality, or a bodily process.


Elucidation of Primitive Terms

clinically abnormal - some bodily feature that
(1) is not part of the life plan for an organism of the relevant type (unlike loss of milk teeth, aging or pregnancy),
(2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
(3) is such that the elevated risk exceeds a certain threshold level.*

*Compare: baldness

Disorder
A material entity (fiat object part) which is clinically abnormal and part of an extended organism

Compare: Downtown Santa Barbara, Mount Everest

Definitions - Foundational Terms

Pathological Process =def. – A bodily process that is clinically abnormal.

Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.



Disease Course =def. – The sum of processes through which a given disease instance is realized.

A disease is a disposition

etiological process

produces

disorder

bears

disposition

realized_in

pathological process

produces

abnormal bodily features

recognized_as

signs & symptoms

used_in

interpretive process

produces

diagnosis

Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
  produces
Disorder - necrotic liver
  bears
Disposition (disease) - cirrhosis
  realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
  produces
Abnormal bodily features
  recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out cirrhosis
  suggests
Laboratory tests
  produces
Test results - elevated liver enzymes in serum
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
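
The cirrhosis chain above can be sketched as a list of (subject, relation, object) triples. This is an illustrative simplification, not an OGMS artifact: the relation names come from the slide, the node labels are abbreviated, the two interpretive steps are collapsed into one, and `CIRRHOSIS` and `follow` are hypothetical names.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Relations copied from the slide; node labels abbreviate the cirrhosis example.
CIRRHOSIS: List[Triple] = [
    ("phenobarbital-induced hepatic cell death", "produces", "necrotic liver"),
    ("necrotic liver", "bears", "cirrhosis"),
    ("cirrhosis", "realized_in", "abnormal tissue repair"),
    ("abnormal tissue repair", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "symptoms & signs"),
    ("symptoms & signs", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def follow(triples: List[Triple], start: str) -> List[str]:
    """Return the nodes reached from `start` by following outgoing edges.

    Assumes each node has at most one outgoing edge, as in this simplified
    chain, and stops if a node would be visited twice.
    """
    next_node: Dict[str, str] = {subj: obj for subj, _rel, obj in triples}
    path = [start]
    while path[-1] in next_node and next_node[path[-1]] not in path:
        path.append(next_node[path[-1]])
    return path


if __name__ == "__main__":
    print(" -> ".join(follow(CIRRHOSIS, "necrotic liver")))
```

Starting the walk at the disorder rather than at the etiological process shows that the disease (the disposition) sits between the physical basis and the processes in which it is realized.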

Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
  produces
Disorder - viable cells with influenza virus
  bears
Disposition (disease) - flu
  realized_in
Pathological process - acute inflammation
  produces
Abnormal bodily features
  recognized_as
Symptoms - weakness, dizziness
Signs - fever
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out influenza
  suggests
Laboratory tests
  produces
Test results - elevated serum antibody titers
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease flu

Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene
  produces
Disorder - chromosome 4 with abnormal mHTT
  bears
Disposition (disease) - Huntington’s disease
  realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
  produces
Abnormal bodily features
  recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out Huntington’s
  suggests
Laboratory tests
  produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

Dispositions and Predispositions

Some dispositions are predispositions to other dispositions.


HNPCC - genetic pre-disposition
Etiological process - inheritance of a mutant mismatch repair gene
  produces
Disorder - chromosome 3 with abnormal hMLH1
  bears
Disposition (disease) - Lynch syndrome
  realized_in
Pathological process - abnormal repair of DNA mismatches
  produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
  bears
Disposition (disease) - non-polyposis colon cancer
  realized_in
Symptoms (including pain)
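
One way to read the HNPCC example: a disposition acts as a predisposition when its realization produces a further disorder that bears another disposition. The sketch below encodes that reading only. It assumes a linear chain in slide order; `HNPCC` and `predispositions` are hypothetical names and the labels are abbreviated.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (kind, label)

# Steps in slide order; labels abbreviate the HNPCC example.
HNPCC: List[Step] = [
    ("disorder", "chromosome 3 with abnormal hMLH1"),
    ("disposition", "Lynch syndrome"),
    ("pathological process", "abnormal repair of DNA mismatches"),
    ("disorder", "mutations in proto-oncogenes and tumor suppressor genes"),
    ("disposition", "non-polyposis colon cancer"),
]


def predispositions(chain: List[Step]) -> List[Tuple[str, str]]:
    """Pair each disposition with the next disposition later in the chain.

    Each pair (earlier, later) is read as: `earlier` is a predisposition
    to `later`. Assumes the chain is linear and already in causal order.
    """
    labels = [label for kind, label in chain if kind == "disposition"]
    return list(zip(labels, labels[1:]))


if __name__ == "__main__":
    for earlier, later in predispositions(HNPCC):
        print(f"{earlier} predisposes to {later}")
```

A chain with a single disposition yields no pairs, which matches the simpler examples where the disease is realized directly in a pathological process.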

Arterial Aneurysm
Disposition – atherosclerosis
  realized_in
Pathological process – fatty material collects within the walls of arteries
  produces
Disorder – artery with weakened wall
  bears
Disposition – of artery to become distended
  realized_in
Pathological process – process of distending
  produces
Disorder – arterial aneurysm
  bears
Disposition – of artery to rupture
  realized_in
Pathological process – (catastrophic event) of rupturing
  produces
Disorder – ruptured artery, arterial system with dangerously low blood pressure
  bears
Disposition – circulatory failure
  realized_in
Pathological process – exsanguination, failure of homeostasis
  produces
Death

Systemic arterial hypertension
Etiological process – abnormal reabsorption of NaCl by the kidney
  produces
Disorder – abnormally large scattered molecular aggregate of salt in the blood
  bears
Disposition (disease) - hypertension
  realized_in
Pathological process – exertion of abnormal pressure against arterial wall
  produces
Abnormal bodily features
  recognized_as
Symptoms -
Signs – elevated blood pressure
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out hypertension
  suggests
Laboratory tests
  produces
Test results -
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease hypertension

Type 2 Diabetes Mellitus
Etiological process –
  produces
Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
  bears
Disposition (disease) – diabetes mellitus
  realized_in
Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
  produces
Abnormal bodily features
  recognized_as
Symptoms – polydipsia, polyuria, polyphagia, blurred vision
Signs – elevated blood glucose and hemoglobin A1c
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out diabetes mellitus
  suggests
Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c
  produces
Test results -
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus

Type 1 hypersensitivity to penicillin
Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
  produces
Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
  bears
Disposition (disease) – type I hypersensitivity
  realized_in
Pathological process – type I hypersensitivity reaction
  produces
Abnormal bodily features
  recognized_as
Symptoms – pruritus, shortness of breath
Signs – rash, urticaria, anaphylaxis
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis -
  suggests
Laboratory tests –
  produces
Test results – occasionally, skin testing
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin

Early Onset Alzheimer’s Disease
Disorder – mutations in APP, PSEN1 and PSEN2
  bears
Disposition – impaired APP processing
  realized_in
Pathological process – accumulation of intra- and extracellular protein in the brain
  produces
Disorder – amyloid plaque and neurofibrillary tangles
  bears
Disposition – of neurons to die
  realized_in
Pathological process – neuronal loss
  produces
Disorder – cognitive brain regions damaged and reduced in size
  bears
Disposition (disease) – Alzheimer’s dementia
  realized_in
Symptoms – episodic memory loss and other cognitive domain impairment


Hemorrhagic stroke
• Disorder – cerebral arterial aneurysm
  – bears
• Disposition – of weakened artery to rupture
  – realized_in
• Pathological process – rupturing of weakened blood vessel
  – produces
• Disorder – Intraparenchymal cerebral hemorrhage
  – bears
• Disposition (disease) – to increased intra-cranial pressure
  – realized_in
• Pathological process – increasing intra-cranial pressure, compression of brain structures
  – produces
• Disorder – Cerebral ischemia, Cerebral neuronal death
  – bears
• Disposition (disease) – stroke
  – realized_in
• Symptoms – weakness/paralysis, loss of sensation, etc.

Ontology modules extending OGMS

Sleep Domain Ontology (SDO)
Infectious Disease Ontology (IDO)
Ontology of Medically Relevant Social Entities (OMRSE)
Vital Sign Ontology (VSO)
Mental Disease Ontology (MD)
Neurological Disease Ontology (ND)

Infectious Disease Ontology (IDO)

– IDO Core:
  • General terms in the ID domain.
  • A hub for all IDO extensions.
– IDO Extensions:
  • Disease specific.
  • Developed by subject matter experts.
• Provides:
  – Clear, precise, and consistent natural language definitions
  – Computable logical representations (OWL, OBO)

How IDO evolves

CORE and SPOKES: Domain ontologies
SEMI-LATTICE: By subject matter experts in different communities of interest.

Ontologies shown: IDOCore, IDOSa, IDOHumanSa, IDORatSa, IDOStrep, IDORatStrep, IDOHumanStrep, IDOMRSa, IDOHumanBacterial, IDOAntibioticResistant, IDOMAL, IDOHIV, IDOFLU

IDO Core

• Contains general terms in the ID domain:
  – E.g., ‘colonization’, ‘pathogen’, ‘infection’
• A contract between IDO extension ontologies and the datasets that use them.
• Intended to represent information along several dimensions:
  – biological scale (gene, cell, organ, organism, population)
  – discipline (clinical, immunological, microbiological)
  – organisms involved (host, pathogen, and vector types)

Sample IDO Definitions

• Host of Infectious Agent (BFO Role): A role borne by an organism in virtue of the fact that its extended organism contains an infectious agent.

• Extended Organism (OGMS): An object aggregate consisting of an organism and all material entities located within the organism, overlapping the organism, or occupying sites formed in part by the organism.

• Infectious Agent: A pathogen whose pathogenic disposition is an infectious disposition.

IDO and IDOSa

• Scale of the infection (disorder)

from Shetty, Tang, and Andrews, 2009

Staphylococcus aureus (Sa)

Differentiated by:
– Antibiotic Resistance: MSSa, MRSa
– Pathogenesis Location Type: HA-MRSa, CA-MRSa
– Geographic Region: UK CA-MRSa, Australian CA-MRSa
– Various Differentia: Specific Strains

Sample Application: A lattice of infectious disease application ontologies from NARSA isolate data

Network on Antimicrobial Resistance in Staphylococcus aureus – http://www.narsa.net/content/staphLinks.jsp

True personalized medicine – YourDiseaseOntology

Ways of differentiating Staphylococcus aureus infectious diseases

• Infectious Disease
  – By host type
  – By (sub-)species of pathogen
  – By antibiotic resistance
  – By anatomical site of infection
• Bacterial Infectious Disease
  – By PFGE (Strain)
  – By MLST (Sequence Type)
  – By BURST (Clonal Complex)
• Sa Infectious Disease
  – By SCCmec type
    • By ccr type
    • By mec class
  – By spa type

http://www.sccmec.org/Pages/SCC_ClassificationEN.html
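
Differentiating classes by differentia, as listed above, can be sketched as attribute matching: each class names the attribute values an isolate must have. This is a hedged illustration only. The class names come from the Staphylococcus aureus slide, but the attribute keys (`species`, `methicillin`, `acquired`), the `CLASSES` table and the `classify` function are assumptions made for the example, and the isolate in the usage lines is invented, not NARSA data.

```python
from typing import Dict, List

SA = "Staphylococcus aureus"

# Hypothetical differentia: class name -> attribute values required.
CLASSES: Dict[str, Dict[str, str]] = {
    "Sa": {"species": SA},
    "MSSa": {"species": SA, "methicillin": "susceptible"},
    "MRSa": {"species": SA, "methicillin": "resistant"},
    "HA-MRSa": {"species": SA, "methicillin": "resistant", "acquired": "hospital"},
    "CA-MRSa": {"species": SA, "methicillin": "resistant", "acquired": "community"},
}


def classify(isolate: Dict[str, str]) -> List[str]:
    """Return, sorted by name, every class whose differentia the isolate satisfies."""
    return sorted(
        name
        for name, differentia in CLASSES.items()
        if all(isolate.get(key) == value for key, value in differentia.items())
    )


if __name__ == "__main__":
    # Invented isolate, for illustration only.
    example = {"species": SA, "methicillin": "resistant", "acquired": "community"}
    print(classify(example))
```

Because a more specific class repeats the differentia of its parent and adds one more, an isolate that matches CA-MRSa also matches MRSa and Sa, which is the additive behavior a lattice of application ontologies relies on.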

ido.owl

narsa.owl

narsa-isolates.owl

ndf-rt

NRS701’s resistance to clindamycin

BFO: The Very Top

continuant
  independent continuant
  dependent continuant: quality, function, role, disposition
occurrent

Basic Formal Ontology

types
  Continuant
    Independent Continuant: thing
    Dependent Continuant: quality
  Occurrent: process, event

instances
  .... ..... .......

Basis of BFO in GO

Continuant
  Independent Continuant: cellular component
  Dependent Continuant: molecular function
Occurrent: biological process
