Creating the CTSA Ontology Landscape: A Modular Strategy
Barry Smith
For modularity to work, developers must accept some basic principles
– for formulating definitions
– of modularity
– of user feedback for error correction and gap identification
– for ensuring compatibility between modules
– for using ontologies to annotate legacy data
– for using ontologies to create new data
– for developing user-specific views
The Modular Approach
• Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused
• Create ontologies incrementally
• Reuse existing ontology resources
• Use these ontologies incrementally in annotating heterogeneous data
• Annotating = arm's-length approach; the data and data models themselves remain as they are
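A minimal sketch of the arm's-length idea, assuming a hypothetical record store and placeholder term identifiers (none of the field names or IDs below come from the slides). The annotations sit in a separate layer that only points at the legacy records, so the records themselves are never rewritten.

```python
from dataclasses import dataclass

# Hypothetical legacy records; field names are invented for illustration.
LEGACY_RECORDS = {
    "rec-1": {"dx_text": "flu", "site": "clinic A"},
    "rec-2": {"dx_text": "influenza", "site": "clinic B"},
}


@dataclass(frozen=True)
class Annotation:
    record_id: str  # key of the legacy record being annotated
    field: str      # field of that record the annotation is about
    term_id: str    # ontology term identifier (placeholder, not a real ID)


# Annotations live in their own store and only reference the records.
ANNOTATIONS = [
    Annotation("rec-1", "dx_text", "EX:influenza"),
    Annotation("rec-2", "dx_text", "EX:influenza"),
]


def records_annotated_with(term_id, annotations):
    """Return the sorted ids of records carrying an annotation with term_id."""
    return sorted({a.record_id for a in annotations if a.term_id == term_id})


print(records_annotated_with("EX:influenza", ANNOTATIONS))
```

Two records with different free-text values become retrievable through one shared term, while `LEGACY_RECORDS` stays exactly as it was.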
Benefits of Modularity
• Brings a clean division of labor amongst domain experts, who can manage governance aspects pertaining to their own domains
• Automatic consistency of the results of the distributed efforts – no room for contradiction
• Additivity of annotations even when multiple independently developed ontologies are used
• Lessons learned in developing and using one module can be used by the developers and users of later modules
Benefits of Modularity
• Increased likelihood of reuse, since potential users will be aware that they are investing in the results of an authoritative coordinated approach of proven reliability
• Increased value and portability of training in any given module
• Incentivization of those responsible for individual modules
Benefits of Modularity
• All of those involved can more easily inspect and criticize the results of others’ work
• Creates a collaborative environment for ontology development
• Serves as a platform for innovations which can be easily propagated throughout the whole system
• Developing and using ontologies in a consistent fashion brings a number of network effects – the value of existing annotations increases as new annotations are added
You will need to embrace some strategy along these lines if you want to get funding for
translational research
NIH Mandates for Sharing of Research Data
Investigators submitting an NIH application seeking $500,000 or more in any single year are expected to include a plan for data sharing
(http://grants.nih.gov/grants/policy/data_sharing)
Logical standards can be only part of the solution
OWL … bring benefits primarily on the side of syntax (language)
What we need are standards on the semantics (content) side (via top-level ontologies), including standards for
• top-level ontologies
• common relations (part_of …)
• relation of lower-level ontologies to each other and to the higher levels
BFO, DOLCE, SUMO
All exist in FOL and OWL versions
All have been tested in use
BFO: very small, truly domain-neutral
DOLCE: largely extends BFO, but built to support ‘linguistic and cognitive engineering’
SUMO: has its own tiny mathematics, tiny physics, tiny biology (‘body-covering’, ‘fruit-Or-vegetable’), …
120+ ontology projects using BFO
http://www.ifomis.org/bfo/
• Open Biomedical Ontologies Foundry
• Ontology for General Medical Science
• eagle-I, VIVO, CTSAconnect
• AstraZeneca
• Elsevier
How a common upper level ontology can help resist ontology chaos
• something to teach
• training (expertise) is portable
• each new ontology you confront will be more easily understood at the level of content – and more easily criticized, error-checked
• provides starting-point for domain-ontology development
• provides platform for tool-building and innovations
• lessons learned in building and using one ontology can potentially benefit other ontologies
• promotes shareability of data across disciplinary and other boundaries
OBO Foundry Modular Organization
top level
– Basic Formal Ontology (BFO)
mid-level
– Information Artifact Ontology (IAO)
– Ontology for Biomedical Investigations (OBI)
– Ontology of General Medical Science (OGMS)
domain level
– Anatomy Ontology (FMA*, CARO)
– Environment Ontology (EnvO)
– Infectious Disease Ontology (IDO*)
– Biological Process Ontology (GO*)
– Cell Ontology (CL)
– Cellular Component Ontology (FMA*, GO*)
– Phenotypic Quality Ontology (PaTO)
– Subcellular Anatomy Ontology (SAO)
– Sequence Ontology (SO*)
– Molecular Function (GO*)
– Protein Ontology (PRO*)
BFO
A simple top-level ontology to support information integration in scientific research
• No overlap with domain ontologies (organism, person, society, information, …)
• Based on realism
• No abstracta
• Tested in many natural science domains
Basic Formal Ontology
Continuant
– Independent Continuant (entity)
– Dependent Continuant (property)
Occurrent (process, event)
property depends on bearer
depends_on
Continuant
– Independent Continuant (thing)
– Dependent Continuant (property)
Occurrent (process, event)
event depends on participant
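The top-level split and the two dependence arrows can be sketched as plain lookup tables. This is an illustrative reading of the diagrams, not BFO's formal axiomatization; the dictionary names and the choice to treat participants as independent continuants are assumptions made for the example.

```python
# Parent links for the top-level types named on the slides.
PARENT = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
}

# depends_on, as drawn on the slides: a property depends on its bearer,
# an event depends on its participant (assumed here to be a thing).
DEPENDS_ON = {
    "dependent continuant": "independent continuant",
    "occurrent": "independent continuant",
}


def ancestors(type_name):
    """Walk parent links upward and return all ancestors, nearest first."""
    result = []
    current = PARENT[type_name]
    while current is not None:
        result.append(current)
        current = PARENT[current]
    return result


def is_a(child, parent):
    """True if child is the same type as parent or one of its descendants."""
    return child == parent or parent in ancestors(child)


print(is_a("dependent continuant", "continuant"))
print(DEPENDS_ON["occurrent"])
```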
Basic Formal Ontology
continuant
– independent continuant: cellular component
– dependent continuant: molecular function; roles, qualities
occurrent: biological processes
Continuant
– Independent Continuant
– Dependent Continuant: Quality, Disposition
Occurrent (process, event)
instance_of
types
– Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (property)
– Occurrent (process, event)
instances
– .... ..... .......
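Rationale of OBO Foundry coverage
Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT, versus OCCURRENT). Rows: GRANULARITY.

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
– occurrent: Cellular Process (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)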
Example: The Cell Ontology
Four distinct classificatory tasks
1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, treatments…)
4. of representations (records, observations, data, diagnoses…)
ICD confuses 1. and 2.
Most standard terminologies confuse 2. and 4.
Ontology for General Medical Science (OGMS)
1. person (patient, carrier, …) – independent continuant
2. disease (case, instance, problem, …) – specifically dependent continuant
3. course of disease (symptom, treatment…) – occurrent
4. representation (record, datum, diagnosis…) – generically dependent continuant
http://code.google.com/p/ogms/
Four distinct BFO categories
1. people (patients, carriers, …) – independent continuants
2. disease (case, instance, problem, condition …) – disposition
3. course of disease (symptom, episode, outbreak …) – realization of dispositions
4. representations (records, data, diagnoses…) – generically dependent continuants
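The four-way split can be sketched as a lookup from kind of thing to BFO category, plus a check for the conflation the earlier slide attributes to ICD and standard terminologies. The function names are hypothetical; the category labels follow the list above.

```python
# Kind of thing being classified -> BFO category, following the four-item list.
BFO_CATEGORY = {
    "person": "independent continuant",
    "disease": "disposition",
    "course of disease": "realization of disposition",
    "representation": "generically dependent continuant",
}


def categories_of(kinds):
    """Return the set of BFO categories covered by the given kinds."""
    return {BFO_CATEGORY[kind] for kind in kinds}


def is_conflated(kinds):
    """True if a single term is being used for kinds in different categories."""
    return len(categories_of(kinds)) > 1


# A term used both for the disease and for the diagnosis record mixes 2. and 4.
print(is_conflated(["disease", "representation"]))
print(is_conflated(["disease"]))
```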
Big Picture (Ontology for General Medical Science)
Elucidation of Primitive Terms
‘extended organism’ = the organism and all the material entities located within it
‘bodily feature’ = either a physical part of the extended organism, a bodily quality, or a bodily process.
Elucidation of Primitive Terms
clinically abnormal - some bodily feature that
(1) is not part of the life plan for an organism of the relevant type (unlike loss of milk teeth, aging or pregnancy),
(2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
(3) is such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
Disorder
A material entity (fiat object part) which is clinically abnormal and part of an extended organism
Compare: Downtown Santa Barbara, Mount Everest
Definitions - Foundational Terms
Pathological Process =def. – A bodily process that is clinically abnormal.
Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
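One way to read these definitions is as simple checks over bodily features. The sketch below is an illustration only: the class and function names are hypothetical, and "clinically abnormal" is reduced to a boolean flag rather than the three-part elucidation. The example values are taken from the cirrhosis slide of this deck.

```python
from dataclasses import dataclass


@dataclass
class BodilyFeature:
    label: str
    kind: str  # "physical part", "bodily quality", or "bodily process"
    clinically_abnormal: bool


def is_disorder(feature):
    """Disorder: a clinically abnormal physical part of the extended organism."""
    return feature.kind == "physical part" and feature.clinically_abnormal


def is_pathological_process(feature):
    """Pathological process: a bodily process that is clinically abnormal."""
    return feature.kind == "bodily process" and feature.clinically_abnormal


@dataclass
class Disease:
    label: str
    based_on: list     # disorders in virtue of which the disposition exists
    realized_in: list  # pathological processes; may be empty if unrealized


def is_well_formed(disease):
    """A disease needs at least one disorder and only pathological processes."""
    return (
        len(disease.based_on) >= 1
        and all(is_disorder(d) for d in disease.based_on)
        and all(is_pathological_process(p) for p in disease.realized_in)
    )


necrotic_liver = BodilyFeature("necrotic liver", "physical part", True)
repair = BodilyFeature("abnormal tissue repair", "bodily process", True)
cirrhosis = Disease("cirrhosis", [necrotic_liver], [repair])
print(is_well_formed(cirrhosis))
```

Allowing `realized_in` to be empty reflects the point that a disposition can exist without yet being realized.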
Big Picture (Ontology for General Medical Science)
http://code.google.com/p/ogms/
Disease Course =def. – The sum of processes through which a given disease instance is realized.
A disease is a disposition
etiological process
  produces
disorder
  bears
disposition
  realized_in
pathological process
  produces
abnormal bodily features
  recognized_as
signs & symptoms
  used_in
interpretive process
  produces
diagnosis
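The chain on this slide can be written down as subject, relation, object triples and walked from start to finish. A minimal sketch: the triples are copied from the slide, while the `walk` helper is a hypothetical name introduced for illustration.

```python
# (subject, relation, object) triples, in the order given on the slide.
CHAIN = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def walk(start, triples):
    """Follow the triples from start and return the nodes visited, in order."""
    next_node = {subject: obj for subject, _, obj in triples}
    path = [start]
    seen = {start}
    node = start
    while node in next_node and next_node[node] not in seen:
        node = next_node[node]
        path.append(node)
        seen.add(node)
    return path


print(" -> ".join(walk("etiological process", CHAIN)))
```

The `seen` set is only a guard against accidental cycles; the chain on the slide is linear.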
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
  produces
Disorder - necrotic liver
  bears
Disposition (disease) - cirrhosis
  realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
  produces
Abnormal bodily features
  recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out cirrhosis
  suggests
Laboratory tests
  produces
Test results - elevated liver enzymes in serum
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
  produces
Disorder - viable cells with influenza virus
  bears
Disposition (disease) - flu
  realized_in
Pathological process - acute inflammation
  produces
Abnormal bodily features
  recognized_as
Symptoms - weakness, dizziness
Signs - fever
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out influenza
  suggests
Laboratory tests
  produces
Test results - elevated serum antibody titers
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease flu
Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene
  produces
Disorder - chromosome 4 with abnormal mHTT
  bears
Disposition (disease) - Huntington’s disease
  realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
  produces
Abnormal bodily features
  recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out Huntington’s
  suggests
Laboratory tests
  produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
Dispositions and Predispositions
Some dispositions are predispositions to other dispositions.
HNPCC - genetic pre-disposition
Etiological process - inheritance of a mutant mismatch repair gene
  produces
Disorder - chromosome 3 with abnormal hMLH1
  bears
Disposition (disease) - Lynch syndrome
  realized_in
Pathological process - abnormal repair of DNA mismatches
  produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
  bears
Disposition (disease) - non-polyposis colon cancer
  realized_in
Symptoms (including pain)
Arterial Aneurysm
Disposition – atherosclerosis
  realized_in
Pathological process – fatty material collects within the walls of arteries
  produces
Disorder – artery with weakened wall
  bears
Disposition – of artery to become distended
  realized_in
Pathological process – process of distending
  produces
Disorder – arterial aneurysm
  bears
Disposition – of artery to rupture
  realized_in
Pathological process – (catastrophic event) of rupturing
  produces
Disorder – ruptured artery, arterial system with dangerously low blood pressure
  bears
Disposition – circulatory failure
  realized_in
Pathological process – exsanguination, failure of homeostasis
  produces
Death
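The arterial aneurysm example shows one disposition leading, through a process and a new disorder, to a further disposition. A small sketch of that pattern follows, using only the first seven steps of the slide; the helper names `link_chain` and `predispositions` are hypothetical.

```python
# Relation that links each kind of step to the next, as used on the slide.
LINK = {
    ("disposition", "pathological process"): "realized_in",
    ("pathological process", "disorder"): "produces",
    ("disorder", "disposition"): "bears",
}

# First seven steps of the arterial aneurysm slide, as (kind, label) pairs.
ANEURYSM = [
    ("disposition", "atherosclerosis"),
    ("pathological process", "fatty material collects within the walls of arteries"),
    ("disorder", "artery with weakened wall"),
    ("disposition", "of artery to become distended"),
    ("pathological process", "process of distending"),
    ("disorder", "arterial aneurysm"),
    ("disposition", "of artery to rupture"),
]


def link_chain(steps):
    """Turn consecutive steps into triples; reject orderings LINK does not allow."""
    triples = []
    for (kind_a, label_a), (kind_b, label_b) in zip(steps, steps[1:]):
        relation = LINK.get((kind_a, kind_b))
        if relation is None:
            raise ValueError(f"no relation from {kind_a} to {kind_b}")
        triples.append((label_a, relation, label_b))
    return triples


def predispositions(steps):
    """Pair each disposition with the next disposition further down the chain."""
    dispositions = [label for kind, label in steps if kind == "disposition"]
    return list(zip(dispositions, dispositions[1:]))


for earlier, later in predispositions(ANEURYSM):
    print(f"{earlier} predisposes to: {later}")
```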
Systemic arterial hypertension
Etiological process – abnormal reabsorption of NaCl by the kidney
  produces
Disorder – abnormally large scattered molecular aggregate of salt in the blood
  bears
Disposition (disease) - hypertension
  realized_in
Pathological process – exertion of abnormal pressure against arterial wall
  produces
Abnormal bodily features
  recognized_as
Symptoms -
Signs – elevated blood pressure
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out hypertension
  suggests
Laboratory tests
  produces
Test results -
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease hypertension
Type 2 Diabetes Mellitus
Etiological process –
  produces
Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
  bears
Disposition (disease) – diabetes mellitus
  realized_in
Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
  produces
Abnormal bodily features
  recognized_as
Symptoms – polydipsia, polyuria, polyphagia, blurred vision
Signs – elevated blood glucose and hemoglobin A1c
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out diabetes mellitus
  suggests
Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c
  produces
Test results -
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus
Type 1 hypersensitivity to penicillin
Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
  produces
Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
  bears
Disposition (disease) – type I hypersensitivity
  realized_in
Pathological process – type I hypersensitivity reaction
  produces
Abnormal bodily features
  recognized_as
Symptoms – pruritus, shortness of breath
Signs – rash, urticaria, anaphylaxis
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis -
  suggests
Laboratory tests –
  produces
Test results – occasionally, skin testing
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin
Early Onset Alzheimer’s Disease
Disorder – mutations in APP, PSEN1 and PSEN2
  bears
Disposition – impaired APP processing
  realized_in
Pathological process – accumulation of intra- and extracellular protein in the brain
  produces
Disorder – amyloid plaque and neurofibrillary tangles
  bears
Disposition – of neurons to die
  realized_in
Pathological process – neuronal loss
  produces
Disorder – cognitive brain regions damaged and reduced in size
  bears
Disposition (disease) – Alzheimer’s dementia
  realized_in
Symptoms – episodic memory loss and other cognitive domain impairment
Hemorrhagic stroke
• Disorder – cerebral arterial aneurysm
  – bears
• Disposition – of weakened artery to rupture
  – realized_in
• Pathological process – rupturing of weakened blood vessel
  – produces
• Disorder – intraparenchymal cerebral hemorrhage
  – bears
• Disposition (disease) – to increased intra-cranial pressure
  – realized_in
• Pathological process – increasing intra-cranial pressure, compression of brain structures
  – produces
• Disorder – cerebral ischemia, cerebral neuronal death
  – bears
• Disposition (disease) – stroke
  – realized_in
• Symptoms – weakness/paralysis, loss of sensation, etc.
Ontology modules extending OGMS
Sleep Domain Ontology (SDO)
Infectious Disease Ontology (IDO)
Ontology of Medically Relevant Social Entities (OMRSE)
Vital Sign Ontology (VSO)
Mental Disease Ontology (MD)
Neurological Disease Ontology (ND)
Infectious Disease Ontology (IDO)
– IDO Core:
  • General terms in the ID domain.
  • A hub for all IDO extensions.
– IDO Extensions:
  • Disease specific.
  • Developed by subject matter experts.
• Provides:
  – Clear, precise, and consistent natural language definitions
  – Computable logical representations (OWL, OBO)
How IDO evolves
Modules shown: IDOCore, IDOSa, IDOHumanSa, IDORatSa, IDOStrep, IDORatStrep, IDOHumanStrep, IDOMRSa, IDOHumanBacterial, IDOAntibioticResistant, IDOMAL, IDOHIV, IDOFLU
CORE and SPOKES: Domain ontologies
SEMI-LATTICE: By subject matter experts in different communities of interest.
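The hub-and-extension arrangement can be sketched as a directed graph in which an extension may build on more than one parent. The module names come from the slide, but the specific edges below are assumptions made for illustration, since the diagram's arrows did not survive in this transcript; `all_imports` is a hypothetical helper.

```python
# module -> modules it directly extends.
# ASSUMPTION: these edges are illustrative guesses, not read off the slide.
EXTENDS = {
    "IDOCore": [],
    "IDOSa": ["IDOCore"],
    "IDOAntibioticResistant": ["IDOCore"],
    "IDOMRSa": ["IDOSa", "IDOAntibioticResistant"],
    "IDOHumanSa": ["IDOSa"],
}


def all_imports(module, extends):
    """Return every module that the given module builds on, directly or not."""
    seen = set()
    stack = list(extends[module])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(extends[parent])
    return seen


print(sorted(all_imports("IDOMRSa", EXTENDS)))
```

Under these assumed edges, every extension reaches IDOCore, which is what makes the core a hub, while a module with two parents is what turns the tree of spokes into a semi-lattice.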
IDO Core
• Contains general terms in the ID domain:
  – E.g., ‘colonization’, ‘pathogen’, ‘infection’
• A contract between IDO extension ontologies and the datasets that use them.
• Intended to represent information along several dimensions:
  – biological scale (gene, cell, organ, organism, population)
  – discipline (clinical, immunological, microbiological)
  – organisms involved (host, pathogen, and vector types)
Sample IDO Definitions
• Host of Infectious Agent (BFO Role): A role borne by an organism in virtue of the fact that its extended organism contains an infectious agent.
• Extended Organism (OGMS): An object aggregate consisting of an organism and all material entities located within the organism, overlapping the organism, or occupying sites formed in part by the organism.
• Infectious Agent: A pathogen whose pathogenic disposition is an infectious disposition.
IDO and IDOSa
• Scale of the infection (disorder)
from Shetty, Tang, and Andrews, 2009
12/10/2010
Staphylococcus aureus (Sa)
– MSSa, MRSa (differentiated by: Antibiotic Resistance)
– HA-MRSa, CA-MRSa (differentiated by: Pathogenesis Location Type)
– UK CA-MRSa, Australian CA-MRSa (differentiated by: Geographic Region)
– Specific Strains (differentiated by: Various Differentia)
Sample Application: A lattice of infectious disease application ontologies from NARSA isolate data
Network on Antimicrobial Resistance in Staphylococcus aureus
– http://www.narsa.net/content/staphLinks.jsp
True personalized medicine – YourDiseaseOntology
Ways of differentiating Staphylococcus aureus infectious diseases
• Infectious Disease
  – By host type
  – By (sub-)species of pathogen
  – By antibiotic resistance
  – By anatomical site of infection
• Bacterial Infectious Disease
  – By PFGE (Strain)
  – By MLST (Sequence Type)
  – By BURST (Clonal Complex)
• Sa Infectious Disease
  – By SCCmec type
    • By ccr type
    • By mec class
  – By spa type
http://www.sccmec.org/Pages/SCC_ClassificationEN.html
ido.owl
narsa.owl
narsa-isolates.owl
ndf-rt
NRS701’s resistance to clindamycin
BFO: The Very Top
continuant
– independent continuant
– dependent continuant
  – quality
  – function
  – role
  – disposition
occurrent
Basic Formal Ontology
types
– Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (quality)
– Occurrent (process, event)
instances
– .... ..... .......
Basis of BFO in GO
Continuant
– Independent Continuant: cellular component
– Dependent Continuant: molecular function
Occurrent: biological process