Segmenting & Merging Domain-specific Modules for Clinical Informatics
Chimezie Ogbuji Cleveland Clinic & Case Western Reserve University
Sivaram Arabandi Case Western Reserve University
Songmao Zhang Chinese Academy of Sciences
Guo-Qiang Zhang Case Western Reserve University
Introduction
● What are we doing and why are we doing it?
– Generally
– Specifically
● What are the criteria for success?
● What are the existing best practices and well-documented challenges of ontology re-use?
Introduction
● Construct domain-specific ontologies to support data curation and ongoing clinical research activity
● PhysioMIMI is an informatics infrastructure for collection, management, and analysis of sleep-related data
● Our method was used to bootstrap a Sleep Domain Ontology (SDO)
Goal / Criteria for Success
● Want to (automatically)
– Generate anatomy and clinical terminology modules that make use of principled normal forms, are minimal in size, and preserve the meaning of re-used symbols
– As much as is computationally feasible
– Be able to facilitate the customization of a large source ontology such as SNOMED-CT
● Provide a framework for bootstrapping terminology for a specific domain
Desiderata for Clinical Terminology
● There is a critical need for formal, reproducible methods for recognizing and filling gaps in medical terminologies (Cimino 1998)
● Clinical terminology systems need to extend smoothly and quickly in response to the needs of users (Rector 1999)
– A fixed, enumerated list of concepts can never be complete and results in a combinatorial explosion of terms (exhaustive pre-coordination)
Desiderata (cont.)
● Post-coordination is a contrasting approach in which a set of atomic concepts is used to create new terms on demand rather than a priori
● Rector (2003) proposed a set of normalization criteria and an approach for decomposing and recombining disjoint, homogeneous taxonomies
● Goal is for trees of primitive terms to serve as a terminological framework that minimizes implicit differentia
– Discrete coordinate system
Background
● Related efforts regarding
– Ontology merging
– Ontology modularization
● Review formalisms for ontology modularization
– What is a deductive, conservative extension?
– What is a module?
● What is the difference between a segment and a module?
Related Work
● Noy and Musen (2000)
– Discuss how to either automate the merging and alignment or guide the user, suggesting conflicts and actions to take
– Rely on lexical matching of term names
● Bontas and Tolksdorf (2005)
– Similar goal to Noy & Musen
– User provides a list of term matches between source & target
– Follow semantic connections from these terms
Related Work (cont.)
● Bontas et al. (2005) identify the following challenges in ontology re-use:
– Automated translation of source ontologies into a common KR format
– Customization of the source ontology
– Performance challenges of large medical ontologies
Related Work (cont.)
● d'Aquin et al. (2006)
– Use a modularization algorithm based on a traversal paradigm
– Describe 3 generic steps of dynamic knowledge selection algorithms:
● Selection of relevant ontologies
● Modularization via an algorithm
● Merging of ontology modules in a meaningful way
– Claim all entailments are preserved but do not demonstrate how this is guaranteed
Modularization
● Move to introduction (single bullet item)
● The size of major medical ontologies prohibits the use of deductive reasoning
● In addition, and more relevant here, their size is a significant challenge to terminology management
● Ontology modularization is a blossoming field in logic engineering
Deductive, Conservative Extensions
● Grau et al. (2008) define a formal relationship between DL ontologies: deductive, conservative extension
● Use case: we are developing ontology P and want to re-use a set of symbols from ontology Q without changing their meaning
● If the symbols they have in common are re-used in this way then:
– P + Q is a conservative extension of Q
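The relationship above can be stated a little more precisely. The following is a sketch of the usual formulation, where the signature S (the set of re-used symbols) and the function Sig(·), which returns the symbols occurring in an axiom, are notation assumed here and not taken from the slides:

```latex
\[
P \cup Q \text{ is a deductive conservative extension of } Q \text{ w.r.t. } S
\quad\Longleftrightarrow\quad
\forall \alpha \text{ with } \mathrm{Sig}(\alpha) \subseteq S:\;
\bigl( P \cup Q \models \alpha \;\Leftrightarrow\; Q \models \alpha \bigr)
\]
```

Read informally: adding P does not change which statements about the shared symbols can be derived.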
Module
● When answering a query involving terms in O (its signature or vocabulary), importing O'1 should give the same answers as if O' had been imported instead:
– O'1 is a more manageable fragment of O'
● Then we say O'1 is a module for O in O'
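The module condition can be illustrated with a toy check. This is only a sketch under strong assumptions: ontologies are reduced to sets of atomic subsumption pairs, entailment is reduced to transitive closure, and every class name and helper function below is hypothetical. Real DL module extraction is far more involved.

```python
from itertools import product

def closure(axioms):
    """Transitive closure of a set of (subclass, superclass) pairs."""
    entailed = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(entailed), repeat=2):
            if b == c and (a, d) not in entailed:
                entailed.add((a, d))
                changed = True
    return entailed

def terms(axioms):
    """Signature of an ontology: every class name it mentions."""
    return {t for pair in axioms for t in pair}

def answers(ontology, imported):
    """Subsumptions between terms of `ontology` after importing `imported`."""
    signature = terms(ontology)
    return {(a, b) for a, b in closure(ontology | imported)
            if a in signature and b in signature}

def is_module_for(candidate, full, ontology):
    """True if importing `candidate` answers like importing all of `full`."""
    return (candidate <= full
            and answers(ontology, candidate) == answers(ontology, full))

# Hypothetical data: O is our ontology, O_full the large source ontology.
O = {("ApneaSite", "Pharynx"), ("RespiratoryTract", "StructureOfInterest")}
O_full = {("Pharynx", "UpperAirway"), ("UpperAirway", "RespiratoryTract"),
          ("Femur", "Bone")}
O_module = {("Pharynx", "UpperAirway"), ("UpperAirway", "RespiratoryTract")}
O_too_small = {("Pharynx", "UpperAirway")}

print(is_module_for(O_module, O_full, O))
print(is_module_for(O_too_small, O_full, O))
```

In this toy setting the fragment without the unrelated Femur axiom still answers every query over O's terms, while the smaller fragment loses the link from Pharynx to RespiratoryTract.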
Materials
● SNOMED-CT
● FMA
● Common anatomy signature
Materials
● There is a reasonable consensus around two reference ontologies in clinical medicine
– SNOMED-CT and the Foundational Model of Anatomy (FMA)
● Both leverage an underlying formal knowledge representation
SNOMED-CT
● A comprehensive terminological framework for clinical documentation and reporting.
● Comprises about half a million concepts:
– Clinical findings, procedures, body structures, organisms, substances, pharmaceutical products, specimens, quantitative measures, and clinical situations
● Has an underlying description logic (EL)
– EL has been proven to be suitable for medical terminology
SNOMED-CT Challenges
● Its size deters the use of logical inference systems to manage and process it (due to performance issues)
● Most description logic systems run into challenges with memory exhaustion when classifying it in its entirety
● In some cases, its definitions are inconsistent or incomplete
● However, it is the de facto reference for clinical terminology
SNOMED-CT SEP Triplets
● SNOMED-CT uses SEP triplets to model anatomy concepts and their relationships to each other
● For every proper SNOMED-CT anatomy concept (an Entire class), there are two auxiliary classes:
– A Structure class
– A Part class
● Main motivation is to rely on subsumption to reason about part-whole relationships
SEP Triplets
Example: Lower respiratory tract structure (part), Structure of respiratory system (structure), Entire respiratory system (entire)
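The way a triplet lets subsumption stand in for part-whole reasoning can be sketched in a few lines. The encoding below follows the usual description of SEP triplets (Entire and Part both specialize Structure, and "X is part of Y" is written as "Structure of X is-a Part of Y"). The class labels and helper names are hypothetical and are not actual SNOMED-CT identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SEPTriplet:
    """The three classes generated for one anatomical concept."""
    name: str

    @property
    def structure(self):
        return f"{self.name} (structure)"

    @property
    def entire(self):
        return f"{self.name} (entire)"

    @property
    def part(self):
        return f"{self.name} (part)"

def sep_axioms(triplet):
    """Within a triplet, Entire and Part are both subclasses of Structure."""
    return {(triplet.entire, triplet.structure),
            (triplet.part, triplet.structure)}

def part_of_axiom(part, whole):
    """Encode 'part is part of whole' purely as an is-a link."""
    return (part.structure, whole.part)

def subsumers(cls, axioms):
    """All classes reachable from `cls` by following is-a links upward."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in axioms:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found

# Hypothetical example: the lung is part of the respiratory system.
lung = SEPTriplet("Lung")
resp = SEPTriplet("Respiratory system")
axioms = sep_axioms(lung) | sep_axioms(resp) | {part_of_axiom(lung, resp)}

lung_ancestors = subsumers(lung.entire, axioms)
print(sorted(lung_ancestors))
```

Here the entire lung ends up subsumed by the respiratory system's Structure class but not by its Entire class, which is the distinction the scheme is meant to capture.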
Foundational Model of Anatomy
● Aims to conceptualize the physical objects and spaces that constitute the human body
● Leverages a frame-based knowledge representation to formulate over 75,000 concepts including:
– Macroscopic, microscopic, and sub-cellular canonical anatomy
● Anatomy is fundamental to biomedical domains
FMA (cont.)
● Concepts are connected by several mereological relations
● Primarily concerned with part_of and has_part
● Adheres to a strict, Aristotelian modeling paradigm
– Ensures definitions are consistent and state the essence of anatomy in terms of their characteristics
● We use a 2006 OWL translation from the version in the OBO Foundry
Common Anatomy Signature
● There is a significant overlap between anatomy terms in SNOMED-CT and FMA
● Bodenreider and Zhang (2006) analyzed this overlap
● Leveraged lexical and structural analysis
● Identified ~7500 common concepts
– Referred to as Sanatomy
● Key to the general applicability of our method within the domain of clinical medicine
Normal Forms
● Similarly, the SNOMED-CT manual describes methods for generating normal forms
● Canonical forms composed of maximally decomposed logical expressions
– Entailments from full SNOMED-CT still follow from normal forms
● Useful for comparing post-coordinated expressions during retrieval or analysis of data
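Why a normal form helps when comparing expressions can be shown with a toy normalizer. This is a deliberate simplification and not the procedure from the SNOMED-CT manual: it ignores role groups and redundancy removal, and the concept definitions in `DEFINED` are invented for illustration only.

```python
# Hypothetical toy terminology: each defined concept lists its parents and
# its defining attribute-value pairs. Concepts absent from DEFINED are
# treated as primitive.
DEFINED = {
    "Obstructive sleep apnea": (
        {"Sleep apnea"},
        {("finding site", "Upper airway")},
    ),
    "Sleep apnea": (
        {"Disorder"},
        {("interprets", "Respiration during sleep")},
    ),
}

def normalize(focus_concepts, refinements=frozenset()):
    """Expand defined concepts until only primitives and attributes remain."""
    primitives, attrs = set(), set(refinements)
    pending = list(focus_concepts)
    while pending:
        concept = pending.pop()
        if concept in DEFINED:
            parents, defining_attrs = DEFINED[concept]
            pending.extend(parents)
            attrs |= defining_attrs
        else:
            primitives.add(concept)
    return frozenset(primitives), frozenset(attrs)

# A pre-coordinated concept and a post-coordinated expression that should
# mean the same thing in this toy terminology.
pre_coordinated = normalize({"Obstructive sleep apnea"})
post_coordinated = normalize({"Sleep apnea"},
                             {("finding site", "Upper airway")})

print(pre_coordinated == post_coordinated)
```

Once both expressions are fully decomposed, equivalence reduces to comparing two sets, which is what makes normal forms convenient during retrieval.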
Methods
● Start with a list of user-specified SNOMED-CT concepts
– Determines the domain
● Three-step process resulting in
– A SNOMED-CT module: O'snct-fma
– Transliteration of SEP triplets
– An FMA segment: O'fma-snct
● Segmentation heuristic
● Directly merge into a single ontology
Core Procedure
● Extract normal forms from SNOMED-CT
● SNOMED-CT anatomy terms in Sanatomy that are reached during the extraction are replaced and used as seeds to extract a segment from the FMA
● Axioms involving SNOMED-CT anatomy terms in Sanatomy and the terms themselves are replaced such that they preserve the intent of the SEP triplet scheme using FMA terms
Segmentation Heuristic
● Seidenberg and Rector (2006) describe an ontology segmentation heuristic that starts with a set of terms and creates an extract from an ontology around those terms
– Traverses ontology structure and is limited by user-specified recursion depth
● Inspiration for modularization algorithm of d'Aquin et al. (2006)
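A traversal-based segmentation of this kind can be sketched as follows. This is not Seidenberg and Rector's algorithm. It is a minimal illustration that assumes the depth limit applies to hops across property links, that the class hierarchy is followed upward only, and that the ontology is given as two plain dictionaries. All class names are hypothetical.

```python
from collections import deque

def extract_segment(seeds, superclasses, links, max_depth):
    """Collect seed terms, their ancestors, and linked terms up to max_depth.

    superclasses: term -> list of direct superclasses (followed upward only)
    links:        term -> list of terms reached through property restrictions
    Returns the segment and the terms whose links leave the segment.
    """
    best_depth = {}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        term, depth = queue.popleft()
        # Skip terms already visited at the same or a shallower link depth.
        if best_depth.get(term, max_depth + 1) <= depth:
            continue
        best_depth[term] = depth
        for parent in superclasses.get(term, []):
            queue.append((parent, depth))        # is-a hops do not add depth
        if depth < max_depth:
            for target in links.get(term, []):
                queue.append((target, depth + 1))
    segment = set(best_depth)
    periphery = {t for t in segment
                 if any(x not in segment for x in links.get(t, []))}
    return segment, periphery

# Hypothetical anatomy fragment.
superclasses = {"Pharynx": ["Organ"], "Organ": ["Anatomical structure"],
                "Upper airway": ["Anatomical structure"]}
links = {"Pharynx": ["Upper airway"],
         "Upper airway": ["Respiratory system"],
         "Respiratory system": ["Body"]}

segment, periphery = extract_segment(["Pharynx"], superclasses, links, 1)
print(sorted(segment), sorted(periphery))
```

Terms reported in `periphery` are the natural candidates for a later, deeper extraction, since they still point at classes outside the segment.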
Seidenberg and Rector (2006)
Segments vs. Modules
● The segmentation heuristic we use contrasts with the approach of Grau et al. (2008), which produces modules with 100% semantic fidelity
● We sacrifice semantic fidelity for an expedient extraction process
● The (tractable) calculation of deductive, conservative extensions for EL is an open research problem
● Or at the very least a challenging problem
Reifying SEP triplets
● Need to replace SNOMED-CT anatomy terms in a way that preserves the intent of the SEP anatomy scheme
● Transcribe them into a more expressive description logic
● Define a set of rules to determine how axioms involving mapped SNOMED-CT terms are replaced
● Schulz et al. (1998) describe how to logically identify components of an SEP triplet
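One plausible shape for such replacement rules is sketched below. It assumes the common logical reading of a triplet, in which the Entire class maps to the anatomy class itself, the Part class to "part_of some X", and the Structure class to the disjunction of the two. The slides do not spell out the rules, so this mapping, the function name `reify`, the Manchester-style rendering, and the identifier `fma:Cerebral_cortex` are all illustrative assumptions.

```python
def reify(kind, fma_class, part_of="part_of"):
    """Render a class expression replacing one member of an SEP triplet."""
    if kind == "entire":
        return fma_class
    if kind == "part":
        return f"({part_of} some {fma_class})"
    if kind == "structure":
        # The whole itself, or anything that is a part of it.
        return f"({fma_class} or ({part_of} some {fma_class}))"
    raise ValueError(f"unknown SEP kind: {kind!r}")

# A finding site that referred to a Structure class in SNOMED-CT.
site = reify("structure", "fma:Cerebral_cortex")
print(f"finding_site some {site}")
```

The disjunction in the Structure case is what pushes the result beyond the expressivity of EL.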
Definitions
● Terms:
– Osnomed is the short normal form of SNOMED-CT starting from a user-specified term set
● Anatomy module for a clinical domain
– O'snct-fma is a module for Osnomed in Ofma with respect to Sanatomy
● Clinical domain module for anatomy
– O'fma-snct is a module for Ofma in Osnomed with respect to Sanatomy
Results
● The applied domain
– Sleep studies (Polysomnograms)
● Quantitative analysis
– With and without the use of normal forms
● Example
● How the goals were met
● Advantages
● Challenges
Analysis
● Results:
– 825 (718) classes in O'snct-fma
– 901 (648) classes in O'fma-snct
– 81 (53) SNOMED-CT anatomy concepts in Sanatomy were reached
– 43 (35) were structures, 37 (17) were entire classes, and one was a part
*Numbers in parentheses are the counts within the normal form
Analysis (cont.)
● Of the 366 (85) disorders and procedures, 23 (4) were cross-boundary definitions
● 266 (232) FMA classes were at the periphery of the segment extraction heuristic
● Candidates for subsequent FMA extraction
– Incrementally expand the domain by connections to related parts of human anatomy
SEP Reification Example
● In SNOMED-CT, Corticobasal Degeneration is a disorder that has (as its finding sites):
– Cerebral cortex (structure)
– Basal ganglion (structure)
● As a result of the SEP reification, it is defined as follows
Achieving the Goals
Goal
1. Identify and fill gaps in clinical terminology
2. Use canonical, normalized representations
3. Have sufficient expressive power
4. Re-use the FMA
Approach
1. Allow an informatician to seed and control the extraction
2. Take advantage of normal form transformations
3. Leverage a more expressive KR
4. Use a set of rules to reify SEP triplets
Advantages
● We further demonstrate the general value of ontology segmentation within the context of biomedical terminology
● Address the challenge of managing terminology and filling in gaps using reference ontologies in a coordinated way
● The use of a more expressive DL to reify SEP triplets is similar to the approach of Suntisrivaraporn (2007)
– We use terms from a reference ontology of anatomy
FMA Enrichment
● Provides partitive axioms that connect the cerebral cortex to 100 other subordinate anatomical entities
Advantages (cont.)
● O'snct-fma is a deductive, conservative extension of its combination with O'fma-snct
– Every inclusion axiom involving FMA terms alone in the combination also holds in FMA as a whole
– The reification process takes advantage of the fidelity of the SNOMED-CT to FMA mappings
● Any application that uses the FMA can still use the combination without loss of meaning of the FMA terms
Challenges
● The use of the disjunction operator introduces the need for a more expressive description logic than EL++
● Subsumption links are only traversed upwards from target terms
– Found that downward traversal significantly impacts the size of the segment
Cross-module Definitions
● SNOMED-CT concepts in O'snct-fma defined by role restrictions where the filler class involves anatomy terms in Sanatomy
● These embody the kinds of explicit definitions that normal forms attempt to facilitate
● In some cases, the definitions are enriched due to connections to FMA
– Resulting in richer entailment
Conclusion (cont.)
● However for an application that uses SNOMED-CT, the same disease may have 2 sites where one is a SNOMED-CT concept and the other is an FMA concept.