Segmenting & Merging Domain-specific Modules for Clinical Informatics

Page 1: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Segmenting & Merging Domain-specific Modules for Clinical Informatics

Chimezie Ogbuji Cleveland Clinic & Case Western Reserve University

Sivaram Arabandi Case Western Reserve University

Songmao Zhang Chinese Academy of Sciences

Guo-Qiang Zhang Case Western Reserve University

Page 2: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Introduction

● What are we doing and why are we doing it?

– Generally

– Specifically

● What are the criteria for success?

● What are the existing best practices and well-documented challenges of ontology re-use?

Page 3: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Introduction

● Construct domain-specific ontologies to support data curation and ongoing clinical research activity

● PhysioMIMI is an informatics infrastructure for collection, management, and analysis of sleep-related data

● Our method was used to bootstrap a Sleep Domain Ontology (SDO)

Page 4: Segmenting & Merging Domain-specific Modules for Clinical Informatics
Page 5: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Goal / Criteria for Success

● Want to (automatically):

– Generate anatomy and clinical terminology modules that make use of principled normal forms, are minimal in size, and preserve the meaning of re-used symbols

– As much as is computationally feasible

– Be able to facilitate the customization of a large source ontology such as SNOMED-CT

● Provide a framework for bootstrapping terminology for a specific domain

Page 6: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Desiderata for Clinical Terminology

● There is a critical need for formal, reproducible methods for recognizing and filling gaps in medical terminologies (Cimino 1998)

● Clinical terminology systems need to extend smoothly and quickly in response to the needs of users (Rector 1999)

– A fixed, enumerated list of concepts can never be complete and results in a combinatorial explosion of terms (exhaustive pre-coordination)

Page 7: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Desiderata (cont.)

● Post-coordination is a contrasting approach where a set of atomic concepts is used to create new terms on demand rather than a priori

● Rector (2003) proposed a set of normalization criteria and an approach for decomposing and recombining disjoint, homogeneous taxonomies

● Goal is for trees of primitive terms to serve as a terminological framework that minimizes implicit differentia

– Discrete coordinate system

Page 8: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Background

● Related efforts regarding

– Ontology merging

– Ontology modularization

● Review formalisms for ontology modularization

– What is a deductive, conservative extension?

– What is a module?

● What is the difference between a segment and a module?

Page 9: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Related Work

● Noy and Musen (2000)

– Discuss how to either automate the merging and alignment or guide the user, suggesting conflicts and actions to take

– Rely on lexical matching of term names

● Bontas and Tolksdorf (2005)

– Similar goal to Noy & Musen

– User provides a list of term matches between source & target

– Follow semantic connections from these terms

Page 10: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Related Work (cont.)

● Bontas et al. (2005) identify the following challenges in ontology re-use:

– Automated translation of source ontologies into a common KR format

– Customization of source ontology

– Performance challenges of large medical ontologies

Page 11: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Related Work (cont.)

● d'Aquin et al. (2006)

– Use a modularization algorithm based on a traversal paradigm

– Describe 3 generic steps of dynamic knowledge selection algorithms:

    ● Selection of relevant ontologies

    ● Modularization via an algorithm

    ● Merging of ontology modules in a meaningful way

– Claim all entailments are preserved but do not demonstrate how this is guaranteed

Page 12: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Modularization

● Move to introduction (single bullet item)

● The size of major medical ontologies is prohibitive to the use of deductive reasoning

● In addition and more relevant here, their size is a significant challenge to terminology management

● Ontology modularization is a blossoming field in logic engineering

Page 13: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Deductive, Conservative Extensions

● Grau et al. (2008) define a formal relationship between DL ontologies: deductive, conservative extension

● Use case: we are developing ontology P and want to re-use a set of symbols from ontology Q without changing their meaning

● If the symbols they have in common are re-used in this way then:

– P + Q is a conservative extension of Q

Page 14: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Module

● When answering a query involving terms in O (its signature or vocabulary), importing O'1 should give the same answers as if O' had been imported instead:

– O'1 is a more manageable fragment of O'

● Then we say O'1 is a module for O in O'
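
The two notions above can be written out formally. The following is a sketch of the definitions as they are commonly stated, following Grau et al. (2008); the symbols S (a signature), α (an axiom), and Sig(·) (the set of symbols an axiom or ontology uses) are notation assumed here and do not appear on the slides.

```latex
% Deductive, conservative extension: O_1 is a subset of O, S is a signature.
\[
O \text{ is a deductive } S\text{-conservative extension of } O_1
\;\iff\;
\text{for every axiom } \alpha \text{ with } \mathrm{Sig}(\alpha) \subseteq S:\quad
O \models \alpha \iff O_1 \models \alpha
\]

% Module: O'_1 is a subset of O'.
\[
O'_1 \text{ is a module for } O \text{ in } O'
\;\iff\;
O \cup O' \text{ is a deductive } S\text{-conservative extension of } O \cup O'_1
\text{ for } S = \mathrm{Sig}(O)
\]
```

Read this way, importing the fragment O'1 instead of all of O' changes no entailment that can be phrased in the vocabulary of O.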

Page 15: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Materials

● SNOMED-CT

● FMA

● Common anatomy signature

Page 16: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Materials

● There is a reasonable consensus around two reference ontologies in clinical medicine

– SNOMED-CT and the Foundational Model of Anatomy (FMA)

● Both leverage an underlying formal knowledge representation

Page 17: Segmenting & Merging Domain-specific Modules for Clinical Informatics

SNOMED-CT

● A comprehensive terminological framework for clinical documentation and reporting.

● Comprises about half a million concepts:

– Clinical findings, procedures, body structures, organisms, substances, pharmaceutical products, specimens, quantitative measures, and clinical situations

● Has an underlying description logic (EL)

– EL has been proven to be suitable for medical terminology

Page 18: Segmenting & Merging Domain-specific Modules for Clinical Informatics

SNOMED-CT Challenges

● Its size deters the use of logical inference systems to manage and process it (due to performance issues)

● Most description logic systems run into challenges with memory exhaustion when classifying it in its entirety

● In some cases, its definitions are inconsistent or incomplete

● However, it is the de facto reference for clinical terminology

Page 19: Segmenting & Merging Domain-specific Modules for Clinical Informatics

SNOMED-CT SEP Triplets

● SNOMED-CT uses SEP triplets to model anatomy concepts and their relationships to each other

● For every proper SNOMED-CT anatomy concept (an Entire class), there are two auxiliary classes:

– A Structure class

– A Part class

● Main motivation is to rely on subsumption to reason about part-whole relationships

Page 20: Segmenting & Merging Domain-specific Modules for Clinical Informatics

SEP Triplets

Example: Lower respiratory tract structure (part), Structure of respiratory system (structure), Entire respiratory system (entire)
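
A minimal sketch of how SEP triplets let subsumption stand in for part-whole reasoning. It assumes the usual SEP reading, in which both the Entire and the Part class sit under the Structure class and "x is part of y" is encoded by placing x's Structure class under y's Part class. The `SEPTriplet` container, the helper functions, and the class labels are hypothetical and are not actual SNOMED-CT identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SEPTriplet:
    """Hypothetical container for the three classes of one anatomy concept."""
    structure: str
    entire: str
    part: str

    def is_a_edges(self):
        # Both the Entire and the Part class are subsumed by the Structure class.
        return {(self.entire, self.structure), (self.part, self.structure)}


def subsumed_by(sub, sup, is_a):
    """Reflexive, transitive subsumption check over (child, parent) edges."""
    seen, stack = set(), [sub]
    while stack:
        cls = stack.pop()
        if cls == sup:
            return True
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(parent for child, parent in is_a if child == cls)
    return False


def is_part_of(x, y, is_a):
    """Assumed SEP encoding: x is part of y if x's Structure is under y's Part."""
    return subsumed_by(x.structure, y.part, is_a)


# Illustrative labels only.
respiratory = SEPTriplet("Structure of respiratory system",
                         "Entire respiratory system",
                         "Respiratory system part")
lung = SEPTriplet("Lung structure", "Entire lung", "Lung part")

is_a = respiratory.is_a_edges() | lung.is_a_edges()
is_a.add((lung.structure, respiratory.part))  # encodes "lung part-of respiratory system"

print(is_part_of(lung, respiratory, is_a))
print(is_part_of(respiratory, lung, is_a))
```

The part-whole question is answered purely by walking is-a edges, which is what allows a classifier without dedicated part-of reasoning to handle it.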

Page 21: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Foundational Model of Anatomy

● Has a goal to conceptualize the physical objects and spaces that constitute the human body

● Leverages a frame-based knowledge representation to formulate over 75,000 concepts including:

– Macroscopic, microscopic, and sub-cellular canonical anatomy

● Anatomy is fundamental to biomedical domains

Page 22: Segmenting & Merging Domain-specific Modules for Clinical Informatics

FMA (cont.)

● Concepts are connected by several mereological relations

● Primarily concerned with part_of and has_part

● Adheres to a strict, Aristotelian modeling paradigm

– Ensures definitions are consistent and state the essence of anatomy in terms of their characteristics

● Using a 2006 OWL translation from the version in the OBO Foundry

Page 23: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Common Anatomy Signature

● There is a significant overlap between anatomy terms in SNOMED-CT and FMA

● Bodenreider and Zhang (2006) analyzed this overlap

● Leveraged lexical and structural analysis

● Identified ~7500 common concepts

– Referred to as Sanatomy

● Key to the general applicability of our method within the domain of clinical medicine

Page 24: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Normal Forms

● Similarly, the SNOMED-CT manual describes methods for generating normal forms

● Canonical forms comprised of maximally decomposed logical expressions

– Entailments from full SNOMED-CT still follow from normal forms

● Useful for comparing post-coordinated expressions during retrieval or analysis of data

Page 25: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Methods

● Start with a list of user-specified SNOMED-CT concepts

– Determines the domain

● 3 step process resulting in

– A SNOMED-CT module: O'snct-fma

– Transliteration of SEP triplets

– An FMA segment: O'fma-snct

● Segmentation heuristic

● Directly merge into a single ontology

Page 26: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Core Procedure

● Extract normal forms from SNOMED-CT

● SNOMED-CT anatomy terms in Sanatomy that are reached during the extraction are replaced and used as seeds to extract a segment from the FMA

● Axioms involving SNOMED-CT anatomy terms in Sanatomy, and the terms themselves, are replaced such that they preserve the intent of the SEP triplet scheme using FMA terms

Page 27: Segmenting & Merging Domain-specific Modules for Clinical Informatics
Page 28: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Segmentation Heuristic

● Seidenberg and Rector (2006) describe an ontology segmentation heuristic that starts with a set of terms and creates an extract from an ontology around those terms

– Traverses ontology structure and is limited by user-specified recursion depth

● Inspiration for modularization algorithm of d'Aquin et al. (2006)
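
A minimal sketch of a traversal-based segmentation heuristic in this spirit, not the published algorithm. It assumes a toy ontology held as a dictionary, that subsumption links are followed upwards only and without limit, and that the user-specified depth applies to property links such as part_of. The function name, the dictionary layout, and the class names are all hypothetical.

```python
from collections import deque


def extract_segment(ontology, seeds, max_depth):
    """Collect the classes reached from the seed terms.

    ontology:  {class: {"parents": [...], "links": [...]}}  (assumed layout)
    max_depth: how many property links may be followed away from a seed
    """
    best_depth = {}  # class -> smallest link depth at which it was reached
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        cls, depth = queue.popleft()
        if cls not in ontology:
            continue
        if cls in best_depth and best_depth[cls] <= depth:
            continue  # already expanded at least this cheaply
        best_depth[cls] = depth
        entry = ontology[cls]
        # Upward subsumption traversal does not consume the depth budget (assumption).
        for parent in entry.get("parents", ()):
            queue.append((parent, depth))
        # Property links are followed only while the depth budget lasts.
        if depth < max_depth:
            for target in entry.get("links", ()):
                queue.append((target, depth + 1))
    return set(best_depth)


toy = {
    "Cerebral cortex": {"parents": ["Organ part"], "links": ["Cerebrum"]},
    "Gyrus": {"parents": ["Cerebral cortex"], "links": []},  # a subclass; never visited
    "Cerebrum": {"parents": ["Organ"], "links": ["Brain"]},
    "Brain": {"parents": ["Organ"], "links": ["Head"]},
    "Head": {"parents": ["Body part"], "links": []},
    "Organ part": {"parents": [], "links": []},
    "Organ": {"parents": [], "links": []},
    "Body part": {"parents": [], "links": []},
}

print(sorted(extract_segment(toy, ["Cerebral cortex"], max_depth=1)))
```

Raising `max_depth` grows the segment one link at a time, so the classes that sit at the depth limit are natural candidates for a later, wider extraction.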

Page 29: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Seidenberg and Rector (2006)

Page 30: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Segments vs. Modules

● The segmentation heuristic we use is in contrast to those of Grau et al. (2008) that produce modules with 100% semantic fidelity

● Sacrifice semantic fidelity for an expedient extraction process

● The (tractable) calculation of deductive, conservative extensions for EL is an open research problem

● Or at the very least a challenging problem

Page 31: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Reifying SEP triplets

● Need to replace SNOMED-CT anatomy terms in a way that preserves the intent of the SEP anatomy scheme

● Transcribe them into a more expressive description logic

● Define a set of rules to determine how axioms involving mapped SNOMED-CT terms are replaced

● Schulz et al. (1998) describe how to logically identify components of an SEP triplet
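
A minimal sketch of what such replacement rules could look like. The slides do not spell the rules out, so this assumes the standard logical reading of an SEP triplet: the Entire class is the anatomical entity itself, the Part class is anything that is part of it, and the Structure class is the disjunction of the two. The `reify` function, the Manchester-style output strings, and the term labels are hypothetical.

```python
def reify(kind, fma_term, part_of="part_of"):
    """Return a class expression, over an FMA term, for one SEP component.

    kind is "entire", "part", or "structure" (assumed classification of the
    mapped SNOMED-CT anatomy term).
    """
    part_expr = f"({part_of} some {fma_term})"
    if kind == "entire":
        return fma_term
    if kind == "part":
        return part_expr
    if kind == "structure":
        # Structure = Entire or Part: this is where disjunction enters.
        return f"({fma_term} or {part_expr})"
    raise ValueError(f"unknown SEP component: {kind!r}")


def rewrite_restriction(role, snomed_term, mapping):
    """Replace the filler of an existential role restriction when it is mapped."""
    if snomed_term not in mapping:
        return f"{role} some {snomed_term}"
    kind, fma_term = mapping[snomed_term]
    return f"{role} some {reify(kind, fma_term)}"


# Illustrative mapping: SNOMED-CT label -> (SEP component, FMA label).
mapping = {
    "Cerebral cortex structure": ("structure", "Cerebral_cortex"),
    "Basal ganglion structure": ("structure", "Basal_ganglion"),
}

print(rewrite_restriction("finding_site", "Cerebral cortex structure", mapping))
```

Under this reading, a finding site given as a Structure class becomes "the FMA entity or any of its parts", which keeps the intent of the SEP scheme while using FMA terms.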

Page 32: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Definitions

● Terms:

– Osnomed is the short normal form of SNOMED-CT starting from a user-specified term set

● Anatomy module for a clinical domain

– O'snct-fma is a module for Osnomed in Ofma with respect to Sanatomy

● Clinical domain module for anatomy

– O'fma-snct is a module for Ofma in Osnomed with respect to Sanatomy

Page 33: Segmenting & Merging Domain-specific Modules for Clinical Informatics
Page 34: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Results

● The applied domain

– Sleep studies (Polysomnograms)

● Quantitative analysis

– With and without the use of normal forms

● Example

● How the goals were met

● Advantages

● Challenges

Page 35: Segmenting & Merging Domain-specific Modules for Clinical Informatics
Page 36: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Analysis

● Results:

– 825 (718) classes in O'snct-fma

– 901 (648) classes in O'fma-snct

– 81 (53) SNOMED-CT anatomy concepts in Sanatomy were reached

– 43 (35) were structures, 37 (17) were entire parts, one was a part

*Numbers in parentheses are within the normal form

Page 37: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Analysis (cont.)

● Of the 366 (85) disorders and procedures, 23 (4) were cross-boundary definitions

● 266 (232) FMA classes were at the periphery of the segment extraction heuristic

● Candidates for subsequent FMA extraction

– Incrementally expand the domain by connections to related parts of human anatomy

Page 38: Segmenting & Merging Domain-specific Modules for Clinical Informatics

SEP Reification Example

● In SNOMED-CT, Corticobasal Degeneration is a disorder that has (as its finding sites):

– Cerebral cortex (structure)

– Basal ganglion (structure)

● As a result of the SEP reification, it is defined as follows

Page 39: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Achieving the Goals

Each goal, with the approach taken to meet it:

1. Goal: Identify and fill gaps in clinical terminology. Approach: Allow an informatician to seed and control the extraction.

2. Goal: Use canonical, normalized representations. Approach: Take advantage of normal form transformations.

3. Goal: Have sufficient expressive power. Approach: Leverage a more expressive KR.

4. Goal: Re-use the FMA. Approach: Use a set of rules to reify SEP triplets.

Page 40: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Advantages

● We further demonstrate the general value of ontology segmentation within the context of biomedical terminology

● Address the challenge of managing terminology and filling in gaps using reference ontologies in a coordinated way

● The use of a more expressive DL to reify SEP triplets is similar to the approach of Suntisrivaraporn (2007)

– We use terms from a reference ontology of anatomy

Page 41: Segmenting & Merging Domain-specific Modules for Clinical Informatics

FMA Enrichment

● Provides partitive axioms that connect the cerebral cortex to 100 other subordinate anatomical entities

Page 42: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Advantages (cont.)

● O'snct-fma is a deductive, conservative extension of its combination with O'fma-snct

– Every inclusion axiom involving FMA terms alone in the combination also holds in FMA as a whole

– The reification process takes advantage of the fidelity of the SNOMED-CT to FMA mappings

● Any application that uses the FMA can still use the combination without loss of meaning of the FMA terms

Page 43: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Challenges

● The use of the disjunction operator introduces the need for a more expressive description logic than EL++

● Subsumption links are only traversed upwards from target terms

– Found that downward traversal significantly impacts the size of the segment

Page 44: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Cross-module Definitions

● SNOMED-CT concepts in O'snct-fma defined by role restrictions where the filler class involves anatomy terms in Sanatomy

● These embody the kinds of explicit definitions that normal forms attempt to facilitate

● In some cases, the definitions are enriched due to connections to FMA

– Resulting in richer entailment

Page 45: Segmenting & Merging Domain-specific Modules for Clinical Informatics

Conclusion (cont.)

● However for an application that uses SNOMED-CT, the same disease may have 2 sites where one is a SNOMED-CT concept and the other is an FMA concept.