Exploiting Semantic Structure for Mapping Clinician-specified Form Terms to SNOMED CT Concepts

Ritu Khare (1,3), Yuan An (3), Jiexun Li (3), Il-Yeol Song (3), Xiaohua Hu (3), Michele Follen (1,2)

College of Medicine Center for Women's Health Research (1) and Obstetrics and Gynecology (2); College of Information Science and Technology (3)

Motivation, Problem, and Challenges

The elements of clinical databases are usually named after the clinical terms used in various design artifacts. These terms are instinctively supplied by the users, and hence different users often use different terms to describe the same clinical concept. This term diversity makes future database integration and analysis a huge challenge.

Structure-based SNOMED CT Mapping Framework

[Fig. 3 diagram. Input: Form Term (terms in clinical forms). Output: SNOMED CT Concept (mapping/standardization). Components: Form; Semantic Information Extraction; Semantic Form Tree; Semantic Structure Analyzer; Structure-based Classification Model (with Training Data); Semantic Category Picker (configurable); SNOMED CT Category-Specific Mapping (API).]

Fig. 3. Overall Mapping Framework: (1) The form tree structure is analyzed to derive the form context, (2) The classification model (Naïve Bayes) ranks the SNOMED CT semantic categories suitable for the form context, (3) A category is picked, (4) The most linguistically matching concept in this category is selected as the winner concept.
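
The four steps in the Fig. 3 caption could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny form tree and its category tags, the context features (parent and sibling categories), the training examples, and every function name are hypothetical. String similarity from Python's difflib stands in for the linguistic matching, and a hand-written Naive Bayes with Laplace smoothing stands in for the classification model. Only the three concept entries are taken from the poster (the top results for the search term Eyes).

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional

# The three concepts the poster lists for the search term "Eyes".
CONCEPTS = [
    ("63342001", "Sunsetting eyes", "Finding"),
    ("371110006", "Immature eyes", "Disorder"),
    ("362508001", "Both eyes, entire", "Body Structure"),
]

# Hypothetical training data: (context features, category of the term).
TRAINING = [
    (["parent=Procedure", "sibling=Body Structure"], "Body Structure"),
    (["parent=Procedure", "sibling=Observable Entity"], "Body Structure"),
    (["parent=Finding", "sibling=Finding"], "Finding"),
    (["parent=Finding", "sibling=Disorder"], "Disorder"),
]


@dataclass
class FormNode:
    """One node of a (hypothetical) semantic form tree."""
    label: str
    category: Optional[str] = None
    children: List["FormNode"] = field(default_factory=list)
    parent: Optional["FormNode"] = field(default=None, repr=False)

    def add(self, child: "FormNode") -> "FormNode":
        child.parent = self
        self.children.append(child)
        return child


def context_features(node):
    """Step 1: describe a node by its parent's and siblings' categories."""
    features = [f"parent={node.parent.category}"]
    for sibling in node.parent.children:
        if sibling is not node and sibling.category:
            features.append(f"sibling={sibling.category}")
    return features


def train_naive_bayes(examples):
    """Collect class counts, per-class feature counts, and the vocabulary."""
    class_counts, feature_counts, vocab = Counter(), defaultdict(Counter), set()
    for features, category in examples:
        class_counts[category] += 1
        feature_counts[category].update(features)
        vocab.update(features)
    return class_counts, feature_counts, vocab


def rank_categories(model, features):
    """Step 2: rank categories by Naive Bayes log-score (Laplace smoothing)."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for category, count in class_counts.items():
        denom = sum(feature_counts[category].values()) + len(vocab)
        scores[category] = math.log(count / total) + sum(
            math.log((feature_counts[category][f] + 1) / denom) for f in features
        )
    return sorted(scores, key=scores.get, reverse=True)


def best_concept(term, category):
    """Step 4: most similar concept name within the chosen category."""
    candidates = [c for c in CONCEPTS if c[2] == category]
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, term.lower(), c[1].lower()).ratio(),
        default=None,
    )


def map_term(node, model):
    ranked = rank_categories(model, context_features(node))
    picked = ranked[0]  # Step 3: trivial picker, takes the top-ranked category
    return best_concept(node.label, picked)


model = train_naive_bayes(TRAINING)

# Same term, two hypothetical contexts.
exam = FormNode("Examination", "Procedure")
exam.add(FormNode("T", "Observable Entity"))
eyes_in_exam = exam.add(FormNode("Eyes"))

complaints = FormNode("Complaints", "Finding")
complaints.add(FormNode("Headache", "Finding"))
eyes_in_complaints = complaints.add(FormNode("Eyes"))

print(map_term(eyes_in_exam, model))
print(map_term(eyes_in_complaints, model))
```

With this toy training set, the same term Eyes is mapped to a different concept in each context, which is the behaviour the context challenge calls for. A configurable picker, as named in the diagram, could replace the `ranked[0]` line, for example by falling back to the next category when no concept in the top one matches well.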

Key Ideas

Exploit the local semantic structure of the form tree to determine the term context and the candidate SNOMED CT semantic categories.

Select a winner semantic category, and map the term to the linguistically matching concept within the determined semantic category.

[Fig. 1 form content: Patient History Form. PATIENT: Name, Gender (M/F), DOB, MRN. HISTORY: Chief Complaints; Review of Systems: Complaints, Eyes, ENMT, Respiratory.]

Fig. 1. A Sample Clinician-Designed Form

Diversity Challenge (Well Addressed)

Different clinicians specify different form terms to specify the same clinical concept, e.g., MRN or Med. Rec. #; Vital Signs, Constitutional, or Physical status.

Context Challenge (Less Explored)

The same form term, when used in different contexts, may map to different SNOMED CT concepts, e.g., the term Respiratory in Fig. 1 and 2.

How can we leverage the semantic structure of clinical forms to map the form terms into standard SNOMED CT concepts?

Preliminaries: SNOMED CT and Semantic Form Trees

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a widely used medical terminology. It comprises 360,000 clinical CONCEPTS belonging to various SEMANTIC CATEGORIES. Each concept is represented using a CONCEPT ID and a FULLY SPECIFIED NAME. A simple search for the term Eyes across the UMLS SNOMED CT browser leads to the following top results:

Concept Id    Fully-specified Name    Semantic Category
63342001      Sunsetting eyes         Finding
371110006     Immature eyes           Disorder
362508001     Both eyes, entire       Body Structure

[Fig. 2 diagram. Form: Patient Examination Form with PATIENT (Name; Gender: M, F) and EXAMINATION (T; Respiratory: symm. expan., nl perc.). Tree nodes: root; Patient; Examination; Name; Gender (M, F); T; Respiratory (Symmetric chest expansion, Normal Percussion). Category tags shown in the figure: Person, Procedure, Observable Entity, Finding, Qualifier Value.]

Fig. 2. A clinical form and its equivalent Semantic Form Tree. Each node in the tree is tagged with SNOMED CT semantic categories.

Empirical Study with Clinician-designed Forms

About the Data

The data includes 26 forms collected from 5 healthcare institutions. The forms contain over 1500 terms, of which 954 (63%) are mappable to SNOMED CT concepts.

About the Methods

BASELINE: Linguistic comparison

HYBRID: Linguistic as well as structural (contextual) comparison (see Fig. 3)

HYBRID++: Linguistic as well as advanced structural comparison

[Fig. 4 charts. Panels: Mapping Precision, Mapping Recall. Series: Baseline, Hybrid, Hybrid++. X-axis: Set1 to Set5. Bar labels as extracted (series and set assignment not recoverable): precision 0.51, 0.63, 0.64, 0.73, 0.66, 0.69, 0.89, 0.76, 0.65, 0.72, 0.89, 0.92, 0.87, 0.78, 0.84; recall 0.37, 0.52, 0.49, 0.43, 0.45, 0.43, 0.69, 0.43, 0.31, 0.43, 0.57, 0.74, 0.51, 0.43, 0.52.]

Fig. 4. Mapping Results for 3 Methods

[Fig. 5 chart. Series: Precision, Recall, Precision with Term Processing, Recall with Term Processing. X-axis: Baseline, Hybrid, Hybrid++. Y-axis ticks: 40 to 90.]

Fig. 5. Change in results with term processing, an advanced linguistic technique

Results and Contributions

Findings and Implications (R = recall, P = precision)

Finding: Improvement due to structure (Fig. 4). Hybrid over Baseline: 18% (P); 2% (R). Hybrid++ over Hybrid: 16% (P); 23% (R).
Implication: Structural knowledge has the ability to address the context challenge and improve the overall mapping performance.

Finding: Improvement due to linguistics (Fig. 5): 2-3% (P), >30% (R).
Implication: Linguistic techniques can improve the recall and address the diversity challenge to a large extent.

Conclusion

It is desirable to develop hybrid approaches that can address both challenges and lead to superior performance.

Future Work

Leverage other relationships of SNOMED CT and test with other vocabularies from the UMLS.

Test within larger frameworks of health information systems.

Apply other classification techniques and employ sophisticated linguistic techniques.

Acknowledgements

National Cancer Institute (National Biomedical Imaging Branch): Grant #P01-CA-82710-09
National Science Foundation Grants: NSF CCF 0905291, NSF CCF 1049864, and NSFC 90920005
