Automatic Mapping of Clinical Documentation to SNOMED CT Holger Stenzhorn Saarland University Hospital, Homburg, Germany Edson Pacheco Percy Nohama Stefan Schulz Freiburg University Medical Center, Germany Federal Technological University of Paraná, Brazil




Introduction Methods Results Conclusion


Background

• Important role of narrative content in the EHR

• Manual coding: cost, quality and scope problems

• Increasing demand for high-quality structured data

• SNOMED CT, as a new terminological standard, claims to represent the whole clinical process

Can language technology help semantically enrich narratives in the Electronic Health Record?


Case study

• Source:

– Discharge summaries from the cardiology department of the Hospital de Clínicas de Porto Alegre, Brazil

– Language: Portuguese

• Target:

– SNOMED Clinical Terms, 01/2009

– Languages: English, Spanish


Sample Discharge Summary

# HAS # DM # Miocardiopatia dilatada chagásica (FE 35%) # Ca de prostata - orquiectomia (2004) # Cardiopatia isquêmica - IAM em 2005, com colocação de stent em DA e lesão severa inoperável em CD Pct vem a emergência em 20/03 com quadro de dor torácica típica, sem elevação enzimática, com diagnóstico de angina instável e fibrilação atrial não identificada em avaliações prévias. Adicionalmente, apresentava descompensação do diabetes com sindrome hiperosmlar não cetótica. Recebe tratamento clínico para otimização do quadro e é submetido a novo cateterismo em 28/03, que demonstra CD ocluída no terço proximal, DA com stent rpoximal com lesão de 40% no seu interior e Mg de Cx com lesão de 60-65%. Recebe alta em bom estado geral, sem dor torácica, anticoagulado, com plano de retorno ambulatorial para equipe de cardiopatia isquêmica e para o ambulatório de anticoagulação.

• Acronyms

• Abbreviations

• Punctuation errors

• Typing errors

• Telegram style


NLP pipeline

[Figure: NLP pipeline with two tracks]

• Clinical text: sentence detection, spell checking, acronym expansion, NE recognition, POS tagging, NP extraction, context detection, morpho-semantic abstraction, MID representation of the term candidates

• SNOMED CT (SCT-EN, SCT-SP): subset creation, morpho-semantic abstraction, MID representation of SNOMED CT


Language processing tools implemented

• Sentence splitter, POS tagger: OpenNLP, trained with manually annotated texts

• Acronym expander: regular-expression matching against an acronym database, disambiguation by local context (token co-occurrence in a three-token window)

• Noun phrase detector: driven by typical POS patterns in Spanish SNOMED CT descriptions (with a few adaptations to Portuguese, due to the similarity between the two languages)

• Not yet implemented: spell checker, NE recognizer, context (e.g. negation) detector
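
The acronym expander described above could look roughly like the following Python sketch. It is not the authors' implementation: the database entries and their cue tokens are hypothetical, and the "three-token window" is assumed to mean three tokens on each side of the acronym.

```python
import re

# Hypothetical acronym database: acronym -> list of (expansion, cue tokens).
# The entries and cue tokens are invented for illustration only.
ACRONYM_DB = {
    "DM": [
        ("diabetes mellitus", {"insulina", "glicemia", "descompensação"}),
        ("doença mitral", {"valva", "estenose", "sopro"}),
    ],
    "IAM": [("infarto agudo do miocárdio", set())],
}

def expand_acronyms(text, database=ACRONYM_DB, window=3):
    """Replace known acronyms, choosing the sense whose cue tokens
    co-occur most often within `window` tokens on either side."""
    # Regular expression built from the database keys (longest first).
    acronym_re = re.compile(
        "|".join(re.escape(a) for a in sorted(database, key=len, reverse=True))
    )
    tokens = re.findall(r"\w+|[^\w\s]", text)
    expanded = []
    for i, token in enumerate(tokens):
        if not acronym_re.fullmatch(token):
            expanded.append(token)
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = {t.lower() for t in left + right}
        # On a tie (including no cue at all) the first listed sense wins.
        expansion, _ = max(database[token], key=lambda sense: len(sense[1] & context))
        expanded.append(expansion)
    return " ".join(expanded)
```

With these invented entries, `expand_acronyms("descompensação do DM com glicemia elevada")` selects "diabetes mellitus", while "estenose da valva DM com sopro" selects the second sense, because different cue tokens fall inside the window.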


Morphosemantic Abstraction

• Using the MSI (morphosemantic indexing) toolkit (Averbis GmbH, Freiburg)

• Extraction of significant word fragments (subwords) and mapping to semantic identifiers (MIDs):

– #cardiac = { heart, cardiac, herz, kard, corac, coeur, … }

– #inflamm = { inflamm, -itic, -itis, -phlog, entzuend, inflam, flog, ... }

• Thesaurus: ~21,000 equivalence classes

• Lexicon entries:

– English: ~23,000

– German: ~24,000

– Portuguese: ~15,000

– Spanish: ~11,000

– French: ~8,000

– Swedish: ~10,000

– Italian: ~4,000

[Figure: subwords grouped into equivalence classes]

• MUSCLE: muscle, myo, muskel, muscul

• INFLAMM: inflamm, -itis, inflam, entzünd

• HEART: herz, heart, card, corazon
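
The idea of morphosemantic abstraction can be illustrated with a small Python sketch. The MSI toolkit itself is proprietary and is not reproduced here; the code below substitutes a simple greedy longest-match segmentation, and the lexicon contains only the subwords shown on this slide. The MID names (`#muscle`, `#inflamm`, `#cardiac`) are illustrative.

```python
# Toy subword lexicon built from the subwords shown on the slide.
# The mapping to MID names is an assumption made for illustration.
SUBWORD_TO_MID = {
    "muscle": "#muscle", "myo": "#muscle", "muskel": "#muscle", "muscul": "#muscle",
    "inflamm": "#inflamm", "itis": "#inflamm", "inflam": "#inflamm", "entzünd": "#inflamm",
    "heart": "#cardiac", "herz": "#cardiac", "card": "#cardiac", "corazon": "#cardiac",
}

def to_mids(term, lexicon=SUBWORD_TO_MID):
    """Segment a term greedily into known subwords and return their MIDs.

    Characters that do not start any known subword are skipped, which is
    a simplification of real morphological segmentation.
    """
    text = term.lower()
    max_len = max(map(len, lexicon))
    mids, pos = [], 0
    while pos < len(text):
        # Try the longest possible subword first.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            mid = lexicon.get(text[pos:pos + length])
            if mid:
                mids.append(mid)
                pos += length
                break
        else:
            pos += 1  # no subword starts here; skip one character
    return mids
```

Under these assumptions, English "myocarditis" and German "Herzmuskelentzündung" reduce to the same set of MIDs, which is what makes matching across languages possible.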


Methods: NLP pipeline

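[Figure: NLP pipeline with two tracks and a final mapping step]

• Clinical text: sentence detection, spell checking, acronym expansion, NE recognition, POS tagging, NP extraction, context detection, morpho-semantic abstraction, MID representation of the term candidates

• SNOMED CT (SCT-EN, SCT-SP): subset creation, morpho-semantic abstraction, MID representation of SNOMED CT

• Final step: mapping heuristics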


SNOMED CT Concepts as Subwords

SNOMED CT concept description | MIDs

ENG: Congestive heart failure | #abund #cardiac #deficien

ENG: Congestive heart disease | #abund #cardiac #disorder

ENG: Congestive cardiac failure | #abund #cardiac #deficien

SPA: Insuficiencia cardíaca | #insuff #cardiac

SPA: Insuficiencia cardíaca congestiva | #insuff #cardiac #abund


Mapping heuristics

• For each term candidate:

– decide whether there is a matching SNOMED description

– if yes, find the best SNOMED description

– map the candidate to that SNOMED description

• Preference criteria:

– match with “term-typical” POS patterns

– MID coincidence (weighted by tf-idf)

– threshold: 60%

• In case of failure: test whether the term candidate corresponds to two SNOMED concepts; the plausibility of such concept coordinations is checked using the SNOMED relationship table
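
The MID-coincidence criterion with its 60% threshold can be sketched as follows. The slides do not give the exact scoring formula, so this Python example substitutes an idf-weighted Jaccard overlap (each MID occurs at most once in a short term, so the tf part is dropped). The function names are hypothetical; the MID sets are taken from the "SNOMED CT Concepts as Subwords" slide.

```python
import math
from collections import Counter

# MID sets copied from the "SNOMED CT Concepts as Subwords" slide.
DESCRIPTIONS = {
    "ENG: Congestive heart failure": ["#abund", "#cardiac", "#deficien"],
    "ENG: Congestive heart disease": ["#abund", "#cardiac", "#disorder"],
    "SPA: Insuficiencia cardíaca": ["#insuff", "#cardiac"],
    "SPA: Insuficiencia cardíaca congestiva": ["#insuff", "#cardiac", "#abund"],
}

def idf_weights(descriptions):
    """Weight each MID by how rare it is across all descriptions."""
    n = len(descriptions)
    df = Counter(mid for mids in descriptions.values() for mid in set(mids))
    return {mid: math.log(n / count) + 1.0 for mid, count in df.items()}

def overlap_score(candidate, description, idf):
    """Weighted Jaccard overlap: shared weight divided by combined weight."""
    cand, desc = set(candidate), set(description)

    def weight(mids):
        # MIDs unseen in the descriptions get a default weight of 1.0.
        return sum(idf.get(m, 1.0) for m in sorted(mids))

    total = weight(cand | desc)
    return weight(cand & desc) / total if total else 0.0

def best_description(candidate, descriptions=DESCRIPTIONS, threshold=0.6):
    """Return (description, score) for the best match, or None below threshold."""
    idf = idf_weights(descriptions)
    score, name = max(
        (overlap_score(candidate, mids, idf), name)
        for name, mids in descriptions.items()
    )
    return (name, score) if score >= threshold else None
```

A Portuguese candidate abstracted to `["#insuff", "#cardiac", "#abund"]` would map to "SPA: Insuficiencia cardíaca congestiva" in this sketch, while a candidate sharing no MID with any description returns `None`, which corresponds to the failure case that triggers the two-concept test.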


Gold standard (kappa = 0.89)


First results

Number of tokens (MIDs) | Correct mappings

2 | 66%

3 | 71%

4 | 80%

5 | 89%

6 | 79%

7 | 80%

8 | 75%

9 | 45%

10 | 25%


Conclusion

• Work in progress

– Encouraging preliminary results

– SNOMED mapping possible across language boundaries

• Future work

– Implement and test the pipeline elements not implemented so far

– Measure the impact of each pipeline element on mapping quality

– Scientific challenges:

• Automated context (e.g. plan, order, negation) identification

• Use of SNOMED CT’s ontological structure to improve mapping results


Acknowledgements

• German Research Foundation (DFG)

• International Bureau of the German Ministry of Research (BMBF-IB)

• Brazilian National Research Council (CNPq)

• Hospital de Clínicas de Porto Alegre (HCPA)

• Averbis GmbH, Germany