Ensembles of NLP Tools for Data Element Extraction from Clinical Notes

Tsung-Ting Kuo, Pallavi Rao, Cleo Maehara, Son Doan, Juan D. Chaparro, Michele E. Day, Claudiu Farcas, Lucila Ohno-Machado, and Chun-Nan Hsu

This research was supported by PCORI Contract CDRN-1306-04819

Disclosure and Learning Objective

• Speaker Tsung-Ting Kuo discloses that he has no relationships with commercial interests
• After participating in this activity, the learner should be better able to:
  • Understand to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts

Outline

• Introduction
• Methods
• Results
• Conclusion

Clinical NLP Tools

• Natural Language Processing (NLP)
  • Concept extraction from clinical text
    • Ex. Electronic health records (EHR)
  • Many popular NLP concept extraction tools exist
    • cTAKES, MetaMap, etc.
• However, a single NLP tool can hardly handle all tasks
  • Especially when concept types are numerous and diverse
  • Ex. data element extraction from clinical notes
    • Concepts related to certain medical conditions
    • 183 types of data elements in this study

Ensembles of NLP Tools

• Ensemble methods
  • Combining various NLP tools to improve performance
  • Superiority is shown empirically in machine learning
  • Many ensemble methods are available
• Still unclear to what extent ensembles improve extraction
  • Especially for numerous and diverse concepts
  • Most recent studies focus on extracting few types
    • Usually < 10 types of concepts

NLP Phenotyping

• Phenotyping
  • Characterization of disease states using EHR
  • Critical component of precision medicine
  • Relies heavily on structured data and NLP from text
• Our goal
  • Quantify the improvement achieved by NLP ensembles
  • In the phenotyping of 3 very different cohorts
    • Congestive heart failure (CHF)
    • Weight management/obesity (WM/O)
    • Kawasaki disease (KD)

NLP Ensemble Pipeline

• We developed an NLP ensemble pipeline to
  • Integrate two popular NLP tools: cTAKES and MetaMap
  • Evaluate the ensemble approach

[Figure: NLP Ensemble Pipeline for Data Element Extraction from Clinical Notes. Labeled components: Electronic Health Record (EHR) System; Clinical Notes; NLP Preprocessor (Format Converter, Encoding Converter, Sentence Splitter); Structured-Formatted Files; NLP Toolkit (cTAKES, MetaMap); Annotated Files; NLP Ensemble (Basic Ensemble, Advanced Ensemble); Annotation Tags; NLP Postprocessor; Extracted Data Elements]

Data Elements

• Subject matter experts identified data elements
  • CHF = 50, WM/O = 96, KD = 37
• Example CHF data elements

Category                  Data Element
Terms                     Congestive Heart Failure
Encounter Information     Days Since Symptom Onset
Other Information         Body Mass Index
Laboratory Tests          Blood Urea Nitrogen
Imaging Tests             Chest X-Ray
Medications               Beta-Blocker
History and Progress      Past Medical History
Comorbidities             Hypertension
Implants and Procedures   Implantable Pacemaker

Mapping Tables

• We mapped data elements to standard codes
  • Mapped each to the most specific code via BioPortal
  • SNOMED-CT, LOINC, RxNorm, or UMLS
• We created data element mapping tables
  • To normalize the output formats of the NLP tools
  • If the output of any tool contained multiple standard codes
    • We mapped all standard codes to the unique data elements
  • NLP outputs are then ready to be inputs for ensemble methods
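
The normalization step can be sketched as follows. This is a minimal illustration, assuming each tool emits a list of standard codes per sentence; the code strings, the contents of `MAPPING_TABLE`, and the function name `normalize_codes` are hypothetical placeholders, not the study's actual tables.

```python
# Hypothetical mapping table: standard code -> unique data element.
# The code strings are made-up placeholders, not real
# SNOMED-CT/LOINC/RxNorm/UMLS codes.
MAPPING_TABLE = {
    "SNOMED:EX-0001": "Hypertension",
    "UMLS:EX-0002": "Hypertension",
    "RXNORM:EX-0003": "Beta-Blocker",
    "LOINC:EX-0004": "Blood Urea Nitrogen",
}


def normalize_codes(codes, mapping=MAPPING_TABLE):
    """Map a tool's standard codes to the set of unique data elements.

    Codes missing from the mapping table are ignored, so several codes
    for the same concept collapse into one data element.
    """
    return {mapping[code] for code in codes if code in mapping}


# Two different codes for the same concept yield a single data element.
tool_output = ["SNOMED:EX-0001", "UMLS:EX-0002", "RXNORM:EX-0003", "UNMAPPED:EX-9999"]
elements = normalize_codes(tool_output)
print(sorted(elements))
```

Once every tool's output is expressed as a set of data elements, the outputs are directly comparable regardless of which code system each tool reports.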

Ensemble Methods

• Basic methods
  • Union and intersection
  • CHF sentence example
    • “Mr. X is being discharged on Lasix, Digoxin and Toprol daily.”
    • Union = { Lasix, Digoxin }, intersection = { }
• Machine learning may be able to recover data elements
  • CHF sentence example
    • “Mr. X is being discharged on Lasix, Digoxin and Toprol daily.”
    • Neither union nor intersection can recover Toprol, which both tools miss
    • Given that the 3 data elements are usually mentioned together in training, a learned model may recover it
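
The two basic ensembles reduce to set operations. The sketch below assumes per-tool outputs that are consistent with the union and intersection given for the example sentence (tool 1 finds only Lasix, tool 2 finds only Digoxin); the slide does not state which tool found which element, and `basic_ensembles` is a hypothetical helper name.

```python
def basic_ensembles(tool1_elements, tool2_elements):
    """Return the (union, intersection) of two tools' extracted data elements."""
    tool1, tool2 = set(tool1_elements), set(tool2_elements)
    return tool1 | tool2, tool1 & tool2


# Assumed per-tool outputs for the example sentence (not stated on the slide):
# tool 1 finds only Lasix, tool 2 finds only Digoxin, and both miss Toprol.
tool1_output = {"Lasix"}
tool2_output = {"Digoxin"}

union, intersection = basic_ensembles(tool1_output, tool2_output)
print(sorted(union))   # ['Digoxin', 'Lasix']
print(intersection)    # set()
```

Because Toprol appears in neither tool's output, no set operation over the two outputs can produce it.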

[Figure labels: NLP Tool 1, NLP Tool 2, “May be recovered”]

Ensemble Methods (cont.)

• Advanced methods
  • Convert to a binary Multi-Label Classification (MLC) task
    • Consider relationships among data elements
    • Ex. 10,000 sentences, 50 data elements, 2 NLP tools
  • Applied 5 well-known MLC algorithms
    • Binary Relevance (BR) [Tsoumakas et al. 2006]
    • Multi-Label K-Nearest Neighbor (MLkNN) [Zhang et al. 2007]
    • Instance-Based Logistic Regression for Multi-Label (IBLR-ML) [Read et al. 2011]
    • Random k-Labelsets (RAkEL) [Tsoumakas et al. 2007]
    • Ensemble of Classifier Chains (ECC) [Read et al. 2011]

[Figure: MLC data layout for the example: 10,000 instances, each with 50 binary features for NLP Tool 1, 50 binary features for NLP Tool 2, and 50 binary labels]
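
One way to picture the conversion to an MLC task is to encode each sentence as a pair of binary vectors. This is a sketch only: the function name `to_mlc_instance`, the toy 3-element vocabulary, and the per-tool outputs are assumptions for illustration, not the authors' implementation.

```python
def to_mlc_instance(tool1_elements, tool2_elements, gold_elements, data_elements):
    """Encode one sentence as (binary features, binary labels).

    Features: one 0/1 flag per data element for tool 1, followed by one
    per data element for tool 2. Labels: one 0/1 flag per data element
    from the manual annotation.
    """
    features = [int(e in tool1_elements) for e in data_elements]
    features += [int(e in tool2_elements) for e in data_elements]
    labels = [int(e in gold_elements) for e in data_elements]
    return features, labels


# Toy vocabulary of 3 data elements (the slide's example uses 50).
DATA_ELEMENTS = ["Lasix", "Digoxin", "Toprol"]

features, labels = to_mlc_instance(
    tool1_elements={"Lasix"},      # assumed tool 1 output
    tool2_elements={"Digoxin"},    # assumed tool 2 output
    gold_elements={"Lasix", "Digoxin", "Toprol"},
    data_elements=DATA_ELEMENTS,
)
print(features)  # [1, 0, 0, 0, 1, 0]
print(labels)    # [1, 1, 1]
```

With 50 data elements and 2 tools, each sentence would carry 100 binary features and 50 binary labels, so 10,000 sentences give a 10,000 × 100 feature matrix and a 10,000 × 50 label matrix.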

Test Corpus

• Datasets
  • Evaluated on 4 sets of notes
    • CHF = 33, WM/O = 34, KD-Public = 33, KD-Private = 30
  • For each dataset
    • Training = randomly selected 50% of notes (10-fold CV)
    • Testing = the remaining 50% of notes
• Annotations
  • Manually annotated 6,914 data element mentions
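
The 50/50 split with 10-fold cross-validation on the training half might look like the sketch below. Several details are assumptions: the slides do not say whether folds were formed over notes or over sentences, how odd-sized datasets were rounded, or which seed was used; `split_notes`, `k_folds`, and the note identifiers are hypothetical.

```python
import random


def split_notes(note_ids, seed=0):
    """Randomly split notes into two halves: (training, testing)."""
    shuffled = list(note_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def k_folds(items, k=10):
    """Yield (train, validation) pairs for k-fold cross-validation."""
    for i in range(k):
        validation = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, validation


# Hypothetical note identifiers for a 34-note dataset such as WM/O.
notes = [f"note_{i:02d}" for i in range(34)]
training, testing = split_notes(notes)
folds = list(k_folds(training, k=10))
print(len(training), len(testing), len(folds))  # 17 17 10
```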

Evaluation Metrics

• Corpus-level
  • A prediction is correct if the data element is annotated in the same note
• Sentence-level
  • A prediction is correct if the data element is annotated in the same sentence
  • Used Stanford CoreNLP to split the notes into 9,320 sentences
• For both levels, we compute 3 metrics
  • Precision, Recall, and F1-Score
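
Both levels can share one scoring routine that differs only in the unit of comparison. The sketch below assumes predictions and annotations are pooled into (unit, data element) pairs before scoring; whether the study pooled counts this way or averaged per data element is not stated, and the ids and elements in the example are made up.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over sets of (unit_id, data_element) pairs.

    unit_id is a note id for note-level scoring or a sentence id for
    sentence-level scoring; a prediction counts as correct when the same
    pair appears in the gold annotations.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical sentence-level example: ids and elements are made up.
gold = {("s1", "Lasix"), ("s1", "Digoxin"), ("s1", "Toprol"), ("s2", "Hypertension")}
predicted = {("s1", "Lasix"), ("s1", "Digoxin"), ("s2", "Diabetes Mellitus")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Replacing the sentence ids with note ids gives the note-level variant, which is more lenient because an element only has to appear somewhere in the note.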

Corpus-Level F1-Scores

[Figure: chart of F1-Score (y-axis, 0.000 to 0.800) by dataset (CHF, WM/O, KD-Public, KD-Private). Series: NLP Tool (cTAKES, MetaMap), Basic Ensemble (Union, Intersection), Advanced Ensemble (BR, MLkNN, IBLR-ML, RAkEL, ECC)]

Sentence-Level F1-Scores

[Figure: chart of F1-Score (y-axis, 0.000 to 0.900) by dataset (CHF, WM/O, KD-Public, KD-Private). Series: NLP Tool (cTAKES, MetaMap), Basic Ensemble (Union, Intersection), Advanced Ensemble (BR, MLkNN, IBLR-ML, RAkEL, ECC)]

Discussion

• Basic ensemble methods
  • Union improved results; intersection performed worse
  • Coverage is a critical concern
• Advanced ensemble methods
  • No method consistently performs best
  • MLC boosted results for WM/O and KD-Private datasets
    • Label density = annotations per sentence
      • Ex. WM/O = 0.94, CHF = 0.62
    • Higher label density = better training examples = better model
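
Label density as defined here is a simple ratio. As a rough illustration, the sketch below applies it to the totals quoted in the Methods slides (6,914 annotated mentions, 9,320 sentences); treating those two totals as covering the same notes is an assumption, and the per-dataset counts behind 0.94 and 0.62 are not given.

```python
def label_density(num_annotations, num_sentences):
    """Label density as defined on the slide: annotations per sentence."""
    if num_sentences <= 0:
        raise ValueError("num_sentences must be positive")
    return num_annotations / num_sentences


# Totals quoted in the Methods slides: 6,914 annotated mentions and 9,320 sentences.
# Treating them as covering the same notes is an assumption.
overall = label_density(6914, 9320)
print(round(overall, 2))  # 0.74
```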

Conclusion

• Summary
  • Developed an NLP ensemble pipeline
  • Extracted data elements from clinical notes
  • Applied popular NLP tools and ensemble methods
  • Tested on public/private notes for 3 conditions
  • Showed that ensembles may be a practical solution
• Future work
  • Extract data elements for more cohorts
  • Combine more NLP tools
  • Evaluate on more annotated clinical notes
• The ensemble component is publicly available
  • https://github.com/tsungtingkuo/ensemble

Questions?

Acknowledgements

• PCORI CDRN-1306-04819
• iDASH U54HL108460
• CTRI CTSA UL1TR001442
• Antonios Koures, PhD, UCSD
• Jane Burns, MD, UCSD
• Adriana Tremoulet, MD, UCSD
• Howard Taras, MD, UCSD
• Zhaoping Li, MD, UCLA
• Michael Ong, MD, PhD, UCLA
• Paul A. Heidenreich, MD, MS, Stanford & Palo Alto VA
• i2b2 U54LM008748

Data Source

• We tested on 130 clinical notes
  • Public = 100
    • Collected 45,136 notes
      • MT Samples, i2b2 Challenges and ShARe/CLEF eHealth Tasks
    • Selected notes based on keyword combinations
    • Randomly selected notes for the evaluation
      • CHF = 33, WM/O = 34, KD-Public = 33
  • Private = 30
    • We randomly sampled from a pool of 381 notes for KD
      • Rady Children’s Hospital and Emory University
    • Institutional Review Boards (IRB) approved this study
    • KD-Private = 30

Qualitative Results

• Data elements using union ensemble method
  • Example of the CHF data elements

F1-Score   Category               Data Element
Highest    Medications            Heparin
Highest    Comorbidities          Diabetes Mellitus
Highest    Laboratory Tests       Serum Albumin
Lowest     Medications            Angiotensin-Converting Enzyme Inhibitor
Lowest     Medications            Warfarin
Lowest     History and Progress   Chief Complaint