From free text to clinical data Language and Computing Davide Zaccagnini, MD Karen Doyle, RN October 23, 2007



From free text to clinical data

Language and Computing

Davide Zaccagnini, MD
Karen Doyle, RN
October 23, 2007


Outline

• Reality of Applying NLP to AHLTA documents

• Use Cases

• Ontology-Based NLP


Use Cases

• PRIMARY Use Case for Health Care Documentation, compared with documentation produced for Biomedical Research
– Collect information to determine diagnosis(es), execute a plan of treatment, and communicate with the healthcare team.

• By-products of Electronic Documentation
– Coding for Billing
– Problem Lists
– Past Medical History
– Social History; 14 Elements, including tobacco use, ETOH, toxin exposure, marital status
– Family History
– Medications
– Allergies
– Bio-surveillance
– Quality Metrics; Pay for Performance, Joint Commission, HEDIS
– Research


AHLTA offers Structured Documentation Tool

Medcin Terms in Blue


Structured and Unstructured Text DoD HA Policy Guidance

Ref ASAD Health Affairs August 7, 2007


Blue is the original code, calculated from the structured documentation. Pink shows how the doctor can change the subscores, but the document itself does not change.


Background of TATRC HPI Free Text DUMMY

• Lost Data in S/O sections: What is the value?

• Patient History
– Patient’s “story”, reflects signs and symptoms
– History of Present Illness
– Review of Systems
– Past Family, Social and Medical History
– Used to calculate Evaluation and Management (E&M) Billing Codes

• HPI: History of Present Illness
– Definition: A chronological description of the present illness from the first sign or symptom, or from the last encounter
– Comprised of 8 elements used in the calculation of the E&M code: location, quality, severity, duration, timing, context, modifying factors, associated signs and symptoms
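As a rough illustration of how the eight HPI elements could feed an E&M calculation, the sketch below counts the documented elements and maps the count to a brief or extended HPI. The 1 to 3 / 4 or more thresholds follow the CMS 1995 documentation guidelines as commonly summarized; the function name `hpi_level` is hypothetical, and this is not a description of how AHLTA computes its codes.

```python
# The eight HPI elements listed on the slide.
HPI_ELEMENTS = (
    "location", "quality", "severity", "duration",
    "timing", "context", "modifying factors",
    "associated signs and symptoms",
)


def hpi_level(documented):
    """Return (count, level) for the HPI elements found in a note.

    Assumption: 1-3 elements is a "brief" HPI, 4 or more is "extended".
    """
    found = {element for element in documented if element in HPI_ELEMENTS}
    count = len(found)
    if count == 0:
        level = "none"
    elif count <= 3:
        level = "brief"
    else:
        level = "extended"
    return count, level


print(hpi_level({"location", "severity", "duration", "timing"}))
```

Anything that is not one of the eight elements is ignored, so free-text noise does not inflate the count.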


(HPI Dummy #1) Free-text section extracted manually for analysis


100 Texts for Processing


Free Text to Data: What is desirable?

• HPI 1 45yo G4P4, POD14 s/p TAH, doing well. Denies f/c. Denies any pain. Not taking any pain meds. Staples removed on 9May. Appetite good. No N/V. Normal bowel/bladder function. She is very happy with the outcome of surgery. Only concern is incision -very small area that has not healed completely. has been keeping the incision clean and dry.

• Expand Abbreviations
• Codify Terms to Vocabularies: ICD-9, SNOMED, MEDCIN
• Negation
• Modality
• Applying Rules
– Financial Billing
– Obtain: age, height, weight, blood pressure, dates
– Quality Metrics
– Surveillance
– History: Family, Past Medical, Current Problems?
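The abbreviation-expansion step can be sketched with a small lookup table applied to part of the HPI 1 text. This is a minimal illustration, not the presenters' implementation: the table `ABBREVIATIONS` and the function `expand` are hypothetical, and a real system would need a far larger, context-sensitive dictionary.

```python
import re

# Hypothetical abbreviation table covering only the sample note.
ABBREVIATIONS = {
    "s/p": "status post",
    "tah": "total abdominal hysterectomy",
    "f/c": "fever/chills",
    "n/v": "nausea/vomiting",
    "pod": "postoperative day",
    "yo": "year old",
}

# Match an abbreviation only when it is not glued to other letters.
_PATTERN = re.compile(
    r"(?<![A-Za-z])("
    + "|".join(re.escape(key) for key in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?![A-Za-z])",
    re.IGNORECASE,
)


def _replace(match):
    text, start, end = match.string, match.start(), match.end()
    expansion = ABBREVIATIONS[match.group(1).lower()]
    # Add spacing when the abbreviation touches a number ("45yo", "POD14").
    if start > 0 and text[start - 1].isdigit():
        expansion = " " + expansion
    if end < len(text) and text[end].isdigit():
        expansion = expansion + " "
    return expansion


def expand(text):
    return _PATTERN.sub(_replace, text)


print(expand("45yo G4P4, POD14 s/p TAH, doing well. Denies f/c. No N/V."))
```

Tokens that are not in the table, such as G4P4, pass through unchanged.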


Free Text Example

Expand Abbreviations

Code to Vocabularies

Evaluate for Negation


Apply Rules

[Figure: annotated HPI text. Labeled spans include “appetite good”, “good”, “very”, “f/c”, “n/v”, “TAH”, “pain”, “happy” and “taking pain meds”, with negation marked.]
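A trigger-word approach is one simple way to evaluate negation in sentences like those of the HPI example. The sketch below is an assumption-laden simplification (a finding counts as negated when a trigger precedes it in the same sentence); the trigger and target lists are hypothetical and it does not reproduce the syntax-based method the deck describes.

```python
import re

# Hypothetical trigger words and target findings for the sample note.
NEGATION_TRIGGERS = ("denies", "no", "not", "without")
TARGETS = ("f/c", "pain", "n/v", "appetite")

_TRIGGER = re.compile(r"\b(" + "|".join(NEGATION_TRIGGERS) + r")\b")


def negated_findings(text):
    """Return (finding, negated) pairs, sentence by sentence."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        trigger = _TRIGGER.search(lowered)
        for target in TARGETS:
            position = lowered.find(target)
            if position == -1:
                continue
            # Negated only if a trigger occurs before the finding.
            negated = trigger is not None and trigger.start() < position
            results.append((target, negated))
    return results


note = "Denies f/c. Denies any pain. Not taking any pain meds. Appetite good. No N/V."
for finding, negated in negated_findings(note):
    print(finding, "negated" if negated else "affirmed")
```

Limiting the scope to a single sentence keeps “Appetite good” affirmed even though negated findings surround it.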


Ontology-based NLP


Natural Language Processing and Understanding

“… natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

Wikipedia


Representations (formal or otherwise)

ONTOLOGY: FORMALLY DEFINED CONCEPTS
• NO PREDEF. USE
• REALITY DRIVEN
• NO PREDEF. CONTEXT
• INFERRED MODEL

DATA MODELS: AGREED-UPON TERMS
• PREDEF. USE
• DATA DRIVEN
• PREDEF. CONTEXT
• SPECIALIZED MODEL


What is fever?

All definitions are accurate within their model, but what is fever?

does the patient have fever?


Formal representations

ID#      ZIP code   BP
001123   02139      80/120
001223   24425      65/130

(ID# = patient identifier, ZIP code = geographical area, BP = blood pressure)

The world according to a database:
Patients {ID#, ZIP code, BP}

The world according to an ontology:
patient has (identifier (is_a (ID#))) ∩ lives_in (geographic_area) ∩ has (blood_pressure (is_measured_by (blood pressure measurement(…))))

[Diagram: ontology graph linking the patient identifier (ID#), the geographical area (ZIP code) and blood pressure (blood pressure measurement, values 80/120 and 65/130) through the relations is_a, is_identified_by, has, lives in, is_measured_by and generates.]
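The contrast between the two views can be made concrete with a small sketch. It keeps the database rows as plain records and re-expresses one row as subject-relation-object triples using the relation names from the slide. The node naming scheme (`patient_…`, `bp_measurement_…`) and the helper functions are hypothetical, chosen only for illustration.

```python
# Database view: meaning lives only in the column names.
rows = [
    {"ID#": "001123", "ZIP code": "02139", "BP": "80/120"},
    {"ID#": "001223", "ZIP code": "24425", "BP": "65/130"},
]


def to_triples(row):
    """Re-express one row as explicit (subject, relation, object) triples."""
    patient = f"patient_{row['ID#']}"
    pressure = f"blood_pressure_{row['ID#']}"
    measurement = f"bp_measurement_{row['ID#']}"
    return [
        (patient, "is_a", "patient"),
        (patient, "is_identified_by", row["ID#"]),
        (patient, "lives_in", row["ZIP code"]),
        (row["ZIP code"], "is_a", "geographical area"),
        (patient, "has", pressure),
        (pressure, "is_a", "blood pressure"),
        (pressure, "is_measured_by", measurement),
        (measurement, "is_a", "blood pressure measurement"),
        (measurement, "generates", row["BP"]),
        (row["BP"], "is_a", "value"),
    ]


def objects(triples, subject, relation):
    """All objects reachable from a subject through one relation."""
    return [o for s, r, o in triples if s == subject and r == relation]


triples = to_triples(rows[0])
print(objects(triples, "patient_001123", "lives_in"))
print(objects(triples, "bp_measurement_001123", "generates"))
```

In the triple form, the fact that a ZIP code is a geographical area is stated in the data itself rather than implied by a column header, which is the point the slide makes.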


Ontologies: the meaning of data

An ontology:
• Explicitly specifies meaning
• Represents reality, not data
• Is a formal schema
• Its consistency can be automatically enforced and checked


NLP Workflow

• Example Pipeline

Input handler -> Fetches document and passes it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Maps tokens and multi-words to ontology. Rewriting to enhance mapping
Section labeler -> Assigns section labels to paragraphs
Syntactic parser -> Performs syntactic parsing, validating against grammar
Fragment labeler -> Assigns fragment labels to pieces of text within sections
Lexeme filter -> Filters out function words (e.g. determiners) to reduce false-positive mappings
Negation/modality -> Identifies negation, modality and future
Vital signs extractor -> Extracts vital signs
Labs extractor -> Extracts lab results
FreePharma -> Extracts medications
Semantic tagger -> Further deduces concepts based on syntax, rewriting, full definitions and so on
Disambiguator -> Disambiguates concepts
Coder -> Codes to standard classification systems like SNOMED-CT, ICD-9, …
Concept filters -> Marks concepts that belong to different filters (e.g. diagnoses, procedures)
Relevance ranker -> Calculates relevance of concepts
Output handler -> Creates XML/HTML/… output
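The component chain above can be read as a sequence of functions that each enrich a shared document record. The sketch below shows that shape with two toy stages only. The stage implementations, the dictionary keys and `run_pipeline` are hypothetical; the actual components are considerably more sophisticated.

```python
def paragrapher(doc):
    """Toy paragraph detection: split on blank lines."""
    doc["paragraphs"] = [p.strip() for p in doc["text"].split("\n\n") if p.strip()]
    return doc


def lexeme_filter(doc):
    """Toy lexeme filter: drop a few function words before concept mapping."""
    function_words = {"the", "a", "an", "of", "in"}
    doc["tokens"] = [
        [word for word in paragraph.lower().split() if word not in function_words]
        for paragraph in doc["paragraphs"]
    ]
    return doc


def run_pipeline(text, stages):
    """Pass one document record through each stage in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline("Polyps in the antrum\n\nNo fever", [paragrapher, lexeme_filter])
print(result["tokens"])
```

Because every stage takes and returns the same record, stages can be added, removed or reordered to build task-specific pipelines without changing the others.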


Semantic Tagging

Concept: SNOMED CT : 29074008 : POLYP OF ANTRUM (DISORDER)

Sample: “Demonstrated benign small polyps in the antrum”

Morphological Variations: antrum > antral; polyp < polyps

Word Clustering: antral polyp; polyp antral

Known Synonyms: maxillary sinus polyp, antral polyp
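The three mechanisms on this slide (morphological variation, word clustering, known synonyms) can be combined in a toy matcher. The sketch below is only an illustration under simplifying assumptions: the `MORPHOLOGY` table and the function-word list are hypothetical stand-ins for a real lexicon, and word order is ignored by comparing sets of normalized words.

```python
import re

# Hypothetical morphological normalization table.
MORPHOLOGY = {"polyps": "polyp", "antrum": "antral"}
FUNCTION_WORDS = {"the", "in", "of", "a"}

# Concept from the slide, with its known synonyms as surface terms.
CONCEPT = {
    "code": "29074008",
    "name": "POLYP OF ANTRUM (DISORDER)",
    "terms": ["antral polyp", "maxillary sinus polyp"],
}


def normalize(text):
    """Lowercase, drop function words, and map words to a base form."""
    words = re.findall(r"[a-z]+", text.lower())
    return {MORPHOLOGY.get(word, word) for word in words if word not in FUNCTION_WORDS}


def matches(concept, sentence):
    """True if all words of any known term occur in the sentence, in any order."""
    sentence_words = normalize(sentence)
    return any(normalize(term) <= sentence_words for term in concept["terms"])


print(matches(CONCEPT, "Demonstrated benign small polyps in the antrum"))
```

Comparing sets rather than strings is what lets “polyps in the antrum” reach the term “antral polyp” despite the different word order and intervening words.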


Types of Disambiguation

by STRING: lexical match between a term (or its inflections) and a concept in the ontology.

Ex.: “Patient presents fever”

[Diagram labels: fever, symptom, cough]


Types of Disambiguation

by DEFINITION: match between terms and concepts in the ontology, where these concepts meet necessary and sufficient conditions (logic-based reasoning)

Ex.: “Patient underwent a liver biopsy”

liver biopsy = has_location (liver) ∧ is_a (biopsy)

[Diagram: “liver” (organ) satisfies has_location (liver) and “biopsy” (procedure) satisfies is_a (biopsy); both conditions are true, so the text maps to the concept liver biopsy.]
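Matching by definition can be sketched as checking that every condition in a concept's definition is satisfied by the terms found in the text. The lexicon, the definition table and the function names below are hypothetical and cover only the slide's example.

```python
# Hypothetical lexicon: each word contributes one (relation, value) condition.
LEXICON = {
    "liver": ("has_location", "liver"),
    "biopsy": ("is_a", "biopsy"),
}

# Full definitions: a concept matches only when every condition holds.
DEFINITIONS = {
    "liver biopsy": {"has_location": "liver", "is_a": "biopsy"},
}


def conditions_in(sentence):
    """Collect the conditions contributed by the words of a sentence."""
    found = {}
    for word in sentence.lower().replace(".", " ").split():
        if word in LEXICON:
            relation, value = LEXICON[word]
            found[relation] = value
    return found


def match_by_definition(sentence):
    found = conditions_in(sentence)
    return [
        name
        for name, definition in DEFINITIONS.items()
        if all(found.get(relation) == value for relation, value in definition.items())
    ]


print(match_by_definition("Patient underwent a liver biopsy"))
```

A sentence that mentions only a biopsy satisfies one condition but not both, so it does not map to liver biopsy.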


Types of Disambiguation

by RELATIONSHIPS: match between SOME of the terms, assigned to different concepts in the ontology, where these concepts compose the full definition of the concept using a ‘suggested parent’.

Ex.: “CT of thyroid”

? = is_a (CT scan) ∧ has_location (thyroid)   (true, true)
thyroid has_location neck
CT of Neck = is_a (CT scan) ∧ has_location (neck)   (true)
? is_a CT of Neck
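One way to read the “suggested parent” idea is: when no concept matches the full definition, generalize one condition along an ontology relation and try again. The sketch below does this for the location condition only. It assumes the diagram means that the thyroid is located in the neck; the tables and the function `suggest_parent` are hypothetical.

```python
# Concepts with full definitions (only the slide's example is included).
DEFINITIONS = {
    "CT of Neck": {"is_a": "CT scan", "has_location": "neck"},
}

# Assumed anatomical relation: the thyroid is located in the neck.
LOCATED_IN = {"thyroid": "neck"}


def suggest_parent(found):
    """Return (concept, kind) for a set of conditions found in text."""
    for name, definition in DEFINITIONS.items():
        if definition == found:
            return name, "exact"
    # No exact concept: walk up the location chain and retry.
    location = found.get("has_location")
    while location in LOCATED_IN:
        location = LOCATED_IN[location]
        candidate = dict(found, has_location=location)
        for name, definition in DEFINITIONS.items():
            if definition == candidate:
                return name, "suggested parent"
    return None, "no match"


print(suggest_parent({"is_a": "CT scan", "has_location": "thyroid"}))
```

The result is labeled as a suggested parent rather than an exact match, so downstream coding can treat it as less specific than the original text.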


Examples of disambiguation


Ontology and NLP

LinKBase® Medical Ontology

Spanish

English

Lexicon Grammar Proprietary

ICD-9

MEDCIN

SNOMED CT

CPT

Radlex (partial)

concepts are mapped to terms in multiple languages

Cross-mapped to multiple coding systems

Natural language processing

Terminologies and data integration


Conclusion

• Ontologies are powerful NLP tools for:
– Segmentation
– Disambiguation
– Higher level inference
– Interoperability of extracted data

• They require human resources for maintenance, but reduce the need for annotated data

• They are “white boxes”
– Models that can be expanded and changed

• Combined with stochastic algorithms, they provide both formality and scalability


Thank you


NLP/U, formal representations

“Patients in the North East have higher blood pressure than the average population”

[Diagram: ontology graph linking the patient identifier (ID#), the geographical area (ZIP code) and blood pressure (blood pressure measurement, values 80/120 and 65/130) through the relations is_a, is_identified_by, has, lives in, is_measured_by and generates.]


Disambiguation

• Words in document are mapped to concepts in the ontology

• When more than one candidate exists in the ontology, the system builds a graph of concept relations using:
1. Nearness in sentence
2. IS_A relationships
3. Horizontal relationships
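The three signals listed above can be combined in a toy scoring function: each candidate concept is scored by how closely related words appear in the sentence. This is a loose sketch, not the system's algorithm. The candidate list, the IS_A and horizontal relation tables, and the 1/distance weighting are all assumptions made for illustration.

```python
# Hypothetical candidates and relations for the ambiguous word "depressed".
CANDIDATES = {"depressed": ["depressed mood", "depressed fracture"]}
IS_A = {"depressed fracture": "fracture", "depressed mood": "mental state"}
HORIZONTAL = {("depressed mood", "patient")}  # e.g. a mood is experienced by a patient


def related(candidate, word):
    """True if the candidate is linked to the word by IS_A or a horizontal relation."""
    return IS_A.get(candidate) == word or (candidate, word) in HORIZONTAL


def disambiguate(tokens, index):
    """Pick the candidate for tokens[index] with the best proximity-weighted support."""
    scores = {}
    for candidate in CANDIDATES[tokens[index]]:
        score = 0.0
        for position, word in enumerate(tokens):
            if position != index and related(candidate, word):
                score += 1.0 / abs(position - index)  # nearer words count more
        scores[candidate] = score
    return max(scores, key=scores.get), scores


tokens = ["depressed", "patient", "with", "depressed", "fracture"]
print(disambiguate(tokens, 0)[0])
print(disambiguate(tokens, 3)[0])
```

With this weighting, the first “depressed” is pulled toward the adjacent “patient” and the second toward the adjacent “fracture”, so the same word receives two different concepts within one sentence.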


Syntactic Parsing

«A very young patient was given a double dose by his mother.»

The subject

The predicate

Note the passive construction


Negation via Syntax


Modality via Syntax


Reference Resolution

“TeSSI” understands indirect reference to patient


Disambiguation

The system is able to disambiguate between two different meanings of “depressed” in one and the same sentence. While it defines the “depressed” in “depressed patient” as a state of mind, it recognizes “depressed” as a part of “depressed fracture” and tags this noun phrase with the corresponding SNOMED code.


Fragment Labeling

• Sentences and phrases are labeled
– History, exam, impression, etc.

• Independent of superficial formatting

• One label – one type of information


Fragment Labeling

“HPI: The patient whose mother had breast cancer presents with loss of hearing”

“whose mother had breast cancer” -> Family History

“presents with loss of hearing” -> Chief Complaint
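The example shows one sentence, filed under an HPI heading, that carries two different kinds of information. A toy rule-based labeler can illustrate the idea: labels are attached to fragments based on their wording, regardless of the section heading. The two regular-expression rules below are hypothetical and handle only this sentence pattern.

```python
import re

# Hypothetical cue-based rules: (label, pattern).
RULES = [
    (
        "Family History",
        re.compile(r"whose (?:mother|father|sister|brother) (?:had|has) [\w\s]+?(?= presents\b|$)"),
    ),
    ("Chief Complaint", re.compile(r"presents with [\w\s]+")),
]


def label_fragments(sentence):
    """Return (label, fragment) pairs for every rule that fires."""
    return [
        (label, match.group(0))
        for label, pattern in RULES
        for match in pattern.finditer(sentence)
    ]


sentence = "HPI: The patient whose mother had breast cancer presents with loss of hearing"
for label, fragment in label_fragments(sentence):
    print(f"{label}: {fragment}")
```

The “HPI:” prefix plays no role in the labeling, which mirrors the slide's point that fragment labels are independent of superficial formatting.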


FreePharma

• Medication Extraction
• Example


Semantic Indexing

TeSSI: Terminology Supported Semantic Indexing

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Disambiguator -> Disambiguate concepts
Relevance ranker -> Calculate relevance of concepts
Indexer -> Write information to index for quick access
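The final indexing step can be pictured as an inverted index from concept identifiers to documents, ordered by relevance. The sketch below assumes earlier stages have already produced (concept, relevance) pairs per document. The document names, the relevance values and the `concept:fever` identifier are made up for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """documents maps doc_id -> list of (concept_id, relevance) pairs."""
    index = defaultdict(list)
    for doc_id, concepts in documents.items():
        for concept_id, relevance in concepts:
            index[concept_id].append((relevance, doc_id))
    # Most relevant documents first for each concept.
    for postings in index.values():
        postings.sort(reverse=True)
    return dict(index)


def search(index, concept_id):
    """Return document ids for a concept, best match first."""
    return [doc_id for _, doc_id in index.get(concept_id, [])]


documents = {
    "note1": [("29074008", 0.4), ("concept:fever", 0.7)],
    "note2": [("29074008", 0.9)],
}
index = build_index(documents)
print(search(index, "29074008"))
```

Indexing by concept rather than by word means a search for one concept finds documents regardless of which synonym or inflection the author used.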


Information Extraction

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Output handler -> Create XML/HTML/… output


Knowledge Discovery

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Ontology writer -> Add discovered knowledge to ontology


Automatic coding

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Code Calculator -> Calculate codes: E&M, ICD-9, CPT
Output handler -> Create XML/HTML/… output


NLP-based applications and products


Quality

Projects: CPR Technologies, JCAHO, Eclipsys

• Extraction of CMS Core Measures
• National Patient Safety Network
• Data warehousing


Coding

Projects: Kaiser Permanente, Convergent Solutions

• E&M Coding
• SNOMED Coding
• ICD-9 Coding
• CPT in development


Medication Extraction

Projects: The Marshfield Clinic, Medquist, UAB

• Medication Reconciliation
• Personalized Medication Project
• Validation of therapies from literature


Interoperability

Projects: Integic/DoD, Revolution Health

• Semantic Integration of the military health systems

• Tie together free text content and portal applications


Web Search and Retrieval

Projects: Revolution Health, Merck

• Ontology-enhanced search
• Concept-based indexing


Radiology

Projects: FUJIFILM MEDICAL SYSTEMS

• Findings and pertinent negatives extracted from radiology reports


Radiology

• Observation Types
– Findings
– Pertinent Negatives
– Quality Assurance
– Unclassified

• Observation Components
– Fundamentals
– Modifiers
– Qualifiers

• Observation Status
– (Present) / Historical
– Changed / Not Changed / (not stated)


Observation Types

• Findings
– E.g. “bilateral infiltrates”

• Pertinent Negatives
– E.g. “the lungs are clear”

• Quality Assurance
– E.g. “poor inspiration”

• Unclassified
– E.g. “the lungs are unchanged”


Observation Components

• Fundamentals
– Pathologic entities
– Physiologic entities
– Devices
– Procedure

• Modifiers
– Location
– Qualitative
– Quantitative

• Qualifiers
– Uncertainty (modal)
– Negation


Observation Status

• Historical
• (non-Historical)
• Change Stated
• No Change Stated
• (Change not stated)
• Grouped
• Contains Uncertain (modal) Element
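The observation model described on these slides (type, components, status) lends itself to a small record type. The sketch below is one possible encoding, not the vendor's schema. The class and field names are hypothetical, while the categories and the example phrases come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ObservationType(Enum):
    FINDING = "Finding"
    PERTINENT_NEGATIVE = "Pertinent Negative"
    QUALITY_ASSURANCE = "Quality Assurance"
    UNCLASSIFIED = "Unclassified"


@dataclass
class Observation:
    text: str
    type: ObservationType
    fundamentals: List[str] = field(default_factory=list)  # entities, devices, procedures
    modifiers: List[str] = field(default_factory=list)     # location, qualitative, quantitative
    historical: bool = False                               # default: non-historical
    change_stated: Optional[bool] = None                   # None: change not stated
    grouped: bool = False
    uncertain: bool = False                                # contains a modal element


finding = Observation(
    text="bilateral infiltrates",
    type=ObservationType.FINDING,
    fundamentals=["infiltrates"],
    modifiers=["bilateral"],
)
negative = Observation(
    text="the lungs are clear",
    type=ObservationType.PERTINENT_NEGATIVE,
)
print(finding.type.value, finding.modifiers, finding.change_stated)
```

The parenthesized statuses on the slide, such as (non-Historical) and (Change not stated), map naturally onto default values, so only explicitly stated statuses need to be set.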


Example PN and F (Modal)


Example Hx and Grouped


Example CS and NCS


Example Quality Assurance


Findings

• Modifier in long-distance dependency
• Finding of PE in historical context
• Finding of devices


Pertinent Negatives

• Knowledge that lungs should be clear
• Negation of abnormalities
• Statement of normality