Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

35
Unrestricted © Siemens AG 2014. All rights reserved Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports IEEE International Conference on Healthcare Informatics / September 2014 Heiner Oberkampf 1,2 , Claudia Bretschneider 1 , Sonja Zillner 1 , Bernhard Bauer 2 and Matthias Hammon 3 1 Siemens AG, Corporate Technology 2 University of Augsburg, Software Methodologies for Distributed Systems 3 University Hospital Erlangen, Department of Radiology

description

Abstract: A large percentage of relevant radiologic patient information is currently only available in unstructured formats such as free text reports. In particular measurements are relevance since they are comparable and thus provide insight into the change of the health status over time, for example in response to some treatment. In radiology most of the measurements in reports describe the size of anatomical entities. Even though it is possible to extract measurements and anatomical entities from text using standard information extraction techniques, it is difficult to extract the relation between the measurement and the corresponding anatomical entity. Here we present a knowledgebased approach to extract this relation using a model about typical size descriptions of anatomical entities in combination with hierarchical knowledge of existing medical ontologies. We evaluate our approach on two data sets of German radiology reports reaching an F1-measure of 0.85 and 0.79 respectively.

Transcript of Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Page 1: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Unrestricted © Siemens AG 2014. All rights reserved

Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

IEEE International Conference on Healthcare Informatics / September 2014

Heiner Oberkampf1,2, Claudia Bretschneider1, Sonja Zillner1, Bernhard Bauer2 and Matthias Hammon3

1Siemens AG, Corporate Technology2University of Augsburg, Software Methodologies for Distributed Systems3University Hospital Erlangen, Department of Radiology

Page 2: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 2 September 2014 Corporate Technology

Agenda

Measurements in Radiology

Knowledge Model

Semantic Annotation of Radiology Reports

Extraction Algorithm

Evaluation

Outlook

Page 3: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 3 September 2014 Corporate Technology

Measurements in Radiology

Not comprehensive list

Sizelength: 1D, 2D, 3Darea, volumeindex (e.g. spleen index = width*height*depth)

Density measured in Hounsfield scale (Hu)mainly in CT imagesminimal, maximal and mean density values for Regions of Interest (ROIs)

Anglee.g. bone configurations or fractions

Blood flowe.g. PET: myocardial blood flow and blood flow in brain…

1) Source: http://www.recist.com/recist-in-practice/19.html

1)

Page 4: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 4 September 2014 Corporate Technology

Size Measurements in Radiology Reports

Example Sentences

Leber mit kranio-kaudalem Durchmesser von 15,5 cm.

Größenprogrediente, unscharf abgrenzbare Hypodensität links temporal nach kranial bis nach parietobasal reichend (IMA 7-22; aktuell etwa 8 x 7 x 6 cm - Voruntersuchung etwa 4,5 x 3,5 cm) mit einzelnen, neuaufgetretenen, stippchenförmigen Hyperdensitäten (IMA 11-14).

Etwas kaudal hiervon im Unterlappen am Lappenspalt zentral ein 1.3 cm (VU 1.3 cm) großer Rundherd mit weiterhin deutlich vermehrtem FDG-Uptake (SUV max. 3.9; VU 5.7; IMA 182) im Oberlappen lappenspaltnah ein 1.0 cm (VU 1.0 cm) großer Rundherd mit vermehrtem FDG-Uptake (SUV max. 0.8; VU 1.5; IMA 199) sowie auf gleicher Höhe im Unterlappen dorsal paravertebral zwei Rundherde mit Ausläufern von 1.5 cm (VU 1.3 cm) und lateral hiervon zwei verschmolzene Lymphknoten von zusammen 1.7 cm Durchmesser (VU 1.5 cm + Satellit von 0.9 cm) mit deutlich vermehrtem FDG-Uptake (SUV max. 4.0; VU 3.2 bzw. SUV max. 6.6; VU 4.8; IMA 207) und im costophrenischen Winkel dorsal ein 0.9 cm (VU 0.5 cm) großer Rundherd mit vermehrtem FDG-Uptake (SUV max. 1.7; VU 1.7; IMA 234).

Page 5: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 5 September 2014 Corporate Technology

Longitudinal Integration

Image source: “Automated Detection and Volumetric Segmentation of the Spleen in CT Scans” M. Hammon, P. Dankerl, M. Kramer, S. Seifert, A. Tsymbal2, M. J. Costa2, R. Janka1, M. Uder1, A. Cavallaro

Page 6: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 6 September 2014 Corporate Technology

Two Data Sets

382 Lymphoma Patients• 2584 reports• imaging modality: CT, MRI, US,

Radiography, …

Diverse Internistic Patients• 6007 reports• imaging modality: CT

Page 7: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 7 September 2014 Corporate Technology

Size Measurements in Radiology Reports1)

Mostly 1- and 2-dimensional and one or two per sentence.

1) Based on a data set of 2854 German radiology reports of 377 lymphoma patients and one of 6007 of diverse internistic patients

40%

58%

2%1-dim

3-dim

2-dim

13109

4820538 668 290

1 2 3 4 >4

# measurements contained in a sentence

# se

nten

ces

Type of measuements: Type of sentences:

Page 8: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 8 September 2014 Corporate Technology

Agenda

Measurements in Radiology

Knowledge Model

Semantic Annotation of Radiology Reports

Extraction Algorithm

Evaluation

Outlook

Page 9: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 9 September 2014 Corporate Technology

Size Specifications

Commonly used types to describe the size of anatomical entities.

Interval• Anterior-posterior diameter of liver normally 10-13 cm• Thickness of wall of gallbladder normally 0.1 -0.3 cm

Normal Valuewith deviation

• Truncus pulmonalis: 1.4 cm +/- 0.4 cm

Upper Bound • Normal lymph node < 1 cm

Lower Bound• Normal aorta diameter > 4 cm at root• Enlarged lymph node > 1 cm

Basic form: anatomical entity, quality, value specificationNote: Specifications might be age or gender specific

Page 10: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 10 September 2014 Corporate Technology

Coverage• 50 size specifications• 38 different anatomical

entities

Representation• OWL

Knowledge Representation

Anatomical Entities• Radiological Lexicon (RadLex)• Foundational Model of Anatomy

(FMA)1)

Qualities• Ont. for Phenotypic Qualities (PATO)1)

Value Specifications• Ont. for Biomedical Investigations

(OBI)1)

• Information Artifact Ontology (IAO)1)

• Units Ontology (UO)1)

• Model for Clinical Information (MCI)

Knowledge ModelReused Ontologies Knowledge Resources

The Knowledge Model is based on existing biomedical ontologies.

1) Part of the Open Biological and Biomedical Ontologies Foundry library http://www.obofoundry.org/

Page 11: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 11 September 2014 Corporate Technology

obi:has value specificationiao:has measurement

unit label

Value Specification

Quality Anatomical Entity

Normal Upper Bound Specification

Example: Lymph nodes are normally < 1 cm.

_:vs1uo:centimeter

uo:length unit

“1.0”^^xsd:float

obi:scalar value specification

_:usp

radlex:lymph node

bfo:inheres in

pato:normal

_:ln

pato:length

pato:size

:normalDiameterOfLymphNode

obi:has specified value

iao:is quality specification of

pato:diameter

mci:upper bound specification

Page 12: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 12 September 2014 Corporate Technology

obi:has specified value

ro:has part

iao:has measurement unit label

Value Specification

Quality Anatomical Entity

Normal Interval Specification

Example: Normal diameter of the pulmonary atery is between 1.6 and 2.6 cm.

_:vs1 uo:centimeter

uo:length unit

“2.6”^^xsd:float

obi:scalar value specification

obi:has value specification

radlex:pulmonary atery

bfo:inheres in_:pulmAtery:normalDiameterOfPu

lmonaryAtery

_:vs2 “1.6”^^xsd:float

_:ubsp

mci:upper bound specification

_:lbsp

mci:lower bound specification

iao:is quality specification of

pato:normal

pato:length

pato:size

pato:diameter

_:isp

mci:interval specification

Page 13: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 13 September 2014 Corporate Technology

obi:has specified value

ro:has part

iao:has measurement unit label

Value Specification

Quality Anatomical Entity

Normal Interval Specification

Example: Normal length of kidney along craniocaudal axis: 8.0 – 13.0 cm.

_:vs1 uo:centimeter

uo:length unit

“13.0”^^xsd:float

obi:scalar value specification

obi:has value specification

_:isp

mci:interval specification

radlex:kidney

bfo:inheres in

pato:normal

_:kidney

pato:lengthpato:size

bspo:transverse plane

_:tp

bspo:orthogonal_to:normalLengthKidney

Craniocaudal

_:vs2 “8.0”^^xsd:float

_:ubsp

mci:upper bound specification

_:lbsp

mci:lower bound specification

iao:is quality specification of

Page 14: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 14 September 2014 Corporate Technology

Agenda

Measurements in Radiology

Knowledge Model

Semantic Annotation of Radiology Reports

Extraction Algorithm

Evaluation

Outlook

Page 15: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 15 September 2014 Corporate Technology

Semantic Annotation of Radiology Reports

Recognition of entities from ontologies and measurements

23 mm

radlex:lymph node

“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”

measurement

value unit

Page 16: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 16 September 2014 Corporate Technology

Semantic Annotation of Radiology Reports

Functional Scope• Detection of multiword terms independent from the ordering of the individual tokens.• Respect sentence boundaries and map multiword terms only when they occur within

these boundaries.• Recognition of inflected forms of ontological concepts in the text such as detection of

plural form or other grammatical inflections based on stemmed forms.

Technical Realization• builds on top of the UIMA framework• adapted form of the UIMA Concept Mapper• Outputs annotations in RDF

Page 17: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 17 September 2014 Corporate Technology

Running Example

The running example used during the description of the resolution algorithm

radlex:lymphadenopathy

radlex:kidney radlex:renal pedicle

2.3 uo:centimeter

radlex:lymph node

radlex:lateral aortic lymph node

radlex:inferior para-aortic lymph node

radlex:rightradlex:enlarged

radlex:inferiorradlex:paraaortic

“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”

Annotations:

Page 18: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 18 September 2014 Corporate Technology

Agenda

Measurements in Radiology

Knowledge Model

Semantic Annotation of Radiology Reports

Extraction Algorithm

Evaluation

Outlook

Page 19: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 19 September 2014 Corporate Technology

Overview of Algorithm

1. Using ontology structure of RadLex and create spanning tree for annotations.

2. Compare Measurement values with Knowledge Model

3. Compute a ranking and select the best entity

Page 20: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 20 September 2014 Corporate Technology

Filter and Expand the Set of Annotations

Use knowledge from the RadLex ontology

anatomical entity imaging observationclinical finding

RadLex entity

descriptorimaging modality …

lymphadenopathy

kidney renal pedicle

lymph node

lateral aortic lymph node

inferior para-aortic lymph node

rightenlarged

inferiorparaaortic

Anatomical_Site

Page 21: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 21 September 2014 Corporate Technology

Minimal Spanning Tree

Based on the set of relevant annotations we create a tree along the RadLex subclass hierarchy

Sentence:“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”

Page 22: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 22 September 2014 Corporate Technology

Attach Normal Size Specifications

For each entity of the spanning tree we retrieve available size specifications from the knowledge model.

normal: 0-1 cmenlarged: 1-5 cmcraniocaudal extension: 8-13 cm

anterior posterior diameter: 4 cm

compValue: 0.73 compValue: 0.0 compValue: 1.3compValue: 0.0compValue: 2.48

compValue: 0.73

Sentence:“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”

Page 23: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 23 September 2014 Corporate Technology

Propagate Comparison Value

compValue: 0.73 compValue: 0.0

compValue: 0.0

compValue: 0.0 compValue: 0.0

compValue: 0.0 Sentence:“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”

Page 24: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 24 September 2014 Corporate Technology

Ranking and Selection of Best Entity

Take ranking includes the position in the RadLex hierarchy

2.3 uo:centimeterradlex:inferior para-aortic lymph node

• Include position in RadLex hierarchy more specific entities are preferred• Use threshold criteria to select best entity

“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”

Structured Representation:

Page 25: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 25 September 2014 Corporate Technology

Agenda

Measurements in Radiology

Knowledge Model

Semantic Annotation of Radiology Reports

Extraction Algorithm

Evaluation

Outlook

Page 26: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 26 September 2014 Corporate Technology

Scope of the Algorithm

The described algorithm resolves only one measurement-entity relation per sentence.

In Scope Out of Scope

• Sentences with two measurements about different entities. E.g. “Splenomegaly with 23.0 x 14.5 x 8.5 cm and approx. 1.0 cm lesion.”

• Sentences with more than two measurements

• Sentences with one measurement• Sentences with two measurements where

both measurements are about the same entity. E.g.“Spleen now with 10.5 x 4.5 cm slightly smaller than in previous examination with 13.3 x 6.7 cm.”

Page 27: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 27 September 2014 Corporate Technology

Scope of Algorithm

Analysis of sentences in- and out-of-scope

3980

791

249

71 78 31

1 2 3 4 >4

# se

nten

ces

# measurements contained in a sentence

#Sentences out of Scope: 8.25%

9129

2798

982

467 590 259

1 2 3 4 >4#

sent

ence

s# measurements contained in a sentence

#Sentences out of Scope: 16.15%

Reports on Lymphoma Patients Reports on Internistic Patients

Page 28: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 28 September 2014 Corporate Technology

Evaluation Schema

Description Example

correct • The entity resolved is exactly what the measurement is about

• The radiologist cannot name a betterentity

“Lymph node in mediastium 1.8 cm”mediastinal lymph node

(correct) • The entity resolved is correct however it could be more specific

• The radiologist can name a better entity

“Lymph node in jaw angle 1 cm”lymph node

Radiologist: jugular lymph node

unresolvable • The sentence does not allow a resolution

• The algorithm did not resolve to a false entity

“The biggest is now 2.7 cm.”“Previously 53x18 mm.”“Craniocaudal diameter now 10.8 cm.”

false • The resolved entity is false or no entity was resolved

• The radiologist can find the correct entity.

“Large metastasis in liver with a size of 12.3 x 7.0 cm.” liver

Radiologist: metastasis

Page 29: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 29 September 2014 Corporate Technology

Evaluation Results

Evaluation results for 500 randomly selected sentences for each data set.

5%

21%

50%

24%false

unresolvable

(correct)

correct

4%

19%

44%

34%false

unresolvable

(correct)

correct

resolved 84%, unresolved 16%recall: 0.8698precision:0.8389F-measure: 0.8540

resolved 80%, unresolved 20%recall:0.7904precision:0.7864F-measure: 0.7884

Lymphoma Internistic

Page 30: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 30 September 2014 Corporate Technology

anat

omic

al e

ntity

Evaluation by Resolved Anatomical Entity

Page 31: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 31 September 2014 Corporate Technology

Evaluation of Annotator

Using RadLex brings the follwowing two problems when used for German text:

1. Missing annotations• Only about 25% of all RadLex concepts have German labels• 6.59% of all sentences get no relevant annotations• In 50% of the false resolutions, the correct entity was not annotated

2. Wrong annotations due to unspecific synonyms• ‘radlex:breast mass’ has synonyms: ‘mass’, ‘nodule’, ‘lesion’, ‘nodular enhancement’

and ‘area of enhancement’• ‘mass’ or ‘lesion’ are annotated with ‘radlex:breast mass’ and then the resolution

algorithm often falsely resolves to ‘breast mass’.

Page 32: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 32 September 2014 Corporate Technology

Limitations of a Pure Knowledge-based Approach

• normal size specifications overlap

• measured entities are often not within the normal range

• annotation quality• coverage• level of detail of RadLex concepts• wrong annotations due to synonyms

• restriction to sentence boundaries

• multiple measurements in one sentence

• one measurement about multiple entities

We need to use the sentence context to better resolve more complex sentences.

Page 33: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 33 September 2014 Corporate Technology

Agenda

Measurements in Radiology

Knowledge Model

Semantic Annotation of Radiology Reports

Extraction Algorithm

Evaluation

Outlook

Page 34: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 34 September 2014 Corporate Technology

Outlook

Adaptation of the algorithm already made:• Use adapted version of RadLex• Use statistics from the evaluated data set• Use distance within sentence• Now all sentences are in scope

Ongoing:• Include context information about the quality: normal, enlarged…• include annotations from previous sentence for unresolved sentences.• Density measurements

Page 35: Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports

Restricted © Siemens AG 2014. All rights reservedPage 35 September 2014 Corporate Technology

Application

Longitudinal view on reports from consequtive examinations