Information Extraction Lecture 5 – Named Entity Recognition III
Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports
-
Upload
oberkampf -
Category
Data & Analytics
-
view
191 -
download
6
description
Transcript of Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports
Unrestricted © Siemens AG 2014. All rights reserved
Knowledge-based Extraction of Measurement-Entity Relations from German Radiology Reports
IEEE International Conference on Healthcare Informatics / September 2014
Heiner Oberkampf1,2, Claudia Bretschneider1, Sonja Zillner1, Bernhard Bauer2 and Matthias Hammon3
1Siemens AG, Corporate Technology2University of Augsburg, Software Methodologies for Distributed Systems3University Hospital Erlangen, Department of Radiology
Restricted © Siemens AG 2014. All rights reservedPage 2 September 2014 Corporate Technology
Agenda
Measurements in Radiology
Knowledge Model
Semantic Annotation of Radiology Reports
Extraction Algorithm
Evaluation
Outlook
Restricted © Siemens AG 2014. All rights reservedPage 3 September 2014 Corporate Technology
Measurements in Radiology
Not comprehensive list
Sizelength: 1D, 2D, 3Darea, volumeindex (e.g. spleen index = width*height*depth)
Density measured in Hounsfield scale (Hu)mainly in CT imagesminimal, maximal and mean density values for Regions of Interest (ROIs)
Anglee.g. bone configurations or fractions
Blood flowe.g. PET: myocardial blood flow and blood flow in brain…
1) Source: http://www.recist.com/recist-in-practice/19.html
1)
Restricted © Siemens AG 2014. All rights reservedPage 4 September 2014 Corporate Technology
Size Measurements in Radiology Reports
Example Sentences
Leber mit kranio-kaudalem Durchmesser von 15,5 cm.
Größenprogrediente, unscharf abgrenzbare Hypodensität links temporal nach kranial bis nach parietobasal reichend (IMA 7-22; aktuell etwa 8 x 7 x 6 cm - Voruntersuchung etwa 4,5 x 3,5 cm) mit einzelnen, neuaufgetretenen, stippchenförmigen Hyperdensitäten (IMA 11-14).
Etwas kaudal hiervon im Unterlappen am Lappenspalt zentral ein 1.3 cm (VU 1.3 cm) großer Rundherd mit weiterhin deutlich vermehrtem FDG-Uptake (SUV max. 3.9; VU 5.7; IMA 182) im Oberlappen lappenspaltnah ein 1.0 cm (VU 1.0 cm) großer Rundherd mit vermehrtem FDG-Uptake (SUV max. 0.8; VU 1.5; IMA 199) sowie auf gleicher Höhe im Unterlappen dorsal paravertebral zwei Rundherde mit Ausläufern von 1.5 cm (VU 1.3 cm) und lateral hiervon zwei verschmolzene Lymphknoten von zusammen 1.7 cm Durchmesser (VU 1.5 cm + Satellit von 0.9 cm) mit deutlich vermehrtem FDG-Uptake (SUV max. 4.0; VU 3.2 bzw. SUV max. 6.6; VU 4.8; IMA 207) und im costophrenischen Winkel dorsal ein 0.9 cm (VU 0.5 cm) großer Rundherd mit vermehrtem FDG-Uptake (SUV max. 1.7; VU 1.7; IMA 234).
Restricted © Siemens AG 2014. All rights reservedPage 5 September 2014 Corporate Technology
Longitudinal Integration
Image source: “Automated Detection and Volumetric Segmentation of the Spleen in CT Scans” M. Hammon, P. Dankerl, M. Kramer, S. Seifert, A. Tsymbal2, M. J. Costa2, R. Janka1, M. Uder1, A. Cavallaro
Restricted © Siemens AG 2014. All rights reservedPage 6 September 2014 Corporate Technology
Two Data Sets
382 Lymphoma Patients• 2584 reports• imaging modality: CT, MRI, US,
Radiography, …
Diverse Internistic Patients• 6007 reports• imaging modality: CT
Restricted © Siemens AG 2014. All rights reservedPage 7 September 2014 Corporate Technology
Size Measurements in Radiology Reports1)
Mostly 1- and 2-dimensional and one or two per sentence.
1) Based on a data set of 2854 German radiology reports of 377 lymphoma patients and one of 6007 of diverse internistic patients
40%
58%
2%1-dim
3-dim
2-dim
13109
4820538 668 290
1 2 3 4 >4
# measurements contained in a sentence
# se
nten
ces
Type of measuements: Type of sentences:
Restricted © Siemens AG 2014. All rights reservedPage 8 September 2014 Corporate Technology
Agenda
Measurements in Radiology
Knowledge Model
Semantic Annotation of Radiology Reports
Extraction Algorithm
Evaluation
Outlook
Restricted © Siemens AG 2014. All rights reservedPage 9 September 2014 Corporate Technology
Size Specifications
Commonly used types to describe the size of anatomical entities.
Interval• Anterior-posterior diameter of liver normally 10-13 cm• Thickness of wall of gallbladder normally 0.1 -0.3 cm
Normal Valuewith deviation
• Truncus pulmonalis: 1.4 cm +/- 0.4 cm
Upper Bound • Normal lymph node < 1 cm
Lower Bound• Normal aorta diameter > 4 cm at root• Enlarged lymph node > 1 cm
Basic form: anatomical entity, quality, value specificationNote: Specifications might be age or gender specific
Restricted © Siemens AG 2014. All rights reservedPage 10 September 2014 Corporate Technology
Coverage• 50 size specifications• 38 different anatomical
entities
Representation• OWL
Knowledge Representation
Anatomical Entities• Radiological Lexicon (RadLex)• Foundational Model of Anatomy
(FMA)1)
Qualities• Ont. for Phenotypic Qualities (PATO)1)
Value Specifications• Ont. for Biomedical Investigations
(OBI)1)
• Information Artifact Ontology (IAO)1)
• Units Ontology (UO)1)
• Model for Clinical Information (MCI)
Knowledge ModelReused Ontologies Knowledge Resources
The Knowledge Model is based on existing biomedical ontologies.
1) Part of the Open Biological and Biomedical Ontologies Foundry library http://www.obofoundry.org/
Restricted © Siemens AG 2014. All rights reservedPage 11 September 2014 Corporate Technology
obi:has value specificationiao:has measurement
unit label
Value Specification
Quality Anatomical Entity
Normal Upper Bound Specification
Example: Lymph nodes are normally < 1 cm.
_:vs1uo:centimeter
uo:length unit
“1.0”^^xsd:float
obi:scalar value specification
_:usp
radlex:lymph node
bfo:inheres in
pato:normal
_:ln
pato:length
pato:size
:normalDiameterOfLymphNode
obi:has specified value
iao:is quality specification of
pato:diameter
mci:upper bound specification
Restricted © Siemens AG 2014. All rights reservedPage 12 September 2014 Corporate Technology
obi:has specified value
ro:has part
iao:has measurement unit label
Value Specification
Quality Anatomical Entity
Normal Interval Specification
Example: Normal diameter of the pulmonary atery is between 1.6 and 2.6 cm.
_:vs1 uo:centimeter
uo:length unit
“2.6”^^xsd:float
obi:scalar value specification
obi:has value specification
radlex:pulmonary atery
bfo:inheres in_:pulmAtery:normalDiameterOfPu
lmonaryAtery
_:vs2 “1.6”^^xsd:float
_:ubsp
mci:upper bound specification
_:lbsp
mci:lower bound specification
iao:is quality specification of
pato:normal
pato:length
pato:size
pato:diameter
_:isp
mci:interval specification
Restricted © Siemens AG 2014. All rights reservedPage 13 September 2014 Corporate Technology
obi:has specified value
ro:has part
iao:has measurement unit label
Value Specification
Quality Anatomical Entity
Normal Interval Specification
Example: Normal length of kidney along craniocaudal axis: 8.0 – 13.0 cm.
_:vs1 uo:centimeter
uo:length unit
“13.0”^^xsd:float
obi:scalar value specification
obi:has value specification
_:isp
mci:interval specification
radlex:kidney
bfo:inheres in
pato:normal
_:kidney
pato:lengthpato:size
bspo:transverse plane
_:tp
bspo:orthogonal_to:normalLengthKidney
Craniocaudal
_:vs2 “8.0”^^xsd:float
_:ubsp
mci:upper bound specification
_:lbsp
mci:lower bound specification
iao:is quality specification of
Restricted © Siemens AG 2014. All rights reservedPage 14 September 2014 Corporate Technology
Agenda
Measurements in Radiology
Knowledge Model
Semantic Annotation of Radiology Reports
Extraction Algorithm
Evaluation
Outlook
Restricted © Siemens AG 2014. All rights reservedPage 15 September 2014 Corporate Technology
Semantic Annotation of Radiology Reports
Recognition of entities from ontologies and measurements
23 mm
radlex:lymph node
“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”
measurement
value unit
Restricted © Siemens AG 2014. All rights reservedPage 16 September 2014 Corporate Technology
Semantic Annotation of Radiology Reports
Functional Scope• Detection of multiword terms independent from the ordering of the individual tokens.• Respect sentence boundaries and map multiword terms only when they occur within
these boundaries.• Recognition of inflected forms of ontological concepts in the text such as detection of
plural form or other grammatical inflections based on stemmed forms.
Technical Realization• builds on top of the UIMA framework• adapted form of the UIMA Concept Mapper• Outputs annotations in RDF
Restricted © Siemens AG 2014. All rights reservedPage 17 September 2014 Corporate Technology
Running Example
The running example used during the description of the resolution algorithm
radlex:lymphadenopathy
radlex:kidney radlex:renal pedicle
2.3 uo:centimeter
radlex:lymph node
radlex:lateral aortic lymph node
radlex:inferior para-aortic lymph node
radlex:rightradlex:enlarged
radlex:inferiorradlex:paraaortic
“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”
Annotations:
Restricted © Siemens AG 2014. All rights reservedPage 18 September 2014 Corporate Technology
Agenda
Measurements in Radiology
Knowledge Model
Semantic Annotation of Radiology Reports
Extraction Algorithm
Evaluation
Outlook
Restricted © Siemens AG 2014. All rights reservedPage 19 September 2014 Corporate Technology
Overview of Algorithm
1. Using ontology structure of RadLex and create spanning tree for annotations.
2. Compare Measurement values with Knowledge Model
3. Compute a ranking and select the best entity
Restricted © Siemens AG 2014. All rights reservedPage 20 September 2014 Corporate Technology
Filter and Expand the Set of Annotations
Use knowledge from the RadLex ontology
anatomical entity imaging observationclinical finding
RadLex entity
descriptorimaging modality …
lymphadenopathy
kidney renal pedicle
lymph node
lateral aortic lymph node
inferior para-aortic lymph node
rightenlarged
inferiorparaaortic
Anatomical_Site
Restricted © Siemens AG 2014. All rights reservedPage 21 September 2014 Corporate Technology
Minimal Spanning Tree
Based on the set of relevant annotations we create a tree along the RadLex subclass hierarchy
Sentence:“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”
Restricted © Siemens AG 2014. All rights reservedPage 22 September 2014 Corporate Technology
Attach Normal Size Specifications
For each entity of the spanning tree we retrieve available size specifications from the knowledge model.
normal: 0-1 cmenlarged: 1-5 cmcraniocaudal extension: 8-13 cm
anterior posterior diameter: 4 cm
compValue: 0.73 compValue: 0.0 compValue: 1.3compValue: 0.0compValue: 2.48
compValue: 0.73
Sentence:“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”
Restricted © Siemens AG 2014. All rights reservedPage 23 September 2014 Corporate Technology
Propagate Comparison Value
compValue: 0.73 compValue: 0.0
compValue: 0.0
compValue: 0.0 compValue: 0.0
compValue: 0.0 Sentence:“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”
Restricted © Siemens AG 2014. All rights reservedPage 24 September 2014 Corporate Technology
Ranking and Selection of Best Entity
Take ranking includes the position in the RadLex hierarchy
2.3 uo:centimeterradlex:inferior para-aortic lymph node
• Include position in RadLex hierarchy more specific entities are preferred• Use threshold criteria to select best entity
“Enlarged lymph node right paraaortal below the renal pedicle now 23 mm.”
Structured Representation:
Restricted © Siemens AG 2014. All rights reservedPage 25 September 2014 Corporate Technology
Agenda
Measurements in Radiology
Knowledge Model
Semantic Annotation of Radiology Reports
Extraction Algorithm
Evaluation
Outlook
Restricted © Siemens AG 2014. All rights reservedPage 26 September 2014 Corporate Technology
Scope of the Algorithm
The described algorithm resolves only one measurement-entity relation per sentence.
In Scope Out of Scope
• Sentences with two measurements about different entities. E.g. “Splenomegaly with 23.0 x 14.5 x 8.5 cm and approx. 1.0 cm lesion.”
• Sentences with more than two measurements
• Sentences with one measurement• Sentences with two measurements where
both measurements are about the same entity. E.g.“Spleen now with 10.5 x 4.5 cm slightly smaller than in previous examination with 13.3 x 6.7 cm.”
Restricted © Siemens AG 2014. All rights reservedPage 27 September 2014 Corporate Technology
Scope of Algorithm
Analysis of sentences in- and out-of-scope
3980
791
249
71 78 31
1 2 3 4 >4
# se
nten
ces
# measurements contained in a sentence
#Sentences out of Scope: 8.25%
9129
2798
982
467 590 259
1 2 3 4 >4#
sent
ence
s# measurements contained in a sentence
#Sentences out of Scope: 16.15%
Reports on Lymphoma Patients Reports on Internistic Patients
Restricted © Siemens AG 2014. All rights reservedPage 28 September 2014 Corporate Technology
Evaluation Schema
Description Example
correct • The entity resolved is exactly what the measurement is about
• The radiologist cannot name a betterentity
“Lymph node in mediastium 1.8 cm”mediastinal lymph node
(correct) • The entity resolved is correct however it could be more specific
• The radiologist can name a better entity
“Lymph node in jaw angle 1 cm”lymph node
Radiologist: jugular lymph node
unresolvable • The sentence does not allow a resolution
• The algorithm did not resolve to a false entity
“The biggest is now 2.7 cm.”“Previously 53x18 mm.”“Craniocaudal diameter now 10.8 cm.”
false • The resolved entity is false or no entity was resolved
• The radiologist can find the correct entity.
“Large metastasis in liver with a size of 12.3 x 7.0 cm.” liver
Radiologist: metastasis
Restricted © Siemens AG 2014. All rights reservedPage 29 September 2014 Corporate Technology
Evaluation Results
Evaluation results for 500 randomly selected sentences for each data set.
5%
21%
50%
24%false
unresolvable
(correct)
correct
4%
19%
44%
34%false
unresolvable
(correct)
correct
resolved 84%, unresolved 16%recall: 0.8698precision:0.8389F-measure: 0.8540
resolved 80%, unresolved 20%recall:0.7904precision:0.7864F-measure: 0.7884
Lymphoma Internistic
Restricted © Siemens AG 2014. All rights reservedPage 30 September 2014 Corporate Technology
anat
omic
al e
ntity
Evaluation by Resolved Anatomical Entity
Restricted © Siemens AG 2014. All rights reservedPage 31 September 2014 Corporate Technology
Evaluation of Annotator
Using RadLex brings the follwowing two problems when used for German text:
1. Missing annotations• Only about 25% of all RadLex concepts have German labels• 6.59% of all sentences get no relevant annotations• In 50% of the false resolutions, the correct entity was not annotated
2. Wrong annotations due to unspecific synonyms• ‘radlex:breast mass’ has synonyms: ‘mass’, ‘nodule’, ‘lesion’, ‘nodular enhancement’
and ‘area of enhancement’• ‘mass’ or ‘lesion’ are annotated with ‘radlex:breast mass’ and then the resolution
algorithm often falsely resolves to ‘breast mass’.
Restricted © Siemens AG 2014. All rights reservedPage 32 September 2014 Corporate Technology
Limitations of a Pure Knowledge-based Approach
• normal size specifications overlap
• measured entities are often not within the normal range
• annotation quality• coverage• level of detail of RadLex concepts• wrong annotations due to synonyms
• restriction to sentence boundaries
• multiple measurements in one sentence
• one measurement about multiple entities
We need to use the sentence context to better resolve more complex sentences.
Restricted © Siemens AG 2014. All rights reservedPage 33 September 2014 Corporate Technology
Agenda
Measurements in Radiology
Knowledge Model
Semantic Annotation of Radiology Reports
Extraction Algorithm
Evaluation
Outlook
Restricted © Siemens AG 2014. All rights reservedPage 34 September 2014 Corporate Technology
Outlook
Adaptation of the algorithm already made:• Use adapted version of RadLex• Use statistics from the evaluated data set• Use distance within sentence• Now all sentences are in scope
Ongoing:• Include context information about the quality: normal, enlarged…• include annotations from previous sentence for unresolved sentences.• Density measurements
Restricted © Siemens AG 2014. All rights reservedPage 35 September 2014 Corporate Technology
Application
Longitudinal view on reports from consequtive examinations