Annotation of 311 Admission Summaries of the ICU Corpus Yefeng Wang.
Date posted: 19-Dec-2015
Aim
• Create evaluation data for SNOMED CT concept matching performance.
• Create training data for machine learning systems.
  – Rule-based systems have low recall
  – Rules are difficult to build and parameters are difficult to tune
  – Machine learning systems are the state of the art
  – No such annotated data is available yet
Existing Corpora
• Most of the existing corpora are in the biomedical domain
  – GENIA (2000 abstracts from MEDLINE)
  – PennBioIE (2300 MEDLINE abstracts)
• Only a few are from the clinical domain
  – Ogren et al. (clinical conditions only)
  – Chapman et al. (clinical conditions only)
  – CLEF (semantic annotation, formal reports)
Selection of Data
• Clinical notes were taken from the admission summaries of 311 patients.
• One note per patient
• Admission notes were used for annotation
  – Semi-structured, with a variety of information
    • Chief Complaint
    • Background
    • History of Presenting Illness
    • Medication
    • Examination
    • Observation in Nursing Notes
    • Social
    • Other summaries (Echo reports, Surgical reports, etc.)
The Annotation Task
• Concept Annotation
  – Annotate the semantic category of medical concepts
  – Categories were based on SNOMED CT
• Relation Annotation
  – Relationships between concepts
  – Inter-term relation
    • Relationship between two separate concepts
  – Intra-term relation
    • Relationship between atomic concepts within a composite concept (post-coordination)
Annotation Schema
Category Example
Body small bowel loops
Procedure Loop ileostomy
Finding Persistent tachycardia
Abnormality Inflammatory adhesions
Qualifier Grade 3 intubation
Object Sump drain
Substance Ceftriaxone
Occupation Review by cardiologist
Organism Enterococcus
Behaviour Lives with son
Development of Guidelines
• Iterative approach
• 10 reports were annotated jointly by two annotators.
  – Discussion
  – Development of initial guidelines
• 25 reports were used for iterative refinement of the guidelines
  – Annotated separately
  – 5 documents for each iteration
  – New examples and rules were added to the annotation guidelines if necessary
Annotation Agreement
• Inter-annotator agreement (IAA) was calculated during each development cycle.
• The F1 score is used for the calculation
  – Harmonic mean of recall and precision
  – Precision = # correct annotations / # annotations
  – Recall = # correct annotations / # existing concepts
• Repeat the development process until annotator agreement reaches a threshold of 90%.
• The guidelines are then finalised; no new rules are added to the guidelines.
• Differences are resolved by a third annotator to make a gold standard corpus.
IAA for the development cycle
Iter TP FP FN P(%) R(%) F1(%)
1 152 38 31 80.00 83.06 81.50
2 185 21 29 89.81 86.45 88.10
3 194 22 31 89.81 86.22 87.98
4 238 29 25 89.14 90.49 89.81
5 159 19 17 89.33 90.34 89.83
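As a quick check of the agreement measure, the sketch below recomputes precision, recall and F1 from the TP, FP and FN counts of iteration 1. The function name `prf` is chosen here for illustration; it assumes that "# annotation" equals TP + FP and "# existing concepts" equals TP + FN.

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from iteration 1 of the development cycle.
p, r, f = prf(tp=152, fp=38, fn=31)
print(f"{100 * p:.2f} {100 * r:.2f} {100 * f:.2f}")  # 80.00 83.06 81.50
```

The printed values match the P, R and F columns of the iteration 1 row.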
IAA for the whole corpus (311)
Class TP FP FN P(%) R(%) F1(%)
body 1404 202 298 87.4 82.48 84.87
observable 348 62 89 84.77 79.52 82.06
abnormality 720 139 248 83.74 74.35 78.77
qualifier 1763 198 392 89.89 81.8 85.66
object 158 43 39 78.35 80 79.17
substance 2465 129 156 95.01 94.03 94.52
behaviour 68 16 18 80.49 78.57 79.52
occupation 125 33 35 78.95 77.92 78.43
finding 4116 371 398 91.72 91.17 91.44
organism 25 8 10 75 70.59 72.73
procedure 2076 298 288 87.43 87.82 87.63
overall 13273 1504 1976 89.82 87.04 88.41
Concept Frequency
Concept Class # of Instances Percentage
body 620 4.88%
observable 210 1.65%
substance 2431 19.14%
qualifier 1727 13.60%
object 158 1.24%
behaviour 375 2.95%
occupation 136 1.07%
finding 4755 37.44%
organism 35 0.28%
procedure 2253 17.74%
total 12700 100.00%
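The percentage column can be reproduced from the instance counts alone. The short sketch below does this; the dictionary name `counts` is only illustrative.

```python
# Instance counts per concept class, copied from the table above.
counts = {
    "body": 620,
    "observable": 210,
    "substance": 2431,
    "qualifier": 1727,
    "object": 158,
    "behaviour": 375,
    "occupation": 136,
    "finding": 4755,
    "organism": 35,
    "procedure": 2253,
}

total = sum(counts.values())

# Share of each class as a percentage of all instances, two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:<12}{counts[name]:>6}{share:>8.2f}%")
print(f"{'total':<12}{total:>6}")
```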
Comparison to Other Corpora
• Comparison with corpora in the newswire, biomedical, and science (astronomy) domains.
• Available corpora: MUC, GENIA, ASTRO
Corpus ICU GENIA MUC ASTRO
# categories 10 36 8 43
# entities 12700 40548 11568 10744
avg. length (words) 1.49 1.70 1.64 1.49
tag density 40.3% 33.8% 11.8% 5.4%
Concept Identification Result
• 279 documents for training
• 32 documents for testing
• 4656 tokens, 1218 concepts
• Rule-based system (TTSCT)
• Use Conditional Random Fields (CRF++) as the learner.
• Evaluate using the CoNLL-2000 evaluation script.
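CRF++ reads one token per line with whitespace-separated columns, the label in the last column, and a blank line between sequences. The CoNLL-2000 script scores chunk-style tags. The slides do not say how concept spans were encoded, so the sketch below assumes a BIO encoding of outermost concepts only. The helper names and the example sentence are made up for illustration, reusing terms from the annotation schema.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, concept category)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping concept spans into BIO tags (assumed encoding)."""
    tags = ["O"] * n_tokens
    for start, end, category in spans:
        tags[start] = f"B-{category}"
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"
    return tags


def to_columns(tokens: List[str], tags: List[str]) -> str:
    """Emit one 'token<TAB>label' line per token, ending with a blank line."""
    lines = [f"{tok}\t{tag}" for tok, tag in zip(tokens, tags)]
    return "\n".join(lines) + "\n\n"


# Hypothetical example sentence, not taken from the corpus.
tokens = ["Persistent", "tachycardia", "on", "Ceftriaxone"]
spans = [(0, 2, "finding"), (3, 4, "substance")]

tags = spans_to_bio(len(tokens), spans)
print(to_columns(tokens, tags), end="")
```

A real feature set would add further columns (for example orthographic or lexicon features) between the token and the label.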
Concept Matcher Performance
Configuration P R F1 Δ
No pre-processing at all (simple TTSCT) 58.76 26.63 36.35 ---
Pruning lexicon, removing unrelated classes 64.87 46.73 54.33 +17.98
Expanding acronyms + Exact Matching (TTSCT) 74.89 55.25 63.59 +9.26
Expanding acronyms + Approximate 71.67 63.19 67.16 +3.57
Expanding acronyms + Approx. Matching 1 + Approx. Matching 2 64.88 59.14 61.88 -5.28
Performance increase over baseline +30.81
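The effect of acronym expansion on recall can be illustrated with a toy longest-match lexicon lookup. This is only a sketch and not TTSCT itself: the lexicon entries, their category assignments, the acronym table and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: token tuple -> concept category.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("chest", "x-ray"): "procedure",
    ("tachycardia",): "finding",
    ("ceftriaxone",): "substance",
}

# Hypothetical acronym table.
ACRONYMS: Dict[str, List[str]] = {"cxr": ["chest", "x-ray"]}


def expand_acronyms(tokens: List[str]) -> List[str]:
    """Replace each known acronym with its expanded token sequence."""
    out: List[str] = []
    for tok in tokens:
        out.extend(ACRONYMS.get(tok.lower(), [tok]))
    return out


def match_concepts(tokens: List[str], max_len: int = 3) -> List[Tuple[str, str]]:
    """Greedy longest-first exact matching of token n-grams against the lexicon."""
    toks = [t.lower() for t in tokens]
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(toks):
        for n in range(min(max_len, len(toks) - i), 0, -1):
            key = tuple(toks[i:i + n])
            if key in LEXICON:
                matches.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return matches


note = ["CXR", "then", "Ceftriaxone"]
print(match_concepts(note))                   # misses the acronym
print(match_concepts(expand_acronyms(note)))  # also finds "chest x-ray"
```

Without expansion only the substance is matched; with expansion the acronym is matched as well, which is the kind of recall gain the table reports.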
Machine Learning Results
Features P R F1 Dec.
Best 84.22 78.90 81.48 ---
Best - abb 83.20 77.26 80.12 -1.36
Best - orth 83.67 78.24 80.87 -0.61
Best - affix 83.16 77.01 79.97 -1.51
Best - SNOMED 79.06 73.15 75.99 -5.49
Best - bigram 83.17 78.74 80.89 -0.59
Best - bow 81.26 73.32 77.08 -4.40
Bow (Baseline) 76.86 66.26 71.16 -10.32
Bow + SNOMED 82.61 74.88 78.55 +7.39 (relative to the Bow baseline)
Inter Relation Annotation
• Annotate relationships between concepts
• Inter-concept relations
  – Relationship between two outermost concepts
  – Example: CXR in ED bilateral mid-lower zone opacification
[Diagram: concepts CXR, Opacification, Bilateral, Mid-lower zone; relation labels HAS_FOCUS, LATERALITY, HAS_FINDING]
Intra-Concept Relations
• Relations between inner concepts and outermost concepts
  – Term decomposition
  – Example: R groin abscess
[Diagram: the composite concept Groin Abscess with its parts Groin, Abscess and R; relation labels ASSOCIATED_ABNORMALITY, FINDING_SITE, LATERALITY]
Relation Types
procedure_site associated_abnormality
finding_site has_focus
due_to associated_with
associated_device associated_substance
has_finding interprets
severity locality
laterality negation
Inter + Intra Concept Relationships
• Hemicolectomy and formation of ileostomy for bowel obstruction
[Diagram: concepts Hemicolectomy, Ileostomy, Bowel Obstruction; relation labels HAS_FOCUS, HAS_FOCUS, FINDING_SITE, ASSOCIATED_ABNORMALITY]
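One way to hold such annotations in code is as typed triples. The sketch below lists the 14 relation types from the slides and encodes the hemicolectomy example. The class and field names are invented here, and the edge directions are one plausible reading of the flattened diagram rather than something the slides state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RelationType(Enum):
    """The 14 relation types listed under Relation Types."""
    PROCEDURE_SITE = "procedure_site"
    ASSOCIATED_ABNORMALITY = "associated_abnormality"
    FINDING_SITE = "finding_site"
    HAS_FOCUS = "has_focus"
    DUE_TO = "due_to"
    ASSOCIATED_WITH = "associated_with"
    ASSOCIATED_DEVICE = "associated_device"
    ASSOCIATED_SUBSTANCE = "associated_substance"
    HAS_FINDING = "has_finding"
    INTERPRETS = "interprets"
    SEVERITY = "severity"
    LOCALITY = "locality"
    LATERALITY = "laterality"
    NEGATION = "negation"


@dataclass(frozen=True)
class Relation:
    source: str
    relation: RelationType
    target: str
    scope: str  # "inter" (between concepts) or "intra" (within a composite)


# Assumed reading of: "Hemicolectomy and formation of ileostomy
# for bowel obstruction".
relations: List[Relation] = [
    Relation("Hemicolectomy", RelationType.HAS_FOCUS, "Bowel Obstruction", "inter"),
    Relation("Ileostomy", RelationType.HAS_FOCUS, "Bowel Obstruction", "inter"),
    Relation("Bowel Obstruction", RelationType.FINDING_SITE, "Bowel", "intra"),
    Relation("Bowel Obstruction", RelationType.ASSOCIATED_ABNORMALITY, "Obstruction", "intra"),
]


def by_scope(rels: List[Relation], scope: str) -> List[Relation]:
    """Select only inter-concept or only intra-concept relations."""
    return [r for r in rels if r.scope == scope]


for r in relations:
    print(f"{r.scope}: {r.source} --{r.relation.value}--> {r.target}")
```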