The Neighborhood Auditing Tool
James Geller, Michael Halper, Yehoshua Perl, C. Paul Morrey
Research Paper
C.P. Morrey, J. Geller, M. Halper, Y. Perl. The Neighborhood Auditing Tool: A hybrid interface for auditing the UMLS. J Biomed Inform, 42(3):468-89, 2009.
Overview
Goals of an Auditor’s Tool for the UMLS
Principles of Auditing with Neighborhoods
The Idea of a Hybrid Display
Current State of the NAT: Serving the Auditor
Presentation of NAT Features
Live Audit Session
Planned State of the NAT: Guiding the Auditor
Conclusions
Future Work
Auditing the UMLS
About 150 source vocabularies
It is natural that inconsistencies will appear
Over 2.1 million concepts and nearly 9.7 million terms*
Two-level structure consisting of the Semantic Network and the Metathesaurus
*UMLS Metathesaurus version 2009AA
Previous Work on Auditing
H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J.J. Cimino. Representing the UMLS as an Object-oriented Database: Modeling Issues and Advantages. J Am Med Inform Assoc, 7(1):66-80, 2000.
J. Geller, H. Gu, Y. Perl, and M. Halper. Semantic refinement and error correction in large terminological knowledge bases. Data & Knowledge Engineering, 45(1):1-32, 2003.
J.J. Cimino, H. Min, and Y. Perl. Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform, 36(6):450-461, 2003.
H. Gu, Y. Perl, G. Elhanan, H. Min, L. Zhang, Y. Peng. Auditing concept categorizations in the UMLS. Artif Intell Med, 31(1):29-44, 2004.
Y. Chen, Y. Perl, J. Geller, and J.J. Cimino. Analysis of a study of the users, uses, and future agenda of the UMLS. J Am Med Inform Assoc, 14(2):221-231, 2007.
Previous Work on Auditing (cont’d)
H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G. Elhanan, J.J. Cimino, J. Geller, and Y. Perl. Evaluation of a UMLS auditing process of semantic type assignments. In J.M. Teich, J. Suermondt, and G. Hripcsak, editors, Proc AMIA Symp, pages 294-298, Chicago IL, Nov. 2007.
Y. Chen, H. Gu, Y. Perl, J. Geller, M. Halper. Structural group auditing of a UMLS semantic type's extent. J Biomed Inform. 2009 Feb;42(1):41-52.
L. Chen, C.P. Morrey, H. Gu, M. Halper, Y. Perl. Modeling multi-typed structurally viewed chemicals with the UMLS Refined Semantic Network. J Am Med Inform Assoc, 16(1):116-31, 2009.
Y. Chen, H. Gu, Y. Perl, J. Geller. Structural group-based auditing of missing hierarchical relationships in UMLS. J Biomed Inform. 2009 Jun;42(3):452-67.
Y. Chen, H. Gu, Y. Perl, M. Halper, and J. Xu, Expanding the extent of a UMLS Semantic Type via Group Neighborhood Auditing. J Am Med Inform Assoc, Accepted for publication.
How we did it before the NAT: Provide Info as Paper Form
CPT: C1081844 Antonospora locustae
SRC: NCBI
STY: T004 T009 Fungus + Invertebrate
DEF:
SYN: Antonospora locustae | Nosema locustae
PAR: Antonospora {STY: Invertebrate}
CHD:
Data shown for this concept is from the UMLS Metathesaurus version 2006AC
Auditing Results also Paper Form
(C1081844) Antonospora locustae
STY: Fungus + Invertebrate
No errors
Semantic Type Error: Fungus
Semantic Type Error: Invertebrate
Add Semantic Type ______________________
Ambiguity
Other error _____________________________
Comments _____________________________
______________________________________
Goals of an Auditor’s Tool for the UMLS
Display relevant information to the auditor.
Do not overwhelm the auditor with too much information.
Help the auditor focus on areas most likely to contain errors.
  Algorithms suggest likely erroneous concepts
  Concepts are reviewed in a neighborhood display
Principles of Auditing with Neighborhoods
Several years of experience: Auditing is to a large degree a “local” activity.
Concepts have two kinds of knowledge elements:
  Textual Knowledge Elements: Preferred term, CUI, synonyms, LUI, definition, sources, semantic types
  Contextual Knowledge Elements: Neighbors
Neighborhoods
Focus concept: The concept presently under review
Immediate Neighborhood: The set of concepts reachable from the focus concept by stepping one relationship (up, down, lateral, etc.)
Extended neighborhood: Includes parents of parents (grandparents), children of children (grandchildren) and siblings. No lateral chains.
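The neighborhood definitions above can be sketched in a few lines of Python. This is only an illustration, not the NAT's implementation: it assumes the hierarchy is available as a hypothetical list of (parent, child) pairs plus lateral (concept, related concept) pairs, and the function names and toy concepts are made up.

```python
from collections import defaultdict

def build_index(par_chd, lateral=()):
    """Index hypothetical (parent, child) and lateral (a, b) pairs."""
    parents = defaultdict(set)
    children = defaultdict(set)
    related = defaultdict(set)
    for p, c in par_chd:
        parents[c].add(p)
        children[p].add(c)
    for a, b in lateral:
        related[a].add(b)
    return parents, children, related

def immediate_neighborhood(focus, parents, children, related):
    """Concepts exactly one relationship step away from the focus concept."""
    return {
        "parents": set(parents[focus]),
        "children": set(children[focus]),
        "lateral": set(related[focus]),
    }

def extended_neighborhood(focus, parents, children, related):
    """Immediate neighborhood plus grandparents, grandchildren and siblings.
    Lateral relationships are not chained."""
    hood = immediate_neighborhood(focus, parents, children, related)
    hood["grandparents"] = {g for p in hood["parents"] for g in parents[p]}
    hood["grandchildren"] = {g for c in hood["children"] for g in children[c]}
    hood["siblings"] = {s for p in hood["parents"] for s in children[p]} - {focus}
    return hood

# Toy hierarchy with placeholder names: G > P > F > C > GC, S is a sibling of F.
index = build_index(
    [("G", "P"), ("P", "F"), ("P", "S"), ("F", "C"), ("C", "GC")],
    lateral=[("F", "R")],
)
hood = extended_neighborhood("F", *index)
```

Siblings are derived as the other children of the focus concept's parents, which is one reasonable reading of the definition above.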
References about Neighborhood
M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S. Erlbaum, W.D. Sperzel, L.F. Fuller, et al. Using META-1, the first version of the UMLS Metathesaurus. In Proc 14th Annu Symp Comput Appl Med Care, pages 131-135, Washington, D.C., 1990.
S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D. Sherertz, W. D. Sperzel, M.S. Erlbaum, L.L. Fuller, N.E. Olson, From meaning to term: semantic locality in the UMLS Metathesaurus. In Proc Annu Symp Comput Appl Med Care, pages 209-213, Washington, D.C., 1991.
Immediate Neighborhood
[Figure: immediate neighborhood diagram.]
Concepts shown: Microsporidia, Unclassified; Microsporidia <protozoa>; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium
Other labels: Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic
Extended Neighborhood
[Figure: extended neighborhood display with panels labeled GRANDPARENTS, PARENTS, FOCUS CONCEPT, CHILDREN, GRANDCHILDREN, SIBLINGS (SIB), and RELATIONSHIPS; one concept is marked "Erroneous concept".]
Concepts shown: Microsporidia, Unclassified; Microsporidia <protozoa>; fungus; PHYLUM MICROSPORA; Protozoa; Sporozeoa; Microsporea; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium; Dictyocoela berillonum; Dictyocoela cavimanum; Dictyocoela dehayesum; Dictyocoela duebenum; Dictyocoela grammarellum; Dictyocoela muelleri; Dictyocoela sp.L11; Edhazardia aedis; Fibrillanosema crangonycis; Kabatana takedai; Microsporidium 57864; Microsporidium africanum; Microsporidium ceylonensis; Microsporidium cypselurus; Microsporidium prosopium; Microsporidium seriolae; Oligosporidium occidentalis
Other labels: Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic
Up-Extended and Down-Extended Neighborhood
An up-extended neighborhood includes grandparents and the immediate neighborhood.
A down-extended neighborhood includes grandchildren and the immediate neighborhood.
Give auditor all s/he needs but not more.
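A minimal sketch of the two variants, assuming hypothetical `parents` and `children` dictionaries that map a concept to its direct parents or children. Lateral relationships, which also belong to the immediate neighborhood, are left out to keep the example short.

```python
def up_extended(focus, parents, children):
    """Parents and children of the focus concept, plus grandparents."""
    ps = set(parents.get(focus, ()))
    return {
        "grandparents": {g for p in ps for g in parents.get(p, ())},
        "parents": ps,
        "children": set(children.get(focus, ())),
    }

def down_extended(focus, parents, children):
    """Parents and children of the focus concept, plus grandchildren."""
    cs = set(children.get(focus, ()))
    return {
        "parents": set(parents.get(focus, ())),
        "children": cs,
        "grandchildren": {g for c in cs for g in children.get(c, ())},
    }

# Toy chain with placeholder names: G > P > F > C > GC.
parents = {"P": {"G"}, "F": {"P"}, "C": {"F"}, "GC": {"C"}}
children = {"G": {"P"}, "P": {"F"}, "F": {"C"}, "C": {"GC"}}
up = up_extended("F", parents, children)
down = down_extended("F", parents, children)
```

Each variant adds only one extra level in one direction, in line with giving the auditor what is needed and no more.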
Semantic Type Neighborhood
If we provide the semantic types for every concept, those also form a neighborhood.
It is important to keep the information of which semantic types are assigned to which concepts.
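One way to keep the concept-to-type assignments while collecting the types of a neighborhood is sketched below. The function name and the `sty_of` mapping are assumptions for illustration; the two assignments in the example are the ones shown on the paper form for Antonospora locustae and its parent.

```python
def semantic_type_neighborhood(concepts, sty_of):
    """Return (types, assignments) for the concepts of a neighborhood.

    types:       every semantic type assigned to at least one of the concepts
    assignments: concept -> its semantic types, so the link is not lost
    """
    assignments = {c: sorted(sty_of.get(c, ())) for c in concepts}
    types = sorted({t for stys in assignments.values() for t in stys})
    return types, assignments

sty_of = {
    "Antonospora locustae": {"Fungus", "Invertebrate"},
    "Antonospora": {"Invertebrate"},
}
types, assignments = semantic_type_neighborhood(
    ["Antonospora locustae", "Antonospora"], sty_of
)
```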
The Idea of a Hybrid Display
Diagrams are wonderful – as long as they fit on one screen.
Indented text is wonderful – as long as there are no or very few multiple parents.
But the UMLS does not fit onto one screen and there are many cases of multiple parents.
What makes a diagram wonderful?
You can follow parent/child paths with your eyes.
You can get a feeling for everything a concept is connected to with one look.
You can see multiple parents and multiple paths with one look.
You can see global features (short and bushy versus tall and sparse, or (gasp!) tall and bushy).
What makes indented text wonderful?
Indentation expresses parenthood compactly and elegantly.
There are no lines crossing.
You don’t need a layout algorithm.
There is a linear order in which to study text.
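The following sketch shows how little code an indented listing needs, and also why multiple parents are a problem for it. It assumes a hypothetical `children` dictionary with placeholder concept names and a hierarchy without cycles.

```python
def indented_lines(root, children, depth=0):
    """Depth-first listing; a line indented one level deeper than the
    line it follows is a child of that concept. Assumes no cycles."""
    lines = ["  " * depth + root]
    for child in sorted(children.get(root, ())):
        lines.extend(indented_lines(child, children, depth + 1))
    return lines

# D has two parents (B and C), so it has to be printed twice.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
lines = indented_lines("A", children)
print("\n".join(lines))
```

No layout algorithm is involved and the output has a linear reading order, but the concept with two parents is repeated under each of them.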
The Idea of a Hybrid Display (cont.)
Keep the best features of text and the best features of diagrams.
Maintain relative positions between the focus concept and its children, parents, etc.
Eliminate clutter of arrows.
A Hybrid Diagram/Form Display of a Neighborhood
[Figure: hybrid display layout with panels labeled Parents, Focus Concept, Synonyms, Relationships, and Children.]
Desirable Information Beyond Neighborhoods
Concept definition for Focus Concept
Sources for concepts and relationships
Assigned Semantic Types of concepts
Definitions of relevant Semantic Types
Global view of the Semantic Network
  Indented (better for wide branches)
  Graphical (better for almost everything else)
Current State of the NAT: Serving the Auditor
The Neighborhood Auditing Tool has been implemented to fully support display of neighborhoods.
Navigation to adjacent neighboring concepts is an easy click.
Additional features listed before have been implemented.
Demonstration of NAT Features
Neighborhood
Grandparents and grandchildren
Synonyms
Relationships: Concept, Sibling, Term
Focus concept definition
Sources: Concepts, Relationships
Display CUIs
Semantic Type display
Semantic Type definition
Semantic Network (indented)
Semantic Network (diagram)
Navigation
Search (full, partial)
Viewing History
Choice of release
Choice of sources
Audit Example: A Cycle of Three Concepts
An SQL query found three concepts that participate in a PAR/CHD cycle.
We follow an auditor’s review of this cycle.
O. Bodenreider. Circular hierarchical relationships in the UMLS: etiology, diagnosis, treatment, complications and prevention. Proc AMIA Symp. 2001:57-61.
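The slides do not show the query itself. As a rough sketch of how such a query could look, the example below self-joins a hypothetical `par_chd(parent, child)` table three times in an in-memory SQLite database. The table layout, function name, and placeholder concepts are assumptions, not the actual Metathesaurus schema or the authors' query.

```python
import sqlite3

def find_three_cycles(edges):
    """Return (x, y, z) triples where x is a parent of y, y of z, and z of x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE par_chd (parent TEXT, child TEXT)")
    conn.executemany("INSERT INTO par_chd VALUES (?, ?)", edges)
    query = """
        SELECT a.parent, b.parent, c.parent
        FROM par_chd a
        JOIN par_chd b ON b.parent = a.child
        JOIN par_chd c ON c.parent = b.child
        WHERE c.child = a.parent        -- the third step closes the cycle
          AND a.parent < b.parent       -- report each cycle once, starting
          AND a.parent < c.parent       -- from its smallest concept
          AND b.parent <> c.parent      -- three distinct concepts
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

# Placeholder data: A > B > C > A forms a cycle; A > D does not.
cycles = find_three_cycles([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")])
```

The ordering conditions in the WHERE clause keep the three rotations of the same cycle from being reported separately.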
The Cycle of Three Concepts
Mood Disorders
Affective Disorders, Psychotic
Bipolar Disorder
Relationship Sources: Medical Subject Headings National Drug File Reference Terminology SNOMED-2 Alcohol and Other Drug Thesaurus
Relationship Sources: Medical Subject Headings National Drug File Reference Terminology
Relationship Source: DSM-IV
Relationship Sources: DSM-IV and many others
Recommended Modeling
Mood Disorders
Affective Disorders, Psychotic
Bipolar Disorder
Audit Example: Semantic Types
An algorithm determined that the concept Antonospora locustae was likely assigned incorrect semantic types.
We follow an auditor’s review of this concept.
Preliminary Evaluation Study with NAT
Compare paper-based auditing and NAT-based auditing.
Counterbalanced groups.
Recall improves with NAT use. Auditors seem willing to investigate more concepts.
Precision stays the same. Auditors’ mental process does not improve.
Conclusions
Preliminary study showed that people are more successful finding errors with NAT than with paper sources.
Recall improved with the NAT, precision did not.
NAT seems to nicely complement use of the UMLSKS.
Future Work
Integration of algorithms for developing “audit sets” with NAT.
Recording and reporting auditor recommendations.
Facilitate team auditing where several auditors review the same sample.
Managing and reporting work flow of auditor teams.
The Neighborhood Auditing Tool is available online at:
http://nat.njit.edu
Preliminary Evaluation Study

Auditor   Errors              Recall              Precision           F
          with NAT  w/o NAT   with NAT  w/o NAT   with NAT  w/o NAT   with NAT  w/o NAT
1         57        45        0.97      0.82      0.53      0.51      0.86      0.63
2         22        20        0.43      0.35      0.55      0.55      0.48      0.43
3         39        34        0.64      0.58      0.46      0.53      0.54      0.55
4         56        44        0.55      0.54      0.30      0.34      0.39      0.42
Avg.      44        36        0.65      0.57      0.46      0.48      0.57      0.51
Improved Recall
The auditor finds it easy to search for more errors in the neighborhood of the suspicious concept.
With better recall and the same precision you still find more errors.
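The slides do not define the measures, so the sketch below assumes the usual definitions: precision is the share of reported errors that are real, recall is the share of real errors that were reported, and F is the balanced harmonic mean of the two. The counts in the example are hypothetical.

```python
def precision(true_positives, reported):
    """Fraction of reported errors that are real errors."""
    return true_positives / reported if reported else 0.0

def recall(true_positives, actual):
    """Fraction of real errors that the auditor reported."""
    return true_positives / actual if actual else 0.0

def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical sample containing 20 real errors.
# Same precision (0.5) in both cases, but higher recall in the second.
before = (precision(5, 10), recall(5, 20))   # 5 real errors found
after = (precision(8, 16), recall(8, 20))    # 8 real errors found
f_before = f_measure(*before)
f_after = f_measure(*after)
```

With precision held at 0.5, raising recall from 0.25 to 0.4 means three more real errors are found in this made-up sample. Under the same assumption about F, the rounded values listed for auditor 2 with NAT (precision 0.55, recall 0.43) give about 0.48.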
Semantic Types Example
The concept Antonospora locustae was selected for audit by an algorithm that found it was the only concept assigned to the intersection Fungus + Invertebrate in the UMLS 2007AA.
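The selection criterion can be illustrated with a small sketch that groups concepts by their exact combination of semantic types and reports combinations with very few concepts. The slides give only the criterion, not the algorithm, so the function name, the `max_size` parameter, and the extra placeholder concepts are assumptions.

```python
from collections import defaultdict

def small_intersections(sty_of, max_size=1):
    """Return combinations of two or more semantic types that are assigned
    to at most max_size concepts, with the concepts concerned."""
    extent = defaultdict(list)
    for concept, stys in sty_of.items():
        extent[frozenset(stys)].append(concept)
    return {
        " + ".join(sorted(combo)): sorted(concepts)
        for combo, concepts in extent.items()
        if len(combo) > 1 and len(concepts) <= max_size
    }

sty_of = {
    "Antonospora locustae": {"Fungus", "Invertebrate"},
    "Antonospora": {"Invertebrate"},
    "placeholder fungus 1": {"Fungus"},   # hypothetical concept
    "placeholder fungus 2": {"Fungus"},   # hypothetical concept
}
suspects = small_intersections(sty_of)
```

A concept that is alone in an intersection is only a candidate for review; the auditor still decides whether either semantic type is wrong.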
NAT Features Demonstration
Neighborhood
Cycle Example
An SQL query provided us with a list of concepts in the Metathesaurus that participate in cycles of length three.
One of these cycles exists among the concepts Bipolar Disorder, Mood Disorders, and Affective Disorders, Psychotic.