Codifying Semantic Information in Medical Questions Using Lexical Sources
Paul E. Pancoast
Arthur B. Smith
Chi-Ren Shyu
Research Purpose
  To find a method for classifying medical questions that are asked by clinicians
  Hypothesis - Simply indexing by keywords isn’t enough to distinguish questions with different meanings but similar wording, or to group questions with similar meanings but different words.
Definitions
  Semantic Information – the meaning of the words
  Syntactic Information – the parts of speech of the words (word type, sentence part)
  Medical Questions – a question asked by a clinician
  Lexical Sources – sources of words and vocabularies
  UMLS – Unified Medical Language System
UMLS
  Ambitious project of the National Library of Medicine, begun in 1986
  Help researchers retrieve and integrate electronic biomedical information from a variety of sources
  Links over 100 controlled vocabularies
  Assigns unique identifiers to medical concepts and strings
  Maps the hierarchical relationships between the medical concepts
Why Bother? (To classify medical questions?)
  Clinicians have questions when treating patients
  Researchers have gathered collections of these questions
  No good method exists to classify the questions
    How many times has a particular question been asked?
    Which questions should receive priority for evidence-based answers?
Examples
  What is the best way to treat acute pharyngitis?
  How should I approach a patient with a sore throat?
  What should I do with a patient with diabetes and insulin resistance?
  What should I do with a patient with diabetes who is resistant to taking insulin?
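As a rough illustration of the hypothesis, the sketch below scores the four example questions by plain keyword overlap (Jaccard similarity). The stop-word list and the `keywords` and `jaccard` helpers are assumptions made for this example and are not part of the original work.

```python
import re

# Illustrative stop-word list (an assumption, not taken from the slides).
STOP_WORDS = {"what", "is", "the", "to", "how", "should", "i", "a",
              "with", "do", "and", "who"}

def keywords(question):
    """Lowercase the question, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", question.lower())
    return {w for w in words if w not in STOP_WORDS}

def jaccard(a, b):
    """Jaccard overlap between the keyword sets of two questions."""
    ka, kb = keywords(a), keywords(b)
    return len(ka & kb) / len(ka | kb)

q1 = "What is the best way to treat acute pharyngitis?"
q2 = "How should I approach a patient with a sore throat?"
q3 = "What should I do with a patient with diabetes and insulin resistance?"
q4 = "What should I do with a patient with diabetes who is resistant to taking insulin?"

same_meaning = jaccard(q1, q2)       # similar meaning, different words
different_meaning = jaccard(q3, q4)  # different meaning, similar words
print(same_meaning, different_meaning)
```

Under these assumptions the two sore-throat questions share no keywords at all, while the two diabetes questions share half of theirs, which is the opposite of how a clinician would group them.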
Methods: Source Questions
  American researcher – observed clinicians at work
  British researchers – questions sent in by clinicians – answered by researchers
  Australian researchers – questions sent in by clinicians – answered by researchers
  4083 total questions
Methods: Source Vocabulary
  MRCON – a table from the Metathesaurus
    Lists the medical concepts by unique identifiers (CUI) and each string associated with a concept
      unique (string => 1 concept)
      ambiguous (string => 2+ concepts)
        COLD – ambient temperature, viral respiratory infection, chronic obstructive lung disease
    2,247,454 strings associated with concepts
  Non-medical Lexicon – from Roget’s Thesaurus
    Query objects (why, when, how), identifiers (I, you, he), modifiers (soon, frequently)
    749 terms in this lexicon
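The unique/ambiguous split can be sketched as a lookup from string to the set of concepts it names. The rows below are a toy stand-in for MRCON: the CUI values are made-up placeholders rather than real UMLS identifiers, and `build_index` and `split_unique_ambiguous` are hypothetical helper names.

```python
from collections import defaultdict

# Toy (CUI, string) rows standing in for MRCON. The CUI values are
# placeholders invented for this example, not real UMLS identifiers.
ROWS = [
    ("CUI_A", "cold"),               # ambient temperature
    ("CUI_B", "cold"),               # viral respiratory infection
    ("CUI_C", "cold"),               # chronic obstructive lung disease
    ("CUI_D", "pharyngitis"),
    ("CUI_E", "acute pharyngitis"),
]

def build_index(rows):
    """Map each lowercased string to the set of concepts it is linked to."""
    index = defaultdict(set)
    for cui, string in rows:
        index[string.lower()].add(cui)
    return index

def split_unique_ambiguous(index):
    """Unique strings name exactly one concept; ambiguous ones name two or more."""
    unique = {s for s, cuis in index.items() if len(cuis) == 1}
    ambiguous = {s for s, cuis in index.items() if len(cuis) >= 2}
    return unique, ambiguous

index = build_index(ROWS)
unique, ambiguous = split_unique_ambiguous(index)
print(sorted(unique), sorted(ambiguous))
```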
String Matching
  Parsing program (written in C)
  Separates individual questions into 3-word, 2-word, 1-word windows
  Matches the window against MRCON and our lexicon
  Generates a report of:
    Total number of words parsed
    Number of matches from unique, ambiguous, non-medical lists
    Strings that didn’t match any of the lists
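A minimal sketch of windowed matching, written in Python rather than the C of the original program. The slides do not say how overlapping windows were resolved, so the greedy, longest-window-first strategy here is an assumption, as are the function name `match_question` and the tiny term lists in the usage example.

```python
import re
from collections import Counter

def match_question(question, unique, ambiguous, non_medical):
    """Match 3-, 2-, then 1-word windows against three term lists.

    Assumption: matching is greedy and left to right, trying the longest
    window first and skipping past any window that matches.
    """
    words = re.findall(r"[a-z0-9'-]+", question.lower())
    lists = (("unique", unique), ("ambiguous", ambiguous),
             ("non_medical", non_medical))
    hits = Counter()
    unmatched = []
    i = 0
    while i < len(words):
        for size in (3, 2, 1):
            if i + size > len(words):
                continue  # window would run past the end of the question
            window = " ".join(words[i:i + size])
            label = next((name for name, terms in lists if window in terms), None)
            if label is not None:
                hits[label] += 1
                i += size
                break
        else:
            # No window starting at this word matched any list.
            unmatched.append(words[i])
            i += 1
    return {"words_parsed": len(words), "hits": dict(hits), "unmatched": unmatched}

report = match_question(
    "What is the best way to treat acute pharyngitis?",
    unique={"acute pharyngitis"},
    ambiguous={"cold"},
    non_medical={"what", "is", "the", "way"},
)
print(report)
```

With these toy lists the report counts nine words parsed, one unique hit (the two-word string "acute pharyngitis"), four non-medical hits, and leaves "best", "to", and "treat" unmatched.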
Results
  String – individual word or words that matched
  Hits – how often the string was found
  Words – total number of matching words (some strings have more than one word in them)
                      Strings     Hits    Words   % match
  MRCON Unique          4,534   24,844   30,186     42.3%
  MRCON Ambiguous         574    9,256    9,769     13.7%
  Non-medical             208   16,768   17,783     24.9%
  Unmatched             2,321            13,624     19.1%
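The "% match" column appears to be each row's word count divided by the total word count over all four rows. The short check below reproduces the reported percentages under that reading; the variable names are illustrative only.

```python
# Word counts per category, as reported in the results table.
word_counts = {
    "MRCON unique": 30186,
    "MRCON ambiguous": 9769,
    "Non-medical": 17783,
    "Unmatched": 13624,
}

total_words = sum(word_counts.values())

# Share of all parsed words falling in each category, to one decimal place.
percent = {name: round(100 * count / total_words, 1)
           for name, count in word_counts.items()}

# Combined share of words matched by MRCON (unique plus ambiguous).
mrcon_combined = round(
    100 * (word_counts["MRCON unique"] + word_counts["MRCON ambiguous"]) / total_words, 1
)

print(total_words, percent, mrcon_combined)
```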
Results
  100 strings occurred 7850 times – or 57.6% of the total matches
  712 strings => 3+ hits, 85% of all hits
  Our focus was on strings that didn’t match one of the source vocabularies
    19.1% didn’t match
    Hypothesis that additional terms not found in MRCON will be important for indexing
Results
  Unmatched words – 2+ occurrences
                   Unique words   Total Number   Percent
  Verb                      261           3676     31.7%
  Noun                      186           2356     20.3%
  Preposition                 9           2544     21.9%
  Adj/Adv/Conj              103           1095      9.5%
  Mix *                      72            810      7.0%
  Pronoun                    10            614      5.3%
  Integer                    70            502      4.3%
  * can be more than one word type, depending on the context. Attacks, step, process all can be nouns or verbs
Discussion
  MRCON – selected because of low rate of ambiguous string-CUI combinations
    89% unique string matches
    11% ambiguous string matches
  Other tables have greater word coverage, but have more ambiguity for each of the words
Discussion
  Our word-matching results were similar to those of other researchers
    Cimino matched 43% of words with Meta-1 (we had 56% MRCON matches)
      Computers & Biomedical Research. Aug 1992;25(4):366-373.
    Hersh matched 60% of words to medical terminology & names dictionary (we had 79% combined lexicon matches)
      Proceedings/AMIA Annual Fall Symposium. p. 1997.
Discussion
  Stop words – commonly removed by most normalization tools. Prepositions, conjunctions, pronouns
  Provide valuable contextual information.
    Blood FOR an HIV-positive patient
    Blood FROM an HIV-positive patient
    Aspirin AND warfarin
    Aspirin OR warfarin
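To see how much is lost, the sketch below strips a small, illustrative stop-word list (an assumption, not the list used by any particular normalization tool) from the example phrases. The helper name `strip_stop_words` is hypothetical.

```python
# Illustrative stop-word list; real normalization tools use longer lists.
STOP_WORDS = {"for", "from", "and", "or", "a", "an", "the"}

def strip_stop_words(phrase):
    """Lowercase the phrase and drop any word found in STOP_WORDS."""
    return [w for w in phrase.lower().split() if w not in STOP_WORDS]

blood_for = strip_stop_words("Blood FOR an HIV-positive patient")
blood_from = strip_stop_words("Blood FROM an HIV-positive patient")
drug_and = strip_stop_words("Aspirin AND warfarin")
drug_or = strip_stop_words("Aspirin OR warfarin")

print(blood_for, blood_from)
print(drug_and, drug_or)
```

After stripping, each pair collapses to the same word list, so an index built on the remaining words can no longer tell the two questions in a pair apart.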
Discussion
  Integers
    186 distinct integers or integer word combinations
    Occurred 647 times
    Additional modification of concepts
      Hyperkalemia – 5.3 mEq/L & 8.7 mEq/L
      Both are hyperkalemia, but the evaluation and management are markedly different
Discussion
  Verbs – largest category of unmatched words
    Include action and relation concepts
    Non-medical lexicon contained some
      Treats, attends, increases, lessens, reduce, follows, starts, can, should, is, equal, improve
  Verb tense changes the meaning of a question
    In a patient TAKING antibiotics
    In a patient who TOOK antibiotics
Discussion
  Verbs may be conceptually related to medical concepts
    Diagnose => Diagnosis
    Treat => Treatment
    Evaluate => Evaluation
    Prescribe => Prescription
  In these cases the verb (relationship) is not equivalent to the noun (concept)
Summary
  We developed an application to
    Parse individual words from collections of medical questions
    Match the words (phrases) with lexical sources, codified by the UMLS
  Our results were better than those of previous investigators (for percentage of matched words)
  We still have some work to do…
Related Experiments
  We attempted to cluster questions by sequences of semantic types
  Initial attempts mostly clustered common phrases such as “How should I” and “What is the”
  We may repeat this method after discarding ‘stop phrases’
Future Work
  Family Practice Inquiries Network (FPIN) has 200 questions that have associated MeSH terms manually assigned by librarians.
  We will look at these question-term groups for clustering purposes (with the hypothesis that they will not make distinct clusters).
Future Work
  I will work with researchers at NLM to apply MetaMap to medical questions
  Extract triplets (Medical Concept-Allowable Relation-Medical Concept) from questions.
    Drug-treats-Disease
  Insert the triplets into a vector-space model and look for clusters
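One possible shape for the vector-space step, sketched under stated assumptions: each question is represented as a bag of triplets and compared by cosine similarity. The triplets are typed by hand here (MetaMap is not called), the Procedure-diagnoses-Disease triplet is a made-up example, and `to_vector` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter

def to_vector(triplets):
    """Represent a question as a sparse count vector over its triplets."""
    return Counter(triplets)

def cosine(u, v):
    """Cosine similarity between two sparse triplet-count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hand-written triplets standing in for the output of a concept extractor.
treat_q = to_vector([("Drug", "treats", "Disease")])
mixed_q = to_vector([("Drug", "treats", "Disease"),
                     ("Procedure", "diagnoses", "Disease")])
diagnose_q = to_vector([("Procedure", "diagnoses", "Disease")])

print(cosine(treat_q, mixed_q), cosine(treat_q, diagnose_q))
```

Questions that share a triplet score above zero and questions with no triplet in common score zero, so any standard clustering method could then be run over the pairwise similarities.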