Codifying Semantic Information in Medical Questions Using Lexical Sources Paul E. Pancoast Arthur B....


Codifying Semantic Information in Medical Questions Using Lexical Sources

Paul E. Pancoast

Arthur B. Smith

Chi-Ren Shyu

Research Purpose

To find a method for classifying medical questions that are asked by clinicians

Hypothesis - Simply indexing by keywords isn’t enough to distinguish questions with different meanings but similar wording, or to group questions with similar meanings but different words.

Definitions

Semantic Information: the meaning of the words

Syntactic Information: the parts of speech of the words (word type, sentence part)

Medical Questions: a question asked by a clinician

Lexical Sources: sources of words and vocabularies

UMLS: Unified Medical Language System

UMLS

Ambitious project of the National Library of Medicine, begun in 1986

Help researchers retrieve and integrate electronic biomedical information from a variety of sources

Links over 100 controlled vocabularies

Assigns unique identifiers to medical concepts and strings

Maps the hierarchical relationships between the medical concepts

Why Bother? (To classify medical questions?)

Clinicians have questions when treating patients

Researchers have gathered collections of these questions

No good method exists to classify the questions
  - How many times has a particular question been asked?
  - Which questions should receive priority for evidence-based answers?

Examples

What is the best way to treat acute pharyngitis?

How should I approach a patient with a sore throat?

What should I do with a patient with diabetes and insulin resistance?

What should I do with a patient with diabetes who is resistant to taking insulin?
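The four example questions can be used to illustrate the hypothesis. The following is a minimal sketch, not the method used in the study: it indexes each question by its keywords and measures overlap with a Jaccard score. The stop list and the choice of Jaccard overlap are assumptions made only for this illustration.

```python
import re

# Hypothetical stop list, chosen only for this illustration.
STOP_WORDS = {"what", "is", "the", "way", "to", "how", "should", "i",
              "a", "with", "do", "and", "who"}

def keywords(question):
    """Lower-case the question, keep alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z]+", question.lower())
    return {w for w in words if w not in STOP_WORDS}

def jaccard(q1, q2):
    """Share of keywords that two questions have in common."""
    a, b = keywords(q1), keywords(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Similar meaning, different words.
similar_meaning = jaccard(
    "What is the best way to treat acute pharyngitis?",
    "How should I approach a patient with a sore throat?",
)

# Different meaning, similar wording.
different_meaning = jaccard(
    "What should I do with a patient with diabetes and insulin resistance?",
    "What should I do with a patient with diabetes who is resistant to taking insulin?",
)

print(similar_meaning, different_meaning)
```

Under these assumptions the pair with the same clinical meaning shares no keywords, while the pair with different meanings shares half of them, which is the opposite of what a classifier needs.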

Methods: Source Questions

American researcher: observed clinicians at work

British researchers: questions sent in by clinicians, answered by researchers

Australian researchers: questions sent in by clinicians, answered by researchers

4083 total questions

Methods: Source Vocabulary

MRCON: a table from the Metathesaurus
  - Lists the medical concepts by unique identifiers (CUI) and each string associated with a concept
  - unique (string => 1 concept)
  - ambiguous (string => 2+ concepts)
    - COLD: ambient temperature, viral respiratory infection, chronic obstructive lung disease
  - 2,247,454 strings associated with concepts

Non-medical Lexicon: from Roget’s Thesaurus
  - Query objects (why, when, how), identifiers (I, you, he), modifiers (soon, frequently)
  - 749 terms in this lexicon
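The unique/ambiguous split can be sketched as a grouping of concept identifiers by string. This is only an illustration: the rows below are hand-written, the identifiers are placeholders rather than real UMLS CUIs, and the actual MRCON file layout is not reproduced here.

```python
from collections import defaultdict

# Hypothetical (identifier, string) rows; CUI_1 etc. are placeholders, not real CUIs.
ROWS = [
    ("CUI_1", "cold"),         # ambient temperature
    ("CUI_2", "cold"),         # viral respiratory infection
    ("CUI_3", "cold"),         # chronic obstructive lung disease
    ("CUI_4", "pharyngitis"),
]

def split_by_ambiguity(rows):
    """Return (unique, ambiguous) sets of strings based on concepts per string."""
    concepts = defaultdict(set)
    for cui, string in rows:
        concepts[string.lower()].add(cui)
    unique = {s for s, cuis in concepts.items() if len(cuis) == 1}
    ambiguous = {s for s, cuis in concepts.items() if len(cuis) >= 2}
    return unique, ambiguous

unique_strings, ambiguous_strings = split_by_ambiguity(ROWS)
print(unique_strings, ambiguous_strings)
```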

String Matching

Parsing program (written in C)
  - Separates individual questions into 3-word, 2-word, 1-word windows
  - Matches the window against MRCON and our lexicon

Generates a report of:
  - Total number of words parsed
  - Number of matches from unique, ambiguous, non-medical lists
  - Strings that didn’t match any of the lists
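The windowing step can be sketched as follows. The original program was written in C and its exact matching order is not described, so this Python version is an assumption: it moves left to right and tries the longest window first. The function name `match_windows` and the tiny lexicons are hypothetical.

```python
import re

def match_windows(question, lexicons, max_window=3):
    """Greedy longest-first matching of word windows against named lexicons.

    lexicons maps a list name to a set of lower-case strings.
    Returns (matches, unmatched): matches is a list of (string, list name),
    unmatched is a list of single words that matched no list.
    """
    words = re.findall(r"[a-z0-9]+", question.lower())
    matches, unmatched = [], []
    i = 0
    while i < len(words):
        # Try a 3-word window, then 2 words, then 1 word.
        for size in range(min(max_window, len(words) - i), 0, -1):
            window = " ".join(words[i:i + size])
            source = next((name for name, terms in lexicons.items()
                           if window in terms), None)
            if source is not None:
                matches.append((window, source))
                i += size
                break
        else:
            # No window starting at this word matched any list.
            unmatched.append(words[i])
            i += 1
    return matches, unmatched

# Hypothetical miniature lexicons standing in for MRCON and the non-medical list.
LEXICONS = {
    "unique": {"sore throat", "patient"},
    "ambiguous": {"cold"},
    "non-medical": {"how", "should", "i"},
}

matches, unmatched = match_windows(
    "How should I approach a patient with a sore throat?", LEXICONS)
print(matches)
print(unmatched)
```

Trying the longest window first lets "sore throat" match as one string instead of two unrelated words.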

Results

String: individual word or words that matched
Hits: how often the string was found
Words: total number of matching words (some strings have more than one word in them)

                   Strings   Hits     Words    % match
MRCON Unique       4,534     24,844   30,186   42.3%
MRCON Ambiguous    574       9,256    9,769    13.7%
Non-medical        208       16,768   17,783   24.9%
Unmatched          2,321     -        13,624   19.1%
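The "% match" column appears to be each row's word count divided by the total number of words parsed. The short check below recomputes the percentages from the word counts in the table; treating the unmatched figure of 13,624 as a word count is an assumption.

```python
# Word counts per list, taken from the results table.
WORDS = {
    "MRCON Unique": 30186,
    "MRCON Ambiguous": 9769,
    "Non-medical": 17783,
    "Unmatched": 13624,  # assumed to be a word count
}

total_words = sum(WORDS.values())
percent = {name: round(100 * count / total_words, 1)
           for name, count in WORDS.items()}

print(total_words)
for name, value in percent.items():
    print(f"{name}: {value}%")
```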

Results

100 strings occurred 7850 times, or 57.6% of the total matches

712 strings => 3+ hits, 85% of all hits

Our focus was on strings that didn’t match one of the source vocabularies
  - 19.1% didn’t match
  - Hypothesis that additional terms not found in MRCON will be important for indexing

Results

Unmatched words: 2+ occurrences

                Unique words   Total Number   Percent
Verb            261            3676           31.7%
Noun            186            2356           20.3%
Preposition     9              2544           21.9%
Adj/Adv/Conj    103            1095           9.5%
Mix *           72             810            7.0%
Pronoun         10             614            5.3%
Integer         70             502            4.3%

* Can be more than one word type, depending on the context. For example, "attacks", "step", and "process" can all be nouns or verbs.

Discussion

MRCON: selected because of low rate of ambiguous string-CUI combinations
  - 89% unique string matches
  - 11% ambiguous string matches

Other tables have greater word coverage, but have more ambiguity for each of the words

Discussion

Our word-matching results were similar to other researchers
  - Cimino matched 43% of words with Meta-1 (we had 56% MRCON matches). Computers & Biomedical Research. Aug 1992;25(4):366-373.
  - Hersh matched 60% of words to medical terminology & names dictionary (we had 79% combined lexicon matches). Proceedings/AMIA Annual Fall Symposium. p. 1997.

Discussion

Stop words: commonly removed by most normalization tools. Prepositions, conjunctions, pronouns

Provide valuable contextual information.
  - Blood FOR an HIV-positive patient
  - Blood FROM an HIV-positive patient
  - Aspirin AND warfarin
  - Aspirin OR warfarin
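The effect of stop-word removal on these phrases can be shown with a small sketch. The stop list and the `index_terms` helper are hypothetical and do not represent any particular normalization tool.

```python
import re

# Hypothetical stop list for illustration only.
STOP_WORDS = {"for", "from", "and", "or", "a", "an", "the"}

def index_terms(phrase, drop_stop_words=True):
    """Tokenize a phrase, optionally discarding stop words."""
    words = re.findall(r"[a-z]+", phrase.lower())
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return tuple(words)

blood_for = "Blood FOR an HIV-positive patient"
blood_from = "Blood FROM an HIV-positive patient"

# With stop words removed the two phrases become indistinguishable.
collapsed = index_terms(blood_for) == index_terms(blood_from)

# With stop words kept the difference in meaning is preserved.
kept_apart = (index_terms(blood_for, drop_stop_words=False)
              != index_terms(blood_from, drop_stop_words=False))

print(collapsed, kept_apart)
```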

Discussion

Integers
  - 186 distinct integers or integer word combinations
  - Occurred 647 times
  - Additional modification of concepts
    - Hyperkalemia: 5.3 mEq/L & 8.7 mEq/L
    - Both are hyperkalemia, but the evaluation and management are markedly different

Discussion

Verbs: largest category of unmatched words
  - Include action and relation concepts
  - Non-medical lexicon contained some
    - Treats, attends, increases, lessens, reduce, follows, starts, can, should, is, equal, improve

Verb tense changes the meaning of a question
  - In a patient TAKING antibiotics
  - In a patient who TOOK antibiotics

Discussion

Verbs may be conceptually related to medical concepts
  - Diagnose => Diagnosis
  - Treat => Treatment
  - Evaluate => Evaluation
  - Prescribe => Prescription

In these cases the verb (relationship) is not equivalent to the noun (concept)

Summary

We developed an application to
  - Parse individual words from collections of medical questions
  - Match the words (phrases) with lexical sources, codified by the UMLS

Our results were better than previous investigators (for percentage of matched words)

We still have some work to do…

Related Experiments

We attempted to cluster questions by sequences of semantic types
  - Initial attempts mostly clustered common phrases such as “How should I” and “What is the”

We may repeat this method after discarding ‘stop phrases’

Future Work

Family Practice Inquiries Network (FPIN) has 200 questions that have associated MeSH terms manually assigned by librarians.

We will look at these question-term groups for clustering purposes (with the hypothesis that they will not make distinct clusters).

Future Work

I will work with researchers at NLM to apply MetaMap to medical questions
  - Extract triplets (Medical Concept-Allowable Relation-Medical Concept) from questions.
    - Drug-treats-Disease

Insert the triplets into a vector-space model and look for clusters
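One possible reading of the vector-space step is sketched below. It does not call MetaMap: the triplets are written by hand, and the relation names other than "treats" are hypothetical. Each question becomes a count vector over its triplets, and cosine similarity serves as the closeness measure that a clustering step could use.

```python
import math
from collections import Counter

def to_vector(triplets):
    """Represent a question as counts of (concept, relation, concept) triplets."""
    return Counter(triplets)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hand-written triplets standing in for extracted ones (assumed, not real output).
q1 = to_vector([("Drug", "treats", "Disease")])
q2 = to_vector([("Drug", "treats", "Disease"), ("Drug", "causes", "Finding")])
q3 = to_vector([("Procedure", "diagnoses", "Disease")])

sim_12 = cosine(q1, q2)  # share one triplet
sim_13 = cosine(q1, q3)  # share none

print(sim_12, sim_13)
```

Questions that share triplets score above zero and would tend to fall into the same cluster, while questions with no triplets in common score zero.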

Thank you!

???