Restrict to MeSH algorithm - National Institutes of Health

Evaluation of the Restrict to MeSH algorithm
Student: Michael Bales
Mentor: Olivier Bodenreider
NLM Summer Research Rotation


Background

MEDLINE
MeSH
NLM Indexing Initiative
UMLS Metathesaurus
  Collection of terminological systems
  1.2 million concepts, each assigned a CUI

[Diagram labels: Medical text, Noun phrase, UMLS, MeSH descriptor]

Example
UMLS CUI
  C0202022: Primary malignant neoplasm of left lower lobe of lung
MeSH main headings
  D001984: Bronchial neoplasms
  D008175: Lung neoplasms

Restrict to MeSH: mapping CUIs to MeSH entities

Restrict to MeSH

Based on the principle of semantic locality
Four techniques
  Use synonymy
  Use associated expressions (ATXs)
  Explore the ancestors
  Explore the other related concepts

Use synonymy

Term mapped to source concept
For this concept, is there a synonym term that comes from MeSH?

Example
UMLS CUI C0002006: Aldosterone
MeSH main heading D000450: Aldosterone
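The synonymy check can be pictured as a lookup over the terms attached to one concept. The sketch below is only an illustration: the `TERMS` table and the `mesh_synonyms` helper are hypothetical, the source label "OTHER" is a placeholder, and only the Aldosterone identifiers come from the slide.

```python
# Hypothetical in-memory view of concept terms: (CUI, source vocabulary, code, term).
# "MSH" is used here as the label for terms contributed by MeSH; "OTHER" is a placeholder.
TERMS = [
    ("C0002006", "MSH", "D000450", "Aldosterone"),
    ("C0002006", "OTHER", "placeholder-code", "Aldosterone"),
    ("C0202022", "OTHER", "placeholder-code",
     "Primary malignant neoplasm of left lower lobe of lung"),
]

def mesh_synonyms(cui, terms=TERMS):
    """Return (MeSH code, term) pairs for the terms of this concept that come from MeSH."""
    return [(code, name) for c, source, code, name in terms
            if c == cui and source == "MSH"]

print(mesh_synonyms("C0002006"))  # the concept has a term from MeSH
print(mesh_synonyms("C0202022"))  # no MeSH term, so another technique is needed
```

An empty result is the signal to move on to one of the other three techniques.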

Use associated expressions

Is there an associated expression (ATX) that describes this concept using a combination of MeSH main headings?
ATXs correspond to ICD terms

Example
Endoscopic removal of intraluminal foreign body from oesophagus without incision
  Foreign Bodies AND Esophagus/surgery (MH/SH)
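One way to hold an associated expression in memory is as a list of main heading and optional subheading pairs combined with AND. This is a sketch under assumptions: the `MeshTerm` type, the `ATX` table, and the `render` helper are hypothetical names, and reading "MH/SH" as Esophagus (main heading) with surgery (subheading) is an interpretation of the slide.

```python
from typing import NamedTuple, Optional

class MeshTerm(NamedTuple):
    """A MeSH main heading (MH) with an optional subheading (SH)."""
    main_heading: str
    subheading: Optional[str] = None

# Hypothetical ATX table: concept name -> MeSH terms combined with AND.
ATX = {
    "Endoscopic removal of intraluminal foreign body from oesophagus without incision": [
        MeshTerm("Foreign Bodies"),
        MeshTerm("Esophagus", "surgery"),
    ],
}

def render(terms):
    """Format a list of MeshTerm values as 'MH AND MH/SH'."""
    parts = [t.main_heading if t.subheading is None
             else f"{t.main_heading}/{t.subheading}" for t in terms]
    return " AND ".join(parts)

name = "Endoscopic removal of intraluminal foreign body from oesophagus without incision"
print(render(ATX.get(name, [])))
```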

Use ancestors

Build the graph of the ancestors of the concept
  using parents and broader concepts
  all the way to the top
  exclude ancestors with incompatible semantic type
From the graph, select the concepts that come from MeSH
Remove those that are ancestors of another concept coming from MeSH
Also try children or siblings as seed

Example
UMLS CUI: Giant cell sarcoma
MeSH main heading: Sarcoma
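The ancestor steps above can be sketched as a small graph traversal. Everything in the example is an assumption made for illustration: the toy hierarchy, the semantic type labels, the set of MeSH concepts, and the choice to stop climbing at an incompatible ancestor. Seeding from children or siblings is not shown.

```python
# Toy hierarchy (illustrative only): concept -> parents / broader concepts.
PARENTS = {
    "Giant cell sarcoma": ["Sarcoma"],
    "Sarcoma": ["Neoplasms"],
    "Neoplasms": [],
}
# Hypothetical semantic types and MeSH membership for the toy concepts.
SEMANTIC_TYPE = {name: "Neoplastic Process" for name in PARENTS}
FROM_MESH = {"Sarcoma", "Neoplasms"}

def ancestors(seed, parents, compatible=None):
    """Collect all ancestors of seed, pruning any that fail the compatibility test."""
    seen = set()
    stack = list(parents.get(seed, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        if compatible is not None and not compatible(seed, node):
            continue  # design choice in this sketch: do not climb past it either
        seen.add(node)
        stack.extend(parents.get(node, []))
    return seen

def map_via_ancestors(seed, parents, from_mesh, compatible):
    """Keep the MeSH ancestors that are not ancestors of another MeSH ancestor."""
    candidates = ancestors(seed, parents, compatible) & from_mesh
    redundant = set()
    for concept in candidates:
        redundant |= ancestors(concept, parents) & candidates
    return candidates - redundant

same_type = lambda a, b: SEMANTIC_TYPE[a] == SEMANTIC_TYPE[b]
print(map_via_ancestors("Giant cell sarcoma", PARENTS, FROM_MESH, same_type))
```

In the toy data both Sarcoma and Neoplasms come from MeSH, but Neoplasms is an ancestor of Sarcoma, so only the closer heading is kept.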

Use other related concepts

Explore the other related concepts
Exclude incompatible semantic types
From those, select the concepts that come from MeSH

Example
UMLS CUI: Nicotinic Acid 0.15 MG / Riboflavin 0.02 MG / Thiamine 0.06 MG Oral Tablet
MeSH main headings: Niacin, Riboflavin, Thiamine, Tablets
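The related-concept technique amounts to two filters applied in turn. The sketch below assumes a hypothetical `RELATED` table, hypothetical semantic group assignments, and one invented incompatible neighbor ("Example manufacturer") added only to show the exclusion step.

```python
SOURCE = "Nicotinic Acid 0.15 MG / Riboflavin 0.02 MG / Thiamine 0.06 MG Oral Tablet"

# Hypothetical related concepts for the source concept (illustrative data).
RELATED = {
    SOURCE: ["Niacin", "Riboflavin", "Thiamine", "Tablets", "Example manufacturer"],
}
# Hypothetical semantic group per concept.
GROUP = {
    SOURCE: "Chemicals & Drugs",
    "Niacin": "Chemicals & Drugs",
    "Riboflavin": "Chemicals & Drugs",
    "Thiamine": "Chemicals & Drugs",
    "Tablets": "Chemicals & Drugs",
    "Example manufacturer": "Organizations",
}
FROM_MESH = {"Niacin", "Riboflavin", "Thiamine", "Tablets"}

def map_via_related(concept):
    """Drop related concepts in an incompatible group, then keep those from MeSH."""
    compatible = [r for r in RELATED.get(concept, []) if GROUP[r] == GROUP[concept]]
    return [r for r in compatible if r in FROM_MESH]

print(map_via_related(SOURCE))
```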

Methods

Quantitative evaluation (all CUIs)
From three perspectives:
  CUIs
  MeSH main headings
  Mapping method
Data used:
  UMLS Metathesaurus 2006AA
  RtM-suggested mappings

Qualitative evaluation

Assess performance on individual mappings
Random sample of 50 CUIs
Answer a set of questions for each CUI and mapping
Detailed output of RtM
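Drawing the random sample of 50 CUIs could look like the sketch below. The identifier list is a placeholder rather than real Metathesaurus content, and the fixed seed is an assumption added so the draw can be repeated; the slides do not say how the sample was drawn.

```python
import random

# Placeholder identifiers standing in for the full list of CUIs.
all_cuis = [f"C{n:07d}" for n in range(1, 1001)]

rng = random.Random(2006)          # hypothetical seed, for a reproducible draw
sample = rng.sample(all_cuis, 50)  # 50 distinct CUIs, drawn without replacement

print(len(sample), sample[:3])
```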

Quantitative evaluation results
from the perspective of CUIs

84.5% of UMLS CUIs were mapped to at least one MeSH entity

[Chart: Percent of CUIs assigned at least one mapping to MeSH. Categories: Mapped, Not mapped]

From the perspective of CUIs

[Chart: Mappings by Restrict to MeSH method, by semantic group. Vertical axis: 0% to 100%. Semantic groups: All, Chemicals & Drugs, Anatomy, Living Beings, Procedures, Occupations, Physiology, Geographic Areas, Phenomena, Activities & Behaviors, Devices, Disorders, Objects, Genes & Molecular Sequences, Concepts & Ideas, Organizations]

[Chart: Method used for Restrict to MeSH method, by semantic group. Vertical axis: 0% to 100%. Semantic groups: All, Chemicals & Drugs, Anatomy, Living Beings, Procedures, Occupations, Physiology, Geographic Areas, Phenomena, Activities & Behaviors, Devices, Disorders, Objects, Genes & Molecular Sequences, Concepts & Ideas, Organizations. Legend: No MeSH term; O (other related term); A (associated expression); I (synonymy); G/S (graph of ancestors, seeded by siblings); G/P (graph of ancestors, seeded by parents); G/C (graph of ancestors, seeded by children). Also shown: relative count of CUIs in semantic group; relative count of MeSH main headings in semantic group]

From the perspective of mapping method

[Chart: Maximum proportional tree depth category, by main heading. Horizontal axis: tree depth category, from 1 (near root) to 5 (near leaves). Vertical axis: percent of main headings in category, 0 to 50. Series: All of MeSH, RtM]

From the perspective of MeSH main headings

[Chart: MeSH main headings suggested by Restrict to MeSH method, by tree (UMLS Metathesaurus 2006AA). Horizontal axis: MeSH tree: A (Anatomy), B (Organisms), C (Diseases), D (Chemicals and Drugs), E (Analytical, Diagnostic, and Therapeutic Techniques and Equipment), F (Psychiatry and Psychology), G (Biological Sciences), H (Physical Sciences), I (Anthropology, Education, Sociology and Social Phenomena), J (Technology and Food and Beverages), K (Humanities), L (Information Science), M (Persons), N (Health Care), V (Publication Characteristics), Z (Geographic Locations). Vertical axis: total MeSH main headings in tree, 0 to 9000. Series: main headings suggested by RtM at least twice; main headings suggested by RtM only once (for MeSH-supplied CUI)]

Qualitative evaluation

Assess the quality of mappings on an individual level

What mapping method is used?
If there is no mapping to MeSH, why not?
Is the mapping in the same semantic neighborhood as the CUI?
  If not, how did this lateral semantic drift occur?
Is the mapping at the same level of specification as the CUI?
  If not, how did this semantic drift occur?
If the graph of ancestors was used, how many tree levels were climbed before selecting the suggested mapping?
For CUIs that mapped to several MeSH entities, what proportion were appropriate mappings (none, some, or all)?

Mapping methods in sample were similar to those used for all CUIs

[Charts: Methods used in random sample; Mapping methods used for all of MeSH. Categories in each chart: A, Not mapped, O, G/P, G/C, G/S]

Qualitative evaluation: results

13 of 50 CUIs were not mapped to MeSH
  Orphans (n=6)
  Mapping via ancestors not possible (n=6)
  Crossing semantic boundary (n=1)
37 CUIs were mapped
  33 of 37 (89%) were in same semantic neighborhood as CUI
  All 37 were more general than the CUI
    Amiodarone overdose -> Overdose
    Giant cell sarcoma -> Sarcoma
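The counts reported on this slide can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are taken from the slide.

```python
sample_size = 50
not_mapped = {
    "Orphans": 6,
    "Mapping via ancestors not possible": 6,
    "Crossing semantic boundary": 1,
}
same_neighborhood = 33

total_not_mapped = sum(not_mapped.values())  # 6 + 6 + 1
mapped = sample_size - total_not_mapped      # CUIs with at least one mapping
percent_same = round(100 * same_neighborhood / mapped)

print(total_not_mapped, mapped, percent_same)  # 13 37 89
```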

Conclusion

RtM already achieves good performance
Minor enhancements will improve method
Rapid growth in biomedical literature
Effective manual and automated indexing methods increasingly needed

Acknowledgements

Olivier Bodenreider
May Cheh
Rob Logan
Tom Rindflesch
NLM Training Grant