Toward Creating a Gold Standard of Drug Indications from FDA Drug Labels
Ritu Khare¹, Jiao Li², Zhiyong Lu¹
¹National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine, NIH
²Institute of Medical Information, Chinese Academy of Medical Sciences
Presentation Order
1. Motivation
2. Materials and Methods
3. Results
4. Discussion
Drug-Disease Treatment Relationships
Which drug(s) are approved for treating which disease(s)?
- Most frequently sought information among clinicians (Ely et al. 2000)
- Among the top 10 most frequent multi-concept queries on PubMed (Dogan et al. 2009)
Applications
- Google Knowledge Graph (quick referencing)
- Training biomedical systems (Lu et al. 2013, Li and Lu 2012)
- Controlling errors in EMRs (Khare et al. 2013)
Drug Indications (e.g., What are the indicated uses of fluoxetine capsules?)
Disease Treatments (e.g., What are the prescribed drugs for hypertension?)
[Figure: diagram linking drugs and diseases (Disease 1, Disease 2, Disease 3)]
Gold Standard Properties
1. Factual
2. Structured and Normalized
3. Specific to a dose form, e.g., ear drop, oral tablet, topical gel, …
Example: ketorolac injection and ketorolac ophthalmic solution have different indications
Existing resources
- DrugBank (University of Alberta), MedicineNet (WebMD), DailyMed (National Library of Medicine / FDA)
- Factual, Specific (Not Structured)
- NDF-RT (U.S. Department of Veterans Affairs), Freebase (Google)
- Factual, Structured (Not Specific)
[Figure: drugs (Drug 1, Drug 2, Drug 3), each identified by an RXCUI, linked to diseases (Disease 1, Disease 2, Disease 3), each identified by a UMLS CUI]
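A minimal sketch of what a factual, structured, dose-form-specific record could look like, assuming each drug is keyed by an RXCUI and each disease by a UMLS CUI as in the figure. The class name, field names, and every identifier value below are hypothetical placeholders, not real RxNorm or UMLS identifiers and not the authors' schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugIndication:
    """One drug-disease pair; all field names are illustrative assumptions."""
    drug_name: str
    dose_form: str      # e.g. "oral tablet", "ophthalmic solution"
    rxcui: str          # normalized drug identifier (placeholder values below)
    disease_name: str
    umls_cui: str       # normalized disease identifier (placeholder values below)


# Placeholder identifiers only: these are NOT real RXCUIs or UMLS CUIs.
records = [
    DrugIndication("Drug 1", "oral tablet", "RXCUI-1", "Disease 1", "CUI-1"),
    DrugIndication("Drug 1", "oral tablet", "RXCUI-1", "Disease 2", "CUI-2"),
    DrugIndication("Drug 2", "ophthalmic solution", "RXCUI-2", "Disease 3", "CUI-3"),
]


def indications_of(rxcui, records):
    """Drug indications: which diseases is this drug indicated for?"""
    return sorted({r.umls_cui for r in records if r.rxcui == rxcui})


def treatments_for(umls_cui, records):
    """Disease treatments: which drugs are indicated for this disease?"""
    return sorted({r.rxcui for r in records if r.umls_cui == umls_cui})


drug1_indications = indications_of("RXCUI-1", records)
disease3_treatments = treatments_for("CUI-3", records)
```

Because both sides are normalized identifiers, the same records can answer both question types (drug indications and disease treatments) without any text matching.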
DailyMed: The Drug Indication Data Store
Drug database of the National Library of Medicine (NLM)
Most recent drug labels (or package inserts) submitted to the FDA by various pharmaceutical companies
Factual: yes
Dose-form specific: yes
Structured: no
Identify Indications from Drug Labels
The Challenges
Drug Indication Excerpts in DailyMed
d1: Dutasteride capsules are indicated for the treatment of symptomatic benign prostatic hyperplasia. Dutasteride is not approved for the prevention of prostate cancer. (Challenge: contraindication)
d2: Ranitidine is indicated in the treatment of GERD. Concomitant antacids should be given for pain relief to patients with GERD. (Challenge: another drug's indication)
d3: In patients without coronary heart disease, but with multiple risk factors for coronary heart disease such as retinopathy, albuminuria, smoking, or hypertension. (Challenge: risk factors)
Related Studies with DailyMed Indications
1. Neveol and Lu (2010)
SemRep (tool for identifying relationships)
1,263 ingredients
73% accuracy
2. Wei et al. (2013)
Use SIDER2 (based on DailyMed)
1,554 ingredients
67% accuracy
3. Fung et al. (2013)
MetaMap (biomedical concept recognizer)
2,105 Drugs (ingredient + dose form)
77% accuracy (on 295 drugs)
- Biomedical text mining tools (65-80% accuracy)
- Expert annotation
DailyMed: Dataset
Downloaded: August 24, 2012 version
Multiple drug labels exist for the same drug from different manufacturers
- Clustered drug labels using RxNorm identifiers
- Determined a representative drug label for each cluster
Frequently sought drugs
- 303 ingredients are the most frequently sought (80% of access) on PubMed Health (query logs, 2010-2011)
- Top drugs: Clonazepam (Klonopin), Acetaminophen (Tylenol), Azithromycin (Zithromax), …
18,353 human prescription drug labels
→ 2,497 unique drug labels (based on RxNorm identifiers)
→ 504 frequent drug labels
→ 100 drug labels (randomly selected)
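The clustering and sampling steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record fields (`label_id`, `rxcui`, `indications_text`) are hypothetical, and because the slides do not say how the representative label was determined, the "longest indication text" rule is only a stand-in.

```python
import random
from collections import defaultdict


def cluster_by_rxcui(labels):
    """Group drug labels that share the same RxNorm identifier."""
    clusters = defaultdict(list)
    for label in labels:
        clusters[label["rxcui"]].append(label)
    return dict(clusters)


def pick_representative(cluster):
    # Assumption: the selection criterion is not stated in the slides;
    # the label with the longest indication text is used as a stand-in.
    return max(cluster, key=lambda label: len(label["indications_text"]))


def sample_labels(representatives, k, seed=0):
    """Randomly select k labels; the seed only makes the sketch reproducible."""
    rng = random.Random(seed)
    return rng.sample(representatives, k)


# Toy data with placeholder identifiers (not real RXCUIs).
labels = [
    {"label_id": "A1", "rxcui": "RXCUI-1", "indications_text": "Indicated for disease 1."},
    {"label_id": "A2", "rxcui": "RXCUI-1", "indications_text": "Indicated for disease 1 and disease 2."},
    {"label_id": "B1", "rxcui": "RXCUI-2", "indications_text": "Indicated for disease 3."},
]

clusters = cluster_by_rxcui(labels)
representatives = [pick_representative(c) for c in clusters.values()]
sampled = sample_labels(representatives, k=1)
```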
Mining Indications from Drug Labels
1. Automatically identify the candidate indications from drug labels
2. Display the drug labels with pre-computed candidate indications, i.e., preannotations (Neveol et al. 2011), on an annotation interface
3. Two expert annotators accept/reject the preannotations
- Educational background: medical and library sciences
- Training: biomedical literature indexing
Method: Identify Disease Mentions from Drug Labels
UMLS-based disease lexicon
- Seed concepts: UMLS CUIs
- Vocabulary: MeSH, SNOMED-CT
- Semantic types: 12 types belonging to the "Disorder" semantic group
- Terms removed: acronyms, abbreviations, fully specified names, and stop words
- Terms included: all English-language non-suppressed synonyms and their normalized strings (NLM's normalization tool NORM)
Extracting disease mentions from drug labels
- Text tokenized into token sequences of length 1 to 6
- All token sequences and their normalized versions matched against lexicon terms
- Overlapping mentions, e.g., "arthritis" and "rheumatoid arthritis," resolved by choosing the more specific (longer) match
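The extraction step above can be sketched as a dictionary lookup over token sequences with longest-match overlap resolution. This is a simplified illustration, assuming a toy lexicon with placeholder CUIs; the `normalize` function is a crude lowercase-and-strip stand-in and does not reproduce NLM's NORM tool.

```python
import re

MAX_SPAN = 6  # token sequences of length 1 to 6, as on the slide


def normalize(text):
    """Crude stand-in for NORM (assumption): lowercase, keep alphanumerics."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_disease_mentions(text, lexicon):
    """Return (start, end, mention, cui) tuples over token positions."""
    tokens = re.findall(r"\w+", text)
    candidates = []
    for start in range(len(tokens)):
        for length in range(1, MAX_SPAN + 1):
            end = start + length
            if end > len(tokens):
                break
            span = " ".join(tokens[start:end])
            cui = lexicon.get(normalize(span))
            if cui is not None:
                candidates.append((start, end, span, cui))
    # Resolve overlaps by keeping the longer (more specific) match.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    chosen, used = [], set()
    for start, end, span, cui in candidates:
        positions = set(range(start, end))
        if positions & used:
            continue
        used |= positions
        chosen.append((start, end, span, cui))
    return sorted(chosen)


# Toy lexicon: the CUI values are placeholders, not real UMLS identifiers.
lexicon = {
    "arthritis": "CUI-ARTHRITIS",
    "rheumatoid arthritis": "CUI-RA",
    "panic disorder": "CUI-PANIC",
}

text = "Indicated for rheumatoid arthritis and panic disorder."
mentions = [m[2] for m in find_disease_mentions(text, lexicon)]
```

In this toy run, "arthritis" is also found in the lexicon but is dropped because it overlaps the longer match "rheumatoid arthritis".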
Annotation Interface
Annotation Workflow: Two Rounds
Round 1
Pre-annotations = All Disease Mentions
A1 and A2 independently perform annotations
Round 2
Pre-annotations (color-coded) = (i) exclusive judgments (selected by only one annotator) and (ii) round-1 pre-annotations not selected by either annotator
A1 and A2 independently revise their previous annotations
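One way to derive the round-2 display groups from the round-1 results is with set operations. This is a sketch under an assumption: "exclusive judgments" is read here as pre-annotations selected by exactly one of the two annotators. The function name and the CUI values are hypothetical placeholders.

```python
def round2_preannotations(preannotations, selected_a1, selected_a2):
    """Split round-1 pre-annotations into the groups shown in round 2."""
    return {
        # Selected by both annotators in round 1.
        "common": selected_a1 & selected_a2,
        # Exclusive judgments: selected by exactly one annotator (assumed reading).
        "exclusive_a1": selected_a1 - selected_a2,
        "exclusive_a2": selected_a2 - selected_a1,
        # Pre-annotations that neither annotator selected.
        "unselected": preannotations - (selected_a1 | selected_a2),
    }


# Toy round-1 results for one drug label (placeholder CUIs).
preannotations = {"CUI-1", "CUI-2", "CUI-3", "CUI-4"}
selected_a1 = {"CUI-1", "CUI-2"}
selected_a2 = {"CUI-1", "CUI-3"}

groups = round2_preannotations(preannotations, selected_a1, selected_a2)
```

Each group could then be given its own color on the interface so that an annotator can quickly see where the two round-1 judgments diverged.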
Annotation Workflow: Sets and Guidelines
Sets
- 100 drug labels, 50 drug labels at a time
- Set-1 (avg. 126 words/drug label)
- Set-2 (avg. 249 words/drug label)
Preliminary Guidelines
Annotation Order
1. Set-1 round-1, then error analysis and guideline update
2. Set-1 round-2, then error analysis and guideline update
3. Set-2 round-1, then error analysis and guideline update
4. Set-2 round-2, then error analysis and guideline update
Preliminary Annotation Guidelines: Examples
What to Annotate
- Select all types of indications (treatment, relief, prevention, …)
What NOT to Annotate
- Do not select medical procedures
Evaluation
Ground Truth
- Ground truth for the 100-drug-label dataset
- Three study investigators reviewed the drug labels and derived the indicated usages and the UMLS concepts
- Consulted NDF-RT and PubMed Health
- Total: 461 ground truth indications
Evaluation
1. Pre-annotation performance
- Precision, recall
2. Annotator performance
- Common judgments (both annotators agree)
- Joint performance: recall, precision, F1-measure
- Inter-annotator agreement (Jaccard): num_match / (num_match + num_nonmatch)
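The measures listed above can be computed over sets of concept identifiers. A minimal sketch, assuming judgments and ground truth are represented as sets of CUIs per drug label, and that joint performance is scored on the common judgments; the function names and the placeholder CUIs are illustrative only.

```python
def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F1-measure."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def jaccard_agreement(judgments_a1, judgments_a2):
    """num_match / (num_match + num_nonmatch)."""
    num_match = len(judgments_a1 & judgments_a2)
    num_nonmatch = len(judgments_a1 ^ judgments_a2)  # chosen by only one annotator
    total = num_match + num_nonmatch
    return num_match / total if total else 1.0


# Toy example with placeholder CUIs.
gold = {"C1", "C2", "C3", "C4"}
a1 = {"C1", "C2", "C3"}
a2 = {"C1", "C2", "C5"}

common = a1 & a2  # common judgments: both annotators agree
precision, recall, f1 = precision_recall_f1(common, gold)
agreement = jaccard_agreement(a1, a2)
```

In this toy case the common judgments are all correct (precision 1.0) but cover only half of the ground truth (recall 0.5), and the annotators agree on 2 of 4 distinct selections (agreement 0.5).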
Pre-annotation Quality
850 pre-annotations (UMLS CUIs) for 100 drug labels
Precision: 51.88%
Remaining disease mentions:
- Contraindications, e.g., "This drug should not be used for treating type I diabetes."
- Part of an organization's name, e.g., "The Advisory Council for the Elimination of Tuberculosis, the American Thoracic Society, …"
- Characteristics of an indication, e.g., "A major depressive episode implies a prominent and relatively persistent depressed or dysphoric mood that usually interferes with daily functioning"
- Symptoms, organism names, risk factors of …
Recall: 95.67%
Missed cases:
- Natural language challenges, e.g., identifying "skin infections" from "skin and soft tissue infections"
- Limitations of the lexicon: the concepts "tick fever" and "pylori infection" were not included
Judgment (Expert Annotation) Assessment
Number of Drug Labels and Duration
Set      Round-1 # drug labels   Round-2 # drug labels   Avg. total time/annotator
Set-1    50                      22                      124 min
Set-2    50                      28                      173 min
Joint Performance
- Nearly perfect joint precision
- F1-measure improved by 7.5% on average from round-1 to round-2
Inter-annotator agreement
- Set-1: 76.2%
- Set-2: 93.9%
Error Analysis
Set 1
Missed indications
- "Alprazolam is also indicated for the treatment of panic disorder, with or without agoraphobia"
Incorrect judgments
- Selecting symptoms: "Panic disorder is characterized by following symptoms: palpitations, pounding heart, or accelerated heart rate …"
- Selecting indications of other drugs: "Cimetidine hydrochloride injection is indicated for the short term treatment of active duodenal ulcer. Concomitant antacids should be given as needed for relief of pain."
Set 2
Missed indications
- Drug labels were long, up to 800 words
Incorrect judgments
- Selecting species names: "Respiratory tract infections caused by Streptococcus pneumoniae"
- Selecting conditions (e.g., sedation) caused by the drug
Updated Guidelines
What NOT to Annotate
1. Contraindications
2. Indicated usages of another drug
3. Disease mentions that are part of an organization's name
4. Explicitly specified symptoms
5. Species or organism names
6. Medical procedures
7. Characteristics of an indication
8. Risk factors
What to Annotate
1. All indicated usages
2. All types of indications (treat, prevent, manage, relieve, …)
3. Main and associated indications
4. Indication treated by a combination of drugs
5. Efficacy established in clinical trials
Special Cases of Annotation
1. Causing disease
2. Optional indication
3. In patients with a disease
Updated Guidelines: Special Cases (Need Domain Knowledge)
1. Causing Indication
- Hydroxyzine Hydrochloride: Useful in the management of pruritus due to allergic conditions such as chronic urticaria and atopic and contact dermatoses
- Diclofenac Epolamine: Flector Patch is indicated for the topical treatment of acute pain due to minor strains, sprains, and contusions
2. Optional Indication
- Fluoxetine Hydrochloride: Acute treatment of Panic Disorder, with or without agoraphobia, in adult patients
- Alprazolam: Alprazolam is also indicated for the treatment of panic disorder, with or without agoraphobia.
3. In Patients/Adults with a Disease
- Azithromycin: Azithromycin tablet is indicated for the prevention of disseminated Mycobacterium avium complex (MAC) disease in persons with advanced HIV infection.
- Keppra: KEPPRA XR™ is indicated as adjunctive therapy in the treatment of partial onset seizures in patients ≥16 years of age with epilepsy.
Conclusions
Semi-automatic method (NLP + annotation by two experts)
- Toward a factual, structured, specific gold standard
A promising performance, with joint judgments as gold
- Avg. 3 min/drug label by each annotator
- F1-measure = avg. 0.95
First study involving annotation of drug indications
- Specific and detailed indication annotation guidelines
- What to Annotate, What Not to …, Special Cases
Challenges
- About half of the disease mentions (pre-annotations) are not indications
- Long textual drug labels
- Special cases of annotation
Limitations and Future Work
Framework
- Pre-process drug labels for improved presentation and summarization
- Algorithm for preparing pre-annotations needs sophisticated text mining techniques (e.g., MetaMap, NegEx)
Evaluation
- Different pair(s) of annotators
- Compare gold standard with existing resources/studies
- Classification ability of annotated corpus
Current Status
- 534 unique drug labels curated (~7,688 drug labels)
- 272 frequently sought ingredients
Acknowledgments
Grant: Intramural Research Program of the NIH, National Library of Medicine
Two human annotators: Zanmei Li, Yujing Ji
Biomedical Text Mining Group at NCBI: Robert Leaman, Yuqing Mao, Chih-Hsuan Wei