Integrating Domain Knowledge to Improve Signal Detection ... · Pharmacovigilance, also known as...

Texas Medical Center LibraryDigitalCommons@The Texas Medical Center

UT SBMI Dissertations (Open Access) School of Biomedical Informatics

Summer 2014

Integrating Domain Knowledge to Improve SignalDetection from Electronic Health Records forPharmacovigilanceNing Shang

Follow this and additional works at: http://digitalcommons.library.tmc.edu/uthshis_dissertations

Part of the Bioinformatics Commons, and the Medicine and Health Sciences Commons

This is brought to you for free and open access by the School of BiomedicalInformatics at DigitalCommons@The Texas Medical Center. It has beenaccepted for inclusion in UT SBMI Dissertations (Open Access) by anauthorized administrator of DigitalCommons@The Texas Medical Center.For more information, please contact [email protected].

Recommended CitationShang, Ning, "Integrating Domain Knowledge to Improve Signal Detection from Electronic Health Records for Pharmacovigilance"(2014). UT SBMI Dissertations (Open Access). Paper 26.

http://digitalcommons.library.tmc.edu?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

http://digitalcommons.library.tmc.edu/uthshis_dissertations?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

http://digitalcommons.library.tmc.edu/uthshis?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

http://digitalcommons.library.tmc.edu/uthshis_dissertations?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

http://network.bepress.com/hgg/discipline/110?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

http://network.bepress.com/hgg/discipline/648?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

http://digitalcommons.library.tmc.edu/uthshis_dissertations/26?utm_source=digitalcommons.library.tmc.edu%2Futhshis_dissertations%2F26&utm_medium=PDF&utm_campaign=PDFCoverPages

mailto:[email protected]

Integrating Domain Knowledge to Improve Signal Detection from Electronic Health Records for Pharmacovigilance

By

Ning Shang, MS

APPROVED:

. HuaX~D

~----Elmer V. Berns tam, MD, MSE, MS

~

Date approved: ___ 7_/_l'f_/_~_o/_~

Integrating Domain Knowledge to Improve Signal Detection from Electronic Health

Records for Pharmacovigilance

A

Dissertation

Presented to the Faculty of

The University of Texas

Health Science Center at Houston

School of Biomedical Informatics

in Partial Fulfilment of the Requirements for the Degree of

Doctor of Philosophy

By

Ning Shang, MS

University of Texas Health Science Center at Houston

2014

Dissertation Committee:

Trevor Cohen, MBChB, PhD1, Chair

Jorge Herskovic, MD, PhD2

Hua Xu, PhD1

Elmer V. Bernstam, MD, MSE, MS1

Peng Wei, PhD3 1The School of Biomedical Informatics 2MD Anderson Cancer Center 3The School of Public Health

Copyright by

Ning Shang

2014

ii

Acknowledgements

I want to thank Dr. Trevor Cohen for being a great mentor. His scientific spirit, serious

attitude and considerate character have taught me to become a thorough scientist and

beyond. I want to thank Dr. Jorge Herskovic for his support and all the enlightening

discussions I had with him. I want to thank Dr. Hua Xu for sharing his research experiences

with me. I want to thank Dr. Elmer V. Bernstam and his team for building and sharing

clinical data repositories, facilitating my research goals. I want to thank Dr. Peng Wei for

providing invaluable advice for selecting the best statistical models. Last but not least, I

want to thank Dr. Parsa Mirhaji for opening the world of pharmacovigilance to me and the

training in out-of-the-box thinking.

I want to thank my parents, my sister, and my fiancée for their unconditional love and

support.

This work was supported by the US National Library of Medicine Grant (1R01LM011563),

Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance.

iii

Abstract

The intent of this dissertation is to make a contribution to the field of pharmacovigilance.

Pharmacovigilance, also known as post-marketing drug surveillance, is the process of

continued monitoring for adverse drug reactions (ADRs) after drugs are released into the

market. An ADR is a harmful or unpleasant reaction related to the use of a medical product.

ADRs were reported to be between the fourth and sixth leading cause of death in the United

States in 1994, accounting for 3-7% of medical hospital admissions. On account of the

practice of pharmacovigilance, Vioxx (Rofecoxib) and Avandia (Rosiglitazone) are

examples of high profile drugs that were suspended from the American or European

market.

To prevent these effects on human health, pre-marketing clinical trials are designed to test

drug safety and efficacy. Although clinical trials are extensive and last multiple years, rare

ADRs may not be detected, and others may occur on account of idiosyncratic

characteristics of individuals excluded from the evaluated sample.

To aid the pharmacovigilance process, automated methods for the identification of strongly

correlated drug/ADR pairs from data sources such as adverse event reporting systems, or

Electronic Health Records (EHRs), have been developed. These methods however are

generally statistical in nature, and do not draw upon the large volumes of knowledge

embedded in the biomedical literature.

iv

In this dissertation I investigate the ability of scalable Literature Based Discovery (LBD)

methods to identify side effects of pharmaceutical agents in a computationally automated

manner. LBD methods can provide evidence from the literature to support the plausibility

of a drug/ADR association, thereby assisting human review to validate the signal, which is

an essential component of pharmacovigilance. The hypothesis underlying this work is that

by combining signals mined from EHR data with biomedical domain knowledge, the

accuracy of side effects detection may be improved. This also addresses the lack of

causality assessment in existing statistical methods in pharmacovigilance practice.

My theoretical contribution is that by conducting automated abductive reasoning and by

estimating the strength of generated explanatory hypotheses the plausibility of a drug/ADR

signal can be assessed. I adapt and extend the original abductive reasoning process as

defined by Peirce in 19th century by stating that the strength of the explanations found for

an observation is a measure for its plausibility, rather than taking an observation as given.

Practical contributions to pharmacovigilance and informatics include the development of

methods to leverage the knowledge from biomedical literature, the detection of signals

from the EHR data and the subsequent evaluation using supporting evidence from the

literature on a large scale in an automated way, and the development of an improved

drug/ADR reference set. My contributions are not restricted to pharmacovigilance and as

such constitute a contribution to the field of informatics in general.

I demonstrate that my work has extended the state of the art in EHR-based

pharmacovigilance and contribute new ideas that pave the way for further studies with the

potential to further enhance the field of pharmacovigilance and drug safety.

v

Table of Contents

Acknowledgements ............................................................................................................. ii

Abstract .............................................................................................................................. iii

Table of Contents ................................................................................................................ v

List of Tables .................................................................................................................... xii

List of Figures .................................................................................................................. xiii

Chapter 1: Introduction ....................................................................................................... 1

1. 1 Significance of Pharmacovigilance ....................................................................... 1

1.2 Challenges in pharmacovigilance and suggested solutions ................................... 2

1.3 Dissertation structure ............................................................................................. 4

1.4 Key contributions ................................................................................................... 5

Chapter 2: Literature Review .............................................................................................. 7

2.1 Pharmacovigilance: post-marketing drug surveillance .......................................... 7

2.1.1 Pharmacovigilance definition and significance .................................................. 7

2.1.2 Pharmacovigilance workflow in related health departments .............................. 9

2.1.3 Pharmacovigilance data sources ....................................................................... 10

2.2 Pharmacovigilance methodology ......................................................................... 12

2.2.1 Statistical data mining from pharmacovigilance database ................................ 12

vi

2.2.2 Taxonomic reasoning ........................................................................................ 17

2.3 Predicting drug side effect relations ..................................................................... 18

2.3.1 Machine learning algorithms ............................................................................ 18

2.3.2 Regression and correlation analysis .................................................................. 19

2.3.3 Mining literature to predict drug/ADR associations ......................................... 20

2.3.4 Statistical data mining from EHR for signal detection ..................................... 21

2.4 Pharmacovigilance challenges from my perspective ........................................... 23

2.5 Signal detection from EHR .................................................................................. 23

2.6 Expert clinical review and causality assessment in pharmacovigilance .............. 25

2.6.1 Expert clinical review in pharmacovigilance .................................................... 25

2.6.2 Assessment of causality .................................................................................... 26

2.7 LBD and literature NLP annotation tools ............................................................ 27

2.7.1 LBD and discovery patterns.............................................................................. 27

2.7.2 Literature NLP annotation tools........................................................................ 29

2.8 Semantic Vectors ................................................................................................. 30

2.8.1 Overview of semantic vectors ........................................................................... 31

2.8.2 Scalability and random indexing (RI) ............................................................... 32

2.9 Research opportunities ......................................................................................... 33

2.9.1 Active drug surveillance with EHR .................................................................. 33

vii

2.9.2 Using distributional semantic models to find

plausible drug/ADR associations ...................................................................... 33

Chapter 3: Chapter 3: Signal Detection from Outpatient Electronic Health Record

Data Using Statistical Mining ......................................................................... 35

3.1 Materials .............................................................................................................. 37

3.1.1 Clinical data warehouse (CDW) ....................................................................... 37

3.1.2 Medical language extraction and encoding system (MedLEE) ........................ 38

3.1.3 Side effect resource 2 (SIDER2)....................................................................... 38

3.1.4 Hierarchical relationships between concepts from UMLS knowledge base .... 38

3.1.5 MEDication-Indication (MEDI) ....................................................................... 39

3.2 Methods................................................................................................................ 40

3.2.1 Annotating the EHR data using MedLEE ......................................................... 40

3.2.2 Construct test data set ....................................................................................... 40

3.2.3 Retrieve co-occurrence data for SIDER2 test set ............................................. 41

3.2.4 Statistical algorithms ......................................................................................... 41

3.2.4.1 Disproportionality analysis ............................................................................ 41

3.2.4.2 Other statistical algorithm based on co-occurrence ....................................... 41

3.2.5 Filtering indications .......................................................................................... 44

3.2.6 Performance evaluation .................................................................................... 44

3.3 Experimental design............................................................................................. 44

viii

3.4 Results .................................................................................................................. 45

3.4.1 Experiment dataset ............................................................................................ 45

3.4.2 Query the co-occurrence data for experiment datasets ..................................... 46

3.4.3 Statistical results ............................................................................................... 47

3.4.3.1 Performance results ........................................................................................ 47

3.4.3.1.1 Performance of statistical algorithms with original co-occurrence data ..... 47

3.4.3.1.2 Effects of potential ontology intervention .................................................. 48

3.4.3.1.3 Effects of MEDI intervention ..................................................................... 50

3.5 Discussion ............................................................................................................ 52

3.6 Conclusion ........................................................................................................... 54

Chapter 4: Using Knowledge Extracted from the Literature to

Identify Plausible Adverse Drug Reactions .................................................... 55

4.1 Materials .............................................................................................................. 55

4.1.1 MMB and SemMedDB ..................................................................................... 56

4.1.2 SIDER2 ............................................................................................................. 56

4.1.3 MEDication-Indication (MEDI) ....................................................................... 57

4.2 Methods................................................................................................................ 58

4.2.1 RRI .................................................................................................................... 58

4.2.2 PSI ..................................................................................................................... 63

4.2.2.1 Operations in PSI ........................................................................................... 63

ix

4.2.2.2 PSI training process ....................................................................................... 63

4.2.2.3 Inferring discovery patterns ........................................................................... 65

4.2.2.4 Applying discovery patterns to find possible ADRs ..................................... 66

4.3 Experimental design............................................................................................. 69

4.3.1 Experiment 1 design ......................................................................................... 70

4.3.2 Experiment 2 design ......................................................................................... 71

4.3.3 Performance measurements .............................................................................. 72

4.4 Results .................................................................................................................. 73

4.4.1Experiment 1 ...................................................................................................... 73

4.4.1.1 Inferring discovery patterns ........................................................................... 74

4.4.1.2 Performance ................................................................................................... 76

4.4.1.3 AUC ............................................................................................................... 76

4.4.1.4 Rediscovery results ........................................................................................ 78

4.4.2 Experiment 2 ..................................................................................................... 79

4.4.3 Plausibility evidence found by PSI discovery patterns approach ..................... 81

4.5 Discussion ............................................................................................................ 88

4.6 Conclusion ........................................................................................................... 96

Chapter 5: Toward a Reality-based Repository of Adverse Drug Reactions:

Comparing SIDER2 with Side Events Encountered in Practice ..................... 97

5.1 Materials ............................................................................................................ 100

x

5.2 Methods.............................................................................................................. 101

5.2.1 Import SIDER2 and FAERS data in database ................................................ 101

5.2.2 Parsing drug and side effect terms using MetaMap ........................................ 101

5.2.3 Mapping between SIDER2 and FAERS ......................................................... 103

5.2.4 Retrieve number of reports for mapped SIDER2 drug-side effect pairs

and perform disproportionality analysis to find significant

SIDER2 drug/ADR associations ..................................................................... 104

5.2.5 Performance measure ...................................................................................... 105

5.3 Experimental design........................................................................................... 105

5.4 Results ................................................................................................................ 106

5.4.1 Concept extraction .......................................................................................... 106

5.4.2 Reported SIDER side effects .......................................................................... 107

5.4.3 Statistically significant drug/ADR pairs detected by

disproportionality measures ............................................................................ 108

5.4.4 Manual review sample drug/ADR pairs without case reports ........................ 109

5.5 Discussion .......................................................................................................... 110

5.6 Conclusion ......................................................................................................... 112

Chapter 6: Improving Signal Detection from the Electronic Health Records by

Using Literature Based Discovery ................................................................ 113

6.1 Experimental design........................................................................................... 113

xi

6.2 Materials and Methods ....................................................................................... 114

6.2.1 Models selected for this study......................................................................... 114

6.2.2 Experimental reference standards and test dataset .......................................... 115

6.2.3 Performance metrics ....................................................................................... 116

6.3 Results ................................................................................................................ 117

6.3.1 Performance .................................................................................................... 117

6.3.2 AUC ................................................................................................................ 119

6.4 Discussion .......................................................................................................... 120

6.5 Conclusion ......................................................................................................... 121

Chapter 7: Key Findings, Innovation, Contributions, Future work and Conclusions..... 123

7.1 Overview and summary of key findings ............................................................ 123

7.2 Innovation .......................................................................................................... 124

7.3 Theoretical contribution ..................................................................................... 124

7.4 Practical contribution ......................................................................................... 125

7.5 Future work ......................................................................................................... 127

7.6 Conclusions ......................................................................................................... 127

References ....................................................................................................................... 129

xii

List of Tables

Table 2-1. Quantitative data mining procedures used as

disproportionality measures in SRS .........................................................15

Table 3-1. Overall error measure in FDR approach ...................................................43

Table 3-2. Top 10 clinical documentation types in the EHR system.........................46

Table 3-3. Number of false pairs that have been excluded as indications .................51

Table 4-1. Different groups by extending MEDI in experiment 2 ............................72

Table 4-2. The most frequent predicate paths in

inferred discovery patterns ........................................................................74

Table 4-3. Results of precision and rank-based measures

for different groups ...................................................................................76

Table 4-4. PSI-double+triple model performance across all tests

with different MEDI lists ..........................................................................80

Table 4-5. Reasoning pathways used to retrieve evidence from the

literature for the pair rosiglitazone -- myocardial infarction ...................82

Table 4-6. Some example predications for possible mechanism of

rosiglitazone causing myocardial infarction ............................................83

Table 5-1. The percentage of statistically significant SIDER2

drug/ADR pairs using disproportionality measures .................................109

Table 6-1. SIDER2 and a Reference Set are used to construct the dataset

for the experiment .....................................................................................115

Table 6-2. Results of precision and rank-based measures for

different groups and different datasets ......................................................117

xiii

List of Figures

Figure 2-1. Number of adverse event reports reported to FDA

from 2001 to 2011 ...................................................................................9

Figure 2-2. FDA pharmacovigilance process ............................................................11

Figure 3-1. The example ancestor-offspring hierarchy relationships

between UMLS concepts ........................................................................39

Figure 3-2. Comparison of ability of different performance metrics to recover

SIDER2 side effects for different statistical algorithms with different

drug/ADR co-occurrence threshold .........................................................48

Figure 3-3. Comparison of the predicted drug/ADR association count comparing

between original and descendant co-occurrence set ...............................49

Figure 3-4. Comparison of ability of different performance metrics to recover

SIDER2 side effects for different statistical algorithms with

different co-occurrence set ......................................................................50

Figure 3-5. Comparison of the predicted drug/ADR association count

comparing between different MEDI filtering indications .......................51

Figure 3-6. Comparison of ability of different performance metrics to

recover SIDER2 side effects for different statistical algorithms

after filtering indications by different MEDI lists ..................................52

Figure 4-1. The pseudo code for RRI model training process ...................................62

Figure 4-2. Schematic representation of RRI training and inference process ...........62

Figure 4-3. The Pseudo code for PSI model training process....................................65

Figure 4-4. Schematic representation of PSI training and inference process ............68

Figure 4-5. Experimental design in the detection of SIDER2 known

ADRs using LBD distributional semantic models ...................................69

Figure 4-6. ROC plot of true positive rate and false positive rate

for all groups ...........................................................................................77

xiv

Figure 4-7. Performance of AUC for all drugs by PSI-double+triple group .............78

Figure 4-8. Rediscovery plot for experiment groups .................................................79

Figure 4-9. The predications retrieved by reasoning pathway for rosiglitazone

causing myocardial infarction with specifying semantic

groups for concepts .................................................................................85

Figure 4-10. Middle terms that were retrieved by PSI discovery patterns

involving LDL-C ..................................................................................87

Figure 5-1. The procedure for mapping SIDER2 and FAERS ..................................103

Figure 5-2. Procedure of mapping SIDER2 drugs/side effects to FAERS

drugs/reactions using “dipyridamole” as an example .............................104

Figure 5-3. Study design for decreasing false SIDER2

drug-side effect associations ...................................................................106

Figure 5-4. Frequency distribution of number of SIDER2 drug/ADR pairs

grouped by the number of FAERS reports that contain the

drug/ADR of interest...............................................................................108

Figure 5-5. Interpretation of 60 randomly selected SIDER2 pairs without

any supporting reports..............................................................................110

Figure 6-1. Overall research design for reranking statistically significant

drug/ADR associations by similarity scores ...........................................114

Figure 6-2. Comparing accumulated number of true positives for each ranking

between pre- and post-reranking signals for

SIDER2 set and the Reference set ..........................................................118


for pre- and post-reranking groups for SIDER2 set ................................119


for pre- and post-reranking groups for Reference set .............................120

Figure 6-5. The proposed architecture for a plausibility-based

pharmacovigilance system ......................................................................122

1

Chapter 1: Introduction

1.1 Significance of Pharmacovigilance

An adverse drug reaction (ADR) is an “appreciably harmful or unpleasant reaction,

resulting from an intervention related to the use of a medical product” (Edwards &

Aronson, 2000). ADRs contribute to substantial morbidity, health care visits and hospital

admissions (Baker et al., 2004; Chan, Nicklason, & Vial, 2001; Hamilton, Briceland, &

Andritz, 1998; Lazarou, Pomeranz, & Corey, 1998; Pirmohamed et al., 2004). For

example, ADRs were reported to be between the fourth and sixth leading cause of death in

the United States in 1994 (Lazarou et al., 1998), accounting for 3-7% of medical hospital

admissions (Baker et al., 2004; Hamilton et al., 1998) and a substantial number of health

care visits (Bourgeois, Mandl, Valim, & Shannon, 2009). A recent observational study

conducted in two emergency departments shows that 8.9% of emergency patients had a

probable ADR (Capuano et al., 2004). Overall, they have a considerable negative impact

and place an enormous burden on both health and healthcare system (Pirmohamed et al.,

2004; Rodríguez-Monguió, Otero, & Rovira, 2003) and consequently constitute a notable

public health problem (Capuano et al., 2004).

To prevent these harmful effects on human health, pre-marketing clinical trials are

designed to test drug safety and efficacy. Phase III clinical trials have been estimated to

cost 86.3 million US dollars and last 30.5 months on average (DiMasi, Hansen, &

Grabowski, 2003). Nonetheless, rare ADRs may not be detected due to the limited duration

2

and sample size of such trials, and others may occur on account of idiosyncratic

characteristics of individuals excluded from the evaluated sample. The continued

monitoring for ADRs after drugs are released into the market, referred to as

pharmacovigilance (also known as post-marketing drug surveillance), is therefore an

important tool to monitor and improve drug safety (Mann & Andrews, 2007). For example,

on account of the practice of pharmacovigilance, Vioxx (Rofecoxib) was withdrawn

voluntarily worldwide in 2004 because of the finding of increased risk of heart attack

(Merck, 2004). Similarly, Avandia (Rosiglitazone) was suspended from the European

market in 2010 by the European Medicines Agency (EMA) (FDA, 2011; Ye, 2011).

1.2 Challenges in pharmacovigilance and suggested solutions

Over the last decade, the World Health Organization (WHO), the Food and Drug

Administration (FDA), EMA and others instituted spontaneous reporting systems (SRSs)

for the systematic collection of ADR reports (Rawlins, 1988a). Drug safety data obtained

from SRSs have been analyzed using quantitative data mining procedures to retrieve

strongly associated drug/ADR pairs (Puijenbroek, Diemont, & van Grootheest, 2003;

Rawlins, 1988a, 1988b). These highlighted associations are subsequently reviewed and

scrutinized by domain experts. These are essential parts of a pharmacovigilance system,

which are referred to as drug/ADR signal detection and evaluation (A. Bate et al., 1998; A.

Bate, Lindquist, Edwards, & Orre, 2002; Bates, Lindquist M., Orre R., Edwards I., &

Meyboom R., 2002).

There are two major problems with these processes. First, research suggests data collected

by SRS are limited by long time latency, incorrect or incomplete clinical information,

3

underreporting and reporting bias (Rawlins, 1988a, 1988b; Hasford, Goettler, Munter, &

Müller-Oerlinghausen, 2002; Alvarez-Requejo et al., 1998). Second, it has been argued

that causality assessment is lacking in pharmacovigilance practice (Anderson & Borlak,

2011). The WHO defines causality assessment as the evaluation of the likelihood that a

medicine caused the observed adverse event (WHO-Uppsala Monitoring Centre, 2013).

The lack of this evaluation is due to the expense and time involved in manual review by

domain experts. In practice resources are too limited for the large number of signals

produced by SRSs.

A potential workaround to the resource limitations is to use information from other sources

to supplement SRSs. Drug intake and symptoms are documented by clinicians in EHR data,

although the rate of false positives is expected to be high because the number of possible

adverse events is small compared to the number of desired treatment outcomes. After

filtering known drug indications a causality assessment of statistically significant signals

is necessary to identify possible side effects. Potential and known mechanisms of action

are described in detail in the biomedical literature. It is therefore possible to leverage

domain knowledge from biomedical literature to make statements about the plausibility of

drug/ADR connections. The hypothesis underlying this work is that by combining signals

mined from EHR data with biomedical domain knowledge, the accuracy of side effect

detection may be improved.

I evaluate this hypothesis in a series of experiments, First, I apply disproportionality

measures and Chi-square test to analyze drug-event co-occurrence data derived from EHR

to find statistically significant drug/ADR associations and evaluate the performance of

these statistical algorithms. For the statistically significant drug/ADR associations, I apply

4

scalable methods of literature-based discovery (LBD) (Hristovski, Friedman, Rindflesch,

& Peterlin, 2006; Kostoff, Briggs, Solka, & Rushenberg, 2008; Swanson, 1986b)

leveraging methods of distributional semantics (Cohen, Schvaneveldt, & Widdows, 2010;

Cohen, Widdows, Schvaneveldt, Davies, & Rindflesch, 2012; Widdows & Cohen, 2010)

to evaluate the plausibility of their connections. I do this by utilizing knowledge from the

published literature to automatically generate explanatory hypotheses which collectively

contribute to a plausibility score and a subsequent reranking according to their plausibility.

This also alleviates the problem of an overwhelming amount of associations for manual

expert review.

During the course of this research, it came to my attention that there is no agreed-upon gold

standard in pharmacovigilance (Tatonetti, Ye, Daneshjou, & Altman, 2012). This may

affect the performance evaluation because of the existing false drug/ADR associations in

the reference data. So a reference set was developed and used for additional performance

evaluation in this study.

To reiterate, the unifying hypothesis that is tested in the dissertation is that drug/ADR

signal detection from EHRs can be improved by integrating knowledge from the

biomedical literature.

1.3 Dissertation structure

This dissertation is structured as follows. In Chapter 2, I review literature related to the

field of and recent research in pharmacovigilance as well as LBD methods and

distributional semantics. In the following chapters I perform a series of experiments to

evaluate my hypotheses. In Chapter 3, I describe the first experiment, in which statistical

5

mining algorithms are used to find statistically correlated drug/ADR associations from

EHR data. In Chapter 4, I build LBD based distributional semantic models to identify

plausible ADRs using knowledge extracted from the literature. In Chapter 5, a drug/ADR

association reference set is built as a subset of a dataset containing known side effects. The

subset contains only those associations that are confirmed to be statistically significant

using data from an SRS. In Chapter 6, I evaluate the effects of plausibility based reranking

on the precision of the statistically significant drug/ADR associations detected in EHR

data. In this experiment, two reference sets are utilized and the performance against these

sets is compared. The dissertation concludes with describing its significance and

contributions to the field of pharmacovigilance, informatics and public health.

1.4 Key contributions

The purpose of this dissertation is to automatically identify drug/ADR associations that are

most likely causal among those signals detected from EHR with statistical methods by

integrating domain knowledge from the biomedical literature that is related to side effects.

A surprising drug/ADR signal (surprise observation) is detected from an EHR

system.

There is a genuine doubt about this observation (doubt).

To resolve the doubt, an inquiry is initiated (inquiry) by searching the literature to

support plausible connections between the drug and the ADR.

Supporting plausibility evidence is subsequently utilized as sufficient explanation

(explanation) to relieve the doubt about the surprising observation.

My theoretical contribution is that by conducting automated abductive reasoning

(Josephson & Josephson, 1996; Locke, Golden-Biddle, & Feldman, 2008; Peirce, 1955;

6

Schvaneveldt & Cohen, 2010) and by estimating the strength of generated explanatory

hypotheses the plausibility of an observation (in this work, a drug/ADR signal) can be

assessed. This means that I adapt the original abductive reasoning process by stating that

the strength of the explanations found for an observation is a measure for its plausibility.

A low plausibility likely implies that an observation is a false positive and should be

considered for rejection. A high measure of plausibility on the contrary supports the

validity of the observation.

This differs from the original line of thought by introducing a measure to assess plausibility

rather than taking the observation as a given fact.

The practical contributions to pharmacovigilance and informatics are as follows:

The development of methods to leverage knowledge from the literature as a

means to assess the plausibility of observations from EHR data.

The detection of signals from EHR data and the subsequent evaluation using

supporting evidence from literature in pharmacovigilance on a large scale in an

automated way.

The development of a drug/ADR reference set based on both an established side

effects repository and a high volume drug/ADR data set from an SRS.

The methods and procedures of integrating formal knowledge can be generalized

and applied to other domains, such as outbreak detection, for which a timely

identification of plausibility for signals is essential. This is not restricted to

pharmacovigilance and constitutes a contribution to the field of informatics in

general.

7

Chapter 2: Literature Review

2.1 Pharmacovigilance: post-marketing drug surveillance

2.1.1 Pharmacovigilance definition and significance

Vioxx (Rofecoxib) was withdrawn voluntarily from market by Merck in 2004, after it was

found that the use of this agent increased the risk of myocardial infarction (Merck, 2004).

Graham et al (Graham et al., 2005) estimated that between 88,000 and 140,000 excess

serious coronary heart diseases might have been caused by rofecoxib in the US since it was

launched in 1999. Avandia (Rosiglitazone) was suspended from the European market in

2010 (Blind, Dunder, Graeff, & Abadie, 2011; EMA, 2010; Ye, 2011) on account of an

increased risk of cardiovascular complications. These high-profile examples illustrate that

pharmacovigilance is very important to supplement existing drug safety profiles because

clinical drug trials cannot be large or long enough to identify all problems related to a new

drug (Rawlins, 1988a). Additionally, subjects are pre-selected by eligibility criteria and

therefore may not fully represent the patient population after the drugs are put to market

(Anderson & Borlak, 2011). Besides, some patient may have idiosyncratic drug reactions

which cannot be explained by the pharmacological effect of the drug (Knowles, Uetrecht,

& Shear, 2000). Consequently, it is highly unlikely that instances of all possible ADRs will

be detected during pre-marketing clinical trials.

To address this problem, health departments and organizations (such as the World Health

Organization (WHO), United States Food and Drug Administration (FDA), and European

8

Medicines Agency (EMA)) encourage physicians, other health care professionals, and

patients to report voluntarily about any observed ADRs. In addition to voluntarily

reporting, pharmaceutical companies are mandatory required to report serious adverse

events (Ahmad, 2003). These bodies have Spontaneous Reporting Systems (SRS) to enable

the efficient submission of reports electronically (Kessler et al., 1993; Wysowski & Swartz,

2005).

Since 1969, the FDA Adverse Event Reporting System (FAERS), which provides the

means for clinicians to report suspected adverse events electronically, collected more than

7 million case reports (Fine, 2013) and the number of ADR reports submitted to the FDA

continues to grow (Figure 2-1). More than 75 drug products have been removed from the

market due to safety problems from 1969 to 2002 (Wysowski & Swartz, 2005). The facts

emphasize the importance of post-marketing drug monitoring, known as

pharmacovigilance – “the science and activities relating to the detection, assessment,

understanding and prevention of adverse effects or any other drug-related problem after

drugs are on market” (WHO, 2010). Pharmacovigilance is designed to detect any rare or

long-term adverse effects over a very large population and a long period of time.

9

Figure 2-1: Number of adverse event reports reported to FDA categorized by reporter

(top left), by patient outcome (top right), by geographic sources (bottom left) and by

types of reports (bottom right) from 2001 to 2011

2.1.2 Pharmacovigilance workflow in related health departments

In general, the pharmacovigilance process proceeds as follows (Andrew Bate, 2003;

Edwards, 1992; Lindquist et al., 1999):

(1) Reported drug-related problems are collected in SRSs nationally or internationally;

(2) Quantitative data mining procedures are used to analyze these data and retrieve

relatively strongly correlated drug/ADR pairs (drug/ADR associations);

10

(3) These highlighted associations are then reviewed and evaluated by domain experts

making up an expert clinical review panel; and

(4) Associations considered to be of clinical interest are then annotated as “signals”.

Specifically, signal is defined as “reported information on a possible causal relationship

between an adverse event and a drug, the relationship being unknown or incompletely

documented previously” (Edwards & Biriell, 1994; van Puijenbroek et al., 2002). Overall,

the pharmacovigilance process in WHO (Andrew Bate, 2003), FDA (Fine, 2013; Gould,

2005) and EU-ADR initiative (P. Lopes et al., 2013) includes two components -- a

statistical component (quantitative signal detection, steps (1) and (2)) and a qualitative

component (expert clinical review, steps (3) and (4)) (Andrew Bate, 2003). Figure 2-2

demonstrates the specific pharmacovigilance workflow in FDA.

2.1.3 Pharmacovigilance data sources

Through pharmacovigilance, international and national health institutions gather large

amount of data mainly from SRS for further analysis. In addition to case reports,

specialized networks and active surveillance data are also considered as data sources for

detecting safety signals, even though they are in experimental phrases (FDA Science Board

Subcommittee, 2011). Specialized networks, such as the Drug Induced Liver Injury

Network (DILIN) (Hoofnagle, 2004), focus on collecting side effects data concerning

specific organ targets, patterns of ADRs or patient populations. Active surveillance efforts,

such as Sentinel Network (Platt et al., 2009; Platt, Madre, Reynolds, & Tilson, 2008) and

Observational Medical Outcomes Partnership (OMOP) (Stang et al., 2010), utilize existing

systems for the purpose of drug monitoring (U.S. Food and Drug Administration, 2008).

Examples of potentially useful systems include EHRs, pharmacoepidemiological databases

11

(Johansson, Wallander, de Abajo, & Rodriguez, 2010), health insurance claims databases

(Brown et al., 2007, 2009), and so forth. the Observational Medical Outcomes Partnership

(OMOP) uses EHR and other data sources to identify ADR signals and the preliminary

results demonstrate that active surveillance from EHR systems can complement current

pharmacovigilance practice, but also that performance varies by data source, drug and

outcomes of interest (FDA Science Board Subcommittee, 2011).

Figure 2-2: FDA pharmacovigilance process – adapted from (Fine, 2013; Gould, 2005)

With the opportunity presented by EHRs’ broader availability, researchers have also used

EHR data for drug safety research and have attempted to demonstrate that EHR data are a

relevant and significant data source for pharmacovigilance (Collins, 2011; Wang,

12

Hripcsak, Markatou, & Friedman, 2009). These authors argue that EHR data can

compensate for some of the deficiencies of SRS, such as under-reporting, misclassification,

a long lag time between observation and reporting, reporting bias and the provision of

incomplete background clinical information (Rawlins, 1988a, 1988b).

2.2 Pharmacovigilance methodology

2.2.1 Statistical data mining from pharmacovigilance database

Statistical algorithms are routinely applied to SRSs (A. Bate et al., 1998; W. DuMouchel

& Pregibon, 2001; W DuMouchel, 1999; Lindquist et al., 1999; Lindquist, Stahl, Bate,

Edwards, & Meyboom, 2000; Noren, Bate, Orre, & Edwards, 2006) to measure the strength

of reported drug-event associations. Quantitative data mining procedures are measuring

“disproportionality”. Reporting disproportionality is defined as a statistically significant

higher reporting frequency of particular drug/ADR associations compared with marginal

distributions of drugs and ADRs as a background in the reporting database (Shibata &

Hauben, 2011). These disproportionality measures quantify how often a drug and a

possible event co-occur compared to the background reporting occurrence across all other

drugs and events using an independence model (Manfred Hauben, Madigan, Gerrits,

Walsh, & Van Puijenbroek, 2005). The reporting of the event related to other drugs in the

database is used as a proxy for the background occurrence of the event (Poluzzi, Raschi,

Piccinni, & De Ponti, 2012). To put it in another way, different disproportionality measures

quantify the extent to which a given event is frequently reported with a given drug by

estimating the observed-to-expected co-occurrence ratio. Those drug-event pairs that are

significantly different from background reporting occurrence may reflect credible signals

13

that need further investigation. However, reported ADRs do not necessarily accurately

reflect the usage of drugs and the incidence of ADRs in the population (Poluzzi et al.,

2012), the significant pairs may represent reporting tendencies (Manfred Hauben & Reich,

2005). Commonly used disproportionality measures include the proportional reporting

ratio (PRR) (Evans, Waller, & Davis, 2001), the reporting odds ratio (ROR) (Puijenbroek

et al., 2003), Bayesian confidence propagation neural network (BCPNN, information

component (IC) is the statistical score) (A. Bate et al., 1998; Lindquist et al., 1999, 2000;

Noren et al., 2006), multi-item gamma Poisson Shrinker (MGPS, empiric Bayes geometric

mean (EBGM) is the statistical score) (W. DuMouchel & Pregibon, 2001; W DuMouchel,

1999), and others (Table 2-1). Different comparison references are used to estimate the

expected ADR occurrences in frequency probability based measures. The observed for all

drugs and the observed for all other drugs are used to estimate the expected occurrence of

the ADR for PRR and ROR respectively (Poluzzi et al., 2012). With Bayesian methods, if

a drug and an ADR is independent, then their joint probability to the product of the

individual probabilities will equal to 1 (Poluzzi et al., 2012). This inference hypothesis is

used to calculate and interoperate observed-to-expected co-occurrence ratio. Roux et al.

(E. Roux et al., 2003) evaluated these statistical models on simulated datasets (constructed

by SRS modeling for arbitrary selected 150 drugs and 100 side effects) by comparing the

percentage of false positive signals among given drug-event combinations (Emmanuel

Roux, Thiessard, Fourrier, Begaud, & Tubert-Bitter, 2005). The false positive rates varied

from 1.1% to 53.4%; and EBGM, IC and Chi-square models seemed to have better

performances (Emmanuel Roux et al., 2005). Puijenbroek et al. (van Puijenbroek et al.,

2002) also evaluated different measures and found out that sensitivity was high with respect

14

to the reference measure when a combination of point- and precision estimates was used.

For example, in the PRR measure, the PRR is the point estimate; and its lower limit of 95%

Confidence Interval (CI) is the precision estimate.

Table 2-1: Quantitative data mining procedures used as disproportionality measures in SRS (Balakin & Ekins, 2009; M. Hauben &

Bate, 2009; Manfred Hauben et al., 2005; Emmanuel Roux et al., 2005; van Puijenbroek et al., 2002)

Reports with the suspected event j Reports without the suspected event j

Reports with the drug i of interest a(=nij) b

Reports with all other drugs c d

Measure of Association and Data

source Formula

Measure of importance Probabilistic

Interpretation

Frequentist Approaches

Proportional Reporting Ratio (PRR)

(Evans et al., 2001)

UK Yellow Card database,

Medicines Control Agency (MCA)

Reporting Odds Ratio

(ROR) (Puijenbroek et al., 2003)

Netherlands Pharmacovigilance

Centre Lareb

)(ln96.1)ln(%95

)1111

()(ln

)/(

)/(

PRRSEPRReCI

dccbaaPRRSE

dcc

baaPRR

4,3,2

196.1

2

aPRR

or

SEPRR

)|(

)|(

DAP

DAP

)(ln96.1)ln(%95

)1111

()(ln

)/(

)/(

RORSEROReCI

dcbaRORSE

bc

ad

db

caROR

196.1 SEROR

)|()|(

)|()|(

DAPDAP

DAPDAP

15

Table 2-1: Continued

Measure of Association and Data

source Formula

Measure of importance Probabilistic

Interpretation

Bayesian Approaches

Relative Risk (RR)

WHO BCPNN and FDA MGPS

A: ADR; D: Drug; BCPNN: Bayesian Confidence Propagation Neural Network; MGPS: Multi-item gamma Poisson Shrinker

dcba

caba

a

E

nRR

ij

ij

))((

))((

)(log

2caba

dcbaaIC

02 SDIC

)(

)|(log

)()(

)()|(log

)()(

),(log

2

2

2

AP

DAP

DPAP

DPDAP

DPAP

DAP

))((

)(

caba

dcbaaRR

205 EBGM

)(

)|(

AP

DAP

16

17

2.2.2 Taxonomic reasoning

In addition to the use of quantitative techniques, ontology-based reasoning in combination

with quantitative data mining methods has been shown to be able to improve signal

detection from SRSs (J. S. Almenoff et al., 2007; Bousquet, Henegar, Louėt, Degoulet, &

Jaulent, 2005; Bousquet, Trombert, Kumar, & Rodrigues, 2008; Henegar, Bousquet, Lillo-

Le Louët, Degoulet, & Jaulent, 2006; Mera, Beach, Powell, & Pattishall, 2010; Nadkarni,

2010). Different methods of ontological reasoning were conceived. Two examples, coined

terminological reasoning by subsumption and approximate matching (Bousquet et al.,

2005), are summarized in the following.

In a terminological hierarchy, when multiple subclasses are connected to one superclasses,

it is said that the superclass subsumes all of the subclasses. Reasoning by subsumption then

means that a given class and all of its subsumed classes are considered to be one entity.

The new entity then constitutes all of the signals of the individual classes into one,

increasing its number of occurrence. In SRSs, ADRs are usually coded using the Medical

Dictionary for Regulatory Activities (MedDRA) terminology, and MedDRA’s hierarchy

was used to facilitate terminological reasoning by subsumption.

In approximate matching, a new concept is built as the disjunction of all sets of possible

characteristics. For example, Hepatitis is defined as “((hyperbilirubinemia AND ALAT

increased) OR (ASAT increased AND cholestasis) OR (jaundice AND cytolysis))”

(Bousquet et al., 2005). This is another way to group cases to increase the number of

occurrences of identifiable drug/ADR associations.

These and other approaches and combinations of both approaches have been shown to

improve signal detection by exploiting ontological properties with statistical data mining

18

methods (J. S. Almenoff et al., 2007; Bousquet et al., 2005, 2008; Mera et al., 2010;

Nadkarni, 2010).

2.3 Predicting drug side effect relations

2.3.1 Machine learning algorithms

Other researchers have attempted to predict which drugs cause which side effects using

machine learning approaches with structured knowledge concerning the drugs and ADRs

as features. Liu et al. applied five supervised machine learning (ML) algorithms with drug

features drawn from PubChem (chemical substructures), DrugBank (drug target,

transporters, and enzymes), KEGG (pathway) and SIDER itself (drug indication and side

effects). A ML classifier was built for each SIDER ADR. The classifiers were then

evaluated on 832 SIDER drugs (for which DrugBank IDs could be found) using five-fold

cross-validation. Of the algorithms evaluated, the support vector machine (SVM) algorithm

performed best with an AUC of 0.9054 on the SIDER dataset as a whole (Liu, Wu, et al.,

2012).

In other work (Huang, Wu, & Chen, 2011), the features were selected from drug-target

information from DrugBank and drug non-target protein information from an expanded

protein-protein interaction (PPI) network derived from gene ontology (GO) annotations. A

SVM classifier was built for each ADR and cross-validated on a known side effect resource

(SIDER, version 1).

In related work, decision tree models were derived for specific end organs using features

derived from the chemical structure of the agents in the SIDER set. The accuracies of

19

decision trees for allergic, renal, central nervous system (CNS) and hepatic ADRs ranged

from 78.9% to 90.2% (Hammann, Gutmann, Vogt, Helma, & Drewe, 2010).

In addition to supervised ML algorithms, unsupervised learning using topic models has

also been used to analyze drug labels and group drugs by topics that are associated with

the same side effects (Bisgin, Liu, Fang, Xu, & Tong, 2011).

2.3.2 Regression and correlation analysis

Pauwels and colleagues applied sparse canonical correlation analysis (SCCA) utilizing the

chemical substructure of drugs to predict side effects (Pauwels, Stoven, & Yamanishi,

2011). Features representing drugs’ chemical substructures were extracted from PubChem.

Side effects were extracted from SIDER. Canonical correlation analysis (CCA) is a way of

measuring the linear relationship between two multidimensional variables and customized

to analyze the relationship between the group of chemical substructure and the group of

side effects and to find what chemical substructure can represent what side effect. Highly

correlated sets of chemical substructures and side effects were retrieved, and predictive

scores associating substructures to side effects were calculated. SCCA is a variant of

canonical correlation analysis that aims to minimize the size of the set of features utilized

so as to facilitate explanation. The AUC for SCCA is 0.8932 for predicting side effects.

In another study, logistic regression was used to predict ADRs using as features chemical

substances from PubChem and BioAssay (a database of bioactivity screens of chemical

substances). Performance was evaluated for side effects related to specific organ systems,

and was best for immune system disorders (AUC=0.92) and blood and lymphatic system

disorders (AUC=0.79) (Pouliot, Chiang, & Butte, 2011).

20

2.3.3 Mining literature to predict drug/ADR associations

Currently, systematic literature review is utilized as an in-depth investigation to provide

scientific evidence to confirm a specific drug/ADR relationship or support regulatory

decisions (for example, drug withdrawal) in the pharmacovigilance process (Arnaiz et al.,

2001; Y. K. Loke, Kwok, & Singh, 2011; Yoon K Loke, Price, & Herxheimer, 2007; P.

Lopes et al., 2013; Marie et al., 2008; Olivier & Montastruc, 2006). However, evidence in

the biomedical literature (e.g. case reports, clinical trials) may appear before an ADR is

recognized. This implies that analyzing the literature to predict drug/ADR associations may

complement current drug safety methods (disproportionality measures of SRS). This aspect

of pharmacovigilance has been explored in a few studies only (Alomar, Hourani, &

Sulaiman, 2008; Deftereos, Andronis, Friedla, Persidis, & Persidis, 2011; Shetty & Dalal,

2011).

There are two major investigative perspectives. One perspective involves using the

literature as a data source for ADR case reports, and then applying statistical models to

mine the drug/ADR associations (Alomar et al., 2008; Shetty & Dalal, 2011). Shetty et al.

(Shetty & Dalal, 2011) retrieved literature about side effects by searching PubMed using

NLM’s Medical Subject Headings (MeSH) index, and then filtering irrelevant articles (for

example articles discuss only treatments) using a Lasso-based document relevance

classifier (Genkin, Lewis, & Madigan, 2007). With the contingency table built from

drug/ADR occurrence across all relevant citations, a disproportionality analysis was

applied to find statistically significant drug/ADR associations. This model discovered 54%

of all detected FDA warnings from labeling and also confirmed the association between

21

rofecoxib and heart disease with literature before 2002 while rofecoxib was recalled in

2004.

The second perspective is based on the paradigm of literature-based discovery (LBD), and

involves processing the literature to uncover indirect relationships between drugs and

ADRs. Using LBD for pharmacovigilance has not been attempted until recently Hristovski

et al. (Hristovski et al., 2006) model LBD discovery patterns to analyze semantic

predications and find therapeutic relations for drugs and a disease. To date, LBD-inspired

pharmacovigilance research has concerned providing justification for a connection

between a drug and an ADR, or investigating possible mechanisms to explain an observed

side effect using the literature (Ahlers, Hristovski, Kilicoglu, & Rindflesch, 2007;

Hristovski, Burgun-Parenthoine, Avillach, & Rindflesch, 2012a). However, this approach

has not yet been evaluated as a means to predict side effects on a large scale. I will elaborate

further on this aspect in Section 2.7.1.

2.3.4 Statistical data mining from EHR for signal detection

Using EHR data for pharmacovigilance is an active research area (Wang et al., 2009).

Disproportionality measures applied to SRSs have been applied to EHR data also (Zorych,

Madigan, Ryan, & Bate, 2011; Liu, McPeek Hinz, et al., 2012). However, these algorithms

were designed for reporting systems to evaluate whether a given event is disproportionally

“reported” with a given drug. As side effects may not be explicitly reported in the EHR,

the applicability of disproportionality measures to EHR data must still be evaluated

(Zorych et al., 2011). Zorych et al. (Zorych et al., 2011) hypothesize that the observed co-

occurrence of a drug and an event can be regarded as though they were reported as a

spontaneous reporting case, to produce drug-event contingency tables. Using this

22

approach, existing disproportionality algorithms were applied to both real world and

simulated EHR data. Results show that the disproportionality algorithms can find

legitimate drug/ADR associations; they also produce a large number of false positive

associations. Liu et al. (Liu, McPeek Hinz, et al., 2012) assessed disproportionality

measures on drug-laboratory test ADR associations derived from EHR data. This study

further demonstrated that disproportionality measures can be applied to analyze EHR data

for pharmacovigilance. Of note, these EHR data (Zorych et al., 2011; Liu, McPeek Hinz,

et al., 2012) were structured.

However, most clinical information also includes unstructured narratives, which contain

valuable patient data. Several researchers (Cao, Hripcsak, & Markatou, 2007; Cao,

Markatou, Melton, Chiang, & Hripcsak, 2005; Wang et al., 2009) have investigated

measuring co-occurrence within the narrative section of the EHR as a means to detect

possible drug/ADR associations. This is accomplished by calculating above-chance

occurrences of drug and problem pairs in the clinical notes, where each note represents a

clinical encounter. Associations among entities are estimated using statistical methods

(Chi-Square statistics) based on their co-occurrence statistics in the clinical notes. Co-

occurrence statistics (Christopher D. Manning & Hinrich Schütze, 1999) have been

successfully applied in the computational linguistics domain, for example, for the purpose

of word sense disambiguation (Yarowsky, 1995), automatic generation of semantic

lexicons (Roark & Charniak, 1998) and synonym mining (Turney, 2001). In addition to

word-level co-occurrence statistics, concept-level co-occurrence has also been used to

discover associations between concepts (Cao et al., 2005). If a drug causes an ADR, then

this drug and this related problem are more likely to appear together than random

23

combinations of drugs and adverse events in clinical notes (Cao et al., 2005). It has been

demonstrated that co-occurrence statistics can be derived from unstructured EHR data to

detect possible side effects (Wang et al., 2009).

2.4 Pharmacovigilance challenges from my perspective

The above review of the literature highlights two major challenges: inefficient and

incomplete SRS data and the lack of computer-assisted causality assessment in

pharmacovigilance practice. As discussed in section 2.1.3, EHR may compensate for the

limitations of SRSs and provide the opportunity for active surveillance. As argued in

(Anderson & Borlak, 2011), there is a lack of causality assessment in pharmacovigilance

practice. While expert clinical review is designed to verify potential ADRs, it is a human-

intensive and time-consuming process. The available human resources are inadequate to

review the large amount of noisy signals detected in SRS and EHR data, creating a

bottleneck in the pharmacovigilance process. More research is needed to develop methods

to automate, or assist with, the knowledge-intensive task of expert clinical review. In the

section that follows, I will discuss emerging methodologies that I propose adapting in order

to address these limitations: EHR data can be used as an alternative data source for

pharmacovigilance and scalable LBD methods based on distributional semantics can be

utilized to assist signal evaluation.

2.5 Signal detection from EHR

An EHR system presents the possibility of a real-time, continuous approach to drug

surveillance (Brown et al., 2007; Trifirò et al., 2009; Wang et al., 2009). It has been

24

estimated that Vioxx would have been withdrawn in just 3 months instead of 5 years if the

EHR data of 100 million patients had been available for drug monitoring (McClellan, 2007;

Pray, Robinson, & Translation, 2007; Trifirò et al., 2009; Wagner et al., 2006). In addition

to review the statistical mining from EHR (Section 2.3.4), I continue to review the recent

proposed framework using EHR for pharmacovigilance from a pilot study (Wang et al.,

2009).

Wang et al. (Wang et al., 2009) proposed a pharmacovigilance framework utilizing an EHR

system. The basic procedure is: (1) Drug-relevant data (both structured and unstructured)

are extracted from the EHR. A Natural Language Processing (NLP) tool is used to parse

free-text clinical notes because unstructured data may include richer patient information;

(2) The drug relevant data are coded and mapped to a standard terminology for further

analysis; (3) Drug and related event pairs are selected by filtering indication relationships

and excluding incorrect temporal order of events (i.e. side effect before drug

administration); (4) Selected drug-event pairs are analyzed using statistical data mining

algorithm (Chi-square statistics) to obtain a ranked ADR list for all respective drugs based

on the strength of the statistics; (5) For each drug’s ranked ADR list, an empirical threshold

of significance level is adjusted by volume tests using the P-value plot (Schweder &

Spjøtvoll, 1982; Cao et al., 2005, 2007; Wang et al., 2009) to determine possible signals

for each drug.

Essentially, there are two major tasks required to use EHR data for pharmacovigilance. The

first task is to process and clean EHR data to get drug-event combinations and retrieve

these combinations’ co-occurrence data using informatics tools. Second is to conduct

25

statistical analysis to analyze these co-occurrence data of drugs and events to detect the

significant drug/ADR pairs.

ADR signals have relatively low incidence in comparison with background information in

the EHR (for example, therapeutic relationships are more common) because EHRs are not

designed for ADR monitoring. This leads to noise, which means the application of basic

statistical methods alone will detect many false signals. Thus estimating a threshold to find

stronger associations maybe difficult (Cao et al., 2007; Wang et al., 2009). However,

pharmacovigilance from EHR can provide good signal candidates for further investigation.

2.6 Expert clinical review and causality assessment in pharmacovigilance

2.6.1 Expert clinical review in pharmacovigilance

Since the beginning of pharmacovigilance practice (mid 1960s), expert clinical review has

been viewed as essential to assess causality during the signal evaluation process

(Agbabiaka, Savović, & Ernst, 2008; Andrews & Moore, 2014; Andrew Bate, 2003).

Detected signals are reviewed by one or more experts who are often specialists, with

expertise related to the ADR concerned (Andrews & Moore, 2014). This is referred to as

expert judgement or global introspection -- an expert expresses a judgement about possible

drug causation after having taken into account all the available and relevant information

on the considered case (Arimone et al., 2007). Expert judgment assessments are generally

expressed in terms of a qualitative probability scale (Andrews & Moore, 2014). For

example, the WHO causality categories are ‘Certain’, ‘Probable’, ‘Possible’, ‘Unlikely’,

‘Conditional/unclassified’, and ‘Unassessable/unclassifiable’(Meyboom, Hekster,

Egberts, Gribnau, & Edwards, 1997). During the assessment, specific structured

26

approaches have been used to guide the experts from which perspective to evaluate the

possible ADR. For instance, the Swedish regulatory agency (Wiholm, 1984) recommends

the experts to assess possible ADRs from the following aspects: (1) the temporal sequence,

(2) previous information on the drug, (3) dose relationship, (4) response pattern to drug,

(5) rechallenge, (6) alternative aetiological candidates, and (7) concomitant drugs.

2.6.2 Assessment of causality

To address the assessment of causality, general principles exist that can be applied to

evaluate the causality of potential ADRs (Shakir & Layton, 2002). The theoretical basis

for these principles was proposed by Sir Austin Bradford-Hill in 1965 (Hill, 1965).

Bradford-Hill, an English epidemiologist and statistician, was the first to demonstrate that

cigarette smoking contributes toward lung cancer using what are now referred to as the

“Bradford-Hill criteria” (Richard Doll, 1994). The Bradford-Hill criteria provide

viewpoints from which to evaluate evidence indicative of causality and have been widely

used in pharmacoepidemiology. These criteria are named ‘strength’, ‘consistency’,

‘specificity’, ‘temporality’, ‘biological gradient’ (referring to dose-response relationships),

‘plausibility’, ‘coherence’, ‘experimental evidence’, and ‘analogy’ (Hill, 1965; Kleinberg

& Hripcsak, 2011; Ward, 2009). Since then, the criteria have been widely used in

epidemiology and may be applied to assess the causality of drug/ADR relationships

(Anderson & Borlak, 2011; Perrio, Voss, & Shakir, 2007; Shakir & Layton, 2002). Three

of these criteria seem particularly pertinent to the development of pharmacovigilance

methods in our study:

The strength criterion reflects that strong associations are more likely to be causal

than weak associations (Shakir & Layton, 2002). Quantitative statistical data

27

mining methods evaluate ADR signals from the strength of association point of

view.

The plausibility criterion relates to evidence about mechanisms that may be

involved to support a causal relationship.

The coherence criterion relates to the consistency of the hypothesis in question with

contemporary medical knowledge.

2.7 LBD and literature NLP annotation tools

2.7.1 LBD and discovery patterns

Processing published biomedical literature to uncover implicit relationships among entities

is referred to as LBD (Bruza & Weeber, 2008; Swanson, 1986b, 1987; Weeber, Klein, de

Jong‐van den Berg, & Vos, 2001). LBD involves finding new knowledge by analyzing the

literature, rather than through scientific experimentation. This is accomplished by

identifying hidden connections between entities described in the published literature

(Swanson, 1986a, 1986b). The origins of LBD may be traced to the serendipitous discovery

that fish oils can be therapeutically useful in the treatment of Raynaud’s syndrome (poor

circulation in the peripheries) by information scientist Don Swanson (Swanson, 1986a,

1986b). Weeber et al. described the two types of LBD (Weeber et al., 2001).

One type, referred to as “open LBD”, starts from a known term or concept (generally called

A, although also referred to as C in Swanson’s early work) and tries to find an interesting

hypothesis in the form of a previously unrecognized connection to some other term. If an

article argues that A is associated with B and a second article mentions that B is associated

with C, A may treat C. For example dietary fish oil (A) affects platelet aggregation, blood

28

viscosity and vascular reactivity (B), and these biological factors (B) play a role in

Raynaud's syndrome (C) (Swanson, 1986a). Consequently, it is reasonable to hypothesize

that A treats C. The open LBD process proceeds from the source term A to an unknown

target term C and culminates in the generation of a new hypothesis.

The second type of LBD is referred to as “closed LBD”. In a closed LBD process the goal

is to evaluate an existing hypothesis. Closed LBD starts with known terms A and C, with

the goal to identify intermediate terms B that provide the bridge between A and C (Weeber

et al., 2001). For example, in 1988 Swanson found intermediate concepts to explain a

hypothetical relationship between migraine and magnesium (Swanson, 1988). Smalheiser

and Swanson used closed LBD to propose an explanation for epidemiologic evidence that

estrogen might protect against Alzheimer’s disease (Smalheiser & Swanson, 1996).

LBD methodologies generally utilize statistical information derived from the frequency

with which terms, or discrete concepts extracted from the literature using automated tools

(e.g. MetaMap) or assigned to it by human annotators (Srinivasan & Rindflesch, 2002), co-

occur (Hristovski et al., 2006; Yetisgen-Yildiz & Pratt, 2006). This has been referred to as

the co-occurrence model (Sehgal, Qiu, & Srinivasan, 2008). These co-occurrence statistics

are interpreted by correlation-mining and ranking algorithms (Yetisgen-Yildiz & Pratt,

2006, 2009).

A limitation of these methods is that they generally do not consider the nature of the

relationship between the terms or concepts concerned. To address this limitation,

Hristovski and his collaborators (Hristovski et al., 2006) propose using semantic relations

to eliminate spurious relationships introduced by frequently co-occurring concepts that are

not meaningfully related. In their initial work, the semantic relations concerned were

29

extracted from the literature by two NLP systems: SemRep (Rindflesch & Fiszman, 2003)

and (specifically to extract phenotypic information) BioMedLEE (L. Chen & Friedman,

2004). Their approach involved the specification of “discovery patterns”, patterns of

relationships between concepts that may indicate an implicit therapeutic relationship

(Hristovski, Friedman, Rindflesch, & Peterlin, 2008). These conditions can be specified as

sets of semantic predicates. For example, Ahlers et al. (Ahlers, Hristovski, et al., 2007)

defined the May_Disrupt pattern as follows:

Substance X <inhibits> Substance Y

Substance Y <causes|predisposes|associated with> Pathology Z

Substance X <may disrupt> Pathology Z

Variants of this approach have been applied to generate or support the hypotheses that fish

oil treats Raynaud’s disease (Hristovski et al., 2006), insulin treats Huntington disease

(Hristovski et al., 2006), and antipsychotic agents prevent cancer (Ahlers, Hristovski, et

al., 2007). Recently, this approach was also adapted to provide evidence to support the

plausibility of an observed drug/ADR association (Hristovski et al., 2012a; Hristovski,

Burgun-Parenthoine, Avillach, & Rindflesch, 2012b), providing proof-of-concept that

LBD methods can be applied within the problem domain of pharmacovigilance. Regardless

of the application domain, knowledge used to populate discovery patterns is extracted from

the biomedical literature using NLP.

2.7.2 Literature NLP annotation tools

MetaMap and SemRep are two examples of tools that have been used to extract information

encoded in the biomedical literature. MetaMap is a widely-used NLP tool that identifies

concepts from the Unified Medical Language System (UMLS) in biomedical text (A. R.

30

Aronson & Lang, 2010; A. R. Aronson, 2001). SemRep (Rindflesch, Fiszman, & Libbus,

2005; Rindflesch & Fiszman, 2003) is a rule-based NLP tool (A. R. Aronson & Rindflesch,

1998) that draws on concepts extracted by MetaMap and medical domain knowledge in the

UMLS to extract semantic predications (Rindflesch & Fiszman, 2003). Its input consists

of sentences from the literature; its output is a series of semantic predications identified in

the respective text. A semantic predication is a subject-predicate-object triple sentence in

which the subject and object are UMLS concepts and the predicate is a semantic

relationship. For example, metformin (UMLS Concept C0025598) TREATS diabetes

mellitus (C0011849) is a semantic predication extracted from the phrase “Treatment of

diabetes mellitus with metformin”. Evaluations of SemRep reveal a precision between 0.73

and 0.81, and a recall of 0.55 on the biomedical literature (A. R. Aronson & Rindflesch,

1998; Kilicoglu, Fiszman, Rosemblat, Marimpietri, & Rindflesch, 2010; Rindflesch &

Aronson, 2002). Semantic predications benefit the LBD process in several respects. The

additional information provided by semantic predications makes the LBD results easier to

interpret. In addition, it has been noted that a large number of uninformative co-occurrences

must be manually reviewed when LBD is based on lexical statistics alone (Lindsay &

Gordon, 1999). In contrast, semantic predications provide the means to isolate relationships

between concepts that are logically connected in a meaningful way.

2.8 Semantic Vectors (Cohen & Widdows, 2009)

Regardless of whether co-occurrence relations or discovery patterns are used, LBD systems

must explore large numbers of co-occurring terms or possible reasoning pathways to

identify explanatory hypotheses (for closed discovery) or previously unrecognized

31

relationships (for open discovery). Consequently, the process of LBD can be

computationally expensive, and thus faces scalability issues in the context of the rapid

growth of the biomedical literature. In contrast, the field of distributional semantics has

produced corpus-derived statistical models that can measure the relatedness between two

concepts by comparing vector representations of these concepts, called semantic vectors,

that are derived from the contexts they have occurred in (Cohen & Widdows, 2009),

without the need to explicitly explore co-occurring concepts once the initial model has been

generated. Consequently, several authors have explored the use of distributional models

for LBD (Cohen et al., 2010; Cole & Bruza, 2005; Gordon & Dumais, 1998).

2.8.1 Overview of semantic vectors

These geometrically motivated models of distributional semantics represent terms or

concepts as high-dimensional vectors derived from the contexts in which they have

occurred. Relatedness between a pair of terms or concepts is then estimated from their

representing vectors’ similarity (Cohen et al., 2010). The vector components can be binary,

ternary, real, or complex values (Kanerva, 2009; Widdows & Cohen, 2012).

The overall methodology of semantic vectors is to build vector representations for terms

and documents. Different methods exist to compute these vectors. For example, document

vectors can be built from term-by-context matrices (Cohen & Widdows, 2009); in Latent

Semantic Analysis (LSA) the initial representation is a term-document matrix derived from

a corpus (Landauer & Dumais, 1997).

Overall, document vectors are derived from the vector representations of the terms that

they contain and as a consequence, documents with similar terms have similar vector

representations. A mathematically computed distance between vectors is then

32

representative of the similarity of the documents they represent. A key aspect is that the

abstract concept of meaning or semantics of a concept is converted into a metric that can

be computationally exploited.

Term-by-context matrices or term-document matrices can be large for text corpora with a

high number of terms and documents. PubMed/MEDLINE for example has both dozens of

millions of terms and documents. Deriving vector representations with term-by-context or

term-document matrices can thus become computationally unfeasible. Methods to

circumvent this problem by reducing the vector dimensionality have been conceived and

shown to effectively preserve the meaning of the represented concepts as described in the

next section.

2.8.2 Scalability and random indexing (RI)

RI, a relatively recent development for dimension reduction, supports the derivation of

semantic distance from large corpora at minimal computational expense (Cohen &

Widdows, 2009). RI further improves the scalability of distributional methods by avoiding

computationally intensive approaches to dimension reduction of the original term-by-

context matrix (Kanerva, Kristofersson, & Holst, 2000; Kanerva, 2010). The algorithm’s

computational complexity scales linearly with increasing size of the input data. It can be

incrementally updated as new documents are added without retraining the whole dataset;

thus it is applicable to large corpora such as MEDLINE. Two variants of RI have been

applied to LBD in our previous work, which can efficiently infer and identify therapeutic

relationships. One is Reflective Random Indexing (RRI) (Cohen et al., 2010), which

models co-occurrence. The other one is Predication-based Semantic Indexing (PSI)

(Cohen, Widdows, Schvaneveldt, Davies, et al., 2012), which implements discovery

33

patterns in vector space. On account of their scalability, these models permit inference on

a scale that would be prohibitively time-consuming if explicit exploration of all possible

reasoning pathways were attempted. This is accomplished through a mechanism known as

“indirect inference” (Landauer & Dumais, 1997), which enables distributional models to

find meaningful connections between terms that do not co-occur with one another directly,

without the need to explore intervening terms explicitly. Further details about these two

LBD distributional semantics models (RRI and PSI) are provided in Section 4.2.1 and 4.2.2

respectively.

2.9 Research opportunities

2.9.1 Active drug surveillance with EHR

With the broader availability of EHR data and the building of data repositories by

integrating different EHR systems (FDA Science Board Subcommittee, 2011; P. Lopes et

al., 2013; Platt et al., 2009; Stang et al., 2010), an active and real time pharmacovigilance

surveillance may be achieved in the near future. This poses informatics challenges from

the data integration perspective. With respect to EHR signal detection, the challenge of

improving signal detection is likely to be a research focus. This can be achieved using

statistical data mining methods, and by improving the accuracy of true drug/ADR co-

occurrence data using informatics methods.

2.9.2 Using distributional semantic models to find plausible drug/ADR associations

Signal evaluation is still a key pharmacovigilance challenge (P. Lopes et al., 2013). The

proposal of substantiating and verifying ADR signals by analyzing the literature and

existing drug-related knowledge bases in an automated fashion raises many research

34

questions. With the scalability of newer distributional semantics methods, and their

capability of modeling the indirect relationships between entities using literature, I posit

that LBD distributional semantics can retrieve evidence from the literature to establish the

plausibility of connections between drugs and ADRs. This will assist the

pharmacovigilance evaluation process by providing relevant evidence for domain experts

to consider for causality assessment in signal evaluation.

In the chapters that follow, I will evaluate this hypothesis by using state-of-the-art

statistical algorithms to analyze an EHR system and identify statistically significant

drug/ADR associations. Scalable LBD models based on distributional semantics are

designed and built to leverage knowledge from the biomedical literature to identify

plausible drug/ADR associations. An evaluation is conducted to determine the validity of

each developed method.

35

Chapter 3: Signal Detection from Outpatient Electronic Health Record Data Using

Statistical Mining

Research suggests data collected by SRS are limited by long time latency, incorrect or

incomplete clinical information, underreporting and reporting bias (Alvarez-Requejo et al.,

1998; Hasford et al., 2002). Clinicians and researchers have also utilized existing

healthcare data sources such as Electronic Health Records (EHR) to attempt to identify

previously unreported adverse drug reactions (ADRs) (Haerian et al., 2012; Harpaz et al.,

2013; Trifirò et al., 2009; Wang et al., 2009). However, these data are inherently noisy as

drugs and potential side effects may co-occur in the EHR for many reasons. In addition,

the EHR often contains free-text data, and the accuracy of Natural Language Processing

(NLP) tools is not perfect. New methods are required to selectively identify potentially

hazardous drug/ADR associations. Consequently, the development of computational

approaches to more accurately detect potential side effects is currently an active area of

research (Bauer-Mehren et al., 2012; Deftereos et al., 2011; Friedman, 2009; Oliveira et

al., 2013; Shetty & Dalal, 2011). These approaches have predominantly focused on

improving signal detection using statistical methods, machine learning or some

combination thereof.

Statistical methods employed mostly involve disproportionality analysis. Other statistical

algorithm based on co-occurrence has also been explored to detect possible side effects

from unstructured clinical notes. Both statistical methods are based on an independence

36

model (Wang et al., 2009; Zorych et al., 2011). Disproportionality measure was developed

from SRS data and has been tested to be able to retrieve drug/ADR associations from EHR

data (Liu, McPeek Hinz, et al., 2012; Zorych et al., 2011) under the premise that the

observed co-occurrence of a drug and an event can be considered as the reported a possible

drug/ADR association (Zorych et al., 2011). Other statistical algorithm based on co-

occurrence, originated from detecting the above-chance frequent occurrence of two entities

from a text corpus, has also been demonstrated to be effective in retrieving possible

drug/ADR associations from clinical corpus (discharge summaries in (Wang et al., 2009)).

The motivation of using this algorithm is based on the hypothesis that a drug entity and a

possible problem entity are more likely to appear together than random combinations of

any drug entity and any problem entity (Cao et al., 2007; Wang et al., 2009).

However, EHR data was not captured for the purpose of reporting side effects, and

consequently there exists information other than drug related problems, e.g. drug treatment

information (symptoms, indication, etc.). So researchers need to select a related cohort for

investigating ADRs using EHR data. Liu et al. (Liu, McPeek Hinz, et al., 2012) selected

drugs and corresponding abnormal laboratory results as possible side effects from

inpatients structured clinical data as the cohort. Wang et al. (Wang et al., 2009) utilized a

NLP tool to process unstructured clinical notes and only selected drug and possible side

effects related to the processed sections as the cohort. Consequently, statistical methods

and machine learning tools are utilized together to analyze unstructured clinical notes for

finding possible drug/ADR associations.

Structured (Liu, McPeek Hinz, et al., 2012) and unstructured (Wang et al., 2009) inpatient

EHR data and structured outpatient (Honigman, Light, Pulling, & Bates, 2001; Honigman,

37

Lee, et al., 2001) EHR data have been demonstrated to be a possible data source to identify

ADRs. However, to our knowledge, unstructured outpatient EHR data hasn’t been used for

ADRs identification. OMOP preliminary results suggest that the performance of using

EHR systems varies by data source (FDA Science Board Subcommittee, 2011). In this

study, an unstructured outpatient EHR data is evaluated to be a feasible data source for

detecting ADRs. Both disproportionality measures and other statistical algorithm based on

co-occurrence are used to analyze the outpatient EHR data and find possible drug/ADR

associations. Corresponding workflow is discussed.

3.1 Materials

3.1.1 Clinical data warehouse (CDW)

The CDW was developed by the Center for Clinical and Translational Science (CCTS) at

the University of Texas Health Science Center at Houston. The outpatient EHR system for

UT Physicians (Allscripts), is hosted on CDW for clinical research usage. It contains

medical records on approximately 364,000 patients treated by UT Physicians (Saitwal et

al., 2012). The batch used in this experiment is 20130130 containing about ten-year clinical

data until January, 2013 and 2,603,279 outpatient clinical notes. This large sample size

provided by the EHR system may provide enough power to detect small frequency ADRs.

This work is qualified for an IRB exemption and has been approved by the Committee for

the Protection of Human Subjects. This project refers to HSC-SBMI-12-0226 – “Detection

of adverse drug events from electronic health records”.

38

3.1.2 Medical language extraction and encoding system (MedLEE)

MedLEE is a NLP system, designed to parse narrative patient reports into structured

representations including UMLS codes for identified concepts (Friedman, Shagina,

Lussier, & Hripcsak, 2004). Friedman et al (Friedman et al., 2004) report the recall of

MedLee for UMLS coding of all terms from 150 randomly selected sentences -- selected

from a text collection consisted of 818,000 sentences that were retrieved from de-identified

patients’ discharge summaries -- as 0.77 compared with a reference standard which was

determined by seven experts’ manual review. The precision of the system was reported as

89% for encoding a second set of 150 randomly selected sentences from the text collection

then subsequently validated by experts. Since the overall evaluation was conducted on the

records from the same institution in which the NLP system was developed, performance

may be lower elsewhere as NLP systems often must be adapted to new contexts.

3.1.3 Side effect resource 2 (SIDER2 (Kuhn, Campillos, Letunic, Jensen, & Bork,

2010))

SIDER2 is a publicly available database containing information on marketed medicines

and their known adverse reactions (Kuhn et al., 2010). SIDER2 was used to construct a

dataset for our experiment and as a reference standard to confirm whether a predicted side

effect is a true adverse reaction. SIDER2 contains 996 drugs, 4,192 side effects, and 99,423

drug/ADR pairs. Only those side effects and drugs that were contained in the EHR data

were retained, leaving a total of 833 drugs, 2123 side effects, and 71,753 drug/ADR pairs.

3.1.4 Hierarchical relationships between concepts from UMLS knowledge base

The number of clinical notes for a problem was extended by grouping related concepts

based on the UMLS semantic network and UMLS Metathesaurus, specifically the

39

MRREL.RRF file (U. S. National Library of Medicine, 2009). This file includes

relationships between UMLS concepts found in the UMLS Metathesaurus, while the

relationship can be synonym (SY), child (CHD), sibling (SIB), parent (PAR) and etc. By

utilizing these ancestor-offspring hierarchical relationships (Figure 3-1), all offspring nodes

for a SIDER2 ADR can be retrieved and subsequently are considered as related concepts,

and their corresponding clinical notes are counted for the SIDER2 ADR. Take Breast

Carcinoma as an example, Breast Carcinoma (C0678222) is a parent node of Mucinous

Carcinoma of Breast (C1334807) and Lobular Carcinoma (C0206692), and a child node of

Carcinoma (C0007097). This hierarchical relatedness can be used to group related concepts

for Breast Carcinoma and consequently to count its clinical notes.

Figure 3-1: The example ancestor-offspring hierarchy relationships between UMLS

concepts

3.1.5 MEDication-Indication (MEDI)

It was anticipated that co-occurring drugs and problems in the clinical notes might often

be related therapeutically. Consequently, I evaluated the utility of an indication knowledge

resource, MEDI (Wei et al., 2013), as a means to eliminate drug indications from

consideration as potential side effects. MEDI is a medication indication resource that was

extracted from a set of commonly used medication resources, including RxNorm,

40

MedlinePlus, SIDER2, and Wikipedia (Wei et al., 2013). MEDI drugs are represented by

RxNorm codes, and indications are represented by ICD-9 codes. MEDI contains 3112

medications and 63,343 medication-indication pairs (MEDI-Complete). Additionally, the

MEDI high-precision subset (MEDI-HPS) was also created by only including indications

that are retrieved from RxNorm or at least two of the three other resources. MEDI-HPS

contains 2136 medications and 13,304 medication-indication pairs. The estimated

precision of MEDI-HPS is about 92%.

3.2 Methods

3.2.1 Annotating the EHR data using MedLEE

Drugs and co-occurring problems from medical records in the CDW were annotated by

MedLEE. The medical records I utilized are free-text, unstructured clinical encounter

documentations in outpatient setting. Each clinical summary was processed by MedLEE,

resulting in an output that included UMLS-codes and semantic types for each extracted

concept. Concepts denoted as a “medication” or “problem” were utilized as the source of

potential side effects. Their occurrence at document level was recorded, to enable

evaluation of the frequency with which a drug or a problem occurred in the EHR data.

3.2.2 Construct test data set

All SIDER2 drugs and side effects that are contained in the EHR data were utilized for the

experiment. All selected drugs and ADRs are paired up to construct a SIDER2 drug/ADR

test set. This resulted in a set of 1,768,459 drug/ADR relationships, 71,753 of which were

true and 1,696,706 of which were false.

41

3.2.3 Retrieve co-occurrence data for SIDER2 test set

Co-occurrence data for the SIDER2 drug/ADR test set were retrieved from the EHR system

and is referred to as original co-occurrence data. By using the ancestor-offspring

hierarchical relationships in UMLS knowledge base, for each SIDER2 ADR, the EHR

clinical documents that contain this concept or this concept’s all related offspring concepts

are considered as ADR positive reports. This may also improve statistical power since the

prevalence of some side effects is relatively low. By doing so, a second set of co-occurrence

data were retrieved for each SIDER2 drug/ADR test set and is referred to as descendent

co-occurrence data.

3.2.4 Statistical algorithms

3.2.4.1 Disproportionality analysis

After retrieving co-occurrence data for the drugs and ADRs that appear in the test set, a

contingency table was constructed for each drug-problem pair. Disproportionality

measures – reporting odds ratio (ROR), proportional reporting ratio (PRR), Yule’s Q

(Yule), information component (IC), Multi-item gamma Poisson Shrinker (MGPS) -- were

applied to identify statistically significant drug/ADR pairs (Table 2-1 in Chapter 2). If the

observed occurrence / expected occurrence is more than a quantitative threshold (which

varies across different disproportionality measures), the drug/ADR combination is

considered as a signal.

3.2.4.2 Other statistical algorithm based on co-occurrence

To test independence, Chi-squared test can be applied to drugs and problem-related entities

extracted from the EHR data using NLP (or structured data if available) to test the strength

of the relationship between drugs and problems based on their paired and independent

42

observations in the dataset. There may exist interrelations between different drugs and

different problems.

The Chi-squared value reflects the magnitude of the dependence. However, traditional Chi-

squared statistics rejects most null hypotheses due to the large sample size (Cao et al.,

2007) which results in false positives. Zorych et al. (Zorych et al., 2011) has shown that

there are large amount of false positive signals detected from EHR data. In Chi-squared

test, each drug/ADR pair is tested separately without taking into account the multiple

comparison. Multiple comparison refers that multiple drug/ADR associations hypotheses

are tested from the same dataset simultaneously (Ahmed, Dalmasso, et al., 2010).

Researchers have proposed statistical approaches to control error rates in multiple

hypothesis testing. Among these, the false discovery rate (FDR) was introduced to measure

the multiple-hypothesis testing error and has been successfully used in large-scale genomic

studies to control false positive results (Benjamini & Hochberg, 1995; J. D. Storey &

Tibshirani, 2003; J. D. Storey, 2002). FDR avoids the over-conservativeness of the

standard Bonferroni approach to multiple hypothesis testing, and has proven to be powerful

in identifying sparse signals from a large number of tests (Cai & Sun, 2009). For example,

FDR is used to detect differential gene expression in replicated DNA microarray

experiments, where unknown dependencies are likely to occur (J. Storey & Tibshirani,

2001). On account of the proven utility of the FDR approach as a means to identify sparse

signals from large datasets (Ahmed, Thiessard, Miremont-Salamé, Bégaud, & Tubert-

Bitter, 2010), I also used the Chi-squared test augmented with the FDR as measuring

significance threshold. The FDR approach (J. D. Storey & Tibshirani, 2003) determines

the statistical significance of a q value for each tested pair. q values are estimated from p

43

values, which measure significance in terms of the false positive rate. The false positive

rate is the rate at which null hypotheses are rejected (false drug/ADR associations are

predicted as positive by a statistical model). For example, a false positive rate of 5% means

that on average 5% of the false drug/ADR associations will be predicted as positive by the

statistical model. The q value measures significance in terms of false discovery rate, the

rate at which statistically significant signals are, in fact, false drug/ADR associations. Table

3-1 describes how overall accuracy or error is measured in m drug/ADR pairs testing.

Table 3-1: Overall error measure in FDR approach (J. D. Storey & Tibshirani, 2003)

Tested

significant

(Positive)

Tested not

significant

(Negative)

Total

Null hypothesis is True

(no relationship between drug

and ADR)

False Positive

(FP)

α (Type I

error)

True Negative

(TN)

m0 (FP+TN)

(# of true null

drug/ADR pairs)

Alternative hypothesis is True

(has relationship between drug

and ADR)

True Positive

(TP)

False Negative

(FN)

β (Type II error)

m1 (TP+FN)

(# of true alternative

pairs)

Total S = 1 (FP+TP)

(# of pairs

tested

significant)

S = 0 m (m0+m1 || S=1 +

S=0)

(total drug/ADR

comparisons)

𝑓𝑎𝑙𝑠𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑟𝑎𝑡𝑒 = 𝐸[𝐹

𝑚0] ≤ 0.05

𝑓𝑎𝑙𝑠𝑒 𝑑𝑖𝑠𝑐𝑜𝑣𝑒𝑟𝑦 𝑟𝑎𝑡𝑒 (𝐹𝐷𝑅) = 𝐸[𝐹

𝑆] = 𝐸[

𝐹

𝐹 + 𝑇] ≤ 0.05

𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 =𝑚0 − 𝐹

𝑚0

𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 =𝑇

𝑚1

𝑓𝑎𝑙𝑠𝑒 𝑑𝑖𝑠𝑐𝑜𝑣𝑒𝑟𝑦 𝑟𝑎𝑡𝑒 (𝐹𝐷𝑅) = 𝐸[𝑚0 ∙ (1 − 𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦)

𝑚0 ∙ (1 − 𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦) + 𝑚1 ∙ 𝑠𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦]

44

3.2.5 Filtering indications

To evaluate the utility of existing knowledge of indication, the two MEDI lists were applied

to exclude indication relationships from the false pairs and tested using the top 5 best

performing statistical models using the original SIDER2 test set.

3.2.6 Performance evaluation

The performance for different statistical algorithms were calculated and compared using

the original SIDRE2 test set. The best five performing models were advanced for testing

with different MEDI interventions and procedures to group related concepts.

3.3 Experimental design

The CDW EHR clinical narratives were processed and annotated by MedLEE (Friedman

et al., 2004) with default setting. MedLEE extracts biomedical concepts from the EHR and

categorizes these into semantic types based on large lexicon. Within MedLEE’s annotated

output, those entities labels as “medication” and “problem” were considered as candidate

drugs and ADRs respectively. SIDER2 (Kuhn et al., 2013) was used to construct a

drug/ADR reference standard (SIDER2 drug/ADR test set) by pairing up all drugs and

ADRs that exist in both the EHR data and SIDER2 to generate a set of 71,753 true and

1,696,706 false drug/ADR relationships. Since SIDER2 was extracted from package

inserts by text mining tools, there may exist false drug/ADR associations because of the

text mining error or over reporting of side effects (Duke J, Friedlin J, & Ryan P, 2011).

Existing disproportionality measures and Chi-squared test with FDR for detecting

drug/ADR associations from outpatient clinical notes were calculated and compared. In

45

this experiment, different thresholds (4, 6, 10, 25) of co-occurred drug/ADR reports were

considered as the second condition to predict true drug/ADR associations.

Clinical narratives may have different concepts representing a SIDER2 ADR. Clinical

notes containing these related concepts can be retrieved to expand the examples available

for each SIDER2 ADR. In addition, a co-occurring drugs and problems may indicate

therapeutic relationships, the MEDI lists were used to filter drug indication relationships

from drug/problem candidates. Overall, my experiments concerned three methodological

variants: (1) the choice of statistical measure; (2) increasing the examples available for

each potential ADR by grouping related concepts for a SIDER2 problem; and (3) filtering

known indication relationships.

3.4 Results

3.4.1 Experiment dataset

There are 2,603,279 clinical narratives (narrative notes in CDW updates to 01/30/2013)

that were annotated by MedLEE. 2,325,614 notes that contain at least medication or

problem were used for the experiment. There are 229 narrative types in this batch. 23.5%

of the notes are labeled as “chart note” (Table 3-2). A chart note, also called as a progress

note, is dedicated when an established patient is seen for a repeat visit.

In the EHR system, there are 7780 drugs and 10,670 problems. For SIDER2 set, there are

833 SIDER2 drugs that are also contained in the CDW, and 2123 SIDER2 side effects that

are contained in the CDW. These overlapping drugs and problems between SIDER2 and

CDW were used to build an evaluation set for the experiment. In this evaluation set, all

46

drugs were paired up with all problems. The true drug/ADR pairs are the pairs that are exist

in SIDER2. The false drug/ADR pairs are the pairs that do not exist in SIDER2.

Table 3-2: Top 10 clinical documentation types in the EHR system

clinical documentation

type

Number of narrative

notes

Percentage of total

clinical notes

Chart Note 546,025 0.235

Clinical Note 180,192 0.077

Clinical Summary-

RTF

144,486 0.062

Pre-Live Dictation. 120,295 0.052

Telephone Note 118,701 0.051

Nurse Note 113,253 0.049

Est Patient 96,567 0.041

Established 81,496 0.035

New Patient 63,110 0.027

UT Imaging. 60,419 0.026

3.4.2 Query the co-occurrence data for experiment datasets

For the drug/ADR pair in the evaluation set, the number of notes that contain the drug-of-

interest and ADR-of-interest were counted to quantify co-occurrence. As discussed

previously, to improve the statistical power, the notes that contain related problems-of-

interest were also considered relevant notes when estimating the drug-problem co-

occurrence counts for this problem (so any drugs co-occurring with a taxonomically related

problem would be counted as though they had co-occurred with the index problem). On

account of the two options for measuring associations with respect to related problems,

there are two contingency tables for each experiment set (original vs. descendent co-

occurrence).

47

3.4.3 Statistical results

3.4.3.1 Performance results

3.4.3.1.1 Performance of statistical algorithms with original co-occurrence data

The performance of predicting true positives, precision, recall and F-measure for different

statistical algorithms with different co-occurrence threshold are compared and plotted in

Figure 3-2. The average precision is 0.1315±0.0105. From this, I can anticipate that 10-

15% of recovered signals from outpatient clinical notes can be true side effects. Chi-

squared statistics with FDR threshold is the besting performing model with an F-measure

of 0.1826. The best 5 performing statistical models based on F-measure are used in the

following analysis.

48

Figure 3-2: Comparison of ability of different performance metrics (R-ROR, P-PRR, Y-

Yule, C-Chi-square, F-Chi-square with FDR, M-MGPS) to recover SIDER2 side effects

for different statistical algorithms with different drug/ADR co-occurrence threshold; top

left, top right, bottom left, and bottom right are the comparison of number of true

positives, precision, recall, and F-measure, respectively

3.4.3.1.2 Effects of potential ontology intervention

The best 5 performing statistical models from previous results were used in comparing

their performance between original and descendant co-occurrence data (Figure 3-3). With

the descendant co-occurrence data, the algorithms detect more true positives, but this was

counter-effected by more false positives retrieved and leads to the decreased precision.

Overall, the F-measure decreases with the descendant co-occurrence data (Figure 3-4).

49

Figure 3-3: Comparison of the predicted drug/ADR association count comparing between

original and descendant co-occurrence set for best five performing statistical models (F2-

Chi-square with FDR and 25 co-occurrence; C-Chi-square with 25 co-occurrence; F1- Chi-

square with FDR and 10 co-occurrence; P-PRR with 10 co-occurrence; Y-Yule with 10 co-

occurrence); top left, top right, bottom left, and bottom right are the comparison of number

of true positives, false positives, true negatives, and false negatives, respectively

50

Figure 3-4: Comparison of ability of different performance metrics to recover SIDER2

side effects for best five performing statistical algorithms (F2-Chi-square with FDR and 25

co-occurrence; C-Chi-square with 25 co-occurrence; F1- Chi-square with FDR and 10 co-

occurrence; P-PRR with 10 co-occurrence; Y-Yule with 10 co-occurrence); top left, top

right and bottom are the comparison of precision, recall, and F-measure, respectively

3.4.3.1.3 Effects of MEDI intervention

In this test, I selected the five statistical models with the best F-measure and evaluated the

effects of different MEDI interventions. Since I excluded indications from false pairs in

the reference standard (Table 3-3), it is expected that this will decrease the number of false

positives (consequently increasing the true negatives). However, the number of true

positives will not be changed, so recall should not be changed either. The precision may be

improved because of the decreasing of false positives, as may the F-measure. The results

51

of these experiments are shown in Figure 3-5 and 3-6 below, which present the filtering

the indications can decrease the predicted false positives and correspondingly increase the

precision and F-measure.

Table 3-3: Number of false pairs that have been excluded as indications

Reference Set true pairs false pairs Total pairs

Without filtering out indications 71,753 1,696,706 1,768,459

Filtering indication by MEDI-HPS 71,753 1,694,739 1,766,492

Filtering indication by MEDI-

Complete

71,753 1,690,353 1,762,106

Figure 3-5: Comparison of the predicted drug/ADR association count with different

MEDI filtering indications for best five performing statistical models (F2-Chi-square

with FDR and 25 co-occurrence; C-Chi-square with 25 co-occurrence; F1- Chi-square

52

with FDR and 10 co-occurrence; P-PRR with 10 co-occurrence; Y-Yule with 10 co-

occurrence); top left, top right, bottom left, and bottom right are the comparison of

number of true positives, false positives, true negatives, and false negatives, respectively

Figure 3-6: Comparison of ability of different performance metrics to recover SIDER2

side effects for best 5 performing statistical algorithms (F2-Chi-square with FDR and 25

co-occurrence; C-Chi-square with 25 co-occurrence; F1- Chi-square with FDR and 10

co-occurrence; P-PRR with 10 co-occurrence; Y-Yule with 10 co-occurrence) after

filtering indications by different MEDI lists; top left, top right and bottom are the

comparison of precision, recall, and F-measure, respectively

3.5 Discussion

This experiment demonstrated that unstructured outpatient data can be used as a data source

for detecting drug/ADR associations. The statistical testing procedure is well established

for signal detection and can be potentially improved by considering the error controlling

53

within multiple comparison setting. Filtering known therapeutic relationships from

possible drug/ADR candidates can improve the overall performance; this should become a

required component in pharmacovigilance using EHR data.

This work has several limitations. MedLEE was used to extract medications and problems

from the CDW EHR system. After reviewing the annotated problems, there were some

unlikely problem terms such as “arms” or “legs”. These are less likely problems related to

medications and maybe caused by NLP processing errors. Since the procedure is not tied

to a specific NLP tool, there is also the possibility of selecting another NLP tool with better

performance. With the improvement of NLP technology, this method should get better

performance accordingly.

There is no agreed-upon gold standard in the field of pharmacovigilance (Tatonetti et al.,

2012), and it has been pointed out that SIDER2 data set may contain false drug/ADR

associations (Duke J et al., 2011). This is likely to affect the performance evaluation by

including wrong validation data. In addition, SIDER doesn’t include all the drugs that were

used in the EHR system. Possible causes include the usage of different drug names, new

drugs being used in clinical practice and terminology mapping or NLP annotation errors.

This further hints at the fact that a comprehensive and accurate reference set is needed for

accurately evaluating pharmacovigilance systems or methods.

By exploiting UMLS hierarchical relationships, sub-concepts for a SIDER problem can be

grouped together and the pathways between those sub-concepts can be used to relate to the

comprising SDIER2 problem concept. However, initial experiments didn’t show an

improvement of the overall performance in the experiment. This does not however

necessarily imply that grouping related concepts isn’t suitable for improving the signal

54

detection performance. To assess this better, future studies are needed to scrutinize the

effects of subsumption and grouping of concepts that are related through a hierarchy.

Ontological inference experiments should be used to group related concepts in a more

controlled and accurate manner to determine the effects on signal detection performance.

3.6 Conclusion

Drug/ADR signals can be detected from the outpatient unstructured EHR data with existing

disproportionality measures and Chi-square statistics with FDR. Filtering known

therapeutic relationships from possible drug/ADR combinations can exclude confounding

factors and improve the performance. With the precision of about 10-15% of evaluated

SIDER2 by statistical algorithms, additional signal evaluation is needed for the detected

drug/ADR signals.

55

Chapter 4: Using Knowledge Extracted from the Literature to Identify Plausible

Adverse Drug Reactions1

Over the last decade, drug safety data obtained from spontaneous reporting systems (SRSs)

have been analyzed using quantitative data mining procedures to retrieve strongly

associated drug/ADR pairs (Puijenbroek et al., 2003; Rawlins, 1988a, 1988b). These

highlighted associations are subsequently reviewed and scrutinized by domain experts.

Review by domain experts is required to evaluate a signal using their knowledge and

judgment to find a signal with clinical significance. However, on account of the human-

intensive nature of this task, automated assistance is desirable. In this study, I attempt to

partially automate this aspect of the signal evaluation process. I do so using methods that

leverage knowledge extracted from the biomedical literature as a means to assess the

plausibility of an observed drug/ADR association.

4.1 Materials

In this study, MetaMapped Medline Baseline (MMB) and Semantic MEDLINE Database

(SemMedDB) were used to represent knowledge from the biomedical literature. Side

Effect Resource 2 (SIDER2) was used as data set for drug/ADR associations. The Semantic

Vectors package was used to build concept-based (RRI, Reflective Random Indexing) and

1 This chapter is published at: Shang, N., Xu, H., Rindflesch, T. C., & Cohen, T. (2014). Identifying

plausible adverse drug reactions using knowledge extracted from the literature. Journal of Biomedical

Informatics.

56

predication-based (PSI, Predication-based Semantic Indexing) semantic space models

(Cohen et al., 2010; Cohen, Widdows, Schvaneveldt, Davies, et al., 2012).

4.1.1 MMB and SemMedDB

2012 MMB was used as a repository for concept-based modeling. The MMB contains

20,494,848 articles included in Medline up to November, 2011 and contains 399,701

distinct concepts. The SemMedDB V2.2 (semmedVER22) was used for predication

modeling, which was processed with SemRep version 1.5. This was the current version

when the experiments started. SemMedDB contains 22,252,812 citations included in

Medline up to March 31, 2013 and contains 63,795,467 predications. There are 58 distinct

predicates and 257,350 distinct concepts in SemMedDB. There are also negated

predications in the SemMedDB repository (e.g. anticoagulant_therapy NEG_TREATS

(does not TREAT) phlebitis). However, the number of negative predications is relatively

small (1.2% of total predications) and consequently they were not included in the PSI

model.

4.1.2 SIDER2


and their known adverse reactions (Kuhn et al., 2010). SIDER2 was used to construct a

dataset for the experiment and as a reference standard to confirm whether a predicted side

effect is a true adverse reaction. SIDER2 terms were normalized by mapping drug and side

effects terms to UMLS CUI with UMLS Terminology Services (UTS) API 2.0 (U.S.

National Library of Medicine & National Institutes of Health, 2012). These UMLS CUIs

were then subsequently searched in SemMedDB and MMB to retrieve the mapped UMLS

concepts which are represented in SemMedDB and MMB.

57

SIDER2 contains 996 drugs, 4,192 side effects, and 99,423 drug/ADR pairs. Only those

side effects and drugs that were represented in both the RRI and the PSI spaces were

retained, so the reference set contains 959 drugs, 3,436 side effects, and 90,787 drug/ADR

pairs. Each vector model’s search space was composed of vectors representing the SIDER2

side effects. For the PSI model, SIDER2 drug/ADR pairs were also used as training data

to infer predicate reasoning pathways.

4.1.3 MEDication-Indication (MEDI)

As I would anticipate connections in the literature between medications and diseases they

treat, I evaluated the utility of another knowledge resource, MEDI (Wei et al., 2013), as a

means to eliminate drug indications from consideration as potential side effects. MEDI is

a medication indication resource that was extracted from a set of commonly used

medication resources, including RxNorm, MedlinePlus, SIDER2, and Wikipedia (Wei et

al., 2013). MEDI drugs are represented by RxNorm codes, and indications are represented

by ICD-9 codes. MEDI contains 3,112 medications and 63,343 medication-indication

pairs. Additionally, the MEDI high-precision subset (MEDI-HPS) was created by only

including indications that are retrieved from RxNorm or at least two of the three other

resources. MEDI-HPS contains 2,136 medications and 13,304 medication-indication pairs.

The estimated precision of MEDI-HPS is about 92% (Wei et al., 2013).

In the experiments, MEDI was used to eliminate drugs’ indications from the side effects

search space. To do so, all terms representing drugs (RxCUI) and indications (ICD-9 codes)

in MEDI were normalized to UMLS concepts, and then each drug’s indications were

filtered from this drug’s search space. In many cases, there exist hierarchical relationships

between concepts. For example, C0264702 acute myocardial infarction of apical-lateral

58

wall is a child node of C0155626 acute myocardial infarction. So in the experiments, I

extended the MEDI list by aggregating the related concepts by different hierarchical

relations. I tested these various extensions of the MEDI list as different MEDI

interventions.

4.2 Methods

4.2.1 RRI

RRI (Cohen et al., 2010) is a variant of RI adapted to enable the recognition of meaningful

indirect associations. The variant of RRI I used for the experiments allows for the

estimation of semantic relatedness between UMLS concepts, and proceeds as follows.

First, all terms in the text corpus are assigned unique vector representations, known as

elemental vectors. I will refer to the elemental vector for concept C as E(C) for the

remainder of this manuscript. In accordance with the RI paradigm (Kanerva, 2009),

elemental vectors are generated stochastically. In this way, RI creates unique fingerprints

for all terms in the text corpus. The vector components can be binary, ternary, real, or

complex values (Kanerva, 2009; Widdows & Cohen, 2012). In the experiments, I use

32,000 dimensional binary vectors constructed in accordance with the Binary Spatter Code

(BSC) (Kanerva, 1994), one of a family of representational approaches known as Vector

Symbolic Architectures (VSAs) (Gayler, 2004; Kanerva, 1994, 1996; Plate, 1995). This

dimensionality was selected based on the results of simulation experiments in previous

research (Wahle, Widdows, Herskovic, Bernstam, & Cohen, 2012), which suggest that at

this dimensionality around 2,000 unique elemental vectors can be superposed with low

probability of the superposed product being closer to some other elemental vector in the

59

space than its component vectors. However, I did not attempt to optimize this parameter,

and would anticipate some improvement in accuracy in exchange for the additional

computational work required to perform these experiments at higher dimensionalities. In

the BSC, elemental vectors are constructed by distributing an equal number of 1’s and 0’s

at random across the dimensions of the vector concerned. Consequently, elemental vectors

have a high probability of being orthogonal or close-to-orthogonal to each other, with

orthogonality defined as a Hamming Distance (HD) of half the dimensionality of the

vectors concerned (Kanerva et al., 2000; Kanerva, 1994, 1996).

The next step is to generate vector representations of documents, by superposing the

elemental vectors of the terms contained in these documents. With binary vectors,

superposition is accomplished by keeping track of the number of 1’s and 0’s that have been

added in each dimension, and assigning the value in this dimension using the majority rule,

with ties split at random. I will refer to this operation by using the “+”symbol, with “+=”

indicating a superposition that includes the vector on the left of the operator also (so

DOC(D) += E(C) is equivalent to DOC(D) = DOC(D) + E(C), a common operation during

training).

In the experiments, this superposition is weighted using the Log-Entropy weighting

procedure. The local term weight for term i in document j (lij) is derived from the frequency

of a term in a document. The global weight for term i (gi) describes the frequency of the

term within the entire text corpus. They are computed with Equation 1.

This weighting scheme reduces the influence of high frequency terms that may be

uninformative, and tempers the influence of terms that recur frequently within a single

document (Dumais, 1991). Once document vectors have been generated (Equation 2), it is

60

possible to generate vector representations of concepts (in this case, or terms in the general

case), known as semantic vectors. I will refer to the semantic vector for concept C as S(C).

Semantic vectors are constructed by superposing vector representations of the documents

a concept occurs in.

𝐿𝑜𝑔𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑡𝑒𝑟𝑚 𝑖, 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡 𝑗) = 𝑙𝑖𝑗 × 𝑔𝑖

𝑙𝑖𝑗 = log(𝑡𝑒𝑟𝑚 𝑖, 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡 𝑗) = log(1 + 𝑡𝑓𝑖𝑗)

𝑔𝑖 = 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑡𝑒𝑟𝑚 𝑖) = 1 − ∑𝑝𝑖𝑗 × log 𝑝𝑖𝑗

log 𝑛𝑗, 𝑤ℎ𝑖𝑙𝑒 𝑝𝑖𝑗 =

𝑡𝑓𝑖𝑗

𝑔𝑓𝑖

𝑡𝑓𝑖𝑗: 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑜𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒𝑠 𝑜𝑓 𝑡𝑒𝑟𝑚 𝑖 𝑖𝑛 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡 𝑗

𝑔𝑓𝑖: 𝑔𝑙𝑜𝑏𝑎𝑙 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝑓𝑜𝑟 𝑡𝑒𝑟𝑚 𝑖 (𝑡𝑜𝑡𝑎𝑙 𝑜𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒𝑠 𝑜𝑓 𝑡𝑒𝑟𝑚 𝑖 𝑖𝑛 𝑡𝑒𝑥𝑡 𝑐𝑜𝑟𝑝𝑢𝑠)

𝑛: 𝑡𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡𝑠 𝑖𝑛 𝑡𝑒𝑥𝑡 𝑐𝑜𝑟𝑝𝑢𝑠

Equation 1: Log-Entropy weighting equation

𝑆(𝑑𝑜𝑐)+= 𝐸(𝑡𝑒𝑟𝑚) ∙ 𝑙𝑜𝑐𝑎𝑙 𝑤𝑒𝑖𝑔ℎ𝑡 ∙ 𝑔𝑙𝑜𝑏𝑎𝑙 𝑤𝑒𝑖𝑔ℎ𝑡 = 𝐸(𝑡𝑒𝑟𝑚) ∙ 𝑙𝑖𝑗 ∙ 𝑔𝑖

Equation 2: Generating document vectors with weighting procedure in RRI

Superposition of binary vectors requires maintaining a “voting record” that keeps track of

the number of 1’s and 0’s added in each dimension. When local and global weighting

metrics are utilized, the “votes” may not be integer values. So, for example, if the vector

1010 were added with a weight of 0.5, a straightforward implementation of the voting

record would add 0.5 to the dimensions of the voting record corresponding to the 1’s, and

subtract 0.5 from the dimensions corresponding to the 0’s. Normalization involves tallying

these votes. After training is complete, those dimensions of the voting record with positive

values would be assigned 1, those with negative values would be assigned 0, and those

with a zero value would be assigned either 1 or 0 at random. In practice, however, it is

61

computationally inconvenient to maintain and update 32,000 real values to serve as a

voting record for each semantic vector. Consequently, the Semantic Vectors package

employs a binary matrix approximation of the voting record, which sacrifices some

floating-point precision in exchange for computational efficiency. These implementation

details are provided in (Widdows & Cohen, 2012).

These operations are expressed concisely in the pseudo code in Figure 4-1, adapted from

(Cohen et al., 2010). A schematic representation for RRI is shown in Figure 4-2. I used

Semantic Vectors Version 3.7 to build RRI vectors. Once semantic vectors were

constructed, the relatedness between drugs and ADRs was estimated as [1 −

2

𝑛𝐻𝑎𝑚𝑚𝑖𝑛𝑔𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝑥, 𝑦)]. Therefore, a ranked list of ADRs for each drug was

provided.

q := # total terms

n := # dimensions, n := 32, 000

p := # total documents

m := # total UMLS concepts

k := # terms in a specific document OR # documents related to a specific concept

⊳ initialize elemental vectors E(T)

for i < q do

generate a zero vector of dimension n

in E(Ti) arbitrarily set half of the zero dimensions to 1

end for

⊳ train document model (E(T), p)

for j < p do

for term i in document j do

DOC(Dj) + = E(Ti) × LogEntropy (term i, doc j)

end for

normalize (DOC (Dj))

end for

62

⊳ train concept model (DOC (D), m)

for j < m do

for each of k documents concept j occurs in do

S(Cj) + = DOC(Dk)

end while

normalize (S(Cj))

end for

Figure 4-1: The pseudo code for RRI model training process

Figure 4-2: Schematic representation of RRI training and inference process

63

4.2.2 PSI

4.2.2.1 Operations in PSI

The PSI model provides the means to implement discovery patterns for LBD using

distributional semantics (Cohen, Widdows, Schvaneveldt, Davies, et al., 2012; Cohen,

Widdows, Schvaneveldt, & Rindflesch, 2012). This is accomplished by representing

concepts and relationships extracted by SemRep as high-dimensional vectors using an

adaption of RI. In previous work, PSI has been applied to discover therapeutic relationships

(Cohen, Widdows, Schvaneveldt, Davies, et al., 2012) using a two-stage process of

discovery by analogy: first a geometric operator is used to infer discovery patterns from

known treatments, then the identified discovery patterns are used to infer previously unseen

therapeutic relationships.

In addition to the superposition operation described previously, the PSI model utilizes a

binding operation. Binding (⊗) is a compositional operation that is provided by VSAs,

such as the BSC (Gayler, 2004; Kanerva, 1996). Binding two elemental vectors generates

a third vector, which is dissimilar from these two component vectors. The binding

operation is reversible (release ⊘). With binary vectors, pairwise exclusive OR (XOR) is

used to accomplish both binding (⊗) and release (⊘).

4.2.2.2 PSI training process

The training process for generating semantic vectors proceeds as follows:

(1) Generate elemental vectors for all concepts and relations occurring in semantic

predications;

(2) generate a semantic vector for each concept, initially empty;

64

(3) for each predication (concept-predicate-concept), bind the elemental vector of one

concept and the elemental vector of the predicate, and add this bound product to the

semantic vector for the other concept.

During step (3), a statistical weighting scheme is applied. For the predication C1 P C2, the

semantic vector S (C2) is generated as shown in Equation 3.

𝑆(𝐶2)+= 𝐸(𝐶1) ∙ 𝑃𝑓 ∙ (𝑖𝑑𝑓𝑃 + 𝑖𝑑𝑓𝐶1)

Equation 3: Generating second degree of Semantic Vectors

The global weight Pf is derived from the number of times that the predication occurs in the

SemMedDB. The local weight idf (inverse document frequency of the concept C or the

predicate P) reflects the occurrence of the concept across all documents. They are

computed as shown in Equation 4.

𝑃𝑓 = log(1 + 𝑜𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒𝑠 𝑜𝑓 𝑝𝑟𝑒𝑑𝑖𝑐𝑎𝑡𝑖𝑜𝑛 𝐶1 𝑃 𝐶2

𝑖𝑑𝑓𝐶/𝑃 = log(𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑜𝑡𝑎𝑙 𝑝𝑟𝑒𝑑𝑖𝑐𝑎𝑡𝑖𝑜𝑛𝑠

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑝𝑟𝑒𝑑𝑖𝑐𝑎𝑡𝑖𝑜𝑛𝑠 𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑖𝑛𝑔 𝐶/𝑃)

Equation 4: Weighting procedure in PSI

The pseudo code for PSI is displayed in Figure 4-3.

q := # unique UMLS concepts and predicates from SemMedDB

n := # dimensions, n := 32, 000

p := # total predications

k := # predications related to a specific concept

⊳ initialize elemental vectors E (C) and E (P)

for i < q do

65

generate a zero vector of dimension n

in E (Ci) or E (Pi) arbitrarily set half of the zero dimensions to 1

end for

⊳train semantic model (E, p)

for j < p do

predication j: C1 P C2

𝑆(𝐶1)+= 𝐸(𝐶2)⨂𝐸(𝑃 − 𝐼𝑁𝑉) ⋅ 𝑃𝑓 ⋅ (𝑖𝑑𝑓𝑃 + 𝑖𝑑𝑓𝐶2)

𝑆(𝐶2)+= 𝐸(𝐶1)⨂𝐸(𝑃) ⋅ 𝑃𝑓 ⋅ (𝑖𝑑𝑓𝑃 + 𝑖𝑑𝑓𝐶1)

end for

normalize S(C)

Figure 4-3: The Pseudo code for PSI model training process

All concepts and relations were assigned a binary elemental vector of 32,000 bits in length.

The semantic vector of each concept was generated by superposing bound products related

to this concept, where the bound products were produced by binding the elemental vectors

for the other concept and predicate elemental vectors in each predication this concept

occurs in. The search space of SIDER2 side effects contains 3,436 ADRs.

4.2.2.3 Inferring discovery patterns

After training the semantic vectors, the PSI model can be used to infer discovery patterns

by “releasing” the semantic vector of a drug using the semantic vector of its ADR.

The bound product of the drug’s semantic vector and discovery patterns’ vectors can be

subsequently used as a query vector to search the vector space of side effects. In this

procedure, discovery patterns were inferred from all known drug/ADR associations. For

each drug, the five discovery patterns that were most frequently inferred from all other

drugs and their ADRs were retained.

66

The pathways connecting drugs to side effects may not be restricted to one middle term

(and two predicates). In previous experiments predicting therapeutic relationships,

performance was improved by including pathways of three predicates and two middle

terms (Cohen, Widdows, Schvaneveldt, & Rindflesch, 2012). This is accomplished by

generating a second-degree semantic vector for a concept, S2(concept), by adding together

the (first-degree) semantic vectors of all concepts connected to it by a predicate of interest.

In the experiments, the two most popular predicates from inferred double-predicate

reasoning pathways -- INTERACTS_WITH and COMPARED_WITH -- were used to

build second-degree semantic vectors S2. This vector is then used as an alternative starting

point for the inference procedure. From this point, the five most frequently inferred double-

predicate reasoning pathways using the second order semantic vector of all other drugs and

the (first order) semantic vectors of their ADRs were retained. As these inferred pathways

connect to drugs through either INTERACTS_WITH or COMPARED_WITH, they are

referred to as triple-predicate pathways.

4.2.2.4 Applying discovery patterns to find possible ADRs (Step 5 in Figure 4-4)

To combine query vectors for frequently inferred reasoning pathways into one search

expression, I use a disjunction operation that originates in the quantum logic of Birkhoff

and von Neumann, and was first applied to information retrieval by Widdows and Peters

(Birkhoff & Von Neumann, 1936; Widdows & Peters, 2003). I define the disjunction of

these five query vectors as a query subspace derived from them using a binary vector

approximation (Cohen, Widdows, De Vine, Schvaneveldt, & Rindflesch, 2012) of the

Gram-Schmidt orthonormalization procedure (Golub & Van Loan, 2012). The length of

67

the projection of some other vector in this subspace provides an estimate of vector-

subspace similarity.

For the double-predicate discovery patterns model, a drug’s query subspace was

constructed from this drug’s first-degree semantic vector bound to the vector

representations of the five double predicate reasoning pathways most frequently inferred

from other drugs. For the double- and triple-predicate discovery patterns model, a drug’s

query subspace also included this drug’s second-degree semantic vector bound to vector

representations of the five reasoning pathways most frequently inferred from the second-

degree semantic vectors of other drugs.

The length of the projection of the semantic vector for a candidate ADR into a drug’s query

subspace was used to estimate the relatedness between these entities, providing a ranked

list of potential ADRs for each drug.

Figure 4-4 provides an overview of the PSI-based analogical reasoning process in its

entirety.

68

Figure 4-4: Schematic representation of PSI training and inference process.

Triglycerides: TG; myocardial infarction: MI; INTERACTS_WITH: IW;

COEXISTS_WITH: CoeW; ASSOCIATED_WITH: AW; COMPARED_WITH: ComW

69


An overview of the experimental design is shown in Figure 4-5. The first experiment was

conducted without knowledge of drug indications. The concept-based RRI model and

discovery pattern-based PSI model were compared with respect to their ability to identify

known drug/ADR associations. In the second experiment, the model with the best

performance from the first experiment was used to evaluate the effect of eliminating known

indications from the list of predictions.

Figure 4-5: Experimental design in the detection of SIDER2 known ADRs using LBD

distributional semantic models

70

4.3.1 Experiment 1 design

Distributional semantic vectors were used to model MMB and SemMedDB. RRI vectors

and PSI vectors formed the basis for the models of LBD concept-based co-occurrence and

LBD discovery patterns, respectively. As MetaMap may retrieve many more concepts from

a particular document than SemRep retrieves predications, I varied the RRI model to assess

the extent to which observed effects were due to the advantage of a more extensive (albeit

less structured) knowledge base. In one case, a RRI space was derived from only those

sentences from which predications were extracted. Consequently, there are three

distributional semantic models -- RRI built from documents (RRI-from-document group),

RRI built from predication source sentences (RRI-from-predication group), and PSI built

from predications. The PSI model was evaluated with two settings. In the first case, only

two-predicate discovery patterns were considered (PSI-double group), while the second

case considered both two- and three-predicate patterns (PSI-double+triple group). The

elemental vectors for terms, which are not meaningfully related to one another, were used

to implement a random baseline (Baseline group).

With the RRI models, for each drug, related problems were sought by comparing each

vector in the side effect search space to this drug’s vector representation.

With the PSI model, SIDER2 known drug-side effect pairs were used to infer predicate

paths. For the PSI-double group, each drug’s query subspace was built as the disjunction

of the bound products between the drug and its five double-predicate reasoning pathways.

For the PSI-double+triple group, each drug’s query subspace additionally included the

second degree semantic vector of this drug bound to five triple-predicate paths. The five

triple-predicate paths were retrieved by the extension of second degree semantic vectors of

71

drugs. Comparing a drug’s query subspace with each vector in the search space allowed us

to infer the drug’s possible side effects.

4.3.2 Experiment 2 design

From preliminary results, I found that there were some indications in the inferred ADRs.

So I hypothesized that excluding known indications for drugs from the search space would

improve performance. I tested this hypothesis in the second experiment utilizing

knowledge of drug indications from MEDI. In this experiment, I tested variants of the

MEDI indication list using the best performing model from the first experiment (PSI double

+ triple group). I extended the MEDI-complete and MEDI-HPS lists to include all

offspring, or immediate offspring nodes based on the UMLS semantic network utilizing

the MRREL.RRF file. This file includes relationships between UMLS concepts found in

the UMLS Metathesaurus (U. S. National Library of Medicine, 2009). By utilizing these

ancestor-offspring hierarchical relationships, I define an offspring node as a node that has

a MEDI indication as an ancestor (regardless of the number of intervening nodes); and an

immediate offspring node as a node that has this MEDI indication as its parent.

In this procedure, I first normalized all MEDI terms. For MEDI drugs, I mapped each

drug’s RxCUI to a UMLS CUI with the RxNorm API (U.S. National Library of Medicine

& National Institutes of Health, 2014) and then searched the UMLS CUI in SemMedDB

and MMB to retrieve the mapped UMLS concept. For MEDI indications, I mapped each

indication’s ICD-9 term to a UMLS CUI using the UTS API 2.0 (U.S. National Library of

Medicine & National Institutes of Health, 2012) and then subsequently searched for this

UMLS CUI in SemMedDB and MMB to retrieve the mapped UMLS concept. After

72

normalizing MEDI terms, the hierarchical relation of synonym (SY), child (CHD), and

sibling (SIB) in MRREL.RRF were used to find drugs’ MEDI indications extended

offspring or immediate offspring. Consequently, there were six MEDI lists (Table 4-1).

These MEDI lists were used to exclude indications from the side effect search space and

were tested in the second experiment.

Table 4-1: Different groups by extending MEDI in experiment 2

MEDI intervention Extension Procedure Experiment Group

No MEDI None No-MEDI group

MEDI-HPS

indications for

SIDER2 drugs

None MEDI-HPS group

All synonym (SY), child (CHD),

and sibling (SIB), as well as their

offspring.

MEDI-HPS-offspring

group

Immediate synonym (SY), child

(CHD), and sibling (SIB)

relationships.

MEDI-HPS-immediate

offspring group

MEDI-complete

indications for

SIDER2 drugs

None MEDI-complete group

All synonym (SY), child (CHD),

and sibling (SIB), as well as their

offspring.

MEDI-complete-offspring

group

Immediate synonym (SY), child

(CHD), and sibling (SIB)

relationships.

MEDI-complete-

immediate offspring group

4.3.3 Performance measurements

To evaluate performance, I used a number of widely used metrics. Precision measures the

proportion of accurate ADRs in relation to the total number of ADRs retrieved (Salton,

Fox, & Wu, 1983). To evaluate the precision at different points in a ranked list, I used

average precision (AP, the average of the precision values measured at the point at which

73

each correct result is retrieved for one example (Manning, Raghavan, & Schütze, 2008)).

Mean average precision (MAP) is the average of the AP across all drugs. Precision at k

(Manning et al., 2008) measures the precision at fixed levels of retrieved results and

emphasizes the importance of finding relevant results early. I evaluated precision at k=50

(Pk=50). Recall represents the proportion of ADRs retrieved out of the total number of ADR

associations in the reference standard (Salton et al., 1983).

A “rediscovery” (true discovery) is defined as an adverse effect inferred by a vector model

and subsequently confirmed by SIDER2 as a true prediction. Consequently, the median

rediscovery rank for a particular drug approximates the point in the ranked list produced

by a particular model at which half of the known adverse reactions for this drug were

recovered.

The AP and median rank of the rediscoveries across drugs were compared by the paired t

test and the Wilcoxon matched-pairs signed-rank test, respectively.

To measure the performance with respect to the true positive rate (TPR) and false positive

rate (FPR), receiver operating characteristic (ROC) curve was plotted for all drug/ADR

pairs for all models. Subsequently, a global area under the ROC curve (AUC, “global”

indicates that the scores of all drug/ADR pairs were combined into a single curve) was

calculated using AUCCalculator (Davis & Goadrich, 2006). For the model with the best

global AUC, a drug-based AUC was also calculated and compared between drugs.

4.4 Results

4.4.1 Experiment 1

74

4.4.1.1 Inferring discovery patterns

The most strongly associated double-predicate path was calculated for each known

drug/ADR pair. In total, 90,787 predicate paths were inferred. Among them, there were

1,485 unique predicate paths. The five most frequently inferred double-predicate paths

were selected. Second degree semantic vectors for drugs were constructed by adding

together the semantic vector representations of any concept occurring in a semantic

predication with the drug in question, where the predicate type was either

INTERACTS_WITH or COMPARED_WITH. The most frequently occurring double

predicate paths and inferred triple predicate paths with corresponding examples are shown

in Table 4-2. They are consistent across all drugs. Many of these paths are readily

interpretable, and could support a plausible biological mechanism for a predicted effect.

For example, INTERACTS_WITH:CAUSES-INV suggests a drug may interfere with

some biological factor which may cause a side effect. COMPARED_WITH:CAUSES-INV

can be used to identify similar side effects by comparing their drug class information as

COMPARED_WITH often indicates a comparative evaluation across different drugs in the

same therapeutic category. Triple predicate paths extend the connecting path for drugs and

related ADRs.

Table 4-2: The most frequent predicate paths in inferred discovery patterns

double/triple-

predicate

Example

INTERACTS_WITH

CAUSES-INV

dipyridamole INTERACTS_WITH nitric oxide

bradycardia CAUSES-INV nitric oxide

75

double/triple-

predicate

Example

ASSOCIATED_WITH

COEXISTS_WITH

rosiglitazone COEXISTS_WITH apolipoprotein a-ii

apolipoprotein a-ii ASSOCIATED_WITH myocardial

infarction

COMPARED_WITH

CAUSES-INV

bisoprolol COMPARED_WITH metoprolol

hypotension CAUSES-INV metoprolol

ASSOCIATED_WITH

INTERACTS_WITH

rosiglitazone INTERACTS_WITH triglycerides

triglycerides ASSOCIATED_WITH myocardial infarction

ISA

CAUSES-INV

naproxen ISA calcineurin inhibitor

toxic nephropathy CAUSES-INV calcineurin inhibitor

INTERACTS_WITH

INTERACTS_WITH

ASSOCIATED_WITH

rosiglitazone INTERACTS_WITH lyrm1

lyrm1 INTERACTS_WITH fatty acids, nonesterified

fatty acides, nonesterified ASSOCIATED_WITH

myocardial infarction

INTERACTS_WITH

ASSOCIATED_WITH

COEXISTS_WITH

rosiglitazone INTERACTS_WITH glycerol-3-phosphate

dehydrogenase

glycerol-3-phosphate dehydrogenase COEXISTS_WITH

succinate dehydrogenase

Succinate dehydrogenase ASSOCIATED_WITH


COMPARED_WITH

INTERACTS_WITH

ASSOCIATED_WITH

rosiglitazone COMPARED_WITH glycerophosphates

glycerophosphates INTERACTS_WITH low-density

lipoproteins

low-density lipoproteins ASSOCIATED_WITH


COMPARED_WITH

COEXISTS_WITH

ASSOCIATED_WITH

rosiglitazone COMPARED_WITH gw 501516

gw 501516 COEXISTS_WITH high-density lipoprotein

cholesterol

high-density lipoprotein cholesterol

ASSOCIATED_WITH myocardial infarction

76

4.4.1.2 Performance

Results for different vector models are shown in Table 4-3. PSI-based models performed

better than RRI-based models and both models perform better than the random baseline.

The PSI-double + triple group outperformed all other groups. All differences in median

rank and AP were statistically significant (as estimated by Wilcoxon’s signed rank test and

paired t test respectively). Pk=50 for each drug was compared across groups using Pearson’s

correlation. For variants of the same model (RRI or PSI), Pk=50 was highly correlated (0.75-

0.84). Correlation in Pk=50 between the PSI and RRI models was between 0.52 and 0.57,

suggesting the potential to improve performance by combining results.

Table 4-3: Results of precision and rank-based measures for different groups

Group MAP Pk=50 for all drugs

(global precision)

Median Rank

(n=3,436)

AUROC

median mean

Baseline group 0.0300 0.0284 1708 1711.44 0.5021

RRI-from-predication

group

0.0365 0.0469 1629 1651.08 0.5140

RRI-from-document

group

0.0520 0.0784 1333 1454.30 0.5508

PSI-double group 0.0591 0.0942 1233 1379.65 0.5973

PSI-double+triple

group

0.0848 0.1410 808 1108.47 0.6841

4.4.1.3 AUC

Figure 4-6 and Table 4-3 present the global ROC curves for all models. ROC curve shows

the tradeoff between sensitivity and specificity. The global AUC provides a cumulative

estimate of accuracy, and is shown for each model in Table 4-3. PSI-double+triple group

77

has the best global AUC of 0.6841. I measured its AUC at the drug level (Figure 4-7). The

mean and median AUC are 0.7102 ± 0.0752 and 0.7058 respectively. Figure 4-7 shows a

plot of the AUC for each drug against the log of the number of predications in SemMedDB

with this drug as subject. This suggests a trend in which performance is generally better for

those drugs for which more knowledge is available in the database. Those drugs with an

AUC of 0.8 or above tend to occur in 10,000 or more predications as subject.

Note that the global AUC as a metric will be inflated by methods that incorporate category

bias into their prediction (Akbani, Kwek, & Japkowicz, 2004; Liu, Wu, et al., 2012), a

subject I will return to in the discussion section.

Figure 4-6: ROC plot of true positive rate and false positive rate for all groups

78

Figure 4-7: Performance of AUC for all drugs by PSI-double+triple group

4.4.1.4 Rediscovery results

The results of this experiment are illustrated in Figure 4-8. This figure plots the number of

rediscovered side effects (left Y axis) and the proportion of the valid side effects

rediscovered (or global recall, right Y axis) for each model against the mean number of

suggested potential ADRs (X axis) at different statistical thresholds. All distributional

models outperform the random baseline.

79

Figure 4-8: Rediscovery plot for experiment groups

With approximately 100 predictions per drug, baseline, RRI-from-predication, RRI-from-

document, PSI-double and PSI-double+triple group have a global recall of 0.029, 0.045,

0.069, 0.088, 0.125, respectively.

4.4.2 Experiment 2

The PSI-double+triple model was the best performing model in the first experiment, and

was selected to test the effects of using variants of the MEDI list as a way to exclude

therapeutic relationships to reduce the number of highly ranked false positive predictions.

Table 4-4 presents the performance of the PSI-double+triple model when different MEDI

lists were used. The median rank of true positive predictions was lower when MEDI was

used to exclude the indication from the search space for each drug. However, as median

80

rank is based on the rank of true positive results only, it does not consider known side

effects that may have been excluded from consideration by the MEDI list. In contrast, MAP

also measures whether true side effects have been excluded. Consequently, MAP in the

MEDI-complete-immediate offspring was higher than other groups. Overall, AUC was

highest for the MEDI-HPS-immediate offspring group. Of the models, only the MEDI-

HPS-immediate offspring group outperformed the baseline PSI model by all metrics, and

the improvements in performance were small in comparison with the differences in

performance between distributional models in experiment 1. All differences between all

MEDI intervention groups and No-MEDI group in Pk=50 are statistically significant as

measured by the paired t test. However, the improvement in cumulative accuracy is

negligible.

Table 4-4: PSI-double+triple model performance across all tests with different MEDI

lists. Best results are in boldface. Performance exceeding the baseline (results obtained

by the best PSI model without MEDI) is marked with an asterisk (*)

Group MAP Pk=50 for all

drugs

Median Rank AUC

median mean

No-MEDI group 0.0848 0.1410 808 1108.468 0.6841

MEDI-HPS group 0.0849* 0.1417* 807* 1107.595* 0.6839

MEDI-HPS-offspring

group

0.0798 0.1283 765* 1050.745* 0.6813

MEDI-HPS-immediate

offspring group

0.0866* 0.1444* 795* 1094.41* 0.6850*

MEDI-complete group 0.0839 0.1401 809 1108.062* 0.6831

MEDI-complete-

offspring group

0.0673 0.0917 681* 953.761* 0.6737

MEDI-complete-

immediate offspring

group

0.0889* 0.1467* 773* 1068.718* 0.6821

81

4.4.3 Plausibility evidence found by PSI discovery patterns approach

In this paper, the association between rosiglitazone and myocardial infarction, a highly

publicized ADR discovered after the drug was released to the market, is used to illustrate

how evidence from the literature can be retrieved for the evaluation of plausibility by a

domain expert. The term “myocardial_infarction” was ranked in the top 1% (rank=29) and

top 1.5% (rank=50) of potential side effects for rosiglitazone by the PSI-double and PSI-

double+triple models respectively.

Rosiglitazone is a thiazolidinedione (TZD) antidiabetic drug, used to treat type 2 diabetes

mellitus as an adjunct to lifestyle changes (Cheung, 2010; Hamblin, Chang, Fan, Zhang, &

Chen, 2009; Lygate et al., 2003). Since its approval by the FDA in 1999, rosiglitazone was

prescribed 3.8 million times annually up to June 2009 in the United States (Shah et al.,

2010). A meta-analysis of clinical trials conducted by Nissen (Nissen & Wolski, 2007) in

2007 suggested that the use of rosiglitazone was associated with a significant increase in

the risk of myocardial infarction. This led to rosiglitazone’s withdrawal from the European

market in 2010 and a rosiglitazone black-box warning in the U.S. (Berthet, Olivier,

Montastruc, & Lapeyre-Mestre, 2011; Shah et al., 2010). In 2013, the FDA lifted some

prescription restrictions in the U.S. market based on a reevaluation of the Rosiglitazone

Evaluated for Cardiac Outcomes and Regulation of Glycaemia in Diabetes (RECORD)

trial (ClinicalTrials.gov Identifier NCT00379769) (U.S. Food and Drug Administration,

2013), but the European suspension is still in effect at the time of this writing.

Rosiglitazone is a nuclear peroxisome proliferator-activated receptor (PPAR-gamma)

agonist. The mechanism through which rosiglitazone causes cardiovascular events is

unclear, but is thought to be related to unfavorable effects on triglycerides, low-density

82

lipoprotein cholesterol (LDL-C) particle size and density, and greater affinity for PPAR-

gamma than other TZD drugs (Bourg & Phillips, 2012; Goldberg et al., 2005; Khanderia,

Pop-Busui, & Eagle, 2008; Y. K. Loke et al., 2011). To evaluate the extent to which these

hypotheses were consistent with information utilized by the PSI-double+triple model, I

reconstructed the pathways of predicates and concepts that were consistent with the

inferred discovery patterns used to make this prediction.

For myocardial infarction, each discovery pattern that was used for the inference was used

to search the indexed SemMedDB predications and find middle terms that connect

rosiglitazone with myocardial infarction through the discovery pattern. The middle terms

retrieved were ranked based on their inverse document frequency. Since the indexed

SemMedDB predications contain the source literature ID (PMID), I also retrieved related

literature evidence that supports the prediction.

Consequently 108,100 unique predication pathways were retrieved through 8 unique

predicate paths (Table 4-5) and with 1,601 distinct middle terms that connect rosiglitazone

with myocardial infarction. Table 4-6 shows some example predication pathways that were

composed of two or three predications. There were around 17 sentences providing evidence

to support each predication on average. I analyzed middle terms’ semantic groups

(Bodenreider & McCray, 2003) and list the sample with distinct predicate paths connecting

with different semantic groups (Figure 4-9).

Table 4-5: Reasoning pathways used to retrieve evidence from the literature for the pair

rosiglitazone -- myocardial infarction

Predicate Path

COMPARED_WITH : COEXISTS_WITH : ASSOCIATED_WITH

COMPARED_WITH : INTERACTS_WITH : ASSOCIATED_WITH

83

Predicate Path

INTERACTS_WITH : COEXISTS_WITH : ASSOCIATED_WITH

INTERACTS_WITH : INTERACTS_WITH : ASSOCIATED_WITH

COEXISTS_WITH : ASSOCIATED_WITH

INTERACTS_WITH : ASSOCIATED_WITH

COMPARED_WITH : ASSOCIATED_WITH : COEXISTS_WITH

INTERACTS_WITH : ASSOCIATED_WITH : COEXISTS_WITH

Table 4-6: Some example predications for possible mechanism of rosiglitazone causing


Middle term “LDL cholesterol lipoprotein”; 123 unique predication pathways; for

example:

rosiglitazone INTERACTS_WITH apolipoproteins_b (Brackenridge et al., 2009;

Sarafidis et al., 2005) INTERACTS_WITH ldl_cholesterol_lipoproteins (Vessby,

Kostner, Lithell, & Thomis, 1982) ASSOCIATED_WITH myocardial_infarction

(Goldberg et al., 2005)

rosiglitazone INTERACTS_WITH paraoxonase_1 (van Wijk et al., 2006)

INTERACTS_WITH ldl_cholesterol_lipoproteins (Gupta, Singh, Maturu, Sharma, &

Gill, 2011) ASSOCIATED_WITH myocardial_infarction (Tetsuro Yoshida et al.,

2010)

Middle term “triglyceride”; 1515 unique predication pathways; for example:

rosiglitazone COEXISTS_WITH triglycerides (Goldberg et al., 2005)

ASSOCIATED_WITH myocardial_infarction (De Caterina et al., 2011; Friis-Moller et

al., 2003; Kuller Lewis et al., 2002; Lekhal, Børvik, Brodin, Nordøy, & Hansen, 2010;

Phillips, 1977)

rosiglitazone INTERACTS_WITH triglycerides (Chao et al., 2000; Nadeau, Ehlers,

Aguirre, Reusch, & Draznin, 2007) ASSOCIATED_WITH myocardial_infarction

rosiglitazone INTERACTS_WITH glycerol-3-phosphate_dehydrogenase (Suzuki,

Suzuki, Sembon, Fuchimoto, & Onishi, 2013) COEXISTS_WITH triglycerides (Im &

Hoopes, 1983) ASSOCIATED_WITH myocardial_infarction

Middle term “ppar_gamma”; 992 unique predication pathways; for example:

rosiglitazone COEXISTS_WITH ppar_gamma (Egerod et al., 2009; Johnson et al.,

2000; Otake et al., 2011; Risérus et al., 2005) ASSOCIATED_WITH

myocardial_infarction (Fliegner et al., 2008)

rosiglitazone INTERACTS_WITH ppar_gamma (Gao et al., 2007; Kim, 2006; Lee,

2003; Tsukahara, 2006; Tzameli, 2004) ASSOCIATED_WITH myocardial_infarction

84

rosiglitazone INTERACTS_WITH glycerol-3-phosphate_dehydrogenase (Suzuki et

al., 2013) COEXISTS_WITH ppar_gamma (Muhlhausler, Duffield, & McMillen,

2007) ASSOCIATED_WITH myocardial_infarction

rosiglitazone INTERACTS_WITH glycerol-3-phosphate_dehydrogenase

INTERACTS_WITH ppar_gamma (Ding, Nagai, & Woo, 2003)

ASSOCIATED_WITH myocardial_infarction

rosiglitazone INTERACTS_WITH resistin (Jung et al., 2005) INTERACTS_WITH

ppar_gamma (Patel et al., 2003) ASSOCIATED_WITH myocardial_infarction

85

Figure 4-9: The predications retrieved by reasoning pathway for rosiglitazone causing

myocardial infarction with specifying semantic groups for concepts

86

There were 2,618 distinct predication pathways about “triglycerides”, “LDL-C” and

“PPAR-gamma” specifying 247 unique middle terms. Drilling down, Figure 4-10 shows

the connecting concepts between LDL-C and myocardial infarction that fall along the

reasoning pathways employed by the PSI-double+triple model. In each reasoning pathway,

the middle terms were ranked using inverse document frequency, to approximate the

weighting used by the predictive model. For each predication in these pathways, the source

sentences from the literature were retrieved. For example, the article “A comparison of

lipid and glycemic effects of pioglitazone and rosiglitazone in patients with type 2 diabetes

and dyslipidemia” (Boyle et al., 2002) explains that rosiglitazone increased triglycerides

compared with pioglitazone and has different effect on plasma lipids which may contribute

to heart disease. Figure 4-10 shows the middle terms retrieved to justify that rosiglitazone

may cause myocardial infarction via LDL-C.

The capacity to retrieve and organize knowledge in this way suggests a new paradigm for

information retrieval in which information supporting a hypothesis of interest is

automatically aggregated and organized at the conceptual level. However, as the number

of assertions in the literature far exceeds the number of documents, further research is

needed to develop methods through which to prioritize these assertions, and present them

in a manner conducive to human consumption.

87

Figure 4-10: Middle terms that were retrieved by PSI discovery patterns involving LDL-

C

88

4.5 Discussion

This study evaluates the ability of scalable LBD methods based on distributional semantics

to rank the plausibility of connections between drugs and potential ADRs. I find that both

the RRI and PSI models are able to retrieve known side effects of drugs, but PSI performs

this task better, as one would anticipate given the additional information beyond co-

occurrence that it encodes. The PSI model can further provide the reasoning pathways that

were used to link a drug to a predicted side effect. Consequently, relevant literature can be

retrieved to support the predictions, and provided to experts for review. However further

research is needed to develop approaches through which the assertions underlying the large

numbers of reasoning pathways utilized by the model can be prioritized for expert review,

as these are too numerous for exhaustive manual review. Ultimately, I aim to provide

domain experts with essential evidence while preventing information over-load. Even

though it is not the best performing model, the RRI model has the advantages of a simple

training process and the availability of more data to draw upon (as MetaMap has higher

recall for concepts than SemRep has for predications). Conversely, the PSI model has the

advantage of modeling plausibility, a capability with the potential to assist expert clinical

review for pharmacovigilance. In addition, the correlation analysis between groups

suggests that RRI and PSI complement each other, and can potentially be combined to

improve performance on this task.

For predicting ADRs, several statistical models and Machine Learning (ML) algorithms

have been evaluated against an edition of SIDER, or a subset of this repository. In addition

to methodological differences, these approaches have leveraged different data sets and a

variety of knowledge bases as a basis for making predictions. In the section that follows, I

89

will provide a review of these approaches, and the performance they have documented for

the prediction of ADRs in SIDER.

Pauwels et al. represented drugs using as features the presence or absence of chemical

substructure components described in PubChem (B. Chen, Wild, & Guha, 2009). In

addition to standard supervised ML approaches, they applied canonical correlation analysis

(CCA), including a sparse variant that emphasizes a small number of informative features

for each training example. These methods were used to predict SIDER side effects, with a

reported global AUC of 0.8932 (Pauwels et al., 2011) on a set of 1350 ADR and 888 drugs,

using fivefold cross-validation.

Subsequently, Liu et al. applied five supervised ML algorithms to the same SIDER set. In

addition to the PubChem-derived chemical substructure features used by Pauwels et al.,

features were drawn from DrugBank (Knox et al., 2011) (drug targets, transporters, and

enzymes), KEGG (Kanehisa & Goto, 2000) (pathway information) and SIDER itself (drug

indications and side effects). A classifier was built for each SIDER ADR, and the classifiers

were then evaluated on 832 SIDER drugs (for which DrugBank IDs could be found) using

fivefold cross-validation. The support vector machine (SVM) algorithm performed best

with a global AUC of 0.9524 on the full SIDER dataset (Liu, Wu, et al., 2012). The authors

attribute much of the improvement in performance by this and other metrics to the effects

of incorporating SIDER side effects as features, suggesting that certain side effects have a

tendency to co-occur in drug label data.

Other authors have reported performance on subsets of SIDER using similar methods. For

cardio-toxicity related ADRs in SIDER, a median AUC of 0.771 using SVM for prediction

has been reported (Huang et al., 2011). In this case, features were selected from information

90

about intended drug targets in DrugBank, and information about off-target effects from an

expanded protein-protein interaction network developed using gene ontology (GO)

annotations. A SVM classifier was built for each evaluated ADR and cross-validated on

SIDER.

With respect to performance, two of these studies, Pauwels et al. (Pauwels et al., 2011) and

Liu et al. (Liu, Wu, et al., 2012) report a global AUC of close to 0.9 or higher with the best

of their methods. Though the results are not directly comparable as I made predictions on

a per-drug rather than a per-ADR basis, the difference between the global AUC of these

methods and that obtained with our approach seems large. However this difference in

global AUC is misleading. As noted by Liu and colleagues in their paper, the imbalance

between positive and negative examples across ADRs and the way in which the global

AUC was calculated in this work leads to an apparent inconsistency between it and the

other evaluation metrics presented. For example, Pauwels and his colleagues display the

AUC across different ADRs in a series of box plots, which shows a median AUC for the

best-performing method (by this metric) of slightly above 0.6. Acknowledging this issue,

Liu et al. also report precision and recall for each evaluated method with, for example,

precision of 0.66 and recall of 0.63 for SVMs with their maximal feature set. Notably, the

AUC in this case was around 0.95.

This apparent inconsistency can be explained by the effect of the prevalence of positive

examples for each ADR on the prediction strength. This is readily apparent for simple

algorithms such as Naive Bayes, where the prior probability of a given category is

incorporated into the estimate. However, it is also an issue for more sophisticated

algorithms such as SVM (Akbani et al., 2004) particularly when the imbalance between

91

categories is severe. This is the case for many of the ADR examples: Liu et al. report a

positive to negative ratio of around 1:166 for 554 of the 1135 ADRs. So given the same set

of features, instances in these cases are likely to receive a lower prediction score than those

in balanced cases. When these scores are aggregated across examples to generate a global

AUC, ML methods that incorporate the category bias will obtain an inflated global AUC

on account of this tendency to assign lower scores to instances with few positive examples.

However, as noted by Liu et al. and demonstrated by the other reported metrics, this AUC

is not an accurate reflection of the ability of these models to detect positive examples.

To simulate the effects of category bias on global AUC, I performed a simple experiment

in which I multiplied the similarity scores produced by our model by the proportion of

positive examples for each drug. This roughly approximates the effects of an accurately

estimated prior class probability during cross-validation experiments. This resulted in an

increase of our global AUC from 0.68 to 0.88. I do not present this result for the purpose

of comparative evaluation, as our experiments are not directly comparable with prior ML

work for other reasons I will subsequently discuss. Rather, I present it as an illustration of

the disproportionate influence of category bias on global AUC, which underscores the

issues with this evaluation metric raised by Liu et al. I trust it will also serve to dispel the

misleading impression that the predictive accuracy of our methods is vastly inferior to that

reported previously.

As our method does not consider the number of ADRs associated with a particular drug,

the global AUC and median AUC approximately agree with one another. Our median AUC

(across all drugs) of 0.7058, which falls somewhere in between that reported by Pauwels

et al. (Pauwels et al., 2011) (across all ADRs) and Huang et al. (Huang et al., 2011) (across

92

cardio-toxicity related ADRs only). On account of the difference in denominator these

results are not directly comparable, but they do further illustrate the discrepancy between

global and local AUC in models that are not agnostic to class imbalance. Arguably such

agnosticism is desirable from the perspective of an expert review, as it is difficult to justify

the assertion that those drugs with fewer known associated side effects should be

considered less likely to cause some newly observed side effect (and vice versa).

With respect to methodological differences, all of the above methods are supervised ML

methods, and were applied to infer whether or not drugs were associated with each ADR

from the features of other drugs known to be associated with this ADR. So the predictive

models were generally customized on a per-ADR basis, for example by generating an

individual classifier for each ADR in the case of SVM. In contrast, our approach infers a

set of abstract reasoning pathways that were consistent across the drugs I evaluated.

However, as illustrated by the absence of evidence across certain pathways in the

rosiglitazone example, some pathways may be more predictive for particular medications

or ADRs. So it seems likely that I could further improve our performance by incorporating

supervised ML, a direction I plan to explore in future work.

Our approach differs with respect to the knowledge sources utilized also. For example,

KEGG and DrugBank are manually curated databases. Our knowledge base, SemMedDB,

contains predications that have been automatically extracted by SemRep from the

biomedical literature using NLP. Inaccuracies in language processing, or indeed in the

literature itself may introduce sources of error that are not present in manually curated data.

However, the scope of the literature is much broader than that of human-curated resources.

Furthermore, as there is no agreed-upon gold standard for ADRs, different studies have

93

utilized different datasets as reference sets (Tatonetti et al., 2012). Our study employed

SIDER2, which includes considerably more drugs and ADRs than SIDER1.

This work has several limitations. The first of these concerns the use of SIDER2 as a

reference standard. As SIDER2 consists of recognized side effects only, I cannot reliably

distinguish between false positive signals and previously unknown ADRs. Furthermore,

SIDER was compiled from package insert information by NLP tools (Kuhn et al., 2010),

and as such may include side effects that seldom occur in practice or false associations that

were caused by text-mining errors (Kuhn et al., 2013). While SIDER2 is sufficient to

evaluate the hypotheses of the current work, in future work I plan to incorporate other data

sources, such as EHR data and FDA reports. These data sources may provide additional

evidence to support the assertion that an unknown drug/ADR pair is worth investigating

further. Alternatively, they may provide the means to select a subset of the side effects in

SIDER2 that have been observed frequently in practice as an additional evaluation set.

Secondly, the MMB repository contains one year less literature than the SemMedDB

dataset. There is a difference of 1,7579,64 citations (7.9% of SemMedDB dataset). These

were the newest datasets at the time of the experiment. However the MMB repository has

many more data points than SemMedDB. For example, more than 99.99% of citations have

concepts extracted by MetaMap and 59.91% of citations have predications extracted by

SemRep.

Another concern is the existing knowledge about causal relationships between drugs and

related ADRs from the literature. For our dataset (90,787 pairs), 45% of pairs (concerning

953 drugs) co-occur directly in the MMB repository and 5% of pairs (concerning 693

drugs) have direct causal relationship (drug CAUSES ADR) in SemMedDB. So PSI’s

94

accuracy is dependent upon its ability to meaningfully infer connections between concepts

that were not previously linked in its database, a capacity that would be particularly useful

as a means of assessing novel ADRs that had not previously been documented in the

literature. RRI is also able to draw such inferences, but in this case more of its performance

may be attributable to direct co-occurrence.

Inspecting the middle terms that our model retrieved for rosiglitazone-myocardial

infarction association (Figure 4-10), I found that at times uninformative high-level

concepts, such as “genes” and “proteins”, were retrieved. In our study, I addressed the issue

of uninformative high level concepts in two ways, both related to their propensity to occur

relatively frequently in the corpus. Firstly I used a frequency threshold of 1,000,000 to

exclude frequently occurring concepts contained in SemMedDB. The frequency of “genes”

and “proteins” is less than the threshold and cannot be filtered. Secondly, I used a weighting

procedure to reduce the influence of high-frequency terms on the training process.

However, more sophisticated approaches to filtering are possible. Information concerning

UMLS semantic types and position in the UMLS hierarchy could be used to develop more

sophisticated approaches, to further filter out uninformative high-level concepts, which

may improve performance.

The predictions made by PSI depend upon assertions extracted from the biomedical

literature. One concern about the extracted predications is that they may be implausible on

account of NLP errors. Though SemRep has been optimized for precision, its precision is

not perfect. For example, Kilicoglu and colleagues estimate the precision of SemRep to be

around 0.77 (Kilicoglu et al., 2008). Based on this, and other published evaluations (Ahlers,

Fiszman, Demner-Fushman, Lang, & Rindflesch, 2007; Kilicoglu et al., 2010), it is

95

reasonable to estimate that around three in four predications in the set are perfectly

accurate. In many cases, inaccurate predications nonetheless indicate co-occurrence, which

is also informative. The PSI-based analogical reasoning approach I have employed is

robust to isolated language processing errors, as highly ranked predictions are based on

assertions extracted from thousands of unique reasoning pathways. For example, for the

rosiglitazone-myocardial infarction association, 108,100 unique predications were

retrieved, spanning eight of the inferred reasoning pathways. On average, individual

predications were supported by 17 excerpts from the literature. If I extrapolate from prior

published evaluations of SemRep, the predication concerned would have been accurately

extracted from around 12 of these excerpts. So it is likely that at least some of the evidence

supporting each individual assertion is accurate. Moreover, as this method is distributional

in nature, it does not require that these assertions be perfectly accurate. Rather, the

frequency with which an assertion is extracted factors into the strength of its contribution

to a reasoning pathway. Nonetheless, the biomedical literature may contain controversial

assertions, or contradictory conclusions from different experts or different experiments.

This is illustrated by the rosiglitazone (brand name: Avandia) case. In 2007, the FDA added

a black-box warning for heart-related risks to Avandia based on a meta-analysis (Nissen &

Wolski, 2007) and three other studies (U.S. Food and Drug Administration, 2007). In 2013,

the FDA lifted certain Avandia prescribing restrictions based on the readjudicated results

of the RECORD trial (R. D. Lopes et al., 2013; Mahaffey et al., 2013), claiming the initial

concerns were overblown (U.S. Food and Drug Administration, 2013). This decision was

condemned by one of the authors of the original meta-analysis (Thompson, 2013).

Currently our models weight the contribution of assertions using statistics related to local

96

and global frequency. However, it would also be possible to weight the importance of these

assertions based on some assessment of the reliability of the source. For example, in

information retrieval experiments, an approach incorporating citation information was

better able to identify articles considered as important in a pre-existing bibliography

(Herskovic & Bernstam, 2005). Possibilities include weights derived from the citation

count of the source article, the impact factor of the journal, or the nature of the experiment

described. It is possible that weighting metrics of this source would improve the predictions

of our models, and they also suggest approaches to prioritize the large numbers of

assertions supporting our predictions for review by human experts.

4.6 Conclusion

In this research, an emerging, scalable method of LBD that uses distributional statistics to

infer and apply discovery patterns was adapted to evaluate the plausibility of drug/ADR

relationships for the purpose of pharmacovigilance. The effective application of large

amounts of partially accurate biomedical knowledge to this problem was facilitated by the

scalable and robust nature of approximate inference in geometric space. This approach was

shown to be more effective than a comparable co-occurrence based baseline, and has the

further benefit of permitting the retrieval of evidence underlying the assertions used by the

system to make its predictions. Consequently, our approach provides the means to assist

with expert clinical review by providing evidence supporting the plausibility of the

connection between drugs and ADRs. Furthermore, the models I have developed can be

applied to filter drug/ADR signals that are detected in spontaneous reporting systems or

EHR data, a direction I plan to explore in future work.

97

Chapter 5: Toward a Reality-based Repository of Adverse Drug Reactions:

Comparing SIDER2 with Side Events Encountered in Practice

Recent informatics research has focused on the development and evaluation of automated

approaches to pharmacovigilance (D. J. Almenoff et al., 2005; Andrew Bate, 2003; Wang

et al., 2009). However, since the purpose of pharmacovigilance is to monitor and predict

previously unknown side effects, there is no complete reference set with which to validate

detected drug/ADR associations. Nonetheless, there is a need for such a gold standard in

pharmacovigilance (Manfred Hauben & Aronson, 2007). For example, a reference set is

required to evaluate a signal detection algorithm or validate the predictions of a

pharmacovigilance system. As it is not practical to construct a large-scale validation set

consisting of previously unknown side effects, common strategies involve validation

against published drug reference book (Lindquist et al., 2000), drug safety related labelling

changes (Manfred Hauben & Reich, 2004), curated reference sets (Coloma et al., 2013;

LePendu et al., 2013; Ryan et al., 2012; Ryan, Schuemie, Welebob, et al., 2013), or large-

scale database of known drug/ADR associations (Kuhn et al., 2010).

The EU-ADR (Coloma et al., 2013) and the Observational Medical Outcomes Partnership

(OMOP) (Ryan et al., 2012; Ryan, Schuemie, Welebob, et al., 2013), major drug

surveillance projects in Europe and America, systematically developed small curated

reference sets for evaluating their new developed methods on drug safety surveillance with

observational healthcare data (for example, administrative claims and EHRs). The EU-

98

ADR reference standard (Coloma et al., 2013) contains 94 drug-event associations (44

positives and 50 negatives). This was constructed based on literature, WHO VigibaseTM

(WHO spontaneous reporting system (SRS)) and clinical adjudication. These associations

were restricted to ten important events in pharmacovigilance and are adequately

represented in the EU-ADR network. With the experiences of previous OMOP experiments

and stakeholder interest, OMOP selected four events to construct a reference standard for

methodology evaluation. The OMOP reference standard (Ryan, Schuemie, Welebob, et al.,

2013) contains 399 drug-event associations (165 positive controls and 234 negative

controls) that were developed from drug labeling, Tisdale level of causative evidence

review (James E. Tisdale & Douglas A Miller, 2010) and systematic literature review.

These associations are also required to have sufficient representations in OMOP databases.

These two reference standards have been applied to evaluate different statistical methods

in discriminating true effects from false drug-event associations with the EU-ADR

databases (Schuemie et al., 2012, 2013) and the OMOP databases (William DuMouchel,

Ryan, Schuemie, & Madigan, 2013; Madigan, Schuemie, & Ryan, 2013; Norén et al., 2013;

Ryan, Schuemie, Gruber, Zorych, & Madigan, 2013; Ryan, Schuemie, & Madigan, 2013;

Suchard et al., 2013).

In addition to the curated relative small reference sets, researchers have used the Side Effect

Resource (SIDER) to serve as a gold standard in a large scale to validate detected ADRs

or to evaluate drug-event signal detection systems (Deftereos et al., 2011; Saiakhov,

Chakravarti, & Klopman, 2013; Yates & Goharian, 2013). SIDER contains information

about the side effects of drugs that has been extracted from package inserts by text mining

tools. However, there is reason to believe that SIDER may include false drug/ADR

99

associations, which would lead to inaccurate evaluation of the systems concerned. It has

been argued that package inserts over-report with respect to side effects (Duke J et al.,

2011). Analysis of SIDER has shown that text mining errors may occur when processing

the package insert; for instance, generic warnings have been mistakenly extracted from

labeling information (Kuhn et al., 2013).

Therefore, to use a large scale reference standard in pharmacovigilance evaluation, it is of

interest to determine the extent to which the side effects reported in SIDER occur in

practice. By limiting to the side effects that have sufficient representations in practice, false

drug/ADR associations can be eliminated from SIDER and the predictive power of

pharmacovigilance methods can be less affected by inadequate sample size of drugs or

events. The information of practice usage can be obtained from a post-marketing SRS. A

SRS is designed to collect anecdotal case reports. Even though they are not peer-reviewed,

the case reports in SRSs can provide supporting evidence for possible drug/ADR

associations (J. K. Aronson & Hauben, 2006). These collected data represent suspected

drug/ADR associations reported by healthcare practitioners that observed a reaction in a

patient under their care, by pharmaceutical companies that are mandatory required to report

any collected side effects, or by consumers that may experience unpleasant reactions for

drug treatments. Therefore, these reports provide anecdotal evidence that a side effect

mentioned in a repository has occurred in practice.

In this study, I evaluate the frequency with which SIDER2 (SIDER version 2) drug/ADR

associations occur in reports submitted to the Food and Drug Agency (FDA) Adverse Event

Reporting System (FAERS). The ultimate goal of this research is to develop a drug/ADR

100

reference set consisting of those label-derived side effects that have been observed in

practice, for the purpose of pharmacovigilance research.

5.1 Materials


and their known adverse reactions (Campillos, Kuhn, Gavin, Jensen, & Bork, 2008; Kuhn

et al., 2010). The current version (SIDER2) was released in 2012, and was used for this

study. The information in SIDER2 was extracted from drug labeling information (package

inserts) using text mining tools and a side effects dictionary. The source of package inserts

includes British Columbia Cancer Agency, Facts@FDA, FDA Center for Drug Evaluation

and Research, FDA MedWatch, and Health Canada Drug Product Database (DPD).

Labeling information is from clinical trials, post-marketing surveillance, etc. To construct

the side effects dictionary (Kuhn et al., 2010), Coding Symbols for Thesaurus of Adverse

Reaction Terms (COSTART) was used as a seed dictionary and then was expanded by

extracting synonyms for the seed dictionary from the Unified Medical Language System

(UMLS). To build SIDER drug-event associations, the drugs’ indications and side effects

related sections were processed to extract terms that correspond to the side effects

dictionary and the terms extracted from indications were subsequently excluded from

drugs’ side effects. SIDER2 contains 99,423 drug-event associations for 996 drugs and

4192 side effects.

FAERS is a spontaneous reporting system database that contains information on adverse

event and medication error reports that were submitted to the FDA by pharmaceutical

companies, health professionals and consumers in the United States (Center for Drug

101

Evaluation and Research & U.S. Food and Drug Administration, 2012). FAERS database

contains 332,346 distinct strings in the “drug name” field, and 16,272 distinct strings in the

“reaction” field within 4,070,077 reports collected between 2004 to 2012 Q3. MetaMap is

a widely-used NLP tool that identifies concepts from the UMLS in biomedical text (A. R.

Aronson & Lang, 2010; A. R. Aronson, 2001).

5.2 Methods

5.2.1 Import SIDER2 and FAERS data in database

SIDER2 was downloaded and imported into a local database. Distinct drugs and side

effects were retrieved from the “meddraAvderseEffects” table. Publicly available FAERS

quarterly data files (2004 to 2012) were downloaded from the FDA website (Center for

Drug Evaluation and Research & U.S. Food and Drug Administration, 2014). I imported

all data into a SQL database aside from the 2012q4 data, since the metadata used in this

quarter were inconsistent with the database as a whole. The database contain 4,070,077

FAERS reports. For this study, I used information in the “Drug” and “Reaction” tables.

5.2.2 Parsing drug and side effect terms using MetaMap (Figure 5-1)

2013 edition of MetaMap (MetaMap 2013 was used to process and annotate retrieved drugs

and reactions and map them to UMLS concepts (U. S. National Library of Medicine, 2009)

for the purpose of mapping between FAERS and SIDER2. MetaMap identifies matches

between terms in text and candidate concepts from the UMLS. Each of these candidates is

annotated with a confidence score, a Concept Unique Identifier (CUI), a preferred label,

and one or more semantic types. The semantic types indicate the type of the concept.

102

Examples include “Pharmacologic Substance” or “Diagnostic Procedure”. Candidates can

be assigned multiple semantic types.

I manually reviewed distinct semantic types or semantic type combinations for MetaMap

processed SIDER2 drugs and side effects and found out two features of drugs or side effects

relevant semantic types. First, there are semantic types that are irrelevant to my research,

such as “Organ or Tissue Function”, “Organism”, or “Activity”. Candidates of those

semantic types were excluded. Second, a combination of semantic types could often

contribute a more relevant classification. For example, the semantic type “Immunologic

Factor” occurred very frequently and often was associated with concepts that were not

relevant to drugs. However “Immunologic Factor” in combination with “Pharmacologic

Substance” provided more insight, and identified a relevant candidate in the context of our

study. Another example of semantic type combinations is a concept annotation that has a

candidate with only one semantic type having a higher confidence score and a candidate

with semantic type combinations having a less confidence score. The manual review

revealed that candidates with combinations of certain semantic types yielded more relevant

concepts. Consequently, restricting the result list by only accepting very specific semantic

types or combinations may increase the relevance of the concepts identified by MetaMap

to the study.

103

Figure 5-1: The procedure for mapping SIDER2 and FAERS

5.2.3 Mapping between SIDER2 and FAERS

After annotating SIDER2 and FAERS with selected semantic types or semantic type

combinations, corresponding CUIs were retrieved and used to find the matched FAERS

drugs or ADRs for parsed SIDER2 drugs or ADRs. Figure 5-2 illustrates this process using

the example drug “dipyridamole”. For this drug, 78 FAERS drug inputs were found.

104

Figure 5-2: Procedure of mapping SIDER2 drugs/side effects to FAERS drugs/reactions

using “dipyridamole” as an example

5.2.4 Retrieve number of reports for mapped SIDER2 drug-side effect pairs and

perform disproportionality analysis to find significant SIDER2 drug/ADR

associations

For each SIDER2 drug-side effect pair, I retrieved the number of related FAERS reports

through their mapped FAERS drugs or reactions. Then I constructed a contingency table

for each drug-side effect association with corresponding FAERS terms. Disproportionality

measures were used to calculate the significance of the association between drug and side

effect pairs co-occurring in the FAERS reports. These methods operate under the

assumption that if a drug causes a side effect, this drug and this event are more likely to

appear together than random combinations of drugs and suspected adverse events in the

SRS reports (Cao et al., 2005).

105

Routinely used disproportionality measures for pharmacovigilance drug-side effect signal

detection include the Reporting Odds Ratio (ROR), Proportional Reporting Ratio (PRR),

Yule’s Q, and relative risk (RR) (Balakin & Ekins, 2009; Cao et al., 2007; Emmanuel Roux

et al., 2005; van Puijenbroek et al., 2002). In addition to these measure, I applied Chi-

square test with the False Discovery Rate (FDR), which was introduced to measure the

multiple-hypothesis testing error and has been successfully used in large-scale genomic

studies to control false positive results (Benjamini & Hochberg, 1995; J. D. Storey &

Tibshirani, 2003; J. D. Storey, 2002).

5.2.5 Performance measure

Descriptive statistics of agreement for drug-side effect relationships between SIDER2 and

FAERS were estimated. We also conducted a manual review of a random sample of 60

pairs that occurred in SIDER2 without accompanying FAERS reports, to investigate the

cause of this mismatch. DailyMed was used to retrieve drug package inserts and review

the “Adverse Drug Reaction” section to track down the side effect of interest. MEDLINE

was queried to find published literature or case reports that relate to the drug/ADR pairs

retrieved from the package inserts.


To identify the subset of SIDER2 drug/ADR pairs that have been reported in practice, I

compare these pairs with anecdotal evidence from FAERS reports (Figure 5-3). To map

SIDER2 drugs to text in the FAERS “drug” field, and SIDER2 side effect to text in the

FAERS “reaction” field, MetaMap 2013 was used to process all terms and a manual review

process of selecting related semantic types that (Figure 5-1) was used to select appropriate

106

UMLS candidates. Multiple disproportionality analysis methods (A. R. Aronson & Lang,

2010; A. R. Aronson, 2001; Center for Drug Evaluation and Research & U.S. Food and

Drug Administration, 2012) were then applied to analyze the significance of co-occurring

drug-side effect associations. The significantly reported SIDER2 drug/ADR associations

that are detected by all disproportionality measures are considered as a pharmacovigilance

reference standard.

Figure 5-3: Study design for decreasing false SIDER2 drug-side effect associations

5.4 Results

5.4.1 Concept extraction

Using the mapping procedure, I mapped 99% (986/996) of SIDER2 drugs and 97%

(4072/4192) of SIDER2 side effects to UMLS concepts; 78% of FAERS drug strings

(259,806 drugs) and 98% of FAERS reaction strings (15,989 reactions) to UMLS concepts.

Among SIDER2 mapped UMLS concepts, 969 drugs and 3853 side effects were mapped

to FAERS drug and reaction inputs. These concepts constituted 94.85% (94,306) of the

SIDER2 drug/ADR pairs, which were subsequently compared with FAERS reports. Of

107

these 94,306 SIDER2 drug-side effect pairs, 11,306 pairs do not co-occur in FAERS

reports. 83,000 drug/ADR pairs have co-occurring reports in FAERS database.

For this study I only included reports that were related to these UMLS concepts.

Consequently, 4,070,225 reports were used for disproportionality analysis.

5.4.2 Reported SIDER side effects

I analyzed the frequency distribution of reports related to 83,000 SIDER2 drug/ADR pairs

that have co-occurred in FAERS reports (Figure 5-4). The number of FAERS reports

ranges from 1 to 360,146, with mean of 1944, median of 85, 1st quartile at 14, and 3rd

quartile at 598. The histogram of representing all pairs has long right tail and data are

skewed. For a more granular picture of this distribution, I plotted the histogram for each

quartile (Figure 5-4). Most drug/ADR pairs have a relatively small number of FAERS

reports and less than 600 FAERS reports.

108

Figure 5-4: Frequency distribution of SIDER2 drug/ADR pairs (y-axis) that are grouped

by the number of FAERS reports that contain the drug/ADR of interest (x-axis)

5.4.3 Statistically significant drug/ADR pairs detected by disproportionality

measures

I evaluated the significantly reported instances of SIDER2 drug/ADR associations by

applying disproportionality measures using FAERS data (Table 5-1). About 60% to 80%

of SIDER2 drug/ADR pairs met the thresholds for significance of these statistical

measures. 46,203 SIDER2 pairs (904 drugs, 2984 side effects) were detected as statistically

significant by all statistical metrics.

109

Table 5-1: The percentage of statistically significant SIDER2 drug/ADR pairs using

disproportionality measures

ROR-1.96SE>1 PRR-1.96SE>1 YulesQ-

1.96SE>0

IC-2SD>0 Chi-Square test

with FDR (q

Value<0.05)

60%

(50,139/83,000)

60.9%

(50,570/83,000)

61.4%

(50,978/83,000)

55.7%

(46,209/83,000)

79.7%

(66,178/83,000)

5.4.4 Manual review sample drug/ADR pairs without case reports

I randomly selected 60 drug/ADR pairs from 11,306 pairs that were not included in any

FAERS report and analyzed possible reasons for why there are no case reports (Figure 5-

5). Only 25 pairs were listed as side effects in the drug labeling or package inserts. Among

those 25, eight side effects were listed as “infrequent side effects” or less than 1% of tested

patients. For three side effects, the trial provided limited evidence whether they were

caused by the drug under trial. In two instances (cidofovir related hypophosphatemia and

glycosuria), participants were on several other medications and so it is difficult to establish

the causal relationships. In another instance (teniposide related arrhythmia), there was only

one report of this complication during pre-marketing trials, it was presumed rather than

confirmed clinically, and it had occurred in an elderly patient with a variety of other health

problems. Another six pairs among the 25 pairs had either mild side effects (e.g. dry throat,

dizziness), or concerned a rarely prescribed drug (e.g. maraviroc). I could not find a

plausible explanation for the remaining eight side effects that did not occur in any FAERS

report; these side effects were listed in “Adverse Reactions” section for different systems

without frequency information in the package inserts.

110

Among the remaining 35 drug/ADR pairs, six did not concern drugs; they instead

concerned a vitamin or some other chemical component. So no package insert was

available specifically the six chemical components (e.g. 1,25-dihydroxyvitamin D3,

betaine). For seventeen pairs package inserts were available, but the labeling does not list

the side effects concerned. Four pairs suggested a possible text mining error, as SIDER2

side effect was close to the information in the labelling either conceptually or

orthographically, but is, in fact, not a match. For four pairs that are drugs (e.g. ofloxacin),

I couldn't find the package insert in the DailyMed database.

Figure 5-5: Interpretation of 60 randomly selected SIDER2 pairs without any supporting

reports

5.5 Discussion

The study suggests that a higher precision subset of SIDER2 pairs could be identified by

filtering out possible false positives using an inclusion criterion based on statistically

significant association in the FAERS data. These false positives may result from NLP

6

4

17

8

8

5

1

8

3

Not drug

No labeling

Not listed in labeling

NLP error

Listed in labeling

Listed in labeling but minor ADR

Infrequent prescribed drug

Listed in labeling, infrequent ADR

Listed in labeling, unclear causal relationships

111

errors during the parsing of drug labeling or inserts; on account of the inclusion of

infrequent side effects that were not conclusively linked to the drug, but nonetheless

occurred during clinical trials; or for other reasons. For the evaluation of new methods in

pharmacovigilance it is imperative to use a high quality data set. If a data set used as a gold

standard contains uncertainties itself the performance assessment of a new system or

algorithm against these data set is difficult. It cannot be clearly concluded as to why the

results of a new method show a certain outcome. Both a bias towards good or bad results

could be caused by the quality of the reference data rather than by the method itself.

Furthermore, in the case of machine learning systems trained on these data, the inaccuracies

inherent in the data set may impair the performance of the derived models.

There are some limitations to this study. With the mapping procedure, it frequently

occurred that input strings, specifically those of reactions, were mapped to multiple

concepts. This was the case when either no concept existed that expressed the entirety of

the input, or MetaMap simply could not recognize it. For example, “carcinoma of the small

bowel” exists as a concept in and of itself, even though it is a combination of carcinoma as

a clinical observation and small bowel as a body location. On the other hand, for “splenic

embolism” MetaMap lists a candidate concept for each of embolism and spleen. The

spleen, which is of semantic type “Body Part, Organ, or Organ Component” is certainly

not an appropriate result for splenic embolism, while the embolism itself is too general as

it may appear at other body sites also. So rather than accepting “embolism” by itself, a

combination of those two concepts would be most appropriate. The main concept,

embolism in the example, and one or more concepts that contribute, modify, or tag the

main concept could be combined. Similarly, for the input string “neutrophil count

112

increased” MetaMap detects “neutrophil count” as “Laboratory Procedure” and

“increased” as “Quantitative Concept” but without connection to the neutrophil count.

Considering a combinations of concepts of different semantic types that MetaMap treats as

atomic units may improve the quality of our results.

It also has been argued that SRSs are vulnerable to bias, and that the ADR under-reporting

rate is considerable. Heterogeneity and bias result from different reasons: severe effects

were more reported than mild ones, and labeled adverse drug reactions are more likely to

be reported than unlabeled ones (Benjamini & Hochberg, 1995). The first of these findings

is supported by our interpretations of 60 randomly selected SIDER2 drug/ADR pairs

examined without FAERS reports, as some of these side effects were mild in nature. That

labeled side effects are more frequently reported supports our argument for the use of

FAERS reports as a means to validate drug/ADR repositories.

An important outcome of this study is a high-precision subset of SIDER2 in which reported

drug/ADR pairs are supported by the statistical significance of their FAERS reports. I

anticipate that this subset will be of value for the training and evaluation future

pharmacovigilance systems and new ADR signal detection algorithms.

5.6 Conclusion

Based on the consistency between FAERS and SIDER2 drug/ADR associations, a

drug/ADR reference set is retrieved. The utility and application of the reference set will be

further evaluated in pharmacovigilance research.

113

Chapter 6: Improving Signal Detection from the Electronic Health Records by

Using Literature Based Discovery

In previous experiments, I demonstrated that (1) possible drug/ADR signals can be detected

from EHR data; and (2) LBD models based on methods of distributional semantics can

identify plausible ADRs. In this experiment, I test my unifying hypothesis that signal

detected from EHR data can be improved by integrating knowledge from the literature.

Specifically, I propose that the performance of signal detection from the EHR can be

improved by reranking the detected signals using LBD models based on distributional

semantics. I evaluate this procedure using the drug/ADR reference set that we developed

from SIDER2 and FAERS, as well as with SIDER2 itself.

6.1 Experimental design (Figure 6-1)

In Chapter 3, I described how Drug/ADR signals were detected from EHR data hosted in

a Clinical Data Warehouse (CDW) using several disproportionality measures and other

statistical tests. The best performing statistical model from those experiments was chosen

for this one. Likewise, the best performing LBD model from the experiments described in

Chapter 4 was utilized for this experiment. To perform the experiment, drug/ADR signals

were detected in EHR data using the statistical model and ranked in accordance with the

strength of their associations (according to the statistical model). These drug/ADR signals

are referred to as pre-reranking drug/ADR signals. Subsequently, these signals were

114

reranked in accordance with the similarity scores (the measure for plausibility) that the

LBD distributional model estimated from the relatedness between drugs and ADRs across

the set of discovery patterns identified in the experiments described in Chapter 4.

Consequently, these are referred to as post-reranking drug/ADR signals. They were then

evaluated by comparing the pre- and post-drug/ADR signals to both SIDER2 and a

reference set that we developed from it.

Figure 6-1: Overall research design for reranking statistically significant drug/ADR

associations by similarity scores

6.2 Materials and Methods

6.2.1 Models selected for this study

I have elaborated methodologies for statistical data mining from EHR and LBD

distributional semantics in Chapter 3 and Chapter 4, respectively. Of these methods, the

Chi-square statistics with FDR (for estimating the statistical significance of observed

115

associations) and PSI double+triple group were best performing models in their respective

experiments, and consequently were utilized for this one.

In pre-reranking drug/ADR signals, the chi-square score is used as to reflect the statistical

strength of the association observed in CDW data. The FDR-based q values are used as a

significance threshold, to predict if the signal is true or false. Similarity scores from PSI

double+triple model are used to rerank those signals that fell above the q-value threshold.

6.2.2 Experimental reference standards and test dataset

I evaluated the above models using two reference standards – side effect resource

(SIDER2) and the reference set I developed using SIDER2 and FDA adverse event

reporting system (FAERS, as described in Chapter 5). The reference set is relatively small

and contains only those SIDER2 relationships that frequently occurred in SRS databases.

For each reference standard, its drugs and side effects that are not only contained in the

CDW EHR data but also represented in the PSI model were eligible for this experiment.

This resulted in a set of 811 drugs and 1879 ADRs with SIDER2 and 773 drugs and 1374

ADRs with the reference set (Table 6-1). Among these, only the predicted positive pairs

by Chi-square statistics with FDR as statistically significant associations were utilized for

the analysis.

Table 6-1: SIDER2 and a Reference Set are used to construct the dataset for the

experiment

Group Drugs ADRs True

Pairs

False

Pairs

Total

Pairs

Statistically

significant

associations

True

positives in

statistically

significant

associations

SIDER2 811 1879 260,555 395,468 436,865 260,555 28,114

116

Group Drugs ADRs True

Pairs

False

Pairs

Total

Pairs

Statistically

significant

associations

True

positives in

statistically

significant

associations

Reference

Set

773 1374 40,838 315,544 356,382 215,024 27,765

6.2.3 Performance metrics

Average precision (AP) is the average of the precision that is measured at the rank at which

each correct prediction is retrieved. Precision at 100 (Manning et al., 2008) is the precision

at the first 100 retrieved results. A true positive drug/ADR association, defined as

“rediscovery”, is an adverse effect that is confirmed by SDIER2 or the reference set. The

median rank of rediscoveries across statistically significant drug/ADR associations

approximates the point in the ranked list that half of the known adverse effects were

recovered by Chi-square statistics with FDR or PSI-double+triple model.

AP, precision at 100, and median rank of rediscoveries are calculated and compared

between pre- and post- reranking for the two datasets. A two-sample Kolmogorov-Smirnov

test was conducted to evaluate if there is a significant difference for the total number of

rediscoveries at each rank between pre- and post-reranking signals. The rediscovery at each

rank is also plotted for different dataset.

To measure the performance with respect to the true positive rate (TPR) and false positive

rate (FPR), receiver operating characteristic (ROC) curve was plotted for statistically

significant drug/ADR associations. Subsequently, an area under the ROC curve (AUROC)

117

from the ranking of each model and an area for precision and recall (AUC for Precision-

Recall) were calculated using AUCCalculator (Davis & Goadrich, 2006).

6.3 Results

6.3.1 Performance

Results of comparing pre-reranking and post-reranking for different datasets are shown in

Table 6-2. PSI-based models perform better than RRI-based models and both models

perform better than the random baseline.

Table 6-2: Results of precision and rank-based measures for different groups and

different datasets

Test Set Group MAP Precision

at 100

Median

Rank

AUC for

Precision-

Recall

AUROC

SIDER2 Pre-

reranking

(Chi)

0.1424 0.16 103,528 0.1411 0.5708

Post-

reranking

(Similarity)

0.1640 0.21 85,166 0.1639 0.6323

Reference

Set

Pre-

reranking

(Chi)

0.1655 0.19 87,218 0.1641 0.5664

Post-

reranking

(Similarity)

0.1847 0.21 75,197 0.1846 0.6173

118

With the SIDER2 dataset, the median rank of true positives with EHR data (pre-reranking

signals) is 103,528; and in combination with the LBD model (post-reranking signals) it is

85,166. With the reference set, the median rank of true positives in pre- and post-reranking

signals is 87,218 and 75,197, respectively. Two-sample Kolmogorov-Smirnov test shows

there is a significant difference of the accumulated number of true positives at each rank

(Figure 6-2) between pre- and post- reranking methods for both datasets (all P-value <

2.2e-16).

Figure 6-2: Comparing accumulated number of true positives for each ranking between

pre- and post-reranking signals for SIDER2 set (top) and the Reference set (bottom)

119

6.3.2 AUC

Figures 6-3 and 6-4 present the global ROC curves. ROC curve shows the tradeoff between

sensitivity and specificity. The AUC provides a cumulative estimate of accuracy, and is

shown for each model in Table 6-2. With respective dataset, AUROC of post-reranking

method is greater than the pre-reranking method.

Figure 6-3: ROC plot of true positive rate and false positive rate for pre- and post-

reranking groups for SIDER2 set

120

Figure 6-4: ROC plot of true positive rate and false positive rate for pre- and post-

reranking groups for the Reference set

6.4 Discussion

In this experiment, drug/ADR signals that were predicted from EHR data using the best

performing statistical algorithm were reranked using the PSI model. This resulted in

significant increases of the true positive rate at a corresponding rank (in comparison to the

true positive rate with no reranking). Precision and AUC are better performed with post-

reranking method. The experiment demonstrates that the PSI model can filter noisy signals

using a measure of the plausibility of the relationship between the drugs and ADRs

concerned.

Overall, based on what I have learned from the series of experiments, I want to propose the

architecture for a plausibility-based pharmacovigilance system (Figure 6-5). This

121

framework describes the essential steps: (1) Process EHR data to extract coded drugs and

possible problems; a NLP tool is required for processing unstructured clinical notes. (2)

Select a cohort for drug/ADR analysis. (3) From the drug/problem candidates, existing

known relationships knowledge is used to filter the known drug/problem relationships. (4)

Conducting statistical analysis for drug/problem candidates and identifying statistically

significant drug/ADR associations. (5) Using the PSI model to justify plausible drug/ADR

associations and filtering the statistically detected signals using their plausibility. (6)

Present plausible drug/ADR signals and their evidence retrieved by the PSI model to

clinicians or practitioners for review. (7) Automatically submit the detected drug/ADR

signals with all related clinical information to PV health administrative departments. This

helps clinicians in making clinical decision for the patients that are taking the relevant

drugs.

6.5 Conclusion

This experiment supports my overall hypothesis that the precision of signal detection

from EHR data can be improved by integrating knowledge from the biomedical literature.

Figure 6-5: The proposed architecture for a plausibility-based pharmacovigilance system

122

123

Chapter 7: Key Findings, Innovation, Contributions, Future work and Conclusions

7.1 Overview and summary of key findings

For this thesis I conducted four experiments. First, of all the SIDER2 ADRs that occur in an

outpatient EHR system, I was able to detect 10-15% in that EHR data by using existing

disproportionality measures and Chi-square statistics with FDR. The Chi-square statistics with

FDR was demonstrated as the best performing model with an F-measure of 0.1826, evaluated

against SIDER2. Second, I built two LBD models based on scalable methods of distributional

semantics (RRI and PSI discovery patterns) to identify possible drug/ADR associations utilizing

the biomedical literature. The PSI discovery patterns model outperforms the RRI co-occurrence

based model and can be used to evaluate the plausibility of drug/ADR associations. It has the

additional advantage of modeling the relations between the involved medical concepts. Third, as

a consequence of possible false associations that exist in the SIDER side effects dataset used for

evaluation, I constructed a drug/ADR reference set. This reference set is based on the consistency

between FAERS and the SIDER data set. I then used this as an additional reference set for

evaluating side effects. Fourth and last, I applied a plausibility measure obtained through PSI

discovery patterns to rerank the statistically significant drug/ADR associations detected in EHR

data, which improved the precision against SIDER2 and the newly developed reference set. This

verified my overall hypothesis that using literature to justify plausibility can improve the quality

of signal detection from EHR data.

124

7.2 Innovation

I have developed means to partially automate the signal evaluation process by integrating

knowledge extracted from the biomedical literature with possible drug/ADR signals derived from

EHR data using statistical methods. To do so, recent LBD methods using “discovery patterns”

were adapted to the task of evaluating the plausibility of drug/ADR signals at scale. To the best of

my knowledge, this is the first time knowledge from the biomedical literature has been integrated

with EHR data for signal detection. Previous attempts at applying methods of literature based

discovery to find possible mechanisms to explain drug/ADR associations were conducted using

manually defined discovery patterns. My research describes a method to automatically define

pharmacovigilance related discovery patterns from the literature, and applies those to find

predication-based explanations in an automated way on a large scale.

In summary, this research is the first to integrate semantic predications into the signal detection

process to provide evidence supporting the plausibility of the connection between drugs and

ADRs. The automated evaluation of plausibility to support causality according to the meaning

defined by the Bradford-Hill criteria, is a novel contribution to the field of pharmacovigilance and

to signal detection in general.

7.3 Theoretical contribution

From a theoretical perspective, my work is motivated by the notion of abductive reasoning as

described by American philosopher CS Peirce (Peirce, 1955). According to Peirce, abductive

reasoning (Schvaneveldt & Cohen, 2010) is the process of seeking the best possible explanation

for an observation. The observation is a given and it is the goal to find an explanation that is

125

sufficient to justify the presence of the observation. This is in contrary to deductive reasoning

where from a given starting condition possible consequences are deduced. It is also important to

note the difference between a sufficient and a necessary condition. A sufficient explanation serves

to explain an observation but its presence is not necessary. Thus, the observation can still be

explained by other conditions and one explanation does not explicitly exclude others.

Currently in PV, signals indicating possible causal associations are discovered within a large

database and then manually evaluated. A problem with this approach is that mined associations

can be relevant or irrelevant, or may even be negative associations. Exhaustive manual review of

these potential signals is not feasible. In my research I abduce explanations in an automated way,

or, in other words, propose and find logical explanations for the identified associations in an

automated way. The theoretical value added is then to use the generated explanations to

collectively assess a measure for the plausibility of an association. The theoretical contribution is

constituted by not primarily focusing on finding explanatory hypotheses but by focusing on

assessing the plausibility of an observation. Where abductive reasoning is concerned with finding

truthful explanations for an observation, my framework adds to that by going a step further and

also assessing the truthfulness of the observation itself.

7.4 Practical contribution

Practically, the detection of signals from EHR data has the potential to improve post-marketing

drug surveillance in real world applications. The computational requirements of the employed

algorithms and tools are sufficiently low to allow for delivery of results in real time. In this context,

“real time” effectively means timely surveillance, and thus the early detection of ADRs that are

potentially harmful to patients. In the concluding experiment I showed that augmenting statistical

126

associations with a plausibility measure enhances the identification of known ADRs, suggesting

that this approach would also lead to more accurate identification of novel ADRs. Furthermore,

delivery of the underlying explanatory hypotheses to domain experts has the potential to increase

the efficiency of the critical clinical review process. This is because the number of associations

that is mined from EHR data is very high, and these signals still have to be manually evaluated.

Prioritizing signals automatically has the potential to speed up the review process. Noise can be

separated from those signals supported by plausible evidence because of the evidence provided to

researchers or practitioners. Moreover, the evidence provided is in a very concise format (at the

predication level rather than the document level) that allows for the exploration of a large amount

of evidence in an efficient manner.

With respect to informatics in general, the methods and procedures of integrating formal

knowledge can be generalized and applied to other domains. With the rapid growth of use of EHR

data for clinical research (Hripcsak & Albers, 2013), new findings can be learned from the EHR

data. The PSI discovery patterns model can be adapted and used to provide the automated

interpretation of the findings. A very interesting example is outbreak surveillance, for which a

timely identification of plausible signals is essential. For example, an influenza outbreak can be

identified from EHR data by mining abnormal lab results, symptoms and outpatient diagnoses. PSI

discovery patterns can retrieve possible sources, mode of transmission and risk factors (CDC,

2006) by analyzing the plausible pathways between the virus and the disease/symptoms. This can

assist field investigators’ work.

In summary, the detection of signals from EHR data and the subsequent evaluation using

supporting evidence from literature with PSI discovery patterns in pharmacovigilance on a large

scale in an automated way has not been done before and has shown good results. The methods and

127

procedures proposed and examined in this work are novel to the field of pharmacovigilance and

applicable to other informatics areas.

7.5 Future work

In future work, I plan to improve my methods for the estimation of plausibility, and provide better

ways for domain experts to explore the evidence that supports the explanatory hypotheses the

methods generate. Although predications are a concise way to present supporting evidence

gathered by PSI discovery patterns from the medical literature, the presentation to domain experts

for review in a concise way still presents a challenge. Since evidence is retrieved based on

predications which can naturally be built into a graph network, it is straightforward to examine if

graph algorithms can be utilized to analyze the evidence network. This can result in prioritization

or the identification of important biological factors, or even biological pathways. Subsequently,

improved visualizations could highlight specific aspects like biological factors and pathways and

could put an emphasis on important concepts through their connectedness for example. This would

add to the efficacy of the manual review process.

The methods developed for my research have the potential to support a real-time PV system. This

work has demonstrated that signal detection and signal evaluation can be done in a partially

automated and therefore, potentially more timely manner. I will continue working in this direction

with the aim to contribute to the development of an active drug surveillance system.

7.6 Conclusions

This thesis demonstrates that drug/ADR associations can be detected from unstructured outpatient

clinical notes by using statistical mining algorithms. PSI discovery patterns can further improve

128

the precision of detected signals by leveraging knowledge from literature and modeling the

plausibility of the identified associations. Consequently this work has extended the state of the art

in EHR-based pharmacovigilance and contributed new ideas that pave the way for further studies

with the potential to further enhance the field of pharmacovigilance and drug safety.

129

References

Agbabiaka, T. B., Savović, J., & Ernst, E. (2008). Methods for causality assessment of adverse

drug reactions. Drug Safety, 31(1), 21–37.

Ahlers, C. B., Fiszman, M., Demner-Fushman, D., Lang, F.-M., & Rindflesch, T. C. (2007).

Extracting semantic predications from Medline citations for pharmacogenomics. In Pac

Symp Biocomput (Vol. 12, pp. 209–20).

Ahlers, C. B., Hristovski, D., Kilicoglu, H., & Rindflesch, T. C. (2007). Using the Literature-

Based Discovery Paradigm to Investigate Drug Mechanisms. AMIA Annual Symposium

Proceedings, 2007, 6–10.

Ahmad, S. R. (2003). Adverse drug event monitoring at the Food and Drug Administration.

Journal of General Internal Medicine, 18(1), 57–60.

Ahmed, I., Dalmasso, C., Haramburu, F., Thiessard, F., Broët, P., & Tubert-Bitter, P. (2010).

False discovery rate estimation for frequentist pharmacovigilance signal detection

methods. Biometrics, 66(1), 301–309.

Ahmed, I., Thiessard, F., Miremont-Salamé, G., Bégaud, B., & Tubert-Bitter, P. (2010).

Pharmacovigilance Data Mining With Methods Based on False Discovery Rates: A

Comparative Simulation Study. Clinical Pharmacology & Therapeutics, 88(4), 492–498.

doi:10.1038/clpt.2010.111

130

Akbani, R., Kwek, S., & Japkowicz, N. (2004). Applying support vector machines to imbalanced

datasets. In Machine Learning: ECML 2004 (pp. 39–50). Springer. Retrieved from

http://link.springer.com/chapter/10.1007/978-3-540-30115-8_7

Almenoff, D. J., Tonning, J. M., Gould, A. L., Szarfman, A., Hauben, M., Ouellet-Hellstrom, R.,

… LaCroix, K. (2005). Perspectives on the Use of Data Mining in Pharmacovigilance.

Drug Safety, 28(11), 981–1007. doi:10.2165/00002018-200528110-00002

Almenoff, J. S., Pattishall, E. N., Gibbs, T. G., DuMouchel, W., Evans, S. J. W., & Yuen, N.

(2007). Novel Statistical Tools for Monitoring the Safety of Marketed Drugs. Clinical

Pharmacology & Therapeutics, 82, 157–166. doi:10.1038/sj.clpt.6100258

Alomar, M. J., Hourani, A. A., & Sulaiman, S. A. (2008). Proposal for the development of

adverse drug reaction prediction model. American Journal of Pharmacology and

Toxicology, 3(2), 193.

Alvarez-Requejo, A., Carvajal, A., Bégaud, B., Moride, Y., Vega, T., & Arias, L. H. M. (1998).

Under-reporting of adverse drug reactions Estimate based on a spontaneous reporting

scheme and a sentinel system. European Journal of Clinical Pharmacology, 54(6), 483–

488. doi:10.1007/s002280050498

Anderson, N., & Borlak, J. (2011). Correlation versus Causation? Pharmacovigilance of the

Analgesic Flupirtine Exemplifies the Need for Refined Spontaneous ADR Reporting.

PLoS ONE, 6(10), e25221. doi:10.1371/journal.pone.0025221

Andrews, E. B., & Moore, N. (2014). Mann’s Pharmacovigilance. John Wiley & Sons.

Arimone, Y., Miremont-Salamé, G., Haramburu, F., Molimard, M., Moore, N., Fourrier-Réglat,

A., & Bégaud, B. (2007). Inter-expert agreement of seven criteria in causality assessment

of adverse drug reactions. British Journal of Clinical Pharmacology, 64(4), 482–488.

131

Arnaiz, J. A., Carné, X., Riba, N., Codina, C., Ribas, J., & Trilla, A. (2001). The use of evidence

in pharmacovigilance: Case reports as the reference source for drug withdrawals.

European Journal of Clinical Pharmacology, 57(1), 89–91. doi:10.1007/s002280100265

Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathesaurus: the

MetaMap program. Proceedings of the AMIA Symposium, 17–21.

Aronson, A. R., & Lang, F.-M. (2010). An overview of MetaMap: historical perspective and

recent advances. Journal of the American Medical Informatics Association, 17(3), 229–

236.

Aronson, A. R., & Rindflesch, T. C. (1998). Semantic knowledge representation project. A

Report to the Board of Scientific Counselors.

Aronson, J. K., & Hauben, M. (2006). Drug safety: Anecdotes that provide definitive evidence.

BMJ: British Medical Journal, 333(7581), 1267.

Baker, G. R., Norton, P. G., Flintoft, V., Blais, R., Brown, A., Cox, J., … Tamblyn, R. (2004).

The Canadian Adverse Events Study: The Incidence of Adverse Events Among Hospital

Patients in Canada. Canadian Medical Association Journal, 170(11), 1678–1686.

doi:10.1503/cmaj.1040498

Balakin, K. V., & Ekins, S. (2009). Pharmaceutical Data Mining: Approaches and Applications

for Drug Discovery. John Wiley & Sons.

Bate, A. (2003). The use of Bayesian confidence propagation neural network in

pharmacovigilance (dissertation). Umeå University. Retrieved from http://umu.diva-

portal.org/smash/record.jsf?pid=diva2:144650

132

Bate, A., Lindquist, M., Edwards, I., Olsson, S., Orre, R., Lansner, A., & De Freitas, R. (1998).

A Bayesian neural network method for adverse drug reaction signal generation. European

Journal of Clinical Pharmacology, 54(4), 315–321.

Bate, A., Lindquist, M., Edwards, I. R., & Orre, R. (2002). A Data Mining Approach for Signal

Detection and Analysis. Drug Safety, 25(6), 393–397.

Bates, A., Lindquist M., Orre R., Edwards I., & Meyboom R. (2002). Data-mining analyses of

pharmacovigilance signals in relation to relevant comparison drugs. European Journal of

Clinical Pharmacology, 58, 483–490. doi:10.1007/s00228-002-0484-z

Bauer-Mehren, A., van Mullingen, E. M., Avillach, P., Carrascosa, M. del C., Garcia-Serna, R.,

Piñero, J., … Furlong, L. I. (2012). Automatic Filtering and Substantiation of Drug

Safety Signals. PLoS Computational Biology, 8(4), e1002457.

doi:10.1371/journal.pcbi.1002457

Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and

Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B

(Methodological), 57(1), 289–300.

Berthet, S., Olivier, P., Montastruc, J.-L., & Lapeyre-Mestre, M. (2011). Drug safety of

rosiglitazone and pioglitazone in France: a study using the French PharmacoVigilance

database. BMC Pharmacology and Toxicology, 11(1), 5.

Birkhoff, G., & Von Neumann, J. (1936). The logic of quantum mechanics. Annals of

Mathematics, 823–843.

Bisgin, H., Liu, Z., Fang, H., Xu, X., & Tong, W. (2011). Mining FDA drug labels using an

unsupervised learning technique-topic modeling. BMC Bioinformatics, 12(Suppl 10),

S11.

133

Blind, E., Dunder, K., Graeff, P. A., & Abadie, E. (2011). Rosiglitazone: a European regulatory

perspective. Diabetologia, 54(2), 213–218. doi:10.1007/s00125-010-1992-5

Bodenreider, O., & McCray, A. T. (2003). Exploring semantic groups through visual approaches.

Journal of Biomedical Informatics, 36(6), 414–432. doi:10.1016/j.jbi.2003.11.002

Bourg, C. A., & Phillips, B. B. (2012). Rosiglitazone, Myocardial Ischemic Risk, and Recent

Regulatory Actions. Annals of Pharmacotherapy, 46(2), 282–289.

doi:10.1345/aph.1Q400

Bourgeois, F. T., Mandl, K. D., Valim, C., & Shannon, M. W. (2009). Pediatric adverse drug

events in the outpatient setting: an 11-year national analysis. Pediatrics, 124(4), e744–

e750.

Bousquet, C., Henegar, C., Louėt, A. L. L., Degoulet, P., & Jaulent, M. C. (2005).

Implementation of automated signal generation in pharmacovigilance using a knowledge-

based approach. International Journal of Medical Informatics, 74(7-8), 563–571.

Bousquet, C., Trombert, B., Kumar, A., & Rodrigues, J. M. (2008). Semantic categories and

relations for modelling adverse drug reactions towards a categorial structure for

pharmacovigilance. In AMIA Annual Symposium Proceedings (Vol. 2008, p. 61).

Boyle, P. J., King, A. B., Olansky, L., Marchetti, A., Lau, H., Magar, R., & Martin, J. (2002).

Effects of pioglitazone and rosiglitazone on blood lipid levels and glycemic control in

patients with type 2 diabetes mellitus: a retrospective review of randomly selected

medical records. Clinical Therapeutics, 24(3), 378–396.

Brackenridge, A. L., Jackson, N., Jefferson, W., Stolinski, M., Shojaee-Moradie, F., Hovorka,

R., … Russell-Jones, D. (2009). Effects of rosiglitazone and pioglitazone on lipoprotein

metabolism in patients with Type 2 diabetes and normal lipids. Diabetic Medicine: A

134

Journal of the British Diabetic Association, 26(5), 532–539. doi:10.1111/j.1464-

5491.2009.02729.x

Brown, J. S., Kulldorff, M., Chan, K. A., Davis, R. L., Graham, D., Pettus, P. T., … Platt, R.

(2007). Early detection of adverse drug events within population-based health networks:

application of sequential testing methods. Pharmacoepidemiology and Drug Safety, 16,

1275–1284. doi:10.1002/pds.1509

Brown, J. S., Kulldorff, M., Petronis, K. R., Reynolds, R., Chan, K. A., Davis, R. L., … Platt, R.

(2009). Early adverse drug event signal detection within population-based health

networks using sequential methods: key methodologic considerations.

Pharmacoepidemiology and Drug Safety, 18, 226–234. doi:10.1002/pds.1706

Bruza, P. D., & Weeber, M. (2008). Literature-based Discovery. Springer.

Cai, T. T., & Sun, W. (2009). Simultaneous testing of grouped hypotheses: Finding needles in

multiple haystacks. Journal of the American Statistical Association, 104(488), 1467–

1481.

Campillos, M., Kuhn, M., Gavin, A.-C., Jensen, L. J., & Bork, P. (2008). Drug target

identification using side-effect similarity. Science, 321(5886), 263–266.

Cao, H., Hripcsak, G., & Markatou, M. (2007). A statistical methodology for analyzing co-

occurrence data from a large sample. Journal of Biomedical Informatics, 40(3), 343–352.

Cao, H., Markatou, M., Melton, G. B., Chiang, M. F., & Hripcsak, G. (2005). Mining a clinical

data warehouse to discover disease-finding associations using co-occurrence statistics. In

AMIA Annual Symposium Proceedings (Vol. 2005, p. 106).

135

Capuano, A., Motola, G., Russo, F., Avolio, A., Filippelli, A., Rossi, F., & Mazzeo, F. (2004).

Adverse drug events in two emergency departments in Naples, Italy: an observational

study. Pharmacological Research, 50(6), 631–636.

CDC. (2006). Investigating an Outbreak. In Principles of Epidemiology in Public Health

Practice (third edition.). CDC. Retrieved from

http://www.uic.edu/sph/prepare/courses/ph490/resources/epilesson06.pdf

Center for Drug Evaluation and Research, & U.S. Food and Drug Administration. (2012,

September 10). FDA Adverse Event Reporting System (FAERS) (formerly AERS).

WebContent. Retrieved March 12, 2014, from

http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/Adv

erseDrugEffects/default.htm

Center for Drug Evaluation and Research, & U.S. Food and Drug Administration. (2014,

February 26). FDA Adverse Events Reporting System (FAERS) - FDA Adverse Event

Reporting System (FAERS): Latest Quarterly Data Files. WebContent. Retrieved March

12, 2014, from

http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/Adv

erseDrugEffects/ucm082193.htm

Chan, M., Nicklason, F., & Vial, J. (2001). Adverse drug events as a cause of hospital admission

in the elderly. Internal Medicine Journal, 31(4), 199–205.

Chao, L., Marcus-Samuels, B., Mason, M. M., Moitra, J., Vinson, C., Arioglu, E., … Reitman,

M. L. (2000). Adipose tissue is required for the antidiabetic, but not for the

hypolipidemic, effect of thiazolidinediones. Journal of Clinical Investigation, 106(10),

1221–1228.

136

Chen, B., Wild, D., & Guha, R. (2009). PubChem as a Source of Polypharmacology. Journal of

Chemical Information and Modeling, 49(9), 2044–2055. doi:10.1021/ci9001876

Chen, L., & Friedman, C. (2004). Extracting phenotypic information from the literature via

natural language processing. Studies in Health Technology and Informatics, 107(Pt 2),

758–762.

Cheung, B. M. (2010). Behind the rosiglitazone controversy. Expert Review of Clinical

Pharmacology, 3(6), 723–725. doi:10.1586/ecp.10.126

Christopher D. Manning, & Hinrich Schütze. (1999). Foundations of statistical natural language

processing (Vol. 999). MIT Press.

Cohen, T., Schvaneveldt, R., & Widdows, D. (2010). Reflective Random Indexing and indirect

inference: A scalable method for discovery of implicit connections. Journal of

Biomedical Informatics, 43(2), 240–256.

Cohen, T., & Widdows, D. (2009). Empirical Distributional Semantics: Methods and Biomedical

Applications. Journal of Biomedical Informatics, 42(2), 390–405.

doi:10.1016/j.jbi.2009.02.002

Cohen, T., Widdows, D., De Vine, L., Schvaneveldt, R., & Rindflesch, T. C. (2012). Many paths

lead to discovery: analogical retrieval of cancer therapies. Retrieved from

http://eprints.qut.edu.au/54257/

Cohen, T., Widdows, D., Schvaneveldt, R. W., Davies, P., & Rindflesch, T. C. (2012).

Discovering discovery patterns with predication-based Semantic Indexing. Journal of

Biomedical Informatics, 45(6), 1049–1065. doi:10.1016/j.jbi.2012.07.003

Cohen, T., Widdows, D., Schvaneveldt, R. W., & Rindflesch, T. C. (2012). Discovery at a

distance: Farther journeys in predication space. In Bioinformatics and Biomedicine

137

Workshops (BIBMW), 2012 IEEE International Conference on (pp. 218–225). Retrieved

from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6470307

Cole, R., & Bruza, P. (2005). A Bare Bones Approach to Literature-Based Discovery: An

Analysis of the Raynaud’s/Fish-Oil and Migraine-Magnesium Discoveries in Semantic

Space. In A. Hoffmann, H. Motoda, & T. Scheffer (Eds.), Discovery Science (Vol. 3735,

pp. 84–98). Springer Berlin / Heidelberg. Retrieved from

http://www.springerlink.com/content/y14820t9518583u1/abstract/

Collins, F. S. (2011). Reengineering translational science: the time is right. Science Translational

Medicine, 3(90), 90cm17–90cm17.

Coloma, P. M., Avillach, P., Salvo, F., Schuemie, M. J., Ferrajolo, C., Pariente, A., … Trifirò, G.

(2013). A reference standard for evaluation of methods for drug safety signal detection

using electronic healthcare record databases. Drug Safety: An International Journal of

Medical Toxicology and Drug Experience, 36(1), 13–23. doi:10.1007/s40264-012-0002-x

Davis, J., & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. In

Proceedings of the 23rd international conference on Machine learning (pp. 233–240).

Retrieved from http://dl.acm.org/citation.cfm?id=1143874

De Caterina, R., Talmud, P. J., Merlini, P. A., Foco, L., Pastorino, R., Altshuler, D., …

Ardissino, D. (2011). Strong association of the APOA5-1131T>C gene variant and

early-onset acute myocardial infarction. Atherosclerosis, 214(2), 397–403.

doi:10.1016/j.atherosclerosis.2010.11.011

Deftereos, S. N., Andronis, C., Friedla, E. J., Persidis, A., & Persidis, A. (2011). Drug

repurposing and adverse event prediction using high-throughput literature analysis. Wiley

138

Interdisciplinary Reviews: Systems Biology and Medicine, 3(3), 323–334.

doi:10.1002/wsbm.147

DiMasi, J. A., Hansen, R. W., & Grabowski, H. G. (2003). The price of innovation: new

estimates of drug development costs. Journal of Health Economics, 22(2), 151–185.

doi:10.1016/S0167-6296(02)00126-1

Ding, J., Nagai, K., & Woo, J.-T. (2003). Insulin-dependent adipogenesis in stromal ST2 cells

derived from murine bone marrow. Bioscience, Biotechnology, and Biochemistry, 67(2),

314–321.

Duke J, Friedlin J, & Ryan P. (2011). A quantitative analysis of adverse events and

“overwarning” in drug labeling. Archives of Internal Medicine, 171(10), 941–954.

doi:10.1001/archinternmed.2011.182

Dumais, S. T. (1991). Improving the retrieval of information from external sources. Behavior

Research Methods, Instruments, & Computers, 23(2), 229–236.

DuMouchel, W. (1999). Bayesian Data Mining in Large Frequency Tables, with an Application

to the FDA Spontaneous Reporting System. The American Statistician, 53(3), 177–190.

DuMouchel, W., & Pregibon, D. (2001). Empirical bayes screening for multi-item associations.

In Proceedings of the seventh ACM SIGKDD international conference on Knowledge

discovery and data mining (pp. 67–76).

DuMouchel, W., Ryan, P. B., Schuemie, M. J., & Madigan, D. (2013). Evaluation of

Disproportionality Safety Signaling Applied to Healthcare Databases. Drug Safety,

36(S1), 123–132. doi:10.1007/s40264-013-0106-y

Edwards, I. R. (1992). The Who Database—I. Drug Information Journal, 26(4), 477–480.

doi:10.1177/009286159202600403

139

Edwards, I. R., & Aronson, J. K. (2000). Adverse drug reactions: definitions, diagnosis, and

management. The Lancet, 356(9237), 1255–1259.

Edwards, I. R., & Biriell, C. (1994). Harmonisation in pharmacovigilance. Drug Safety, 10(2),

93–102.

Egerod, F. L., Svendsen, J. E., Hinley, J., Southgate, J., Bartels, A., Brunner, N., & Oleksiewicz,

M. B. (2009). PPAR and PPAR Coactivation Rapidly Induces Egr-1 in the Nuclei of the

Dorsal and Ventral Urinary Bladder and Kidney Pelvis Urothelium of Rats. Toxicologic

Pathology, 37(7), 947–958. doi:10.1177/0192623309351723

EMA. (2010). European Medicines Agency recommends suspension of Avandia, Avandamet and

Avaglim. (Press release). Retrieved from

http://www.ema.europa.eu/docs/en_GB/document_library/Press_release/2010/09/WC500

096996.pdf

Evans, S. J. W., Waller, P. C., & Davis, S. (2001). Use of proportional reporting ratios (PRRs)

for signal generation from spontaneous adverse drug reaction reports.

Pharmacoepidemiology and Drug Safety, 10(6), 483–486. doi:10.1002/pds.677

FDA. (2011, May 18). FDA Drug Safety Communication: Updated Risk Evaluation and

Mitigation Strategy (REMS) to Restrict Access to Rosiglitazone-containing Medicines

including Avandia, Avandamet, and Avandaryl. WebContent. Retrieved April 12, 2012,

from http://www.fda.gov/Drugs/DrugSafety/ucm255005.htm

FDA Science Board Subcommittee. (2011). FDA Science Board Subcommittee: review of the

FDA/CDER pharmacovigilance program.

Fine, A. (2013, February 26). Introduction to Postmarketing Drug Safety Surveillance:

Pharmacovigilance in FDA/CDER. Retrieved from

140

http://www.fda.gov/downloads/AboutFDA/WorkingatFDA/FellowshipInternshipGraduat

eFacultyPrograms/PharmacyStudentExperientialProgramCDER/UCM340626.pdf

Fliegner, D., Westermann, D., Riad, A., Schubert, C., Becher, E., Fielitz, J., … Regitzzagrosek,

V. (2008). Up-regulation of PPARγ in myocardial infarction. European Journal of Heart

Failure, 10(1), 30–38. doi:10.1016/j.ejheart.2007.11.005

Friedman, C. (2009). Discovering Novel Adverse Drug Events Using Natural Language

Processing and Mining of the Electronic Health Record. In C. Combi, Y. Shahar, & A.

Abu-Hanna (Eds.), Artificial Intelligence in Medicine (pp. 1–5). Springer Berlin

Heidelberg. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-02976-

9_1

Friedman, C., Shagina, L., Lussier, Y., & Hripcsak, G. (2004). Automated Encoding of Clinical

Documents Based on Natural Language Processing. Journal of the American Medical

Informatics Association, 11(5), 392–402. doi:10.1197/jamia.M1552

Friis-Moller, N., Sabin, C. A., Weber, R., d’ Arminio Monforte, A., El-Sadr, W. M., Reiss, P., …

Pradier, C. (2003). Combination antiretroviral therapy and the risk of myocardial

infarction. New England Journal of Medicine, 349(21), 1993–2003.

Gao, D.-F., Niu, X.-L., Hao, G.-H., Peng, N., Wei, J., Ning, N., & Wang, N.-P. (2007).

Rosiglitazone inhibits angiotensin II-induced CTGF expression in vascular smooth

muscle cells––Role of PPAR-γ in vascular fibrosis. Biochemical Pharmacology, 73(2),

185–197. doi:10.1016/j.bcp.2006.09.019

Gayler, R. W. (2004). Vector Symbolic Architectures answer Jackendoff’s challenges for

cognitive neuroscience. arXiv:cs/0412059. Retrieved from

http://arxiv.org/abs/cs/0412059

141

Genkin, A., Lewis, D. D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for

text categorization. Technometrics, 49(3), 291–304.

Goldberg, R. B., Kendall, D. M., Deeg, M. A., Buse, J. B., Zagar, A. J., Pinaire, J. A., …

Jacober, S. J. (2005). A Comparison of Lipid and Glycemic Effects of Pioglitazone and

Rosiglitazone in Patients With Type 2 Diabetes and Dyslipidemia. Diabetes Care, 28(7),

1547–1554. doi:10.2337/diacare.28.7.1547

Golub, G. H., & Van Loan, C. F. (2012). Matrix computations (Vol. 3). JHU Press. Retrieved

from http://books.google.com/books?hl=en&lr=&id=5U-l8U3P-

VUC&oi=fnd&pg=PT10&dq=Golub+and+van+Loan+Matrix+Computations+1989&ots

=7YKwIq0Rar&sig=-OanYbHRwdgPqNlFZhMbUqYeq8I

Gordon, M. D., & Dumais, S. (1998). Using latent semantic indexing for literature based

discovery. Journal of the American Society for Information Science, 49(8), 674–685.

Gould, A. L. (2005, May 18). Issues in the Practical Application of Data Mining Techniques to

Pharmacovigilance. Retrieved from

http://www.fda.gov/ohrms/dockets/ac/05/slides/2005-4143S1_06_Gould-

Merck_files/frame.htm

Graham, D. J., Campen, D., Hui, R., Spence, M., Cheetham, C., Levy, G., … Ray, W. A. (2005).

Risk of acute myocardial infarction and sudden cardiac death in patients treated with

cyclo-oxygenase 2 selective and non-selective non-steroidal anti-inflammatory drugs:

nested case-control study. The Lancet, 365(9458), 475–481.

Gupta, N., Singh, S., Maturu, V. N., Sharma, Y. P., & Gill, K. D. (2011). Paraoxonase 1 (PON1)

Polymorphisms, Haplotypes and Activity in Predicting CAD Risk in North-West Indian

Punjabis. PLoS ONE, 6(5), e17805. doi:10.1371/journal.pone.0017805

142

Haerian, K., Varn, D., Vaidya, S., Ena, L., Chase, H. S., & Friedman, C. (2012). Detection of

Pharmacovigilance-Related Adverse Events Using Electronic Health Records and

Automated Methods. Clinical Pharmacology & Therapeutics, 92(2), 228–234.

doi:10.1038/clpt.2012.54

Hamblin, M., Chang, L., Fan, Y., Zhang, J., & Chen, Y. E. (2009). PPARs and the

cardiovascular system. Antioxidants & Redox Signaling, 11(6), 1415–1452.

Hamilton, R. A., Briceland, L. L., & Andritz, M. H. (1998). Frequency of Hospitalization after

Exposure to Known Drug-Drug Interactions in a Medicaid Population.

Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy, 18(5),

1112–1120.

Hammann, F., Gutmann, H., Vogt, N., Helma, C., & Drewe, J. (2010). Prediction of Adverse

Drug Reactions Using Decision Tree Modeling. Clinical Pharmacology & Therapeutics,

88(1), 52–59. doi:10.1038/clpt.2009.248

Harpaz, R., Vilar, S., DuMouchel, W., Salmasian, H., Haerian, K., Shah, N. H., … Friedman, C.

(2013). Combing signals from spontaneous reports and electronic health records for

detection of adverse drug reactions. Journal of the American Medical Informatics

Association. Retrieved from

http://jamia.bmjjournals.com/content/early/2012/10/30/amiajnl-2012-000930.abstract

Hasford, J., Goettler, M., Munter, K.-H., & Müller-Oerlinghausen, B. (2002). Physicians’

knowledge and attitudes regarding the spontaneous reporting system for adverse drug

reactions. Journal of Clinical Epidemiology, 55(9), 945–950.

Hauben, M., & Aronson, J. K. (2007). Gold Standards in Pharmacovigilance. Drug Safety, 30(8),

645–655.

143

Hauben, M., & Bate, A. (2009). Decision support methods for the detection of adverse events in

post-marketing data. Drug Discovery Today, 14(7-8), 343–357.

doi:10.1016/j.drudis.2008.12.012

Hauben, M., Madigan, D., Gerrits, C. M., Walsh, L., & Van Puijenbroek, E. P. (2005). The role

of data mining in pharmacovigilance. Retrieved from

http://informahealthcare.com/doi/abs/10.1517/14740338.4.5.929

Hauben, M., & Reich, L. (2004). Safety Related Drug-Labelling Changes. Drug Safety, 27(10),

735–744.

Hauben, M., & Reich, L. (2005). Communication of findings in pharmacovigilance: use of the

term “signal” and the need for precision in its use. European Journal of Clinical

Pharmacology, 61(5), 479–480. doi:10.1007/s00228-005-0951-4

Henegar, C., Bousquet, C., Lillo-Le Louët, A., Degoulet, P., & Jaulent, M.-C. (2006). Building

an ontology of adverse drug reactions for automated signal generation in

pharmacovigilance. Computers in Biology and Medicine, 36(7–8), 748–767.

doi:10.1016/j.compbiomed.2005.04.009

Herskovic, J. R., & Bernstam, E. V. (2005). Using Incomplete Citation Data for MEDLINE

Results Ranking. AMIA Annual Symposium Proceedings, 2005, 316–320.

Hill, A. B. (1965). The environment and disease: association or causation? Proceedings of the

Royal Society of Medicine, 58(5), 295.

Honigman, B., Lee, J., Rothschild, J., Light, P., Pulling, R. M., Yu, T., & Bates, D. W. (2001).

Using computerized data to identify adverse drug events in outpatients. Journal of the

American Medical Informatics Association, 8(3), 254.

144

Honigman, B., Light, P., Pulling, R. M., & Bates, D. W. (2001). A computerized method for

identifying incidents associated with adverse drug events in outpatients. International

Journal of Medical Informatics, 61(1), 21–32.

Hoofnagle, J. H. (2004). Drug-induced liver injury network (DILIN). Hepatology, 40(4), 773–

773. doi:10.1002/hep.1840400403

Hripcsak, G., & Albers, D. J. (2013). Next-generation phenotyping of electronic health records.

Journal of the American Medical Informatics Association, 20(1), 117–121.

doi:10.1136/amiajnl-2012-001145

Hristovski, D., Burgun-Parenthoine, A., Avillach, P., & Rindflesch, T. C. (2012a). Towards

Using Literature-based Discovery to Explain Drug Adverse Effects. In Quality of Life

through Quality of Information. Retrieved from

http://person.hst.aau.dk/ska/mie2012/CD/Interface_MIE2012/MIE_2012_Content/MIE_2

012_Content/sco.html

Hristovski, D., Burgun-Parenthoine, A., Avillach, P., & Rindflesch, T. C. (2012b). Using

Literature-based Discovery to Explain Drug Adverse Effects. In AMIA Annual

Symposium Proceedings (p. 1779).

Hristovski, D., Friedman, C., Rindflesch, T. C., & Peterlin, B. (2006). Exploiting Semantic

Relations for Literature-Based Discovery. AMIA Annual Symposium Proceedings, 2006,

349–353.

Hristovski, D., Friedman, C., Rindflesch, T. C., & Peterlin, B. (2008). Literature-Based

Knowledge Discovery using Natural Language Processing. In P. Bruza & M. Weeber

(Eds.), Literature-based Discovery (pp. 133–152). Springer Berlin Heidelberg. Retrieved

from http://link.springer.com/chapter/10.1007/978-3-540-68690-3_9

145

Huang, L.-C., Wu, X., & Chen, J. Y. (2011). Predicting adverse side effects of drugs. BMC

Genomics, 12(Suppl 5), S11.

Im, M. J., & Hoopes, J. E. (1983). Increases in dihydronicotinamide adenine dinucleotide

(NADH) content and alpha-glycerophosphate dehydrogenase activity in epidermal wound

healing. Proceedings of the Society for Experimental Biology and Medicine. Society for

Experimental Biology and Medicine (New York, N.Y.), 173(1), 17–20.

James E. Tisdale, & Douglas A Miller. (2010). Drug-Induced Diseases: Prevention, Detection,

and Management (Second Edition edition.). Bethesda, Md: ASHP.

Johansson, S., Wallander, M. A., de Abajo, F. J., & Rodriguez, L. A. G. (2010). Prospective drug

safety monitoring using the UK primary-care General Practice Research Database:

theoretical framework, feasibility analysis and extrapolation to future scenarios. Drug

Safety, 33(3), 223–232.

Johnson, B. A., Wilson, E. M., Li, Y., Moller, D. E., Smith, R. G., & Zhou, G. (2000). Ligand-

induced stabilization of PPARγ monitored by NMR spectroscopy: implications for

nuclear receptor activation. Journal of Molecular Biology, 298(2), 187–194.

doi:10.1006/jmbi.2000.3636

Josephson, J. R., & Josephson, S. G. (1996). Abductive inference: Computation, philosophy,

technology. Cambridge University Press. Retrieved from

http://books.google.com/books?hl=en&lr=&id=uu6zXrogwWAC&oi=fnd&pg=PA1&dq

=Abductive+Inference:+Computation,+Philosophy,+Technology&ots=6MzrU6Y7AC&si

g=msk0oNa-wcWqEBisEzNHNiUhrck

Jung, H. S., Youn, B.-S., Cho, Y. M., Yu, K.-Y., Park, H. J., Shin, C. S., … Park, K. S. (2005).

The effects of rosiglitazone and metformin on the plasma concentrations of resistin in

146

patients with type 2 diabetes mellitus. Metabolism: Clinical and Experimental, 54(3),

314–320. doi:10.1016/j.metabol.2004.05.019

Kanehisa, M., & Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic

Acids Research, 28(1), 27–30.

Kanerva, P. (1994). The spatter code for encoding concepts at many levels. In ICANN’94 (pp.

226–229). Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-1-

4471-2097-1_52

Kanerva, P. (1996). Binary spatter-coding of ordered K-tuples. In Artificial Neural Networks—

ICANN 96 (pp. 869–873). Springer. Retrieved from

http://link.springer.com/chapter/10.1007/3-540-61510-5_146

Kanerva, P. (2009). Hyperdimensional Computing: An Introduction to Computing in Distributed

Representation with High-Dimensional Random Vectors. Cognitive Computation, 1(2),

139–159. doi:10.1007/s12559-009-9009-8

Kanerva, P. (2010). What we mean when we say “What’s the dollar of mexico?”: Prototypes and

mapping in concept space. In Proc. AAAI Fall Symp. on Quantum Informatics for

Cognitive, Social, and Semantic Processes. Retrieved from

http://www.aaai.org/ocs/index.php/FSS/FSS10/paper/viewPDFInterstitial/2243/2691

Kanerva, P., Kristofersson, J., & Holst, A. (2000). Random indexing of text samples for latent

semantic analysis. In Proceedings of the 22nd annual conference of the cognitive science

society (Vol. 1036). Retrieved from

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.4.6523&rep=rep1&type=pdf

147

Kessler, D. A., Natanblut, S., Kennedy, D., Lazar, E., Rheinstein, P., Anello, C., … Cook, K.

(1993). Introducing MEDWatch. JAMA: The Journal of the American Medical

Association, 269(21), 2765–2768.

Khanderia, U., Pop-Busui, R., & Eagle, K. A. (2008). Thiazolidinediones in Type 2 Diabetes: A

Cardiology Perspective. Annals of Pharmacotherapy, 42(10), 1466–1474.

doi:10.1345/aph.1K666

Kilicoglu, H., Fiszman, M., Rodriguez, A., Shin, D., Ripple, A., & Rindflesch, T. C. (2008).

Semantic MEDLINE: a web application for managing the results of PubMed Searches. In

Proceedings of the third international symposium for semantic mining in biomedicine

(Vol. 2008, pp. 69–76). Retrieved from http://datacommunitydc.org/blog/wp-

content/uploads/2013/04/semmed_smbm_15_cr.pdf

Kilicoglu, H., Fiszman, M., Rosemblat, G., Marimpietri, S., & Rindflesch, T. C. (2010).

Arguments of nominals in semantic interpretation of biomedical text. In Proceedings of

the 2010 workshop on biomedical natural language processing (pp. 46–54). Retrieved

from http://dl.acm.org/citation.cfm?id=1869967

Kim, K. Y. (2006). Antiangiogenic Effect of Rosiglitazone Is Mediated via Peroxisome

Proliferator-activated Receptor -activated Maxi-K Channel Opening in Human Umbilical

Vein Endothelial Cells. Journal of Biological Chemistry, 281(19), 13503–13512.

doi:10.1074/jbc.M510357200

Kleinberg, S., & Hripcsak, G. (2011). A review of causal inference for biomedical informatics.

Journal of Biomedical Informatics, 44(6), 1102–1112. doi:10.1016/j.jbi.2011.07.001

148

Knowles, S. R., Uetrecht, J., & Shear, N. H. (2000). Idiosyncratic drug reactions: the reactive

metabolite syndromes. The Lancet, 356(9241), 1587–1591. doi:10.1016/S0140-

6736(00)03137-8

Knox, C., Law, V., Jewison, T., Liu, P., Ly, S., Frolkis, A., … Wishart, D. S. (2011). DrugBank

3.0: a comprehensive resource for “Omics” research on drugs. Nucleic Acids Research,

39(Database), D1035–D1041. doi:10.1093/nar/gkq1126

Kostoff, R. N., Briggs, M. B., Solka, J. L., & Rushenberg, R. L. (2008). Literature-related

discovery (LRD): Methodology. Technological Forecasting and Social Change, 75(2),

186–202. doi:10.1016/j.techfore.2007.11.010

Kuhn, M., Al Banchaabouchi, M., Campillos, M., Jensen, L. J., Gross, C., Gavin, A.-C., & Bork,

P. (2013). Systematic identification of proteins that elicit drug side effects. Molecular

Systems Biology, 9(1). doi:10.1038/msb.2013.10

Kuhn, M., Campillos, M., Letunic, I., Jensen, L. J., & Bork, P. (2010). A side effect resource to

capture phenotypic effects of drugs. Molecular Systems Biology, 6, 343.

doi:10.1038/msb.2009.98

Kuller Lewis, Alice Arnold, Russell Tracy, James Otvos, Greg Burke, Bruce Psaty, … Richard

Kronmal. (2002). Nuclear Magnetic Resonance Spectroscopy of Lipoproteins and Risk of

Coronary Heart Disease in the Cardiovascular Health Study. Arteriosclerosis,

Thrombosis, and Vascular Biology, 22(7), 1175–1180.

doi:10.1161/01.ATV.0000022015.97341.3A

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic

analysis theory of acquisition, induction, and representation of knowledge. Psychological

Review, 104(2), 211.

149

Lazarou, J., Pomeranz, B. H., & Corey, P. N. (1998). Incidence of adverse drug reactions in

hospitalized patients. JAMA: The Journal of the American Medical Association, 279(15),

1200–1205.

Lee, C.-H. (2003). Minireview: Lipid Metabolism, Metabolic Diseases, and Peroxisome

Proliferator-Activated Receptors. Endocrinology, 144(6), 2201–2207.

doi:10.1210/en.2003-0288

Lekhal, S., Børvik, T., Brodin, E., Nordøy, A., & Hansen, J.-B. (2010). Tissue factor-induced

Thrombin Generation in the Fasting and Postprandial State among Elderly Survivors of

Myocardial Infarction. Thrombosis Research, 126(4), 353–359.

doi:10.1016/j.thromres.2009.10.003

LePendu, P., Iyer, S. V., Bauer-Mehren, A., Harpaz, R., Mortensen, J. M., Podchiyska, T., …

Shah, N. H. (2013). Pharmacovigilance Using Clinical Notes. Clinical Pharmacology &

Therapeutics. doi:10.1038/clpt.2013.47

Lindquist, M., Edwards, I. R., Bate, A., Fucik, H., Nunes, A. M., & Ståhl, M. (1999). From

association to alert--a revised approach to international signal analysis.

Pharmacoepidemiology and Drug Safety, 8 Suppl 1, S15–25. doi:10.1002/(SICI)1099-

1557(199904)8:1+<S15::AID-PDS402>3.0.CO;2-B

Lindquist, M., Stahl, M., Bate, A., Edwards, I. R., & Meyboom, R. H. B. (2000). A retrospective

evaluation of a data mining approach to aid finding new adverse drug reaction signals in

the WHO international database. Drug Safety, 23(6), 533–542.

Lindsay, R. K., & Gordon, M. D. (1999). Literature-based discovery by lexical statistics. Journal

of the American Society for Information Science, 50(7), 574–587.

doi:10.1002/(SICI)1097-4571(1999)50:7<574::AID-ASI3>3.0.CO;2-Q

150

Liu, M., McPeek Hinz, E. R., Matheny, M. E., Denny, J. C., Schildcrout, J. S., Miller, R. A., &

Xu, H. (2012). Comparative analysis of pharmacovigilance methods in the detection of

adverse drug reactions using electronic medical records. Journal of the American Medical

Informatics Association. doi:10.1136/amiajnl-2012-001119

Liu, M., Wu, Y., Chen, Y., Sun, J., Zhao, Z., Chen, X. -w., … Xu, H. (2012). Large-scale

prediction of adverse drug reactions using chemical, biological, and phenotypic

properties of drugs. Journal of the American Medical Informatics Association, 19(e1),

e28–e35. doi:10.1136/amiajnl-2011-000699

Locke, K., Golden-Biddle, K., & Feldman, M. S. (2008). Perspective—Making Doubt

Generative: Rethinking the Role of Doubt in the Research Process. Organization Science,

19(6), 907–918. doi:10.1287/orsc.1080.0398

Loke, Y. K., Kwok, C. S., & Singh, S. (2011). Comparative cardiovascular effects of

thiazolidinediones: systematic review and meta-analysis of observational studies. BMJ,

342(mar17 1), d1309–d1309. doi:10.1136/bmj.d1309

Loke, Y. K., Price, D., & Herxheimer, A. (2007). Systematic reviews of adverse effects:

framework for a structured approach. BMC Medical Research Methodology, 7(1), 32.

doi:10.1186/1471-2288-7-32

Lopes, P., Nunes, T., Campos, D., Furlong, L. I., Bauer-Mehren, A., Sanz, F., … Oliveira, J. L.

(2013). Gathering and Exploring Scientific Knowledge in Pharmacovigilance. PLoS

ONE, 8(12), e83016. doi:10.1371/journal.pone.0083016

Lopes, R. D., Dickerson, S., Hafley, G., Burns, S., Tourt-Uhlig, S., White, J., … Mahaffey, K.

W. (2013). Methodology of a reevaluation of cardiovascular outcomes in the RECORD

151

trial: Study design and conduct. American Heart Journal, 166(2), 208–216.e28.

doi:10.1016/j.ahj.2013.05.005

Lygate, C. A., Hulbert, K., Monfared, M., Cole, M. A., Clarke, K., & Neubauer, S. (2003). The

PPARγ-activator rosiglitazone does not alter remodeling but increases mortality in rats

post-myocardial infarction. Cardiovascular Research, 58(3), 632–637.

Madigan, D., Schuemie, M. J., & Ryan, P. B. (2013). Empirical Performance of the Case–

Control Method: Lessons for Developing a Risk Identification and Analysis System.

Drug Safety, 36(S1), 73–82. doi:10.1007/s40264-013-0105-z

Mahaffey, K. W., Hafley, G., Dickerson, S., Burns, S., Tourt-Uhlig, S., White, J., … Lopes, R.

D. (2013). Results of a reevaluation of cardiovascular outcomes in the RECORD trial.

American Heart Journal, 166(2), 240–249.e1. doi:10.1016/j.ahj.2013.05.004

Mann, R. D., & Andrews, E. B. (Eds.). (2007). Pharmacovigilance (2nd ed.). Wiley.

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New

York: Cambridge University Press.

Marie, I., Delafenêtre, H., Massy, N., Thuillez, C., Noblet, C., & Centers, N. of the F. P. (2008).

Tendinous disorders attributed to statins: A study on ninety-six spontaneous reports in the

period 1990–2005 and review of the literature. Arthritis Care & Research, 59(3), 367–

372. doi:10.1002/art.23309

McClellan, M. (2007). Drug Safety Reform at the FDA — Pendulum Swing or Systematic

Improvement? New England Journal of Medicine, 356(17), 1700–1702.

doi:10.1056/NEJMp078057

152

Mera, R. M., Beach, K. J., Powell, G. E., & Pattishall, E. N. (2010). Semi-automated risk

estimation using large databases: quinolones and clostridium difficile associated diarrhea.


Merck. (2004, September 30). Merck Announces Voluntary Worldwide Withdrawal of VIOXX.

Merck Announces Voluntary Worldwide Withdrawal of VIOXX. Retrieved from

http://www.merck.com/newsroom/vioxx/pdf/vioxx_press_release_final.pdf

Meyboom, R. H., Hekster, Y. A., Egberts, A. C., Gribnau, F. W., & Edwards, I. R. (1997).

Causal or Casual? Drug Safety, 17(6), 374–389.

Muhlhausler, B. S., Duffield, J. A., & McMillen, I. C. (2007). Increased maternal nutrition

increases leptin expression in perirenal and subcutaneous adipose tissue in the postnatal

lamb. Endocrinology, 148(12), 6157–6163. doi:10.1210/en.2007-0770

Nadeau, K. J., Ehlers, L. B., Aguirre, L. E., Reusch, J. E. B., & Draznin, B. (2007). Discordance

between intramuscular triglyceride and insulin sensitivity in skeletal muscle of Zucker

diabetic rats after treatment with fenofibrate and rosiglitazone. Diabetes, Obesity and

Metabolism, 9(5), 714–723. doi:10.1111/j.1463-1326.2006.00696.x

Nadkarni, P. M. (2010). Drug safety surveillance using de-identified EMR and claims data:

issues and challenges. Journal of the American Medical Informatics Association, 17(6),

671–674. doi:10.1136/jamia.2010.008607

Nissen, S. E., & Wolski, K. (2007). Effect of rosiglitazone on the risk of myocardial infarction

and death from cardiovascular causes. The New England Journal of Medicine, 356,

2457–2471.

Noren, G. N., Bate, A., Orre, R., & Edwards, I. R. (2006). Extending the methods used to screen

the WHO drug safety database towards analysis of complex associations and improved

153

accuracy for rare events. Statistics in Medicine, 25(21), 3740–3757.

doi:10.1002/sim.2473

Norén, G. N., Bergvall, T., Ryan, P. B., Juhlin, K., Schuemie, M. J., & Madigan, D. (2013).

Empirical Performance of the Calibrated Self-Controlled Cohort Analysis Within

Temporal Pattern Discovery: Lessons for Developing a Risk Identification and Analysis

System. Drug Safety, 36(S1), 107–121. doi:10.1007/s40264-013-0095-x

Oliveira, J. L., Lopes, P., Nunes, T., Campos, D., Boyer, S., Ahlberg, E., … van der Lei, J.

(2013). The EU-ADR Web Platform: delivering advanced pharmacovigilance tools.


Olivier, P., & Montastruc, J.-L. (2006). The nature of the scientific evidence leading to drug

withdrawals for pharmacovigilance reasons in France. Pharmacoepidemiology and Drug

Safety, 15(11), 808–812. doi:10.1002/pds.1248

Otake, K., Azukizawa, S., Fukui, M., Shibabayashi, M., Kamemoto, H., Miike, T., … Shirahase,

H. (2011). A Novel Series of (S)-2, 7-Substituted-1, 2, 3, 4-tetrahydroisoquinoline-3-

carboxylic Acids: Peroxisome Proliferator-Activated Receptor α/γ Dual Agonists with

Protein-Tyrosine Phosphatase 1B Inhibitory Activity. Chemical and Pharmaceutical

Bulletin, 59(10), 1233–1242.

Patel, L., Buckels, A. C., Kinghorn, I. J., Murdock, P. R., Holbrook, J. D., Plumpton, C., …

Smith, S. A. (2003). Resistin is expressed in human macrophages and directly regulated

by PPAR gamma activators. Biochemical and Biophysical Research Communications,

300(2), 472–476.

154

Pauwels, E., Stoven, V., & Yamanishi, Y. (2011). Predicting drug side-effect profiles: a chemical

fragment-based approach. BMC Bioinformatics, 12(1), 169. doi:10.1186/1471-2105-12-

169

Peirce, C. S. (1955). Philosophical writings of Peirce. Courier Dover Publications. Retrieved

from

http://books.google.com/books?hl=en&lr=&id=ClSjXRIbxAMC&oi=fnd&pg=PR9&dq=

philosophical+writings+of+peirce.+Abduction+and+induction&ots=ablBmn3EHE&sig=

76UULmR1s7dxYL1bWy-Luqjt9bo

Perrio, M., Voss, S., & Shakir, S. A. . (2007). Application of the bradford hill criteria to assess

the causality of cisapride-induced arrhythmia: a model for assessing causal association in

pharmacovigilance. Drug Safety, 30(4), 333–346.

Phillips, G. B. (1977). Relationship between serum sex hormones and glucose, insulin and lipid

abnormalities in men with myocardial infarction. Proceedings of the National Academy

of Sciences, 74(4), 1729–1733.

Pirmohamed, M., James, S., Meakin, S., Green, C., Scott, A. K., Walley, T. J., … Breckenridge,

A. M. (2004). Adverse drug reactions as cause of admission to hospital: prospective

analysis of 18 820 patients. Bmj, 329(7456), 15–19.

Plate, T. A. (1995). Holographic reduced representations. Neural Networks, IEEE Transactions

on, 6(3), 623–641.

Platt, R., Madre, L., Reynolds, R., & Tilson, H. (2008). Active drug safety surveillance: a tool to

improve public health. Pharmacoepidemiology and Drug Safety, 17(12), 1175–1182.

doi:10.1002/pds.1668

155

Platt, R., Wilson, M., Chan, K. A., Benner, J. S., Marchibroda, J., & McClellan, M. (2009). The

new Sentinel Network—improving the evidence of medical-product safety. New England

Journal of Medicine, 361(7), 645–647.

Poluzzi, E., Raschi, E., Piccinni, C., & De Ponti, F. (2012). Data mining techniques in

pharmacovigilance: analysis of the publicly accessible FDA adverse event reporting

system (AERS). Data Mining Applications in Engineering and Medicine. Croatia:

InTech, 267–301.

Pouliot, Y., Chiang, A. P., & Butte, A. J. (2011). Predicting Adverse Drug Reactions Using

Publicly Available PubChem BioAssay Data. Clinical Pharmacology & Therapeutics,

90(1), 90–99. doi:10.1038/clpt.2011.81

Pray, L., Robinson, S., & Translation, F. on D. D., Development, and. (2007). Challenges for the

FDA: The Future of Drug Safety, Workshop Summary. National Academies Press.

Puijenbroek, E. P., Diemont, W. L., & van Grootheest, K. (2003). Application of quantitative

signal detection in the Dutch spontaneous reporting system for adverse drug reactions.

Drug Safety, 26(5), 293–301.

Rawlins, M. D. (1988a). Spontaneous reporting of adverse drug reactions. I: the data. British


Rawlins, M. D. (1988b). Spontaneous reporting of adverse drug reactions. II: Uses. British


Richard Doll. (1994). Austin Bradford Hill. 8 July 1897-18 April 1991. Biographical Memoirs of

Fellows of the Royal Society, 40, 128–140.

Rindflesch, T. C., & Aronson, A. R. (2002). Semantic processing for enhanced access to

biomedical knowledge. Real World Semantic Web Applications, 157–172.

156

Rindflesch, T. C., & Fiszman, M. (2003). The interaction of domain knowledge and linguistic

structure in natural language processing: interpreting hypernymic propositions in

biomedical text. Journal of Biomedical Informatics, 36(6), 462–477.

doi:10.1016/j.jbi.2003.11.003

Rindflesch, T. C., Fiszman, M., & Libbus, B. (2005). Semantic Interpretation for the Biomedical

Research Literature. In H. Chen, S. S. Fuller, C. Friedman, & W. Hersh (Eds.), Medical

Informatics (pp. 399–422). Springer US. Retrieved from

http://link.springer.com/chapter/10.1007/0-387-25739-X_14

Risérus, U., Tan, G. D., Fielding, B. A., Neville, M. J., Currie, J., Savage, D. B., … Karpe, F.

(2005). Rosiglitazone increases indexes of stearoyl-CoA desaturase activity in humans:

link to insulin sensitization and the role of dominant-negative mutation in peroxisome

proliferator-activated receptor-gamma. Diabetes, 54(5), 1379–1384.

Roark, B., & Charniak, E. (1998). Noun-phrase co-occurrence statistics for semiautomatic

semantic lexicon construction. In Proceedings of the 17th international conference on

Computational linguistics-Volume 2 (pp. 1110–1116).

Rodríguez-Monguió, R., Otero, M. J., & Rovira, J. (2003). Assessing the economic impact of

adverse drug effects. PharmacoEconomics, 21(9), 623–650.

Roux, E., Thiessard, F., Fourrier, A., Begaud, B., & Tubert-Bitter, P. (2005). Evaluation of

statistical association measures for the automatic signal generation in pharmacovigilance.

IEEE Transactions on Information Technology in Biomedicine, 9(4), 518–527.

doi:10.1109/TITB.2005.855566

Roux, E., Thiessard, F., Fourrier, A., Begaud, B., Tubert-Bitter, P., & others. (2003).

Spontaneous reporting system modelling for data mining methods evaluation in

157

pharmacovigilance. In AIME Workshop on Intelligent Data Analysis in Medicine and

Pharmacology (IDAMAP). Retrieved from http://rouxemmanuel.free.fr/Roux.pdf

Ryan, P. B., Madigan, D., Stang, P. E., Marc Overhage, J., Racoosin, J. A., & Hartzema, A. G.

(2012). Empirical assessment of methods for risk identification in healthcare data: results

from the experiments of the Observational Medical Outcomes Partnership. Statistics in

Medicine, 31(30), 4401–4415. doi:10.1002/sim.5620

Ryan, P. B., Schuemie, M. J., Gruber, S., Zorych, I., & Madigan, D. (2013). Empirical

Performance of a New User Cohort Method: Lessons for Developing a Risk

Identification and Analysis System. Drug Safety, 36(S1), 59–72. doi:10.1007/s40264-

013-0099-6

Ryan, P. B., Schuemie, M. J., & Madigan, D. (2013). Empirical Performance of a Self-

Controlled Cohort Method: Lessons for Developing a Risk Identification and Analysis

System. Drug Safety, 36(S1), 95–106. doi:10.1007/s40264-013-0101-3

Ryan, P. B., Schuemie, M. J., Welebob, E., Duke, J., Valentine, S., & Hartzema, A. G. (2013).

Defining a Reference Set to Support Methodological Research in Drug Safety. Drug

Safety, 36(S1), 33–47. doi:10.1007/s40264-013-0097-8

Saiakhov, R., Chakravarti, S., & Klopman, G. (2013). Effectiveness of CASE Ultra Expert

System in Evaluating Adverse Effects of Drugs. Molecular Informatics, 32(1), 87–97.

doi:10.1002/minf.201200081

Saitwal, H., Qing, D., Jones, S., Bernstam, E. V., Chute, C. G., & Johnson, T. R. (2012). Cross-

terminology mapping challenges: A demonstration using medication terminological

systems. Journal of Biomedical Informatics, 45(4), 613–625.

doi:10.1016/j.jbi.2012.06.005

158

Salton, G., Fox, E. A., & Wu, H. (1983). Extended Boolean information retrieval. Commun.

ACM, 26(11), 1022–1036. doi:10.1145/182.358466

Sarafidis, P. A., Lasaridis, A. N., Nilsson, P. M., Mouslech, T. F., Hitoglou-Makedou, A. D.,

Stafylas, P. C., … Tourkantonis, A. A. (2005). The effect of rosiglitazone on novel

atherosclerotic risk factors in patients with type 2 diabetes mellitus and hypertension. An

open-label observational study. Metabolism: Clinical and Experimental, 54(9), 1236–

1242. doi:10.1016/j.metabol.2005.04.010

Schuemie, M. J., Coloma, P. M., Straatman, H., Herings, R. M., Trifirò, G., Matthews, J. N., …

others. (2012). Using electronic health care records for drug safety signal detection: a

comparative evaluation of statistical methods. Medical Care, 50(10), 890–897.

Schuemie, M. J., Gini, R., Coloma, P. M., Straatman, H., Herings, R. M. C., Pedersen, L., …

Sturkenboom, M. C. J. M. (2013). Replication of the OMOP Experiment in Europe:

Evaluating Methods for Risk Identification in Electronic Health Record Databases. Drug

Safety, 36(S1), 159–169. doi:10.1007/s40264-013-0109-8

Schvaneveldt, R. W., & Cohen, T. A. (2010). Abductive reasoning and similarity: Some

computational tools. In Computer-Based Diagnostics and Systematic Analysis of

Knowledge (pp. 189–211). Springer. Retrieved from


Schweder, T., & Spjøtvoll, E. (1982). Plots of p-values to evaluate many tests simultaneously.

Biometrika, 69(3), 493–502.

Sehgal, A. K., Qiu, X. Y., & Srinivasan, P. (2008). Analyzing LBD methods using a general

framework. In Literature-based discovery (pp. 75–100). Springer. Retrieved from


159

Shah, N. D., Montori, V. M., Krumholz, H. M., Tu, K., Alexander, G. C., & Jackevicius, C. A.

(2010). Responding to an FDA warning—geographic variation in the use of

rosiglitazone. New England Journal of Medicine, 363(22), 2081–2084.

Shakir, S. A. W., & Layton, D. (2002). Causal association in pharmacovigilance and

pharmacoepidemiology: thoughts on the application of the Austin Bradford-Hill criteria.

Drug Safety, 25(6), 467–471.

Shetty, K. D., & Dalal, S. R. (2011). Using information mining of the medical literature to

improve drug safety. Journal of the American Medical Informatics Association, 18(5),

668–674. doi:10.1136/amiajnl-2011-000096

Shibata, A., & Hauben, M. (2011). Pharmacovigilance, signal detection and signal intelligence

overview. In Information Fusion (FUSION), 2011 Proceedings of the 14th International

Conference on (pp. 1–7). IEEE. Retrieved from

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5977551

Smalheiser, N. R., & Swanson, D. R. (1996). Linking estrogen to Alzheimer’s disease.

Neurology, 47(3), 809 –810.

Srinivasan, P., & Rindflesch, T. C. (2002). Exploring text mining from MEDLINE. Proceedings

of the AMIA Symposium, 722–726.

Stang, P. E., Ryan, P. B., Racoosin, J. A., Overhage, J. M., Hartzema, A. G., Reich, C., …

Woodcock, J. (2010). Advancing the science for active surveillance: rationale and design

for the Observational Medical Outcomes Partnership. Annals of Internal Medicine,

153(9), 600–606.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical

Society: Series B (Statistical Methodology), 64(3), 479–498.

160

Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies.

Proceedings of the National Academy of Sciences of the United States of America,

100(16), 9440.

Storey, J., & Tibshirani, R. (2001). Estimating false discovery rates under dependence, with

applications to DNA microarrays (Technical Report) (p. 25). Department of Statistics,

Stanford Universit.

Suchard, M. A., Zorych, I., Simpson, S. E., Schuemie, M. J., Ryan, P. B., & Madigan, D. (2013).

Empirical Performance of the Self-Controlled Case Series Design: Lessons for

Developing a Risk Identification and Analysis System. Drug Safety, 36(S1), 83–93.

doi:10.1007/s40264-013-0100-4

Suzuki, S., Suzuki, M., Sembon, S., Fuchimoto, D., & Onishi, A. (2013). Characterization of

actions of octanoate on porcine preadipocytes and adipocytes differentiated in vitro.

Biochemical and Biophysical Research Communications, 432(1), 92–98.

doi:10.1016/j.bbrc.2013.01.080

Swanson, D. R. (1986a). Fish oil, Raynaud’s syndrome, and undiscovered public knowledge.

Perspectives in Biology and Medicine, 30(1), 7.

Swanson, D. R. (1986b). Undiscovered public knowledge. The Library Quarterly, 103–118.

Swanson, D. R. (1987). Two medical literatures that are logically but not bibliographically

connected. Journal of the American Society for Information Science, 38(4), 228–233.

Swanson, D. R. (1988). Migraine and magnesium: eleven neglected connections. Perspectives in

Biology and Medicine, 31(4), 526–557.

161

Tatonetti, N. P., Ye, P. P., Daneshjou, R., & Altman, R. B. (2012). Data-Driven Prediction of

Drug Effects and Interactions. Science Translational Medicine, 4(125), 125ra31–125ra31.

doi:10.1126/scitranslmed.3003377

Tetsuro Yoshida, Kimihiko Kato, Mitsutoshi Oguri, Sachiro Watanabe, Norifumi Metoki,

Hidemi Yoshida, … Yoshiji Yamada. (2010). Association of genetic variants with

myocardial infarction in Japanese individuals with different lipid profiles. International

Journal of Molecular Medicine, 25(4). doi:10.3892/ijmm_00000383

Thompson, D. (2013, November 25). FDA to Lift Restrictions on Diabetes Drug Avandia.

Retrieved December 24, 2013, from http://health.usnews.com/health-

news/news/articles/2013/11/25/fda-to-lift-restrictions-on-diabetes-drug-avandia

Trifirò, G., Pariente, A., Coloma, P. M., Kors, J. A., Polimeni, G., Miremont-Salamé, G., …

Fourrier-Reglat, A. (2009). Data mining on electronic health record databases for signal

detection in pharmacovigilance: which events to monitor? Pharmacoepidemiology and

Drug Safety, 18(12), 1176–1184. doi:10.1002/pds.1836

Tsukahara, T. (2006). Different Residues Mediate Recognition of 1-O-Oleyllysophosphatidic

Acid and Rosiglitazone in the Ligand Binding Domain of Peroxisome Proliferator-

activated Receptor. Journal of Biological Chemistry, 281(6), 3398–3407.

doi:10.1074/jbc.M510843200

Turney, P. (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In

Proceedings of the Twelth European Conference on Machine Learning (ECML-2001)

(pp. 491–502). Freiburg, Germany.

162

Tzameli, I. (2004). Regulated Production of a Peroxisome Proliferator-activated Receptor-

Ligand during an Early Phase of Adipocyte Differentiation in 3T3-L1 Adipocytes.

Journal of Biological Chemistry, 279(34), 36093–36102. doi:10.1074/jbc.M405346200

U. S. National Library of Medicine. (2009, September). UMLS® Reference Manual. Text.

Retrieved February 24, 2014, from http://www.ncbi.nlm.nih.gov/books/NBK9676/

U.S. Food and Drug Administration. (2007, November 14). 2007 - FDA Adds Boxed Warning for

Heart-related Risks to Anti-Diabetes Drug Avandia. WebContent. Retrieved December

24, 2013, from

http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/2007/ucm109026.htm

U.S. Food and Drug Administration. (2008, May). The Sentinel Initiative National Strategy for

Monitoring Medical Product Safety. U.S. Food and Drug Administration. Retrieved from

http://www.fda.gov/downloads/Safety/FDAsSentinelInitiative/UCM124701.pdf

U.S. Food and Drug Administration. (2013, November 25). Press Announcements - FDA

requires removal of certain restrictions on the diabetes drug Avandia. WebContent.

Retrieved December 24, 2013, from

http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm376516.htm

U.S. National Library of Medicine, & National Institutes of Health. (2012). UMLS Terminology

Services (UTS) API 2.0. Retrieved from http://uts.nlm.nih.gov/metathesaurus.html

U.S. National Library of Medicine, & National Institutes of Health. (2014). RxNorm API.

Retrieved from http://rxnav.nlm.nih.gov/RxNormAPI.html

Van Puijenbroek, E. P., Bate, A., Leufkens, H. G. M., Lindquist, M., Orre, R., & Egberts, A. C.

G. (2002). A comparison of measures of disproportionality for signal detection in

163

spontaneous reporting systems for adverse drug reactions. Pharmacoepidemiology and

Drug Safety, 11(1), 3–10. doi:10.1002/pds.668

Van Wijk, J., Coll, B., Cabezas, M. C., Koning, E., Camps, J., Mackness, B., & Joven, J. (2006).

Rosiglitazone modulates fasting and post-prandial paraoxonase 1 activity in type 2

diabetic patients. Clinical and Experimental Pharmacology & Physiology, 33(12), 1134–

1137. doi:10.1111/j.1440-1681.2006.04505.x

Vessby, B., Kostner, G., Lithell, H., & Thomis, J. (1982). Diverging effects of cholestyramine on

apolipoprotein B and lipoprotein Lp(a). A dose-response study of the effects of

cholestyramine in hypercholesterolaemia. Atherosclerosis, 44(1), 61–71.

Wagner, A. K., Chan, K. A., Dashevsky, I., Raebel, M. A., Andrade, S. E., Lafata, J. E., … Platt,

R. (2006). FDA drug prescribing warnings: is the black box half empty or half full?


Wahle, M., Widdows, D., Herskovic, J. R., Bernstam, E. V., & Cohen, T. (2012). Deterministic

Binary Vectors for Efficient Automated Indexing of MEDLINE/PubMed Abstracts.

AMIA Annual Symposium Proceedings, 2012, 940–949.

Wang, X., Hripcsak, G., Markatou, M., & Friedman, C. (2009). Active computerized

pharmacovigilance using natural language processing, statistics, and electronic health

records: a feasibility study. Journal of the American Medical Informatics Association,

16(3), 328–337.

Ward, A. C. (2009). The role of causal criteria in causal inferences: Bradford Hill’s “aspects of

association.” Epidemiologic Perspectives & Innovations, 6(1), 2. doi:10.1186/1742-5573-

6-2

164

Weeber, M., Klein, H., de Jong‐van den Berg, L. T. W., & Vos, R. (2001). Using concepts in

literature‐based discovery: Simulating Swanson’s Raynaud–fish oil and migraine–

magnesium discoveries. Journal of the American Society for Information Science and

Technology, 52(7), 548–557. doi:10.1002/asi.1104

Wei, W.-Q., Cronin, R. M., Xu, H., Lasko, T. A., Bastarache, L., & Denny, J. C. (2013).

Development and evaluation of an ensemble resource linking medications to their

indications. Journal of the American Medical Informatics Association, 20(5), 954–961.

doi:10.1136/amiajnl-2012-001431

WHO. (2010). WHO | Pharmacovigilance. WHO. Retrieved April 1, 2013, from

http://www.who.int/medicines/areas/quality_safety/safety_efficacy/pharmvigi/en/index.ht

ml

WHO-Uppsala Monitoring Centre. (2013, January). Glossary of terms in pharmacovigilance.

Retrieved June 26, 2014, from http://www.who-umc.org/graphics/27400.pdf

Widdows, D., & Cohen, T. (2010). The semantic vectors package: New algorithms and public

tools for distributional semantics. In Fourth IEEE international conference on semantic

computing (Vol. 1, p. 43). Retrieved from

https://semanticvectors.googlecode.com/files/semantic-vectors-package-2010.pdf

Widdows, D., & Cohen, T. (2012). Real, complex, and binary semantic vectors. In Quantum

Interaction (pp. 24–35). Springer. Retrieved from


Widdows, D., & Peters, S. (2003). Word vectors and quantum logic: Experiments with negation

and disjunction. Mathematics of Language, 8(141-154). Retrieved from

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.3862&rep=rep1&type=pdf

165

Wiholm, B.-E. (1984). The Swedish drug-event assessment methods. Special workshop--

regulatory. Drug Information Journal, 18(3-4), 267–269.

Wysowski, D. K., & Swartz, L. (2005). Adverse drug event surveillance and drug withdrawals in

the United States, 1969-2002: the importance of reporting suspected reactions. Archives

of Internal Medicine, 165(12), 1363.

Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In

Proceedings of the 33rd annual meeting on Association for Computational Linguistics

(pp. 189–196).

Yates, A., & Goharian, N. (2013). ADRTrace: detecting expected and unexpected adverse drug

reactions from user reviews on social media sites. In Advances in Information Retrieval

(pp. 816–819). Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-3-

642-36973-5_92

Ye, J. (2011). Challenges in drug discovery for thiazolidinedione substitute. Acta Pharmaceutica

Sinica B, 1(3), 137–142. doi:10.1016/j.apsb.2011.06.011

Yetisgen-Yildiz, M., & Pratt, W. (2006). Using statistical and knowledge-based approaches for

literature-based discovery. Journal of Biomedical Informatics, 39(6), 600–611.

Yetisgen-Yildiz, M., & Pratt, W. (2009). A new evaluation methodology for literature-based

discovery systems. Journal of Biomedical Informatics, 42(4), 633–643.

doi:10.1016/j.jbi.2008.12.001

Zorych, I., Madigan, D., Ryan, P., & Bate, A. (2011). Disproportionality methods for

pharmacovigilance in longitudinal observational databases. Statistical Methods in

Medical Research, 22(1), 39–56. doi:10.1177/0962280211403602

Integrating Domain Knowledge to Improve Signal Detection ... · Pharmacovigilance, also known as...

Documents

Transcript of Integrating Domain Knowledge to Improve Signal Detection ... · Pharmacovigilance, also known as...