International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta,...

10
International Journal of Medical Informatics 94 (2016) 21–30 Contents lists available at ScienceDirect International Journal of Medical Informatics journal homepage: www.ijmijournal.com Insight: An ontology-based integrated database and analysis platform for epilepsy self-management research Satya S. Sahoo a,b,, Priya Ramesh b , Elisabeth Welter c , Ashley Bukach c , Joshua Valdez a , Curtis Tatsuoka c , Yvan Bamps d , Shelley Stoll e , Barbara C. Jobst f , Martha Sajatovic c a Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States b Electrical Engineering and Computer Science Department, School of Engineering, Case Western Reserve University, Cleveland, OH 44106, United States c Neurological Institute, University Hospitals Case Medical Center, Cleveland, OH 44106, United States d Rollins School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan, Ann Arbor, MI 48109, United States f Department of Neurology, Geisel School of Medicine, Dartmouth College, Lebanon, NH 03756, United States a r t i c l e i n f o Article history: Received 4 February 2016 Received in revised form 15 June 2016 Accepted 18 June 2016 a b s t r a c t We present Insight as an integrated database and analysis platform for epilepsy self-management research as part of the national Managing Epilepsy Well Network. Insight is the only available informatics plat- form for accessing and analyzing integrated data from multiple epilepsy self-management research studies with several new data management features and user-friendly functionalities. The features of Insight include, (1) use of Common Data Elements defined by members of the research community and an epilepsy domain ontology for data integration and querying, (2) visualization tools to support real time exploration of data distribution across research studies, and (3) an interactive visual query inter- face for provenance-enabled research cohort identification. The Insight platform contains data from five completed epilepsy self-management research studies covering various categories of data, including depression, quality of life, seizure frequency, and socioeconomic information. The data represents over 400 participants with 7552 data points. The Insight data exploration and cohort identification query interface has been developed using Ruby on Rails Web technology and open source Web Ontology Lan- guage Application Programming Interface to support ontology-based reasoning. We have developed an efficient ontology management module that automatically updates the ontology mappings each time a new version of the Epilepsy and Seizure Ontology is released. The Insight platform features a Role-based Access Control module to authenticate and effectively manage user access to different research studies. User access to Insight is managed by the Managing Epilepsy Well Network database steering committee consisting of representatives of all current collaborating centers of the Managing Epilepsy Well Network. New research studies are being continuously added to the Insight database and the size as well as the unique coverage of the dataset allows investigators to conduct aggregate data analysis that will inform the next generation of epilepsy self-management studies. © 2016 Elsevier Ireland Ltd. All rights reserved. Insight can be accessed at: http://mew.meds.cwru.edu/ 1. Introduction Persons with chronic health conditions can significantly benefit from self-management techniques, which consist of understanding their health conditions and adopting a set of behaviors to manage their conditions [1–5]. Epilepsy is one of the most common serious Corresponding author at: Division of Medical Informatics, 2103 Cornell Road, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States. E-mail address: [email protected] (S.S. Sahoo). neurological disorders, affecting an estimated 50 million persons worldwide [6], with 200,000 new cases reported each year [7]. Development and adoption of self-management techniques have been strongly recommended for persons with epilepsy to positively influence their prognosis and ability to manage the symptoms of epilepsy [8,9]. Patients with epilepsy experience repeated seizures that manifest as physical or behavioral changes that disrupt normal activities [10]. Repeated epilepsy seizures have a negative impact on quality of life, education, and employment, and also increase the risk of early mortality [11]. In addition, epilepsy as a chronic con- dition imposes a high economic burden on patients, their families, and society; in the US the total cost per year for medical expendi- ture and informal care for patients with epilepsy is estimated to be http://dx.doi.org/10.1016/j.ijmedinf.2016.06.009 1386-5056/© 2016 Elsevier Ireland Ltd. All rights reserved.

Transcript of International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta,...

Page 1: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

If

SCa

b

c

d

e

f

a

ARRA

1

ftt

SS

h1

International Journal of Medical Informatics 94 (2016) 21–30

Contents lists available at ScienceDirect

International Journal of Medical Informatics

journa l homepage: www. i jmi journa l .com

nsight: An ontology-based integrated database and analysis platformor epilepsy self-management research

atya S. Sahoo a,b,∗, Priya Ramesh b, Elisabeth Welter c, Ashley Bukach c, Joshua Valdez a,urtis Tatsuoka c, Yvan Bamps d, Shelley Stoll e, Barbara C. Jobst f, Martha Sajatovic c

Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United StatesElectrical Engineering and Computer Science Department, School of Engineering, Case Western Reserve University, Cleveland, OH 44106, United StatesNeurological Institute, University Hospitals Case Medical Center, Cleveland, OH 44106, United StatesRollins School of Public Health, Emory University, Atlanta, GA 30322, United StatesCenter for Managing Chronic Disease, University of Michigan, Ann Arbor, MI 48109, United StatesDepartment of Neurology, Geisel School of Medicine, Dartmouth College, Lebanon, NH 03756, United States

r t i c l e i n f o

rticle history:eceived 4 February 2016eceived in revised form 15 June 2016ccepted 18 June 2016

a b s t r a c t

We present Insight as an integrated database and analysis platform for epilepsy self-management researchas part of the national Managing Epilepsy Well Network. Insight is the only available informatics plat-form for accessing and analyzing integrated data from multiple epilepsy self-management researchstudies with several new data management features and user-friendly functionalities. The features ofInsight include, (1) use of Common Data Elements defined by members of the research community andan epilepsy domain ontology for data integration and querying, (2) visualization tools to support realtime exploration of data distribution across research studies, and (3) an interactive visual query inter-face for provenance-enabled research cohort identification. The Insight platform contains data from fivecompleted epilepsy self-management research studies covering various categories of data, includingdepression, quality of life, seizure frequency, and socioeconomic information. The data represents over400 participants with 7552 data points. The Insight data exploration and cohort identification queryinterface has been developed using Ruby on Rails Web technology and open source Web Ontology Lan-guage Application Programming Interface to support ontology-based reasoning. We have developed anefficient ontology management module that automatically updates the ontology mappings each time anew version of the Epilepsy and Seizure Ontology is released. The Insight platform features a Role-basedAccess Control module to authenticate and effectively manage user access to different research studies.

User access to Insight is managed by the Managing Epilepsy Well Network database steering committeeconsisting of representatives of all current collaborating centers of the Managing Epilepsy Well Network.New research studies are being continuously added to the Insight database and the size as well as theunique coverage of the dataset allows investigators to conduct aggregate data analysis that will inform

ilepsy

the next generation of ep

Insight can be accessed at: http://mew.meds.cwru.edu/

. Introduction

Persons with chronic health conditions can significantly benefit

rom self-management techniques, which consist of understandingheir health conditions and adopting a set of behaviors to manageheir conditions [1–5]. Epilepsy is one of the most common serious

∗ Corresponding author at: Division of Medical Informatics, 2103 Cornell Road,chool of Medicine, Case Western Reserve University, Cleveland, OH 44106, Unitedtates.

E-mail address: [email protected] (S.S. Sahoo).

ttp://dx.doi.org/10.1016/j.ijmedinf.2016.06.009386-5056/© 2016 Elsevier Ireland Ltd. All rights reserved.

self-management studies.© 2016 Elsevier Ireland Ltd. All rights reserved.

neurological disorders, affecting an estimated 50 million personsworldwide [6], with 200,000 new cases reported each year [7].Development and adoption of self-management techniques havebeen strongly recommended for persons with epilepsy to positivelyinfluence their prognosis and ability to manage the symptoms ofepilepsy [8,9]. Patients with epilepsy experience repeated seizuresthat manifest as physical or behavioral changes that disrupt normalactivities [10]. Repeated epilepsy seizures have a negative impacton quality of life, education, and employment, and also increase therisk of early mortality [11]. In addition, epilepsy as a chronic con-

dition imposes a high economic burden on patients, their families,and society; in the US the total cost per year for medical expendi-ture and informal care for patients with epilepsy is estimated to be
Page 2: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

2 al of M

$aniit

tt[itsectDthmodAbsachp

1

bimicfiiiceeeoitIrtgath

dEaftrdEaa

2 S.S. Sahoo et al. / International Journ

9.6 billion [12]. Some studies have described the burden associ-ted with non-adherence to anti-seizure drugs, which can result inegative outcome for persons with epilepsy [13]. Therefore, there

s clear need to develop and use approaches that improve the abil-ty of people with epilepsy to control their seizures and improveheir health.

Epilepsy self-management includes a set of behavioral practiceshat are likely to help patients better control their seizures andhat may positively impact their symptoms as well as prognosis8,14]. These self-management techniques are often categorizednto three areas of: (1) treatment management (e.g., medica-ion adherence), (2) seizure management (e.g., keeping track ofeizures), and (3) lifestyle management (e.g., getting regular sleep,ngaging in safe physical activity) [7]. Self-management techniquesan reduce healthcare utilization by lowering the number of inpa-ient hospitalizations, inpatient days and visits to the Emergencyepartment [13]. Since 2007, the U.S. Centers for Disease Con-

rol and Prevention’s (CDC) Prevention Research Centers (PRC)as funded the Managing Epilepsy Well (MEW) Network, a the-atic research network whose mission is to advance the science

f epilepsy self-management. CDC MEW Network programs areesigned to enhance the quality of life of people with epilepsy [7].

key feature of MEW Network self-management approaches haseen to incorporate consideration of comorbidity, such as depres-ion, and to address the pervasive problem of cognitive impairmentmong individuals with epilepsy. The MEW Network currentlyomprises eight PRCs (connected with accredited schools of publicealth or schools of medicine with a preventive medicine residencyrogram) that collaborate on epilepsy self-management research.

.1. Motivation for integrated analysis of MEW network data

The MEW Network was founded on principles of community-ased participatory research. Each collaborating site develops

ts own capacity to conduct independent research with com-unity partners to respond to variations in the needs and

nterests specific to these communities. Since 2007, each site hasollected, securely stored, analyzed, and published their studyndings. However, pooled data from both completed and ongo-

ng MEW self-management studies represent valuable, untappednformation relevant to the epilepsy self-management researchommunity. Integrative analysis of these datasets will enablepilepsy researchers to gain new insights into various aspects ofpilepsy self-management that may ultimately help people withpilepsy and their families. A MEW database can harness the powerf aggregated data to better understand the different factors that

mpact people with epilepsy, for example identifying the associa-ion between seizure frequency and quality of life or depression.ndeed, effective secondary use of healthcare data for advancingesearch, a critical aspect of biomedical informatics, maximizeshe value of existing data [15,16]. A database containing aggre-ated data collected by the MEW Network sites will potentiallyllow researchers to propose new studies on self-managementechniques across subpopulations, geographical locations, andealthcare settings with sufficient statistical power.

There are many MEW study-specific data repositories that storeata using various approaches, including relational databases, MSxcel spreadsheet, and paper forms [17]. These data repositoriesre conceptually similar to “data silos” with limited or no supportor data sharing, integration, and secondary analysis. In addition,here is limited terminological standardization across epilepsyesearch studies, which impedes integrated analysis of data across

ifferent studies. Similar to initiatives such as the Internationalpilepsy Electrophysiology Portal (IEEG-Portal), which aims to cre-te a database for epilepsy electrophysiological data [18], there is

clear need to develop an integrated and scalable database for

edical Informatics 94 (2016) 21–30

epilepsy self-management research data. To meet this need, a MEWNetwork database workgroup was established in 2014 to explorethe feasibility of a common epilepsy self-management database.The goals of the workgroup included establishment of an analyt-ical platform that would eventually enable members of the MEWNetwork to interactively create cohorts of patients for secondarydata analysis.

The consortium of researchers involved in this initiativeincluded representatives from all the MEW Network collaboratingcenters. Since September 2014, a Steering Committee representingresearch data stakeholders, has developed a standardized pro-cess for sharing de-identified study data across the MEW Networkthrough a Data User Agreements (DUA) and provides continuedoversight and guidance for the expansion, management and useof the MEW self-management study data for research purposes.The MEW Network is currently coordinated by Dartmouth College.While the MEW Network currently includes 8 collaborating centers,a total of 11 centers have been active during the extant tenure of theMEW Network, including previously and currently funded centers.These centers are: Emory University; the University of Texas HealthCenter at Houston; the University of Michigan; the Dartmouth Insti-tute at the Geisel School of Medicine; New York University; theUniversity of Arizona; the Morehouse College School of Medicine;the University of Illinois at Chicago; the University of Minnesota;the University of Washington at Seattle; and Case Western ReserveUniversity (CWRU).

The development of the Insight MEW Network data analyticsplatform (hereafter called Insight) is led by CWRU. Data contribu-tion from participating sites is purely voluntary and not requiredunder the respective CDC PRC cooperative agreements. As ofNovember 2015, data from five research studies have been inte-grated into Insight. The five research studies have a total of over400 de-identified participants with more than 7552 data pointsrepresented using 16 MEW Common Data Elements (CDEs) rec-ommended for inclusion in study designs by the MEW Network tostandardize the terminology in the database. Although Insight con-tains only de-identified data from the research studies, the data isstored using standard best practices in terms of data security, dataaccessibility, and maintenance of user access log data.

1.2. Aims of the Insight database

Insight is being developed as a national resource for the epilepsyself-management research community. To the best of our knowl-edge there is no existing epilepsy self-management data analysisplatform similar to Insight, which together with the integrateddataset features a rich set of data exploration and query func-tions for easier data analysis by epilepsy researchers. The fourprimary goals of Insight are: (1) integrate data using MEW CDEsas standard terminology, (2) enable users to interactively explorethe data using visualization functions, (3) provide an intuitive andflexible provenance-enabled query environment for users to per-form cohort identification; and (4) serve as the basis of a proposedepilepsy self-management registry. Insight uses an epilepsy domainontology called Epilepsy and Seizure Ontology (EpSO) [19] to sup-port advanced data exploration, query composition, and executionstrategies through use of ontology reasoning for “query unfolding”and for reconciling semantic heterogeneity. These techniques aredescribed in detail in the next section.

2. Material and methods

Data from a MEW Network research study are added intoInsight in accordance with a standardized protocol. After a DUAis completed between the Insight team and a specific MEW Net-

Page 3: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

al of M

wI(msstifLrowtmed

aldSSwevaTboItibwi

cdiraettebdc

tdtddi

2

M(fieic

S.S. Sahoo et al. / International Journ

ork collaborating center, the study data is transferred to thensight team either as Microsoft Excel or Comma Separated ValueCSV) files together with study protocols describing the provenance

etadata (e.g., inclusion and exclusion criteria), and the study-pecific data dictionary. No data is or will be used to identify anytudy participant. Using the data dictionary, we map the study datao the MEW CDEs and invoke the data transformation pipeline forntegration and loading of data into the database. The data trans-ormation pipeline is implemented as a flexible Extract Transformoad (ETL) workflow, which can be easily modified to incorpo-ate the mappings between MEW CDEs and the data dictionariesf different research studies. Although there has been extensiveork in data integration techniques [20,21], there are no existing

echniques that can completely automate the generation of correctappings between two terminologies; therefore we manually gen-

rated mappings between the MEW CDEs and study-specific dataictionaries.

The MEW CDEs are the core component of the ETL workflownd are based on data elements recommended for epilepsy surveil-ance studies by the Institute of Medicine (IOM) [22], the CDEseveloped by the National Institute of Neurological Disorders andtroke (NINDS) [23], and terms defined in the Behavioral Risk Factorurveillance System (BRFSS) Questionnaire [24]. The MEW Net-ork database initiative is using an incremental approach to define

pilepsy self-management CDEs with 16 CDEs selected as Tier-1ariables (mainly demographic variables) and 15 additional CDEspproved as Tier-2 variables (mainly clinical outcome variables).he Insight database schema and the ETL workflow are designed toe easily extensible to incorporate additional CDEs as they are rec-mmended for adoption by the MEW database Steering Committee.

n addition to data transformation, the ETL workflow annotateshe appropriate subset of study data with ontology classes definedn EpSO. This “semantic annotation” is used to support ontology-ased reasoning during data exploration and query execution,hich significantly improves the quality of query results (described

n the following section).The Insight database is the first step towards supporting cross-

ohort and multi-study data analysis. At present, most biomedicalatabases follow a cumbersome approach for data retrieval, which

nvolves request for data by researchers to a database manager,etrieval of study cohort data or variable values by the manager,nd data analysis by researchers to evaluate their hypothesis. How-ver, if the researchers are not satisfied with the retrieved data,hey have to resubmit their requests to the database manager andhis process may involve multiple iterations. Clearly, there are sev-ral limitations to this approach, including significant time delayetween data request and subsequent analysis, limited support forata exploration, and lack of tools to dynamically modify searchriteria during cohort identification.

Insight has been designed to support an alternate approachhat allows researchers to directly access data in the integratedatabase through an intuitive visual interface (Fig. 1(a) illustrateshe differences in the two approaches). In the following section, weescribe the implementation of the database, the ontology-drivenata exploration, and features of the visual query interface (Fig. 1(b)

llustrates the information flow in Insight).

.1. Implementation of the integrated database

The integrated data is stored in a relational database usingySQL (version 5.6.16). The tables store: (a) configuration details

e.g., values corresponding to the MEW CDEs, and user pro-

le information), and (b) study data (e.g., visit information, andpilepsy seizure details). The configuration details include: thenclusion/exclusion criteria for each research study; study proto-ol; the mappings from the ontology classes to the study values;

edical Informatics 94 (2016) 21–30 23

the ontology class hierarchy (used to support the multi-level dropdown menus in the visual query widgets and query unfolding dur-ing query execution); and the mappings between de-identifiedstudy identifier and study name. The user credentials and usergroups with well-defined access privileges (as determined by theMEW database Steering Committee) allows the implementation ofa flexible and easy to maintain Role-based Access Control (RBAC)functionality in Insight.

In contrast to a naïve approach of assigning access privilegesto each individual user, which is difficult to modify and maintainwith large number of users, granting access to user groups allowssystematic maintenance of access privileges for all user assignedto a specific user groups. The research study data are stored in 2tables with unique identifier of the participant (generated by Insightto uniquely identify participants), the source study of the partici-pant, and demographic information. The data collected at differenttime points in a clinical trial corresponding to the research designof individual studies (e.g., baseline, interim, and follow-up visits)are stored in a separate table. Fig. 2(a) shows the current entity-relationship diagram of the Insight database and Fig. 2(b) shows ascreenshot of the epilepsy ontology class hierarchy used to supportquery composition and execution.

2.2. Research studies and data elements

The current version of the Insight database integrates data fromfive completed MEW Network research studies (as of November2015) with a total of over 400 participants. The five MEW Net-work research studies in the Insight database are: (a) TargetedSelf-Management for Epilepsy and Mental Illness (TIME) study,(b) WebEase (Epilepsy Awareness, Support, and Education) study[25], (c) F.O.C.U.S. on Epilepsy: pilot and randomized control trial(RCT) studies, and (d) Home-Based Self-management and CognitiveTraining Changes Lives (HOBSCOTCH) study [26]. The TIME studytested an intervention consisting of in-person group sessions foradults with epilepsy and co-morbid serious mental illness, suchas severe depression, bipolar disorder or schizophrenia. The on-line WebEase Program, which was tested in a RCT, uses principlesof the trans-theoretical model of behavioral change, social behav-ioral theory and motivational interviewing, to provide tailored andresponsive communications, that adapt activities and resources tomeet the user’ readiness to change, self-management needs andconfidence in behavior change. WebEase guides the adoption offavorable medication adherence, stress reduction, and sleep man-agement behaviors by the adult with epilepsy [27]. The FOCUSstudy tested an intervention in adults with epilepsy and memberof their social support network. The intervention goal was teachself-management techniques via a workshop, telephonic coachingsessions, and workbooks that leverage the patient’s social sup-port [25]. The HOBSCOTCH program targets memory and cognitiveproblems in adults with epilepsy. The HOBSCOTCH is a phone-delivered program, which was tested in a small pilot RCT [28].All studies integrated into the database used a repeated measuresresearch design, with data points collected at discrete time inter-vals. For this report, only baseline data points from the studies arefeatured.

The data from these research studies are mapped and trans-formed to the MEW CDEs by the study-specific ETL workflow. TheTier-1 MEW CDEs can be divided into five categories: (a) demo-graphics, (b) seizure details, (c) quality of life, (d) depression, and(e) other health measures. Table 1 shows the data categories, thevariables corresponding to each data category, the number of data

points, and the data value corresponding to each variable. Table 1also shows the distribution of participants across the five studies(the FOCUS Pilot and RCT values are aggregated into single set ofvalues). Fig. 3 is a Venn diagram showing the distribution of the
Page 4: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

24 S.S. Sahoo et al. / International Journal of Medical Informatics 94 (2016) 21–30

Fig. 1. (a) Traditional approaches to biomedical database require a database manager to retrieve data for secondary analysis, however Insight allows users to directly retrieverelevant data. Fig. 1(b): Information and data flow implemented by the MEW Network database group is shown with steps covering completion of a Data User Agreement(DUA) to access by users. The complete workflow consists of 6 steps (marked with a numeric circle in the figure). Step 1 involves completion of a data user agreement betweenthe MEW Network center and CWRU followed by mapping of CDEs to data dictionary terms of the new research study in Step 2. Step 3 consists of using the study-specificETL to integrate the new study data into Insight. In Step 4, the data is annotated with EpSO before storage in the database. Once the MEW Network coordination committeeapproves a new research study in Step 5, an investigator can access the integrated data using the Insight user interface in Step 6.

Fig. 2. (a) The Entity-Relationship (ER) diagram used for the Insight database is shown consisting of tables to store configuration information and study data. Fig. 2(b): Ascreenshot of the Epilepsy and Seizure Ontology (EpSO) class hierarchy is shown with details of various types of epileptic seizures, including Aura, Autonomic Seizures,Dialeptic Seizures, and Motor Seizures.

Page 5: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

S.S. Sahoo

et al.

/ International

Journal of

Medical

Informatics

94 (2016)

21–30

25

Table 1The distribution of data points corresponding to five data categories and their Managing Epilepsy Well Common Data Elements in the Insight database.

Data Category Study Variables Number of Data Points MEW Network Studies Total Number of Participants

WebEase Study FOCUS Study (Pilot and RCT studies) TIME Study HOBSCOTCH Study

Socioeconomic-Demography

GenderMale 430 39 (26%) 69 (38%) 18 (41%) 18 (31%) 144 (33%)Female 109 (74%) 111 (63%) 26 (59%) 40 (69%) 286 (67%)RaceWhite 372 132 (89%) 140 (78%) 16 (36%) – 288 (77%)Black/African-American 2 (1%) 22 (12%) 25 (57%) – 49 (13%)Asian 3 (2%) 0 0 – 3 (1%)American Indian or Alaska Native 0 1 (1%) 3 (7%) – 4 (1%)Other* 11 (8%) 17 (9%) 0 – 28 (8%)EthnicityHispanic or Latino 372 7 (5%) 9 (5%) 3 (7%) – 19 (5%)Not Hispanic or Latino 141 (95%) 170 (94%) 41 (93%) – 352 (95%)Other 0 1 (1%) 0 – 1 (1%)EducationNever attended school or only attended kindergarten 430 0 0 0 0 0Grades 1 through 8 (Elementary) 0 1 (1%) 0 0 1 (1%)Grades 9 through 11 (Some high school) 1 (1%) 3 (1%) 11 (25%) 0 15 (3%)Grade 12 or GED (High school graduate) 19 (13%) 33 (18%) 8 (18%) 27 (47%) 87 (20%)College 1 year to 3 years (Some college or technical school) 68 (46%) 61 (34%) 21 (48%) 0 150 (35%)College 4 years or more (College graduate) 58 (39%) 75 (42%) 4 (9%) 30 (52%) 167 (39%)Other 2 (1%) 7 (4%) 0 1 (1%) 10 (2%)IncomeLess than $25 K 224 – 81 (45%) 42 (95%) – 123 (55%)$25,000–$49,999 – 16 (9%) 2 (5%) – 18 (8%)$50,000 or greater – 7 (4%) 0 – 7 (3%)Other – 76 (42%) 0 – 76 (34%)EmploymentSelf-employed 430 1 (1%) 1 (1%) 0 0 2 (1%)Employed for wages 76 (51%) 68 (38%) 3 (7%) 21 (36%) 168 (39%)Retired 3 (2%) 17 (9%) 5 (11%) 0 25 (6%)Student 5 (3%) 6 (3%) 1 (2%) 0 12 (3%)Homemaker 3 (2%) 8 (4%) 4 (9%) 0 15 (3%)Do not work/Disability 4 (3%) 48 (27%) 19 (43%) 0 71 (17%)Out of work for less than 1 year 0 8 (4%) 2 (5%) 0 10 (2%)Out of work for 1 year or more 0 19 (11%) 10 (23%) 37 (64%) 66 (15%)Other 56 (38%) 5 (3%) 0 0 61 (14%)Relationship StatusMarried 192 69 (47%) – 4 (9%) – 73 (38%)Member of an unmarried couple 12 (8%) – 2 (5%) – 14 (7%)Never married 36 (24%) – 21 (48%) – 57 (30%)Divorced 22 (15%) – 14 (32%) – 36 (19%)Widowed 3 (2%) – 1 (2%) – 4 (2%)Separated 1 (1%) – 2 (4%) – 3 (2%)Refused/ Unknown/Other 5 (3%) – 0 – 5 (2%)

Seizure Details Type of SeizurePartial Seizure 250 65 (44%) – 7 (16%) 49 (85%) 121 (48%)Generalized Seizure 76 (51%) – 25 (57%) 9 (15%) 110 (44%)Other 7 (5%) – 12 (27%) 0 19 (8%)

Health Measures Health StatusExcellent 44 – – 1 (2%) – 1 (2%)Very good – – 4 (9%) – 4 (9%)Good – – 9 (20%) – 9 (20%)Fair – – 20 (46%) – 20 (46%)Poor – – 7 (16%) – 7 (16%)Don’t know/not sure – – 3 (7%) – 3 (7%)

Quality of Life Measure QOLIE-10Average total value (Baseline) 2182 2.59 3.4 2.9 – 3.0

Depression Measure PHQ-9Average total value (Baseline) 2626 – 8.2 10.9 9.3 9.1

QOLIE-10: 10-item quality of life measure, lower scores indicate better quality.PHQ-9: Patient Health Questionnaire, lower scores indicate less depression.

* Other includes additional or non-standard categories, missing data or results that are not interpretable.

Page 6: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

26 S.S. Sahoo et al. / International Journal of M

Fig. 3. The Venn diagram shows the data points corresponding to 437 participantsin different categories of MEW CDE. The data categories are Income, Employment,S(b

nisitof

2

ocmaisOpfrafiaI(lomums

(aifo“

eizure Type, Depression Measure (PHQ-9), and Quality of Life Measure (QOLIE-10)the diagram was created using tool available at: http://bioinformatics.psb.ugent.e/webtools/Venn).

umber of patients corresponding to quality of life, depression,ncome, employment status, and seizure frequency values. Fig. 3hows that there is overlap between some of the measures, includ-ng Income, Quality of Life, and Depression measures (PHQ-9). Inhe next section, we describe the role of the epilepsy domain ontol-gy in data integration, query composition, and query executioneatures of Insight.

.3. Role of the epilepsy and seizure ontology

The Epilepsy and Seizure Ontology (EpSO) is an epilepsy domainntology that has been developed by an interdisciplinary team oflinical researchers and computer scientists to support various dataanagement tasks in epilepsy research [19]. A domain ontology is

formal knowledge representation structure that models domainnformation for consistent and correct interpretation of data byoftware applications [29]. Domain ontologies, such as the Genentology (GO) [30] and Human Phenotype Ontology (HPO) [31],lay a key role in reconciling data heterogeneity, harmonizing data

rom different sources, and ensure the “completeness” of queryesults using ontology reasoning [32]. EpSO is being developed as

domain ontology using the well-known four-dimensional classi-cation of epilepsy and seizures with information about seizures,natomical locations, etiology, and related medical conditions [19].n addition, EpSO also models terms describing medication, geneswith known association to epilepsy syndromes), electrophysio-ogical signal features, and the MEW CDEs. We use various types ofntology modeling constructs to represent domain-specific infor-ation in EpSO. For example, ontology class-level restrictions are

sed to model information about preferred medication, paroxys-al events, and signal features associated with different epilepsy

yndromes.Using the description logic-based Web Ontology Language

OWL2) features [33], EpSO can support identification of equiv-lent terms with different syntactic labels (e.g., using synonym

nformation and commonly used acronyms), development of userriendly visual interface (e.g., using textual description of ontol-gy terms), and improvement in quality of query results throughquery unfolding” technique using reasoning. The query unfolding

edical Informatics 94 (2016) 21–30

strategy uses the ontology class hierarchy together with OWL2 sub-sumption reasoning to expand a user query expression by includingall the appropriate subclasses of a query term in the query expres-sion. This enables the query execution module to retrieve resultscorresponding to query term and all its sub classes. For example, if acohort identification query includes “Aura” as a type of seizure, theInsight query execution module automatically expands the queryexpression to include all its sub classes such as “Auditory Aura”,“Gustatory Aura”, etc. Many biomedical database query executionapproaches rely only on exact string matching between the queryterm and the data values, and therefore query result is often incom-plete. This limitation is addressed in Insight through use of EpSOclass hierarchy during query execution.

Given the key role of EpSO in Insight, the ontology managementmodule supports: (a) automated updates of EpSO OWL file as soonas a new version of the ontology is released; and (b) semantic anno-tation of research study data as part of the ETL workflow (describedearlier). The automated update feature uses a remote Web serviceinvocation to retrieve the ontology OWL file, which is parsed usingthe open source OWL Application Programming Interface (OWLAPI)[34]. The OWLAPI was developed to allow users to programmati-cally create and modify OWL ontologies with support for parsing,validation of OWL2 profiles, and use of reasoners. The parsed ontol-ogy classes together with the class structure information are used toupdate the Insight database. In addition to query unfolding, Insightuses the EpSO class hierarchy to support the data exploration andquery composition functionalities also, which are described in thefollowing section.

2.4. Data exploration, visualization, and query composition

Insight allows users to interactively explore the different studyvariables stored in the database and perform queries to createresearch cohorts across different MEW research studies for sub-sequent analysis. This module was implemented using the Rubyon Rails (RoR) Web application framework that uses Model ViewController (MVC) architecture. The MVC architecture facilitates thelogical separation of different components of Insight for easiermaintenance and extension with new functionalities. Using agilesoftware engineering principles, we developed Insight over mul-tiple iterations with user-guided prioritization of functionalitiesand frequent user feedback. Agile development technique allowedevolutionary development of Insight with easier incorporation ofrequested changes and higher user satisfaction through early avail-ability of working software [35]. Insight can be accessed at http://mew.meds.cwru.edu. After logging into Insight, the user accessesthe data exploration and query interface.

Based on user recommendations, the data access and queryprocess is divided into two phases. In the first phase, users canselect the most appropriate research study for data explorationand cohort identification by filtering research studies based on theirprovenance metadata (e.g., inclusion and/or exclusion criteria). Thestudy metadata is manually extracted from the protocol descrip-tion of each research study, categorized into specific categories, andmade available to the user as “provenance metadata widgets”. Theprovenance widget consists of visual interactive query compositionfeatures. Provenance information describes the history or lineageof data, which enables accurate interpretation of data in the cor-rect context [36]. Therefore, the provenance of the research studiesallows users to select research studies that meet the requirementsof their research hypothesis and create research cohorts with com-parable data points. After a user selects the provenance values in

the study, metadata query widgets and the resulting research stud-ies are displayed in the “Query Result” window with links providedfor the user to view the complete study protocol (Fig. 4). Users canalso download all the data corresponding to a particular research
Page 7: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

S.S. Sahoo et al. / International Journal of Medical Informatics 94 (2016) 21–30 27

Fig. 4. A screenshot of the first phase of querying using provenance metadata of the research studies using a query widget with editable values. The users can also review ad

spns

rpiofttq

ifesoaltsh“soiscctoca

3

Mun

etailed description of study protocol of a research study.

tudy in the “Query Result” window without completing the secondhase of querying. In addition, users can also modify their prove-ance metadata values in the query widgets to refine their researchtudy search queries.

In the second phase, users can create a research cohort from theesearch studies selected in the first phase by selecting an appro-riate MEW CDE from a “drop-down” menu. Each MEW CDE is

mplemented as an individual query widget and users can selectne or more values using either a drop-down menu or input inter-ace for numeric values (as shown in Fig. 5). The values selected byhe user in each MEW CDE query widget are displayed as “editableags” and these tags can be easily deleted to modify the cohortuery.

To help the users to select the most relevant values correspond-ng to a MEW CDE, Insight features intuitive data visualizationunctionality. The data visualization functionality allows users toxplore the distribution of data values across multiple researchtudies by selecting an option from different graphic visualizationptions such as histogram or pie chart. These visualization featuresllow users to select the most appropriate MEW CDE values thatead to inclusion of greater or smaller number of participants inheir research cohorts. For example, the visualization feature mayhow that most of the study participants in the Insight databaseave “college or technical school” qualification corresponding toeducation” CDE. Therefore, users can select “college or technicalchool” to ensure that their research cohort has maximum numberf participants. Fig. 5 shows histogram and pie charts correspond-

ng to three MEW studies for the “education” MEW CDE. Fig. 5hows the research cohort corresponding to the cohort identifi-ation query composed by the user. The users can access the resultohort using features of the “result explorer”, which allows userso download the research cohort data as an MS Excel spreadsheetr CSV file for further analysis. In the following section, we dis-uss some of the planned extensions and challenges currently beingddressed in the MEW Network database initiative.

. Discussion

As the Insight system is expanded to include data from additionalEW Network studies, there is increasing interest in effectively

sing the integrated datasets to gain new insights into epilepsyeurological disorder and self-management techniques.

3.1. Application of the Insight platform in epilepsyself-management research

The integrated dataset together with the various features sup-ported by the Insight platform can provide useful support to theMEW Network in advancing it’s mission and plan future researchpriorities. For example, preliminary descriptive data extracted fromthe database suggests that women with epilepsy make up thelargest sub population in the MEW Network enrolled sample withmean value of 0.67 women and 0.33 men for data aggregated fromfour research studies (Table 1). Table 1 also describes the study-specific distribution of women and men. This finding is perhapsnot surprising given that women may be more likely than men toseek help for chronic health conditions [37]. However, the data onepilepsy does not suggest a clear gender susceptibility for mostforms of epilepsy [38]. Thus, men with epilepsy appear to be inad-equately represented in the samples collected by various MEWNetwork studies. This insight gained from the integrated datasetcan form the basis for devising and implementing strategies for amore gender-balanced sampling approach. This will provide trulyrepresentative data findings for people with epilepsy with respectto different self-management techniques.

3.2. Towards dynamic integration of data from ongoing MEWnetwork research studies

At present, the Insight ETL workflow is executed in a “batchmode” to integrate data from completed research studies. In con-trast to completed studies, data from ongoing research studies needto be integrated into Insight as a continuous data feed. In future,we propose to use a Service Oriented Architecture (SOA) that con-sists of RESTful service endpoints hosted at each MEW Networkcollaborating center on a dedicated server to transfer de-identifiedstudy data. The service endpoint can be remotely queried by Insightusing a secure connection for new data and allow easy transferof data to Insight. We believe that this approach (based on “pull”instead of “push”) will allow integration of a continuous data feedfrom ongoing research studies with minimal additional effort bythe researchers at different MEW centers.

The ETL workflow will be extended to process data from multi-ple research studies simultaneously. Although we do not expect

the size of data from different research studies to be large, wewill implement a parallelized ETL workflow using scalable Hadooptechnology, for example MapReduce [39] deployed on a cluster-computing infrastructure to process data from multiple studies
Page 8: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

28 S.S. Sahoo et al. / International Journal of Medical Informatics 94 (2016) 21–30

F n of da ed usir

stanct

3m

rtsedhusri[rtp

astacdWiouca

ig. 5. The data exploration features of Insight allows users to view the distributioppropriate cohort identification queries. The cohort identification query is composesearch cohort is also available for viewing.

imultaneously. In addition, the ETL workflow will also be extendedo incorporate the Tier-2 MEW CDEs, which have recently beenpproved by the MEW Network database steering committee. Theew Tier-2 CDEs will also be incorporated in EpSO as ontologylasses and these classes will be used for semantic annotation ofhe data.

.3. Formal representation of research study provenanceetadata

The current implementation of Insight does not structure theesearch study metadata into specific categories and only presentshem as list of inclusion or exclusion criteria. There has been exten-ive work on formalization of research study protocols [40,41], forxample the Ontology for Clinical Research (OCRe) that models theetails of study design and eligibility criteria [42]. The OCRe projectas also developed an annotation pipeline called ERGO, which issed for formal representation of eligibility criteria of researchtudies. We propose to extend this work together with standardepresentation model of provenance, such as the new PROV spec-fications developed by the World Wide Web Consortium (W3C)43], to model the research study metadata in Insight. This formalepresentation of research study metadata will enable users to sys-ematically select inclusion and exclusion criteria during the firsthase of querying in Insight.

In addition, the formal representation of study provenance willlso enable reasoning over metadata values from multiple researchtudies for cross-study cohort identification queries. We expecthat our experience with formal modeling of study provenance willllow us to provide appropriate feedback to the MEW Networkollaborating centers in terms of capturing relevant study meta-ata for subsequent secondary data analysis in Insight. The use of3C PROV specifications to model study provenance will also make

t easier for Insight to interface with existing biomedical datasets

n the Web, such as the Linked Open Data (LOD) [44], which alsose PROV to model provenance information. For example, Insightan retrieve data about epilepsy related genes (together with itsvailable provenance information) from LOD.

ata values corresponding to a MEW CDE in real time and helps them to composeng a set of visual query widgets with editable values and the details of the resulting

3.4. Limitations

The MEW Network was not funded as a centralized entity withstandardized procedures and standard pre-assigned data elements;therefore interpretation of results is limited by heterogeneity atthe micro and macro-level. Investigators using Insight should readthrough the specific study protocols carefully to understand howsamples may differ based upon enrollment criteria, study designand study implementation. Data harmonization procedures mayobscure the finer-grained detail that a specific individual studyhas identified and results can differ from individual study reportsbecause of the way missing or outlier data is handled or analyzed.However, the MEW database represents an innovative approachfor using data that takes advantage of the strengths of larger sam-ples to identify patterns or outcomes that would not be possiblewith smaller studies. Even for studies where efficacy findings arenegative (such as the FOCUS RCT study), the collected data is stilluseful to evaluate other potentially important clinical questionsabout epilepsy self-management.

4. Conclusion

In this paper, we described the development of Insight, an inte-grated database and analytical platform for the MEW Networkcollaborating centers to facilitate cross-study cohort identificationfor aggregate analysis. This approach effectively leverages the valu-able epilepsy self-management data being collected across the U.S.Insight is the first provenance-enabled epilepsy data integrationand analysis system that allows users to explore data distribu-tion and perform research cohort identification. In addition, Insightuses a novel epilepsy domain ontology called EpSO for seman-tic annotation, data integration, and query execution. The EpSOclass structure is used for implementing a “query unfolding” strat-egy, which ensures completeness of the cohort identification query

results. Insight also features an ontology management module toautomatically update the database with each new release of EpSOand provides fine-granularity access control through a flexibleRBAC feature.
Page 9: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

S.S. Sahoo et al. / International Journal of M

Summary pointsChallenges impeding effective secondary analysis of clinicalresearch data:(1) Lack of appropriate informatics tool to support data access,querying, and analysis by researchers,(2) Use of heterogeneous terminology during data collectionmakes integration and analysis difficult.This study advances secondary analysis of epilepsy self man-agement research data with following contributions:(1) Development and use of community-driven Common DataElements (CDE) together with an epilepsy domain ontology fordata integration and analysis by researchers in the ManagingEpilepsy Well (MEW) Network,(2) Development of an integrated database and analysis plat-form called Insight allows researchers to conduct cross-studyanalysis with use of provenance metadata to select appropriatestudies for cohort queries,(3) Interactive visual interface allows researchers to explore

faTMbwtadt

C

F

aai

A

idcm

A

SItm

R

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[28] T. Caller, K. Secore, R. Ferguson, R. Roth, F. Alexandre, J. Harrington, B. Jobst, A

data distribution during query composition.

We aim to develop Insight as a national informatics resourceor the epilepsy self-management research community to allowdvanced data analytics and research cohort identification queries.he Insight database is already being used to support analysis byEW Network investigators who aim to understand the correlation

etween quality of life and other clinical variables among personsith epilepsy. As new MEW Network research studies are added

o Insight, we believe the MEW Network researchers can evalu-te novel research hypotheses with sufficient statistical power andata variety to advance our understanding of self-managementechniques for epilepsy.

onflict of interest

None declared.

unding

This work is supported in part by the Centers for Disease Controlnd Prevention (CDC) (grant# U48DP005030 and U48DP001930),nd the National Institutes of Biomedical Imaging and Bioengineer-ng (NIBIB) Big Data to Knowledge (BD2 K) grant (1U01EB020955).

uthor contributions

SSS, PR, MS designed the overall architecture of Insight with usernput from EW. PR, SSS, JV implemented the software platform withata collection and curation done by AB, EW, YB, SS, BCJ, and CT. Allo-authors contributed to writing, reviewing, and editing the finalanuscript.

cknowledgements

The authors thank the members of the MEW Network databaseteering Committee for their continuing support in developingnsight. We also thank Rosemarie Kobau and Matthew Zack fromhe CDC for reviewing our manuscript and providing valuable com-

ents.

eferences

[1] E. Robinson, C. DiIorio, L. DePadilla, F. McCarty, K. Yeager, T. Henry, D.Schomer, P. Shafer, Psychosocial predictors of lifestyle management in adultswith epilepsy, Epilepsy Behav. 13 (3) (2008) 523–528.

[

edical Informatics 94 (2016) 21–30 29

[2] C. DiIorio, P.O. Shafer, R. Letz, T.R. Henry, D.L. Schomer, K. Yeager, Project EASEstudy group project EASE: a study to test a psychosocial model of epilepsymedication managment, Epilepsy Behav. 5 (6) (2004) 926–936.

[3] C. DiIorio, B. Faherty, B. Manteuffel, Epilepsy self-management: partialreplication and extension, Res. Nurs. Health 17 (1994) 167–174.

[4] C. DiIorio, M. Hennessy, B. Manteuffel, Epilepsy self-management: a test of atheoretical model, Nurs. Res. 45 (1996) 211–217.

[5] Institute of Medicine (IOM) Report Living Well with Chronic Illness: A Call forPublic Health Action. 2012.

[6] T. Dua, H.M. de Boer, L.L. Prilipko, S. Saxena, Epilepsy care in the world:results of an ILAE/IBE/WHO global campaign against epilepsy survey,Epilepsia 47 (7) (2006) 1225–1231.

[7] Centers for Disease Control and Prevention. Available from: http://www.cdc.gov/. Retrieved on December 24, 2015.

[8] C. DiIorio, M. Henry, Self-management in persons with epilepsy, J. Neurosci.Nurs. 27 (6) (1995) 338–343.

[9] S. Kendall, D. Thompson, L. Couldridge, The information needs of carers ofadults diagnosed with epilepsy, Seizure 13 (7) (2004) 499–508.

10] W.A. Hauser, J.R. Lee, Do seizures beget seizures, Prog. Brain Res. 135 (2002)215–219.

11] R. Kobau, H. Zahran, D.J. Thurman, M.M. Zack, T.R. Henry, S.C. Schachter, P.H.Price, Centers for Disease Control and Prevention (CDC), Epilepsy surveillanceamong adults–19 States, Behavioral Risk Factor Surveillance System, 2005.Morbidity and Mortality Weekly Report Surveillance Summary. 2008;57,(6):1–20.

12] D. Yoon, K.D. Frick, D.A. Carr, J.K. Austin, Economic impact of epilepsy in theUnited States, Epilepsia 50 (2009) 2186–2191.

13] R.E. Faught, J.R. Weiner, A. Guérin, M.C. Cunnington, M.S. Duh, Impact ofnonadherence to antiepileptic drugs on health care utilization and costs:findings from the RANSOM study, Epilepsia 50 (3) (2009) 501–509.

14] K.A. Cole, P.M. Gaspar, Implementation of an epilepsy self-managementprotocol, J. Neurosci. Nurs. 47 (1) (2015) 3–9.

15] J.P. Holdren, E. Lander, Realizing the full potential of health informationtechnology to improve healthcare for americans: the path forward, in: PCASTReport, Washington D.C., 2010, p. 2010.

16] Integration of clinical and genetic data in the i2b2 architectureS. Murphy, M.E.Mendis, D.A. Berkowitz, I. Kohane, H. Chueh (Eds.), AMIA Annu. Symp. Proc.(2006).

17] S.S. Sahoo, G.Q. Zhang, Y. Bamps, R. Fraser, S. Stoll, S.D. Lhatoo, C. Tatsuoka, J.Sams, E. Welter, M. Sajatovic, Managing information well: toward anontology-driven informatics platform for data sharing and secondary use inepilepsy self-management research centers, Health Inform. J. (2015) 1–14.

18] J.B. Wagenaar, B.H. Brinkmann, Z. Ives, G.A. Worrell, B.A. Litt, Multimodalplatform for cloud-based collaborative research, in: 6th InternationalIEEE/EMBS Conference on Neural Engineering (NER), San Diego, CA : IEEE,2013, pp. 1386–1389.

19] S.S. Sahoo, S.D. Lhatoo, D.K. Gupta, L. Cui, M. Zhao, C. Jayapandian, A. Bozorgi,G.Q. Zhang, Epilepsy and seizure ontology: towards an epilepsy informaticsinfrastructure for clinical research and patient care, J. Am. Med. Inform. Assoc.21 (1) (2014) 82–89.

20] A. Doan, A. Halevy, Z. Ives, Principles of Data Integration, Morgan Kaufmann,Waltham, MA, 2012.

21] E. Mena, A. Illarramendi, V. Kashyap, A. Sheth, OBSERVER: an approach forquery processing in global information systems based on interoperationacross pre-existing ontologies, Distrib. Parallel Databases (DAPD) 8 (2) (2000)223–271.

22] D.C. Hesdorffer, C.E. Begley, Surveillance of epilepsy and prevention ofepilepsy and its sequelae: lessons from the Institute of Medicine report, Curr.Opin. Neurol. 26 (2) (2013) 168–173.

23] D.W. Loring, D.H. Lowenstein, N.M. Barbaro, B.E. Fureman, J. Odenkirchen,M.P. Jacobs, J.K. Austin, D.J. Dlugos, J.A. French, W.D. Gaillard, B.P. Hermann,D.C. Hesdorffer, S.N. Roper, A.C. Van Cott, S. Grinnon, A. Stout, Common dataelements in epilepsy research: development and implementation of theNINDS epilepsy CDE project, Epilepsia 52 (6) (2011) 1186–1191.

24] US Center for Disease Control and Prevention (CDC). Behavioral Risk FactorSurveillance System (BRFSS), 2012, Available from: http://www.cdc.gov/brfss/about/brfss today.htm. Retrieved on December 24, 2015.

25] C.K. DiIorio, Y.A. Bamps, A.L. Edwards, C. Escoffery, N.J. Thompson, C.E. Begley,R. Shegog, N.M. Clark, L. Selwa, S.C. Stoll, R.T. Fraser, P. Ciechanowski, E.K.Johnson, R. Kobau, P.H. Price, Managing Epilepsy Well Network: theprevention research centers’ managing epilepsy well network, EpilepsyBehav. 19 (3) (2010) 218–224.

26] T.A. Caller, R.J. Ferguson, R.M. Roth, K.L. Secore, F.P. Alexandre, W. Zhao, T.D.Tosteson, P.L. Henegan, K.A. Birney, B.C. Jobst, A cognitive-behavioralintervention (HOBSCOTCH) improves quality of life and attention in epilepsy:a pilot study, Epilepsy Behav. 57 (Pt A) (2016) 111–117, http://dx.doi.org/10.1016/j.yebeh.2016.01.024.

27] C. DiIorio, C. Escoffery, K.A. Yeager, F. McCarty, T.R. Henry, A. Koganti, E.Reisinger, E. Robinson, R. Kobau, P. Price, WebEase: development of aweb-based epilepsy self-management intervention, Prev. Chronic Illn. 6 (1)(2009) A28.

pilot study of a self-management intervention for cognitive impairment inepilepsy, Neurology 82 (10 Suppl. S43.001) (2014).

29] O. Bodenreider, R. Stevens, Bio-ontologies: current trends and futuredirections, Brief. Bioinform. 7 (3) (2006) 256–274.

Page 10: International Journal of Medical Informatics · School of Public Health, Emory University, Atlanta, GA 30322, United States e Center for Managing Chronic Disease, University of Michigan,

3 al of M

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

cancer, and heart disease.Data User Agreements (DUA): is an agreement for transfer and usage of data across

different entities, for example research centers or universities.

0 S.S. Sahoo et al. / International Journ

30] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis,K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A.Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G.Sherlock, Gene ontology: tool for the unification of biology: the gene ontologyconsortium, Nat. Genet. 25 (1) (2000) 25–29.

31] S. Köhler, S.C. Doelken, C.J. Mungall, S. Bauer, H.V. Firth, I. Bailleul-Forestier,G.C. Black, D.L. Brown, M. Brudno, J. Campbell, D.R. FitzPatrick, J.T. Eppig, A.P.Jackson, K. Freson, M. Girdea, I. Helbig, J.A. Hurst, J. Jähn, L.G. Jackson, A.M.Kelly, D.H. Ledbetter, S. Mansour, C.L. Martin, C. Moss, A. Mumford, W.H.Ouwehand, S.M. Park, E.R. Riggs, R.H. Scott, S. Sisodiya, S. Van Vooren, R.J.Wapner, A.O. Wilkie, C.F. Wright, A.T. Vulto-van Silfhout, N. de Leeuw, B.B. deVries, N.L. Washingthon, C.L. Smith, M. Westerfield, P. Schofield, B.J. Ruef, G.V.Gkoutos, M. Haendel, D. Smedley, S.E. Lewis, P.N. Robinson, The humanphenotype ontology project: linking molecular biology and disease throughphenotype data, Nucleic Acids Res. (2014) 966–974.

32] S.S. Sahoo, O. Bodenreider, J.L. Rutter, K.J. Skinner, A.P. Sheth, Anontology-driven semantic mashup of gene and biological pathwayinformation: application to the domain of nicotine dependence, J. Biomed.Inform. 41 (5) (2008) 752–765 (Semantic Mashup of Biomedical Data).

33] P. Hitzler, M. Krötzsch, B. Parsia, P.F. Patel-Schneider, S. Rudolph, OWL 2 WebOntology Language Primer, World Wide Web Consortium W3C, 2009.

34] M. Horridge, S. Bechhofer, The OWL API: a java API for OWL ontologies,Semant. Web J. 2 (1) (2011) 11–21.

35] K. Beck, M. Beedle, A.V. Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J.Grenning, et al. Manifesto for agile software development, 2001.

36] S.S. Sahoo, V. Nguyen, O. Bodenreider, P. Parikh, T. Minning, A.P. Sheth, Aunified framework for managing provenance information in translationalresearch, BMC Bioinf. 12 (461) (2011).

37] B. O’Dea, N. Glozier, R. Purcell, P.D. McGorry, J. Scott, K.L. Feilds, D.F. Hermens,J. Buchanan, E.M. Scott, A.R. Yung, E. Killacky, A.J. Guastella, I.B. Hickie, Across-sectional exploration of the clinical characteristics of disengaged (NEET)young people in primary mental healthcare, BMJ Open 4 (12) (2014).

38] P. Perucca, P. Camfield, C. Camfield, Does gender influence susceptibility andconsequences of acquired epilepsies? Neurobiol. Dis. 72 (Pt. B) (2014)125–130.

39] J. Dean, S. Ghemawat, MapReduce: a flexible data processing tool, Commun.ACM 53 (1) (2010) 72–77.

40] C. Weng, X. Wu, Z. Luo, M.R. Boland, D. Theodoratos, S.B. Johnson, EliXR: an

approach to eligibility criteria extraction and representation, J. Am. Med.Inform. Assoc. 18 (Suppl. 1) (2011) 116–124.

41] K. Milian, A. Teije (Eds.), Towards Automatic Patient Eligibility Assessment:from Free-text Criteria to Queries. 14th Conference on Artificial Intelligence inMedicine, AIME, 2013, Springer, Murcia Spain, 2013.

edical Informatics 94 (2016) 21–30

42] I. Sim, S.W. Tu, S. Carini, H.P. Lehmann, B.H. Pollock, M. Peleg, K.M. Wittkowski,The Ontology of Clinical Research (OCRe): an informatics foundation for thescience of clinical research, J. Biomed. Inform. 52 (2014) 78–91.

43] L. Moreau, P. Missier, PROV Data Model (PROV-DM), World Wide WebConsortium W3C, 2013.

44] C. Bizer, T. Heath, T. Berners-Lee, Linked data – the story so far, Int. J. Semant.Web Inf. Syst. (2009) (Special Issue on Linked Data.).

Glossary of Terms

Managing Epilepsy Well (MEW) Network: is a thematic research network whose mis-sion is to advance the science of epilepsy self-management.

Common Data Elements (CDE): is a set of well-defined terms used consistently acrossmultiple research studies to reduce data heterogeneity, improve data quality,and facilitate data integration.

Ruby on Rails (RoR): is a web development framework that uses Model View Con-troller (MVC) architecture together with Ruby language for rapid applicationdevelopment.

Web Ontology Language Application Programming Interface (OWLAPI): was developedto allow users to programmatically create and modify OWL ontologies withsupport for parsing, validation of OWL2 profiles, and use of reasoners.

Epilepsy and Seizure Ontology (EpSO): is a domain ontology developed to formallymodel terms associated with the well-known four-dimensional classification ofepilepsy consisting of seizure, anatomical location, etiology, and related medicalconditions.

Role-based Access Control (RBAC): is an approach to manage accessibility of users tocomputational resources based on their role.

U.S. Centers for Disease Control and Prevention’s (CDC): is a federal organization in theUnited States focused on national public health through control and preventionof diseases.

Prevention Research Centers (PRC): is a network of research centers funded by theUS CDC to study techniques to address risks of chronic illness such as obesity,