The ACP MS Repository: A Case Study
tranSMART Community Meeting, Nov. 2013
Stephen Wicks, Ph.D.
The ACP Repository: a Case Study
What is Multiple Sclerosis?
Chronic inflammatory demyelinating disorder affecting the central nervous system (CNS); prevalence about 0.1%.
Leading cause of neurological disability in young adults.
Symptoms are variable and significant. They include problems with vision, cognition, locomotion, pain, disorientation, dexterity, mood, bowel/bladder control, and others.
Generally progressive, but progression is idiosyncratic (e.g. CIS → RRMS → SPMS vs. CIS → PPMS).
Complex etiology
What is the cost of MS?
Difficult and costly to diagnose (requires MRI; symptom variability leads to an extensive differential diagnosis).
Treatments can slow progression, but are expensive.
Many different drugs exist, but patient stratification for maximum efficacy and minimum side effects is non-existent: “Roll the dice”.
Often strikes early in life and is a lifelong disability.
Average age at diagnosis is about 30; 5% of patients are diagnosed before age 16.
ACP is a founding member of Orion. Orion seeks to cure MS by harnessing the power of computational modeling of disease progression.
ACP will provide its data to Orion in tranSMART to facilitate this goal.
Rancho BioSciences will curate and harmonize the ACP data for Orion.
Orion Bionetworks
ACP and the MS Repository
ACP was founded in 2001 by an MIT entrepreneur with MS.
The ACP MS Repository started in 2006. The goal was to identify the cause of MS.
ACP MS Repository enrollment closed this year (2013). Approximately 3,200 participants enrolled.
Biosamples, demographics, medical history, etc.
Research data OPT-UP
• Repository enrollment status (6/21/2013): 3,220 subjects enrolled; 467 longitudinal visits completed
• DNA, RNA, plasma, serum, PBMCs + data from a 52-page CRF
The ACP Engine (figure):
ACP Repository: 3,200+ participants; biosamples & datasets; $13 million invested.
MS researchers worldwide (academia & industry): 77 sets of biosamples + data; millions (billions) of data points from 36 studies so far.
“Matchmaker” database graphical user interface: allows MS researchers worldwide to explore the ACP Repository database.
MS Discovery Forum: reviews developments in the MS field and communicates with MS researchers.
Insights and results: mechanisms, diagnostics, causes, treatments.
ACP MS Repository
Open-access collection of highly annotated blood-derived samples plus data from MS, related diseases, and control subjects, gathered from 2006 to 2013.
Research data derived from the samples must be deposited with ACP (with a provision for IP protection).
Contributes to MS and related research in many ways:
Enables studies that might not be conducted otherwise (academic and commercial).
Creates a common results database for studies from multiple bio-analytical techniques on overlapping sets of subjects.
Approximately 3,200 participants.
“Working with them (ACP) allowed us to obtain critical samples and confirm our results for only $20,000. If I had to obtain these samples from scratch, it would have cost $1 million and added 5 years to the project.”
- Thomas M. Aune, PhD, Molecular Biology, Vanderbilt University School of Medicine
(from Scientific American)
Case Report Form (CRF) Curation Challenges
ACP Case Report Form
48 pages (first visit) and 38 pages (second visit): a complete clinical workup.
The form is completed with the assistance of a clinical research associate during an interview lasting several hours (with sample draw and lab workup).
Broad data: 80 distinct tables in an SQL database.
Deep data: more than 20 million cells in flat data files.
CRF Sample Fields
Illustrates some of the problems associated with curating this dataset.
103 distinct textual responses: “Betseron”, “beta-seron”, “betaseron”, “BETASERON”, etc.
Study drugs (“CS-0777”) or drug trial enrollment (“BG00012 (FUMARATE) OR PLACEBO”) reported as drugs.
Inappropriate (sometimes lethally so) drug units.
No consistent measure of frequency.
“First Drug”, “Second Drug”, etc.: the ordinal order was meaningless.
DMD (disease-modifying drug) Curation Solutions
We applied drug ontologies and mapping vocabularies where needed.
We repaired dose, frequency, etc. and consolidated them into a single measure with three values (high, standard, low).
We reformatted the data to eliminate the ambiguous ordinal ordering of reporting.
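A minimal Python sketch of the two DMD fixes follows. It assumes a curator-maintained mapping vocabulary and a per-drug reference weekly dose supplied by the curator. The names `DRUG_VOCAB`, `normalize_drug`, and `dose_tier`, the 10% tolerance, and the weekly-dose comparison are hypothetical illustrations, not ACP's or Rancho BioSciences' actual rules.

```python
import re
from typing import Optional

# Hypothetical curator-maintained mapping vocabulary:
# punctuation-free lowercase key -> preferred drug name.
DRUG_VOCAB = {
    "betaseron": "Betaseron",  # covers "betaseron", "BETASERON", "beta-seron"
    "betseron": "Betaseron",   # misspelling confirmed by a curator
}


def canonical_key(raw: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


def normalize_drug(raw: str) -> Optional[str]:
    """Return the preferred drug name, or None if the entry needs manual review."""
    return DRUG_VOCAB.get(canonical_key(raw))


def dose_tier(dose: float, doses_per_week: float,
              reference_weekly_dose: float, tolerance: float = 0.1) -> str:
    """Collapse dose and frequency into one of three values: high, standard, low.

    reference_weekly_dose is a hypothetical per-drug value supplied by the
    curator, in the same units as dose; tolerance is an assumed band.
    """
    if reference_weekly_dose <= 0:
        raise ValueError("reference_weekly_dose must be positive")
    ratio = dose * doses_per_week / reference_weekly_dose
    if ratio > 1 + tolerance:
        return "high"
    if ratio < 1 - tolerance:
        return "low"
    return "standard"


for entry in ["Betseron", "beta-seron", "BETASERON", "CS-0777"]:
    print(entry, "->", normalize_drug(entry))
```

Entries that return None, such as the study drug “CS-0777”, are left for a curator rather than guessed. Normalizing to a weekly amount before tiering is what lets inconsistent frequency reports collapse into the single three-valued measure.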
CRF Sample Fields
Multiple drugs (observations) per subject were addressed by applying VISIT_NAME.
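The reshaping can be sketched in Python. This assumes a wide CRF export with one column per reported drug. Each non-empty drug becomes its own row, distinguished by a `VISIT_NAME` value, in a tab-delimited layout. `SUBJ_ID` and `VISIT_NAME` follow tranSMART's clinical-data naming conventions as we understand them. The helper names, the “Observation N” labels, and the `DRUG` column are hypothetical, so check them against the loader's own documentation.

```python
import csv
import io


def wide_to_long(row, drug_columns, subject_column="subject"):
    """Turn one wide CRF row into one row per reported drug.

    The "Observation N" label (hypothetical) only separates observations;
    it carries no ranking, since the original "First Drug"/"Second Drug"
    ordering was meaningless.
    """
    records = []
    for column in drug_columns:
        drug = (row.get(column) or "").strip()
        if not drug:
            continue  # skip empty slots so numbering stays contiguous
        records.append({
            "SUBJ_ID": row[subject_column],
            "VISIT_NAME": f"Observation {len(records) + 1}",
            "DRUG": drug,
        })
    return records


def to_tsv(records):
    """Serialize records as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["SUBJ_ID", "VISIT_NAME", "DRUG"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


row = {"subject": "S001", "First Drug": "Betaseron",
       "Second Drug": "", "Third Drug": "Copaxone"}
print(to_tsv(wide_to_long(row, ["First Drug", "Second Drug", "Third Drug"])))
```

Going from one column per slot to one row per observation means a subject can report any number of drugs without adding columns to the mapping.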
Controlled Vocabularies (sports)
~5,000 responses; 779 distinct sports reported. When filtered by “ski”: 29; “gym”: 45; “walk”: 30; “jog”: 17; “run”: 40.
Controlled Vocabularies (sports)
All sports were mapped to a 29-term vocabulary.
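One way to implement such a mapping is an ordered list of keyword rules. The first matching rule assigns the vocabulary term, and anything unmatched is queued for manual review. The four terms and regular expressions below are a hypothetical excerpt, not the actual 29-term vocabulary.

```python
import re
from collections import Counter

# Hypothetical excerpt of a sports vocabulary; the first matching rule wins.
SPORT_RULES = [
    (re.compile(r"\bski(s|ing)?\b"), "Skiing"),      # word boundaries keep "skip" out
    (re.compile(r"\b(run\w*|jog\w*)"), "Running"),   # "jog" and "run" share one term
    (re.compile(r"\bwalk\w*"), "Walking"),
    (re.compile(r"\b(gym\w*|weight\w*)"), "Gym/Fitness"),
]


def map_sport(raw: str) -> str:
    """Map a free-text sport to a vocabulary term, or "Other" for curator review."""
    text = raw.lower()
    for pattern, term in SPORT_RULES:
        if pattern.search(text):
            return term
    return "Other"


def tally(responses):
    """Count responses per vocabulary term."""
    return Counter(map_sport(r) for r in responses)


print(tally(["Skiing", "ski", "jogging", "Running", "walking", "chess"]))
```

Anchoring the patterns at word boundaries is a deliberate design choice. It avoids the false hits a bare substring filter produces (for example, “ski” inside “skip”), which is one reason raw substring counts like those above overstate or understate the real categories.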
Controlled Vocabularies (pets)
~6,500 pets reported; 600 distinct pets reported. Filtering by “dog” finds 112; however, this misses misspellings (“diog”, “dot”, “pubs”), dog-like pets (“wolf”, “half-wolf”, “mutt”), and breeds (“poddle”, “poodle”, “Afghan Hound”, etc.).
59 additional dog-like entries
Controlled Vocabularies (pets)
All pets were mapped to a 31-category controlled vocabulary.
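A minimal sketch of why an explicit synonym table beats substring filtering follows. It assumes a curator-built table that lists misspellings, dog-like pets, and breeds explicitly. The table is a hypothetical excerpt covering two of the 31 categories, and the substring fallback for plurals is our own addition.

```python
from collections import Counter

# Hypothetical excerpt of a curator-built synonym table (lowercase keys).
# Entries such as "dot" and "pubs" map to Dog only because a curator
# judged them to be misspellings of "dog" in context.
PET_SYNONYMS = {
    "dog": "Dog", "diog": "Dog", "dot": "Dog", "pubs": "Dog",
    "wolf": "Dog", "half-wolf": "Dog", "mutt": "Dog",
    "poodle": "Dog", "poddle": "Dog", "afghan hound": "Dog",
    "cat": "Cat",
}


def map_pet(raw: str) -> str:
    """Map a free-text pet entry to a controlled-vocabulary category."""
    key = raw.strip().lower()
    if key in PET_SYNONYMS:
        return PET_SYNONYMS[key]
    if "dog" in key:  # fallback for plurals and phrases such as "dogs"
        return "Dog"
    return "Unmapped"  # queued for curator review


def naive_dog_count(responses):
    """Count entries that a plain substring filter on "dog" would find."""
    return sum("dog" in r.lower() for r in responses)


def tally_pets(responses):
    """Count responses per category."""
    return Counter(map_pet(r) for r in responses)


sample = ["Dog", "dogs", "diog", "poddle", "Afghan Hound", "mutt", "cat", "goldfish"]
print(naive_dog_count(sample), tally_pets(sample)["Dog"])
```

On this sample the substring filter finds 2 dog entries while the synonym table finds 6. This is the same kind of gap as the 59 additional dog-like entries described above.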
Medication Curation Challenges
>10,000 medications listed; 2,703 distinct medications listed. These were mapped to 614 real medications (e.g. Amitriptyline) and split into two tables:
Continuing Medications (541 entities)
Stopped Medications (317 entities)
VISIT_NAME was used to represent distinct observations across the whole study.
Truly longitudinal measures were reified in the tree hierarchy in the data mapping file.
Recorded spellings of amitriptyline: Amitriptaline, Amitriptylin, Amitriptyline, Amitriptyline HCL, Amitroptyline, Amitryetyline, Amitrypatiline, Amitryptailine, Amitryptaline, Amitryptilin, Amitryptiline, Amitryptilline, Amitryptylene, Amitryptyline
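Variant lists like the one above are the kind of input a fuzzy string matcher can triage. The sketch below uses Python's standard `difflib.get_close_matches` to map each free-text entry to the closest name in a reference list. The reference list, the salt-suffix stripping, and the 0.8 cutoff are illustrative assumptions, not the method used to produce the 614-medication mapping.

```python
import difflib
from typing import Optional

# Hypothetical reference vocabulary of real medication names (lowercase).
REFERENCE_MEDICATIONS = ["amitriptyline", "baclofen", "gabapentin", "modafinil"]

# Salt-form suffixes stripped before matching (illustrative, not exhaustive).
SALT_SUFFIXES = (" hcl", " hydrochloride")


def normalize_medication(raw: str, reference=REFERENCE_MEDICATIONS,
                         cutoff: float = 0.8) -> Optional[str]:
    """Map a free-text medication entry to its closest reference name.

    Returns None when no reference name scores at or above `cutoff`;
    such entries are left for manual curator review rather than guessed.
    """
    name = raw.strip().lower()
    for suffix in SALT_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)].strip()
    matches = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for entry in ["Amitriptyline HCL", "Amitriptaline", "Amitryetyline",
              "Amitrypatiline", "aspirin"]:
    print(entry, "->", normalize_medication(entry))
```

With the default cutoff of 0.8, a distant variant such as “Amitrypatiline” (similarity about 0.74) comes back as None and goes to a curator. Lowering the cutoff catches more variants at the cost of more false matches.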
ACP Repository Tree
Date and Time Coding
All dates were converted to periods (months, years, or days) prior to the relevant blood draw date.
Dates were represented in the international standard ISO 8601 format, i.e. YYYY-MM-DD (e.g. 2001-12-15).
Dates arrived in multiple formats:
15/12/2001, 15/Dec/2001, Dec-2001, 2001, Dec./2001, 12/2001, and placeholders with missing components written as dashes (e.g. --/--/----).
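A minimal Python sketch of both steps follows. It assumes day-first numeric dates and English month abbreviations, as in the examples above. Each value keeps the precision that was actually recorded (ISO 8601 permits reduced forms such as YYYY-MM and YYYY), and the period is expressed in days, months, or years to match. The function names and this precision-to-unit rule are our assumptions, not the documented ACP procedure.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Input formats seen in the CRF, tried in order. Day-first order and English
# month abbreviations (strptime's %b in the default C locale) are assumed.
FORMATS = [
    ("%d/%m/%Y", "day"),    # 15/12/2001
    ("%d/%b/%Y", "day"),    # 15/Dec/2001
    ("%b-%Y", "month"),     # Dec-2001
    ("%b./%Y", "month"),    # Dec./2001
    ("%m/%Y", "month"),     # 12/2001
    ("%Y", "year"),         # 2001
]

# ISO 8601 output at the precision actually recorded.
ISO_OUTPUT = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def to_iso8601(raw: str) -> Optional[Tuple[str, str]]:
    """Return (ISO 8601 string, precision), or None for a masked/missing date."""
    text = raw.strip()
    if not text or "--" in text:
        return None  # e.g. "--/--/----": nothing usable was recorded
    for fmt, precision in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.strftime(ISO_OUTPUT[precision]), precision
    raise ValueError(f"unrecognized date format: {raw!r}")


def period_before_draw(iso: str, precision: str, draw: date) -> Tuple[int, str]:
    """Express a date as a period before the blood draw (assumed unit = precision)."""
    parts = [int(p) for p in iso.split("-")]
    if precision == "day":
        return (draw - date(*parts)).days, "days"
    if precision == "month":
        year, month = parts
        return (draw.year - year) * 12 + (draw.month - month), "months"
    return draw.year - parts[0], "years"


draw_date = date(2006, 3, 10)  # hypothetical blood draw date
for raw in ["15/12/2001", "Dec./2001", "2001", "--/--/----"]:
    result = to_iso8601(raw)
    if result is None:
        print(raw, "-> missing")
        continue
    iso, precision = result
    print(raw, "->", iso, period_before_draw(iso, precision, draw_date))
```

Keeping a month-precision date as a whole number of months avoids inventing a day that was never recorded.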
77 studies ongoing or completed; 36 studies have returned data to ACP.
Data types:
Low-dimensional (Low-D) biomarker data (antibodies, metabolites, serum markers of inflammation, etc.)
Low-D genotype data
High-dimensional (High-D) SNP/GWAS data
Gene-expression studies
Whole-genome sequencing (2 distinct studies)
Study types: etiology, diagnostics, disease activity biomarkers
Repository Usage
Research Data Curation Challenges
Few guidelines were provided to researchers for data formatting or treatment.
Often there is little or no documentation describing how the data were generated or handled (e.g. raw vs. normalized values, transformations applied).
Load study metadata (contact info, description, etc.) at the node level.
Sample Study Results
Biogen gene-expression study: designed to identify gene-expression profiles that discriminate progressive forms of MS from relapsing-remitting forms of the disease.
Future Directions
Rancho BioSciences is providing guidance to ACP on data-collection practices going forward (e.g. OPT-UP).
We loaded the clinical data and 6 sample study datasets into an Oracle-based tranSMART instance that we host in-house for QC purposes.
The full dataset is slated to be loaded into a PostgreSQL-based tranSMART 1.1 instance (hosted by Recombinant by Deloitte for Orion).
This and other data sources (Inst. for Neuroscience at B&W) will be analyzed and modeled by Orion.
Thanks for your time! Questions?