ACMI 2017 Winter Symposium 1
Mike Becich, MD PhD
Chair and University Distinguished Professor, Department of Biomedical Informatics
Associate Vice Chancellor for Informatics
Associate Director, U Pitt Cancer Inst, Clin Trans Sci Inst
University of Pittsburgh School of Medicine
NCI Board of Scientific Advisors
Towards a Data Commons
ACMI 2017 Winter Symposium
Duck Key, FL
Motivations
• Making Data Sharing Efficient (and Persistent)
• NIH Institutes/Centers (ICs) are funding “Commons”
– Precision Medicine and Data Science programs are drivers
• NLM’s Trans-NIH Biomedical Informatics Coordinating Committee (BMIC):
– Data Sharing Repositories - https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
– Common Data Elements (CDE) Resource Portal - https://www.nlm.nih.gov/cde/index.html
• NCI, NIAID, NICHD and NLM have been most proactive to date
NIH Initiatives
• NIH Data Commons Pilots – https://datascience.nih.gov/commons
– Model Organism Database
– BD2K Centers Pilots (e.g. Pitt/Harvard)
– Human Microbiome Project
– NCI Data Commons
• Genomic Data Commons (GDC) – U Chicago – TCGA data
• Cloud Pilots - ongoing
• NIAID – BD2K Center for Expanded Data Annotation and Retrieval (Musen/Stanford)
Big Data To Knowledge (BD2K) bioCADDIE and DataMed
• UCSD (Ohno-Machado) BD2K Data Discovery Index Project – bioCADDIE
• DataMed v1.5 available
• Aims to let users search for and discover data sets in a PubMed-like fashion
• Is this scalable to provide institutional infrastructure?
bioCADDIE and DataMed v1.5
https://datamed.org/
PCORI CDRN and CTSA ACT start to unlock Clinical Data from EHRs – Key Drivers
Further Fuel is Precision Medicine Initiative – Adding Biospecimens, Mobile Sensors
Key Question – How to Pull It All Together
• Operationally, data sharing is an NIH requirement
• Most institutions (maybe all) don’t really treat data as the valued asset it is – era of Data Science
• Most health science investigators are struggling with access to scalable storage, high performance computing and open source tool maintenance – the day of supercomputing is here
• Hence, institutions (and BMI) need to support a “real” plan for Research Data Management
• At Pitt, Data Commons = Research Data Management
Data Commons Infrastructure @ Pitt
Data Infrastructure Component | Awareness | Implementation & Deployment | Adoption | Comments
CRIS/Center for Research Computing | Yellow | Red | Red | Under discussion
DMPT Tool | Yellow | Yellow | Yellow | In progress
Box storage (small scale) sharing | Green | Green | Green | Not the type of “cloud computing” we need for research – simply storage and no HPC, software tools
Storage (large scale) | Red | Red | Red | Turn to PSC, SaM and commercial cloud provider(s) – need scale and flexibility
Data Catalogue | Red | Red | Red |
Metadata schema / ontologies | Red | Red | Red | No institutional data schema in place; disciplinary standards present in some areas
Analysis tools | Green (everyone knows this is needed) | Yellow | Red | Check licensing arrangements
Visualization tools | Yellow | Red | Red |
DOIs | Yellow | Red | Red |
Deposit | Red | Red | Red |
Repository/preservation | Red | Red | Red | Noted as a major gap
Tracking tools | Red | Red | Red |
Training | Yellow | Yellow | Yellow | 4 classes offered by HSLS
Advocacy/guides | Yellow | Yellow | Yellow | In development by ULS, HSLS, CSSD
Who’s at the table?
• School of Computing and Information (SCI):
– Department of Computer Science
– School of Information Science
• Dept of Information Culture & Data Stewardship (Liz Lyons - chair)
• Department of Biomedical Informatics
– CRIO for the Health Sciences – Recruiting Op TBN
• CIO & Computing Services and Systems Development
• Center for Research Computing – New Director TBN
• Pittsburgh Supercomputing Center
• Health Sciences & University (Pitt and CMU) Libraries
• Office of Research
Building Blocks – Pittsburgh Genome Research Repository (Rebecca Jacobson – ACMI)
Building Blocks – BD2K - Center for Causal Discovery (Greg Cooper - ACMI)
Building Blocks – Pittsburgh Health Data Alliance (Becich – ACMI)
• Two Centers created:
• Center for Machine Learning in Healthcare
• Led by Joe Marks in CMU School of Computer Science
• Center for Commercial Applications (CCA)
• Led by Mike Becich and Don Taylor
• $2M/yr in Early Stage
• $22M in follow on funding for successful projects
• Launch in July 2015
New National Funding Ops – NCI
• NCI – Cancer Immunology Data Commons (CIDC) – linked to Cancer Immunologic Monitoring and Analysis Centers (CIMAC)
• PDX Data Commons – Patient Derived Xenografts – linked to PDX Trial Research Network
• NCI Commons Credits for cloud HPC
NHLBI Data Commons
• TOPMed – Trans-Omics for Precision Medicine goals:
– Collect and assemble -omics (RNASeq, methylation, metabolomics, epigenomics, and proteomics) data with WGS and clinical outcomes data across diverse populations including those traditionally underrepresented in research.
– Build a data commons repository that the scientific community can use for future research and to enable precision medicine.
– Stimulate systems medicine approaches that help organize data to ensure they are accessible and interpretable for health disease research.
– Promote discoveries about the fundamental mechanisms that underlie heart, lung, blood, and sleep (HLBS) disorders.
NIA Data Commons Efforts
• Archiving and Sharing of Longitudinal Data Resources on Aging (U24)
– Foster data sharing and wider use of longitudinal data for research on aging in the behavioral and social sciences
– Share best practices in data and metadata documentation, and disseminate information about useful data sets to the research community
NICHD Data Commons Efforts
• Archiving and Documenting Child Health and Human Development Data Sets
– Support archiving and documenting existing data sets in order to enable secondary analysis of these data by the scientific community
– Types of data include survey data, administrative data, results of assays conducted on biospecimens, data from clinical trials, and patient registries.
– Also included are archiving activities for data that are to be added to existing data sets in order to enhance their potential scientific impact, such as geographic information systems (GIS), community-level, or registry data.
Common Goals of Commons
• “... efficient storage, manipulation, analysis, and sharing of research output, from all parts of the research lifecycle...” PE Bourne
– Funding opportunities are being launched across the NIH
– Time to fit your local, regional and national data sharing and analysis needs
– Need “jumpstart” funding in research computing infrastructure
– Sustainability possible if Offices of Research ensure “data sharing” infrastructure is budgeted on each grant
Conclusions
• Biomedical Informatics and the new home (NLM) of the Data Science Program are key
• Influencers should assist the new NLM Director in the four working groups of the NLM Strategic Planning Process
• Innovations in development of research objects, integrative metadata development, causal analytics and novel research computing environments (supercomputing/cloud computing/storage) will be key!
Please join in this effort by e-mailing me ([email protected]) with your interests, skills and personal goals – I will send you Pitt’s RoadMap