Post on 23-Jan-2018
Big Data as a Catalyst for
Collaboration & Innovation
Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science
National Institutes of Health
philip.bourne@nih.gov
SRP Annual Meeting
Puerto Rico, November 18, 2015
Thesis…
We are entering a period of disruption
in biomedical research and we should
all be thinking about what this means
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg
http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
Evidence of Disruption …
Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
Disruption: Example - Photography
Digitization
Deception
Disruption
Demonetization
Dematerialization
Democratization
Time
Volume, Velocity, Variety
Digital camera invented by
Kodak but shelved
Megapixels & quality improve slowly;
Kodak slow to react
Film market collapses;
Kodak goes bankrupt
Phones replace
cameras
Instagram,
Flickr become the
value proposition
Digital media becomes bona fide
form of communication
Disruption: Biomedical Research
Digitization of Basic &
Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
Disruptive Features: Sustainability
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Disruptive Features:
Reproducibility
Changing Value of Scholarship (?)
“And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama, January 30, 2015
Disruptive Features – New Science
Precision Medicine Initiative
National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
Participants will be centrally involved in design and
implementation of the cohort
They will be able to share genomic data, lifestyle
information, biological samples – all linked to their
electronic health records
Big Data in Biomedicine…
This speaks to something more
fundamental than more data …
It speaks to new methodologies, new
skills, new emphasis, new cultures,
new modes of discovery …
Open Educational Resources
Software Discovery and Sustainability
Community Based
Standards
Clinical Data Challenges
Database Sustainability
About the BD2K Program
• Initial FY 2014 funding of $32M.
• Proposed investment of $656M through 2020.
• Funded by ALL NIH Institutes and Centers.
Software and Tool
Development
Centers of Excellence for Big Data Computing
Data Discovery
Index Coordination Consortium
Scientist Training Awards
Data Science Courses
Data Sharing
BD2K FY14 Awards, supported by all NIH Institutes
Big Data: MRI images &
GWAS data from over 30,000
people.
Collaboration: Data came
from 190 worldwide sites
across 33 countries.
Methods: To homogenize
data from different sites, the
group designed standardized
protocols for image analysis,
quality assessment, genetic
imputation, and association.
Found five novel genetic
variants influencing volume
of brain regions.
Results provided insight into
the variability of brain
development, and may be
applied to mechanisms of
neuropsychiatric dysfunction.
(Nature, 2015)
Detect Predict Adapt
MD2K Applications – CHF and Smoking
Elements of The Digital Enterprise
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
A Culture of Sharing
1999: Research Tools Policy
2003: NIH Data Sharing Policy
2004: Model Organism Policy
2007: Genome-wide Association (GWAS) Policy
2008: NIH Public Access Policy (Publications)
2012: Big Data to Knowledge (BD2K) Initiative
2013: White House Initiative (“Holdren Memo”)
2014: Genomic Data Sharing (GDS) Policy; Modernization of NIH Clinical Trials
Policies – Now & Forthcoming
Data Sharing
– Goal: legitimize data as a form of scholarship
– Now: Genomic data sharing announced
– Coming: Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
– Data citation
http://www.nih.gov/news/health/aug2014/od-27.htm
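One way to picture a machine-readable plan that carries grant numbers is a small structured record that software can check. The sketch below is illustrative only: the class name, the field names, the grant number, and the repository name are all hypothetical and do not represent an NIH schema or policy requirement.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataSharingPlan:
    """Hypothetical machine-readable data sharing plan (not an NIH schema)."""
    grant_number: str
    repository: str
    datasets: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required text fields that are empty."""
        required = ("grant_number", "repository")
        return [name for name in required if not getattr(self, name).strip()]

    def to_json(self):
        """Serialize the plan so that a repository or funder could parse it."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values, invented for illustration.
plan = DataSharingPlan(
    grant_number="X00-EXAMPLE-0001",
    repository="example-repository",
    datasets=["dataset-a"],
)
print(plan.to_json())
print(plan.missing_fields())
```

A plan in this form could be checked automatically for completeness, which is the kind of enforcement a free-text plan makes difficult.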
The Commons is a shared virtual space which is
FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
An environment to find and catalyze the use of
shared digital research objects
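As a rough illustration of "find" and "access" for shared research objects, the sketch below keeps objects in an in-memory registry keyed by a unique identifier. Everything here is an assumption made for illustration (the `ResearchObject` and `Registry` names, the fields, the use of UUIDs as identifiers); it is not the Commons design or any NIH API.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchObject:
    """Hypothetical shared digital research object with a unique ID."""
    uid: str
    title: str
    kind: str      # e.g. "dataset" or "software"
    license: str   # reuse terms, stated explicitly


class Registry:
    """Toy in-memory registry; a stand-in for a shared virtual space."""

    def __init__(self):
        self._objects = {}

    def register(self, title, kind, license):
        """Assign a unique identifier and store the object."""
        obj = ResearchObject(str(uuid.uuid4()), title, kind, license)
        self._objects[obj.uid] = obj
        return obj

    def find(self, keyword):
        """Find: case-insensitive keyword search over titles."""
        kw = keyword.lower()
        return [o for o in self._objects.values() if kw in o.title.lower()]

    def access(self, uid):
        """Access: resolve a unique identifier to its object."""
        return self._objects[uid]


registry = Registry()
images = registry.register("Brain MRI images", "dataset", "CC-BY-4.0")
registry.register("Image analysis pipeline", "software", "MIT")
print([o.title for o in registry.find("brain")])
```

The point of the identifier is that a search result and a later retrieval refer to the same object, whichever platform holds it.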
The Commons Concept
The Developer or User Defines the
Environment from the Appropriate
Building Blocks
Infrastructure - The Commons:
Conceptual Framework
Research Objects (with UIDs)
Discoverability (Search & Find)
The Commons
Open APIs
Data and tools
Computing Platform(s)
Containers
Packaging software
BD2K Center (×6)
DDICC
Software
Standards
Infrastructure - The Commons
Labs (×4)
Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data
Commons
SRP & BD2K
Possible Interactions? (1/3)
Innovation
– Join IDEAS Labs
– Propose new EHS programs
Collaboration
– Participation in standards efforts
• EHS vocabulary
• CDE harmonization
• BD2K standards coordination center
Training
– BD2K training coordination center
SRP & BD2K
Possible Interactions? (2/3)
Open Science - FAIR
– Participate in the Open Science Competitions
• With EPA, challenge to produce a device to measure
atmospheric pollutants and biologic response (2013).
• Toxicogenomics challenge with NTP/NIEHS, UNC, and
Sage/DREAM to predict response to a given chemical
based upon individual genetic susceptibility (2014).
• With HHS, to generate a visualization tool to demonstrate
the impact of climate change on health risk (2016).
– CHEAR data science center.
SRP & BD2K
Possible Interactions? (3/3)
Leadership and coordination
– Trans-NIH, BD2K
– Interagency Big Data Senior Steering Group
– Sustainability Working Group
– Research Data Alliance Interest Group, Toxicogenomics
Databases Interoperability
ADDS Team
BD2K Representatives
NIH…Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/