Big Data as a Catalyst for Collaboration & Innovation

Post on 23-Jan-2018

Philip E. Bourne Ph.D., FACMI

Associate Director for Data Science

National Institutes of Health

philip.bourne@nih.gov

SRP Annual Meeting

Puerto Rico, November 18, 2015

Thesis…

We are entering a period of disruption in biomedical research and we should all be thinking about what this means

http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg

http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg

Evidence of Disruption …

Evidence:

– Google car

– 3D printers

– Waze

– Robotics

– Sensors

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Disruption: Example - Photography

Digitization

Deception

Disruption

Demonetization

Dematerialization

Democratization

Axes: Time; Volume, Velocity, Variety

Digital camera invented by Kodak but shelved

Megapixels & quality improve slowly; Kodak slow to react

Film market collapses; Kodak goes bankrupt

Phones replace cameras

Instagram, Flickr become the value proposition

Digital media becomes bona fide form of communication

Disruption: Biomedical Research

Digitization of Basic & Clinical Research & EHRs

Deception

We Are Here

Disruption

Demonetization

Dematerialization

Democratization

Open science

Patient centered health care

Disruptive Features: Sustainability

Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Disruptive Features: Reproducibility

Changing Value of Scholarship (?)

“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”

President Barack Obama, January 30, 2015

Disruptive Features – New Science

Precision Medicine Initiative

National Research Cohort

– >1 million U.S. volunteers

– Numerous existing cohorts (many funded by NIH)

– New volunteers

Participants will be centrally involved in design and implementation of the cohort

They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records

Big Data in Biomedicine…

This speaks to something more fundamental than more data …

It speaks to new methodologies, new skills, new emphasis, new cultures, new modes of discovery …

Open Educational Resources

Software Discovery and Sustainability

Community Based Standards

Clinical Data Challenges

Database Sustainability

About the BD2K Program

• Initial FY 2014 funding of $32M.

• Proposed investment of $656M through 2020.

• Funded by ALL NIH Institutes and Centers.

Software and Tool Development

Centers of Excellence for Big Data Computing

Data Discovery Index Coordination Consortium

Scientist Training Awards

Data Science Courses

Data Sharing

BD2K FY14 Awards supported by all NIH Institutes

Big Data: MRI images & GWAS data from over 30,000 people.

Collaboration: Data came from 190 worldwide sites across 33 countries.

Methods: To homogenize data from different sites, the group designed standardized protocols for image analysis, quality assessment, genetic imputation, and association.

Found five novel genetic variants influencing volume of brain regions.

Results provided insight into the variability of brain development, and may be applied to mechanisms of neuropsychiatric dysfunction.

(Nature, 2015)

Detect Predict Adapt

MD2K Applications – CHF and Smoking

Elements of The Digital Enterprise

Community

Policy

Infrastructure

• Sustainability

• Collaboration

• Training

A Culture of Sharing

1999: Research Tools Policy

2003: NIH Data Sharing Policy

2004: Model Organism Policy

2007: Genome-wide Association (GWAS) Policy

2008: NIH Public Access Policy (Publications)

2012: Big Data to Knowledge (BD2K) Initiative

White House Initiative (2013 “Holdren Memo”)

2014: Genomic Data Sharing (GDS) Policy; Modernization of NIH Clinical Trials

Policies – Now & Forthcoming

Data Sharing

– Goal: legitimize data as a form of scholarship

– Now: Genomic data sharing announced

– Coming: Data sharing plans on all research awards

– Data sharing plan enforcement

• Machine readable plan

• Repository requirements to include grant numbers

– Data citation

http://www.nih.gov/news/health/aug2014/od-27.htm
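To illustrate what a machine readable plan with grant numbers attached to repository deposits might look like, here is a minimal Python sketch. It is only an assumption-laden illustration: the `SharingPlan` and `Deposit` names, every field, the grant number, and the repository names are hypothetical and do not come from any NIH specification.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Deposit:
    repository: str   # hypothetical repository name
    dataset_id: str   # hypothetical identifier that a data citation could use
    grant_numbers: list = field(default_factory=list)


@dataclass
class SharingPlan:
    grant_number: str  # hypothetical award number
    deposits: list = field(default_factory=list)

    def missing_grant_links(self):
        """Return dataset IDs of deposits that do not list this plan's grant."""
        return [d.dataset_id for d in self.deposits
                if self.grant_number not in d.grant_numbers]

    def to_json(self):
        """Serialize the plan so that software, not only people, can read it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up example: one deposit cites the grant, the other does not.
plan = SharingPlan(
    grant_number="X00-EXAMPLE-0001",
    deposits=[
        Deposit("example-repo-a", "DS-1", ["X00-EXAMPLE-0001"]),
        Deposit("example-repo-b", "DS-2", []),
    ],
)
unlinked = plan.missing_grant_links()
print(unlinked)
```

A check like `missing_grant_links` is one way enforcement could be automated once plans and repository records share a common, parseable form.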

The Commons is a shared virtual space which is FAIR:

– Find

– Access (use effectively)

– Interoperate

– Reuse

An environment to find and catalyze the use of shared digital research objects

The Commons Concept

The Developer or User Defines the Environment from the Appropriate Building Blocks

Infrastructure - The Commons: Conceptual Framework

Research Objects (with UIDs)

Discoverability (Search & Find)

The Commons

Open APIs

Data and tools

Computing Platform(s)

Containers

Packaging software
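To make two of these building blocks concrete, research objects with UIDs and discoverability, here is a minimal Python sketch of an in-memory registry. It is an illustration under stated assumptions, not the Commons design: the `ResearchObject` and `Registry` names, the use of `uuid4` for UIDs, and the exact-keyword search are all hypothetical choices.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResearchObject:
    title: str
    kind: str                          # e.g. "dataset" or "software"
    keywords: frozenset = frozenset()  # lowercase search terms
    # Assumption: a random UUID stands in for whatever UID scheme is adopted.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


class Registry:
    """Hypothetical index that maps UIDs to research objects."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        """Store the object under its UID and return that UID."""
        self._objects[obj.uid] = obj
        return obj.uid

    def get(self, uid):
        """Access: resolve a UID to its object."""
        return self._objects[uid]

    def find(self, keyword):
        """Find: return every object tagged with the given keyword."""
        kw = keyword.lower()
        return [o for o in self._objects.values() if kw in o.keywords]


# Made-up example objects.
registry = Registry()
uid = registry.register(
    ResearchObject("Example MRI images", "dataset", frozenset({"mri", "imaging"}))
)
registry.register(
    ResearchObject("Example imputation tool", "software", frozenset({"gwas"}))
)
hits = registry.find("MRI")
print([o.title for o in hits])
```

Keeping data and software in the same registry reflects the slide's point that both are research objects; an open API would expose `get` and `find` over the network.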

Infrastructure - The Commons

BD2K Centers

DDICC

Software

Standards

Labs

Commons - Pilots

The Cloud Credits - business model

BD2K Centers

MODs (Model Organism Databases)

HMP Data and tools available in the cloud

NCI Cloud Pilots & Genomic Data Commons

SRP & BD2K

Possible Interactions? (1/3)

Innovation

– Join IDEAS Labs

– Propose new EHS programs

Collaboration

– Participation in standards efforts

• EHS vocabulary

• CDE harmonization

• BD2K standards coordination center

Training

– BD2K training coordination center

SRP & BD2K

Possible Interactions? (2/3)

Open Science - FAIR

– Participate in the Open Science Competitions

• With EPA challenge to produce device to measure atmospheric pollutants and biologic response (2013).

• Toxicogenomics challenge with NTP/NIEHS, UNC, and Sage/DREAM to predict response to a given chemical based upon individual genetic susceptibility (2014).

• With HHS to generate visualization tool to demonstrate impact of climate change on health risk (2016).

– CHEAR data science center.

SRP & BD2K

Possible Interactions? (3/3)

Leadership and coordination

– Trans-NIH, BD2K

– Interagency Big Data Senior Steering Group

– Sustainability Working Group

– Research Data Alliance Interest Group, Toxicogenomics Databases Interoperability

ADDS Team

BD2K Representatives

NIH…Turning Discovery Into Health

philip.bourne@nih.gov

https://datascience.nih.gov/

http://www.ncbi.nlm.nih.gov/research/staff/bourne/