Big Data as a Catalyst for Collaboration & Innovation

Big Data as a Catalyst for Collaboration & Innovation Philip E. Bourne Ph.D., FACMI Associate Director for Data Science National Institutes of Health [email protected] SRP Annual Meeting Puerto Rico, November 18, 2015

Transcript of Big Data as a Catalyst for Collaboration & Innovation

Page 1: Big Data as a Catalyst for Collaboration & Innovation

Big Data as a Catalyst for

Collaboration & Innovation

Philip E. Bourne Ph.D., FACMI

Associate Director for Data Science

National Institutes of Health

[email protected]

SRP Annual Meeting

Puerto Rico, November 18, 2015

Page 2: Big Data as a Catalyst for Collaboration & Innovation

Thesis…

We are entering a period of disruption in biomedical research, and we should all be thinking about what this means.

http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg

http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg

Page 3: Big Data as a Catalyst for Collaboration & Innovation

Evidence of Disruption …

Evidence:

– Google car

– 3D printers

– Waze

– Robotics

– Sensors

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 4: Big Data as a Catalyst for Collaboration & Innovation

Disruption: Example - Photography

Digitization

Deception

Disruption

Demonetization

Dematerialization

Democratization

[Chart axes: Time; Volume, Velocity, Variety]

Digital camera invented by Kodak but shelved

Megapixels & quality improve slowly; Kodak slow to react

Film market collapses; Kodak goes bankrupt

Phones replace cameras

Instagram, Flickr become the value proposition

Digital media becomes bona fide form of communication

Page 5: Big Data as a Catalyst for Collaboration & Innovation

Disruption: Biomedical Research

Digitization of Basic & Clinical Research & EHRs

Deception

We Are Here

Disruption

Demonetization

Dematerialization

Democratization

Open science

Patient centered health care

Page 6: Big Data as a Catalyst for Collaboration & Innovation

Disruptive Features: Sustainability

Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Page 7: Big Data as a Catalyst for Collaboration & Innovation

Disruptive Features: Reproducibility

Changing Value of Scholarship (?)

Page 8: Big Data as a Catalyst for Collaboration & Innovation

“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”

President Barack Obama, January 30, 2015

Disruptive Features – New Science

Page 9: Big Data as a Catalyst for Collaboration & Innovation

Precision Medicine Initiative

National Research Cohort

– >1 million U.S. volunteers

– Numerous existing cohorts (many funded by NIH)

– New volunteers

Participants will be centrally involved in design and implementation of the cohort.

They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records.

Page 10: Big Data as a Catalyst for Collaboration & Innovation

Big Data in Biomedicine…

This speaks to something more fundamental than more data …

It speaks to new methodologies, new skills, new emphasis, new cultures, new modes of discovery …

Page 11: Big Data as a Catalyst for Collaboration & Innovation

Open Educational Resources

Software Discovery and Sustainability

Community-Based Standards

Clinical Data Challenges

Database Sustainability

About the BD2K Program

• Initial FY 2014 funding of $32M.

• Proposed investment of $656M through 2020.

• Funded by ALL NIH Institutes and Centers.

Software and Tool Development

Centers of Excellence for Big Data Computing

Data Discovery Index Coordination Consortium

Scientist Training Awards

Data Science Courses

Data Sharing

Page 12: Big Data as a Catalyst for Collaboration & Innovation

BD2K FY14 Awards supported by all NIH Institutes

Page 13: Big Data as a Catalyst for Collaboration & Innovation

Big Data: MRI images & GWAS data from over 30,000 people.

Collaboration: Data came from 190 worldwide sites across 33 countries.

Methods: To homogenize data from different sites, the group designed standardized protocols for image analysis, quality assessment, genetic imputation, and association.

Found five novel genetic variants influencing volume of brain regions.

Results provided insight into the variability of brain development, and may be applied to mechanisms of neuropsychiatric dysfunction.

(Nature, 2015)

Page 14: Big Data as a Catalyst for Collaboration & Innovation

Detect Predict Adapt

MD2K Applications – CHF and Smoking

Page 15: Big Data as a Catalyst for Collaboration & Innovation

Elements of The Digital Enterprise

Community

Policy

Infrastructure

• Sustainability

• Collaboration

• Training

Page 17: Big Data as a Catalyst for Collaboration & Innovation

A Culture of Sharing

1999: Research Tools Policy

2003: NIH Data Sharing Policy

2004: Model Organism Policy

2007: Genome-wide Association (GWAS) Policy

2008: NIH Public Access Policy (Publications)

2012: Big Data to Knowledge (BD2K) Initiative

2013: White House Initiative (“Holdren Memo”)

2014: Genomic Data Sharing (GDS) Policy

2014: Modernization of NIH Clinical Trials

Page 18: Big Data as a Catalyst for Collaboration & Innovation

Policies – Now & Forthcoming

Data Sharing

– Goal: legitimize data as a form of scholarship

– Now: Genomic data sharing announced

– Coming: Data sharing plans on all research awards

– Data sharing plan enforcement

• Machine readable plan

• Repository requirements to include grant numbers

– Data citation

http://www.nih.gov/news/health/aug2014/od-27.htm

Page 19: Big Data as a Catalyst for Collaboration & Innovation

Elements of The Digital Enterprise

Community

Policy

Infrastructure

• Sustainability

• Collaboration

• Training

Page 21: Big Data as a Catalyst for Collaboration & Innovation

The Commons Concept

The Commons is a shared virtual space which is FAIR:

– Find

– Access (use effectively)

– Interoperate

– Reuse

An environment to find and catalyze the use of shared digital research objects

Page 22: Big Data as a Catalyst for Collaboration & Innovation

The Developer or User Defines the Environment from the Appropriate Building Blocks

Page 23: Big Data as a Catalyst for Collaboration & Innovation

Infrastructure - The Commons: Conceptual Framework

Research Objects (with UIDs)

Discoverability (Search & Find)

The Commons

Open APIs

Data and tools

Computing Platform(s)

Containers

Packaging software
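
The top two layers of this framework (research objects with UIDs, and discoverability) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `ResearchObject` and `CommonsIndex`, the use of random UUIDs as identifiers, and the keyword-based search are hypothetical and do not describe the actual Commons implementation or any NIH API.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class ResearchObject:
    """Hypothetical shared digital research object (dataset, tool, ...)."""
    title: str
    kind: str  # e.g. "dataset" or "software" (illustrative values)
    keywords: frozenset = field(default_factory=frozenset)
    # Assumption: a random UUID stands in for whatever UID scheme is used.
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)


class CommonsIndex:
    """Hypothetical in-memory index: register objects, then find them."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        # Store the object under its UID and return the UID to the caller.
        self._objects[obj.uid] = obj
        return obj.uid

    def get(self, uid):
        # Access: resolve a UID back to the registered object.
        return self._objects[uid]

    def search(self, keyword):
        # Find: case-insensitive match against each object's keywords.
        kw = keyword.lower()
        return [
            obj for obj in self._objects.values()
            if kw in {k.lower() for k in obj.keywords}
        ]


if __name__ == "__main__":
    index = CommonsIndex()
    index.register(ResearchObject("Brain MRI volumes", "dataset",
                                  frozenset({"MRI", "GWAS"})))
    index.register(ResearchObject("Image QC pipeline", "software",
                                  frozenset({"mri"})))
    for hit in index.search("mri"):
        print(hit.uid, hit.kind, hit.title)
```

The sketch covers only identification and lookup; open APIs, computing platforms, and containers from the lower layers are out of its scope.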

Page 24: Big Data as a Catalyst for Collaboration & Innovation

Infrastructure - The Commons

[Diagram labels: BD2K Center (×6), DDICC, Software, Standards, Labs (×4)]

Page 25: Big Data as a Catalyst for Collaboration & Innovation

Commons - Pilots

The Cloud Credits - business model

BD2K Centers

MODs (Model Organism Databases)

HMP Data and tools available in the cloud

NCI Cloud Pilots & Genomic Data Commons

Page 26: Big Data as a Catalyst for Collaboration & Innovation

Elements of The Digital Enterprise

Community

Policy

Infrastructure

• Sustainability

• Collaboration

• Training

Page 27: Big Data as a Catalyst for Collaboration & Innovation

SRP & BD2K

Possible Interactions? (1/3)

Innovation

– Join IDEAS Labs

– Propose new EHS programs

Collaboration

– Participation in standards efforts

• EHS vocabulary

• CDE harmonization

• BD2K standards coordination center

Training

– BD2K training coordination center

Page 28: Big Data as a Catalyst for Collaboration & Innovation

SRP & BD2K

Possible Interactions? (2/3)

Open Science - FAIR

– Participate in the Open Science Competitions

• With EPA: challenge to produce a device to measure atmospheric pollutants and biologic response (2013).

• Toxicogenomics challenge with NTP/NIEHS, UNC, and Sage/DREAM to predict response to a given chemical based upon individual genetic susceptibility (2014).

• With HHS: generate a visualization tool to demonstrate the impact of climate change on health risk (2016).

– CHEAR data science center.

Page 29: Big Data as a Catalyst for Collaboration & Innovation

SRP & BD2K

Possible Interactions? (3/3)

Leadership and coordination

– Trans-NIH, BD2K

– Interagency Big Data Senior Steering Group

– Sustainability Working Group

– Research Data Alliance Interest Group, Toxicogenomics Databases Interoperability

Page 30: Big Data as a Catalyst for Collaboration & Innovation

ADDS Team

BD2K Representatives

Page 31: Big Data as a Catalyst for Collaboration & Innovation

NIH…Turning Discovery Into Health

[email protected]

https://datascience.nih.gov/

http://www.ncbi.nlm.nih.gov/research/staff/bourne/