Open Data in a Global Ecosystem

Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science, National Institutes of Health
BioMedBridges, EBI, November 17, 2015
http://www.slideshare.net/pebourne

Transcript of Open Data in a Global Ecosystem

Page 1: Open Data in a Global Ecosystem

Open Data in a Global Ecosystem

Philip E. Bourne Ph.D., FACMI

Associate Director for Data Science

National Institutes of Health

BioMedBridges, EBI, November 17, 2015

http://www.slideshare.net/pebourne

Page 2: Open Data in a Global Ecosystem

Not a talking head … An ongoing conversation

Page 3: Open Data in a Global Ecosystem

Some context to start that conversation …

Page 4: Open Data in a Global Ecosystem

Perspective

Structural bioinformatics researcher

Former custodian of the RCSB PDB

Obsessive about open science e.g., PLOS

NIH-wide responsibility for developments in data science

Page 5: Open Data in a Global Ecosystem

Consider this change from my own career experience ….

Page 6: Open Data in a Global Ecosystem

The History of Computational Biomedicine According to Bourne

1980s, 1990s, 2000s, 2010s, 2020

Discipline: Unknown; Expt. Driven; Emergent; Over-sold; A Service; A Partner; A Driver

The Raw Material: Non-existent; Limited/Poor; More/Ontologies; Big Data/Siloed; Open/Integrated

The People: No name; Technicians; Industry recognition; Data scientists; Academics

Searls (ed.), The Roots of Bioinformatics series, PLOS Comp Biol

Page 7: Open Data in a Global Ecosystem

It Follows …

We are entering a period of disruption in biomedical research and we should all be thinking about what this means to bioinformatics & biomedicine

http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg

Page 8: Open Data in a Global Ecosystem

Big Data in Biomedicine…

This speaks to something more fundamental than more data …

It speaks to new methodologies, new skills, new emphasis, new cultures, new modes of discovery …

Page 9: Open Data in a Global Ecosystem

We are at a Point of Deception …

Evidence:

– Google car

– 3D printers

– Waze

– Robotics

– Sensors

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 10: Open Data in a Global Ecosystem

Disruption: Example - Photography

Stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization

Axes: Time; Volume, Velocity, Variety

– Digital camera invented by Kodak but shelved

– Megapixels & quality improve slowly; Kodak slow to react

– Film market collapses; Kodak goes bankrupt

– Phones replace cameras

– Instagram, Flickr become the value proposition

– Digital media becomes bona fide form of communication

Page 11: Open Data in a Global Ecosystem

Disruption: Biomedical Research

Digitization of Basic & Clinical Research & EHRs

Deception

We Are Here

Disruption

Demonetization

Dematerialization

Democratization

Open science

Patient centered health care

Page 12: Open Data in a Global Ecosystem

Disruptive Features: Sustainability

Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Page 13: Open Data in a Global Ecosystem

Disruptive Features: Reproducibility

Changing Value of Scholarship (?)

Page 14: Open Data in a Global Ecosystem

“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”

President Barack Obama, January 30, 2015

Disruptive Features – New Science

Page 15: Open Data in a Global Ecosystem

Precision Medicine Initiative

National Research Cohort

– >1 million U.S. volunteers

– Numerous existing cohorts (many funded by NIH)

– New volunteers

Participants will be centrally involved in design and implementation of the cohort

They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records

Page 16: Open Data in a Global Ecosystem

What Are Some General Implications of Such a Future?

Open collaborative science becomes increasingly important nationally and internationally

Data and associated analytics become increasingly valuable to scholarship

Opportunities exist to improve the efficiency of the research enterprise and hence fund more research

Global cooperation between funders will be needed to sustain the emergent digital enterprise

Current training content and modalities will not match supply to demand

Balancing accessibility vs security becomes more important yet more complex

Page 18: Open Data in a Global Ecosystem

How Should We Respond as Funders?

Community:

– Encourage wherever possible a global cultural shift towards open science

– Encourage global exchanges

– Encourage global projects

Policies:

– Understand and map data sharing policies, standards etc.

– Understand ethical, legal and societal differences

Infrastructure:

– Share the burden and the reward

Page 19: Open Data in a Global Ecosystem

How Should We Respond as Funders?

Community:

– Encourage wherever possible a global cultural shift towards open science

– Encourage global exchanges

Policies:

– Understand and map data sharing policies, standards etc.

– Understand ethical, legal and societal differences

Infrastructure:

– Share the burden and the reward

Page 20: Open Data in a Global Ecosystem

https://www.openscienceprize.org/

Page 21: Open Data in a Global Ecosystem

A Culture of Sharing

1999, 2003, 2004, 2007, 2008, 2012, 2014

Research Tools Policy

NIH Data Sharing Policy

Model Organism Policy

Genome-wide Association (GWAS) Policy

NIH Public Access Policy (Publications)

Big Data to Knowledge (BD2K) Initiative

Genomic Data Sharing (GDS) Policy

Modernization of NIH Clinical Trials

White House Initiative (2013 “Holdren Memo”)

Page 22: Open Data in a Global Ecosystem

The BD2K Program

BD2K Budget

Page 23: Open Data in a Global Ecosystem

BD2K FY14 Awards supported by all NIH Institutes

Page 24: Open Data in a Global Ecosystem

MD2K Applications – CHF and Smoking

Page 26: Open Data in a Global Ecosystem

The Commons is a shared virtual space which is FAIR:

– Find

– Access (use effectively)

– Interoperate

– Reuse

An environment to find and catalyze the use of shared digital research objects

The Commons Concept

Page 27: Open Data in a Global Ecosystem

The Developer or User Defines the Environment from the Appropriate Building Blocks

Page 28: Open Data in a Global Ecosystem

The Commons Components

Page 29: Open Data in a Global Ecosystem

[Diagram labels: BD2K Center (×6); DDICC; Software; Standards; Infrastructure - The Commons; Labs (×4)]

Page 30: Open Data in a Global Ecosystem

Public Beacons

Host: Content

AMPLab: 1000 Genomes Project

Broad Institute: ExAC

Curoverse: PGP, GA4GH Example Data

EBI: 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel

Google: 1000 Genomes Project, Phase III, Illumina Platinum Genomes

ISB: Known VARiants

NCBI: NHLBI Exome Sequence Project

OICR: 55 cancer datasets

SolveBio: 56 public datasets

UCSC: ClinVar, LOVD, UniProt

University of Leicester: Cafe CardioKit, Cafe Variome Central

WTSI: IBD, Native American, Egyptian, UK10K

Over 120 public datasets beaconized across 21 institutions

Tens of thousands of individuals

Page 31: Open Data in a Global Ecosystem
Page 32: Open Data in a Global Ecosystem

Commons - Pilots

The Cloud Credits - business model

BD2K Centers

MODs (Model Organism Databases)

HMP Data and tools available in the cloud

NCI Cloud Pilots & Genomic Data Commons

Page 33: Open Data in a Global Ecosystem

I not only use all the brains I have, but all I can borrow.

– Woodrow Wilson

Page 34: Open Data in a Global Ecosystem

What Can We Do Now?

Extend the research pilots concept

Have TCC & TeSS work together

Global hackathons, competitions

Closer ties between NLM and EBI / Elixir

Student exchanges

Engage foundations, charities in more global initiatives

http://wwwdev.ebi.ac.uk/Tools/ddi/

Page 35: Open Data in a Global Ecosystem

ADDS Team

BD2K Representatives

Page 36: Open Data in a Global Ecosystem

NIH … Turning Discovery Into Health

datascience.nih.gov/

http://www.ncbi.nlm.nih.gov/research/staff/bourne/