The NCI Cancer Research Data Commons NCI Cancer... · National Cancer Data Ecosystem NCI Cancer...

The NCI Cancer Research Data Commons Allen Dearry, Ph.D. Program Director Center for Biomedical Informatics and Information Technology CI4CC 10.25.2017


The NCI Cancer Research Data Commons

Allen Dearry, Ph.D., Program Director

Center for Biomedical Informatics and Information Technology

CI4CC 10.25.2017


Agenda

1. Background
2. Overview – NCI Cancer Research Data Commons
3. Commons Framework
4. Discussion


Background


Precision Medicine Initiative (PMI)

[Figure] 2006-2015: A Decade of Illuminating the Underlying Causes of Primary Untreated Tumors. Omics Characterization (10,000+ patient tumors and increasing). Courtesy of P. Kuhn (USC)

Cancer research and care generate detailed data that are critical to create a learning health system for cancer.

Key tenet of the PMI: secure, responsible access to high-quality data.

The PMI was announced during the State of the Union Address, 2015.

Precision Medicine is a grand challenge, requiring:

• Deep biological understanding
• Advances in scientific methods, instrumentation, and technology
• Advances in data management and computation
• Ability to apply those advances to drive research and treatment
• Ability to securely share data across domains, institutions, and stakeholders


Basic Ingredients for PMI Big Data

• Open Science. Supporting Open Access, Open Data, Open Source Software, and Data Liquidity for the cancer community

• Standardization through terminology, CDEs, and CRFs

•  Interoperability by exposing existing knowledge through appropriate integration of ontologies, vocabularies, taxonomies, and data standards

• Sustainable models for informatics infrastructure, services, data, metadata, curation


NIH Genomic Data Sharing Policy

https://gds.nih.gov/

Went into effect January 25, 2015

NCI guidance:

http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data

Guiding Principle:

The greatest public benefit will be realized if large-scale genomic data are made available in a timely manner to the largest possible number of investigators. For human data, data are made available under terms and conditions consistent with the informed consent provided by individual participants.


The Beau Biden Cancer Moonshot℠

Overarching goals – Jan, 2016

• Accelerate progress in cancer, including prevention & screening
  • From cutting edge basic research to wider uptake of standard of care
• Encourage greater cooperation and collaboration
  • Within and between academia, government, and private sector
• Enhance data sharing

Blue Ribbon Panel – October, 2016

• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 3D Human Tumor Atlas
• Development of New Enabling Cancer Technologies
• Full report: www.cancer.gov/brp


National Cancer Data Ecosystem Recommendation

Overall goal: “Enable all participants across the cancer research and care continuum to contribute, access, combine and analyze diverse data that will enable new discoveries and lead to lowering the burden of cancer.”

• Envisioned to consist of multiple components
  • Fundamental infrastructure to connect the components and ensure interoperability
    • Common APIs
    • Data schemas
    • Common data dictionaries
    • Enhanced cloud computing platforms
  • Components such as repositories, analytics services, and interactive portals
• The ability to link diverse data types and data sources is fundamental to interoperability of the Cancer Data Ecosystem.


Changing the Conversation around Data Sharing

• How do we find data, software, standards?
• How can we make data, annotations, software, metadata accessible?
• How do we adopt/adapt or create data standards?
• How do we make more data machine readable?

National Cancer Data Ecosystem

NCI Cancer Research Data Commons

NIH Data Commons Pilot

Data Commons co-locate data, storage and computing infrastructure, and frequently used tools for analyzing and sharing data to create an interoperable resource for the research community.


Why Develop a Cancer Research Data Commons?

• A key component of a learning National Cancer Data Ecosystem
• Making research data available for discovery, validation, new therapies
• Maximizing the impact, reuse, and reproducibility of cancer research
• Facilitating innovation of methods and tools for research
• Promoting research collaborations
• Changing incentives for data sharing

Reduce the risk, improve early detection, outcomes, and survivorship in cancer


NCI Cancer Research Data Commons - Vision


The NCI Genomic Data Commons

• Unify fragmentary repositories
• Support the receipt, quality control, integration, storage, and redistribution of standardized genomic data sets derived from cancer research studies
• Harmonization of raw sequence both from existing and new cancer research programs
• Application of state-of-the-art methods of generating derived genomic data
• Provide the foundation for:
  • Identification of high- and low-frequency cancer drivers
  • Defining genomic determinants of response to therapy
  • Clinical trial cohorts sharing targeted genetic lesions


Three NCI Genomics Cloud Pilots

Broad Institute

• PI: Gad Getz, Anthony Philippakis
• Google Cloud
• Firehose in the cloud including Broad best practices workflows
• http://firecloud.org

Institute for Systems Biology

• PI: Ilya Shmulevich
• Google Cloud
• Leverage Google infrastructure; Novel query and visualization
• http://cgc.systemsbiology.net/

Seven Bridges Genomics

• PI: Brandi Davis-Dusenbery
• Amazon Web Services
• Interactive data exploration; >30 public pipelines
• http://www.cancergenomicscloud.org

[Timeline] Phases: Design/Build I, Design/Build II, Evaluation, Extension, Cloud Resources. Dates marked: Sept 2014, April 2015, Jan 2016, Sept 2016, October 2017.


Original Goals of the Pilots Remain Relevant

Democratize access to NCI-generated genomic and related data, and to create a cost-effective way to provide scalable computational capacity to the cancer research community.

Provide:

• Access to large genomic data sets without need to download
• Access to popular pipelines and visualization tools
• Ability for researchers to bring their own tools and pipelines to the data
• Ability for researchers to bring their own data and analyze in combination with existing genomic data
• Workspaces, for researchers to save and share their data and results of analyses


GDC / Cloud Resources: Today

[Diagram labels]

Researchers

Web Interface (shown twice); APIs (shown twice)

Genomic Data Commons (GDC): Harmonization, Visualization, & Download

Data Submission & Harmonization

NCI Cloud Resources: Visualization, Compute, Pipelines, Workspaces

SBG CGC, Broad FireCloud, ISB CGC

Authentication & Authorization through eRA Commons & dbGaP


Data Commons Framework

What is it?

• Reusable, expandable framework for the Data Commons
• Defines the core principles and structure of a Data Commons
• Provides reusable components that can be leveraged across the Data Commons

Components

• Secure user authentication and authorization
• Metadata validation and tools
• Domain-specific, extensible data models
• API and container environment for tools and pipelines
• Access to computational workspaces for storing data, tools, and results
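As a rough illustration of the metadata validation and extensible data model components listed above, the following Python sketch checks a record against a small data dictionary and then extends that dictionary for another domain. The field names, allowed values, and the FieldSpec, validate, and extend helpers are all hypothetical; they are not taken from the GDC or any NCI data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One entry in a hypothetical data dictionary."""
    required: bool = False
    allowed: frozenset = frozenset()  # empty set means any value is accepted


# Hypothetical core dictionary for a "case" record (illustrative only).
CASE_DICTIONARY = {
    "case_id": FieldSpec(required=True),
    "primary_site": FieldSpec(required=True),
    "gender": FieldSpec(allowed=frozenset({"female", "male", "unknown"})),
}


def validate(record, dictionary):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    # First pass: every required field must be present and non-empty.
    for name, spec in dictionary.items():
        if spec.required and not record.get(name):
            problems.append(f"missing required field: {name}")
    # Second pass: every supplied field must be known and use an allowed value.
    for name, value in record.items():
        spec = dictionary.get(name)
        if spec is None:
            problems.append(f"unknown field: {name}")
        elif spec.allowed and value not in spec.allowed:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


def extend(dictionary, extra):
    """Return a new dictionary with domain-specific fields added."""
    merged = dict(dictionary)
    merged.update(extra)
    return merged


if __name__ == "__main__":
    print(validate({"case_id": "C1", "gender": "F"}, CASE_DICTIONARY))
```

Keeping the dictionary as plain data is one way a domain such as imaging could add its own fields through extend without changing the shared validation code.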


Data Commons Framework – Why?

• Leverage work already completed by GDC and Cloud Pilots/Resources.
• Develop infrastructure and foundation for the Data Commons and nodes as they are created.
• Ensure consistency and interoperability from the start, maximize future data sharing.
• Design modular, interoperable components (data access services, indexing and search, workspaces, workflow and tool stores, portals and UIs) that can be flexible and assembled into diverse data environments.
• Optimize ability to integrate new data types.
• Interrelate with other Commons developments: NIH, CZI...


NCI Cancer Research Data Commons (NCRDC)

Genomic Data Commons Node: GDC

Imaging Data Commons Node: IDC

Proteomic Data Commons Node: PDC

APIs

Data Commons Framework – Modular, Flexible Core Services

• Authentication and Authorization
• Metadata Validation Tools
• Data Models
• User Workspaces
• Container Environment


GDC / Cloud Resources: Near Term - Moving Towards a Commons Framework

[Diagram labels]

Researchers

Web Interface (shown twice); APIs (shown twice)

GDC

Data Submission & Harmonization

GDC@GCP, GDC@AWS, GDC@Azure

SBG CGC, Broad FireCloud, ISB CGC

DockStore; Analysis resources

Centrally-managed copies of the data, mirrored in the commercial clouds

Centrally-managed authentication and authorization through eRA Commons and dbGaP

Cloud Resources continue to provide data access, analytic tools, workspace
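One way to picture centrally managed authorization in front of mirrored copies is sketched below in Python. It is only an illustration under stated assumptions: the User and DataFile types, the "open"/"controlled" access labels, the study identifiers, and the can_read and readable_locations helpers are hypothetical, and nothing here calls eRA Commons or dbGaP. The mirror names are taken from the diagram labels and used as plain strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Mirror names as labeled on the slide; treated here as opaque strings.
MIRRORS = ("GDC@GCP", "GDC@AWS", "GDC@Azure")


@dataclass(frozen=True)
class User:
    """An authenticated user plus the studies they are approved for (hypothetical)."""
    login: str
    approved_studies: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class DataFile:
    """A file with a hypothetical access label: "open" or "controlled"."""
    file_id: str
    access: str
    study: str


def can_read(user: Optional[User], data_file: DataFile) -> bool:
    """Single, central decision: open data for everyone, controlled data by approval."""
    if data_file.access == "open":
        return True
    if user is None:  # anonymous callers never see controlled data
        return False
    return data_file.study in user.approved_studies


def readable_locations(user: Optional[User], data_file: DataFile) -> List[str]:
    """Apply the same decision to every mirror, so all clouds agree."""
    if not can_read(user, data_file):
        return []
    return [f"{mirror}/{data_file.file_id}" for mirror in MIRRORS]


if __name__ == "__main__":
    alice = User("alice", frozenset({"study-A"}))
    controlled = DataFile("f2", "controlled", "study-A")
    print(readable_locations(alice, controlled))
```

Because the decision is made once and then applied to each mirror, a user sees the same answer regardless of which commercial cloud holds the copy.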


The NCI Cancer Research Data Commons: A virtual, expandable infrastructure

[Diagram labels]

Data Contributors

Genomic Data, Proteomic Data, Imaging Data

• Standardized data submission and Q/C
• Controlled vocabularies
• Harmonization by subject matter experts

NCI Cancer Research Data Commons

GDC, Clinical, Functional, Cancer Models, Imaging, Population, Proteomics

Authentication & Authorization

• Secure data access through API or web UI
• Query across data domains
• Analytics, elastic compute, visualization

Biologists / Clinical Researchers

Clinicians and Patients

Tool / Algorithm Developers

Computational Scientists


[Diagram labels]

Community Presentation

Analytics

Multi-modal data aggregation

Cancer Data Aggregator: Aggregate by case, sample, study, disease, tissue, etc.

APIs

Data Commons Repositories / Nodes: Genomics, Imaging, Proteomics, Clinical
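The aggregation idea in the diagram, grouping records from several nodes by a shared key such as case or disease, can be sketched in a few lines of Python. This is a minimal illustration, not the Cancer Data Aggregator itself: the NODE_RECORDS sample data, the record field names, and the aggregate function are hypothetical, and a real aggregator would query each node through its API rather than read an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical per-node responses (illustrative identifiers and file names).
NODE_RECORDS = {
    "Genomics": [
        {"case": "C1", "disease": "Lung", "file": "c1.bam"},
        {"case": "C2", "disease": "Breast", "file": "c2.bam"},
    ],
    "Imaging": [{"case": "C1", "disease": "Lung", "file": "c1.dcm"}],
    "Proteomics": [{"case": "C2", "disease": "Breast", "file": "c2.raw"}],
}


def aggregate(node_records, key):
    """Group files from every node under the shared value of `key`.

    Returns {key_value: {node_name: [file, ...]}}. Records that lack
    `key` are skipped rather than guessed at.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for node, records in node_records.items():
        for record in records:
            if key in record:
                grouped[record[key]][node].append(record["file"])
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {value: dict(by_node) for value, by_node in grouped.items()}


if __name__ == "__main__":
    for case, by_node in aggregate(NODE_RECORDS, "case").items():
        print(case, by_node)
```

Passing "disease" instead of "case" regroups the same records without touching the nodes, which is the sense in which one aggregation layer can serve several views.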


Governance and Outreach

• Governance process to be established, including Scientific and Technical Review Board and Steering Committee
  • Structured process for decisions, interactions, roles
• Outreach and collaboration
  • Working with NIH and other ICs on related initiatives / Data Commons, as well as external groups such as Chan Zuckerberg
  • Participating on NIH and interagency working groups and on PMI- and Moonshot-related projects
  • Plans for workshops and RFIs to get community input, feedback, and participation


Acknowledgements

Cloud Resources Team Leads

• Gad Getz, Ph.D. - Broad Institute
• Ilya Shmulevich, Ph.D. - ISB
• Brandi Davis-Dusenbery, Ph.D. - Seven Bridges

NCI CBIIT Team

• Durga Addepalli, Ph.D.
• Allen Dearry, Ph.D.
• Juli Klemm, Ph.D.
• Tanja Davidsen, Ph.D.
• Izumi Hinkson, Ph.D.
• Betsy Hsu, Ph.D.
• Stephen Jee, Ph.D.
• John Otridge, Ph.D.
• Sima Pandya
• Eve Shalley
• Steve Tsang, Ph.D.

Framework Team

• Robert Grossman, Ph.D. - University of Chicago
• Phillis Tang
• Christina Yung

NCI Center for Cancer Genomics

• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.

NCI Office of Cancer Clinical Proteomics Research

• Henry Rodriguez, Ph.D.
• Chris Kinsinger, Ph.D.

NCI Cancer Imaging Program

• Paula Jacobs
• John Freymann
• Justin Kirby

NCI Leadership

• Doug Lowy, M.D.
• Warren Kibbe, Ph.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.


www.cancer.gov www.cancer.gov/espanol