Friend Oslo 2012-09-09

Lessons Learned: Realities of Building Cancer Models: Sharing, Rewards and Affordability. Stephen Friend MD PhD


Stephen Friend, Sept 9, 2012. Oslo University Hospital, Oslo, Norway

Transcript of Friend Oslo 2012-09-09

Page 1: Friend Oslo 2012-09-09

       

Lessons Learned: Realities of Building Cancer Models

Sharing, Rewards and Affordability

Stephen Friend MD PhD

Page 2: Friend Oslo 2012-09-09
Page 3: Friend Oslo 2012-09-09
Page 4: Friend Oslo 2012-09-09
Page 5: Friend Oslo 2012-09-09

[Pathway diagram. Nodes: EGFR, ERBB2, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2. Inhibitor: EGFRi. Output: Proliferation, Survival]

• EGFR Pathway commonly mutated/activated in Cancer
  - 30% of all epithelial cancers

• Blocking Abs approved for treatment of metastatic colon cancer

• Subsequently found that RAS-MUT tumors don’t respond: “Negative Predictive Biomarker”

• However, there are still EGFR+ / RAS-WT patients who don’t respond: need “Positive Predictive Biomarker”

• And in Lung Cancer it is not clear that RAS-MUT status is a useful biomarker

Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function

Oncogenes only make good targets in particular molecular contexts: EGFR story

Page 6: Friend Oslo 2012-09-09

Reality: Overlapping Pathways

Page 7: Friend Oslo 2012-09-09

Preliminary Probabilistic Models - Rosetta

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

Page 8: Friend Oslo 2012-09-09

Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
... Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.

CVD
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
... Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome

Bone
"Integrating genotypic and expression data ... for bone traits..." Nat Genet. (2005)
"... approach to identify candidate genes regulating BMD..." J Bone Miner Res. (2009)

Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations..." PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
... Plus 3 additional papers in PLoS Genet., BMC Genet.

Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models

• >80 Publications from Rosetta Genetics

Page 9: Friend Oslo 2012-09-09

• 50 network papers
• http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

Page 10: Friend Oslo 2012-09-09
Page 11: Friend Oslo 2012-09-09

Background: Information Commons for Biological Functions

Page 12: Friend Oslo 2012-09-09

Sage Bionetworks

A non-profit organization with a vision to enable networked team approaches to building better models of disease

BIOMEDICINE INFORMATION COMMONS INCUBATOR

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Page 13: Friend Oslo 2012-09-09

Sage Bionetworks Collaborators

• Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson
• Foundations: Kauffman, CHDI, Gates Foundation
• Government: NIH, LSDF, NCI
• Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
• Federation: Ideker, Califano, Nolan, Schadt

Page 14: Friend Oslo 2012-09-09

Predictive models of cancer phenotypes

Panel of tumor samples

Molecular characterization
• mRNA
• copy number
• somatic mutations
• epigenetics
• proteomics

Cancer phenotypes
• Drug sensitivity screens
• Clinical prognosis

Predictive model

Relating a genetic feature of a cancer to the efficacy of a drug: Gleevec (Imatinib) improves survival in CML patients harboring the BCR-ABL translocation

[Figure: overall survival (%) versus months after beginning treatment. Brian J Druker, Nature Medicine 15, 1149-1152 (2009)]

[Pathway diagram. Nodes: KRAS, NRAS, BRAF, MEK1/2, EGFR, ERBB2, MET, PIK3CA, BCR/ABL, ARF, MDM2, TP53]

[Compounds shown: Imatinib, Nilotinib, AZD0530, AZD6244, Erlotinib, Lapatinib, Nutlin-3, PD-0325901, PF2341066, PHA-665752, PLX4720, RAF265, ZD-6474]

Page 15: Friend Oslo 2012-09-09

Biological System

Data Analysis

Fundamentally Biological Science hasn’t changed yet because of the ‘Omics Revolution...

...it is still about the process of linking a system to a hypothesis to some data to some analyses

Page 16: Friend Oslo 2012-09-09

Biological System

Data

Analysis

Iterative Networked Approaches to Generating, Analyzing, and Supporting New Models

Uncouple the automatic linkage between the data generators, analyzers, and validators  

Page 17: Friend Oslo 2012-09-09

SYNAPSE

CURATED DATA

TOOLS/ METHODS

ANALYSES/ MODELS

RAW DATA

BioMedicine Information Commons

Data Generators

Data Analysts

Experimentalists

Clinicians

Patients/ Citizens

Networked Approaches

Page 18: Friend Oslo 2012-09-09

SYNAPSE

CURATED DATA

TOOLS/ METHODS

ANALYSES/ MODELS

RAW DATA

BioMedical Information Commons

Data Generators

Data Analysts

Experimentalists

Clinicians

Patients/ Citizens

Networked Team Approaches

1 USABLE DATA

2 REWARDS, RECOGNITION

3 PRIVACY BARRIERS

4 HOW TO DISTRIBUTE TASKS

5 REWARDS FOR SHARING

Page 19: Friend Oslo 2012-09-09

Open and Networked Team Approaches

1  USABLE  DATA  

2 REWARDS, RECOGNITION

SYNAPSE  

Page 20: Friend Oslo 2012-09-09

Two approaches to building common scientific and technical knowledge

Publication
• Text summary of the completed project
• Assembled after the fact

Social Coding
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• All evolving and accessible in real time

Page 21: Friend Oslo 2012-09-09

Synapse is GitHub for Biomedical Data

Social Science
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone

Social Coding
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• All evolving and accessible in real time

Page 22: Friend Oslo 2012-09-09

Watch What I Do, Not What I Say

sage bionetworks synapse project

Page 23: Friend Oslo 2012-09-09

Most of the People You Need to Work with Don’t Work with You


Page 24: Friend Oslo 2012-09-09

My Other Computer is “The Cloud”


Page 25: Friend Oslo 2012-09-09

Data Analysis with Synapse

Run Any Tool

On Any Platform

Record in Synapse

Share with Anyone
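
The four steps above can be illustrated with a small provenance record. This is only a sketch under stated assumptions: it does not use the Synapse client or its API, and the names `fingerprint` and `record_analysis`, the field names, and the sample byte strings are all hypothetical. It shows the general idea of tying a result to the exact data and code that produced it by content hash, so the record can be shared and later checked.

```python
import hashlib
import json


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 content hash, used here as a version identifier."""
    return hashlib.sha256(content).hexdigest()


def record_analysis(tool: str, data: bytes, code: bytes, result: bytes) -> dict:
    """Build a record linking a result to the exact inputs that produced it.

    Hypothetical structure for illustration; not the Synapse data model.
    """
    return {
        "tool": tool,
        "data_version": fingerprint(data),
        "code_version": fingerprint(code),
        "result_version": fingerprint(result),
    }


# Any tool on any platform can produce the bytes; only the hashes are recorded.
record = record_analysis(
    tool="any-tool",
    data=b"expression matrix",
    code=b"fit <- lm(y ~ x)",
    result=b"model coefficients",
)

# A plain JSON string is easy to share with anyone.
shared = json.dumps(record, sort_keys=True)
```

Because the identifiers are derived from content, changing a single byte of the data or code yields a different version, which is what makes an analysis history auditable.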

Page 26: Friend Oslo 2012-09-09

[Figure: workflow from expression, copy number, mutation, and phenotype data to predictive model generation and performance assessment]

Synapse  infrastructure  for  sharing,  searching,    and  analyzing  TCGA  data

• Automated workflows for curation, QC, and sharing of large-scale datasets.

• All of TCGA, GEO, and user-submitted data processed with standard normalization methods.

• Searchable TCGA data:
  - 23 cancers
  - 11 data platforms
  - Standardized meta-data ontologies

Page 27: Friend Oslo 2012-09-09

[Figure: prediction accuracy (R²) across 130 drugs]

[Figure: workflow from expression, copy number, mutation, and phenotype data to predictive model generation and performance assessment]

Synapse  infrastructure  for  sharing,  searching,    and  analyzing  TCGA  data

• Comparison of many modeling approaches applied to the same data.

• Models transparently shared and reusable through Synapse.

• Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.

• Extending pipeline to evaluate prediction of TCGA phenotypes.

• Hosting of collaborative competitions to compare models from many groups.
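
The comparison described above, several modeling approaches scored on the same data by R², can be sketched as follows. This is a minimal illustration, not the pipeline from the slides: the function names `r_squared` and `compare_models`, the two toy models, and all numeric values are assumptions made for the example.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - centre) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def compare_models(models, features, observed):
    """Score every model on the same held-out data and rank by R^2."""
    scores = {
        name: r_squared(observed, [predict(x) for x in features])
        for name, predict in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy held-out data: one feature per sample and a measured drug
# sensitivity (values invented for illustration).
features = [0.0, 1.0, 2.0, 3.0]
sensitivity = [1.0, 3.0, 5.0, 7.0]

# Two hypothetical models expressed as prediction functions.
models = {
    "linear": lambda x: 2.0 * x + 1.0,  # fits the toy data exactly
    "constant": lambda x: 4.0,          # always predicts the mean
}

ranking = compare_models(models, features, sensitivity)
```

Scoring every approach against the same held-out values is what makes the resulting ranking an objective comparison; a model that only predicts the mean scores an R² of zero.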

Page 28: Friend Oslo 2012-09-09

Open and Networked Approaches

3  PRIVACY  BARRIERS  

PORTABLE  LEGAL  CONSENT:  weconsent.us  John  Wilbanks  

Page 29: Friend Oslo 2012-09-09

REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge

4 HOW TO DISTRIBUTE TASKS

COLLABORATIVE CHALLENGES

5 REWARDS FOR SHARING

Page 30: Friend Oslo 2012-09-09

What is the problem?
• Our current models of disease biology are primitive and limit doctors’ understanding and ability to treat patients
• Current incentives reward those who silo information and work in closed systems

Page 31: Friend Oslo 2012-09-09

The Solution: Competitions to crowd-source research in biology and other fields

• Why competitions?
  - Objective assessments
  - Acceleration of progress
  - Transparency
  - Reproducibility
  - Extensible, reusable models

• Competitions in biomedical research
  - CASP (protein structure)
  - Foldit / EteRNA (protein / RNA structure)
  - CAGI (genome annotation)
  - Assemblathon / Alignathon (genome assembly / alignment)
  - SBV Improver (industrial methodology benchmarking)
  - DREAM (co-organizer of Sage/DREAM competition)

• Generic competition platforms
  - Kaggle, InnoCentive, MLComp

Page 32: Friend Oslo 2012-09-09

METABRIC

• Array-CGH
• Expression arrays
• Sequencing TP53 PIK3CA
• Amplified DNA and cDNA banks
• miRNA profiling

Anglo-Canadian collaboration

Gene sequencing (ICGC)

Page 33: Friend Oslo 2012-09-09

Sage/DREAM Challenge: Details and Timing

Phase 1: July through end of September 2012

• Training data: 2,000 breast cancer samples from METABRIC cohort
  - Gene expression
  - Copy number
  - Clinical covariates
  - 10 year survival

• Supporting data: other Sage-curated breast cancer datasets
  - >1,000 samples from GEO
  - ~800 samples from TCGA
  - ~500 additional samples from Norway group
  - Curated and available on Synapse, Sage’s compute platform

• Data released in phases on Synapse from now through end of September

• Will evaluate accuracy of models built on METABRIC data to predict survival in:
  - Held out samples from METABRIC
  - Other datasets

Phase 2: Oct 15 through Nov 12, 2012

• Evaluation of models in novel dataset.

• Validation data: ~500 fresh frozen tumors from Norway group with:
  - Clinical covariates
  - 10 year survival
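
The slides say models are evaluated on how accurately they predict survival but do not name the metric. A common choice for ranking survival predictions is the concordance index, and the sketch below assumes that metric; the function name, argument names, and toy values are illustrative only.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times  -- follow-up time per patient
    events -- True if death was observed, False if the patient was censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy held-out cohort: survival in years, with one censored patient.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]

perfect = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
inverted = concordance_index(times, events, [1.0, 2.0, 3.0, 4.0])
uninformative = concordance_index(times, events, [1.0, 1.0, 1.0, 1.0])
```

A score of 1.0 means every comparable pair is ordered correctly, 0.5 is what a constant prediction achieves, and 0.0 means the ranking is fully inverted.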

Page 34: Friend Oslo 2012-09-09

[Figure: workflow from expression, copy number, mutation, and phenotype data to predictive model generation and performance assessment]

Synapse  transparent,  reproducible,  versioned  machine  learning  infrastructure  for  method  comparison

METABRIC cohort: 997 breast cancer samples
• Clinical covariates
• Gene expression (Illumina HT12v3)
• Copy number (Affy SNP 6.0)
• 10 year survival

Loaded through Synapse R client as Bioconductor objects.

Page 35: Friend Oslo 2012-09-09

[Figure: workflow from expression, copy number, mutation, and phenotype data to predictive model generation and performance assessment]

Synapse  transparent,  reproducible,  versioned  machine  learning  infrastructure  for  method  comparison

Custom models implement train() and predict() API.

Implementation of simple clinical-only survival model used as baseline predictor.
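
A model exposing train() and predict(), as described above, might look like the following. The challenge itself used the Synapse R client; this Python sketch only illustrates the shape of such an interface. The class name `ClinicalOnlyBaseline`, the `grade` covariate, and the toy data are hypothetical, and for brevity the model ignores censoring, which a real survival model would have to handle.

```python
from collections import defaultdict
from statistics import mean


class ClinicalOnlyBaseline:
    """Hypothetical baseline: risk derived from a single clinical covariate."""

    def __init__(self, covariate):
        self.covariate = covariate
        self.mean_survival = {}
        self.overall_mean = None

    def train(self, patients, survival_years):
        """Store the mean observed survival for each covariate value."""
        groups = defaultdict(list)
        for patient, years in zip(patients, survival_years):
            groups[patient[self.covariate]].append(years)
        self.mean_survival = {value: mean(ys) for value, ys in groups.items()}
        self.overall_mean = mean(survival_years)
        return self

    def predict(self, patients):
        """Return one risk score per patient; higher means worse prognosis.

        Covariate values not seen in training fall back to the overall mean.
        """
        return [
            -self.mean_survival.get(p[self.covariate], self.overall_mean)
            for p in patients
        ]


# Toy training data (invented): tumor grade and survival in years.
training = [{"grade": 1}, {"grade": 1}, {"grade": 3}, {"grade": 3}]
survival = [9.0, 7.0, 3.0, 1.0]

model = ClinicalOnlyBaseline("grade").train(training, survival)
risks = model.predict([{"grade": 1}, {"grade": 3}, {"grade": 2}])
```

Keeping every model behind the same two methods is what lets a shared evaluation harness train and score many submissions without knowing how each one works internally.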

Page 36: Friend Oslo 2012-09-09

Trey Ideker, Janusz Dutkowski, Eric Schadt, Gaurav Pandey, Gustavo Stolovitzky, Erhan Bilal, Andrea Califano, Yishai Shimoni, Mukesh Bansal, Mariano Alvarez, Garry Nolan, In Sock Jang, Ben Sauerwine, Stephen Friend, Justin Guinney, Marc Vidal, Adam Margolin, Ben Logsdon

Federation modeling competition

Models submitted and evaluated in real-time leaderboard

>200 models tested within 3 months

Page 37: Friend Oslo 2012-09-09

Sage-DREAM Breast Cancer Prognosis Challenge: one month of building better disease models together

154 participants; 27 countries

268 participants; 32 countries

290 models posted to Leaderboard

breast  cancer  data  

Challenge  Launch:  July  17  

August  17  Status  

Page 38: Friend Oslo 2012-09-09

Examples of Participants

Page 39: Friend Oslo 2012-09-09
Page 40: Friend Oslo 2012-09-09
Page 41: Friend Oslo 2012-09-09
Page 42: Friend Oslo 2012-09-09

Summary of Breast Cancer Challenge #1

https://synapse.sagebase.org/ - BCCOverview:0

Transparency, reproducibility

Validation in novel dataset

Publication in Science Translational Medicine

Donation of Google-scale compute space.

For the goal of promoting democratization of medicine... Registration starting NOW...

sign up at: synapse.sagebase.org

[Figure: workflow from expression, copy number, mutation, and phenotype data to predictive model generation and performance assessment]

Page 43: Friend Oslo 2012-09-09

Breast Cancer Collaborative Challenges and Beyond

Start With Pre-Collated Cohort

Collaborative Challenge Hosted on Synapse

Obtain research questions from breast cancer community for Challenge 2

Generate and fund Challenge 2 research proposal

The challenge on molecular predictors of breast cancer will create a community-based effort to provide an unbiased assessment of the most accurate models and methodologies for prediction of breast cancer survival.

Announce best performing model to predict breast cancer survival

Page 44: Friend Oslo 2012-09-09

SYNAPSE

CURATED DATA

TOOLS/ METHODS

ANALYSES/ MODELS

RAW DATA

BioMedical Information Commons

Data Generators

Data Analysts

Experimentalists

Clinicians

Patients/ Citizens

Networked Team Approaches

1 USABLE DATA

2 REWARDS, RECOGNITION

3 PRIVACY BARRIERS

4 HOW TO DISTRIBUTE TASKS

5 REWARDS FOR SHARING

Page 45: Friend Oslo 2012-09-09

       

Lessons Learned: Realities of Building Cancer Models

Sharing, Rewards and Affordability

Stephen Friend MD PhD