H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Posted on 06-Jan-2017

H2O for Genomics

Hussam Al-Deen

GenomeDx Biosciences

Outline

• About GenomeDx
• Cancer and genomics
• Genomic information we use
  ‒ Genome-wide RNA expression for applications in cancer
• Our prostate cancer solution
• Why we use H2O?
• Applications tested:
  ‒ Tumor Gleason Grade Classifier tested for multiple endpoint prediction
• Conclusions and Future Directions

GenomeDx Biosciences
About Us

• A clinical genomics company founded to transform the practice of oncology
• Uses machine learning and statistical algorithms to generate clinical tests
• Decipher® metastasis signature
  ‒ More than 20 peer-reviewed publications supporting analytical validity, clinical validity and clinical utility
  ‒ Over 5,000 patients tested in clinical trials and oncology practice
• Decipher GRID™ platform
  ‒ Data-sharing program for Decipher users
  ‒ Free access for academic research
• Clinical Lab: San Diego, CA
• Informatics Lab: Vancouver, BC

Cancer is a disease of the genome
Tissue-based genomics

• Cancer is a complex disease and has many, many subtypes
  ‒ Indolent, aggressive, hormone- or chemo-sensitive/resistant, etc.
[Figure: DNA, RNA, Protein (image sources: vector.childrenshospital.org, people.duke.edu, fineartamerica.com)]
• Measuring the RNA expression (concentration) and activity of genes is highly informative for a genomic-based understanding of cancer

Measure gene activity using genome-wide expression analysis of clinical biosamples
Tissue-based genomics

[Figure: cancer patient → biopsy/surgery → tumor sample → RNA extraction → microarray → expression data; partner logos include H. Lee Moffitt Cancer Center & Research Institute]

Decipher GRID: a novel data-sharing program to accelerate cancer genomics innovation

Prostate cancer is a significant burden on the US healthcare system
Prostate cancer: the most prevalent cancer affecting men

• Prostate cancer alone was projected to account for 26% of incident cancer cases in men in 2015

Siegel, Rebecca L., Kimberly D. Miller, and Ahmedin Jemal. "Cancer statistics, 2015." CA: A Cancer Journal for Clinicians 65.1 (2015): 5-29.

Clinical genomics aims to improve cancer patient care
Prostate cancer: balancing the harms and benefits

• Accurate forecasting of recurrence risk is key to determining the optimal treatment choice:
  ‒ Observation
  ‒ Radiation therapy
  ‒ Hormone therapy
  ‒ Chemotherapy
• Goals of risk-adapted therapy:
  ‒ Reduce side effects of treatment
  ‒ Reduce costs of treatment

Why we use H2O?

• Highly advanced algorithms such as Deep Learning
• Ready-to-use algorithms accessible from existing languages and tools
• Easy to explore data and develop models
• Multiple algorithms within the same package

http://h2o.ai/

Deep Neural Network

• Genomics:
  ‒ High-dimensional dataset (~46K features)
  ‒ Feature selection reduces the dimensionality of the data
• Deep Learning:
  ‒ Can exploit non-linear relationships between features (genes)
  ‒ Can improve performance
  ‒ Deep features may help us understand the biology

Deep Neural Network

• Other packages used to train deep neural networks:
  ‒ Filtering to reduce the number of features to ~100
  ‒ No grid search
  ‒ Cross-validation AUC ~0.5
• H2O deep neural network:
  ‒ Filtering to reduce the number of features to ~100
  ‒ Good results (AUC)
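Cross-validated AUC, quoted above and used again in the H2O grid search (10-fold C.V.), depends on how samples are split into folds. The sketch below is a minimal stdlib version of stratified 10-fold assignment. It assumes binary labels (0 = G3, 1 = G4+). The slides do not say whether their folds were stratified, and `stratified_folds` and `split` are hypothetical names. H2O estimators expose the same idea through their `nfolds` and `fold_assignment` (for example `"Stratified"`) parameters.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 42) -> List[int]:
    """Assign each sample a fold id in 0..k-1 so every class is spread evenly across folds."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    offset = 0  # carry the round-robin position across classes to balance fold sizes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            fold_of[idx] = (offset + j) % k
        offset += len(members)
    return fold_of


def split(fold_of: Sequence[int], fold: int) -> Tuple[List[int], List[int]]:
    """Return (training indices, validation indices) for one fold."""
    train = [i for i, f in enumerate(fold_of) if f != fold]
    valid = [i for i, f in enumerate(fold_of) if f == fold]
    return train, valid


# Class sizes of the training set from the study design (G3 = 0, G4+ = 1).
labels = [0] * 366 + [1] * 624
fold_of = stratified_folds(labels, k=10)
sizes = [fold_of.count(f) for f in range(10)]
print(sizes)  # → [99, 99, 99, 99, 99, 99, 99, 99, 99, 99]
```

With this kind of setup, any univariate filtering would be rerun on each training split so that the validation fold never influences which genes are kept.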

Application: Development of a Tumor Gleason Grade Classifier

Tumor Gleason grade is a strong prognostic factor and is used to guide treatment decisions
Digitizing the Gleason Grade

• Gleason grade is the current gold standard in prostate cancer:
  ‒ Assigns a score from 1 to 5 based on the microscopic appearance of the tissue
  ‒ A higher score is associated with more aggressive disease
• Men with higher-grade prostate cancer are more likely to receive chemical castration (hormone therapy)

https://en.wikipedia.org/wiki/Gleason_grading_system

Why develop a genomic model for pathology tumor grading?
Digitizing the Gleason Grade

• Gleason grade is subjective:
  ‒ Depends on pathologist experience
  ‒ Borderline cases are interpreted differently
  ‒ Gleason grade on biopsy is often ‘upgraded’ on final pathology
• Genomics could provide a more robust prediction of outcomes

https://en.wikipedia.org/wiki/Gleason_grading_system

Study Design

~7,000 patients → 1,537 patients
• Training (n = 990): G3 (n = 366), G4+ (n = 624)
• Testing (n = 537): G3 (n = 113), G4+ (n = 424)

G3: patients who had Gleason grade 3
G4+: patients who had Gleason grade 4 or 5

Classifier Development Overview

• Input: array features on Affymetrix Human Exon 1.0 ST microarrays were summarized into ~46,000 features (genes); training set G3 (n = 366), G4+ (n = 624)
• Univariate filtering: two-sample Wilcoxon (Mann-Whitney) tests
• H2O grid search (10-fold C.V.) to optimize hidden layer size
• Deep neural network (H2O)
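The univariate filtering step ranks each of the ~46,000 features by a two-sample Wilcoxon (Mann-Whitney) test between G3 and G4+ samples. The sketch below is a minimal stdlib version and rests on several assumptions. Expression data is a dict of gene → values, with labels 0 = G3 and 1 = G4+. Filtering keeps the k features with the smallest p-values (the slides mention runs with ~100, 250 and 500 features). The test uses the normal approximation without a tie correction. All function names are hypothetical. SciPy's `scipy.stats.mannwhitneyu` implements the same test with more options.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def filter_features(expr: Dict[str, List[float]], labels: List[int], k: int) -> List[str]:
    """Keep the k genes with the smallest Mann-Whitney p-values (G3 vs G4+)."""
    scored = []
    for gene, values in expr.items():
        g3 = [v for v, lab in zip(values, labels) if lab == 0]
        g4 = [v for v, lab in zip(values, labels) if lab == 1]
        _, p = mann_whitney_p(g3, g4)
        scored.append((p, gene))
    scored.sort()  # ties in p are broken by gene name
    return [gene for _, gene in scored[:k]]


# Toy data: geneA and geneC separate the groups perfectly, geneB does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "geneA": [1, 2, 3, 4, 5, 6, 7, 8],
    "geneB": [5, 1, 6, 2, 7, 3, 8, 4],
    "geneC": [8, 7, 6, 5, 4, 3, 2, 1],
}
print(filter_features(expr, labels, k=2))  # → ['geneA', 'geneC']
```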

Classification table, with cut-point equal to 0.5 (misclassification rate = 0.31)

                  Truth
Prediction     G3     G4+
G3            179      69
G4+            99     190
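The misclassification rate follows directly from the table: (69 + 99) / 537 ≈ 0.31. The sketch below builds the same kind of table from model scores and derives the rate along with sensitivity and specificity. It treats G4+ as the positive class and assumes a score at or above the 0.5 cut-point is called G4+ (the slide gives only the cut-point). The function names are hypothetical.

```python
from typing import Dict, Sequence


def classification_table(scores: Sequence[float], truth: Sequence[int],
                         cutpoint: float = 0.5) -> Dict[str, int]:
    """Cross-tabulate predictions against truth (1 = G4+, 0 = G3).

    Assumption: a score at or above the cut-point is called G4+.
    """
    table = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for score, label in zip(scores, truth):
        predicted = 1 if score >= cutpoint else 0
        if label == 1:
            table["tp" if predicted == 1 else "fn"] += 1
        else:
            table["fp" if predicted == 1 else "tn"] += 1
    return table


def summarize(table: Dict[str, int]) -> Dict[str, float]:
    """Misclassification rate, sensitivity and specificity with G4+ as the positive class."""
    total = sum(table.values())
    return {
        "misclassification": (table["fn"] + table["fp"]) / total,
        "sensitivity": table["tp"] / (table["tp"] + table["fn"]),
        "specificity": table["tn"] / (table["tn"] + table["fp"]),
    }


# Counts from the slide (rows are predictions, columns are truth).
slide_table = {"tp": 190, "fn": 69, "fp": 99, "tn": 179}
metrics = summarize(slide_table)
print({name: round(value, 2) for name, value in metrics.items()})
# → {'misclassification': 0.31, 'sensitivity': 0.73, 'specificity': 0.64}
```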

Gleason Grade ROC Curve

• Model score AUC = 0.77, 95% CI: (0.73-0.81)
• GC1 score AUC = 0.72, 95% CI: (0.68-0.76)
• GC2 score AUC = 0.74, 95% CI: (0.70-0.78)
• Biopsy Gleason AUC = 0.72, 95% CI: (0.68-0.76)

[Figures: ROC curve (sensitivity vs. specificity), AUC: 0.77 [0.73 – 0.81]; boxplot of model score distribution for G3 and G4+]
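An AUC like the ones above equals the probability that a randomly chosen G4+ sample scores higher than a randomly chosen G3 sample. This is the Mann-Whitney U statistic divided by n_pos × n_neg. The slides do not say how the 95% CIs were computed. The sketch below uses the Hanley & McNeil (1982) standard error as one common choice; DeLong's method and bootstrapping are alternatives. The function names are hypothetical. With AUC = 0.77 and the class totals implied by the classification table (259 G4+, 278 G3), it gives roughly 0.73 to 0.81.

```python
import math
from typing import Sequence, Tuple


def auc_from_scores(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 1/2 (O(n_pos * n_neg))."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int,
                     z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% CI for an AUC using the Hanley & McNeil standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


print(round(auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))  # → 0.889
lo, hi = hanley_mcneil_ci(0.77, n_pos=259, n_neg=278)
print(round(lo, 2), round(hi, 2))  # → 0.73 0.81
```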

Determining Patient Risk
Metastatic prostate cancer

• Prostate cancer can spread to other parts of the patient's body
• After surgery, up to 50% of men will have clinical risk factors that increase the chance of metastasis [1]
• Very few men will experience metastasis and die of their cancer [2]
• Gleason grade is a surrogate for metastatic disease

Image: http://www.drugdevelopment-technology.com/projects/drug_abiateronecance/drug_abiateronecance5.html

[1] Swanson, G.P., et al. Pathologic findings at radical prostatectomy: risk factors for failure and death. Urol Oncol, 2007. 25(2): p. 110-4.

[2] Pound, C.R., et al. Natural history of progression after PSA elevation following radical prostatectomy. JAMA, 1999. 281(17): p. 1591-7.

Genomic Gleason Classifier Predicts Metastatic Outcomes

AUC: 73.4 [67.36 – 79.43]

[Figures: boxplot of model score for MET and No-MET patients; probability of metastasis-free survival vs. time (surgery to metastasis), p-value < 0.001]

MET: patients who developed metastatic disease

No-MET: patients who did not develop metastatic disease
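The survival panel on this slide is most likely a Kaplan-Meier estimate of metastasis-free survival, although the slides do not name the method. The sketch below is a minimal stdlib Kaplan-Meier (product-limit) estimator. It assumes each patient has a follow-up time and an event flag (1 = metastasis, 0 = censored). The function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of the Kaplan-Meier estimator.

    events[i] is 1 if patient i developed metastasis at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group every patient whose follow-up ends at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy follow-up times with one censored patient at t = 12 and one at t = 36.
curve = kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 2))
```

The survival probability only drops at event times, and censored patients leave the risk set without changing the curve.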

Random search to reduce training time and incorporate more features

Number of features | Training time | Number of layers | Activation | Hidden layers | Hidden dropout | Input dropout | Testing AUC (GG[1]) | Testing AUC (metastatic disease)
250 | ~1 hour | 2 | RectifierWithDropout | (48, 169) | (0.55, 0.09) | 0.34 | 77 | 70
500 | ~1 hour | 3 | Rectifier | (339, 204, 91) | (0.04, 0.03, 0.13) | 0.47 | 78 | 67

[1] GG: Gleason Grade
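The random search on this slide can be sketched in plain Python. In H2O, random search is requested through `H2OGridSearch` with `search_criteria={"strategy": "RandomDiscrete", ...}` over hyper-parameters such as `hidden`, `hidden_dropout_ratios` and `input_dropout_ratio`. The sketch below does not call H2O. Its search ranges and toy scoring function are hypothetical stand-ins for the real search space and for cross-validated AUC. It also samples from continuous ranges, whereas `RandomDiscrete` samples from listed values.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, object]


def sample_config(rng: random.Random) -> Config:
    """Draw one network configuration from a hypothetical search space."""
    n_layers = rng.choice([2, 3])
    return {
        "hidden": [rng.randint(16, 400) for _ in range(n_layers)],
        "hidden_dropout_ratios": [round(rng.uniform(0.0, 0.6), 2) for _ in range(n_layers)],
        "input_dropout_ratio": round(rng.uniform(0.0, 0.5), 2),
    }


def random_search(score: Callable[[Config], float], n_trials: int,
                  seed: int = 1) -> Tuple[Optional[Config], float]:
    """Evaluate n_trials random configurations and return the best-scoring one."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        s = score(config)
        if s > best_score:
            best_config, best_score = config, s
    return best_config, best_score


def toy_score(config: Config) -> float:
    """Hypothetical stand-in for cross-validated AUC: favors ~300 hidden units, low input dropout."""
    return -abs(sum(config["hidden"]) - 300) - 100 * config["input_dropout_ratio"]


best, best_value = random_search(toy_score, n_trials=20, seed=1)
print(best, best_value)
```

Each trial costs one model fit, so the number of trials directly bounds training time. This is why random search is cheaper than an exhaustive grid when more features make each fit slower.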

Conclusions and Future Directions

• Applied an advanced machine learning algorithm to genomic data
• The H2O Deep Learning model outperformed other Gleason-predicting models (AUC 0.77 vs. 0.72-0.74)
• Incorporate more genomic features (46K) into the analysis to improve model development and performance
• Exploit nonlinear relationships between features (genes)
• Can deep learning help us understand the biology?

GenomeDx: a multidisciplinary adventure!

Thank you.


hussam@genomedx.com

Tel: +1 888.975.4540 ext. 139

Fax: +1 886.505.5161