H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Transcript of H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Page 1: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

H2O for Genomics

Hussam Al-Deen

GenomeDx Biosciences

Page 2: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Outline

• About GenomeDx

• Cancer and genomics

• Genomic information we use

‒ Genome-wide RNA expression for applications in cancer

• Our prostate cancer solution

• Why we use H2O?

• Applications tested:

‒ Tumor Gleason Grade Classifier tested for multiple endpoint prediction

• Conclusions and Future Directions

Page 3: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

GenomeDx Biosciences: About Us

• A clinical genomics company founded to transform the practice of oncology

• Uses machine learning and statistical algorithms to generate clinical tests

Decipher® metastasis signature

• More than 20 peer-reviewed publications supporting analytical and clinical validity and utility

• Over 5,000 patients tested in clinical trials and oncology practice

Decipher GRID™ platform

• Data-sharing program for Decipher users

• Free access for academic research

Clinical Lab: San Diego, CA

Informatics Lab: Vancouver, BC

Page 4: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Cancer is a disease of the genome

Tissue-based genomics

• Cancer is a complex disease with many subtypes

‒ Indolent, aggressive, hormone- or chemo-sensitive/resistant, etc.

[Figure: DNA, RNA, and protein illustrations; image sources: vector.childrenshospital.org, people.duke.edu, fineartamerica.com]

Page 5: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Measure gene activity using genome-wide expression analysis of clinical biosamples

Tissue-based genomics

• Measuring RNA expression (concentration) and activity of genes is highly informative for a genomic-based understanding of cancer

[Figure: workflow from cancer patient to biopsy/surgery, tumor sample, RNA extraction, microarray, and expression data]

Page 6: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Decipher GRID: a novel data-sharing program to accelerate cancer genomics innovation

[Figure: institution logos, including H. Lee Moffitt Cancer Center & Research Institute]

Page 7: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Prostate cancer is a significant burden on the US healthcare system

Prostate cancer: most prevalent cancer affecting men

Prostate cancer alone is projected to account for 26% of incident cancer cases in men in 2015.

Siegel, Rebecca L., Kimberly D. Miller, and Ahmedin Jemal. "Cancer statistics, 2015." CA: A Cancer Journal for Clinicians 65.1 (2015): 5-29.

Page 8: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Clinical genomics aims to improve cancer patient care

Prostate cancer: balancing the harms and benefits

• Accurate forecasting of recurrence risk is key to determining the optimal treatment choice:

‒ Observation

‒ Radiation therapy

‒ Hormone therapy

‒ Chemotherapy

• Goal of risk-adapted therapy:

‒ Reduce side effects of treatment

‒ Reduce costs of treatment

Page 9: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Why we use H2O?

• Highly advanced algorithms such as Deep Learning

• Ready-to-use algorithms that work with existing languages and tools

• Easily explore data and develop models

• Multiple algorithms within the same package

http://h2o.ai/

Page 10: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Deep Neural Network

• Genomics:

‒ High-dimensional dataset (~46K features)

‒ Feature selection to reduce the dimensionality of the data

• Deep Learning:

‒ Can exploit non-linear relationships between features (genes)

‒ Can improve performance

‒ Deep features may help us understand the biology

Page 11: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Deep Neural Network

• Different packages to train a deep neural network:

‒ Filtering to reduce the number of features to ~100

‒ No grid search

‒ Cross-validation AUC ~ 0.5

• H2O deep neural network:

‒ Filtering to reduce the number of features to ~100

‒ Good results (AUC)
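To put the AUC figures on this slide in context: AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, so a cross-validation AUC of about 0.5 means the model ranks samples no better than chance. The following is a minimal pure-Python sketch of that definition; the function name and the toy scores are illustrative assumptions, not values from the talk.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores, made up for illustration only.
perfect = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])  # every positive outranks every negative
partial = auc([0.9, 0.4], [0.6, 0.1])            # three of four pairs ranked correctly
chance = auc([0.5, 0.5], [0.5, 0.5])             # all ties: indistinguishable from guessing

print(perfect, partial, chance)
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based formulation would be the usual choice for larger cohorts.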

Page 12: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Application: Development of a Tumor Gleason Grade Classifier

Page 13: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Tumor Gleason grade is a strong prognostic factor and is used to guide treatment decisions

Digitizing the Gleason Grade

• Gleason grade is the current gold standard in prostate cancer:

‒ Assigns a score from 1 to 5 based on the microscopic appearance of the tissue

‒ A higher score is associated with more aggressive disease

• Men with higher-grade prostate cancer are more likely to receive chemical castration (hormone therapy)

https://en.wikipedia.org/wiki/Gleason_grading_system

Page 14: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Why develop a genomic model for pathology tumor grading?

Digitizing the Gleason Grade

• Gleason grade is subjective:

‒ Depends on the pathologist's experience

‒ Borderline cases are interpreted differently

• Gleason grade on biopsy is often "up-graded" on final pathology

• Genomics could provide a more robust prediction of outcomes

https://en.wikipedia.org/wiki/Gleason_grading_system

Page 15: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Study Design

~ 7000 patients

1,537 patients

Training (n = 990): G3 (n = 366), G4+ (n = 624)

Testing (n = 537): G3 (n = 113), G4+ (n = 424)

G3: patients who had Gleason 3

G4+: patients who had Gleason 4 or 5

Page 16: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Classifier Development Overview

Input: 46,000 features for G3 (n = 366) and G4+ (n = 624) training patients

• Array features on Affymetrix Human Exon 1.0 ST microarrays were summarized into ~46,000 features (genes)

• Univariate filtering: two-sample Wilcoxon ("Mann-Whitney") tests

• H2O grid search (10-fold cross-validation) to optimize hidden layer size

• Deep neural network (H2O)
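The univariate filtering step can be sketched as follows: compute a Mann-Whitney U statistic per feature between the G3 and G4+ groups and keep the features that separate the groups best. This is a pure-Python illustration under stated assumptions: the talk does not give its cut-off or whether it filtered on p-values, so this sketch ranks features by how far the normalised U statistic is from 0.5, and all function names, gene names, and expression values are hypothetical.

```python
def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic of group_a, computed from its rank sum in the pooled sample."""
    ranks = rank_with_ties(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0


def top_features(expression, labels, k):
    """Keep the k features whose U / (n_a * n_b) lies farthest from 0.5."""
    scored = []
    for name, values in expression.items():
        g3 = [v for v, y in zip(values, labels) if y == "G3"]
        g4 = [v for v, y in zip(values, labels) if y == "G4+"]
        u = mann_whitney_u(g4, g3)
        separation = abs(u / (len(g3) * len(g4)) - 0.5)
        scored.append((separation, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


# Hypothetical toy data: three "genes" measured in six patients.
labels = ["G3", "G3", "G3", "G4+", "G4+", "G4+"]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 2.1, 2.4, 2.0],  # higher in G4+
    "gene_b": [1.0, 2.0, 3.0, 1.5, 2.5, 0.5],  # no clear separation
    "gene_c": [3.0, 3.1, 2.9, 1.0, 1.1, 0.9],  # lower in G4+
}
selected = top_features(expression, labels, k=2)
print(selected)
```

Ranking by the normalised U statistic keeps both up- and down-regulated genes, since the distance from 0.5 ignores direction. A p-value threshold with multiple-testing correction would be a reasonable alternative at the scale of ~46,000 features.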

Page 17: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Gleason Grade ROC Curve

• Model score AUC = 0.77, 95% CI: (0.73-0.81)

• GC1 score AUC = 0.72, 95% CI: (0.68-0.76)

• GC2 score AUC = 0.74, 95% CI: (0.70-0.78)

• Biopsy Gleason AUC = 0.72, 95% CI: (0.68-0.76)

[Figure: ROC curve, sensitivity vs. specificity; AUC: 0.77 [0.73 – 0.81]]

Classification table, with cut-point equal to 0.5

                 Truth
Prediction    G3      G4+
G3            179     69
G4+           99      190

Misclassification Rate = 0.31

[Figure: boxplot of model score distribution (0.00 to 1.00) for G3 vs. G4+]
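The misclassification rate on this slide follows directly from the classification table. The sketch below recomputes it and derives sensitivity and specificity as well. It assumes the table's rows are predictions and its columns are the true grade, and it treats G4+ as the positive class; the sensitivity and specificity values are derived here and are not reported in the talk.

```python
# Counts from the classification table (cut-point 0.5).
# Assumed layout: key = (predicted grade, true grade).
table = {
    ("G3", "G3"): 179,    # predicted G3, truly G3
    ("G3", "G4+"): 69,    # predicted G3, truly G4+
    ("G4+", "G3"): 99,    # predicted G4+, truly G3
    ("G4+", "G4+"): 190,  # predicted G4+, truly G4+
}

total = sum(table.values())
errors = table[("G3", "G4+")] + table[("G4+", "G3")]
misclassification_rate = errors / total

# Treating G4+ as the positive class (an assumption for this sketch).
true_g4 = table[("G4+", "G4+")] + table[("G3", "G4+")]
true_g3 = table[("G3", "G3")] + table[("G4+", "G3")]
sensitivity = table[("G4+", "G4+")] / true_g4
specificity = table[("G3", "G3")] / true_g3

print(f"n = {total}")
print(f"misclassification rate = {misclassification_rate:.2f}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```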

Page 18: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Determining Patient Risk

Metastatic prostate cancer

• Prostate cancer can spread to other parts of the patient's body

• After surgery, up to 50% [1] of men will have clinical risk factors that increase the chance of metastasis

• Very few men will experience metastasis and die of their cancer [2]

• Gleason grade is a surrogate for metastatic disease

http://www.drugdevelopment-technology.com/projects/drug_abiateronecance/drug_abiateronecance5.html

[1] Swanson, G.P., et al., Pathologic findings at radical prostatectomy: risk factors for failure and death. Urol Oncol, 2007. 25(2): p. 110-4.

[2] Pound, C.R., et al., Natural history of progression after PSA elevation following radical prostatectomy. JAMA, 1999. 281(17): p. 1591-7.

Page 19: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Genomic Gleason Classifier Predicts Metastatic Outcomes

[Figure: boxplot of model score (0 to 1.0) for MET vs. No-MET patients; AUC: 73.4 [67.36 – 79.43]]

[Figure: probability of metastasis-free survival (0.0 to 1.0) vs. time from surgery to metastasis (0 to 240); curve annotations 0.75 and 0.90; p-value < 0.001]

MET: patients who developed metastatic disease

No-MET: patients who did not develop metastatic disease
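The survival plot on this slide shows the probability of metastasis-free survival against time from surgery. The talk does not say how the curve was estimated; the Kaplan-Meier product-limit estimator is the conventional choice for this kind of plot and is assumed in the sketch below. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time for each patient
    events: True if metastasis was observed at that time, False if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up data for six patients (time unit left unspecified).
times = [12, 24, 24, 36, 48, 60]
events = [True, False, True, True, False, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up but never lower the curve themselves, which is why the estimate only steps down at observed events.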

Page 20: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Random search to reduce training time and incorporate more features

Number of features                  250                     500
Training time                       ~ 1 hour                ~ 1 hour
Number of layers                    2                       3
Activation                          RectifierWithDropout    Rectifier
Hidden layers                       (48, 169)               (339, 204, 91)
Hidden dropout                      (0.55, 0.09)            (0.04, 0.03, 0.13)
Input dropout                       0.34                    0.47
Testing AUC (GG [1])                77                      78
Testing AUC (Metastatic Disease)    70                      67

[1] GG: Gleason Grade
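Random search, as named on this slide, draws hyperparameter combinations at random instead of enumerating a full grid, which keeps training time bounded as the search space grows. The sketch below is a generic pure-Python illustration rather than the H2O API: the sampling ranges, the dictionary keys, and the `toy_evaluate` stand-in (which replaces a real cross-validated AUC) are all assumptions made for the example.

```python
import random


def sample_config(rng, max_layers=3):
    """Draw one random network configuration (ranges are illustrative assumptions)."""
    n_layers = rng.randint(2, max_layers)
    return {
        "activation": rng.choice(["Rectifier", "RectifierWithDropout"]),
        "hidden": [rng.randint(10, 400) for _ in range(n_layers)],
        "hidden_dropout": [round(rng.uniform(0.0, 0.6), 2) for _ in range(n_layers)],
        "input_dropout": round(rng.uniform(0.0, 0.5), 2),
    }


def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


def toy_evaluate(config):
    """Hypothetical stand-in for a cross-validated AUC: prefers layers near 100 units."""
    sizes = config["hidden"]
    return -sum(abs(size - 100) for size in sizes) / len(sizes)


best_score, best_config = random_search(toy_evaluate, n_trials=50, seed=1)
print(best_score, best_config)
```

In practice `evaluate` would train a model with the sampled settings and return its cross-validated AUC, and the number of trials sets the training budget directly.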

Page 21: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Conclusions and Future Directions

• Applied an advanced machine learning algorithm to genomic data

• The H2O Deep Learning model outperforms other Gleason-predicting models

• Incorporate more genomic features (46K) into the analysis to improve model development and performance

• Exploit nonlinear relationships between features (genes)

• Can Deep Learning help us understand the biology?

Page 22: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

GenomeDx: A multi-disciplinary adventure!

Page 23: H2O World - H2O for Genomics with Hussam Al-Deen Ashab

Thank you.

Tel: +1 888.975.4540 ext. 139

fax: +1 886.505.5161