epubs.surrey.ac.ukepubs.surrey.ac.uk/814092/1/REF EURUROL-D-16-01757...  · Web viewThe optimal...


Molecular subgroup of primary prostate cancer presenting with metastatic biology

Authors’ Information

Steven M. Walker 1, 2, Laura A. Knight 1, 2, Andrena M. McCavigan 2, Gemma E. Logan 2, Viktor Berge 3, Amir Sherif 4, Hardev Pandha 5, Anne Y. Warren 6, Catherine Davidson 1, Adam Uprichard 1, Jaine K. Blayney 1, Bethanie Price 2, Gera L. Jellema 2, Aud Svindland 3, Simon S. McDade 1, Christopher G. Eden 5, Chris Foster 7, Ian G. Mills 1, 3, 8, 9, David E. Neal 10, Malcolm D. Mason 11, Elaine W. Kay 12, David J. Waugh 1, D. Paul Harkin 1, 2, R. William Watson 13, Noel W. Clarke 14, Richard D. Kennedy 1, 2

1 Centre for Cancer Research and Cell Biology, Queen’s University Belfast, 97 Lisburn Road, Belfast, BT9 7BL, UK
2 Almac Diagnostics, 19 Seagoe Industrial Estate, Craigavon, BT63 5QD, UK
3 Department of Urology, Oslo University Hospital (Aker), Oslo, N-0424, Norway
4 Department of Surgical and Perioperative Sciences, Urology and Andrology, Umeå University, Umeå, Sweden, SE-901 87
5 Department of Microbial Sciences, University of Surrey, Leggett Building, Guildford, GU2 7XH, UK
6 Department of Pathology, Addenbrooke’s Hospital, Cambridge, CB2 2QQ, UK
7 Institute of Translational Medicine, University of Liverpool, Merseyside, L69 3BX, UK
8 Department of Molecular Oncology, Oslo University Hospital/Institute for Cancer Research, Oslo, N-0424, Norway
9 Prostate Cancer Research Group, Centre for Molecular Medicine Norway (NCMM), University of Oslo and Oslo University Hospitals, Forskningsparken, Oslo, N-0349, Norway
10 Uro-oncology Research Group, Cambridge Research Institute, Cambridge, CB2 0RE, UK
11 Wales Cancer Bank, Cardiff University, Health Park, Cardiff, CF14 4XN, UK
12 Centre for Systems Medicine, RCSI, Beaumont Hospital, Dublin, Ireland
13 UCD School of Medicine, 8 Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
14 Christie NHS Foundation Trust, 550 Wilmslow Rd, Manchester, M20 4BX, UK

Corresponding Author:

Professor Richard D. Kennedy

Centre for Cancer Research and Cell Biology

Queen’s University of Belfast

Lisburn Road

[email protected]

Key Words: prostate cancer, prognostic, recurrence, progression, Metastatic Assay

Word Count: Total = 3155 words, Main = 2881 words, Abstract = 274 words


ABSTRACT

BACKGROUND: Approximately 4-25% of patients with early prostate cancer develop disease recurrence following radical prostatectomy.

OBJECTIVE: To identify a molecular subgroup of prostate cancers with metastatic potential at presentation resulting in a high risk of recurrence following radical prostatectomy.

DESIGN, SETTING & PARTICIPANTS: Unsupervised hierarchical clustering was performed using gene expression data from 70 primary resections, 31 metastatic lymph nodes and 25 normal prostate samples. Independent assay validation was performed using 322 radical prostatectomy samples from four sites with a median follow-up of 50.3 months.

OUTCOME MEASURES & STATISTICAL ANALYSIS: Molecular subgroups were identified using unsupervised hierarchical clustering. A partial least squares approach was used to generate a gene expression assay. Relationships with outcome (time to biochemical and metastatic recurrence) were analyzed using multivariable Cox regression and log-rank analysis.

RESULTS & LIMITATIONS: A molecular subgroup of primary prostate cancer with biology similar to metastatic disease was identified. A 70-transcript signature (Metastatic Assay) was developed and independently validated in the radical prostatectomy samples. Metastatic Assay positive patients had increased risk of biochemical recurrence (multivariable HR=1.62 [1.13-2.33]; p=0.0092) and metastatic recurrence (multivariable HR=3.20 [1.76-5.80]; p=0.0001). A combined model with CAPRA-S identified patients at increased risk of biochemical and metastatic recurrence superior to either model alone (HR=2.67 [1.90-3.75]; p<0.0001 and HR=7.53 [4.13-13.73]; p<0.0001 respectively). The retrospective nature of the study is acknowledged as a potential limitation.

CONCLUSIONS: The Metastatic Assay may identify a molecular subgroup of primary prostate cancers with metastatic potential.

PATIENT SUMMARY: The Metastatic Assay may improve the ability to detect patients at risk of metastatic recurrence following radical prostatectomy. The impact of adjuvant therapies should be assessed in this higher risk population.


INTRODUCTION

Although prognosis for localized prostate cancer patients following radical prostatectomy is very good, 4-25% (dependent upon disease stage and use of population PSA screening) will develop metastatic disease within 15 years 1,2. In addition, patients with low and some intermediate risk prostate cancers are best treated by active surveillance; however, there is clinical uncertainty about progression in this population 3. Progression in low/intermediate risk disease may be due to a more biologically aggressive genotype of primary tumours, whilst in clinically higher risk groups there may be undetected micro-metastatic disease at presentation 4. This could be treated by adjuvant approaches including pelvic radiotherapy 5, extended lymph node dissection 6, adjuvant hormone therapy 7 or chemotherapy 8.

Presently metastatic risk is estimated from histopathologic grade (Gleason score and clinical grade grouping), tumour stage and presenting PSA level. These prognostic factors have limitations; 15% of lower-grade prostate cancer patients (Gleason≤7) experience disease recurrence 9 whereas 74-76% of higher-grade patients (Gleason>7) do not develop metastatic disease following surgery 10. For Gleason 7 tumours, dominant lesion grade affects prognosis, with 40% of Gleason 4+3 patients developing recurrence by 5 years compared to 15% for Gleason 3+4 11. Clearly there is a need to identify additional prognostic factors to guide adjuvant treatment. Current approaches can be broadly classified as mathematical risk models using clinical factors such as CAPRA 12 and CAPRA-Surgery (CAPRA-S) 13 scoring, or biomarkers measured from tumour tissue. Regarding biomarkers, researchers have taken immunohistochemical approaches such as high Ki67 expression 14 or PTEN loss to indicate metastatic potential 15. Others have used multiplexing approaches where a gene expression 16-18 or proteomic signature 19 has been trained against known outcomes to predict high and low risk disease using archived material.

It is recognized that malignancies originating from the same anatomical site can represent different molecular entities 20. We hypothesized that a unique molecular subgroup of primary prostate cancers may exist that has a gene expression pattern associated with metastatic disease. We took an unsupervised hierarchical clustering approach using primary localised prostate cancer, primary prostate cancer presenting with concomitant metastatic disease, lymph node metastasis and normal prostate samples to identify a novel “metastatic molecular subgroup”. A 70-transcript signature (Metastatic Assay) was developed using this approach and independently validated in a cohort of radical prostatectomy samples for biochemical and metastatic recurrence.


PATIENTS & METHODS

Study design

Study design followed the reporting recommendations for tumour marker prognostic studies (REMARK) guidelines as outlined in the criteria checklists (Supplemental Table 1 & Appendix A) and REMARK study design diagram (Supplementary Figure 1).

Patients

Formalin-fixed paraffin-embedded (FFPE) sections from 126 samples (70 primary prostate cancer specimens from radical prostatectomy resections including those with known concomitant metastases, 31 samples of metastatic disease in lymph nodes and 25 histologically confirmed normal prostate samples that did not display hypertrophy, sourced from bladder resections) were collected from the University of Cambridge and the Institute of Karolinska for molecular subgroup identification (Supplementary Table 2). A secondary training dataset of 75 primary resection samples was collected, of which 20 were profiled in duplicate, to aid selection of the final signature length (Supplementary Table 3). For independent in-silico validation three public datasets were identified 17,21,22: GSE25136, n=79 (Supplemental Table 4); GSE46691, n=545 (Supplemental Table 5); and GSE21034, n=126 (Supplemental Table 6). A further 322 FFPE prostatectomy samples from four sites were collected for independent validation of the assay (Supplementary Table 7). Biochemical recurrence was defined as a post-prostatectomy rise in PSA of >0.2 ng/ml followed by a subsequent rise. Metastatic recurrence was defined as radiologic evidence of any metastatic disease, including lymph nodes, bone and visceral metastases. Inclusion criteria were T1a-T3c NX M0 prostate cancers treated by radical prostatectomy, no previous systemic adjuvant or neoadjuvant treatment in non-recurrence patients and at least 3 years follow-up.


Ethical approval was obtained from the East of England Research Ethics Committee (Ref: 14/EE/1066).

Metastatic-subgroup Assay Discovery

The 126 discovery samples were analyzed for gene expression using a cDNA microarray platform optimized for FFPE tissue. Unsupervised hierarchical clustering, an unbiased statistical method to discover structure in data, was applied to the gene expression profiles. Genes were selected using variance-intensity ranking, followed by an iterative procedure of clustering with different gene lists to determine the optimal set for reproducibility (Supplementary Methods). Data matrices were standardized to median gene expression and agglomerative 2-dimensional hierarchical clustering was performed using Euclidean distance and Ward’s linkage. The optimal numbers of sample and gene clusters were identified using the GAP statistic 23.
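The clustering step described above can be illustrated with a minimal sketch. It assumes a small invented samples-by-genes matrix, centres each gene on its median, and merges clusters by Ward's minimum-variance criterion on squared Euclidean distance. The function names and data are hypothetical, the number of clusters is fixed by hand rather than chosen with the GAP statistic, and this is not the authors' pipeline.

```python
from statistics import median

def median_centre(matrix):
    """Subtract each gene's (column's) median so genes are comparable."""
    n_genes = len(matrix[0])
    medians = [median(row[g] for row in matrix) for g in range(n_genes)]
    return [[row[g] - medians[g] for g in range(n_genes)] for row in matrix]

def centroid(points):
    """Mean profile of a group of samples."""
    return [sum(col) / len(points) for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(matrix, k):
    """Agglomerative clustering: merge the cheapest pair until k clusters remain."""
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([matrix[s] for s in clusters[i]],
                                 [matrix[s] for s in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy data: 6 samples x 3 genes (values are invented for illustration).
expression = [
    [8.1, 2.0, 5.0],
    [7.9, 2.2, 5.1],
    [8.3, 1.9, 4.8],
    [3.0, 6.1, 5.0],
    [2.8, 5.9, 5.2],
    [3.2, 6.3, 4.9],
]
sample_clusters = ward_clusters(median_centre(expression), k=2)
print(sample_clusters)
```

Applying the same function to the transposed matrix would cluster genes rather than samples, which is one way to read the "2-dimensional" clustering in the text.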

Gene Ontology (GO) biological processes were used to determine the biological significance of the gene clusters. Chi-squared or ANOVA tests were used to assess association of sample clusters with clinical data. Class-labels were assigned to samples, classifying the subgroup enriched with metastatic tumours as the “metastatic-subgroup” and the subgroup enriched with normal prostate samples as the “non-metastatic-subgroup”.

A signature to identify the metastatic-subgroup was developed using partial-least-squares (PLS) regression. All model development steps (pre-processing, gene-filtering/selection, model parameter estimation) were nested within 10x5-fold cross-validation (CV), including assessment of signature score reproducibility in 5 separate FFPE sections and repeatability across 20 resection samples from the secondary training dataset with technical duplicates. In sum, area under the ROC curve (AUC), C-index performance for metastatic recurrence in the additional dataset of 75 resections and assay stability across replicates were used to guide the final number of transcripts detected by the assay. Thresholds for dichotomizing predictions were selected at the point where sensitivity and specificity for detecting the metastatic-subgroup reached a joint maximum.
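One way to read the threshold rule above is as a search for the cut-off that maximises the sum of sensitivity and specificity (Youden's index). That reading is an assumption, as are the invented scores and labels and the convention that a score above the cut-off is called positive. The sketch below illustrates the idea only.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a sample positive when its score exceeds the threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(scores, labels):
    """Return the candidate cut-off maximising sensitivity + specificity."""
    ordered = sorted(set(scores))
    # Candidate cut-offs: midpoints between consecutive distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return max(candidates,
               key=lambda t: sum(sensitivity_specificity(scores, labels, t)))

# Invented scores; label 1 = metastatic-subgroup, 0 = non-metastatic-subgroup.
scores = [0.10, 0.15, 0.22, 0.30, 0.41, 0.48, 0.55, 0.63]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff = best_threshold(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, cutoff)
print(cutoff, sens, spec)
```

In the manuscript this selection is nested within cross-validation, whereas the sketch uses a single toy dataset.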

Statistical Assessment of Assay Performance

The performance of the Metastatic Assay regarding biochemical and metastatic progression was assessed by sensitivity and specificity. Cox regression was used to investigate prognostic effects of the assay with respect to time to recurrence endpoints. The estimated effect of the assay was adjusted for PSA, age and Gleason score in a multivariable model. A second multivariable analysis was performed to investigate the prognostic effect of the assay when adjusting for CAPRA-S 13, whilst further assessing the additional prognostic effect of a combined model generated for the assay and CAPRA-S together. The proportional hazards assumption was verified using a statistical test based on the Schoenfeld residuals 24. Samples with unknown clinical factors were excluded. All tests of statistical significance were two-sided at the 5% level of significance.

Combined model development and application (Metastatic Assay and CAPRA-S)

A combined model using Metastatic Assay dichotomized calls and CAPRA-S dichotomized into Low risk (CAPRA-S: 0-5) and High risk (CAPRA-S: 6-10) was assessed in the resection validation cohort independently against biochemical and metastatic endpoints using Cox regression analysis (Supplementary Methods). Subjects were classified ‘Low risk’ when the combined model result was Assay Negative/CAPRA-S low risk; otherwise subjects were labelled ‘High Risk’ (i.e. samples that were classified as Assay Negative/CAPRA-S high risk, Assay Positive/CAPRA-S low risk or Assay Positive/CAPRA-S high risk).
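The combined classification rule above amounts to a logical OR of the two dichotomized results. A minimal sketch follows: the CAPRA-S ranges are those stated in the text, while the function names are hypothetical and chosen for illustration.

```python
def capra_s_risk(capra_s_score):
    """Dichotomise CAPRA-S as in the text: 0-5 is low risk, 6-10 is high risk."""
    if not 0 <= capra_s_score <= 10:
        raise ValueError("CAPRA-S score must lie between 0 and 10")
    return "high" if capra_s_score >= 6 else "low"

def combined_risk(assay_positive, capra_s_score):
    """Low risk only when the assay is negative AND CAPRA-S is low risk."""
    if not assay_positive and capra_s_risk(capra_s_score) == "low":
        return "Low risk"
    return "High risk"

# Walk through the four possible combinations.
for assay_positive, score in [(False, 3), (False, 7), (True, 3), (True, 7)]:
    print(assay_positive, score, combined_risk(assay_positive, score))
```

Only the first combination (assay negative, CAPRA-S low) is labelled low risk. The other three are labelled high risk.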

See Supplemental Methods for additional experimental detail.


RESULTS

Molecular Subtyping and Identification of a Metastatic-subgroup in the Discovery cohort

We hypothesized a molecular subgroup of poor prognosis primary prostate cancers would be transcriptionally similar to metastatic disease. To identify this subgroup, we measured gene expression in primary prostate cancers, primary prostate cancers with known concomitant metastases, metastatic lymph node samples and histologically confirmed normal prostate tissue (Supplementary Table 2).

Unsupervised hierarchical clustering identified two sample groups and two gene clusters (Figure 1A). Importantly, one of the molecular subgroups (C1) demonstrated significant enrichment for primary cancers with known concomitant metastatic disease (Figure 1A & 1B, chi-squared p<0.0001). In addition, the C1 group contained all metastatic lymph node samples and no normal prostate samples. We defined this subgroup as the ‘metastatic-subgroup’ and the other (C2) the ‘non-metastatic-subgroup’.

Identifying Metastatic-subgroup Biology

A feature of the metastatic-subgroup was loss of gene expression observed in Gene cluster 1 (G1) (Figure 1A & Supplementary Table 8). To investigate if loss of gene expression was due to epigenetic silencing we measured DNA methylation in 8 metastatic-subgroup and 14 non-metastatic-subgroup samples (Supplementary Table 9). Semi-supervised hierarchical clustering of the methylation data of down-regulated genes (G1) separated the samples into 2 groups (Supplementary Figure 2 & Supplementary Table 10), with 7/8 samples (88%) from the metastatic-subgroup clustering together (M2), and 10/14 samples (71%) from the non-metastatic-subgroup clustering together (M1) (chi-squared, p=0.02). Functional analysis demonstrated that the metastatic-subgroup had higher levels of methylation in genes that negatively regulate pathways known to be involved in aggressive prostate cancer such as WNT and growth signalling (Supplementary Table 11) 25. Together these data suggest that epigenetic silencing is a feature of the metastatic-subgroup and may therefore be important in metastases.
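The association between expression subgroup and methylation cluster can be laid out as a 2x2 table. The sketch below assumes that the samples not counted in the text fall in the other methylation cluster (1 metastatic-subgroup sample in M1, 4 non-metastatic-subgroup samples in M2) and runs a Pearson chi-squared test with one degree of freedom. The exact p-value depends on the test variant, for example whether a continuity correction is applied, so the sketch is not expected to reproduce the reported p=0.02 exactly.

```python
import math

def chi_squared_2x2(table, yates=False):
    """Pearson chi-squared test (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            statistic += diff ** 2 / expected
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: metastatic-subgroup, non-metastatic-subgroup. Columns: M2, M1.
table = [[7, 1], [4, 10]]
for yates in (False, True):
    statistic, p_value = chi_squared_2x2(table, yates=yates)
    print(f"yates={yates}: chi2={statistic:.2f}, p={p_value:.4f}")
```

With cell counts this small, an exact test is a common alternative. The manuscript does not state which variant was used.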

To better understand molecular processes upregulated in the metastatic-subgroup we performed differential gene analysis, identifying 222 genes that were over-expressed. Ingenuity Pathway Analysis (IPA) (www.ingenuity.com) identified 2 up-regulated pathways in the metastatic-subgroup (FDR p<0.05). The ToppGene Suite 26 identified 18 up-regulated pathways (FDR p<0.05) (Supplementary Table 12). These pathways represented mitotic progression and Forkhead Box M1 (FOXM1) pathways. Consistently, FOXM1 was 2.80-fold over-expressed in the metastatic-subgroup.

Development of a Metastatic Assay

Next we developed an assay that could identify metastatic-subgroup tumours (Supplementary Figure 3). Computational classification using PLS-regression resulted in a 70-transcript Metastatic Assay. In the training set, the AUC under CV for detecting the metastatic-subgroup was 99.1 [98.5-99.8]. The standard deviation (SD) in assay scores using 5 separate sections from the same tumour was 0.06, representing 6.9% of the assay range and 100% agreement in assay call. In a secondary training dataset of 75 primary resections, the C-index for detecting the metastatic subgroup was 90.4, with an SD in assay scores using 20 patient samples with technical replicates of 0.02, representing 2.9% of assay range (Supplementary Figure 4).


Importantly, as the assay was trained against a distinct molecular subgroup rather than clinical outcome, there was a bimodal distribution of scores (Supplementary Figure 5). The Metastatic Assay gene list and weightings are listed in Supplementary Table 13.
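A weighted gene list of this kind is typically applied as a linear score that is then compared with a cut-off. The sketch below assumes that form. The three transcripts, their weights, the bias term, and the convention that a score above the cut-off is positive are all hypothetical; only the cut-off value of 0.3613 is quoted from the manuscript. The actual 70 weightings are in Supplementary Table 13 and are not reproduced here.

```python
ASSAY_CUTOFF = 0.3613  # pre-defined cut-off quoted in the manuscript

def assay_score(expression, weights, bias=0.0):
    """Weighted sum of transcript expression values plus a constant."""
    missing = set(weights) - set(expression)
    if missing:
        raise KeyError(f"missing transcripts: {sorted(missing)}")
    return bias + sum(weights[t] * expression[t] for t in weights)

def assay_call(score, cutoff=ASSAY_CUTOFF):
    """Dichotomise a continuous score (assumed: above cut-off = positive)."""
    return "positive" if score > cutoff else "negative"

# Hypothetical weights for three placeholder transcripts (the assay uses 70).
weights = {"transcript_1": 0.20, "transcript_2": 0.10, "transcript_3": -0.15}
sample = {"transcript_1": 2.0, "transcript_2": 1.5, "transcript_3": 0.5}
score = assay_score(sample, weights)
print(round(score, 4), assay_call(score))
```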

Metastatic Assay performance in public datasets

The assay was applied to three independent public prostate cancer resection gene expression datasets. Assay scores were calculated using the partial least squares model and dichotomized into assay-positive and assay-negative (Supplemental Methods). In the first (n=79) 21 the assay was significantly associated with biochemical recurrence with a sensitivity of 70.3% and specificity of 66.7% (Chi-square p=0.0049). In a second (n=545) 17, the assay was significantly associated with metastatic progression with a sensitivity of 67.0% and specificity of 54.6% (Chi-square p<0.0001). Using a third dataset with time to event data (n=126) 22, multivariable analysis adjusting for Gleason (grades represented in four subgroups), age and PSA demonstrated increased risk of biochemical recurrence (HR=3.03 [1.43-6.41]; p=0.0040) (Table 1, Figure 2B). However, possibly due to the small number of metastatic events (11%), the association with metastatic recurrence in multivariable analysis did not reach statistical significance (HR=2.53 [0.67-9.54]; p=0.1735) (Table 1).

Metastatic Assay performance in an independent primary prostate cancer resection dataset

The assay was then applied to 322 FFPE prostatectomy samples from four clinical sites with a median follow-up of 50.3 months using predefined inclusion/exclusion criteria per REMARK guidelines (Supplementary Figure 1). A pre-defined assay cut-off of 0.3613 was used to define Metastatic Assay positivity. On multivariable analysis a positive assay result was associated with increased risk of biochemical recurrence (HR=1.62 [1.13-2.33]; p=0.0092) (Figure 3A and Table 2) and metastatic recurrence (HR=3.20 [1.76-5.80]; p=0.0001) (Table 2 and Figure 3B).
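The hazard ratios, confidence intervals and p-values reported above can be cross-checked against each other. The sketch assumes Wald-type 95% intervals that are symmetric on the log scale, which is the usual output of Cox regression software but is not stated in the manuscript. The function name is hypothetical.

```python
import math

def wald_summary(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), z and a two-sided p-value from an HR and 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    midpoint = math.sqrt(lower * upper)   # centre of the CI on the log scale
    return se, z, p, midpoint

reported = {
    "biochemical recurrence": (1.62, 1.13, 2.33),
    "metastatic recurrence": (3.20, 1.76, 5.80),
}
for endpoint, (hr, lo, hi) in reported.items():
    se, z, p, midpoint = wald_summary(hr, lo, hi)
    print(f"{endpoint}: midpoint={midpoint:.2f}, z={z:.2f}, p={p:.4f}")
```

If the interval is of this type, the geometric mean of its bounds should sit close to the reported HR, and the recovered p-value should be of the same order as the reported one. Small differences are expected because the published figures are rounded.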

Comparison of the Metastatic Assay with clinical risk stratification

To test assay independence from approaches used in the clinic, we assessed its performance within risk groups defined by Gleason score and the CAPRA-S model in the independent resection validation cohort. When separated by Gleason (high risk GS≥4+3 and low risk GS≤3+4) the Metastatic Assay identified patients at higher risk of metastatic recurrence, with HR=2.43 [1.14-5.17]; p=0.0036 and HR=5.61 [1.19-26.47]; p=0.0013 in the high and low risk GS groups respectively (Figure 3C).

The CAPRA-S prognostic model uses PSA at presentation, age, Gleason score, T-stage, seminal vesicle invasion (SVI), extracapsular extension (ECE), lymph node invasion (LNI) and surgical margins 13. In multivariable analysis adjusted for CAPRA-S, both the Metastatic Assay and CAPRA-S were significantly associated with biochemical recurrence (HR=1.72 [1.19-2.48]; p=0.0042 and HR=2.52 [1.79-3.54]; p<0.0001) and development of metastatic disease (HR=2.94 [1.60-5.40]; p=0.0005 and HR=4.76 [2.46-9.23]; p<0.0001) (Table 2). Given the independence of the Metastatic Assay result and CAPRA-S score a combined model was assessed. Patients classified within the high-risk subgroup (Assay Positive and/or CAPRA-S high) were significantly associated with both biochemical and metastatic recurrence (HR=2.67 [1.90-3.75]; p<0.0001 and HR=7.53 [4.13-13.7]; p<0.0001 respectively), demonstrating superiority to either model alone (Figure 4 and Table 2, Combined Model).

To assess the clinical impact of the combined model of Metastatic Assay plus CAPRA-S, additional performance metrics were assessed for the metastatic endpoint in the independent resection validation cohort. As the assay was dichotomous, sensitivity and specificity were compared between the Metastatic Assay alone, CAPRA-S alone and the Combined Model. Whilst the sensitivity of CAPRA-S (70.5%) was greater than that of the Metastatic Assay alone (47.7%), there was an increase in sensitivity to 80.1% when combined in the model. There was, however, a decrease in specificity from 81.9% (Metastatic Assay) and 71.5% (CAPRA-S) to 61.1% in the Combined Model, which may indicate patients who have not yet experienced recurrence within the 50.3-month median follow-up (Supplementary Table 15).
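The direction of these changes follows from how the combined call is constructed. When a patient is labelled high risk if either test flags them, sensitivity can only stay the same or rise and specificity can only stay the same or fall. A minimal sketch on an invented ten-patient cohort illustrates this. The data are hypothetical and do not reproduce the percentages above.

```python
def sens_spec(calls, events):
    """Sensitivity and specificity of binary calls against observed events."""
    tp = sum(c and e for c, e in zip(calls, events))
    fn = sum((not c) and e for c, e in zip(calls, events))
    tn = sum((not c) and (not e) for c, e in zip(calls, events))
    fp = sum(c and (not e) for c, e in zip(calls, events))
    return tp / (tp + fn), tn / (tn + fp)

# Invented toy cohort: True = metastatic recurrence observed.
events = [True, True, True, True, False, False, False, False, False, False]
assay = [True, True, False, False, True, False, False, False, False, False]
capra = [True, False, True, False, False, True, True, False, False, False]
combined = [a or c for a, c in zip(assay, capra)]  # high risk if either flags

for name, calls in [("assay", assay), ("CAPRA-S", capra), ("combined", combined)]:
    sens, spec = sens_spec(calls, events)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```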

Metastatic Assay performance as a continuous predictor of recurrence

A Combined Model of continuous Metastatic Assay scores and CAPRA-S had higher performance for predicting metastatic recurrence, with the highest C-index, HR and AUC compared to either metric alone, within two validation cohorts (MSKCC: AUC=0.88 [0.81-0.93], HR 1.55 [1.26-1.91]; p<0.0001, C-index=0.83 [0.74-0.91] (Supplementary Table 16) and Independent Resection Validation: AUC=0.80 [0.74-0.85], HR 1.66 [1.43-1.93]; p<0.0001, C-index=0.82 [0.76-0.86] (Supplementary Table 17)). The Metastatic Assay is an independent predictor of both biochemical and metastatic recurrence when assessed as a continuous variable in multivariable analysis in two validation datasets (MSKCC: HR 2.00 [1.24-3.24]; p=0.0050 and HR 2.99 [1.10-8.17]; p=0.0334 and Independent Resection Cohort: HR 1.16 [1.03-1.30]; p=0.0155 and HR 1.52 [1.24-1.85]; p<0.0001 (per 0.1 unit change in assay score)) (Supplementary Table 18).
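The C-index quoted above measures how often a higher score goes with earlier recurrence among pairs of patients that can be ordered. A minimal sketch of Harrell's C-index for right-censored data follows, using invented follow-up times, event flags and scores. It shows the general definition and is not the software used by the authors.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored data.

    A pair is comparable when the subject with the shorter follow-up time
    had an event. It is concordant when that subject also has the higher score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable

# Invented follow-up (months), event flags (1 = metastatic recurrence), scores.
times = [12, 30, 45, 50, 60]
events = [1, 1, 0, 1, 0]
scores = [0.80, 0.35, 0.40, 0.55, 0.20]
print(concordance_index(times, events, scores))
```

A value of 0.5 corresponds to chance ordering and 1.0 to perfect ordering.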


DISCUSSION

The majority of early prostate cancer patients treated by radical resection are cured. However, up to 25% of patients develop metastatic disease within 15 years 1,2. In surveillance for low/intermediate risk disease there is concern about risks of clinical under-grading and disease progression, with a proportion of patients needing treatment within 5 years 3. This engenders clinical uncertainty in modern practice in two key areas: firstly in the appropriate and safe selection of patients for active surveillance, particularly in the Gleason 3+4 intermediate group, and secondly in patients undergoing radical local treatment for intermediate and higher grade tumours, where adjuvant loco-regional and systemic treatment may improve outcome. A test which helps to select patients at higher risk of progression in these settings will have significant clinical utility.

Several prognostic gene expression assays have been developed by comparing gene expression data between good and poor outcome patients 16-18. In contrast, we identified a molecular subgroup of primary prostate cancer samples that shared biology with metastatic disease. We developed an assay for this molecular subgroup which identified patients at risk of biochemical and metastatic recurrence in three publicly available and one prospectively collected multicentre dataset.

Consistent with the molecular subgroup representing metastatic biology, the assay was better at predicting metastatic progression than biochemical recurrence. The latter does not necessarily predict metastatic development; only one third of patients with biochemical recurrence develop measurable metastatic disease 8 years after resection 27. In addition, the HR of 3.20 for metastatic recurrence compares favourably to the reported hazard ratios for other prognostic assays to predict metastatic disease, with HRs ranging between 1.40 and 3.30 16-18. A significant feature of assay performance was independence from CAPRA-S, allowing the development of a combined risk model with superior performance to either CAPRA-S or the Metastatic Assay individually.

An interesting feature of the metastatic-subgroup was methylation and loss of expression of genes such as OLFM4, known to inhibit metastatic processes including WNT signalling 28. It is therefore possible that novel therapies aimed at reversing epigenetic silencing or targeting WNT signalling may act against the metastatic biology in this molecular subgroup 29. Regarding up-regulated genes in the metastatic-subgroup, a significant proportion were regulated by FOXM1, known to promote prostate cancer progression 30. Indeed, others have found increased FOXM1 gene expression to be prognostic and have included it in a 31-gene expression assay 16. Interestingly only 6/70 genes in the Metastatic Assay overlapped with 3 prognostic signatures that are entering clinical practice (AZGP1 18, PTTG1, TK1 and KIF11 16, ANO7 and MYBPC1 17): GenomeDx (p=0.06), GHI (p=0.16) and Myriad (p=0.06) after multiple test correction using a Benjamini-Hochberg correction, likely reflecting the distinct approach of molecular subtyping versus trained endpoint analysis (Supplementary Figure 6).
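The Benjamini-Hochberg correction referred to above adjusts a set of p-values to control the false discovery rate. A minimal sketch of the standard step-up procedure follows. The raw p-values are invented for illustration and are not the values behind the overlap results reported in the text.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20]  # invented raw p-values for four comparisons
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])
```

Because of the monotonicity step, neighbouring comparisons can end up with identical adjusted values, as the second and third entries do here.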

A potential limitation of this study is the retrospective validation of the assay in historic datasets. Diagnostic and surgical approaches have improved with time, which may reduce disease recurrence. We expect, however, that the effect of these improvements would mostly be on local recurrence whereas this assay has been developed to predict metastatic disease progression, likely largely beyond surgical control at presentation.

CONCLUSIONS

We have identified a molecular subgroup of primary prostate cancer with metastatic

capacity. We hypothesize that this molecular subtyping approach may improve the

stratification of patients considering active surveillance and may benefit patients with

higher risk clinically localized disease by focusing loco-regional and systemic adjuvant

therapy in those at highest risk of regional and systemic failure.

AUTHOR CONTRIBUTIONS

Study concept and design: Walker, Harkin and Kennedy.

Acquisition of data: Walker, Knight, Logan, Blayney, McCavigan, Price and Jellema.

Analysis and interpretation of data: Walker, Knight and Kennedy.

Writing of the manuscript: Walker, Logan, Knight, Clarke and Kennedy.

Critical revision of the manuscript for important intellectual content: Waugh, Mills, Neal,

Clarke and Harkin.

Obtaining funding: Kennedy, Harkin.

Administrative, technical or material support: Sherif, Warren, Neal, Berge, Svindland,

Pandha, Mason, McDade, Watson, Davidson, Uprichard and Kay.

ACKNOWLEDGEMENTS

We acknowledge the Welsh Cancer Biobank/Cardiff University Health, Irish Prostate

Cancer Research Consortium Biobank, the Northern Ireland Biobank and The Prostate

Biobank associated with Oslo University Hospital, along with the members of their tissue

acquisition teams. In particular, we thank E. Smith (University of Surrey) and L. Spary

(Welsh Cancer Bank) for their support in acquiring samples and corresponding clinical

data from the clinical sites. We would also like to thank J. Fay (RCSI, Beaumont Hospital)

for continued support and guidance with pathology. In addition, this work was

supported by the Belfast-Manchester Movember Centre of Excellence (CE013_2-004),

funded in partnership with Prostate Cancer UK (Waugh, Clarke and Mills) and by

European Regional Development Fund through Invest Northern Ireland (INI), Ref:

RD1208001 and RD0115336.

SUPPLEMENTARY METHODS

Patients

The Discovery cohort was collected in collaboration with two clinical sites, University of

Cambridge and the Karolinska Institute. These samples were anonymous without clinical

outcome and were solely used for molecular subgroup identification. Samples ranged in

type with 56% primary tumours, 17% primary tumours with concomitant metastases,

8% known metastatic disease and 19% normal. In total, 44% of patients had a high

Gleason score (>7), 19% had a Gleason score of 7 and the remaining patients were either <7 or

unknown (Supplementary Table 2). Of the Discovery cohort, 22 patient samples were

selected for methylation analysis. In total, 36% (8/22 patients) were ‘Assay Positive’

within the C1 subgroup of metastatic biology, with 72% harboring a Gleason score >7

(Supplementary Table 9). Within the secondary training dataset, 7% had a Gleason score of

<7, with 77% Gleason 7 and 16% Gleason >7. The median pre-operative PSA level was 7.7

ng/ml and the median age was 58 years. The first in silico dataset (GSE25136) comprised

79 patients with a 53% to 47% split of non-recurrence to recurrence patients.

Overall, 56% of patients had a resection Gleason score of 7, with two equal proportions

of patients (each 22%) having a Gleason score of either <7 or >7. The median pre-operative PSA

level was 7.6 ng/ml and the median age was 61.2 years (Supplementary Table 4). An

additional in silico dataset (GSE46691) comprised 545 patients in total, of whom 39%

had known metastatic progression. Exactly half the population had a Gleason score of 7, with

38% >7 and 12% <7 (Supplementary Table 5). The final in silico dataset (GSE21034)

consisted of a total of 126 patients, of whom 25% and 11% had biochemical recurrence

and metastatic progression, respectively. Overall, 57% of patients had a resection

Gleason score of 7, with 11% >7 and 32% <7. The median pre-operative PSA level was

5.92 ng/ml and the median age was 57.6 years (Supplementary Table 6).

Prior to acquiring the retrospective validation cohort a power calculation was

performed using a hazard ratio of 2.0. Using preliminary data, we estimated that the

metastatic group (Assay Positive) is approximately 30% of the population, with a

recurrence rate of 40%. A cohort of 263 patients with approximately 105 recurrence

events would therefore give a study power of approximately 90% at a significance level of 0.05. The

retrospective validation cohort was collected in collaboration with four clinical sites,

University Hospital of Oslo, Wales Cancer Bank, University of Surrey and the Irish

Prostate Cancer Research Consortium. Samples ranged across recurrence subgroups

with 53% non-recurrence, 32% biochemical recurrence and 15% known metastatic

progression. Median time to recurrence event was 12 months (biochemical) and 3

months (metastatic). In total 17% of patients had a high Gleason score >7, with 61%

having a Gleason 7 and the remaining patients either <7 or unknown. The majority of

patients (99%) had a pathological T-stage of either T2 or T3. The median pre-operative PSA

level was 8.4 ng/ml and the median age was 62 years. Seminal vesicle invasion (19%),

lymph node invasion (5%), extracapsular extension (30%) and positive surgical

margins (32%) were also appropriately represented across the validation cohort

(Supplementary Table 7).
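The power estimate for the validation cohort can be approximated with Schoenfeld's formula for a two-group log-rank comparison. The sketch below is illustrative only: the formula or software actually used for the calculation is not stated here, and the function name, the two-sided test and the rounding of the event count are assumptions.

```python
from math import log, sqrt
from statistics import NormalDist


def schoenfeld_power(events, hazard_ratio, prop_positive, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld's formula).

    events        -- expected number of recurrence events
    hazard_ratio  -- hazard ratio to be detected
    prop_positive -- fraction of patients in the Assay Positive group
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Standardised effect size for this number of events and group split
    effect = sqrt(events * prop_positive * (1 - prop_positive)) * abs(log(hazard_ratio))
    return std_normal.cdf(effect - z_alpha)


# Inputs quoted in the text: 263 patients, 40% recurrence, 30% Assay Positive, HR 2.0
n_patients = 263
expected_events = round(n_patients * 0.40)  # approximately 105 events
power = schoenfeld_power(expected_events, hazard_ratio=2.0, prop_positive=0.30)
print(f"events={expected_events}, power={power:.2f}")
```

Under these assumptions the approximation returns a power close to 0.90, in line with the figure quoted for the cohort.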

Whilst overarching ethical approval was obtained, additional clinical site ethical

approval was also obtained from collaborators, namely The Prostate Biobank Oslo and

the Irish Prostate Cancer Research Consortium Biobank/Mater Misericordiae University

Hospital ethics committee.

Molecular Profiling of Prostate Cancer samples

Samples were pathology reviewed to identify the most dominant Gleason grade within

the tumour for macrodissection. Total RNA was extracted from 2x10 µm

macrodissected FFPE tissue slides using the Roche High-Pure RNA Paraffin Kit (Roche

Diagnostics GmbH, Mannheim, Germany). RNA was converted into cDNA, amplified and

converted into single-stranded form using SPIA® technology of the WT-Ovation™ FFPE

RNA Amplification System (NuGEN Technologies Inc., San Carlos, CA, USA). Amplified

cDNA was fragmented, biotin-labelled using FL-Ovation™ cDNA Biotin Module (NuGEN

Technologies Inc.), and hybridized to the Almac Prostate Cancer DSA™. Arrays were

scanned using the Affymetrix GeneChip® Scanner 7G (Affymetrix Inc., Santa Clara, CA,

USA). Stratagene Universal Human Reference (UHR) samples and ES-2 cell lines were

used as process controls.

Methylation Profiling of Prostate Cancer samples

For the 22 patients, 8 metastatic-subgroup and 14 non-metastatic-subgroup, DNA was

extracted using RecoverAll (Life Technologies). Genomic DNA (800 ng) was treated with

sodium bisulfite using the Zymo EZ DNA Methylation Kit™ (Zymo Research, Orange,

CA, USA) according to the manufacturer’s procedure, with the alternative incubation

conditions recommended when using the Illumina Infinium Methylation Assay. The

methylation assay was performed on 4 μl bisulfite-converted genomic DNA at 50 ng/μl

according to the Infinium HD Methylation Assay protocol. Samples were processed onto

Illumina 450k arrays as per manufacturer’s procedures.

Data preparation & Quality Control (QC)

Microarray Data

Samples were pre-processed using the Robust Multi-array Average (RMA) methodology

30. The QC assessment comprised a combination of the following quality metrics

including array image analysis, GeneChip QC, principal components analysis (PCA) and

intensity distribution analysis. Array data was examined to identify any image artefacts.

As part of the GeneChip QC, percent present (%P), average signal absent, scale factor,

average background and raw Q were all assessed. Samples with a %P<15% were

deemed a QC fail. Hotelling’s T² and residual Q methods were used to identify

sample outliers at the expression level within the PCA. Finally, the Kolmogorov-Smirnov

statistic 31 was used to examine the intensity distribution of the samples and

identify outliers.
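Two of the QC checks described above, the %P threshold and the Kolmogorov-Smirnov comparison of intensity distributions, can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: comparing each array against the pooled intensities of all arrays, the `max_ks` cut-off and all function names are assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def qc_flags(arrays, percent_present, min_percent_present=15.0, max_ks=0.5):
    """Return {sample: [reasons]}; an empty list means the sample passed both checks.

    arrays          -- {sample: list of probe set intensities}
    percent_present -- {sample: %P value}
    max_ks          -- hypothetical outlier cut-off, not taken from the text
    """
    pooled = [v for values in arrays.values() for v in values]
    flags = {}
    for name, values in arrays.items():
        reasons = []
        if percent_present[name] < min_percent_present:
            reasons.append("low %P")
        if ks_statistic(values, pooled) > max_ks:
            reasons.append("intensity outlier")
        flags[name] = reasons
    return flags


# Toy intensities: s2 fails on %P, s3 has a shifted intensity distribution
arrays = {
    "s1": [5.0, 6.0, 7.0, 8.0, 9.0],
    "s2": [5.1, 6.1, 7.1, 8.1, 9.1],
    "s3": [10.0, 11.0, 12.0, 13.0, 14.0],
}
percent_present = {"s1": 40.0, "s2": 12.0, "s3": 35.0}
flags = qc_flags(arrays, percent_present)
print(flags)
```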

Methylation Data

Raw data were processed using the R package “lumi”, which was used to

correct any colour bias and normalise the processed data using a quantile approach.

Uncorrected β-values were extracted using the same software.

Hierarchical clustering

Genes were ranked based on variance and intensity (variance high → low; intensity high

→ low). A two-step process was implemented to determine the optimal data matrix size

to be used for subgroup identification (unsupervised analysis). Firstly, the most stable

number of sample groups was identified. Secondly, the optimal number of genes leading

to the identified number of sample groups was determined. The most stable number of

sample groups was identified by subsetting the ranked and sorted data matrix into 50

sub-matrices in increments of 100 genes (maximum 5000 genes). The GAP statistic 21 was run to

determine the optimal number of sample groups in each of these sub-matrices. This

index gives an indication of the within-cluster tightness and between-cluster

separateness. The smallest number of genes generating the optimal sample cluster

number was selected as the list of most variable genes to take forward for unsupervised

subgroup identification. For the purpose of clustering, the data matrices were

standardized to the median value of gene expression. Standardization of the data allows

the comparison of different genes’ expression levels, which may not necessarily be on

the same scale or at the same intensity levels. Following standardization, 2-dimensional

hierarchical clustering was performed (samples x genes). Euclidean distance was used

to calculate the distance matrix, which is a multidimensional matrix representing the

distance from each data point (gene-sample pair) to all the other data points. Ward’s

linkage method was subsequently applied to join the samples and genes together, based

on the calculated distance matrix. In order to determine the optimal number of sample

clusters and gene clusters, the GAP statistic was calculated for a range of potential

clusters.
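The data preparation steps that precede clustering (gene ranking, nested sub-matrices in increments of 100 genes, and median standardization) can be sketched as below. The way the variance and intensity ranks are combined is not specified in the text, so summing the two ranks is an assumption, as are all function names. The GAP statistic and Ward clustering themselves are not reproduced here.

```python
from statistics import mean, median, pvariance


def rank_genes(expr):
    """Order genes by combined variance and intensity rank (high to low).

    expr -- {gene: list of expression values across samples}
    """
    genes = list(expr)
    by_var = sorted(genes, key=lambda g: pvariance(expr[g]), reverse=True)
    by_int = sorted(genes, key=lambda g: mean(expr[g]), reverse=True)
    var_rank = {g: i for i, g in enumerate(by_var)}
    int_rank = {g: i for i, g in enumerate(by_int)}
    # Assumption: the two ranks are simply summed
    return sorted(genes, key=lambda g: var_rank[g] + int_rank[g])


def nested_gene_sets(ranked, step=100, max_genes=5000):
    """Nested gene lists of size step, 2*step, ... up to max_genes."""
    limit = min(max_genes, len(ranked))
    return [ranked[:n] for n in range(step, limit + 1, step)]


def median_centre(values):
    """Standardize one gene's expression values to its median."""
    m = median(values)
    return [v - m for v in values]


# Toy example with three genes and three samples
toy_expr = {"geneA": [1.0, 1.0, 1.0], "geneB": [5.0, 9.0, 13.0], "geneC": [2.0, 3.0, 4.0]}
toy_ranked = rank_genes(toy_expr)

# With at least 5000 ranked genes, the scheme yields 50 nested sub-matrices
gene_sets = nested_gene_sets([f"g{i}" for i in range(6000)])
print(toy_ranked, len(gene_sets))
```

Each nested gene list would then be passed to the GAP statistic to find the number of sample groups it supports, and the smallest list giving the most stable number would be taken forward.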

Functional enrichment analysis was performed to determine the significance of each

gene cluster. Enrichment analysis consisted of the comparison of the gene list of interest

to other gene lists of known function grouped according to the GO classification

“Biological processes” (entities). Entities were ranked according to a statistically

derived enrichment score 32 and adjusted for multiple testing 34, thereby measuring the

likelihood that the association between the gene set of interest and a

given process is due to chance.

Differential Expression Analysis

The pre-processed data was filtered to remove all Affymetrix AFFX control probe sets

and uninformative probe sets whose expression resides in the background noise region

(background filtering). Background filtering was performed based on a combination of

the expression and the variance of individual probe sets. Expression selects those probe

sets whose average expression is above the threshold defined by σBg at the user-specified

significance level α. The variance selects those probe sets whose variance is

above the variance of the background σBg. A t-test was performed on the reliably

detected probe set list to establish the variance contributions of the factor of interest

(cluster). Multiple test correction (MTC) was applied using False Discovery Rate (FDR,

33). Data were filtered based on a fold change greater than 2 and an adjusted p-value below

0.05. Functional enrichment analysis was performed on the resulting gene list to

provide insight into pathways associated with the genes in the list. Using the commercial

software IPA, functional enrichment analysis was conducted to identify and rank

biological entities which are found to be associated with the gene sets of interest 32.

Entities have been ranked according to a statistically derived enrichment score 34 and

adjusted for multiple testing 33 (pFDR threshold <

0.05), thereby measuring the likelihood that the association between the gene set of interest and a given

process or pathway is due to chance.
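The filtering rule used in the differential expression analysis (fold change greater than 2 with an FDR-adjusted p-value threshold of 0.05) can be illustrated with a small Benjamini-Hochberg implementation. This is a sketch under stated assumptions: fold change is taken to be a signed value whose magnitude is compared with the threshold, and the probe set identifiers and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_probesets(results, min_fold=2.0, max_fdr=0.05):
    """Keep probe sets passing both the fold change and the FDR filter.

    results -- list of (probe_set_id, fold_change, raw_p_value);
               fold_change is assumed signed, so its magnitude is tested
    """
    adj = benjamini_hochberg([p for _, _, p in results])
    return [
        (pid, fc, q)
        for (pid, fc, _), q in zip(results, adj)
        if abs(fc) > min_fold and q < max_fdr
    ]


# Hypothetical t-test results for four probe sets
toy_results = [
    ("ps1", 3.1, 0.001),
    ("ps2", -2.5, 0.02),
    ("ps3", 1.4, 0.03),
    ("ps4", 4.0, 0.5),
]
selected = select_probesets(toy_results)
print([pid for pid, _, _ in selected])
```

In the toy data, ps3 is dropped by the fold change filter and ps4 by the adjusted p-value filter.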

Signature Generation

The following steps summarize the procedure for developing the gene signature:

1. Cross-validation : The samples were randomly split into 5 cross-validation (CV)

folds for signature training/testing, and this was repeated 10 times to allow an

unbiased estimation of the model performance.

2. Pre-processing : RMA background correction of the data at the probe intensity

level, followed by a median summary of the intensities of probes to probe sets

and subsequently probe sets to Entrez gene ID. The Entrez gene level

summarized data matrix was log2 transformed and quantile normalized. Note

that samples in the CV test set were normalized using a quantile normalisation

model from the corresponding CV training set to ensure that all estimates of

model performance are based on signature scores pre-processed on a per sample

basis.

3. Filtering : A gene filter was applied before model development to remove 75

percent of genes with low variance and low intensity.

4. Machine Learning : Partial Least Squares (PLS) was used to train the algorithm

against the “metastatic-subgroup” endpoint.

5. Feature Selection : A wrapper based method for feature selection was

implemented, where genes (those remaining after the initial filter) are ranked

using the respective weights defined by the PLS algorithm and 10 percent of

genes with the lowest absolute weights are removed. This process is repeated

after each round of feature elimination (within cross validation) where the genes

are re-ranked in order to determine the genes with the lowest absolute weights

and removing 10 percent each time until only 2 genes remained.

6. Interim validation data set : Five separate sections across an FFPE tumour block

were profiled in order to evaluate the impact of biological heterogeneity on the

signature score. A secondary training dataset of 75 samples of which 20 were

profiled in duplicate were additionally used to guide signature selection.

Signature scores for each of these sections were calculated under CV alongside

each CV test set.
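The resampling and feature elimination steps in the list above can be sketched as follows. The model fitting itself is left as a hypothetical callable, `fit_weights`, standing in for the PLS model; the function names, the random seed and the rounding of the 10 percent drop are assumptions rather than details taken from the study.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for f in range(n_folds):
            test = order[f::n_folds]
            test_set = set(test)
            train = [i for i in order if i not in test_set]
            yield train, test


def eliminate_features(genes, fit_weights, drop_fraction=0.10, min_genes=2):
    """Recursive elimination: refit, drop the lowest |weight| genes, repeat.

    fit_weights -- hypothetical callable returning {gene: weight} for a gene list
    Returns the gene list at every signature length visited.
    """
    current = list(genes)
    path = [current]
    while len(current) > min_genes:
        weights = fit_weights(current)
        ranked = sorted(current, key=lambda g: abs(weights[g]), reverse=True)
        n_drop = max(1, int(len(ranked) * drop_fraction))
        n_keep = max(min_genes, len(ranked) - n_drop)
        current = ranked[:n_keep]
        path.append(current)
    return path


# Toy weights: gene g<i> has weight i, so low-numbered genes are removed first
toy_weights = {f"g{i}": float(i) for i in range(20)}
path = eliminate_features(list(toy_weights), lambda genes: {g: toy_weights[g] for g in genes})
splits = list(repeated_kfold(n_samples=23))
print(len(splits), [len(p) for p in path])
```

In a full implementation the elimination would run inside each training split, so that each candidate signature length is scored only on samples not used to rank its genes.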

Model selection included the following steps:

1. Evaluating the Area under the Receiver Operating Characteristic (ROC) Curve

(AUC) in the training data and C-index performance in the secondary training

dataset under cross validation.

2. Evaluating the variability in signature scores across the five separate sections of

FFPE material which were predicted under CV. The variability was determined

by calculating the standard deviation (SD) of the signature scores across the five

samples and expressing the SD as a fraction of the signature score range (i.e.

calculating a percent SD).

3. Evaluating the variability in signature scores across 20 patients with technical

replicates which were predicted under CV. The variability was determined by

calculating the pooled standard deviation (SD) of the signature scores across the

20 patient technical replicates and expressing the SD as a fraction of the

signature score range (i.e. calculating a percent SD).
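The two variability measures in steps 2 and 3 can be written out as a short sketch. It assumes that the signature score range is supplied as a (minimum, maximum) pair and that the pooled SD is the usual degrees-of-freedom-weighted combination of within-patient variances; both are assumptions, and the scores shown are invented.

```python
from math import sqrt
from statistics import stdev, variance


def percent_sd(section_scores, score_range):
    """SD of repeated sections from one block, as a percentage of the score range."""
    low, high = score_range
    return 100.0 * stdev(section_scores) / (high - low)


def pooled_percent_sd(replicate_scores, score_range):
    """Pooled SD across patients' technical replicates, as a percentage of the range.

    replicate_scores -- list of per-patient score lists (two or more replicates each)
    """
    low, high = score_range
    num = sum((len(r) - 1) * variance(r) for r in replicate_scores)
    den = sum(len(r) - 1 for r in replicate_scores)
    return 100.0 * sqrt(num / den) / (high - low)


# Hypothetical scores: five sections of one block, and duplicates for two patients
section_pct = percent_sd([0.30, 0.32, 0.31, 0.29, 0.33], score_range=(0.0, 1.0))
replicate_pct = pooled_percent_sd([[0.2, 0.4], [0.5, 0.5]], score_range=(0.0, 1.0))
print(round(section_pct, 2), round(replicate_pct, 2))
```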

The signature length that yielded a high AUC in the training set, a high C-index in the

secondary training set and low SDs in both the reproducibility samples and clinical

technical replicates was selected. Following migration of the Metastatic Assay to a

platform with an improved chemistry (NuGEN Ovation FFPE WTA V3), a technical bias

adjustment was applied to the assay threshold which was used to dichotomize assay

scores for the resection clinical validation cohort.

Generation of Metastatic Assay Scores

Probeset expression was summarized to an Entrez Gene ID level using the median

value. Assay scores were calculated using the partial least squares model:

\[ \text{Signature score} = \sum_i w_i (x_i - b_i) + k \]

where w_i is the weight of each Entrez gene, x_i is the gene expression, b_i is the Entrez

gene-specific bias and k = 0.4365. Assay calls were assigned based upon a predefined cut-off for

resection (0.3613). Samples with a continuous score result > cut-off were labelled ‘assay

positive’ otherwise ‘assay negative’.
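A minimal sketch of the scoring step is given below. It assumes that the partial least squares score combines the terms as the sum over genes of w_i (x_i - b_i), plus k, which is an assumption about how w, x, b and k enter the model. The gene IDs, probe set names, weights and biases are invented for illustration and are not the assay's values; only k and the cut-off are taken from the text.

```python
from statistics import median

K = 0.4365        # constant reported in the text
CUT_OFF = 0.3613  # predefined resection cut-off reported in the text


def summarise_to_genes(probeset_expr, probeset_to_gene):
    """Median of probe set values per Entrez Gene ID."""
    by_gene = {}
    for probeset, value in probeset_expr.items():
        by_gene.setdefault(probeset_to_gene[probeset], []).append(value)
    return {gene: median(values) for gene, values in by_gene.items()}


def signature_score(gene_expr, weights, biases, k=K):
    """Assumed form: sum_i w_i * (x_i - b_i) + k."""
    return sum(w * (gene_expr[g] - biases[g]) for g, w in weights.items()) + k


def assay_call(score, cut_off=CUT_OFF):
    return "assay positive" if score > cut_off else "assay negative"


# Toy two-gene model (hypothetical values)
toy_weights = {"1001": 0.05, "1002": -0.02}
toy_biases = {"1001": 6.0, "1002": 7.0}
toy_probesets = {"p1": 7.0, "p2": 9.0, "p3": 8.0}
toy_mapping = {"p1": "1001", "p2": "1001", "p3": "1002"}

gene_expr = summarise_to_genes(toy_probesets, toy_mapping)
score = signature_score(gene_expr, toy_weights, toy_biases)
call = assay_call(score)
print(round(score, 4), call)
```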

Methylation Data Analysis

The data were transformed using the logit transformation and negative values were

corrected for by adding a constant of 10 to the data matrix. Semi-supervised hierarchical

cluster analysis was performed on the methylation data using the genes in G1 (gene

cluster 1), which were under-expressed in the metastatic biology subgroup relative to

the non-metastatic biology subgroup. Gene symbols were mapped to methylation probe

IDs using the HumanMethylation450_15017482_v1-2 annotation, and probe values were then

summarized to gene level using the median. Functional enrichment analysis of the gene

clusters was performed as previously described for the microarray analysis.
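The transformation and gene-level summary of the methylation data can be sketched as below. The base-2 logarithm, the clipping of values away from 0 and 1, and the probe IDs are assumptions; the offset of 10 and the median summary follow the description above.

```python
from math import log2
from statistics import median

OFFSET = 10.0  # constant added so that typical transformed values are non-negative


def logit_transform(beta, eps=1e-6, offset=OFFSET):
    """log2(beta / (1 - beta)) + offset, with beta clipped away from 0 and 1.

    The base-2 logarithm and the clipping value `eps` are assumptions.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return log2(b / (1.0 - b)) + offset


def gene_level_methylation(probe_values, probe_to_gene):
    """Median of transformed probe values per gene symbol."""
    by_gene = {}
    for probe, beta in probe_values.items():
        by_gene.setdefault(probe_to_gene[probe], []).append(logit_transform(beta))
    return {gene: median(vals) for gene, vals in by_gene.items()}


# Hypothetical probes mapped to one gene
toy_betas = {"cg0001": 0.2, "cg0002": 0.5, "cg0003": 0.8}
toy_annotation = {"cg0001": "OLFM4", "cg0002": "OLFM4", "cg0003": "OLFM4"}
gene_values = gene_level_methylation(toy_betas, toy_annotation)
print(gene_values)
```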

Performance analysis of the Metastatic Assay as a continuous predictor

Predicted score outputs from the Metastatic Assay were transformed to a continuous

scale between 0 and 1 using the overall range of scores, i.e. Score_i = (X_i – min(X)) /

(max(X) – min(X)). A combined model of continuous Metastatic Assay scores with

CAPRA-S was developed under cross-validation to reduce bias in performance estimates. Cox

proportional hazards regression method was used to estimate the univariate and

multivariable hazard ratios (HRs) (incorporating Gleason, Age & iPSA) of the continuous

Metastatic Assay scores, CAPRA-S and the combined model scores. Area under the

receiver-operating characteristic curve (AUC) and concordance Index (C-Index)

performance metrics were also calculated to determine significance for prediction of

biochemical and metastatic outcomes. Net benefit scores were determined across risk

thresholds ranging from 0-40% for metastatic events using decision curve analysis.
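The score rescaling and the net benefit calculation underlying decision curve analysis can be illustrated as follows. This sketch uses the binary-outcome form of net benefit, whereas an analysis of time-to-event data would normally use survival-based estimates of the event rates; the input risks are assumed to be predicted event probabilities, and all values are invented.

```python
def min_max_scale(scores):
    """Rescale scores to the 0-1 interval using the overall range."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def net_benefit(risks, events, threshold):
    """Decision-curve net benefit at one risk threshold (binary-outcome form).

    risks  -- predicted probabilities of a metastatic event
    events -- 1 if the event occurred, else 0
    """
    n = len(risks)
    tp = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 1)
    fp = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical raw scores, predicted risks and outcomes
scaled = min_max_scale([2.0, 3.0, 6.0])
toy_risks = [0.1, 0.2, 0.3, 0.8]
toy_events = [0, 0, 1, 1]
curve = {t: net_benefit(toy_risks, toy_events, t) for t in (0.05, 0.15, 0.25, 0.40)}
print(scaled, curve)
```

A model adds value at a given threshold when its net benefit exceeds that of treating all patients or treating none.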

REFERENCES

1. Wilt TJ, Brawer MK, Jones KM, et al. Radical prostatectomy versus observation for localized prostate cancer. N Engl J Med. 2012;367(3):203-213.

2. Bill-Axelson A, Holmberg L, Garmo H, et al. Radical prostatectomy or watchful waiting in early prostate cancer. N Engl J Med. 2014;370(10):932-942.

3. Klotz L, Vesprini D, Sethukavalan P, et al. Long-term follow-up of a large active surveillance cohort of patients with prostate cancer. J Clin Oncol. 2015;33(3):272-277.

4. Bader P, Burkhard FC, Markwalder R, Studer UE. Is a limited lymph node dissection an adequate staging procedure for prostate cancer? J Urol. 2002;168(2):514-518; discussion 518.

5. Roach M, 3rd, DeSilvio M, Lawton C, et al. Phase III trial comparing whole-pelvic versus prostate-only radiotherapy and neoadjuvant versus adjuvant combined androgen suppression: Radiation Therapy Oncology Group 9413. J Clin Oncol. 2003;21(10):1904-1911.

6. Abdollah F, Gandaglia G, Suardi N, et al. More extensive pelvic lymph node dissection improves survival in patients with node-positive prostate cancer. Eur Urol. 2015;67(2):212-219.

7. Zapatero A, Guerrero A, Maldonado X, et al. High-dose radiotherapy with short-term or long-term androgen deprivation in localised prostate cancer (DART01/05 GICOR): a randomised, controlled, phase 3 trial. Lancet Oncol. 2015;16(3):320-327.

8. James ND, Sydes MR, Clarke NW, et al. Addition of docetaxel, zoledronic acid, or both to first-line long-term hormone therapy in prostate cancer (STAMPEDE): survival results from an adaptive, multiarm, multistage, platform randomised controlled trial. Lancet. 2016;387(10024):1163-1177.

9. Cooperberg MR, Lubeck DP, Meng MV, Mehta SS, Carroll PR. The changing face of low-risk prostate cancer: trends in clinical presentation and primary management. J Clin Oncol. 2004;22(11):2141-2149.

10. Bolla M, van Poppel H, Tombal B, et al. Postoperative radiotherapy after radical prostatectomy for high-risk prostate cancer: long-term results of a randomised controlled trial (EORTC trial 22911). Lancet. 2012;380(9858):2018-2027.

11. Makarov DV, Sanderson H, Partin AW, Epstein JI. Gleason score 7 prostate cancer on needle biopsy: is the prognostic difference in Gleason scores 4 + 3 and 3 + 4 independent of the number of involved cores? J Urol. 2002;167(6):2440-2442.

12. Cooperberg MR, Pasta DJ, Elkin EP, et al. The University of California, San Francisco Cancer of the Prostate Risk Assessment score: a straightforward and reliable preoperative predictor of disease recurrence after radical prostatectomy. J Urol. 2005;173(6):1938-1942.

13. Cooperberg MR, Hilton JF, Carroll PR. The CAPRA-S score: A straightforward tool for improved prediction of outcomes after radical prostatectomy. Cancer. 2011;117(22):5039-5046.

14. Khor LY, Bae K, Paulus R, et al. MDM2 and Ki-67 predict for distant metastasis and mortality in men treated with radiotherapy and androgen deprivation for prostate cancer: RTOG 92-02. J Clin Oncol. 2009;27(19):3177-3184.

15. Cuzick J, Yang ZH, Fisher G, et al. Prognostic value of PTEN loss in men with conservatively managed localised prostate cancer. Br J Cancer. 2013;108(12):2582-2589.

16. Cuzick J, Swanson GP, Fisher G, et al. Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study. Lancet Oncol. 2011;12(3):245-255.

17. Erho N, Crisan A, Vergara IA, et al. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy. PLoS One. 2013;8(6):e66855.

18. Klein EA, Cooperberg MR, Magi-Galluzzi C, et al. A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling. Eur Urol. 2014;66(3):550-560.

19. Shipitsin M, Small C, Choudhury S, et al. Identification of proteomic biomarkers predicting prostate cancer aggressiveness and lethality despite biopsy-sampling error. Br J Cancer. 2014;111(6):1201-1212.

20. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature. 2000;406(6797):747-752.

21. Glinsky GV, Glinskii AB, Stephenson AJ, Hoffman RM, Gerald WL. Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest. 2004;113(6):913-923.

22. Taylor BS, Schultz N, Hieronymus H, et al. Integrative genomic profiling of human prostate cancer. Cancer Cell. 2010;18(1):11-22.

23. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001;63(2):411-423.

24. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81(3):515-526.

25. Kypta RM, Waxman J. Wnt/beta-catenin signalling in prostate cancer. Nat Rev Urol. 2012;9(8):418-428.

26. Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37(Web Server issue):W305-311.

27. Pound CR, Partin AW, Eisenberger MA, Chan DW, Pearson JD, Walsh PC. Natural history of progression after PSA elevation following radical prostatectomy. JAMA. 1999;281(17):1591-1597.

28. Li H, Liu W, Chen W, Zhu J, Deng CX, Rodgers GP. Olfactomedin 4 deficiency promotes prostate neoplastic progression and is associated with upregulation of the hedgehog-signaling pathway. Sci Rep. 2015;5:16974.

29. Thibault A, Figg WD, Bergan RC, et al. A phase II study of 5-aza-2'deoxycytidine (decitabine) in hormone independent metastatic (D2) prostate cancer. Tumori. 1998;84(1):87-89.

30. Aytes A, Mitrofanova A, Lefebvre C, et al. Cross-species regulatory network analysis identifies a synergistic interaction between FOXM1 and CENPF that drives prostate cancer malignancy. Cancer Cell. 2014;25(5):638-651.

FIGURE & TABLE LEGENDS

Table 1 – Multivariable analysis of the MSKCC cohort for biochemical recurrence (right)

and metastatic progression (left), p-values, hazard ratios (HR) and 95% confidence

intervals (CI) of the HR are outlined within the table. Covariate analysis of the

Metastatic Assay adjusting for CAPRA-S within the MSKCC cohort is also included with

p-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR outlined.

Table 2 - Multivariable analysis of the Metastatic Assay in the independent resection

validation cohort for biochemical recurrence (right) and metastatic progression (left),

p-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined

within the table. Covariate analysis of the Metastatic Assay adjusting for CAPRA-S

within the independent resection validation cohort is also included with p-values,

hazard ratios (HR) and 95% confidence intervals (CI) of the HR outlined. Analysis from

a combined model of the Metastatic Assay and CAPRA-S within the independent

resection validation cohort was also assessed, outlining p-values, hazard ratios and

confidence intervals for biochemical and metastatic disease progression.

Suppl. Table 1 – Checklist for REMARK classification of Biomarkers

Suppl. Table 2 - Summary of demographic, clinical and pathological variables

considered for analysis of the Discovery resection cohort. Table outlines total number

of patients with each defined tumour type (Primary, Primary with metastasis,

Metastatic disease and Normal), the number and percentage of patients associated with

each of the representative Gleason grades and the number (%) of patients obtained

from each of the two clinical sites.

Suppl. Table 3 - Summary of demographic, clinical and pathological variables for the

secondary training dataset. Table outlines total number of patients, the median and

range of age at surgery (years), time to recurrence (months), pre-operative PSA levels

(ng/ml), Gleason scores, within each pathological T-stage subgroup, with lymph node

invasion (LNI), and patients with positive and negative surgical margins.

Suppl. Table 4 – Summary of demographic, clinical and pathological variables

considered for analysis of the Glinsky in silico validation sample cohort. Table outlines

total number and percentage of patients with recurrence events, disease relapse, each

representative Gleason score for both resection and biopsy, lymph node invasion (LNI)

and surgical margins. Medians and range are also summarized for pre-operative PSA

levels and age.

Suppl. Table 5 – Summary of demographic, clinical and pathological variables

considered for analysis of the Erho in silico validation sample cohort. Table outlines

total number and percentage of patients with metastatic recurrence events and

representative Gleason scores.

Suppl. Table 6 – Summary of demographic, clinical and pathological variables considered for analysis of the MSKCC in silico validation sample cohort. Table outlines total number and percentage of patients with recurrence events either biochemical or metastatic, site of metastasis, pathological T-stage, each representative prostatectomy Gleason score, seminal vesicle invasion (SVI) and surgical margins. Medians and range are also summarized for pre-operative PSA levels and age.

Suppl. Table 7 – Summary of demographic, clinical and pathological variables considered for analysis of the independent resection validation cohort. Table outlines total number of patients, the median and range of age at surgery (years), time to recurrence (months), pre-operative PSA levels (ng/ml) and the number (%) of patients from each of the four clinical sites, within each recurrence subgroup, associated with each of the representative Gleason scores, within each pathological T-stage subgroup, with lymph node invasion (LNI), seminal vesicle invasion (SVI), extracapsular extension (ECE) and patients with negative, diffuse or focal surgical margins.

Suppl. Table 8 – Summary of up-regulated and down-regulated genes of the ‘metastatic-subgroup’ compared to the non-metastatic-subgroup, outlining gene symbol, fold change and FDR p-values.

Suppl. Table 9 – Summary of demographic, clinical and pathological variables considered for analysis of the methylation sample cohort. Table outlines total number of patients from each clinical site, with each defined tumour type (Primary or Metastatic), the number and percentage of patients associated with each of the two identified subgroups (C1 and C2), the representative Gleason score and clinical T-stage.

Suppl. Table 10 – Summary of genes identified in gene clusters G1 and G2.


Suppl. Table 11 – Summary of biological processes from Gene Ontology functional analysis within the cluster of G1, p-values, pFDR-value and genes involved within the pathway.

Suppl. Table 12 – Summary of canonical pathways from IPA and Toppgene functional analysis within the over-expressed and under-expressed subgroups including pathway name, p-values, pFDR-value and genes involved within the pathway.

Suppl. Table 13 – Developed Metastatic Assay with the 70 gene transcripts, Entrez gene ID, weightings and bias.

Suppl. Table 14 – Performance metrics (Sensitivity and Specificity) of the Metastatic Assay for biochemical recurrence from Glinsky in silico dataset and metastatic recurrence from Erho in silico dataset.

Suppl. Table 15 – Performance metrics (Sensitivity and Specificity) for metastatic recurrence of CAPRA-S alone and the developed combined model (CAPRA-S + Metastatic Assay).

Suppl. Table 16 – Univariate assessment of the Metastatic Assay as a continuous predictor in the MSKCC cohort both alone and in a combined model with CAPRA-S.

Suppl. Table 17 – Univariate assessment of the Metastatic Assay as a continuous predictor in the Independent Resection Validation cohort both alone and in a combined model with CAPRA-S.

Suppl. Table 18 – Multivariable assessment of the Metastatic Assay as a continuous predictor in the MSKCC cohort and the Independent Resection Validation cohort.


Figure 1 – A) Hierarchical clustering of transcriptional profiles from the Discovery cohort. Specific genes which are upregulated (red) or downregulated (green) are labelled on the vertical axis within gene clusters. Sample cluster C1 represents the ‘Metastatic-subgroup’, characterized by a shut-down of gene expression (G1) compared to sample cluster C2. B) Bar chart representing the number and type of each tumour mapping to each of the two identified sample clusters within the Discovery cohort. C) Pie chart depicting the proportion of genes with increased methylation rates (dark blue): 30% of the under-expressed genes and 6% of the over-expressed genes (p<0.001).

Figure 2 – Kaplan-Meier survival analysis of the Metastatic Assay for predicting time to biochemical recurrence (A) and metastatic progression (B) in the MSKCC in silico cohort. Progression-free survival (PFS, months) was reduced in the ‘Assay Positive’ group (yellow; 85 patients) compared to the ‘Assay Negative’ group (blue; 41 patients) for biochemical recurrence (HR = 3.76 [1.70-8.34], p<0.0001) and metastatic progression (HR = 6.00 [1.90-18.91], p=0.0005).

Figure 3 – Kaplan-Meier survival analysis of the Metastatic Assay for predicting time to biochemical recurrence (A) and metastatic progression (B) in the resection validation cohort. Progression-free survival (PFS, months) was reduced in the ‘Assay Positive’ group (yellow; 74 patients) compared to the ‘Assay Negative’ group (blue; 248 patients) for biochemical recurrence (HR = 1.76 [1.18-2.64], p=0.0008) and metastatic progression (HR = 3.47 [1.70-7.07], p<0.0001). C) Association of the Metastatic Assay with metastatic progression, stratified into low risk (GS≤3+4) tumours and high risk (GS≥4+3) tumours: HR 5.61 [1.19-26.47], p=0.0013 and HR 2.43 [1.14-5.17], p=0.0036 respectively.


Figure 4 – A) Association of a combined model (Assay + CAPRA-S) at predicting time to biochemical recurrence of high/low risk disease in the resection cohort. Reduced progression-free survival (PFS) in months of the ‘High Risk’ subgroup (yellow) of 112 patients when compared to the ‘Low Risk’ subgroup (blue) of 125 patients (HR = 2.67 [1.90-3.75]; p<0.0001). B) Association of a combined model (Assay + CAPRA-S) at predicting time to metastatic disease progression of high/low risk disease in the resection cohort. Reduced progression-free survival (PFS) in months of the ‘High Risk’ subgroup (yellow) of 112 patients compared to the ‘Low Risk’ subgroup (blue) of 125 patients (HR = 7.53 [4.13-13.73]; p<0.0001).
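The Kaplan-Meier curves in Figures 2 to 4 estimate the probability of remaining free of recurrence over follow-up. As a minimal sketch of the product-limit estimator behind such curves (this is not the authors' analysis code, and the follow-up times below are made up for illustration), the calculation for a single group could look like this:

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up in months (hypothetical values)
    events -- 1 if recurrence was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # recurrences at time t
        removed = 0    # recurrences plus censorings at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Five hypothetical patients: recurrence at 6, 12 and 20 months, two censored
example_curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
```

Comparing two such curves (for example ‘Assay Positive’ against ‘Assay Negative’) is what the reported hazard ratios summarise in a single number.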


Suppl. Figure 1 – REMARK study design flow diagram.

Suppl. Figure 2 – Hierarchical clustering of methylation profiles from the specific genes. Genes which have increased (red) or decreased (green) levels of methylation are labelled on the vertical axis within gene clusters. 7/8 samples (88%) from the metastatic-subgroup clustered together (M2), and 10/14 samples (71%) from the non-metastatic-subgroup clustered together (M1) (chi-squared, p=0.02).

Suppl. Figure 3 – Workflow outlining the training and validation processes of the Metastatic Assay development and optimization.

Suppl. Figure 4 – A) Assessment of the C-index in a cohort of 75 primary prostate cancer cases treated with primary resection. C-index showing the identification of metastatic recurrence by the Metastatic Assay (C-index=90.4). B) Standard deviation (SD) of the signature scores across the 20 patient technical replicates.

Suppl. Figure 5 – Histogram to show the bimodal distribution of Metastatic Assay scores in the Discovery cohort.

Suppl. Figure 6 – Venn diagram outlining the overlap of genes between the Metastatic Assay and the three clinically utilised prognostic assays.

Suppl. Figure 7 – Decision Curve Analysis showing the net benefit of the Metastatic Assay, CAPRA-S and the Combined Model in (A) MSKCC and (B) Independent Resection Validation across probability thresholds for the metastatic endpoint compared with treating all (All) or no patients (None).
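For readers unfamiliar with Decision Curve Analysis, the net benefit at a given probability threshold weighs true positives against false positives. The sketch below uses the standard net-benefit definition with made-up counts; the function names and numbers are illustrative assumptions and are not taken from the study.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit of treating model-positive patients at a risk threshold."""
    # A false positive is penalised by the odds of the threshold probability
    weight = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(events, n, threshold):
    """Treating everyone: every event is a true positive, every non-event a false positive."""
    return net_benefit(events, n - events, n, threshold)


# Hypothetical cohort: 100 patients, 20 metastatic events, threshold 20%
nb_model = net_benefit(true_pos=15, false_pos=10, n=100, threshold=0.2)
nb_all = net_benefit_treat_all(events=20, n=100, threshold=0.2)
nb_none = 0.0  # treating no one has zero net benefit by definition
```

A model is useful at a threshold when its net benefit exceeds both the treat-all and treat-none strategies, which is the comparison the curves display.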


MAIN TABLES & FIGURES

Table 1. Validation of Metastatic Assay in MSKCC cohort

Biochemical Recurrence

| Model | Covariate | HR | 95% CI | p |
|---|---|---|---|---|
| Multivariate Model 1 | Metastatic Assay | 3.03 | 1.43 to 6.41 | 0.0040 |
| | Gleason: (3+4) | | | |
| | <7 | 0.38 | 0.10 to 1.37 | 0.1409 |
| | 4+3 | 2.04 | 0.76 to 5.43 | 0.1579 |
| | 8-10 | 8.09 | 2.74 to 23.91 | 0.0002 |
| | Age | 0.99 | 0.94 to 1.04 | 0.6564 |
| | PSA | 1.00 | 0.96 to 1.04 | 0.9857 |
| Multivariate Model 2 | Metastatic Assay | 3.35 | 1.62 to 6.94 | 0.0012 |
| | CAPRA-S | 3.92 | 1.92 to 7.99 | 0.0002 |

Metastatic Recurrence

| Model | Covariate | HR | 95% CI | p |
|---|---|---|---|---|
| Multivariate Model 1 | Metastatic Assay | 2.53 | 0.67 to 9.54 | 0.1735 |
| | Gleason: (3+4)* | | | |
| | <7 | 0.00 | 0.00 | 0.9658 |
| | 4+3 | 22.61 | 2.34 to 218.06 | 0.0073 |
| | 8-10 | 187.79 | 16.52 to 2134.99 | <0.0001 |
| | Age | 0.88 | 0.80 to 0.97 | 0.0110 |
| | PSA | 0.94 | 0.89 to 0.98 | 0.0106 |
| Multivariate Model 2 | Metastatic Assay | 3.95 | 1.15 to 13.53 | 0.0298 |
| | CAPRA-S | 3.50 | 1.13 to 10.80 | 0.0302 |

Abbreviations: HR, hazard ratio; CI, confidence intervals; PSA, prostate specific antigen; CAPRA-S, Cancer of the Prostate Risk Assessment post-surgical.

*Absence of metastatic events in patients with Gleason score <3+4.
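Each hazard ratio in Table 1 is reported with a 95% CI and a p-value. Assuming these are Wald intervals that are symmetric on the log scale (common for Cox models, but not stated in the source), the standard error and an approximate p-value can be recovered from the HR and its interval as a consistency check. The helper name below is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover log-scale SE, z statistic and two-sided p from an HR and its 95% CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p


# Metastatic Assay, biochemical recurrence, Multivariate Model 1 (Table 1)
se, z, p = wald_from_ci(3.03, 1.43, 6.41)
```

Under this assumption the recovered p-value is close to the reported 0.0040; small differences are expected from rounding of the published HR and interval.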


Table 2. Validation of Metastatic Assay in Primary Prostate Cancer Resection Dataset.

Biochemical Recurrence

| Model | Covariate | HR | 95% CI | p |
|---|---|---|---|---|
| Multivariate Model 1 | Metastatic Assay | 1.62 | 1.13 to 2.33 | 0.0092 |
| | Gleason: (3+4) | | | |
| | <7 | 0.76 | 0.44 to 1.30 | 0.3224 |
| | 4+3 | 1.95 | 1.29 to 2.95 | 0.0017 |
| | 8-10 | 2.79 | 1.82 to 4.30 | <0.0001 |
| | Age | 1.00 | 0.97 to 1.03 | 0.9027 |
| | PSA | 1.01 | 1.00 to 1.01 | 0.0321 |
| Multivariate Model 2 | Metastatic Assay | 1.72 | 1.19 to 2.48 | 0.0042 |
| | CAPRA-S | 2.52 | 1.79 to 3.54 | <0.0001 |
| Combined Model | Metastatic Assay + CAPRA-S | 2.67 | 1.90 to 3.75 | <0.0001 |

Metastatic Recurrence

| Model | Covariate | HR | 95% CI | p |
|---|---|---|---|---|
| Multivariate Model 1 | Metastatic Assay | 3.20 | 1.76 to 5.80 | 0.0001 |
| | Gleason: (3+4) | | | |
| | <7 | 0.72 | 0.19 to 2.73 | 0.6358 |
| | 4+3 | 4.33 | 1.89 to 9.93 | 0.0006 |
| | 8-10 | 6.85 | 2.92 to 16.04 | <0.0001 |
| | Age | 0.97 | 0.92 to 1.02 | 0.2828 |
| | PSA | 1.00 | 0.99 to 1.01 | 0.6423 |
| Multivariate Model 2 | Metastatic Assay | 2.94 | 1.60 to 5.40 | 0.0005 |
| | CAPRA-S | 4.76 | 2.46 to 9.23 | <0.0001 |
| Combined Model | Metastatic Assay + CAPRA-S | 7.53 | 4.13 to 13.73 | <0.0001 |

Abbreviations: HR, hazard ratio; CI, confidence intervals; PSA, prostate specific antigen; CAPRA-S, Cancer of the Prostate Risk Assessment post-surgical.


Figure 1 Molecular Subtyping and Identification of the Metastatic subgroup

[Figure 1, panels A and B. Labels: gene clusters G1 and G2; sample clusters C1 and C2; P<0.001.]

Figure 2 Validation of the Metastatic Assay in resections using the MSKCC in silico dataset


Figure 3 Validation of the Metastatic Assay in resections using the retrospective independent resection validation dataset

[Figure 3, panels A, B and C. Panel C strata: GS≤3+4 and GS≥4+3.]

Figure 4 Validation of the Metastatic Assay in resections using a combined model with CAPRA-S to stratify high and low risk


SUPPLEMENTARY DATA

Supplementary Table 1 – contained within the Supplemental File (Tab 1)

Supplementary Table 2. Demographic and Clinical characteristics of Discovery Cohort.

| Covariate | Category | No. | % |
|---|---|---|---|
| Clinical Site | Cambridge | 73 | 58 |
| | Karolinska | 53 | 42 |
| Sample Type | Primary Tumour | 70 | 56 |
| | Primary Tumour with Mets | 21 | 17 |
| | Metastatic Disease | 10 | 8 |
| | Normal | 25 | 19 |
| Gleason Score | 6 | 10 | 8 |
| | 7 | 24 | 19 |
| | 8 - 10 | 56 | 44 |
| | N/A | 36 | 29 |


Supplementary Table 3. Demographic and Clinical Characteristics of the Secondary Training Dataset

| Covariate | Category | No. | % |
|---|---|---|---|
| Age | | 58 (45-71) | |
| Recurrence Event | Recurrence | 26 | 35 |
| | Non-Recurrence | 49 | 65 |
| Time to Recurrence | Biochemical | 37 (7-102) | |
| | Metastatic | 64 (19-105) | |
| Pre-operative PSA | Median (Min - Max) | 7.7 (2.6 – 61.4) | |
| Gleason Score | <7 | 5 | 7 |
| | 7 | 58 | 77 |
| | >7 | 12 | 16 |
| Surgical Margins (SM) | Positive | 33 | 56 |
| | Negative | 42 | 44 |
| Lymph Node Invasion (LNI) | Yes | 2 | 2 |
| | No | 17 | 23 |
| | Unknown | 56 | 75 |
| T-stage | T2 | 36 | 48 |
| | T3 | 39 | 52 |


Supplementary Table 4. Demographic and Clinical Characteristics of the Glinsky in silico validation cohort

| Covariate | Category | No. | % |
|---|---|---|---|
| Recurrence Event | Recurrence | 37 | 47 |
| | Non-Recurrence | 42 | 53 |
| Relapse | Yes | 37 | 47 |
| | No | 42 | 53 |
| Pre-operative PSA | Median (Min - Max) | 7.6 (1.5 - 62.1) | |
| Gleason Score (Resection) | <7 | 17 | 22 |
| | 7 | 44 | 56 |
| | >7 | 18 | 22 |
| Surgical Margins (SM) | Positive | 50 | 63 |
| | Negative | 29 | 37 |
| Lymph Node Invasion (LNI) | Yes | 3 | 4 |
| | No | 76 | 96 |
| Gleason Score (Biopsy) | <7 | 37 | 47 |
| | 7 | 32 | 41 |
| | >7 | 10 | 12 |
| Age | Median (Min - Max) | 61.2 (44.9 - 72.7) | |


Supplementary Table 5. Demographic and Clinical Characteristics of the Erho in silico validation cohort

| Covariate | Category | No. | % |
|---|---|---|---|
| Gleason Score | <7 | 63 | 12 |
| | 7 | 271 | 50 |
| | >7 | 211 | 38 |
| Metastatic Recurrence | No | 333 | 61 |
| | Yes | 212 | 39 |


Supplementary Table 6. Demographic and Clinical Characteristics of the MSKCC in silico validation cohort

| Covariate | Category | No. | % |
|---|---|---|---|
| Type of Tumour | Primary | 126 | 100 |
| Biochemical Recurrence | Yes | 32 | 25 |
| | No | 94 | 75 |
| Metastatic Recurrence | Yes | 14 | 11 |
| | No | 112 | 89 |
| Site of Metastasis | Bone | 7 | 6 |
| | Local | 2 | 2 |
| | RP node, Bone | 1 | 1 |
| | Bone, Soft Tissue | 1 | 1 |
| | RP node, Lung | 1 | 1 |
| | Pelvic node | 1 | 1 |
| | Lung | 1 | 1 |
| | None | 112 | 89 |
| Age at Diagnosis | Median (Min - Max) | 57.6 (37.3 - 72.78) | |
| Pre-operative PSA | Median (Min - Max) | 5.9 (1.1 - 46.4) | |
| Pathological T-stage | T1 | 71 | 56 |
| | T2 | 50 | 40 |
| | T3 | 5 | 4 |
| Surgical Margins (SM) | Positive | 30 | 24 |
| | Negative | 96 | 76 |
| Seminal Vesicle Invasion (SVI) | Yes | 13 | 10 |
| | No | 113 | 90 |
| Lymph Node Invasion (LNI) | No | 97 | 77 |
| | Yes | 6 | 5 |
| | Unknown | 23 | 18 |
| Race | Black Non-Hispanic | 23 | 18 |
| | White Non-Hispanic | 94 | 75 |
| | Black Hispanic | 3 | 2 |
| | Asian | 2 | 2 |
| | Unknown | 4 | 3 |
| Gleason | <7 | 40 | 32 |
| | 7 | 72 | 57 |
| | >7 | 14 | 11 |

Supplementary Table 7. Demographic and Clinical characteristics of the Independent Resection Validation Cohort.

| Covariate | Category | No. | % |
|---|---|---|---|
| Clinical Site | IPCRC | 61 | 19 |
| | Oslo | 142 | 44 |
| | Surrey | 34 | 11 |
| | WCB | 85 | 26 |
| Age at Surgery | Median | 62 (41-75) | |
| Recurrence Event | Non-recurrence | 172 | 53 |
| | Biochemical recurrence | 103 | 32 |
| | Metastatic progression | 47 | 15 |
| Time to Recurrence - Median (range) | Biochemical recurrence | 12 (1-100) | |
| | Metastatic progression | 6 (3-63) | |
| Pre-operative PSA | Median (range), ng/ml | 8.4 (2 - 253) | |
| Gleason score | <6 | 2 | 1 |
| | 6 | 67 | 21 |
| | 7 | 197 | 61 |
| | 8 - 10 | 55 | 17 |
| Pathological T-stage | T1 | 1 | 0.5 |
| | T2 | 174 | 54 |
| | T3 | 146 | 45 |
| | T4 | 1 | 0.5 |
| Lymph Node Invasion (LNI) | Yes | 16 | 5 |
| | No | 105 | 33 |
| | Unknown | 201 | 62 |
| Seminal Vesicle Invasion (SVI) | Yes | 62 | 19 |
| | No | 260 | 81 |
| Extracapsular Extension (ECE) | Yes | 97 | 30 |
| | No | 190 | 59 |
| | Unknown | 35 | 11 |
| Surgical Margins (SM) | Negative | 132 | 41 |
| | Focal | 40 | 12 |
| | Diffuse | 65 | 20 |
| | Unknown | 85 | 27 |
| Site of Metastasis | Bone Mets | 24 | 51 |
| | Visceral Mets | 17 | 36 |
| | Varied Mets | 6 | 13 |

Supplementary Table 8 – contained within the Supplemental File (Tab 2)

Supplementary Table 9. Demographic & Clinical Characteristics of the Methylation Cohort

| Covariate | Category | No. | % |
|---|---|---|---|
| Clinical Site | Cambridge | 5 | 23 |
| | Karolinska | 17 | 77 |
| Sample Type | Primary Tumour | 20 | 91 |
| | Metastatic Disease | 2 | 8 |
| Class Label/Subgroup | C1 | 8 | 36 |
| | C2 | 14 | 64 |
| Gleason Score | 6 | 2 | 9 |
| | 7 | 4 | 18 |
| | 8 | 9 | 41 |
| | 9 | 7 | 32 |
| Stage | T2 | 5 | 23 |
| | T3 | 14 | 64 |
| | Unknown | 3 | 13 |

Supplementary Table 10 – contained within the Supplemental File (Tab 3)

Supplementary Table 11 – contained within the Supplemental File (Tab 4)

Supplementary Table 12 – contained within the Supplemental File (Tabs 5 & 6)


Supplementary Table 13. Signature gene IDs, weights and bias.

Gene Name Entrez Gene ID Weight Bias

CAPN6 827 -0.010899 4.440873

THBS4 7060 -0.009632 6.912586

PLP1 5354 -0.008886 4.383572

MT1A 4489 -0.008681 6.747957

MIR205HG 406988 -0.008279 7.215245

SEMG1 6406 -0.007935 4.230423

RSPO3 84870 -0.007296 4.293173

ANO7 50636 -0.007164 6.522548

PCP4 5121 -0.007139 7.621758

ANKRD1 27063 -0.006922 5.928315

MYBPC1 4604 -0.006845 4.574319

MMP7 4316 -0.006835 6.756722

SERPINA3 12 -0.006831 5.745462

SELE 6401 -0.006810 5.977682

KRT5 3852 -0.006403 6.080494

LTF 4057 -0.006400 6.497260

KIAA1210 57481 -0.006381 3.559966

TMEM158 25907 -0.006312 8.063421

ZFP36 7538 -0.006271 9.960827

FOSB 2354 -0.006108 6.954936

PCA3 50652 -0.006102 5.262342

TRPM8 79054 -0.006060 4.865791

PTTG1 9232 0.006017 4.712693

LOC283194 283194 -0.005950 4.980381

PAGE4 9506 -0.005837 7.073907

STEAP4 79689 -0.005685 8.105295

TMEM178A 130733 -0.005647 7.594526

CXCL2 2920 -0.005598 8.928978

HS3ST3A1 9955 -0.005593 4.232782

EYA1 2138 -0.005581 5.504276

RSPO2 340419 -0.005563 3.922421

PKP1 5317 -0.005553 5.912186

MUC6 4588 -0.005522 6.640037

PENK 5179 -0.005506 4.514855

DEFB1 1672 -0.005400 6.825491

SLC7A3 84889 -0.005390 4.649004

MIR578 693163 -0.005355 5.087389

PI15 51050 -0.005264 4.858716

UBXN10-AS1 101928017 -0.005259 6.065878


PDK4 5166 -0.005249 4.174094

PHGR1 644844 -0.005208 5.183571

SERPINE1 5054 -0.005195 6.691866

PDZRN4 29951 -0.005147 4.752328

ZNF185 7739 -0.005105 6.900544

ADRA2C 152 -0.005055 7.078377

AZGP1 563 -0.005018 8.191178

TK1 7083 0.004966 5.581335

POTEH 23784 -0.004961 4.824976

KIF11 3832 0.004929 3.917669

CLDN1 9076 -0.004924 4.960283

MIR4530 100616163 -0.004908 10.536452

MAFF 23764 -0.004901 8.497945

ZNF765 91661 -0.004862 3.976333

CKS2 1164 0.004856 6.503981

TCEAL7 56849 -0.004856 4.819328

PLIN1 5346 0.004831 4.629392

SIGLEC1 6614 0.004773 5.503752

FAM150B 285016 -0.004773 6.664595

MFAP5 8076 -0.004772 4.129177

SFRP1 6422 -0.004762 7.901262

DUSP5 1847 -0.004718 5.762678

VARS2 57176 0.004675 5.223455

ABCC4 10257 -0.004664 5.230377

SH3BP4 23677 -0.004623 4.882708

SORD 6652 -0.004573 8.958411

MTERFD1 51001 0.004522 5.334199

DPP4 1803 -0.004506 4.659748

AATBC 284837 0.004502 4.905313

FAM3B 54097 -0.004443 7.388071

KLK3 354 -0.004425 10.226441
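Supplementary Table 13 lists a weight and a bias for each transcript but does not itself state how they are combined. A minimal sketch, assuming the score is a weighted sum of bias-centred expression values (an assumption for illustration, not a formula confirmed by this table), could look like the following. Only three of the 70 transcripts are included, and the expression values are hypothetical.

```python
# gene: (weight, bias), copied from Supplementary Table 13
SIGNATURE = {
    "CAPN6": (-0.010899, 4.440873),
    "PTTG1": (0.006017, 4.712693),
    "KLK3": (-0.004425, 10.226441),
}


def signature_score(expression, signature=SIGNATURE):
    """Assumed form: sum over genes of weight * (expression - bias)."""
    return sum(w * (expression[g] - b) for g, (w, b) in signature.items())


# Hypothetical sample: each gene expressed one unit above its bias
sample = {g: b + 1.0 for g, (w, b) in SIGNATURE.items()}
score = signature_score(sample)
```

Under this assumed form, genes with negative weights lower the score when expressed above their bias, while the few positively weighted genes (such as PTTG1) raise it.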


Supplementary Table 14. Performance metrics for Biochemical recurrence (Glinsky) and Metastatic recurrence (Erho) for the Metastatic Assay

| Metric | Metastatic Assay (Glinsky cohort) | Metastatic Assay (Erho cohort) |
|---|---|---|
| Sensitivity | 70.27% [56.41-80.00] | 66.67% [52.44-76.63] |
| Specificity | 66.98% [60.51-71.36] | 54.65% [50.43-57.38] |

Supplementary Table 15. Performance metrics for metastatic recurrence for CAPRA-S and the Combined Model (CAPRA-S + Metastatic Assay)

| Metric | Metastatic Assay | CAPRA-S | Combined Model |
|---|---|---|---|
| Sensitivity | 47.73% [32.46-63.31] | 70.45% [54.80-83.24] | 84.09% [69.93-93.36] |
| Specificity | 81.87% [75.69-87.03] | 71.50% [64.58-77.75] | 64.14% [53.88-68.06] |
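Sensitivity and specificity in Supplementary Tables 14 and 15 follow from a two-by-two table of assay call against outcome. The sketch below shows the arithmetic; the counts are illustrative only (they are not reported in the source) and were chosen so that the result rounds to the Metastatic Assay values in Supplementary Table 15.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Illustrative counts only, not taken from the study
sens, spec = sensitivity_specificity(tp=21, fn=23, tn=158, fp=35)
sens_pct = round(sens * 100, 2)
spec_pct = round(spec * 100, 2)
```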

         


Supplementary Table 16. Univariate assessment of the Metastatic Assay as a continuous predictor in the MSKCC cohort both alone and in a combined model with CAPRA-S

| Endpoint | Metric | Metastatic Assay | CAPRA-S | Combined Model |
|---|---|---|---|---|
| Biochemical Recurrence | AUC | 0.67 [0.58-0.75] | 0.79 [0.71-0.86] | 0.79 [0.71-0.86] |
| | HR | 1.39 [1.16-1.66]; p=0.0003 | 1.39 [1.24-1.56]; p<0.0001 | 1.60 [1.39-1.85]; p<0.0001 |
| | C-index | 0.71 [0.60-0.79] | 0.75 [0.66-0.83] | 0.80 [0.72-0.87] |
| Metastatic Recurrence | AUC | 0.71 [0.63-0.79] | 0.87 [0.80-0.93] | 0.88 [0.81-0.93] |
| | HR | 1.48 [1.13-1.95]; p=0.0048 | 1.39 [1.19-1.61]; p<0.0001 | 1.55 [1.26-1.91]; p<0.0001 |
| | C-index | 0.69 [0.52-0.84] | 0.79 [0.69-0.88] | 0.83 [0.74-0.91] |

Supplementary Table 17. Univariate assessment of the Metastatic Assay as a continuous predictor in the Independent Resection Validation cohort both alone and in a combined model with CAPRA-S

| Endpoint | Metric | Metastatic Assay | CAPRA-S | Combined Model |
|---|---|---|---|---|
| Biochemical Recurrence | AUC | 0.59 [0.54-0.65] | 0.76 [0.70-0.82] | 0.77 [0.71-0.82] |
| | HR | 1.19 [1.06-1.34]; p=0.0030 | 1.27 [1.19-1.36]; p<0.0001 | 1.38 [1.27-1.50]; p<0.0001 |
| | C-index | 0.58 [0.54-0.63] | 0.62 [0.58-0.68] | 0.68 [0.64-0.74] |
| Metastatic Recurrence | AUC | 0.71 [0.66-0.76] | 0.76 [0.70-0.81] | 0.80 [0.74-0.85] |
| | HR | 1.58 [1.29-1.93]; p<0.0001 | 1.45 [1.28-1.64]; p<0.0001 | 1.66 [1.43-1.93]; p<0.0001 |
| | C-index | 0.71 [0.64-0.78] | 0.73 [0.66-0.79] | 0.82 [0.76-0.86] |
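The C-index reported in Supplementary Tables 16 and 17 measures how often a higher score goes with an earlier event. As a minimal sketch, assuming Harrell's definition (the source does not say which variant was used) and using made-up data, it can be computed as:

```python
def harrell_c_index(times, events, scores):
    """Fraction of comparable patient pairs ranked correctly by the risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in score count as half
    return concordant / comparable


# Hypothetical follow-up (months), event flags and assay scores
c_index = harrell_c_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    scores=[0.9, 0.4, 0.6, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the tabulated C-index values should be read.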

Supplementary Table 18. Multivariable assessment of the Metastatic Assay as a continuous predictor in the MSKCC cohort and the Independent Resection Validation cohort

Biochemical Recurrence

| Model | Covariate | HR | 95% CI | p |
|---|---|---|---|---|
| Multivariate Model 1 - MSKCC | Metastatic Assay | 2.00 | 1.24 to 3.24 | 0.0050 |
| | Gleason: (3+4) | | | |
| | <7 | 0.51 | 0.14 to 1.90 | 0.3173 |
| | 4+3 | 2.09 | 0.78 to 5.57 | 0.1434 |
| | 8-10 | 4.27 | 1.27 to 14.35 | 0.0195 |
| | Age | 0.97 | 0.92 to 1.03 | 0.3552 |
| | PSA | 0.97 | 0.92 to 1.02 | 0.2005 |
| Multivariate Model 2 – Independent Resection Cohort | Metastatic Assay | 1.16 | 1.03 to 1.30 | 0.0155 |
| | Gleason: (3+4) | | | |
| | <7 | 0.77 | 0.45 to 1.32 | 0.3416 |
| | 4+3 | 1.97 | 1.30 to 2.98 | 0.0015 |
| | 8-10 | 2.81 | 1.82 to 4.32 | <0.0001 |
| | Age | 1.00 | 0.97 to 1.02 | 0.7220 |
| | PSA | 1.01 | 1.00 to 1.01 | 0.0163 |

Metastatic Recurrence

| Model | Covariate | HR | 95% CI | p |
|---|---|---|---|---|
| Multivariate Model 1 - MSKCC | Metastatic Assay | 2.99 | 1.10 to 8.17 | 0.0334 |
| | Gleason: (3+4)* | | | |
| | <7 | 0.00 | 0.00 | 0.9672 |
| | 4+3 | 21.02 | 2.22 to 198.59 | 0.0082 |
| | 8-10 | 80.14 | 7.10 to 901.73 | 0.0004 |
| | Age | 0.89 | 0.82 to 0.97 | 0.0093 |
| | PSA | 0.89 | 0.83 to 0.96 | 0.0026 |
| Multivariate Model 2 – Independent Resection Cohort | Metastatic Assay | 1.52 | 1.24 to 1.85 | <0.0001 |
| | Gleason: (3+4) | | | |
| | <7 | 0.78 | 0.21 to 2.96 | 0.7211 |
| | 4+3 | 4.40 | 1.92 to 10.11 | 0.0005 |
| | 8-10 | 7.08 | 3.01 to 16.62 | <0.0001 |
| | Age | 0.96 | 0.92 to 1.01 | 0.1269 |
| | PSA | 1.00 | 0.99 to 1.02 | 0.5067 |

Abbreviations: HR, hazard ratio; CI, confidence intervals; PSA, prostate specific antigen; CAPRA-S, Cancer of the Prostate Risk Assessment post-surgical.

*Absence of metastatic events in patients with Gleason score <3+4.


Suppl. Figure 1 – REMARK Study Design Flow Diagram


Suppl. Figure 2 - Semi-supervised hierarchical clustering of the methylation data of down-regulated genes


Suppl. Figure 3 Process for the development of the Metastatic Assay


Suppl. Figure 4 Assessment of C-index (A) and SD (B) performance metrics for the metastatic endpoint in the secondary training dataset to justify selection of a 70-gene signature


Suppl. Figure 5 Distribution of the Metastatic Assay signature scores in the Discovery dataset


Suppl. Figure 6 - Venn diagram outlining the overlap of genes between the Metastatic Assay and the three clinically utilised prognostic assays.

[Suppl. Figure 6 gene labels: PTTG1, TK1, KIF11, AZGP1, ANO7, MYBPC1.]

Suppl. Figure 7 – Decision Curve Analysis showing the net benefit of the Metastatic Assay, CAPRA-S and the Combined Model in (A) MSKCC and (B) Independent Resection Validation across probability thresholds for the metastatic endpoint compared with treating all (All) or no patients (None)


APPENDIX A

**Gemma to add further REMARK text**
