
CD-15-0412: Murugaesu et al. - Supplementary Material


Supplementary Material for

Tracking the genomic evolution of esophageal adenocarcinoma through neoadjuvant chemotherapy

Supplementary Materials and Methods

Supplementary Table S1 Sequencing coverage for tumor regions and germline samples

Supplementary Table S2 Clinical characteristics of patients with EAC

Supplementary Table S4 Annotation of chromosomal segments that overlap with TCGA ESCA recurrent amplifications and tumor regions in our EAC cohort

Supplementary Figure S1 Mutations identified using M-seq compared with a single biopsy

Supplementary Figure S2 Relationship of intratumor heterogeneity and response to NAC treatment

Supplementary Figure S3 Tumor phylograms

Supplementary Figure S4 Copy number events lead to mutational heterogeneity

Supplementary Figure S5 Heatmap of all driver mutations identified across all tumor regions

Supplementary Figure S6 Driver mutations identified using M-seq compared with a single biopsy

Supplementary Figure S7 Copy number states across the genome for each tumor region

Supplementary Figure S8 Chromosome view of chromosome 19, tumor sample EAC017, region R1, demonstrating chromothripsis

Supplementary Figure S9 wGII scores for the TCGA ESCA cohort and M-seq EAC cohort

Supplementary Figure S10 Trinucleotide context for the temporally dissected combined EAC cohort

Supplementary Figure S11 Assessing copy number heterogeneity using a minimum consecutive segment method



SUPPLEMENTARY MATERIALS AND METHODS

Patient cohort description

Multiple pre-treatment tumor regions were obtained endoscopically from a single tumor mass, and post-chemotherapy tumor regions were obtained from the surgical tumor resection. Patients were treated with neoadjuvant combination chemotherapy and did not receive any concurrent radiation treatment. Detailed clinical characteristics are provided in Supplementary Table S2.

Tumor processing

All tumor samples were snap-frozen. Peripheral blood was collected from all patients and snap-frozen. Approximately 5 x 2 x 2 mm of tumor tissue and 500 µl of blood were used for genomic DNA extraction with the DNeasy kit (Qiagen) according to the manufacturer's protocol. DNA was quantified by Qubit (Invitrogen), and DNA integrity was examined by agarose gel electrophoresis.

Multi-region whole exome sequencing

For each tumor region and matched germ-line sample, exome capture was performed on 1-2 μg of DNA using the Agilent Human All Exome V4 kit according to the manufacturer's protocol (Agilent). Samples were multiplexed and paired-end sequenced on the Illumina HiSeq 2500 at the Advanced Sequencing Facility at The Francis Crick Institute, Lincoln's Inn Fields Laboratories, as described previously (2, 3). Each captured library was loaded on the Illumina platform and paired-end sequenced to the desired average sequencing depth (approximately 90x; detailed coverage information is provided in Supplementary Table S1).

SNV and INDEL calling from multi-region whole exome sequencing

Raw paired-end reads (100 bp) in FastQ format generated by the Illumina pipeline were aligned to the full hg19 genomic assembly (including unknown contigs) obtained from GATK bundle 2.8 (4), using bwa mem (bwa-0.7.7) (5). Picard tools v1.107 was used to clean, sort and merge files from the same patient region and to remove duplicate reads (http://broadinstitute.github.io/picard). Quality control metrics were obtained using a combination of Picard tools (1.107), GATK (2.8.1) and FastQC (0.10.1) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

SAMtools mpileup (0.1.16) (6) was used to locate non-reference positions in tumor and germ-line samples. Bases with a phred score < 20 and reads with a mapping quality < 20 were skipped. BAQ computation was disabled and the coefficient for downgrading mapping quality was set to 50. Somatic variants between tumor and matched germ-line were determined using VarScan2 somatic (v2.3.6) (7) on the output from SAMtools mpileup. Default parameters were used, with three exceptions: the minimum coverage for the germ-line sample was set to 10, the minimum variant frequency was set to 0.01, and tumor purity was set to 0.5. VarScan2 processSomatic was used to extract the somatic variants.
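To make the parameter choices above concrete, the following sketch assembles argument lists that mirror them. It is not the authors' pipeline: the reference and BAM file names, the jar name, the `-f` reference argument and the use of VarScan2's single-mpileup mode (`--mpileup 1`) are assumptions, and flag spellings should be checked against the documentation of the installed samtools and VarScan2 versions.

```python
import shlex


def mpileup_args(ref_fasta: str, normal_bam: str, tumor_bam: str) -> list[str]:
    """samtools mpileup call mirroring the settings described in the text."""
    return [
        "samtools", "mpileup",
        "-f", ref_fasta,   # placeholder reference FASTA (assumption)
        "-q", "20",        # skip reads with mapping quality < 20
        "-Q", "20",        # skip bases with phred quality < 20
        "-B",              # disable BAQ computation
        "-C", "50",        # coefficient for downgrading mapping quality
        normal_bam, tumor_bam,  # VarScan2 mpileup mode expects normal first
    ]


def varscan_somatic_args(mpileup_file: str, out_prefix: str,
                         jar: str = "VarScan.v2.3.6.jar") -> list[str]:
    """VarScan2 somatic on a combined normal/tumor mpileup (jar name hypothetical)."""
    return [
        "java", "-jar", jar, "somatic", mpileup_file, out_prefix,
        "--mpileup", "1",
        "--min-coverage-normal", "10",
        "--min-var-freq", "0.01",
        "--tumor-purity", "0.5",
    ]


def processsomatic_args(out_prefix: str, jar: str = "VarScan.v2.3.6.jar") -> list[str]:
    """Split the somatic SNV output by somatic status."""
    return ["java", "-jar", jar, "processSomatic", f"{out_prefix}.snp"]


if __name__ == "__main__":
    for argv in (mpileup_args("hg19.fa", "normal.bam", "tumor.bam"),
                 varscan_somatic_args("pair.mpileup", "EAC001_R1"),
                 processsomatic_args("EAC001_R1")):
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings keeps quoting safe if the commands are later passed to `subprocess.run`; the same `processSomatic` call would be repeated on the `.indel` output.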

The resulting SNV calls were filtered for false positives using VarScan2's associated fpfilter.pl script, having first run the data through bam-readcount (0.5.1). Additionally, for those variants not subjected to Ion Torrent validation, further filtering was applied whereby variants were only accepted if present in ≥ 5 reads and at ≥ 5% variant allele frequency (VAF) in at least one tumor region, with a germ-line VAF ≤ 1%. If a variant met these criteria in a single region, the VAF threshold for the other regions of that tumor was reduced to ≥ 1% in order to detect low-frequency variants.
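The read-count and VAF filter above can be sketched as follows. This is an illustration under stated assumptions, not the published code: the relaxed ≥ 1% threshold is assumed to apply to the other regions of the same tumor once one region passes the strict thresholds, and no minimum read count is assumed at the relaxed threshold. The `ReadCounts` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    alt_reads: int
    depth: int

    @property
    def vaf(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def regions_called_present(tumor: dict[str, ReadCounts], germline: ReadCounts,
                           min_reads: int = 5, min_vaf: float = 0.05,
                           relaxed_vaf: float = 0.01,
                           max_germline_vaf: float = 0.01) -> list[str]:
    """Return the regions in which a variant is called, or [] if it is rejected."""
    if germline.vaf > max_germline_vaf:
        return []  # germ-line VAF above 1%: reject
    # Strict thresholds must be met in at least one tumor region ...
    if not any(c.alt_reads >= min_reads and c.vaf >= min_vaf for c in tumor.values()):
        return []
    # ... after which other regions only need VAF >= 1% (assumed interpretation).
    return sorted(r for r, c in tumor.items() if c.vaf >= relaxed_vaf)


regions = {"R1": ReadCounts(12, 100), "R2": ReadCounts(2, 100), "R3": ReadCounts(0, 120)}
print(regions_called_present(regions, ReadCounts(0, 80)))  # ['R1', 'R2']
```

Here R2 (2 of 100 reads) would fail the strict thresholds on its own but is rescued at the relaxed 1% threshold because R1 passes.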

All INDEL calls classed as 'high confidence' by VarScan2 processSomatic underwent manual review prior to validation. All variants were annotated using ANNOVAR (8). Variants identified as non-silent were manually reviewed using the Integrative Genomics Viewer (IGV) (9), and those showing an Illumina-specific error profile (10) were removed from further analysis.

Ion AmpliSeq™ Custom Validation panel

A total of 685 mutations (representing all non-silent variants from EAC001, EAC003 and EAC005) were subjected to orthogonal validation. For each tumor, an Ion AmpliSeq™ custom panel (Life Technologies) was designed using the online designer (www.ampliseq.com). Multiplex PCRs were performed on DNA from each region of the relevant tumor according to the manufacturer's protocol. Barcoded sequencing libraries were constructed and sequenced with a 200 bp read length on the Ion Torrent PGM™ sequencer (Life Technologies). Sequence alignment to target regions of the hg19 genome was performed using the Ion Torrent Suite™ software.

Variants were sequenced to a median depth of 445x. An SNV was considered absent when its VAF was < 1% at a read coverage ≥ 50x, and was considered a germ-line variant when its VAF was > 1% in the germ-line sample. In total, 27 mutations were absent in all tumor regions or identified as germ-line variants (validation rate 96.1%). Variants with read coverage < 50x were considered inconclusive, and their status was taken from the exome sequencing data.
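The per-variant classification rules above, and the quoted validation rate, can be sketched as follows. How mixed outcomes across regions were resolved is not stated; this sketch assumes a variant is validated if present (VAF ≥ 1% at ≥ 50x) in any region, absent only if every region is conclusively absent, and otherwise inconclusive. The function names are hypothetical.

```python
def region_status(vaf: float, depth: int, min_depth: int = 50,
                  absent_below: float = 0.01) -> str:
    if depth < min_depth:
        return "inconclusive"  # fall back to the exome call for this region
    return "absent" if vaf < absent_below else "present"


def validation_outcome(tumor_calls: list[tuple[float, int]], germline_vaf: float) -> str:
    """tumor_calls: (VAF, depth) per tumor region on the amplicon panel."""
    if germline_vaf > 0.01:
        return "germline"
    statuses = {region_status(v, d) for v, d in tumor_calls}
    if "present" in statuses:
        return "validated"
    if statuses == {"absent"}:
        return "absent"
    return "inconclusive"  # assumption: any low-coverage region blocks an "absent" call


# Validation rate from the text: 27 of 685 mutations failed.
total, failed = 685, 27
print(f"{100 * (total - failed) / total:.1f}%")  # 96.1%
```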



Additionally, all variants and INDELs were manually reviewed using the Integrative Genomics Viewer (IGV) (9) to determine which should be removed from further analysis.

The Ion Torrent data were additionally used to inform the filtering parameters for the remaining tumors. Treating the results of the validation and manual review as a simple true or false variant call, we optimized the filters to maximize our ability to retain true variants and reject false variants in this training set, and applied those filters to the variants from the tumors lacking Ion Torrent data. The resulting filters were set so that only exonic and splice-site mutations with a maximum variant read count ≥ 5 and a maximum VAF ≥ 5% (maximum across the regions of the given tumor) were included in further analysis, providing an overall accuracy of 91.1%.
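The threshold optimization described above can be pictured as a small grid search over the two filters, scoring each candidate by accuracy against the validated calls. The search grids, the ANNOVAR-style class labels (`exonic`, `splicing`) and the toy training set are assumptions for illustration; the text does not state which values were searched.

```python
from itertools import product

# Assumed ANNOVAR-style functional classes counted as "exonic or splice-site".
KEEP_CLASSES = {"exonic", "splicing"}


def passes(func_class: str, alt_reads: list[int], vafs: list[float],
           min_alt: int, min_vaf: float) -> bool:
    """The maximum across the regions of one tumor must clear both thresholds."""
    in_scope = any(part in KEEP_CLASSES for part in func_class.split(";"))
    return in_scope and max(alt_reads) >= min_alt and max(vafs) >= min_vaf


def accuracy(training, min_alt: int, min_vaf: float) -> float:
    hits = sum(passes(f, a, v, min_alt, min_vaf) == truth for f, a, v, truth in training)
    return hits / len(training)


def optimise_thresholds(training, alt_grid=range(1, 11),
                        vaf_grid=(0.01, 0.02, 0.03, 0.05, 0.10)):
    """Grid search returning (accuracy, min_alt, min_vaf); the first best wins ties."""
    return max(((accuracy(training, a, v), a, v) for a, v in product(alt_grid, vaf_grid)),
               key=lambda t: t[0])


# Toy training set: (function class, alt reads per region, VAF per region, validated?)
training = [
    ("exonic", [8, 2], [0.12, 0.03], True),
    ("exonic", [3, 1], [0.04, 0.01], False),
    ("splicing", [6, 0], [0.07, 0.00], True),
    ("intronic", [20, 15], [0.30, 0.25], False),
]
print(accuracy(training, min_alt=5, min_vaf=0.05))  # 1.0
```

On a real training set several threshold pairs may tie; preferring the first best is an arbitrary choice made here for determinism.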

Intratumor Heterogeneity Index

An Intratumor Heterogeneity (ITH) Index was generated for each tumor. First, for each possible pairwise comparison of regions within the tumor, the proportion of heterogeneous mutations relative to the total number of mutations was determined. The ITH index was then calculated as the mean value of the resulting matrix of pairwise comparisons.
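A minimal sketch of the ITH index, assuming that for each pair of regions "heterogeneous" means present in exactly one of the two regions and "total" means present in either region. Averaging over unordered pairs equals averaging the off-diagonal entries of the symmetric pairwise matrix; whether the diagonal was included is not stated.

```python
from itertools import combinations
from statistics import mean


def ith_index(presence: dict[str, set[str]]) -> float:
    """Mean over region pairs of (#heterogeneous / #mutations in the pair)."""
    if len(presence) < 2:
        raise ValueError("need at least two tumor regions")
    scores = []
    for a, b in combinations(sorted(presence), 2):
        in_pair = presence[a] | presence[b]        # mutations seen in either region
        heterogeneous = presence[a] ^ presence[b]  # seen in exactly one of the two
        scores.append(len(heterogeneous) / len(in_pair) if in_pair else 0.0)
    return mean(scores)


regions = {"R1": {"m1", "m2", "m3"}, "R2": {"m1", "m2"}, "R3": {"m1", "m4"}}
print(round(ith_index(regions), 3))  # 0.583
```

The three pairwise proportions here are 1/3, 3/4 and 2/3, whose mean is 21/36.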

Copy number analysis

All data analysis was performed in the R statistical environment, version 3.0.2. Processed exome SNP and copy number data from paired tumor-normal samples were generated using VarScan2 (v2.3.6). VarScan2 copynumber was run using default parameters with the exception of min-coverage (8) and data-ratio. The data-ratio was calculated on a per-sample basis as described in (7). Output from VarScan2 was processed using the Sequenza R package 2.1.1 (11) to provide segmented copy number data and cellularity and ploidy estimates for all samples based on the exome sequence data. The following settings were used: breaks.method = 'full', gamma = 40, kmin = 5, gamma.pcf = 200, kmin.pcf = 200. The automatically selected ploidy and cellularity models were manually verified; for 6 cases, model fitting was re-run using the second most optimal solution returned by Sequenza (samples EAC003 R3, EAC006 R3, EAC014 R4, EAC015 R3, EAC017 R2, EAC017 R4), and for one case using the third most optimal solution (EAC017 R3).

Processed copy number data for each sample were divided by the sample mean ploidy and log2 transformed. The thresholds for gain and loss were defined as log2(2.5/2) and log2(1.5/2), respectively, and the threshold for amplification as log2(4/2). For calling copy number aberrations, segments smaller than 500 kb or containing fewer than 5 SNPs were removed.

When evaluating whether regions of copy number gain and loss are ubiquitous, it is difficult to determine whether copy number regions that partially overlap between multiple samples of the same tumor are ubiquitous or heterogeneous. To evaluate heterogeneity in copy number gain and loss, all parts of the genome were therefore considered independently and split into minimum consecutive segments (MCS) of overlap within each tumor. Any segment of gain or loss that overlapped across all regions was defined as ubiquitous, and all other segments with copy number aberrations as heterogeneous. Hence, if parts of a single segment showed both ubiquitous and heterogeneous overlap between samples, it was considered as two segments, one heterogeneous and one ubiquitous (see Supplementary Fig. S11, which illustrates this method).
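The two steps above, threshold-based calling and the minimum consecutive segment (MCS) split, can be sketched as follows. Assumptions: thresholds are applied inclusively (≥ for gain/amplification, ≤ for loss), segments are half-open intervals on one chromosome, and gains and amplifications are treated as distinct states, following the Supplementary Fig. S11 legend. The function names are hypothetical.

```python
import math

GAIN = math.log2(2.5 / 2)  # about 0.32
LOSS = math.log2(1.5 / 2)  # about -0.42
AMP = math.log2(4 / 2)     # 1.0


def call_state(copy_number: float, ploidy: float, length_bp: int, n_snps: int):
    """Classify one segment; None means the segment is dropped."""
    if length_bp < 500_000 or n_snps < 5:
        return None
    if copy_number <= 0:
        return "loss"
    ratio = math.log2(copy_number / ploidy)
    if ratio >= AMP:
        return "amp"
    if ratio >= GAIN:
        return "gain"
    if ratio <= LOSS:
        return "loss"
    return "neutral"


def minimum_consecutive_segments(calls: dict[str, list[tuple[int, int, str]]]):
    """calls: region -> aberrant segments (start, end, state) on one chromosome.
    Returns (start, end, state, label) for every MCS carrying an aberration."""
    cuts = sorted({p for segs in calls.values() for s, e, _ in segs for p in (s, e)})
    out = []
    for start, end in zip(cuts, cuts[1:]):
        # State of each region over this piece (None = no aberration called).
        states = [next((st for s, e, st in segs if s <= start and end <= e), None)
                  for segs in calls.values()]
        for state in sorted({st for st in states if st is not None}):
            label = "ubiquitous" if all(st == state for st in states) else "heterogeneous"
            out.append((start, end, state, label))
    return out


calls = {"R1": [(0, 100, "gain")], "R2": [(50, 150, "gain")], "R3": [(40, 120, "gain")]}
for mcs in minimum_consecutive_segments(calls):
    print(mcs)
```

In this example only the piece from 50 to 100, covered by the gain in all three regions, is labeled ubiquitous; the four flanking pieces are heterogeneous.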


Genome doubling was determined as previously described (12). The weighted genome instability index (wGII) was determined as described (13). Raw ESCA TCGA SNP data for calculation of wGII were downloaded from TCGA on 2014-10-21 and processed as described (14).
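For orientation, a simplified sketch of a wGII-style score is shown below. It assumes the commonly used definition: the fraction of each chromosome whose copy number differs from the (integer) sample ploidy, averaged across chromosomes so that each chromosome carries equal weight. The exact procedure should be taken from (13); segment tuples and chromosome lengths here are hypothetical.

```python
def wgii(segments, ploidy: int, chrom_lengths: dict[str, int]) -> float:
    """segments: (chrom, start, end, integer copy number) tuples."""
    fractions = []
    for chrom, length in chrom_lengths.items():
        aberrant_bp = sum(end - start for c, start, end, cn in segments
                          if c == chrom and cn != ploidy)
        fractions.append(aberrant_bp / length)  # per-chromosome aberrant fraction
    return sum(fractions) / len(fractions)      # equal weight per chromosome


segs = [("1", 0, 50, 3), ("1", 50, 100, 2), ("2", 0, 200, 2)]
print(wgii(segs, ploidy=2, chrom_lengths={"1": 100, "2": 200}))  # 0.25
```

Averaging per chromosome, rather than over total base pairs, prevents a few large chromosomes from dominating the score.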

Cancer cell fraction estimation and cluster analysis

The cancer cell fraction (CCF) and mutation copy number of each mutation were estimated by integrating Sequenza-derived integer copy number and tumor purity estimates with the VAF, as outlined in Lohr et al. (15) and Landau et al. (16).

For each variant, the expected VAF given the CCF can be calculated as follows:

$$\mathrm{VAF}_{\mathrm{exp}}(\mathrm{CCF}) = \frac{p \cdot \mathrm{CCF}}{\mathrm{CPN}_{\mathrm{norm}}(1-p) + p \cdot \mathrm{CPN}_{\mathrm{mut}}}$$

where CPN_mut is the local copy number of the tumor, p is the tumor purity and CPN_norm is the local copy number of the matched normal sample. For a given mutation with a alternative reads and a depth of N, the probability of a given CCF can be estimated using a binomial distribution, P(CCF) = Binom(a | N, VAF_exp(CCF)). These probabilities can then be calculated over a uniform grid of 100 CCF values (0.01 to 1) and subsequently normalized to obtain a posterior distribution. Given that sex chromosomes were excluded from this analysis, CPN_norm was assumed to be 2.
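The grid posterior described above can be sketched as follows. The binomial coefficient is omitted because it is identical for every grid point and cancels on normalization, and the likelihood is computed in log space to avoid underflow at high depth. Like the formula, the sketch assumes the mutation is carried on one allele; the function names are hypothetical.

```python
import math


def expected_vaf(ccf: float, purity: float, cpn_mut: int, cpn_norm: int = 2) -> float:
    """Expected VAF for a mutation on one allele (formula above)."""
    return purity * ccf / (cpn_norm * (1 - purity) + purity * cpn_mut)


def log_kernel(alt: int, depth: int, q: float) -> float:
    """Binomial log-likelihood without the constant binomial coefficient."""
    if q <= 0.0:
        return 0.0 if alt == 0 else -math.inf
    if q >= 1.0:
        return 0.0 if alt == depth else -math.inf
    return alt * math.log(q) + (depth - alt) * math.log1p(-q)


def ccf_posterior(alt: int, depth: int, purity: float, cpn_mut: int,
                  cpn_norm: int = 2, n_grid: int = 100):
    grid = [(i + 1) / n_grid for i in range(n_grid)]  # 0.01, 0.02, ..., 1.00
    logs = [log_kernel(alt, depth, expected_vaf(c, purity, cpn_mut, cpn_norm)) for c in grid]
    top = max(logs)
    weights = [math.exp(lg - top) for lg in logs]  # rescale for numerical stability
    total = sum(weights)
    return grid, [w / total for w in weights]


grid, post = ccf_posterior(alt=20, depth=100, purity=0.8, cpn_mut=2)
print(grid[post.index(max(post))])  # 0.5
```

With 80% purity and a diploid tumor locus, a clonal heterozygous mutation is expected at VAF 0.4, so an observed VAF of 0.2 places the posterior mode at a CCF of 0.5.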

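Similarly, the mutation copy number (the number of chromosomal alleles harboring the mutation) can be calculated as follows, where CPN_mut and CPN_norm are the local tumor and normal copy numbers defined above:

$$n_{\mathrm{mut}} = \frac{\mathrm{VAF}}{p}\left(p \cdot \mathrm{CPN}_{\mathrm{mut}} + \mathrm{CPN}_{\mathrm{norm}}(1-p)\right)$$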
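The mutation copy number formula translates directly into code. The example inverts the expected-VAF relation: a clonal mutation on one of two tumor copies at 80% purity has an expected VAF of 0.4, which maps back to one mutated copy. The function name is hypothetical.

```python
def mutation_copy_number(vaf: float, purity: float, cpn_mut: int, cpn_norm: int = 2) -> float:
    """Number of chromosomal copies carrying the mutation (formula above)."""
    return (vaf / purity) * (purity * cpn_mut + cpn_norm * (1 - purity))


print(round(mutation_copy_number(0.4, purity=0.8, cpn_mut=2), 6))  # 1.0
```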

To assess the reliability of Sequenza purity estimates, we also calculated purity estimates based on SNV variant allele frequency (VAF) profiles. In brief, given that the majority of homogeneously identified mutations likely represent clonal events, we extracted homogeneously identified mutations and identified the modal cell fraction estimate within the largest cell fraction peak. Notably, we observed a highly significant correlation between VAF purity estimates and Sequenza purity estimates (p < 2 × 10⁻¹⁶, Pearson's r = 0.99). Given the concordance between the two, VAF purity estimates were used in subsequent analysis.

To cluster CCF values, we evaluated all possible combinations of presence (CCF > 10%) and absence (CCF < 10%) calls in tumor regions. Thus, for a given tumor with R tumor regions, each cluster corresponded to a binary profile of length R (for example, given 6 tumor regions, an SNV with the profile 101111 was defined as present in regions R1, R3, R4, R5 and R6, but not R2, and was grouped with all other SNVs with the same binary profile). The SNV cluster with a binary profile consisting entirely of 1s represents mutations found homogeneously in all tumor regions. Only SNV clusters with at least 5 SNVs were considered. Notably, although each SNV cluster may harbor both clonal and subclonal mutations, the vast majority of mutations within each cluster were found to be in agreement, indicating that each cluster generally corresponds to a clonal or subclonal mutation cluster.
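The binary-profile clustering can be sketched as follows, using the 10% CCF presence cutoff and the minimum cluster size of 5 SNVs from the text. The input layout (mutation to per-region CCF) and the function names are assumptions.

```python
from collections import defaultdict


def binary_profile(ccfs: dict[str, float], regions: list[str], cutoff: float = 0.10) -> str:
    """'1' where CCF exceeds the cutoff, '0' otherwise, in region order."""
    return "".join("1" if ccfs[r] > cutoff else "0" for r in regions)


def cluster_by_profile(ccf_table: dict[str, dict[str, float]], regions: list[str],
                       min_size: int = 5) -> dict[str, list[str]]:
    clusters = defaultdict(list)
    for mutation, ccfs in ccf_table.items():
        clusters[binary_profile(ccfs, regions)].append(mutation)
    # Keep only clusters supported by at least `min_size` SNVs.
    return {p: sorted(m) for p, m in clusters.items() if len(m) >= min_size}


regions = ["R1", "R2", "R3", "R4", "R5", "R6"]
example = {"R1": 0.9, "R2": 0.02, "R3": 0.8, "R4": 1.0, "R5": 0.6, "R6": 0.7}
print(binary_profile(example, regions))  # 101111
```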



Identification of SNV heterogeneity driven by copy number alterations

SNVs were filtered to remove those whose absence, or low CCF values, may be driven by copy number events. For each tumor, we identified any SNV residing in genomic segments showing copy number heterogeneity across tumor regions, with minor and major allele copy number aberrations considered separately. For each chromosome, we grouped mutations into non-contiguous genomic segments with consistent copy number states within tumor regions and within the SNV clusters defined above. To restrict our analysis to mutations lost in at least one tumor region, we determined the median CCF value of each SNV group and only considered SNV groups whose median CCF value was ≤ 0.25 in at least one tumor region. We then evaluated whether copy number loss coincided with lower CCF levels using a one-sided Wilcoxon test or, if more than two copy number states were present across tumor regions, a one-sided Cochran-Armitage trend test. To ensure that the lower CCF values were driven by copy number and not by tumor region, we also implemented a regression analysis including both copy number and region in the model. In total, across all tumor regions, 100 mutations were filtered as being driven by copy number change.
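The median-CCF pre-filter and a one-sided two-group comparison can be sketched in the standard library as follows. The sketch uses an exact permutation version of the Wilcoxon rank-sum test (average ranks for ties), which is appropriate only for small groups; the Cochran-Armitage trend test and the regression step are not shown. Which CCF values enter each group (here: per-region CCFs split by whether the allele was lost) is an assumption.

```python
from itertools import combinations
from statistics import median


def lost_somewhere(ccf_by_region: dict[str, list[float]], cutoff: float = 0.25) -> bool:
    """Pre-filter: median CCF of the SNV group <= cutoff in at least one region."""
    return any(median(v) <= cutoff for v in ccf_by_region.values() if v)


def average_ranks(values: list[float]) -> list[float]:
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # ties share the mean rank
        i = j + 1
    return ranks


def rank_sum_p_less(x: list[float], y: list[float]) -> float:
    """Exact one-sided Wilcoxon rank-sum p-value, H1: x tends to be smaller than y."""
    ranks = average_ranks(list(x) + list(y))
    observed = sum(ranks[: len(x)])
    sums = [sum(ranks[i] for i in idx) for idx in combinations(range(len(ranks)), len(x))]
    return sum(s <= observed + 1e-9 for s in sums) / len(sums)


lost = [0.0, 0.05, 0.10]            # CCFs where the allele was lost (toy values)
retained = [0.80, 0.90, 0.95, 1.0]  # CCFs where the allele was retained
print(lost_somewhere({"R1": lost, "R2": retained}))  # True
print(round(rank_sum_p_less(lost, retained), 4))      # 0.0286
```

Here the three "lost" values occupy the lowest ranks, which only 1 of the 35 possible group assignments achieves.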

Phylogenetic tree construction

All non-silent mutations that passed validation (EAC001, EAC003 and EAC005) or further filtering (EAC006, EAC009, EAC014, EAC015 and EAC017) were considered for the purpose of determining phylogenetic trees. Trees were built using binary presence/absence matrices constructed from the regional distribution of variants within each tumor. The R package phangorn (1.99-7) (17) was used to perform the parsimony ratchet method (18), generating unrooted trees. Branch lengths were determined using the acctran function.
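To show what a parsimony method scores, the sketch below builds the binary presence/absence matrix (with an all-zero germ-line row) and computes the Fitch parsimony length of a fixed tree. This is only the scoring step: the parsimony ratchet used in the analysis searches over many topologies in phangorn, which is not reproduced here. The nested-tuple tree format and the function names are assumptions.

```python
def presence_matrix(region_mutations: dict[str, set[str]]):
    """Binary region x mutation matrix with an all-zero germ-line (GL) row."""
    muts = sorted(set().union(*region_mutations.values()))
    matrix = {r: [int(m in s) for m in muts] for r, s in region_mutations.items()}
    matrix["GL"] = [0] * len(muts)
    return muts, matrix


def fitch_length(tree, matrix) -> int:
    """Fitch parsimony length of a fixed binary tree given as nested 2-tuples."""
    n_sites = len(next(iter(matrix.values())))

    def visit(node, k):
        if isinstance(node, str):  # leaf: observed state
            return {matrix[node][k]}, 0
        (s1, c1), (s2, c2) = visit(node[0], k), visit(node[1], k)
        common = s1 & s2
        return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)

    return sum(visit(tree, k)[1] for k in range(n_sites))


_, m = presence_matrix({"R1": {"a", "b", "c"}, "R2": {"a", "b"}, "R3": {"a", "d"}})
print(fitch_length(((("R1", "R2"), "R3"), "GL"), m))  # 4
print(fitch_length((("R1", "R3"), ("R2", "GL")), m))  # 5
```

The first topology explains every mutation with a single change, so a parsimony search would prefer it over the second.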

Identification and classification of driver mutations

All non-silent variants were compared against a list of potential driver genes (n = 598). The driver gene list comprised all genes identified in the COSMIC cancer gene census (June 2014) (19), plus those identified in large-scale pan-cancer analyses (using q < 0.05 as cut-off) (20) and previous esophageal sequencing studies (21). Any variant located within one of these genes was categorized according to pre-set criteria. If the gene was annotated as recessive (tumor suppressor) by COSMIC and the variant was deemed to be deleterious (either a stop-gain or predicted deleterious by two of the three computational approaches applied: SIFT (22), PolyPhen (23) and MutationTaster (24)), the variant was classed as Category 1 (high-confidence driver mutation). If it failed to meet these criteria, its proximity to mutations annotated in COSMIC (data obtained February 2015) was determined. If ≥ 3 COSMIC mutations were located within 15 bp, the variant was classed as Category 2 (putative driver mutation); if not, it was classed as Category 3 (low-confidence driver mutation). Alternatively, if the variant was found in a gene annotated by COSMIC as dominant (oncogene), we sought exact matches to the specific variant in COSMIC. If an exact match was found ≥ 3 times, the variant was classed as Category 1. If not, the same criteria as described for tumor suppressor genes above were applied to class the variant as Category 2 or 3. Finally, if the driver gene had not been classified as either an oncogene or a tumor suppressor, all tests described above were applied; if the variant passed, it was classed as Category 2, otherwise as Category 3. All remaining variants were classed as Category 4 and represented variants of unknown significance.
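The categorization rules above form a short decision tree, sketched below. Two points are interpretations rather than statements from the text: for genes without an oncogene or tumor suppressor annotation, passing any one of the tests is taken to mean "passed"; and "remaining variants" (Category 4) is taken to mean variants outside the driver gene list. Parameter names are hypothetical.

```python
from typing import Optional


def driver_category(in_driver_list: bool, role: Optional[str], stop_gain: bool = False,
                    n_deleterious_calls: int = 0, cosmic_within_15bp: int = 0,
                    cosmic_exact_matches: int = 0) -> int:
    """role: 'TSG', 'oncogene' or None (not classified by COSMIC)."""
    if not in_driver_list:
        return 4  # variant of unknown significance
    deleterious = stop_gain or n_deleterious_calls >= 2  # >= 2 of SIFT/PolyPhen/MutationTaster
    near_hotspot = cosmic_within_15bp >= 3               # >= 3 COSMIC mutations within 15 bp
    recurrent = cosmic_exact_matches >= 3                # exact COSMIC match >= 3 times
    if role == "TSG":
        return 1 if deleterious else (2 if near_hotspot else 3)
    if role == "oncogene":
        return 1 if recurrent else (2 if near_hotspot else 3)
    return 2 if (deleterious or near_hotspot or recurrent) else 3


print(driver_category(True, "oncogene", cosmic_exact_matches=2, cosmic_within_15bp=4))  # 2
```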

Temporal dissection of mutations

For each M-seq tumor, we classified each mutation as 'early' or 'late' based on whether it was located on the trunk or on a branch of the phylogenetic tree: all truncal mutations were classified as 'early' and all branch mutations as 'late'. Chi-square tests were used to compare the mutation spectra of the six mutation types (C>A, C>G, C>T, T>A, T>C, T>G). A two-sided Fisher's exact test was used to compare the relative frequency of each mutation type between early and late variants. Additionally, we specifically sought to determine the significance of the enrichment of T>G mutations in the CpTpT context compared with the genomic background in both early and late mutations, adapting the method described in (25, 26).

The variants were also split according to their occurrence before and after treatment with platinum chemotherapy. Variants were classed as post-treatment specific if they were absent from all pre-chemotherapy regions, i.e. exclusive to the post-treatment samples. As with the early versus late comparison, chi-square tests were used to compare the mutation spectra of the six mutation types, and a two-sided Fisher's exact test was used to compare the relative frequency of each mutation type between pre- and post-chemotherapy variants. Finally, we tested for the presence of a platinum signature in the post-treatment regions as described below.
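The per-type comparison above reduces to a 2x2 table (for example, early versus late, by the mutation type of interest versus all other types) tested with a two-sided Fisher's exact test. The standard-library sketch below sums the hypergeometric probabilities of all tables no more likely than the observed one, using a small relative tolerance for ties. The table layout and helper names are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:  # hypergeometric probability of top-left cell = x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables at most as likely as the observed one (small tolerance).
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def type_table(mutations: dict[str, str], truncal: set[str], mut_type: str):
    """2x2 counts: rows early/late, columns mut_type/other types."""
    early = [t for m, t in mutations.items() if m in truncal]
    late = [t for m, t in mutations.items() if m not in truncal]
    return (early.count(mut_type), len(early) - early.count(mut_type),
            late.count(mut_type), len(late) - late.count(mut_type))


def post_treatment_specific(regions_present: set[str], pre_regions: set[str]) -> bool:
    """True if the variant is absent from every pre-chemotherapy region."""
    return not (regions_present & pre_regions)


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

The same `type_table` layout serves for the pre- versus post-chemotherapy comparison once variants are labeled with `post_treatment_specific`.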

Detecting a platinum mutation pattern



Previous work (27) has demonstrated the propensity of platinum exposure to cause C>A (G>T) mutations within a CpC (GpG) context. To detect a platinum mutation pattern, the methods outlined for APOBEC enrichment were adapted (26). The enrichment E_CpC, reflecting the strength of mutagenesis at the CpC motif across the genome, was calculated as follows:

$$E_{\mathrm{CpC}} = \frac{\mathrm{mutations}_{\mathrm{CpC}} \times \mathrm{context}_{\mathrm{C}}}{\mathrm{mutations}_{\mathrm{C}} \times \mathrm{context}_{\mathrm{CpC}}}$$

where mutations_CpC is the number of mutated cytosines (and guanines) falling in a CpC (or GpG) dinucleotide, mutations_C is the total number of mutated cytosines (and guanines), context_CpC is the total number of CpC (or GpG) dinucleotides within a 41-base region centered on the mutated cytosines (and guanines), and context_C is the total number of cytosines (or guanines) within the same 41-base region. A two-sided Fisher's exact test was used to determine whether platinum signature mutations were over-represented in each sample. The test compared the ratio of the number of C>A and G>T substitutions occurring in and out of the CpC (GpG) platinum target dinucleotide to an analogous ratio for all cytosines and guanines residing inside and outside of the CpC or GpG dinucleotide within the 41-base region centered on the mutated cytosine (or guanine), representing the genomic background. P-values were adjusted using Benjamini-Hochberg multiple testing correction. A two-tailed Fisher's exact test was performed to compare platinum enrichment between the pre- and post-chemotherapy samples.
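A sketch of the enrichment statistic and the Benjamini-Hochberg adjustment follows. Assumptions: a mutated C (or G) counts as "in CpC (GpG)" if either neighbor is the same base; CpC and GpG dinucleotides in the window are counted with overlaps; positions are 0-based on a single reference string. The Fisher's exact test itself can reuse any standard 2x2 implementation and is not repeated here.

```python
def count_overlapping(seq: str, motif: str) -> int:
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def cpc_enrichment(reference: str, positions: list[int], flank: int = 20) -> float:
    """E_CpC for mutated C/G positions (0-based) on one reference sequence."""
    mut_cpc = mut_c = ctx_c = ctx_cpc = 0
    for pos in positions:
        base = reference[pos]
        if base not in "CG":
            continue
        pair = base * 2  # "CC" or "GG"
        window = reference[max(0, pos - flank): pos + flank + 1]  # up to 41 bases
        mut_c += 1
        left = pos > 0 and reference[pos - 1] == base
        right = pos + 1 < len(reference) and reference[pos + 1] == base
        mut_cpc += left or right
        ctx_c += window.count(base)
        ctx_cpc += count_overlapping(window, pair)
    if mut_c == 0 or ctx_cpc == 0:
        raise ValueError("no informative C/G mutations or CpC/GpG contexts")
    return (mut_cpc * ctx_c) / (mut_c * ctx_cpc)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(cpc_enrichment("ATCCATGATA", [2, 6]))  # 1.5
print([round(p, 3) for p in benjamini_hochberg([0.01, 0.04, 0.03])])  # [0.03, 0.04, 0.04]
```

In the toy sequence, one of the two mutated bases sits in a CpC, while the windows contain three C/G bases and one CpC, giving (1 x 3) / (2 x 1) = 1.5.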



To further assess the impact of platinum exposure on the mutational content of the tumor regions, the indel content of the pre-chemotherapy and post-chemotherapy regions was compared. For each tumor, the maximum number of exonic indels classed as 'high confidence' by VarScan2 processSomatic within a single pre-chemotherapy region was obtained, as was the maximum number within a single post-chemotherapy region. These per-tumor values were compared using a paired Wilcoxon signed-rank test.
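A standard-library sketch of the indel comparison follows: take the per-tumor maximum in each treatment group, then run an exact two-sided Wilcoxon signed-rank test by enumerating all sign assignments. Zero differences are dropped and tied magnitudes share average ranks. Exact enumeration is only practical for a handful of tumors, and statistical packages may use a normal approximation when ties are present, so p-values can differ slightly. The input values are hypothetical.

```python
from itertools import product


def max_indels(indels_by_region: dict[str, int], regions: list[str]) -> int:
    """Largest high-confidence exonic indel count among the given regions."""
    return max(indels_by_region[r] for r in regions)


def signed_rank_exact(pre: list[float], post: list[float]) -> float:
    """Two-sided exact Wilcoxon signed-rank p-value (zero differences dropped)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    mags = [abs(d) for d in diffs]
    order = sorted(range(n), key=mags.__getitem__)
    ranks = [0.0] * n
    i = 0
    while i < n:  # average ranks for tied magnitudes
        j = i
        while j + 1 < n and mags[order[j + 1]] == mags[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    centre = sum(ranks) / 2
    observed = abs(sum(r for r, d in zip(ranks, diffs) if d > 0) - centre)
    # Enumerate all 2^n sign assignments (fine for a handful of tumors).
    extreme = sum(
        abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= observed - 1e-9
        for signs in product((False, True), repeat=n)
    )
    return extreme / 2 ** n


# Hypothetical per-tumor maxima (pre, post) for five tumors.
pre = [1, 2, 3, 4, 5]
post = [3, 5, 6, 8, 10]
print(signed_rank_exact(pre, post))  # 0.0625
```

With five tumors all increasing, only the all-positive and all-negative sign patterns are as extreme as the observed one, giving 2/32.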

We were unable to assess the previously identified specific platinum dinucleotide substitution signature (CpT > ApC) due to the absence of confident dinucleotide substitutions within this cohort.



REFERENCES FOR METHODS

1. Mandard AM, Dalibard F, Mandard JC, Marnay J, Henry-Amar M, Petiot JF, et al. Pathologic assessment of tumor regression after preoperative chemoradiotherapy of esophageal carcinoma. Clinicopathologic correlations. Cancer. 1994;73:2680-6.

2. Gerlinger M, Horswell S, Larkin J, Rowan AJ, Salm MP, Varela I, et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat Genet. 2014;46:225-33.

3. Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366:883-92.

4. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297-303.

5. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754-60.

6. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078-9.

7. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568-76.

8. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164.

9. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29:24-6.

10. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011;39:e90.

11. Favero F, Joshi T, Marquard AM, Birkbak NJ, Krzystanek M, Li Q, et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol. 2015;26:64-70.

12. Dewhurst SM, McGranahan N, Burrell RA, Rowan AJ, Gronroos E, Endesfelder D, et al. Tolerance of whole-genome doubling propagates chromosomal instability and accelerates cancer genome evolution. Cancer Discov. 2014;4:175-85.

13. Burrell RA, McClelland SE, Endesfelder D, Groth P, Weller MC, Shaikh N, et al. Replication stress links structural and numerical cancer chromosomal instability. Nature. 2013;494:492-6.

14. Birkbak NJ, Wang ZC, Kim JY, Eklund AC, Li Q, Tian R, et al. Telomeric allelic imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer Discov. 2012;2:366-75.

15. Lohr JG, Stojanov P, Carter SL, Cruz-Gordillo P, Lawrence MS, Auclair D, et al. Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy. Cancer Cell. 2014;25:91-101.

16. Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, Lawrence MS, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell. 2013;152:714-26.

17. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27:592-3.

18. Nixon KC. The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics. 1999;15:407-14.

19. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4:177-83.

20. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505:495-501.

21. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet. 2013;45:478-86.

22. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073-81.

23. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248-9.

24. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods. 2014;11:361-2.

25. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science. 2014;346:251-6.

26. Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P, et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet. 2013;45:970-6.

27. Meier B, Cooke SL, Weiss J, Bailly AP, Alexandrov LB, Marshall J, et al. C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency. Genome Res. 2014;24:1624-36.



Supplementary Table S1. Sequencing coverage for tumor regions and germline samples

Tumor   Region  Type      Mean coverage  Median coverage
EAC005  R1      Pre       110.51         101
EAC005  R2      Pre       99.61          90
EAC005  R3      Pre       116.3          107
EAC005  R4      Post      90.8           80
EAC005  R5      Post      79.63          70
EAC005  R6      Post      93.29          78
EAC005  GL      Germline  101.89         80
EAC015  R1      Pre       75.18          68
EAC015  R2      Post      106.52         97
EAC015  R3      Post      141.53         129
EAC015  R4      Post      137.5          126
EAC015  R5      Post      114.41         105
EAC015  GL      Germline  93.47          86
EAC001  R1      Pre       95.44          81
EAC001  R2      Pre       77.17          68
EAC001  R3      Pre       89.76          79
EAC001  R4      Pre       103.14         89
EAC001  R5      Post      84.18          73
EAC001  R6      Post      106.76         89
EAC001  R7      Post      76.29          69
EAC001  R8      Post      87.24          79
EAC001  GL      Germline  95.04          87
EAC006  R1      Pre       210.48         191
EAC006  R2      Pre       141.26         128
EAC006  R3      Post      189.62         165
EAC006  R4      Post      84.08          73
EAC006  R5      Post      105.55         94
EAC006  GL      Germline  92.84          85
EAC003  R1      Pre       108.55         91
EAC003  R2      Pre       97.14          83
EAC003  R3      Pre       73.02          64
EAC003  GL      Germline  142.26         132
EAC014  R1      Pre       77.7           68
EAC014  R2      Pre       96.26          84
EAC014  R3      Pre       113.45         101
EAC014  R4      Pre       121.75         108
EAC014  GL      Germline  45.8           42
EAC017  R1      Pre       119.44         105
EAC017  R2      Pre       117.15         104
EAC017  R3      Post      62.02          56
EAC017  R4      Post      121.83         104
EAC017  R5      Post      108.37         92
EAC017  R6      Post      104.11         90
EAC017  GL      Germline  116.13         106
EAC009  R1      Pre       90.14          82
EAC009  R2      Pre       116.33         105
EAC009  R3      Pre       136.46         123
EAC009  GL      Germline  136.59         123

Supplementary Table S2. Clinical characteristics of patients with EAC

Abbreviations: EAC; esophageal adenocarcinoma. ECX; epirubicin, cisplatin and capecitabine, EOX; epirubicin, oxaliplatin and capecitabine.

Identity Gender  Pre-op TNM  Pre-op Stage  Pathology       Mandard Score  Path TNM  Post-op Stage  Response    Neoadj chemo  No. of cycles  Outcome
EAC005   Male    T2N1        2B            Adenocarcinoma  5              pT3N2     3B             Upstaged    ECX           2              Poor
EAC015   Male    T3N1        3A            Adenocarcinoma  5              pT4aN3    3C             Upstaged    ECX           3              Poor
EAC001   Male    T3N0        3A            Adenocarcinoma  5              pT3N0     2B             Same        ECX           4              Intermediate
EAC006   Male    T1N0        1B            Adenocarcinoma  3              pT1N1     2B             Upstaged    EOX           3              Intermediate
EAC014   Male    T3N1        3A            Adenocarcinoma  4              pT1bN1    2B             Downstaged  ECX           1              Intermediate
EAC003   Male    T3N1        3A            Adenocarcinoma  3              pT3N1     3A             Same        ECX           3              Good
EAC017   Male    T3N1        3A            Adenocarcinoma  3              pT2N1     2B             Downstaged  EOX           3              Good
EAC009   Male    T4N2        3C            Adenocarcinoma  2              pT1bN1    2B             Downstaged  ECX           3              Good

Supplementary Table S4 Annotation of chromosomal segments that overlap with TCGA ESCA recurrent amplifications and tumor regions in our EAC cohort.

Abbreviations: EAC; esophageal adenocarcinoma, ESCA; esophageal cancer.

Chromosome  Start      End        # Genes  GISTIC region  Size (bp)   Cytoband range
chr1        145414549  171983088  548      chr1q23.3      26568539    chr1p12-q24.3
chr6        43738144   44153635   7        chr6p21.1      415491      chr6p21.1
chr7        54942676   57316916   27       chr7p11.2      2374240     chr7p11.2
chr7        92730862   94052663   12       chr7q21.2      1321801     chr7q21.2-q21.3
chr8        11355603   12580568   28       chr8p23.1      1224965     chr8p23.1
chr8        128702020  128825001  3        chr8q24.21     122981      chr8q24.21
chr11       33902613   36458777   19       chr11p13       2556164     chr11p13
chr11       69647998   69924352   2        chr11q13.3     276354      chr11q13.3
chr12       24982722   25801682   6        chr12p1        818960      chr12p12.1
chr12       69499276   70672267   14       chr12q15       1172991     chr12q15
chr13       73630948   84107781   43       chr13q22.1     10476833    chr13q22.1
chr14       35399861   39901572   36       chr14q21.1     4501711     chr14q13.2-q21.1
chr17       37875893   38020421   4        chr17q12       144528      chr17q12
chr18       19613596   19853474   2        chr18q11.2     239878      chr18q11.2
chr19       30243979   31770851   4        chr19q12       1526872     chr19q12

Supplementary Figure S1. Mutations identified using M-seq compared with a single

biopsy. Median and interquartile range are indicated by horizontal black lines. M-seq;

multi-region exome sequencing.

Supplementary Figure S2. Relationship of intratumor heterogeneity and response to

NAC treatment. Spearman rho is indicated. NAC; neoadjuvant chemotherapy.

Supplementary Figure S3. Tumor phylograms. Phylograms were inferred using a

parsimony ratchet approach. The phylograms are presented to scale, with the number of

mutations as evolutionary distance. Uncertainties assessed by bootstrap tests are indicated

next to the nodes. GL indicates germline. Scale bar indicates the number of mutations.

Supplementary Figure S4. Copy number events lead to mutational heterogeneity. For

example, in EAC015 on chromosome 2, mutations present in other tumor regions but not

present in region R1 are likely absent due to complete loss of one chromosomal copy. Only

mutations potentially explained by copy number events are depicted (grey). Major allele

copy number is shown with a black line and the minor allele copy number with a green

line.

Supplementary Figure S5. Heatmap of the distribution of all the driver mutations

identified in the EAC cohort. The putative driver mutations were classified as tumor

suppressor genes or oncogenes as reported the COSMIC cancer gene census (17) this is


22

indicated in dark green and light green respectively. Ubiquitously detected mutations

(present in all tumor regions) are indicated in dark blue and heterogeneous mutations

(present in one or more tumor regions but not all) are indicated in orange. Driver

mutations that confer an illusion of clonality are indicated in the right column in dark blue.

Supplementary Figure S6. Driver mutations identified using M-seq compared with a

single biopsy. Median and interquartile range are indicated by horizontal black lines. M-

seq; multi-region exome sequencing.

Supplementary Figure S7. Copy number states across the genome for all tumor regions.

Gains (+1 copy number relative to ploidy) are depicted in orange, losses (-1 copy number

relative to ploidy) are depicted in blue and amplifications (x2 ploidy) are depicted in red.

Supplementary Figure S8. Chromosome view of chromosome 19, tumor sample EAC017,

region R1, demonstrating chromothripsis. A) Mutant allele fraction, each dot indicates a

mutation. (B) B-allele fraction based on SNPs detected in the region. (C) Depth ratio of the

tumor relative to the paired normal sample. Within each window, a thick black line

indicates the median value, and a blue bar indicates the interquartile range. Red lines

indicate segmented values. The thin dotted lines indicate the expected copy number

values under the fitted model.

Supplementary Figure S9. wGII scores for the TCGA ESCA cohort and M-seq EAC cohort.

Grey, TCGA samples. Red, pre-chemotherapy tumor regions. Green, post-chemotherapy


23

tumor regions. Median and interquartile range are indicated by horizontal black lines.

ESCA; esophageal cancer, EAC; esophageal adenocarcinoma.

Supplementary Figure S10. The trinucleotide context for temporally dissected combined

EAC cohort. Each of the 96 substitution classes is defined by the substitution type and the

bases immediately 5’ and 3’ of the mutated base. The substitution types are shown on the

top horizontal bar; vertical axes depict the percentage of total mutations attributed to

each substitution class.
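The 96 classes arise from 6 substitution types, reported on the pyrimidine (C or T) strand, times 4 possible 5’ bases times 4 possible 3’ bases. The sketch below shows one conventional way to assign a substitution to its class and to compute the percentages plotted on the vertical axes; the input format (forward-strand reference trinucleotide plus alternative base) and all function names are assumptions for illustration.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 pyrimidine-centred substitution types x 4 five-prime bases x 4 three-prime bases = 96.
CLASSES = [f"{five}[{sub}]{three}"
           for sub in SUBSTITUTIONS
           for five, three in product("ACGT", repeat=2)]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def trinucleotide_class(context: str, alt: str) -> str:
    """Map a substitution to one of the 96 classes.

    `context` is the forward-strand reference trinucleotide (5' base, mutated
    base, 3' base); `alt` is the alternative base on the same strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError(f"invalid substitution {context}>{alt}")
    if context[1] == alt:
        raise ValueError("reference and alternative bases are identical")
    if context[1] in "AG":  # purine reference: report on the opposite (pyrimidine) strand
        context, alt = reverse_complement(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def spectrum(substitutions) -> dict:
    """Percentage of all substitutions falling into each of the 96 classes."""
    counts = Counter(trinucleotide_class(ctx, alt) for ctx, alt in substitutions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no substitutions given")
    return {cls: 100 * counts[cls] / total for cls in CLASSES}


# ACG>T and its reverse complement CGT>A collapse into the same class.
print(trinucleotide_class("ACG", "T"), trinucleotide_class("CGT", "A"))
```

Collapsing each substitution onto the pyrimidine strand is what keeps the catalogue at 96 rather than 192 classes.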

Supplementary Figure S11. Assessing copy number heterogeneity using a minimum

consecutive segment method. This figure shows mock segmentation data with copy

number aberrations present in four regions from a single mock patient. Panel A shows the

segmentation data; Panel B shows the data after transformation into minimum

consecutive segments (MCS). The start and end positions of all copy number aberrations

from each tumor region (short vertical black lines in Panel A) were introduced into all

regions within the same tumor sample (long vertical black lines in Panel B), splitting

the original genomic segments into smaller segments, the MCS. An MCS in which all tumor

regions carried the same copy number aberration (loss, gain or amplification), as indicated

in Panel B, was identified as ubiquitous; all other MCS with copy number aberrations were

defined as heterogeneous.
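The MCS procedure described above can be sketched as follows: pool every aberration boundary from all regions of one tumor, cut the genome at those shared breakpoints, and compare the regions' states within each resulting piece. This is a minimal illustration, assuming half-open coordinates, at most one aberrant segment covering any position per region, and positions not covered by an aberrant segment treated as normal copy number; `Segment` and `minimum_consecutive_segments` are hypothetical names, not the authors' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    state: str  # "loss", "gain" or "amplification"


def minimum_consecutive_segments(regions: dict[str, list[Segment]]) -> list[tuple[str, int, int, str]]:
    """Split aberrant segments of all regions at a shared set of breakpoints
    and label each resulting MCS as ubiquitous or heterogeneous."""
    # 1. Pool every start/end position from every region, per chromosome.
    breakpoints = defaultdict(set)
    for segments in regions.values():
        for seg in segments:
            breakpoints[seg.chrom].update((seg.start, seg.end))

    mcs = []
    for chrom in sorted(breakpoints):  # plain string sort; ordering is cosmetic here
        points = sorted(breakpoints[chrom])
        # 2. Consecutive breakpoints delimit the minimum consecutive segments.
        for start, end in zip(points, points[1:]):
            # 3. State of each region over this MCS ("normal" if no aberration covers it).
            states = []
            for segments in regions.values():
                covering = [s.state for s in segments
                            if s.chrom == chrom and s.start <= start and end <= s.end]
                states.append(covering[0] if covering else "normal")
            if all(state == "normal" for state in states):
                continue  # no aberration in any region
            # 4. Same aberration in every region -> ubiquitous; otherwise heterogeneous.
            label = "ubiquitous" if len(set(states)) == 1 else "heterogeneous"
            mcs.append((chrom, start, end, label))
    return mcs


# Mock tumor with a gain of different extent in each of four regions.
regions = {
    "R1": [Segment("1", 0, 100, "gain")],
    "R2": [Segment("1", 20, 100, "gain")],
    "R3": [Segment("1", 10, 80, "gain")],
    "R4": [Segment("1", 30, 120, "gain")],
}
for row in minimum_consecutive_segments(regions):
    print(row)
```

Because every segment boundary becomes a breakpoint, each MCS lies either entirely inside or entirely outside any given segment, so a single containment test per region is enough to assign its state.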

Supplementary Figure S1 [figure not reproduced: number of non-silent mutations (y-axis, 100 to 300) identified by multi-region sequencing versus a single biopsy for EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009]

Supplementary Figure S2 [figure not reproduced: ITH index (axis 0.1 to 0.4) and response category (Good, Intermediate, Poor) for EAC005, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009; Spearman rho = 0.93]

Supplementary Figure S3 [figure not reproduced: tumor phylograms for EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009, with tumor regions (R1 to R8) and germline (GL) as tips]

Supplementary Figure S4 [figure not reproduced: copy number (0 to >5) across chromosomes 1 to 22 for regions R1_PreChemo, R2_PostChemo, R3_PostChemo and R5_PostChemo]

Supplementary Figure S5 [figure not reproduced: heatmap of driver mutations across the tumor regions of EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009. Legends: classification of driver gene (tumor suppressor, oncogene); distribution of driver gene (ubiquitous, heterogeneous, unclassified); illusion of clonality (present, absent). Driver genes: TLL1, TLR4, MTOR, EYS, NUAK1, GNPTAB, FAT1, BCL6, STAT3, CBFA2T3, OLIG2, KCNJ5, CCDC6, EGFR, KLF4, TPR, HIST1H4I, RPL22, TCF7L2, MN1, ACSL3, CDH11, BRD4, PIK3R1, PALB2, CIC, ATRX, DOCK2, SLC39A12, PTPRC, HIP1, PML, NOTCH1, ATM, SMAD4, SCN10A, AJAP1, SPG20, SYK, CHN1, SETBP1, MAML2, BRAF, PER1, HOXD13, MYH11, OMD, EWSR1, MYCL, ZNF331, CLTC, ROS1, PDGFRA, KIT, NKX2-1, KAT6A, BRIP1, BRCA1, NF1, DICER1, CDKN2A, CDH1, EXT2, AXIN1, PBRM1, SYNE1, AKAP6, NTRK3, TP53]

Supplementary Figure S6 [figure not reproduced: number of driver mutations (y-axis, 5 to 20) identified by multi-region sequencing versus a single biopsy for EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009]

Supplementary Figure S7 [figure not reproduced: copy number states across chromosomes 1 to 22 for each pre- and post-chemotherapy tumor region of EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009]

Supplementary Figure S8 [figure not reproduced: chromosome 19, position 0 to 50 Mb; panels show mutant allele frequency colored by substitution type (A>C/T>G, A>G/T>C, A>T/T>A, C>A/G>T, C>G/G>C, C>T/G>A), B allele frequency, depth ratio and copy number (0 to 9)]

Supplementary Figure S9 [figure not reproduced: wGII (0 to 0.8) for TCGA samples, all M-seq tumor regions, M-seq pre-chemotherapy tumor regions and M-seq post-chemotherapy tumor regions]

Supplementary Figure S10 [figure not reproduced: trinucleotide substitution spectra in panels labeled Early and Late; substitution types C>A, C>G, C>T, T>A, T>C, T>G; y-axis: percentage of total mutations (0 to 20)]

Supplementary Figure S11 [figure not reproduced: mock copy number segmentation for regions R1 to R4 (Panel A) and the resulting minimum consecutive segments (Panel B), labeled H H U H (H = heterogeneous, U = ubiquitous); legend: copy number aberration segment, normal copy number segment, overlap]

