CD-15-0412: Murugaesu et al. - Supplementary Material
Supplementary Material for
Tracking the genomic evolution of esophageal adenocarcinoma through
neoadjuvant chemotherapy
Supplementary Materials and Methods
Supplementary Table S1 Sequencing coverage for tumor regions and germline
samples
Supplementary Table S2 Clinical characteristics of patients with EAC
Supplementary Table S4 Annotation of chromosomal segments that overlap with
TCGA ESCA recurrent amplifications and tumor regions in our
EAC cohort
Supplementary Figure S1 Mutations identified using M-seq compared with a single
biopsy
Supplementary Figure S2 Relationship of intratumor heterogeneity and response to
NAC treatment
Supplementary Figure S3 Tumor Phylograms
Supplementary Figure S4 Copy number events lead to mutational heterogeneity
Supplementary Figure S5 Heatmap of all the driver mutations identified across all
tumor regions
Supplementary Figure S6 Driver mutations identified using M-seq compared with a
single biopsy
Supplementary Figure S7 Copy number states across the genome for each tumor
region
Supplementary Figure S8 Chromosome view of chromosome 19, tumor sample EAC017,
region R1, demonstrating chromothripsis
Supplementary Figure S9 wGII scores for the TCGA ESCA cohort and M-seq EAC cohort
Supplementary Figure S10 The trinucleotide context for temporally dissected combined
EAC cohort
Supplementary Figure S11 Assessing copy number heterogeneity using a minimum
consecutive segment method
SUPPLEMENTARY MATERIALS AND METHODS
Patient cohort description
Multiple pre-treatment tumor regions were obtained endoscopically from a single tumor
mass and post-chemotherapy tumor regions were obtained from the surgical tumor
resection. Patients were treated with neoadjuvant combination chemotherapy and did not
receive any concurrent radiation treatment. Detailed clinical characteristics are provided in
Supplementary Table S2.
Tumor processing
All tumor samples were snap-frozen. Peripheral blood was collected from all patients and
snap-frozen. Approximately 5 x 2 x 2 mm of tumor tissue and 500 µl of blood were used for
genomic DNA extraction, using the DNeasy kit (Qiagen) according to the manufacturer’s
protocol. DNA was quantified by Qubit (Invitrogen) and DNA integrity was examined by
agarose gel electrophoresis.
Multi-region whole exome sequencing
For each tumor region and matched germ-line, exome capture was performed on 1-2 μg
DNA using the Agilent Human All Exome V4 kit according to the manufacturer’s protocol
(Agilent). Samples were paired-end multiplex sequenced on the Illumina HiSeq 2500 at the
Advanced Sequencing Facility at The Francis Crick Institute, Lincoln's Inn Fields
Laboratories, as described previously (2, 3). Each captured library was loaded on the
Illumina platform and paired-end sequenced to the desired average sequencing depth
(approximately 90x, detailed coverage information is provided in Supplementary Table S1).
SNV and INDEL calling from multi-region whole exome sequencing
Raw paired end reads (100bp) in FastQ format generated by the Illumina pipeline were
aligned to the full hg19 genomic assembly (including unknown contigs) obtained from
GATK bundle 2.8 (4), using bwa mem (bwa-0.7.7) (5). Picard tools v1.107 was used to clean,
sort and merge files from the same patient region and to remove duplicate reads
(http://broadinstitute.github.io/picard). Quality control metrics were obtained using a
combination of picard tools (1.107), GATK (2.8.1) and FastQC (0.10.1)
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
SAMtools mpileup (0.1.16) (6) was used to locate non-reference positions in tumor and
germ-line samples. Bases with a phred score of <20 or reads with a mapping-quality <20
were skipped. BAQ computation was disabled and the coefficient for downgrading
mapping quality was set to 50. Somatic variants between tumor and matched germ-line
were determined using VarScan2 somatic (v2.3.6) (7) utilizing the output from SAMtools
mpileup. Default parameters were used with the exception of minimum coverage for the
germ-line sample that was set to 10, minimum variant frequency was changed to 0.01 and
tumor purity was set to 0.5. VarScan2 processSomatic was used to extract the somatic
variants.
The resulting SNV calls were filtered for false positives using VarScan2's associated
fpfilter.pl script, having first run the data through bam-readcount (0.5.1). Additionally, for
those variants not subjected to Ion Torrent validation, further filtering was applied
whereby variants were only accepted if present in ≥ 5 reads and ≥ 5% variant allele
frequency (VAF) in at least one tumor region with germ-line VAF ≤ 1%. If a variant was
found to meet these criteria in a single region, then the VAF threshold was reduced to ≥
1% in order to detect low frequency variants.
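The read-count and VAF filter described above can be sketched as follows. This is a minimal illustration and not the authors' script: the names `RegionCall` and `filter_variant` are hypothetical, and the sketch assumes one reading of the rule, namely that once a variant passes the ≥ 5 reads and ≥ 5% VAF criteria in at least one region, it is called present in any region of the same tumor with VAF ≥ 1%.

```python
from dataclasses import dataclass


@dataclass
class RegionCall:
    alt_reads: int  # reads supporting the variant in this tumor region
    vaf: float      # variant allele frequency in this region (0 to 1)


def filter_variant(regions, germline_vaf, min_reads=5, min_vaf=0.05,
                   max_germline_vaf=0.01, rescue_vaf=0.01):
    """Return per-region presence calls, or None if the variant is rejected.

    regions: dict mapping region name -> RegionCall (hypothetical structure).
    """
    # Reject variants with germ-line VAF above 1%.
    if germline_vaf > max_germline_vaf:
        return None
    # Require >= 5 reads and >= 5% VAF in at least one tumor region.
    anchored = any(r.alt_reads >= min_reads and r.vaf >= min_vaf
                   for r in regions.values())
    if not anchored:
        return None
    # Assumption: the relaxed 1% threshold is then applied to every region.
    return {name: r.vaf >= rescue_vaf for name, r in regions.items()}


calls = filter_variant(
    {"R1": RegionCall(12, 0.20), "R2": RegionCall(2, 0.02), "R3": RegionCall(0, 0.0)},
    germline_vaf=0.0,
)
print(calls)
```

In this example the variant is anchored by R1, rescued at low frequency in R2, and called absent in R3.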
All INDEL calls classed as ‘high confidence’ by VarScan2 processSomatic underwent
manual review prior to validation. All variants were annotated using ANNOVAR (8). Variants
identified as non-silent were manually reviewed using the Integrative Genomics Viewer (IGV)
(9), and those showing an Illumina-specific error profile (10) were removed from further
analysis.
Ion AmpliSeq™ Custom Validation panel
A total of 685 mutations (representing all non-silent variants from EAC001, EAC003 and
EAC005) were subjected to orthogonal validation. For each tumor, an Ion AmpliSeq™
custom panel (Life Technologies) was designed using the online designer
(www.ampliseq.com). Multiplex PCRs were performed on DNA from each region of the
relevant tumor according to the manufacturer’s protocol. Barcoded sequencing libraries
were constructed, which were sequenced with 200 bp read length on the Ion Torrent
PGM™ sequencer (Life Technologies). Sequence alignment to target regions from the hg19
genome was performed using the Ion Torrent Torrent Suite™ software.
Variants were sequenced to a median depth of 445. An SNV was considered absent when
the VAF < 1% while having a read coverage ≥ 50x or considered a germ-line variant when
VAF > 1% in the germ-line. In total 27 mutations were absent in all tumor regions or
identified as germ-line variants (validation rate 96.1%). Variants with read coverage <50x
were considered inconclusive and were extracted from exome sequencing data.
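The validation calls and the reported validation rate can be illustrated with a short sketch. The function names are hypothetical, and applying the depth and VAF cut-offs per region is an assumption; only the thresholds (50x, 1%) and the counts (685 tested, 27 failed) come from the text.

```python
def validation_status(vaf, depth, min_depth=50, vaf_cutoff=0.01):
    """Classify one variant in one tumor region from amplicon data."""
    if depth < min_depth:
        return "inconclusive"  # the exome call is used instead
    if vaf < vaf_cutoff:
        return "absent"
    return "present"


def is_germline(germline_vaf, vaf_cutoff=0.01):
    """A variant is treated as germ-line when its germ-line VAF exceeds 1%."""
    return germline_vaf > vaf_cutoff


def validation_rate(n_tested, n_failed):
    """Percentage of tested mutations that were confirmed as somatic."""
    return 100.0 * (n_tested - n_failed) / n_tested


print(validation_status(0.005, 400))        # absent
print(round(validation_rate(685, 27), 1))   # 96.1
```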
Additionally, all variants and INDELs were manually reviewed using the Integrative Genomics
Viewer (IGV) (9) to determine those to be removed from further analysis.
The Ion Torrent data were additionally used to inform the filtering parameters for the
remaining tumors. Treating the results of the validation and manual review as a simple true
or false variant call, we optimized the filters to maximize our ability to retain true variants
and reject false variants in this training set, and applied those filters to the variants
from the tumors lacking Ion Torrent data. The resulting filters were set so that only exonic
and splice-site mutations with a maximum variant read count ≥ 5 and a maximum VAF ≥
5% were included in further analysis (maximum across the regions of the given tumor),
providing an overall accuracy of 91.1%.
Intratumor Heterogeneity Index
An Intratumor Heterogeneity (ITH) Index was generated for each tumor. This was
calculated firstly by determining the proportion of heterogeneous mutations relative to
the total number of mutations for each possible pairwise comparison of regions within the
tumor. The ITH index was then determined by calculating the mean value from the
resulting matrix of pairwise comparisons.
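As an illustration of the ITH index, the sketch below assumes that, for a pair of regions, a mutation is heterogeneous when it is detected in exactly one of the two, and that the total is the number of mutations detected in either region. The function name `ith_index` and the input structure are hypothetical.

```python
from itertools import combinations


def ith_index(presence):
    """presence: dict mapping region name -> set of mutation ids detected there."""
    ratios = []
    for a, b in combinations(sorted(presence), 2):
        union = presence[a] | presence[b]    # mutations seen in either region
        if not union:
            continue
        hetero = presence[a] ^ presence[b]   # mutations seen in only one region
        ratios.append(len(hetero) / len(union))
    # Mean over all pairwise comparisons.
    return sum(ratios) / len(ratios) if ratios else 0.0


example = {
    "R1": {"m1", "m2", "m3"},
    "R2": {"m1", "m2"},
    "R3": {"m1", "m4"},
}
print(ith_index(example))
```

For this toy input the three pairwise proportions are 1/3, 3/4 and 2/3, so the index is their mean.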
Copy number analysis
All data analysis was performed in the R statistical environment, version 3.0.2. Processed
sample exome SNP and copy number data from paired tumor-normal was generated using
VarScan2 (v2.3.6). VarScan2 copy number was run using default parameters with the
exception of min-coverage (8) and data-ratio. The data-ratio was calculated on a per-sample
basis as described in (7). Output from VarScan2 was processed using the Sequenza
R package 2.1.1 (11) to provide segmented copy number data and cellularity and ploidy
estimates for all samples based on the exome sequence data. The following settings were
used: breaks.method = 'full', gamma = 40, kmin = 5, gamma.pcf = 200, kmin.pcf = 200.
Manual verification was performed of the automatically selected models for ploidy and
cellularity, and for 6 cases the model fitting was re-run using the second most optimum
solution returned by Sequenza (samples EAC003 R3, EAC006 R3, EAC014 R4, EAC015 R3,
EAC017 R2, EAC017 R4), and for one case the third most optimum solution (EAC017 R3).
Processed copy number data for each sample was divided by the sample mean ploidy, and
log2 transformed. Gain and loss were defined as log2(2.5/2) and log2(1.5/2), respectively.
Amplification was defined as log2(4/2). For calling copy number aberrations, segments
smaller than 500 kb or containing fewer than 5 SNPs were removed. When evaluating if
regions of copy number gain and loss are ubiquitous, it is difficult to determine whether
copy number regions showing partial overlap between multiple samples of the same
tumor are ubiquitous or heterogeneous. To evaluate heterogeneity in copy number gain
and loss all parts of the genome were considered independently and split into minimum
consecutive segments of overlap within each tumor. Any segment of gain or loss that
overlapped across all regions was defined as ubiquitous and all other segments of copy
number aberrations as heterogeneous. Hence, if parts of a single segment showed both
ubiquitous and heterogeneous overlap between samples it was considered as two
segments, one heterogeneous and one ubiquitous (see Supplementary Fig. S11 illustrating
this method).
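The minimum consecutive segment approach can be sketched for a single chromosome as follows. This is an illustrative reconstruction under stated assumptions, not the authors' R code: segments are half-open intervals `(start, end, state)`, only aberrant segments are supplied, and the function names are hypothetical.

```python
def state_at(segments, start, end):
    """Return the aberration state covering [start, end), or None if no aberration."""
    for seg_start, seg_end, state in segments:
        if seg_start <= start and end <= seg_end:
            return state
    return None


def minimum_consecutive_segments(region_segments):
    """region_segments: dict region -> list of (start, end, state) aberrant segments."""
    # Pool every breakpoint from every region of the tumor.
    breakpoints = sorted({p for segs in region_segments.values()
                          for s, e, _ in segs for p in (s, e)})
    result = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        states = [state_at(segs, start, end) for segs in region_segments.values()]
        called = [s for s in states if s is not None]
        if not called:
            continue  # no aberration in any region over this interval
        # Ubiquitous: the same aberration is present in every region.
        ubiquitous = len(called) == len(states) and len(set(called)) == 1
        result.append((start, end, "ubiquitous" if ubiquitous else "heterogeneous"))
    return result


mcs = minimum_consecutive_segments({
    "R1": [(100, 500, "gain")],
    "R2": [(300, 700, "gain")],
})
print(mcs)
```

Here the partially overlapping gains in R1 and R2 are split into three segments, with only the shared middle interval classed as ubiquitous.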
Genome doubling was determined as previously described (12). wGII was determined as
described (13). Raw ESCA TCGA SNP data for calculation of wGII was downloaded from the
TCGA on 2014-10-21, and processed as described (14).
Cancer cell fraction estimation and cluster analysis
The cancer cell fraction and mutation copy number of each mutation were estimated by
integrating Sequenza-derived integer copy number and tumor purity estimates with the
VAF as outlined in Lohr et al (15) and Landau et al (16).
For each variant, the expected VAF, given the cancer cell fraction (CCF), can be calculated
as follows:
Expected VAF(CCF) = (p*CCF) / (CPNnorm*(1-p) + p*CPNmut)
where CPNmut corresponds to the local copy number of the tumor, p is the tumor
purity and CPNnorm is the local copy number of the matched normal sample. For a given
mutation with ‘a’ alternative reads, and a depth of ‘N’, the probability of a given CCF can be
estimated using a binomial distribution P(CCF) = binom(a|N, VAF(CCF)). CCF values can
then be calculated over a uniform grid of 100 CCF values (0.01,1) and subsequently
normalized to obtain a posterior distribution. Given that sex chromosomes were excluded
from this analysis CPNnorm was assumed to be 2.
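The grid-based CCF estimate can be sketched as below. This is a minimal illustration assuming a flat prior over the 100 grid points; the function names are hypothetical, and the binomial coefficient is omitted because it is constant across the grid and cancels on normalization.

```python
from math import exp, log


def expected_vaf(ccf, purity, cpn_mut, cpn_norm=2):
    """Expected VAF for a given cancer cell fraction."""
    return purity * ccf / (cpn_norm * (1 - purity) + purity * cpn_mut)


def ccf_posterior(alt_reads, depth, purity, cpn_mut, cpn_norm=2, n_grid=100):
    """Normalized binomial likelihood over a uniform CCF grid (0.01 to 1)."""
    grid = [(i + 1) / n_grid for i in range(n_grid)]
    log_like = []
    for ccf in grid:
        # Cap just below 1 so log(1 - v) is always defined.
        v = min(expected_vaf(ccf, purity, cpn_mut, cpn_norm), 1 - 1e-12)
        log_like.append(alt_reads * log(v) + (depth - alt_reads) * log(1 - v))
    peak = max(log_like)
    weights = [exp(x - peak) for x in log_like]  # subtract the peak for stability
    total = sum(weights)
    return grid, [w / total for w in weights]


# 15 alternative reads at depth 100, purity 0.6, local tumor copy number 2.
grid, posterior = ccf_posterior(15, 100, purity=0.6, cpn_mut=2)
best_ccf = grid[posterior.index(max(posterior))]
print(best_ccf)
```

With these inputs the expected VAF at CCF = 1 is 0.3, so an observed VAF of 0.15 is best explained by a CCF of 0.5.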
Similarly, the mutation copy number (the number of chromosomal alleles harboring the
mutation) can be calculated as follows:
Mutation copy number = (VAF/p)*((p*CNt)+CNn*(1-p))
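As a worked example of the mutation copy number formula, the sketch below evaluates it directly. The function and argument names are hypothetical; `cn_tumor` and `cn_normal` stand for CNt and CNn, and `purity` for p.

```python
def mutation_copy_number(vaf, purity, cn_tumor, cn_normal=2):
    """Number of chromosomal copies carrying the mutation, from the formula above."""
    return (vaf / purity) * (purity * cn_tumor + cn_normal * (1 - purity))


# VAF 0.30 at purity 0.6 in a diploid tumor segment:
# (0.3 / 0.6) * (0.6 * 2 + 2 * 0.4) = 0.5 * 2.0 = 1.0
print(mutation_copy_number(0.30, 0.6, 2))
```

A value close to 1 is consistent with a clonal mutation on a single copy, whereas values well below 1 suggest a subclonal mutation.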
In order to assess the reliability of Sequenza purity estimates we also calculated purity
estimates based on SNV variant allele frequency (VAF) profiles. In brief, given that the
majority of homogeneously identified mutations likely represent clonal events, we
extracted homogeneously identified mutations and identified the modal cell fraction
estimate within the largest cell fraction peak. Notably, we observed a highly significant
correlation between VAF purity estimates and Sequenza purity estimates (p < 2 × 10^-16,
Pearson’s r = 0.99). Given the concordance between the two, VAF purity estimates were
used in subsequent analysis.
In order to cluster CCF values, we evaluated all possible combinations of presence (CCF
>10%) and absence (CCF <10%) calls in tumor regions. Thus, for a given tumor with R
tumor regions, each cluster corresponded to a binary profile of length R (for example,
given 6 tumor regions, an SNV with the profile 101111 was defined as present in regions
R1, R3, R4, R5 and R6, but not R2, and was grouped with all other SNVs with the same
binary profile). The SNV cluster with a binary profile consisting entirely of 1s represents
mutations homogeneously found in all tumor regions. Only SNV clusters with at least 5
SNVs were considered. Notably, although each SNV cluster may harbor both clonal and
subclonal mutations the vast majority of mutations within each cluster were found to be in
agreement, indicating each cluster generally corresponds to a clonal or subclonal
mutation cluster.
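The binary-profile grouping can be sketched as follows. The function name and input structure are hypothetical, and because the text does not state how a CCF of exactly 10% is handled, this sketch treats it as absent.

```python
from collections import defaultdict


def cluster_by_profile(ccf_by_snv, regions, threshold=0.10, min_size=5):
    """Group SNVs by their presence/absence profile across tumor regions.

    ccf_by_snv: dict mapping SNV id -> dict of region -> CCF estimate.
    """
    clusters = defaultdict(list)
    for snv, ccfs in ccf_by_snv.items():
        # '1' where CCF > 10%, otherwise '0' (assumption at exactly 10%).
        profile = "".join("1" if ccfs[r] > threshold else "0" for r in regions)
        clusters[profile].append(snv)
    # Keep only clusters containing at least five SNVs.
    return {p: snvs for p, snvs in clusters.items() if len(snvs) >= min_size}


regions = ["R1", "R2", "R3"]
toy = {f"snv{i}": {"R1": 0.9, "R2": 0.02, "R3": 0.8} for i in range(5)}
toy.update({f"trunk{i}": {"R1": 1.0, "R2": 1.0, "R3": 1.0} for i in range(2)})
clusters = cluster_by_profile(toy, regions)
print(sorted(clusters))
```

In this toy input the two all-region SNVs form a cluster that is too small to retain, so only the `101` cluster is returned.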
Identification of SNV heterogeneity driven by copy number alterations
SNVs were filtered in order to remove those whose absence, or low CCF values, may be
driven by copy number events. For each tumor we identified any SNV residing in genomic
segments of copy number heterogeneity across tumor regions, with minor and major copy
number aberrations considered separately. For each chromosome, we grouped mutations
into non-contiguous genomic segments with consistent copy number states within tumor
regions and within SNV clusters defined above. In order to restrict our analysis to
mutations lost in at least one tumor region, we determined the median CCF value of each
SNV group, and only considered SNV groups where the median CCF value was ≤ 0.25 in at
least one tumor region. We then evaluated whether copy number loss coincided with
lower CCF levels using a one-sided Wilcoxon test or, if more than two copy number states
were present across tumor regions, a one-sided Cochran-Armitage trend test. To ensure
the lower CCF value was driven by copy number and not tumor region, we also
implemented a regression analysis, including both copy number and region in the model.
In total, across all tumor regions, 100 mutations were filtered as being driven by copy
number change.
Phylogenetic tree construction
All non-silent mutations that passed validation (EAC001, EAC003 and EAC005) or further
filtering (EAC006, EAC009, EAC014, EAC015 and EAC017) were considered for the purpose
of determining phylogenetic trees. Trees were built using binary presence/absence
matrices built from the regional distribution of variants within the tumor. The R
Bioconductor package phangorn (1.99-7) (17) was utilized to perform the parsimony
ratchet method (18) generating unrooted trees. Branch lengths were determined using the
acctran function.
Identification and classification of driver mutations
All non-silent variants were compared against a list of potential driver genes (n=598). The
driver gene list comprised all genes identified in the COSMIC cancer gene census
(June 2014) (19), plus those identified in large scale pan-cancer analyses (using q < 0.05 as
cut-off) (20), and previous esophageal sequencing studies (21). Any variants that were
located within one of these genes underwent categorization based on pre-set criteria. If
the gene was annotated as being recessive by COSMIC (tumor suppressor), and the variant
was deemed to be deleterious (either a stop-gain or predicted deleterious in two of the
three computational approaches applied – SIFT (22), PolyPhen (23) and MutationTaster
(24)), then the specific variant would be classed as Category 1 (high confidence driver
mutation). If it failed to reach these criteria, the proximity to mutations annotated in
COSMIC (data obtained February 2015) was determined. If ≥ 3 COSMIC mutations were
located within 15bp, the variant was classed as Category 2 (putative driver mutation). If not
then it would be classified as Category 3 (low confidence driver mutation). Alternatively, if
the variant was found in a gene annotated by COSMIC as dominant (oncogene), then we
sought to identify exact matches to the specific variant in COSMIC. If an exact match was
found ≥3 times, the variant was classed as Category 1. If exact matches were not found,
then the same criteria as described for the tumor suppressor genes above was applied to
class the variant as Category 2 or 3. Finally, if the driver gene had not been classified as
either an oncogene or tumor suppressor, then all tests described above were applied and
if it passed, the variant would be classed as Category 2, otherwise Category 3. All remaining
variants were classed as Category 4 and represented as variants of unknown significance.
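The categorization rules above can be summarized as a small decision function. This is a sketch under stated assumptions rather than the authors' implementation: the function and argument names are hypothetical, oncogene variants without a recurrent exact match are assumed to fall back on the 15 bp proximity test, and for unclassified driver genes passing any one of the tests is assumed to give Category 2.

```python
def classify_driver(role, stop_gain=False, n_deleterious=0,
                    exact_cosmic_matches=0, cosmic_within_15bp=0):
    """Return the driver category (1 to 4) for a non-silent variant.

    role: 'tsg', 'oncogene', 'unclassified', or None if the gene is not
    in the driver gene list.
    """
    if role is None:
        return 4  # variant of unknown significance
    deleterious = stop_gain or n_deleterious >= 2   # 2 of 3 predictors
    recurrent_exact = exact_cosmic_matches >= 3     # exact COSMIC match >= 3 times
    near_cosmic = cosmic_within_15bp >= 3           # >= 3 COSMIC mutations within 15 bp
    if role == "tsg":
        if deleterious:
            return 1
        return 2 if near_cosmic else 3
    if role == "oncogene":
        if recurrent_exact:
            return 1
        return 2 if near_cosmic else 3
    # Unclassified driver gene: any passing test gives Category 2 (assumption).
    return 2 if (deleterious or recurrent_exact or near_cosmic) else 3


print(classify_driver("tsg", stop_gain=True))              # 1
print(classify_driver("oncogene", cosmic_within_15bp=4))   # 2
```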
Temporal dissection of mutations
For each M-seq tumor, we classified each mutation as ‘early’ or ‘late’ based on whether it
was located on the trunk or branch of the phylogenetic tree. All truncal mutations were
classified as ‘early’ and any branch mutations as ‘late’. Chi-square tests were used to
compare the mutation spectra of the six mutation types (C>A, C>G, C>T, T>A, T>C, T>G). A
two-sided Fisher’s exact test was used to compare the relative frequency of each mutation
type between early and late variants. Additionally, we specifically sought to determine the
significance of the enrichment of T>G mutations in the CpTpT context compared with
genomic background in both early and late, adapting the method described in (25, 26).
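The six mutation types follow the usual convention of reporting each substitution from the pyrimidine of the reference base pair, so that, for example, G>T is counted as C>A. A minimal sketch of that collapsing step, with hypothetical function names, is given below; the statistical tests themselves are not reproduced here.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_TYPES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref, alt):
    """Collapse a substitution onto its pyrimidine-reference equivalent."""
    if ref in "AG":  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(snvs):
    """Count SNVs, given as (ref, alt) pairs, in each of the six classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    return {t: counts.get(t, 0) for t in SIX_TYPES}


spectrum = mutation_spectrum([("G", "T"), ("C", "A"), ("A", "C"), ("T", "C")])
print(spectrum)
```

Spectra built in this way for early and late (or pre- and post-treatment) variants provide the counts that enter the chi-square and Fisher's exact tests.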
The variants were also split according to their occurrence pre-treatment and post-
treatment with platinum chemotherapy. Variants were classed as post-treatment specific if
they were absent from all regions extracted from pre-chemotherapy i.e. exclusive to the
post-treatment samples. As with the early versus late comparison, chi-square tests were
used to compare the mutation spectra of the six mutation types and two-sided Fisher’s
exact test to compare the relative frequency of each mutation type between pre- and post-
chemotherapy. Finally, we tested for the presence of a platinum signature in the post-
treatment regions as described below.
Detecting a platinum mutation pattern
Previous work (27) has demonstrated the propensity for platinum exposure to lead to C>A
(G>T) mutations within a CpC (GpG) context. To detect a platinum mutation pattern the
methods outlined for APOBEC enrichment were adapted (26). The enrichment E_CpC, relating
to the strength of mutagenesis at the CpC motif across the genome, was calculated as
follows:
E_CpC = (mutationsCpC * contextC) / (mutationsC * contextCpC)
where mutationsCpC is the number of mutated cytosines (and guanines) falling in a CpC (or
GpG) dinucleotide, mutationsC (or G) is the total number of mutated cytosines (or
guanines), contextCpC is the total number of CpC (or GpG) dinucleotides within a 41-base
region centered on the mutated cytosines (and guanines) and contextC (or G) is the total
number of cytosines (or guanines) within the 41 base region centered on the mutated
cytosines (or guanines). A two-sided Fisher’s exact test was used to determine if an over-
representation of platinum signature mutations was present in each sample. The test
compared the ratio of the number of cytosine-to-adenine and guanine-to-thymine
substitutions that occurred in and out of the CpC (GpG) platinum target dinucleotide to an
analogous ratio for all cytosines and guanines that reside inside and outside of the CpC or
GpG dinucleotide within the 41-base region centered on the mutated cytosine (or guanine),
representing the genomic background. P-values were adjusted using Benjamini-Hochberg
multiple test correction. A two-tailed Fisher’s exact test was performed to compare
platinum enrichment between the pre- and post-chemotherapy samples.
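The enrichment ratio defined above can be illustrated with a short sketch. The function names are hypothetical, and only the ratio and the context counting for one sequence window are shown; the Fisher's exact tests and multiple-test correction are not reproduced.

```python
def count_context(window):
    """Count C/G bases and CpC/GpG dinucleotides in one sequence window."""
    window = window.upper()
    bases = sum(window.count(b) for b in "CG")
    # Overlapping dinucleotides are counted at every position.
    dinucleotides = sum(1 for i in range(len(window) - 1)
                        if window[i:i + 2] in ("CC", "GG"))
    return bases, dinucleotides


def cpc_enrichment(mutations_cpc, mutations_c, context_cpc, context_c):
    """E = (mutationsCpC * contextC) / (mutationsC * contextCpC)."""
    return (mutations_cpc * context_c) / (mutations_c * context_cpc)


print(count_context("ACCGGT"))              # (4, 2)
# 30 of 60 mutated C/G bases fall in CpC/GpG, against a background in which
# 250 CpC/GpG dinucleotides and 1000 C/G bases lie within the 41-base windows.
print(cpc_enrichment(30, 60, 250, 1000))    # 2.0
```

In practice the context counts would be summed over the 41-base windows centered on each mutated cytosine or guanine; an enrichment above 1 indicates that mutations fall in the CpC context more often than the local sequence composition would predict.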
To further assess the impact of platinum exposure on the mutational content of the tumor
regions, the indel content of the pre-chemotherapy and post-chemotherapy regions was
compared. The maximum number of exonic indels, classed as ‘high confidence’ by
VarScan2 processSomatic, within a single pre-chemotherapy region for a given tumor was
obtained and similarly the maximum number identified within a single post-
chemotherapy region was identified. These values, obtained for each tumor, were
compared using a paired Wilcoxon signed rank test.
We were unable to assess the previously identified specific platinum dinucleotide
substitution signature (CpT > ApC) due to the absence of confident dinucleotide
substitutions within this cohort.
REFERENCES FOR METHODS
1. Mandard AM, Dalibard F, Mandard JC, Marnay J, Henry-Amar M, Petiot JF, et al.
Pathologic assessment of tumor regression after preoperative chemoradiotherapy of
esophageal carcinoma. Clinicopathologic correlations. Cancer. 1994;73:2680-6.
2. Gerlinger M, Horswell S, Larkin J, Rowan AJ, Salm MP, Varela I, et al. Genomic
architecture and evolution of clear cell renal cell carcinomas defined by multiregion
sequencing. Nature genetics. 2014;46:225-33.
3. Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al.
Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N
Engl J Med. 2012;366:883-92.
4. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The
Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA
sequencing data. Genome research. 2010;20:1297-303.
5. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics. 2009;25:1754-60.
6. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence
Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078-9.
7. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2:
somatic mutation and copy number alteration discovery in cancer by exome sequencing.
Genome research. 2012;22:568-76.
8. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants
from high-throughput sequencing data. Nucleic acids research. 2010;38:e164.
9. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al.
Integrative genomics viewer. Nature biotechnology. 2011;29:24-6.
10. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, et al. Sequence-
specific error profile of Illumina sequencers. Nucleic acids research. 2011;39:e90.
11. Favero F, Joshi T, Marquard AM, Birkbak NJ, Krzystanek M, Li Q, et al. Sequenza:
allele-specific copy number and mutation profiles from tumor sequencing data. Annals of
oncology : official journal of the European Society for Medical Oncology / ESMO.
2015;26:64-70.
12. Dewhurst SM, McGranahan N, Burrell RA, Rowan AJ, Gronroos E, Endesfelder D, et
al. Tolerance of whole-genome doubling propagates chromosomal instability and
accelerates cancer genome evolution. Cancer discovery. 2014;4:175-85.
13. Burrell RA, McClelland SE, Endesfelder D, Groth P, Weller MC, Shaikh N, et al.
Replication stress links structural and numerical cancer chromosomal instability. Nature.
2013;494:492-6.
14. Birkbak NJ, Wang ZC, Kim JY, Eklund AC, Li Q, Tian R, et al. Telomeric allelic
imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer
Discov. 2012;2:366-75.
15. Lohr JG, Stojanov P, Carter SL, Cruz-Gordillo P, Lawrence MS, Auclair D, et al.
Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy.
Cancer cell. 2014;25:91-101.
16. Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, Lawrence MS, et al.
Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell.
2013;152:714-26.
17. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27:592-3.
18. Nixon KC. The Parsimony Ratchet, a new method for rapid parsimony analysis.
Cladistics. 1999;15:407-14.
19. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of
human cancer genes. Nature reviews Cancer. 2004;4:177-83.
20. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al.
Discovery and saturation analysis of cancer genes across 21 tumour types. Nature.
2014;505:495-501.
21. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and
whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver
events and mutational complexity. Nature genetics. 2013;45:478-86.
22. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nature protocols. 2009;4:1073-81.
23. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A
method and server for predicting damaging missense mutations. Nature methods.
2010;7:248-9.
24. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation
prediction for the deep-sequencing age. Nature methods. 2014;11:361-2.
25. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, et al. Spatial and
temporal diversity in genomic instability processes defines lung cancer evolution. Science.
2014;346:251-6.
26. Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P, et al. An
APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature
genetics. 2013;45:970-6.
27. Meier B, Cooke SL, Weiss J, Bailly AP, Alexandrov LB, Marshall J, et al. C. elegans
whole-genome sequencing reveals mutational signatures related to carcinogens and DNA
repair deficiency. Genome research. 2014;24:1624-36.
Supplementary Table S1. Sequencing coverage for tumor regions and germline samples
Tumor    Region  Type      Mean coverage  Median coverage
EAC005   R1      Pre       110.51         101
EAC005   R2      Pre       99.61          90
EAC005   R3      Pre       116.3          107
EAC005   R4      Post      90.8           80
EAC005   R5      Post      79.63          70
EAC005   R6      Post      93.29          78
EAC005   GL      Germline  101.89         80
EAC015   R1      Pre       75.18          68
EAC015   R2      Post      106.52         97
EAC015   R3      Post      141.53         129
EAC015   R4      Post      137.5          126
EAC015   R5      Post      114.41         105
EAC015   GL      Germline  93.47          86
EAC001   R1      Pre       95.44          81
EAC001   R2      Pre       77.17          68
EAC001   R3      Pre       89.76          79
EAC001   R4      Pre       103.14         89
EAC001   R5      Post      84.18          73
EAC001   R6      Post      106.76         89
EAC001   R7      Post      76.29          69
EAC001   R8      Post      87.24          79
EAC001   GL      Germline  95.04          87
EAC006   R1      Pre       210.48         191
EAC006   R2      Pre       141.26         128
EAC006   R3      Post      189.62         165
EAC006   R4      Post      84.08          73
EAC006   R5      Post      105.55         94
EAC006   GL      Germline  92.84          85
EAC003   R1      Pre       108.55         91
EAC003   R2      Pre       97.14          83
EAC003   R3      Pre       73.02          64
EAC003   GL      Germline  142.26         132
EAC014   R1      Pre       77.7           68
EAC014   R2      Pre       96.26          84
EAC014   R3      Pre       113.45         101
EAC014   R4      Pre       121.75         108
EAC014   GL      Germline  45.8           42
EAC017   R1      Pre       119.44         105
EAC017   R2      Pre       117.15         104
EAC017   R3      Post      62.02          56
EAC017   R4      Post      121.83         104
EAC017   R5      Post      108.37         92
EAC017   R6      Post      104.11         90
EAC017   GL      Germline  116.13         106
EAC009   R1      Pre       90.14          82
EAC009   R2      Pre       116.33         105
EAC009   R3      Pre       136.46         123
EAC009   GL      Germline  136.59         123
Supplementary Table S2. Clinical characteristics of patients with EAC
Abbreviations: EAC; esophageal adenocarcinoma. ECX; epirubicin, cisplatin and capecitabine, EOX; epirubicin, oxaliplatin and capecitabine.
Identity  Gender  Pre-op TNM Staging  Pre-op Stage  Pathology       Mandard Score  Path TNM Stage  Post-op Stage  Response    Neoadj chemo  No. of cycles  Outcome
EAC005    Male    T2N1                2B            Adenocarcinoma  5              pT3N2           3B             Upstaged    ECX           2              Poor
EAC015    Male    T3N1                3A            Adenocarcinoma  5              pT4aN3          3C             Upstaged    ECX           3              Poor
EAC001    Male    T3N0                3A            Adenocarcinoma  5              pT3N0           2B             Same        ECX           4              Intermediate
EAC006    Male    T1N0                1B            Adenocarcinoma  3              pT1N1           2B             Upstaged    EOX           3              Intermediate
EAC014    Male    T3N1                3A            Adenocarcinoma  4              pT1bN1          2B             Downstaged  ECX           1              Intermediate
EAC003    Male    T3N1                3A            Adenocarcinoma  3              pT3N1           3A             Same        ECX           3              Good
EAC017    Male    T3N1                3A            Adenocarcinoma  3              pT2N1           2B             Downstaged  EOX           3              Good
EAC009    Male    T4N2                3C            Adenocarcinoma  2              pT1bN1          2B             Downstaged  ECX           3              Good
Supplementary Table S4 Annotation of chromosomal segments that overlap with TCGA ESCA recurrent amplifications and tumor regions in our EAC cohort.
Abbreviations: EAC; esophageal adenocarcinoma, ESCA; esophageal cancer.
Chromosome  Start      End        # Genes  Gistic Reg  Size (bp)  Cytoband Range
chr1        145414549  171983088  548      chr1q23.3   26568539   chr1p12-q24.3
chr6        43738144   44153635   7        chr6p21.1   415491     chr6p21.1
chr7        54942676   57316916   27       chr7p11.2   2374240    chr7p11.2
chr7        92730862   94052663   12       chr7q21.2   1321801    chr7q21.2-q21.3
chr8        11355603   12580568   28       chr8p23.1   1224965    chr8p23.1
chr8        128702020  128825001  3        chr8q24.21  122981     chr8q24.21
chr11       33902613   36458777   19       chr11p13    2556164    chr11p13
chr11       69647998   69924352   2        chr11q13.3  276354     chr11q13.3
chr12       24982722   25801682   6        chr12p1     818960     chr12p12.1
chr12       69499276   70672267   14       chr12q15    1172991    chr12q15
chr13       73630948   84107781   43       chr13q22.1  10476833   chr13q22.1
chr14       35399861   39901572   36       chr14q21.1  4501711    chr14q13.2-q21.1
chr17       37875893   38020421   4        chr17q12    144528     chr17q12
chr18       19613596   19853474   2        chr18q11.2  239878     chr18q11.2
chr19       30243979   31770851   4        chr19q12    1526872    chr19q12
Supplementary Figure S1. Mutations identified using M-seq compared with a single
biopsy. Median and interquartile range are indicated by horizontal black lines. M-seq;
multi-region exome sequencing.
Supplementary Figure S2. Relationship of intratumor heterogeneity and response to
NAC treatment. Spearman rho is indicated. NAC; neoadjuvant chemotherapy.
Supplementary Figure S3. Tumor phylograms. Phylograms were inferred using a
parsimony ratchet approach. The phylograms are presented to scale, with the number of
mutations as evolutionary distance. Uncertainties assessed by bootstrap tests are indicated
next to the nodes. GL indicates germline. Scale bar indicates the number of mutations.
Supplementary Figure S4. Copy number events lead to mutational heterogeneity. For
example, in EAC015 on chromosome 2, mutations present in other tumor regions but not
present in region R1 are likely absent due to complete loss of one chromosomal copy. Only
mutations potentially explained by copy number events are depicted (grey). Major allele
copy number is shown with a black line and the minor allele copy number with a green
line.
Supplementary Figure S5. Heatmap of the distribution of all the driver mutations
identified in the EAC cohort. The putative driver mutations were classified as tumor
suppressor genes or oncogenes as reported in the COSMIC cancer gene census (17); this is
indicated in dark green and light green respectively. Ubiquitously detected mutations
(present in all tumor regions) are indicated in dark blue and heterogeneous mutations
(present in one or more tumor regions but not all) are indicated in orange. Driver
mutations that confer an illusion of clonality are indicated in the right column in dark blue.
Supplementary Figure S6. Driver mutations identified using M-seq compared with a
single biopsy. Median and interquartile range are indicated by horizontal black lines. M-
seq; multi-region exome sequencing.
Supplementary Figure S7. Copy number states across the genome for all tumor regions.
Gains (+1 copy number relative to ploidy) are depicted in orange, losses (-1 copy number
relative to ploidy) are depicted in blue and amplifications (x2 ploidy) are depicted in red.
Supplementary Figure S8. Chromosome view of chromosome 19, tumor sample EAC017,
region R1, demonstrating chromothripsis. (A) Mutant allele fraction, each dot indicates a
mutation. (B) B-allele fraction based on SNPs detected in the region. (C) Depth ratio of the
tumor relative to the paired normal sample. Within each window, a thick black line
indicates the median value, and a blue bar indicates the interquartile range. Red lines
indicate segmented values. The thin dotted lines indicate the expected copy number
values under the fitted model.
Supplementary Figure S9. wGII scores for the TCGA ESCA cohort and M-seq EAC cohort.
Grey, TCGA samples. Red, pre-chemotherapy tumor regions. Green, post-chemotherapy
tumor regions. Median and interquartile range are indicated by horizontal black lines.
ESCA; esophageal cancer, EAC; esophageal adenocarcinoma.
Supplementary Figure S10. The trinucleotide context for temporally dissected combined
EAC cohort. Each of the 96 substitution classes is defined by the mutation type and the
sequence context immediately 5’ and 3’ to the mutated base. The mutation types are shown on
the top horizontal bar; the vertical axes depict the percentage of mutations attributed to a
specific mutation type.
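The count of 96 follows from six pyrimidine-referenced substitution types multiplied by four possible 5’ bases and four possible 3’ bases. The sketch below enumerates the classes and folds a purine-referenced mutation onto the complementary strand. The bracket notation (for example `A[C>T]G`) and the function names are assumptions made for illustration and do not come from the source.

```python
from itertools import product

BASES = "ACGT"
# The six substitution types shown on the top bar of the figure.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_classes():
    """Enumerate all 6 x 4 x 4 = 96 trinucleotide substitution classes."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def classify(five: str, ref: str, alt: str, three: str) -> str:
    """Map a substitution and its flanking bases to one of the 96 classes.

    Mutations whose reference base is a purine (A or G) are reverse
    complemented so that the reference base is always C or T.
    """
    if ref in "AG":
        five, ref, alt, three = (
            three.translate(COMPLEMENT),
            ref.translate(COMPLEMENT),
            alt.translate(COMPLEMENT),
            five.translate(COMPLEMENT),
        )
    return f"{five}[{ref}>{alt}]{three}"
```

For instance, a G>A change flanked by 5’ C and 3’ T is counted in the same class as a C>T change flanked by 5’ A and 3’ G.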
Supplementary Figure S11. Assessing copy number heterogeneity using a minimum
consecutive segment method. This figure shows mock segmentation data with copy
number aberrations present in four regions from a single mock patient. Panel A shows the
segmentation data, Panel B shows the data after the transformation into minimum
consecutive segments (MCS). The start and end positions of all copy number aberrations
from each tumor region (short vertical black lines in Panel A) are introduced to all
regions within the same tumor sample, as shown in Panel B with long black vertical lines.
The original genomic segments are thereby split into smaller segments, giving the MCS. If an
area of overlap present in all tumor regions contained the same copy number aberration
(a loss, gain or amplification), as indicated in Panel B, that MCS was identified as
ubiquitous; all other MCS with copy number aberrations were defined as
heterogeneous.
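The minimum consecutive segment transformation described in this legend can be sketched as follows. This is a minimal illustration on mock data, assuming a single chromosome, non-overlapping aberrant segments within each region, and half-open coordinates. The data layout and the names `state_at` and `minimum_consecutive_segments` are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, Optional, Tuple

# (start, end, state) with state in {"loss", "gain", "amplification"};
# coordinates are half-open [start, end) on one chromosome (assumption).
Segment = Tuple[int, int, str]


def state_at(segments: List[Segment], start: int, end: int) -> Optional[str]:
    """Return the aberration covering [start, end) in one region, if any."""
    for seg_start, seg_end, state in segments:
        if seg_start <= start and end <= seg_end:
            return state
    return None  # normal copy number in this region


def minimum_consecutive_segments(
    regions: Dict[str, List[Segment]]
) -> List[Tuple[int, int, str]]:
    """Split segments at every breakpoint seen in any region and label each MCS."""
    # Pool the start and end positions of all aberrations from all regions.
    breakpoints = sorted(
        {pos for segs in regions.values() for s, e, _ in segs for pos in (s, e)}
    )
    labelled = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        states = [state_at(segs, start, end) for segs in regions.values()]
        if all(state is None for state in states):
            continue  # no aberration in any region: not an aberrant MCS
        # Ubiquitous only if every region carries the same aberration.
        label = "ubiquitous" if len(set(states)) == 1 else "heterogeneous"
        labelled.append((start, end, label))
    return labelled


mock_patient = {
    "R1": [(10, 50, "loss")],
    "R2": [(20, 50, "loss")],
    "R3": [(30, 50, "loss")],
    "R4": [(30, 60, "loss")],
}
mcs = minimum_consecutive_segments(mock_patient)
```

With this mock input, only the interval from 30 to 50 is shared by all four regions, so it is the only MCS labelled ubiquitous; the three flanking MCS are heterogeneous.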
Supplementary Figure S1
[Figure: number of non-silent mutations (y-axis, 100 to 300) identified by multi-region sequencing versus a single biopsy (x-axis). Tumors shown: EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017, EAC009.]
Supplementary Figure S2
[Figure: ITH index (y-axis, 0.1 to 0.4) plotted against response (x-axis: Good, Intermediate, Poor). Spearman rho = 0.93. Tumors shown: EAC005, EAC001, EAC006, EAC014, EAC003, EAC017, EAC009.]
Supplementary Figure S3
[Figure: phylograms for tumors EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009. Tips are tumor regions (R1 to R8) and germline (GL); bootstrap values are shown next to the nodes, and each tree has a scale bar. Node values and scale bar lengths are not recoverable from the extracted text.]
Supplementary Figure S4
[Figure: four panels, R1_PreChemo, R2_PostChemo, R3_PostChemo and R5_PostChemo. Y-axis: copy number (0 to >5). X-axis: chromosomes 1 to 22.]
Supplementary Figure S5
[Figure: heatmap of driver genes (rows) across tumors EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017 and EAC009 (columns), with annotation columns "TSG or OG" and "Illusion of clonality".
Driver genes: TLL1, TLR4, MTOR, EYS, NUAK1, GNPTAB, FAT1, BCL6, STAT3, CBFA2T3, OLIG2, KCNJ5, CCDC6, EGFR, KLF4, TPR, HIST1H4I, RPL22, TCF7L2, MN1, ACSL3, CDH11, BRD4, PIK3R1, PALB2, CIC, ATRX, DOCK2, SLC39A12, PTPRC, HIP1, PML, NOTCH1, ATM, SMAD4, SCN10A, AJAP1, SPG20, SYK, CHN1, SETBP1, MAML2, BRAF, PER1, HOXD13, MYH11, OMD, EWSR1, MYCL, ZNF331, CLTC, ROS1, PDGFRA, KIT, NKX2-1, KAT6A, BRIP1, BRCA1, NF1, DICER1, CDKN2A, CDH1, EXT2, AXIN1, PBRM1, SYNE1, AKAP6, NTRK3, TP53.
Legend: classification of driver gene (tumor suppressor, oncogene, unclassified); distribution of driver gene (ubiquitous, heterogeneous); illusion of clonality (present, absent).]
Supplementary Figure S6
[Figure: number of driver mutations (y-axis, 5 to 20) identified by multi-region sequencing versus a single biopsy (x-axis). Tumors shown: EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017, EAC009.]
Supplementary Figure S7
[Figure: copy number states by chromosome (x-axis, chromosomes 1 to 22) for each tumor sample (y-axis: EAC005, EAC015, EAC001, EAC006, EAC014, EAC003, EAC017, EAC009). Rows are individual tumor regions (R1 to R6), labelled as pre-chemotherapy (Pre) or post-chemotherapy (Post).]
Supplementary Figure S8
[Figure: chromosome 19 (chr19), x-axis position 0 to 50 Mb. Panels from top to bottom: mutant allele frequency (0.0 to 1.0), with mutations colored by substitution type (A>C, T>G; A>G, T>C; A>T, T>A; C>A, G>T; C>G, G>C; C>T, G>A); B allele frequency (0.0 to 0.5); depth ratio (0.0 to 2.5); copy number (0 to 9).]
Supplementary Figure S9
[Figure: wGII (y-axis, 0.0 to 0.8) for four groups: TCGA; M-seq all tumor regions; M-seq pre-chemotherapy tumor regions; M-seq post-chemotherapy tumor regions.]
Supplementary Figure S10
[Figure: two panels, Early and Late. Y-axis: percentage of total mutations (0 to 20). X-axis: trinucleotide contexts grouped by mutation type (C > A, C > G, C > T, T > A, T > C, T > G).]
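Supplementary Figure S11
[Figure: Panels A and B, each showing tumor regions R1 to R4. Legend: copy number aberration segment; normal copy number segment. In Panel B the minimum consecutive segments are labelled H, H, U, H (H = heterogeneous, U = ubiquitous), and the area of overlap is marked.]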