Identification of Cancer Susceptibility Loci by High-Resolution Cancer Gene Microarray Analysis
by
Jonathan Trick
A thesis submitted in conformity with the requirements for the
degree of Master of Science
Department of Medical Biophysics
University of Toronto
©Copyright by Jonathan Trick (2014)
Identification of Cancer Susceptibility Loci by High-Resolution Cancer
Gene Microarray Analysis
Jonathan Trick
Master of Science
Department of Medical Biophysics
University of Toronto
2014
Abstract
Li-Fraumeni syndrome (LFS) is a highly penetrant familial cancer syndrome associated with
inherited germline mutations in TP53. However, a causative gene has not been identified in
~25% of LFS and many LFS-like families.
In this work we aim to discover candidate genes that contribute to the highly
penetrant cancer predisposition observed in these families. We designed an
ultra-high-resolution CGH array to interrogate genes implicated in cancer. We detected alterations in
genes such as MSH2 and PTCH1, which were successfully validated by qPCR. Sequencing of PTCH1
also led to the detection of a novel substitution in one family. Upon further
investigation, we found that the proband harbouring the MSH2 deletion belongs to a kindred
that satisfies the criteria for Lynch syndrome. Both the methodology and the results of this
study can be built upon in future work to identify additional causative genes in
LFS.
Acknowledgements
There are a number of people to whom I owe the completion of this thesis. The first,
as in all things, is my mother, who raised my brother and me alone in very difficult
circumstances but always reinforced the value of education. My supervisor, Dr. David
Malkin, is someone I can truly say is a great role model for any aspiring scientist. I am very
grateful for both the guidance and patience he has shown me throughout my time in his lab.
My committee members, Dr. Susan Done and Dr. Stephen Meyn, were also a great help
throughout the project offering sound advice along the way and providing insightful
feedback during the writing of this thesis.
The members of the Malkin lab provided not only scientific support, but also an
enjoyable environment in which to work. Ana Novokmet and Margaret Pienkowska in
particular were especially helpful. This project was only completed thanks to the
collaboration of many people including the staff at the Centre for Applied Genomics, the
staff of the Molecular Genetics Laboratory at the Hospital for Sick Children, and the lab of
Dr. Cynthia Hawkins.
I also owe thanks not only to my supervisor, but also to both my academic department of Medical
Biophysics and my chain of command in the Canadian Armed Forces for being so
accommodating. I consider myself uniquely fortunate to be surrounded by so many people
willing to make my goals a priority. Finally, I must thank my family and friends, especially
Zainab Motala, Don Oakie, James Tran, Matthew Mistry, Jordan Jarvis, Jonathan Fuller,
Jordan John, and my brother Michael, for their unconditional support.
Table of Contents
Abstract
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
Chapter 1: Introduction and Background
1.1 Li-Fraumeni Syndrome
1.1.1 Overview
1.1.2 Cancer Risk Patterns in LFS Families
1.1.3 Genetic Etiology of Li-Fraumeni Syndrome
1.1.4 Genetic Modifiers
1.1.5 The Role of Other Genes in LFS
1.1.6 Epigenetics and Li-Fraumeni Syndrome
1.2 Hereditary Cancer Predisposition Syndromes
1.2.1 Overview
1.2.2 Gorlin Syndrome
1.2.3 Lynch Syndrome
1.3 TP53
1.3.1 Overview
1.3.2 Transcriptional Regulation of TP53
1.3.3 TP53 and MDM2
1.3.4 Post Translational Modifications and the TP53 Response
1.3.5 TP53-mediated Cell Cycle Arrest
1.3.6 TP53-mediated Apoptosis
1.4 Copy Number Variation
1.4.1 Overview
1.4.2 Generation of Copy Number Variants
1.4.3 Germline CNVs in Disease
1.4.4 CNVs in Cancer predisposition
1.5 Rationale
Chapter 2: Materials and Methods
2.1 Genes of Interest
2.2 Array Design
2.2.1 Design Overview
2.2.2 Design of Exonic Probes
2.2.3 Genomic Probes
2.2.4 Non-coding Exon Probes
2.2.5 Promoter Region Probes
2.2.6 Gene Intron Probes
2.2.7 Finalizing the Array
2.3 Samples
2.3.1 Sample Selection
2.3.2 Subject Recruitment
2.4 Analysis and Validation of CGH Array Data
2.4.1 Custom CGH Array Analysis
2.4.2 Quantitative PCR validation
2.4.3 TaqMan PTCH1 Validation
2.4.3 Sequencing of PTCH1
2.4.4 Mismatch Repair Gene Mutation Screening
Chapter 3: Results
3.1 Array Results
3.1.1 Candidate Genes Found by Custom Array CGH
3.1.2 Significant Alterations Found in Single Samples
3.1.3 Difficulty Identifying Genuine Copy Number Alterations with Custom Probes
3.1.4 Genetic evidence of anticipation
3.2 Validation of Candidate Genes
3.2.1 MSH2 involved in a Li-Fraumeni-Like phenotype
3.2.2 Large copy number gains in sample 23 likely an artifact
3.2.3 Confirmation of custom array’s ability to detect previously unknown alterations
3.2.4 DICER1 is copy number neutral in the patient cohort
3.2.5 PTCH1 validation
4.1 Discussion on high-resolution genomic analysis in LFS
4.1.1 Utility of custom CGH arrays for detecting novel copy number alterations in the germline
4.1.2 Next Generation Sequencing
4.1.3 Next Generation Sequencing and Cancer Syndromes
4.2 Potential Genetic Evidence of Anticipation
4.3 Discovery of a Lynch Syndrome Kindred
4.4 PTCH1 Associated with the LFS phenotype
4.4.1 Deletions in PTCH1 isoforms associated with the LFS phenotype
4.4.2 A Novel Variant in an LFS-L Family Affected by Syndactyly
4.5 Management of Cancer Predisposition Syndromes
4.6 Concluding Remarks
References
Supplementary Information
Supplementary Materials and Methods
Sanger Genes
Additional Genes
qPCR Supplementary Information
qPCR Protocol
qPCR Primers
Supplementary Results
Regions of Interest after CGH array analysis
List of Tables
Table 1- The revised Amsterdam criteria and Bethesda guidelines used to diagnose HNPCC.
Table 2- Known cancer predisposition genes with rare copy number variants and their
associated syndromes.
Table 3- Summary of all 40 samples run on the array including sex, related samples, age at
diagnosis (dx) and tumour type.
Table 4- Primary genes of interest after CGH array analysis
Table 5- Large copy number losses seen in only one sample
Table 6- Seven large copy number gains were detected in Sample 23
Table 7- Summary of all shared alterations that were detected as expanded in families by
array segmentation analysis.
List of Figures
Figure 1- LFS component Tumours compiled by Nichols and Malkin.
Figure 2- Tumours associated with TP53 germline mutations seen in LFS.
Figure 3- TP53 mutations in Li-Fraumeni Syndrome carriers are clustered in the DNA
binding domain, codons 102-292.
Figure 4- Human TP53 protein has four key functional domains: the transcriptional
activation domain, the DNA binding domain, the tetramerization domain, and the negative
regulatory domain.
Figure 5- Recently there has been a rapid rise in reported copy number variation into the
Database of Genomic Variants.
Figure 6- The majority of copy number variants reported in the Database of Genomic
Variants are now under 10kb.
Figure 7- Custom probes of our custom CGH array are highlighted in green while copy
number probes from the Affymetrix Genome-Wide Human SNP6.0 are highlighted in red.
The custom array provides a much greater probe density in the coding regions in genes of
interest.
Figure 8- A notable example of a sample (b) which was not identified by segmentation
analysis but reported very similar log ratios to a sample (a), which was. Samples a and b
were run beside each other on the same slide.
Figure 9- MSH2 copy number loss encompassing exons 3-6 (shown in red) detected on the
custom CGH array
Figure 10- MSH2 copy number loss validated via SYBR Green qPCR. Shown are the
mean (+/- SEM) copy number ratios.
Figure 11- Pedigree of a proband with a history of HNPCC in the paternal lineage and breast
cancer predisposition in the maternal lineage.
Figure 12- FOXO1 copy number gain failed to validate via SYBR Green qPCR, instead
appearing as a copy number loss. Shown are the mean (+/- SEM) copy number ratios.
Figure 13- MYCN copy number gain failed to validate via SYBR Green qPCR, instead
appearing as copy number neutral. Shown are the mean (+/- SEM) copy number ratios.
Figure 14- PCDH15 copy number gain successfully validated via SYBR Green qPCR.
Shown are the mean (+/- SEM) copy number ratios.
Figure 15- EXT1 copy number loss successfully validated via SYBR Green qPCR. Shown
are the mean (+/- SEM) copy number ratios.
Figure 16- The copy number losses on the 5’ end of DICER1 failed to validate via SYBR
Green qPCR, instead appearing as copy number neutral in all four samples. Shown are the
mean (+/- SEM) copy number ratios.
Figure 17- The copy number losses encompassing two exons on the 5’ end of DICER1 failed
to validate via SYBR Green qPCR, instead appearing as copy number neutral in all three
samples. Shown are the mean (+/- SEM) copy number ratios.
Figure 18- Thirteen copy number losses were observed on the far 5’ end of PTCH1, shown in
red.
Figure 19- Of the copy number losses encompassing the 5’ end of PTCH1, 11/13
successfully validated via SYBR Green qPCR. Shown are the mean (+/- SEM) copy number
ratios.
Figure 20- The six samples with detected copy number losses encompassing a TaqMan probe
were all revealed to be copy number neutral at the probe’s locus using a TaqMan copy
number qPCR assay.
Figure 21- The copy number gain spanning 2.6kb in PTCH1 failed to validate via SYBR
Green qPCR, instead appearing as a copy number loss. Shown are the mean (+/- SEM) copy
number ratios.
Figure 22- The copy number loss observed in exon 14 of PTCH1 failed to validate via SYBR
Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/- SEM)
copy number ratios.
Figure 23- The copy number gain observed in the 3’UTR of PTCH1 failed to validate via
SYBR Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/-
SEM) copy number ratios.
Figure 24- Pedigree showing an LFS-L family with a history of syndactyly in the maternal
lineage.
List of Abbreviations
ALL acute lymphoblastic leukemia
APAF1 apoptotic peptidase activating factor 1
APC adenomatous polyposis coli
ARF alternate reading frame of CDKN2A
ATM ataxia telangiectasia mutated
ATR ataxia telangiectasia and Rad3-related protein
BAP1 BRCA1 associated protein-1
BAX BCL2-associated X protein
BCC basal cell carcinoma
BCL10 B-cell CLL/lymphoma 10
Bcl-2 B-cell CLL/lymphoma 2
BCMA tumor necrosis factor receptor superfamily, member 17
BIR potassium inwardly-rectifying channel, subfamily J, member 11
BLM Bloom syndrome, RecQ helicase-like
BRCA1 breast cancer 1, early onset
BRCA2 breast cancer 2, early onset
CCDS consensus coding sequence project
CDH1 cadherin 1, type 1, E-cadherin
CDK cyclin-dependent kinase
CDKN1A/p21 cyclin-dependent kinase inhibitor 1A
CGH comparative genomic hybridization
CHEK1 checkpoint kinase 1
CHEK2 checkpoint kinase 2
CMMR-D constitutional mismatch repair-deficiency
c-MYC v-myc avian myelocytomatosis viral oncogene homolog
CNV copy number variant
COSMIC catalogue of somatic mutations in cancer
CXCL1 chemokine (C-X-C motif) ligand 1
DGV database of genomic variants
DM1 myotonic dystrophy type 1
DMC1 DNA meiotic recombinase 1
EBV Epstein-Barr virus
EPCAM epithelial cell adhesion molecule
ERCC4 excision repair cross-complementation group 4
EXT1 exostosin glycosyltransferase 1
FADD Fas (TNFRSF6)-associated via death domain
FANCC Fanconi anemia, complementation group C
FAP familial adenomatous polyposis
FasL Fas ligand
FoSTeS fork stalling and template switching
FOXO1 forkhead box O1
FOXP2 forkhead box P2
GADD45 growth arrest and DNA-damage-inducible, alpha
GLI GLI family zinc finger
HBOC hereditary breast-ovarian cancer syndrome
Hh Hedgehog
HNPCC Hereditary Nonpolyposis Colorectal Cancer Syndrome
HR homologous recombination
IARC International Agency for Research on Cancer
IL-8 interleukin 8
IPA Ingenuity Pathways Analysis
IR ionizing radiation
KDM1A/LSD1 lysine (K)-specific demethylase 1A
KIAA1797 focadhesin
LFL Li-Fraumeni like
LFS Li-Fraumeni syndrome
MAX MYC associated factor X
MDM2 MDM2 proto-oncogene, E3 ubiquitin protein ligase
miRNA micro-RNA
MLH1 mutL homolog 1
MLPA multiplex ligation-dependent probe amplification
MMR mismatch repair
MSH2 mutS homolog 2
MSH6 mutS homolog 6
MTA3 metastasis associated 1 family, member 3
NAHR non-allelic homologous recombination
NBCCS nevoid basal cell carcinoma syndrome
NCBI National Center for Biotechnology Information
NF1 neurofibromin 1
NF2 neurofibromin 2
NGS next generation sequencing
NHEJ non-homologous end joining
NOXA phorbol-12-myristate-13-acetate-induced protein 1
OPGP Ontario Population Genomics Platform
p16 cyclin-dependent kinase inhibitor 2A, multiple tumor suppressor 1
p63 tumor protein p63
p73 tumor protein p73
PCC hereditary pheochromocytoma
PCDH15 protocadherin 15
PCNA proliferating cell nuclear antigen
PCR polymerase chain reaction
PMS2 postmeiotic segregation increased 2
PTCH1 patched 1
PTCH2 patched 2
PTEN phosphatase and tensin homolog
PUMA BCL2 binding component 3
QC quality control
RAD51 RAD51 recombinase
Rad51L1 RAD51 paralog B
RB retinoblastoma
RB1 retinoblastoma 1
SHH sonic hedgehog
SIRT1 sirtuin 1
SIRT3 sirtuin 3
SMO smoothened, frizzled class receptor
SNP single nucleotide polymorphism
Sp1 Sp1 transcription factor
Spo11 SPO11 meiotic protein covalently bound to DSB
STK11 serine/threonine kinase 11
SUFU suppressor of fused homolog (Drosophila)
TCAG The Centre for Applied Genomics
TP53 tumor protein p53
TRRAP transformation/transcription domain-associated protein
UTR untranslated region
UV ultraviolet radiation
VHL von Hippel-Lindau tumor suppressor, E3 ubiquitin protein ligase
WRAP53/WDR79 WD repeat containing, antisense to TP53
WT1 Wilms tumor 1
Chapter 1: Introduction and Background
1.1 Li-Fraumeni Syndrome
1.1.1 Overview
In 1969, Li and Fraumeni reported findings that indicated the existence of a previously
unknown familial cancer predisposition syndrome. Following an epidemiological survey
including 280 medical charts and 418 death certificates of children diagnosed with
rhabdomyosarcoma from 1960 to 1964, a familial pattern of cancer emerged. In five families, a
second child had developed a soft tissue sarcoma. In addition to this, a diverse range of tumours
including osteosarcomas, (premenopausal) breast cancers, brain cancers, and leukemias were
observed in the first- and second-degree relatives along one lineage of the proband. A number of
cases of multiple metachronous primary tumours were also observed. This striking incidence of
cancer in each family was far greater than would be expected by chance. The pattern of
inheritance seemed to suggest an autosomal dominant mechanism of transmission [1]. In 1988, Li
et al. described 24 families that they had followed for many years, who met the following criteria
which would become the clinical definition of “classic” Li-Fraumeni syndrome (LFS, OMIM
151623): a proband with sarcoma diagnosed under the age of 45, who has a first-degree relative
with any cancer under the age of 45, as well as another first- or second-degree relative with
either any cancer under 45 years or a sarcoma at any age [2].
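The classic criteria amount to a three-part rule, which can be sketched in code. The following is a minimal illustration only, not a clinical tool: the class and function names are hypothetical, tumour types are simplified to free-text labels, and the requirement that affected relatives lie along one lineage is not checked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Diagnosis:
    tumour: str  # simplified free-text tumour label, e.g. "sarcoma" (assumption)
    age: int     # age at diagnosis in years


@dataclass
class Relative:
    degree: int                 # 1 = first-degree, 2 = second-degree
    diagnoses: List[Diagnosis]


def has_cancer_before_45(person: Relative) -> bool:
    """True if the relative had any cancer diagnosed under the age of 45."""
    return any(d.age < 45 for d in person.diagnoses)


def has_sarcoma(person: Relative) -> bool:
    """True if the relative had a sarcoma at any age."""
    return any(d.tumour == "sarcoma" for d in person.diagnoses)


def meets_classic_lfs(proband_sarcoma_age: Optional[int],
                      relatives: List[Relative]) -> bool:
    # Criterion 1: proband with a sarcoma diagnosed under the age of 45.
    if proband_sarcoma_age is None or proband_sarcoma_age >= 45:
        return False
    for i, first in enumerate(relatives):
        # Criterion 2: a first-degree relative with any cancer under 45.
        if first.degree != 1 or not has_cancer_before_45(first):
            continue
        # Criterion 3: a different first- or second-degree relative with
        # any cancer under 45 or a sarcoma at any age.
        for j, other in enumerate(relatives):
            if j == i or other.degree not in (1, 2):
                continue
            if has_cancer_before_45(other) or has_sarcoma(other):
                return True
    return False


# Hypothetical family used purely for illustration.
mother = Relative(1, [Diagnosis("breast", 38)])
grandfather = Relative(2, [Diagnosis("sarcoma", 60)])
print(meets_classic_lfs(30, [mother, grandfather]))
```

In this hypothetical family the proband's sarcoma at 30, the mother's breast cancer at 38, and the grandfather's sarcoma at 60 together satisfy all three parts of the rule; removing either relative breaks it.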
Figure 1- LFS component tumours compiled by Nichols and Malkin [3].
Figure 2- Tumours associated with TP53 germline mutations seen in LFS [4].
To date, more than 500 classical LFS families have been reported in the database of the
International Agency for Research on Cancer (IARC) [4] or through isolated reports. Many more
families exist but have not been reported in the literature. Because the classical LFS criteria are
quite stringent, a designation of “LFS-like” (LFL) is used to describe the many families that
show a strong similarity to LFS inheritance and tumour type, but do not meet the classical criteria.
The defining criteria to classify LFL have been refined by Chompret et al. to include: i) a
proband with a characteristic LFS tumour before the age of 46 with at least one first- or second-
degree relative with an LFS tumour (except for breast cancer if the proband has breast cancer)
before the age of 56 or with multiple primary tumours; ii) a proband with multiple tumours
(except breast cancer), two of which fall in the LFS tumour spectrum and the first of which
occurred before the age of 46; and iii) any patient with an adrenocortical carcinoma or choroid
plexus tumour, irrespective of family history [5].
A hallmark of LFS is the wide tumour spectrum that can be observed even within
families (which would presumably have the same or similar mutation profiles). Even more
striking is the incidence of multiple primary tumour types in a single individual. Hisada et al.
carried out a retrospective study of 200 affected carriers of TP53 germline mutations and found
that 15% developed a second cancer, 4% a third cancer, and 2% a fourth cancer. Those who had
survived childhood cancers were at the highest risk of developing additional malignancies [6]. It
should be noted, however, that it is not clear whether the outcome of these patients differs
significantly from that of non-LFS patients who have been treated in a similar manner for the same
(sporadic) tumour. While notable, this characteristic is not unique among hereditary cancer
syndromes. Individuals with germline mutations in the retinoblastoma gene, RB1, another
tumour suppressor, are also at a higher risk of developing secondary tumours [7].
1.1.2 Cancer Risk Patterns in LFS Families
Attempts at determining a lifetime cancer risk in LFS patients have led to somewhat
varied results. In a hospital-based analysis in 2006, Wu et al. estimated the lifetime cancer risk of TP53
mutation carriers to be 73% in males and approaching 100% in females. The high risk of breast
cancer in females was thought to account for the large difference between the sexes [8]. Creating
three broad age groups: <15 years, 16-45 years, and >45 years, the relative risk in males was
found to be 19%, 27%, and 54%, respectively. The relative risk for females was found to be
12%, 82%, and 100% in the same age groups. An earlier study by Hwang et al. in 2003 observed
families identified on the basis of childhood soft tissue sarcomas. They evaluated cancer risk in
both gene mutation carriers and noncarriers who had been followed for more than 20 years [9]. In
the carrier group, 12%, 35%, 52%, and 80% developed cancer by the ages of 20, 30, 40, and 50
years, respectively. Breast cancers and soft tissue sarcomas represented the most common
cancers. In the more than 3000 noncarriers, a cumulative risk of 0.7%, 1%, 2.2%, and 5.1% was
observed. Interestingly, this is almost identical to that of the general population, which lends
credence to the theory that the germline mutation of TP53 is sufficient to cause the LFS
phenotype without any other genetic modifiers. This study also found a higher cancer risk in the
female carriers. In the four age groups previously mentioned, the specific cumulative risks for
female carriers were found to be 18%, 49%, 77% and 93%. Every age group saw an increased
risk when compared to the male carriers: 10%, 21%, 33%, and 68%. Contradicting the
commonly held belief that the high incidence of breast cancer accounts for this difference, the
study showed an increased cancer risk in females even after sex-specific cancers (breast, ovarian,
and prostate cancer) were excluded. Cancers with a higher risk for females included brain and
lung.
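The cumulative risks reported by Hwang et al. can be compared directly with a few lines of arithmetic. The sketch below is illustrative only: the percentages are copied from the paragraph above, the variable names are hypothetical, and it assumes that the sex-specific figures refer to the same four ages (20, 30, 40, and 50 years) as the overall carrier figures.

```python
# Cumulative cancer risk (%) by age, as quoted in the text for Hwang et al.
ages = (20, 30, 40, 50)
carriers = (12.0, 35.0, 52.0, 80.0)
noncarriers = (0.7, 1.0, 2.2, 5.1)
female_carriers = (18.0, 49.0, 77.0, 93.0)
male_carriers = (10.0, 21.0, 33.0, 68.0)


def ratios(numerators, denominators):
    """Element-wise ratio of two risk series, rounded to one decimal place."""
    return [round(n / d, 1) for n, d in zip(numerators, denominators)]


# Fold difference in cumulative risk at each age.
carrier_vs_noncarrier = ratios(carriers, noncarriers)
female_vs_male = ratios(female_carriers, male_carriers)

for age, cn, fm in zip(ages, carrier_vs_noncarrier, female_vs_male):
    print(f"age {age}: carrier/noncarrier = {cn}, female/male carrier = {fm}")
```

Expressed this way, carriers show a roughly 16- to 35-fold higher cumulative risk than noncarriers at every age, while the excess in female over male carriers is largest at ages 30 and 40 and narrows by age 50.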
1.1.3 Genetic Etiology of Li-Fraumeni Syndrome
Because LFS is highly penetrant and autosomal dominant, it was believed that a common
etiological agent existed. However, due to the relative rarity of LFS/LFL kindreds, along
with a high mortality, available tissue samples were limited. Furthermore, the normal karyotypes
observed in LFS patients hampered classical genetic linkage analysis. Not until 1990, roughly 20
years after the syndrome was first described, was TP53 identified as a causative gene in LFS.
Malkin et al. used a candidate gene approach based on the observations that somatic mutations in
TP53 were detected in more than 50% of sporadic human cancers [10] and that transgenic mice
expressing mutated TP53 alleles developed a wide spectrum of tumours similar, though not
identical, to that seen in LFS [11]. All five families studied in this initial report were found to harbour
germline heterozygous point mutations in the TP53 gene. Subsequent studies have led to the
estimate that of the families that meet the criteria for classical LFS, only 60%-80% have had
germline mutations detected in TP53 [12]. The incidence of germline TP53 coding mutations in
LFS-like kindreds is estimated to be significantly lower at roughly 40% [13]. The rate of germline
TP53 mutations has been estimated to be roughly 1 in 5000 births [14]. The frequency of de
novo TP53 mutations in carriers has been estimated to be between 7 and 20% [15]. As in
sporadic tumours, germline TP53 mutations tend to be clustered in the DNA binding domain of the gene
(codons 102-292), especially in highly conserved regions (Figure 3). While the order of
frequency varies between the two groups, there is a distinct similarity in mutation “hotspots” in
LFS and sporadic tumours. These hotspots are usually residues located at or near the p53-DNA
interface including codons 175, 220, 245, 248, 273, and 282 [4]. These sites are categorized as
“contact” mutants (codons 248 and 273) or “structural” mutants (codons 175, 220, 245, and
282) based on whether they directly bind DNA or contribute to the necessary folding of the DNA
binding domain. Within TP53, the location of the inherited mutation does appear to have a
predisposing effect on tumour type. Missense mutations in the DNA binding domain, the most
common mutation type observed, are associated with breast and brain tumours. Nonsense,
frameshift and splice mutations, which result in truncated protein or a loss of function, are
associated with an increased risk of early onset tumours, particularly in the brain. Interestingly,
adrenocortical tumours represent the only tumour type consistently associated with mutations
outside the DNA binding domain. Missense mutations in the loops opposing the DNA binding
domain are frequently seen in these tumours [12]. While there is overlap between LFS component
tumours and sporadic tumours that frequently acquire TP53 mutations, there are also notable
exceptions, particularly sporadic sarcomas, which rarely (~5%) acquire TP53 mutations despite
being a hallmark of LFS [16].
Figure 3- TP53 mutations in Li-Fraumeni Syndrome carriers are clustered in the DNA
binding domain, codons 102-292 [4].
Both in the germline and in sporadic tumours, missense mutations account for the
majority (~70%) of mutations in TP53 [4]. This is in contrast to the nonsense mutations commonly
seen in other tumour suppressors, which often result in severely truncated protein rather than the
full-length but dysfunctional protein product encoded by missense mutations. Hussain and
Harris compared TP53 mutations to those of RB1, APC, ATM, WT1, BRCA1, BRCA2, NF1, NF2,
p16 and VHL, ten well-known tumour suppressors mutated in human cancer. Most of
these were inactivated by nonsense mutations, deletions or insertions [17],
whereas 74% of TP53
mutations were missense, a figure supported by the updated IARC database [4]. The advent of
high-throughput sequencing, however, has improved our ability to detect small missense mutations
across the cancer genome. For example, a recent study by Fujimoto et al. conducted whole
genome sequencing on 27 hepatocellular carcinomas and found 1734 missense mutations as
opposed to only 101 nonsense mutations, 161 short coding indels and 52 splice-site mutations, as
well as 561 structural alterations (deletions, duplications, inversions and translocations) [18]. This
approach did not distinguish between oncogenes and tumour suppressors, which may account for
some of the difference. Regardless of the true distribution of missense mutations in the cancer
genome, the large proportion of these mutations in TP53 is significant.
Despite being a critical tumor suppressor, it appears to be advantageous for a tumour to
retain a defective TP53 protein rather than to remove it entirely. In fact, due to its detection at
high levels in cancer cells [19], TP53 was initially considered a tumour antigen with transforming
capabilities. In 1984, Wolf et al. provided the initial evidence of the oncogenic effects of mutant
TP53 when they showed that injection of mutant p53-expressing L12 cells into mice caused a
much more severe phenotype than L12 cells expressing no TP53 [20]. Olive et al. demonstrated in
2004 that mice harbouring an R270H (equivalent to R273H in humans) mutation had a significantly
higher tumour burden than p53-/- mice and exhibited a distinct tumour spectrum [21]. Shlien
et al. characterized a remarkable phenotype involving large deletions around the TP53 locus of
sizes up to 2Mb. Interestingly, the individuals with these large copy number losses around the
TP53 locus do not appear to be at an increased risk of cancer and instead exhibit a complex
phenotype of congenital abnormalities and developmental delay. It would appear that the
expression of the intact wild-type allele is sufficient to protect against the cancer phenotype
when no mutant p53 protein is expressed [22]. There are two theories that are not necessarily
mutually exclusive to explain these observations: that mutant TP53 can exert a dominant-
negative inhibitory effect on wild-type TP53, or that mutant TP53 can have a wild-type TP53-
independent gain of function.
Due to their ability to oligomerize, mutant TP53 proteins are able to exert a dominant-
negative effect over wild-type protein. It has been shown that wild-type and mutant TP53 protein
can be co-immunoprecipitated [23]. The oligomerized defective proteins will form part of a now
defective p53 tetramer which is unable to bind to and activate its downstream targets. This
dominant-negative effect was demonstrated using two hotspot mutations (R270H and P275S) in
mouse embryonic stem (ES) cells. After exposure to γ radiation, their ability to induce BAX,
cyclin G and MDM2 was assessed. Induction of targets was not as effectively achieved in cells
carrying TP53 point mutations as in the p53+/- cells, particularly in the case of BAX and MDM2.
Furthermore, doxorubicin-induced apoptosis was severely inhibited in the point-mutated cells
compared to the p53+/- cells [24].
The gain of function theory postulates that mutated TP53 can develop novel functionality
that differs from that of the normal, wild-type protein and that these new functions can contribute
to tumourigenesis and neoplastic growth. A good example of this is the expression of R172H
mutant p53 exerting an oncogenic effect by inactivating the TP53 family members, p63 and p73,
thus inhibiting their ability to induce cell cycle arrest [25]. Mutant TP53 has been shown to have
oncogenic effects in vivo. Dittmer et al. demonstrated that overexpression of the R175H mutant
in TP53-deficient cells leads to tumour formation in mice, while cells which do not express this
mutation do not form tumours in nude mice [26].
1.1.4 Genetic Modifiers
Unlike many other hereditary cancer predisposition syndromes, LFS is notable for its
high phenotypic variability with respect to tumor type, age of onset, and severity of the disease.
This variability is even observed within families. The presence of other genetic modifiers offers
a plausible explanation for this phenomenon seen in LFS patients. A role for modifying genes
has been observed in other cancer predisposition syndromes, such as hereditary breast cancer
predisposition. A G>C single nucleotide polymorphism (SNP) in RAD51 is associated with an
earlier age of onset in BRCA2 carriers, although this SNP appears to have no effect in
noncarriers or in BRCA1 carriers [27]. It has also been shown in neurofibromatosis type 1 that
the phenotypic correlations between monozygotic twins were higher than those between more distant
relatives with the same disease-causing NF1 mutations, suggesting that variability at other loci
may be responsible for some of the variability of the disease [28]. Shlien et al. showed that TP53
mutation carriers have a marked increase in global copy number variation [29], though this
phenomenon was not observed in a subsequent study with a different patient cohort [30]. If global
copy number variation is in fact higher in LFS carriers, this is suggestive of an underlying
genomic instability that may contribute to the phenotype exclusive of the TP53 mutation itself.
TP53 is highly polymorphic, with 100 SNPs currently listed in the IARC p53 database [4].
One of the earliest observations with regard to genetic modifiers in LFS was a SNP in TP53
exon 4 which results in either a proline or arginine at codon 72. This SNP is very common in the
general population, but it does appear to confer a functional difference. The Arg72 variant
induces apoptosis better than the Pro72 variant in stably transfected human Saos2 cells [31]. A
possibly important modifier in the key TP53 regulator MDM2 has also been observed. Bond et
al. identified a SNP at the 309th base pair of the first intron of MDM2. The SNP is a T>G
transversion which extends an Sp1 transcription factor binding site. The G variant is thus
overexpressed due to its higher affinity for Sp1. This overexpression was observed by real-time
quantitative PCR and immunoblotting in cells homozygous for the G allele. Furthermore, these
MDM2SNP309 (G/G) expressing cells showed significantly reduced cell death rates when
treated with etoposide, a DNA damaging agent, indicating this variant is overactive in its
repression of TP53. In fact, cells expressing this variant exhibited similar cell-death numbers to
cells expressing mutant TP53, demonstrating the significant functional effect this SNP can have
on cellular p53 levels. An analysis of LFS patients comparing those heterozygous or
homozygous for the G variant of SNP309 with those who were T/T at the locus showed a
significantly earlier median age of tumour onset in G allele carriers (18 vs. 27 years) [32]. Bougeard et al. subsequently
investigated the role of the TP53 codon 72 SNP and hypothesized that it may have an additive
effect with the MDM2SNP309, particularly because MDM2 was known to have a higher affinity
for Arg72. Analysing 61 TP53 mutation carriers from 41 kindreds, they found that carriers of the
Arg72 variant did in fact have an earlier age of tumour onset compared to those with the Pro72
variant (21.8 years vs. 34.4 years). Those who carried the Arg72 variant as well as the G variant
of the MDM2SNP309 had the lowest age of onset at 16.9 years [33]. However, a more recent study
of 19 extended LFS pedigrees of 463 individuals and 129 TP53 mutation carriers showed only a
modest association between MDM2SNP309 and increased cancer risk, with the difference not
being statistically significant. The SNP had a similar tumorigenic effect in both carriers and
non-carriers, indicating it may not be an LFS-specific modifier [34].
More recently, a 16 base pair duplication in intron 3 of TP53 (PIN3) was identified as a
possible genetic modifier of LFS. The genotypes of PIN3, MDM2SNP309, P53Codon72 (PEX4),
as well as PIN2 (a single base pair change in intron two) of a cohort of 32 Brazilian TP53
mutation carriers were assessed. A modest effect of MDM2SNP309 and PEX4 on age of onset
was confirmed. A decrease of 19 years in the age of onset was observed in mutation carriers who
were homozygous for the non-duplicated PIN3. This effect did not appear to be cumulative with
either MDM2SNP309 or PEX4 [35]. A subsequent study contradicted these results as no significant
difference in the age of onset was observed in carriers homozygous for non-duplicated PIN3.
Males in the study, however, did show a moderate increase in cancer risk when homozygous or
heterozygous for the duplicated PIN3 [36]. Of note in the Brazilian study is the fact that 18 of the
32 TP53 mutation carriers in this cohort carried a unique mutation at codon 337 that is common
in a population in southern Brazil. This may help to explain the different results between studies.
Unfortunately, due to the inherently small sample sizes involved with LFS studies, assessing the
effect of various genetic modifiers with sufficient power remains a challenge.
Of note on the topic of genetic modifiers in LFS is the possible role of genetic
anticipation which suggests that multiple genetic insults are accumulated throughout generations
leading to more severe phenotypes in each successive generation [37]. Recently, a potential genetic
cause for this phenomenon has been hinted at by the observation of accelerated telomere attrition
in progressive generations of LFS families [38,39]. In these studies, TP53 mutation carriers affected
with cancer were seen to have shorter telomere length as measured in peripheral blood
lymphocyte DNA than nonaffected relatives and telomere attrition between children and adults
was faster in carriers than controls. A similar telomere shortening phenomenon linked to
anticipation was subsequently observed in hereditary breast cancer [40]. This telomere shortening
could be a marker of genomic instability which could be used to predict anticipation in LFS
kindreds.
1.1.5 The Role of Other Genes in LFS
As previously stated, roughly 20% of LFS and 60% of LFL cases do not appear to be
caused by a germline mutation of TP53. The involvement of other disease-causing genes offers
an attractive, if currently controversial, explanation for this. To date, a secondary disease-causing
gene remains elusive. The focus of attention has been on genes involved in the p53 pathway,
apoptosis or cell cycle control such as p63 [41], BCL10 [42], BAX [43], CDKN2A [44,45], PTEN [44,46], and
CHEK1 [47]. These studies have all yielded negative results. Germline mutations in the
p53-phosphorylating kinase CHEK2 have been reported in one LFS family as well as two families
suggestive of LFS [48]. One of these reported mutations, 1322delT, was later revealed to lie in a
duplicated exon [49]. Both of the remaining reported mutations, Ile157Thr and 1100delC, which
were found in a total of four families suggestive of LFS, are now understood to be
polymorphisms. The 1100delC polymorphism has an estimated frequency in healthy individuals
of roughly 1%, while being 4 to 5 times more common in familial breast cancer kindreds.
This variant is estimated to confer a two-fold risk of breast cancer in females, and up to a 10-fold
risk in males [50,51]. It was subsequently found to have a modest association with both prostate [52]
and colon cancer risk [53]. The Ile157Thr variant has been estimated to have a frequency of roughly
5% in healthy individuals and surprisingly appears to be associated with a decreased risk of lung
and laryngeal cancers [54] while conferring an increased risk of prostate cancer [55]. While the
1100delC polymorphism offers an interesting association with breast cancer, particularly with
bilateral and male breast cancer [51], these findings do appear to invalidate CHEK2 as a secondary
disease-causing gene in LFS. Finally, linkage to a region in chromosome 1q23 has been reported
in an LFS kindred with wild-type TP53 [56]. The role of any predisposing genes in this region,
however, remains to be determined. Recently, Aury-Landas et al. detected 20 copy number
variants (CNVs) of intermediate size in 15/64 LFS patients with no detectable TP53 mutation.
While it is likely that a number of these represent non-pathogenic CNVs, a notable pattern
emerged in four patients with brain tumours. All four patients exhibited CNVs affecting genes
coding TP53 partners involved in transcriptional regulation and chromatin remodelling,
KDM1A/LSD1, MTA3, TRRAP, and SIRT3. The authors demonstrated that the CNV
encompassing SIRT3 leads to its overexpression, and that this overexpression prevents apoptosis
in vitro and results in the hypermethylation of numerous genes [57]. These results indicate that
alterations in genes involved in chromatin remodelling may be a contributing factor in TP53
wild-type LFS, particularly in cases resulting in brain tumours.
1.1.6 Epigenetics and Li-Fraumeni Syndrome
Deregulation of genes through aberrant epigenetic modifications is often seen in sporadic
cancer. In hereditary cancer syndromes, inherited epigenetic defects have also been observed,
though rarely. The most striking example of this is in hereditary non-polyposis colon cancer.
Epigenetic silencing through promoter hypermethylation of one allele of either MLH1 or MSH2
has been repeatedly observed [58–60]. This led to speculation that epigenetic inactivation may
account for some LFS cases that appear to present with wild-type TP53. A polymorphism in the
promoter of TP53 was detected and appeared more frequently in the LFS/LFL families (11%)
compared to the controls (0.3%). Despite this interesting finding, however, the polymorphism
was shown to have no functional effect on TP53 expression [61]. Methylation of the promoter
region has been reported in numerous sporadic tumours including brain [62], breast [63], and liver [64] as
well as leukemia [65]. Notably, 32% of the acute lymphoblastic leukemia (ALL) patients studied were
found to exhibit DNA hypermethylation at the TP53 promoter despite the fact that TP53 coding
mutations in ALL are rare (roughly 2-3%). This suggests that in ALL, promoter methylation may
provide a considerable silencing effect without concurrent gene mutations. Finkova et al. set out
to determine the methylation status of the TP53 promoter in 14 families suggestive of LFS but
with no detectable mutation in TP53. They found no detectable methylation at any of the CG
dinucleotides tested [66], indicating that TP53 gene silencing through promoter hypermethylation is
not likely a contributing factor to the LFS phenotype. The previously mentioned discovery of CNVs in
chromatin remodelling genes, however, does indicate that epigenetic dysfunction may
play a causative role in TP53 wild-type LFS.
1.2 Hereditary Cancer Predisposition Syndromes
1.2.1 Overview
Over 200 hereditary cancer predisposition syndromes have been described, the majority
of which are quite rare. Collectively, however, these syndromes are estimated to account for at
least 5-10% of all cancers [67]. The prevalence of hereditary cancer predisposition in children was
recently investigated by Knapke et al. who studied cancer survivors from a pediatric cancer
survivorship clinic (n=370) over two years and screened them for family history, demographics,
and tumour characteristics that would suggest an underlying cancer predisposition syndrome.
The authors found a rather high number of individuals, 29%, who were identified as candidates
for further cancer genetics evaluation. The majority of these (61%) had a family history of
cancer, while 18% had tumours strongly associated with cancer predisposition syndromes, 16%
had a medical history which would suggest a genetic diagnosis, and 6% were selected on the
basis of a family history of another congenital condition [68]. While this study provides only a
rough estimate of the true proportion of pediatric cancers caused by hereditary cancer
predisposition syndromes, it may well be an underestimate as many syndromes lead to very
severe phenotypes that would not be well represented in a cohort of cancer survivors.
Approximately 100 genes have been implicated as causally linked to cancer predisposition
syndromes, most of which are inherited in an autosomal dominant manner [69]. The large majority
of these are tumour suppressor genes.
Alfred Knudson developed the now famous “two hit hypothesis” after studying
hereditary retinoblastoma (RB) in children. He reasoned that multiple “hits” were necessary for
tumour formation. While sporadic retinoblastoma required two somatic hits to occur in the same
target retinal cell, children with the hereditary form of RB are born with a heterozygous germline
mutation in the RB1 gene (the “first hit”) and require only one subsequent somatic hit; this
explains the earlier ages of onset in hereditary RB, as well as the far greater risk of multifocal,
bilateral tumours [70]. Our group focuses on Li-Fraumeni syndrome (LFS); however, the results of
this study highlighted the relevance of other cancer predisposition syndromes, particularly Gorlin
syndrome and Lynch syndrome. Cancer predisposition syndromes can be divided into two
subgroups: 1) multisystem syndromes in which cancer is but one of the various manifestations,
such as Gorlin Syndrome; and 2) pure cancer predisposition syndromes whose only apparent
phenotype is tumour formation such as LFS and Lynch syndrome.
Cancer-associated multisystem syndromes are often diagnosed early in life before the
development of cancer due to the presence of physical manifestations of the syndrome that
frequently, but not necessarily, present at birth. Despite displaying a wide range of signs, these
disorders are often associated with a specific tumour type that is usually rare in the general
population. Neurofibromatosis-1 (NF-1) is caused by germline mutations in the NF1 gene and
affects an estimated 1 in 2500 to 3000 individuals. It is inherited in an autosomal dominant
fashion, but notably, an estimated 50% of patients harbour de novo mutations [71]. Individuals with
NF-1 exhibit skeletal, dermatologic, ophthalmic and neurologic abnormalities. Tumours
associated with NF-1 include: optic pathway gliomas (the most common tumor in NF-1),
astrocytomas, neurofibromas, peripheral nerve sheath tumours, and juvenile myelomonocytic
leukemia. Other notable cancer-associated multisystem syndromes for which causative genes
have been identified include Cowden syndrome, Fanconi anemia and tuberous sclerosis.
1.2.2 Gorlin Syndrome
Nevoid basal cell carcinoma syndrome (OMIM 109400, NBCCS, or Gorlin Syndrome) is
a complex multisystem syndrome characterized by an extremely high incidence of basal cell
carcinomas (BCCs) throughout an individual’s lifetime. Incidence may be as high as 90% with
skin pigmentation thought to provide a protective effect [72,73]. It is inherited in an autosomal
dominant fashion with an estimated prevalence of 1 in 57,000 to 1 in 256,000 and appears to
affect males and females equally [74]. Aside from multiple BCCs, other clinical manifestations of
NBCCS include odontogenic keratocysts of the jaws, hyperkeratosis of palms and soles, skeletal
abnormalities, intracranial ectopic calcifications, and facial dysmorphism along with rare cases
of intellectual deficit. Individuals affected by NBCCS are also at a higher risk of developing a
wide range of malignancies including medulloblastoma, fibroma and rhabdomyosarcoma.
Medulloblastoma in NBCCS typically presents during the first two years of life, while in the
general population it tends to peak around 7 or 8 years of age. There is also a distinct
preponderance of the desmoplastic subtype. The risk of developing a medulloblastoma appears
to be approximately 5% [75]. Interestingly, males with NBCCS appear to be three times more likely
than females to develop a medulloblastoma [74]. This early onset medulloblastoma may be the
presenting sign of NBCCS, which makes testing for NBCCS vital as invasive BCCs and other
secondary malignant neoplasms can occur within the radiation field of the treated
medulloblastoma [76–78]. In 105 affected individuals examined at the National Institutes of Health,
the mean age of BCC onset was determined to be 23 and 21 years for Caucasians and
African-Americans, respectively, though the percentage of African-Americans who developed a BCC
was much lower [79]. While these tumours can be quite locally destructive, they rarely metastasize.
NBCCS is associated with germline mutations in the homologue of the Drosophila
melanogaster Patched gene, PTCH1. PTCH1 is a tumour suppressor; in the fruit fly, Patched is vital
for proper body segmentation. It encodes a transmembrane glycoprotein composed of 12
transmembrane domains and two large extracellular loops where binding with its ligand, Sonic
Hedgehog (SHH), occurs. Between 50 and 85% of individuals meeting the NBCCS clinical
criteria harbour a germline mutation in PTCH1 [80]. PTCH2 is highly homologous to PTCH1 and
mutations in PTCH2 have been found in one simplex case of BCC and one simplex case of
medulloblastoma [81]. This led to the discovery of mutations in PTCH2 in a select few cases of
NBCCS presenting with wild-type PTCH1 [82,83]. These carriers also appear to exhibit a less severe
phenotype than those who carry the classical PTCH1 mutations. The association of NBCCS with
the SHH pathway spurred investigation into the role that other SHH pathway members may play
in NBCCS. This led to the identification of a germline mutation in SUFU in two members of an
NBCCS kindred [80]. Mutations in SUFU have also been linked to medulloblastoma without the
NBCCS phenotype [84].
The diagnosis of NBCCS is made on the basis of a set of clinical criteria; either two
major criteria or one major and two minor criteria must be satisfied to confirm the diagnosis [74].
The major criteria include: multiple BCCs under 20 years; histopathologically-proven
odontogenic keratocysts of the jaws; three or more palmar or plantar pits; bilamellar calcification
of the falx cerebri; bifid, fused, or markedly splayed ribs; and first degree relatives with NBCCS.
Minor criteria include: macrocephaly; congenital malformations including cleft palate, frontal
bossing, and moderate hypertelorism; other skeletal abnormalities such as Sprengel deformity,
marked pectus deformity, and marked syndactyly of the digits; ovarian fibroma; and
medulloblastoma. Gene mutation analysis plays a key role in diagnosis as there is considerable
variation in presentation even within the same family. Because 20-30% of probands
have a de novo PTCH1 mutation, genetic testing is indicated for children presenting with a
desmoplastic medulloblastoma, as an NBCCS diagnosis would have direct implications for
treatment [85].
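The counting rule for the NBCCS clinical criteria (two major criteria, or one major and two minor criteria) can be written as a short check. This is a minimal sketch of the rule as stated in this section only; the function name and the example inputs are hypothetical, and it is not a clinical tool.

```python
def meets_nbccs_criteria(major_count: int, minor_count: int) -> bool:
    """Apply the rule quoted in the text: two major criteria, or one major
    criterion plus two minor criteria."""
    if major_count < 0 or minor_count < 0:
        raise ValueError("criterion counts cannot be negative")
    return major_count >= 2 or (major_count >= 1 and minor_count >= 2)


# Hypothetical patients, for illustration only.
print(meets_nbccs_criteria(major_count=2, minor_count=0))  # two major criteria
print(meets_nbccs_criteria(major_count=1, minor_count=2))  # one major, two minor
print(meets_nbccs_criteria(major_count=0, minor_count=3))  # minor criteria alone
```

The first two hypothetical patients satisfy the rule and the third does not, since minor criteria alone are never sufficient.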
In ‘pure’ cancer predisposition syndromes there are usually no early phenotypic
manifestations to facilitate an early diagnosis. A comprehensive review of the family history and
genetic testing are required to identify the presence of these syndromes before the formation of a
tumour. Perhaps the most widely recognized pure cancer predisposition syndrome is hereditary
breast-ovarian cancer syndrome (HBOC). Associated with mutations in BRCA1 and BRCA2,
HBOC is estimated to account for 5-10% of breast and ovarian cancer cases. Mutations in either BRCA1
or BRCA2 are present in approximately 1 in 500 individuals in the general population and are more
common in certain populations; 1 in 40 individuals of Ashkenazi Jewish ancestry carry a
BRCA mutation [86]. Like many cancer predisposition syndromes, the penetrance varies based on
the actual mutation affecting the gene. The cumulative risks of developing breast and ovarian
cancer by age 70 have been estimated to be 65% and 39%, respectively, in BRCA1 mutation carriers,
and 45% and 11%, respectively, in BRCA2 carriers. Other notable pure cancer predisposition syndromes include
familial adenomatous polyposis (FAP), rhabdoid tumour predisposition syndrome, Li-Fraumeni
syndrome and familial retinoblastoma.
1.2.3 Lynch Syndrome
Hereditary Nonpolyposis Colorectal Cancer Syndrome (OMIM 120435, HNPCC or
Lynch Syndrome) was first reported in 1913 by Aldred Warthin who chronicled a family,
“Family G”, with a hereditary pattern of stomach and endometrial cancer [87]. Family G was further
characterized by Lynch and Krush almost 60 years later and this autosomal dominant pattern of
inheritance of gastrointestinal and gynaecologic cancer became known as HNPCC [88]. Colorectal
cancer is one of the most common malignancies in the developed world. HNPCC has been
associated with approximately 2-4% of all colorectal cancer cases. Individuals with HNPCC also
have a higher risk of developing other malignancies such as cancers of the endometrium, ovaries,
stomach, small intestines, urinary tract, brain, and pancreas. Ascertainment bias has likely
influenced older estimates of colorectal cancer risk which were reported as being as high as 80%.
More recently, the lifetime risk of developing colorectal cancer for HNPCC patients has been
estimated to be 22-66%, with a significant risk of endometrial cancer as well, an estimated
32-45% [89–91].
Mutations in the mismatch repair (MMR) genes are observed in the majority of
individuals with HNPCC. These genes are vital in the maintenance of genomic fidelity by
correcting nucleotide mismatches that the normal editing function of DNA polymerase failed to
rectify. They include MLH1, MSH2, MSH6, and PMS2. In the past, mutations in MLH1 and
MSH2 were thought to account for up to 90% of HNPCC cases [92]. This seems to have been an
overestimation, however, likely caused by the less striking phenotype associated with MSH6 and PMS2
mutations, whose frequencies have more recently been estimated to be as high as 13% and 9%, respectively [93].
Due to these underlying MMR gene mutations, a hallmark characteristic of tumours in HNPCC
is very high levels of microsatellite instability. The replication errors caused by the failure of the
MMR system are thought to be present in up to 90% of colorectal tumours in HNPCC [94]. In 2009,
germline mutations in the EPCAM gene were found in HNPCC families that exhibited loss of
MSH2 expression despite having no detectable MSH2 mutations [95,96]. These deletions at the end
of the EPCAM gene lead to transcriptional read-through into its downstream neighbouring gene
MSH2. MSH2 is subsequently epigenetically silenced through hypermethylation of its promoter
in cells where EPCAM is expressed such as in the epithelial cells of the intestine. This discovery
highlights the importance of studying the regions encompassing pathogenic genes as they can
directly influence gene function. Epigenetic silencing of MSH2 through deletion of EPCAM is
estimated to account for up to 6.3% of all HNPCC cases [97,98]. HNPCC offers one of the best
examples of the importance of epigenetics in hereditary cancer as the EPCAM discovery was not
the first example of epimutations leading to the HNPCC phenotype. Hypermethylation of the
MLH1 and MSH2 promoters leading to gene inactivation has been reported in a number of
individuals meeting the clinical criteria for HNPCC. This hypermethylation can also be present
in the spermatozoa, indicating the potential for transmission to the offspring [58,59,99].
Early identification of HNPCC is vital in order to implement effective cancer prevention
strategies before tumour formation. A prospective Finnish study evaluated the efficacy of regular
colonoscopic surveillance and found that it was associated with a reduction in colorectal cancer
incidence and mortality by 62% [100]. More frequent surveillance (every 1-2 years vs. every 3 years) has since
been shown to improve outcomes even more by further reducing the risk of colorectal cancer and
largely limiting the developing malignancies to localized, early stage tumours [101]. The
International Collaborative Group on HNPCC established diagnostic criteria, the Amsterdam
criteria, in 1991 [102], which were further refined in 1999 in order to incorporate the various tumours
outside the gastrointestinal tract [103]. These criteria are neither highly sensitive nor specific, as
only 50% of probands meeting the criteria are found to harbour germline MMR gene mutations
while only 60% of families with known MMR gene mutations meet the criteria [104]. The less
stringent Bethesda guidelines were developed in 1997 and updated in 2004 with the aim of
improving the sensitivity of diagnostic criteria. They largely succeed in this goal with sensitivity
estimates as high as 94%, though perhaps up to 28% of MMR gene mutation carriers may still be
missed under these guidelines [93,105,106]. The Amsterdam and Bethesda criteria are summarized in
Table 1. In the past, MMR gene screening has often been limited to MLH1 and MSH2. As the
role that MSH6, PMS2, EPCAM and promoter hypermethylation play in HNPCC becomes
clearer, and their testing becomes more routine, more individuals that meet the clinical criteria
with these genetic lesions will likely be identified in the future.
Amsterdam II Criteria (1999)- All must be met:
- Three or more relatives with histologically confirmed colorectal cancer or cancer of the endometrium, small bowel, ureter, or renal pelvis. One affected relative must be a first-degree relative of the other two; FAP should also be excluded
- Two or more successive generations are affected
- At least one relative was diagnosed before the age of 50 years
Revised Bethesda Guidelines (2004)- One or more of the following must be met:
- Colorectal cancer before the age of 50 years
- Synchronous or metachronous colorectal cancer or other HNPCC-related tumours, regardless of age
- Colorectal cancer with MSI-high morphology before the age of 60 years
- Colorectal cancer (regardless of age) and a first-degree relative with colorectal cancer or an HNPCC-related tumour before the age of 50 years
- Colorectal cancer (regardless of age) and two or more first- or second-degree relatives diagnosed with colorectal cancer or an HNPCC-related tumour (regardless of age)
Table 1- The revised Amsterdam criteria and Bethesda guidelines used to diagnose HNPCC.
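The logical difference between the two sets of criteria in Table 1 (all Amsterdam II criteria must be met, whereas any single Bethesda guideline suffices) can be sketched as follows. The class, field and function names are hypothetical labels introduced here for the rows of Table 1, and the sketch is illustrative rather than a diagnostic implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AmsterdamII:
    # Field names are hypothetical labels for the three rows of Table 1.
    three_relatives_with_hnpcc_cancer: bool  # incl. first-degree link, FAP excluded
    two_successive_generations: bool
    one_diagnosis_before_50: bool

    def is_met(self) -> bool:
        # Amsterdam II: every criterion must be satisfied.
        return all(vars(self).values())


def bethesda_met(findings: Iterable[bool]) -> bool:
    """Revised Bethesda: at least one of the listed findings is present."""
    return any(findings)


# Hypothetical family: no relative diagnosed before the age of 50.
family = AmsterdamII(True, True, False)
print(family.is_met())

# The same hypothetical proband has a synchronous HNPCC-related tumour,
# which is the second of the five Bethesda findings.
print(bethesda_met([False, True, False, False, False]))
```

In this made-up example the family fails the Amsterdam II criteria but satisfies the Bethesda guidelines, which is consistent with the guidelines being the less stringent of the two.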
1.3 TP53
1.3.1 Overview
The TP53 protein was initially discovered through its association with the SV40 large T
antigen. This 53-54 kilodalton protein was initially described as an oncogene due to its ability to
transform recipient cells [19]. Later, however, TP53’s true identity as a tumour suppressor was
revealed upon the discovery that wild-type TP53 does in fact suppress cellular growth [10,107]. The
previously observed transformation is caused by TP53 only when the gene is mutated.
Human TP53 protein consists of 393 amino acids and has four key functional domains
(Figure 4): a transcriptional activation domain between amino acids 1-42, the site of interaction
between TP53 and the cell’s transcriptional machinery as well as its own negative regulators
such as MDM2; a DNA binding domain between residues 102-292, required for binding to
consensus DNA sequences in the phosphate backbone of the DNA helix; a tetramerization
domain between residues 326-355, responsible for the TP53 protein’s successful oligomerization
into a functional tetramer; and a C-terminal regulatory domain consisting of the final 26 amino
acids, which regulates the protein’s ability to bind specific DNA sequences at the core domain.
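The domain boundaries listed above lend themselves to a simple lookup from residue number to domain. The sketch below uses only the ranges given in the text; the regulatory domain is assumed to span residues 368-393 (the final 26 of 393 amino acids), and the names TP53_DOMAINS and domain_of are hypothetical.

```python
from typing import Optional

# Residue ranges as stated in the text. The regulatory domain is taken to be
# the final 26 of 393 amino acids, i.e. residues 368-393 (an assumption
# derived by arithmetic, not a range quoted in the text).
TP53_DOMAINS = [
    ("transcriptional activation", 1, 42),
    ("DNA binding", 102, 292),
    ("tetramerization", 326, 355),
    ("regulatory", 368, 393),
]
PROTEIN_LENGTH = 393


def domain_of(residue: int) -> Optional[str]:
    """Return the named domain containing `residue`, or None if the residue
    falls in a region between the four listed domains."""
    if not 1 <= residue <= PROTEIN_LENGTH:
        raise ValueError(f"residue must be between 1 and {PROTEIN_LENGTH}")
    for name, start, end in TP53_DOMAINS:
        if start <= residue <= end:
            return name
    return None


for codon in (72, 175, 273, 337):
    print(codon, domain_of(codon))
```

With these ranges, codons 175 and 273 fall in the DNA binding domain, codon 337 falls in the tetramerization domain, and codon 72 lies outside the four listed domains.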
Figure 4- Human TP53 protein has four key functional domains: the transcriptional
activation domain, the DNA binding domain, the tetramerization domain, and the negative
regulatory domain.
1.3.2 Transcriptional Regulation of TP53
The TP53 gene is transcribed from the negative strand of chromosome 17p13.1. The first
promoter, P53P1, is located 250 bp upstream of the first, non-coding exon. The second promoter,
P53P2, lies within intron 1 [108]. The gene’s third promoter, P53P3, is located in the fourth intron.
While much attention has been paid to TP53’s post-translational regulation, certain aspects of its
transcriptional control have long been understood. Reich et al. demonstrated that 3T3 cells had
increased levels of TP53 when deprived of and then stimulated with serum [109]. The increased
levels were shown to be effected at the transcriptional level, as no increase in TP53’s half-life
was observed while the gene’s mRNA levels rose 6-7 hours after serum stimulation.
This increase in TP53 mRNA prior to DNA synthesis may seem paradoxical but appears to be
commonplace [110,111]. While this seems like strange timing for the increased expression of a tumour
suppressor, it is thought that this surge in TP53 transcription allows for increased “surveillance”
by TP53 at a time when cells are synthesising DNA and are thus at high risk of DNA damaging
events [112].
The TP53 promoter contains a bHLH recognition sequence (CACGTG) between 70 and
75 base pairs upstream of the transcription start site, which is a known recognition sequence of
c-MYC/MAX. Importantly, c-MYC is unable to bind DNA unless heterodimerized with MAX,
so c-MYC’s regulation of TP53 cannot occur without MAX. Using in vitro DNA methylation,
Schroeder et al. observed a 90% reduction of TP53 expression, providing the initial evidence for
DNA methylation-dependent silencing of TP53 [113]. Examples of TP53 methylation in vivo have
been described above in the section entitled “Epigenetics and Li-Fraumeni Syndrome”.
Recently, microRNAs (miRNAs) have emerged as important post-transcriptional gene
regulators. In 2009 Le et al. searched for potential miRNA-binding sites in the 3’ UTR of
TP53 using an in silico approach. This led to the identification of miR-125b as a potential
regulator of TP53. The authors observed that knockdown of miR-125b leads to
increased levels of TP53 protein and induces apoptosis in human lung fibroblasts. Interestingly,
when zebrafish embryos are exposed to gamma radiation or camptothecin, miR-125b appears to
be down-regulated, which corresponds to the rapid increase in TP53 as part of the DNA damage
response [114]. A similar interaction with TP53 was reported soon after for the isoform
miR-125a [115]. Since then, more than a dozen miRNAs have been identified as regulators of TP53,
either as negative regulators targeting the TP53 3’ UTR, or positive regulators that target the
UTRs of TP53-inhibiting genes such as SIRT1 and MDM2 [116]. Many of these miRNAs appear to
have functional relevance in human cancer cases. In colorectal cancer, for example, high
miR-125b expression is associated with lower TP53 expression, advanced tumor size and
invasion, and poor prognosis when compared to the low expression group [117]. Of the known
transcription factors that bind the human TP53 promoter, only BCL6 and Pax negatively affect
its transcription [118]. It is likely, then, that miRNAs play a key role in the negative regulation of
TP53 and that, when deregulated, they could significantly promote tumourigenesis. In the past few
years a rapid rise in the number of reported miRNAs regulating tumour suppressors like TP53
has been observed. While the number of confirmed TP53-regulating miRNAs remains small,
we can expect this number to rise in the near future as their importance becomes better realized.
The regulation of TP53 by WRAP53 (also known as WDR79) is a recently discovered
example of regulation by a genomic neighbour [119]. WRAP53 is located immediately upstream of
TP53, but on the opposing strand. This anti-sense transcript exists in numerous forms that can
use one of three possible first exons. One of these, exon 1α, overlaps with up to 227 basepairs of
TP53’s first exon. It appears that TP53 and this Wrap53α interact in a head to head manner and
that this interaction is necessary for proper transcription of TP53. Knockdown of the Wrap53α
transcript and the blocking of TP53/WRAP53 RNA hybrids by 2-O-oligonucleotides led to
significantly reduced levels of TP53 mRNA, while overexpression of the overlapping exon 1α
sequence increased TP53 mRNA levels. Oddly, this interaction appears to be one-way, as
overexpression or knockdown of TP53 appeared to have no effect on the expression of
WRAP53 [119]. Over-expression of the Wrap53 protein has no effect on TP53 transcription or
protein levels, which presents additional evidence of the impact RNA can have on gene
regulation [118].
1.3.3 TP53 and MDM2
The MDM2 protein is the key regulator of TP53 protein levels through ubiquitination.
Ubiquitination is the covalent attachment of one or more ~8kD ubiquitin molecules to a protein.
This process requires the consecutive function of three enzymes: an E1 ubiquitin-activating
enzyme, an E2 ubiquitin-conjugating enzyme, and an E3 ubiquitin-ligating enzyme. MDM2 is an
E3 ubiquitin-ligating enzyme which, like many E3 ligases, harbours a Really Interesting New
Gene (RING) domain. The observation in mice that deficiency of MDM2 alone is lethal while
co-deficiency of MDM2 and TP53 is not has offered compelling evidence of the importance of the
TP53-MDM2 relationship [120,121]. Despite being viable, the doubly deficient mice develop a
spectrum of tumours similar to that seen in TP53-null mice. The C-terminal region of TP53 is the
site of the nuclear localization signals and has been shown to be critical in MDM2-mediated
regulation. Deletion of TP53’s terminal 30 residues ablates MDM2-mediated degradation [122]. In
fact, if any of the six lysine residues (K370, K372, K373, K381, K382, and K386) in this region
where ubiquitination occurs are mutated, the transcriptional output of TP53 is increased [123].
TP53 is poly-ubiquitinated by high levels of MDM2, leading to its proteasomal degradation, and
mono-ubiquitinated by low levels of MDM2, leading to TP53 being shuttled out of the nucleus
into the cytoplasm. Aside from ubiquitination, MDM2 also directly inhibits TP53 transcriptional
activation by binding to the protein itself [124]. Mice with mutant MDM2 which has no E3 activity
but retains the TP53 binding capacity die during embryogenesis. They can, however, be rescued by
the loss of TP53, which indicates that the E3 ligase activity of MDM2 is crucial to its repression
of TP53 [125].
Importantly, MDM2 is itself a TP53 transcription target. Stress-induced increases in TP53 levels
induce increases in MDM2 expression which then downregulates TP53, creating a negative
feedback loop. MDM2 must be blocked in order for effective TP53 activity to occur. This is the
role of the ARF protein (p14ARF). The deletion of MDM2 residues 222-437 was shown to abolish
its binding to ARF [126], identifying the key region required for ARF’s regulation of MDM2. It was
later shown by deletion analysis that residues 210-244 are responsible for most, if not
all, of the ARF binding activity [127]. Therefore TP53 and ARF bind to different regions of MDM2
in a non-competitive manner. Deletion of the first 20 residues of human ARF severely inhibits its
binding to MDM2 [128]. Interestingly, the importance of this region is not well-conserved and this
effect was not observed in mice [129]. Upon binding to MDM2, ARF inhibits its function by
sequestering it to the nucleolus, a region primarily associated with ribosome assembly, leading to
an increase in TP53 levels and activity.
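The negative feedback loop described above can be illustrated with a toy numerical model. The sketch below is not drawn from any of the cited studies: the two-variable model, the function name `simulate_feedback`, and all rate constants are arbitrary assumptions chosen only to show that TP53-dependent induction of MDM2 lowers the steady-state level of TP53 relative to a system without feedback.

```python
def simulate_feedback(mdm2_induction, steps=20000, dt=0.01):
    """Euler-integrate a toy two-variable model of the TP53-MDM2 loop.

    All rate constants are arbitrary illustrative values (assumptions),
    not measurements from the literature.
    """
    p53_synthesis = 1.0      # constant TP53 production
    p53_basal_decay = 0.1    # MDM2-independent TP53 loss
    mdm2_mediated_loss = 1.0 # strength of MDM2-driven TP53 degradation
    mdm2_decay = 0.5         # MDM2 turnover

    p53, mdm2 = 0.0, 0.0
    for _ in range(steps):
        # TP53 is produced at a constant rate and removed by basal decay
        # and by MDM2-dependent degradation.
        d_p53 = (p53_synthesis
                 - p53_basal_decay * p53
                 - mdm2_mediated_loss * mdm2 * p53)
        # MDM2 is induced in proportion to TP53 (the feedback) and decays.
        d_mdm2 = mdm2_induction * p53 - mdm2_decay * mdm2
        p53 += dt * d_p53
        mdm2 += dt * d_mdm2
    return p53, mdm2


if __name__ == "__main__":
    print("no feedback:  ", simulate_feedback(mdm2_induction=0.0))
    print("with feedback:", simulate_feedback(mdm2_induction=1.0))
```

In this sketch, setting `mdm2_induction` to zero stands in for a complete block of MDM2, loosely analogous to its sequestration by ARF; the real system involves additional layers of regulation that are not modelled here.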
1.3.4 Post Translational Modifications and the TP53 Response
Despite recent findings shedding light on TP53’s transcriptional regulation, the primary
method of regulation is thought to be by activation of the latent protein through post-translational
modifications. In fact, all threonine and serine residues in the first 89 amino acids of TP53 can be
phosphorylated or dephosphorylated following stress [130]. The sheer number of TP53
modifications, however, has not quelled doubts about their importance, as it has been shown that
mutations at all the known N-terminal and C-terminal phosphorylation sites do not inhibit
TP53’s transcriptional activation activity [131]. While a comprehensive review of all the
post-translational modifications of TP53 goes beyond the scope of this introduction, a basic
understanding of TP53 modifications with respect to function is helpful.
TP53 lies at the heart of a complex cellular machinery dedicated to detecting and
repairing DNA damage. Ionizing radiation (IR) is a common and often potent source of
environmental DNA damage. Sources of IR include cosmic radiation, naturally occurring
sources (e.g. tritium), and artificial sources such as x-ray tubes and radiation therapy.
Exposure to IR leads to the accumulation of hydroxyl radicals, which cause DNA double-strand
breaks and induce a TP53 response. Hydroxyl radicals are also generated as a consequence of
cellular respiration. The ATM (ataxia telangiectasia-mutated) gene is crucial to this response. By
phosphorylating TP53 at Ser15, ATM impairs the MDM2-mediated repression of TP53 [132]. Ser20
of TP53 is phosphorylated by Chk2 [133], which is itself phosphorylated by ATM, further inhibiting
MDM2-mediated repression [134,135].
Ultraviolet radiation (UV) is another important source of DNA damage, to which one is
exposed daily through sunlight. UV-C has the shortest wavelength of the UV bands and, as it
damages DNA most directly, is the most studied. Both UV-B and UV-C radiation frequently
result in cis-syn cyclobutane pyrimidine dimers that cause errors during replication resulting in
the “classical C-T mutation” seen in many cancerous growths, as well as the generation of
reactive oxygen species (ROS). The ATM and Rad3-related (ATR) gene plays a key role in the
cellular response to UV radiation. The ATR gene is both structurally and functionally related to
ATM, as both phosphorylate Ser15. While both ATR and ATM are involved in the IR response [136]
and share a common target motif, only ATR is involved in the UV response [137]. Evidence of this
is seen in ataxia telangiectasia patients, who are not hyper-sensitive to UV radiation as TP53
activation appears to occur normally. A difference in kinetics between the IR and UV responses
has also been observed. Lu and Lane showed that UV-irradiated cells mounted a TP53 response in
two hours, while an IR response is brought about in one hour. Although the UV response is
mounted more slowly, TP53 levels continue to rise more than three hours after the response, by
which time levels are already dropping in an IR-induced response [138].
Upon activation TP53 can fulfil its primary role as a transcription factor for a large
number of downstream targets. In 1992 el-Deiry et al. used an unbiased approach to identify
DNA bound to TP53. They found 18 independent genomic DNA fragments bound to TP53. Each
of these possessed two copies of a 10bp motif 5’-Pu-Pu-Pu-C-A/t-T/a-G-Py-Py-Py-3’ separated
by a stretch of no more than 13bp (where Pu are purines and Py are pyrimidines) [139].
Following the sequencing of the human genome, a new technique known as ChIP on chip was
developed, combining chromatin immunoprecipitation with high-density oligonucleotide arrays
capable of identifying DNA-protein interactions. Cawley et al. used this technique to predict up
to 1600 TP53 binding site regions across the genome by extrapolating from results found on
chromosomes 21 and 22 [140]. Subsequent to this, Wei et al. conducted a complete genome scan to
find all of TP53’s direct targets. Contradicting the results from Cawley et al., they found only
542 high quality TP53 binding sites [141]. Using this, the authors were able to compile a list of
TP53 target genes. These genes can be broadly classified by TP53’s two primary roles: cell cycle
arrest and apoptosis.
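The consensus motif reported by el-Deiry et al. lends itself to a simple sequence search. The following is a minimal sketch, assuming a strict interpretation of the motif (two perfect 10bp half-sites separated by 0 to 13 unambiguous bases); the function name `find_response_elements` and the example sequence are hypothetical, and real TP53 binding sites frequently deviate from the consensus at one or more positions, which this strict pattern would miss.

```python
import re

# Half-site: Pu-Pu-Pu-C-(A/T)-(T/A)-G-Py-Py-Py, with Pu = A or G, Py = C or T.
HALF_SITE = "[AG]{3}C[AT][AT]G[CT]{3}"
MAX_SPACER = 13  # maximum number of bases allowed between the two half-sites

# A lookahead is used so that overlapping candidate sites are all reported.
# The spacer is matched lazily, so the nearest second half-site is returned.
RESPONSE_ELEMENT = re.compile(
    rf"(?=({HALF_SITE})([ACGT]{{0,{MAX_SPACER}}}?)({HALF_SITE}))"
)


def find_response_elements(sequence):
    """Return (start, first_half, spacer_length, second_half) for each match."""
    hits = []
    for match in RESPONSE_ELEMENT.finditer(sequence.upper()):
        first, spacer, second = match.groups()
        hits.append((match.start(), first, len(spacer), second))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: two consensus half-sites with a 4bp spacer.
    example = "TTGGGCATGCCCACGTAGACTAGTTTAA"
    for hit in find_response_elements(example):
        print(hit)
```

A practical scan would more likely use a position weight matrix that tolerates mismatches; the exact-match regular expression is used here only because it mirrors the motif as written.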
1.3.5 TP53-mediated Cell Cycle Arrest
Cells can normally arrest in the G1 or G2 phases. Cells lacking TP53, however, do not
seem able to arrest in the G1 phase [142]. GADD45’s association with this TP53-mediated G1
cell-cycle arrest was found following the identification of a TP53 binding site in its promoter and
the observation that it is induced following DNA damage. This induction is dependent on wild-type
TP53 function and is also not present in cells derived from ataxia telangiectasia patients [143].
TP53 also exerts strict control over P21 (otherwise known as WAF1 or CDKN1A), a key regulator
of the cell cycle. Upon detection of DNA damage during G1, P21 is induced by TP53. It is then
transported to the nucleus where it blocks the cell cycle [144]. It accomplishes this by inhibiting
two cyclin-dependent kinases (CDKs): CDK2 and CDC2. This TP53-mediated arrest is critical for
keeping cells from replicating errors in the S phase. If DNA damage is detected during the S
phase, TP53 induces P21, which will associate with the proliferating cell nuclear antigen (PCNA),
halting replication.
1.3.6 TP53-mediated Apoptosis
Programmed cell death, termed ‘apoptosis’, is not only vital in the suppression of cancer,
but also to proper development, homeostasis, and immune system maintenance. Strict regulation
of this system is therefore necessary throughout all stages of life. While an inability to initiate
apoptosis effectively is a major hallmark of cancer [145], excessive apoptosis can lead to
embryonic death [146]. The primary drivers of apoptosis are the caspases, a family of cysteine
proteases that cleave their substrates at an Asp residue [147]. Activated TP53 promotes apoptosis
by inducing the expression of the Fas receptor [148]. When the Fas receptor is present on its
surface, the cell becomes sensitized to the effect of the Fas ligand (FasL). Upon binding FasL,
the receptor binds the Fas-associated death domain protein (FADD) in the cytoplasm. It is FADD
that recruits two initiator caspases: caspase-8 and caspase-10. These initiator complexes activate
the executioner caspases 3, 6, and 7. It is the cleavage activity of these executioner caspases that
is responsible for the morphological changes of apoptosis, including protrusions from the plasma
membrane and the collapse and fragmentation of the nuclear structure.
Cytochrome c resides between the inner and outer membranes of the mitochondria but is
released into the cytosol when apoptosis is initiated. It then binds APAF1 to form a seven-spoked
molecule called the apoptosome, which activates procaspase 9 by converting it into caspase 9.
Caspase 9 goes on to activate the executioner caspases itself. A family of proteins, the Bcl-2
family, controls the release of cytochrome c by the mitochondria [149]. Of this family, the Bcl-2
gene itself and four related genes (Bcl-XL, A1, Bcl-w, and Mcl-1) are anti-apoptotic, while the
Bax family (Bax, Bak, and Bok) as well as the BH3-only family (Bim, Bik, Bad, Puma, Bid,
Noxa, Hrk, and Bmf) are pro-apoptotic. It is the relative quantity of these pro- versus
anti-apoptotic gene products that determines whether cytochrome c is released from the
mitochondria or retained. TP53 is a transcriptional activator of Bax [150], NOXA [151], and
PUMA [152], all pro-apoptotic genes whose activation promotes mitochondria-mediated apoptosis.
1.4 Copy Number Variation
1.4.1 Overview
While variation in SNPs has been studied in LFS patients, SNPs represent but one
form of common genomic variation. Microsatellites and minisatellites confer significant
variation between individuals as well. In the last decade the important role that copy number
variations (CNVs) play in genetic diversity has come to light. CNVs are defined as structural
genomic variants in which a copy number difference has been observed between two or more
genomes [153]. A CNV is defined as being larger than one kilobase in size and can span many
megabases, in which case it may be visible under a microscope. A related class of structural
variation known as “indels” is commonly defined as being between 10 and 1000bp. The discovery of this
prevalent form of genomic variation quickly overturned the idea of a single diploid “reference
genome”. The past decade has seen an explosion in data showing just how prevalent CNVs are in
the general population. Prior to the significant advances in DNA microarray technology over the
last 10 years, only a few copy number variable loci had been identified, such as the
alpha-7-nicotinic receptor gene at 15q13-15 [154]. The initial landmark genome-scale studies of
2004 detected 76 CNVs in 20 individuals [155] and 255 CNVs in 55 individuals [156]. These initial
findings were shown to be just the tip of the iceberg. The Database of Genomic Variants (DGV)
compiles structural variation detected in healthy samples in CNV studies across the world [157].
Currently, 109836 merged-level CNVs are listed in the database. The last few years in particular
have seen a dramatic increase in the reporting of CNVs with the advent of ultra-high resolution
arrays and whole genome sequencing (Figure 5), with reported CNVs encompassing up to 71% of
the genome. Estimating the true size of many of these CNVs can be difficult, particularly with
regard to earlier studies, as we now know that the limited resolution of these platforms led to an
over-estimation of CNV size. For example, in 2010, Conrad and Pinto et al. used a set of 20
NimbleGen arrays, harnessing 42 million probes in total, which were tiled across the genome at a
density of one probe every 50bp. This led to the discovery of 11700 CNVs greater than 443bp in
size, with an average of 1117 and 1488 CNVs detected per person in the European and African
samples, respectively [158]. The average number of CNVs per individual was higher, and the
average CNV size was lower, than previously estimated. Due to increasing platform resolution,
first of CGH arrays and now of next-generation sequencing (NGS), this trend has continued with
the reporting of numerous CNVs and indels under 10kb in size in the healthy population (Figure
6).
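To give a sense of scale for the array design quoted above, the figures can be combined in a short back-of-the-envelope calculation. This is only a sketch: the helper names are hypothetical, and the span estimate assumes evenly spaced, non-overlapping probes, which is a simplification of any real tiling design.

```python
def probes_per_array(total_probes, n_arrays):
    """Average number of probes carried by each array in the set."""
    return total_probes // n_arrays


def tiled_span_bp(total_probes, spacing_bp):
    """Approximate sequence covered, assuming one probe per spacing_bp
    with no overlap between probes (a simplifying assumption)."""
    return total_probes * spacing_bp


if __name__ == "__main__":
    total = 42_000_000  # probes across the full array set, as quoted in the text
    print(probes_per_array(total, n_arrays=20))   # probes on each of the 20 arrays
    print(tiled_span_bp(total, spacing_bp=50))    # base pairs tiled at 50bp spacing
    print(round(1488 / 1117, 2))                  # African : European CNVs per person
```

Under these assumptions each array carries 2.1 million probes and the set tiles roughly 2.1 billion base pairs.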
Figure 5- Recently there has been a rapid rise in copy number variation reported to the
Database of Genomic Variants [157].
Figure 6- The majority of copy number variants reported in the Database of Genomic
Variants are now under 10kb [157].
1.4.2 Generation of Copy Number Variants
Structural alterations such as CNVs are often generated as a result of errors in DNA
repair. In an attempt to repair DNA breaks, segments of the genome can be fused in such a way
that can generate deletions, duplications or inversions [159]. These repair mechanisms can be
broadly divided into those which are homology-mediated and those that are not.
Homologous recombination (HR) not only generates new combinations of linked
alleles at meiosis, but is central to many DNA repair processes. HR requires significant DNA
sequence identity, roughly 50bp in E. coli [160] and up to 300bp in mammalian cells [161,162].
During meiosis the Spo11 protein creates DNA double stranded breaks, thus providing the
substrate for HR [163]. This HR during meiosis allows for the exchange of material between the maternal and
paternal chromosomes and is responsible for much of the genetic diversity observed in sexually
reproducing organisms. While mitotic HR is functionally similar, it is brought about by DNA
double stranded breaks caused by cellular metabolism or external DNA damaging agents. The 5’
ends of the break are resected by nucleases, which leads to the formation of a 3’ tail (a
single stranded overhang). It is at this stage that a homologous donor segment is found, allowing
for the invasion by the single stranded DNA of the homologous duplex DNA, displacing a strand
and creating a D-loop. The strand exchange is catalyzed by RAD51 in both meiotic and mitotic
HR while DMC1 is only active during meiotic HR. The synthesis of DNA from the invading
strand can now begin with the donor DNA acting as a template and the second end of the double
strand break is “captured”, creating two Holliday junctions which are then cut and ligated to seal
the nicks. The critical role of HR in maintaining genomic integrity can be seen in cells with
suppressed RAD51 expression as they rapidly accumulate structural abnormalities and cease to
divide [164]. While HR is generally a robust DNA repair mechanism capable of repairing double
stranded breaks accurately, repetitive elements in the genome can lead to recombination between
homologous segments at different chromosomal locations. This is known as non-allelic
homologous recombination (NAHR) and is often associated with low-copy repeats, known as
segmental duplications. These segmental duplication-mediated NAHR events have been
associated with a number of genomic disorders, including 24kb flanking repeats in
Charcot-Marie-Tooth disease [165] and ~200kb repeats in Smith-Magenis syndrome [166].
Synthesis-dependent strand annealing and break induced replication (BIR) represent
other forms of homology-mediated DNA repair processes. Synthesis-dependent strand annealing
is similar to the HR model described above, but rather than forming a double Holliday junction
by capturing the second DNA end, the synthesized strand of DNA is displaced from the D-loop,
allowing it to anneal with the other end of the break. The BIR mechanism comes into play upon
the collapse of DNA replication forks, resulting in one-ended double stranded breaks [167]. When a
replication fork encounters a nick in the template strand this mechanism is engaged. One strand
of the fork breaks off and is resected, revealing a one-sided 3’ overhang. This invades a
homologous strand to form a D-loop. A new replication fork is then formed upon which both the
leading and lagging strands are synthesized. This process is normally faithful, but if the repair
involves a homologous sequence at a different chromosomal location, structural alterations can
result. There is in fact growing evidence that BIR constitutes a major mechanism of CNV
formation [159].
While homology mediated mechanisms are thought to account for the majority of CNV
formation through DNA repair, non-homology mediated pathways can also repair DNA double
stranded breaks and could thus lead to structural alterations. The most prominent of these
mechanisms is known as non-homologous end joining (NHEJ). In non-dividing haploid
organisms or in diploid organisms that are not in S phase, there is no homologous donor nearby
that can be used in the homology mediated repair mechanisms. NHEJ provides a repair
mechanism that is commonly used in these instances to rejoin double stranded breaks into a
contiguous product [168]. In vertebrates, the Ku protein binds to DNA to form a complex with
which a nuclease, polymerase and ligase can interact. This is a flexible process that can lead to
many different junction products, with each side potentially having different nucleotides resected
or added. When there is a small amount of homology at the ends of the DNA, they can be joined
together. NHEJ has been shown to be the driving process behind a number of genomic disorders.
In a breakpoint characterization of 39 deletions at the dystrophin gene for example, unequal
homologous recombination was very poorly represented while junction features such as short
homologous segments were common, suggesting many of the pathogenic structural alterations
were brought about by NHEJ [169].
A more recently discovered mechanism known as ‘fork stalling and template switching’
(FoSTeS) has been implicated in the demyelinating disorder Pelizaeus-Merzbacher disease [170].
In FoSTeS the replication fork stalls, causing the lagging strand to disassociate from the original
template and anneal to another replication fork nearby, after which DNA synthesis is
reinitiated. The location of the new fork will dictate whether a deletion (new fork downstream)
or a duplication (new fork upstream) will occur. After stalling of the first fork, the
disassociated strand invades the new fork, making use of small microhomologous segments. The
repetitive disassociation of a nascent strand and reinitiation of DNA synthesis on the original
template can cause complex rearrangements to arise [171]. Of note is that this is a
replication-based mechanism, challenging the notion that congenital pathogenic CNVs are all of
meiotic origin. Further evidence of the potential impact of non-meiosis driven CNVs can be seen
in monozygotic twins that have different CNVs [172]. The origin of these different CNVs would
have to have been in somatic cells during mitosis.
1.4.3 Germline CNVs in Disease
Copy number variants do in fact appear to have significant effects on the transcription of
genes by either affecting dosage directly or by disrupting proximal or distant regulatory
regions [173]. Pathogenic CNVs have often been observed to contain multiple genes. The diseases
caused by such genomic rearrangements are known as “genomic disorders” [174]. Because multiple
genes are affected, genomic disorders often present with a broad and varied phenotype, such as
Prader-Willi
syndrome which is associated with a 15q11-q13 deletion involving many genes and manifests
itself with various mental and physical effects. Beginning with karyotyping and later with early
CGH arrays, identifying the genetic cause of genomic disorders has relied on low resolution
detection methods which are biased towards the detection of large, multi-gene CNVs.
With the increased use of whole genome sequencing in the clinic, as well as high
resolution arrays, the importance of rearrangements of just one gene, or even just one part of one
gene, in genomic disorders will be better understood. Interestingly, the effect of a pathogenic
CNV may not be limited to the gene, or genes, it contains. Williams-Beuren syndrome is a rare
neurodevelopmental disorder associated with a deletion at 7q11 involving up to 28 genes. Upon
measuring expression levels of the genes as well as those of the surrounding regions, it was
unexpectedly observed that even genes far outside the deleted region (thus having a copy number
of 2) also had reduced expression levels [175]. These results suggest that flanking genes even
several megabases away from a genomic rearrangement should be considered to have a potential
role in the observed phenotype, despite being copy number neutral.
1.4.4 CNVs in Cancer predisposition
Retinoblastoma presented one of the earliest examples of a genomic rearrangement
causing cancer predisposition when individuals with retinoblastoma presented with
cytogenetically visible deletions at 13q14 [176,177], which led to the mapping of the RB1 gene to
this region. There are now approximately 100 genes known to cause Mendelian-inherited cancer
syndromes when mutated [69]. Roughly 40% of these genes have been observed to harbour
deleterious CNVs in individuals affected by these cancer predisposition syndromes [178].
Whole-genome CNV profiling in high-risk individuals has recently revealed a number of candidate
regions and genes in individuals predisposed to colorectal cancer [179], breast cancer [180], and
melanoma [181].
Identifying common CNVs that confer a moderate risk of cancer is significantly more
difficult, both because each variant may confer only a slight increase in risk and because many
CNVs encompass genes that mediate interaction with the environment. In these cases the
detection of a pathogenic CNV would depend on the presence or absence of certain
environmental triggers. The effect of CNVs being modulated by environmental factors is seen in
the drug response of detoxification genes [182,183]. It is perhaps surprising that a significant
number of cancer genes have been identified in CNV regions. Previously, our own lab found 49
cancer genes that were directly encompassed or overlapped by a CNV in more than one person
from a large reference population [29]. As the resolution of the platform used was quite modest
(the mean CNV size was 206kb), one would expect the true number of cancer genes encompassed
or overlapped by CNVs to be much higher. The DGV contains numerous CNVs encompassing
these genes. An interesting example of this is copy number variation in Rad51L1, a gene that is
vital for DNA repair by homologous recombination and has been shown to harbour a SNP that is
associated with breast cancer [184].
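The distinction drawn above between a gene that is encompassed by a CNV and one that is merely overlapped comes down to a comparison of genomic intervals. The following is a minimal sketch of that comparison, not the procedure used in the cited study; the `Interval` type, the function names, and all coordinates and gene names are hypothetical, and coordinates are assumed to be inclusive and on the same genome build.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def classify_overlap(cnv, gene):
    """Classify how a CNV relates to a gene: 'encompassed', 'overlapped' or 'none'."""
    if cnv.chrom != gene.chrom or cnv.end < gene.start or cnv.start > gene.end:
        return "none"
    if cnv.start <= gene.start and cnv.end >= gene.end:
        return "encompassed"   # the CNV spans the entire gene
    return "overlapped"        # the CNV covers only part of the gene


def genes_affected(cnvs, genes):
    """Return the names of genes encompassed or overlapped by at least one CNV."""
    return {
        name
        for name, gene in genes.items()
        if any(classify_overlap(cnv, gene) != "none" for cnv in cnvs)
    }


if __name__ == "__main__":
    # Hypothetical gene models and CNV calls, for illustration only.
    genes = {
        "GENE_A": Interval("chr2", 1_000, 5_000),
        "GENE_B": Interval("chr9", 20_000, 30_000),
    }
    cnvs = [Interval("chr2", 500, 6_000), Interval("chr9", 40_000, 50_000)]
    print(genes_affected(cnvs, genes))
```

For a genome-wide comparison against a resource the size of the DGV, an interval tree or a sorted sweep would scale better than this pairwise check.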
While common cancer SNPs and CNVs likely contribute to mild cancer predisposition in
much of the population, it is the rare, high-risk CNVs that are associated with highly penetrant
cancer predisposition syndromes. The majority of these are thought to primarily be caused by
base-pair sized germline mutations. However, PCR-based sequencing often leaves genomic
rearrangements undetected, and the rise of copy number detection methods such as CGH arrays
and multiplex ligation-dependent probe amplification (MLPA) has led to an increased
appreciation of the role of copy number gains and losses in these syndromes. A summary of
genes associated with cancer syndromes in which CNVs have been observed can be found in
Table 2. While it appears that point mutations and CNVs often result in a similar phenotype, there
are cases where CNVs confer a related but distinct phenotype compared to point mutations.
When one copy of the entire APC gene is deleted, for example, it appears to result in an
attenuated form of familial adenomatous polyposis (FAP) compared to the more severe cases
seen in individuals harbouring a point mutation in APC [185]. A number of studies have assessed
the potential role of CNVs in individuals who tested negative for mutations in genes associated
with their respective cancers. These include hereditary pancreatic [186], colorectal [179], and
breast cancers [180]. While these studies have not yet identified causative genes, a number of
rare CNVs have been identified in each of them. In both the breast and colorectal cancer studies,
a shared CNV encompassing KIAA1797 and MIR491 at 9p21.3 was reported. This locus appearing
in two separate cancer predisposition screens provides evidence of a possible role in cancer
predisposition. Another interesting example of a rare CNV’s potential role in cancer risk was
provided by Yang
Another interesting example of a rare CNV’s potential role in cancer risk was provided by Yang
et al. who reported a rare 4q13 duplication in a melanoma-prone family. Although this
duplication is unique to this family, it segregates with melanoma in the three affected
individuals [181]. This region contains 10 genes, two of which, CXCL1 and IL-8, have been shown
to stimulate melanoma growth [187,188]. There are also indications that CNVs may be responsible
for some of the variation seen within cancer syndromes. A study of individuals with
BRCA1-associated ovarian cancer detected significantly more copy number losses in the BRCA1
group compared to sporadic ovarian cancer cases and controls. The BRCA1 group also showed CNVs
at 31 previously unknown regions [189]. The implications of this presumed genomic instability are
intriguing. Does the primary mutation result in an unstable genome prone to de novo CNV
formation? Or are the CNVs directly involved in tumour formation? These two concepts
are not necessarily mutually exclusive. It is possible that the primary mutation leads to
accelerated CNV generation and that some of these CNVs then go on to influence the phenotype,
resulting perhaps in a more severe presentation.
Gene Syndrome Tumour Types
APC Adenomatous polyposis coli; Turcot syndrome Colorectal, pancreatic, desmoid,
hepatoblastoma, glioma, other CNS
cancers
BMPR1A Juvenile polyposis Gastrointestinal polyps
BRCA1 Hereditary breast/ovarian cancer Breast, ovarian
BRCA2 Hereditary breast/ovarian cancer Breast, ovarian, pancreatic, leukemia
(FANCB, FANCD1)
CDH1 Familial gastric carcinoma Gastric, breast
CDKN1B Multiple endocrine neoplasia type IV Pituitary tumor, testicular tumor
CDKN2A Familial malignant melanoma Melanoma, pancreatic
CHEK2 Familial breast cancer Breast, prostate
CREBBP Rubinstein–Taybi syndrome Nervous system, brain, leukemia
CYLD Brooke–Spiegler syndrome, familial
cylindromatosis, multiple familial
trichoepithelioma
Multiple skin appendage tumors
EPCAM Lynch syndrome Colorectal, endometrial
EXT1 Multiple exostoses type 1 Exostoses, osteosarcoma
EXT2 Multiple exostoses type 2 Exostoses, osteosarcoma
FANCA Fanconi anemia A Acute myeloid leukemia
FH Hereditary leiomyomatosis and renal cell
cancer
Leiomyomatosis, renal
FLCN Birt–Hogg–Dubé syndrome Renal cell carcinoma
GPC3 Simpson–Golabi–Behmel syndrome Wilms’ tumors
HRPT2 Hyperparathyroidism–jaw tumor syndrome Parathyroid carcinoma, renal cell
carcinoma
JAG1 Alagille syndrome Hepatocellular carcinoma, papillary
thyroid carcinoma
MADH4 Juvenile polyposis Gastrointestinal polyps
MEN1 Multiple endocrine neoplasia type 1 Parathyroid adenoma, pituitary
adenoma, pancreatic islet cell,
carcinoid
MSH2 Lynch syndrome Colorectal, endometrial, ovarian
MSH6 Lynch syndrome Colorectal, endometrial, ovarian
NF1 Neurofibromatosis type 1 Neurofibroma, glioma
NF2 Neurofibromatosis type 2 Meningioma, acoustic neuroma
44
NSD1 Sotos syndrome Increased risk of benign or malignant
tumors, including neuroblastoma and
gastric carcinoma
PMS2 Lynch syndrome; Turcot syndrome Colorectal, endometrial, ovarian,
medulloblastoma, glioma
PRKAR1A Carney complex Myxoma, endocrine, papillary thyroid
PTCH1 Gorlin syndrome Skin basal cell, medulloblastoma
PTEN Cowden disease; Lhermitte–Duclos syndrome Breast cancer, leukemia, renal cell adenocarcinoma, neuroendocrine carcinoma, Merkel cell carcinoma
RB1 Familial retinoblastoma Retinoblastoma, sarcoma, breast, small cell lung
RUNX1 Familial platelet disorder Acute myeloid leukemia
SDHB Familial paraganglioma Paraganglioma, pheochromocytoma
SDHC Familial paraganglioma Paraganglioma, pheochromocytoma
SDHD Familial paraganglioma Paraganglioma, pheochromocytoma
SMAD4 Juvenile polyposis syndrome Colon, stomach, small bowel and pancreas
SMARCB1 Rhabdoid tumor predisposition syndrome-1 Schwannomas, malignant rhabdoid
STK11 Peutz–Jeghers syndrome Jejunal hamartoma, ovarian, testicular, pancreatic
TP53 Li–Fraumeni syndrome Breast, sarcoma, adrenocortical carcinoma, glioma, multiple other tumor types
TSC1 Tuberous sclerosis 1 Hamartoma, renal cell
TSC2 Tuberous sclerosis 2 Hamartoma, renal cell
VHL von Hippel–Lindau syndrome Renal, hemangioma, pheochromocytoma
WT1 Denys–Drash syndrome, Frasier syndrome, Familial Wilms’ tumor Wilms’ tumor
Table 2- Known cancer predisposition genes with rare copy number variants and their
associated syndromes. Adapted from178
1.5 Rationale
Identification of pathogenic genes is vital in the management of patients with cancer
predisposition syndromes. A genetic cause remains to be identified for a quarter of all LFS cases
and the majority of LFL cases. Current technology allows for the high-throughput genetic
screening of TP53 wild-type individuals. The use of a custom CGH array will allow us to
interrogate hundreds of genes at exon-level resolution that are likely to be involved in the
LFS/LFL phenotype. Using this custom array we can expect to detect a number of copy number
variable regions in our genes of interest. Identification of copy number variants in these genes
will hopefully lead to the identification of a handful of candidate genes that are the underlying
cause of the highly penetrant cancer predisposition seen in many LFS/LFL patients which thus
far remains unexplained.
Chapter 2: Materials and Methods
2.1 Genes of Interest
Due to the high penetrance of the cancer phenotype in the TP53-WT families, it was
decided that it was best to focus on cancer-related genes as opposed to a more conventional
genome-wide approach. This allowed for a significantly higher resolution than would be feasible
with a standard genome-wide CGH array. The Wellcome Trust Sanger Institute’s Cancer Gene
Census is an ongoing effort to catalogue those genes for which mutations have been causally
implicated in cancer. As of March 2014, it includes 522 genes, 20% of which have had mutations
in the germline observed190. When the custom array was being designed for this study, the list
included 427 genes (Supplementary Table 1). As the phenotypes of the TP53-WT families are
similar to those who harbour TP53 mutations, p53-pathway genes were also of interest. As p53
has hundreds of binding partners, it was necessary to limit which genes were to be included in
the array. Ingenuity Pathways Analysis (IPA; Ingenuity Systems Inc., USA) was used to identify
genes primarily involved in p53-regulated cell cycle control. From the IPA, an additional 16
genes were selected based on their relevance to cancer biology and their interaction with p53
(Supplementary Table 2). Collectively, this expanded the total number of genes of interest to 443.
2.2 Array Design
2.2.1 Design Overview
The initial expectations for the custom array were that it would have a resolution of at
least 300bp in the exons, 5kb within the introns, and 1kb in the defined promoter region (defined
as being at least 5kb from the transcription start site) in the genes of interest as well as ~100kb
across the rest of the genome. Based on previous experience with the custom Agilent platform
and Partek’s Genomics Suite, it was decided that a minimum of three probes was required to
make a reliable call in a given region. This meant that an average inter-probe distance of no more
than 100bp, ~1660bp, 333bp, and ~33kb was required for the exons, introns, promoter regions, and
across the genome, respectively. The Agilent 4x180k format (4 identical arrays of ~180 thousand
probes each per slide) was selected as the most appropriate to achieve the desired resolution in a
cost-effective manner.
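The spacing figures above follow from dividing each target resolution by the three-probe minimum. The short calculation below is only an illustrative sketch of that arithmetic and was not part of the original design workflow; the function and variable names are assumptions introduced here.

```python
# Illustrative sketch (names are hypothetical, not from the design workflow).
# With at least three probes needed per call, the largest average spacing that
# still achieves a target resolution is the resolution divided by three.
MIN_PROBES_PER_CALL = 3

# Target resolutions in base pairs, as stated in the design overview.
target_resolution_bp = {
    "exon": 300,
    "intron": 5_000,
    "promoter": 1_000,
    "genome": 100_000,
}


def max_probe_spacing(resolution_bp: float, min_probes: int = MIN_PROBES_PER_CALL) -> float:
    """Largest average inter-probe distance compatible with the resolution."""
    return resolution_bp / min_probes


spacing_bp = {
    region: max_probe_spacing(resolution)
    for region, resolution in target_resolution_bp.items()
}

for region, spacing in spacing_bp.items():
    print(f"{region}: ~{spacing:.0f} bp between probes")
```

The intron and genome-wide values come out slightly above the rounded figures quoted in the text, which are approximations.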
2.2.2 Design of Exonic Probes
The exonic regions of the genes of interest were the focus of the array; therefore a special
effort was made to ensure the highest possible probe density in these regions in particular.
However, Agilent recommends that probes not be placed with an inter-marker distance of less
than 100-150bp. This is due to two main problems that become more severe as probe density
increases. First, if there is more than one probe per 100-150bp segment of
DNA, any given fragment of DNA may have more than one probe it can bind to with perfect
complementarity. This can result in noisy data as multiple probes may be competing to bind to
one fragment of DNA. The second concern is that the average quality of probes tends to decrease
as density increases. If one attempts to place many probes in a small area, it must be expected
that some of the probes will not be ideal with regard to sequence similarity and melting
temperature. These caveats had to be considered when attempting to meet the tiling-level
resolution that was desired. For these reasons, true high-density, tiling-resolution CGH arrays are
relatively rare, although they have been run successfully22,158.
Exon regions were defined using the Consensus Coding Sequence Project (CCDS), which
is a collaborative effort between the National Center for Biotechnology Information, the
European Bioinformatics Institute, the University of California, Santa Cruz, and the Wellcome
Trust Sanger Institute to agree upon a consistent set of protein coding genes for human and
mouse for public use191. Each coding exon of each splice variant of each of the 443 genes of
interest was defined this way. To allow probes to cover the exon/intron boundaries, an
additional 50bp was added to each end of these coordinates (100bp total), which were based on
the hg19 human genome assembly. The resulting defined “exon regions” encompassed roughly
1.6 million bp in total, with an average size of 266bp.
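The padding step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the coordinates are invented, the function names are hypothetical, and regions are treated as half-open (start, end) intervals, which the text does not specify.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open; an assumption

FLANK_BP = 50  # flank added to each end of every coding exon, per the text


def pad_regions(regions: List[Region], flank: int = FLANK_BP) -> List[Region]:
    """Extend every region by `flank` bp on both sides (start clamped at 0)."""
    return [(chrom, max(0, start - flank), end + flank) for chrom, start, end in regions]


def summarize(regions: List[Region]) -> Tuple[int, float]:
    """Return (total bp, mean region size) for a list of regions."""
    sizes = [end - start for _, start, end in regions]
    return sum(sizes), sum(sizes) / len(sizes)


# Invented coordinates for illustration only; not taken from CCDS.
exons = [("chr1", 10_000, 10_166), ("chr1", 12_500, 12_666)]

padded = pad_regions(exons)
total_bp, mean_bp = summarize(padded)
print(padded)
print(total_bp, mean_bp)
```

Each 166bp example exon grows by 100bp in total, giving 266bp padded regions.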
Custom probes were designed in Agilent’s eArray program using the “Genomic
Tiling” option. The average probe spacing selected for these coordinates was 35bp. The program
was set to avoid repeat regions as well as AluI and RsaI restriction sites, as these enzymes are
used in the standard Agilent protocol to digest the genomic DNA. The preferred probe melting
temperature was 80˚C. Probes were “trimmed” in order to ensure that the melting temperatures
were as close to 80˚C as possible; probes can be trimmed
from the default size of 60bp to a minimum of 45bp. This resulted in an output of 30966 probes
tiled across the defined exon regions.
The eArray software can calculate probe performance scores for non-catalogue probes to
predict how likely it is that they will produce a good log ratio response when used on the Agilent
CGH platform. These scores are based on the GC content, melting temperature, sequence
complexity and metrics to measure homology with the rest of the reference genome. These
factors are taken into account and the probe is given a score between 0 and 1. The average probe
score of all the Agilent Catalogue probes is 0.759. The custom probes were scored and all probes
with a score under 0.3 were eliminated. This resulted in a total of 28373 probes tiling the defined
exon regions with an average score of 0.615. The average probe size after trimming was 51bp. In
total, the probes covered 1.1 million base pairs, or 70% of the defined exon regions. The actual
average probe spacing within these intervals was 55bp as there were many small stretches within
the defined regions that were not amenable to quality probe placement. Given the three-probe
minimum described above, this gave a predicted average resolution of 165bp.
The probe design protocol above was repeated for many different probe spacing inputs
from 8bp to 150bp. The 35bp input was determined to be the best based on a good balance of
probe density to probe quality. A previous custom Agilent array used by the lab had similar
probe spacing22. The 35bp set spacing resulted in a sufficient actual density (average 55bp
inter-probe distance) to allow for a resolution finer than the minimum desired resolution while still
maintaining a high percentage of acceptable probes (28373 out of 30966 with an average score
of 0.615).
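The score filter and the coverage and spacing figures quoted in this section can be illustrated with a small sketch. It does not reproduce eArray; the probe records, names, and example values below are hypothetical, and all probes are assumed to lie on a single chromosome with half-open coordinates.

```python
from typing import List, NamedTuple


class Probe(NamedTuple):
    start: int    # half-open interval [start, end); an assumption of this sketch
    end: int
    score: float  # performance score between 0 and 1


MIN_SCORE = 0.3  # probes scoring under this value were eliminated, per the text


def filter_probes(probes: List[Probe], min_score: float = MIN_SCORE) -> List[Probe]:
    """Keep only probes whose score is at least `min_score`."""
    return [p for p in probes if p.score >= min_score]


def covered_bp(probes: List[Probe]) -> int:
    """Bases covered by at least one probe (length of the union of intervals)."""
    covered = 0
    cur_start = cur_end = None
    for p in sorted(probes, key=lambda probe: probe.start):
        if cur_end is None or p.start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = p.start, p.end
        else:
            cur_end = max(cur_end, p.end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def mean_spacing(probes: List[Probe]) -> float:
    """Average distance between consecutive probe start positions."""
    starts = sorted(p.start for p in probes)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical probes for illustration only.
candidates = [Probe(0, 51, 0.8), Probe(20, 71, 0.2), Probe(40, 91, 0.6), Probe(200, 251, 0.5)]
kept = filter_probes(candidates)
print(len(kept), covered_bp(kept), mean_spacing(kept))
```

Because overlapping probes are counted once, the covered length can be much smaller than the number of probes multiplied by the mean probe size, which is consistent with tiled probes covering about 70% of the defined regions.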
2.2.3 Genomic Probes
A modest genome-wide resolution was desired in order to detect any large, multi-gene
copy number alterations that would otherwise go undetected by focusing only on the 443 genes
of interest. These probes were designed last and were thus used to fill the rest of the array (~48%
of probes). Due to the very low density required, selecting from the more reliable Agilent
Catalogue probes was much more desirable than custom probe design. Probes were selected to
span all 22 autosomes as well as the X chromosome, excluding the centromeric
regions. The Agilent Catalogue probes were selected based on a shared melting temperature of
80˚C as well as a similarity score filter that filters out probes with secondary genomic alignments
that could potentially impact probe performance (the most stringent filter). The desired average
spacing was 40kb. This resulted in 75100 probes spread across the entire genome, giving an
estimated resolution of ~120kb.
2.2.4 Non-coding Exon Probes
Deletions of the untranslated regions (UTRs) of genes have previously been implicated in
cancer, including in the germline of some cancer syndromic patients192. As these deletions need
not be large in order to demonstrate a significant effect, it was deemed necessary to investigate
these regions at high resolution. The UTRs of all the RefSeq transcripts were obtained via the
UCSC genome browser. This resulted in 1002 independent regions with a total of 1.1 million bp.
As with the coding exons, an additional 50 base pairs were added to either side of these
coordinates. The coordinates were used to design custom probes in the Agilent eArray software
in a manner similar to the coding exons. The average probe spacing selected was 75 base pairs.
As with the exon regions, the probes were designed to be trimmed and exclude repeat regions as
well as the relevant restriction sites, and to have a melting temperature around 80˚C. This
resulted in a total of 11106 custom probes being placed across all the defined UTR regions.
Roughly 57% of the defined area was covered, with an actual average spacing of 100 base pairs
and a mean probe size of 52bp. This resulted in an average estimated resolution of roughly 300
base pairs, though it should be noted that the spacing in the majority of regions was close to 75bp;
the calculated average is increased by repeat regions within the defined area.
2.2.5 Promoter Region Probes
Upstream deletions, even at a considerable distance from the transcriptional start site,
have previously been shown to alter gene function. For this reason, the regions upstream of our
genes of interest were also interrogated by the array. Two separate “promoter regions” were
defined: the first being 5kb upstream of the transcriptional start site, the second being 5kb
upstream from the first (a total of 10kb from the TSS). The overlap between these regions and
the UTR regions allows for high probe density around the transcriptional start site due to the high
density of the UTR probes. Custom probes were designed for these two regions in the same
manner described above. The first region was to have an average spacing of 75 base pairs; the
second was to have an average spacing of 150bp. This resulted in a total of 13714 probes being
placed in the first region and a further 6145 probes being placed in the second for a total of
19858 probes in the defined promoter region. Roughly 27% of the defined promoter region was
covered (37% and 16.7% for the first and second regions, respectively). The actual average
spacing for the first region was 162 base pairs, and for the second, 361 base pairs. This resulted
in estimated resolutions of 486 and 1083 base pairs, respectively. The probe density was
significantly lower than requested because of the large number of repeat regions of varying
sizes spread across these regions.
2.2.6 Gene Intron Probes
In order to detect either large alterations in the introns that may influence gene function
or smaller alterations that extend only slightly into the defined exon regions, a modest resolution
across the introns of the genes of interest was desired. As the desired probe density was
relatively low, it was possible to select probes from the Agilent Catalogue of pre-made probes.
The full RefSeq coordinates of each gene were supplied to eArray, which picked probes at a 1.25kb
inter-marker spacing while excluding the previously defined exon regions. The criteria used to
pick these probes were the same as those described above for the genomic probes. This resulted
in a total of 28404 probes being placed evenly across the intronic regions of the genes of interest.
With a roughly 1.25kb intronic spacing, the estimated resolution was 3.75kb. As intronic regions
were flanked by exonic probes, the resolution of intronic regions smaller than 3.75kb was
correspondingly higher.
2.2.7 Finalizing the Array
In addition to the requisite quality control (QC) probes, an additional 3576 Agilent
Catalogue probes were placed evenly across the genome in order to fill the final few spots on the
array. This led to an array with 166 417 experimental probes with a total of 180 880 array
features including the QC probes. All regions were checked computationally and visually using
the University of California, Santa Cruz (UCSC) Genome Browser.
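As a cross-check, the per-class probe counts reported in sections 2.2.2 to 2.2.7 can be tallied against the stated total of experimental probes. The sketch below is illustrative only; the class labels are shorthand introduced here, while the counts are those quoted in the text.

```python
# Probe counts as quoted in sections 2.2.2-2.2.7; the labels are shorthand
# introduced for this sketch.
probe_counts = {
    "coding exons (custom)": 28_373,
    "UTRs (custom)": 11_106,
    "promoter regions (custom)": 19_858,
    "introns (catalogue)": 28_404,
    "genome-wide backbone (catalogue)": 75_100,
    "additional genome-wide fill (catalogue)": 3_576,
}

REPORTED_EXPERIMENTAL_PROBES = 166_417
REPORTED_ARRAY_FEATURES = 180_880

total_experimental = sum(probe_counts.values())
non_experimental_features = REPORTED_ARRAY_FEATURES - total_experimental

for name, count in probe_counts.items():
    print(f"{name}: {count} ({count / total_experimental:.1%})")
print("total experimental probes:", total_experimental)
print("remaining features (QC and other):", non_experimental_features)
```

The tally reproduces the reported 166 417 experimental probes, leaving 14 463 features for QC and other non-experimental probes.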
Figure 7- Probes on our custom CGH array are highlighted in green while copy
number probes from the Affymetrix Genome-Wide Human SNP Array 6.0 are highlighted in red. The
custom array provides a much greater probe density in the coding regions of genes of interest.
2.3 Samples
2.3.1 Sample Selection
Samples were selected from families who fit the previously mentioned revised
Chompret Criteria. In total, whole genomic DNA extracted from 32 individuals from 22 kindreds
was selected to be run on the custom CGH array. All individuals were shown not to harbour
TP53 coding mutations or TP53 duplications/deletions by Sanger sequencing of the entire coding
region of the gene, as well as up to 100bp into the introns, and by multiplex ligation-dependent
probe amplification (MLPA). Samples from five healthy teenagers (median age = 15 years) were
obtained from the biorepository of the Montreal Neurological Institute. None of the control
individuals had a family history of cancer, including in second-degree relatives. In order to confirm
the accuracy of the array, three samples with confirmed deletions at the TP53 locus were also run.
These samples carried small (~2.2kb), medium (~15kb), and large (~1Mb) deletions that spanned the
range of expected alterations. This brought the total number of samples run to 40: 32 affected, 5
healthy controls, and 3 TP53 positive controls. The ages at diagnosis and tumour types can be
found in table 3. As Agilent CGH arrays are run with a sample of interest and reference
simultaneously in order to set a baseline copy number state, using a proper reference is vital. It
was decided that a pool of older individuals with no history of cancer would best serve this
purpose. While larger pools can mask common alterations, they help to ensure that uncommon
alterations are not misrepresented or missed entirely. The Centre for Applied Genomics (TCAG)
has developed the Ontario Population Genomics Platform (OPGP). The OPGP consists of
approximately 2600 DNA samples randomly selected from a predominantly Ontario-based
control cohort sorted into 96-well plates. The samples are prepared from permanently stored
EBV-transformed lymphoblastoid cell lines. From the OPGP, reference pools of 40 samples
from each sex were selected on the basis of having a medical history free of cancer, diabetes, and
mental illness. The average ages of the pools were 60.1 years and 67.2 years for males and
females, respectively. Each pool contained 38/40 Caucasian individuals, a similar proportion to
the cases.
Sample ID Sex Related Samples dx Age Tumour
1 F 16 OS-TP53 deletion control
2 F n/a TP53 deletion control
3 F 17 ADCC- TP53 deletion control
4 F 5 13 ARMS
5 M 4 17 OS
6 F 45 Breast
7 F 2 ADCC
8 F 2 medulloblastoma, thyroid, SCLT, meningioma
9 M 33 0.1 neuroblastoma
10 F 1 CPC
11 F 12,16 38 Melanoma
12 M 11,16 54 OligodenLG
13 F 40 Breast
14 M 6 months CPC
15 M 34 CPC
16 M 11,12 6 pNET
17 M 18,19 13 OS
18 M 17,19 7 ERMS
19 F 17,18 n/a Unaffected (Syndactyly)
20 F 21 37 Breast
21 F 20 1 Medulloblastoma
22 M 19 OS
23 M 24,25 2 ERMS
24 F 23,25 40 Breast
25 M 23,24 43 GliomaLG
26 M 33 5 CPC
27 F 4 Astrocytoma
28 F 5 NB
29 F 39 Uterus Sarcoma
30 F HEALTHY CONTROL
31 F 42 Breast
32 M 40 Astrocytoma
33 F 9 34 Cervix
34 M 36 38 DCIS
35 F 43 Breast
36 F 34 8 GBM
37 F HEALTHY CONTROL
38 F HEALTHY CONTROL
39 M HEALTHY CONTROL
40 M HEALTHY CONTROL
Table 3- Summary of all 40 samples run on the array including sex, related samples, age
at diagnosis (dx) and tumour type.
2.3.2 Subject Recruitment
Written informed consent was obtained for all 40 samples of interest prior to the
extraction of DNA from peripheral blood leukocytes. For patients with cancer, samples were
obtained prior to the initiation of therapy. All OPGP samples have been re-consented for
anonymized use as controls in genetic studies. All 80 samples used in the sex-matched reference
pools were taken from plate 19 of the OPGP. Before being run, DNA was quantified using a
NanoDrop Spectrophotometer (NanoDrop, Wilmington, DE) and the quality was assessed by
agarose gel electrophoresis. This study was approved by the Research Ethics Board at the
Hospital for Sick Children in Toronto.
2.4 Analysis and Validation of CGH Array Data
2.4.1 Custom CGH Array Analysis
All samples of interest were analysed using the Custom Agilent 4x180k CGH array
described above (Agilent Technologies, Santa Clara, CA). Both samples and references were
restriction enzyme digested, purified, labelled, and hybridized according to the manufacturer’s
protocol at TCAG. Arrays were run in four blocks (Samples 1-8, 9-16, 17-31, and 32-40) over a
span of roughly 6 months. After verifying the positive controls using Agilent Genomic
Workbench 5.0, the bulk of the array analysis was done using Partek Genomics Suite 6.6 (Partek,
St. Louis, MO). To detect small alterations, the Partek Genomic Segmentation model was used.
Called segments were defined as having a minimum of three markers showing means either
under a copy number of 1.6 or over 2.4, with a p-value of 0.01 and signal to noise threshold of 0.3.
This substantial list of alterations was then filtered by excluding any alterations detected in any
of the five healthy controls, as well as any alteration found in only one sample. The resultant list
of alterations defined our primary regions of interest. In an effort to identify alterations found
only in one sample but likely to be accurate, another segmentation analysis was done to identify
larger copy number altered segments with convincing means and probe frequencies. Called
segments were defined as having a minimum of 15 markers showing means either under a copy
number of 1.4, or over 2.6 with a p-value of 0.001 and signal to noise threshold of 0.5. As before,
alterations seen in the healthy controls were eliminated; however, alterations present in only one
sample were retained. Both segmentation analyses were shown to accurately identify all three
positive controls.
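The two-pass filtering logic described above can be sketched as follows. This is not the Partek implementation: the p-value and signal-to-noise thresholds are applied inside the segmentation model and are not modelled here, and matching alterations between samples by coordinate overlap is an assumption of this sketch, as the text does not state how shared alterations were matched. All names and example segments are hypothetical.

```python
from typing import List, NamedTuple, Set


class Segment(NamedTuple):
    sample_id: int
    chrom: str
    start: int
    end: int
    n_markers: int
    mean_cn: float  # mean copy number of the segment


def is_called(seg: Segment, min_markers: int, loss_below: float, gain_above: float) -> bool:
    """Marker-count and mean copy number criteria for a called segment."""
    return seg.n_markers >= min_markers and (seg.mean_cn < loss_below or seg.mean_cn > gain_above)


def overlaps(a: Segment, b: Segment) -> bool:
    """Assumed matching rule: same chromosome and overlapping coordinates."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_segments(segments: List[Segment], control_ids: Set[int], min_markers: int,
                    loss_below: float, gain_above: float,
                    require_recurrence: bool) -> List[Segment]:
    called = [s for s in segments if is_called(s, min_markers, loss_below, gain_above)]
    control_calls = [s for s in called if s.sample_id in control_ids]
    case_calls = [s for s in called if s.sample_id not in control_ids]
    kept = []
    for seg in case_calls:
        # Drop anything also seen in a healthy control.
        if any(overlaps(seg, c) for c in control_calls):
            continue
        # First pass only: drop alterations found in a single sample.
        if require_recurrence and not any(
            other.sample_id != seg.sample_id and overlaps(seg, other) for other in case_calls
        ):
            continue
        kept.append(seg)
    return kept


HEALTHY_CONTROLS = {30, 37, 38, 39, 40}  # healthy control sample IDs from table 3

# Hypothetical segments for illustration only.
segments = [
    Segment(4, "chr9", 100, 500, 5, 1.2),
    Segment(5, "chr9", 150, 450, 4, 1.3),
    Segment(6, "chr2", 100, 900, 6, 1.1),
    Segment(37, "chr2", 200, 800, 5, 1.2),
    Segment(24, "chr8", 1000, 19356, 15, 1.04),
    Segment(7, "chr1", 10, 50, 2, 1.0),
]

small_pass = filter_segments(segments, HEALTHY_CONTROLS, 3, 1.6, 2.4, require_recurrence=True)
large_pass = filter_segments(segments, HEALTHY_CONTROLS, 15, 1.4, 2.6, require_recurrence=False)
print([s.sample_id for s in small_pass], [s.sample_id for s in large_pass])
```

In this toy example the first pass keeps only the alteration shared by samples 4 and 5, while the second, more stringent pass retains the single-sample loss in sample 24.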
2.4.2 Quantitative PCR validation
Quantitative PCR was used to validate the regions of interest from the array analysis.
This was performed on a Roche LightCycler 480 (Roche Applied Science, Indianapolis, IN)
using the Roche SYBR green kit. Primers were designed using Primer3 and the human genome
reference assembly (UCSC version hg19). All the primer pairs as well as the run protocol can be
found in the supplementary methods (Supplementary table 3). Primers with the highest
efficiency were used as primary primers while secondary primers used for confirmation were
selected from primers with lower (but still sufficient) efficiency. All samples were run in
triplicate. A commercial pool of genomic DNA from 50 individuals (Roche) was used as a
positive calibrator for each gene in each experiment. Both BCMA and FOXP2 were used as
reference genes. Copy number state was determined by a relative quantification method which
compensates for differences in target and reference amplification efficiencies. Copy number
ratios below 0.7 of the reference were determined to be deletions while ratios above 1.3 were
determined to be amplifications.
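An efficiency-corrected relative quantification of this kind can be sketched as below. The exact formula used by the instrument software is not given in the text, so this sketch assumes a Pfaffl-style calculation in which each amplicon's efficiency is expressed as fold amplification per cycle (2.0 being perfect doubling). The function names and Ct values are hypothetical; only the 0.7 and 1.3 cut-offs come from the text.

```python
from statistics import mean
from typing import Sequence


def relative_copy_ratio(ct_target_sample: Sequence[float],
                        ct_target_calibrator: Sequence[float],
                        ct_ref_sample: Sequence[float],
                        ct_ref_calibrator: Sequence[float],
                        eff_target: float = 2.0,
                        eff_ref: float = 2.0) -> float:
    """Efficiency-corrected ratio of target to reference, relative to the calibrator.

    Assumed Pfaffl-style formula:
        ratio = eff_target ** (Ct_target_cal - Ct_target_sample)
                / eff_ref ** (Ct_ref_cal - Ct_ref_sample)
    Replicate Ct values (e.g. triplicates) are averaged first.
    """
    delta_target = mean(ct_target_calibrator) - mean(ct_target_sample)
    delta_ref = mean(ct_ref_calibrator) - mean(ct_ref_sample)
    return (eff_target ** delta_target) / (eff_ref ** delta_ref)


def classify(ratio: float, loss_below: float = 0.7, gain_above: float = 1.3) -> str:
    """Apply the copy number ratio cut-offs stated in the text."""
    if ratio < loss_below:
        return "deletion"
    if ratio > gain_above:
        return "amplification"
    return "neutral"


# Hypothetical triplicate Ct values: the target amplifies one cycle later in
# the sample than in the calibrator, while the reference gene is unchanged.
ratio = relative_copy_ratio(
    ct_target_sample=[26.0, 26.0, 26.0],
    ct_target_calibrator=[25.0, 25.0, 25.0],
    ct_ref_sample=[24.0, 24.0, 24.0],
    ct_ref_calibrator=[24.0, 24.0, 24.0],
)
print(ratio, classify(ratio))
```

With perfect efficiencies, a one-cycle delay in the target corresponds to half the calibrator copy number, as expected for a heterozygous deletion.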
2.4.3 TaqMan PTCH1 Validation
A catalogue TaqMan (Invitrogen) copy number assay that overlaps with the 5’ region of
interest was used to further investigate alterations in PTCH1. The assay was run using a
Roche LightCycler 480 according to the manufacturer’s protocol. All samples were run in
quadruplicate. The commercial pool of Roche genomic DNA from 50 individuals was used as a
positive calibrator while the Human TaqMan TERT assay was used as a reference.
2.4.4 Sequencing of PTCH1
Sequencing of PTCH1 was done in the Molecular Genetics Laboratory at the Hospital for
Sick Children. Sequencing primers can be found in supplementary table 4.
2.4.5 Mismatch Repair Gene Mutation Screening
Mismatch repair gene mutation screening, which included both sequencing and MLPA of
MSH2, MLH1, MSH6, and EPCAM, was done at the Laboratory for Advanced Molecular
Diagnostics at Mount Sinai Hospital. For the immunohistochemistry, sections were cut at 4
microns and allowed to dry prior to baking in a 60˚C oven for 30 minutes. The slides were
dewaxed through a series of xylenes, dehydrated through a series of graded alcohols and brought
to water before being rinsed with Tris buffer. Heat retrieval was performed in a Tris/HCl buffer
pH 9.0 for 30 minutes at 100˚C using a HistoPro microwave pressure cooker. The sections were
stained using the Dako 480 immunostainer. The MLH1 and MSH2 antibodies (Monoclonal ES05
and Monoclonal 25D12, respectively) were obtained from Leica Biosystems while the MSH6
and PMS2 (Rabbit monoclonal SP93 and Rabbit monoclonal PR3947, respectively) antibodies
were obtained from Cell Marque.
Chapter 3: Results
3.1 Array Results
3.1.1 Candidate Genes Found by Custom Array CGH
The segmentation analysis previously described accurately identified all three
TP53 deletion controls, matching their previously known sizes and copy numbers. After
further filtering by the criteria described above, a list of 179 genes/loci was identified as
potential regions of interest. The full list of regions of interest can be found in the supplementary
results (supplementary table 5). These were then evaluated based on the mean copy number and
probe frequency of the reported segment as well as by their likelihood to contribute to an
autosomal dominant cancer predisposition phenotype. This included looking particularly for
genes in the p53 pathway as well as genes that have been previously implicated in other cancer
predisposition syndromes. This resulted in the identification of 13 primary genes of interest
shown below in table 4. Of these, PTCH1 and DICER1 were chosen as the validation targets of
the highest priority due to their prevalence in the tested cohort, their convincing means, and the
likelihood of errors in these genes causing a wide range of tumours based on their biology and
previously known association with cancer predisposition.
Gene Type Sample ID Unique Loci
CREB1 Loss 5 3
CREB1 Gain 1 1
FNBP1 Loss 5 2
BCL7A Loss 11 1
APC Loss 7 2
APC Gain 4 3
ATM Loss 3 2
PTCH1 Loss 13 2
PTCH1 Gain 2 2
DICER1 Loss 7 2
HOXC13 Loss 4 2
EXT2 Loss 7 2
EXT2 Gain 1 1
WRN Loss 5 4
HOXA13 Loss 4 1
HOXA13 Gain 1 1
HOXA10 Loss 8 2
HOXA10 Gain 2 2
PMS2 Loss 2 1
PMS2 Gain 2 2
Table 4- Primary genes of interest after CGH array analysis. Unique loci refer to how
many distinct regions were identified as altered in a given gene.
3.1.2 Significant Alterations Found in Single Samples
As copy number gains are rarely implicated in autosomal dominant cancer predisposition,
only segments showing a copy number loss were considered as regions of interest (Table 5). In
total, 10 large copy number losses were detected in single samples. These all occurred in one
sample each with the exception of one sample demonstrating two large copy number losses.
While copy number losses were the focus, a single sample with embryonal rhabdomyosarcoma
(Sample 23) appeared to have seven large copy number gains. This was considered significant as
no other sample had more than two large copy number alterations detected. These seven
segments can be seen in table 6. Due to its involvement in rhabdomyosarcoma, the ~1.9kb copy
number gain in FOXO1 was chosen as the primary alteration of interest of these seven.
Segment Sample ID Size (bp) Markers Mean Gene(s)
chr8:118910810-118929166 24 18356 15 1.04201 EXT1
chr15:20562844-22617694 10 2054850 36 1.2054 multiple
chr15:20735436-22617694 32 1882258 33 1.18954 multiple
chr2:47636572-47646395 36 9823 25 1.16945 MSH2
chr4:190910973-191002380 15 91407 20 1.34896 DUX2, DUX4
chrX:129244389-129246323 15 1934 19 1.27595 ELF4
chr17:29558557-29562850 32 4293 18 1.35945 NF1
chr11:3697265-3698117 36 852 16 1.38014 NUP98
chr6:135501207-135502627 34 1420 15 1.34839 MYB
chr14:22401486-22963927 9 562441 172 1.27338 multiple
Table 5-Large copy number losses seen in only one sample
Location Size (bp) Probes Mean Gene
chr22:23651269-23664153 12884 61 2.643 BCR
chr7:156797661-156803572 5911 42 2.7186 MNX1
chr14:99640288-99642451 2163 30 2.89292 BCL11B
chr2:16079743-16082896 3153 22 2.76887 MYCN
chr13:41239262-41241181 1919 21 2.7937 FOXO1
chr2:100210137-100217744 7607 20 2.73956 AFF3
chr22:23522754-23523962 1208 18 3.06706 BCR
Table 6- Seven large copy number gains were detected in Sample 23
3.1.3 Difficulty Identifying Genuine Copy Number Alterations with Custom Probes
Due to the large number of reported copy number alterations seen in many samples,
particularly those of small to intermediate size (3-12 markers) and unconvincing means (copy
number of ~1.5 or ~2.5), we suspected that the segmentation analysis was over-reporting copy
number altered regions. Examination of the visual data plots of many regions of interest in all
40 samples revealed recurring patterns in the reported log ratios in areas with a high
custom probe density. Segments with reported copy number losses in one sample often had
means that approached, but did not pass, the calling threshold in other samples, which therefore
appeared copy number neutral by segmentation analysis. In addition, it was sometimes observed that
probes would follow very similar patterns in their individual reported log ratios. An example of
this can be found in figure 8. These patterns were mostly seen in samples that were run at the
same time, particularly in arrays on the same slide. This made the creation of a list of candidate
genes for validation more difficult as it seemed likely many reported alterations were not in fact
genuine.
Figure 8- A notable example of a sample (b) which was not identified by segmentation
analysis but reported very similar log ratios to a sample (a), which was. Samples a and b were
run beside each other on the same slide.
3.1.4 Genetic evidence of anticipation
A number of CNVs detected in parents were found to be expanded in their
children. These children also developed tumours at an earlier age than their parents. A total of 14
expanded alterations were detected (Table 7). Of these, only two were detected as being larger in
the parent, and both by only one probe. The expansion of alterations was most notable in samples
5 and 4 (father and daughter), who had five expanded alterations detected.
Sample ID Alteration Size (bp) Probes Segment Mean Gene(s)
5 Loss 302 4 1.194 CREB1
4* Loss 377 5 1.207 CREB1
5 Loss 1466 7 1.291 TPR
4* Loss 2125 11 1.387 TPR
25 Loss 270445 7 1.235 PABPC4L
23* Loss 233076 6 1.101 PABPC4L
34 Loss 807 5 1.004 BCL7A
36* Loss 982 6 1.151 BCL7A
5 Loss 437 4 1.011 MLL
4* Loss 62794 6 1.213 MLL
11 Loss 2049 21 1.289 KAT6B
16* Loss 13225 44 1.519 KAT6B
19 Loss 1103 9 1.283 none
17* Loss 1256 10 1.307 none
5 Loss 416 4 1.105 EXT2
4* Loss 1170 8 1.314 EXT2
5 Loss 170 3 1.125 PHOX2B
4* Loss 319 5 1.252 PHOX2B
11 Gain 9166 12 2.959 TIMP3/SYN3
16* Gain 9112 11 3.367 TIMP3/SYN3
34 Loss 6301 13 1.411 CHEK2
36* Loss 6334 14 1.202 CHEK2
33 Loss 643 14 1.523 HOXA10
9* Loss 726 17 1.372 HOXA10
33 Loss 174 3 1.303 HNRNPA2B1
9* Loss 281 4 1.016 HNRNPA2B1
19 Loss 355 4 1.352 NCOA1
18* Loss 468 5 1.285 NCOA1
Table 7- Summary of all shared alterations that were detected as expanded in families by
array segmentation analysis. The child in each parent/child pair is denoted with an asterisk (*).
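The parent/child comparison summarized in table 7 can be re-derived with a short tally. The sketch below simply transcribes the sample IDs, sizes, and probe counts from the table; the variable names are introduced here for illustration.

```python
# Each tuple: (parent_id, child_id, parent_size_bp, child_size_bp,
#              parent_probes, child_probes), transcribed from table 7.
pairs = [
    (5, 4, 302, 377, 4, 5),
    (5, 4, 1466, 2125, 7, 11),
    (25, 23, 270445, 233076, 7, 6),
    (34, 36, 807, 982, 5, 6),
    (5, 4, 437, 62794, 4, 6),
    (11, 16, 2049, 13225, 21, 44),
    (19, 17, 1103, 1256, 9, 10),
    (5, 4, 416, 1170, 4, 8),
    (5, 4, 170, 319, 3, 5),
    (11, 16, 9166, 9112, 12, 11),
    (34, 36, 6301, 6334, 13, 14),
    (33, 9, 643, 726, 14, 17),
    (33, 9, 174, 281, 3, 4),
    (19, 18, 355, 468, 4, 5),
]

# Alterations spanning more probes in the child than in the parent.
larger_in_child = [p for p in pairs if p[5] > p[4]]
# Alterations spanning more probes in the parent, with the probe difference.
larger_in_parent = [p for p in pairs if p[4] > p[5]]
parent_excess_probes = [p[4] - p[5] for p in larger_in_parent]

# Shared alterations in the sample 5 (father) / sample 4 (daughter) pair.
father_daughter = [p for p in pairs if (p[0], p[1]) == (5, 4)]

print(len(pairs), len(larger_in_child), len(larger_in_parent))
print(parent_excess_probes, len(father_daughter))
```

The tally agrees with the text: 14 shared alterations, two of which are larger in the parent by a single probe each, and five in the sample 5 and 4 pair.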
3.2 Validation of Candidate Genes
3.2.1 MSH2 involved in a Li-Fraumeni-Like phenotype
Due to the high prevalence of MSH2 deletions in hereditary nonpolyposis colorectal
cancer (HNPCC, or Lynch syndrome), validation of a possible MSH2 deletion in an LFL
proband with a glioblastoma multiforme (Sample 36 in table 3) was considered a high priority.
This deletion was confirmed by qPCR (Figure 10). The affected mother of the proband
was also run on the array (Sample 34) and appeared to be copy number neutral at this locus. This
was unexpected, as it was initially assumed that the cancer predisposition was inherited from the
maternal lineage due to the prevalence of early-onset breast cancer and an unaffected father
(pedigree in figure 11). Further evaluation of the family history, however, revealed a very strong
incidence of colon cancer along the paternal lineage. It was subsequently discovered that the
proband’s paternal grandfather had tested positive for an MSH2 deletion encompassing exons 3-6,
identical to the deletion detected by our array (figure 9). Following the confirmation of the
deletion in the family, the proband’s thus far unaffected brother was also tested and was found to
be positive for the deletion. Subsequent immunohistochemical analysis of the proband’s tumour
revealed that 50% of the tumour cells expressed PMS2 and MLH1 and that, while MSH6 and MSH2
were expressed in endothelial cells, they appeared to be completely lost in the tumour.
Figure 9- MSH2 copy number loss encompassing exons 3-6 (shown in red) detected on
the custom CGH array
Figure 10- MSH2 copy number loss validated via SYBR Green qPCR. Shown are the
mean (+/- SEM) copy number ratios.
Figure 11- Pedigree of a proband with a history of HNPCC in the paternal lineage and
breast cancer predisposition in the maternal lineage.
3.2.2 Large copy number gains in sample 23 likely an artifact
Validating the FOXO1 copy number gain in Sample 23 proved difficult. The only
primers of borderline-sufficient quality repeatedly demonstrated a copy number loss rather than a
gain while using both BCMA and FOXP2 as references (Figure 12). In order to evaluate the
reliability of these results, MYCN was chosen as a secondary gene of interest due to its similar
probe count and mean to the FOXO1 alteration. A reliable primer pair repeatedly demonstrated a
neutral copy number at the locus (Figure 13). This was confirmed with a second primer pair as
well (not shown). It should also be noted that none of the seven large copy number gains seen on
the array in this sample was shared with the other two family members that were also tested.
Because of this, it was decided that the abnormally high incidence of copy number gains
observed in this sample was simply an artifact caused by either the improper labelling of some
DNA fragments or improper hybridization.
Figure 12- FOXO1 copy number gain failed to validate via SYBR Green qPCR, instead
appearing as a copy number loss. Shown are the mean (+/- SEM) copy number ratios.
Figure 13- MYCN copy number gain failed to validate via SYBR Green qPCR, instead
appearing as copy number neutral. Shown are the mean (+/- SEM) copy number ratios.
3.2.3 Confirmation of custom array’s ability to detect previously unknown alterations
As the specificity of the array was in doubt, it was deemed necessary to ensure that the
array was indeed capable of identifying previously unknown alterations (both gains and losses)
that could be validated. For this, a large gain in PCDH15 (44 probes, size: ~1.625Mb, mean:
3.37) was used as well as an intermediate sized loss in EXT1 (15 probes, size: ~18kb, mean:
1.04). These were chosen due to their likelihood of being genuine based on the array analysis.
Both alterations were in fact validated via qPCR with their ideal primers (Figures 14 and 15) as
well as secondary primer pairs. The region observed to have a copy number gain in PCDH15 has
been reported to be copy number variable in the DGV. The copy number loss observed in EXT1
lies in a large intronic region. As both these alterations were deemed unlikely to be causative,
neither was pursued as a potential gene of interest.
Figure 14- PCDH15 copy number gain successfully validated via SYBR Green qPCR.
Shown are the mean (+/- SEM) copy number ratios.
Figure 15- EXT1 copy number loss successfully validated via SYBR Green qPCR.
Shown are the mean copy number ratios.
3.2.4 DICER1 is copy number neutral in the patient cohort
DICER1 was considered a gene of interest and a high priority for qPCR validation. Both
regions of interest shown by the array were analyzed via qPCR and both were shown to be copy
number neutral (Figures 16 and 17). Because all seven alterations were shown to be
neutral by qPCR, we decided that further pursuit of DICER1 as a gene of interest was
unnecessary and that the alterations shown on the array in both regions were simply artifacts.
Figure 16- The copy number losses on the 5’ end of DICER1 failed to validate via SYBR
Green qPCR, instead appearing as copy number neutral in all four samples. Shown are the mean
(+/- SEM) copy number ratios.
Figure 17- The copy number losses encompassing two exons on the 5’ end of DICER1
failed to validate via SYBR Green qPCR, instead appearing as copy number neutral in all three
samples. Shown are the mean (+/- SEM) copy number ratios.
3.2.5 PTCH1 validation
The results for PTCH1 were the most striking from the array, especially in the context of
the gene’s involvement in hereditary cancer predisposition. Our interest was primarily focused
on the thirteen samples showing a copy number loss on the far 5’ end of the gene. This loss
encompasses the first exon of some PTCH1 transcripts, while being upstream of others (Figure
18). After validating multiple primers, the best were used to evaluate the copy number status of
this locus via qPCR. If acceptable secondary primers were available for the sample of interest,
they were used to confirm the results from the primary primers. The results of the primary
primers can be seen in Figure 19. There was a considerable amount of variation in the copy
number state across the samples. Two samples (27 and 35) were shown to be copy number
neutral, with the remaining eleven samples showing copy number ratios below the threshold of
0.7. In an effort to further validate these findings, a TaqMan copy number assay was used, though
it showed a neutral copy number in all samples tested. The best usable probe, however, was 500bp
away from the common region of copy number loss and so fell within only the largest reported
losses. The TaqMan results for these six samples are shown in figure 20.
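The way replicate qPCR copy number ratios are summarised and compared against the 0.7 loss threshold can be sketched as follows. This is a minimal illustration only: the replicate values and sample names are invented, the gain cut-off of 1.3 is a hypothetical value that is not taken from this study, and the function names are assumptions introduced for the example.

```python
from math import sqrt
from statistics import mean, stdev

LOSS_THRESHOLD = 0.7   # loss cut-off quoted in the text
GAIN_THRESHOLD = 1.3   # hypothetical gain cut-off, not taken from the thesis


def summarise_ratios(ratios):
    """Return (mean, SEM) for a list of replicate copy number ratios."""
    m = mean(ratios)
    sem = stdev(ratios) / sqrt(len(ratios)) if len(ratios) > 1 else 0.0
    return m, sem


def classify(mean_ratio):
    """Label a mean copy number ratio as loss, gain or neutral."""
    if mean_ratio < LOSS_THRESHOLD:
        return "loss"
    if mean_ratio > GAIN_THRESHOLD:
        return "gain"
    return "neutral"


# Invented replicate ratios for two hypothetical samples.
replicates = {
    "sample_A": [0.52, 0.61, 0.55],
    "sample_B": [0.95, 1.04, 0.98],
}

calls = {}
for name, ratios in replicates.items():
    m, sem = summarise_ratios(ratios)
    calls[name] = classify(m)
    print(f"{name}: {m:.2f} +/- {sem:.2f} -> {calls[name]}")
```

A ratio exactly at the threshold is treated as neutral here; whether the boundary is inclusive is a design choice that the text does not specify.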
Figure 18- Thirteen copy number losses were observed on the far 5’ end of PTCH1,
shown in red.
Figure 19- Of the copy number losses encompassing the 5’ end of PTCH1, 11/13
successfully validated via SYBR Green qPCR. Shown are the mean (+/- SEM) copy number
ratios.
Figure 20- The six samples with detected copy number losses encompassing a TaqMan
probe were all revealed to be copy number neutral at the probe’s locus using a TaqMan copy
number qPCR assay. Shown are the mean (+/- SEM) copy number ratios.
A 2.6kb gain was observed in one sample (sample 5) by array analysis. The qPCR
validation, however, gave unexpected results. Though there was a considerable amount of
variation between runs, the mean copy number ratio actually showed a loss (figure 21), a result
that was replicated with a secondary pair of reliable primers.
Figure 21- The copy number gain spanning 2.6kb in PTCH1 failed to validate via SYBR
Green qPCR, instead appearing as a copy number loss. Shown are the mean (+/- SEM) copy
number ratios.
The copy number losses seen in three related samples in exon 14 of PTCH1 as well as
one copy number gain observed in the 3’ UTR were both shown to be copy number neutral via
qPCR (Figures 22 and 23).
Figure 22- The copy number loss observed in exon 14 of PTCH1 failed to validate via
SYBR Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/- SEM)
copy number ratios.
Figure 23- The copy number gain observed in the 3’UTR of PTCH1 failed to validate via
SYBR Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/- SEM)
copy number ratios.
Sequencing of PTCH1 revealed a single nucleotide transversion present in two related
samples, 19 and 18 (mother and son, pedigree shown in figure 24), that has not been previously
reported in either the Catalogue of Somatic Mutations in Cancer (COSMIC) or the NCBI SNP
database. This c.1298C>A substitution results in a serine to tyrosine change at codon 433.
PolyPhen-2, a tool for annotating coding nonsynonymous SNPs [193], predicted this substitution to
be “probably damaging” with a score of 0.984. The “Sorting Tolerant From Intolerant” (SIFT)
algorithm [194] also predicts this mutation to be “damaging” with a SIFT score of 0.01.
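As a brief check of the nomenclature, the sketch below maps coding position 1298 to its codon and shows which serine codons could give tyrosine after a C>A change at that position under the standard genetic code. Position 1298 falls in codon 433, in agreement with the Ser433Tyr designation used in the Discussion. The actual reference codon in PTCH1 is not stated in this section, so the candidate codons listed are an inference from the genetic code, and the helper names are assumptions made for illustration.

```python
def codon_position(cdna_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    codon = (cdna_pos - 1) // 3 + 1
    offset = (cdna_pos - 1) % 3 + 1
    return codon, offset


def substitute(codon, offset, alt):
    """Return the codon with the base at the 1-based offset replaced by alt."""
    return codon[:offset - 1] + alt + codon[offset:]


# Standard genetic code entries needed for this example only.
SERINE_WITH_C = ["TCT", "TCC", "TCA", "TCG"]  # serine codons with C at position 2
TYROSINE = {"TAT", "TAC"}
STOP = {"TAA", "TAG"}

codon, offset = codon_position(1298)
print(f"c.1298 lies at position {offset} of codon {codon}")

outcomes = {}
for ref in SERINE_WITH_C:
    mutant = substitute(ref, offset, "A")
    if mutant in TYROSINE:
        outcomes[ref] = "Tyr"
    elif mutant in STOP:
        outcomes[ref] = "stop"
    else:
        outcomes[ref] = "other"
    print(f"{ref} -> {mutant}: {outcomes[ref]}")
```

Only TCT and TCC yield tyrosine after the change, so a serine to tyrosine outcome implies that the reference codon is one of these two.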
Figure 24- Pedigree showing an LFS-L family with a history of syndactyly in the maternal
lineage. The family members of interest are outlined in red.
Chapter 4: Discussion and Future Directions
Many families share a severe phenotype similar to that of mutant TP53-associated Li-Fraumeni
Syndrome (LFS), yet no causative germline variants have been detected in them. In this study,
we shed some light on this wild-type TP53 population. First, we
demonstrated the utility of a candidate gene/region approach using a custom platform. With this
approach we were able to correctly identify a hereditary cancer syndrome that had initially been
classified as Li-Fraumeni-like (LFL), while identifying PTCH1 as a new candidate gene that may at
least play a role in the development of the strong cancer predisposition phenotype observed in
LFS/LFL families without detectable TP53 mutations.
4.1 Discussion on high-resolution genomic analysis in LFS
4.1.1 Utility of custom CGH arrays for detecting novel copy number alterations in the
germline
Our custom CGH array proved sensitive enough to detect all three TP53 control
deletions, the smallest of which was 2.1kb. It also detected numerous copy number alterations
that were validated with qPCR, the smallest of which was 387bp. On the other hand, our custom
array does appear to suffer from issues of specificity. Notably, all detected copy number changes
in DICER1, in two different regions, failed to validate. The full extent of mis-reported alterations
is difficult to predict, but based on the number of regions that appear to have shifts in their
reported log ratios, particularly within slides that were run at the same time, we can predict that
the issue is quite widespread. Shared copy number shifts were most apparent in samples 9-16.
The cause of these patterns is difficult to discern, but is likely a combination of technical error
and differences in handling between runs, exacerbated by the low probe quality (with regard to GC
content, secondary binding, and melting temperature as quantified by the Agilent probe score) that
was often necessary in order to achieve a high probe density in many exons. Indeed, shared log
ratio patterns were almost
always observed in segments primarily composed of custom probes in the exonic regions. Even
after the elimination of the worst quality probes, in order to achieve our desired resolution we
were forced to include a number of low quality probes on the array. In regions with many of
these probes, errors in reporting are to be expected. The very inclusive requirement of only three
probes to define a segment would also contribute to a large number of these reported copy
number alterations where likely none exist. Again, however, this was a necessary decision in order
to achieve a resolution of less than 500bp. Placing one healthy negative control sample on
each slide (one array of four) would help in the analysis, as a log ratio pattern detected in the
control could be dismissed as an artifact. Ultimately, though, the price for the high sensitivity of
the platform is its low specificity. Given the potential to reveal very small copy number changes,
this is likely acceptable for research purposes, but the significant sacrifice of specificity limits its
usefulness in clinical applications.
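The trade-off between a permissive minimum probe count and specificity can be illustrated with a deliberately simplified segment caller. This is not the algorithm used by the array analysis software. It is a sketch that assumes a hypothetical absolute log ratio cut-off of 0.25 and invented probe values, and it simply reports runs of consecutive probes that deviate in the same direction.

```python
MIN_PROBES = 3            # minimum probes per segment, as stated in the text
LOG_RATIO_CUTOFF = 0.25   # hypothetical absolute log ratio cut-off


def call_segments(log_ratios, min_probes=MIN_PROBES, cutoff=LOG_RATIO_CUTOFF):
    """Return (start_index, end_index, state) for runs of aberrant probes."""
    segments = []
    run_start, run_state = None, None
    # A trailing neutral sentinel closes any run that reaches the last probe.
    for i, value in enumerate(list(log_ratios) + [0.0]):
        if value > cutoff:
            state = "gain"
        elif value < -cutoff:
            state = "loss"
        else:
            state = None
        if state != run_state:
            if run_state is not None and i - run_start >= min_probes:
                segments.append((run_start, i - 1, run_state))
            run_start, run_state = i, state
    return segments


# Invented log ratios: a three-probe loss followed by a two-probe "gain".
probes = [0.02, -0.60, -0.70, -0.55, 0.01, 0.40, 0.50, -0.03, 0.10]

strict = call_segments(probes)                 # requires three probes
lenient = call_segments(probes, min_probes=2)  # more inclusive setting
print("min 3 probes:", strict)
print("min 2 probes:", lenient)
```

Lowering the minimum probe count lets shorter runs through, which raises the achievable resolution but also admits more short runs that may be caused by noisy probes.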
Recently, there have been a few studies utilizing custom CGH platforms with a probe
density of ~50bp [22,195]. These studies, however, tend to focus on either a specific region (or
just a few regions) of interest, or a handful of candidate genes. Our study is unique in its
ultra-high resolution across almost 450 distinct regions of the genome. It is not, however, the only
study to utilize a custom CGH array in order to identify additional LFS genes. Very recently,
Aury-Landas et al. also used a custom 4x180k Agilent platform to evaluate the copy number status of
TP53 and its surrounding regions as well as 24 candidate genes and 6 miRNAs along with the
rest of the genome, the results of which are discussed in the introduction of this thesis [57].
Notably, only ~28 000 probes on their array were dedicated to their regions of interest, compared to
the ~91 000 probes in this study. Using stringent criteria, they detected only one small (~4kb)
alteration; the remaining 16 detected CNVs not listed in the DGV were all over 82kb and none
were found in their regions of interest. The results of their study as well as ours can be used to
direct how similar approaches can be used in the future.
As discussed in the Methods chapter, placing custom probes at a very high density across
a large number of regions can lead to difficulties in analysis despite the high theoretical
resolution. This appears to have been an issue in the study of Aury-Landas et al. and certainly
was in this study. One should think carefully about the true utility of custom probes when
placing so many together at a high density. If at all possible, probes previously designed to be
placed on the platform (i.e. Agilent catalogue probes) should be used as they are more reliable
than custom probes. A combination of catalogue probes and custom probes may be ideal, and
with the large number of catalogue probes now available, it would now be possible to forgo the
use of any custom probes at all while maintaining a reasonably high probe density, albeit less
dense than the custom array used in this study.
A discussion of the utility of custom CGH arrays going forward must take into account
the place they now have in the broad arsenal of genome analysis tools. Next generation
sequencing (NGS) is of particular relevance today, as will be discussed below, and has enabled
researchers to interrogate the genome at base pair resolution. The niche of CGH arrays is that
they are relatively inexpensive and relatively easy to analyze, while giving reliable results. Any
future custom CGH array studies in this area should be designed with this new niche in mind, which
would change the design process significantly by shifting the focus from sensitivity to
specificity and reliability. This would be best achieved by discarding custom probes
altogether. By extending defined “exonic regions” to include a few hundred bases on either side
of each exon, many catalogue probes could still be placed in and around most exons. In order to
improve the cost-effectiveness of the array, the genomic probes could be eliminated as well. In
this study for example, ~75 000 probes were spaced out across the genome. While this
represented a very large proportion of the array’s total probe count, it still resulted in rather
unimpressive resolution. There are now a number of standard commercial arrays that offer very
impressive resolution throughout the genome such as Agilent’s 1M array which has a median
inter-probe distance of 2.1kb, and 1.8kb within RefSeq genes. Platforms such as these would
provide a more cost-effective means of interrogating the entire genome in relation to resolution.
The custom array should focus on the candidate genes and the regions directly surrounding them.
By eliminating the genomic probes, and relying solely on catalogue probes for the regions of
interest, the array would be small enough to be placed on something like Agilent’s 8x60k
platform, cutting costs by allowing eight arrays to be multiplexed on one slide.
4.1.2 Next Generation Sequencing
Upon the sequencing of the human genome in 2001, it was estimated that it consisted of
~3 billion base pairs containing 30,000-40,000 protein coding genes, a figure that was later
revised to only 20,000-25,000 genes [196,197]. Until recently, studies on genetic diseases have
relied on traditional Sanger sequencing to sequence a small subset of genes of interest. Next
generation sequencing (NGS), however, has revolutionized genomics by making whole genome
sequencing feasible for individual investigators in a matter of days [198,199]. In whole-genome
sequencing, genomic DNA is fragmented and then directly subjected to massively parallel
sequencing. This allows for the discovery of single nucleotide variants (SNVs) not only in the
coding regions of genes that are the focus of Sanger sequencing, but also in promoters,
enhancers, introns, non-coding RNAs and intergenic sequences across the entire genome. NGS
has also become more and more effective at detecting structural alterations including
translocations as well as copy number alterations [200,201]. Notably, balanced translocations go
undetected in CGH array analysis. NGS was also responsible for the characterization of a
catastrophic phenomenon of the cancer genome known as chromothripsis, whereby tens to
hundreds of genomic rearrangements occur during a single cellular event [202]. NGS has even been
used to map global epigenetic modifications [203]. While these high-throughput methods are prone
to errors, this is compensated for by high “read depth”, the number of times a given base is read.
Average read depths of ~40x in whole genome sequencing and >100x in targeted NGS are now
quite common and are able to provide a reliable sequencing result. The amount of data generated
by this high-throughput method is vast and places intense demands both on computer hardware
and software to enable efficient data analysis. For example, a modest average coverage of just
30x would result in 90 Gb of sequence being read across the whole genome. The time required to
analyze such vast amounts of data is significant. In 2008, the first sequencing of an entire cancer
genome was reported when acute myeloid leukemia cells were compared to matched normal
somatic DNA from the skin. The researchers discovered ten genes with acquired mutations, eight
of which had not been previously described [204]. Despite the financial cost and the significant
analytical labour required, whole genome sequencing has seen a rapid increase in use as investigators
have rushed to unlock the secrets of the cancer genome that were previously undetectable.
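The 90 Gb figure quoted above follows directly from multiplying genome size by average coverage. The short sketch below reproduces that arithmetic and, as an additional illustration, estimates the number of reads needed. The 100bp read length is a hypothetical value chosen for the example and is not taken from this study, and the function names are assumptions.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 billion base pairs, as quoted in the text
READ_LENGTH_BP = 100             # hypothetical read length, for illustration only


def sequenced_bases(target_size_bp, mean_depth):
    """Total bases that must be read to cover a target at a given mean depth."""
    return target_size_bp * mean_depth


def reads_required(target_size_bp, mean_depth, read_length_bp):
    """Number of reads needed, rounded up to a whole read."""
    total = sequenced_bases(target_size_bp, mean_depth)
    return -(-total // read_length_bp)  # integer ceiling division


wgs_gb = sequenced_bases(GENOME_SIZE_BP, 30) / 1e9
wgs_reads = reads_required(GENOME_SIZE_BP, 30, READ_LENGTH_BP)
print(f"30x whole genome: {wgs_gb:.0f} Gb of sequence, {wgs_reads:,} reads of {READ_LENGTH_BP}bp")
```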
An alternative to costly and labour-intensive whole genome sequencing is targeted or
whole-exome sequencing. Protein-coding sequences are thought to account for only ~1.3% of the
genome [205] and can thus be sequenced through NGS without the massive number of reads required for
whole genome sequencing. In whole exome sequencing, the roughly 50-62 Mb that make up the exome
are captured using chemically synthesized oligonucleotides, referred to as “baits”. These captured
fragments are then subjected to the same massively parallel sequencing used to sequence the
genome. Due to the significantly smaller size of the exome, much higher coverage of up to 100x
can be achieved while still presenting a cheaper, easier-to-manage alternative to whole genome
sequencing. While there are additional costs related to the capture element of the procedure, it
has allowed researchers to focus on variants most likely to have functional effects in a more
cost-effective manner with a higher degree of accuracy than whole genome sequencing and has led to
the discovery of numerous novel mutations in the cancer genome [206–208]. While normal exome
sequencing cannot detect fusion genes, transcriptome sequencing (RNA-seq) can be used to
sequence RNA variants, and is thus capable of detecting gene fusions. A hallmark of cancer is
tumour heterogeneity, in which a single tumour will have multiple, distinct subclonal populations
with differing genetic backgrounds [209]. This can make it quite difficult to get a good picture of the
genetic landscape in a region of interest, as a tumour sample that is being sequenced will likely
contain multiple subclonal populations as well as some healthy tissue. By utilizing baits in a
manner similar to exome sequencing, specific targets can be captured and then “deep sequenced”
to a very high degree, typically between 1000x and 10,000x. This extremely high read depth
allows for alleles with just 1% frequency to be detected reliably [205]. When Shah et al. performed
deep sequencing at a depth of 20,000x on somatic mutations detected by whole genome
sequencing in triple-negative breast cancers, the existence of multiple subclonal populations with
different mutant allele frequencies was confirmed [210]. The extreme read depth of this technique
allows investigators to sequence a handful of candidate genes with excellent accuracy when
whole exome sequencing is not desired.
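Why very high read depth is needed to detect a 1% allele can be shown with a simple binomial model. The sketch below computes the probability of observing at least a given number of variant-supporting reads at several depths. It assumes that reads are sampled independently and ignores sequencing error, and the requirement of five supporting reads is a hypothetical calling rule chosen for illustration and not a value from this study.

```python
from math import comb

MIN_SUPPORTING_READS = 5   # hypothetical calling rule, for illustration only
ALLELE_FRACTION = 0.01     # the 1% allele frequency discussed in the text


def prob_at_least(k, depth, allele_fraction):
    """P(X >= k) for X ~ Binomial(depth, allele_fraction)."""
    p, q = allele_fraction, 1.0 - allele_fraction
    below = sum(comb(depth, i) * p**i * q**(depth - i) for i in range(k))
    return 1.0 - below


for depth in (40, 100, 1000, 10000):
    prob = prob_at_least(MIN_SUPPORTING_READS, depth, ALLELE_FRACTION)
    print(f"{depth:>6}x: P(at least {MIN_SUPPORTING_READS} variant reads) = {prob:.4f}")
```

Under these assumptions, a 1% allele is very unlikely to reach five supporting reads at typical whole genome or exome depths, but almost always does so at 1000x and above.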
4.1.3 Next Generation Sequencing and Cancer Syndromes
In the field of hereditary cancer syndromes, NGS technologies will be of critical
importance in the coming years. Sequencing germline DNA of cancer-prone individuals is not
fraught with the same challenges as tumour sequencing with regards to sub-clonal populations
and the accumulation of passenger mutations. Tumour DNA however can often be paired with
“normal” somatic DNA in an effort to identify driver mutations which would be unique to the
tumour population. Whole genome sequencing of the germline DNA of cancer-prone individuals
would likely yield a very large number of alterations with unknown functional effects, owing to the
sheer amount of data generated. Because of this, the majority of NGS studies in cancer
predisposition have relied on exome or targeted sequencing rather than whole genome
sequencing. Particularly in highly penetrant syndromes, it is reasonable to expect causative
mutations to lie in the coding portion of genes, making exome sequencing attractive while being
much more efficient than whole genome sequencing. Additionally, sequencing of unaffected
family members can provide a background from which to identify alterations that segregate with
tumour formation. NGS technologies may well lead to the identification of causative genes in
TP53 wild-type LFS as well as other cancer syndromes in which causative genes have not yet
been identified.
Various NGS platforms have already been used to characterize germline mutations in
cancer-prone individuals. Hereditary pheochromocytoma (PCC) is associated with germline
mutations in one of nine known susceptibility genes, but like LFS, exhibits familial cases that are
wild-type in the associated genes. Whole exome sequencing was used to identify
non-synonymous single nucleotide substitutions in MAX present in three unrelated individuals with
PCC that were not detected in 750 healthy controls. A further analysis of 59 cases lacking
mutations in the previously known susceptibility genes revealed additional MAX mutations [211].
Germline mutations in MAX have since been estimated to account for 1.12% of PCC [212]. Whole
exome sequencing has since led to the identification of multiple variants associated with
colorectal cancer predisposition [213], BAP1 mutations associated with predisposition to renal cell
carcinoma [214], and ERCC4 mutations associated with unclassified Fanconi anemia [215]. It has also
yielded interesting results highlighting the difficulty of classifying cancer predisposition
syndromes. Mutations in FANCC and BLM, associated with Fanconi anemia and Bloom
syndrome respectively, were detected in families predisposed to breast cancer, although these
families did not demonstrate a phenotype associated with either syndrome other than cancer
predisposition [216].
While mutations in TP53 are a defining feature of LFS, whole exome sequencing has revealed that
inherited TP53 germline mutations are perhaps more common than previously thought. Whole
exome sequencing of a kindred in which five individuals had been diagnosed with leukemia
revealed a TP53 mutation at codon 306, which has been previously reported in LFS families.
This kindred, however, does not appear to be affected by any LFS-associated tumours. While the
family does not meet the classic or even Chompret LFS criteria, their phenotype is severe with
only one surviving member who has an extremely high-risk form of childhood acute
lymphoblastic leukemia [217]. The fact that the same germline TP53 mutation is associated with two
different severe cancer predisposition presentations is intriguing and points to the involvement of
other modifier genes. Whole exome sequencing has also been used to identify the presence of
LFS where other syndromes were suspected. Whole exome sequencing was performed on a
patient with gastric adenocarcinoma and a family history of colorectal cancer. The patient did not
appear to harbour mutations in genes usually associated with colorectal cancer predisposition
syndromes (CDH1, APC, MLH1, MSH2, MSH6, PMS2, PTEN, or STK11). The sequencing led to
the discovery of a mutation at codon 248 of TP53 that has been reported in many LFS families,
including families with gastric adenocarcinoma [218]. Identification of cancer predisposition
syndromes can be difficult when relying on the clinical presentation alone. NGS technologies,
which are becoming more and more common in the clinic as well as the research laboratory, can
be instrumental in properly identifying a syndrome.
A revised custom CGH array like the one previously discussed would provide quick and
reliable data on the copy number status of more than 500 genes of interest at low cost. NGS
however is becoming an increasingly attractive option for investigating TP53 wild-type LFS.
Custom selection methods like Agilent’s SureSelect could be used to sequence all the regions of
interest of this study, including UTRs, introns and promoters as well as the coding regions with
very high coverage. The cost, both in money and labour, of extensive custom probe selection
should not be underestimated. Like custom CGH probes, custom selection probes may also
suffer from reliability issues, and many would not have been previously validated. Whole exome
sequencing offers a more reliable alternative as commercial exome selection kits are widely
available. While the non-coding regions of interest would obviously be overlooked with this
method, it would include the coding regions of the entire genome. As it seems probable that
multiple variants work in concert to generate the highly variable phenotypes seen in LFS and
many other cancer syndromes, the ability to interrogate the entire exome would be immensely
helpful. While whole genome sequencing appears to offer the best of both worlds, the cost and
immense analytical challenges make it less attractive for a study such as this where an ideal
sample size in the near future would be over 100 samples.
With regards to analyzing only the copy number of hundreds of genes of interest, a
revised CGH array may well still prove to be the best option, particularly for a large sample set.
An array like this could be applied quite cheaply to a very large number of samples. Catalogue
probes at roughly 200bp spacing would provide a reliable resolution of under 1kb. Any gene of
interest with a detected alteration of this size in a coding region could be considered a candidate
gene which could then be sequenced in detail in more samples. This is in contrast to the massive
number of small variants that would be generated by NGS technology. Assessing the potential
phenotypic effect of these variants, and thus generating a small list of candidate genes, would
prove very difficult and significantly more costly.
4.2 Potential Genetic Evidence of Anticipation
Anticipation has been suggested to occur in other cancer predisposition syndromes as
well, but remains controversial. It is difficult to determine if the decrease in age of onset is
genuine or merely a result of ascertainment bias. Identification of a genetic basis for anticipation
then is vital. Perhaps the best studied example of this is myotonic dystrophy type 1 (DM1), where
expansion of an unstable CTG trinucleotide repeat is associated with more severe disease and
earlier age of onset [219]. Telomere length has been the focus of much attention in cancer
predisposition syndromes, and shortening of telomeres appears to be associated with an earlier
age of onset in LFS [38] and hereditary breast cancer [40], while this does not appear to be the
case in HNPCC [220] despite ample clinical evidence of anticipation [221].
The custom CGH array used in this study is not well suited to evaluating global copy
number differences between family members. It did however detect a number (12/14) of shared
regions of interest where the alterations appear to be expanded in the child. The expansions
remain unvalidated and are for the most part quite small, consisting of only a few probes. While
the sample size is small, only 2/14 of the shared alterations were expanded in the parent, and
even then only by one probe. Children developed tumours at an earlier age than parents in all
these pairs. The most notable example of repeated expanded alterations was seen in samples 4
and 5 which actually have the smallest difference in ages of onset (17 years vs. 13 years).
Despite this, the daughter did appear to exhibit a more severe phenotype. While her father’s
osteosarcoma was successfully treated, she unfortunately did not survive.
This evidence of anticipation is of course very slight and cannot be used to make any
broad conclusions. It is however intriguing and suggests that further studies may yield interesting
results in the search for a genetic cause of anticipation in LFS. NGS of paired LFS samples will
be able to accurately identify the accumulation of likely deleterious mutations throughout
generations if they do in fact exist.
4.3 Discovery of a Lynch Syndrome Kindred
Using the custom CGH array, a heterozygous deletion encompassing exons 3-6 of the
Lynch syndrome (HNPCC) gene MSH2 was detected in a proband and validated via
qPCR. Curiously, this deletion was not detected in the affected mother whose DNA was also
analyzed. A more complete family history revealed a high incidence of colon cancer on the
paternal side of the family though the father is as yet unaffected (pedigree shown in figure 11).
The paternal grandfather however was found later to have tested positive for the same deletion.
The proband’s brother is thus far unaffected but has been shown to also carry the deletion.
MSH2 codes for a 105kDa protein consisting of 934 amino acids and is one of the key
mismatch repair (MMR) genes. It forms two different heterodimers: MutSα (an MSH2-MSH6
heterodimer) and MutSβ (an MSH2-MSH3 heterodimer). These heterodimers bind to DNA
mismatches and initiate the DNA repair process by bending the DNA helix, shielding around 20
base pairs. Both MutS heterodimers form a complex with the MutLα heterodimer, which directs
downstream MMR events. As discussed in the introduction, mutations in MSH2 and MLH1 are
thought to be the main cause of HNPCC. While their prevalence relative to MSH6 and PMS2 has
perhaps been overestimated in the past, MSH2 and MLH1 do confer a more penetrant phenotype.
Deletions encompassing exons 3-6 have been reported 14 times in the database of the International
Society for Gastrointestinal Hereditary Tumours (InSiGHT), and deletions of MSH2 account for
26.25% of all catalogued variants while substitutions account for 68% [222].
This case highlights the importance of obtaining the most detailed family history possible
when a cancer syndrome is suspected. The early-onset glioblastoma multiforme of the proband
combined with the incidence of early onset breast cancer in the maternal lineage was certainly
suggestive of LFS. While the father was unaffected, a complete family history revealed the
evidence of a devastating colorectal cancer predisposition. Had the paternal lineage been the
focus, MSH2 testing would certainly have been indicated for the proband. The case also
demonstrates how the variable presentations of cancer syndromes can confound diagnosis. While
the proband’s immediate family history is suggestive of LFS, a mutation in an HNPCC gene appears
to be causative. Recently it has been suggested that soft tissue sarcomas, a common feature
of LFS, should also be included in the HNPCC tumour spectrum [223]. The authors conducted a
review of the literature and found eleven cases of soft tissue sarcoma in HNPCC patients. Ten of
these were found to harbour germline mutations in MMR genes (seven in MSH2 and three in
MLH1). As previously discussed, a family with a severe leukemia predisposition was found to
harbour a TP53 mutation seen in LFS [217] and a proband with a suspected colorectal cancer
syndrome was shown to have an LFS-associated TP53 mutation [218]. These are all cases in which
the clinical presentation does not agree with what would be predicted by the genetic presentation.
The identification of a causative mutation can aid greatly in the proper management of cancer
syndromes which often rely on clinical presentation and family history to make a diagnosis.
Why would a family with an LFS mutation appear to have a colorectal cancer
predisposition syndrome? Why does a proband with an HNPCC-associated mutation present
with an LFS-associated tumour? Secondary alterations likely account for much of the variability
that is observed in cancer syndromes. A low-risk allele may have a different effect in the
presence of a primary mutation in MSH2 or TP53 for example, which may exert a significant
effect on the phenotype. In the specific case in question, it seems likely that additional cancer
predisposition variants were inherited from the mother, whose own predisposition remains
unexplained. Constitutional mismatch repair-deficiency (CMMR-D) is associated with
bi-allelic mutations in the MMR genes. While predisposing to HNPCC-associated tumours, it
also predisposes to hematological malignancies as well as brain tumours. PMS2 is the most
frequently reported of the MMR genes in CMMR-D, likely due to its lower penetrance in
HNPCC relative to MSH2 and MLH1 [224]. Glioblastoma is the most common brain tumour
observed in CMMR-D, with a median age at diagnosis of 8 years, which was also the age at
diagnosis of the proband [224]. Due to the potential for bi-allelic MMR involvement, the proband
and mother were
subjected to an MMR gene panel assessing MLH1, MSH2, MSH6 and EPCAM but there was no
indication of bi-allelic involvement in these genes. Notably, PMS2 was not included in this
panel. The custom CGH array detected a 3.9 kb copy number gain in PMS2 in the proband’s
mother. However, this gain does not appear to have been inherited by the proband. Still, it is possible that this
amplification, if genuine, is coupled with a small alteration on the other allele that could have
been inherited. While this would be extraordinary if it were the case, the clinical presentation of
the proband warrants a thorough investigation of any potential MMR involvement. This
proband-mother pair is an excellent candidate for whole-exome sequencing, which may lead to
the identification of shared alterations that may be responsible for the distinct, severe phenotype
observed in the proband.
4.4 PTCH1 Associated with the LFS phenotype
4.4.1 Deletions in PTCH1 isoforms associated with the LFS phenotype
The custom CGH array detected a number of alterations in PTCH1. Of these, the only
ones to be successfully validated via qPCR were 11 of the 13 copy number losses at the 5’ end of
the gene. This result, however, is in doubt, as a TaqMan copy number assay 500 bp downstream of
the common region of copy number loss, which included detected regions of loss in six samples,
revealed the probe locus to be copy number neutral in all samples. Accurate copy number
detection for this small region has proven difficult, but further validation is certainly necessary to
assess the true nature of these detected losses. The gene was then sequenced in all samples in which a
PTCH1 alteration was detected, leading to the discovery of a novel Ser433Tyr substitution in two
family members.
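The qPCR validation discussed above depends on how a copy number is inferred from cycle thresholds. As a minimal sketch, assuming the comparative Ct (2^-ΔΔCt) calculation with a diploid calibrator and roughly 100% amplification efficiency, a single-copy loss would appear as follows. The analysis actually used in this study is not restated here, and the function names, tolerance, and Ct values are hypothetical.

```python
def estimate_copy_number(ct_target, ct_reference,
                         calibrator_ct_target, calibrator_ct_reference,
                         calibrator_copies=2):
    """Estimate target copy number with the comparative Ct method.

    Assumes ~100% amplification efficiency for both assays and a
    calibrator sample of known copy number (diploid by default).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = calibrator_ct_target - calibrator_ct_reference
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


def call_copy_state(copy_number, tolerance=0.5):
    """Label an estimate as loss, neutral, or gain (illustrative tolerance)."""
    if copy_number < 2 - tolerance:
        return "loss"
    if copy_number > 2 + tolerance:
        return "gain"
    return "neutral"


# Hypothetical Ct values, for illustration only.
sample = estimate_copy_number(26.0, 25.0, 25.0, 25.0)   # target lags by one cycle
control = estimate_copy_number(25.0, 25.0, 25.0, 25.0)  # no shift from calibrator
print(sample, call_copy_state(sample))
print(control, call_copy_state(control))
```

Under these assumptions, a one-cycle delay of the target relative to the reference corresponds to half the calibrator's copy number, so modest Ct variability can blur the line between a heterozygous loss and a neutral call.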
PTCH1 encodes a 161 kDa transmembrane protein consisting of 1447 amino acids which
form 12 transmembrane-spanning domains and 2 large extracellular loops. The gene itself
contains 23 coding exons and spans ~73 kb. PTCH1 is the ligand-binding component of the
hedgehog (Hh) receptor complex. In the absence of the Hh ligand, PTCH1 maintains another
transmembrane protein, Smoothened (SMO), in an inactive state. Upon binding of the Hh ligand
to the extracellular loops of PTCH1, SMO is released and transduces the signal to a SUFU-GLI
complex in the cell’s cytoplasm which results in the activation of GLI transcription factors. The
transcription of PTCH1 itself is induced by the activity of the Hh pathway, creating a negative
feedback loop²²⁵.
As discussed in the introduction, mutations in PTCH1 are associated with nevoid basal
cell carcinoma syndrome (NBCCS, or Gorlin syndrome). In NBCCS, various tumours and
hamartomas exhibit loss of heterozygosity. Loss of heterozygosity has been observed in almost
90% of hereditary basal cell carcinomas (BCCs)²²⁶. Lindstroem et al. reviewed 132 germline
mutations of PTCH1. Of these, 73% were nonsense mutations, the majority of which are due to
small insertions and deletions, in contrast to sporadic BCCs where missense mutations make up
the majority of mutations. The nonsense mutations are concentrated in the two large extracellular loops
while missense mutations are primarily located in the transmembrane domains²²⁷. There is
increasing evidence that large deletions of PTCH1, including deletions that completely envelop the
gene, play a larger role in NBCCS than previously thought. The increasing use of arrays has
led to the identification of many such cases in families with NBCCS in whom sequencing failed
to detect any mutations²²⁸,²²⁹. Mice heterozygous for PTCH1 mutations are often affected by
rhabdomyosarcoma, a common LFS tumour²³⁰. Interestingly, Kappler et al. showed that
rhabdomyosarcomas caused by mutations of either PTCH1 or TP53 in mice have distinct gene
expression profiles and biological features²³¹.
Small copy number losses on the 5’ end of PTCH1 have yet to be reported in NBCCS.
The 5’ structure of the human PTCH1 gene was unclear until the discovery that there exist at
least five exons that are alternatively used as the first exon of the transcript. This results in at least
three different protein isoforms: L, M, and S²³²⁻²³⁴. The exact breakpoints of the detected
deletions remain unclear, but they affect the 1a transcript, which codes for the first exon of the L and M
isoforms. According to the array, some deletions appear to include the transcriptional start sites, with
all the others only one probe away. These isoforms are differentially regulated both temporally
and spatially, and the shorter isoform S appears to be less stable than the others²³². While small
deletions of the L and M isoforms have not been previously reported in NBCCS, nonsense
mutations involving only the L and M isoforms have been seen in a few cases²³⁵⁻²³⁷. Recently,
Suzuki et al. reported a nonsense mutation at codon 129 that was shown to eliminate translation
of functional PTCHM while allowing for the translation of PTCHS. Patient cells would thus be
expected to produce half the amount of longer isoforms (PTCHM and PTCHL) while producing
normal, or perhaps even slightly higher than normal, levels of the PTCHS isoform. This suggests
that NBCCS is caused by the haploinsufficiency of PTCHL and PTCHM, with the PTCHS
isoform unable to compensate for their function²³⁵. The role of PTCHS, then, is unclear; however,
as it is the more ubiquitously expressed and less stable isoform, it may have a role in situations where
transient expression and rapid degradation of PTCH1 are necessary²³². Cell lines are available for
a few of the samples that appear to have these 5’ deletions. In the future, the state of these three
PTCH1 isoforms should be assessed to see if these deletions have a similar effect on PTCHL and
PTCHM. In the few samples with tumour DNA available, PTCH1 status should also be assessed
to see if the loss of heterozygosity often observed in NBCCS is also seen in these tumours.
4.4.2 A Novel Variant in an LFS-L Family Affected by Syndactyly
The detection of a Ser433Tyr substitution by Sanger sequencing was an unexpected
discovery, as this variant has not been reported in any case of NBCCS or in the NCBI SNP (dbSNP), 1000
Genomes, or NHLBI exome databases, or in the Catalogue of Somatic Mutations in Cancer (COSMIC).
Because of this, the functional relevance of this variant is hard to predict, although PolyPhen-2, a
tool for annotating coding nonsynonymous SNPs¹⁹³, predicted this substitution to be “probably
damaging” with a score of 0.984. The “Sorting Tolerant From Intolerant” (SIFT) algorithm¹⁹⁴
also predicts this mutation to be “damaging” with a SIFT score of 0.01. The substitution affects a highly
conserved codon at the very end of the first extracellular loop, a domain frequently affected by
nonsense mutations in NBCCS. This domain is critical for Hh binding and has also been shown to be
important in a SMO-independent mechanism of GLI1 inhibition²³⁸. In NBCCS, missense mutations
appear to be concentrated in the transmembrane domains²²⁷.
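Because the two predictors score in opposite directions (a higher PolyPhen-2 score and a lower SIFT score both indicate a more damaging prediction), the reported values are easy to misread. The sketch below is illustrative only. The SIFT cutoff of 0.05 is the conventional one, whereas the PolyPhen-2 cutoffs are assumed placeholder values passed in by the caller, since they depend on the classifier model and are not stated in this thesis. The function names are hypothetical.

```python
SIFT_CUTOFF = 0.05  # conventional SIFT threshold: scores <= 0.05 are "damaging"


def sift_call(score, cutoff=SIFT_CUTOFF):
    """Low SIFT scores indicate a substitution that is poorly tolerated."""
    return "damaging" if score <= cutoff else "tolerated"


def polyphen2_call(score, probably_cutoff, possibly_cutoff):
    """High PolyPhen-2 scores indicate damage; cutoffs are supplied by the caller."""
    if score >= probably_cutoff:
        return "probably damaging"
    if score >= possibly_cutoff:
        return "possibly damaging"
    return "benign"


# Scores reported for Ser433Tyr in the text.
# The PolyPhen-2 cutoffs below are assumed placeholders, not official values.
print(sift_call(0.01))
print(polyphen2_call(0.984, probably_cutoff=0.9, possibly_cutoff=0.5))
```

Both calls agree for this variant under these assumptions, but agreement between in silico predictors is not functional evidence.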
The variant was discovered in two family members (mother and son) of three tested on
the array. While both children have developed cancer (including one without this mutation), the
mother has not. The mother, however, is affected by syndactyly, as is the son who shares the novel
variant (pedigree shown in figure 24). While syndactyly is not considered a hallmark of NBCCS,
it has been observed⁶, as has polydactyly²³⁹. This family presents a challenging case: while
both children are affected by LFS tumours at an early age, as far as we are aware, neither parent
is affected and neither lineage has a particularly strong history of cancer. There is a strong
history of syndactyly in the maternal lineage which appears to be inherited in an autosomal
dominant fashion, but this does not completely segregate with the cancer phenotype. Aside from
this mutation at codon 433, all three family members also appear to harbour the previously
discussed deletion at the 5’ end of the PTCH1 gene. As this substitution does not result in a
truncated protein, it is understandable that it would not confer the usual NBCCS
phenotype. The question, however, is whether it exerts a cancer-predisposing effect, especially in
the presence of other mutations in the gene, such as the 5’ deletions in PTCH1, or with mutations
in other genes, perhaps other tumour suppressors. PTCH2 is highly homologous to PTCH1;
PTCH2 mutations have been reported in very few cases of NBCCS and appear to confer a milder phenotype¹⁴,¹⁵.
PTCH2-deficient mice, unlike PTCH1-deficient mice, have no obvious defects and do not appear
to be predisposed to cancer. However, loss of PTCH2 has a marked effect on tumour formation
in combination with PTCH1 haploinsufficiency. These mice exhibit a higher incidence of
tumours and a broader spectrum of tumour types compared to mice deficient in PTCH1 alone²⁴¹.
This provides an interesting example of a mutation that does not appear to be important until
placed in the context of another lesion. It is possible that the mutation at codon 433 may have a
similar effect.
In any case, the other members of the family who are affected by syndactyly should be
considered for PTCH1 sequencing to confirm that the mutation does in fact segregate with the
syndactyly phenotype, as this could be done quickly and easily. Assessing this mutation’s
functional effect on cancer predisposition, however, is more difficult. Chung and Bunz recently
sought to answer a similar question about a PTCH1 variant detected in colorectal cancer. This
variant, P681L, is a missense mutation located in the intracellular loop of PTCH1. Expression of
exogenous SMO resulted in robust activation of a GLI-responsive luciferase reporter construct.
This could be suppressed by the expression of wildtype PTCH1, but not the variant²⁴². The
ability of the S433Y variant to inhibit GLI activity could be assessed in the same manner.
4.5 Management of Cancer Predisposition Syndromes
Management of cancer predisposition syndromes generally relies on a routine
surveillance protocol in order to detect and treat tumours as early as possible. Villani et al.
assessed the feasibility and potential clinical effect of a comprehensive surveillance protocol
using frequent biochemical and imaging studies in TP53 mutation carriers. They found that the
3-year overall survival was 100% in the surveillance group and only 21% in the non-surveillance
group²⁴³. This demonstrates the clinical impact that accurate diagnosis of a cancer syndrome
followed by robust surveillance can have. Drug treatment is currently not used to treat
asymptomatic LFS patients, but chemoprevention in NBCCS is being explored. The SMO
inhibitor vismodegib received FDA approval for BCC treatment in 2012. Shortly after this,
Tang et al. published the results of a randomized, double-blind, placebo-controlled trial of
vismodegib in NBCCS patients. The authors found that the per-patient rate of new surgically
eligible BCCs was 29 in the placebo group compared with 2 in the vismodegib group. Existing clinically
significant BCCs also shrank significantly relative to those in the placebo group. While
the drug appears to hold much promise, it is associated with many adverse effects that caused a
number of patients to discontinue treatment²⁴⁴. Furthermore, there is evidence that continuous
vismodegib therapy can lead to acquired resistance to the drug²⁴⁵, and it appears that BCCs will
rapidly rebound upon cessation of vismodegib treatment²⁴⁶. Nonetheless, vismodegib not only
holds promise as a useful tool in the management of NBCCS, but also provides an example of
how the identification of causative germline mutations can lead to novel therapies for cancer
predisposition syndromes.
4.6 Concluding Remarks
Our study aimed to identify candidate genes that may play a role in LFS and LFS-L.
Using a custom CGH array, we were able to create a large list of potentially significant alterations
in genes of interest, many of which were validated by qPCR. These include alterations in MSH2
and PTCH1, two genes previously implicated in cancer predisposition syndromes.
The functional relevance of these newly identified alterations, as well as those
which are yet to be validated, remains unknown. New technologies, particularly NGS, should
allow for the highly accurate identification of both sequence and structural alterations in
subsequent studies. With the increasing numbers of LFS and LFS-L samples available that can
be analyzed both in our lab and through collaboration with labs around the world, the genes of
interest identified in this study can be interrogated with these new technologies in order to better
understand their potential role in the development of the LFS phenotype. Exome sequencing also
holds the potential of identifying smaller alterations that would go undetected by a CGH array
and may reside in genes not thought to be a high priority. Of particular interest in any future
study would be other genes in the Hh pathway such as SMO and SUFU. The identification of
causative mutations is crucial to the proper treatment of cancer predisposition syndromes such as
LFS. Knowledge of associated genes is necessary for the proper identification of carriers, which
in turn enables highly successful surveillance programs that significantly increase survival, quality
genetic counseling, and a better quality of life.
References
1. Li, F. P. & Fraumeni, J. F. Soft-tissue sarcomas, breast cancer, and other neoplasms. A
familial syndrome? Ann. Intern. Med. 71, 747–52 (1969).
2. Li, F. P. et al. A cancer family syndrome in twenty-four kindreds. Cancer Res. 48, 5358–
62 (1988).
3. Nichols, K. E., Malkin, D., Garber, J. E., Fraumeni, J. F. & Li, F. P. Germ-line p53
mutations predispose to a wide spectrum of early-onset cancers. Cancer Epidemiol.
Biomarkers Prev. 10, 83–7 (2001).
4. Petitjean, A. et al. Impact of mutant p53 functional properties on TP53 mutation patterns
and tumor phenotype: lessons from recent developments in the IARC TP53 database.
Hum. Mutat. 28, 622–9 (2007).
5. Tinat, J. et al. 2009 version of the Chompret criteria for Li Fraumeni syndrome. J. Clin.
Oncol. 27, e108–9; author reply e110 (2009).
6. Hisada, M., Garber, J. E., Fung, C. Y., Joseph, F. & Li, F. P. Multiple Primary Cancers in.
90, (1998).
7. Marees, T. et al. Risk of second malignancies in survivors of retinoblastoma: more than
40 years of follow-up. J. Natl. Cancer Inst. 100, 1771–9 (2008).
8. Wu, C.-C., Shete, S., Amos, C. I. & Strong, L. C. Joint effects of germ-line p53 mutation
and sex on cancer risk in Li-Fraumeni syndrome. Cancer Res. 66, 8287–8292 (2006).
9. Hwang, S.-J., Lozano, G., Amos, C. I. & Strong, L. C. Germline p53 Mutations in a
Cohort with Childhood Sarcoma: Sex Differences in Cancer Risk. Am. J. Hum. Genet. 72,
975–983 (2003).
10. Nigro, J. M. et al. Mutations in the p53 gene occur in diverse human tumour types. Nature
342, 705–708 (1989).
11. Lavigueur, A. et al. High incidence of lung, bone, and lymphoid tumors in transgenic
mice overexpressing mutant alleles of the p53 oncogene. Mol. Cell. Biol. 9, 3982–3991
(1989).
12. Olivier, M. et al. Li-Fraumeni and Related Syndromes: Correlation between Tumor Type,
Family Structure, and TP53 Genotype. 6643–6650 (2003).
13. Varley, J. M. Germline TP53 mutations and Li-Fraumeni syndrome. Hum. Mutat. 21,
313–20 (2003).
14. Lalloo, F. et al. Prediction of pathogenic mutations in patients with early-onset breast
cancer by family history. Lancet 361, 1101–1102 (2003).
15. Gonzalez, K. D. et al. High frequency of de novo mutations in Li-Fraumeni syndrome. J.
Med. Genet. 46, 689–93 (2009).
16. Olivier, M., Hollstein, M. & Hainaut, P. TP53 mutations in human cancers: origins,
consequences, and clinical use. Cold Spring Harb. Perspect. Biol. 2, a001008 (2010).
17. Hussain, S. P. & Harris, C. C. Molecular Epidemiology of Human Cancer: Contribution
of Mutation Spectra Studies of Tumor Suppressor Genes. 4023–4037 (1998).
18. Fujimoto, A. et al. Whole-genome sequencing of liver cancers identifies etiological
influences on mutation patterns and recurrent mutations in chromatin regulators. Nat.
Genet. 44, 760–4 (2012).
19. Rotter, V., Boss, M. A. & Baltimore, D. Increased concentration of an apparently identical
cellular protein in cells transformed by either Abelson murine leukemia virus or other
transforming agents. J. Virol. 38, 336–46 (1981).
20. Wolf, D., Harris, N. & Rotter, V. Reconstitution of p53 expression in a nonproducer Ab-
MuLV-transformed cell line by transfection of a functional p53 gene. Cell 38, 119–126
(1984).
21. Olive, K. P. et al. Mutant p53 gain of function in two mouse models of Li-Fraumeni
syndrome. Cell 119, 847–60 (2004).
22. Shlien, A. et al. A common molecular mechanism underlies two phenotypically distinct
17p13.1 microdeletion syndromes. Am. J. Hum. Genet. 87, 631–42 (2010).
23. Milner, J. O., Medcalf, E. A. & Cook, A. C. p53 Complexes. 11, 12–19 (1991).
24. De Vries, A. et al. Targeted point mutations of p53 lead to dominant-negative inhibition
of wild-type p53 function. Proc. Natl. Acad. Sci. U. S. A. 99, 2948–53 (2002).
25. Lang, G. A. et al. Gain of function of a p53 hot spot mutation in a mouse model of Li-
Fraumeni syndrome. Cell 119, 861–72 (2004).
26. Dittmer, D. et al. Gain of function mutations in p53. Nat. Genet. 4, 42–6
27. Kadouri, L. et al. A single-nucleotide polymorphism in the RAD51 gene modifies breast
cancer risk in BRCA2 carriers, but not in BRCA1 carriers or noncarriers. Br. J. Cancer
90, 2002–5 (2004).
28. Easton, D. F., Ponder, M. A., Huson, S. M. & Ponder, B. A. An analysis of variation in
expression of neurofibromatosis (NF) type 1 (NF1): evidence for modifying genes. Am. J.
Hum. Genet. 53, 305–13 (1993).
29. Shlien, A. et al. Excessive genomic DNA copy number variation in the Li-Fraumeni
cancer predisposition syndrome. Proc. Natl. Acad. Sci. U. S. A. 105, 11264–9 (2008).
30. Silva, A. G., Achatz, I. M. W., Krepischi, A. C., Pearson, P. L. & Rosenberg, C. Number
of rare germline CNVs and TP53 mutation types. Orphanet J. Rare Dis. 7, 101 (2012).
31. Dumont, P., Leu, J. I.-J., Della Pietra, A. C., George, D. L. & Murphy, M. The codon 72
polymorphic variants of p53 have markedly different apoptotic potential. Nat. Genet. 33,
357–65 (2003).
32. Bond, G. L. et al. A single nucleotide polymorphism in the MDM2 promoter attenuates
the p53 tumor suppressor pathway and accelerates tumor formation in humans. Cell 119,
591–602 (2004).
33. Bougeard, G. et al. Impact of the MDM2 SNP309 and p53 Arg72Pro polymorphism on
age of tumour onset in Li-Fraumeni syndrome. J. Med. Genet. 43, 531–3 (2006).
34. Wu, C.-C. et al. Joint effects of germ-line TP53 mutation, MDM2 SNP309, and gender on
cancer risk in family studies of Li-Fraumeni syndrome. Hum. Genet. 129, 663–73 (2011).
35. Marcel, V. et al. TP53 PIN3 and MDM2 SNP309 polymorphisms as genetic modifiers in
the Li-Fraumeni syndrome: impact on age at first diagnosis. J. Med. Genet. 46, 766–72
(2009).
36. Fang, S. et al. Sex-specific effect of the TP53 PIN3 polymorphism on cancer risk in a
cohort study of TP53 germline mutation carriers. Hum. Genet. 130, 789–94 (2011).
37. Trkova, M., Hladikova, M., Kasal, P., Goetz, P. & Sedlacek, Z. Is there anticipation in the
age at onset of cancer in families with Li-Fraumeni syndrome? J. Hum. Genet. 47, 381–6
(2002).
38. Tabori, U., Nanda, S., Druker, H., Lees, J. & Malkin, D. Younger age of cancer initiation
is associated with shorter telomere length in Li-Fraumeni syndrome. Cancer Res. 67,
1415–8 (2007).
39. Trkova, M., Prochazkova, K., Krutilkova, V., Sumerauer, D. & Sedlacek, Z. Telomere
length in peripheral blood cells of germline TP53 mutation carriers is shorter than that of
normal individuals of corresponding age. Cancer 110, 694–702 (2007).
40. Martinez-Delgado, B. et al. Genetic anticipation is associated with telomere shortening in
hereditary breast cancer. PLoS Genet. 7, e1002182 (2011).
41. Bougeard, G. et al. Detection of 11 germline inactivating TP53 mutations and absence of
TP63 and HCHK2 mutations in 17 French families with Li-Fraumeni or Li-Fraumeni-like
syndrome. J. Med. Genet. 38, 253–7 (2001).
42. Stone, J. G. et al. Analysis of Li–Fraumeni syndrome and Li–Fraumeni-like families for
germline mutations in Bcl10. Cancer Lett. 147, 181–185 (1999).
43. Barlow, J. W. et al. Germ Line BAX Alterations Are Infrequent in Li-Fraumeni Syndrome.
1403–1406 (2004).
44. Burt, E. C. et al. Exclusion of the genes CDKN2 and PTEN as causative gene defects in
Li-Fraumeni syndrome. Br. J. Cancer 80, 9–10 (1999).
45. Portwine, C., Lees, J., Verselis, S., Li, F. P. & Malkin, D. Absence of germline p16
INK4a alterations in p53 wild type Li-Fraumeni syndrome families. J. Med. Genet. 37,
e13 (2000).
46. Brown, L. T., Sexsmith, E. & Malkin, D. Identification of a novel PTEN intronic deletion
in Li-Fraumeni syndrome and its effect on RNA processing. Cancer Genet. Cytogenet.
123, 65–8 (2000).
47. Vahteristo, P. et al. p53, CHK2, and CHK1 Genes in Finnish Families with Li-Fraumeni
Syndrome: Further Evidence of CHK2 in Inherited Cancer Predisposition. 5718–5722 (2001).
48. Bell, D. W. Heterozygous Germ Line hCHK2 Mutations in Li-Fraumeni Syndrome.
Science 286, 2528–2531 (1999).
49. Sodha, N. et al. Screening hCHK2 for Mutations. Science 289, 359 (2000).
50. Meijers-Heijboer, H. et al. Low-penetrance susceptibility to breast cancer due to
CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat. Genet. 31, 55–9
(2002).
51. Vahteristo, P. et al. A CHEK2 genetic variant contributing to a substantial fraction of
familial breast cancer. Am. J. Hum. Genet. 71, 432–8 (2002).
52. Cybulski, C. et al. CHEK2 is a multiorgan cancer susceptibility gene. Am. J. Hum. Genet.
75, 1131–5 (2004).
53. Xiang, H., Geng, X., Ge, W. & Li, H. Meta-analysis of CHEK2 1100delC variant and
colorectal cancer susceptibility. Eur. J. Cancer 47, 2546–2551 (2011).
54. Cybulski, C. et al. Constitutional CHEK2 mutations are associated with a decreased risk
of lung and laryngeal cancers. Carcinogenesis 29, 762–5 (2008).
55. Cybulski, C. et al. A personalised approach to prostate cancer screening based on
genotyping of risk founder alleles. Br. J. Cancer 108, 2601–9 (2013).
56. Bachinski, L. L. et al. Genetic Mapping of a Third Li-Fraumeni Syndrome Predisposition
Locus to Human Chromosome 1q23. 427–431 (2005).
57. Aury-Landas, J. et al. Germline copy number variation of genes involved in chromatin
remodelling in families suggestive of Li-Fraumeni syndrome with brain tumours. Eur. J.
Hum. Genet. 1–8 (2013). doi:10.1038/ejhg.2013.68
58. Chan, T. L. et al. Heritable germline epimutation of MSH2 in a family with hereditary
nonpolyposis colorectal cancer. Nat. Genet. 38, 1178–83 (2006).
59. Suter, C. M., Martin, D. I. K. & Ward, R. L. Germline epimutation of MLH1 in
individuals with multiple cancers. Nat. Genet. 36, 497–501 (2004).
60. Morak, M. et al. Further evidence for heritability of an epimutation in one of 12 cases
with MLH1 promoter methylation in blood cells clinically displaying HNPCC. Eur. J.
Hum. Genet. 16, 804–11 (2008).
61. Attwooll, C. L. et al. Identification of a rare polymorphism in the human TP53 promoter.
Cancer Genet. Cytogenet. 135, 165–72 (2002).
62. Amatya, V. J., Naumann, U., Weller, M. & Ohgaki, H. TP53 promoter methylation in
human gliomas. Acta Neuropathol. 110, 178–84 (2005).
63. Kang, J. H. et al. Methylation in the p53 Promoter Is a Supplementary Route to Breast
Carcinogenesis: Correlation between CpG Methylation in the p53 Promoter and the
Mutation of the p53 Gene in the Progression from Ductal Carcinoma In Situ to Invasive
Ductal Carcinoma. 81, 573–579 (2001).
64. Pogribny, I. P. & James, S. J. Reduction of p53 gene expression in human primary
hepatocellular carcinoma is associated with promoter region methylation without coding
region mutation. Cancer Lett. 176, 169–74 (2002).
65. Agirre, X. et al. Methylation of CpG dinucleotides and/or CCWGG motifs at the promoter
of TP53 correlates with decreased gene expression in a subset of acute lymphoblastic
leukemia patients. Oncogene 22, 1070–2 (2003).
66. Finkova, A. et al. The TP53 gene promoter is not methylated in families suggestive of Li-
Fraumeni syndrome with no germline TP53 mutations. Cancer Genet. Cytogenet. 193,
63–6 (2009).
67. Garber, J. E. & Offit, K. Hereditary cancer predisposition syndromes. J. Clin. Oncol. 23,
276–92 (2005).
68. Knapke, S., Nagarajan, R., Correll, J., Kent, D. & Burns, K. Hereditary cancer risk
assessment in a pediatric oncology follow-up clinic. Pediatr. Blood Cancer 58, 85–9
(2012).
69. Cazier, J. & Tomlinson, I. General lessons from large-scale studies to identify human
cancer predisposition genes. 255–262 (2010). doi:10.1002/path
70. Knudson, A. G. Mutation and cancer: statistical study of retinoblastoma. Proc. Natl. Acad.
Sci. U. S. A. 68, 820–3 (1971).
71. Williams, V. C. et al. Neurofibromatosis type 1 revisited. Pediatrics 123, 124–33 (2009).
72. Goldstein, A. M. et al. Clinical findings in two African-American families with the nevoid
basal cell carcinoma syndrome (NBCC). Am. J. Med. Genet. 50, 272–81 (1994).
73. Lo Muzio, L. et al. Nevoid basal cell carcinoma syndrome. Clinical findings in 37 Italian
affected individuals. Clin. Genet. 55, 34–40 (1999).
74. Lo Muzio, L. Nevoid basal cell carcinoma syndrome (Gorlin syndrome). Orphanet J. Rare
Dis. 3, 32 (2008).
75. Cowan, R. et al. The gene for the naevoid basal cell carcinoma syndrome acts as a
tumour-suppressor gene in medulloblastoma. Br. J. Cancer 76, 141–5 (1997).
76. O’Malley, S., Weitman, D., Olding, M. & Sekhar, L. Multiple neoplasms following
craniospinal irradiation for medulloblastoma in a patient with nevoid basal cell carcinoma
syndrome. Case report. J. Neurosurg. 86, 286–8 (1997).
77. Mariateresa Mancuso, S. P. Basal cell carcinoma and its development: insights from
radiation-induced tumors in Ptch1-deficient mice. Cancer Res. 64, 934–41 (2004).
78. Marin-Gutzke, M. et al. Basal Cell Carcinoma in Childhood After Radiation Therapy.
Ann. Plast. Surg. 53, 593–595 (2004).
79. Kimonis, V. E. et al. Clinical manifestations in 105 persons with nevoid basal cell
carcinoma syndrome. Am. J. Med. Genet. 69, 299–308 (1997).
80. Pastorino, L. et al. Identification of a SUFU germline mutation in a family with Gorlin
syndrome. Am. J. Med. Genet. A 149A, 1539–43 (2009).
81. Smyth, I. Isolation and characterization of human patched 2 (PTCH2), a putative tumour
suppressor gene in basal cell carcinoma and medulloblastoma on chromosome 1p32. Hum.
Mol. Genet. 8, 291–297 (1999).
82. Fan, Z. et al. A missense mutation in PTCH2 underlies dominantly inherited NBCCS in a
Chinese family. J. Med. Genet. 45, 303–308 (2008).
83. Fujii, K. et al. Frameshift mutation in the PTCH2 gene can cause nevoid basal cell
carcinoma syndrome. Fam. Cancer 12, 611–614 (2013).
84. Slade, I. et al. Heterogeneity of familial medulloblastoma and contribution of germline
PTCH1 and SUFU mutations to sporadic medulloblastoma. Fam. Cancer 10, 337–342
(2010).
85. Ottensmeier, H. et al. Treatment of Early Childhood Medulloblastoma by Postoperative
Chemotherapy Alone. 978–986 (2005).
86. Struewing, J. P. et al. The risk of cancer associated with specific mutations of BRCA1 and
BRCA2 among Ashkenazi Jews. N. Engl. J. Med. 336, 1401–8 (1997).
87. Warthin, A. S. Heredity with reference to carcinoma. Arch. Intern. Med. XII, 546 (1913).
88. Lynch, H. T. & Krush, A. J. Cancer family “G” revisited: 1895-1970. Cancer 27, 1505–11
(1971).
89. Quehenberger, F., Vasen, H. F. A. & van Houwelingen, H. C. Risk of colorectal and
endometrial cancer for carriers of mutations of the hMLH1 and hMSH2 gene: correction
for ascertainment. J. Med. Genet. 42, 491–6 (2005).
90. Jenkins, M. A. et al. Cancer risks for mismatch repair gene mutation carriers: a
population-based early onset case-family study. Clin. Gastroenterol. Hepatol. 4, 489–98
(2006).
91. Stoffel, E. et al. Calculation of risk of colorectal and endometrial cancer among patients
with Lynch syndrome. Gastroenterology 137, 1621–7 (2009).
92. Lynch, H. T. & de la Chapelle, A. Hereditary colorectal cancer. N. Engl. J. Med. 348,
919–32 (2003).
93. Hampel, H. et al. Screening for the Lynch syndrome (hereditary nonpolyposis colorectal
cancer). N. Engl. J. Med. 352, 1851–60 (2005).
94. Aaltonen, L. A. et al. Replication errors in benign and malignant tumors from hereditary
nonpolyposis colorectal cancer patients. Cancer Res. 54, 1645–8 (1994).
95. Kovacs, M. E., Papp, J., Szentirmay, Z., Otto, S. & Olah, E. Deletions removing the last
exon of TACSTD1 constitute a distinct class of mutations predisposing to Lynch
syndrome. Hum. Mutat. 30, 197–203 (2009).
96. Ligtenberg, M. J. L. et al. Heritable somatic methylation and inactivation of MSH2 in
families with Lynch syndrome due to deletion of the 3’ exons of TACSTD1. Nat. Genet.
41, 112–7 (2009).
97. Niessen, R. C. et al. Germline hypermethylation of MLH1 and EPCAM deletions are a
frequent cause of Lynch syndrome. Genes. Chromosomes Cancer 48, 737–44 (2009).
98. Kempers, M. J. E. et al. Risk of colorectal and endometrial cancers in EPCAM deletion-
positive Lynch syndrome: a cohort study. Lancet Oncol. 12, 49–55 (2011).
99. Gazzoli, I., Loda, M., Garber, J., Syngal, S. & Kolodner, R. D. A hereditary nonpolyposis
colorectal carcinoma case associated with hypermethylation of the MLH1 gene in normal
tissue and loss of heterozygosity of the unmethylated allele in the resulting microsatellite
instability-high tumor. Cancer Res. 62, 3925–8 (2002).
100. Järvinen, H. J. et al. Controlled 15-year trial on screening for colorectal cancer in families
with hereditary nonpolyposis colorectal cancer. Gastroenterology 118, 829–834 (2000).
101. Vasen, H. F. A. et al. One to 2-year surveillance intervals reduce risk of colorectal cancer
in families with Lynch syndrome. Gastroenterology 138, 2300–6 (2010).
102. Vasen, H. F., Mecklin, J. P., Khan, P. M. & Lynch, H. T. The International Collaborative
Group on Hereditary Non-Polyposis Colorectal Cancer (ICG-HNPCC). Dis. Colon
Rectum 34, 424–5 (1991).
103. Vasen, H. F., Watson, P., Mecklin, J. P. & Lynch, H. T. New clinical criteria for
hereditary nonpolyposis colorectal cancer (HNPCC, Lynch syndrome) proposed by the
International Collaborative group on HNPCC. Gastroenterology 116, 1453–6 (1999).
104. Kastrinos, F. & Stoffel, E. M. The History, Genetics, and Strategies for Cancer Prevention
in Lynch Syndrome. Clin. Gastroenterol. Hepatol. (2013). doi:10.1016/j.cgh.2013.06.031
105. Syngal, S., Fox, E. A., Eng, C., Kolodner, R. D. & Garber, J. E. Sensitivity and specificity
of clinical criteria for hereditary non-polyposis colorectal cancer associated mutations in
MSH2 and MLH1. J. Med. Genet. 37, 641–5 (2000).
106. Hampel, H. Point: justification for Lynch syndrome screening among all patients with
newly diagnosed colorectal cancer. J. Natl. Compr. Canc. Netw. 8, 597–601 (2010).
107. Baker, S. J. et al. Chromosome 17 deletions and p53 gene mutations in colorectal
carcinomas. Science 244, 217–221 (1989).
108. Reisman, D., Greenberg, M. & Rotter, V. Human p53 oncogene contains one promoter
upstream of exon 1 and a second, stronger promoter within intron 1. Proc. Natl. Acad. Sci.
U. S. A. 85, 5146–50 (1988).
109. Reich, N. C. & Levine, A. J. Growth regulation of a cellular tumour antigen, p53, in
nontransformed cells. Nature 308, 199–201
110. Mosner, J. et al. Negative feedback regulation of wild-type p53 biosynthesis. EMBO J. 14,
4442–9 (1995).
111. Boggs, K. & Reisman, D. Increased p53 transcription prior to DNA synthesis is regulated
through a novel regulatory element within the p53 promoter. Oncogene 25, 555–65
(2006).
112. Reisman, D., Takahashi, P., Polson, A. & Boggs, K. Transcriptional Regulation of the p53
Tumor Suppressor Gene in S-Phase of the Cell-Cycle and the Cellular Response to DNA
Damage. Biochem. Res. Int. 2012, 808934 (2012).
113. Schroeder, M. & Mass, M. J. CpG methylation inactivates the transcriptional activity of
the promoter of the human p53 tumor suppressor gene. Biochem. Biophys. Res. Commun.
235, 403–6 (1997).
114. Le, M. T. N. et al. MicroRNA-125b is a novel negative regulator of p53. Genes Dev. 23,
862–76 (2009).
115. Zhang, Y. et al. MicroRNA 125a and its regulation of the p53 tumor suppressor gene.
FEBS Lett. 583, 3725–30 (2009).
116. Hünten, S., Siemens, H. & Kaller, M. MicroRNA Cancer Regulation. 774, 77–101
(Springer Netherlands, 2013).
117. Nishida, N. et al. MicroRNA miR-125b is a prognostic marker in human colorectal
cancer. Int. J. Oncol. 38, 1437–43 (2011).
118. Saldaña-Meyer, R. & Recillas-Targa, F. Transcriptional and epigenetic regulation of the
p53 tumor suppressor gene. Epigenetics 6, 1068–77 (2011).
119. Mahmoudi, S. et al. Wrap53, a natural p53 antisense transcript required for p53 induction
upon DNA damage. Mol. Cell 33, 462–71 (2009).
120. Jones, S. N., Roe, A. E., Donehower, L. A. & Bradley, A. Rescue of embryonic lethality
in Mdm2-deficient mice by absence of p53. Nature 378, 206–8 (1995).
121. Montes de Oca Luna, R., Wagner, D. S. & Lozano, G. Rescue of early embryonic lethality
in mdm2-deficient mice by deletion of p53. Nature 378, 203–6 (1995).
122. Kubbutat, M. H., Ludwig, R. L., Ashcroft, M. & Vousden, K. H. Regulation of Mdm2-
directed degradation by the C terminus of p53. Mol. Cell. Biol. 18, 5690–8 (1998).
123. Rodriguez, M. S., Desterro, J. M., Lain, S., Lane, D. P. & Hay, R. T. Multiple C-terminal
lysine residues target p53 for ubiquitin-proteasome-mediated degradation. Mol. Cell. Biol.
20, 8458–67 (2000).
124. Ohkubo, S., Tanaka, T., Taya, Y., Kitazato, K. & Prives, C. Excess HDM2 impacts cell
cycle and apoptosis and has a selective effect on p53-dependent transcription. J. Biol.
Chem. 281, 16943–50 (2006).
125. Itahana, K. et al. Targeted inactivation of Mdm2 RING finger E3 ubiquitin ligase activity
in the mouse reveals mechanistic insights into p53 regulation. Cancer Cell 12, 355–66
(2007).
126. Stott, F. J. et al. The alternative product from the human CDKN2A locus, p14(ARF),
participates in a regulatory feedback loop with p53 and MDM2. EMBO J. 17, 5001–14
(1998).
127. Midgley, C. A. et al. An N-terminal p14ARF peptide blocks Mdm2-dependent
ubiquitination in vitro and can activate p53 in vivo. Oncogene 19, 2312–23 (2000).
128. Nobori, T. et al. Deletions of the cyclin-dependent kinase-4 inhibitor gene in multiple
human cancers. Nature 368, 753–6 (1994).
129. Weber, J. D. et al. Cooperative signals governing ARF-mdm2 interaction and nucleolar
localization of the complex. Mol. Cell. Biol. 20, 2517–28 (2000).
130. Appella, E. & Anderson, C. W. Post-translational modifications and activation of p53 by
genotoxic stresses. Eur. J. Biochem. 268, 2764–72 (2001).
131. Ashcroft, M., Kubbutat, M. H. & Vousden, K. H. Regulation of p53 function and stability
by phosphorylation. Mol. Cell. Biol. 19, 1751–8 (1999).
132. Shieh, S. Y., Ikeda, M., Taya, Y. & Prives, C. DNA damage-induced phosphorylation of
p53 alleviates inhibition by MDM2. Cell 91, 325–34 (1997).
133. Shieh, S. Y., Ahn, J., Tamai, K., Taya, Y. & Prives, C. The human homologs of
checkpoint kinases Chk1 and Cds1 (Chk2) phosphorylate p53 at multiple DNA damage-
inducible sites. Genes Dev. 14, 289–300 (2000).
134. Chehab, N. H., Malikzay, A., Stavridi, E. S. & Halazonetis, T. D. Phosphorylation of Ser-
20 mediates stabilization of human p53 in response to DNA damage. Proc. Natl. Acad.
Sci. U. S. A. 96, 13777–82 (1999).
135. Unger, T. et al. Critical role for Ser20 of human p53 in the negative regulation of p53 by
Mdm2. EMBO J. 18, 1805–14 (1999).
136. Tibbetts, R. S. et al. A role for ATR in the DNA damage-induced phosphorylation of p53.
Genes Dev. 13, 152–7 (1999).
137. Canman, C. E. et al. Activation of the ATM kinase by ionizing radiation and
phosphorylation of p53. Science 281, 1677–9 (1998).
138. Lu, X. & Lane, D. P. Differential induction of transcriptionally active p53 following UV
or ionizing radiation: defects in chromosome instability syndromes? Cell 75, 765–78
(1993).
139. el-Deiry, W. S., Kern, S. E., Pietenpol, J. A., Kinzler, K. W. & Vogelstein, B. Definition
of a consensus binding site for p53. Nat. Genet. 1, 45–9 (1992).
140. Cawley, S. et al. Unbiased mapping of transcription factor binding sites along human
chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116,
499–509 (2004).
141. Wei, C.-L. et al. A global map of p53 transcription-factor binding sites in the human
genome. Cell 124, 207–19 (2006).
142. O’Connor, P. M. et al. Characterization of the p53 tumor suppressor pathway in cell lines
of the National Cancer Institute anticancer drug screen and correlations with the growth-
inhibitory potency of 123 anticancer agents. Cancer Res. 57, 4285–300 (1997).
143. Kastan, M. B. et al. A mammalian cell cycle checkpoint pathway utilizing p53 and
GADD45 is defective in ataxia-telangiectasia. Cell 71, 587–97 (1992).
144. el-Deiry, W. S. et al. WAF1/CIP1 is induced in p53-mediated G1 arrest and apoptosis.
Cancer Res. 54, 1169–74 (1994).
145. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–
674 (2011).
146. Middleton, G., Cox, S. W., Korsmeyer, S. & Davies, A. M. Differences in bcl-2- and bax-
independent function in regulating apoptosis in sensory neuron populations. Eur. J.
Neurosci. 12, 819–27 (2000).
147. Shi, Y. Mechanisms of caspase activation and inhibition during apoptosis. Mol. Cell 9,
459–70 (2002).
148. Bennett, M. et al. Cell surface trafficking of Fas: a rapid mechanism of p53-mediated
apoptosis. Science 282, 290–3 (1998).
149. Adams, J. M. Ways of dying: multiple pathways to apoptosis. Genes Dev. 17, 2481–95
(2003).
150. Miyashita, T. & Reed, J. C. Tumor suppressor p53 is a direct transcriptional activator of
the human bax gene. Cell 80, 293–9 (1995).
151. Oda, E. et al. Noxa, a BH3-only member of the Bcl-2 family and candidate mediator of
p53-induced apoptosis. Science 288, 1053–8 (2000).
152. Nakano, K. & Vousden, K. H. PUMA, a novel proapoptotic gene, is induced by p53. Mol.
Cell 7, 683–94 (2001).
153. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat.
Rev. Genet. 7, 85–97 (2006).
154. Gault, J. et al. Comparison of polymorphisms in the alpha7 nicotinic receptor gene and its
partial duplication in schizophrenic and control subjects. Am. J. Med. Genet. B.
Neuropsychiatr. Genet. 123B, 39–49 (2003).
155. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science
305, 525–8 (2004).
156. Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36,
949–51 (2004).
157. Macdonald, J. R., Ziman, R., Yuen, R. K. C., Feuk, L. & Scherer, S. W. The Database of
Genomic Variants: a curated collection of structural variation in the human genome.
Nucleic Acids Res. 42, D986–92 (2014).
158. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human
genome. Nature 464, 704–12 (2010).
159. Hastings, P. J., Lupski, J. R., Rosenberg, S. M. & Ira, G. Mechanisms of change in gene
copy number. Nat. Rev. Genet. 10, 551–64 (2009).
160. Lovett, S. T., Hurley, R. L., Sutera, V. A., Aubuchon, R. H. & Lebedeva, M. A. Crossing
over between regions of limited homology in Escherichia coli. RecA-dependent and
RecA-independent pathways. Genetics 160, 851–9 (2002).
161. Liskay, R. M., Letsou, A. & Stachelek, J. L. Homology requirement for efficient gene
conversion between duplicated chromosomal sequences in mammalian cells. Genetics
115, 161–7 (1987).
162. Reiter, L. T. et al. Human meiotic recombination products revealed by sequencing a
hotspot for homologous strand exchange in multiple HNPP deletion patients. Am. J. Hum.
Genet. 62, 1023–33 (1998).
163. Neale, M. J. & Keeney, S. Clarifying the mechanics of DNA strand exchange in meiotic
recombination. Nature 442, 153–8 (2006).
164. Sonoda, E. et al. Rad51-deficient vertebrate cells accumulate chromosomal breaks prior to
cell death. EMBO J. 17, 598–608 (1998).
165. Pentao, L., Wise, C. A., Chinault, A. C., Patel, P. I. & Lupski, J. R. Charcot-Marie-Tooth
type 1A duplication appears to arise from recombination at repeat sequences flanking the
1.5 Mb monomer unit. Nat. Genet. 2, 292–300 (1992).
166. Chen, K. S. et al. Homologous recombination of a flanking repeat gene cluster is a
mechanism for a common contiguous gene deletion syndrome. Nat. Genet. 17, 154–63
(1997).
167. Llorente, B., Smith, C. E. & Symington, L. S. Break-induced replication: what is it and
what is it for? Cell Cycle 7, 859–64 (2008).
168. Lieber, M. R. The mechanism of double-strand DNA break repair by the nonhomologous
DNA end-joining pathway. Annu. Rev. Biochem. 79, 181–211 (2010).
169. Toffolatti, L. et al. Investigating the mechanism of chromosomal deletion:
characterization of 39 deletion breakpoints in introns 47 and 48 of the human dystrophin
gene. Genomics 80, 523–30 (2002).
170. Lee, J. A., Carvalho, C. M. B. & Lupski, J. R. A DNA replication mechanism for
generating nonrecurrent rearrangements associated with genomic disorders. Cell 131,
1235–47 (2007).
171. Van Binsbergen, E. Origins and breakpoint analyses of copy number variations: up close
and personal. Cytogenet. Genome Res. 135, 271–6 (2011).
172. Bruder, C. E. G. et al. Phenotypically concordant and discordant monozygotic twins
display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82, 763–71
(2008).
173. Stranger, B. E. et al. Relative impact of nucleotide and copy number variation on gene
expression phenotypes. Science 315, 848–53 (2007).
174. Inoue, K. & Lupski, J. R. Molecular mechanisms for genomic disorders. Annu. Rev.
Genomics Hum. Genet. 3, 199–242 (2002).
175. Merla, G. et al. Submicroscopic deletion in patients with Williams-Beuren syndrome
influences expression levels of the nonhemizygous flanking genes. Am. J. Hum. Genet. 79,
332–41 (2006).
176. Howard, R. O., Breg, W. R., Albert, D. M. & Lesser, R. L. Retinoblastoma and
chromosome abnormality. Partial deletion of the long arm of chromosome 13. Arch.
Ophthalmol. 92, 490–3 (1974).
177. Orye, E., Delbeke, M. J. & Vandenabeele, B. Retinoblastoma and long arm deletion of
chromosome 13. Attempts to define the deleted segment. Clin. Genet. 5, 457–64 (1974).
178. Krepischi, A. C. V., Pearson, P. L. & Rosenberg, C. Germline copy number variations and
cancer predisposition. Future Oncol. 8, 441–50 (2012).
179. Venkatachalam, R. et al. Identification of candidate predisposing copy number variants in
familial and early-onset colorectal cancer patients. Int. J. Cancer 129, 1635–42 (2011).
180. Krepischi, A. C. et al. Germline DNA copy number variation in familial and early-onset
breast cancer. Breast Cancer Res. 14, R24 (2012).
181. Yang, X. R. et al. Duplication of CXC chemokine genes on chromosome 4q13 in a
melanoma-prone family. Pigment Cell Melanoma Res. 25, 243–7 (2012).
182. Cho, H.-J. et al. Glutathione-S-transferase genotypes influence the risk of chemotherapy-
related toxicities and prognosis in Korean patients with diffuse large B-cell lymphoma.
Cancer Genet. Cytogenet. 198, 40–6 (2010).
183. Gamazon, E. R., Huang, R. S., Dolan, M. E. & Cox, N. J. Copy number polymorphisms
and anticancer pharmacogenomics. Genome Biol. 12, R46 (2011).
184. Thomas, G. et al. A multistage genome-wide association study in breast cancer identifies
two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). Nat. Genet. 41, 579–84 (2009).
185. Hodgson, S. V., Fagg, N. L., Talbot, I. C. & Wilkinson, M. Deletions of the entire APC
gene are associated with sessile colonic adenomas. J. Med. Genet. 31, 426 (1994).
186. Lucito, R. et al. Copy-number variants in patients with a strong family history of
pancreatic cancer. Cancer Biol. Ther. 6, 1592–9 (2007).
187. Balentien, E., Mufson, B. E., Shattuck, R. L., Derynck, R. & Richmond, A. Effects of
MGSA/GRO alpha on melanocyte transformation. Oncogene 6, 1115–24 (1991).
188. Wang, J. M., Taraboletti, G., Matsushima, K., Van Damme, J. & Mantovani, A. Induction
of haptotactic migration of melanoma cells by neutrophil activating protein/interleukin-8.
Biochem. Biophys. Res. Commun. 169, 165–70 (1990).
189. Yoshihara, K. et al. Germline Copy Number Variations in BRCA1-Associated Ovarian
Cancer Patients. 177, 167–177 (2011).
190. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–83 (2004).
191. Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: Identifying a common
protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316–23
(2009).
192. Malanga, D. et al. Functional characterization of a rare germline mutation in the gene
encoding the cyclin-dependent kinase inhibitor p27Kip1 (CDKN1B) in a Spanish patient
with multiple endocrine neoplasia-like phenotype. Eur. J. Endocrinol. 166, 551–60
(2012).
193. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations.
Nat. Methods 7, 248–9 (2010).
194. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–81 (2009).
195. Vasson, A. et al. Custom oligonucleotide array-based CGH: a reliable diagnostic tool for
detection of exonic copy-number changes in multiple targeted genes. Eur. J. Hum. Genet.
21, 977–87 (2013).
196. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409,
860–921 (2001).
197. International Human Genome Sequencing Consortium. Finishing the euchromatic
sequence of the human genome. Nature 431, 931–45 (2004).
198. Mardis, E. R. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum.
Genet. 9, 387–402 (2008).
199. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145
(2008).
200. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using
genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–9 (2008).
201. Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast
cancer genomes. Nature 462, 1005–10 (2009).
202. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic
event during cancer development. Cell 144, 27–40 (2011).
203. Lister, R. & Ecker, J. R. Finding the fifth base: genome-wide sequencing of cytosine
methylation. Genome Res. 19, 959–66 (2009).
204. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia
genome. Nature 456, 66–72 (2008).
205. Yoshida, K., Sanada, M. & Ogawa, S. Deep sequencing in cancer research. Jpn. J. Clin.
Oncol. 43, 110–5 (2013).
206. Tiacci, E. et al. BRAF mutations in hairy-cell leukemia. N. Engl. J. Med. 364, 2305–15
(2011).
207. Pasqualucci, L. et al. Inactivating mutations of acetyltransferase genes in B-cell
lymphoma. Nature 471, 189–95 (2011).
208. Agrawal, N. et al. Exome sequencing of head and neck squamous cell carcinoma reveals
inactivating mutations in NOTCH1. Science 333, 1154–7 (2011).
209. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by
multiregion sequencing. N. Engl. J. Med. 366, 883–92 (2012).
210. Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative
breast cancers. Nature 486, 395–9 (2012).
211. Comino-Méndez, I. et al. Exome sequencing identifies MAX mutations as a cause of
hereditary pheochromocytoma. Nat. Genet. 43, 663–7 (2011).
212. Burnichon, N. et al. MAX mutations cause hereditary and sporadic pheochromocytoma
and paraganglioma. Clin. Cancer Res. 18, 2828–37 (2012).
213. DeRycke, M. S. et al. Identification of Novel Variants in Colorectal Cancer Families by
High-Throughput Exome Sequencing. Cancer Epidemiol. Biomarkers Prev. 22, 1239–
1251 (2013).
214. Popova, T. et al. Germline BAP1 mutations predispose to renal cell carcinomas. Am. J.
Hum. Genet. 92, 974–80 (2013).
215. Bogliolo, M. et al. Mutations in ERCC4, Encoding the DNA-Repair Endonuclease XPF,
Cause Fanconi Anemia. Am. J. Hum. Genet. 92, 800–806 (2013).
216. Thompson, E. R. et al. Exome sequencing identifies rare deleterious mutations in DNA
repair genes FANCC and BLM as potential breast cancer susceptibility alleles. PLoS
Genet. 8, e1002894 (2012).
217. Powell, B. C. et al. Identification of TP53 as an acute lymphocytic leukemia susceptibility
gene through exome sequencing. Pediatr. Blood Cancer 60, E1–3 (2013).
218. Chang, V. Y., Federman, N., Martinez-Agosto, J., Tatishchev, S. F. & Nelson, S. F.
Whole exome sequencing of pediatric gastric adenocarcinoma reveals an atypical
presentation of Li-Fraumeni syndrome. Pediatr. Blood Cancer 60, 570–4 (2013).
219. Turner, C. & Hilton-Jones, D. The myotonic dystrophies: diagnosis and management. J.
Neurol. Neurosurg. Psychiatry 81, 358–67 (2010).
220. Seguí, N. et al. Telomere length and genetic anticipation in Lynch syndrome. PLoS One 8,
e61286 (2013).
221. Bozzao, C., Lastella, P. & Stella, A. Anticipation in Lynch syndrome: where we are where
we go. Curr. Genomics 12, 451–65 (2011).
222. Plazzer, J. P. et al. The InSiGHT database: utilizing 100 years of insights into Lynch
syndrome. Fam. Cancer 12, 175–80 (2013).
223. Urso, E. et al. Soft tissue sarcoma and the hereditary non-polyposis colorectal cancer
(HNPCC) syndrome: formulation of an hypothesis. Mol. Biol. Rep. 39, 9307–10 (2012).
224. Wimmer, K. & Etzler, J. Constitutional mismatch repair-deficiency syndrome: have we so
far seen only the tip of an iceberg? Hum. Genet. 124, 105–22 (2008).
225. Berman, D. M. et al. Medulloblastoma growth inhibition by hedgehog pathway blockade.
Science 297, 1559–61 (2002).
226. Lo Muzio, L. Nevoid basal cell carcinoma syndrome (Gorlin syndrome). Orphanet J. Rare
Dis. 3, 32 (2008).
227. Lindström, E., Shimokawa, T., Toftgård, R. & Zaphiropoulos, P. G. PTCH mutations:
distribution and analyses. Hum. Mutat. 27, 215–9 (2006).
228. Takahashi, C. et al. Germline PTCH1 mutations in Japanese basal cell nevus syndrome
patients. J. Hum. Genet. 54, 403–8 (2009).
229. Nagao, K. et al. Entire PTCH1 deletion is a common event in point mutation-negative
cases with nevoid basal cell carcinoma syndrome in Japan. Clin. Genet. 79, 196–8 (2011).
230. Calzada-Wack, J. et al. Unbalanced overexpression of the mutant allele in murine Patched
mutants. Carcinogenesis 23, 727–33 (2002).
231. Kappler, R. et al. Profiling the molecular difference between Patched- and p53-dependent
rhabdomyosarcoma. Oncogene 23, 8785–95 (2004).
232. Nagao, K. et al. Identification and characterization of multiple isoforms of a murine and
human tumor suppressor, patched, having distinct first exons. Genomics 85, 462–71
(2005).
233. Kogerman, P. et al. Alternative first exons of PTCH1 are differentially regulated in vivo
and may confer different functions to the PTCH1 protein. Oncogene 21, 6007–16 (2002).
234. Shimokawa, T., Rahnama, F. & Zaphiropoulos, P. G. A novel first exon of the Patched1
gene is upregulated by Hedgehog signaling resulting in a protein with pathway inhibitory
functions. FEBS Lett. 578, 157–62 (2004).
235. Suzuki, M. et al. Selective haploinsufficiency of longer isoforms of PTCH1 protein can
cause nevoid basal cell carcinoma syndrome. J. Hum. Genet. 57, 422–6 (2012).
236. Minami, M. et al. Germline mutations of the PTCH gene in Japanese patients with nevoid
basal cell carcinoma syndrome. J. Dermatol. Sci. 27, 21–6 (2001).
237. Savino, M. et al. Spectrum of PTCH mutations in Italian nevoid basal cell-carcinoma
syndrome patients: identification of thirteen novel alleles. Hum. Mutat. 24, 441 (2004).
238. Rahnama, F. et al. Inhibition of GLI1 gene activation by Patched1. Biochem. J. 394, 19–
26 (2006).
239. Valdivielso-Ramos, M. et al. Novel mutation in the PTCH1 gene in a patient with Gorlin
syndrome with prominent clinical features. Clin. Exp. Dermatol. 39, 406–7 (2014).
240. Barreto, D. C., Gomez, R. S., Bale, A. E., Boson, W. L. & De Marco, L. PTCH gene
mutations in odontogenic keratocysts. J. Dent. Res. 79, 1418–22 (2000).
241. Lee, Y. et al. Patched2 modulates tumorigenesis in patched1 heterozygous mice. Cancer
Res. 66, 6964–71 (2006).
242. Chung, J. H. & Bunz, F. A loss-of-function mutation in PTCH1 suggests a role for
autocrine hedgehog signaling in colorectal tumorigenesis. Oncotarget 4, 2208–11 (2013).
243. Villani, A. et al. Biochemical and imaging surveillance in germline TP53 mutation
carriers with Li-Fraumeni syndrome: a prospective observational study. Lancet Oncol. 12,
559–67 (2011).
244. Tang, J. Y. et al. Inhibiting the hedgehog pathway in patients with the basal-cell nevus
syndrome. N. Engl. J. Med. 366, 2180–8 (2012).
245. Chang, A. L. S., Atwood, S. X., Tartar, D. M. & Oro, A. E. Surgical excision after
neoadjuvant therapy with vismodegib for a locally advanced basal cell carcinoma and
resistant basal carcinomas in Gorlin syndrome. JAMA Dermatol. 149, 639–41 (2013).
246. Wolfe, C. M., Green, W. H., Cognetta, A. B. & Hatfield, H. K. Basal cell carcinoma
rebound after cessation of vismodegib in a nevoid basal cell carcinoma syndrome patient.
Dermatol. Surg. 38, 1863–6 (2012).
Supplementary Information
Supplementary Materials and Methods
Sanger Genes
Gene Location
ABL1 chr9:133589268-133763060
ABL2 chr1:179068463-179198819
ACSL3 chr2:223725732-223808118
AF15Q14 chr15:40886447-40954881
AF1Q chr1:151032151-151040972
AF3p21 chr3:48711280-48723334
AF5q31 chr5:132211072-132299354
AKAP9 chr7:91570189-91739986
AKT1 chr14:105235689-105262080
AKT2 chr19:40736225-40791265
ALK chr2:29415641-30144432
ALO17 chr17:78234667-78370085
APC chr5:112043218-112181935
ARHGEF12 chr11:120207946-120360645
ARHH chr4:40192613-40245992
ARNT chr1:150782186-150849186
ASPSCR1 chr17:79935426-79975280
ASXL1 chr20:30946147-31027121
ATF1 chr12:51157819-51214906
ATIC chr2:216176679-216214499
ATM chr11:108093559-108239826
BCL10 chr1:85731461-85742587
BCL11A chr2:60678303-60780633
BCL11B chr14:99635627-99737822
BCL2 chr18:60790579-60986613
BCL3 chr19:45251978-45263300
BCL6 chr3:187439165-187463513
BCL7A chr12:122459861-122499948
BCL9 chr1:147013182-147098013
BCR chr22:23522552-23660223
BHD chr17:17115529-17140502
BIRC3 chr11:102188194-102208464
BLM chr15:91260579-91358684
BMPR1A chr10:88516396-88684944
BRAF chr7:140433815-140624564
BRCA1 chr17:41196313-41276132
BRCA2 chr13:32889617-32973809
BRD3 chr9:136895454-136933141
BRD4 chr19:15348302-15391262
BRIP1 chr17:59759985-59940755
BTG1 chr12:92534054-92539673
BUB1B chr15:40453210-40513335
C12orf9 chr12:66500965-66502496
CANT1 chr17:76987799-77005899
CARD11 chr7:2945769-3083579
CARS chr11:3022160-3078671
CBFA2T1 chr8:92971152-93075191
CBFA2T3 chr16:88941267-89043401
CBFB chr16:67063050-67134956
CBL chr11:119076990-119178858
CBLB chr3:105377110-105587887
CBLC chr19:45281126-45303902
CCND1 chr11:69455873-69469241
CCND2 chr12:4382902-4414521
CCND3 chr6:41902672-42016610
CD74 chr5:149781201-149792332
CD79A chr19:42381190-42385438
CD79B chr17:62006098-62009704
CDH1 chr16:68771195-68869444
CDH11 chr16:64980685-65155919
CDK4 chr12:58142054-58146078
CDK6 chr7:92234237-92465941
CDKN2A-p16(INK4a) chr9:21967752-21994490
CDKN2A-p14ARF chr9:21967752-21994491
CDKN2C chr1:51435642-51440307
CDX2 chr13:28536315-28543423
CEBPA chr19:33790847-33793390
CEP1 chr9:123850574-123939886
CHCHD7 chr8:57124315-57131174
CHEK2 chr22:29083731-29137822
CHIC2 chr4:54875958-54930788
CHN1 chr2:175664042-175870170
CIC chr19:42788817-42799949
CLTC chr17:57697050-57774317
CLTCL1 chr22:19166989-19279239
CMKOR1 chr2:237478380-237490992
COL1A1 chr17:48261459-48279000
COPEB chr10:3818189-3827473
COX6C chr8:100890223-100906242
CREB1 chr2:208394616-208470282
CREB3L2 chr7:137559727-137686846
CREBBP chr16:3775058-3930121
CRLF2 chrX:1314887-1331530
CRTC3 chr15:91073198-91188576
CTNNB1 chr3:41240942-41281939
CYLD chr16:50775961-50835846
D10S170 chr10:61548522-61666818
DDB2 chr11:47236493-47260769
DDIT3 chr12:57910373-57914300
DDX10 chr11:108535860-108811646
DDX5 chr17:62494374-62502484
DDX6 chr11:118618473-118661972
DEK chr6:18224400-18264799
DICER1 chr14:95552565-95623759
DUX4 chr4:191005267-191006883
EGFR chr7:55086725-55224642
EIF4A2 chr3:186501256-186507877
ELF4 chrX:129198896-129244688
ELK4 chr1:205585235-205602000
ELKS chr12:1100404-1605099
ELL chr19:18553475-18632937
ELN chr7:73442427-73484234
EML4 chr2:42396490-42559686
EP300 chr22:41488614-41576080
EPS15 chr1:51819935-51985036
ERBB2 chr17:37844393-37884914
ERCC2 chr19:45854649-45873845
ERCC3 chr2:128014866-128051752
ERCC4 chr16:14014014-14046205
ERCC5 chr13:103498174-103528347
ERG chr21:39751952-40033704
ETV1 chr7:13930858-14031050
ETV4 chr17:41605212-41623762
ETV5 chr3:185764111-185826878
ETV6 chr12:11802788-12048323
EVI1 chr3:168801287-168865522
EWSR1 chr22:29663998-29696514
EXT1 chr8:118811602-119124058
EXT2 chr11:44117099-44266979
EZH2 chr7:148504475-148581414
FACL6 chr5:131289152-131347349
FANCA chr16:89803959-89883065
FANCC chr9:97861338-98079991
FANCD2 chr3:10068113-10143614
FANCE chr6:35420138-35434881
FANCF chr11:22644079-22647387
FANCG chr9:35073835-35080013
FBXW7 chr4:153242411-153456172
FCGR2B chr1:161632905-161648442
FEV chr2:219845809-219850379
FGFR1 chr8:38268657-38326352
FGFR1OP chr6:167412816-167454065
FGFR2 chr10:123237845-123357972
FGFR3 chr4:1795039-1810599
FH chr1:241660857-241683085
FIP1L1 chr4:54243820-54326102
FLI1 chr11:128562389-128683161
FLT3 chr13:28577412-28674729
FNBP1 chr9:132649466-132805473
FOXL2 chr3:138663067-138665982
FOXO1A chr13:41129803-41240734
FOXO3A chr6:108881026-109005971
FOXP1 chr3:71004737-71633140
FSTL3 chr19:676389-683392
FUS chr16:31191431-31206190
FVT1 chr18:60994972-61034506
GAS7 chr17:9813926-10101868
GATA1 chrX:48644982-48652715
GATA2 chr3:128198265-128212030
GATA3 chr10:8096667-8117162
GMPS chr3:155588325-155655518
GNAQ chr9:80335200-80646192
GNAS chr20:57414795-57486249
GOLGA5 chr14:93260650-93306304
GOPC chr6:117881435-117923705
GPC3 chrX:132669776-133119673
GPHN chr14:66974125-67648523
GRAF chr5:142150292-142608571
HCMOGT-1 chr17:19990335-20218065
HEAB chr11:57425216-57429336
HEI10 chr14:20779530-20797536
HERPUD1 chr16:56965748-56977793
HIP1 chr7:75163409-75368279
HIST1H4I chr6:27107088-27107457
HLF chr17:53342321-53402426
HLXB9 chr7:156797547-156803347
HMGA1 chr6:34204577-34214007
HMGA2 chr12:66218240-66360068
HNRNPA2B1 chr7:26229557-26240413
HOOK3 chr8:42752033-42885681
HOXA11 chr7:27220777-27224835
HOXA13 chr7:27236499-27239725
HOXA9 chr7:27202058-27205149
HOXC11 chr12:54366910-54370201
HOXC13 chr12:54332576-54340327
HOXD11 chr2:176972084-176974314
HOXD13 chr2:176957532-176960666
HRAS chr11:532243-535550
HRPT2 chr1:193091088-193223940
HSPCA chr14:102547076-102606086
HSPCB chr6:44214849-44221614
IDH1 chr2:209100954-209119806
IDH2 chr15:90627214-90645708
IGH@ chr14:106053226-106054732
IGK@ chr2:89156507-89165894
IGL@ chr22:22516610-22517078
IKZF1 chr7:50344378-50472796
IL2 chr4:123372630-123377650
IL21R chr16:27413483-27463362
IL6ST chr5:55230938-55290821
IRF4 chr6:391752-411442
IRTA1 chr1:157543540-157567870
ITK chr5:156607907-156682109
JAK1 chr1:65298906-65432187
JAK2 chr9:4985245-5128182
JAK3 chr19:17935595-17958841
JAZF1 chr7:27870196-28220437
JUN chr1:59246464-59249785
KDM5A chr12:389223-498620
KDM5C chrX:53220504-53254604
KDM6A chrX:44732423-44971843
KDR chr4:55944427-55991762
KIAA1549 chr7:138516129-138666064
KIT chr4:55524095-55606879
KLK2 chr19:51376689-51383822
KRAS chr12:25358180-25403854
KTN1 chr14:56046925-56151301
LAF4 chr2:100163718-100722045
LASP1 chr17:37026112-37078022
LCK chr1:32716840-32751765
LCP1 chr13:46700059-46756459
LCX chr10:70320117-70454238
LHFP chr13:39917030-40177356
LIFR chr5:38475065-38595507
LMO1 chr11:8245857-8285406
LMO2 chr11:33880125-33913836
LPP chr3:187871663-188608459
LYL1 chr19:13209848-13213681
MADH4 chr18:48556583-48611409
MAF chr16:79627746-79634622
MAFB chr20:39314519-39317876
MALT1 chr18:56338618-56417370
MAML2 chr11:95711440-96076344
MAP2K4 chr17:11924135-12047050
MDM2 chr12:69201971-69239211
MDM4 chr1:204485511-204527247
MDS1 chr3:168801287-169381563
MDS2 chr1:23953824-23967056
MECT1 chr19:18794425-18893142
MEN1 chr11:64570996-64578766
MET chr7:116312459-116438439
MHC2TA chr16:10971055-11018839
MITF chr3:69788586-70017488
MKL1 chr22:40806292-41032690
MLF1 chr3:158288953-158324252
MLH1 chr3:37035268-37092335
MLL chr11:118307205-118395934
MLLT1 chr19:6210393-6279959
MLLT10 chr10:21823102-22032555
MLLT2 chr4:87856154-88062205
MLLT3 chr9:20344968-20622514
MLLT4 chr6:168227671-168365792
MLLT6 chr17:36861873-36886055
MLLT7 chrX:70315999-70323384
MN1 chr22:28144266-28197486
MPL chr1:43803475-43820134
MSF chr17:75277492-75496676
MSH2 chr2:47630263-47710360
MSH6 chr2:48010221-48034084
MSI2 chr17:55333931-55757299
MSN chrX:64887511-64961792
MTCP1 chrX:154292309-154299547
MUC1 chr1:155158302-155162700
MUTYH chr1:45794915-45806142
MYB chr6:135502453-135540311
MYC chr8:128748315-128753678
MYCL1 chr1:40361096-40367687
MYCN chr2:16080683-16087128
MYH11 chr16:15796994-15950887
MYH9 chr22:36677324-36784063
MYST4 chr10:76586379-76792639
NACA chr12:57106211-57119326
NBS1 chr8:90945564-90996899
NCOA1 chr2:24807346-24993568
NCOA2 chr8:71024267-71316020
NCOA4 chr10:51565108-51590733
NF1 chr17:29421781-29646377
NF2 chr22:29999545-30094583
NFIB chr9:14081842-14398982
NFKB2 chr10:104154229-104162280
NIN chr14:51186482-51297839
NONO chrX:70503042-70521016
NOTCH1 chr9:139388897-139440238
NOTCH2 chr1:120454178-120612276
NPM1 chr5:170814708-170837887
NR4A3 chr9:102584137-102629173
NRAS chr1:115247079-115259515
NSD1 chr5:176560080-176727213
NTRK1 chr1:156785542-156851642
NTRK3 chr15:88419988-88799661
NUMA1 chr11:71713911-71791573
NUP214 chr9:134000981-134109090
NUP98 chr11:3696241-3819022
NUT chr15:34638066-34649929
OLIG2 chr21:34398239-34401500
OMD chr9:95176528-95186836
P2RY8 chrY:1531466-1606037
PAFAH1B2 chr11:117015000-117047129
PALB2 chr16:23614483-23652678
PAX3 chr2:223064606-223163715
PAX5 chr9:36838531-37034476
PAX7 chr1:18957500-19062631
PAX8 chr2:113973575-114036498
PBX1 chr1:164528802-164821045
PCM1 chr8:17780366-17887455
PCSK7 chr11:117075789-117102811
PDE4DIP chr1:144851428-145076079
PDGFB chr22:39619687-39640957
PDGFRA chr4:55095264-55164411
PDGFRB chr5:149493403-149535422
PER1 chr17:8043789-8055753
PHOX2B chr4:41746100-41750987
PICALM chr11:85668486-85780108
PIK3CA chr3:178866311-178952495
PIK3R1 chr5:67522462-67597647
PIM1 chr6:37137922-37143202
PLAG1 chr8:57073469-57123859
PML chr15:74287014-74340153
PMS1 chr2:190648811-190742354
PMS2 chr7:6012871-6048737
PMX1 chr1:170633313-170708540
PNUTL1 chr22:19701987-19712297
POU2AF1 chr11:111222983-111250157
POU5F1 chr6:31132115-31138451
PPARG chr3:12329349-12475854
PRCC chr1:156737274-156770604
PRDM16 chr1:2985742-3355183
PRF1 chr10:72357105-72362531
PRKAR1A chr17:66508110-66528908
PRO1073 chr11:65265233-65273937
PSIP2 chr9:15464066-15511003
PTCH1 chr9:98205266-98279247
PTEN chr10:89623195-89728531
PTPN11 chr12:112856536-112947716
RAB5EP chr17:5185558-5289131
RAD51L1 chr14:68286496-69062737
RAF1 chr3:12625102-12705700
RANBP17 chr5:170289022-170727018
RAP1GDS1 chr4:99182527-99365010
RARA chr17:38465423-38513894
RB1 chr13:48877883-49056024
RBM15 chr1:110881945-110889303
RECQL4 chr8:145736667-145743210
REL chr2:61108752-61150178
RET chr10:43572517-43625795
ROS1 chr6:117609530-117747018
RPL22 chr1:6245081-6259679
RPN1 chr3:128338813-128369719
RUNX1 chr21:36160099-36421595
RUNXBP2 chr8:41786998-41909505
SBDS chr7:66452690-66460588
SDH5 chr11:61197597-61214237
SDHB chr1:17345227-17380665
SDHC chr1:161284166-161334533
SDHD chr11:111957571-111966517
SEPT6 chrX:118750911-118827333
SET chr9:131445934-131458674
SETD2 chr3:47057900-47205467
SFPQ chr1:35649203-35658743
SFRS3 chr6:36562090-36572243
SH3GL1 chr19:4360368-4400471
SIL chr1:47715811-47779819
SLC45A3 chr1:205626981-205649630
SMARCA4 chr19:11071598-11172959
SMARCB1 chr22:24129150-24176704
SMO chr7:128828713-128853383
SOCS1 chr16:11348274-11350039
SRGAP3 chr3:9022278-9291311
SS18 chr18:23596219-23670611
SS18L1 chr20:60718822-60757568
SSH3BP1 chr10:27035527-27150016
SSX1 chrX:48114797-48126879
SSX2 chrX:52725946-52736249
SSX4 chrX:48242968-48271344
STK11 chr19:1205798-1228434
STL chr6:125229394-125284173
SUFU chr10:104263719-104393214
SUZ12 chr17:30264044-30328057
SYK chr9:93564012-93660841
TAF15 chr17:34136488-34174237
TAL1 chr1:47681963-47695443
TAL2 chr9:108424738-108425383
TCEA1 chr8:54879117-54935008
TCF1 chr12:121416549-121440312
TCF12 chr15:57210833-57580712
TCF3 chr19:1609293-1650286
TCL1A chr14:96176305-96180533
TCL6 chr14:96117515-96139789
TET2 chr4:106067943-106200958
TFE3 chrX:48886242-48900990
TFEB chr6:41651716-41703997
TFG chr3:100428160-100467810
TFPT chr19:54610320-54619055
TFRC chr3:195776156-195809032
THRAP3 chr1:36690017-36770955
TIF1 chr7:138145079-138270330
TLX1 chr10:102891061-102897545
TLX3 chr5:170736288-170739137
TMPRSS2 chr21:42836479-42880085
TNFAIP3 chr6:138188581-138204445
TNFRSF17 chr16:12058964-12061924
TNFRSF6 chr10:90750288-90775541
TOP1 chr20:39657462-39753124
TP53 chr17:7571720-7590863
TPM3 chr1:154127780-154164609
TPM4 chr19:16178317-16213813
TPR chr1:186280788-186344457
TRA@ chr14:22748989-22749635
TRB@ chr7:142239528-142251156
TRD@ chr14:22953787-23020068
TRIM27 chr6:28870780-28891768
TRIM33 chr1:114935401-115053781
TRIP11 chr14:92434243-92506403
TSC1 chr9:135766735-135820020
TSC2 chr16:2097990-2138712
TSHR chr14:81421869-81575291
TTL chr2:113239743-113290218
USP6 chr17:5031687-5078324
VHL chr3:10183319-10193744
WAS chrX:48542186-48549815
WHSC1 chr4:1873123-1983933
WHSC1L1 chr8:38132563-38239790
WRN chr8:30890778-31031276
WT1 chr11:32409325-32457087
WTX chrX:63404998-63425624
XPA chr9:100437192-100459691
XPC chr3:14186650-14220172
ZNF145 chr11:113930431-114121394
ZNF198 chr13:20532810-20665967
ZNF278 chr22:31721791-31742249
ZNF331 chr19:54024177-54083523
ZNF384 chr12:6775644-6798676
ZNF521 chr18:22641888-22932214
ZNF9 chr3:128886659-128902810
ZNFN1A1 chr7:50344378-50472796
Supplementary Table 1- The list of Sanger Cancer Genes as of January 2011, with
genomic coordinates (hg19).
Additional Genes
Gene Location
CDK1 chr10:62538089-62554604
IGF1 chr12:102789645-102874378
CHEK1 chr11:125495036-125546148
DKK1 chr10:54074041-54077416
CDC25C chr5:137620959-137667516
TP53I3 chr2:24300305-24307728
TP73 chr1:3569129-3650467
RAD51 chr15:40987327-41024354
TP63 chr3:189349216-189599284
IGFBP3 chr7:45951850-45960871
GADD45A chr1:68150883-68154019
TIMP3 chr22:33196802-33259027
MMP13 chr11:102813723-102826463
HNRNPC chr14:21677298-21737638
CDKN1A chr6:36644237-36655108
Supplementary Table 2- Additional cell cycle genes identified by Ingenuity Pathways
Analysis, selected on the basis of their relevance to cancer biology and their interaction
with p53, with genomic coordinates (hg19).
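The coordinates in Supplementary Tables 1 and 2 are written as `chr:start-end` strings. As a minimal sketch of how such entries could be read programmatically and compared against an array call, the Python below parses one string and tests two regions for overlap. The names `Region`, `parse_region` and `overlaps` are hypothetical helpers introduced here for illustration and are not part of the analysis pipeline used in this work. Length is taken as end minus start, the same convention as the Length(bp) column in the Supplementary Results table.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval in the style used by the supplementary tables (hg19)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: length = end - start, matching the Length(bp) column
        # of the CGH results table (Stop minus Start).
        return self.end - self.start


# Accepts autosomes chr1..chr22 (any one- or two-digit number) plus chrX and chrY.
_COORD = re.compile(r"^(chr(?:\d{1,2}|X|Y)):(\d+)-(\d+)$")


def parse_region(text: str) -> Region:
    """Parse a 'chr:start-end' string into a Region, rejecting malformed input."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"not a chr:start-end coordinate: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"end precedes start in {text!r}")
    return Region(chrom, start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True when both regions lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


ptch1 = parse_region("chr9:98205266-98279247")  # PTCH1 entry, Supplementary Table 1
call = Region("chr9", 98278825, 98279278)       # one PTCH1 loss call from the results table
print(ptch1.length, overlaps(ptch1, call))
```

A check of this kind would only confirm that a reported interval falls within a targeted gene; it says nothing about whether the call is a true copy number change.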
qPCR Supplementary Information
qPCR Protocol
Quantitative PCR began with a five-minute hot start at 95°C (ramp rate 4.4°C/s).
After the hot start, the following cycle was repeated 40 times: 95°C for 10 seconds (ramp
rate 4.4°C/s), 60°C for 15 seconds (ramp rate 2.2°C/s), and 72°C for 10 seconds (ramp rate
4.4°C/s). The final melt consisted of 5 seconds at 95°C (ramp rate 4.4°C/s) followed by
60 seconds at 65°C (ramp rate 2.2°C/s).
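The cycling programme above can be written down as a small data structure, which makes the step order and hold times easy to check. The Python below is a sketch under stated assumptions: the names `Step`, `expand_program` and `total_hold_seconds` are hypothetical and do not correspond to any instrument software. Only the programmed hold times are summed; time spent ramping between temperatures is not estimated.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Step:
    """One programmed step: target temperature, hold time and ramp rate."""
    temp_c: float        # target temperature in degrees Celsius
    hold_s: float        # hold time at the target, in seconds
    ramp_c_per_s: float  # ramp rate towards the target, in degrees Celsius per second


# Values taken from the protocol text above.
HOT_START = Step(95.0, 300.0, 4.4)
CYCLE = (
    Step(95.0, 10.0, 4.4),  # denaturation
    Step(60.0, 15.0, 2.2),  # annealing
    Step(72.0, 10.0, 4.4),  # extension
)
N_CYCLES = 40
MELT = (
    Step(95.0, 5.0, 4.4),
    Step(65.0, 60.0, 2.2),
)


def expand_program() -> List[Step]:
    """Return every step in run order: hot start, 40 cycles, then the final melt."""
    return [HOT_START, *(CYCLE * N_CYCLES), *MELT]


def total_hold_seconds(steps: Sequence[Step]) -> float:
    """Sum of programmed hold times only; ramping time is deliberately excluded."""
    return sum(step.hold_s for step in steps)


program = expand_program()
print(len(program), total_hold_seconds(program))
```

Because ramping is excluded, the summed hold time is a lower bound on the actual run time.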
qPCR Primers
PTCH-3-F AACATTTGGCCATCTTGTCC
PTCH-3-R AAGGCTTTGAGAATGCAAGC
PTCH-6-F CTTCTCCTCCTCCTCCGTCT
PTCH-6-R CTGACAGGTCCTGCCTATGG
PTCH-7-F GGGCTTGGATTTCACATCA
PTCH-7-R AAAAGACGACAGGGGAGACA
PTCH-17-F CGAGGTTCGCTGCTTTTAAT
PTCH-17-R ACTCCTCCCTTCTGCTTCGT
PTCH-20-F TTCAGCTTCCACATGCTGTC
PTCH-20-R CCCCGCTGGTTTCTTATTTA
PTCH-24-F GAGGCTGGAGTCGGAGAACT
PTCH-24-R TACCATGCAGTCCACTGTCC
MYCN-8-F ACCCTCGTAGCTCGCACTTA
MYCN-8-R GGTAGTCCGAAGGTGCAAAA
FOXP2-F TGCTAGAGGAGTGGGACAAGTA
FOXP2-R GAAGCAGGACTCTAAGTGCAGA
MSH2-7-F AATTCAAAGAGGAGGAATTCTGA
MSH2-7-R CCATGTACCTGATTCTCCATTTC
EXT1-1-F AGAACGGTGGGATACAGCAC
EXT1-1-R TAGCAGCTTAGCCCGTTTGT
EXT1-3-F GCACAAAGGGTTGGAGAGAA
EXT1-3-R GCGTGCTCAACAGTAAGCAG
PCDH15-1-F ACCAGGCAGAAAGCTGAAAA
PCDH15-1-R CAATGGGCTTTCTGGGAGTA
PCDH15-6-F AACCCATACTGGAGGGGAAT
PCDH15-6-R TGATGAAGAAAGTCGGAATGG
BCMA-F GCCTCGAGTACACGGTGGAA
BCMA-R AGCAGCTGGCAGGCTCTTG
DICER1-1-F GCATGAGAGCGAGCCTGT
DICER1-1-R CAACGCCAAGGTCCAGTC
DICER1-9-F GGAGGCCTGAAAGGGTAAAT
DICER1-9-R TGGGTCCTTTCTTTGGACTG
Supplementary Table 3- The primary primers used for array validation by quantitative
PCR.
PTCH Ex 01F TGG AAG GCG CAG GGT CTG ACT
PTCH Ex 01R CGA TCC CAA AGA GTT AGA GGA
PTCH Ex 02F CTG CGG CCC GGC TTT ATG AC
PTCH Ex 02R GCG CCC AAA CAA TAA ACA AT
PTCH Ex 03F ACT GCT CAC ACA TCA GCC AGT CTC AT
PTCH Ex 03R GCA TTT CCA GGG CAA CTT CAT TTA CTA
PTCH Ex 04–05F CAA GCT TGC TGG GTC TCT ACT T
PTCH Ex 04–05R CCC GAC TAT TCA CTC AAA AAA TGC ACA
PTCH Ex 06F ATT TGT TTT GAT GCC AGA GTC CCA GA
PTCH Ex 06R GGC TAA TGG GAG GTG TAT GGC AAA TC
PTCH Ex 07F AAG ATT TGC CAT ACA CCT CCC ATT AGC
PTCH Ex 07R AAT TCC CCA CAA GGT GCT TTT TCA A
PTCH Ex 08F GGA AAC ATG TGC TCA CAG AGA AGG AAA
PTCH Ex 08R CCA GAA TTG CAA TGT TTT GAA
PTCH Ex 09F CCC TGC CCT GGA ATC ACG TAG AAC
PTCH Ex 09R CTC TCT GTC CTG GAT GCA CA
PTCH Ex 10F TTT GCC GTT TGC CTA CCT TTG ACT C
PTCH Ex 10R GCA TTC CCC TGA AAC CAG TA
PTCH Ex 11F AGG TGC TGG TGG CAG AGT CCT AAC TA
PTCH Ex 11R GCA GCC AGT GAC ACA TCA TCT GAC AT
PTCH Ex 12F CTG CCA CGT ATC TGC TCA CAC AGT C
PTCH Ex 12R CAC CCA GTT AAA CAG AGC CTC AAA CAC
PTCH Ex 13F CAC GGT TTC AAA TGC TTC AAG AGG A
PTCH Ex 13R CAA ACC CCG TTA CCC ACA TTC CTT
PTCH Ex 14F CAG GCG ATG AAC CAG GTG ATG TTA T
PTCH Ex 14R GAA GCA ATC TGA TGA ACT CCA AAG GTT
PTCH Ex 15F AGT GTG GTG GTG AAA ACA AGG
PTCH Ex 15R GCT GCT GCA GAA ACA GTT CA
PTCH Ex 16F GGG ACA CAG AGG GTG TGT TT
PTCH Ex 16R CCA GTG CCT TAG GTC TCC AG
PTCH Ex 17F GCC AGT GAT TGC ATC CTC CGA TAA
PTCH Ex 17R GGG GGT TGT ATC CCA TTA CA
PTCH Ex 18F CCT CAC AAA GAA TGA CTG CTG GAA GAT
PTCH Ex 18R CCA GAG GCC CAG ACA TAA ACA AAA CTT
PTCH Ex 19F AAG GTT CCC ACT TGG AGA CAA ACA GAG
PTCH Ex 19R TGA ATT AGG CAG TAA AGG CAG TGT CCA
PTCH Ex 20F TAC GTC AAC ACC AAA TAT GAC CCA GTG
PTCH Ex 20R TCT GCC TCA GCC TCC CAA GTA GC
PTCH Ex 21F GGG GTG GGT TTT GTT CAT TT
PTCH Ex 21R AGC CAG TAC ACC GAA GAG GA
PTCH Ex 22F CCC CTG AAA AAT ACC GTG CTT TGA G
PTCH Ex 22R ATC TGC CTG TGT GAT GTG CTG CTC
PTCH Ex 23F GGG TTG ACT GAG TCT TTG GTG AAA CC
PTCH Ex 23R TAA AAG GTC ACT GGG GTC CA
ex1b-1-F CGC GGA CTC ACA ATT ACA AG
ex1b-1-R CGA GCA CAA GGT GGA GAA G
ex1b-2-F GGG CTT GGA TTT CAC ATC A
ex1b-2-R CTG ACA GGT CCT GCC TAT GG
Supplementary Table 4- The primers used for sequencing analysis of PTCH1.
Supplementary Results
Regions of Interest after CGH array analysis
Chromosome Start Stop ID Alteration Length(bp) Probes Mean P-value Gene(s)
chr2 237490104 237490746 11 Gain 642 5 3.440 0.005955 CXCR7
chr2 237490104 237490896 12 Gain 792 7 3.174 0.005593 CXCR7
chr2 223086061 223089609 11 Gain 3548 4 3.682 0.004443 PAX3
chr2 223086061 223089609 15 Gain 3548 4 3.607 0.008166 PAX3
chr2 208464673 208464975 5 Loss 302 4 1.194 0.001805 CREB1
chr2 208464598 208464975 1 Loss 377 5 1.309 0.003628 CREB1
chr2 208464598 208464975 4 Loss 377 5 1.207 0.000382 CREB1
chr2 208464598 208464975 6 Loss 377 5 1.238 0.001706 CREB1
chr2 208462804 208463413 4 Loss 609 4 1.158 0.007531 CREB1
chr2 208462406 208462653 27 Gain 247 3 2.889 0.005166 CREB1
chr1 205632173 205632709 12 Loss 536 14 1.425 0.000641 SLC45A3
chr1 205632173 205632709 16 Loss 536 14 1.401 0.001514 SLC45A3
chr1 205632104 205632709 14 Loss 605 15 1.419 0.001884 SLC45A3
chr4 191002380 191006727 1 Gain 4347 43 3.165 2.97E-16 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 2 Gain 4347 43 4.151 7.59E-17 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 4 Gain 4347 43 3.095 5.20E-19 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 5 Gain 4347 43 3.054 1.29E-15 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 6 Gain 4347 43 3.187 9.79E-16 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 7 Gain 4347 43 2.982 3.35E-12 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 8 Gain 4347 43 2.975 4.78E-15 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 18 Gain 4347 43 2.779 1.60E-08 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 19 Gain 4347 43 2.756 1.42E-08 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 20 Gain 4347 43 2.725 1.77E-08 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191002380 191006727 22 Gain 4347 43 2.547 4.78E-06 DUX4L4/L6/L5/L4/L2/DUX2
chr4 191001678 191006727 24 Gain 5049 47 2.660 2.74E-09 DUX4L4/L6/L5/L4/L2/DUX2
chr4 190961628 191006727 3 Gain 45099 62 3.661 8.48E-29 DUX4L4/L6/L5/L4/L2/DUX2
chr4 190961628 191002380 9 Loss 40752 19 1.301 1.08E-05 DUX4L4/L6/L5/L4/L2/DUX2
chr4 190961628 191002380 12 Loss 40752 19 1.485 0.002982 DUX4L4/L6/L5/L4/L2/DUX2
chr1 186303920 186304502 10 Loss 582 3 1.336 0.007518 TPR
chr1 186303482 186304502 1 Loss 1020 6 1.215 0.000734 TPR
chr1 186303036 186304502 3 Loss 1466 7 1.295 0.003576 TPR
chr1 186303036 186304502 5 Loss 1466 7 1.291 0.000821 TPR
chr1 186303036 186304502 6 Loss 1466 7 1.226 0.000128 TPR
chr1 186303036 186304502 18 Loss 1466 7 1.390 0.002018 TPR
chr1 186302377 186304502 4 Loss 2125 11 1.387 0.003783 TPR
chr2 175664003 175664853 3 Loss 850 10 1.257 0.007548 CHN1
chr2 175664003 175664853 4 Loss 850 10 1.273 0.003226 CHN1
chr2 175664003 175664853 5 Loss 850 10 1.143 0.001622 CHN1
chr2 175664003 175664853 6 Loss 850 10 1.299 0.009763 CHN1
chr2 175664003 175664853 22 Loss 850 10 1.447 0.007578 CHN1
chr2 175664003 175664351 26 Loss 348 4 1.273 0.004103 CHN1
chr2 175868293 175869966 23 Gain 1673 7 3.181 0.003828 CHN1
chr2 175872476 175872798 21 Gain 322 3 2.912 0.003915 none
chr3 174883953 175144278 11 Gain 260325 5 3.524 0.001884 NAALADL2
chr3 174758301 175196598 12 Gain 438297 10 3.000 0.000455 NAALADL2
chr5 170736166 170738656 3 Gain 2490 28 2.657 0.000247 TLX3
chr5 170736166 170738656 5 Gain 2490 28 2.532 0.003525 TLX3
chr3 168802621 168803008 11 Loss 387 8 1.387 0.005094 MECOM
chr3 168802621 168803008 14 Loss 387 8 1.356 0.00503 MECOM
chr3 168802621 168803008 34 Loss 387 8 1.297 0.000575 MECOM
chr3 168802621 168803008 36 Loss 387 8 1.345 0.00317 MECOM
chr6 168276058 168281187 3 Loss 5129 5 1.150 0.008582 MLLT4
chr6 168276058 168281187 5 Loss 5129 5 1.149 0.009663 MLLT4
chr3 162711459 166506831 10 Gain 3795372 106 2.464 3.21E-05 multiple
chr3 162629146 167317738 27 Gain 4688592 128 2.420 0.000139 multiple
chr3 162509444 162615098 5 Loss 105654 3 0.135 0.005344 BC073807
chr3 162509444 162615098 14 Loss 105654 3 0.044 0.009006 BC073807
chr3 162509444 162629146 27 Gain 119702 4 3.472 0.006365 BC073807
chr3 162509444 162629146 33 Gain 119702 4 3.228 0.008951 BC073807
chr1 161334125 161334349 23 Loss 224 4 1.299 0.008194 SDHC
chr1 161333957 161648424 7 Loss 314467 104 1.501 4.26E-06 SDHC
chr1 161333957 161648162 36 Loss 314205 102 1.548 0.000273 SDHC
chr1 161332911 161333287 33 Loss 376 5 1.452 0.006229 SDHC
chr2 154850752 155001418 10 Gain 150666 3 3.246 0.006216 GALNT13
chr2 154850752 155001418 19 Gain 150666 3 3.296 0.001175 GALNT13
chrX 154299505 154299990 9 Loss 485 3 0.975 0.008064 BRCC3
chrX 154299505 154299990 25 Loss 485 3 0.926 0.003345 BRCC3
chr1 150855139 151031795 8 Loss 176656 78 1.580 0.005934 multiple
chr1 150854772 151031795 2 Loss 177023 79 1.503 0.001187 multiple
chr2 150333719 151069169 12 Gain 735450 21 2.823 0.001569 MMADHC
chr2 150333719 151069169 16 Gain 735450 21 2.942 0.002285 MMADHC
chr7 146035975 146149372 3 Loss 113397 3 1.098 0.00022 CNTNAP2
chr7 146035975 146149372 6 Loss 113397 3 1.178 0.003827 CNTNAP2
chr7 146035975 146149372 21 Loss 113397 3 1.273 0.009971 CNTNAP2
chr7 144839854 147499415 27 Gain 2659561 72 2.427 0.00111 many
chr7 144320788 147499415 10 Gain 3178627 86 2.419 0.006028 many
chr7 142252511 142254840 13 Gain 2329 13 2.828 0.003154 TCRB
chr7 142252411 142254840 11 Gain 2429 14 2.959 0.000202 TCRB
chr7 142252411 142254840 14 Gain 2429 14 3.084 6.53E-05 TCRB
chr7 142252411 142254840 35 Gain 2429 14 2.673 0.002056 TCRB
chr5 140308757 140521036 11 Gain 212279 5 3.149 0.004481 many
chr5 140308757 140521036 16 Gain 212279 5 3.683 0.00688 many
chr9 139390347 140447444 10 Loss 1057097 373 1.629 1.42E-06 many
chr9 139287703 141122085 11 Loss 1834382 415 1.644 0.004464 many
chr7 138666279 138666566 27 Loss 287 3 1.343 0.005519 none
chr7 138665432 138666675 35 Loss 1243 7 1.406 0.00762 KIAA1549
chr7 138603448 138603656 11 Gain 208 4 3.380 0.008244 KIAA1549
chr7 138603448 138603656 12 Gain 208 4 3.725 2.81E-05 KIAA1549
chr7 138603448 138603656 20 Gain 208 4 2.891 0.007751 KIAA1549
chr6 138198903 138199701 32 Loss 798 4 1.036 0.008413 TNFAIP3
chr6 138198298 138199701 5 Loss 1403 5 1.274 0.002537 TNFAIP3
chr6 138198298 138199762 24 Loss 1464 6 1.199 0.004298 TNFAIP3
chr4 134925812 135196257 7 Loss 270445 7 1.329 0.002708 PABPC4L
chr4 134925812 135158888 23 Loss 233076 6 1.101 0.000106 PABPC4L
chr4 134925812 135196257 25 Loss 270445 7 1.235 0.00137 PABPC4L
chr9 132689352 132689611 11 Loss 259 5 1.086 0.001401 FNBP1
chr9 132689352 132689611 15 Loss 259 5 1.171 0.001257 FNBP1
chr9 132689352 132689611 16 Loss 259 5 1.120 0.001814 FNBP1
chr9 132689352 132690064 35 Loss 712 6 1.410 0.000806 FNBP1
chr9 132652044 132652975 9 Loss 931 13 1.547 0.000344 FNBP1
chr11 130980227 134934167 10 Gain 3953940 110 2.411 0.0009 many
chr11 130980227 134934167 25 Gain 3953940 110 2.411 0.008527 many
chr11 130623876 134934167 27 Gain 4310291 119 2.393 0.000721 many
chrX 129258582 151940841 10 Gain 22682259 1060 2.340 0.001048 many
chrX 129253954 143171508 27 Gain 13917554 829 2.350 4.93E-05 many
chrX 129244389 129246246 25 Loss 1857 18 1.548 0.00895 ELF4
chrX 129244292 129246323 15 Loss 2031 20 1.288 6.34E-05 ELF4
chrX 129244292 129245170 29 Loss 878 6 1.160 0.00433 ELF4
chrX 129199456 129253954 18 Loss 54498 149 1.619 0.009742 ELF4
chr11 128681934 128682174 3 Loss 240 3 1.129 0.002329 FLI1
chr11 128681934 128682174 6 Loss 240 3 1.100 0.003259 FLI1
chr11 128681934 128682248 29 Loss 314 4 1.192 0.009886 FLI1
chr6 125232156 125233193 15 Loss 1037 5 1.183 0.004529 STL
chr6 125231587 125232804 19 Loss 1217 9 1.372 0.006788 STL
chr12 122459997 122460593 15 Loss 596 3 0.931 0.006011 BCL7A
chr12 122459997 122460593 16 Loss 596 3 0.971 0.005758 BCL7A
chr12 122459997 122460593 29 Loss 596 3 1.022 0.002443 BCL7A
chr12 122459953 122460593 24 Loss 640 4 1.299 0.000634 BCL7A
chr12 122459786 122460593 32 Loss 807 5 1.310 0.006296 BCL7A
chr12 122459786 122460593 34 Loss 807 5 1.004 0.004624 BCL7A
chr12 122459611 122460593 9 Loss 982 6 1.209 0.002536 BCL7A
chr12 122459611 122460593 10 Loss 982 6 1.224 0.001463 BCL7A
chr12 122459611 122460593 13 Loss 982 6 1.337 0.000999 BCL7A
chr12 122459611 122460593 36 Loss 982 6 1.151 0.004283 BCL7A
chr12 122459463 122460593 31 Loss 1130 8 1.294 0.001221 BCL7A
chr12 121403212 121409527 7 Loss 6315 5 1.159 0.004089 HNF1A-AS1
chr12 121380055 121409527 2 Loss 29472 6 1.068 0.000602 HNF1A-AS1
chr12 121380055 121409527 6 Loss 29472 6 1.282 0.006815 HNF1A-AS1
chr12 121380055 121409527 8 Loss 29472 6 1.290 0.004219 HNF1A-AS1
chr1 120611983 120612398 24 Loss 415 5 1.458 0.004214 NOTCH2
chr1 120611983 120612273 31 Loss 290 4 1.253 0.00689 NOTCH2
chr1 120611948 120612205 29 Loss 257 4 1.298 0.001139 NOTCH2
chr1 120611948 120612273 34 Loss 325 5 1.080 0.001001 NOTCH2
chr1 120610364 120612273 9 Loss 1909 6 1.218 0.001734 NOTCH2
chr1 120610364 120612273 32 Loss 1909 6 1.288 0.000888 NOTCH2
chr1 120610364 120612273 35 Loss 1909 6 1.151 0.005866 NOTCH2
chr1 120483081 120483576 19 Loss 495 3 1.226 0.000951 NOTCH2
chr1 120535582 120618462 8 Gain 82880 66 2.501 9.43E-05 NOTCH2
chr1 120531186 120620993 7 Gain 89807 80 2.672 8.92E-09 NOTCH2
chr11 119171049 119171786 3 Loss 737 8 1.270 0.000603 CBL
chr11 119171049 119171713 4 Loss 664 7 1.349 0.003694 CBL
chr11 119171049 119171713 5 Loss 664 7 1.259 0.001221 CBL
chr11 119171049 119171786 6 Loss 737 8 1.311 0.000287 CBL
chr11 119171049 119171713 22 Loss 664 7 1.349 0.000314 CBL
chrX 118827177 118831825 17 Loss 4648 37 1.527 0.008259 SEPT6
chrX 118827177 118831825 18 Loss 4648 37 1.402 8.61E-06 SEPT6
chrX 118827177 118831825 20 Loss 4648 37 1.502 0.002268 SEPT6
chrX 118827177 118831825 29 Loss 4648 37 1.532 0.00827 SEPT6
chrX 118827177 118831825 32 Loss 4648 37 1.480 0.000297 SEPT6
chrX 118826101 118835146 36 Loss 9045 42 1.486 0.000244 SEPT6
chr11 118397166 118420988 29 Loss 23822 4 0.842 0.008443 MLL
chr11 118397053 118459847 1 Loss 62794 6 1.054 0.003435 MLL
chr11 118397053 118459847 4 Loss 62794 6 1.213 0.000649 MLL
chr11 118396953 118397390 5 Loss 437 4 1.011 0.001537 MLL
chr11 118342430 118342622 10 Loss 192 5 1.222 0.000728 MLL
chr11 118342430 118342589 33 Loss 159 4 1.373 0.001653 MLL
chr6 117688820 117703808 2 Gain 14988 14 3.314 0.000335 ROS1
chr6 117687317 117703808 15 Gain 16491 17 2.983 0.004734 ROS1
chr6 117687282 117703808 13 Gain 16526 18 2.647 0.002692 ROS1
chr6 117687282 117703808 14 Gain 16526 18 2.843 0.005736 ROS1
chr6 117687282 117703808 34 Gain 16526 18 3.092 0.000448 ROS1
chr6 117687282 117703808 36 Gain 16526 18 2.953 0.002008 ROS1
chr6 117662305 117662707 22 Loss 402 8 1.284 0.007085 ROS1
chr11 117104101 117104478 13 Loss 377 5 1.249 0.008091 RNF214
chr11 117102951 117103284 31 Loss 333 4 1.116 0.00187 RNF214
chr11 117102951 117104478 35 Loss 1527 16 1.435 0.000982 RNF214
chr11 117102402 117103177 16 Loss 775 5 0.997 0.00949 RNF214
chr11 117102402 117103284 23 Gain 882 6 3.314 0.006161 RNF214
chr2 113992753 113994702 18 Loss 1949 8 1.298 0.001199 PAX8
chr2 113992753 113994702 29 Loss 1949 8 1.354 0.002764 PAX8
chr11 113934697 113934912 31 Loss 215 5 1.239 0.009449 ZBTB16
chr11 113934697 113934912 32 Loss 215 5 1.275 0.007454 ZBTB16
chr5 112176121 112176459 1 Loss 338 5 1.155 0.001088 APC
chr5 112176121 112176459 3 Loss 338 5 1.170 0.000531 APC
chr5 112176121 112176459 4 Loss 338 5 1.235 0.001802 APC
chr5 112176121 112176459 5 Loss 338 5 1.053 0.002266 APC
chr5 112176121 112176459 6 Loss 338 5 1.058 0.001013 APC
chr5 112176121 112176459 8 Loss 338 5 1.152 0.00563 APC
chr5 112175024 112175235 11 Gain 211 4 4.360 0.0064 APC
chr5 112174961 112175235 35 Gain 274 5 3.067 0.006351 APC
chr5 112162551 112162908 32 Loss 357 4 1.149 0.008015 APC
chr5 112110338 112113367 35 Gain 3029 5 2.915 0.009899 APC
chr5 112101667 112103051 26 Gain 1384 5 2.769 0.004348 APC
chr11 111228333 111228655 11 Loss 322 6 1.245 0.009882 POU2AF1
chr11 111224730 111228655 18 Loss 3925 19 1.463 0.001088 POU2AF1
chr11 111224305 111228655 20 Loss 4350 24 1.533 0.006601 POU2AF1
chr12 109084752 109085548 4 Loss 796 3 1.199 0.005051 CORO1C
chr12 109084752 109085548 6 Loss 796 3 1.179 0.007269 CORO1C
chr9 108420581 108421031 1 Loss 450 5 1.277 0.003515 none
chr9 108420581 108421031 3 Loss 450 5 1.141 0.000549 none
chr9 108420581 108421031 4 Loss 450 5 1.260 0.002065 none
chr9 108420581 108421031 5 Loss 450 5 1.025 0.00429 none
chr9 108420581 108421031 6 Loss 450 5 1.099 0.001297 none
chr11 108186585 108186756 19 Loss 171 3 0.984 0.009857 ATM
chr11 108186585 108186756 31 Loss 171 3 1.060 0.00819 ATM
chr11 108089420 108091942 6 Loss 2522 18 1.353 0.001571 none
chr9 105421419 105858176 7 Gain 436757 14 2.716 0.000225 CYLC2
chr9 105421419 105858176 31 Gain 436757 14 2.711 0.003656 CYLC2
chr10 102891538 102892029 9 Loss 491 6 1.244 0.004388 TLX1
chr10 102891349 102894443 23 Gain 3094 15 2.755 0.006672 TLX1
chr12 102790096 102791526 1 Loss 1430 13 1.201 0.000549 IGF1
chr12 102790096 102791351 5 Loss 1255 10 1.163 0.002702 IGF1
chr12 102790096 102791526 6 Loss 1430 13 1.232 0.001004 IGF1
chr9 102582512 102582738 24 Loss 226 3 1.218 0.000404 AK057451
chr9 102580975 102582738 31 Loss 1763 18 1.443 0.004229 AK057451
chr11 102193604 102194890 11 Gain 1286 14 2.895 0.001573 BIRC3
chr11 102193604 102194890 12 Gain 1286 14 2.871 0.001761 BIRC3
chr11 102193604 102193901 13 Gain 297 4 3.226 0.005055 BIRC3
chr11 102183776 102184729 19 Loss 953 5 1.338 0.008037 none
chrX 100509822 100770894 5 Loss 261072 7 1.148 0.005082 many
chr14 99641915 99642386 29 Loss 471 11 1.338 0.000117 BCL11B
chr14 99641915 99642386 32 Loss 471 11 1.410 0.000733 BCL11B
chr4 99174241 99175735 6 Loss 1494 7 1.251 0.004137 none
chr4 99174241 99175735 19 Loss 1494 7 1.313 0.004397 none
chr4 99173676 99176543 4 Loss 2867 9 1.274 0.001636 none
chr9 98278952 98279212 20 Loss 260 5 1.026 0.001646 PTCH1
chr9 98278825 98279278 17 Loss 453 7 1.263 0.001443 PTCH1
chr9 98278825 98279278 18 Loss 453 7 1.247 0.001583 PTCH1
chr9 98278825 98279278 19 Loss 453 7 1.218 0.001483 PTCH1
chr9 98278825 98279212 25 Loss 387 6 1.054 0.001364 PTCH1
chr9 98278825 98279212 27 Loss 387 6 1.157 0.002837 PTCH1
chr9 98278825 98279366 31 Loss 541 8 1.073 0.000213 PTCH1
chr9 98278411 98279366 24 Loss 955 10 1.317 0.000406 PTCH1
chr9 98277653 98279577 9 Loss 1924 13 1.367 1.17E-05 PTCH1
chr9 98277653 98279278 15 Loss 1625 10 1.210 2.49E-05 PTCH1
chr9 98277653 98279577 29 Loss 1924 13 1.399 6.83E-05 PTCH1
chr9 98277653 98279462 32 Loss 1809 12 1.347 2.71E-05 PTCH1
chr9 98277653 98279462 35 Loss 1809 12 1.397 0.000143 PTCH1
chr9 98266417 98269035 5 Gain 2618 7 3.457 0.000376 PTCH1
chr9 98231091 98231571 18 Loss 480 5 1.232 0.004915 PTCH1
chr9 98231091 98231571 19 Loss 480 5 1.376 0.004915 PTCH1
chr9 98231091 98231571 17 Loss 480 5 1.366 0.004915 PTCH1
chr9 98207931 98208554 8 Gain 623 8 2.799 0.004598 PTCH1
chr9 97862625 97862988 6 Gain 363 3 3.929 0.004548 FANCC
chr9 97862625 97862988 24 Gain 363 3 3.235 0.000307 FANCC
chr9 97862625 97862988 31 Gain 363 3 3.435 0.009661 FANCC
chr14 95623818 95624127 9 Loss 309 4 1.201 0.004721 DICER1-AS
chr14 95623818 95624046 15 Loss 228 3 1.019 0.009364 DICER1-AS
chr14 95623818 95624127 31 Loss 309 4 1.215 0.002501 DICER1-AS
chr14 95623642 95624046 35 Loss 404 7 1.348 0.002043 DICER1/DICER1-AS
chr14 95598404 95600396 4 Loss 1992 8 1.279 0.006329 DICER1
chr14 95598404 95600396 5 Loss 1992 8 1.279 0.006329 DICER1
chr14 95597936 95600396 3 Loss 2460 9 1.332 0.000625 DICER1
chr14 93275641 93276233 13 Gain 592 4 3.093 0.001232 GOLGA5
chr14 93275411 93276233 12 Gain 822 5 3.770 9.56E-05 GOLGA5
chr14 92472623 92473426 9 Gain 803 3 3.538 0.009893 TRIP11
chr14 92472259 92474129 34 Gain 1870 9 3.459 0.000284 TRIP11
chr14 92472160 92472623 15 Gain 463 6 3.476 0.006426 TRIP11
chr7 91565310 91567123 3 Loss 1813 5 1.187 4.49E-03 none
chr7 91565310 91567123 4 Loss 1813 5 1.240 0.000216 none
chr7 91565310 91567123 5 Loss 1813 5 1.067 0.003545 none
chr15 91187994 91251111 24 Loss 63117 11 1.403 0.009155 CRTC3
chr15 91166194 91172639 12 Loss 6445 18 1.444 0.006887 CRTC3
chr15 91162997 91174672 11 Loss 11675 26 1.447 0.002801 CRTC3
chr15 91072479 91072705 29 Loss 226 3 1.116 0.002099 none
chr8 90999078 91001637 7 Loss 2559 5 1.397 0.00234 none
chr8 90992801 90994324 3 Loss 1523 10 1.035 0.006308 NBN
chr15 90645513 90645860 34 Loss 347 5 1.106 0.004591 IDH2
chr15 90645513 90645860 35 Loss 347 5 1.232 0.003733 IDH2
chr15 90645513 90645860 36 Loss 347 5 1.170 0.006836 IDH2
chr15 88799661 88801392 19 Loss 1731 15 1.452 0.007311 NTRK3
chr15 88799661 88801171 29 Loss 1510 13 1.402 0.00126 NTRK3
chr15 88799661 88801171 32 Loss 1510 13 1.427 0.002543 NTRK3
chr15 88799137 88799732 3 Gain 595 11 2.957 0.000277 NTRK3
chr11 85691995 85692529 22 Gain 534 4 3.071 0.006438 PICALM
chr11 85691414 85692529 8 Gain 1115 5 3.323 0.009989 PICALM
chr14 83864884 88222494 11 Gain 4357610 139 2.460 0.001457 few
chr14 83778865 88131721 10 Gain 4352856 139 2.434 6.38E-05 few
chrX 82066647 118623797 10 Gain 36557150 949 2.387 4.40E-11 many
chrX 81947420 82066647 10 Gain 119227 4 3.509 0.004614 many
chr17 79935484 79935555 35 Loss 71 3 1.072 0.007929 ASPSCR1
chr17 79935311 79936311 14 Loss 1000 7 1.265 0.007006 ASPSCR1
chr17 79935311 79936311 23 Gain 1000 7 3.418 0.002291 ASPSCR1
chr17 79793482 79917933 10 Loss 124451 5 1.227 0.000179 many
chr17 79793482 79917933 11 Loss 124451 5 1.048 3.20E-06 many
chr17 79793482 79917933 12 Loss 124451 5 1.109 0.001028 many
chr16 79638111 79638376 26 Loss 265 3 1.302 0.00668 MAF
chr16 79633758 79635252 29 Loss 1494 16 1.413 0.001956 MAF
chr16 79633722 79634562 35 Loss 840 9 1.361 0.000632 MAF
chr16 79632626 79633583 23 Gain 957 10 3.314 0.001382 MAF
chr16 79628998 79629147 1 Loss 149 3 0.928 0.002068 MAF
chr16 79628998 79629147 2 Loss 149 3 0.867 0.001724 MAF
chr16 79628998 79629147 3 Loss 149 3 0.941 0.006462 MAF
chr16 79628998 79629147 5 Loss 149 3 0.883 0.005759 MAF
chr16 79628998 79629147 6 Loss 149 3 0.833 0.00431 MAF
chr16 79628998 79629147 7 Loss 149 3 0.946 0.003549 MAF
chr16 79628998 79629147 8 Loss 149 3 0.954 0.001386 MAF
chr16 79628998 79629147 17 Loss 149 3 0.937 0.003369 MAF
chr16 79628998 79629147 18 Loss 149 3 0.948 0.005963 MAF
chr16 79628998 79629147 19 Loss 149 3 0.985 0.004854 MAF
chr16 79628998 79629147 20 Loss 149 3 0.933 0.003669 MAF
chr16 78996814 79113461 11 Gain 116647 4 4.307 0.004557 WWOX
chr16 78996814 79113461 15 Gain 116647 4 4.176 0.003799 WWOX
chr16 78996814 79113461 16 Gain 116647 4 3.926 0.004887 WWOX
chr10 76789124 76789420 12 Loss 296 7 1.288 0.000362 KAT6B
chr10 76784545 76784973 5 Loss 428 4 0.924 0.00536 KAT6B
chr10 76647068 76650513 8 Loss 3445 3 1.191 0.002814 KAT6B
chr10 76602742 76603055 34 Loss 313 8 1.405 0.009859 KAT6B
chr10 76584280 76586329 11 Loss 2049 21 1.289 9.69E-09 KAT6B
chr10 76584280 76586329 13 Loss 2049 21 1.419 8.54E-08 KAT6B
chr10 76584280 76586329 15 Loss 2049 21 1.285 5.62E-08 KAT6B
chr10 76584280 76590621 34 Loss 6341 27 1.353 4.49E-06 KAT6B
chr10 76584280 76586236 35 Loss 1956 20 1.443 7.05E-05 KAT6B
chr10 76584123 76586329 9 Loss 2206 23 1.460 3.97E-06 KAT6B
chr10 76584123 76586384 10 Loss 2261 24 1.418 6.16E-09 KAT6B
chr10 76582854 76586236 2 Loss 3382 34 1.469 0.003478 KAT6B
chr10 76582854 76596079 12 Loss 13225 44 1.534 0.002274 KAT6B
chr10 76582854 76586329 14 Loss 3475 35 1.475 0.000201 KAT6B
chr10 76582854 76596079 16 Loss 13225 44 1.519 0.002468 KAT6B
chr10 76582854 76586329 36 Loss 3475 35 1.402 1.10E-07 KAT6B
chr10 76580602 76586384 18 Loss 5782 51 1.511 0.000225 KAT6B
chr10 76580602 76586384 20 Loss 5782 51 1.545 0.002214 KAT6B
chr10 76580602 76586384 29 Loss 5782 51 1.541 0.002281 KAT6B
chr10 76579508 76586236 6 Loss 6728 51 1.556 0.008593 KAT6B
chr10 76579508 76586236 32 Loss 6728 51 1.504 0.000359 KAT6B
chr2 75954855 84884742 10 Gain 8929887 230 2.395 0.000234 many
chr2 75954855 84758934 27 Gain 8804079 227 2.390 7.88E-05 many
chr7 75353402 75367710 22 Gain 14308 11 2.841 7.28E-05 HIP1
chr7 75353402 75367710 31 Gain 14308 11 2.762 7.13E-05 HIP1
chr7 75351575 75370260 17 Gain 18685 29 2.566 0.006613
chr7 73484027 73500121 12 Gain 16094 4 3.204 0.007007 LIMK1
chr7 73484027 73500121 34 Gain 16094 4 3.925 0.003506 LIMK1
chr8 71318608 71318833 8 Loss 225 3 1.162 0.006896 none
chr8 71318608 71319452 17 Loss 844 8 1.337 0.003406 none
chr10 70453336 70453568 23 Loss 232 4 1.123 0.003146 TET1
chr10 70453261 70453645 4 Loss 384 6 1.282 0.005234 TET1
chr10 70438861 70441754 36 Gain 2893 4 4.216 0.009321 TET1
chr10 70330378 70332402 10 Gain 2024 8 2.585 0.00377 TET1
chr14 68944454 68944765 3 Loss 311 4 1.048 0.009975 RAD51B
chr14 68944454 68944646 4 Loss 192 3 1.029 0.002191 RAD51B
chr14 68944372 68944646 1 Loss 274 4 1.109 0.004521 RAD51B
chr14 68944372 68944765 6 Loss 393 5 1.136 0.004907 RAD51B
chr14 68944372 68945429 7 Loss 1057 7 1.339 0.001322 RAD51B
chr14 68944372 68945429 8 Loss 1057 7 1.341 0.000305 RAD51B
chr14 68943769 68945429 18 Loss 1660 8 1.354 0.002342 RAD51B
chr14 68943769 68945429 19 Loss 1660 8 1.389 0.00736 RAD51B
chr5 67594534 67594792 4 Loss 258 3 1.223 0.004717 PIK3R1
chr5 67594138 67594627 34 Gain 489 5 3.940 0.007901 PIK3R1
chr6 65402650 67041053 6 Gain 1638403 43 2.535 0.000711 EYS
chr6 65402650 67041053 8 Gain 1638403 43 2.475 0.001777 EYS
chr16 65158041 65159558 11 Gain 1517 11 2.786 0.002136 none
chr16 65157966 65159672 26 Gain 1706 13 2.512 0.004394 none
chr16 65039485 65155672 10 Gain 116187 103 2.405 0.004262 CDH11
chr16 65038735 65155672 27 Gain 116937 104 2.405 0.000825 CDH11
chr16 64957254 64981120 22 Gain 23866 6 3.234 0.005708 CDH11
chrX 63411823 63412074 10 Loss 251 4 1.444 0.002784 FAM123B
chrX 63411823 63412074 13 Loss 251 4 1.336 0.002817 FAM123B
chrX 63411823 63412074 15 Loss 251 4 1.044 0.003119 FAM123B
chrX 63411823 63412074 35 Loss 251 4 1.390 0.000101 FAM123B
chrX 63405896 63408024 32 Loss 2128 22 1.348 0.007497 FAM123B
chrX 63405136 63429061 18 Loss 23925 158 1.550 3.00E-05 FAM123B
chrX 63347505 63408159 17 Loss 60654 38 1.462 0.003626 FAM123B
chr17 62511609 62522526 18 Loss 10917 4 1.079 0.008661 CEP95
chr17 62511254 62522526 4 Loss 11272 6 1.340 0.003413 CEP95
chr17 62511254 62522526 5 Loss 11272 6 1.179 0.002247 CEP95
chr17 62511254 62522526 6 Loss 11272 6 1.223 0.000405 CEP95
chr7 62280148 62761968 7 Gain 481820 12 3.498 2.59E-08 LOC643955
chr7 62280148 62761968 9 Gain 481820 12 3.442 1.54E-08 LOC643955
chr7 62280148 62761968 33 Gain 481820 12 3.343 2.07E-07 LOC643955
chr10 61551326 61552261 2 Gain 935 13 3.277 9.87E-05 CCDC6
chr10 61551326 61552584 15 Gain 1258 17 3.038 0.000217 CCDC6
chr10 61551326 61552658 16 Gain 1332 18 3.000 0.00018 CCDC6
chr10 61551251 61552156 1 Gain 905 12 2.738 0.002869 CCDC6
chr10 61551251 61552261 6 Gain 1010 14 2.754 0.001063 CCDC6
chr10 61551251 61552349 7 Gain 1098 15 2.879 4.55E-05 CCDC6
chr2 60782857 60783517 3 Loss 660 7 1.224 0.008493 none
chr2 60782857 60783517 6 Loss 660 7 1.132 0.007876 none
chr2 60782857 60783517 18 Loss 660 7 1.111 0.008616 none
chr2 60780953 60782104 10 Loss 1151 14 1.444 0.002589 none
chr2 60780806 60783517 29 Loss 2711 30 1.487 0.00398 none
chr2 60780806 60783283 34 Loss 2477 28 1.456 0.004057 none
chr20 60717737 60719501 35 Loss 1764 9 1.352 0.00097 PSMA7
chr20 60717654 60720735 10 Loss 3081 11 1.451 0.000341 PSMA7
chr20 60717549 60720735 9 Loss 3186 12 1.424 0.000597 PSMA7
chr20 57464204 57467440 10 Loss 3236 6 1.283 0.003726 GNAS
chr20 57463531 57465585 23 Gain 2054 6 3.266 0.007725 GNAS
chr20 57462338 57466660 29 Loss 4322 8 1.307 0.006646 GNAS
chr11 57420512 57423034 4 Loss 2522 7 1.359 0.008527 YPEL4
chr11 57416829 57417952 7 Loss 1123 7 1.261 0.003476 none
chr11 57394769 57428535 29 Loss 33766 62 1.579 0.007805 some
chr7 56427383 62280148 9 Gain 5852765 48 2.501 0.008094 many
chr7 56427383 61896163 10 Gain 5468780 41 2.615 1.27E-06 many
chr4 55605813 55606317 1 Loss 504 4 1.190 0.009231 KIT
chr4 55605813 55606317 3 Loss 504 4 1.217 0.002517 KIT
chr4 55605813 55606317 5 Loss 504 4 1.180 0.008143 KIT
chr4 55564750 55565822 11 Gain 1072 3 4.264 0.002575 KIT
chr4 55564750 55565822 15 Gain 1072 3 4.431 0.006001 KIT
chr4 55564750 55565822 16 Gain 1072 3 4.473 0.007809 KIT
chr12 54329700 54330771 9 Loss 1071 14 1.390 7.47E-07 none
chr12 54327202 54328305 19 Loss 1103 9 1.283 0.004119 none
chr12 54327202 54328305 20 Loss 1103 9 1.242 0.003883 none
chr12 54327049 54328305 17 Loss 1256 10 1.307 0.00719 none
chr17 53400717 53402174 11 Gain 1457 17 2.832 0.000536 HLF
chr17 53400566 53402000 12 Gain 1434 17 2.772 0.001103 HLF
chr7 50757049 54427822 10 Gain 3670773 100 2.422 0.002032 many
chrX 48905186 48909256 2 Loss 4070 7 1.112 0.002019 none
chrX 48904998 48909256 6 Loss 4258 9 1.207 0.006932 none
chrX 48904998 48908799 7 Loss 3801 8 1.188 0.004988 none
chrX 48534355 48912101 36 Loss 377746 263 1.617 0.00154 many
chrX 48534206 48912101 18 Loss 377895 264 1.616 0.000551 many
chr1 47783066 47785951 22 Gain 2885 3 3.336 0.001418 none
chr1 47779551 47787756 23 Gain 8205 16 2.599 0.007209 STIL
chr1 47766997 47767652 4 Loss 655 4 1.266 0.004645 STIL
chr1 47766997 47767652 6 Loss 655 4 1.217 0.006673 STIL
chrX 44968978 44970623 1 Loss 1645 6 1.154 0.006385 KDM6A
chrX 44968978 44970623 5 Loss 1645 6 1.084 0.006717 KDM6A
chrX 44968978 44970623 6 Loss 1645 6 1.200 0.005429 KDM6A
chrX 44732331 44732893 34 Loss 562 7 1.059 0.003575 KDM6A
chrX 44732259 44733036 29 Loss 777 9 1.318 0.003596 KDM6A
chr22 44966582 45106030 22 Gain 139448 5 2.926 0.0033 some
chr22 44966582 45077310 36 Gain 110728 4 3.584 0.009301 some
chr11 44265893 44266309 5 Loss 416 4 1.105 0.004965 EXT2
chr11 44265893 44266309 6 Loss 416 4 1.024 0.002498 EXT2
chr11 44265641 44266491 1 Loss 850 9 1.346 0.003548 EXT2
chr11 44265641 44266491 8 Loss 850 9 1.480 0.005239 EXT2
chr11 44265641 44266491 18 Loss 850 9 1.383 0.007943 EXT2
chr11 44265139 44266309 4 Loss 1170 8 1.314 0.000138 EXT2
chr6 44219058 44219450 36 Gain 392 5 3.122 0.00997 HSP90AB1
chr6 44218906 44219450 12 Gain 544 6 3.069 0.006149 HSP90AB1
chr6 44214034 44214278 31 Loss 244 3 1.135 0.007638 HSP90AB1
chr6 44213099 44214785 29 Loss 1686 19 1.481 0.000869 HSP90AB1
chr11 44117264 44118171 34 Loss 907 9 1.325 0.000627 EXT2
chr11 44117063 44117583 23 Gain 520 6 2.977 0.002585 EXT2
chr19 42792643 42793234 10 Gain 591 3 2.722 0.005704 CIC
chr19 42792643 42793234 13 Gain 591 3 3.108 0.00674 CIC
chr19 42792643 42793234 14 Gain 591 3 4.288 0.005125 CIC
chr4 41750933 41751252 4 Loss 319 5 1.252 0.001866 PHOX2B
chr4 41750933 41751103 5 Loss 170 3 1.125 0.005873 PHOX2B
chr4 41747679 41749990 23 Gain 2311 15 2.743 0.003426 PHOX2B
chr8 41786656 41787274 1 Loss 618 3 1.068 0.000743 KAT6A
chr8 41786656 41787274 3 Loss 618 3 1.101 0.005265 KAT6A
chr8 41786656 41787274 4 Loss 618 3 1.058 0.006428 KAT6A
chr8 41786656 41787274 5 Loss 618 3 1.085 0.005692 KAT6A
chr8 41786656 41787274 6 Loss 618 3 1.135 0.00088 KAT6A
chr22 41532800 41534328 33 Gain 1528 6 2.637 0.005069 EP300
chr22 41532800 41534328 36 Gain 1528 6 3.149 0.004776 EP300
chr3 41268003 41275233 4 Loss 7230 12 1.380 0.008616 CTNNB1
chr3 41240407 41240957 35 Loss 550 8 1.423 0.005777 CTNNB1
chr3 41240315 41240957 34 Loss 642 9 1.365 0.00342 CTNNB1
chr3 41240005 41248200 23 Gain 8195 21 2.528 0.000466 CTNNB1
chr3 41238999 41239508 32 Gain 509 6 3.091 0.008694 none
chr13 41130568 41131189 1 Loss 621 6 1.211 0.005463 FOXO1
chr13 41130568 41130990 3 Loss 422 4 1.126 0.000694 FOXO1
chr13 41130568 41130990 4 Loss 422 4 1.239 0.001462 FOXO1
chr13 41130568 41131189 6 Loss 621 6 1.197 0.003281 FOXO1
chr13 41130568 41131356 32 Loss 788 9 1.460 0.000354 FOXO1
chr15 40903163 40908185 14 Gain 5022 6 3.977 0.009251 CASC5
chr15 40903163 40908185 36 Gain 5022 6 4.017 0.006759 CASC5
chr15 40504159 40504785 18 Gain 626 3 3.273 0.005881 BUB1B
chr15 40491279 40492441 4 Loss 1162 4 1.237 0.00305 BUB1B
chr15 40488777 40492617 8 Loss 3840 16 1.479 0.004244 BUB1B
chr15 40448624 40449550 15 Gain 926 4 3.721 0.003829 BUB1B
chr4 40244820 40245271 18 Loss 451 10 1.293 0.006431 RHOH
chr4 40188577 40189138 5 Loss 561 5 0.953 0.006986 none
chr20 39319565 39319991 2 Gain 426 5 3.718 0.001146 MAFB
chr20 39319565 39320100 3 Gain 535 6 3.430 0.00101 MAFB
chr20 39319565 39320100 7 Gain 535 6 3.004 0.002856 MAFB
chr20 39319565 39319991 8 Gain 426 5 2.892 0.004144 MAFB
chr20 39317467 39318634 13 Loss 1167 12 1.519 0.00289 MAFB
chr20 39316652 39318108 23 Gain 1456 15 3.064 1.06E-05 MAFB
chr20 39314530 39314779 11 Gain 249 4 3.349 0.008663 MAFB
chr5 38504086 38504316 20 Loss 230 3 0.940 0.002747 LIFR
chr5 38503570 38504316 3 Loss 746 4 1.308 0.007276 LIFR
chr5 38480714 38481044 3 Loss 330 4 1.012 0.002428 LIFR
chr5 38480714 38481044 5 Loss 330 4 0.968 0.000257 LIFR
chr5 38480714 38481044 6 Loss 330 4 1.083 0.003317 LIFR
chr8 38325170 38325773 32 Loss 603 6 1.263 0.007659 FGFR1
chr8 38325170 38325866 35 Loss 696 7 1.320 0.007074 FGFR1
chr6 37135956 37136817 6 Loss 861 8 1.229 0.007081 none
chr6 37135956 37136971 8 Loss 1015 10 1.313 0.003396 none
chr6 37133987 37136971 2 Loss 2984 21 1.275 0.000778 none
chr3 37029065 37030954 14 Gain 1889 7 3.271 0.004068 EPM2AIP1
chr3 37029065 37030954 15 Gain 1889 7 3.493 0.000396 EPM2AIP1
chr3 37029065 37030954 16 Gain 1889 7 3.353 0.002423 EPM2AIP1
chr21 36429461 36430121 31 Loss 660 3 1.168 0.007221 none
chr21 36429461 36430121 32 Loss 660 3 1.053 0.008211 none
chr15 34746489 34843587 34 Loss 97098 3 0.876 0.002618 GOLGA8B
chr15 34703790 34843587 10 Loss 139797 4 1.002 0.000789 GOLGA8B
chr6 34202600 34203601 9 Loss 1001 13 1.378 0.000159 none
chr6 34202600 34203752 21 Loss 1152 15 1.507 3.60E-05 none
chr6 34201975 34206285 35 Loss 4310 34 1.506 0.000426 HMGA1
chr6 34178962 34206285 31 Loss 27323 53 1.578 0.004667 HMGA1
chr11 33890209 33890959 29 Loss 750 3 1.105 0.002351 LMO2
chr11 33888981 33891534 31 Loss 2553 11 1.406 0.005257 LMO2
chr22 33245820 33254625 9 Gain 8805 10 2.666 0.006469 TIMP3/SYN3
chr22 33245820 33254625 10 Gain 8805 10 2.692 0.006007 TIMP3/SYN3
chr22 33245820 33254625 15 Gain 8805 10 3.298 0.006141 TIMP3/SYN3
chr22 33245820 33254042 35 Gain 8222 9 2.809 0.003339 TIMP3/SYN3
chr22 33245513 33254625 12 Gain 9112 11 3.204 0.002115 TIMP3/SYN3
chr22 33245513 33254625 16 Gain 9112 11 3.367 0.003912 TIMP3/SYN3
chr22 33245459 33254625 11 Gain 9166 12 2.959 0.006367 TIMP3/SYN3
chr22 33245459 33254625 14 Gain 9166 12 3.002 0.006339 TIMP3/SYN3
chr22 33190929 33192684 15 Gain 1755 10 3.187 0.006475 SYN3
chr13 32972670 32973058 23 Loss 388 6 1.300 0.00089 BRCA2
chr13 32937371 32937878 4 Loss 507 6 1.315 0.005834 BRCA2
chr13 32918410 32920314 11 Gain 1904 4 4.226 0.004112 BRCA2
chr13 32918410 32919215 33 Gain 805 3 2.890 0.004276 BRCA2
chr16 32026576 33785739 21 Gain 1759163 27 2.545 0.000327 many
chr16 31964971 33950850 10 Gain 1985879 33 2.646 0.000171 many
chr16 31964971 34737439 11 Gain 2772468 52 2.585 0.002153 many
chr16 31931207 46699664 27 Gain 14768457 74 2.448 0.000213 many
chr16 31931207 33785739 31 Gain 1854532 29 2.553 0.003347 many
chr16 31931207 34789967 33 Gain 2858760 55 2.466 0.002332 many
chr16 31551608 34789967 20 Gain 3238359 66 2.631 8.21E-05 many
chr16 31202131 31202788 34 Loss 657 8 1.242 4.01E-05 FUS
chr16 31202131 31202788 36 Loss 657 8 1.283 3.03E-05 FUS
chr20 30945886 30947168 29 Loss 1282 10 1.390 0.00707 ASXL1
chr20 30945886 30947168 31 Loss 1282 10 1.378 0.009556 ASXL1
chr8 30938511 30938757 5 Loss 246 3 1.024 0.005635 WRN
chr8 30938511 30938757 9 Loss 246 3 1.059 0.004466 WRN
chr8 30922392 30924381 23 Gain 1989 7 2.767 0.007874 WRN
chr8 30901884 30906067 34 Gain 4183 3 4.223 0.007417 WRN
chr8 30889802 30900493 20 Loss 10691 17 1.401 0.005898 WRN/PURG
chr17 30327236 30335307 15 Gain 8071 7 3.379 0.007832 SUZ12
chr17 30320306 30321287 15 Gain 981 5 3.623 0.008908 SUZ12
chr6 30117279 31132151 23 Loss 1014872 26 1.502 0.005133 many
chr2 29923076 29927835 34 Gain 4759 5 3.598 0.000146 ALK
chr2 29919023 29963483 11 Gain 44460 41 2.639 0.003132 ALK
chr2 29416150 29416245 2 Gain 95 3 4.231 0.000636 ALK
chr2 29415924 29416533 11 Gain 609 10 3.388 0.002639 ALK
chr2 29415924 29416533 12 Gain 609 10 2.996 0.007239 ALK
chr2 29415924 29416533 13 Gain 609 10 2.764 0.001735 ALK
chr2 29415924 29416533 14 Gain 609 10 3.120 0.000704 ALK
chr2 29415924 29416533 15 Gain 609 10 3.437 0.003281 ALK
chr2 29415924 29416533 16 Gain 609 10 3.403 0.001514 ALK
chr17 29558557 29562850 32 Loss 4293 18 1.359 8.06E-05 NF1
chr17 29553605 29562850 36 Loss 9245 53 1.531 0.002408 NF1
chr17 29419697 29420071 11 Loss 374 5 1.128 0.006005 none
chr17 29419697 29420071 15 Loss 374 5 1.164 0.005298 none
chr17 29419697 29420071 36 Loss 374 5 1.114 0.0057 none
chr17 29419415 29419995 2 Loss 580 7 1.012 6.71E-05 none
chr17 29419415 29419995 6 Loss 580 7 1.200 0.000875 none
chr17 29419415 29419995 7 Loss 580 7 1.148 0.000671 none
chr17 29414319 29419995 8 Loss 5676 13 1.380 0.001408 none
chr22 29090584 29091770 32 Loss 1186 8 1.322 0.000717 CHEK2
chr22 29085469 29091770 15 Loss 6301 13 1.256 9.43E-05 CHEK2
chr22 29085469 29091770 34 Loss 6301 13 1.411 0.001665 CHEK2
chr22 29085469 29091803 36 Loss 6334 14 1.202 6.58E-05 CHEK2
chr22 29083868 29093022 3 Gain 9154 22 2.656 0.00022 CHEK2
chr22 28199441 28199793 11 Gain 352 4 3.730 0.004801 none
chr22 28199441 28199885 12 Gain 444 5 3.570 0.005944 none
chr22 28199441 28199885 13 Gain 444 5 3.350 0.004689 none
chr22 28199441 28199885 15 Gain 444 5 3.620 0.009102 none
chr7 27239155 27239928 23 Gain 773 6 3.113 0.00515 HOXA13
chr7 27238954 27240177 29 Loss 1223 11 1.329 0.002147 HOXA13
chr7 27238954 27240177 31 Loss 1223 11 1.408 0.007261 HOXA13
chr7 27238954 27240258 32 Loss 1304 12 1.381 0.001318 HOXA13
chr7 27238954 27240177 35 Loss 1223 11 1.443 0.002307 HOXA13
chr7 27226460 27226753 25 Gain 293 4 3.298 0.006359 none
chr7 27222489 27224501 23 Gain 2012 10 3.085 0.000111 HOXA11
chr7 27213143 27213867 32 Loss 724 15 1.329 0.000107 HOXA10
chr7 27213143 27213786 34 Loss 643 14 1.109 4.81E-06 HOXA10
chr7 27213143 27213786 15 Loss 643 14 1.327 0.00215 HOXA10
chr7 27213143 27213786 33 Loss 643 14 1.523 0.00309 HOXA10
chr7 27213143 27213279 27 Loss 136 4 1.190 0.008506 HOXA10
chr7 27213060 27213786 35 Loss 726 17 1.278 5.10E-07 HOXA10
chr7 27213060 27213786 9 Loss 726 17 1.372 6.41E-06 HOXA10
chr7 27213060 27213411 31 Loss 351 11 1.290 0.000871 HOXA10
chr7 27213013 27213613 23 Gain 600 15 3.410 5.65E-05 HOXA10
chr7 27203163 27206490 23 Gain 3327 35 2.719 3.66E-07 HOXA9
chr6 27107195 27118133 4 Gain 10938 6 3.191 0.001044 HIST1H2BK/HIST1H4I/HIST1H2AH
chr6 27107124 27118133 23 Gain 11009 8 2.632 0.006399 HIST1H2BK/HIST1H4I/HIST1H2AH
chr7 26231477 26231651 33 Loss 174 3 1.303 0.008396 HNRNPA2B1
chr7 26231370 26231651 9 Loss 281 4 1.016 0.002651 HNRNPA2B1
chr7 26231370 26231651 10 Loss 281 4 1.041 0.007384 HNRNPA2B1
chr7 26230320 26231801 3 Loss 1481 16 1.355 0.003216 HNRNPA2B1
chr7 26230320 26231651 5 Loss 1331 14 1.324 0.001054 HNRNPA2B1
chr7 26230320 26231651 6 Loss 1331 14 1.360 0.007961 HNRNPA2B1
chr7 26230320 26231651 25 Loss 1331 14 1.341 0.006597 HNRNPA2B1
chr2 24992633 24992988 8 Loss 355 4 1.255 0.009718 NCOA1
chr2 24992633 24992988 19 Loss 355 4 1.352 0.002698 NCOA1
chr2 24992520 24992988 18 Loss 468 5 1.285 0.004724 NCOA1
chr2 24992429 24992988 29 Loss 559 6 1.379 0.00713 NCOA1
chr2 24992343 24992988 5 Loss 645 7 1.308 0.005693 NCOA1
chr2 24888584 24889057 3 Loss 473 3 0.972 0.000251 NCOA1
chr2 24888584 24889057 18 Loss 473 3 1.161 0.000967 NCOA1
chr16 23652680 23653049 29 Loss 369 5 1.341 0.002629 DCTN5
chr16 23652447 23652967 23 Gain 520 8 2.775 0.008182 PALB2/DCTN5
chr11 22647624 22648313 11 Gain 689 6 3.384 0.002996 none
chr11 22647624 22648313 34 Gain 689 6 3.352 0.00366 none
chr11 22646141 22647191 23 Gain 1050 25 2.484 0.002486 FANCF
chr15 22617694 28534745 2 Gain 5917051 184 2.467 0.001252 many
chr15 22617694 28775354 27 Gain 6157660 186 2.433 1.95E-07 many
chr10 21920380 21924854 2 Gain 4474 4 4.297 0.003262 MLLT10
chr10 21920380 21924854 8 Gain 4474 4 3.222 0.006256 MLLT10
chr10 21920380 21924854 13 Gain 4474 4 3.495 0.005123 MLLT10
chr10 21920380 21924854 25 Gain 4474 4 3.676 0.006732 MLLT10
chr10 21920380 21924854 34 Gain 4474 4 4.571 0.005093 MLLT10
chr10 21833725 21929076 16 Gain 95351 87 2.622 0.000585 MLLT10
chr10 21828980 21929076 12 Gain 100096 91 2.499 0.00403 MLLT10
chr10 21824344 21929076 15 Gain 104732 97 2.533 0.002327 MLLT10
chr15 21587557 22617694 10 Loss 1030137 18 1.355 0.000149 many
chr15 20912596 22617694 18 Loss 1705098 31 1.494 0.002985 many
chr15 20449001 22393843 15 Loss 1944842 32 1.254 9.16E-06 many
chr15 20449001 22617694 32 Loss 2168693 39 1.212 1.17E-12 many
chr15 20449001 22572823 36 Loss 2123822 38 1.278 9.70E-06 many
chr15 20170037 21162008 10 Loss 991971 20 1.239 9.53E-06 many
chr13 20662760 20663013 27 Loss 253 3 1.427 0.005066 ZMYM2
chr13 20565878 20567487 5 Loss 1609 5 1.151 0.003587 ZMYM2
chr17 20133000 20135712 29 Loss 2712 6 1.294 0.004067 SPECC1
chr17 20059546 20107851 35 Loss 48305 8 1.444 0.000348 SPECC1
chr19 18890772 18890951 10 Gain 179 3 3.045 0.002035 CRTC1
chr19 18890772 18890951 11 Gain 179 3 3.737 0.000235 CRTC1
chr19 18890772 18890951 12 Gain 179 3 3.919 0.001162 CRTC1
chr19 18890772 18890951 13 Gain 179 3 3.558 0.005579 CRTC1
chr19 18890772 18890951 34 Gain 179 3 3.808 0.008049 CRTC1
chr19 18890772 18890951 36 Gain 179 3 4.119 0.009941 CRTC1
chr19 18794120 18794911 14 Loss 791 6 1.085 0.000517 CRTC1
chr19 18793710 18794911 9 Loss 1201 7 1.222 0.002819 CRTC1
chr19 18793710 18794581 31 Loss 871 5 1.068 0.009172 CRTC1
chr22 18644898 18851514 23 Gain 206616 4 3.143 0.004955 many
chr22 18644898 18997483 26 Gain 352585 9 2.931 0.00012 many
chr8 17781642 17791750 27 Gain 10108 10 2.842 0.006587 PCM1
chr8 17781642 17793053 34 Gain 11411 12 2.985 0.008356 PCM1
chr8 17781642 17793053 36 Gain 11411 12 2.816 0.001986 PCM1
chr8 17778487 17779712 34 Gain 1225 7 3.060 0.006407 PCM1
chr1 17380591 17381279 31 Loss 688 3 1.097 0.000698 SDHB
chr1 17380591 17381279 32 Loss 688 3 1.057 0.00144 SDHB
chr22 16541382 17291241 19 Gain 749859 14 2.708 0.006046 many
chr22 16516330 17291241 10 Gain 774911 15 2.475 0.001715 many
chr22 16054713 17044026 23 Gain 989313 10 2.665 0.006475 many
chr19 15360021 15360421 11 Gain 400 4 3.460 0.006056 BRD4
chr19 15360021 15360421 15 Gain 400 4 3.773 0.001348 BRD4
chr19 15360021 15360421 33 Gain 400 4 2.934 0.002808 BRD4
chr19 15360021 15360421 34 Gain 400 4 3.617 0.003856 BRD4
chr19 15357864 15358073 21 Loss 209 6 1.360 0.007331 BRD4
chr16 14042120 14042665 1 Loss 545 8 1.077 0.002192 ERCC4
chr16 14042120 14042665 3 Loss 545 8 0.920 0.004714 ERCC4
chr16 14042120 14042665 4 Loss 545 8 1.221 0.000308 ERCC4
chr16 14042120 14042665 5 Loss 545 8 1.053 0.002692 ERCC4
chr16 14042120 14042665 6 Loss 545 8 0.980 0.000256 ERCC4
chr16 14042120 14042665 7 Loss 545 8 1.294 0.004397 ERCC4
chr16 14028134 14029637 4 Loss 1503 13 1.425 0.007459 ERCC4
chr7 14032560 14032858 2 Gain 298 4 4.350 0.006222 none
chr7 14032458 14032934 18 Gain 476 6 3.101 0.004583 none
chr7 14032458 14032858 25 Gain 400 5 3.672 0.005317 none
chr7 13971205 13971595 5 Loss 390 3 1.034 0.009828 ETV1
chr7 13931254 13931718 3 Loss 464 5 1.255 0.001891 ETV1
chr7 13931147 13931718 5 Loss 571 6 1.058 0.003695 ETV1
chr16 11348888 11349213 7 Gain 325 7 2.928 0.004004 SOCS1
chr16 11348791 11349213 5 Gain 422 8 3.051 0.006231 SOCS1
chr16 11348791 11349213 17 Gain 422 8 2.921 0.004535 SOCS1
chr16 11348791 11350125 23 Gain 1334 12 2.972 9.51E-05 SOCS1
chr16 11348710 11360722 34 Loss 12012 61 1.489 0.000464 SOCS1
chr16 11348710 11350221 35 Loss 1511 14 1.342 0.001128 SOCS1
chr19 11071155 11072064 9 Loss 909 5 1.287 0.009205 SMARCA4
chr19 11070022 11072064 23 Gain 2042 6 3.090 0.001212 SMARCA4
chr21 10761494 10958437 9 Loss 196943 7 1.418 0.00706 TPTE
chr21 10761494 10958437 21 Loss 196943 7 1.449 0.002859 TPTE
chr12 9617419 9729896 18 Gain 112477 4 4.681 0.006827 none
chr12 9617419 9700632 29 Gain 83213 3 3.998 0.001311 none
chr21 9412661 14486023 10 Gain 5073362 31 2.641 0.006149 many
chr21 9412661 12798463 27 Gain 3385802 29 2.615 0.000148 many
chr17 7579309 7579579 7 Gain 270 4 45.045 0.001053 TP53
chr17 7578179 7578520 7 Gain 341 7 35.725 0.000293 TP53
chr17 7572512 7573087 7 Gain 575 9 31.835 8.51E-05 TP53
chr17 7571756 7586887 1 Loss 15131 55 1.277 1.64E-11 TP53
chr17 7571756 7573919 3 Loss 2163 15 1.139 4.79E-07 TP53
chr17 7350235 8332073 2 Loss 981838 300 1.320 4.73E-29 many
chr7 6043480 6046391 29 Gain 2911 6 2.987 0.000961 PMS2
chr7 6042451 6046391 34 Gain 3940 9 3.180 0.002874 PMS2
chr7 6041162 6042451 14 Loss 1289 6 1.098 0.000187 PMS2
chr7 6041162 6042451 35 Loss 1289 6 1.417 0.001344 PMS2
chr17 5045536 5048750 9 Gain 3214 15 2.521 0.00436 USP6
chr17 5038831 5042825 4 Gain 3994 30 2.648 6.73E-05 USP6
chr17 5037179 5042825 12 Loss 5646 38 1.480 0.000637 USP6
chr17 5037009 5042825 11 Loss 5816 39 1.479 0.00053 USP6
chr17 5036528 5037009 17 Gain 481 3 3.827 0.005764 USP6
chr17 5030584 5043852 8 Gain 13268 82 2.545 0.001146 USP6
chr17 5030067 5043852 3 Gain 13785 88 2.754 1.25E-07 USP6
chr16 3777752 3781556 34 Loss 3804 53 1.517 0.000323 CREBBP
chr16 3777752 3781556 36 Loss 3804 53 1.512 8.71E-05 CREBBP
chr11 3730557 3733446 3 Loss 2889 7 1.207 0.003613 NUP98
chr11 3730557 3733304 16 Gain 2747 5 3.682 0.002331 NUP98
chr11 3730557 3733379 34 Gain 2822 6 3.542 0.000485 NUP98
chr11 3697489 3697730 2 Loss 241 6 1.140 0.00884 NUP98
chr11 3697265 3698117 36 Loss 852 16 1.380 9.15E-05 NUP98
chr1 2985527 2987203 2 Loss 1676 4 1.059 0.002657 PRDM16
chr1 2985527 2987203 5 Loss 1676 4 1.188 0.009582 PRDM16
chr1 2985527 2987203 6 Loss 1676 4 1.035 0.008593 PRDM16
chr1 2985527 2987203 7 Loss 1676 4 1.053 0.008038 PRDM16
chr1 2985527 2987203 8 Loss 1676 4 1.209 0.006018 PRDM16
chr1 2981313 2987203 35 Loss 5890 51 1.554 0.004207 PRDM16
chr1 2979547 2987203 34 Loss 7656 65 1.465 3.30E-05 PRDM16
chr19 1204858 1205483 10 Loss 625 6 1.154 0.000741 none
chr19 1204858 1205483 24 Loss 625 6 1.252 0.006983 none
chr19 1204858 1206188 31 Loss 1330 13 1.266 0.000132 STK11
chr19 1204441 1206532 13 Loss 2091 20 1.439 5.18E-06 STK11
chr19 1204441 1206601 35 Loss 2160 21 1.415 1.34E-06 STK11
chr19 1204341 1206601 9 Loss 2260 22 1.397 1.22E-06 STK11
chr19 1204341 1205334 27 Loss 993 8 1.201 0.003221 none
chr6 391297 392018 9 Loss 721 6 1.187 0.006637 IRF4
chr6 391012 393307 32 Loss 2295 13 1.283 0.001368 IRF4
chr6 390033 393813 14 Loss 3780 22 1.410 0.001425 IRF4
chr6 389807 393813 29 Loss 4006 24 1.434 0.002961 IRF4
chr6 389807 393813 31 Loss 4006 24 1.451 0.003974 IRF4
chr6 389807 393813 34 Loss 4006 24 1.321 0.000228 IRF4
chr6 389807 393813 35 Loss 4006 24 1.444 0.000493 IRF4
chr6 389807 393813 36 Loss 4006 24 1.425 0.003696 IRF4
chr14 96137304 96138881 6 Loss 1577 6 0.872 0.003057 TCL6
chr14 96137304 96138881 1 Loss 1577 6 0.903 0.004131 TCL6
chr14 96137304 96138881 3 Loss 1577 6 0.930 0.008222 TCL6
chr14 96137304 96138881 5 Loss 1577 6 0.978 0.001597 TCL6
chr14 96137304 96138881 4 Loss 1577 6 1.063 0.000595 TCL6
chr13 49039334 49047387 10 Loss 8053 8 1.441 0.001357 RB1
chr13 48877704 48879553 23 Gain 1849 8 2.712 0.006528 RB1
chr13 48877469 48877704 9 Loss 235 3 1.290 0.006003 none
chr15 91336489 91337544 9 Gain 1055 4 3.830 0.001668 BLM
chr15 91306243 91308546 16 Gain 2303 7 3.657 0.006367 BLM
chr15 91187994 91251111 24 Loss 63117 11 1.403 0.009155 CRTC3
Supplementary Table 5 – A full list of regions of interest selected based on the criteria described in the methods.
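Because the rows of this table are plain whitespace-delimited records, they can be loaded and sanity-checked programmatically. The following is a minimal sketch, not part of the original analysis pipeline. The field names (chrom, start, end, sample, call, length, probes, value, p_value, genes) are assumptions made for illustration, since the column headers are not repeated on these pages. The only relationship it relies on is one that can be verified directly from the rows themselves: the sixth field equals the end coordinate minus the start coordinate.

```python
from typing import NamedTuple


class Region(NamedTuple):
    # Field names are hypothetical labels for the ten columns of each row.
    chrom: str
    start: int
    end: int
    sample: int
    call: str       # "Gain" or "Loss"
    length: int     # sixth column; equals end - start in the table rows
    probes: int
    value: float    # eighth column, kept as-is without interpretation
    p_value: float
    genes: str


def parse_row(line: str) -> Region:
    """Parse one whitespace-delimited table row into a Region."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    return Region(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                  int(f[5]), int(f[6]), float(f[7]), float(f[8]), f[9])


def length_is_consistent(region: Region) -> bool:
    """Check that the reported length matches end - start."""
    return region.end - region.start == region.length


# Two rows copied verbatim from the table.
rows = [
    "chr20 57462338 57466660 29 Loss 4322 8 1.307 0.006646 GNAS",
    "chr17 7571756 7586887 1 Loss 15131 55 1.277 1.64E-11 TP53",
]
regions = [parse_row(r) for r in rows]
inconsistent = [r for r in regions if not length_is_consistent(r)]
```

A check of this kind is mainly useful for catching transcription or extraction errors in the coordinates before the regions are used downstream.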