Identification of Cancer Susceptibility Loci by High-Resolution Cancer Gene Microarray Analysis

by

Jonathan Trick

A thesis submitted in conformity with the requirements for the

degree of Master of Science

Department of Medical Biophysics

University of Toronto

©Copyright by Jonathan Trick (2014)

Abstract

Li-Fraumeni syndrome (LFS) is a highly penetrant familial cancer syndrome associated with

inherited germline mutations in TP53. However, a causative gene has not been identified in

~25% of LFS and many LFS-like families.

In this work we aim to discover possible candidate genes that contribute to the highly

penetrant cancer predisposition observed in these families. We designed an ultra high-

resolution CGH array to interrogate genes implicated in cancer. We detected alterations in

genes such as MSH2 and PTCH1 which were successfully validated by qPCR. This also led

to the detection of a novel substitution in a family by sequencing of PTCH1. Upon further

investigation, we found that the proband harbouring the MSH2 deletion belongs to a kindred

that satisfies the criteria for Lynch syndrome. Both the methodology and the results of this

study can be expanded upon in the future to hopefully identify additional causative genes in

LFS.

Acknowledgements

There are a number of people to whom I owe the completion of this thesis. The first,

like all things, is my mother, who raised my brother and me alone in very difficult

circumstances but always reinforced the value of education. My supervisor, Dr. David

Malkin, is someone I can truly say is a great role model for any aspiring scientist. I am very

grateful for both the guidance and patience he has shown me throughout my time in his lab.

My committee members, Dr. Susan Done and Dr. Stephen Meyn, were also a great help

throughout the project offering sound advice along the way and providing insightful

feedback during the writing of this thesis.

The members of the Malkin lab provided not only scientific support, but also an

enjoyable environment in which to work. Ana Novokmet and Margaret Pienkowska in

particular were especially helpful. This project was only completed thanks to the

collaboration of many people including the staff at the Centre for Applied Genomics, the

staff of the Molecular Genetics Laboratory at the Hospital for Sick Children, and the lab of

Dr. Cynthia Hawkins.

I am also indebted not only to my supervisor, but also to my academic department of Medical

Biophysics and my chain of command in the Canadian Armed Forces for being so

accommodating. I consider myself uniquely fortunate to be surrounded by so many people

willing to make my goals a priority. Finally, I must thank my family and friends, especially:

Zainab Motala, Don Oakie, James Tran, Matthew Mistry, Jordan Jarvis, Jonathan Fuller,

Jordan John, and my brother Michael, for their unconditional support.

Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
Chapter 1: Introduction and Background
1.1 Li-Fraumeni Syndrome
1.1.1 Overview
1.1.2 Cancer Risk Patterns in LFS Families
1.1.3 Genetic Etiology of Li-Fraumeni Syndrome
1.1.4 Genetic Modifiers
1.1.5 The Role of Other Genes in LFS
1.1.6 Epigenetics and Li-Fraumeni Syndrome
1.2 Hereditary Cancer Predisposition Syndromes
1.2.1 Overview
1.2.2 Gorlin Syndrome
1.2.3 Lynch Syndrome
1.3 TP53
1.3.1 Overview
1.3.2 Transcriptional Regulation of TP53
1.3.3 TP53 and MDM2
1.3.4 Post Translational Modifications and the TP53 Response
1.3.5 TP53-mediated Cell Cycle Arrest
1.3.6 TP53-mediated Apoptosis
1.4 Copy Number Variation
1.4.1 Overview
1.4.2 Generation of Copy Number Variants
1.4.3 Germline CNVs in Disease
1.4.4 CNVs in Cancer predisposition
1.5 Rationale
Chapter 2: Materials and Methods
2.1 Genes of Interest
2.2 Array Design
2.2.1 Design Overview
2.2.2 Design of Exonic Probes
2.2.3 Genomic Probes
2.2.4 Non-coding Exon Probes
2.2.5 Promoter Region Probes
2.2.6 Gene Intron Probes
2.2.7 Finalizing the Array
2.3 Samples
2.3.1 Sample Selection
2.3.2 Subject Recruitment
2.4 Analysis and Validation of CGH Array Data
2.4.1 Custom CGH Array Analysis
2.4.2 Quantitative PCR validation
2.4.3 TaqMan PTCH1 Validation
2.4.3 Sequencing of PTCH1
2.4.4 Mismatch Repair Gene Mutation Screening
Chapter 3: Results
3.1 Array Results
3.1.1 Candidate Genes Found by Custom Array CGH
3.1.2 Significant Alterations Found in Single Samples
3.1.3 Difficulty Identifying Genuine Copy Number Alterations with Custom Probes
3.1.4 Genetic evidence of anticipation
3.2 Validation of Candidate Genes
3.2.1 MSH2 involved in a Li-Fraumeni-Like phenotype
3.2.2 Large copy number gains in sample 23 likely an artifact
3.2.3 Confirmation of custom array’s ability to detect previously unknown alterations
3.2.4 DICER1 is copy number neutral in the patient cohort
3.2.5 PTCH1 validation
4.1 Discussion on high-resolution genomic analysis in LFS
4.1.1 Utility of custom CGH arrays for detecting novel copy number alterations in the germline
4.1.2 Next Generation Sequencing
4.1.3 Next Generation Sequencing and Cancer Syndromes
4.2 Potential Genetic Evidence of Anticipation
4.3 Discovery of a Lynch Syndrome Kindred
4.4 PTCH1 Associated with the LFS phenotype
4.4.1 Deletions in PTCH1 isoforms associated with the LFS phenotype
4.4.2 A Novel Variant in an LFS-L Family Affected by Syndactyly
4.5 Management of Cancer Predisposition Syndromes
4.6 Concluding Remarks
References
Supplementary Information
Supplementary Materials and Methods
Sanger Genes
Additional Genes
qPCR Supplementary Information
qPCR Protocol
qPCR Primers
Supplementary Results
Regions of Interest after CGH array analysis

List of Tables

Table 1- The revised Amsterdam criteria and Bethesda guidelines used to diagnose HNPCC.

Table 2- Known cancer predisposition genes with rare copy number variants and their

associated syndromes.

Table 3- Summary of all 40 samples run on the array including sex, related samples, age at

diagnosis (dx) and tumour type.

Table 4- Primary genes of interest after CGH array analysis

Table 5- Large copy number losses seen in only one sample

Table 6- Seven large copy number gains were detected in Sample 23

Table 7- Summary of all shared alterations that were detected as expanded in families by

array segmentation analysis.

List of Figures

Figure 1- LFS component Tumours compiled by Nichols and Malkin.

Figure 2- Tumours associated with TP53 germline mutations seen in LFS.

Figure 3- TP53 mutations in Li-Fraumeni Syndrome carriers are clustered in the DNA

binding domain, codons 102-292.

Figure 4- Human TP53 protein has four key functional domains: the transcriptional

activation domain, the DNA binding domain, the tetramerization domain, and the negative

regulatory domain.

Figure 5- Recently there has been a rapid rise in reported copy number variation into the

Database of Genomic Variants.

Figure 6- The majority of copy number variants reported in the Database of Genomic

Variants are now under 10kb.

Figure 7- Custom probes of our custom CGH array are highlighted in green while copy

number probes from the Affymetrix Genome-Wide Human SNP6.0 are highlighted in red.

The custom array provides a much greater probe density in the coding regions in genes of

interest.

Figure 8- A notable example of a sample (b) which was not identified by segmentation

analysis but reported very similar log ratios to a sample (a), which was. Samples a and b

were run beside each other on the same slide.

Figure 9- MSH2 copy number loss encompassing exons 3-6 (shown in red) detected on the

custom CGH array

Figure 10- MSH2 copy number loss validated via SYBR Green qPCR. Shown are the

mean (+/- SEM) copy number ratios.

Figure 11- Pedigree of a proband with a history of HNPCC in the paternal lineage and breast

cancer predisposition in the maternal lineage.

Figure 12- FOXO1 copy number gain failed to validate via SYBR Green qPCR, instead

appearing as a copy number loss. Shown are the mean (+/- SEM) copy number ratios.

Figure 13- MYCN copy number gain failed to validate via SYBR Green qPCR, instead

appearing as copy number neutral. Shown are the mean (+/- SEM) copy number ratios.

Figure 14- PCDH15 copy number gain successfully validated via SYBR Green qPCR.

Shown are the mean (+/- SEM) copy number ratios.

Figure 15- EXT1 copy number loss successfully validated via SYBR Green qPCR. Shown

are the mean (+/- SEM) copy number ratios.

Figure 16- The copy number losses on the 5’ end of DICER1 failed to validate via SYBR

Green qPCR, instead appearing as copy number neutral in all four samples. Shown are the

mean (+/- SEM) copy number ratios.
Figure 17- The copy number losses encompassing two exons on the 5’ end of DICER1 failed

to validate via SYBR Green qPCR, instead appearing as copy number neutral in all three

samples. Shown are the mean (+/- SEM) copy number ratios.

Figure 18- Thirteen copy number losses were observed on the far 5’ end of PTCH1, shown in

red.

Figure 19- Of the copy number losses encompassing the 5’ end of PTCH1, 11/13

successfully validated via SYBR Green qPCR. Shown are the mean (+/- SEM) copy number

ratios.

Figure 20- The six samples with detected copy number losses encompassing a TaqMan probe

were all revealed to be copy number neutral at the probe’s locus using a TaqMan copy

number qPCR assay.

Figure 21- The copy number gain spanning 2.6kb in PTCH1 failed to validate via SYBR

Green qPCR, instead appearing as a copy number loss. Shown are the mean (+/- SEM) copy

number ratios.

Figure 22- The copy number loss observed in exon 14 of PTCH1 failed to validate via SYBR

Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/- SEM)

copy number ratios.

Figure 23- The copy number gain observed in the 3’UTR of PTCH1 failed to validate via

SYBR Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/-

SEM) copy number ratios.

Figure 24- Pedigree Showing a LFS-L family with a history of syndactyly in the maternal

lineage.

List of Abbreviations

ALL acute lymphoblastic leukemia

APAF1 apoptotic peptidase activating factor 1

APC adenomatous polyposis coli

ARF alternate reading frame of CDKN2A

ATM ataxia telangiectasia mutated

ATR ataxia telangiectasia and Rad3-related protein

BAP1 BRCA1 associated protein-1

BAX BCL2-associated X protein

BCC basal cell carcinoma

BCL10 B-cell CLL/lymphoma 10

Bcl-2 B-cell CLL/lymphoma 2

BCMA tumor necrosis factor receptor superfamily, member 17

BIR potassium inwardly-rectifying channel, subfamily J, member 11

BLM Bloom syndrome, RecQ helicase-like

BRCA1 breast cancer 1, early onset

BRCA2 breast cancer 2, early onset

CCDS consensus coding sequence project

CDH1 cadherin 1, type 1, E-cadherin

CDK cyclin-dependent kinase

CDKN1A /p21 cyclin-dependent kinase inhibitor

CGH comparative genomic hybridization

CHEK1 checkpoint kinase 1

CHEK2 checkpoint kinase 2

CMMR-D constitutional mismatch repair-deficiency

c-MYC v-myc avian myelocytomatosis viral oncogene homolog

CNV copy number variant

COSMIC catalogue of somatic mutations in cancer

CXCL1 chemokine (C-X-C motif) ligand 1

DGV database of genomic variants

DM1 myotonic dystrophy type 1

DMC1 DNA meiotic recombinase 1

EBV Epstein-Barr virus

EPCAM epithelial cell adhesion molecule

ERCC4 excision repair cross-complementation group 4

EXT1 exostosin glycosyltransferase 1

FADD Fas (TNFRSF6)-associated via death domain

FANCC Fanconi anemia, complementation group C

FAP familial adenomatous polyposis

FasL Fas ligand

FoSTeS fork stalling and template switching

FOXO1 forkhead box O1

FOXP2 forkhead box P2

GADD45 growth arrest and DNA-damage-inducible, alpha

GLI GLI family zinc finger

HBOC hereditary breast-ovarian cancer syndrome

Hh Hedgehog

HNPCC Hereditary Nonpolyposis Colorectal Cancer Syndrome

HR homologous recombination

IARC International Agency for Research on Cancer

IL-8 interleukin 8

IPA Ingenuity Pathways Analysis

IR ionizing radiation

KDM1A/LSD1 lysine (K)-specific demethylase 1A

KIA1797 focadhesin

LFL Li-Fraumeni like

LFS Li-Fraumeni syndrome

MAX MYC associated factor X

MDM2 MDM2 proto-oncogene, E3 ubiquitin protein ligase

miRNA micro-RNA

MLH1 mutL homolog 1

MLPA multiplex ligation-dependent probe amplification

MMR mismatch repair

MSH2 mutS homolog 2

MSH6 mutS homolog 6

MTA3 metastasis associated 1 family, member 3

NAHR non-allelic homologous recombination

NBCCS nevoid basal cell carcinoma syndrome

NCBI National Center for Biotechnology Information

NF1 neurofibromin 1

NF2 neurofibromin 2

NGS next generation sequencing

NHEJ non-homologous end joining

NOXA phorbol-12-myristate-13-acetate-induced protein 1

OPGP Ontario Population Genomics Platform

p16 cyclin-dependent kinase inhibitor 2A, multiple tumor suppressor 1

p63 tumor protein p63

p73 tumor protein p73

PCC hereditary pheochromocytoma

PCDH15 protocadherin 15

PCNA proliferating cell nuclear antigen

PCR polymerase chain reaction

PMS2 postmeiotic segregation increased 2

PTCH1 patched 1

PTCH2 patched 2

PTEN phosphatase and tensin homolog

PUMA BCL2 binding component 3

QC quality control

RAD51 RAD51 recombinase

Rad51L1 RAD51 paralog B

RB retinoblastoma

RB1 retinoblastoma 1

SHH sonic hedgehog

SIRT1 sirtuin 1

SIRT3 sirtuin 3

SMO smoothened, frizzled class receptor

SNP single nucleotide polymorphism

Sp1 Sp1 transcription factor

Spo11 SPO11 meiotic protein covalently bound to DSB

STK11 serine/threonine kinase 11

SUFU suppressor of fused homolog (Drosophila)

TCAG The Centre for Applied Genomics

TP53 tumor protein p53

TRRAP transformation/transcription domain-associated protein

UTR untranslated region

UV ultraviolet radiation

VHL von Hippel-Lindau tumor suppressor, E3 ubiquitin protein ligase

WRAP53/ WDR79 WD repeat containing, antisense to TP53

WT1 Wilms tumor 1

Chapter 1: Introduction and Background

1.1 Li-Fraumeni Syndrome

1.1.1 Overview

In 1969, Li and Fraumeni reported findings that indicated the existence of a previously unknown familial cancer predisposition syndrome. Following an epidemiological survey of 280 medical charts and 418 death certificates of children diagnosed with rhabdomyosarcoma from 1960 to 1964, a familial pattern of cancer emerged. In five families, a second child had developed a soft tissue sarcoma. In addition, a diverse range of tumours including osteosarcomas, (premenopausal) breast cancers, brain cancers, and leukemias was observed in the first- and second-degree relatives along one lineage of the proband. A number of cases of multiple metachronous primary tumours were also observed. This striking incidence of cancer in each family was far greater than would be expected by chance, and the pattern of inheritance suggested an autosomal dominant mechanism of transmission¹. In 1988, Li et al. described 24 families that they had followed for many years and that met the following criteria, which would become the clinical definition of “classic” Li-Fraumeni syndrome (LFS, OMIM 151623): a proband with sarcoma diagnosed under the age of 45, who has a first-degree relative with any cancer under the age of 45, as well as another first- or second-degree relative with either any cancer under 45 years or a sarcoma at any age².
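The classic criteria can be read as a simple rule over a pedigree. The following is a minimal sketch, assuming a hypothetical record layout (`Proband`, `Relative`) that lists only affected relatives and treats any diagnosis string containing "sarcoma" as a sarcoma. It is an illustration of the definition as stated here, not a clinical tool and not part of the methods of this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proband:
    cancer: str       # tumour type, e.g. "soft tissue sarcoma"
    age_at_dx: int    # age at diagnosis in years


@dataclass
class Relative:
    degree: int       # 1 = first-degree, 2 = second-degree
    cancer: str       # tumour type (only affected relatives are listed)
    age_at_dx: int    # age at diagnosis in years


def is_sarcoma(cancer: str) -> bool:
    # Simplifying assumption: match on the diagnosis string.
    return "sarcoma" in cancer.lower()


def meets_classic_lfs(proband: Proband, relatives: List[Relative]) -> bool:
    """Return True if the family satisfies the classic LFS definition."""
    # Proband: sarcoma diagnosed under the age of 45.
    if not (is_sarcoma(proband.cancer) and proband.age_at_dx < 45):
        return False
    for i, first in enumerate(relatives):
        # A first-degree relative with any cancer under the age of 45.
        if first.degree != 1 or first.age_at_dx >= 45:
            continue
        for j, other in enumerate(relatives):
            # Another (distinct) first- or second-degree relative ...
            if j == i or other.degree not in (1, 2):
                continue
            # ... with any cancer under 45 years or a sarcoma at any age.
            if other.age_at_dx < 45 or is_sarcoma(other.cancer):
                return True
    return False
```

For example, a proband with a sarcoma at 30, a parent with breast cancer at 40, and a grandparent with a sarcoma at 60 would satisfy the rule, whereas the same proband with only the affected parent would not.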

Figure 1- LFS component tumours compiled by Nichols and Malkin³.

Figure 2- Tumours associated with TP53 germline mutations seen in LFS⁴.

To date, more than 500 classical LFS families have been reported in the database of the International Agency for Research on Cancer (IARC)⁴ or through isolated reports. Many more families exist but have not been reported in the literature. Because the classical LFS criteria are quite stringent, a designation of “LFS-like” (LFL) is used to describe the many families that show a strong similarity to LFS in inheritance and tumour type, but do not meet the classical criteria. The defining criteria to classify LFL have been refined by Chompret et al. to include: i) a proband with a characteristic LFS tumour before the age of 46 with at least one first- or second-degree relative with an LFS tumour (except for breast cancer if the proband has breast cancer) before the age of 56 or with multiple primary tumours; ii) a proband with multiple tumours (except breast cancer), two of which fall in the LFS tumour spectrum and the first of which occurred before the age of 46; and iii) any patient with an adrenocortical carcinoma or choroid plexus tumour, irrespective of family history⁵.
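The three Chompret criteria can likewise be sketched as independent checks. This is a hedged illustration only: the `Person` record, the contents of `LFS_SPECTRUM`, and the tumour labels are assumptions made for the example. The breast cancer exclusion in criterion ii is interpreted as excluding probands whose tumours are all breast cancers, and "the first of which" is taken to mean the earliest tumour in the proband.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Illustrative tumour spectrum (an assumption for this sketch, not a
# definitive list of LFS component tumours).
LFS_SPECTRUM = {
    "soft tissue sarcoma", "osteosarcoma", "breast", "brain",
    "leukemia", "adrenocortical carcinoma", "choroid plexus tumour",
}


@dataclass
class Person:
    tumours: List[Tuple[str, int]]  # (tumour type, age at diagnosis)
    degree: int = 0                 # 0 = proband, 1 or 2 = degree of relation


def chompret_criteria_met(proband: Person, relatives: List[Person]) -> Set[str]:
    """Return the subset of {'i', 'ii', 'iii'} satisfied by this family."""
    met: Set[str] = set()
    proband_types = {t for t, _ in proband.tumours}

    def relative_qualifies(rel: Person) -> bool:
        if rel.degree not in (1, 2):
            return False
        if len(rel.tumours) >= 2:          # multiple primary tumours
            return True
        for tumour, age in rel.tumours:
            # Breast cancer in a relative does not count if the proband
            # also has breast cancer.
            if tumour == "breast" and "breast" in proband_types:
                continue
            if tumour in LFS_SPECTRUM and age < 56:
                return True
        return False

    # i) LFS tumour before 46 in the proband plus a qualifying relative.
    early_lfs = any(t in LFS_SPECTRUM and a < 46 for t, a in proband.tumours)
    if early_lfs and any(relative_qualifies(r) for r in relatives):
        met.add("i")

    # ii) Multiple tumours, two in the spectrum, the earliest before 46.
    in_spectrum = [t for t, _ in proband.tumours if t in LFS_SPECTRUM]
    if (len(in_spectrum) >= 2
            and proband_types != {"breast"}
            and min(a for _, a in proband.tumours) < 46):
        met.add("ii")

    # iii) Adrenocortical carcinoma or choroid plexus tumour, regardless
    # of family history.
    if proband_types & {"adrenocortical carcinoma", "choroid plexus tumour"}:
        met.add("iii")
    return met
```

Returning the set of satisfied criteria, rather than a single boolean, keeps it visible which rule brought a family into the LFL category.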

A hallmark of LFS is the wide tumour spectrum that can be observed even within families (which would presumably have the same or similar mutation profiles). Even more striking is the incidence of multiple primary tumour types in a single individual. Hisada et al. carried out a retrospective study of 200 affected carriers of TP53 germline mutations and found that 15% developed a second cancer, 4% a third cancer, and 2% a fourth cancer. Those who had survived childhood cancers were at the highest risk of developing additional malignancies⁶. It should be noted, however, that it is not clear whether the outcome of these patients differs significantly from that of non-LFS patients who have been treated in a similar manner for the same (sporadic) tumour. While notable, this characteristic is not unique among hereditary cancer syndromes. Individuals with germline mutations in the retinoblastoma gene, RB1, another tumour suppressor, are also at a higher risk of developing secondary tumours⁷.

1.1.2 Cancer Risk Patterns in LFS Families

Attempts at determining a lifetime cancer risk in LFS patients have led to somewhat varied results. In a hospital-based analysis in 2006, Wu et al. estimated the lifetime cancer risk of TP53 mutation carriers to be 73% in males and approaching 100% in females. The high risk of breast cancer in females was thought to account for the large difference between the sexes⁸. Across three broad age groups (<15 years, 16-45 years, and >45 years), the relative risk in males was found to be 19%, 27%, and 54%, respectively. The relative risk for females was found to be 12%, 82%, and 100% in the same age groups. An earlier study by Hwang et al. in 2003 observed families identified on the basis of childhood soft tissue sarcomas. They evaluated cancer risk in both gene mutation carriers and noncarriers who had been followed for more than 20 years⁹. In the carrier group, 12%, 35%, 52%, and 80% developed cancer by the ages of 20, 30, 40, and 50 years, respectively. Breast cancers and soft tissue sarcomas represented the most common cancers. In the more than 3000 noncarriers, cumulative risks of 0.7%, 1%, 2.2%, and 5.1% were observed at the same ages. Interestingly, this is almost identical to the risk in the general population, which lends credence to the theory that the germline mutation of TP53 is sufficient to cause the LFS phenotype without any other genetic modifiers. This study also found a higher cancer risk in the female carriers. At the four ages previously mentioned, the cumulative risks for female carriers were found to be 18%, 49%, 77%, and 93%, higher at every age than those for male carriers (10%, 21%, 33%, and 68%). Contradicting the commonly held belief that the high incidence of breast cancer accounts for this difference, the study showed an increased cancer risk in females even after sex-specific cancers (breast, ovarian, and prostate cancer) were excluded. Cancers with a higher risk for females included brain and lung.

1.1.3 Genetic Etiology of Li-Fraumeni Syndrome

Because LFS is highly penetrant and autosomal dominant, it was believed that a common etiological agent existed. However, due to the relative rarity of LFS/LFL kindreds, along with a high mortality, available tissue samples were limited. Furthermore, the normal karyotypes observed in LFS patients hampered classical genetic linkage analysis. Not until 1990, roughly 20 years after the syndrome was first described, was TP53 identified as a causative gene in LFS. Malkin et al. used a candidate gene approach based on two observations: that somatic mutations in TP53 were detected in more than 50% of sporadic human cancers¹⁰, and that transgenic mice expressing mutated TP53 alleles developed a wide spectrum of tumours (albeit not identical to the LFS spectrum)¹¹. All five families studied in this initial report were found to harbour germline heterozygous point mutations in the TP53 gene. Subsequent studies have led to the estimate that, of the families that meet the criteria for classical LFS, only 60%-80% have had germline mutations detected in TP53¹². The incidence of germline TP53 coding mutations in LFS-like kindreds is estimated to be significantly lower, at roughly 40%¹³. The rate of TP53 mutations per birth has been estimated to be roughly 1 in 5000 individuals¹⁴. The frequency of de novo TP53 mutations in carriers has been estimated to be between 7 and 20%¹⁵. As in sporadic tumours, germline TP53 mutations tend to be clustered in the DNA binding domain of the gene (codons 102-292), especially in highly conserved regions (Figure 3). While the order of frequency varies between the two groups, there is a distinct similarity in mutation “hotspots” in LFS and sporadic tumours. These hotspots are usually residues located at or near the p53-DNA interface, including codons 175, 220, 245, 248, 273, and 282⁴. These sites are categorized as “contact” mutants (codons 248 and 273) or “structural” mutants (codons 175, 220, 245, and 282) based on whether they directly bind DNA or contribute to the necessary folding of the DNA binding domain. Within TP53, the location of the inherited mutation does appear to have a predisposing effect on tumour type. Missense mutations in the DNA binding domain, the most common mutation type observed, are associated with breast and brain tumours. Nonsense, frameshift and splice mutations, which result in truncated protein or a loss of function, are associated with an increased risk of early onset tumours, particularly in the brain. Interestingly, adrenocortical tumours represent the only tumour type consistently associated with mutations outside the DNA binding domain. Missense mutations in the loops opposing the DNA binding domain are frequently seen in these tumours¹². While there is overlap between LFS component tumours and sporadic tumours that frequently acquire TP53 mutations, there are also notable exceptions, particularly sporadic sarcomas, which rarely (~5%) acquire TP53 mutations despite being a hallmark of LFS¹⁶.
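The hotspot grouping described above amounts to a small lookup on codon number. The sketch below encodes only the codons and the domain boundaries given in the text; the function name and the category labels are hypothetical and chosen for illustration.

```python
# Hotspot codons as listed in the text.
CONTACT_CODONS = {248, 273}                # residues that directly bind DNA
STRUCTURAL_CODONS = {175, 220, 245, 282}   # residues needed for domain folding
DNA_BINDING_DOMAIN = range(102, 293)       # codons 102-292 inclusive


def classify_tp53_codon(codon: int) -> str:
    """Assign a TP53 codon to one of the categories used in the text."""
    if codon in CONTACT_CODONS:
        return "contact hotspot"
    if codon in STRUCTURAL_CODONS:
        return "structural hotspot"
    if codon in DNA_BINDING_DOMAIN:
        return "DNA binding domain, non-hotspot"
    return "outside DNA binding domain"


if __name__ == "__main__":
    for codon in (175, 248, 150, 337):
        print(codon, classify_tp53_codon(codon))
```

Such a lookup says nothing about the functional consequence of a particular amino acid change; it only reproduces the positional categories named in this section.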

Figure 3- TP53 mutations in Li-Fraumeni Syndrome carriers are clustered in the DNA binding domain, codons 102-292⁴.

Both in the germline and in sporadic tumours, missense mutations account for the

majority (~70%) of mutations in TP53 4. This is in contrast to the nonsense mutations commonly

seen in other tumour suppressors which often result in severely truncated protein rather than the

full-length, but dysfunctional protein product encoded by missense mutations. Hussain and

Harris compared TP53 mutations to that of RB1, APC, ATM, WT1, BRCA1, BRCA2, NF1, NF2,

p16 and VHL, 10 well-known tumor suppressors known to be mutated in human cancer. Most of

these were inactivated by nonsense mutations, deletions or insertions17

whereas 74% of TP53

8

mutations were missense, a figure supported by the updated IARC database4. The advent of high-

throughput sequencing however has improved our ability to detect the small missense mutations

across the cancer genome. For example, a recent study by Fujimoto et al. conducted whole

genome sequencing on 27 hepatocellular carcinomas and found 1734 missense mutations as

opposed to only 101 nonsense mutations, 161 short coding indels and 52 splice-site mutations as

well as 561 structural alterations (deletions, duplications, inversions and translocations) [18]. This

approach did not distinguish between oncogenes and tumour suppressors, which may account for

some of the difference. Regardless of the true distribution of missense mutations in the cancer

genome, the large proportion of these mutations in TP53 is significant.
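The proportions implied by the Fujimoto et al. counts quoted above can be worked out directly. The following is a minimal sketch of that arithmetic only; the dictionary keys and the grouping into "small-scale" versus "all" alterations are illustrative choices made here, not categories defined by the original study.

```python
# Counts quoted in the text for 27 hepatocellular carcinomas (Fujimoto et al.).
counts = {
    "missense": 1734,
    "nonsense": 101,
    "short coding indel": 161,
    "splice-site": 52,
    "structural alteration": 561,
}

# Illustrative grouping (an assumption of this sketch): point mutations and
# short indels, i.e. everything except the large structural alterations.
SMALL_SCALE = ["missense", "nonsense", "short coding indel", "splice-site"]


def fraction(category, categories):
    """Return the share of `category` among the listed `categories`."""
    total = sum(counts[name] for name in categories)
    return counts[category] / total


missense_among_small = fraction("missense", SMALL_SCALE)
missense_among_all = fraction("missense", list(counts))

print(f"Missense share of small-scale mutations: {missense_among_small:.1%}")
print(f"Missense share of all listed alterations: {missense_among_all:.1%}")
```

Under this grouping, missense changes make up about 84.7% of the small-scale mutations and about 66.5% of all listed alterations, which is the comparison the paragraph draws against the 74% figure for TP53.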

Despite being a critical tumor suppressor, it appears to be advantageous for a tumour to

retain a defective TP53 protein rather than to remove it entirely. In fact, due to its detection at

high levels in cancer cells [19], TP53 was initially considered a tumour antigen with transforming

capabilities. In 1984, Wolf et al. provided the initial evidence of the oncogenic effects of mutant

TP53 when they showed that injection of mutant-p53 expressing L12 cells into mice caused a

much more severe phenotype than L12 cells expressing no TP53 [20]. Olive et al. demonstrated in

2004 that mice harbouring an R270H mutation (equivalent to R273H in humans) had a significantly

higher tumour burden than p53-/- mice and exhibited a distinct tumour spectrum [21]. Shlien

et al. characterized a remarkable phenotype involving large deletions around the TP53 locus of

sizes up to 2Mb. Interestingly, the individuals with these large copy number losses around the

TP53 locus do not appear to be at an increased risk of cancer and instead exhibit a complex

phenotype of congenital abnormalities and developmental delay. It would appear that the

expression of the intact wild-type allele is sufficient to protect against the cancer phenotype

when no mutant p53 protein is expressed [22]. There are two theories that are not necessarily

mutually exclusive to explain these observations: that mutant TP53 can exert a dominant-

negative inhibitory effect on wild-type TP53, or that mutant TP53 can have a wild-type TP53-

independent gain of function.

Due to their ability to oligomerize, mutant TP53 proteins are able to exert a dominant-

negative effect over wild-type protein. It has been shown that wild-type and mutant TP53 protein

can be coimmunoprecipitated [23]. The oligomerized defective proteins will form part of a now

defective p53 tetramer which is unable to bind to and activate its downstream targets. This

dominant-negative effect was demonstrated using two hotspot mutations (R270H and P275S) in

mouse embryonic stem (ES) cells. After exposure to γ radiation, the cells' ability to induce BAX,

cyclin G and MDM2 was assessed. Induction of these targets was less effective in cells

carrying TP53 point mutations than in p53+/- cells, particularly in the case of BAX and MDM2.

Furthermore, doxorubicin-induced apoptosis was severely inhibited in the point-mutated cells

compared to the p53+/- cells [24].

The gain of function theory postulates that mutated TP53 can develop novel functionality

that differs from that of the normal, wild-type protein and that these new functions can contribute

to tumourigenesis and neoplastic growth. A good example of this is the expression of R172H

mutant p53 exerting an oncogenic effect by inactivating the TP53 family members, p63 and p73,

thus inhibiting their ability to induce cell cycle arrest [25]. Mutant TP53 has been shown to have

oncogenic effects in vivo. Dittmer et al. demonstrated that overexpression of the R175H mutant

in TP53-deficient cells leads to tumour formation in mice, while cells which do not express this

mutation do not form tumours in nude mice [26].

1.1.4 Genetic Modifiers

Unlike many other hereditary cancer predisposition syndromes, LFS is notable for its

high phenotypic variability with respect to tumor type, age of onset, and severity of the disease.

This variability is even observed within families. The presence of other genetic modifiers offers

a plausible explanation for this phenomenon seen in LFS patients. A role for modifying genes

has been observed in other cancer predisposition syndromes, such as hereditary breast cancer

predisposition. A G>C single nucleotide polymorphism (SNP) in RAD51 is associated with an

earlier age of onset in BRCA2 carriers despite the fact that this SNP appears to have no effect in

noncarriers as well as BRCA1 carriers [27]. It has also been shown in neurofibromatosis type 1 that

the phenotypic correlation between monozygotic twins was higher than that between more distant

relatives with the same disease-causing NF1 mutations, suggesting that variability at other loci

may be responsible for some of the variability of the disease [28]. Shlien et al. showed that TP53

mutation carriers have a marked increase in global copy number variation [29], though this

phenomenon was not observed in a subsequent study with a different patient cohort [30]. If global

copy number variation is in fact higher in LFS carriers, this is suggestive of an underlying

genomic instability that may contribute to the phenotype exclusive of the TP53 mutation itself.

TP53 is highly polymorphic with 100 SNPs currently listed in the IARC p53 database [4].

One of the earliest observations with regards to genetic modifiers in LFS was a SNP in TP53

exon 4 which results in either a proline or arginine at codon 72. This SNP is very common in the

general population, but it does appear to confer a functional difference. The Arg72 variant

induces apoptosis more efficiently than the Pro72 variant in stably transfected human Saos2 cells [31]. A

potentially important modifier has also been observed in the key TP53 regulator, MDM2. Bond et

al. identified a SNP at the 309th base pair of the first intron of MDM2. The SNP is a T>G

transversion which extends an Sp1 transcription factor binding site. The variant would thus be

overexpressed due to its higher affinity for Sp1. This overexpression was observed by real-time

quantitative PCR and immunoblotting in cells homozygous for the G allele. Furthermore, these

MDM2SNP309 (G/G) expressing cells showed significantly reduced cell death rates when

treated with etoposide, a DNA damaging agent, indicating this variant is overactive in its

repression of TP53. In fact, cells expressing this variant exhibited similar cell-death numbers to

cells expressing mutant TP53, demonstrating the significant functional effect this SNP can have

on cellular p53 levels. An analysis of LFS patients comparing those heterozygous or

homozygous for the G variant of SNP309 with those who were T/T at the locus showed a

significantly earlier median age of tumour onset in G allele carriers (18 vs. 27 years) [32]. Bougeard et al. subsequently

investigated the role of the TP53 codon 72 SNP and hypothesized that it may have an additive

effect with the MDM2SNP309, particularly because MDM2 was known to have a higher affinity

for Arg72. Analysing 61 TP53 mutation carriers from 41 kindreds, they found that carriers of the

Arg72 variant did in fact have an earlier age of tumour onset compared to those with the Pro72

variant (21.8 years vs. 34.4 years). Those who carried the Arg72 variant as well as the G variant

of the MDM2SNP309 had the lowest age of onset at 16.9 years [33]. However, a more recent study

of 19 extended LFS pedigrees of 463 individuals and 129 TP53 mutation carriers showed only a

modest association between MDM2SNP309 and increased cancer risk with the difference not

being statistically significant. The SNP had a similar tumorigenic effect in both carriers and

non-carriers, indicating it may not be an LFS-specific modifier [34].

More recently, a 16 base pair duplication in intron 3 of TP53 (PIN3) was identified as a

possible genetic modifier of LFS. The genotypes of PIN3, MDM2SNP309, P53Codon72 (PEX4),

as well as PIN2 (a single base pair change in intron two) of a cohort of 32 Brazilian TP53

mutation carriers were assessed. A modest effect of MDM2SNP309 and PEX4 on age of onset

was confirmed. A decrease of 19 years in the age of onset was observed in mutation carriers who

were homozygous for the non-duplicated PIN3. This effect did not appear to be cumulative with

either MDM2SNP309 or PEX4 [35]. A subsequent study contradicted these results, as no significant

difference in the age of onset was observed in carriers homozygous for non-duplicated PIN3.

Males in the study, however, did show a moderate increase in cancer risk when homozygous or

heterozygous for the duplicated PIN3 [36]. Of note in the Brazilian study is the fact that 18 of the

32 TP53 mutation carriers in this cohort carried a unique mutation at codon 337 that is common

in a population in southern Brazil. This may help to explain the different results between studies.

Unfortunately, due to the inherently small sample sizes involved in LFS studies, assessing the

effect of various genetic modifiers with sufficient power remains a challenge.

Of note on the topic of genetic modifiers in LFS is the possible role of genetic

anticipation which suggests that multiple genetic insults are accumulated throughout generations

leading to more severe phenotypes in each successive generation [37]. Recently, a potential genetic

cause for this phenomenon has been hinted at by the observation of accelerated telomere attrition

in successive generations of LFS families [38,39]. In these studies, TP53 mutation carriers affected

with cancer were seen to have shorter telomere length as measured in peripheral blood

lymphocyte DNA than nonaffected relatives and telomere attrition between children and adults

was faster in carriers than controls. A similar telomere shortening phenomenon linked to

anticipation was subsequently observed in hereditary breast cancer [40]. This telomere shortening

could be a marker of genomic instability which could be used to predict anticipation in LFS

kindreds.

1.1.5 The Role of Other Genes in LFS

As previously stated, roughly 20% of LFS and 60% of LFL cases do not appear to be

caused by a germline mutation of TP53. The involvement of other disease-causing genes offers

an attractive, if currently controversial, explanation for this. To date, a secondary disease-causing

gene remains elusive. The focus of the attention has been on genes involved in the p53 pathway,

apoptosis or cell cycle control such as p63 [41], BCL10 [42], BAX [43], CDKN2A [44,45], PTEN [44,46], and

CHEK1 [47]. These studies have all yielded negative results. Germline mutations in the

p53-phosphorylating kinase CHEK2 have been reported in one LFS family as well as two families

suggestive of LFS [48]. One of these reported mutations, 1322delT, was later revealed to be a

duplicated exon [49]. Both of the remaining reported mutations, Ile157Thr and 1100delC, which

were found in a total of four families suggestive of LFS, are now understood to be

polymorphisms. The 1100delC polymorphism has an estimated frequency in healthy individuals

of roughly 1%, while being 4 to 5 times more common in families with familial breast cancer.

This variant is estimated to confer a two-fold increase in breast cancer risk in females, and up to a 10-fold

increase in males [50,51]. It was subsequently found to have a modest association with both prostate [52]

and colon cancer risk [53]. The Ile157Thr variant has been estimated to have a frequency of roughly

5% in healthy individuals and, surprisingly, appears to be associated with a decreased risk of lung

and laryngeal cancers [54] while conferring an increased risk of prostate cancer [55]. While the

1100delC polymorphism offers an interesting association with breast cancer, particularly with

bilateral and male breast cancer [51], these findings do appear to invalidate CHEK2 as a secondary

disease-causing gene in LFS. Finally, linkage to a region on chromosome 1q23 has been reported

in an LFS kindred with wild-type TP53 [56]. The role of any predisposing genes in this region

however, remains to be determined. Recently, Aury-Landas et al. detected 20 copy number

variants (CNVs) of intermediate size in 15/64 LFS patients with no detectable TP53 mutation.

While it is likely that a number of these represent non-pathogenic CNVs, a notable pattern

emerged in four patients with brain tumours. All four patients exhibited CNVs affecting genes

coding TP53 partners involved in transcriptional regulation and chromatin remodelling,

KDM1A/LSD1, MTA3, TRRAP, and SIRT3. The authors demonstrated that the CNV

encompassing SIRT3 leads to its overexpression, and that this overexpression prevents apoptosis

in vitro, and results in the hypermethylation of numerous genes [57]. These results indicate that

alterations in genes involved in chromatin remodelling may be a contributing factor in TP53

wild-type LFS, particularly in cases resulting in brain tumours.

1.1.6 Epigenetics and Li-Fraumeni Syndrome

Deregulation of genes through aberrant epigenetic modifications is often seen in sporadic

cancer. Inherited epigenetic defects have also been observed, albeit rarely, in hereditary cancer

syndromes. The most striking example of this is hereditary non-polyposis colon cancer.

Epigenetic silencing through promoter hypermethylation of one allele of either MLH1 or MSH2

has been repeatedly observed [58-60]. This led to speculation that epigenetic inactivation may

account for some LFS cases that appear to present with wild-type TP53. A polymorphism in the

promoter of TP53 was detected and appeared more frequently in the LFS/LFL families (11%)

compared to the controls (0.3%). Despite this interesting finding however, the polymorphism

was shown to have no functional effect on TP53 expression [61]. Methylation of the TP53 promoter

region has been reported in numerous sporadic tumours including brain [62], breast [63], and liver [64] as

well as leukemia [65]. Notably, 32% of these acute lymphoblastic leukemia (ALL) patients were

found to exhibit DNA hypermethylation at the TP53 promoter despite the fact that TP53 coding

mutations in ALL are rare (roughly 2-3%). This suggests that in ALL, promoter methylation may

provide a considerable silencing effect without concurrent gene mutations. Finkova et al. set out

to determine the methylation status of the TP53 promoter in 14 families suggestive of LFS but

with no detectable mutation in TP53. They found no detectable methylation at any of the CG

dinucleotides tested [66], indicating that TP53 gene silencing through promoter hypermethylation is

not likely a contributing factor to the LFS phenotype. The discovery of CNVs in chromatin

remodelling genes previously mentioned however, does indicate that epigenetic dysfunction may

play a causative role in TP53 wild-type LFS.

1.2 Hereditary Cancer Predisposition Syndromes

1.2.1 Overview

Over 200 hereditary cancer predisposition syndromes have been described, the majority

of which are quite rare. Collectively however, these syndromes are estimated to account for at

least 5-10% of all cancers [67]. The prevalence of hereditary cancer predisposition in children was

recently investigated by Knapke et al. who studied cancer survivors from a pediatric cancer

survivorship clinic (n=370) over two years and screened them for family history, demographics,

and tumour characteristics that would suggest an underlying cancer predisposition syndrome.

The authors found a rather high number of individuals, 29%, who were identified as candidates

for further cancer genetics evaluation. The majority of these (61%) had a family history of

cancer, while 18% had tumours strongly associated with cancer predisposition syndromes, 16%

had a medical history which would suggest a genetic diagnosis, and 6% were selected on the

basis of a family history of another congenital condition [68]. While this study provides only a

rough estimate of the true proportion of pediatric cancers caused by hereditary cancer

predisposition syndromes, it may well be an under-estimate as many syndromes lead to very

severe phenotypes that would not be well represented in a cohort of cancer survivors.

Approximately 100 genes have been implicated as causally linked to cancer predisposition

syndromes, most of which are inherited in an autosomal dominant manner [69]. The large majority

of these are tumour suppressor genes.

Alfred Knudson developed the now famous “two hit hypothesis” after studying

hereditary retinoblastoma (RB) in children. He reasoned that multiple “hits” were necessary for

tumour formation. While sporadic retinoblastoma required two somatic hits to occur in the same

target retinal cell, children with the hereditary form of RB are born with a heterozygous germline

mutation in the RB1 gene (the “first hit”) and require only one subsequent somatic hit; this

explains the earlier ages of onset in hereditary RB, as well as the far greater risk of multifocal,

bilateral tumours [70]. Our group focuses on Li-Fraumeni syndrome (LFS); however, the results of

this study highlighted the relevance of other cancer predisposition syndromes, particularly Gorlin

syndrome and Lynch syndrome. Cancer predisposition syndromes can be divided into two

subgroups: 1) multisystem syndromes in which cancer is but one of the various manifestations,

such as Gorlin Syndrome; and 2) pure cancer predisposition syndromes whose only apparent

phenotype is tumour formation such as LFS and Lynch syndrome.

Cancer-associated multisystem syndromes are often diagnosed early in life before the

development of cancer due to the presence of physical manifestations of the syndrome that

frequently, but not necessarily, present at birth. Despite displaying a wide range of signs, these

disorders are often associated with a specific tumour type that is usually rare in the general

population. Neurofibromatosis-1 (NF-1) is caused by germline mutations in the NF1 gene and

affects an estimated 1 in 2500 to 3000 individuals. It is inherited in an autosomal dominant

fashion, but notably, an estimated 50% of patients harbour de novo mutations [71]. Individuals with

NF-1 exhibit skeletal, dermatologic, ophthalmic and neurologic abnormalities. Tumours

associated with NF-1 include: optic pathway gliomas (the most common tumor in NF-1),

astrocytomas, neurofibromas, peripheral nerve sheath tumours, and juvenile myelomonocytic

leukemia. Other notable cancer-associated multisystem syndromes for which causative genes

have been identified include Cowden syndrome, Fanconi anemia and tuberous sclerosis.

1.2.2 Gorlin Syndrome

Nevoid basal cell carcinoma syndrome (OMIM 109400, NBCCS, or Gorlin Syndrome) is

a complex multisystem syndrome characterized by an extremely high incidence of basal cell

carcinomas (BCCs) throughout an individual’s lifetime. Incidence may be as high as 90% with

skin pigmentation thought to provide a protective effect [72,73]. It is inherited in an autosomal

dominant fashion with an estimated prevalence of 1 in 57,000 to 1 in 256,000 and appears to

affect males and females equally [74]. Aside from multiple BCCs, other clinical manifestations of

NBCCS include odontogenic keratocysts of the jaws, hyperkeratosis of palms and soles, skeletal

abnormalities, intracranial ectopic calcifications, and facial dysmorphism along with rare cases

of intellectual deficit. Individuals affected by NBCCS are also at a higher risk of developing a

wide range of malignancies including medulloblastoma, fibroma and rhabdomyosarcoma.

Medulloblastoma in NBCCS typically presents during the first two years of life, while in the

general population it tends to peak around 7 or 8 years of age. There is also a distinct

preponderance of the desmoplastic subtype. The risk of developing a medulloblastoma appears

to be approximately 5% [75]. Interestingly, males with NBCCS appear to be three times more likely

than females to develop a medulloblastoma [74]. This early onset medulloblastoma may be the

presenting sign of NBCCS which makes testing for NBCCS vital as invasive BCCs and other

secondary malignant neoplasms can occur within the radiation field of the treated

medulloblastoma [76-78]. In 105 affected individuals examined at the National Institutes of Health,

the mean age of BCC onset was determined to be 23 and 21 years for Caucasians and

African-Americans, respectively, though the percentage of African-Americans who developed a BCC

was much lower [79]. While these tumours can be quite locally destructive, they rarely metastasize.

NBCCS is associated with germline mutations in the homologue of the Drosophila

melanogaster Patched gene, PTCH1. PTCH1 is a tumour suppressor that, in the fruit fly, is vital

for proper body segmentation. It encodes a transmembrane glycoprotein composed of 12

transmembrane domains and two large extracellular loops where binding with its ligand, Sonic

Hedgehog (SHH) occurs. Between 50 and 85% of individuals meeting the NBCCS clinical

criteria harbour a germline mutation in PTCH1 [80]. PTCH2 is highly homologous to PTCH1 and

mutations in PTCH2 have been found in one simplex case of BCC and one simplex case of

medulloblastoma [81]. This led to the discovery of mutations in PTCH2 in a select few cases of

NBCCS presenting with wild-type PTCH1 [82,83]. These carriers also appear to exhibit a less severe

phenotype than those who carry the classical PTCH1 mutations. The association of NBCCS with

the SHH pathway spurred investigation into the role that other SHH pathway members may play

in NBCCS. This led to the identification of a germline mutation in SUFU in two members of an

NBCCS kindred [80]. Mutations in SUFU have also been linked to medulloblastoma without the

NBCCS phenotype [84].

The diagnosis of NBCCS is made on the basis of a set of clinical criteria; either two

major criteria or one major and two minor criteria must be satisfied to confirm the diagnosis [74].

The major criteria include: multiple BCCs under 20 years; histopathologically-proven

odontogenic keratocysts of the jaws; three or more palmar or plantar pits; bilamellar calcification

of the falx cerebri; bifid, fused, or markedly splayed ribs; and first degree relatives with NBCCS.

Minor criteria include: macrocephaly; congenital malformations including cleft palate, frontal

bossing, and moderate hypertelorism; other skeletal abnormalities such as Sprengel deformity,

marked pectus deformity, and marked syndactyly of the digits; ovarian fibroma;

medulloblastoma. Gene mutation analysis plays a key role in diagnosis as there is considerable

variation in presentation even within the same family. Due to the fact that 20-30% of probands

have a de novo PTCH1 mutation, genetic testing is indicated for children presenting with a

desmoplastic medulloblastoma, as an NBCCS diagnosis would have direct implications for

treatment [85].
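The decision rule described above (two major criteria, or one major plus two minor criteria) can be sketched as a small function. This is an illustration of the counting logic only, not a clinical tool: the string labels are hypothetical shorthand invented here for the criteria listed in the text, and the function name is likewise an assumption.

```python
# Hypothetical shorthand labels for the major criteria listed in the text.
MAJOR = {
    "multiple_bccs_under_20",
    "odontogenic_keratocysts",
    "palmar_or_plantar_pits",
    "falx_cerebri_calcification",
    "rib_anomalies",
    "first_degree_relative_with_nbccs",
}

# Hypothetical shorthand labels for the minor criteria listed in the text.
MINOR = {
    "macrocephaly",
    "congenital_malformation",
    "other_skeletal_abnormality",
    "ovarian_fibroma",
    "medulloblastoma",
}


def meets_nbccs_criteria(findings):
    """Return True if the findings satisfy the rule quoted in the text:
    two major criteria, or one major and two minor criteria."""
    findings = set(findings)
    unknown = findings - MAJOR - MINOR
    if unknown:
        raise ValueError(f"Unrecognised findings: {sorted(unknown)}")
    n_major = len(findings & MAJOR)
    n_minor = len(findings & MINOR)
    return n_major >= 2 or (n_major >= 1 and n_minor >= 2)


# Example: a child with a medulloblastoma, macrocephaly and palmar pits.
example = {"medulloblastoma", "macrocephaly", "palmar_or_plantar_pits"}
print(meets_nbccs_criteria(example))
```

Note that minor criteria alone never satisfy the rule in this sketch, which mirrors the wording above: at least one major criterion is always required.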

In ‘pure’ cancer predisposition syndromes there are usually no early phenotypic

manifestations to facilitate an early diagnosis. A comprehensive review of the family history and

genetic testing are required to identify the presence of these syndromes before the formation of a

tumour. Perhaps the most widely recognized pure cancer predisposition syndrome is hereditary

breast-ovarian cancer syndrome (HBOC). Associated with mutations in BRCA1 and BRCA2,

HBOC is estimated to account for 5-10% of breast and ovarian cancer cases. Mutations in either BRCA1

or BRCA2 are present in approximately 1 in 500 individuals in the general population and are more

common in certain populations; 1 in 40 individuals of Ashkenazi Jewish ancestry carry a

BRCA mutation [86]. As in many cancer predisposition syndromes, the penetrance varies based on

the actual mutation affecting the gene. The cumulative risks of developing breast and ovarian

cancer by age 70 have been estimated to be 65% and 39%, respectively, in BRCA1 mutation

carriers, and 45% and 11%, respectively, in BRCA2 carriers. Other notable pure cancer predisposition syndromes include

familial adenomatous polyposis (FAP), rhabdoid tumour predisposition syndrome, Li-Fraumeni

syndrome and familial retinoblastoma.

1.2.3 Lynch Syndrome

Hereditary Nonpolyposis Colorectal Cancer Syndrome (OMIM 120435, HNPCC or

Lynch Syndrome) was first reported in 1913 by Aldred Warthin who chronicled a family,

“Family G”, with a hereditary pattern of stomach and endometrial cancer [87]. Family G was further

characterized by Lynch and Krush almost 60 years later and this autosomal dominant pattern of

inheritance of gastrointestinal and gynaecologic cancer became known as HNPCC [88]. Colorectal

cancer is one of the most common malignancies in the developed world. HNPCC has been

associated with approximately 2-4% of all colorectal cancer cases. Individuals with HNPCC also

have a higher risk of developing other malignancies such as cancers of the endometrium, ovaries,

stomach, small intestines, urinary tract, brain, and pancreas. Ascertainment bias has likely

influenced older estimates of colorectal cancer risk which were reported as being as high as 80%.

More recently, the lifetime risk of developing colorectal cancer for HNPCC patients has been

estimated to be 22-66%, with HNPCC also conferring a significant risk of endometrial cancer, an estimated

32-45% [89-91].

Mutations in the mismatch repair (MMR) genes are observed in the majority of

individuals with HNPCC. These genes are vital in the maintenance of genomic fidelity by

correcting nucleotide mismatches that the normal editing function of DNA polymerase failed to

rectify. They include MLH1, MSH2, MSH6, and PMS2. In the past, mutations in MLH1 and

MSH2 were thought to account for up to 90% of HNPCC cases [92]. This seems to have been an

overestimation, however, likely owing to the less striking phenotype caused by MSH6 and PMS2

mutations, which have more recently been estimated to account for as many as 13% and 9% of cases, respectively [93].

Due to these underlying MMR gene mutations, a hallmark characteristic of tumours in HNPCC

is very high levels of microsatellite instability. The replication errors caused by the failure of the

MMR system are thought to be present in up to 90% of colorectal tumours in HNPCC [94]. In 2009,

germline deletions in the EPCAM gene were found in HNPCC families that exhibited loss of

MSH2 expression despite having no detectable MSH2 mutations [95,96]. These deletions at the end

of the EPCAM gene lead to transcriptional read-through into its downstream neighbouring gene

MSH2. MSH2 is subsequently epigenetically silenced through hypermethylation of its promoter

in cells where EPCAM is expressed such as in the epithelial cells of the intestine. This discovery

highlights the importance of studying the regions encompassing pathogenic genes as they can

directly influence gene function. Epigenetic silencing of MSH2 through deletion of EPCAM is

estimated to account for up to 6.3% of all HNPCC cases [97,98]. HNPCC offers one of the best

examples of the importance of epigenetics in hereditary cancer as the EPCAM discovery was not

the first example of epimutations leading to the HNPCC phenotype. Hypermethylation of the

MLH1 and MSH2 promoters leading to gene inactivation has been reported in a number of

individuals meeting the clinical criteria for HNPCC. This hypermethylation can also be present

in the spermatozoa, indicating the potential for transmission to the offspring [58,59,99].

Early identification of HNPCC is vital in order to implement effective cancer prevention

strategies before tumour formation. A prospective Finnish study evaluated the efficacy of regular

colonoscopic surveillance and found that it was associated with a 62% reduction in colorectal

cancer incidence and mortality [100]. More frequent surveillance (every 1-2 years vs. every 3 years) has since

been shown to improve outcomes even more by further reducing the risk of colorectal cancer and

largely limiting the developing malignancies to localized, early stage tumours [101]. The

International Collaborative Group on HNPCC established diagnostic criteria, the Amsterdam

criteria, in 1991 [102], which were further refined in 1999 in order to incorporate the various tumours

outside the gastrointestinal tract [103]. These criteria are neither highly sensitive nor specific, as

only 50% of probands meeting the criteria are found to harbour germline MMR gene mutations

while only 60% of families with known MMR gene mutations meet the criteria [104]. The less

stringent Bethesda guidelines were developed in 1997 and updated in 2004 with the aim of

improving the sensitivity of diagnostic criteria. They largely succeed in this goal with sensitivity

estimates as high as 94%, though perhaps up to 28% of MMR gene mutation carriers may still be

missed under these guidelines [93,105,106]. The Amsterdam and Bethesda criteria are summarized in

Table 1. In the past, MMR gene screening has often been limited to MLH1 and MSH2. As the

role that MSH6, PMS2, EPCAM and promoter hypermethylation play in HNPCC becomes

clearer, and their testing becomes more routine, more individuals that meet the clinical criteria

with these genetic lesions will likely be identified in the future.

Amsterdam II Criteria (1999) - all must be met:

1) Three or more relatives with histologically confirmed colorectal cancer or cancer of the endometrium, small bowel, ureter, or renal pelvis. One affected relative must be a first-degree relative of the other two; FAP should also be excluded

2) Two or more successive generations are affected

3) At least one relative was diagnosed before the age of 50 years

Revised Bethesda Guidelines (2004) - one or more of the following must be met:

1) Colorectal cancer before the age of 50 years

2) Synchronous or metachronous colorectal cancer or other HNPCC-related tumours, regardless of age

3) Colorectal cancer with MSI-high morphology before the age of 60 years

4) Colorectal cancer (regardless of age) and a first-degree relative with colorectal cancer or an HNPCC-related tumour before the age of 50 years

5) Colorectal cancer (regardless of age) and two or more first- or second-degree relatives diagnosed with colorectal cancer or an HNPCC-related tumour (regardless of age)

Table 1- The revised Amsterdam criteria and Bethesda guidelines used to diagnose HNPCC.
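The logical difference between the two sets of criteria in Table 1 (every Amsterdam II condition must hold, whereas any single Bethesda guideline suffices) can be sketched in code. This is a minimal illustration, not a diagnostic implementation: the `Kindred` class, its field names and the guideline keys are hypothetical simplifications introduced here, and real pedigrees need far richer data.

```python
from dataclasses import dataclass


@dataclass
class Kindred:
    """Hypothetical family-history summary; all field names are illustrative."""
    affected_relatives: int        # relatives with a confirmed HNPCC-associated cancer
    first_degree_link: bool        # one affected relative is first-degree to the other two
    fap_excluded: bool             # familial adenomatous polyposis has been ruled out
    successive_generations: int    # number of successive generations affected
    youngest_diagnosis_age: int    # youngest age at diagnosis among affected relatives


def meets_amsterdam_ii(k):
    """Amsterdam II: all conditions in Table 1 must be met."""
    return (
        k.affected_relatives >= 3
        and k.first_degree_link
        and k.fap_excluded
        and k.successive_generations >= 2
        and k.youngest_diagnosis_age < 50
    )


# Hypothetical keys, one per revised Bethesda guideline in Table 1.
BETHESDA_GUIDELINES = (
    "crc_before_50",
    "synchronous_or_metachronous_tumours",
    "crc_msi_high_before_60",
    "crc_and_first_degree_relative_tumour_before_50",
    "crc_and_two_relatives_with_tumours",
)


def meets_bethesda(findings):
    """Revised Bethesda: one or more guidelines must be met."""
    return any(findings.get(key, False) for key in BETHESDA_GUIDELINES)


family = Kindred(3, True, True, 2, 45)
print(meets_amsterdam_ii(family))
print(meets_bethesda({"crc_before_50": True}))
```

The `and` chain versus `any` is the whole point of the sketch: it shows why the Bethesda guidelines are the less stringent of the two, as the text notes.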

1.3 TP53

1.3.1 Overview

The TP53 protein was initially discovered through its association with the SV40 large T

antigen. This 53-54 kilodalton protein was initially described as an oncogene due to its ability to

transform recipient cells [19]. Later, however, TP53’s true identity as a tumour suppressor was

revealed upon the discovery that wild-type TP53 does in fact suppress cellular growth [10,107]. The

previously observed transformation is caused by TP53 only when the gene is mutated.

Human TP53 protein consists of 393 amino acids and has four key functional domains

(Figure 4): a transcriptional activation domain spanning amino acids 1-42, the site of interaction

between TP53 and the cell’s transcriptional machinery as well as its own negative regulators

such as MDM2; a DNA binding domain spanning residues 102-292, required for binding to

consensus DNA sequences in the phosphate backbone of the DNA helix; a tetramerization

domain spanning residues 326-355, responsible for the TP53 protein’s successful oligomerization

into a functional tetramer; and a C-terminal regulatory domain consisting of the final 26 amino

acids, which regulates the protein’s ability to bind specific DNA sequences at the core domain.
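The domain boundaries given above lend themselves to a simple residue-to-domain lookup. The sketch below encodes only the coordinates stated in the text; the function name is hypothetical, and the start of the regulatory domain (residue 368) is derived here from "the final 26 amino acids" of a 393-residue protein rather than quoted from a source.

```python
TP53_LENGTH = 393

# (domain name, first residue, last residue), inclusive, as stated in the text.
DOMAINS = [
    ("transcriptional activation", 1, 42),
    ("DNA binding", 102, 292),
    ("tetramerization", 326, 355),
    # Final 26 residues: 393 - 26 + 1 = 368 (derived, see lead-in).
    ("regulatory", TP53_LENGTH - 26 + 1, TP53_LENGTH),
]


def domain_of(residue):
    """Return the name of the domain containing `residue`, or None if the
    residue falls between the four domains listed in the text."""
    if not 1 <= residue <= TP53_LENGTH:
        raise ValueError(f"Residue must be between 1 and {TP53_LENGTH}")
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    return None


for codon in (72, 175, 273, 337):
    print(codon, domain_of(codon))
```

With these coordinates, the R175H and R273H hotspots discussed earlier fall in the DNA binding domain, codon 337 falls in the tetramerization domain, and the codon 72 polymorphism lies outside all four listed domains.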

Figure 4- Human TP53 protein has four key functional domains: the transcriptional

activation domain, the DNA binding domain, the tetramerization domain, and the negative

regulatory domain.

1.3.2 Transcriptional Regulation of TP53

The TP53 gene is transcribed from the negative strand of chromosome 17p13.1. The first

promoter, P53P1, is located 250 bp upstream of the first, non-coding exon. The second promoter,

P53P2, lies within intron 1 [108]. The gene’s third promoter, P53P3, is located in the fourth intron.

While much attention has been paid to TP53’s post-translational regulation, certain aspects of its

transcriptional control have long been understood. Reich et al. demonstrated that 3T3 cells had

increased levels of TP53 when deprived of and then stimulated with serum109

. The increased

levels were shown to be effected at the transcriptional level as no increase in TP53’s half-life

was observed while the gene’s transcription mRNA levels rose 6-7 hours after serum stimulation.

This increase in TP53 mRNA prior to DNA synthesis may seem paradoxical but appears to be

commonplace110,111

. While this seems like strange timing for the increased expression of a tumor

suppressor, it is thought that this surge in TP53 transcription allows for increased “surveillance”

26

by TP53 at a time when cells are synthesising DNA and are thus at high risk of DNA damaging

events112

.

The TP53 promoter contains a bHLH recognition sequence (CACGTG) between 70 and 75 base pairs upstream of the transcription start site, which is a known recognition sequence of c-MYC/MAX. Importantly, c-MYC is unable to bind DNA when not heterodimerized with MAX, without which c-MYC’s regulation of TP53 cannot occur. Using in vitro DNA methylation, Schroeder et al. observed a 90% reduction of TP53 expression, providing the initial evidence for DNA methylation dependent silencing of TP53113. Examples of TP53 methylation in vivo have been described above in the section entitled “Epigenetics and Li-Fraumeni Syndrome”.

Recently, micro-RNAs (miRNAs) have emerged as important post-transcriptional gene regulators. In 2009 Le et al. searched for potential miRNA-binding sites in the 3’-UTR region of TP53 using an in silico approach. This led to the identification of miR-125b as a potential regulator of TP53. The authors did in fact observe that knockdown of miR-125b leads to increased levels of TP53 protein and induces apoptosis in human lung fibroblasts. Interestingly, when zebrafish embryos are exposed to gamma radiation or camptothecin, miR-125b appears to be down-regulated, which corresponds to the rapid increase in TP53 as part of the DNA damage response114. A similar interaction with TP53 was reported soon after115 for the isoform miRNA-125a. Since then, more than a dozen miRNAs have been identified as regulators of TP53, either as negative regulators targeting the TP53 3’ UTR, or as positive regulators that target the UTRs of TP53-inhibiting genes116 such as SIRT1 and MDM2. Many of these miRNAs appear to have functional relevance in human cancer cases. In colorectal cancer, for example, high expression of miR-125b is associated with lower TP53 expression, advanced tumour size and invasion, and poor prognosis when compared to the low expression group117. Of the known transcription factors that bind the human TP53 promoter, only BCL6 and Pax negatively affect its transcription118. It is likely, then, that miRNAs play a key role in the negative regulation of TP53 and that, when deregulated, they could significantly promote tumourigenesis. In the past few years a rapid rise in the number of reported miRNAs regulating tumour suppressors like TP53 has been observed. While the number of confirmed TP53-regulating miRNAs remains small, we can expect this number to rise in the near future as their importance becomes better realized.

The regulation of TP53 by WRAP53 (also known as WDR79) is a recently discovered instance of regulation by a genomic neighbour119. WRAP53 is located immediately upstream of TP53, but on the opposing strand. This anti-sense transcript exists in numerous forms that can use one of three possible first exons. One of these, exon 1α, overlaps with up to 227 basepairs of TP53’s first exon. It appears that TP53 and this Wrap53α interact in a head to head manner and that this interaction is necessary for proper transcription of TP53. Knockdown of the Wrap53α transcript and the blocking of TP53/WRAP53 RNA hybrids by 2-O-oligonucleotides led to significantly reduced levels of TP53 mRNA, while overexpression of the overlapping exon 1α sequence increased TP53 mRNA levels. Oddly, this interaction appears to be one way, as overexpression or knockdown of TP53 appeared to have no effect on the expression of WRAP53119. Over-expression of the Wrap53 protein has no effect on TP53 transcription or protein levels, which presents additional evidence of the impact RNA can have on gene regulation118.

1.3.3 TP53 and MDM2

The MDM2 protein is the key regulator of TP53 protein levels through ubiquitination. Ubiquitination is the covalent attachment of one or more ~8kD ubiquitin molecules to a protein. This process requires the consecutive function of three enzymes: an E1 ubiquitin-activating enzyme, an E2 ubiquitin-conjugating enzyme, and an E3 ubiquitin-ligating enzyme. MDM2 is an E3 ubiquitin-ligating enzyme which, like many E3 ligases, harbours a Really Interesting New Gene (RING) domain. The observation that co-deficiency of MDM2 and TP53 is not lethal, while deficiency of MDM2 alone is, has offered compelling evidence of the importance of the TP53-MDM2 relationship120,121. Despite being viable, the co-deficient mice develop a spectrum of tumours similar to that seen in TP53-null mice. The C-terminal region of TP53 is the site of the nuclear localization signals and has been shown to be critical in MDM2-mediated regulation. Deletion of TP53’s terminal 30 residues ablates this degradation122. In fact, if any of the six lysine residues (K370, K372, K373, K381, K382, and K386) in this region where ubiquitination occurs are mutated, the transcriptional output of TP53 is increased123. TP53 is poly-ubiquitinated by high levels of MDM2, leading to proteasomal degradation, and mono-ubiquitinated by low levels of MDM2, leading to TP53 being shuttled out of the nucleus into the cytoplasm. Aside from ubiquitination, MDM2 also directly inhibits TP53 transcriptional activation by binding to the protein itself124. Mice with mutant MDM2 which has no E3 activity but retains the TP53 binding capacity die during embryogenesis. They can, however, be rescued by the loss of TP53, which indicates that the E3 ligase activity of MDM2 is crucial to its repression of TP53125.

Importantly, MDM2 is itself a TP53 transcription target. Stress-induced increases in TP53 levels induce increases in MDM2 expression, which then downregulates TP53, creating a negative feedback loop. MDM2 must be blocked in order for effective TP53 activity to occur. This is the role of the ARF protein (p14ARF). The deletion of MDM2 residues 222-437 was shown to abolish its binding to ARF126, identifying the key region required for ARF’s regulation of MDM2. It was later shown by subsequent deletion analysis that residues 210-244 are responsible for most, if not all, of ARF binding activity127. Therefore TP53 and ARF bind to different regions of MDM2 in a non-competitive manner. Deletion of the first 20 residues of human ARF severely inhibits its MDM2 binding128. Interestingly, the importance of this region is not well-conserved and this effect was not observed in mice129. Upon binding to MDM2, ARF inhibits its function by sequestering it to the nucleolus, a region primarily associated with ribosome assembly, leading to an increase in TP53 levels and activity.

1.3.4 Post Translational Modifications and the TP53 Response

Despite recent findings shedding light on TP53’s transcriptional regulation, the primary method of regulation is thought to be by activation of the latent protein through post-translational modifications. In fact, all threonine and serine residues in the first 89 amino acids of TP53 can be phosphorylated or dephosphorylated following stress130. The sheer number of TP53 modifications, however, has not quelled doubts about their importance, as it has been shown that mutations at all the known N-terminal and C-terminal phosphorylation sites do not inhibit TP53’s transcriptional activation activity131. While a comprehensive review of all the post-translational modifications of TP53 goes beyond the scope of this introduction, a basic understanding of TP53 modifications with respect to function is helpful.

TP53 lies at the heart of a complex cellular machinery dedicated to detecting and repairing DNA damage. Ionizing radiation (IR) is a common and often potent source of environmental DNA damage. Sources of IR include cosmic radiation, naturally occurring sources (e.g. tritium), as well as artificial sources such as x-ray tubes and radiation therapy. Exposure to IR leads to the accumulation of hydroxyl radicals which cause DNA double-strand breaks and induce a TP53 response. Hydroxyl radicals are also generated as a consequence of cellular respiration. The ATM (ataxia telangiectasia-mutated) gene is crucial to this response. By phosphorylating TP53 at Ser15, ATM impairs the MDM2-mediated repression of TP53132. Chk2, which is itself phosphorylated by ATM, phosphorylates Ser20 of TP53133, further inhibiting MDM2-mediated repression134,135.

Ultraviolet radiation (UV) is another important source of DNA damage, to which one is exposed daily through sunlight. UV-C is the shortest wavelength of UV and, as it directly damages DNA the most, is the most studied. Both UV-B and UV-C radiation frequently result in cis-syn cyclobutane pyrimidine dimers that cause errors during replication resulting in the “classical C-T mutation” seen in many cancerous growths, as well as the generation of reactive oxygen species (ROS). The ATM- and Rad3-related (ATR) gene plays a key role in the cellular response to UV radiation. The ATR gene is both structurally and functionally related to ATM, as both phosphorylate Ser15. While both ATR and ATM are involved in the IR response136 and share a common target motif, only ATR is involved in the UV response137. Evidence of this is seen in ataxia telangiectasia patients, who are not hyper-sensitive to UV radiation as TP53 activation appears to occur normally. A difference in kinetics between the IR and UV responses has also been observed. Lu and Lane showed that UV induced cells mounted a TP53 response in two hours while an IR response is brought about in one hour. While the UV response is mounted more slowly, TP53 levels continue to rise more than three hours after the response, by which time levels are already dropping in an IR-induced response138.

Upon activation TP53 can fulfil its primary role as a transcription factor for a large number of downstream targets. In 1992 el-Deiry et al. used an unbiased approach to identify DNA bound to TP53. They found 18 independent genomic DNA fragments bound to TP53. Each of these possessed two copies of a 10bp motif, 5’-Pu-Pu-Pu-C-A/t-T/a-G-Py-Py-Py-3’, separated by a stretch of no more than 13bp (where Pu are purine residues and Py are pyrimidines)139. Following the sequencing of the human genome, a new technique known as ChIP on chip was developed, combining chromatin immunoprecipitation and high-density oligonucleotide arrays capable of identifying DNA-protein interactions. Extrapolating from results found on chromosomes 21 and 22, Cawley et al. used this technique to predict up to 1600 TP53 binding site regions across the genome140. Subsequent to this, Wei et al. conducted a complete genome scan to find all of TP53’s direct targets. Contradicting the results from Cawley et al., they found only 542 high quality TP53 binding sites141. Using this, the authors were able to compile a list of TP53 target genes. These genes can be broadly classified by TP53’s two primary roles: cell cycle arrest and apoptosis.
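The consensus motif described by el-Deiry et al. translates naturally into a pattern search. The following is a minimal sketch, assuming strict matching with no mismatches tolerated and ignoring the stated preference for A over T (and T over A) at the two central positions. The function name and the example sequence are hypothetical and do not correspond to any real TP53 target.

```python
import re

# One 10bp half-site: Pu-Pu-Pu-C-(A/T)-(T/A)-G-Py-Py-Py
HALF_SITE = "[AG]{3}C[AT][AT]G[CT]{3}"

# Two half-sites separated by a spacer of 0 to 13bp. The lazy quantifier
# pairs each half-site with the nearest downstream half-site.
RESPONSE_ELEMENT = re.compile(f"({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE})")


def find_response_elements(sequence):
    """Return (start, end, spacer_length) for each non-overlapping match,
    using 0-based, end-exclusive coordinates."""
    seq = sequence.upper()
    return [
        (m.start(), m.end(), len(m.group(2)))
        for m in RESPONSE_ELEMENT.finditer(seq)
    ]


# Synthetic example: two half-sites separated by a 5bp spacer.
example = "TTTT" + "AGACATGCCT" + "ACGTA" + "GGGCTAGTTC" + "TTTT"
hits = find_response_elements(example)
```

Known binding sites frequently deviate from the consensus at one or more positions, so a strict pattern of this kind will miss them; position weight matrices are one common alternative to exact matching.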

1.3.5 TP53-mediated Cell Cycle Arrest

Cells can normally arrest in the G1 or G2 phases. Cells lacking TP53, however, do not seem able to arrest in the G1 phase142. GADD45’s association with this TP53-mediated G1 cell-cycle arrest was found following the identification of a TP53 binding site in its promoter and the observation that it is induced following DNA damage. This induction is dependent on wild-type TP53 function and is also not present in cells derived from ataxia telangiectasia patients143. TP53 also exerts strict control over P21 (otherwise known as WAF1 or CDKN1A), a key regulator of the cell cycle. Upon detection of DNA damage during G1, P21 is induced by TP53. It is then transported to the nucleus where it blocks the cell cycle144. It accomplishes this by inhibiting two cyclin-dependent kinases (CDKs): CDK2 and CDC2. This TP53-mediated arrest is critical for keeping cells from replicating errors in the S phase. If DNA damage is detected during the S phase, TP53 induces P21, which will associate with the proliferating cell nuclear antigen (PCNA), halting replication.

1.3.6 TP53-mediated Apoptosis

Programmed cell death, termed ‘apoptosis’, is vital not only to the suppression of cancer, but also to proper development, homeostasis, and immune system maintenance. Strict regulation of this system is therefore necessary throughout all stages of life. While an inability to initiate apoptosis effectively is a major hallmark of cancer145, excessive apoptosis can lead to embryonic death146. A family of cysteine proteases, the caspases, which cleave at an Asp residue, are the primary drivers of apoptosis147. Activated TP53 promotes apoptosis by inducing the expression of the Fas receptor148. When the Fas receptor is present on the surface of the cell, the cell becomes sensitized to the effect of the Fas ligand (FasL). Upon binding with FasL, the receptor binds the Fas-associated death domain protein (FADD) in the cytoplasm. It is FADD that recruits two initiator caspases: caspase-8 and caspase-10. These initiator complexes activate the executioner caspases 3, 6, and 7. It is the cleavage activity of these executioner caspases that is responsible for the morphological changes of apoptosis, including protrusions from the plasma membrane and the collapse and fragmentation of the nuclear structure.

Cytochrome c resides between the inner and outer membranes of the mitochondria but is released into the cytosol when apoptosis is initiated. It then binds APAF1 to form a seven-spoked complex called the apoptosome, which activates procaspase 9 and converts it into caspase 9. Caspase 9 goes on to activate the executioner caspases itself. A family of proteins, the Bcl-2 family, controls the release of cytochrome c by the mitochondria149. Of this family, the Bcl-2 gene itself and four related genes (Bcl-xL, A1, Bcl-w, and Mcl-1) are anti-apoptotic, while the Bax family (Bax, Bak, and Bok) as well as the BH3-only family (Bim, Bik, Bad, Puma, Bid, Noxa, Hrk, and Bmf) are pro-apoptotic. It is the relative quantity of these pro- versus anti-apoptotic genes that determines whether cytochrome c is released from the mitochondria or retained. TP53 is a transcriptional activator of Bax150, NOXA151, and PUMA152, all pro-apoptotic genes whose activation promotes mitochondria-mediated apoptosis.

1.4 Copy Number Variation

1.4.1 Overview

While variation in SNPs has been studied in LFS patients, SNPs represent but one form of common genomic variation. Microsatellites and minisatellites confer significant variation between individuals as well. In the last decade the important role that copy number variations (CNVs) play in genetic diversity has come to light. CNVs are defined as structural genomic variants in which a copy number difference has been observed between two or more genomes153. A CNV is defined as being larger than one kilobase in size, and can span many megabases, at which size it is visible under a microscope. A related classification of structural variation known as “indels” is commonly defined as being between 10-1000bp. The discovery of this prevalent form of genomic variation quickly overturned the idea of a single diploid “reference genome”. The past decade has seen an explosion in data showing just how prevalent CNVs are in the general population. Prior to the significant advances of DNA microarray technology of the last 10 years, only a few copy number variable loci had been identified, such as the alpha-7-nicotinic receptor gene154 at 15q13-15. The initial landmark genome-scale studies of 2004 detected 76 CNVs in 20 individuals155 and 255 CNVs in 55 individuals156. These initial findings were shown to just be the tip of the iceberg. The Database of Genomic Variants (DGV) compiles structural variation detected in healthy samples in CNV studies across the world157. Currently, 109836 merged-level CNVs have been listed in the database. The last few years in particular have seen a dramatic increase in the reporting of CNVs with the advent of ultra-high resolution arrays and whole genome sequencing (Figure 5), encompassing up to 71% of the genome.

Estimating the true size of many of these CNVs can be difficult, particularly with regards to earlier studies, as we now know that the limited resolution of these platforms led to an over-estimation of CNV size. For example, in 2010, Conrad and Pinto et al. used a set of 20 NimbleGen arrays, harnessing 42 million probes in total, which were tiled across the genome at a density of one probe every 50bp. This led to the discovery of 11700 CNVs greater than 443bp in size, with an average of 1117 and 1488 CNVs detected per person in the European and African samples, respectively158. The average number of CNVs per individual was higher, and the average CNV size was lower, than previously estimated. Due to increasing platform resolution, first of CGH arrays and now of next-generation sequencing (NGS), this trend has continued with the reporting of numerous CNVs and indels under 10kb in size in the healthy population (Figure 6).
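The size conventions and probe density quoted in this section can be made concrete with a short sketch. The function names are introduced here for illustration only, and the label "small variant" for events under 10bp is an assumption, since the text does not name that size class.

```python
def classify_variant(size_bp):
    """Label a copy number difference by size, using the definitions
    given in the text: CNV > 1 kilobase, indel = 10-1000bp."""
    if size_bp <= 0:
        raise ValueError("size_bp must be positive")
    if size_bp > 1000:
        return "CNV"
    if size_bp >= 10:
        return "indel"
    return "small variant"  # assumed label; not defined in the text


def tiled_span_bp(n_probes, spacing_bp):
    """Approximate length of sequence covered by evenly spaced probes."""
    return n_probes * spacing_bp


# 42 million probes at one probe every 50bp
span = tiled_span_bp(42_000_000, 50)
```

Here `span` evaluates to 2,100,000,000bp, or about 2.1 Gb of tiled sequence. Under the strict one-kilobase definition, the 443bp lower bound reported in the 2010 study would be labelled an indel, which illustrates that these size thresholds are conventions and not sharp biological boundaries.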

Figure 5- Recently there has been a rapid rise in copy number variation reported to the Database of Genomic Variants157.

Figure 6- The majority of copy number variants reported in the Database of Genomic Variants157 are now under 10kb.

1.4.2 Generation of Copy Number Variants

Structural alterations such as CNVs are often generated as a result of errors in DNA repair. In an attempt to repair DNA breaks, segments of the genome can be fused in such a way that can generate deletions, duplications or inversions159. These repair mechanisms can be broadly divided into those which are homology-mediated and those which are not. Homologous recombination (HR) not only generates new combinations of linked alleles at meiosis, but is central to many DNA repair processes. HR requires significant DNA sequence identity, roughly 50bp in E. coli160 and up to 300bp in mammalian cells161,162. During meiosis the Spo11 protein creates DNA double stranded breaks, thus providing the substrate for HR163. This HR during meiosis allows for the exchange of material between the maternal and paternal chromosomes and is responsible for much of the genetic diversity observed in sexually reproducing organisms. While mitotic HR is functionally similar, it is brought about by DNA double stranded breaks caused by cellular metabolism or external DNA damaging agents. The 5’ ends of the break are resected by nucleases, which leads to the formation of a 3’ tail (a single stranded overhang). It is at this stage that a homologous donor segment is found, allowing for the invasion by the single stranded DNA of the homologous duplex DNA, displacing a strand and creating a D-loop. The strand exchange is catalyzed by RAD51 in both meiotic and mitotic HR, while DMC1 is only active during meiotic HR. The synthesis of DNA from the invading strand can now begin with the donor DNA acting as a template, and the second end of the double strand break is “captured”, creating two Holliday junctions which are then cut and ligated to seal the nicks. The critical role of HR in maintaining genomic integrity can be seen in cells with suppressed RAD51 expression, as they rapidly accumulate structural abnormalities and cease to divide164. While HR is generally a robust DNA repair mechanism capable of repairing double stranded breaks accurately, repetitive elements in the genome can lead to recombination between homologous segments at different chromosomal locations. This is known as non-allelic homologous recombination (NAHR) and is often associated with low-copy repeats, known as segmental duplications. These segmental duplication-mediated NAHR events have been associated with a number of genomic disorders, including 24kb flanking repeats in Charcot-Marie-Tooth disease165 and ~200kb repeats in Smith-Magenis syndrome166.
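The way in which recombination between two copies of a repeat can produce a copy number change is easiest to see in a toy model. The sketch below treats a chromosome segment as a plain string with two direct copies of a repeat flanking a unique region, and returns the two reciprocal products of an unequal crossover between the misaligned repeats. The function name and the sequences are hypothetical, and the model ignores inverted repeats (which yield inversions) as well as the molecular details of strand exchange.

```python
def nahr_products(left, repeat, middle, right):
    """Toy model of unequal crossover between two direct repeats.

    The original arrangement is left-repeat-middle-repeat-right.
    Returns (original, deletion_product, duplication_product).
    """
    original = left + repeat + middle + repeat + right
    # One product loses the unique middle region and one repeat copy.
    deletion = left + repeat + right
    # The reciprocal product gains an extra middle region and repeat copy.
    duplication = left + repeat + middle + repeat + middle + repeat + right
    return original, deletion, duplication


original, deletion, duplication = nahr_products(
    left="AAAA", repeat="GATTACA", middle="CCCCCC", right="TTTT"
)
```

In this model the deletion product is shorter than the original by exactly the length of the middle region plus one repeat, and the duplication product is longer by the same amount, which is why the region between flanking low-copy repeats is the part whose copy number changes.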

Synthesis-dependent strand annealing and break induced replication (BIR) represent other forms of homology-mediated DNA repair processes. Synthesis-dependent strand annealing is similar to the HR model described above, but rather than forming a double Holliday junction by capturing the second DNA end, the synthesized strand of DNA is displaced from the D-loop, allowing it to fuse with the other end of the break. The BIR mechanism comes into play upon the collapse of DNA replication forks, resulting in one-ended double stranded breaks167. When a replication fork encounters a nick in the template strand this mechanism is engaged. One strand of the fork breaks off and is resected, revealing a one-sided 3’ overhang. This invades a homologous strand to form a D-loop. A new replication fork is then formed upon which both the leading and lagging strands are synthesized. This process is normally faithful, but if the repair involves a homologous sequence at a different chromosomal location, structural alterations can result. There is in fact growing evidence that BIR constitutes a major mechanism of CNV formation159.

While homology mediated mechanisms are thought to account for the majority of CNV formation through DNA repair, non-homology mediated pathways can also repair DNA double stranded breaks and could thus lead to structural alterations. The most prominent of these mechanisms is known as non-homologous end joining (NHEJ). In non-dividing haploid organisms or in diploid organisms that are not in S phase, there is no homologous donor nearby that can be used in the homology mediated repair mechanisms. NHEJ provides a repair mechanism that is commonly used in these instances to rejoin double stranded breaks into a contiguous product168. In vertebrates, the Ku protein binds to DNA to form a complex with which a nuclease, polymerase and ligase can interact. This is a flexible process that can lead to many different junction products, with each side potentially having different nucleotides resected or added. When there is a small amount of homology at the ends of the DNA, they can be joined together. NHEJ has been shown to be the driving process behind a number of genomic disorders. In a breakpoint characterization of 39 deletions at the dystrophin gene, for example, unequal homologous recombination was very poorly represented while junction features such as short homologous segments were common, suggesting many of the pathogenic structural alterations were brought about by NHEJ169.

A more recently discovered mechanism known as ‘fork stalling and template switching’ (FoSTeS) has been implicated in the demyelinating disorder Pelizaeus-Merzbacher disease170. In FoSTeS the replication fork stalls, causing the lagging strand to dissociate from the original template and anneal to another replication fork nearby, after which DNA synthesis is reinitiated. The location of the new fork will dictate whether a deletion or duplication (upstream or downstream, respectively) will occur. After stalling of the first fork, the dissociated strand invades the new template by making use of small microhomologous segments. The repetitive dissociation of a nascent strand and reinitiation of DNA synthesis on the original template can cause complex rearrangements to arise171. Of note is that this is a replication-based mechanism, challenging the notion that congenital pathogenic CNVs are all of meiotic origin. Further evidence of the potential impact of non-meiosis driven CNVs can be seen in monozygotic twins that have different CNVs172. The origin of these different CNVs would have to have been in somatic cells during mitosis.

1.4.3 Germline CNVs in Disease

Copy number variants do in fact appear to have significant effects on the transcription of genes by either affecting dosage directly or by disrupting proximal or distant regulatory regions173. Pathogenic CNVs have often been observed containing multiple genes. The diseases caused by such genomic rearrangements are known as “genomic disorders”174. Because of this, genomic disorders often present with a broad and varied phenotype, such as Prader-Willi syndrome, which is associated with a 15q11-q13 deletion involving many genes and manifests itself with various mental and physical effects. Beginning with karyotyping and later with early CGH arrays, identifying the genetic cause of genomic disorders has relied on low resolution detection methods which would be biased towards the detection of large, multi-gene CNVs. Particularly with the increased use of whole genome sequencing in the clinic as well as high resolution arrays, the importance of rearrangements of just one gene, or even just one part of one gene, in genomic disorders will be better understood. Interestingly, the effect of a pathogenic CNV may not be limited to the gene, or genes, it contains. Williams-Beuren syndrome is a rare neurodevelopmental disorder associated with a deletion at 7q11 involving up to 28 genes. Upon measuring expression levels of the genes as well as those of the surrounding regions, it was unexpectedly observed that even genes far outside the deleted region (thus having a copy number of 2) also had reduced expression levels175. These results suggest that flanking genes even several megabases away from a genomic rearrangement should be considered to have a potential role in the observed phenotype, despite being copy number neutral.

1.4.4 CNVs in Cancer Predisposition

Retinoblastoma presented one of the earliest examples of a genomic rearrangement causing cancer predisposition when individuals with retinoblastoma presented with cytogenetically visible deletions176,177 at 13q14, which led to the mapping of the RB1 gene to this region. There are now approximately 100 genes known to cause Mendelian-inherited syndromes when mutated69. Roughly 40% of these genes have been observed to harbour deleterious CNVs in individuals affected by these cancer predisposition syndromes178. Whole-genome CNV profiling in high-risk individuals has recently revealed a number of candidate regions and genes, including in individuals predisposed to colorectal cancer179, breast cancer180, and melanoma181.

Identifying common CNVs that confer a moderate risk of cancer is significantly more difficult, as each variant may confer only a slight increase in risk and many CNVs encompass genes that mediate interaction with the environment. In these cases the detection of a pathogenic CNV would depend on the presence or absence of certain environmental triggers. The effect of CNVs being modulated by environmental factors is seen in the drug response of detoxification genes182,183. It is perhaps surprising that a significant number of cancer genes have been identified in CNV regions. Previously, our own lab found 49 cancer genes that were directly encompassed or overlapped by a CNV in more than one person from a large reference population29. As the resolution of the platform used was quite modest (the mean CNV size was 206kb), one would expect the true number of cancer genes encompassed or overlapped by CNVs to be much higher. The DGV contains numerous CNVs encompassing these genes. An interesting example of this is copy number variation in RAD51L1, a gene that is vital for DNA repair by homologous recombination and has been shown to harbour a SNP that is associated with breast cancer184.
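The distinction between a gene that is "encompassed" by a CNV and one that is merely "overlapped" comes down to a comparison of genomic intervals. The following is a minimal sketch, assuming 1-based inclusive coordinates; the `Interval` type, the `relation` function and all coordinates shown are hypothetical and are not taken from the study cited above.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def relation(gene, cnv):
    """Classify how a CNV relates to a gene: 'encompassed' if the whole
    gene lies inside the CNV, 'overlapped' if only part of it does,
    and 'none' if the two intervals do not intersect."""
    if gene.chrom != cnv.chrom or gene.end < cnv.start or cnv.end < gene.start:
        return "none"
    if cnv.start <= gene.start and gene.end <= cnv.end:
        return "encompassed"
    return "overlapped"


# Hypothetical coordinates for illustration only.
gene = Interval("chr1", 100_000, 150_000)
large_cnv = Interval("chr1", 50_000, 256_000)    # contains the whole gene
partial_cnv = Interval("chr1", 140_000, 145_000)  # lies within the gene
distant_cnv = Interval("chr1", 300_000, 400_000)
```

A low resolution platform that mostly reports large events will tend to find the "encompassed" case, whereas small intragenic events such as `partial_cnv` are only detectable at higher resolution, in keeping with the expectation stated above that the true number of affected cancer genes is higher.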

While common cancer SNPs and CNVs likely contribute to mild cancer predisposition in much of the population, it is the rare, high-risk CNVs that are associated with highly penetrant cancer predisposition syndromes. The majority of these syndromes are thought to be caused primarily by base-pair sized germline mutations. However, PCR-based sequencing often leaves genomic rearrangements undetected, and the rise of copy number detection methods such as CGH arrays and multiplex ligation-dependent probe amplification (MLPA) has led to an increased appreciation of the role of copy number gains and losses in these syndromes. A summary of genes associated with cancer syndromes in which CNVs have been observed can be found in table 2. While it appears that point mutations and CNVs often result in a similar phenotype, there are cases where CNVs confer a phenotype that is similar to, but seemingly altered from, that of point mutations. When one copy of the entire APC gene is deleted, for example, it appears to result in an attenuated form of familial adenomatous polyposis (FAP) compared to the more severe cases seen in individuals harbouring a point mutation in APC185. A number of studies have assessed the potential role of CNVs in individuals who tested negative for mutations in genes associated with their respective cancers. These include hereditary pancreatic186, colorectal179, and breast180 cancers.

While these studies have not yet identified causative genes, a number of rare CNVs have been identified in each of them. In both the breast and colorectal cancer studies, a shared CNV encompassing KIAA1797 and MIR491 at 9p21.3 was reported. The appearance of this locus in two separate cancer predisposition screens provides evidence of a possible role in cancer predisposition. Another interesting example of a rare CNV’s potential role in cancer risk was provided by Yang et al., who reported a rare 4q13 duplication in a melanoma-prone family. Although this duplication is unique to this family, it segregates with melanoma in the three affected individuals181. This region contains 10 genes, two of which, CXCL1 and IL-8, have been shown to stimulate melanoma growth187,188. There are also indications that CNVs may be responsible for some of the variation seen within cancer syndromes. A study of BRCA1-associated ovarian cancer individuals detected significantly more copy number losses in the BRCA1 group compared to sporadic ovarian cancer cases and controls. The BRCA1 group also showed CNVs at 31 previously unknown regions189. The implications of this presumed genomic instability are intriguing. Does the primary mutation result in an unstable genome prone to de novo CNV formation? Or perhaps the CNVs are directly involved in tumour formation. These two concepts are not necessarily mutually exclusive. It is possible that the primary mutation leads to accelerated CNV generation and that some of these CNVs then go on to influence the phenotype, resulting perhaps in a more severe presentation.

Gene | Syndrome | Tumour Types
APC | Adenomatous polyposis coli; Turcot syndrome | Colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS cancers
BMPR1A | Juvenile polyposis | Gastrointestinal polyps
BRCA1 | Hereditary breast/ovarian cancer | Breast, ovarian
BRCA2 | Hereditary breast/ovarian cancer | Breast, ovarian, pancreatic, leukemia (FANCB, FANCD1)
CDH1 | Familial gastric carcinoma | Gastric, breast
CDKN1B | Multiple endocrine neoplasia type IV | Pituitary tumor, testicular tumor
CDKN2A | Familial malignant melanoma | Melanoma, pancreatic
CHEK2 | Familial breast cancer | Breast, prostate
CREBBP | Rubinstein–Taybi syndrome | Nervous system, brain, leukemia
CYLD | Brooke–Spiegler syndrome, familial cylindromatosis, multiple familial trichoepithelioma | Multiple skin appendage tumors
EPCAM | Lynch syndrome | Colorectal, endometrial
EXT1 | Multiple exostoses type 1 | Exostoses, osteosarcoma
EXT2 | Multiple exostoses type 2 | Exostoses, osteosarcoma
FANCA | Fanconi anemia A | Acute myeloid leukemia
FH | Hereditary leiomyomatosis and renal cell cancer | Leiomyomatosis, renal
FLCN | Birt–Hogg–Dubé syndrome | Renal cell carcinoma
GPC3 | Simpson–Golabi–Behmel syndrome | Wilms’ tumors
HRPT2 | Hyperparathyroidism–jaw tumor syndrome | Parathyroid carcinoma, renal cell carcinoma
JAG1 | Alagille syndrome | Hepatocellular carcinoma, papillary thyroid carcinoma
MADH4 | Juvenile polyposis | Gastrointestinal polyps
MEN1 | Multiple endocrine neoplasia type 1 | Parathyroid adenoma, pituitary adenoma, pancreatic islet cell, carcinoid
MSH2 | Lynch syndrome | Colorectal, endometrial, ovarian
MSH6 | Lynch syndrome | Colorectal, endometrial, ovarian
NF1 | Neurofibromatosis type 1 | Neurofibroma, glioma
NF2 | Neurofibromatosis type 2 | Meningioma, acoustic neuroma
NSD1 | Sotos syndrome | Increased risk of benign or malignant tumors, including neuroblastoma and gastric carcinoma
PMS2 | Lynch syndrome; Turcot syndrome | Colorectal, endometrial, ovarian, medulloblastoma, glioma
PRKAR1A | Carney complex | Myxoma, endocrine, papillary thyroid
PTCH1 | Gorlin syndrome | Skin basal cell, medulloblastoma
PTEN | Cowden disease; Lhermitte–Duclos syndrome | Breast cancer, leukemia, renal cell adenocarcinoma, neuroendocrine carcinoma, Merkel cell carcinoma
RB1 | Familial retinoblastoma | Retinoblastoma, sarcoma, breast, small cell lung
RUNX1 | Familial platelet disorder | Acute myeloid leukemia
SDHB | Familial paraganglioma | Paraganglioma, pheochromocytoma
SDHC | Familial paraganglioma | Paraganglioma, pheochromocytoma
SDHD | Familial paraganglioma | Paraganglioma, pheochromocytoma

SMAD4 Juvenile polyposis syndrome Colon, stomach, small bowel and

pancreas

SMARCB1 Rhabdoid tumor predisposition syndrome-1 Schwannomas, malignant rhabdoid

STK11 Peutz–Jeghers syndrome Jejunal harmartoma, ovarian,

testicular, pancreatic

TP53 Li–Fraumeni syndrome Breast, sarcoma, adrenocortical

carcinoma, glioma, multiple other

tumor types

TSC1 Tuberous sclerosis 1 Hamartoma, renal cell

TSC2 Tuberous sclerosis 2 Hamartoma, renal cell

VHL von Hippel–Lindau syndrome Renal, hemangioma,

pheochromocytoma

WT1 Denys–Drash syndrome, Frasier syndrome,

Familial Wilms’ tumor

Wilms’ tumor

Table 2- Known cancer predisposition genes with rare copy number variants and their

associated syndromes. Adapted from178


1.5 Rationale

Identification of pathogenic genes is vital in the management of patients with cancer

predisposition syndromes. A genetic cause remains to be identified for a quarter of all LFS cases

and the majority of LFL cases. Current technology allows for the high-throughput genetic

screening of TP53 wild-type individuals. The use of a custom CGH array will allow us to

interrogate hundreds of genes at exon-level resolution that are likely to be involved in the

LFS/LFL phenotype. Using this custom array we can expect to detect a number of copy number

variable regions in our genes of interest. Identification of copy number variants in these genes

will hopefully lead to the identification of a handful of candidate genes that are the underlying

cause of the highly penetrant cancer predisposition seen in many LFS/LFL patients which thus

far remains unexplained.


Chapter 2: Materials and Methods

2.1 Genes of Interest

Due to the high penetrance of the cancer phenotype in the TP53-WT families, it was

decided that it was best to focus on cancer-related genes as opposed to a more conventional

genome-wide approach. This allowed for a significantly higher resolution than would be feasible

with a standard genome-wide CGH array. The Wellcome Trust Sanger Institute’s Cancer Gene

Census is an ongoing effort to catalogue those genes for which mutations have been causally

implicated in cancer. As of March 2014, it includes 522 genes, 20% of which have had mutations

in the germline observed190. When the custom array was being designed for this study, the list

included 427 genes (Supplementary Table 1). As the phenotypes of the TP53-WT families are

similar to those who harbour TP53 mutations, p53-pathway genes were also of interest. As p53

has hundreds of binding partners, it was necessary to limit which genes were to be included in

the array. Ingenuity Pathways Analysis (IPA; Ingenuity Systems Inc., USA) was used to identify

genes primarily involved in p53-regulated cell cycle control. From the IPA, an additional 16

genes were selected based on their relevance to cancer biology and their interaction with p53

(Supplementary Table 2). Collectively, this expanded the total number of genes of interest to 443

genes.

2.2 Array Design

2.2.1 Design Overview

The initial expectations for the custom array were that it would have a resolution of at

least 300bp in the exons, 5kb within the introns, and 1kb in the defined promoter region (defined

as being at least 5kb from the transcription start site) in the genes of interest as well as ~100kb


across the rest of the genome. Based on previous experience with the custom Agilent platform

and Partek’s Genomics Suite, it was decided that a minimum of three probes was required to

make a reliable call in a given region. This meant that a maximum inter-probe distance of 100bp,

~1660bp, 333bp, and ~33kb was required for the exons, introns, promoter regions, and across the

genome, respectively. The Agilent 4x180k format (4 identical arrays of ~180 thousand probes

each per slide) was selected as the most appropriate to achieve the desired resolution in a cost-

effective manner.
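The relationship between the desired resolution and the required probe spacing can be illustrated with a short calculation. This is only a sketch of the arithmetic described above, assuming that the spacing is simply the target resolution divided by the three probes needed for a call; the dictionary keys and the function name are illustrative and are not taken from any design software.

```python
# Sketch: largest probe spacing that still places MIN_PROBES probes inside
# a region the size of the desired resolution (assumed rule: resolution / probes).
MIN_PROBES = 3  # probes needed for a reliable call, as stated in the text

# Desired resolutions from the design overview, in base pairs.
desired_resolution_bp = {
    "exons": 300,
    "introns": 5_000,
    "promoters": 1_000,
    "genome": 100_000,
}


def max_spacing(resolution_bp, min_probes=MIN_PROBES):
    """Largest inter-probe distance compatible with the target resolution."""
    return resolution_bp / min_probes


spacing = {region: max_spacing(bp) for region, bp in desired_resolution_bp.items()}

for region, bp in spacing.items():
    print(f"{region}: {bp:.0f} bp")
```

The same rule, applied in reverse (three times the achieved spacing), is what gives the estimated resolutions quoted for each probe class in the following sections.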

2.2.2 Design of Exonic Probes

The exonic regions of the genes of interest were the focus of the array; therefore a special

effort was made to ensure the highest possible probe density in these regions in particular.

However, Agilent recommends that probes not be placed with an inter-marker distance of less

than 100-150bp. This is due to two problems that increase in severity as the probe density

increases. First, if there is more than one probe per 100-150bp segment of

DNA, any given fragment of DNA may have more than one probe it can bind to with perfect

complementarity. This can result in noisy data as multiple probes may be competing to bind to

one fragment of DNA. The second concern is that the average quality of probes tends to decrease

as density increases. If one attempts to place many probes in a small area, it must be expected

that some of the probes will not be ideal with regard to sequence similarity and melting

temperature. These caveats had to be considered when attempting to meet the tiling-level

resolution that was desired. For these reasons, true high-density, tiling-resolution CGH arrays are

relatively rare, although they have been run successfully22,158.

Exon regions were defined using the Consensus Coding Sequence Project (CCDS) which

is a collaborative effort between the National Center for Biotechnology Information, the


European Bioinformatics Institute, the University of California, Santa Cruz, and the Wellcome

Trust Sanger Institute to agree upon a consistent set of protein coding genes for human and

mouse for public use191. Each coding exon of each splice variant of each of the 443 genes of

interest was defined this way. As it would be ideal to have probes covering the exon/intron

boundary, an additional 50bp was added to each end of these coordinates (100bp total) which

were based on the hg19 human genome assembly. This resulted in a defined “exon region” that

encompassed in all roughly 1.6 million bp with an average size of 266bp.
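The construction of the padded exon regions can be sketched as follows. This is an illustrative example only: the coordinates are made up, the function name `pad_and_merge` is hypothetical, and the merging of overlapping padded intervals (for example from different splice variants) is an assumption rather than a documented step of the design.

```python
def pad_and_merge(exons, pad=50):
    """Pad each (chrom, start, end) interval by `pad` bp on both sides and
    merge intervals on the same chromosome that then overlap or touch."""
    padded = sorted((c, max(0, s - pad), e + pad) for c, s, e in exons)
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region: extend it instead of adding a new one.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up coordinates for illustration only.
exons = [("chr17", 1000, 1100), ("chr17", 1150, 1300), ("chr17", 5000, 5200)]

regions = pad_and_merge(exons)
total_bp = sum(end - start for _, start, end in regions)
mean_size = total_bp / len(regions)

print(regions, total_bp, mean_size)
```

Summing the lengths of the resulting regions and dividing by their count corresponds to the total and average region sizes reported above.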

Custom probes were designed using Agilent’s eArray program using the “Genomic

Tiling” option. The average probe spacing selected for these coordinates was 35bp. The program

was set to avoid repeat regions of the genome as well as AluI and RsaI restriction sites, as these

enzymes are used in the standard Agilent protocol to digest the genomic DNA. The preferred

probe melting temperature was 80˚C. Probes were “trimmed” in order to

ensure that the melting temperatures were as close to 80˚C as possible. Probes can be trimmed

from the default size of 60bp to a minimum of 45bp. This resulted in an output of 30966 probes

tiled across the defined exon regions.

The eArray software can calculate probe performance scores for non-catalogue probes to

predict how likely it is that they will produce a good log ratio response when used on the Agilent

CGH platform. These scores are based on the GC content, melting temperature, sequence

complexity and metrics to measure homology with the rest of the reference genome. These

factors are taken into account and the probe is given a score between 0 and 1. The average probe

score of all the Agilent Catalogue probes is 0.759. The custom probes were scored and all probes

with a score under 0.3 were eliminated. This resulted in a total of 28373 probes tiling the defined

exon regions with an average score of 0.615. The average probe size after trimming was 51bp. In


total, the probes covered 1.1 million base pairs, or 70% of the defined exon regions. The actual

average probe spacing within these intervals was 55bp as there were many small stretches within

the defined regions that were not amenable to quality probe placement. According to the

previously mentioned standards, this gave a predicted average resolution of 165bp.
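The filtering step and the resulting resolution estimate can be sketched in a few lines. The probe positions and scores below are invented for illustration, and the variable names are hypothetical; only the 0.3 score cut-off and the three-probe rule come from the text.

```python
from statistics import mean

SCORE_CUTOFF = 0.3  # probes scoring below this were eliminated
MIN_PROBES = 3      # probes required for a reliable call

# Hypothetical probes within one exon region: (start position, performance score).
probes = [
    (100, 0.82), (140, 0.25), (170, 0.61),
    (230, 0.44), (265, 0.12), (290, 0.70),
]

# Keep only probes at or above the score cut-off.
kept = [(pos, score) for pos, score in probes if score >= SCORE_CUTOFF]

# Actual spacing between the surviving probes, which is wider than the requested one.
positions = sorted(pos for pos, _ in kept)
gaps = [b - a for a, b in zip(positions, positions[1:])]
mean_spacing = mean(gaps)

# Estimated resolution: the span needed to contain MIN_PROBES probes on average.
estimated_resolution = MIN_PROBES * mean_spacing

print(len(kept), round(mean_spacing, 1), round(estimated_resolution))
```

With the figures reported for the array, the same calculation gives 3 × 55bp = 165bp.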

The probe design protocol above was repeated for many different probe spacing inputs

from 8bp to 150bp. The 35bp input was determined to be the best based on a good balance of

probe density to probe quality. A previous custom Agilent array used by the lab had similar

probe spacing22. The 35bp set spacing resulted in a sufficient actual density (average 55bp

inter-probe distance) to allow for a resolution well below the minimum desired resolution while still

maintaining a high percentage of acceptable probes (28373 out of 30966 with an average quality

of 6.2/10).

2.2.3 Genomic Probes

A modest genome-wide resolution was desired in order to detect any large, multi-gene

copy number alterations that would otherwise go undetected by focusing only on the 443 genes

of interest. The probes were designed last and were thus used to fill the rest of the array (~48%

of probes). Due to the very low density required, selecting from the more reliable Agilent

Catalogue probes was much more desirable than custom probe design. Probes were selected to

span all 22 autosomal chromosomes as well as the X chromosome, excluding the centromeric

regions. The Agilent Catalogue probes were selected based on a shared melting temperature of

80˚C as well as a similarity score filter that filters out probes with secondary genomic alignments

that could potentially impact probe performance (the most stringent filter). The desired average

spacing was 40kb. This resulted in 75100 probes spread across the entire genome, giving an

estimated resolution of ~120kb.


2.2.4 Non-coding Exon Probes

Deletions of the untranslated regions (UTRs) of genes have previously been implicated in

cancer, including in the germline of some cancer syndromic patients192. As these deletions need

not be large in order to demonstrate a significant effect, it was deemed necessary to investigate

these regions at high resolution. The UTRs of all the RefSeq transcripts were obtained via the

UCSC genome browser. This resulted in 1002 independent regions with a total of 1.1 million bp.

As with the coding exons, an additional 50 base pairs were added to either side of these

coordinates. The coordinates were used to design custom probes in the Agilent eArray software

in a manner similar to the coding exons. The average probe spacing selected was 75 base pairs.

As with the exon regions, the probes were designed to be trimmed and exclude repeat regions as

well as the relevant restriction sites, and to have a melting temperature around 80˚C. This

resulted in a total of 11106 custom probes being placed across all the defined UTR regions.

Roughly 57% of the defined area was covered with an actual average spacing of 100 base pairs

with a mean probe size of 52bp. This resulted in an average estimated resolution of roughly 300

base pairs, though it should be noted that the spacing in the majority of regions was close to 75bp

with the exception of repeat regions within the defined area (increasing the calculated average).

2.2.5 Promoter Region Probes

Upstream deletions, even at a considerable distance from the transcriptional start site,

have previously been shown to alter gene function. For this reason, the regions upstream of our

genes of interest were also interrogated by the array. Two separate “promoter regions” were

defined: the first being 5kb upstream of the transcriptional start site, the second being 5kb

upstream from the first (a total of 10kb from the TSS). The overlap between these regions and

the UTR regions allows for high probe density around the transcriptional start site due to the high


density of the UTR probes. Custom probes were designed for these two regions in the same

manner described above. The first region was to have an average spacing of 75 base pairs; the

second was to have an average spacing of 150bp. This resulted in a total of 13714 probes being

placed in the first region and a further 6145 probes being placed in the second for a total of

19858 probes in the defined promoter region. Roughly 27% of the defined promoter region was

covered (37% and 16.7% for the first and second regions, respectively). The actual average

spacing for the first region was 162 base pairs, and for the second, 361 base pairs. This resulted

in estimated resolutions of 486 and 1083 base pairs, respectively. The cause of the significantly

lower probe density than the requested input is the large number of repeat regions of varying

sizes spread across these regions.
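The definition of the two promoter regions can be expressed as a small coordinate calculation. This is a sketch under the assumption that "upstream" is interpreted relative to the strand of the gene; the function name `promoter_windows` and the example coordinates are hypothetical.

```python
def promoter_windows(tss, strand, window=5_000):
    """Return (proximal, distal) upstream windows as (start, end) tuples.

    The proximal window covers the first `window` bp upstream of the TSS and
    the distal window covers the next `window` bp. Strand handling is an
    assumption: upstream means lower coordinates on '+', higher on '-'.
    """
    if strand == "+":
        proximal = (tss - window, tss)
        distal = (tss - 2 * window, tss - window)
    elif strand == "-":
        proximal = (tss, tss + window)
        distal = (tss + window, tss + 2 * window)
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return proximal, distal


# Made-up transcription start sites for illustration.
plus_windows = promoter_windows(20_000, "+")
minus_windows = promoter_windows(20_000, "-")

print(plus_windows, minus_windows)
```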

2.2.6 Gene Intron Probes

In order to detect either large alterations in the introns that may influence gene function

or smaller alterations that extend only slightly into the defined exon regions, a modest resolution

across the introns of the genes of interest was desired. As the desired probe density was

relatively low, it was possible to select probes from the Agilent Catalogue of pre-made probes.

The entire RefSeq gene coordinates were used with eArray picking probes at a 1.25kb

intermarker spacing while excluding the previously defined exon regions. The criteria used to

pick these probes were the same as those described above for the genomic probes. This resulted

in a total of 28404 probes being placed evenly across the intronic regions of the genes of interest.

With a roughly 1.25kb intronic spacing, the estimated resolution was 3.75kb. As intronic regions

were flanked by exonic probes, the resolution of intronic regions smaller than 3.75kb was

correspondingly higher.


2.2.7 Finalizing the Array

In addition to the requisite quality control (QC) probes, an additional 3576 Agilent

Catalogue probes were placed evenly across the genome in order to fill the final few spots on the

array. This led to an array with 166 417 experimental probes with a total of 180 880 array

features including the QC probes. All regions were checked computationally and visually using

the University of California, Santa Cruz (UCSC) Genome Browser.
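As a consistency check, the probe counts reported in sections 2.2.2 to 2.2.7 can be tallied against the stated totals. The category labels below are informal descriptions chosen for this sketch; the counts are those given in the text.

```python
# Probe counts per class as reported in the array design sections.
probe_counts = {
    "coding exon (custom)": 28_373,
    "genome-wide (catalogue)": 75_100,
    "UTR (custom)": 11_106,
    "promoter (custom)": 19_858,
    "intron (catalogue)": 28_404,
    "genome-wide fill (catalogue)": 3_576,
}

TOTAL_FEATURES = 180_880  # all array features, including QC probes

experimental = sum(probe_counts.values())
qc_features = TOTAL_FEATURES - experimental

for name, count in probe_counts.items():
    print(f"{name}: {count} ({count / experimental:.1%})")
print("experimental probes:", experimental)
print("remaining QC features:", qc_features)
```

The sum reproduces the reported number of experimental probes, and the difference from the total feature count gives the number of features left for QC probes.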

Figure 7- Custom probes of our custom CGH array are highlighted in green while copy

number probes from the Affymetrix Genome-Wide Human SNP6.0 are highlighted in red. The

custom array provides a much greater probe density in the coding regions in genes of interest.

2.3 Samples

2.3.1 Sample Selection

Samples were selected from families who fit the previously mentioned revised

Chompret Criteria. In total, whole genomic DNA extracted from 32 individuals from 22 kindreds


was selected to be run on the custom CGH array. All individuals were shown not to harbour

TP53 coding mutations or TP53 duplications/deletions by Sanger sequencing of the entire coding

region of the gene as well as up to 100bp into the introns, and by multiplex ligation-dependent

probe amplification (MLPA). Samples from five healthy teenagers (median age = 15 years) were obtained

from the biorepository of the Montreal Neurological Institute. None of the control patients have a

family history of cancer, including second-degree relatives. In order to confirm the accuracy of the

array, three samples with confirmed deletions at the TP53 locus were also run. These samples

carried small (~2.2kb), medium (~15kb), and large (~1Mb) deletions that represented the

range of expected alterations. This brought the total number of samples run to 40: 32 affected, 5

healthy controls, and 3 TP53 positive controls. The ages at diagnosis and tumour types can be

found in table 3. As Agilent CGH arrays are run with a sample of interest and reference

simultaneously in order to set a baseline copy number state, using a proper reference is vital. It

was decided that a pool of older individuals with no history of cancer would best serve this

purpose. While larger pools can mask common alterations, they help to ensure that uncommon

alterations are not misrepresented or missed entirely. The Centre for Applied Genomics (TCAG)

has developed the Ontario Population Genomics Platform (OPGP). The OPGP consists of

approximately 2600 DNA samples randomly selected from a predominantly Ontario-based

control cohort sorted into 96-well plates. The samples are prepared from permanently stored

EBV-transformed lymphoblastoid cell lines. From the OPGP, reference pools of 40 samples

from each sex were selected on a basis of having a medical history free of cancer, diabetes, and

mental illness. The average age of each pool was 60.1 years and 67.2 years for males and

females, respectively. Each pool contained 38/40 Caucasian individuals, a similar proportion to

the cases.


Sample ID | Sex | Related Samples | dx Age | Tumour
1 | F | | 16 | OS-TP53 deletion control
2 | F | | n/a | TP53 deletion control
3 | F | | 17 | ADCC- TP53 deletion control
4 | F | 5 | 13 | ARMS
5 | M | 4 | 17 | OS
6 | F | | 45 | Breast
7 | F | | 2 | ADCC
8 | F | | 2 | medulloblastoma, thyroid, SCLT, meningioma
9 | M | 33 | 0.1 | neuroblastoma
10 | F | | 1 | CPC
11 | F | 12,16 | 38 | Melanoma
12 | M | 11,16 | 54 | OligodenLG
13 | F | | 40 | Breast
14 | M | | 6 months | CPC
15 | M | | 34 | CPC
16 | M | 11,12 | 6 | pNET
17 | M | 18,19 | 13 | OS
18 | M | 17,19 | 7 | ERMS
19 | F | 17,18 | n/a | Unaffected (Syndactyly)
20 | F | 21 | 37 | Breast
21 | F | 20 | 1 | Medulloblastoma
22 | M | | 19 | OS
23 | M | 24,25 | 2 | ERMS
24 | F | 23,25 | 40 | Breast
25 | M | 23,24 | 43 | GliomaLG
26 | M | 33 | 5 | CPC
27 | F | | 4 | Astrocytoma
28 | F | | 5 | NB
29 | F | | 39 | Uterus Sarcoma
30 | F | | | HEALTHY CONTROL
31 | F | | 42 | Breast
32 | M | | 40 | Astrocytoma
33 | F | 9 | 34 | Cervix
34 | M | 36 | 38 | DCIS
35 | F | | 43 | Breast
36 | F | 34 | 8 | GBM

39 | M | | | HEALTHY CONTROL
40 | M | | | HEALTHY CONTROL

Table 3- Summary of all 40 samples run on the array including sex, related samples, age

at diagnoses (dx) and tumour type.


2.3.2 Subject Recruitment

Written informed consent was obtained for all 40 samples of interest prior to the

extraction of DNA from peripheral blood leukocytes. For patients with cancer, samples were

obtained prior to the initiation of therapy. All OPGP samples have been re-consented for

anonymized use as controls in genetic studies. All 80 samples used in the sex-matched reference

pools were taken from plate 19 of the OPGP. Before being run, DNA was quantified using a

NanoDrop Spectrophotometer (NanoDrop, Wilmington, DE) and the quality was assessed by

agarose gel electrophoresis. This study was approved by the Research Ethics Board at the

Hospital for Sick Children in Toronto.

2.4 Analysis and Validation of CGH Array Data

2.4.1 Custom CGH Array Analysis

All samples of interest were analysed using the Custom Agilent 4x180k CGH array

described above (Agilent Technologies, Santa Clara, CA). Both samples and references were

restriction enzyme digested, purified, labelled, and hybridized according to the manufacturer’s

protocol at TCAG. Arrays were run in four blocks: Samples 1-8, 9-16, 17-31, and 32-40 over a

span of roughly 6 months. After verifying the positive controls using Agilent Genomic

Workbench 5.0, the bulk of the array analysis was done using Partek Genomics Suite 6.6 (Partek,

St. Louis, MO). To detect small alterations, the Partek Genomic Segmentation model was used.

Called segments were defined as having a minimum of three markers showing means either

under a copy number of 1.6 or over 2.4, with a p-value of 0.01 and signal-to-noise threshold of 0.3.

This substantial list of alterations was then filtered by excluding any alterations detected in any

of the five healthy controls, as well as any alteration found in only one sample. The resultant list

of alterations defined our primary regions of interest. In an effort to identify alterations found


only in one sample but likely to be accurate, another segmentation analysis was done to identify

larger copy number altered segments with convincing means and probe frequencies. Called

segments were defined as having a minimum of 15 markers showing means either under a copy

number of 1.4 or over 2.6, with a p-value of 0.001 and signal-to-noise threshold of 0.5. As before,

alterations seen in the healthy controls were eliminated; however, alterations present in only one

sample were included. Both segmentation analyses were shown to accurately identify all three

positive controls.
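The post-segmentation filtering logic of the two analyses can be sketched as follows. This is not the Partek implementation: the segmentation itself, the p-value, and the signal-to-noise threshold are not modelled, loci are matched by a simple label, and the `Segment` class, function names, sample IDs, and loci are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    sample: int      # sample ID
    locus: str       # simplified label for the altered region
    markers: int     # number of probes in the segment
    mean_cn: float   # mean copy number of the segment


def is_called(seg, min_markers, loss_below, gain_above):
    """A segment is called if it has enough markers and a mean outside the neutral band."""
    return seg.markers >= min_markers and (
        seg.mean_cn < loss_below or seg.mean_cn > gain_above
    )


def filter_segments(segments, controls, min_markers, loss_below, gain_above,
                    keep_singletons):
    called = [s for s in segments if is_called(s, min_markers, loss_below, gain_above)]
    # Drop any locus that is also called in a healthy control.
    control_loci = {s.locus for s in called if s.sample in controls}
    cases = [s for s in called
             if s.sample not in controls and s.locus not in control_loci]
    # Optionally drop loci seen in only one sample.
    per_locus = Counter(s.locus for s in cases)
    return [s for s in cases if keep_singletons or per_locus[s.locus] > 1]


# Made-up segments; samples 90 and 91 stand in for healthy controls.
segments = [
    Segment(4, "geneA", 5, 1.20), Segment(5, "geneA", 4, 1.30),
    Segment(7, "geneB", 6, 1.10), Segment(90, "geneC", 4, 1.20),
    Segment(9, "geneC", 4, 1.25), Segment(36, "geneD", 25, 1.17),
    Segment(8, "geneE", 2, 1.00),
]
controls = {90, 91}

# First analysis: small alterations, shared by more than one sample.
small = filter_segments(segments, controls, 3, 1.6, 2.4, keep_singletons=False)
# Second analysis: larger, more convincing alterations, singletons allowed.
large = filter_segments(segments, controls, 15, 1.4, 2.6, keep_singletons=True)

print([s.locus for s in small], [s.locus for s in large])
```

Relaxing the singleton filter only in the stricter second pass reflects the reasoning above: an alteration seen in a single sample is kept only when its marker count and mean make it likely to be genuine.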

2.4.2 Quantitative PCR validation

Quantitative PCR was used to validate the regions of interest from the array analysis.

This was performed on a Roche LightCycler 480 (Roche Applied Science, Indianapolis, IN)

using the Roche SYBR green kit. Primers were designed using Primer3 and the human genome

reference assembly (UCSC version hg19). All the primer pairs as well as the run protocol can be

found in the supplementary methods (Supplementary table 3). Primers with the highest

efficiency were used as primary primers while secondary primers used for confirmation were

selected from primers with lower (but still sufficient) efficiency. All samples were run in

triplicate. A commercial pool of genomic DNA from 50 individuals (Roche) was used as a

positive calibrator for each gene in each experiment. Both BCMA and FOXP2 were used as

reference genes. Copy number state was determined by a relative quantification method which

compensates for differences in target and reference amplification efficiencies. Copy number

ratios below 0.7 of the reference were determined to be deletions while ratios above 1.3 were

determined to be amplifications.
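One common way to perform an efficiency-corrected relative quantification is sketched below. The specific formula (a ratio of efficiency-weighted Cq differences between calibrator and sample) is an assumption, since the text does not name the exact method used; the function names and the Cq values are illustrative only. The 0.7 and 1.3 thresholds are those stated above.

```python
def relative_copy_ratio(e_target, e_ref,
                        cq_target_cal, cq_target_sample,
                        cq_ref_cal, cq_ref_sample):
    """Efficiency-corrected target/reference ratio of a sample relative to a calibrator.

    e_target and e_ref are amplification efficiencies expressed as fold change
    per cycle (2.0 corresponds to perfect doubling).
    """
    target_term = e_target ** (cq_target_cal - cq_target_sample)
    ref_term = e_ref ** (cq_ref_cal - cq_ref_sample)
    return target_term / ref_term


def classify(ratio, loss_below=0.7, gain_above=1.3):
    """Apply the deletion/amplification thresholds to a copy number ratio."""
    if ratio < loss_below:
        return "deletion"
    if ratio > gain_above:
        return "amplification"
    return "neutral"


# Made-up mean Cq values: the target amplifies one cycle later in the sample
# than in the calibrator, while the reference gene is unchanged.
ratio = relative_copy_ratio(
    e_target=2.0, e_ref=2.0,
    cq_target_cal=25.0, cq_target_sample=26.0,
    cq_ref_cal=24.0, cq_ref_sample=24.0,
)
call = classify(ratio)

print(ratio, call)
```

A heterozygous deletion is expected to give a ratio near 0.5, which falls below the 0.7 threshold.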


2.4.3 TaqMan PTCH1 Validation

A catalogue TaqMan (Invitrogen) copy number assay that overlaps with the 5’ region of

interest was used to further investigate alterations in PTCH1. The assay was run using a

Roche LightCycler 480 according to the manufacturer’s protocol. All samples were run in

quadruplicate. The commercial pool of Roche genomic DNA from 50 individuals was used as a

positive calibrator while the Human TaqMan TERT assay was used as a reference.

2.4.4 Sequencing of PTCH1

Sequencing of PTCH1 was done in the Molecular Genetics Laboratory at the Hospital for

Sick Children. Sequencing primers can be found in supplementary table 4.

2.4.5 Mismatch Repair Gene Mutation Screening

Mismatch repair gene mutation screening, which included both sequencing and MLPA of

MSH2, MLH1, MSH6, and EPCAM was done at the Laboratory for Advanced Molecular

Diagnostics at Mount Sinai Hospital. For the immunohistochemistry, sections were cut at 4

microns and allowed to dry prior to baking in a 60˚C oven for 30 minutes. The slides were

dewaxed through a series of xylenes, dehydrated through a series of graded alcohols and brought

to water before being rinsed with Tris buffer. Heat retrieval was performed in a Tris/HCl buffer

pH 9.0 for 30 minutes at 100˚C using a HistoPro microwave pressure cooker. The sections were

stained using the Dako 480 immunostainer. The MLH1 and MSH2 antibodies (Monoclonal ES05

and Monoclonal 25D12, respectively) were obtained from Leica Biosystems while the MSH6

and PMS2 (Rabbit monoclonal SP93 and Rabbit monoclonal PR3947, respectively) antibodies

were obtained from Cell Marque.


Chapter 3: Results

3.1 Array Results

3.1.1 Candidate Genes Found by Custom Array CGH

The segmentation analysis previously described was able to accurately describe all three

TP53 deletion controls, matching the size and copy number that was previously known. After

further filtering by the criteria described above, a list of 179 genes/loci was identified as

potential regions of interest. The full list of regions of interest can be found in the supplementary

results (supplementary table 5). These were then evaluated based on the mean copy number and

probe frequency of the reported segment as well as by their likelihood to contribute to an

autosomal dominant cancer predisposition phenotype. This included looking particularly for

genes in the p53 pathway as well as genes that have been previously implicated in other cancer

predisposition syndromes. This resulted in the identification of 13 primary genes of interest

shown below in table 4. Of these, PTCH1 and DICER1 were chosen as the validation targets of

the highest priority due to their prevalence in the tested cohort, their convincing means, and the

likelihood of errors in these genes causing a wide range of tumours based on their biology and

previously known association with cancer predisposition.


Gene Type Sample ID Unique Loci

CREB1 Loss 5 3

CREB1 Gain 1 1

FNBP1 Loss 5 2

BCL7A Loss 11 1

APC Loss 7 2

APC Gain 4 3

ATM Loss 3 2

PTCH1 Loss 13 2

PTCH1 Gain 2 2

DICER1 Loss 7 2

HOXC13 Loss 4 2

EXT2 Loss 7 2

EXT2 Gain 1 1

WRN Loss 5 4

HOXA13 Loss 4 1

HOXA13 Gain 1 1

HOXA10 Loss 8 2

HOXA10 Gain 2 2

PMS2 Loss 2 1

PMS2 Gain 2 2

Table 4- Primary genes of interest after CGH array analysis. Unique loci refer to how

many distinct regions were identified as altered in a given gene.

3.1.2 Significant Alterations Found in Single Samples

As copy number gains are rarely implicated in autosomal dominant cancer predisposition,

only segments showing a copy number loss were considered as regions of interest (Table 5). In

total, 10 large copy number losses were detected in single samples. These all occurred in one

sample each with the exception of one sample demonstrating two large copy number losses.

While copy number losses were the focus, a single sample with embryonal rhabdomyosarcoma

(Sample 23) appeared to have seven large copy number gains. This was considered significant as

no other sample had more than two large copy number alterations detected. These seven


segments can be seen in table 6. Due to its involvement in rhabdomyosarcoma, the ~1.9kb copy

number gain in FOXO1 was chosen as the primary alteration of interest of these seven.

Segment Sample ID Size (bp) Markers Mean Gene(s)

chr8:118910810-118929166 24 18356 15 1.04201 EXT1

chr15:20562844-22617694 10 2054850 36 1.2054 multiple

chr15:20735436-22617694 32 1882258 33 1.18954 multiple

chr2:47636572-47646395 36 9823 25 1.16945 MSH2

chr4:190910973-191002380 15 91407 20 1.34896 DUX2, DUX4

chrX:129244389-129246323 15 1934 19 1.27595 ELF4

chr17:29558557-29562850 32 4293 18 1.35945 NF1

chr11:3697265-3698117 36 852 16 1.38014 NUP98

chr6:135501207-135502627 34 1420 15 1.34839 MYB

chr14:22401486-22963927 9 562441 172 1.27338 multiple

Table 5-Large copy number losses seen in only one sample

Location Size (bp) Probes Mean Gene

chr22:23651269-23664153 12884 61 2.643 BCR

chr7:156797661-156803572 5911 42 2.7186 MNX1

chr14:99640288-99642451 2163 30 2.89292 BCL11B

chr2:16079743-16082896 3153 22 2.76887 MYCN

chr13:41239262-41241181 1919 21 2.7937 FOXO1

chr2:100210137-100217744 7607 20 2.73956 AFF3

chr22:23522754-23523962 1208 18 3.06706 BCR

Table 6- Seven large copy number gains were detected in Sample 23

3.1.3 Difficulty Identifying Genuine Copy Number Alterations with Custom Probes

Due to the large number of reported copy number alterations seen in many samples,

particularly those of small to intermediate size (3-12 markers) and unconvincing means (copy

number of ~1.5 or ~2.5), we suspected that the segmentation analysis was over-reporting copy-

number altered regions. After examining the visual data plots of many regions of interest in all

40 samples, certain patterns in the reported log ratio were often observed in areas with a high

custom probe density. Segments with reported copy number losses were often seen to have

means that approach a call (but do not pass the threshold) in other samples which would appear


copy number neutral by segmentation analysis. In addition, it was sometimes observed that

probes would follow very similar patterns in their individual reported log ratios. An example of

this can be found in figure 8. These patterns were mostly seen in samples that were run at the

same time, particularly in arrays on the same slide. This made the creation of a list of candidate

genes for validation more difficult as it seemed likely many reported alterations were not in fact

genuine.

Figure 8- A notable example of a sample (b) which was not identified by segmentation

analysis but reported very similar log ratios to a sample (a), which was. Samples a and b were

run beside each other on the same slide.


3.1.4 Genetic evidence of anticipation

A number of CNVs detected in parents were found to be expanded in their

children. These children also developed tumours at an earlier age than their parents. A total of 14

expanded alterations were detected (Table 7). Of these, only two were detected as being larger in

the parent, and both by only one probe. The expansion of alterations was most notable in samples

5 and 4 (father and daughter), who had five expanded alterations detected.


Sample ID Alteration Size (bp) Probes Segment Mean Gene(s)

5 Loss 302 4 1.194 CREB1

4* Loss 377 5 1.207 CREB1

5 Loss 1466 7 1.291 TPR

4* Loss 2125 11 1.387 TPR

25 Loss 270445 7 1.235 PABPC4L

23* Loss 233076 6 1.101 PABPC4L

34 Loss 807 5 1.004 BCL7A

36* Loss 982 6 1.151 BCL7A

5 Loss 437 4 1.011 MLL

4* Loss 62794 6 1.213 MLL

11 Loss 2049 21 1.289 KAT6B

16* Loss 13225 44 1.519 KAT6B

19 Loss 1103 9 1.283 none

17* Loss 1256 10 1.307 none

5 Loss 416 4 1.105 EXT2

4* Loss 1170 8 1.314 EXT2

5 Loss 170 3 1.125 PHOX2B

4* Loss 319 5 1.252 PHOX2B

11 Gain 9166 12 2.959 TIMP3/SYN3

16* Gain 9112 11 3.367 TIMP3/SYN3

34 Loss 6301 13 1.411 CHEK2

36* Loss 6334 14 1.202 CHEK2

33 Loss 643 14 1.523 HOXA10

9* Loss 726 17 1.372 HOXA10

33 Loss 174 3 1.303 HNRNPA2B1

9* Loss 281 4 1.016 HNRNPA2B1

19 Loss 355 4 1.352 NCOA1

18* Loss 468 5 1.285 NCOA1

Table 7- Summary of all shared alterations that were detected as expanded in families by

array segmentation analysis. The child in each parent/child pair is denoted with an asterisk (*).


3.2 Validation of Candidate Genes

3.2.1 MSH2 involved in a Li-Fraumeni-Like phenotype

Due to the high prevalence of MSH2 deletions in hereditary nonpolyposis colorectal

cancer (HNPCC, or Lynch syndrome), validation of a possible MSH2 deletion in an LFL

proband with glioblastoma multiforme (Sample 36 in table 3) was considered a high priority.

This deletion was in fact confirmed by qPCR (Figure 10). The affected mother of the proband

was also run on the array (Sample 34) and appeared to be copy number neutral at this locus. This

was curious as it was initially assumed that the cancer predisposition was inherited from the

maternal lineage due to the prevalence of early-onset breast cancer and an unaffected father

(pedigree in figure 11). Further evaluation of the family history however, revealed a very strong

incidence of colon cancer along the paternal lineage. It was subsequently discovered that the

proband’s paternal grandfather had tested positive for an MSH2 deletion encompassing exons 3-

6, identical to our array (figure 9). Following the confirmation of the deletion in the family, the

proband’s thus far unaffected brother was also tested and was found to be positive for the

deletion. Subsequent immunohistochemical analysis of the proband’s tumour revealed that 50%

of the tumour cells express PMS2 and MLH1 and that while MSH6 and MSH2 are expressed in

endothelial cells, they appear to be completely lost in the tumour.


Figure 9- MSH2 copy number loss encompassing exons 3-6 (shown in red) detected on

the custom CGH array

Figure 10- MSH2 copy number loss validated via SYBR Green qPCR. Shown are the mean copy number ratios.



Figure 11- Pedigree of a proband with a history of HNPCC in the paternal lineage and

breast cancer predisposition in the maternal lineage.

3.2.2 Large copy number gains in sample 23 likely an artifact

Validating the FOXO1 copy number gain in Sample 23 proved difficult. The only

primers of borderline-sufficient quality repeatedly demonstrated a copy number loss rather than a

gain while using both BCMA and FOXP2 as references (Figure 12). In order to evaluate the

reliability of these results, MYCN was chosen as a secondary gene of interest due to its similar

probe count and mean to the FOXO1 alteration. A reliable primer pair repeatedly demonstrated a


neutral copy number at the locus (Figure 13). This was confirmed with a second primer pair as

well (not shown). It should also be noted that of all seven of the large copy number gains seen on

the array in this sample, none were shared in the other two family members that were also tested.

Because of this, it was decided that the abnormally high incidence of copy number gains

observed in this sample was simply an artifact caused by either the improper labelling of some

DNA fragments, or improper hybridization.

Figure 12- FOXO1 copy number gain failed to validate via SYBR Green qPCR, instead

appearing as a copy number loss. Shown are the mean (+/- SEM) copy number ratios.


Figure 13- MYCN copy number gain failed to validate via SYBR Green qPCR, instead

appearing as copy number neutral. Shown are the mean (+/- SEM) copy number ratios.

3.2.3 Confirmation of custom array’s ability to detect previously unknown alterations

As the specificity of the array was in doubt, it was deemed necessary to ensure that the

array was indeed capable of identifying previously unknown alterations (both gains and losses)

that could be validated. For this, a large gain in PCDH15 (44 probes, size: ~1.625Mb, mean:

3.37) was used as well as an intermediate sized loss in EXT1 (15 probes, size: ~18kb, mean:

1.04). These were chosen due to their likelihood of being genuine based on the array analysis.

Both alterations were in fact validated via qPCR with their ideal primers (Figures 14 and 15) as

well as secondary primer pairs. The region observed to have a copy number gain in PCDH15 has

been reported to be copy number variable in the DGV. The copy number loss observed in EXT1


lies in a large intronic region. As both these alterations were deemed unlikely to be causative,

neither was pursued as a potential gene of interest.

Figure 14- PCDH15 copy number gain successfully validated via SYBR Green qPCR.

Shown are the mean (+/- SEM) copy number ratios.


Figure 15- EXT1 copy number loss successfully validated via SYBR Green qPCR.

Shown are the mean copy number ratios.

3.2.4 DICER1 is copy number neutral in the patient cohort

DICER1 was considered a gene of interest and a high priority for qPCR validation. Both

regions of interest shown by the array were analyzed via qPCR and both were shown to be copy

number neutral (Figures 16 and 17). Because all seven alterations across the two regions were shown to be neutral by qPCR, we decided that further pursuit of DICER1 as a gene of interest was unnecessary and that the alterations reported by the array in both regions were simply artifacts.


Figure 16- The copy number losses on the 5’ end of DICER1 failed to validate via SYBR

Green qPCR, instead appearing as copy number neutral in all four samples. Shown are the mean

(+/- SEM) copy number ratios.


Figure 17- The copy number losses encompassing two exons on the 5’ end of DICER1

failed to validate via SYBR Green qPCR, instead appearing as copy number neutral in all three

samples. Shown are the mean (+/- SEM) copy number ratios.

3.2.5 PTCH1 validation

The results for PTCH1 were the most striking from the array, especially in the context of

the gene’s involvement in hereditary cancer predisposition. Our interest was primarily focused

on the thirteen samples showing a copy number loss on the far 5’ end of the gene. This loss

encompasses the first exon of some PTCH1 transcripts, while being upstream of others (Figure

18). After validating multiple primers, the best were used to evaluate the copy number status of

this locus via qPCR. If acceptable secondary primers were available for the sample of interest,

they were used to confirm the results from the primary primers. The results of the primary

primers can be seen in Figure 19. There was a considerable amount of variation in the copy

number state across the samples. Two samples (27 and 35) were shown to be copy number


neutral, with the remaining eleven samples showing copy number ratios below the threshold of

0.7. In an effort to further validate these findings, a TaqMan copy number assay was used, though it showed no copy number loss in any sample tested. The best usable probe, however, was 500bp away from the common region of copy number loss and was encompassed only by the largest reported losses. The TaqMan results for these six samples are shown in Figure 20.
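To make the qPCR calls above concrete, the following is a minimal sketch of how a copy number ratio could be derived from Ct values and compared against the 0.7 loss threshold. It assumes the standard 2^-ddCt relative quantification method with roughly 100% primer efficiency, which may differ from the exact calculation used in this study. The function names and the gain threshold of 1.3 are hypothetical placeholders; only the 0.7 loss threshold comes from the text.

```python
LOSS_THRESHOLD = 0.7  # threshold for a copy number loss, as stated in the text
GAIN_THRESHOLD = 1.3  # hypothetical placeholder; not stated in the text


def copy_number_ratio(ct_target_sample, ct_ref_sample,
                      ct_target_control, ct_ref_control):
    """Relative copy number by the 2^-ddCt method (assumes ~100% efficiency)."""
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalise to reference gene
    d_ct_control = ct_target_control - ct_ref_control   # same for the control DNA
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


def call_state(ratio, loss=LOSS_THRESHOLD, gain=GAIN_THRESHOLD):
    """Classify a copy number ratio as loss, gain, or neutral."""
    if ratio < loss:
        return "loss"
    if ratio > gain:
        return "gain"
    return "neutral"


# Illustrative Ct values only: the target amplifies one cycle later in the
# sample than in the control, as expected for a heterozygous deletion.
ratio = copy_number_ratio(26.0, 24.0, 25.0, 24.0)
print(ratio, call_state(ratio))
```

With these illustrative Ct values the ratio is 0.5, which falls below the 0.7 threshold and would be called a loss; identical Ct differences in sample and control give a ratio of 1.0 and a neutral call.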

Figure 18- Thirteen copy number losses were observed on the far 5’ end of PTCH1,

shown in red.


Figure 19- Of the copy number losses encompassing the 5’ end of PTCH1, 11/13

successfully validated via SYBR Green qPCR. Shown are the mean (+/- SEM) copy number

ratios.


Figure 20- The six samples with detected copy number losses encompassing a TaqMan

probe were all revealed to be copy number neutral at the probe’s locus using a TaqMan copy

number qPCR assay. Shown are the mean (+/- SEM) copy number ratios.

A 2.6kb gain was observed in one sample (sample 5) by array analysis. The qPCR validation, however, gave unexpected results. Though there was a considerable amount of


variation between runs, the mean copy number ratio actually showed a loss (Figure 21), a result

that was replicated with a secondary pair of reliable primers.

Figure 21- The copy number gain spanning 2.6kb in PTCH1 failed to validate via SYBR

Green qPCR, instead appearing as a copy number loss. Shown are the mean (+/- SEM) copy

number ratios.

The copy number losses seen in three related samples in exon 14 of PTCH1 as well as

one copy number gain observed in the 3’ UTR were both shown to be copy number neutral via

qPCR (Figures 22 and 23).


Figure 22- The copy number loss observed in exon 14 of PTCH1 failed to validate via

SYBR Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/- SEM)

copy number ratios.


Figure 23- The copy number gain observed in the 3’UTR of PTCH1 failed to validate via

SYBR Green qPCR, instead appearing as copy number neutral. Shown are the mean (+/- SEM)

copy number ratios.

Sequencing of PTCH1 revealed a single nucleotide transversion present in two related

samples, 19 and 18 (mother and son, pedigree shown in Figure 24), that has not been previously

reported in either the Catalogue of Somatic Mutations in Cancer (COSMIC) or the NCBI SNP

database. This c.1298C>A substitution results in a serine to tyrosine change at codon 443.

PolyPhen-2, a tool for annotating coding nonsynonymous SNPs [193], predicted this substitution to be “probably damaging” with a score of 0.984. The “Sorting Tolerant From Intolerant” (SIFT) algorithm [194] also predicts this mutation to be “damaging” with a SIFT score of 0.01.
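Because the two predictors score in opposite directions (PolyPhen-2 scores near 1 and SIFT scores near 0 both indicate a damaging change), a small sketch of how the two scores map to their labels may help. The cut-offs used here are assumptions: 0.05 is the widely used SIFT cut-off, and 0.85 and 0.15 are commonly quoted PolyPhen-2 cut-offs, but the exact values depend on the tool version and training model. The function names are hypothetical.

```python
def sift_call(score, cutoff=0.05):
    """SIFT: LOW scores indicate a damaging substitution (assumed cut-off 0.05)."""
    return "damaging" if score <= cutoff else "tolerated"


def polyphen2_call(score, probably=0.85, possibly=0.15):
    """PolyPhen-2: HIGH scores indicate a damaging substitution.

    The two cut-offs are commonly quoted values and are an assumption here.
    """
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


# Scores reported in the text for the PTCH1 substitution.
print(polyphen2_call(0.984), "/", sift_call(0.01))
```

Under these assumed cut-offs, both reported scores fall well inside the damaging range, so the calls do not depend on the precise threshold chosen.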


Figure 24- Pedigree showing an LFS-L family with a history of syndactyly in the maternal

lineage. The family members of interest are outlined in red.


Chapter 4: Discussion and Future Directions

Despite sharing a similar, severe phenotype to mutant TP53-associated Li-Fraumeni

Syndrome (LFS), there are a multitude of families in whom no causative germline variants have

been detected. In this study, we shed some light on this wild-type TP53 population. First, we

demonstrated the utility of a candidate gene/region approach using a custom platform. With this

approach we were able to properly identify a hereditary cancer syndrome that was initially

identified as being Li-Fraumeni Like (LFL), while identifying a new gene in PTCH1 that may at

least play a role in the development of the strong cancer predisposition phenotype observed in

LFS/LFL families without detectable TP53 mutations.

4.1 Discussion on high-resolution genomic analysis in LFS

4.1.1 Utility of custom CGH arrays for detecting novel copy number alterations in the

germline

Our custom CGH array proved sensitive enough to detect all three TP53 control

deletions, the smallest of which was 2.1kb. It also detected numerous copy number alterations

that were validated with qPCR, the smallest of which was 387bp. On the other hand, our custom

array does appear to suffer from issues of specificity. Notably, all detected copy number changes

in DICER1, in two different regions, failed to validate. The full extent of mis-reported alterations

is difficult to predict, but based on the number of regions that appear to have shifts in their

reported log ratios, particularly within slides that were run at the same time, we can predict that

the issue is quite widespread. Shared copy number shifts were most apparent in samples 9-16. The cause of these patterns is difficult to discern, but is likely the result of a

combination of technical error and difference in handling between runs which was exacerbated


by the low probe quality (with regards to GC content, secondary binding, and melting

temperature as quantified by the Agilent probe score) that was often necessary in order to

achieve a high probe density in many exons. Indeed, shared log ratio patterns were almost

always observed in segments primarily composed of custom probes in the exonic regions. Even

after the elimination of the worst quality probes, in order to achieve our desired resolution we

were forced to include a number of low quality probes on the array. In regions with many of

these probes, errors in reporting are to be expected. The very inclusive requirement of only three

probes to define a segment would also contribute to a large number of these reported copy

number alterations where likely none exist. Again however, in order to achieve a resolution of

less than 500bp, this was a necessary decision. Placing one healthy negative control sample on each slide (one array of four) would help in the analysis, as a log ratio pattern detected in the

control could be dismissed as an artifact. Ultimately though, the price for the high sensitivity of

the platform is its low specificity. Given the potential to reveal very small copy number changes,

this is likely acceptable for research purposes, but the significant sacrifice of specificity limits its

usefulness in clinical applications.
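The negative-control idea described above could be implemented as a simple overlap filter: any segment called in a patient sample that overlaps a same-direction segment in the healthy control hybridized on the same slide is set aside as a likely artifact. The sketch below is a hypothetical illustration, not part of the analysis pipeline used in this study; the Segment type, function names, and coordinates are all invented for the example.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int   # first base of the segment
    end: int     # last base of the segment (inclusive)
    state: str   # "gain" or "loss"


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments share at least one base and the same direction."""
    return (a.chrom == b.chrom and a.state == b.state
            and a.start <= b.end and b.start <= a.end)


def drop_control_artifacts(sample: List[Segment],
                           control: List[Segment]) -> List[Segment]:
    """Keep only sample segments with no matching segment in the slide's control."""
    return [s for s in sample if not any(overlaps(s, c) for c in control)]


# Invented coordinates for illustration.
sample_calls = [Segment("chrA", 100, 400, "loss"), Segment("chrB", 1000, 1500, "loss")]
control_calls = [Segment("chrA", 150, 450, "loss")]
print(drop_control_artifacts(sample_calls, control_calls))
```

A stricter version might require a minimum reciprocal overlap rather than a single shared base; that choice would trade some sensitivity for fewer discarded genuine calls.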

Recently, there have been a few studies utilizing custom CGH platforms with a probe

density of ~50bp [22,195]. These studies, however, tend to focus on either a specific region (or just a

few regions) of interest, or a handful of candidate genes. Our study is unique in its ultra-high

resolution across almost 450 distinct regions of the genome. This has not been the only study to

utilize a custom CGH array in order to identify additional LFS genes. Very recently, Aury-

Landas et al. also used a custom 4x180k Agilent platform to evaluate the copy number status of

TP53 and its surrounding regions as well as 24 candidate genes and 6 miRNAs along with the

rest of the genome, the results of which are discussed in the introduction of this thesis [57]. Notably,


only ~28 000 probes on the array were dedicated to their regions of interest, compared to the

~91 000 probes in this study. Using stringent criteria, only one small (~4kb) alteration was

detected; the remaining 16 detected CNVs not listed in the DGV were all over 82kb and none

were found in their regions of interest. The results of this study as well as ours can be used to

direct how similar approaches can be used in the future.

As discussed in the Methods chapter, placing custom probes at a very high density across

a large number of regions can lead to difficulties in analysis despite the high theoretical

resolution. This appears to have been an issue in the study of Aury-Landas et al. and certainly

was in this study. One should think carefully about the true utility of custom probes when

placing so many together at a high density. If at all possible, probes previously designed to be

placed on the platform (i.e. Agilent catalogue probes) should be used as they are more reliable

than custom probes. A combination of catalogue probes and custom probes may be ideal, and

with the large number of catalogue probes now available, it would now be possible to forgo the

use of any custom probes at all while maintaining a reasonably high probe density, albeit less

dense than the custom array used in this study.

A discussion of the utility of custom CGH arrays going forward must take into account

the place they now have in the broad arsenal of genome analyzing tools. Next generation

sequencing (NGS) is of particular relevance today, as will be discussed below, and has enabled

researchers to interrogate the genome at base pair resolution. The niche of CGH arrays is that

they are relatively inexpensive and relatively easy to analyze, while giving reliable results. Any future custom CGH array studies in this area should be designed with this niche in mind, which would change the design process significantly by shifting the focus from sensitivity to specificity and reliability. This would be best achieved by discarding any custom probes


altogether. By extending defined “exonic regions” to include a few hundred bases on either side

of each exon, many catalogue probes could still be placed in and around most exons. In order to

improve the cost-effectiveness of the array, the genomic probes could be eliminated as well. In

this study for example, ~75 000 probes were spaced out across the genome. While this

represented a very large proportion of the array’s total probe count, it still resulted in rather

unimpressive resolution. There are now a number of standard commercial arrays that offer very

impressive resolution throughout the genome such as Agilent’s 1M array which has a median

inter-probe distance of 2.1kb, and 1.8kb within RefSeq genes. Platforms such as these would

provide a more cost-effective means of interrogating the entire genome in relation to resolution.

The custom array should focus on the candidate genes and the regions directly surrounding them.

By eliminating the genomic probes, and relying solely on catalogue probes for the regions of

interest, the array would be small enough to be placed on something like Agilent’s 8x60k

platform, cutting costs by allowing eight arrays to be multiplexed on one slide.

4.1.2 Next Generation Sequencing

Upon the sequencing of the human genome in 2001, it was estimated that it consisted of

~3 billion base pairs containing 30,000-40,000 protein coding genes, a figure that later was

revised to be only 20,000-25,000 genes [196,197]. Until recently, studies on genetic diseases have

relied on traditional Sanger sequencing to sequence a small subset of genes of interest. Next

generation sequencing (NGS), however, has revolutionized genomics by allowing for feasible whole genome sequencing by individual investigators in a matter of days [198,199]. In whole-genome

sequencing, genomic DNA is fragmented and then directly subjected to massively parallel

sequencing. This allows for the discovery of single nucleotide variants (SNVs) not only in the

coding regions of genes that are the focus of Sanger sequencing, but also in promoters,


enhancers, introns, non-coding RNAs and intergenic sequences across the entire genome. NGS has also become more and more effective at detecting structural alterations including translocations as well as copy number alterations [200,201]. Notably, balanced translocations go

undetected in CGH array analysis. NGS was also responsible for the characterization of a

catastrophic phenomenon of the cancer genome known as chromothripsis whereby tens to

hundreds of genomic rearrangements occur during a single cellular event [202]. NGS has even been used to map global epigenetic modifications [203]. While these high-throughput methods are prone to errors, this is compensated by high “read depth”, the number of times a given base is read.

Average read depths of ~40x in whole genome sequencing and >100x in targeted NGS are now

quite common and are able to provide a reliable sequencing result. The amount of data generated

by this high-throughput method is vast and places intense demands both on computer hardware

and software to enable efficient data analysis. For example, a modest average coverage of just

30x would result in 90 Gb of sequence being read across the whole genome. The time required to

analyze such vast amounts of data is significant. In 2008, the first sequencing of an entire cancer

genome was reported when acute myeloid leukemia cells were compared to matched normal

somatic DNA from the skin. The researchers discovered ten genes with acquired mutations, eight

of which had not been previously described [204]. Despite the financial cost and significant analytical labour required, whole genome sequencing has seen a rapid increase in use as investigators

have rushed to unlock the secrets of the cancer genome that were previously undetectable.
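The data volume quoted above follows directly from multiplying the average read depth by the size of the sequenced target. A minimal sketch, using the ~3 billion base pair genome size given earlier in this section (the function name is hypothetical):

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs, as stated in the text


def sequenced_gigabases(mean_depth, target_size_bp=GENOME_SIZE_BP):
    """Total sequence generated (in Gb) for a given average read depth."""
    return mean_depth * target_size_bp / 1e9


print(sequenced_gigabases(30))  # the "modest" 30x example from the text
print(sequenced_gigabases(40))  # the ~40x depth described as now common
```

At 30x this gives the 90 Gb quoted in the text, and at ~40x roughly 120 Gb per genome, before any allowance for quality scores, alignment files, or intermediate data.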

An alternative to costly and labour-intensive whole genome sequencing is targeted or

whole-exome sequencing. Protein-coding sequences are thought to account for only ~1.3% of the

genome [205] and can thus be sequenced through NGS without the massive number of reads required for whole

genome sequencing. In whole exome sequencing, the roughly 50-62 Mb that make up the exome


are captured using chemically synthesized nucleotides, referred to as “baits”. These captured

fragments are then subjected to the same massively parallel sequencing used to sequence the

genome. Due to the significantly smaller size of the exome, much higher coverage of up to 100x

can be achieved while still presenting a cheaper, easier-to-manage alternative to whole genome

sequencing. While there are additional costs related to the capture element of the procedure, it

has allowed researchers to focus on variants most likely to have functional effects in a more cost-

effective manner with a higher degree of accuracy than whole genome sequencing and has led to

the discovery of numerous novel mutations in the cancer genome [206–208]. While normal exome

sequencing cannot detect fusion genes, transcriptome sequencing (RNA-seq) can be used to

sequence RNA variants, and is thus capable of detecting gene fusions. A hallmark of cancer is

tumour heterogeneity in which a single tumour will have multiple, distinct subclonal populations

with differing genetic backgrounds [209]. This can make it quite difficult to get a good picture of the

genetic landscape in a region of interest as a tumour sample that is being sequenced will likely

contain multiple subclonal populations as well as some healthy tissue. By utilizing baits in a

manner similar to exome sequencing, specific targets can be captured and then “deep sequenced”

to a very high degree, typically between 1000x and 10,000x. This extremely high read depth allows for alleles with just 1% frequency to be detected reliably [205]. When Shah et al. performed

deep sequencing at a depth of 20,000x on somatic mutations detected by whole genome

sequencing in triple-negative breast cancers, the existence of multiple subclonal populations with

different mutant allele frequencies was confirmed [210]. The extreme read depth of this technique

allows investigators to sequence a handful of candidate genes with excellent accuracy when

whole exome sequencing is not desired.
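The link between read depth and the ability to detect a 1% allele can be illustrated with a simple binomial model. This is a simplified sketch: it treats each read as an independent draw, ignores sequencing error and capture bias, and uses an arbitrary minimum of five supporting reads as the detection criterion. The function names and that criterion are assumptions made for the example.

```python
from math import comb


def expected_variant_reads(depth, allele_fraction):
    """Mean number of reads carrying the variant at a given depth."""
    return depth * allele_fraction


def prob_at_least(k, depth, allele_fraction):
    """P(at least k variant reads) under a binomial sampling model."""
    p_fewer = sum(
        comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Compare a typical exome-like depth with the deep-sequencing range from the text.
for depth in (100, 1000, 10000):
    print(depth,
          expected_variant_reads(depth, 0.01),
          round(prob_at_least(5, depth, 0.01), 4))
```

Under this model a 1% allele is expected in only about one read at 100x but in about ten reads at 1000x, so the chance of seeing at least five supporting reads rises sharply across the depth range described above.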


4.1.3 Next Generation Sequencing and Cancer Syndromes

In the field of hereditary cancer syndromes, NGS technologies will be of critical

importance in the coming years. Sequencing germline DNA of cancer-prone individuals is not

fraught with the same challenges as tumour sequencing with regards to sub-clonal populations

and the accumulation of passenger mutations. Tumour DNA, however, can often be paired with

“normal” somatic DNA in an effort to identify driver mutations which would be unique to the

tumour population. Whole genome sequencing of the germline DNA of cancer-prone individuals

would likely yield a very large number of alterations with unknown functional effects due to the sheer amount of data being generated. Because of this, the majority of NGS studies in cancer predisposition have relied on exome or targeted sequencing rather than whole genome

sequencing. Particularly in highly penetrant syndromes, it is reasonable to expect causative

mutations to lie in the coding portion of genes, making exome sequencing attractive while being

much more efficient than whole genome sequencing. Additionally, sequencing of unaffected

family members can provide a background from which to identify alterations that segregate with

tumour formation. NGS technologies may well lead to the identification of causative genes in

TP53 wild-type LFS as well as other cancer syndromes in which causative genes have not yet

been identified.

Various NGS platforms have already been used to characterize germline mutations in

cancer-prone individuals. Hereditary pheochromocytoma (PCC) is associated with germline

mutations in one of nine known susceptibility genes, but like LFS, exhibits familial cases that are

wild-type in the associated genes. Whole exome sequencing was used to identify non-

synonymous single nucleotide substitutions in MAX present in three unrelated individuals with

PCC that were not detected in 750 healthy controls. A further analysis of 59 cases lacking


mutations in the previously known susceptibility genes revealed additional MAX mutations [211]. Germline mutations in MAX have since been estimated to account for 1.12% of PCC [212]. Whole exome sequencing has since led to the identification of multiple variants associated with colorectal cancer predisposition [213], BAP1 mutations associated with predisposition to renal cell carcinoma [214], and ERCC4 mutations associated with unclassified Fanconi anemia [215]. It has also yielded interesting results highlighting the difficulty of classifying cancer predisposition syndromes. Mutations in FANCC and BLM, associated with Fanconi anemia and Bloom syndrome respectively, were detected in families predisposed to breast cancer despite not demonstrating a phenotype associated with either syndrome other than cancer predisposition [216]. While mutations in TP53 are a defining feature of LFS, whole exome sequencing has revealed that

inherited TP53 germline mutations are perhaps more common than previously thought. Whole

exome sequencing of a kindred in which five individuals had been diagnosed with leukemia revealed a TP53 mutation at codon 306, which has been previously reported in LFS families. This kindred, however, does not appear to be affected by any LFS-associated tumours. While the family does not meet the classic or even Chompret LFS criteria, their phenotype is severe with only one surviving member who has an extremely high-risk form of childhood acute lymphoblastic leukemia [217]. The fact that the same germline TP53 mutation is associated with two

different severe cancer predisposition presentations is intriguing and points to the involvement of

other modifier genes. Whole exome sequencing has also been used to identify the presence of

LFS where other syndromes were suspected. Whole exome sequencing was performed on a

patient with gastric adenocarcinoma and a family history of colorectal cancer. The patient did not

appear to harbour mutations in genes usually associated with colorectal cancer predisposition

syndromes (CDH1, APC, MLH1, MSH2, MSH6, PMS2, PTEN, or STK11). The sequencing led to


the discovery of a mutation at codon 248 of TP53 that has been reported in many LFS families,

including families with gastric adenocarcinoma [218]. Identification of cancer predisposition

syndromes can be difficult when relying on the clinical presentation alone. NGS technologies,

which are becoming more and more common in the clinic as well as the research laboratory, can

be instrumental in properly identifying a syndrome.

A revised custom CGH array like the one previously discussed would provide quick and

reliable data on the copy number status of more than 500 genes of interest at low cost. NGS

however is becoming an increasingly attractive option for investigating TP53 wild-type LFS.

Custom selection methods like Agilent’s SureSelect could be used to sequence all the regions of

interest of this study, including UTRs, introns and promoters as well as the coding regions with

very high coverage. The cost, both in money and labour, of extensive custom probe selection

should not be underestimated. Like custom CGH probes, custom selection probes may also

suffer from reliability issues and many would be previously unvalidated. Whole exome

sequencing offers a more reliable alternative as commercial exome selection kits are widely

available. While the non-coding regions of interest would obviously be overlooked with this

method, it would include the coding regions of the entire genome. As it seems probable that

multiple variants work in concert to generate the highly variable phenotypes seen in LFS and

many other cancer syndromes, the ability to interrogate the entire exome would be immensely

helpful. While whole genome sequencing appears to offer the best of both worlds, the cost and

immense analytical challenges make it less attractive for a study such as this where an ideal

sample size in the near future would be over 100 samples.

With regards to analyzing only the copy number of hundreds of genes of interest, a

revised CGH array may well still prove to be the best option, particularly for a large sample set.


An array like this could be applied quite cheaply to a very large number of samples. Catalogue

probes at roughly 200bp spacing would provide a reliable resolution of under 1kb. Any gene of

interest with a detected alteration of this size in a coding region could be considered a candidate

gene which could then be sequenced in detail in more samples. This is in contrast to the massive

number of small variants that would be generated by NGS technology. Assessing the potential

phenotypic effect of these variants, and thus generating a small list of candidate genes, would

prove very difficult and significantly more costly.

4.2 Potential Genetic Evidence of Anticipation

Anticipation has been suggested to occur in other cancer predisposition syndromes as

well, but remains controversial. It is difficult to determine if the decrease in age of onset is

genuine or merely a result of ascertainment bias. Identification of a genetic basis for anticipation is therefore vital. Perhaps the best studied example of this is myotonic dystrophy type 1 (DM1), where expansion of an unstable CTG trinucleotide repeat is associated with more severe disease and earlier age of onset [219]. Telomere length has been the focus of much attention in cancer predisposition syndromes, and shortening of telomeres appears to be associated with an earlier age of onset in LFS [38] and hereditary breast cancer [40], while this does not appear to be the case in HNPCC [220] despite ample clinical evidence of anticipation [221].

The custom CGH array used in this study is not well suited to evaluating global copy

number differences between family members. It did however detect a number (12/14) of shared

regions of interest where the alterations appear to be expanded in the child. The expansions

remain unvalidated and are for the most part quite small, consisting of only a few probes. While

the sample size is small, only 2/14 of the shared alterations were expanded in the parent, and

even then only by one probe. Children developed tumours at an earlier age than parents in all


these pairs. The most notable example of repeated expanded alterations was seen in samples 4 and 5, which actually have the smallest difference in ages of onset (17 years vs. 13 years).

Despite this, the daughter did appear to exhibit a more severe phenotype. While her father’s

osteosarcoma was successfully treated, she unfortunately did not survive.

This evidence of anticipation is of course very slight and cannot be used to make any

broad conclusions. It is however intriguing and suggests that further studies may yield interesting

results in the search for a genetic cause of anticipation in LFS. NGS of paired LFS samples will

be able to accurately identify the accumulation of likely deleterious mutations throughout

generations if they do in fact exist.

4.3 Discovery of a Lynch Syndrome Kindred

Using the custom CGH array, a heterozygous deletion encompassing exons 3-6 of the

Lynch syndrome (HNPCC) gene MSH2 was detected in a proband and validated via qPCR. Curiously, this deletion was not detected in the affected mother whose DNA was also

analyzed. A more complete family history revealed a high incidence of colon cancer on the

paternal side of the family, though the father is as yet unaffected (pedigree shown in Figure 11). The paternal grandfather, however, was later found to have tested positive for the same deletion.

The proband’s brother is thus far unaffected but has been shown to also carry the deletion.

MSH2 codes for a 105kDa protein consisting of 934 amino acids and is one of the key

mismatch repair (MMR) genes. It forms two different heterodimers: MutSα (an MSH2-MSH6

heterodimer) and MutSβ (an MSH2-MSH3 heterodimer). These heterodimers bind to DNA

mismatches and initiate the DNA repair process by bending the DNA helix, shielding around 20

base pairs. Both MutS heterodimers form a complex with the MutLα heterodimer, which directs


downstream MMR events. As discussed in the introduction, mutations in MSH2 and MLH1 are

thought to be the main cause of HNPCC. While their prevalence relative to MSH6 and PMS2 has

perhaps been overestimated in the past, MSH2 and MLH1 do confer a more penetrant phenotype.

Deletions encompassing exons 3-6 have been reported 14 times in the database of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT), and deletions of MSH2 account for 26.25% of all catalogued variants while substitutions account for 68% [222].

This case highlights the importance of obtaining the most detailed family history possible

when a cancer syndrome is suspected. The early-onset glioblastoma multiforme of the proband

combined with the incidence of early onset breast cancer in the maternal lineage was certainly

suggestive of LFS. While the father was unaffected, a complete family history revealed the

evidence of a devastating colorectal cancer predisposition. Had the paternal lineage been the

focus, MSH2 testing would certainly have been indicated for the proband. The case also

demonstrates how the variable presentations of cancer syndromes can confound diagnosis. While

the proband’s immediate family history is suggestive of LFS, a deletion in an HNPCC gene appears to be

causative. Recently it has been suggested that soft tissue sarcomas, a common feature

of LFS, should also be included in the HNPCC tumour spectrum [223]. The authors conducted a

review of the literature and found eleven cases of soft tissue sarcoma in HNPCC patients. Ten of

these were found to harbour germline mutations in MMR genes (seven in MSH2 and three in

MLH1). As previously discussed, a family with a severe leukemia predisposition was found to

harbour a TP53 mutation seen in LFS [217], and a proband with a suspected colorectal cancer

syndrome was shown to have an LFS-associated TP53 mutation [218]. These are all cases in which

the clinical presentation does not agree with what would be predicted by the genetic presentation.

The identification of a causative mutation can aid greatly in the proper management of cancer

syndromes, whose diagnosis often relies on clinical presentation and family history.

Why would a family with an LFS mutation appear to have a colorectal cancer

predisposition syndrome? Why does a proband with an HNPCC-associated mutation present

with an LFS-associated tumour? Secondary alterations likely account for much of the variability

that is observed in cancer syndromes. A low-risk allele may have a different effect in the

presence of a primary mutation in MSH2 or TP53 for example, which may exert a significant

effect on the phenotype. In the specific case in question, it seems likely that additional cancer

predisposition variants were inherited from the mother, whose own predisposition remains

unexplained. Constitutional mismatch repair-deficiency (CMMR-D) is associated with

homozygous mutations in the MMR genes. While predisposing to HNPCC-associated tumours, it

also predisposes to hematological malignancies as well as brain tumours. PMS2 is the most

frequently reported of the MMR genes in CMMR-D, likely due to its lower penetrance in

HNPCC relative to MSH2 and MLH1 [224]. Glioblastoma is the most common brain tumour

observed in CMMR-D, with a median age at diagnosis of 8 years, which was also the proband’s age at

diagnosis [224]. Due to the potential for bi-allelic MMR involvement, the proband and mother were

subjected to an MMR gene panel assessing MLH1, MSH2, MSH6 and EPCAM but there was no

indication of bi-allelic involvement in these genes. Notably, PMS2 was not included in this

panel. The custom CGH array detected a 3.9 kb copy number gain in PMS2 in the proband’s

mother. However, this gain does not appear to have been inherited by the proband. Still, it is possible that this

amplification, if genuine, is coupled with a small alteration on the other allele that could have

been inherited. While this would be extraordinary if it were the case, the clinical presentation of

the proband warrants a thorough investigation of any potential MMR involvement. This

proband-mother pair is an excellent candidate for whole exome sequencing, which may lead to

the identification of shared alterations that may be responsible for the distinct, severe phenotype

observed in the proband.

4.4 PTCH1 Associated with the LFS phenotype

4.4.1 Deletions in PTCH1 isoforms associated with the LFS phenotype

The custom CGH array detected a number of alterations in PTCH1. Of these, the only

ones to be successfully validated via qPCR were 11 of the 13 copy number losses at the 5’ end of

the gene. This result, however, is in doubt, as a TaqMan copy number assay 500 bp downstream of

the common region of copy number loss, which included detected regions of loss in six samples,

revealed the probe locus to be copy number neutral in all samples. Accurate copy number

detection for this small region has proven difficult, but further validation is certainly necessary to

assess the true nature of these detected losses. The gene was then sequenced in all samples in which a

PTCH1 alteration was detected, leading to the discovery of a novel Ser433Tyr substitution in two

family members.
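
As an illustration of how a qPCR or TaqMan result translates into a copy number call, the sketch below applies the comparative Ct (ΔΔCt) calculation. It is a minimal example and not necessarily the analysis used in this study: it assumes roughly 100% amplification efficiency for both the target and reference assays, a diploid calibrator sample, and illustrative thresholds of 1.5 and 2.5 copies for calling a loss or gain. The function names and Ct values are hypothetical.

```python
def estimate_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=2):
    """Estimate target copy number with the comparative Ct method.

    Assumes ~100% PCR efficiency (one doubling per cycle) for both assays
    and a calibrator sample known to carry `calibrator_copies` copies.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2 ** (-delta_delta_ct)


def call_copy_state(copy_number, loss_below=1.5, gain_above=2.5):
    """Classify an estimated copy number; the thresholds are illustrative assumptions."""
    if copy_number < loss_below:
        return "loss"
    if copy_number > gain_above:
        return "gain"
    return "neutral"


# Hypothetical Ct values: the target amplifies one cycle later in the sample
# than in the diploid calibrator, relative to the reference assay.
cn = estimate_copy_number(26.0, 24.0, 25.0, 24.0)
print(cn, call_copy_state(cn))
```

A one-cycle delay relative to the calibrator corresponds to half the template, which is the expected signal for a heterozygous deletion. An assay placed just outside the true breakpoints would instead return a copy-neutral value, which is one possible reading of the discordant TaqMan result described above.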

PTCH1 encodes a 161 kDa transmembrane protein consisting of 1447 amino acids which

form 12 transmembrane-spanning domains and 2 large extracellular loops. The gene itself

contains 23 coding exons and spans ~73 kb. PTCH1 is the ligand-binding component of the

hedgehog (Hh) receptor complex. In the absence of the Hh ligand, PTCH1 maintains another

transmembrane protein, Smoothened (SMO), in an inactive state. Upon binding of the Hh ligand

to the extracellular loops of PTCH1, SMO is released and transduces the signal to a SUFU-GLI

complex in the cell’s cytoplasm which results in the activation of GLI transcription factors. The

transcription of PTCH1 itself is induced by the activity of the Hh pathway, creating a negative

feedback loop [225].

As discussed in the introduction, mutations in PTCH1 are associated with Nevoid basal

cell carcinoma syndrome (NBCCS, or Gorlin Syndrome). In NBCCS, various tumours and

hamartomas exhibit loss of heterozygosity. Loss of heterozygosity has been observed in almost

90% of hereditary basal cell carcinomas (BCCs) [226]. Lindstroem et al. reviewed 132 germline

mutations of PTCH1. Of these, 73% were nonsense mutations, the majority of which are due to

small insertions and deletions, in contrast to sporadic BCCs where missense mutations make up

the majority of mutations. The nonsense mutations are concentrated in the two large extracellular loops,

while missense mutations are primarily located in the transmembrane domains [227]. There is

increasing evidence of large deletions of PTCH1, including deletions that completely envelop the

gene, playing a larger role in NBCCS than previously thought. The increasing use of arrays has

led to the identification of many such cases in families with NBCCS in whom sequencing failed

to detect any mutations [228,229]. Mice heterozygous for PTCH1 mutations are often affected by

rhabdomyosarcoma, a common LFS tumour [230]. Interestingly, Kappler et al. showed that

rhabdomyosarcomas caused by mutations of either PTCH1 or TP53 in mice have distinct gene

expression profiles and biological features [231].

Small copy number losses on the 5’ end of PTCH1 have yet to be reported in NBCCS.

The 5’ structure of the human PTCH1 gene was unclear until the discovery that there exist at

least five exons that are alternatively used as the first exon of the transcript. This results in at least

three different protein isoforms: L, M, and S [232–234]. The exact breakpoints of the detected

deletions remain unclear, but they affect the 1a transcript coding for the first exon of the L and M

isoforms. According to the array, some appear to include the transcriptional start sites, with

all the others only one probe away. These isoforms are differentially regulated both temporally

and spatially, and the shorter isoform S appears to be less stable than the others [232]. While small

deletions of the L and M isoforms have not been previously reported in NBCCS, nonsense

mutations involving only the L and M isoforms have been seen in a few cases [235–237]. Recently,

Suzuki et al. reported a nonsense mutation at codon 129 that was shown to eliminate translation

of functional PTCHM while allowing for the translation of PTCHS. Patient cells would thus be

expected to produce half the amount of longer isoforms (PTCHM and PTCHL) while producing

normal, or perhaps even slightly higher than normal, levels of the PTCHS isoform. This suggests

that NBCCS is caused by the haploinsufficiency of PTCHL and PTCHM with the PTCHS

isoform unable to compensate for their function [235]. The role of PTCHS is therefore unclear; however,

as it is more ubiquitously expressed and less stable than the other isoforms, it may have a role in situations where

transient expression and rapid degradation of PTCH1 are necessary [232]. Cell lines are available for

a few of the samples that appear to have these 5’ deletions. In the future, the state of these three

PTCH1 isoforms should be assessed to see if these deletions have a similar effect on PTCHL and

PTCHM. In the few samples with tumour DNA available, PTCH1 status should also be assessed

to see if the loss of heterozygosity often observed in NBCCS is also seen in these tumours.

4.4.2 A Novel Variant in an LFS-L Family Affected by Syndactyly

The detection of a Ser433Tyr substitution by Sanger sequencing was an unexpected

discovery, as this variant has not been reported in any case of NBCCS, nor in the NCBI SNP, 1000

Genomes, or NHLBI exome databases or the Catalogue of Somatic Mutations in Cancer (COSMIC).

Because of this, the functional relevance of this variant is hard to predict, although PolyPhen-2, a

tool for annotating coding nonsynonymous SNPs [193], predicts this substitution to be “probably

damaging” with a score of 0.984. The “Sorting Tolerant From Intolerant” (SIFT) algorithm [194]

also predicts this mutation to be “damaging” with a SIFT score of 0.01. The substitution affects a highly

conserved codon at the very end of the first extracellular loop, a domain frequently affected by

nonsense mutations in NBCCS. This domain is critical for Hh binding and has been shown to be

critical in a SMO-independent mechanism of GLI1 inhibition [238]. In NBCCS, missense mutations

appear to be concentrated in the transmembrane domains [227].
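
Because the two predictors are scored in opposite directions, the short sketch below shows how the reported scores map onto their qualitative calls. It is illustrative only: the SIFT cut-off of 0.05 is the conventional threshold below which a substitution is predicted to affect function, while the PolyPhen-2 boundaries of 0.15 and 0.85 are commonly quoted approximations that are assumed here and may differ from the model-specific thresholds applied by the tool itself. The function names are hypothetical.

```python
def sift_call(score, cutoff=0.05):
    """SIFT: low scores indicate a poorly tolerated substitution."""
    return "damaging" if score <= cutoff else "tolerated"


def polyphen2_call(score, possibly_above=0.15, probably_above=0.85):
    """PolyPhen-2: high scores indicate a likely damaging substitution.

    The two boundaries are assumed, commonly quoted approximations.
    """
    if score > probably_above:
        return "probably damaging"
    if score > possibly_above:
        return "possibly damaging"
    return "benign"


# Scores reported above for the Ser433Tyr substitution.
print(sift_call(0.01))         # low SIFT score, predicted damaging
print(polyphen2_call(0.984))   # high PolyPhen-2 score, predicted probably damaging
```

Both tools therefore agree on this variant despite the numerically opposite scores, although such in silico predictions remain a guide for prioritisation and not a substitute for functional testing.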

The variant was discovered in two family members (mother and son) of three tested on

the array. While both children have developed cancer (including one without this mutation), the

mother has not. The mother, however, is affected by syndactyly, as is the son who shares the novel

variant (pedigree shown in figure 24). While syndactyly is not considered a hallmark of NBCCS,

it has been observed [6], as has polydactyly [239]. This family presents a challenging case as, while

both children are affected by LFS tumours at an early age, as far as we are aware, neither parent

is affected and neither lineage has a particularly strong history of cancer. There is a strong

history of syndactyly in the maternal lineage which appears to be inherited in an autosomal

dominant fashion, but this does not completely segregate with the cancer phenotype. Aside from

this mutation at codon 433, all three family members also appear to harbour the previously

discussed deletion at the 5’ end of the PTCH1 gene. As this substitution does not result in a

truncated protein, it seems understandable that it would not confer the usual NBCCS

phenotype. The question, however, is whether it exerts a cancer-predisposing effect, especially in

the presence of other mutations in the gene, such as the 5’ deletions in PTCH1, or with mutations

in other genes, perhaps other tumour suppressors. PTCH2 is highly homologous to PTCH1;

mutations in it have been reported in very few cases of NBCCS and appear to confer a milder phenotype [14,15].

PTCH2-deficient mice, unlike PTCH1-deficient mice, have no obvious defects and do not appear

to be predisposed to cancer. However, loss of PTCH2 has a marked effect on tumour formation

in combination with PTCH1 haploinsufficiency. These mice exhibit a higher incidence of

tumours and a broader spectrum of tumour types compared to mice deficient in PTCH1 alone [241].

This provides an interesting example of a mutation that does not appear to be important until

placed in the context of another lesion. It is possible that the mutation at codon 433 may have a

similar effect.

In any case, the other members of the family who are affected by syndactyly should be

considered for PTCH1 sequencing to confirm that the mutation does in fact segregate with the

syndactyly phenotype, as this could be done quickly and easily. Assessing this mutation’s

functional effect on cancer predisposition, however, is more difficult. Chung and Bunz recently

sought to answer a similar question about a PTCH1 variant detected in colorectal cancer. This

variant, P681L, is a missense mutation located in the intracellular loop of PTCH1. Expression of

exogenous SMO resulted in robust activation of a GLI-responsive luciferase reporter construct.

This could be suppressed by the expression of wildtype PTCH1, but not the variant [242]. The

ability of the S433Y variant to inhibit GLI activity could also be assessed in this manner.

4.5 Management of Cancer Predisposition Syndromes

Management of cancer predisposition syndromes generally relies on a routine

surveillance protocol in order to detect and treat tumours as early as possible. Villani et al.

assessed the feasibility and potential clinical effect of a comprehensive surveillance protocol

using frequent biochemical and imaging studies in TP53 mutation carriers. They found that the

3-year overall survival was 100% in the surveillance group and only 21% in the non-surveillance

group [243]. This demonstrates the clinical impact that accurate diagnosis of a cancer syndrome

followed by robust surveillance can have. Drug treatment is currently not used to treat

asymptomatic LFS patients but chemoprevention in NBCCS is being explored. The SMO

inhibitor, Vismodegib, received FDA approval for BCC treatment in 2012. Shortly after this,

Tang et al. published the results of a randomized, double-blind, placebo-controlled trial of

Vismodegib in NBCCS patients. The authors found that the per-patient rate of new surgically

eligible BCCs was 29 in the placebo group compared with 2 in the Vismodegib group. Existing clinically

significant BCCs also saw a significant reduction in size compared to the placebo group. While

the drug appears to hold much promise, it is associated with many adverse effects that caused a

number of patients to discontinue treatment [244]. Furthermore, there is evidence that continuous

Vismodegib therapy can lead to acquired resistance to the drug [245], and it appears that BCCs

rapidly rebound upon cessation of Vismodegib treatment [246]. Nonetheless, Vismodegib not only

holds promise as a useful tool in the management of NBCCS, but also provides an example of

how the identification of causative germline mutations can lead to novel therapies for cancer

predisposition syndromes.

4.6 Concluding Remarks

Our study aimed to identify candidate genes that may play a role in LFS and LFS-L.

Using a custom CGH array we were able to create a large list of potentially significant alterations

in genes of interest, many of which were validated by qPCR. These include alterations in MSH2

and PTCH1, two genes previously implicated in cancer predisposition syndromes.

The functional relevance of these newly identified alterations, as well as those

which are yet to be validated, remains unknown. New technologies, particularly NGS, should

allow for the highly accurate identification of both sequence and structural alterations in

subsequent studies. With the increasing numbers of LFS and LFS-L samples available that can

be analyzed both in our lab and through collaboration with labs around the world, the genes of

interest identified in this study can be interrogated with these new technologies in order to better

understand their potential role in the development of the LFS phenotype. Exome sequencing also

holds the potential of identifying smaller alterations that would go undetected by a CGH array

and may reside in genes not thought to be a high priority. Of particular interest in any future

study would be other genes in the Hh pathway such as SMO and SUFU. The identification of

causative mutations is crucial to the proper treatment of cancer predisposition syndromes such as

LFS. Knowledge of associated genes is necessary for the proper identification of carriers, which

in turn enables highly successful surveillance programs that significantly increase survival, as well as

quality genetic counseling and a better quality of life.

References

1. Li, F. P. & Fraumeni, J. F. Soft-tissue sarcomas, breast cancer, and other neoplasms. A

familial syndrome? Ann. Intern. Med. 71, 747–52 (1969).

2. Li, F. P. et al. A cancer family syndrome in twenty-four kindreds. Cancer Res. 48, 5358–

62 (1988).

3. Nichols, K. E., Malkin, D., Garber, J. E., Fraumeni, J. F. & Li, F. P. Germ-line p53

mutations predispose to a wide spectrum of early-onset cancers. Cancer Epidemiol.

Biomarkers Prev. 10, 83–7 (2001).

4. Petitjean, A. et al. Impact of mutant p53 functional properties on TP53 mutation patterns

and tumor phenotype: lessons from recent developments in the IARC TP53 database.

Hum. Mutat. 28, 622–9 (2007).

5. Tinat, J. et al. 2009 version of the Chompret criteria for Li Fraumeni syndrome. J. Clin.

Oncol. 27, e108–9; author reply e110 (2009).

6. Hisada, M., Garber, J. E., Fung, C. Y., Joseph, F. & Li, F. P. Multiple Primary Cancers in.

90, (1998).

7. Marees, T. et al. Risk of second malignancies in survivors of retinoblastoma: more than

40 years of follow-up. J. Natl. Cancer Inst. 100, 1771–9 (2008).

8. Wu, C.-C., Shete, S., Amos, C. I. & Strong, L. C. Joint effects of germ-line p53 mutation

and sex on cancer risk in Li-Fraumeni syndrome. Cancer Res. 66, 8287–8292 (2006).

9. Hwang, S.-J., Lozano, G., Amos, C. I. & Strong, L. C. Germline p53 Mutations in a

Cohort with Childhood Sarcoma: Sex Differences in Cancer Risk. Am. J. Hum. Genet. 72,

975–983 (2003).

10. Nigro, J. M. et al. Mutations in the p53 gene occur in diverse human tumour types. Nature

342, 705–708 (1989).

11. Lavigueur, A. et al. High incidence of lung, bone, and lymphoid tumors in transgenic

mice overexpressing mutant alleles of the p53 oncogene. Mol. Cell. Biol. 9, 3982–3991

(1989).

12. Olivier, M. et al. Li-Fraumeni and Related Syndromes: Correlation between Tumor Type,

Family Structure, and TP53 Genotype. 6643–6650 (2003).

13. Varley, J. M. Germline TP53 mutations and Li-Fraumeni syndrome. Hum. Mutat. 21,

313–20 (2003).

14. Lalloo, F. et al. Prediction of pathogenic mutations in patients with early-onset breast

cancer by family history. Lancet 361, 1101–1102 (2003).

15. Gonzalez, K. D. et al. High frequency of de novo mutations in Li-Fraumeni syndrome. J.

Med. Genet. 46, 689–93 (2009).

16. Olivier, M., Hollstein, M. & Hainaut, P. TP53 mutations in human cancers: origins,

consequences, and clinical use. Cold Spring Harb. Perspect. Biol. 2, a001008 (2010).

17. Hussain, S. P. & Harris, C. C. Molecular Epidemiology of Human Cancer: Contribution

of Mutation Spectra Studies of Tumor Suppressor Genes. 4023–4037 (1998).

18. Fujimoto, A. et al. Whole-genome sequencing of liver cancers identifies etiological

influences on mutation patterns and recurrent mutations in chromatin regulators. Nat.

Genet. 44, 760–4 (2012).

19. Rotter, V., Boss, M. A. & Baltimore, D. Increased concentration of an apparently identical

cellular protein in cells transformed by either Abelson murine leukemia virus or other

transforming agents. J. Virol. 38, 336–46 (1981).

20. Wolf, D., Harris, N. & Rotter, V. Reconstitution of p53 expression in a nonproducer Ab-

MuLV-transformed cell line by transfection of a functional p53 gene. Cell 38, 119–126

(1984).

21. Olive, K. P. et al. Mutant p53 gain of function in two mouse models of Li-Fraumeni

syndrome. Cell 119, 847–60 (2004).

22. Shlien, A. et al. A common molecular mechanism underlies two phenotypically distinct

17p13.1 microdeletion syndromes. Am. J. Hum. Genet. 87, 631–42 (2010).

23. Milner, J. O., Medcalf, E. A. & Cook, A. C. p53 Complexes. 11, 12–19 (1991).

24. De Vries, A. et al. Targeted point mutations of p53 lead to dominant-negative inhibition

of wild-type p53 function. Proc. Natl. Acad. Sci. U. S. A. 99, 2948–53 (2002).

25. Lang, G. A. et al. Gain of function of a p53 hot spot mutation in a mouse model of Li-

Fraumeni syndrome. Cell 119, 861–72 (2004).

26. Dittmer D, Pati S, Zambetti G, Chu S, Teresky AK, Moore M, Finlay C, L. A. Gain of

function mutations in p53. Nat Genet. 4, 42–6

27. Kadouri, L. et al. A single-nucleotide polymorphism in the RAD51 gene modifies breast

cancer risk in BRCA2 carriers, but not in BRCA1 carriers or noncarriers. Br. J. Cancer

90, 2002–5 (2004).

28. Easton, D. F., Ponder, M. A., Huson, S. M. & Ponder, B. A. An analysis of variation in

expression of neurofibromatosis (NF) type 1 (NF1): evidence for modifying genes. Am. J.

Hum. Genet. 53, 305–13 (1993).

29. Shlien, A. et al. Excessive genomic DNA copy number variation in the Li-Fraumeni

cancer predisposition syndrome. Proc. Natl. Acad. Sci. U. S. A. 105, 11264–9 (2008).

30. Silva, A. G., Achatz, I. M. W., Krepischi, A. C., Pearson, P. L. & Rosenberg, C. Number

of rare germline CNVs and TP53 mutation types. Orphanet J. Rare Dis. 7, 101 (2012).

31. Dumont, P., Leu, J. I.-J., Della Pietra, A. C., George, D. L. & Murphy, M. The codon 72

polymorphic variants of p53 have markedly different apoptotic potential. Nat. Genet. 33,

357–65 (2003).

32. Bond, G. L. et al. A single nucleotide polymorphism in the MDM2 promoter attenuates

the p53 tumor suppressor pathway and accelerates tumor formation in humans. Cell 119,

591–602 (2004).

33. Bougeard, G. et al. Impact of the MDM2 SNP309 and p53 Arg72Pro polymorphism on

age of tumour onset in Li-Fraumeni syndrome. J. Med. Genet. 43, 531–3 (2006).

34. Wu, C.-C. et al. Joint effects of germ-line TP53 mutation, MDM2 SNP309, and gender on

cancer risk in family studies of Li-Fraumeni syndrome. Hum. Genet. 129, 663–73 (2011).

35. Marcel, V. et al. TP53 PIN3 and MDM2 SNP309 polymorphisms as genetic modifiers in

the Li-Fraumeni syndrome: impact on age at first diagnosis. J. Med. Genet. 46, 766–72

(2009).

36. Fang, S. et al. Sex-specific effect of the TP53 PIN3 polymorphism on cancer risk in a

cohort study of TP53 germline mutation carriers. Hum. Genet. 130, 789–94 (2011).

37. Trkova, M., Hladikova, M., Kasal, P., Goetz, P. & Sedlacek, Z. Is there anticipation in the

age at onset of cancer in families with Li-Fraumeni syndrome? J. Hum. Genet. 47, 381–6

(2002).

38. Tabori, U., Nanda, S., Druker, H., Lees, J. & Malkin, D. Younger age of cancer initiation

is associated with shorter telomere length in Li-Fraumeni syndrome. Cancer Res. 67,

1415–8 (2007).

39. Trkova, M., Prochazkova, K., Krutilkova, V., Sumerauer, D. & Sedlacek, Z. Telomere

length in peripheral blood cells of germline TP53 mutation carriers is shorter than that of

normal individuals of corresponding age. Cancer 110, 694–702 (2007).

40. Martinez-Delgado, B. et al. Genetic anticipation is associated with telomere shortening in

hereditary breast cancer. PLoS Genet. 7, e1002182 (2011).

41. Bougeard, G. et al. Detection of 11 germline inactivating TP53 mutations and absence of

TP63 and HCHK2 mutations in 17 French families with Li-Fraumeni or Li-Fraumeni-like

syndrome. J. Med. Genet. 38, 253–7 (2001).

42. Stone, J. G. et al. Analysis of Li–Fraumeni syndrome and Li–Fraumeni-like families for

germline mutations in Bcl10. Cancer Lett. 147, 181–185 (1999).

43. Barlow, J. W. et al. Germ Line BAX Alterations Are Infrequent in Li-Fraumeni Syndrome.

1403–1406 (2004).

44. Burt, E. C. et al. Exclusion of the genes CDKN2 and PTEN as causative gene defects in

Li-Fraumeni syndrome. Br. J. Cancer 80, 9–10 (1999).

45. Portwine, C., Lees, J., Verselis, S., Li, F. P. & Malkin, D. Absence of germline p16

INK4a alterations in p53 wild type Li-Fraumeni syndrome families. J. Med. Genet. 37,

e13 (2000).

46. Brown, L. T., Sexsmith, E. & Malkin, D. Identification of a novel PTEN intronic deletion

in Li-Fraumeni syndrome and its effect on RNA processing. Cancer Genet. Cytogenet.

123, 65–8 (2000).

47. Vahteristo, P. et al. p53, CHK2, and CHK1 Genes in Finnish Families with Li-Fraumeni

Syndrome: Further Evidence of CHK2 in Inherited Cancer Predisposition. 5718–5722 (2001).

48. Bell, D. W. Heterozygous Germ Line hCHK2 Mutations in Li-Fraumeni Syndrome.

Science 286, 2528–2531 (1999).

49. Sodha, N. et al. Screening hCHK2 for Mutations. Science 289, 359 (2000).

50. Meijers-Heijboer, H. et al. Low-penetrance susceptibility to breast cancer due to

CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat. Genet. 31, 55–9

(2002).

51. Vahteristo, P. et al. A CHEK2 genetic variant contributing to a substantial fraction of

familial breast cancer. Am. J. Hum. Genet. 71, 432–8 (2002).

52. Cybulski, C. et al. CHEK2 is a multiorgan cancer susceptibility gene. Am. J. Hum. Genet.

75, 1131–5 (2004).

53. Xiang, H., Geng, X., Ge, W. & Li, H. Meta-analysis of CHEK2 1100delC variant and

colorectal cancer susceptibility. Eur. J. Cancer 47, 2546–2551 (2011).

54. Cybulski, C. et al. Constitutional CHEK2 mutations are associated with a decreased risk

of lung and laryngeal cancers. Carcinogenesis 29, 762–5 (2008).

55. Cybulski, C. et al. A personalised approach to prostate cancer screening based on

genotyping of risk founder alleles. Br. J. Cancer 108, 2601–9 (2013).

56. Bachinski, L. L. et al. Genetic Mapping of a Third Li-Fraumeni Syndrome Predisposition

Locus to Human Chromosome 1q23. 427–431 (2005).

57. Aury-Landas, J. et al. Germline copy number variation of genes involved in chromatin

remodelling in families suggestive of Li-Fraumeni syndrome with brain tumours. Eur. J.

Hum. Genet. 1–8 (2013). doi:10.1038/ejhg.2013.68

58. Chan, T. L. et al. Heritable germline epimutation of MSH2 in a family with hereditary

nonpolyposis colorectal cancer. Nat. Genet. 38, 1178–83 (2006).

59. Suter, C. M., Martin, D. I. K. & Ward, R. L. Germline epimutation of MLH1 in

individuals with multiple cancers. Nat. Genet. 36, 497–501 (2004).

60. Morak, M. et al. Further evidence for heritability of an epimutation in one of 12 cases

with MLH1 promoter methylation in blood cells clinically displaying HNPCC. Eur. J.

Hum. Genet. 16, 804–11 (2008).

61. Attwooll, C. L. et al. Identification of a rare polymorphism in the human TP53 promoter.

Cancer Genet. Cytogenet. 135, 165–72 (2002).

62. Amatya, V. J., Naumann, U., Weller, M. & Ohgaki, H. TP53 promoter methylation in

human gliomas. Acta Neuropathol. 110, 178–84 (2005).

63. Kang, J. H. et al. Methylation in the p53 Promoter Is a Supplementary Route to Breast

Carcinogenesis: Correlation between CpG Methylation in the p53 Promoter and the

Mutation of the p53 Gene in the Progression from Ductal Carcinoma In Situ to Invasive

Ductal Carcinoma. 81, 573–579 (2001).

64. Pogribny, I. P. & James, S. J. Reduction of p53 gene expression in human primary

hepatocellular carcinoma is associated with promoter region methylation without coding

region mutation. Cancer Lett. 176, 169–74 (2002).

65. Agirre, X. et al. Methylation of CpG dinucleotides and/or CCWGG motifs at the promoter

of TP53 correlates with decreased gene expression in a subset of acute lymphoblastic

leukemia patients. Oncogene 22, 1070–2 (2003).

66. Finkova, A. et al. The TP53 gene promoter is not methylated in families suggestive of Li-

Fraumeni syndrome with no germline TP53 mutations. Cancer Genet. Cytogenet. 193,

63–6 (2009).

67. Garber, J. E. & Offit, K. Hereditary cancer predisposition syndromes. J. Clin. Oncol. 23,

276–92 (2005).

68. Knapke, S., Nagarajan, R., Correll, J., Kent, D. & Burns, K. Hereditary cancer risk

assessment in a pediatric oncology follow-up clinic. Pediatr. Blood Cancer 58, 85–9

(2012).

69. Cazier, J. & Tomlinson, I. General lessons from large-scale studies to identify human

cancer predisposition genes. 255–262 (2010). doi:10.1002/path

70. Knudson, A. G. Mutation and cancer: statistical study of retinoblastoma. Proc. Natl. Acad.

Sci. U. S. A. 68, 820–3 (1971).

71. Williams, V. C. et al. Neurofibromatosis type 1 revisited. Pediatrics 123, 124–33 (2009).

72. Goldstein, A. M. et al. Clinical findings in two African-American families with the nevoid

basal cell carcinoma syndrome (NBCC). Am. J. Med. Genet. 50, 272–81 (1994).

73. Lo Muzio, L. et al. Nevoid basal cell carcinoma syndrome. Clinical findings in 37 Italian

affected individuals. Clin. Genet. 55, 34–40 (1999).

74. Lo Muzio, L. Nevoid basal cell carcinoma syndrome (Gorlin syndrome). Orphanet J. Rare

Dis. 3, 32 (2008).

75. Cowan, R. et al. The gene for the naevoid basal cell carcinoma syndrome acts as a

tumour-suppressor gene in medulloblastoma. Br. J. Cancer 76, 141–5 (1997).

76. O’Malley, S., Weitman, D., Olding, M. & Sekhar, L. Multiple neoplasms following

craniospinal irradiation for medulloblastoma in a patient with nevoid basal cell carcinoma

syndrome. Case report. J. Neurosurg. 86, 286–8 (1997).

77. Mariateresa Mancuso, S. P. Basal cell carcinoma and its development: insights from

radiation-induced tumors in Ptch1-deficient mice. Cancer Res. 64, 934–41 (2004).

78. Marin-Gutzke, M. et al. Basal Cell Carcinoma in Childhood After Radiation Therapy.

Ann. Plast. Surg. 53, 593–595 (2004).

79. Kimonis, V. E. et al. Clinical manifestations in 105 persons with nevoid basal cell

carcinoma syndrome. Am. J. Med. Genet. 69, 299–308 (1997).

80. Pastorino, L. et al. Identification of a SUFU germline mutation in a family with Gorlin

syndrome. Am. J. Med. Genet. A 149A, 1539–43 (2009).

81. Smyth, I. Isolation and characterization of human patched 2 (PTCH2), a putative tumour

suppressor gene in basal cell carcinoma and medulloblastoma on chromosome 1p32. Hum.

Mol. Genet. 8, 291–297 (1999).

82. Fan, Z. et al. A missense mutation in PTCH2 underlies dominantly inherited NBCCS in a

Chinese family. J. Med. Genet. 45, 303–308 (2008).

83. Fujii, K. et al. Frameshift mutation in the PTCH2 gene can cause nevoid basal cell

carcinoma syndrome. Fam. Cancer 12, 611–614 (2013).

84. Slade, I. et al. Heterogeneity of familial medulloblastoma and contribution of germline

PTCH1 and SUFU mutations to sporadic medulloblastoma. Fam. Cancer 10, 337–342

(2010).

85. Ottensmeier, H. et al. Treatment of Early Childhood Medulloblastoma by Postoperative

Chemotherapy Alone. 978–986 (2005).

86. Struewing, J. P. et al. The risk of cancer associated with specific mutations of BRCA1 and

BRCA2 among Ashkenazi Jews. N. Engl. J. Med. 336, 1401–8 (1997).

87. Warthin, A. S. Heredity with reference to carcinoma. Arch. Intern. Med. XII, 546 (1913).

88. Lynch, H. T. & Krush, A. J. Cancer family “G” revisited: 1895-1970. Cancer 27, 1505–11

(1971).

89. Quehenberger, F., Vasen, H. F. A. & van Houwelingen, H. C. Risk of colorectal and

endometrial cancer for carriers of mutations of the hMLH1 and hMSH2 gene: correction

for ascertainment. J. Med. Genet. 42, 491–6 (2005).

90. Jenkins, M. A. et al. Cancer risks for mismatch repair gene mutation carriers: a

population-based early onset case-family study. Clin. Gastroenterol. Hepatol. 4, 489–98

(2006).

91. Stoffel, E. et al. Calculation of risk of colorectal and endometrial cancer among patients

with Lynch syndrome. Gastroenterology 137, 1621–7 (2009).

92. Lynch, H. T. & de la Chapelle, A. Hereditary colorectal cancer. N. Engl. J. Med. 348,

919–32 (2003).

93. Hampel, H. et al. Screening for the Lynch syndrome (hereditary nonpolyposis colorectal

cancer). N. Engl. J. Med. 352, 1851–60 (2005).

94. Aaltonen, L. A. et al. Replication errors in benign and malignant tumors from hereditary

nonpolyposis colorectal cancer patients. Cancer Res. 54, 1645–8 (1994).

95. Kovacs, M. E., Papp, J., Szentirmay, Z., Otto, S. & Olah, E. Deletions removing the last

exon of TACSTD1 constitute a distinct class of mutations predisposing to Lynch

syndrome. Hum. Mutat. 30, 197–203 (2009).

96. Ligtenberg, M. J. L. et al. Heritable somatic methylation and inactivation of MSH2 in

families with Lynch syndrome due to deletion of the 3’ exons of TACSTD1. Nat. Genet.

41, 112–7 (2009).

97. Niessen, R. C. et al. Germline hypermethylation of MLH1 and EPCAM deletions are a

frequent cause of Lynch syndrome. Genes. Chromosomes Cancer 48, 737–44 (2009).

98. Kempers, M. J. E. et al. Risk of colorectal and endometrial cancers in EPCAM deletion-

positive Lynch syndrome: a cohort study. Lancet Oncol. 12, 49–55 (2011).

99. Gazzoli, I., Loda, M., Garber, J., Syngal, S. & Kolodner, R. D. A hereditary nonpolyposis

colorectal carcinoma case associated with hypermethylation of the MLH1 gene in normal

tissue and loss of heterozygosity of the unmethylated allele in the resulting microsatellite

instability-high tumor. Cancer Res. 62, 3925–8 (2002).

100. Järvinen, H. J. et al. Controlled 15-year trial on screening for colorectal cancer in families

with hereditary nonpolyposis colorectal cancer. Gastroenterology 118, 829–834 (2000).

101. Vasen, H. F. A. et al. One to 2-year surveillance intervals reduce risk of colorectal cancer

in families with Lynch syndrome. Gastroenterology 138, 2300–6 (2010).

102. Vasen, H. F., Mecklin, J. P., Khan, P. M. & Lynch, H. T. The International Collaborative

Group on Hereditary Non-Polyposis Colorectal Cancer (ICG-HNPCC). Dis. Colon

Rectum 34, 424–5 (1991).

103. Vasen, H. F., Watson, P., Mecklin, J. P. & Lynch, H. T. New clinical criteria for

hereditary nonpolyposis colorectal cancer (HNPCC, Lynch syndrome) proposed by the

International Collaborative group on HNPCC. Gastroenterology 116, 1453–6 (1999).

104. Kastrinos, F. & Stoffel, E. M. The History, Genetics, and Strategies for Cancer Prevention

in Lynch Syndrome. Clin. Gastroenterol. Hepatol. (2013). doi:10.1016/j.cgh.2013.06.031

105. Syngal, S., Fox, E. A., Eng, C., Kolodner, R. D. & Garber, J. E. Sensitivity and specificity

of clinical criteria for hereditary non-polyposis colorectal cancer associated mutations in

MSH2 and MLH1. J. Med. Genet. 37, 641–5 (2000).

106. Hampel, H. Point: justification for Lynch syndrome screening among all patients with

newly diagnosed colorectal cancer. J. Natl. Compr. Canc. Netw. 8, 597–601 (2010).

107. Baker, S. J. et al. Chromosome 17 deletions and p53 gene mutations in colorectal

carcinomas. Science 244, 217–221 (1989).

108. Reisman, D., Greenberg, M. & Rotter, V. Human p53 oncogene contains one promoter

upstream of exon 1 and a second, stronger promoter within intron 1. Proc. Natl. Acad. Sci.

U. S. A. 85, 5146–50 (1988).

109. Reich, N. C. & Levine, A. J. Growth regulation of a cellular tumour antigen, p53, in

nontransformed cells. Nature 308, 199–201 (1984).

110. Mosner, J. et al. Negative feedback regulation of wild-type p53 biosynthesis. EMBO J. 14,

4442–9 (1995).

111. Boggs, K. & Reisman, D. Increased p53 transcription prior to DNA synthesis is regulated

through a novel regulatory element within the p53 promoter. Oncogene 25, 555–65

(2006).

112. Reisman, D., Takahashi, P., Polson, A. & Boggs, K. Transcriptional Regulation of the p53

Tumor Suppressor Gene in S-Phase of the Cell-Cycle and the Cellular Response to DNA

Damage. Biochem. Res. Int. 2012, 808934 (2012).

113. Schroeder, M. & Mass, M. J. CpG methylation inactivates the transcriptional activity of

the promoter of the human p53 tumor suppressor gene. Biochem. Biophys. Res. Commun.

235, 403–6 (1997).

114. Le, M. T. N. et al. MicroRNA-125b is a novel negative regulator of p53. Genes Dev. 23,

862–76 (2009).

115. Zhang, Y. et al. MicroRNA 125a and its regulation of the p53 tumor suppressor gene.

FEBS Lett. 583, 3725–30 (2009).

116. Hünten, S., Siemens, H. & Kaller, M. MicroRNA Cancer Regulation. 774, 77–101

(Springer Netherlands, 2013).

117. Nishida, N. et al. MicroRNA miR-125b is a prognostic marker in human colorectal

cancer. Int. J. Oncol. 38, 1437–43 (2011).

118. Saldaña-Meyer, R. & Recillas-Targa, F. Transcriptional and epigenetic regulation of the

p53 tumor suppressor gene. Epigenetics 6, 1068–77 (2011).

119. Mahmoudi, S. et al. Wrap53, a natural p53 antisense transcript required for p53 induction

upon DNA damage. Mol. Cell 33, 462–71 (2009).

120. Jones, S. N., Roe, A. E., Donehower, L. A. & Bradley, A. Rescue of embryonic lethality

in Mdm2-deficient mice by absence of p53. Nature 378, 206–8 (1995).

121. Montes de Oca Luna, R., Wagner, D. S. & Lozano, G. Rescue of early embryonic lethality

in mdm2-deficient mice by deletion of p53. Nature 378, 203–6 (1995).

122. Kubbutat, M. H., Ludwig, R. L., Ashcroft, M. & Vousden, K. H. Regulation of Mdm2-

directed degradation by the C terminus of p53. Mol. Cell. Biol. 18, 5690–8 (1998).

123. Rodriguez, M. S., Desterro, J. M., Lain, S., Lane, D. P. & Hay, R. T. Multiple C-terminal

lysine residues target p53 for ubiquitin-proteasome-mediated degradation. Mol. Cell. Biol.

20, 8458–67 (2000).

124. Ohkubo, S., Tanaka, T., Taya, Y., Kitazato, K. & Prives, C. Excess HDM2 impacts cell

cycle and apoptosis and has a selective effect on p53-dependent transcription. J. Biol.

Chem. 281, 16943–50 (2006).

125. Itahana, K. et al. Targeted inactivation of Mdm2 RING finger E3 ubiquitin ligase activity

in the mouse reveals mechanistic insights into p53 regulation. Cancer Cell 12, 355–66

(2007).

126. Stott, F. J. et al. The alternative product from the human CDKN2A locus, p14(ARF),

participates in a regulatory feedback loop with p53 and MDM2. EMBO J. 17, 5001–14

(1998).

127. Midgley, C. A. et al. An N-terminal p14ARF peptide blocks Mdm2-dependent

ubiquitination in vitro and can activate p53 in vivo. Oncogene 19, 2312–23 (2000).

128. Nobori, T. et al. Deletions of the cyclin-dependent kinase-4 inhibitor gene in multiple

human cancers. Nature 368, 753–6 (1994).

129. Weber, J. D. et al. Cooperative signals governing ARF-mdm2 interaction and nucleolar

localization of the complex. Mol. Cell. Biol. 20, 2517–28 (2000).

130. Appella, E. & Anderson, C. W. Post-translational modifications and activation of p53 by

genotoxic stresses. Eur. J. Biochem. 268, 2764–72 (2001).

131. Ashcroft, M., Kubbutat, M. H. & Vousden, K. H. Regulation of p53 function and stability

by phosphorylation. Mol. Cell. Biol. 19, 1751–8 (1999).

132. Shieh, S. Y., Ikeda, M., Taya, Y. & Prives, C. DNA damage-induced phosphorylation of

p53 alleviates inhibition by MDM2. Cell 91, 325–34 (1997).

133. Shieh, S. Y., Ahn, J., Tamai, K., Taya, Y. & Prives, C. The human homologs of

checkpoint kinases Chk1 and Cds1 (Chk2) phosphorylate p53 at multiple DNA damage-

inducible sites. Genes Dev. 14, 289–300 (2000).

134. Chehab, N. H., Malikzay, A., Stavridi, E. S. & Halazonetis, T. D. Phosphorylation of Ser-

20 mediates stabilization of human p53 in response to DNA damage. Proc. Natl. Acad.

Sci. U. S. A. 96, 13777–82 (1999).

135. Unger, T. et al. Critical role for Ser20 of human p53 in the negative regulation of p53 by

Mdm2. EMBO J. 18, 1805–14 (1999).

136. Tibbetts, R. S. et al. A role for ATR in the DNA damage-induced phosphorylation of p53.

Genes Dev. 13, 152–7 (1999).

137. Canman, C. E. et al. Activation of the ATM kinase by ionizing radiation and

phosphorylation of p53. Science 281, 1677–9 (1998).

138. Lu, X. & Lane, D. P. Differential induction of transcriptionally active p53 following UV

or ionizing radiation: defects in chromosome instability syndromes? Cell 75, 765–78

(1993).

139. el-Deiry, W. S., Kern, S. E., Pietenpol, J. A., Kinzler, K. W. & Vogelstein, B. Definition

of a consensus binding site for p53. Nat. Genet. 1, 45–9 (1992).

140. Cawley, S. et al. Unbiased mapping of transcription factor binding sites along human

chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116,

499–509 (2004).

141. Wei, C.-L. et al. A global map of p53 transcription-factor binding sites in the human

genome. Cell 124, 207–19 (2006).

142. O’Connor, P. M. et al. Characterization of the p53 tumor suppressor pathway in cell lines

of the National Cancer Institute anticancer drug screen and correlations with the growth-

inhibitory potency of 123 anticancer agents. Cancer Res. 57, 4285–300 (1997).

143. Kastan, M. B. et al. A mammalian cell cycle checkpoint pathway utilizing p53 and

GADD45 is defective in ataxia-telangiectasia. Cell 71, 587–97 (1992).

144. el-Deiry, W. S. et al. WAF1/CIP1 is induced in p53-mediated G1 arrest and apoptosis.

Cancer Res. 54, 1169–74 (1994).

145. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–

674 (2011).

146. Middleton, G., Cox, S. W., Korsmeyer, S. & Davies, A. M. Differences in bcl-2- and bax-

independent function in regulating apoptosis in sensory neuron populations. Eur. J.

Neurosci. 12, 819–27 (2000).

147. Shi, Y. Mechanisms of caspase activation and inhibition during apoptosis. Mol. Cell 9,

459–70 (2002).

148. Bennett, M. et al. Cell surface trafficking of Fas: a rapid mechanism of p53-mediated

apoptosis. Science 282, 290–3 (1998).

149. Adams, J. M. Ways of dying: multiple pathways to apoptosis. Genes Dev. 17, 2481–95

(2003).

150. Miyashita, T. & Reed, J. C. Tumor suppressor p53 is a direct transcriptional activator of

the human bax gene. Cell 80, 293–9 (1995).

151. Oda, E. et al. Noxa, a BH3-only member of the Bcl-2 family and candidate mediator of

p53-induced apoptosis. Science 288, 1053–8 (2000).

152. Nakano, K. & Vousden, K. H. PUMA, a novel proapoptotic gene, is induced by p53. Mol.

Cell 7, 683–94 (2001).

153. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat.

Rev. Genet. 7, 85–97 (2006).

154. Gault, J. et al. Comparison of polymorphisms in the alpha7 nicotinic receptor gene and its

partial duplication in schizophrenic and control subjects. Am. J. Med. Genet. B.

Neuropsychiatr. Genet. 123B, 39–49 (2003).

155. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science

305, 525–8 (2004).

156. Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36,

949–51 (2004).

157. Macdonald, J. R., Ziman, R., Yuen, R. K. C., Feuk, L. & Scherer, S. W. The Database of

Genomic Variants: a curated collection of structural variation in the human genome.

Nucleic Acids Res. 42, D986–92 (2014).

158. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human

genome. Nature 464, 704–12 (2010).

159. Hastings, P. J., Lupski, J. R., Rosenberg, S. M. & Ira, G. Mechanisms of change in gene

copy number. Nat. Rev. Genet. 10, 551–64 (2009).

160. Lovett, S. T., Hurley, R. L., Sutera, V. A., Aubuchon, R. H. & Lebedeva, M. A. Crossing

over between regions of limited homology in Escherichia coli. RecA-dependent and

RecA-independent pathways. Genetics 160, 851–9 (2002).

161. Liskay, R. M., Letsou, A. & Stachelek, J. L. Homology requirement for efficient gene

conversion between duplicated chromosomal sequences in mammalian cells. Genetics

115, 161–7 (1987).

162. Reiter, L. T. et al. Human meiotic recombination products revealed by sequencing a

hotspot for homologous strand exchange in multiple HNPP deletion patients. Am. J. Hum.

Genet. 62, 1023–33 (1998).

163. Neale, M. J. & Keeney, S. Clarifying the mechanics of DNA strand exchange in meiotic

recombination. Nature 442, 153–8 (2006).

164. Sonoda, E. et al. Rad51-deficient vertebrate cells accumulate chromosomal breaks prior to

cell death. EMBO J. 17, 598–608 (1998).

165. Pentao, L., Wise, C. A., Chinault, A. C., Patel, P. I. & Lupski, J. R. Charcot-Marie-Tooth

type 1A duplication appears to arise from recombination at repeat sequences flanking the

1.5 Mb monomer unit. Nat. Genet. 2, 292–300 (1992).

166. Chen, K. S. et al. Homologous recombination of a flanking repeat gene cluster is a

mechanism for a common contiguous gene deletion syndrome. Nat. Genet. 17, 154–63

(1997).

167. Llorente, B., Smith, C. E. & Symington, L. S. Break-induced replication: what is it and

what is it for? Cell Cycle 7, 859–64 (2008).

168. Lieber, M. R. The mechanism of double-strand DNA break repair by the nonhomologous

DNA end-joining pathway. Annu. Rev. Biochem. 79, 181–211 (2010).

169. Toffolatti, L. et al. Investigating the mechanism of chromosomal deletion:

characterization of 39 deletion breakpoints in introns 47 and 48 of the human dystrophin

gene. Genomics 80, 523–30 (2002).

170. Lee, J. A., Carvalho, C. M. B. & Lupski, J. R. A DNA replication mechanism for

generating nonrecurrent rearrangements associated with genomic disorders. Cell 131,

1235–47 (2007).

171. Van Binsbergen, E. Origins and breakpoint analyses of copy number variations: up close

and personal. Cytogenet. Genome Res. 135, 271–6 (2011).

172. Bruder, C. E. G. et al. Phenotypically concordant and discordant monozygotic twins

display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82, 763–71

(2008).

173. Stranger, B. E. et al. Relative impact of nucleotide and copy number variation on gene

expression phenotypes. Science 315, 848–53 (2007).

174. Inoue, K. & Lupski, J. R. Molecular mechanisms for genomic disorders. Annu. Rev.

Genomics Hum. Genet. 3, 199–242 (2002).

175. Merla, G. et al. Submicroscopic deletion in patients with Williams-Beuren syndrome

influences expression levels of the nonhemizygous flanking genes. Am. J. Hum. Genet. 79,

332–41 (2006).

176. Howard, R. O., Breg, W. R., Albert, D. M. & Lesser, R. L. Retinoblastoma and

chromosome abnormality. Partial deletion of the long arm of chromosome 13. Arch.

Ophthalmol. 92, 490–3 (1974).

177. Orye, E., Delbeke, M. J. & Vandenabeele, B. Retinoblastoma and long arm deletion of

chromosome 13. Attempts to define the deleted segment. Clin. Genet. 5, 457–64 (1974).

178. Krepischi, A. C. V., Pearson, P. L. & Rosenberg, C. Germline copy number variations and

cancer predisposition. Future Oncol. 8, 441–50 (2012).

179. Venkatachalam, R. et al. Identification of candidate predisposing copy number variants in

familial and early-onset colorectal cancer patients. Int. J. Cancer 129, 1635–42 (2011).

180. Krepischi, A. C. et al. Germline DNA copy number variation in familial and early-onset

breast cancer. Breast Cancer Res. 14, R24 (2012).

181. Yang, X. R. et al. Duplication of CXC chemokine genes on chromosome 4q13 in a

melanoma-prone family. Pigment Cell Melanoma Res. 25, 243–7 (2012).

182. Cho, H.-J. et al. Glutathione-S-transferase genotypes influence the risk of chemotherapy-

related toxicities and prognosis in Korean patients with diffuse large B-cell lymphoma.

Cancer Genet. Cytogenet. 198, 40–6 (2010).

183. Gamazon, E. R., Huang, R. S., Dolan, M. E. & Cox, N. J. Copy number polymorphisms

and anticancer pharmacogenomics. Genome Biol. 12, R46 (2011).

184. Thomas, G. et al. A multistage genome-wide association study in breast cancer identifies

two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). Nat. Genet. 41, 579–84 (2009).

185. Hodgson, S. V., Fagg, N. L., Talbot, I. C. & Wilkinson, M. Deletions of the entire APC

gene are associated with sessile colonic adenomas. J. Med. Genet. 31, 426 (1994).

186. Lucito, R. et al. Copy-number variants in patients with a strong family history of

pancreatic cancer. Cancer Biol. Ther. 6, 1592–9 (2007).

187. Balentien, E., Mufson, B. E., Shattuck, R. L., Derynck, R. & Richmond, A. Effects of

MGSA/GRO alpha on melanocyte transformation. Oncogene 6, 1115–24 (1991).

188. Wang, J. M., Taraboletti, G., Matsushima, K., Van Damme, J. & Mantovani, A. Induction

of haptotactic migration of melanoma cells by neutrophil activating protein/interleukin-8.

Biochem. Biophys. Res. Commun. 169, 165–70 (1990).

189. Yoshihara, K. et al. Germline Copy Number Variations in BRCA1-Associated Ovarian

Cancer Patients. Genes. Chromosomes Cancer 50, 167–177 (2011).

190. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–83 (2004).

191. Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: Identifying a common

protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316–23

(2009).

192. Malanga, D. et al. Functional characterization of a rare germline mutation in the gene

encoding the cyclin-dependent kinase inhibitor p27Kip1 (CDKN1B) in a Spanish patient

with multiple endocrine neoplasia-like phenotype. Eur. J. Endocrinol. 166, 551–60

(2012).

193. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations.

Nat. Methods 7, 248–9 (2010).

194. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous

variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–81 (2009).

195. Vasson, A. et al. Custom oligonucleotide array-based CGH: a reliable diagnostic tool for

detection of exonic copy-number changes in multiple targeted genes. Eur. J. Hum. Genet.

21, 977–87 (2013).

196. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409,

860–921 (2001).

197. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–45 (2004).

198. Mardis, E. R. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum.

Genet. 9, 387–402 (2008).

199. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145

(2008).

200. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using

genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–9 (2008).

201. Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast

cancer genomes. Nature 462, 1005–10 (2009).

202. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic

event during cancer development. Cell 144, 27–40 (2011).

203. Lister, R. & Ecker, J. R. Finding the fifth base: genome-wide sequencing of cytosine

methylation. Genome Res. 19, 959–66 (2009).

204. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia

genome. Nature 456, 66–72 (2008).

205. Yoshida, K., Sanada, M. & Ogawa, S. Deep sequencing in cancer research. Jpn. J. Clin.

Oncol. 43, 110–5 (2013).

206. Tiacci, E. et al. BRAF mutations in hairy-cell leukemia. N. Engl. J. Med. 364, 2305–15

(2011).

207. Pasqualucci, L. et al. Inactivating mutations of acetyltransferase genes in B-cell

lymphoma. Nature 471, 189–95 (2011).

208. Agrawal, N. et al. Exome sequencing of head and neck squamous cell carcinoma reveals

inactivating mutations in NOTCH1. Science 333, 1154–7 (2011).

209. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by

multiregion sequencing. N. Engl. J. Med. 366, 883–92 (2012).

210. Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative

breast cancers. Nature 486, 395–9 (2012).

211. Comino-Méndez, I. et al. Exome sequencing identifies MAX mutations as a cause of

hereditary pheochromocytoma. Nat. Genet. 43, 663–7 (2011).

212. Burnichon, N. et al. MAX mutations cause hereditary and sporadic pheochromocytoma

and paraganglioma. Clin. Cancer Res. 18, 2828–37 (2012).

213. DeRycke, M. S. et al. Identification of Novel Variants in Colorectal Cancer Families by

High-Throughput Exome Sequencing. Cancer Epidemiol. Biomarkers Prev. 22, 1239–

1251 (2013).

214. Popova, T. et al. Germline BAP1 mutations predispose to renal cell carcinomas. Am. J.

Hum. Genet. 92, 974–80 (2013).

215. Bogliolo, M. et al. Mutations in ERCC4, Encoding the DNA-Repair Endonuclease XPF,

Cause Fanconi Anemia. Am. J. Hum. Genet. 92, 800–806 (2013).

216. Thompson, E. R. et al. Exome sequencing identifies rare deleterious mutations in DNA

repair genes FANCC and BLM as potential breast cancer susceptibility alleles. PLoS

Genet. 8, e1002894 (2012).

217. Powell, B. C. et al. Identification of TP53 as an acute lymphocytic leukemia susceptibility

gene through exome sequencing. Pediatr. Blood Cancer 60, E1–3 (2013).

218. Chang, V. Y., Federman, N., Martinez-Agosto, J., Tatishchev, S. F. & Nelson, S. F.

Whole exome sequencing of pediatric gastric adenocarcinoma reveals an atypical

presentation of Li-Fraumeni syndrome. Pediatr. Blood Cancer 60, 570–4 (2013).

219. Turner, C. & Hilton-Jones, D. The myotonic dystrophies: diagnosis and management. J.

Neurol. Neurosurg. Psychiatry 81, 358–67 (2010).

220. Seguí, N. et al. Telomere length and genetic anticipation in Lynch syndrome. PLoS One 8,

e61286 (2013).

221. Bozzao, C., Lastella, P. & Stella, A. Anticipation in Lynch syndrome: where we are where

we go. Curr. Genomics 12, 451–65 (2011).

222. Plazzer, J. P. et al. The InSiGHT database: utilizing 100 years of insights into Lynch

syndrome. Fam. Cancer 12, 175–80 (2013).

223. Urso, E. et al. Soft tissue sarcoma and the hereditary non-polyposis colorectal cancer

(HNPCC) syndrome: formulation of an hypothesis. Mol. Biol. Rep. 39, 9307–10 (2012).

224. Wimmer, K. & Etzler, J. Constitutional mismatch repair-deficiency syndrome: have we so

far seen only the tip of an iceberg? Hum. Genet. 124, 105–22 (2008).

225. Berman, D. M. et al. Medulloblastoma growth inhibition by hedgehog pathway blockade.

Science 297, 1559–61 (2002).

226. Lo Muzio, L. Nevoid basal cell carcinoma syndrome (Gorlin syndrome). Orphanet J. Rare

Dis. 3, 32 (2008).

227. Lindström, E., Shimokawa, T., Toftgård, R. & Zaphiropoulos, P. G. PTCH mutations:

distribution and analyses. Hum. Mutat. 27, 215–9 (2006).

228. Takahashi, C. et al. Germline PTCH1 mutations in Japanese basal cell nevus syndrome

patients. J. Hum. Genet. 54, 403–8 (2009).

229. Nagao, K. et al. Entire PTCH1 deletion is a common event in point mutation-negative

cases with nevoid basal cell carcinoma syndrome in Japan. Clin. Genet. 79, 196–8 (2011).

230. Calzada-Wack, J. et al. Unbalanced overexpression of the mutant allele in murine Patched

mutants. Carcinogenesis 23, 727–33 (2002).

231. Kappler, R. et al. Profiling the molecular difference between Patched- and p53-dependent

rhabdomyosarcoma. Oncogene 23, 8785–95 (2004).

232. Nagao, K. et al. Identification and characterization of multiple isoforms of a murine and

human tumor suppressor, patched, having distinct first exons. Genomics 85, 462–71

(2005).

233. Kogerman, P. et al. Alternative first exons of PTCH1 are differentially regulated in vivo

and may confer different functions to the PTCH1 protein. Oncogene 21, 6007–16 (2002).

234. Shimokawa, T., Rahnama, F. & Zaphiropoulos, P. G. A novel first exon of the Patched1

gene is upregulated by Hedgehog signaling resulting in a protein with pathway inhibitory

functions. FEBS Lett. 578, 157–62 (2004).

235. Suzuki, M. et al. Selective haploinsufficiency of longer isoforms of PTCH1 protein can

cause nevoid basal cell carcinoma syndrome. J. Hum. Genet. 57, 422–6 (2012).

236. Minami, M. et al. Germline mutations of the PTCH gene in Japanese patients with nevoid

basal cell carcinoma syndrome. J. Dermatol. Sci. 27, 21–6 (2001).

237. Savino, M. et al. Spectrum of PTCH mutations in Italian nevoid basal cell-carcinoma

syndrome patients: identification of thirteen novel alleles. Hum. Mutat. 24, 441 (2004).

238. Rahnama, F. et al. Inhibition of GLI1 gene activation by Patched1. Biochem. J. 394, 19–

26 (2006).

239. Valdivielso-Ramos, M. et al. Novel mutation in the PTCH1 gene in a patient with Gorlin

syndrome with prominent clinical features. Clin. Exp. Dermatol. 39, 406–7 (2014).

240. Barreto, D. C., Gomez, R. S., Bale, A. E., Boson, W. L. & De Marco, L. PTCH gene

mutations in odontogenic keratocysts. J. Dent. Res. 79, 1418–22 (2000).

241. Lee, Y. et al. Patched2 modulates tumorigenesis in patched1 heterozygous mice. Cancer

Res. 66, 6964–71 (2006).

242. Chung, J. H. & Bunz, F. A loss-of-function mutation in PTCH1 suggests a role for

autocrine hedgehog signaling in colorectal tumorigenesis. Oncotarget 4, 2208–11 (2013).

243. Villani, A. et al. Biochemical and imaging surveillance in germline TP53 mutation

carriers with Li-Fraumeni syndrome: a prospective observational study. Lancet Oncol. 12,

559–67 (2011).

244. Tang, J. Y. et al. Inhibiting the hedgehog pathway in patients with the basal-cell nevus

syndrome. N. Engl. J. Med. 366, 2180–8 (2012).

245. Chang, A. L. S., Atwood, S. X., Tartar, D. M. & Oro, A. E. Surgical excision after

neoadjuvant therapy with vismodegib for a locally advanced basal cell carcinoma and

resistant basal carcinomas in Gorlin syndrome. JAMA Dermatol. 149, 639–41 (2013).

246. Wolfe, C. M., Green, W. H., Cognetta, A. B. & Hatfield, H. K. Basal cell carcinoma

rebound after cessation of vismodegib in a nevoid basal cell carcinoma syndrome patient.

Dermatol. Surg. 38, 1863–6 (2012).

Supplementary Information

Supplementary Materials and Methods

Sanger Genes

Gene Location

ABL1 chr9:133589268-133763060

ABL2 chr1:179068463-179198819

ACSL3 chr2:223725732-223808118

AF15Q14 chr15:40886447-40954881

AF1Q chr1:151032151-151040972

AF3p21 chr3:48711280-48723334

AF5q31 chr5:132211072-132299354

AKAP9 chr7:91570189-91739986

AKT1 chr14:105235689-105262080

AKT2 chr19:40736225-40791265

ALK chr2:29415641-30144432

ALO17 chr17:78234667-78370085

APC chr5:112043218-112181935

ARHGEF12 chr11:120207946-120360645

ARHH chr4:40192613-40245992

ARNT chr1:150782186-150849186

ASPSCR1 chr17:79935426-79975280

ASXL1 chr20:30946147-31027121

ATF1 chr12:51157819-51214906

ATIC chr2:216176679-216214499

ATM chr11:108093559-108239826

BCL10 chr1:85731461-85742587

BCL11A chr2:60678303-60780633

BCL11B chr14:99635627-99737822

BCL2 chr18:60790579-60986613

BCL3 chr19:45251978-45263300

BCL6 chr3:187439165-187463513

BCL7A chr12:122459861-122499948

BCL9 chr1:147013182-147098013

BCR chr22:23522552-23660223

BHD chr17:17115529-17140502

BIRC3 chr11:102188194-102208464

BLM chr15:91260579-91358684

BMPR1A chr10:88516396-88684944

BRAF chr7:140433815-140624564

BRCA1 chr17:41196313-41276132

BRCA2 chr13:32889617-32973809

BRD3 chr9:136895454-136933141

BRD4 chr19:15348302-15391262

BRIP1 chr17:59759985-59940755

BTG1 chr12:92534054-92539673

BUB1B chr15:40453210-40513335

C12orf9 chr12:66500965-66502496

CANT1 chr17:76987799-77005899

CARD11 chr7:2945769-3083579

CARS chr11:3022160-3078671

CBFA2T1 chr8:92971152-93075191

CBFA2T3 chr16:88941267-89043401

CBFB chr16:67063050-67134956

CBL chr11:119076990-119178858

CBLB chr3:105377110-105587887

CBLC chr19:45281126-45303902

CCND1 chr11:69455873-69469241

CCND2 chr12:4382902-4414521

CCND3 chr6:41902672-42016610

CD74 chr5:149781201-149792332

CD79A chr19:42381190-42385438

CD79B chr17:62006098-62009704

CDH1 chr16:68771195-68869444

CDH11 chr16:64980685-65155919

CDK4 chr12:58142054-58146078

CDK6 chr7:92234237-92465941

CDKN2A - p16(INK4a) chr9:21967752-21994490

CDKN2A - p14ARF chr9:21967752-21994491

CDKN2C chr1:51435642-51440307

CDX2 chr13:28536315-28543423

CEBPA chr19:33790847-33793390

CEP1 chr9:123850574-123939886

CHCHD7 chr8:57124315-57131174

CHEK2 chr22:29083731-29137822

CHIC2 chr4:54875958-54930788

CHN1 chr2:175664042-175870170

CIC chr19:42788817-42799949

CLTC chr17:57697050-57774317

CLTCL1 chr22:19166989-19279239

CMKOR1 chr2:237478380-237490992

COL1A1 chr17:48261459-48279000

COPEB chr10:3818189-3827473

COX6C chr8:100890223-100906242

CREB1 chr2:208394616-208470282

CREB3L2 chr7:137559727-137686846

CREBBP chr16:3775058-3930121

CRLF2 chrX:1314887-1331530

CRTC3 chr15:91073198-91188576

CTNNB1 chr3:41240942-41281939

CYLD chr16:50775961-50835846

D10S170 chr10:61548522-61666818

DDB2 chr11:47236493-47260769

DDIT3 chr12:57910373-57914300

DDX10 chr11:108535860-108811646

DDX5 chr17:62494374-62502484

DDX6 chr11:118618473-118661972

DEK chr6:18224400-18264799

DICER1 chr14:95552565-95623759

DUX4 chr4:191005267-191006883

EGFR chr7:55086725-55224642

EIF4A2 chr3:186501256-186507877

ELF4 chrX:129198896-129244688

ELK4 chr1:205585235-205602000

ELKS chr12:1100404-1605099

ELL chr19:18553475-18632937

ELN chr7:73442427-73484234

EML4 chr2:42396490-42559686

EP300 chr22:41488614-41576080

EPS15 chr1:51819935-51985036

ERBB2 chr17:37844393-37884914

ERCC2 chr19:45854649-45873845

ERCC3 chr2:128014866-128051752

ERCC4 chr16:14014014-14046205

ERCC5 chr13:103498174-103528347

ERG chr21:39751952-40033704

ETV1 chr7:13930858-14031050

ETV4 chr17:41605212-41623762

ETV5 chr3:185764111-185826878

ETV6 chr12:11802788-12048323

EVI1 chr3:168801287-168865522

EWSR1 chr22:29663998-29696514

EXT1 chr8:118811602-119124058

EXT2 chr11:44117099-44266979

EZH2 chr7:148504475-148581414

FACL6 chr5:131289152-131347349

FANCA chr16:89803959-89883065

FANCC chr9:97861338-98079991

FANCD2 chr3:10068113-10143614

FANCE chr6:35420138-35434881

FANCF chr11:22644079-22647387

FANCG chr9:35073835-35080013

FBXW7 chr4:153242411-153456172

FCGR2B chr1:161632905-161648442

FEV chr2:219845809-219850379

FGFR1 chr8:38268657-38326352

FGFR1OP chr6:167412816-167454065

FGFR2 chr10:123237845-123357972

FGFR3 chr4:1795039-1810599

FH chr1:241660857-241683085

FIP1L1 chr4:54243820-54326102

FLI1 chr11:128562389-128683161

FLT3 chr13:28577412-28674729

FNBP1 chr9:132649466-132805473

FOXL2 chr3:138663067-138665982

FOXO1A chr13:41129803-41240734

FOXO3A chr6:108881026-109005971

FOXP1 chr3:71004737-71633140

FSTL3 chr19:676389-683392

FUS chr16:31191431-31206190

FVT1 chr18:60994972-61034506

GAS7 chr17:9813926-10101868

GATA1 chrX:48644982-48652715

GATA2 chr3:128198265-128212030

GATA3 chr10:8096667-8117162

GMPS chr3:155588325-155655518

GNAQ chr9:80335200-80646192

GNAS chr20:57414795-57486249

GOLGA5 chr14:93260650-93306304

GOPC chr6:117881435-117923705

GPC3 chrX:132669776-133119673

GPHN chr14:66974125-67648523

GRAF chr5:142150292-142608571

HCMOGT-1 chr17:19990335-20218065

HEAB chr11:57425216-57429336

HEI10 chr14:20779530-20797536

HERPUD1 chr16:56965748-56977793

HIP1 chr7:75163409-75368279

HIST1H4I chr6:27107088-27107457

HLF chr17:53342321-53402426

HLXB9 chr7:156797547-156803347

HMGA1 chr6:34204577-34214007

HMGA2 chr12:66218240-66360068

HNRNPA2B1 chr7:26229557-26240413

HOOK3 chr8:42752033-42885681

HOXA11 chr7:27220777-27224835

HOXA13 chr7:27236499-27239725

HOXA9 chr7:27202058-27205149

HOXC11 chr12:54366910-54370201

HOXC13 chr12:54332576-54340327

HOXD11 chr2:176972084-176974314

HOXD13 chr2:176957532-176960666

HRAS chr11:532243-535550

HRPT2 chr1:193091088-193223940

HSPCA chr14:102547076-102606086

HSPCB chr6:44214849-44221614

IDH1 chr2:209100954-209119806

IDH2 chr15:90627214-90645708

IGH@ chr14:106053226-106054732

IGK@ chr2:89156507-89165894

IGL@ chr22:22516610-22517078

IKZF1 chr7:50344378-50472796

IL2 chr4:123372630-123377650

IL21R chr16:27413483-27463362

IL6ST chr5:55230938-55290821

IRF4 chr6:391752-411442

IRTA1 chr1:157543540-157567870

ITK chr5:156607907-156682109

JAK1 chr1:65298906-65432187

JAK2 chr9:4985245-5128182

JAK3 chr19:17935595-17958841

JAZF1 chr7:27870196-28220437

JUN chr1:59246464-59249785

KDM5A chr12:389223-498620

KDM5C chrX:53220504-53254604

KDM6A chrX:44732423-44971843

KDR chr4:55944427-55991762

KIAA1549 chr7:138516129-138666064

KIT chr4:55524095-55606879

KLK2 chr19:51376689-51383822

KRAS chr12:25358180-25403854

KTN1 chr14:56046925-56151301

LAF4 chr2:100163718-100722045

LASP1 chr17:37026112-37078022

LCK chr1:32716840-32751765

LCP1 chr13:46700059-46756459

LCX chr10:70320117-70454238

LHFP chr13:39917030-40177356

LIFR chr5:38475065-38595507

LMO1 chr11:8245857-8285406

LMO2 chr11:33880125-33913836

LPP chr3:187871663-188608459

LYL1 chr19:13209848-13213681

MADH4 chr18:48556583-48611409

MAF chr16:79627746-79634622

MAFB chr20:39314519-39317876

MALT1 chr18:56338618-56417370

MAML2 chr11:95711440-96076344

MAP2K4 chr17:11924135-12047050

MDM2 chr12:69201971-69239211

MDM4 chr1:204485511-204527247

MDS1 chr3:168801287-169381563

MDS2 chr1:23953824-23967056

MECT1 chr19:18794425-18893142

MEN1 chr11:64570996-64578766

MET chr7:116312459-116438439

MHC2TA chr16:10971055-11018839

MITF chr3:69788586-70017488

MKL1 chr22:40806292-41032690

MLF1 chr3:158288953-158324252

MLH1 chr3:37035268-37092335

MLL chr11:118307205-118395934

MLLT1 chr19:6210393-6279959

MLLT10 chr10:21823102-22032555











































MLLT2 chr4:87856154-88062205

MLLT3 chr9:20344968-20622514

MLLT4 chr6:168227671-168365792

MLLT6 chr17:36861873-36886055

MLLT7 chrX:70315999-70323384

MN1 chr22:28144266-28197486

MPL chr1:43803475-43820134

MSF chr17:75277492-75496676

MSH2 chr2:47630263-47710360

MSH6 chr2:48010221-48034084

MSI2 chr17:55333931-55757299

MSN chrX:64887511-64961792

MTCP1 chrX:154292309-154299547

MUC1 chr1:155158302-155162700

MUTYH chr1:45794915-45806142

MYB chr6:135502453-135540311

MYC chr8:128748315-128753678

MYCL1 chr1:40361096-40367687

MYCN chr2:16080683-16087128

MYH11 chr16:15796994-15950887

MYH9 chr22:36677324-36784063

MYST4 chr10:76586379-76792639

NACA chr12:57106211-57119326

NBS1 chr8:90945564-90996899

NCOA1 chr2:24807346-24993568

NCOA2 chr8:71024267-71316020

NCOA4 chr10:51565108-51590733

NF1 chr17:29421781-29646377

NF2 chr22:29999545-30094583

NFIB chr9:14081842-14398982

NFKB2 chr10:104154229-104162280

NIN chr14:51186482-51297839

NONO chrX:70503042-70521016

NOTCH1 chr9:139388897-139440238

NOTCH2 chr1:120454178-120612276

NPM1 chr5:170814708-170837887

NR4A3 chr9:102584137-102629173

NRAS chr1:115247079-115259515

NSD1 chr5:176560080-176727213

NTRK1 chr1:156785542-156851642

NTRK3 chr15:88419988-88799661










































NUMA1 chr11:71713911-71791573

NUP214 chr9:134000981-134109090

NUP98 chr11:3696241-3819022

NUT chr15:34638066-34649929

OLIG2 chr21:34398239-34401500

OMD chr9:95176528-95186836

P2RY8 chrY:1531466-1606037

PAFAH1B2 chr11:117015000-117047129

PALB2 chr16:23614483-23652678

PAX3 chr2:223064606-223163715

PAX5 chr9:36838531-37034476

PAX7 chr1:18957500-19062631

PAX8 chr2:113973575-114036498

PBX1 chr1:164528802-164821045

PCM1 chr8:17780366-17887455

PCSK7 chr11:117075789-117102811

PDE4DIP chr1:144851428-145076079

PDGFB chr22:39619687-39640957

PDGFRA chr4:55095264-55164411

PDGFRB chr5:149493403-149535422

PER1 chr17:8043789-8055753

PHOX2B chr4:41746100-41750987

PICALM chr11:85668486-85780108

PIK3CA chr3:178866311-178952495

PIK3R1 chr5:67522462-67597647

PIM1 chr6:37137922-37143202

PLAG1 chr8:57073469-57123859

PML chr15:74287014-74340153

PMS1 chr2:190648811-190742354

PMS2 chr7:6012871-6048737

PMX1 chr1:170633313-170708540

PNUTL1 chr22:19701987-19712297

POU2AF1 chr11:111222983-111250157

POU5F1 chr6:31132115-31138451

PPARG chr3:12329349-12475854

PRCC chr1:156737274-156770604

PRDM16 chr1:2985742-3355183

PRF1 chr10:72357105-72362531

PRKAR1A chr17:66508110-66528908

PRO1073 chr11:65265233-65273937

PSIP2 chr9:15464066-15511003











































PTCH1 chr9:98205266-98279247

PTEN chr10:89623195-89728531

PTPN11 chr12:112856536-112947716

RAB5EP chr17:5185558-5289131

RAD51L1 chr14:68286496-69062737

RAF1 chr3:12625102-12705700

RANBP17 chr5:170289022-170727018

RAP1GDS1 chr4:99182527-99365010

RARA chr17:38465423-38513894

RB1 chr13:48877883-49056024

RBM15 chr1:110881945-110889303

RECQL4 chr8:145736667-145743210

REL chr2:61108752-61150178

RET chr10:43572517-43625795

ROS1 chr6:117609530-117747018

RPL22 chr1:6245081-6259679

RPN1 chr3:128338813-128369719

RUNX1 chr21:36160099-36421595

RUNXBP2 chr8:41786998-41909505

SBDS chr7:66452690-66460588

SDH5 chr11:61197597-61214237

SDHB chr1:17345227-17380665

SDHC chr1:161284166-161334533

SDHD chr11:111957571-111966517

SEPT6 chrX:118750911-118827333

SET chr9:131445934-131458674

SETD2 chr3:47057900-47205467

SFPQ chr1:35649203-35658743

SFRS3 chr6:36562090-36572243

SH3GL1 chr19:4360368-4400471

SIL chr1:47715811-47779819

SLC45A3 chr1:205626981-205649630

SMARCA4 chr19:11071598-11172959

SMARCB1 chr22:24129150-24176704

SMO chr7:128828713-128853383

SOCS1 chr16:11348274-11350039

SRGAP3 chr3:9022278-9291311

SS18 chr18:23596219-23670611

SS18L1 chr20:60718822-60757568

SSH3BP1 chr10:27035527-27150016

SSX1 chrX:48114797-48126879










































SSX2 chrX:52725946-52736249

SSX4 chrX:48242968-48271344

STK11 chr19:1205798-1228434

STL chr6:125229394-125284173

SUFU chr10:104263719-104393214

SUZ12 chr17:30264044-30328057

SYK chr9:93564012-93660841

TAF15 chr17:34136488-34174237

TAL1 chr1:47681963-47695443

TAL2 chr9:108424738-108425383

TCEA1 chr8:54879117-54935008

TCF1 chr12:121416549-121440312

TCF12 chr15:57210833-57580712

TCF3 chr19:1609293-1650286

TCL1A chr14:96176305-96180533

TCL6 chr14:96117515-96139789

TET2 chr4:106067943-106200958

TFE3 chrX:48886242-48900990

TFEB chr6:41651716-41703997

TFG chr3:100428160-100467810

TFPT chr19:54610320-54619055

TFRC chr3:195776156-195809032

THRAP3 chr1:36690017-36770955

TIF1 chr7:138145079-138270330

TLX1 chr10:102891061-102897545

TLX3 chr5:170736288-170739137

TMPRSS2 chr21:42836479-42880085

TNFAIP3 chr6:138188581-138204445

TNFRSF17 chr16:12058964-12061924

TNFRSF6 chr10:90750288-90775541

TOP1 chr20:39657462-39753124

TP53 chr17:7571720-7590863

TPM3 chr1:154127780-154164609

TPM4 chr19:16178317-16213813

TPR chr1:186280788-186344457

TRA@ chr14:22748989-22749635

TRB@ chr7:142239528-142251156

TRD@ chr14:22953787-23020068

TRIM27 chr6:28870780-28891768

TRIM33 chr1:114935401-115053781

TRIP11 chr14:92434243-92506403








































TSC1 chr9:135766735-135820020

TSC2 chr16:2097990-2138712

TSHR chr14:81421869-81575291

TTL chr2:113239743-113290218

USP6 chr17:5031687-5078324

VHL chr3:10183319-10193744

WAS chrX:48542186-48549815

WHSC1 chr4:1873123-1983933

WHSC1L1 chr8:38132563-38239790

WRN chr8:30890778-31031276

WT1 chr11:32409325-32457087

WTX chrX:63404998-63425624

XPA chr9:100437192-100459691

XPC chr3:14186650-14220172

ZNF145 chr11:113930431-114121394

ZNF198 chr13:20532810-20665967

ZNF278 chr22:31721791-31742249

ZNF331 chr19:54024177-54083523

ZNF384 chr12:6775644-6798676

ZNF521 chr18:22641888-22932214

ZNF9 chr3:128886659-128902810

ZNFN1A1 chr7:50344378-50472796

Supplementary Table 1- The list of Sanger Cancer Genes as of January 2011, with

genomic coordinates (hg19).
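Each entry in Supplementary Table 1 pairs a gene symbol with a UCSC-style position string. As a minimal sketch, entries of this form could be parsed and their spans computed as shown here. The names `Locus` and `parse_locus` are illustrative and not part of the original analysis, and treating both endpoints as inclusive is an assumption about the coordinate convention.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """One 'GENE chrN:start-end' entry (hypothetical record type)."""
    gene: str
    chrom: str
    start: int
    end: int

    @property
    def span(self) -> int:
        # Assumption: both endpoints are inclusive, as in browser-style positions.
        return self.end - self.start + 1


# Gene symbols may contain characters such as '@' or '-', so match any non-space run.
_LOCUS_RE = re.compile(r"^(\S+)\s+(chr[0-9XYM]+):(\d+)-(\d+)$")


def parse_locus(line: str) -> Locus:
    """Parse a single table line into a Locus, raising ValueError on other input."""
    m = _LOCUS_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a 'GENE chrN:start-end' entry: {line!r}")
    gene, chrom, start, end = m.groups()
    return Locus(gene, chrom, int(start), int(end))


tp53 = parse_locus("TP53 chr17:7571720-7590863")
print(tp53.gene, tp53.chrom, tp53.span)
```

Keeping the chromosome and the two endpoints as separate fields makes later steps, such as sorting by position or testing for overlap with an array call, straightforward.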
























Additional Genes

Gene Location

CDK1 chr10:62538089-62554604

IGF1 chr12:102789645-102874378

CHEK1 chr11:125495036-125546148

DKK1 chr10:54074041-54077416

CDC25C chr5:137620959-137667516

TP53I3 chr2:24300305-24307728

TP73 chr1:3569129-3650467

RAD51 chr15:40987327-41024354

TP63 chr3:189349216-189599284

IGFBP3 chr7:45951850-45960871

GADD45A chr1:68150883-68154019

TIMP3 chr22:33196802-33259027

MMP13 chr11:102813723-102826463

HNRNPC chr14:21677298-21737638

CDKN1A chr6:36644237-36655108

Supplementary Table 2- Additional cell cycle genes identified by Ingenuity Pathway

Analysis, selected on the basis of their relevance to cancer biology and interaction with

p53, with genomic coordinates (hg19).
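The gene coordinates in Supplementary Tables 1 and 2 can be compared against the regions reported in the CGH results later in this appendix. The sketch below shows one way to count shared positions between a gene and a called region. The function name `overlap_bp` and the inclusive-endpoint convention are assumptions made for illustration, not the method used in the thesis.

```python
from typing import Tuple

# (chromosome, start, end); endpoints are assumed to be inclusive.
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Return the number of positions shared by two intervals (0 if none)."""
    if a[0] != b[0]:
        return 0
    lo = max(a[1], b[1])
    hi = min(a[2], b[2])
    return max(0, hi - lo + 1)


igf1 = ("chr12", 102789645, 102874378)    # IGF1, Supplementary Table 2
region = ("chr12", 102790096, 102791526)  # an IGF1 row from the CGH results

print(overlap_bp(igf1, region))
```

A result greater than zero indicates that the called region lies at least partly within the listed gene boundaries.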

qPCR Supplementary Information

qPCR Protocol

Quantitative PCR began with a five-minute hot start at 95 °C (ramp rate 4.4 °C/s). The

following cycle was then repeated 40 times: 95 °C for 10 seconds (ramp rate 4.4 °C/s), 60 °C

for 15 seconds (ramp rate 2.2 °C/s), and 72 °C for 10 seconds (ramp rate 4.4 °C/s). The final

melt consisted of 5 seconds at 95 °C (ramp rate 4.4 °C/s) followed by 60 seconds at 65 °C

(ramp rate 2.2 °C/s).
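The protocol above can be written down as structured data, which makes it easy to check the programmed run length. This is only a sketch: the `Step` record and `hold_seconds` helper are hypothetical names, and the total counts hold times only, since the time spent ramping between temperatures depends on the instrument.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One programmed step: target temperature, hold time, and ramp rate."""
    temp_c: float
    hold_s: int
    ramp_c_per_s: float


# Values transcribed from the protocol text.
HOT_START = [Step(95, 300, 4.4)]
CYCLE = [Step(95, 10, 4.4), Step(60, 15, 2.2), Step(72, 10, 4.4)]
N_CYCLES = 40
MELT = [Step(95, 5, 4.4), Step(65, 60, 2.2)]


def hold_seconds(steps: List[Step], repeats: int = 1) -> int:
    """Sum of hold times for a block of steps, repeated `repeats` times."""
    return repeats * sum(s.hold_s for s in steps)


total = hold_seconds(HOT_START) + hold_seconds(CYCLE, N_CYCLES) + hold_seconds(MELT)
print(total, "s of programmed hold time, excluding ramps")
```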


qPCR Primers

PTCH-3-F AACATTTGGCCATCTTGTCC

PTCH-3-R AAGGCTTTGAGAATGCAAGC

PTCH-6-F CTTCTCCTCCTCCTCCGTCT

PTCH-6-R CTGACAGGTCCTGCCTATGG

PTCH-7-F GGGCTTGGATTTCACATCA

PTCH-7-R AAAAGACGACAGGGGAGACA

PTCH-17-F CGAGGTTCGCTGCTTTTAAT

PTCH-17-R ACTCCTCCCTTCTGCTTCGT

PTCH-20-F TTCAGCTTCCACATGCTGTC

PTCH-20-R CCCCGCTGGTTTCTTATTTA

PTCH-24-F GAGGCTGGAGTCGGAGAACT

PTCH-24-R TACCATGCAGTCCACTGTCC

MYCN-8-F ACCCTCGTAGCTCGCACTTA

MYCN-8-R GGTAGTCCGAAGGTGCAAAA

FOXP2-F TGCTAGAGGAGTGGGACAAGTA

FOXP2-R GAAGCAGGACTCTAAGTGCAGA

MSH2-7-F AATTCAAAGAGGAGGAATTCTGA

MSH2-7-R CCATGTACCTGATTCTCCATTTC

EXT1-1-F AGAACGGTGGGATACAGCAC

EXT1-1-R TAGCAGCTTAGCCCGTTTGT

EXT1-3-F GCACAAAGGGTTGGAGAGAA

EXT1-3-R GCGTGCTCAACAGTAAGCAG

PCDH15-1-F ACCAGGCAGAAAGCTGAAAA

PCDH15-1-R CAATGGGCTTTCTGGGAGTA

PCDH15-6-F AACCCATACTGGAGGGGAAT

PCDH15-6-R TGATGAAGAAAGTCGGAATGG

BCMA-F GCCTCGAGTACACGGTGGAA

BCMA-R AGCAGCTGGCAGGCTCTTG

DICER1-1-F GCATGAGAGCGAGCCTGT

DICER1-1-R CAACGCCAAGGTCCAGTC

DICER1-9-F GGAGGCCTGAAAGGGTAAAT

DICER1-9-R TGGGTCCTTTCTTTGGACTG

Supplementary Table 3- The primary primers used for array validation by quantitative

PCR.
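Basic properties of the primers in Supplementary Table 3 can be checked with a few lines of code. The sketch below computes length, GC content, and a rough melting temperature using the Wallace rule, Tm = 2(A+T) + 4(G+C). This rule is a coarse approximation for short oligonucleotides and is used here purely as an illustration; the thesis does not state how the primers were designed or evaluated.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (illustrative only)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


primer = "AACATTTGGCCATCTTGTCC"  # PTCH-3-F from Supplementary Table 3
print(len(primer), gc_percent(primer), wallace_tm(primer))
```

For PTCH-3-F this gives a 20-mer with 45% GC and a Wallace estimate of 58 °C.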


PTCH Ex 01F TGG AAG GCG CAG GGT CTG ACT

PTCH Ex 01R CGA TCC CAA AGA GTT AGA GGA

PTCH Ex 02F CTG CGG CCC GGC TTT ATG AC

PTCH Ex 02R GCG CCC AAA CAA TAA ACA AT

PTCH Ex 03F ACT GCT CAC ACA TCA GCC AGT CTC AT

PTCH Ex 03R GCA TTT CCA GGG CAA CTT CAT TTA CTA

PTCH Ex 04–05F CAA GCT TGC TGG GTC TCT ACT T

PTCH Ex 04–05R CCC GAC TAT TCA CTC AAA AAA TGC ACA

PTCH Ex 06F ATT TGT TTT GAT GCC AGA GTC CCA GA

PTCH Ex 06R GGC TAA TGG GAG GTG TAT GGC AAA TC

PTCH Ex 07F AAG ATT TGC CAT ACA CCT CCC ATT AGC

PTCH Ex 07R AAT TCC CCA CAA GGT GCT TTT TCA A

PTCH Ex 08F GGA AAC ATG TGC TCA CAG AGA AGG AAA

PTCH Ex 08R CCA GAA TTG CAA TGT TTT GAA

PTCH Ex 09F CCC TGC CCT GGA ATC ACG TAG AAC

PTCH Ex 09R CTC TCT GTC CTG GAT GCA CA

PTCH Ex 10F TTT GCC GTT TGC CTA CCT TTG ACT C

PTCH Ex 10R GCA TTC CCC TGA AAC CAG TA

PTCH Ex 11F AGG TGC TGG TGG CAG AGT CCT AAC TA

PTCH Ex 11R GCA GCC AGT GAC ACA TCA TCT GAC AT

PTCH Ex 12F CTG CCA CGT ATC TGC TCA CAC AGT C

PTCH Ex 12R CAC CCA GTT AAA CAG AGC CTC AAA CAC

PTCH Ex 13F CAC GGT TTC AAA TGC TTC AAG AGG A

PTCH Ex 13R CAA ACC CCG TTA CCC ACA TTC CTT

PTCH Ex 14F CAG GCG ATG AAC CAG GTG ATG TTA T

PTCH Ex 14R GAA GCA ATC TGA TGA ACT CCA AAG GTT

PTCH Ex 15F AGT GTG GTG GTG AAA ACA AGG

PTCH Ex 15R GCT GCT GCA GAA ACA GTT CA

PTCH Ex 16F GGG ACA CAG AGG GTG TGT TT

PTCH Ex 16R CCA GTG CCT TAG GTC TCC AG

PTCH Ex 17F GCC AGT GAT TGC ATC CTC CGA TAA

PTCH Ex 17R GGG GGT TGT ATC CCA TTA CA

PTCH Ex 18F CCT CAC AAA GAA TGA CTG CTG GAA GAT

PTCH Ex 18R CCA GAG GCC CAG ACA TAA ACA AAA CTT

PTCH Ex 19F AAG GTT CCC ACT TGG AGA CAA ACA GAG

PTCH Ex 19R TGA ATT AGG CAG TAA AGG CAG TGT CCA

PTCH Ex 20F TAC GTC AAC ACC AAA TAT GAC CCA GTG

PTCH Ex 20R TCT GCC TCA GCC TCC CAA GTA GC

PTCH Ex 21F GGG GTG GGT TTT GTT CAT TT


PTCH Ex 21R AGC CAG TAC ACC GAA GAG GA

PTCH Ex 22F CCC CTG AAA AAT ACC GTG CTT TGA G

PTCH Ex 22R ATC TGC CTG TGT GAT GTG CTG CTC

PTCH Ex 23F GGG TTG ACT GAG TCT TTG GTG AAA CC

PTCH Ex 23R TAA AAG GTC ACT GGG GTC CA

ex1b-1-F CGC GGA CTC ACA ATT ACA AG

ex1b-1-R CGA GCA CAA GGT GGA GAA G

ex1b-2-F GGG CTT GGA TTT CAC ATC A

ex1b-2-R CTG ACA GGT CCT GCC TAT GG

Supplementary Table 4- The primers used for sequencing analysis of PTCH1.
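The sequencing primers are written in space-separated triplets, whereas the qPCR primers in Supplementary Table 3 are written without spaces. A small sketch, with the hypothetical helper `normalize`, shows how the two notations could be brought to a common form and compared. Only a handful of primers from the two tables are included here.

```python
def normalize(seq: str) -> str:
    """Remove all whitespace and upper-case a primer sequence."""
    return "".join(seq.split()).upper()


# A subset of primers copied from Supplementary Tables 3 and 4.
qpcr = {
    "PTCH-7-F": "GGGCTTGGATTTCACATCA",
    "PTCH-6-R": "CTGACAGGTCCTGCCTATGG",
}
sequencing = {
    "ex1b-1-F": "CGC GGA CTC ACA ATT ACA AG",
    "ex1b-2-F": "GGG CTT GGA TTT CAC ATC A",
    "ex1b-2-R": "CTG ACA GGT CCT GCC TAT GG",
}

# Index qPCR primers by sequence, then look up each sequencing primer.
by_seq = {normalize(s): name for name, s in qpcr.items()}
shared = {
    name: by_seq[normalize(s)]
    for name, s in sequencing.items()
    if normalize(s) in by_seq
}
print(shared)
```

With these inputs, ex1b-2-F matches PTCH-7-F and ex1b-2-R matches PTCH-6-R, while ex1b-1-F has no counterpart in the subset.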


Supplementary Results

Regions of Interest after CGH array analysis
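Each row of the table under this heading lists a chromosome, start and stop positions, an ID, the alteration type, the length in base pairs, the number of probes, a mean value, a P-value, and, where available, the gene or genes involved. The following sketch parses one such row and checks that the Length column equals Stop minus Start. The `Region` record and `parse_region` function are hypothetical names, and no interpretation of the ID or Mean columns is assumed.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """One row of the regions-of-interest table (hypothetical record type)."""
    chrom: str
    start: int
    stop: int
    ident: int
    alteration: str
    length_bp: int
    probes: int
    mean: float
    p_value: float
    genes: Optional[str]


def parse_region(line: str) -> Region:
    """Parse a whitespace-separated row; the gene column may be absent."""
    f = line.split()
    if len(f) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(f)}: {line!r}")
    genes = " ".join(f[9:]) or None
    return Region(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                  int(f[5]), int(f[6]), float(f[7]), float(f[8]), genes)


row = parse_region("chr2 237490104 237490746 11 Gain 642 5 3.440 0.005955 CXCR7")
# Consistency check: the reported length should equal stop - start.
print(row.genes, row.length_bp == row.stop - row.start)
```

Rows whose gene cell was left empty or merged with neighbouring rows parse with `genes` set to `None`, so they can be flagged for manual review.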

Chromosome Start Stop ID Alteration Length(bp) Probes Mean P-value Gene(s)

chr2 237490104 237490746 11 Gain 642 5 3.440 0.005955 CXCR7

chr2 237490104 237490896 12 Gain 792 7 3.174 0.005593 CXCR7

chr2 223086061 223089609 11 Gain 3548 4 3.682 0.004443 PAX3

chr2 223086061 223089609 15 Gain 3548 4 3.607 0.008166 PAX3

chr2 208464673 208464975 5 Loss 302 4 1.194 0.001805 CREB1

chr2 208464598 208464975 1 Loss 377 5 1.309 0.003628 CREB1

chr2 208464598 208464975 4 Loss 377 5 1.207 0.000382 CREB1

chr2 208464598 208464975 6 Loss 377 5 1.238 0.001706 CREB1

chr2 208462804 208463413 4 Loss 609 4 1.158 0.007531 CREB1

chr2 208462406 208462653 27 Gain 247 3 2.889 0.005166 CREB1

chr1 205632173 205632709 12 Loss 536 14 1.425 0.000641 SLC45A3

chr1 205632173 205632709 16 Loss 536 14 1.401 0.001514 SLC45A3

chr1 205632104 205632709 14 Loss 605 15 1.419 0.001884 SLC45A3

chr4 191002380 191006727 1 Gain 4347 43 3.165 2.97E-16

DUX4L4/L6/L5/L4/L2/DUX2

chr4 191002380 191006727 2 Gain 4347 43 4.151 7.59E-17


chr4 191002380 191006727 4 Gain 4347 43 3.095 5.20E-19


chr4 191002380 191006727 5 Gain 4347 43 3.054 1.29E-15


chr4 191002380 191006727 6 Gain 4347 43 3.187 9.79E-16


chr4 191002380 191006727 7 Gain 4347 43 2.982 3.35E-12

DUX4L4/L6/L5/L4/L2/DUX2

chr4 191002380 191006727 8 Gain 4347 43 2.975 4.78E-15


chr4 191002380 191006727 18 Gain 4347 43 2.779 1.60E-08


chr4 191002380 191006727 19 Gain 4347 43 2.756 1.42E-08


chr4 191002380 191006727 20 Gain 4347 43 2.725 1.77E-08


chr4 191002380 191006727 22 Gain 4347 43 2.547 4.78E-06


chr4 191001678 191006727 24 Gain 5049 47 2.660 2.74E-09


chr4 190961628 191006727 3 Gain 45099 62 3.661 8.48E-29


chr4 190961628 191002380 9 Loss 40752 19 1.301 1.08E-05


chr4 190961628 191002380 12 Loss 40752 19 1.485 0.002982


chr1 186303920 186304502 10 Loss 582 3 1.336 0.007518 TPR

chr1 186303482 186304502 1 Loss 1020 6 1.215 0.000734 TPR

chr1 186303036 186304502 3 Loss 1466 7 1.295 0.003576 TPR

chr1 186303036 186304502 5 Loss 1466 7 1.291 0.000821 TPR

chr1 186303036 186304502 6 Loss 1466 7 1.226 0.000128 TPR

chr1 186303036 186304502 18 Loss 1466 7 1.390 0.002018 TPR

chr1 186302377 186304502 4 Loss 2125 11 1.387 0.003783 TPR

chr2 175664003 175664853 3 Loss 850 10 1.257 0.007548 CHN1


chr2 175664003 175664853 4 Loss 850 10 1.273 0.003226 CHN1

chr2 175664003 175664853 5 Loss 850 10 1.143 0.001622 CHN1

chr2 175664003 175664853 6 Loss 850 10 1.299 0.009763 CHN1

chr2 175664003 175664853 22 Loss 850 10 1.447 0.007578 CHN1

chr2 175664003 175664351 26 Loss 348 4 1.273 0.004103 CHN1

chr2 175868293 175869966 23 Gain 1673 7 3.181 0.003828 CHN1

chr2 175872476 175872798 21 Gain 322 3 2.912 0.003915 none

chr3 174883953 175144278 11 Gain 260325 5 3.524 0.001884 NAALADL2

chr3 174758301 175196598 12 Gain 438297 10 3.000 0.000455 NAALADL2

chr5 170736166 170738656 3 Gain 2490 28 2.657 0.000247 TLX3

chr5 170736166 170738656 5 Gain 2490 28 2.532 0.003525 TLX3

chr3 168802621 168803008 11 Loss 387 8 1.387 0.005094 MECOM

chr3 168802621 168803008 14 Loss 387 8 1.356 0.00503 MECOM

chr3 168802621 168803008 34 Loss 387 8 1.297 0.000575 MECOM

chr3 168802621 168803008 36 Loss 387 8 1.345 0.00317 MECOM

chr6 168276058 168281187 3 Loss 5129 5 1.150 0.008582 MLLT4

chr6 168276058 168281187 5 Loss 5129 5 1.149 0.009663 MLLT4

chr3 162711459 166506831 10 Gain 3795372 106 2.464 3.21E-05 multiple

chr3 162629146 167317738 27 Gain 4688592 128 2.420 0.000139 multiple

chr3 162509444 162615098 5 Loss 105654 3 0.135 0.005344 BC073807

chr3 162509444 162615098 14 Loss 105654 3 0.044 0.009006 BC073807

chr3 162509444 162629146 27 Gain 119702 4 3.472 0.006365 BC073807

chr3 162509444 162629146 33 Gain 119702 4 3.228 0.008951 BC073807

chr1 161334125 161334349 23 Loss 224 4 1.299 0.008194 SDHC

chr1 161333957 161648424 7 Loss 314467 104 1.501 4.26E-06 SDHC

chr1 161333957 161648162 36 Loss 314205 102 1.548 0.000273 SDHC

chr1 161332911 161333287 33 Loss 376 5 1.452 0.006229 SDHC

chr2 154850752 155001418 10 Gain 150666 3 3.246 0.006216 GALNT13

chr2 154850752 155001418 19 Gain 150666 3 3.296 0.001175 GALNT13


chrX 154299505 154299990 9 Loss 485 3 0.975 0.008064 BRCC3

chrX 154299505 154299990 25 Loss 485 3 0.926 0.003345 BRCC3

chr1 150855139 151031795 8 Loss 176656 78 1.580 0.005934 multiple

chr1 150854772 151031795 2 Loss 177023 79 1.503 0.001187 multiple

chr2 150333719 151069169 12 Gain 735450 21 2.823 0.001569 MMADHC

chr2 150333719 151069169 16 Gain 735450 21 2.942 0.002285 MMADHC

chr7 146035975 146149372 3 Loss 113397 3 1.098 0.00022 CNTNAP2

chr7 146035975 146149372 6 Loss 113397 3 1.178 0.003827 CNTNAP2

chr7 146035975 146149372 21 Loss 113397 3 1.273 0.009971 CNTNAP2

chr7 144839854 147499415 27 Gain 2659561 72 2.427 0.00111 many

chr7 144320788 147499415 10 Gain 3178627 86 2.419 0.006028 many

chr7 142252511 142254840 13 Gain 2329 13 2.828 0.003154 TCRB

chr7 142252411 142254840 11 Gain 2429 14 2.959 0.000202 TCRB

chr7 142252411 142254840 14 Gain 2429 14 3.084 6.53E-05 TCRB

chr7 142252411 142254840 35 Gain 2429 14 2.673 0.002056 TCRB

chr5 140308757 140521036 11 Gain 212279 5 3.149 0.004481 many

chr5 140308757 140521036 16 Gain 212279 5 3.683 0.00688 many

chr9 139390347 140447444 10 Loss 1057097 373 1.629 1.42E-06 many

chr9 139287703 141122085 11 Loss 1834382 415 1.644 0.004464 many

chr7 138666279 138666566 27 Loss 287 3 1.343 0.005519 none

chr7 138665432 138666675 35 Loss 1243 7 1.406 0.00762 KIAA1549

chr7 138603448 138603656 11 Gain 208 4 3.380 0.008244 KIAA1549

chr7 138603448 138603656 12 Gain 208 4 3.725 2.81E-05 KIAA1549

chr7 138603448 138603656 20 Gain 208 4 2.891 0.007751 KIAA1549

chr6 138198903 138199701 32 Loss 798 4 1.036 0.008413 TNFAIP3


chr6 138198298 138199701 5 Loss 1403 5 1.274 0.002537 TNFAIP3

chr6 138198298 138199762 24 Loss 1464 6 1.199 0.004298 TNFAIP3

chr4 134925812 135196257 7 Loss 270445 7 1.329 0.002708 PABPC4L

chr4 134925812 135158888 23 Loss 233076 6 1.101 0.000106 PABPC4L

chr4 134925812 135196257 25 Loss 270445 7 1.235 0.00137 PABPC4L

chr9 132689352 132689611 11 Loss 259 5 1.086 0.001401 FNBP1

chr9 132689352 132689611 15 Loss 259 5 1.171 0.001257 FNBP1

chr9 132689352 132689611 16 Loss 259 5 1.120 0.001814 FNBP1

chr9 132689352 132690064 35 Loss 712 6 1.410 0.000806 FNBP1

chr9 132652044 132652975 9 Loss 931 13 1.547 0.000344 FNBP1

chr11 130980227 134934167 10 Gain 3953940 110 2.411 0.0009 many

chr11 130980227 134934167 25 Gain 3953940 110 2.411 0.008527 many

chr11 130623876 134934167 27 Gain 4310291 119 2.393 0.000721 many

chrX 129258582 151940841 10 Gain 22682259 1060 2.340 0.001048 many

chrX 129253954 143171508 27 Gain 13917554 829 2.350 4.93E-05 many

chrX 129244389 129246246 25 Loss 1857 18 1.548 0.00895 ELF4

chrX 129244292 129246323 15 Loss 2031 20 1.288 6.34E-05 ELF4

chrX 129244292 129245170 29 Loss 878 6 1.160 0.00433 ELF4

chrX 129199456 129253954 18 Loss 54498 149 1.619 0.009742 ELF4

chr11 128681934 128682174 3 Loss 240 3 1.129 0.002329 FLI1

chr11 128681934 128682174 6 Loss 240 3 1.100 0.003259 FLI1

chr11 128681934 128682248 29 Loss 314 4 1.192 0.009886 FLI1

chr6 125232156 125233193 15 Loss 1037 5 1.183 0.004529 STL

chr6 125231587 125232804 19 Loss 1217 9 1.372 0.006788 STL

chr12 122459997 122460593 15 Loss 596 3 0.931 0.006011 BCL7A

chr12 122459997 122460593 16 Loss 596 3 0.971 0.005758 BCL7A

chr12 122459997 122460593 29 Loss 596 3 1.022 0.002443 BCL7A

chr12 122459953 122460593 24 Loss 640 4 1.299 0.000634 BCL7A

chr12 122459786 122460593 32 Loss 807 5 1.310 0.006296 BCL7A

chr12 122459786 122460593 34 Loss 807 5 1.004 0.004624 BCL7A

chr12 122459611 122460593 9 Loss 982 6 1.209 0.002536 BCL7A

chr12 122459611 122460593 10 Loss 982 6 1.224 0.001463 BCL7A

chr12 122459611 122460593 13 Loss 982 6 1.337 0.000999 BCL7A


chr12 122459611 122460593 36 Loss 982 6 1.151 0.004283 BCL7A

chr12 122459463 122460593 31 Loss 1130 8 1.294 0.001221 BCL7A

chr12 121403212 121409527 7 Loss 6315 5 1.159 0.004089 HNF1A-AS1

chr12 121380055 121409527 2 Loss 29472 6 1.068 0.000602 HNF1A-AS1

chr12 121380055 121409527 6 Loss 29472 6 1.282 0.006815 HNF1A-AS1

chr12 121380055 121409527 8 Loss 29472 6 1.290 0.004219 HNF1A-AS1

chr1 120611983 120612398 24 Loss 415 5 1.458 0.004214 NOTCH2

chr1 120611983 120612273 31 Loss 290 4 1.253 0.00689 NOTCH2

chr1 120611948 120612205 29 Loss 257 4 1.298 0.001139 NOTCH2

chr1 120611948 120612273 34 Loss 325 5 1.080 0.001001 NOTCH2

chr1 120610364 120612273 9 Loss 1909 6 1.218 0.001734 NOTCH2

chr1 120610364 120612273 32 Loss 1909 6 1.288 0.000888 NOTCH2

chr1 120610364 120612273 35 Loss 1909 6 1.151 0.005866 NOTCH2

chr1 120483081 120483576 19 Loss 495 3 1.226 0.000951 NOTCH2

chr1 120535582 120618462 8 Gain 82880 66 2.501 9.43E-05 NOTCH2

chr1 120531186 120620993 7 Gain 89807 80 2.672 8.92E-09 NOTCH2

chr11 119171049 119171786 3 Loss 737 8 1.270 0.000603 CBL

chr11 119171049 119171713 4 Loss 664 7 1.349 0.003694 CBL

chr11 119171049 119171713 5 Loss 664 7 1.259 0.001221 CBL

chr11 119171049 119171786 6 Loss 737 8 1.311 0.000287 CBL

chr11 119171049 119171713 22 Loss 664 7 1.349 0.000314 CBL

chrX 118827177 118831825 17 Loss 4648 37 1.527 0.008259 SEPT6

chrX 118827177 118831825 18 Loss 4648 37 1.402 8.61E-06 SEPT6

chrX 118827177 118831825 20 Loss 4648 37 1.502 0.002268 SEPT6

chrX 118827177 118831825 29 Loss 4648 37 1.532 0.00827 SEPT6

chrX 118827177 118831825 32 Loss 4648 37 1.480 0.000297 SEPT6

chrX 118826101 118835146 36 Loss 9045 42 1.486 0.000244 SEPT6

chr11 118397166 118420988 29 Loss 23822 4 0.842 0.008443 MLL

chr11 118397053 118459847 1 Loss 62794 6 1.054 0.003435 MLL

chr11 118397053 118459847 4 Loss 62794 6 1.213 0.000649 MLL

chr11 118396953 118397390 5 Loss 437 4 1.011 0.001537 MLL

chr11 118342430 118342622 10 Loss 192 5 1.222 0.000728 MLL

chr11 118342430 118342589 33 Loss 159 4 1.373 0.001653 MLL


chr6 117688820 117703808 2 Gain 14988 14 3.314 0.000335 ROS1

chr6 117687317 117703808 15 Gain 16491 17 2.983 0.004734 ROS1

chr6 117687282 117703808 13 Gain 16526 18 2.647 0.002692 ROS1

chr6 117687282 117703808 14 Gain 16526 18 2.843 0.005736 ROS1

chr6 117687282 117703808 34 Gain 16526 18 3.092 0.000448 ROS1

chr6 117687282 117703808 36 Gain 16526 18 2.953 0.002008 ROS1

chr6 117662305 117662707 22 Loss 402 8 1.284 0.007085 ROS1

chr11 117104101 117104478 13 Loss 377 5 1.249 0.008091 RNF214

chr11 117102951 117103284 31 Loss 333 4 1.116 0.00187 RNF214

chr11 117102951 117104478 35 Loss 1527 16 1.435 0.000982 RNF214

chr11 117102402 117103177 16 Loss 775 5 0.997 0.00949 RNF214

chr11 117102402 117103284 23 Gain 882 6 3.314 0.006161 RNF214

chr2 113992753 113994702 18 Loss 1949 8 1.298 0.001199 PAX8

chr2 113992753 113994702 29 Loss 1949 8 1.354 0.002764 PAX8

chr11 113934697 113934912 31 Loss 215 5 1.239 0.009449 ZBTB16

chr11 113934697 113934912 32 Loss 215 5 1.275 0.007454 ZBTB16

chr5 112176121 112176459 1 Loss 338 5 1.155 0.001088 APC

chr5 112176121 112176459 3 Loss 338 5 1.170 0.000531 APC

chr5 112176121 112176459 4 Loss 338 5 1.235 0.001802 APC

chr5 112176121 112176459 5 Loss 338 5 1.053 0.002266 APC

chr5 112176121 112176459 6 Loss 338 5 1.058 0.001013 APC

chr5 112176121 112176459 8 Loss 338 5 1.152 0.00563 APC


chr5 112175024 112175235 11 Gain 211 4 4.360 0.0064 APC

chr5 112174961 112175235 35 Gain 274 5 3.067 0.006351 APC


chr5 112162551 112162908 32 Loss 357 4 1.149 0.008015 APC


chr5 112110338 112113367 35 Gain 3029 5 2.915 0.009899 APC

chr5 112101667 112103051 26 Gain 1384 5 2.769 0.004348 APC

chr11 111228333 111228655 11 Loss 322 6 1.245 0.009882 POU2AF1

chr11 111224730 111228655 18 Loss 3925 19 1.463 0.001088 POU2AF1

chr11 111224305 111228655 20 Loss 4350 24 1.533 0.006601 POU2AF1


chr12 109084752 109085548 4 Loss 796 3 1.199 0.005051 CORO1C

chr12 109084752 109085548 6 Loss 796 3 1.179 0.007269 CORO1C

chr9 108420581 108421031 1 Loss 450 5 1.277 0.003515 none

chr9 108420581 108421031 3 Loss 450 5 1.141 0.000549 none

chr9 108420581 108421031 4 Loss 450 5 1.260 0.002065 none

chr9 108420581 108421031 5 Loss 450 5 1.025 0.00429 none

chr9 108420581 108421031 6 Loss 450 5 1.099 0.001297 none

chr11 108186585 108186756 19 Loss 171 3 0.984 0.009857 ATM

chr11 108186585 108186756 31 Loss 171 3 1.060 0.00819 ATM

chr11 108089420 108091942 6 Loss 2522 18 1.353 0.001571 none

chr9 105421419 105858176 7 Gain 436757 14 2.716 0.000225 CYLC2

chr9 105421419 105858176 31 Gain 436757 14 2.711 0.003656 CYLC2

chr10 102891538 102892029 9 Loss 491 6 1.244 0.004388 TLX1

chr10 102891349 102894443 23 Gain 3094 15 2.755 0.006672 TLX1

chr12 102790096 102791526 1 Loss 1430 13 1.201 0.000549 IGF1

chr12 102790096 102791351 5 Loss 1255 10 1.163 0.002702 IGF1

chr12 102790096 102791526 6 Loss 1430 13 1.232 0.001004 IGF1

chr9 102582512 102582738 24 Loss 226 3 1.218 0.000404 AK057451

chr9 102580975 102582738 31 Loss 1763 18 1.443 0.004229 AK057451

chr11 102193604 102194890 11 Gain 1286 14 2.895 0.001573 BIRC3

chr11 102193604 102194890 12 Gain 1286 14 2.871 0.001761 BIRC3

chr11 102193604 102193901 13 Gain 297 4 3.226 0.005055 BIRC3

chr11 102183776 102184729 19 Loss 953 5 1.338 0.008037 none

chrX 100509822 100770894 5 Loss 261072 7 1.148 0.005082 many

chr14 99641915 99642386 29 Loss 471 11 1.338 0.000117 BCL11B

chr14 99641915 99642386 32 Loss 471 11 1.410 0.000733 BCL11B

chr4 99174241 99175735 6 Loss 1494 7 1.251 0.004137 none

chr4 99174241 99175735 19 Loss 1494 7 1.313 0.004397 none

chr4 99173676 99176543 4 Loss 2867 9 1.274 0.001636 none


chr9 98278952 98279212 20 Loss 260 5 1.026 0.001646 PTCH1

chr9 98278825 98279278 17 Loss 453 7 1.263 0.001443 PTCH1

chr9 98278825 98279278 18 Loss 453 7 1.247 0.001583 PTCH1

chr9 98278825 98279278 19 Loss 453 7 1.218 0.001483 PTCH1

chr9 98278825 98279212 25 Loss 387 6 1.054 0.001364 PTCH1

chr9 98278825 98279212 27 Loss 387 6 1.157 0.002837 PTCH1

chr9 98278825 98279366 31 Loss 541 8 1.073 0.000213 PTCH1

chr9 98278411 98279366 24 Loss 955 10 1.317 0.000406 PTCH1

chr9 98277653 98279577 9 Loss 1924 13 1.367 1.17E-05 PTCH1

chr9 98277653 98279278 15 Loss 1625 10 1.210 2.49E-05 PTCH1

chr9 98277653 98279577 29 Loss 1924 13 1.399 6.83E-05 PTCH1

chr9 98277653 98279462 32 Loss 1809 12 1.347 2.71E-05 PTCH1

chr9 98277653 98279462 35 Loss 1809 12 1.397 0.000143 PTCH1

chr9 98266417 98269035 5 Gain 2618 7 3.457 0.000376 PTCH1

chr9 98231091 98231571 18 Loss 480 5 1.232 0.004915 PTCH1

chr9 98231091 98231571 19 Loss 480 5 1.376 0.004915 PTCH1

chr9 98231091 98231571 17 Loss 480 5 1.366 0.004915 PTCH1

chr9 98207931 98208554 8 Gain 623 8 2.799 0.004598 PTCH1

chr9 97862625 97862988 6 Gain 363 3 3.929 0.004548 FANCC

chr9 97862625 97862988 24 Gain 363 3 3.235 0.000307 FANCC

chr9 97862625 97862988 31 Gain 363 3 3.435 0.009661 FANCC

chr14 95623818 95624127 9 Loss 309 4 1.201 0.004721 DICER1-AS

chr14 95623818 95624046 15 Loss 228 3 1.019 0.009364 DICER1-AS

chr14 95623818 95624127 31 Loss 309 4 1.215 0.002501 DICER1-AS

chr14 95623642 95624046 35 Loss 404 7 1.348 0.002043 DICER1/DICER1-AS

chr14 95598404 95600396 4 Loss 1992 8 1.279 0.006329 DICER1

chr14 95598404 95600396 5 Loss 1992 8 1.279 0.006329 DICER1

chr14 95597936 95600396 3 Loss 2460 9 1.332 0.000625 DICER1

chr14 93275641 93276233 13 Gain 592 4 3.093 0.001232 GOLGA5

chr14 93275411 93276233 12 Gain 822 5 3.770 9.56E-05 GOLGA5

chr14 92472623 92473426 9 Gain 803 3 3.538 0.009893 TRIP11

chr14 92472259 92474129 34 Gain 1870 9 3.459 0.000284 TRIP11

chr14 92472160 92472623 15 Gain 463 6 3.476 0.006426 TRIP11


chr7 91565310 91567123 3 Loss 1813 5 1.187 4.49E-03 none

chr7 91565310 91567123 4 Loss 1813 5 1.240 0.000216 none

chr7 91565310 91567123 5 Loss 1813 5 1.067 0.003545 none

chr15 91187994 91251111 24 Loss 63117 11 1.403 0.009155 CRTC3

chr15 91166194 91172639 12 Loss 6445 18 1.444 0.006887 CRTC3

chr15 91162997 91174672 11 Loss 11675 26 1.447 0.002801 CRTC3

chr15 91072479 91072705 29 Loss 226 3 1.116 0.002099 none

chr15 91187994 91251111 24 Loss 63117 11 1.403 0.009155 CRTC3

chr8 90999078 91001637 7 Loss 2559 5 1.397 0.00234 none

chr8 90992801 90994324 3 Loss 1523 10 1.035 0.006308 NBN

chr15 90645513 90645860 34 Loss 347 5 1.106 0.004591 IDH2

chr15 90645513 90645860 35 Loss 347 5 1.232 0.003733 IDH2

chr15 90645513 90645860 36 Loss 347 5 1.170 0.006836 IDH2

chr15 88799661 88801392 19 Loss 1731 15 1.452 0.007311 NTRK3

chr15 88799661 88801171 29 Loss 1510 13 1.402 0.00126 NTRK3

chr15 88799661 88801171 32 Loss 1510 13 1.427 0.002543 NTRK3

chr15 88799137 88799732 3 Gain 595 11 2.957 0.000277 NTRK3

chr11 85691995 85692529 22 Gain 534 4 3.071 0.006438 PICALM

chr11 85691414 85692529 8 Gain 1115 5 3.323 0.009989 PICALM

chr14 83864884 88222494 11 Gain 4357610 139 2.460 0.001457 few

chr14 83778865 88131721 10 Gain 4352856 139 2.434 6.38E-05 few

chrX 82066647 118623797 10 Gain 36557150 949 2.387 4.40E-11 many

chrX 81947420 82066647 10 Gain 119227 4 3.509 0.004614 many

chr17 79935484 79935555 35 Loss 71 3 1.072 0.007929 ASPSCR1

chr17 79935311 79936311 14 Loss 1000 7 1.265 0.007006 ASPSCR1

chr17 79935311 79936311 23 Gain 1000 7 3.418 0.002291 ASPSCR1

chr17 79793482 79917933 10 Loss 124451 5 1.227 0.000179 many

chr17 79793482 79917933 11 Loss 124451 5 1.048 3.20E-06 many

chr17 79793482 79917933 12 Loss 124451 5 1.109 0.001028 many

chr16 79638111 79638376 26 Loss 265 3 1.302 0.00668 MAF

chr16 79633758 79635252 29 Loss 1494 16 1.413 0.001956 MAF


chr16 79633722 79634562 35 Loss 840 9 1.361 0.000632 MAF

chr16 79632626 79633583 23 Gain 957 10 3.314 0.001382 MAF

chr16 79628998 79629147 1 Loss 149 3 0.928 0.002068 MAF

chr16 79628998 79629147 2 Loss 149 3 0.867 0.001724 MAF

chr16 79628998 79629147 3 Loss 149 3 0.941 0.006462 MAF

chr16 79628998 79629147 5 Loss 149 3 0.883 0.005759 MAF

chr16 79628998 79629147 6 Loss 149 3 0.833 0.00431 MAF

chr16 79628998 79629147 7 Loss 149 3 0.946 0.003549 MAF

chr16 79628998 79629147 8 Loss 149 3 0.954 0.001386 MAF

chr16 79628998 79629147 17 Loss 149 3 0.937 0.003369 MAF

chr16 79628998 79629147 18 Loss 149 3 0.948 0.005963 MAF

chr16 79628998 79629147 19 Loss 149 3 0.985 0.004854 MAF

chr16 79628998 79629147 20 Loss 149 3 0.933 0.003669 MAF

chr16 78996814 79113461 11 Gain 116647 4 4.307 0.004557 WWOX

chr16 78996814 79113461 15 Gain 116647 4 4.176 0.003799 WWOX

chr16 78996814 79113461 16 Gain 116647 4 3.926 0.004887 WWOX

chr10 76789124 76789420 12 Loss 296 7 1.288 0.000362 KAT6B

chr10 76784545 76784973 5 Loss 428 4 0.924 0.00536 KAT6B

chr10 76647068 76650513 8 Loss 3445 3 1.191 0.002814 KAT6B

chr10 76602742 76603055 34 Loss 313 8 1.405 0.009859 KAT6B

chr10 76584280 76586329 11 Loss 2049 21 1.289 9.69E-09 KAT6B

chr10 76584280 76586329 13 Loss 2049 21 1.419 8.54E-08 KAT6B

chr10 76584280 76586329 15 Loss 2049 21 1.285 5.62E-08 KAT6B

chr10 76584280 76590621 34 Loss 6341 27 1.353 4.49E-06 KAT6B

chr10 76584280 76586236 35 Loss 1956 20 1.443 7.05E-05 KAT6B

chr10 76584123 76586329 9 Loss 2206 23 1.460 3.97E-06 KAT6B

chr10 76584123 76586384 10 Loss 2261 24 1.418 6.16E-09 KAT6B

chr10 76582854 76586236 2 Loss 3382 34 1.469 0.003478 KAT6B

chr10 76582854 76596079 12 Loss 13225 44 1.534 0.002274 KAT6B

chr10 76582854 76586329 14 Loss 3475 35 1.475 0.000201 KAT6B

chr10 76582854 76596079 16 Loss 13225 44 1.519 0.002468 KAT6B

chr10 76582854 76586329 36 Loss 3475 35 1.402 1.10E-07 KAT6B

chr10 76580602 76586384 18 Loss 5782 51 1.511 0.000225 KAT6B

chr10 76580602 76586384 20 Loss 5782 51 1.545 0.002214 KAT6B

chr10 76580602 76586384 29 Loss 5782 51 1.541 0.002281 KAT6B

chr10 76579508 76586236 6 Loss 6728 51 1.556 0.008593 KAT6B

chr10 76579508 76586236 32 Loss 6728 51 1.504 0.000359 KAT6B


chr2 75954855 84884742 10 Gain 8929887 230 2.395 0.000234 many

chr2 75954855 84758934 27 Gain 8804079 227 2.390 7.88E-05 many

chr7 75353402 75367710 22 Gain 14308 11 2.841 7.28E-05 HIP1

chr7 75353402 75367710 31 Gain 14308 11 2.762 7.13E-05 HIP1

chr7 75351575 75370260 17 Gain 18685 29 2.566 0.006613

chr7 73484027 73500121 12 Gain 16094 4 3.204 0.007007 LIMK1

chr7 73484027 73500121 34 Gain 16094 4 3.925 0.003506 LIMK1

chr8 71318608 71318833 8 Loss 225 3 1.162 0.006896 none

chr8 71318608 71319452 17 Loss 844 8 1.337 0.003406 none

chr10 70453336 70453568 23 Loss 232 4 1.123 0.003146 TET1

chr10 70453261 70453645 4 Loss 384 6 1.282 0.005234 TET1

chr10 70438861 70441754 36 Gain 2893 4 4.216 0.009321 TET1

chr10 70330378 70332402 10 Gain 2024 8 2.585 0.00377 TET1

chr14 68944454 68944765 3 Loss 311 4 1.048 0.009975 RAD51B

chr14 68944454 68944646 4 Loss 192 3 1.029 0.002191 RAD51B

chr14 68944372 68944646 1 Loss 274 4 1.109 0.004521 RAD51B

chr14 68944372 68944765 6 Loss 393 5 1.136 0.004907 RAD51B

chr14 68944372 68945429 7 Loss 1057 7 1.339 0.001322 RAD51B

chr14 68944372 68945429 8 Loss 1057 7 1.341 0.000305 RAD51B

chr14 68943769 68945429 18 Loss 1660 8 1.354 0.002342 RAD51B

chr14 68943769 68945429 19 Loss 1660 8 1.389 0.00736 RAD51B

chr5 67594534 67594792 4 Loss 258 3 1.223 0.004717 PIK3R1

chr5 67594138 67594627 34 Gain 489 5 3.940 0.007901 PIK3R1

chr6 65402650 67041053 6 Gain 1638403 43 2.535 0.000711 EYS

chr6 65402650 67041053 8 Gain 1638403 43 2.475 0.001777 EYS

chr16 65158041 65159558 11 Gain 1517 11 2.786 0.002136 none

chr16 65157966 65159672 26 Gain 1706 13 2.512 0.004394 none

chr16 65039485 65155672 10 Gain 116187 103 2.405 0.004262 CDH11

chr16 65038735 65155672 27 Gain 116937 104 2.405 0.000825 CDH11

chr16 64957254 64981120 22 Gain 23866 6 3.234 0.005708 CDH11

chrX 63411823 63412074 10 Loss 251 4 1.444 0.002784 FAM123B

chrX 63411823 63412074 13 Loss 251 4 1.336 0.002817 FAM123B

chrX 63411823 63412074 15 Loss 251 4 1.044 0.003119 FAM123B

chrX 63411823 63412074 35 Loss 251 4 1.390 0.000101 FAM123B

chrX 63405896 63408024 32 Loss 2128 22 1.348 0.007497 FAM123B

chrX 63405136 63429061 18 Loss 23925 158 1.550 3.00E-05 FAM123B

chrX 63347505 63408159 17 Loss 60654 38 1.462 0.003626 FAM123B

chr17 62511609 62522526 18 Loss 10917 4 1.079 0.008661 CEP95

chr17 62511254 62522526 4 Loss 11272 6 1.340 0.003413 CEP95

chr17 62511254 62522526 5 Loss 11272 6 1.179 0.002247 CEP95

chr17 62511254 62522526 6 Loss 11272 6 1.223 0.000405 CEP95

chr7 62280148 62761968 7 Gain 481820 12 3.498 2.59E-08 LOC643955

chr7 62280148 62761968 9 Gain 481820 12 3.442 1.54E-08 LOC643955

chr7 62280148 62761968 33 Gain 481820 12 3.343 2.07E-07 LOC643955

chr10 61551326 61552261 2 Gain 935 13 3.277 9.87E-05 CCDC6

chr10 61551326 61552584 15 Gain 1258 17 3.038 0.000217 CCDC6

chr10 61551326 61552658 16 Gain 1332 18 3.000 0.00018 CCDC6

chr10 61551251 61552156 1 Gain 905 12 2.738 0.002869 CCDC6

chr10 61551251 61552261 6 Gain 1010 14 2.754 0.001063 CCDC6

chr10 61551251 61552349 7 Gain 1098 15 2.879 4.55E-05 CCDC6

chr2 60782857 60783517 3 Loss 660 7 1.224 0.008493 none

chr2 60782857 60783517 6 Loss 660 7 1.132 0.007876 none

chr2 60782857 60783517 18 Loss 660 7 1.111 0.008616 none

chr2 60780953 60782104 10 Loss 1151 14 1.444 0.002589 none

chr2 60780806 60783517 29 Loss 2711 30 1.487 0.00398 none

chr2 60780806 60783283 34 Loss 2477 28 1.456 0.004057 none

chr20 60717737 60719501 35 Loss 1764 9 1.352 0.00097 PSMA7

chr20 60717654 60720735 10 Loss 3081 11 1.451 0.000341 PSMA7

chr20 60717549 60720735 9 Loss 3186 12 1.424 0.000597 PSMA7

chr20 57464204 57467440 10 Loss 3236 6 1.283 0.003726 GNAS

chr20 57463531 57465585 23 Gain 2054 6 3.266 0.007725 GNAS

chr20 57462338 57466660 29 Loss 4322 8 1.307 0.006646 GNAS

chr11 57420512 57423034 4 Loss 2522 7 1.359 0.008527 YPEL4

chr11 57416829 57417952 7 Loss 1123 7 1.261 0.003476 none

chr11 57394769 57428535 29 Loss 33766 62 1.579 0.007805 some

chr7 56427383 62280148 9 Gain 5852765 48 2.501 0.008094 many

chr7 56427383 61896163 10 Gain 5468780 41 2.615 1.27E-06 many

chr4 55605813 55606317 1 Loss 504 4 1.190 0.009231 KIT

chr4 55605813 55606317 3 Loss 504 4 1.217 0.002517 KIT

chr4 55605813 55606317 5 Loss 504 4 1.180 0.008143 KIT

chr4 55564750 55565822 11 Gain 1072 3 4.264 0.002575 KIT

chr4 55564750 55565822 15 Gain 1072 3 4.431 0.006001 KIT

chr4 55564750 55565822 16 Gain 1072 3 4.473 0.007809 KIT

chr12 54329700 54330771 9 Loss 1071 14 1.390 7.47E-07 none

chr12 54327202 54328305 19 Loss 1103 9 1.283 0.004119 none

chr12 54327202 54328305 20 Loss 1103 9 1.242 0.003883 none

chr12 54327049 54328305 17 Loss 1256 10 1.307 0.00719 none

chr17 53400717 53402174 11 Gain 1457 17 2.832 0.000536 HLF

chr17 53400566 53402000 12 Gain 1434 17 2.772 0.001103 HLF

chr7 50757049 54427822 10 Gain 3670773 100 2.422 0.002032 many

chrX 48905186 48909256 2 Loss 4070 7 1.112 0.002019 none

chrX 48904998 48909256 6 Loss 4258 9 1.207 0.006932 none

chrX 48904998 48908799 7 Loss 3801 8 1.188 0.004988 none

chrX 48534355 48912101 36 Loss 377746 263 1.617 0.00154 many

chrX 48534206 48912101 18 Loss 377895 264 1.616 0.000551 many

chr1 47783066 47785951 22 Gain 2885 3 3.336 0.001418 none

chr1 47779551 47787756 23 Gain 8205 16 2.599 0.007209 STIL

chr1 47766997 47767652 4 Loss 655 4 1.266 0.004645 STIL

chr1 47766997 47767652 6 Loss 655 4 1.217 0.006673 STIL

chrX 44968978 44970623 1 Loss 1645 6 1.154 0.006385 KDM6A

chrX 44968978 44970623 5 Loss 1645 6 1.084 0.006717 KDM6A

chrX 44968978 44970623 6 Loss 1645 6 1.200 0.005429 KDM6A

chrX 44732331 44732893 34 Loss 562 7 1.059 0.003575 KDM6A

chrX 44732259 44733036 29 Loss 777 9 1.318 0.003596 KDM6A

chr22 44966582 45106030 22 Gain 139448 5 2.926 0.0033 some

chr22 44966582 45077310 36 Gain 110728 4 3.584 0.009301 some

chr11 44265893 44266309 5 Loss 416 4 1.105 0.004965 EXT2

chr11 44265893 44266309 6 Loss 416 4 1.024 0.002498 EXT2

chr11 44265641 44266491 1 Loss 850 9 1.346 0.003548 EXT2

chr11 44265641 44266491 8 Loss 850 9 1.480 0.005239 EXT2

chr11 44265641 44266491 18 Loss 850 9 1.383 0.007943 EXT2

chr11 44265139 44266309 4 Loss 1170 8 1.314 0.000138 EXT2

chr6 44219058 44219450 36 Gain 392 5 3.122 0.00997 HSP90AB1

chr6 44218906 44219450 12 Gain 544 6 3.069 0.006149 HSP90AB1

chr6 44214034 44214278 31 Loss 244 3 1.135 0.007638 HSP90AB1

chr6 44213099 44214785 29 Loss 1686 19 1.481 0.000869 HSP90AB1

chr11 44117264 44118171 34 Loss 907 9 1.325 0.000627 EXT2

chr11 44117063 44117583 23 Gain 520 6 2.977 0.002585 EXT2

chr19 42792643 42793234 10 Gain 591 3 2.722 0.005704 CIC

chr19 42792643 42793234 13 Gain 591 3 3.108 0.00674 CIC

chr19 42792643 42793234 14 Gain 591 3 4.288 0.005125 CIC

chr4 41750933 41751252 4 Loss 319 5 1.252 0.001866 PHOX2B

chr4 41750933 41751103 5 Loss 170 3 1.125 0.005873 PHOX2B

chr4 41747679 41749990 23 Gain 2311 15 2.743 0.003426 PHOX2B

chr8 41786656 41787274 1 Loss 618 3 1.068 0.000743 KAT6A

chr8 41786656 41787274 3 Loss 618 3 1.101 0.005265 KAT6A

chr8 41786656 41787274 4 Loss 618 3 1.058 0.006428 KAT6A

chr8 41786656 41787274 5 Loss 618 3 1.085 0.005692 KAT6A

chr8 41786656 41787274 6 Loss 618 3 1.135 0.00088 KAT6A

chr22 41532800 41534328 33 Gain 1528 6 2.637 0.005069 EP300

chr22 41532800 41534328 36 Gain 1528 6 3.149 0.004776 EP300

chr3 41268003 41275233 4 Loss 7230 12 1.380 0.008616 CTNNB1

chr3 41240407 41240957 35 Loss 550 8 1.423 0.005777 CTNNB1

chr3 41240315 41240957 34 Loss 642 9 1.365 0.00342 CTNNB1

chr3 41240005 41248200 23 Gain 8195 21 2.528 0.000466 CTNNB1

chr3 41238999 41239508 32 Gain 509 6 3.091 0.008694 none

chr13 41130568 41131189 1 Loss 621 6 1.211 0.005463 FOXO1

chr13 41130568 41130990 3 Loss 422 4 1.126 0.000694 FOXO1

chr13 41130568 41130990 4 Loss 422 4 1.239 0.001462 FOXO1

chr13 41130568 41131189 6 Loss 621 6 1.197 0.003281 FOXO1

chr13 41130568 41131356 32 Loss 788 9 1.460 0.000354 FOXO1

chr15 40903163 40908185 14 Gain 5022 6 3.977 0.009251 CASC5

chr15 40903163 40908185 36 Gain 5022 6 4.017 0.006759 CASC5

chr15 40504159 40504785 18 Gain 626 3 3.273 0.005881 BUB1B

chr15 40491279 40492441 4 Loss 1162 4 1.237 0.00305 BUB1B

chr15 40488777 40492617 8 Loss 3840 16 1.479 0.004244 BUB1B

chr15 40448624 40449550 15 Gain 926 4 3.721 0.003829 BUB1B

chr4 40244820 40245271 18 Loss 451 10 1.293 0.006431 RHOH

chr4 40188577 40189138 5 Loss 561 5 0.953 0.006986 none

chr20 39319565 39319991 2 Gain 426 5 3.718 0.001146 MAFB

chr20 39319565 39320100 3 Gain 535 6 3.430 0.00101 MAFB

chr20 39319565 39320100 7 Gain 535 6 3.004 0.002856 MAFB

chr20 39319565 39319991 8 Gain 426 5 2.892 0.004144 MAFB

chr20 39317467 39318634 13 Loss 1167 12 1.519 0.00289 MAFB

chr20 39316652 39318108 23 Gain 1456 15 3.064 1.06E-05 MAFB

chr20 39314530 39314779 11 Gain 249 4 3.349 0.008663 MAFB

chr5 38504086 38504316 20 Loss 230 3 0.940 0.002747 LIFR

chr5 38503570 38504316 3 Loss 746 4 1.308 0.007276 LIFR

chr5 38480714 38481044 3 Loss 330 4 1.012 0.002428 LIFR

chr5 38480714 38481044 5 Loss 330 4 0.968 0.000257 LIFR

chr5 38480714 38481044 6 Loss 330 4 1.083 0.003317 LIFR

chr8 38325170 38325773 32 Loss 603 6 1.263 0.007659 FGFR1

chr8 38325170 38325866 35 Loss 696 7 1.320 0.007074 FGFR1

chr6 37135956 37136817 6 Loss 861 8 1.229 0.007081 none

chr6 37135956 37136971 8 Loss 1015 10 1.313 0.003396 none

chr6 37133987 37136971 2 Loss 2984 21 1.275 0.000778 none

chr3 37029065 37030954 14 Gain 1889 7 3.271 0.004068 EPM2AIP1

chr3 37029065 37030954 15 Gain 1889 7 3.493 0.000396 EPM2AIP1

chr3 37029065 37030954 16 Gain 1889 7 3.353 0.002423 EPM2AIP1

chr21 36429461 36430121 31 Loss 660 3 1.168 0.007221 none

chr21 36429461 36430121 32 Loss 660 3 1.053 0.008211 none

chr15 34746489 34843587 34 Loss 97098 3 0.876 0.002618 GOLGA8B

chr15 34703790 34843587 10 Loss 139797 4 1.002 0.000789 GOLGA8B

chr6 34202600 34203601 9 Loss 1001 13 1.378 0.000159 none

chr6 34202600 34203752 21 Loss 1152 15 1.507 3.60E-05 none

chr6 34201975 34206285 35 Loss 4310 34 1.506 0.000426 HMGA1

chr6 34178962 34206285 31 Loss 27323 53 1.578 0.004667 HMGA1

chr11 33890209 33890959 29 Loss 750 3 1.105 0.002351 LMO2

chr11 33888981 33891534 31 Loss 2553 11 1.406 0.005257 LMO2

chr22 33245820 33254625 9 Gain 8805 10 2.666 0.006469 TIMP3/SYN3

chr22 33245820 33254625 10 Gain 8805 10 2.692 0.006007 TIMP3/SYN3

chr22 33245820 33254625 15 Gain 8805 10 3.298 0.006141 TIMP3/SYN3

chr22 33245820 33254042 35 Gain 8222 9 2.809 0.003339 TIMP3/SYN3

chr22 33245513 33254625 12 Gain 9112 11 3.204 0.002115 TIMP3/SYN3

chr22 33245513 33254625 16 Gain 9112 11 3.367 0.003912 TIMP3/SYN3

chr22 33245459 33254625 11 Gain 9166 12 2.959 0.006367 TIMP3/SYN3

chr22 33245459 33254625 14 Gain 9166 12 3.002 0.006339 TIMP3/SYN3

chr22 33190929 33192684 15 Gain 1755 10 3.187 0.006475 SYN3

chr13 32972670 32973058 23 Loss 388 6 1.300 0.00089 BRCA2

chr13 32937371 32937878 4 Loss 507 6 1.315 0.005834 BRCA2

chr13 32918410 32920314 11 Gain 1904 4 4.226 0.004112 BRCA2

chr13 32918410 32919215 33 Gain 805 3 2.890 0.004276 BRCA2

chr16 32026576 33785739 21 Gain 1759163 27 2.545 0.000327 many

chr16 31964971 33950850 10 Gain 1985879 33 2.646 0.000171 many

chr16 31964971 34737439 11 Gain 2772468 52 2.585 0.002153 many

chr16 31931207 46699664 27 Gain 14768457 74 2.448 0.000213 many

chr16 31931207 33785739 31 Gain 1854532 29 2.553 0.003347 many

chr16 31931207 34789967 33 Gain 2858760 55 2.466 0.002332 many

chr16 31551608 34789967 20 Gain 3238359 66 2.631 8.21E-05 many

chr16 31202131 31202788 34 Loss 657 8 1.242 4.01E-05 FUS

chr16 31202131 31202788 36 Loss 657 8 1.283 3.03E-05 FUS

chr20 30945886 30947168 29 Loss 1282 10 1.390 0.00707 ASXL1

chr20 30945886 30947168 31 Loss 1282 10 1.378 0.009556 ASXL1

chr8 30938511 30938757 5 Loss 246 3 1.024 0.005635 WRN

chr8 30938511 30938757 9 Loss 246 3 1.059 0.004466 WRN

chr8 30922392 30924381 23 Gain 1989 7 2.767 0.007874 WRN

chr8 30901884 30906067 34 Gain 4183 3 4.223 0.007417 WRN

chr8 30889802 30900493 20 Loss 10691 17 1.401 0.005898 WRN/PURG

chr17 30327236 30335307 15 Gain 8071 7 3.379 0.007832 SUZ12

chr17 30320306 30321287 15 Gain 981 5 3.623 0.008908 SUZ12

chr6 30117279 31132151 23 Loss 1014872 26 1.502 0.005133 many

chr2 29923076 29927835 34 Gain 4759 5 3.598 0.000146 ALK

chr2 29919023 29963483 11 Gain 44460 41 2.639 0.003132 ALK

chr2 29416150 29416245 2 Gain 95 3 4.231 0.000636 ALK

chr2 29415924 29416533 11 Gain 609 10 3.388 0.002639 ALK

chr2 29415924 29416533 12 Gain 609 10 2.996 0.007239 ALK

chr2 29415924 29416533 13 Gain 609 10 2.764 0.001735 ALK

chr2 29415924 29416533 14 Gain 609 10 3.120 0.000704 ALK

chr2 29415924 29416533 15 Gain 609 10 3.437 0.003281 ALK

chr2 29415924 29416533 16 Gain 609 10 3.403 0.001514 ALK

chr17 29558557 29562850 32 Loss 4293 18 1.359 8.06E-05 NF1

chr17 29553605 29562850 36 Loss 9245 53 1.531 0.002408 NF1

chr17 29419697 29420071 11 Loss 374 5 1.128 0.006005 none

chr17 29419697 29420071 15 Loss 374 5 1.164 0.005298 none

chr17 29419697 29420071 36 Loss 374 5 1.114 0.0057 none

chr17 29419415 29419995 2 Loss 580 7 1.012 6.71E-05 none

chr17 29419415 29419995 6 Loss 580 7 1.200 0.000875 none

chr17 29419415 29419995 7 Loss 580 7 1.148 0.000671 none

chr17 29414319 29419995 8 Loss 5676 13 1.380 0.001408 none

chr22 29090584 29091770 32 Loss 1186 8 1.322 0.000717 CHEK2

chr22 29085469 29091770 15 Loss 6301 13 1.256 9.43E-05 CHEK2

chr22 29085469 29091770 34 Loss 6301 13 1.411 0.001665 CHEK2

chr22 29085469 29091803 36 Loss 6334 14 1.202 6.58E-05 CHEK2

chr22 29083868 29093022 3 Gain 9154 22 2.656 0.00022 CHEK2

chr22 28199441 28199793 11 Gain 352 4 3.730 0.004801 none

chr22 28199441 28199885 12 Gain 444 5 3.570 0.005944 none

chr22 28199441 28199885 13 Gain 444 5 3.350 0.004689 none

chr22 28199441 28199885 15 Gain 444 5 3.620 0.009102 none

chr7 27239155 27239928 23 Gain 773 6 3.113 0.00515 HOXA13

chr7 27238954 27240177 29 Loss 1223 11 1.329 0.002147 HOXA13

chr7 27238954 27240177 31 Loss 1223 11 1.408 0.007261 HOXA13

chr7 27238954 27240258 32 Loss 1304 12 1.381 0.001318 HOXA13

chr7 27238954 27240177 35 Loss 1223 11 1.443 0.002307 HOXA13

chr7 27226460 27226753 25 Gain 293 4 3.298 0.006359 none

chr7 27222489 27224501 23 Gain 2012 10 3.085 0.000111 HOXA11

chr7 27213143 27213867 32 Loss 724 15 1.329 0.000107 HOXA10

chr7 27213143 27213786 34 Loss 643 14 1.109 4.81E-06 HOXA10

chr7 27213143 27213786 15 Loss 643 14 1.327 0.00215 HOXA10

chr7 27213143 27213786 33 Loss 643 14 1.523 0.00309 HOXA10

chr7 27213143 27213279 27 Loss 136 4 1.190 0.008506 HOXA10

chr7 27213060 27213786 35 Loss 726 17 1.278 5.10E-07 HOXA10

chr7 27213060 27213786 9 Loss 726 17 1.372 6.41E-06 HOXA10

chr7 27213060 27213411 31 Loss 351 11 1.290 0.000871 HOXA10

chr7 27213013 27213613 23 Gain 600 15 3.410 5.65E-05 HOXA10

chr7 27203163 27206490 23 Gain 3327 35 2.719 3.66E-07 HOXA9

chr6 27107195 27118133 4 Gain 10938 6 3.191 0.001044 HIST1H2BK/HIST1H4I/HIST1H2AH

chr6 27107124 27118133 23 Gain 11009 8 2.632 0.006399 HIST1H2BK/HIST1H4I/HIST1H2AH

chr7 26231477 26231651 33 Loss 174 3 1.303 0.008396 HNRNPA2B1

chr7 26231370 26231651 9 Loss 281 4 1.016 0.002651 HNRNPA2B1

chr7 26231370 26231651 10 Loss 281 4 1.041 0.007384 HNRNPA2B1

chr7 26230320 26231801 3 Loss 1481 16 1.355 0.003216 HNRNPA2B1

chr7 26230320 26231651 5 Loss 1331 14 1.324 0.001054 HNRNPA2B1

chr7 26230320 26231651 6 Loss 1331 14 1.360 0.007961 HNRNPA2B1

chr7 26230320 26231651 25 Loss 1331 14 1.341 0.006597 HNRNPA2B1

chr2 24992633 24992988 8 Loss 355 4 1.255 0.009718 NCOA1

chr2 24992633 24992988 19 Loss 355 4 1.352 0.002698 NCOA1

chr2 24992520 24992988 18 Loss 468 5 1.285 0.004724 NCOA1

chr2 24992429 24992988 29 Loss 559 6 1.379 0.00713 NCOA1

chr2 24992343 24992988 5 Loss 645 7 1.308 0.005693 NCOA1

chr2 24888584 24889057 3 Loss 473 3 0.972 0.000251 NCOA1

chr2 24888584 24889057 18 Loss 473 3 1.161 0.000967 NCOA1

chr16 23652680 23653049 29 Loss 369 5 1.341 0.002629 DCTN5

chr16 23652447 23652967 23 Gain 520 8 2.775 0.008182 PALB2/DCTN5

chr11 22647624 22648313 11 Gain 689 6 3.384 0.002996 none

chr11 22647624 22648313 34 Gain 689 6 3.352 0.00366 none

chr11 22646141 22647191 23 Gain 1050 25 2.484 0.002486 FANCF

chr15 22617694 28534745 2 Gain 5917051 184 2.467 0.001252 many

chr15 22617694 28775354 27 Gain 6157660 186 2.433 1.95E-07 many

chr10 21920380 21924854 2 Gain 4474 4 4.297 0.003262 MLLT10

chr10 21920380 21924854 8 Gain 4474 4 3.222 0.006256 MLLT10

chr10 21920380 21924854 13 Gain 4474 4 3.495 0.005123 MLLT10

chr10 21920380 21924854 25 Gain 4474 4 3.676 0.006732 MLLT10

chr10 21920380 21924854 34 Gain 4474 4 4.571 0.005093 MLLT10

chr10 21833725 21929076 16 Gain 95351 87 2.622 0.000585 MLLT10

chr10 21828980 21929076 12 Gain 100096 91 2.499 0.00403 MLLT10

chr10 21824344 21929076 15 Gain 104732 97 2.533 0.002327 MLLT10

chr15 21587557 22617694 10 Loss 1030137 18 1.355 0.000149 many

chr15 20912596 22617694 18 Loss 1705098 31 1.494 0.002985 many

chr15 20449001 22393843 15 Loss 1944842 32 1.254 9.16E-06 many

chr15 20449001 22617694 32 Loss 2168693 39 1.212 1.17E-12 many

chr15 20449001 22572823 36 Loss 2123822 38 1.278 9.70E-06 many

chr15 20170037 21162008 10 Loss 991971 20 1.239 9.53E-06 many

chr13 20662760 20663013 27 Loss 253 3 1.427 0.005066 ZMYM2

chr13 20565878 20567487 5 Loss 1609 5 1.151 0.003587 ZMYM2

chr17 20133000 20135712 29 Loss 2712 6 1.294 0.004067 SPECC1

chr17 20059546 20107851 35 Loss 48305 8 1.444 0.000348 SPECC1

chr19 18890772 18890951 10 Gain 179 3 3.045 0.002035 CRTC1

chr19 18890772 18890951 11 Gain 179 3 3.737 0.000235 CRTC1

chr19 18890772 18890951 12 Gain 179 3 3.919 0.001162 CRTC1

chr19 18890772 18890951 13 Gain 179 3 3.558 0.005579 CRTC1

chr19 18890772 18890951 34 Gain 179 3 3.808 0.008049 CRTC1

chr19 18890772 18890951 36 Gain 179 3 4.119 0.009941 CRTC1

chr19 18794120 18794911 14 Loss 791 6 1.085 0.000517 CRTC1

chr19 18793710 18794911 9 Loss 1201 7 1.222 0.002819 CRTC1

chr19 18793710 18794581 31 Loss 871 5 1.068 0.009172 CRTC1

chr22 18644898 18851514 23 Gain 206616 4 3.143 0.004955 many

chr22 18644898 18997483 26 Gain 352585 9 2.931 0.00012 many

chr8 17781642 17791750 27 Gain 10108 10 2.842 0.006587 PCM1

chr8 17781642 17793053 34 Gain 11411 12 2.985 0.008356 PCM1

chr8 17781642 17793053 36 Gain 11411 12 2.816 0.001986 PCM1

chr8 17778487 17779712 34 Gain 1225 7 3.060 0.006407 PCM1

chr1 17380591 17381279 31 Loss 688 3 1.097 0.000698 SDHB

chr1 17380591 17381279 32 Loss 688 3 1.057 0.00144 SDHB

chr22 16541382 17291241 19 Gain 749859 14 2.708 0.006046 many

chr22 16516330 17291241 10 Gain 774911 15 2.475 0.001715 many

chr22 16054713 17044026 23 Gain 989313 10 2.665 0.006475 many

chr19 15360021 15360421 11 Gain 400 4 3.460 0.006056 BRD4

chr19 15360021 15360421 15 Gain 400 4 3.773 0.001348 BRD4

chr19 15360021 15360421 33 Gain 400 4 2.934 0.002808 BRD4

chr19 15360021 15360421 34 Gain 400 4 3.617 0.003856 BRD4

chr19 15357864 15358073 21 Loss 209 6 1.360 0.007331 BRD4

chr16 14042120 14042665 1 Loss 545 8 1.077 0.002192 ERCC4

chr16 14042120 14042665 3 Loss 545 8 0.920 0.004714 ERCC4

chr16 14042120 14042665 4 Loss 545 8 1.221 0.000308 ERCC4

chr16 14042120 14042665 5 Loss 545 8 1.053 0.002692 ERCC4

chr16 14042120 14042665 6 Loss 545 8 0.980 0.000256 ERCC4

chr16 14042120 14042665 7 Loss 545 8 1.294 0.004397 ERCC4

chr16 14028134 14029637 4 Loss 1503 13 1.425 0.007459 ERCC4

chr7 14032560 14032858 2 Gain 298 4 4.350 0.006222 none

chr7 14032458 14032934 18 Gain 476 6 3.101 0.004583 none

chr7 14032458 14032858 25 Gain 400 5 3.672 0.005317 none

chr7 13971205 13971595 5 Loss 390 3 1.034 0.009828 ETV1

chr7 13931254 13931718 3 Loss 464 5 1.255 0.001891 ETV1

chr7 13931147 13931718 5 Loss 571 6 1.058 0.003695 ETV1

chr16 11348888 11349213 7 Gain 325 7 2.928 0.004004 SOCS1

chr16 11348791 11349213 5 Gain 422 8 3.051 0.006231 SOCS1

chr16 11348791 11349213 17 Gain 422 8 2.921 0.004535 SOCS1

chr16 11348791 11350125 23 Gain 1334 12 2.972 9.51E-05 SOCS1

chr16 11348710 11360722 34 Loss 12012 61 1.489 0.000464 SOCS1

chr16 11348710 11350221 35 Loss 1511 14 1.342 0.001128 SOCS1

chr19 11071155 11072064 9 Loss 909 5 1.287 0.009205 SMARCA4

chr19 11070022 11072064 23 Gain 2042 6 3.090 0.001212 SMARCA4

chr21 10761494 10958437 9 Loss 196943 7 1.418 0.00706 TPTE

chr21 10761494 10958437 21 Loss 196943 7 1.449 0.002859 TPTE

chr12 9617419 9729896 18 Gain 112477 4 4.681 0.006827 none

chr12 9617419 9700632 29 Gain 83213 3 3.998 0.001311 none

chr21 9412661 14486023 10 Gain 5073362 31 2.641 0.006149 many

chr21 9412661 12798463 27 Gain 3385802 29 2.615 0.000148 many

chr17 7579309 7579579 7 Gain 270 4 45.045 0.001053 TP53

chr17 7578179 7578520 7 Gain 341 7 35.725 0.000293 TP53

chr17 7572512 7573087 7 Gain 575 9 31.835 8.51E-05 TP53

chr17 7571756 7586887 1 Loss 15131 55 1.277 1.64E-11 TP53

chr17 7571756 7573919 3 Loss 2163 15 1.139 4.79E-07 TP53

chr17 7350235 8332073 2 Loss 981838 300 1.320 4.73E-29 many

chr7 6043480 6046391 29 Gain 2911 6 2.987 0.000961 PMS2

chr7 6042451 6046391 34 Gain 3940 9 3.180 0.002874 PMS2

chr7 6041162 6042451 14 Loss 1289 6 1.098 0.000187 PMS2

chr7 6041162 6042451 35 Loss 1289 6 1.417 0.001344 PMS2

chr17 5045536 5048750 9 Gain 3214 15 2.521 0.00436 USP6

chr17 5038831 5042825 4 Gain 3994 30 2.648 6.73E-05 USP6

chr17 5037179 5042825 12 Loss 5646 38 1.480 0.000637 USP6

chr17 5037009 5042825 11 Loss 5816 39 1.479 0.00053 USP6

chr17 5036528 5037009 17 Gain 481 3 3.827 0.005764 USP6

chr17 5030584 5043852 8 Gain 13268 82 2.545 0.001146 USP6

chr17 5030067 5043852 3 Gain 13785 88 2.754 1.25E-07 USP6

chr16 3777752 3781556 34 Loss 3804 53 1.517 0.000323 CREBBP

chr16 3777752 3781556 36 Loss 3804 53 1.512 8.71E-05 CREBBP

chr11 3730557 3733446 3 Loss 2889 7 1.207 0.003613 NUP98

chr11 3730557 3733304 16 Gain 2747 5 3.682 0.002331 NUP98

chr11 3730557 3733379 34 Gain 2822 6 3.542 0.000485 NUP98

chr11 3697489 3697730 2 Loss 241 6 1.140 0.00884 NUP98

chr11 3697265 3698117 36 Loss 852 16 1.380 9.15E-05 NUP98

chr1 2985527 2987203 2 Loss 1676 4 1.059 0.002657 PRDM16

chr1 2985527 2987203 5 Loss 1676 4 1.188 0.009582 PRDM16

chr1 2985527 2987203 6 Loss 1676 4 1.035 0.008593 PRDM16

chr1 2985527 2987203 7 Loss 1676 4 1.053 0.008038 PRDM16

chr1 2985527 2987203 8 Loss 1676 4 1.209 0.006018 PRDM16

chr1 2981313 2987203 35 Loss 5890 51 1.554 0.004207 PRDM16

chr1 2979547 2987203 34 Loss 7656 65 1.465 3.30E-05 PRDM16

chr19 1204858 1205483 10 Loss 625 6 1.154 0.000741 none

chr19 1204858 1205483 24 Loss 625 6 1.252 0.006983 none

chr19 1204858 1206188 31 Loss 1330 13 1.266 0.000132 STK11

chr19 1204441 1206532 13 Loss 2091 20 1.439 5.18E-06 STK11

chr19 1204441 1206601 35 Loss 2160 21 1.415 1.34E-06 STK11

chr19 1204341 1206601 9 Loss 2260 22 1.397 1.22E-06 STK11

chr19 1204341 1205334 27 Loss 993 8 1.201 0.003221 none

chr6 391297 392018 9 Loss 721 6 1.187 0.006637 IRF4

chr6 391012 393307 32 Loss 2295 13 1.283 0.001368 IRF4

chr6 390033 393813 14 Loss 3780 22 1.410 0.001425 IRF4

chr6 389807 393813 29 Loss 4006 24 1.434 0.002961 IRF4

chr6 389807 393813 31 Loss 4006 24 1.451 0.003974 IRF4

chr6 389807 393813 34 Loss 4006 24 1.321 0.000228 IRF4

chr6 389807 393813 35 Loss 4006 24 1.444 0.000493 IRF4

chr6 389807 393813 36 Loss 4006 24 1.425 0.003696 IRF4

chr14 96137304 96138881 6 Loss 1577 6 0.872 0.003057 TCL6

chr14 96137304 96138881 1 Loss 1577 6 0.903 0.004131 TCL6

chr14 96137304 96138881 3 Loss 1577 6 0.930 0.008222 TCL6

chr14 96137304 96138881 5 Loss 1577 6 0.978 0.001597 TCL6

chr14 96137304 96138881 4 Loss 1577 6 1.063 0.000595 TCL6

chr13 49039334 49047387 10 Loss 8053 8 1.441 0.001357 RB1

chr13 48877704 48879553 23 Gain 1849 8 2.712 0.006528 RB1

chr13 48877469 48877704 9 Loss 235 3 1.290 0.006003 none

chr15 91336489 91337544 9 Gain 1055 4 3.830 0.001668 BLM

chr15 91306243 91308546 16 Gain 2303 7 3.657 0.006367 BLM

chr15 91187994 91251111 24 Loss 63117 11 1.403 0.009155 CRTC3

Supplementary Table 5 – A full list of regions of interest selected based on the criteria described in the methods.