Global profiling of methylation
in the cancer genome
Andy Feber
UCL Cancer Institute
University College London, UK
Illumina, Manchester, 7th September 2010
What determines a cancer phenotype?
• genetic factors
• non-genetic factors
– epigenetics
– environment
Epigenetics and Cancer
• How are epigenetic changes involved in cancer?
• Definition
“The study of heritable changes in gene expression that
occur independent of changes in the primary DNA
sequence”
Background - epigenetics
• Histone modifications, e.g. acetylation, methylation
• DNA methylation
• Non-coding RNA (ncRNA) and micro RNA (miRNA)
One Genome……..Many Methylomes
DNA Methylation
• DNA methylation is the addition of a methyl group to cytosine, generally in CpG dinucleotides
• 28.6 million CpG sites in the human genome, 70% of which are methylated
• CpG-rich regions known as CpG islands (CGIs) are generally located near the start of genes and associate with promoters
• CGIs were previously thought to be the key sites of epigenetic regulation of gene expression, and have been the main focus of epigenetic research
• Recently, changes in methylation at regions outside CGIs, known as CGI shores, have been shown to be more significantly associated with gene regulation
• Only 7% of CpGs reside within CGIs, so many CpGs remain unanalyzed by conventional approaches (microarray, PCR, bisulphite sequencing)
• Next generation sequencing now allows the profiling of over 100 x 10^6 loci at one time
• Combined with enrichment strategies such as methylated DNA immunoprecipitation (MeDIP), giving MeDIP-Seq, it allows whole-genome methylation (the methylome) to be assessed in a single experiment
MeDIP-Seq
[Figure: MeDIP-Seq reads aligned to the reference sequence]
Reference Sequence
….CGTGATGTCGCGCCTCACTCCGGTGG…
TCCGGTGG
CCTCACTCCGG
CGCGCCTCAC
TGATGTCGCG
GCTGATGTCG
TGTCGCGCC
TCGCGCCTC
CCTCACTCCG
CTCCGGTGG
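As a rough illustration of how aligned reads turn into per-CpG read counts, the sketch below places the example reads from this slide on the reference fragment by exact string matching and counts how many reads span each CpG. Real pipelines use a dedicated aligner (Maq is mentioned later in this talk); the exact-match shortcut and the function names are assumptions made only for this toy example.

```python
# Toy example: exact-match "alignment" of the slide's reads to the reference
# fragment, followed by read counting at each CpG dinucleotide.
REFERENCE = "CGTGATGTCGCGCCTCACTCCGGTGG"
READS = [
    "TCCGGTGG", "CCTCACTCCGG", "CGCGCCTCAC", "TGATGTCGCG", "GCTGATGTCG",
    "TGTCGCGCC", "TCGCGCCTC", "CCTCACTCCG", "CTCCGGTGG",
]


def cpg_positions(seq):
    """Return the 0-based start index of every CG dinucleotide in seq."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_read_counts(reference, reads):
    """Count reads that fully span each CpG; also return reads with no exact match."""
    counts = {pos: 0 for pos in cpg_positions(reference)}
    unplaced = []
    for read in reads:
        start = reference.find(read)  # first exact match, or -1 if none
        if start < 0:
            unplaced.append(read)
            continue
        end = start + len(read)  # exclusive end coordinate
        for pos in counts:
            if start <= pos and pos + 2 <= end:
                counts[pos] += 1
    return counts, unplaced


counts, unplaced = cpg_read_counts(REFERENCE, READS)
print(counts)
print(unplaced)
```

One of the reads as transcribed here does not match the reference exactly, so the sketch reports it as unplaced rather than guessing a position.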
Determining methylation from read count
• How to determine absolute methylation levels both within a genome and between genomes?
• Within a given genomic region, MeDIP enrichment is proportional to the
number of methylated CpG sites.
• Simple enrichment ratios/read counts do not accurately reflect the absolute methylation levels within a particular region of interest.
[Figure: hypothetical genomic regions A, B and C with absolute methylation values of 0%, 50% and 100%, compared with their MeDIP enrichment (read count)]
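The point of this slide can be made with a small numerical sketch. Assuming, as stated above, that enrichment is proportional to the number of methylated CpGs in a region, a CpG-poor region that is fully methylated can give fewer reads than a CpG-rich region that is only half methylated. The proportionality constant and the region values below are hypothetical.

```python
def expected_enrichment(n_cpgs, methylation_fraction, reads_per_methylated_cpg=10.0):
    """Expected read count if enrichment scales with the number of methylated CpGs.

    reads_per_methylated_cpg is a hypothetical proportionality constant.
    """
    return reads_per_methylated_cpg * n_cpgs * methylation_fraction


# Hypothetical regions: (description, number of CpGs, fraction methylated)
regions = [
    ("CpG-poor, fully methylated", 5, 1.0),
    ("CpG-rich, half methylated", 20, 0.5),
    ("CpG-rich, fully methylated", 20, 1.0),
]

for name, n_cpgs, fraction in regions:
    print(name, expected_enrichment(n_cpgs, fraction))
```

This is why a raw read count has to be corrected for local CpG density before it can be read as an absolute methylation level.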
Bioinformatic challenge…
• Enrichment bias means absolute methylation levels are difficult to quantitate
$$f(A \mid m) = \prod_{p} G\left(A_p \,\middle|\, A_{\mathrm{base}} + r \sum_{c} C_{cp}\, m_c,\ \nu^{-1}\right)$$
Batman : Bayesian Tool for Methylation Analysis
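A minimal sketch of the kind of likelihood such a model evaluates, assuming a Gaussian form f(A|m) = ∏_p G(A_p | A_base + r Σ_c C_cp m_c, ν⁻¹), where A_p is the observed enrichment at position p, C_cp couples position p to CpG c, m_c is the methylation state of CpG c, and ν is a precision. This is not the Batman implementation, which infers m by Bayesian sampling; the function name and toy data are hypothetical.

```python
import math


def log_likelihood(signals, coupling, methylation, a_base, r, nu):
    """Gaussian log-likelihood of enrichment signals given methylation states.

    signals[p]     : observed enrichment A_p at position p
    coupling[p][c] : coupling factor C_cp between position p and CpG c
    methylation[c] : methylation state m_c of CpG c, between 0 and 1
    a_base, r      : baseline signal and response per unit of coupled methylation
    nu             : precision (inverse variance) of the Gaussian noise
    """
    total = 0.0
    for a_p, row in zip(signals, coupling):
        mean = a_base + r * sum(c_cp * m_c for c_cp, m_c in zip(row, methylation))
        total += 0.5 * math.log(nu / (2.0 * math.pi)) - 0.5 * nu * (a_p - mean) ** 2
    return total


# Hypothetical toy data: two positions, two CpGs.
coupling = [[1.0, 0.5], [0.2, 1.0]]
true_m = [1.0, 0.0]
signals = [0.1 + 2.0 * 1.0, 0.1 + 2.0 * 0.2]  # noise-free signals for true_m

ll_true = log_likelihood(signals, coupling, true_m, a_base=0.1, r=2.0, nu=4.0)
ll_wrong = log_likelihood(signals, coupling, [0.0, 1.0], a_base=0.1, r=2.0, nu=4.0)
print(ll_true, ll_wrong)
```

The methylation states that generated the signals score higher than the swapped states, which is the property a sampler exploits when estimating m from the data.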
MPNST Methylome
Aim
• To define the methylome (methylated genome) associated with a malignant phenotype
• Using MeDIP-Seq to identify tumor-specific differential methylation which correlates with tumor progression/development
Samples
• Pools of ten cases per sample cohort
– Malignant Peripheral Nerve Sheath Tumors (MPNST)
– Benign neurofibromas (NF)
– Normal cultured Schwann Cells (SC)
• Age and gender matched
• MPNST: 6 female, 4 male, median age 30.7 (range 12 to 58)
• NF: 6 female, 4 male, median age 27.7 (range 15 to 54)
Neurofibromatosis type 1 (NF1)
• Familial (germ line mutation in NF1)
• Sporadic (often with alterations in NF1, e.g. LOH)
• Benign disease: plexiform and dermal neurofibromas
• 10-15% develop malignant disease (Malignant Peripheral Nerve Sheath Tumours)
Malignant Peripheral Nerve Sheath Tumors (MPNST)
• 3000 cases/year
• Only 20% of patients disease free after 5 years
Read Stats
Sample   Total number of reads   Total mapped reads   Total unique mapped reads*
MPNST    140119516               133145064            75918388
NF       140442616               134234980            81619250
SC       138120350               131484108            68697944
* Those with a Maq score of >10 and both paired reads mapping uniquely
• Covering ~68% of CpGs in each of the three genomes.
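To put the read counts in proportion, the short sketch below computes the mapped and uniquely mapped fractions for each pool directly from the numbers in the table. The dictionary layout and the function name are just one convenient, hypothetical way to hold the values.

```python
# Read counts taken from the Read Stats table above.
read_stats = {
    "MPNST": {"total": 140119516, "mapped": 133145064, "unique": 75918388},
    "NF": {"total": 140442616, "mapped": 134234980, "unique": 81619250},
    "SC": {"total": 138120350, "mapped": 131484108, "unique": 68697944},
}


def fractions(stats):
    """Return (mapped / total, unique / total) for one sample."""
    return stats["mapped"] / stats["total"], stats["unique"] / stats["total"]


for sample, stats in read_stats.items():
    mapped_frac, unique_frac = fractions(stats)
    print(f"{sample}: {mapped_frac:.1%} mapped, {unique_frac:.1%} uniquely mapped")
```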
Copy Number Correction
MeDIP-Seq Verification
• Verification of MeDIP-Seq initially used Illumina Infinium 27K Human BeadChips, which interrogate ~27,500 CpG sites across the genome.
• Comparison of MeDIP-Seq data with the arrays showed a high degree of correlation
Sample   Batman v Infinium Pearson correlation
MPNST    0.78
NF       0.80
SC       0.77
• Similar to correlations observed between:
– BeadArray v bisulphite sequencing
– BATMAN v bisulphite sequencing
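For readers unfamiliar with the comparison, the sketch below shows how a Pearson correlation between two sets of per-CpG methylation values could be computed. The paired values are hypothetical and stand in for Batman scores and array beta values at the same CpG sites; they are not data from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation values (0-1) at the same five CpG sites.
sequencing_scores = [0.10, 0.35, 0.60, 0.80, 0.95]
array_betas = [0.05, 0.50, 0.55, 0.70, 0.90]

print(round(pearson(sequencing_scores, array_betas), 2))
```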
Global changes in methylation
• What are the global changes involved in MPNST development?
• To assess changes in global methylation, the methylation status of each CpG site was binned into three methylation states: low (<40%), intermediate (40-60%) and high (>60%)
• Global analysis revealed a small change in global methylation (1%), compared to other tumours, which show global loss of methylation ranging from 10-20%.
[Figure legend: low, intermediate and high methylation]
• Analysis of regulatory features (CGIs, CGI shores and promoters) shows similar levels of global methylation between MPNST and Schwann cell controls
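The three-state binning described above can be sketched as follows. The thresholds come from the slide; treating exactly 40% and 60% as intermediate is an assumption, and the input values are hypothetical.

```python
from collections import Counter


def methylation_state(percent):
    """Assign a methylation percentage to one of the three states on the slide."""
    if percent < 40:
        return "low"
    if percent > 60:
        return "high"
    return "intermediate"  # 40-60%, boundaries included by assumption


def state_fractions(values):
    """Fraction of CpG sites falling into each methylation state."""
    counts = Counter(methylation_state(v) for v in values)
    return {s: counts[s] / len(values) for s in ("low", "intermediate", "high")}


# Hypothetical per-CpG methylation percentages for one sample.
print(state_fractions([10, 50, 90, 95]))
```

Comparing these fractions between two samples gives the kind of global shift reported on this slide.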
Global repeat methylation
• One of the most commonly cited features of the cancer methylome is hypomethylation of repeats
• Methylation over LINE and SINE repeats changes only slightly; interestingly, LINE repeats appear to lose low-methylated CpGs
• Largest changes in global methylation are seen in satellite repeats, with a 25% change in methylation between MPNST and non-neoplastic Schwann cells
[Figure legend: low, intermediate and high methylation]
DMR - Differentially Methylated Regions
• Regions of differential methylation were defined by average Batman methylation scores over 1 kb.
• Regions were called differentially methylated if they had an average difference of 33% in Batman methylation score
• Increasing numbers of DMRs during progression from non-neoplastic Schwann cell controls to MPNSTs
DMRs                   Hypermethylated   Hypomethylated
h2bDMR (SC v NF)       45239             46587
b2mDMR (NF v MPNST)    41886             45230
cDMR (SC v MPNST)      48391             53075
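The DMR definition above (average score over 1 kb, difference of 33%) can be sketched as below. The use of fixed, non-overlapping windows, the "at least 0.33" comparison, the function names and the example scores are all assumptions for illustration; the slide does not specify these details.

```python
def window_means(scores, window=1000):
    """Average per-position methylation scores (0-1) over fixed-size windows.

    scores maps a genomic position in bp to a methylation score.
    """
    sums, counts = {}, {}
    for pos, score in scores.items():
        w = pos // window
        sums[w] = sums.get(w, 0.0) + score
        counts[w] = counts.get(w, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}


def call_dmrs(scores_a, scores_b, window=1000, min_diff=0.33):
    """Return {window start: 'hyper' or 'hypo'} for sample B relative to sample A."""
    means_a = window_means(scores_a, window)
    means_b = window_means(scores_b, window)
    dmrs = {}
    for w in sorted(means_a.keys() & means_b.keys()):
        diff = means_b[w] - means_a[w]
        if abs(diff) >= min_diff:
            dmrs[w * window] = "hyper" if diff > 0 else "hypo"
    return dmrs


# Hypothetical scores for a control (sc) and a tumour (mpnst) sample.
sc = {100: 0.2, 600: 0.3, 1200: 0.8, 1700: 0.9, 2500: 0.5}
mpnst = {100: 0.7, 600: 0.8, 1200: 0.3, 1700: 0.2, 2500: 0.6}
print(call_dmrs(sc, mpnst))
```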
DMRs in Genomic Features
• Comparison of DMRs in different genomic features shows in which regions methylation changes during disease progression
• Association of feature DMRs with genes allows identification of potential candidate onco- and tumor-suppressor genes
SC v MPNST (cDMRs)
Feature                        Hypermethylated   Hypomethylated
CGI                            385               79
CGI shores                     2119              1669
Promoters                      1097              1098
Non-CGI-associated promoters   293               175
Exons                          11858             11432
Introns                        61709             57632
miRNA                          22                30
Conserved regions              16535             27805
Satellite repeats              142               1398
LTR repeats                    14339             12665
LINE repeats                   34515             25359
SINE repeats                   32661             39502
SC v NF (h2bDMR)
Feature                        Hypermethylated   Hypomethylated
CGI                            49                47
CGI shores                     996               1382
Promoters                      484               812
Non-CGI-associated promoters   39                95
Exons                          7885              12104
Introns                        48086             49503
miRNA                          19                31
Conserved regions              18566             16113
Satellite repeats              128               259
LTR repeats                    10805             10773
LINE repeats                   25526             22110
SINE repeats                   28764             36448
DMR Enrichment
• Are DMRs enriched in specific genomic features?
• Relative enrichment analysis was carried out to identify those features that have a significantly (p<0.001, red bars) higher number of DMRs than would be expected by chance
• Significant enrichment of hypomethylated satellite and SINE repeats, and of hypermethylated LINE repeats
• Of those regions assumed to be functionally relevant in the regulation of gene expression, only CGI shores and promoters not associated with a CGI were significantly enriched
• Previous studies have focused on CGIs and CGI-associated promoters, suggesting that many sites potentially important in cancer have been missed.
[Figure: enrichment of hypermethylated and hypomethylated DMRs per genomic feature]
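The slides do not say which statistical test was used for the enrichment analysis. One common way to ask whether a feature holds more DMRs than expected by chance is an upper-tail hypergeometric test, sketched below; treating it as a stand-in for the analysis here is an assumption, and the function name and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total_regions, feature_regions, total_dmrs, feature_dmrs):
    """Upper-tail hypergeometric p-value: P(X >= feature_dmrs).

    total_regions   : number of regions tested genome-wide
    feature_regions : how many of those overlap the feature (e.g. CGI shores)
    total_dmrs      : how many of the tested regions are DMRs
    feature_dmrs    : how many DMRs overlap the feature
    """
    denom = comb(total_regions, total_dmrs)
    upper = min(feature_regions, total_dmrs)
    tail = sum(
        comb(feature_regions, k) * comb(total_regions - feature_regions, total_dmrs - k)
        for k in range(feature_dmrs, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 1000 regions, 100 in the feature, 50 DMRs, 15 of them in the feature.
# By chance about 5 DMRs would be expected in the feature.
print(enrichment_p_value(1000, 100, 50, 15))
```

A feature would be flagged as enriched when this p-value falls below the chosen threshold (p<0.001 on the slide).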
Enrichment in repeats
• Analysis of aberrant methylation in repeats located either within or outside introns showed a distinct pattern of repeat methylation
• We see significant enrichment of both hypomethylated non-intronic SINEs and non-intronic satellite repeats
• Also significant enrichment of intronic SINE repeats in early disease
• Enrichment of hypermethylated intronic LINE repeats, as well as non-intronic LINEs
Discrete types of satellite repeats show enrichment
[Figure: hypermethylated and hypomethylated satellite repeat types]
• Satellite repeats can be divided into 19 different types of repeat
• Enrichment analysis by satellite repeat type highlighted 2 specific types of repeat which undergo hypomethylation, SATR1 and ARL
• SATR1 hypomethylation appears to be an early event in tumourigenic progression, whereas ARL hypomethylation may be a later event
• Do satellite repeats undergo sequence-specific methylation?
• Knock-out of specific DNMT family members has been shown to alter methylation of specific satellite repeats
• What is the role of aberrant satellite repeat methylation in cancer?
DMRs in Genomic Features: SC v MPNST (cDMRs)
• Where to start?
• 101,466 unique cDMRs
• Do DMRs associate with candidate genes?
Candidate genes
• MEST: imprinted region, differentially methylated in glioblastomas (which also have frequent NF1 mutations)
• WT1: Wilms tumor suppressor 1 gene
[Figure: methylation tracks for MPNST, NF and SC]
Association of methylation with gene expression
• Is aberrant methylation a key driver of the tumourigenic process?
• Which regions of the genome show the strongest correlation with gene expression?
• Integration of independent gene expression data from MPNST (n=10) and NF (n=28): Henderson et al., 2005 (Affy U95) and Miller et al., 2009 (Affy U133+)
Effect of methylation on gene expression across canonical gene features
• Is there a difference in methylation across canonical gene features between genes with high expression and genes with low expression in MPNST?
• The largest difference in methylation (13%) is observed in the 1st exon, which shows a strong inverse relationship with gene expression
Effect of methylation on gene expression: CGIs and CGI shores
• If CGI shores have a greater effect on gene expression than CGIs, is there a difference in methylation between genes with high and low expression?
• Strong inverse relationship in both upstream and downstream shores, with no difference in CGIs.
• Largest difference in methylation (11%) seen ~800 bp-1.5 kb outside the CGI, suggesting these regions are important in the regulation of gene expression
Does gene expression reflect methylation state?
• Can the expression of genes associated with DMRs discriminate between disease phenotypes?
• Partition clustering (with 10,000 permutations) of the expression of genes associated with DMRs between NF and MPNST shows a significant association for:
– hypermethylated CGI shores (p=0.0001)
– hypermethylated non-CGI promoters (p=0.0003)
– hypomethylated CGI shores (p=0.0001)
[Figure: expression clustering of MPNST and NF for genes with hypermethylated CGI shores (N=1056) and hypermethylated non-CGI promoters (N=702); gene-level panels for SOX10 and CDKN2A (MPNST v NF)]
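The role of the 10,000 permutations can be illustrated with a generic label-permutation test. This is not the partition clustering procedure used in the talk, whose details are not given here; it is a simplified sketch that asks whether a difference in mean expression between two groups is larger than expected when group labels are shuffled. The function name and the expression values are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values of one gene in two sample groups.
mpnst_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
nf_expr = [11.0, 12.0, 13.0, 14.0, 15.0]
print(permutation_p_value(mpnst_expr, nf_expr))
```

With this estimator and 10,000 permutations, the smallest p-value that can be reported is 1/10001, roughly 0.0001.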
Summary
• MeDIP-Seq provides high-resolution methylation profiling of the human epigenome
• Provides insights into the role aberrant methylation plays in regions not accessible by other technologies
• Whole-genome methylation profiles can identify potential prognostic/diagnostic molecular markers of malignant development and progression.
• Still not the whole picture: other epigenetic modifications are out there
– Non-CpG methylation
– 5-hydroxymethylcytosine
Acknowledgements
Lab:
UCL Cancer Institute, UK: Adrienne Flanagan, Andrew Teschendorff, Elia Stupka, Nadege Presneau, Bernadine Idowu
Gurdon Institute, UK: Thomas Down
Barts and The London School of Medicine and Dentistry, UK: Vardman Rakyan
Illumina, USA: Gray Schroth, Zhang Lu
Funding:
SACT (Skeletal Action Cancer Trust)