Disease epigenomics: Interpreting non-coding variants using chromatin and activity signatures
Jason Ernst
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory

Challenge: interpreting disease-associated variants
* Gene annotation (coding, 5'/3' UTR, RNAs): evolutionary signatures
* Roles in gene/chromatin regulation: activator/repressor signatures
* Disease-associated variant (SNP/CNV/): CATGACTG vs. CATGCCTG
* Non-coding annotation: chromatin signatures
* Other evidence of function: signatures of selection (sp/pop)
* GWAS, case-control studies reveal disease-associated variants
* Molecular mechanism, cell-type specificity, drug targets

Challenges towards interpreting disease variants
* Find true causative SNP among many candidates in LD
* Use causal variant: predict function, pathway, drug targets
* Non-coding variant: type of function, cell type of activity
* Regulatory variant: upstream regulators, downstream targets
* This talk: genomics tools for addressing these challenges

The good news: ever-expanding dimensions
* Additional dimensions: environment, genotype, disease, gender, stage, age
* Each point represents a genome-wide dataset
* Chromatin marks, cell types
* Now: cell-type and chromatin-mark dimensions
* Next: references for each background
* All clearly needed, and increasingly available

Difficulty of interpreting increasing # tracks
* Challenge: simplify
* Learn combinations
* Interpret function
* Prioritize marks
* Study dynamics

Challenge of data integration in many marks/cells
* Epigenetic modifications (DNA/histone/nucleosome) encode epigenetic state
* Histone code hypothesis: distinct function for distinct combinations of marks?
* Hundreds of histone marks: astronomical number of histone mark combinations
* How do we find biologically relevant ones? Unsupervised approach, probabilistic model, explicit combinatorics
* Epigenomic information retains genome state in differentiation and development
* Genome-wide modification maps
* Hundreds of histone tail modifications already known
* Two types: DNA methylation, histone marks
* DNA packaged into chromatin around histone proteins

Genomic tools for disease SNP interpretation
Chromatin states: regulatory region annotation
* Combinatorial patterns of marks define chromatin states
* Distinct classes of promoter/enhancer/transcribed/repressed/repetitive
* Reveal new genes, lincRNAs, enhancers, GWAS/SNP
Activity signatures: linking enhancer networks
* Correlated changes in expression, chromatin, motifs
* Link TFs to enhancers and enhancers to targets
* Predict causal cell-type specific activators/repressors
Interpreting disease variants
* Predicting SNP chromatin states and cell-type specificity
* Specific mechanistic predictions for disease SNPs
* Measuring selective pressures within human populations

ChromHMM: learning hidden chromatin states
[Figure: DNA divided into 200bp intervals spanning an enhancer, a transcription start site, and a transcribed region; observed chromatin marks (K4me3, K36me3, K4me1, K27ac) and the most likely hidden state (1 to 6) for each interval; high-probability chromatin marks in each state (values such as 0.8, 0.9, 0.7 shown)]
* Observed chromatin marks: called based on a Poisson distribution
* Each state: vector of emissions, vector of transitions
* All probabilities are learned de novo from chromatin data alone (Baum-Welch, a.k.a. EM)
* Ernst and Kellis, Nature Biotech 2010

Chromatin states for genome annotation
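As a rough illustration of the model described for ChromHMM (marks called present or absent per 200bp interval, hidden states with emission and transition vectors), the sketch below binarizes read counts against a Poisson background and then finds the most likely state path with the Viterbi algorithm. It is a minimal sketch and not the ChromHMM implementation: the three toy states, every probability, the background rate `lam`, and the cutoff `p_cutoff` are assumptions made up for illustration. In practice the parameters would be learned with Baum-Welch rather than set by hand.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)        # P(X = 0)
    cdf = term
    for i in range(1, k):        # add P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def binarize(counts, lam, p_cutoff=1e-4):
    """Call a mark present (1) in a bin when its count is unlikely under the background.
    lam and p_cutoff are illustrative assumptions."""
    return [1 if poisson_sf(c, lam) < p_cutoff else 0 for c in counts]

def log_emission(obs, probs):
    """Log-probability of a binary mark vector under independent Bernoulli emissions."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, init, trans, emit):
    """Most likely hidden state path for a sequence of binary mark vectors."""
    n_states = len(init)
    score = [math.log(init[s]) + log_emission(observations[0], emit[s])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + log_emission(obs, emit[s]))
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):   # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model. Marks: [K4me3, K4me1, K36me3].
# States: 0 = promoter-like, 1 = enhancer-like, 2 = transcribed-like.
emit = [[0.90, 0.30, 0.05],
        [0.10, 0.80, 0.05],
        [0.02, 0.05, 0.80]]
trans = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
init = [1 / 3, 1 / 3, 1 / 3]
observations = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
path = viterbi(observations, init, trans, emit)
calls = binarize([0, 2, 30], lam=2.0)
```

Each entry of `path` is the state assigned to one interval, which is the kind of per-interval annotation the slides refer to as chromatin states.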
* Learn de novo significant combinations of chromatin marks
* Reveal functional elements, even without looking at sequence
* Use for genome annotation
* Use for studying regulation dynamics in different cell types
* Promoter states, transcribed states, active intergenic, repressed

Emerging large-scale genomic/epigenomic datasets
* Multiple cell types
* Diverse experiments
* Developmental time-course
* Reference Epigenome Mapping Centers
* Used to study many disease epigenomes

ENCODE Chromatin Group (PI: Bernstein)
* 15-state model learned jointly: insulator, enhancer, promoter, transcribed, repressed, repetitive
* 9 chromatin marks + WCE: H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H4K20me1, H3K36me3, CTCF, +WCE, +RNA
* 9 human cell types: HUVEC (umbilical vein endothelial), NHEK (keratinocytes), GM12878 (lymphoblastoid), K562 (myelogenous leukemia), HepG2 (liver carcinoma), NHLF (normal human lung fibroblast), HMEC (mammary epithelial cell), HSMM (skeletal muscle myoblasts), H1 (embryonic)
* Cell type concatenation approach (shown for NHEK, HUVEC, H1)
* Ensures common emission parameters
* Verified with independent learning

Chromatin states capture coordinated mark changes
* State definitions are cell-type invariant
* Same combinations consistently found
* State locations are cell-type specific
* Can study pair-wise or multi-way changes

Chromatin states correlation with gene expression
[Figure: TSS with -50kb to +50kb flanks; lower vs. higher expression]

Pair-wise changes reveal cell-type specific functions
* Gene functional enrichments match cell function
* Distinguish On, Off, and Poised promoter states

Introducing multi-cell activity profiles
[Figure: activity profiles across HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1. Panels: gene expression (ON/OFF); chromatin states (active enhancer/repressed); active TF motif enrichment (motif enrichment/motif depletion); TF regulator expression (TF on/TF off); dip-aligned motif biases (motif aligned/flat profile)]

Enhancer vs. promoter dynamics
* Promoters typically active in many cells
* Enhancers exquisitely cell-type specific

Linking candidate enhancers to correlated target genes
1. Search for coherent changes between:
   * gene expression
   * chromatin marks at distant loci (10kb)
2. Combine two vectors:
   * Expression vector for each gene
   * Vector of mark intensities at distal locus (combine marks based on enhancer emissions)
3. High correlation: enhancer/target link
[Figure: candidate enhancer 10kb from TM4SF1]

Predictive power of distal enhancer regions
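The three linking steps above can be sketched as a correlation across cell types between a gene's expression vector and a combined mark-intensity vector at a distal region. This is a minimal sketch under stated assumptions: the per-mark `weights` stand in for "combine marks based on enhancer emissions", and the region names, intensities, and the `min_corr` threshold are hypothetical values chosen for illustration, not the procedure or data used in the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def enhancer_signal(mark_intensities, weights):
    """Weighted sum of per-mark intensity vectors, one value per cell type.
    The weights are an assumed stand-in for enhancer-state emission probabilities."""
    n_cells = len(next(iter(mark_intensities.values())))
    return [sum(weights[m] * mark_intensities[m][c] for m in weights)
            for c in range(n_cells)]

def link_enhancers(expression, regions, weights, min_corr=0.8):
    """Return (region, correlation) pairs whose signal tracks the gene's expression."""
    links = []
    for region_id, marks in regions.items():
        r = pearson(expression, enhancer_signal(marks, weights))
        if r >= min_corr:
            links.append((region_id, r))
    return sorted(links, key=lambda item: -item[1])

# Hypothetical data: one expression value and one intensity per mark for 9 cell types.
expression = [1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = {"H3K4me1": 0.5, "H3K27ac": 0.5}
regions = {
    "region_A": {"H3K4me1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
                 "H3K27ac": [2, 4, 6, 8, 10, 12, 14, 16, 18]},
    "region_B": {"H3K4me1": [9, 8, 7, 6, 5, 4, 3, 2, 1],
                 "H3K27ac": [18, 16, 14, 12, 10, 8, 6, 4, 2]},
}
links = link_enhancers(expression, regions, weights)
```

Only `region_A` is kept here, because its combined signal rises and falls with expression while `region_B` moves in the opposite direction.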
[Figure: correlation of individual regions, sorted by rank; mark intensity correlation with expression for regions 10kb upstream and 100kb upstream, with 10kb/100kb controls]
* At least 100 regions with >80% correlation

Coordinated activity reveals enhancer links
[Figure panels: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
* Distal enhancers hard to integrate in regulatory models
* Linked to target genes based on coordinated activity
* Linked to upstream regulators using TF expression and motifs
Nucleosome positioning footprints support transcription factor cell type predictions
[Figure: tag enrichment for H3K27ac]

Enhancer annotation revisits disease SNPs
* Previously unlinked phenotypes enriched for cell-type specific enhancers

Application 1: Pinpoint disease SNPs in enhancers
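Enrichment of disease SNPs in a chromatin state can be illustrated with a simple binomial test: given the fraction of the genome a state covers (for example the 1.9% quoted for strong enhancers in this talk), how surprising is the number of SNPs that land in it? This is a hedged sketch, not the analysis behind the slides. The SNP counts are hypothetical, and the uniform-placement null is a simplifying assumption that ignores LD and array design.

```python
import math

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def state_enrichment(n_snps, n_in_state, genome_fraction):
    """Fold enrichment and upper-tail p-value for SNPs falling in a chromatin state.
    Null model (an assumption): each SNP lands in the state with probability
    equal to the fraction of the genome the state covers."""
    expected = n_snps * genome_fraction
    fold = n_in_state / expected
    p_value = binomial_sf(n_in_state, n_snps, genome_fraction)
    return fold, p_value

# Hypothetical counts: 100 trait-associated SNPs, 10 of them in a state
# that covers 1.9% of the genome.
fold, p_value = state_enrichment(n_snps=100, n_in_state=10, genome_fraction=0.019)
```

A fold value well above 1 with a small p-value would flag the state as enriched for that trait's SNPs under this simplified null.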
* Much smaller fraction of genome considered
* Strong enhancers 1.9%, weak 2.8%, promoter 1.4%

Application 2: Make much more precise predictions
Use:
* Cell-type specificity of chromatin states
* Predicted activators/repressors of these states
* Predicted motif instances across the genome

Ex1: Systemic lupus erythematosus intergenic SNP
* SNP in lymphoblastoid (GM) specific enhancer state
* Disrupts Ets-1 motif instance; Ets-1 is a predicted GM regulator
* Model: disease SNP abolishes GM-specific enhancer

Ets-1 is a predicted activator of GM/HUVEC enhancers
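Whether a SNP disrupts or creates a motif instance, as in the example above, can be checked by scoring the reference and alternate alleles against a position weight matrix and comparing the best match on each. The sketch below uses the two 8-mers shown earlier in the talk as example alleles. The 4-column matrix is a made-up toy, not a real Ets-1 matrix, and the uniform background is an assumption.

```python
import math

def log_odds_score(seq, pwm, background=0.25):
    """Sum of per-position log2(probability / background) for one window."""
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))

def best_window_score(seq, pwm):
    """Best log-odds score over all windows of the motif's width (forward strand only)."""
    width = len(pwm)
    return max(log_odds_score(seq[i:i + width], pwm)
               for i in range(len(seq) - width + 1))

def snp_motif_effect(ref_seq, alt_seq, pwm):
    """Return (ref score, alt score, alt - ref). Negative delta: the match is weakened."""
    ref = best_window_score(ref_seq, pwm)
    alt = best_window_score(alt_seq, pwm)
    return ref, alt, alt - ref

def toy_column(preferred):
    """Hypothetical PWM column: 0.85 for the preferred base, 0.05 for the others."""
    return {base: (0.85 if base == preferred else 0.05) for base in "ACGT"}

# Toy motif preferring "GACT" (illustrative only).
pwm = [toy_column(base) for base in "GACT"]

ref_score, alt_score, delta = snp_motif_effect("CATGACTG", "CATGCCTG", pwm)
```

Here the A to C change lowers the best match score, so `delta` is negative; swapping the two alleles gives a positive delta, which corresponds to a SNP that creates a motif instance.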
[Figure panels: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
* Enhancer class specific to GM and HUVEC cell types
* Ets expression
* Ets-1 motif enrichment in enhancers
* Model: Ets-1 disruption would abolish enhancer state

Ex2: Erythrocyte phenotype study intronic SNP
* K562: erythroleukaemia cell type
* Disease SNP creates motif instance for Gfi-1 repressor
* Gfi-1 predicted repressor for K562-specific enhancers
* Creation of repressive motif abolishes K562 enhancer

Gfi-1 is a predicted repressor of non-K562 enhancers
[Figure panels: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
* Gfi expression
* Gfi-1 motif depletion in enhancers
* Prediction: Gfi-1 large-scale repression of non-K562
* Motif created, Gfi-1 recruited, enhancer repressed

More generally: eQTLs in specific chromatin states
* Dixon 2007: all eQTLs, lymphoblasts, 400 individuals
* Schadt 2008: trans-eQTLs, liver cells, 427 individuals
* Nucleotide-resolution genome-wide expression predictors
* Strong enrichment for promoter and enhancer states
* Trans-eQTLs select for cell-type specific enhancers