Comparative eQTL analyses within and between seven tissue types suggest mechanisms underlying
cell type specificity of eQTLs
Barbara Engelhardt, Duke University
Christopher D Brown, University of Pennsylvania
November 9th, 2012
Barbara Engelhardt Mechanisms underlying eQTL cell type specificity
Motivation: Predicting functional SNPs
Most functional nucleotides in vertebrate genomes are non-coding
> 85% of common disease associations with non-coding SNPs
We would like to know whether any non-coding SNP in a cell type of interest is biochemically functional to study:
  - genome-wide association study hits
  - de novo mutations involved in highly penetrant disease
  - somatic mutations involved in cancer
Current functional SNP analyses are limited by our narrow understanding of the functional constraints of most of the genome
Functional SNPs: Expression Quantitative Trait Loci
eQTLs are genetic variants that are associated with differences in mRNA transcription levels
Current eQTL studies do not go far enough:
  - cell specificity across relevant cell types unclear
  - LD-linked SNPs instead of causal SNP
  - often one local, most significantly associated eQTL-SNP
Study goal: quantify, identify possible mechanisms for, and predict cell type specific eQTLs
Results will enable functional interpretations of SNPs in a cell type specific way
Comparison of eQTLs: eleven studies, seven cell types
Used gene expression and genotype data from 11 publicly available studies on 7 different cell types
Analysis pipeline was uniform across studies:
  - Remapped expression probes to unique genes in Ensembl
  - Removed unexpressed probes, probes containing SNPs
  - Removed principal components to account for study-specific confounders
  - Imputed genotypes to CEPH HapMap phase 2 panel
Evaluated eQTLs using Bayes factors (BFs)
Single permutation to evaluate FDR
Only considered cis-eQTLs (SNPs within 1Mb of TSS or TES)
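The slide says only that a single permutation was used to evaluate FDR. One common way to do this, sketched here as an assumption rather than the authors' exact procedure, is to compare how many Bayes factors pass a threshold in the permuted data with how many pass it in the real data. The function names and the list-of-log10-BF inputs are hypothetical.

```python
def empirical_fdr(observed, permuted, threshold):
    """Estimate FDR at a log10 BF threshold from one permutation.

    observed: log10 BFs from the real data (hypothetical input).
    permuted: log10 BFs from the same tests after permuting sample
              labels once, which stands in for the null distribution.
    """
    n_obs = sum(1 for bf in observed if bf >= threshold)
    n_null = sum(1 for bf in permuted if bf >= threshold)
    if n_obs == 0:
        return 0.0  # nothing called, so no false discoveries
    return min(1.0, n_null / n_obs)


def threshold_at_fdr(observed, permuted, target=0.05):
    """Smallest observed log10 BF whose estimated FDR is <= target."""
    for t in sorted(set(observed)):
        if empirical_fdr(observed, permuted, t) <= target:
            return t
    return None  # no threshold reaches the target FDR
```

With a single permutation the null counts are noisy at stringent thresholds, so the estimate is most stable when the number of tests is large.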
eQTLs across studies: by the numbers
Study     Code  Tissue             N    N genes
CAP       CPL   LCLs               480  18718
HapMap 2  STL   LCLs               210  15752
Harvard   HCE   Cerebellum         540  18263
Harvard   HPC   Prefrontal cortex  678  18257
Harvard   HVC   Visual cortex      463  18263
GenCord   GCF   Blood fibroblasts   83  16691
GenCord   GCL   LCLs                85  16691
GenCord   GCT   Blood T cells       85  16691
UChicago  CLI   Liver              206  16236
Merck     MLI   Liver              266  18234
Myers     MBR   Brain              193  11707
Sample size versus fraction of genes with eQTLs
[Figure: genes with eQTL [%] versus number of samples, labeled by study code.]
Studies with duplicate arrays have substantially more power
Study size and replicate arrays account for 98% of the variability in fraction of genes with eQTLs
Allelic heterogeneity
Allelic heterogeneity: variants at a genomic locus independently regulate the same biological process.
ENCODE: > 400,000 regulatory elements for ~23,000 genes
Most significant eQTL is often not the only eQTL
Used LD-block method to identify allelic heterogeneity
Followed identification with a test for independent effects
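The slides do not specify the test for independent effects. As an illustration only, the sketch below uses a partial correlation: both the expression values and the secondary SNP genotypes are residualized on the primary SNP, and the residuals are then correlated. This is a swapped-in frequentist stand-in, not the Bayes factor analysis used in the study, and all function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def ols_residuals(y, x):
    """Residuals of y after simple linear regression on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - ma) ** 2 for ai in a)
                    * sum((bi - mb) ** 2 for bi in b))
    return num / den if den > 0 else 0.0


def conditional_association(expression, primary, secondary):
    """Partial correlation of expression with the secondary SNP,
    given the primary SNP (genotypes coded as 0/1/2 allele counts)."""
    res_expr = ols_residuals(expression, primary)
    res_snp = ols_residuals(secondary, primary)
    return pearson_r(res_expr, res_snp)
```

A secondary SNP that merely tags the primary one through LD leaves little residual signal, whereas an independently acting variant keeps a clear partial correlation.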
Allelic heterogeneity across eleven studies
[Figure: eQTLs with AH [%] versus number of samples, labeled by study code.]
Sample size well correlated with levels of allelic heterogeneity
Gene Ontology analysis shows no distinction between genes with primary eQTLs and those with secondary or more eQTLs
We hypothesize that allelic heterogeneity is ubiquitous
eQTLs across cell types: locations
eQTLs enriched relative to background at TSS, TES
TSS, TES enrichment extends to eQTLs in all tiers
[Figure: eQTL count versus SNP position relative to TSS and TES [Kb].]
Replication within and between cell types
eQTL replication entails log10 BF > 1.0 in target data set for all eQTLs in discovery data at FDR < 5%
Blue lines show within cell type replication; red lines show between cell type replication
[Figure: replication [%] versus log BF, and replication [%] versus |SNP - TSS| [Kb]; panels: LCL + LCL and LCL + Liver; Liver + Liver and Liver + LCL; Brain + Brain and Brain + LCL.]
False positives: small percentage of replicating eQTLs
False negatives: due to study design, lack of power, etc.
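The replication criterion stated on this slide can be sketched as a small function. The dictionary layout, keyed by (gene, SNP) pairs, and the per-pair FDR values in the discovery study are assumptions made for illustration. Pairs that were not tested in the target study are left out of the denominator here, which is also an assumption.

```python
def replication_rate(discovery_fdr, target_log10_bf,
                     fdr_cutoff=0.05, bf_cutoff=1.0):
    """Percent of discovery eQTLs that replicate in the target study.

    discovery_fdr:   dict mapping (gene, snp) -> estimated FDR in the
                     discovery study (hypothetical layout).
    target_log10_bf: dict mapping (gene, snp) -> log10 BF in the
                     target study.
    """
    significant = [k for k, fdr in discovery_fdr.items() if fdr < fdr_cutoff]
    tested = [k for k in significant if k in target_log10_bf]
    if not tested:
        return None  # nothing to evaluate
    replicated = sum(1 for k in tested if target_log10_bf[k] > bf_cutoff)
    return 100.0 * replicated / len(tested)
```

Calling it with two LCL studies gives a within cell type rate; calling it with an LCL discovery set and a liver target set gives a between cell type rate.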
Incorporating ENCODE data: functional interpretability
ENCODE project has extensive genomic data for cell type specific genomic features
Understand how eQTL regulates transcription
Figure from ENCODE project
Allelic heterogeneity and insulators
CTCF is the best characterized insulator protein, conserved in function across metazoans
If two SNPs independently regulate transcription, we might expect an enrichment of CTCF between them
In Drosophila melanogaster, recent work showed insulators are enriched between alternative promoters [Negre, 2010]
We see this same enrichment in humans
[Figure: intervening CTCF [%] versus SNP-SNP distance [kb], for background SNPs and independent eQTL SNPs.]
Figure 3. Insulators are enriched between SNPs independently associated with the same gene expression trait.
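The quantity plotted, the percent of SNP pairs with a CTCF site between them, can be computed along the following lines. This is a minimal sketch that assumes CTCF sites are reduced to single positions (for example peak midpoints) and that all positions lie on one chromosome; the function names are hypothetical.

```python
from bisect import bisect_right


def has_intervening_site(pos_a, pos_b, sorted_sites):
    """True if any site lies strictly between the two SNP positions."""
    lo, hi = sorted((pos_a, pos_b))
    i = bisect_right(sorted_sites, lo)  # index of first site > lo
    return i < len(sorted_sites) and sorted_sites[i] < hi


def intervening_percent(snp_pairs, sites):
    """Percent of SNP pairs separated by at least one site."""
    sorted_sites = sorted(sites)
    hits = sum(has_intervening_site(a, b, sorted_sites)
               for a, b in snp_pairs)
    return 100.0 * hits / len(snp_pairs)
```

Because the chance of an intervening site grows with distance, the comparison with background SNP pairs is only meaningful at matched SNP-SNP distances, as in the figure.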
eQTLs and overlap with DHS sites
DNase I hypersensitive (DHS) sites: indicate histone-depleted open chromatin; classic feature of active regulatory elements
Clear enrichment in eQTL overlap
Significant enrichment for replicating eQTLs versus non-replicating eQTLs (not shown)
Significant enrichment for LCL eQTLs in DHS sites in LCLs versus DHS sites in HepG2 cells (not shown)
[Figure: SNP-CRE overlap [%] versus position relative to TSS and TES [Kb]; panels: DHS sites, p300 sites, active promoters; comparisons: background SNPs versus eQTL SNPs, non-replicating versus replicating, HepG2 CREs versus LCL CREs.]
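SNP-CRE overlap, as plotted above, is the percent of SNPs that fall inside an annotated element. The sketch below assumes half-open (start, end) intervals that do not overlap one another, all on one chromosome; the function names and the fold-enrichment summary are illustrative additions, not taken from the slides.

```python
from bisect import bisect_right


def overlap_percent(snp_positions, intervals):
    """Percent of SNPs inside any interval.

    intervals: list of (start, end), half-open and assumed to be
               non-overlapping (e.g. merged DHS peaks).
    """
    ivs = sorted(intervals)
    starts = [start for start, _ in ivs]
    hits = 0
    for pos in snp_positions:
        i = bisect_right(starts, pos) - 1  # last interval starting <= pos
        if i >= 0 and pos < ivs[i][1]:
            hits += 1
    return 100.0 * hits / len(snp_positions)


def fold_enrichment(eqtl_snps, background_snps, intervals):
    """Ratio of eQTL SNP overlap to background SNP overlap."""
    background = overlap_percent(background_snps, intervals)
    if background == 0:
        return float("inf")
    return overlap_percent(eqtl_snps, intervals) / background
```

Swapping the interval set from LCL elements to HepG2 elements while holding the SNP set fixed gives the cell type comparison described on the slide.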
eQTLs and overlap with heterochromatin
Heterochromatin (facultative): tightly packed, cell specific form of DNA; regulatory elements in heterochromatin regions are inaccessible to transcriptional regulators
Clear depletion in eQTL overlap
Significant depletion for replicating eQTLs versus non-replicating eQTLs (not shown)
Significant depletion for LCL eQTLs in heterochromatin in LCLs versus heterochromatin in HepG2 cells (not shown)
[Figure: SNP-CRE overlap [%] for heterochromatin and repressed chromatin, and insulator intersection [%], versus position relative to TSS and TES [Kb]; comparisons: background SNPs versus eQTL SNPs, non-replicating versus replicating, HepG2 CREs versus LCL CREs.]
Predicting replication of eQTLs
Built random forest classifier to predict whether a specific eQTL would replicate in a second study
Class was whether an eQTL replicated or not
Features included:
  - genomic information (e.g., distance to TSS of SNP)
  - non-cell type specific regulatory elements (e.g., GERP scores)
  - cell type specific regulatory elements (e.g., DHS sites, TFBS)
Considered predicting replication:
  - within cell type using cell type specific CRE information
  - between cell type using target cell type specific CRE data
Validated accuracy using 10-fold cross validation
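The 10-fold cross-validation protocol can be sketched independently of the classifier. To keep the example self-contained, a deliberately trivial one-feature threshold classifier stands in for the random forest; it is not the model used in the study, and all names here are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Stand-in classifier: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, x):
    """Predict 1 (replicates) when the feature exceeds the threshold."""
    return 1 if x > threshold else 0


def cross_validate(xs, ys, fit, predict, k=10, seed=0):
    """Held-out accuracy pooled over k folds."""
    correct = 0
    for held_out in k_fold_indices(len(ys), k, seed):
        held = set(held_out)
        train = [i for i in range(len(ys)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(model, xs[i]) == ys[i] for i in held_out)
    return correct / len(ys)
```

Each eQTL is held out exactly once, so the pooled accuracy is computed only on examples the model did not see during fitting. Any classifier with a compatible fit and predict pair can be passed in place of the stand-in.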
Predicting replication of eQTLs: ROC curves
Receiver Operating Characteristic (ROC) curves compare the rate of false positives versus the rate of true positives as the cutoff moves from most to least restrictive
Red lines: within cell type replicability; blue lines: between cell type replicability.
[Figure: ROC curves, TPR versus FPR; panels: LCL + LCL and LCL + Liver; Liver + Liver and Liver + LCL; Brain + Brain and Brain + LCL.]
Area under the ROC Curve (AUC): quantifies improvement over random guessing
For LCL eQTLs, AUCs are 0.79 and 0.73, respectively, for within LCL and between LCL and liver eQTL replication
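AUC has a direct interpretation: it is the probability that a randomly chosen replicating eQTL receives a higher score than a randomly chosen non-replicating one, with ties counted as one half. A minimal sketch of that definition follows; the function name and inputs are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC computed as the fraction of (positive, negative) pairs in
    which the positive example has the higher score; ties count 0.5.

    scores: classifier scores, higher meaning more likely to replicate.
    labels: 1 for replicating eQTLs, 0 for non-replicating ones.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow for large eQTL sets.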
Predicting replication of eQTLs: Gini scores
How predictive is each feature for whether the eQTL replicates?
Across all training sets, biggest contributors:
  - eQTL discovery significance
  - SNP to TSS distance
  - gene expression level
Cis-regulatory elements vary considerably in the degree to which they are useful in predicting replication
Intervening insulators contribute substantially to within cell type predictions
Heterochromatin states contribute substantially to between cell type predictions
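In a random forest, a feature's Gini score is built from the drop in Gini impurity at each split that uses the feature, accumulated over the trees. The sketch below shows that building block for a single split on binary labels; how the study aggregated or normalized the scores is not stated, and the function names are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of binary labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of replicating eQTLs
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting one feature at a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return gini_impurity(labels) - weighted
```

A split that separates replicating from non-replicating eQTLs cleanly yields a large decrease, while an uninformative feature yields a decrease near zero.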
Summary and Conclusions
We leveraged eQTLs found both within and between cell types, together with extensive ENCODE data, in this large comparative study to quantify, describe mechanistically, and predict cell type specific eQTL SNPs
With an SNP and a cell type of interest:
  - identify an eQTL well correlated (in high LD) with the hit
  - compute probability that it will replicate in cell type of interest
  - consider the location of the hit relative to cell type specific and prediction-informative CREs
  - make a more informed hypothesis about mechanism of phenotype (validate via experiments)
Acknowledgements
Casey Brown (UChicago, Penn), Lara Mangravite (Sage Bionetworks), Matthew Stephens (University of Chicago)
Greg Crawford (Duke University), all the ENCODE data
eQTL studies: GenCord, CAP, Harvard Brain, HapMap phase 2, Merck liver, Myers brain, UChicago liver
Funding: NIH NHGRI K99/R00
Paper on arXiv, Haldane’s Sieve
Graphics: R package ggplot2