Cis-regulatory SNPs altering transcription detected by allelic expression mapping

Cambridge, July 16, 2010

Tomi Pastinen, MD, PhD
Assistant Professor, Departments of Human and Medical Genetics, McGill University
McGill University and Genome Quebec Innovation Centre

Outline

1. Allelic expression: principle and methodology

2. Catalogs of cis-regulatory SNPs (cis-rSNPs)

3. Applications

Figure: asthma and type 1 diabetes; non-coding variant and coding variant; alleles C and T measured in mRNA (or pre-mRNA).

Relative allele ratios can be used as a quantitative trait for mapping local cis-regulatory variation in phased samples.

Expected: equal expression of allelic transcripts (C:T = 1:1)

Observed: biased allelic expression (C:T = 1:2)

Allelic Expression (AE) Measurement

Illumina Human 1M Duo (currently 2.5M quad)

Population panel of cells (CEU & YRI LCLs from HapMap)

Figure: alleles C and T measured in gDNA and cDNA of the same sample.
gDNA AE ratio (T/C) = 1
cDNA AE ratio (T/C) = 2

AE mapping in phased chromosomes

Figure: phased genotypes (AB, AB, AA) at candidate SNPs.
AE association +ve: cis-regulatory SNP (cis-rSNP)
AE association -ve

Variability of allele ratio [ = Y/(Y+X)] needs to be accounted for when comparing cDNA to gDNA

Figure: allele ratios in cDNA and gDNA.

R2 = 0.72 in biological replicates (CEU LCL)

R2 = 0.83 in technical replicates (CEU LCL)

R2 = 0.76 in biological replicates in the same environment (osteoblast)

R2 = 0.61 in the same individual but in a different cell culture condition (osteoblast)

We use the heterozygote ratio difference in phased gDNA and cDNA genotype data:

het ratio = RNA[c1/(c1+c2)] - DNA[c1/(c1+c2)]
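The het ratio can be sketched in a few lines of Python. This is a minimal illustration, not the lab's pipeline: the function names and the example signal values are hypothetical, and c1 and c2 are assumed to be the allele signals already assigned to the two phased chromosomes.

```python
def allele_fraction(c1: float, c2: float) -> float:
    """Fraction of total signal coming from the allele on chromosome c1."""
    total = c1 + c2
    if total <= 0:
        raise ValueError("allele signals must sum to a positive value")
    return c1 / total


def het_ratio(rna_c1: float, rna_c2: float, dna_c1: float, dna_c2: float) -> float:
    """RNA allele fraction minus DNA allele fraction at one heterozygous site."""
    return allele_fraction(rna_c1, rna_c2) - allele_fraction(dna_c1, dna_c2)


# Hypothetical signal intensities; gDNA is balanced in a heterozygote.
balanced = het_ratio(500.0, 500.0, 800.0, 800.0)
monoallelic_c1 = het_ratio(900.0, 0.0, 800.0, 800.0)
monoallelic_c2 = het_ratio(0.0, 900.0, 800.0, 800.0)
```

With balanced gDNA, the value runs from -0.5 (only the c2 allele expressed) through 0 (equal expression) to 0.5 (only the c1 allele expressed).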

• single point allelic expression is noisy
• heterozygosity low using coding SNPs only

Figure: association with C1C2 genotype (BA, BB/AA, AB); P = 2x10^-9

Phased RNA (cDNA) and genomic DNA (gDNA) genotyping data from the same individual are averaged across multiple sites in primary transcripts.
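The averaging step can be illustrated as follows. This is a sketch under stated assumptions: each site is taken to be a heterozygous SNP in the primary transcript whose signals are already assigned to phased chromosomes c1 and c2, and a plain arithmetic mean is assumed. The function names and numbers are hypothetical.

```python
from statistics import mean


def site_het_ratio(rna_c1, rna_c2, dna_c1, dna_c2):
    """RNA allele fraction minus DNA allele fraction at one site."""
    return rna_c1 / (rna_c1 + rna_c2) - dna_c1 / (dna_c1 + dna_c2)


def transcript_het_ratio(sites):
    """Average the per-site het ratios over all informative sites of a transcript."""
    if not sites:
        raise ValueError("no heterozygous sites available for this transcript")
    return mean([site_het_ratio(*site) for site in sites])


# Hypothetical (rna_c1, rna_c2, dna_c1, dna_c2) signals at three phased sites.
sites = [
    (600.0, 400.0, 500.0, 500.0),
    (700.0, 300.0, 500.0, 500.0),
    (650.0, 350.0, 500.0, 500.0),
]
transcript_ratio = transcript_het_ratio(sites)
```

Pooling intronic and exonic sites this way addresses both points above: single-site noise is averaged out, and more individuals have at least one heterozygous site.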

AE measurements across large genes

Full Transcript (AFF3 = ~600 kb), 5’ Association

Observation                    AE Ratio
Monoallelic A                  0.5
Equal Expression of A and B    0
Monoallelic B                  -0.5

Differential 5’ Exon Usage (CUGBP2), 5’ Association

Allele-specific expression of long isoform

• on average, >50% of population variance in cis-regulation can be explained by common SNPs in associated loci
• 5-10x more functional variation is revealed than by cis-eQTL mapping
• >90% of mapped cis-rSNPs behave as expected in the offspring (Mendelian inheritance)
• large effect sizes are observed for associated variants
• common cis-variants affect >30% of measured RefSeq transcripts
• low-throughput methods show converging data for 75% of genome-wide significant AE mapping results, but a diversity of mechanisms is suggested

AE association by RefSeq annotation

Threshold                      RefSeq (n)   Fraction of Measured RefSeq   Non-RefSeq Overlap (n)   Fraction of Measured Non-RefSeq   Mean r2
GW Significant (P<7.6x10^-9)   1360         14%                           815                      8%                                0.74
Permutation 0.001              2935         30%                           2225                     21%                               0.63
Permutation 0.005              3408         35%                           2787                     26%                               0.59

Ge et al., Nature Genet. 2009

1) Large effect size (> 1.2-fold difference between cis-rSNP heterozygotes) across full-length transcripts
2) Most SNPs (>75%) of all available SNPs in primary transcript above signal cut-off
3) Consistent allelic effects across introns and exons of the primary transcript (for transcripts fulfilling criteria 1+2, the proportion with exon-intron r2 > 0.3 is >90%)

~17% of genes
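The three criteria above can be read as a simple filter over per-transcript summaries. The sketch below is illustrative only: the record type, field names, and example values are hypothetical, and treating exon-intron r2 > 0.3 as a hard threshold for criterion 3 is an assumption (the slide reports it as an observed property of transcripts passing criteria 1 and 2).

```python
from dataclasses import dataclass


@dataclass
class TranscriptSummary:
    """Hypothetical per-transcript summary of AE-mapping results."""
    fold_difference: float    # allelic fold difference between cis-rSNP heterozygotes
    snps_above_cutoff: int    # SNPs in the primary transcript above the signal cut-off
    snps_available: int       # all SNPs available in the primary transcript
    exon_intron_r2: float     # agreement of exonic and intronic allelic effects


def meets_criteria(t: TranscriptSummary,
                   min_fold: float = 1.2,
                   min_snp_fraction: float = 0.75,
                   min_r2: float = 0.3) -> bool:
    """Apply criteria 1-3; the r2 threshold for criterion 3 is an assumption."""
    if t.snps_available <= 0:
        return False
    snp_fraction = t.snps_above_cutoff / t.snps_available
    return (t.fold_difference > min_fold
            and snp_fraction > min_snp_fraction
            and t.exon_intron_r2 > min_r2)


# Hypothetical examples.
strong = TranscriptSummary(1.5, 9, 10, 0.6)
weak_effect = TranscriptSummary(1.1, 9, 10, 0.6)
sparse_signal = TranscriptSummary(1.5, 7, 10, 0.6)
results = [meets_criteria(t) for t in (strong, weak_effect, sparse_signal)]
```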

For top cis-eQTLs, up to 50% of AE-mapping data show a converging cis-rSNP; but given the high discovery rate, only ~10% of cis-rSNPs yield a significant cis-eQTL (Ge et al. 2009).

But comparison of AE mapping data in YRI LCLs vs. YRI RNA-seq data shows converging effects for the vast majority of transcriptional cis-rSNPs.

Figure: association plots, -log10(P-value), for CEU, YRI, and CEU+YRI; labeled values are 6.8, 6.3, and 11.

Fine-map region of shared association to look for causal cis-rSNPs

Figure: CEU SNP and YRI SNP series plotted against 1000G Score Cutoff (-5 to 0).

simple scoring based on deviation from expected heterozygosity among samples showing unequal/equal AE
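The slides do not give the scoring formula, so the following is only one hypothetical way to express the idea. It assumes that a causal cis-rSNP should be heterozygous in samples showing unequal AE and homozygous in samples showing equal AE, and it counts each sample that breaks this expectation as -1. The function name, the score definition, and the example data are all assumptions.

```python
def heterozygosity_score(is_het, unequal_ae):
    """Return 0 for a perfect match and -k when k samples deviate from expectation."""
    if len(is_het) != len(unequal_ae):
        raise ValueError("one heterozygosity flag is needed per sample")
    mismatches = sum(1 for het, unequal in zip(is_het, unequal_ae) if het != unequal)
    return -mismatches


# Hypothetical panel of five samples: True means unequal AE was observed.
unequal_ae = [True, True, False, False, True]

# Hypothetical heterozygosity calls at two candidate sequenced sites.
candidates = {
    "site_1": [True, True, False, False, True],
    "site_2": [True, False, False, True, True],
}
scores = {name: heterozygosity_score(calls, unequal_ae)
          for name, calls in candidates.items()}
best_site = max(scores, key=scores.get)
```

Ranking every sequenced site in the associated region by such a score and applying a cutoff is one way to shortlist candidate causal variants for follow-up.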

Legend: Pperm < 0.001 in FB; Pperm > 0.001 in FB

Overlapping transcription altering cis-rSNP

5’ proximal cis-rSNPs altering regulation of DISC1 in a cell type-independent manner
Most common type of cis-rSNPs

5’ distal cis-rSNPs altering regulation of PTGER4 in a cell type-dependent manner
3rd most common type of cis-rSNPs

3’ distal cis-rSNPs altering regulation of EFNA5 in a cell type-dependent manner
Least common type of cis-rSNPs

5’ distal cis-rSNPs altering regulation of EFNA5 in a cell type-dependent manner

P-value (2-tailed)

Association orientation                 5'          3'          5'          3'
Maximum Fine-mapped SNPs/Transcript     1           1           3           3
BroadChipSeqPeaksGm12878Ctcf            9.33E-05    2.11E-04    3.16E-06    9.21E-08
BroadChipSeqPeaksGm12878H3k4me1         1.01E-43    9.57E-25    9.59E-53    2.32E-24
BroadChipSeqPeaksGm12878H3k4me2         2.14E-21    1.91E-09    6.79E-22    7.88E-15
BroadChipSeqPeaksGm12878H3k4me3         3.53E-16    1.14E-06    6.24E-09    2.12E-10
DukeDNaseSeqPeaksGm12878V3              2.50E-04    1.96E-03    4.19E-12    2.86E-04
UncFAIREseqPeaksGm12878V3               1.30E-79    3.16E-46    4.87E-112   6.88E-70
UtaChIPseqPeaksGm12878CtcfV3            9.03E-07    1.98E-01    1.47E-03    9.17E-03
UwChIPSeqHotspotsRep1Gm06990Ctcf        5.33E-09    1.13E-07    8.33E-06    1.30E-10
UwChIPSeqHotspotsRep1Gm12801Ctcf        2.85E-02    1.81E-04    8.36E-04    5.60E-08
UwChIPSeqPeaksGm12801Ctcf               3.09E-04    1.27E-02    1.62E-07    6.59E-06
UwDnaseSeqHotspotsRep1Gm12878           4.29E-09    4.40E-05    1.91E-05    1.10E-06
UwDnaseSeqHotspotsRep2Gm06990           2.93E-05    2.14E-05    9.56E-04    1.69E-06
UwDnaseSeqHotspotsRep2Gm12878           9.63E-08    2.22E-02    6.98E-04    1.02E-03
UwDnaseSeqPeaksRep2Gm12878              8.58E-09    3.92E-01    4.27E-08    8.22E-02
YaleChIPseqPeaksGm12878MaxV2            3.41E-06    2.21E-01    3.13E-03    4.09E-01
YaleChIPseqPeaksGm12878Pol2V2           8.60E-09    2.41E-02    7.71E-05    1.72E-01

In vitro validation of intronic enhancer rSNP (rs909685)

In vitro validation of promoter rSNP (rs344071)

Allele-specific DNA-protein interactions

Figure: rs17658686 (C/G) in Input, FAIRE, and MNase samples.

Figure: gel lanes for labeled probes C* and G*: C* - competitor; C* - nuclear extraction; C* + C competitor; C* + G competitor; C* + nonspecific competitor; G* - competitor; G* + C competitor; G* + G competitor; G* + nonspecific competitor.

Genetic association

Functional association

Ge et al. Nature Genetics 2009

Functional association

Potential mechanism

Verlaan et al. AJHG, 2009

Preliminary observations:
CIS-rSNPs POTENTIALLY EXPLAINING DISEASE ASSOCIATIONS ARE ENRICHED FOR TISSUE-SPECIFIC VARIANTS

Examples:
Creutzfeldt-Jakob disease: PRNP
LDL cholesterol: HMGCR
CRP levels: IL6R
Crohn’s disease: IL23R
Plasma homocysteine: CBS
Tooth development: HOXB2

common haplotypes harbor functional alleles altering cis-regulation in most human genes

cis-regulatory SNPs altering transcription can be characterized by:
• specific assessment of population variation in cis-regulation (AE-mapping)
• fine-mapping using sequenced genomes (1000G/imputation for common variants)
• intersection with functional genomic data (ENCODE)

regulatory variation in complex genomic regions (overlapping transcripts), or causing post-transcriptional effects, requires other tools (strand-specific assays/RNA-seq)

large-scale, orthogonal validation tools need to catch up with mapping

McGill University and Génome Québec Innovation Centre

Pastinen Lab: Tony Kwan, Véronique Adoue, Lisanne Morcos, Dominique J Verlaan, Tomi Pastinen, Elin Grundberg, Vonda Koka, Kevin Lam, Bing Ge

Alexandre Montpetit, Eef Harmsen, Joana Dias, Rose Hoberman, Ken Dewar

RLBP1L1 Potential new transcript

1) top associated SNP from AE-mapping
2) highest scoring 1000 Genomes site

Region of active chromatin

Histone marks are tissue-specific

Region for RNA polymerase 2 binding

Highly conserved regions of regulatory potential

But the most comprehensive survey of imprinting to date suggests that <100 imprinted loci exist, as compared to thousands of loci modulated by cis-rSNPs

Morcos et al. manuscript in prep.