Patrocles: a database of polymorphic miR-mediated gene regulation in vertebrates
Denis Baurain, Samuel Hiard, Wouter Coppieters, Carole Charlier, Michel Georges
Polymorphic miR-mediated gene regulation
[Figure: miR biogenesis and silencing pathway. Nucleus: host gene (?), pri-miR, Drosha complex, pre-miR. Export: Exportin5. Cytoplasm: Dicer, helicase, miR/miR*, mature miR, miRNP, mRNA with 3’-UTR and poly(A) tail. Three compartments are marked: targets (1), miRs (100s), silencing machinery (overall effect).]
DNA Sequence Polymorphisms: DSPs
Considerable sequence space is devoted to miR-mediated gene regulation (targets, miRs, silencing machinery)
DSPs in silencing components are likely to contribute to (complex) phenotypic variation, including disease
Proof in animals: Texel sheep, Clop et al. (2006) Nat. Genet. 38:813-818
Suggestions in humans: Sethupathy & Collins (2008) TIG 24:489-497
http://www.patrocles.org/
Mining public databases for SNPs and other DSPs in the 3 sequence compartments
Patrocles - Overview
[Diagram: data sources feeding Patrocles, with the data types labeled on the slide]
miRBase: miRs, 8nt motifs
UCSC: alignments
Ensembl: 3’-UTRs, SNPs
SymAtlas: gene expr.
GEO, DGV, HapMap, 1000 genomes: CNVs, gene expr., genotypes, allele freqs
Literature: 8nt motifs, miR expr., CNVs, eQTL, machinery
Updates and Synchronization...
Currently, 7 species: human, chimp, mouse, rat, dog, cow, chicken
                              mouse               human
3’-UTRs
  genes                       21,911              24,319
  sequence space              21,634,548          26,261,732
SNPs in 3’-UTRs
  total                       126,589             136,159
  known ancestral allele      111,178 (87.8%)     114,305 (83.9%)
target site motifs
  X-octamers                  540                 540
  miR                         484                 676
  miR*                        117                 170
  L-octamers                  466                 683
  X OR L-octamers             948                 1,164
  X AND L-octamers            58                  59
target sites in 3’-UTRs
  X-targets                   267,644             323,833
  L-targets                   219,392             375,054
  X OR L-targets              455,620             661,187
  X AND L-targets             31,416              37,700
  conserved X AND L-targets   9,436               10,425 (27.7%)
  conserved X NOT L-targets   57,154              64,010 (22.4%)
  conserved L NOT X-targets   19,595              30,290 (9.0%)
  sequence space              2,674,395 (12.4%)   4,072,176 (15.5%)
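Reading aid for the counts on this slide: the "X OR L" rows follow from the "X", "L" and "X AND L" rows by inclusion-exclusion. The short Python sketch below checks this on the slide's own numbers. The dictionary layout and the names `COUNTS`, `union_size` and `check_counts` are illustrative only and are not part of Patrocles.

```python
# Counts copied from the slide, as (X, L, X AND L, X OR L).
COUNTS = {
    ("mouse", "octamers"): (540, 466, 58, 948),
    ("human", "octamers"): (540, 683, 59, 1164),
    ("mouse", "targets"): (267_644, 219_392, 31_416, 455_620),
    ("human", "targets"): (323_833, 375_054, 37_700, 661_187),
}


def union_size(x: int, l: int, both: int) -> int:
    """Inclusion-exclusion: |X OR L| = |X| + |L| - |X AND L|."""
    return x + l - both


def check_counts() -> dict:
    """Return, per (species, row group), whether the reported union matches."""
    return {
        key: union_size(x, l, both) == either
        for key, (x, l, both, either) in COUNTS.items()
    }
```

All four checks hold, which also confirms that the first value in each row is mouse and the second is human.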
Friedman et al. (2009) Genome Res. 19:92-105
Targets – Methods
2 collections of 8nt motifs
  X-targets: 540 8nt motifs (mammals) conserved in 3’-UTRs, putative miR target sites; Xie et al. (2005) Nature 434:338-345
  L-targets: 683 8nt motifs (human), rc(2-8nt)+A, i.e. the reverse complement of nucleotides 2-8 of the mature miR followed by an A, from mature miRs in miRBase; Lewis et al. (2005) Cell 120:15-20
2 collections of 7nt motifs (from L-targets)
  7mer-A1
  7mer-m8
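The rc(2-8nt)+A rule can be sketched in a few lines of Python. This is a minimal illustration, not the Patrocles implementation. The two 7nt variants assume the usual site definitions (7mer-m8: match to miR nucleotides 2-8; 7mer-A1: match to nucleotides 2-7 followed by an A), which the slide names but does not spell out. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def l_target_motifs(mature_mir: str) -> dict:
    """Derive target-site motifs from a mature miR given 5'->3' (RNA or DNA).

    8mer    = rc(miR nt 2-8) + "A"   (the slide's rc(2-8nt)+A rule)
    7mer-m8 = rc(miR nt 2-8)         (assumed usual definition)
    7mer-A1 = rc(miR nt 2-7) + "A"   (assumed usual definition)
    """
    dna = mature_mir.upper().replace("U", "T")
    seed_match = reverse_complement(dna[1:8])  # miR nucleotides 2..8
    octamer = seed_match + "A"
    return {"8mer": octamer, "7mer-m8": seed_match, "7mer-A1": octamer[1:]}
```

For hsa-miR-29b-2*, whose sequence is shown 3' to 5' on the Patrocles SNPs methods slide (read 5' to 3': CUGGUUUCACAUGGUGGCUUAG), `l_target_motifs` gives the 8mer GAAACCAA, the octamer highlighted on that slide.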
Targets - Concordance between X and L target site motifs
X motifs: 540 8mers, 577 7mers, 554 6mers
L motifs: 683 8mers, 1265 7mers, 1448 6mers
Concordance values shown on the slide: 91%, 40%
Targets - Patrocles SNPs - Methods
1. human: A       ...TTTGGTGAAACCAAC...  => ancestral allele
                         |||||||||
          3'-GAUUCGGUGGUACACUUUGGUC-5'   => hsa-miR-29b-2*
                         |||.|||||
   human: G       ...TTTGGTGGAACCAAC...  => derived allele
   chimp          ...TTTGGTGAAACCAAC...  => sibling species
2. rat            ...TTTGGTGAAACAAAC...
   mouse          ...CTTGGTGAAACAAAC...
3. dog            ...TTTGGTGAAACTAAC...
   cow            ...TTTGGTGAAACTAAC...
(3/3) TTTGGTGA
(3/3) TTGGTGAA
(3/3) TGGTGAAA
(3/3) GGTGAAAC
(2/3) not in dog/cow gtgaaacc
(2/3) not in dog/cow tgaaacca
(2/3) not in dog/cow gaaaccaa => hsa-miR-29b-2*
(2/3) not in dog/cow aaaccaac
site \ allele   anc    der    ?
conserved       DC     (CC)
not cons.       DNC    CNC    P
+S, +W7C / S7C
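The octamer scan behind this classification can be sketched as follows: list every 8nt window overlapping the SNP for each allele, then compare the motif hits of the ancestral and derived alleles. This is a minimal Python sketch under stated assumptions. It ignores the conservation step (the 3/3 vs 2/3 calls), and the function names and the flank/allele input layout are hypothetical.

```python
def windows_over_snp(flank5: str, allele: str, flank3: str, k: int = 8) -> list:
    """All k-mers of flank5 + allele + flank3 that overlap the SNP position."""
    seq = flank5 + allele + flank3
    snp = len(flank5)  # 0-based index of the polymorphic base
    first = max(0, snp - k + 1)
    last = min(snp, len(seq) - k)
    return [seq[s:s + k] for s in range(first, last + 1)]


def classify_snp(flank5, ancestral, derived, flank3, motifs, k: int = 8) -> dict:
    """Motif hits present with one allele only.

    'destroyed': motif matched by the ancestral allele but not the derived one.
    'created':   motif matched by the derived allele but not the ancestral one.
    """
    anc_hits = {w for w in windows_over_snp(flank5, ancestral, flank3, k) if w in motifs}
    der_hits = {w for w in windows_over_snp(flank5, derived, flank3, k) if w in motifs}
    return {"destroyed": sorted(anc_hits - der_hits), "created": sorted(der_hits - anc_hits)}
```

With the slide's example (flanks TTTGGTG and AACCAAC, ancestral A, derived G) and a motif set containing GAAACCAA, the scan reports that octamer as destroyed. Adding the conservation call would then separate DC from DNC.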
Targets - Patrocles SNPs - Results
                mouse                 human
pSNP class      Lewis      Xie        Lewis      Xie
DC+CC           496+65     951+102    959+58     1,546+50
DNC             7,250      7,732      10,328     7,392
CNC             7,573      8,545      11,244     9,006
P               2,065      2,290      3,295      1,944
S               56         37         837        741
total           17,505     19,657     26,719     20,679
# destructions = # creations
Targets - Patrocles SNPs - Evidence for purifying selection
SNP shuffling in 3’-UTR sequence space with preservation of trinucleotide context
[Plot: human - DC]
possible elimination of SNPs affecting conserved targets: 22 to 35% in human, 53 to 67% in mouse
Chen & Rajewsky (2006) Nat. Genet. 38:1452-1456: depletion of SNPs in conserved miR target sites when compared to other conserved 3’-UTR sequences
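One way such a shuffling test could be set up is sketched below in Python: each SNP is moved to a random 3’-UTR position that has the same trinucleotide context (the base itself plus one neighbor on each side), and the observed number of SNPs in target sites is compared with the shuffled average. This is an illustrative sketch, not the authors' procedure. The data layout, the function names, and the reading of "elimination" as 1 - observed/expected are all assumptions.

```python
import random
from collections import defaultdict


def context_index(utrs):
    """Map each trinucleotide to every (utr_index, position) where it is centered."""
    index = defaultdict(list)
    for u, seq in enumerate(utrs):
        for pos in range(1, len(seq) - 1):
            index[seq[pos - 1:pos + 2]].append((u, pos))
    return index


def shuffle_snps(utrs, snps, index, rng):
    """Move each SNP to a random position with the same trinucleotide context.

    SNPs are (utr_index, position) pairs; positions must not be the first or
    last base of a UTR (they have no full trinucleotide context).
    """
    return [rng.choice(index[utrs[u][pos - 1:pos + 2]]) for u, pos in snps]


def depletion(utrs, snps, site_positions, n_shuffles=1000, seed=0):
    """Return (observed, expected, 1 - observed/expected) for SNPs in sites."""
    rng = random.Random(seed)
    index = context_index(utrs)
    observed = sum(s in site_positions for s in snps)
    total = 0
    for _ in range(n_shuffles):
        total += sum(s in site_positions for s in shuffle_snps(utrs, snps, index, rng))
    expected = total / n_shuffles
    missing = 1 - observed / expected if expected else float("nan")
    return observed, expected, missing
```

Here `site_positions` is a set of (utr_index, position) pairs covered by the target-site class of interest, for example conserved sites. Sampling is done with replacement for simplicity.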
Targets - Prioritization for lab validation
most interesting pSNPs are
  pSNPs destroying conserved target sites
  pSNPs creating target sites in anti-targets
to yield a phenotype, target and miR have to be expressed in the same tissue (at the same time)
co-expression plots for human and mouse
  target genes: SymAtlas
  miRs: Landgraf et al. (2007) Cell 129:1401-1414
two different kinds of plots
  comparing miR and target
  comparing miR host gene (if any) and target
[Diagram: miR / target, tissue mapping?]
pSNPs - Co-expression plots
rs34542287 A/G [0.985/0.015]: Destroyed Conserved target site, miR-9 vs. actin-binding LIM protein 1, ACCA[A]AGA
rs28399411 G/A [0.994/0.006]: Destroyed Conserved target site, miR-32 vs. Axonal membrane protein GAP-43, TGTGC[A]AT
mature miR counts: [+] direct evidence, [–] gross tissue mapping
host gene expression: [+] perfect matching of tissues, [–] indirect evidence
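The co-expression requirement can be sketched as a simple filter: keep the tissues in which both the miR and its target gene are detected, after mapping the tissue labels of the miR atlas onto those of the gene expression atlas. This is a minimal Python sketch. The function name, the thresholds, the tissue labels and the mapping dictionary are hypothetical and only illustrate the "gross tissue mapping" step.

```python
def coexpressed_tissues(mir_counts, target_expr, tissue_map, min_count=1, min_expr=0.0):
    """Tissues where both the miR and its target gene are expressed.

    mir_counts:  {miR-atlas tissue: mature miR count}
    target_expr: {gene-atlas tissue: target gene expression level}
    tissue_map:  {miR-atlas tissue: matching gene-atlas tissue}
    Returns sorted (miR-atlas tissue, gene-atlas tissue) pairs.
    """
    pairs = []
    for mir_tissue, count in mir_counts.items():
        gene_tissue = tissue_map.get(mir_tissue)
        if gene_tissue is None or gene_tissue not in target_expr:
            continue  # no usable tissue mapping for this sample
        if count >= min_count and target_expr[gene_tissue] > min_expr:
            pairs.append((mir_tissue, gene_tissue))
    return sorted(pairs)
```

The same function applies to the second kind of plot by passing host gene expression in place of mature miR counts, in which case the tissue mapping is the identity.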
Polymorphic miRs - Methods
pri-miRs (stem-loops) from miRBase
pDSPs altering miR sequence
  SNPs (de-)stabilizing interaction (seed, mature non-seed)
pDSPs altering miR concentration
  SNPs altering processing efficiency (anywhere in stem-loop)
  CNVs encompassing miR genes
    human: http://projects.tcag.ca/variation/
    mouse: She et al. (2008) Nat. Genet. 40:909-914
    rat: Guryev et al. (2008) Nat. Genet. 40:538-545
  eQTL (or allelic imbalance) corresponding to host genes (only human)
    Morley et al. (2004) Nature 430:743-747
    Cheung et al. (2005) Nature 437:1365-1369
    Ge et al. (2005) Genome Res. 15:1584-1591
    Stranger et al. (2005) PLoS Genet. 1:e78
    Pant et al. (2006) Genome Res. 16:331-339
    Dixon et al. (2007) Nat. Genet. 39:1202-1207
    Goring et al. (2007) Nat. Genet. 39:1208-1216
    Spielman et al. (2007) Nat. Genet. 39:226-231
    Stranger et al. (2007) Nat. Genet. 39:1217-1224
[Figure: miR seed (miR positions 2 to 9 labeled, with position 1 and an A marked) paired with the target site (positions 1 to 8) on the targeted mRNA; pri-miR within its host gene (?)]
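The three SNP categories used for pre-miRs (seed, mature non-seed, other) can be assigned from the SNP position and the coordinates of the mature miR within the stem-loop. This is a minimal Python sketch, assuming the seed is miR nucleotides 2-8, as in the rc(2-8nt) motif rule. The function name and the 0-based, end-exclusive coordinate convention are hypothetical.

```python
def classify_premir_snp(snp_pos: int, mature_start: int, mature_end: int) -> str:
    """Classify a SNP within a pre-miR stem-loop.

    All positions are 0-based offsets along the stem-loop, 5'->3';
    mature_end is exclusive. The seed is taken as mature miR nt 2-8.
    """
    if not mature_start <= snp_pos < mature_end:
        return "other"
    mir_nt = snp_pos - mature_start + 1  # 1-based position in the mature miR
    return "seed" if 2 <= mir_nt <= 8 else "mature non-seed"
```

A stem-loop that also yields a miR* would need a second pair of coordinates. That case is left out here.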
Polymorphic miRs – Results
                              mouse     human
pre-miRs                      466       676
SNPs in pre-miRs
  affected miRs               71        136
  total                       89        184
  seed                        4         12
  mature non-seed             6         26
  other                       79        146
CNVs
  miRs in CNVs                0         158
  affected miRs               0         256
eQTL
  miRs hosted in eQTL genes   n.d.      78
  affected miRs               n.d.      85
Polymorphic miRs – Results
e.g., hsa-miR-125a (T/G): SNP in seed (+8) blocks processing of pri-miR to pre-miR
Duan et al. (2007) Hum. Mol. Genet. 16:1124-1131
Silencing machinery – Methods
manually curated list of 52 gene products involved in RNA-mediated gene silencing
3 broad compartments
  1. miR biogenesis: 4 (+4)
  2. RISC/mRNP: 12 (+2)
  3. P-bodies: 27 (+3)
pDSPs altering machinery gene sequence
  SNPs (non-synonymous, stop/frameshift, splicing site)
pDSPs altering machinery gene product concentration
  CNVs encompassing machinery genes (human, mouse, rat)
  eQTL corresponding to machinery genes (human)
Silencing machinery – Results
                                       mouse     human
genes                                  51        52
SNPs in machinery genes
  affected genes                       35        49
  total                                127       237
  non-synonymous                       73        151
  stops / frameshifts                  2         45
  splicing sites                       52        42
CNVs
  machinery genes in CNVs              0         17
  affected genes                       0         17
eQTL
  machinery genes identified as eQTL   n.d.      21
Conclusions
other features of Patrocles
  allelic imbalance plots (HapMap)
  reported associations between pSNPs (or SNPs in miR genes) and phenotypes
  Patrocles Finder for custom sequences
most pSNPs are likely false positives due to poor specificity of target site predictions
Patrocles still contains some interesting biology (e.g., purifying selection on non-conserved sites)
systematic validation of pSNPs becomes possible (e.g., AgoIP + estimation of allelic imbalance in RISC-bound mRNAs; HITS-CLIP)
http://www.patrocles.org/