Fine-Mapping Quantitative Trait Loci in mice and plants
Richard Mott, William Valdar, Jonathan Flint, Paula Kover (1) and colleagues
Wellcome Trust Centre for Human Genetics, Oxford University
(1) University of Manchester
Mosaic Crosses
Inbred founders G3 GN F20
mixing chopping up inbreeding
F2, diallel
Heterogeneous Stock, Advanced Intercross, Random Outbreds
Recombinant Inbred Lines
Hidden Chromosome Structure (ancestral mosaic)
Observed chromosome structure (genotypes)
Analysis of Mosaic Crosses
Mott et al, 2000 PNAS
Hidden Markov Model
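The hidden Markov model links the hidden ancestral mosaic to the observed genotypes. A minimal sketch of the idea follows, for a single haploid chromosome with biallelic markers. It is not the published implementation: the function name, the single `switch` probability between adjacent markers and the `err` genotyping error rate are all assumptions made for illustration.

```python
def founder_posteriors(obs, founder_alleles, switch=0.05, err=0.01):
    """Posterior probability of each founder at each marker (forward-backward).

    obs: observed alleles (0/1) along one haploid chromosome.
    founder_alleles[f][m]: allele carried by founder f at marker m.
    switch: assumed probability of an ancestry change between adjacent markers.
    err: assumed genotyping error rate.
    """
    n_f, n_m = len(founder_alleles), len(obs)

    def emit(f, m):
        return 1.0 - err if founder_alleles[f][m] == obs[m] else err

    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass: with probability `switch` the ancestry is redrawn
    # uniformly from all founders, otherwise it is unchanged.
    alpha = [normalise([emit(f, 0) / n_f for f in range(n_f)])]
    for m in range(1, n_m):
        prev = alpha[-1]  # normalised, so it sums to 1
        alpha.append(normalise([
            emit(f, m) * (switch / n_f + (1.0 - switch) * prev[f])
            for f in range(n_f)
        ]))

    # Backward pass over the same transition model.
    beta = [[1.0] * n_f for _ in range(n_m)]
    for m in range(n_m - 2, -1, -1):
        weighted = [emit(f, m + 1) * beta[m + 1][f] for f in range(n_f)]
        total = sum(weighted)
        beta[m] = normalise([
            switch / n_f * total + (1.0 - switch) * weighted[f]
            for f in range(n_f)
        ])

    return [normalise([a * b for a, b in zip(alpha[m], beta[m])])
            for m in range(n_m)]


# Toy example with three hypothetical founders and six markers.
founders = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
posteriors = founder_posteriors([0, 0, 0, 1, 1, 1], founders)
```

Each row of `posteriors` gives the probability that the chromosome descends from each founder at that marker; these probabilities, rather than the raw genotypes, are what a mosaic-based QTL test would regress on.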
Mice
Pseudo-random mating for 50 generations
Heterogeneous stocks
LP/J CBA/J A/J C57BL/6J BALB/cJ AKR/J DBA/2J C3H/HeJ
Collaborative Cross Mice 8-way funnel
Brother-sister mating for >20 generations
~1000 Recombinant Inbred Lines
A/J C57BL/6J NOD/LtJ 129S1/SvImJ CAST/EiJ NZO/HlLtJ PWK/PhJ WSB/EiJ
Genetic diversity of mouse populations
Roberts et al Mamm Genome (2007) 18:473-481
The diversity captured in the model resources is shown in their respective color as (BXD RI, blue; B.P CSS, gray; Northport HS, green; LSDP, orange; CC, red).
Study design
2,000 heterogeneous stock mice
15,000 diallelic markers
More than 100 phenotypes
each mouse subject to a battery of tests spread over weeks 5-9 of the animal’s life
101 Phenotypes
Anxiety (conditioned and unconditioned tasks) [24]
Asthma (plethysmography) [13]
Biochemistry [15]
Diabetes [16]
Haematology [15]
Immunology [9]
Weight/size related [8]
Wound Healing [1]
Bacterial Gut Flora
Mandible Shape
Gene Expression
Hippocampus, Liver, Lung
Intraperitoneal Glucose Tolerance Test
Solberg et al (2006) Mammalian Genome
How to select peaks: a simulated example
Simulate 7 x 5% QTLs
(ie, 35% genetic effect)
+ 20% shared environment effect
+ 45% noise
= 100% variance
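The variance budget above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the QTLs are taken to be independent biallelic loci at allele frequency 0.5, the shared environment is modelled as a random cage effect, and the function name, cage count and seed are hypothetical.

```python
import random
import statistics


def simulate_phenotype(n_mice=2000, n_qtl=7, qtl_var=0.05, env_var=0.20,
                       n_cages=400, seed=1):
    """Simulate a phenotype with unit total variance:
    n_qtl * qtl_var genetic + env_var shared environment + the rest noise."""
    rng = random.Random(seed)
    noise_var = 1.0 - n_qtl * qtl_var - env_var  # 0.45 with the slide's numbers
    cage_effects = [rng.gauss(0.0, env_var ** 0.5) for _ in range(n_cages)]
    # A 0/1/2 genotype at allele frequency 0.5 has variance 0.5, so this
    # per-allele effect gives each QTL a variance of qtl_var.
    effect = (qtl_var / 0.5) ** 0.5
    genotypes, phenotypes = [], []
    for _ in range(n_mice):
        g = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_qtl)]
        genetic = sum(effect * (allele_count - 1) for allele_count in g)
        shared = cage_effects[rng.randrange(n_cages)]  # assumed cage effect
        noise = rng.gauss(0.0, noise_var ** 0.5)
        genotypes.append(g)
        phenotypes.append(genetic + shared + noise)
    return genotypes, phenotypes, noise_var


genotypes, phenotypes, noise_var = simulate_phenotype()
total_var = statistics.pvariance(phenotypes)  # should be close to 1.0
```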
Simulated example: 1D scan
Peaks from 1D scan
phenotype ~ covariates + ?
1D scan: condition on 1 peak
phenotype ~ covariates + peak 1 + ?
1D scan: condition on 2 peaks
phenotype ~ covariates + peak 1 + peak 2 + ?
1D scan: condition on 3 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + ?
1D scan: condition on 4 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + ?
1D scan: condition on 5 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?
1D scan: condition on 6 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?
1D scan: condition on 7 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?
1D scan: condition on 8 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?
1D scan: condition on 9 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?
1D scan: condition on 10 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?
1D scan: condition on 11 peaks
phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?
Peaks chosen by forward selection
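The sequence of conditional scans above (fit the best peak, condition on it, scan again) is ordinary forward selection. A minimal sketch is given below, assuming one numeric predictor per marker and an intercept as the only covariate. The function name, `max_peaks` and the `min_gain` stopping rule are illustrative assumptions, not the thresholds used in the study.

```python
def forward_select(y, markers, max_peaks=5, min_gain=0.02):
    """Greedy forward selection: repeatedly add the marker that most reduces
    the residual sum of squares, conditioning on the markers already chosen.

    y: phenotype values; markers: list of columns, one per marker.
    min_gain: assumed stopping rule, the smallest fraction of total variance
    a new peak must explain to be added.
    """
    def centre(v):
        mean = sum(v) / len(v)
        return [x - mean for x in v]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    resid = centre(y)  # phenotype with the intercept removed
    cols = {j: centre(col) for j, col in enumerate(markers)}
    total_ss = dot(resid, resid)
    chosen = []
    while cols and len(chosen) < max_peaks:
        # Drop in residual sum of squares from adding each remaining marker.
        gains = {j: dot(resid, c) ** 2 / dot(c, c)
                 for j, c in cols.items() if dot(c, c) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] / total_ss < min_gain:
            break
        chosen.append(best)
        b = cols.pop(best)
        bb = dot(b, b)
        # Condition on the new peak: remove its projection from the residual
        # phenotype and from every remaining marker (a Gram-Schmidt step).
        coef = dot(resid, b) / bb
        resid = [r - coef * x for r, x in zip(resid, b)]
        for j, c in cols.items():
            coef = dot(c, b) / bb
            cols[j] = [x - coef * z for x, z in zip(c, b)]
    return chosen


# Toy example: the phenotype depends on markers 0 and 2 only.
m0 = [0, 1, 0, 1, 0, 1, 0, 1]
m1 = [0, 0, 1, 1, 0, 0, 1, 1]
m2 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [3 * a + c for a, c in zip(m0, m2)]
peaks = forward_select(y, [m0, m1, m2])  # -> [0, 2]
```

Orthogonalising the remaining markers against each chosen peak makes every step equivalent to refitting the full model "phenotype ~ chosen peaks + candidate".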
Forward selection on a bootstrap sample
Bootstrap evidence mounts up…
In 1000 bootstraps…
Bootstrap Posterior Probability (BPP)
Model averaging by bootstrap aggregation
Choosing only one model:
very data-dependent, arbitrary
can’t get all the true QTLs in one model
Bootstrap aggregation averages over models
true QTLs get included more often than false ones
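Bootstrap aggregation can be sketched as follows: resample the mice with replacement, rerun the model selection on each resample, and record how often each marker is chosen. That inclusion frequency plays the role of the BPP. The selector used here, which picks only the single best marker, is a deliberately simple stand-in for the multi-locus forward selection; all names and the toy data are hypothetical.

```python
import random


def best_single_marker(y, markers):
    """Stand-in selector: index of the marker explaining most variance in y."""
    n = len(y)
    y_mean = sum(y) / n
    best, best_score = None, 0.0
    for j, col in enumerate(markers):
        m_mean = sum(col) / n
        sxx = sum((x - m_mean) ** 2 for x in col)
        if sxx == 0:
            continue  # marker is monomorphic in this resample
        sxy = sum((x - m_mean) * (v - y_mean) for x, v in zip(col, y))
        score = sxy ** 2 / sxx
        if score > best_score:
            best, best_score = j, score
    return [] if best is None else [best]


def bootstrap_inclusion(y, markers, select, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each marker is selected."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(markers)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        y_b = [y[i] for i in idx]
        m_b = [[col[i] for i in idx] for col in markers]
        for j in set(select(y_b, m_b)):
            counts[j] += 1
    return [c / n_boot for c in counts]


# Toy data: marker 0 has a large effect, marker 1 a small one.
m0 = [i % 2 for i in range(40)]
m1 = [(i // 2) % 2 for i in range(40)]
y = [3.0 * a + 0.5 * b for a, b in zip(m0, m1)]
bpp = bootstrap_inclusion(y, [m0, m1], best_single_marker)
```

Passing the selection procedure in as a function keeps the resampling loop independent of how peaks are chosen, so a full forward-selection routine could be substituted without changing it.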
References:
Broman & Speed (2002)
Hackett et al (2001)
BPP > 0.25 strong evidence for a QTL
Valdar et al (2006) Nature Genetics
http://gscan.well.ox.ac.uk
81 diabetes QTLs
histone deacetylase 2: T2D p-value 0.0009594
Mouse gene     Human gene  Description                                              Chr  SNP         WGA T2D logP  GeneSniffer
ABCB11         ABCB11      ATP binding cassette, sub family B (MDR/TAP), member 11  2    rs567074    1.13          327
G6pc2          G6PC2       glucose 6 phosphatase, catalytic, 2                      2    rs567074    1.13          2929
Whsc1          WHSC1       Wolf Hirschhorn syndrome candidate 1                     4    rs487903    1.80          51
Whsc2          WHSC2       Wolf Hirschhorn syndrome candidate 2                     4    rs13141668  2.13          29.9
SIM1           SIM1        single minded homolog 1 (Drosophila)                     6    rs10457849  1.02          808
Hdac2          HDAC2       histone deacetylase 2                                    6    rs2348779   3.02          477
tns3           TNS3        tensin 3                                                 7    rs865603    2.43          6
D530005L17Rik  C9orf39     -                                                        9    rs2593402   2.09          6.4
TRAF6          TRAF6       TNF receptor associated factor 6                         11   rs10734444  1.06          332
RAG1           RAG1        recombination activating gene 1                          11   rs10734444  1.06          985
Usp12          USP12       ubiquitin specific peptidase 12                          13   rs2479559   2.47          0.7
Wdfy2          WDFY2       WD repeat and FYVE domain containing 2                   13   rs9526793   1.56          0
4932702D22Rik  EXDL1       exonuclease 3'-5' domain like 1                          15   rs7173954   1.19          0
Ganc           GANC        glucosidase, alpha; neutral C                            15   rs10518763  2.03          316
BC030867       C17orf53    -                                                        17   rs228787    1.87          -
Mouse and human T2D candidates
Zeggini et al (2007) Science
Bacterial Gut Flora
• Different inbred strains have different gut flora (David Threadgill)
• Flora can be assayed from faecal boli (quantitative PCR)
  – Bacteroides distasonis
  – Clostridium
  – Eubacterium plexicaudatum
  – Firmicutes
  – Flexistipes
  – Lactobacillus acidophilus
  – Lactobacillus salivarius
• Data available for 600 HS mice at present
Defensins
Beta-defensin cluster
QTLs and Gene Expression Networks
Complex Phenotypes: 100 traits, 2000 HS mice
Gene Expression Phenotypes: 47,000 probes; 500 hippocampi, 250 livers, 250 lungs
Genotypes: 13800 SNPs; 2000 mice + 300 parents
Links: QTL, eQTL, Direct correlation
eQTLs at different BPP thresholds
      Lung          Liver         Hippocampus (1)  Hippocampus (2)
BPP   Cis    Trans  Cis    Trans  Cis    Trans     Cis    Trans
0.1   2194   7868   1151   6799   2478   3923      3945   5217
0.2   2189   7264   1150   6510   2474   3749      3940   4904
0.3   2179   6780   1146   6280   2469   3614      3928   4675
0.4   2160   6313   1136   5945   2456   3442      3915   4447
0.5   2135   5796   1121   5557   2442   3210      3890   4202
0.6   2090   5084   1086   5087   2394   2818      3843   3901
0.7   2004   3617   1021   3184   2319   1901      3729   2565
0.8   1889   1611   946    1303   2207   862       3573   1295
0.9   1721   443    849    390    2038   355       3358   598
1     1288   103    564    45     1548   130       2703   249
Tissue specificity of eQTLs at different BPP thresholds
BPP 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Only Liver 363 363 362 363 360 349 309 280 255 169
Only Lung 895 893 890 881 869 847 816 771 704 544
Hippocampus 1238 1238 1237 1230 1222 1202 1176 1130 1063 837
Liver + lung 170 170 168 163 157 153 143 129 119 72
Hippocampus + lung 622 619 616 616 616 608 574 540 500 388
Hippocampus + liver 112 111 112 110 111 102 98 88 77 39
Hippocampus + liver + lung 506 506 504 500 493 482 471 449 398 284
SNPs from http://mouse.perlegen.com/
Alkaline Phosphatase
Foreman et al., Physiol. Genomics 23, 295–303 (2005); Valdar et al., Nat. Genet. 38, 879–887 (2006)
[Figure: gene labels Eif4g3, Akp2, Mrto4, Ccdc21, Sytl1]
Large effect cis eQTLs are often artefacts
Plants
Arabidopsis thaliana HSRIL16-way funnel
Ecotype  Country of Origin  NASC     Ecotype  Country of Origin  NASC
Ler-0    Germany            NW20     Kn-0     Lithuania          N6762
Ct-1     Italy              N6674    Po-0     Germany            N6839
Wil-2    Russia             N6689    Oy-0     Norway             N6824
No-0     Germany            N6805    Bur-0    Ireland            N6643
Hi-0     Netherlands        N6736    Sf-2     Spain              N6857
Wu-0     Germany            N6897    Zu-0     Switzerland        N6902
Col-0    USA                N6673    Ws-0     Russia             N6891
Rsch-4   Russia             N6850    Can-0    Canary Isles       N6660
Mt-0     Libya              N1380    Edi-0    Scotland           N6688
Tsu-0    Japan              N6874
LD decay in the 19 founders
Differences Between Founders
• A 5-SNP window (~500kb) separates founders into 7 groups on average
• A 10-SNP window (~1Mb) separates founders into 13 groups on average
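The "groups per window" figures can be computed by sliding a window of k SNPs along the founder haplotypes and counting how many distinct haplotypes appear in it. A minimal sketch, assuming each founder is a list of alleles at the same ordered SNPs; the function name and toy data are hypothetical.

```python
def mean_groups_per_window(founder_haplotypes, window):
    """Average number of distinct founder haplotypes over all windows of
    `window` consecutive SNPs. Founders sharing a haplotype in a window
    fall into the same group and cannot be told apart there."""
    n_snps = len(founder_haplotypes[0])
    counts = []
    for start in range(n_snps - window + 1):
        groups = {tuple(h[start:start + window]) for h in founder_haplotypes}
        counts.append(len(groups))
    return sum(counts) / len(counts)


# Toy example: four hypothetical founders typed at four SNPs.
toy_founders = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0]]
groups_2 = mean_groups_per_window(toy_founders, 2)  # windows give 3, 2, 3 groups
groups_4 = mean_groups_per_window(toy_founders, 4)  # all four founders distinct
```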
Col vs Ler: 0.54 Oy vs Po: 0.89
Mean=0.71
Allele Frequencies in the founders
Haplotype Mosaic of the HSRIL
Prediction based on minimising recombinants in 1540 SNPs
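One way to predict a mosaic by minimising recombinants is dynamic programming over the markers: each founder switch costs 1, and a founder whose allele disagrees with the observed genotype pays a penalty. The sketch below assumes fully inbred lines with biallelic markers; the `mismatch_cost` penalty, which tolerates occasional genotyping errors, is an assumption and not necessarily part of the original method.

```python
def min_recombinant_mosaic(obs, founder_alleles, mismatch_cost=3):
    """Founder path along a chromosome that minimises
    (number of founder switches) + mismatch_cost * (allele mismatches).

    obs: observed alleles (0/1) of one inbred line.
    founder_alleles[f][m]: allele of founder f at marker m.
    Returns (path, total_cost)."""
    n_f = len(founder_alleles)
    cost = [0 if founder_alleles[f][0] == obs[0] else mismatch_cost
            for f in range(n_f)]
    back = []
    for m in range(1, len(obs)):
        best_prev = min(range(n_f), key=cost.__getitem__)
        new_cost, pointers = [], []
        for f in range(n_f):
            # Either stay on founder f, or switch in from the cheapest founder.
            if cost[f] <= cost[best_prev] + 1:
                prev, c = f, cost[f]
            else:
                prev, c = best_prev, cost[best_prev] + 1
            if founder_alleles[f][m] != obs[m]:
                c += mismatch_cost
            new_cost.append(c)
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path back from the last marker.
    f = min(range(n_f), key=cost.__getitem__)
    total = cost[f]
    path = [f]
    for pointers in reversed(back):
        f = pointers[f]
        path.append(f)
    path.reverse()
    return path, total


# Toy example with three hypothetical founders.
toy = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
path, cost = min_recombinant_mosaic([0, 0, 0, 1, 1, 1], toy)
# path -> [0, 0, 0, 1, 1, 1], cost -> 1 (a single recombinant)
```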
QTL Mapping: Simulations
QTL Mapping
Provisional Results of 200 HSRILs, genotypes not corrected
Resequencing?
• Resequence 19 founders using Solexa/FLX
  – 80-90% of each genome should be recovered, based on Bur-0
• Impute sequences of HSRILs from the mosaics
• Association Mapping using imputed genomes to refine QTLs
  – QTL mapping resolution ~200-500kb in HSRILs
  – LD between founders decays by 10kb
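Imputing a line's sequence from its mosaic amounts to copying each founder's sequence over the interval that founder contributes. A minimal sketch, assuming the mosaic is given as half-open (start, end, founder) intervals that tile the chromosome; the function name and the toy sequences are hypothetical.

```python
def impute_from_mosaic(mosaic, founder_seqs):
    """Build a line's imputed sequence from its founder mosaic.

    mosaic: list of (start, end, founder) half-open intervals that tile
    the chromosome in order.
    founder_seqs: dict mapping founder name to its (resequenced) sequence."""
    return "".join(founder_seqs[founder][start:end]
                   for start, end, founder in mosaic)


# Toy example with two founders and one recombination breakpoint.
toy_seqs = {"Col-0": "AAAAAAAA", "Ler-0": "CCCCCCCC"}
imputed = impute_from_mosaic([(0, 3, "Col-0"), (3, 8, "Ler-0")], toy_seqs)
# imputed -> "AAACCCCC"
```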
Acknowledgements
Mouse: Funded by The Wellcome Trust
William Valdar, Carmen Arboleda-Hitas, Amarjit Bhomra, Peter Burns, Richard Copley, Stuart Davidson, Kate Elliot, Simon Fiddy, Jonathan Flint, Polinka Hernandez, Sue Miller, Richard Mott, Chela Nunez, Sagiv Shifman, Leah Solberg, Amy Taylor, Martin Taylor, Jennifer Taylor, Jordana Tzenova-Bell, Binnaz Yalcin, Dave Bannerman, Shoumo Bhattacharya, Bill Cookson, Rob Deacon, Dominique Gauguier
Arabidopsis: Funded by BBSRC
Paula Kover, Xiao Wang, William Valdar, Richard Mott
QTL networks are linked by genes with correlated expression
Compute correlation R between every pair of transcripts, first removing sex effects.
Pick a phenotype with N QTLs, of lengths L1, L2, ..., LN
Derive the empirical distribution of the maximum inter-QTL R:
  Do 1000 times:
    Randomly rearrange these QTL segments across the genome
    Compute max R between transcripts under different segments
    Compute the segment network joined by transcripts with R > Rthreshold
Find networks of QTLs linked by strongly correlated transcripts
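The permutation scheme can be sketched as below. To stay short it treats the genome as one linear array of genes, measures QTL segments in gene-index units, places permuted segments independently (so they may overlap), and takes the transcript correlation matrix as given with sex effects already removed. These simplifications and all names are assumptions for illustration.

```python
import random


def max_inter_segment_r(segments, corr):
    """Largest correlation between genes lying under two different segments.
    segments: list of (start, length) in gene-index units."""
    best = -1.0
    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            sa, la = segments[a]
            sb, lb = segments[b]
            for i in range(sa, sa + la):
                for j in range(sb, sb + lb):
                    if i != j:
                        best = max(best, corr[i][j])
    return best


def permutation_p(segments, corr, n_perm=1000, seed=0):
    """Empirical p-value for the observed maximum inter-QTL correlation,
    from random placements of segments with the same lengths."""
    rng = random.Random(seed)
    n_genes = len(corr)
    observed = max_inter_segment_r(segments, corr)
    exceed = 0
    for _ in range(n_perm):
        shuffled = [(rng.randrange(n_genes - length + 1), length)
                    for _, length in segments]
        if max_inter_segment_r(shuffled, corr) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy example: 20 genes, one strongly correlated pair under two QTLs.
n = 20
corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
corr[2][15] = corr[15][2] = 0.9
observed_r, p_value = permutation_p([(1, 3), (14, 3)], corr)
```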
Correlations between genes under different QTLs are higher than expected by permutation
Hippocampus Liver
Large QTL networks are unlikely
[Figure: False Positive Rate against Correlation Threshold]
Based on mRNA co-expression networks
Liver
Liver Haematological QTL-transcript networks
Food hyponeophagia (reluctance to try new food)
Sort1     neurotensin receptor
celsr2    gpcr, cadherin
crmp1     neuronal development, semaphorin
grb2      growth factor receptor
camk2b    schizophrenia/depression
slc24a3   neuronal synapses
BC034068  Nat8l, acetyl transferase, retinal development
Wound healing
Brp17          PKND = MR1, involved in fibrillogenesis in muscle
Ncor1          breast cancer cell proliferation
Igfb2          carcinogenesis/cell proliferation
Zswim1         transcription factor of unknown function
4732437J24Rik  heat shock hsp40
1600023A02Rik  wfdc2, contains domain WAP, that is involved in cancer in rats
1700009P17Rik  unknown function, contains domain pro1217, involved in cell wall strengthening in plants
AV028368 (9)   d4ertd429e, unknown, contains Zn/cys6 domain, probably DNA-binding
(Rthresh = 0.75)
Based on mRNA co-expression networks
Parent of Origin Effects
• Parent of Origin Effects (POE) occur when the parental ancestry of an allele affects the phenotype.
• Parental genotypes were collected for the HS, so it was possible to estimate whether alleles were descended from the dam or the sire.
• 11% of the QTLs investigated showed evidence of POE. A QTL was counted when, over its 95% confidence interval, the maximum -log10P for the POE model was at least 4 units greater than the maximum -log10P for the null POE model. This is a conservative estimate.
• Body-weight and immunological phenotypes were particularly rich in POE QTL. Behavioural POE QTL were rare.
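The POE criterion above reduces to a comparison of two maxima taken over the QTL's confidence interval. A small sketch, with hypothetical per-marker -log10P profiles and a hypothetical function name:

```python
def shows_poe(poe_model_logp, null_model_logp, min_gain=4.0):
    """True if the maximum -log10P under the POE model exceeds the maximum
    under the null POE model by at least min_gain. Both profiles are taken
    over the markers in the QTL's 95% confidence interval."""
    return max(poe_model_logp) - max(null_model_logp) >= min_gain


# Hypothetical -log10P profiles across three markers in one interval.
poe_profile = [9.1, 15.8, 12.0]
null_profile = [4.2, 5.8, 5.1]
has_poe = shows_poe(poe_profile, null_profile)  # 15.8 vs 5.8, so True
```

Note that the two maxima need not occur at the same marker, which is one reason the criterion is applied over the whole interval rather than at the peak alone.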
Parent of Origin Effects
The most significant POE QTL are listed below
                                                                    -log10P
Phenotype                          Locus                            poe    poe+null  null   diff
Imm CD4 Intensity                  11:rs3659504-gnf11.098.633       20.5   23.9      7.4    16.5
Imm %CD8+                          1:rs6363233-rs6224524            16.2   36.8      23.8   12.9
Imm %CD8+                          12:rs13481411-rs13481412         12.7   28.3      17.1   11.2
Weight 6 weeks                     2:rs13476640-rs4223268           11.3   15.8      5.8    9.9
Imm %CD8+/CD3+                     3:rs6274760-rs3701904            12.1   19.9      10.3   9.6
Imm %B220+                         12:rs13481568-rs13481583         11.6   21.2      11.6   9.6
Haem.Red Cell Distribution Width   2:rs8279354-rs3696744            10.8   18.9      9.4    9.5
Imm %CD8+                          2:rs6286688-rs13476391           11.1   32.3      22.8   9.4
Weight.10weeks (g)                 2:rs6230107-rs3674264            11.7   17.6      8.9    8.7
Imm %CD3+                          12:rs4229511-rs13481634          15.1   21.4      12.7   8.7
Parent of Origin expression QTL
• 60 hippocampus cis-eQTL showed evidence for POE at a logP threshold of 4.
• The most significant hippocampal POE eQTL (logP = 38) is on chromosome 7, syntenic to the imprinted human Prader-Willi locus. Some of the POE eQTL are known to be imprinted, but many appear to be novel.
• In the lung, 42 POE eQTL were identified, the most significant being Fads (fatty acid desaturase 3) on chromosome 19, logP = 23.
• In the liver, 19 POE eQTL were identified, the most significant being Ugt1a13 (UDP glycosyltransferase 1 family, polypeptide A10) on chromosome 19, logP = 11.
• There was considerable overlap between the sets of liver and lung POE eQTL, but little overlap with the hippocampal set.
Joint POE QTL and POE eQTL
• We identified 12 POE QTL that overlapped with a POE eQTL. There are several interesting candidate genes suggested by this analysis. These include:
• Pldn (pallidin, platelet storage pool deficiency) for Red Cell Distribution Width.
• Scn2b (sodium channel, voltage-gated, type II, beta) for Open field total activity.
• ENSMUSESTG00003715632 and D7Ert715e for Weight at 6 weeks; syntenic to human Prader-Willi syndrome.
Ultra fine mapping in highly outbred stocks (Binnaz Yalcin)
2003: MF1 outbred population from Harlan
Yalcin et al, 2004 Nature Genetics
2007: Binnaz is searching for alternative wild populations
http://gscan.well.ox.ac.uk
Extra Slides
Estimating genotypes for unobserved snps
The plan was to use the same MF1 mice from Harlan UK (as we did a few
years ago) to refine large-effect QTLs (ALP, MCV, CD4 and HDL) found in
the HS project. In the MF1 pilot project, we genotyped and phenotyped 400
of these mice. Unexpectedly, almost all the markers used were homozygous and
Jonathan almost had a heart attack!
Over the last few months, I have been looking for another commercially
available population of outbred mice. I am currently genetically
characterising over 40 different colonies of outbred mice in order to
pick the most suitable one for QTL mapping. The smaller the LD block,
the better.
single marker association
HAPPY method
Insulin locus on chromosome 11
[Figure: panels Insulin.15, Insulin.Delta, Insulin.30, Insulin.AUC; annotated genes; tensin 3]