Reverse genetics:
Quantitative Trait Locus (QTL) mapping
Association mapping
Integrating Mendelian and Quantitative Genetics using molecular techniques
Quantitative traitQuantitative trait
16 64 76 8828 40 52Height
Mendelian traitMendelian traitIndividual
10987654321
12 11 22 22 11 22 12 11 22 12Genotype =
Allele A1
Allele A2
Courtesy of Glenn Howe
Identifying Genes Underlying Phenotypes
Linkage and quantitative trait locus (QTL) analysis
Need a pedigree with segregating traits
Linkage map with moderate number of markers
Very large regions of chromosomes represented by markers
Quantitative Trait Locus Mapping
HEIG
HT
GENOTYPEBBBbbb
abc
ABC
ABC
Parent 1 Parent 2
Xabc
F1 F1
X
ABC
abc
ABC
abc
ABc
aBc
aBc
Abc
ABc
aBc
Abc
Abc
abc
Abc
ABC
ABc
Abc
aBc
aBc
Abc
aBc
aBc
Bb
BbBB BB BBbb bbBB Bb Bb
“Genetic architecture” of quantitative traits
QTL studies can reveal the following facets of the genetic architecture of a quantitative trait:
-Number of genes underlying the trait
-The strength of effect of each gene
-Additive vs. dominant effects of traits
-Potential gene interactions among genes
-Ultimately, “QTN” or the actual genes involved
Quantitative Trait Locus Analysis
Step 1: Make a controlled cross to create a large family (or a collection of families)
Parents should differ for phenotypes of interest
Segregation of trait in the progeny
Step 2: Create a genetic map
Large number of markers phenotyped for all progeny
Step 3: Measure phenotypes
Need phenotypes with moderate to high heritability
Step 4: Detect associations between markers and phenotype using a model
Step 5: Identify underlying molecular mechanisms
Step 1: Construct Pedigree Cross two individuals with
contrasting characteristics
Create population with segregating traits
Ideally: inbred parents crossed to produce F1s, which are intercrossed to produce F2s
Recombinant Inbred Lines created by repeated intercrossing
Allows precise phenotyping, isolation of allelic effects
Grisel 2000 Alchohol Research & Health 24:169
Step 2: Construct Genetic Map Based on nonrandom
association of alleles at different loci in pedigree
Calculate pairwise likelihood of linkage
Gives overview of structure of entire genome
Most efficient with anonymous markers: AFLP
Codominant markers much more informative: SSR
Step 3: Determine Phenotypes of Offspring
Phenotype must be segregating in pedigree
Must differentiate genotype and environment effects
How?
Works best with phenotypes with high heritability
Proportion of total phenotypic variance due to genetic effects
Why is this important?
0.1
0.5
0.9
Step 4: Detect Associations between Markers and Phenotypes Single-marker associations
are simplest
Simple ANOVA, correcting for multiple comparisons
Log likelihood ratio: LOD (Log10 of odds)
If QTL is between two markers, situation more complex
Recombination between QTL and markers (genotype doesn't predict phenotype)
'Ghost' QTL due to adjacent QTL
Use interval mapping or composite interval mapping
Simultaneously consider pairs of loci across the genome
Step 5: Identify underlying molecular mechanisms
QTG: Quantitative Trait Gene
QTN: Quantitative Trait Nucleotide
chromosome
Genetic Marker
Adapted from Richard Mott, Wellcome Trust Center for Human Genetics
QTL
QTL mapping: model for a single marker locus
Marker locus A, quantitative trait locus Q, recombine at rate r
Qq genotype has mean Qq
qq genotype has mean qq
Offspring
Aa has mean Aa=Qq (1-r) + qq r
aa has mean aa=Qq r + qq(1-r)
QTL effect = (Qq - qq )= (Aa-aa)/(1-2r)
Recombination rate confounded with QTL effect
rA Q
a q
a q
a qx
QTL mapping: model for flanking marker loci
In simplest case, two markers A and B flank the QTL
Enough degrees of freedom to separately estimate QTL effect
"Interval mapping": estimate QTL effect in a sliding window along the marker map
Many approaches developed...
r1 r2
A Q B
a q b
a q b
a q bx
QTL map of in Douglas fir (bud opening date)
Figure 2.—Seven QTL for terminal bud flush were detected in the growth initiation experiment . QTL were found on six linkage groups (2, 3, 4, 5, 12, and 14) and were detected in five of the six treatment combinations.
Jermstad et al. (2003) Genetics, Vol. 165, 1489-1506
QTL Vary by Year, Site, and Population Loblolly pine QTL measured in different years at same site, in different
sites, and with a different genetic background
Stippled: not repeated across years
wood-specific gravity
% latewood
Brown et al
Drawbacks of QTL mapping
Often results are difficult to reproduce, and vary by year, pedigree and location
Multiple experiments are needed to confirm results, but experiments are large undertakings (population size, genotyping, phenotyping)
Even if QTL localized to a few cM, this could correspond to 1000s of KB of DNA, containing many genes
As controlled crosses are used, only a fraction of natural variation surveyed
Biased towards detecting large effect QTL, as small effect QTL are not statistically significant
Association Genetics
Methods for associating phenotypes with SNPsEffects of population structureCandidate gene approaches
QTL mapping vs. association genetics
Indirect vs. direct association
Two approaches to association studiesTwo approaches to association studies
Population-based
Cases (affected individuals) and unrelated population controls (unaffected individuals) collected from “one” population
Effects of population structure can be incorporated
Family-based
Child-family trios and TDT design is the most common
Robust to effects of population structure
Case – control association test The simplest method Compare SNP
frequencies of affected vs. unaffected
Chi-square with one degree of freedom test
21 = (ad - bc)2N .
(a+c)(b+d)(a+b)(c+d)
Genotype “AA”
Genotype“Aa”
Total
affected a b a+b
unaffected c d c+d
Total a+c b+d
Case-Control Example: Diabetes
Knowler et al. (1988) collected data on 4920 Pima and Papago Native American populations in Southwestern United States
High rate of Type II diabetes in these populations
Found significant associations with Immunoglobin G marker (Gm)
Does this indicate underlying mechanisms of disease?
Type 2 Diabetes present absent Total
present 8 29 37
absent 92 71 163
Total 100 100 200
Gm Haplotype
(1) Test for an association
21 = (ad - bc)2N .
(a+c)(b+d)(a+b)(c+d)
Case-control test for association (case=diabetic, control=not diabetic)
Question: Is the Gm haplotype associated with risk of Type 2 diabetes???
(2) Chi-square is significant. Therefore presence of GM haplotype seems to confer reduced occurence of diabetes. (Note the test is exactly analogous to calculating r2 between two loci).
= [(8x71)-(29x92)]2 (200) (100)(100)(37)(163)
= 14.62
Index of indian Heritage
Gm Haplotype Percent with diabetes
0 Present
Absent
17.8
19.9
4 Present
Absent
28.3
28.8
8 Present
Absent
35.9
39.3
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes???
The real story: Stratify by American Indian heritage
0 = little or no indian heritage; 8 = complete indian heritage
Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage
Family-Based Association: The Transmission Disequilibrium Test (TDT)
Still an association test (like a case-control), but we study parents and offspring and we condition on the parental genotypes
-this reduces effects of population stratification
Given the genotypes of the parents, is there an allele that is transmitted more frequently to affected individuals?
Under the null hypothesis (H0) of no linkage, what proportion of alleles do we expect the heterozygous parent to transmit?
AB or AA?
AB AA
Only look at affected offspring with at least one heterozygous parent, and consider only family with affected progeny
(2) Statistically test whether this observed number is different from 50:50
To do TDT, (1) we count the number of kids inheriting A or B across many families (trios) with affected kids
(3) If NOT 50:50, then affected kids may be inheriting one allele preferentially over the other
For each heterozygous parent in each family, we determine which allele is transmitted to the affected offspring and which is not.
H0: Two alleles are transmitted equally (no linkage and no association)
Test statistic is (b - c)2 ; 2 with 1 df b + c
Ha: One of the alleles is preferentially transmitted(linkage and association)
Transmission Disequilibrium Test (TDT)(with known parental genotypes and 2 alleles at the locus)
AB AA
A B number=b
AB AA
A A number=c
For each heterozygous parent in each family, see which allele is transmitted to the affected offspring and which is not.
Transmission Disequilibrium Test (TDT) : Example
10 families 15 families
1 2 1 1
1 2
1 2 1 1
1 1
TDT test b= , c=
(b - c)2 = = , p-value = b + c
Methods for genetic association in natural populations
• Standard general linear models (GLMs), usually with p values computed by permutation.
y = + mi + eij, where y is the trait value, is a general
mean, mi is the genotype of the i-th SNP and eij is the residual.
• Structured Association (Pritchard et al. 2000; Thornsberry 2001) and PCA Association (Price et al. 2006).
Controls for population structure by incorporating a Q matrix. This matrix is an n × p population structure incidence matrix where n is the number of individuals assayed and p is the number of populations defined.
• Mixed Linear Models (MLMs; Yu et al. 2006).
They incorporate a Q matrix (fixed effect) but also a pairwise relatedness matrix (K matrix, a random effect), which account for within population structure.
Based on Yu & Buckler (2006)Current Opinion in Biotechnology
GLMGC
Genetic association method depends upon population structure
Familial relatedness
Po
pula
tion
str
uct
ure
SAGC
GLMGCMLM
MLMTDT
unknown
SA=structured associationGC=genomic controlGLM=general linear modelTDT=transmission disequilibriumMLM=mixed linear model
Pinus taeda L
Continuous range, no clear population genetic structure
Fragmented range, significant population structure
TREESNIPS project (also P. sylvestris, Picea abies and oaks)
ADEPT project
Tamrabta(30)
TabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarka(50)
CuellarCuellarCuellarCuellarCuellarCuellarCuellarCuellar(23)Cuellar(26)Bayubas de Abajo(22)Coca
(25)San Leonardo de Yagüe
Valdemaqueda(24)(21)Arenas de San Pedro
(27)San Cipriano
(40)Petrock
(43)Le Verdon
Olonne/Mer(44)
(42)Hourtin
(41)Mimizan
Cenicientos(20) Ahin(28)
St Jean de Monts(45)
(46) Pleucadec
(11)Pineta (10)Aulenne
Restonica (2)Pinia (15)
(29)Oria
(47)Erdeven
Pinus pinastergeographicrange France
Spain
Tunisia
Portugal
Morocco
Tamrabta(30)
TabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarka(50)
CuellarCuellarCuellarCuellarCuellarCuellarCuellarCuellar(23)Cuellar(26)Bayubas de Abajo(22)Coca
(25)San Leonardo de Yagüe
Valdemaqueda(24)(21)Arenas de San Pedro
(27)San Cipriano
(40)Petrock
(43)Le Verdon
Olonne/Mer(44)
(42)Hourtin
(41)Mimizan
Cenicientos(20) Ahin(28)
St Jean de Monts(45)
(46) Pleucadec
(11)Pineta (10)Aulenne
Restonica (2)Pinia (15)
(29)Oria
(47)Erdeven
Pinus pinastergeographicrange France
Spain
Tunisia
Portugal
Morocco
Tamrabta(30)
TabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarka(50)
CuellarCuellarCuellarCuellarCuellarCuellarCuellarCuellar(23)Cuellar(26)Bayubas de Abajo(22)Coca
(25)San Leonardo de Yagüe
Valdemaqueda(24)(21)Arenas de San Pedro
(27)San Cipriano
(40)Petrock
(43)Le Verdon
Olonne/Mer(44)
(42)Hourtin
(41)Mimizan
Cenicientos(20) Ahin(28)
St Jean de Monts(45)
(46) Pleucadec
(11)Pineta (10)Aulenne
Restonica (2)Pinia (15)
(29)Oria
(47)Erdeven
Pinus pinastergeographicrange France
Spain
Tunisia
Portugal
Morocco
22 populations
Pinus pinaster Ait.
Phenotypic traits
S1
S2
S3
2o wall
1o wall
microfibrilangle
• Earlywood specific gravity (ewsg)• Latewood specific gravity (lwsg)• Percent latewood (lw)• Earlywood microfibril angle (ewmfa)• Lignin & cellulose content (lgn-cel)
• Synthetic PCAs for different wood-age types
Genetic association with wood property traits in loblolly pine
González-Martínez et al. 2007 Genetics
Significant genetic association of cad gene with earlywood specific gravity and 4cl with % latewood
0 500 1000 1500 2000 2500
1
994
1410
1609
1697
1845
1934
2004
2385
2589
F4 R4 F3 R3 F2 R1A61 601 947 1454 1486 2003
F5 R3 F6 R6491 1956 2728
0 500 1000 1500 2000 2500 2500 3000 3500
-60 90 208 321 781 1008 1133 1417 1528 1681 3192 328490
F1A R1A F2 R2 F3 R3F6 R6
4cl cad
Based on Yu & Buckler (2006)Current Opinion in Biotechnology
GLMGC
Genetic association method depends upon population structure
Familial relatedness
Po
pula
tion
str
uct
ure
SAGC
GLMGCMLM
MLMTDT
unknown
SA=structured associationGC=genomic controlGLM=general linear modelTDT=transmission disequilibriumMLM=mixed linear model
K vs. Q matrix
Traits measured
Power considerations: structured populations
Zhao et al. (2007)PLoS Genetics
% variation explained by QTN
Po
we
r
(Small association pop of ~100 accessions)
Candidate Gene Associations vs. Whole Genome Scans
If LD is high and haplotype blocks are conserved, entire genome can be efficiently scanned for associations with phenotypes
Simplest for case-control studies (e.g., disease, gender)
If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
Biased by existing knowledge
Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
P_2852_A157.3
P_2385_A
AB
OV
E:B
ELO
W
CO
AR
SE
RO
OT
P_204_C0.0S8_328.8P_2385_C11.6T4_1012.1S15_8S5_3713.8T4_7S6_1215.5S8_2917.9P_2786_A S12_1820.4T1_1322.3T7_423.5T3_13 T3_36S17_2124.1
S15_16T12_1525.3T2_3026.5S13_2029.5S1_2036.5T9_1 S1_1943.2S3_1350.5S1_2452.9S2_754.1P_575_A59.1T12_2260.6S2_3285.0T7_995.7S2_6107.8S13_16 T5_25121.4T5_12124.3T10_4129.0T1_26 T7_13135.7P_93_A148.6S4_20150.2S7_13 S7_12T12_4152.8
S4_24T3_10S6_4154.1
S3_1163.4S6_20 S13_31T7_15171.3
T2_31178.2S8_4180.8S8_28182.1O_30_A184.2T5_4193.5T3_17198.1T12_12206.8S5_29210.6P_2789_A219.9P_634_A S17_43226.5S17_33230.3S17_12232.7S4_19243.1
S17_26262.9
I
QTL Candidate Region
Candidate Gene Identification
Candidate genes are selected by knowledge of how they influence similar traits in other organisms. There is increasing evidence that some genes can control similar phenotypic traits even in distantly related species.Easy to apply: lets see if this primer set works on this particular species!
The “Candidate gene” approach
Candidate genes are genes of known biological action involved with the development or physiology of the trait - Biological candidates They may be structural genes or genes in a regulatory or biochemical pathway affecting trait expression Positional candidates lie within the QTL region that affect the trait
Candidate gene definitions
MHC related genes for studying disease and parasite resistance, and mate choiceHeat shock proteins (HSP) for temperature and stress toleranceGrowth hormone and its receptors for growth, sizeCandidate genes also available for many ecologically relevant traits incl. morphology, color, foraging, learning and memory, social interactions, alternative mating strategies
Traditional candidate genes and traits
Coat colour variation in mice (Robbins et al. 1993)
Hair and skin color in humans (Valverde et al. 1995)
Feather coloration in chickens (Takeuchi et al. 1996)
Coat colour in pigs (Kijas et al. 1998)
Feather coloration in several bird species (Theron et al. 2001; Mundy et al. 2004)
Coat colour in several mammals such as horse, red fox and pocket mice (Mundy et al. 2004)
Skin color in lizards (Rosenblum et al. 2004).
Coat color of Kermode Bear (Ritland et al. 2001)
Success story: Melanocortin-1 receptor gene
Melanocortin-1 receptor gene (MC1R)
Mundy 2005Mundy 2005
Nachman et al. Nachman et al. 20032003
MC1R in pocket mouse
Nachman et al. Nachman et al. 20032003
MC1R in pocket mouse: habitat differences
Mundy et al. Mundy et al. 20042004
MC1R in lesser snow goose
Mundy et al. 2004
MC1R in Arctic skua
Top Related