Download - Reverse genetics: Quantitative Trait Locus (QTL) mapping Association mapping Integrating Mendelian and Quantitative Genetics using molecular techniques.

Reverse genetics:

Quantitative Trait Locus (QTL) mapping

Association mapping

Integrating Mendelian and Quantitative Genetics using molecular techniques

Quantitative traitQuantitative trait

16 64 76 8828 40 52Height

Mendelian traitMendelian traitIndividual

10987654321

12 11 22 22 11 22 12 11 22 12Genotype =

Allele A1

Allele A2

Courtesy of Glenn Howe

Identifying Genes Underlying Phenotypes

Linkage and quantitative trait locus (QTL) analysis

Need a pedigree with segregating traits

Linkage map with moderate number of markers

Very large regions of chromosomes represented by markers

Quantitative Trait Locus Mapping

HEIG

HT

GENOTYPEBBBbbb

abc

ABC

ABC

Parent 1 Parent 2

Xabc

F1 F1

X

ABC

abc

ABC

abc

ABc

aBc

aBc

Abc

ABc

aBc

Abc

Abc

abc

Abc

ABC

ABc

Abc

aBc

aBc

Abc

aBc

aBc

Bb

BbBB BB BBbb bbBB Bb Bb

“Genetic architecture” of quantitative traits

QTL studies can reveal the following facets of the genetic architecture of a quantitative trait:

-Number of genes underlying the trait

-The strength of effect of each gene

-Additive vs. dominant effects of traits

-Potential gene interactions among genes

-Ultimately, “QTN” or the actual genes involved

Quantitative Trait Locus Analysis

Step 1: Make a controlled cross to create a large family (or a collection of families)

Parents should differ for phenotypes of interest

Segregation of trait in the progeny

Step 2: Create a genetic map

Large number of markers phenotyped for all progeny

Step 3: Measure phenotypes

Need phenotypes with moderate to high heritability

Step 4: Detect associations between markers and phenotype using a model

Step 5: Identify underlying molecular mechanisms

Step 1: Construct Pedigree Cross two individuals with

contrasting characteristics

Create population with segregating traits

Ideally: inbred parents crossed to produce F1s, which are intercrossed to produce F2s

Recombinant Inbred Lines created by repeated intercrossing

Allows precise phenotyping, isolation of allelic effects

Grisel 2000 Alchohol Research & Health 24:169

Step 2: Construct Genetic Map Based on nonrandom

association of alleles at different loci in pedigree

Calculate pairwise likelihood of linkage

Gives overview of structure of entire genome

Most efficient with anonymous markers: AFLP

Codominant markers much more informative: SSR

Step 3: Determine Phenotypes of Offspring

Phenotype must be segregating in pedigree

Must differentiate genotype and environment effects

How?

Works best with phenotypes with high heritability

Proportion of total phenotypic variance due to genetic effects

Why is this important?

0.1

0.5

0.9

Step 4: Detect Associations between Markers and Phenotypes Single-marker associations

are simplest

Simple ANOVA, correcting for multiple comparisons

Log likelihood ratio: LOD (Log10 of odds)

If QTL is between two markers, situation more complex

Recombination between QTL and markers (genotype doesn't predict phenotype)

'Ghost' QTL due to adjacent QTL

Use interval mapping or composite interval mapping

Simultaneously consider pairs of loci across the genome

Step 5: Identify underlying molecular mechanisms

QTG: Quantitative Trait Gene

QTN: Quantitative Trait Nucleotide

chromosome

Genetic Marker

Adapted from Richard Mott, Wellcome Trust Center for Human Genetics

QTL

QTL mapping: model for a single marker locus

Marker locus A, quantitative trait locus Q, recombine at rate r

Qq genotype has mean Qq

qq genotype has mean qq

Offspring

Aa has mean Aa=Qq (1-r) + qq r

aa has mean aa=Qq r + qq(1-r)

QTL effect = (Qq - qq )= (Aa-aa)/(1-2r)

Recombination rate confounded with QTL effect

rA Q

a q

a q

a qx

QTL mapping: model for flanking marker loci

In simplest case, two markers A and B flank the QTL

Enough degrees of freedom to separately estimate QTL effect

"Interval mapping": estimate QTL effect in a sliding window along the marker map

Many approaches developed...

r1 r2

A Q B

a q b

a q b

a q bx

QTL map of in Douglas fir (bud opening date)

Figure 2.—Seven QTL for terminal bud flush were detected in the growth initiation experiment . QTL were found on six linkage groups (2, 3, 4, 5, 12, and 14) and were detected in five of the six treatment combinations.

Jermstad et al. (2003) Genetics, Vol. 165, 1489-1506

QTL Vary by Year, Site, and Population Loblolly pine QTL measured in different years at same site, in different

sites, and with a different genetic background

Stippled: not repeated across years

wood-specific gravity

% latewood

Brown et al

Drawbacks of QTL mapping

Often results are difficult to reproduce, and vary by year, pedigree and location

Multiple experiments are needed to confirm results, but experiments are large undertakings (population size, genotyping, phenotyping)

Even if QTL localized to a few cM, this could correspond to 1000s of KB of DNA, containing many genes

As controlled crosses are used, only a fraction of natural variation surveyed

Biased towards detecting large effect QTL, as small effect QTL are not statistically significant

Association Genetics

Methods for associating phenotypes with SNPsEffects of population structureCandidate gene approaches

QTL mapping vs. association genetics

Indirect vs. direct association

Two approaches to association studiesTwo approaches to association studies

Population-based

Cases (affected individuals) and unrelated population controls (unaffected individuals) collected from “one” population

Effects of population structure can be incorporated

Family-based

Child-family trios and TDT design is the most common

Robust to effects of population structure

Case – control association test The simplest method Compare SNP

frequencies of affected vs. unaffected

Chi-square with one degree of freedom test

21 = (ad - bc)2N .

(a+c)(b+d)(a+b)(c+d)

Genotype “AA”

Genotype“Aa”

Total

affected a b a+b

unaffected c d c+d

Total a+c b+d

Case-Control Example: Diabetes

Knowler et al. (1988) collected data on 4920 Pima and Papago Native American populations in Southwestern United States

High rate of Type II diabetes in these populations

Found significant associations with Immunoglobin G marker (Gm)

Does this indicate underlying mechanisms of disease?

Type 2 Diabetes present absent Total

present 8 29 37

absent 92 71 163

Total 100 100 200

Gm Haplotype

(1) Test for an association

21 = (ad - bc)2N .

(a+c)(b+d)(a+b)(c+d)

Case-control test for association (case=diabetic, control=not diabetic)

Question: Is the Gm haplotype associated with risk of Type 2 diabetes???

(2) Chi-square is significant. Therefore presence of GM haplotype seems to confer reduced occurence of diabetes. (Note the test is exactly analogous to calculating r2 between two loci).

= [(8x71)-(29x92)]2 (200) (100)(100)(37)(163)

= 14.62

Index of indian Heritage

Gm Haplotype Percent with diabetes

0 Present

Absent

17.8

19.9

4 Present

Absent

28.3

28.8

8 Present

Absent

35.9

39.3

Case-control test for association (continued)

Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes???

The real story: Stratify by American Indian heritage

0 = little or no indian heritage; 8 = complete indian heritage

Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage

Family-Based Association: The Transmission Disequilibrium Test (TDT)

Still an association test (like a case-control), but we study parents and offspring and we condition on the parental genotypes

-this reduces effects of population stratification

Given the genotypes of the parents, is there an allele that is transmitted more frequently to affected individuals?

Under the null hypothesis (H0) of no linkage, what proportion of alleles do we expect the heterozygous parent to transmit?

AB or AA?

AB AA

Only look at affected offspring with at least one heterozygous parent, and consider only family with affected progeny

(2) Statistically test whether this observed number is different from 50:50

To do TDT, (1) we count the number of kids inheriting A or B across many families (trios) with affected kids

(3) If NOT 50:50, then affected kids may be inheriting one allele preferentially over the other

For each heterozygous parent in each family, we determine which allele is transmitted to the affected offspring and which is not.

H0: Two alleles are transmitted equally (no linkage and no association)

Test statistic is (b - c)2 ; 2 with 1 df b + c

Ha: One of the alleles is preferentially transmitted(linkage and association)

Transmission Disequilibrium Test (TDT)(with known parental genotypes and 2 alleles at the locus)

AB AA

A B number=b

AB AA

A A number=c

For each heterozygous parent in each family, see which allele is transmitted to the affected offspring and which is not.

Transmission Disequilibrium Test (TDT) : Example

10 families 15 families

1 2 1 1

1 2

1 2 1 1

1 1

TDT test b= , c=

(b - c)2 = = , p-value = b + c

Methods for genetic association in natural populations

• Standard general linear models (GLMs), usually with p values computed by permutation.

y = + mi + eij, where y is the trait value, is a general

mean, mi is the genotype of the i-th SNP and eij is the residual.

• Structured Association (Pritchard et al. 2000; Thornsberry 2001) and PCA Association (Price et al. 2006).

Controls for population structure by incorporating a Q matrix. This matrix is an n × p population structure incidence matrix where n is the number of individuals assayed and p is the number of populations defined.

• Mixed Linear Models (MLMs; Yu et al. 2006).

They incorporate a Q matrix (fixed effect) but also a pairwise relatedness matrix (K matrix, a random effect), which account for within population structure.

Based on Yu & Buckler (2006)Current Opinion in Biotechnology

GLMGC

Genetic association method depends upon population structure

Familial relatedness

Po

pula

tion

str

uct

ure

SAGC

GLMGCMLM

MLMTDT

unknown

SA=structured associationGC=genomic controlGLM=general linear modelTDT=transmission disequilibriumMLM=mixed linear model

Pinus taeda L

Continuous range, no clear population genetic structure

Fragmented range, significant population structure

TREESNIPS project (also P. sylvestris, Picea abies and oaks)

ADEPT project

Tamrabta(30)

TabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarkaTabarka(50)

CuellarCuellarCuellarCuellarCuellarCuellarCuellarCuellar(23)Cuellar(26)Bayubas de Abajo(22)Coca

(25)San Leonardo de Yagüe

Valdemaqueda(24)(21)Arenas de San Pedro

(27)San Cipriano

(40)Petrock

(43)Le Verdon

Olonne/Mer(44)

(42)Hourtin

(41)Mimizan

Cenicientos(20) Ahin(28)

St Jean de Monts(45)

(46) Pleucadec

(11)Pineta (10)Aulenne

Restonica (2)Pinia (15)

(29)Oria

(47)Erdeven

Pinus pinastergeographicrange France

Spain

Tunisia

Portugal

Morocco

Tamrabta(30)





(27)San Cipriano

(40)Petrock

(43)Le Verdon

Olonne/Mer(44)

(42)Hourtin

(41)Mimizan



(46) Pleucadec



(29)Oria

(47)Erdeven


Spain

Tunisia

Portugal

Morocco

Tamrabta(30)





(27)San Cipriano

(40)Petrock

(43)Le Verdon

Olonne/Mer(44)

(42)Hourtin

(41)Mimizan



(46) Pleucadec



(29)Oria

(47)Erdeven


Spain

Tunisia

Portugal

Morocco

22 populations

Pinus pinaster Ait.

Phenotypic traits

S1

S2

S3

2o wall

1o wall

microfibrilangle

• Earlywood specific gravity (ewsg)• Latewood specific gravity (lwsg)• Percent latewood (lw)• Earlywood microfibril angle (ewmfa)• Lignin & cellulose content (lgn-cel)

• Synthetic PCAs for different wood-age types

Genetic association with wood property traits in loblolly pine

González-Martínez et al. 2007 Genetics

Significant genetic association of cad gene with earlywood specific gravity and 4cl with % latewood

0 500 1000 1500 2000 2500

1

994

1410

1609

1697

1845

1934

2004

2385

2589

F4 R4 F3 R3 F2 R1A61 601 947 1454 1486 2003

F5 R3 F6 R6491 1956 2728

0 500 1000 1500 2000 2500 2500 3000 3500

-60 90 208 321 781 1008 1133 1417 1528 1681 3192 328490

F1A R1A F2 R2 F3 R3F6 R6

4cl cad

Based on Yu & Buckler (2006)Current Opinion in Biotechnology

GLMGC

Genetic association method depends upon population structure

Familial relatedness

Po

pula

tion

str

uct

ure

SAGC

GLMGCMLM

MLMTDT

unknown

SA=structured associationGC=genomic controlGLM=general linear modelTDT=transmission disequilibriumMLM=mixed linear model

K vs. Q matrix

Traits measured

Power considerations: structured populations

Zhao et al. (2007)PLoS Genetics

% variation explained by QTN

Po

we

r

(Small association pop of ~100 accessions)

Candidate Gene Associations vs. Whole Genome Scans

If LD is high and haplotype blocks are conserved, entire genome can be efficiently scanned for associations with phenotypes

Simplest for case-control studies (e.g., disease, gender)

If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations

Biased by existing knowledge

Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations

P_2852_A157.3

P_2385_A

AB

OV

E:B

ELO

W

CO

AR

SE

RO

OT

P_204_C0.0S8_328.8P_2385_C11.6T4_1012.1S15_8S5_3713.8T4_7S6_1215.5S8_2917.9P_2786_A S12_1820.4T1_1322.3T7_423.5T3_13 T3_36S17_2124.1

S15_16T12_1525.3T2_3026.5S13_2029.5S1_2036.5T9_1 S1_1943.2S3_1350.5S1_2452.9S2_754.1P_575_A59.1T12_2260.6S2_3285.0T7_995.7S2_6107.8S13_16 T5_25121.4T5_12124.3T10_4129.0T1_26 T7_13135.7P_93_A148.6S4_20150.2S7_13 S7_12T12_4152.8

S4_24T3_10S6_4154.1

S3_1163.4S6_20 S13_31T7_15171.3

T2_31178.2S8_4180.8S8_28182.1O_30_A184.2T5_4193.5T3_17198.1T12_12206.8S5_29210.6P_2789_A219.9P_634_A S17_43226.5S17_33230.3S17_12232.7S4_19243.1

S17_26262.9

I

QTL Candidate Region

Candidate Gene Identification

Candidate genes are selected by knowledge of how they influence similar traits in other organisms. There is increasing evidence that some genes can control similar phenotypic traits even in distantly related species.Easy to apply: lets see if this primer set works on this particular species!

The “Candidate gene” approach

Candidate genes are genes of known biological action involved with the development or physiology of the trait - Biological candidates They may be structural genes or genes in a regulatory or biochemical pathway affecting trait expression Positional candidates lie within the QTL region that affect the trait

Candidate gene definitions

MHC related genes for studying disease and parasite resistance, and mate choiceHeat shock proteins (HSP) for temperature and stress toleranceGrowth hormone and its receptors for growth, sizeCandidate genes also available for many ecologically relevant traits incl. morphology, color, foraging, learning and memory, social interactions, alternative mating strategies

Traditional candidate genes and traits

Coat colour variation in mice (Robbins et al. 1993)

Hair and skin color in humans (Valverde et al. 1995)

Feather coloration in chickens (Takeuchi et al. 1996)

Coat colour in pigs (Kijas et al. 1998)

Feather coloration in several bird species (Theron et al. 2001; Mundy et al. 2004)

Coat colour in several mammals such as horse, red fox and pocket mice (Mundy et al. 2004)

Skin color in lizards (Rosenblum et al. 2004).

Coat color of Kermode Bear (Ritland et al. 2001)

Success story: Melanocortin-1 receptor gene

Melanocortin-1 receptor gene (MC1R)

Mundy 2005Mundy 2005

Nachman et al. Nachman et al. 20032003

MC1R in pocket mouse

Nachman et al. Nachman et al. 20032003

MC1R in pocket mouse: habitat differences

Mundy et al. Mundy et al. 20042004

MC1R in lesser snow goose

Mundy et al. 2004

MC1R in Arctic skua