Introduction to Quantitative Trait Loci Linkage and Association Studies
Lon Cardon, Wellcome Trust Centre for Human Genetics, University of Oxford
Pak Sham, Institute of Psychiatry, King’s College London
Stacey Cherny, Both, and then some
QTL Mapping: Morning Schedule
09.00 – 10.00 Linkage Theory (overview) Sham
10.00 – 10.30 Illustrative application Cardon
11.00 – 11.30 Association/Linkage Disequilibrium Theory Sham
11.30 – 12.15 Application Cherny
12.15 – 12.30 Interpreting the results Cardon
• F:\lon\fulker_paper99.pdf
• Fourteenth International Twin Course (Advanced): Boulder, Colorado, March 2000
Positional Cloning of Complex Traits
[Figure: LOD plot with panels labeled Sib pairs, Chromosome Region, Association Study]
Genetics
Genomics
Physical Mapping/Sequencing
Candidate Gene Selection/Polymorphism Detection
Mutation Characterization/Functional Annotation
Genome Screens for Linkage in Sib-pairs
In 1997/98, > 20 genome screens published using sib-pairs
- Diabetes (IDDM + NIDDM)
- Asthma
- Osteoporosis
- Obesity
- Multiple Sclerosis
- Epilepsy
- Inflammatory Bowel Disease
- Celiac Disease
- Psychiatric Disorders
- Behavioral Traits
- others...
Scan Rate at least 2-fold greater in 1998/1999
Many more studies of specific loci, candidate gene regions
Disequilibrium Mapping
• 100s of candidate gene studies every year
• Replications rare
• Genome-wide SNP maps expected in late 2001 (300,000 SNPs; ~ 1 SNP/10 kb)
• Applications in epidemiology, drug design, functional assessment, …
Likelihood for Variance Components Applications

$$\log(L) = c - \frac{1}{2}\sum_{i=1}^{n} \log|\Sigma_i| - \frac{1}{2}\sum_{i=1}^{n} (y_i - E(y_i))' \, \Sigma_i^{-1} \, (y_i - E(y_i))$$

where $y_i$ is the vector of phenotypes for the ith family, $\Sigma_i$ is a function of polygenic effects, environmental effects, major loci, interactions, etc., and $E(y_i)$ may be used to incorporate a wide range of covariates, including association/disequilibrium parameters.
Lange, Westlake & Spence, AJHG, 1976
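As a minimal sketch of how this likelihood is evaluated, the Python below computes the log-likelihood contribution of a single sib pair under a bivariate normal model, dropping the constant c. The function names `pair_loglik` and `sample_loglik` are hypothetical, and the closed-form 2x2 inverse is used only to keep the example dependency-free; general pedigrees would need a full matrix routine.

```python
import math

def pair_loglik(y, mean, cov):
    """Log-likelihood of one sib pair (bivariate normal), constant c omitted.

    y    : (y1, y2) observed phenotypes
    mean : (m1, m2) expected phenotypes E(y)
    cov  : ((s11, s12), (s21, s22)) expected covariance matrix
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = y[0] - mean[0], y[1] - mean[1]
    # Quadratic form d' Sigma^{-1} d via the closed-form inverse of a 2x2 matrix
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * math.log(det) - 0.5 * quad

def sample_loglik(families):
    """Sum the pair contributions over an iterable of (y, mean, cov) tuples."""
    return sum(pair_loglik(y, mean, cov) for y, mean, cov in families)
```

Maximizing `sample_loglik` over the parameters that generate `mean` and `cov` would give the variance components estimates; the optimizer itself is outside the scope of this sketch.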
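Linear Model of Association (Fulker et al, AJHG, 1999)

Biometrical basis

$$y_{ij} = \mu + G_{ij} + g_{ij} + e_{ij}$$

$$G_{ij} = \begin{cases} a & \text{if genotype}_{ij} = BB \\ 0 & \text{if genotype}_{ij} = Bb \\ -a & \text{if genotype}_{ij} = bb \end{cases}$$

Variance model (linkage)

$$\mathrm{Cov}(y_{ij}, y_{ik} \mid \pi_{ijk}) = \begin{cases} \sigma_a^2 + \sigma_g^2 + \sigma_e^2 & \text{if } j = k \\ \pi_{ijk}\,\sigma_a^2 + \frac{1}{2}\sigma_g^2 & \text{if } j \neq k \end{cases}$$

Means model (association)

$$E(y_{ij}) = \mu + \beta_b b_i + \beta_w w_{ij}$$

Population association is parameterized independently of linkage (unlike TDT)

$\pi_{ijk}$ = proportion of alleles shared IBD at marker
$\sigma_a^2$ = additive genetic variance parameter
$\sigma_g^2$ = polygenic (residual) variance parameter
$\sigma_e^2$ = environmental (residual) variance parameter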
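The variance and means models can be sketched for one sib pair as follows. This assumes the usual sib-pair parameterization: variance sigma_a^2 + sigma_g^2 + sigma_e^2 on the diagonal, covariance pi * sigma_a^2 + sigma_g^2 / 2 off the diagonal, and expected phenotype mu + beta_b * b_i + beta_w * w_ij. The function and argument names are hypothetical.

```python
def sib_pair_cov(pi_hat, var_a, var_g, var_e):
    """Expected 2x2 covariance matrix for a sib pair (variance model, linkage).

    pi_hat : proportion of alleles shared IBD at the marker (0, 0.5 or 1,
             or an estimate in between)
    var_a  : additive genetic variance at the QTL
    var_g  : polygenic (residual genetic) variance
    var_e  : environmental (residual) variance
    """
    diag = var_a + var_g + var_e
    off = pi_hat * var_a + 0.5 * var_g
    return [[diag, off], [off, diag]]

def expected_means(mu, beta_b, beta_w, b_i, w_i):
    """Expected phenotypes for the sibs in one family (means model, association).

    b_i : between-family genotype component (shared by all sibs)
    w_i : sequence of within-family components, one per sib
    """
    return [mu + beta_b * b_i + beta_w * w for w in w_i]
```

Keeping the two functions separate mirrors the point of the slide: linkage enters only through the covariance, association only through the means.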
Application: ACE
• British population
• Circulating ACE levels
  – Normalized separately for males / females
• 10 di-allelic polymorphisms
  – 26 kb
  – Common
  – In strong linkage disequilibrium
• Keavney et al, HMG, 1998
Angiotensin-1 Converting Enzyme
Keavney et al. (1998) Hum Mol Gen, 7:1745-1751

Angiotensin-1 Converting Enzyme: Families (Keavney et al., 1998)
83 extended families
4 - 18 members/family
age: 19-90 years
Families ascertained for study of blood pressure
Phenotype: Plasma ACE activity, standardized within gender
No correlation between ACE and SBP or DBP
ACE Markers and Disequilibrium
[Figure, two panels: frequency of the associated allele (0 to 0.8) at markers 1 to 10; D' (0.75 to 1) for adjacent marker pairs (n, n+1), n = 1 to 9]
Data from Keavney et al. (1998) Hum Mol Gen, 7:1745-1751
Angiotensin Converting Enzyme: Marker/IBD Files
• F:\lon\2000\linkage.mx
• F:\lon\2000\marker*.mx
Linkage in Sib-pairs
[Figure: linkage chi-squared (0 to 20) by marker (0 to 10)]
Between Pairs Model of Association (Fulker et al, AJHG, 1999)

Biometrical Model

Genotype  Genetic Value
BB        a
Bb        0
bb        -a

[Diagram: genotypes bb, Bb, BB on the genetic value scale; bb and BB are 2a apart]

Between Pair Expectations

G1   G2   A1   A2   Mean
BB   BB   a    a    a
BB   Bb   a    0    a/2
BB   bb   a    -a   0
Bb   BB   0    a    a/2
Bb   Bb   0    0    0
Bb   bb   0    -a   -a/2
bb   BB   -a   a    0
bb   Bb   -a   0    -a/2
bb   bb   -a   -a   -a

• Genotype-phenotype associations between pairs may result from allelic association or from population substructure
Within Model of Association (Fulker et al, AJHG, 1999)

Biometrical Model

Genotype  Genetic Value
BB        a
Bb        0
bb        -a

[Diagram: genotypes bb, Bb, BB on the genetic value scale; bb and BB are 2a apart]

Within Expectations

G1   G2   A1   A2   Diff1   Diff2
BB   BB   a    a    0       0
BB   Bb   a    0    a/2     -a/2
BB   bb   a    -a   a       -a
Bb   BB   0    a    -a/2    a/2
Bb   Bb   0    0    0       0
Bb   bb   0    -a   a/2     -a/2
bb   BB   -a   a    -a      a
bb   Bb   -a   0    -a/2    a/2
bb   bb   -a   -a   0       0

• Genotype-phenotype associations within pairs unaffected by sampling artifacts
• Difference = 0 unless 1 parent heterozygous (cf. TDT)
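The between and within expectations can be reproduced with a few lines of Python. This is an illustrative sketch: the pair mean of the additive genetic values is the between component, and each sib's deviation from that mean is the within component. The names `SCORE` and `between_within` are hypothetical, and a defaults to 1 so the output reads in units of a.

```python
# Additive coding of the marker genotype, in units of a
SCORE = {"BB": 1, "Bb": 0, "bb": -1}

def between_within(g1, g2, a=1.0):
    """Return (mean, diff1, diff2) for a sib pair with genotypes g1 and g2.

    mean  : between-pair component, the average genetic value of the pair
    diff1 : within-pair component for sib 1 (its deviation from the pair mean)
    diff2 : within-pair component for sib 2
    """
    a1 = a * SCORE[g1]
    a2 = a * SCORE[g2]
    mean = (a1 + a2) / 2
    return mean, a1 - mean, a2 - mean

# Print all nine genotype combinations in the same order as the slide tables
for g1 in ("BB", "Bb", "bb"):
    for g2 in ("BB", "Bb", "bb"):
        print(g1, g2, between_within(g1, g2))
```

The two within components always sum to zero, so population-level differences in allele frequency cannot enter the within-pair test.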
Parameter Expectations

$$E(\beta_w) = \frac{aD}{rs}$$

$$E(\beta_b) = \frac{aD}{rs} + f(p_k, \mu_k)$$

$$\hat{\sigma}_a^2 = 2pqa^2 - \frac{2a^2D^2}{rs} = 2pqa^2(1 - R^2)$$

Let
a = additive genetic value
D = disequilibrium coef between q1, m1 alleles [P(m1q1)-P(m1)P(q1)]
r = frequency m1 allele (s = 1 – r)
p = frequency q1 allele (q = 1 – p)
R = correlation between numbered alleles at marker and QTL
k = population strata counter
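A small numerical sketch may help connect these quantities. It assumes the standard relations E(beta_w) = aD/(rs), R^2 = D^2/(pqrs), and residual linkage variance 2pqa^2(1 - R^2), together with Lewontin's D' = |D|/Dmax. The function names are hypothetical, and the stratification term in the between-family expectation is not modeled.

```python
def association_expectations(a, D, r, p):
    """Expected within-family regression and residual QTL variance.

    a : additive genetic value
    D : disequilibrium coefficient, P(m1q1) - P(m1)P(q1)
    r : frequency of marker allele m1
    p : frequency of QTL allele q1
    Returns (beta_w, R2, residual_variance).
    """
    s, q = 1.0 - r, 1.0 - p
    beta_w = a * D / (r * s)
    r_squared = D * D / (p * q * r * s)
    residual = 2.0 * p * q * a * a * (1.0 - r_squared)
    return beta_w, r_squared, residual

def d_prime(D, r, p):
    """Lewontin's D': |D| scaled by its maximum given the allele frequencies."""
    s, q = 1.0 - r, 1.0 - p
    d_max = min(r * q, s * p) if D > 0 else min(r * p, s * q)
    return abs(D) / d_max if d_max > 0 else 0.0
```

With complete disequilibrium and matching allele frequencies (for example r = p = 0.5, D = 0.25), the marker regression recovers a in full and no residual linkage variance remains, which is the logic behind testing linkage in the presence of association.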
Variance Components Association Model - Obvious Uses -
Test of linkage only (typical VC): $\sigma_a^2 = 0$
Test of substructure: $\beta_b = \beta_w$
Powerful test in absence of stratification: $\beta_a = \beta_{b+w} = 0$
Test of linkage in presence of association: $\sigma_a^2 = 0$ ($\beta_a$ free)
Variance Components Test for Linkage Disequilibrium - Power of Testing Linkage vs LD -
[Figure: power by D/Dmax (0.25 to 1); series Linkage (axis "Power: Linkage", 0 to 0.5) and LD (axis "Power: LD", 0 to 1)]
Linkage in Sib-pairs
[Figure: linkage chi-squared (0 to 20) by marker (0 to 10)]
Linkage in the presence of association
[Figure: chi-squared (linkage|assoc) (0 to 7) by marker (1 to 10)]
Linkage and Association in Sib Pairs
[Figure: linkage chi-squared (0 to 8) and association chi-squared (0 to 100) by marker (1 to 10)]
Evidence for Linkage: Full Sample
[Figure: LOD (0 to 10) by marker]
Markers: T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, 4656(CT)3/2
Evidence Against Complete LD: Full Sample
[Figure: LOD (0 to 1.5) by marker]
Markers: T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, 4656(CT)3/2
Evidence for Association: Full Sample
[Figure: LOD (0 to 15) by marker]
Markers: T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, 4656(CT)3/2
Drawing Conclusions: Full Sample
[Figure: LOD for association (0 to 15) and LOD against complete LD (0 to 1.5) by marker]
Markers: T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, 4656(CT)3/2
ACE Example Summary
• Agrees with haplotype analysis
• Distinguishes complete and incomplete disequilibrium
  – Measure of distance for incomplete LD
  – Indicator of trait allele frequencies
• Typical or fairy-tale?
D' Estimates, Oxford ACE Data
[Figure: D' (0.5 to 1) by marker (1 to 10); series: observed D' with I/D, minimum estimated D']
QTL Allele Frequency Estimation
[Figure: QTL allele frequency (0 to 0.7) by marker (1 to 10)]
Useful diagnostics
• Fit association and linkage models separately
• Provide indicator of distance
  – Minimum D’ (D’min)
• Select next markers
  – Range for QTL alleles (pmin, pmax)
Haplotype Analysis
• 3 clades
  – All common haplotypes
  – >90% of all haplotypes
• “B” = “C”
  – Equal phenotypic effect
  – Functional variant on right
• Keavney et al (1998)
[Figure: haplotype tree with clades A, B, C; haplotypes listed in the order shown]
TATATTAIA3
TATATCGIA3
TATATTGIA3
CCCTCCGDG2
CCCTCCADG2
TATATCADG2
TACATCADG2
Case/Control Studies: Admixture
Consider two case/control samples, A and B, genotyped at a marker with alleles M and m. Neither has any association:

Sample ‘A’
             M      m      Freq.
Affected     50     50     .10
Unaffected   450    450    .90
Freq.        .50    .50
$\chi^2_1$ is n.s.

Sample ‘B’
             M      m      Freq.
Affected     1      9      .01
Unaffected   99     891    .99
Freq.        .10    .90
$\chi^2_1$ is n.s.

Now consider a single sample comprised of A + B:
             M      m      Freq.
Affected     51     59     .055
Unaffected   549    1341   .945
Freq.        .30    .70
$\chi^2_1$ = 14.84, p < 0.001

…Association can be induced by mixed samples
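The admixture arithmetic can be checked directly. The sketch below applies the standard Pearson chi-squared formula for a 2x2 table (1 df, no continuity correction) to samples A and B and to the pooled sample; the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared (1 df, no continuity correction) for the table

                 M    m
    Affected     a    b
    Unaffected   c    d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

sample_a = (50, 50, 450, 450)
sample_b = (1, 9, 99, 891)
# Pool the two samples cell by cell: (51, 59, 549, 1341)
pooled = tuple(x + y for x, y in zip(sample_a, sample_b))

print(chi_square_2x2(*sample_a))           # 0.0: no association within A
print(chi_square_2x2(*sample_b))           # 0.0: no association within B
print(round(chi_square_2x2(*pooled), 2))   # 14.84: association induced by pooling
```

Within each sample the allele frequencies are identical in affected and unaffected groups, so the statistic is exactly zero; the pooled signal comes entirely from the two samples differing in both allele frequency and disease prevalence.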
The Spielman TDT
• Traditional case-control
  – Compare allele frequencies in two samples
    • Cases and controls must be one population
• Heterozygous parents
  – Parental alleles are the study population
  – Population allele frequencies fixed
    • 50:50, independent of original
  – Test for excess among affected offspring
[Diagram: parents with genotypes 1/2 and 3/4; offspring with genotype 1/3]
TDT based on $(T - NT)^2/(T + NT)$
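As a minimal sketch, the statistic can be computed from the counts of transmitted (T) and non-transmitted (NT) copies of an allele among heterozygous parents. The function name and the example counts are hypothetical and are not taken from any data set in this talk.

```python
def tdt_statistic(transmitted, not_transmitted):
    """TDT chi-squared (1 df): (T - NT)^2 / (T + NT).

    transmitted     : times heterozygous parents passed the allele of interest
                      to an affected offspring
    not_transmitted : times they passed the other allele instead
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / total

# Hypothetical counts for illustration: 60 transmissions vs 40 non-transmissions
print(tdt_statistic(60, 40))  # 4.0
```

Under the null hypothesis each heterozygous parent transmits either allele with probability 1/2, so T and NT are expected to be equal regardless of population allele frequencies.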
Transmission/Disequilibrium Test
• TDT uses only heterozygous parents
  Consequence: at different markers with variable allele frequencies, analyses are based on different subsets of the overall sample => difficulties for localization
• TDT evaluates linkage in presence of association; i.e., joint test
  Consequence: given positive evidence, cannot distinguish between strong linkage and strong association
• Several sibling-based extensions developed
Family-based Association Methods for Quantitative Traits
Allison, D.B., AJHG, 1997               Selected parent-offspring trios
Rabinowitz, D. Hum Hered, 1997          Nuclear families
Fulker, D. W. et al. AJHG, 1999         Sib-pairs without parents
Elston, R. C. et al. AJHG, 1999         General pedigrees (linkage)
Allison, D. B. et al. AJHG, 1999        Sibships with/without parents (linkage)
Abecasis, G. et al. AJHG 2000           General pedigrees with/without parents
Cardon, L.R. Hum Hered 2000             Sib-pairs with GxE, epistatic interactions
Monks, S. et al. abstract 1999          Nuclear families
Primary aim: association test free of pop. sub-structure effects
Quantitative Genetic Model

Genotype  Genetic Value
BB        a
Bb        d
bb        -a

[Diagram: bb and BB lie 2a apart on the genetic value scale; Bb lies at d from the midpoint]
Simple Association Model
• Fit by linear regression
  – Phenotype ($y_{ij}$)
  – Mean ($\mu$)
  – Number of ‘B’ alleles at marker ($g_{ij}$)
• Evidence for association when $\beta_a \neq 0$

$$y_{ij} = \mu + \beta_a g_{ij} + \varepsilon_{ij}$$
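The regression behind the simple association model can be sketched with ordinary least squares in plain Python. This assumes the phenotype is regressed on the number of B alleles (0, 1 or 2) with intercept mu and slope beta_a; the function name and the toy data are hypothetical, and no standard errors or tests are computed.

```python
def fit_association(g, y):
    """Least-squares fit of y = mu + beta_a * g; returns (mu, beta_a).

    g : number of 'B' alleles carried by each individual (0, 1 or 2)
    y : phenotype for each individual
    """
    n = len(g)
    if n != len(y) or n < 2:
        raise ValueError("g and y must have the same length (at least 2)")
    g_bar = sum(g) / n
    y_bar = sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    beta_a = sxy / sxx
    mu = y_bar - beta_a * g_bar
    return mu, beta_a

# Toy data generated without noise from mu = 1.0, beta_a = 0.5
genotypes = [0, 1, 2, 1, 0, 2]
phenotypes = [1.0, 1.5, 2.0, 1.5, 1.0, 2.0]
print(fit_association(genotypes, phenotypes))
```

Because this model uses the total genotype score, it does not separate between-family from within-family effects and so offers no protection against population stratification.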
Linear Model of Association in Sib-pairs

$$E(y_{ij}) = \mu + \beta_b b_i + \beta_w w_{ij}$$

$b_i$ and $w_{ij}$ are defined on the basis of the marker genotype, i.e., $\beta_b$ and $\beta_w$ are f(genotype(QTL), genotype(marker), $D_{mq}$)
ACE: D’min, pmin and pmax

               Expected      Actual
               T-5991C       G2215A       I/D          G2350A
D’             > 0.78        0.78         0.82         0.85
Minor allele   .15 – .48     .45 - .50    .45 - .50    .45 - .50