Association analysis Shaun Purcell Boulder Twin Workshop 2004.

Post on 20-Dec-2015


Association analysis

Shaun Purcell, Boulder Twin Workshop 2004

Overview

• Candidate gene association

• Haplotypes and linkage disequilibrium

• Linkage and association

• Family-based association

What is association?

• Categorical traits: disease susceptibility genes

• Continuous traits: quantitative trait loci (QTL)

Disease traits

Genotype  Case  Control
AA        n1    n2
Aa        n3    n4
aa        n5    n6

Is there a difference in allele/genotype frequency between cases and controls?

Disease traits

Genotype  Case  Control  HWE frequency
AA        30    25       $p^2$
Aa        50    50       $2p(1-p)$
aa        20    25       $(1-p)^2$

Is there a difference in allele/genotype frequency between cases and controls?

$\chi^2$ test for independence, p-value
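As a minimal sketch in plain Python (no statistics library assumed), the 2 df genotype test on the example counts can be computed directly. The shortcut p = exp(−χ²/2) is the exact chi-square upper tail only for 2 df; the function name `chi2_independence` is illustrative.

```python
import math

def chi2_independence(table):
    """Pearson chi-square test of independence for an r x 2 (case, control) table.

    table: list of (case_count, control_count) rows, one row per genotype.
    Returns (statistic, degrees_of_freedom).
    """
    case_total = sum(row[0] for row in table)
    control_total = sum(row[1] for row in table)
    n = case_total + control_total
    stat = 0.0
    for cases, controls in table:
        row_total = cases + controls
        for observed, col_total in ((cases, case_total), (controls, control_total)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(table) - 1

# Genotype counts from the slide, rows AA, Aa, aa as (cases, controls)
genotypes = [(30, 25), (50, 50), (20, 25)]
stat, df = chi2_independence(genotypes)
# Closed-form upper tail, valid only because df == 2 here
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```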

Disease traits

General model (2 df)

Genotype  Case   Control
AA        n1     n2
Aa        n3     n4
aa        n5     n6

Additive model (1 df)

Allele    Case     Control
A         2n1+n3   2n2+n4
a         2n5+n3   2n6+n4

Dominant model for A (1 df), where A* = AA or Aa

Genotype  Case    Control
A*        n1+n3   n2+n4
aa        n5      n6

Effect sizes calculated as odds ratios
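The collapsing above can be sketched as follows, applying it to the earlier example counts (30/25, 50/50, 20/25). This is an illustration only: it uses the standard Pearson 2x2 chi-square without continuity correction, and the allele-count table treats the two alleles of each person as independent (an HWE assumption). Function names are hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows: 'risk' category then reference category; columns: case, control.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_1df(stat):
    """Upper tail of chi-square with 1 df, i.e. P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2))

def odds_ratio(a, b, c, d):
    """(a/b) / (c/d): case:control odds in the risk row over the reference row."""
    return (a * d) / (b * c)

# Genotype counts (cases, controls) from the example table
n1, n2 = 30, 25   # AA
n3, n4 = 50, 50   # Aa
n5, n6 = 20, 25   # aa

# Additive (allelic) model: two alleles counted per person
allelic = (2 * n1 + n3, 2 * n2 + n4, 2 * n5 + n3, 2 * n6 + n4)
# Dominant model for A: A* (AA or Aa) versus aa
dominant = (n1 + n3, n2 + n4, n5, n6)

for name, tbl in (("allelic", allelic), ("dominant", dominant)):
    stat = chi2_2x2(*tbl)
    print(f"{name}: chi2 = {stat:.3f}, p = {p_1df(stat):.3f}, OR = {odds_ratio(*tbl):.3f}")
```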

Quantitative traits

[Figure: trait values (axis from -2 to 4) for genotypes aa, Aa and AA]

ID    Y      G    A    D
001   0.34   aa   -1   0
002   1.23   Aa   0    1
003   1.66   Aa   0    1
004   2.74   AA   1    0
005   1.33   AA   1    0
…     …      …    …    …

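$Y = aA + dD + e$, where A and D are the additive and dominance codings of the genotype G shown in the table, a and d are the additive and dominance effects, and e is the residual.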
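A minimal least-squares sketch of this regression, in plain Python on the five rows shown. Assumption: an intercept m is added (the slide's equation omits it), so the fitted model is Y = m + aA + dD + e. With only these rows the model is saturated (three genotype classes, three parameters), so the fit reproduces the genotype means.

```python
# Coding from the table: A = -1/0/1 and D = 0/1/0 for aa/Aa/AA
CODES = {"aa": (-1, 0), "Aa": (0, 1), "AA": (1, 0)}

def solve(mat, vec):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [row[:] + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def fit_additive_dominance(ys, genotypes):
    """Return (m, a, d) minimising the residual sum of squares of Y = m + aA + dD."""
    rows = [(1.0, *CODES[g]) for g in genotypes]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return tuple(solve(xtx, xty))

ys = [0.34, 1.23, 1.66, 2.74, 1.33]
gs = ["aa", "Aa", "Aa", "AA", "AA"]
m, a, d = fit_additive_dominance(ys, gs)
print(f"m = {m:.4f}, a = {a:.4f}, d = {d:.4f}")
```

Here a is half the difference between the AA and aa means, and d is the deviation of the Aa mean from their midpoint.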

Some web resources

• BGIM: http://statgen.iop.kcl.ac.uk/bgim/
  Introductory tutorials on twin analysis, a primer on maximum likelihood, and the Mx language.

• GxE moderator models: http://statgen.iop.kcl.ac.uk/gxe/

• Power calculation: http://statgen.iop.kcl.ac.uk/gpc/

• Case/control association tools: http://statgen.iop.kcl.ac.uk/gpc/model/

Relative risk

Genotype  P(D|G)    RR
AA        P(D|AA)   P(D|AA)/P(D|aa)
Aa        P(D|Aa)   P(D|Aa)/P(D|aa)
aa        P(D|aa)   1

P(D|AA) / P(D|aa) is labelled RR(AA)

P(D|Aa) / P(D|aa) is labelled RR(Aa)

Genetic models

Model            RR(Aa)   RR(AA)
General          x        y
Multiplicative   x        $x^2$
Dominant         x        x
Recessive        1.000    x
No effect        1.000    1.000
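To make the models concrete, here is a sketch that maps each model to (RR(Aa), RR(AA)) and converts relative risks into penetrances P(D|G) and case genotype frequencies P(G|D). Assumptions: Hardy-Weinberg genotype frequencies with p = frequency of A, and a known disease prevalence K; the numbers used (K = 0.05, p = 0.367, x = 1.911) are illustrative only, and the helper names are hypothetical. The sketch does not check that penetrances stay below 1.

```python
def genotype_rrs(model, x, y=None):
    """(RR(Aa), RR(AA)) relative to aa, following the genetic-models table."""
    table = {
        "general": (x, y),
        "multiplicative": (x, x * x),
        "dominant": (x, x),
        "recessive": (1.0, x),
        "none": (1.0, 1.0),
    }
    return table[model]

def penetrances(prevalence, p, rr_het, rr_hom):
    """P(D|G) for G in AA, Aa, aa, plus the HWE genotype frequencies P(G)."""
    q = 1.0 - p
    freq = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    rr = {"AA": rr_hom, "Aa": rr_het, "aa": 1.0}
    # Prevalence K = sum_G P(G) * P(D|aa) * RR(G), solved for the baseline P(D|aa)
    baseline = prevalence / sum(freq[g] * rr[g] for g in freq)
    return {g: baseline * rr[g] for g in freq}, freq

def case_genotype_freqs(pen, freq, prevalence):
    """P(G|D) by Bayes' rule: P(D|G) P(G) / P(D)."""
    return {g: pen[g] * freq[g] / prevalence for g in freq}

K, p = 0.05, 0.367
rr_het, rr_hom = genotype_rrs("multiplicative", 1.911)
pen, freq = penetrances(K, p, rr_het, rr_hom)
cases = case_genotype_freqs(pen, freq, K)
for g in ("AA", "Aa", "aa"):
    print(g, f"P(D|G)={pen[g]:.4f}", f"P(G)={freq[g]:.3f}", f"P(G|D)={cases[g]:.3f}")
```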

Tests

Test                                          Alternate        Null
Any effect?                                   General          No effect
Any effect assuming a multiplicative gene?    Multiplicative   No effect
Any effect assuming a dominant gene?          Dominance        No effect
Any effect assuming a recessive gene?         Recessive        No effect
Can we assume a multiplicative effect?        General          Multiplicative
Can we assume a dominant effect?              General          Dominance
Can we assume a recessive effect?             General          Recessive

Multiple samples

• Constrain frequencies across samples
• Constrain effects across samples
  – Can test genetic models with effects and/or frequencies constrained to be equal
  – Can perform tests of homogeneity of effects and/or frequencies across samples

An example: 2 case/control samples

• Population frequency of the disease (prevalence): 5%

Sample 1

Genotype  Case  Control
AA        17    11
Aa        35    59
aa        24    40

Sample 2

Genotype  Case  Control
AA        37    10
Aa        67    43
aa        20    37
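A quick, hedged check on the raw counts: crude genotype odds ratios (AA and Aa, each against aa) computed separately for the two samples. These are not the RR estimates in the model-fitting tables, which come from a likelihood model that also uses the 5% prevalence; they only show that the raw data already suggest larger effects in sample 2.

```python
# Genotype counts (cases, controls) for the two samples on the slide
samples = {
    "sample 1": {"AA": (17, 11), "Aa": (35, 59), "aa": (24, 40)},
    "sample 2": {"AA": (37, 10), "Aa": (67, 43), "aa": (20, 37)},
}

def genotype_odds_ratios(counts):
    """Crude odds ratios for Aa and AA, each relative to aa."""
    ref_case, ref_ctrl = counts["aa"]
    return {
        g: (counts[g][0] * ref_ctrl) / (counts[g][1] * ref_case)
        for g in ("Aa", "AA")
    }

results = {name: genotype_odds_ratios(c) for name, c in samples.items()}
for name, ors in results.items():
    print(name, {g: round(v, 2) for g, v in ors.items()})
```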

Homogeneous effects across samples; homogeneous allele frequencies across samples

        Sample 1                  Sample 2
Model   p      RR(Aa)  RR(AA)     p      RR(Aa)  RR(AA)    -2LL
-----   -----  ------  ------     -----  ------  ------    -------
Gen     0.367  1.979   3.663      0.367  1.979   3.663     793.143
Mult    0.367  1.911   3.651      0.367  1.911   3.651     793.199
Dom     0.401  1.990   1.990      0.401  1.990   1.990     802.927
Rec     0.405  1.000   1.921      0.405  1.000   1.921     805.064
None    0.442  1.000   1.000      0.442  1.000   1.000     815.628

Heterogeneous effects across samples; homogeneous allele frequencies across samples

        Sample 1                  Sample 2
Model   p      RR(Aa)  RR(AA)     p      RR(Aa)  RR(AA)    -2LL
-----   -----  ------  ------     -----  ------  ------    -------
Gen     0.367  1.235   2.136      0.367  2.890   5.547     786.498
Mult    0.367  1.440   2.073      0.367  2.282   5.208     788.262
Dom     0.401  1.216   1.216      0.401  2.936   2.936     796.422
Rec     0.405  1.000   1.519      0.405  1.000   2.195     803.849
None    0.443  1.000   1.000      0.443  1.000   1.000     815.628

TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen  vs None (2 df) : 22.485   p = 0.000
Mult vs None (1 df) : 22.429   p = 0.000
Dom  vs None (1 df) : 12.701   p = 0.000
Rec  vs None (1 df) : 10.564   p = 0.001
Gen  vs Mult (1 df) :  0.056   p = 0.813
Gen  vs Dom  (1 df) :  9.784   p = 0.002
Gen  vs Rec  (1 df) : 11.921   p = 0.001

TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen  vs None (4 df) : 29.130   p = 0.000
Mult vs None (2 df) : 27.366   p = 0.000
Dom  vs None (2 df) : 19.205   p = 0.000
Rec  vs None (2 df) : 11.779   p = 0.003
Gen  vs Mult (2 df) :  1.764   p = 0.414
Gen  vs Dom  (2 df) :  9.925   p = 0.007
Gen  vs Rec  (2 df) : 17.351   p = 0.000

TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
===========================================
w/ Gen model  (2 df) : 6.645   p = 0.036
w/ Mult model (1 df) : 4.938   p = 0.026
w/ Dom model  (1 df) : 6.505   p = 0.011
w/ Rec model  (1 df) : 1.215   p = 0.270
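Each test is a likelihood-ratio test: the difference in -2LL between nested models, referred to a chi-square with df equal to the difference in free parameters. The sketch below recomputes the tests from the -2LL columns of the two results tables, using closed-form chi-square tails for integer df (plain Python, no SciPy assumed). Third-decimal differences (e.g. 4.937 vs 4.938) come from the rounding of the tabulated -2LL values.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed forms)."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k < df/2} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(y))
    term = 2.0 * math.sqrt(y / math.pi)  # k = 1 term: y^(1/2) / Gamma(3/2)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (k + 0.5)
    return total

def lrt(m2ll_null, m2ll_alt, df):
    """Likelihood-ratio statistic (null minus alternative -2LL) and p-value."""
    stat = m2ll_null - m2ll_alt
    return stat, chi2_sf(stat, df)

# -2LL values from the results tables (equal allele frequencies in both)
eq_effects = {"Gen": 793.143, "Mult": 793.199, "Dom": 802.927, "Rec": 805.064, "None": 815.628}
uneq_effects = {"Gen": 786.498, "Mult": 788.262, "Dom": 796.422, "Rec": 803.849, "None": 815.628}

tests = [("Gen", "None", 2), ("Mult", "None", 1), ("Dom", "None", 1), ("Rec", "None", 1),
         ("Gen", "Mult", 1), ("Gen", "Dom", 1), ("Gen", "Rec", 1)]
for label, fits, scale in (("EQ EFFECTS", eq_effects, 1), ("UNEQ EFFECTS", uneq_effects, 2)):
    print(label)
    for alt, null, df in tests:
        # Unequal effects: one set of RR parameters per sample, so df doubles
        stat, p = lrt(fits[null], fits[alt], df * scale)
        print(f"  {alt} vs {null} ({df * scale} df) : {stat:.3f}  p = {p:.3f}")

print("EQUAL EFFECTS")
for model, df in (("Gen", 2), ("Mult", 1), ("Dom", 1), ("Rec", 1)):
    stat, p = lrt(eq_effects[model], uneq_effects[model], df)
    print(f"  w/ {model} model ({df} df) : {stat:.3f}  p = {p:.3f}")
```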

Indirect association

[Figure: a QTL with nearby genotyped and ungenotyped markers]

Recombination

[Figure: paternal and maternal homologous chromosomes in one parent]

Recombination event during meiosis

Recombinant gamete transmitted, harboring mutation

Recombination

[Figure: paternal and maternal homologous chromosomes in one parent]

No recombination event during meiosis

Nonrecombinant gamete transmitted, not harboring mutation

Linkage: affected sib pairs

[Figure: paternal and maternal chromosomes of one parent]

First affected offspring: no recombination

Second affected offspring: recombinant gamete

IBD sharing from this one parent (0 or 1); values shown in the figure: 1 and 0
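Why a marker carries linkage information can be sketched for a single parent. Assumption: each sib's gamete is recombinant between marker and trait locus independently with probability θ (the recombination fraction). The sibs' IBD status (0 or 1) from this parent is the same at both loci when both gametes are recombinant or neither is, giving (1 − θ)² + θ². This equals 1 at θ = 0 and 0.5 (no information) at θ = 0.5.

```python
from itertools import product

def p_ibd_concordant(theta):
    """P(marker IBD from one parent equals trait-locus IBD) for a sib pair.

    Enumerates whether each sib's gamete is recombinant (probability theta).
    IBD status is preserved when both or neither gametes are recombinant.
    """
    total = 0.0
    for rec1, rec2 in product((False, True), repeat=2):
        prob = (theta if rec1 else 1 - theta) * (theta if rec2 else 1 - theta)
        if rec1 == rec2:
            total += prob
    return total  # equals (1 - theta)**2 + theta**2

for theta in (0.0, 0.1, 0.5):
    print(theta, p_ibd_concordant(theta))
```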

Association analysis

• Mutation occurs on a ‘red’ chromosome


• Association due to ‘linkage disequilibrium’

Haplotypes

[Figure: two-locus haplotypes AM, aM, Am and am]

This individual has aa and Mm genotypes and am and aM haplotypes.

Haplotypes

[Figure: two-locus haplotypes AM, aM, Am and am]

This individual has Aa and Mm genotypes and AM and am haplotypes…

… but given only genotype data, these genotypes are consistent with Am/aM as well as AM/am.

Haplotypes

[Figure: two-locus haplotypes AM, aM, Am and am]

This individual has AA and Mm genotypes and AM and Am haplotypes.
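The phase ambiguity in these examples can be shown by enumerating the haplotype pairs consistent with two unphased genotypes. In this small sketch (function name hypothetical; genotypes written as two-character strings), only the double heterozygote Aa/Mm yields more than one resolution.

```python
def phase_resolutions(locus1, locus2):
    """Unordered haplotype pairs consistent with two unphased genotypes.

    locus1, locus2: two-character genotypes such as "Aa" and "Mm".
    Returns a set of sorted (haplotype, haplotype) tuples.
    """
    x1, x2 = locus1
    y1, y2 = locus2
    # Either allele y1 sits on the same chromosome as x1, or y2 does
    candidates = [(x1 + y1, x2 + y2), (x1 + y2, x2 + y1)]
    return {tuple(sorted(pair)) for pair in candidates}

for g1, g2 in (("aa", "Mm"), ("Aa", "Mm"), ("AA", "Mm")):
    print(g1, g2, sorted(phase_resolutions(g1, g2)))
```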

Equilibrium haplotype frequencies

       A      a
M      pr     ps     p
m      qr     qs     q
       r      s

(p and q are the frequencies of M and m; r and s are the frequencies of A and a.)

Linkage disequilibrium

       A          a
M      pr + D     ps - D     p
m      qr - D     qs + D     q
       r          s

$D_{\max} = \min(ps, qr)$ if $D > 0$; $D_{\max} = \min(pr, qs)$ if $D < 0$

$D' = D / D_{\max}$

$r^2 = D^2 / (pqrs)$
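A small sketch computing D, D′ and r² from the four haplotype frequencies, using the notation of the tables (p = freq(M), q = freq(m), r = freq(A), s = freq(a)). The frequencies in the example are hypothetical, chosen so that AM is in excess by D = 0.05. D′ is returned signed, matching D′ = D / D_max.

```python
def ld_measures(f_AM, f_aM, f_Am, f_am):
    """D, D' and r^2 from two-locus haplotype frequencies."""
    p = f_AM + f_aM   # freq(M)
    q = f_Am + f_am   # freq(m)
    r = f_AM + f_Am   # freq(A)
    s = f_aM + f_am   # freq(a)
    d = f_AM - p * r  # observed minus equilibrium frequency of AM
    # The bound depends on the sign of D (cells that would go negative)
    d_max = min(p * s, q * r) if d > 0 else min(p * r, q * s)
    d_prime = d / d_max
    r2 = d * d / (p * q * r * s)
    return d, d_prime, r2

d, d_prime, r2 = ld_measures(0.47, 0.13, 0.23, 0.17)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.4f}")
```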

Haplotype analysis

1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait

Haplotype  Freq.  Odds ratio
AAGG       40%    1.00*
AAGT       30%    2.21
CGCG       25%    1.07
AGCT       5%     0.92

* baseline, fixed to 1.00
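Step 1 is commonly done with the EM algorithm; the slide does not say which method was used, so the following is a sketch of that standard approach restricted to two biallelic loci. Only double heterozygotes have ambiguous phase; the E-step splits them between AM/am and Am/aM in proportion to the current haplotype frequencies, and the M-step re-estimates frequencies from the expected counts. The example data are hypothetical.

```python
def em_two_locus(genotypes, iterations=50):
    """Estimate AM, Am, aM, am frequencies by EM.

    genotypes: list of (number of A alleles, number of M alleles), each 0, 1 or 2.
    """
    freqs = {h: 0.25 for h in ("AM", "Am", "aM", "am")}
    for _ in range(iterations):
        counts = {h: 0.0 for h in freqs}
        for n_a, n_m in genotypes:
            if (n_a, n_m) == (1, 1):
                # E-step for a double heterozygote: P(AM/am) vs P(Am/aM)
                w_cis = freqs["AM"] * freqs["am"]
                w_trans = freqs["Am"] * freqs["aM"]
                w = 0.5 if w_cis + w_trans == 0 else w_cis / (w_cis + w_trans)
                counts["AM"] += w
                counts["am"] += w
                counts["Am"] += 1 - w
                counts["aM"] += 1 - w
            else:
                # At most one heterozygous locus: phase is known
                h1 = ("A" if n_a >= 1 else "a") + ("M" if n_m >= 1 else "m")
                h2 = ("A" if n_a == 2 else "a") + ("M" if n_m == 2 else "m")
                counts[h1] += 1
                counts[h2] += 1
        # M-step: expected haplotype counts over 2N chromosomes
        total = 2 * len(genotypes)
        freqs = {h: c / total for h, c in counts.items()}
    return freqs

data = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 2
est = em_two_locus(data)
print({h: round(f, 4) for h, f in est.items()})
```

Because everyone else carries only AM or am, EM assigns the two double heterozygotes to AM/am. Step 2 would then relate the (expected) haplotypes to the trait, for example as odds ratios against a baseline haplotype as in the table.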

Linkage / Association

[Figure. Association: trait plotted against QTL genotype (aa, Aa, AA); a marker genotype relates to the trait through LD with the QTL. Linkage: sib correlation plotted against IBD at the QTL (0, 1, 2); IBD at a marker relates to sib correlation through the recombination fraction (RF) with the QTL.]

Variance Components

• Means: $M_1$, $M_2$

• Variance-covariance matrix:

$\begin{pmatrix} V_1 & C_{21} \\ C_{12} & V_2 \end{pmatrix}$

Association is modelled in the means; linkage is modelled in the covariance.

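Variance Components

• Means: $M_1 + bG_1$, $M_2 + bG_2$

• Variance-covariance matrix:

$\begin{pmatrix} V_1 & C_{21} + q(\hat\pi - \tfrac12) \\ C_{12} + q(\hat\pi - \tfrac12) & V_2 \end{pmatrix}$

LINKAGE: q = regression coefficient; $\hat\pi$ = proportion of IBD sharing (0, ½, 1)

ASSOCIATION: b = regression coefficient; G = individual’s genotype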
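A sketch of how one sib pair's expected means and covariance matrix are built under this model, together with the bivariate-normal -2 log-likelihood that a maximum-likelihood fit would sum over pairs. Assumptions: V1 = V2 = v and C12 = C21 = c, and `pihat` denotes the estimated IBD proportion; all names are hypothetical.

```python
import math

def sib_pair_model(m, b, g1, g2, v, c, q, pihat):
    """Expected means and covariance for one sib pair.

    Association enters the means (m + b*G); linkage enters the covariance
    (c + q*(pihat - 1/2)), with pihat the IBD proportion (0, 1/2 or 1).
    """
    means = (m + b * g1, m + b * g2)
    cov = c + q * (pihat - 0.5)
    return means, ((v, cov), (cov, v))

def minus2ll(y, means, sigma):
    """-2 log-likelihood of one bivariate-normal observation y = (y1, y2)."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d1, d2 = y[0] - means[0], y[1] - means[1]
    # Quadratic form d' Sigma^{-1} d via the closed-form 2x2 inverse
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad

means, sigma = sib_pair_model(m=0.0, b=0.5, g1=1, g2=0, v=1.0, c=0.3, q=0.2, pihat=1.0)
print(means, sigma, minus2ll((0.8, 0.1), means, sigma))
```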

Components of a Genetic Theory

• POPULATION MODEL
  – Allele & genotype frequencies
  – Demographics & population history
  – Linkage disequilibrium, haplotype structure

• TRANSMISSION MODEL
  – Mendelian segregation
  – Identity by descent & genetic relatedness

• PHENOTYPE MODEL
  – Biometrical model of quantitative traits
  – Additive & dominance components

[Figure: genotypes (G) and phenotypes (P) over time]

Linkage without association

[Figure: two families; marker genotypes shown: 3/5, 2/6, 3/2, 5/2 and 3/5, 2/6, 3/6, 5/6]

Both families are ‘linked’ with the marker…

…but a different allele is involved.

Linkage and association

[Figure: three families; marker genotypes shown: 3/6, 2/4, 3/2, 6/2; 3/5, 2/6, 3/6, 5/6; 4/6, 2/6, 6/6, 6/6]

All families are ‘linked’ with the marker…

… and allele 6 is ‘associated’ with disease

Linkage is just association within families

Association without linkage

[Figure: marker genotypes of controls and cases drawn from two populations, one shown in GREEN]

Allele 6 is more common in the GREEN population.

The disease is more common in the GREEN population.

… a ‘spurious association’
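A numerical sketch of this spurious association, with hypothetical numbers. Two equally sized populations each have no allele–disease association, but the GREEN population has both more allele 6 and more disease. Pooling them yields an allelic odds ratio well above 1.

```python
def pooled_allele_odds_ratio(populations):
    """Allelic odds ratio (cases vs controls) after pooling populations.

    populations: list of (weight, allele_freq, prevalence). Within each
    population the allele is independent of disease.
    """
    case_mass = case_allele = ctrl_mass = ctrl_allele = 0.0
    for weight, freq, prev in populations:
        case_mass += weight * prev
        case_allele += weight * prev * freq
        ctrl_mass += weight * (1 - prev)
        ctrl_allele += weight * (1 - prev) * freq
    f_case = case_allele / case_mass
    f_ctrl = ctrl_allele / ctrl_mass
    return (f_case / (1 - f_case)) / (f_ctrl / (1 - f_ctrl))

# Hypothetical: GREEN has allele-6 frequency 0.5 and prevalence 20%
green = (0.5, 0.5, 0.20)
other = (0.5, 0.1, 0.05)
or_within = pooled_allele_odds_ratio([green])
or_pooled = pooled_allele_odds_ratio([green, other])
print(or_within, or_pooled)
```

Within GREEN alone the odds ratio is 1; in the pooled sample it is about 1.84, purely because of population structure.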

TDT

• Transmission disequilibrium test: a test for linkage and association

[Figure: families with parental and offspring genotypes (AA, Aa, aa)]

TDT: “A” disease allele

Mating      AA x Aa   AA x Aa   aa x Aa   aa x Aa
Offspring   AA        Aa        Aa        aa
Additive    +         -         +         -
Dominant    0.5       0.5       +         -
Recessive   +         -         0.5       0.5
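The table shows which offspring outcomes are informative under each model. As a sketch of the usual allelic (additive) TDT only: count alleles transmitted by heterozygous Aa parents (b = A transmitted, c = a transmitted) and compare (b − c)²/(b + c) with a chi-square on 1 df. Assumptions: genotypes are written "AA", "Aa" or "aa" and are Mendelian-consistent; the trios are hypothetical.

```python
import math

def count_transmissions(trios):
    """Count alleles transmitted by heterozygous (Aa) parents.

    trios: list of (father, mother, child) genotypes.
    Returns (b, c): A alleles and a alleles transmitted by Aa parents.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        hets = sum(1 for g in parents if g == "Aa")
        if hets == 0:
            continue  # homozygous parents are uninformative
        a_from_homs = sum(1 for g in parents if g == "AA")  # AA always passes A
        a_from_hets = child.count("A") - a_from_homs
        b += a_from_hets
        c += hets - a_from_hets
    return b, c

def tdt(b, c):
    """Allelic TDT statistic (b - c)^2 / (b + c) and its 1-df p-value."""
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))

trios = [("Aa", "AA", "AA"), ("Aa", "aa", "aa"), ("Aa", "Aa", "AA"), ("AA", "AA", "AA")]
b, c = count_transmissions(trios)
print(b, c, tdt(b, c))
```

Because only heterozygous parents contribute, mixing populations with different allele frequencies does not bias the test, which is the point of the family-based design.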

Between and within components

[Figure: genotype scores of Sib1 and Sib2]

Sib1 = B - W

Sib2 = B + W

Between and within components

• Fulker et al (1999)

Genotypes     Scores        Components    Model
S1    S2      S1    S2      B     W       S1     S2
AA    AA      1     1       1     0       B+W    B-W
AA    Aa      1     0       0.5   0.5     B+W    B-W
AA    aa      1     -1      0     1       B+W    B-W

Note: W = S1 – B

Parental genotypes

• Use parental genotypes to generate B

• Examples
  – AA from AA x AA: W = 0
  – Aa from AA x Aa: W = -0.5
  – Aa from Aa x Aa: W = 0

Pat   Mat   B
1     1     1
1     0     0.5
1     -1    0
0     1     0.5
0     0     0
0     -1    -0.5
-1    1     0
-1    0     -0.5
-1    -1    -1
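Both ways of forming B can be sketched with the scores used in the tables (AA = 1, Aa = 0, aa = −1). From sibs alone, B is the sibship mean score. From parents, B is the mean parental score. In either case each offspring's within component is W = S − B. Function names are hypothetical.

```python
# Genotype scores as in the tables: AA = 1, Aa = 0, aa = -1
SCORE = {"AA": 1, "Aa": 0, "aa": -1}

def bw_from_sibs(sib_genotypes):
    """B = mean score of the sibship; W_i = S_i - B."""
    scores = [SCORE[g] for g in sib_genotypes]
    b = sum(scores) / len(scores)
    return b, [s - b for s in scores]

def bw_from_parents(pat, mat, offspring):
    """B = mean of the parental scores; W_i = S_i - B for each offspring."""
    b = (SCORE[pat] + SCORE[mat]) / 2
    return b, [SCORE[g] - b for g in offspring]

print(bw_from_sibs(["AA", "Aa"]))               # the AA/Aa row of the Fulker table
print(bw_from_parents("AA", "Aa", ["Aa"]))      # Aa from AA x Aa
```

For sib pairs, the first function reproduces the b, w1 and w2 columns of assoc.dat (for example, the first data row has g1 = −1, g2 = 0).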

assoc.mx

• Sibling pair sample

• B and W components precalculated in input file

• Single SNP genotype

• Quantitative trait

assoc.dat

s1      s2      g1   g2   b     w1     w2
-0.007  -0.972  -1   0    -0.5  -0.5   0.5
-0.829  -0.196  1    1    1     0      0
0.369   0.645   1    1    1     0      0
0.318   1.55    0    1    0.5   -0.5   0.5
1.52    0.910   0    0    0     0      0
-0.948  -1.55   1    1    1     0      0
0.596   -0.394  1    0    0.5   0.5    -0.5
-1.91   -0.905  0    1    0.5   -0.5   0.5
0.499   0.940   1    0    0.5   0.5    -0.5
-1.17   -1.29   1    0    0.5   0.5    -0.5
-0.16   -1.81   1    1    1     0      0

! Mx script for QTL association: sib pairs, univariate

Group 1 : Calc NG=2

Begin Matrices;
! ** Parameters
B Full 1 1 free ! association : between component
W Full 1 1 free ! association : within component
M Full 1 1 free ! mean
S Full 1 1 free ! Shared residual variance
N Full 1 1 free ! Nonshared residual variance
! ** Definition variables **
C Full 1 1 ! association : between
X Full 1 1 ! association : within, sib 1
Y Full 1 1 ! association : within, sib 2
End Matrices;

! ** Uncomment for B=W model
! Equate W 1 1 1 B 1 1 1

! Starting values
Matrix B 0
Matrix W 0
Matrix M 0
Matrix S 0.5
Matrix N 0.5

End

Group 2 : Data Group
Data NI=7 NO=0
RE file=assoc.dat
Labels Sib1 Sib2 g1 g2 b w1 w2
Select Sib1 Sib2 b w1 w2 /
Definition b w1 w2 /

Matrices = Group 1

Means M + B*C + W*X | M + B*C + W*Y /
Covariance S + N | S _ S | S + N /

Specify C b /
Specify X w1 /
Specify Y w2 /

End
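For readers without Mx, this is a rough Python transcription of what the script specifies for each sib pair: means M + B·b + W·w_i and covariance [[S + N, S], [S, S + N]], scored by the bivariate-normal -2 log-likelihood. It is a sketch only. It has not been checked against Mx's reported -2LL, and fitting the models would additionally need a numerical optimiser over M, B, W, S and N.

```python
import math

def pair_minus2ll(y1, y2, b_score, w1, w2, M, B, W, S, N):
    """-2 log-likelihood for one sib pair under the assoc.mx model.

    Means:      M + B*b + W*w_i   (b, w1, w2 are definition variables)
    Covariance: [[S + N, S], [S, S + N]]
    """
    mu1 = M + B * b_score + W * w1
    mu2 = M + B * b_score + W * w2
    var, cov = S + N, S
    det = var * var - cov * cov
    d1, d2 = y1 - mu1, y2 - mu2
    quad = (var * d1 * d1 - 2 * cov * d1 * d2 + var * d2 * d2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad

def total_minus2ll(params, data):
    """Sum over pairs; data rows follow assoc.dat: s1 s2 g1 g2 b w1 w2."""
    return sum(pair_minus2ll(s1, s2, b, w1, w2, **params)
               for s1, s2, _g1, _g2, b, w1, w2 in data)

# First three rows of assoc.dat
rows = [
    (-0.007, -0.972, -1, 0, -0.5, -0.5, 0.5),
    (-0.829, -0.196, 1, 1, 1, 0, 0),
    (0.369, 0.645, 1, 1, 1, 0, 0),
]
start = dict(M=0.0, B=0.0, W=0.0, S=0.5, N=0.5)  # the script's starting values
print(total_minus2ll(start, rows))
```

Fixing W = 0, setting W equal to B, or fixing both at 0 reproduces the four models compared below.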

Models

Model   B                 W                 Equate line
B & W   B Full 1 1 free   W Full 1 1 free   !Equate W 1 1 1 B 1 1 1
B = W   B Full 1 1 free   W Full 1 1 free   Equate W 1 1 1 B 1 1 1
B       B Full 1 1 free   W Full 1 1        !Equate W 1 1 1 B 1 1 1
B=W=0   B Full 1 1        W Full 1 1        !Equate W 1 1 1 B 1 1 1

Tests

Test                          HA      H0
Standard association test     B = W   B=W=0
Test of stratification        B & W   B = W
Robust association test       B & W   B

assoc.mx

Model   B        W        -2LL      df
B & W   -0.478   -0.365   2103.96   795
B = W   -0.420   -0.420   2105.05   796
B       -0.4778           2127.01   796
B=W=0                     2163.34   797

Test of total association: HA B = W (-2LL 2105.05) vs H0 B=W=0 (-2LL 2163.34)

Δ-2LL = 58.29, df = 1, p ≈ 2.3e-14


Test of stratification: HA B & W (-2LL 2103.96) vs H0 B = W (-2LL 2105.05)

Δ-2LL = 1.09, df = 1, p = 0.29


Test of within association: HA B & W (-2LL 2103.96) vs H0 B (-2LL 2127.01)

Δ-2LL = 23.06, df = 1, p ≈ 1.6e-6
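The three tests can be recomputed from the -2LL column of the assoc.mx results table as 1-df likelihood-ratio tests. Here is a short sketch using the identity P(χ²₁ > x) = erfc(√(x/2)). With the rounded -2LL values the stratification p-value comes out near 0.30; the slide's 0.29 presumably reflects unrounded values.

```python
import math

# -2LL values from the assoc.mx results table
m2ll = {"B & W": 2103.96, "B = W": 2105.05, "B": 2127.01, "B=W=0": 2163.34}

def lrt_1df(alt, null):
    """Delta -2LL (null minus alternative) and its 1-df chi-square p-value."""
    stat = m2ll[null] - m2ll[alt]
    return stat, math.erfc(math.sqrt(stat / 2))

results = {}
for label, alt, null in (("total association", "B = W", "B=W=0"),
                         ("stratification", "B & W", "B = W"),
                         ("within association", "B & W", "B")):
    results[label] = lrt_1df(alt, null)
    stat, p = results[label]
    print(f"{label}: delta -2LL = {stat:.2f}, p = {p:.2g}")
```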

Implementation

• QTDT
  – Abecasis et al (2001) AJHG
  – extends the between/within model to:
    – general pedigrees
    – multiple alleles
    – covariates
    – a combined test of linkage and association
    – discrete as well as quantitative traits

Linkage / Association

Linkage
• families
• detectable over large distances (>10 cM)
• large effects (OR > 3, variance > 10%)

Association
• unrelateds or families
• detectable over small distances (<1 cM)
• small effects (OR < 2, variance < 1%)