Thoughts about the TDT. Contribution of TDT: Finding Genes for 3 Complex Diseases PPAR-gamma in Type...

Post on 21-Dec-2015


Thoughts about the TDT

Contribution of TDT: Finding Genes for 3 Complex Diseases

• PPAR-gamma in Type 2 diabetes

Altshuler et al. Nat Genet 26:76-80, 2000

• NOD2 in Crohn’s Disease

Hugot et al., Nature 411: 599-603, 2001

• ADAM33 in asthma

Van Eerdewegh et al., Nature 418: 426-430, 2002

The common PPAR-gamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes

Altshuler et al. Nat Genet 26:76-80, 2000


NOD2 Variants and Susceptibility to Crohn’s Disease

Hugot et al., Nature 411: 599-603, 2001

SNP13: p = 6 × 10⁻⁶ (chromosome 16q)

Van Eerdewegh et al., Nature 418: 426-430, 2002

ADAM33 Gene: Asthma and Bronchial Hyperresponsiveness

P = 3 × 10⁻⁶ to 0.04 (chromosome 20p)

Supplementary Information, Table 2. Transmission disequilibrium test (TDT) for 5 SNPs in ADAM33

Asthma

SNP / SNP combination    Over-transmitted allele / haplotype    T     NT    p-value
S1                       G                                      37    20    0.033
T1                       T                                      43    27    0.072
V-1                      C                                      43    27    0.072
V1                       A                                      7     7     1.00
V4                       C                                      73    55    0.13
S1/T1                    GT                                     72    38    0.0029
T1/V-1                   TC                                     80    46    0.0043
T1/V4                    TC                                     97    60    0.0070
S1/T1/V-1                GTC                                    77    41    0.0029
S1/T1/V1                 GTA                                    75    41    0.0047
S1/T1/V4                 GTC                                    96    60    0.0084
T1/V-1/V1                TCA                                    76    45    0.015
T1/V-1/V4                TCC                                    97    59    0.0046
T1/V1/V4                 TAC                                    98    58    0.0031
S1/T1/V-1/V1             GTCA                                   74    41    0.0068
S1/T1/V-1/V4             GTCC                                   96    58    0.0034
S1/T1/V1/V4              GTAC                                   97    58    0.0078
T1/V-1/V1/V4             TCAC                                   96    59    0.0063
S1/T1/V-1/V1/V4          GTCAC                                  95    58    0.0048

SNP Haplotypes

Population distributions of (a) disease given genotype, and (b) genotype given disease.

(a)          Affected              (b)          Affected
Genotype     Yes    No             Genotype     Yes    No
M1M1         a      1 – a          M1M1         d      g
M1M2         b      1 – b          M1M2         e      h
M2M2         c      1 – c          M2M2         f      i

Clayton

RR(M1M1) = a / c,  RR(M1M2) = b / c

Odds ratio: (d / g) / (f / i) for M1M1, (e / h) / (f / i) for M1M2

He calls this the relative risk. Confusing!

Ott

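[Diagram: disease locus D (alleles D1, D2) and marker locus M (alleles M1, M2), separated by recombination fraction θ]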

Null hypothesis: θ = ½

(Disease and marker loci unlinked)

Alternative hypothesis: θ < ½

(Disease and marker loci linked)

freq (D1 M1) ≠ freq (D1) × freq (M1)

δ = freq (D1 M1) – freq (D1) × freq (M1)

• We assume that we observe the marker locus genotypes, either M1M1, M1M2, or M2M2, of both parents and the affected sibs in all families in the data.

Probabilities for transmitted and non-transmitted marker alleles M1 and M2 from any parent of an affected child.

                       Non-transmitted allele
Transmitted allele     M1       M2       Total
M1                     P(11)    P(12)    P(1.)
M2                     P(21)    P(22)    P(2.)
Total                  P(.1)    P(.2)    1

P(11) = q² + q δ / p

P(12) = q (1 – q) + (1 – θ – q) δ / p

P(21) = q (1 – q) + (θ – q) δ / p

P(22) = (1 – q)² – (1 – q) δ / p
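The four cell probabilities can be checked numerically. The sketch below is illustrative only: it assumes p is the frequency of D1 and q the frequency of M1 (the meanings given later for the subpopulation case), and the parameter values are made up. It confirms that the cells sum to 1 and that P(12) = P(21) when θ = ½.

```python
def transmission_probs(p, q, delta, theta):
    """Cell probabilities P(11), P(12), P(21), P(22) from the formulas above.

    Assumed meanings: p = freq(D1), q = freq(M1),
    delta = disequilibrium coefficient, theta = recombination fraction.
    """
    p11 = q**2 + q * delta / p
    p12 = q * (1 - q) + (1 - theta - q) * delta / p
    p21 = q * (1 - q) + (theta - q) * delta / p
    p22 = (1 - q)**2 - (1 - q) * delta / p
    return p11, p12, p21, p22


# Hypothetical parameter values, chosen only for illustration.
linked = transmission_probs(p=0.2, q=0.3, delta=0.05, theta=0.1)
unlinked = transmission_probs(p=0.2, q=0.3, delta=0.05, theta=0.5)

print("sum of cells:", sum(linked))
print("P(12) - P(21), theta = 0.1:", linked[1] - linked[2])
print("P(12) - P(21), theta = 0.5:", unlinked[1] - unlinked[2])
```

The difference P(12) – P(21) equals (1 – 2θ) δ / p, so it vanishes when θ = ½ or when δ = 0.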

Numbers of transmitted and non-transmitted marker alleles M1 and M2 among the parents of the affected sibs

                       Non-transmitted allele
Transmitted allele     M1       M2
M1                     n11      n12
M2                     n21      n22

Put n12 + n21 = n

Only P(12) and P(21) depend on θ .

Also, when θ = ½, P(12)=P(21)

So the “natural” (TDT) test statistic is

χ² = (n12 – n21)² / n

This (the McNemar statistic) has an asymptotic χ² distribution with 1 df when the null hypothesis is true.
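As a minimal sketch, the TDT (McNemar) statistic and its asymptotic 1 df p-value can be computed as follows. The counts are hypothetical, and the function name is an illustration rather than part of any library. The p-value uses the identity that a 1 df χ² upper tail equals erfc(√(x/2)).

```python
import math


def tdt_statistic(n12, n21):
    """Return (chi-square, asymptotic p-value) for the TDT.

    n12: transmissions of M1 (M2 not transmitted) from M1M2 parents
    n21: transmissions of M2 (M1 not transmitted) from M1M2 parents
    """
    n = n12 + n21
    chi2 = (n12 - n21) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = tdt_statistic(60, 40)
print(chi2, p_value)
```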

• Note that this statistic depends only on n12 and n21, and ignores n11 and n22.

• This makes sense: the statistic uses data only from M1M2 parents, and only these are informative for linkage.

• We call these “informative” parents.

• So at the end of the day we consider only transmissions from informative parents.

• We will focus entirely on the denominator, n, of the TDT statistic.

• It is remarkable how many questions one can ask about this.

• But before we ask these, we first ask, where does this denominator come from?

• Assuming the null hypothesis is true, n12 has a binomial (n, ½) distribution.

• Note: this is true even if the data contain several affected children from the same family.

• Thus n12 has variance n/4, and the variance of n12 – n21 (= 2n12 – n) is 4 × n/4 = n.

• We will examine three situations, all focusing on the question: “Is n the correct (variance) denominator for the situation at hand?”.
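The claim that n12 – n21 has variance n when n12 is binomial (n, ½) can be verified exactly from the binomial probabilities. This is a small check under that binomial assumption only; the values of n are arbitrary.

```python
from math import comb


def variance_of_difference(n):
    """Exact variance of n12 - n21 = 2*n12 - n when n12 ~ Binomial(n, 1/2)."""
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    mean = sum((2 * k - n) * w for k, w in enumerate(pmf))
    return sum((2 * k - n - mean) ** 2 * w for k, w in enumerate(pmf))


for n in (10, 57, 200):
    print(n, variance_of_difference(n))
```

The check says nothing about whether the binomial assumption holds in a given design, which is the question the three situations address.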

• Situation 1. Testing for association.

• Here the null hypothesis is “no association”, or δ = 0.

The problem here is that transmissions to different affected sibs in the same family are not independent under this null hypothesis. Thus when there are several families in the data with more than one affected sib, n12 does not have a binomial distribution.

If H0, δ = 0, is true, the cell probabilities for the simple random-mating case are

P(11) = q², P(12) = q(1 – q),

P(21) = q(1 – q), P(22) = (1 – q)²

(Thus should we not be testing this H0 by using both n11 n22 – n12 n21 and n12 – n21 and a 2 degrees of freedom test?)

Let’s ignore this point for now.

P(11) = (Σi αi (pi² qi² + δi pi qi)) / (Σi αi pi²)

P(12) = (Σi αi (pi² qi (1 – qi) + δi pi (1 – θ – qi))) / (Σi αi pi²)

P(21) = (Σi αi (pi² qi (1 – qi) + δi pi (θ – qi))) / (Σi αi pi²)

P(22) = (Σi αi (pi² (1 – qi)² – δi pi (1 – qi))) / (Σi αi pi²)

αi = relative size of subpopulation i
δi = linkage disequilibrium in subpopulation i
pi = frequency of D1 in subpopulation i
qi = frequency of M1 in subpopulation i
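The subpopulation formulas can be evaluated directly. The sketch below uses made-up values for two subpopulations; the function name and all numbers are assumptions for illustration. It checks that the cells sum to 1, and that P(12) = P(21) both when θ = ½ (whatever the δi) and when every δi = 0 (whatever θ).

```python
def stratified_probs(alpha, p, q, delta, theta):
    """Cell probabilities for a population made of subpopulations.

    alpha, p, q, delta are equal-length lists, one entry per subpopulation
    (relative size, freq of D1, freq of M1, disequilibrium coefficient).
    """
    denom = sum(a * pi**2 for a, pi in zip(alpha, p))

    def cell(term):
        rows = zip(alpha, p, q, delta)
        return sum(a * term(pi, qi, di) for a, pi, qi, di in rows) / denom

    p11 = cell(lambda pi, qi, di: pi**2 * qi**2 + di * pi * qi)
    p12 = cell(lambda pi, qi, di: pi**2 * qi * (1 - qi) + di * pi * (1 - theta - qi))
    p21 = cell(lambda pi, qi, di: pi**2 * qi * (1 - qi) + di * pi * (theta - qi))
    p22 = cell(lambda pi, qi, di: pi**2 * (1 - qi)**2 - di * pi * (1 - qi))
    return p11, p12, p21, p22


# Hypothetical two-subpopulation example.
alpha, p, q = [0.6, 0.4], [0.1, 0.3], [0.2, 0.6]

no_linkage = stratified_probs(alpha, p, q, delta=[0.01, 0.02], theta=0.5)
no_association = stratified_probs(alpha, p, q, delta=[0.0, 0.0], theta=0.1)
both = stratified_probs(alpha, p, q, delta=[0.01, 0.02], theta=0.1)

print(sum(both))
print(no_linkage[1] - no_linkage[2])
print(no_association[1] - no_association[2])
print(both[1] - both[2])
```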

Suppose that in family j, M1 is transmitted n12j times, M2 is transmitted n21j times, from M1M2 parents.

Define Dj as n12j – n21j

The test statistic is

T = Σj Dj / √(Σj Dj²)

where the numerator Σj Dj is n12 – n21.

Suppose that there is only one affected child in each family.

Then Dj = ±1 (for all j)

T = z = (n12 – n21) / √(n12 + n21)

Equivalently, T² = TDT χ²
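A short sketch of the family-based statistic T = Σj Dj / √(Σj Dj²), using hypothetical data. It shows that T² coincides with the TDT χ² when every Dj is ±1, and that it differs when the same transmissions are clustered two per family. The function name and the counts are illustrative assumptions.

```python
import math


def family_statistic(counts):
    """counts: list of (n12_j, n21_j) pairs, one per family.

    Returns T = sum(D_j) / sqrt(sum(D_j^2)) with D_j = n12_j - n21_j.
    """
    d = [n12_j - n21_j for n12_j, n21_j in counts]
    return sum(d) / math.sqrt(sum(x * x for x in d))


# One informative transmission per family: every D_j is +1 or -1.
single = [(1, 0)] * 60 + [(0, 1)] * 40
# Same totals (n12 = 60, n21 = 40), but two transmissions per family.
paired = [(2, 0)] * 30 + [(0, 2)] * 20

tdt = (60 - 40) ** 2 / (60 + 40)
print(family_statistic(single) ** 2, tdt)
print(family_statistic(paired) ** 2)
```

In the clustered data the denominator Σj Dj² is 200 rather than n = 100, so the statistic is halved.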

• Situation 2. Suppose we have families in the data where both parents are dead (so we do not know their marker locus genotypes), but where there are two affected sibs, one being M1M1, the other M2M2.

• We therefore can infer that both parents were informative.

• Should we use the data from these families in the analysis, using the standard TDT statistic?

• The answer is “no”. Why is this so?

• Because the very fact that we can infer the parental genotypes unambiguously means that one sib MUST be M1M1 and the other MUST be M2M2.

• In such families there is zero variance, rather than some binomial variance, for the number of M1 genes in the two sibs.

• Philosophical question: is there any difference between the actions you take in directly observing an event and having unambiguous evidence that the event occurred?

• In this case, “yes there is”.

Situation 3. Suppose that we have two affected sibs, one informative (i.e. M1M2) parent, in each family in the data.

Numbers of transmissions from the informative parents

                  2 M1    1 M1, 1 M2    2 M2    Total
# families        i       j             k       n
H0 means          n/4     n/2           n/4     n

TDT: χ²_TDT = 2(i – k)² / n

Sharing: χ²_SH = (i + k – j)² / n = (2(i + k) – n)² / n

χ²_TOT = (i – n/4)² / (n/4) + (j – n/2)² / (n/2) + (k – n/4)² / (n/4)

Sum is total: χ²_TDT + χ²_SH = χ²_TOT
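The statement that the TDT and sharing statistics sum to the total can be checked numerically. The sketch assumes these forms for two affected sibs and one informative parent per family: χ²_TDT = 2(i – k)²/n (from 2i + j transmissions of M1 against 2k + j of M2), χ²_SH = (i + k – j)²/n, and the total as the usual goodness-of-fit χ² against n/4, n/2, n/4. The counts are hypothetical.

```python
def chi2_tdt(i, j, k):
    """TDT statistic: 2i + j transmissions of M1 versus 2k + j of M2."""
    n = i + j + k
    return 2 * (i - k) ** 2 / n


def chi2_sharing(i, j, k):
    """Sharing statistic: i + k families share an allele, j do not."""
    n = i + j + k
    return (i + k - j) ** 2 / n


def chi2_total(i, j, k):
    """Goodness of fit of (i, j, k) to expectations (n/4, n/2, n/4)."""
    n = i + j + k
    expected = (n / 4, n / 2, n / 4)
    return sum((o - e) ** 2 / e for o, e in zip((i, j, k), expected))


# Hypothetical family counts.
for counts in [(30, 40, 10), (35, 30, 15), (22, 51, 27)]:
    parts = chi2_tdt(*counts) + chi2_sharing(*counts)
    print(counts, parts, chi2_total(*counts))
```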

[Equations garbled in the transcript: definition of a statistic χ²_W in terms of i, j, k, n and χ²_TDT, with cases W = 0, W = ½ and W = 1, and its relation to χ²_TDT and χ²_SH.]

Suppose that a sharing χ² has been carried out, correctly, as a one-sided test.

Given i + k = s, what is the distribution of χ²_TDT?

Under H0, given i + k = s, i ~ Bin(s, ½), so

E(i – k) = 0, Var(i – k) = s

Original: χ²_TDT = 2(i – k)² / n

Correct: χ² = (i – k)² / s

Correction factor = n / (2s), which is < 1 if s > n/2
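A quick exact calculation illustrates why a correction is needed. It assumes that, given i + k = s, i is Bin(s, ½) under H0 and that the uncorrected statistic is χ²_TDT = 2(i – k)²/n. A 1 df χ² should have mean 1, but the conditional mean of χ²_TDT is 2s/n; multiplying by n/(2s) restores a mean of 1. The values of n and s are arbitrary.

```python
from math import comb


def expected_tdt_given_sharing(n, s):
    """E[2(i - k)^2 / n] when i + k = s and i ~ Binomial(s, 1/2)."""
    total = 0.0
    for i in range(s + 1):
        k = s - i
        total += comb(s, i) / 2**s * 2 * (i - k) ** 2 / n
    return total


n = 100
for s in (50, 60, 80):
    mean = expected_tdt_given_sharing(n, s)
    corrected = mean * n / (2 * s)
    print(s, mean, corrected)
```

When s > n/2 the uncorrected mean exceeds 1, so the original statistic is too large on average.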

One affected sib, two informative (M1M2) parents

                             Genotype of affected child
                             M1M1    M1M2    M2M2    Total
# families                   p       q       r       n
Expected when H0 (θ = ½)     n/4     n/2     n/4     n

χ²_TDT = 2(p – r)² / n

χ²_WL = (p – r)² / (p + r)

χ²_HHRR = 2n (n12 – n21)² / [(2n11 + n12 + n21)(n12 + n21 + 2n22)]

(here n = n11 + n12 + n21 + n22) makes various assumptions.

Subpopulation                            1     2     ……   i     ……   k
Relative size                            α1    α2    ……   αi    ……   αk
Coefficient of gametic disequilibrium    δ1    δ2    ……   δi    ……   δk

Generation 0

Parents of generation 1 mate only within their subpopulation

Generation 1: gametic disequilibrium Δ1

Parents of generation 2 mate at random throughout population

Generation 2: gametic disequilibrium Δ2 = Δ1 (1 – θ)

Parents of generation 3 mate at random throughout population

Generation 3: gametic disequilibrium Δ3 = Δ2 (1 – θ)

Generation 0

Generation 1: gametic disequilibrium Δ1

Generation 2: gametic disequilibrium Δ2

Generation 3: gametic disequilibrium Δ3

Generation 4, etc.
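The recursion Δ2 = Δ1 (1 – θ), Δ3 = Δ2 (1 – θ) describes geometric decay of gametic disequilibrium under random mating. A minimal sketch, with a made-up starting value Δ1 and recombination fraction θ (it does not attempt to reproduce any TDT values reported for the admixture models):

```python
def disequilibrium_by_generation(delta1, theta, generations):
    """Return [Δ1, Δ2, ...] using Δ(t+1) = Δ(t) * (1 - theta)."""
    values = [delta1]
    for _ in range(generations - 1):
        values.append(values[-1] * (1 - theta))
    return values


# Hypothetical values: Δ1 = 0.04, θ = 0.1, four generations.
decay = disequilibrium_by_generation(0.04, 0.1, 4)
print(decay)
```

With tight linkage (small θ) the disequilibrium created by admixture persists for many generations; with θ = ½ it halves each generation.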

The value of the TDT statistic in two models

1. Immediate admixture

Generation 1    1.48
Generation 2    2.07
Generation 3    15.34
Generation 4    12.43

2. Gradual admixture

Generation 1    1.48
Generation 2    2.07
Generation 3    8.53
Generation 4    6.99