Reconstructing Parental Genotypes When Testing for Linkage in the Presence of Association

Theoretical Population Biology 60, 141–148 (2001)
doi:10.1006/tpbi.2001.1540, available online at http://www.idealibrary.com
0040-5809/01 $35.00 © 2001 Elsevier Science. All rights reserved.

Michael Knapp
Institute for Medical Biometry, Informatics, and Epidemiology, University of Bonn, D-53105 Bonn, Germany

Received March 29, 2001

Various family-based association methods have recently been proposed that allow testing for linkage in the presence of linkage disequilibrium between a marker and a disease even if there is only incomplete parental-genotype information. For some families, it may be possible to reconstruct missing parental genotypes from the genotypes of their offspring. Treating such a reconstructed family as if parental genotypes have been typed, however, can introduce bias. The reconstruction-combined transmission/disequilibrium test (RC-TDT) and its X-chromosomal counterpart, XRC-TDT, employ parental-genotype reconstruction and correct for the biases involved in this reconstruction without relying on population marker allele frequencies. For the two tests, exact P values can be obtained by numerically calculating the convolution of the null distributions corresponding to the families in the sample. © 2001 Elsevier Science

INTRODUCTION

The identification of genes which are involved in the aetiology of oligogenic or multifactorial diseases is a major challenge in human genetics. Genome-wide linkage analysis based on the evaluation of allele-sharing within families allowed us to locate genes responsible for a large number of simple Mendelian diseases during the past 15 years. The search for genes predisposing to multifactorial diseases, however, is complicated by phenomena such as locus heterogeneity and incomplete penetrance. In addition, each individual genetic variant may make only a modest contribution to disease risk.
Even if linkage between the disease and a genomic region has been identified reproducibly, these regions generally consist of several megabases. Therefore, methods are required to achieve mapping on a finer scale and to search for genetic variants of small effect. As shown by Risch and Merikangas (1996) and Risch (2000), genetic association studies which assess correlations between genetic variants and the occurrence of a disease on the population level can be much more powerful to detect multifactorial genetic effects than linkage analysis. The most widely applied design of association studies is the case–control design. A potential pitfall of case–control association studies is that an existing association between the disease and an allele at the marker locus can be due to population stratification. If there is an admixture of subpopulations, and if both the marker allele frequencies and the disease prevalence vary over these subpopulations, a case–control association study will show evidence for association. The cause of this association, however, is nongenetic and is not due to the influence of the marker allele under investigation or some allele at a linked locus nearby.

For this reason, alternative designs for association studies have been developed. Various approaches have been proposed that use controls selected from the families of affected probands. Historically, the first representative of these approaches is the haplotype relative risk (HRR) method proposed by Rubinstein et al. (1981) and detailed by Falk and Rubinstein (1987). The HRR method is based on a sample of nuclear families with a single affected child. The internal control group is formed by those parental marker alleles which have not been transmitted to that child. By using this control group, which is by definition ethnically matched to the case group, the stratification artifact is eliminated.

At present, the most popular method of family-based association analysis is the transmission/disequilibrium test (TDT), introduced by Spielman et al. (1993). The TDT is a simple and powerful method to detect linkage between a marker and a disease susceptibility locus in the presence of linkage disequilibrium between the two loci. For the TDT, an allele transmitted by a parent to an affected child is matched to the other allele not transmitted from the same parent, and McNemar's χ² test for matched pairs with dichotomous outcome (McNemar, 1947) is then applied. As a test of linkage in the presence of association, transmissions from parents to more than one affected child can be included in TDT analysis. If the sample consists of nuclear families with a single affected child, the TDT is also a valid test of association in the presence of linkage. However, the TDT is not a valid test of association when applied to families with more than one affected child. Extensions of the TDT have been developed (Martin et al., 1997) that can use samples of families with multiple affected children for testing the null hypothesis of no association or no linkage.

The TDT requires data from families in which marker genotypes are available for both parents. The availability of parental marker genotypes can limit the applicability of the TDT, especially when the disease of interest has a late age of onset. For this reason, Spielman and Ewens (1998) invented a method called "sib TDT" (S-TDT), which compares marker genotypes in affected and unaffected offspring. Generally, the S-TDT is a test of linkage.
However, the S-TDT is also a valid test of association, provided that the data consist entirely of sibships with exactly one affected and one unaffected sib. Spielman and Ewens (1998) also described a method for combining data from families in which parental genotypes are available with data from families in which genotypes of unaffected sibs are available but genotypes of parents are not. This procedure has been named "combined TDT" (C-TDT) by Knapp (1999a). With an increasing degree of marker polymorphism and an increasing size of the sibship, there is an increasing probability that missing parental genotypes can be uniquely determined from the genotypes of the children. For example, parental-genotype reconstruction is possible for each of the three families presented in Table 1 of Spielman and Ewens (1998). In the context of the C-TDT, it would be tempting to treat such reconstructed families as if parental genotypes had been typed. Curtis (1997), however, has shown that such a procedure can introduce a bias, and he claims that correcting this bias would require knowledge of population marker-allele frequencies. But, as has been argued by Knapp (1999a), the bias involved in the reconstruction of parental marker genotypes can be avoided without relying on population marker-allele frequencies. The central idea of the reconstruction-combined TDT (RC-TDT) proposed by Knapp (1999a) is the calculation of the null distribution of the number of transmissions of a certain allele, conditional on the event that missing parental genotypes can be reconstructed. Horvath et al. (2000) described a procedure for X-chromosomal markers which follows the logic of the RC-TDT. In addition, the present paper describes a variant of this procedure for diseases which can occur only in males.

C-TDT AND RC-TDT FOR AUTOSOMAL MARKERS

Notation

It will be assumed that there is a specific allele (denoted A) at the marker locus that is of particular interest. The sample consists of m nuclear families (parents and children). For $1 \le i \le m$, $n_{ai}$ denotes the number of affected children, $n_{ui}$ denotes the number of unaffected children, and $n_{ci} := n_{ai} + n_{ui}$ denotes the size of the sibship for family i. Let $N^{g}_{ai}$ ($N^{g}_{ui}$) be random variables denoting the number of affected (or unaffected) children with genotype g in family i. Small letters (i.e., $n^{g}_{ai}$ and $n^{g}_{ui}$) are used to denote the observed values of $N^{g}_{ai}$ and $N^{g}_{ui}$. Further, let $N^{g}_{i} := N^{g}_{ai} + N^{g}_{ui}$ and $n^{g}_{i} := n^{g}_{ai} + n^{g}_{ui}$ denote the random variable and the observed number of children with genotype g in family i, respectively.

C-TDT

The transmission/disequilibrium test, introduced by Spielman et al. (1993), requires that (i) marker genotypes are known for both parents and (ii) at least one parent is heterozygous for allele A. For each family which satisfies these conditions, the TDT counts the number of transmissions of allele A from heterozygous parents to their affected children. If $\mathrm{hom}_i$ denotes the number of parents in family i with genotype AA, then this count is given by

$$T_i^{\text{TDT}} := 2N^{AA}_{ai} + N^{AB}_{ai} - \mathrm{hom}_i \cdot n_{ai}. \tag{1}$$

When marker and disease are unlinked (i.e., under $H_0$), $T_i^{\text{TDT}}$ has a binomial distribution $B(\mathrm{het}_i \cdot n_{ai}, 1/2)$, with $\mathrm{het}_i$ denoting the number of heterozygous parents in the family. Obviously,

$$e_i^{\text{TDT}} := E_{H_0}(T_i^{\text{TDT}}) = \mathrm{het}_i \cdot n_{ai}/2, \tag{2}$$

$$v_i^{\text{TDT}} := \mathrm{Var}_{H_0}(T_i^{\text{TDT}}) = \mathrm{het}_i \cdot n_{ai}/4. \tag{3}$$
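As a minimal sketch of (1)–(3), the helper below computes the TDT count and its null moments for one family in which both parents are typed. The function name `tdt_family` and its argument layout are illustrative assumptions, not part of the method as published; alleles other than A are assumed to have been recoded as B.

```python
def tdt_family(n_aa_aff: int, n_ab_aff: int, n_aff: int, parents: tuple) -> tuple:
    """Return (T_i, e_i, v_i) from equations (1)-(3) for one family.

    n_aa_aff, n_ab_aff: affected children with genotype AA and AB (non-A
    alleles recoded as B); n_aff: number of affected children; parents: the
    two typed parental genotypes, e.g. ("AB", "AA").
    """
    hom = sum(1 for g in parents if g == "AA")           # AA parents always transmit A
    het = sum(1 for g in parents if g.count("A") == 1)   # parents heterozygous for A
    t = 2 * n_aa_aff + n_ab_aff - hom * n_aff            # equation (1)
    e = het * n_aff / 2                                  # equation (2)
    v = het * n_aff / 4                                  # equation (3)
    return t, e, v


# One affected AA child of an AB x AB mating: both parents transmitted A.
print(tdt_family(1, 0, 1, ("AB", "AB")))  # (2, 1.0, 0.5)
```

Subtracting $\mathrm{hom}_i \cdot n_{ai}$ removes the A alleles that AA parents pass on with certainty, so only transmissions from heterozygous parents are counted.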

The sib TDT, proposed by Spielman and Ewens (1998), does not require parental marker genotypes, but instead uses marker genotypes of unaffected siblings. The requirements for a family to be suitable for the S-TDT are that (i) there is at least one affected and one unaffected child and (ii) there are at least two different genotypes in the offspring, even after all marker alleles other than A are grouped as B. The S-TDT then counts the number

$$T_i^{\text{S-TDT}} := 2N^{AA}_{ai} + N^{AB}_{ai} \tag{4}$$

of alleles A in affected children. The distribution of $T_i^{\text{S-TDT}}$ under the null hypothesis of no linkage, conditional on the observed distribution $(n^{AA}_i, n^{AB}_i, n^{BB}_i)$ of marker genotypes in the whole sibship and on the numbers $(n_{ai}, n_{ui})$ of affected and unaffected children in the sibship, can be derived from the hypergeometric distribution. In particular, as given by Spielman and Ewens (1998),

$$e_i^{\text{S-TDT}} := E_{H_0}\bigl(T_i^{\text{S-TDT}} \mid (n^{AA}_i, n^{AB}_i, n^{BB}_i, n_{ai}, n_{ui})\bigr) = (2n^{AA}_i + n^{AB}_i) \cdot \frac{n_{ai}}{n_{ci}} \tag{5}$$

and

$$v_i^{\text{S-TDT}} := \mathrm{Var}_{H_0}\bigl(T_i^{\text{S-TDT}} \mid (n^{AA}_i, n^{AB}_i, n^{BB}_i, n_{ai}, n_{ui})\bigr) = \frac{n_{ai} \cdot n_{ui} \cdot \bigl(4n^{AA}_i \cdot (n_{ci} - n^{AA}_i - n^{AB}_i) + n^{AB}_i \cdot (n_{ci} - n^{AB}_i)\bigr)}{n_{ci}^2 \cdot (n_{ci} - 1)}. \tag{6}$$
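Equations (5) and (6) can be checked numerically. Under $H_0$, given the sibship genotypes, every choice of which $n_{ai}$ sibs are affected is equally likely, so the same moments also follow from brute-force enumeration. The sketch below assumes this exchangeability argument; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def stdt_moments(n_aa, n_ab, n_bb, n_aff):
    """Null mean and variance of T_i^{S-TDT} from equations (5) and (6),
    in exact rational arithmetic. Requires a sibship of at least two."""
    n_c = n_aa + n_ab + n_bb          # sibship size n_ci
    n_u = n_c - n_aff                 # unaffected sibs n_ui
    mean = Fraction((2 * n_aa + n_ab) * n_aff, n_c)
    var = Fraction(
        n_aff * n_u * (4 * n_aa * (n_c - n_aa - n_ab) + n_ab * (n_c - n_ab)),
        n_c ** 2 * (n_c - 1),
    )
    return mean, var


def stdt_moments_by_enumeration(n_aa, n_ab, n_bb, n_aff):
    """Same moments by brute force over all equally likely affected subsets."""
    a_counts = [2] * n_aa + [1] * n_ab + [0] * n_bb   # alleles A carried by each sib
    totals = [sum(s) for s in combinations(a_counts, n_aff)]
    mean = Fraction(sum(totals), len(totals))
    var = Fraction(sum(t * t for t in totals), len(totals)) - mean ** 2
    return mean, var


# The family n_ai = 1, n_ui = 2 with one sib each of AA, AB, BB, discussed later in the text.
print(stdt_moments(1, 1, 1, 1))  # (Fraction(1, 1), Fraction(2, 3))
```

Exact fractions are used so that the closed form and the enumeration can be compared for equality rather than within a floating-point tolerance.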

If a family meets the requirements for both the TDT and the S-TDT, Spielman and Ewens (1998) expected the TDT to be at least as powerful as the S-TDT. Therefore, the authors proposed to ignore the unaffected offspring in such families and to analyze these families with the TDT.

The test statistic of the C-TDT is given by

$$\frac{\sum_{i=1}^{I} (T_i - e_i)}{\sqrt{\sum_{i=1}^{I} v_i}}, \tag{7}$$

where the summation is over all families in the sample in which either the conditions for the TDT or the conditions for the S-TDT are satisfied. $T_i$, $e_i$, and $v_i$ denote the appropriate count of allele A, null expectation, and variance for the ith family (i.e., either (1)–(3) for the TDT or (4)–(6) for the S-TDT).

Approximate P values corresponding to an observed value for the test statistic (7) can be obtained by noting that the distribution of (7) is approximately the standard normal distribution under the null hypothesis of no linkage. There has been some debate (Laird et al., 1998; Ewens and Spielman, 1998) about the feasibility of calculating exact P values for the S-TDT and C-TDT. It is well known (e.g., Elston, 1998) that P values obtained on the basis of theoretical large-sample approximations can be quite unreliable. Thus, the superiority of exact P values over asymptotic P values is evident. As shown by Knapp (1999b), the numerical calculation of exact P values for the S-TDT and C-TDT is straightforward. As noted above, for a single family, the null distribution of (1) is a binomial distribution, and the null distribution of (4) can be obtained from the hypergeometric distribution. Both of these distributions are concentrated on, at most, $2n_{ai}+1$ different values. The null distribution of $\sum_{i=1}^{I} T_i$ is the convolution of the null distributions corresponding to the different families in the sample. The numerical calculation of the convolution of such distributions, each concentrated on a small part of the natural numbers, is quite feasible, at least for sample sizes typically occurring in practice.
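Both routes to a P value described above can be sketched in a few lines: the normal approximation for (7), and the exact P value obtained by convolving the per-family null distributions of (1) and (4). The sketch assumes a one-sided alternative (excess transmission of A), which the text does not specify, and uses exact rational arithmetic for the convolution. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, erfc, sqrt


def c_tdt_statistic(families):
    """Statistic (7) from per-family triples (T_i, e_i, v_i)."""
    families = list(families)
    numerator = sum(t - e for t, e, _ in families)
    return numerator / sqrt(sum(v for _, _, v in families))


def upper_normal_p(z):
    """One-sided normal-approximation P value, P(Z >= z)."""
    return 0.5 * erfc(z / sqrt(2))


def tdt_null_pmf(het, n_aff):
    """Null distribution of (1): Binomial(het * n_aff, 1/2)."""
    n = het * n_aff
    return {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}


def stdt_null_pmf(n_aa, n_ab, n_bb, n_aff):
    """Null distribution of (4) given the sibship genotypes: every set of
    n_aff sibs is equally likely to be the affected one."""
    a_counts = [2] * n_aa + [1] * n_ab + [0] * n_bb
    subsets = list(combinations(a_counts, n_aff))
    pmf = {}
    for subset in subsets:
        t = sum(subset)
        pmf[t] = pmf.get(t, Fraction(0)) + Fraction(1, len(subsets))
    return pmf


def convolve(p, q):
    """Distribution of the sum of two independent integer-valued counts."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out


def exact_upper_p(pmfs, observed_total):
    """Exact P(sum of T_i >= observed_total) under H0, by repeated convolution."""
    total = {0: Fraction(1)}
    for pmf in pmfs:
        total = convolve(total, pmf)
    return sum(p for t, p in total.items() if t >= observed_total)


# One AB x AB family with one affected child (TDT) plus one S-TDT family.
pmfs = [tdt_null_pmf(2, 1), stdt_null_pmf(1, 1, 1, 1)]
print(exact_upper_p(pmfs, 3))  # 1/3
```

As a check on (7): in a sample of families with a single affected child, each heterozygous parent contributes (1, 1/2, 1/4) if it transmits A and (0, 1/2, 1/4) otherwise, and splitting a family into its heterozygous parents leaves the sums in (7) unchanged. With b transmissions and c nontransmissions, (7) reduces to $(b - c)/\sqrt{b + c}$, the signed square root of McNemar's statistic $(b - c)^2/(b + c)$.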

RC-TDT

For some families without parental-genotype information, it may be possible to reconstruct parental genotypes from the genotypes of their offspring. Treating these reconstructed families as if parental genotypes had been typed, however, can introduce a bias, as has been shown by Curtis (1997). A simple example of such a bias is provided by considering a family with one affected and one unaffected child and by assuming that the missing genotypes of both parents are actually AB. Then, the parental genotypes can be reconstructed only if one child has genotype AA and the other has genotype BB. Under the null hypothesis of no linkage between the marker and the disease, in one-half of such families the affected sib will have genotype AA, whereas in the other half the affected sib will have BB. Therefore, the null expectation of the number of alleles A in the affected sib is 1, which is identical to the null expectation of the number of alleles A in an affected offspring of an AB × AB mating with typed parents. But the conditional null variance of the number of alleles A in an affected child from such a mating, given that the missing parental genotypes can be reconstructed, is also 1. This is twice the null variance, 1/2, of the number of alleles A in an affected offspring of an AB × AB mating with typed parents. Because of this increased variance, treating such a reconstructed family as if parental genotypes had been typed will inflate the type I error rate of the C-TDT.

The example presented in the preceding paragraph illustrates that the null distribution of the number of alleles A transmitted by heterozygous parents to their affected children in families with reconstructed parental marker genotypes generally will be different from the corresponding distribution for completely typed families. The central idea of the reconstruction-combined TDT proposed by Knapp (1999a) is the systematic calculation of this conditional distribution. For this purpose, it is sufficient to consider a marker locus with four alleles A, B, C, and D, because there are at most four different alleles segregating in a single family and because families without allele A are uninformative in the present context. The alleles B, C, and D may denote different alleles across families. The RC-TDT counts the number of alleles A in the affected offspring, i.e.,

$$T_i^{\text{RC-TDT}} := 2N^{AA}_{ai} + N^{AB}_{ai} + N^{AC}_{ai} + N^{AD}_{ai}. \tag{8}$$
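The moments in the AB × AB example given earlier in this section can be written out explicitly. Let T be the number of alleles A in the affected sib of a family with one affected and one unaffected child. Given that reconstruction is possible (event R), the affected sib is AA or BB with probability 1/2 each; with typed parents, its genotype is AA, AB, or BB with probabilities 1/4, 1/2, and 1/4:

$$
\begin{aligned}
E_{H_0}(T \mid R) &= \tfrac12 \cdot 2 + \tfrac12 \cdot 0 = 1, &
\mathrm{Var}_{H_0}(T \mid R) &= \tfrac12 \cdot 2^2 + \tfrac12 \cdot 0^2 - 1^2 = 1, \\
E_{H_0}(T) &= \tfrac14 \cdot 2 + \tfrac12 \cdot 1 + \tfrac14 \cdot 0 = 1, &
\mathrm{Var}_{H_0}(T) &= \tfrac14 \cdot 2^2 + \tfrac12 \cdot 1^2 + \tfrac14 \cdot 0^2 - 1^2 = \tfrac12.
\end{aligned}
$$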

The next step in the construction of the RC-TDT is to specify, for each parental mating type, the condition under which the observed marker genotypes in the offspring allow reconstruction of the parental genotypes. The C-TDT does not distinguish between families for which both parental genotypes are missing and families with only a single missing genotype. To reconstruct parental genotypes, however, such partial information can be taken into account. When both parental genotypes are missing, four different parental mating types must be distinguished.

1. Both parents are heterozygous for allele A with the same genotype (e.g., AB × AB). Then, reconstruction requires that there is at least one child with genotype AA and at least one child with genotype BB. Thus, the condition for reconstruction becomes $R = (N^{AA} > 0) \wedge (N^{BB} > 0)$.

2. Both parents are heterozygous for allele A but with different genotypes (e.g., AB × AC). In this case, a necessary and sufficient condition for reconstruction is $R = (N^{AA} > 0 \wedge N^{BC} > 0) \vee (N^{AA} > 0 \wedge N^{AB} > 0 \wedge N^{AC} > 0)$.

3. Both parents are heterozygous for some allele different from A, and one parent is heterozygous for A (e.g., AB × BC). Then, the condition for reconstruction is $R = (N^{BB} > 0 \wedge N^{AC} > 0) \vee (N^{BB} > 0 \wedge N^{AB} > 0 \wedge N^{BC} > 0)$.

4. One parent is heterozygous for allele A, and there are four different parental alleles (e.g., AB × CD). The condition $R = (N^{AC} > 0 \wedge N^{BD} > 0) \vee (N^{AD} > 0 \wedge N^{BC} > 0)$ does not allow an exact reconstruction of this parental mating type. For example, if $N^{AC}_i > 0$ and $N^{BD}_i > 0$, then the possibility remains that the mating type is AD × CB instead of AB × CD. However, the condition allows us to decide that exactly one of the parents is heterozygous for allele A. Therefore, this condition is sufficient in the present context.
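The four conditions above can be encoded directly as predicates on the offspring genotype counts. In this sketch the function name and the mating-type labels are illustrative: `mating` is the true but unobserved parental mating type under which the event R is evaluated when conditioning, and genotypes are written as alphabetically sorted two-letter strings.

```python
from collections import Counter


def reconstruction_condition(mating, children):
    """Condition R of items 1-4 for a family with both parents untyped.

    mating: the (unobserved) parental mating type, one of "ABxAB", "ABxAC",
    "ABxBC", "ABxCD"; children: sib genotypes as sorted strings, e.g. "AB".
    """
    n = Counter(children)

    def seen(*genotypes):
        # True if every listed genotype occurs at least once in the sibship
        return all(n[g] > 0 for g in genotypes)

    if mating == "ABxAB":
        return seen("AA", "BB")
    if mating == "ABxAC":
        return seen("AA", "BC") or seen("AA", "AB", "AC")
    if mating == "ABxBC":
        return seen("BB", "AC") or seen("BB", "AB", "BC")
    if mating == "ABxCD":
        return seen("AC", "BD") or seen("AD", "BC")
    raise ValueError(f"unknown mating type: {mating}")


print(reconstruction_condition("ABxAB", ["AA", "AB", "BB"]))  # True
```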

It is now straightforward to calculate the conditional null distribution $W(T_i^{\text{RC-TDT}} \mid R, n_{ai}, n_{ui})$ numerically. As has been shown by Knapp (1999a, Table 1), it is even possible to give closed expressions for the expectation and variance of this distribution. From these expressions, it can be seen that the conditional null expectation and/or variance is sometimes larger and sometimes smaller than the corresponding unconditional moments. Thus, treating reconstructed families as if parental genotypes had been typed does not necessarily inflate the type I error rate, but can also induce an effect in the opposite direction.

If only one parental genotype is missing, nine different parental mating types must be distinguished. These mating types, their corresponding conditions for reconstruction, and expressions for the conditional moments of $T_i^{\text{RC-TDT}}$ were given in Table 2 of Knapp (1999a).

Families in which at least one parental marker genotype is missing and in which no unaffected offspring is available are not suitable for C-TDT analysis. Such families, however, may be included in RC-TDT analysis, provided that parental-genotype reconstruction is possible. All conditions for reconstruction described above for the case that both parental genotypes are missing, and the conditions for reconstruction given in Table 2 of Knapp (1999a), require that at least two different genotypes be observed in the offspring. If a family consists of exactly two affected children and no unaffected siblings, then parental-genotype reconstruction can be possible, but such families are nevertheless uninformative for linkage, because the conditional variance of the number of alleles A in affected offspring will be zero. Thus, families in which all offspring are affected are useful for RC-TDT analysis only if there are at least three affected sibs.

Knapp (1999b, p. 1209) described a modification of the RC-TDT which is concerned with families in which only a single parental marker genotype is missing and the typed parent is heterozygous (say, AB) for allele A. It has been shown by Curtis and Sham (1995) that affected offspring with an allele not present in the available parent (i.e., C) can be used for TDT analysis. Knapp (1999b) proposed to include such families in RC-TDT analysis provided that all children in the family have the same marker genotype. It has been argued by Knapp (1999b) that, if more than one allele not present in the typed parent occurs in the genotypes of the sibship, then the missing parental genotype can be reconstructed, and if both alleles A and B of the typed parent occur in the children, then the family will be suitable for analysis by C-TDT. The latter, however, is true only if the family has at least one unaffected sib. Therefore, a more appropriate way of handling such families (i.e., families with one AB parent and one untyped parent in which exactly one allele not present in the typed parent occurs in the offspring) is to distinguish by the number of unaffected siblings: (1) if there is at least one unaffected sibling, then the relevant distribution of the number of alleles A in the affected children is concentrated on the points 0 and $n_{ai}$; (2) if, however, there is no unaffected offspring, then the relevant null distribution of the number of alleles A in the offspring is a binomial distribution $B(n^{AC}_{ai} + n^{BC}_{ai}, 1/2)$.
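The conditional null distribution $W(T_i^{\text{RC-TDT}} \mid R, n_{ai}, n_{ui})$ can also be obtained by brute force. Under $H_0$, each sib independently receives one of the four equally likely Mendelian genotypes of the mating type, and only sibships satisfying R are kept. This is a sketch for small sibships (its cost grows as $4^{n_{ci}}$); it is not the closed-form route of Knapp (1999a), and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def child_genotypes(p1, p2):
    """The four equally likely offspring genotypes of a mating p1 x p2."""
    return ["".join(sorted(a + b)) for a in p1 for b in p2]


def conditional_null(p1, p2, n_aff, n_unaff, condition):
    """Conditional null distribution of the number of A alleles in the
    affected sibs, given that the whole sibship satisfies `condition`.

    Under H0 each sib's genotype is one of child_genotypes(p1, p2), chosen
    uniformly and independently of affection status; the first n_aff
    positions of each enumerated sibship are taken as the affected sibs.
    """
    kids = child_genotypes(p1, p2)
    weights = {}
    for sibship in product(kids, repeat=n_aff + n_unaff):
        if not condition(list(sibship)):
            continue
        t = sum(g.count("A") for g in sibship[:n_aff])
        weights[t] = weights.get(t, 0) + 1
    total = sum(weights.values())
    return {t: Fraction(w, total) for t, w in sorted(weights.items())}


def moments(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(t * p for t, p in pmf.items())
    return mean, sum(t * t * p for t, p in pmf.items()) - mean ** 2


# AB x AB, one affected and one unaffected child, R = (N^AA > 0) and (N^BB > 0):
# reproduces the conditional mean 1 and variance 1 of the earlier example.
pmf = conditional_null("AB", "AB", 1, 1, lambda g: "AA" in g and "BB" in g)
print(pmf, moments(pmf))
```

Passing the condition as a predicate keeps the enumeration independent of the mating type, so the same function covers the four cases with both parents missing.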

As in (7), the test statistic of the RC-TDT is the standardized total number of alleles A in the affected offspring, summed over families. Whereas the C-TDT distinguishes only two categories of families (i.e., families in which parental genotypes are available and families in which the conditions for the S-TDT are satisfied), the RC-TDT adds two additional categories: (i) families in which at least one parental genotype is missing but can be reconstructed, for which the appropriate conditional null expectation $e_i$ and conditional null variance $v_i$ must be inserted into (7); and (ii) the kind of families described in the preceding paragraph, for which the missing parental genotype can neither be reconstructed nor are the conditions for the S-TDT satisfied. For families of this kind, the appropriate null distribution is given above. As with the C-TDT, P values can be obtained by comparing the observed value of the test statistic with a standard normal distribution. Alternatively, exact P values can be assigned to this test statistic in the same way as described by Knapp (1999b) for the S-TDT and RC-TDT.

The RC-TDT uses marker genotypes of both affected and unaffected children for reconstructing parental genotypes. In a case where the marker genotypes of the offspring do not allow reconstruction of missing parental genotypes, the family is treated as with the S-TDT. Since the relevant distribution of the test statistic for the S-TDT is conditional on the marker genotype distribution in the whole sibship (see (5) and (6)), the S-TDT does not need to be modified for families in which the offspring's marker genotypes do not allow parental-genotype reconstruction. As noted by Spielman and Ewens (1999), the null distribution of the number of marker alleles A transmitted by heterozygous parents to their affected children in families with reconstructed parental marker genotypes will be identical to the corresponding distribution for completely typed families, provided that reconstruction is done from the genotypes of unaffected offspring only. With a strategy of using only unaffected children for reconstruction of missing parental genotypes, therefore, it would not be necessary to modify the TDT distribution when reconstruction is possible. A family for which reconstruction from the genotypes of unaffected offspring is not possible, however, can no longer be treated as with the S-TDT. An example is a family with $n_{ai} = 1$, $n_{ui} = 2$, and $n^{AA}_i = n^{AB}_i = n^{BB}_i = 1$.

According to (6), $v_i^{\text{S-TDT}} = 2/3$. If reconstruction from the genotypes of unaffected offspring is not possible, then the genotype of the affected child in this family is either AA or BB, and the conditional variance of the number of alleles A in the affected child is 1. In principle, it is possible to modify (5) and (6) for such families by additionally conditioning on the event that parental-genotype reconstruction cannot be done from the genotypes of unaffected offspring. Since using marker genotypes of only unaffected offspring will allow us to reconstruct missing parental genotypes in fewer families than using marker genotypes of the whole offspring, this strategy may be less powerful than the approach taken by the RC-TDT.
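The two variances quoted for this example family can be reproduced by listing the equally likely ways of choosing which of the three sibs (AA, AB, BB) is affected. Following the text, the sketch assumes that the two unaffected sibs allow reconstruction exactly when they are AA and BB; the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# The example family: n_ai = 1, n_ui = 2, one sib each with genotype AA, AB, BB.
SIBS = ("AA", "AB", "BB")


def affected_a_counts(exclude_unaffected_reconstruction):
    """A-allele counts of the affected sib over all equally likely ways of
    choosing which sib is affected (first position of each permutation).

    If the flag is set, drop the assignments in which the two unaffected sibs
    alone allow reconstruction; following the text, this is assumed to happen
    exactly when they are AA and BB.
    """
    counts = []
    for affected, *unaffected in permutations(SIBS):
        if exclude_unaffected_reconstruction and set(unaffected) == {"AA", "BB"}:
            continue
        counts.append(affected.count("A"))
    return counts


def variance(xs):
    """Variance of a list of equally weighted values, as an exact fraction."""
    mean = Fraction(sum(xs), len(xs))
    return Fraction(sum(x * x for x in xs), len(xs)) - mean ** 2


print(variance(affected_a_counts(False)), variance(affected_a_counts(True)))  # 2/3 1
```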

XC-TDT AND XRC-TDT FOR X-LINKED MARKERS

Notation

In addition to the notation of the preceding section, let $n^{f}_{ai}$ and $n^{m}_{ai}$ denote the numbers of affected daughters and affected sons, $n^{f}_{ui}$ and $n^{m}_{ui}$ denote the numbers of unaffected daughters and unaffected sons, and $n^{f}_{ci} = n^{f}_{ai} + n^{f}_{ui}$ and $n^{m}_{ci} = n^{m}_{ai} + n^{m}_{ui}$ denote the total numbers of daughters and sons for family i. Genotypes consisting of a single allele correspond to sons. For example, $n^{A}_{ai}$ denotes the observed number of affected sons with genotype A in family i.

XC-TDT

Adapting the C-TDT to X-chromosomal markers requires variants of both the TDT and the S-TDT that can be used for X-chromosomal markers. Adapting the TDT to X-chromosomal markers (the resulting test will be called XTDT) is straightforward: the XTDT simply counts the number of transmissions and the number of nontransmissions of allele A from heterozygous mothers to their affected offspring. Adapting the S-TDT to X-chromosomal markers (the resulting test will be called XS-TDT) is slightly less obvious. Ho and Bailey-Wilson (2000) and Horvath et al. (2000) proposed to divide the sibship of each family into two strata (which have been called subsibships), with the first subsibship consisting of daughters and the second subsibship consisting of sons. The S-TDT is then applied separately to each of the two subsibships; that is, first, the null distribution of the number $T^{f}_i$ of alleles A in affected daughters is calculated conditional on $n^{f}_{ai}$, $n^{f}_{ui}$, and the observed marker genotype distribution in the daughters, and, second, the null distribution of the number $T^{m}_i$ of alleles A in affected sons is calculated conditional on $n^{m}_{ai}$, $n^{m}_{ui}$, and the observed marker genotype distribution in the sons. If the paternal genotype is missing but the maternal genotype is known, the male subsibship is analyzed with the XTDT. Splitting the sibship into subsibships is justified because $T^{f}_i$ and $T^{m}_i$ are independent under $H_0$.
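The subsibship split can be sketched with one generic helper: the total number of alleles A among $n_a$ affected sibs drawn without replacement from a subsibship of size $n_c$ has mean $n_a \mu$ and variance $n_a (n_c - n_a)\sigma^2/(n_c - 1)$, where $\mu$ and $\sigma^2$ are the mean and finite-population variance of the per-sib A counts; for daughters this reproduces (5) and (6). Because $T^{f}_i$ and $T^{m}_i$ are independent under $H_0$, the moments add. The helper names are illustrative, and the XTDT case (maternal genotype known) is not covered.

```python
from fractions import Fraction


def sample_sum_moments(values, n_aff):
    """Null mean and variance of the A-allele total among n_aff affected sibs,
    when the affected set is a uniformly random subset (without replacement)
    of a subsibship whose per-sib A counts are `values`."""
    n = len(values)
    if n == 0 or n_aff == 0:
        return Fraction(0), Fraction(0)
    mu = Fraction(sum(values), n)
    sigma2 = Fraction(sum(v * v for v in values), n) - mu ** 2  # finite-population variance
    mean = n_aff * mu
    var = n_aff * (n - n_aff) * sigma2 / (n - 1) if n > 1 else Fraction(0)
    return mean, var


def xs_tdt_moments(daughters, n_aff_f, sons, n_aff_m):
    """XS-TDT sketch: S-TDT moments computed separately for the daughters
    (genotypes like "AB") and the sons (single-allele genotypes like "A"),
    then added, since T^f_i and T^m_i are independent under H0."""
    mean_f, var_f = sample_sum_moments([g.count("A") for g in daughters], n_aff_f)
    mean_m, var_m = sample_sum_moments([g.count("A") for g in sons], n_aff_m)
    return mean_f + mean_m, var_f + var_m


print(xs_tdt_moments(["AA", "AB", "BB"], 1, ["A", "B"], 1))
```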

XRC-TDT

In order to adapt the RC-TDT to X-chromosomal markers, it is necessary to calculate the null distribution of the number of alleles A in affected offspring, conditional on the event that the number of alleles A in the missing parental genotype or genotypes can be determined. For X-chromosomal markers, three situations must be distinguished: (1) both parental genotypes are missing; (2) the maternal genotype is missing but the paternal genotype has been typed; and (3) the paternal genotype is missing but the maternal genotype has been typed. Tables 1–3 of Horvath et al. (2000) show, for each of these three situations, the parental mating types for which genotype reconstruction is possible, together with the corresponding conditions for reconstruction and closed expressions for the conditional moments of $T_i^{\text{XRC-TDT}}$. Like the RC-TDT, the XRC-TDT can combine families with typed parental genotypes and families with reconstructed parental genotypes. For the (autosomal) RC-TDT, there is an additional category consisting of families in which at least one parental genotype is missing and cannot be reconstructed but in which the condition for the S-TDT is satisfied. In the context of the analysis of an X-chromosomal marker, however, this category does not exist. The reason is that a subsibship is only suitable for the XS-TDT if two different genotypes are present in this subsibship. For daughters, however, the presence of two different genotypes enables the reconstruction of both parents' genotypes, and observing two different genotypes in the sons enables the reconstruction of the maternal genotype. Obviously, the paternal genotype is not required for counting transmissions to sons.

Again, the test statistic of the XRC-TDT is the standardized total number of alleles A in affected offspring, summed over families, and either approximate P values, based on the normal approximation, or exact P values, based on the numerical convolution of the null distributions corresponding to the different families in the sample, can be obtained.

A Special Case: Only Males Can Be Affected

Prostate cancer provides an example of a disease which can only occur in males. Whereas the XRC-TDT remains a valid test for this kind of disease, the transmissions of maternal marker alleles to daughters will provide no information and only add noise to the value of the test statistic. Therefore, it is reasonable to expect that a more powerful approach is obtained by taking only the maternal marker transmissions to the sons into account. This section describes a variant of the XRC-TDT, called XLRC-TDT, which can be used for analyzing an X-linked marker and a disease limited to males.

In a case where the mother's genotype is available and is AB, the XLRC-TDT counts the number of transmissions of allele A to the affected sons. Under $H_0$, this number is binomially distributed as $B(n^{m}_{ai}, 1/2)$.

The case where the mother's genotype is missing is most easily discussed by considering the possibilities for the number of different genotypes present in the daughters.

1. If there are two different genotypes in the daughters, parental genotypes can always be reconstructed. Since, conditional on the parental mating type, the genotype distributions in daughters and sons are independent, the number of alleles A in affected sons of a mother who is heterozygous for allele A is binomially distributed as $B(n^{m}_{ai}, 1/2)$.

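TABLE I
Conditional Expectations and Variances of the Number $T_i$ of A Alleles in Affected Sons

$$
\begin{array}{lcc}
\text{Condition } R & E_{H_0}(T_i \mid R) & \mathrm{Var}_{H_0}(T_i \mid R) \\
\hline
(N^{B}_{i} > 0) & \dfrac{n^{m}_{ai}}{2} \cdot \dfrac{1 - (1/2)^{n^{m}_{ci} - 1}}{1 - (1/2)^{n^{m}_{ci}}} & \dfrac{n^{m}_{ai}}{4} \cdot \dfrac{1 - (n^{m}_{ai} + 1) \cdot (1/2)^{n^{m}_{ci}}}{\bigl(1 - (1/2)^{n^{m}_{ci}}\bigr)^2} \\[2ex]
(N^{A}_{i} > 0) & \dfrac{n^{m}_{ai}}{2} \cdot \dfrac{1}{1 - (1/2)^{n^{m}_{ci}}} & \dfrac{n^{m}_{ai}}{4} \cdot \dfrac{1 - (n^{m}_{ai} + 1) \cdot (1/2)^{n^{m}_{ci}}}{\bigl(1 - (1/2)^{n^{m}_{ci}}\bigr)^2} \\[2ex]
(N^{A}_{i} > 0) \text{ and } (N^{B}_{i} > 0) & \dfrac{n^{m}_{ai}}{2} & \dfrac{n^{m}_{ai}}{4} \cdot \dfrac{1 - n^{m}_{ai} \cdot (1/2)^{n^{m}_{ci} - 1}}{1 - (1/2)^{n^{m}_{ci} - 1}}
\end{array}
$$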

2. If there are no daughters in the family, the appropriate condition for reconstructing the maternal genotype is $(N_{Ai} > 0)$ and $(N_{Bi} > 0)$, i.e., one must observe at least one son with allele A and at least one son without allele A. The appropriate conditional expectation and variance are given in the last row of Table I.

3. If all daughters possess the same genotype, four subcases must be distinguished:

a. All daughters have genotype AA. Then, at least one son with an allele different from A (say, B) must be observed to reconstruct the maternal genotype, i.e., $(N_{Bi} > 0)$ is the appropriate condition. The conditional expectation and variance are given in the first row of Table I.

b. All daughters have genotype BB. Then, the condition for reconstruction is $(N_{Ai} > 0)$; see the second row of Table I.

c. All daughters have genotype AB. If the paternal genotype is available and is A, then the condition is $(N_{Ai} > 0)$. Analogously, if the paternal genotype is available and is B, then the condition is $(N_{Bi} > 0)$. Finally, if the paternal genotype is also missing, then the condition which enables us to reconstruct the maternal genotype is $(N_{Ai} > 0)$ and $(N_{Bi} > 0)$.

d. All daughters have genotype BC. Then, at least one son with allele A must be observed to decide that the mother is heterozygous for allele A, i.e., the relevant condition is $(N_{Ai} > 0)$.
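The entries of Table I can be checked by brute-force enumeration. The sketch below assumes (an interpretation consistent with the formulas, not a definition quoted from this section) that $n_{mai}$ is the number of affected sons and $n_{mci}$ the total number of sons of mother $i$, and that under $H_0$ each son of an AB mother independently receives A or B with probability 1/2. The condition labels `"NA>0"`, `"NB>0"`, and `"both"` and the function names are hypothetical.

```python
from itertools import product


def table1_moments(condition, n_mai, n_mci):
    """Closed-form E and Var of T_i under H0 given the reconstruction condition."""
    if n_mci < 1 or not 0 <= n_mai <= n_mci:
        raise ValueError("need n_mci >= 1 and 0 <= n_mai <= n_mci")
    x = 0.5**n_mci  # (1/2)^{n_mci}
    if condition in ("NB>0", "NA>0"):
        q = 1 - x
        var = n_mai / 4 * (1 - (n_mai + 1) * x) / q**2
        if condition == "NB>0":
            mean = n_mai / 2 * (1 - 2 * x) / q  # 2x = (1/2)^{n_mci - 1}
        else:
            mean = n_mai / 2 / q
        return mean, var
    if condition == "both":
        if n_mci < 2:
            raise ValueError("both alleles can only be seen with at least two sons")
        r = 2 * x  # (1/2)^{n_mci - 1}
        return n_mai / 2, n_mai / 4 * (1 - n_mai * r) / (1 - r)
    raise ValueError(f"unknown condition {condition!r}")


def enumerated_moments(condition, n_mai, n_mci):
    """Same moments by listing all equally likely allele patterns in the sons."""
    kept = []
    for sons in product((0, 1), repeat=n_mci):  # 1 = son received A, 0 = received B
        has_a, has_b = 1 in sons, 0 in sons
        ok = {"NA>0": has_a, "NB>0": has_b, "both": has_a and has_b}[condition]
        if ok:
            # Sons are exchangeable under H0, so let the first n_mai be the affected ones.
            kept.append(sum(sons[:n_mai]))
    mean = sum(kept) / len(kept)
    var = sum((t - mean) ** 2 for t in kept) / len(kept)
    return mean, var
```

For instance, with one affected son among two sons and the condition $(N_{Bi} > 0)$, both functions give a mean of 1/3 and a variance of 2/9.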


DISCUSSION

The approach of deducing missing parental genotypes in the context of family-based association analysis, introduced by Knapp (1999a) for autosomal markers and by Horvath et al. (2000) for X-chromosomal markers, was based more on intuition than on theoretical arguments. The results of simulation studies presented by Knapp (1999a) and Horvath et al. (2000) supported the expectation that parental-genotype reconstruction improves the power of methods which use marker data from unaffected sibs instead of from parents.

A seemingly different approach for developing TDT-type statistical tests when parental marker data are not available has been described by Rabinowitz and Laird (2000). Their approach is not restricted to a qualitative disease phenotype; it only requires an arbitrary function T depending on an individual's phenotype. Analogously, let X denote an arbitrary function depending on an individual's marker genotype. The test statistic is obtained as a sum of pedigree-specific contributions

$$S_i = \sum_{j=1}^{n_i} X_{ij} T_{ij},$$

with $T_{ij}$ and $X_{ij}$ denoting the coded trait phenotype and the coded marker phenotype, respectively, of offspring j in family i. Rabinowitz and Laird (2000) presented an algorithm for computing the conditional distribution of $S_i$, given the minimal sufficient statistic under the null hypothesis of no linkage and no association. In nuclear families with incomplete parental genotypes, the minimal sufficient statistic consists of the observed traits, the partially observed parental genotypes, and the offspring genotypes. In the case where T codes the affection status, with $T_{ij} = 1$ for an affected and $T_{ij} = 0$ for an unaffected offspring, and where $X_{ij}$ counts the number of alleles A in an individual, $S_i$ counts the number of alleles A in affected individuals.

For this choice of T and X, Horvath et al. (2001) compared the RC-TDT with the method proposed by Rabinowitz and Laird (2000). Their simulation results revealed that the two tests have very similar power. Indeed, it can be shown that, for a family in which the offspring allow the unique reconstruction of missing parental genotypes, the conditional distribution relevant for the RC-TDT is identical to the conditional distribution relevant for the approach of Rabinowitz and Laird (2000). By this correspondence, the work of Rabinowitz and Laird (2000) provides the theoretical justification for the reconstruction part of the RC-TDT and XRC-TDT. Conversely, the relatively abstract algorithm introduced by Rabinowitz and Laird (2000) can be interpreted as generalizing the RC-TDT to the case when parental genotypes cannot be uniquely reconstructed.
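With the coding just described (T as affection status, X as the number of A alleles), the observed family contribution $S_i$ reduces to a count of A alleles in affected offspring. The sketch below computes only that observed value; it does not implement the conditional-distribution algorithm of Rabinowitz and Laird (2000). Genotypes are assumed to be tuples of allele labels, with a one-element tuple for a hemizygous son, and the function names are hypothetical.

```python
def count_allele(genotype, allele="A"):
    """X_ij: number of copies of `allele` in one offspring's genotype tuple."""
    return sum(1 for a in genotype if a == allele)


def family_statistic(affected, genotypes, allele="A"):
    """S_i = sum over offspring j of X_ij * T_ij, with T_ij = 1 if affected, else 0."""
    if len(affected) != len(genotypes):
        raise ValueError("one trait code per offspring genotype is required")
    return sum(count_allele(g, allele) * int(t) for t, g in zip(affected, genotypes))
```

For three offspring coded affected, unaffected, affected with genotypes AB, AA, and BB, `family_statistic([True, False, True], [("A", "B"), ("A", "A"), ("B", "B")])` returns 1: the unaffected AA offspring contributes nothing, however many A alleles it carries.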

REFERENCES

Curtis, D. 1997. Use of siblings as controls in case–control association studies, Ann. Hum. Genet. 61, 319–333.

Curtis, D., and Sham, P. C. 1995. A note on the application of the transmission disequilibrium test when a parent is missing, Am. J. Hum. Genet. 56, 811–812.

Elston, R. C. 1998. Statistical genetics ’98. Methods of linkage analysis—And the assumptions underlying them, Am. J. Hum. Genet. 63, 931–934.

Ewens, W. J., and Spielman, R. S. 1998. Reply to Laird et al., Am. J. Hum. Genet. 63, 1915–1916.

Falk, C. T., and Rubinstein, P. 1987. Haplotype relative risks: An easy reliable way to construct a proper control sample for risk calculations, Ann. Hum. Genet. 51, 227–233.

Ho, G. Y. F., and Bailey-Wilson, J. E. 2000. The transmission/disequilibrium test for linkage on the X-chromosome, Am. J. Hum. Genet. 66, 1158–1160.

Horvath, S., Laird, N. M., and Knapp, M. 2000. The transmission/disequilibrium test and parental genotype reconstruction for X-chromosomal markers, Am. J. Hum. Genet. 66, 1161–1167.

Horvath, S., Xu, X., and Laird, N. M. 2001. The family based association test method: Strategies for studying general genotype-phenotype associations, Eur. J. Hum. Genet. 9, 301–306.

Knapp, M. 1999a. The transmission/disequilibrium test and parental-genotype reconstruction: The reconstruction-combined transmission/disequilibrium test, Am. J. Hum. Genet. 64, 861–870.

Knapp, M. 1999b. Using exact P values to compare the power between the reconstruction-combined transmission/disequilibrium test and the sib transmission/disequilibrium test, Am. J. Hum. Genet. 65, 1208–1210.

Laird, N. M., Blacker, D., and Wilcox, M. 1998. The sib transmission/disequilibrium test is a Mantel–Haenszel test, Am. J. Hum. Genet. 63, 1915.

Martin, E. R., Kaplan, N. L., and Weir, B. S. 1997. Tests for linkage and association in nuclear families, Am. J. Hum. Genet. 61, 439–448.

McNemar, Q. 1947. Note on sampling error of the differences between correlated proportions or percentages, Psychometrika 12, 153–157.

Rabinowitz, D., and Laird, N. M. 2000. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information, Hum. Hered. 50, 211–223.

Risch, N. 2000. Searching for genetic determinants in the new millennium, Nature 405, 847–856.

Risch, N., and Merikangas, K. 1996. The future of genetic studies of complex human diseases, Science 273, 1516–1517.

Rubinstein, P., Walker, M., Carpenter, C., Carrier, C., Krassner, J., Falk, C., and Ginsberg, F. 1981. Genetics of HLA disease associations. The use of the haplotype relative risk (HRR) and the “haplo-delta” (Dh) estimates in juvenile diabetes from three racial groups, Hum. Immunol. 3, 384.

Spielman, R. S., and Ewens, W. J. 1998. A sibship test for linkage in the presence of association: The sib transmission/disequilibrium test, Am. J. Hum. Genet. 62, 450–458.

Spielman, R. S., and Ewens, W. J. 1999. TDT clarification, Am. J. Hum. Genet. 64, 668–669.

Spielman, R. S., McGinnis, R. E., and Ewens, W. J. 1993. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM), Am. J. Hum. Genet. 52, 502–516.
