A loss function related to the FDR for random effects multiple comparisons



Journal of Statistical Planning and Inference 125 (2004) 49–58

www.elsevier.com/locate/jspi

Charles Lewis*, Dorothy T. Thayer
Educational Testing Service, Princeton, NJ 08541, USA

Received 9 December 2002; accepted 21 July 2003

Abstract

When testing multiple comparisons for random effects, it is possible to apply Bayesian decision theory in a sampling theory context. Shaffer (J. Statist. Plann. Inference 82 (1999) 197) provides a recent example. We show that minimizing the Bayes risk for a per-comparison "0–1" loss function also controls a random effects version of the false discovery rate, thus supporting and extending Shaffer's results.
© 2003 Published by Elsevier B.V.

MSC: 62J15; 62F03; 62F15; 62C10; 62C12

Keywords: Multiple comparisons; Bayesian decision theory; False discovery rate

1. Introduction

Shaffer (1999) recently studied the performance of several multiple comparison procedures (MCPs) in what is essentially a random effects environment (population means sampled from a normal distribution). She found that, once it had been adjusted to provide weak familywise Type I error (FWE) control, a modification of Duncan's Bayesian MCP (Duncan, 1965; Waller and Duncan, 1969) performed similarly in this environment to Benjamini and Hochberg's (1995) procedure for controlling the false discovery rate (FDR).

We consider a simple, one-way random effects model with both between and within variances assumed known. In this framework, we modify Duncan's Bayesian approach, using a "0–1" loss function rather than a linear one, and characterize the MCP that minimizes the resulting Bayes risk. We further adopt the "wrong-sign" modification of the FDR proposed by Williams et al. (1999) and show that this error rate, when averaged over all sets of population means, is controlled by the new procedure for any value of the between-means variance. Consequently, it should be expected to perform similarly to other FDR-controlling procedures. This result provides theoretical support for Shaffer's simulation-based findings.

∗ Corresponding author.

0378-3758/$ - see front matter © 2003 Published by Elsevier B.V.
doi:10.1016/j.jspi.2003.07.020


Finally, for the more practical case where the between-means variance is unknown, we propose a minor modification of our procedure that continues to guarantee control of the ("wrong-sign") FDR. Simulations of pairwise comparisons for random effects demonstrate similar (but generally slightly superior) power, expressed as the per-comparison rate of correct sign declarations, for this procedure compared to that of Benjamini and Hochberg (1995).

After we had completed the research described in this paper, the article of Gelman and Tuerlinckx (2000) was brought to our attention. The approach they take is very similar to ours (considering multiple comparisons for random effects, focusing on incorrect sign declarations, their Type S errors, and adopting a Bayesian framework). It should be noted, however, that they appear to be unaware of, for instance, the work of Duncan (1965), Benjamini and Hochberg (1995) or Shaffer (1999). The biggest difference between our work and theirs is that Gelman and Tuerlinckx make no use of Bayesian decision theory in their development. We view the primary contribution of this paper to be the demonstration that minimizing Bayes risk for an appropriate loss function results in control of the FDR, an important sampling theory multiple comparisons error rate.

2. One-way random effects setup

We start with a vector of $m$ population means $\mu' = (\mu_1, \ldots, \mu_m)$ and corresponding sample means $\bar y' = (\bar y_1, \ldots, \bar y_m)$ such that the $\bar y_j$ are based on samples of size $n_j$ and are conditionally independent given the $\mu_j$, with $\bar y_j \mid \mu_j \sim N(\mu_j, \sigma^2/n_j)$. Now suppose the $\mu_j$ are identically and independently distributed with $\mu_j \sim N(\nu, \tau^2)$. In other words, we consider the $\mu_j$ to be random effects. In most practical applications, $\nu$, $\tau^2$ and $\sigma^2$ would all be unknown. Nonetheless, in what follows, we will first treat them as known quantities. This approximates a large-sample situation where these parameters are replaced by consistent sample estimates. Our most important motivation for this simplification is that, as long as these parameters are considered known, a Bayesian approach to the problem of making inferences about the $\mu_j$ has a sampling theory interpretation.

Using Bayes' Theorem, we may derive the conditional distribution of $\mu$ given $\bar y$:
$$p(\mu \mid \bar y) = \prod_{j=1}^{m} p(\mu_j \mid \bar y_j), \qquad \text{with } \mu_j \mid \bar y_j \sim N\!\left(\tilde\mu_j,\ \frac{\tau^2\,\sigma^2/n_j}{\tau^2 + \sigma^2/n_j}\right)$$
and
$$\tilde\mu_j = \frac{\tau^2\,\bar y_j + (\sigma^2/n_j)\,\nu}{\tau^2 + \sigma^2/n_j}.$$
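As a minimal numerical sketch, these posterior moments can be computed directly; the values of $\nu$, $\tau^2$, $\sigma^2$, the $n_j$ and the $\bar y_j$ below are arbitrary illustrations.

```python
import numpy as np

# Illustrative values for the (assumed known) parameters and the data.
nu, tau2, sigma2 = 0.0, 4.0, 1.0          # prior mean, between variance, within variance
n = np.array([10, 10, 15, 20])            # sample sizes n_j
ybar = np.array([1.2, -0.4, 0.9, 2.1])    # observed sample means

# Posterior mean and variance of each mu_j given ybar_j (normal-normal conjugacy).
post_var = tau2 * (sigma2 / n) / (tau2 + sigma2 / n)
post_mean = (tau2 * ybar + (sigma2 / n) * nu) / (tau2 + sigma2 / n)

print(post_mean)   # sample means shrunken toward nu
print(post_var)
```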


Now consider a contrast among the $\mu_j$: $\gamma = \sum_{j=1}^{m} c_j\mu_j$ with $\sum_{j=1}^{m} c_j = 0$. Let $\tilde\gamma = \sum_{j=1}^{m} c_j\tilde\mu_j$. Then it follows that
$$\gamma \mid \bar y \sim N\!\left(\tilde\gamma,\ \tau^2\sigma^2\sum_{j=1}^{m}\frac{c_j^2/n_j}{\tau^2 + \sigma^2/n_j}\right),$$
so we may write $p(\gamma \mid \bar y) = p(\gamma \mid \tilde\gamma)$.

In fact, we will be interested in a family of contrasts, such as the set of all pairwise differences among the $\mu_j$. We denote the vector of $k$ contrasts of interest by $\gamma' = (\gamma_1, \ldots, \gamma_k)$, based on a $(k \times m)$ matrix of contrast coefficients $C = (c_{ij})$, and the corresponding vector of posterior means by $\tilde\gamma' = (\tilde\gamma_1, \ldots, \tilde\gamma_k)$. The goal of our analysis will be to identify the sign of each $\gamma_i$ based on the corresponding $\tilde\gamma_i$ ($i = 1, \ldots, k$).
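For the family of all pairwise differences, the contrast matrix $C$ and the posterior mean and variance of each contrast can be assembled as in the following sketch (same illustrative values as above).

```python
import numpy as np
from itertools import combinations

def pairwise_contrasts(m):
    """k x m matrix C whose rows encode all pairwise differences mu_i - mu_j (i < j)."""
    rows = []
    for i, j in combinations(range(m), 2):
        c = np.zeros(m)
        c[i], c[j] = 1.0, -1.0
        rows.append(c)
    return np.array(rows)

# Illustrative values, as in the previous sketch.
nu, tau2, sigma2 = 0.0, 4.0, 1.0
n = np.array([10, 10, 15, 20])
ybar = np.array([1.2, -0.4, 0.9, 2.1])
post_mean = (tau2 * ybar + (sigma2 / n) * nu) / (tau2 + sigma2 / n)

C = pairwise_contrasts(len(ybar))                                     # k x m
gamma_tilde = C @ post_mean                                           # posterior contrast means
gamma_var = (tau2 * sigma2 * (C**2 / n) / (tau2 + sigma2 / n)).sum(axis=1)
print(gamma_tilde, gamma_var)
```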

3. Decision theory framework

The sign identification problem for the $\gamma_i$ may be recast in the language of decision theory. Specifically, for each $\gamma_i$ we wish to take an action, denoted by $a_i$. The possible actions we consider are $a_i = +1$ (deciding that $\gamma_i > 0$), $a_i = -1$ (deciding that $\gamma_i < 0$), and $a_i = 0$ (deciding that the sign of $\gamma_i$ is indeterminate). These correspond to the actions described, for instance, by Jones and Tukey (2000) in their directional interpretation of the usual significance test for $\gamma_i$.

For the vector of contrasts $\gamma$, let the corresponding vector of actions be denoted by $a' = (a_1, \ldots, a_k)$. We now introduce a loss function for our decision problem. Let $L_k(\gamma, a) = \sum_{i=1}^{k} L_1(\gamma_i, a_i)$, with $L_1(\gamma_i, a_i)$ defined as 0 if the sign of $\gamma_i$ is correctly given by $a_i$, as 1 if the sign is incorrectly identified, and as $A$ if $a_i = 0$ and $\gamma_i \ne 0$. If $a_i = 0$ and $\gamma_i = 0$, we define $L_1(0, 0) = 0$. Note that $L_1$ is similar to what Berger (1985, p. 63) calls "0–1" loss. He states that "In practice, '0–1' loss will rarely be a good approximation to the true loss." We agree, but also observe that it is consistent with the usual sampling theory error rates, where a "miss is as good as a mile." Such a loss function may be expected to have a simple relationship to these error rates.

Next, we identify the decision function $D_k : \tilde\gamma \to a$ that minimizes the (Bayes) risk $r_k(D_k) = E_{\gamma, \tilde\gamma}\{L_k[\gamma, D_k(\tilde\gamma)]\}$. Denote the $i$th component of $D_k$ by $D_{ki}$. Using methods similar to those described by Berger (1985, pp. 163–166), it is straightforward to show that the optimal function is defined by $D_{ki}(\tilde\gamma_i) = +1$ if $z_i \ge z^*_{1-A}$, $D_{ki}(\tilde\gamma_i) = -1$ if $z_i \le -z^*_{1-A}$, and $D_{ki}(\tilde\gamma_i) = 0$ if $|z_i| < z^*_{1-A}$, with
$$z_i = \tilde\gamma_i\bigg/\sqrt{\tau^2\sigma^2\sum_{j=1}^{m}\frac{c_{ij}^2/n_j}{\tau^2 + \sigma^2/n_j}} \quad\text{and}\quad z^*_{1-A} = -\Phi^{-1}(A), \qquad i = 1, \ldots, k.$$
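A compact sketch of this rule: given posterior contrast means and variances (computed as above), each $z_i$ is compared to $z^*_{1-A}$.

```python
import numpy as np
from scipy.stats import norm

def bayes_decision(gamma_tilde, gamma_var, A=0.025):
    """Return +1, -1, or 0 for each contrast: the rule D_k minimizing Bayes risk
    for the per-comparison '0-1' loss with indeterminacy cost A."""
    z_star = -norm.ppf(A)                    # z*_{1-A} = -Phi^{-1}(A)
    z = gamma_tilde / np.sqrt(gamma_var)
    actions = np.zeros(len(z), dtype=int)
    actions[z >= z_star] = 1
    actions[z <= -z_star] = -1
    return actions

# Tiny illustration: z = (2.5, -0.3, 1.1) against z* = 1.96 gives [1, 0, 0].
print(bayes_decision(np.array([2.5, -0.3, 1.1]), np.array([1.0, 1.0, 1.0])))
```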


4. False discovery rate

We now look at the directional false discovery rate, or DFDR (Shaffer, 2002), for our decision function. The DFDR is defined as the expectation of the ratio of the number of incorrect sign declarations to the total number of sign declarations, defining the ratio to be zero if the denominator is zero. In our setting, the expectation is taken with respect to the joint distribution of $\gamma$ and $\tilde\gamma$.

Suppose we define $s_{ki}(\tilde\gamma_i) = |D_{ki}(\tilde\gamma_i)|$. Thus $s_{ki}(\tilde\gamma_i)$ equals 1 if we declare a sign for $\gamma_i$ and 0 otherwise. This allows us to define
$$\mathrm{DFDR}(D_k) = E_{\gamma, \tilde\gamma}\left[\sum_{i=1}^{k} s_{ki}(\tilde\gamma_i)\,L_1(\gamma_i, D_{ki}(\tilde\gamma_i))\bigg/\max\left\{\sum_{i=1}^{k} s_{ki}(\tilde\gamma_i),\ 1\right\}\right]. \tag{1}$$

To derive an upper bound for the DFDR, first rewrite (1) as
$$\mathrm{DFDR}(D_k) = E_{\tilde\gamma}\left\{\sum_{i=1}^{k} E_{\gamma_i \mid \tilde\gamma_i}\!\left[s_{ki}(\tilde\gamma_i)\,L_1(\gamma_i, D_{ki}(\tilde\gamma_i)) \mid \tilde\gamma_i\right]\bigg/\max\left\{\sum_{i=1}^{k} s_{ki}(\tilde\gamma_i),\ 1\right\}\right\}. \tag{2}$$

Now $E_{\gamma_i \mid \tilde\gamma_i}[s_{ki}(\tilde\gamma_i)\,L_1(\gamma_i, D_{ki}(\tilde\gamma_i)) \mid \tilde\gamma_i] \le s_{ki}(\tilde\gamma_i)\cdot A$, with equality when $|z_i| \le z^*_{1-A}$: whenever a sign is declared, the conditional probability that it is incorrect is $\Phi(-|z_i|) \le \Phi(-z^*_{1-A}) = A$, and whenever no sign is declared both sides are zero. Thus we have
$$\sum_{i=1}^{k} E_{\gamma_i \mid \tilde\gamma_i}[s_{ki}(\tilde\gamma_i)\,L_1(\gamma_i, D_{ki}(\tilde\gamma_i)) \mid \tilde\gamma_i]\bigg/\max\left\{\sum_{i=1}^{k} s_{ki}(\tilde\gamma_i),\ 1\right\} \le A.$$
Substituting in (2) gives
$$\mathrm{DFDR}(D_k) \le E_{\tilde\gamma}(A) = A. \tag{3}$$

Note that, as a practical matter, this will be a strict inequality, so that $D_k$ actually provides conservative control of the DFDR. We will illustrate this fact in a subsequent section with simulation results.

To summarize, (3) states that the decision rule chosen to minimize a "per-comparison" Bayes risk,
$$r_k(D_k) = E_{\gamma, \tilde\gamma}\{L_k[\gamma, D_k(\tilde\gamma)]\} = \sum_{i=1}^{k} E_{\gamma_i, \tilde\gamma_i}[L_1(\gamma_i, D_{ki}(\tilde\gamma_i))],$$
controls a random effects version of the DFDR at level $A$. This may give some insight into the simulation results obtained by Shaffer (1999), namely that a modification of Duncan's Bayesian procedure very similar to our rule has power comparable to Benjamini and Hochberg's (1995) FDR-controlling procedure in a random effects setting.
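Inequality (3) is easy to check by Monte Carlo. The following sketch estimates the random effects DFDR of the Bayes rule for all pairwise comparisons; the settings ($m = 10$, $\sigma^2/n = 1$, $\tau = 2$, $A = 0.025$, far fewer replications than in Section 6) are arbitrary illustrations, and the printed estimate should fall below $A$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, sigma2_over_n, tau, nu, A, reps = 10, 1.0, 2.0, 0.0, 0.025, 20000
z_star = -norm.ppf(A)
C = np.array([[int(j == a) - int(j == b) for j in range(m)]
              for a in range(m) for b in range(a + 1, m)], dtype=float)

ratios = []
for _ in range(reps):
    mu = rng.normal(nu, tau, m)                       # random effects
    ybar = rng.normal(mu, np.sqrt(sigma2_over_n))     # sample means
    tau2 = tau ** 2
    g_tilde = C @ ((tau2 * ybar + sigma2_over_n * nu) / (tau2 + sigma2_over_n))
    g_var = (tau2 * sigma2_over_n / (tau2 + sigma2_over_n)) * (C**2).sum(axis=1)
    z = g_tilde / np.sqrt(g_var)
    declared = np.abs(z) >= z_star
    wrong = declared & (np.sign(z) != np.sign(C @ mu))
    ratios.append(wrong.sum() / max(declared.sum(), 1))

print(np.mean(ratios))   # estimated DFDR; the bound says this should not exceed A
```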

5. Modification of the procedure when $\tau^2$ is unknown

So far we have only considered the case where $\tau^2$ is known. In fact, it will typically be unknown and, consequently, it must be estimated from the data. (The values of $\nu$ and $\sigma^2$ will also typically be unknown. Their estimation, however, does not present any difficulties for our procedure, so we ignore this issue, as did Shaffer (1999, p. 202).)

C. Lewis, D.T. Thayer / Journal of Statistical Planning and Inference 125 (2004) 49–58 53

Table 1
Estimated limits for the DFDR, approached as $\tau^2 \to 0$, for a decision rule with $\hat\tau^2$ replacing $\tau^2$, as a function of the number of means $m$, with $A = 0.025$

m       2      3      4      5      10     15     25     35     40     50
DFDR    0.014  0.023  0.028  0.031  0.036  0.035  0.031  0.027  0.025  0.022

Standard errors are 0.0003–0.0004, based on 100,000 replications.

First, we define $F = \sum_j n_j(\bar y_j - \bar y)^2/[(m-1)\sigma^2]$, where $\bar y = \sum_j n_j\bar y_j \big/ \sum_j n_j$ is the weighted grand mean. A "natural" estimator of $\tau^2$ is given by
$$\hat\tau^2 = (F - 1)\,\sigma^2\,\frac{(m-1)\sum_j n_j}{\left(\sum_j n_j\right)^2 - \sum_j n_j^2},$$
at least when $\hat\tau^2$ is positive, i.e., when $F > 1$. This in turn suggests a modification to our decision rule: declare the sign of $\gamma_i$ if and only if $\hat\tau^2 > 0$ and $|\hat z_i| \ge z^*_{1-A}$, with
$$\hat z_i = \hat\gamma_i\bigg/\sqrt{\hat\tau^2\sigma^2\sum_{j=1}^{m}\frac{c_{ij}^2/n_j}{\hat\tau^2 + \sigma^2/n_j}},$$
where $\hat\gamma_i$ denotes $\tilde\gamma_i$ evaluated with $\hat\tau^2$ in place of $\tau^2$.
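A sketch of this estimation step follows; the function name and the truncation of $\hat\tau^2$ at zero when $F \le 1$ are our conventions.

```python
import numpy as np

def estimate_tau2(ybar, n, sigma2):
    """Method-of-moments estimator of tau^2 based on the F statistic defined above."""
    ybar, n = np.asarray(ybar, float), np.asarray(n, float)
    m = len(ybar)
    ybar_grand = np.sum(n * ybar) / np.sum(n)                  # weighted grand mean
    F = np.sum(n * (ybar - ybar_grand) ** 2) / ((m - 1) * sigma2)
    tau2_hat = (F - 1) * sigma2 * (m - 1) * np.sum(n) / (np.sum(n) ** 2 - np.sum(n ** 2))
    return F, max(tau2_hat, 0.0)                               # truncate at zero when F <= 1
```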

If our interest is in controlling the DFDR, a question arises as to the effect of substituting $\hat\tau^2$ for $\tau^2$ in our decision rule. We carried out a simulation study to investigate this question, restricting our attention to the set of all pairwise differences and taking all $n_j$ to be equal. First, we determined through simulations that, for the rule using $\hat\tau^2$, the DFDR is a decreasing function of $\tau^2$. We then estimated the limiting value of the DFDR as $\tau^2$ approaches zero. Table 1 gives the results of this study for several values of $m$ and $A = 0.025$ (so $z^*_{1-A} = 1.96$). We see that, for $m < 4$ and $m > 40$, the modified rule appears to provide conservative control of the DFDR at 0.025. For $m$ between 4 and 40, some further (slight) modification is needed to provide the desired control.

Shaffer's (1999) modification of Duncan's Bayesian MCP also replaced $\tau^2$ with $\hat\tau^2$. In addition, she adjusted her critical value $t_\infty$ (corresponding to our $z^*_{1-A}$) to provide weak control of the nondirectional familywise error rate (NFWER in the notation of Shaffer, 2002) at 0.050 for $\tau^2 = 0$. This is equivalent to controlling the DFDR at 0.025 as $\tau^2 \to 0$. Her Table 1 (Shaffer, 1999, p. 203) shows the resulting values of $t_\infty$ for values of $m$ ranging from 2 to 100. For $m$ between 4 and 40 these values exceed 1.96, a result that is consistent with our Table 1, where 1.96 has been used for all values of $m$.

In considering how to further modify our decision rule to control the DFDR, we decided not to change the choice of $z^*_{1-A}$. Instead, we introduced a second critical value, denoted by $F^*$, to be compared to $F$. In this modification, we declare the sign of $\gamma_i$ if and only if $F > F^*$ and $|\hat z_i| \ge z^*_{1-A}$. Here $F^*$ is chosen (based on simulation) so that $\mathrm{DFDR} < A$ for all $\tau^2$. We prefer this modification to Shaffer's (1999) approach for two reasons. First, the only problems with controlling the DFDR occur for small values of $\tau^2$, so it seems appropriate to be conservative when $\tau^2$ is small. (Even Shaffer's (1999) approach only declares signs when $\hat\tau^2 > 0$.) Second, for large $\tau^2$ we would like our rule to approximate the usual per-comparison procedure using $z^*_{1-A}$. With Shaffer's (1999) approach this does not occur, whereas with our modification it does. It may also be observed that our modified decision rule is related to the traditional "protected LSD" procedure, where individual comparisons are made only after a rejection of the overall null hypothesis of no differences among the population means. Table 2 contains values of $F^*$, determined via simulation, that provide control of the DFDR at 0.025 for several values of $m$.


Table 2
Values of $F^*$ chosen to control the DFDR at 0.025 for all values of $\tau^2$, for a decision rule with $\hat\tau^2$ replacing $\tau^2$, as a function of the number of means $m$

m      2     3     4     5     10    15    25    35    40    50
F*     1.01  1.01  2.56  2.32  1.88  1.69  1.49  1.37  1.32  1.01

Note that, for $m = 2$ or 3 and $m > 40$, $F^*$ is set at 1.01, reflecting the result indicated in Table 1 that no additional control is needed for these cases.
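Putting the pieces together, the resulting empirical Bayes rule can be sketched as follows, with the $F^*$ values taken from Table 2 (so only the tabulated values of $m$ are covered); in practice $\nu$ would be replaced by an estimate such as the weighted grand mean.

```python
import numpy as np
from scipy.stats import norm

F_STAR = {2: 1.01, 3: 1.01, 4: 2.56, 5: 2.32, 10: 1.88,
          15: 1.69, 25: 1.49, 35: 1.37, 40: 1.32, 50: 1.01}   # Table 2

def eb_decision(ybar, n, sigma2, nu, C, A=0.025):
    """Empirical Bayes rule: declare signs only if F > F*(m) and |z_i| >= z*_{1-A},
    with tau^2 replaced by its estimate tau2_hat."""
    ybar = np.asarray(ybar, float)
    m = len(ybar)
    n = np.full(m, float(n)) if np.isscalar(n) else np.asarray(n, float)
    ybar_grand = np.sum(n * ybar) / np.sum(n)
    F = np.sum(n * (ybar - ybar_grand) ** 2) / ((m - 1) * sigma2)
    tau2_hat = (F - 1) * sigma2 * (m - 1) * np.sum(n) / (np.sum(n) ** 2 - np.sum(n ** 2))
    actions = np.zeros(C.shape[0], dtype=int)
    if F <= F_STAR[m] or tau2_hat <= 0:                        # protected step: no signs declared
        return actions
    post_mean = (tau2_hat * ybar + (sigma2 / n) * nu) / (tau2_hat + sigma2 / n)
    g_hat = C @ post_mean
    g_var = (tau2_hat * sigma2 * (C**2 / n) / (tau2_hat + sigma2 / n)).sum(axis=1)
    z = g_hat / np.sqrt(g_var)
    z_star = -norm.ppf(A)
    actions[z >= z_star] = 1
    actions[z <= -z_star] = -1
    return actions
```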

6. Comparisons among procedures

Finally, we wish to compare the performance of our Bayesian decision rule $D_k$, using the true value of $\tau^2$, with the modified rule that uses the estimate $\hat\tau^2$ in place of $\tau^2$ and an initial test comparing $F$ to a critical value $F^*$. These two procedures will be referred to as Bayes and empirical Bayes (or EB), respectively. We will compare them in terms of the DFDR and in terms of an appropriate measure of power. The measure we adopt for power is essentially one used by Shaffer (1999) and referred to by her as average power (Shaffer, 1999, p. 205). In keeping with the notational conventions introduced by Shaffer (2002), we will refer to it as the directional per-comparison power rate, or DPCPR. In our random effects context, it may be defined for any decision rule $D_k$ used to identify the signs of a set of $k$ contrasts as
$$\mathrm{DPCPR}(D_k) = E_{\gamma, \bar y}\left[\sum_{i=1}^{k} |D_{ki}(\bar y)|\,[1 - L_1(\gamma_i, D_{ki}(\bar y))]\big/k\right].$$

Here the possible dependence of $D_{ki}$ on the complete vector of observed means $\bar y$ (such as occurs in the EB procedure) is made explicit. A corresponding change would be necessary to provide a similar definition for $\mathrm{DFDR}(D_k)$.

The simulations on which Figs. 1–5 are based used 25,000 replications of sampling $\mu$ and $\bar y$ at each of several values of $\tau$ (0.01, 0.50, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, and 8.0), scaled so that $\sigma^2/n = 1$. Each of five MCPs (described below) was applied to each replication. We computed
$$\frac{\text{number of incorrect sign declarations}}{\max\{\text{number of sign declarations},\ 1\}} \quad\text{and}\quad \frac{\text{number of correct sign declarations}}{k}$$
for each procedure on each replication. The mean of each of these quantities for each procedure was obtained over the 25,000 replications for each value of $\tau$ and provided corresponding estimates of the DFDR and DPCPR.

Figs. 1 and 2 plot these estimates of the DFDR and DPCPR, respectively, for five procedures as a function of $\tau$ for $m = 10$ means (or $k = 45$ pairwise differences) and $A = 0.025$. Here we have taken $\sigma^2/n = 1$. Besides the Bayes and EB procedures, we have included a fixed effects per-comparison procedure (labeled unadjusted), based on $z^*_{0.975} = 1.96$; a fixed effects per-family procedure (labeled Bonferroni), based on $z^*_{1-0.025/45} = 3.26$; and the directional version of the procedure (labeled B&H) proposed by Benjamini and Hochberg (1995) to control the fixed effects NFDR (nondirectional FDR), set here at $2A = 0.050$. (This choice for B&H controls the random effects DFDR at 0.025. The use of the directional version of the B&H procedure is advocated by Williams et al. (1999).) Note that, for these last three procedures, the test statistic we used is
$$z_{i,\mathrm{fixed}} = \sum_{j=1}^{m} c_{ij}\bar y_j\bigg/\sqrt{\sigma^2\sum_{j=1}^{m} c_{ij}^2/n_j},$$
obtained from our $z_i$ by letting $\tau^2 \to \infty$.
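A stripped-down version of this simulation, covering only the Bayes and unadjusted rules and using fewer replications, conveys the flavor of the power comparison; the specific settings below are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, sigma2_over_n, nu, A, reps = 10, 1.0, 0.0, 0.025, 5000
z_star = -norm.ppf(A)                                          # 1.96 for A = 0.025
C = np.array([[int(j == a) - int(j == b) for j in range(m)]
              for a in range(m) for b in range(a + 1, m)], dtype=float)
k = C.shape[0]                                                 # 45 pairwise differences

for tau in (0.5, 1.0, 2.0, 4.0, 8.0):
    power = {"Bayes": [], "unadjusted": []}
    for _ in range(reps):
        mu = rng.normal(nu, tau, m)
        ybar = rng.normal(mu, np.sqrt(sigma2_over_n))
        true_sign = np.sign(C @ mu)
        tau2 = tau ** 2
        # Bayes rule: z_i built from the posterior contrast mean and variance.
        g_tilde = C @ ((tau2 * ybar + sigma2_over_n * nu) / (tau2 + sigma2_over_n))
        g_var = (tau2 * sigma2_over_n / (tau2 + sigma2_over_n)) * (C**2).sum(axis=1)
        z_bayes = g_tilde / np.sqrt(g_var)
        # Unadjusted (fixed effects) rule: the tau^2 -> infinity limit of z_i.
        z_fixed = (C @ ybar) / np.sqrt(sigma2_over_n * (C**2).sum(axis=1))
        for name, z in (("Bayes", z_bayes), ("unadjusted", z_fixed)):
            correct = (np.abs(z) >= z_star) & (np.sign(z) == true_sign)
            power[name].append(correct.sum() / k)
    print(tau, {name: round(float(np.mean(v)), 3) for name, v in power.items()})
```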


Fig. 1. Random effects DFDR for five MCPs applied to all pairwise comparisons with $m = 10$ means and $\sigma^2/n = 1$, as a function of $\tau$.

Fig. 2. Random effects DPCPR (average power) for five MCPs applied to all pairwise comparisons with $m = 10$ means and $\sigma^2/n = 1$, as a function of $\tau$.


Fig. 3. Difference in random effects DPCPR (average power) for the unadjusted and Bayes MCPs, with $\sigma^2/n = 1$, as a function of $\tau$.

Fig. 4. Difference in random effects DPCPR (average power) for the Bayes and EB MCPs, with $\sigma^2/n = 1$, as a function of $\tau$ and $m$.

In Fig. 1 we see that all procedures, with the expected exception of the unadjusted method, produced values of the DFDR less than 0.025 for all values of $\tau$. Note that the Bayes procedure is quite conservative, especially for small values of $\tau$ (less than 0.5), with DFDR $\approx 0$. On the other hand, for $\tau \ge 1$ it is the least conservative among the DFDR-controlling procedures.


Fig. 5. Difference in random effects DPCPR (average power) for the EB and B&H MCPs, with $\sigma^2/n = 1$, as a function of $\tau$ and $m$.

Fig. 2 reflects the relationships observed in Fig. 1, but now expressed in terms of average power (DPCPR). The unadjusted procedure provides the greatest power, while the Bonferroni procedure provides the least power (except for $\tau \le 0.5$, where the Bayes procedure has essentially no power). For $\tau \ge 1.5$, Bayes has the greatest power among the procedures controlling the DFDR, approaching the power of the unadjusted method for $\tau \ge 4$. The EB procedure shows very similar power to Bayes for $\tau \ge 1.5$. B&H has power similar to both Bayes and EB, a result that is consistent with Shaffer's (1999) findings.

Fig. 3 compares the power of the unadjusted and Bayes procedures. Since we are looking at a per-comparison measure of power and both of these are defined as per-comparison procedures, their random effects power curves do not change with $m$. We see that the greatest difference in power between the two procedures occurs at $\tau = 1$ and is approximately 0.11. The difference decreases for larger values of $\tau$ and is less than 0.025 for $\tau \ge 3$.

Fig. 4 compares the average power of the Bayes and EB procedures for different values of $\tau$ and $m$. We may draw the following general conclusions from this comparison: EB is more powerful for small values of $\tau$, while Bayes is more powerful for large values of $\tau$; the differences are greater for small values of $m$ and smaller for larger values of $m$, with all differences less than 0.005 for $m = 50$.

We may observe that the Bayes procedure was developed to minimize Bayes risk, not to maximize power. We proved that this procedure also controls the DFDR at $A$ for all values of $\tau$. However, this control is very conservative for values of $\tau$ close to zero. The EB modification, on the other hand, does not minimize Bayes risk, but is designed to provide close control of the DFDR for small $\tau$ and, as a result, has greater power in this situation.


Fig. 5 compares the average power of EB and B&H as a function of $\tau$ and $m$. The first point to observe is that the absolute difference in power for the two procedures never exceeds 0.06, regardless of the values of $\tau$ and $m$. For $m = 2$ and 3, B&H is uniformly more powerful than EB. For $m \ge 4$, EB is more powerful than B&H, with the difference increasing gradually with increasing $m$. In interpreting these comparisons, it should be remembered that EB is a special-purpose procedure, specifically designed for the random effects model, while the B&H procedure is much more general, designed to control the FDR for a wide variety of models.

Acknowledgements

This paper benefited greatly from many years of collaboration with John W. Tukey, many fruitful discussions with Juliet Popper Shaffer and Yoav Benjamini, and many helpful suggestions from the editors and referees.

References

Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B 57, 289–300.

Berger, J.O., 1985. Statistical Decision Theory and Bayesian Analysis, 2nd Edition. Springer, New York.

Duncan, D.B., 1965. A Bayesian approach to multiple comparisons. Technometrics 7, 171–222.

Gelman, A., Tuerlinckx, F., 2000. Type S error rates for classical and Bayesian single and multiple comparison procedures. Comput. Statist. 15, 373–390.

Jones, L.V., Tukey, J.W., 2000. A sensible formulation of the significance test. Psychol. Methods 5, 411–414.

Shaffer, J.P., 1999. A semi-Bayesian study of Duncan's Bayesian multiple comparison procedure. J. Statist. Plann. Inference 82, 197–213.

Shaffer, J.P., 2002. Multiplicity, directional (Type III) errors and the null hypothesis. Psychol. Methods 7, 356–369.

Waller, R.A., Duncan, D.B., 1969. A Bayes rule for symmetric multiple comparisons problems. J. Amer. Statist. Assoc. 64, 1484–1503.

Williams, V.S.L., Jones, L.V., Tukey, J.W., 1999. Controlling error in multiple comparisons, with examples from state-to-state differences in educational achievement. J. Ed. Behavioral Statist. 24, 42–69.