OPTIMAL GROUP TESTING IN THE PRESENCE OF BLOCKERS
by
Scott A. Langfeldt, Jacqueline M. Hughes-Oliver, Sujit K. Ghosh, and S. Stanley Young
Institute of Statistics Mimeograph Series No. 2297
May 1997
NORTH CAROLINA STATE UNIVERSITY
Raleigh, North Carolina
Optimal Group Testing in the Presence of Blockers
Scott A. Langfeldt, Jacqueline M. Hughes-Oliver and Sujit Ghosh
North Carolina State University
S. Stanley Young
Glaxo Wellcome
Abstract
Testing in groups can lead to great efficiencies in total testing cost when searching
for individuals with some characteristic. If the presence of a blocking object can cause
a group with a positive object to test negative, there is a need to find optimal pooling
strategies to minimize the cost of testing and reduce the number of missed positive
individuals. We develop application driven cost functions to determine optimal testing
strategies. Also, formulas are derived that describe the behavior of three different
grouping strategies and provide examples illustrating how to determine the optimal
strategy. We show that the strategies resulting from these methods provide much
lower expected cost than classical methods. These results can be directly applied to
HIV blood testing and for screening compounds for potential drugs.
Key Words: Composite Sampling, Pooling, Compound Screening, Square Array Design
1 Introduction
Large pharmaceutical companies have different compounds in collections ranging in size from
several thousands to several hundreds of thousands. The discovery of a new drug begins by
first identifying active compounds (that is, those that have some biological effect) from these
collections. Since a new drug can be worth hundreds of millions or even billions of dollars,
finding new drugs is very competitive; it is critical these huge compound collections be
screened quickly and economically while reducing the chance of missing compounds that
have strong drug potential.
The handling of large numbers of compounds for screening is logistically complex. Archive
dry samples are put into liquid stores so that robotic liquid handling systems can be used.
Small amounts of these liquids are then transferred from master plates to daughter plates. A
typical plate has 8x12=96 wells or 16x24=384 wells in which the liquid stores of compounds
may be placed. (See Figure 1 for an example of a 96-well plate.) Liquid handling robots are then
programmed to select and pool samples. The activity of a pool is measured and individual
compounds in highly active pools are retested.
[Figure: plate layout with rows A through H and columns 1 through 12, one circle per well.]
Figure 1: 8x12 = 96 Well Plate used in Compound Screening.
Section 2 contains a review of the methods of pooling and an assessment of activity of a
pool. These methods assume that all pools containing active compounds will be identified
as such. The existence of blocking compounds that mask the effect of active compounds
invalidates the assumption of no false negative test results. This paper provides alternative
formulas and optimal pooling strategies that account for the existence of blockers. Section 3
describes the different components of the cost and optimality criteria, and Section 4 contains
guidelines and examples of using the results in Section 3. A discussion is presented in Section
5. Additional notation and details are included in the Appendix.
2 Group Testing in Compound Screening
2.1 Dorfman
The idea of group testing was first introduced by Dorfman (1943) to improve the efficiency of
screening all incoming US servicemen for syphilis during World War II. Dorfman suggests that,
instead of testing each serviceman's blood individually, the blood samples of k men be pooled;
if the pool tests negative for syphilis, then all servicemen in that pool are concluded to be
negative. If the pool tests positive, each man's blood is retested individually.
Suppose p is the proportion of individuals in the population that possess the characteristic,
N is the number of individuals to be classified, and k is the pool size. Under the
assumptions of perfect testing and random dispersion of the characteristic, Dorfman shows
that the expected number of tests is given by:
$$E_{T,Dorfman}(k) = N\left[\frac{k+1}{k} - (1-p)^{k}\right] \tag{1}$$
Feller (1968) shows that (1) is minimized by taking k to be the smallest integer larger
than $1/\sqrt{p}$. Intuitively, as p gets small, larger pools can be used, since even a large pool will
usually test negative and, as a result, many unnecessary individual tests can be eliminated. For
example, if 1% of the population has some characteristic, then the optimal k of 11 results in
an 80% reduction of tests relative to testing each individual separately!
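As a numerical check of (1) and of the rule of thumb for k, the following Python sketch evaluates the expected number of tests per individual, (k+1)/k - (1-p)^k, over a range of integer pool sizes. The function names and the search range of 2 to 50 are illustrative assumptions, not part of Dorfman's or Feller's presentation.

```python
import math


def dorfman_tests_per_individual(k: int, p: float) -> float:
    """Expected tests per individual: equation (1) divided by N."""
    return (k + 1) / k - (1 - p) ** k


def best_pool_size(p: float, k_max: int = 50) -> int:
    """Integer pool size in 2..k_max with the fewest expected tests."""
    return min(range(2, k_max + 1),
               key=lambda k: dorfman_tests_per_individual(k, p))


p = 0.01
k_rule = math.floor(1 / math.sqrt(p)) + 1   # smallest integer larger than 1/sqrt(p)
k_best = best_pool_size(p)                  # brute-force minimizer of (1)
saving = 1 - dorfman_tests_per_individual(k_best, p)
print(k_rule, k_best, round(saving, 3))
```

For p = 0.01 the brute-force search and the rule of thumb agree, and the saving relative to individual testing is close to the 80% quoted above.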
There are many multi-stage variations on the basic Dorfman strategy. For example, Finucan
(1965) suggests a multi-stage method which begins the same as the Dorfman strategy,
but instead of retesting all individuals in each positive pool, the individuals from positive
pools are re-pooled and then tested again in these new pools of possibly a different size. This
can be repeated several times. Also, Sterrett (1957) suggests a sequential scheme which again
begins as Dorfman; however, if a pool is positive, the individuals in that pool are tested one
at a time (sequentially) until a positive is discovered. Then the remaining individuals are
pooled and tested. If this new pool tests negative, then all the positive items in the original
pool were identified. If the pool tests positive, then these individuals are tested sequentially.
Others who have contributed to these multi-stage strategies include Sobel and Groll (1959),
Kumar and Sobel (1971) and many others. Although these strategies yield a smaller expected
number of tests, they also require more than two stages and thus will not be considered
further.
If pools always register positive when one or more individuals have the characteristic,
then the aforementioned methods based on Dorfman's strategy are all expected to
work well. Unfortunately, in compound screening there can be certain compounds, known
as blockers, that block the detection of an active compound when placed in the same pool
as an active. The basic Dorfman strategy with a small p leads to the use of large pool
sizes. However, as the pool size increases, the probability that a blocker is present with an
active also increases, raising the chances of missed active compounds. This clearly violates
the assumption of perfect testing. The criterion for optimality must now simultaneously
consider minimizing the probability of a false negative and minimizing the number of tests.
Our extension of the Dorfman strategy does this.
2.2 Square Array Designs
Phatarfod and Sudbury (1994) observe that, when testing blood for HIV, it is possible
for one blood sample to neutralize a positive sample; the neutralizing sample is called a blocker.
If we let f be the proportion of blockers in the population, then, intuitively, as f gets large
relative to p, it will be more and more difficult to detect positive individuals. Phatarfod
and Sudbury (1994) are interested in reducing the probability of missing positive individuals
when these blockers exist, while at the same time reducing the expected number of tests.
They suggest placing the individuals in square (k x k) trays and using one of two testing
(pooling) strategies.
1. SA1: Each of the k rows and k columns is pooled and tested, for a total of 2k
preliminary tests per tray. Then, all individuals that lie in a positive row and a positive
column are retested. We term this the AND strategy.
2. SA2: Each of the k rows of a k by k array is pooled and tested. If none of the k
rows is positive, then no further testing is conducted on that array. If exactly one row tests
positive, each of the individuals in that row is tested. If more than one row tests positive,
each of the k columns is pooled and tested, and all individuals that lie in a positive row
and a positive column are retested. This is a combination of the Dorfman scheme (zero or
one row positive) and SA1 (more than one row positive), and can result in three stages of
testing if more than one row tests positive.
Phatarfod and Sudbury (1994) show that, when blockers do not exist, the expected numbers of
tests for SA1 and SA2 (N times the expected number of tests per individual) are given by:
$$E_{T,SA1}(k) = N\left[2k^{-1} + 1 - 2(1-p)^{k} + (1-p)^{2k-1}\right] \tag{2}$$
$$E_{T,SA2}(k) = N\left[2k^{-1} + 1 - 2(1-p)^{k} + (1-p)^{2k-1} - k^{-1}(1-p)^{k^{2}} - p(1-p)^{k^{2}-k}\right] \tag{3}$$
For several possible values of p and assuming f = 0, Phatarfod and Sudbury (1994) find
corresponding optimal values of k that minimize the expected number of tests. They then
use these optimum values to examine the probabilities of a false negative and the expected
number of tests and find some improvements over simple Dorfman. One may reasonably
believe that if f were not restricted to equal 0 while finding the optimal k, then even greater
gains may be achieved. We investigate this belief in the remaining sections.
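To get a feel for the size of these improvements when f = 0, equations (1) and (2) can be compared numerically. The sketch below is illustrative only: the function names and the range of pool sizes are assumptions, and it covers SA1 but not SA2.

```python
def dorfman_per_individual(k: int, p: float) -> float:
    """Equation (1) divided by N: expected tests per individual."""
    return (k + 1) / k - (1 - p) ** k


def sa1_per_individual(k: int, p: float) -> float:
    """Equation (2) divided by N: expected tests per individual, no blockers."""
    return 2 / k + 1 - 2 * (1 - p) ** k + (1 - p) ** (2 * k - 1)


p = 0.01
pool_sizes = range(2, 51)
dorfman_best = min(dorfman_per_individual(k, p) for k in pool_sizes)
sa1_best = min(sa1_per_individual(k, p) for k in pool_sizes)
print(round(dorfman_best, 4), round(sa1_best, 4))
```

With p = 0.01 the best SA1 design needs fewer expected tests per individual than the best Dorfman design, in line with the improvements reported for the case without blockers.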
3 Costs of Conducting a Two-Stage Group Test
There are four basic steps required in two-stage group testing. First, the individual samples
are collected, organized and prepared for testing. Second, they are pooled. Third, the
pools are tested, and, fourth, the individuals from the positive pools are retested. Together
with the cost of missing a positive individual, these steps correspond to four basic costs:
the startup cost, which is the same regardless of the strategy selected; the cost of pooling
(which may or may not depend on the pool size); the cost of testing a cell, either pooled or
individual; and the cost of missing a positive individual.
Therefore, the total expected cost of using strategy S with pool size k is:
$$C_S(k) = C_0 + C_1 N_S(k) + C_2 E_{T,S}(k) + C_3 E_{M,S}(k) \tag{4}$$
where
$C_0$ = Startup cost,
$C_1$ = Cost of constructing one pool (assume cost of pooling does not depend on the size of the pool),
$C_2$ = Cost of a single test,
$C_3$ = Cost of an undetected positive,
$N_S(k)$ = Number of pools of size k for strategy S,
$E_{T,S}(k)$ = Expected total number of tests for strategy S of pool size k,
$E_{M,S}(k)$ = Expected number of undetected (missed) positives for strategy S using pools of size k.
Note that if we know the probability of missing a positive, $P_{M,S}(k)$, then the expected
number of missed positive individuals is simply $E_{M,S}(k) = N P_{M,S}(k)$.
Our goal is to find the strategy that minimizes the total expected cost. This strategy
will result in an optimum trade-off between the expected number of tests, number of pools,
and the expected number of missed positives.
This is not the first time that optimizing a cost function has been applied to a group
testing problem. Burns and Mauro (1987) suggest using a cost function when there are
probabilities of misclassification inherent in the test. They minimize a linear combination
of the total number of expected tests, the cost of a false positive and the cost of a false
negative.
3.1 Some Strategies
The easiest strategy is simply testing each individual separately. This will always result in
a total cost of $C_0 + C_2 N$. This is certainly not very appealing, but it can be the best strategy in
some cases.
As before, let p be the proportion of individuals with the characteristic, and let f be
the proportion of individuals that are blockers. These events are mutually exclusive. We
will consider three two-stage testing strategies for the purposes of classifying each of N
individuals according to whether they have the characteristic: Dorfman, Square Array with
AND Retesting and Square Array with OR Retesting.
1. Dorfman. Pool each row and test for the characteristic of interest. If the pool is
negative, conclude all individuals in the pool are negative; if a pool is positive, retest each
individual separately.
In a square array strategy, as proposed by Phatarfod and Sudbury (1994), we randomly
place the individuals in square (k by k) arrays, and each of the k rows and k columns is
pooled and tested. We can imagine two retesting strategies:
2. Square Array with AND retesting. Retest all individuals that are in a positive column
and a positive row. (This is equivalent to the Phatarfod and Sudbury (1994) SA1 strategy.)
This strategy will result in twice as many initial pools as Dorfman, but fewer total tests, for
certain values of p, will be required. In addition, since a positive individual will be missed
if a blocker occurs either in the same row or in the same column, the probability of a false
negative test result can be high.
3. Square Array with OR retesting. Retest all individuals that are in a positive column
or a positive row. If missed positives are costly, this strategy might be a good choice. There
are many total tests required, but an individual will be missed only if a blocker occurs in
the same row and the same column.
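The difference between the two retesting rules can be seen on a single small tray. The following sketch is a toy illustration under stated assumptions: the cell labels ("+", "b", "0"), the function names and the 3 by 3 tray are all hypothetical, and a pool is taken to read positive exactly when it contains at least one positive and no blocker.

```python
from typing import List, Set, Tuple

ACTIVE, BLOCKER, INERT = "+", "b", "0"
Cell = Tuple[int, int]


def pool_tests_positive(cells: List[str]) -> bool:
    """A pool reads positive only if it holds a positive and no blocker."""
    return ACTIVE in cells and BLOCKER not in cells


def retest_sets(grid: List[List[str]]) -> Tuple[Set[Cell], Set[Cell]]:
    """Return the (AND, OR) sets of cells selected for individual retesting."""
    k = len(grid)
    row_pos = [pool_tests_positive(grid[i]) for i in range(k)]
    col_pos = [pool_tests_positive([grid[i][j] for i in range(k)])
               for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    and_set = {(i, j) for i, j in cells if row_pos[i] and col_pos[j]}
    or_set = {(i, j) for i, j in cells if row_pos[i] or col_pos[j]}
    return and_set, or_set


# Hypothetical tray: the positive at (0, 0) shares its row with a blocker.
tray = [
    [ACTIVE, INERT, BLOCKER],
    [INERT, INERT, INERT],
    [INERT, INERT, INERT],
]
and_cells, or_cells = retest_sets(tray)
print(sorted(and_cells), sorted(or_cells))
```

Here the blocker masks row 0, so AND retesting selects nothing and the positive is missed, while OR retesting still selects every cell of column 0 and recovers it at the price of two extra individual tests.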
To minimize and compare cost functions, we need to determine the number of pools,
Ns(k), the expected number of total tests, ET,s(k), and the expected number of missed
positives, EM,s(k), for each of the three proposed strategies, Dorfman, Square Array AND
retesting and Square Array OR retesting. The details of the derivations are given in the
Appendix, and the results are summarized in Table 1.
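Table 1: Expected Values Used in Cost Function Optimization.
$$
\begin{array}{llll}
\text{Strategy} & N_S(k) & E_{T,S}(k)^{a} & E_{M,S}(k) \\
\hline
\text{Dorfman} & N k^{-1} & N\left[k^{-1} + R_{p,f}(k)\right] & N p\left[1 - (1-f)^{k-1}\right] \\
\text{Square Array AND} & 2N k^{-1} & N\left[2k^{-1} + p(1-f)^{2(k-1)} + (1-p-f)R_{p,f}^{2}(k-1)\right] & N p\left[1 - (1-f)^{2(k-1)}\right] \\
\text{Square Array OR} & 2N k^{-1} & N\left\{2k^{-1} + 2R_{p,f}(k) - p(1-f)^{2(k-1)} - (1-p-f)R_{p,f}^{2}(k-1)\right\} & N p\left[1 - (1-f)^{k-1}\right]^{2}
\end{array}
$$
$^{a}$ $R_{p,f}(k) = (1-f)^{k} - (1-p-f)^{k}$, the probability that a pool of size k tests positive (equation (7) in the Appendix).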
We can now determine optimal strategies for any situation where p, f, $C_0$, $C_1$, $C_2$, and
$C_3$ are known.
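The entries of Table 1 and the cost function (4) translate directly into code. The following Python sketch is one possible transcription, not the authors' implementation: the function names, the strategy labels and the argument order are assumptions, and the usage line borrows the parameter values of Scenario 1 in Section 4.1.

```python
from typing import Tuple


def r(k: int, p: float, f: float) -> float:
    """R_{p,f}(k): probability that a pool of size k tests positive."""
    return (1 - f) ** k - (1 - p - f) ** k


def table1(strategy: str, n: int, k: int, p: float, f: float) -> Tuple[float, float, float]:
    """Return (pools, expected tests, expected missed positives) from Table 1."""
    clear = (1 - f) ** (k - 1)     # no blocker among the other k-1 pool members
    # Probability that both the row and the column of a cell test positive.
    both = p * clear ** 2 + (1 - p - f) * r(k - 1, p, f) ** 2
    if strategy == "dorfman":
        return n / k, n * (1 / k + r(k, p, f)), n * p * (1 - clear)
    if strategy == "and":
        return 2 * n / k, n * (2 / k + both), n * p * (1 - clear ** 2)
    if strategy == "or":
        return (2 * n / k, n * (2 / k + 2 * r(k, p, f) - both),
                n * p * (1 - clear) ** 2)
    raise ValueError(f"unknown strategy: {strategy}")


def total_cost(strategy: str, n: int, k: int, p: float, f: float,
               c0: float, c1: float, c2: float, c3: float) -> float:
    """Equation (4): startup + pooling + testing + missed-positive costs."""
    pools, tests, missed = table1(strategy, n, k, p, f)
    return c0 + c1 * pools + c2 * tests + c3 * missed


pools, tests, missed = table1("dorfman", 10_000, 10, 0.02, 0.01)
print(round(pools), round(tests), round(missed, 1))
```

Keeping the three expected values separate from the cost weights makes it easy to re-cost the same design under different values of the cost parameters.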
3.2 Determining Cost Parameters
Assigning values to the cost parameters, $C_0$ through $C_3$, is a critical step. It may be possible to
assign to each parameter a dollar amount which includes manpower, cost of materials, time,
and maybe opportunity costs. If that is not possible, then a more creative approach may be
necessary. To begin the process, it may be easiest to set $C_2$ (the cost of a single test) to be
1. Then the other cost parameters can be determined relative to $C_2$. $C_1$ would be the cost
of pooling relative to conducting a test. So, $C_1$ = 2, for example, would imply that the act of
pooling one group is twice as expensive as conducting a test. The cost of missing a positive
individual can be thought of as the answer to "How many additional tests would I be willing to run if I were
guaranteed to identify one previously missed positive individual?" So, $C_3$ = 100 implies
that the experimenter would be willing to run 100 additional tests if it meant identifying
one previously missed positive.
In compound screening, the cost of a missed positive can be estimated as the cost of
finding a lead compound. A lead compound is biologically active, amenable to chemical
modification, novel in structure or some other aspect and is not too different in any of
a number of other aspects. It typically costs on the order of half of a million dollars to
develop a high throughput assay and test 50 to 100 thousand compounds. Typically 50 to
100 potential leads are found. So, a lead costs 5 to 10 thousand dollars. There is often
redundancy in a compound collection so that if one particular compound is missed, it is
likely that a similar compound is found which will point back at the missed compound. This
would decrease the effective cost of missing an active compound. Since a single assay costs
approximately 1 dollar, $C_3$ (in units of $C_2$) would be significantly less than 5 to 10 thousand.
Its value can be set according to the goal of the screen: identifying most or just some of the
active compounds.
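The arithmetic behind the 5 to 10 thousand dollar figure is short enough to spell out. In the sketch below the variable names are illustrative, and the dollar amounts are the order-of-magnitude values quoted in the text rather than measured costs.

```python
screen_cost = 500_000        # dollars to develop the assay and run the screen
leads_found = (50, 100)      # typical range of potential leads per screen
cost_per_assay = 1.0         # dollars per test, taken as the unit C2 = 1

# Cost of finding one lead, for the low and high ends of the lead count.
cost_per_lead = [screen_cost / n for n in leads_found]

# Expressed in units of one test, these are upper bounds on C3 before
# allowing for redundancy in the compound collection.
c3_upper_bounds = [c / cost_per_assay for c in cost_per_lead]
print(cost_per_lead, c3_upper_bounds)
```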
4 Minimizing Cost of Strategies
Once p, f, $C_0$, $C_1$, $C_2$ and $C_3$ have been determined, the difficult part is finished. Now,
all that remains is to evaluate the cost function and determine which strategy, at which
value of k, gives the minimum cost.
4.1 Scenario 1: p = 0.02, f = 0.01, $C_0$ = 200, $C_1$ = 0.5, $C_2$ = 1 and $C_3$ = 100
Suppose it is known that p = 0.02 and f = 0.01. Also, the experimenter desires to screen
10,000 individuals and has determined that the initial startup cost is 200, the cost of
conducting a single test is 1, the cost of pooling the samples is 0.5 (half the cost of a test) and
the cost of a missed positive is 100. This corresponds to cost parameter values of $C_0$ = 200, $C_1$ = 0.5,
$C_2$ = 1 and $C_3$ = 100 and a cost function of:
$$C_S(k) = 200 + (0.5)N_S(k) + (1)E_{T,S}(k) + (100)E_{M,S}(k)$$
To determine the best strategy, simply calculate the k that minimizes cost for each of the
three strategies using the formulas given in Table 1. For example, Dorfman with k = 10
results in a total cost of:
$$
\begin{aligned}
C_D(10) &= 200 + (0.5)N_D(10) + (1)E_{T,D}(10) + (100)E_{M,D}(10) \\
        &= 200 + (0.5)(1000) + (1)(2670) + (100)(17.3) \\
        &\approx 5099
\end{aligned}
$$
Similarly, we can calculate the cost for the Square Array AND and OR retesting strategies
where k = 10 to be:
$$C_A(10) = 6905, \qquad C_O(10) = 6294,$$
respectively. These need to be calculated for all reasonable k, and will result in the costs
presented in Table 2. Dorfman (k = 7), Square Array AND retesting (k = 9) and Square
Array OR retesting (k = 11) are optimal where Dorfman (k = 7) has the optimal overall
minimum cost of 4754.
Table 2: Scenario 1: Total Costs for Dorfman, AND and OR retesting strategies:
p = 0.02, f = 0.01, $C_0$ = 200, $C_1$ = 0.5, $C_2$ = 1, $C_3$ = 100.

             Total Cost
  k    Dorfman     AND      OR
  6      4765     7376    7155
  7      4754     7050    6744
  8      4824     6898    6499
  9      4945     6862    6361
 10      5099     6905    6294
 11      5276     7003    6277
 12      5469     7142    6295
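The search over k described above is a one-dimensional grid search for each strategy. The following Python sketch is an illustration under stated assumptions: the function names and the search range of 2 to 30 are choices made here, the formulas are those of Table 1, and the default arguments are the Scenario 1 values.

```python
def r(k, p, f):
    """R_{p,f}(k): probability that a pool of size k tests positive."""
    return (1 - f) ** k - (1 - p - f) ** k


def cost(strategy, k, n=10_000, p=0.02, f=0.01, c0=200, c1=0.5, c2=1, c3=100):
    """Total expected cost (4) using Table 1; defaults are Scenario 1."""
    clear = (1 - f) ** (k - 1)
    both = p * clear ** 2 + (1 - p - f) * r(k - 1, p, f) ** 2
    if strategy == "Dorfman":
        pools, tests, miss = n / k, n * (1 / k + r(k, p, f)), n * p * (1 - clear)
    elif strategy == "AND":
        pools, tests, miss = 2 * n / k, n * (2 / k + both), n * p * (1 - clear ** 2)
    elif strategy == "OR":
        pools = 2 * n / k
        tests = n * (2 / k + 2 * r(k, p, f) - both)
        miss = n * p * (1 - clear) ** 2
    else:
        raise ValueError(strategy)
    return c0 + c1 * pools + c2 * tests + c3 * miss


best = {}
for strategy in ("Dorfman", "AND", "OR"):
    # Evaluate every candidate pool size and keep the cheapest one.
    k_best = min(range(2, 31), key=lambda k: cost(strategy, k))
    best[strategy] = (k_best, round(cost(strategy, k_best)))
print(best)
```

The minimizing pool sizes and costs found this way agree with the per-strategy optima read off Table 2.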
4.2 Scenario 2: p = 0.02, f = 0.01, $C_0$ = 200, $C_1$ = 0.5, $C_2$ = 1 and $C_3$ = 1000
Now suppose the cost of a missed positive is much higher, $C_3$ = 1000, while everything else
remains the same. We now calculate the costs found in Table 3. Notice the pool sizes have
shrunk significantly (k = 3,3,8) and the optimal strategy is the Square Array OR retesting
(k = 8) with minimum cost 7330. Since the cost of a missed positive is so high and the
Square Array OR retesting strategy helps prevent missed positives, the OR strategy has
the minimal cost. This example illustrates the trade-offs between missed positives and total
expected number of tests.
4.3 Scenario 3: p = 0.02, f = 0.01, $C_0$ = 200, $C_1$ = 2, $C_2$ = 1 and $C_3$ = 1000
Now consider the original scenario, except that the cost of a missed positive is large, $C_3$ = 1000,
and the cost of pooling is twice the cost of conducting a test, $C_1$ = 2. We would expect the
overall cost of the AND retesting strategy to be higher and the OR retesting strategy to be
lower relative to the others. Also, the additional cost of constructing a pool will result in
higher costs for AND and OR and higher costs for smaller pool sizes.
Calculating the costs for all reasonable k results in Table 4. The best strategies are
achieved by using pool sizes of k = 4, 4, 10 for Dorfman, AND and OR respectively. The
lowest cost among the pooling strategies, 10640, is found by using OR with k = 10. Notice, however,
that if we tested each individual separately, the total cost would be $C_0 + C_2 N$ = 200 + 10000 = 10200.
This is a case where we
should not pool; we should instead test individually.
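The comparison against individual testing can be checked with the same formulas. The sketch below is illustrative: the function name, the dictionary layout and the search range of 2 to 30 are assumptions made here, the formulas are those of Table 1, and the default arguments are the Scenario 3 values.

```python
def strategy_costs(k, n=10_000, p=0.02, f=0.01, c0=200, c1=2, c2=1, c3=1000):
    """Total expected cost (4) of each pooling strategy at pool size k."""

    def r(m):
        # Probability that a pool of size m tests positive.
        return (1 - f) ** m - (1 - p - f) ** m

    clear = (1 - f) ** (k - 1)                            # no blocker among pool mates
    both = p * clear ** 2 + (1 - p - f) * r(k - 1) ** 2   # row and column both positive
    parts = {
        "Dorfman": (n / k, n * (1 / k + r(k)), n * p * (1 - clear)),
        "AND": (2 * n / k, n * (2 / k + both), n * p * (1 - clear ** 2)),
        "OR": (2 * n / k, n * (2 / k + 2 * r(k) - both), n * p * (1 - clear) ** 2),
    }
    return {name: c0 + c1 * pools + c2 * tests + c3 * miss
            for name, (pools, tests, miss) in parts.items()}


individual_cost = 200 + 1 * 10_000   # C0 + C2 * N: no pools, no missed positives
pooled = {(name, k): c
          for k in range(2, 31)
          for name, c in strategy_costs(k).items()}
best_key = min(pooled, key=pooled.get)
print(best_key, round(pooled[best_key]), individual_cost)
```

Even the cheapest pooled design in this range costs more than individual testing, which is the conclusion drawn above for this scenario.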
For a fixed p and f, as $C_3$ increases, the Square Array OR retesting strategy becomes
more attractive. Also, as the cost of pooling, $C_1$, increases, Dorfman becomes better relative
to the Square Array strategies.
Table 5 shows optimal strategies for a fixed cost function. Using $C_0$ = 200, $C_1$ = 2,
$C_2$ = 1 and $C_3$ = 1000 we get optimal strategies for different values of p and f. (These costs
correspond to those in Scenario 3.)
As f increases, the cost of the Square Array OR retesting strategy becomes lower and
lower relative to the other strategies' costs. Also, as p increases, the optimal pool sizes
decrease, as expected. Other than pool size, p has little effect on the strategy. There are
examples where the strategy changes as p changes, but this is not very common and usually
only occurs when there is little difference between the total costs of two strategies over the
range of reasonable p.
Figure 5 shows how the cost function changes as k changes. For reference, panels (a), (c)
and (g) correspond to Scenarios 1, 2 and 3 respectively. Where the cost of a missed positive,
$C_3$, is high and blockers are not too uncommon (panels (c) and (g)), the cost increases rapidly
as k increases. Where $C_3$ is low or blockers are uncommon, the cost stays relatively
flat, so the selection of k is not extremely critical once the strategy has been determined.
Also, in panel (g), none of the costs drops below 10200, the cost of individual testing,
reinforcing the fact that in this scenario, individual testing is the best choice.
5 Discussion
Square array strategies clearly have some advantages over classical Dorfman in several cases.
When f is high relative to p, or when the cost of a missed positive ($C_3$) is high, the OR
retesting strategy has low total cost. The AND retesting strategy works very well when
the cost of a missed positive ($C_3$) is low or f is low. It is interesting to note that even
when f = 0, the AND retesting strategy generally shows much lower costs than Dorfman.
In short, by using information about the proportion of blockers and the costs of testing,
pooling and missing positive individuals, a design can be determined which minimizes the
cost of identifying individuals with the characteristic.
A disadvantage of using square arrays is that they are not practical for a small group of
individuals if $N/k^{2}$ is not close to an integer. For example, suppose the experimenter is
interested in screening 100 individuals where $C_0$ = 200, $C_1$ = 2, $C_2$ = 1 and $C_3$ = 1000, and where it
is known that p = 0.005 and f = 0.01. For this scenario, the optimal test is OR with k = 18.
However, this requires $18^{2}$ = 324 individuals to fill one square array. In this case, it may be
more reasonable to form 12 pools using the optimal Dorfman strategy of k = 8 and conduct
a test on each of the four individuals left over, or to compare the costs of AND and OR with k = 10.
There is need for additional research in this area.
If a large number of individuals are to be tested, it is not essential that $N/k^{2}$ be close to
an integer in order to use a square array strategy. For example, a typical size for a collection
of compounds is 10000. In the scenario above, where the optimal test is a Square Array OR
strategy with k = 18, 30 plates would be filled, with a remaining 280 (2.80%) compounds
that are not assigned to a plate. These remainders could be tested individually, placed
in a Square Array with k = 16 (24 remaining), or tested in Dorfman pools of k = 8. In
short, as N gets large, a smaller and smaller proportion of the individuals will be left out
of the original pools and there will be significant cost savings regardless of the number of
individuals remaining after the original pooling. In other words, for large samples, it is not
essential that $N/k^{2}$ be close to an integer.
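The remainder counts quoted in these two examples follow from integer division. The short sketch below reproduces them; the helper name is hypothetical, and it only counts individuals without evaluating any costs.

```python
def square_array_split(n: int, k: int):
    """Return (number of full k-by-k arrays, individuals left unassigned)."""
    return divmod(n, k * k)


plates, remaining = square_array_split(10_000, 18)   # large collection, k = 18
share = remaining / 10_000                           # fraction left unassigned
second_pass = square_array_split(remaining, 16)      # one 16 x 16 array on the rest
small_needed = 18 ** 2                               # individuals needed for one array
dorfman_pools, dorfman_left = divmod(100, 8)         # N = 100 in Dorfman pools of 8
print(plates, remaining, f"{share:.2%}", second_pass,
      small_needed, dorfman_pools, dorfman_left)
```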
In Scenario 3 we found that none of the strategies was better than testing each individual
separately. This was due to a high cost of pooling as well as a high cost of missing a positive
individual. If the cost of pooling is very high, this will result in none of the group testing
strategies being very effective because they are more expensive than individual testing. If
this is the case, the experimenter should not force pools as the cost would be high and many
positive individuals could be missed.
In the family of two-dimensional strategies, only square arrays were considered here.
There may be some large improvements in total cost if we generalize to rectangular k by l
arrays. Although the search for a minimum over rectangular arrays will be more difficult
than for square arrays, it should be straightforward to find the k and l that globally
minimize total cost.
Also, some significant cost benefits may emerge if we no longer limit our investigation
to two-stage tests and consider various multi-stage or sequential testing strategies. There
are situations, however, where these multi-stage strategies are not practical. For example,
in compound screening, programming the robots that pool the compounds is not a trivial
chore, and since the time delay costs are so high, any avoidable delay is unacceptable.
Throughout this paper, we assumed that p and f were both fixed and known. It is
more likely the case that these values are completely unknown or may change as the work
progresses. In these situations, it is important that a strategy be devised to estimate p and
f while at the same time identifying individuals with the characteristic of interest. One
approach would be to run a preliminary test to estimate p and f and then use these values
to design the strategy and then implement it on a large collection of individuals. If the initial
experiment was in a k by l rectangular array, where k is very different from l, then this may
be possible. Also, a Bayesian approach would be appropriate if the experimenter has some
prior information about these values. Either of these methods would assist the experimenter
in estimating p and f.
Some applications require that the pool sizes do not change once the bulk of the
experiment has begun, making the initial estimate of p and f even more critical. This may
be inconvenient, but would be required only once. However, this would not allow for
modification if later in the screening it becomes clear that p and f are very different than
previously believed.
The strategies discussed in the paper assume that the individuals being tested are stochastically
independent. Usually, compound collections are ordered according to when they were
acquired and similar compounds are collected in batches. If these compounds are placed in
trays in the collection order, adjacent compounds are very likely to be similar. This violates
the assumption of stochastic independence, and makes these methods flawed. However, if
the compounds are instead randomly assigned to the pools, then this assumption will still
be valid. In HIV testing, blood collected in batches may tend to be similar (e.g., from the
same geographical location, etc.) which again will violate this assumption.
The Phatarfod and Sudbury (1994) SA1 strategy is identical in description to the AND
strategy of Section 3.1: an individual will be retested if it lies in a positive row and a positive
column. Intuitively, a false negative will occur if either the row or the column contains a
blocker, an event whose probability is higher than that of a blocker in the row alone or in the column alone.
Phatarfod and Sudbury (1994) claimed that SA1 would reduce the chance of a missed positive
as compared to Dorfman of the same pool size. Phatarfod and Sudbury (1994) incorrectly
stated that the probability of a false negative using SA1 is $p[1 - (1-f)^{k-1}]^{2}$; this is the
probability of a missed positive for the OR retesting strategy. The correct probability is
$p[1 - (1-f)^{2(k-1)}]$, which is strictly larger than the probability of a false negative using
Dorfman, $p[1 - (1-f)^{k-1}]$, for all f > 0 and k > 1. The use of SA1 (a.k.a. AND) does not decrease
the probability of a false negative, but instead increases it as compared to Dorfman of the
same pool size.
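The ordering of the three false negative probabilities follows from a short calculation. Writing $q = (1-f)^{k-1}$ (a shorthand introduced here for convenience, not used elsewhere in the paper), we have 0 < q < 1 whenever 0 < f < 1 and k > 1, and the expressions from equations (11), (14) and (17) of the Appendix compare as follows:

```latex
\begin{align*}
P_{M,A}(k) &= p\,(1 - q^{2}) = p\,(1-q)(1+q) \;>\; p\,(1-q) = P_{M,Dorfman}(k), \\
P_{M,O}(k) &= p\,(1-q)^{2} \;<\; p\,(1-q) = P_{M,Dorfman}(k).
\end{align*}
```

So, for the same pool size, AND retesting misses more positives than Dorfman and OR retesting misses fewer.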
References
[1] Burns, K.C. and Mauro, C.A. (1987). Group Testing with Test Error as a Function of
Concentration. Communications in Statistics 16 (10), 2821-2837.
[2] Dorfman, R. (1943). The Detection of Defective Members of Large Populations. Annals
of Mathematical Statistics 14, 436-440.
[3] Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd
ed. Wiley, New York.
[4] Finucan, H.M. (1965). The Blood Testing Problem. Applied Statistics 13, 43-50.
[5] Kumar, S. and Sobel, M. (1971). Finding a Single Defective in Binomial Group Testing.
Journal of the American Statistical Association 66, 824-828.
[6] Phatarfod, R.M. and Sudbury, A. (1994). The Use of a Square Array Scheme in Blood
Testing. Statistics in Medicine 13, 2337-2343.
[7] Sobel, M. and Groll, P.A. (1959). Group Testing to Eliminate Efficiently all Defectives
in a Binomial Sample. The Bell System Technical Journal 38, 1179-1252.
[8] Sterrett, A. (1957). On the Detection of Defective Members of Large Populations. Annals
of Mathematical Statistics 28, 1033-1036.
APPENDIX
Here we present the details that lead to the formulas given in Table 1. Let
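$A_{ij}^{+}$ = Event individual in row i, column j has the characteristic,
$A_{ij}^{-}$ = Event individual in row i, column j is a blocker,
$A_{ij}^{0}$ = Event individual in row i, column j is neither a blocker nor has the characteristic,
$R_i^{+}$ = Event that row i is tested to have the characteristic,
$R_i^{-}$ = Event that row i is tested to not have the characteristic,
$C_j^{+}$ = Event that column j is tested to have the characteristic,
$C_j^{-}$ = Event that column j is tested to not have the characteristic,
$P(A_{ij}^{+}) = p$, $P(A_{ij}^{-}) = f$, $P(A_{ij}^{0}) = 1 - p - f$.
Then assuming individuals have been randomly placed in pools (rows or columns) of size
k, we can show that
\begin{align}
P(R_i^{+} \mid A_{ij}^{+}) &= P(R_i \text{ contains no blockers} \mid A_{ij}^{+}) \notag \\
 &= (1-f)^{k-1} \tag{5} \\
P(R_i^{+}) &= P(R_i \text{ contains no blockers and at least one with characteristic}) \tag{6} \\
 &= \sum_{i=1}^{k} \frac{k!}{0!\, i!\, (k-i)!}\, p^{i} f^{0} (1-p-f)^{k-i} \notag \\
 &= (1-f)^{k} \sum_{i=1}^{k} \binom{k}{i} \left(\frac{p}{1-f}\right)^{i} \left(1 - \frac{p}{1-f}\right)^{k-i} \notag \\
 &= (1-f)^{k} \left(1 - \left(1 - \frac{p}{1-f}\right)^{k}\right) \notag \\
 &= (1-f)^{k} - (1-p-f)^{k}, \tag{7} \\
P(R_i^{+} \cap C_j^{+}) &= P(A_{ij}^{+})P(R_i^{+} \cap C_j^{+} \mid A_{ij}^{+}) + P(A_{ij}^{-})P(R_i^{+} \cap C_j^{+} \mid A_{ij}^{-}) \notag \\
 &\quad + P(A_{ij}^{0})P(R_i^{+} \cap C_j^{+} \mid A_{ij}^{0}) \notag \\
 &= p\,P(R_i^{+} \mid A_{ij}^{+})P(C_j^{+} \mid A_{ij}^{+}) + (1-p-f)\,P(R_i^{+} \mid A_{ij}^{0})P(C_j^{+} \mid A_{ij}^{0}) \notag \\
 &= p(1-f)^{2(k-1)} + (1-p-f)\left[(1-f)^{k-1} - (1-p-f)^{k-1}\right]^{2}, \tag{8} \\
P(R_i^{+} \cup C_j^{+}) &= P(R_i^{+}) + P(C_j^{+}) - P(R_i^{+} \cap C_j^{+}) \notag \\
 &= 2\left[(1-f)^{k} - (1-p-f)^{k}\right] - p(1-f)^{2(k-1)} \notag \\
 &\quad - (1-p-f)\left[(1-f)^{k-1} - (1-p-f)^{k-1}\right]^{2}. \tag{9}
\end{align}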
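Equation (7) can be checked by simulation. The following Monte Carlo sketch is an illustration under the Appendix's assumptions (independent individuals, a pool reads positive exactly when it contains at least one positive and no blocker); the function name, the number of trials and the seed are arbitrary choices made here.

```python
import random


def simulate_row_positive(k: int, p: float, f: float,
                          trials: int, seed: int = 0) -> float:
    """Estimate P(row tests positive) by drawing k independent individuals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        has_active = False
        has_blocker = False
        for _ in range(k):
            u = rng.random()
            if u < p:
                has_active = True      # individual has the characteristic
            elif u < p + f:
                has_blocker = True     # individual is a blocker
        # The row reads positive only with a positive present and no blocker.
        hits += has_active and not has_blocker
    return hits / trials


k, p, f = 10, 0.02, 0.01
exact = (1 - f) ** k - (1 - p - f) ** k     # equation (7)
estimate = simulate_row_positive(k, p, f, trials=100_000)
print(round(exact, 4), round(estimate, 4))
```

The simulated frequency should agree with the closed form to within ordinary sampling error for this number of trials.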
These results provide the basis for the required formulas for the three strategies, as given
below.
Dorfman
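\begin{align}
E_{T,Dorfman}(k) &= N/k + N P(R_i^{+}) \notag \\
 &= N\left[\frac{1}{k} + (1-f)^{k} - (1-p-f)^{k}\right] \tag{10} \\
P_{M,Dorfman}(k) &= P(R_i^{-} \cap A_{ij}^{+}) \notag \\
 &= P(A_{ij}^{+})P(R_i^{-} \mid A_{ij}^{+}) \notag \\
 &= p\left[1 - (1-f)^{k-1}\right] \tag{11} \\
N_{Dorfman}(k) &= N/k \tag{12}
\end{align}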
Square Array with AND Retesting
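\begin{align}
E_{T,A}(k) &= 2N/k + N P(R_i^{+} \cap C_j^{+}) \notag \\
 &= N\left[2/k + p(1-f)^{2(k-1)} + (1-p-f)\left[(1-f)^{k-1} - (1-p-f)^{k-1}\right]^{2}\right] \tag{13} \\
P_{M,A}(k) &= P(A_{ij}^{+})\left[1 - P(R_i^{+} \cap C_j^{+} \mid A_{ij}^{+})\right] \notag \\
 &= p\left[1 - P(R_i^{+} \mid A_{ij}^{+})P(C_j^{+} \mid A_{ij}^{+})\right] \notag \\
 &= p\left[1 - (1-f)^{2(k-1)}\right] \tag{14} \\
N_{A}(k) &= 2N/k \tag{15}
\end{align}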
Square Array with OR Retesting
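\begin{align}
E_{T,O}(k) &= N\left\{2/k + 2\left[(1-f)^{k} - (1-p-f)^{k}\right] - p(1-f)^{2(k-1)} - (1-p-f)\left[(1-f)^{k-1} - (1-p-f)^{k-1}\right]^{2}\right\} \tag{16} \\
P_{M,O}(k) &= P(A_{ij}^{+})P(R_i^{-} \cap C_j^{-} \mid A_{ij}^{+}) \notag \\
 &= P(A_{ij}^{+})P(R_i^{-} \mid A_{ij}^{+})P(C_j^{-} \mid A_{ij}^{+}) \notag \\
 &= p\left(1 - (1-f)^{k-1}\right)^{2} \tag{17} \\
N_{O}(k) &= 2N/k \tag{18}
\end{align}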
[Figure: eight panels, (a) through (h), each plotting total cost against pool size (10 to 50) for the Dorfman, AND and OR strategies. Panels (a) and (b): $C_1$ = 0.5, $C_2$ = 1, $C_3$ = 100. Panels (c) and (d): $C_1$ = 0.5, $C_2$ = 1, $C_3$ = 1000. Panels (e) and (f): $C_1$ = 2, $C_2$ = 1, $C_3$ = 100. Panels (g) and (h): $C_1$ = 2, $C_2$ = 1, $C_3$ = 1000. Panels (a), (c) and (g) use p = 0.02, f = 0.01; panels (b) and (d) use p = 0.01, f = 0.001.]
Figure 1: Total Cost as a Function of Pool Size for Three Retesting Strategies
Table 3: Scenario 2: Total Costs for Dorfman, AND and OR retesting strategies:
p = 0.02, f = 0.01, $C_0$ = 200, $C_1$ = 0.5, $C_2$ = 1, $C_3$ = 1000.

             Total Cost
  k    Dorfman     AND       OR
  3      9756    18288    11225
  4     10643    19624     9162
  5     12003    21891     8116
  6     13587    24587     7587
  7     15288    27500     7361
  8     17052    30524     7330
  9     18850    33599     7435
Table 4: Scenario 3: Total Costs for Dorfman, AND and OR retesting strategies:
p = 0.02, f = 0.01, $C_0$ = 200, $C_1$ = 2, $C_2$ = 1, $C_3$ = 1000.

             Total Cost
  k    Dorfman     AND       OR
  4     14393    27124    16662
  5     15003    27891    14116
  6     16087    29587    12587
  7     17430    31786    11647
  8     18927    34273    11080
  9     20517    36933    10768
 10     22166    39692    10640
 11     23851    42507    10650
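Table 5: Optimal strategies for different values of p and f, using $C_0$ = 200, $C_1$ = 2, $C_2$ = 1 and $C_3$ = 1000.
[Table contents not recoverable from the source.]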