

Vol 41, No 1, January 1991
Laboratory Animal Science
Copyright © 1991 by the American Association for Laboratory Animal Science

MD Mann, DA Crouse and ED Prentice

Animal Numbers in Research

From the Department of Physiology and Biophysics (Mann) and the Department of Anatomy (Crouse, Prentice), University of Nebraska Medical Center, Omaha, NE.

Consistent with the national trend, the number of animals used in research at the University of Nebraska Medical Center has declined dramatically since 1979. This general downward trend in animal use is not simply the result of animal welfare legislation because its beginning clearly precedes the Animal Welfare Act of 1985. Declining usage also could be the result of increased cost of animals and their care, but increasing grant funds should have partly offset that effect. A number of factors are probably responsible for this trend, among them increased concern among scientists for animal welfare, increased cost of purchase and maintenance of animals, decreased funding of animal-related research from Federal sources, increased oversight by institutional committees, and increased emphasis on molecular approaches and biotechnology, which often require fewer animals. It is not possible to rank-order these factors by importance. Clearly, the reasons for the decline in the number of animals used are multiple.

Over the last decade, public awareness of the use of animals in biomedical research has generally increased, whereas public attitudes toward such research have not necessarily changed. Most people still recognize the potential benefits of animal research and believe that these benefits outweigh the costs. At the same time, however, the public is concerned that animals be used both humanely and wisely, e.g., wisely in terms of conserving natural resources. Generally, scientists reflect the attitudes of the population, and there has been a concerted effort to reduce the number of animals used in research.

Scientific granting agencies, oversight bodies such as the U.S. Department of Agriculture (USDA), and the scientific community have all endorsed the use of Russell and Burch's three Rs (1). Russell and Burch recommended that efforts be made to Replace animals with nonanimal models, Reduce the number of animals used in research, and Refine the techniques of research to reduce animal suffering. Clearly, the intent of the Animal Welfare Act of 1985 (2) and the rules resulting from that law are designed to encourage the research community to implement these three principles. This is reflected in the following passage from the Public Health Service (PHS) Policy on Humane Care and Use of Laboratory Animals (3): "The animals selected should be of an appropriate species and quality and the minimum number required to obtain valid results. Methods such as mathematical models, computer simulation, and in vitro biological systems should be considered." The USDA animal welfare regulations (4) also reflect this concern: "A proposal to conduct an activity involving animals, or to make a significant change in an ongoing activity involving animals, must contain...A rationale for involving animals, and for the appropriateness of the species and numbers of animals to be used..."

Both the PHS and USDA have charged the Institutional Animal Care and Use Committee (IACUC) with ensuring appropriate and humane animal care and use. It is difficult to see how any committee could offer such assurance without considering the appropriateness of animal numbers. Five years of experience in making decisions regarding the appropriateness of animal numbers has convinced us that this is a complicated issue, an issue worthy of examination in light of animal welfare considerations. A number of factors enter into any consideration of the appropriate number of animals to be used in a research project. Statistical, economic, contractual and welfare issues all play some role in such considerations. Investigators and committees must use all of these factors in making decisions about the appropriateness of animal numbers. In this review, we will first consider the statistical aspects of experimental design and will then take up nonstatistical issues which committees encounter in the review of proposed experiments using animals.

Statistically Appropriate Numbers of Animals

The decision about the number of animals to use in an experiment often can be made on statistical grounds; that is, the appropriate minimum number of animals to be used is specifiable given the nature of the experiment and the statistical tests to be used in analyzing the data obtained. Proper use of most statistical tests requires that the decision of what test to use precede the experiment, and, in fact, the experiment must be designed to use that particular test. In reference to analysis of variance, Sokal and Rohlf (5) remind us that, "An important point about such tests is that they are designed and chosen independently of the results of the experiment. They should be planned before the experiment has been actually carried out and the results obtained." Many journals now require that a "power-based" assessment of the adequacy of the sample size be provided in papers submitted for publication (6,7,8). Such an assessment should be made before the experiment is begun.

Table 1. Sources of sample size tables and nomograms. The table lists, for each statistical test, the references providing sample size tables or nomograms: chi-square (2×2 and r×c), F test, log-rank statistic, multiple correlation, odds ratios, one-way and two-way analysis of variance (parametric), paired and two-sample t tests, proportions, Pearson correlation coefficient, comparing two proportions, confidence limits, sign test, and McNemar test for matched pairs (references 9-15, 17, 38-49).

Figure 1. Sampling distribution of the t ratio when the null hypothesis is true (H0 True) and when the alternative hypothesis is true (H1 True). The criterion value of the t ratio is indicated by the vertical line labeled tα. The probability associated with that criterion, α, is indicated by the area to the right of tα under the H0 True curve (cross-hatched area), whereas the β probability is indicated by the area to the left of tα under the H1 True curve (hatched area). The power of the test, 1 − β, is indicated by the balance of the area under the H1 True curve.

It is beyond the scope of this paper to present the formulae for calculation of appropriate sample sizes. Calculation formulae, sample size tables and nomograms exist for a variety of statistical tests (9,10,11,12,13,14). Table 1 presents some examples of tests for which such tables exist, but this list is by no means exhaustive. In particular, readers are referred to an excellent text by Jacob Cohen (15) which discusses power analysis in detail and presents tables for a wide range of statistical tests. Unfortunately, there are tests commonly used in experimental research for which there are no such tables, but Erb (16) suggested some strategies for applying existing tables to similar tests for which there are no tables. Use of these formulae, tables or nomograms requires knowledge of several parameters: the acceptable alpha, type I or false positive error; the acceptable beta, type II or false negative error; the smallest difference worth detecting or the effect size (15); and the variability of the control and experimental populations.
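These four parameters are all that a sample size calculation needs. As a minimal sketch (ours, not a calculation taken from this paper), the widely used normal-approximation formula for a two-sample comparison can be written in a few lines of Python; the function name and default values are our own choices:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the smallest difference worth detecting divided by the
    common standard deviation (the standardized effect size).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed significance criterion
    z_beta = z.inv_cdf(power)           # quantile giving the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(1.0))   # effect of 1 SD: 16 per group
print(n_per_group(0.5))   # halving the effect size roughly quadruples n: 63
```

Exact t-based tables such as Cohen's give slightly larger answers (about 17 per group for an effect of one standard deviation), since the normal approximation ignores the uncertainty in the estimated variance.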

Alpha error: Alpha error is the probability of error in concluding that there is a difference between experimental and control populations, i.e., concluding that the experimental treatment of the animals has an effect when in reality there is no effect. This is the error associated with rejecting the notorious null hypothesis, the hypothesis of no difference or no effect, when it is actually true. In parametric tests, it is the area (for so-called one-tailed tests) or areas (for two-tailed tests) under the extremes of the normal curve representing the null hypothesis (H0 True) beyond the criterion value; in Figure 1 it is the cross-hatched area. Every biological population contains some variation. Random samples from that population will rarely be identical to each other. Therefore, there is a finite probability that two very different samples could be drawn from a single population. Because the samples are different, the conclusion may be drawn that they come from different populations when, in fact, they do not.

The probability of an α error can never be zero in biological experimentation, but biologists will settle for an appropriately low, nonzero probability. This is the ubiquitous level of significance or p value. The acceptable α probability has been arbitrarily set at 0.05 or 0.01, values employed in most statistical testing. The 0.05 level indicates a willingness to incorrectly reject the null hypothesis once in 20 times when it is actually true. This probability is frequently used as follows: when the calculated probability is equal to or less than 0.05, the null hypothesis is rejected (i.e., it is concluded that the treatment had an effect), whereas, if the calculated probability is equal to or greater than 0.06, the null hypothesis is not rejected (i.e., it is concluded that the treatment had no effect). In this way, the experimenter could limit his liability of error to 1 in 20 such experiments. But consider the probability of 0.06. This corresponds to an error rate of 1 in 17 instead of 1 in 20, a small difference which, by common sense, should merit a similar interpretation. The practice of simply labeling a difference associated with a probability of 0.06 or even 0.07 as "insignificant," without specifying the actual values, is indefensible, i.e., it is preferable to report the calculated p value.
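The meaning of the 0.05 criterion can be checked directly by simulation (our own sketch, not from the paper): when two samples are repeatedly drawn from the same population, so that the null hypothesis is true, a two-tailed test at α = 0.05 should reject it in roughly 5% of the trials.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed criterion for alpha = 0.05

trials, n, false_positives = 2000, 30, 0
for _ in range(trials):
    # both "control" and "experimental" samples come from one population,
    # so every rejection of the null hypothesis here is an alpha error
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = math.sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
    if abs((mean(a) - mean(b)) / se) > z_crit:
        false_positives += 1

print(false_positives / trials)  # close to 0.05
```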

Most people would agree that the smaller the probability of an α error the better. There are, however, limits to the truth of this statement. In general, the smaller the α probability, the smaller the power of the experiment, i.e., the less likely the experiment is to find an effect of the treatment if it exists. All other things being equal, the smaller the acceptable α probability, the larger the sample size must be to detect a real effect if it exists. Even a cursory examination of any statistical table will verify this fact. In Figure 2 are plotted two-tailed significance criteria, α₂, against sample size for three different effect sizes (0.6, 1.0 and 1.4 times the standard deviation). Clearly, sample size must increase with reduced α probability. In addition, the smaller the probability (α) of accepting a false positive effect, the larger the probability of missing a real effect if one exists, i.e., the larger the probability of a β error.

Figure 2. The relationship between the significance criterion, α₂, and the sample size. This figure was constructed for the t ratio for equal size samples, assuming a β probability of 0.2. Because sample size depends upon the effect size, three different curves were constructed for effect sizes 0.6, 1.0 and 1.4, where effect size, Δ, is indicated by Δ = (μx − μy)/σ, with μx and μy the means of the two populations and σ the within-population standard deviation, assumed to be equal in the two populations. Alpha values are for two-tailed tests. The hatched area spans the normally accepted significance criteria.

Beta error: The β error occurs when it is wrongly concluded that the treatment has no effect, and it is related to the power of the study, power = 1 − β, the probability that a difference that actually exists will be detected by the study. In parametric tests, the β error is the area under the normal curve representing the hypothesis alternative to the null hypothesis (H1 True) beyond the criterion value; in Figure 1 it is the hatched area. The power of the test is the remainder of the area under the H1 True curve. The optimal relationship between α and β depends upon the situation of the study. If committing a β error is more costly than committing an α error, as it would be if a promising treatment for cancer were incorrectly discarded, then β should be smaller than α. The loss of lives and increased suffering that might result from discarding the treatment would indeed be costly. On the other hand, if committing an α error is more costly than committing a β error, as it would be if an ineffective treatment were adopted when another, effective, treatment exists, then α should be smaller than β. Again, the loss of life and increased suffering could be significant. Animal experimenters may be tempted to make both values vanishingly small, but the additional animals required may be too high a price to pay for such certainty.

Figure 3. The relationship between the power of the test, 1 − β, and the sample size. The figure was constructed for the t ratio for equal size samples, assuming α = 0.05 (two-tailed). All other assumptions and conventions are the same as those in Figure 2. The hatched area spans the range of powers which can be considered reasonable for detection of an effect.

Figure 3 shows the relationship between power and sample size for the same three relative effect sizes used in Figure 2. Clearly, sample size may be reduced by accepting a reduced power but, below about 0.7, even large reductions in power have relatively little effect on sample size.

Effect size: Keppel (17) has pointed out that "in most, if not all, of the experiments we conduct, the null hypothesis is false. That is, when we treat groups of subjects differently, they will behave differently." If these differences are small, then we must be certain we have enough animals to detect them. The effect size is the actual difference between the experimental and control populations in the parameter being measured. In order to determine the appropriate sample size, the experimenter must specify the size of the effect he expects. In other words, he must determine "the smallest difference that is worth detecting" (16). It is possible to find even a minuscule difference statistically significant; but, though this difference is statistically significant, it may not be biologically or clinically significant or important. A very small improvement in the therapeutic potential of a new drug may be detectable statistically, but that small improvement may have no clinical importance if the drug is more expensive or if it has undesirable side-effects.
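The trade-offs plotted in Figures 2 and 3 can be reproduced numerically. The sketch below (ours, using the same normal approximation that underlies standard power tables, not code from the paper) computes the approximate power of a two-sample comparison for a given per-group sample size:

```python
import math
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample comparison.

    Power ~= Phi(d * sqrt(n/2) - z_{1-alpha/2}), where d is the
    standardized effect size and n the number of animals per group.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)

for n in (5, 10, 20, 40):
    print(n, round(approx_power(1.0, n), 2))
```

For an effect of one standard deviation, doubling the group size from 10 to 20 raises power from about 0.61 to about 0.89, while 5 animals per group leave power well below one-half.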

Cohen (15) has pointed out that "...the larger the ES [effect size] posited, other things (significance criterion, desired power) being equal, the smaller the sample size necessary to detect it." Conversely, the smaller the effect size, the larger the sample size must be. The appropriate sample size is that needed to detect an expected effect size, if the effect exists. Erb (16) has emphasized that a sample size larger than that needed to find the smallest worthwhile difference constitutes a waste of animals. Furthermore, a sample size smaller than that needed to detect such a difference is an inappropriate use of all the animals of the study.

As an example of this problem, our committee was called upon to review a protocol involving use of monkeys in a study of various procedures to shorten the jaw (sagittal split-ramus osteotomy). In the initial protocol, 15 monkeys were requested to be put into 10 different experimental groups. Given the nature of the anatomical observations and measurements to be taken and the variability of such observations, no useful information could have been derived from this study. The committee, therefore, suggested that either the number of animals per group be increased or the number of groups reduced. In this case, it may be less wasteful of animals to use more of them. In fact, the committee review of this protocol led to no increase in animal usage, largely because of the projected cost, but did result in a more focused project which addressed the most important experimental variables.

It is difficult to say how common this error may be over the whole range of biomedical research. A hint of its commonness was given by Freiman, et al. (18,19), who reviewed 71 negative clinical trials. They calculated, using data presented in the reports of these trials, the power of the studies given the sample size employed and found that 67 of the studies involved insufficient patients to detect a 25% benefit of the therapy. Fifty of the studies could not have detected a 50% benefit if it existed. Freiman, et al. had no way of calculating the costs and benefits for each trial, but it is hard to imagine that a 50% benefit is unimportant. Even worse, most of the authors concluded that no clinically meaningful effect existed. We have no similar data for animal experiments, but we may presume that this error is not uncommon in them (20). Similar experiments involving animals would be wasteful of animals; they also may be regarded as unethical by peers (21). False negative findings have an additional effect which may be devastating to scientific progress. Such findings may prevent others from investigating the phenomena involved and certainly will affect the way others think about them.
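The kind of retrospective power calculation Freiman et al. performed can be sketched for proportions. The trial numbers below are hypothetical, chosen only to show how an apparently reasonable trial can have little chance of detecting even a 50% benefit (normal approximation; our own function, not taken from the paper or from reference 18):

```python
import math
from statistics import NormalDist

def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power to detect a difference between two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    se = math.sqrt(p_control * (1 - p_control) / n_per_arm
                   + p_treated * (1 - p_treated) / n_per_arm)
    return z.cdf(abs(p_control - p_treated) / se - z_alpha)

# hypothetical trial: 40% event rate in controls, 50 patients per arm
print(round(power_two_proportions(0.40, 0.30, 50), 2))  # 25% benefit: ~0.18
print(round(power_two_proportions(0.40, 0.20, 50), 2))  # 50% benefit: ~0.61
```

With 50 patients per arm, such a trial would miss a 25% benefit more than four times out of five, and a 50% benefit about two times out of five, yet a naive reading of its negative result would suggest the therapy is useless.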

We are inclined not to believe any result achieved with a sample size of 5. But both investigators and committees must be aware that, if the assumptions of a statistical test are true for such a small sample, a statistically significant result is believable. As we have seen, the difficulty comes when the result is not significant (or, more correctly, when the accepted criterion value is not met).

Variability: One of the most common causes of failure to detect a real difference is excessive variability in the measurements. Control measures in experiments are important for reducing this undesirable variability. However, even if all extraneous variability were controlled, the parameters measured still would be different in different animals or subjects; individuals within a population simply differ from each other.

It is important that any sample value (mean, proportion, variance) be close to the relevant population value, i.e., it must be reliable. Reliability may be related to a number of factors, depending upon the specific statistical model being used, but it is always related to the size of the sample. For example, the most commonly used measure of reliability, the standard error of the mean, depends upon the square root of the ratio of an unbiased estimate of the population variance (s²) to the sample size (n); thus, SE = √(s²/n). Similarly, the standard error of a Pearson correlation coefficient depends upon the ratio of 1 minus the square of the population coefficient (r) to the square root of the sample size minus 1: SE = (1 − r²)/√(n − 1). Uniformly in statistical tests, the larger the size of the sample taken, the smaller the error and the greater the reliability of the results.
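The dependence of reliability on sample size is easy to demonstrate empirically (our sketch; the population mean and standard deviation are arbitrary): the standard error of the mean shrinks roughly with the square root of n.

```python
import math
import random
from statistics import stdev

random.seed(7)
se_by_n = {}
for n in (10, 40, 160):
    # sample from a population with mean 100 and standard deviation 10
    sample = [random.gauss(100, 10) for _ in range(n)]
    se_by_n[n] = stdev(sample) / math.sqrt(n)   # SE = sqrt(s^2 / n)

for n, se in se_by_n.items():
    print(n, round(se, 2))
```

In expectation, quadrupling the sample size halves the standard error: 10/√10 ≈ 3.16 versus 10/√40 ≈ 1.58.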

For many tests, the larger the ratio of the effect size to the standard deviation, the more likely the effect is to be statistically significant. It is a common practice to increase sample size, and therefore reduce the standard error, when the effect size is small. Failure to increase the sample size results in an experiment of low power. An increase in sample size should be equivalent in both experimental and control groups. As discussed later, a large increase in only the control group can invalidate the test by violating its assumptions.

Knowing and estimating: In most cases, it is necessary to know the effect size, the α and β probabilities, and the variability in order to estimate minimum sample size. Alpha probabilities have been generally agreed upon by the research community, although the established values are arbitrary. Some care should be taken in too quickly accepting these values as immutable. It is possible to derive any conclusion from a study no matter what the associated probabilities. On the other hand, peer reviewers are not likely to relish conclusions of significance if the probability of an α error is too large, say >0.1. Acceptable β probabilities are not canonized. According to Neyman and Tokarska (22), a power of 0.8-0.9 should be regarded as reasonable for detection of an effect. Kraemer (23) suggested that power be 0.8. In his calculations, McCance (24) used a power of 0.7 as the lower boundary of the range of acceptable powers. Reasonable values of power are in the range of 0.7-0.9.


Frequently, the effect size and variability of the data are known from previous experiments performed either by the investigator or by others. There already may have been experiments using similar protocols, with subjects drawn from essentially the same population or from a sufficiently similar population. In these cases, rather precise values may be assigned to these variables. The experimenter may not always know these parameters, especially when making measurements of new variables. What then? One possible approach is to do a pilot study using a small sample size. If the experimenter is willing to accept the sample means and variances as adequate estimates of the population parameters, then a rough estimate of minimum sample size is possible. The minimum sample size calculated will be one that produces sufficient power to detect an effect at least as large as that obtained in the pilot study. This estimate will not be as good as that obtained when the effect size and variability are actually known, but it is better than nothing at all.
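A pilot-based estimate can be sketched as follows (our own code and invented pilot numbers, shown only to illustrate the approach; the normal approximation stands in for exact t-based tables):

```python
import math
from statistics import NormalDist, mean, stdev

def n_from_pilot(pilot_control, pilot_treated, alpha=0.05, power=0.8):
    """Rough per-group sample size, treating the pilot means and pooled
    standard deviation as if they were the population values."""
    pooled_sd = math.sqrt((stdev(pilot_control) ** 2
                           + stdev(pilot_treated) ** 2) / 2)
    d = abs(mean(pilot_control) - mean(pilot_treated)) / pooled_sd
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)

# hypothetical pilot data, five animals per group
control = [101, 98, 104, 97, 100]
treated = [103, 106, 99, 104, 101]
print(n_from_pilot(control, treated))  # 18 per group
```

Because the pilot means and variances are themselves noisy, the resulting estimate should be treated as rough; it guards only against effects at least as large as the one seen in the pilot.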

Nonstatistical Factors in Determination of Sample Size

Ethical considerations: Clearly, there are ethical implications of using too many or too few animals to detect an effect if it exists. Both are a waste of animals, though using too few may be the greater waste. There are also ethical or animal welfare considerations that influence the number of animals used, but these are considerations that have little to do with statistical analysis. For example, our committee recently was asked to review a protocol involving the use of rabbits in a study of frostbite. The investigator proposed producing frostbite in both hind limbs of a rabbit, then rewarming each limb according to a different procedure to see which would produce the least necrosis and better healing. The committee was concerned that a rabbit debilitated in this way would have great difficulty moving around in a cage and that so much damage to the leg tissues and their circulation might reduce survival of the animals. The committee recommended that the investigator consider doubling the number of animals and producing frostbite in only one leg, or using the fore limbs or ears instead. These recommendations were made only for reasons of the welfare of the animal.

Frequently, such considerations lead to either a reduction in the number of treatments or an increase in the number of animals. Either outcome is justifiable on both ethical and statistical grounds. Protocols involving multiple survival surgeries (occasions for general anesthesia and surgery on the same animals separated by survival time) are reviewed carefully by the committee. These must be justified exhaustively by the investigator. Unless such justification is adequate, the committee will recommend that the surgeries be done on different groups of animals, sometimes resulting in increased numbers of animals.

Economic considerations: Rowan (25) observed that "few experiments require large numbers of dogs, cats or primates, but protocols that require hundreds of rodents are common." His conclusion that "obviously" fewer large animals are used in research because they are more expensive is not accurate for many, if not most, cases. His question about the statistical reliability of large-animal experiments or the unjustifiably large numbers of rodents in experiments, while important, is meaningless in this context.

There are many reasons for choosing one species over another in experimentation. Frequently, rodents are chosen because they may be inbred, thus reducing variability by eliminating much of the genetic variation. Gall stones are easily induced in prairie dogs, but not in most other species, by a simple change in diet. The cheek pouch of the hamster is a site into which tissues can be transplanted for the study of microcirculation, the circulation through arterioles and capillaries. The squid is ideal for axonology because of its large-diameter axons. Some larger animals may be selected because something about them more closely resembles the condition in man, and so forth. Often custom is the only reason for using a particular species. For many years, cats were used in neurophysiological experiments simply because they had been used by many different investigators for many years.

That is not to say that economic considerations do not enter into the choice of species. Rats may be selected for experiments which could be done equally well in cats or dogs, because rats are less expensive to purchase and maintain. But expense should not be the only consideration in the choice of species. Experiments in screening of possible carcinogens require large numbers of animals because the expected incidence of cancer is likely to be low and it takes a long time, relative to the life-span of the species, for the cancers to appear (26). This is partly an economic issue, but it would be difficult to justify the use of large numbers of endangered primates when abundant rats would suffice.

Rowan (25) has also missed the point of the appropriate experimental units (16). For large-animal experiments the proper unit of the experiment is frequently the number of measurements, because a number of measurements are made on each animal. Therefore, n is the number of measurements, not the number of animals. For rodent studies, the unit of the experiment frequently is the number of animals or the number of pools of animals. Often, rodent studies involve pooling animal materials to get samples large enough to measure or large enough to produce some desired effect size. In these experiments, the number of animals required is determined by the number of experimental conditions and the number of animals per pooled sample.

Grantor-imposed restrictions: When an investigator applies for a contract to perform some studies for a certain granting agency, he frequently must agree to conduct the research in a certain way. Often, the agency will specify how many animals must be used in the experiments. It sometimes appears that the numbers have not been specified for obvious or defensible reasons. It is common practice in such contracts to specify that the control groups be two to five times larger than the experimental groups. There is no statistical justification for this practice. It is possible to increase the power of the experiment by increasing the size of the control group by itself (any increase in number of animals will have this effect), but the increase in power is only apparent.

Most statistical tests require the assumption that the variances of the experimental and control populations are homogeneous. As we have seen, increasing sample size will reduce the variance of the sample mean, thus increasing the ratio of the effect size to its standard error. However, in comparing two samples, the actual power of the test is determined by the smallest sample, not the largest. Enlarging only the control group causes a violation of the assumption of homogeneity and, therefore, invalidates the statistical test.

    In tbese cases, tbe committee faces a dilemma. A p-proval ofsucb a protocol constitutes a waste ofanimals and,tberefore, an abrogation oftbe committee's responsibilities.To refuse sucb a protocol means tbe institution will not receivetbe contract. It seems tbat it is tbe responsibility oftbe grant-ing agency to assure tbat fue contract contains valid specifica-tions for tbe appropriate number of animals. In some cases,tbe appropriate number of animals is not specified by an agen-cy, but by convention. Sucb is fue case for establisbing potencyof vaccines, wbere Hendriksen, et al. (27) bave sbown tbatsubstantial reductions from conventional sample sizes arepossible witb no loss of ability to acbieve a 95% confidenceintervalo A similar suggestion has been made by Sbillaker,et al. (28) for skin sensitization tests and by Ennever andRosenkranz (29) for carcinogenicity tests.
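The plateau that results from enlarging only the control group can be illustrated numerically. The sketch below is our own, using the normal approximation to the two-sample t test; the function names are hypothetical. It holds the experimental group at 10 animals, lets only the control group grow, and shows that power is capped by the smaller sample:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(effect_size: float, n_exp: int, n_ctrl: int) -> float:
    """Approximate power of a two-sample comparison (normal approximation,
    two-tailed alpha = 0.05) for a standardized effect size."""
    se = sqrt(1.0 / n_exp + 1.0 / n_ctrl)
    return normal_cdf(effect_size / se - 1.96)

# Experimental group fixed at 10 animals; only the control group grows.
for n_ctrl in (10, 20, 50, 1000):
    print(n_ctrl, round(approx_power(1.0, 10, n_ctrl), 3))

# As n_ctrl grows without bound, se approaches sqrt(1/10), so power can
# never exceed this ceiling -- the smaller sample is the bottleneck.
print("ceiling:", round(normal_cdf(sqrt(10.0) - 1.96), 3))
```

Note that 20 animals in each group reach exactly the ceiling that 10 experimental animals can never exceed, no matter how large the control group becomes.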

Artificial inflation of control group size: Frequently, investigators will ask for more animals than are required for the experiments they propose. We suspect that this happens because the investigator has not calculated the number of animals actually needed. When asked for a justification for the large number of animals, many of these investigators respond by decreasing the requested numbers. Usually the deliberations of the committee do not result in a change in requested numbers of animals. For example, in the most recent 109 protocols, 89% were approved for the number of animals originally requested. For 2/3 of the remainder, the investigator reduced the number of animals, and for 1/3 the investigator increased the number of animals as a result of questions asked by the committee. For those that reduced the number of animals, the average reduction was 33% (range 2-67%). Two investigators reduced the number of experimental treatments, but kept the number of animals the same.

Sometimes investigators propose using control groups that appear inordinately large. This can occur when the protocol involves sampling specific parameters at a series of times after a particular treatment. Control animals are then included for each time at which experimental animals will be used. This is often justified by the nature of the experimental design. However, when the control group does not compensate for variations in sample "time," the investigator may be trying to increase the apparent power of his experiment. As discussed in the previous section, this is not justified.

Protocols with several experiments in them will often request different sample sizes for each experiment. The committee must ask the investigator to explain why unequal sample sizes are to be used and, more to the point, how the smaller sample is justified if the larger one is actually needed or, conversely, why the larger sample is requested if the smaller one is sufficient. This problem occurs more often in experiments involving rodents, probably because they are most likely to involve large numbers of animals in several experiments.

Ways of Reducing the Number of Animals Used in Research

Nearly everyone agrees that it is desirable to reduce the number of animals used in research, but the effect on power of such a reduction can be dramatic. Obviously, both committees and granting agencies must think carefully about the power of experiments before suggesting reductions in sample size (24). What are ways of reducing the number of animals used without reducing the power of the experiments?

Increase effect size: If the effect size, the minimum acceptable effect, can be increased, then the size of the minimum sample needed to detect the effect will be reduced. One way to increase the effect size is to change the baseline against which the measurements are made. For example, if the effect to be measured is a change in blood glucose levels, then fasting the animals before the experiment may produce larger changes in these levels and an increased effect size. Kempthorne (30) was able to magnify the effects of proteins on growth by reducing the nutrition level of the animals before the experiment. Towe and Mann (31) were able to detect an effect of strychnine in the cerebral cortex of cats simply by reducing the stimulus intensity. With supramaximal intensities the responses were saturated and an increase could not be detected. Other such maneuvers could be used in other situations.
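The leverage that effect size exerts on sample size can be made concrete with the standard normal-approximation formula n = 2(z(1-alpha/2) + z(power))^2 / d^2 animals per group. The sketch below is our own illustration, assuming a two-group comparison, two-tailed alpha = 0.05, and 80% power:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Animals needed per group to detect a standardized effect size
    (normal approximation to the two-sample t test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # 63 per group
print(n_per_group(1.0))  # 16 per group
print(n_per_group(2.0))  # 4 per group
```

Because the effect size enters as its square, a maneuver such as fasting that doubles the measured change cuts the required sample to roughly one quarter.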

This sort of manipulation may not always be appropriate. For example, dietary manipulations themselves could alter the response to an intervention, requiring an additional control group, or they may limit the generality of the conclusions drawn. The appropriateness of such manipulations depends upon the nature of the experiments. If the purpose is to see if an intervention can have any effect on a response, as in the case of the strychnine experiments of Towe and Mann (31), such manipulations are certainly warranted.

Reduce variability: Sample size may be reduced if the extraneous variability of the measurements in the sample can be reduced, a main purpose of designing experiments. One way to reduce variability is to increase the accuracy of the measurements. This may mean developing a new measurement tool. Another possibility is to use inbred animals (animals which are more alike genetically), littermate animals, or matched pairs of animals. Variability also can be reduced by using an animal as its own control.

Some experimental design elements themselves reduce or control for variability. The use of the randomized block design will control for variation. In addition, the primary purpose of analysis of covariance is to eliminate the effects of variables acting with the experimental variables. Even after all of these measures have been taken, there will still be variation in the data. This is the random variation in the measurements themselves and the variation among the individuals in the population of the parameter being


distribution will illustrate the difficulty with small samples. The 95% confidence interval is nearly 0.7 t units wider for a sample size of 5 than for a sample size of 1000. It is nearly 0.6 t units wider for a sample size of 5 than for a sample size of 35.

The power of the test decreases dramatically as the sample size is reduced from 35 to 5. Therefore, the probability of finding an effect if one exists becomes much smaller. Not only does the test become less reliable, but it also becomes less sensitive. The most common reason given for risking these errors with such small samples is that the appropriate number of animals would be too expensive or the experiments too time-consuming. Some investigators deal with this problem by replicating the experiment. That is, over the course of time, they make the same observations in several small samples instead of several large samples of animals. By pooling the results, they are able to get the pooled samples to an appropriate size. However, they incur the penalty of having to assume that the experiments were done in exactly the same way each time. As Tversky and Kahneman (32) have noted, "regardless of one's confidence in the original finding, its credibility is surely enhanced by the replication" if the effect is in the same direction and of approximately the same magnitude and if the variances are approximately equal.
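The gain from pooling replicated small samples can be sketched with invented data. The three replicates below are deliberately identical, an idealization of the assumption that the experiments were done in exactly the same way each time:

```python
from math import sqrt

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_sd = sqrt(ss / (na + nb - 2))
    return (mb - ma) / (pooled_sd * sqrt(1 / na + 1 / nb))

# Three replicate experiments, 4 animals per group in each (invented data).
control_reps = [[9.0, 10.0, 11.0, 10.0]] * 3
treated_reps = [[10.0, 11.0, 12.0, 11.0]] * 3

for c, t in zip(control_reps, treated_reps):
    print(round(two_sample_t(c, t), 2))  # 1.73 each: below the df = 6 critical value (about 2.45)

pooled_c = [x for rep in control_reps for x in rep]
pooled_t = [x for rep in treated_reps for x in rep]
print(round(two_sample_t(pooled_c, pooled_t), 2))  # 3.32: above the df = 22 critical value (about 2.07)
```

No single replicate reaches significance, but the pooled samples do, illustrating both the appeal of the practice and why the identical-conditions assumption carries the whole argument.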

The usual sampling techniques implicitly assume that a sample of prespecified size will be taken and observations will be made on the entire sample, regardless of whether all observations are actually needed to reach a decision about the hypothesis. Using a sequential test (33), it may be possible to reduce the sample size by as much as 50%. Sample units are selected randomly one at a time from the population. After each observation, the experimenter must decide whether or not to reject the null hypothesis or obtain another observation. (The decision is based upon the ratio of the probability function for the test under the null hypothesis, H0, to that under the alternative hypothesis, H1.) The α and β probabilities must, of course, be set in advance of the experiment. In this way, the experiment can be terminated before the usual preset number of animals has been used. Of course, the probability of detecting any event is greater if you're looking for it. This means that ordinary sampling statistics may not apply to serial tests. The interested reader should consult one of the many available sources for the methods of calculating probabilities and examples of application of these techniques (33,34,35).
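The stopping rule just described can be sketched for a stream of success/failure outcomes. This is our own minimal illustration of Wald's sequential probability ratio test, with α and β set in advance as the text requires:

```python
from math import log

def sprt_bernoulli(data, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test of H0: p = p0 against
    H1: p = p1 on a stream of 0/1 outcomes. Returns (decision, n used)."""
    upper = log((1 - beta) / alpha)   # crossing it accepts H1
    lower = log(beta / (1 - alpha))   # crossing it accepts H0
    llr = 0.0                         # running log-likelihood ratio
    for n, x in enumerate(data, start=1):
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(data)

# A strong effect ends the experiment after only 3 animals, well short
# of a preset fixed sample.
print(sprt_bernoulli([1, 1, 1, 1, 1, 1, 1, 1], p0=0.2, p1=0.8))  # ('H1', 3)
```

The sources cited in the text (33,34,35) give the formal treatment, including the adjustments needed because ordinary fixed-sample statistics do not apply.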


measured. These variations cannot be eliminated.

Wise use of control groups: The number of animals used in an experiment often can be reduced by careful use of controls. In some experiments, an animal may be used as its own control, thereby eliminating the need for a separate control group. This procedure has the added advantage of reducing variability by minimizing the effects of interanimal variation because paired tests may be used. Often the same sequence of observations is repeated under a variety of conditions in different experiments within the same protocol. Unless time is an important variable in such experiments, there is no necessity to duplicate control groups. This may give an added savings in animals.
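The saving from using an animal as its own control can be illustrated with invented paired measurements. The numbers below are hypothetical; the point is that the large animal-to-animal spread cancels in the within-animal differences:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements: five animals, each measured under control
# conditions and again after treatment (each animal is its own control).
control = [10.0, 12.0, 14.0, 16.0, 18.0]
treated = [12.1, 13.9, 16.2, 18.0, 20.1]

# Paired analysis: t statistic on the within-animal differences.
diffs = [t - c for c, t in zip(control, treated)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# The same numbers analyzed as two independent groups.
n = len(control)
pooled_sd = sqrt((stdev(control) ** 2 + stdev(treated) ** 2) / 2)
t_unpaired = (mean(treated) - mean(control)) / (pooled_sd * sqrt(2 / n))

print(round(t_paired, 1), round(t_unpaired, 1))  # 40.4 1.0
```

The treatment effect is unmistakable in the paired analysis and invisible in the unpaired one, even though the data are identical.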

Repeated samples from animals: With microanalysis techniques available, it is often possible to measure blood concentrations from very small samples. In these cases, it is possible to obtain samples repeatedly from the same animal over a period of time. This procedure will reduce the number of animals used by eliminating the need for separate groups to be sampled at each time. As in most procedures, this one has its limitations. The experimenter must either know that each sample has no effect on subsequent samples or he must be convinced that this is unimportant. For example, if blood is to be drawn from rodents by retroorbital bleeding, both animal welfare and scientific considerations dictate that this not be done too often. In addition, our committee requires that this procedure be done under anesthesia, possibly introducing additional complications. The patency of indwelling catheters, especially when used in small animals, represents another limitation. Patency is frequently reduced with time, so the total duration of their usefulness may be limited.

Using one-tailed rather than two-tailed tests: Usually, an experimenter does not specify the direction of the difference between experimental and control groups. That is, he will accept a difference with the experimental measurements either greater than or less than the control measurements. This requires the use of a two-tailed test. However, if the experimenter is willing to specify the direction of the expected difference, then a one-tailed test may be used. The danger of using such a test is that it precludes finding results in the direction opposite to that expected. In some kinds of experiments, the opposite result may be of no interest, as in the case of studies of some potential therapeutic agent for depression. The investigator is not likely to be interested in an agent that increases depression.

When this kind of repeated sampling is done on the same animals, the statistical analysis must be done with care. The repeated measurements are not independent, because the value of each measurement is related to the measurements taken before. Shott (35) found that a major deficiency of papers published in two veterinary journals was treatment of dependent samples as if they were independent. With repeated measurements from the same animal, it is unreasonable to pool the samples as if they were equivalent. It is equally unreasonable to compare the sample at one time in a group of animals with the sample at another time in that same group using a 2-sample t test, a test which requires that the samples be independent. Under these circumstances, comparisons between experimental and control groups can be done legitimately.

Using replication or sequential testing: A sample size frequently proposed for protocols our committee has reviewed is 5-10 animals. Many investigators analyze their results in terms of the statistic t. For very large samples, the distribution of t approaches the normal distribution. As the sample size decreases, the peak of the distribution is depressed and the tails elevated; in short, it deviates more and more from normality. One of the assumptions of the test is that the data are, in fact, distributed normally. For very small samples, this is probably not true. There is no firm definition of what constitutes a "very small" sample, but a brief look at the t


A one-tailed test with an α probability of 0.05 will have the same power as a two-tailed test with an α probability of 0.10, provided the result is in the expected direction. If the result is in the other direction, the one-tailed test has no power at all. The result of the increase in power afforded by the one-tailed test is that a smaller sample can be used to achieve the same power as would be achieved by a two-tailed test with a larger sample. The investigator who elects this option must be aware that he pays the penalty of not being able to find a significant result in the direction opposite the expected one.
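The bookkeeping behind this trade-off can be sketched with standard normal quantiles. The sample-size function below is our own normal-approximation illustration; the standardized effect size of 1.0 and 80% power are arbitrary choices:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

# A one-tailed test at alpha = 0.05 uses the same critical value as a
# two-tailed test at alpha = 0.10.
print(round(z(1 - 0.05), 3))      # one-tailed alpha = 0.05 -> 1.645
print(round(z(1 - 0.10 / 2), 3))  # two-tailed alpha = 0.10 -> 1.645

def n_per_group(z_crit, power=0.8, effect_size=1.0):
    """Animals per group for a given critical value (normal approximation)."""
    return ceil(2 * (z_crit + z(power)) ** 2 / effect_size ** 2)

print(n_per_group(z(1 - 0.05 / 2)))  # two-tailed alpha = 0.05: 16 per group
print(n_per_group(z(1 - 0.05)))      # one-tailed alpha = 0.05: 13 per group
```

The price, as noted above, is that a result in the unexpected direction can never be declared significant.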

that development of such a model would not eliminate the need for animals, but only reduce the number needed. Rosenkranz and Klopman (37) correctly emphasize that animal testing will always be needed from time to time to verify the continued validity of the model.

Conclusions

Committees charged with evaluation of experimental protocols must, by law, be constituted by both professional research workers and lay persons. Even the professional scientist may not be familiar with the statistical considerations outlined briefly here. It is too much to ask the lay representatives to make decisions based largely on such considerations. Nevertheless, the committee must make decisions about the protocols, including decisions about the appropriateness of the number of animals requested. It is clear to us that such decisions are not simple and seldom made upon statistical grounds alone. Members of committees must view the entire experiment in making decisions about animal numbers. We hope that we have highlighted some of the issues that have appeared in our consideration of a large number of protocols.

The subject of scientific merit has been purposely avoided here; it is a thorny issue that is best treated independently. However, we would be remiss in not pointing out that an experiment that is a waste of time is also a waste of animals.

Acknowledgments

The authors wish to thank Drs. Francis Clark and Roger Koehler and Ms. Sally Mann for their comments on earlier versions of this manuscript. They also wish to thank members of the committee for their diligent work in support of animal welfare.

References

1. Russell WMS, Burch RL. The Principles of Humane Experimental Technique. Methuen, London, 1959.
2. U.S. Congress. The Animal Welfare Act. 1966. PL 89-544, Vol. 7, U.S. Code 2131-2156. Amended December 17, 1985.
3. Office for Protection from Research Risks. Public Health Service Policy on Humane Care and Use of Laboratory Animals. NIH, Bethesda, MD, 1986.
4. Part IV. Department of Agriculture. Animal and Plant Health Inspection Service, 9 CFR Parts 1, 2, and 3. Animal Welfare; Final Rules. Federal Register 1989;54(168):36153.
5. Sokal RR, Rohlf FJ. Biometry. The Principles and Practice of Statistics in Biological Research. W.H. Freeman, San Francisco, 1969.

Using trend analysis to detect too-small samples: If an experiment is performed with the preselected sample size, and the results are associated with an obtained p value greater than the α value selected before the experiment, most experimenters are inclined to simply not reject the null hypothesis. They will conclude that their intervention had no effect. There is still the possibility of a β error, of accepting the null hypothesis when it is false. It is always a good idea to carefully examine the data in any experiment. No experimenter should simply plug his data into a statistical program and confine his attention to the computer output. Examination of the data themselves may give an indication if a β error has occurred. For example, if the difference between the means of the experimental and control groups is not associated with a probability less than or equal to α, it may be that the sample size was too small. One way to detect this is to do a formal or informal trend analysis. If every experimental value was greater than every control value, it may be worthwhile to add more observations. On the other hand, if half were greater and half were smaller, adding more observations may well be a waste of animals.
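An informal trend analysis of this kind can be sketched as follows. The function and its 0-to-1 "separation" scale are our own device; the quantity is simply the Mann-Whitney U statistic rescaled to a proportion:

```python
def separation_fraction(control, experimental):
    """Proportion of (control, experimental) pairs in which the experimental
    value exceeds the control value (ties count one half); this is the
    Mann-Whitney U statistic rescaled to the interval 0-1."""
    wins = sum(1.0 if e > c else 0.5 if e == c else 0.0
               for c in control for e in experimental)
    return wins / (len(control) * len(experimental))

# Every experimental value above every control value: adding a few more
# observations may be worthwhile despite a nonsignificant p value.
print(separation_fraction([3.1, 3.4, 3.6], [3.8, 4.0, 4.3]))  # 1.0
# Thorough overlap: more animals would likely be wasted.
print(round(separation_fraction([3.1, 3.8, 3.4], [3.2, 3.7, 3.5]), 2))  # 0.56
```

A value near 1.0 suggests a consistent trend that a larger sample might confirm; a value near 0.5 suggests that no amount of additional animals is likely to reveal an effect.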

Replacement of animals: The most effective way to reduce animal numbers would be not to use them at all. In some cases, it is possible to do studies on in vitro embryo cultures or cell cultures (36). For example, some questions regarding the infection of cells by viruses or effects of mitogens on ionic channels in lymphocytes can be answered effectively using cultured cells. This, by itself, may not reduce animal numbers unless cell lines can be maintained and the number of available cells expanded, because animals are the ultimate source of the cells. But not all research questions can be answered by use of cultures. Questions about the metastatic process, tissue or organ transplantation, brain mapping, behavior and many other processes cannot be answered at present using cultured cells. On the other hand, some questions regarding hematopoietic stem cells in peripheral blood or bone marrow, for example, can be answered using human cells and tissues which are available clinically, but not all such questions are answerable this way. If the cells must be introduced to a host and tissues collected from the host, humans cannot be used.

Computer or other types of models are commonly suggested to replace animals in research. In fact, some models may well prove to be useful for this purpose (37). But any model is only as good as the data and concepts on which it is based. We rarely understand any living system well enough to develop all-encompassing and effective models. It is clear

6. Altman DG, Gore SM, Gardner MJ, et al. Statistical guidelines for contributors to medical journals. Brit Med J 1983;286:1489-93.
7. Gardner MJ, Machin D, Campbell MJ. Use of check lists in assessing the statistical content of medical studies. Brit Med J 1986;292:810-12.
8. Berry G. Statistical guidelines and statistical guidance. Med J Austral 1987;146:408-09.
9. Beyer WH. Handbook of Tables for Probability and Statistics, 2nd Edition. CRC Press, Boca Raton, 1968.


10. Aleong J, Bartlett DE. Improved graphs for calculating sample sizes when comparing two independent binomial distributions. Biometrics 1979;35:875-81.
11. Fleiss JL. Statistical Methods for Rates and Proportions, 2nd Edition. John Wiley & Sons, New York, 1981.
12. Glantz SA. Primer of Biostatistics, 2nd Edition. McGraw-Hill, New York, 1987.
13. Kastenbaum MA, Hoel DG, Bowman KO. Sample size requirements. Biometrika 1970;57:421-30.
14. Schlesselman JJ. Statistics in veterinary research. J Amer Vet Med Assoc 1982;187:138-41.
15. Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Lawrence Erlbaum Associates, Hillsdale, NJ, 1988.
16. Erb HN. A statistical approach for calculating the minimum number of animals needed in research. ILAR News 1990;32(1):11-16.
17. Keppel G. Sensitivity of experimental designs. Design and Analysis. A Researcher's Handbook. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1973;521-46.
18. Freiman JA, Chalmers TC, Smith H, et al. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. New Engl J Med 1978;299:690-94.
19. Freiman JA, Chalmers TC, Smith H, et al. The importance of beta, the type II error, and sample size in the design and interpretation of the randomized controlled trial. Survey of 71 "negative" trials. In Bailar JC III, Mosteller F, eds. Medical Uses of Statistics. New Engl J Med Books, Waltham, MA, 1986;289-304.
20. Overall JE. Classical statistical hypothesis testing within the context of Bayesian theory. Psychol Bull 1964;61:286-302.
21. Altman DG. Statistics and ethics in medical research: III. How large a sample? Brit Med J 1980;281:1336-38.
22. Neyman J, Tokarska B. Errors of the second kind in testing "Student's" hypothesis. J Amer Stat Assoc 1936;31:318-26.
23. Kraemer HC. Sample size: When is enough enough? Amer J Med Sci 1988;296(5):360-63.
24. McCance I. The number of animals. NIPS 1989;4:172-76.
25. Rowan AN. Research protocol design and laboratory animal research. Invest Radiol 1987;22:615-17.
26. Newton CM. Biostatistical and biomathematical methods in efficient animal experimentation. The Future of Animals, Cells, Models, and Systems in Research, Development, Education and Testing. National Academy of Sciences, Washington, DC, 1977;152-69.
27. Hendriksen CFM, Gun JW vd, Marsman FR, et al. The effects of reductions in the numbers of animals used for the potency assay of diphtheria and tetanus components of adsorbed vaccines by the method of the European Pharmacopoeia. J Biol Stand 1987;15:353-62.
28. Shillaker RO, Bell GM, Hodgson JT, et al. Guinea pig maximisation test for skin sensitisation: The use of fewer animals. Arch Toxicol 1989;63:283-88.
29. Ennever FK, Rosenkranz HS. Methodologies for interpretation of short-term test results which may allow reduction in the use of animals in carcinogenicity testing. Toxicol Indust Health 1988;4:137-49.
30. Kempthorne O. Design and Analysis of Experiments. Wiley, New York, 1952;179.
31. Towe AL, Mann MD. Effect of strychnine on the primary evoked response and on the corticofugal reflex discharge. Exp Neurol 1973;39:395-413.
32. Tversky A, Kahneman D. Belief in the law of small numbers. Psychol Bull 1971;76:105-10.
33. Ostle B. Statistics in Research. Iowa State University Press, Ames, IA, 1963.
34. Armitage P. Sequential Medical Trials. Blackwell Scientific Publications, Oxford, 1960.
35. Shott S. Statistics in veterinary research. J Amer Vet Med Assoc 1985;187:138-41.
36. Marazzi A, Ruffieux C, Randriamiharisoa A. Robust regression in biological assay: Application to the evaluation of alternative experimental techniques. Experientia 1988;44:857-73.
37. Rosenkranz HS, Klopman G. CASE, the computer-automated structure evaluation system, as an alternative to extensive animal testing. Toxicol Indust Health 1988;4:533-40.
38. Lachin JM. Sample size determinations for r x c comparative trials. Biometrics 1977;33:315-24.
39. Lakatos E. Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 1988;44:229-41.
40. Gatsonis C, Sampson AR. Multiple correlation: Exact power and sample size calculations. Psychol Bull 1989;106(3):516-24.
41. Gordon I. Sample size estimation in occupational mortality studies with use of confidence interval theory. Amer J Epidemiol 1987;125(1):158-62.
42. Satten GA, Kupper LL. Sample size requirements for interval estimation of the odds ratio. Amer J Epidemiol 1990;131:177-84.
43. Day SJ, Graham DF. Sample size and power for comparing two or more treatment groups in clinical trials. Brit Med J 1989;299:663-65.
44. Bach LA, Sharpe K. Sample size for clinical and biological research. Aust NZ J Med 1989;19:64-68.
45. Arkin CF, Wachtel MS. How many patients are necessary to assess test performance? JAMA 1990;263(2):275-78.
46. Mould RF. Clinical trial design in cancer. Clin Radiol 1979;30:371-81.
47. Beal SL. Sample size determination for confidence intervals on the population mean and on the difference between two population means. Biometrics 1989;45:969-77.
48. Feuer EJ, Kessler LG. Test statistic and sample size for a two-sample McNemar test. Biometrics 1989;45(2):629-36.
49. Fleiss JL, Levin B. Sample size determination in studies with matched pairs. J Clin Epidemiol 1988;41:727-30.

Footnotes

1. The PHS and USDA use the term IACUC to designate the committee charged with local enforcement of animal welfare rules. In this paper, the term committee will be used to designate this body.
2. The term power, in strict usage, applies to the statistical test, not to the experiment. McCance (24) has pointed out that its application to the experiment is justified because of the inextricable link between the experiment and the test of significance it employs.
3. Our committee uses the term protocol to refer to a proposal to use animals in a research or teaching project. These protocols must follow a specified format. They are not proposals for funding; rather they concentrate on important animal welfare considerations (e.g., anesthesia, surgery, analgesia, euthanasia) as well as on research considerations (e.g., scientific justification, experimental design, and numbers of animals per test group).