
A COMPARISON OF
ADAPTIVE, SEQUENTIAL, AND CONVENTIONAL TESTING STRATEGIES
FOR MASTERY DECISIONS

G. Gage Kingsbury
and
David J. Weiss

RESEARCH REPORT 80-4

NOVEMBER 1980

COMPUTERIZED ADAPTIVE TESTING LABORATORY
PSYCHOMETRIC METHODS PROGRAM

DEPARTMENT OF PSYCHOLOGY

UNIVERSITY OF MINNESOTA

MINNEAPOLIS, MN 55455

This research was supported by funds from the Air Force Human Resources Laboratory, the Air Force Office of Scientific Research, the Army Research Institute, and the Office of Naval Research, and monitored by the Office of Naval Research.

Approved for public release; distribution unlimited. Reproduction in whole or in part is permitted for any purpose of the United States Government.

REPORT DOCUMENTATION PAGE

1. Report Number: Research Report 80-4
2. Govt Accession No.: AD-A094 478
4. Title: A Comparison of Adaptive, Sequential, and Conventional Testing Strategies for Mastery Decisions
5. Type of Report: Technical Report
7. Authors: G. Gage Kingsbury and David J. Weiss
8. Contract or Grant Number: N00014-79-C-0172
9. Performing Organization: Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
10. Program Element, Project, Task, and Work Unit: P.E. 61153N; Proj. RR042-04; T.A. RR042-04-01; W.U. NR 150-433
11. Controlling Office: Personnel and Training Research Programs, Office of Naval Research, Arlington, Virginia 22217
12. Report Date: November 1980
13. Number of Pages: 28
15. Security Classification: Unclassified
16. Distribution Statement: Approved for public release; distribution unlimited. Reproduction in whole or in part is permitted for any purpose of the United States Government.
18. Supplementary Notes: This research was supported by funds from the Air Force Human Resources Laboratory, the Air Force Office of Scientific Research, the Army Research Institute, and the Office of Naval Research, and monitored by the Office of Naval Research.
19. Key Words: achievement testing; criterion-referenced testing; mastery testing; sequential testing; adaptive testing; latent trait theory; tailored testing; item characteristic curve theory; computerized testing; item response theory

20. Abstract: Two procedures for making mastery decisions with variable-length tests and a conventional mastery testing procedure were compared in Monte Carlo simulation. The simulation varied the characteristics of the item pool used for testing and the maximum test length allowed. The procedures were compared in terms of the mean test length needed to make a decision, the validity of the decisions made by each procedure, and the types of classification errors made by each procedure. Both of the variable-test-length procedures were found to result in important reductions in mean test length from the conventional test length. The Sequential Probability Ratio Test (SPRT) procedure resulted in greater test length reductions, on the average, than the Adaptive Mastery Testing (AMT) procedure. However, the AMT procedure resulted both in more valid mastery decisions and in more balanced error rates than the SPRT procedure under all conditions. In addition, the AMT procedure produced the best combination of test length and validity.

CONTENTS

Alternatives to Conventional Mastery Tests
  SPRT Applied to Mastery Testing
    The SPRT
    Ferguson's Application to Mastery Testing
  IRT-Based Adaptive Mastery Testing
    Individualized Item Selection
    Mastery Level
    Making the Mastery Decision Using Bayesian Confidence Intervals
  SPRT versus AMT
    Treatment of Items
    Treatment of Uncertainty
    Treatment of Mastery Level
    Treatment of the Achievement Metric

Method
  Item Pool Generation
    Uniform Pool
    b-Variable Pool
    a- and b-Variable Pool
    a-, b-, and c-Variable Pool
  Mastery Level Conversion
  Item Response Generation
  Conventional Tests
  Simulation of the Testing Strategies
    CT Procedure
    SPRT Procedure
    AMT Procedure
  Comparison Among the Testing Procedures

Results
  Test Length
    Uniform Pool
    b-Variable Pool
    a- and b-Variable Pool
    a-, b-, and c-Variable Pool
  Correspondence with True Mastery Status
    Uniform Pool
    b-Variable Pool
    a- and b-Variable Pool
    a-, b-, and c-Variable Pool
  Correspondence as a Function of Test Length
  Frequency and Type of Errors
    Uniform Pool
    b-Variable Pool
    a- and b-Variable Pool
    a-, b-, and c-Variable Pool

Discussion and Conclusions

References

Appendix: Supplementary Tables

A COMPARISON OF ADAPTIVE, SEQUENTIAL, AND CONVENTIONAL TESTING STRATEGIES

FOR MASTERY DECISIONS

The use of criterion-referenced achievement test interpretation has gained great support within the educational measurement community since its introduction less than two decades ago (Glaser & Klaus, 1962). It is intuitively appealing to educators to be able to measure students' performances against absolute standards of behavior on prespecified learning objectives, and criterion-referenced test interpretation gives educators this capability.

One of the most basic forms of criterion-referenced test interpretation involves classifying students into two categories--one containing students who have achieved a sufficient command of the subject matter (mastery) and the other containing students who have not (nonmastery). Traditionally, students are declared masters when their scores on a conventional classroom achievement test are as high as or higher than a prespecified cutoff level and are declared nonmasters if their scores on the test are lower than the cutoff level. This form of classroom testing has been called mastery testing and can be useful (1) in determining the degree of student proficiency within a classroom and (2) as a diagnostic tool to identify individuals who need further training in specific instructional areas (Nitko & Hsu, 1974).

As traditional mastery testing has been developing its own technology, adaptive testing technology has also developed to allow educators to make maximum use of classroom testing time while reducing to a minimum the amount of time spent on testing. The use of adaptive testing techniques has recently been shown to be effective in reducing test length while obtaining high-fidelity achievement level estimates in several instructional settings (e.g., Bejar, Weiss, & Gialluca, 1977; Brown & Weiss, 1977).

Mastery and adaptive testing technologies have each shown their usefulness in the academic setting for different, but compatible, reasons. It is therefore not surprising that a fusion of the two techniques should occur in order to allow mastery testing to be accomplished in the shortest possible class time while maintaining the accurate decisions necessary for correct diagnoses of student instructional problems.

Alternatives to Conventional Mastery Tests

Two alternatives have been proposed to conventional mastery tests in which all test items are administered to every student: (1) Ferguson's (1969, 1970) application of Wald's (1947) Sequential Probability Ratio Test (SPRT) and (2) Kingsbury and Weiss's (1979) formulation of an item response theory (IRT; Lord & Novick, 1968) adaptive mastery testing method. Both of these testing procedures attempt to accomplish two common ends. First, the procedures seek to shorten the length of the test. Second, the procedures use statistical techniques designed to hold the number of misclassifications (i.e., individuals for whom the wrong decision is made) to some acceptable minimum. The methods by which these two procedures attempt to accomplish these ends are quite different, however.

The very fact that two procedures exist that attempt to accomplish the same basic ends through different techniques renders a comparison of the two methods desirable. The primary objective of this study was a comparison of the efficiency with which these two procedures for mastery testing achieved their goals of reducing test length while obtaining a high percentage of correct decisions.

SPRT Applied to Mastery Testing

The SPRT. Wald's (1947) SPRT was originally designed as a quality control test for use in a manufacturing setting. It was designed to determine whether a large consignment of light bulbs (or any other product) contained a small enough proportion of defective bulbs to pass some prespecified quality criterion, while testing only a small sample of the light bulbs in the consignment. Wald's solution to this problem was to draw light bulbs sequentially from the consignment, to test the light bulb drawn at each stage, and to determine at each stage the relative probabilities of the following two hypotheses:

H_0: p = p_0   [1]

H_1: p = p_1   [2]

where
p = the proportion of defective elements (light bulbs) in the population (consignment);
p_0 = the proportion of defective elements in the population below which it is always desired to accept the quality of the population; and
p_1 = the proportion of defective elements in the population above which it is always desired to reject the quality of the population.

Since each stage of the sampling procedure may be viewed as a Bernoulli trial (given that each element is sampled at random without replacement from the population of equivalent elements and is assigned either nondefective or defective status), the probability of observing a certain number of defective elements in a sample of a certain size, given that either H_0 or H_1 is true, may be described with the binomial probability density function. Consequently, the probability of observing W_m defective elements in a sample of m elements, under H_0: p = p_0, is

P_0m = C(m, W_m) p_0^W_m (1 - p_0)^(m - W_m) ,   [3]

where C(m, W_m) = m!/[W_m!(m - W_m)!] is the binomial coefficient. Under H_1: p = p_1, the probability becomes

P_1m = C(m, W_m) p_1^W_m (1 - p_1)^(m - W_m) .   [4]

The ratio of these two probabilities yields an index of the relative strengths of the two hypotheses, such that at each stage in the sampling procedure the quality of the consignment may be either rejected or accepted, or sampling of elements may be continued. The stringency of the test is based (1) on the proportion (α) of errors one is willing to tolerate in rejecting the quality of consignments that actually do have the quality desired, and (2) on the proportion (β) of errors one is willing to tolerate in accepting the quality of consignments that do not actually have the minimum acceptable quality.

In its final log form, the test used by the SPRT at each stage of sampling specifies that if

log(P_1m / P_0m) > log[(1 - β) / α] ,   [5]

the consignment is rejected; if

log(P_1m / P_0m) < log[β / (1 - α)] ,   [6]

the consignment is accepted; and if

log[β / (1 - α)] ≤ log(P_1m / P_0m) ≤ log[(1 - β) / α] ,   [7]

sampling continues.

Wald (1947) has shown that this testing procedure results in error levels approximating α and β across consignments. Further, it has been shown that the probability of not obtaining a decision for a consignment approaches zero as the sample size increases.
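
As an illustration, the stage-by-stage test of Equations 5 through 7 is simple to compute, since the binomial coefficients cancel in the probability ratio. The following sketch (in Python; function and argument names are illustrative and not part of the original report) evaluates one stage of the SPRT:

import math

def sprt_decision(w, m, p0, p1, alpha, beta):
    """One stage of Wald's SPRT: w defective elements have been
    observed in a sample of m elements (Equations 3 and 4)."""
    # Log of the ratio P_1m / P_0m; the binomial coefficients cancel.
    llr = (w * math.log(p1 / p0)
           + (m - w) * math.log((1.0 - p1) / (1.0 - p0)))
    if llr > math.log((1.0 - beta) / alpha):   # Equation 5
        return "reject"
    if llr < math.log(beta / (1.0 - alpha)):   # Equation 6
        return "accept"
    return "continue"                          # Equation 7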

Ferguson's application to mastery testing. Ferguson (1969) has applied the SPRT within a mastery testing situation, using students' responses to test items in place of light bulbs and a domain of items that represents an instructional objective instead of a consignment. The quality that Ferguson evaluated was students' command of the content area being tested. Ferguson also branched through an instructional hierarchy, applying the SPRT to various objectives of instruction. The present study, however, will consider only the application of the SPRT to a single instructional unit.

To employ the SPRT in a mastery testing situation, the educator must specify the following:

1. Two performance criterion levels--p_0, serving as the lowest level at which a mastery decision will be made, and p_1, serving as the highest level at which a nonmastery decision will be made. These two values bound the neutral region in which testing will continue.

2. Two levels of error acceptance (α and β), which determine the strictness of the decision test and should reflect the relative costs of the two error types.

3. A maximum test length to constrain the testing time for individuals who are very difficult to classify.

One characteristic of this form of adaptive mastery testing is that it is fairly simple to implement within a classroom situation. The decision rule is easily incorporated into a chart (e.g., Wald, 1947, Sec. 5.3.2) showing the teacher or the student how many questions need to be answered correctly or incorrectly at each test length in order to terminate the test (a sketch of such a chart generator appears below). Once the charts are made for various values of p_0, p_1, α, and β, the statistical work is completed and the SPRT procedure can be readily used by the educator. Since the procedure is sequential, however, it is not fully adaptive. Items are selected at random or in a fixed sequence; it is only the test length that varies for individuals.
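
A minimal sketch of such a chart generator, under the proportion-correct convention used later in this report (the neutral region is bounded by p_lower and p_upper, and reaching the upper boundary accepts the hypothesis p = p_upper, i.e., mastery; all names are illustrative assumptions):

import math

def sprt_chart(p_lower, p_upper, alpha, beta, max_len):
    """For each test length m, tabulate the largest number correct that
    still yields a nonmastery decision and the smallest number correct
    that yields a mastery decision (None where no decision is possible)."""
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    chart = {}
    for m in range(1, max_len + 1):
        mastery_at = nonmastery_at = None
        for w in range(m + 1):  # w = number of correct responses
            llr = (w * math.log(p_upper / p_lower)
                   + (m - w) * math.log((1.0 - p_upper) / (1.0 - p_lower)))
            if llr <= lower:
                nonmastery_at = w          # llr rises with w, so keep the last
            if llr >= upper and mastery_at is None:
                mastery_at = w
        chart[m] = (nonmastery_at, mastery_at)
    return chart

# With the settings used in the Method section: sprt_chart(0.5, 0.7, 0.10, 0.10, 25)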

IRT-Based Adaptive Mastery Testing

The paradigm for adaptive mastery testing (AMT) that Kingsbury and Weiss (1979) have proposed makes use of IRT and Bayesian statistical theory to adapt the mastery test to the individual's level of skill during the testing process. IRT is used (1) to estimate the parameters that most efficiently describe each of the items in the item pool and (2) to select the best items for each testee. Given the IRT parameter estimates, it is possible to apply an adaptive item selection and test termination procedure that will allow mastery decisions that are quite accurate to be made while shortening the length of the test needed for most individuals.

The AMT procedure is based on three integrated procedures (see Kingsbury & Weiss, 1979). These are

1. A procedure for individualizing the administration of test items,

2. A method for converting a traditional (proportion-correct) mastery level to the latent achievement metric, and

3. A procedure for making mastery decisions using Bayesian confidence intervals.

Individualized item selection. To make mastery testing a more efficient process, it is desirable to reduce the length of each individual's test (1) by eliminating test items that provide little information concerning an individual's achievement level and (2) by terminating the AMT procedure after enough information has been gathered so that the mastery decision can be made with a high degree of confidence. To operationalize this goal, the item to be administered to an individual at any point during the testing procedure is selected on the basis of the amount of information that the item provides concerning the individual's achievement level estimate at that point in the test; that item is the item that most reduces the uncertainty in the person's achievement estimate. A procedure that selects and administers the most informative item at each point in an adaptive test--the maximum information search and selection (MISS) technique--has been described by Brown and Weiss (1977) and is used in the AMT procedure.


The information that an item provides at each point along the achievement continuum may be determined using the IRT model that is assumed to underlie the individual's responses to the test items. The AMT procedure assumes the three-parameter logistic item response function (IRF), or item characteristic curve, model (Birnbaum, 1968). Using this model, the information available in any test item is (Birnbaum, 1968, Equation 20.4.16)

I_i(θ) = (1 - c_i) D² a_i² ψ[DL_i(θ)] Ψ[DL_i(θ)] / {Ψ[DL_i(θ)] + c_i Ψ[-DL_i(θ)]} ,   [8]

where
I_i(θ) = the information available from item i at any achievement level θ;
c_i = the lower asymptote of the IRF for the item;
D = 1.7, a scaling factor used to allow the logistic IRF to closely approximate a normal ogive;
a_i = the discriminatory power of the item at the inflection point of the IRF;
ψ = the logistic probability density function;
L_i(θ) = a_i(θ - b_i), where b_i is the difficulty of the item; and
Ψ(x) = [1 + exp(-x)]⁻¹, the cumulative logistic function.

If it is assumed that the achievement level estimate (θ̂) is the best estimate of the actual achievement level (θ), the item information for each of the items not yet administered may be evaluated at any point during the test by substituting θ̂ for θ in Equation 8. The item that has the highest information value at the individual's current level of θ̂ is thus chosen to be administered next. For this study, a Bayesian estimator of the individual's achievement level, developed by Owen (1969), was used.
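
A sketch of this selection step, assuming a pool of (a, b, c) parameter triples (all names are illustrative). The information is computed as P'(θ)² / [P(θ)Q(θ)], which is algebraically equivalent to Equation 8:

import math

D = 1.7  # scaling factor, as in Equation 8

def item_information(theta, a, b, c):
    """Information of a three-parameter logistic item at theta."""
    psi = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))  # cumulative logistic at DL(theta)
    p = c + (1.0 - c) * psi                             # probability of a correct response
    dp = (1.0 - c) * D * a * psi * (1.0 - psi)          # slope of the IRF at theta
    return dp * dp / (p * (1.0 - p))

def select_next_item(theta_hat, pool, administered):
    """MISS-style step: pick the not-yet-administered item with the
    greatest information at the current achievement estimate."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical (a, b, c) triples:
pool = [(1.0, 0.0, 0.2), (1.5, -0.5, 0.2), (0.5, 1.0, 0.2)]
next_item = select_next_item(0.3, pool, administered={0})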

Mastery level. The classical mastery testing procedure specifies a percentage of the items on a test that must be correctly answered by an individual in order for him/her to be declared a master. Using IRT, it is possible to generate an analogue to the percentage-correct mastery level of classical theory for use in adaptive testing, even though the use of MISS will tend to result in each person answering about 50% of the items correctly, given a large enough item pool (because the items administered will most probably have difficulty levels very close to the individual's level of θ). The analogue is based on the use of the test response function (TRF), or test characteristic curve (Lord & Novick, 1968). The TRF is the function that relates the achievement continuum to the proportion of correct answers that a person at any level of θ may be expected to obtain if all of the items on the test are administered.

For this procedure, the assumption was made that a three-parameter logistic ogive described the functional relationship between the latent trait (achievement) and the probability of observing a correct response to any of the items on the test (i.e., the IRF). This assumption yields a TRF of the following form:

E(P|θ) = (1/n) Σ_{i=1}^{n} { c_i + (1 - c_i) / (1 + exp[1.7 a_i (b_i - θ)]) } ,   [9]


where
E(P|θ) = the expected value of the proportion of correct answers observed on the test, given any achievement level;
n = the number of items on the test;
c_i = the estimate of the lower asymptote of the IRF for item i;
a_i = the estimate of the discrimination of item i;
b_i = the estimate of the difficulty of item i; and
θ = any given achievement level.

This monotonically increasing function enables expressing any given level of θ as its most likely proportion correct or, more importantly in this context, determining the level of θ that will most probably result in any given proportion of correct answers (Kingsbury & Weiss, 1979, pp. 3-4). Once this determination is made, IRT and its technology may be used to increase the efficiency of present mastery testing techniques.
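
Because the TRF of Equation 9 is monotone in θ, the conversion can be carried out numerically with any one-dimensional root finder. A minimal sketch (function names are illustrative; the root search uses scipy):

import math
from scipy.optimize import brentq

def trf(theta, items):
    """Equation 9: expected proportion correct at theta over the item set."""
    return sum(c + (1.0 - c) / (1.0 + math.exp(1.7 * a * (b - theta)))
               for a, b, c in items) / len(items)

def theta_cutoff(prop_cutoff, items):
    """Find the theta whose expected proportion correct equals the
    prespecified cutoff (the lines A and B of Figure 1)."""
    return brentq(lambda t: trf(t, items) - prop_cutoff, -6.0, 6.0)

# For 100 identical items with a = 1.0, b = 0.0, c = .2 (the uniform
# pool of the Method section), a .60 cutoff converts to theta = 0.0,
# matching the conversion reported there:
theta_m = theta_cutoff(0.60, [(1.0, 0.0, 0.2)] * 100)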

Figure 1 shows the TRF for a hypothetical test, which may be used to illustrate how the transformation of a specified cutoff score from the proportion-correct metric to the latent achievement (θ) metric can be made. In Figure 1, a horizontal line (A) has been drawn from the prespecified cutoff score (70% correct) to the TRF.

[Figure 1. Hypothetical test response function illustrating the conversion from the proportion-correct metric to the achievement metric. The vertical axis gives the expected proportion of correct answers (0 to 1.0); the horizontal axis gives achievement (θ) level (-4.0 to 4.0).]

From this point of intersection, a vertical line (B) has been drawn to the θ metric. The point of intersection on the achievement metric (θ_m) is then the point on the θ metric to be used as the cutoff value, since a score higher than θ_m is most likely to yield a proportion-correct score above the cutoff score and a score lower than θ_m is most likely to yield a proportion-correct score below the cutoff score.

Making the mastery decision using Bayesian confidence intervals. Although any achievement level estimate obtained using IRT-based scoring of any subset of the items from a test will be on the same metric as the TRF for the original test, two different subsets of items may result in achievement level estimates that are not equally informative. For example, if one test consisted of many items and the other used only a few items, the longer test would probably yield a more precise θ estimate, provided that the items in the two tests had similar IRFs. Thus, IRT-based θ estimates that are on the same metric are comparable except for their differential precision. Comparisons of IRT-based θ estimates should therefore be based on confidence interval estimates instead of point estimates.

For this reason, the AMT strategy makes mastery decisions with the use of Bayesian confidence intervals. Specifically, after each item is selected and administered to an individual (using MISS to choose the appropriate item at each point in the test), a point estimate of the individual's achievement level (θ̂) may be determined using Owen's Bayesian scoring algorithm, based on information gained from all items administered previously. Given this point estimate and the corresponding variance estimate, also obtained using Owen's procedure, a Bayesian confidence interval may be defined such that

θ̂_i - 1.96 σ̂_i < θ < θ̂_i + 1.96 σ̂_i , with p = .95 ,   [10]

where
θ̂_i = the Bayesian point estimate of achievement level, calculated following item i;
σ̂_i² = the Bayesian posterior variance following item i (σ̂_i is its square root); and
θ = the true achievement level.

Equation 10 may be interpreted as meaning that the probability is .95 that the true value of the achievement level parameter, θ, is within the bounds of the confidence interval. It might also be said that there is 95% confidence that the true parameter value lies within the confidence interval.

After this confidence interval has been generated, it is a simple matter to determine whether or not θ_m, the achievement level earlier designated as the mastery level on the θ metric, falls outside the limits of the confidence interval. If it does not, the AMT procedure administers another item to the individual and recalculates the confidence interval. This procedure continues until, after some item has been administered, the calculated confidence interval does not include θ_m, the mastery level on the achievement continuum. At this point the testing procedure terminates and a mastery decision is made. If the lower limit of the confidence interval falls above the specified mastery level, θ_m, the individual is declared a master; if the upper limit of the confidence interval falls below θ_m, the individual is declared a nonmaster.


Given a finite item pool size, the testing procedure may exhaust the pool before a decision can be made in this manner. It is possible to make a decision concerning mastery for any of these individuals based on whether the Bayesian point estimate of their achievement level (θ̂) is above or below the specified mastery level, θ_m. These decisions, however, cannot be made with the same degree of confidence as those made with confidence intervals that do not contain the mastery level.

At any stage during the testing procedure, the individual's θ̂ estimate will fall to one side or the other of the specified mastery cutoff, so for any one person only half of the dual inequality in Equation 10 needs to be evaluated at any time during the test. Thus, the decision rule implied by Equation 10 is functionally equivalent to using a one-sided Bayesian confidence interval with p = .975. The error rate that the procedure actually produces is examined further below.
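
The stopping rule itself is compact. A sketch, assuming the posterior mean and variance are supplied by the Bayesian scoring step (Owen's update is not reproduced here; all names are illustrative):

def amt_decision(theta_hat, post_var, theta_m, z=1.96):
    """AMT stopping rule (Equation 10): decide once the confidence
    interval around theta_hat excludes the mastery level theta_m."""
    half_width = z * post_var ** 0.5
    if theta_hat - half_width > theta_m:
        return "master"        # whole interval above the mastery level
    if theta_hat + half_width < theta_m:
        return "nonmaster"     # whole interval below the mastery level
    return "continue"          # interval still contains theta_m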

SPRT versus AMT

The two mastery testing strategies described above differ in a number of characteristics. The most salient of these differences are as follows:

1. Treatment of the items in the domain.
2. Treatment of the uncertainty of decisions.
3. Treatment of the mastery level.
4. Treatment of the achievement metric.

Treatment of items. The SPRT, as outlined above, treats all of the items in the mastery test as if they were perfect replicates of each other. Thus, an individual's response to a particular item is viewed solely as a probabilistic function of the individual's true mastery status. This assumption is most appropriate in the production setting for which Wald originally designed his procedure--that is, each light bulb can be expected to be like every other light bulb. The assumption may be less tenable in the mastery testing situation, where an individual's responses to test items may vary as a function of differential characteristics of the items themselves, as well as of the individual's mastery status.

The AMT procedure assumes that if items differ, their individual characteristics may be described by a logistic ogive that varies as a function of the item's power to discriminate among individuals with different achievement levels (a), the item's difficulty (b), and the ease with which an individual may answer the item correctly with no knowledge of the subject matter (c). This assumption concerning the operating characteristics of the items is less restrictive than the assumption made in the SPRT; but to the extent that the items do not conform to the logistic form specified, the assumption might still restrict the efficiency of the AMT procedure.

Both mastery testing procedures, therefore, postulate some systematic similarities among the test items. To the extent that one of the postulations is closer to the actual state of the world than the other, it might be expected that the corresponding procedure would perform more efficiently. Thus, the characteristics of the item pool itself are the first point at which the two testing strategies diverge.


Treatment of uncertainty. The SPRT makes use of traditional hypothesis testing methods to determine the point at which an individual's item responses are sufficient evidence for making a decision concerning his/her mastery status. Here "sufficient" is defined in terms of the α and β error rates that are acceptable for the group of students being tested. α and β may be set independently to reflect the educator's concerns over the relative costs of the two error types.

The AMT procedure uses a symmetric Bayesian confidence interval to make the mastery decision. This functionally sets α equal to β and, by doing so, implies equal costs for the two error types. To the extent that the costs of the two error types are not equal, the SPRT procedure provides the educator with more flexibility than the AMT procedure as currently operationalized.

Treatment of mastery level. The SPRT uses a neutral region, rather than a single mastery level, to define the mastery and nonmastery regions. The specification of this neutral region is based on a decision by the educator concerning the range that appropriately reflects uncertainty as to whether the student's performance is actually the performance of a master or a nonmaster. By contrast, the AMT procedure defines a single mastery level and determines whether an individual is significantly above or below that level using a Bayesian confidence interval.

This difference between the two testing procedures renders tentative any comparison that might be made. The performance of the SPRT procedure will vary widely as a function of the uncertainty band chosen, whereas for the AMT technique this uncertainty is not directly taken into account. Any comparison between the two techniques is therefore conditional upon the width and absolute bounds of the uncertainty region.

Treatment of the achievement metric. The decisions made by the SPRT are dependent on the percentage of items that are correctly answered for any specific test length. Thus, the metric of achievement assumed in this procedure is the proportion-correct metric. The AMT procedure assumes, because of the differential properties of the items in the item pool, that there is a nonlinear transformation of the proportion-correct metric that more accurately represents the achievement of the individuals taking the test. This latent continuum (the θ metric) serves as the achievement metric for the AMT procedure.

This difference in the achievement metric again renders comparisons between the two procedures somewhat difficult, since the "true" achievement levels of individuals must be postulated to fit one of these metrics. Any differences noted in the performance of the two procedures may be due to this difference in the achievement metrics assumed.

Method

Monte Carlo simulation was used to delineate circumstances in which one of the mastery testing procedures might have an advantage over the other. The method used to compare the two variable-length mastery testing procedures (AMT and SPRT) to one another, as well as to a conventional (fixed-length) testing procedure, consisted of five basic steps:

1. Four item pools were generated in which the items differed from one another to different degrees.

2. Item responses were generated for 500 simulated subjects (simulees) for each of the items in the four item pools.

3. Conventional tests of three different lengths were drawn at random from the larger item pools; these conventional tests served as item pools from which the SPRT and AMT procedures drew items.

4. The AMT and SPRT procedures were simulated for each of the four different item pool types and the three conventional test lengths.

5. Comparisons were drawn among the three types of tests (AMT, SPRT, conventional) concerning the degree of correspondence between the decisions made by the three test types and the true mastery status. Further comparisons were made based on the average test length that each test type required to reach its decisions.

Item Pool Generation

Four 100-item pools were generated to reflect different types of pools that might be used in a mastery test. Items were randomly ordered during the generation of each item pool.

Uniform pool. The uniform pool consisted of 100 items that were perfect replications of one another. Each item had a discrimination (a) of 1.0, a difficulty (b) of 0.0, and a lower asymptote (guessing level, or c) of .2. This pool was designed to correspond to the SPRT procedure's assumption that all items in the test are similar.

b-variable pool. The b-variable pool varied from the uniform pool only in that the items had a range of difficulty levels. Eleven values of b were assigned to an approximately equal number of items in the pool. The values of b chosen were -2.5, -2.0, -1.5, -1.0, -.5, 0.0, .5, 1.0, 1.5, 2.0, and 2.5. Nine items at each level of difficulty were used in this pool, along with an additional item with b = 0.0 to bring the pool to 100 items. This item pool structure is shown in Appendix Table A.

a- and b-variable pool. The a- and b-variable pool differed from the b-variable pool in that the discriminations (a) of the items differed across a range of values. The a values used were .5, 1.0, 1.5, and 2.0. Each level of discrimination was equally represented in the item pool, and each level of item discrimination occurred with approximately the same frequency at each level of difficulty. This item pool structure is shown in Appendix Table B.

a-, b-, and c-variable pool. The a-, b-, and c-variable pool differed from the a- and b-variable pool in that the guessing levels of the items were allowed to spread across a range of values. The c values used were .1, .2, and .3. All c values were approximately equally represented. The parameter estimates were arranged such that each level of difficulty was represented by items that had approximately the same average a level and the same average c level (i.e., the pool was approximately rectangular). This item pool structure is shown in Appendix Table C.

Mastery Level Conversion

Each of the four item pools described above was designed so that a proportion-correct mastery level of .60 would correspond to an achievement level of 0.0 when conversion via the TRF was done. Figures 2a through 2d show the TRFs for the uniform; b-variable; a- and b-variable; and a-, b-, and c-variable item pools, respectively. These figures were developed by plotting the values of E(P|θ) given by Equation 9 as a function of θ, using each of the 100 items that made up each item pool to evaluate Equation 9. In each of these figures, the .6 proportion-correct mastery cutoff has been converted to the θ metric. In each item pool the converted value observed was θ_m = 0.0, as expected.

It should also be noted from these four figures that the TRF for the uniform item pool was quite different from the TRFs for the other three item pools (which were quite similar to one another). The TRF for the uniform item pool was much steeper for θ levels close to 0.0. This was to be expected, since each item in the uniform item pool had a difficulty level of 0.0, whereas in the other item pools the difficulties of the items differed from one another and each difficulty level was approximately equally represented. The uniform item pool thus has all of the characteristics of a peaked test, concentrating its ability to differentiate contiguous θ levels immediately around θ = 0.0. The other item pools have the characteristics of rectangular tests, with more widespread measurement capacity. This spreading of the measurement capacity is represented directly in the shallow, steadily increasing TRFs for these three item pools.

[Figure 2. Test characteristic curves for the four item pools: (a) uniform; (b) b-variable; (c) a- and b-variable; (d) a-, b-, and c-variable. Each panel plots the expected proportion of correct responses against achievement level (θ), with the .60 mastery cutoff indicated by a dashed line.]

Item Response Generation

Achievement levels for 500 simulees were drawn from a normal distribution with a mean of 0.0 and a standard deviation of 1.0. Item responses for each of these simulees were then generated for each item in each of the four item pools using the three-parameter logistic IRT model. That is, knowing the θ level of the simulee and the parameters of the item in question, the expected probability of a correct response was calculated by applying Equation 9, above, to the single item. A random number was then drawn from a uniform distribution ranging from 0 to 1. If this number was lower than the probability of a correct response, the simulee was given a correct response to the item. If the number was higher than the correct-response probability, the simulee was given an incorrect response.

Thus, in this study the achievement metric and the item response generator correspond closely to the model assumed by IRT. The "true" mastery status of each simulee was determined by comparing the θ level used to generate the item responses with the proportion-correct mastery level expressed on the θ metric.
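
This generation scheme is straightforward to reproduce. A sketch using Python's standard library (the pool shown is the uniform pool; all names are illustrative):

import math
import random

def simulate_response(theta, a, b, c, rng):
    """Dichotomous 3PL response: score the item correct when a
    uniform draw falls below the probability of a correct response."""
    p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
    return 1 if rng.random() < p else 0

rng = random.Random(0)                                # fixed seed for reproducibility
pool = [(1.0, 0.0, 0.2)] * 100                        # uniform pool: identical items
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]    # 500 simulees from N(0, 1)
responses = [[simulate_response(t, a, b, c, rng) for (a, b, c) in pool]
             for t in thetas]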

Conventional Tests

Conventional tests (CTs) of three different lengths (10, 25, and 50 items) were drawn at random from each of the four item pools, with the stipulation that the shortest test served as the first portion of the next longer test and that this test in turn served as the first portion of the longest test. For each of the three variable pools (Appendix Tables A, B, and C), Items 1 to 10 comprised the 10-item CT, Items 1 to 25 comprised the 25-item CT, and Items 1 to 50 comprised the 50-item CT. (This procedure was not necessary for the uniform pool, since all items were identical.) These 12 CTs served as subpools from which the AMT and SPRT procedures drew items during the simulations. This random sampling from a larger domain of items was designed to correspond to the traditional mastery testing paradigm and to the random sampling model underlying the SPRT procedure.

Simulation of the Testing Strategies

Using the item response data for the 500 simulees and the item parameters available for each of the items (for the AMT procedure), the three testing strategies (AMT, SPRT, CT) were employed to make mastery decisions for each individual. Each testing procedure was used with each of the 12 subpools.

CT procedure. Two different decision processes were used with the CTs to examine the effect of choosing an achievement metric. The first decision process used proportion-correct scores (CT/PC) and a mastery criterion of 60% correct responses. After all of the items in the CT were administered, if the simulee answered 60% or more of the items correctly, the simulee was declared a master. If the simulee's score was less than 60% correct, the simulee was declared a nonmaster. This decision rule corresponded to the normal classroom use of conventional mastery tests with the proportion-correct achievement metric.

The second decision process used with the CTs employed the latent achievement continuum and Bayesian scoring (CT/B) to make mastery decisions. Using the TRF, the proportion-correct mastery level was converted to a mastery level on the θ metric. Then, after all of the items in the CT were administered to a simulee, the obtained response pattern was scored using the known IRF parameters for the items and Owen's Bayesian scoring procedure, assuming a prior distribution having a mean of 0.0 and a variance of 1.0. If the final θ estimate was higher than the mastery level on the θ metric, the simulee was declared a master. If the final estimate was lower than the mastery level, the simulee was declared a nonmaster.

SPRT procedure. For the SPRT procedure, the limits of the neutral region were set at proportion-correct values of .5 and .7. Values of α and β were each set to .10. For simulees for whom no decision was made by the SPRT procedure before the item pool was exhausted, the mastery decision was made in the same way as it was for the conventional testing procedure, using a mastery proportion-correct value of .6.

AMT procedure. For the AMT procedure, the mastery levels in each of the 100-item pools corresponding to 60% correct were determined from the TRF. This mastery level was used with each of the smaller item pools, even though they had not been designed to result in the same mastery level on the θ metric. This procedure added some sampling error to the AMT procedure in order to more appropriately reflect the error inherent in using estimated item parameters to determine the mastery level. As with the Bayesian scoring of the CT, each individual was given a prior distribution with a mean of 0.0 and a variance of 1.0 for the Bayesian scoring of the adaptive test.

Comparison Among the Testing Procedures

For each of the three testing procedures (AMT, SPRT, and CT), the value of the procedure may be judged by the average length of the test required to make the mastery decision and by how well the decisions that are made reflect the true state of nature. Specifically, the AMT and SPRT procedures were compared in terms of the average reduction in the length of the test required to make mastery decisions across the entire group of individuals. Further, all three procedures were compared in terms of how well the decisions they made corresponded with the true mastery status of the simulees.

Comparisons within each testing procedure concerning the average test length and the correspondence of decisions with true mastery status were made across all 12 combinations of test lengths and item pool types. For the CT procedure, the correspondence of decisions with true mastery status was examined using both the CT/PC and CT/B scores.

Results

Test Length

Table 1 shows the mean test length required by each of the testing procedures to make a decision concerning the mastery status of the simulees in the test group.

Uniform pool. As can be seen from Table 1, the AMT procedure resulted in some test length reduction at each maximum test length (MTL), with the reduction in test length increasing as the MTL increased. For the 10-item MTL, the percentage by which the CT length was reduced was 9.7%; for the 25-item MTL the reduction was 36%; and for the 50-item MTL the observed reduction was 54%.

For the SPRT procedure, also, increasing test length reduction was noted as MTL increased, and some reduction was noted at each level of MTL. For the 10-item MTL, the reduction observed was 12%. The 25-item MTL resulted in a 48% reduction. For the 50-item MTL the reduction was 69%. At all MTL levels the SPRT procedure resulted in a greater reduction of test length than the AMT procedure.

b-variable pool. For the pool in which the difficulty levels of the items differed, the data in Table 1 show the same trends that were noted for the uniform pool. The AMT procedure reduced test length at each MTL, and the reduction increased with the MTL level. For the 10-item, 25-item, and 50-item MTL levels, the AMT procedure reduced test length by 6%, 28%, and 46%, respectively.

The SPRT procedure also reduced test length at each MTL level; however, in this case, the reductions were larger for the longer MTL levels, relative to those obtained with AMT. At the 10-item, 25-item, and 50-item MTL levels, the test length reductions observed were 4%, 33%, and 57%, respectively.


Table 1
Mean Number of Items Administered to Each Simulee for Three Mastery
Testing Strategies Using Each Item Pool, at Three Maximum Test Lengths

                                   Maximum Test Length
Item Pool and Testing Strategy      10       25       50
Uniform Pool
  Conventional                    10.00    25.00    50.00
  AMT                              9.03    15.99    23.00
  SPRT                             8.75    13.12    15.39
b-Variable Pool
  Conventional                    10.00    25.00    50.00
  AMT                              9.43    18.09    27.17
  SPRT                             9.62    16.79    21.41
a- and b-Variable Pool
  Conventional                    10.00    25.00    50.00
  AMT                              8.55    15.78    24.07
  SPRT                             9.41    15.78    18.55
a-, b-, and c-Variable Pool
  Conventional                    10.00    25.00    50.00
  AMT                              8.73    16.35    23.39
  SPRT                             8.62    13.42    15.70

For this pool the AMT procedure resulted in a slightly greater reduction in test length (an average of 9.43 items administered) than the SPRT procedure (an average of 9.62 items) at the 10-item MTL level, whereas the SPRT procedure resulted in greater test length reductions for the longer MTL levels. Across all MTL levels, both procedures reduced test length somewhat less for this item pool than for the uniform item pool.

a- and b-variable pool. The data in Table 1 indicate that the AMT procedure resulted in test length reduction at each MTL level with this item pool. Test length reduction was greater for the larger MTL levels. For the 10-item, 25-item, and 50-item MTL levels, the reductions in test length were 14%, 37%, and 52%, respectively.

For the SPRT procedure, test length reduction was again observed, increasing with MTL. The reductions in test length noted were 6%, 37%, and 63%, respectively, for the 10-item, 25-item, and 50-item MTL levels.

With this item pool, the AMT procedure produced a greater decrease in test length (a mean of 8.55 items administered) than the SPRT procedure (a mean of 9.41 items) for the 10-item MTL. For the 25-item MTL level, both procedures resulted in the same mean test length (15.78 items), but the SPRT procedure resulted in the greater test length reduction for the 50-item MTL. The two testing procedures resulted in mean test lengths that were, on the average, slightly longer than those observed for the uniform pool but shorter than those observed for the b-variable pool.


a-, b-, and c-variable pool. Table 1 shows that when the AMT procedure was used with this item pool, test length was again reduced at each MTL, with the reduction greater for the longer MTL levels. For the 10-item, 25-item, and 50-item MTL levels, the observed reductions in test length were 13%, 35%, and 53%, respectively.

For the SPRT procedure with this item pool, test length reduction was once more observed, with increasing reduction as the MTL increased. The reductions noted were 14%, 46%, and 69%, respectively, for the 10-item, 25-item, and 50-item MTL levels.

For this item pool the SPRT procedure terminated using a smaller average number of items at each MTL. Further, the degree of test length reduction in this pool for both procedures at all MTL levels was quite similar to that observed for the uniform item pool.

Correspondence with True Mastery Status

For each of the simulees in the sample, the true θ level (the θ level that was used to generate the item responses) was known. Given this, it was known whether the individual's θ level was actually above or below the mastery level as determined on the achievement metric (θ = 0.0). Phi correlations between true mastery status and the mastery status (correspondence coefficients) determined by each of the testing procedures for each MTL level and pool type are shown in Table 2; a computational sketch of the coefficient appears below.
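
A sketch of how such a correspondence coefficient can be computed from the 2 x 2 table of true by decided mastery states (the function name is illustrative):

import math

def phi_correlation(true_state, decided_state):
    """Phi coefficient between true and decided mastery (1 = master,
    0 = nonmaster), computed from the 2 x 2 classification table."""
    n11 = sum(1 for t, d in zip(true_state, decided_state) if t == 1 and d == 1)
    n10 = sum(1 for t, d in zip(true_state, decided_state) if t == 1 and d == 0)
    n01 = sum(1 for t, d in zip(true_state, decided_state) if t == 0 and d == 1)
    n00 = sum(1 for t, d in zip(true_state, decided_state) if t == 0 and d == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0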

Uniform pool. The major trend observed for the uniform pool was that for each testing procedure an increase in the MTL level was accompanied by an increase in the correspondence coefficients. In addition, it was observed that for the 10-item and 25-item MTL levels, the AMT procedure produced the highest correspondence coefficients observed (r = .775 and .840, respectively). For the 50-item MTL level, the CT/PC procedure resulted in the highest correspondence (r = .875). The CT/B procedure resulted in the lowest correspondence coefficient observed at each MTL level. However, the differences in correspondence between MTL levels within any testing procedure were generally larger than the differences noted between testing procedures within a single MTL level.

b-variable pool. The same major trend that was found for the uniform pool was again observed in the b-variable pool: Each testing strategy resulted in higher correspondence as the MTL level increased. However, for this item pool the AMT procedure resulted in the highest correspondence coefficient at each MTL level. The CT/B procedure resulted in the next highest correspondence coefficient for the two highest MTL levels, and the CT/PC procedure tied with the SPRT procedure for the next highest correspondence at the 10-item MTL level.

Differences in correspondence coefficients observed between testing procedures within an MTL level were larger in this pool than in the uniform pool but were still somewhat smaller, on the average, than the differences noted between MTL levels. For this pool each correspondence level observed was lower than for the uniform pool across all MTL levels and testing procedures.

Both the AMT procedure and the CT/B procedure resulted in higher correspondence coefficients at the 25-item MTL level (.715 and .714) than the SPRT procedure did at the 50-item MTL level (.704).

Table 2
Phi Correlations Between Observed Mastery State and True Mastery State
for Each Mastery Testing Strategy, Using Each Type of Item Pool,
at Three Maximum Test Lengths

                                   Maximum Test Length
Item Pool and Testing Strategy      10       25       50
Uniform Pool
  CT/PC                            .771     .837     .875
  CT/B                             .706     .803     .863
  AMT                              .775     .840     .871
  SPRT                             .771     .837     .867
b-Variable Pool
  CT/PC                            .541     .667     .783
  CT/B                             .533     .714     .791
  AMT                              .615     .715     .828
  SPRT                             .541     .656     .704
a- and b-Variable Pool
  CT/PC                            .626     .719     .771
  CT/B                             .638     .763     .788
  AMT                              .638     .756     .778
  SPRT                             .626     .698     .720
a-, b-, and c-Variable Pool
  CT/PC                            .290     .670     .735
  CT/B                             .485     .741     .804
  AMT                              .470     .733     .787
  SPRT                             .290     .592     .571

a- and b-variable pool. Again, each testing strategy resulted in higher correspondence coefficients as MTL increased. In this pool, however, CT/B resulted in the highest correspondence coefficients at each MTL level (tied with the AMT procedure at the 10-item MTL level). The AMT procedure resulted in the next highest correspondence coefficients at the 25-item and 50-item MTL levels.

Differences in correspondence coefficients among the various testing procedures within any MTL level still showed a tendency to be smaller than differences within any particular testing procedure across MTL levels. The correspondence coefficients observed for the CT/B and the AMT procedures at the 25-item MTL level (.763 and .756) were each higher than the correspondence coefficient observed for the SPRT procedure at the 50-item MTL level.

a-, b-, and c-variable pool. The same pattern of increasing correspondence with increasing MTL was again noted for the CT/PC, CT/B, and AMT procedures. For the SPRT procedure the correspondence peaked at r = .592 at the 25-item MTL and dropped to .571 at the 50-item MTL. The CT/B procedure resulted in the highest correspondence coefficients at all three MTL levels. The AMT procedure produced the next highest correspondence for all three MTL levels. The SPRT procedure resulted in the lowest level of correspondence at all MTL levels (tied with the CT/PC procedure at the 10-item MTL level).

Once again, the average difference in correspondence was much greater between MTL levels within testing strategies than between testing strategies within a single MTL level. Further, on the average, the correspondence coefficients for this pool were lower than for the other pools, with rather large decreases at the 10-item MTL level, particularly for the CT/PC and SPRT strategies.

With this item pool, the SPRT testing procedure produced a lower correspondence coefficient at the 50-item MTL level (.571) than any of the other three procedures produced at the 25-item MTL level (.670 for the CT/PC procedure, .741 for the CT/B procedure, and .733 for the AMT procedure).

Correspondence as a Function of Test Length

Figures 3a through 3d combine the observations made above concerning test length and correspondence. These figures show the correspondence coefficients (as reported in Table 2) as a function of the mean number of items administered by each testing strategy at each MTL level (as reported in Table 1), separately for each item pool. A testing strategy can be said to be most efficient to the extent that it yields the highest correspondence coefficient for any given mean test length, or the combination of highest correspondence level and shortest test length.

[Figure 3. Correspondence coefficients as a function of the mean number of items administered by each testing strategy for each item pool: (a) uniform; (b) b-variable; (c) a- and b-variable; (d) a-, b-, and c-variable. Each panel plots the correspondence coefficient against mean test length (5 to 50 items) for the SPRT, AMT, CT/PC, and CT/B strategies.]

For the uniform item pool, Figure 3a indicates that the SPRT procedure was the most efficient, as defined above. At all mean test lengths for which data were available, the SPRT procedure resulted in the combination of highest correspondence coefficient and shortest test length. The correspondence coefficient observed for the SPRT at a mean test length of 15.39 items (the longest mean test length observed for the SPRT procedure) was .867. To achieve this correspondence level (interpolating from the data in Figure 3a), the AMT procedure would need to administer approximately 21 items, the CT/PC procedure about 45 items, and the CT/B procedure more than 50 items.

For the b-variable item pool, Figure 3b indicates that the AMT procedure was the most efficient testing procedure. The correspondence coefficient observed for the AMT procedure using an average test length of 27.17 items (the longest mean test length observed for the AMT procedure) was .828. As Figure 3b shows, this was the highest coefficient observed for any test strategy at any mean test length using this item pool. Each of the CT procedures would have required more than 50 items to achieve this level of correspondence. For the SPRT procedure, the longest mean test length observed was 21.41 items. At this mean test length the SPRT procedure resulted in a correspondence level of .704, whereas at the same test length the AMT procedure would have resulted in a correspondence level (again through linear interpolation) of approximately .745.

For the a- and b-variable item pool, Figure 3c shows that the AMT procedure was again the most efficient test procedure at all observed mean test lengths, achieving the combination of highest correspondence levels and shortest test lengths for all three MTLs. At a mean test length of 24.07 items (the longest mean test length observed for the AMT procedure), the AMT procedure resulted in a correspondence coefficient of .778. The CT/PC procedure would require more than 50 items to produce as high a correspondence level, and the CT/B procedure would require about 39 items. For the SPRT procedure, the longest mean test length observed was 18.55 items. At this test length the SPRT procedure resulted in a correspondence coefficient of .720, whereas interpolation shows that the AMT procedure at the same test length would yield a correspondence coefficient of about .765.

Figure 3d shows that the AMT procedure was also the most efficient testing procedure at all observed mean test lengths with the a-, b-, and c-variable item pool. At its longest observed mean test length, 23.39 items, the AMT procedure resulted in a correspondence coefficient of .787. The CT/PC procedure would need to administer more than 50 items to achieve this correspondence level, and the CT/B procedure would need about 46 items. The longest mean test length observed with the SPRT was 15.70 items, which resulted in a correspondence level of .571. Interpolation of the data in Figure 3d shows that the AMT would probably result in a correspondence coefficient of about .710 at this test length.

The data in Figure 3 also show that the differences in the efficiencies of the testing strategies were much more pronounced in the a-, b-, and c-variable item pool than in any of the other three item pools. For instance, in the a-, b-, and c-variable item pool the scoring system used with the conventional tests made a consistent difference in the correspondence coefficient observed at each test length. The magnitude of this difference was much greater with this item pool than with any other item pool.

Frequency and Type of Errors

To further compare the performance of the mastery testing strategies, the frequency with which each procedure made incorrect decisions of false mastery and false nonmastery was examined; the percentage of decision errors made by each of the testing strategies with each of the item pools at each MTL is shown in Table 3. The "Total" column in Table 3 reproduces in a different manner the information already reported from the correlational analysis: for each situation in which a high correlation was noted, a correspondingly low total error rate appears in Table 3, as expected.
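The tabulation behind Table 3 amounts to counting two kinds of disagreement between true and estimated mastery status. The sketch below illustrates this with made-up decision vectors; the variable names and data are illustrative, not from the study.

```python
# A minimal sketch of the Table 3 tabulation.  False mastery: declared a
# master but truly a nonmaster; false nonmastery: declared a nonmaster
# but truly a master.  Both rates are percentages of all simulees.

def error_rates(true_mastery, decided_mastery):
    n = len(true_mastery)
    false_mastery = sum(1 for t, d in zip(true_mastery, decided_mastery)
                        if d and not t)
    false_nonmastery = sum(1 for t, d in zip(true_mastery, decided_mastery)
                           if t and not d)
    return (100.0 * false_mastery / n,
            100.0 * false_nonmastery / n,
            100.0 * (false_mastery + false_nonmastery) / n)  # "Total"

true_status = [True, True, False, False, True, False]    # made-up data
decisions   = [True, False, True, False, True, False]    # made-up data
print(tuple(round(r, 1) for r in error_rates(true_status, decisions)))
# (16.7, 16.7, 33.3): one error of each type among six simulees
```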

Uniform pool. For the uniform pool each of the testing strategies resulted in the same general pattern of errors across MTL levels. Each procedure resulted in fewer errors of each type with increased MTL. The difference in the frequencies of false mastery and false nonmastery decisions was smaller with higher MTL levels for all procedures except the CT/B procedure. The differences among the procedures in terms of the types of false decisions made were minimal, again except for the CT/B procedure, which resulted in higher total error rates at each MTL level than any other testing strategy.

[Table 3. Percentage of False Mastery, False Nonmastery, and Total Decision Errors Made by Each Testing Strategy for Each Item Pool at Each MTL; the body of the table is not legible in the source scan.]

b-variable pool. For this item pool the patterns of errors made by the different testing strategies were less regular than in the uniform pool. The CT/PC and SPRT procedures produced more false mastery than false nonmastery decisions at all MTL levels. The AMT procedure produced more false mastery than false nonmastery decisions at the 10-item MTL level, but produced more false nonmastery than false mastery decisions at the two higher MTL levels. The CT/B procedure produced more false mastery than false nonmastery decisions for the two lowest MTL levels, but the trend was reversed for the highest MTL level. For the AMT procedure and the CT/B procedure the discrepancy in the frequencies of the two types of errors was smaller than for the other two procedures at all three MTL levels and was quite small (less than 2%) for the two higher MTL levels. For the CT/PC procedure the difference in the frequencies of the two types of errors was quite small at the highest MTL level; but for the SPRT procedure, a fairly large discrepancy between the two error rates (8% to 20%) was observed at each MTL.

In all testing conditions but one (AMT with a 25-item MTL), the use of the b-variable item pool resulted in higher discrepancies between the two observed error rates (as well as higher absolute error rates) than when the uniform pool was used.

a- and b-variable pool. Using this item pool, the SPRT procedure resulted in more false mastery decisions than false nonmastery decisions for all three MTL levels. The CT/B procedure resulted in a predominance of false mastery decisions for the two lowest MTL levels and a predominance of false nonmastery decisions at the 50-item MTL level. The CT/PC procedure produced a greater percentage of false mastery decisions at the 10-item and 50-item MTL levels, but a greater percentage of false nonmastery decisions at the 25-item MTL level. The AMT procedure resulted in a greater percentage of false mastery decisions at the 10-item MTL level, but a greater percentage of false nonmastery decisions at the 25-item and 50-item MTL levels.

Small discrepancies in error rates (less than 2%) were observed for the CT/B procedure at all MTL levels, for the AMT procedure at the 10-item and 25-item MTL levels, and for the SPRT procedure and the CT/PC procedure at the 25-item MTL level only. The SPRT procedure resulted in the largest discrepancy in error rates at all MTL levels (tied with the CT/PC procedure at the 10-item MTL).

a-, b-, and c-variable pool. For this item pool, each of the testing procedures resulted in higher frequencies of false nonmastery decisions than false mastery decisions for the 10-item and 25-item MTLs. For the 50-item MTL the CT/PC procedure resulted in a higher frequency of false mastery decisions, but the CT/B, AMT, and SPRT procedures still resulted in higher percentages of false nonmastery decisions.

The AMT procedure used with this item pool resulted in smaller differences in the frequencies of the two error types than any of the other testing procedures at the 50-item MTL level. The CT/B and AMT procedures tied for the lowest discrepancy in error rates (3%) at the 25-item MTL level. The CT/B procedure resulted in the lowest discrepancy in error rates (9.6%) at the 10-item MTL level. For the 50-item MTL level the AMT procedure produced a very small difference in the two error rates (.6%); the CT/PC procedure also produced a small difference in the two error rates at this level (1.6%). The SPRT procedure resulted in the highest difference between the two error rates at all MTL levels (tied with the CT/PC procedure at the 10-item MTL).


One interesting result was observed when the errors made with the b-variable item pool were compared with those made using the a-, b-, and c-variable item pool. For the b-variable pool, each of the testing procedures was more likely to make false mastery decisions than false nonmastery decisions. This tendency was reversed for the a-, b-, and c-variable item pool, where each of the procedures made more false nonmastery decisions than false mastery decisions. These trends were most noticeable for each of the testing procedures at the 10-item MTL level and most noticeable for the SPRT procedure across all MTL levels.

It is probable that these trends were artifacts of the random sampling of items used to create the conventional tests, since the shorter conventional tests would be less representative of the item domain due to the small sample of items taken. The results obtained here would be explained by a very easy 10-item conventional test being drawn from the b-variable pool and a very difficult 10-item test being drawn from the a-, b-, and c-variable pool. In fact, the mean b value for the 10-item conventional test drawn from the b-variable pool was -.80; for the a-, b-, and c-variable pool, it was 1.25. This would also explain the observation that the SPRT procedure most clearly showed these trends, since the SPRT procedure used shorter test lengths, on the average, than the other two procedures to make its final decisions and therefore was most prone to small-sample artifacts.
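The plausibility of this small-sample explanation is easy to check by simulation. The sketch below assumes a pool whose b values are spread evenly over -2.5 to +2.5, as in the b-variable pool of Table A, and examines how far the mean difficulty of a randomly drawn 10-item test can stray from the pool mean of 0.0.

```python
import random

# Approximate the b-variable pool: 100 b values evenly spaced on [-2.5, 2.5].
pool_b = [-2.5 + 5.0 * i / 99 for i in range(100)]

random.seed(1)
means = sorted(
    sum(random.sample(pool_b, 10)) / 10   # mean b of one random 10-item test
    for _ in range(10_000)                # many random "conventional tests"
)

# Middle 95% of mean difficulties across random 10-item draws:
print(round(means[250], 2), round(means[9750], 2))   # roughly -0.9 and +0.9
```

A mean b of -.80 for one particular 10-item draw, as observed for the b-variable pool, is therefore well within ordinary sampling variability.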

Discussion and Conclusions

Several trends were noted in the data concerning the performance of the different testing strategies with the four different item pools. In every instance the AMT and SPRT procedures produced reductions in the mean test length required to make mastery decisions. This reduction increased with the MTL in each item pool. The AMT procedure resulted in reductions of 6% to 54% from the length of the conventional test; the SPRT procedure resulted in reductions of 4% to 69%. On the average, the SPRT procedure required fewer items to make the mastery decision.
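The reductions quoted here are simple percentage decreases from the conventional test length. As a check, the sketch below recovers the upper end of the SPRT range from the mean test length reported earlier; pairing the 15.39-item mean with the 50-item MTL is an inference from the text, not a value stated directly.

```python
# Percentage reduction in mean test length relative to the conventional MTL.
def reduction(mtl, mean_length):
    return 100.0 * (mtl - mean_length) / mtl

# The SPRT's longest observed mean length with the uniform pool was 15.39
# items; against a 50-item MTL this is a 69% reduction, matching the upper
# end of the 4%-to-69% range reported above.
print(round(reduction(50, 15.39), 1))   # 69.2
```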

The correspondence between the estimated mastery status and the true mastery status systematically increased with MTL for all testing procedures in each item pool (with the exception of the SPRT procedure used with the a-, b-, and c-variable item pool). Either the AMT procedure or the CT/B procedure resulted in the highest level of correspondence in all circumstances but one (the conventional tests performed best for the 50-item MTL with the uniform pool). On the average, though, the differences between different MTL levels were more pronounced than differences between testing procedures. Further, the type of item pool used had important effects on the correspondence obtained.

In the b-variable and the a- and b-variable item pools, the AMT and CT/B procedures resulted in higher levels of correspondence at the 25-item MTL level than did the SPRT procedure at the 50-item MTL level. For the a-, b-, and c-variable pool, the AMT, CT/B, and CT/PC procedures each resulted in higher correspondence at the 25-item MTL level than did the SPRT procedure at the 50-item MTL level.


When correspondence coefficients were examined as a function of the mean number of items administered by each of the testing strategies, the results differed depending on the type of item pool used. For the uniform item pool the SPRT procedure was the most efficient testing procedure (i.e., it produced the highest correspondence level for any mean test length) for all observed mean test lengths. Using each of the other item pools, the AMT procedure was the most efficient testing strategy at all observed mean test lengths. Both the AMT and SPRT procedures were more efficient than either conventional testing procedure at all observed mean test lengths using the uniform and b-variable item pools. Using the a- and b-variable and a-, b-, and c-variable item pools, only the AMT procedure was consistently more efficient than the conventional testing procedures.

The AMT procedure resulted in the most nearly equal frequencies of the two types of decision errors across most MTL levels and item pools. This was desirable, since both error types were assumed to have the same relative cost. Further, it was noted that the SPRT procedure was most susceptible to small-sample artifacts, resulting in the largest imbalance in the frequencies with which the two types of errors were made.

Prescribing the best of the testing strategies described here requires specifying priorities and conditions. If testing time is at a premium in a course of instruction, then it might be important to shorten the length of tests in the course without much loss of decision accuracy. Both the AMT and SPRT procedures were designed to accomplish this. Results from this study indicate that the performance of these strategies is affected greatly by the characteristics of the items available for use. If a uniform item pool is available in which all items are equal in difficulty, discrimination, and guessing parameters, then the SPRT procedure will require the fewest items while resulting in decisions having correspondence coefficients that are quite comparable to those of the other three procedures. If, however, the item pool includes items with variable a, b, and c parameters, the SPRT procedure results in the shortest tests, but each of the other procedures will make more accurate classifications. Using a realistic item pool of this type, the AMT procedure provides the optimal combination of high decision correspondence and short test length. These factors must be considered before any decision is made as to which procedure is "best."

Several variations on the SPRT procedure and the AMT procedure that were not examined in this study are possible. For instance, Reckase (1980) has suggested that the SPRT procedure might be expanded to use the three-parameter logistic IRT model. This would remove the restrictive assumption from the SPRT model that all test items have identical characteristics. At the same time, this more general response model would allow use of the differential characteristics of the items as a basis for item selection and presentation. Comparison of the performance of the AMT procedure to this expanded SPRT testing procedure should be considered as a subject for future research.
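A hypothetical outline of such an expanded SPRT is given below. It is a sketch, not the procedure evaluated in this study: it accumulates Wald's log-likelihood ratio between a nonmastery ability level theta0 and a mastery level theta1, with each item's contribution computed from its own three-parameter logistic curve. The values of theta0, theta1, the error rates, and the responses are all illustrative assumptions.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def sprt_3pl(responses, items, theta0, theta1, alpha=.05, beta=.05):
    """Wald SPRT between nonmastery (theta0) and mastery (theta1) with
    item-specific 3PL likelihoods; responses are 0/1 item scores."""
    upper = math.log((1 - beta) / alpha)    # decide mastery above this
    lower = math.log(beta / (1 - alpha))    # decide nonmastery below this
    llr = 0.0
    for u, (a, b, c) in zip(responses, items):
        p1, p0 = p3pl(theta1, a, b, c), p3pl(theta0, a, b, c)
        llr += math.log(p1 / p0) if u else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "master"
        if llr <= lower:
            return "nonmaster"
    return "undecided"                      # MTL reached without a decision

# The first three items of Table C, with made-up responses: three correct
# answers on three items are not yet conclusive, so this prints "undecided".
items = [(1.5, 2.5, .1), (1.0, 2.5, .3), (2.0, 0.5, .3)]
print(sprt_3pl([1, 1, 1], items, theta0=-0.5, theta1=0.5))
```

Because the log-likelihood ratio now depends on a, b, and c, items could also be ordered by their expected contribution to the ratio, which is the item-selection possibility noted above.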


References

Bejar, I. I., Weiss, D. J., & Gialluca, K. A. An information comparison of conventional and adaptive tests in the measurement of classroom achievement (Research Report 77-7). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, October 1977.

Birnbaum, A. Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick, Statistical theories of mental test scores. Reading, MA: Addison-Wesley, 1968.

Brown, J. M., & Weiss, D. J. An adaptive testing strategy for achievement test batteries (Research Report 77-6). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, October 1977.

Ferguson, R. L. Computer-assisted criterion-referenced measurement (Working Paper No. 41). University of Pittsburgh, Learning Research and Development Center, 1969. (ERIC Document Reproduction Service No. ED 037 089)

Ferguson, R. L. The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction (Doctoral dissertation, University of Pittsburgh, 1969). Dissertation Abstracts International, 1970, 30, 3856A. (University Microfilms No. 70-4530)

Glaser, R., & Klaus, D. J. Proficiency measurement: Assessing human performance. In R. M. Gagne (Ed.), Psychological principles in system development. Chicago: Holt, Rinehart, & Winston, 1962.

Kingsbury, G. G., & Weiss, D. J. An adaptive testing strategy for mastery decisions (Research Report 79-5). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, September 1979.

Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading, MA: Addison-Wesley, 1968.

Nitko, A., & Hsu, T. C. Using domain referenced tests for student placement, diagnosis, and attainment in a system of adaptive individualized instruction. Educational Technology, 1974, 14, 48-53.

Owen, R. J. A Bayesian approach to tailored testing (Research Bulletin 69-92). Princeton, NJ: Educational Testing Service, 1969.

Reckase, M. D. Some decision procedures for use with tailored testing. In D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference. Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory, 1980.

Wald, A. Sequential analysis. New York: Wiley, 1947.


APPENDIX: SUPPLEMENTARY TABLES

Table A
Item Parameter Values for Each of the Items in the b-Variable Item Pool
(a = 1.0 and c = .2 for every item; only b varies)

Item    b     Item    b     Item    b
  1   -2.5     35   -1.5     68    1.5
  2   -1.0     36   -1.5     69   -0.5
  3   -1.5     37    0.5     70   -0.5
  4    2.5     38   -1.0     71   -0.5
  5   -2.5     39    0.5     72    2.0
  6    0.5     40    1.5     73    2.0
  7   -1.5     41    2.5     74   -1.0
  8   -0.5     42    0.0     75    2.0
  9   -2.5     43    0.5     76    2.0
 10    1.0     44    2.5     77   -2.5
 11    2.0     45    0.0     78    1.5
 12   -1.0     46    0.0     79   -0.5
 13   -1.0     47    0.5     80    2.0
 14    0.5     48   -2.0     81   -2.0
 15    0.0     49    2.5     82    1.0
 16    0.0     50   -2.5     83   -1.5
 17    1.0     51   -0.5     84   -2.0
 18    1.5     52    0.5     85   -1.5
 19    1.5     53   -2.5     86    2.0
 20   -1.5     54    1.0     87    2.0
 21    2.5     55    2.5     88   -1.0
 22   -2.5     56   -2.0     89   -1.0
 23    0.0     57   -2.0     90    1.5
 24   -2.5     58    0.5     91   -1.5
 25   -1.0     59    1.0     92    0.0
 26    2.5     60    1.0     93    0.0
 27    2.5     61   -0.5     94    1.5
 28   -1.0     62   -2.0     95   -0.5
 29   -2.0     63    2.0     96    1.0
 30    1.0     64    0.0     97    0.5
 31   -2.0     65    1.5     98   -2.0
 32    0.0     66   -0.5     99    2.5
 33    1.5     67    1.0    100   -2.5
 34   -1.5
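As a reading aid for these appendix tables, the sketch below evaluates the response probability implied by the parameters of Table A's first item (a = 1.0, b = -2.5, c = .2) under the three-parameter logistic model; the scaling constant D = 1.7 is the usual convention and is assumed here.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Item 1 of Table A: a very easy item (b = -2.5) with guessing level c = .2.
for theta in (-2.5, 0.0, 2.5):
    print(theta, round(p3pl(theta, 1.0, -2.5, 0.2), 3))
# At theta = b the probability is c + (1 - c)/2 = .60; it approaches 1.0
# for able simulees and falls toward the guessing level .2 far below b.
```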


Table B
Item Parameter Values for Each of the Items in the a- and b-Variable Item Pool
(c = .2 for every item)

Item    a     b     Item    a     b     Item    a     b
  1   1.5  -0.5      35   1.0   0.5      68   1.0   2.0
  2   1.0   0.0      36   2.0   2.5      69   1.5  -0.5
  3   0.5  -1.0      37   1.0   0.5      70   0.5   0.5
  4   0.5  -1.5      38   1.5  -2.5      71   0.5   1.5
  5   1.5  -0.5      39   0.5  -2.5      72   1.0  -1.5
  6   1.5   2.0      40   1.0  -2.0      73   1.0  -2.5
  7   1.5  -2.0      41   1.5  -2.0      74   1.0  -1.0
  8   1.5  -1.0      42   0.5   0.0      75   0.5   2.0
  9   0.5  -1.0      43   0.5  -2.5      76   0.5   1.0
 10   1.5   1.0      44   2.0   0.5      77   1.5   1.0
 11   0.5   1.0      45   2.0  -2.0      78   1.0   2.5
 12   1.5  -2.0      46   0.5  -1.5      79   0.5   0.0
 13   2.0   0.0      47   1.0  -1.5      80   1.5   0.5
 14   0.5  -0.5      48   1.5   0.5      81   1.5   1.5
 15   1.0   1.5      49   2.0  -2.0      82   1.5  -0.5
 16   2.0  -2.5      50   2.0  -1.5      83   2.0   1.5
 17   0.5  -1.5      51   1.0   0.5      84   1.0   1.0
 18   1.0   2.5      52   1.0  -1.0      85   2.0  -2.0
 19   2.0   2.0      53   1.0  -2.0      86   2.0   2.0
 20   2.0   0.0      54   2.0   1.0      87   2.0  -1.5
 21   1.5   2.0      55   0.5  -2.5      88   1.0   1.5
 22   0.5   2.5      56   1.0  -2.5      89   2.0   1.0
 23   1.0   0.0      57   1.5  -1.0      90   2.0   2.0
 24   2.0   2.0      58   1.5   1.0      91   1.5   1.5
 25   0.5  -2.5      59   2.0  -1.0      92   0.5   2.5
 26   1.5   2.5      60   1.0   0.5      93   2.0   1.5
 27   1.0   0.0      61   1.5   0.0      94   2.0  -0.5
 28   1.5   2.0      62   1.0  -1.5      95   0.5  -2.5
 29   1.0   2.5      63   2.0  -1.0      96   0.5   0.5
 30   1.0   2.5      64   2.0  -1.5      97   1.0   0.0
 31   0.5  -1.0      65   1.5   2.0      98   2.0   1.5
 32   0.5  -1.0      66   2.0  -0.5      99   0.5   0.0
 33   2.0  -2.5      67   1.5  -0.5     100   1.5  -0.5
 34   0.5   1.5


Table C
Item Parameter Values for Each of the Items in the a-, b-, and c-Variable Item Pool

Item    a     b    c     Item    a     b    c     Item    a     b    c
  1   1.5   2.5   .1      35   0.5   1.0   .3      68   1.0  -1.5   .2
  2   1.0   2.5   .3      36   1.0   2.5   .1      69   0.5   0.0   .3
  3   2.0   0.5   .3      37   1.0  -2.0   .1      70   1.0   0.0   .1
  4   1.0   1.5   .1      38   0.5  -2.5   .2      71   1.5   1.0   .2
  5   1.5   1.5   .2      39   2.0   2.0   .1      72   1.5   1.0   .1
  6   2.0  -1.5   .1      40   1.5   2.0   .3      73   2.0  -1.5   .2
  7   2.0   2.0   .2      41   2.0  -0.5   .1      74   1.5  -2.0   .1
  8   1.5   1.0   .2      42   1.0  -2.5   .3      75   1.5  -0.5   .3
  9   2.0   1.0   .1      43   2.0  -1.0   .2      76   0.5   1.5   .3
 10   1.5   1.5   .5      44   0.5  -1.0   .2      77   0.5   2.0   .2
 11   1.5  -1.0   .3      45   0.5  -1.0   .1      78   1.5  -0.5   .2
 12   1.0  -1.5   .1      46   0.5   2.5   .3      79   2.0   2.0   .2
 13   0.5  -0.5   .1      47   1.5  -2.0   .2      80   1.5   0.5   .1
 14   1.0   0.5   .3      48   1.0   2.5   .2      81   1.0  -1.0   .2
 15   1.5  -2.5   .1      49   1.0   0.5   .3      82   1.0   2.5   .1
 16   0.5  -2.5   .3      50   2.0  -2.5   .1      83   1.0   2.5   .3
 17   0.5  -1.5   .3      51   1.0   0.0   .3      84   1.0  -2.0   .1
 18   1.0  -1.0   .1      52   1.5   2.0   .3      85   1.0   0.5   .2
 19   2.0  -2.5   .3      53   1.5   2.0   .3      86   0.5   0.5   .1
 20   2.0   0.0   .2      54   2.0   2.5   .3      87   2.0   2.0   .1
 21   1.5   2.0   .2      55   2.0  -1.5   .2      88   0.5  -2.5   .1
 22   2.0  -2.0   .2      56   0.5   0.0   .2      89   1.5  -0.5   .3
 23   0.5   0.5   .2      57   1.0   0.5   .2      90   2.0   1.5   .3
 24   0.5   1.0   .2      58   1.0   0.0   .2      91   0.5  -2.5   .3
 25   1.5  -0.5   .3      59   1.0   0.0   .3      92   2.0   0.0   .3
 26   2.0  -2.0   .3      60   1.5  -2.0   .1      93   1.0  -2.0   .3
 27   2.0  -0.5   .3      61   1.5   0.0   .2      94   0.5   2.5   .1
 28   2.0   1.0   .3      62   1.0   1.5   .2      95   0.5  -2.5   .1
 29   2.0  -1.0   .2      63   1.5  -0.5   .1      96   0.5   0.0   .2
 30   1.5   1.0   .2      64   0.5  -1.0   .2      97   0.5  -1.5   .1
 31   2.0   1.5   .1      65   2.0   1.5   .2      98   0.5  -1.5   .1
 32   0.5  -1.0   .3      66   1.5   0.5   .1      99   1.5  -0.5   .2
 33   1.0  -1.5   .2      67   1.0   1.0   .1     100   0.5   1.5   .3
 34   2.0  -2.0   .2


DISTRIBUTION LIST

[Standard contract distribution list: Navy, Army, Air Force, Marine Corps, Coast Guard, Other DoD, Civil Government, and Non-Government addressees.]