Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets...

30
Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences Han L. J. van der Maas, Dylan Molenaar, Gunter Maris, Rogier A. Kievit & Denny Borsboom Department of Psychology University of Amsterdam version: September 20, 2010 submitted Abstract This manuscript analyzes latent variable models from a cognitive psychology perspective. We start by discussing work by Tuerlinckx and De Boeck (2005), who proved that a diffusion model for two‐choice response processes entails a two‐parameter logistic Item Response Theory (IRT) model for individual differences in the response data. Following this line of reasoning, we discuss the appropriateness of IRT for measuring abilities and bipolar traits, such as pro/contra attitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models are appropriate for bipolar traits, but not for ability tests. A reconsideration of the concept of ability that is appropriate for such situations leads to a new item response model for accuracy and speed based on the idea that ability has a natural zero point. The model implies fundamentally new ways to think about guessing, response speed and person fit in item response theory. We discuss the relation between this model and existing models, as well as implications for psychology and psychometrics. Keywords: Ability, item response theory, diffusion model, decision making, response times, guessing

Transcript of Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets...

Page 1: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics1

Cognitivepsychologymeetspsychometrictheory:

Ontherelationbetweenprocessmodels fordecisionmakingand latentvariablemodelsforindividualdifferences

HanL.J.vanderMaas,DylanMolenaar,GunterMaris,RogierA.Kievit&DennyBorsboom

DepartmentofPsychology

UniversityofAmsterdam

version:September20,2010

submitted

Abstract

Thismanuscript analyzes latent variablemodels from a cognitive psychology perspective.WestartbydiscussingworkbyTuerlinckxandDeBoeck(2005),whoprovedthatadiffusionmodelfortwo‐choiceresponseprocessesentailsatwo‐parameterlogisticItemResponseTheory(IRT)modelforindividualdifferencesintheresponsedata.Followingthislineofreasoning,wediscussthe appropriateness of IRT for measuring abilities and bipolar traits, such as pro/contraattitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models areappropriateforbipolartraits,butnotforabilitytests.Areconsiderationoftheconceptofabilitythatisappropriateforsuchsituationsleadstoanewitemresponsemodelforaccuracyandspeedbased on the idea that ability has a natural zero point. Themodel implies fundamentally newwaystothinkaboutguessing,responsespeedandpersonfitinitemresponsetheory.Wediscusstherelationbetweenthismodelandexistingmodels,aswellasimplicationsforpsychologyandpsychometrics.

Keywords: Ability, item response theory, diffusion model, decision making, response times,guessing

Page 2: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics2

Itemresponsetheory(IRT)coversafamilyofmeasurementmodels for the analysis of testdata, inwhich item responses or test scoresare related to a latent variable. Specificmodels covered by the general IRTframework are, for instance, the models ofRasch(1960),Birnbaum(1968),Lord(1952),and Mokken (1971) for dichotomous itemsand a continuous latent variable; factormodelsforcontinuousitemsandacontinuouslatent variable (Lawley & Maxwell, 1963;Jöreskog, 1971; Mellenbergh, 1994); latentclass models for dichotomous items and acategorical latent variable (Goodman, 1974;Lazarsfeld & Henry, 1968); and mixturemodelsforcontinuousitemsandacategoricallatent variable (McLachlan & Peel, 2000;Bartholomew,1987).

Although these models are generallyapplicable, they have been especiallysuccessful in applications to psychologicaland educational testing, where primaryattention has been devoted to thedevelopment of IRTmodels for dichotomousitems and a continuous latent variable (e.g.,seeFischer&Molenaar,1995;VanderLinden&Hambleton,1997).Thisclassofmodelshasproventobeextremelyusefulintestanalysis,becauseitallowsformodeltesting,equating,computer adaptive testing, and theinvestigation of differential item functioningoritembias.

In IRT models for dichotomous itemresponses,theprobabilityofanitemresponseismathematicallyrelatedtocharacteristicsofthe item and to characteristics of therespondent. For instance, in the two‐parameter logistic (2PL) model (Birnbaum,1968), the probability of a 'correct' or'affirmative' response, P+, depends on thedifference between person ability (θ) anditem difficulty (β), weighted by itemdiscrimination(α)inthefollowingway:

P+ =eα (θ −β )

1+ eα (θ −β ) (1)

Usingmarginalmaximumlikelihoodthe itemparameterscanbeestimatedfromamatrixofitem responses, Ykj, which consists of theresponses of K persons to J items. The 2PLmodel is popular because it is more flexiblethan the one‐parameter logistic (1PL)model(Rasch,1960), inwhichallαj are equal (αj =α), but still gives a relatively parsimoniousaccount of the association structure in thedata. On the basis of these elementary IRTmodels, many more advanced IRT modelshave been proposed (Van der Linden &Hambleton,1997).

IRTmodels, like the1PLand2PLmodel, aretypicallyappliedtoitemresponsesthatresultfrom human information processing.However,theybearnoobviousconnectiontomodelsthathavebeendevelopedincognitivepsychologytorepresentthemechanismsthatunderlie such information processing. TheRaschmodel,forinstance,istypicallyderivedfrom statistical or measurement‐theoreticassumptions (Fischer, 1995; Rasch, 1960;Roskam&Jansen,1984).Suchderivationsarebased on desirable properties (e.g.,sufficiency of the total score for the latentvariable,parameterseparation,oradditivity)rather than on a mathematical model of thepsychologicalprocessesatplayinrespondingtotestitems.IRTmodelsarethusbasedonaset of assumptions concerning the relationbetween itemresponsesandasetofperson‐anditemparameters,butdonotspeakonthequestion how these item responses aregenerated; as Mislevy, (2008) states,‘[n]eitherthegenesisofperformancenorthenature of the processes producing it areaddressed in the metaphor or the attendantprobabilitymodels of classical test theoryorIRT.’

It is important to emphasize that the lack ofattendanceto itemresponseprocesses isnotan inherent weakness in IRT. In manysituations, item response processes aretheoretically and practically intractable, andthe fact that IRTmodelsbypassassumptions

Page 3: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics3

concerningthemmayinsuchcasesbeseenasa strengthrather thanaweakness.However,in investigations forwhich knowledge of thegenerating process is deemed relevant, thepaucity of results that link such models toinformationprocessingtheoriescanbecomeaproblem.Forinstance,ithasbeenarguedthattheprimarylocusforvalidityevidenceliesininvestigationsthatfocusonthequestionhowa testing procedure works, i.e., whichprocesses transmit variation amongindividuals into variation in the itemresponses (Borsboom, Mellenbergh, & VanHeerden, 2004; Borsboom & Mellenbergh,2007). It is evident that, in answering suchquestions,thepresenceofatheorythat linksIRT models to information processingtheories is essential. In general, suchconnectionswould seem tobe invaluable forresearcherswhoaimtoaugment IRTmodelswith an explanatory component, i.e., anaccount of how a latent variable can beconceptualized at the level of an individualperson, and how it may affect that person’sitem responses. In providing such anexplanation,theseaccountsmayalsoservetobridge the gap between intra‐individualprocess models and models for inter‐individual differences (Molenaar, 2004;Hamaker, Nesselroade & Molenaar, 2007;Borsboom, Mellenbergh, & Van Heerden,2003;vanderMaasetal.,2006).

In the context of ‘classic’ IRTmodels, whichrelateacontinuous latentvariabletoasetofdichotomous item responses, Tuerlinckx andDe Boeck (2005) carried out pioneeringresearchinconnectingprocessmodelstoIRT.They showed that, if a diffusion process(Ratcliff,1978)generatestheitemresponses,thentheseitemresponseswillconformtothestructureofa2PLmodel.Inwhatfollows,wetake their result as a starting point, andaugment it to accommodate different testingsituations as they may arise in practice. Aswill become apparent, such extensions carryimportant consequences for the inter–pretationofexistingmodels.

Thestructureofthispaperisasfollows.First,we shortly review the derivation presentedbyTuerlinckxandDeBoeck(2005).Thenwederive three implications of their work thatmaybeconsideredplausibleforbipolaritems(i.e., items with two ‘attractor’ options)commonly used in attitude and personalitytests, but not for typical ability tests. Toaddress this problem we reconsider theabilityconceptandproposeanewIRTmodel,the Q‐diffusion model, which is applicablewhen respondents followadiffusionprocessin deciding which answer is correct. Inaddition, we extend the model toaccommodate for guessing and for multiple‐choice items.We introduce techniques to fitthe Q‐diffusion model to data and illustratethese techniques with two empiricalexamples. Finally, we discuss the relationbetweentheQ‐diffusionmodelandotherIRTmodels.

The relation between Item ResponseTheorymodelsanddiffusionprocesses

While the analysis of test data has mostlybeen the territory of psychometric modelssuch as the 2PL, the development of formalmodels of human cognition has, in the pastdecades, been mostly carried out bymathematical psychologists. The cross‐fertilization of these fields has, to date, beenrather meager, even though the fieldsevidently have much to offer to each other(Borsboom, 2006). The current work,however, is based on the conviction that a)IRTmodelsareideallyformaltheoriesofitemresponses rather than just statisticalmodeling techniques, and b) such modelsshould be based on the best formal modelsthat mathematical psychology has to offerwith respect to the cognitive processes thatlie between item administration and itemresponse.

Many item response processes used incognitive and personality testing require therespondenttomakeadecision(e.g.,todecide

Page 4: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics4

which response option is most likely to becorrect, which description best fits therespondent,etc.).Inthefieldofmathematicalpsychology, several models have beenproposed for decision‐making (Busemeyer&Townsend, 1993). An important class ofmodels applies the idea of sequentialsampling of information (for a typology ofthese models, see Bogacz, Brown, Moehlis,Holmes and Cohen, 2006). In such models,noisy accumulation of information drives adecisionprocessthatstopswhenevidenceforone of the response alternatives exceeds athreshold. Themost influentialmodel in thisclass of models is the diffusion model. Thediffusion model is a continuous‐time,continuous‐state random‐walk sequentialsampling model (see Laming, 1968; Link,1992; Ratcliff, 1978; Stone, 1960) that hasbeen successfully applied to two‐choice

response time (RT) paradigms in studies ofmemory, perception and language (see, e.g.,Ratcliff,1978,2002;Ratcliff&Rouder,1998;Ratcliff,VanZandt,&McKoon,1999).

One important reason for the popularity ofthediffusionmodel is that it implements thesequential probability ratio test, the SPRT(Stone, 1960;Wald, 1947). This implies thatthe diffusion model optimizes expectedaccuracy given RT, or conversely, optimizesRT given the level of accuracy. Furthermore,as Bogacz et al. (2006) show, morebiologically realistic complex models withleakage, inhibition and pooled inhibition canbereducedtothedriftdiffusionmodel,andinthis way implement the SPRT. Thus, thediffusion model is a widely applicableresponsemodel thatmay serve as a startingpoint for the analysis of test responses.

Figure1:Thedriftdiffusionmodeloftwochoicedecisions.Thisexampleconcernsthelexicaldecisionaboutwordsandnonwords.Therandomwalk,representingthenoisyaccumulationofevidence,startsatzandcontinuesuntilthe‘word’or‘nonword’boundaryishitatdecisiontimeDT.ResponsetimeRTisthesumofdecisiontimeandthetime(Ter)required forotherprocesses,suchas themotorpartof theresponse.Startingpositionzanddriftratexmayvaryovertrialsaccordingtoauniformandanormaldistribution,respectively.

Page 5: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics5

Figure1displaysthebasic ingredientsof thediffusion model. Upon being administered atwo‐choice response task, the respondentstarts collecting evidence for the responseoptions.Thisisformallymodeledasarandomwalkthatstartsinthepointz(sampledfromauniformdistributionwithrangeSz)andstopswhen either of the boundaries at a or 0 isreached. Response X takes value 1 whenaccumulation of information terminates atbound a, and 0 when it terminates at thebound at 0. This termination determines thedecision time (DT). Response time T is thesumofnondecisiontime(Ter),whichmayforinstancecoverperceptionofthestimulusandthetimeneededtoexecuteamotorresponse,anddecisiontime(DT).

The information accumulation process thatleadstoDTdependsonadriftrateparameter,which varies over trials with mean v andvarianceη. Drift rate is themean amount ofevidence accumulated over time and isthought toreflect thesubject'sability for thetask. Boundary separation, in contrast, isdetermined by the response caution of thesubject, which may be influenced byinstructions and rewards. If boundaryseparation is decreased, both DT and theprobability of terminating at the correctboundary reduce. In this way, the inverserelationbetweenspeedandaccuracy,i.e.,thespeed accuracy trade­off, is naturallyaccommodatedinthemodel.

The startingpointz reflects the aprioribiasofaparticipantforoneortheotherresponse.Inresponse timemodeling, thisparameter isusuallymanipulatedviapayofforproportionmanipulations (Edwards, 1965). Suchmanipulations are generally not applied inpsychological or educational testing. Variabilityintheaccumulationofinformationdependsontheparameters,whichisusuallyfixed at .1 or 1 to identify the model.Variabilityinthestartingpointandvariabilityindrift rateover trialshavebeen introducedin the model to account for errors faster or

slower than correct responses. Furtherexplanationofthemodelcanbefoundin, forinstance, Ratcliff and Rouder (1998). In thispaper we only use the two most importantparametersof thediffusionprocess:v,whichdenotes (mean) drift rate, and a, whichdenotesboundaryseparation. It is importantto note that these two fundamentalparameters, which feature in almost everysequential sampling model of choice,influencebothaccuracyandresponsetime,asspecifiedinthefollowingequations.ThejointdensityofX(boundarychosen)andT (response time) has a rather complexmathematicalform:

fX , T(x,t) =πσ 2

a2( exp(ax− z )v

σ 2 −v2

2σ 2 (t− Ter )

× msin(πm(ax− 2zx+ z )a

)exp(− 12π 2σ 2m2

a2m=1

∑ ( t− Ter ))

(2)

In contrast, the equations for theprobabilityofacorrectresponseandfortheexpectationofRT(i.e.,ofDT+Ter)arerelativelyeasy.Theprobability of X=1, which we denote by P+,(i.e., terminating at the upper boundary) is(Cox&Miller,1970):

P+ = P(X =1) =e−2zv −1e−2av −1

(3)

Incaseofanunbiaseddecisionprocess, i.e.z=1/2a,thissimplifiestoaformveryfamiliartoIRTmodelers:

P+ =e−av −1e−2av −1

=eav

1+ eav (4)

ThemeanDTcanbeexpressed1as

1 For unbiased (z = ½ a) decision making we canrewritethistoE(DT)=(a/2v)(2P+‐1).Theimportanceof the reduction inDTby the second termdiminisheswhen P+ takes extreme values close to zero or one.Thus forhighvaluesof |av|, that isP+closetozeroorone, E(DT) can be approximated by |a/2v|. TheunderestimationofE(DT) inthisapproximation isnot

Page 6: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics6

E(DT ) =a2v1− e−av

1+ e−av (5)

Tuerlinckx and De Boeck (2005) haveconnected Equation (4) to the 2PL (1). Theyargue that v, the mean drift rate, can bedecomposed into a person part (θ) and anitem part (β), and propose a simple linearrelationsuchthatv=θ ‐β.2Atfirstsight,thisseems to be a reasonable step. However, aswill be shown below, the resulting models,although plausible for attitude andpersonality tests, are highly implausible forabilitytests.

Boundary separation, a, is equivalent to thediscriminationparameterαinthe2PLmodelwithin the derivation of Tuerlinckx and DeBoeck (2005). This means that thediscriminatory power of items depends onfactors that determine boundary separation,suchas instruction,rewards,andtime limits,but also on person characteristics, such asresponse caution. Standard speed accuracytrade‐off studies have shown that boundaryseparation is influenced by time limits(Wickelgren, 1977). In practical testsituations, boundary separation will dependon how much time subjects get or take toansweritems.Thisimpliesthatincreasingthetime limit of items should increase thediscriminatory power of items. This is animportantconsequenceofthemodelthatwillplayakeyroleinthepresentarticle.

Tuerlinckx and De Boeck (2005) describetheirwork as fostering a new interpretationof item response models, rather than as aderivation. Indeed, the most crucial step intheir line of reasoning, which consists in

importantinthequalitativeanalysiscomparisonslater.In fitting the Q‐diffusion model to data we will notapplythisapproximation.

2 Throughout the paper diffusion model parameterswill be set in normal typeface and IRT parameters insymbolictypeface.

equating the diffusion parameters with the2PLparameters, is aprimarily interpretativeone.Aswewilldemonstratelater,alternativesetups and interpretations are possible. Ontheotherhand,intheworkbyTuerlinckxandDe Boeck (2005), the 2PL is derived fromassumptions about human informationprocessing and two‐choice decision tasks,which make the interpretative step quitereasonable.Forinstance,inviewofthiswork,thechoiceforthelogisticequationinthe2PLcan be justified on considerations ofsubstantive theory, and standard IRTparameters receive mechanisticinterpretations in terms of one of the bestinformation processing models currentlyavailable. Thus, the work of Tuerlinckx andDe Boeck (2005) lays out a general processmodelaccountofitemresponseprocessesthatiscompatiblewiththestandardIRTmodelforitem response data. It is hard tooveremphasize the psychometric importanceofthisresult.

Implications of the diffusioninterpretation of IRT models for abilitytesting

If a diffusionmodel accurately describes theresponseprocessesthatsubjectsfollowwhenanswering test items, this carries theimplication that a standard 2PL IRT modelshould fit the data. However, additionalimplications follow from the model as well,because it not only yields predictions on theprobabilitydistributionofitemresponses,butalsoonthedistributionofresponsetimes. Inthisrespect,thediffusionmodelmakesstrongpredictions about the qualitative andquantitative properties of the RT’s (Ratcliff,2006). Three qualitative implications areespeciallyrelevantinthecurrentcontext.

A first implication that follows from themodel, and that will play an important rolebelow, is that reducing item administrationtime, in the limit, should yield P+=1/2,irrespectiveofthevalueofθ.Thisisbecause,

Page 7: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics7

in the diffusion model, persons adjustboundary separation to handle very shorttime limits; as time limits become smaller,boundary separation approaches zero, andthe probability of hitting either boundbecomes 1/2. In IRT terms, itemdiscriminationbecomes zero, so that the ICCbecomes a flat line at P+=1/2, equal for alllevelsofθ.

Clearly, this can only happen if items havetwo response options; otherwise, for Mresponseoptions,theprobabilityofanygivenresponse approaches 1/M as the time limitapproaches zero. Hence the diffusioninterpretationofIRTmodelsisnotapplicabletomultiple‐choice itemswithmore than tworesponse options. This is important because,even though IRTmodelsareoftenapplied tobinarydata,thedichotomiesinthedataresultfrom scoring (incorrect‐correct), rather thanfromthe fact that theresponseprocess itselfresultsfromatwo‐choicesituation.Thus,thederivation discussed above does not applynaturally to the ability tests for which IRTmodels are often used. We will extend themodelinthisdirectionlaterinthispaper.

The second important implication of adiffusion interpretation of IRT modelsconcerns the effect of changes in v (whichequals θ‐β in IRT terms). Under a diffusionmodel,RT’sareslowestwhenv≈0,andhencein the IRTcontext shouldbeslowestwhenθequalsβ. Personswithθ<<β are expected tobeveryfast; infact, theyshouldbeasfastaspersons with θ>>β. This is plausible forpersonality and attitude items, but not forabilitytests.

Toseethis,consideritemssuchas“thedeathpenalty is allowed” and “I stick to mydecisions”, where ‘agree’ and ‘disagree’ arearbitrarily coded as X=1 and X=0. Subjectswith extreme positions (θ<<β or θ>>β) willprobably answer confidently and quickly.Subjects with θ=β will be in doubt and arelikely to respond slower (Van der Maas,

Kolstein, & van der Pligt, 2003). The lattereffect isalsoknownas thedistance‐difficultyhypothesis (Ferrando & Lorenzo‐Seva,2007).3 For ability tests, this implication isextremelyunlikelytohold:thereisnoreasonto suppose that, for instance, individuals ofvery limited intelligence will be as fast ingiving the incorrect response, as highlyintelligent individuals are in giving thecorrect response. In fact, in such cases oneexpectsthatindividualsforwhomtheitemisveryhardwilltakelongerthanindividualsforwhomtheitemiseasy.

A third implication concerns itemdiscrimination,whichinthediffusioncontextis determined by boundary separation. Forpositivev,thatis,wheneverθ>β, increasesinboundary separation lead to increases in P+.Becauseboundaryseparationisa functionofthe time limit imposed on the respondent,thismeansthatifweallowablepersonsmoretime to think about their answer, theprobability of a correct response willincrease. However, the reverse is also true,and considerably more surprising. Fornegativev,i.e.,forθ<β,allowingmoretimetothink should reduce the probability of acorrect response. This is illustrated in figure2.

This gives a special meaning to the pointwhere θ=β. Instead of just being the pointwhere P+ equals 1/2, it separates twoqualitatively different regimes. Below thispoint, more time to solve the item willdecreaseP+.Abovethispoint, itwill increaseP+. For very long time limits, the itemcharacteristiccurveshouldapproachthatofaGuttman item, i.e., become a step function.

3 In fact, this phenomenon favors Tuerlinckx and DeBoeck’s model since their model explains this effectwithout any alterations of the model. The model ofFerrando and Lorenzo‐Seva requires an adaptation ofThissen’s(1983)modelforRT’stoexplainthiseffect.

Page 8: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics8

Figure2:A longer time limit foran itemwill increasethe probabilty correct for persons with θ>β butdecreasethisprobabilityforpersonswithθ<β.

This may again be considered plausible intypicalpersonalityorattitudetests,butnotinability tests. In personality and attitudetesting,ifθisjustbelowβ,andthesubjecthasto respond quickly, the noise factor in thedecisionprocesswillplayalargerole.Butthelonger a subject thinks about his or herposition,themorelikelyitbecomesthatheorshewillselecttheansweroptionthatbestfitshis or her latent state. For ability tests,however, the lower end of the populationcannotapproachthelimitofP+=0asamatterof principle: the worst they can do is guess,which means they approach the guessingprobability of an item (0.5 for the binarycase),notzero.Inaddition,oneshouldexpectthatincreasingthetimelimitwillincreasetheprobability of a correct response across theboard.

Guessing is a widely recognized problem inIRT,wherethestandardsolutiontohandleitis to introduce a extra parameter into themodel. This results in the 3PL model(Birnbaum, 1968), which is an extension ofthe1or2PL:P3pl+=cj+(1‐cj)P2pl+(seeEquation1). In thismodelc is the lowerasymptoteoftheitemcharacteristicfunction,suchthatfor

anyvalueofθ,P+≥c.SanMartin,delPinoandDe Boeck (2005), following Hutchinson(1991),discussseveral interpretationsbasedonadistinctioninap‐processofsearchingforthe correct answer, and a g‐process forguessing. One example of this line ofreasoning is that a respondent first searchesfor the correct answer, and only guesseswhen this search fails (assuming that therecognizes this failure); however, analternativeorderofprocessing ispossible. Itis not straightforward to derive this modelfrom a diffusionmodel.Wewill not attemptsuchaderivationbecausewethinkasimplerandmorefundamentalsolutionispossibleintermsofasingleprocessmodel.

Inconclusion,ifadiffusionmodelgovernstheresponseprocessesinatestingsituation,thenthethreequalitativeimplicationsabovemakesensefortwo‐choiceattitudeandpersonalitytests, but not for typical ability tests. Weconsiderthistobesurprising,because1)thediffusionmodelwould,atfirstglance,seemtogive a reasonable approximation of theresponseprocessesinatypicalabilitytestingsituation,and2)abilitytestsarethestandardfield of application for IRT models. Whatshouldweconcludefromthis?

One possible conclusion is that the 1PL and2PLmodelcanonlybeappliedtopersonalityand attitude tests, and perhaps to someparticular ability tests4, but not to the usualability tests. This conclusion is correct, but4 Typical Piagetian conservation tests, in whichchildren have to judge whether the amount of liquidremains the same when poured from a normal glassinto a glass with a smaller diameter, could beconsistent with these three implications, as 1)nonconservers typically score below chance level,because they believe the incorrect answer to becorrect,2) transitionalchildren(θ=β)areslower thannonconservers and conservers (van der Maas &Molenaar, 1992), and 3) the probability of a correctresponseislikelytodecreasefornonconserverswhentheygetmoretimetothinkovertheiranswer.

Page 9: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics9

only when the diffusion model adequatelydescribes the response process. It isdebatable towhat extent this is the case fortypical ability tests (i.e., IQ‐tests). Thus, weleave open the possibility that the diffusionmodel does not describe the responseprocesses in such cases, but that whatevermodel does may imply the standard IRTmodelafterall.

An alternative conclusion is that a diffusionmodeldoesinfactdescribetheitemresponseprocessesintypicalabilitytestsaccurately.Inthat case, even though the standard 2PLmodelmay be a valuable pragmatic or data‐analytic tool, it cannot be considered to givean accurate theoretical account of the datastructure. In this viewpoint, the suggestedcourseofactionistoinvestigatewhatkindofIRT model does follow from a diffusionaccount of the response process. The nextparagraphsdevelopthislineofreasoning,andsuggest a new IRT model that is consistentwith a diffusion interpretation of theresponseprocess.

Whatareabilities?

If a sequential sampling model holds for atypical ability test item, the standard 2PLdoesnotfollow.First,theimplication,thatP+approaches 0.5 as the time limit approacheszero,cannotbecorrectwhenever itemshavemorethantworesponseoptions;inthatcase,P+shouldapproachtheguessingprobability,which is 1/M for equally attractivealternatives. Second, the model should beconsistentwith the fact that giving a subjectmore time will improve the probability of acorrectresponseacrossallvaluesofthelatentvariable. One could, of course, take theseproblems to be purely statistical in nature,and craft solutions by considering modelsthat evade them. However, we suspect thatproblems we encounter here indicate anunderlying problem in theway that abilitiesare typically represented in psychometrictheory.Toseethis,itisnecessarytoconsider

the deep structure of the ability concept insomedetail.

In current psychometric theory, researcherscommonly use the word 'ability' to refer tothe latent variable in a psychometric modellike the 2PL. However, what such a latentvariable represents isnot an ability.A latentvariable in a standard measurement modelcanonly representdifferences between levelsof ability. That is, standard models ofpsychometric theory are restricted to therepresentation of individual differences(Borsboom, Mellenbergh, & Van Heerden,2003),andthesentence'Johnhasvalueθkonthe abilitymeasured by this test' derives itsmeaning exclusively from the relationbetween John and other test takers, real orimagined (Borsboom, Kievit, Cervone, &Hood, 2009). However, in our attempt torelatepsychometricabilities topsychologicalprocesses, we are doing something entirelydifferent from traditional approaches inpsychometrics. For in the diffusion model,abilities are not merely instances of anindividual differences variable. They areparametersatplayintheactualprocessthatasingle individual follows when answering atest item. For this reason, we are forced toaddress a question that is virtually neverraisedinpsychometrictheory:whatisabilityatthelevelofanindividual?

To start with an uncontroversial and well‐understood ability, consider the ability towalk. It refers to a capacity todosomething,namely,tocoveracertaindistancebyusingaparticularformofpropulsioncommontolandanimals. One immediate and striking featureof the ability in question is that it can bepresent or absent. That is, while someindividualscanwalk,others‐e.g.,babiesandpersonswithdisabilities,butalsofish,chairs,and elementary particles ‐ cannot. In thissense, it is meaningful to say the ability towalk isessentiallypositive.Wethinkthatthischaracteristiciscommontoallabilities.Intheterms of philosophers like Harré (Harré &

Page 10: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics10

Madden, 1975), at the individual level, toascribeanabilitytoanindividualistoascribethatindividualacausalpower.Althoughsuchpowers may be present in greater or lesserdegree,theyhaveadefiniteminimum,namelyabsence. Thus, in contrast to, say, yourappraisalofaMozartsymphonyoryourlikingofparties,abilitiescannotbenegative.5

A second observation about a simple ability,like the ability towalk, is that any task thatcan be said to measure this ability requiressome of the ability. That is, a task thatdependsontheabilitytowalk(i.e.,forwhichonerequirestheability)dependsonapositiveamount of work to be done through thestructures and processes that instantiate theability.Forinstance,anytaskthatdependsontheabilitytowalkmustrequireanindividualtowalkapositivedistance; if thedistance tobe covered is zero, then the task simplycannotdependon the ability towalk (this isevident from the fact thatall sortsofobjectsthatdonothave theability inquestion,suchas, say, the journal that you are currentlyholding in your hands, are able to meet thetask just by stayingwhere they are). This isalso a key difference between ability testingontheonehand,andpersonalityandattitudetestingon theother: to endorsean item inapersonality questionnaire, or to complywitha statement in an attitude test, does notrequireanygivenlevelofapersonalitytraitorattitude ‐ one may, for instance, lie. That is,someone who walks a certain distance, orsolves an IQ‐item, displays (some of) theability in question, but someone whoendorses a personality item does not. Thus,like abilities themselves, the 'difficulties' of5 A standard reply of psychometricians to therequirementofnonnegativeabilityisthatwecanapplyRasch’s transformation θ*= eθ in order to scale thelatent trait values to positive values. However, thisdoesnotfullysolvetheproblem,asitstillallowssomeθ to be less than some β, implying subject scoringbelowchance.

tasks that measure these abilities areessentiallypositiveaswell.

A third observation about simple abilities isthat, if a task depends on that single abilityandonnootherability, thenthat taskcan inprinciplebecarriedoutbyanyindividualwhopossesses the ability if only the individual isgiven sufficient time. Again, considering theability to walk, we may identify tasks thatdepend only on this ability as tasks thatrequire a person to walk a certain distance,without obstacles like rivers that requireswimming or mountains that requireclimbing. If given enough time, any personwho has the ability to walk may cover anydistance unless, for some reason, thatindividuallosestheabilityalongtheway.

We propose that the above characteristicsmay be considered axiomatic for simpleabilitiesandthetasksthatdependuponthem.It is evident, however, that this puts seriouslimitsonwhatcouldcountasaprocessmodelthat describes abilities. In a diffusionmodel,for instance, it requires that drift rate isalways positive, and that the probability ofhitting the boundary for a correct responseshouldapproachoneiftimelimitsareabsent.

Whathappens ifweassumethesepropertiesto hold, and subsequently introduceindividual differences into the model?Consideringagaintheabilitytowalk,wemaysay that among individuals who have thisability, some are 'better' than others. Thismeans, in the diffusionmodel logic followedhere, that these individuals will a) have ahigher probability of successfully completingatask(e.g.,'walking100meters')iftherearetimelimits(which impliesthatsomewillnotcompletethetask),andb)willcompletethattask faster if thereareno time limits (in thiscase,allindividualswillcompletethetask).Inthegeneralsituation,thetimelimitsimposedwill induce a speed accuracy trade‐off,meaning thatunder time limits therewill beboth individualdifferences in theprobability

Page 11: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics11

ofcompletingataskandinthetimeinwhichtheydoso.

What kind of individual differences modelfollows from this line of reasoning? First,becauseabilitiesareessentiallypositive, it issensible to represent them by positivenumbers. Thus, the domain of θ will beassumed to be R+. Second, because all tasksrequire positive ability, the same holds fortask'difficulty',sothatdifficultyalsohasR+asitsdomain.Third,abilityanddifficultyshouldbecombinedinsuchawaythatresultingdriftrateisalwayspositive.Inthenextsectionwewill propose a new way to combine abilityand difficulty to establish this. Fourth, if thetimelimitapproachesinfinity,theprobabilityof a correct response should approach unityforalllevelsofability.Fifth,astimepressureincreases, the probability of a correctresponseshouldapproach1/M forMequallyattractiveansweroptions,withP+=0.5forthetwo‐choice case. This implies a solution forthe problem of guessing in the 1PL and 2PLmodel for abilities. Given that drift rate isalways nonnegative, below chance scoringcannot occur. We will now derive a modelthathasalloftheseproperties.

Thepositiveabilitymodel

In this paragraph, we take characteristics ofability outlined above as axiomatic, andproposeaclassofmodelsthatrespectstheseaxioms.Naturally,thereexistsawideclassofmodelsforwhichthisisthecase;weproposeto let these models fall under the generalpositive ability model. Here, we focus onspecificsubtypesofthesemodelsthatmaybeusefulinscientificresearchbecausetheyhaveclear relations with existing process andindividual differences models. As before, wetakethediffusionmodelasourstartingpoint.

First, however, an observation about theparametersofthediffusionmodelisinorder.Tuerlinckx and De Boeck (2005) usedlogit(P+) = av to derive the 2PLmodel; they

equated the diffusion parameter a with theIRTparameterα, andthediffusionparameterv with the linear combination of IRTparametersθ‐β.Fromthelatteridentification,itisclearthatthediffusionmodelhasnoclearseparation between item and personparameters,whereasinitemresponsetheorythis distinction is essential (van der Linden,2009). A similar problem occurs with the aparameter, forwhich it isunclearwhether itshouldbeconsideredaperson,item,oritemxpersonparameter(in IRT, thecorrespondingparameterα is an item x person interactionparameter). In order toderive an IRTmodelfor ability from the diffusion hypothesis,however,weneed tobeable to separate theperson and item contributions to perfor–manceasinIRT.

To solve this problem, we decompose driftrate and boundary separation into a personand an item part. For drift rate, Tuerlinckxand De Boeck (2005) propose to distinguishbetweenabilityanddifficulty,whichweheredenote by vp and vi, respectively; both arerequired to be positive, in keeping with ourreconceptualizationoftheabilityconcept.Forboundary separation, we propose a similardecomposition of the a parameter into aperson and item part (response caution andtime pressure), denoted by ap and ai,respectively. Response caution is then takento be a person characteristic. Individualdifferences in response caution may, forinstance, relate to personality traits. Timepressuredependsonthesetupofthetestandtest instructions. Inacomputerizedtestwithfixed equal time limits per item, it can beequal forall items. Inatestwithatime limitfor the whole test, time pressure mayincrease during the test. These and otherscenarioswillbediscussedlater.

Next, we need functions v=f(vp,vi) anda=g(ap,ai) to combine the person and itemparts into theordinarydiffusionparameters.Severalconstraintsonthesefunctionscanbeformulated. First, v and a must be positive:

Page 12: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics12

boundary separation must be positive bydefinition, and drift rate because of therequirementthatabilityispositive(notethatfor this reason, the difference functionproposedbyTuerlinckxandDeBoeck(2005)does not work here). Second, the function fshouldbemonotonically increasing invp andmonotonically decreasing in vi. Third, if vpapproaches infinity, or vi approaches zero, fshouldapproachinfinity,suchthatP+goesto1.Fourth, ifvp approacheszero,orvi goes toinfinity, f should approach zero, such thatP+approaches1/2.

There exists a wide class of functions thathave these properties, and thus instantiatesubmodels of the positive abilitymodel. Oneexampleofafunctionthatrespectstheaboveconstraints, and that we will use in thefollowing, is the quotient function, v= vp/vi.6Weproposetoapplythequotientfunctionforbothvanda,suchthatv=vp/vianda=ap/ai.Inthiscase,vandaarealwayspositivewhenvp,vi, ap, and ai are positive. Also, drift rateincreases with increasing ability, anddecreases with increasing difficulty. Forboundary separation, the quotient functionworks well too. Given these definitions wearrive at the sequential sampling based itemresponsemodelforpositiveability7:

P+ =eakpvk

p

ajivji

1+ eakpvk

p

ajivji

(6)

6Ametaphoricalwaytothinkaboutthisrelation is interms of the famous Newtonian relation speed (driftrate)=power(ability)/ force(difficulty). Interestingly,Rasch (1960) proposed to decompose the speedparameter in his model for reading speed in readingability/difficulty(seevanderLinden,2009,forarecentdiscussion).

7 Here superscripts p and i are used to indicate thepersonanditempartofdriftandboundaryseparation.Subscriptskandjindicatesubjectanditemnumber.

This model adheres to the propertiesattributed to simple abilities in the previousparagraph: a) both ability and difficulty arepositive, b) drift rate v=vp/vi, is alwayspositive,so thatc)as the time limitbecomeslarger, the probability of a correct responseincreases for all levels of ability; the limitingcase is the situation without a time limit, inwhich the probability of a correct responseequals one. In (6), participants cannotsystematicallyscorebelowchance level.Thismodelhastheproper ingredientstoserveasamodelforabilitythatiscongruentwiththehypothesisthatadiffusionprocessgeneratestheitemresponses.

The model proposed in (6) is naturallyapplicable to two‐choice tests; however, aswe argued earlier, it is important toaccommodate tests with multiple responseoptions. For this purpose, we propose toapply an extension of the 2PL that coversmultiple‐choiceoptions. InvanderMaasandMaris (2010) we derived such an extensionfromBock's (1972) nominal responsemodel(anIRTmodelformultiple‐choiceitemswithunorderedcategories).Undertheassumptionthat incorrect alternatives are equallyattractivetoallsubjects,asimpleextensionofthemodeltomultiple‐choiceispossible.Here,we sufficeby giving themodel formula for acorrect response, which is transparent initself:

P+ =eakpvk

p

ajivji

M −1+ eakpvk

p

ajivji

=eakpvk

p

ajivji−ln(Mj −1)

1+ eakpvk

p

ajivji−ln(Mj −1)

(7)

Here, when ability approaches zero, theprobability of a correct response does notapproach0.5, as in theoriginal derivationofTuerlinckx and De Boeck (2005), but 1/M,whichispreciselywhatisdesired.8

8Tuerlinckx and De Boeck’s (2005) derivation of the2PLmodelwasbasedontherelationlogit(P+)=av=α

Page 13: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics13

Themodelin(7)stillhasthestructureofa2PLmodel.Given

θ k = akpvk

p;αj = 1/ajivj

i;βj = ln(Mj −1)ajivj

i ,(7)leadstoequation(1),thestandard2PLmodel,withtheconstraintsthatbothθandαarenonnegative.Forthelinearsetupofthe2PL,logit(P+)=

αj*θ k + βj

* (asusedinthenominalresponsemodel),

βj*furtherreducesto‐

ln(Mj‐1),and

αj* = αj = 1/aj

ivji .

We propose that model formula (7) may betaken as a general framework tomodel itemresponses for simple abilities, under theassumptionthatadiffusionprocessgivesriseto the item responses. For the problem ofnamingthismodel,weproposetolinkittoanearlier model proposed by Ramsay (1989),whichwasproposedonadifferentbasis,andwhichhecalledtheQuotientModel(QM):

P+ =eθ k /β j

K + eθ k /β j, θ k ≥ 0, βj > 0

(8)

Whenθ=apvp,β=aivi,andK=M‐1,Ramsay'sQM and the model in (7) are equivalent.Therefore, the model we proposed inEquation(7)canbeconsideredaspecificationof Ramsay's QM on a diffusion model basis.Forthisreason,weproposetocallthemodeltheQ‐diffusionmodel.9

(θ –β). Formultiple‐choicewe now get a v =α(θ–β‐ln(M‐1). Assuming that an increase in number ofalternativesinfirst instancereducesthedriftrateandnot boundary separationwe can express the effectivedriftrateformultiple‐choiceasv*=v‐ln(M‐1)/a.Analternative way to incorporate multiple‐choice in thediffusionmodelistochangestartingpointZtoa/M.Inbothcasestheprobabilityofacorrectguess(whenv=0) is, as desired, 1/M.However, in the latter case themeanRTofanincorrectanswerismuchlowerthanthemeanRTofcorrectanswers.Thisisnotdesirable,andoneof themain reasonswhyweprefer to correct formultiplechoicebyadaptingthedriftrate.

9 As Ramsay notes, the quotient model could also becalled an exponential difference equation, whenlogarithmictransformationtoabilityanddifficultyare

PropertiesoftheQ­diffusionmodel

An attractive property of the Q‐diffusionmodelisthatitexplicatesthemeaningoftheusual IRTparametersθ,α,and β.Theroleofthe speed accuracy trade‐off in testing isclarified by the definition of ability as theproductofboundaryseparationanddriftrate.The success rate in solving items not onlydepends on the information powers of thesubject but also on his or her responsecaution. These two factors can be separatedusing response times. Improvements in testscores thatresult fromincreases inresponsecaution should increase response times,whereas improvements that result fromincreases in drift rate should lead to lowerresponse times. In addition, thediscrimination parameter α obtains aradically new interpretation as the easinessparameter, which is the product of the timepressureanddriftrateoftheitem.Theroleofβ (or actually β∗) is limited to being anintercept guessing parameter for subjectswithzeroability.

The predicted item response probabilities instandard IRT models are invariant underlineartransformationsoftheθparameter;i.e.,the scale of θ has an arbitrary zero. As aresult, a value of zero could mean verydifferent things depending on the values forothersubjectsandtheitemparameters.IntheQ‐diffusion model, however, θ=0 alwaysmeans the absence of ability, which impliesperformanceatchancelevel.ThisgivestheQ‐

applied. Earlier, Cressie andHolland (1983) used thismodelinaslightlydifferentform:

P+ =cee

θ k* −β j

*

(1− c) + ceeθ k* −β j

*

which,givenc=1/Mandlogarithmictransformationsofθ*andβ*,isagainequivalenttoRamsay'sQMandthusto(7).

Page 14: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics14

diffusionmodel certain ratio properties: Forinstance, if M=2 a doubling of θ gives adoubling of the logit score. A similar line ofreasoning applies to the α parameter. Forexample, for theabilityofaddition,wecouldthink of 1+1 as an item for which theparameterviisnearlyzero,whichimpliesthatit has very high α. For extremely difficultitems,ontheotherhand,αapproacheszero.

Theitemcharacteristiccurves(ICC)oftheQ‐diffusionmodel are simple. Since alla and varepositive,allθandαarenonnegative.Thisimplies that all ICC’s reside in the firstquadrantandallICC'saremonotonicallynondecreasing. The intercept is simply 1/M.Slopes are determined by

α j =1/a ji v j

i . Figure3 gives examples forM=2,M=4 andM=500(asanapproximationtoopenquestions10).

The ICC’s in figure 3 demonstrate therestrictivenessoftheQ‐diffusionmodel.Boththe2PLand the3PLaremuchmore flexible.The additional restrictions of theQ‐diffusionmodel have the following sources. First, incontrastto2‐and3PLmodels,theQ‐diffusionmodel excludes the possibility that subjectsscore below chance level. Second, if Mj =M,thenthemodelimpliesthatICC’sdonotcross.AsintheRasch(1960)model,thisrestrictionhas both statistical and interpretiveadvantages,althoughitmakesthemodellessflexibleasadata‐fittingtool;needlesstosay,however, the present manuscript does notwork within the tradition that views data‐10Thisapproximationcanbedefendedbecausethesetof possible answers to open questions is oftenrestricted.A chess itemasking for thebestmove in achessposition,forinstance,isanopenquestionwithalimited set of legal moves (often < 100) as possibleanswers. However, it is well known (e.g., de Groot,1978) that chess players, in contrast to chesscomputers programs, only consider a very limited setof possible moves. The initial selection of candidatemoves is an important and still not well‐understoodpart of the solution process, not covered in thediffusionpartofthedecisionprocess.

fitting flexibility as the prime virtue of ameasurement model. Third, all ICC’s startincreasingatθ=0(see figure3,whereM is2and 4). In the 1‐, 2‐ and 3PL, the ICC’s ofdifficultitemsfirstremainatzero(orchance)level and begin to increase only when θreachestheβ oftheitem.

Another way to demonstrate the ICC’s of Q‐diffusionmodel is to compare themwith theICC’s of the so‐called "Rasch model withguessing", which is a 3PL model with equaldiscrimination and fixed (at 1/M) guessingparameters for the items. In figure 4 wedisplaybothICC’sontheexpandlogscaletoeasecomparison.

A note on the limitations of the Q‐diffusionmodel is inorderat thispoint.Thestructureimposed on the ICCs clearly restricts theapplicability of the Q‐diffusion model incertain practical applications where IRTmodelsareroutinelyusedtoday.Forinstance,onemaysolvesomeitemsofatestmeasuringone's proficiency in physics, but one willcertainly fail somemoredifficult items, evenwhen one is allowed to ponder them for therestofone'slife;suchasituationviolatestheQ‐diffusion model. This may mean that adiffusionprocessdoesnotaccuratelydescribetheability inquestion, that the testdoesnotmeasure a single ability, or both. In manypracticaltestingsituations,wedosuspectthatscoresinfactdependonahostofrelatedandhierarchically nested abilities (van derMaasetal.,2006).Thus,inthiscasetheQ‐diffusionmodelmaynotdescribethedatawellbecausethe conceptualization of the test items asmeasuring a single ability is incorrect. Eventhough the item scores may approachunidimensionality in a statistical fashion,fromasubstantivepointofviewtheydependondiscretelyseparableabilities.Thus,suchatest may be treated with, for instance, aconjunctive multidimensional or multi–component Q‐diffusion model (de la Torre,2009;Embretson,1984;Hoskens&DeBoeck,2001; Maris, 1995). The construction

Page 15: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics15

Figure3:ICCofarestrictedMC2PLforabilitytesting.Allθ(vpap)andα(1/viai)arebydefinitionpositive.Interceptsare equal to 1/M, i.e. subjectswith ability zero guess. Note that for highM the typical logistic form of the ICC isrecovered.HighMisusedtomodelopenquestions.

Figure4:ThequotientmodelcomparedwiththeRasch(difference)modelwithguessing.Toeasecomparison,theQMmodel is alsodisplayedon a log scale and theDM‐G is displayedon an exponential scale.A subtledifferencebetweentheQM(logscale)andDM‐GistheasymmetryintheQM‐logICC’s.Theyleavethelowerasymptoteslowerthantheyapproachtheupperasymptote.Alargerdifferencebetweenthemodelsisindicatedbythearrows.DM‐GICC’scanbeaddedleftandrighttothecurrentsetofitems.IntheQM‐logtheitemontherightisthemostdifficultitempossible.

Page 16: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics16

of such a model requires further theoreticalwork.

Finally, as a typical advantage of the Q‐diffusion model we mention the followingcase. Suppose we constructed a computeritem bank for items with a time limit of 30secondsperitem.Forsomereasonwedecideto apply this item bank in a test with adifferent time limit, say 1 minute per item.Rescaling of the item parameters is hard toachievewithinthestandardinterpretationofIRTparameters.Wegenerallyexpectitemstobe more easy, but the change in difficultyprobably also depends on difficulty itself.According to the diffusion interpretation ofthe2PL theanswer is simpleandsurprising.The β's don’t change at all. Only the α’schange, as they increase upon relaxing thetimelimit.

FittingtheQ­diffusionmodel

In this paragraph we develop statisticalmethodologytofittheQ‐diffusionmodelandpresenttwoexamples.First,notethatthefullQ‐diffusion model cannot be fitted withaccuracydataonly,becauseRT’sarerequiredto distinguish between the a and v part ofpersons and items. If RT's are not available,however,wemaystillfitapartialQ‐diffusionmodel to the accuracy data only (see alsoRamsay,1989).

If RT’s are available, different techniquesdeveloped to fit the diffusion model to data(Vandekerckhove&Tuerlinckx,2007;Voss&Voss, 2007; Wagenmakers ,van der Maas &Grasman,2007)areavailablebutnotalwaysappropriate for psychometric applications.One reason for this is that, in typicalexperimental psychology applications wherestandard techniques are used, subjects aretreatedas equal (

akp = ap ,

vkp = v p ), and items

aredividedintoasmallnumberoftypes,sayA and B (

v ji = vA

i or

v ji = vB

i ). Moreover,becauseitemsaresubmittedinmixedblocks,boundaryseparationisequaloveritemtypes

(

a ji = ai). Since M is also known, only two

parametershavetobeestimated,oftenbasedon many observations per subject per item(type).

In psychometrics, however, subjects anditems differ from each other, so that otherconstraints are required. If time pressure isequalforallitems,then

a ji = ai.Theαestimate

then reflects differences in easiness (i.e.,1/difficulty). For many tests such as exams,however, timepressure increases during thetest,which, if not corrected for,will bias theestimates of

v ji .We developed a Bayesian fit

techniquetomeetthespecialrequirementsofpsychometricdata.

Example1:Mentalrotation

We collected data of 121 subjects in thecontext of a mental rotation task (Kievit,2010). Subjects had to identify a stimulus aseither identicaltoordifferentfromarotatedpresentation of the same stimulus or of adifferent stimulus. In such amental rotationtask,itemdifficultyvarieswithrotationangle.Formoredetails,thereadermayconferBorst,Kievit, Thompson & Kosslyn (2010), whousedthesameexperimentalsetup.

Responses were dichotomous (cor–rect/incorrect) and RT’s were recorded. Werandomly selected 10 out of 280 items ofthree different rotation angles (500, 1000,1500) for the model fitting analysis. Wediscuss the analysis at two levels that mayarise in practice. First, we will analyze theaccuracy data only. This allows forcomparison against standard IRT models,which do not have implications for theresponse times; in addition, the analysis canbeexecutedusingstandardsoftware.Second,we will analyze the full dataset, includingresponse times. This analysis presentsmoredifficultproblems;wetackledtheseproblemsusingaBayesiananalysis.

Page 17: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics17

Estimation without response times. WithoutRT’swecannotdistinguishbetweentheaandv parameters of persons and items, butanalyzing only the accuracy does allow for acomparison to standard IRT models.Therefore we choose the linear setup of the2PL formulation of Equation 7, in which

θ k = akpvk

p , αj = 1/ajivj

i , and βj = ln(Mj −1). Theresulting model may be fitted using thenonlinearmixed effect setup for IRTmodels(De Boeck &Wilson, 2004) through the SASprocedure NLMIXED (SAS Institute, 2000),where we attain positive values of abilityparametersbyexponentiation.TheNLMIXEDprocedure provides ML estimates withstandarderrors.Mjcanbefixedorestimated,and models can be compared usinginformationcriteriasuchasAICandBIC.

Table 1: Fit statistics for the Q­diffusion itemresponse model and several standard itemresponsemodels.Modelswere fitted using theaccuracy data ofmental rotation (example 1)andchessability(example2).

Model ‐2LL AIC BIC

Mentalrotationexample

Q‐diffusion 832.2 852.2 880.2

1PL 835.1 857.1 887.9

1PLguessing 846.5 866.5 894.5

2PL 830.5 870.5 926.4

3PLfull 819.2 859.2 915.1

Chessabilityexample

Q‐diffusion 4263.0 4341.0 4479.2

1PL 4309.3 4351.3 4425.7

1PLguessing 4214.8 4294.8 4436.7

2PL 4178.4 4258.4 4400.3

3PLfull 4164.8 4282.8 4491.9

Results are represented in Table 1, andindicate that the Q‐diffusion modeloutperformsthe1PL,2PL,aswellasthe3PLandthe1PLwithguessing.BothAICandBICfavortheQ‐diffusionmodel.

Estimationwithresponsetimes.FittingthefullQ‐diffusionmodel requires the evaluation ofrathercomplexmathematicalfunctions,someinvolving triple integrals. Hence, we take aBayesian approach to model fitting, whichrenders these functions tractable. We notethat it should be possible in principle toimplement the Q‐diffusion model in afrequentist framework as well; because weuse uninformative priors for the parametersintheQ‐diffusionmodel,resultswillgenerallybethesame.

The model setup is as follows. First, weassume that the binary (M=2) responses areBernoullidistributedaccordingtoEquation6.We approximate the RT distribution with alognormal distribution, i.e., log(RTkj) ~normal(μkj, σkj2). Parameters μ and σ can beobtained if the values of mean and varianceareknown:

ukj = log[E(RTkj )]−121+var(RTkj )E(RTkj )

2

σkj2 = log 1+

var(RTkj )E(RTkj )

2

(9)

The mean and variance are derived inWagenmakers, Grasman, & Molenaar (2005)and are functions of the Q‐diffusion modelparameters:

Page 18: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics18

E(RTkj ) =akpvj

i

2ajivk

p

1− ehkj

1+ ehkj+Terk

var(RTkj ) =akp

2aji

vji

vkp

32hkje

hkj − e2hkj +1(ehkj +1)2

hkj = −vkpak

p

vjiaj

i

(10)

In a simulation study we established goodrecovery of theparameters of the fullmodelincludingTerp (i.e.,non‐decisiontimevaryingbyperson).

Asuninformativeitempriorsforaiandvi,weuse uniform distributions from .01 to .5 andfrom .01 to 100, respectively. As person

priorsforap,vpandTerpwechooselognormaldistributions with µ and σ of 0 and 1,respectively.Notethatcomparedtotheotherpriors,thepriorofaiappearstoberelativelynarrow. However, from experiences withfittingtheQ‐diffusionmodel,weencounteredthataiwasalwaysroughlyintherangeof.25‐ .35. We drew 10000 samples from theposteriordistributionanddiscarded the first5000asburn‐in.JudgedfromtheplotsoftheMCMCoutput,allchainsconverged.

ThefitoftheQ‐diffusionmodelwasevaluatedby means of the posterior predictivedistribution (Gelman, Carlin, Stern, & Rubin,2004), which is the distribution that ispredicted for the observed data given theestimated model. If the model is the true

Figure5:Theobserveddatadistribution(bars)andtheposteriorpredictivedistribution(solidline)forthementalrotationdata.

Page 19: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics19

model, the posterior predictive distributionand the observed distribution areasymptotically equivalent. Figure 5demonstratesahighdegreeofequivalenceforthe mental rotation data. Taken together,these results indicate that the Q‐diffusionmodelisfeasible,andsupportthehypothesisthatmentalrotationisasimpleability.

Example2:Chess

Thementalrotationexampleisatheoreticallyinterestingone,butdoesnotprovideuswitha strong external criterion that would allowustoexaminethepredictivepropertiesofQ‐diffusion model parameters. An exampledataset that does allow for such comparisoniscomprisedofsubsetofdataderivedfromachess ability test (van der Maas &Wagenmakers, 2005). In this test, chessplayers solve chess puzzles that require therespondent to for instance select the bestmove given a certain configuration of pieceson the board. The advantage of the chessdataset is that it contains a very strongcriterionmeasure in the form of Elo ratings(Elo, 1978). The Elo rating is based on thenumber of wins and losses of a given chessplayer, and is updated on the basis of theoutcomes of officially played games. Eloratings are very good predictors of gameresults. Having a strong external criterionmeasureallowsustoevaluatethefitoftheQ‐diffusionmodelinadirectway.

Some aspects of the dataset are challenging.First, the item format of these chess puzzleswas open ended. Since the number oflegitimate‘sensible’movesinchessislimited,wecaninterprettheitemformatasmultiple‐choicewith an unknown number of options.For this reason, we apply Equation 7 to thebinary scored responses. The correction forguessing used in that equation, however,cannoteasilybeappliedtoRT’s(seeEquation5).WefollowedthelineofreasoninggiveninFootnote 8, assuming that an increase inalternatives primarily increases vi, the item

drift rate or difficulty of the item. Thecorrecteditemdriftratevi*isthenafunctionof Mi and all other person and itemparameters,wherevirepresentstheitemdriftratefortwoalternatives:

akp* = ak

p;vkp* = vk

p;aji * = aj

i

vji * =

akpvk

pvji

akpvk

p − ln(M i −1)ajivj

i

(11)

Thesetransformedparametersarethenusedto compute E(RT) and var(RT) according toEquation 9.Wewere able to implement thissetup in Winbugs, and applied it to thisexample11, estimating M as a separateparameterforeachitembelow.

An additional problem is that chess playingability in general consists of many differentabilities,aswellasspecificbitsofknowledge.For certain typesof items, suchasend‐gamepuzzles, one needs to know specific solutionrules.Ifnot,onewillcertainlyfailsuchitems,even when one is allowed lots of time. Forthese items, theQ‐diffusionmodel cannotbecorrect.However,somesubtestsof thechesstestconsistofso‐calledtacticalitemsinwhichknowledgeisrelativelyunimportant.Welimitouranalysesto20chessitems(last20itemsofChoose‐a‐MovetestB)andanalyzethedatawith the full Q‐diffusion model according totheBayesiansetupdescribedabove.

In the model fitting procedure, we used auniformdistributionrangingfrom1.01to500for M. Other priors were equal to thoseapplied in the mental rotation example. Wedrew 10000 samples from the posteriordistribution and discarded the first 5000 as11WhenM>2,equationsgetnumericallytoocomplexfor the standard WinBUGS program (i.e., samplingproceeds extremely slow or is not possible at all).Therefore,forthecasethatM>2,weimplementedtheq‐diffusion model in the WinBUGS DevelopmentInterface (WBDev; Lunn, 2003) which is a freelyavailableWinBUGSadd‐onprogram.

Page 20: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics20

burn‐in. Judging from theplots of theMCMCoutput,allchainsconverged.Forallitems,theposterior predictive distributions and theobserveddistributionaresimilar.

Table 2 displays the relations of Elo,tournamentratings,andagewithsumscores,mean RT’s, person drift rates, personresponse cautions and non‐decision time.Usingdriftrateinsteadoftestscoreimprovesvalidity slightly. Hence, the use of response

timesappears tobebeneficial.However, it isprudenttonotethat,usingtheaccuracydataonly, the 2PLmodel provided a better fit tothe data than the Q‐diffusion model. Hence,the evidence that we are dealing with asimple ability here is less convincing than inthe mental rotation example. Note that agecorrelateshighestwithTer,aresultalsofoundin other applications of diffusion modeling(e.g., Ratcliff, Tapar & McKoon, 2001).

Table2:Correlationsofthestandardteststatistics(testscoreandmeanRT),personestimatesaccordingtothe1PLand2PLmodel,andtheQ­diffusionparameterspersondriftrate(v),responsecaution(a),andnon­decisiontime(Ter),withtheEloratings,andagesofchessplayersinvanderMaas&Wagenmakers(2004)chessstudy.

Test score RT 1PL θ 2PL θ vp ap Ter Elo Rating 0.68 -0.44 0.67 0.69 0.72 -0.38 -0.17 Age -0.35 0.54 -0.35 -0.33 -0.34 0.24 0.60

RelationToOtherIRTModels

Historically, there have been manyattempts to incorporate RT’s in IRTmodels.This isanongoingresearchtopic,andinadditionthereareseveralrelativelynovelIRTmodelsthathandleRTinanewway(vanderLinden,2009).Also,differentmodelshavebeensuggested forguessing.Inthissection,wecomparetheQ‐diffusionmodelwithsuchmodels.

TheD­diffusionIRM

Wefirstreturntotheinterpretationofthe2PL in terms of the diffusion parametersby Tuerlinckx and De Boeck (2005). Wehavearguedthatthisinterpretationmakessense for attitude and personality tests,sinceattitudesandpersonalitytraitsoftensuggest a bipolar structure; examples arepro/contra attitudes andintrovert/extravert personality. In suchcases, item and person drift rates can benegative, and therefore the differencefunctionv=vp‐vi seemsreasonable.Sincea

hastobepositive,theratiofunctionseemsappropriate for a=g(ap,ai). Hence forbipolartraitswepropose:

P+ =eakp

aji(vk

p −vji )−ln(Mj −1)

1+ eakp

aji(vk

p −vji )−ln(Mj −1)

(12)

This is a minor extension of Tuerlinckxand De Boeck’s model but it clarifies theroleof theperson(responsecaution)anditem (time pressure) part of thediscrimination parameter. We call thismodel the D‐diffusion item responsemodel,whereDdenotesdifference.

An interesting consequence of thisseparation of theperson and item role inboundaryseparationforboththeQandD‐diffusion IRM, is that it allows one tomodel person fit. Strandmark and Linn(1987)presenta2PLmodelforpersonfit(see also Reise, 2000, Ferrando, 2007).Interestingly,their2PLmodelissimilartothe D‐diffusion IRM. In their model, the

Page 21: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics21

discrimination parameter of the 2PL isequal to the product of a person and anitem discrimination parameter. However,boththeinterpretationandtheestimationof Strandmark and Linn’s model areproblematic.Wethinkthattheconceptsofresponse caution and time pressure mayhelp to interpret the person and itemdiscrimination parameter in this model.Such an interpretation directly suggestsothermeans of testing the interpretation,such as experimental manipulations ofresponsecautionortimepressure,and/orusingresponsetimedata.

When we define person fit in terms ofresponse caution, we can also see thatpersonfitplaysadifferentroleinattitudetesting than in ability testing. In the D‐diffusionmodel,a lower levelofresponsecautionmay increaseP(agree) forθ<β. Inthe Q‐diffusion model, however, a lowerlevel of response caution is alwayscounterproductive, since it decreases P+forall(positive)valuesofθ.

IRTModelsForResponseTimes

VanderLinden(2009)reviewsanumberofIRT‐typemodelsforthejointanalysisofaccuracyandRT.Oneofthekeymodelsinthis tradition isRasch’s (1960)model formisreadings and reading speed. In themodel part formisreadings, the expectednumber of misreadings depends on aprobabilityθjk=δj/ξk, i.e., theratiooftextdifficultyandpersonreadingability.InthemodelpartforRT,the‘speed’parameter‐the expected number of words read pertimeunit‐equalsλjk=δj/ξk..Thissuggeststhat same parameters explain accuracyand response times. However, Rasch(1960) did not necessarily believe this tobe the case, and proposed to let this bedecidedempirically.Thisisexactlythelinefollowed by van der Linden and hiscollaborators. Van der Linden (2007)proposesahierarchicalapproach,inwhich

RTandresponsesaremodeledseparatelybyitemandpersonparametersthatcanberelatedindifferentwaysatasecondlevelofmodeling,dependingonthedata.Sincehismodeliscurrentlythemostpromisingapproach within IRT, we compare ourmodeltothehierarchicalapproach.12

A key idea in van der Linden’s andmostotherIRT‐RTmodelsisthat,inadditiontothe usual ability parameter, a new latentconstruct is required, usually in the formof a speed parameter. He distinguishesbetween item time intensity and personspeed parameters, on the one hand, anditem difficulty and person abilityparameters on the other. Item timeintensity and person speed are used tomodel RT’s, whereas item difficulty andperson ability figure in the accuracymodel. Item difficulty and time intensityare latent parameters that derive theirmeaning entirely from the fact that theyrepresent the effects of the items on theprobability of a correct response and thetimespentonitems,respectively.Becausethese are different quantities, the twotypesofeffectsaredifferent,althoughtheymay correlate across items (van derLinden,2009).Hence,theitemandperson12 Van der Linden criticizesmodels that integrateresponse time and accuracy at one level. AnexampleisRoskam’s(1987)modelinwhichthelogofresponsetimeisaddedtotheθ‐βpartofthe1PLmodel,whichincreasesP+whenmoretimeisspenton the item. In other models this correction isreplaced by a speed parameter (Verhelst,Verstralen,&Jansen,1997)ormodifiedbypersonanditemparameters(Wang&Hanson,2005).Thegeneral form of these types of models islogit(P+)=θk‐τk‐βj. An example of a model thatintegratesaccuracyparametersinamodelforRTisThissen's (1983) model: ln(RTjk)=µ+τk+ξj‐ρ(αjθk‐βj)+εjk,whereµ isageneral intercept,τkandξjareslowness parameters of person i and item j,respectively, ρ determines the influence of theusual2PLresponseparameterstructureandε isanormallydistributederrorterm.

Page 22: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics22

parameters are combined only at thesecond order level of van der Linden'shierarchicalmodel.

An important underlying idea in van derLinden’s model is the ‘fundamentalequation of RT modeling’. According tovan der Linden, the person speedparameter equals the amount of laborrequired to solve the itemdividedbyRT.Hence, RT is the ratio of amount of laborand speed. A logarithmic transformationthen gives E(ln(RTjk))= ξj ‐ τk. ThisequationisthebasisfortheRTpartofvanderLinden’smodel.

How does this model relate to the Q‐diffusionmodel? First of all,wenote thatparametersintroducedinvanderLinden’smodelaretypicallatentvariableseffectingeither accuracy or RT. In the Q‐diffusionmodel, we have no such variables. Driftrate and boundary separationfundamentally differ from the speed andability parameters of the model typesdiscussedabove.Thereasonisthat,intheQ‐diffusion model, these are processparameters,whichareofequalimportanceto accuracy and toRT. Since they arenotdefinedby theireffectson theprobabilityof a correct response and time spent onitems,asthestandardIRTparametersare,there is no objection to use them withinthesamelevelofmodeling.

Of course, an important advantage of theQ‐diffusionmodel is that its extension toRTmodeling need not be crafted, as it isthe very basis of the model. Givenappropriate data, we could fit the jointdistribution of accuracy and RT asspecified in equation 2. We can also usetheequationforthemeanRT(equation5).It is informative to relate the mean RTprediction of the diffusion model to vander Linden’s fundamental equation of RTmodeling. As explained in footnote 1, for

reasonablehighvaluesofav,expectedRTequalsa/2v.Hence:

E(DT) =ξ j*

τ k* ≈

a2v

=12

akp

a ji

vkp

v ji

=12

v ji

a ji

vkp

akp

E(ln(DT) = ξ j − τ k ≈ −ln(2) + lnv ji

a ji − ln

vkp

akp

ξ j ≈ lnv ji

a ji ;τ k ≈ ln

vkp

akp (13)

In words, the time intensity and speedparameters in van der Linden’s modelrelatelinearlytothelogarithmoftheratioof drift rates and item boundaryseparations for items and persons,respectively. SincevanderLinden (2007)applied standard 2PL or 3PL to theresponse data, the translation of hisdifficultyandabilityparametersisalreadyspecified in Equation 13, as the productsof item and person drift rates andboundaryseparations,respectively.

Theserelationsconstrainthemodelatthesecond level of van der Linden'shierarchical model, which in turn allowsfor a test of theQ‐diffusionmodelwithinthemodeling approachof vanderLinden(Fox, Klein Entink, & van der Linden,2007).

For instance, if speed and abilityparametersat thesecond level invanderLinden’s model are positively correlated,then individual differences are primarilydue to differences in drift rate (for anexample, see van der Linden, 2007). Iftheseparameterscorrelatenegatively,theindividual differences in drift rate areprobably similar across subjects, anddifferences are mainly due to differencesin response caution (see example 2 ofKleinEntink,Fox,&vanderLinden,2009).

Page 23: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics23

ThereisoneimportantproblemfortheQ‐diffusion model that emerges from thiscomparison. Dimensional analysis, asapplied to equation 3, leads to soundresults. Boundary separation ismeasuredinunitsof information, anddrift rate is aspeedmeasure(unitsperseconds),sothattheirratioRT(DT)ismeasuredinseconds.However, since we modeled v and a asratiosof thepersonand itemparameters,they becomedimensionless quantities. Asaconsequence,RTbecomesdimensionlesstoo.

To solve this problem, we have toreconsider our choices of v=f(vp,vi) anda=g(ap,ai). For v, we suggested a solutionin Footnote 6. If vp is the informationprocessing'power'oftheperson,andvi isthe'force'requiredtosolvetheitem,theirratio v is a speed measure. For g, asprovisional solution,we suggest todefineap inunitsofinformation,andtoviewthetime pressure ai as a dimensionlessquantity modifying person responsecaution. Clearly, more work is requiredhere, involving precise analysis of howpeople set their responseboundaries andhow they are influenced by task factorssuch as instruction, rewards and timelimits. This is currently an active area ofresearch inexperimentalpsychology(e.g.,Simenetal.,2009;Wagenmakers,Ratcliff,Gomez&McKoon,2008).

IRTModelsForGuessing

TheQ‐diffusionmodelhandlesguessinginaprincipledway:theguessingprobabilityfor equally attractive response optionsequals1/Mforzeroability.Wenotethatitis quite remarkable that guessing can infact be handled by restricting the 2PLmodel (topositiveθ) insteadofextendingthemodelwithaguessingparameter,asiscommon in the IRT tradition. How,precisely, does the Q‐diffusion modelrelatetoothermodelsforguessing?

Currently, the 3PL is by far the mostpopular IRT model to accommodateguessing. Yet, this model suffers from atleasttwoproblems.First, theestimatesof3PL parameters are unstable, especiallyforsmallsamples(seeSanMartin,delPinoand De Boeck, 2005, for an overview). Ifthe guessing parameters are constrainedto be equal to each other, or to 1/M, thisproblem is less severe. If additionally alldiscriminationparametersaresettounity,we obtain theDM‐G, or the 'Raschmodelwith guessing' discussed above. But thismodel seems to fit worse in thecomparisonofmodelsbyRamsay(1989).

Thesecondproblemofthe3PL,whichalsoplagues theDM‐G, is its interpretation.Asnoted above, different interpretationsbased on a distinction in a p‐process ofsearchingforthecorrectanswer,andag‐process for guessing are possible(Hutchinson, 1991). Subjects may searchfor the correct answer, and guess whenthis search fails (assuming that he or sheknows recognizes his or her failure);however, other serial or parallel setupsarepossible.According toSanMartin,delPinoandDeBoeck,thesuccessofguessingwillalsodependonability.Theythereforeintroduce ability‐based guessing withintheDM‐Gmodel,bymakingthesuccessofthe g‐process dependent on ability,weightedbyadiscriminationparameterα:

P+ =e(θ k −β j )

1+ e(θ k −β j )+ (1− e(θ k −β j )

1+ e(θ k −β j )) eα jθ k+γ j

1+ eα jθ k+γ j

(14)

Forα=0 this model reduces to the DM‐Gwithguessingparameterequaltotheexpitof γ. In an empirical example, they showthat this DM‐AG better explains the datathan the DM and the DM‐G. However, adisadvantageoftheDM‐AG(or1PL‐AG,asSanMartin,delPinoandDeBoeckcall it)

Page 24: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics24

isthatthep‐processandg‐processarenotwell separated anymore. If ability plays atoo large role in guessing, the g‐processcouldbeequally successful,orevenmoresuccessful than the p‐process. Also, forvery lowability themodelpredictsbelowchance responding (property 3, see SanMartin,delPinoandDeBoeck,2005).

As Ramsay (1989) remarks, it would bemoreelegant tohaveamodelbasedonasingleprocessmodel.Especiallywhenthep‐ and g‐process get mixed up, a singleprocess model might be preferable. Wethink the Ramsay's QM and our Q‐diffusion model are more attractive forthis reason. In the QM, guessing dependsonability,andthereisanaturaltransitionformaccuraterespondingtoguessing:Thelower the level of ability, the lower theprobability of a correct response; thisprobabilityhasitslowerasymptoteat1/Mfor an ability level of zero (whichrepresents no ability). In the QM there isjust one probabilistic process: abilitybased guessing. Pure guessing, educatedguessing(P+>1/M)andcorrectrespondingare explained by the same underlyingmechanism.

Of course, sometimes a two‐processdescription of item responding might bemoreaccurate.CaoandStokes(2008)andBechger, Maris and Verstralen (2005)discuss several guessing scenarios andassociated IRT models. It would be veryinteresting to attempt to derive thesemodels from sequential sampling modelsofdecisionmakingthatincludeaguessingprocess (see for instance, Ratcliff, 2006).The DM‐AG may serve as a basic modelhere.

Anothermodel for guessing is introducedby Hessen (2004; 2005). Hesseninvestigates a subclass of the fourparameter logistic IRM, for which bothspecific objectivity and sufficiency of the

total score for the latent trait hold. Onespecial case thatHessen (2004) proposesis:

P+ =λ / eβ j + eθ k −β j

1+ eθ k −β j (15)

ThisIRTmodelhasupperasymptotesat1,but lowerasymptotesat

λ / eβ j .Hence,theeasier the itemis, thehighertheguessingasymptote becomes. This, in a sense,mirrors the DM‐AG in that the success ofguessing depends on the difficulty of theitemandnoton theabilityof the subject.Wewill thereforecall thismodel theDM‐DG,fordifficultybasedguessing.

The prime attractiveness of Hessen’smodel lies in the statistical properties ofspecific objectivity and sufficiency. TheseadvantagesalsoapplytotheDM‐Gand,ofcourse,theDM,butnottotheDM‐AGandthe QM. Ramsay (1989) discusses a wayfor which the QM permits specificobjective comparisons, but clearlysufficiency of the total score for ability ismissing. This is a disadvantage, but weagree with Ramsay’s remark that weshould not overemphasize statisticalconvenience.

Discussion

In introductions to item response theory,thepreference for the logistic equation istypically explained in terms of statisticalor measurement‐theoretical convenience.However,fromasubstantivepointofview,thelackofapsychologicaljustificationforthis key property of the measurementmodel compromises test validity. Thereason is that validity requires a causalmechanismlinkingthetraitorabilitywiththe item responses (Borsboom,Mellenbergh, & Van Heerden, 2004;Borsboom & Mellenbergh, 2007). In theabsenceofsuchamechanism,therelationbetween the targeted attribute and the

Page 25: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics25

item responses is essentially a black box,and the psychological appropriateness ofthe function that describes this relationbecomes an article of faith. Since theprocesses that lie between itemadministration and item response arepsychological in nature, the only way toremedy this situation is by constructingpsychological theories of item responseprocesses,andlinkingthesetomodelsforindividual differences. In the positiveabilitymodelwemakethisimportantstep.Apart from the fact that the ensuinginvestigation, in our view, has producedmany unexpected implications andsurprising results, the most importantaspect of this paper may be that it givesproof of concept: It is possible tosystematically connect latent variables toitemresponsesthroughprocessmodelssoas to get a substantive handle on themeasurementprobleminpsychology.

Clearly, however, we have only begun toinvestigate the common ground coveredby IRT and cognitive processmodels.Weconsider the further exploration of thisterritory to be of significant importanceforboth IRTmodelingand formalmodelsofcognitivepsychology.Somepossibilitiesfor advances along these lines are thefollowing.

First, the current paper proposed afundamental difference between abilitytestsontheonehand,andpersonalityandattitudetestontheother,bynotingthatadiffusionprocessrendersatraditionalIRTmodel unlikely for abilities, but plausiblefor personality and attitudes. This issurprisingbecause IRTmodelshavebeentraditionally proposed for, and applied inthe context of, ability testing: althoughapplications to personality and attitudeshave become more frequent in the pastdecades, these are clearly spin‐offs of theability testing approach. It turns out,however, that the situationmight aswell

have been reversed: from a processmodeling point of view, standard IRTmodels are plausible for attitudes andpersonality,butnotnecessarily forabilitytests. This, of course, invites furtherinvestigations into the substantive natureof abilities versus personality traits andattitudes,andintothemethodologicalandpsychometrictreatmentoftestscoresthatisconsistentwiththatnature.

Theitemresponsemodelthatweargueisrequiredforabilitiesradicallydiffersfromstandard item response models in anumber of respects. In particular, thepostulatethatabilityisessentiallypositivehas far‐reaching implications. It leads toscales with natural zero points, invitingfurther analysis concerning themeasurementpropertiesof suchscales. Italso leadstoan itemresponsemodel thatincorporates guessing as part of thedecisionprocess.Incontrasttootheritemresponsemodels of guessing, thepositiveability model can accommodate forguessing by restricting instead ofextendingthestandard2PLmodel.Thisisa remarkable result.Finally, themodelingframework leads to novel interpretationsofstandarditemresponseparameters.Forinstance, it is surprising that the positiveability model has no standard βparameter. Instead, the difficulty of theitemis incorporated in thediscriminationparameter. At the same time, we haveshown that standard ability anddiscriminationparameters,aswellasVander Linden’s time intensity and speedparameters,canbetranslatedtothebasicdiffusion parameters of drift rate andboundary separation. Van der Linden’sapproach stands out as the best currentpsychometric model for accuracy andresponsetimes;therefore,thefactthatwewere able to find such strong relationsbetween his approach and ours ispromising.

Page 26: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics26

Considering abilities and theirmeasurement, we think that the positiveability model strongly suggests that theappearanceofunidimensionalityforbroadsets of items, as for instance observed inintelligence testing or educationalmeasurement, is just that: appearance.Arithmetic items for addition andmultiplication simply cannot beunidimensional, because they depend ondiscretely separable abilities, each ofwhichisplausiblygovernedbyaseparatepositive abilitymodel. The appearance ofunidimensionality probably results fromthe fact that these abilities are stronglyintertwinedandarrangedinahierarchicalfashion; this, in turn, results in data thatwillappearunidimensionalbecauseoftheimplied strong positive associationbetween item responses. However, thisshouldnotbemistakenforevidencethatasingle ability is in play. It merely meansthatindividualdifferencesinperformancecan be reasonably described by a scalarvariable. Statistical unidimensionality,thus,doesnotimplythatasingleabilityismeasured. It is important to stress thisfact, because in both psychometric andsubstantive literatures, the concepts of apsychometric latent variable and asubstantiveabilityhavebecomeconflated,as have the activities of 'fitting aunidimensional model' and 'measuring asingle ability'; these concepts andactivities do not coincide and should beseparated clearly (van der Maas et al.,2006). The diffusion account may beuseful in disentangling different abilities,as it extends the standard IRT paradigmwith predictions on the behavior of RTdata. Building up an account of therelations between distinct abilities intypicaltestsshouldbeamainpointonthepsychometric and psychological researchagendasforthenextdecades.

Naturally, the presented modelingapproachhingesontheappropriatenessof

the chosen process model. Thus, onepossible critique of our model could bebasedonthefact thatthediffusionmodelis normally applied to simple fastdecisions,as inperceptual tasksor lexicaldecision research, rather than to the typeof decisions found in tests used indifferential psychology. For example, inmany ability tests that are analyzed withIRT, the decision process may consist oflonger, perhaps sequential, stages thatcould probably not be reduced to onesimple random walk of informationaccumulation. In this case, a simplerandom walk is at best a roughapproximation of the underlying decisionprocess. On the other hand, givenreasonable assumptions, more complexdecision models reduce to the diffusionmodel (Bogacz et al., 2006). Also, someslow responses to certain knowledge‐based questions (‘What is the biggestcountry of Europe?’) may be welldescribed by a simple random walkprocess. For other decisions, involving aseries of computations for instance, asimplerandomwalkmaybetoosimplisticbut still serve as a reasonable firstapproximation. In addition, it has beenshownthatoptimaldecisionmaking,eveninmorecomplexdecisionmodels,maybebest described by the diffusion model(Bogacz et al, 2006;Ratcliff, vanZandt,&McKoon,1999).

Another strong assumption in ourapproach concerns the fact that ability isessentially positive. First, it could bearguedthatsomeabilitiesdoinfacthaveabipolar structure, which admits for bothpositiveandnegativevaluesinthemodel.AclassicexampleisthePiagetianabilitytoconserve quantitative properties asnumber mass and volume, in spite ofchanges in form. Children who do notunderstand conservation systematicallyscore below chance level on multiple‐choiceitemsofaconservationtest.Insuch

Page 27: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics27

a case, all qualitative implications ofTuerlinckx and De Boeck's model makeperfect sense, as explained in Footnote 4,and we would recommend to use the D‐diffusionitemresponsemodelinthiscase(Equation12).Howeverwedonotbelievethis to be a counterexample to our thesisof ability being essentially positive. Thereason is that children’s responses toconservationitemsaredeterminedbytwomutual exclusive strategies or abilitiescausing sudden transitions indevelopmental trajectories (vanderMaas& Molenaar, 1992), and each of theseshould be constructed as being anessentiallypositiveability.

Second, the Q‐diffusion model requirestests that measure a single ability. Sincesome psychological and educational testsclearlymeasureahostofrelatedabilities,the applicability of the Q‐diffusionmodelto such cases may be limited. Perhaps aconjunctive multidimensional ormulticomponent Q‐diffusion model couldbe developed for such situations. On theone hand, this represents an opportunityfor further research. On the other hand,one could also argue that we shouldreconsidertheuseofteststhatdependonmultiple related abilities. From ameasurement point of view, single abilitytestsshouldbepreferred.The integrationof simple abilities into higher orderabilities, perhaps culminating in whatseems to be a single overarchingdimension, should ideally be explicitlymodeled; this would arguably be a moretransparent approach than the currentpractice,inwhichmultiplerelatedabilitiesareimplicitlygroupedtogetherinasingledimension.Thedisadvantageof the latterprocedure is that the emergent singledimension no longer represents atheoretically transparent psychologicalconcept.

Third,aratherradicalconsequenceofthedefinition of ability in the Q‐diffusionmodel is that any able person willeventually solve all items of test, even ifability is very small and the itemextremely difficult. This consequence istheoretically valuable because itrepresents an important testableprediction that flows naturally from themodel. It isalsouseful tocharacterizethenature of simple abilities. However, inpsychometric practice, it may not beappropriate in certain situations. In thesecases, extensions of the Ratcliff diffusionmodel (e.g., introducing variance in driftrates) can be used to eliminate thispropertyofthemodelsothatitallowsforresponse errors even when boundaryseparation (available time to respond)approachesinfinity.

A final limitationof theQ‐diffusionmodelconcerns the solutionwepropose to dealwith multiple‐choice items. We havederived thecorrection formultiple‐choicefrom Bock's (1972) nominal responsemodel,which leads to a simple extensionof the Q‐diffusion model that can handlemultiple‐choice items. However, theresulting transformation is not so easilyapplicable to the formula for responsetimes. As a consequence, the Bayesian fitprocedure for M>2 is complicated andrequires additional assumptions. Thus, itwould be preferable to derive a Q‐diffusion type model from a multiple‐choice stochastic sequential samplingmodel fordecisionmaking. Inviewof therecentinterestinmultiple‐choicedecisionmodels in mathematical psychology (e.g.,McMillen&Holmes,2006),wearehopefulthatsuchanapproachiswithinreach.

Clearly, the connection between processmodels for decision making and IRTmodels for individual differences is anextremely fruitful one. It allowsresearchers in individual differences

Page 28: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics28

research to craftprocessmodels for theiritem responses as well as RT’s, and todevelop new research strategies andhypothesesthatmayfunctiontoelucidatehowtestswork.Theproposedsystematicconnection between psychologicalprocesses and psychometric latentvariables may allow researchers toaddress their validity problems byuncovering how their tests work, i.e., byexplicitly modeling the processes that liebetween item administration and itemresponse (Borsboom,Mellenbergh, & VanHeerden, 2004). For this reason, thepresentinvestigationsmaydomuchmorethan merely extend the family of IRTmodels with some new members; theymay serve to finally get a grip on thevalidity issues that have plaguedpsychologicaltestingforthepastcentury.

References

Bartholomew, D. J. (1987). Latent variable modelsandfactoranalysis.London:Griffin.

Bechger, T. M., Maris, G., & Verstralen, H. H. M.(2005).TheNedelskymodelformultiplechoiceitems.InA.vanderArk,M.Croon,&K.Sijtsma(Eds.), New developments in categorical dataanalysis for the social and behavioral sciences(pp.187‐206).Mahwah,NJ:LawrenceErlbaum.

Birnbaum, A. (1968). Some latent trait models andtheir use in inferring an examinee’s ability. InLord, F.M. & Novick, M.R. (Eds.), Statisticaltheoriesofmentaltestscores.Reading(pp.397‐479).MA:Addison‐Wesley.

Bock,D.R. (1972). Estimating itemparameters andlatentabilitywhenresponsesarescoredintwoormorenominalcategories.Psychometrika,37,29‐51.

Bogacz,R.,BrownE.,Moehlis,J.,Holmes,P.,&Cohen,J. C. (2006). The Physics of Optimal DecisionMaking: A Formal Analysis of Models ofPerformance inTwo‐AlternativeForced‐ChoiceTasks.PsychologicalReview,113(4),700‐765.

Borsboom, D., Mellenbergh, G.J., & Van Heerden, J.(2003). The theoretical status of latentvariables.PsychologicalReview,110,203‐219.

Borsboom, D., Mellenbergh, G.J., & Van Heerden(2004). The concept of validity. PsychologicalReview,111,1061‐1071.

Borsboom, D., & Mellenbergh, G.J. (2004). Whypsychometrics is not pathological. Theory &Psychology,14,105‐120.

Borsboom, D., & Mellenbergh, G.J. (2007). Testvalidityandcognitiveassessment. In:Leighton,J., & Gierl, M. (Eds.). Cognitive DiagnosticAssessment for Education: Theory andApplications(p.85‐115).Cambridge:CambridgeUniversityPress.

Borsboom, D., Kievit, R., Cervone, D.P., &Hood, S.B.(2009). The two disciplines of scientificpsychology,or:Thedisunityofpsychologyasaworking hypothesis. In: Valsiner, J., Molenaar,P.C.M., Lyra, M.C.D.P., & Chaudary, N.Developmentalprocessmethodologyinthesocialand developmental sciences (pp. 67‐98). NewYork:Springer.

Borst, G., Kievit, R.A., Thompson, W. L. & Kosslyn,S.M. (2010) Linear increase of times withincreasinganglesinmentalrotationparadigms:A refutationof the tacitknowledgehypothesis.European Journal of Cognitive Psychology. Inpress.

Busemeyer, J.R.&,Townsend, J.T. (1993).DecisionFieldTheory:ADynamic‐CognitiveApproachtoDecisionMaking in anUncertain Environment.PsychologicalReview,100,432‐432.

Cao, J.&Stokes,S.L. (2008).BayesianIRTGuessingModels for Partial Guessing Behaviors.Psychometrika,73(2),209‐230.

Cressie, N., & Holland, P.W. (1983). Characterizingthemanifestprobabilitiesoflatenttraitmodels.Psychometrika,48,129‐141.

Cox, D. R. & Miller, H. D. (1970). The Theory ofStochasticProcesses.London.

delaTorre,J.(2009).Acognitivediagnosismodelforcognitively‐based multiple‐choice options.Applied Psychological Measurement, 33, 163‐183.

DeGroot,A.D. (1978).Thoughtand choice in chess.The Hague: Mouton. (Original work publishedin1946).

Edwards,W. (1965). Optimal strategies for seekinginformation: Models for statistics, choicereaction times, and human informationprocessing.JournalofMathematicalPsychology,2,312‐329.

Elo, A. (1978). The Rating of Chessplayers, Past and Present, New York: Arco.

Embretson,S.E.(1984).Agenerallatenttraitmodelfor responseprocesses.Psychometrika, 49, 175–186.

Ferrando, P. J. (2007). A Pearson‐Type‐VII ItemResponse Model for Assessing PersonFluctuation.Psychometrika,72,25‐42.

Ferrando, P. J.,& Lorenzo‐Seva,U. (2007). An item‐response model incorporating response time

Page 29: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics29

data in binary personality items. AppliedPsychologicalMeasurement,31,525‐543.

Fischer,G.H.(1995).DerivationsoftheRaschmodel.In Fischer G. H., Molenaar I. W. (Eds.), Raschmodels: Foundations, recent developments, andapplications(pp.157‐180).NewYork:Springer.

Fischer, G. H. and I. W. Molenaar (1995). RaschModels: Foundations, Recent Developments, andApplications.NewYork:SpringerVerlag.

Fox, J.P.,KleinEntink,R.H.,&vanderLinden,W. J.(2007). Modeling of responses and responsetimes with the package CIRT. Journal ofStatisticalSoftware,20(7),1‐14.

Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B.(2004). Bayesian Data Analysis. London:Chapman&Hall.

Goodman, L.A. (1974). Exploratory latent structureanalysis using both identifiable andunidentifiablemodels.Biometrika,61,215–231.

Hamaker,E.L.,Nesselroade,J.R.,&Molenaar,P.C.M.(2007). The integrated state‐space model.JournalofResearchinPersonality,41,295‐315.

Harré, R., & Madden, E. H. (1975). Causal Powers.Oxford:Blackwell.

Hessen, D. J. (2004). A new class of parametric IRTmodels fordichotomous itemscores. JournalofAppliedMeasurement,5,385‐397.

Hessen, D.J. (2005). Constant latent odds‐ratiosmodels and the Mantel‐Haenszel nullhypothesis.Psychometrika,70,497‐516.

Hoskens, M., & De Boeck, P. (2001).MultidimensionalComponential ItemResponseTheory Models for Polytomous Items. AppliedPsychologicalMeasurement,25,19‐37

Hutchinson,T.P. (1991).Ability, partial informationand guessing: Statistical modelling applied tomultiple­choice tests. Rundle Mall, SouthAustralia:RumsbyScientificPublishing.

Jöreskog, K. G. (1971). Statistical analysis of sets ofcongenerictests.Psychometrika,36,109–133.

Kievit, R. A. (2010). Representational inertia: Theinfluence of associative knowledge on 3Dmental transformations. (internal report,UniversityofAmsterdam)

KleinEntink,R.H.,Fox, J.P.,&vanderLinden,W. J.(2009). A multivariate multilevel approach tosimultaneous modeling of accuracy and speedontestitems.Psychometrika,74,21‐48.

Laming,D.R.J.(1968).Informationtheoryofchoice­reactiontimes.London:AcademicPress.

Lawley,D.N.,&Maxwell,A.E.(1963).Factoranalysisasastatisticalmethod.London:Butterworth.

Lazarsfeld, P. F., & Henry, N. W. (1968). Latentstructureanalysis.Boston:HoughtonMifflin.

Link,S.W.(1992).TheWaveTheoryofDifferenceandSimilarity. Hillsdale: Lawrence ErlbaumAssociates.

Lord,F.M.(1952).Atheoryoftestscores.NewYork:PsychometricSociety.

Lunn,D. J. (2003).WinBUGSDevelopment Interface(WBDev).ISBABulletin,10(3):10–11.

Maris, E. (1995). Psychometric latent responsemodels.Psychometrika,60,523–547.

McLachlan, G., & Peel, D. (2000). Finite mixturemodels.NewYork:Wiley.

McMillen, T. & Holmes, P. (2006). The dynamics ofchoice among multiple alternatives. Journal ofMathematicalPsychology,50(1),30‐57.

Mellenbergh, G. J. (1994). Generalized Linear ItemResponse Theory. Psychological Bulletin, 115,300–307.

Mislevy, R. J. (2008). How cognitive sciencechallenges the educational measurementtradition.Measurement,6(1–2),124.

Mokken,R.J.(1971).Atheoryandprocedureofscaleanalysis with applications in political research.TheHague,theNetherlands:Mouton.

Molenaar,P.C.M.(2004).AmanifestoonPsychologyasidiographicscience:Bringingthepersonbackinto scientific psychology, this time forever.Measurement,2(4),201‐218.

Ramsay, J. O. (1989). A comparison of three simpletest theorymodels.Psychometrika, 54(3), 487‐499.

Rasch, G. (1960). Probabilistic Models for SomeIntelligence and Attainment Tests. Copenhagen:DanishInstituteforEducationalResearch.

Ratcliff, R. (1978). A theory of memory retrieval.PsychologicalReview,85(2),59‐108.

Ratcliff, R. (2002). A diffusion model account ofresponse time and accuracy in a brightnessdiscriminationtask:Fittingrealdataandfailingto fit fake but plausible data. PsychonomicBulletin&Review,9(2),278‐291.

Ratcliff, R. (2006). Modeling response signal andresponsetimedata.CognitivePsychology,53(3),195‐237.

Ratcliff,R.&Rouder,J.N.(1998).ModelingResponseTimes for Two‐Choice Decisions. PsychologicalScience,9(5),347‐356.

Ratcliff, R., Thapar, A., & McKoon, G. (2001). Theeffects of aging on reaction time in a signaldetection task. Psychology and Aging, 16, 323‐341.

Ratcliff, R., Van Zandt, T., & McKoon, G. (1999).Comparingconnectionistsanddiffusionmodelsof reaction time. Psychological Review, 106,261–300.

Reise, S. P. (2000). Using multilevel logisticregressiontoevaluateperson‐fitinIRTmodels.MultivariateBehavioralResearch,35,543–570.

Roskam, E. E. (1997). Models for speed and time‐limit tests. In W. J. van der Linden & R. K.Hambleton (Eds.). Handbook of modern item

Page 30: Cognitive psychology meets psychometric theory: On the ......Cognitive psychology meets psychometrics 1 Cognitive psychology meets psychometric theory: On the relation between process

Cognitivepsychologymeetspsychometrics30

response theory (pp. 187‐208). New York:Springer.

Roskam, E. E., & Jansen, P. G. W. (1984). A newderivationoftheRaschmodel.InE.DeGreef&J.vanBuggenhaut (Eds.),Trends inmathematicalpsychology(pp.293‐307).Amsterdam:Elsevier.

SanMartin,E.,G.delPino,&deBoeck,P.(2006).IRTmodels for ability based guessing. AppliedPsychologicalMeasurement,30,1–21.

SASInstitute(2000).SAS/STATuser’sguide(Version8).Cary,NC:SASIntitute.

Simen, P., Contreras, D., Buck, C., Hu, P., Holmes, P.and Cohen, J. D. (2009). Reward‐rateoptimization in two‐alternative decisionmaking: empirical tests of theoreticalpredictions,JournalofExperimentalPsychology:Human Perception and Performance, 35,1865‐1897.

Stone, M. (1960). Models for choice‐reaction time.Psychometrika,25(3),251‐260.

Strandmark,N.L.&Linn,R.L.(1987).Ageneralizedlogistic item response model parameterizingtest score inappropriateness. AppliedPsychologicalMeasurement,11,355‐370.

Thissen,D.(1983).Timedtesting:Anapproachusingitem response theory. InD. J.Weiss (Ed.),Newhorizons in testing: Latent trait test theory andcomputerized adaptive testing. New York:AcademicPress.

Tuerlinckx, F., & De Boeck, P. (2005). Twointerpretations of the discriminationparameter.Psychometrika,70(4),629‐650.

Van der Linden, W.J. & Hambleton, R. K. (Eds.)(1997). Handbook of modern item responsetheory.NewYork:Springer.

van der Linden, W. J. (2007). A hierarchicalframeworkformodelingspeedandaccuracyontestitems.Psychometrika,72,287‐308.

van der Linden, W. J. (2009). Conceptual issues inresponse‐timemodeling. JournalofEducationalMeasurement,46,247‐272.

vanderMaas,H.L.J.,Dolan,C.V.,Grasman,R.P.P.P.,Wicherts,J.M.,Huizenga,H.M.&Raijmakers,M.E. J. (2006). A dynamical model of generalintelligence: the positive manifold ofintelligencebymutualism.PsychologicalReview,113(4),842‐861.

vanderMaas,H.L.J.,&Maris,G.(submitted).Anote

on the assumptions required to apply the 1PLand 2PL model to dichotomously scoredmultiplechoicedata.

van derMaas, H. L. J., &Molenaar, P. C. M. (1992).Stagewise cognitive development: Anapplicationofcatastrophetheory.PsychologicalReview,99(3),395–417.

van der Maas, H. L. J., Kolstein, R., & van der Pligt, J. (2003). Sudden jumps in attitudes. Sociological Methods & Research, 32(2), 125-152.

van derMaas, H. L. J. &Wagenmakers, E. J. (2005).The Amsterdam Chess Test: a psychometricanalysisofchessexpertise.AmericanJournalofPsychology,118,29‐60.

Vandekerckhove, J., & Tuerlinckx, F. (2007). Fittingthe Ratcliff diffusion model to experimentaldata.Psychonomic Bulletin& Review,14, 1011‐1026.

Verhelst,N.D.,Verstralen,H.H.F.M.,&Jansen,M.G.(1997).A logisticmodel for time‐limit tests. InW. J. vander Linden&R.K.Hambleton (Eds.).Handbook of modern item response theory (pp.169‐185).NewYork:Springer.

Voss,A.,&Voss, J. (2007). Fast‐dm:A freeprogramfor efficient diffusionmodel analysis.BehaviorResearchMethods,39,767‐775.

Wagenmakers,E.J.,Grasman,R.,&Molenaar,P.C.M.(2005).On the relationbetween themeanandthevarianceofadiffusionmodelresponsetimedistribution. Journal of MathematicalPsychology,49,195‐204.

WagenmakersE. J.,Ratcliff,R.,Gomez,P.,&McKoonG.(2008).Adiffusionmodelaccountofcriterionshifts in the lexical decision task. Journal ofMemoryandLanguage,58,140–159.

Wagenmakers,E.J.,vanderMaas,H.L.J.,&Grasman,R. P. (2007). An EZ‐diffusion model forresponse time and accuracy. PsychonomicBulletin&Review,14,3‐22.

Wald,A.(1947).Sequentialanalysis.NewYork:Wiley.Wang, T. & Hanson, B. A (2005). Development and

calibration of an item response model thatincorporates response time. AppliedPsychologicalMeasurement,29,323‐339.

Wickelgren, W. A. (1977). Speed‐accuracy tradeoffand information processing dynamics. ActaPsychologica,41,67‐85.