
Behavior Research Methods, Instruments, & Computers 1986, 18 (6), 623-632

Efficient estimation of sensory thresholds

LEWIS O. HARVEY, JR.
University of Colorado, Boulder, Colorado

Laboratory computers permit detection and discrimination thresholds to be measured rapidly, efficiently, and accurately. In this paper, the general natures of psychometric functions and of thresholds are reviewed, and various methods for estimating sensory thresholds are summarized. The most efficient method, in principle, using maximum-likelihood threshold estimations, is examined in detail. Four techniques are discussed that minimize the reported problems found with the maximum-likelihood method. A package of FORTRAN subroutines, ML-TEST, which implements the maximum-likelihood method, is described. These subroutines are available on request from the author.

With a laboratory computer, it is now easy to measure detection and discrimination thresholds rapidly, efficiently, and with a known degree of accuracy. This three-part paper examines such methods. In the first part, the general natures of psychometric functions and sensory thresholds are discussed. In the second part, various approaches to the rapid determination of thresholds are examined. In the third part, a specific computer implementation of the maximum-likelihood threshold estimation method, ML-TEST, is described. These ideas are developed in the context of measuring contrast thresholds for the visual detection of sine wave gratings. They are, however, of sufficient generality that they can be applied to many other situations as well.

forced-choice (2AFC) detection experiment as a function of log stimulus contrast for three sine wave gratings of different spatial frequency (2.0, 8.0, and 21.0 cycles per degree). An important feature of psychometric functions is seen in Figure 1: In spite of the large differences in the contrasts required for equal detection performance, the slopes of the psychometric functions, when stimulus contrast is expressed in logarithmic units, are about the same for all three sets of data. We could, therefore, represent each set with a single S-shaped function that has been shifted laterally to coincide with the data.

To describe the data in Figure 1, we need an appropriate S-shaped function that will predict response probability for different stimulus levels (SL). Three such functions

Figure 1. Probability of a correct response as a function of log stimulus contrast in a two-alternative forced-choice detection experiment for three different vertical sine wave gratings: 2.0 (left), 8.0 (center), and 21.0 (right) cycles per degree. Each data point was determined to a standard error of 0.05, averaging 72 trials per point. The solid curves are the best fitting Weibull functions for each set of data, fit with a maximum-likelihood criterion. The horizontal line represents threshold performance of 0.82; the vertical lines mark the value of α for each curve.

[Figure 1 plot: x-axis, log contrast (-3.0 to 0.0); y-axis, P(correct) (0.10 to 0.90).]

PSYCHOMETRIC FUNCTIONS

A psychometric function relates some physical measure of stimulus intensity (e.g., exposure duration, luminance, contrast, or amplitude) to some performance measure of detection or discrimination, such as hit rate, percent correct, or d' (Torgerson, 1958, p. 145). When performance is expressed as probability, psychometric functions typically are S-shaped or ogival in form. Figure 1 shows the probability of a correct response in a two-alternative

The development of these subroutines began when the author was guest professor at the Institute for Medical Psychology, Ludwig-Maximilians University, Munich, Federal Republic of Germany, with the support of a research stipend from the Alexander von Humboldt Foundation, Bonn. I thank Ernst Pöppel and Ingo Rentschler for their warm hospitality and the von Humboldt Foundation for its generous and flexible support. Figures were prepared with equipment provided by Grant RR07013-78 from the Biomedical Research Support Grant Program, National Institutes of Health. The Monte Carlo simulations were carried out on the VAX 11/780 computer of the Computing Laboratory for Instruction in Psychological Research (CLIPR) in the Department of Psychology at the University of Colorado.

Requests for reprints and correspondence should be addressed to: Lewis O. Harvey, Jr., Department of Psychology, Campus Box 345, University of Colorado, Boulder, CO 80309-0345. Those wishing copies of the ML-TEST maximum-likelihood subroutines and demonstration program should send the author a single-sided, single-density floppy disk formatted for the RT-11 operating system.

Copyright 1986 Psychonomic Society, Inc.


are commonly used for this purpose: the logistic function, the Weibull function, and the Gaussian integral function. The logistic function (Berkson, 1951, 1953) is given by:

$$P(X) = \gamma + \left[(1.0-\gamma)\cdot\frac{1.0}{1.0+e^{-SL}}\right]$$

$$SL = \log_e\left[(X/\alpha)^{\beta}\right] = (\log_{10}X-\log_{10}\alpha)\cdot\beta\cdot\log_e 10. \tag{1}$$

The Weibull function (Nachmias, 1981; Quick, 1974; Weibull, 1951) is given by:

$$P(X) = \gamma + \left[(1.0-\gamma)\cdot\left(1.0-e^{-SL}\right)\right]$$

$$SL = (X/\alpha)^{\beta} = 10.0^{\left[(\log_{10}X-\log_{10}\alpha)\cdot\beta\right]}. \tag{2}$$

The Gaussian integral function is given by:

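$$P(X) = \gamma + \left[(1.0-\gamma)\cdot\int_{-\infty}^{X}\frac{\beta}{(2\pi)^{0.5}}\cdot e^{(-SL^{2}/2)}\,dX\right]$$

$$SL = \log_{10}\left[(X/\alpha)^{\beta}\right] = (\log_{10}X-\log_{10}\alpha)\cdot\beta. \tag{3}$$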

The Gaussian integral can be approximated by a polynomial function (Abramowitz & Stegun, 1965, p. 932), whereas the other two functions have exact analytic forms. Each of these mathematical representations of the psychometric function is completely specified by three parameters: α, the stimulus intensity at which the slope of the function is maximum; β, the steepness of the function; and γ, the probability of a correct response due to chance alone.
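The three functions can be sketched in a few lines of Python. This is only an illustration of Equations 1 to 3, not the ML-TEST FORTRAN code: the function names are hypothetical, stimulus and threshold are passed in linear contrast units, and the Gaussian integral is evaluated with the error function from the standard library rather than with the polynomial approximation mentioned above.

```python
import math

def logistic(x, alpha, beta, gamma):
    """Equation 1: logistic psychometric function."""
    sl = (math.log10(x) - math.log10(alpha)) * beta * math.log(10.0)
    return gamma + (1.0 - gamma) * (1.0 / (1.0 + math.exp(-sl)))

def weibull(x, alpha, beta, gamma):
    """Equation 2: Weibull psychometric function."""
    sl = 10.0 ** ((math.log10(x) - math.log10(alpha)) * beta)
    return gamma + (1.0 - gamma) * (1.0 - math.exp(-sl))

def gaussian_integral(x, alpha, beta, gamma):
    """Equation 3: cumulative unit normal evaluated at SL.

    Assumption: the integral is computed with math.erf instead of the
    polynomial approximation cited in the text.
    """
    sl = (math.log10(x) - math.log10(alpha)) * beta
    return gamma + (1.0 - gamma) * 0.5 * (1.0 + math.erf(sl / math.sqrt(2.0)))

# Probability of a correct response at the threshold stimulus (x == alpha)
# in a 2AFC task (gamma = 0.5), using the slopes quoted for Figure 2.
p_weibull = weibull(0.005, 0.005, 3.50, 0.5)
p_logistic = logistic(0.005, 0.005, 4.80, 0.5)
p_gaussian = gaussian_integral(0.005, 0.005, 6.62, 0.5)
```

At the threshold stimulus the Weibull sketch returns about 0.816, whereas the other two return 0.75, whatever slope is used.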

The choice of which function to use for representing psychometric data can be made either on theoretical or on practical grounds. Some models of detection and discrimination performance make explicit predictions about the nature of the psychometric function. Signal detection theory, for example, predicts that the psychometric function of an ideal observer will be based on the Gaussian integral (Green & Swets, 1974), whereas models utilizing probability summation among independent detection mechanisms predict a Weibull function (Green & Luce, 1975; Nachmias, 1981; Quick, 1974). The logistic function is used as a substitute for the Gaussian integral because it is computationally more efficient and has a closed analytic form (Berkson, 1951).

On the other hand, one can choose the function that provides the best fit to the data, disregarding the theoretical implications. In 90% of the psychometric data I have collected by using n-alternative forced-choice detection paradigms, the Weibull function gives a better fit than either the Gaussian or the logistic. Similar findings are widely reported in the vision literature (e.g., Nachmias, 1981; Watson, 1979).

The concept of threshold has both a theoretical meaning and a practical meaning. As a theoretical concept, sensory thresholds are a property of some models of detection and discrimination (Krantz, 1969). The classic threshold model is the high threshold detection model. This model assumes a sensory threshold: Stimuli whose intensity is above this threshold can influence perception; stimuli below the threshold intensity have no effect. The threshold is further assumed to fluctuate in time, having a mean and standard deviation described by a Gaussian probability distribution. Under the high threshold detection model, psychometric data, after correction for guessing, should fit a Gaussian integral function. The stimulus that evokes a response probability of 0.5 is equal to the mean threshold value. This threshold stimulus value is given by the parameter α.

The high threshold model of detection has been tested repeatedly and has been rejected repeatedly because some of the predictions of this model are not supported by actual data (e.g., Egan, 1975; Green & Swets, 1974; Krantz, 1969; Swets, 1986a, 1986b; Swets, Tanner, & Birdsall, 1961). In spite of this rejection, several concepts derived from the high threshold model (e.g., correction for guessing and subliminal stimulation) still are widely used by psychologists and the general public. Other detection models either incorporate no sensory threshold at all or hypothesize multiple weak thresholds (Krantz, 1969), so there is no compelling reason to choose a specific psychometric function in order to compute a threshold from data.

The practical use of the threshold concept is as a specification of psychometric performance. For this purpose, a sensory threshold can be defined as the stimulus level that results in some specific level of performance (e.g., d' of 1.0 or a correct response probability of 0.75). The parameter α of the above functions represents this type of threshold. The three solid curves in Figure 1 are the best fitting Weibull functions for each set of data, using a maximum-likelihood fitting criterion. The stimulus contrast corresponding to the α of each function is marked by a vertical line. It should be noted that the Weibull function is asymmetrical and that α corresponds to a performance probability that is not halfway between chance (γ) and 1.0, as is the case for the logistic function and the Gaussian integral. In a two-alternative forced-choice task, where γ is 0.5, α corresponds to a correct response probability of 0.82 for the Weibull function but to only 0.75 for the logistic function and the Gaussian integral, as shown in Figure 2. Therefore, one must be careful in comparing thresholds derived from Weibull functions with those from logistic or Gaussian integrals. The latter two always give lower α values than does the Weibull function.
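As a quick check of these values, substituting X = α into Equations 1 to 3 with γ = 0.5 gives the following worked illustration (Φ, denoting the cumulative unit normal distribution, is a symbol introduced here and is not used elsewhere in the text):

$$\begin{aligned}
\text{Weibull:}\quad & SL = 1, & P(\alpha) &= 0.5 + 0.5\cdot\left(1 - e^{-1}\right) \approx 0.816\\
\text{Logistic:}\quad & SL = 0, & P(\alpha) &= 0.5 + 0.5\cdot\frac{1}{1+e^{0}} = 0.75\\
\text{Gaussian integral:}\quad & SL = 0, & P(\alpha) &= 0.5 + 0.5\cdot\Phi(0) = 0.75
\end{aligned}$$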

SEQUENTIAL EXPERIMENTS

A sequential experiment is one in which the experimenter makes a decision about the experiment after each trial. Levitt (1971), in reviewing adaptive psychophysical methods, distinguished two broad classes of se-


1967; Taylor, Forbes, & Creelman, 1983); and (3) function-constrained methods, of which those based on maximum-likelihood parameter estimation are best developed (Hall, 1968; Pentland, 1980; Watson & Pelli, 1983). These methods depend upon some or all of the following assumptions: (1) response probability is a monotonic function of stimulus level; (2) the responses from the subject are independent of each other and of the preceding stimuli (Bernoulli assumption); (3) the psychometric function is stationary with time (there is no change in the shape of the function and there is no change in the subject's sensitivity during the measurements); and (4) the psychometric function has a specific mathematical form (e.g., Gaussian integral, logistic, or Weibull). The methods provide rules allowing the experimenter to make four separate decisions during the course of the experiment (Taylor & Creelman, 1967): (1) when during the experiment to change the stimulus level; (2) what stimulus level to use when a change is to be made; (3) when to end the experiment; and (4) how to calculate the threshold after the experiment has ended.

The properties of these three methods are nicely reviewed by both Levitt (1971) and Taylor et al. (1983), and will be discussed only briefly here. Staircase methods require only Assumptions 1 and 2, but they are inefficient because many stimuli not at the threshold are presented. Also, there is no clear statistically determined basis for stopping or for calculating a threshold value from the results of trials (Wetherill, Chen, & Vasudeva, 1966; Wetherill & Levitt, 1965).

The PEST procedure introduced an explicit statistical rule for deciding when to change the stimulus level: The level is changed after enough trials have been given to reject the hypothesis that the current stimulus gives a performance probability equal to the probability evoked by the threshold stimulus. PEST also contains rules governing the step size of the stimulus change: The experiment stops when the stimulus step falls below a predefined size. The current stimulus is then taken as the best estimate of threshold. The PEST method requires Assumptions 1 and 2, and, weakly, 3; that is, during a run with a fixed stimulus, performance should not change. The original PEST (Taylor & Creelman, 1967) and improved versions (Findlay, 1978) are more efficient than is the adaptive staircase method because they reach the threshold value faster (Pentland, 1980); however, the PEST method is still not as efficient as possible because stimuli that are too high or too low are presented to the subject for several trials before being rejected (although for a strong defense of PEST, see Taylor et al., 1983).

In 1968, Hall introduced the idea that after each trial, one could compute which of numerous Gaussian integral functions had the maximum likelihood of being the function describing the performance up to that point. The α value of the function with the maximum likelihood of accounting for the results obtained thus far is used as the stimulus on the next trial because it is the best estimate of the threshold. These ideas were further developed by

[Figure 2 plot: x-axis, log contrast (-3.0 to 0.0); y-axis, P(correct) (0.10 to 1.10).]

quential experiments: (1) those in which the number of trials is determined by the results of the preceding trials, and (2) those in which the choice of stimulus level is determined by the results of the preceding trials. An example of a Class 1 experiment is a 2AFC detection study with a fixed stimulus contrast. After each trial, the probability of a correct response, P(correct), and the standard error (SE) of P are computed:

$$SE_P = \left[P\cdot(1-P)/N\right]^{0.5} \tag{4}$$

where N is the number of trials prior to and including the current trial (Hays, 1963). If the standard error is less than a predetermined level, the experiment is terminated. The probabilities shown in Figure 1 were collected in Class 1 experiments using a stopping standard error of 0.05 or less.
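The stopping rule of a Class 1 experiment can be sketched as follows. The function names are hypothetical, and the `min_trials` guard is an assumption added for this illustration: without it, Equation 4 gives a standard error of zero whenever the first few responses are all correct, and the run would end immediately.

```python
import math

def standard_error(p, n):
    """Equation 4: standard error of a proportion p based on n trials."""
    return math.sqrt(p * (1.0 - p) / n)

def run_until_se(responses, criterion=0.05, min_trials=10):
    """Consume True/False responses until SE <= criterion.

    min_trials is an assumed safeguard, not part of the original text.
    Returns (number of trials used, P(correct), standard error).
    """
    n_correct = 0
    n, p, se = 0, None, None
    for n, correct in enumerate(responses, start=1):
        if correct:
            n_correct += 1
        p = n_correct / n
        se = standard_error(p, n)
        if n >= min_trials and se <= criterion:
            break
    return n, p, se

# A made-up response sequence with three of every four trials correct.
n_used, p_hat, se_hat = run_until_se([True, True, True, False] * 100)
```

With P near 0.75, the criterion of 0.05 is reached after roughly 75 trials, which is in line with the average of 72 trials per point quoted for Figure 1.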

If one's goal is to find a stimulus that gives a predetermined performance level, performing a Class 1 sequential experiment is a waste of time. Many trials are run with stimuli that are either too low or too high. It is most efficient to present only the sought-after stimulus. Of course, if the value of this stimulus were known, one would not have to do the experiment in the first place.

The goal of Class 2 experiments is to change the stimulus during the course of the trials to converge on that stimulus giving the desired performance level. Three broad methods exist for achieving this goal: (1) the adaptive staircase (or up-down) method (Campbell, 1963; Cornsweet, 1962; Dixon & Mood, 1948; Levitt, 1971; Wetherill, 1963; Zwislocki, Maire, Feldman, & Rubin, 1958); (2) the PEST method (Parameter Estimation by Sequential Testing) (Findlay, 1978; Taylor & Creelman,

Figure 2. Three different functions representing performance on a two-alternative forced-choice detection task: logistic (left), Gaussian integral (middle), and Weibull (right). βs for the three functions are 4.80, 6.62, and 3.50, respectively. For the Weibull function, α represents performance of 0.82, but for the other two, α represents performance of 0.75. α is 0.05 log units lower for the logistic and Gaussian integral functions than for the Weibull function fit to the same set of data.


Pentland (1980) and then by Watson and Pelli (1983), who published a simple BASIC program for implementing their procedure (dubbed QUEST). A chi-square criterion could also be used in place of the likelihood function for the purpose of converging on the value of the threshold (Berkson, 1955), but this approach has not yet been developed.

The following example illustrates the principle of maximum-likelihood hypothesis testing. Assume that with a certain stimulus contrast, we observe the following series of correct (C) and incorrect (I) responses on 10 trials of a two-alternative forced-choice detection experiment:

CCICICCCIC

Assume further that we have several alternate hypotheses, each one predicting the probability of making a correct response. With the assumptions that each trial is independent (Assumption 2) and that performance is stationary (Assumption 3), we can calculate the probability (likelihood) that each of the hypotheses generated the observed results. If the probability of a correct response predicted by Hypothesis 1 is $P(C)_1$, and the probability of an incorrect response is $P(I)_1 = 1 - P(C)_1$, the likelihood of the above sequence of correct and incorrect trials under this hypothesis is simply the product of each of the individual probabilities:

$$\begin{aligned}
L_1 = {} & P(C)_1\cdot P(C)_1\cdot P(I)_1\cdot P(C)_1\cdot P(I)_1\cdot{}\\
& P(C)_1\cdot P(C)_1\cdot P(C)_1\cdot P(I)_1\cdot P(C)_1.
\end{aligned} \tag{5}$$

If NC is the number of correct responses and NI is the number of incorrect responses, then:

It is computationally safer, especially with many observations, to take the natural logarithm of Equation 5 and compute the log likelihood of the events:

$$\begin{aligned}
\log_e L_1 = {} & \log_e P(C)_1 + \log_e P(C)_1 + \log_e P(I)_1 + \log_e P(C)_1 + \log_e P(I)_1 + {}\\
& \log_e P(C)_1 + \log_e P(C)_1 + \log_e P(C)_1 + \log_e P(I)_1 + \log_e P(C)_1.
\end{aligned} \tag{7}$$

Or in more compact form:

$$\log_e L_1 = \left[NC\cdot\log_e P(C)_1\right] + \left[NI\cdot\log_e P(I)_1\right]. \tag{8}$$

Table 1 presents the likelihood and log likelihood of that sequence of correct and incorrect responses for each of nine different hypotheses about the value of P(C). Notice that one, marked with a dagger, has a maximum likelihood, and of the hypotheses tested, it is most likely to be correct.
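The log likelihoods in Table 1 can be reproduced with a short sketch of Equation 8. The function and variable names are illustrative choices, not part of ML-TEST.

```python
import math

def log_likelihood(responses, p_correct):
    """Equation 8: NC * ln P(C) + NI * ln P(I) for a string of 'C'/'I' responses."""
    nc = responses.count("C")
    ni = responses.count("I")
    return nc * math.log(p_correct) + ni * math.log(1.0 - p_correct)

responses = "CCICICCCIC"  # the observed sequence from the text

# Nine hypothesized values of P(C): 0.1, 0.2, ..., 0.9
hypotheses = [round(0.1 * k, 1) for k in range(1, 10)]
table = {p: log_likelihood(responses, p) for p in hypotheses}

# The hypothesis with the largest log likelihood
best = max(table, key=table.get)
```

Because the sequence contains seven correct and three incorrect responses, the maximum falls at P(C) = 0.7, the observed proportion correct.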

Equation 1, 2, or 3 may be treated as a hypothesis that accounts for a series of correct and incorrect responses in a psychophysical experiment. Given values for α, β, and γ, the equations predict the probability of a correct response, P(C), and the probability of an incorrect response, P(I) = 1 - P(C), for any possible stimulus. The


Table 2
Log Likelihoods that the Data on the Left Side of Figure 1 Were Generated by One of 12 Different Hypotheses

Weibull Function Parameters

| α | β | γ | Log Likelihood |
|---|---|---|---|
| -2.5500 | 3.5000 | 0.5000 | -548.9255 |
| -2.5000 | 3.5000 | 0.5000 | -359.5210 |
| -2.4500 | 3.5000 | 0.5000 | -235.7017 |
| -2.4000 | 3.5000 | 0.5000 | -136.3806 |
| -2.3500 | 3.5000 | 0.5000 | -88.8615 |
| -2.3000 | 3.5000 | 0.5000 | -76.7483* |
| -2.2500 | 3.5000 | 0.5000 | -86.6537 |
| -2.2000 | 3.5000 | 0.5000 | -108.6983 |
| -2.1500 | 3.5000 | 0.5000 | -135.4853 |
| -2.1000 | 3.5000 | 0.5000 | -161.9160 |
| -2.0500 | 3.5000 | 0.5000 | -185.1755 |
| -2.0000 | 3.5000 | 0.5000 | -204.1403 |

Table 1
Likelihood and Log Likelihood of the Observed Sequence of Correct (C) and Incorrect (I) Responses, CCICICCCIC, for Nine Hypothesized Probabilities P(C)

| Hypothesis P | Hypothesis Q | Likelihood | Log Likelihood |
|---|---|---|---|
| 0.1 | 0.9 | 7.29 * 10^-8 | -16.4342 |
| 0.2 | 0.8 | 6.55 * 10^-6 | -11.9355 |
| 0.3 | 0.7 | 7.50 * 10^-5 | -9.4978 |
| 0.4 | 0.6 | 3.54 * 10^-4 | -7.9465 |
| 0.5 | 0.5 | 9.77 * 10^-4 | -6.9315 |
| 0.6 | 0.4 | 1.79 * 10^-3 | -6.3247 |
| 0.7 | 0.3 | 2.22 * 10^-3 | -6.1086† |
| 0.8 | 0.2 | 1.68 * 10^-3 | -6.3903 |
| 0.9 | 0.1 | 4.78 * 10^-4 | -7.6453 |

Table 2 shows the result of the application of Equation 9 to the 28 data points shown in the left side of Figure 1. Twelve different Weibull functions, each having a fixed slope (β = 3.50) and chance level (γ = 0.5), but differing in the threshold parameter α, were tested as hypotheses. Of the α values tested, the Weibull function with α of -2.3 log contrast has the maximum likelihood of being the correct psychometric function. If we explore different values of β as well as of α, we find that a Weibull

$$\log_e L = \sum_{i=1}^{n}\left\{\left[NC_i\cdot\log_e P(C)_i\right] + \left[NI_i\cdot\log_e P(I)_i\right]\right\}. \tag{9}$$

log likelihood that specific α, β, and γ values for one of these equations describe sets of correct and incorrect responses obtained with different stimulus contrasts can be calculated by extending Equation 8 to include the probabilities predicted for several stimuli, not for only one. For the ith contrast, these functions predict a value of $P(C)_i$ and $P(I)_i$, and the predictions from each contrast are combined with the observed number correct and incorrect for that contrast:

Note: Each hypothesis is a Weibull function having the listed values of α, β, and γ. *This function has the maximum likelihood (of those tested) of being the correct hypothesis.

†This hypothesis has the maximum likelihood of being correct.

$$L_1 = P(C)_1^{NC}\cdot P(I)_1^{NI} \tag{6}$$


[Figure 3 plot: x-axis, alpha confidence interval (0.50 to 0.00); y-axis, number of trials (0 to 200); curves labeled 2AFC, 3AFC, and 4AFC.]

Figure 3. Using the ML-TEST procedure, the mean number of trials required to reach criterion confidence interval as a function of confidence interval in log contrast units using a two-alternative forced-choice (2AFC) (upper curve), 3AFC (middle curve), and 4AFC (lower curve) detection paradigm. These data were produced by a Monte Carlo simulation of an ideal observer whose detection performance is represented by Weibull functions having β of 3.50, δ of 0.01, and γ of 0.50, 0.33, and 0.25 for 2AFC, 3AFC, and 4AFC paradigms, respectively. Each point is based on 100 Monte Carlo trials. The a priori threshold value was randomly chosen from a uniform distribution two log contrast units wide, centered on the true threshold value, and the a priori log likelihood distribution was based on one simulated trial.

drops to χ²/2.0 below the maximum log likelihood value. These two stimulus values define the upper and lower confidence limits of the maximum-likelihood estimate of threshold α. The 95% confidence interval is derived in this manner when χ² is set equal to 5.0239.
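The search for the confidence limits can be sketched as follows, using the candidate α values and log likelihoods listed in Table 2. The function name is hypothetical, and falling back to the ends of the candidate grid when no candidate drops far enough is an assumption of this sketch, not something the text specifies.

```python
def confidence_interval(candidates, loglik, chi_square=5.0239):
    """Return (lower, upper) limits around the maximum-likelihood candidate.

    candidates must be sorted in ascending order. The limits are the first
    candidates below and above the maximum whose log likelihood has dropped
    by at least chi_square / 2.0.
    """
    best = max(range(len(loglik)), key=loglik.__getitem__)
    cutoff = loglik[best] - chi_square / 2.0

    lower = candidates[0]  # assumed fallback: bottom of the grid
    for i in range(best - 1, -1, -1):
        if loglik[i] <= cutoff:
            lower = candidates[i]
            break

    upper = candidates[-1]  # assumed fallback: top of the grid
    for i in range(best + 1, len(candidates)):
        if loglik[i] <= cutoff:
            upper = candidates[i]
            break
    return lower, upper

# Values copied from Table 2
alphas = [-2.55, -2.50, -2.45, -2.40, -2.35, -2.30,
          -2.25, -2.20, -2.15, -2.10, -2.05, -2.00]
logliks = [-548.9255, -359.5210, -235.7017, -136.3806, -88.8615, -76.7483,
           -86.6537, -108.6983, -135.4853, -161.9160, -185.1755, -204.1403]
lower, upper = confidence_interval(alphas, logliks)
```

On this coarse 0.05 log unit grid the limits are simply the two neighbors of the maximum; a finer grid of candidates would give a narrower and more informative interval.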

The confidence interval is used as the criterion for stopping the experiment. The experiment continues until the confidence interval reaches some predefined width. The narrower the criterion interval, the more trials are required to reach it. Figure 3 shows the number of trials as a function of the criterion confidence interval in log contrast units for 2AFC, 3AFC, and 4AFC detection paradigms. The 3AFC is more efficient than the 2AFC paradigm because chance performance probability of 0.33 can be identified with fewer trials than can that of 0.50 (see Equation 4). In the same manner, the 4AFC paradigm is more efficient than the other two because chance performance of 0.25 is easiest of the three to be identified.

Shelton, Picardi, and Green (1982) compared the three methods in actual psychophysical experiments and reported that all three gave unbiased estimates of threshold. They noted that the maximum-likelihood method did not reach its theoretical efficiency advantage over the other two because of a tendency after a few correct trials (particularly those that are correct by chance to low stimuli) to jump to the lowest stimulus level possible and to remain there for a number of subsequent trials. A similar problem, but in the opposite direction, was noted by Taylor et al. (1983): After a lapse (when the subject

function with α of -2.295 and β of 2.9449 is more likely to be the correct function, with a log likelihood of -75.7722. This curve is drawn through the data in Figure 1.
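The grid search behind Table 2 can be sketched as below. The response counts are invented for illustration and are NOT the data of Figure 1; the function names are hypothetical; and the small probability guard is an assumption added so that the logarithm of zero is never taken.

```python
import math

def weibull_p(log_x, log_alpha, beta, gamma):
    """Equation 2 with stimulus and threshold given in log10 contrast."""
    sl = 10.0 ** ((log_x - log_alpha) * beta)
    return gamma + (1.0 - gamma) * (1.0 - math.exp(-sl))

def log_likelihood(data, log_alpha, beta=3.5, gamma=0.5):
    """Equation 9: sum over contrasts of NC*ln P(C) + NI*ln P(I).

    data is a list of (log_contrast, n_correct, n_incorrect) tuples.
    """
    total = 0.0
    for log_x, nc, ni in data:
        p = weibull_p(log_x, log_alpha, beta, gamma)
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # assumed guard against log(0)
        total += nc * math.log(p) + ni * math.log(1.0 - p)
    return total

# Hypothetical counts at five contrasts, 50 trials each (not from the paper)
data = [(-2.6, 26, 24), (-2.4, 33, 17), (-2.3, 41, 9),
        (-2.2, 47, 3), (-2.0, 50, 0)]

# Twelve candidate thresholds from -2.55 to -2.00 in 0.05 log unit steps
candidates = [round(-2.55 + 0.05 * k, 2) for k in range(12)]
scores = {a: log_likelihood(data, a) for a in candidates}
best_alpha = max(scores, key=scores.get)
```

Extending the search to a second loop over β would give the two-parameter fit described above, at the cost of evaluating every (α, β) pair.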

The principles of maximum-likelihood hypothesis testing may be used to control stimulus selection and threshold estimation in psychophysical experiments. Before the experiment starts, the experimenter chooses one of the above equations to represent the psychometric function underlying the subject's performance (for vision experiments, a Weibull function works well). The values of β and γ are fixed (β of 3.5 and γ of 0.5 are appropriate for 2AFC paradigms using Weibull functions). A series of α values are selected as possible threshold values to test during the experiment. Since the values of β and γ are held constant, each α represents a complete psychometric function. In sine wave grating detection experiments, one could choose these αs to cover the range of possible contrasts in 0.02 log unit steps from a low of -3.50 log units to 0.0 log units (contrasts from 0.0003 to 1.0), giving 176 different candidates for the threshold value of α. Depending on computer power available, one could consider a wider range of candidate αs or use a smaller step size, or both. I have found that on a DEC LSI-11/23, about 176 candidates could be handled without a bothersome intertrial delay. An LSI-11/73 enabled me to use 512 different α candidates with virtually no intertrial delay.

On each trial of the experiment, a stimulus is presented, and the subject makes a response that is either correct or incorrect. Given the stimulus used on the trial, each of the candidate functions, represented by the different values of α, predicts what the P(C) or P(I) should be. A separate log likelihood is maintained for each candidate α, and, after each trial, the quantity $\log_e P(C)$ (if the subject was correct) or $\log_e[1-P(C)]$ (if the subject was incorrect) is added (see Equation 7) to the previous log likelihood for that candidate. The log likelihoods of the different candidate αs are then searched to find the one with the maximum. This value of α is an unbiased estimate of the true value of threshold α (Hays, 1963).
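The trial-by-trial bookkeeping described above might look like the following sketch. It is not the ML-TEST code: the class and method names are hypothetical, the candidate grid and the β and γ values are the ones suggested in the text, and the probability limit `delta` is an assumed safeguard against taking the logarithm of zero.

```python
import math

class CandidateTracker:
    """Running log likelihood for each candidate threshold (log10 contrast)."""

    def __init__(self, lo=-3.5, hi=0.0, step=0.02,
                 beta=3.5, gamma=0.5, delta=0.01):
        n = int(round((hi - lo) / step)) + 1
        self.candidates = [lo + i * step for i in range(n)]
        self.beta, self.gamma, self.delta = beta, gamma, delta
        self.loglik = [0.0] * n

    def p_correct(self, log_x, log_alpha):
        """Weibull prediction (Equation 2), limited to [delta, 1 - delta]."""
        sl = 10.0 ** ((log_x - log_alpha) * self.beta)
        p = self.gamma + (1.0 - self.gamma) * (1.0 - math.exp(-sl))
        return min(max(p, self.delta), 1.0 - self.delta)

    def update(self, log_x, correct):
        """Add ln P(C) or ln[1 - P(C)] to every candidate's running total."""
        for i, log_alpha in enumerate(self.candidates):
            p = self.p_correct(log_x, log_alpha)
            self.loglik[i] += math.log(p if correct else 1.0 - p)

    def estimate(self):
        """Candidate with the maximum log likelihood (first one if tied)."""
        best = max(range(len(self.loglik)), key=self.loglik.__getitem__)
        return self.candidates[best]

# Made-up responses: always correct at -2.0, chance performance at -3.0
tracker = CandidateTracker()
for k in range(20):
    tracker.update(-2.0, True)
    tracker.update(-3.0, k % 2 == 0)
estimate = tracker.estimate()
```

In an actual experiment the value returned by `estimate()` would be used to choose the next stimulus; here the stimuli are fixed only to keep the example deterministic.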

In principle, the maximum-likelihood estimation method is the most efficient method of estimating thresholds because the stimulus used on each trial is most likely to be the threshold stimulus (Pentland, 1980; Taylor et al., 1983; Watson & Pelli, 1983), and simulation studies confirm this principle (Pentland, 1980). This efficiency is achieved at the cost of making an additional assumption, number 4, that the psychometric function describing a subject's performance has a specific mathematical form, a reasonable assumption given the nature of empirical data.

The log likelihood values may also be used to estimate the confidence interval within which the true value of threshold α lies. Using the relationship between χ² and log likelihood ratio (Hays, 1963; Wilks, 1962),

$$\chi^{2} = -2.0\cdot\log_e(\text{likelihood ratio}), \tag{10}$$

one searches to find the first candidate α above and below the maximum-likelihood value of α whose log likelihood


makes an error with a stimulus that should be easily detectable), the procedure jumps to a very high stimuluslevel and moves down very slowly, particularly if the lapseoccurs early in the trials.

There are four techniques that minimize these problems and maximize the efficiency of the maximum-likelihood method. The first is to incorporate the probability of a lapse into the predicting functions. Letting δ be the lapse probability (Watson & Pelli, 1983), constrain the predicted probabilities used in the log likelihood computations so that the minimum probability possible is δ and the maximum is 1 - δ (Hall, 1981). This constraint makes the method less susceptible to lapses (Taylor et al., 1983) and avoids the problem of computing the logarithm of zero. I usually set δ to 0.01.
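A minimal sketch of this constraint follows; the function names are illustrative only.

```python
import math

def clamp_probability(p, delta=0.01):
    """Limit a predicted probability to the range [delta, 1 - delta]."""
    return min(max(p, delta), 1.0 - delta)

def trial_log_likelihood(p_correct, correct, delta=0.01):
    """Log likelihood contribution of one trial, safe against log(0)."""
    p = clamp_probability(p_correct, delta)
    return math.log(p if correct else 1.0 - p)

# A lapse: the function predicts certain success, yet the response is wrong.
# Without the constraint this would require the logarithm of zero.
lapse_penalty = trial_log_likelihood(1.0, correct=False)
```

With δ of 0.01, a single lapse costs a candidate about 4.6 log likelihood units instead of eliminating it outright.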

The second technique is to separate the estimation of the threshold α from the decision about which stimulus to present on the next trial. Watson and Pelli (1983) pointed out that there are really three sets of log likelihoods relevant to a psychophysical experiment: (1) the a priori log likelihood that each candidate α is the threshold based on previous results, wild guesses, etc.; (2) the log likelihoods based on the sequence of correct and incorrect responses during the experimental trials; and (3) the a posteriori log likelihoods, which combine the results of the trials with the a priori log likelihoods.

The a posteriori log likelihoods are computed (minus a constant) by adding the log likelihoods based on the trials alone to the a priori log likelihoods (see Watson & Pelli, 1983). The log likelihoods based on the trials alone are used to estimate the maximum-likelihood value of α and the confidence interval of that estimate; the a posteriori log likelihoods are used to select the stimulus value to use on the next trial. During the first five or so trials, the estimated value of threshold α will fluctuate widely or will drop to very low values if the subject makes no incorrect responses. The maximum-likelihood value selected from the a posteriori log likelihoods, because it is constrained by the a priori estimate of the threshold, will not fluctuate widely, but will stay in the neighborhood of the initial estimate of threshold.

I create the a priori log likelihoods for each of the candidate αs by simulating a very small number of trials with a stimulus that is my best guess about the value of threshold α, using Equation 8. In this case the number correct, NC, and the number incorrect, NI, will not necessarily be integers, as they are with real trials. A Weibull function with γ of 0.5 predicts a response probability of 0.816 with a threshold stimulus. For two simulated trials, this function predicts that there will be 1.632 correct responses and 0.3680 incorrect responses, and these values would be used in the evaluation of Equation 8 for each candidate α. This representation of the a priori log likelihoods seems to work better than a Gaussian probability distribution, which was suggested by Watson and Pelli (1983). Simulating trials creates log likelihoods that are compatible with the likelihoods computed during actual trials. Those computed from arbitrary unimodal distributions, such as Gaussian distributions, are not.
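One way to sketch this construction is shown below. The function names, the initial guess of -2.3 log contrast, and the probability limit `delta` are assumptions made for the illustration; the candidate grid and the two simulated trials follow the text.

```python
import math

def weibull_p(log_x, log_alpha, beta=3.5, gamma=0.5, delta=0.01):
    """Weibull prediction (Equation 2), limited to [delta, 1 - delta]."""
    sl = 10.0 ** ((log_x - log_alpha) * beta)
    p = gamma + (1.0 - gamma) * (1.0 - math.exp(-sl))
    return min(max(p, delta), 1.0 - delta)

def simulated_counts(n_sim=2.0, gamma=0.5):
    """Expected (non-integer) counts for n_sim trials at the threshold stimulus."""
    p_threshold = gamma + (1.0 - gamma) * (1.0 - math.exp(-1.0))
    nc = n_sim * p_threshold
    return nc, n_sim - nc

def prior_log_likelihoods(candidates, guess, n_sim=2.0, beta=3.5, gamma=0.5):
    """Equation 8 evaluated for every candidate, with the simulated stimulus
    placed at the experimenter's best guess of the threshold."""
    nc, ni = simulated_counts(n_sim, gamma)
    prior = []
    for log_alpha in candidates:
        p = weibull_p(guess, log_alpha, beta, gamma)
        prior.append(nc * math.log(p) + ni * math.log(1.0 - p))
    return prior

candidates = [round(-3.5 + 0.02 * i, 2) for i in range(176)]
prior = prior_log_likelihoods(candidates, guess=-2.3)  # guess is hypothetical
peak = candidates[prior.index(max(prior))]
nc, ni = simulated_counts()
```

The resulting distribution peaks at the guessed threshold, and the a posteriori log likelihoods are then obtained by adding these values, candidate by candidate, to the log likelihoods accumulated during the real trials.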

The third technique for minimizing problems is used in the case where one or more α values are tied for maximum likelihood. Ties will always occur during the early trials if the subject is always correct or always incorrect. When ties occur among the log likelihoods, estimate α by taking the mean of the highest and lowest of the tied values. If ties occur among the a posteriori log likelihoods, which is not likely because of the a priori log likelihood distribution, select the highest tied value to use as the stimulus on the next trial.
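Both tie rules can be sketched in a few lines. The function names and the tolerance `tol`, used to decide when two floating-point log likelihoods count as tied, are assumptions of this illustration.

```python
def tied_candidates(candidates, loglik, tol=1e-12):
    """Candidates whose log likelihood is within tol of the maximum."""
    top = max(loglik)
    return [a for a, ll in zip(candidates, loglik) if top - ll <= tol]

def estimate_alpha(candidates, loglik, tol=1e-12):
    """Threshold estimate: mean of the lowest and highest tied candidates."""
    tied = tied_candidates(candidates, loglik, tol)
    return (min(tied) + max(tied)) / 2.0

def next_stimulus(candidates, posterior, tol=1e-12):
    """Next stimulus: the highest candidate tied for the a posteriori maximum."""
    return max(tied_candidates(candidates, posterior, tol))

# Made-up example in which the three lowest candidates are tied
alphas = [-3.0, -2.5, -2.0, -1.5]
values = [-1.0, -1.0, -1.0, -2.0]
alpha_hat = estimate_alpha(alphas, values)
stimulus = next_stimulus(alphas, values)
```

When there is no tie, both functions simply return the single maximum-likelihood candidate.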

The fourth technique that maximizes efficiency is to not use the α with the highest a posteriori log likelihood as the stimulus on the next trial, but one that is slightly higher. Taylor (1971) formally developed the concept of efficiency of estimating points on a psychometric function. He defined an inverse measure of efficiency, the sweat factor, so called because it measures the amount of work (number of trials) one must do to measure a threshold. Maximum efficiency of measurement is reached when a stimulus is used that gives the lowest, or ideal, sweat factor. Except for Gaussian integrals and logistic functions with γ equal to zero, this ideal stimulus is not the value of α, but one offset higher by a small

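Table 3

| Function | β | γ | ε | P(C) | P(C)α |
|---|---|---|---|---|---|
| Weibull | 3.50 | 0.50 | 0.0765 | 0.92 | 0.816 |
| Weibull | 3.75 | 0.50 | 0.0700 | 0.92 | 0.816 |
| Weibull | 4.00 | 0.50 | 0.0652 | 0.92 | 0.816 |
| Weibull | 3.50 | 0.33 | 0.0717 | 0.89 | 0.755 |
| Weibull | 3.75 | 0.33 | 0.0658 | 0.89 | 0.755 |
| Weibull | 4.00 | 0.33 | 0.0635 | 0.89 | 0.755 |
| Weibull | 3.50 | 0.25 | 0.0685 | 0.87 | 0.724 |
| Weibull | 3.75 | 0.25 | 0.0632 | 0.87 | 0.724 |
| Weibull | 4.00 | 0.25 | 0.0604 | 0.87 | 0.724 |
| Weibull | 3.50 | 0.00 | 0.0571 | 0.79 | 0.632 |
| Weibull | 3.75 | 0.00 | 0.0543 | 0.79 | 0.632 |
| Weibull | 4.00 | 0.00 | 0.0497 | 0.79 | 0.632 |
| Logistic | 4.75 | 0.50 | 0.0443 | 0.81 | 0.750 |
| Logistic | 5.00 | 0.50 | 0.0416 | 0.81 | 0.750 |
| Logistic | 4.75 | 0.33 | 0.0335 | 0.73 | 0.667 |
| Logistic | 5.00 | 0.33 | 0.0313 | 0.73 | 0.667 |
| Logistic | 4.75 | 0.25 | 0.0289 | 0.68 | 0.625 |
| Logistic | 5.00 | 0.25 | 0.0285 | 0.68 | 0.625 |
| Gaussian integral | 6.50 | 0.50 | 0.0721 | 0.84 | 0.750 |
| Gaussian integral | 6.75 | 0.50 | 0.0661 | 0.84 | 0.750 |
| Gaussian integral | 6.50 | 0.33 | 0.0508 | 0.76 | 0.667 |
| Gaussian integral | 6.75 | 0.33 | 0.0538 | 0.76 | 0.667 |
| Gaussian integral | 6.50 | 0.25 | 0.0508 | 0.71 | 0.625 |
| Gaussian integral | 6.75 | 0.25 | 0.0416 | 0.71 | 0.625 |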

Note: The stimulus offset value, ε, in log contrast units, required to achieve the ideal sweat factor, the corresponding probability of a correct response, P(C), and the probability of a correct response at stimulus level α, P(C)α, for selected values of β and γ for the Weibull function, the logistic function, and the Gaussian integral. ε is zero for logistic and Gaussian integrals with γ of zero.


[Plots for Figures 4 and 5: x-axis, epsilon (log contrast units).]

equal to zero, because the increase in the number of trials is small relative to the decrease in α variability.

Subjects in psychophysical experiments do violate both the assumption of stationarity and of independence (Shipley, 1961; Taylor et al., 1983) by showing lapses and sequential dependencies of their responses. With the maximum-likelihood estimation procedure, these effects can be reduced by intermixing different test conditions in blocks of 20 or 30 trials. As before, the stopping criterion is the confidence interval of the estimate of threshold α, and a new block of trials for a given condition is added to the results of the previous blocks until that criterion is met.

It is problematical to apply these methods to psychophysical paradigms that do not control for observer response bias. When psychometric functions are measured in experiments in which the subject is asked to report whether or not the stimulus was seen (or heard), for example, the dependent variable is the probability of saying yes, P(yes), the hit rate. Several authors (e.g., Nachmias, 1981; Watson & Pelli, 1983) have suggested that these data may be fit with logistic or Weibull functions setting γ equal to the false alarm rate. The problem with this approach is that the false alarm rate is not constant: there is a different false alarm rate for each point on the psychometric function. Observers more or less strive to maintain an unbiased decision criterion in a detection experiment (unless the experimental conditions are arranged otherwise) and, thus, tend to maximize the percent correct (Siegert's observer, see Egan, 1975). One conse-

Figure 5. Using the ML-TEST procedure, standard deviation of the estimation of threshold α, as a function of stimulus offset, ε, in log contrast units with a two-alternative forced-choice detection paradigm. These data were produced by a Monte Carlo simulation of an ideal observer whose detection performance is represented by Weibull functions having β of 3.50, δ of 0.01, and γ of 0.50. Each point is based on 200 Monte Carlo trials with a stopping confidence interval of 0.30 log units. The a priori threshold value was randomly chosen from a uniform distribution two log contrast units wide, centered on the true threshold value, and the a priori log likelihood distribution was based on one simulated trial.


Figure 4. Using the ML-TEST procedure, the mean number of trials required to reach criterion confidence interval as a function of stimulus offset, ε, in log contrast units with a two-alternative forced-choice detection paradigm. These data were produced by a Monte Carlo simulation of an ideal observer whose detection performance is represented by Weibull functions having β of 3.50, δ of 0.01, and γ of 0.50. Each point is based on 200 Monte Carlo trials with a stopping confidence interval of 0.30 log units. The a priori threshold value was randomly chosen from a uniform distribution two log contrast units wide, centered on the true threshold value, and the a priori log likelihood distribution was based on one simulated trial.

amount, ε. Some values of ε corresponding to the ideal sweat factor, as defined by Taylor (1971), for the three types of psychometric functions are given in Table 3.

That the selection of ε affects the number of trials required to reach stopping criterion with the maximum-likelihood estimation procedure is shown in Figure 4. Each data point is the mean of 200 Monte Carlo simulations. There is a minimum at an ε of about +0.04, which is lower than the value required to achieve the ideal sweat factor, 0.0765, shown in Table 3 for a Weibull function with the same parameters. The use of the ε to achieve the ideal sweat factor also influences the variability of the estimate of threshold α. The standard deviation of the estimated threshold α from the same Monte Carlo simulation is shown in Figure 5 as a function of ε. In this figure, it is seen that an ε of zero gives minimum variability of the estimate of α. This variability sharply increases with any nonzero value of ε and is especially high at the ideal sweat factor value of ε (0.07 in this case). The use of ε

forces the subject's correct response probability to increase (0.92 with a Weibull function of γ 0.50), thus reducing the number of errors. Except for the case where γ equals zero, errors carry more information than correct responses in the maximum-likelihood estimation (see Watson & Pelli, 1983). This loss of information produced by forcing performance to a higher level increases the variability in estimating threshold α. I recommend, as a compromise, using a value of ε that is one half or less of the ideal value given in Table 3. In practice, I set ε


quence of maintaining an unbiased decision criterion for detecting stimuli of different intensities is that the false alarm rate covaries with the hit rate (Green & Swets, 1974), and the false alarm rate will be higher for weak stimuli than for strong stimuli. Using a psychometric function with γ fixed at some value is equivalent to assuming that the subject has a constant false alarm rate for all stimuli (a Neyman-Pearson observer), and there seems to be little justification for making this assumption in most psychophysical experiments.
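The dependence of the false alarm rate on stimulus strength can be illustrated with a small calculation. The sketch assumes the equal-variance Gaussian signal detection model with signal and blank trials equally likely, so that the percent-correct-maximizing criterion lies midway between the two distributions; the model choice and the function name are assumptions made for the illustration.

```python
from statistics import NormalDist

def unbiased_yes_no_rates(d_prime):
    """Hit and false-alarm rates for a criterion midway between the two means.

    Assumes unit-variance Gaussian noise and signal distributions whose
    means are separated by d_prime.
    """
    phi = NormalDist().cdf
    hit = phi(d_prime / 2.0)
    false_alarm = phi(-d_prime / 2.0)
    return hit, false_alarm

if __name__ == "__main__":
    # Weak stimuli (small d') go with high false-alarm rates, strong with low.
    for d in (0.5, 1.0, 2.0, 3.0):
        hit, fa = unbiased_yes_no_rates(d)
        print(f"d'={d:.1f}  hit={hit:.3f}  false alarm={fa:.3f}")
```

Under this model the false alarm rate falls as the stimulus gets stronger, so no single fixed value of γ can stand in for it across the whole psychometric function.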

THE ML-TEST SUBROUTINES

I have implemented the maximum-likelihood method of estimating thresholds described above in a package of 11 FORTRAN routines designed to run under RT-11 on PDP-11 series machines and under VMS on VAX computers. They should be easily adapted to other computers. The user must choose values for β, γ, δ, ε, and χ² and decide which psychometric function to use: logistic, Weibull, or Gaussian integral. The user must provide an array filled with candidate α values in logarithmic form, an array to hold the a posteriori log likelihoods for each candidate, and an array to hold the log likelihoods based on the experimental trials. These routines are:

SUBROUTINE TEST is called at the beginning of the experiment with the initial estimate of the threshold value, and it calculates the a priori log likelihoods by simulating a small (user-specified) number of trials. It is also called after each experimental trial to update both the log likelihoods and the a posteriori log likelihoods. The routine returns the index of the candidate α with the highest a posteriori log likelihood of being the threshold stimulus. This value, possibly offset by ε, can be used as the stimulus on the next trial.

SUBROUTINE TMAX searches the log likelihood array and returns the index of the candidate α with the maximum likelihood of being threshold. It also returns the upper and lower confidence limits of the estimate.

SUBROUTINE TSRT sorts and prepares the trial-by-trial data (which stimulus is used and the correctness of the response) for use by subroutine TFIT.

SUBROUTINE TFIT takes the trial-by-trial data (collapsed by subroutine TSRT) and calculates the values of α and β of the psychometric function giving the best maximum-likelihood fit to the data.

SUBROUTINE TXLL calculates the log likelihood and χ², indicating how well a particular set of α, β, γ, and δ values of a psychometric function describes a set of psychometric data. It is used by subroutine TFIT.

SUBROUTINE TFLL fills an array with candidate α values, using equal steps on a logarithmic scale. The user specifies the lowest log α value, the step size, and the number of values needed.

SUBROUTINE TSCR provides a random order for testing different conditions. The user defines the number of conditions, N, and TSCR returns an array with the integers 1 to N in random order.

FUNCTION PFUN1 evaluates the logistic function at stimulus X, given values of α, β, γ, and δ.

FUNCTION PFUN2 evaluates the Weibull function at stimulus X, given values of α, β, γ, and δ.

FUNCTION PFUN3 evaluates the Gaussian integral at stimulus X, given values of α, β, γ, and δ.

FUNCTION COMLN evaluates the natural logarithm of the number of combinations of N things taken R at a time; it is used by subroutine TXLL.
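To illustrate how a routine like COMLN can feed a log-likelihood calculation of the kind attributed to TXLL, here is a minimal sketch. It is not a translation of the FORTRAN: the binomial form of the log likelihood, the use of `math.lgamma`, and the names `comln` and `binomial_log_likelihood` are assumptions made for the example, and the χ² part of TXLL is not attempted.

```python
import math

def comln(n, r):
    """Natural log of the number of combinations of n things taken r at a time."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)

def binomial_log_likelihood(data, pfun):
    """Sum the binomial log likelihood over stimulus levels.

    data: iterable of (x, n_trials, n_correct) tuples, one per stimulus level.
    pfun: function giving the predicted probability correct at stimulus x;
          it must return values strictly between 0 and 1.
    """
    total = 0.0
    for x, n, r in data:
        p = pfun(x)
        total += comln(n, r) + r * math.log(p) + (n - r) * math.log(1.0 - p)
    return total
```

Working with the logarithm of the combinations avoids overflow when the number of trials per level is large, which is presumably why a dedicated routine exists for it.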

Two demonstration programs are also included in the package:

SIM1. This program is similar to one being used to measure contrast sensitivity functions, except that the response of the subject is simulated within the program, not collected from a real observer.

SIM2. This program measures a complete psychometric function, using a simulated observer. The best fitting logistic, Weibull, and Gaussian integral functions are found from the data.
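The general flow of a simulated session, in the spirit of the routines and demonstration programs listed above, can be sketched as follows. This is an illustration under assumptions rather than a port of ML-TEST: the Weibull form, the likelihood-ratio confidence interval (a drop of χ²/2 = 1.92 log units for 95%), the way the a priori log likelihoods are seeded (one simulated correct response half a log unit above the first guess and one simulated error half a log unit below it), and every function name are choices made for the example.

```python
import math
import random

BETA, GAMMA, DELTA = 3.5, 0.5, 0.01   # Weibull parameters quoted for the 2AFC examples

def weibull(x, alpha):
    """Assumed Weibull form on a log10 contrast axis, with lapse rate DELTA."""
    return GAMMA + (1.0 - GAMMA - DELTA) * (1.0 - math.exp(-10.0 ** (BETA * (x - alpha))))

def add_trial(loglik, candidates, x, correct):
    """Add one trial's log likelihood under every candidate threshold."""
    for i, alpha in enumerate(candidates):
        p = weibull(x, alpha)
        loglik[i] += math.log(p if correct else 1.0 - p)

def estimate(loglik, candidates, drop=1.92):
    """Maximum-likelihood candidate plus an assumed likelihood-ratio interval."""
    best = max(range(len(candidates)), key=loglik.__getitem__)
    inside = [i for i, ll in enumerate(loglik) if ll >= loglik[best] - drop]
    return candidates[best], candidates[inside[0]], candidates[inside[-1]]

def run(true_alpha, first_guess, ci_width=0.30, epsilon=0.0, max_trials=500, seed=1):
    """Simulate one threshold measurement; returns (estimate, lower, upper, trials)."""
    rng = random.Random(seed)
    candidates = [-3.0 + 0.01 * i for i in range(301)]   # equal log steps
    prior = [0.0] * len(candidates)     # a priori log likelihoods
    trials = [0.0] * len(candidates)    # log likelihoods from real trials only
    # Illustrative prior (an assumption): two simulated trials bracketing the guess.
    add_trial(prior, candidates, first_guess + 0.5, True)
    add_trial(prior, candidates, first_guess - 0.5, False)
    for n in range(1, max_trials + 1):
        posterior = [a + b for a, b in zip(prior, trials)]
        x = estimate(posterior, candidates)[0] + epsilon   # next stimulus
        correct = rng.random() < weibull(x, true_alpha)    # simulated observer
        add_trial(trials, candidates, x, correct)
        alpha_hat, lo, hi = estimate(trials, candidates)   # data only, no prior
        if hi - lo <= ci_width:                            # stopping criterion
            break
    return alpha_hat, lo, hi, n
```

Keeping `prior` and `trials` in separate arrays means the reported estimate and its interval depend only on the observed responses, while the stimulus placement still benefits from the prior.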

The ML-TEST method represented by these routines differs in four ways from the QUEST method described by Watson and Pelli (1983). First, and least important, the a posteriori log likelihoods are kept separate from the log likelihoods based on the experimental trials. One need not subtract the a priori log likelihoods from the a posteriori log likelihoods every time the estimate of threshold α and its confidence interval is to be computed.

The second difference is that the subroutine TMAX returns the maximum-likelihood value of threshold α, not α plus ε. Thus, thresholds measured using different offset values of ε are directly comparable.

The third difference is that the a priori log likelihoods are estimated using simulated trials rather than a Gaussian distribution. This representation of the a priori log likelihoods seems to allow for faster convergence to the final value of the estimated threshold α, especially when the initial estimate is too high, than does the Gaussian distribution.

The fourth and most important difference is that the number of candidate α values is not constrained by the stimuli that are actually possible to present to subjects with the equipment used in the experiment. It is desirable to consider a series of candidate α values that are spaced in equal logarithmic steps because the shape of the function is constant with logarithmic stimuli. But equal logarithmic steps in stimulus intensity may not be possible with the equipment that generates the stimuli. The Innesfree image synthesizer, for example, controls sine wave grating contrast with a 12-bit computer word, allowing 4,096 different contrast values. These contrast values are ordered in equal arithmetic, not logarithmic, steps. It is not possible to form a series of stimuli having equal logarithmic steps, and, therefore, the QUEST procedure (which requires equal logarithmic steps in order to achieve computational efficiency) could not be used with this fine piece of equipment. The ML-TEST routines do not require that stimuli actually presented to the subjects be the same as those in the array of candidate α. The probabilities predicted by


Figure 6. Contrast sensitivity, the reciprocal of threshold contrast α, as a function of sine wave grating spatial frequency in cycles per degree of visual angle. These data were collected using the ML-TEST procedure in a two-alternative forced-choice detection paradigm with a Weibull function having β of 3.50, δ of 0.01, and γ of 0.5. The 95% confidence interval used as stopping criterion was 0.20 log contrast units. This confidence interval is delimited by the two solid lines.


tertrial delay was about 3 sec. A faster computer would reduce session times accordingly.

Using a wider confidence interval as stopping criterion speeds up the measurements considerably. Figure 7 shows a second set of contrast sensitivity data measured using a stopping confidence interval of 0.35 log contrast units. Each of the 21 data points took an average of 23 trials and the session lasted 40 min. The average number of trials in the Monte Carlo simulation under similar conditions was 29 (SD = 12.6). Thus, each contrast threshold required about 2 min to be estimated with a 95% confidence interval of 0.35 log units. The rapidity with which these data were collected does not obscure the characteristic irregularity of this subject's contrast sensitivity at 6 cycles per degree.

For measuring detection sensitivity, the superiority of the forced-choice paradigm over the method of adjustment has been widely recognized since the advent of signal detection theory (Egan, 1975; Green & Swets, 1974). Thresholds measured with the method of adjustment are influenced both by the subject's sensitivity and by his/her decision criterion (response bias). Forced-choice paradigms minimize the influence of response bias on threshold measurements. Higgins, Jaffe, Coletta, Caruso, and de Monasterio (1984) have recently demonstrated that contrast sensitivity functions measured with the method of adjustment procedure have a standard error about twice that of contrast sensitivity functions measured in a 2AFC paradigm. The ML-TEST procedure described above provides a rapid and efficient way in which the 2AFC paradigm can be used to measure contrast sensitivity. A further benefit of this method is that the confidence in-

Figure 7. Contrast sensitivity, the reciprocal of threshold contrast α, as a function of sine wave grating spatial frequency in cycles per degree of visual angle. These data were collected using the ML-TEST procedure in a two-alternative forced-choice detection paradigm with a Weibull function having β of 3.50, δ of 0.01, and γ of 0.5. The 95% confidence interval used as stopping criterion was 0.35 log contrast units. This confidence interval is delimited by the two solid lines.


each candidate α's psychometric function, given the actual stimulus presented, are computed each time, using the appropriate subroutine function. This approach requires more computation, but the gained flexibility is worth it. The stimulus value returned by subroutine TEST from the a posteriori log likelihood array is the recommended value (plus any offset) to use on the next trial, if possible with the equipment being used. One should, for efficiency to be kept high, use the closest value possible.
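Choosing the closest presentable value can be sketched for a device with linearly spaced contrast codes. The mapping assumed here (codes 0 to 4,095 covering contrasts 0 to a maximum of 1.0 in equal arithmetic steps) and the function name are illustrative assumptions, not a description of any particular synthesizer's interface.

```python
import math

def nearest_presentable(log_contrast, n_levels=4096, max_contrast=1.0):
    """Return (code, log10 contrast) of the achievable level nearest in log units.

    Assumes codes 0..n_levels-1 map linearly onto contrasts 0..max_contrast.
    """
    step = max_contrast / (n_levels - 1)
    exact = 10.0 ** log_contrast / step          # requested level, in code units
    # Code 0 is zero contrast and has no logarithm, so the search starts at 1.
    lo = min(max(int(math.floor(exact)), 1), n_levels - 1)
    hi = min(lo + 1, n_levels - 1)
    code = min((lo, hi), key=lambda c: abs(math.log10(c * step) - log_contrast))
    return code, math.log10(code * step)
```

The second returned value, the log contrast actually shown, is the one that would be passed to the likelihood update, so that each candidate's predicted probability is evaluated at the stimulus the subject really saw. With linear codes the achievable steps are coarse at low contrast: a request of -3.0 log units lands on code 4, about 0.01 log unit away.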

A contrast sensitivity function at 21 spatial frequencies measured using the ML-TEST routines is presented in Figure 6. The sinusoidal gratings were generated by an Innesfree image synthesizer and displayed on a Tektronix 608 CRT monitor. The image synthesizer was controlled by a Southwest Technical Products S/09 microcomputer having a 6809 microprocessor. A two-alternative forced-choice detection paradigm was used. Each measurement was stopped when the 95% confidence interval reached 0.2 log contrast units. This confidence interval is delimited by the solid lines above and below the data points. Therefore, we can be 95% certain that the true threshold lies between these limits. Each of the 21 data points required an average of 47 trials before criterion was reached, which compares favorably with the mean of 64 trials (SD = 17) required in the Monte Carlo simulation. (The starting stimulus was chosen randomly in the simulation, whereas in actual measurements the starting stimulus is based on previous measurements and, therefore, is closer to the measured threshold.) The entire experimental session lasted about 70 min, about 3.3 min per threshold. With the computer used, the in-


terval of the thresholds is under the control of the experimenter, and each threshold can be obtained with a known degree of accuracy. Speed of measurement can be traded off for accuracy of measurement.

The ML-TEST routines have been used extensively for a variety of psychophysical measurements with both experienced and naive subjects in several laboratories in the United States and Europe. The source code for the routines and the sample programs are well commented and should allow a user who is experienced with computer-controlled experiments to implement them with no difficulty.

REFERENCES

ABRAMOWITZ, M., & STEGUN, I. A. (Eds.). (1965). Handbook of mathematical functions. New York: Dover.

BERKSON, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

BERKSON, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

BERKSON, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

CAMPBELL, R. A. (1963). Detection of a noise signal of varying duration. Journal of the Acoustical Society of America, 35, 1732-1737.

CORNSWEET, T. N. (1962). The staircase-method in psychophysics. American Journal of Psychology, 75, 485-491.

DIXON, W. J., & MOOD, A. M. (1948). A method for obtaining and analyzing sensitivity data. Journal of the American Statistical Association, 43, 109-126.

EGAN, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic Press.

FINDLAY, J. M. (1978). Estimates on probability functions: A more virulent PEST. Perception & Psychophysics, 23, 181-185.

GREEN, D. M., & LUCE, R. D. (1975). Parallel psychometric functions from a set of independent detectors. Psychological Review, 82, 483-486.

GREEN, D. M., & SWETS, J. A. (1974). Signal detection theory and psychophysics. Huntington, New York: Krieger.

HALL, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370 (A-N9).

HALL, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

HAYS, W. L. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.

HIGGINS, K. E., JAFFE, M. J., COLETTA, N. J., CARUSO, C. C., & DE MONASTERIO, F. M. (1984). Spatial contrast sensitivity: Importance of controlling the patient's visibility criterion. Archives of Ophthalmology, 102, 1035-1041.

KRANTZ, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

LEVITT, H. (1971). Transformed up-down methods in psychoacoustics. Journal of the Acoustical Society of America, 49, 467-477.

NACHMIAS, J. (1981). On the psychometric function for contrast detection. Vision Research, 21, 215-223.

PENTLAND, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

QUICK, R. F. (1974). A vector magnitude model of contrast detection. Kybernetik, 16, 65-67.

SHELTON, B. R., PICARDI, M. C., & GREEN, D. M. (1982). Comparison of three adaptive psychophysical procedures. Journal of the Acoustical Society of America, 71, 1527-1533.

SHIPLEY, E. (1961). Dependence of successive judgments in detection tasks: Correctness of the response. Journal of the Acoustical Society of America, 33, 1142-1143.

SWETS, J. A. (1986a). Form of empirical ROCs in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

SWETS, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROCs and implied models. Psychological Bulletin, 99, 100-117.

SWETS, J. A., TANNER, W. P., & BIRDSALL, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

TAYLOR, M. M. (1971). On the efficiency of psychophysical measurement. Journal of the Acoustical Society of America, 49, 505-508.

TAYLOR, M. M., & CREELMAN, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

TAYLOR, M. M., FORBES, S. M., & CREELMAN, C. D. (1983). PEST reduces bias in forced-choice psychophysics. Journal of the Acoustical Society of America, 74, 1367-1374.

TORGERSON, W. S. (1958). Theory and methods of scaling. New York: Wiley.

WATSON, A. B. (1979). Probability summation over time. Vision Research, 19, 515-522.

WATSON, A. B., & PELLI, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

WEIBULL, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18, 292-297.

WETHERILL, G. B. (1963). Sequential estimation of quantal response curves. Journal of the Royal Statistical Society, Series B, 25, 1-48.

WETHERILL, G. B., CHEN, H., & VASUDEVA, R. B. (1966). Sequential estimation of quantal response curves: A new method of estimation. Biometrika, 53, 439-454.

WETHERILL, G. B., & LEVITT, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

WILKS, S. S. (1962). Mathematical statistics. New York: Wiley.

ZWISLOCKI, J., MAIRE, F., FELDMAN, A. S., & RUBIN, H. (1958). On the effect of practice and motivation on the threshold of audibility. Journal of the Acoustical Society of America, 30, 254-262.
