Eyewitness Identification: Witness, Offender, & Law Enforcement Attributes
(2011). Eyewitness identification research: Strengths and ...
CHAPTER 13

Eyewitness Identification Research: Strengths and Weaknesses of Alternative Methods

GARY L. WELLS AND STEVEN D. PENROD
EYEWITNESS IDENTIFICATION RESEARCH has become an increasingly popular specialty among those interested in psychology and law. We have tried to write this chapter in a way that would be useful for new or "would be" eyewitness researchers, and we also think that there might be some points here that seasoned eyewitness researchers might find useful. The reader should keep in mind, however, that there can be different views about what is the best methodology for eyewitness research. We offer our best advice, but that is no guarantee that following our advice will automatically disarm critics of any researcher's study. Furthermore, researchers should always consider any particular recommendation that we make in the context of the hypotheses, purposes, and goals of their own study.
For example, one of the authors of this chapter favors a methodology in which fillers are counterbalanced in and out of the position of the innocent suspect rather than having one person always serve as the innocent replacement for the culprit (a general preference of the second author). Both authors agree, however, that if the purpose of the study is to test ideas about lineup bias emanating from poor fillers, the counterbalancing methodology would not make sense. We cannot anticipate every hypothesis that researchers would want to pursue, so researchers will always have to think through every methodological decision within the context of the questions they are trying to answer.
Readers should also note that considerations of ecological validity, although often dominant in eyewitness research, can sometimes take a back seat to theoretical considerations. Consider, for example, the removal-without-replacement procedure for lineups in which the culprit-absent lineup is created by removing the culprit and replacing him/her with no one (Wells, 1993).
237
238 CRIMINAL LAW-TRIAL ISSUES
From the perspective of ecological validity, this is a peculiar methodology without much of a real-world counterpart. On the other hand, this methodology is powerful for testing certain types of theoretical questions and is, in some ways, a "cleaner" manipulation of the culprit-present lineup versus culprit-absent lineup variable. This is because the traditional method (replace the culprit with someone else) changes two variables at once (the culprit is removed, and a new face is introduced), whereas the removal-without-replacement procedure removes only the culprit. Depending on the purposes of the study, ecological validity does not always trump other concerns. In fact, the relative paucity of theory development in the eyewitness area is something that many researchers now believe needs to be rectified, which could lead to the use of methodologies that are less ecologically valid but more revealing of psychological processes (see Applied Cognitive Psychology, 2008, Volume 22(6), devoted to such discussions).
Readers should also keep in mind that methodological soundness is only one factor that determines whether an eyewitness study is likely to survive the rigorous review process and be accepted for publication. Many manuscripts are rejected from top journals despite their methodological soundness because the study does not significantly advance our knowledge. One of the poorest reasons to conduct a study is merely that "no one has done it." All studies have to be justified at much deeper levels than that. Why is it important? How does it relate to the current literature? How does it advance our theoretical understanding? What are the applied implications? Novice eyewitness researchers would do well to examine carefully the introduction and discussion sections of well-cited eyewitness articles to develop a feel for how these articles were "packaged."
EYEWITNESS IDENTIFICATION LABORATORY EXPERIMENTAL METHODS
Designing, conducting, and analyzing an eyewitness identification experiment involves a series of decisions that can have considerable consequences for the results and their interpretations. Although we cannot exhaust all the possible decision junctures that confront eyewitness identification researchers, many of which depend on specifics of the hypothesis being tested, there are several that confront researchers in the design of almost all eyewitness identification experiments.
The basic idea of an eyewitness identification experiment is to expose participant witnesses to an event in which they view one or more actors engaged in some critical behavior. Later, the participant witnesses are given some type of eyewitness identification test, usually a photo array, in which they are asked to attempt to identify the individual(s). In most eyewitness identification experiments, the participant witnesses are also asked to indicate the confidence or certainty they have in their identification decisions. Depending on the focus of the experiment, other measures might be taken,
Eyewitness Identificntion Research 239
such as decision latency, willingness of participant witnesses to testify about their identification decision, statements about how they made their decisions, and so on. In many respects, such an experiment seems rather straightforward. But the apparent simplicity of preparing such an experiment is somewhat deceptive.
THE EVENT
One of the initial decisions to be made is what kind of event to create. There are many examples in the eyewitness identification literature of live staged events, such as thefts, in which unsuspecting people become eyewitnesses (see Wells, Rydell, & Seelau, 1993, for an example). Increasingly, however, eyewitness identification researchers have come to rely on video events that the researchers create, often depicting some type of crime. Using video events is certainly easier than using live events, because video does not require continuing reenactments with actors, there are no concerns about participant witnesses intervening to stop the crime, the event is perfectly consistent in its presentation from one participant to the next, and institutional review boards typically have fewer concerns about video events than about live events. On the other hand, it is possible to raise concerns about ecological validity with video events due to the obvious fact that eyewitnesses in real cases witness events live, in three dimensions, and so on. But, at this point, it seems that eyewitness scientists have largely accepted the video-event method, leaving little incentive for researchers to use the more difficult and costly live-event method.

Regardless of the medium chosen for the witnessed event, we caution researchers to avoid designs in which large groups are nested within conditions of the study. At the extreme, imagine that a researcher was testing the weapon-focus effect by having a classroom disruptor flash a knife in Class A and the same disruptor have no weapon when disrupting Class B. In this case, groups (Classes A and B) are nested within conditions. Any differences between the classes are therefore confounded with the manipulated variable (weapon vs. no weapon). Surprisingly enough, the problem of nesting is not solved even if there were 400 students and they had been randomly assigned to Class A or Class B. The reason that this does not solve the problem is that it takes only one small glitch (which could easily be undetected) to affect an entire class (or a large portion of the class); if something affects the class, it affects the entire condition.
For example, maybe a student was suspicious, and a whispering rumor spread quickly in only one class. Contrast this with a fully randomized design in which the event was staged 400 times, once for each witness, and each was randomly assigned to the weapon versus no weapon condition. That one suspicious student could affect only one data point, which would have no appreciable effect on the condition in which that student fell. Consider also the possibility that the actor who plays the role of the disruptor is unable
to enact the event in exactly the same way each time (which is likely). Sometimes the actor faces away from the group more than other times, or sometimes the actor moves a little faster. Any minor difference for Class A versus Class B could impact results for the entire condition (because each class is a full condition).
At the other extreme, where each enactment is done for each individual witness, these minor variations "wash out," because they will occur as often in one condition as they do in the other condition (the law of large numbers). Based on strict interpretations of a random design and the assumptions behind statistical analyses, the situation in which groups (Class A and Class B) are nested in conditions (weapon and no weapon) is the same as an experiment with a total sample size of two (one per condition), not 400 (200 per condition). These are extremes, of course, where in the nested case there is only one group per condition and in the other case there are no nestings of groups within conditions. Most reviewers and editors will permit some level of nesting within conditions. For example, groups of two or three witnesses at a time are usually not flagged as problematic nestings if the overall sample size is large enough. Still, researchers need to be prepared for the possibility that editors or reviewers will require some complicated higher-order statistical analyses that are harmful to statistical power when there are nestings. And, at the extreme where there is only one group per condition, there is no statistical solution. Such designs will be (and should be) rejected for publication in reputable journals.
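The "wash out" logic above can be illustrated with a small Monte Carlo sketch. All parameter values here (the 5% glitch rate, the 0.3 accuracy penalty, the sample sizes) are hypothetical illustrations, not figures from the chapter; the point is only that a glitch shared by a whole nested group inflates spurious between-condition differences, whereas per-witness enactments do not.

```python
import random

random.seed(1)

def run_study(nested: bool, n_per_condition: int = 200) -> float:
    """Simulate a 'glitch' (e.g., a suspicious witness or an off-script
    enactment) that lowers accuracy for everyone exposed to the same
    enactment. Returns the spurious accuracy difference between two
    conditions that, in truth, do not differ at all."""
    base_accuracy = 0.5    # true accuracy, identical in both conditions
    glitch_penalty = 0.3   # accuracy drop when an enactment goes wrong
    glitch_rate = 0.05     # chance any single enactment goes wrong

    accuracies = []
    for _ in range(2):  # two conditions (e.g., weapon vs. no weapon)
        if nested:
            # One enactment per condition: a single glitch hits everyone.
            glitched = random.random() < glitch_rate
            p = base_accuracy - (glitch_penalty if glitched else 0.0)
            hits = sum(random.random() < p for _ in range(n_per_condition))
        else:
            # One enactment per witness: a glitch affects one data point.
            hits = 0
            for _ in range(n_per_condition):
                glitched = random.random() < glitch_rate
                p = base_accuracy - (glitch_penalty if glitched else 0.0)
                hits += random.random() < p
        accuracies.append(hits / n_per_condition)
    return abs(accuracies[0] - accuracies[1])

# Average spurious between-condition difference over many replications:
nested_diff = sum(run_study(True) for _ in range(2000)) / 2000
random_diff = sum(run_study(False) for _ in range(2000)) / 2000
print(f"nested: {nested_diff:.3f}  fully randomized: {random_diff:.3f}")
```

Under these assumptions, the nested design shows a noticeably larger average spurious difference than the fully randomized design, even though neither condition truly differs.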
Once a medium has been chosen for the event, questions arise, such as what participant witnesses are told prior to the event, how long the participant witnesses will be exposed to the culprit(s), and whether to use multiple culprits reenacting the same event. Most eyewitness identification researchers use mild deception to avoid cueing the witnesses prior to the event that they will witness. For example, in a video-event study, participant witnesses might be told that the study involves impressions of people or impressions of events. In live-event experiments, participants might be seated in a waiting room believing that the study has not yet started. The rationale for not alerting participant witnesses that they will be witnesses is simply that actual witnesses are generally taken by surprise.
The issue of exposure duration is a more difficult one, because the relation between exposure duration and eyewitness identification performance is not precise. The general idea is to find an exposure duration that avoids ceiling and basement effects. An extremely long exposure might result in eyewitness identification performance being virtually perfect, which might then prevent the researcher from showing that certain other variables (e.g., a better lineup procedure) can improve eyewitness identification performance. Conversely, an exposure duration that is too brief might result in such a weak memory trace that there would be no chance to show how other variables (e.g., passage of time before the lineup) can harm eyewitness identification performance.
The appropriate exposure time can vary dramatically as a function of other aspects of the research design. Thus, for example, a study of identification accuracy by military personnel being trained in a military survival school found that under some high-stress circumstances, trainees could identify just over one in four interrogators involved in 40-minute interrogations (Morgan et al., 2004). In contrast, Memon, Hope, and Bull (2003) obtained slightly better performances from witnesses who viewed a target person in a video for just 12 seconds. Hence, pilot testing is highly recommended, which can often be done using just the control condition to make sure that there is room for performance to go up and down as a function of other manipulations. (Later, we discuss how lineup performance is assessed.)
The question of whether to use multiple culprits, each reenacting the same crime, is generally construed as an issue of "stimulus sampling." The idea of stimulus sampling is to ensure that the results are not due to the particular characteristics of the individual who was selected to serve as the culprit in the study. For example, we know that some people are more recognizable than others, in large part because some people have a more distinctive (unusual) appearance than other people. However, the need for stimulus sampling is more important for some studies than it is for others. In general, the issue of stimulus sampling is extremely important when the manipulated (or predictor) variable is an exemplar from a category.

For example, if the experiment is examining how eyewitnesses perform when the culprit is Black versus White or male versus female, stimulus sampling is essential. Clearly, one must compare many different Black culprit-actors with many different White culprit-actors or many male culprit-actors with many female culprit-actors to reach any proper conclusions. Using only one or two such actors threatens construct validity (a very serious threat) in this case, because the category variable (e.g., male versus female) is confounded with characteristics of the exemplars.

However, if the critical manipulated variable is not a characteristic of the stimuli themselves (e.g., a pre-lineup instruction), then the use of only one or two culprit-actors is a less serious threat, because the characteristics of the culprit-actor are the same in each condition. In the latter case, there can be issues of generalization (i.e., would other exemplars yield the same effects from the manipulated variables?) but not issues of construct validity. We encourage readers to examine an article by Wells and Windschitl (1999) on these points; it also includes discussions of how statistical interactions can lessen the need for stimulus sampling and how to analyze the data when stimulus sampling is used.
BUILDING THE LINEUP
Great care must be taken in making various decisions about building a lineup. The need for some decisions is obvious, such as how many people to use in the lineup, whether to present the lineup live or in photos, and so on. But
some decisions are even more complex and require considerable forethought at a level often not exercised in the eyewitness identification literature. One of the first decisions to be made is whether to use both culprit-present and culprit-absent lineups. There are circumstances in which using only present, or only absent, lineups is perfectly legitimate, depending on the nature of the question being asked. Much of the research on the post-identification feedback effect, for example, uses only culprit-absent lineups because the focus is on the development of false certainty after a mistaken identification has been made (e.g., Wells & Bradfield, 1998). Using a different rationale (a question that focused on witness performance in culprit-present arrays), Memon and Gabbert (2003) used only culprit-present lineups to examine simultaneous versus sequential lineups. Generally, however, if the hypothesis concerns eyewitness identification accuracy, it is critical to use both culprit-present and culprit-absent lineups.
The typical method for creating a culprit-absent lineup is to use the same fillers but replace the culprit with an innocent suspect. Theoretically, this simulates the situation in which police investigators have unknowingly focused the lineup on an innocent suspect. But the issue too often glossed over (or ignored altogether) by eyewitness identification researchers is how to select the person who will replace the culprit (i.e., how to select the innocent suspect). In some published experiments, researchers state that they selected a person who closely resembled the culprit. For example, the researchers might sort through a large number of photos, have them rated for similarity to the culprit, and then pick the most similar one to be the replacement for the culprit.
We believe, however, that the choice of strategy should depend on the questions being addressed by the researcher and what one assumes (if anything) about the ways in which innocent people become suspects. Most often, people do not become innocent suspects in actual cases because of their high similarity to the actual culprit. After all, the police investigators do not know who the actual culprit is (otherwise they would arrest the actual culprit instead). Instead, an innocent suspect may be someone who merely fits the general verbal description of the culprit that was given by the eyewitness. But if the innocent suspect in an experiment is selected a priori to look more like the culprit than do any of the fillers, this can serve to greatly overestimate the rate at which the innocent suspects would be identified in actual cases where the innocent person merely fits a general description.
Furthermore, selecting as the innocent replacement someone who looks a great deal like the culprit could serve to underestimate the confidence-accuracy relationship. At the extreme, replacing the culprit with a clone of the culprit would yield a result in which it is guaranteed that there can be no confidence-accuracy relation. In the general case of an experiment, we recommend a different method than one that selects a person who looks most like the culprit as the innocent replacement for the culprit. In particular, we recommend that the innocent replacement for the culprit be selected in the same way that the fillers were selected (often, because they fit the general verbal description of the culprit). Using this method, there is no reason to specify any single one of the lineup members as the innocent replacement. Instead, each of the fillers can be rotated into the replacement position in the culprit-absent lineup. This type of counterbalancing can help stabilize the data and make the results more replicable, because the results do not depend on peculiarities associated with the selection of any single member of the culprit-absent lineup.
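The rotation scheme just described can be sketched as follows. The six names and the six-person lineup size are hypothetical; the scheme simply gives each description-matched person a turn as the designated innocent suspect, with the remaining people serving as fillers.

```python
# Pool of six people, all selected because they fit the culprit's
# general verbal description (names are placeholders).
fillers = ["A", "B", "C", "D", "E", "F"]

def counterbalanced_lineups(pool):
    """Yield (innocent_suspect, lineup) pairs: each pool member takes a
    turn as the designated innocent replacement for the culprit, and the
    other five serve as the fillers of that culprit-absent lineup."""
    for i, innocent in enumerate(pool):
        others = pool[:i] + pool[i + 1:]
        yield innocent, [innocent] + others

for innocent, lineup in counterbalanced_lineups(fillers):
    print(f"innocent suspect: {innocent}  lineup: {lineup}")
```

Averaging identification rates over the six rotations is what stabilizes the culprit-absent data against the peculiarities of any single face.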
There are other circumstances in actual cases in which an innocent suspect
might closely resemble the actual culprit. Mere chance is one of these
circumstances; for example, a witness may be presented a set of photographs
selected by a police officer-using the witness's description-from a database
of photographs. By chance alone, one of the individuals in this "all-suspect"
array may bear a strong resemblance to the perpetrator. But other circumstances may apply. Suppose, for example, the culprit's image was caught on
surveillance video. As a result, the police might receive a tip that the person
on the surveillance video closely resembles Person X and they might decide to
conduct a lineup with Person X as their suspect.
Similarly, a person could become a suspect because he or she closely
resembled a drawing or "composite" of the perpetrator prepared by one or
more witnesses and published in the media. In this case, the police are
selecting their suspect based on some knowledge of what the actual culprit
looks like. This situation is what many eyewitness identification experiments
are simulating when the eyewitnesses use their knowledge of what the
confederate-culprit looks like in selecting an innocent suspect. Unless
researchers specifically intend to simulate this circumstance or something
like it, we recommend the counterbalancing method described in the preced-
ing paragraph. This method can be coupled with a reporting of the rate at
which the most-chosen innocent person was selected by witnesses. The most-
chosen rate can give some indication of the risk of misidentification associated
with the most-similar foil.
The issue of how to select fillers confronts virtually every eyewitness
identification experiment. Obviously, if the experiment itself is testing hy-
potheses about the best way to select fillers, the hypotheses themselves dictate
the selection method. But if the experiment is testing some other question
(e.g., the effects of pre-lineup instructions or the effects of exposure duration),
it is important to remember that the method of selecting fillers is still
important. In general, researchers have focused on two methods for selecting
fillers. One is to select fillers who fit the general verbal description of the culprit (as given by people who are shown the culprit and then asked to give a description from memory), which is called the match-to-description method. Another is to select fillers who show some level of similarity to the culprit (called the resemblance-to-culprit method) as rated by pilot study participants. (Note that how this plays out in any given experiment must be determined in part by how the innocent suspect is selected.)
In actual police practices there is jurisdictional variation. Some jurisdictions select their fillers based on the verbal description of the culprit that was given by the eyewitness(es). Other jurisdictions select the fillers based on their similarity to the suspect. The latter practice (similarity-to-suspect method) is actually different from the way fillers are selected in almost any experiment, because the fillers that are selected are more likely to be different if the suspect is innocent than if the suspect is guilty. In other words, the similarity-to-culprit method is not the same as the similarity-to-suspect method. Regardless of how an experimenter decides to select the fillers to be used in the study, it is essential that the method be fully described in any write-up of the study. Furthermore, researchers should usually report some measure of the final lineup product. Often, this measure is functional size (Wells, Leippe, & Ostrom, 1979) or effective size and defendant bias (Malpass, 1981), both of which involve the collection of additional data from "mock witnesses." However, it should be noted that these measures are totally insensitive to a bias that occurs when the innocent suspect is selected to resemble the culprit, whereas the fillers are selected to fit the description (Wells & Bradfield, 1999). In such a situation, the functional-size, effective-size, and defendant-bias measures will show no lineup bias, and the innocent suspect will actually stand out.
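For readers unfamiliar with the mock-witness measure mentioned above, functional size (Wells, Leippe, & Ostrom, 1979) is the total number of mock witnesses divided by the number who choose the suspect. The counts below are hypothetical examples, not data from any study.

```python
def functional_size(n_mock_witnesses: int, n_chose_suspect: int) -> float:
    """Functional size (Wells, Leippe, & Ostrom, 1979): total mock
    witnesses divided by the number who picked the suspect. In a fair
    six-person lineup, roughly one mock witness in six should pick the
    suspect, giving a functional size near 6."""
    return n_mock_witnesses / n_chose_suspect

# Hypothetical data: 60 mock witnesses are given only the verbal
# description of the culprit; 20 of them pick the suspect.
print(functional_size(60, 20))  # 3.0 -- the lineup behaves like 3 people
```

A functional size well below the nominal lineup size signals that the fillers are not plausible alternatives to the suspect.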
REPORTING AND ANALYZING THE DATA
IDENTIFICATION DATA
Clearly, there are many different approaches to reporting and analyzing data from an eyewitness identification experiment, and researchers must always consider the unique aspects of their experiments in deciding how to approach the data. Nevertheless, there are problems that reoccur frequently. The most common problem is one of underreporting the data. Too often, for example, researchers will collapse across types of accuracy (e.g., identifications of the culprit in a culprit-present lineup and correct rejections from a culprit-absent lineup) and types of inaccuracy (e.g., identifications of the innocent suspect, incorrect rejections when the lineup included the culprit, and filler identifications in both culprit-present and culprit-absent lineups).
Unfortunately, collapsing across both types of accuracy and both types of inaccuracy masks a lot of interesting and important information. We believe that any eyewitness identification experiment that includes culprit-present and culprit-absent lineups should minimally report the descriptive statistics that are characterized by Table 13.1. This kind of table is useful because it exhausts the possible responses that witnesses can make in response to each lineup type. Of course, this should be done for each condition of the experiment, and here we have displayed two conditions, X and Y. Note that reporting both the percentages and the frequencies is important, especially if the conditions do not have equal sample sizes. In some cases, it can also be
Table 13.1
Distributions of Identification Decisions as a Function of Culprit-Present and Culprit-Absent Conditions and the XY Manipulation

                                         Witness ID Decision
Lineup condition     ID of Culprit   ID of Innocent Suspect   ID of Filler   No ID
Condition X
  Culprit-Present    65% (52)        Cannot occur             15% (12)       20% (16)
  Culprit-Absent     Cannot occur    10% (8)                  50% (40)       40% (32)
Condition Y
  Culprit-Present    75% (60)        Cannot occur             5% (4)         20% (16)
  Culprit-Absent     Cannot occur    30% (24)                 20% (16)       50% (40)

Note: Numbers in parentheses are frequencies.
useful to report the rate at which each individual filler in the lineup isidentified rather than collapsing all fillers into one total.
The format of data presentation in Table 13.1 is based on a presumptionthat the design of the experiment used a single a priori innocent suspect ratherthan using the method of roiating the innocent suspect across all lineupmembers (the counterbalancing procedure discussed in the previous section).This is apparent from the fact that the innocent suspect in Condition yreceived 30% of the choices whereas the five fillers in total received only20vo (an average rate of 4vo). rf the rotation method had been used, theinnocent suspect rate of identification would be exactly the same as the fillerrate divided by the number of fillers.
The format of Table 13.1 also permits a quick assessment of whether eyewitness performance is running above levels expected by chance. One way to define chance is to compare the rate of correct rejections (no identification in culprit-absent conditions) to incorrect rejections (no identifications in culprit-present conditions). Because the frequencies are reported in Table 13.1, anyone can quickly calculate a chi-square value to determine whether the rates are significantly different. The format of Table 13.1 also permits a quick assessment of diagnosticity.
Although space does not permit us to give a full treatment of the diagnosticity statistic, the diagnosticity of a suspect identification is the ratio of hits (identifications of the culprit) to false identifications (identifications of the innocent suspect). In the case of Condition X in Table 13.1, the diagnosticity ratio is 6.5, and in the case of Condition Y, the diagnosticity ratio is 2.5. In effect, the diagnosticity ratio for identifications of the suspect can be interpreted as how much more likely an eyewitness is to identify the guilty party from a culprit-present lineup than to identify an innocent suspect from a culprit-absent lineup. A diagnosticity ratio of 1.0 indicates no diagnosticity, and higher numbers indicate increasing levels of diagnosticity.
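The diagnosticity calculation is simple enough to verify directly from the Table 13.1 frequencies (80 witnesses per lineup condition):

```python
def diagnosticity_ratio(culprit_ids: int, n_present: int,
                        innocent_ids: int, n_absent: int) -> float:
    """Hit rate (culprit IDs / culprit-present witnesses) divided by the
    false-ID rate (innocent-suspect IDs / culprit-absent witnesses)."""
    return (culprit_ids / n_present) / (innocent_ids / n_absent)

# Frequencies from Table 13.1:
ratio_x = diagnosticity_ratio(52, 80, 8, 80)   # Condition X: 65% vs. 10%
ratio_y = diagnosticity_ratio(60, 80, 24, 80)  # Condition Y: 75% vs. 30%
print(ratio_x, ratio_y)
```

These reproduce the ratios of 6.5 and 2.5 given in the text (up to floating-point rounding).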
The diagnosticity ratio can also be computed for fillers and for no-identification decisions. Diagnosticity ratios are quite important for any analyses that attempt to use Bayesian statistics to calculate probabilities of error under different possible base rates for whether the lineup includes the culprit. Detailed treatments of eyewitness identification diagnosticity and Bayesian probabilities can be found in the eyewitness identification literature (Wells & Lindsay, 1980; Wells & Olson, 2002; Wells & Turtle, 1986). The article by Wells and Olson also describes how to test whether a diagnosticity ratio is significantly different from 1.0 or whether one diagnosticity ratio is significantly greater than another diagnosticity ratio.
Commonly, researchers first want to know whether the manipulation (e.g., conditions X vs. Y in Table 13.1) had any significant effect on eyewitness identification performance. Analyses of variance are generally not appropriate for data of this form (i.e., categorical data of a dichotomous form). An omnibus (overall) test of statistical significance is readily available, however, using a hierarchical log-linear analysis, which is contained in SPSS and other statistical packages. Individual contrasts (e.g., is the 10% rate of innocent suspect identifications in Condition X statistically different from the 30% rate of innocent suspect identifications in Condition Y?) are also easily conducted using the chi-square statistic.
CoruRoErlcs Dnrn
Commonly, researchers are interested in using confidence measures to
the relation between confidence and accuracy. The point-biserial correlati
has been the most common method for doing this in the eyewitness i
cation literature, but numerous researchers have quesiioned the appropri
ness of the point-biserial correlation and have instead argued for
of calibration and resolution (e.g., Brewer, Keast, & Rishworth,2002;
& Wells,2006; Juslin, Olson, & Winman ,7996)' One problem is that cali
and resolution statistics require either extremely large sample sizes or a
number of repeated measures per pariicipant. Note, however, that cal
can be calculated properly only if the confidence scale is categorical from
to L00% (i.e., the commonly used 7-point scale cannot be used)'. Furl
cornplicating the traditional practice of reporting the point-biserial
tion between confidence and accuracy is the fact that the point
l. The fact that confidence calibration calculations require 0-100% scales rather than
common scales (e.g., 1-7) might itself be a good reason to use 0-100% confidence sc
eyewitness identification studies. In general, the authors recommend percentage
for confidence questions, because percentage scales have some degree of lay comp
hension and meaning. Hence, percentages might be more natural and readily grasped
participar-rt witnesses. Furthermore, text, figures, and tables in articles that refer to n
confidence levels. such as 3.9, require the reader to go back to the methods section
look again at n'hat kind of scale was used (5 point? 7 point? 9 point?) to get a sense
where participant-witnesses l\rere on the scale. Percentage confidence, on the other hat
does not impose this ambiguitY or extra bttrden on the reader'
1 for no-t for anybiiities ofincludes
gnosticityrtification& Turtle,,vhether aone diag-ratio.rtion (e.g.,
yewitnessappropri-form). Anhowever,and otherf innocente 307o rate
:onducted
rs to assess:orrelationss identifi-rpropriate-'measures
02; Brewercalibrations or a largecalibrational from 0%)1. Furtherial correla-int-biserial
rer than otherence scales in:entage scalesf lay compre-ly grasped byreier to meanls section andget a sense ofre other hand,
Eyewitness Identification Research 247
Table 13.2
Frequency Array for Confidence Level by Accuracy

Accuracy      0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
Accurate
Inaccurate

(Cell entries give the number of accurate and inaccurate witnesses at each confidence level.)
correlation is sensitive to base rates (McGrath & Meyer, 2006). In the current context, that means that the base rate for accuracy affects the point-biserial correlation.

Regardless of the summary statistic used to express the relation between eyewitness identification confidence and accuracy, we recommend that
eyewitness identification confidence and accuracy, we recommend that
researchers also consider reporting confidence level using accuracy/
frequency arrays. Table 13.2 gives an example of such an array. This array
is extremely useful because it permits readers to calculate the point-biserial
correlation, or d, or any other statistic. Importantly, it also permits those who
are doing meta-analyses to easily collapse across studies. And, it allows
researchers to explore the idea that extreme confidence (e.g., 90% or more
confident) might be populated almost exclusively by accurate witnesses (in this case 89% of witnesses with confidence of 90% or better are accurate) even when the overall distribution shows a modest or weak relation.
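A frequency array of this kind makes the point-biserial computation mechanical. The sketch below uses invented counts (not the actual Table 13.2 values): it expands the array into witness-level scores and computes the Pearson correlation between dichotomous accuracy (1/0) and percentage confidence.

```python
# Illustrative sketch: point-biserial confidence-accuracy correlation
# computed directly from a Table 13.2-style frequency array.
# The counts below are invented for illustration only.
import math

levels = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # confidence (%)
accurate =   [1, 2, 3, 5, 6, 8, 10, 12, 14, 17, 8]     # hypothetical counts
inaccurate = [4, 5, 5, 6, 6, 5, 4, 3, 3, 2, 1]

def point_biserial(levels, accurate, inaccurate):
    """Pearson r between accuracy (1 = accurate, 0 = inaccurate) and
    confidence, expanded from the frequency array to witness-level data."""
    xs, ys = [], []
    for conf, n_acc, n_inacc in zip(levels, accurate, inaccurate):
        xs += [conf] * (n_acc + n_inacc)
        ys += [1] * n_acc + [0] * n_inacc
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = point_biserial(levels, accurate, inaccurate)
print(round(r, 3))
```

Because the array is published in full, a meta-analyst could recompute r, a calibration curve, or any other statistic from the same counts, which is exactly the advantage claimed above.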
It is now well documented that the relation between eyewitness identification accuracy and confidence is consistently moderated by whether the
witnesses are choosers (i.e., made an identification) or non-choosers (e.g., see
Sporer, Penrod, Read, & Cutler, 1995). Hence, it is not recommended that
researchers collapse over both types of accuracy (accurate identifications and
correct rejections) and both types of inaccuracy (mistaken identifications and
incorrect rejections) when reporting confidence-accuracy relations, because
this can hide the fact that the relation is stronger among choosers (accurate vs.
inaccurate identifiers) than it is among non-choosers (correct rejecters vs.
incorrect rejecters).

Finally, we note that researchers should consider the fact that the confidence-accuracy relation in eyewitness identification might also vary as a
function of whether identifications of fillers in the culprit-present lineup are included among the choosers in calculating this relation. From the perspective of the ecology of the forensic situation, in actual cases we would not normally be interested in the question of whether eyewitness identification
confidence helps to discriminate between identifications of a guilty suspect
and identifications of fillers. Using the assumption of the single-suspect
model for lineups (one person is the suspect and the remainder are known
to be innocent fillers), it is already known in an actual case whether the
identification of a filler was mistaken; clearly it was. Instead, eyewitness
identification confidence is useful to the extent that it helps discriminate
between accurate identifications (i.e., when the culprit is present) and mis-
taken identifications of an innocent suspect (i.e., when the culprit is absent).
In some cases, it might be informative to report the average confidence with which every lineup member was chosen: the suspect in the target-present lineup, each individual filler in the target-present lineup, and each individual member of the target-absent lineup. If standard deviations are also reported with each of these 12 means, it would be easy for the interested reader to calculate the confidence-accuracy correlation under any number of assumptions (e.g., if the most-selected filler was the innocent suspect versus the average filler).

Readers should note that many of the points we have made here about
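A minimal sketch of this kind of per-member reporting, using invented confidence ratings and assuming six-person lineups (so that the two lineups yield the 12 means mentioned above); the member labels and data are hypothetical:

```python
# Illustrative sketch (invented data): mean and standard deviation of
# witness confidence for each lineup member, as suggested in the text.
import statistics

# Hypothetical confidence ratings (%) from witnesses choosing each member.
# A full report would include one entry per member of both lineups.
choices = {
    "TP-suspect":  [90, 80, 85, 95, 70],
    "TP-filler-1": [40, 55],
    "TA-filler-1": [60, 45, 50],
}

for member, confs in choices.items():
    mean = statistics.mean(confs)
    sd = statistics.stdev(confs) if len(confs) > 1 else float("nan")
    print(f"{member}: n={len(confs)}, M={mean:.1f}, SD={sd:.1f}")
```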
confidence apply as well to other continuous-variable "postdictors" of eye-
witness identification accuracy, such as identification decision latencies.
Again, however, the approach that a researcher takes can vary depending
on the nature of the questions being asked. In the case of decision latencies, for
example, Dunning and Perretta (2002) were interested in finding the number
of seconds that represented the optimum separation of accurate and inaccurate witnesses. Hence, for their purposes it was useful to find the point
along the decision latency time line that yielded the maximum chi-square
value separating the accurate from the inaccurate witnesses.
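Dunning and Perretta's cutoff search can be sketched as a simple scan over candidate latency cutoffs, keeping the cutoff that maximizes the chi-square separating accurate from inaccurate witnesses. The latency data below are simulated for illustration, not theirs:

```python
# Illustrative sketch (simulated data): find the latency cutoff that best
# separates accurate from inaccurate witnesses by maximizing chi-square.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]]; 0.0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# (latency in seconds, accurate?) for hypothetical witnesses
data = [(4, True), (6, True), (7, True), (9, True), (11, True), (12, False),
        (14, False), (16, True), (18, False), (22, False), (25, False), (30, False)]

best_cut, best_chi2 = None, -1.0
for cut in range(2, 31):  # candidate cutoffs in seconds
    fast_acc = sum(1 for t, ok in data if t <= cut and ok)
    fast_err = sum(1 for t, ok in data if t <= cut and not ok)
    slow_acc = sum(1 for t, ok in data if t > cut and ok)
    slow_err = sum(1 for t, ok in data if t > cut and not ok)
    chi2 = chi_square_2x2(fast_acc, fast_err, slow_acc, slow_err)
    if chi2 > best_chi2:
        best_cut, best_chi2 = cut, chi2

print(best_cut, round(best_chi2, 2))  # 11 8.57 for these simulated data
```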
ARCHIVAL RESEARCH AND FIELD EXPERIMENTS
Although our focus in this chapter has been on the design of experimental studies, it is important to note that other methods have been used to examine eyewitness performance. Here we consider two forms of research involving actual cases: (1) archival studies, which use police or prosecutor files to examine past cases or collect new data in police settings but do not involve any experimental methods, and (2) field experiments, which use experimental methods in connection with actual cases. A handful of studies of eyewitness identification performance in actual cases have made use of archival records. The largest and best of these studies have been conducted in the United Kingdom (e.g., Pike, Brace, & Kynan, 2002; Slater, 1994; Valentine, Pickering, & Darling, 2003; Wright & McDaid, 1996) and have involved nearly 15,000 eyewitnesses. Archival studies in North America have been much smaller in scale (e.g., Behrman & Davey, 2001; Behrman & Richards, 2005; Tollestrup, Turtle, & Yuille, 1994).
By field experiments, we mean experiments using eyewitnesses to actual
crimes, usually conducted in conjunction with a police department. We are
not using the term field experiment to refer to an experiment involving a
simulated crime that happens to be conducted in the "field" (e.g., on the street
or in a convenience store). (In the latter case, we consider these to be lab
experiments that happen to step outside of the traditional lab and set up a lab
in some other setting.) A superb example of a field experiment is one
conducted by Wright and Skagerberg (2007), in which actual eyewitnesses
to serious crimes were asked critical questions about their lineup identifica-
tion either before or after being told whether the person they identified was
the suspect in the case or merely a filler. In the case of this field experiment, it did not matter whether the suspect was the actual perpetrator or an innocent suspect, because the question concerned the effects of receiving feedback.

If the issue is one of accuracy of identification decisions, archival studies and field experiments present special challenges. This is readily apparent in the highly controversial field experiment conducted by an employee of the Chicago Police Department (Mecklenburg, 2006). The methodological problems with this study are well documented (e.g., see Schacter et al., 2007; Steblay, in press). For current purposes, however, we are more concerned with the factors that lead to ambiguity regarding whether an identification was or was not accurate. In a lab experiment on eyewitness identification, it is definitively known whether a suspect in a lineup is in fact the culprit. Hence, there is no ambiguity in a lab experiment as to whether a given identification was accurate or inaccurate. In an archival study or field experiment, however, the identification of a suspect might or might not be an accurate identification. In all of the DNA exoneration cases involving mistaken identification, for example, the identification that was made was the identification of a suspect, albeit an innocent one.

One way to partially get around this problem in archival studies and field experiments is to note that the identification of a filler is always an error, whereas the identification of a suspect would sometimes be an error and sometimes not. In the UK studies, approximately one in three positive identifications made by witnesses was a foil identification. However, even this strategy for determining errors can be problematic in archival studies and field experiments due to poor record keeping. For example, Tollestrup et al. (1994) observed that the Royal Canadian Mounted Police did not distinguish between misidentifications of foils and rejections of lineups; both were recorded as a failure to identify anyone. Behrman and Davey (2001) reported the same problem in their study, and it is clear from footnote 3 in an appendix to the Chicago experiment that foil identifications were not recorded because they were considered "tentative," at least in conditions where the administrator knew the identity of the suspect.

Of course, even without knowing whether the identification of the suspect is accurate, one can assume that such an identification is more likely to be accurate than is the identification of a filler. In this sense, it could be argued that variables that increase identifications of suspects or decrease identifications of fillers are promoting accuracy, whereas variables that decrease identifications of suspects or increase identifications of fillers are promoting inaccuracy. Still, great caution must be exercised in using this logic. Consider, for instance, the myriad ways that one might increase identifications of suspects and decrease identifications of fillers in a field experiment. For example, consider comparing a condition in which witnesses are kept blind as to which person is the suspect with a condition in which witnesses are told prior to their identification decision which person is the suspect and which ones are merely fillers. The latter would serve to depress the identification of fillers and
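The known-error logic above lends itself to a simple calculation: foil identifications put a floor under the error rate among positive identifications, because every foil pick is an error while suspect picks are an unknown mix. The counts below are invented; only the roughly one-in-three proportion echoes the UK pattern described above.

```python
# Illustrative sketch (invented counts): foil identifications as a known
# floor on the error rate among positive identifications.

def known_error_floor(suspect_ids, foil_ids):
    """Fraction of positive identifications known to be errors.
    Foil picks are errors by design; suspect picks are an unknown mix,
    so the true error rate can only be higher than this floor."""
    positive = suspect_ids + foil_ids
    return foil_ids / positive

# Roughly the one-in-three foil-identification proportion described for
# the UK archival studies (the counts themselves are made up).
print(known_error_floor(suspect_ids=200, foil_ids=100))  # 100/300, about 0.33
```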
increase identifications of the suspect. Surely we would not accept this as evidence that witnesses should be told which person is the suspect and which are fillers.

Alternatively, consider what happens if the fillers in the lineup are selected specifically to be implausible lineup members (e.g., do not fit the witness's description of the culprit). That would also serve to depress filler identifications and increase identifications of the suspect. But does that promote accuracy? As a final example, if the depression of filler identifications is a good outcome, then why use fillers at all? Why not make every lineup a showup instead? The point here is that it is not automatically true that variables that decrease filler identifications or increase suspect identifications in field experiments necessarily promote accuracy (Wells, 2008).

Similar problems can emerge when examining the confidence-accuracy relation in archival and field experiments. Again, the idea is that filler identifications are definitely inaccurate, whereas identifications of suspects have at least some (perhaps most) accurate identifications in the mix. Hence, the relation between eyewitness identification confidence and filler versus suspect identification decisions can be a proxy for the confidence-accuracy relation in field experiments. Caution must be exercised here as well, because there is compelling lab-based evidence that non-blind lineup administrators (who are the ones soliciting the confidence statement) influence the confidence of the witness based on the lineup administrators' knowledge of whether the witness identified a filler or the suspect (see Garrioch & Brimacombe, 2001). Hence, in any jurisdiction in which the lineup administrator is the case detective or someone who knows which person is the suspect and which ones are fillers (which is the case in most jurisdictions in the United States), the relation between witness confidence and whether the witness picked a filler instead of the suspect is likely to be artificially inflated by the lineup administrators' influence on the witness. In general, field experiments that are designed to examine the accuracy or confidence of witnesses should routinely use double-blind procedures (Wells, 2008). This means that neither the case detective nor any other individual who knows which members are fillers and which is the suspect should be the one who administers the lineup.

Finally, we note that field experiments also differ from lab experiments in numerous ways that increase the need for quite large sample sizes. In lab experiments, for example, it is possible to give all witnesses the same type and quality of view, the same duration of exposure, the same delay between witnessing and testing, the same fillers, and approximately the same levels of anxiety, fear, arousal, and so on. In a field experiment, in contrast, these and numerous other variables float freely and vary widely from one witness to the next. Careful measures of these free-floating variables can help at the level of data analysis if they are statistically controlled, but in general, they contribute noise to the data that results in the need for larger sample sizes than are needed in the typical lab experiment.
GENERALIZING RESEARCH FINDINGS
Our emphasis in this chapter has been on experimental methods. We have
noted a number of the strengths of experimental methods and some of the
advantages of laboratory and staged-event experiments over archival studies
and field experiments involving actual cases. Our emphasis naturally raises
the question of whether the results from laboratory and staged-event experi-
ments generalize to real-world witnesses and what evidence we might use to
address that question. Although space does not permit us to undertake a full
review of relevant research findings, we do wish to highlight some of the
ways in which the question can be addressed and note a few example findings.
Do Experimental Studies and Studies with Actual Witnesses Yield Convergent Findings?
A number of archival researchers have endeavored to test whether findings
revealed in experimental studies are replicated with actual witnesses. For
example, laboratory studies indicate that the probability of identifying a
perpetrator is reduced by the presence of a weapon (e.g., E. F. Loftus, Loftus, & Messo, 1987; Maass & Köhnken, 1989), and Steblay (1992) reported a reliable
impairment from the presence of a weapon in a meta-analysis of 19 studies.
In their archival study, Tollestrup et al. (1994) found that the presence of a
weapon reduces the likelihood of the suspect's being identified: 47% of robbery suspects in cases involving a weapon were identified, as compared with 71% of suspects in robberies without a weapon. In contrast, Behrman
and Davey (2001) found similar rates of identification of police suspects when they compared 240 witnesses to crimes in which a weapon was used with 49
witnesses in crimes without weapons. In their study of 640 witnesses,
Valentine, Pickering, and Darling (2003) found that the presence of a weapon
had no effect on the likelihood of identifying a suspect, but witnesses were
more likely to identify a foil if a weapon was not present. Similar comparisons
of experimental and archival findings involving other variables have been
made by Valentine et al. along with others, and it is not uncommon to see
that the findings do not converge. Does this mean the experimental findings are wrong?

We believe it is entirely too early to reach any conclusions from comparisons of this kind. Certainly there is little doubt that a great many variables
have been shown to reliably affect witness performance using laboratory
and staged-event experimental methods (see, e.g., the review by Wells, Memon, & Penrod, 2000). However, archival and field-experiment data suffer
from a number of deficiencies. As we have noted previously, archival studies
and field experiments generally cannot authoritatively establish whether
witnesses have made correct identifications. They rely on suspect choices
as a primary variable, and because suspect choices can involve a mixture of
correct and incorrect identifications, researchers should not be too surprised
to observe a different pattern of results when they compare such studies with results from experiments in which accuracy is known.
Another important difference between experimental studies and field or archival studies is that field and archival studies cannot easily isolate variables, because those variables are correlated with other variables, many of which might not have even been measured. Consider, for example, a field study that involved numerous witnesses to a shooting in which it was found that those who were most stressed by the shooting were the most accurate in reporting details of the event (Tollestrup et al., 1994). This finding seemingly goes against the general consensus of the experimental literature on stress and accuracy. However, a closer examination of the data also shows that witnesses who were physically closest to the shooting were the most stressed (which makes sense), and those closest to the event had the better view and were more accurate. These are called multi-collinearity problems, and they are rampant in field and archival studies, which makes it difficult to reach conclusions about the role of individual variables.
There are certainly other features of field data that can muddy the results from such studies. Consider, for example, that in field studies it may be difficult to determine whether a weapon was actually visible to a witness or for how long. In addition, not all witnesses are necessarily tested when the police have a suspect. Witnesses who report a poor memory (those who saw a weapon?) may indicate that they do not really remember what the perpetrator looked like and might, therefore, not be shown a suspect. Furthermore, one might ask about case "selection" effects. Are the police differentially selecting weapon versus non-weapon cases for more intensive investigation (we hope so)? Does more intensive investigation of weapons cases yield a higher proportion of guilty suspects (and a higher proportion of suspect choices) as compared with non-weapons cases? We do not know the answer to those questions, though they are subject to research.
Do Different Research Methods Produce Different Results?
As we highlighted earlier in this chapter, researchers are confronted with a variety of methodological choices when they construct their research studies, giving rise to the following questions:

- Should a long or short video be used rather than a staged event containing one or several culprits?
- Should the culprit be replaced with a look-alike when making culprit-absent arrays?
- How many fillers should be selected and in what manner?
- Do these choices yield differences in results that might prompt us to be concerned about the generalizability of our findings?

We would point to two types of analysis that address the question.
Shapiro and Penrod (1986) published a meta-analysis of factors influencing
facial identification. They drew on results from over 190 individual studies
reported in 128 articles on face recognition and eyewitness memory. Overall,
the studies contained 960 experimental conditions under which identifica-
tions were made and included more than 16,950 participants. Most (80%) of
the studies analyzed by Shapiro and Penrod (1986) were laboratory-based
studies of facial recognition rather than of more realistic eyewitness simula-
tions, such as videotaped or staged events. They noted that type of study (face
recognition vs. eyewitness) accounted for 35% of the variance in performance
across experimental conditions. Performance levels were higher in the labo-
ratory studies, which raises the question of whether the results are general-
izable across research methods.
Similarly, Lindsay and Harvie (1988) conducted a meta-analysis that
compared a sample of face recognition and staged-crime studies. They found
that the overall correct identification rate (.64) in 113 face-recognition studies was significantly higher than the rate (.58) in 47 staged-crime studies and that the false alarm rate (.18) in 64 face-recognition studies was significantly lower than two types of mistaken identifications in staged-crime studies: those made from 47 target-present arrays (.29) and 12 target-absent arrays (.41).
Although these comparisons of laboratory-based face recognition and
more realistic eyewitness studies initially may seem to raise concerns about
external validity, the difference in performance between the two types of
studies would not raise such concerns if explained by variables (e.g., retention
interval) that generally differ in face-recognition versus eyewitness studies.
Shapiro and Penrod (1986) tested whether "study characteristics" might explain the differences in identification accuracy rates. They found that variables associated with attention (including knowledge about an upcoming test and mode of presentation) made significant individual contributions. The duration of exposure per face, pose, memory load at study (number of faces studied), and retention interval accounted for differences in performance. Once all those variables were accounted for, the type of study (facial recognition vs. eyewitness) accounted for just 3% of performance variance (rather than 35%).

A second way to address the generalizability issue was taken by Penrod and Bornstein (2007), who examined meta-analyses that have appeared since Shapiro and Penrod's (1986) study. These later meta-analyses permit comparisons of performance across research methodologies that vary over studies; these meta-analyses often ask whether methodological variables interact with substantive variables. Penrod and Bornstein show that these meta-analyses consistently demonstrate that whenever the effects of substantive variables vary across research methodologies, they are larger for more-
realistic procedures. Factors such as lineup presentation (simultaneous vs.
sequential), weapon focus, stress, live versus video testing, instructions to
witnesses, cross-race identifications, and the like exert larger effects on
eyewitness performance when testing conditions more closely match real
witnessing conditions (e.g., staged crime or other eyewitness event and live or video stimuli) than when the situation is relatively contrived and controlled. This pattern of results suggests that if there is a generalizability issue, it is that much eyewitness research underestimates the magnitude of effects being studied. Based on research to date, researchers and practitioners can be reasonably confident that experimental findings generalize to real eyewitness situations.
CONCLUDING REMARKS
In this chapter, we have described a number of methodological issues that confront every researcher who designs and conducts an eyewitness identification study. In many cases, researchers fail to think these issues through fully and end up learning the hard way by having their manuscripts rejected on methodological grounds or discovering the problem later and having to redo the study.

We think this chapter might be especially useful for novice eyewitness researchers. Sometimes we are surprised, however, to find even experienced eyewitness researchers failing to consider some of these issues, and neither of the authors of this chapter can claim to be fully exempt from having made a bad call on one of these methodological issues. No study is perfect, and research involves a constant process of improvement.

We also want to emphasize that with only a few exceptions, our recommendations in this chapter are not universal rules applicable to all studies. Researchers need to consider the issues we have raised in the context of their hypotheses and goals. We gave examples of some instances of this sort, such as when it is acceptable to use only target-present or only target-absent lineups. However, when the question being tested is purely theoretical, some of the ecological validity considerations can become moot.

Finally, we encourage would-be eyewitness identification researchers to read the eyewitness identification literature carefully to gain a sense of other methodological issues that are often unique to a given study. But we caution researchers not to assume that a study is devoid of all possible methodological problems merely because it was published. The standards for methodological soundness in eyewitness identification research and completeness of reporting are higher today than they were only a few years ago, and this trend will undoubtedly continue.
REFERENCES
Behrman, B. W., & Davey, S. L. (2001). Eye-witness identification in actuai criminalcases: An archival analysis. Lazu nnd Htt-nmn Behauior. 25, 475497.
Be'hrman, B. W., & Richards, R. E. (2005).Suspect/foil identification iu actual crimes
and in the laboratory: A realityanalysis. Law and Human Behaoior,
Brewer, N., Keast, A., & Rishworth, A
reflectiontion and <Psycholog.
, N. ,dence-ac<identifications, fobase rateogy: Appl
ning, Dticity andsecond tfrom in;
lournal o,h , L
Lineup aI impact <
Human IP. , (
Calibratidence irments o:from a
Leaning1316.rdsay, Rfalse al;ficationrcollecti<berg, P.aspectsissues (
Wiley.E.
Some f;
HutnattA.
identififect." L
h,
effect tPsychot
of the ston seprocedtfrom h20Piloi
fendarlineup?og
279-3Q1.
The confidence-accuracy relationshipeyewitness identification: The effects
the elculpriPsycht
and live or
controlled.ue, it is that
fects being
rers can be
eyewitness
, issues that.ess identifi-rrough fullY, rejected onving to redo
r eyewitnessexperiencednd neither ofving made aperfect, and
/ our recom-o all studies.ntext of theirhis sort, suchtarget-absentrretical, some
esearchers to;ense of otherlt, we cautionble methodo-rds for meth-completenessago, and this
eality monitoringwr Behaaior, 29,
rworth, A. Q002).7 relationshiP inr: The effects ol
reflection and disconfirmation on correla-tion and calibration. lournal oi ExperimentnlPsychology : Applied, 8, 44-56.
Brewer. N., & Wells, G. L. (2006). The confi-dence-accuracy relationship in eyewitnessidentification: Effects of lineup instruc-tions, foil similarity and target-absentbase rates. lounnl of Experimental Psychol-ogy: Aptplied, 12, 11-30
Dunning, D., & Perretta, S. (2002)' Automa-ticity and eyewitness accuracy: A 10- to 12-
second rule for distingriishing accuratefrom inaccurate positive identifications'
lottrnal ot' Applied Psyclnlogy, 87,951-962Garrioch, L., & Brimacombe, C. A' E. (2001).
Lineup administrators' exPectations: Their
impact on eyewitness confidence. Lau I
Htnnan Behaaior, 25, 299-374.juslin, P., Olson, N., & Winman, A. (1'996)'
Calibration and diagnosticity of confi-
dence in eyewitness identification: Com-ments on what can and cannot be inferredfrom a iow confidence-accuracy correla-tion. lottrnal ot' Experimental Psychology:Lenrning, Menrory, nnd Cognitiott, 5,1304-1316.
Lindsay, R. C. L., & Harvie, V. L' (1988). Hits,
false alarms, correct and mistaken identi-fications: The effect of method of data
collection on facial memory. In M Grune-berg, P. Morris, & R. Sykes (Eds.), Prncticalaspects of memory: Current resenrch nnd
issrres (pp. 47-52). Chichester, Enp;land:Wiley.
Loftus, E. F., Loftus, G. R., & Messo, J. (1987). Some facts about "weapon focus." Law and Human Behavior, 11, 55-62.
Maass, A., & Kohnken, G. (1989). Eyewitness identification: Simulating the "weapon effect." Law and Human Behavior, 13, 397-408.
Malpass, R. S. (1981). Effective size and defendant bias in eyewitness identification lineups. Law and Human Behavior, 5, 299-309.
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11, 386-401.
Mecklenburg, S. (2006). Report to the legislature of the state of Illinois: The Illinois Pilot Program on sequential double-blind identification procedures. Retrieved November 15, 2010, from http://www.chicagopolice.org/IL%20Pilot%20on%20Eyewitness%20ID.pdf
Memon, A., & Gabbert, F. (2003). Unraveling the effects of sequential presentation in culprit-present lineups. Applied Cognitive Psychology, 17, 703-714.
Memon, A., Hope, L., & Bull, R. H. C. (2003). Exposure duration: Effects on eyewitness accuracy and confidence. British Journal of Psychology, 94, 339-354.
Morgan, C. A., Hazlett, G., Doran, A., Garrett, S., Hoyt, G., Thomas, P., & Southwick, S. M. (2004). Accuracy of eyewitness memory for persons encountered during exposure to highly intense stress. International Journal of Law and Psychiatry, 27, 265-279.
Penrod, S. D., & Bornstein, B. H. (2007). Generalizing eyewitness reliability research. In R. C. L. Lindsay, D. Ross, D. Read, & M. Toglia (Eds.), Handbook of eyewitness psychology (Vol. II): Memory for people. Mahwah, NJ: Lawrence Erlbaum.
Pike, G., Brace, N., & Kynan, S. (2002, March). The visual identification of suspects: Procedures and practice (Briefing Note No. 2/02). London, England: Home Office Research, Development and Statistics Directorate.
Schacter, D., Dawes, R., Jacoby, L. L., Kahneman, D., Lempert, R., Roediger, H. L., & Rosenthal, R. (2007). Studying eyewitness investigations in the field. Law and Human Behavior, 31, 3-5.
Shapiro, P., & Penrod, S. (1986). A meta-analysis of facial identification studies. Psychological Bulletin, 100, 139-156.
Slater, A. (1994). Identification parades: A scientific evaluation (Police Research Award Scheme). London, England: Police Research Group, Home Office.
Sporer, S., Penrod, S., Read, D., & Cutler, B. L. (1995). Choosing, confidence, and accuracy: A meta-analysis of the confidence-accuracy relation in eyewitness identification studies. Psychological Bulletin, 118, 315-327.
Steblay, N. M. (1992). A meta-analytic review of the weapon focus effect. Law and Human Behavior, 16, 413-424.
Steblay, N. M. (in press). What we know now: The Evanston Illinois field lineups. Law and Human Behavior. doi: 10.1007/s10979-009-9207-7
Tollestrup, P., Turtle, J., & Yuille, J. (1994). Actual victims and witnesses to robbery and fraud: An archival analysis. In D. F. Ross, J. D. Read, & M. P. Toglia (Eds.), Adult eyewitness testimony: Current trends and developments (pp. 144-160). New York, NY: Cambridge University Press.
Valentine, T., Pickering, A., & Darling, S. (2003). Characteristics of eyewitness identification that predict the outcome of real
lineups. Applied Cognitive Psychology, 17, 969-993.
Wells, G. L. (1993). What do we know about eyewitness identification? American Psychologist, 48, 553-571.
Wells, G. L. (2008). Field experiments on eyewitness identification: Towards a better understanding of pitfalls and prospects. Law and Human Behavior, 32, 6-10.
Wells, G. L., & Bradfield, A. L. (1998). "Good, you identified the suspect": Feedback to eyewitnesses distorts their reports of the witnessing experience. Journal of Applied Psychology, 83, 360-376.
Wells, G. L., & Bradfield, A. L. (1999). Measuring the goodness of lineups: Parameter estimation, question effects, and limits to the mock witness paradigm. Applied Cognitive Psychology, 13, S27-S39.
Wells, G. L., Leippe, M. R., & Ostrom, T. M. (1979). Guidelines for empirically assessing the fairness of a lineup. Law and Human Behavior, 3, 285-293.
Wells, G. L., & Lindsay, R. C. L. (1980). On estimating the diagnosticity of eyewitness nonidentifications. Psychological Bulletin, 88, 776-784.
Wells, G. L., Memon, A., & Penrod, S. (2006). Eyewitness evidence: Improving its probative value. Psychological Science in the Public Interest, 7, 45-75.
Wells, G. L., & Olson, E. (2002). Eyewitness identification: Information gain from incriminating and exonerating behaviors. Journal of Experimental Psychology: Applied, 8, 155-167.
Wells, G. L., Rydell, S. M., & Seelau, E. P. (1993). On the selection of distractors for eyewitness lineups. Journal of Applied Psychology, 78, 835-844.
Wells, G. L., & Turtle, J. W. (1986). Eyewitness identification: The importance of lineup models. Psychological Bulletin, 99, 320-329.
Wells, G. L., & Windschitl, P. D. (1999). Stimulus sampling in social psychological experimentation. Personality and Social Psychology Bulletin, 25, 1115-1125.
Wright, D. B., & McDaid, A. T. (1996). Comparing system and estimator variables using data from real lineups. Applied Cognitive Psychology, 10, 75-84.
Wright, D. B., & Skagerberg, E. M. (2007). Postidentification feedback affects real eyewitnesses. Psychological Science, 18, 172-178.