Kahneman Et Al. - 1982 - Judgment Under Uncertainty Heuristics and Biases

Judgment under uncertainty:
Heuristics and biases
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom
CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa
http://www.cambridge.org
© Cambridge University Press 1982
This book is in copyright. Subject to statutory exception and
to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.
First published 1982
Reprinted 1982, 1983 (twice), 1984, 1985 (twice), 1986, 1987, 1988,
1990, 1991, 1993, 1994, 1998, 1999, 2001
Printed in the United States of America
Typeset in Times
A catalog record for this book is available from the British Library
Library of Congress Cataloging in Publication Data is available
ISBN 0 521 28414 7 paperback
 
1. Judgment under uncertainty:
Heuristics and biases
Amos Tversky and Daniel Kahneman
Many decisions are based on beliefs concerning the likelihood of uncertain events such as the outcome of an election, the guilt of a defendant, or the future value of the dollar. These beliefs are usually expressed in statements such as “I think that . . . ,” “chances are . . . ,” “it is unlikely that . . . ,” and so forth. Occasionally, beliefs concerning uncertain events are expressed in numerical form as odds or subjective probabilities. What determines such beliefs? How do people assess the probability of an uncertain event or the value of an uncertain quantity? This article shows that people rely on a limited number of heuristic principles which reduce the complex tasks of assessing probabilities and predicting values to simpler judgmental operations. In general, these heuristics are quite useful, but sometimes they lead to severe and systematic errors.
The subjective assessment of probability resembles the subjective assessment of physical quantities such as distance or size. These judgments are all based on data of limited validity, which are processed according to heuristic rules. For example, the apparent distance of an object is determined in part by its clarity. The more sharply the object is seen, the closer it appears to be. This rule has some validity, because in any given scene the more distant objects are seen less sharply than nearer objects. However, the reliance on this rule leads to systematic errors in the estimation of distance. Specifically, distances are often overestimated when visibility is poor because the contours of objects are blurred. On the other hand, distances are often underestimated when visibility is good because the objects are seen sharply. Thus, the reliance on clarity as an indication of distance leads to common biases. Such biases are also found in the intuitive judgment of probability. This article describes three heuristics
This chapter originally appeared in Science, 1974, 185, 1124-1131. Copyright © 1974 by the
 
4 INTRODUCTION
that are employed to assess probabilities and to predict values. Biases to which these heuristics lead are enumerated, and the applied and theoretical implications of these observations are discussed.
Representativeness
Many of the probabilistic questions with which people are concerned belong to one of the following types: What is the probability that object A belongs to class B? What is the probability that event A originates from process B? What is the probability that process B will generate event A? In answering such questions, people typically rely on the representativeness heuristic, in which probabilities are evaluated by the degree to which A is representative of B, that is, by the degree to which A resembles B. For example, when A is highly representative of B, the probability that A originates from B is judged to be high. On the other hand, if A is not similar to B, the probability that A originates from B is judged to be low.
For an illustration of judgment by representativeness, consider an individual who has been described by a former neighbor as follows: “Steve is very shy and withdrawn, invariably helpful, but with little interest in people, or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.” How do people assess the probability that Steve is engaged in a particular occupation from a list of possibilities (for example, farmer, salesman, airline pilot, librarian, or physician)? How do people order these occupations from most to least likely? In the representativeness heuristic, the probability that Steve is a librarian, for example, is assessed by the degree to which he is representative of, or similar to, the stereotype of a librarian. Indeed, research with problems of this type has shown that people order the occupations by probability and by similarity in exactly the same way (Kahneman & Tversky, 1973, 4). This approach to the judgment of probability leads to serious errors, because similarity, or representativeness, is not influenced by several factors that should affect judgments of probability.
Insensitivity to prior probability of outcomes
One of the factors that have no effect on representativeness but should have a major effect on probability is the prior probability, or base-rate frequency, of the outcomes. In the case of Steve, for example, the fact that there are many more farmers than librarians in the population should enter into any reasonable estimate of the probability that Steve is a librarian rather than a farmer. Considerations of base-rate frequency, however, do not affect the similarity of Steve to the stereotypes of librarians and farmers. If people evaluate probability by representativeness, therefore, prior probabilities will be neglected. This hypothesis was tested in an experiment where prior probabilities were manipulated
 
Judgment under uncertainty 5
(Kahneman & Tversky, 1973, 4). Subjects were shown brief personality descriptions of several individuals, allegedly sampled at random from a group of 100 professionals: engineers and lawyers. The subjects were asked to assess, for each description, the probability that it belonged to an engineer rather than to a lawyer. In one experimental condition, subjects were told that the group from which the descriptions had been drawn consisted of 70 engineers and 30 lawyers. In another condition, subjects were told that the group consisted of 30 engineers and 70 lawyers. The odds that any particular description belongs to an engineer rather than to a lawyer should be higher in the first condition, where there is a majority of engineers, than in the second condition, where there is a majority of lawyers. Specifically, it can be shown by applying Bayes’ rule that the ratio of these odds should be $(.7/.3)^2$, or 5.44, for each description. In a sharp violation of Bayes’ rule, the subjects in the two conditions produced essentially the same probability judgments. Apparently, subjects evaluated the likelihood that a particular description belonged to an engineer rather than to a lawyer by the degree to which this description was representative of the two stereotypes, with little or no regard for the prior probabilities of the categories.
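The 5.44 figure can be checked with a few lines of Python. This is a minimal sketch, not the authors' computation: it uses the odds form of Bayes' rule (posterior odds = prior odds × likelihood ratio), and the likelihood ratio of a description, `likelihood_ratio` below, is a hypothetical stand-in value, because it cancels when the two conditions are compared.

```python
def posterior_odds(p_engineer: float, likelihood_ratio: float) -> float:
    """Posterior odds (engineer : lawyer) from the odds form of Bayes' rule."""
    prior_odds = p_engineer / (1.0 - p_engineer)
    return prior_odds * likelihood_ratio


def condition_ratio(likelihood_ratio: float) -> float:
    """Ratio of posterior odds in the 70-engineer vs. the 30-engineer condition."""
    return posterior_odds(0.7, likelihood_ratio) / posterior_odds(0.3, likelihood_ratio)


# Hypothetical likelihood ratios P(description | engineer) / P(description | lawyer).
# The ratio cancels, so every description yields the same comparison.
for lr in (0.25, 1.0, 4.0):
    print(lr, round(condition_ratio(lr), 2))
```

Each line prints 5.44, that is, (.7/.3)² = 49/9, whatever likelihood ratio is assumed.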
The subjects used prior probabilities correctly when they had no other information. In the absence of a personality sketch, they judged the probability that an unknown individual is an engineer to be .7 and .3, respectively, in the two base-rate conditions. However, prior probabilities were effectively ignored when a description was introduced, even when this description was totally uninformative. The responses to the following description illustrate this phenomenon:
Dick is a 30 year old man. He is married with no children. A man of high ability and high motivation, he promises to be quite successful in his field. He is well liked by his colleagues.
This description was intended to convey no information relevant to the question of whether Dick is an engineer or a lawyer. Consequently, the probability that Dick is an engineer should equal the proportion of engineers in the group, as if no description had been given. The subjects, however, judged the probability of Dick being an engineer to be .5 regardless of whether the stated proportion of engineers in the group was .7 or .3. Evidently, people respond differently when given no evidence and when given worthless evidence. When no specific evidence is given, prior probabilities are properly utilized; when worthless evidence is given, prior probabilities are ignored (Kahneman & Tversky, 1973, 4).
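The normative answer for Dick follows from the same rule. A description that is equally likely for engineers and for lawyers has a likelihood ratio of 1, so the posterior probability should simply equal the prior. In the sketch below the helper name is hypothetical; it converts the posterior odds back into a probability.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """P(engineer | description) from the prior P(engineer) and the likelihood ratio."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# An uninformative description: equally likely for engineers and lawyers (ratio 1).
for prior in (0.7, 0.3):
    print(prior, round(posterior_probability(prior, 1.0), 2))
```

The output reproduces the base rates, .7 and .3, rather than the .5 the subjects gave in both conditions.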
Insensitivity to sample size
To evaluate the probability of obtaining a particular result in a sample
 
tiveness heuristic. That is, they assess the likelihood of a sample result, for example, that the average height in a random sample of ten men will be 6 feet (180 centimeters), by the similarity of this result to the corresponding parameter (that is, to the average height in the population of men). The similarity of a sample statistic to a population parameter does not depend on the size of the sample. Consequently, if probabilities are assessed by representativeness, then the judged probability of a sample statistic will be essentially independent of sample size. Indeed, when subjects assessed the distributions of average height for samples of various sizes, they produced identical distributions. For example, the probability of obtaining an average height greater than 6 feet was assigned the same value for samples of 1000, 100, and 10 men (Kahneman & Tversky, 1972b, 3). Moreover, subjects failed to appreciate the role of sample size even when it was emphasized in the formulation of the problem. Consider the following question:
A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50 percent of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50 percent, sometimes lower.
For a period of 1 year, each hospital recorded the days on which more than 60 percent of the babies born were boys. Which hospital do you think recorded more such days?
The larger hospital (21)
The smaller hospital (21)
About the same (that is, within 5 percent of each other) (53)
The values in parentheses are the number of undergraduate students who chose each answer.
Most subjects judged the probability of obtaining more than 60 percent boys to be the same in the small and in the large hospital, presumably because these events are described by the same statistic and are therefore equally representative of the general population. In contrast, sampling theory entails that the expected number of days on which more than 60 percent of the babies are boys is much greater in the small hospital than in the large one, because a large sample is less likely to stray from 50 percent. This fundamental notion of statistics is evidently not part of people’s repertoire of intuitions.
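The sampling-theory claim can be made concrete with the binomial distribution. This sketch rests on assumptions the question only approximates: exactly 15 and 45 births every day, independent births, and P(boy) = 0.5 exactly. The helper `prob_more_than` is hypothetical; it uses integer arithmetic for the 60 percent cut-off so that, for example, 9 boys out of 15 (exactly 60 percent) does not count as "more than 60 percent."

```python
from math import comb


def prob_more_than(n: int, percent: int = 60) -> float:
    """P(strictly more than `percent`% of n births are boys), with P(boy) = 0.5."""
    threshold = percent * n // 100 + 1  # smallest count strictly above the cut-off
    favourable = sum(comb(n, k) for k in range(threshold, n + 1))
    return favourable / 2**n


for label, births in (("smaller", 15), ("larger", 45)):
    p = prob_more_than(births)
    print(f"{label} hospital: P = {p:.3f}, about {365 * p:.0f} days per year")
```

Under these assumptions the smaller hospital's daily probability is 4944/32768, about 0.151 (roughly 55 days a year), and the larger hospital's figure comes out well below that.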
A similar insensitivity to sample size has been reported in judgments of posterior probability, that is, of the probability that a sample has been drawn from one population rather than from another. Consider the following example:
Imagine an urn filled with balls, of which 2/3 are of one color and 1/3 of another. One individual has drawn 5 balls from the urn, and found that 4 were red and 1 was white. Another individual has drawn 20 balls and found that 12 were red and
 
contains 2/3 red balls and 1/3 white balls, rather than the opposite? What odds should each individual give?
In this problem, the correct posterior odds are 8 to 1 for the 4:1 sample and 16 to 1 for the 12:8 sample, assuming equal prior probabilities. However, most people feel that the first sample provides much stronger evidence for the hypothesis that the urn is predominantly red, because the proportion of red balls is larger in the first than in the second sample. Here again, intuitive judgments are dominated by the sample proportion and are essentially unaffected by the size of the sample, which plays a crucial role in the determination of the actual posterior odds (Kahneman & Tversky, 1972b). In addition, intuitive estimates of posterior odds are far less extreme than the correct values. The underestimation of the impact of evidence has been observed repeatedly in problems of this type (W. Edwards, 1968, 25; Slovic & Lichtenstein, 1971). It has been labeled conservatism.
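The 8-to-1 and 16-to-1 figures follow from multiplying likelihood ratios. A minimal sketch, assuming equal priors and independent draws (for example, an urn large enough that drawing without replacement makes no difference): each red ball multiplies the odds for the predominantly red urn by (2/3)/(1/3) = 2, and each white ball multiplies them by 1/2. The function name is hypothetical.

```python
from fractions import Fraction


def urn_posterior_odds(red: int, white: int) -> Fraction:
    """Odds that the urn is 2/3 red rather than 2/3 white, given equal priors."""
    lr_red = Fraction(2, 3) / Fraction(1, 3)    # each red draw doubles the odds
    lr_white = Fraction(1, 3) / Fraction(2, 3)  # each white draw halves them
    return lr_red**red * lr_white**white


print(urn_posterior_odds(4, 1))   # 8
print(urn_posterior_odds(12, 8))  # 16
```

The odds depend only on the surplus of red over white (3 in the first sample, 4 in the second), not on the proportion of red, which is why the larger but less lopsided sample is the stronger evidence.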
Misconceptions of chance
People expect that a sequence of events generated by a random process will represent the essential characteristics of that process even when the sequence is short. In considering tosses of a coin for heads or tails, for example, people regard the sequence H-T-H-T-T-H to be more likely than the sequence H-H-H-T-T-T, which does not appear random, and also more likely than the sequence H-H-H-H-T-H, which does not represent the fairness of the coin (Kahneman & Tversky, 1972b, 3). Thus, people expect that the essential characteristics of the process will be represented, not only globally in the entire sequence, but also locally in each of its parts. A locally representative sequence, however, deviates systematically from chance expectation: it contains too many alternations and too few runs.
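For a fair coin, every specific sequence of six tosses has the same probability, (1/2)^6 = 1/64, whether or not it looks random. A minimal sketch that multiplies per-toss probabilities; the helper name is hypothetical and the H/T string format simply mirrors the text.

```python
from fractions import Fraction


def sequence_probability(seq: str, p_heads: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of one exact sequence of independent tosses, e.g. 'H-T-H'."""
    prob = Fraction(1)
    for toss in seq.split("-"):
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob


for seq in ("H-T-H-T-T-H", "H-H-H-T-T-T", "H-H-H-H-T-H"):
    print(seq, sequence_probability(seq))  # each line ends in 1/64
```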
Another consequence of the belief in local representativeness is the well-known gambler’s fallacy. After observing a long run of red on the roulette wheel, for example, most people erroneously believe that black is now due, presumably because the occurrence of black will result in a more representative sequence than the occurrence of an additional red. Chance is commonly viewed as a self-correcting process in which a deviation in one direction induces a deviation in the opposite direction to restore the equilibrium. In fact, deviations are not “corrected” as a chance process unfolds, they are merely diluted.
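Dilution, as opposed to correction, shows up in a short calculation. For simplicity, assume a wheel on which red and black each come up with probability 1/2 (the green pockets are ignored) and a starting run of 10 reds. Later spins are expected to split evenly, so the expected surplus of reds stays at 10 while the expected share of red drifts back toward 1/2. The function name and the run length are illustrative choices.

```python
from fractions import Fraction


def expected_red_share(initial_reds: int, further_spins: int) -> Fraction:
    """Expected share of red after a run of reds followed by fair further spins."""
    expected_reds = initial_reds + Fraction(further_spins, 2)
    return expected_reds / (initial_reds + further_spins)


for spins in (0, 10, 100, 1000, 10000):
    share = expected_red_share(10, spins)
    print(spins, f"{float(share):.4f}")
```

Nothing in the calculation pushes the next spin toward black; the early surplus is simply outweighed by the growing number of ordinary spins.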
Misconceptions of chance are not limited to naive subjects. A study of the statistical intuitions of experienced research psychologists (Tversky & Kahneman, 1971, 2) revealed a lingering belief in what may be called the “law of small numbers,” according to which even small samples are highly representative of the populations from which they are drawn. The responses of these investigators reflected the expectation that a valid hypothesis about a population will be represented by a statistically significant result in a sample, with little regard for its size. As a consequence, the researchers put too much faith in the results of small samples and grossly overestimated the replicability of such results. In the actual conduct of research, this bias leads to the selection of samples of inadequate size and to overinterpretation of findings.
Insensitivity to predictability
People are sometimes called upon to make such numerical predictions as the future value of a stock, the demand for a commodity, or the outcome of a football game. Such predictions are often made by representativeness. For example, suppose one is given a description of a company and is asked to predict its future profit. If the description of the company is very favorable, a very high profit will appear most representative of that description; if the description is mediocre, a mediocre performance will appear most representative. The degree to which the description is favorable is unaffected by the reliability of that description or by the degree to which it permits accurate prediction. Hence, if people predict solely in terms of the favorableness of the description, their predictions will be insensitive to the reliability of the evidence and to the expected accuracy of the prediction.
This mode of judgment violates the normative statistical theory in which the extremeness and the range of predictions are controlled by considerations of predictability. When predictability is nil, the same prediction should be made in all cases. For example, if the descriptions of companies provide no information relevant to profit, then the same value (such as average profit) should be predicted for all companies. If predictability is perfect, of course, the values predicted will match the actual values and the range of predictions will equal the range of outcomes. In general, the higher the predictability, the wider the range of predicted values.
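The normative rule can be sketched as a regressive prediction. For illustration, assume that evidence and outcome are on the same scale with the same mean and spread, and summarize predictability as a single correlation r between 0 and 1. The prediction then moves from the mean toward the evidence in proportion to r: r = 0 gives the same value for every case, and r = 1 reproduces the evidence. This linear form is a standard textbook simplification rather than a model proposed in the text, and the scores below are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def regressive_predictions(evidence: Sequence[float], r: float) -> List[float]:
    """Shrink each evidence score toward the mean in proportion to predictability r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    m = mean(evidence)
    return [m + r * (x - m) for x in evidence]


scores = [20.0, 50.0, 80.0]  # hypothetical evaluation scores
for r in (0.0, 0.5, 1.0):
    print(r, regressive_predictions(scores, r))
```

The range of the predictions is r times the range of the evidence: 0, 30, and 60 points for the three values of r, which is the "wider range with higher predictability" stated above.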
Several studies of numerical prediction have demonstrated that intuitive predictions violate this rule, and that subjects show little or no regard for considerations of predictability (Kahneman & Tversky, 1973, 4). In one of these studies, subjects were presented with several paragraphs, each describing the performance of a student teacher during a particular practice lesson. Some subjects were asked to evaluate the quality of the lesson described in the paragraph in percentile scores, relative to a specified population. Other subjects were asked to predict, also in percentile scores, the standing of each student teacher 5 years after the practice lesson. The judgments made under the two conditions were identical. That is, the prediction of a remote criterion (success of a teacher after 5 years) was identical to the evaluation of the information on which the prediction
 
these predictions were undoubtedly aware of the limited predictability of teaching competence on the basis of a single trial lesson 5 years earlier; nevertheless, their predictions were as extreme as their evaluations.
The illusion of validity
As we have seen, people often predict by selecting the outcome (for example, an occupation) that is most representative of the input (for example, the description of a person). The confidence they have in their prediction depends primarily on the degree of representativeness (that is, on the quality of the match between the selected outcome and the input) with little or no regard for the factors that limit predictive accuracy. Thus, people express great confidence in the prediction that a person is a librarian when given a description of his personality which matches the stereotype of librarians, even if the description is scanty, unreliable, or outdated. The unwarranted confidence which is produced by a good fit between the predicted outcome and the input information may be called the illusion of validity. This illusion persists even when the judge is aware of the factors that limit the accuracy of his predictions. It is a common observation that psychologists who conduct selection interviews often experience considerable confidence in their predictions, even when they know of the vast literature that shows selection interviews to be highly fallible. The continued reliance on the clinical interview for selection, despite repeated demonstrations of its inadequacy, amply attests to the strength of this effect.
The internal consistency of a pattern of inputs is a major determinant of one's confidence in predictions based on these inputs. For example, people express more confidence in predicting the final grade-point average of a student whose first-year record consists entirely of B's than in predicting the grade-point average of a student whose first-year record includes many A's and C's. Highly consistent patterns are most often observed when the input variables are highly redundant or correlated. Hence, people tend to have great confidence in predictions based on redundant input variables. However, an elementary result in the statistics of correlation asserts that, given input variables of stated validity, a prediction based on several such inputs can achieve higher accuracy when they are independent of each other than when they are redundant or correlated. Thus, redundancy among inputs decreases accuracy even as it increases confidence, and people are often confident in predictions that are quite likely to be off the mark (Kahneman & Tversky, 1973, 4).
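The “elementary result” can be illustrated with the multiple correlation R in the simplest case. Assume two standardized inputs that each correlate r with the criterion and ρ with each other; then R² = 2r²/(1 + ρ), valid wherever the three correlations are mutually consistent. R is largest for independent inputs and, in the limit ρ → 1, falls to the validity of a single input. The function below is a sketch of that formula only; its name is hypothetical.

```python
from math import sqrt


def multiple_r_two_inputs(validity: float, intercorrelation: float) -> float:
    """Multiple correlation R for two equally valid standardized inputs.

    validity: correlation r of each input with the criterion
    intercorrelation: correlation rho between the two inputs, -1 < rho <= 1
    """
    return sqrt(2 * validity**2 / (1 + intercorrelation))


for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, round(multiple_r_two_inputs(0.4, rho), 3))
```

With r = 0.4, R drops from about 0.57 for independent inputs to 0.40 for fully redundant ones, no better than one input alone.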
Misconceptions of regression
Suppose a large group of children has been examined on two equivalent
 
who did best on one of the two versions, he will usually find their performance on the second version to be somewhat disappointing. Conversely, if one selects ten children from among those who did worst on one version, they will be found, on the average, to do somewhat better on the other version. More generally, consider two variables X and Y which have the same distribution. If one selects individuals whose average X score deviates from the mean of X by k units, then the average of their Y scores will usually deviate from the mean of Y by less than k units. These observations illustrate a general phenomenon known as regression toward the mean, which was first documented by Galton more than 100 years ago.
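Regression toward the mean can be reproduced in a small simulation. The model is an assumption made for illustration: each child's score on either version is a stable ability plus independent measurement noise, so the two versions have the same distribution and correlate at about 0.5. Selecting the ten best and the ten worst on one version and looking at the other version shows both groups moving back toward the mean, although nothing about the children has changed.

```python
import random
from statistics import mean


def regression_demo(n_children: int = 1000, n_selected: int = 10, seed: int = 1):
    """Mean scores (version X, version Y) for the top and bottom groups on X."""
    rng = random.Random(seed)
    ability = [rng.gauss(100, 10) for _ in range(n_children)]
    version_x = [a + rng.gauss(0, 10) for a in ability]  # noise independent of Y's noise
    version_y = [a + rng.gauss(0, 10) for a in ability]
    order = sorted(range(n_children), key=lambda i: version_x[i])
    best, worst = order[-n_selected:], order[:n_selected]
    return {
        "best": (mean(version_x[i] for i in best), mean(version_y[i] for i in best)),
        "worst": (mean(version_x[i] for i in worst), mean(version_y[i] for i in worst)),
    }


for group, (x_mean, y_mean) in regression_demo().items():
    print(f"{group}: version X {x_mean:.1f}, version Y {y_mean:.1f}")
```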
In the normal course of life, one encounters many instances of regression toward the mean, in the comparison of the height of fathers and sons, of the intelligence of husbands and wives, or of the performance of individuals on consecutive examinations. Nevertheless, people do not develop correct intuitions about this phenomenon. First, they do not expect regression in many contexts where it is bound to occur. Second, when they recognize the occurrence of regression, they often invent spurious causal explanations for it (Kahneman & Tversky, 1973, 4). We suggest that the phenomenon of regression remains elusive because it is incompatible with the belief that the predicted outcome should be maximally representative of the input, and, hence, that the value of the outcome variable should be as extreme as the value of the input variable.
The failure to recognize the import of regression can have pernicious consequences, as illustrated by the following observation (Kahneman & Tversky, 1973, 4). In a discussion of flight training, experienced instructors noted that praise for an exceptionally smooth landing is typically followed by a poorer landing on the next try, while harsh criticism after a rough landing is usually followed by an improvement on the next try. The instructors concluded that verbal rewards are detrimental to learning, while verbal punishments are beneficial, contrary to accepted psychological doctrine. This conclusion is unwarranted because of the presence of regression toward the mean. As in other cases of repeated examination, an improvement will usually follow a poor performance and a deterioration will usually follow an outstanding performance, even if the instructor does not respond to the trainee’s achievement on the first attempt. Because the instructors had praised their trainees after good landings and admonished them after poor ones, they reached the erroneous and potentially harmful conclusion that punishment is more effective than reward.
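The instructors' observation can be reproduced with feedback that has no effect at all. In this sketch, each landing's quality is pure random variation around a fixed skill level (an assumption chosen to isolate regression: no learning, no response to praise or criticism), and the average change to the next landing is computed after the worst and the best 10 percent of landings.

```python
import random
from statistics import mean


def change_after_extremes(n_landings: int = 10_000, seed: int = 7):
    """Mean change in quality on the next landing after the worst and best tenth."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n_landings)]  # no learning, no feedback effect
    pairs = sorted(zip(quality, quality[1:]))  # (this landing, next landing), sorted by this landing
    tenth = len(pairs) // 10
    after_worst = mean(nxt - cur for cur, nxt in pairs[:tenth])
    after_best = mean(nxt - cur for cur, nxt in pairs[-tenth:])
    return after_worst, after_best


after_worst, after_best = change_after_extremes()
print(f"after worst landings: {after_worst:+.2f}; after best landings: {after_best:+.2f}")
```

The first number comes out positive and the second negative, so "criticism" appears to help and "praise" appears to hurt, even though the model contains no effect of either.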
Thus, the failure to understand the effect of regression leads one to overestimate the effectiveness of punishment and to underestimate the effectiveness of reward. In social interaction, as well as in training, rewards are typically administered when performance is good, and
 
regression alone, therefore, behavior is most likely to improve after punishment and most likely to deteriorate after reward. Consequently, the human condition is such that, by chance alone, one is most often rewarded for punishing others and most often punished for rewarding them. People are generally not aware of this contingency. In fact, the elusive role of regression in determining the apparent consequences of reward and punishment seems to have escaped the notice of students of this area.
Availability
There are situations in which people assess the frequency of a class or the probability of an event by the ease with which instances or occurrences can be brought to mind. For example, one may assess the risk of heart attack among middle-aged people by recalling such occurrences among one's acquaintances. Similarly, one may evaluate the probability that a given business venture will fail by imagining various difficulties it could encounter. This judgmental heuristic is called availability. Availability is a useful clue for assessing frequency or probability, because instances of large classes are usually recalled better and faster than instances of less frequent classes. However, availability is affected by factors other than frequency and probability. Consequently, the reliance on availability leads to predictable biases, some of which are illustrated below.
Biases due to the retrievability of instances
When the size of a class is judged by the availability of its instances, a class whose instances are easily retrieved will appear more numerous than a class of equal frequency whose instances are less retrievable. In an elementary demonstration of this effect, subjects heard a list of well-known personalities of both sexes and were subsequently asked to judge whether the list contained more names of men than of women. Different lists were presented to different groups of subjects. In some of the lists the men were relatively more famous than the women, and in others the women were relatively more famous than the men. In each of the lists, the subjects erroneously judged that the class (sex) that had the more famous personalities was the more numerous (Tversky & Kahneman, 1973, 11).
In addition to familiarity, there are other factors, such as salience, which affect the retrievability of instances. For example, the impact of seeing a house burning on the subjective probability of such accidents is probably greater than the impact of reading about a fire in the local paper. Furthermore, recent occurrences are likely to be relatively more available than earlier occurrences. It is a common experience that the subjective probability of traffic accidents rises temporarily when one sees a car overturned by the side of the road.
Biases due to the effectiveness of a search set
Suppose one samples a word (of three letters or more) at random from an English text. Is it more likely that the word starts with r or that r is the third letter? People approach this problem by recalling words that begin with r (road) and words that have r in the third position (car) and assess the relative frequency by the ease with which words of the two types come to mind. Because it is much easier to search for words by their first letter than by their third letter, most people judge words that begin with a given consonant to be more numerous than words in which the same consonant appears in the third position. They do so even for consonants, such as r or k, that are more frequent in the third position than in the first (Tversky & Kahneman, 1973, 11).
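The normative answer to the r question is a count by letter position over a large sample of text, which is exactly what retrieval from memory fails to approximate. The helper below is a hypothetical sketch of that count; the word list in the usage example is toy data and says nothing about actual English letter frequencies, which would require a large corpus.

```python
from collections import Counter
from typing import Iterable


def letter_position_counts(words: Iterable[str], letter: str) -> Counter:
    """Count words (three letters or more) with `letter` in first vs. third position."""
    counts: Counter = Counter()
    for word in words:
        w = word.lower()
        if len(w) < 3:
            continue  # the question concerns words of three letters or more
        if w[0] == letter:
            counts["first"] += 1
        if w[2] == letter:
            counts["third"] += 1
    return counts


toy_sample = ["road", "car", "bark", "word", "rain", "park", "the", "of"]  # toy data only
print(letter_position_counts(toy_sample, "r"))
```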
Different tasks elicit different search sets. For example, suppose you are asked to rate the frequency with which abstract words (thought, love) and concrete words (door, water) appear in written English. A natural way to answer this question is to search for contexts in which the word could appear. It seems easier to think of contexts in which an abstract concept is mentioned (love in love stories) than to think of contexts in which a concrete word (such as door) is mentioned. If the frequency of words is judged by the availability of the contexts in which they appear, abstract words will be judged as relatively more numerous than concrete words. This bias has been observed in a recent study (Galbraith & Underwood, 1973) which showed that the judged frequency of occurrence of abstract words was much higher than that of concrete words, equated in objective frequency. Abstract words were also judged to appear in a much greater variety of contexts than concrete words.
Biases of imaginability
Sometimes one has to assess the frequency of a class whose instances are
not stored in memory but can be generated according to a given rule. In
such situations, one typically generates several instances and evaluates
frequency or probability by the ease with which the relevant instances can
be constructed. However, the ease of constructing instances does not
always reflect their actual frequency, and this mode of evaluation is prone
to biases. To illustrate, consider a group of 10 people who form committees
of k members, 2 ≤ k ≤ 8. How many different committees of k members can
be formed? The correct answer to this problem is given by the binomial
coefficient C(10, k), which reaches a maximum of 252 for k = 5. Clearly, the
number of committees of k members equals the number of committees of
(10 - k) members, because any committee of k members defines a unique
group of (10 - k) nonmembers.
One way to answer this question without computation is to mentally
construct committees of k members and to evaluate their number by the
ease with which they come to mind. Committees of few members, say 2,
are more available than committees of many members, say 8. The simplest
scheme for the construction of committees is a partition of the group into
disjoint sets. One readily sees that it is easy to construct five disjoint
committees of 2 members, while it is impossible to generate even two
disjoint committees of 8 members. Consequently, if frequency is assessed
by imaginability, or by availability for construction, the small committees
will appear more numerous than larger committees, in contrast to the
correct bell-shaped function. Indeed, when naive subjects were asked to
estimate the number of distinct committees of various sizes, their estimates
were a decreasing monotonic function of committee size (Tversky &
Kahneman, 1973, 11). For example, the median estimate of the number of
committees of 2 members was 70, while the estimate for committees of 8
members was 20 (the correct answer is 45 in both cases).
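The counts behind the bell-shaped function are easy to check. The following Python sketch, an illustration added here rather than part of the original study, tabulates C(10, k) with the standard-library function math.comb for the committee sizes discussed above and confirms the symmetry between k and 10 - k:

```python
from math import comb

GROUP_SIZE = 10

# Number of distinct committees of k members drawn from a group of 10,
# for the committee sizes used in the study (2 <= k <= 8).
counts = {k: comb(GROUP_SIZE, k) for k in range(2, 9)}

for k, n_committees in counts.items():
    print(f"k = {k}: {n_committees} committees")

# Choosing the k members is the same as choosing the (10 - k) nonmembers,
# so the counts are symmetric around k = 5.
assert all(counts[k] == counts[GROUP_SIZE - k] for k in counts)
```

It prints 45, 120, 210, 252, 210, 120 and 45 for k = 2 through 8, so the median estimates of 70 and 20 err in opposite directions from the same true value.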
Imaginability plays an important role in the evaluation of probabilities
in real-life situations. The risk involved in an adventurous expedition, for
example, is evaluated by imagining contingencies with which the expedition
is not equipped to cope. If many such difficulties are vividly
portrayed, the expedition can be made to appear exceedingly dangerous,
although the ease with which disasters are imagined need not reflect their
actual likelihood. Conversely, the risk involved in an undertaking may be
grossly underestimated if some possible dangers are either difficult to
conceive of, or simply do not come to mind.
Illusory correlation
Chapman and Chapman (1969) have described an interesting bias in the
judgment of the frequency with which two events co-occur. They
presented naive judges with information concerning several hypothetical
mental patients. The data for each patient consisted of a clinical diagnosis
and a drawing of a person made by the patient. Later the judges estimated
the frequency with which each diagnosis (such as paranoia or suspiciousness)
had been accompanied by various features of the drawing (such as
peculiar eyes). The subjects markedly overestimated the frequency of
co-occurrence of natural associates, such as suspiciousness and peculiar
eyes. This effect was labeled illusory correlation. In their erroneous
judgments of the data to which they had been exposed, naive subjects
rediscovered much of the common, but unfounded, clinical lore
concerning the interpretation of the draw-a-person test. The illusory
correlation effect was extremely resistant to contradictory data. It persisted
even when the correlation between symptom and diagnosis was actually
negative, and it prevented the judges from detecting relationships that
were in fact present.
Availability provides a natural account for the illusory-correlation
effect. The judgment of how frequently two events co-occur could be
based on the strength of the associative bond between them. When the
association is strong, one is likely to conclude that the events have been
frequently paired. Consequently, strong associates will be judged to have
occurred together frequently. According to this view, the illusory
correlation between suspiciousness and peculiar drawing of the eyes, for
example, is due to the fact that suspiciousness is more readily associated
with the eyes than with any other part of the body.
Lifelong experience has taught us that, in general, instances of large
classes are recalled better and faster than instances of less frequent classes;
that likely occurrences are easier to imagine than unlikely ones; and that
the associative connections between events are strengthened when the
events frequently co-occur. As a result, man has at his disposal a procedure
(the availability heuristic) for estimating the numerosity of a class, the
likelihood of an event, or frequency of co-occurrences, by the ease
with which the relevant mental operations of retrieval, construction, or
association can be performed. However, as the preceding examples have
demonstrated, this valuable estimation procedure results in systematic
errors.
Adjustment and anchoring
In many situations, people make estimates by starting from an initial value
that is adjusted to yield the final answer. The initial value, or starting
point, may be suggested by the formulation of the problem, or it may be
the result of a partial computation. In either case, adjustments are typically
insufficient (Slovic & Lichtenstein, 1971). That is, different starting points
yield different estimates, which are biased toward the initial values. We
call this phenomenon anchoring.
Insufficient adjustment
In a demonstration of the anchoring effect, subjects were asked to estimate
various quantities, stated in percentages (for example, the percentage of
African countries in the United Nations). For each quantity, a number
between 0 and 100 was determined by spinning a wheel of fortune in the
subjects' presence. The subjects were instructed to indicate first whether
that number was higher or lower than the value of the quantity, and then
to estimate the value of the quantity by moving upward or downward
from the given number. Different groups were given different numbers
for each quantity, and these arbitrary numbers had a marked effect on
estimates. For example, the median estimates of the percentage of African
countries in the United Nations were 25 and 45 for groups that received 10
and 65, respectively, as starting points. Payoffs for accuracy did not reduce
the anchoring effect.
Anchoring occurs not only when the starting point is given to the
subject, but also when the subject bases his estimate on the result of some
incomplete computation. A study of intuitive numerical estimation
illustrates this effect. Two groups of high school students estimated, within 5
seconds, a numerical expression that was written on the blackboard. One
group estimated the product
8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
while another group estimated the product
1 × 2 × 3 × 4 × 5 × 6 × 7 × 8
To rapidly answer such questions, people may perform a few steps of
computation and estimate the product by extrapolation or adjustment.
Because adjustments are typically insufficient, this procedure should lead
to underestimation. Furthermore, because the result of the first few steps
of multiplication (performed from left to right) is higher in the descending
sequence than in the ascending sequence, the former expression
should be judged larger than the latter. Both predictions were confirmed.
The median estimate for the ascending sequence was 512, while the
median estimate for the descending sequence was 2,250. The correct
answer is 40,320.
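The argument rests on the running product after the first few left-to-right steps. A minimal Python sketch makes the asymmetry concrete; the three-step cutoff is an arbitrary assumption for illustration, not a claim about how many steps the students actually performed:

```python
from math import prod

descending = [8, 7, 6, 5, 4, 3, 2, 1]
ascending = [1, 2, 3, 4, 5, 6, 7, 8]

def partial_products(factors, steps):
    """Running product after each of the first `steps` factors, multiplied left to right."""
    running, results = 1, []
    for factor in factors[:steps]:
        running *= factor
        results.append(running)
    return results

print(partial_products(descending, 3))    # [8, 56, 336]
print(partial_products(ascending, 3))     # [1, 2, 6]
print(prod(descending), prod(ascending))  # 40320 40320
```

Both sequences have the same product, but an anchor of 336 sits far closer to 40,320 than an anchor of 6, which is consistent with the ordering of the two median estimates.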
Biases in the evaluation of conjunctive and disjunctive events
In a recent study by Bar-Hillel (1973) subjects were given the opportunity
to bet on one of two events. Three types of events were used: (i) simple
events, such as drawing a red marble from a bag containing 50 percent red
marbles and 50 percent white marbles; (ii) conjunctive events, such as
drawing a red marble seven times in succession, with replacement, from a
bag containing 90 percent red marbles and 10 percent white marbles; and
(iii) disjunctive events, such as drawing a red marble at least once in seven
successive tries, with replacement, from a bag containing 10 percent red
marbles and 90 percent white marbles. In this problem, a significant
majority of subjects preferred to bet on the conjunctive event (the probability
of which is .48) rather than on the simple event (the probability of
which is .50). Subjects also preferred to bet on the simple event rather than
on the disjunctive event, which has a probability of .52. Thus, most
subjects bet on the less likely event in both comparisons. This pattern of
choices illustrates a general finding. Studies of choice among gambles and
of judgments of probability indicate that people tend to overestimate the
probability of conjunctive events (Cohen, Chesnick, & Haran, 1972, 24)
and to underestimate the probability of disjunctive events. These biases
are readily explained as effects of anchoring. The stated probability of the
elementary event (success at any one stage) provides a natural starting
point for the estimation of the probabilities of both conjunctive and
disjunctive events. Since adjustment from the starting point is typically
insufficient, the final estimates remain too close to the probabilities of the
elementary events in both cases. Note that the overall probability of a
conjunctive event is lower than the probability of each elementary event,
whereas the overall probability of a disjunctive event is higher than the
probability of each elementary event. As a consequence of anchoring, the
overall probability will be overestimated in conjunctive problems and
underestimated in disjunctive problems.
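The three probabilities quoted for Bar-Hillel's bets follow from the fact that draws made with replacement are independent. A short Python check, added here for illustration:

```python
# Draws are made with replacement, so successive draws are independent.
p_simple = 0.50                      # one draw from a bag that is 50% red
p_conjunctive = 0.90 ** 7            # red on all seven draws, bag 90% red
p_disjunctive = 1 - (1 - 0.10) ** 7  # red at least once in seven draws, bag 10% red

for label, p in [("simple", p_simple), ("conjunctive", p_conjunctive),
                 ("disjunctive", p_disjunctive)]:
    print(f"{label:<12}{p:.2f}")
```

The elementary probabilities (.90 per draw in the conjunctive bag, .10 per draw in the disjunctive bag) are the natural anchors; the compound probabilities (.48 and .52) lie far from them.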
Biases in the evaluation of compound events are particularly significant
in the context of planning. The successful completion of an undertaking,
such as the development of a new product, typically has a conjunctive
character: for the undertaking to succeed, each of a series of events must
occur. Even when each of these events is very likely, the overall probability
of success can be quite low if the number of events is large. The general
tendency to overestimate the probability of conjunctive events leads to
unwarranted optimism in the evaluation of the likelihood that a plan will
succeed or that a project will be completed on time. Conversely, disjunctive
structures are typically encountered in the evaluation of risks. A
complex system, such as a nuclear reactor or a human body, will malfunction
if any of its essential components fails. Even when the likelihood of
failure in each component is slight, the probability of an overall failure
can be high if many components are involved. Because of anchoring,
people will tend to underestimate the probabilities of failure in complex
systems. Thus, the direction of the anchoring bias can sometimes be
inferred from the structure of the event. The chain-like structure of
conjunctions leads to overestimation, the funnel-like structure of
disjunctions leads to underestimation.
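If the stages or components are assumed to be independent (an assumption made only for this sketch), the two structures reduce to simple formulas: a chain of n stages that each succeed with probability p succeeds with probability p^n, and a system of n components that each fail with probability q fails with probability 1 - (1 - q)^n. The figures below are hypothetical, chosen only to show how far these quantities move from the elementary probabilities:

```python
def p_all_succeed(p_stage: float, n_stages: int) -> float:
    """Conjunctive (chain-like) structure: every stage must succeed."""
    return p_stage ** n_stages

def p_any_fails(q_component: float, n_components: int) -> float:
    """Disjunctive (funnel-like) structure: one failed component is enough."""
    return 1 - (1 - q_component) ** n_components

# Hypothetical figures, not taken from the text; stages and components are
# assumed to be independent.
print(round(p_all_succeed(0.95, 10), 3))  # 0.599: ten steps, each 95% likely
print(round(p_any_fails(0.01, 100), 3))   # 0.634: 100 parts, each 1% likely to fail
```

An anchor at .95 for the plan, or at .01 for the system, would be off by a wide margin in each case, and in the directions the text predicts.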
Anchoring in the assessment of subjective probability distributions
In decision analysis, experts are often required to express their beliefs
about a quantity, such as the value of the Dow-Jones average on a
particular day, in the form of a probability distribution. Such a distribution
is usually constructed by asking the person to select values of the
quantity that correspond to specified percentiles of his subjective probability
distribution. For example, the judge may be asked to select a
number, X90, such that his subjective probability that this number will be
higher than the value of the Dow-Jones average is .90. That is, he should
select the value X90 so that he is just willing to accept 9 to 1 odds that the
Dow-Jones average will not exceed it. A subjective probability distribution
for the value of the Dow-Jones average can be constructed from several
such judgments corresponding to different percentiles.
By collecting subjective probability distributions for many different
quantities, it is possible to test the judge for proper calibration. A judge is
properly (or externally) calibrated in a set of problems if exactly Π percent
of the true values of the assessed quantities falls below his stated values of
XΠ. For example, the true values should fall below X01 for 1 percent of the
quantities and above X99 for 1 percent of the quantities. Thus, the true
values should fall in the confidence interval between X01 and X99 on 98
percent of the problems.
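Checking this definition against data amounts to counting how often the true value lands outside the stated interval. The Python sketch below is illustrative only; the function name surprise_rate and the four sample judgments are hypothetical, invented for the example:

```python
from typing import Sequence, Tuple

def surprise_rate(judgments: Sequence[Tuple[float, float, float]]) -> float:
    """Fraction of problems whose true value lies below X01 or above X99.

    Each item is (x01, x99, true_value). For a properly calibrated judge the
    expected rate is about .02.
    """
    surprises = sum(1 for x01, x99, truth in judgments if truth < x01 or truth > x99)
    return surprises / len(judgments)

# Hypothetical judgments, invented for illustration.
example = [
    (900, 1100, 1050),  # inside the interval
    (40, 60, 75),       # above X99: a surprise
    (10, 20, 15),       # inside the interval
    (300, 320, 290),    # below X01: a surprise
]
print(surprise_rate(example))  # 0.5
```

Over many problems, a rate well above .02 indicates intervals that are too narrow.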
Several investigators (Alpert & Raiffa, 1969, 21; Stael von Holstein,
1971b; Winkler, 1967) have obtained probability distributions for many
quantities from a large number of judges. These distributions indicated
large and systematic departures from proper calibration. In most studies,
the actual values of the assessed quantities are either smaller than X01 or
greater than X99 for about 30 percent of the problems. That is, the subjects
state overly narrow confidence intervals which reflect more certainty than
is justified by their knowledge about the assessed quantities. This bias is
common to naive and to sophisticated subjects, and it is not eliminated by
introducing scoring rules, which provide incentives for external
calibration. This effect is attributable, in part at least, to anchoring.
To select X90 for the value of the Dow-Jones average, for example, it is
natural to begin by thinking about one's best estimate of the Dow-Jones
and to adjust this value upward. If this adjustment, like most others, is
insufficient, then X90 will not be sufficiently extreme. A similar anchoring
effect will occur in the selection of X10, which is presumably obtained by
adjusting one's best estimate downward. Consequently, the confidence
interval between X10 and X90 will be too narrow, and the assessed probability
distribution will be too tight. In support of this interpretation it can be
shown that subjective probabilities are systematically altered by a procedure
in which one's best estimate does not serve as an anchor.
Subjective probability distributions for a given quantity (the Dow-Jones
average) can be obtained in two different ways: (i) by asking the subject to
select values of the Dow-Jones that correspond to specified percentiles of
his probability distribution and (ii) by asking the subject to assess the
probabilities that the true value of the Dow-Jones will exceed some
specified values. The two procedures are formally equivalent and should
yield identical distributions. However, they suggest different modes of
adjustment from different anchors. In procedure (i), the natural starting
point is one's best estimate of the quantity. In procedure (ii), on the other
hand, the subject may be anchored on the value stated in the question.
Alternatively, he may be anchored on even odds, or 50-50 chances, which
is a natural starting point in the estimation of likelihood. In either case,
procedure (ii) should yield less extreme odds than procedure (i).
To contrast the two procedures, a set of 24 quantities (such as the air
distance from New Delhi to Peking) was presented to a group of subjects
who assessed either X10 or X90 for each problem. Another group of subjects
received the median judgment of the first group for each of the 24
quantities. They were asked to assess the odds that each of the given values
exceeded the true value of the relevant quantity. In the absence of any
bias, these odds should conform to the odds specified in the first procedure,
that is, 9:1. However, if even odds or the stated value serve as anchors, the
odds of the second group should be less extreme, that is, closer to 1:1.
Indeed, the median odds stated by this group, across all problems, were
3:1. When the judgments of the two groups were tested for external
calibration, it was found that subjects in the first group were too extreme,
in accord with earlier studies. The events that they defined as having a
probability of .10 actually obtained in 24 percent of the cases. In contrast,
subjects in the second group were too conservative. Events to which they
assigned an average probability of .34 actually obtained in 26 percent of
the cases. These results illustrate the manner in which the degree of
calibration depends on the procedure of elicitation.
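The odds and probabilities in this comparison are two expressions of the same quantity: odds of a:b in favor of an event correspond to a probability of a/(a + b). A minimal Python sketch using exact fractions (the helper names are hypothetical, introduced only for this illustration):

```python
from fractions import Fraction

def odds_to_probability(in_favor: int, against: int) -> Fraction:
    """Odds of in_favor:against for an event -> probability of the event."""
    return Fraction(in_favor, in_favor + against)

def probability_to_odds(p: Fraction) -> Fraction:
    """Probability of an event -> odds in favor, as the single ratio p / (1 - p)."""
    return p / (1 - p)

print(odds_to_probability(9, 1))  # 9/10, the unbiased odds
print(odds_to_probability(3, 1))  # 3/4, the second group's median
print(odds_to_probability(1, 1))  # 1/2, even odds
```

On this scale the second group's median of 3:1 (.75) lies between the unbiased 9:1 (.90) and the even-odds anchor (.50).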
Discussion
This article has been concerned with cognitive biases that stem from the
reliance on judgmental heuristics. These biases are not attributable to
motivational effects such as wishful thinking or the distortion of judgments
by payoffs and penalties. Indeed, several of the severe errors of
judgment reported earlier occurred despite the fact that subjects were
encouraged to be accurate and were rewarded for the correct answers
(Kahneman & Tversky, 1972b, 3; Tversky & Kahneman, 1973, 11).
The reliance on heuristics and the prevalence of biases are not restricted
to laymen. Experienced researchers are also prone to the same biases
when they think intuitively. For example, the tendency to predict the
outcome that best represents the data, with insufficient regard for prior
probability, has been observed in the intuitive judgments of individuals
who have had extensive training in statistics (Kahneman & Tversky, 1973,
4; Tversky & Kahneman, 1971, 2). Although the statistically sophisticated
avoid elementary errors, such as the gambler's fallacy, their intuitive
judgments are liable to similar fallacies in more intricate and less
transparent problems.
It is not surprising that useful heuristics such as representativeness and
availability are retained, even though they occasionally lead to errors in
prediction or estimation. What is perhaps surprising is the failure of
people to infer from lifelong experience such fundamental statistical rules
as regression toward the mean, or the effect of sample size on sampling
variability. Although everyone is exposed, in the normal course of life, to
numerous examples from which these rules could have been induced, very
few people discover the principles of sampling and regression on their
own. Statistical principles are not learned from everyday experience
because the relevant instances are not coded appropriately. For example,
people do not discover that successive lines in a text differ more in average
word length than do successive pages, because they simply do not attend
to the average word length of individual lines or pages. Thus, people do
not learn the relation between sample size and sampling variability,
although the data for such learning are abundant.
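The statistical fact that people fail to notice can be shown with a small simulation. The sketch below assumes, purely for illustration, that word lengths are independent and uniformly distributed between 1 and 10 letters, that a line holds 10 words, and that a page holds 400; none of these figures comes from the text:

```python
import random
import statistics

rng = random.Random(1982)  # fixed seed so the run is reproducible

# Illustrative assumptions, not from the text: word lengths are independent,
# uniform on 1..10 letters; a line holds 10 words and a page holds 400.
WORDS_PER_LINE = 10
WORDS_PER_PAGE = 400
N_SAMPLES = 500

def mean_word_length(n_words: int) -> float:
    """Average length of n_words randomly generated words."""
    return statistics.fmean([rng.randint(1, 10) for _ in range(n_words)])

line_means = [mean_word_length(WORDS_PER_LINE) for _ in range(N_SAMPLES)]
page_means = [mean_word_length(WORDS_PER_PAGE) for _ in range(N_SAMPLES)]

sd_line = statistics.stdev(line_means)
sd_page = statistics.stdev(page_means)
print(f"spread of line averages: {sd_line:.3f}")
print(f"spread of page averages: {sd_page:.3f}")
# Sampling theory predicts a ratio near sqrt(400 / 10), about 6.3.
print(f"ratio: {sd_line / sd_page:.1f}")
```

Because the standard deviation of an average shrinks in proportion to 1/sqrt(n), averages over short lines vary several times more than averages over full pages.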
The lack of an appropriate code also explains why people usually do not
detect the biases in their judgments of probability. A person could
conceivably learn whether his judgments are externally calibrated by
keeping a tally of the proportion of events that actually occur among those
to which he assigns the same probability. However, it is not natural to
group events by their judged probability. In the absence of such grouping
it is impossible for an individual to discover, for example, that only 50
percent of the predictions to which he has assigned a probability of .9 or
higher actually come true.
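The tally described above groups predictions by the probability the person assigned and compares each group with how often its events occurred. A minimal Python sketch, with a hypothetical helper name and an invented record of predictions chosen to mirror the 50 percent example:

```python
from collections import defaultdict

def calibration_table(records):
    """Group (assigned_probability, occurred) pairs by the assigned probability
    and return {probability: proportion of those events that occurred}."""
    tally = defaultdict(lambda: [0, 0])  # probability -> [occurred, total]
    for p, occurred in records:
        tally[p][0] += int(occurred)
        tally[p][1] += 1
    return {p: hits / total for p, (hits, total) in sorted(tally.items())}

# Invented record: ten predictions stated at .9, only five of which came true,
# and four stated at .5, two of which came true.
records = ([(0.9, True)] * 5 + [(0.9, False)] * 5
           + [(0.5, True)] * 2 + [(0.5, False)] * 2)
print(calibration_table(records))  # {0.5: 0.5, 0.9: 0.5}
```

The miscalibration at .9 becomes visible only once the predictions are grouped this way; a list of predictions kept in chronological order would not reveal it.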
The empirical analysis of cognitive biases has implications for the
theoretical and applied role of judged probabilities. Modern decision
theory (de Finetti, 1968; Savage, 1954) regards subjective probability as the
quantified opinion of an idealized person. Specifically, the subjective
probability of a given event is defined by the set of bets about this event
that such a person is willing to accept. An internally consistent, or
coherent, subjective probability measure can be derived for an individual
if his choices among bets satisfy certain principles, that is, the axioms of
the theory. The derived probability is subjective in the sense that different
individuals are allowed to have different probabilities for the same event.
The major contribution of this approach is that it provides a rigorous
subjective interpretation of probability that is applicable to unique events
and is embedded in a general theory of rational decision.
It should perhaps be noted that, while subjective probabilities can
sometimes be inferred from preferences among bets, they are normally not
formed in this fashion. A person bets on team A rather than on team B
because he believes that team A is more likely to win; he does not infer
this belief from his betting preferences. Thus, in reality, subjective
probabilities determine preferences among bets and are not derived from them,
as in the axiomatic theory of rational decision (Savage, 1954).
The inherently subjective nature of probability has led many students to
the belief that coherence, or internal consistency, is the only valid
criterion by which judged probabilities should be evaluated. From the
standpoint of the formal theory of subjective probability, any set of
internally consistent probability judgments is as good as any other. This
criterion is not entirely satisfactory, because an internally consistent set of
subjective probabilities can be incompatible with other beliefs held by the
individual. Consider a person whose subjective probabilities for all possible
outcomes of a coin-tossing game reflect the gambler's fallacy. That is,
his estimate of the probability of tails on a particular toss increases with
the number of consecutive heads that preceded that toss. The judgments of
such a person could be internally consistent and therefore acceptable as
adequate subjective probabilities according to the criterion of the formal
theory. These probabilities, however, are incompatible with the generally
held belief that a coin has no memory and is therefore incapable of
generating sequential dependencies. For judged probabilities to be
considered adequate, or rational, internal consistency is not enough. The
judgments must be compatible with the entire web of beliefs held by the
individual. Unfortunately, there can be no simple formal procedure for
assessing the compatibility of a set of probability judgments with the
judge's total system of beliefs. The rational judge will nevertheless strive
for compatibility, even though internal consistency is more easily
achieved and assessed. In particular, he will attempt to make his probability
judgments compatible with his knowledge about the subject matter, the
laws of probability, and his own judgmental heuristics and biases.
Summary
This article described three heuristics that are employed in making
judgments under uncertainty: (i) representativeness, which is usually
employed when people are asked to judge the probability that an object or
event A belongs to class or process B; (ii) availability of instances or
scenarios, which is often employed when people are asked to assess the
frequency of a class or the plausibility of a particular development; and
(iii) adjustment from an anchor, which is usually employed in numerical
prediction when a relevant value is available. These heuristics are highly
economical and usually effective, but they lead to systematic and
predictable errors. A better understanding of these heuristics and of the
biases to which they lead could improve judgments and decisions in
situations of uncertainty.
Part II

2. Belief in the law of small numbers
Amos Tversky and Daniel Kahneman
Suppose you have run an experiment on 20 subjects, and have obtained a
significant result which confirms your theory (z = 2.23, p < .05, two-tailed).
You now have cause to run an additional group of 10 subjects.
What do you think the probability is that the results will be significant, by
a one-tailed test, separately for this group?
If you feel that the probability is somewhere around .85, you may be
pleased to know that you belong to a majority group. Indeed, that was the
median answer of two small groups who were kind enough to respond to a
questionnaire distributed at meetings of the Mathematical Psychology
Group and of the American Psychological Association.
On the other hand, if you feel that the probability is around .48, you
belong to a minority. Only 9 of our 84 respondents gave answers between
.40 and .60. However, .48 happens to be a much more reasonable estimate
than .85.¹
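The two answers derived in footnote 1 can be reproduced under the assumptions stated there: a test of a mean with known variance, a first sample of 20, a second of 10, and, for the Bayesian version, a uniform prior on the mean. The Python sketch below uses the standard library's statistics.NormalDist; working in units of the population standard deviation is a choice made for this sketch, not the authors' stated derivation:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()          # standard normal distribution
z_first = 2.23            # result of the first study
n_first, n_second = 20, 10
z_crit = Z.inv_cdf(0.95)  # one-tailed .05 criterion, about 1.645

# (a) Power against the alternative "population mean = first-sample mean".
# With half the sample size, the expected z shrinks by a factor sqrt(10 / 20).
expected_z_second = z_first * sqrt(n_second / n_first)
power = 1 - Z.cdf(z_crit - expected_z_second)

# (b) Predictive probability under a uniform prior on the mean (units of sigma).
m1 = z_first / sqrt(n_first)                     # observed first-sample mean
predictive_sd = sqrt(1 / n_first + 1 / n_second)  # posterior + sampling variance
threshold = z_crit / sqrt(n_second)              # second-sample mean needed
posterior_prob = 1 - Z.cdf((threshold - m1) / predictive_sd)

print(f"power:     {power:.3f}")
print(f"posterior: {posterior_prob:.3f}")
```

It prints .473 and .478, matching the footnote; both lie far below the median intuition of .85.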
Apparently, most psychologists have an exaggerated belief in the likelihood
of successfully replicating an obtained finding. The sources of such
¹ The required probability can be interpreted in several ways. One is to
follow common research practice, where a value obtained in one study is taken to define a
plausible alternative to the null hypothesis. The probability requested in the question can
then be interpreted as the power of the second test (i.e., the probability of obtaining a
significant result in the second sample) against the alternative hypothesis defined by the
result of the first sample. In the special case of a test of a mean with known variance, one
would compute the power of the test against the hypothesis that the population mean
equals the mean of the first sample. Since the size of the second sample is half that of the
first, the computed probability of obtaining z ≥ 1.645 is only .473. A theoretically more
justifiable approach is to interpret the requested probability within a Bayesian framework
and compute it relative to some appropriately selected prior distribution. Assuming a
uniform prior, the desired posterior probability is .478. Clearly, if the prior distribution
favors the null hypothesis, as is often the case, the posterior probability will be even
smaller.
This chapter originally appeared in Psychological Bulletin, 1971, 2, 105-10. Copyright © 1971 by
the American Psychological Association. Reprinted by permission.
 
24 REPRESENTATIVENESS
beliefs, and their consequences for the conduct of scientific inquiry, are
what this paper is about. Our thesis is that people have strong intuitions
about random sampling; that these intuitions are wrong in fundamental
respects; that these intuitions are shared by naive subjects and by trained
scientists; and that they are applied with unfortunate consequences in the
course of scientific inquiry.
We submit that people view a sample randomly drawn from a population
as highly representative, that is, similar to the population in all
essential characteristics. Consequently, they expect any two samples
drawn from a particular population to be more similar to one another and
to the population than sampling theory predicts, at least for small
samples.
The tendency to regard a sample as a representation is manifest in a
wide variety of situations. When subjects are instructed to generate a
random sequence of hypothetical tosses of a fair coin, for example, they
produce sequences where the proportion of heads in any short segment
stays far closer to .50 than the laws of chance would predict (Tune, 1964).
Thus, each segment of the response sequence is highly representative of
the “fairness” of the coin. Similar effects are observed when subjects
successively predict events in a randomly generated series, as in probability
learning experiments (Estes, 1964) or in other sequential games of
chance. Subjects act as if every segment of the random sequence must
reflect the true proportion: if the sequence has strayed from the population
proportion, a corrective bias in the other direction is expected. This
has been called the gambler's fallacy.
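A simulated fair coin shows why the expected correction never arrives: conditioning on a preceding run of heads does not change the chance of tails. This Python sketch uses the standard random module with a fixed seed; the number of tosses and the run lengths examined are arbitrary choices made for the illustration:

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility
tosses = [rng.random() < 0.5 for _ in range(200_000)]  # True = heads

def p_tails_after_heads_run(tosses, run_length):
    """Estimate P(tails on a toss | the preceding run_length tosses were all heads)."""
    tails = total = 0
    for i in range(run_length, len(tosses)):
        if all(tosses[i - run_length:i]):
            total += 1
            tails += not tosses[i]
    return tails / total

for k in (1, 2, 3, 4):
    print(k, round(p_tails_after_heads_run(tosses, k), 3))
```

Each estimate stays close to .5, whatever the length of the preceding run of heads.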
The heart of the gambler's fallacy is a misconception of the fairness of
the laws of chance. The gambler feels that the fairness of the coin entitles
him to expect that any deviation in one direction will soon be canceled by
a corresponding deviation in the other. Even the fairest of coins, however,
given the limitations of its memory and moral sense, cannot be as fair as
the gambler expects it to be. This fallacy is not unique to gamblers.
Consider the following example:
The mean IQ of the population of eighth graders in a city is known to be 100. You
have selected a random sample of 50 children for a study of educational
achievements. The first child tested has an IQ of 150. What do you expect the mean IQ to
be for the whole sample?
The correct answer is 101. A surprisingly large number of people believe
that the expected IQ for the sample is still 100. This expectation can be justified only by the belief that a random process is self-correcting. Idioms such as “errors cancel each other out” reflect the image of an active self-correcting process. Some familiar processes in nature obey such laws: a deviation from a stable equilibrium produces a force that restores the equilibrium. The laws of chance, in contrast, do not work that way: deviations are not canceled as sampling proceeds, they are merely diluted.
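The arithmetic behind the answer of 101 is short enough to write out. In this minimal sketch (the function name `expected_sample_mean` is hypothetical), each of the 49 children not yet tested is expected to score the population mean of 100. Nothing in the computation offsets the first child's 150. Letting the sample grow shows that the deviation is diluted rather than canceled.

```python
def expected_sample_mean(observed: list[float], n_total: int, pop_mean: float) -> float:
    """Expected mean of an n_total sample when some scores are already known.

    The untested children are expected to score pop_mean each; no term
    'corrects' for the scores already observed.
    """
    remaining = n_total - len(observed)
    return (sum(observed) + remaining * pop_mean) / n_total


print(expected_sample_mean([150], n_total=50, pop_mean=100))  # 101.0

# The single +50 deviation shrinks in influence as n grows; it is never canceled.
for n in (50, 500, 5000):
    print(n, expected_sample_mean([150], n_total=n, pop_mean=100))
```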
 
Belief in the law of small numbers 25
Thus far, we have attempted to describe two related intuitions about chance. We proposed a representation hypothesis according to which people believe samples to be very similar to one another and to the population from which they are drawn. We also suggested that people believe sampling to be a self-correcting process. The two beliefs lead to the same consequences. Both generate expectations about characteristics of samples, and the variability of these expectations is less than the true variability, at least for small samples.
The law of large numbers guarantees that very large samples will indeed be highly representative of the population from which they are drawn. If, in addition, a self-corrective tendency is at work, then small samples should also be highly representative and similar to one another. People's intuitions about random sampling appear to satisfy the law of small numbers, which asserts that the law of large numbers applies to small numbers as well.
Consider a hypothetical scientist who lives by the law of small numbers.
How would his belief affect his scientific work? Assume our scientist studies phenomena whose magnitude is small relative to uncontrolled variability, that is, the signal-to-noise ratio in the messages he receives from nature is low. Our scientist could be a meteorologist, a pharmacologist, or perhaps a psychologist.
If he believes in the law of small numbers, the scientist will have
exaggerated confidence in the validity of conclusions based on small samples. To illustrate, suppose he is engaged in studying which of two toys infants will prefer to play with. Of the first five infants studied, four have shown a preference for the same toy. Many a psychologist will feel some confidence at this point, that the null hypothesis of no preference is false. Fortunately, such a conviction is not a sufficient condition for journal publication, although it may do for a book. By a quick computation, our psychologist will discover that the probability of a result as
extreme as the one obtained is as high as 3/8 under the null hypothesis.
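The psychologist's "quick computation" can be made explicit with an exact binomial count. This sketch assumes that, under the null hypothesis, each infant independently picks either toy with probability 1/2; the helper `binomial_tail` is hypothetical. Because the text says only that four of five preferred "the same toy", the two-sided reading counts a 4-of-5 split toward either toy. Naming the favored toy in advance halves the figure.

```python
from fractions import Fraction
from math import comb


def binomial_tail(k: int, n: int, two_sided: bool = True) -> Fraction:
    """Exact P(k or more of n choices go one way) under a fair 50/50 null.

    With two_sided=True either toy may be the favored one, which doubles the
    single tail; this assumes k > n / 2 so the two tails do not overlap.
    """
    tail = sum(comb(n, j) for j in range(k, n + 1))
    if two_sided:
        tail *= 2
    return Fraction(tail, 2 ** n)


print(binomial_tail(4, 5))                   # 3/8
print(binomial_tail(4, 5, two_sided=False))  # 3/16
```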
To be sure, the application of statistical hypothesis testing to scientific inference is beset with serious difficulties. Nevertheless, the computation of significance levels (or likelihood ratios, as a Bayesian might prefer) forces the scientist to evaluate the obtained effect in terms of a valid estimate of sampling variance rather than in terms of his subjective biased estimate. Statistical tests, therefore, protect the scientific community against overly hasty rejections of the null hypothesis (i.e., Type I error) by policing its many members who would rather live by the law of small numbers. On the other hand, there are no comparable safeguards against the risk of failing to confirm a valid research hypothesis (i.e., Type II error).
Imagine a psychologist who studies the correlation between need for achievement and grades. When deciding on sample size, he may reason as follows: “What correlation do I expect? r = .35. What N do I need to make the result significant? N = 33.”
 
26 REPRESENTATIVENESS
The only flaw in this reasoning is that our psychologist has forgotten about sampling variation, possibly because he believes that any sample must be highly representative of its population. However, if his guess about the correlation in the population is correct, the correlation in the sample is about as likely to lie below or above .35. Hence, the likelihood of obtaining a significant result (i.e., the power of the test) for N = 33 is about .50.
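The claim that a sample of 33 gives only about even odds can be checked with Fisher's z approximation for correlations. In this sketch (the function `correlation_power` is hypothetical), the population correlation is assumed to equal the psychologist's guess of .35 exactly, the test is assumed to be two-tailed at the .05 level, and the negligible chance of a significant result in the wrong direction is ignored.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed test of H0: rho = 0 via Fisher's z.

    atanh(r) is treated as Normal(atanh(rho), 1 / sqrt(n - 3)); rejections in
    the wrong direction are ignored because they are negligible here.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)  # expected standardized Fisher z
    return 1 - NormalDist().cdf(z_crit - shift)


print(f"{correlation_power(0.35, 33):.2f}")  # 0.52
```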
In a detailed investigation of statistical power, J. Cohen (1962, 1969) has provided plausible definitions of large, medium, and small effects and an extensive set of computational aids to the estimation of power for a variety of statistical tests. In the normal test for a difference between two means, for example, a difference of .25σ is small, a difference of .50σ is medium, and a difference of 1σ is large, according to the proposed definitions. The mean IQ difference between clerical and semiskilled workers is a medium effect. In an ingenious study of research practice, J. Cohen (1962) reviewed all the statistical analyses published in one volume of the Journal of Abnormal and Social Psychology, and computed the likelihood of detecting
each of the three sizes of effect. The average power was .18 for the detection of small effects, .48 for medium effects, and .83 for large effects. If psychologists typically expect medium effects and select sample size as in the above example, the power of their studies should indeed be about .50.
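Cohen's effect-size conventions translate directly into power once a sample size is fixed. The sketch below uses a normal approximation to the two-tailed, two-sample test at the .05 level. The choice of 30 subjects per group is a hypothetical illustration, not a figure from Cohen's survey, and the helper `two_sample_power` is likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-tailed test for a mean difference.

    d is the difference in standard-deviation units, as in Cohen's conventions;
    with n per group the expected z statistic is d * sqrt(n / 2).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_crit - d * sqrt(n_per_group / 2))


# 30 subjects per group is a hypothetical design, not a figure from the text.
for label, d in [("small", 0.25), ("medium", 0.50), ("large", 1.00)]:
    print(f"{label:>6} (d = {d:.2f}): power ~ {two_sample_power(d, 30):.2f}")
```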
Cohen's analysis shows that the statistical power of many psychological studies is ridiculously low. This is a self-defeating practice: it makes for frustrated scientists and inefficient research. The investigator who tests a valid hypothesis but fails to obtain significant results cannot help but regard nature as untrustworthy or even hostile. Furthermore, as Overall (1969) has shown, the prevalence of studies deficient in statistical power is not only wasteful but actually pernicious: it results in a large proportion of invalid rejections of the null hypothesis among published results.
Because considerations of statistical power are of particular importance in the design of replication studies, we probed attitudes concerning replication in our questionnaire.
Suppose one of your doctoral students has completed a difficult and time-consuming experiment on 40 animals. He has scored and analyzed a large number of variables. His results are generally inconclusive, but one before-after comparison yields a highly significant t = 2.70, which is surprising and could be of major theoretical significance.
Considering the importance of the result, its surprisal value, and the number of analyses that your student has performed, would you recommend that he replicate the study before publishing? If you recommend replication, how many animals would you urge him to run?
Among the psychologists to whom we put these questions there was
out of 75 respondents, probably because they suspected that the single significant result was due to chance. The median recommendation was for the doctoral student to run 20 subjects in a replication study. It is instructive to consider the likely consequences of this advice. If the mean and the variance in the second sample are actually identical to those in the first sample, then the resulting value of t will be 1.88. Following the reasoning of Footnote 1, the student's chance of obtaining a significant result in the replication is only slightly above one-half (for p = .05, one-tail test). Since we had anticipated that a replication sample of 20 would appear reasonable to our respondents, we added the following question:
Assume that your unhappy student has in fact repeated the initial study with 20 additional animals, and has obtained an insignificant result in the same direction, t = 1.24. What would you recommend now? Check one: [the numbers in parentheses refer to the number of respondents who checked each answer]
(a) He should pool the results and publish his conclusion as fact. (0)
(b) He should report the results as a tentative finding. (26)
(c) He should run another group of [median 20] animals. (21)
(d) He should try to find an explanation for the difference between the two groups. (30)
Note that regardless of one’s confidence in the original finding, its credibility is surely enhanced by the replication. Not only is the experimental effect in the same direction in the two samples but the magnitude of the effect in the replication is fully two-thirds of that in the original study. In view of the sample size (20), which our respondents recommended, the replication was about as successful as one is entitled to expect. The distribution of responses, however, reflects continued skepticism concerning the student's finding following the recommended replication. This unhappy state of affairs is a typical consequence of insufficient statistical power.
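The value of 1.88 and the "slightly above one-half" chance can be reproduced under explicit assumptions, which are ours rather than the authors'. The sketch treats the before-after comparison as a one-sample t and assumes the replication's mean and variance (variance computed with divisor n) match the original's, so that t scales with the square root of n - 1. It takes 1.729 as the tabled one-tailed .05 critical value for 19 degrees of freedom and approximates the replication t as normal with unit variance. The names `replication_t`, `t_rep`, and `chance` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def replication_t(t_orig: float, n_orig: int, n_rep: int) -> float:
    """t expected in a replication whose mean and (divisor-n) variance match the original."""
    return t_orig * sqrt((n_rep - 1) / (n_orig - 1))


t_rep = replication_t(2.70, n_orig=40, n_rep=20)
t_crit = 1.729  # tabled one-tailed .05 critical value of t, 19 df
# Rough approximation: the observed replication t ~ Normal(t_rep, 1).
chance = 1 - NormalDist(mu=t_rep, sigma=1).cdf(t_crit)
print(f"expected t = {t_rep:.2f}; chance of p < .05 ~ {chance:.2f}")
print(f"observed t of 1.24 is {1.24 / t_rep:.0%} of the expected value")
```

The last line shows where "fully two-thirds" comes from: with equal sample sizes, the ratio of t values is the ratio of effects.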
In contrast to Responses b and c, which can be justified on some grounds, the most popular response, Response d, is indefensible. We doubt that the same answer would have been obtained if the respondents had realized that the difference between the two studies does not even approach significance. (If the variances of the two samples are equal, t for the difference is .53.) In the absence of a statistical test, our respondents followed the representation hypothesis: as the difference between the two samples was larger than they expected, they viewed it as worthy of explanation. However, the attempt to find an explanation for the difference between the two groups is in all probability an exercise in explaining noise.
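The parenthetical t of .53 for the difference between the two studies can be approximated in a few lines. This sketch converts each t to a standardized mean change d = t / sqrt(n), assumes both studies share one known standard deviation, and treats them as independent groups. Under these assumptions, which are ours rather than the authors', the result lands near .55, close to the published .53; the small gap reflects how the variances are estimated. The helper `standardized_change` is hypothetical.

```python
from math import sqrt


def standardized_change(t: float, n: int) -> float:
    """Mean change in SD units implied by a one-sample t on n subjects."""
    return t / sqrt(n)


d_orig = standardized_change(2.70, 40)  # original study
d_rep = standardized_change(1.24, 20)   # replication
# Independent groups with a common SD: the SE of d_orig - d_rep is sqrt(1/n1 + 1/n2).
t_diff = (d_orig - d_rep) / sqrt(1 / 40 + 1 / 20)
print(f"d: {d_orig:.2f} vs {d_rep:.2f}; t for the difference ~ {t_diff:.2f}")
```

A difference this small is well within what sampling variation alone produces, which is why searching for a causal explanation of it amounts to explaining noise.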
Altogether our respondents evaluated the replication rather harshly. This follows from the representation hypothesis: if we expect all
should be statistically significant. The harshness of the criterion for successful replication is manifest in the responses to the following question:
An investigator has reported a result that you consider implausible. He ran 15 subjects, and reported a significant value, t = 2.46. Another investigator has attempted to duplicate his procedure, and he obtained a nonsignificant value of t with the same number of subjects. The direction was the same in both sets of data.
You are reviewing the literature. What is the highest value of t in the second set of data that you would describe as a failure to replicate?
The majority of our respondents regarded t = 1.70 as a failure to replicate.
If the data of two such studies (t = 2.46 and t = 1.70) are pooled, the value of t for the combined data is about 3.00 (assuming equal variances). Thus, we are faced with a paradoxical state of affairs, in which the same data that would increase our confidence in the finding when viewed as part of the original study, shake our confidence when viewed as an independent study. This double standard is particularly disturbing since, for many reasons, replications are usually considered as independent studies, and hypotheses are often evaluated by listing confirming and disconfirming reports.
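The figure of about 3.00 for the pooled data follows from averaging the two standardized effects. The sketch assumes both studies are one-sample t tests on 15 subjects with equal variances. It uses a convention in which t grows with the square root of n - 1 (variance computed with divisor n), and it ignores the small extra spread introduced by the two means differing. `pooled_t` is a hypothetical helper.

```python
from math import sqrt


def pooled_t(t1: float, t2: float, n: int) -> float:
    """Approximate t for two equal-sized one-sample studies pooled together."""
    d1, d2 = t1 / sqrt(n - 1), t2 / sqrt(n - 1)  # effects in (divisor-n) SD units
    return (d1 + d2) / 2 * sqrt(2 * n - 1)       # pooled effect times sqrt(N - 1)


print(f"pooled t ~ {pooled_t(2.46, 1.70, n=15):.2f}")
```

The "failed" replication, once pooled, yields a stronger result than the original study alone.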
Contrary to a widespread belief, a case can be made that a replication
sample should often be larger than the original. The decision to replicate a once obtained finding often expresses a great fondness for that finding and a desire to see it accepted by a skeptical community. Since that community unreasonably demands that the replication be independently significant, or at least that it approach significance, one must run a large sample. To illustrate, if the unfortunate doctoral student whose thesis was
discussed earlier assumes the validity of his initial result (t = 2.70, N = 40), and if he is willing to accept a risk of only .10 of obtaining a t lower than 1.70, he should run approximately 50 animals in his replication study. With a somewhat weaker initial result (t = 2.20, N = 40), the size of the replication sample required for the same power rises to about 75.
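A minimal sketch of the sample-size calculation, under stated assumptions: the original effect d = t / sqrt(N) is taken as the true effect, the replication t is approximated as normal with mean d times the square root of n and unit variance, and the floor is t = 1.70, the value that reproduces the figures of about 50 and 75. The helper `replication_n` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def replication_n(t_orig: float, n_orig: int, t_floor: float, risk: float) -> int:
    """Smallest n with at most `risk` chance that the replication t falls below t_floor.

    Treats the original effect d = t_orig / sqrt(n_orig) as the true effect and
    approximates the replication t as Normal(d * sqrt(n), 1).
    """
    d = t_orig / sqrt(n_orig)
    needed = t_floor + NormalDist().inv_cdf(1 - risk)  # expected t required
    return ceil((needed / d) ** 2)


print(replication_n(2.70, 40, t_floor=1.70, risk=0.10))
print(replication_n(2.20, 40, t_floor=1.70, risk=0.10))
```

Because n grows with the inverse square of the effect, a modest drop in the initial t (2.70 to 2.20) inflates the required replication sample by half.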
That the effects discussed thus far are not limited to hypotheses about means and variances is demonstrated by the responses to the following question:
You have run a correlational study, scoring 20 variables on 100 subjects. Twenty-seven of the 190 correlation coefficients are significant at the .05 level; and 9 of these are significant beyond the .01 level. The mean absolute level of the significant correlations is .31, and the pattern of results is very reasonable on theoretical grounds. How many of the 27 significant correlations would you expect to be significant again, in an exact replication of the study, with N = 40?
With N = 40, a correlation of about .31 is required for significance at the .05 level. This is the mean of the significant correlations in the original study. Thus, only about half of the originally significant correlations (i.e.,
correlations in the replication are bound to differ from those in the original study. Hence, by regression effects, the initially significant coefficients are most likely to be reduced. Thus, 8 to 10 repeated significant correlations from the original 27 is probably a generous estimate of what one is entitled to expect. The median estimate of our respondents is 18. This is more than the number of repeated significant correlations that will be found if the correlations are recomputed for 40 subjects randomly selected from the original 100. Apparently, people expect more than a mere duplication of the original statistics in the replication sample: they expect a duplication of the significance of results, with little regard for sample size. This expectation requires a ludicrous extension of the representation hypothesis: even the law of small numbers is incapable of generating such a result.
The expectation that patterns of results are replicable almost in their entirety provides the rationale for a common, though much deplored practice. The investigator who computes all correlations between three indexes of anxiety and three indexes of dependency will often report and interpret with great confidence the single significant correlation obtained. His confidence in the shaky finding stems from his belief that the obtained correlation matrix is highly representative and readily replicable.
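A rough calculation shows why a lone significant entry in a three-by-three correlation matrix deserves little confidence. The sketch assumes all nine population correlations are zero and, unrealistically, that the nine tests are independent; indexes of the same construct are usually intercorrelated, so this is a ballpark figure only. `chance_of_spurious_hit` is a hypothetical name.

```python
def chance_of_spurious_hit(k_tests: int, alpha: float = 0.05) -> float:
    """P(at least one significant result among k independent tests of true nulls)."""
    return 1 - (1 - alpha) ** k_tests


# 3 anxiety indexes x 3 dependency indexes = 9 correlations.
print(f"{chance_of_spurious_hit(3 * 3):.2f}")  # 0.37
```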
In review, we have seen that the believer in the law of small numbers practices science as follows:
1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power.
2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance.
3. In evaluating replications, his or others’, he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals.
4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal “explanation” for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.
Our questionnaire elicited considerable evidence for the prevalence of the belief in the law of small numbers.² Our typical respondent is a believer, regardless of the group to which he belongs. There were practically no differences between the median responses of audiences at a mathematical psychology meeting and at a general session of the American Psychological Association convention, although we make no claims for the representativeness of either sample. Apparently, acquaintance with formal logic and with probability theory does not extinguish erroneous intuitions. What, then, can be done? Can the belief in the law of small numbers be abolished or at least controlled?

² W. Edwards (1968, 25) has argued that people fail to extract sufficient information or certainty from probabilistic data; he called this failure conservatism. Our respondents can hardly be described as conservative. Rather, in accord with the representation hypothesis,

Research experience is unlikely to help much, because sampling variation is all too easily “explained.” Corrective experiences are those that provide neither motive nor opportunity for spurious explanation. Thus, a student in a statistics course may draw repeated samples of given size from a population, and learn the effect of sample size on sampling variability from personal observation. We are far from certain, however, that expectations can be corrected in this manner, since related biases, such as the gambler’s fallacy, survive considerable contradictory evidence.
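The classroom exercise described above is easy to run on a computer. This sketch assumes a normal population with mean 100 and SD 15 (an IQ-like scale chosen for illustration only), draws 2,000 samples at each of three sizes, and compares the observed spread of the sample means with the theoretical value 15 / sqrt(n). The helper name, seed, and draw count are arbitrary.

```python
import random
from math import fsum, sqrt
from statistics import stdev


def spread_of_sample_means(n: int, draws: int = 2000, seed: int = 7) -> float:
    """SD of the means of `draws` random samples of size n.

    Population assumed normal with mean 100 and SD 15 (illustrative only).
    """
    rng = random.Random(seed)
    means = [fsum(rng.gauss(100, 15) for _i in range(n)) / n for _ in range(draws)]
    return stdev(means)


for n in (5, 20, 80):
    print(f"n = {n:2d}: observed SD of means {spread_of_sample_means(n):.2f}, "
          f"theory 15/sqrt(n) = {15 / sqrt(n):.2f}")
```

Quadrupling the sample size halves the spread of the means, a regularity that intuition governed by the law of small numbers tends to miss.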
Even if the bias cannot be unlearned, students can learn to recognize its existence and take the necessary precautions. Since the teaching of statistics is not short on admonitions, a warning about biased statistical intuitions may not be out of place. The obvious precaution is computation. The believer in the law of small numbers has incorrect intuitions about significance level, power, and confidence intervals. Significance levels are usually computed and reported, but power and confidence limits are not. Perhaps they should be.
Explicit computation of power, relative to some reasonable hypothesis, for instance, J. Cohen’s (1962, 1969) small, large, and medium effects, should surely be carried out before any study is done. Such computations will often lead to the realization that there is simply no point in running the study unless, for example, sample size is multiplied by four. We refuse to believe that a serious investigator will knowingly accept a .50 risk of failing to confirm a valid research hypothesis. In addition, computations of power are essential to the interpretation of negative results, that is, failures to reject the null hypothesis. Because readers’ intuitive estimates of power are likely to be wrong, the publication of computed values does not appear to be a waste of either readers’ time or journal space.
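Quadrupling the sample can rescue a study run at the .50 power level because the expected test statistic grows with the square root of the sample size, so four times the subjects doubles it. The sketch assumes a two-tailed .05 z test and a hypothetical design whose expected statistic sits exactly at the critical value (power .50). `power_from_shift` and `base_shift` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist


def power_from_shift(shift: float, alpha: float = 0.05) -> float:
    """Power of a two-tailed z test whose expected statistic equals `shift`."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_crit - shift)


# Hypothetical design sitting exactly at the critical value, i.e. power .50.
base_shift = NormalDist().inv_cdf(0.975)
for factor in (1, 2, 4):
    # The expected statistic grows with the square root of the sample size.
    print(f"n x {factor}: power ~ {power_from_shift(base_shift * sqrt(factor)):.3f}")
```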
In the early psychological literature, the convention prevailed of reporting, for example, a sample mean as M ± PE, where PE is the probable error (i.e., the 50% confidence interval around the mean). This convention was later abandoned in favor of the hypothesis-testing formulation. A confidence interval, however, provides a useful index of sampling variability, and it is precisely this variability that we tend to underestimate. The emphasis on significance levels tends to obscure a fundamental distinction between the size of an effect and its statistical significance. Regardless of sample size, the size of an effect in one study is a reasonable estimate of the size of the effect in replication. In contrast, the estimated significance level in a replication depends critically on sample size. expec-
if the distinction between size and significance is clarified, and if the computed size of observed effects is routinely reported. From this point of view, at least, the acceptance of the hypothesis-testing model has not been an unmixed blessing for psychology.
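Reporting an effect together with its probable error or confidence interval keeps its size separate from its significance. In this sketch, effects are in standard-deviation units with a known population SD of 1, the observed effect of 0.4 is hypothetical, and PE is computed as 0.6745 standard errors (the half-width of the 50% interval). Across sample sizes the estimated effect stays the same, while the z statistic, and hence the significance level, changes with n. `summarize` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def summarize(d: float, n: int) -> dict:
    """Effect size, probable error, 95% interval, and z for a one-sample mean.

    Assumes scores in SD units with known population SD = 1, so SE = 1 / sqrt(n).
    """
    se = 1 / sqrt(n)
    pe = NormalDist().inv_cdf(0.75) * se       # half-width of the 50% interval
    half95 = NormalDist().inv_cdf(0.975) * se  # half-width of the 95% interval
    return {"d": d, "PE": pe, "CI95": (d - half95, d + half95), "z": d / se}


for n in (20, 80, 320):
    s = summarize(0.4, n)
    lo, hi = s["CI95"]
    print(f"n = {n:3d}: d = {s['d']:.2f} ± {s['PE']:.3f} (PE), "
          f"95% CI [{lo:.2f}, {hi:.2f}], z = {s['z']:.2f}")
```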
The true believer in the law of smal