“The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ (‘I found it!’) but rather ‘hmm.... That’s funny.’ ”

Isaac Asimov

Working Memory Constraints and Biases in Sequential Binary Decision-Making Tasks

Alternation bias as a consequence of the small sample (working memory constrained) properties of autocorrelation

Kim Kaivanto
Department of Economics
Lancaster University Management School

Encoding of the Past in Economic Agents and Institutions
The Scottish Institute for Advanced Studies

Friday 9 January 2009, 11:30−12:00

Why is the perception of randomness important? It is the basis for subjective probability reasoning, induction, inference, etc.

Why binary sequences? They are the basic, fundamental, simplest case. Many problems, both evolutionarily important and current, either inherently have a binary structure or can be simplified/reduced to a binary structure.

What is alternation bias and why is it important? ‘Alternation bias’ is a robust and widely replicated finding, described in greater detail below. It is the (or at least ‘an’) accepted explanation for

• the gambler’s fallacy, the ‘hot hand’, overattribution, …
• implications for: the disposition effect, escalation of commitment, the St. Petersburg Paradox, …

Motivation

The individual experiences or ‘sees’ random sequences through the narrow window of working memory. Typical individuals do not store the full history of a random sequence in episodic (long-term) memory. Instead, let us conjecture that individuals typically store their subjective characterisations of random sequences in semantic (long-term) memory: specifically, a subjective representation/analogue of ‘autocorrelation’. This is the ‘wetware’ analogue of data compression. These subjective autocorrelation characterisations are formed in working memory, and are thereby constrained. The small-sample (working memory span) sampling distributions of ρ1, ρ2, ρ3, … become central to the perception of randomness.

Negative bias in these sampling distributions explains alternation bias.
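For concreteness, the lag-1 quantity can be written with the standard sample autocorrelation estimator. This is an assumption: the slides do not state which estimator is intended. Here x_t codes the outcome at position t (e.g. H = 1, T = 0), n is the window length (the working memory span), and x̄ is the mean within the window.

```latex
r_1 = \frac{\sum_{t=1}^{n-1} (x_t - \bar{x})(x_{t+1} - \bar{x})}
           {\sum_{t=1}^{n} (x_t - \bar{x})^2},
\qquad
\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t
```

Under this definition a strictly alternating window of length 4, such as HTHT, gives r_1 = −0.75, which agrees with the lowest value tabulated for samples of size 4 later in these slides.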

The idea

Sequences (a) and (b) were each generated by a randomisation device designed to generate an i.i.d. random sequence with P(H)=P(T)=½ and P(T|T)=P(H|T)=P(H|H)=P(T|H)=½.

(a) HHHTTHHTHHTTHTHHTTTTH
(b) HHTTHTHTHHHTTHTTTHHTH

One of the two devices is known to operate properly, while the other is known to be defective.

Q. Which of the two series is more likely to have been generated by the defective device?

Alternation Bias

(a) P(T|T)=P(H|T)=P(H|H)=P(T|H)=½
(b) P(T|H)=P(H|T)=.6, P(T|T)=P(H|H)=.4

People consistently judge series with alternation rates of .6 to be “random”, whereas an alternation rate of .5 produces runs too long to be thought of as “random”.

People tend to choose (a)

When actually (b) is ‘defective’
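The alternation rates quoted for the two sequences can be checked directly. The sketch below is illustrative only; the helper names `alternation_rate` and `longest_run` are hypothetical and not taken from the talk.

```python
from itertools import groupby

def alternation_rate(seq):
    """Fraction of adjacent pairs in which the outcome switches."""
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    return switches / (len(seq) - 1)

def longest_run(seq):
    """Length of the longest block of identical consecutive outcomes."""
    return max(len(list(block)) for _, block in groupby(seq))

SEQ_A = "HHHTTHHTHHTTHTHHTTTTH"
SEQ_B = "HHTTHTHTHHHTTHTTTHHTH"

for name, seq in (("a", SEQ_A), ("b", SEQ_B)):
    print(name, alternation_rate(seq), longest_run(seq))
```

Sequence (a) switches on 10 of its 20 adjacent pairs (rate .5, longest run 4), while (b) switches on 12 of 20 (rate .6, longest run 3).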

Alternation Bias

For higher-order effects, consider the conditional probabilities embodying ‘alternation bias’ as reported by Rabin (2002) QJE:

P(T|T) = 0.42, so P(H|T) = 0.58
P(T|TT) = 0.38, so P(H|TT) = 0.62
P(T|TTT…) = 0.30, so P(H|TTT…) = 0.70

Thus the literature supplies the following alternation bias estimates:

             First order          Higher order
             P(H|T)    P(H|T)     P(H|TT)    P(H|TTT)
Estimate     .6        .58        .62        .70

n.b. These estimates are elicited without invoking monetary payoffs. Hence, there can be no confounding of (i) conditional probability distortion with (ii) outcome value weighting.

Alternation Bias

Wagenaar (1970) Acta Psychologica; (1972) Psychological Bulletin

• “The results showed that sequences with conditional probabilities of around .4 [and of alternation of .6] were judged to be most random”

Bar-Hillel & Wagenaar (1991) Advances in Applied Mathematics

• “People’s perception of randomness is biased in that they see clumps or streaks in truly random series and expect more alternation, or shorter runs, than are there. Similarly, they produce series with higher than expected alternation rates. … The subjectively ideal random sequence obeys “local representativeness”; namely, in short segments of it, it represents both the relative frequencies (e.g., for a coin, 50%–50%) and the irregularity (avoidance of runs and other patterns).”

• most explanations offered are mostly qualitative and ad hoc in nature

Alternation Bias

Some explanations:

Ross & Levy (1958), Teraoka (1963) – faulty subjective concept of randomness

Tune (1964), Baddeley (1966), Weiss (1964, 1965), Machado (1993) – functional limitations of subjects, e.g. limited capacity of memory, limited attention memory, recall blocking/interference, forgetting

Falk (1981), Kahneman & Tversky (1972) – judgemental heuristic: “representativeness bias”

Neuringer (1986) – lack of skill (which can, however, be learned with appropriate feedback training)

Alternation Bias

Rabin (2002) ‘Inference by Believers in the Law of Small Numbers’, QJE, 117(3)

R’02→ ‘the law of small numbers’ a.k.a. ‘the local representativeness effect’ is responsible for e.g. the Gambler’s Fallacy and belief in the ‘hot hand’.

R’02→ uses an urn analogy (explain/illustrate!)

R’02→ ‘overinference’ of e.g. fund manager skill, or of market price trends

R’02→ “If a person believes every pair of flips of a fair coin generates one head and one tail, then he believes that two heads in a row indicates a biased coin.

If he believes that an average fund manager is successful once every two years, then he believes that a fund manager who is successful two years in a row must be unusually good.

I formalize this overinference result by showing that, after two signals, a believer in the law of small numbers always has on average more extreme beliefs than he should.”
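The urn analogy and the overinference claim can be illustrated with a small worked sketch. Everything specific here is an assumption made for illustration, not a figure from Rabin (2002) or from the talk: three candidate success rates (1/4, 1/2, 3/4), a uniform prior over them, and a believer who thinks two signals are drawn without replacement from an urn of four signals. The function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical candidate success rates for a fund manager (illustrative only).
RATES = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

def iid_likelihood(rate):
    """Correct model: two successes from independent draws."""
    return rate * rate

def urn_likelihood(rate, urn_size=4):
    """Believer's model (assumed): two successes drawn WITHOUT replacement
    from an urn holding urn_size signals, rate * urn_size of them successes."""
    successes = rate * urn_size
    return (successes / urn_size) * ((successes - 1) / (urn_size - 1))

def posterior_after_two_successes(likelihood):
    """Posterior over RATES after two successes, starting from a uniform prior."""
    weights = {rate: likelihood(rate) for rate in RATES}
    total = sum(weights.values())
    return {rate: weight / total for rate, weight in weights.items()}

bayesian = posterior_after_two_successes(iid_likelihood)
believer = posterior_after_two_successes(urn_likelihood)

top = Fraction(3, 4)
print("Bayesian P(rate = 3/4):", bayesian[top])   # 9/14
print("Believer P(rate = 3/4):", believer[top])   # 3/4
```

Under these assumed numbers the believer places probability 3/4 on the highest rate after two successes, against 9/14 for a correct Bayesian, which is one instance of the “more extreme beliefs” described in the quotation.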

Rapoport & Budescu (1997) Psychological Review

People “produce sequences with too few symmetries and long runs and too many alternations among events. The authors propose a psychological theory to account for these findings, which assumes that subjects generate non-random sequences that locally represent theoretical random series subject to a constraint on their short-term memory. Closed-form expressions are then derived for the major statistics that have been used to test for deviations from randomness.”

Limited working memory and ‘Narrow Window Theory’

Human working memory is limited to 7±2 items (e.g. Miller, 1956). However, it can be as low as 4 or as high as 10.

Yaakov Kareev (1995, 1997ab, 2000, 2004, 2007): ‘Narrow Window Theory’

“The samples of data that people use in their attempts to detect relationships in the environment are limited in size by working memory capacity.

…Pearson's rxy distribution is skewed when the population correlation differs from zero (i.e., when a correlation exists), and the more so, the smaller the sample.

As both the median and the mode of the sampling distribution are more extreme than the population value, it follows that samples likely to be encountered indicate a correlation stronger than that in the population.

The limited capacity of working memory may serve as an amplifier that helps people to avoid missing strong relationships.

As the distribution is more skewed the smaller the sample size, the effect suggests an explanation for the fact that young children detect meaningful covariation fairly rapidly.”

Yaakov Kareev (2000) Seven (Indeed, Plus or Minus Two) and the Detection of Correlations, Psychological Review, 107(2).
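The skew described in the quotation can be explored with a small simulation. This is a sketch under stated assumptions, not Kareev's procedure: the population correlation of 0.5, the sample size of 7, the bivariate-normal data, the seed and the number of draws are all illustrative choices, and the function names are hypothetical.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_r(rho, n, rng):
    """One sample correlation from n bivariate-normal pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson_r(xs, ys)

RHO, N, DRAWS = 0.5, 7, 20000      # assumed values, for illustration only
rng = random.Random(2009)          # fixed seed so the run is repeatable
rs = [sample_r(RHO, N, rng) for _ in range(DRAWS)]

mean_r = statistics.mean(rs)
median_r = statistics.median(rs)
print("mean of r:  ", round(mean_r, 3))
print("median of r:", round(median_r, 3))
```

If the quoted claim holds for these settings, the printed median should sit above the population value of 0.5, while the mean, pulled down by the long lower tail, sits below it.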

What about the effect of limited sample size (working memory) on perception of (auto)correlation in sequences?

Huitema & McKean (1991)

How does small-sample bias manifest for (i) fair and (ii) ρ1=0 binary sequences?

+ Binary sequences are basic, interesting & fundamental.

→ Is the empirical small-sample bias negative?

→ An explanation for Alternation Bias?

→ An explanation for the Gambler’s Fallacy, Hot Hand / Overattribution?

+ Explanations based on a documented, robust memory constraint, rather than an ‘urn analogy’ or mere assumption of ‘belief in the law of small numbers’

+ Testable hypotheses concerning:

• data compression via the narrow window vs. no compression
• window ‘width’ and the magnitude of alternation bias

Sampling distribution of ρ1 for the sample size 10 drawn 10,000 times from a fair and memoryless Bernoulli process (true, not pseudo-random numbers).

Mean = -0.102688 μ2 = 0.0849096 μ3 = 0.00616234 μ4 = 0.0209152

Value    Frequency   Percent   Valid Percent   Cumulative Percent
-.900           21        .2              .2                   .2
-.900            3        .0              .0                   .2
-.733            9        .1              .1                   .3
-.733            9        .1              .1                   .4
-.733           49        .5              .5                   .9
-.733           38        .4              .4                  1.3
-.700           88        .9              .9                  2.2
-.700           15        .2              .2                  2.3
-.567           42        .4              .4                  2.7
-.567           91        .9              .9                  3.7
-.567          231       2.3             2.3                  6.0
.                .         .               .                    .
-.011           15        .2              .2                 61.0
.                .         .               .                    .
.624            45        .5              .5                 99.3
.624             3        .0              .0                 99.3
.683            18        .2              .2                 99.5
.683            19        .2              .2                 99.7
.683             3        .0              .0                 99.7
.700             1        .0              .0                 99.7
.700            15        .2              .2                 99.9
1.000           13        .1              .1                100.0
Total        10000     100.0           100.0

Limited working memory, Alternation Bias

Sampling distribution of ρ1 for the sample size 7 drawn 10,000 times from a fair and memoryless Bernoulli process (true, not pseudo-random numbers).

Mean = -0.129589 μ2 = 0.117093 μ3 = 0.0244065 μ4 = 0.0495815

Value    Frequency   Percent   Valid Percent   Cumulative Percent
-.857          180       1.8             1.8                  1.8
-.857            9        .1              .1                  1.9
-.607          886       8.9             8.9                 10.8
-.607           67        .7              .7                 11.4
-.457          345       3.5             3.5                 14.9
-.457          444       4.4             4.4                 19.3
-.457          158       1.6             1.6                 20.9
-.357          348       3.5             3.5                 24.4
-.357          129       1.3             1.3                 25.7
-.274           44        .4              .4                 26.1
-.274          922       9.2             9.2                 35.3
-.257          651       6.5             6.5                 41.8
-.257          182       1.8             1.8                 43.7
-.257          423       4.2             4.2                 47.9
.                .         .               .                    .
-.024           70        .7              .7                 78.3
.                .         .               .                    .
.443            61        .6              .6                 92.8
.443           163       1.6             1.6                 94.4
.443            80        .8              .8                 95.2
.560            20        .2              .2                 95.4
.560           229       2.3             2.3                 97.7
.560            86        .9              .9                 98.5
1.000          146       1.5             1.5                100.0
Total        10000     100.0           100.0

Limited working memory, Alternation Bias

Sampling distribution of ρ1 for the sample size 5 drawn 10,000 times from a fair and memoryless Bernoulli process (true, not pseudo-random numbers).

Mean = -0.135855 μ2 = 0.173576 μ3 = 0.0705881 μ4 = 0.120731

Value    Frequency   Percent   Valid Percent   Cumulative Percent
-.800          376       3.8             3.8                  3.8
-.800          317       3.2             3.2                  6.9
-.467          271       2.7             2.7                  9.6
-.467         1434      14.3            14.3                 24.0
-.467          566       5.7             5.7                 29.6
-.467          281       2.8             2.8                 32.5
-.300          424       4.2             4.2                 36.7
-.300          583       5.8             5.8                 42.5
-.300          264       2.6             2.6                 45.2
-.300          519       5.2             5.2                 50.4
-.133          288       2.9             2.9                 53.2
-.133           67        .7              .7                 53.9
-.133          284       2.8             2.8                 56.7
-.050          268       2.7             2.7                 59.4
-.050          358       3.6             3.6                 63.0
-.050          277       2.8             2.8                 65.8
-.050          293       2.9             2.9                 68.7
-.050           52        .5              .5                 69.2
.033           112       1.1             1.1                 70.3
.033           610       6.1             6.1                 76.4
.033           544       5.4             5.4                 81.9
.367           834       8.3             8.3                 90.2
.367           401       4.0             4.0                 94.2
1.000          577       5.8             5.8                100.0
Total        10000     100.0           100.0

Limited working memory, Alternation Bias

Value    Frequency   Percent   Valid Percent   Cumulative Percent
-.750         1347      13.5            13.5                 13.5
-.417         2463      24.6            24.6                 38.1
-.250         1273      12.7            12.7                 50.8
-.083         2451      24.5            24.5                 75.3
.250          1252      12.5            12.5                 87.9
1.000         1214      12.1            12.1                100.0
Total        10000     100.0           100.0

Mean = -0.1032 μ2 = 0.246817 μ3 = 0.124055 μ4 = 0.207792

Sampling distribution of ρ1 for the sample size 4 drawn 10,000 times from a fair and memoryless Bernoulli process (true, not pseudo-random numbers).
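The simulated means above can be cross-checked by exact enumeration, since a fair, memoryless process makes all 2^n binary windows of length n equally likely. The sketch below rests on two assumptions that the slides do not state: that ρ1 is estimated with the standard sample lag-1 autocorrelation, and that constant windows (all H or all T, where the estimator is 0/0) are assigned 1.0, a convention adopted here because 1.000 appears among the tabulated values. The function names are hypothetical.

```python
from itertools import product

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a numeric sequence.

    Assumption: constant sequences (zero variance) are assigned 1.0;
    the slides do not state how such windows were handled.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 1.0
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return num / denom

def exact_distribution(n):
    """r1 for every one of the 2**n equally likely binary windows of length n."""
    return [lag1_autocorr(seq) for seq in product((0, 1), repeat=n)]

def exact_mean_r1(n):
    """Exact mean of r1 under a fair, memoryless Bernoulli process."""
    values = exact_distribution(n)
    return sum(values) / len(values)

for n in (4, 5, 7, 10):
    print(n, round(exact_mean_r1(n), 4))
```

For n = 4 the enumeration yields the six values −.750, −.417, −.250, −.083, .250 and 1.000 seen in the table, with an exact mean of −0.09375, compared with the simulated mean of −0.1032 reported above.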

Limited working memory, Alternation Bias

Sampling distribution of ρ1 for samples of size 10, size 7, and size 5.

Limited working memory, Alternation Bias