
1

Learning and the Economics of Small Decisions

Ido Erev and Ernan Haruvy

Mainstream analyses of economic behavior assume that incentives shape behavior even when individual agents have limited understanding of the environment. The shaping process in these cases is indirect: The economic incentives determine the agents’ experience, and this experience in turn drives future behavior.

Consider, for example, an agent that has to decide whether to cross the road at a particular location and time.

The agent is not likely to understand the exact incentive structure and compute the implied equilibria. Rather, she (he or it) is likely to respond to past experiences.

The current chapter reviews experimental studies that explore this shaping process.

2

The clicking paradigm

The current experiment includes many trials. Your task, in each trial, is to click on one of the two keys presented on the screen. Each click will be followed by the presentation of the keys’ payoffs. Your payoff for the trial is the payoff of the selected key.

You selected Right. Your payoff in this trial is 1.

Had you selected Left, your payoff would be 0.

Not a test of rational economic theory

The rationality assumption is not even wrong

3

Problem   S   R                P(R), %
5         0   (10, .1; -1)     27
6         0   (-10, .1; +1)    60

400 trials, ¼ cent per point

1. Underweighting of rare events (Barron & Erev, 2003)

Risk Seeking

Experience-Description gap (Hertwig et al., 2004)

Occurs in one-shot decisions from sampling

Implies a reversed Allais paradox: (4, .8) > 3, but (4, .2) ~ (3, .25)

Robust to prior information (Lejarraga & Gonzalez, 2011)

A similar pattern appears in honey bees (Shafir et al., 2008)

Taleb’s Black Swan effect

Sensitivity to magnitude: -20 vs. -10 (Ert & Erev, 2010)

Risk Aversion
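One way to see how reliance on small samples can produce this pattern is a short simulation. The sketch below is illustrative only: it assumes an agent that recalls a small sample of outcomes from the risky option and picks it when the sample mean beats the safe payoff. The function name `risky_choice_rate`, the sample size of 5 and the number of simulated agents are assumptions, not values taken from the studies above.

```python
import random


def risky_choice_rate(outcomes, probs, safe=0.0, sample_size=5,
                      n_agents=50_000, seed=1):
    """Share of simulated agents whose small sample favors the risky option."""
    rng = random.Random(seed)
    chose_risky = 0
    for _ in range(n_agents):
        # Each agent relies on only `sample_size` draws from the risky option.
        sample = rng.choices(outcomes, weights=probs, k=sample_size)
        if sum(sample) / sample_size > safe:
            chose_risky += 1
    return chose_risky / n_agents


# Problem 5: R = (10, .1; -1) has a positive expected value (+0.1).
rate_5 = risky_choice_rate([10, -1], [0.1, 0.9])
# Problem 6: R = (-10, .1; +1) has a negative expected value (-0.1).
rate_6 = risky_choice_rate([-10, 1], [0.1, 0.9])
print(f"Problem 5: {rate_5:.2f}  Problem 6: {rate_6:.2f}")
```

With a sample of 5, the rare outcome is missing from the sample with probability 0.9^5, about 0.59. The sketch therefore picks R in roughly 41% of cases in Problem 5 and 59% in Problem 6. This reproduces the direction of the observed rates (27% and 60%), not their exact values.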

4

Problem   H                L                P(H), %
1         1                0                96
2         (11, .5; -9)     0                58
3         0                (9, .5; -11)     53

2. The payoff variability effect (Myers & Sadler, 1960; Busemeyer & Townsend, 1993).

Risk aversion or loss aversion? Neither!

[Figure: proportion of H choices in blocks of 20 trials, for Problems 1, 2 and 3.]

3. The Big Eye effect (Ben Zion et al., 2010; Grosskopf et al., 2006)

x ~ N(0, 300), y ~ N(0, 300)

R1: x

R2: y

M: Mean(R1, R2) + 5

[Figure: proportion of choices of asset M by trial (trials 1 to 90).]

Deviation from: maximization, risk aversion, loss aversion.

Implies under-diversification

Robust to prior information
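The payoff structure of this task can be explored with a small simulation. The sketch below is not the authors' analysis: it assumes that 300 in N(0, 300) is a standard deviation, and the function name `big_eye_simulation` and the trial count are illustrative. It estimates each asset's mean payoff and how often the mixed asset M turns out to be the single best option after the fact.

```python
import random


def big_eye_simulation(n_trials=100_000, sd=300.0, bonus=5.0, seed=7):
    """Return mean payoffs per asset and the share of trials where M is best."""
    rng = random.Random(seed)
    totals = {"R1": 0.0, "R2": 0.0, "M": 0.0}
    m_best = 0
    for _ in range(n_trials):
        x = rng.gauss(0.0, sd)       # payoff of R1
        y = rng.gauss(0.0, sd)       # payoff of R2
        m = (x + y) / 2 + bonus      # payoff of M: Mean(R1, R2) + 5
        totals["R1"] += x
        totals["R2"] += y
        totals["M"] += m
        if m > x and m > y:
            m_best += 1
    means = {name: total / n_trials for name, total in totals.items()}
    return means, m_best / n_trials


means, share_m_best = big_eye_simulation()
print(means)
print(f"M has the highest payoff in {share_m_best:.1%} of trials")
```

Under these assumptions M has the highest expected payoff and the lowest variance, yet it beats both R1 and R2 only when x and y are within 10 points of each other, which is rare. An agent that chases whichever asset looked best in recent trials would therefore seldom be pointed toward M.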

5

4. The hot stove effect (Hogarth & Einhorn, 1992; March and Denrell, 2002).

6

5. Surprise-triggers-change

Evaluation of the sequential dependency in 2-alternative studies reveals a four-fold recency pattern:

Problem               Last outcome   Repeated R choices, %   Switches to R, %
0 or (+1, .9; -10)    After +1       84                      21
                      After -10      69                      31
0 or (+10, .1; -1)    After +10      60                      23
                      After -1       79                      6

6. The very recent effect (Nevo & Erev, 2010)

7

7. Consistent individual differences (see Bechara, Damasio, Damasio and Anderson, 1994; Yechiam et al., 2007)

Correlation between behavior in Problem 2 “0 or (11, .5; -9)” and in Problem 3 “0 or (9, .5; -11)”

Statistic                 Correlation
Loss/risk attitude        0.18
Recency (Best reply 1)    0.69
Distance from 0.5         0.75

8

I-SAW (Inertia, Sampling and Weighting, Nevo & Erev, 2012)

Three response modes: Exploration, exploitation and inertia.

At each exploitation trial, player i computes the estimated value of alternative j as:

ESV(j) = (1 - w_i)(mean of a sample of m_i outcomes from j) + w_i (grand mean of j)

The very last outcome is more likely than earlier outcomes to be in the sample. The alternative with the highest ESV is selected.

Exploration implies random choice.

Inertia implies repetition of the last choice. The probability of inertia decreases when the outcomes are surprising. Surprise is computed from the gap between the payoff at trial t and the payoffs in the previous trials.

An example of a case-based decision model (Gilboa & Schmeidler, 1995; see related ideas in Kareev, 2000; Osborne and Rubinstein, 1998; Gonzalez et al., 2003).
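The three response modes can be sketched in code. What follows is a simplified illustration under stated assumptions, not the published I-SAW specification: the parameter names `eps` (exploration rate), `pi` (baseline inertia) and `rho` (extra weight on the very last outcome), their default values, and the surprise measure are all stand-ins chosen for this example. Only the ESV formula with weight `w` and sample size `m` follows the description above. Full feedback on all alternatives is assumed.

```python
import random


def isaw_sketch(payoff_fns, n_trials=200, eps=0.1, pi=0.6, w=0.3, m=5,
                rho=0.3, seed=0):
    """Simplified I-SAW-style agent with full feedback; returns the choices."""
    rng = random.Random(seed)
    n_alt = len(payoff_fns)
    history = [[] for _ in range(n_alt)]  # observed payoffs per alternative
    choices = []
    mean_gap = 0.0   # running mean of past payoff gaps
    p_inertia = 0.0  # updated after every trial from the surprise level

    for t in range(n_trials):
        if t == 0 or rng.random() < eps:
            choice = rng.randrange(n_alt)        # exploration: random choice
        elif rng.random() < p_inertia:
            choice = choices[-1]                 # inertia: repeat last choice
        else:                                    # exploitation
            esv = []
            for past in history:
                # Each of the m draws is the very last outcome with
                # probability rho, otherwise a uniform draw from the past.
                sample = [past[-1] if rng.random() < rho else rng.choice(past)
                          for _ in range(m)]
                grand_mean = sum(past) / len(past)
                esv.append((1 - w) * sum(sample) / m + w * grand_mean)
            choice = max(range(n_alt), key=lambda j: esv[j])
        choices.append(choice)

        payoffs = [draw(rng) for draw in payoff_fns]  # all payoffs are shown
        if t > 0:
            # Stand-in surprise measure (an assumption): the gap between this
            # trial's payoffs and the previous trial's, relative to the
            # typical gap seen so far.
            gap = sum(abs(p - past[-1])
                      for p, past in zip(payoffs, history)) / n_alt
            surprise = gap / (mean_gap + gap) if mean_gap + gap > 0 else 0.0
            p_inertia = pi ** surprise           # more surprise, less inertia
            mean_gap += (gap - mean_gap) / t
        for past, p in zip(history, payoffs):
            past.append(p)
    return choices


def safe(rng):
    return 0.0


def risky(rng):
    # The gamble (-10, .1; +1).
    return -10.0 if rng.random() < 0.1 else 1.0


choices = isaw_sketch([safe, risky])
print("Share of risky choices:", sum(choices) / len(choices))
```

Each payoff function receives the agent's random generator, so a run is reproducible for a given seed. Varying `m` and `w` shows how reliance on smaller samples shifts choice toward options that are better most of the time.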


9

Two choice prediction competitions (Erev et al. 2010a, 2010b)

1. Individual choice tasks http://tx.technion.ac.il/~eyalert/Comp.html

The task: Predicting the proportion of risky choices in binary choice tasks in the clicking paradigm without information concerning forgone payoffs.

Two studies (estimation and competition), each with 60 conditions. We published the estimation study and challenged other researchers to predict the results of the second. The models were ranked based on their squared error.
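Ranking by squared error can be illustrated with a few lines of code. This is a sketch, assuming the score is the mean squared deviation between predicted and observed choice rates across conditions; the model names and numbers below are made up for the example and are not competition data.

```python
def mean_squared_deviation(predicted, observed):
    """Average squared gap between predicted and observed choice rates."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def rank_models(predictions_by_model, observed):
    """Return (model, score) pairs sorted from best (lowest score) to worst."""
    scores = {name: mean_squared_deviation(pred, observed)
              for name, pred in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical observed risky-choice rates in three conditions.
observed = [0.27, 0.60, 0.45]
# Hypothetical predictions from two made-up models.
predictions = {
    "model_a": [0.40, 0.59, 0.50],
    "model_b": [0.50, 0.50, 0.50],
}
ranking = rank_models(predictions, observed)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```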

The best baseline is a predecessor of I-SAW. The winning submission, by Stewart, West & Lebiere, is based on a similar instance-based (“episodic”) logic (with a quantification in ACT-R).

Reinforcement learning and similar “semantic” models did not do well.

10

2. Market entry games http://sites.google.com/site/gpredcomp

The task: Predicting behavior in repeated 4-person market entry games with complete feedback. At each trial, each player has to choose between:

R: Entering a risky market (expected payoff decreasing with entrants)

S: Staying out (a safer option)

Two studies (estimation and competition), each with 40 conditions. We published the estimation study and challenged other researchers to predict the results of the second.

The models were ranked based on their squared error.

The best baseline is I-SAW. The winner, Chen et al., is a variant of I-SAW. The runner-up, Gonzalez, Dutt & Lejarraga, is based on a similar instance-based (“episodic”) logic.

11

Relationship to reinforcement learning

There are four main reasons for the popularity of reinforcement learning models:

1. Effectiveness (Sutton and Barto, 1998)
2. Neural correlates (Schultz, 1998)
3. Useful ex ante predictions (Erev & Roth, 1998)
4. Easy to estimate using elegant statistical methods


But:

1. Effective learning occurs only when the state of nature is known; I-SAW can do better in dynamic settings.
2. I-SAW (and many other models) have similar neural correlates.
3. I-SAW provides better ex ante predictions.
4. The estimations are elegant under the assumption that the model is “well specified”.

Yet, the predictions of I-SAW can be the product of a case-contingent reinforcement learning process.

       G1    B1    G2    B2
G1    .97   .01   .01   .01
B1    .01   .97   .01   .01
G2    .01   .01   .01   .97
B2    .01   .01   .97   .01

S: “0 for sure” or R: “+1 if G, -1 if B”
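The transition matrix can be simulated to see why conditioning on similar past cases helps here. The sketch below reads the fourth row as state B2 (an assumption, since the row label is duplicated in the slide) and uses illustrative names throughout. It estimates how often the next state is good (G) given the last two observed outcomes.

```python
import random
from collections import defaultdict

STATES = ["G1", "B1", "G2", "B2"]
TRANSITIONS = {
    "G1": [0.97, 0.01, 0.01, 0.01],
    "B1": [0.01, 0.97, 0.01, 0.01],
    "G2": [0.01, 0.01, 0.01, 0.97],
    "B2": [0.01, 0.01, 0.97, 0.01],  # assumed label for the fourth row
}


def simulate_outcomes(n_trials, seed=11):
    """Simulate the chain and return the visible outcomes ('G' or 'B')."""
    rng = random.Random(seed)
    state = "G1"
    outcomes = []
    for _ in range(n_trials):
        state = rng.choices(STATES, weights=TRANSITIONS[state])[0]
        outcomes.append(state[0])  # the agent sees G or B, not the regime
    return outcomes


def p_good_after(outcomes, pattern_len=2):
    """Estimate P(next outcome is G) for each pattern of recent outcomes."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [good count, total]
    for t in range(pattern_len, len(outcomes)):
        pattern = "".join(outcomes[t - pattern_len:t])
        counts[pattern][0] += outcomes[t] == "G"
        counts[pattern][1] += 1
    return {pattern: good / total for pattern, (good, total) in counts.items()}


outcomes = simulate_outcomes(100_000)
overall = outcomes.count("G") / len(outcomes)
conditional = p_good_after(outcomes)
print(f"Overall P(G): {overall:.2f}")
for pattern in sorted(conditional):
    print(f"P(G | last two = {pattern}): {conditional[pattern]:.2f}")
```

Overall, G and B are about equally likely, so R looks no better than S on average. After two good outcomes in a row the chain is usually in the persistent regime and R is attractive, whereas after B followed by G it is usually in the alternating regime and R is not. A learner that responds to outcomes in similar past cases can pick up this contingency.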

13

Learning in games, and the effect of prior information

The entry game competition demonstrates that the existence of social interaction does not have to change the learning model that best captures behavior.

Another indication of the generality of basic learning processes comes from the study of games with a unique mixed strategy equilibrium (Erev & Roth, 1998).

Game         A2    B2    Statistic   Eq.   Minimal   Full   I-SAW
1      A1    .77   .35   P(A1)       49    68        59     64
       B1    .08   .48   P(A2)       16    42        32     28
2      A1    .73   .74   P(A1)       99    76        84     84
       B1    .87   .20   P(A2)       79    40        36     21
9      A1    .40   .76   P(A1)       65    58        56     61
       B1    .91   .23   P(A2)       51    45        45     46
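The Eq. column can be reproduced from the payoff entries with the standard indifference conditions for a 2x2 game. The sketch below assumes that each cell is the probability that Player 1 wins and that the games are constant-sum, so Player 2 wins otherwise. That reading is an assumption, although it does recover the Eq. values. The function name `mixed_equilibrium` is illustrative.

```python
def mixed_equilibrium(win_probs):
    """Mixed equilibrium of a 2x2 constant-sum game.

    win_probs is ((a, b), (c, d)): Player 1's winning probability for
    rows A1, B1 against columns A2, B2. Assumes an interior equilibrium.
    Returns (P(A1), P(A2)).
    """
    (a, b), (c, d) = win_probs
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("no interior mixed-strategy equilibrium")
    # Player 1 mixes so that Player 2 is indifferent between A2 and B2.
    p_a1 = (d - c) / denom
    # Player 2 mixes so that Player 1 is indifferent between A1 and B1.
    p_a2 = (d - b) / denom
    return p_a1, p_a2


GAMES = {
    1: ((0.77, 0.35), (0.08, 0.48)),
    2: ((0.73, 0.74), (0.87, 0.20)),
    9: ((0.40, 0.76), (0.91, 0.23)),
}
equilibria = {game: mixed_equilibrium(matrix) for game, matrix in GAMES.items()}
for game, (p_a1, p_a2) in equilibria.items():
    print(f"Game {game}: P(A1) = {100 * p_a1:.0f}%, P(A2) = {100 * p_a2:.0f}%")
```

The computed values round to 49 and 16 for Game 1, 99 and 79 for Game 2, and 65 and 51 for Game 9, matching the Eq. column.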

14

I-SAW and similar models that assume learning among actions fail when the instructions lead subjects to learn among more sophisticated strategies. One example is the prisoner's dilemma game (data from Rapoport & Chammah, 1965).

PD1    C          D
C      1, 1       -10, 10
D      10, -10    -1, -1

[Figure: proportion of C choices in blocks of 50 trials (blocks 1 to 6), with series labeled Fixed and Random.]

15

B1 B2 B3 B4 B5 B6 B7 B8 B9

A1 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A2 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A3 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A4 2, -9 2, -9 2, -9 2, -9 2, -9 2, -9 -1, -2 2, -9 2, -9

A5 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A6 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A7 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A8 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 -9, -9

A9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, -9 -9, 2 -9, -9 7, 7

The description experience gap in games (Erev & Greiner, 2012)

16

Applications and the economics of small decisions.

The discoveries--innovations gap (a result of too much coffee with Al)

Discoveries: St. Petersburg paradox, Allais paradox, rejections in the ultimatum game, endowment effect, fine is a price…

Innovations: trade, markets, money, rule enforcement, auctions, incentive schemes.

One explanation for this gap is based on the following assertions:

(1) Many of the popular discoveries are reflections of reliance on experiences in similar but not identical situations.

(2) Many of the innovations involve a change of the incentive structure to ensure that the desired behavior will be better on average and better in the most similar situations. Thus, even I-SAW-like agents will respond to the change.

17

Gentle COntinuous Punishment (gentle COP): Enforcement of safety rules (Erev & Rodansky, 2004; and see Zohar, 1980; Zohar and Luria, 1994)

Enforcement is necessary

Workers like enforcement programs

Probability is more important than magnitude

Large punishments are too costly; therefore, gentle enforcement can be optimal

Small brother

[Figure: values on a 0 to 100 scale for ear plugs, eye protection and gloves.]

18

Gentle COP2: Washing hands and using gloves in hospitals

In 1847, Dr. Ignaz Semmelweis first demonstrated that routine hand-washing could prevent the spread of disease. In an experiment, Dr. Semmelweis insisted that his students staffing a Vienna hospital’s maternity ward wash their hands before treating the maternity patients, and deaths on the maternity ward fell dramatically. In one study, the death rate fell from 15% to near 0%. Though his findings were published, there was no apparent increase in hand washing by doctors until the discoveries of Louis Pasteur, years after Dr. Semmelweis died in a mental asylum (Nuland, 2003).

http://en.wikipedia.org/wiki/File:Ignaz_Semmelweis_1860.jpg

[Figure: axes labeled "Relative value of violation" and "Proportion of violators".]

Gentle COP3: Cheating in exams

Many rule enforcement problems have at least two equilibria.

The gentle COP idea is particularly effective in these settings. It can be used to move the game to the desired equilibrium.

19

Seven undergraduate courses were selected to participate in the study. In all courses the final exam was conducted in two rooms. One room was randomly assigned to the experimental (gentle COP) condition, and the second was assigned to the control condition. The only difference between the two conditions involved the timing of the preparation of the map in the instructions to the proctors. In the control group the instruction was:

(2c) “A map of the students seating should be prepared immediately after the beginning of the exam.”

After finishing the exam, the students were asked to complete a brief questionnaire in which they were asked to “rate the extent to which students cheated in this exam relative to other exams.”

The results reveal lower cheating ratings in the gentle COP room in all 7 courses.

20

21

Broken Windows theory

Kelling and Wilson (1982) suggest that physical decay and disorder in a neighborhood increase the crime rate. This suggestion, known as Broken Windows theory, was motivated by a field experiment conducted by Zimbardo (1969). Broken Windows theory was a motivation for the “quality of life” policing strategy implemented in New York City in the mid-1990s (Kelling & Sousa, 2001). This policing strategy advocated increasing the number of police on the streets and arresting persons for less serious but visible offenses. Some credit this strategy for the decline in crime and disorder. However, field studies that test the Broken Windows hypothesis provide mixed results. Skogan (1990) found that robbery victimization was higher in neighborhoods characterized by disorder, but Harcourt (2001) found that the crime-disorder relationship did not hold for other crime types, including burglary, assault, rape and pick-pocketing.

22

The effect of the timing of warning (Barron, Leider & Stack, 2008)

Evaluation of the impact of warnings reveals a large effect of prior experience. Individuals who have had good experiences in the past are less affected by the warning. For example, when the FDA added a black-box warning to the drug Cisapride, the data show an increase in usage of 2% among repeat users, but a decrease of 17% among first-time users (Smalley et al., 2000). Another example is provided by a study of parent-adolescent sexual communication. Regular condom use was found to be lower when parent-adolescent sexual communication occurred at a later age (Hutchinson, 2002).

Barron, Leider and Stack (2008) show that part of the effect of experience remains even after controlling for the available information. This part appears to be a reflection of the experience-description gap.

23

The evolution of social groups

Proximity is an important determinant of liking. Even if students are randomly assigned to rooms, individuals are more likely to become friends with and have a favorable impression of individuals who are nearby (Segal, 1974). Denrell (2005) shows that this pattern can be a product of the hot stove effect:

Our opinions about our friends are likely to change after each meeting. When the opinion is negative, and we can avoid this friend, the opinion lasts longer.
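The mechanism can be illustrated with a small simulation. This is a deliberately stripped-down sketch, not Denrell's model: it assumes every meeting yields a noisy impression drawn from N(0, 1), that the opinion moves part of the way toward each new impression, and that an agent who can avoid the other person stops meeting as soon as the opinion turns negative. All names and parameter values are illustrative.

```python
import random


def final_opinion(can_avoid, rng, n_periods=30, learning_rate=0.5):
    """Opinion about one acquaintance after n_periods possible meetings."""
    opinion = rng.gauss(0.0, 1.0)  # impression from the first meeting
    for _ in range(n_periods):
        if can_avoid and opinion < 0:
            continue  # no meeting takes place, so the opinion is not updated
        opinion += learning_rate * (rng.gauss(0.0, 1.0) - opinion)
    return opinion


def share_negative(can_avoid, n_agents=10_000, seed=5):
    """Share of simulated agents who end with a negative opinion."""
    rng = random.Random(seed)
    negative = sum(final_opinion(can_avoid, rng) < 0 for _ in range(n_agents))
    return negative / n_agents


neg_when_avoiding = share_negative(can_avoid=True)
neg_when_nearby = share_negative(can_avoid=False)
print(f"Negative opinions when avoidance is possible: {neg_when_avoiding:.2f}")
print(f"Negative opinions when meetings continue:     {neg_when_nearby:.2f}")
```

In this sketch the true average impression is zero in both conditions. When meetings continue regardless, as with people who live nearby, about half of the final opinions are negative. When avoidance is possible, negative opinions are never corrected and come to dominate.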

24

Investors' decisions:

The black swan effect

Simulated vs. real index funds.

Under-diversification

Positive correlations between price change and volume of trade

25

Summary

Many of the classical properties of human and animal learning can be reliably reproduced in the easy to run (and to model) clicking paradigm.

The main results can be predicted with instance-based models that assume best reply to small samples of experiences in similar cases. The implied behavioral processes are evolutionarily reasonable, but can lead to robust deviations from maximization.

The current understanding of decisions from experience is sufficient to shed light on many natural problems.

26

Related topics

Objective tests

One-period-ahead econometrics

ENO

Level-1 reasoning

Imitation

Learned helplessness
