Belief Learning in an Unstable Infinite Game


Page 1: Belief Learning in an Unstable Infinite Game

Belief Learning in an Unstable Infinite Game

Paul J. Healy

CMU

Page 2: Belief Learning in an Unstable Infinite Game

Belief Learning in an Unstable Infinite Game

Issue #3

Issue #1

Issue #2

Page 3: Belief Learning in an Unstable Infinite Game

Issue #1: Infinite Games

• Typical Learning Model:
  – Finite set of strategies
  – Strategies get weight based on ‘fitness’
  – Bells & Whistles: experimentation, spillovers…

• Many important games have infinite strategy spaces
  – Duopoly, public goods, bargaining, auctions, war of attrition…

• Quality of fit sensitive to grid size?
• Models don’t use strategy space structure

Page 4: Belief Learning in an Unstable Infinite Game

Previous Work

• Grid size on fit quality:
  – Arifovic & Ledyard
    • Groves-Ledyard mechanisms
    • Convergence failure of RL with |S| = 51

• Strategy space structure:
  – Roth & Erev AER ’99

• Quality-of-fit/error measures:
  – What’s the right metric space?
  – Closeness in probabilities or closeness in strategies?

Page 5: Belief Learning in an Unstable Infinite Game

Issue #2: Unstable Game

• Usually predicting convergence rates
  – Example: p-beauty contests

• Instability:
  – Toughest test for learning models
  – Most statistical power

Page 6: Belief Learning in an Unstable Infinite Game

Previous Work

• Chen & Tang ’98
  – Walker mechanism & unstable Groves-Ledyard
  – Reinforcement > Fictitious Play > Equilibrium

• Healy ’06
  – 5 public goods mechanisms, predicting convergence or not

• Feltovich ’00
  – Unstable finite Bayesian game
  – Fit varies by game and error measure

Page 7: Belief Learning in an Unstable Infinite Game

Issue #3: Belief Learning

• If subjects are forming beliefs, measure them!

• Method 1: Direct elicitation
  – Incentivized guesses about s_{-i}

• Method 2: Inferred from payoff table usage
  – Tracking payoff ‘lookups’ may inform our models

Page 8: Belief Learning in an Unstable Infinite Game

Previous Work

• Nyarko & Schotter ’02
  – Subjects best respond to stated beliefs
  – Stated beliefs not too accurate

• Costa-Gomes, Crawford & Broseta ’01
  – Mouselab to identify types
  – How players solve games, not learning

Page 9: Belief Learning in an Unstable Infinite Game

This Paper

• Pick an unstable infinite game
• Give subjects a calculator tool & track usage
• Elicit beliefs in some sessions

• Fit models to data in standard way
• Study formation of “beliefs”
  – “Beliefs” ⇐ calculator tool
  – “Beliefs” ⇐ elicited beliefs

Page 10: Belief Learning in an Unstable Infinite Game

The Game

• Walker’s public goods mechanism for 3 players
• Added a ‘punishment’ parameter

N = {1, 2, 3},   S_i = [-10, 10] ⊂ R¹

u_i(s_i, s_{-i}) = v_i(y(s)) − t_i(s)

y(s) = Σ_j s_j

v_i(y) = b_i y − a_i y²

t_i(s) = (s_{i+1} − s_{i-1}) · y(s)

Page 11: Belief Learning in an Unstable Infinite Game

Parameters & Equilibrium

• v_i(y) = b_i y − a_i y² + c_i

• Pareto optimum: y° = 7.5
• Unique PSNE: s_i* = 2.5

• Punishment γ = 0.1
• Purpose: not too wild, payoffs rarely negative

• Guessing payoff: 10 − |g_L − s_L|/4 − |g_R − s_R|/4
• Game payoffs: Pr(< 50) = 8.9%, Pr(> 100) = 71%

 i    a_i    b_i    c_i
 1    0.1    1.5    110
 2    0.2    3.0    125
 3    0.3    4.5    140
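As a concrete sketch of the mechanism, the payoffs above can be coded directly. The exact placement of the γ punishment term inside the transfer is an assumption (the slides only say a punishment parameter was added); as written it vanishes when neighbors play identically and does not move the best response, so the grid search still recovers s_i* = 2.5.

```python
import numpy as np

# Parameters from the table above; gamma = 0.1
a = np.array([0.1, 0.2, 0.3])
b = np.array([1.5, 3.0, 4.5])
c = np.array([110.0, 125.0, 140.0])
gamma = 0.1

def payoff(i, s):
    """Payoff to player i (0-indexed) in the modified Walker mechanism.
    The gamma*(s[next]-s[prev])**2 punishment term is an assumed placement."""
    y = s.sum()                              # public good level y(s) = sum_j s_j
    nxt, prv = (i + 1) % 3, (i - 1) % 3
    v = b[i] * y - a[i] * y**2 + c[i]        # v_i(y) = b_i y - a_i y^2 + c_i
    t = (s[nxt] - s[prv]) * y + gamma * (s[nxt] - s[prv])**2
    return v - t

# Grid search confirms s_i* = 2.5 is a best response to opponents playing 2.5
grid = np.arange(-10, 10.01, 0.25)
for i in range(3):
    vals = [payoff(i, np.array([2.5] * i + [x] + [2.5] * (2 - i))) for x in grid]
    print(i, grid[int(np.argmax(vals))])  # each prints 2.5
```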

Page 12: Belief Learning in an Unstable Infinite Game

Choice of Grid Size

Grid width       5      2      1     1/2    1/4    1/8
# grid points    5     11     21     41     81    161
% on grid      59.7   61.6   88.7   91.6   91.9   91.9

S = [-10, 10]

Page 13: Belief Learning in an Unstable Infinite Game

Properties of the Game

• Best response (indices mod 3):

  s_i^BR = b_i/(2a_i) − (1 + 1/(2a_i)) s_{i+1} − (1 − 1/(2a_i)) s_{i-1}

  i.e. s^BR = ( b_1/(2a_1), b_2/(2a_2), b_3/(2a_3) )′ − M s, where M has zero diagonal
  and off-diagonal entries 1 ± 1/(2a_i).

• BR dynamics: unstable
  – One eigenvalue is +2
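The instability claim can be checked numerically. This is a sketch: the matrix is derived from the quadratic payoffs while ignoring the [-10, 10] bounds, and the sign convention on the transfer is an assumption (the eigenvalue moduli are unaffected by it).

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3])
inv2a = 1.0 / (2.0 * a)

# s^BR = beta - M s; M has zero diagonal and entries 1 +/- 1/(2a_i)
M = np.array([
    [0.0,          1 + inv2a[0], 1 - inv2a[0]],
    [1 - inv2a[1], 0.0,          1 + inv2a[1]],
    [1 + inv2a[2], 1 - inv2a[2], 0.0],
])

# Jacobian of the best-response dynamics is -M
moduli = np.sort(np.abs(np.linalg.eigvals(-M)))
print(moduli)  # one real eigenvalue of modulus 2, plus a complex pair outside the unit circle
```

All three eigenvalues lie outside the unit circle, so discrete-time best-response dynamics diverge from the equilibrium.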

Page 14: Belief Learning in an Unstable Infinite Game

Interface

Page 15: Belief Learning in an Unstable Infinite Game

Design

• PEEL Lab, U. Pittsburgh

• All sessions:
  – 3-player groups, 50 periods
  – Same group, same ID#s for all periods
  – Payoffs etc. common information
  – No explicit public good framing
  – Calculator always available
  – 5-minute ‘warm-up’ with calculator

• Sessions 1–6:
  – Guess s_L and s_R

• Sessions 7–13:
  – Baseline: no guesses

Page 16: Belief Learning in an Unstable Infinite Game

Does Elicitation Affect Choice?

• Total variation, Σ_t |x_t − x_{t-1}|:
  – No significant difference (p = 0.745)

• No. of strategy switches:
  – No significant difference (p = 0.405)

• Autocorrelation (predictability):
  – Slightly more without elicitation

• Total earnings per session:
  – No significant difference (p = 1)

• Missed periods:
  – Elicited: 9/300 (3%) vs. Not: 3/350 (0.8%)
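The first two summary statistics are easy to compute from a strategy time series (a sketch; function names are mine):

```python
import numpy as np

def total_variation(x):
    """sum_t |x_t - x_{t-1}|: total period-to-period movement."""
    return float(np.abs(np.diff(np.asarray(x, dtype=float))).sum())

def n_switches(x):
    """Number of periods in which the chosen strategy changed."""
    return int((np.diff(np.asarray(x)) != 0).sum())
```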

Page 17: Belief Learning in an Unstable Infinite Game

Does Play Converge?

[Figure: “Average Distance From Equilibrium” — two panels over periods 1–50: average |s_i − s_i*| per period, and average |y − y°| per period]

Page 18: Belief Learning in an Unstable Infinite Game

Does Play Converge, Part 2

[Figure: strategy choices over periods 1–50; vertical axis spans the strategy space −10 to 10]

Page 19: Belief Learning in an Unstable Infinite Game

Accuracy of Beliefs

• Guesses get better in time

[Figure: average || s_{-i} − s_{-i}(t) || per period — left panel: elicited guesses; right panel: calculator inputs]

Page 20: Belief Learning in an Unstable Infinite Game

Model 1: Parametric EWA

• δ : weight on hypothetical payoffs of strategies not actually played
• φ : decay rate of past attractions
• ρ : decay rate of past experience
• A(0): initial attractions
• N(0): initial experience
• λ : response sensitivity to attractions

A_t(s_i) = [ φ N_{t-1} A_{t-1}(s_i) + (δ + (1 − δ) I(s_i, s_{i,t})) u_i(s_i, s_{-i,t}) ] / N_t

N_t = ρ N_{t-1} + 1

P_{i,t}(s_i) = e^{λ A_t(s_i)} / Σ_{x ∈ S_i} e^{λ A_t(x)}
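A minimal sketch of the attraction update and logit choice rule above (function names and the array layout are mine, not from the paper):

```python
import numpy as np

def ewa_update(A, N, played_idx, payoffs, delta, phi, rho):
    """One parametric-EWA step for a single player.

    A          : attractions over the player's strategy grid
    N          : experience weight (scalar)
    played_idx : index of the strategy actually chosen this period
    payoffs    : u_i(s, s_-i(t)) for every own strategy s on the grid
    """
    N_new = rho * N + 1.0
    I = np.zeros_like(A)
    I[played_idx] = 1.0
    weight = delta + (1.0 - delta) * I   # played strategy gets weight 1, others delta
    A_new = (phi * N * A + weight * payoffs) / N_new
    return A_new, N_new

def logit_choice(A, lam):
    """Logit response: P(s) proportional to exp(lam * A(s))."""
    z = lam * A
    z -= z.max()                         # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```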

Page 21: Belief Learning in an Unstable Infinite Game

Model 1’: Self-Tuning EWA

• N(0) = 1
• Replace δ and φ with deterministic functions:

δ_t(s_i) = 1 if u_i(s_i, s_{-i,t}) ≥ u_i(s_t), 0 otherwise

φ_{i,t+1} = 1 − ½ Σ_{x ∈ S_{-i}} [ (1/t) Σ_{τ=1}^{t} I(x, s_{-i,τ}) − I(x, s_{-i,t}) ]²

A_t(s_i) = [ φ_t N_{t-1} A_{t-1}(s_i) + (δ_t(s_i) + (1 − δ_t(s_i)) I(s_i, s_{i,t})) u_i(s_i, s_{-i,t}) ] / N_t

N_t = φ_t N_{t-1} + 1
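The two self-tuning functions can be sketched directly (a sketch under the reconstruction above; helper names and indexing conventions are mine):

```python
import numpy as np

def stewa_delta(own_payoffs, realized_payoff):
    """delta_t(s) = 1 iff strategy s would have paid at least the realized payoff."""
    return (np.asarray(own_payoffs) >= realized_payoff).astype(float)

def stewa_phi(opp_history_counts, t, last_opp_idx):
    """phi_{t+1} = 1 - 0.5 * 'surprise index': squared distance between the
    empirical frequency of opponent play through period t and the latest observation."""
    freq = np.asarray(opp_history_counts, dtype=float) / t
    last = np.zeros_like(freq)
    last[last_opp_idx] = 1.0
    return 1.0 - 0.5 * np.sum((freq - last) ** 2)
```

When opponents repeat themselves the surprise index is 0 and φ stays at 1 (long memory); a surprising switch pushes φ down, discounting stale attractions.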

Page 22: Belief Learning in an Unstable Infinite Game

STEWA: Setup

• Only remaining parameters: λ and A0

– λ will be estimated

– 5 minutes of ‘Calculator Time’ gives A0

• Average payoff from calculator trials:

A_0(s_i) = [ Σ_{t=1}^{T} I(s_i, s_i^t) u_i^t ] / [ Σ_{t=1}^{T} I(s_i, s_i^t) ]   if Σ_{t=1}^{T} I(s_i, s_i^t) ≥ 1

A_0(s_i) = (1/T) Σ_{t=1}^{T} u_i^t   otherwise
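In code (a sketch; the helper name is mine): strategies tried during the warm-up get their average calculator payoff, and untried strategies default to the overall average.

```python
import numpy as np

def initial_attractions(trial_strats, trial_payoffs, n_strats):
    """A_0(s): mean calculator payoff over warm-up trials that used strategy s;
    strategies never tried default to the overall mean payoff."""
    trial_strats = np.asarray(trial_strats)
    trial_payoffs = np.asarray(trial_payoffs, dtype=float)
    A0 = np.full(n_strats, trial_payoffs.mean())
    for s in range(n_strats):
        mask = trial_strats == s
        if mask.any():
            A0[s] = trial_payoffs[mask].mean()
    return A0
```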

Page 23: Belief Learning in an Unstable Infinite Game

STEWA: Fit

• Likelihoods are ‘zero’ for all λ
  – Guess: lots of near misses in predictions

• Alternative measure: quadratic scoring rule

  QSR_t = Σ_k [ P_i(s_i^k, t) − I(s_i^k, s_i^t) ]²

  – Best fit: λ = 0.04 (previous studies: λ > 4)
  – Suggests attractions are very concentrated
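The per-period loss is simple to compute (a sketch; note that a uniform forecast over the 21-point grid scores 1 − 1/21 ≈ 0.952, the Random Choice value in the comparison tables below):

```python
import numpy as np

def qsr_loss(P, played_idx):
    """Quadratic scoring loss sum_k (P_k - I_k)^2: 0 if all mass is on the
    realized strategy, 2 if all mass is on a single wrong strategy."""
    I = np.zeros_like(P)
    I[played_idx] = 1.0
    return float(np.sum((P - I) ** 2))

print(qsr_loss(np.full(21, 1 / 21), 0))  # uniform forecast: 1 - 1/21 ≈ 0.952
```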

Page 24: Belief Learning in an Unstable Infinite Game

[Figure: 3-D plot of STEWA choice probabilities, λ = 4 — Session 3, Player 2, Periods 11–30; axes: Strategy (−10 to 10), Period, EWA Prob (0 to 1)]

Page 25: Belief Learning in an Unstable Infinite Game

[Figure: 3-D plot of STEWA choice probabilities, λ = 0.04 — Session 3, Player 2, Periods 11–30; axes: Strategy (−10 to 10), Period, EWA Prob (0 to 1)]

Page 26: Belief Learning in an Unstable Infinite Game

STEWA: Adjustment Attempts

• The problem: near misses in strategy space, not in time
• Suggests: alter δ (weight on hypotheticals)
  – Original specification: QSR* = 1.193 @ λ* = 0.04
  – δ = 0.7 (p-beauty est.): QSR* = 1.056 @ λ* = 0.03
  – δ = 1 (belief model): QSR* = 1.082 @ λ* = 0.175
  – δ(k,t) = % of BR payoff: QSR* = 1.077 @ λ* = 0.06

• Altering φ:
  – 1/8 weight on surprises: QSR* = 1.228 @ λ* = 0.04

Page 27: Belief Learning in an Unstable Infinite Game

STEWA: Other Modifications

• Equal initial attractions: worse

• Smoothing
  – Takes advantage of strategy space structure
    • λ spreads probability across strategies evenly
    • Smoothing spreads probability to nearby strategies
  – Smoothed attractions
  – Smoothed probabilities
  – But… no improvement in QSR* or λ*!

• Tentative conclusion:
  – STEWA: not broken, or can’t be fixed…

Page 28: Belief Learning in an Unstable Infinite Game

Other Standard Models

• Nash equilibrium
• Uniform mixed strategy (‘Random’)
• Logistic Cournot BR
• Deterministic Cournot BR
• Logistic fictitious play
• Deterministic fictitious play
• k-period BR:

  s_t = BR( (1/k) Σ_{τ=t-k}^{t-1} s_{-i,τ} )
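The k-period BR rule can be sketched as follows (`br` stands in for any best-response function mapping an opponent profile to a strategy; the helper name is mine):

```python
import numpy as np

def k_period_br(opp_history, k, br):
    """Best respond to the average opponent profile over the last k periods.
    opp_history: list of past opponent profiles, most recent last."""
    recent = np.mean(np.asarray(opp_history[-k:], dtype=float), axis=0)
    return br(recent)
```

With k = 1 this reduces to Cournot best response; as k grows toward the full history it approaches fictitious play against the empirical average.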

Page 29: Belief Learning in an Unstable Infinite Game

“New” Models

• Best respond to stated beliefs (Sessions 1–6 only)

• Best respond to calculator entries
  – Issue: how to aggregate calculator usage?
  – Decaying average of inputs

• Reinforcement based on calculator payoffs
  – Decaying average of payoffs
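One simple aggregator for the calculator entries (a sketch of the ‘decaying average’ idea; the geometric weighting scheme is an assumption):

```python
import numpy as np

def decayed_average(entries, delta=0.5):
    """Weight an entry of age m (0 = most recent) by delta**m, then normalize.
    entries: sequence of calculator inputs, ordered oldest to newest."""
    entries = np.asarray(entries, dtype=float)
    ages = np.arange(len(entries))[::-1]   # most recent entry has age 0
    w = delta ** ages
    return float(np.dot(w, entries) / w.sum())
```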

Page 30: Belief Learning in an Unstable Infinite Game

Model Comparisons

MODEL               PARAM   BIC            2-QSR            MAD             MSD

Random Choice*      N/A     In: Infinite   In: 0.952        In: 7.439       In: 82.866
                                           Out: 0.878       Out: 7.816      Out: 85.558

Logistic STEWA*     λ       In: Infinite   In: 0.807        In: 3.818       In: 34.172
                                           Out: 0.665       Out: 3.180      Out: 22.853
                                           λ* = 0.04        λ* = 0.41       λ* = 0.35

Logistic Cournot*   λ       In: Infinite   In: 0.952        In: 4.222       In: 38.186
                                           Out: 0.878       Out: 3.557      Out: 25.478
                                           λ* = 0.00 (!)    λ* = 4.30       λ* = 4.30

Logistic F.P.*      λ       In: Infinite   In: 0.955        In: 4.265       In: 31.062
                                           Out: 0.878       Out: 3.891      Out: 22.133
                                           λ* = 14.98       λ* = 4.47       λ* = 4.47

* Estimates on the grid of integers {-10, -9, …, 9, 10}

In = periods 1–35; Out = periods 36–end

Page 31: Belief Learning in an Unstable Infinite Game

Model Comparisons 2

MODEL                      PARAM      MAD             MSD

BR(Guesses)                N/A        In: 5.5924      In: 57.874
(6 sessions only)                     Out: 3.3693     Out: 19.902

BR(Calculator Input)       δ (=1/2)   In: 6.394       In: 79.29
                                      Out: 8.263      Out: 116.7

Calculator Reinforcement*  δ (=1/2)   In: 7.389       In: 82.407
                                      Out: 7.815      Out: 85.495

k-Period BR                k          In: 4.2126      In: 35.185
                                      Out: 3.582      Out: 23.455
                                      k* = 4          k* = 4

Cournot                    N/A        In: 4.7974      In: 45.283
                                      Out: 3.857      Out: 29.058

Weighted F.P.              δ          In: 4.500       In: 38.290
                                      Out: 3.518      Out: 22.426
                                      δ* = 0.56       δ* = 0.65

Page 32: Belief Learning in an Unstable Infinite Game

The “Take-Homes”

• Methodological issues
  – Infinite strategy space
  – Convergence vs. instability
  – Right notion of error

• Self-Tuning EWA fits best.

• Guesses & calculator input don’t seem to offer any more predictive power… ?!?!