Belief Learning in an Unstable Infinite Game


Page 1: Belief Learning in an Unstable Infinite Game

Belief Learning in an Unstable Infinite Game

Paul J. Healy

CMU

Page 2: Belief Learning in an Unstable Infinite Game

Belief Learning in an Unstable Infinite Game

Issue #3

Issue #1

Issue #2

Page 3: Belief Learning in an Unstable Infinite Game

Issue #1: Infinite Games

• Typical Learning Model:
  – Finite set of strategies
  – Strategies get weight based on ‘fitness’
  – Bells & Whistles: experimentation, spillovers…

• Many important games have infinite strategy spaces
  – Duopoly, public goods, bargaining, auctions, war of attrition…

• Quality of fit sensitive to grid size?
• Models don’t use strategy space structure

Page 4: Belief Learning in an Unstable Infinite Game

Previous Work

• Grid size on fit quality:
  – Arifovic & Ledyard
    • Groves-Ledyard mechanisms
    • Convergence failure of RL with |S| = 51

• Strategy space structure:
  – Roth & Erev AER ’99

• Quality-of-fit/error measures:
  – What’s the right metric space?
  – Closeness in probabilities or closeness in strategies?

Page 5: Belief Learning in an Unstable Infinite Game

Issue #2: Unstable Game

• Usually predicting convergence rates
  – Example: p-beauty contests

• Instability:
  – Toughest test for learning models
  – Most statistical power

Page 6: Belief Learning in an Unstable Infinite Game

Previous Work

• Chen & Tang ’98
  – Walker mechanism & unstable Groves-Ledyard
  – Reinforcement > Fictitious Play > Equilibrium

• Healy ’06
  – 5 public goods mechanisms, predicting convergence or not

• Feltovich ’00
  – Unstable finite Bayesian game
  – Fit varies by game and error measure

Page 7: Belief Learning in an Unstable Infinite Game

Issue #3: Belief Learning

• If subjects are forming beliefs, measure them!

• Method 1: Direct elicitation
  – Incentivized guesses about s_{-i}

• Method 2: Inferred from payoff table usage
  – Tracking payoff ‘lookups’ may inform our models

Page 8: Belief Learning in an Unstable Infinite Game

Previous Work

• Nyarko & Schotter ’02
  – Subjects best respond to stated beliefs
  – Stated beliefs not too accurate

• Costa-Gomes, Crawford & Broseta ’01
  – Mouselab to identify types
  – How players solve games, not learning

Page 9: Belief Learning in an Unstable Infinite Game

This Paper

• Pick an unstable infinite game
• Give subjects a calculator tool & track usage
• Elicit beliefs in some sessions

• Fit models to data in standard way
• Study formation of “beliefs”
  – “Beliefs” ⇐ calculator tool
  – “Beliefs” ⇐ elicited beliefs

Page 10: Belief Learning in an Unstable Infinite Game

The Game

• Walker’s public goods mechanism for 3 players
• Added a ‘punishment’ parameter

N = {1, 2, 3},   S_i = [-10, 10] ⊂ R¹

u_i(s_i, s_{-i}) = v_i(y(s)) − t_i(s)

y(s) = Σ_j s_j

v_i(y) = b_i y − a_i y²

t_i(s) = (s_{i+1} − s_{i-1}) · y(s)

Page 11: Belief Learning in an Unstable Infinite Game

Parameters & Equilibrium

• v_i(y) = b_i y − a_i y² + c_i

• Pareto optimum: y° = 7.5
• Unique PSNE: s_i* = 2.5

• Punishment γ = 0.1
• Purpose: not too wild, payoffs rarely negative

• Guessing payoff: 10 − |g_L − s_L|/4 − |g_R − s_R|/4
• Game payoffs: Pr(< 50) = 8.9%, Pr(> 100) = 71%

 i    a_i    b_i    c_i
 1    0.1    1.5    110
 2    0.2    3.0    125
 3    0.3    4.5    140
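As a concrete sketch of the mechanism, the payoffs above can be coded directly. The exact placement of the γ punishment term inside the transfer is an assumption (the slides only say a punishment parameter was added); as written it vanishes when neighbors play identically and does not move the best response, so the grid search still recovers s_i* = 2.5.

```python
import numpy as np

# Parameters from the table above; gamma = 0.1
a = np.array([0.1, 0.2, 0.3])
b = np.array([1.5, 3.0, 4.5])
c = np.array([110.0, 125.0, 140.0])
gamma = 0.1

def payoff(i, s):
    """Payoff to player i (0-indexed) in the modified Walker mechanism.
    The gamma*(s[next]-s[prev])**2 punishment term is an assumed placement."""
    y = s.sum()                              # public good level y(s) = sum_j s_j
    nxt, prv = (i + 1) % 3, (i - 1) % 3
    v = b[i] * y - a[i] * y**2 + c[i]        # v_i(y) = b_i y - a_i y^2 + c_i
    t = (s[nxt] - s[prv]) * y + gamma * (s[nxt] - s[prv])**2
    return v - t

# Grid search confirms s_i* = 2.5 is a best response to opponents playing 2.5
grid = np.arange(-10, 10.01, 0.25)
for i in range(3):
    vals = [payoff(i, np.array([2.5] * i + [x] + [2.5] * (2 - i))) for x in grid]
    print(i, grid[int(np.argmax(vals))])  # each prints 2.5
```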

Page 12: Belief Learning in an Unstable Infinite Game

Choice of Grid Size

Grid width       5      2      1     1/2    1/4    1/8
# grid points    5     11     21     41     81    161
% on grid      59.7   61.6   88.7   91.6   91.9   91.9

S = [-10, 10]

Page 13: Belief Learning in an Unstable Infinite Game

Properties of the Game

• Best response (indices mod 3):

  s_i^BR = b_i/(2a_i) − (1 + 1/(2a_i)) s_{i+1} − (1 − 1/(2a_i)) s_{i-1}

  i.e. s^BR = ( b_1/(2a_1), b_2/(2a_2), b_3/(2a_3) )′ − M s, where M has zero diagonal
  and off-diagonal entries 1 ± 1/(2a_i).

• BR dynamics: unstable
  – One eigenvalue is +2
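The instability claim can be checked numerically. This is a sketch: the matrix is derived from the quadratic payoffs while ignoring the [-10, 10] bounds, and the sign convention on the transfer is an assumption (the eigenvalue moduli are unaffected by it).

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3])
inv2a = 1.0 / (2.0 * a)

# s^BR = beta - M s; M has zero diagonal and entries 1 +/- 1/(2a_i)
M = np.array([
    [0.0,          1 + inv2a[0], 1 - inv2a[0]],
    [1 - inv2a[1], 0.0,          1 + inv2a[1]],
    [1 + inv2a[2], 1 - inv2a[2], 0.0],
])

# Jacobian of the best-response dynamics is -M
moduli = np.sort(np.abs(np.linalg.eigvals(-M)))
print(moduli)  # one real eigenvalue of modulus 2, plus a complex pair outside the unit circle
```

All three eigenvalues lie outside the unit circle, so discrete-time best-response dynamics diverge from the equilibrium.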

Page 14: Belief Learning in an Unstable Infinite Game

Interface

Page 15: Belief Learning in an Unstable Infinite Game

Design

• PEEL Lab, U. Pittsburgh

• All sessions:
  – 3-player groups, 50 periods
  – Same group, same ID#s for all periods
  – Payoffs etc. common information
  – No explicit public good framing
  – Calculator always available
  – 5-minute ‘warm-up’ with calculator

• Sessions 1–6:
  – Guess s_L and s_R

• Sessions 7–13:
  – Baseline: no guesses

Page 16: Belief Learning in an Unstable Infinite Game

Does Elicitation Affect Choice?

• Total variation, Σ_t |x_t − x_{t-1}|:
  – No significant difference (p = 0.745)

• No. of strategy switches:
  – No significant difference (p = 0.405)

• Autocorrelation (predictability):
  – Slightly more without elicitation

• Total earnings per session:
  – No significant difference (p = 1)

• Missed periods:
  – Elicited: 9/300 (3%) vs. Not: 3/350 (0.8%)
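The first two summary statistics are easy to compute from a strategy time series (a sketch; function names are mine):

```python
import numpy as np

def total_variation(x):
    """sum_t |x_t - x_{t-1}|: total period-to-period movement."""
    return float(np.abs(np.diff(np.asarray(x, dtype=float))).sum())

def n_switches(x):
    """Number of periods in which the chosen strategy changed."""
    return int((np.diff(np.asarray(x)) != 0).sum())
```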

Page 17: Belief Learning in an Unstable Infinite Game

Does Play Converge?

[Figure: “Average Distance From Equilibrium” — two panels over periods 1–50: average |s_i − s_i*| per period, and average |y − y°| per period]

Page 18: Belief Learning in an Unstable Infinite Game

Does Play Converge, Part 2

[Figure: strategy choices over periods 1–50; vertical axis spans the strategy space −10 to 10]

Page 19: Belief Learning in an Unstable Infinite Game

Accuracy of Beliefs

• Guesses get better in time

[Figure: average || s_{-i} − s_{-i}(t) || per period — left panel: elicited guesses; right panel: calculator inputs]

Page 20: Belief Learning in an Unstable Infinite Game

Model 1: Parametric EWA

• δ : weight on hypothetical payoffs of strategies not actually played
• φ : decay rate of past attractions
• ρ : decay rate of past experience
• A(0): initial attractions
• N(0): initial experience
• λ : response sensitivity to attractions

A_t(s_i) = [ φ N_{t-1} A_{t-1}(s_i) + (δ + (1 − δ) I(s_i, s_{i,t})) u_i(s_i, s_{-i,t}) ] / N_t

N_t = ρ N_{t-1} + 1

P_{i,t}(s_i) = e^{λ A_t(s_i)} / Σ_{x ∈ S_i} e^{λ A_t(x)}
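A minimal sketch of the attraction update and logit choice rule above (function names and the array layout are mine, not from the paper):

```python
import numpy as np

def ewa_update(A, N, played_idx, payoffs, delta, phi, rho):
    """One parametric-EWA step for a single player.

    A          : attractions over the player's strategy grid
    N          : experience weight (scalar)
    played_idx : index of the strategy actually chosen this period
    payoffs    : u_i(s, s_-i(t)) for every own strategy s on the grid
    """
    N_new = rho * N + 1.0
    I = np.zeros_like(A)
    I[played_idx] = 1.0
    weight = delta + (1.0 - delta) * I   # played strategy gets weight 1, others delta
    A_new = (phi * N * A + weight * payoffs) / N_new
    return A_new, N_new

def logit_choice(A, lam):
    """Logit response: P(s) proportional to exp(lam * A(s))."""
    z = lam * A
    z -= z.max()                         # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```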

Page 21: Belief Learning in an Unstable Infinite Game

Model 1’: Self-Tuning EWA

• N(0) = 1
• Replace δ and φ with deterministic functions:

δ_t(s_i) = 1 if u_i(s_i, s_{-i,t}) ≥ u_i(s_t), 0 otherwise

φ_{i,t+1} = 1 − ½ Σ_{x ∈ S_{-i}} [ (1/t) Σ_{τ=1}^{t} I(x, s_{-i,τ}) − I(x, s_{-i,t}) ]²

A_t(s_i) = [ φ_t N_{t-1} A_{t-1}(s_i) + (δ_t(s_i) + (1 − δ_t(s_i)) I(s_i, s_{i,t})) u_i(s_i, s_{-i,t}) ] / N_t

N_t = φ_t N_{t-1} + 1
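The two self-tuning functions can be sketched directly (a sketch under the reconstruction above; helper names and indexing conventions are mine):

```python
import numpy as np

def stewa_delta(own_payoffs, realized_payoff):
    """delta_t(s) = 1 iff strategy s would have paid at least the realized payoff."""
    return (np.asarray(own_payoffs) >= realized_payoff).astype(float)

def stewa_phi(opp_history_counts, t, last_opp_idx):
    """phi_{t+1} = 1 - 0.5 * 'surprise index': squared distance between the
    empirical frequency of opponent play through period t and the latest observation."""
    freq = np.asarray(opp_history_counts, dtype=float) / t
    last = np.zeros_like(freq)
    last[last_opp_idx] = 1.0
    return 1.0 - 0.5 * np.sum((freq - last) ** 2)
```

When opponents repeat themselves the surprise index is 0 and φ stays at 1 (long memory); a surprising switch pushes φ down, discounting stale attractions.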

Page 22: Belief Learning in an Unstable Infinite Game

STEWA: Setup

• Only remaining parameters: λ and A0

– λ will be estimated

– 5 minutes of ‘Calculator Time’ gives A0

• Average payoff from calculator trials:

A_0(s_i) = [ Σ_{t=1}^{T} I(s_i, s_i^t) u_i^t ] / [ Σ_{t=1}^{T} I(s_i, s_i^t) ]   if Σ_{t=1}^{T} I(s_i, s_i^t) ≥ 1

A_0(s_i) = (1/T) Σ_{t=1}^{T} u_i^t   otherwise
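In code (a sketch; the helper name is mine): strategies tried during the warm-up get their average calculator payoff, and untried strategies default to the overall average.

```python
import numpy as np

def initial_attractions(trial_strats, trial_payoffs, n_strats):
    """A_0(s): mean calculator payoff over warm-up trials that used strategy s;
    strategies never tried default to the overall mean payoff."""
    trial_strats = np.asarray(trial_strats)
    trial_payoffs = np.asarray(trial_payoffs, dtype=float)
    A0 = np.full(n_strats, trial_payoffs.mean())
    for s in range(n_strats):
        mask = trial_strats == s
        if mask.any():
            A0[s] = trial_payoffs[mask].mean()
    return A0
```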

Page 23: Belief Learning in an Unstable Infinite Game

STEWA: Fit

• Likelihoods are ‘zero’ for all λ
  – Guess: lots of near misses in predictions

• Alternative measure: quadratic scoring rule

  QSR_t = Σ_k [ P_i(s_i^k, t) − I(s_i^k, s_i^t) ]²

  – Best fit: λ = 0.04 (previous studies: λ > 4)
  – Suggests attractions are very concentrated
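The per-period loss is simple to compute (a sketch; note that a uniform forecast over the 21-point grid scores 1 − 1/21 ≈ 0.952, the Random Choice value in the comparison tables below):

```python
import numpy as np

def qsr_loss(P, played_idx):
    """Quadratic scoring loss sum_k (P_k - I_k)^2: 0 if all mass is on the
    realized strategy, 2 if all mass is on a single wrong strategy."""
    I = np.zeros_like(P)
    I[played_idx] = 1.0
    return float(np.sum((P - I) ** 2))

print(qsr_loss(np.full(21, 1 / 21), 0))  # uniform forecast: 1 - 1/21 ≈ 0.952
```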

Page 24: Belief Learning in an Unstable Infinite Game

[Figure: 3-D plot of STEWA choice probabilities, λ = 4 — Session 3, Player 2, Periods 11–30; axes: Strategy (−10 to 10), Period, EWA Prob (0 to 1)]

Page 25: Belief Learning in an Unstable Infinite Game

[Figure: 3-D plot of STEWA choice probabilities, λ = 0.04 — Session 3, Player 2, Periods 11–30; axes: Strategy (−10 to 10), Period, EWA Prob (0 to 1)]

Page 26: Belief Learning in an Unstable Infinite Game

STEWA: Adjustment Attempts

• The problem: near misses in strategy space, not in time
• Suggests: alter δ (weight on hypotheticals)
  – Original specification: QSR* = 1.193 @ λ* = 0.04
  – δ = 0.7 (p-beauty est.): QSR* = 1.056 @ λ* = 0.03
  – δ = 1 (belief model): QSR* = 1.082 @ λ* = 0.175
  – δ(k,t) = % of BR payoff: QSR* = 1.077 @ λ* = 0.06

• Altering φ:
  – 1/8 weight on surprises: QSR* = 1.228 @ λ* = 0.04

Page 27: Belief Learning in an Unstable Infinite Game

STEWA: Other Modifications

• Equal initial attractions: worse

• Smoothing
  – Takes advantage of strategy space structure
    • λ spreads probability across strategies evenly
    • Smoothing spreads probability to nearby strategies
  – Smoothed attractions
  – Smoothed probabilities
  – But… no improvement in QSR* or λ*!

• Tentative conclusion:
  – STEWA: not broken, or can’t be fixed…

Page 28: Belief Learning in an Unstable Infinite Game

Other Standard Models

• Nash equilibrium
• Uniform mixed strategy (‘Random’)
• Logistic Cournot BR
• Deterministic Cournot BR
• Logistic fictitious play
• Deterministic fictitious play
• k-period BR:

  s_t = BR( (1/k) Σ_{τ=t-k}^{t-1} s_{-i,τ} )
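The k-period BR rule can be sketched as follows (`br` stands in for any best-response function mapping an opponent profile to a strategy; the helper name is mine):

```python
import numpy as np

def k_period_br(opp_history, k, br):
    """Best respond to the average opponent profile over the last k periods.
    opp_history: list of past opponent profiles, most recent last."""
    recent = np.mean(np.asarray(opp_history[-k:], dtype=float), axis=0)
    return br(recent)
```

With k = 1 this reduces to Cournot best response; as k grows toward the full history it approaches fictitious play against the empirical average.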

Page 29: Belief Learning in an Unstable Infinite Game

“New” Models

• Best respond to stated beliefs (Sessions 1–6 only)

• Best respond to calculator entries
  – Issue: how to aggregate calculator usage?
  – Decaying average of inputs

• Reinforcement based on calculator payoffs
  – Decaying average of payoffs
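One simple aggregator for the calculator entries (a sketch of the ‘decaying average’ idea; the geometric weighting scheme is an assumption):

```python
import numpy as np

def decayed_average(entries, delta=0.5):
    """Weight an entry of age m (0 = most recent) by delta**m, then normalize.
    entries: sequence of calculator inputs, ordered oldest to newest."""
    entries = np.asarray(entries, dtype=float)
    ages = np.arange(len(entries))[::-1]   # most recent entry has age 0
    w = delta ** ages
    return float(np.dot(w, entries) / w.sum())
```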

Page 30: Belief Learning in an Unstable Infinite Game

Model Comparisons

MODEL               PARAM   BIC            2-QSR            MAD             MSD

Random Choice*      N/A     In: Infinite   In: 0.952        In: 7.439       In: 82.866
                                           Out: 0.878       Out: 7.816      Out: 85.558

Logistic STEWA*     λ       In: Infinite   In: 0.807        In: 3.818       In: 34.172
                                           Out: 0.665       Out: 3.180      Out: 22.853
                                           λ* = 0.04        λ* = 0.41       λ* = 0.35

Logistic Cournot*   λ       In: Infinite   In: 0.952        In: 4.222       In: 38.186
                                           Out: 0.878       Out: 3.557      Out: 25.478
                                           λ* = 0.00 (!)    λ* = 4.30       λ* = 4.30

Logistic F.P.*      λ       In: Infinite   In: 0.955        In: 4.265       In: 31.062
                                           Out: 0.878       Out: 3.891      Out: 22.133
                                           λ* = 14.98       λ* = 4.47       λ* = 4.47

* Estimates on the grid of integers {-10, -9, …, 9, 10}

In = periods 1–35; Out = periods 36–end

Page 31: Belief Learning in an Unstable Infinite Game

Model Comparisons 2

MODEL                      PARAM      MAD             MSD

BR(Guesses)                N/A        In: 5.5924      In: 57.874
(6 sessions only)                     Out: 3.3693     Out: 19.902

BR(Calculator Input)       δ (=1/2)   In: 6.394       In: 79.29
                                      Out: 8.263      Out: 116.7

Calculator Reinforcement*  δ (=1/2)   In: 7.389       In: 82.407
                                      Out: 7.815      Out: 85.495

k-Period BR                k          In: 4.2126      In: 35.185
                                      Out: 3.582      Out: 23.455
                                      k* = 4          k* = 4

Cournot                    N/A        In: 4.7974      In: 45.283
                                      Out: 3.857      Out: 29.058

Weighted F.P.              δ          In: 4.500       In: 38.290
                                      Out: 3.518      Out: 22.426
                                      δ* = 0.56       δ* = 0.65

Page 32: Belief Learning in an Unstable Infinite Game

The “Take-Homes”

• Methodological issues
  – Infinite strategy space
  – Convergence vs. instability
  – Right notion of error

• Self-Tuning EWA fits best.

• Guesses & calculator input don’t seem to offer any more predictive power… ?!?!