Lecture 6: EWA (and others) - Heidelberg University, 2 November 2016
-
Today
Experience-Weighted Attraction learning
Model comparison
Directional learning / trial & error learning
Imitation
Summary
2 / 44
-
EWA
Paper: Camerer & Ho (1999)
Broad model with several parameters
Contains Fictitious Play (FP) and Reinforcement Learning (RL) as special cases
FP and RL can be tested within the model
When fitted, EWA will always fit at least as well as FP and RL
Similarities and differences between the two can be shown in the model
3 / 44
-
EWA: model
n players, indexed by i
s_i^j ∈ S_i: strategy j of player i; strategy spaces are finite
π_i(s_i(t), s_{−i}(t)): payoff of player i in round t
In RL, the state was determined by, for each player, a vector with one entry per strategy, called propensities
In EWA, each player has a similar vector with one entry per strategy, A(t), called attractions
In addition, the state includes a number N(t), termed observation-equivalents
Note: N is not the number of players
A and N are updated each round to form the learning theory
4 / 44
-
EWA: N
N(t): “observation-equivalents”, roughly the same as previous rounds of experience
N(0): value of N at the start of the game, one of the parameters of the theory
Updating rule for N:

    N(t) = ρ N(t−1) + 1,  t ≥ 1

A fixed value of 1 is added each round
Previous experience is depreciated with a discount factor ρ
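As a quick numerical check (a sketch with illustrative values of ρ and N(0), not taken from the slides), the recursion is easy to iterate; for ρ < 1 it converges to 1/(1 − ρ):

```python
# Iterate the observation-equivalents recursion N(t) = rho * N(t-1) + 1.
# rho and the starting value are illustrative, not from the lecture.

def update_N(N_prev, rho):
    """One round of the EWA experience update."""
    return rho * N_prev + 1.0

rho, N = 0.9, 1.0  # start from N(0) = 1
for t in range(200):
    N = update_N(N, rho)

print(round(N, 4))  # for rho < 1 this approaches 1 / (1 - rho) = 10 here
```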
5 / 44
-
EWA: A
A_i^j(t): “attraction” of player i’s strategy j in round t
A_i^j(0): initial attractions at the start of the game, parameters of the theory
Updating rule for A consists of 3 parts:
1. Keeping (discounted) old attraction: first term in the numerator
2. Adding value given the outcome of the current round: second term in the numerator
3. Normalization with experience: denominator
6 / 44
-
EWA: A
1. Keeping discounted old attractions:

    φ N(t−1) A_i^j(t−1)

φ: a second discount factor, for old attractions, which can be different from ρ
7 / 44
-
EWA: A
2. Adding value for the current round:

    φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t))

I is the indicator function: = 1 if the strategy was used this round, = 0 if it was not
δ determines the weight given to used vs. unused strategies:
δ = 0: only the used strategy is updated (as in RL)
δ = 1: used and unused strategies are treated the same (as in FP)
δ is another parameter of the theory
8 / 44
-
EWA: A
3. Normalization with experience:

    A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / N(t)

    N(t) = ρ N(t−1) + 1

N(0) is pre-game experience; the further evolution of N tracks in-game experience
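One full update round can be sketched in Python; the function and the two-strategy example below are illustrative (variable names and numbers are my own, not from Camerer & Ho):

```python
# One EWA attraction update for a single player, following
#   A_j(t) = [phi*N(t-1)*A_j(t-1) + (delta + (1-delta)*I_j) * pi_j] / N(t),
#   N(t)   = rho*N(t-1) + 1.
# All numbers below are illustrative.

def ewa_update(A, N_prev, chosen, payoffs, phi, rho, delta):
    """A: list of attractions, one per strategy.
    chosen: index of the strategy actually played this round.
    payoffs[j]: payoff strategy j would have earned against the
    opponents' realized play (hypothetical for j != chosen)."""
    N = rho * N_prev + 1.0
    A_new = []
    for j, (a, pi) in enumerate(zip(A, payoffs)):
        indicator = 1.0 if j == chosen else 0.0
        weight = delta + (1.0 - delta) * indicator
        A_new.append((phi * N_prev * a + weight * pi) / N)
    return A_new, N

# Example round: two strategies, strategy 0 was actually played.
A, N = ewa_update([0.0, 0.0], 1.0, chosen=0,
                  payoffs=[3.0, 5.0], phi=0.9, rho=0.9, delta=0.5)
```

Note how the unused strategy 1 still gains attraction here, because δ > 0 puts weight on its hypothetical payoff.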
9 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

In FP, the initial weights determine how strongly the “prior” beliefs about the game influence gameplay
Similarly, here N(0) determines the strength of the pre-game influence. Someone with an extremely high N(0) will not change his play according to actual experience, but stick to his pre-conceived knowledge of the game
10 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / N(t)

N(t) = ρ N(t−1) + 1

The initial attractions need not be distributed evenly. Just as N(0) reflects the strength of pre-game knowledge, A_i^j(0) reflects what exactly you know prior to the game
Note the similarity with RL’s propensities and initial propensities
11 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

Two discount factors. Camerer and Ho say: “These factors combine cognitive phenomena like forgetting with a deliberate tendency to discount old experience when the environment is changing”
... I don’t see why you need two different discount factors for that ...
Both parameters together govern the steady-state attraction level: (1−ρ)/(1−φ) times the steady-state average payoff
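The steady-state factor can be derived in two lines (a sketch of the algebra, writing π̄ for the steady-state average of the weighted payoff term, and assuming ρ, φ < 1):

```latex
% Steady state of N:
N^* = \rho N^* + 1 \quad\Rightarrow\quad N^* = \frac{1}{1-\rho}
% Steady state of A, replacing the payoff term by its average \bar{\pi}:
A^* = \frac{\phi N^* A^* + \bar{\pi}}{N^*}
\quad\Rightarrow\quad
A^* N^* (1-\phi) = \bar{\pi}
\quad\Rightarrow\quad
A^* = \frac{1-\rho}{1-\phi}\,\bar{\pi}
```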
12 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

δ determines which strategies players base their learning on
The bigger δ, the more strongly hypothetical payoffs are used as a basis for learning
With δ = 0, only actual payoffs are used
With δ = 1, the learner does not differentiate between hypothetical and actual payoffs
Easy to interpret δ in terms of FP and RL
13 / 44
-
EWA: Choice
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

Question for the audience: Which strategies are played if EWA is used?
Hint: It is a trick question
I did not tell you yet!
14 / 44
-
EWA: Choice
So far, we have only seen how attractions are generated from parameters and past play
How do attractions map into behavior? Camerer and Ho propose 3 forms (logit, probit, power) and use logit:

    P_i^j(t+1) = e^(λ A_i^j(t)) / Σ_{k=1}^{m_i} e^(λ A_i^k(t))

where P_i^j(t+1) is the probability that player i plays strategy j in round t+1 and m_i is the number of strategies of player i
λ is then another parameter of the model
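The logit rule is a standard softmax over attractions; a minimal sketch (the attraction values and λ below are illustrative):

```python
import math

def logit_choice(attractions, lam):
    """Map attractions to choice probabilities via the logit rule:
    P_j = exp(lam * A_j) / sum_k exp(lam * A_k)."""
    # Subtract the max before exponentiating for numerical stability;
    # the probabilities are unchanged by this shift.
    m = max(attractions)
    weights = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# lam -> 0 gives uniform play; large lam approaches best response.
probs = logit_choice([1.0, 2.0, 0.5], lam=1.0)
```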
15 / 44
-
EWA: Parameters
Before we continue, a quick parameter count:
1. ρ: discounting
2. φ: discounting
3. N(0): strength of previous experience
4. A_i^j(0): “shape” of previous experience (actually up to i × j parameters)
5. δ: weight on hypothetical payoffs
6. λ: sensitivity to attractions

Cue a comment from Christoph
The model will obviously fit very well, due to the many parameters used
Questionable whether it can be used to predict behavior
Good at organising behavior, separating different learning behaviors, all in one model, while keeping (some) psychological basis
16 / 44
-
EWA: Special cases
The model collapses to Reinforcement Learning if:
ρ = 0
N(0) = 1
δ = 0
The model is equivalent to Fictitious Play / Cournot adjustment if:
ρ = φ (= 1 for FP, = 0 for Cournot adjustment)
δ = 1
and a proper choice rule is used (max)
Because of the two discounting parameters, EWA is not just a convex combination of RL and FP, but broader
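The RL collapse can be checked numerically. The sketch below plugs ρ = 0, N(0) = 1, δ = 0 into a generic EWA step (the helper function and all numbers are illustrative, not from the paper):

```python
# With rho = 0, N(0) = 1, delta = 0, the EWA update collapses to
#   A_j(t) = phi * A_j(t-1) + payoff  if j was played,
#   A_j(t) = phi * A_j(t-1)           otherwise,
# i.e. (discounted) reinforcement learning. Numbers are illustrative.

def ewa_step(A, N_prev, chosen, payoffs, phi, rho, delta):
    """Generic EWA update: returns (new attractions, new N)."""
    N = rho * N_prev + 1.0
    out = []
    for j, (a, pi) in enumerate(zip(A, payoffs)):
        w = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        out.append((phi * N_prev * a + w * pi) / N)
    return out, N

A, N = [1.0, 2.0], 1.0
A, N = ewa_step(A, N, chosen=0, payoffs=[4.0, 7.0],
                phi=0.8, rho=0.0, delta=0.0)
# N stays at 1, and only the played strategy's attraction gains its payoff:
# A = [0.8*1 + 4, 0.8*2] = [4.8, 1.6]
```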
17 / 44
-
EWA: Experimental fit
Camerer and Ho test EWA and its special cases extensively, using data from several games
EWA fits better, even when penalized for additional parameters (but: RL and FP are used with non-uniform initial weights/propensities, so more parameters here as well)
However: many technical decisions must be made when trying to design a “fair” test, so the authors’ test is not entirely convincing
Estimated parameters of EWA differ, sometimes widely, across different types of games
18 / 44
-
EWA lite
Updated version of EWA, which tries to address the critique that EWA has too many parameters
Replaces all EWA parameters with functions determined by the game structure and one additional parameter (so EWA lite has just one parameter)
A good inner compromise between the edge cases of FP, RL, etc.??
Personally, I rather believe in humans applying simpler “rules of thumb” and choosing an appropriate rule of thumb for each situation
19 / 44