Lecture 6: EWA (and others) - Heidelberg University, 2 November 2016
-
Today
Experience-Weighted Attraction learning
Model comparison
Directional learning / trial & error learning
Imitation
Summary
2 / 44
-
EWA
Paper: Camerer & Ho (1999)
Broad model with several parameters
Contains Fictitious Play (FP) and Reinforcement Learning (RL) as special cases
FP and RL can be tested within the model
When fitted, EWA will always fit at least as well as FP and RL
Similarities and differences between the two can be shown in the model
3 / 44
-
EWA: model
n players, indexed by i
s_i^j ∈ S_i: strategy j of player i; strategy spaces are finite
π_i(s_i(t), s_{−i}(t)): payoff of player i in round t
In RL, the state was determined by, for each player, a vector with one entry per strategy, called propensities
In EWA, each player has a similar vector with one entry per strategy, A(t), called attractions
In addition, the state includes a number N(t), termed observation-equivalents
Note: N is not the number of players
A and N are updated each round to form the learning theory
4 / 44
-
EWA: N
N(t): “observation-equivalents”, roughly the same as previous rounds of experience
N(0): value of N at the start of the game, one of the parameters of the theory
Updating rule for N:

    N(t) = ρ N(t−1) + 1,  t ≥ 1

A fixed value of 1 is added each round
Previous experience is depreciated with a discount factor ρ
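As a quick numerical check (a sketch with illustrative values of ρ and N(0), not taken from the slides), the recursion is easy to iterate; for ρ < 1 it converges to 1/(1 − ρ):

```python
# Iterate the observation-equivalents recursion N(t) = rho * N(t-1) + 1.
# rho and the starting value are illustrative, not from the lecture.

def update_N(N_prev, rho):
    """One round of the EWA experience update."""
    return rho * N_prev + 1.0

rho, N = 0.9, 1.0  # start from N(0) = 1
for t in range(200):
    N = update_N(N, rho)

print(round(N, 4))  # for rho < 1 this approaches 1 / (1 - rho) = 10 here
```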
5 / 44
-
EWA: A
A_i^j(t): “attraction” of player i’s strategy j in round t
A_i^j(0): initial attractions at the start of the game, parameters of the theory
Updating rule for A consists of 3 parts:
1. Keeping (discounted) old attraction: first term in the numerator
2. Adding value given the outcome of the current round: second term in the numerator
3. Normalization with experience: denominator
6 / 44
-
EWA: A
1. Keeping discounted old attractions:

    φ N(t−1) A_i^j(t−1)

φ: a second discount factor, for old attractions, which can be different from ρ
7 / 44
-
EWA: A
2. Adding value for the current round:

    φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t))

I is the indicator function: = 1 if the strategy was used this round, = 0 if it was not
δ determines the weight given to used vs. unused strategies:
δ = 0: only the used strategy is updated (as in RL)
δ = 1: used and unused strategies are treated the same (as in FP)
δ is another parameter of the theory
8 / 44
-
EWA: A
3. Normalization with experience:

    A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / N(t)

    N(t) = ρ N(t−1) + 1

N(0) is pre-game experience; the further evolution of N tracks in-game experience
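One full update round can be sketched in Python; the function and the two-strategy example below are illustrative (variable names and numbers are my own, not from Camerer & Ho):

```python
# One EWA attraction update for a single player, following
#   A_j(t) = [phi*N(t-1)*A_j(t-1) + (delta + (1-delta)*I_j) * pi_j] / N(t),
#   N(t)   = rho*N(t-1) + 1.
# All numbers below are illustrative.

def ewa_update(A, N_prev, chosen, payoffs, phi, rho, delta):
    """A: list of attractions, one per strategy.
    chosen: index of the strategy actually played this round.
    payoffs[j]: payoff strategy j would have earned against the
    opponents' realized play (hypothetical for j != chosen)."""
    N = rho * N_prev + 1.0
    A_new = []
    for j, (a, pi) in enumerate(zip(A, payoffs)):
        indicator = 1.0 if j == chosen else 0.0
        weight = delta + (1.0 - delta) * indicator
        A_new.append((phi * N_prev * a + weight * pi) / N)
    return A_new, N

# Example round: two strategies, strategy 0 was actually played.
A, N = ewa_update([0.0, 0.0], 1.0, chosen=0,
                  payoffs=[3.0, 5.0], phi=0.9, rho=0.9, delta=0.5)
```

Note how the unused strategy 1 still gains attraction here, because δ > 0 puts weight on its hypothetical payoff.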
9 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

In FP, the initial weights determine how strongly the “prior” beliefs about the game influence gameplay
Similarly, here N(0) determines the strength of the pre-game influence. Someone with an extremely high N(0) will not change his play according to actual experience, but stick to his pre-conceived knowledge of the game
10 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / N(t)

N(t) = ρ N(t−1) + 1

The initial attractions need not be distributed evenly. Just as N(0) reflects the strength of pre-game knowledge, A_i^j(0) reflects what exactly you know prior to the game
Note the similarity with RL’s propensities and initial propensities
11 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

Two discount factors. Camerer and Ho say: “These factors combine cognitive phenomena like forgetting with a deliberate tendency to discount old experience when the environment is changing”
... I don’t see why you need two different discount factors for that ...
Both parameters together govern the steady-state attraction level: (1−ρ)/(1−φ) times the steady-state average payoff
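The steady-state factor can be derived in two lines (a sketch of the algebra, writing π̄ for the steady-state average of the weighted payoff term, and assuming ρ, φ < 1):

```latex
% Steady state of N:
N^* = \rho N^* + 1 \quad\Rightarrow\quad N^* = \frac{1}{1-\rho}
% Steady state of A, replacing the payoff term by its average \bar{\pi}:
A^* = \frac{\phi N^* A^* + \bar{\pi}}{N^*}
\quad\Rightarrow\quad
A^* N^* (1-\phi) = \bar{\pi}
\quad\Rightarrow\quad
A^* = \frac{1-\rho}{1-\phi}\,\bar{\pi}
```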
12 / 44
-
EWA: Parameters
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

δ determines which strategies players base their learning on
The bigger δ, the more strongly hypothetical payoffs are used as a basis for learning
With δ = 0, only actual payoffs are used
With δ = 1, the learner does not differentiate between hypothetical and actual payoffs
Easy to interpret δ in terms of FP and RL
13 / 44
-
EWA: Choice
A_i^j(t) = ( φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) ) / ( ρ N(t−1) + 1 )

Question for the audience: Which strategies are played if EWA is used?
Hint: It is a trick question
I did not tell you yet!
14 / 44
-
EWA: Choice
So far, we have only seen how attractions are generated from parameters and past play
How do attractions map into behavior? Camerer and Ho propose 3 forms (logit, probit, power) and use logit:

    P_i^j(t+1) = e^(λ A_i^j(t)) / Σ_{k=1}^{m_i} e^(λ A_i^k(t))

where P_i^j(t+1) is the probability that player i plays strategy j in round t+1 and m_i is the number of strategies of player i
λ is then another parameter of the model
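The logit rule is a standard softmax over attractions; a minimal sketch (the attraction values and λ below are illustrative):

```python
import math

def logit_choice(attractions, lam):
    """Map attractions to choice probabilities via the logit rule:
    P_j = exp(lam * A_j) / sum_k exp(lam * A_k)."""
    # Subtract the max before exponentiating for numerical stability;
    # the probabilities are unchanged by this shift.
    m = max(attractions)
    weights = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# lam -> 0 gives uniform play; large lam approaches best response.
probs = logit_choice([1.0, 2.0, 0.5], lam=1.0)
```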
15 / 44
-
EWA: Parameters
Before we continue, a quick parameter count:
1. ρ: discounting
2. φ: discounting
3. N(0): strength of previous experience
4. A_i^j(0): “shape” of previous experience (actually up to i × j parameters)
5. δ: weight on hypothetical payoffs
6. λ: sensitivity to attractions

Cue a comment from Christoph
The model will obviously fit very well, due to the many parameters used
Questionable whether it can be used to predict behavior
Good at organising behavior, separating different learning behaviors, all in one model, while keeping (some) psychological basis
16 / 44
-
EWA: Special cases
The model collapses to Reinforcement Learning if:
ρ = 0
N(0) = 1
δ = 0
The model is equivalent to Fictitious Play / Cournot adjustment if:
ρ = φ (= 1 for FP, = 0 for Cournot adjustment)
δ = 1
and a proper choice rule is used (max)
Because of the two discounting parameters, EWA is not just a convex combination of RL and FP, but broader
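The RL collapse can be checked numerically. The sketch below plugs ρ = 0, N(0) = 1, δ = 0 into a generic EWA step (the helper function and all numbers are illustrative, not from the paper):

```python
# With rho = 0, N(0) = 1, delta = 0, the EWA update collapses to
#   A_j(t) = phi * A_j(t-1) + payoff  if j was played,
#   A_j(t) = phi * A_j(t-1)           otherwise,
# i.e. (discounted) reinforcement learning. Numbers are illustrative.

def ewa_step(A, N_prev, chosen, payoffs, phi, rho, delta):
    """Generic EWA update: returns (new attractions, new N)."""
    N = rho * N_prev + 1.0
    out = []
    for j, (a, pi) in enumerate(zip(A, payoffs)):
        w = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        out.append((phi * N_prev * a + w * pi) / N)
    return out, N

A, N = [1.0, 2.0], 1.0
A, N = ewa_step(A, N, chosen=0, payoffs=[4.0, 7.0],
                phi=0.8, rho=0.0, delta=0.0)
# N stays at 1, and only the played strategy's attraction gains its payoff:
# A = [0.8*1 + 4, 0.8*2] = [4.8, 1.6]
```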
17 / 44
-
EWA: Experimental fit
Camerer and Ho test EWA and its special cases extensively, using data from several games
EWA fits better, even when penalized for additional parameters (but: RL and FP are used with non-uniform initial weights/propensities, so more parameters here as well)
However: many technical decisions must be made when trying to design a “fair” test, so the authors’ test is not entirely convincing
Estimated parameters of EWA differ, sometimes widely, across different types of games
18 / 44
-
EWA lite
Updated version of EWA, which tries to address the critique that EWA has too many parameters
Replaces all EWA parameters with functions determined by the game structure and one additional parameter (so EWA lite has just one parameter)
A good inner compromise between the edge cases of FP, RL, etc.??
Personally, I rather believe in humans applying simpler “rules of thumb” and choosing an appropriate rule of thumb for each situation
19 / 44