Lecture 6: EWA (and others) - Heidelberg University


  • Today

    Experience-Weighted Attraction learning
    Model Comparison
    Directional learning / Trial & Error learning
    Imitation
    Summary



  • EWA

    Paper: Camerer & Ho (1999)
    Broad model with several parameters
    Contains Fictitious Play (FP) and Reinforcement Learning (RL) as special cases

    FP and RL can be tested within the model
    When fitted, EWA will always be at least as good as FP and RL
    Similarities and differences between the two can be shown in the model


  • EWA: model

    n players, indexed by i
    s_i^j ∈ S_i: strategy j of player i, from a finite strategy space
    π_i(s_i(t), s_{−i}(t)): payoff of player i in round t
    In RL, the state was determined by, for each player, a vector which included one entry per strategy, called propensities
    In EWA, each player has a similar vector with one entry per strategy, A(t), called attractions
    In addition to that, the state also includes a number N(t), which is termed observation-equivalents

    Note: N is not the number of players

    A and N are updated each round to form the learning theory
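
    As a rough sketch of this state, in Python (my own illustration; the names EWAState and initial_state are not from the paper):

    ```python
    # Minimal sketch of a single player's EWA state, following the slides' notation.
    # Names are illustrative, not from Camerer & Ho (1999).
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class EWAState:
        attractions: List[float]  # A_i^j(t), one entry per own strategy j
        N: float                  # N(t), the "observation-equivalents"

    def initial_state(initial_attractions: List[float], N0: float) -> EWAState:
        """Pre-game state: A_i^j(0) and N(0) are parameters of the theory."""
        return EWAState(attractions=list(initial_attractions), N=N0)
    ```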


  • EWA: N

    N(t): “observation-equivalents”, roughly the same as previous rounds of experience
    N(0): value of N at the start of the game, one of the parameters of the theory
    Updating rule for N:

    N(t) = ρN(t − 1) + 1, t ≥ 1

    A fixed value of 1 is added each round; previous experience is depreciated with a discount factor ρ
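
    As a quick numerical illustration (the specific numbers are mine, not from the slides), iterating the rule shows N(t) approaching 1/(1 − ρ):

    ```python
    # Illustrative only: iterate N(t) = rho * N(t-1) + 1 with rho = 0.9, N(0) = 1.
    rho, N = 0.9, 1.0
    for t in range(1, 51):
        N = rho * N + 1       # N(t) = rho * N(t-1) + 1
    print(round(N, 2))        # ~9.95, approaching 1 / (1 - rho) = 10
    ```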


  • EWA: A

    A_i^j(t): “attraction” of player i’s strategy j in round t

    A_i^j(0): initial attractions at the start of the game, parameters of the theory
    Updating rule for A consists of 3 parts:

    1. Keeping the (discounted) old attraction: first entry in the numerator

    2. Adding value given the outcome of the current round: second entry in the numerator

    3. Normalization with experience: denominator


  • EWA: A

    1. Keeping discounted old attractions:

    φ N(t−1) A_i^j(t−1)

    φ: second discount rate, for old attractions; can be different from ρ


  • EWA: A

    2. Adding value for current round

    φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t))

    I is the indicator function, which is = 1 if the strategy was used this round, and = 0 if the strategy was not used
    δ determines the weight added for used vs. unused strategies:

    δ = 0: only the used strategy is updated (as in RL)
    δ = 1: used and unused strategies are treated the same (as in FP)
    δ is another parameter of the theory
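
    As a small sketch (assumed Python; the function name payoff_weight is mine), the bracketed weight on a strategy's payoff is simply:

    ```python
    # Illustrative: the weight [delta + (1 - delta) * I(s_i^j, s_i(t))] applied to the
    # payoff of strategy j, depending on whether j was actually played this round.
    def payoff_weight(delta: float, used: bool) -> float:
        return delta + (1 - delta) * (1.0 if used else 0.0)

    # payoff_weight(0.0, used=False) == 0.0  -> unused strategies get nothing (RL-like)
    # payoff_weight(1.0, used=False) == 1.0  -> hypothetical payoffs count fully (FP-like)
    ```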


  • EWA: A

    3. Normalization with experience

    A_i^j(t) = { φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) } / N(t)

    N(t) = ρ N(t−1) + 1

    N(0) is pre-game experience; the subsequent N(t) accumulate in-game experience
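
    Putting the three parts together, one round of updating might be sketched in Python like this (a minimal sketch; function and variable names are my own, not from the paper):

    ```python
    # Minimal sketch of one EWA round for a single player, following the slides' formulas.
    from typing import List, Tuple

    def ewa_update(attractions: List[float], N: float,
                   chosen: int, payoffs: List[float],
                   phi: float, rho: float, delta: float) -> Tuple[List[float], float]:
        """attractions: A_i^j(t-1) for every own strategy j
        N:           N(t-1)
        chosen:      index of the strategy actually played, s_i(t)
        payoffs:     pi_i(s_i^j, s_-i(t)) for every own strategy j,
                     given the opponents' realized play
        """
        N_new = rho * N + 1                                   # N(t) = rho * N(t-1) + 1
        new_attractions = []
        for j, (A, pi) in enumerate(zip(attractions, payoffs)):
            indicator = 1.0 if j == chosen else 0.0           # I(s_i^j, s_i(t))
            weight = delta + (1 - delta) * indicator          # delta-weighting of payoffs
            new_attractions.append((phi * N * A + weight * pi) / N_new)
        return new_attractions, N_new
    ```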


  • EWA: Parameters

    A_i^j(t) = { φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) } / (ρ N(t−1) + 1)

    In FP, the initial weights determine how strongly the “prior” beliefs about the game influence gameplay. Similarly here, N(0) determines the strength of pre-game influence: someone with an extremely high N(0) will not change his play according to actual experience, but stick to his pre-conceived knowledge of the game


  • EWA: Parameters

    A_i^j(t) = { φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) } / N(t)

    N(t) = ρ N(t−1) + 1

    The initial attractions need not be distributed evenly. Just as N(0) reflects the strength of pre-game knowledge, A_i^j(0) reflects what exactly you know prior to the game. Note the similarity with RL’s propensities and initial propensities


  • EWA: Parameters

    A_i^j(t) = { φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) } / (ρ N(t−1) + 1)

    Two discount factors. Camerer and Ho say: “These factors combine cognitive phenomena like forgetting with a deliberate tendency to discount old experience when the environment is changing” ... I don’t see why you need 2 different discount factors for that ...

    Both parameters together govern the steady-state attraction level: (1 − ρ)/(1 − φ) times the steady-state average payoff
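
    A quick way to see where this factor comes from (my own back-of-the-envelope check, assuming the payoff term settles at a constant average π): N converges to N* = 1/(1 − ρ), and the update then gives A* = (φ N* A* + π)/N*. Rearranging, A* N* (1 − φ) = π, hence A* = π (1 − ρ)/(1 − φ).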


  • EWA: Parameters

    A_i^j(t) = { φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) } / (ρ N(t−1) + 1)

    δ determines which strategies players base their learning on. The bigger δ, the more weight hypothetical payoffs get as a basis for learning

    With δ = 0, only actual payoffs are used
    With δ = 1, the learner does not differentiate between hypothetical and actual payoffs

    Easy to interpret δ in terms of FP and RL


  • EWA: Choice

    A_i^j(t) = { φ N(t−1) A_i^j(t−1) + [δ + (1−δ) I(s_i^j, s_i(t))] π_i(s_i^j, s_{−i}(t)) } / (ρ N(t−1) + 1)

    Question for the audience: Which strategies are played if EWA is used?

    Hint: It is a trick question. I did not tell you yet!


  • EWA: Choice

    So far, we only saw how attractions are generated from parameters and past play. How do attractions map into behavior? Camerer and Ho propose 3 forms (logit, probit, power) and use logit:

    P_i^j(t+1) = exp(λ A_i^j(t)) / Σ_{k=1}^{m_i} exp(λ A_i^k(t))

    where P_i^j(t+1) is the probability of player i playing strategy j in round t+1, and m_i is the number of strategies player i has. λ is then another parameter of the model
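
    A minimal sketch of this choice rule in Python (the subtraction of the maximum is a standard numerical safeguard and my addition, not part of the slides):

    ```python
    # Illustrative logit (softmax) choice rule over a player's attractions.
    import math
    from typing import List

    def logit_choice_probs(attractions: List[float], lam: float) -> List[float]:
        """P_i^j(t+1) = exp(lam * A_i^j(t)) / sum_k exp(lam * A_i^k(t))."""
        m = max(attractions)
        weights = [math.exp(lam * (A - m)) for A in attractions]
        total = sum(weights)
        return [w / total for w in weights]
    ```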


  • EWA: Parameters

    Before we continue, a quick parameter count:
    1. ρ: discounting
    2. φ: discounting
    3. N(0): strength of previous experience
    4. A_i^j(0): “shape” of previous experience (actually up to i × j parameters)
    5. δ: weight on hypothetical payoffs
    6. λ: sensitivity to attractions

    Cue a comment from Christoph
    The model will obviously fit very well, due to the many parameters used
    Questionable whether it can be used to predict behavior
    Good at organising behavior, separating different learning behaviors, all in one model, while keeping (some) psychological base



  • EWA: Special cases

    The model collapses to Reinforcement Learning if:
    ρ = 0
    N(0) = 1
    δ = 0

    The model is equivalent to Fictitious Play/Cournot adjustment if:

    ρ = φ (= 1 for FP, = 0 for Cournot adjustment)
    δ = 1
    and a proper choice rule is used (max)

    Because of the two discounting parameters, EWA is not just a convex combination of RL and FP, but broader
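
    To make the special cases concrete, here are illustrative parameter settings (my own summary in code form, matching the conditions above; not taken from the paper):

    ```python
    # Illustrative parameter choices for the special cases named above.
    reinforcement_learning = dict(rho=0.0, delta=0.0)            # plus N(0) = 1; phi is the RL discount
    fictitious_play        = dict(rho=1.0, phi=1.0, delta=1.0)   # plus a best-reply (max) choice rule
    cournot_adjustment     = dict(rho=0.0, phi=0.0, delta=1.0)   # only the last round matters
    ```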


  • EWA: Experimental fit

    Camerer and Ho test EWA and its special cases extensively, using data from several games
    EWA fits better, even when penalized for additional parameters (but: RL and FP are used with non-uniform initial weights/propensities, so more parameters here as well)
    However: many technical decisions must be made when trying to design a “fair” test, so the authors’ test is not entirely convincing
    Estimated parameters of EWA differ, sometimes widely, across different types of games


  • EWA lite

    Updated version of EWA, which tries to address the critique that EWA has too many parameters
    Replaces all EWA parameters with functions determined by the game structure and one additional parameter (so EWA lite has just one parameter)
    Good inner compromise between the edge cases of FP, RL, etc.??

    Personally, I rather believe in humans applying simpler “rules of thumb” and choosing an appropriate rule of thumb for each situation
