Lecture 5

Lecture V: Game Theory. Zhixin Liu, Complex Systems Research Center, Academy of Mathematics and Systems Sciences, CAS


Transcript of Lecture 5

  • Lecture V: Game Theory Zhixin Liu Complex Systems Research Center, Academy of Mathematics and Systems Sciences, CAS

  • In the last two lectures, we talked about multi-agent systems: analysis and intervention.

  • In this lecture, we will talk about game theory: the study of the complex strategic interactions between people.

  • Start With A Game: Rock-paper-scissors. Payoff matrix (row: A, column: B; A's payoff first):

                  rock     paper    scissors
    rock          0,0      -1,1     1,-1
    paper         1,-1     0,0      -1,1
    scissors      -1,1     1,-1     0,0

    Other games: poker, go, chess, bridge, basketball, football, ...

  • From Games To Game Theory. Some hints from the games: rules, results (payoffs), strategies, and the interactions between strategies and payoffs. Games are everywhere. Economic systems: oligarchy and monopoly, markets, trade. Political systems: voting, presidential elections, international relations. Military systems: wars, negotiations, ... Game theory: the study of the strategic interactions among rational agents. Rationality implies that each player tries to maximize his/her payoff,

    not to beat the other players.

  • History of Game Theory. 1928: John von Neumann proved the minimax theorem. 1944: John von Neumann & Oskar Morgenstern, Theory of Games and Economic Behavior. 1950s: John Nash, the Nash equilibrium. 1970s: John Maynard Smith, the evolutionarily stable strategy. Eight game theorists have won Nobel prizes in economics.

  • Elements of A Game. Players: who is interacting? N = {1, 2, ..., n}. Actions/moves: what can the players do? Each player i has an action set. Payoff: what the players can get from the game.

  • Strategy. Strategy: a complete plan of actions. Mixed strategy: a probability distribution over the pure strategies.

    Payoff: the expected payoff under the players' mixed strategies. A pure strategy is a special kind of mixed strategy.

  • An Example: Rock-paper-scissors. Players: A and B. Actions/moves: {rock, paper, scissors}. Payoffs: e.g., u1(rock, scissors) = 1, u2(scissors, paper) = -1. Mixed strategies: s1 = (1/3, 1/3, 1/3), s2 = (0, 1/2, 1/2), over (rock, paper, scissors). Then

    u1(s1, s2) = 1/3(0·0 + 1/2·(-1) + 1/2·1) + 1/3(0·1 + 1/2·0 + 1/2·(-1)) + 1/3(0·(-1) + 1/2·1 + 1/2·0) = 0

    Payoff matrix (row: A, column: B; A's payoff first):

                  rock     paper    scissors
    rock          0,0      -1,1     1,-1
    paper         1,-1     0,0      -1,1
    scissors      -1,1     1,-1     0,0
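The expected-payoff computation above can be checked in a few lines of Python (a minimal sketch; the matrix encoding and function names are mine, not from the lecture):

```python
# Expected payoff of player A in rock-paper-scissors under mixed strategies.
# U1[i][j]: A's payoff when A plays action i and B plays action j
# (zero-sum, so B's payoff is -U1[i][j]). Order: rock, paper, scissors.
U1 = [
    [0, -1, 1],    # A: rock     vs B: rock, paper, scissors
    [1, 0, -1],    # A: paper
    [-1, 1, 0],    # A: scissors
]

def expected_payoff(s1, s2):
    """Expected payoff of A when A mixes with s1 and B mixes with s2."""
    return sum(s1[i] * s2[j] * U1[i][j]
               for i in range(3) for j in range(3))

s1 = (1/3, 1/3, 1/3)   # A's mixed strategy from the slide
s2 = (0, 1/2, 1/2)     # B's mixed strategy from the slide
print(expected_payoff(s1, s2))  # 0.0, matching the slide's computation
```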

  • Classifications of Games

    Cooperative and non-cooperative games. Cooperative game: players are able to form binding commitments. Non-cooperative game: the players make decisions independently.

    Zero-sum and non-zero-sum games. Zero-sum game: the total payoff to all players is zero, e.g., poker, go. Non-zero-sum game: e.g., the prisoner's dilemma.

    Finite and infinite games. Finite game: the players and the actions are finite.

    Simultaneous and sequential (dynamic) games. Simultaneous game: players move simultaneously, or, if they do not move simultaneously, the later players are unaware of the earlier players' actions. Sequential game: later players have some knowledge about earlier actions.

    Perfect-information and imperfect-information games. Perfect-information game: all players know the moves previously made by all other players, e.g., chess, go. Note that perfect information differs from complete information: in a complete-information game, every player knows the strategies and payoffs of the other players, but not necessarily the actions.

  • What is the solution of the game? We will first focus on games that are: simultaneous, complete-information, non-cooperative, and finite.

  • Assumptions

    Assume that each player: knows the structure of the game; attempts to maximize his payoff; attempts to predict the moves of his opponents; and knows that all of this is common knowledge among the players.

  • Dominated Strategy. A strategy s' of player i is called a strictly dominated strategy if there exists a strategy s* such that u_i(s*, s_{-i}) > u_i(s', s_{-i}) for all s_{-i} ∈ S_{-i}, where S_{-i} is the set of strategy profiles of all players except player i. In words: a strategy is dominated if, regardless of what the other players do, it earns the player a smaller payoff than some other strategy.

  • Elimination of Dominated Strategies. Example (row player chooses U/M/D, column player chooses L/M/R; row player's payoff first):

              L       M       R
        U    4,3     5,1     6,2
        M    2,1     8,4     3,6
        D    3,0     9,6     2,8

    Column M is strictly dominated by column R and is eliminated; then rows M and D are strictly dominated by row U; then column R is dominated by column L. Only (U, L) survives, so (U, L) is the solution of the game. Note that a dominant strategy may not exist!
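The elimination procedure can be sketched in Python (the helper names are mine; the payoff matrices are the ones from the 3×3 example):

```python
# Iterated elimination of strictly dominated pure strategies for a
# two-player game. U1: row player's payoffs, U2: column player's payoffs.
U1 = [[4, 5, 6], [2, 8, 3], [3, 9, 2]]  # rows U, M, D vs columns L, M, R
U2 = [[3, 1, 2], [1, 4, 6], [0, 6, 8]]

def dominated(payoffs, strategies, opp_strategies):
    """Return a strategy strictly dominated by another surviving one, or None."""
    for s in strategies:
        for t in strategies:
            if t != s and all(payoffs[t][o] > payoffs[s][o]
                              for o in opp_strategies):
                return s
    return None

rows, cols = {0, 1, 2}, {0, 1, 2}
while True:
    r = dominated(U1, rows, cols)          # row player compares rows
    if r is not None:
        rows.discard(r); continue
    # transpose U2 so the column player compares columns
    c = dominated([[U2[i][j] for i in range(3)] for j in range(3)], cols, rows)
    if c is not None:
        cols.discard(c); continue
    break

print(rows, cols)  # {0} {0}: only (U, L) survives, as on the slide
```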

  • Definition of Nash Equilibrium. (N, S, u): a game. S_i: the strategy set of player i; S = S_1 × ... × S_n: the set of strategy profiles; u = (u_1, ..., u_n): the payoff functions; s_{-i}: the strategy profile of all players except player i. A strategy profile s* is called a Nash equilibrium if

    u_i(s_i*, s_{-i}*) ≥ u_i(s_i, s_{-i}*)

    for every player i and every pure strategy s_i of player i. The Nash equilibrium (NE) is a solution concept of a game.

  • Remarks on Nash Equilibrium. A NE is a set of strategies, one for each player, such that each player's strategy is a best response to the others' strategies.

    Best response: the strategy that maximizes a player's payoff given the others' strategies. At a NE, no player can do better by unilaterally changing his or her strategy. A profile of dominant strategies is a NE.
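The "no profitable unilateral deviation" condition can be checked mechanically for a two-player game in pure strategies. A minimal sketch (function and matrix names are mine), using the prisoner's-dilemma payoffs that appear later in the lecture:

```python
# Check whether a pure-strategy profile (i, j) is a Nash equilibrium of a
# two-player game given as payoff matrices U1 (row player) and U2 (column player).

def is_nash(U1, U2, i, j):
    """True iff neither player gains by unilaterally deviating from (i, j)."""
    row_best = all(U1[i][j] >= U1[k][j] for k in range(len(U1)))
    col_best = all(U2[i][j] >= U2[i][k] for k in range(len(U2[0])))
    return row_best and col_best

# Prisoner's dilemma payoffs (action 0 = Cooperate, 1 = Defect).
U1 = [[3, 0], [5, 1]]
U2 = [[3, 5], [0, 1]]
print([(i, j) for i in range(2) for j in range(2) if is_nash(U1, U2, i, j)])
# [(1, 1)]: (Defect, Defect) is the unique pure NE
```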

  • Example. Players: Smith and Louis. Actions: {Advertise, Do Not Advertise}. Payoffs: the companies' profits.

    Each firm earns $50 million from its customers. Advertising costs a firm $20 million. Advertising captures $30 million from the competitor.

    How to represent this game?

  • Strategic Interactions. Payoff matrix (Smith's payoff first, in millions):

                         Louis: Ad     Louis: No Ad
        Smith: Ad         (30,30)        (60,20)
        Smith: No Ad      (20,60)        (50,50)

  • Best Responses. Best response for Louis: if Smith advertises, advertise; if Smith does not advertise, advertise. The best response for Smith is the same. Advertising is a dominant strategy for both, so (Ad, Ad) is a NE! This is another prisoner's dilemma!

  • Nash Equilibrium. A NE may be a pair of mixed strategies. Example: Matching Pennies (A's payoff first):

                     B: Head    B: Tail
        A: Head       (1,-1)     (-1,1)
        A: Tail       (-1,1)     (1,-1)

    There is no pure-strategy NE; the profile where both players mix (1/2, 1/2) is the Nash equilibrium.
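The mixed equilibrium of a 2×2 game can be found from the indifference condition: each player mixes so that the opponent is indifferent between his two actions. A sketch for a generic 2×2 game (the function name and encoding are mine):

```python
# Solve for the row player's probability p of playing the first action
# such that the column player is indifferent between his two actions:
#   p*Ucol[0][0] + (1-p)*Ucol[1][0] == p*Ucol[0][1] + (1-p)*Ucol[1][1]
# Assumes the game has a fully mixed equilibrium (denominator nonzero).

def indifference_prob(Ucol):
    """Row player's equilibrium probability of action 0, given the
    column player's payoff matrix Ucol[row][col]."""
    a = Ucol[0][0] - Ucol[0][1]
    b = Ucol[1][1] - Ucol[1][0]
    return b / (a + b)

# B's payoffs in matching pennies (A wins on a match, B on a mismatch).
Ucol = [[-1, 1], [1, -1]]
print(indifference_prob(Ucol))  # 0.5: A mixes (1/2, 1/2); by symmetry so does B
```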

  • Existence of NE. Theorem (J. Nash, 1950):

    For a finite game, there exists at least one Nash equilibrium (in pure or mixed strategies).

  • Nash Equilibrium. A NE may not be a good solution of the game; it differs from the (socially) optimal solution. E.g., in the advertising game, the NE (Ad, Ad) gives each firm 30, while (No Ad, No Ad) would give each 50.

  • Nash Equilibrium. A game may have more than one NE. E.g., the Battle of the Sexes (row: Husband, column: Wife; Husband's payoff first):

                      opera      football
        opera         (2,1)       (0,0)
        football      (0,0)       (1,2)

    NEs: (opera, opera), (football, football), and the mixed profile ((2/3, 1/3), (1/3, 2/3)).

  • Nash Equilibrium. For two-person zero-sum games, a saddle point is a solution.

  • Nash Equilibrium. There are many varieties of NE: refined NE, Bayesian NE, subgame-perfect NE, perfect Bayesian NE. Finding NEs is in general very difficult. A NE can only tell us that if the game reaches such a state, no player has an incentive to change his strategy unilaterally; it cannot tell us how to reach such a state.

  • Iterated Prisoner's Dilemma

  • Cooperation. In groups of organisms: mutual cooperation benefits all agents, while lack of cooperation is harmful to them.

    Another type of cooperation: cooperating agents do well, but any one agent would do better by failing to cooperate. The prisoner's dilemma is an elegant embodiment of this.

  • Prisoner's Dilemma. The story of the prisoner's dilemma. Players: two prisoners. Actions: {Cooperate, Defect}. Payoff matrix (Prisoner A's payoff first):

                  B: C      B: D
        A: C     (3,3)     (0,5)
        A: D     (5,0)     (1,1)

  • Prisoner's Dilemma. No matter what the other does, the best choice is D, so (D, D) is a Nash equilibrium. But if both choose D, both do worse than if both had selected C.

  • Iterated Prisoner's Dilemma. The individuals meet many times, can recognize a previous interactant, and remember the prior outcome.

    A strategy specifies the probabilities of cooperating and defecting based on the history: P(C) = f1(History), P(D) = f2(History).

  • Tit For Tat: cooperate on the first move, then repeat the opponent's last choice. Example run (player A plays Tit For Tat against B):

    Player A: C D D C C C C C D D D D C
    Player B: D D C C C C C D D D D C
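Tit For Tat is simple enough to state as code. A sketch, assuming strategies are functions of the opponent's history (the encoding is mine, not from the lecture):

```python
# Tit For Tat as a function of the opponent's history: cooperate on the
# first move, then repeat the opponent's last move. Moves are 'C'/'D'.

def tit_for_tat(opponent_history):
    """Return this round's move given the opponent's past moves."""
    if not opponent_history:
        return "C"                 # cooperate on the first move
    return opponent_history[-1]    # then mirror the opponent's last choice

def play(strategy_a, strategy_b, rounds):
    """Play two history-based strategies against each other."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_b)     # each player sees only the opponent's history
        b = strategy_b(hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return hist_a, hist_b

always_defect = lambda h: "D"
print(play(tit_for_tat, always_defect, 5))
# (['C', 'D', 'D', 'D', 'D'], ['D', 'D', 'D', 'D', 'D'])
```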

  • Strategies:

    - Tit For Tat: cooperate on the first move, then repeat the opponent's last choice.
    - Tit For Tat and Random: repeat the opponent's last choice, skewed by a random setting.
    - Tit For Two Tats and Random: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated; choice is skewed by a random setting.
    - Tit For Two Tats: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated.
    - Naive Prober (Tit For Tat with random defection): repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating.
    - Remorseful Prober (Tit For Tat with random defection): like Naive Prober, but if the opponent defects in response to probing, show remorse by cooperating once.
    - Naive Peace Maker (Tit For Tat with random cooperation): repeat the opponent's last choice (i.e., Tit For Tat), but sometimes make peace by cooperating in lieu of defecting.
    - True Peace Maker (hybrid of Tit For Tat and Tit For Two Tats with random cooperation): cooperate unless the opponent defects twice in a row, then defect once; sometimes make peace by cooperating in lieu of defecting.
    - Random: always set at 50% probability.

  • More strategies:

    - Always Defect.
    - Always Cooperate.
    - Grudger (cooperate, but only be a sucker once): cooperate until the opponent defects, then always defect unforgivingly.
    - Pavlov (repeat last choice if good outcome): if 5 or 3 points were scored in the last round, repeat the last choice.
    - Pavlov / Random: as Pavlov, but sometimes make random choices.
    - Adaptive: starts with c,c,c,c,c,c,d,d,d,d,d and then takes the choice that has given the best average score, re-calculated after every move.
    - Gradual: cooperates until the opponent defects, then defects as many times as the opponent has defected during the game, followed by two cooperations.
    - Suspicious Tit For Tat: as Tit For Tat, except it begins by defecting.
    - Soft Grudger: cooperates until the opponent defects, then punishes the opponent with d,d,d,d,c,c.
    - Customised strategy 1: default setting is T=1, P=1, R=1, S=0, B=1; always cooperate unless a sucker (i.e., 0 points scored).
    - Customised strategy 2: default setting is T=1, P=1, R=0, S=0, B=0; always play alternating defect/cooperate.

  • Iterated Prisoner's Dilemma. The same players repeat the prisoner's dilemma many times. After ten rounds: the best possible income is 50 (always defecting against an always-cooperating opponent). A realistic case is mutual cooperation, 30 for each player. An extreme case is that each player always defects, so each gets 10. The most likely case is that each player plays a mixed strategy of defection and cooperation.

    Payoff matrix (Prisoner A's payoff first):

                  B: C      B: D
        A: C     (3,3)     (0,5)
        A: D     (5,0)     (1,1)

  • Which strategy can thrive? What is a good strategy? Robert Axelrod, in the 1980s, ran a computer round-robin tournament of the iterated prisoner's dilemma. AXELROD R. 1987. The evolution of strategies in the iterated Prisoners' Dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.

  • The first round. Strategies: 14 entries plus a random strategy, including strategies based on Markov processes and Bayesian inference. Each pair met each other (15 × 15 runs in total), and each pair played the game 200 times. Payoff of a strategy S: the average over opponents, Σ_S' U(S, S')/15. Tit For Tat won (cooperation based on reciprocity).
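The tournament design can be sketched as follows; this is an illustrative miniature with only four strategies, not a reproduction of Axelrod's actual 15 entries:

```python
# A miniature round-robin tournament: every strategy meets every strategy
# (including itself) for 200 rounds of the prisoner's dilemma, and total
# payoffs are compared. Strategies are functions of the opponent's history.
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opp): return "C" if not opp else opp[-1]
def always_defect(opp): return "D"
def always_cooperate(opp): return "C"
def rand(opp): return random.choice("CD")

def match(s1, s2, rounds=200):
    """Play one 200-round match; return the two total payoffs."""
    h1, h2, p1, p2 = [], [], 0, 0
    for _ in range(rounds):
        a, b = s1(h2), s2(h1)
        h1.append(a); h2.append(b)
        pa, pb = PAYOFF[(a, b)]
        p1 += pa; p2 += pb
    return p1, p2

strategies = [tit_for_tat, always_defect, always_cooperate, rand]
totals = {s.__name__: 0 for s in strategies}
for s1 in strategies:
    for s2 in strategies:
        p1, _ = match(s1, s2)
        totals[s1.__name__] += p1
print(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Note that against Always Defect, Tit For Tat loses the individual match (199 vs. 204 points) but still accumulates a high total across all opponents; this is the "winning vs. high scores" point made below.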

  • Characteristics of good strategies (the first round). Goodness: never defect first (e.g., TFT vs. Naive Prober).

    Forgiveness: may take revenge, but the memory is short (e.g., TFT vs. Grudger). Naive Prober: repeat the opponent's last choice, but sometimes probe by defecting in lieu of cooperating. Grudger: cooperate until the opponent defects, then always defect unforgivingly.

  • Winning vs. High Scores. This is not a zero-sum game; there is a banker. TFT never wins a single game: the best it can do is tie with its opponent. Trying to win the game is a kind of jealousy, and it does not work well. Thus cooperation can arise in a group of selfish agents.

  • The second round. Strategies: 62 entries plus a random strategy, including both good strategies and wily ones. Tit For Tat won again. Winning or losing depends on the circumstances.

  • Characteristics of good strategies. Goodness: never defect first. In the first round, the first eight strategies were good ones; in the second round, fourteen of the first fifteen strategies were good. Forgiveness: may take revenge, but the memory is short; Grudger is not a forgiving strategy.

    Goodness and forgiveness are a kind of collective behavior: for a single agent, defection is the best strategy.

  • Evolution of the Strategies. Good strategies can be evolved by a genetic algorithm (GA).

  • What is a good strategy? Is TFT a good strategy? Tit For Two Tats might have been the best strategy in the first round, but it was not a good strategy in the second round. A good strategy depends on the environment. This motivates the evolutionarily stable strategy. (Tit For Two Tats: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated.)

  • Evolutionarily stable strategy (ESS). Introduced by John Maynard Smith and George R. Price in 1973. An ESS is a strategy such that, if all members of the population adopt it, then no mutant strategy can invade the population under the influence of natural selection.

    An ESS is robust under evolution: it cannot be invaded by mutation. (John Maynard Smith, Evolution and the Theory of Games.)

  • Definition of ESS. A strategy x is an ESS if, for every strategy y ≠ x, u(x, εy + (1−ε)x) > u(y, εy + (1−ε)x) holds for all sufficiently small positive ε.

  • ESS. An ESS is defined in a population with a large number of individuals.

    The individuals cannot control their strategies, and may not even be aware of the game they are playing.

    An ESS is the result of natural selection.

    Like a NE, an ESS can only tell us that a state is robust under evolution; it cannot tell us how the population reaches such a state.

  • ESS in the IPD

    Tit For Tat cannot be invaded by wily strategies such as Always Defect, but it can be invaded by good strategies such as Always Cooperate, Tit For Two Tats, and Suspicious Tit For Tat; hence Tit For Tat is not a strict ESS. Always Cooperate can be invaded by Always Defect. Always Defect is an ESS.
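The invasion condition in the ESS definition can be checked numerically for the one-shot prisoner's dilemma, treating a strategy as a probability of cooperating. This one-shot reduction is a simplification of mine (the slide's claims concern the iterated game), but it illustrates the same point: Always Defect resists all mutants, while Always Cooperate does not.

```python
# ESS invasion check for the one-shot prisoner's dilemma: x is an ESS if
# u(x, mix) > u(y, mix) for every mutant y != x, where
# mix = (1-eps)*x + eps*y and eps is small. Strategies x, y are
# probabilities of cooperating (0.0 = Always Defect, 1.0 = Always Cooperate).

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def u(p, q):
    """Expected payoff of a p-cooperator against a q-cooperator."""
    return (p * q * PAYOFF[("C", "C")] + p * (1 - q) * PAYOFF[("C", "D")]
            + (1 - p) * q * PAYOFF[("D", "C")]
            + (1 - p) * (1 - q) * PAYOFF[("D", "D")])

def is_ess(x, mutants, eps=0.01):
    """Check the invasion condition against a grid of mutant strategies."""
    return all(u(x, (1 - eps) * x + eps * y) > u(y, (1 - eps) * x + eps * y)
               for y in mutants if y != x)

mutants = [i / 10 for i in range(11)]
print(is_ess(0.0, mutants))  # True: Always Defect resists all mutants
print(is_ess(1.0, mutants))  # False: Always Cooperate is invaded by defectors
```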

  • References

    Drew Fudenberg and Jean Tirole, Game Theory, The MIT Press, 1991.

    Axelrod, R. 1987. The evolution of strategies in the iterated Prisoners' Dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.

    Richard Dawkins, The Selfish Gene, Oxford University Press.

  • Concluding Remarks. The tip of the iceberg of game theory: basic concepts, the Nash equilibrium, the iterated prisoner's dilemma, and the evolutionarily stable strategy.

  • Concluding Remarks. Many interesting topics deserve to be studied and further investigated: cooperative games, incomplete-information games, dynamic games, combinatorial games, and learning in games.

  • Thank you!
