Joint Strategy Fictitious Play
Sherwin Doroudi
“Adapted” from
J. R. Marden, G. Arslan, J. S. Shamma, “Joint strategy fictitious play with inertia for potential games,” in Proceedings of the 44th IEEE Conference on Decision and Control, December 2005, pp. 6692-6697.
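Review: Game
• Players: $\mathcal{N} = \{1, \dots, n\}$
• Actions: each player $i$ chooses $a_i \in \mathcal{A}_i$; a joint action is $a = (a_1, \dots, a_n) \in \mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$
• Payoffs: $U_i : \mathcal{A} \to \mathbb{R}$ for each player $i$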
Review: Game
We then play the game repeatedly in “stages,” starting at stage 0, and write $a(t)$ for the joint action played at stage $t$. Players can use the learning algorithms discussed in lecture. Each player knows the structural form of its own payoff function but not the form of the other players’ payoff functions.
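Notation: Actions
As in the lecture, we use the notation
$$a = (a_i, a_{-i}), \qquad a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n) \in \mathcal{A}_{-i} = \prod_{j \neq i} \mathcal{A}_j$$
so $a_{-i}$ collects the actions of every player other than $i$.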
Review: Regret Matching
• The empirical distribution of play is guaranteed to converge to the set of coarse correlated equilibria (CCE) in all games (Hart & Mas-Colell, 2000).
• But a CCE can be quite bad in some cases, since the set of CCE is a superset of the set of Nash equilibria (NE).
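For reference, here is a minimal sketch of the unconditional form of regret matching. This is the variant whose empirical play converges to the set of CCE. The sketch assumes each player can evaluate what every one of its own actions would have earned against the others' realized actions. The function names and the list-of-actions representation are illustrative assumptions, not taken from the paper.

```python
def regret_matching_probs(cum_regret):
    """Mixed strategy proportional to the positive part of the cumulative regrets."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0.0:
        # No action has positive regret: fall back to the uniform distribution.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def update_regret(cum_regret, payoff_if, realized_payoff):
    """Add one stage of regret.

    payoff_if[k] is what action k would have earned against the others' realized
    actions; realized_payoff is what the player actually earned at this stage.
    """
    return [r + u - realized_payoff for r, u in zip(cum_regret, payoff_if)]


# Example: three actions; the player chose action 0 and earned 1.0.
regret = update_regret([0.0, 0.0, 0.0], payoff_if=[1.0, 3.0, 2.0], realized_payoff=1.0)
probs = regret_matching_probs(regret)  # regret == [0.0, 2.0, 1.0]
```

Actions with larger accumulated regret are played more often. An action the player would not have wanted to switch to gets probability zero.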
Review: Fictitious Play (FP)
• Observe the empirical frequencies of every player’s actions
• Consider best response(s) under the (incorrect) assumption that the other players play independently according to their empirical frequencies
• Randomly choose a best response and act accordingly
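Empirical Frequency in FP
The empirical frequency of an action $a_i \in \mathcal{A}_i$ for player $i$ is the fraction of stages, up to the previous stage, in which the player chose that action:
$$q_i^{a_i}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_i(\tau) = a_i\}$$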
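Empirical Frequency in FP
Each player also has an empirical frequency vector $q_i(t) = \big(q_i^{a_i}(t)\big)_{a_i \in \mathcal{A}_i}$, which is a probability distribution over $\mathcal{A}_i$.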
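As a small illustration, one player's empirical frequency vector can be computed from its own action history with a counter. The function name and the list-based history are assumptions made for this sketch.

```python
from collections import Counter


def empirical_frequency(own_history, actions):
    """q_i(t): fraction of stages 0..t-1 in which player i chose each action.

    own_history lists player i's actions a_i(0), ..., a_i(t-1); actions is A_i.
    """
    t = len(own_history)
    counts = Counter(own_history)
    # Actions never played get frequency 0, so the result covers all of A_i.
    return {a: counts[a] / t for a in actions}


q = empirical_frequency(["L", "R", "L", "L"], actions=["L", "R"])
print(q)  # {'L': 0.75, 'R': 0.25}
```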
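Best Response in FP
Each player assumes an expected payoff
$$U_i(a_i, q_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \neq i} q_j^{a_j}(t)$$
and chooses a best response from the set
$$BR_i(q_{-i}(t)) = \arg\max_{a_i \in \mathcal{A}_i} U_i(a_i, q_{-i}(t))$$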
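The FP expected payoff and best-response set can be sketched directly from their definitions. The code enumerates every joint action of the other players and weights each payoff by the product of their empirical frequencies. The `payoff(i, profile)` callback, the dictionary representation of frequencies, and the two-player coordination game are hypothetical choices for illustration.

```python
import itertools
import math
import random


def fp_expected_payoff(i, a_i, payoff, freqs):
    """U_i(a_i, q_{-i}(t)) = sum over a_{-i} of U_i(a_i, a_{-i}) * prod_{j != i} q_j^{a_j}(t).

    payoff(i, profile) returns U_i for a full joint action (a tuple);
    freqs[j] maps each action of player j to its empirical frequency.
    """
    others = [j for j in range(len(freqs)) if j != i]
    total = 0.0
    # Enumerate every joint action of the other players: prod_j |A_j| terms.
    for combo in itertools.product(*(list(freqs[j]) for j in others)):
        profile = [None] * len(freqs)
        profile[i] = a_i
        prob = 1.0
        for j, a_j in zip(others, combo):
            profile[j] = a_j
            prob *= freqs[j][a_j]  # FP treats the other players as independent
        total += prob * payoff(i, tuple(profile))
    return total


def fp_best_responses(i, payoff, freqs):
    """BR_i(q_{-i}(t)): all actions of player i that maximize the FP expected payoff."""
    values = {a: fp_expected_payoff(i, a, payoff, freqs) for a in freqs[i]}
    best = max(values.values())
    return [a for a, v in values.items() if math.isclose(v, best, abs_tol=1e-12)]


def coordination(i, profile):
    # Hypothetical two-player coordination game: payoff 1 if both actions match.
    return 1.0 if profile[0] == profile[1] else 0.0


freqs = [{"A": 0.5, "B": 0.5}, {"A": 0.75, "B": 0.25}]
br = fp_best_responses(0, coordination, freqs)  # ['A']
action = random.choice(br)                      # break ties uniformly at random
```

The loop in `fp_expected_payoff` runs over $|\mathcal{A}_{-i}|$ joint actions. That count is what makes FP expensive when there are many players.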
The Good News!
“The empirical frequencies generated by FP converge to a Nash equilibrium in potential games” (Monderer & Shapley, 1996).
The Bad News (if any)?
What are some weaknesses of FP?
A Routing Example
• Consider a routing game with 100 players, all with the same source and sink
• There are 4 roads from the source to the sink
• Players want to minimize their cost
A Routing Example
• The cost of traveling on each road is a quadratic function of the number of players choosing that road. Its coefficients are positive and could be randomly generated.
• Can we use FP as a learning algorithm in this example?
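A Routing Example
Formalizing the game, we have
• Players: $\mathcal{N} = \{1, \dots, 100\}$
• Actions: $\mathcal{A}_i = \{r_1, r_2, r_3, r_4\}$ (the four roads)
• Payoffs: $U_i(a) = -c_{a_i}(\sigma_{a_i}(a))$, where $\sigma_r(a)$ is the number of players choosing road $r$ and $c_r(k) = c_{r,2} k^2 + c_{r,1} k + c_{r,0}$ with all coefficients $c_{r,m} > 0$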
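A minimal sketch of this routing game's payoff follows. It assumes randomly drawn positive coefficients in $[0.1, 1.0]$ and represents roads as the integers 0 to 3. Both choices, and the helper names, are hypothetical.

```python
import random

NUM_PLAYERS = 100
NUM_ROADS = 4


def make_costs(seed=0):
    """Hypothetical coefficients (c_{r,2}, c_{r,1}, c_{r,0}) > 0 for each road r."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.1, 1.0) for _ in range(3)) for _ in range(NUM_ROADS)]


def road_cost(coeffs, k):
    """c_r(k) = c_{r,2} k^2 + c_{r,1} k + c_{r,0} for k players on road r."""
    c2, c1, c0 = coeffs
    return c2 * k * k + c1 * k + c0


def utility(i, profile, costs):
    """U_i(a) = -c_{a_i}(sigma_{a_i}(a)): minus the cost of the road player i chose."""
    road = profile[i]
    load = sum(1 for r in profile if r == road)  # sigma_{a_i}(a), includes player i
    return -road_cost(costs[road], load)


costs = make_costs()
profile = [i % NUM_ROADS for i in range(NUM_PLAYERS)]  # 25 players per road
print(utility(0, profile, costs))
```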
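A Routing Example
Remember this?
$$U_i(a_i, q_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \neq i} q_j^{a_j}(t)$$
The sum above is over $4^{99} = 2^{198}$ terms, one for each joint action $a_{-i}$ of the other 99 players!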
This is not computationally feasible!
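A quick arithmetic check of the size of this sum. The throughput of $10^9$ terms per second is an arbitrary, hypothetical figure chosen only to give a sense of scale.

```python
n_players, n_roads = 100, 4

# |A_{-i}|: each of the other 99 players picks one of 4 roads.
num_terms = n_roads ** (n_players - 1)
assert num_terms == 2 ** 198

print(len(str(num_terms)))  # 60 decimal digits, i.e. roughly 4 * 10**59 terms

# Even at a (hypothetical) rate of 10**9 terms per second, one evaluation would take:
seconds = num_terms / 10**9
print(f"{seconds:.1e} seconds")
```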
What do we do?
The routing example (which is fairly realistic) shows that we need one of two things: a more efficient way to compute this expected payoff, or an algorithm that is computationally suitable for “large” games.
Joint Strategy Fictitious Play (JSFP)
• Observe empirical frequencies of joint actions
• Consider best response(s) under the (still incorrect) assumption that all other players act collectively as a group according to their joint empirical frequency
• Randomly choose a best response and act accordingly
Does FP = JSFP?
• With two players, FP and JSFP are the same. The joint empirical frequency of “all other players” is just the empirical frequency of the single opponent.
• But with three or more players this is not necessarily the case! FP treats the opponents’ empirical frequencies as independent, while JSFP keeps their correlations.
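A tiny three-player example makes the difference concrete. Suppose players 1 and 2 have played (A, A) and then (B, B). Player 0 earns 1 only when all three actions match (a hypothetical payoff). FP multiplies the marginal frequencies of players 1 and 2, whereas JSFP uses their joint frequency.

```python
from collections import Counter
from itertools import product


def payoff0(a0, a1, a2):
    # Hypothetical payoff for player 0: 1 only if all three players match.
    return 1.0 if a0 == a1 == a2 else 0.0


others_history = [("A", "A"), ("B", "B")]  # actions of players 1 and 2 at stages 0, 1
t = len(others_history)
actions = ["A", "B"]

# FP: treat players 1 and 2 as independent, using their marginal frequencies.
q1 = Counter(a1 for a1, _ in others_history)
q2 = Counter(a2 for _, a2 in others_history)
fp_value = sum(payoff0("A", a1, a2) * (q1[a1] / t) * (q2[a2] / t)
               for a1, a2 in product(actions, actions))

# JSFP: use the joint frequency of (a1, a2), which keeps their correlation.
z = Counter(others_history)
jsfp_value = sum(payoff0("A", a1, a2) * z[(a1, a2)] / t for a1, a2 in z)

print(fp_value, jsfp_value)  # 0.25 0.5
```

FP assigns probability 1/4 to the opponents jointly playing (A, A). JSFP assigns 1/2, which is how often that pair actually occurred.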
Empirical Frequency in JSFP
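The empirical frequency of a joint action (action profile) $a \in \mathcal{A}$ may be calculated as follows:
$$z^{a}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a(\tau) = a\}$$
and likewise $z_{-i}^{a_{-i}}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_{-i}(\tau) = a_{-i}\}$ for the joint actions of the players other than $i$.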
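A minimal sketch of these joint frequencies, assuming the history is stored as a list of joint-action tuples. The function names are hypothetical.

```python
from collections import Counter


def joint_empirical_frequency(history):
    """z^a(t) for each joint action a seen in stages 0..t-1.

    history is a list of joint actions (tuples). Profiles that never occurred are
    omitted, i.e. they implicitly have frequency 0.
    """
    t = len(history)
    return {a: c / t for a, c in Counter(history).items()}


def others_joint_frequency(history, i):
    """z_{-i}^{a_{-i}}(t): joint frequency of the actions of everyone except player i."""
    t = len(history)
    counts = Counter(a[:i] + a[i + 1:] for a in history)
    return {a_minus_i: c / t for a_minus_i, c in counts.items()}


history = [("A", "A", "B"), ("A", "B", "B"), ("A", "A", "B"), ("B", "A", "B")]
z = joint_empirical_frequency(history)  # z[("A", "A", "B")] == 0.5
z_minus_0 = others_joint_frequency(history, 0)
```

Only profiles that actually occurred are stored, so each dictionary has at most $t$ entries, regardless of how large $\mathcal{A}$ is.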
Expected Payoff in JSFP
Each player assumes an expected payoff
$$U_i(a_i, z_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i})\, z_{-i}^{a_{-i}}(t)$$
But this looks about as bad as (maybe worse than) FP! The sum still runs over all of $\mathcal{A}_{-i}$.
So what can we do?
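We rewrite it in a more useful form! Since $z_{-i}^{a_{-i}}(t)$ only counts how often $a_{-i}$ occurred, the sum collapses to an average over the stages actually played:
$$U_i(a_i, z_{-i}(t)) = \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(a_i, a_{-i}(\tau))$$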
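A small sketch that checks the rewrite numerically. It computes the JSFP expected payoff once from the joint frequencies $z_{-i}$ and once as a time average, by replaying the history with player $i$'s action swapped in. The `matches` payoff and the helper names are hypothetical.

```python
from collections import Counter


def with_action(a_minus_i, i, a_i):
    """Rebuild a full joint action by inserting player i's action at position i."""
    return a_minus_i[:i] + (a_i,) + a_minus_i[i:]


def payoff_via_frequency(i, a_i, history, payoff):
    """Sum over observed a_{-i} of U_i(a_i, a_{-i}) * z_{-i}^{a_{-i}}(t)."""
    t = len(history)
    counts = Counter(a[:i] + a[i + 1:] for a in history)
    return sum(payoff(i, with_action(a_mi, i, a_i)) * c / t for a_mi, c in counts.items())


def payoff_via_average(i, a_i, history, payoff):
    """(1/t) * sum over tau of U_i(a_i, a_{-i}(tau)): replay history with a_i swapped in."""
    t = len(history)
    return sum(payoff(i, with_action(a[:i] + a[i + 1:], i, a_i)) for a in history) / t


def matches(i, a):
    # Hypothetical payoff: number of players (including i) whose action equals a_i.
    return float(sum(1 for x in a if x == a[i]))


history = [("A", "B", "A"), ("B", "B", "A"), ("A", "A", "A"), ("A", "B", "B")]
for i in range(3):
    for a_i in ("A", "B"):
        f = payoff_via_frequency(i, a_i, history, matches)
        g = payoff_via_average(i, a_i, history, matches)
        assert abs(f - g) < 1e-12
```

The time-average form never enumerates $\mathcal{A}_{-i}$; it only touches the $t$ joint actions that were actually played.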
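The JSFP Payoff Recursion
Writing $V_i^{a_i}(t) = U_i(a_i, z_{-i}(t))$, we can now express the expected payoff as a simple recursion
$$V_i^{a_i}(t+1) = \frac{t}{t+1} V_i^{a_i}(t) + \frac{1}{t+1} U_i(a_i, a_{-i}(t))$$
and at every stage choose an action that maximizes it (our best response). Each player only needs to store one number per own action instead of summing over $\mathcal{A}_{-i}$.
We are maximizing regret! The average regret for not having always played $a_i$ is $V_i^{a_i}(t) - \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(a(\tau))$. The subtracted term does not depend on $a_i$, so maximizing $V_i^{a_i}(t)$ and maximizing regret pick the same actions.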
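A minimal sketch of the recursion for one player. `payoff_if[k]` stands for $U_i(k, a_{-i}(t))$, and the payoff stream is made up for illustration. The sketch also checks that the best response is unchanged when we maximize regret instead of $V_i$.

```python
def jsfp_update(values, t, payoff_if):
    """V_i(t+1) from V_i(t).

    values[k] = V_i^k(t); payoff_if[k] = U_i(k, a_{-i}(t)), the payoff action k
    would have earned against the others' actions at stage t.
    Implements V(t+1) = t/(t+1) V(t) + 1/(t+1) U_i(k, a_{-i}(t)).
    """
    return [(t * v + u) / (t + 1) for v, u in zip(values, payoff_if)]


def best_responses(values, tol=1e-12):
    """Indices of the actions that maximize V_i(t) (up to a small tolerance)."""
    best = max(values)
    return [k for k, v in enumerate(values) if v >= best - tol]


# Hypothetical stream of counterfactual payoffs for a player with two actions.
stream = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
values = [0.0, 0.0]
for t, payoff_if in enumerate(stream):
    values = jsfp_update(values, t, payoff_if)

# Suppose the player actually played action 0 every stage, earning 1.0, 0.0, 2.0.
realized_average = (1.0 + 0.0 + 2.0) / 3
regret = [v - realized_average for v in values]
```

After three stages `values` equals the batch averages $[1, 5/3]$. Subtracting the realized average shifts both entries equally, so action 1 is the best response either way.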
Convergence Properties of JSFP
The convergence properties of JSFP in games with three or more players are unknown, so this is an open problem. However, once the joint action generated by JSFP reaches a strict NE, it stays there forever. To obtain convergence guarantees, we add “inertia” to the learning algorithm.
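JSFP with Inertia
• Assume that all NE are strict
• JSFP-1: If the action a player chose in the previous stage is a best response at the current stage, i.e. $a_i(t-1) \in \arg\max_{a_i \in \mathcal{A}_i} V_i^{a_i}(t)$, choose that action again
• JSFP-2: Otherwise, choose an action according to the distribution
$$\alpha_i(t)\, \beta_i(t) + \big(1 - \alpha_i(t)\big)\, v^{a_i(t-1)}$$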
The JSFP-2 Distribution
Here the parameter $\alpha_i(t)$ represents the player’s willingness to optimize at a given stage. $\beta_i(t)$ is a probability distribution whose support is contained in the set of best responses at this stage. $v^{a_i(t-1)}$ is the distribution that puts all of its probability on the action taken in the previous stage.
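A sketch of one player's action choice under JSFP with inertia follows. Sampling from the mixture $\alpha_i(t)\beta_i(t) + (1-\alpha_i(t))v^{a_i(t-1)}$ is done by first flipping an $\alpha_i(t)$-coin. The slides only constrain the support of $\beta_i(t)$, so taking it to be uniform over the best responses is an assumption, as are the function name and arguments.

```python
import random


def jsfp_inertia_action(prev_action, values, alpha, rng, tol=1e-12):
    """Next action of one player under JSFP with inertia (a sketch).

    values[k] = V_i^k(t); prev_action = a_i(t-1); alpha = alpha_i(t).
    Assumes beta_i(t) is uniform over the best-response set.
    """
    best = max(values)
    best_set = [k for k, v in enumerate(values) if v >= best - tol]
    # JSFP-1: keep the previous action while it is still a best response.
    if prev_action in best_set:
        return prev_action
    # JSFP-2: sample from alpha * beta + (1 - alpha) * v^{a_i(t-1)}.
    if rng.random() < alpha:
        return rng.choice(best_set)  # the beta part: move to a best response
    return prev_action               # the v part: stay put (inertia)


rng = random.Random(0)
next_action = jsfp_inertia_action(prev_action=0, values=[0.0, 2.0, 1.0], alpha=0.5, rng=rng)
```

A player that is not best responding therefore switches only with probability $\alpha_i(t)$. That hesitation is what the inertia refers to.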
JSFP w/ Inertia Converges!
• In particular, it converges to some (pure) Nash equilibrium in generalized ordinal potential games
• Of course, there is no equilibrium selection mechanism
• And not much is known about the convergence rate
• But we have shown that JSFP w/ Inertia is a good substitute for FP in “large” games
JSFP w/ Inertia Converges!
The proof is not trivial; if you want it, read the paper!
The Fading Memory Variant
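We used the recursion
$$V_i^{a_i}(t+1) = \frac{t}{t+1} V_i^{a_i}(t) + \frac{1}{t+1} U_i(a_i, a_{-i}(t))$$
But we could also use the recursion
$$V_i^{a_i}(t+1) = (1 - \rho)\, V_i^{a_i}(t) + \rho\, U_i(a_i, a_{-i}(t))$$
Here $0 < \rho \le 1$ is a constant or a function of the stage; choosing $\rho = 1/(t+1)$ recovers the original recursion. It is also proven that this variant gives rise to a process converging to some NE.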
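A minimal sketch of the fading-memory update, reusing a made-up payoff stream. With a constant $\rho$, the payoff observed $s$ stages ago carries weight $\rho(1-\rho)^s$. With $\rho = 1/(t+1)$, the update reproduces the running average of the original recursion. The function name is hypothetical.

```python
def fading_memory_update(values, payoff_if, rho):
    """V_i(t+1) = (1 - rho) * V_i(t) + rho * U_i(., a_{-i}(t)) with 0 < rho <= 1."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    return [(1.0 - rho) * v + rho * u for v, u in zip(values, payoff_if)]


stream = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]  # hypothetical payoffs U_i(k, a_{-i}(t))

# A constant rho weights the payoff from s stages ago by rho * (1 - rho)**s.
constant = [0.0, 0.0]
for payoff_if in stream:
    constant = fading_memory_update(constant, payoff_if, rho=0.5)

# rho = 1/(t+1) reproduces the running average, i.e. the original recursion.
averaged = [0.0, 0.0]
for t, payoff_if in enumerate(stream):
    averaged = fading_memory_update(averaged, payoff_if, rho=1.0 / (t + 1))
```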
A Routing Example, Revisited
• We can now apply JSFP w/ Inertia and fading memory to the routing problem, and it should converge to some NE. Routing games are potential games, and hence generalized ordinal potential games.
• Simulations suggest that JSFP without inertia also converges in this case
• Try it!
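One way to try it is the following self-contained sketch of JSFP with inertia on a routing game like the one above. Several choices here are assumptions, not the paper's experimental setup: cost coefficients drawn uniformly from $[0.1, 1.0]$, a constant $\alpha = 0.3$, a uniform $\beta$ over best responses, and 500 stages. For brevity it uses the plain averaging recursion. Replacing `(t * v + u) / (t + 1)` with `(1 - rho) * v + rho * u` gives the fading-memory variant. The sketch reports how much any single player could still gain by switching roads (0 at an NE), but it makes no claim about how many stages convergence needs.

```python
import random


def simulate_jsfp_inertia(num_players=100, num_roads=4, stages=500, alpha=0.3, seed=0):
    """JSFP with inertia on a hypothetical routing game; returns (final profile, cost)."""
    rng = random.Random(seed)
    # Hypothetical positive coefficients for c_r(k) = c2 k^2 + c1 k + c0.
    coeffs = [tuple(rng.uniform(0.1, 1.0) for _ in range(3)) for _ in range(num_roads)]

    def cost(r, k):
        c2, c1, c0 = coeffs[r]
        return c2 * k * k + c1 * k + c0

    profile = [rng.randrange(num_roads) for _ in range(num_players)]  # a(0)
    values = [[0.0] * num_roads for _ in range(num_players)]          # V_i^r
    for t in range(stages):
        loads = [0] * num_roads
        for r in profile:
            loads[r] += 1
        # JSFP recursion: V_i^r(t+1) = t/(t+1) V_i^r(t) + 1/(t+1) U_i(r, a_{-i}(t)).
        for i, a_i in enumerate(profile):
            for r in range(num_roads):
                others_on_r = loads[r] - (1 if r == a_i else 0)
                u = -cost(r, others_on_r + 1)
                values[i][r] = (t * values[i][r] + u) / (t + 1)
        # Inertia: keep a_i if it is still a best response (JSFP-1); otherwise move to a
        # uniformly chosen best response with probability alpha (JSFP-2).
        new_profile = []
        for i, a_i in enumerate(profile):
            best = max(values[i])
            best_set = [r for r in range(num_roads) if values[i][r] >= best - 1e-12]
            if a_i in best_set or rng.random() >= alpha:
                new_profile.append(a_i)
            else:
                new_profile.append(rng.choice(best_set))
        profile = new_profile
    return profile, cost


def max_unilateral_gain(profile, cost, num_roads=4):
    """Largest cost reduction any single player could get by switching roads (0 at an NE)."""
    loads = [0] * num_roads
    for r in profile:
        loads[r] += 1
    gain = 0.0
    for a_i in profile:
        current = cost(a_i, loads[a_i])
        for r in range(num_roads):
            if r != a_i:
                gain = max(gain, current - cost(r, loads[r] + 1))
    return gain


profile, cost = simulate_jsfp_inertia()
print("players per road:", [profile.count(r) for r in range(4)])
print("max unilateral gain:", max_unilateral_gain(profile, cost))
```

Each stage costs $O(n \cdot |\mathcal{A}_i|)$ work in total, compared with the $4^{99}$-term sum FP would need for a single player's expected payoff.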
Example of Convergence
Conclusion
• We have demonstrated some weaknesses of FP (computational demands, observational demands, etc.)
• We have presented JSFP, which appears to accommodate these computational limitations