Joint Strategy Fictitious Play

Page 1: Joint Strategy Fictitious Play

Joint Strategy Fictitious Play

Sherwin Doroudi

Page 2: Joint Strategy Fictitious Play

“Adapted” from

J. R. Marden, G. Arslan, J. S. Shamma, “Joint strategy fictitious play with inertia for potential games,” in Proceedings of the 44th IEEE Conference on Decision and Control, December 2005, pp. 6692-6697.

Page 3: Joint Strategy Fictitious Play

Review: Game

• Players:

• Actions:

• Payoffs:
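The formulas on this slide did not survive extraction. For reference, a finite game in strategic form is commonly written as follows. The symbols are our choice and not necessarily the slide's, and each action set is assumed finite:

```latex
G = \bigl(\mathcal{N}, \{\mathcal{A}_i\}_{i \in \mathcal{N}}, \{U_i\}_{i \in \mathcal{N}}\bigr), \qquad
\mathcal{N} = \{1, \dots, n\}, \qquad
U_i : \mathcal{A}_1 \times \cdots \times \mathcal{A}_n \to \mathbb{R}
```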

Page 4: Joint Strategy Fictitious Play

Review: Game

We then play the game repeatedly in “stages,” starting at stage 0. Players can use the learning algorithms discussed in lecture. Each player knows the structural form of its own payoff function but not the form of the other players’ payoff functions.

Page 5: Joint Strategy Fictitious Play

Notation: Actions

As in the lecture, we use the notation
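The notation itself was an image that is lost. A common convention, assumed here, writes a joint action and its "everyone but player i" part as follows, with a(t) denoting the joint action played at stage t:

```latex
a = (a_1, \dots, a_n) \in \mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n, \qquad
a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n) \in \mathcal{A}_{-i} = \prod_{j \neq i} \mathcal{A}_j, \qquad
U_i(a) = U_i(a_i, a_{-i})
```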

Page 6: Joint Strategy Fictitious Play

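Review: Regret Matching

• Guaranteed to converge to the set of Coarse Correlated Equilibria (CCE) in all games (Hart & Mas-Colell, 2000): the empirical distribution of joint play approaches this set

• But a CCE can be quite bad in some cases: the set of CCE is a superset of the set of Nash Equilibria (NE), so it may also contain outcomes that are not NE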
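A minimal sketch of one player's regret-matching step, assuming the unconditional form usually tied to CCE: an action's average regret is the average payoff the player would have earned by always playing it, minus the payoff it actually earned, and actions are played with probability proportional to positive regret. The function names and the uniform fallback are our own choices.

```python
from typing import Callable, List, Sequence


def average_regrets(payoff: Callable[[int, object], float],
                    my_actions: Sequence[int],
                    others: Sequence[object],
                    num_actions: int) -> List[float]:
    """payoff(a_i, a_minus_i) is this player's payoff; my_actions[k] and others[k]
    are what this player and its opponents played at stage k."""
    t = len(my_actions)
    realized = sum(payoff(my_actions[k], others[k]) for k in range(t)) / t
    return [sum(payoff(a, others[k]) for k in range(t)) / t - realized
            for a in range(num_actions)]


def regret_matching_probs(avg_regret: Sequence[float]) -> List[float]:
    """Mixed strategy proportional to the positive part of each action's regret."""
    positive = [max(r, 0.0) for r in avg_regret]
    total = sum(positive)
    if total == 0.0:
        # No positive regret: play uniformly (one common convention).
        return [1.0 / len(avg_regret)] * len(avg_regret)
    return [p / total for p in positive]


# Toy example: this player wants to match its single opponent's action.
match = lambda a_i, a_minus_i: 1.0 if a_i == a_minus_i else 0.0
regrets = average_regrets(match, my_actions=[0, 0], others=[1, 1], num_actions=2)
print(regrets, regret_matching_probs(regrets))  # [0.0, 1.0] [0.0, 1.0]
```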

Page 7: Joint Strategy Fictitious Play

Review: Fictitious Play (FP)

• Observe the empirical frequencies of every player’s actions

• Consider best response(s) under the (incorrect) assumption that the other players play independently according to their empirical frequencies

• Randomly choose a best response and act accordingly
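A minimal sketch of one FP decision, assuming each player can evaluate its own payoff for any joint action; `payoff(i, joint_action)` is a hypothetical interface, not something from the slides. The explicit enumeration over the other players' joint actions is deliberate: it is exactly the cost the routing example runs into.

```python
import itertools
import random
from typing import Callable, Sequence, Tuple

Payoff = Callable[[int, Tuple[int, ...]], float]  # payoff(i, joint_action) -> U_i


def fp_expected_payoff(payoff: Payoff, i: int, a_i: int,
                       q: Sequence[Sequence[float]]) -> float:
    """Expected payoff of a_i for player i, assuming every other player j plays
    independently according to its empirical frequency vector q[j]."""
    others = [j for j in range(len(q)) if j != i]
    total = 0.0
    # Enumerate every joint action of the other players: this loop is the expensive part.
    for a_minus_i in itertools.product(*(range(len(q[j])) for j in others)):
        prob = 1.0
        for j, a_j in zip(others, a_minus_i):
            prob *= q[j][a_j]  # independence: product of marginals
        joint = list(a_minus_i)
        joint.insert(i, a_i)  # put player i's action back in position i
        total += prob * payoff(i, tuple(joint))
    return total


def fp_best_response(payoff: Payoff, i: int, q: Sequence[Sequence[float]],
                     rng: random.Random, tol: float = 1e-12) -> int:
    """Pick uniformly at random among player i's (near-)best responses."""
    values = [fp_expected_payoff(payoff, i, a, q) for a in range(len(q[i]))]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v >= best - tol])


# Toy 3-player coordination game: everyone gets 1 if all actions match.
coord = lambda i, a: 1.0 if len(set(a)) == 1 else 0.0
q = [[0.5, 0.5], [0.9, 0.1], [0.8, 0.2]]
print(fp_best_response(coord, 0, q, random.Random(0)))  # 0
```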

Page 8: Joint Strategy Fictitious Play

Empirical Frequency in FP

The empirical frequency of an action for a player is the fraction of stages, up to and including the previous stage, in which the player chose that action:
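Written out for stage t ≥ 1 (symbol names assumed, since the slide's formula is lost), with the indicator equal to 1 when its condition holds and 0 otherwise:

```latex
q_i^{a_i}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_i(\tau) = a_i\}, \qquad t \geq 1
```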

Page 9: Joint Strategy Fictitious Play

Empirical Frequency in FP

Each player also has an empirical frequency vector, which collects the empirical frequencies of all of that player’s actions.
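A small sketch of how each player's empirical frequency vector, q_i(t) = (q_i^{a_i}(t)) over a_i in A_i, can be computed from the history of joint actions. Each vector is a probability distribution over that player's own actions; the list-of-lists layout is an assumption made for illustration.

```python
from typing import List, Sequence


def empirical_frequencies(history: Sequence[Sequence[int]],
                          num_actions: Sequence[int]) -> List[List[float]]:
    """history[tau][j] is player j's action at stage tau, for tau = 0, ..., t-1.
    Returns q with q[j][a] = fraction of those t stages in which player j chose a."""
    t = len(history)
    counts = [[0] * m for m in num_actions]
    for joint in history:
        for j, a_j in enumerate(joint):
            counts[j][a_j] += 1
    # Divide once at the end (avoids accumulating rounding error).
    return [[c / t for c in row] for row in counts]


history = [(0, 1), (0, 0), (1, 1), (0, 1)]
print(empirical_frequencies(history, num_actions=[2, 2]))  # [[0.75, 0.25], [0.25, 0.75]]
```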

Page 10: Joint Strategy Fictitious Play

Best Response in FP

Each player assumes an expected payoff

Each player then chooses a best response from the set
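In standard FP notation (symbols assumed, since the slide's formulas are lost), the expected payoff multiplies the other players' marginal frequencies, where a_j is player j's component of a_{-i}, and the best-response set collects its maximizers:

```latex
U_i\bigl(a_i, q_{-i}(t)\bigr) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \neq i} q_j^{a_j}(t),
\qquad
\mathrm{BR}_i(t) = \arg\max_{a_i \in \mathcal{A}_i} U_i\bigl(a_i, q_{-i}(t)\bigr)
```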

Page 11: Joint Strategy Fictitious Play

The Good News!

“The empirical frequencies generated by FP converge to a Nash equilibrium in potential games” (Monderer & Shapley, 1996).
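For reference, Monderer and Shapley call a game an (exact) potential game when a single function Φ tracks every player's unilateral payoff changes:

```latex
U_i(a_i', a_{-i}) - U_i(a_i, a_{-i}) = \Phi(a_i', a_{-i}) - \Phi(a_i, a_{-i})
\quad \text{for all } i,\; a_i, a_i' \in \mathcal{A}_i,\; a_{-i} \in \mathcal{A}_{-i}
```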

Page 12: Joint Strategy Fictitious Play

The Bad News (if any)?

What are some weaknesses of FP?

Page 13: Joint Strategy Fictitious Play

A Routing Example

• Consider a routing game with 100 players, all with the same source and sink

• There are 4 roads from the source to the sink

• Players want to minimize their cost

Page 14: Joint Strategy Fictitious Play

A Routing Example

• The cost of traveling on each road is given by a quadratic function of the number of players choosing that road, with positive coefficients (which could be randomly generated)

• Can we use FP as a learning algorithm in this example?
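A sketch of the cost structure, assuming the quadratic has the form θ2·k² + θ1·k + θ0 in the load k; that form and the coefficient ranges below are our assumptions, since the slides only say the coefficients are positive.

```python
import random
from typing import List, Sequence, Tuple

Coeffs = List[Tuple[float, float, float]]  # (theta2, theta1, theta0) per road


def random_quadratic_costs(num_roads: int, rng: random.Random) -> Coeffs:
    # Hypothetical ranges; only positivity comes from the slides.
    return [(rng.uniform(0.01, 0.1), rng.uniform(0.1, 1.0), rng.uniform(1.0, 5.0))
            for _ in range(num_roads)]


def road_loads(profile: Sequence[int], num_roads: int) -> List[int]:
    """Number of players on each road under the joint action `profile`."""
    loads = [0] * num_roads
    for r in profile:
        loads[r] += 1
    return loads


def player_cost(i: int, profile: Sequence[int], coeffs: Coeffs) -> float:
    """c_r(k) = theta2*k^2 + theta1*k + theta0 for player i's road r with load k."""
    r = profile[i]
    k = road_loads(profile, len(coeffs))[r]
    theta2, theta1, theta0 = coeffs[r]
    return theta2 * k * k + theta1 * k + theta0


rng = random.Random(0)
coeffs = random_quadratic_costs(4, rng)
profile = [rng.randrange(4) for _ in range(100)]  # 100 players, 4 roads
print(road_loads(profile, 4), player_cost(0, profile, coeffs))
```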

Page 15: Joint Strategy Fictitious Play

A Routing Example

Formalizing the game, we have
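The original formalization is not preserved. One way to write it, assuming the quadratic form below (coefficient names are ours), takes each player's payoff to be its negative cost, so that minimizing cost is the same as maximizing U_i:

```latex
\mathcal{N} = \{1, \dots, 100\}, \qquad \mathcal{A}_i = \mathcal{R} = \{1, 2, 3, 4\}, \qquad
\sigma_r(a) = \bigl|\{ j \in \mathcal{N} : a_j = r \}\bigr|,
\\
c_r(k) = \theta_{r,2}\, k^2 + \theta_{r,1}\, k + \theta_{r,0} \ \ (\theta_{r,\cdot} > 0), \qquad
U_i(a) = -\, c_{a_i}\bigl(\sigma_{a_i}(a)\bigr)
```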

Page 18: Joint Strategy Fictitious Play

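A Routing Example

Remember this? (the FP expected payoff from the “Best Response in FP” slide)

The sum above is over 4^99 = 2^198 terms, one for each joint action of the other 99 players, each of whom can choose among 4 roads!

This is not computationally feasible!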
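Concretely, for one player in the routing game the FP expected payoff is a sum over every joint action of the other 99 players (notation assumed):

```latex
U_i\bigl(a_i, q_{-i}(t)\bigr) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \neq i} q_j^{a_j}(t),
\qquad
|\mathcal{A}_{-i}| = 4^{99} = 2^{198} \approx 4.0 \times 10^{59}
```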

Page 19: Joint Strategy Fictitious Play

What do we do?

The routing example (which is fairly realistic) shows that we either need to find a more efficient way to compute this utility or need to develop an algorithm that is computationally suitable for “large” games.

Page 20: Joint Strategy Fictitious Play

Joint Strategy Fictitious Play (JSFP)

• Observe empirical frequencies of joint actions

• Consider best response(s) under the (still incorrect) assumption that all other players act collectively as a group according to their joint empirical frequency

• Randomly choose a best response and act accordingly

Page 22: Joint Strategy Fictitious Play

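Does FP = JSFP?

• In the case of two players, it is easy to see that FP and JSFP are the same: the “joint” action of the other players is just the single opponent’s action, so its joint empirical frequency is that opponent’s empirical frequency

• But in the case of three or more players this is not necessarily the case: FP treats the opponents as playing independently, while JSFP uses the empirical frequency of their joint actions, which can capture correlation among them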
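A toy three-player illustration (the history and payoff are made up): players 1 and 2 always choose the same action, alternating between 0 and 1, and player 0 is paid only when all three match. FP multiplies the opponents' marginal frequencies and so misses the correlation, while JSFP averages over the opponents' actual joint plays.

```python
from collections import Counter
from itertools import product

# Hypothetical past joint actions (a_1, a_2) of player 0's two opponents.
opp_history = [(0, 0), (1, 1), (0, 0), (1, 1)]
t = len(opp_history)


def payoff0(a0: int, a1: int, a2: int) -> float:
    """Player 0 is paid 1 only if all three actions match."""
    return 1.0 if a0 == a1 == a2 else 0.0


# FP: each opponent's marginal empirical frequency, combined as if independent.
q1 = Counter(a1 for a1, _ in opp_history)
q2 = Counter(a2 for _, a2 in opp_history)


def fp_value(a0: int) -> float:
    return sum(payoff0(a0, a1, a2) * (q1[a1] / t) * (q2[a2] / t)
               for a1, a2 in product(range(2), repeat=2))


# JSFP: expectation under the opponents' joint empirical frequency.
def jsfp_value(a0: int) -> float:
    return sum(payoff0(a0, a1, a2) for a1, a2 in opp_history) / t


print(fp_value(0), jsfp_value(0))  # 0.25 0.5
```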

Page 23: Joint Strategy Fictitious Play

Empirical Frequency in JSFP

The empirical frequency for an action profile may be calculated as follows:
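Written out (symbols assumed): the joint empirical frequency of a profile a, and its counterpart for the other players' joint action a_{-i}, which is the quantity player i actually uses:

```latex
z^{a}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a(\tau) = a\}, \qquad
z_{-i}^{a_{-i}}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_{-i}(\tau) = a_{-i}\}
```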

Page 25: Joint Strategy Fictitious Play

Expected Payoff in JSFP

Each player assumes an expected payoff
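With z_{-i}^{a_{-i}}(t) denoting the empirical frequency of the other players' joint action a_{-i} over stages 0 to t−1, this expected payoff can be written as follows (notation assumed):

```latex
V_i^{a_i}(t) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i})\, z_{-i}^{a_{-i}}(t)
```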

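At first glance, this looks about as bad as (maybe worse than) FP: the sum still runs over every joint action of the other players, and we would now have to track a frequency for each joint action rather than one frequency vector per player!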

So what can we do?

Page 26: Joint Strategy Fictitious Play

Expected Payoff in JSFP

Each player assumes an expected payoff

We rewrite it in a more useful form!
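The step that makes this useful, assuming the joint empirical frequency is a time average of indicators (notation ours): swapping the two sums collapses the indicator, leaving a time average of payoffs against the others' realized joint actions, and that average updates recursively:

```latex
V_i^{a_i}(t)
= \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \, \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_{-i}(\tau) = a_{-i}\}
= \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \, \mathbb{I}\{a_{-i}(\tau) = a_{-i}\}
= \frac{1}{t} \sum_{\tau=0}^{t-1} U_i\bigl(a_i, a_{-i}(\tau)\bigr)
\\
V_i^{a_i}(t+1) = \frac{t}{t+1}\, V_i^{a_i}(t) + \frac{1}{t+1}\, U_i\bigl(a_i, a_{-i}(t)\bigr)
```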

Page 27: Joint Strategy Fictitious Play

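The JSFP Payoff Recursion

So now we can rewrite the expected payoff as a simple recursion and, at every stage, choose an action that maximizes it (our best response).

We are maximizing regret! A player’s regret for an action is this average payoff minus the player’s own realized average payoff, and the realized term does not depend on the action being evaluated, so both have the same maximizers.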
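A minimal sketch of the recursion for one player, assuming the player can evaluate its own payoff U_i(a_i, a_{-i}) for every own action once it observes the others' joint action; the function names and toy payoff are ours. The last lines check the regret remark: subtracting the realized average shifts every entry by the same amount, so the best responses do not change.

```python
from typing import Callable, List, Sequence, Tuple

OwnPayoff = Callable[[int, Tuple[int, ...]], float]  # U_i(a_i, a_minus_i)


def jsfp_update(V: Sequence[float], t: int, payoff_i: OwnPayoff,
                a_minus_i: Tuple[int, ...]) -> List[float]:
    """V_i^{a_i}(t+1) = t/(t+1) * V_i^{a_i}(t) + 1/(t+1) * U_i(a_i, a_{-i}(t)), for every a_i.
    Only the others' realized joint action a_{-i}(t) is needed, not a frequency table."""
    return [(t * v + payoff_i(a, a_minus_i)) / (t + 1) for a, v in enumerate(V)]


def best_responses(V: Sequence[float], tol: float = 1e-12) -> List[int]:
    best = max(V)
    return [a for a, v in enumerate(V) if v >= best - tol]


# Toy payoff (hypothetical): player i wants to match the first opponent.
payoff_i = lambda a_i, a_minus_i: 1.0 if a_i == a_minus_i[0] else 0.0
opp_history = [(0, 1), (1, 1), (1, 0)]  # a_{-i}(0), a_{-i}(1), a_{-i}(2)
my_history = [0, 0, 1]                  # a_i(0), a_i(1), a_i(2)

V = [0.0, 0.0]  # V_i(0); irrelevant, since the t = 0 step gives it weight 0
for t, a_minus_i in enumerate(opp_history):
    V = jsfp_update(V, t, payoff_i, a_minus_i)

# Regret = V minus the realized average payoff, which is the same for every a_i.
realized = sum(payoff_i(a, o) for a, o in zip(my_history, opp_history)) / len(my_history)
regret = [v - realized for v in V]
print(V, best_responses(V), best_responses(regret))  # [0.3333333333333333, 0.6666666666666666] [1] [1]
```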

Page 28: Joint Strategy Fictitious Play

Convergence Properties of JSFP

The convergence properties of JSFP in potential games with three or more players remain unknown; this is an open problem. However, once the joint action generated by JSFP reaches a strict NE, it stays there forever. To obtain convergence guarantees, we add “inertia” to the learning algorithm.

Page 29: Joint Strategy Fictitious Play

JSFP with Inertia

• Assume that all NE are strict

• JSFP-1: If the action a player chose in the previous stage is still a best response at the current stage, choose that action again

• JSFP-2: Otherwise, choose an action according to the distribution
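One way to write this mixture, assuming V_i^{a_i}(t) is player i's current average payoff for action a_i (our notation). Here v^{a_i(t−1)} puts all of its mass on the previous action, and α_i(t) is taken to lie strictly between 0 and 1:

```latex
a_i(t) \sim \alpha_i(t)\, \beta_i(t) + \bigl(1 - \alpha_i(t)\bigr)\, v^{a_i(t-1)},
\qquad
\operatorname{supp} \beta_i(t) \subseteq \arg\max_{a_i \in \mathcal{A}_i} V_i^{a_i}(t)
```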

Page 30: Joint Strategy Fictitious Play

The JSFP-2 Distribution

Here, the alpha parameter represents the player’s willingness to optimize at a given stage, beta is a probability distribution whose support is contained in the set of best responses at this stage, and v is the distribution that puts all of its probability on the action taken in the previous stage.
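A sketch of a single JSFP-with-inertia decision. Two choices here are assumptions, not from the slides: beta is taken to be uniform over the best responses, and alpha is a fixed number rather than a time-varying alpha_i(t).

```python
import random
from typing import Sequence


def jsfp_inertia_action(V: Sequence[float], prev_action: int, alpha: float,
                        rng: random.Random, tol: float = 1e-12) -> int:
    """Choose a_i(t) from the current average payoffs V (one entry per action)."""
    best = max(V)
    br = [a for a, v in enumerate(V) if v >= best - tol]
    # JSFP-1: keep the previous action if it is still a best response.
    if prev_action in br:
        return prev_action
    # JSFP-2: with probability alpha, draw from beta (here: uniform over best responses);
    # otherwise follow v, i.e. repeat the previous action.
    if rng.random() < alpha:
        return rng.choice(br)
    return prev_action


rng = random.Random(1)
# Either 0 (inertia) or 1 (the unique best response), depending on the draw.
print(jsfp_inertia_action([1.0, 3.0, 2.0], prev_action=0, alpha=0.5, rng=rng))
```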

Page 31: Joint Strategy Fictitious Play

JSFP w/ Inertia Converges!

• In particular, to some Nash equilibrium in generalized ordinal potential games

• Of course, there is no equilibrium selection mechanism

• And not much is known regarding the convergence rate

• But we have shown that JSFP w/ Inertia is a good substitute for FP in “large” games
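For reference, in Monderer and Shapley's terminology a game is a generalized ordinal potential game if some function Φ increases whenever a unilateral deviation strictly improves the deviator's payoff. Every exact potential game satisfies this condition:

```latex
U_i(a_i', a_{-i}) > U_i(a_i, a_{-i}) \;\Longrightarrow\; \Phi(a_i', a_{-i}) > \Phi(a_i, a_{-i})
\quad \text{for all } i,\; a_i, a_i' \in \mathcal{A}_i,\; a_{-i} \in \mathcal{A}_{-i}
```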

Page 32: Joint Strategy Fictitious Play

JSFP w/ Inertia Converges!

If you want the proof, read the paper, as the proof is not trivial!

Page 33: Joint Strategy Fictitious Play

The Fading Memory Variant

We used the recursion

But we could also use the recursion
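Written out with V_i^{a_i}(t) for player i's estimated payoff from action a_i (notation assumed; the slide's formulas are lost), the running-average recursion and a fading-memory version are shown below. Choosing ρ = 1/(t+1) in the second recovers the first:

```latex
V_i^{a_i}(t+1) = \frac{t}{t+1}\, V_i^{a_i}(t) + \frac{1}{t+1}\, U_i\bigl(a_i, a_{-i}(t)\bigr)
\\
V_i^{a_i}(t+1) = (1 - \rho)\, V_i^{a_i}(t) + \rho\, U_i\bigl(a_i, a_{-i}(t)\bigr)
```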

Here, ρ is a constant (or a function of time) with 0 < ρ ≤ 1, and it has also been proven that this algorithm gives rise to a process converging to some NE.
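A sketch of the fading-memory update; the function name and the initial value V = 0 are assumptions. With a constant ρ the estimate weights recent stages geometrically more heavily, while the schedule ρ = 1/(t+1) reproduces the ordinary running average.

```python
from typing import List, Sequence


def fading_update(V: Sequence[float], rho: float,
                  payoffs_now: Sequence[float]) -> List[float]:
    """V_i^{a_i}(t+1) = (1 - rho) V_i^{a_i}(t) + rho U_i(a_i, a_{-i}(t)), for every a_i.
    payoffs_now[a] is U_i(a, a_{-i}(t)) for the others' joint action at stage t."""
    return [(1.0 - rho) * v + rho * u for v, u in zip(V, payoffs_now)]


# One action whose payoff was 1 for four stages and then dropped to 0.
stream = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]

V_fade = [0.0]  # assumed initial value
for u in stream:
    V_fade = fading_update(V_fade, 0.5, [u])  # constant rho = 0.5

V_avg = [0.0]
for t, u in enumerate(stream):
    V_avg = fading_update(V_avg, 1.0 / (t + 1), [u])  # rho = 1/(t+1): plain average

print(V_fade, V_avg)
```

In this toy stream the constant-ρ estimate has already dropped to 0.234375 after two bad stages, while the running average is still about 2/3.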

Page 34: Joint Strategy Fictitious Play

A Routing Example, Revisited

• We can now apply JSFP w/ Inertia and fading memory to the routing problem, and we should converge to some NE (the convergence result covers generalized ordinal potential games, which include routing games)

• Simulations show that JSFP without inertia also works in this case

• Try it!
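A compact sketch to try it with, under stated assumptions: the cost form θ2·k² + θ1·k + θ0, the coefficient ranges, α = 0.1, ρ = 0.1 and the 500-stage horizon are all our choices, and beta is uniform over best responses. Each player's payoff for road r is the negative cost it would have faced there with everyone else's choice held fixed, which is all JSFP needs to observe.

```python
import random
from typing import List, Sequence, Tuple

Coeffs = List[Tuple[float, float, float]]  # (theta2, theta1, theta0) per road


def road_cost(coeffs: Coeffs, r: int, k: int) -> float:
    """Assumed quadratic cost c_r(k) = theta2*k^2 + theta1*k + theta0 for load k."""
    theta2, theta1, theta0 = coeffs[r]
    return theta2 * k * k + theta1 * k + theta0


def loads(profile: Sequence[int], num_roads: int) -> List[int]:
    counts = [0] * num_roads
    for r in profile:
        counts[r] += 1
    return counts


def is_nash(profile: Sequence[int], coeffs: Coeffs) -> bool:
    """True if no player can strictly lower its cost by switching roads alone."""
    load = loads(profile, len(coeffs))
    for r in profile:
        current = road_cost(coeffs, r, load[r])
        for r2 in range(len(coeffs)):
            if r2 != r and road_cost(coeffs, r2, load[r2] + 1) < current:
                return False
    return True


def simulate(n: int = 100, num_roads: int = 4, stages: int = 500,
             alpha: float = 0.1, rho: float = 0.1, seed: int = 0):
    """JSFP with inertia and fading memory on a random routing game (a sketch)."""
    rng = random.Random(seed)
    coeffs = [(rng.uniform(0.01, 0.1), rng.uniform(0.1, 1.0), rng.uniform(1.0, 5.0))
              for _ in range(num_roads)]
    profile = [rng.randrange(num_roads) for _ in range(n)]
    V = None  # V[i][r]: player i's fading-memory payoff estimate for road r
    for _ in range(stages):
        load = loads(profile, num_roads)
        # Payoff (negative cost) player i would have had on road r, others held fixed.
        now = [[-road_cost(coeffs, r, load[r] + (0 if profile[i] == r else 1))
                for r in range(num_roads)] for i in range(n)]
        V = now if V is None else [[(1 - rho) * v + rho * u for v, u in zip(V[i], now[i])]
                                   for i in range(n)]
        new_profile = []
        for i in range(n):
            best = max(V[i])
            br = [r for r in range(num_roads) if V[i][r] >= best - 1e-12]
            if profile[i] in br or rng.random() >= alpha:
                new_profile.append(profile[i])      # JSFP-1, or inertia in JSFP-2
            else:
                new_profile.append(rng.choice(br))  # optimize in JSFP-2
        profile = new_profile
    return profile, coeffs


final_profile, coeffs = simulate()
print(loads(final_profile, len(coeffs)), is_nash(final_profile, coeffs))
```

Whether a given run has settled on an equilibrium by the last stage depends on the seed and parameters; `is_nash` checks the final profile directly rather than assuming convergence.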

Page 35: Joint Strategy Fictitious Play

Example of Convergence

Page 36: Joint Strategy Fictitious Play

Conclusion

• We have demonstrated some weaknesses of FP (computational demands, observational demands, etc.)

• We have developed JSFP, which seems to accommodate computational limitations