Nash QLearning for GeneralSum Stochastic Games

Hu, J. and Wellman, M.P.

Lauri Lyly, llyly@cc.hut.fi

Stochastic generalsum games

● Stochasticity: Environment is in part formed by other agents– nondeterministic, noncooperative nature, no agreements

● Arbitrary relation between agents' rewards– Extends last time's topic zerosum

Nash Equilibrium

● Bestresponse joint strategy● Study limited to stationary strategies (policies)● Rewards of others are perceived, strategies are not

Nash QValues

Stage game

● Oneperiod game as opposed to stochastic

● Mainly used in convergence proof● Nash equilibrium for the stage game. M is a “payoff

function”:

Update rule

● Same update rule for agent itself and its conjecture on other agent's Qfunctions

● Qfunctions can be initialized for example to 0● Asynchronous updating: only entries pertaining to current

state are updated

The Nash Qlearning algorithm

Convergence proof requirements

● Assumption 1: Every stateaction tuple is visited infinitely often

● Assumption 2: Learning rate alpha(t) satisfies:– Sum from goes towards infinity– Squared sum does not– alpha = 0 if the element being updated doesn't correspond to

current stateaction tuple (asynchronous updating)

Proof basis and result

● Qlearning process updated by pseudocontraction operator using the usual form:

Q=(1alpha(t))Q(t)+alpha(t)[P(t)Q(t)]Contraction: Values approach optimal Q

● Link between stage games and stochastic games● Goal: Show that NashQ is a pseudocontraction operator● Actually a real contraction operator in restricted conditions

– Game with special types of Nash equilibrium points

Different Nash equilibria

● Global optimal point using stage game notation

● Saddle point

● All equilibria chosen for update must be same type

Stage games compared

Global optimal point (Up, Left) Saddle point (Down,Right)

Experimentation framework

● Two gridworld games

● Motivation for grid games: statespecific actions, qualitative transitions, immediate and longterm rewards

Learning process

● Violates assumption 3 of monotonic selection of global optima or saddle points.

● Still converges in most cases regardless of selection● Offline and online learning rated separately

Conclusions

● No current method provides performance guarantees for generalsum stochastic games

● Works as a starting point– Other promising variants– Nash equilibrium itself can be refined

References

● [7] Hu, J. and Wellman, M.P. (2003). Nash QLearning for GeneralSum Stochastic Games. Journal of Machine Learning Research 4:1039–1069.

Nash QLearning for GeneralSum Stochastic Games › ... › presentations › presentation11.pdf ·...

Transcript of Nash QLearning for GeneralSum Stochastic Games › ... › presentations › presentation11.pdf ·...

Nash QLearning for GeneralSum Stochastic Games › ... › presentations › presentation11.pdf ·...

Documents

Transcript of Nash QLearning for GeneralSum Stochastic Games › ... › presentations › presentation11.pdf ·...

Stochastic games and their complexities

Numerical Approximations for Stochastic Diﬀerential Games ... · Numerical Approximations for Stochastic Diﬀerential Games: The Ergodic Case Harold J. Kushner∗ Applied Mathematics

STOCHASTIC GAMES WITH A SINGLE CONTROLLER AND …eilons/sgiiSIAM.pdf · STOCHASTIC GAMES WITH A SINGLE CONTROLLER AND INCOMPLETE INFORMATION∗ DINAH ROSENBERG†, EILON SOLAN‡,

Synthesis for Multi-Objective Stochastic Games: An Application ... - PRISM … · 2013. 6. 3. · methods in PRISM-games, a model checker for stochastic multi-player games, and present

Stochastic Games with a Single Controller and Incomplete ...the analogue for stochastic games. Sorin (1984, 1985) and Sorin and Zamir (1991) studied classes of stochastic games with

A Constructive Study of Markov Equilibria in Stochastic Games …econ/conference/documents/reffett.pdf · 2011-12-07 · Stochastic Games with Strategic Complementarities L ukasz

Two-stage allocations for stochastic linear programming games

Stochastic Coalitional Games for Cooperative Random Access ... · Stochastic Coalitional Games for Cooperative Random Access in M2M Communications Mehdi Naderi Soorki, Walid Saad,

Joint Action Learners in Competitive Stochastic Games

Zero-Sum Stochastic Games An algorithmic review

De nable zero-sum stochastic games · 2017. 1. 29. · Keywords Zero-sum stochastic games, Shapley operator, o-minimal structures, deﬁnable games, uniform value, nonexpansive mappings,

Autonomous Demand Response using Stochastic Differential Games

From Timed Automata to Stochastic Hybrid Games

Dynamic Programming for Partially Observable Stochastic Games

Stochastic Multiplayer Games: Theory and Algorithms - Aachen

A Lyapunov Optimization Approach to Repeated Stochastic Games

Stochastic Zero-sum and Nonzero-sum -regular Games

Solving Stackelberg Equilibrium in Stochastic Games ...

Stochastic Direct Reinforcement: Application to Simple Games … · 2006-01-11 · Stochastic Direct Reinforcement: Application to Simple Games with Recurrence John Moody*, Yufeng

A QLearning approach with ... - University of Sussex