Stochastic games
Bertinoro, 2011
Dario Bauso
• Applications
• Stochastic games: formulation
• Two-players zero sum games
• Results and open questions
• References
freely inspired by Solan E (2009) Stochastic Games, in Encyclopedia of Database Systems, Springer.
Capital accumulation (Fishery)
• Two players jointly own a natural resource or productive asset.
• at every period they have to decide the amount of resource to consume
• the amount that is not consumed grows by a known (or an unknown) fraction
• state is the current amount of resource
• action set is the amount of resource to be exploited in the current period
• transition is influenced by the decisions of all the players, as well as by the random growth of the resource.
Taxation
• A government sets a tax rate at every period
• each citizen decides at every period how much to work and how much money to consume; the rest of the money grows by a known interest rate at the next period
• state is the citizens' amounts of savings
• stage payoff of a citizen depends on i) the amount of money that he consumed, ii) the free time he has, iii) the tax that the government collected in total.
• stage payoff of the government combines i) the average stage payoff of citizens and ii) the amount of tax collected
Communication network
• A single-cell system with one receiver and multiple uplink transmitters sharing a single, slotted, synchronous classical collision channel
• transmitters at each time slot decide if and which packet to transmit
• state is channel congestion
• stage payoff combines probability of successful transmission plus cost of transmission
• dropped packets are backlogged
Queues
• Individuals may choose between a slow private service provider and a powerful public service provider
• state is the current load of public and private service providers
• the payoff is the time to be served
Main ingredients
• Interactions among players
• environment changes in response to players’ behaviors
• players’ stage payoff depends on players’ current behaviors and environment
Formulation
• Set of players N
• state space S
• set of actions Ai of player i
• set-valued function Ai : S → 2^Ai (available actions for a given state)
• set of action profiles SA := {(s, a) : s ∈ S, a = (a_i)_{i∈N}, a_i ∈ A_i(s)}
• stage payoff function u_i : SA → R of player i
• transition function q : SA → ∆(S) (∆(S) is the space of probability distributions over S)
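As an illustration of these ingredients (a minimal Python sketch; the names, numbers, and the fishery-style dynamics are mine, not from the slides), the tuple (N, S, A_i, u_i, q) can be written down directly:

```python
# A tiny two-player stochastic game in the spirit of the fishery example.
# All names and numbers here are illustrative.

N = [1, 2]                                   # set of players
S = ["low", "high"]                          # state space (resource level)
A = {"low": ["wait", "exploit"],             # A_i(s): available actions per state
     "high": ["wait", "exploit"]}            # (identical for both players here)

def u(i, s, a):
    """Stage payoff u_i(s, a): player i earns 1 by exploiting a rich resource."""
    return 1.0 if s == "high" and a[i - 1] == "exploit" else 0.0

def q(s, a):
    """Transition q(. | s, a): a probability distribution over next states."""
    if "exploit" in a:
        return {"low": 0.8, "high": 0.2}     # exploiting tends to deplete the stock
    return {"low": 0.2, "high": 0.8}         # conserving lets the stock grow
```

Note that q returns an element of ∆(S): its values are nonnegative and sum to 1, and the next state depends on the actions of all players at once.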
Comments
Stochastic games generalize
• finite interactions, if play moves at a fixed time T to an absorbing state with payoff 0
• (static) matrix games, if T = 1
• repeated games, if we have one single state
• stopping games, if the stage payoff is 0 until a player chooses to quit, after which play moves to an absorbing state with nonnull payoff
• Markov decision problems, if we have just one single player.
• Also, the payoff u_i is a profit (to maximize) but can also be a cost (to minimize)
• actions determine the current payoff and the future state (hence future payoffs)
• actions, payoffs, and transitions depend only on the current state
Strategies
• given the past play at stage t:
  (s1, a1, s2, a2, . . . , st)
• a (pure) stationary strategy depends on the current state only:
  σ_i(s1, a1, s2, a2, . . . , st) ∈ A_i(st),
  i.e. the past play (s1, a1, s2, a2, . . . , a_{t−1}) does not count
• mixed strategy:
  σ_i(s1, a1, s2, a2, . . . , st) ∈ ∆(A_i(st)),
  where ∆(A_i(st)) is the set of probability distributions on A_i(st)
Strategies
• Space of stationary mixed strategies for player i:
  X_i = ×_{s∈S} ∆(A_i(s))
• profile of mixed strategies:
  σ = (σ_i)_{i∈N}, σ_i ∈ X_i
• the space of infinite plays H∞ = SA^ℕ is the set of all possible infinite sequences:
  (s1, a1, s2, a2, . . . , st, at, . . .)
• Every profile of mixed strategies σ = (σ_i)_{i∈N} and initial state s1 induce a probability distribution P_{s1,σ} on H∞ = SA^ℕ
• finite or infinite (T → ∞) stream of payoffs:
  u_i(s_t, a_t), t = 1, 2, . . . , T
Set-ups
• finite horizon evaluation: interaction lasts exactly T stages
• discounted evaluation: interaction lasts many stages, players discount stage payoffs (better to receive 1 $ today than tomorrow)
• limsup evaluation: interaction lasts many stages, players do not discount stage payoffs; the stage payoff at time t is insignificant compared to the payoffs in all other stages
Set-ups
• T-stage payoff
  γ_i^T(s1, σ) := E_{s1,σ} [ (1/T) ∑_{t=1}^{T} u_i(s_t, a_t) ]
• λ-discounted payoff
  γ_i^λ(s1, σ) := E_{s1,σ} [ λ ∑_{t=1}^{∞} (1 − λ)^{t−1} u_i(s_t, a_t) ]
• limsup payoff
  γ_i^∞(s1, σ) := E_{s1,σ} [ lim sup_{T→∞} (1/T) ∑_{t=1}^{T} u_i(s_t, a_t) ]
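A quick numerical check (Python; the numbers are illustrative) of how the two main evaluations relate: the weights λ(1 − λ)^{t−1} in the discounted evaluation sum to 1, so a constant payoff stream u_i(s_t, a_t) = c evaluates to c under both the discounted and the T-stage criteria:

```python
# For the constant stream u_i(s_t, a_t) = c, both evaluations equal c,
# because the geometric weights lam * (1 - lam)**(t - 1) sum to 1.
lam = 0.1
c = 3.0
T = 2000                                 # truncation; (1 - lam)**T is negligible here

gamma_disc = lam * sum((1 - lam) ** (t - 1) * c for t in range(1, T + 1))
gamma_T = sum(c for _ in range(T)) / T   # T-stage (average) evaluation
```

This normalization is why the λ-discounted payoff lives on the same scale as the stage payoffs.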
Equilibria
• σ is a T-stage ε-equilibrium if
  γ_i^T(s1, σ) ≥ γ_i^T(s1, σ′_i, σ_{−i}) − ε, ∀ s1 ∈ S, i ∈ N, σ′_i ∈ X_i
• σ is a λ-discounted ε-equilibrium if
  γ_i^λ(s1, σ) ≥ γ_i^λ(s1, σ′_i, σ_{−i}) − ε, ∀ s1 ∈ S, i ∈ N, σ′_i ∈ X_i
• σ is a limsup ε-equilibrium if
  γ_i^∞(s1, σ) ≥ γ_i^∞(s1, σ′_i, σ_{−i}) − ε, ∀ s1 ∈ S, i ∈ N, σ′_i ∈ X_i
Players benefit from unilateral deviations by no more than ε.
Two-player zero-sum stochastic game
• the sum of payoffs is zero: u_1(s, a) + u_2(s, a) = 0, ∀ (s, a) ∈ SA
• The game admits at most one equilibrium payoff (termed the value of the game) at every initial state s1
• each player's strategy at an ε-equilibrium is ε-optimal, i.e. it guarantees the value up to ε: for Player 1,
  γ_1^T(s1, σ_1, σ_2) ≥ v^T(s1) − ε, ∀ σ_2 ∈ X_2,
  where v^T(s1) is the value at s1
• Theorem [Shapley 1953, Fink 1964]: If all sets are finite, then for every λ there exists an equilibrium in stationary strategies
Proof
• V is the space of all functions v : S → R.
• for every v define a zero-sum matrix game G_s^λ(v) with action spaces A_1(s), A_2(s) at state s
• its payoff (what Player 2 pays to Player 1) is
  λ u_1(s, a) + (1 − λ) ∑_{s′∈S} q(s′ | s, a) v(s′)
• define the value operator φ_s(v) = val(G_s^λ(v))
• the contraction property ‖φ(v) − φ(w)‖_∞ ≤ (1 − λ) ‖v − w‖_∞ leads to a unique fixed point v̂_λ
• an optimal mixed action σ_i in the matrix game G_{s_t}^λ(v̂_λ) is a λ-discounted 0-optimal strategy
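The contraction step can be unpacked (my reconstruction of the standard argument, not spelled out on the slides): the value of a matrix game is nonexpansive in its entries, and the entries of G_s^λ(v) and G_s^λ(w) differ only in the continuation term:

```latex
\begin{aligned}
|\phi_s(v) - \phi_s(w)|
  &= \bigl|\operatorname{val}\bigl(G^{\lambda}_s(v)\bigr)
     - \operatorname{val}\bigl(G^{\lambda}_s(w)\bigr)\bigr| \\
  &\le \max_{a}\,(1-\lambda)\,
     \Bigl|\sum_{s'\in S} q(s'\mid s,a)\,\bigl(v(s') - w(s')\bigr)\Bigr|
  \;\le\; (1-\lambda)\,\|v - w\|_{\infty}.
\end{aligned}
```

Since 1 − λ < 1, φ is a contraction on V with the sup-norm, and Banach's fixed-point theorem gives existence and uniqueness of v̂_λ.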
Example 1: Absorbing game

State s2 (entry: stage payoff, next state):
       L       R
  T  0, s2   1, s1
  B  1, s1   0, s0

State s1:
       L
  T  1, s1

State s0:
       L
  T  0, s0

• for every v = (v0, v1, v2) ∈ V = R^3 the game G_{s2}^λ(v) is

       L                 R
  T  (1−λ)v2         λ + (1−λ)v1
  B  λ + (1−λ)v1     (1−λ)v0

  Game G_{s1}^λ(v):
       L
  T  λ + (1−λ)v1

  Game G_{s0}^λ(v):
       L
  T  (1−λ)v0

• v̂_λ(s0) = val(G_{s0}^λ(v̂)) yields v̂_λ(s0) = 0
• v̂_λ(s1) = val(G_{s1}^λ(v̂)) yields v̂_λ(s1) = 1
• v̂_λ(s2) = val(G_{s2}^λ(v̂)) yields v̂_λ(s2) = (1 − √λ)/(1 − λ)

Optimal stationary strategies:
  σ2 = [ (1−√λ)/(1−λ) (L), (√λ−λ)/(1−λ) (R) ]
  σ1 = [ (1−√λ)/(1−λ) (T), (√λ−λ)/(1−λ) (B) ]
obtained from the indifference conditions
  v2 = y (1−λ) v2 + (1 − y) = y   (y: prob. Player 2 plays L)
  v2 = x (1−λ) v2 + (1 − x) = x   (x: prob. Player 1 plays T).
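As a numerical cross-check (a Python sketch; `matrix_value` and `shapley_iteration` are my own helper names), iterating the operator φ on this absorbing game converges to the fixed point above; e.g. for λ = 1/4 it reaches v̂_λ(s2) = (1 − √0.25)/(1 − 0.25) = 2/3:

```python
def matrix_value(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                      # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)    # value of the mixed equilibrium

def shapley_iteration(lam, tol=1e-12, max_iter=10_000):
    """Iterate v <- phi(v) for Example 1 (s0, s1 are absorbing, s2 is active)."""
    v = {"s0": 0.0, "s1": 0.0, "s2": 0.0}
    for _ in range(max_iter):
        new = {
            "s0": lam * 0 + (1 - lam) * v["s0"],   # 1x1 game at s0, stage payoff 0
            "s1": lam * 1 + (1 - lam) * v["s1"],   # 1x1 game at s1, stage payoff 1
            "s2": matrix_value([
                [(1 - lam) * v["s2"], lam + (1 - lam) * v["s1"]],
                [lam + (1 - lam) * v["s1"], (1 - lam) * v["s0"]],
            ]),
        }
        if max(abs(new[s] - v[s]) for s in v) < tol:
            return new
        v = new
    return v

lam = 0.25
v = shapley_iteration(lam)
closed_form = (1 - lam ** 0.5) / (1 - lam)      # (1 - sqrt(lambda)) / (1 - lambda)
```

The (1 − λ)-contraction guarantees geometric convergence of this iteration from any initial guess.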
Example 2: Big Match

State s2 (entry: stage payoff, next state):
       L       R
  T  0, s2   1, s2
  B  1, s1   0, s0

State s1:
       L
  T  1, s1

State s0:
       L
  T  0, s0

• for every v = (v0, v1, v2) ∈ V = R^3 the game G_{s2}^λ(v) is

       L                 R
  T  (1−λ)v2         λ + (1−λ)v2
  B  λ + (1−λ)v1     (1−λ)v0

  Game G_{s1}^λ(v):
       L
  T  λ + (1−λ)v1

  Game G_{s0}^λ(v):
       L
  T  (1−λ)v0

• v̂_λ(s0) = val(G_{s0}^λ(v̂)) yields v̂_λ(s0) = 0
• v̂_λ(s1) = val(G_{s1}^λ(v̂)) yields v̂_λ(s1) = 1
• v̂_λ(s2) = val(G_{s2}^λ(v̂)) yields v̂_λ(s2) = 1/2

Optimal stationary strategies:
  σ2 = [ 1/2 (L), 1/2 (R) ]
  σ1 = [ 1/(1+λ) (T), λ/(1+λ) (B) ]
obtained from the indifference conditions
  v2 = y (1−λ) v2 + (1 − y)[λ + (1−λ) v2] = y   (y: prob. Player 2 plays L)
  v2 = x (1−λ) v2 + (1 − x) = x [λ + (1−λ) v2]   (x: prob. Player 1 plays T).
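These indifference conditions are easy to verify numerically (a Python sketch; the variable names are mine). For λ = 1/4 one gets x = 1/(1+λ) = 0.8 and y = 1/2, and every pure action of either player yields exactly v̂_λ(s2) = 1/2 against the opponent's mix:

```python
lam = 0.25
v2, v1, v0 = 0.5, 1.0, 0.0            # fixed-point values for the Big Match
y = 0.5                               # prob. that Player 2 plays L
x = 1 / (1 + lam)                     # prob. that Player 1 plays T

# Player 1's payoff from T and from B against Player 2's mix (y, 1 - y):
pay_T = y * (1 - lam) * v2 + (1 - y) * (lam + (1 - lam) * v2)
pay_B = y * (lam + (1 - lam) * v1) + (1 - y) * (1 - lam) * v0

# Player 1's payoff from columns L and R against Player 1's mix (x, 1 - x):
pay_L = x * (1 - lam) * v2 + (1 - x) * (lam + (1 - lam) * v1)
pay_R = x * (lam + (1 - lam) * v2) + (1 - x) * (1 - lam) * v0
```

All four payoffs coincide with v2, which is exactly the statement that (x, y) is an equilibrium of G_{s2}^λ(v̂_λ).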
Results and open questions
• Can one find, for every stochastic game, a strategy profile that is an ε-equilibrium for every sufficiently small discount factor?
• Can one identify classes of games with a simple strategy profile that is an ε-equilibrium for every sufficiently small discount factor (e.g. a stationary strategy, a periodic strategy)?
• Theorem [Mertens and Neyman 1981]: For two-player zero-sum games each player has a strategy that is ε-optimal for every sufficiently small discount factor
• Theorem [Vieille 2000]: For every two-player nonzero-sum stochastic game there is a strategy profile that is an ε-equilibrium for every sufficiently small discount factor.
Algorithms
• based on linear programming for two-player zero-sum games
• extensions of the Lemke-Howson algorithm for nonzero-sum games
• other algorithms based on fictitious play, value iterates, and policy improvement
Additional and future directions
• approximation of games with infinite state and action spaces by finite games
• stochastic games in continuous time
• existence of a uniform equilibrium and a limsup equilibrium in multi-player stochastic games with finite state and action spaces.
• development of efficient algorithms that calculate the value of two-player zero-sum games.
• approachable and excludable sets in stochastic games with vector payoffs
References
• Filar JA, Vrieze K (1996) Competitive Markov decision processes. Springer.
• Mertens JF, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66.
• Neyman A, Sorin S (2003) Stochastic games and applications. NATO Science Series. Kluwer.
• Solan E (2009) Stochastic Games, in Encyclopedia of Database Systems. Springer.
• Vieille N (2000) Equilibrium in 2-person stochastic games I: A reduction. Israel J Math 119:55–91.