Bilinear Games: Polynomial Time Algorithms for Rank Based Subclasses
Ruta Mehta, Indian Institute of Technology, Bombay
Joint work with Jugal Garg and Albert X. Jiang

A Game: Rock-Paper-Scissors
Rock-Paper-Scissors: A Play
The winner receives $1 from the loser.

Rock-Paper-Scissors Payoffs (row player, column player); C denotes Scissors
        R       P       C
R      0,0    -1,1     1,-1
P      1,-1    0,0    -1,1
C     -1,1     1,-1    0,0

Bimatrix Game
Steady State: No player gains by unilateral deviation
S1 = { R, P, C }, S2 = { R, P, C }
A (payoffs of player 1):
     R   P   C
R    0  -1   1
P    1   0  -1
C   -1   1   0
B (payoffs of player 2):
     R   P   C
R    0   1  -1
P   -1   0   1
C    1  -1   0

Bimatrix Game
No Steady State in pure strategies: with the same A, B, S1 and S2 as on the previous slide, every pair of pure strategies leaves at least one player with a profitable deviation.

Mixed Play
Steady State: both players play R, P and C with probability 1/3 each (A and B as before).
S1 = { R, P, C }, S2 = { R, P, C }
$\Delta_1 = \{(r_1, p_1, c_1) \ge 0 : r_1 + p_1 + c_1 = 1\}$
$\Delta_2 = \{(r_2, p_2, c_2) \ge 0 : r_2 + p_2 + c_2 = 1\}$
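
The steady-state claims for Rock-Paper-Scissors can be checked mechanically. The sketch below is only an illustration: the helper names (`row_payoffs`, `col_payoffs`, `is_nash`) are made up for this example, and it relies on the fact that, because payoffs are linear in a player's own mix, it is enough to test deviations to pure strategies.

```python
from fractions import Fraction

# Row player's payoff matrix A for Rock-Paper-Scissors (rows/columns ordered R, P, C).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
# Zero-sum game: the column player's matrix is B = -A.
B = [[-a for a in row] for row in A]

def row_payoffs(M, y):
    """Expected payoff of each pure row strategy against the column mix y."""
    return [sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(M))]

def col_payoffs(M, x):
    """Expected payoff of each pure column strategy against the row mix x."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def is_nash(A, B, x, y):
    """True if neither player gains by a unilateral deviation to a pure strategy."""
    u1 = sum(xi * r for xi, r in zip(x, row_payoffs(A, y)))
    u2 = sum(yj * c for yj, c in zip(y, col_payoffs(B, x)))
    return max(row_payoffs(A, y)) <= u1 and max(col_payoffs(B, x)) <= u2

third = Fraction(1, 3)
uniform = [third, third, third]
pure_rock = [1, 0, 0]

print(is_nash(A, B, uniform, uniform))      # both players mix uniformly
print(is_nash(A, B, pure_rock, pure_rock))  # both players always play Rock
```

Exact fractions are used so that the comparison at the uniform mix is not affected by floating-point rounding.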

John Nash (1951)
Finite Game: Finitely many players, each with finitely many strategies.
Nash: Every finite game has a steady state in mixed strategies, henceforth called a Nash equilibrium (NE).
Proved using the Kakutani fixed point theorem: highly non-constructive.

Nash Equilibrium Computation
Papadimitriou (JCSS'94): The class PPAD, containing problems where existence of a solution is guaranteed, such as fixed points, Sperner's Lemma and Nash equilibrium.
Chen and Deng (FOCS'06): Computing a NE of a two-player game is PPAD-hard.
Chen, Deng and Teng (FOCS'06): Even approximation is PPAD-hard.

Rank and Computation
Kannan and Theobald (SODA'07): Define the rank of (A, B) as rank(A+B). FPTAS for fixed rank games.
Polynomial time algorithms for exact Nash:
Dantzig (1963): Zero-sum (rank-0) is equivalent to LP.
Adsul, Garg, Mehta and Sohoni (STOC'11): Rank-1 games.
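
As a small illustration of the rank definition, the sketch below computes rank(A+B) numerically with NumPy. The function name `game_rank` and the second example game are assumptions made for this example only.

```python
import numpy as np

def game_rank(A, B):
    """Rank of the bimatrix game (A, B), defined as rank(A + B)."""
    total = np.asarray(A, dtype=float) + np.asarray(B, dtype=float)
    return int(np.linalg.matrix_rank(total))

rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(game_rank(rps, -rps))    # zero-sum game: A + B is the zero matrix

# Hypothetical identical-interest game (B = A) with an invertible payoff matrix.
coord = np.array([[2, 0], [0, 1]])
print(game_rank(coord, coord)) # A + B = 2A has full rank
```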

Bilinear Games
Bimatrix game with polyhedral strategy sets.
Two players: 1 and 2.
Polyhedral strategy sets: $X = \{x \in \mathbb{R}^m \mid Ex = e,\ x \ge 0\}$, $Y = \{y \in \mathbb{R}^n \mid Fy = f,\ y \ge 0\}$.
Payoff matrices: $A, B \in \mathbb{R}^{m \times n}$.
Bilinear payoff: $(x, y)$ fetches $x^T A y$ to player 1 and $x^T B y$ to player 2.
Motivation: Koller et al. (STOC'94) for two-player extensive form games with perfect recall.
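
To make the definition concrete, here is a minimal sketch of a bilinear game in NumPy. The helper names `in_polytope` and `bilinear_payoffs` are hypothetical; the data is the bimatrix special case, where E and F are single rows of ones so that X and Y are probability simplices.

```python
import numpy as np

def in_polytope(E, e, x, tol=1e-9):
    """True if x >= 0 and E x = e, i.e. x lies in {x | Ex = e, x >= 0}."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= -tol) and np.allclose(E @ x, e, atol=tol))

def bilinear_payoffs(A, B, x, y):
    """Payoffs (x^T A y, x^T B y) to players 1 and 2."""
    return float(x @ A @ y), float(x @ B @ y)

# Bimatrix special case (illustrative): E = F = [1 1 1], e = f = [1].
# A is the Rock-Paper-Scissors matrix and B = -A.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
B = -A
E = F = np.ones((1, 3))
e = f = np.array([1.0])

x = np.array([0.5, 0.5, 0.0])   # player 1 mixes Rock and Paper
y = np.array([1.0, 0.0, 0.0])   # player 2 plays Rock

print(in_polytope(E, e, x), in_polytope(F, f, y))
print(bilinear_payoffs(A, B, x, y))
```

A general bilinear game only changes E, e, F and f; the payoff computation stays the same.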

Nash Equilibrium in Bilinear Games
NE: No player gains by unilateral deviation.
Existence: Corollary of Glicksberg's result.
Symmetric game: $B = A^T$ and $Y = X$. A profile $(x, y)$ is symmetric if $y = x$.
Existence of symmetric NE: An adaptation of Nash's proof for symmetric bimatrix games.

Bilinear Contains: Bimatrix, Polymatrix, Bayesian, etc.
Bimatrix: $X = \Delta_1$, $Y = \Delta_2$.
Polymatrix: N players. Each pair plays a bimatrix game.
Player i: $S_i$ is the finite strategy set, $\Delta_i$ the mixed strategy set.
Goal of i: Choose $x_i$ from $\Delta_i$ to maximize total payoff.
[Figure: players i and j joined by an edge labeled $A^{ij}$]
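
Polymatrix to Bilinear
$M = |S_1| + \dots + |S_N|$.
$X = \{(x_1, \dots, x_N) \mid x_i \in \Delta_i\}$, $Y = X$.
$A \in \mathbb{R}^{M \times M}$ is the block matrix with $A^{ij}$ in block $(i, j)$ and zero diagonal blocks; $B = A^T$.
A symmetric NE of (A, B) maps to a NE of the polymatrix game.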
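
The block construction can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `polymatrix_to_bilinear`, the dictionary layout of the pairwise matrices and the 3-player data are all hypothetical.

```python
import numpy as np

def polymatrix_to_bilinear(sizes, pair_payoffs):
    """Return (A, E, e) describing the symmetric bilinear game (A, A^T) on X.

    sizes[i] is |S_i|. pair_payoffs[(i, j)] is the |S_i| x |S_j| matrix A^{ij}
    giving player i's payoff against player j; pairs left out are zero blocks.
    """
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    M = int(offsets[-1])
    A = np.zeros((M, M))
    E = np.zeros((len(sizes), M))
    for i in range(len(sizes)):
        E[i, offsets[i]:offsets[i + 1]] = 1.0   # row i encodes sum(x_i) = 1
    for (i, j), Aij in pair_payoffs.items():
        if i == j:
            raise ValueError("diagonal blocks must stay zero")
        A[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = Aij
    return A, E, np.ones(len(sizes))

# Hypothetical example: players 0 and 1 have 2 strategies, player 2 has 3.
sizes = [2, 2, 3]
pair_payoffs = {
    (0, 1): np.array([[1.0, 0.0], [0.0, 1.0]]),
    (1, 0): np.array([[0.0, 1.0], [1.0, 0.0]]),
    (1, 2): np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]),
    (2, 1): np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
}
A, E, e = polymatrix_to_bilinear(sizes, pair_payoffs)

x = np.array([0.5, 0.5, 1.0, 0.0, 0.0, 0.0, 1.0])  # (x_0, x_1, x_2) stacked
print(A.shape, E @ x)
print((A @ x)[2:4])   # player 1's payoff for each of its pure strategies
```

Block i of `A @ x` collects the payoff of each pure strategy of player i against all other players, which is what each player maximizes in the polymatrix game.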
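
Best Response (Koller et al.)
Fix a strategy y of player 2. Player 1 solves the LP on the left; its dual is on the right.
max: $x^T (Ay)$                  min: $e^T p$
s.t. $Ex = e$, $x \ge 0$         s.t. $E^T p \ge Ay$
At optimal: there is p such that $A_i y \le p^T E^i$ for all i, and $x_i > 0 \Rightarrow A_i y = p^T E^i$, where $A_i$ is the i-th row of A and $E^i$ the i-th column of E.
Given $x \in X$, for player 2 we get, at optimal: q such that $x^T B^j \le q^T F^j$ for all j, and $y_j > 0 \Rightarrow x^T B^j = q^T F^j$, where $B^j$ and $F^j$ are the j-th columns of B and F.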
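
For intuition, the primal-dual pair can be worked out by hand when X is a simplex (E is a row of ones and e = 1): the dual has a single variable p, and its optimum is the largest entry of Ay. The sketch below covers only this special case; a general polytope X would need an LP solver, which is not shown. The function name is hypothetical.

```python
import numpy as np

def best_response_on_simplex(A, y):
    """Solve max x^T (A y) over the simplex and its dual min p s.t. p >= (A y)_i.

    Returns (x, p): a pure best response x and the optimal dual value p.
    """
    values = A @ y                  # payoff of each pure strategy i against y
    p = float(values.max())         # tightest feasible dual value
    x = np.zeros(len(values))
    x[int(values.argmax())] = 1.0   # put all weight on one maximizer
    return x, p

A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])  # Rock-Paper-Scissors
y = np.array([0.5, 0.5, 0.0])       # opponent mixes Rock and Paper
x, p = best_response_on_simplex(A, y)

slack = p - A @ y                   # dual slack p - A_i y, nonnegative by feasibility
print(x, p)                         # best response and its value
print(float(x @ A @ y) == p)        # strong duality: primal value equals dual value
print(bool(np.all(slack[x > 0] == 0)))  # complementary slackness on the support of x
```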
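
Best Response Polytopes (BRPs)
(x, y) is a NE iff
there exists p: $Ay \le E^T p$; $x_i > 0 \Rightarrow A_i y = p^T E^i$, and
there exists q: $x^T B \le q^T F$; $y_j > 0 \Rightarrow x^T B^j = q^T F^j$.
$P = \{(y, p) \mid A_i y - p^T E^i \le 0\ \forall i,\ y_j \ge 0\ \forall j,\ Fy = f\}$
$Q = \{(x, q) \mid x_i \ge 0\ \forall i,\ x^T B^j - q^T F^j \le 0\ \forall j,\ Ex = e\}$
On $P \times Q$: $x^T (Ay - E^T p) \le 0$ and $(x^T B - q^T F) y \le 0$, hence $x^T (A+B) y - e^T p - f^T q \le 0$.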
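
The combined inequality comes from adding the two complementarity terms and substituting $Ex = e$ and $Fy = f$. Written out as a short derivation (the last term is $f^T q$, which is what the substitution $Fy = f$ yields):

```latex
\begin{align*}
x^T (Ay - E^T p) &= x^T A y - (Ex)^T p = x^T A y - e^T p \le 0, \\
(x^T B - q^T F)\, y &= x^T B y - q^T (Fy) = x^T B y - f^T q \le 0, \\
\text{sum:}\quad & x^T (A+B)\, y - e^T p - f^T q \le 0 .
\end{align*}
```

Each term is nonpositive because $x \ge 0$, $y \ge 0$ and the bracketed vectors are nonpositive on $P \times Q$; the sum is zero exactly when both terms are zero.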

Nash Equilibrium in BRPs
NE iff $x^T (Ay - E^T p) = 0$ and $(x^T B - q^T F) y = 0$, i.e. $x^T (A+B) y - e^T p - f^T q = 0$.
Assumption: P and Q are non-degenerate.
If (u, v) in $P \times Q$ gives a NE, then (u, v) is a vertex.

QP Formulation
max: $x^T (A+B) y - e^T p - f^T q$
s.t. $(y, p) \in P$, $(x, q) \in Q$
Optimal value 0. Only vertex solutions.
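
A quick numerical check of the QP objective, restricted to the bimatrix case where e = f = 1 and the tightest feasible duals are p = max_i (Ay)_i and q = max_j (x^T B)_j. The function name `nash_gap` and the 2x2 game below are hypothetical choices for illustration.

```python
import numpy as np

def nash_gap(A, B, x, y):
    """QP objective x^T (A+B) y - e^T p - f^T q with the tightest feasible p, q.

    For simplex strategy sets e = f = 1, and feasibility of (y, p) in P and
    (x, q) in Q means p >= max_i (A y)_i and q >= max_j (x^T B)_j.
    """
    p = (A @ y).max()
    q = (x @ B).max()
    return float(x @ (A + B) @ y - p - q)

# Hypothetical 2x2 coordination game (not from the talk).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])

pure_ne = (np.array([1.0, 0.0]), np.array([1.0, 0.0]))
mixed_ne = (np.array([2 / 3, 1 / 3]), np.array([1 / 3, 2 / 3]))
not_ne = (np.array([1.0, 0.0]), np.array([0.0, 1.0]))

for x, y in (pure_ne, mixed_ne, not_ne):
    print(nash_gap(A, B, x, y))   # zero (up to rounding) at a NE, negative otherwise
```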

Our Results
Rank-1 games, rank(A+B) = 1: Extend the Adsul et al. algorithm for exact NE.
Fixed rank games, rank(A+B) = k: Extend the FPTAS of Kannan et al.
Rank of A or B is constant: Enumerate all NE in polynomial time.

Rank-1 Case
Zero-sum, i.e. rank(A+B) = 0: LP formulation (Charnes'53).
If rank(A+B) = 1 then $A + B = a b^T$.
The QP formulation:
max: $(x^T a)(b^T y) - e^T p - f^T q$
s.t. $(y, p) \in P$, $(x, q) \in Q$
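
One way to obtain vectors a and b with A + B = a b^T numerically is a singular value decomposition. This is a sketch of a standard technique, not necessarily how the authors compute the factors; the function name and the example game are assumptions, and the factors are only unique up to scaling.

```python
import numpy as np

def rank_one_factors(M, tol=1e-9):
    """Return vectors (a, b) with M = a b^T; raise ValueError if rank(M) != 1."""
    M = np.asarray(M, dtype=float)
    U, s, Vt = np.linalg.svd(M)
    if s[0] <= tol or (len(s) > 1 and s[1] > tol * s[0]):
        raise ValueError("matrix does not have rank 1")
    # Scale the leading left singular vector by the leading singular value.
    return s[0] * U[:, 0], Vt[0, :]

# Hypothetical game: A is arbitrary and B = -A + a0 b0^T, so A + B has rank 1.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
a0 = np.array([1.0, 2.0, 0.0])
b0 = np.array([0.0, 1.0, 1.0])
B = -A + np.outer(a0, b0)

a, b = rank_one_factors(A + B)
print(np.allclose(np.outer(a, b), A + B))   # the factors reproduce A + B
```

The recovered (a, b) may differ from (a0, b0) by a scalar factor and its inverse; only the product a b^T is determined.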

Rank-1 Case
Replace $x^T a$ by z. Recall $B = -A + a b^T$.
$Q = \{(x, q) \mid x_i \ge 0,\ x^T B^j \le q^T F^j,\ Ex = e\}$ becomes
$Q' = \{(x, z, q) \mid x_i \ge 0;\ x^T (-A^j) + z b_j \le q^T F^j;\ Ex = e\}$.
$x^T (A+B) y - e^T p - f^T q = 0$ becomes $z (b^T y) - e^T p - f^T q = 0$.
N = points of $P \times Q'$ with $z (b^T y) - e^T p - f^T q = 0$. N forms paths and cycles, since z gives one degree of freedom.
NE of (A, B): points in the intersection of N and $z - x^T a = 0$.
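
The condition defining N is again the sum of the two complementarity terms, now with z treated as a free variable. For $(y, p) \in P$ and $(x, z, q) \in Q'$, using $Ex = e$ and $Fy = f$:

```latex
\begin{align*}
x^T (Ay - E^T p) + \bigl(-x^T A + z\, b^T - q^T F\bigr)\, y
  &= x^T A y - e^T p - x^T A y + z\,(b^T y) - f^T q \\
  &= z\,(b^T y) - e^T p - f^T q \;\le\; 0 .
\end{align*}
```

Both terms on the left are nonpositive, so equality holds exactly when both complementarity conditions hold. On the hyperplane $z = x^T a$ the row vector $-x^T A + z\, b^T$ equals $x^T(-A + a b^T) = x^T B$, which recovers the constraints of Q.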

Parameterized LP
LP(z), with z fixed: max: $z (b^T y) - e^T p - f^T q$
s.t. $(y, p) \in P$, $(x, z, q) \in Q'$
Given any c, the optimal value of LP(c) is 0, so OPT(c) lies on N.
Let N(c) = {points of N with z = c}; then OPT(c) = N(c).
N is a single path on which z is monotonic.
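
Rank-1: The Algorithm
NE: Intersection of N and H: $z - x^T a = 0$.
$a_{\min} = \min_{x \in X} a^T x$, $a_{\max} = \max_{x \in X} a^T x$; set $c_1 = a_{\min}$, $c_2 = a_{\max}$.
[Figure: the path N crosses the hyperplane H at a NE; N(c1) and N(c2) lie on opposite sides of H, labeled H– and H+]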

Rank-1: Binary Search Algorithm
NE of (A, B): points in the intersection of N and H.
$c = (c_1 + c_2)/2$. If N(c) is in H–, then $c_1 = c$; else $c_2 = c$.
[Figure: N(c) lies on the path N between N(c1) and N(c2); the interval shrinks around the NE where N crosses H]
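
The search loop itself can be sketched as a plain bisection on z. Everything about the oracle is an assumption: `ax_at(c)` is a hypothetical callable that should return $x^T a$ at the point N(c), which in the talk is obtained by solving LP(c) and is not implemented here. The sketch also stops at a numerical tolerance, whereas the talk's algorithm terminates exactly.

```python
def binary_search_on_path(ax_at, c1, c2, tol=1e-9, max_iter=200):
    """Bisection on z for a point of N lying on H: z - x^T a = 0.

    ax_at(c) is a hypothetical oracle returning x^T a at the point N(c).
    c1 and c2 play the roles of a_min and a_max.
    """
    for _ in range(max_iter):
        c = (c1 + c2) / 2
        gap = c - ax_at(c)     # sign tells on which side of H the point N(c) lies
        if abs(gap) <= tol:
            return c
        if gap < 0:            # N(c) in H-: move the lower end up
            c1 = c
        else:                  # N(c) in H+: move the upper end down
            c2 = c
    return (c1 + c2) / 2

# Toy oracle standing in for "solve LP(c) and read off x^T a" (made up for illustration).
def toy_oracle(c):
    return 1.0 + 0.25 * c

c_star = binary_search_on_path(toy_oracle, 1.0, 2.0)
print(c_star)   # close to 4/3, where c = 1 + c/4
```

With a real oracle, the returned value of z identifies the point of N on H, i.e. a NE of (A, B).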

Analysis
Terminates because
z is monotonic on N, and
the increase in z on each edge is lower bounded by 1/d, where d is polynomial sized in the input.
Time complexity: Solve LP(c) to get N(c) in each pivot; $\log(d) \cdot \log(a_{\max} - a_{\min})$ pivots.

Conclusions
Bilinear games: Bimatrix with polytopal strategy sets. Fairly general: contains polymatrix, Bayesian, etc.
Polynomial time algorithms for rank based subclasses.
Open problems: Designing a Lemke-Howson type algorithm. Degree, index, stability concepts. Computation of approximate equilibrium.
Thank You