Bilinear Games: Polynomial Time Algorithms for Rank Based Subclasses Ruta Mehta Indian Institute of...

Bilinear Games: Polynomial Time Algorithms for Rank Based Subclasses

Ruta Mehta, Indian Institute of Technology, Bombay

Joint work with Jugal Garg and Albert X. Jiang

A Game: Rock-Paper-Scissor

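Rock-Paper-Scissor: A Play

The winner gets $1 from the loser. Payoffs are listed as (player 1, player 2); rows are player 1's move, columns are player 2's move.

       R       P       C
R     0,0    -1,1    1,-1
P     1,-1    0,0   -1,1
C    -1,1     1,-1   0,0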

Rock-Paper-Scissor Payoffs

Payoffs to player 1 (row player):

       R    P    C
R      0   -1    1
P      1    0   -1
C     -1    1    0

Bimatrix Game

S1 = { R, P, C }
S2 = { R, P, C }

A (payoffs to player 1):

       R    P    C
R      0   -1    1
P      1    0   -1
C     -1    1    0

B (payoffs to player 2):

       R    P    C
R      0    1   -1
P     -1    0    1
C      1   -1    0

Steady State: No player gains by unilateral deviation.

Bimatrix Game: No Steady State

With the same S1, S2, A and B, no pair of pure strategies is a steady state: for every pair of moves, the player who is not winning gains by deviating unilaterally.

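Mixed Play

Steady State: both players play R, P and C with probability 1/3 each; neither gains by unilateral deviation.

S1 = { R, P, C }, $\Delta_1 = \{(r_1, p_1, c_1) \ge 0;\ r_1 + p_1 + c_1 = 1\}$
S2 = { R, P, C }, $\Delta_2 = \{(r_2, p_2, c_2) \ge 0;\ r_2 + p_2 + c_2 = 1\}$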
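The two claims on these slides (no steady state in pure strategies, a steady state at the uniform mix) can be checked directly. The following is a minimal sketch using the matrices A and B = -A from the slides; the helper names are illustrative and not from the talk.

```python
from fractions import Fraction

# Payoff matrices from the slides: A for player 1 (rows), B = -A for player 2.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
B = [[-a for a in row] for row in A]
MOVES = ["R", "P", "C"]


def expected_payoffs(x, y):
    """Return (x^T A y, x^T B y) for mixed strategies x and y."""
    u1 = sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))
    u2 = sum(x[i] * B[i][j] * y[j] for i in range(3) for j in range(3))
    return u1, u2


def is_nash(x, y):
    """True if no pure deviation improves either player's payoff.

    Checking pure deviations suffices because each payoff is linear in the
    deviating player's own mixed strategy.
    """
    u1, u2 = expected_payoffs(x, y)
    best1 = max(sum(A[i][j] * y[j] for j in range(3)) for i in range(3))
    best2 = max(sum(x[i] * B[i][j] for i in range(3)) for j in range(3))
    return best1 <= u1 and best2 <= u2


def pure(k):
    """Pure strategy k written as a degenerate mixed strategy."""
    return [Fraction(int(i == k)) for i in range(3)]


pure_equilibria = [(MOVES[i], MOVES[j])
                   for i in range(3) for j in range(3)
                   if is_nash(pure(i), pure(j))]
uniform = [Fraction(1, 3)] * 3

print(pure_equilibria)            # no pure pair survives the check
print(is_nash(uniform, uniform))  # the uniform mix does
```

Exact rational arithmetic (`Fraction`) is used so that the equality between the best deviation payoff and the current payoff is not disturbed by rounding.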

John Nash (1951)

Finite Game: Finitely many players, each with finitely many strategies.

Nash: Every finite game has a steady state in mixed strategies, henceforth called Nash equilibrium (NE).

Proved using the Kakutani fixed point theorem: highly non-constructive.

Nash Equilibrium Computation

Papadimitriou (JCSS’94): The class PPAD. Problems where existence is guaranteed, like fixed point, Sperner’s Lemma, Nash equilibrium.

Chen and Deng (FOCS’06): Computing a NE of a two-player game is PPAD-hard.

Chen, Deng and Teng (CDT, FOCS’06): Even approximation is PPAD-hard.

Rank and Computation

Kannan and Theobald (SODA’07): Define the rank of (A,B) as rank(A+B). FPTAS for fixed rank games.

Polynomial time algorithms for exact Nash:
Dantzig (1963): Zero-sum (rank-0) is equivalent to LP.
Adsul, Garg, Mehta and Sohoni (AGMS, STOC’11): Rank-1 games.
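To make the rank definition concrete, here is a small sketch that computes rank(A+B) exactly by Gaussian elimination over the rationals. The rank-1 example (vectors `a` and `b`) is made up for illustration and does not come from the talk.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a rational matrix by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a nonzero pivot at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [u - factor * v for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def game_rank(A, B):
    """Rank of the game (A, B), defined as rank(A + B)."""
    total = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(A, B)]
    return matrix_rank(total)


# Rock-Paper-Scissor is zero-sum, so A + B = 0 and the rank is 0.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
B = [[-v for v in row] for row in A]
print(game_rank(A, B))

# Hypothetical rank-1 game: B1 = -A + a b^T with a = (1, 2, 3), b = (1, 0, 1).
a, b = [1, 2, 3], [1, 0, 1]
B1 = [[-A[i][j] + a[i] * b[j] for j in range(3)] for i in range(3)]
print(game_rank(A, B1))
```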

Bilinear Games

Bimatrix game with polyhedral strategy sets.

Two players: 1 and 2.
Polyhedral strategy sets: $X = \{x \in \mathbb{R}^m \mid Ex = e;\ x \ge 0\}$, $Y = \{y \in \mathbb{R}^n \mid Fy = f;\ y \ge 0\}$.
Payoff matrices: $A, B \in \mathbb{R}^{m \times n}$.
Bilinear payoff: $(x, y)$ fetches $x^T A y$ to player 1 and $x^T B y$ to player 2.

Motivation: Koller et al. (STOC’94) for two-player extensive form games with perfect recall.

Nash Equilibrium in Bilinear Games

NE: No player gains by unilateral deviation.
Existence: Corollary of Glicksberg’s result.

Symmetric game: $B = A^T$ and $Y = X$. $(x, y)$ is a symmetric profile if $y = x$.
Existence of symmetric NE: An adaptation of Nash’s proof for symmetric bimatrix games.

Bilinear Contains: Bimatrix, Polymatrix, Bayesian, etc.

Bimatrix: $X = \Delta_1$, $Y = \Delta_2$.

Polymatrix: N players. Each pair plays a bimatrix game.
Player i: $S_i$ finite strategy set, $\Delta_i$ mixed strategy set.
Goal of i: Choose $x_i$ from $\Delta_i$ to maximize total payoff.

[Figure: players i and j with pairwise payoff matrix $A_{ij}$]

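Polymatrix to Bilinear

$M = |S_1| + \dots + |S_N|$.
$X = \{(x_1, \dots, x_N) \mid x_i \in \Delta_i\}$, $Y = X$.
$A \in \mathbb{R}^{M \times M}$ is the block matrix whose $(i, j)$ block is $A_{ij}$ for $i \ne j$, with zero blocks on the diagonal; $B = A^T$.

Symmetric NE of $(A, B)$ maps to a NE of the polymatrix game.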
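The block construction can be sketched in a few lines. The input format below (a dictionary keyed by player pairs) and the function names are assumptions made for illustration; only the block layout, zero diagonal blocks and $A_{ij}$ in block $(i, j)$ with $B = A^T$, follows the slide.

```python
def polymatrix_to_bilinear(sizes, pair_payoffs):
    """Assemble the M x M block matrix A of the bilinear game.

    sizes[i] is |S_i|; pair_payoffs[(i, j)] is the |S_i| x |S_j| matrix A_ij
    of payoffs to player i against player j (a hypothetical input format).
    Blocks without an entry, including all diagonal blocks, stay zero.
    """
    offsets = [sum(sizes[:i]) for i in range(len(sizes))]
    M = sum(sizes)
    A = [[0] * M for _ in range(M)]
    for (i, j), block in pair_payoffs.items():
        if i == j:
            raise ValueError("a player does not play against itself")
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                A[offsets[i] + r][offsets[j] + c] = value
    return A


def transpose(matrix):
    """Transpose of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Three players with 2 strategies each; only players 0 and 1 interact here.
sizes = [2, 2, 2]
pair_payoffs = {
    (0, 1): [[1, 0], [0, 1]],  # A_01: payoffs to player 0 against player 1
    (1, 0): [[0, 1], [1, 0]],  # A_10: payoffs to player 1 against player 0
}
A = polymatrix_to_bilinear(sizes, pair_payoffs)
B = transpose(A)
for row in A:
    print(row)
```

With $y = x$, the product $x^T A y$ sums $x_i^T A_{ij} x_j$ over all pairs, so maximizing it over $X$ decomposes into each player i maximizing its own total payoff.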

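Best Response (Koller et al.)

Fix a strategy y of player 2. Player 1 solves the LP on the left; its dual is on the right.

max: $x^T (Ay)$                  min: $e^T p$
s.t. $Ex = e$, $x \ge 0$         s.t. $p^T E \ge (Ay)^T$

Here $A_i$ is row i of A, and $E^i$, $B^j$, $F^j$ are columns of E, B, F.

At optimal: $\exists p$ s.t. $A_i y \le p^T E^i$ for all i, and $x_i > 0 \Rightarrow A_i y = p^T E^i$.

Given $x \in X$, for player 2 we similarly get, at optimal: $\exists q$ s.t. $x^T B^j \le q^T F^j$ for all j, and $y_j > 0 \Rightarrow x^T B^j = q^T F^j$.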
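In the bimatrix special case, where X is the simplex (E is a single all-ones row and e = 1), the dual variable p is a scalar and the optimality conditions reduce to "every strategy played with positive probability earns the maximum payoff p". A small sketch of that special case on Rock-Paper-Scissor follows; the function names are illustrative only.

```python
from fractions import Fraction

A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def row_payoffs(A, y):
    """Vector Ay: entry i is A_i y, the payoff of pure strategy i against y."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]


def dual_value(A, y):
    """Optimal dual p on the simplex: the smallest p with p >= A_i y for all i."""
    return max(row_payoffs(A, y))


def is_best_response(A, x, y):
    """Complementary slackness: x_i > 0 implies A_i y = p."""
    p = dual_value(A, y)
    return all(xi == 0 or payoff == p
               for xi, payoff in zip(x, row_payoffs(A, y)))


half = Fraction(1, 2)
y = [half, half, 0]                       # player 2 mixes R and P equally
print(dual_value(A, y))                   # best achievable payoff against y
print(is_best_response(A, [0, 1, 0], y))  # pure P
print(is_best_response(A, [Fraction(1, 3)] * 3, y))  # uniform mix
```

Against this y, only P attains the maximum, so pure P passes the check while the uniform mix, which puts weight on R and C, does not.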

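Best Response Polytopes (BRPs)

$P = \{(y, p) \mid A_i y \le p^T E^i\ \forall i;\ y_j \ge 0\ \forall j;\ Fy = f\}$
$Q = \{(x, q) \mid x_i \ge 0\ \forall i;\ x^T B^j \le q^T F^j\ \forall j;\ Ex = e\}$

$(x, y)$ is a NE iff
$\exists p$: $Ay \le E^T p$; $x_i > 0 \Rightarrow A_i y = p^T E^i$
$\exists q$: $x^T B \le q^T F$; $y_j > 0 \Rightarrow x^T B^j = q^T F^j$

On $P \times Q$: $x^T(Ay - E^T p) \le 0$ and $(x^T B - q^T F)y \le 0$, hence $x^T(A+B)y - e^T p - f^T q \le 0$.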
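The step from the two inequalities to their sum uses only $Ex = e$ and $Fy = f$. Written out, with the last term being $f^T q$ because $q^T F y = q^T f$:

```latex
\begin{align*}
x^T (Ay - E^T p) &= x^T A y - (Ex)^T p = x^T A y - e^T p \le 0, \\
(x^T B - q^T F)\, y &= x^T B y - q^T (Fy) = x^T B y - f^T q \le 0, \\
x^T (A + B)\, y - e^T p - f^T q &\le 0 \quad \text{(sum of the two lines)}.
\end{align*}
```

Each of the first two lines is non-positive because it is the inner product of a non-negative vector ($x$ or $y$) with a non-positive one ($Ay - E^T p$ or $x^T B - q^T F$).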

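Nash Equilibrium in BRPs

$P = \{(y, p) \mid A_i y \le p^T E^i\ \forall i;\ y_j \ge 0\ \forall j;\ Fy = f\}$
$Q = \{(x, q) \mid x_i \ge 0\ \forall i;\ x^T B^j \le q^T F^j\ \forall j;\ Ex = e\}$

NE iff $x^T(Ay - E^T p) = 0$ and $(x^T B - q^T F)y = 0$, that is, $x^T(A+B)y - e^T p - f^T q = 0$.

Assumption: P and Q are non-degenerate. $(u, v)$ of $P \times Q$ gives a NE $\Rightarrow$ $(u, v)$ is a vertex.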

QP Formulation

max: $x^T(A+B)y - e^T p - f^T q$
s.t. $(y, p) \in P$, $(x, q) \in Q$

Optimal value 0. Only vertex solutions.

Our Results

Rank-1 games: rank(A+B) = 1. Extend the Adsul et al. algorithm for exact NE.

Fixed rank games: rank(A+B) = k. Extend the FPTAS of Kannan et al.

Rank of A or B is constant: Enumerate all NE in polynomial time.

Rank-1 Case

Zero-sum ~ rank(A+B) = 0: LP formulation (Charnes’53).
If rank(A+B) = 1, then $A + B = a \cdot b^T$ for some $a \in \mathbb{R}^m$, $b \in \mathbb{R}^n$.

The QP formulation:
max: $(x^T a)(b^T y) - e^T p - f^T q$
s.t. $(y, p) \in P$, $(x, q) \in Q$

Rank-1 Case

Replace $(x^T a)$ by $z$. Recall $B = -A + a \cdot b^T$.

$x^T(A+B)y - e^T p - f^T q = 0$ becomes $z(b^T y) - e^T p - f^T q = 0$.

$Q = \{(x, q) \mid x_i \ge 0;\ x^T B^j \le q^T F^j;\ Ex = e\}$
$Q' = \{(x, z, q) \mid x_i \ge 0;\ x^T(-A^j) + z b_j \le q^T F^j;\ Ex = e\}$

N = Points of $P \times Q'$ with $z(b^T y) - e^T p - f^T q = 0$. Forms paths and cycles, since z gives one degree of freedom.

NE of (A,B): Points in intersection of N and $z - x^T a = 0$.
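The constraint in $Q'$ and the objective with $z$ both come from substituting $B = -A + a \cdot b^T$ and $z = x^T a$. A short derivation, using only symbols from the slides:

```latex
\begin{align*}
x^T B^j &= x^T\!\left(-A^j + a\, b_j\right) = -x^T A^j + (x^T a)\, b_j = x^T(-A^j) + z\, b_j, \\
x^T (A + B)\, y &= x^T a\, b^T y = (x^T a)(b^T y) = z\,(b^T y).
\end{align*}
```

So a point of $P \times Q'$ corresponds to a point of $P \times Q$ exactly when it also satisfies $z = x^T a$, which is why NE are the points of N on the hyperplane $z - x^T a = 0$.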

Parameterized LP

LP(z): max: $z(b^T y) - e^T p - f^T q$
s.t. $(y, p) \in P$, $(x, z, q) \in Q'$, with $z$ held fixed.

Given any c, the optimal value of LP(c) is 0.
OPT(c) lies on N; letting N(c) = {points of N with z = c}, OPT(c) = N(c).
N is a single path on which z is monotonic.

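Rank-1: The Algorithm

NE: Intersection of N and H: $z - x^T a = 0$.
$c_1 = a_{\min}$, $c_2 = a_{\max}$, where $a_{\min} = \min_{x \in X} a^T x$ and $a_{\max} = \max_{x \in X} a^T x$.

[Figure: the path N from N(c1) to N(c2) crossing the hyperplane H, with half-spaces $H^-$ and $H^+$; the crossing point is a NE]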

Rank-1: Binary Search Algorithm

NE of (A,B): Points in intersection of N and H.
$c = (c_1 + c_2)/2$. If N(c) is in $H^-$, then $c_1 = c$, else $c_2 = c$.

[Figure: the path N crossing the hyperplane H between $H^-$ and $H^+$, with N(c1), N(c) and N(c2) marked; the crossing point is a NE]

Analysis

Terminates because:
z is monotonic on N.
The increase in z on each edge is lower bounded by 1/d, where d is polynomial sized in the input.

Time complexity: Solve LP(c) to get N(c) in each pivot. $\log(d) \cdot \log(a_{\max} - a_{\min})$ pivots.

Conclusions

Bilinear games:
Bimatrix with polytopal strategy sets.
Fairly general. Contains polymatrix, Bayesian, etc.
Polynomial time algorithms for rank based subclasses.

Open problems:
Designing a Lemke-Howson type algorithm.
Degree, index, stability concepts.
Computation of approximate equilibrium.

Thank You
