Nesterov’s excessive gap technique and poker

Andrew Gilpin

CMU Theory Lunch

Feb 28, 2007

Joint work with: Samid Hoda, Javier Peña, Troels Sørensen, Tuomas Sandholm

Outline

• Two-person zero-sum sequential games

• First-order methods for convex optimization

• Nesterov’s excessive gap technique (EGT)

• EGT for sequential games

• Heuristics for EGT

• Application to Texas Hold’em poker

We want to solve:

$\min_{x \in Q_1} \max_{y \in Q_2} x^\top A y = \max_{y \in Q_2} \min_{x \in Q_1} x^\top A y$

If Q1 and Q2 are simplices, this is the Nash equilibrium problem for two-person zero-sum matrix games

If Q1 and Q2 are complexes, this is the Nash equilibrium problem for two-person zero-sum sequential games
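In the simplex case both sides of the saddle-point problem are easy to evaluate, because a linear function over a simplex attains its extremes at a vertex. A minimal Python sketch, assuming the bilinear objective x^T A y with x minimizing and y maximizing (the orientation is an assumption, as are the function names); the matrix is matching pennies, chosen only for illustration:

```python
def upper_value(A, x):
    """max over the simplex of x^T A y: the y-player's best pure response to x."""
    rows, cols = len(A), len(A[0])
    return max(sum(x[i] * A[i][j] for i in range(rows)) for j in range(cols))

def lower_value(A, y):
    """min over the simplex of x^T A y: the x-player's best pure response to y."""
    return min(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A)))

def duality_gap(A, x, y):
    """Upper value minus lower value; zero exactly at a Nash equilibrium."""
    return upper_value(A, x) - lower_value(A, y)

A = [[1.0, -1.0], [-1.0, 1.0]]          # matching pennies (illustrative)
uniform = [0.5, 0.5]
gap_at_equilibrium = duality_gap(A, uniform, uniform)
gap_at_pure = duality_gap(A, [1.0, 0.0], [1.0, 0.0])
```

The gap is never negative, and it is the quantity a first-order method drives toward zero.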

What’s a complex?

It’s just like a simplex, but more complex.

Each player’s complex encodes her set of realization plans in the game

In particular, player 1’s complex is

$Q_1 = \{ x \ge 0 : Ex = e \}$

where E and e depend on the game…

[Figure: example with labels A through H]
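To make the linear description concrete, here is a small Python sketch assuming the usual sequence-form constraints, Q1 = {x ≥ 0 : Ex = e}. The two-decision player, the matrix E, and the helper names are hypothetical and are not the example from the slide:

```python
# Hypothetical player: at the root she picks L or R; after L she picks l or r.
# Sequences, in order: empty, L, R, Ll, Lr.
E = [
    [ 1,  0, 0, 0, 0],   # the empty sequence has weight 1
    [-1,  1, 1, 0, 0],   # x_L + x_R = x_empty
    [ 0, -1, 0, 1, 1],   # x_Ll + x_Lr = x_L
]
e = [1, 0, 0]

def in_complex(x, E, e, tol=1e-9):
    """True if x >= 0 and E x = e, i.e. x is a realization plan."""
    if any(xi < -tol for xi in x):
        return False
    return all(abs(sum(r * xi for r, xi in zip(row, x)) - b) <= tol
               for row, b in zip(E, e))

def from_behavioral(p_L, p_l):
    """Realization plan for: play L with prob. p_L, then l with prob. p_l."""
    return [1.0, p_L, 1.0 - p_L, p_L * p_l, p_L * (1.0 - p_l)]

plan = from_behavioral(0.6, 0.5)
```

Each row of E after the first says that the weights of the actions at one decision point sum to the weight of the sequence leading there, which is why the complex looks like a tree of nested simplices.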

Recall our problem:

$\min_{x \in Q_1} \max_{y \in Q_2} x^\top A y$

where Q1 and Q2 are complexes

Since Q1 and Q2 have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale

(Un)scalability of LP solvers

• Rhode Island poker [Shi & Littman 01]
– LP has 91 million rows and columns
– Applying GameShrink automated abstraction algorithm yields an LP with only 1.2 million rows and columns, and 50 million non-zeros [G. & Sandholm, 06a]
– Solution requires 25 GB RAM and over a week of CPU time

• Texas Hold’em poker
– ~10^18 nodes in game tree
– Lossy abstractions need to be performed
– Limitations of current solver technology are the primary obstacle to achieving expert-level strategies [G. & Sandholm 06b, 07a]

• Instead of standard LP solvers, what about a first-order method?

Convex optimization

Suppose we want to solve

$\min_{x \in \mathbb{R}^n} f(x)$

where f is convex.

For general f, convergence requires O(1/ε²) iterations (e.g., for subgradient methods)

For smooth, strongly convex f with Lipschitz-continuous gradient, can be done in O(1/√ε) iterations

Note that this formulation captures ALL convex optimization problems (can model feasible space using an indicator function)

Analysis based on black-box oracle access model. Can we do better by looking inside the box?

Strong convexity

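A function $d$ is strongly convex if there exists $\sigma > 0$ such that

$d(\alpha x + (1-\alpha) y) \le \alpha d(x) + (1-\alpha) d(y) - \tfrac{1}{2} \sigma \alpha (1-\alpha) \|x - y\|^2$

for all $x, y$ in the domain of $d$ and all $\alpha \in [0, 1]$

$\sigma$ is the strong convexity parameter of d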
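A quick worked example, assuming the standard definition (d is strongly convex with parameter σ if d(αx + (1−α)y) ≤ αd(x) + (1−α)d(y) − ½σα(1−α)‖x−y‖² for all x, y and α in [0,1]): the squared Euclidean norm satisfies it with equality for σ = 1.

```latex
\begin{aligned}
d(x) &= \tfrac{1}{2}\|x\|^2, \\
\tfrac{1}{2}\|\alpha x + (1-\alpha) y\|^2
  &= \tfrac{1}{2}\alpha^2\|x\|^2 + \alpha(1-\alpha)\, x^\top y
     + \tfrac{1}{2}(1-\alpha)^2\|y\|^2, \\
\alpha\, d(x) + (1-\alpha)\, d(y) - \tfrac{1}{2}\alpha(1-\alpha)\|x-y\|^2
  &= \tfrac{1}{2}\alpha^2\|x\|^2 + \alpha(1-\alpha)\, x^\top y
     + \tfrac{1}{2}(1-\alpha)^2\|y\|^2 .
\end{aligned}
```

Both sides agree term by term, so σ = 1 is the largest parameter that works for this d.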

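Recall our problem:

$\min_{x \in Q_1} \max_{y \in Q_2} x^\top A y$

where Q1 and Q2 are complexes

Equivalently:

$\min_{x \in Q_1} \Phi(x) = \max_{y \in Q_2} f(y)$

where

$\Phi(x) = \max_{y \in Q_2} x^\top A y$

and

$f(y) = \min_{x \in Q_1} x^\top A y$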

Unfortunately, Φ and f are non-smooth

Fortunately, they have a special structure

Let d1,d2 be smooth and strongly convex on Q1,Q2

These are called prox-functions

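Now let μ > 0 and consider:

$f_\mu(y) = \min_{x \in Q_1} \{ x^\top A y + \mu d_1(x) \}$

$\Phi_\mu(x) = \max_{y \in Q_2} \{ x^\top A y - \mu d_2(y) \}$

These are well-defined smooth functions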
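For intuition, the smoothed max has a closed form on a simplex. The sketch below assumes the smoothing Φμ(x) = max over y of {x^T A y − μ d2(y)} and takes d2 to be the entropy function ln n + Σ y_j ln y_j; both choices, and the function name, are assumptions made for illustration. The vector c stands in for A^T x.

```python
import math

def smoothed_max(c, mu):
    """max over the simplex of c.y - mu * d(y), with d(y) = ln n + sum_j y_j ln y_j.

    The closed form is mu * ln((1/n) * sum_j exp(c_j / mu)); the largest entry
    is factored out before exponentiating to avoid overflow.
    """
    n = len(c)
    top = max(c)
    total = sum(math.exp((cj - top) / mu) for cj in c)
    return top + mu * math.log(total / n)

c = [1.0, -1.0, 0.25]                      # illustrative stand-in for A^T x
mus = (1.0, 0.1, 0.01)
values = [smoothed_max(c, mu) for mu in mus]
```

The smoothed value always lies between max(c) − μ ln n and max(c), so it approaches the non-smooth max as μ shrinks.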

Excessive gap condition

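From weak duality, we have that f(y) ≤ Φ(x)

The excessive gap condition requires the reverse ordering for the smoothed functions:

fμ(y) ≥ Φμ(x) (EGC)

Since smoothing raises f by at most μ·D1 and lowers Φ by at most μ·D2, where Di is the maximum of di over Qi, any pair satisfying (EGC) has duality gap Φ(x) − f(y) ≤ μ(D1 + D2)

The algorithm maintains (EGC), and gradually decreases μ

As μ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions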

Nesterov’s main theorem

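Theorem [Nesterov 05]: There exists an algorithm such that after at most N iterations, the iterates have duality gap at most

$\frac{4 \|A\|}{N+1} \sqrt{\frac{D_1 D_2}{\sigma_1 \sigma_2}}$

where $D_i = \max_{Q_i} d_i$ and $\sigma_i$ is the strong convexity parameter of $d_i$.

Furthermore, each iteration only requires solving three problems of the form

$\max_{x \in Q_i} \{ g^\top x - d_i(x) \}$

and performing three matrix-vector product operations on A.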

Nice prox functions

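A prox function d for Q is nice if:

1. It is strongly convex and continuous everywhere in Q, and differentiable in the relative interior of Q

2. The min of d over Q is 0

3. The following maps are easily computable:

$\mathrm{smax}(g) = \max_{x \in Q} \{ g^\top x - d(x) \}$, $\mathrm{sargmax}(g) = \arg\max_{x \in Q} \{ g^\top x - d(x) \}$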

Nice simplex prox function 1: Entropy
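A minimal Python sketch of the entropy case, assuming the prox function d(x) = ln n + Σ x_i ln x_i (the usual entropy form, taken as an assumption here) and the maps smax(g) = max over the simplex of {g·x − d(x)} and sargmax(g) = its maximizer. Under that assumption sargmax is a softmax and smax is a shifted log-sum-exp. Function names are illustrative.

```python
import math

def entropy_prox(x):
    """d(x) = ln n + sum_i x_i ln x_i, with 0 ln 0 taken as 0."""
    return math.log(len(x)) + sum(xi * math.log(xi) for xi in x if xi > 0.0)

def entropy_sargmax(g):
    """argmax over the simplex of g.x - d(x): the softmax of g."""
    top = max(g)                            # shift for numerical stability
    w = [math.exp(gi - top) for gi in g]
    total = sum(w)
    return [wi / total for wi in w]

def entropy_smax(g):
    """max over the simplex of g.x - d(x) = ln((1/n) * sum_i exp(g_i))."""
    top = max(g)
    return top + math.log(sum(math.exp(gi - top) for gi in g) / len(g))
```

Both maps cost O(n), and d is zero at the uniform distribution, which matches the requirement that the min of d over Q is 0.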

Nice simplex prox function 2: Euclidean

sargmax can be computed in O(n log n) time
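The O(n log n) cost comes from a sort. A sketch, assuming the Euclidean prox function d(x) = ½ Σ (x_i − 1/n)² (an assumption, as are the function names): under it, sargmax(g) is the Euclidean projection of g + (1/n)·1 onto the simplex, which the standard sort-and-threshold procedure computes.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x >= 0, sum x = 1}; the sort dominates."""
    s = sorted(v, reverse=True)
    running = 0.0
    theta = 0.0
    for k, sk in enumerate(s, start=1):
        running += sk
        t = (running - 1.0) / k
        if sk - t > 0.0:        # holds for a prefix of the sorted entries
            theta = t           # keep the threshold from the longest such prefix
    return [max(vi - theta, 0.0) for vi in v]

def euclidean_sargmax(g):
    """argmax over the simplex of g.x - 0.5 * sum_i (x_i - 1/n)^2."""
    n = len(g)
    return project_to_simplex([gi + 1.0 / n for gi in g])
```

Unlike the entropy map, this one can return points on the boundary of the simplex, with some coordinates exactly zero.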

From the simplex to the complex

Theorem [Hoda, G., Peña 06]: A nice prox function can be constructed for the complex via a recursive application of any nice prox function for the simplex

Prox function example

Take any nice simplex prox function. The prox function for this matrix is:

Solving

(similar to b(i-vii))

Heuristics [G., Hoda, Peña, Sandholm 07]

• Heuristic 1: Aggressive μ reduction
– The μ given in the previous algorithm is a conservative choice guaranteeing convergence
– In practice, we can do much better by aggressively pushing μ, while checking that the excessive gap condition is satisfied

• Heuristic 2: Balanced μ reduction
– To prevent one μ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another

Matrix-vector multiplication in poker [G., Hoda, Peña, Sandholm 07]

• The main time and space bottleneck of the algorithm is the matrix-vector product on A

• Instead of storing the entire matrix, we can represent it as a composition of Kronecker products

• We can also effectively take advantage of parallelization in the matrix-vector product to achieve near-linear speedup
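The saving from a Kronecker representation rests on the identity (B ⊗ C) x = vec(B X C^T), where X is x reshaped row-major into a matrix. The Python sketch below illustrates only that identity; the actual decomposition of the poker payoff matrix is not shown on the slide, and the helper names are hypothetical.

```python
def matmul(P, Q):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def kron_matvec(B, C, x):
    """Compute (B kron C) x without forming the Kronecker product."""
    p, q = len(B[0]), len(C[0])
    X = [x[i * q:(i + 1) * q] for i in range(p)]   # reshape x to p-by-q, row-major
    Y = matmul(matmul(B, X), transpose(C))         # Y = B X C^T
    return [val for row in Y for val in row]       # flatten back, row-major

def kron_dense(B, C):
    """Explicit Kronecker product, used here only to check the result."""
    return [[b * c for b in brow for c in crow] for brow in B for crow in C]
```

Only B and C are stored, so memory grows with the sizes of the factors rather than with their product.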

Memory usage comparison

Instance   CPLEX IPM   CPLEX Simplex   EGT
10k        0.082 GB    >0.051 GB       0.012 GB
160k       2.25 GB     >0.664 GB       0.035 GB
RI         25.2 GB     >3.45 GB        0.15 GB
Texas      >458 GB     >458 GB         2.49 GB

Poker

• Poker is a recognized challenge problem in AI because (among other reasons)
– the other players’ cards are hidden;
– bluffing and other deceptive strategies are needed in a good player;
– there is uncertainty about future events.

• Texas Hold’em: most popular variant of poker

• Two-player game tree has ~10^18 nodes

Potential-aware automated abstraction [G., Sandholm, Sørensen 07]

• Most prior automated abstraction algorithms employ a myopic expected value computation as a similarity metric
– This ignores hands like flush draws where although the probability of winning is small, the payoff could be high

• Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game

• This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential

Solving the four-round model

• Computed abstraction with
– 20 first-round buckets
– 800 second-round buckets
– 4800 third-round buckets
– 28800 fourth-round buckets

• Algorithm using 30 GB RAM
– Simply representing as an LP requires 32 TB
– Outputs new, improved solution every 2.5 days

[G., Sandholm, Sørensen 07]

Future research

• Customizing second-order methods (e.g., interior-point methods) for the equilibrium problem

• Additional heuristics for improving practical performance of EGT algorithm

• Techniques for finding an optimal solution from an ε-solution

Thank you ☺