The State of Solving Large Incomplete-Information Games, and Application to Poker

Tuomas Sandholm

Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

Game-theoretic solution concepts prescribe how rational parties should act, but to become operational the concepts need to be accompanied by algorithms. I will review the state of solving incomplete-information games. They encompass many practical problems such as auctions, negotiations, and security applications. I will discuss them in the context of how they have transformed computer poker. In short, game-theoretic reasoning now scales to many large problems, outperforms the alternatives on those problems, and in some games beats the best humans.
Game-theoretic solution concepts prescribe how rational parties should act in multiagent settings. This is nontrivial because an agent's utility-maximizing strategy generally depends on the other agents' strategies. The most famous solution concept for this is a Nash equilibrium: a strategy profile (one strategy for each agent) where no agent has incentive to deviate from her strategy given that others do not deviate from theirs.

In this article I will focus on incomplete-information games, that is, games where the agents do not entirely know the state of the game at all times. The usual way to model them is a game tree where the nodes (that is, states) are further grouped into information sets. In an information set, the player whose turn it is to move cannot distinguish between the states in the information set, but knows that the actual state is one of them. Incomplete-information games encompass most games of practical importance, including most negotiations, auctions, and many applications in information security and physical battle.

Such games are strategically challenging. A player has to reason about what others' actions signal about their knowledge. Conversely, the player has to be careful about not signaling too much about her own knowledge to others through her actions. Such games cannot be solved using methods for complete-information games like checkers, chess, or Go. Instead, I will review new game-independent algorithms for solving them.

Poker has emerged as a standard benchmark in this space (Shi and Littman 2002; Billings et al. 2002) for a number of reasons: (1) it exhibits the richness of reasoning about a probabilistic future, how to interpret others' actions as signals, and information hiding through careful action selection; (2) the game is unambiguously specified; (3) the game can be scaled to the desired complexity; (4) humans of a broad range of skill exist for comparison; (5) the game is fun; and (6) computers find interesting strategies automatically. For example, time-tested behaviors such as bluffing and slow play arise from the game-theoretic algorithms automatically rather than having to be explicitly programmed.



Kuhn poker, a game with three cards, was among the first applications discussed in game theory, and it was solved analytically by hand (Kuhn 1950). On large-scale poker games, the best computerized strategies for a long time were rule based. Nowadays, the best poker-playing programs are generated automatically using algorithms that are based on game-theoretic principles.

There has been tremendous progress on equilibrium-finding algorithms since 2005. Two-player zero-sum game trees with 10^12 leaves can now be solved near optimally. However, many real games are even larger. For example, two-player Limit Texas Hold'em poker has 10^18 leaves. For such large games, abstraction algorithms have emerged as practical preprocessors.

Most competitive poker-playing programs are nowadays generated using an abstraction algorithm followed by a custom equilibrium-finding algorithm to solve the abstracted game (see figure 1). This paradigm was first used in Gilpin, Sandholm, and Sørensen (2007). Predecessors of the paradigm included handcrafting small abstractions (Billings et al. 2003), as well as solving automatically generated abstractions with general-purpose linear programming algorithms (Gilpin and Sandholm 2006; 2007a; 2007b).

In this article I will discuss abstraction algorithms first and equilibrium-finding algorithms second. After that I will address opponent exploitation and other topics.

Abstraction Algorithms

Abstraction algorithms take as input a description of the game and output a smaller but strategically similar (or even equivalent) game. The abstraction algorithms discussed here work with any finite number of players and do not assume a zero-sum game.

Information Abstraction

The most popular kind of abstraction is information abstraction. The game is abstracted so that the agents cannot distinguish some of the states that they can distinguish in the actual game. For example, in an abstracted poker hand, an agent is not able to observe all the nuances of the cards that she would normally observe.

Lossless Information Abstraction. It turns out that it is possible to do lossless information abstraction, which may seem like an oxymoron at first. The method I will describe (Gilpin and Sandholm 2007b) is for a class of games that we call games with ordered signals. It is structured, but still general enough to capture a wide range of strategic situations. A game with ordered signals consists of a finite number of rounds. Within a round, the players play a game on a directed tree (the tree can be different in different rounds). The only uncertainty players face stems from private signals the other players have received and from the unknown future signals.


[Figure 1. Current Paradigm for Solving Large Incomplete-Information Games. An abstraction algorithm maps the original game to a smaller abstracted game; a custom algorithm finds a Nash equilibrium of the abstracted game; and a reverse model maps that equilibrium back to the original game.]


In other words, players observe each other's actions, but potentially not nature's actions. In each round, there can be public signals (announced to all players) and private signals (confidentially communicated to individual players). We assume that the legal actions that a player has are independent of the signals received. For example, in poker, the legal betting actions are independent of the cards received. Finally, the strongest assumption is that there is a total ordering among complete sets of signals, and the payoffs are increasing (not necessarily strictly) in this ordering. In poker, this ordering corresponds to the ranking of card hands.

The abstraction algorithm operates on the signal tree, which is the game tree with all the agents' action edges removed. We say that two sibling nodes in the signal tree are ordered game isomorphic if (1) when the nodes are leaves, the payoff vectors of the players (which payoff in the vector materializes depends on how the agents play) are the same at both nodes, and (2) when the nodes are interior nodes, there is a bipartite matching of the nodes' children so that only ordered game isomorphic children get matched.

The GameShrink algorithm is a bottom-up dynamic program that merges all ordered game isomorphic nodes. It runs in Õ(n²) time, where n is the number of nodes in the signal tree. GameShrink tends to run in sublinear time and space in the size of the game tree because the signal tree is significantly smaller than the game tree in most nontrivial games. The beautiful aspect of this method is that it is lossless (theorem 1). A small example run of GameShrink is shown in figure 2.

We applied GameShrink to Rhode Island Hold'em poker (Gilpin and Sandholm 2007b). That two-player game was invented as a testbed for computational game playing (Shi and Littman 2002). Applying the sequence form to Rhode Island Hold'em directly without abstraction yields a linear program (LP) with 91,224,226 rows, and the same number of columns. This is much too large for (current) linear programming algorithms to handle. We used GameShrink to shrink this, yielding an LP with 1,237,238 rows and columns, with 50,428,638 nonzero coefficients. We then applied iterated elimination of dominated strategies, which further reduced this to 1,190,443 rows and 1,181,084 columns. GameShrink required less than one second to run. Then, using a 1.65 GHz IBM eServer p5 570 with 64 gigabytes of RAM (the LP solver actually needed 25 gigabytes), we solved the resulting LP in 8 days using the interior-point barrier method of CPLEX version 9.1.2. In summary, we found an exact solution to a game with 3.1 billion nodes in the game tree (the largest incomplete-information game that had been solved previously had 140,000 nodes (Koller and Pfeffer 1997)). To my knowledge, this is still the largest incomplete-information game that has been solved exactly.1

Lossy Information Abstraction. Some games are so large that even after applying the kind of lossless abstraction described above, the resulting LP would be too large to solve. To address this problem, such games can be abstracted more aggressively, but this incurs loss in solution quality.

One approach is to use a lossy version of GameShrink where siblings are considered ordered game isomorphic if their children can be approximately matched in the bipartite matching part of the algorithm (Gilpin and Sandholm 2007b; 2006). However, lossy GameShrink suffers from three drawbacks.

First, the resulting abstraction can be highly inaccurate because the grouping of states into buckets is, in a sense, greedy. For example, if lossy GameShrink determines that hand A is similar to hand B, and then determines that hand B is similar to hand C, it will group A and C together, despite the fact that A and C may not be very similar. The quality of the abstraction can be even worse when a longer sequence of such comparisons leads to grouping together extremely different hands. Stated differently, the greedy aspect of the algorithm leads to lopsided buckets where large buckets are likely to attract even more states into the bucket.

Second, one cannot directly specify how many buckets lossy GameShrink should yield (overall or at any specific betting round). Rather, there is a parameter (for each round) that specifies a threshold of how different states can be and still be considered the same.


Theorem 1 (Gilpin and Sandholm 2007b). Any Nash equilibrium of the shrunken game corresponds to a Nash equilibrium of the original game.


If one knows how large an LP can be solved, one cannot create an LP of that size by specifying the number of buckets directly; rather one must use trial and error (or some variant of binary search applied to the setting of multiple parameters) to pick the similarity thresholds (one for each round) in a way that yields an LP of roughly the desired size.

The third drawback is scalability. The time needed to compute an abstraction for a three-round truncated version of two-player Limit Texas Hold'em was more than a month. Furthermore, it would have to be executed in the inner loop of the parameter-guessing algorithm of the previous paragraph.

Expectation-Based Abstraction Using Clustering and Integer Programming. In this subsection I describe a new abstraction algorithm that eliminates the above problems (Gilpin and Sandholm 2007a). It is not specific to poker, but for concreteness I will describe it in the context of two-player Texas Hold'em.

The algorithm operates on an abstraction tree, which, unlike the signal tree, looks at the signals from one player's perspective only: the abstraction is done separately (and possibly with different numbers of buckets) for different players. For Texas Hold'em poker, it is initialized as follows. The root node contains (52 choose 2) = 1326 children, one for each possible pair of hole cards that a player may be dealt in round 1. Each of these children has (50 choose 3) children, each corresponding to the possible 3-card flops that can appear in round 2. Similarly, the branching factor at the next two levels is 47 and 46, corresponding to the 1-card draws in rounds 3 and 4, respectively.
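To make the sizes concrete, a few lines of Python reproduce these branching factors (math.comb is the binomial coefficient):

import math

print(math.comb(52, 2))  # 1326 possible hole-card pairs (round 1)
print(math.comb(50, 3))  # 19,600 possible flops for each of those (round 2)
print(52 - 2 - 3)        # 47 possible turn cards (round 3)
print(52 - 2 - 3 - 1)    # 46 possible river cards (round 4)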

The algorithm takes as input the number of buckets, K_r, allowed in the abstraction in each round, r. For example, we can limit the number of buckets of hands in round 1 to K_1 = 20. Thus, we would need to group (that is, abstract) each of the (52 choose 2) = 1326 hands into 20 buckets.


[Figure 2. GameShrink Applied to a Tiny Two-Player Four-Card Poker Game. The game consists of two jacks and two kings (Gilpin and Sandholm 2007b). Next to each game tree is the range of the information filter, which shows the abstraction: first {{J1}, {J2}, {K1}, {K2}}, then {{J1,J2}, {K1}, {K2}}, and finally {{J1,J2}, {K1,K2}}. Dotted lines denote information sets, which are labeled by the controlling player. Open circles are chance nodes with the indicated transition probabilities. The root node is the chance node for player 1's card, and the next level is for player 2's card. The payment from player 2 to player 1 is given below each leaf. In this example, the algorithm reduces the game tree from 113 nodes to 39 nodes.]


We treat this as a clustering problem. To perform the clustering, we must first define a metric to determine the similarity of two hands. Letting w, l, and d be the number of possible wins, losses, and draws (based on the rollout of the remaining cards), we compute the hand's value as w + d/2, and we take the distance between two hands to be the absolute difference between their values. This gives us the necessary ingredients to apply clustering to form k buckets, for example, k-means clustering (algorithm 1).

Algorithm 1 is guaranteed to converge, but it may find a local optimum. Therefore, in our implementation we run it several times with different starting points to try to find a global optimum. For a given clustering, we can compute the error (according to the value measure) that we would expect to have when using the abstraction.

For the later rounds we again want to determine the buckets. Here we face the additional problem of determining how many children each parent in the abstraction tree can have. For example, we can put a limit of K_2 = 800 on the number of buckets in round 2. How should the right to have 800 children (buckets that have not yet been generated at this stage) be divided among the 20 parents? We model and solve this problem as a 0-1 integer program (Nemhauser and Wolsey 1999) as follows. Our objective is to minimize the expected error in the abstraction. Thus, for each of the 20 parent nodes, we run the k-means algorithm presented above for values of k between 1 and the largest number of children we might want any parent to have, MAX. We denote the expected error when node i has k children by c_{i,k}. We denote by p_i the probability of getting dealt a hand that is in abstraction class i (that is, in parent i); this is simply the number of hands in i divided by (52 choose 2). Based on these computations, the following 0-1 integer program finds the abstraction that minimizes the overall expected error for the second level:

$$\min \sum_{i=1}^{K_1} p_i \sum_{k=1}^{MAX} c_{i,k}\, x_{i,k}$$

$$\text{s.t.} \quad \sum_{i=1}^{K_1} \sum_{k=1}^{MAX} k\, x_{i,k} \le K_2, \qquad \sum_{k=1}^{MAX} x_{i,k} = 1 \;\; \forall i, \qquad x_{i,k} \in \{0,1\}$$

The decision variable x_{i,k} is set to 1 if and only if node i has k children. The first constraint ensures that the limit on the overall number of children is not exceeded. The second constraint ensures that a decision is made for each node. This problem is a generalized knapsack problem, and although NP-complete, it can be solved efficiently using off-the-shelf integer programming solvers (for example, CPLEX solves this problem in less than one second at the root node of the branch-and-bound search tree).
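The allocation problem is small enough for any off-the-shelf solver. The sketch below expresses it with the open-source PuLP modeling library (my choice for illustration; the authors used CPLEX), given a cost table c[i][k] of expected k-means errors and the probabilities p[i]:

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

def allocate_buckets(p, c, K2, MAX):
    # p[i]: probability of being dealt a hand in parent bucket i
    # c[i][k]: expected k-means error if parent i is given k children
    n = len(p)  # n = K1 parents
    prob = LpProblem("bucket_allocation", LpMinimize)
    x = {(i, k): LpVariable(f"x_{i}_{k}", cat="Binary")
         for i in range(n) for k in range(1, MAX + 1)}
    # objective: overall expected abstraction error
    prob += lpSum(p[i] * c[i][k] * x[i, k]
                  for i in range(n) for k in range(1, MAX + 1))
    # the total number of children may not exceed K2
    prob += lpSum(k * x[i, k]
                  for i in range(n) for k in range(1, MAX + 1)) <= K2
    # exactly one choice of k for each parent
    for i in range(n):
        prob += lpSum(x[i, k] for k in range(1, MAX + 1)) == 1
    prob.solve(PULP_CBC_CMD(msg=False))
    return {i: k for (i, k) in x if x[i, k].value() == 1}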

We repeat this procedure for round 3 (with the round-2 buckets as the parents and a different, larger limit, say K_3 = 4,800, on the maximum number of buckets). Then we compute the bucketing for round 4 analogously, for example with K_4 = 28,800.

Overall, our technique optimizes the abstraction round by round in the abstraction tree. A better abstraction (even for the same similarity metric) could conceivably be obtained by optimizing all rounds in one holistic optimization. However, that seems infeasible. First, the optimization problem would be nonlinear because the probabilities at a given level depend on the abstraction at previous levels of the tree. Second, the number of decision variables in the problem would be exponential in the size of the initial abstraction tree (which itself is large), even if the number of abstraction classes for each level is fixed.

Potential-Aware Abstraction. The expectation-based abstraction approach previously described does not take into account the potential of hands. For instance, certain poker hands are considered drawing hands in which the hand is currently weak, but has a chance of becoming very strong. Since the strength of such a hand could potentially turn out to be much different later in the game, it is generally accepted among poker experts that such a hand should be played differently than another hand with a similar chance of winning, but without as much potential (Sklansky 1999). However, if using the difference between probabilities of winning as the clustering metric, the abstraction algorithm would consider these two very different situations similar.

One possible approach to handling the problem that certain hands with the same probability of winning may have different potential would be to consider not only the expected strength of a hand, but also its variance. Although this would likely be an improvement over basic expectation-based abstraction, it fails to capture two important issues that prevail in many sequential imperfect-information games, including poker.

First, mean and variance are a lossy representation of a probability distribution, and the lost aspects of the probability distribution over hand strength can be significant for deciding how one should play in any given situation.


1. Create k centroid points in the interval between the minimum and maximum hand values.
2. Assign each hand to the nearest centroid.
3. Adjust each centroid to be the mean of their assigned hand values.
4. Repeat steps 2 and 3 until convergence.

Algorithm 1. k-means Clustering for Poker Hands.
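A minimal numpy rendering of algorithm 1 (a sketch; hand values are assumed precomputed as w + d/2 from rollouts):

import numpy as np

def kmeans_hand_values(values, k, iters=100, seed=0):
    # values: 1-D array of hand values v = w + d/2 (one per hand)
    rng = np.random.default_rng(seed)
    # step 1: k centroids in [min, max] of the hand values
    centroids = rng.uniform(values.min(), values.max(), size=k)
    assignment = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        # step 2: assign each hand to the nearest centroid
        assignment = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # step 3: move each centroid to the mean of its assigned hands
        new_centroids = np.array(
            [values[assignment == j].mean() if np.any(assignment == j)
             else centroids[j] for j in range(k)])
        # step 4: repeat until convergence
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assignment

As the text notes, in practice several random restarts are run and the clustering with the lowest expected error is kept.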


Second, the approach based on mean and variance does not take into account the different paths of information revelation that hands take in increasing or decreasing in strength. For example, two hands could have similar means and variances, but one hand may get the bulk of its uncertainty resolved in the next round, while the other hand needs two more rounds before the bulk of its final strength is determined. The former hand is better because the player has to pay less to find out the essential strength of his hand.

To address these issues, we introduced potential-aware abstraction, where we associate with each state of the game a histogram over future possible states (Gilpin, Sandholm, and Sørensen 2007); see figure 3. This representation can encode all the pertinent information from the rest of the game (such as paths of information revelation), unlike the approach based on mean and variance.

As in expectation-based abstraction, we use a clustering algorithm and an integer program for allocating children. They again require a distance metric to measure the dissimilarity between different states. The metric we now use is the L2-distance over the histograms of future states. Specifically, let S be a finite set of future states, and let each hand i be associated with a histogram, h_i, over the future states S. Then, the distance between hands i and j is

$$dist(i, j) = \sqrt{\sum_{s \in S} (h_i(s) - h_j(s))^2}.$$

Under this approach, another design dimension of the algorithm is in the construction of the possible future states. There are at least two prohibitive problems with this vanilla approach as stated. First, there are a huge number of possible reachable future states, so the dimensionality of the histograms is too large to do meaningful clustering with a reasonable number of clusters (that is, small enough to lead to an abstracted game that can be solved for equilibrium). Second, for any two states at the same level of the game, the descendant states are disjoint. Thus the histograms would have nonoverlapping supports, so any two states would have maximum dissimilarity and thus no basis for clustering.
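In code, the metric is one line; here the histograms are assumed to be numpy arrays indexed by (abstracted) future state, and it drops into the k-means sketch above as the distance function:

import numpy as np

def dist(h_i, h_j):
    # L2 distance between two hands' histograms over future states
    return np.sqrt(np.sum((h_i - h_j) ** 2))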

For both of these reasons (and for reducing memory usage and enhancing speed), we coarsen the domains of the histograms. First, instead of having histograms over individual states, we use histograms over abstracted states (buckets, that is, clusters), which contain a number of states each (and we use those buckets' centroids to conduct the similarity calculations). We will have, for each bucket, a histogram over buckets later in the game. Second, we restrict the histogram of each bucket to be over buckets at the next round only (rather than over buckets at all future rounds). However, we introduce a technique (a bottom-up pass of constructing abstractions up the tree) that allows the buckets at the next round to capture information from all later rounds.

One way of constructing the histograms would be to perform a bottom-up pass of a tree representing the possible card deals: abstracting round 4 first, creating histograms for round 3 nodes based on the round 4 clusters, then abstracting round 3, creating histograms for round 2 nodes based on the round 3 clusters, and so on. This is indeed what we do to find the abstraction for round 1.


[Figure 3. States and Transitions Used in Potential-Aware Abstraction. A state at round r−1 is described by a histogram of transition probabilities (for example, .3, .2, 0, .5) into the buckets at round r. In our potential-aware abstraction algorithm, the similarity of states is measured based on the states' transition histograms to buckets at the next round. This definition is then operationalized by conducting bottom-up passes in the abstraction tree so those later-round buckets get defined first.]


However, for later betting rounds, we improve on this by leveraging our knowledge of the fact that abstracted children of any bucket at the level above should only include states that can actually be children of the states in that bucket. We do this by multiple bottom-up passes, one for each bucket at the round above. For example, if a round-1 bucket contains only those states where the hand consists of two aces, then when we conduct abstraction for round 2, the bottom-up pass for that level-1 bucket should only consider future states where the hand contains two aces as the hole cards. This enables the abstraction algorithm to narrow the scope of analysis to information that is relevant given the abstraction that it made for earlier rounds.

In the last round there is no need to use the potential-aware techniques discussed above since the players will not receive any more information; that is, there is no potential. Instead, we simply compute the fourth-round abstraction based on each hand's probability of winning (based on different possible rollouts of the cards), using clustering and integer programming as in expectation-based abstraction.

Comparison of Expectation-Based Versus Potential-Aware Abstraction. Now, which is better, expectation-based or potential-aware abstraction? Both types of abstraction algorithms run very quickly compared to the time to even approximately solve the abstracted game, so the comparison comes down to how good the approximate equilibria are when generated from abstractions of each of the two types. It turns out that this depends on the granularity that the abstraction is allowed to have! (This in turn is dictated in practice by the speed of the equilibrium-finding algorithm that is used to approximately solve the abstracted game.) We conducted experiments on this question in the context of Rhode Island Hold'em so that we could solve the abstracted game exactly (Gilpin and Sandholm 2008a). For both types of abstraction algorithms we allowed the same number of abstract buckets at each of the three rounds of the game. We denote an abstraction granularity by the string K1-K2-K3. For example, 13-25-125 means 13 first-round buckets, 25 second-round buckets, and 125 third-round buckets. The abstraction granularities we considered range from coarse (13-25-125) to fine (13-205-1774). At this latter granularity an equilibrium-preserving abstraction exists (Gilpin and Sandholm 2007b). In the experiments we fix the first-round granularity to 13 (which allows for a lossless solution in principle due to suit isomorphisms), and vary the granularity allowed in rounds two and three. Figure 4 shows the results when the programs generated with these two abstraction methods were played against each other.


[Figure 4. Potential-Aware Versus Expectation-Based Abstraction (Gilpin and Sandholm 2008a). The plot shows the winnings of the potential-aware player against the expectation-based player, in small bets per hand, as the abstraction gets finer grained: -16.6 at granularity 13-25-125, 1.06 at 13-50-250, 6.99 at 13-100-750, 5.57 at 13-150-1250, and 0.088 at 13-205-1774. Potential-aware abstraction eventually becomes lossless; win-probability-based abstraction is as good as it gets at fine granularity but is never lossless.]


For very coarse abstractions, the expectation-based approach does better. For medium granularities, the potential-aware approach does better. For fine granularities, the potential-aware approach does better, but the difference is small. Interestingly, the expectation-based approach never yielded a lossless abstraction no matter how fine a granularity was allowed.

These conclusions hold also when (1) comparing the players against an optimal player (that is, an equilibrium strategy), or (2) comparing each player against its nemesis (that is, a strategy that exploits the player as much as possible in expectation), or (3) evaluating the abstraction based on its ability to estimate the true value of the game.

These conclusions also hold for a variant of the expectation-based algorithm that considers the square of the probability of winning, v_i = (w_i + d/2)², rather than simply the probability (Zinkevich et al. 2007). A motivation for this is that the hands with the higher probabilities of winning should be more finely abstracted than lower-value hands. This is because low-value hands are likely folded before the last round, and because it is important to know very finely how good a high-value hand one is holding if there is a betting escalation in the final round. Another suggested motivation is that this captures some of the variance, and "higher variance is preferred as it means the player eventually will be more certain about their ultimate chances of winning prior to a showdown." The version of this we experimented with is somewhat different than the original since we are using an integer program to allocate the buckets, which enables nonuniform bucketing.

One possible explanation for the crossover between expectation-based and potential-aware abstraction is that the dimensionality of the temporary states used in the bottom-up pass in the third round (which must be smaller than the number of available second-round buckets in order for the clustering to discover meaningful centroids) is insufficient for capturing the strategically relevant aspects of the game. Another hypothesis is that since the potential-aware approach is trying to learn a more complex model (in a sense, clusters of paths of states) and the expectation-based model is trying to learn a less complex model (clusters of states), the former requires a larger dimension to capture this richness.

The existence of a crossover suggests that for a given game, such as Texas Hold'em, as computers become faster and equilibrium-finding algorithms more scalable so that games with finer-grained abstractions become solvable, the potential-aware approach will become the method of choice.

Problems with Lossy Information Abstraction. In single-agent settings, lossy abstraction has the desirable property of monotonicity: as one refines the abstraction, the solution improves. There is only a weak analog of this in (even two-player zero-sum) games: if the opponent's strategy space is not abstracted lossily at all, refining our player's abstraction causes our player to become less exploitable. (Exploitability is measured by how much our player loses to its nemesis in expectation. This can be computed in reasonably sizeable games by carrying out a best-response calculation to our agent's strategy.) There are no proven guarantees on the amount of exploitability as a function of the coarseness of the lossy information abstraction used. Furthermore, sometimes the exploitability of a player increases as its abstraction or its opponent's abstraction is refined (Waugh et al. 2009a). This nonmonotonicity holds for information abstraction even if one carefully selects the least exploitable equilibrium strategy for the player in the abstracted games. The nonmonotonicity has been shown in small artificial poker variants. For Texas Hold'em (even with just two players), experience from years of the AAAI Computer Poker Competition suggests that in practice finer abstractions tend to yield better strategies. However, further research on this question is warranted.

Another problem is that current lossy abstraction algorithms do not yield lossless abstractions even if enough granularity is allowed for a lossless abstraction to exist. For the expectation-based abstraction algorithm this can already be seen in figure 4, and this problem arises in some games also with the potential-aware abstraction algorithm. One could trivially try to circumvent this problem by running lossless GameShrink first, and only running a lossy abstraction algorithm if the abstraction produced by GameShrink has a larger number of buckets than desired.

Strategy-Based Abstraction. It may turn out that abstraction is as hard a problem as equilibrium finding itself. After all, two states should fall in the same abstract bucket if the optimal action probabilities in them are (almost) the same, and determining the action probabilities is done by finding an equilibrium.

This led us to develop the strategy-based abstraction approach. It iterates between abstraction and equilibrium finding. The equilibrium finding operates on the current abstraction. Once a (near) equilibrium is found for that abstraction, we redo the abstraction using the equilibrium strategies to inform the bucketing. Then we find a near equilibrium for the new abstraction, and so on.

We have applied this approach for the AAAI Computer Poker Competition. However, definitive results have not yet been obtained on the approach because for the fine-grained abstractions used in the competition (about 10^12 leaves in the abstracted game tree), approximate equilibrium finding takes months on a shared-memory supercomputer using 96 cores.


Therefore, we have so far had time to go over the abstraction/equilibrium-finding cycle only twice. Future research should explore this approach more systematically both experimentally and theoretically.

Action Abstraction

So far in this article I have discussed information abstraction. Another way of making games easier to solve is action abstraction, where in the abstracted game the players have fewer actions available than in the original game. This is especially important in games with large or infinite action spaces, such as No-Limit poker. So far action abstraction has been done by selecting some of the actions from the original game into the abstracted game, although in principle one could generate some abstract actions that are not part of the original game. Also, so far action abstractions have been generated manually.2 Future research should also address automated action abstraction.

Action abstraction begets a fundamental problem because real opponents may select actions outside the abstract model. To address this, work has begun on studying what are good reverse mappings (figure 1), that is, how should opponents' actions that do not abide by the abstraction be interpreted in the abstracted game? One objective is to design a reverse mapping that tries to minimize the player's exploitability. Conversely, one would like to have actions in one's own abstraction that end up exploiting other players' action abstractions. These remain largely open research areas, but some experimental results already exist on the former. In No-Limit poker it tends to be better to use logarithmic rather than linear distance when measuring how close an opponent's real bet is to the bet sizes in the abstraction (Gilpin, Sandholm, and Sørensen 2008). Furthermore, a randomized reverse mapping that weights the abstract betting actions based on their distance to the opponent's real bet tends to help (Schnizlein, Bowling, and Szafron 2009); a small sketch follows. As with information abstraction, in some games refining the action abstraction can actually increase the player's exploitability (Waugh et al. 2009a).
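The sketch below illustrates both findings with hypothetical bet sizes; the exact weighting schemes in the cited papers differ:

import math
import random

def map_to_abstraction(real_bet, abstract_bets, randomized=True):
    # interpret an opponent's real bet (> 0) as one of the abstraction's
    # bet sizes, measuring distance in log space rather than linearly
    dists = [abs(math.log(real_bet) - math.log(b)) for b in abstract_bets]
    if not randomized:
        return abstract_bets[dists.index(min(dists))]   # nearest abstract bet
    # randomized mapping: weight abstract bets by their closeness to the bet
    weights = [1.0 / (d + 1e-9) for d in dists]
    return random.choices(abstract_bets, weights=weights)[0]

# e.g., with pot-fraction bet sizes {0.5, 1, 2, 4} in the abstraction:
print(map_to_abstraction(1.4, [0.5, 1.0, 2.0, 4.0], randomized=False))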

Phase-Based Abstraction, Real-Time Equilibrium Finding, and Strategy Grafting

Beyond information and action abstraction, a third form of abstraction that has been used for incomplete-information games is phase-based abstraction. The idea is to solve earlier parts (which we call phases) of the game separately from later phases of the game. This has the advantage that each part can use a finer abstraction than would be tractable if the game were solved holistically. The downside is that gluing the phases together soundly is tricky. For one, when solving a later phase separately from an earlier phase, a strategy may disclose to the opponent information about which exact later-phase version is being played (in poker, information about the private cards the player is holding). From the perspective of the later phase alone this will seem like no loss, but from the perspective of the entire game it fails to hide information as effectively as a holistically solved game.

This approach has been used to tackle two-player Texas Hold'em poker in two (or in principle more) phases. The first phase includes the early rounds. It is solved offline. To be able to solve it, one needs a model of what would happen in the later rounds, that is, what are the payoffs at the end of each path of the first phase. The first approach was to assume rollout of the cards in the later phase(s), that is, no betting and no folding (Billings et al. 2003). Better results were achieved by taking the strategies for the later phase(s) directly from strong prior poker bots, or from statistical observations of such bots playing if the bots' strategies themselves are not available (Gilpin and Sandholm 2007a). I call this bootstrapping with base strategies.

It has also been shown experimentally that having the phases overlap yields better results than having the phases be disjoint. For example, the first phase can include rounds 1, 2, and 3, while the second phase can include rounds 3 and 4 (Gilpin and Sandholm 2007a). Or the first phase can include all rounds, and the second phase a more refined abstraction of one or more of the later rounds.

The second phase can be solved in real time during the game so that a finer abstraction can be used than if all possible second-phase games would have to be solved (that is, all possible sequences of cards and actions from the rounds before the start of the second phase) (Gilpin and Sandholm 2007a). Whether the second phase is solved offline or in real time, at the beginning of the second phase, before the equilibrium finding for the second phase takes place, the players' beliefs are updated using Bayes' rule based on the cards and actions each player has observed in the first phase.

The idea of bootstrapping with base strategies has been extended to base strategies that cover the entire game, not just the end (Waugh, Bard, and Bowling 2009). The base strategies can be computed using some abstraction followed by equilibrium finding. Then, one can isolate a part of the abstracted game at a time, construct a finer-grained abstraction for that part, require that Player 1 follow his base strategy in all his information sets except those in that part (no such restriction is placed on the opponent), and solve the game anew. Then one can pick a different part and do this again, using the same base strategy.


Once all the parts (which constitute a partition of the information sets where it is Player 1's turn to move) have been finished, we have a strategy for Player 1. Then, a strategy for Player 2 is computed analogously. This grafting approach allows one to focus on one part of the game at a time with fine abstraction while having a holistic view of the game through the base strategies. In principle this can increase the player's exploitability, but in practice it improves performance. Similar approaches can be used for more than two players, but nothing has been published on that yet.

Equilibrium-Finding Algorithms for Two-Player Zero-Sum Games

So far I have discussed abstraction. I will now move to algorithms for solving the (abstracted) game. This section focuses on two-player zero-sum games. The next section covers more general games.

The most common solution concept (that is, definition of what it means to be a solution) is Nash equilibrium. A strategy for an agent defines for each information set where it is the agent's turn to move a probability distribution over the agent's actions. The two agents' strategies form a Nash equilibrium if neither agent can benefit in expectation by deviating from her strategy given that the other agent does not deviate from his.

Formally, the Nash equilibria of two-player zero-sum sequential games are the solutions to

$$\min_{x \in X} \max_{y \in Y} y^T A x = \max_{y \in Y} \min_{x \in X} y^T A x \qquad (1)$$

where X and Y are polytopes defining the players' strategies and A is the payoff matrix (Romanovskii 1962; Koller, Megiddo, and von Stengel 1996; von Stengel 1996).


[Figure 5. Progress on Algorithms for Solving Two-Player Zero-Sum Games. The vertical axis shows the number of nodes in the game tree (log scale, from 100,000 to 1,000,000,000,000); the horizontal axis spans 1994 to 2007. Milestones: Koller and Pfeffer, using sequence form and LP (simplex); Billings et al., LP (CPLEX interior-point method); Gilpin and Sandholm, LP (CPLEX interior-point method); Gilpin, Hoda, Peña, and Sandholm, scalable EGT; Gilpin, Sandholm, and Sørensen, scalable EGT; Zinkevich et al., counterfactual regret. The announcement of the AAAI Poker Competition marks the acceleration of progress.]


When the minimizer plays a strategy x ∈ X and the maximizer plays y ∈ Y, the expected utility to the maximizer is y^T A x and, since the game is zero-sum, the minimizer's expected utility is −y^T A x. Problem 1 can be expressed as a linear program (LP) whose size is linear in the size of the game tree. Thus the problem is solvable in polynomial time. Today's best general-purpose LP solvers can solve games with up to 10^7 or 10^8 leaves in the game tree (corresponding to nonzero entries in A) (Gilpin and Sandholm 2006). For example, losslessly abstracted Rhode Island Hold'em poker has a 10^6 × 10^6 payoff matrix containing 50 million nonzeros, and solving it (to near machine precision) with CPLEX's barrier method (an interior-point LP method) took a week and used 25 gigabytes of RAM (Gilpin and Sandholm 2007b). Interestingly, on these kinds of problems the barrier method does better than the simplex method.

The LP approach does not scale to most interesting games. For instance, the payoff matrix A in (1) for two-player Limit Texas Hold'em poker has dimension 10^14 × 10^14 and contains more than 10^18 nonzero entries. There has been tremendous progress in developing equilibrium-finding algorithms in the last few years, spurred in part by the AAAI Computer Poker Competition (see figure 5). These new algorithms find an ε-equilibrium, that is, strategies x ∈ X and y ∈ Y such that neither player can benefit more than ε in expectation by deviating from her strategy.
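For intuition, the following sketch solves the normal-form analogue of equation 1 with scipy's LP solver (my illustration; the sequence-form LP for a game tree has the same max-min shape, but X and Y are sequence-form polytopes rather than simplexes):

import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    # find the maximizer's strategy y over the simplex and the game value v,
    # i.e., solve max_y min_x y^T A x as an LP: max v s.t. (A^T y)_j >= v
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # variables (y, v); maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - (A^T y)_j <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum(y) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# rock-paper-scissors as a sanity check: y is uniform and the value is 0
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
y, value = solve_zero_sum(A)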

Algorithms Based on Smoothing and Gradient Descent

In this section I will describe recent custom equilibrium-finding algorithms based on smoothing and gradient descent.

Extending the Excessive Gap Technique to Sequential Games and Making It Scalable. In this section I will describe the first custom equilibrium-finding algorithm for sequential incomplete-information games (Gilpin et al. 2007; Hoda et al. 2010). It took equilibrium finding to a new level by solving poker games with 10^12 leaves in the game tree (Gilpin, Sandholm, and Sørensen 2007), and it found an ε-equilibrium with a small ε (0.027 small bets per hand in abstracted Limit Texas Hold'em). It remains one of the fastest equilibrium-finding algorithms in practice.

The algorithm is an adaptation of Nesterov's excessive gap technique (Nesterov 2005a; 2005b) to sequential games, and it also includes techniques for significantly improving scalability both in terms of time and memory usage.

Problem 1 can be stated as

$$\min_{x \in X} f(x) = \max_{y \in Y} \phi(y) \qquad (2)$$

where

$$f(x) = \max_{y \in Y} y^T A x \quad \text{and} \quad \phi(y) = \min_{x \in X} y^T A x.$$

The functions f and φ are respectively convex and concave nonsmooth functions. The left-hand side of equation 2 is a standard convex minimization problem of the form

$$\min \{h(x) : x \in X\}. \qquad (3)$$

First-order methods for solving equation 3 are algorithms for which a search direction at each iteration is obtained using only the first-order information of h, such as its gradient or subgradient. When h is nonsmooth, subgradient algorithms can be applied, but they have a worst-case complexity of O(1/ε²) iterations (Goffin 1977). However, that pessimistic result is based on treating h as a black box where the value and subgradient are accessed through an oracle. For nonsmooth functions with a suitable max structure, Nesterov devised first-order algorithms requiring only O(1/ε) iterations.

The key component of Nesterov's smoothing technique is a pair of prox-functions for the sets X and Y. These prox-functions are used to construct smooth approximations f_µ ≈ f and φ_µ ≈ φ. To obtain approximate solutions to equation 2, gradient-based algorithms can then be applied to f_µ and φ_µ.

We say that a function is a prox-function if it is strongly convex and its minimum is zero. Assume d_X and d_Y are prox-functions for the sets X and Y respectively. Then for any given µ > 0, the smooth approximations f_µ ≈ f and φ_µ ≈ φ are

$$f_\mu(x) := \max \{y^T A x - \mu d_Y(y) : y \in Y\},$$
$$\phi_\mu(y) := \min \{y^T A x + \mu d_X(x) : x \in X\}.$$

Let D_X := max{d_X(x) : x ∈ X}, and let σ_X denote the strong convexity modulus of d_X. Let D_Y and σ_Y be defined likewise for Y and d_Y.3


Theorem 2 (Nesterov 2005a; 2005b). There is a procedure (algorithm 4 presented later) based on the above smoothing technique that after N iterations generates a pair of points (x^N, y^N) ∈ X × Y such that

$$0 \le f(x^N) - \phi(y^N) \le \frac{4\|A\|}{N+1} \sqrt{\frac{D_X D_Y}{\sigma_X \sigma_Y}}.$$

Furthermore, each iteration of the procedure performs some elementary operations, three matrix-vector multiplications by A, and requires the exact solution of three subproblems of the form

$$\max_{x \in X} \{g^T x - d_X(x)\} \quad \text{or} \quad \max_{y \in Y} \{g^T y - d_Y(y)\}. \qquad (4)$$


For an algorithm based on theorem 2 to be practical, the subproblems (equation 4) must be solvable quickly. They can be phrased in terms of the conjugate of the functions d_X and d_Y. The conjugate of d : Q → ℝ is the function d* : ℝ^n → ℝ defined by

$$d^*(s) := \max \{s^T x - d(x) : x \in Q\}.$$

If d is strongly convex and Q is compact, then the conjugate d* is Lipschitz continuous, differentiable everywhere, and

$$\nabla d^*(s) = \arg\max \{s^T x - d(x) : x \in Q\}.$$

If the prox-function's conjugate and the conjugate's gradient are computable quickly (and the prox-function is continuous, strongly convex, and differentiable), we say that the prox-function is nice (Hoda et al. 2010). With nice prox-functions the overall algorithm is fast.

In normal-form (also known as bimatrix) games, the strategy of each player lives in a simplex, that is, the probabilities on her actions sum to one. For the k-dimensional simplex Δ_k, the entropy function

$$d(x) = \ln k + \sum_{i=1}^{k} x_i \ln x_i$$

and the Euclidean distance function

$$d(x) = \frac{1}{2} \sum_{i=1}^{k} (x_i - 1/k)^2$$

are nice prox-functions.

In multistep games, the strategies live in a more complex space. The common way to represent multistep games of perfect recall is the sequence form (Romanovskii 1962; Koller, Megiddo, and von Stengel 1996; von Stengel 1996). Our theorem 3 enables the use of Nesterov's excessive gap technique for these games.

Nesterov's Excessive Gap Technique (EGT). For µ_X, µ_Y > 0, consider the pair of problems:

$$f_{\mu_Y}(x) := \max \{y^T A x - \mu_Y d_Y(y) : y \in Y\},$$
$$\phi_{\mu_X}(y) := \min \{y^T A x + \mu_X d_X(x) : x \in X\}.$$

Algorithm 4 generates iterates (x^k, y^k, µ_X^k, µ_Y^k) with µ_X^k and µ_Y^k decreasing to zero and such that the following excessive gap condition is satisfied at each iteration:

$$f_{\mu_Y}(x) \le \phi_{\mu_X}(y). \qquad (5)$$

From equation 5 and the fact f(x) ≥ φ(y), we see that

$$0 \le f(x) - \phi(y) \le \mu_X D_X + \mu_Y D_Y. \qquad (6)$$

Consequently, f(x^k) ≈ φ(y^k) when µ_X^k and µ_Y^k are small.

Algorithm 2 finds a starting point that satisfies the excessive gap condition (equation 5). Algorithm 3 decreases µ_X and µ_Y while maintaining equation 5. If the input (µ_X, µ_Y, x, y) to algorithm 3 satisfies equation 5, then so does (µ_X^+, µ_Y, x^+, y^+) as long as τ satisfies τ²/(1 − τ) ≤ µ_X µ_Y σ_X σ_Y / ‖A‖² (Nesterov 2005a).

We are now ready to describe Nesterov's excessive gap technique specialized to equation 1 (see algorithm 4). By theorem 2, algorithm 4, which we will refer to as EGT, finds an ε-equilibrium in O(1/ε) iterations. Furthermore, for games with ordered signals, each iteration runs in linear time in the size of the game tree (Hoda et al. 2010).

Heuristics. EGT can be sped up by applying heuristics that decrease µ_X and µ_Y faster, while maintaining the excessive gap condition (equation 5) (Gilpin et al. 2007). This leads to faster convergence in practice without compromising any of the theoretical guarantees.


Theorem 3 (Hoda et al. 2010). There is a way to construct nice prox-functions for sequence form games.

initial(A, d_X, d_Y)

1. µ_X^0 := µ_Y^0 := ‖A‖ / √(σ_X σ_Y)
2. x̄ := ∇d_X*(0)
3. y^0 := ∇d_Y*((1/µ_Y^0) A x̄)
4. x^0 := ∇d_X*(∇d_X(x̄) − (1/µ_X^0) A^T y^0)
5. Return (µ_X^0, µ_Y^0, x^0, y^0)

Algorithm 2.

shrink(A, µ_X, µ_Y, τ, x, y, d_X, d_Y)

1. x̆ := ∇d_X*(−(1/µ_X) A^T y)
2. x̂ := (1 − τ) x + τ x̆
3. ŷ := ∇d_Y*((1/µ_Y) A x̂)
4. x̃ := ∇d_X*(∇d_X(x̆) − (τ/((1 − τ) µ_X)) A^T ŷ)
5. y^+ := (1 − τ) y + τ ŷ
6. x^+ := (1 − τ) x + τ x̃
7. µ_X^+ := (1 − τ) µ_X
8. Return (µ_X^+, x^+, y^+)

Algorithm 3.


The first heuristic is based on the following observation: although the value τ = 2/(k + 3) computed in step 2(a) of EGT guarantees the excessive gap condition (equation 5), this is potentially an overly conservative value. Instead we can use an adaptive procedure to choose a larger value of τ. Since we then can no longer guarantee equation 5 a priori, we do a posterior verification, which occasionally necessitates an adjustment in the parameter τ.

The second heuristic is motivated by the observation that after several iterations, one of µ_X and µ_Y may be much smaller than the other. This imbalance is undesirable because the larger one contributes the most to the worst-case bound (equation 6). Hence, every so many iterations we perform a balancing to bring these values closer together. The balancing consists of repeatedly shrinking the larger one of µ_X and µ_Y.

We also observed that after such balancing, the values of µ_X and µ_Y can sometimes be further reduced without violating the excessive gap condition (equation 5). We thus include a final reduction step in the balancing.

Experiments on automatically abstracted poker games show that each of the two heuristics tends to reduce ε by about an order of magnitude. Those experiments also show that using the entropy prox-function at the leaves performs better than using the Euclidean prox-function.

Decomposed Game Representation to Save Memory. One attractive feature of first-order methods like EGT is that the only operation performed on the matrix A is a matrix-vector product. We can thus exploit the problem structure to store only an implicit representation of A (Gilpin et al. 2007). This representation relies on a certain type of decomposition that is present in games with ordered signals. For example, the betting sequences that can occur in most poker games are independent of the cards that are dealt. We can decompose the payoff matrix based on these two aspects.

For ease of exposition, we explain the conciserepresentation in the context of Rhode IslandHold’em. The payoff matrix A can be written as

where

for much smaller matrices Fi, Bi, S, and W. Thematrices Fi correspond to sequences of moves inround i that end with a fold, and S corresponds tothe sequences in round 3 that end in a showdown.The matrices Bi encode the betting structures inround i, while W encodes the win/lose/draw infor-mation determined by poker hand ranks. The sym-bol ƒ denotes the Kronecker product. The Kro-necker product of two matrices B Œ �m×n and C Œ

�p×q is

A =

[ A1A2

A3

]

A1 = F1 ⊗ B 1,A2 = F2 ⊗ B 2, andA3 = F3 ⊗ B 3 + S ⊗W

Given the above decomposed representation ofA, the space required is sublinear in the size of thegame tree. For example, in Rhode Island Hold’em,the dimensions of the F1 and F2 matrices are 10 ×10 and 70 × 70 respectively. The dimension of theF3 and S matrices are 490 × 490. The dimensions ofB1, B2, and B3 are 13 × 13, 205 × 205, and 1,774 ×1,774, respectively. By contrast, the matrix A is883,741 × 883,741. Furthermore, the matrices Fi,Bi, S, and W are themselves sparse, so we capitalizeon the Compressed Row Storage (CRS) data struc-ture that only stores nonzero entries. Table 1demonstrates that the decomposed game repre-sentation enables equilibrium finding in the large.

Speeding Up the Matrix-Vector Products UsingParallelization and Sampling. The matrix-vectoroperations dominate the run time of first-orderalgorithms like EGT. They can be parallelized onmultiple cores with near-perfect efficiency (Gilpinet al. 2007). For further speedups, one can samplethe payoff matrix A to construct a sparser matrix,and then run the algorithm on the latter. One canalso redo the sampling dynamically if overfittingto the sparser matrix occurs (Gilpin and Sandholm2010).

Poker Players Created. In 2008, the Association for the Advancement of Artificial Intelligence (AAAI) held the third annual AAAI Computer Poker Competition, where computer programs submitted by teams worldwide competed against each other. We generated our players using lossy abstraction algorithms followed by equilibrium finding in the abstracted game using the EGT algorithm described above. GS4-Beta placed first (out of nine) in the Limit Bankroll Competition and Tartanian placed third (out of four) in the No-Limit Competition. Tartanian actually had the highest winning rate in the competition, but due to the winner determination rule for the competition, it got third place.



EGT(A, d_X, d_Y)

1. (µ^0_X, µ^0_Y, x^0, y^0) := initial(A, d_X, d_Y)
2. For k = 0, 1, . . .:
   (a) τ := 2/(k + 3)
   (b) If k is even:  // shrink µ_X
       i. (µ^{k+1}_X, x^{k+1}, y^{k+1}) := shrink(A, µ^k_X, µ^k_Y, τ, x^k, y^k, d_X, d_Y)
       ii. µ^{k+1}_Y := µ^k_Y
   (c) If k is odd:  // shrink µ_Y
       i. (µ^{k+1}_Y, y^{k+1}, x^{k+1}) := shrink(−A^T, µ^k_Y, µ^k_X, τ, y^k, x^k, d_Y, d_X)
       ii. µ^{k+1}_X := µ^k_X

Algorithm 4. The EGT algorithm.



A Gradient-Based Algorithm with O(log 1/ε) Convergence. Recently we developed a gradient-based equilibrium-finding algorithm that finds an ε-equilibrium exponentially faster: in O(log 1/ε) iterations (Gilpin, Peña, and Sandholm 2008). It uses as a subroutine a procedure we call smoothing, which is a recent smoothing technique for nonsmooth convex optimization (Nesterov 2005b). The algorithm is unlike EGT in that there is only one function that is being optimized and smoothed instead of two. Also unlike in EGT, the target ε needs to be specified in advance. This is used to set the constant µ that is used as the multiplier on the prox-function. Unlike in EGT, µ does not change inside smoothing.

We showed that smoothing can be extended to sequential games. This entailed developing a custom dynamic program.

We added an outer loop that decreases ε by a factor e in between calls to smoothing. The key was then to prove that each such call to smoothing uses only a constant number of first-order iterations (gradient steps). It follows immediately that the overall algorithm runs in O(log 1/ε) first-order iterations (theorem 4).
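A schematic of that outer loop follows (Python; the function `smoothing_run`, its warm-start argument, and the starting accuracy are placeholders of mine, not the paper's interface):

```python
import math

def find_equilibrium(target_eps, smoothing_run, initial_eps=1.0):
    """Outer loop of the O(log 1/eps) scheme: the accuracy target is
    divided by e between calls to smoothing, and each call takes only
    a constant number of first-order (gradient) iterations, so the
    total number of gradient steps is O(log(1/target_eps))."""
    eps, iterate = initial_eps, None
    while eps > target_eps:
        eps /= math.e                          # decrease epsilon by a factor e
        iterate = smoothing_run(iterate, eps)  # constant number of gradient steps
    return iterate

# Placeholder call just to show the calling pattern.
find_equilibrium(1e-6, smoothing_run=lambda warm, eps: eps)
```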

Experiments verified that this algorithm converges faster than EGT, and the speed difference increases systematically the smaller ε is. Current work includes developing robust software implementations of this algorithm that can scale to the large.

Our O(log 1/ε) convergence rate matches the best known convergence rate: that of interior-point methods. At the same time, our algorithm, being a first-order method, is scalable, while current interior-point methods require a prohibitive amount of memory.

Algorithm Based on Counterfactual Regret Minimization (CFR)

Soon after we developed an EGT-based algorithm for sequential games, another algorithm, counterfactual regret (CFR), was introduced that can also find an ε-equilibrium with small ε in games with 10^12 leaves in the game tree (Zinkevich et al. 2007). It is based on totally different principles than the gradient-based equilibrium-finding algorithms described above. Specifically, it is based on regret minimization principles, and it is crafted cleverly so they can be applied at individual information sets (usually they are employed in the space of strategies; that would not scale to large sequential games).

The average overall regret for agent i, R^N_i, is how much better off i would have been on average in repetitions 1..N of the game by using his or her best fixed strategy than by having played the way he or she did. If both players have average overall regret less than ε, then Agent 1's time-averaged strategy and Agent 2's time-averaged strategy constitute a 2ε-equilibrium.
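In the two-player zero-sum case this can be written out explicitly. With A denoting Player 1's payoff matrix and x̄, ȳ the time-averaged strategies (the sign conventions here are mine), the duality gap of the average profile is bounded by the sum of the average overall regrets:

$$\max_{x} x^\top A \bar{y} \;-\; \min_{y} \bar{x}^\top A y \;\le\; R^N_1 + R^N_2 \;\le\; 2\varepsilon.$$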

The design by Zinkevich et al. starts by studying one information set I and player i's choices made in I. Define counterfactual utility u_i(σ, I) to be the expected utility given that information set I is reached and all players play using strategy σ except that player i plays to reach I. Formally, letting π^σ(h, h′) be the probability of going from history h to h′, and letting Z be the set of terminal histories:

$$u_i(\sigma, I) = \frac{\sum_{h \in I,\, h' \in Z} \pi^{\sigma}_{-i}(h)\, \pi^{\sigma}(h, h')\, u_i(h')}{\pi^{\sigma}_{-i}(I)}$$

For all actions a ∈ A(I), define σ|_{I→a} to be a strategy profile like σ except that player i always chooses action a in I. The immediate counterfactual regret is

$$R^{N}_{i,\mathrm{imm}}(I) = \frac{1}{N} \max_{a \in A(I)} \sum_{t=1}^{N} \pi^{\sigma^t}_{-i}(I) \left( u_i(\sigma^t|_{I \to a}, I) - u_i(\sigma^t, I) \right)$$


Name     CPLEX IPM      CPLEX Simplex    EGT
10k      0.082 GB       > 0.051 GB       0.012 GB
160k     2.25 GB        > 0.664 GB       0.035 GB
RI       25.2 GB        > 3.45 GB        0.15 GB
Texas    > 458 GB       > 458 GB         2.49 GB
GS4      > 80,000 GB    > 80,000 GB      43.96 GB

Table 1. Memory footprint of CPLEX interior-point method (IPM), CPLEX simplex, and our memory-efficient version of EGT. 10k and 160k are lossy abstractions of Rhode Island Hold'em, and RI is lossless. Texas and GS4 are lossy abstractions of Texas Hold'em.

Theorem 4 (Gilpin, Peña, and Sandholm 2008). The algorithm finds an ε-equilibrium in

$$2\sqrt{2} \cdot e \cdot \kappa(A) \cdot \ln(2\|A\|/\varepsilon) \cdot \sqrt{D}$$

first-order iterations, where ‖·‖ is the Euclidean matrix norm, D is the maximum Euclidean distance between strategies, and κ(A) is a condition measure of A.


This is the player's regret in its decisions at I in terms of counterfactual utility, with an additional weighting term for the counterfactual probability that I would be reached if the player had tried to do so. Denoting

$$R^{N,+}_{i,\mathrm{imm}}(I) = \max\{R^{N}_{i,\mathrm{imm}}(I), 0\},$$

it turns out that the average overall regret can be bounded by the local ones:

$$R^{N}_{i} \le \sum_{I \in \mathcal{I}_i} R^{N,+}_{i,\mathrm{imm}}(I).$$

The key is that immediate counterfactual regret can be minimized by controlling only σ_i(I). The CFR algorithm maintains, for all I ∈ ℐ_i and all a ∈ A(I),

$$R^{N}_{i}(I, a) = \frac{1}{N} \sum_{t=1}^{N} \pi^{\sigma^t}_{-i}(I) \left( u_i(\sigma^t|_{I \to a}, I) - u_i(\sigma^t, I) \right).$$

Let the positive counterfactual regret be

$$R^{N,+}_{i}(I, a) = \max\{R^{N}_{i}(I, a), 0\}.$$

The CFR algorithm simulates the players playing the game repeatedly. Actions for each agent are selected in proportion to their positive counterfactual regret. (If no action has positive counterfactual regret, then the action is selected randomly.) The output is a strategy for each agent; it is the time-averaged strategy computed from the agent's strategies in repetitions 1..N.
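The action-selection rule is ordinary regret matching applied at each information set. A minimal sketch (Python; illustrative code of mine rather than the authors'):

```python
import numpy as np

def regret_matching_strategy(cum_regret):
    """Next strategy at one information set from cumulative
    counterfactual regrets: probabilities proportional to positive
    regret, or uniform if no action has positive regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

# Example with three actions at an information set.
print(regret_matching_strategy(np.array([2.0, -1.0, 1.0])))  # [2/3, 0, 1/3]
```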

CFR can be sped up by sampling the chance moves, that is, sampling bucket sequences in the abstraction, rather than considering all the histories that lead to the information set that is being updated. This enables approximately 750 iterations per second on a Texas Hold'em abstraction, and yields a smaller ε faster (Zinkevich et al. 2007). Other forms of sampling can also be used (Lanctot et al. 2009). By theorem 5, CFR runs in O(1/ε²) iterations (and the chance sampling incurs only a linear increase in the number of iterations), which is significantly worse than the O(1/ε) and O(log(1/ε)) guarantees of the smoothed gradient-based techniques described above. In practice, on Texas Hold'em abstractions with 10^12 leaves, CFR is run for about a billion sampled iterations while EGT needs to be run only for about a hundred iterations. On the other hand, each sampled CFR iteration runs much faster than an iteration of those other algorithms: EGT takes hours per iteration (even when the matrix-vector products are parallelized across 96 cores). Our initial direct comparisons between EGT and CFR on small and medium-sized games show that either can have a significant overall speed advantage over the other depending on the game.

Theorem 5 (Zinkevich et al. 2007). Let Δ_i be the difference between i's highest and lowest payoff in the game. With the CFR algorithm,

$$R^{N}_{i,\mathrm{imm}}(I) \le \Delta_i \sqrt{|A_i|} / \sqrt{N}$$

and thus

$$R^{N}_{i} \le \Delta_i |\mathcal{I}_i| \sqrt{|A_i|} / \sqrt{N},$$

where |A_i| = max_{h : Player(h) = i} |A(h)|.

A Practical Use of Imperfect Recall

All of the results above are for games with perfect recall, which includes most games played by computers. A recent idea has been to model a game of perfect recall like poker as a game of imperfect recall: the player uses a fine-grained abstraction in the poker round she is in and then forgets some of the details of that round, so as to afford a larger branching factor in the information abstraction for the next round while keeping the overall abstracted game size manageable (Waugh et al. 2009b). Such imperfect recall abstracted games have been approached using CFR, but there are no convergence guarantees.

Equilibrium-Finding Algorithms for Multiplayer and Nonzero-Sum Games

While the abstraction methods discussed earlier are for n-player general-sum games, the equilibrium-finding algorithms above are for two-player zero-sum games. The problem of finding a Nash equilibrium in two-player general-sum games is PPAD-complete even in normal form games with complete information (Daskalakis, Goldberg, and Papadimitriou 2008; Chen and Deng 2006), suggesting that there likely is no polynomial-time algorithm for all instances. The PPAD-completeness holds even when payoffs are restricted to be binary (Abbott, Kane, and Valiant 2005). With three or more players, the problem becomes FIXP-complete even in the zero-sum case (Etessami and Yannakakis 2007).

To my knowledge the best equilibrium-finding algorithms for general multiplayer incomplete-information games are continuation methods (Govindan and Wilson 2003). They perturb the game by giving agents fixed bonuses, scaled by λ, for each of their actions. If the bonuses are large enough (and unique), they dominate the original game, so the agents need not consider their opponents' actions. There is thus a unique pure-strategy equilibrium easily determined at λ = 1. The continuation method can then be used to follow a path in the space of λ and equilibrium profiles for the resulting perturbed game, decreasing λ until it is zero, at which point the original game has been solved. The algorithm scales better for games that can be represented in a structured way using multiagent influence diagrams (Blum, Shelton, and Koller 2006). However, even then it has only been applied to relatively small games.





Leveraging Qualitative Models

A recent idea that scales to significantly larger games takes advantage of the fact that in many settings it is easier to infer qualitative models about the structure of equilibrium than it is to actually compute an equilibrium. For example, in (sequences of) take-it-or-leave-it offers, equilibria involve accepting offers above a certain threshold and rejecting offers below it. Threshold strategies are also common in auctions and in deciding when to make and break partnerships and contracts. In poker, the cards in the hand are private signals, and in equilibrium, often the same action is taken in continuous regions of the signal space (for example, Ankenman and Chen [2006]). The idea of using qualitative models as an extra input for equilibrium finding has been applied to continuous (and finite) multiplayer Bayesian games, a broad class of imperfect-information games that includes many variants of poker (Ganzfried and Sandholm 2010). Figure 6 shows an example of a qualitative model for a simplified poker game.

Given a qualitative model that is correct, there is a mixed integer linear feasibility program (MILFP) that finds an equilibrium (a mixed strategy equilibrium in games with a finite number of types and a pure strategy equilibrium in games with a continuum of types) (Ganzfried and Sandholm 2010). The paper also presents extensions of the algorithm to games with dependent private signal distributions, many players, and multiple candidate qualitative models of which only some are correct. Experiments show that the algorithm can still compute an equilibrium even when it is not clear whether any of the models are correct, and an efficient procedure is given for checking the output in the event that they are not correct. The MILFP finds an exact equilibrium in two-player games, and an ε-equilibrium in multiplayer games. It also yields a MILFP for solving general multiplayer imperfect-information games given in extensive form without any qualitative models. For most of these game classes, no prior algorithm was known.

Experiments suggest that modeling a finite game with an infinite one with a continuum of types can significantly outperform abstraction-based approaches on some games. Thus, if one is able to construct a correct qualitative model, solving the MILFP formulation of the infinite approximation of a game could potentially be the most efficient approach to solving certain classes of large finite games (and the only approach to solving the infinite version). The main algorithm was used to improve play in two-player limit Texas Hold'em by solving endgames. In addition, experiments demonstrated that the algorithm was able to efficiently compute equilibria in several infinite three-player games.

Solving Multiplayer Stochastic Games of Imperfect Information

Significant progress has also been made on equilibrium finding in multiplayer stochastic games of imperfect information (Ganzfried and Sandholm 2008; 2009). For example, consider a No-Limit Texas Hold'em tournament. The best way to play differs from how one should play an individual hand because there are considerations of bankroll management (one gets eliminated once one runs out of chips) and the payoffs are based on ranks in the tournament rather than chips. This becomes especially important near the end of a tournament (where the antes are large). One simple strategy restriction is to always go all-in or fold in Round 1 (that is, once the private cards have been dealt but no public cards have). In the two-player case, the best strategy in that restricted space is almost optimal against an unrestricted opponent (Miltersen and Sørensen 2007). It turns out that if all players are restricted in this way, one can find an ε-equilibrium for the multiplayer game (Ganzfried and Sandholm 2008; 2009). The algorithms have an inner loop to determine ε-equilibrium strategies for playing a hand at a given state (stack vector, one stack of chips per player) given the values of possible future states. This is done for all states. The iteration of the outer loop adjusts the values of the different states in light of the new payoffs obtained from the inner loop. Then the inner loop is executed again until convergence, then the outer loop, and so on.


[Figure 6 omitted: A Qualitative Model for a Simplified Poker Game (Ganzfried and Sandholm 2010). Player 1's action regions are on the left, Player 2's on the right. The vertical axis is hand value (0 = best possible hand). Thresholds a–d separate Player 1's regions (BET-FOLD, BET-CALL, CHECK-CALL, CHECK-FOLD, BLUFF-FOLD), and thresholds e–i separate Player 2's regions (RAISE/BET, CALL/BET, CALL/CHECK, RAISE-BLUFF/CHECK, FOLD/CHECK, FOLD/BLUFF).]



For instance, fictitious play can be used for the inner loop and policy iteration for solving Markov decision processes for the outer loop. Several other variants were also studied. None of the variants are guaranteed to converge, but some of them have the property that if they converge, they converge to an equilibrium. In practice, both the inner and outer loop converge quickly in all of the tested variants. This suggests that fictitious play is another promising algorithm for multiplayer imperfect-information games.
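Schematically, the two nested loops look as follows (Python; the solvers and all numbers are toy placeholders of mine, not the authors' implementation):

```python
def inner_solver(state, values):
    """Stand-in for the inner loop (e.g., fictitious play): returns a
    strategy for the hand played at `state`, taking the current
    estimates of future-state values as fixed."""
    return min(1.0, 0.5 + 0.1 * values[state])   # toy "jam fraction"

def state_value(state, strategy, values):
    """Stand-in for the outer loop's update of a state's value from
    the payoffs induced by the new strategies."""
    return 0.9 * values[state] + 0.1 * strategy

def solve(states, tol=1e-9):
    values = {s: 0.0 for s in states}
    while True:
        strategies = {s: inner_solver(s, values) for s in states}               # inner loop
        new_values = {s: state_value(s, strategies[s], values) for s in states} # outer loop
        if max(abs(new_values[s] - values[s]) for s in states) < tol:
            return strategies, new_values
        values = new_values

print(solve(["short_stack", "big_stack"])[1])
```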

Opponent Exploitation

So far I have discussed approaches based on game theory. A totally different approach is to try to learn to exploit opponents.

Two-player zero-sum games have the nice property that our player can only benefit if the opponent does not play an equilibrium strategy. Furthermore, it does not matter which equilibrium strategies are selected: if (x, y) and (x′, y′) are equilibria, then so are (x, y′) and (x′, y). However, even in two-player zero-sum games, an equilibrium strategy might not maximally exploit an opponent that does not play equilibrium. In multiplayer games, there is the further complication of equilibrium selection.

There is a long history of opponent-exploitation research in AI (for example, by building models of opponents). That has also been studied for poker (for example, Billings et al. [1998], Southey et al. [2005], and Bard and Bowling [2007]). In practice, at least in large games like Texas Hold'em (even in the two-player case), and even with relatively large numbers of hands to learn from, those approaches are far inferior to the game-theory-based approaches. For example, our poker player that was constructed using potential-aware abstraction and EGT, and used no learning, won the Bankroll Competition in the AAAI 2008 Computer Poker Competition. This was noteworthy because the Bankroll Competition is designed to favor learning programs that can take advantage of weak opponents.

One weakness in the learning approach is the get-taught-and-exploited problem (Sandholm 2007): an opponent might play in a way to teach the learner a model, and then exploit the learner that attempts to use that model. Furthermore, the opponent might lose significantly less from the teaching than he gains from the exploitation.

One recent approach that has been pursued both at the University of Alberta's poker research group (Johanson, Zinkevich, and Bowling 2007; Johanson and Bowling 2009) and mine is to start with a game-theory-based strategy and then adjust it in limited ways to exploit the opponent as we learn more about the opponent. This already yielded a win in the Bankroll Competition in the AAAI 2009 Computer Poker Competition, and it is a promising direction for the future.

There are some fundamental limits, however. Can this be done in a safe way? That is, can one exploit to some extent beyond the game-theoretic equilibrium strategy while still maintaining at least the same expected payoff as the equilibrium strategy? Recently Sam Ganzfried and I proved that this is impossible. So, in order to increase exploitation, one needs to sacrifice some of the game-theoretic safety guarantee.

Additional Topics

Beyond what I discussed so far, there are other interesting developments in the computation of solutions to incomplete-information games. Let me briefly discuss some of them here.

One question is whether Nash equilibrium is the right solution concept. In two-player zero-sum games it provides a safety guarantee as discussed above, but in more general games it does not because equilibrium selection can be an issue. Even in two-player zero-sum games, the equilibrium strategy may not play the rest of the game optimally if the opponent makes a mistake. Various equilibrium refinements can be used to prune such equilibrium strategies from consideration, and there has been some work on computing equilibrium strategies that honor such refinements (for example, Miltersen and Sørensen [2010; 2006; 2008]). In multiplayer games there is also the possibility of collusion (coordination of strategies and/or information), and there are coalitional equilibrium refinements (for example, Milgrom and Roberts [1996], Moreno and Wooders [1996], and Ray [1996]).



There is also work on other general classes of games. An optimal polynomial algorithm was recently developed for repeated incomplete-information games (Gilpin and Sandholm 2008b). Work has also been done on Kriegspiel (chess where the players do not observe each other's moves), but the best-performing techniques are still based on sampling of the game tree rather than game-theoretic approaches (Ciancarini and Favini 2009). There has been work on computing commitment (Stackelberg) strategies (where Player 1 has to commit to a mixed strategy first, and then Player 2 picks a strategy) in normal form games, with significant security applications (for example, Conitzer and Sandholm [2006]; Jain et al. [2010]). Recently that was studied also in sequential incomplete-information games (Letchford and Conitzer 2010; Kiekintveld, Tambe, and Marecki 2010).

Conclusions

There has been tremendous progress on solving incomplete-information games in the last five years. For some rather large games like two-player Rhode Island Hold'em, an optimal strategy has been computed. An optimal strategy is not yet known for any variant of Texas Hold'em, but in two-player Limit Texas Hold'em (a game that is frequently used in competitions among professional poker players) computers have surpassed humans. In the No-Limit and multiplayer variants humans remain ahead.

This is a very active and fertile research area. I hope this article helps newcomers enter the field, spurs interest in further pushing the boundary of what is computationally possible, and facilitates adoption of these approaches to additional games of importance, for example, negotiation, auctions, and various security applications.

Acknowledgments

I want to thank Andrew Gilpin, Javier Peña, Sam Ganzfried, Troels Bjerre Sørensen, and Sam Hoda. Their contributions to this research were essential. I also thank Kevin Waugh for conducting experiments between EGT and CFR while he was in my laboratory.

This material is based upon work supported by the National Science Foundation under grants ITR-0427858, IIS-0905390, and IIS-0964579. I also thank IBM and Intel for their machine gifts, and the Pittsburgh Supercomputing Center for compute time and help.

Notes

1. The reader is invited to play against this strategy at www.cs.cmu.edu/~gilpin/gsi.html.

2. An extreme form of action abstraction is to restrict analysis to a small number of strategies and conduct equilibrium analysis among them (Wellman 2006).

3. The operator norm of A is defined as ‖A‖ := max{yᵀAx : ‖x‖, ‖y‖ ≤ 1}, where the norms ‖x‖ and ‖y‖ are those associated with s_X and s_Y.

References

Abbott, T.; Kane, D.; and Valiant, P. 2005. On the Complexity of Two-Player Win-Lose Games. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS-05). Los Alamitos, CA: IEEE Computer Society.

Ankenman, J., and Chen, B. 2006. The Mathematics of Poker. Pittsburgh, PA: ConJelCo LLC.

Bard, N., and Bowling, M. 2007. Particle Filtering for Dynamic Agent Modelling in Simplified Poker. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), 515–521. Menlo Park, CA: AAAI Press.

Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating Game-Theoretic Optimal Strategies for Full-Scale Poker. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03). San Francisco: Morgan Kaufmann Publishers.

Billings, D.; Davidson, A.; Schaeffer, J.; and Szafron, D. 2002. The Challenge of Poker. Artificial Intelligence 134(1–2): 201–240.

Billings, D.; Papp, D.; Schaeffer, J.; and Szafron, D. 1998. Opponent Modeling in Poker. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), 493–499. Menlo Park, CA: AAAI Press.

Blum, B.; Shelton, C. R.; and Koller, D. 2006. A Continuation Method for Nash Equilibria in Structured Games. Journal of Artificial Intelligence Research 25: 457–502.

Chen, X., and Deng, X. 2006. Settling the Complexity of 2-Player Nash Equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS). Los Alamitos, CA: IEEE Computer Society.

Ciancarini, P., and Favini, G. P. 2009. Monte Carlo Tree Search Techniques in the Game of Kriegspiel. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). Menlo Park, CA: AAAI Press.

Conitzer, V., and Sandholm, T. 2006. Computing the Optimal Strategy to Commit to. In Proceedings of the ACM Conference on Electronic Commerce. New York: Association for Computing Machinery.

Daskalakis, C.; Goldberg, P.; and Papadimitriou, C. 2008. The Complexity of Computing a Nash Equilibrium. SIAM Journal on Computing 39(1): 195–259.

Etessami, K., and Yannakakis, M. 2007. On the Complexity of Nash Equilibria and Other Fixed Points (Extended Abstract). In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS-07), 113–123. Los Alamitos, CA: IEEE Computer Society.

Ganzfried, S., and Sandholm, T. 2010. Computing Equilibria by Incorporating Qualitative Models. In Proceedings of the Ninth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2010). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Ganzfried, S., and Sandholm, T. 2009. Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). Menlo Park, CA: AAAI Press.

Ganzfried, S., and Sandholm, T. 2008. Computing an Approximate Jam/Fold Equilibrium for 3-Agent No-Limit Texas Hold'em Tournaments. In Proceedings of the Seventh International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2008). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2010. Speeding Up Gradient-Based Algorithms for Sequential Games. In Proceedings of the Ninth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2010). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2008a. Expectation-Based Versus Potential-Aware Automated Abstraction in Imperfect Information Games: An Experimental Comparison Using Poker. Short paper. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08). Menlo Park, CA: AAAI Press.

Gilpin, A., and Sandholm, T. 2008b. Solving Two-Person Zero-Sum Repeated Games of Incomplete Information. In Proceedings of the Seventh International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2008). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2007a. Better Automated Abstraction Techniques for Imperfect Information Games, with Application to Texas Hold'em Poker. In Proceedings of the 6th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2007), 1168–1175. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2007b. Lossless Abstraction of Imperfect Information Games. Journal of the ACM 54(5): 1–32.

Gilpin, A., and Sandholm, T. 2006. A Competitive Texas Hold'em Poker Player via Automated Abstraction and Real-Time Equilibrium Computation. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 1007–1013. Menlo Park, CA: AAAI Press.

Gilpin, A.; Hoda, S.; Peña, J.; and Sandholm, T. 2007. Gradient-Based Algorithms for Finding Nash Equilibria in Extensive Form Games. In Proceedings of the 3rd International Workshop on Internet and Network Economics (WINE '07). Berlin: Springer-Verlag.

Gilpin, A.; Peña, J.; and Sandholm, T. 2008. First-Order Algorithm with O(log(1/ε)) Convergence for ε-Equilibrium in Games. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08). Menlo Park, CA: AAAI Press.

Gilpin, A.; Sandholm, T.; and Sørensen, T. B. 2008. A Heads-Up No-Limit Texas Hold'em Poker Player: Discretized Betting Models and Automatically Generated Equilibrium-Finding Programs. In Proceedings of the Seventh International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2008). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A.; Sandholm, T.; and Sørensen, T. B. 2007. Potential-Aware Automated Abstraction of Sequential Games, and Holistic Equilibrium Analysis of Texas Hold'em Poker. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07). Menlo Park, CA: AAAI Press.

Goffin, J.-L. 1977. On the Convergence Rate of Subgradient Optimization Methods. Mathematical Programming 13(1): 329–347.

Govindan, S., and Wilson, R. 2003. A Global Newton Method to Compute Nash Equilibria. Journal of Economic Theory 110(1): 65–86.

Hoda, S.; Gilpin, A.; Peña, J.; and Sandholm, T. 2010. Smoothing Techniques for Computing Nash Equilibria of Sequential Games. Mathematics of Operations Research 35(2): 494–512.

Jain, M.; Tsai, J.; Pita, J.; Kiekintveld, C.; Rathi, S.; Ordóñez, F.; and Tambe, M. 2010. Software Assistants for Randomized Patrol Planning for the LAX Airport Police and the Federal Air Marshals Service. Interfaces 40(4): 267–290.

Johanson, M., and Bowling, M. 2009. Data Biased Robust Counter Strategies. Paper presented at the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, FL, 16–18 April.

Johanson, M.; Zinkevich, M.; and Bowling, M. 2007. Computing Robust Counter-Strategies. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems (NIPS 2007). Cambridge, MA: The MIT Press.

Kiekintveld, C.; Tambe, M.; and Marecki, J. 2010. Robust Bayesian Methods for Stackelberg Security Games (Extended Abstract). In Proceedings of the Ninth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2010). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Koller, D., and Pfeffer, A. 1997. Representations and Solutions for Game-Theoretic Problems. Artificial Intelligence 94(1): 167–215.

Koller, D.; Megiddo, N.; and von Stengel, B. 1996. Efficient Computation of Equilibria for Extensive Two-Person Games. Games and Economic Behavior 14(2): 247–259.

Kuhn, H. W. 1950. A Simplified Two-Person Poker. In Contributions to the Theory of Games, Annals of Mathematics Studies 24, ed. H. W. Kuhn and A. W. Tucker, volume 1, 97–103. Princeton, NJ: Princeton University Press.

Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte Carlo Sampling for Regret Minimization in Extensive Games. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), 1078–1086. Cambridge, MA: The MIT Press.

Letchford, J., and Conitzer, V. 2010. Computing Optimal Strategies to Commit to in Extensive-Form Games. In Proceedings of the 11th ACM Conference on Electronic Commerce, 83–92. New York: Association for Computing Machinery.

Milgrom, P., and Roberts, J. 1996. Coalition-Proofness and Correlation with Arbitrary Communication Possibilities. Games and Economic Behavior 17(1): 113–128.

Miltersen, P. B., and Sørensen, T. B. 2010. Computing a Quasi-Perfect Equilibrium of a Two-Player Game. Economic Theory 42(1): 175–192.

Miltersen, P. B., and Sørensen, T. B. 2008. Fast Algorithms for Finding Proper Strategies in Game Trees. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 874–883. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Miltersen, P. B., and Sørensen, T. B. 2007. A Near-Optimal Strategy for a Heads-Up No-Limit Texas Hold'em Poker Tournament. In Proceedings of the Sixth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2007). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Miltersen, P. B., and Sørensen, T. B. 2006. Computing Proper Equilibria of Zero-Sum Games. In Proceedings of the 5th International Conference on Computers and Games, Lecture Notes in Computer Science. Berlin: Springer-Verlag.

Moreno, D., and Wooders, J. 1996. Coalition-Proof Equilibrium. Games and Economic Behavior 17(1): 80–112.

Nemhauser, G., and Wolsey, L. 1999. Integer and Combinatorial Optimization. New York: John Wiley & Sons.

Nesterov, Y. 2005a. Excessive Gap Technique in Nonsmooth Convex Minimization. SIAM Journal of Optimization 16(1): 235–249.

Nesterov, Y. 2005b. Smooth Minimization of Nonsmooth Functions. Mathematical Programming 103: 127–152.

Ray, I. 1996. Coalition-Proof Correlated Equilibrium: A Definition. Games and Economic Behavior 17(1): 56–79.

Romanovskii, I. 1962. Reduction of a Game with Complete Memory to a Matrix Game. Soviet Mathematics 3: 678–681.

Sandholm, T. 2007. Perspectives on Multiagent Learning. Artificial Intelligence 171(7): 382–391.

Schnizlein, D.; Bowling, M.; and Szafron, D. 2009. Probabilistic State Translation in Extensive Games with Large Action Sets. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). Menlo Park, CA: AAAI Press.

Shi, J., and Littman, M. 2002. Abstraction Methods for Game Theoretic Poker. In Revised Papers from the Second International Conference on Computers and Games, Lecture Notes in Computer Science, 333–345. Berlin: Springer-Verlag.

Sklansky, D. 1999. The Theory of Poker, fourth edition. Las Vegas, NV: Two Plus Two Publishing.

Southey, F.; Bowling, M.; Larson, B.; Piccione, C.; Burch, N.; Billings, D.; and Rayner, C. 2005. Bayes' Bluff: Opponent Modelling in Poker. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005), 550–558. Redmond, WA: AUAI Press.

von Stengel, B. 1996. Efficient Computation of Behavior Strategies. Games and Economic Behavior 14(2): 220–246.

Waugh, K.; Bard, N.; and Bowling, M. 2009. Strategy Grafting in Extensive Games. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009). Cambridge, MA: The MIT Press.

Waugh, K.; Schnizlein, D.; Bowling, M.; and Szafron, D. 2009a. Abstraction Pathologies in Extensive Games. In Proceedings of the 8th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2009). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Waugh, K.; Zinkevich, M.; Johanson, M.; Kan, M.; Schnizlein, D.; and Bowling, M. 2009b. A Practical Use of Imperfect Recall. In Proceedings of the 8th Symposium on Abstraction, Reformulation and Approximation (SARA 2009). Menlo Park, CA: AAAI Press.

Wellman, M. 2006. Methods for Empirical Game-Theoretic Analysis (Extended Abstract). In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 1552–1555. Menlo Park, CA: AAAI Press.

Zinkevich, M.; Bowling, M.; Johanson, M.; and Piccione, C. 2007. Regret Minimization in Games with Incomplete Information. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems (NIPS 2007). Cambridge, MA: The MIT Press.

Tuomas Sandholm is a professor in the Computer Science Department at Carnegie Mellon University. He has published more than 400 papers on artificial intelligence, game theory, electronic commerce, multiagent systems, auctions and exchanges, automated negotiation and contracting, coalition formation, voting, search and integer programming, safe exchange, normative models of bounded rationality, resource-bounded reasoning, machine learning, and networks. He has 20 years of experience building optimization-based electronic marketplaces and has fielded several of his systems. He was founder, chairman, and CTO/chief scientist of CombineNet, Inc. from 1997 until its acquisition in 2010. During this period the company commercialized over 800 large-scale generalized combinatorial auctions, with over $50 billion in total spending and over $6 billion in generated savings. His technology also runs the nationwide kidney exchange. He received his Ph.D. and M.S. degrees in computer science from the University of Massachusetts at Amherst in 1996 and 1994. He earned an M.S. (B.S. included) with distinction in industrial engineering and management science from the Helsinki University of Technology, Finland, in 1991. He is recipient of the NSF Career Award, the inaugural ACM Autonomous Agents Research Award, the Alfred P. Sloan Foundation Fellowship, the Carnegie Science Center Award for Excellence, and the Computers and Thought Award. He is Fellow of the ACM and AAAI.

