
Professor John H. Nachbar, Spring 2003

Basic Non-cooperative Game Theory: The NC-17 Version

1 Preliminary remarks.

These are notes on basic non-cooperative game theory. By “basic,” I mean “fundamental,” not “easy.” These notes are written at a first-year graduate level. I mention only a few applications, and then only in passing. I do not discuss equilibrium refinements (with the partial exception of iterated weak dominance). And I do not discuss cooperative game theory.

With the exception of Section 8.9 (repeated games), I restrict formal definitions and theorems to finite games, that is, games in which there are a finite number of players and each player has only a finite number of strategies. I provide some comments on how, or whether, the definitions and theorems can be extended to infinite games. And I include some examples of infinite games, e.g. Example 4, the Cournot game.

2 Games in Strategic Form.

2.1 Strategic forms.

A strategic form consists of a list of players and a list for each player of that player’s strategies, that is, a list of what that player can do in the game. Formally, let I denote the set of players and let Si denote the set of strategies available to player i ∈ I. I sometimes refer to the elements of Si as pure strategies, to distinguish them from the mixed strategies introduced in Section 3.3.

As noted in Section 1, all definitions, with the exception of those in Section 8.9, assume that the game is finite: both Si and I are finite. Let N denote the number of players; N = |I|.1 Then the set of players can be represented as I = {1, . . . , N}. A strategic form is then

(I, {Si}Ni=1),

where {Si}Ni=1 = {S1, . . . , SN}.

A strategy profile s = (s1, . . . , sN) lists a strategy for each player. The set of strategy profiles is then

S = ∏Ni=1 Si.

1 In general, given a set X, let |X| denote the cardinality of X.


For future reference, I also establish notation for profiles of strategies of players other than player i. Explicitly, if s = (s1, . . . , si−1, si, si+1, . . . , sN) then s−i = (s1, . . . , si−1, si+1, . . . , sN). The set of strategy profiles for players other than i is then

S−i = ∏j≠i Sj.

By definition, if s−i = (s1, . . . , si−1, si+1, . . . , sN ) then

(si, s−i) = (s1, . . . , si−1, si, si+1, . . . , sN ).

2.2 Strategic form games.

A strategic form becomes a strategic form game once one specifies, for each player i, a payoff function

ui : S → R.

A game in strategic form is written formally as

(I, {Si}Ni=1, {ui}Ni=1).

In most economic applications, players are (or at least would like to be) subjective expected utility maximizers and the interpretation of ui is as follows. Each strategy profile s generates an outcome. The outcome might be, for example, that player i gets a gold star while every other player gets a lump of coal. Each player has a felicity function over outcomes, and this in turn induces a felicity function ui over strategy profiles. In economic applications, it is common to assume that if the strategy profile s results in player i receiving, say, $12 then ui(s) = 12. This implicitly assumes both risk neutrality and the absence of externalities like envy or altruism.

Remark 1. In game theory as studied by evolutionary biologists, ui is a measure of reproductive fitness, such as expected numbers of children or grandchildren. I do not, however, discuss this application further. □

2.3 Examples.

As discussed in Section 8.7, any non-cooperative game can be represented as a game in strategic form. But for the moment it is convenient, in terms of interpretation, to think of a strategic form game as one in which players act simultaneously. I now give some standard examples.

It is often convenient to represent games via game boxes. Rather than define what a game box is, I give examples using 2 × 2 games, by which I mean games with two players and two strategies for each player.


      a        b
a   8, 10    0, 0
b   0, 0     10, 8

Figure 1: Battle of the Sexes.

Example 1. The game box for one version of Battle of the Sexes is in Figure 1. Players have two strategies, a and b. If both players choose a then player 1, the row player, gets 8 and player 2, the column player, gets 10. If both players choose b then player 1 gets 10 and player 2 gets 8. If player 1 chooses a and player 2 chooses b, or if player 1 chooses b and player 2 chooses a, then both players get 0.

Thus, player 1 and player 2 would like to coordinate on either (a, a) or (b, b), but they disagree about which of those two is better. □

Example 2. The game box for Matching Pennies is in Figure 2.

      H        T
H   1, −1    −1, 1
T   −1, 1    1, −1

Figure 2: Matching Pennies.

Player 1 likes the profiles (H,H) and (T,T). Player 2 likes the profiles (H,T) and (T,H). This game is called zero sum because, for any strategy profile, the sum of payoffs is zero. □

Example 3. The game box for a version of the Prisoner’s Dilemma (sometimes Prisoners’ Dilemma; your choice) is in Figure 3.

      C       F
C   4, 4    0, 6
F   6, 0    1, 1

Figure 3: A Prisoner’s Dilemma.

The players can maximize the sum of their payoffs by coordinating on (C,C). But, regardless of what the opponent does, each player wants to play F. □

Game boxes are convenient for small games, such as the two-player, two-strategy games above, but they are useless for large games. A standard example of a large game is the following.


Example 4. There are two firms producing a homogeneous product for which market demand is given by

Q = 12 − P if P ∈ [0, 12], and Q = 0 if P > 12.

A strategy for firm i is a choice of quantity. Following tradition, I denote a typical strategy in the Cournot game as qi (“q” for “quantity”). Note that the strategy set Si = [0, ∞) is infinite. It can be made finite by limiting attention to, say, integer quantities less than or equal to 12. But it is often simplest to deal directly with the continuum case.

Market price is set to clear the market, if possible. Thus, if the strategy profile is (q1, q2), in which case Q = q1 + q2, then market price is

P(q1, q2) = 12 − (q1 + q2) if q1 + q2 ∈ [0, 12], and P(q1, q2) = 0 if q1 + q2 > 12.

Suppose that the cost of producing qi is Ci(qi). Then the profit to firm i, which I equate with firm i’s payoff, is

ui(q1, q2) = P (q1, q2)qi − Ci(qi).

The game just described is called the Cournot duopoly. Generalizing this game to N firms yields a Cournot oligopoly.

For simplicity, suppose that there are no costs. One can verify that the sum of profits is maximized if total output is 6. Thus, the firms would like to coordinate on an output of 3 each. But they have incentive to cheat. For example, if player 1, say, were to produce 3 then player 2 would maximize its profit by producing 4.5 rather than 3. □
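These claims are easy to check numerically. Here is a minimal Python sketch (mine, not from the notes), assuming zero costs and a grid of quantities in steps of 0.01:

    # Cournot duopoly with inverse demand P = max(0, 12 - Q) and zero costs.
    def price(q1, q2):
        return max(0.0, 12.0 - (q1 + q2))

    def profit(qi, qj):
        # Profit to the firm producing qi when the other firm produces qj.
        return price(qi, qj) * qi

    grid = [i / 100 for i in range(1201)]  # quantities 0.00, 0.01, ..., 12.00

    # Joint profit Q * (12 - Q) is maximized at total output Q = 6.
    print(max(grid, key=lambda Q: Q * (12 - Q)))        # 6.0

    # If firm 1 produces 3, firm 2's best response is 4.5, not 3.
    print(max(grid, key=lambda q2: profit(q2, 3.0)))    # 4.5
    print(profit(4.5, 3.0), profit(3.0, 3.0))           # 20.25 18.0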


3 Probability Distributions over Strategies.

3.1 Basic Probability Notation.

Let ∆(S) denote the set of probability distributions over S.2 σ ∈ ∆(S) is called a strategy distribution. σ(s) is the probability under σ of the strategy profile s.

The support of σ, written supp(σ), is the set of s ∈ S that get positive probability under σ; that is, supp(σ) is the set of s such that σ(s) > 0. Say that σ is degenerate if there is some s such that σ(s) = 1, in which case supp(σ) = {s}. If σ(s) = 1 then I use s interchangeably with σ.

Similar definitions hold for σi ∈ ∆(Si) and σ−i ∈ ∆(S−i).

Example 5. In the case of a 2 × 2 game, I can represent σ as in Figure 4.

       s2    s2′
s1     α     β
s1′    γ     δ

Figure 4: A strategy distribution.

Here α, β, γ, δ ≥ 0 and α + β + γ + δ = 1. For example, if s = (s1, s2′) then σ(s) = β. □

Example 6. Less abstractly, consider the Battle of the Sexes game of Example 1 in Section 2.3. Then one possible σ is represented in Figure 5.

      a      b
a    1/2    0
b    0     1/2

Figure 5: A strategy distribution for Battle of the Sexes.

Thus σ(a, a) = σ(b, b) = 1/2 and σ(a, b) = σ(b, a) = 0. □

Here are three, not mutually exclusive, interpretations of a strategy distribution as a game theory prediction.

1. Subjective. A strategy distribution σ is an analyst’s subjective forecast of how the game will be played. It may be that the forecast is degenerate, meaning that the analyst is certain that some particular strategy profile s will be played. But it is easy to provide examples, such as Matching Pennies (Example 2), where it seems reasonable to make non-degenerate forecasts.

2 Here and below I use the notation ∆(·) to denote the set of probability distributions over a finite set.


2. Empirical. σ is an empirical distribution generated by actual play. The game is played many times, perhaps with the same players, perhaps with different players, and σ(s) is the frequency with which s occurs.

3. Objective. Players actually randomize, and σ is the objective probability distribution over S generated by this randomization. I discuss this further in Section 3.3.

3.2 Marginal distributions.

A strategy distribution σ induces, for each player i, a marginal distribution σi ∈ ∆(Si) defined by, for each si,

σi(si) = ∑s−i∈S−i σ(si, s−i).

Example 7. Consider Figure 6.

       s2      s2′
s1    1/3      0
s1′   5/12    1/4

Figure 6: A correlated distribution.

The marginal distribution σ1 is defined by σ1(s1) = 1/3 and σ1(s1′) = 5/12 + 1/4 = 2/3. The marginal distribution σ2 is defined by σ2(s2) = 1/3 + 5/12 = 3/4 and σ2(s2′) = 1/4. □

Example 8. Consider the Battle of the Sexes and suppose that the strategy distribution is the one given in Figure 5 (Example 6). The implied marginals are σ1(a) = σ1(b) = σ2(a) = σ2(b) = 1/2. □

Example 9. Again consider the Battle of the Sexes but suppose that the strategy distribution is as in Figure 7.

      a      b
a    1/4    1/4
b    1/4    1/4

Figure 7: An independent distribution.

Even though the distribution is different from the one in Figure 5, the implied marginals are the same: σ1(a) = σ1(b) = σ2(a) = σ2(b) = 1/2. □
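The marginal computations in Examples 7–9 are mechanical, and a short Python sketch (mine; the strategy labels follow Figure 6, with primes written as apostrophes) may help:

    from fractions import Fraction as F

    # The correlated distribution of Figure 6.
    sigma = {("s1", "s2"): F(1, 3), ("s1", "s2'"): F(0),
             ("s1'", "s2"): F(5, 12), ("s1'", "s2'"): F(1, 4)}

    def marginal(sigma, i):
        # sigma_i(si) = sum over s_{-i} of sigma(si, s_{-i}).
        m = {}
        for profile, prob in sigma.items():
            m[profile[i]] = m.get(profile[i], F(0)) + prob
        return m

    print(marginal(sigma, 0))  # sigma_1: s1 -> 1/3, s1' -> 2/3
    print(marginal(sigma, 1))  # sigma_2: s2 -> 3/4, s2' -> 1/4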


3.3 Independence and mixed strategies.

If σ is independent then, for any s = (s1, . . . , sN ),

σ(s) = σ1(s1) × · · · × σN (sN ).

The distribution in Example 9 is independent. The distributions in Example 6 and Example 8 are not independent; they exhibit correlation.
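Independence can be tested mechanically by checking the factorization profile by profile. A minimal sketch (mine), applied to the distributions of Figures 5 and 7:

    import math
    from itertools import product
    from fractions import Fraction as F

    def is_independent(sigma, strategy_sets):
        # sigma maps strategy profiles (tuples) to probabilities;
        # missing profiles get probability zero.
        def marg(i, si):
            return sum(p for s, p in sigma.items() if s[i] == si)
        return all(sigma.get(s, F(0)) ==
                   math.prod(marg(i, si) for i, si in enumerate(s))
                   for s in product(*strategy_sets))

    fig5 = {("a", "a"): F(1, 2), ("b", "b"): F(1, 2)}  # correlated
    fig7 = {s: F(1, 4) for s in product("ab", "ab")}   # independent
    print(is_independent(fig5, ["ab", "ab"]))  # False
    print(is_independent(fig7, ["ab", "ab"]))  # True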

Suppose σ is independent. Then it is as if each player i were randomizing independently according to σi. If players actually can randomize then the true strategic form is not (I, {Si}Ni=1) but rather (I, {∆(Si)}Ni=1). One writes (I, {Si}Ni=1), rather than (I, {∆(Si)}Ni=1), only as a kind of shorthand.3 σi is called a mixed strategy. A pure strategy is just a degenerate mixed strategy. Explicitly, si is the mixed strategy σi with σi(si) = 1. Still assuming that σ is independent, I sometimes abuse notation and write (σ1, . . . , σN) in place of σ. (σ1, . . . , σN) is called a mixed strategy profile.

Much of game theory focuses on strategy distributions that are independent. In particular, the standard game theory prediction, Nash equilibrium (Section 6), imposes independence, and as a consequence independence is assumed in virtually all game theory applications. One can argue that independence is without loss of generality if the game is a complete description of the strategic environment. Briefly, the argument goes as follows. If players can correlate then there must be some mechanism generating the correlation. For example, it might be that players can correlate because they can talk prior to the start of play. By embedding the correlating mechanism into the description of the game, one can generate an augmented game with the property that the correlated strategy distribution of the original game becomes an independent distribution in the augmented game. To formalize this requires an understanding of extensive forms and extensive form strategies, which I don’t introduce until Section 8, so I am getting ahead of the story in even bringing up this issue now. I provide explicit examples of what I have in mind later, in Section 9. But before leaving this topic, let me note that, in practice, it may not be reasonable to view games as literally complete descriptions of the strategic environment. This has various consequences for the analysis, one of which is that, in some circumstances, it may be sensible to expect correlation.

If σ is independent then σ−i, the marginal distribution over S−i defined by σ−i(s−i) = ∑si∈Si σ(si, s−i), is independent, hence

σ−i(s−i) = ∏j≠i σj(sj),

and

σ(s) = σi(si)σ−i(s−i).

3 Of course, ∆(Si) is infinite whereas I have assumed strategy sets are finite. But the extension of the theory to the continuum in this particular case is trivial.


I often write (σi, σ−i) in place of σ.

Even if σ is not independent, it may be that for some i and all s,

σ(s) = σi(si)σ−i(s−i),

where σ−i may be correlated. If σ satisfies this expression then I continue to write (σi, σ−i) in place of σ and I continue to refer to σi as a mixed strategy.

3.4 Mixed extensions.

Under the standard interpretation of game theory, player i’s utility over lotteries, denoted Ui, has the expected utility form, with felicity function given by the payoff function ui (c.f. the discussion in Section 2.2). Thus, the utility from the lottery (strategy distribution) σ is

Ui(σ) = Eσ[ui] = ∑s∈S σ(s) ui(s),

where Eσ denotes expectation with respect to the probability distribution σ.

It is standard practice in game theory to write ui(σ) instead of Ui(σ). ui with its domain thus extended from S to ∆(S) is called the mixed extension of ui.
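In code, the mixed extension is a one-line expectation. A minimal sketch (mine), using player 1’s Matching Pennies payoffs and the distribution that puts probability 1/2 on (H,H) and 1/2 on (T,T):

    from fractions import Fraction as F

    # Player 1's payoffs in Matching Pennies (Figure 2).
    u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

    def mixed_extension(u, sigma):
        # u_i(sigma) = sum over s of sigma(s) * u_i(s).
        return sum(prob * u[s] for s, prob in sigma.items())

    sigma = {("H", "H"): F(1, 2), ("T", "T"): F(1, 2)}
    print(mixed_extension(u1, sigma))  # 1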


4 Best Response.

4.1 Definitions.

Definition 1. σi ∈ ∆(Si) is a best response to σ−i ∈ ∆(S−i) iff, for all σ′i ∈ ∆(Si),

ui(σi, σ−i) ≥ ui(σ′i, σ−i).

σi ∈ ∆(Si) is a strict best response to σ−i iff, for all σ′i ∈ ∆(Si) with σ′i ≠ σi,

ui(σi, σ−i) > ui(σ′i, σ−i).

For any σ−i, let BRi(σ−i) denote the set of player i’s best responses to σ−i.

Thus, σi is a best response iff it yields at least as high an expected payoff, given σ−i, as any alternative strategy. σi is a strict best response iff it yields a higher expected payoff, given σ−i, than any alternative strategy. As the examples of Section 4.3 illustrate, it is possible to have more than one best response to the same σ−i. There is a unique best response if and only if there is a strict best response, in which case the best response is pure (Theorem 4 below).

One interpretation of σ−i is Bayesian: σ−i is player i’s subjective belief about the behavior of the other players. Another interpretation is that σ−i is the true, objective, distribution over S−i generated by the actual behavior of the other players. I have two observations. First, even if the player is Bayesian, it may be that the subjective σ−i and the objective σ−i are different: a Bayesian’s beliefs can be wrong. Second, regardless of what one assumes about the rationality of the players, one can always ask whether, given the payoff functions, a player’s strategy is a best response in the objective sense. That is, one can ask whether players are acting as if they were fully informed and were optimizing.

4.2 Basic Facts.

In finite games, best responses (but not necessarily strict best responses) always exist.

Theorem 1. For any σ−i ∈ ∆(S−i), there exists a best response to σ−i.

Proof. This follows from the fact that, for any fixed σ−i, (a) the mixed extension ui is continuous as a function of σi and (b) the domain of this function, namely ∆(Si), is compact. It is a basic mathematical fact that a continuous function defined on a compact set attains a maximum. □

Remark 2. Theorem 1 does not fully generalize to non-finite games. Consider the following game, the Name Your Prize game. There is only one player, who names a natural number z ∈ N+ and then receives z as her payoff. In this game, there isn’t any z that is a best response (I still use the term “best response” even though there are no other players) since z + 1 is better than z. The problem here is a failure of compactness: the set of natural numbers is not bounded, hence not compact.

As a second example, consider the Bertrand duopoly game. Two firms produce a homogeneous good. Firm i’s strategy is its price, pi. Firm 1’s payoff is its profit, determined as follows. If p1 > p2 or if p1 > 1 then demand for firm 1 is zero. If p1 < p2 and p1 < 1 then the demand for firm 1 is 1 − p1. If, however, p1 = p2 < 1 then the firms split the market; the demand for firm 1 is (1 − p1)/2. Finally, assume that there is zero cost of production. It is easy to verify that if p2 > 1/2 then firm 1’s best response is p1 = 1/2, which is the monopoly price. And if p2 = 0 then any p1 is a best response, since no matter what price firm 1 charges, its profits are zero. But if p2 ∈ (0, 1/2] then, strictly speaking, firm 1 has no best response. Informally, firm 1 wants to undercut firm 2’s price by as little as possible, but there is no price that is “just below” p2. A price of p1 = p2 − ε/2 will always be better than a price of p1 = p2 − ε. The problem here is that firm 1’s payoff function is not continuous in p1 for any p2 ∈ (0, 1). In particular, there is a discontinuity at p1 = p2. Note that this problem disappears if one considers instead (perhaps more realistically) the finite game in which there is no price between, say, 59 cents and 60 cents.

The bottom line is that in some non-finite games of interest, the Bertrand duopoly being a classic example, best responses may not, in fact, exist for some profiles of opposing strategies. □

The linear structure of expected utility has strong implications for whether a mixed strategy can be a best response. The remaining results in this subsection all bear on this issue. Some of these results (c.f. Remark 3 below) may be counterintuitive.

The first observation along these lines is that a mixed strategy is a best response to some σ−i if and only if it yields as high an expected payoff as any pure strategy. The only if direction is immediate (since a pure strategy is just a special kind of mixed strategy). The if direction follows from the fact that the payoff to a mixed strategy is just a weighted average of the payoffs to pure strategies.

Theorem 2. σi is a best response to σ−i iff

ui(σi, σ−i) ≥ ui(si, σ−i)

for every si ∈ Si.

Proof. ⇒. Immediate from the definition of best response.

⇐. Suppose that ui(σi, σ−i) ≥ ui(si, σ−i) for every si ∈ Si. Consider any σ′i ∈ ∆(Si). Then, multiplying by the σ′i(si) and adding across the si,

∑si∈Si σ′i(si) ui(σi, σ−i) ≥ ∑si∈Si σ′i(si) ui(si, σ−i). (1)


But, since ∑si∈Si σ′i(si) = 1,

∑si∈Si σ′i(si) ui(σi, σ−i) = ui(σi, σ−i).

Moreover, by definition,

∑si∈Si σ′i(si) ui(si, σ−i) = ui(σ′i, σ−i).

Therefore, (1) implies

ui(σi, σ−i) ≥ ui(σ′i, σ−i).

Since σ′i was arbitrary, it follows that σi ∈ BRi(σ−i), as was to be shown. □
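Theorem 2 licenses a simple computation: to find the pure best responses to a fixed σ−i, compare expected payoffs across pure strategies only. A minimal two-player sketch in Python (mine, not from the notes):

    from fractions import Fraction as F

    def pure_best_responses(u_i, S_i, sigma_opp):
        # Expected payoff of each pure strategy against the opponent's mix.
        payoff = {si: sum(p * u_i[(si, sj)] for sj, p in sigma_opp.items())
                  for si in S_i}
        best = max(payoff.values())
        return [si for si in S_i if payoff[si] == best]

    # Matching Pennies, player 1, against a 50:50 opponent: both H and T
    # are best responses, so every mixture is as well (Theorem 3 below).
    u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
    print(pure_best_responses(u1, "HT", {"H": F(1, 2), "T": F(1, 2)}))  # ['H', 'T']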

The second observation is that a mixed strategy is a best response to σ−i if and only if all the pure strategies in its support are likewise best responses. The intuition is much like the intuition for Theorem 2. The expected payoff from a mixed strategy σi is an average of the expected payoffs from the pure strategies in its support. If si is in the support of σi but si is not a best response to σ−i then one could raise one’s payoff average by shifting probability away from si.

Theorem 3. σi is a best response to σ−i iff every si in the support of σi is a best response to σ−i.

Proof. ⇒. I argue by contraposition. Suppose that si ∉ BRi(σ−i). Consider any σi such that σi(si) > 0. I show that σi is not a best response to σ−i. Since si ∉ BRi(σ−i), Theorem 2 implies that there is an ŝi such that

ui(ŝi, σ−i) > ui(si, σ−i).

Define σ̂i by shifting all probability that had been on si onto ŝi. Formally, let σ̂i(si) = 0, σ̂i(ŝi) = σi(ŝi) + σi(si), and σ̂i(s′i) = σi(s′i) for all s′i other than si or ŝi. Then

σ̂i(si) − σi(si) = −σi(si),
σ̂i(ŝi) − σi(ŝi) = σi(si),

and, for any other s′i,

σ̂i(s′i) − σi(s′i) = 0.

Therefore,

ui(σ̂i, σ−i) − ui(σi, σ−i) = ∑s′i∈Si σ̂i(s′i) ui(s′i, σ−i) − ∑s′i∈Si σi(s′i) ui(s′i, σ−i)
= ∑s′i∈Si [σ̂i(s′i) − σi(s′i)] ui(s′i, σ−i)
= σi(si) [ui(ŝi, σ−i) − ui(si, σ−i)]
> 0.


Hence ui(σ̂i, σ−i) > ui(σi, σ−i). Therefore, σi is not a best response. The proof follows by contraposition.

⇐. The claim is almost immediate. Consider any σ′i. Since si ∈ BRi(σ−i) for every si ∈ supp(σi), it follows that, for any such si, ui(si, σ−i) ≥ ui(σ′i, σ−i). Then, much as in the proof of Theorem 2, it follows that

ui(σi, σ−i) = ∑si∈Si σi(si) ui(si, σ−i) ≥ ui(σ′i, σ−i).

(Note that if si ∉ supp(σi) then it is possible that ui(si, σ−i) < ui(σ′i, σ−i); but in this case σi(si) = 0.) Since σ′i was arbitrary, σi ∈ BRi(σ−i). □

Theorem 3 implies that there is always a pure strategy best response to any σ−i. Theorem 3 also implies that if a nondegenerate mixed strategy σi is a best response to σ−i then there are multiple pure strategy best responses to σ−i. In this case, the set of best responses to σ−i, mixed as well as pure, is a continuum.

Theorem 3 plays an important role in the analysis to follow. In particular, the procedure for constructing mixed strategy Nash equilibria is to determine what mixtures over actions for players other than i will yield more than one best response for player i.

The final observation is that a best response is strict if and only if it is the unique best response.

Theorem 4. σi is a strict best response to σ−i iff it is the unique best response to σ−i. A strict best response to σ−i is pure.

Proof. Let σi be a strict best response to σ−i. Then for any σ′i ≠ σi, ui(σi, σ−i) > ui(σ′i, σ−i). This implies that σ′i ∉ BRi(σ−i). Since σ′i was arbitrary, the uniqueness claim follows. Uniqueness together with Theorem 3 then implies that a strict best response must be pure: if it were not pure then Theorem 3 implies that there would be multiple pure best responses. □

Remark 3. One sometimes sees claims to the effect that in games like Matching Pennies (Example 2), players should randomize in order to avoid being exploited by their opponents. Theorems 3 and 4 imply that this intuition is incompatible with expected utility maximization. With expected utility, there is never any strict incentive to randomize; whenever randomization is optimal, playing a pure strategy is optimal as well. □

4.3 Examples.

Example 10. Consider the game in Figure 8. For simplicity, I have recorded only the payoffs for player 1. Let p be the probability that player 1 plays T and let q be the probability that player 2 plays L: p = σ1(T) and q = σ2(L).


     L    R
T    5    0
B    2    7

Figure 8: Payoffs are for player 1 only.

It is easy to verify that if q > 7/10 then player 1’s strict best response is p = 1 (player 1 plays T for sure). Similarly, if q < 7/10 then player 1’s strict best response is p = 0. On the other hand, if q = 7/10 then every p ∈ [0, 1] is a best response. □

Example 11. Consider the entry deterrence game of Figure 29 (Example 43). Let p = σ1(E) and let q = σ2(Fight). Informally, if q is large, meaning that the incumbent fights with high probability, then the entrant’s strict best response is to stay out. Conversely, if q is small, meaning that the incumbent accommodates with high probability, then the entrant’s strict best response is to enter. Explicitly, one can verify that if q < 1 − K/16 then the strict best response is p = 1, if q > 1 − K/16 then the strict best response is p = 0, while if q = 1 − K/16 then every p ∈ [0, 1] is a best response.

Conversely, for any p > 0 the strict best response is q = 0. That is, if there is any chance of entry, the incumbent’s strict best response is to accommodate. But if p = 0, so there is zero chance of entry, then any q is a best response. □

Example 12. Recall the Cournot duopoly of Example 4. Assume that the cost function is Ci(qi) = cqi where c > 0 and c < 12. Note that this is an infinite game. Ignoring boundary issues (e.g. ignoring q1 + q2 ≥ 12) the payoff to player 1 from the profile (q1, σ2) is

Eσ2[(12 − q1 − q2)q1 − cq1] = (12 − q1 − Eσ2[q2] − c)q1 = (12 − q1 − q̄2 − c)q1,

where q̄2 = Eσ2[q2]. Given this, it is easy to verify that, for q̄2 < 12 − c, player 1’s best response is strict, hence unique, and is given by (12 − c − q̄2)/2. On the other hand, if q̄2 ≥ 12 − c then the firm’s best response is zero. In summary,

BR1(σ2) = (12 − c − q̄2)/2 if q̄2 < 12 − c, and BR1(σ2) = 0 if q̄2 ≥ 12 − c.

Similarly, if q̄1 = Eσ1[q1] then

BR2(σ1) = (12 − c − q̄1)/2 if q̄1 < 12 − c, and BR2(σ1) = 0 if q̄1 ≥ 12 − c.


More generally, if there are N firms, all with cost Ci(qi) = cqi, then one can show, using the notation q̄−i = Eσ−i[∑j≠i qj], that the best response to σ−i is

BRi(σ−i) = (12 − c − q̄−i)/2 if q̄−i < 12 − c, and BRi(σ−i) = 0 if q̄−i ≥ 12 − c.

The fact that the best response to σ−i depends only on q̄−i is not general. It comes from the fact that profit is linear in s−i, which in turn comes from the assumption that demand is linear. □

Example 13. Recall the Prisoner’s Dilemma of Example 3. One can readily verify that F is the best response to any σ−i.

In the repeated Prisoner’s Dilemma (Example 46), things are much more complicated. It is easy to verify that “always fink” is a best response to either “always cooperate” or “always fink.” In this respect, the analysis of the stage game carries over to the repeated game. But, depending on the discount factors, “always fink” may not be a best response to “tit for tat” or “grim.” Explicitly, consider the payoff to “always fink” when the opponent plays “grim.” The “always fink” player receives (using the payoffs of Example 3) the sequence of stage game payoffs 6, 1, 1, . . . . The repeated game payoff is then

6 + δi + δi² + · · · = 6 + δi/(1 − δi).

In contrast, if a player plays “grim” when his opponent plays “grim” then the player receives the sequence of stage game payoffs 4, 4, 4, . . . . The repeated game payoff is

4/(1 − δi).

One can verify that the payoff to “grim” is larger than the payoff to “always fink” whenever δi > 2/5: the condition 4/(1 − δi) > 6 + δi/(1 − δi) is equivalent to 4 > 6(1 − δi) + δi, that is, to 5δi > 2.

In fact, one can show that “grim” is a best response to “grim” if this condition on δi is met.4

If one interprets δi as reflecting a rate of time preference ri, so that δi = 1/(1 + ri), then “grim” is a better response to “grim” than is “always fink” provided

ri < 3/2 = 150%.

Market interest rates (in real terms) are usually less than 10% per year. So this restriction on ri (which depends on the particular stage game payoffs we are using) seems weak. □

4 It is not the unique best response. If δi > 2/5 then both “always C” and “tit for tat” are also best responses to “grim.” And there are many, many other best responses as well. But, in this particular example, all of the best responses yield the same path of play, with mutual play of C in every period.
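These discounted payoff comparisons are easy to check numerically. A minimal Python sketch (mine), using the closed forms just derived:

    def payoff_always_fink_vs_grim(delta):
        # Stage payoffs 6, 1, 1, ...: total is 6 + delta/(1 - delta).
        return 6 + delta / (1 - delta)

    def payoff_grim_vs_grim(delta):
        # Stage payoffs 4, 4, 4, ...: total is 4/(1 - delta).
        return 4 / (1 - delta)

    for delta in (0.3, 0.5):
        print(delta,
              payoff_grim_vs_grim(delta) > payoff_always_fink_vs_grim(delta))
    # 0.3 False: below the threshold 2/5, always fink does better.
    # 0.5 True:  above 2/5, grim does better.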


5 Dominance and Rationalizability.

In the next three sections I introduce three basic solution concepts used in game theory: rationalizability (this section), Nash equilibrium (Section 6), and correlated equilibrium (Section 7). Of these solution concepts, rationalizability is the least restrictive, correlated equilibrium is the next least restrictive, and Nash equilibrium is the most restrictive. I describe them out of order, with correlated equilibrium coming last, because people tend to find correlated equilibrium strange at first. Applied game theory employs Nash equilibrium (or one of its refinements, such as subgame perfect Nash equilibrium) almost exclusively. But rationalizability and correlated equilibrium play a prominent role in theoretical (as opposed to applied) and empirical game theory. I discuss some of the reasons for this in Section 9.

5.1 Strict Dominance.

Definition 2. σi strictly dominates σ̂i iff, for any σ−i ∈ ∆(S−i),

ui(σi, σ−i) > ui(σ̂i, σ−i).

σi is strictly dominant iff it strictly dominates every σ̂i ≠ σi. σi is strictly dominated iff there exists a σ̂i that strictly dominates σi.

The definition of strict dominance requires that σi have a higher expected payoff than σ̂i for all opposing σ−i. The following result establishes that it is sufficient to restrict attention to opposing profiles that are pure. This enormously simplifies the task of verifying whether some σi dominates some σ̂i.

Theorem 5. σi strictly dominates σ̂i iff, for all s−i ∈ S−i,

ui(σi, s−i) > ui(σ̂i, s−i).

Proof.

⇒. Immediate.

⇐. Suppose that ui(σi, s−i) > ui(σ̂i, s−i) for every s−i ∈ S−i. Consider any σ−i. Then, multiplying by the σ−i(s−i) and adding across the s−i,

∑s−i∈S−i σ−i(s−i) ui(σi, s−i) > ∑s−i∈S−i σ−i(s−i) ui(σ̂i, s−i).

Hence

ui(σi, σ−i) > ui(σ̂i, σ−i),

as was to be shown. □
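Theorem 5 reduces dominance checking to a finite computation: compare the two strategies against every pure opposing strategy. A minimal sketch (mine), verifying the claim of Example 16 below that the 50:50 mixture of T and M strictly dominates B in Figure 10:

    from fractions import Fraction as F

    def strictly_dominates(u_i, mix, target, opp_strategies):
        # Theorem 5: it suffices to check every pure opposing strategy.
        def val(sigma_i, s_opp):
            return sum(p * u_i[(si, s_opp)] for si, p in sigma_i.items())
        return all(val(mix, s) > val(target, s) for s in opp_strategies)

    # Figure 10 (player 1's payoffs): the 50:50 mix of T and M dominates B.
    u1 = {("T", "L"): 10, ("T", "R"): 0, ("M", "L"): 0, ("M", "R"): 10,
          ("B", "L"): 4, ("B", "R"): 4}
    mix = {"T": F(1, 2), "M": F(1, 2)}
    print(strictly_dominates(u1, mix, {"B": F(1)}, "LR"))  # True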


The following result is trivial but still worth recording.

Theorem 6. A strategy is strictly dominant iff it is a strict best response to every opposing strategy profile. If a strictly dominant strategy exists then it is unique and it is pure.

Proof. Immediate from the definition of strict dominance and from Theorem 4. □

Example 14. Consider the Prisoner’s Dilemma of Example 3. It is easy to see that, for either player, C is strictly dominated by F, which is strictly dominant. □

Example 15. Consider the game in Figure 9.

     L    R
T    4    0
M    3    3
B    0    2

Figure 9: Payoffs are for player 1 only. B is strictly dominated by M. No strategy is strictly dominant.

In this game, B is strictly dominated by M. But M is not strictly dominant; indeed, it is not a best response to L. No strategy is strictly dominant. □

Example 16. Consider the game in Figure 10.

     L     R
T    10    0
M    0     10
B    4     4

Figure 10: Payoffs are for player 1 only. B is strictly dominated.

B is not strictly dominated by any pure strategy. But it is strictly dominated by some mixed strategies. In particular, B is strictly dominated by the strategy that randomizes 50:50 between T and M. This mixed strategy earns 5 in expectation against either L or R, whereas B earns only 4. □

Example 17. In contrast, consider the game in Figure 11. As in the game of Figure 10, B is not a best response to any pure strategy. But, despite this, B is not strictly dominated.


     L     R
T    10    0
M    0     10
B    6     6

Figure 11: Payoffs are for player 1 only. No strategy is strictly dominated.

It is true that T does better than B against L, and it is true that M does better than B against R. But for B to be strictly dominated there would have to be a single strategy for player 1 that does better than B against both L and R. No such strategy exists, as I discuss further in Example 19 in Section 5.2. □

Remark 4. A common error is to confuse strict dominance with strict best response. The statement “si is a strict best response” means that there is some σ−i such that si is the strict best response to σ−i. Even if si is the strict best response to some σ−i, it may not be a best response, let alone the strict best response, to other σ−i. In contrast, the statement “si is strictly dominant” means (Theorem 6) that si is the strict best response to every σ−i. It is rare for a player to have a strictly dominant strategy. □

5.2 Never a Best Response.

Definition 3. A strategy σi is never a best response iff there does not exist any σ−i such that σi ∈ BRi(σ−i).

The examples above suggest that there may be a close connection between strict dominance and never a best response. Theorem 7, below, establishes that, indeed, a strategy is never a best response if and only if it is strictly dominated. The “if” direction is trivial. The “only if” direction, however, is difficult because one must find a dominating strategy.

Theorem 7. σi is never a best response iff it is strictly dominated.

Proof.

⇒. Suppose that σi is never a best response. I must find a σ̂i that strictly dominates σi. That is, I must find σ̂i such that, for all σ−i, ui(σ̂i, σ−i) > ui(σi, σ−i), or

ui(σ̂i, σ−i) − ui(σi, σ−i) > 0.

Let K = |Si| be the number of pure strategies. Represent a candidate dominating strategy σ̂i as a vector of probabilities p ∈ RK+, with ∑Kk=1 pk = 1.5

5 RK+ = {x ∈ RK : xk ≥ 0 ∀k}. Similarly, RK− = {x ∈ RK : xk ≤ 0 ∀k}.


Since preferences have the expected utility form,

ui(σ̂i, σ−i) − ui(σi, σ−i) = ∑Kk=1 pk [ui(ski, σ−i) − ui(σi, σ−i)].

For each σ−i, define vσ−i ∈ RK to be the vector whose kth component is ui(ski, σ−i) − ui(σi, σ−i). The task, then, is to find a probability vector p such that, for all vσ−i,

p · vσ−i > 0.

Let V ⊂ RK be the set of vσ−i. It is easy to verify that V is convex and compact. Since σi is never a best response, Theorem 2 implies that V ∩ RK− = ∅. By the Separating Hyperplane Theorem, there exists a vector q ∈ RK, q ≠ 0, and an r ∈ R such that q · v > r for v ∈ V and q · x < r for x ∈ RK−. In fact, since 0 ∈ RK−, it follows that q · v > 0 for v ∈ V and q · x ≤ 0 for all x ∈ RK−. Moreover, since each of the negative unit vectors, vectors of the form (0, . . . , 0, −1, 0, . . . , 0), is in RK−, it follows that −qk < 0 for all k, or qk > 0 for all k: q is strictly positive.

q need not be a probability vector, since it need not sum to one. But this is easily fixed. Let

pk = qk / ∑Kj=1 qj.

Then p is a probability vector and by construction p · vσ−i > 0 for all σ−i. The result follows.

⇐. If σi is strictly dominated then, by definition, there exists a σ̂i that yields strictly higher expected payoff for every possible opposing strategy profile. Hence σi is never a best response. □
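The separating-hyperplane argument can be operationalized as a linear program: maximize the worst-case payoff gap of a dominating mixture over the target strategy. A sketch (mine), assuming scipy is available; by Theorem 5 it suffices to take the columns to be pure opposing profiles, which for N ≥ 3 builds in the correlation noted in Remark 5:

    import numpy as np
    from scipy.optimize import linprog

    def strictly_dominated_lp(U, target):
        # U[k, m]: payoff to pure strategy k against pure opposing profile m.
        # target: probability vector for the candidate dominated strategy.
        # Variables (p, eps): maximize eps subject to
        #   sum_k p_k U[k, m] >= target_payoff[m] + eps for every m,
        # with p in the simplex.
        K, M = U.shape
        t = target @ U
        c = np.zeros(K + 1)
        c[-1] = -1.0                               # linprog minimizes -eps
        A_ub = np.hstack([-U.T, np.ones((M, 1))])  # -p.U[:,m] + eps <= -t[m]
        A_eq = np.ones((1, K + 1))
        A_eq[0, -1] = 0.0
        res = linprog(c, A_ub=A_ub, b_ub=-t, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * K + [(None, None)])
        return res.success and res.x[-1] > 1e-9    # eps > 0 iff dominated

    # Rows T, M, B against columns L, R. In Figure 10, B is dominated; in
    # Figure 11 it is not (it is a best response to the 50:50 mix of L, R).
    print(strictly_dominated_lp(np.array([[10, 0], [0, 10], [4, 4]]),
                                np.array([0.0, 0.0, 1.0])))  # True
    print(strictly_dominated_lp(np.array([[10, 0], [0, 10], [6, 6]]),
                                np.array([0.0, 0.0, 1.0])))  # False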

Example 18. Consider again the game in Figure 10. I have already shown, in Example 16, that B is strictly dominated. I now verify directly that it is never a best response. Suppose that player 1 thinks that player 2 plays L with probability q. Then the expected payoff from T is 10q, the expected payoff from M is 10 − 10q, and the expected payoff from B is 4. M is a best response (and is strictly better than B) for q ≤ 1/2 and T is a best response (and is strictly better than B) for q ≥ 1/2. Thus B is never a best response. □

Example 19. Consider again the game in Figure 11. B is not a best response to any pure strategy but B is the strict best response to the mixed strategy in which player 2 randomizes 50:50 between L and R. □


Theorem 7 implies that whether one works with strict dominance or with never a best response is a matter of convenience. Sometimes it is easier to check for one, sometimes it is easier to check for the other. For strict dominance, it suffices (thanks to Theorem 5) to limit attention to pure opposing strategies, but one may have to use a dominating strategy that is mixed (see Example 16). For never a best response, it suffices (thanks to Theorem 2) to check whether a strategy is better than any pure strategy, but one may have to consider opposing distributions that are not pure (see Example 19).

Remark 5. (Technical.) If N ≥ 3 then one can construct examples in which a strategy is not a best response to any independent σ−i and yet the strategy is not strictly dominated. In such examples, the strategy is a best response to a σ−i that exhibits correlation. In this sense, Theorem 7 requires that one allow the σ−i to be correlated. I discuss this issue again in Remark 6. □

5.3 Rationalizability.

For the moment, take it as self evident that a strategy that is never a best response or, equivalently (Theorem 7), strictly dominated will not be played. Given this, also take it as self evident that if a pure strategy sj, for a player j ≠ i, is never a best response then a “reasonable” belief σ−i about i’s opponents puts probability zero on any s−i containing sj.

With this sort of argument in mind, define recursively sequences S0, S1, . . . as follows.

• S0 = S.

• S1i is the subset of S0i consisting of strategies that are not strictly dominated in the original game. S1 = ∏Ni=1 S1i.

• S2i is the subset of S1i consisting of strategies that are not strictly dominated in the game with strategy sets S11, . . . , S1N. S2 = ∏Ni=1 S2i.

• And so on.

By construction, S0 ⊃ S1 ⊃ S2 ⊃ S3 . . . . Note that St+1 ⊂ St does not rule out St+1 = St. In fact, since I am working with finite games, the set inclusions are set equalities after some point. That is, there will be a T such that for all t > T, St = ST. (This is not necessarily true for infinite games, as Example 23 will illustrate.)

Define

SRi = ⋂∞t=0 Sti.

Let SR = ∏i SRi. Again, since the game is finite, there is a T for which SR = ST.
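The recursion defining SR is easy to mechanize. A minimal two-player Python sketch (mine); for brevity it deletes only pure strategies dominated by other pure strategies, whereas a full implementation would also test mixed dominators (e.g. with the linear program sketched in Section 5.2). Applied to the game of Figure 12 (Example 20 below), it deletes R in the first round and D in the second:

    def dominated(u, si, ti, opp, player):
        # Does ti strictly dominate si for this player, against the
        # surviving opposing strategies opp? (Pure dominators only.)
        key = (lambda a, b: (a, b)) if player == 0 else (lambda a, b: (b, a))
        return all(u[key(ti, sj)] > u[key(si, sj)] for sj in opp)

    def iterated_strict_dominance(u1, u2, S1, S2):
        S1, S2 = list(S1), list(S2)
        while True:
            S1_new = [s for s in S1
                      if not any(dominated(u1, s, t, S2, 0) for t in S1)]
            S2_new = [s for s in S2
                      if not any(dominated(u2, s, t, S1, 1) for t in S2)]
            if (S1_new, S2_new) == (S1, S2):
                return S1, S2
            S1, S2 = S1_new, S2_new

    u1 = {("U", "L"): 4, ("U", "R"): 0, ("D", "L"): 1, ("D", "R"): 7}
    u2 = {("U", "L"): 4, ("U", "R"): 1, ("D", "L"): 8, ("D", "R"): 2}
    print(iterated_strict_dominance(u1, u2, "UD", "LR"))  # (['U'], ['L'])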


Definition 4. si ∈ Si is rationalizable iff si ∈ SRi. s ∈ S is rationalizable iff s ∈ SR. σ ∈ ∆(S) is rationalizable iff supp(σ) ⊂ SR.

Rationalizability was originally motivated primarily by introspective arguments, arguments of the form, “I should play this, because I think that he will play that, because I think that he thinks . . . .” But this sort of sophisticated reasoning is not necessary to motivate rationalizability. Many learning models, even learning models based on extremely naive behavior, predict that play eventually becomes rationalizable.6 The prediction that play eventually becomes rationalizable seems also to be consistent with the empirical evidence. I discuss these issues further in Section 9.

The next theorem records that, in finite games, rationalizable strategies exist.

Theorem 8. SR ≠ ∅.

Proof. This follows from Theorem 1 and the fact that, since the game is finite, SR = ST for some T. □

An alternate way to characterize SR is as follows. Given Ŝi ⊂ Si for each i, say that Ŝ = ∏i Ŝi is a best response set iff, for any i and any si ∈ Ŝi, there is a σ−i with supp(σ−i) ⊂ Ŝ−i (i.e., if s−i ∉ Ŝ−i then σ−i(s−i) = 0) such that si ∈ BRi(σ−i). That is, any si ∈ Ŝi can be justified as a best response to some belief over the strategies in Ŝ−i. It is easy to verify that the union of all best response sets is a best response set. I refer to this as the largest best response set.

Theorem 9. SR is the largest best response set.

Proof. It is easy to show by induction on the St that any best response set is a subset of SR. It remains to show that SR is a best response set. Recall that, since the game is finite, SR = ST for some T. Since ST+1 = ST, ST is a best response set. Hence SR is a best response set. □

Example 20. Consider the game in Figure 12.

      L       R
U   4, 4    0, 1
D   1, 8    7, 2

Figure 12: An example to illustrate rationalizability.

It is easy to verify that R is strictly dominated by L and that no other strategy is strictly dominated. It follows that S1 = {U,D} × {L}. It is now easy to verify that, having deleted R, D is never a best response. And this is the last deletion one can make. Hence SR = S2 = {U} × {L}. □

6 The point was raised explicitly in Bernheim (1984), one of the original papers on this topic.


Example 21. Recall the Prisoner’s Dilemma game of Example 3 and Example 14. Since F is strictly dominant for either player, SR = {F} × {F}. This is unfortunate for the players because the payoff from (F, F) is only (1, 1) while the payoff from (C,C) is (4, 4). □

Example 22. Recall the game Matching Pennies, introduced in Example 2. In this game no strategy is strictly dominated: SR = S. □

The analysis extends to infinite games under some circumstances. For example, suppose that each Si is a compact (closed and bounded) subset of Rki, for some positive integer ki, and suppose that the ui are continuous. Since the strategy sets are infinite, the set inclusions St+1 ⊂ St may be strict for all t. But one can show that the St are compact, which implies that SR is not empty. Likewise Theorem 9 extends to this environment: SR is the largest best response set.

Example 23. Recall the Cournot duopoly of Example 4, which is an infinite game. Assume that the cost function for either firm is Ci(qi) = cqi, where c > 0 and small (less than 4). The best responses were computed in Example 12. These best responses imply that any qi greater than 6 − c/2 is never a best response and hence is strictly dominated. No other pure strategies are strictly dominated. One round of strict dominance deletion then yields S1 = [0, 6 − c/2] × [0, 6 − c/2]. Once one has deleted outputs above 6 − c/2, no output below 3 − c/4 is ever a best response. Thus S2 = [3 − c/4, 6 − c/2] × [3 − c/4, 6 − c/2]. And so on. Each St+1 is a proper subset of St; there is no T such that ST = ST+1. One can show that SR = {4 − c/3} × {4 − c/3}. □

Example 24. Again recall the Cournot game of Example 4 but now suppose that there are three firms instead of two. Again assume that the cost function is Ci(qi) = cqi. Again, the best responses were computed in Example 12. Again, any qi > 6 − c/2 is strictly dominated. Having deleted these strategies, no further deletion is possible: SR = [0, 6 − c/2] × [0, 6 − c/2] × [0, 6 − c/2]. Thus, with three firms, the set of rationalizable strategy profiles is almost as large as the original set of strategy profiles. □

Remark 6. (Technical.) The definition of rationalizability that I have given corresponds to what is sometimes called correlated rationalizability. In constructing the St, I am eliminating strategies that are strictly dominated at that stage, which is equivalent (Theorem 7) to eliminating strategies that are never a best response at that stage. As discussed in Remark 5, this equivalence requires considering σ−i that are correlated. There may be strategies at some stages that are best responses to correlated σ−i but not to any independent σ−i. If I were to delete these as well, I would end up with a set of strategy profiles that satisfies independent rationalizability. Independent rationalizability is the version of rationalizability originally proposed by Bernheim (1984) and Pearce (1984).


The set of independent rationalizable profiles is smaller than the set of correlated rationalizable strategy profiles, often strictly so. Note that if N = 2 then correlated and independent rationalizability are equivalent, since each player has only a single opponent. If N ≥ 3 then I find correlated rationalizability somewhat more persuasive than independent rationalizability. I discuss some of the reasons for holding this view in Section 9. Correlated rationalizability also yields a tidier theory. □

5.4 Weak dominance.

In game theory, one is frequently interested in weak, rather than strict, dominance.

Definition 5. σi weakly dominates σ̂i iff, for any σ−i ∈ ∆(S−i),

ui(σi, σ−i) ≥ ui(σ̂i, σ−i),

with strict inequality for at least one σ−i. σi is weakly dominated iff there exists a σ̂i that weakly dominates σi.

There is some ambiguity in the literature as to how to define what it means for a strategy to be weakly dominant. The obvious definition is that σi is weakly dominant iff it weakly dominates every other strategy. This is a bit too strong, however, because it implies that a strategy cannot be weakly dominant if it has a twin, identical in every way except for name. I adopt a slightly weaker definition: a strategy is weakly dominant if it always does at least as well as any other strategy. Under this weaker definition, there may be more than one weakly dominant strategy.

Definition 6. σi is weakly dominant iff, for any σ̂i and for any σ−i ∈ ∆(S−i),

ui(σi, σ−i) ≥ ui(σ̂i, σ−i).

Much as I defined SD via the iterated deletion of strictly dominated strategies, I can define SW via the iterated deletion of weakly dominated strategies. It should be evident that SW ⊂ SD. There is a well established tradition within game theory of rejecting any prediction σ for the game that puts positive probability on profiles outside of SW. But the argument for focusing on SW is not as secure as the argument for focusing on SD = SR.

One problem with SW is that it is not always clear how it should be defined. In constructing SR, it turns out not to matter whether at stage t I delete all strategies for all players that are strictly dominated at that stage or just some of the strictly dominated strategies for some of the players. But with SW, the order of deletion may matter. I have defined SW by deleting all weakly dominated strategies for all players at each stage, but this is somewhat arbitrary. One common solution is to define SW to be, instead, the set of strategies that survive the iterated deletion of weakly dominated strategies for some deletion order. A profile s may be in SW because it survives under one deletion order though it fails to survive under another, as the sketch below illustrates.
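To see concretely how the order of deletion can matter, consider the following illustrative game (mine, not from the notes), with rows T, M, B for player 1, columns L, R for player 2, and payoffs (u1, u2): (T, L) = (1, 1), (T, R) = (0, 0), (M, L) = (1, 1), (M, R) = (2, 1), (B, L) = (0, 0), (B, R) = (2, 1). A minimal sketch verifying that two deletion orders end in different places:

    # Payoffs (u1, u2) for the illustrative game described above.
    u = {("T", "L"): (1, 1), ("T", "R"): (0, 0),
         ("M", "L"): (1, 1), ("M", "R"): (2, 1),
         ("B", "L"): (0, 0), ("B", "R"): (2, 1)}

    def weakly_dominates(i, t, s, rows, cols):
        # Does t weakly dominate s for player i, given the survivors?
        cells = [((t, c), (s, c)) for c in cols] if i == 0 else \
                [((r, t), (r, s)) for r in rows]
        diffs = [u[a][i] - u[b][i] for a, b in cells]
        return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

    # Order 1: delete T (dominated by M); then L becomes dominated by R.
    assert weakly_dominates(0, "M", "T", "TMB", "LR")
    assert weakly_dominates(1, "R", "L", "MB", "LR")   # survivors {M,B} x {R}

    # Order 2: delete B (dominated by M); then R becomes dominated by L.
    assert weakly_dominates(0, "M", "B", "TMB", "LR")
    assert weakly_dominates(1, "L", "R", "TM", "LR")   # survivors {T,M} x {L}

Under the first order the survivors are {M, B} × {R}; under the second they are {T, M} × {L}. So without fixing an order, “the” set of iterated weak dominance survivors is not well defined.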


A more severe problem is that, as I discuss in Section 9, none of the usual game theory justifications (e.g. introspection, learning, the empirical evidence) provide general support for focusing on SW. One can make a fairly strong (but not ironclad) case for predicting that weakly dominated strategies will be seen at most only infrequently, but the exclusion of strategies that fail to survive the iterated deletion of weakly dominated strategies can be problematic. An alternative with stronger theoretical support is SWR; these are the strategies that survive one round of weak dominance deletion followed by the iterated deletion of strictly dominated strategies. SW ⊂ SWR ⊂ SR. Restriction to SW, rather than SWR, is often compelling, but not always. Rather than elevate restriction to SW, rather than SWR, as a general principle, I take the position that it is prudent to proceed on a case by case basis.


6 Nash Equilibrium.

6.1 Definition and interpretation.

Definition 7. σ∗ ∈ ∆(S) is a Nash equilibrium distribution iff σ∗ is independent and, for all i,

σ∗i ∈ BRi(σ∗−i).

If σ∗ is a Nash equilibrium distribution then (σ∗1, . . . , σ∗N ) is a Nash equilibrium.

The distinction between a Nash equilibrium and a Nash equilibrium distribution may become clearer when I discuss examples in Section 6.3.

If σ∗(s∗) = 1 for some pure strategy profile s∗ then s∗ is a pure strategy Nash equilibrium. Otherwise (σ∗1, . . . , σ∗N) is a mixed strategy Nash equilibrium. If supp(σ∗) = S then (σ∗1, . . . , σ∗N) is a fully mixed (or completely mixed) Nash equilibrium. Otherwise, (σ∗1, . . . , σ∗N) is a partly mixed Nash equilibrium. I sometimes refer to a Nash equilibrium as simply an equilibrium.

Two interpretations of a Nash equilibrium are standard. The most straightforward is that a Nash equilibrium describes actual intended behavior in the game. In a Nash equilibrium, each player i chooses a strategy σ∗i that is a best response to a belief σ∗−i that is correct. Note that this does not require that player i understand the game or actually have beliefs. All that is required is that player i acts as if he is best responding to beliefs that are in turn correct.

In the alternative interpretation, the focus is not on intended behavior but rather on some statistic characterizing play in some larger game. For example, one could imagine an N-player game being played by a large number of groups of N people. One can compute the frequency with which each strategy profile of the game occurs across the groups, and one can ask whether the resulting frequency distribution is a Nash equilibrium distribution.

The first interpretation is compatible with the second. If each group plays the same Nash equilibrium then, with high probability (certainty in the case of a pure equilibrium), the observed frequency distribution approximately equals the associated Nash equilibrium distribution. But the converse claim is false: it is possible for the empirical frequency across groups to be approximately that of a Nash equilibrium even if no group plays, even approximately, a Nash equilibrium.

For either interpretation, the obvious question is why one would think that a Nash equilibrium would arise. Briefly, here are three basic stories that are often told to justify Nash equilibrium.

The most straightforward justification for Nash equilibrium is preplay agreement: before the game begins, players negotiate an agreement on how to play. A necessary condition for this agreement to make sense is that no one has incentive to deviate from the agreement so long as everyone else adheres to it. That is to say, a necessary condition for the agreement to make sense is that it be a Nash equilibrium.


I am glossing over some subtleties here, but the most important difficulty with preplay agreement is that, in many strategic settings, players simply do not negotiate prior to the start of the game. So some other justification is needed for Nash equilibrium.

An alternative justification is the claim that if players are rational then, even if they have no prior experience playing this particular game with this particular group of opponents, they will nevertheless play a Nash equilibrium. This justification for Nash equilibrium is problematic. There is, of course, the issue of whether real players are rational. But quite apart from this, arguments that rationality implies Nash equilibrium invariably invoke strong auxiliary assumptions. There is controversy as to whether these auxiliary assumptions ought to be incorporated into the definition of rationality. There is broad, but not universal, consensus among game theorists that the answer is “no.”

The final possibility is that Nash equilibrium arises over time in a dynamic environment in which players learn and adjust. As I discuss in Section 9.4, there are many, many different learning theories. In some, players are modeled as highly sophisticated, in others as extremely naive. Broadly, most of these learning theories suggest the following general conclusions.

If the learning process converges then it converges to an equilibrium of some sort. What sort of equilibrium varies from setting to setting, but it may be a Nash equilibrium, a correlated equilibrium, or something called a self-confirming equilibrium. Moreover, the definition of “converges” also varies from setting to setting.

In any event, the learning process may not converge in any reasonable sense. It may, instead, wander around endlessly. For most learning theories, convergence obtains in some games but not in others. There are a few exceptions, learning theories that guarantee some form of convergence in all games. All of the theories that exhibit this sort of global convergence model players as comparatively unsophisticated. At this writing, all learning models that yield global convergence results with sophisticated players make strong auxiliary assumptions.

For more on the interpretation of and justifications for Nash equilibrium and other solution concepts, see Section 9.

6.2 Existence.

The following existence theorem first appeared in Nash (1950), along with the definition of what we now call a Nash equilibrium. Nash (1950) called it an “equilibrium point.”

Theorem 10. Every game has at least one Nash equilibrium.

Proof. Given the individual best response correspondences BRi, define the correspondence BR by BR(σ1, . . . , σN) = (BR1(σ−1), . . . , BRN(σ−N)). BR is a correspondence from ∏i ∆(Si) to itself. ∏i ∆(Si) is compact and convex, and one can show that BR is convex-valued and has a closed graph. Therefore, by the Kakutani Fixed Point Theorem, there is a (σ∗1, . . . , σ∗N) such that (σ∗1, . . . , σ∗N) ∈ BR(σ∗1, . . . , σ∗N). This (σ∗1, . . . , σ∗N) is a Nash equilibrium. □

The following trivial observation is often useful.

Theorem 11. Given a finite game, if σ∗ is a Nash equilibrium distribution then it is rationalizable.

Proof. This follows from Theorem 9 and the fact that if σ∗ is a Nash equilibrium distribution then supp(σ∗) is a best response set. □

Theorem 11 implies that if one is searching for Nash equilibria then one can abridge each strategy set Si to just SRi. If it is easy to compute SR and if SR is small relative to S then this can be a useful way to simplify the problem of computing Nash equilibria. The converse of Theorem 11 is false: just because a strategy is rationalizable doesn’t mean that the strategy gets positive probability in a Nash equilibrium; see Example 29.

Remark 7. Although a Nash equilibrium cannot (by Theorem 11) put positive probability on strategies that are strictly dominated, it can put positive probability on strategies that are weakly dominated. If one views such strategies as implausible then one views such equilibria as implausible. I give an example in Example 45. Deciding which, if any, Nash equilibria are plausible is a recurring theme in game theory. □

The bottom line is that existence theorems for non-finite games require auxiliaryassumptions. Fudenberg and Tirole (1991) provides a survey of classical results. Thestate of the art is represented by Simon and Zame (1990), Baye, Tian, and Zhou(1993), and Reny (1999). �

6.3 Examples.

Example 25. Consider the Prisoner’s Dilemma of Example 3. From Example 21, SR = {F} × {F}. Thus, by Theorem 11, the unique Nash equilibrium is the pure strategy equilibrium (F, F). The associated Nash equilibrium distribution is given in Figure 13. □


     C    F
C    0    0
F    0    1

Figure 13: The Nash equilibrium distribution for the Prisoner’s Dilemma.

Remark 9. A common error is to write the equilibrium (F, F) in Example 25 as (1, 1). This confuses the equilibrium, which is a strategy profile, with the equilibrium payoff profile. In this particular game, the mistake appears to be harmless. But in strategic form games derived from complicated extensive form games, discussed in Section 8, this sort of mistake can lead to serious error. So I take a hard line and insist that (1, 1) is not an equilibrium. □

Example 26. Consider Matching Pennies, introduced in Example 2. The unique Nash equilibrium is fully mixed and has each player randomize 50:50. The associated Nash equilibrium distribution is represented in Figure 14. □

      H      T
H    1/4    1/4
T    1/4    1/4

Figure 14: The Nash equilibrium distribution for Matching Pennies.

It is straightforward to verify that the Nash equilibrium in Example 26 is an equilibrium, but how does one find it in the first place? Pure strategy equilibria are usually easy to compute by simply making a record of which si are best responses to which s−i. Mixed strategy equilibria, on the other hand, are a problem, and there is no known algorithm that is “good” for finding even one mixed equilibrium, let alone all mixed equilibria, in large games; I return to this issue in Section 6.4.

For the case of 2 × 2 games, however, the analysis is fairly easy. One proceeds as follows. Suppose that the game is as in Figure 15.

        L                     R
T   u1(T,L), u2(T,L)    u1(T,R), u2(T,R)
B   u1(B,L), u2(B,L)    u1(B,R), u2(B,R)

Figure 15: A general 2 × 2 game.

If there is a mixed equilibrium in which player 1 randomizes then it must be that player 1 is indifferent between T and B (see Theorem 3). So, the question is, what σ2 makes player 1 indifferent?


Let q be the probability that player 2 plays L, q = σ2(L). Then the expected payoff to T is

qu1(T,L) + (1 − q)u1(T,R)

and the expected payoff to B is

qu1(B,L) + (1 − q)u1(B,R).

If player 1 is indifferent then these are equal.

qu1(T,L) + (1 − q)u1(T,R) = qu1(B,L) + (1 − q)u1(B,R). (2)

Let q∗ be the q for which this occurs. Then a bit of manipulation yields

q∗ = (u1(B,R) − u1(T,R)) / (u1(B,R) + u1(T,L) − u1(B,L) − u1(T,R)).

Similarly, player 2 randomizes only if she is indifferent, and the question is, what σ1 makes her indifferent? Let p be the probability that player 1 plays T. Then one can compute that player 2 is indifferent if p = p∗, where

p∗ = (u2(B,R) − u2(B,L)) / (u2(B,R) + u2(T,L) − u2(B,L) − u2(T,R)).

If there is only one fully mixed Nash equilibrium then in this equilibrium player 1 randomizes by choosing p = p∗ and player 2 randomizes by choosing q = q∗. If p∗ or q∗ are undefined (because the denominator is zero) or if they lie outside the relevant range (i.e. outside of (0, 1)) then the game requires more careful analysis. There may be no mixed strategy equilibria. Or there may be an infinite number of partly mixed equilibria, as in Example 45 in Section 8.8 below. Or, if the game is trivial, with ui(T,L) = ui(T,R) = ui(B,L) = ui(B,R), then the denominator may be zero even though every mixed strategy profile is a Nash equilibrium. In this case, there are an infinite number of fully mixed Nash equilibria.
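As a sketch, the closed-form expressions for p∗ and q∗ translate directly into code. The function below is my own illustration; it returns the candidate fully mixed equilibrium and flags the degenerate cases just described.

    def mixed_2x2(u1, u2):
        """Candidate fully mixed equilibrium of a 2 x 2 game.

        u1[r][c], u2[r][c]: payoffs with rows (T, B) = (0, 1) and
        columns (L, R) = (0, 1). Returns (p*, q*), where p* is the
        probability of T and q* the probability of L, or None if a
        denominator vanishes or the solution is not interior.
        """
        q_den = u1[1][1] + u1[0][0] - u1[1][0] - u1[0][1]
        p_den = u2[1][1] + u2[0][0] - u2[1][0] - u2[0][1]
        if q_den == 0 or p_den == 0:
            return None  # requires the more careful analysis described above
        q = (u1[1][1] - u1[0][1]) / q_den
        p = (u2[1][1] - u2[1][0]) / p_den
        if not (0 < p < 1 and 0 < q < 1):
            return None  # no interior solution
        return p, q

Applied to the game of Figure 16 below (u1 = [[5, 0], [2, 7]], u2 = [[4, 1], [2, 8]]), the function returns p∗ = 2/3 and q∗ = 7/10, the mixed equilibrium reported in Example 27.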

Remark 10. A source of confusion is that player 1's probability p∗ depends not on player 1's payoffs but on player 2's payoffs. Because of this, a common error is to compute p∗ correctly but then ascribe it incorrectly to player 2. Thus, a common error is to write that, in the mixed Nash equilibrium, player 1 plays T with probability q∗ and player 2 plays L with probability p∗. Instead, in the actual equilibrium, player 1 plays T with probability p∗ and player 2 plays L with probability q∗. □

Remark 11. A number of textbooks instruct students to find the mixed strategy equilibrium by the following calculus-like procedure. Write out player 1's expected payoff as a function of p and q to get

pqu1(T,L) + p(1 − q)u1(T,R) + (1 − p)qu1(B,L) + (1 − p)(1 − q)u1(B,R)


then differentiate this expression with respect to p and set this derivative equal to zero, yielding

qu1(T,L) + (1 − q)u1(T,R) − qu1(B,L) − (1 − q)u1(B,R) = 0

which is equivalent to equation (2). Thus, you can manipulate this equality to get the q∗ found above. That is, the calculus-like approach works.

But I recommend that you do not use the calculus-like approach. That there is something screwy with this approach should be evident from the fact that you found q∗ by taking the derivative with respect to p. You should find this at least somewhat troubling. Whose optimization problem are you solving, player 1's or player 2's?

What is going on is the following. In taking a derivative and setting it equal to zero you are assuming that the solution is interior. But this problem is linear in p (expected utilities are always linear in the probabilities) with the constraint p ∈ [0, 1], so the solution is either p = 0, p = 1, or every p ∈ [0, 1]. The assumption that the solution is interior is therefore equivalent to an assumption that every p ∈ [0, 1] is a solution. The assumption that every p ∈ [0, 1] is a solution implies, in particular, that player 1 is indifferent between T and B. But if that is your assumption, you can skip the rigmarole of writing down the expected payoff and taking the derivative and just write down the indifference expression, equation (2), directly, as I told you to do. Don't kid yourself that the calculus-like approach is more correct. It is just expressing the same assumption in a different, less transparent form. My experience is that the calculus-like approach often leads to error. □

Example 27. Consider the game in Figure 16.

        L      R
T     5, 4   0, 1
B     2, 2   7, 8

Figure 16: A game to illustrate Nash equilibrium.

For player 1, this is the same game as that of Figure 8. This game has three Nash equilibria. Two of the Nash equilibria are pure: (T, L) and (B, R). There is also a mixed strategy Nash equilibrium in which player 1 plays T with probability 2/3 and player 2 plays L with probability 7/10. The associated equilibrium distribution is given in Figure 17. □

Example 28. Consider the Battle of the Sexes game of Example 1. This game has three Nash equilibria. There are two pure Nash equilibria, (a, a) and (b, b), and one mixed Nash equilibrium in which player 1 plays a with probability 4/9 and player 2 plays a with probability 5/9. The Nash equilibrium distribution for the mixed strategy equilibrium is represented in Figure 18. □


        L        R
T     14/30    6/30
B      7/30    3/30

Figure 17: The Nash equilibrium distribution for the mixed strategy Nash equilibrium of the game in Figure 16.

        a        b
a     20/81    16/81
b     25/81    20/81

Figure 18: The strategy distribution for the mixed strategy equilibrium of Battle of the Sexes.

Example 29. Theorem 11 states that if a strategy gets positive probability in a Nash equilibrium then it must be rationalizable. The converse is false: in some games there are rationalizable strategies that never get positive probability in any equilibrium. Consider the game in Figure 19. In this game, SR = S: every strategy

        L      C      R
T     4, 4   2, 2   2, 2
M     2, 2   3, 0   0, 3
B     2, 2   0, 3   3, 0

Figure 19: A game in which some of the rationalizable strategies do not appear in any Nash equilibrium.

is rationalizable. But the unique Nash equilibrium is (T, L). □

Example 30. Consider the Cournot duopoly of Example 4. As discussed in Example 23, if firms have the cost function Ci(qi) = cqi with c > 0 then SR = {4 − c/3} × {4 − c/3}. Hence the Nash equilibrium is q∗1 = q∗2 = 4 − c/3. Another way to find this equilibrium is to restrict attention to qi < 12 − c and use the best response functions given in Example 12. If qi < 12 − c for all i then players are never indifferent. Therefore, any such equilibrium is pure. At such an equilibrium, q∗1 = BR1(q∗2) and q∗2 = BR2(q∗1). This gives two linear equations in two unknowns. Solving yields q∗1 = q∗2 = 4 − c/3. □

Example 31. Now consider the three firm Cournot game of Example 24. Here, the set of rationalizable strategy profiles is large: SR = [0, 6 − c/2] × [0, 6 − c/2] × [0, 6 − c/2]. So restriction to SR does not help much. But an argument analogous to that just given for the two-firm Cournot game shows that there is a Nash equilibrium: q∗1 = q∗2 = q∗3 = 3 − c/4.

More generally, if there are N firms then there is a Nash equilibrium of the Cournot game (with this particular demand function) with q∗i = (12 − c)/(N + 1) for all i. □

Example 32. The Bertrand game, introduced in Remark 2 in Section 4.2, is very badly behaved in that, for a large subset of opposing strategies, a best response does not exist. Nevertheless, the Bertrand game does have a Nash equilibrium: both firms charge 0, which is the competitive price (recall that I have assumed no production costs). □
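Returning to the Cournot computation, here is a minimal sketch, assuming the linear inverse demand P(Q) = 12 − Q that the stated equilibria imply (Example 4 itself is not reproduced in this section). Iterating the best response map, one firm at a time so that the iteration settles down, converges to the symmetric equilibrium.

    def cournot_equilibrium(n_firms, c, rounds=200):
        """Iterate best responses in the N-firm Cournot game.

        Assumes inverse demand P(Q) = 12 - Q and cost C_i(q_i) = c*q_i,
        so firm i's best response to the others' total output Q_other is
        BR_i(Q_other) = max(0, (12 - c - Q_other) / 2).
        """
        q = [0.0] * n_firms  # start from zero output
        for _ in range(rounds):
            for i in range(n_firms):  # sequential (one firm at a time) updates
                q[i] = max(0.0, (12 - c - (sum(q) - q[i])) / 2)
        return q

    # Two firms, c = 3: converges to (12 - 3)/3 = 3, i.e. 4 - c/3.
    print(cournot_equilibrium(2, 3))   # approximately [3.0, 3.0]
    # Three firms, c = 2: q_i = (12 - c)/(N + 1) = 2.5.
    print(cournot_equilibrium(3, 2))   # approximately [2.5, 2.5, 2.5]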

6.4 Counting Nash equilibria.

As discussed in Example 48, repeated games, which are infinite games, typically have an infinite number of Nash equilibria. How many equilibria are typical in finite games? A lower bound on how large the set of Nash equilibria can possibly be in finite games is provided by L × L games (two players, each with L strategies) in which, in the game box representation, payoffs are (1, 1) along the diagonal and (0, 0) elsewhere. In such games, one can show that there are L pure strategy Nash equilibria, corresponding to play along the diagonal, and an additional 2^L − (L + 1) fully or partly mixed Nash equilibria. The total number of Nash equilibria is thus 2^L − 1. This example is robust; payoffs can be perturbed slightly and there will still be 2^L − 1 Nash equilibria.
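A quick numerical check is possible for small L. The sketch below is mine; it only verifies that the 2^L − 1 profiles just described (uniform mixing over each nonempty subset of the diagonal) are indeed equilibria, not that there are no others.

    import itertools

    def is_equilibrium(u1, u2, p, q, tol=1e-9):
        """Check the best-response conditions for mixed strategies p (row), q (column)."""
        L = len(p)
        pay1 = [sum(q[j] * u1[i][j] for j in range(L)) for i in range(L)]
        pay2 = [sum(p[i] * u2[i][j] for i in range(L)) for j in range(L)]
        v1 = sum(p[i] * pay1[i] for i in range(L))
        v2 = sum(q[j] * pay2[j] for j in range(L))
        return v1 >= max(pay1) - tol and v2 >= max(pay2) - tol

    L = 3
    u1 = [[1 if i == j else 0 for j in range(L)] for i in range(L)]
    u2 = u1  # (1, 1) on the diagonal, (0, 0) off it

    count = 0
    for k in range(1, L + 1):
        for support in itertools.combinations(range(L), k):
            # uniform mixing over the support, for both players
            p = [1 / k if i in support else 0 for i in range(L)]
            if is_equilibrium(u1, u2, p, p):
                count += 1
    print(count)  # 7 == 2**3 - 1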

This is extremely bad news. First, it means that the maximum number of Nash equilibria is growing exponentially in the size of the game. This establishes that the general problem of calculating all of the equilibria is computationally intractable. Second, it suggests that the problem of finding even one Nash equilibrium may, in general, be computationally intractable. The latter, however, is still merely a conjecture. Note that the issue is not whether algorithms exist for finding one equilibrium, or even all equilibria. For finite games, there exist many such algorithms. The problem is that the time taken by these algorithms to reach a solution can grow explosively in the size of the game.

An additional fact about Nash equilibria is sometimes of technical use. In finite games, the number of equilibria is "typically" finite and odd. I first present some examples.

Example 33. The game of Example 27 has three Nash equilibria. □

Example 34. The game in Figure 20 has two Nash equilibria, (a, a) and (b, b). □

Example 35. The entry deterrence game of Example 45 has an infinite number of Nash equilibria. □

The sense in which the number of equilibria is "typically" finite and odd is as follows. Consider a 2 × 2 game. There are four pure strategy profiles, |S| = 4, and


        a      b
a     1, 1   0, 0
b     0, 0   0, 0

Figure 20: A game with two Nash equilibria.

hence, to describe the payoff functions, I must specify four payoffs for each player, or eight payoffs in all. I can thus describe the payoff functions as an element of R^8. The set of all payoff functions for a 2 × 2 game form is simply all of R^8. More generally, the set of all payoff functions for a finite game form is R^{N|S|}. One can show that the subset of R^{N|S|} for which the number of equilibria is not finite and odd is "thin" in the sense that an arbitrarily small bump to payoffs can transform the game into one where the number of equilibria is finite and odd. One can show that Example 34 and Example 35 are, in this sense, not robust. A practical implication of this, and the main reason why I bother to mention this topic in the first place, is that if you are trying to find all the equilibria of the game, and so far you have found four, then you are probably missing at least one.

The statement that the number of equilibria is "typically" finite and odd is, however, subject to an important qualification. The definition of "typical" sketched above is not appropriate for many games, especially games with a non-trivial dynamical structure. I postpone further discussion of this issue until I have discussed extensive form games; see Remark 12 in Section 8.8.


7 Correlated Equilibrium.

Given a strategy distribution σ, not necessarily independent, let σi be the marginal distribution for player i and suppose that σi(si) > 0. Then σ−i|si, the marginal distribution over S−i conditional on si, is defined by

σ−i|si(s−i) = σ(si, s−i) / σi(si).

The following equilibrium concept was introduced in Aumann (1974).

Definition 8. σ ∈ ∆(S) is a correlated equilibrium distribution iff, for all i, if σi(si) > 0 then

si ∈ BRi(σ−i|si).

Recall from Section 3.2 and Section 3.3 that if σ is not independent then the marginals σi do not have an interpretation as mixed strategies. Therefore, if σ is not independent then it does not make sense to refer to the profile of marginals, (σ1, . . . , σN), as a correlated equilibrium. But it is common to abuse terminology and refer to the correlated equilibrium distribution σ as a correlated equilibrium.

Since the definition of correlated equilibrium distribution allows σ to be independent, it is easy to verify that any Nash equilibrium distribution is a correlated equilibrium distribution, but not, in general, vice versa. Therefore, Theorem 10 implies the following trivial corollary.

Theorem 12. Every finite game has at least one correlated equilibrium distribution.

It is also possible to prove Theorem 12 by elementary means (without resorting to fixed point theorems).

Theorem 13. If σ is a correlated equilibrium distribution then σ is rationalizable.

Proof. The proof is the same as the proof of Theorem 11. □

Given a game, let ΣR denote the set of strategy distributions for which only rationalizable strategies get positive probability, let ΣNash denote the set of Nash equilibrium distributions, and let ΣCorr denote the set of correlated equilibrium distributions. ΣR constitutes either all of ∆(S) (if all strategies are rationalizable) or some lower dimensional face of ∆(S). ΣCorr is a convex set called a polytope (think of a cut diamond) that lies in ΣR. ΣNash, in turn, lies somewhere inside of ΣCorr but, in general, its geometric structure is not as tidy as that of either ΣR or ΣCorr.

A distribution over Nash equilibrium distributions is an element of ∆(ΣNash). Thus, for example, if σ and σ′ are both Nash equilibrium distributions then an element of ∆(ΣNash) might put probability 1/2 on σ and probability 1/2 on σ′. Example 36 will illustrate this for Battle of the Sexes. Since a Nash equilibrium


distribution is a lottery over S, an element of ∆(ΣNash) is a compound lottery over S. Reducing this compound lottery, I abuse notation and view an element of ∆(ΣNash) as simply a lottery over S. With this abuse of notation, it is easy to verify that

∆(ΣNash) ⊂ ΣCorr.

In words, the set of correlated equilibria includes all mixtures over Nash equilibria. A natural question is whether, in fact, the set of correlated equilibria equals the set of all mixtures over Nash equilibria: does ∆(ΣNash) = ΣCorr? The answer is "no, not in general." I will illustrate this with Example 37.

Example 36. Consider once again the Battle of the Sexes game of Example 1. All of the Nash equilibrium distributions identified for this game in Example 28 are likewise correlated equilibrium distributions. In addition, any probability distribution over Nash equilibrium distributions is a correlated equilibrium distribution. Thus, in particular, the distribution shown in Figure 5 (Example 6) is a correlated equilibrium distribution. In this distribution, players randomize 50/50 between the two pure strategy equilibria, (a, a) and (b, b). Under this correlated equilibrium distribution, each player gets an expected payoff of 9. In contrast, the expected payoff in the mixed strategy equilibrium is also symmetric (i.e. each player gets the same expected payoff) but each player gets only 40/9 ≈ 4.4. □

Example 37. Figure 21 shows an example of a game called Chicken.

        J      D
J     6, 6   2, 7
D     7, 2   0, 0

Figure 21: A game of Chicken.

The game was inspired by the movie Rebel Without a Cause, in which troubled 1950s teenagers competed to see who would be the last to jump to safety when driving cars towards a cliff. In this stylized representation, players can either jump (J) or drive (D). The preferred outcome is to have the other guy jump, giving you a payoff of 7 versus 2 for the chicken who jumps. If neither player jumps then both die, which is worth zero.7

You can verify that this game has three Nash equilibria: (J, D), (D, J) and the mixed strategy equilibrium in which each player jumps with probability 2/3. All mixtures over the corresponding Nash equilibrium distributions are correlated equilibrium distributions, but there are also other correlated equilibrium distributions. One of them is shown in Figure 22. This distribution yields an expected payoff of 5

7Shouldn't the payoff from death be −∞? As I discuss in the notes on decision theory, in the context of expected utility, a payoff of −∞ for death is not as sensible as it might at first appear. I have to choose some number to represent death, and here I have chosen 0.


        J      D
J     1/3    1/3
D     1/3     0

Figure 22: A correlated equilibrium distribution for Chicken.

to each player, so that the two together get 10. In contrast, the pure Nash equilibria give only 9 and the mixed Nash equilibrium gives only 28/3 ≈ 9.3. So any mixture over Nash equilibria likewise yields strictly less than 10. This establishes, among other things, that the distribution in Figure 22 cannot be generated by mixing over Nash equilibria. □

Example 38. Consider Matching Pennies, introduced in Example 2. The unique Nash equilibrium is for each player to randomize 50/50. The associated strategy distribution is shown in Figure 14. This is also the unique correlated equilibrium distribution. In this game, the set of Nash equilibrium distributions coincides with the set of correlated equilibrium distributions. □
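As a sketch of how Definition 8 can be checked mechanically, the code below (the helper function is mine, written for the two-player case) verifies that the Chicken distribution of Figure 22 is a correlated equilibrium distribution.

    def is_correlated_equilibrium(u1, u2, sigma, tol=1e-9):
        """Check Definition 8 for a two-player game.

        sigma[i][j]: probability of the profile (row i, column j).
        Each row i with positive marginal must be a best response to
        the conditional distribution over columns given i; symmetrically
        for each column j.
        """
        rows, cols = range(len(u1)), range(len(u1[0]))
        for i in rows:
            marg = sum(sigma[i][j] for j in cols)
            if marg > 0:
                # expected payoff of deviating to row r when "recommended" i
                cond = [sum(sigma[i][j] * u1[r][j] for j in cols) / marg for r in rows]
                if cond[i] < max(cond) - tol:
                    return False
        for j in cols:
            marg = sum(sigma[i][j] for i in rows)
            if marg > 0:
                cond = [sum(sigma[i][j] * u2[i][c] for i in rows) / marg for c in cols]
                if cond[j] < max(cond) - tol:
                    return False
        return True

    # Chicken (Figure 21) and the distribution of Figure 22.
    u1 = [[6, 2], [7, 0]]
    u2 = [[6, 7], [2, 0]]
    sigma = [[1/3, 1/3], [1/3, 0]]
    print(is_correlated_equilibrium(u1, u2, sigma))  # True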



8 Games in Extensive Form.

The goal is to represent games with a dynamic structure, chess for example. The general formalism for such games is cumbersome. My discussion, therefore, is informal. For a formal treatment, see Fudenberg and Tirole (1991); Osborne and Rubinstein (1994) provides an alternative formalism.

8.1 Extensive forms.

8.1.1 Perfect information.

To keep things simple, I start by discussing games like chess in which each player acts in turn, fully informed of the previous actions in the game. Such games are called games of perfect information. I then complicate things in two ways. I allow players to be only partially informed of previous play in the game. And I introduce a new "player" called Nature who can introduce randomness into the game.

An extensive form of perfect information consists of the following components. There is a set I of players. There is a finite set X of nodes, which are the points in the game at which either (a) players take actions or (b) play terminates. X is partitioned into the disjoint sets X1, . . . , XN, Xτ. Xi are the decision nodes for player i, the nodes at which player i must act. Xτ are the terminal nodes, the nodes at which the game ends.

For each node x ∈ Xi there is a set Aix, the actions available to player i at x. Let Ai = ⋃x∈Xi Aix be the set of all of player i's possible actions.

There is an initial decision node x0, typically a decision node for player 1 but possibly a decision node for one of the other players. Other nodes are then reached depending on the actions chosen. Formalizing this turns out to be tedious and so I merely illustrate with an example.

Figure 23: An entry deterrence extensive form.

In the game shown in Figure 23, there are two players, an industry incumbent (player 2) and a potential entrant (player 1). Player 1 moves first, at the decision node labeled x0, and chooses one of two actions, In (enter the industry) or Out (don't


enter the industry). If player 1 chooses In then play moves to the decision node labeled x1 and it is player 2's turn. If player 2 chooses Fight (launch a price war) then play ends at the terminal node labeled x3. If player 2 chooses Acc ("Accommodate," don't launch a price war) then play ends at the terminal node labeled x4. Finally, if player 1 chooses Out then play ends at the terminal node labeled x2. As a convention, I have marked the decision nodes, but not the terminal nodes, with dots.

In the game shown in Figure 23, the predecessors of x4 are x1 and x0. x1 is the immediate predecessor of x4. The successors of x0 are x1, x2, x3, and x4. The immediate successors of x0 are x1 and x2.

In general, the decision node x0 is the unique node with no predecessors. The terminal nodes have no successors and they are the only nodes with this property. x0 is a predecessor of every other node. A node can have at most one immediate predecessor.

These assumptions guarantee that play cannot cycle. Rather, play starts at x0 and ends at some terminal node, passing through each node at most once. Every terminal node is thus uniquely identified with the play path (or path of play) taken to reach that node. For example, x3 is identified with the play path (x0, x1, x3). I use "play path" and "terminal node" interchangeably. One can also identify a play path by the sequence of actions taken. Thus x3 is identified by (In, Fight). The play path taken in a chess match is reported in this way.

One last remark. Just as game boxes are convenient for small games but useless for large games, so pictures like that in Figure 23 are convenient for small extensive forms but useless for large or complicated extensive forms. The abstract formalism does not require that one be able to draw a picture to have a well defined extensive form.

8.2 Information sets.

To capture strategic environments in which players are only partially informed of what has happened in the game to date, I assume that for each player i, the set of i's decision nodes is partitioned into information sets. Hi denotes the set of information sets for player i. Thus Xi = ⋃h∈Hi h and, for any h, h′ ∈ Hi, if h ≠ h′ then h ∩ h′ = ∅. As usual, I illustrate with a picture rather than provide a complete formalization.

See Figure 24. Player 1 moves first and chooses either "top," "middle," or "bottom." If he chooses "bottom" then play moves to x3 and player 2 chooses either "in" or "out," after which the game is over. But if player 1 chooses either "top" or "middle" then play moves to one of the two decision nodes, labeled x1 and x2, inside the oval. The oval represents the information set containing x1 and x2. The interpretation is that if this information set is reached then player 2 knows that he is either at decision node x1 or x2 but he does not know which node he is at. He may have opinions based on how he thinks player 1 would play, but he does not observe directly whether player 1's action was "top" or "middle." All he directly observes is that player 1's action was not "bottom."


[Figure 24, not reproduced here: player 1 chooses top, middle, or bottom at x0; top and middle lead into the information set {x1, x2}, where player 2 chooses left, center, or right; bottom leads to x3, where player 2 chooses in or out.]

Figure 24: An extensive form with a non-trivial information set.

In general, the information set containing x0 is a singleton. Extensive forms of perfect information are extensive forms in which all information sets are singletons. If an extensive form is not one of perfect information then it is one of imperfect information.

For any two distinct decision nodes x and x′ in an information set, the action sets Aix and Aix′ are the same. The reason for this is that if Aix and Aix′ were different then player i would be able to observe which node he was at merely by observing which action set was available. This is contrary to the intended interpretation of information sets. I therefore let Aih denote the set of actions available at every node x ∈ h.

Path of play is defined for general extensive forms exactly as it is for extensive forms of perfect information.

8.3 Extensive form strategies.

In an extensive form game, a strategy for player i specifies an action at every information set. Formally a strategy is a function

si : Hi → Ai

such that, for any h ∈ Hi, si(h) ∈ Aih, where si(h) is the action chosen at h. One can think of a strategy as a complete set of instructions that could be handed to an agent, who could then execute the strategy on behalf of the player. Because the strategy specifies an action for every one of i's information sets, the strategy provides instruction for how the agent should act no matter how the game unfolds.

Exactly as for strategic forms, let Si be the set of player i's strategies, let S = ∏i Si be the set of strategy profiles, and let S−i = ∏j≠i Sj be the set of strategy profiles for players other than i.

Given strategies si for each of the players, the strategy profile s = (s1, . . . , sN)

determines a path of play (terminal node). Say that a node x is reachable under s if x is contained in the path of play determined by s. Say that an information set h is reachable under s if h contains a node that is reachable under s.

Of all the concepts in game theory, the concept of extensive form strategy may be the one that causes the most confusion. Two observations may be helpful.

First, the most straightforward interpretation of a strategy is that the player decides before the game is played how he will play under every possible contingency, no matter how unlikely, and then simply executes this plan, or has an agent execute the plan for him. This interpretation seems farfetched. One typically thinks of real people as making decisions about how to play dynamically, as the game proceeds. Player i may not make up his mind about what to do at information set h ∈ Hi unless play actually reaches h. It turns out, however, that for much, but not quite all, of game theory, the distinction between a player who decides on an extensive form strategy before play begins and a player who decides on actions dynamically as play unfolds is irrelevant. For much of game theory, one can model a player as if he were making all his decisions up front even though he is actually deciding dynamically. After all, the player eventually chooses some action at every information set that he reaches, and the strategy can be thought of as merely recording what that action will be. One may also want the strategy to reflect the fact that players, especially dynamic players, might randomize at some information sets. I discuss this in Section 8.4.

Second, a common error in writing down extensive form strategies is to fail to specify actions for all of the player's information sets. Consider Figure 25.

[Figure 25, not reproduced here: player 1 chooses Blue or Red at x0; after Blue, player 2 chooses Sweet or Sour; after Red, player 2 chooses Hot, Cold, or Warm.]

Figure 25: Player 1 has two strategies. Player 2 has six.

Player 1 has just two strategies, Blue and Red. Player 2 has six strategies, one of which is Sweet if Blue and Warm if Red. It is common for people just learning game theory to state that Sweet if Blue is a strategy. This is wrong because we also need to



specify an action for player 2 should player 1 choose Red. This is true even if our prediction is that player 1 will play Blue. There are two reasons for insisting that player 2's strategy be complete in this way. First, I view the case in which player 1 is certain to play Blue as a limit case. In reality, there is a positive, if possibly very small, probability that player 1 will play Red. Second, in thinking through what she wants to do, player 1 must consider how player 2 would respond if player 1 were to play Red as opposed to Blue. That means player 1 must contemplate player 2's strategy, not just the strategy fragment Sweet if Blue.

A related, but more forgivable, error can be illustrated using Figure 26.

[Figure 26, not reproduced here: player 1 chooses top or bottom at x0; bottom ends the game immediately; after top, player 2 chooses left or right; player 1 moves again at x4, choosing high or low.]

Figure 26: An extensive form in which one player can move more than once.

Strictly speaking, (bottom, high) and (bottom, low) are distinct strategies for player 1. This seems bizarre. Player 1 can get to node x4, and choose between high and low, only if she chooses top at x0. If, as here, she chooses bottom, why specify what she will do at her x4? Without belaboring the point, the answer is that in some cases it is indeed common to write both (bottom, high) and (bottom, low) as simply bottom. But there are also circumstances in which one needs to keep track of the full strategy. I need to do so, for example, in order to work with subgame perfection, which I discuss later in the course. For the time being, you should view the distinction between (bottom, high) and (bottom, low) as a harmless nuisance.

One way to help avoid error in thinking about extensive form strategies is to learn how to count strategies. The number of strategies for player i is the number of different ways to assign actions to information sets, which is

∏h∈Hi |Aih|.

Thus, for the extensive form in Figure 25, the number of strategies for player 2 is 2 × 3 = 6, as claimed. Similarly, in Figure 26, player 1 has 2 × 2 = 4 strategies. Note that the number of terminal nodes in the game of Figure 26 is 6. I know of no general, useful relationship between the number of terminal nodes and the number of strategies.

40

Page 41: GameTheoryNC17

Example 39. Consider "two move chess." White moves first, then black, then the game is over. White has 20 actions and hence 20 strategies: each of the eight pawns can move in one of two ways, and each of the two knights can move in one of two ways. Black has 20 information sets and at each information set she has 20 actions. Therefore, Black has 20^20 ≈ 10^26 strategies. In contrast, two move chess has 20 × 20 = 400 terminal nodes. □

Example 40. Consider the game of Figure 24. Player 1 has three strategies, top, middle, or bottom. Player 2 has six strategies. One of these is (center, out), meaning center in the upper information set, out in the lower information set. Note that, since x1 and x2 are contained within the same information set, I do not specify actions for x1 and x2 separately. □

8.4 Probability distributions in extensive forms.

Exactly as for strategic forms, one can consider ∆(S), the set of probability distributions over strategy profiles, ∆(Si), the set of probability distributions over player i's strategies, and ∆(S−i), the set of probability distributions over profiles for players other than i.

σ ∈ ∆(S) determines a probability distribution over paths of play (terminal nodes). I give an example below. Say that a node x is reachable under σ iff x is reachable under a strategy profile s for which σ(s) > 0. I say that an information set h is reachable under σ iff h contains a decision node that is reachable under σ.

Assuming that σ is independent, there are two ways to think of how player i might randomize. The first is that player i chooses a mixed strategy, which is simply an element σi ∈ ∆(Si). In effect, player i tosses a coin (or a die, or spins a wheel, etc.) before play begins to determine an extensive form strategy, then executes that strategy. For example, consider Figure 26. Suppose that σ1 puts probability 1/4 on the strategy (top, high), probability 1/4 on the strategy (top, low), probability 1/2 on the strategy (bottom, high), and probability zero on (bottom, low). Suppose σ2 puts probability 1/2 on both left and right. Then the distribution over terminal nodes is as follows. Node x3 is reached with probability 1/4, nodes x4 and x5 are each reached with probability 1/8, and node x2 is reached with probability 1/2. In this example, all decision nodes, and hence information sets, are reachable under σ.

The alternative way to think about randomization is that player i randomizes dynamically, at her information sets. Formally, a behavior strategy for player i is a function

σbi : Hi → ⋃h∈Hi ∆(Aih)


such that σbi(h) ∈ ∆(Aih).8,9 σbi(h)(ai) is the probability of ai under the randomization σbi(h) ∈ ∆(Aih). A strategy si can be thought of as a degenerate behavior strategy: if si(h) = ai then σbi(h)(ai) = 1.

(σbi, σ−i) determines a distribution over paths of play just as (σi, σ−i) does. By way of example, consider again Figure 26. Suppose that, at x0, σb1 puts probability 1/2 on both top and bottom and, at x4, σb1 puts probability 1/2 on both high and low. As before, assume that σ2 puts probability 1/2 on both left and right. Then the distribution over terminal nodes is as follows. Node x3 is reached with probability 1/4, nodes x4 and x5 are each reached with probability 1/8, and node x2 is reached with probability 1/2.

Note that the example using mixed strategies and the example using behavior strategies generated the same probability distribution over terminal nodes. This reflects a general fact. If an extensive form satisfies a property called perfect recall then for any behavior strategy σbi there is an equivalent mixed strategy σi, where by "equivalent" I mean that, for any σ−i, (σbi, σ−i) and (σi, σ−i) generate the same probability distribution over terminal nodes. And conversely, for any mixed strategy there is an equivalent behavior strategy.10 This result is known as Kuhn's Theorem and first appeared in Kuhn (1964). I spare you a definition of perfect recall. It is a property satisfied by virtually every extensive form used in practice.11 Kuhn's theorem is another aspect of the fact that it is in many respects without loss of generality to assume that decisions about how to play are made up front, before play begins, rather than dynamically, as play proceeds. Whether one works with mixed strategies or behavior strategies is largely a matter of convenience.

Finally, I mention in passing that there is a mathematical difficulty that arises when one tries to define behavior strategies in extensive forms with continuum action sets. The problem was first noted in Aumann (1964), which also proposed a solution. Another solution was proposed in Milgrom and Weber (1985). For a readable discussion of the difference between the two solutions, see Fudenberg and Tirole (1991).

8There is a subtlety here. (You may wish to skip this footnote.) In assuming that a player can execute a behavior strategy, one is implicitly assuming that the opponents can observe at most the action taken, and not the randomization itself. Thus, for example, if the choice is between left and right the opponent might observe that the action was left but not whether the probabilities used for randomization were 50/50 rather than 70/30. More formally, if a player can execute a behavior strategy then the true extensive form must specify, for each information set, not only the available actions but also the available probability mixtures over actions. The extensive form must also include a notional player, "Nature," discussed in Section 8.5, to determine what action is actually realized at each information set for each player. Finally, information sets must be constructed so that opponents do not directly observe the mixtures chosen, only, at most, the actions realized.

9A number of authors use the term behavioral strategy rather than behavior strategy. The terms are equivalent.

10Note that I have not claimed that there is a unique equivalent strategy or behavior strategy. There will be many.

11There has, however, been some recent research interest in games without perfect recall.



8.5 Nature.

I model various kinds of uncertainty in the game by introducing a new notional player, player 0, called Nature. Nature's information sets are singletons. By assumption, Nature chooses a behavior strategy σb0 that is part of the description of the extensive form. For example, in modeling poker I view Nature as determining the hands of the players.

Example 41. Figure 27 shows a trivial game with nature. Player 1 has two actions.

Figure 27: A game with nature and one player.

If he chooses down the game is over. If he chooses up then nature randomizes 50/50 between left and right. Note that I have stopped labeling nodes explicitly. Henceforth I label nodes only occasionally. □

8.6 Extensive form games.

Just as strategic forms were transformed into strategic form games by adding payoff functions ui : S → R, so extensive forms are transformed into extensive form games by adding payoff functions vi : Xτ → R; recall that Xτ is the set of terminal nodes. No payoff function for Nature is required.

Example 42. The entry deterrence extensive form of Figure 23 becomes an extensive form game by adding payoffs. See Figure 28. Thus, if player 1 enters and player 2 accommodates then player 1 gets a payoff of 16 − K while player 2 gets a payoff of 16. K is a setup or entry cost, and I assume K ∈ (0, 16). □

8.7 Extensive form games ⇔ strategic form games.

Any extensive form game determines a strategic form game as follows. The set of strategic form players is I, and typically is assumed to exclude Nature. The set of strategic form strategies for player i is simply the set of extensive form strategies, Si. It remains to specify payoffs.


[Figure 28, not reproduced here: the entry deterrence tree of Figure 23 with payoffs (−K, 0) at the terminal node following Fight, (16 − K, 16) following Acc, and (0, 36) following Out.]

Figure 28: An entry deterrence game.

Consider a strategy profile s = (s1, . . . , sN). If there are no moves by Nature then s determines a terminal node x ∈ Xτ. For the strategic form, set the payoff ui(s) = vi(x).

Example 43. Consider again the extensive form game of Figure 28. The associated strategic form game box, with player 1 being row and player 2 being column, is shown in Figure 29. □

         Fight      Acc
In      −K, 0     16 − K, 16
Out     0, 36      0, 36

Figure 29: An Entry Deterrence Game

If Nature is present then the true strategy profile is (σb0, s1, . . . , sN), which yields a probability distribution over terminal nodes. Given this distribution, set ui(s) equal to the expectation of vi(x).

Example 44. Figure 30 gives the extensive form for a very simple game with one player in addition to nature. As indicated, Nature is equally likely to choose either of her two actions. Therefore, in this one player game, u1(up) = (1/2)(3) + (1/2)(−1) = 1 while u1(down) = 0. □

Thus any extensive form game generates a strategic form game. The converse is also true, but I do not pursue that issue.

8.8 Nash equilibria in extensive form games.

Definition 9. The Nash equilibria of a game in extensive form are the Nash equilibria of the associated game in strategic form.

Similar statements hold for rationalizable strategy profiles and correlated equilibria.


[Figure 30, not reproduced here: player 1 chooses up or down; down yields payoff 0; after up, Nature chooses 50/50 between payoffs 3 and −1.]

Figure 30: An extensive form game with nature.

Example 45. Consider the entry deterrence game of Example 42. The associated strategic form game is exhibited in Example 43. This game has two pure Nash equilibria, (In, Acc) and (Out, Fight), and a continuum of partly mixed Nash equilibria. In the latter, the entrant chooses Out and the incumbent chooses Fight with probability at least 1 − K/16. Intuitively, if the incumbent threatens to play Fight with high enough probability then the entrant does not enter. The pure equilibrium (Out, Fight) as well as the partly mixed equilibria are implausible because Fight is weakly dominated by Acc (the threat of Fight is not credible), but these are Nash equilibria. □

Remark 12. The fact that the entry deterrence game has a continuum of Nash equilibria appears to contradict the claim made in Section 6.4 that the number of Nash equilibria in a finite game is typically finite and odd. As I stated at the time, the claim must be qualified because it rests on a definition of "typical" that is not always sensible, especially for games, like the entry deterrence game, with an explicit dynamic structure. For example, in the entry deterrence game of Example 42, there are three outcomes – no entry, entry and fighting, and entry and accommodation – and these in turn generate the payoffs. Given that there are only three outcomes, even though there are four pure strategy profiles, the appropriate space of payoffs is not R^8 but R^6. While the entry deterrence game is not robust in R^8, it is robust in R^6. In this sense, the fact that the entry deterrence game has an infinite number of equilibria is robust.

There is, however, one piece of good news. Notice that in the entry deterrence game, although the set of Nash equilibria is infinite, the equilibria generate only two possible distributions over outcomes, both degenerate. In one distribution, the outcome is no entry and in the other the outcome is entry and accommodation. This phenomenon generalizes, although not quite as completely as one might hope; see Govindan and McLennan (1998). □


8.9 Repeated games.

People often face the same, or at least similar, decision problems repeatedly. A repeated game is an idealized representation of this sort of strategic environment. In a repeated game, literally the same people play literally the same game over and over. I focus here on infinitely repeated games of perfect monitoring. I discuss some alternative models of repeated interaction at the end of this subsection.

Fix a strategic form game G. In the repeated game based on G, G is called the stage game or the one-shot game. To avoid confusion with strategies in the repeated game, I refer to strategies in the stage game G as actions. Ai is the set of actions for player i in the stage game. A is the set of action profiles in the stage game.

In the repeated game, time is divided into periods 1, 2, 3, . . . , with G played in each period. A t-period history records the action taken by each player in every period 1, 2, . . . , t. Formally, a t-period history is an element of A^t.

The assumption of perfect monitoring means that player i's information set in period t + 1 is uniquely identified by the t-period history of the game to date. Thus, under perfect monitoring, at date t + 1 player i knows the actions taken by his opponents in every preceding period. Let H be the set of all possible finite histories, which is equivalent to the set of all information sets. For technical convenience, I include in H a special abstract element h0, which is the empty history at the beginning of period 1.

Because information sets are identified with histories, I can define strategies using histories rather than information sets. Formally, a pure strategy for player i is a function

si : H → Ai

giving player i's action for every possible history. The action taken in the first period is simply si(h0). As usual, Si is the set of player i's strategies in the repeated game and S = ∏i Si is the set of strategy profiles. Unless Ai is trivial (contains only one action), Si is uncountably infinite.

Example 46. The most famous repeated game by far is the repeated prisoner's dilemma (Example 3). As noted above, the set of strategies in the repeated prisoner's dilemma is uncountably infinite. But four are well known. They are always cooperate (play C in every period, regardless of history), always fink (play F in every period, regardless of history), tit for tat (play C initially and thereafter play in period t + 1 whatever action your opponent took in period t), and grim (play C every period provided neither player ever plays F; if either player plays F in any period then play F in every subsequent period). □
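These four strategies are easy to express as functions from histories to actions, in the spirit of si : H → Ai. The following sketch uses my own encoding of a history as a list of past action pairs, with the player's own action listed first.

    def always_cooperate(history):
        return "C"

    def always_fink(history):
        return "F"

    def tit_for_tat(history):
        # Play C initially; thereafter copy the opponent's last action.
        if not history:
            return "C"
        return history[-1][1]  # opponent's action in the previous period

    def grim(history):
        # Play C unless either player has ever played F.
        if any("F" in profile for profile in history):
            return "F"
        return "C"

    def play(strategy1, strategy2, periods):
        """Generate the path of play when strategy1 meets strategy2."""
        h1, h2 = [], []  # each player's view: (own action, opponent's action)
        path = []
        for _ in range(periods):
            a1, a2 = strategy1(h1), strategy2(h2)
            h1.append((a1, a2))
            h2.append((a2, a1))
            path.append((a1, a2))
        return path

    print(play(grim, grim, 5))         # [('C', 'C')] * 5 : mutual cooperation
    print(play(grim, always_fink, 3))  # [('C', 'F'), ('F', 'F'), ('F', 'F')]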

To describe the outcome of the repeated game, I focus exclusively on the path of play, which is the infinite history that records the action taken by each player in every period 1, 2, . . . . Formally, a path of play is an element of A∞. Given a path


of play ζ, let ζt be the action profile taken in period t. Thus ζt ∈ A. A strategy profile s in the repeated game determines a path of play, which I denote ζ(s).

Finally, in a discounted repeated game, payoffs are defined as follows. Let vi denote player i's payoff function in the stage game. For each player i, there is a discount factor δi ∈ (0, 1). I give interpretations of δi below. Then player i's payoff function in the discounted repeated game is ui : S → R given by

ui(s) = ∑_{t=1}^{∞} δi^{t−1} vi(ζt(s)).

This can be extended to mixed strategy profiles but I do not do so explicitly. Thus, player i's payoff in the repeated game is the discounted sum of his payoffs in each period.

δi has two interpretations. The first is that player i is impatient and has a rate of time preference ri. Then δi = 1/(1 + ri). The second interpretation is that in every period there is a constant probability 1 − δ that the game ends (in which case, since 1 − δ > 0, the game eventually ends with probability 1). In this case, with δi = δ, ui is the expected payoff in the repeated game. These two interpretations are compatible. Thus, suppose that player i has a rate of time preference ri and that there is a probability 1 − δ each period that the game will end. Then player i's discount factor is δi = δ/(1 + ri).

Having defined strategies and payoffs in the repeated game, the formal description of the repeated game is complete. Note that the δi are part of the definition of the repeated game. Repeated games with the same stage game but different δi are different, and may have much different properties, as I illustrate in Example 13 and Example 48 below.

There are many possible variations on the basic repeated game model, and many of these variations have been explored in the literature. One can consider finitely repeated instead of infinitely repeated games. One can define repeated game payoffs in different ways. A common alternative is limit of means:

ui(s) = lim inf_{T→∞} (1/T) ∑_{t=1}^{T} vi(ζt(s)).

Loosely, this corresponds to the case where player i is "infinitely patient." Or one can stick within the discounting framework but allow for non-constant δi. In particular, referring to the second interpretation of δ given above, one can consider games in which the probability of termination, 1 − δ, varies or is unknown.

One can consider repeated games of imperfect monitoring, in which players know their own action history but not necessarily the action histories of their opponents. In a repeated Cournot oligopoly game, for example, it may make sense to model players as knowing the history of market prices but not directly the history of their opponents' output choices. One can consider different time structures. For example,


one can consider games that take place in continuous rather than discrete time. Or one can consider games in which players alternate, some acting in some periods and others acting in other periods. One can consider stochastic games, in which the stage game played in period t depends on what has happened in previous periods. For example, one can consider oligopoly games in which investments made in earlier periods change the cost structure of players in later periods. One can consider models in which the same players meet each period in different games. For example, two firms with multiple divisions may compete against each other in more than one market. And finally, one can consider models in which the set of players changes from one period to the next.

Example 47. Rationalizable strategies in the repeated Prisoner's Dilemma. Recall the repeated Prisoner's Dilemma, Example 46. If the discount factors δi are very low then, just as F is strictly dominant in the stage game (Example 21), "always fink" is strictly dominant in the repeated game.

If the δi are close to 1, this is no longer true. As discussed in Example 13, if δi > 2/5 then, for the stage game payoffs of Example 46, "always fink" is not a best response to "grim," but "grim" is a best response to "grim." Indeed, one can verify that "grim" is rationalizable, and hence mutual play of "grim" is a rationalizable profile. If both players play "grim" then the path of play is mutual play of C. This contrasts sharply with the one-shot Prisoner's Dilemma, where the unique rationalizable profile has both players play F.

More generally, in repeated games, if the discount factors δi are close enough to 1 then the set of rationalizable strategies can be very large. Moreover, rationalizable profiles in the repeated game can generate paths of play in which players take actions that are not rationalizable in the stage game. I discuss this phenomenon in more detail later in the course; see also Example 48. □

Example 48. Nash Equilibria in the repeated Prisoner's Dilemma. As noted in Example 13, "always fink" is a best response to "always fink." Hence mutual play of "always fink" is a Nash equilibrium in the repeated Prisoner's Dilemma.

This illustrates a general property of repeated games. If a∗ is a Nash equilibrium of the stage game, then the strategy profile in which each player i plays a∗i in every period regardless of history is a Nash equilibrium in the repeated game. But there may be many other Nash equilibria as well.

First of all, if the stage game has more than one Nash equilibrium (which is not the case with the Prisoner's Dilemma) then there are Nash equilibria in which players play one stage game equilibrium in some periods and other stage game equilibria in other periods. For example, if there are two equilibria, A and B, they could alternate, playing A in even periods, B in odd periods. Or they could play A for the first three periods, B the next period, A for the next four periods, and so on, corresponding to the decimal expansion for π (3.14159 . . . ). If there are at least two equilibria in the stage game, there is an uncountable infinity of equilibria in the repeated game.


But there may be more equilibria than even this suggests. Again consider the repeated Prisoner's Dilemma. As also noted in Example 13, "grim" is a best response to "grim" provided δi is large enough (δi > 2/5). Therefore, depending on the value of δi, there are equilibria of the repeated game that do not correspond to repeated equilibria of the stage game. In fact, there are lots of (an uncountable infinity of) such equilibria, a fact that is usually referred to as the "Folk Theorem." I discuss the Folk Theorem later in the course. □
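To see where a threshold like δ > 2/5 comes from, here is a sketch comparing the discounted payoff of "grim" against "grim" with that of deviating to "always fink." The stage payoffs below are hypothetical (Example 3's game box is not reproduced in this section): v(C,C) = 3, v(F,C) = 5 to the finker, v(C,F) = 0 to the victim, and v(F,F) = 0, numbers chosen so that the indifference point is exactly δ = 2/5.

    def discounted(payoffs, delta):
        """Truncated discounted sum of per-period payoffs, listed from period 1."""
        return sum(delta ** t * payoffs[t] for t in range(len(payoffs)))

    def pd_payoff(a1, a2):
        """Hypothetical Prisoner's Dilemma stage payoffs to player 1."""
        table = {("C", "C"): 3, ("C", "F"): 0, ("F", "C"): 5, ("F", "F"): 0}
        return table[(a1, a2)]

    T = 2000  # a long horizon approximates the infinite sum
    for delta in (0.3, 0.4, 0.5):
        # Grim vs grim: (C, C) in every period.
        grim_path = [pd_payoff("C", "C")] * T
        # Deviating to always fink vs grim: (F, C) once, then (F, F) forever.
        deviate_path = [pd_payoff("F", "C")] + [pd_payoff("F", "F")] * (T - 1)
        print(delta, discounted(grim_path, delta), discounted(deviate_path, delta))
    # At delta = 0.4 the two payoffs are equal (both 5, since 3/(1 - 0.4) = 5);
    # above 2/5, grim against grim does strictly better than deviating.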

8.10 Games of incomplete information.

One is often interested in environments in which players are asymmetrically informed about different aspects of the game. The potential scope for asymmetry is large. For example, player 1 may know that terminal node x means that she receives $1000, but she may not know what the other players receive. And even if she does know what her opponents receive, she may not know their vi (i.e., she may not know their felicity functions). And even if she knows the vi, player 1 still faces strategic uncertainty: she does not know what strategies the other players will play. Pursuing these sorts of issues quickly leads into extremely deep philosophical waters.

Games with these sorts of information asymmetries are called games of incomplete information. The standard approach within game theory, following Harsanyi (1967), has been to simplify things by modeling a game of incomplete information as a game of imperfect information in which Nature moves first and determines player types, where a player's type specifies what the player knows at the start of the game. This representation has been shown to be almost without loss of generality. The critical qualification is that in the imperfect information representation of the game, the probabilities used by Nature in selecting type profiles are part of the description of the game and as such are common knowledge among the players: the players agree on the probabilities, know that they agree on the probabilities, and so on. This is called the common prior assumption (CPA) and it is substantive. I discuss it further in Section 9.3. For more on this issue, see Dekel and Gul (1997).

Remark 13. A game of incomplete information that has been represented as a game of imperfect information is often called a Bayesian game and the Nash equilibrium of such a game is often referred to as a Bayesian Nash Equilibrium. I do not use these terms, however. □


9 Justifying Rationalizability, Correlated Equilibrium, and Nash Equilibrium.

9.1 Preplay agreement.

Suppose that players meet before the game and reach an agreement as to what strategy profile, or mixed strategy profile, to play. A minimal condition for such an agreement to be meaningful is that, assuming that each player believes that the others will abide by the agreement, no player has a strict incentive to deviate away from it. A mixed strategy profile with this property is a Nash equilibrium. So if players reach an agreement on how to play, the agreement must be a Nash equilibrium.

One objection to this is that while a player might have an incentive to deviate, he won't if the others can punish him. For example, the players might be able to write a contract with fines, enforced by a judicial system, if any player deviates. The methodological position taken in non-cooperative game theory (and this is one of the ways that non-cooperative game theory and cooperative game theory differ) is that if any such method of punishment is available then it must be incorporated directly into the description of the game. Thus, for example, under the contract story the players, in effect, agree to play a game in which each player's payoffs have been modified to include a fine for any strategy other than the agreed strategy. Similarly, if players will meet again, and if they can use the threat of retaliatory future behavior to deter deviation in today's game, then the game must explicitly capture this repeated interaction and the agreement will be over strategies in this repeated game, not just over the actions taken in a single period. Because any enforcement mechanism is supposed to be incorporated into the description of the game, one sometimes sees Nash equilibrium referred to as a self-enforcing agreement.

As a motivation for Nash equilibrium, I find preplay agreement compelling but it is subject to several qualifications. First, it is not always relevant. To take an extreme example, the entire world economy can be modeled as one gigantic game, but to this date the world population has yet to meet to agree on exactly how to play it. If one thinks that play in this gigantic game conforms, or eventually will conform, even approximately to equilibrium then one must appeal to some story other than preplay agreement.

Second, if the game has multiple Nash equilibria then the negotiation over which equilibrium to play may be non-trivial. The appeal to preplay negotiation may thus end up replacing one puzzle, namely how can players reach equilibrium in a game, with another, namely how can players reach equilibrium in their negotiations about the game.

And third, if players can negotiate prior to play of the game then they may be able to negotiate to change the game itself.12 One expects that players will try to alter games that yield outcomes that they dislike.

12In principle, one can model the negotiation explicitly as part of an even larger game, in which players choose what game to play, but this approach quickly becomes sterile.



One elementary way to change the game is to introduce a correlation device, which may be unobservable to an outside observer. As a consequence of the correlation device, play may look correlated to the observer. Put differently, from the perspective of an outside observer, the implication of preplay negotiation is, in general, not Nash equilibrium but correlated equilibrium. I will illustrate this using Battle of the Sexes, introduced in Example 1. I will refer to this game as G.

Suppose that players agree to toss a fair coin before play begins and then to play (a, a) if the coin lands heads and (b, b) if the coin lands tails. This makes good sense for the players, because it will give them an expected payoff of 9 each, which is the highest symmetric payoff that they can receive in the game. The true game, which I call Γ, incorporates the coin toss explicitly as an initial move by Nature. Figure 31 provides one possible extensive form representation of Γ. As

Figure 31: An extensive form game for Battle of the Sexes with a correlating device.

drawn, player 1 nominally goes first but the information sets for player 2 imply that play is effectively simultaneous.13 In this extensive form, each player has four strategies: aa, ab, ba, and bb, where ab is read, "a if heads and b if tails." The strategic form for Γ is shown in Figure 32. For example, if player 1 chooses ab and player 2 chooses bb then when the coin lands heads they play (a, b) for a payoff of (0, 0) and when the coin lands tails they play (b, b) for a payoff of (10, 8). Since the

13Alternatively, one could have player 2 nominally go first. This multiplicity in the possible extensive form representations is irrelevant, since the strategic form of the game, given in Figure 32, is not affected.


         aa       ab       ba       bb
aa     8, 10    4, 5     4, 5     0, 0
ab     4, 5     9, 9     0, 0     5, 4
ba     4, 5     0, 0     9, 9     5, 4
bb     0, 0     5, 4     5, 4     10, 8

Figure 32: The strategic form game for Battle of the Sexes with a correlating device.

probability of heads is 1/2, the expectation of this is (5, 4), as recorded in the game box.
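A sketch of this construction: the strategic form of Γ can be computed from G by taking expectations over the coin toss. The payoffs for G are inferred from the diagonal of Figure 32; everything else is my own encoding.

    import itertools

    # Stage game G (Battle of the Sexes): payoffs (u1, u2) for each action pair.
    G = {("a", "a"): (8, 10), ("a", "b"): (0, 0),
         ("b", "a"): (0, 0), ("b", "b"): (10, 8)}

    # A strategy in Gamma says what to play if heads and what to play if tails.
    strategies = [h + t for h, t in itertools.product("ab", repeat=2)]  # aa, ab, ba, bb

    def payoff(s1, s2):
        """Expected payoffs in Gamma: average of the heads play and the tails play."""
        heads = G[(s1[0], s2[0])]
        tails = G[(s1[1], s2[1])]
        return tuple((x + y) / 2 for x, y in zip(heads, tails))

    for s1 in strategies:
        print(s1, [payoff(s1, s2) for s2 in strategies])
    # Row ab: [(4.0, 5.0), (9.0, 9.0), (0.0, 0.0), (5.0, 4.0)], matching Figure 32.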

One can easily verify that the suggested agreement, namely (ab, ab), is a Nash equilibrium of this game. The associated Nash equilibrium distribution is shown in Figure 33.

        aa   ab   ba   bb
aa      0    0    0    0
ab      0    1    0    0
ba      0    0    0    0
bb      0    0    0    0

Figure 33: The strategy distribution for the Nash equilibrium (ab, ab).

Now, suppose that an outside observer cannot observe strategies in Γ, like ab. Instead, an observer can see only what actions in G are actually played. The induced distribution over actual play is shown in Figure 34. Note that this is exactly the

        a      b
a     1/2     0
b      0     1/2

Figure 34: The induced distribution over actual play.

same as Figure 5. Thus, to an outsider who can only observe play in G, play looks correlated.

Finally, note that there are other pure Nash equilibria of the game in Figure 32, namely (ba, ba), (aa, aa), and (bb, bb). The equilibria (aa, aa) and (bb, bb) correspond to the original pure equilibria of the game: players effectively ignore the coin toss. In general, adding a correlating device can introduce new possibilities for equilibrium behavior, such as (ab, ab), but, since the correlating device can always be ignored,


analogs of the Nash equilibria of the original game are always present as well (this holds for mixed strategy equilibria as well as pure).

9.2 The “Big Book of Game Theory.”

Suppose that everyone learns game theory from the same book, the “Big Book of Game Theory,” which prescribes how to play in every game. The book’s underlying hypothesis is that everyone follows the book. With this hypothesis, the only prescriptions that make sense are Nash equilibrium prescriptions. Otherwise, by definition, at least one player has strict incentive to deviate, which contradicts the hypothesis that everyone follows the book. Note that the argument here is very similar to that for preplay agreement. The main difference is that players do not negotiate before the start of each individual game.

You may think of the “Big Book of Game Theory” as a social convention for how to play games. The main obstacle to this story is that the book must specify how to play in every game, including games with multiple equilibria. This raises the problem of what equilibrium should be played. In a game like Battle of the Sexes, introduced in Example 1, it is not at all obvious how this multiplicity problem should be resolved. Harsanyi and Selten (1988) provides one possible resolution and can thus be considered a “Big Book of Game Theory.”

Harsanyi and Selten (1988) make a good case that their particular prescriptions are sensible, but other big books of game theory are also possible. Thus, the question of what equilibrium (if any) will be played has been replaced with another, more abstract, question: how did society settle on one particular Big Book of Game Theory in the first place? To answer this question, one must appeal to one of the other stories described in this section.

9.3 Introspection.

By introspection I mean reasoning of the form, “I should play this because I think that my opponent will play that, and I think that my opponent will play that because I think that he thinks that I will play . . . .” This subsection addresses what this sort of introspective reasoning, by players of unlimited calculating power, implies for player behavior.14

In brief, the answer is that introspection implies that play lies in SR (i.e. that play is rationalizable) but (this is more controversial) falls short of implying that play lies in SW (i.e. that strategies survive the iterated deletion of weakly dominated strategies) or that play conforms to a Nash or correlated equilibrium.

Of course, the calculating power of actual players is limited and, indeed, the empirical evidence is inconsistent with the introspective reasoning that I am about to describe. I find the literature on introspection nevertheless interesting, in part

14 What I am calling introspection the literature calls interactive epistemology. The seminal paper is Aumann (1976). I give other cites below.


because it grapples with difficult questions. But of all the material in this section, the material on introspection is both the most demanding and the most easily skipped.

9.3.1 Introspection, SR, and SW.

Consider players engaged in a game G. I refer to players as choosing actions in G rather than strategies; I want to reserve the word “strategy” for another use. Strictly speaking, the players are playing, not G, but rather a game of incomplete information arising from G. In particular, the players do not know their opponents’ actions, although each may have beliefs about the others’ actions, possibly based in turn on beliefs about the others’ beliefs, and so on. One can show that it is without loss of generality to represent the game of incomplete information, together with whatever behavior it generates, by a formalism, which I call F, in which Nature chooses a profile of types, one type for each player. A player’s type specifies (a) his belief about the other players’ actions, his belief about their beliefs, and so on, (b) any other information that the player might have (such as the outcome of a coin toss), and (c) the player’s action in G. The fact that a type specifies a player’s action does not take away the player’s free will. In the F formalism, the type merely records what action the player eventually ends up choosing as a function of his information.

Informally, say that something is common knowledge if everyone knows it, everyone knows that everyone knows it, and so on.15 Say that a player is rational if she chooses an action that is a best response to her belief. A basic assumption of introspective arguments is that rationality is common knowledge: everyone is rational, everyone knows that everyone is rational, everyone knows that everyone knows that everyone is rational, and so on. Common knowledge of rationality can be defined rigorously within the F formalism just described, but I will not do so. Common knowledge of rationality is a substantive assumption; the F formalism per se does not require rationality, or even approximate rationality, let alone common knowledge of rationality.16

One can show that if common knowledge of rationality holds at a type profile then the action profile associated with that type profile lies in SR. In this sense, introspection implies rationalizability.

Introspection provides considerably less support for restricting attention to SW. Say that a player is cautiously rational if he chooses σi only if σi is a best response to a “cautious belief,” where a belief σ−i is cautious iff supp(σ−i) = S−i. That is, σ−i is cautious iff it rules nothing out. One can show, as an analog of Theorem 7, that σi is not a best response to any cautious belief iff it is weakly dominated. A natural

15 Common knowledge was first formalized in Aumann (1976).
16 There is an implicit assumption that F itself is common knowledge, where I use common knowledge in an informal sense (i.e. without reference to a meta formalism that contains F). This sort of common knowledge can be viewed as being more or less without loss of generality; see Brandenburger and Dekel (1993) but also Heifetz (1999).


conjecture is that, just as common knowledge of rationality implies that play lies in SR, common knowledge of cautious rationality implies that play lies in SW. This conjecture turns out to be false. In this sense, introspection does not imply that play lies in SW. Instead, common knowledge of cautious rationality implies that players play strategies that survive one round of weak dominance deletion followed by the iterated deletion of strictly dominated strategies. See Borgers and Samuelson (1992).
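To make this procedure concrete, here is a minimal sketch (my own, not from these notes) of one round of weak dominance deletion followed by iterated strict dominance in a two-player game. Caveat: for brevity it checks dominance only by pure strategies, whereas the definitions above require dominance by mixed strategies as well, which in general calls for a linear program.

```python
def dominated(u, s, own, others, strict):
    """Is pure strategy s dominated by another pure strategy in `own`?
    Here u(r, t) is the player's payoff from playing r against opponent
    strategy t. (Pure dominators only; mixed dominators are ignored.)"""
    for r in own:
        if r == s:
            continue
        diffs = [u(r, t) - u(s, t) for t in others]
        if strict and all(d > 0 for d in diffs):
            return True
        if not strict and all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            return True
    return False

def one_weak_then_iterated_strict(u1, u2, S1, S2):
    """One simultaneous round of weak dominance deletion, then iterated
    deletion of strictly dominated strategies, as in the result cited
    above (with the pure-dominance caveat)."""
    T1 = [s for s in S1 if not dominated(u1, s, S1, S2, strict=False)]
    T2 = [t for t in S2 if not dominated(u2, t, S2, S1, strict=False)]
    S1, S2 = T1, T2
    changed = True
    while changed:
        T1 = [s for s in S1 if not dominated(u1, s, S1, S2, strict=True)]
        T2 = [t for t in S2 if not dominated(u2, t, S2, S1, strict=True)]
        changed = (T1 != S1) or (T2 != S2)
        S1, S2 = T1, T2
    return S1, S2

# A tiny hypothetical illustration: B is weakly dominated for player 1,
# and R is weakly dominated for player 2; both vanish in the first round.
U1 = {('T', 'L'): 1, ('T', 'R'): 1, ('B', 'L'): 1, ('B', 'R'): 0}
U2 = {('T', 'L'): 1, ('T', 'R'): 0, ('B', 'L'): 1, ('B', 'R'): 1}
u1 = lambda s, t: U1[(s, t)]  # player 1: own strategy first
u2 = lambda t, s: U2[(s, t)]  # player 2: own strategy first
print(one_weak_then_iterated_strict(u1, u2, ['T', 'B'], ['L', 'R']))
# (['T'], ['L'])
```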

9.3.2 Introspection, Nash equilibrium, and correlated equilibrium.

Finally, I turn to the question of whether introspection implies Nash, or at leastcorrelated, equilibrium.

First, in some games there is a Nash equilibrium that is so compelling that it seems reasonable that players would coordinate on it even if they could not communicate beforehand. For example, in the game of Example 29, it seems reasonable to predict that players would select the unique Nash equilibrium, namely (T, L), even though every strategy profile in this game is rationalizable. In game theory, such equilibria are referred to as focal, a term introduced in Schelling (1960). Although most game theorists subscribe to the idea of a focal equilibrium, it has resisted all attempts at formalization. For example, one might suppose that if a Nash equilibrium is strictly Pareto dominant then it is focal. (A strategy profile is strictly Pareto dominant if it gives every player a strictly higher payoff than that of any other strategy profile.) But on closer examination, this is not so clear. Consider Figure 35. (a, a) is

         a          b
a    100, 100     0, 99
b     99, 0      98, 98

Figure 35: A game with a Pareto dominant Nash equilibrium that may not be focal.

a strictly Pareto dominant Nash equilibrium, so presumably it is focal. But playing b guarantees a payoff of at least 98 whereas playing a risks 0 for a gain (relative to 98) of at most 2. So b looks more prudent than a. The equilibrium (b, b) satisfies a property that Harsanyi and Selten (1988) called risk dominance. The “Big Book of Game Theory” proposed by Harsanyi and Selten (1988) selects (b, b), rather than (a, a), in this game.
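For 2 × 2 games with two strict equilibria, one standard way to operationalize risk dominance is to compare, at each equilibrium, the product of the two players’ losses from unilateral deviation; the equilibrium with the larger product risk-dominates. A quick check for Figure 35 (my own sketch, not from the notes):

```python
# Payoffs from Figure 35.
u = {('a', 'a'): (100, 100), ('a', 'b'): (0, 99),
     ('b', 'a'): (99, 0), ('b', 'b'): (98, 98)}

def deviation_loss_product(eq, alt):
    """Product of the two players' losses from unilaterally deviating
    away from the pure equilibrium (eq, eq) to the action alt."""
    loss1 = u[(eq, eq)][0] - u[(alt, eq)][0]
    loss2 = u[(eq, eq)][1] - u[(eq, alt)][1]
    return loss1 * loss2

print(deviation_loss_product('a', 'b'))  # 1
print(deviation_loss_product('b', 'a'))  # 9604
# The much larger product at (b, b) is the sense in which (b, b)
# risk-dominates (a, a), despite (a, a) being Pareto dominant.
```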

Even if we could agree on a definition of what “focal” means, it seems likely that, while some games would have focal equilibria, others would not. This leads to the question of whether, in general games, introspection can lead to Nash or at least correlated equilibrium. The answer is, “not quite.”

The F formalism described above resembles a game of imperfect information. I wrote “resembles” rather than “is” for two reasons. First, F allows two players to


assign different probabilities to any given type profile, whereas a game of imperfect information requires that players all assign the same probability to any given type profile. The assumption that all players use the same probability distribution over type profiles is called the common prior assumption (CPA). Second, F specifies what actions players take for each type. Assuming CPA, it is thus as though F specifies both a game of imperfect information Γ and a strategy profile for Γ, where a strategy in Γ is a function that specifies a player’s action in G as a function of his type.

Suppose that CPA holds and reinterpret the part of a type that specifies an action as a suggestion of how the player should play (which the player could ignore) rather than as a record of how the player actually does play. With this interpretation, the F formalism is a game Γ of imperfect information. Consider the strategy profile in Γ in which each player takes the action that Nature suggests at each type. It is not difficult to show that rationality is common knowledge at every type profile in F if and only if the “do as you are told” strategy profile in Γ is a Nash equilibrium.

To illustrate, suppose that G is the game of Chicken, introduced in Example 37. Assume CPA and thus that there is a game of imperfect information Γ that corresponds to the incomplete information formalism F. The simplest interesting type structure is the following. Player i can be either of type θJi or θDi. There are thus four type profiles, θJJ, θJD, θDJ, and θDD, where, for example, θJD is shorthand for the profile (θJ1, θD2). In Γ, Nature moves first and chooses the type profile according to the distribution P = (PJJ, PJD, PDJ, PDD). Each player observes his type, but not that of his opponent, and then each player simultaneously chooses an action, either J or D. The interpretation is that if player 1 is of type θJ1 then player 1 is told to play J. Moreover, once player 1 learns that his type is θJ1, player 1 learns that the type profile is either θJJ or θJD. It follows that, conditional on player 1 being of type θJ1, the probability of player 2 being of type θJ2, and likewise being told J, is PJJ/(PJJ + PJD). In this way, player 1’s type records not only Nature’s suggestion about how to play but also player 1’s belief about player 2’s type, and hence player 1’s belief about player 2’s belief about player 1’s type, and so on.

Suppose that P = (4/9, 2/9, 2/9, 1/9). The strategic form game generated by this P is shown in Figure 36. The strategy JD means play J when of type θJi and

        JJ            JD            DJ            DD
JJ    6, 6         14/3, 19/3    10/3, 20/3     2, 7
JD   19/3, 14/3    14/3, 14/3     3, 14/3       4/3, 14/3
DJ   20/3, 10/3    14/3, 3        8/3, 8/3      2/3, 7/3
DD    7, 2         14/3, 4/3      7/3, 2/3      0, 0

Figure 36: Chicken with an initial move by Nature.

play D when of type θDi. That is, do what Nature tells you. The strategy DJ


means do the opposite of what Nature tells you. The strategy JJ means play J no matter what Nature tells you. The strategic form payoffs are tedious to compute. The computation is similar to the one for Figure 32 in Section 9.1.
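Since the notes leave the computation as an exercise, the following sketch (mine, not from the notes) reproduces Figure 36. The Chicken stage-game payoffs can be read off the all-J and all-D rows and columns of Figure 36: g(J, J) = (6, 6), g(J, D) = (2, 7), g(D, J) = (7, 2), g(D, D) = (0, 0).

```python
from fractions import Fraction as F

# Chicken stage-game payoffs (as recoverable from Figure 36).
g = {('J', 'J'): (6, 6), ('J', 'D'): (2, 7),
     ('D', 'J'): (7, 2), ('D', 'D'): (0, 0)}

# Common prior over the four type profiles (theta^JJ, ..., theta^DD).
P = {('J', 'J'): F(4, 9), ('J', 'D'): F(2, 9),
     ('D', 'J'): F(2, 9), ('D', 'D'): F(1, 9)}

# A strategy in Gamma maps one's type (the suggested action) to an
# action: 'JD' = "do as you are told", 'DJ' = "do the opposite", etc.
strategies = ('JJ', 'JD', 'DJ', 'DD')
act = lambda s, told: s[0] if told == 'J' else s[1]

def payoff(s1, s2):
    """Expected payoffs in Gamma, averaging over Nature's type profile."""
    u1 = u2 = F(0)
    for (t1, t2), prob in P.items():
        v1, v2 = g[(act(s1, t1), act(s2, t2))]
        u1, u2 = u1 + prob * v1, u2 + prob * v2
    return u1, u2

for s1 in strategies:  # reproduces Figure 36, in exact fractions
    print(s1, [payoff(s1, s2) for s2 in strategies])

# "Do as you are told" is a Nash equilibrium of Gamma.
u1, u2 = payoff('JD', 'JD')  # (14/3, 14/3)
assert all(payoff(d, 'JD')[0] <= u1 for d in strategies)
assert all(payoff('JD', d)[1] <= u2 for d in strategies)
```

With this P, every strategy yields exactly 14/3 against JD (Nature is in effect mixing 2/3 on J and 1/3 on D independently for each player, so a player’s payoff does not depend on his own strategy), which is why the asserts pass with equality. Replacing P with (1/3, 1/3, 1/3, 0) and rerunning the same asserts also succeeds, in line with the discussion below.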

One can readily verify that in this game (JD, JD) is a Nash equilibrium. This should not be surprising: I chose P so that the “do as you are told” Nash equilibrium of Γ corresponds exactly to the mixed strategy equilibrium of G. In Γ, Nature in effect mixes on behalf of the players. There are other Nash equilibria here, too, such as (JJ, DD), but I want to focus on the equilibrium where players do as they are told, since it is in this equilibrium that the actions taken correspond to those specified in the original formalism.

Now ask the question, for what P is the “do as you are told” strategy profile (JD, JD) a Nash equilibrium of Γ?17 This is equivalent to asking, for F, if CPA holds then what P are compatible with common knowledge of rationality at every type profile? It should be fairly intuitive that any P that is a Nash equilibrium distribution works. In fact, any P that is a correlated equilibrium distribution works. Thus, in particular, I can choose P using the correlated equilibrium distribution of Figure 22: P = (1/3, 1/3, 1/3, 0). Moreover, only the correlated distributions work: (JD, JD) is a Nash equilibrium of Γ iff P is a correlated equilibrium distribution of G. Thus, assuming both CPA and common knowledge of rationality at every type profile in F implies that P is a correlated equilibrium distribution for G.
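The conditional best-response test that defines a correlated equilibrium distribution is easy to spot-check numerically. Here is a sketch of that test (my own, not from the notes), applied to the two distributions just mentioned:

```python
from fractions import Fraction as F

g = {('J', 'J'): (6, 6), ('J', 'D'): (2, 7),
     ('D', 'J'): (7, 2), ('D', 'D'): (0, 0)}  # Chicken, as above

def is_correlated_eq(P):
    """P maps action profiles to probabilities. For each player and each
    recommended action, deviating must not raise expected payoff,
    conditional on the recommendation (a vacuous constraint when the
    recommendation has probability zero)."""
    for rec in ('J', 'D'):
        for dev in ('J', 'D'):
            gain1 = sum(P[(rec, a2)] * (g[(dev, a2)][0] - g[(rec, a2)][0])
                        for a2 in ('J', 'D'))
            gain2 = sum(P[(a1, rec)] * (g[(a1, dev)][1] - g[(a1, rec)][1])
                        for a1 in ('J', 'D'))
            if gain1 > 0 or gain2 > 0:
                return False
    return True

# The mixed equilibrium distribution and Figure 22's distribution both pass.
assert is_correlated_eq({('J', 'J'): F(4, 9), ('J', 'D'): F(2, 9),
                         ('D', 'J'): F(2, 9), ('D', 'D'): F(1, 9)})
assert is_correlated_eq({('J', 'J'): F(1, 3), ('J', 'D'): F(1, 3),
                         ('D', 'J'): F(1, 3), ('D', 'D'): F(0)})
```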

This reasoning extends to general games and general type structures. CPA and common knowledge of rationality at every type profile jointly imply that the “do as you are told” strategy profile is a Nash equilibrium of Γ, and this in turn implies that the induced distribution over action profiles is that of a correlated equilibrium. In this sense, CPA and common knowledge of rationality at every type profile imply correlated equilibrium. The question is, do we want to interpret this as saying that introspection implies correlated equilibrium?

Of the two assumptions, CPA and common knowledge of rationality at every type profile, I view the common knowledge of rationality assumption as strong but acceptable, given that we are considering introspection by players of unlimited computing power. Moreover, the common knowledge assumption used to get correlated equilibrium is only somewhat stronger than the common knowledge assumption used to get rationalizability.18

The question of whether introspection implies correlated equilibrium thus comes down to the question of whether introspection implies CPA. It is tempting to appeal

17 Note that when I change P I also change the payoffs in the strategic form game shown in Figure 36.

18 Recall that common knowledge of rationality at a type profile implies rationalizability at that type profile. The correlated equilibrium argument, in contrast, asks for common knowledge of rationality at every type profile. In the Chicken example, the particular type space used had the property that common knowledge of rationality at one type profile implied common knowledge of rationality at every type profile. But in other type spaces common knowledge of rationality could hold at some type profiles while failing at others.


to precedent and cite the fact that CPA is a standard assumption throughout applied game theory. But the objective in applied game theory is to construct models that capture useful intuitions and that have testable predictions. The objective here is different: to find out whether introspection in and of itself can yield Nash, or at least correlated, equilibrium. Given this objective, CPA has to be defended on its own terms.

One defense of CPA that I find especially seductive is the following. Why would two players assign different probabilities to how Nature assigns type profiles? Given that players have unlimited calculating power, such differences in belief must reflect differences in information (rather than calculating errors). So we should embed the formalism F inside a new, larger, formalism that explicitly models the information that players used to derive their probabilities for Nature in F. Rename the original formalism F0 and let F1 denote the new formalism. But then it is possible that players have different probabilities about Nature in F1, which suggests constructing a new, still larger formalism F2, and so on. This leads to the question, does this sequence of formalisms lead to an overarching formalism, F∞, in which CPA holds?

It turns out that this question was already answered in the original construction of F. F is F∞ (or, rather, F is equivalent to F∞ in an appropriate sense) and so the answer is “no.” Therefore, assuming CPA is substantive.

Whether CPA is too substantive to impose in a theory of introspection is controversial. On one side, one of the great game theorists, Robert Aumann, has taken the position that CPA is justifiable in this context. See Aumann (1987), which is the seminal paper in this area. On the other side are game theorists who argue that CPA forces coordination of belief, and that the whole point of the introspection literature was to explain such coordination, not assume it at the outset. In games with a focal equilibrium, such coordination may be natural, but invoking focal equilibria just takes us back to where we started. As you might infer, I am on the anti-Aumann side of this debate. For more on this topic (assuming you haven’t already had more than enough), see Brandenburger (1992) (a survey written for a general economics audience and therefore relatively accessible) and Dekel and Gul (1997) (another survey, but targeted at a more sophisticated audience).

One last remark. The introspective theories I have surveyed model introspection implicitly. The theories are of the form, if players have such and such information and if they best respond then their actions must obey such and such restrictions. An alternative is to try to model introspection explicitly. One way to do this, perhaps the most natural way, is to model each player as choosing a decision algorithm that takes as input the opponent’s decision algorithm (thus allowing the player to think through the game from his opponent’s perspective) and that produces as output an action in the game. Binmore (1987) pointed out that this approach quickly leads to conceptual problems. Canning (1992) formalized Binmore (1987) and showed that these problems can be avoided if one restricts the set of available algorithms. But in games with multiple equilibria like Battle of the Sexes (Example 1), the restriction


on decision algorithms effectively requires players to coordinate on an equilibrium before introspection begins. So, once again, we are back where we started.

9.4 Learning.

There are many, many competing theories of learning that differ on all sorts of dimensions. There are theories in which the same players play the same opponents repeatedly, theories in which players are part of a large population and never meet opponents more than once, and theories in which players play games with their neighbors, who then play games with their neighbors, and so on. There are theories in which players are sophisticated Bayesian optimizers, theories in which players use rules of thumb to update their behavior based on their own experience, and theories in which players update their behavior by copying what some other player is doing. This is only a partial enumeration of the different modeling choices made. Fudenberg and Levine (1998) is, at this writing, the most general survey of the learning literature. Much of the learning literature has been devoted to theories with large populations of players with simple update rules. These theories are often called “evolutionary” because they share some mathematical similarities with dynamics that arise in evolutionary biology. Mailath (1998) provides a concise survey.19

Lest this material quickly become too abstract, I start by describing a particular learning theory called fictitious play. In terms of player sophistication, fictitious play lies toward the more sophisticated end of the spectrum of learning theories. I do not make any claim that fictitious play is particularly realistic. But fictitious play is relatively easy to describe, and it nicely illustrates some of the more general points that I wish to make.

9.4.1 Fictitious play.

In fictitious play, players learn by playing the same game against the same opponent, over and over, forever. I focus on fictitious play in two-player games. On extending fictitious play to games with more than two players, see Fudenberg and Levine (1998).

Consider, in particular, fictitious play in Matching Pennies (Example 2). For this game, a general class of fictitious play theories can be described as follows.

In each period, each player makes a forecast of the other player’s action. Player 1’s forecast at the start of period 1 is (q0, 1 − q0), meaning that player 1 thinks that player 2 will play H with probability q0. Player 1’s forecast at the start of period

19 Two good texts on evolutionary game theory are Weibull (1995) and Samuelson (1997). These texts overlap, but the focus of Weibull (1995) is on evolution in strategic form games while the focus of Samuelson (1997) is more on issues particular to games with non-trivial extensive forms. Finally, Young (1998) surveys theories in which random shocks play an important role; these theories bridge the evolutionary and non-evolutionary literatures.


t + 1 is

(qt+1, 1 − qt+1) = [k2/(t + k2)] (q0, 1 − q0) + [t/(t + k2)] (H2t, 1 − H2t),

where k2 > 0 is a constant discussed below, H2t is the empirical frequency with which player 2 has played H to date, and 1 − H2t = T2t is the empirical frequency with which player 2 has played T. Notice that, for any k2, as t grows large eventually nearly all the weight is on the second term: eventually player 1’s forecast is that player 2 will play H with a probability approximately equal to H2t, the empirical frequency with which player 2 has played H to date. Thus, if player 2 plays H 25% of the time then eventually qt ≈ 1/4, regardless of k2 or q0. On the other hand, for k2 large, qt approximately equals the initial forecast q0, regardless of H2t, until t is likewise very large. Thus, k2 can be thought of as measuring the strength of player 1’s confidence in his initial forecast q0. Player 2’s forecast rule is similar: he has an initial forecast (p0, 1 − p0) and he updates in a similar way using a weight k1.
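Anticipating the best-response and tie-breaking conventions spelled out in the next paragraph (myopic best responses, with indifference resolved in favor of H), here is a minimal simulation sketch. It is my own illustration, not from the notes, and it assumes the usual payoff orientation for Matching Pennies, with player 1 winning on a match and player 2 winning on a mismatch; only the best-response directions depend on this assumption.

```python
def br1(q):
    """Player 1's myopic best response when forecasting that player 2
    plays H with probability q. Payoff to H is 2q - 1, to T is 1 - 2q
    under the assumed orientation; ties go to H, as in the text."""
    return 'H' if q >= 0.5 else 'T'

def br2(p):
    """Player 2 wants to mismatch; ties again go to H."""
    return 'H' if p <= 0.5 else 'T'

def fictitious_play(p0, q0, k1, k2, T=100000):
    h1 = h2 = 0   # number of times each player has played H so far
    freq = {}     # empirical counts of action profiles
    for t in range(T):
        q = (k2 * q0 + h2) / (k2 + t)  # player 1's forecast of Pr(H)
        p = (k1 * p0 + h1) / (k1 + t)  # player 2's forecast of Pr(H)
        a1, a2 = br1(q), br2(p)
        h1 += (a1 == 'H')
        h2 += (a2 == 'H')
        freq[(a1, a2)] = freq.get((a1, a2), 0) + 1
    return {k: v / T for k, v in freq.items()}

# Per the claim below, each of the four action profiles should have
# empirical frequency near 1/4, for any choice of these parameters.
print(fictitious_play(p0=0.9, q0=0.1, k1=3.0, k2=1.0))
```

Note that the forecast line is just the displayed formula above, rewritten as (k2 q0 + h2)/(k2 + t), since H2t = h2/t.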

Given his forecast for period t + 1, player i then best responds. There are two subtleties. First, the players are engaged in a repeated version of Matching Pennies and therefore one must define best response in terms of this repeated game. In a repeated game, player i cares not only about how his date t action affects his date t payoff but also about how his date t action might affect his future payoffs, by affecting his opponent’s future actions. It turns out that fictitious play assumes this problem away: one can show that the fictitious play forecast rule implicitly assumes that a player’s action today has no effect on her opponent’s future actions. So a player’s best response can be computed by simply computing the best response for the current period, ignoring the fact that play will be repeated. The second subtlety is that one must specify what happens if player i turns out to be exactly indifferent. I assume that when a player is indifferent, she chooses H. It turns out that this tie breaking rule is essentially irrelevant: one can show that for almost any values of k2 and q0, player 1 is never indifferent.

Now consider what happens in Matching Pennies. For any k1, k2, p0, and q0, the empirical frequency for each of the four strategy profiles converges to 1/4, as in Figure 26. Thus, the empirical frequency of play converges to the mixed strategy Nash equilibrium distribution for Matching Pennies. In this sense, fictitious play predicts convergence to the Nash equilibrium of the game. Note, however, the following.

1. The unique Nash equilibrium of Matching Pennies calls for each player to randomize 50/50. But, as I have defined fictitious play, players never randomize. So the sense in which play in fictitious play converges to Nash equilibrium is weak: it is not that the players’ actual behavior converges to that of a Nash equilibrium but only that the empirical distribution of play resembles the empirical distribution that would be generated by repeated play of the Nash equilibrium. One would like to claim that observable play looks as if


it had been generated by players actually randomizing. But this, too, is not correct. It is true that observable play passes one statistical test (namely that the empirical distribution resembles the distribution generated by repeated play of a Nash equilibrium) but it fails others. An outside observer, looking at the sequence of action profiles chosen under fictitious play, would be able to detect that players were not randomizing 50/50. For example, an observer would see that, whenever both players play H in period t, player 1 also plays H in period t + 1. In contrast, if player 1 were playing the equilibrium mixed strategy then he would play H only half the time under these (or any other) circumstances.

2. Based on the behavior of fictitious play in Matching Pennies, it is tempting to conjecture that fictitious play always converges, at least in a weak sense, to Nash equilibrium. This conjecture is false.

First, the good news.

• If the empirical distribution of play converges to a degenerate distribution that assigns probability 1 to s∗ then s∗ is a pure strategy equilibrium. Conversely, if s∗ is a strict equilibrium and if initial beliefs are close to s∗ then the empirical distribution of play converges to s∗. (A pure Nash equilibrium s∗ is strict iff each s∗i is the unique best response to s∗−i.)

• If the empirical distribution of play converges then the marginals form a Nash equilibrium. (This statement is for two-player games.)

Now the bad news.

• The empirical distribution of play may never converge. The game in Figure 37, which originated in Shapley (1962), provides an example.

       L      C      R
T    1, 0   0, 0   0, 1
M    0, 1   1, 0   0, 0
B    0, 0   0, 1   1, 0

Figure 37: A game for which fictitious play fails to converge.

The Nash equilibrium in this game calls for each player to randomize (1/3, 1/3, 1/3). One can show that the empirical frequency of play under fictitious play wanders around forever, never converging. This example is robust; one can perturb payoffs slightly without altering the qualitative conclusion.


• One can construct examples in which play converges, and hence the empirical marginals converge to a Nash equilibrium, but for which the limiting empirical distribution is not a correlated, let alone a Nash, equilibrium distribution. For example, consider Figure 38. This game has two pure

       L      R
T    0, 0   1, 1
B    1, 1   0, 0

Figure 38: A game for which fictitious play can converge to a distribution that isnot a Nash equilibrium distribution.

Nash equilibria, (T, R) and (B, L), and also a mixed Nash equilibrium in which both players randomize 50/50. If p0 = q0 then, under fictitious play, the empirical distribution converges to the distribution shown in Figure 39. Note that the marginals are indeed 50/50 but the distribution

       L     R
T    1/2     0
B      0   1/2

Figure 39: The marginals form a Nash equilibrium for the game of Figure 38 even though the distribution itself is not a Nash equilibrium distribution.

itself is not close to any sort of equilibrium distribution. This example looks pathological. It is not known how robust this particular problem is.

Although play under fictitious play may fail to converge to a Nash or correlated equilibrium, play under fictitious play always converges to the rationalizable set, SR. The proof of this fact mirrors the iterated deletion of strategies that are never a best response. Because fictitious play players always best respond, they never play a strategy that is never a best response. Given this, and the fact that forecasts under fictitious play are eventually close to the empirical frequencies of play, players eventually forecast that the probability is close to zero that their opponent will play a strategy that is never a best response. Because of this, and the fact that players best respond, players eventually never play strategies that do not survive the second round of deletion of strategies that are never a best response. And so on. On the other hand, play under fictitious play does not necessarily converge to SW.

Remark 14. Fictitious play has a Bayesian interpretation. Consider again the Matching Pennies example and suppose that player 1 is certain that player 2 is playing an


i.i.d. strategy, that is, a strategy of the form, play H with probability q∗ in every period, regardless of what has happened in the game. Suppose that player 1 does not know q∗ but believes that it is drawn from the uniform distribution on [0, 1]. Then player 1’s forecast for period t + 1 is as described above, with q0 = 1/2 and k2 = 2. More generally, any Beta distribution for q∗ implies a q0, a k2 > 0, and a forecast rule of the sort described.
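The arithmetic behind this claim is short. With a Beta(α, β) prior and h heads observed in t plays, the posterior mean of q∗ is (α + h)/(α + β + t), which coincides with the forecast rule above when q0 = α/(α + β) and k2 = α + β; the uniform distribution is Beta(1, 1), giving q0 = 1/2 and k2 = 2. A small numerical check (my own sketch, not from the notes):

```python
from fractions import Fraction as F

# Posterior mean of q* under a Beta(a, b) prior after h heads in t plays.
def posterior_mean(a, b, h, t):
    return F(a + h, a + b + t)

# The fictitious play forecast rule from the displayed equation above.
def fp_forecast(q0, k2, h, t):
    emp = F(h, t)  # empirical frequency of H to date
    return F(k2) / (t + k2) * q0 + F(t) / (t + k2) * emp

a, b = 1, 1                       # uniform prior
q0, k2 = F(a, a + b), a + b       # q0 = 1/2, k2 = 2
for t, h in [(1, 1), (4, 3), (10, 2), (100, 57)]:
    assert posterior_mean(a, b, h, t) == fp_forecast(q0, k2, h, t)
```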

Fictitious play players are thus “as if” Bayesian. But they aren’t very sophisticated. Each player is certain that the other is i.i.d. even though neither is i.i.d. Actually, the situation is worse than this suggests. Suppose that player 1 thinks (correctly) that player 2 thinks that player 1 is playing an i.i.d. strategy. (But player 1 need not know the details of exactly how player 2 assigns probability.) It is not hard to show that, given this correct belief about player 2’s belief, if player 1 believes that player 2 is playing an i.i.d. strategy then player 1 is certain that player 2 is not optimizing. And similarly for player 2. It is one thing for a learning theory to model an optimizing player as thinking his opponent might not be optimizing; it is quite another for a learning theory to model an optimizing player as being certain that his opponent is not optimizing.

The natural response to this inconsistency in the Bayesian interpretation of fictitious play is to enrich each player’s belief by including more complicated strategies along with the i.i.d. strategies. But as we include more complicated strategies in a player’s belief, we also make her best response more complicated, so that the richer theory may still be inconsistent. Nachbar (1997) shows that the inconsistency exhibited by fictitious play is, in fact, a general feature of Bayesian and “as if” Bayesian learning theories. □

I spend the remainder of this subsection surveying what I believe to be the main lessons of the learning literature.

9.4.2 Learning, SR, and SW.

Most, but not quite all, learning theories predict that play eventually lies in SR (i.e. that play is eventually rationalizable). Thus, rationality is not necessary to justify rationalizability. Even if players are extremely naive, their long run behavior can look as though it were generated by players of unlimited calculating power.

The connection between learning theories and SW (and other, related, concepts) is complicated and only partially understood. For many but not all theories, if play converges to something then that something is either in SW or is close (possibly only in a very weak sense) to something in SW. But it is also quite possible for play to cycle, spending part of the time in SW, then veering outside of SW, then returning at a later date, etc. And in evolutionary theories it is possible for play to converge to something not in SW; see Binmore and Samuelson (1999).

The statement that play eventually lies in SR (or, for that matter, in SW) is subject to several caveats. First, in many learning theories, strategies that are not


“too” strictly dominated can survive indefinitely. This is the case, in particular, if players only ε-optimize.

Second, learning theories invariably embed the original game, call it G, in a larger game, call it Γ. Convergence to rationalizable behavior means convergence to behavior that is rationalizable in Γ, not necessarily to behavior that is rationalizable in G.

One standard example is the following. Consider two players playing the Prisoner’s Dilemma (Example 21) over and over, forever. As noted in Example 21, the only rationalizable action in the Prisoner’s Dilemma is F. So one might naively conjecture that learning theories predict that eventually both players play F in every period. But, strictly speaking, the players are engaged, not in the Prisoner’s Dilemma (G), but in the repeated Prisoner’s Dilemma (Γ). As discussed in Example 47, in the repeated game it is rationalizable, for a discount factor close enough to 1, for players to play C in some or even all periods along the path of play. So it is possible for a learning theory to forecast play of C even though C is not rationalizable in G, the Prisoner’s Dilemma played only once.

9.4.3 Learning, Nash equilibrium, and correlated equilibrium.

Learning theories typically do not guarantee convergence to Nash equilibrium. A more typical result is that play converges to Nash equilibrium in some situations but not in others. Convergence to play of a strict Nash equilibrium (“strict Nash equilibrium” was defined in Section 9.4.1) is usually fairly easy to obtain, but play can fail to converge, even in a weak sense, to that of a mixed equilibrium.

Convergence to correlated equilibrium is somewhat easier to obtain. Foster and Vohra (1997), Fudenberg and Levine (1999), and Hart and Mas-Colell (2000) provide related but distinct learning theories in which the empirical distribution of play eventually converges to the set of correlated equilibrium distributions, ΣCorr. (The empirical distribution can wander around within ΣCorr, without ever settling down.) See also Hart and Mas-Colell (1999).

Here are a few additional comments.

1. As already noted in connection with rationalizability (Section 9.4.2), learning theories embed the original game G in a larger game Γ, and convergence must be understood as being with respect to equilibria of Γ, not necessarily to equilibria of G. I mention two consequences of this.

First, depending on one’s perspective (G or Γ), play can look correlated. For example, suppose that two players play Battle of the Sexes (Example 1) repeatedly. Then one possible outcome is that play alternates between the two pure Nash equilibria, with (a, a) played in even periods and (b, b) played in odd periods. The empirical distribution over actions in the original game is then the one given in Figure 34. That is, the players are actually engaged in a Nash equilibrium of Γ but an observer sees a correlated distribution of G.


Second, there may be Nash equilibria of Γ in which the outcomes look completely unlike anything that could arise as an equilibrium outcome of G. The standard example is the repeated Prisoner’s Dilemma. As discussed in Example 48, there may be equilibria in which players play C in every period along the path of play, whereas only F is played in the unique Nash equilibrium of the stage game.

2. As illustrated in Section 9.4.1, if one’s standard of convergence is that the empirical frequency of play converges to that of a Nash equilibrium then it is possible to get convergence to a mixed strategy Nash equilibrium without the players actually playing mixed strategies. This sort of phenomenon, in which play looks at least somewhat random to an outside observer even though the players are not, strictly speaking, randomizing, shows up in various guises throughout the learning literature. For this reason, I find the classical interpretation of a mixed strategy, as representing explicit randomization, needlessly limiting.

3. Learning theories that exhibit global convergence to Nash or at least correlated equilibrium typically model players as unsophisticated. Learning theories with sophisticated players (Bayesian optimizing players) have, to date, been able to establish global convergence only at the price of imposing strong, equilibrium or equilibrium-like, assumptions at the outset. Thus, for example, there is a learning literature with sophisticated players that gets a form of global convergence but that assumes that players are, before play begins, already in equilibrium within a repeated game of incomplete information, in which players know their own payoff function for the stage game, but not the payoff functions of their opponents. The difficulty in establishing global convergence in learning theories with sophisticated players may be related to the inconsistency, discussed in Remark 14, inherent in Bayesian theories of learning in repeated games.

4. In games with non-trivial extensive forms, Nash equilibrium requires that players behave as if they were best responding to their opponents’ extensive form strategies. This is extremely demanding because it requires that forecasts be correct not only at information sets along the path of play (at which players may already have observations from previous experiences with the game) but also at information sets off the path of play (at which players may have few if any observations).

An alternative, weaker, equilibrium concept called self-confirming equilibrium requires that players behave as if they were best responding to forecasts that were correct along the path of play but not necessarily off the path of play. Note that any Nash equilibrium is a self-confirming equilibrium, but not necessarily vice versa. Fudenberg and Levine (1993b) provides an example of a


learning theory that yields convergence to self-confirming equilibrium but not necessarily to Nash equilibrium. For two-player games, the distinction between Nash equilibrium and self-confirming equilibrium is largely irrelevant: if one looks at distributions over terminal nodes (which is all an observer typically sees) then any distribution that can be generated by a self-confirming equilibrium can also be generated by a Nash equilibrium. But this is no longer true for games with more than two players. There exist simple three-player examples in which there is a self-confirming equilibrium that reaches a terminal node that could not have been reached under any Nash equilibrium. See Fudenberg and Levine (1993a).20

9.5 Empirical Evidence.

The empirical evidence on game theory is drawn partly from laboratory experiments and partly from real world data. The literature is vast and growing rapidly. Surveys include Roth (1995) and Crawford (1997). My discussion is brief.

1. The evidence is, as one would expect, consistent with players learning over time.

2. There is evidence that, in at least some games, players carry out one or perhaps two rounds of strict dominance deletion before choosing their own action. The evidence is not consistent with most players carrying out more than two rounds of strict dominance deletion. But there is evidence of eventual convergence to SR as players grow more experienced.

3. There is a large body of evidence that players persistently play weakly dominated strategies. Much of this evidence is associated with a stylized bargaining game called the ultimatum game. There is continuing controversy over exactly how this evidence should be interpreted.

4. In games like Matching Pennies in which the unique equilibrium involves randomization, if convergence to Nash equilibrium obtains at all it is typically in the weak sense discussed in connection with fictitious play (Section 9.4.1).

5. The evidence suggests convergence to Nash, or at least correlated, equilibrium in many games and settings but also failure of convergence in other games and other settings. Some of the debate in this literature has hinged on the correct definition of “converge.” Papers that define convergence using empirical frequencies sometimes find evidence of correlation, which is consistent with the discussion in Section 9.4.3.

20 Solution concepts similar to self-confirming equilibrium have been proposed, in different contexts, by a number of authors. In addition to Fudenberg and Levine (1993a), which coined the term, see Hahn (1977) and Kalai and Lehrer (1993).


Naively, one would expect that Nash or correlated equilibrium would be a good predictor of long run behavior in simple games but possibly not in complicated games. But Nash equilibrium sometimes fails to predict behavior well even in games that, one would think, are simple. Conversely, there is real world evidence from auctions, which are very complicated games, that players execute strategies that are consistent with equilibrium. The classic cite here is Hendricks and Porter (1988).


References

Aumann, R. (1964): “Mixed and Behaviour Strategies in Infinite Extensive Games,” in Advances in Game Theory, ed. by M. Dresher, L. S. Shapley, and A. W. Tucker, pp. 627–650. Princeton University Press, Princeton, NJ, Annals of Mathematics Studies, 52.

(1974): “Subjectivity and Correlation in Randomized Strategies,” Journal of Mathematical Economics, 1, 67–96.

(1976): “Agreeing to Disagree,” Annals of Statistics, 4, 1236–1239.

(1987): “Correlated Equilibrium as an Expression of Bayesian Rationality,” Econometrica, 55, 1–18.

Baye, M., G. Tian, and J. Zhou (1993): “Characterizations of the Existence of Equilibria in Games with Discontinuous and Non-quasiconcave Payoffs,” Review of Economic Studies, 60, 935–948.

Bernheim, B. D. (1984): “Rationalizable Strategic Behavior,” Econometrica, 52(4), 1007–1028.

Binmore, K., and L. Samuelson (1999): “Evolutionary Drift and Equilibrium Selection,” Review of Economic Studies, 66(2), 363–393.

Binmore, K. G. (1987): “Modeling Rational Players, Part I,” Economics and Philosophy, 3, 179–214.

Borgers, T., and L. Samuelson (1992): “‘Cautious’ Utility Maximization and Iterated Weak Dominance,” International Journal of Game Theory, 21, 13–25.

Brandenburger, A. (1992): “Knowledge and Equilibrium in Games,” Journal of Economic Perspectives, 6(4), 83–101.

Brandenburger, A., and E. Dekel (1993): “Hierarchies of Beliefs and Common Knowledge,” Journal of Economic Theory, 59, 189–198.

Canning, D. (1992): “Rationality, Computability, and Nash Equilibrium,” Econometrica, 60, 877–888.

Crawford, V. (1997): “Theory and Experiment in the Analysis of Strategic Interaction,” in Advances in Economics and Econometrics: Theory and Applications, ed. by D. M. Kreps, and K. F. Wallis, vol. 1, chap. 7. Cambridge University Press, Cambridge, UK.

Dekel, E., and F. Gul (1997): “Rationality and Knowledge in Game Theory,” in Advances in Economics and Econometrics: Theory and Applications, ed. by D. M. Kreps, and K. F. Wallis, vol. 1, chap. 5. Cambridge University Press, Cambridge, UK.

Foster, D., and R. Vohra (1997): “Calibrated Learning and Correlated Equilibrium,” Games and Economic Behavior, 21, 40–55.

Fudenberg, D., and D. Levine (1993a): “Self-Confirming Equilibrium,” Econometrica, 61(3), 523–545.

(1993b): “Steady State Learning and Nash Equilibrium,” Econometrica, 61(3), 547–574.

(1998): Theory of Learning in Games. MIT Press, Cambridge, MA.

(1999): “Conditional Universal Consistency,” Games and Economic Behavior, 29, 104–130.

Fudenberg, D., and J. Tirole (1991): Game Theory. MIT Press, Cambridge, MA.

Govindan, S., and A. McLennan (1998): “On the Generic Finiteness of Equilibrium Outcome Distributions in Game Forms,” University of Minnesota.

Hahn, F. (1977): “Exercises in Conjectural Equilibrium,” Scandinavian Journal of Economics, 79, 210–226.

Harsanyi, J. (1967): “Games with Incomplete Information Played by Bayesian Players, Parts I, II, III,” Management Science, 14, 159–182, 320–334, 486–502.

Harsanyi, J., and R. Selten (1988): A General Theory of Equilibrium Selection in Games. MIT Press, Cambridge, MA.

Hart, S., and A. Mas-Colell (1999): “A General Class of Adaptive Strategies,” The Hebrew University of Jerusalem.

(2000): “A Simple Adaptive Procedure Leading to Correlated Equilibrium,” Econometrica, 68(5), 1127–1150.

Heifetz, A. (1999): “How Canonical is the Canonical Model? A Comment on Aumann’s Interactive Epistemology,” International Journal of Game Theory, 28, 435–442.

Hendricks, K., and R. Porter (1988): “An Empirical Study of an Auction with Asymmetric Information,” American Economic Review, 78(5), 865–883.

Kalai, E., and E. Lehrer (1993): “Subjective Equilibrium in Repeated Games,” Econometrica, 61(5), 1231–1240.


Kuhn, H. W. (1964): “Extensive Games and the Problem of Information,” in Contributions to the Theory of Games, Volume II, ed. by M. Dresher, L. S. Shapley, and A. W. Tucker, pp. 193–216. Princeton University Press, Annals of Mathematics Studies, 28.

Mailath, G. (1998): “Do People Play Nash Equilibrium? Lessons from Evolutionary Game Theory,” Journal of Economic Literature, Forthcoming.

Mas-Colell, A., M. D. Whinston, and J. R. Green (1995): Microeconomic Theory. Oxford University Press, New York, NY.

Milgrom, P., and R. Weber (1985): “Distributional Strategies for Games with Incomplete Information,” Mathematics of Operations Research, 10, 619–632.

Nachbar, J. H. (1997): “Prediction, Optimization, and Learning in Repeated Games,” Econometrica, 65, 275–309.

Nash, J. F. (1950): “Equilibrium Points in n-Person Games,” Proceedings of the National Academy of Sciences, 36, 48–49.

Osborne, M., and A. Rubinstein (1994): A Course in Game Theory. MIT Press, Cambridge, MA.

Pearce, D. G. (1984): “Rationalizable Strategic Behavior and the Problem of Perfection,” Econometrica, 52(4), 1029–1049.

Reny, P. J. (1999): “On the Existence of Pure and Mixed Strategy Nash Equilibria in Discontinuous Games,” Econometrica, 67(7), 1029–1056.

Roth, A. E. (1995): “Introduction to Experimental Economics,” in Handbook of Experimental Economics, ed. by J. H. Kagel, and A. E. Roth. Princeton University Press, Princeton, NJ.

Samuelson, L. (1997): Evolutionary Games and Equilibrium Selection. MIT Press, Cambridge, MA.

Schelling, T. (1960): The Strategy of Conflict. Harvard University Press, Cambridge, MA.

Shapley, L. (1962): “On the Nonconvergence of Fictitious Play,” Discussion Paper RM-3026, RAND.

Simon, L., and W. Zame (1990): “Discontinuous Games and Endogenous Sharing Rules,” Econometrica, 58, 861–872.

Sion, M., and P. Wolfe (1957): “On a Game Without a Value,” in Contributions to the Theory of Games III. Princeton University Press.


Weibull, J. (1995): Evolutionary Game Theory. MIT Press, Cambridge, MA.

Young, H. P. (1998): Individual Strategy and Social Structure: An Evolutionary Theory of Institutions. Princeton University Press, Princeton, NJ.
