

GAMES AND ECONOMIC BEHAVIOR 15, 82–109 (1996) ARTICLE NO. 0060

Reputation in Repeated Games with No Discounting

Joel Watson∗

Department of Economics, University of California, San Diego, La Jolla, California 92093-0508

Received September 19, 1994

I study two-player, infinitely repeated games with no discounting. I examine how perturbations afford players opportunities to establish reputations and I determine how potential reputations lead to outcome selection in both equilibrium and non-equilibrium settings. The main result is the following. Assume that players have beliefs of countable support, players only adopt “forgiving strategies,” and players best-respond to their beliefs. (A forgiving strategy does not punish an opponent forever.) Then if the game is perturbed, each player expects to fare at least as well as if she selected any of her perturbation strategies and the opponent played a best-response. Journal of Economic Literature Classification Numbers: C72, C73. © 1996 Academic Press, Inc.

In infinitely repeated games, as numerous folk theorems attest, almost all types of behavior can be supported by a rationality or equilibrium notion. Toughness, fairness, long-run cooperation, and failure of cooperation can all result. Often, though, some types of behavior are intuitively more appealing than others. For example, one often expects long-run cooperation in the repeated prisoners’ dilemma when the discount factor is close to one. An important factor in determining the outcome of a repeated game is that players may build reputations for certain types of behavior. In practice, and evident in daily life, reputations can shape the form of ongoing interaction. Furthermore, the merit of reputations is supported theoretically by several recent studies of repeated games of incomplete information. However, the theoretical work to date (which I discuss below) focuses on only specific classes of games.

In this paper I study reputation in general repeated games. First I present simple intuition about how players may take advantage of opportunities to establish

∗ This is the fourth chapter of my doctoral dissertation (Stanford Graduate School of Business, 1992). I am grateful to David Kreps for guidance, and to Drew Fudenberg and several anonymous referees for comments. The referees called my attention to some errors in earlier drafts of the paper.

0899-8256/96 $18.00 Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.


reputations. I highlight the role that incomplete information plays in motivating long-run cooperation, and also identify potential conflicts that result if players can build reputations for being very “demanding.” I then examine a class of strategy-perturbed supergames in an attempt to formally represent the simple intuition previously outlined. The games are perturbed in that with some small probability a player is forced to adopt a randomly selected “perturbation strategy.”

As I show, there are three essential ingredients in the development of viable reputations in general repeated games. First, and on which I focus, effective reputations rely on the use of forgiving strategies. Roughly, these are strategies that do not “punish” an opponent forever. In order for reputations to be productive, players must be willing to test each other to discover each others’ strategies as the game progresses. A player need not fear infinite retaliation in response to her inquisitiveness, as long as her opponent uses a forgiving strategy. In short, forgiveness encourages players to learn about each other, and in such a setting reputations are most easily established.

The second essential ingredient is that the game have a long horizon that the players value. Establishing a reputation may take some time and a player will be interested in doing so only if he discounts the future sufficiently little. Then the long-term benefit of establishing a reputation outweighs the short-term cost. Finally, the third ingredient in the formation of a reputation is that the game be perturbed with strategies that the players would like to build a reputation for playing. The perturbation, in short, is the seed from which reputations spring.

I demonstrate that, contrary to what one might expect, meaningful reputations arise outside the context of equilibrium. In fact, my main result is in terms of a notion of rational behavior that is similar to rationalizability. My work therefore continues in developing the theme of Watson (1993) concerning refinements without equilibrium.1 I also prove stronger results for equilibrium, where the requirements for successful reputations may be relaxed a bit. For example, if one accepts the notion of pure strategy ε-equilibrium as the basis for analysis, then the assumption that players adopt forgiving strategies may be dropped as long as all of the perturbation strategies are forgiving. Finally, I complete the analysis by studying the middle ground between rationalizability and equilibrium. Here I characterize both beliefs and actual payoffs of the game when the conjectures of the players contain a “grain of truth” (Kalai and Lehrer, 1993).

My main result goes as follows. Consider non-discounted games, which most conveniently satisfy the requirement of a long horizon. Assume that players believe the following: players have conjectures of countable support, players only adopt forgiving strategies, and players choose ε-best responses to their conjectures. These assumptions imply lower bounds on the expected payoffs of the players, and guarantee that players can build reputations which correspond

1 Others have also shown that rationalizability can be fruitfully employed. Cho (1994) is a good, recent example.


to perturbation strategies. Formally, if the game is perturbed, each player expects to fare at least as well as if she selected any of the perturbation strategies and the opponent played a γ-best response; γ is directly proportional to ε and inversely proportional to the probability of the chosen perturbation strategy.

I demonstrate that in many games there may be a conflict when both players seek to establish reputations. It is possible that the players believe they will individually achieve payoffs that together are not feasible in the game. Each player believes that he will convince the other of his “identity” before the other convinces him. Such beliefs are consistent with rationalizability but may imply non-existence of the “simple” equilibria I study. Consequently, I argue that we must either expect more complicated equilibrium behavior or embrace the less stringent notion of rationalizability.

My work herein is most closely related to the study of Aumann and Sorin (1989), who show that in repeated games of “common interest,” with attention restricted to pure strategy equilibria, perturbations consisting of bounded recall strategies induce cooperation in the limit. My results concerning pure strategy equilibria are essentially generalizations of the result of Aumann and Sorin to cover general games.2 On the other hand, my results in the non-equilibrium setting should be viewed as complementary to the findings of Aumann and Sorin, since I drop the equilibrium assumption but insist on the use of forgiving strategies. For other specialized extensions of Aumann and Sorin (1989) in the context of discounting, see Watson (1994a) and Takahashi (1992). These papers show that much of the equilibrium analysis herein applies as well to discounted games. In fact, my equilibrium results hold with minor modifications, as long as the discount factor is close to one. Unfortunately, extending the non-equilibrium results to the case of discounting seems more problematic and involves some technical hurdles.

Other related work includes that of Fudenberg and Levine (1989), who provide the definitive analysis of how incomplete information refines the equilibrium outcomes in games in which a single long-run player faces an infinite sequence of short-run opponents. They focus on the long-run player’s ability to establish a reputation as the “Stackelberg type” (to the long-run player’s benefit). More relevant to my analysis is the work of Schmidt (1993), who also examines reputation in two-player, infinitely repeated games. His inquiry follows the logic of Fudenberg and Levine (1989) in that he explores the opportunity for players to mimic the play of the “Stackelberg type.” Schmidt analyzes games of “conflicting interest,” in which a player’s Stackelberg strategy holds the opponent to her minimal individually rational payoff. Note, therefore, that his work cannot address long-run cooperation of the type exhibited in, say, the repeated prisoners’

2 My characterization of conflicts that may develop when players seek to establish reputations fits nicely with the model of Aumann and Sorin, since the class of common interest games is precisely that in which such conflicts can never arise.


FIG. 1. A simple common interest game.

dilemma. In addition, Schmidt’s results require one player to be arbitrarily more patient than the other.

Before plunging ahead, I should remind the reader of the seminal theories of reputation which have inspired much of the recent work on the subject. Kreps and Wilson (1982) and Milgrom and Roberts (1982) demonstrate that slight amounts of incomplete information can have momentous effects on the equilibrium of Selten’s chain-store game. In their analysis, the incumbent firm wishes to establish a reputation for “toughness” because it deters entry. These authors also show that incomplete information leads to substantial cooperation in the finitely repeated prisoners’ dilemma. In both cases, incomplete information (and reputations) leads to intuitive equilibria that do not exist in the games’ complete information counterparts. Note that incomplete information in infinitely repeated games has quite the opposite effect; it refines the set of equilibrium (or rationalizable) outcomes.

1. INTUITION

Suppose there are two people who, in each of an infinite number of periods, play the common interest game depicted in Fig. 1. Might the players get stuck in the routine of, say, playing (D, D) each period? Perhaps. Nevertheless, there may be a way for one of the players to signal his intention to cooperate with the other. One signal could be to take action C repetitively in an attempt to build a reputation as someone who always plays C. It seems clear enough that if a player gestured in this way, there is a chance that the opponent would understand and they would coordinate on (C, C) thereafter. In order for the opponent to accept such a signal, though, she must believe with some positive probability at the beginning of the game that the other player will behave this way. That is, she must believe that there is a chance that her opponent always plays C.
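This signaling story can be sketched in code. The payoff numbers and strategy names below are hypothetical (the matrix of Fig. 1 is not reproduced in this transcript); the sketch only illustrates how repeated play of C can move a receptive opponent out of the (D, D) routine.

```python
# Hypothetical common interest payoffs: (C, C) is strictly best for both
# players, while (D, D) is an inferior routine they might be stuck in.
PAYOFF = {("C", "C"): (9, 9), ("C", "D"): (0, 8),
          ("D", "C"): (8, 0), ("D", "D"): (7, 7)}

def always_C(history):
    """The reputation being signaled: play C every period."""
    return "C"

def make_receptive(threshold):
    """An opponent who plays D until she has observed `threshold`
    consecutive C's, then coordinates on C thereafter (she assigns
    positive probability to the 'always C' type)."""
    def strategy(history):
        run = 0
        for _, their_action in history:  # history from her own viewpoint
            run = run + 1 if their_action == "C" else 0
        return "C" if run >= threshold else "D"
    return strategy

def play(s1, s2, periods):
    """Generate the path of action profiles; each player sees the history
    with her own action listed first."""
    h1, h2, path = [], [], []
    for _ in range(periods):
        a1, a2 = s1(h1), s2(h2)
        h1.append((a1, a2))
        h2.append((a2, a1))
        path.append((a1, a2))
    return path

path = play(always_C, make_receptive(3), 10)
# Three periods of costly signaling, then coordination on (C, C).
```

The three-period threshold is arbitrary; the point is that a finite signaling phase suffices once the opponent entertains the "always C" hypothesis.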

Another requirement for the signal to operate is that players have some incentive to signal. Think of two costs associated with signaling one’s opponent. First, there is an immediate cost—the short-run loss of playing what might not be an optimal response to the opponent’s expected actions in the next few periods. Second, there is a strategic cost which may arise if the opponent’s long-run


FIG. 2. Another common interest game.

FIG. 3. Prisoners’ dilemma.

response to the signal is adverse. As an illustration of these two costs, take the common interest game pictured in Fig. 2. Suppose the players somehow expect to coordinate on the action profile (M, M) each period. One of the players may wish to signal his willingness to coordinate on (U, U) by playing U for a few periods. Since he expects that the opponent will be playing M in the short run (before being convinced of his intentions), the immediate cost of this behavior is three units per period. Furthermore, suppose the opponent does not understand the signal and punishes this behavior by playing D thereafter. This response entails a long-run loss of one unit each period in perpetuity. Either or both of these costs might discourage an attempt at signaling one’s willingness to cooperate.

The affair is more elaborate in general games, because players may wish to establish reputations for using more complex strategies. Consider the prisoners’ dilemma game in Fig. 3. In this game, players do not want to build reputations for always choosing the same action, because the optimal response would be to play D. Rather, a player may wish to convince the opponent that she is playing a strategy that conditions its action on past play. One such strategy is tit-for-tat (TFT), which starts with action C and each subsequent period chooses the action that the opponent played in the previous period. Notice that a player cannot herself signal that she has adopted the TFT strategy, because in order to do so she must condition her moves on the opponent’s past play, which may be unrevealing indeed. For instance, the opponent might be playing D each period. Then the opponent will not be able to distinguish TFT from, say, the strategy that plays C in the first period and D thereafter.
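The indistinguishability point can be checked directly. This sketch (the helper names are mine, not the paper's) plays both tit-for-tat and the strategy "C in the first period, D thereafter" against an opponent who always plays D; the two generate identical action sequences, so that opponent learns nothing that separates them.

```python
def tit_for_tat(history):
    """Start with C; thereafter copy the opponent's previous action.
    history is a list of (own action, opponent action) pairs."""
    return "C" if not history else history[-1][1]

def c_then_d(history):
    """Play C in the first period and D forever after."""
    return "C" if not history else "D"

def path_against(strategy, opponent_action, periods):
    """Actions the strategy takes when the opponent plays a constant action."""
    h = []
    for _ in range(periods):
        h.append((strategy(h), opponent_action))
    return [own for own, _ in h]

seq_tft = path_against(tit_for_tat, "D", 6)
seq_cd = path_against(c_then_d, "D", 6)
# Both produce C, D, D, D, D, D against an all-D opponent.
```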

In order for reputations to develop in general games, players must be willing to probe their opponents to learn about the strategies they are playing. (In the


prisoners’ dilemma, in order for a player to establish a reputation for using the TFT strategy, her opponent must be willing to investigate this possibility by playing C for at least a few periods.) As before, a player will be willing to test his opponent for a certain strategy only if he conjectures that there is a chance that the opponent is actually playing the strategy. The costs of probing to determine the opponent’s strategy are characterized as before. There may be an immediate loss involved in a testing procedure, as well as a strategic loss if the testing procedure induces a poor response. Under favorable conditions, meaningful reputations are possible. In this paper, I study non-discounted games, which alleviates immediate costs; I allow only “forgiving strategies,” which minimizes strategic costs (because players will not punish one another forever); and I consider perturbed games, which guarantees that players assign positive probability to some strategies.

2. DEFINITIONS

Take as the stage game any two-player, normal form game G = {A1, A2; u1, u2}. Player 1’s action space is A1 and player 2’s action space is A2. Assume that A1 and A2 each contain at least two elements. Let A ≡ A1 × A2 be the space of action profiles. I will sometimes represent a given action profile a ∈ A as (ai, aj) for i = 1, 2, and j = 3 − i. (Where it is not indicated, it should be understood that given a player i, j = 3 − i is player i’s opponent.) The stage game payoff of player i (i = 1, 2) is given by the bounded function ui : A → R. Let zl and zh be numbers such that zl < ui(a) < zh for i = 1, 2 and for all a ∈ A.

The supergame (repeated game) involves play of G each period n, n = 1, 2, . . . . At each stage the players can condition their actions on the entire past history of play. Let Hn ≡ A^n denote the set of possible histories through period n, and let H ≡ ⋃_{n=0}^∞ Hn be the set of all possible (finite) histories. Note that H0 ≡ ∅ represents the “history” at the start of the game. I will later want to append histories. If h ∈ Hn and h′ ∈ Hm, let hh′ denote the (m + n)-period history consisting of the action profile sequence h followed by the sequence h′.

A pure supergame strategy for player i is a function si : H → Ai which specifies the action player i will take contingent on the history of play. Given strategy si, player i’s continuation strategy from history h is si^h : H → Ai, where si^h(h′) ≡ si(hh′) for all h′ ∈ H. Let Si be the set of supergame strategies for player i, and let S ≡ S1 × S2 be the set of strategy profiles. Given a strategy profile s, the infinite history induced by s, {an(s)}∞n=1, is defined inductively:

a1(s) ≡ (s1(∅), s2(∅)),

and for all n > 1,

an(s) ≡ (s1(a1, a2, . . . , an−1), s2(a1, a2, . . . , an−1)).

I will refer to this infinite history as the (outcome) path generated by profile s. A finite history h ∈ Hn is consistent with strategy profile s if (a1(s), a2(s), . . . , an(s)) = h. Furthermore, a history h ∈ H is consistent with player i’s strategy si if there is a strategy sj such that h is consistent with s = (si, sj). For each h ∈ H let SCi(h) be the set of strategies for player i that are consistent with h.
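The inductive definition of the path {an(s)} translates directly into code. This is a minimal sketch (the example strategies are hypothetical): each period's profile is obtained by applying the two strategy functions to the finite history built so far.

```python
def path(s1, s2, periods):
    """Build (a1(s), ..., a_periods(s)) for the profile s = (s1, s2).
    A history is a list of action profiles; the empty list plays the
    role of the initial history H0."""
    history = []
    for _ in range(periods):
        history.append((s1(history), s2(history)))
    return history

# Hypothetical strategies: player 1 alternates C and D; player 2 copies
# player 1's previous action (starting with C).
alternate = lambda h: "C" if len(h) % 2 == 0 else "D"
copy_p1 = lambda h: "C" if not h else h[-1][0]

print(path(alternate, copy_p1, 4))
```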

Consider the non-discounted supergame, which shall be denoted G∞. I will adopt the limit of means evaluation criterion and define player i’s payoff by

Ui(s) ≡ lim inf_{N→∞} (1/N) ∑_{n=1}^{N} ui(an(s)).

Note that, with this criterion, payoffs over any finite time interval have no effect. This payoff function is extended to the space of independent mixed strategies of countable support by an expected payoff calculation. Formally, for any measurable set X, let ΔcX denote the set of probability measures over X of countable support. Furthermore, for any σj ∈ ΔcSj, let supp(σj) denote the support of σj. Then, given a mixed-strategy profile σ ∈ ΔcS1 × ΔcS2,

Ui(σ) ≡ ∑_{s1∈supp(σ1)} ∑_{s2∈supp(σ2)} σ1(s1)σ2(s2)Ui(s).

A (first-order) conjecture of player i, πj, is some probability measure over player j’s strategy space. I will consider only conjectures of countable support, so that πj ∈ ΔcSj. By a slight abuse of notation, I will let si ∈ Si also denote the distribution that assigns all probability to pure strategy si. Let U*i(πj) ≡ sup_{si∈Si} Ui(si, πj) be player i’s supremum expected payoff when he holds conjecture πj.3 Given this conjecture and any ε > 0, let

BRi^ε(πj) ≡ {si ∈ Si | Ui(si, πj) > U*i(πj) − ε}

be the set of ε-best responses. Finally, let

wi(si; ε) ≡ inf_{sj∈BRj^ε(si)} Ui(si, sj)

be the greatest lower bound on the payoff of player i if she adopts strategy si and the opponent plays an ε-best response. Loosely speaking, wi(si; ε) is the value to player i of effectively establishing a reputation for using strategy si.
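The key feature of the limit-of-means criterion — that any finite prefix of play has no effect on the payoff — can be illustrated numerically. The stage payoffs below are hypothetical, and the lim inf is approximated by a long finite average.

```python
def average_payoff(prefix, cycle, horizon=100_000):
    """Approximate lim inf (1/N) * sum of stage payoffs for a stream
    that plays out `prefix` once and then repeats `cycle` forever.
    (For an eventually periodic stream the limit is the cycle average.)"""
    total = 0.0
    for n in range(horizon):
        if n < len(prefix):
            total += prefix[n]
        else:
            total += cycle[(n - len(prefix)) % len(cycle)]
    return total / horizon

# A five-period "signaling phase" with stage payoff 0, followed by
# cooperation worth 9 per period, is worth (essentially) 9: the
# immediate cost of signaling washes out under limit of means.
v_signal = average_payoff([0, 0, 0, 0, 0], [9])
v_never = average_payoff([], [9])
```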

Player i’s infimum individually rational payoff is

u_i ≡ inf_{aj∈Aj} sup_{ai∈Ai} ui(ai, aj).

3 Contrary to the case of discounting, one can show that a best response may not exist in the non-discounted supergame, even if the stage game is finite. I am grateful to David Kreps for a counterexample.


A payoff x ∈ R2 of the supergame is considered strictly individually rational if and only if xi > u_i, for i = 1, 2.

Consider strategy perturbations of this supergame. Suppose that, before the game is played, nature may perturb the players’ intended supergame strategies. With probability βi ∈ [0, 1] player i’s intended strategy is replaced with some perturbation strategy which player i is constrained to play in the supergame. When player i’s strategy has been perturbed, nature selects i’s strategy according to the distribution µi ∈ ΔcSi (of countable support). This perturbation selection is independent between the players and is private information. That is, at the start of the game, player j does not know whether player i’s strategy has been perturbed, although player j may deduce this during the course of play. I will designate this perturbed game as G∞(β, µ), where β ≡ (β1, β2) and µ ≡ (µ1, µ2). Note that since βi = 0 is allowed, both the one- and two-sided incomplete information cases are covered here.

We can also think of the perturbed game as a Bayesian game in which, before the supergame is played, nature selects each player’s type. With probability (1 − βi) player i is rational and with probability βi player i is a “perturbed player” who selects her strategy according to µi. The perturbed type might be thought of as “crazy” (irrational) or as a player whose supergame payoffs, which are different than those defined above, justify playing µi.4 I will draw from the Bayesian interpretation periodically. For example, when a player’s strategy is perturbed, it will be convenient to speak as though this player “is” a perturbation strategy. Note also the following convention. Unless otherwise noted, when I speak of “payoffs” in the perturbed game, I am referring to the payoffs of the rational (unperturbed) players.

I shall adopt a very weak notion of rationality in this setting—a notion that is quite similar to rationalizability (Bernheim, 1984; Pearce, 1984). First, as I have already imposed, players are assumed to hold conjectures of countable support. That is, each player assigns positive probability to only a countable number of her opponent’s pure strategies. This assumption is made not only for technical simplicity, but it is also consequential to my results. Second, agents are assumed to play ε-best responses to their initial conjectures (at the start of the game). Finally, I assume that each player believes this much about her opponent’s conjectures and behavior.

That players hold conjectures of countable support is stronger in principle than is required by rationalizability. However, the other assumptions are weaker. Recall that rationalizability imposes that best-response behavior is common knowledge. Here I only require each player to believe that his opponent will play an ε-best response to her conjecture. Whereas I only assume one iteration of

4 Contrast this with the Bayesian framework of Fudenberg and Levine (1989) in which the individual stage game payoffs are uncertain. See Watson (1994b) for a discussion of the interpretation of perturbations.


“best response beliefs,” rationalizability demands an infinite number of such iterations.

Consider, in addition to the rationality assumptions, a restriction on the type of strategies that players are assessed by their rivals to have chosen.

DEFINITION 1. si is a forgiving strategy if for all h ∈ H, U*j(si^h) = U*j(si).

Let Fi be the set of forgiving strategies for player i and let F ≡ F1 × F2.

Loosely speaking, a forgiving strategy is one that will not punish an opponent (or “hold a grudge”) forever. To illustrate, suppose that player j plays some forgiving strategy sj, and that player i knows this. Further suppose that player i’s supremum expected payoff against sj is x. Imagine that player i selects some arbitrary finite sequence of actions from the start of the game, through period n. Since sj is forgiving, player i can then find a finite sequence of actions (from period n + 1 through period n + m say) such that x is the supremum expected payoff from the continuation game starting at period n + m + 1. Because no finite time matters in computing supergame payoffs, any strategy that eventually (in finite time) plays an ε-best response to sj will be an ε-best response from the start of the game.
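A concrete contrast may help: in the prisoners' dilemma, tit-for-tat is forgiving in this spirit (cooperation can always be restored), while grim trigger is not (after one defection it punishes forever, permanently degrading the opponent's best continuation payoff). The sketch below, with hypothetical probe sequences, records each strategy's reaction to a single defection followed by contrition.

```python
def tit_for_tat(history):
    """Forgiving in spirit: copies the opponent's last action, so
    cooperation can always be restored.
    history = [(own action, opponent action), ...]"""
    return "C" if not history else history[-1][1]

def grim_trigger(history):
    """Unforgiving: defects forever after any opponent defection."""
    return "D" if any(opp == "D" for _, opp in history) else "C"

def reaction_to_probe(strategy, opponent_actions):
    """The strategy's actions when the opponent probes with the
    given action sequence."""
    h, out = [], []
    for a_opp in opponent_actions:
        a = strategy(h)
        out.append(a)
        h.append((a, a_opp))
    return out

probe = ["D", "C", "C", "C", "C"]  # one defection, then contrition
# Tit-for-tat punishes once and then relents; grim trigger holds
# the grudge forever.
```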

These behavioral assumptions do not restrict the payoffs that are possible in the supergame. To convince you of this, I offer the following “folk” theorem (see Rubinstein, 1979; Fudenberg and Levine, 1986), whose simple proof is in the appendix.

THEOREM 1. In a supergame G∞ any feasible, strictly individually rational payoff vector can be supported by a subgame perfect equilibrium in which both players adopt forgiving pure strategies.

One final definition is in order before proceeding to the analysis of this paper. Although my main results are in terms of the weak notion of rational behavior outlined above, useful corollaries exist for equilibrium concepts. Note that in constructing the rationality notion, I focused on pure strategies and on conjectures over countable numbers of pure strategies. In the equilibrium setting, the analog is to consider mixed strategies of countable support. An element of ΔcSi can be thought of as a mixed strategy in the normal form sense, because it is a probability distribution over pure supergame strategies. Formally, this is different than a behavioral strategy, which would specify a mixed action as a function of the history. The mixed strategies of countable support in the normal form sense have the property that after some finite time (specific to each) they behave “almost” like pure strategies.

I will present corollaries to my results for ε-equilibria. Before stating my definition of ε-equilibria for the games studied herein, I must extend some of my definitions to mixed strategies. Given a mixed-strategy profile σ ∈ ΔcS1 × ΔcS2 and a history h ∈ H, σ is consistent with h if there is a strategy profile s in the support of σ that is consistent with h. Furthermore, if σ is consistent with h, σ^h shall denote the continuation mixed strategy after history h. Mathematically speaking, for any Ti ⊂ Si and h ∈ H, let CSi^h(Ti) ≡ {si ∈ SCi(h) | si^h ∈ Ti} be the set of strategies that are consistent with h and induce continuation strategies in Ti. Then σ^h is defined as follows. For any h ∈ H such that σ is consistent with h, σi^h(Ti) ≡ σi(CSi^h(Ti))/σi(SCi(h)), for every Ti ⊂ Si.

My equilibrium definition follows in the spirit of Radner’s (1980) definition and is basically identical to that found in Watson (1994a). Formally, take a perturbed supergame G∞(β, µ) and some σ ∈ ΔcS1 × ΔcS2. The distribution σi is a mixed strategy for the rational player i, i = 1, 2. Given the perturbation, player i effectively faces the mixed strategy defined by θj ≡ (1 − βj)σj + βjµj. The profile σ is an ε-equilibrium in the perturbed game if for all h ∈ H consistent with (σi, θj) and all si ∈ supp(σi^h), si ∈ BRi^ε(θj^h), for i = 1, 2. That is, after each history along the equilibrium path, the rational players put positive probability only on strategies that are ε-best responses to their updated conjectures, and their conjectures are correct. This definition should not be confused with that of perfect equilibrium, which requires best response behavior after all histories, not just those along the equilibrium path. When ε = 0 my definition coincides with the definition of Nash equilibrium.

3. PERTURBATION SELECTION

Consider the perturbed supergame. The specification of rational behavior developed in the last section is usefully summarized in the following definition about conjectures and behavior.

DEFINITION 2. Behavior in the perturbed game G∞(β, µ) is said to be type-RF if the rational players believe (know) the following:

(1) Players’ conjectures are consistent with the form of the game (specifically, the perturbation),

(2) Player i’s initial conjecture πj has countable support and assigns positive probability only to player j’s forgiving strategies. That is, πj ∈ ΔcFj, for i = 1, 2 and j = 3 − i.

(3) Players select ε-best responses to their conjectures.

The letters “R” and “F” in the term “type-RF” stand for rationality (that players select ε-best responses) and forgiveness (referring to the strategies assigned positive probability in the players’ conjectures). Since player i knows the form of the game, his conjecture must include the possibility that player j’s strategy is perturbed. That is, πj(sj) ≥ βjµj(sj) for all sj ∈ Sj, i = 1, 2. Type-RF behavior therefore constrains the type of perturbations that can be considered. I have assumed that πj ∈ ΔcFj, which necessitates that µj ∈ ΔcFj as well, whenever βj > 0.


The following theorem demonstrates that the presence of perturbations in the supergame allows players to establish meaningful reputations.

THEOREM 2 (Perturbation Selection). Take any perturbed supergame G∞(β, µ) such that µi ∈ ΔcFi, i = 1, 2. Assume that behavior is type-RF and let π1 and π2 be the initial conjectures of the rational players. If βi > 0 then, for the rational player i,

U*i(πj) ≥ (1 − βj) [ sup_{si∈supp(µi)} wi(si; ε/βiµi(si)) ] + βj zl.

That is, the rational player expects to fare at least as well as if she selected any of the perturbation strategies and the rational opponent played an ε/βiµi(si)-best response.
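As a purely numerical illustration of the bound's shape (every value below is hypothetical, including the function standing in for wi), the right-hand side of Theorem 2 can be computed as follows.

```python
def theorem2_bound(beta_i, beta_j, mu_i, w_i, eps, z_l):
    """Lower bound on U*_i(pi_j):
    (1 - beta_j) * sup over supp(mu_i) of w_i(s_i; eps/(beta_i * mu_i(s_i)))
    plus beta_j * z_l."""
    best = max(w_i(s, eps / (beta_i * mu_i[s])) for s in mu_i)
    return (1 - beta_j) * best + beta_j * z_l

# Hypothetical stand-in for w_i: mimicking "TFT" guarantees about 2.0,
# minus the tolerance gamma the opponent's best response is allowed.
w_i = lambda s, gamma: {"TFT": 2.0, "always_D": 1.0}[s] - gamma

bound = theorem2_bound(beta_i=0.05, beta_j=0.05,
                       mu_i={"TFT": 0.5, "always_D": 0.5},
                       w_i=w_i, eps=0.001, z_l=0.0)
# gamma = 0.001 / (0.05 * 0.5) = 0.04, so the bound is 0.95 * 1.96 = 1.862.
```

Note how the tolerance γ = ε/βiµi(si) grows as the perturbation probability shrinks, weakening the guarantee, exactly as the theorem's statement indicates.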

As the proof of the theorem shows, the presence of a perturbation strategy si gives player i the option of mimicking this strategy. The behavioral assumptions imply that (depending on the values βi, µi(si), and ε) player j will ultimately recognize player i as this perturbation, and will eventually optimize against it. Player i will then enjoy the corresponding payoff.

Most significantly, this is a result that does not rely on any equilibrium assumption. As long as forgiving strategies are employed, players have the opportunity to build significant reputations even from within a general structure comparable to rationalizability. This result therefore adds a new insight to the theses of Aumann and Sorin (1989) and Watson (1994a), whose theorems restrict attention to pure strategy equilibria.

Theorem 2 guarantees that players can forge reputations, and so it bounds their expected payoffs. However, players may think they can fare better in the game without building reputations, and what actually occurs may not be what the players expect. For those who are willing to accept the notion of equilibrium, Theorem 2 translates into a constraint on the actual outcome of the game.

COROLLARY 1. Take any perturbed supergame G∞(β, µ) such that µi ∈ ΔcFi, i = 1, 2. Suppose σ ∈ ΔcF1 × ΔcF2 is a mixed strategy ε-equilibrium for this game. If βi > 0 then the expected payoff of the rational player i in equilibrium is bounded below by

(1 − βj) [ sup_{si∈supp(µi)} wi(si; ε/βiµi(si)) ] + βj zl,

for i = 1, 2.

This corollary also holds for a weaker definition of equilibrium in which players’ strategies are ε-best responses merely from the start of the game.


3.1. Proof of the Perturbation Selection Theorem. The proof of the theorem consists of two intuitive lemmas. The first asserts that if a player holds a conjecture over her opponent’s forgiving strategies, and if the conjecture is of countable support, then the player can optimize against each of the possible strategies to which she assesses positive probability (even though she may not initially know which strategy her opponent is using). That is, by optimizing against her conjecture, the player will optimize against the actual strategy chosen by her opponent—as long as she assigns positive probability to this strategy.

LEMMA 1. Suppose player i holds a conjecture πj ∈ ∆cFj. Then

$$U_i^*(\pi_j) = \sum_{s_j \in \operatorname{supp}(\pi_j)} \pi_j(s_j)\, U_i^*(s_j).$$

Proof. Take any γ > 0. Because πj has countable support, there exists a finite set Tj ⊂ Fj such that πj(Tj) > 1 − γ. One possible strategy for player i is to take some tj ∈ Tj such that U*i(tj) ≥ U*i(sj) for all sj ∈ Tj, and play a γ-best response to tj as long as player j's actions are consistent with tj. This is a "best-case" scenario for player i. If player j never deviates from tj then player i gets at least U*i(tj) − γ. Suppose, however, that player j does deviate in finite time and let h be the history through this time. In this instance, player i knows that player j's strategy is not tj. Furthermore, π^h_j is player i's (Bayesian) updated conjecture over player j's strategies in the continuation game.

If SCj(h) ∩ Tj = ∅ then a history must have occurred to which player i originally assigned less than γ probability. In this case, let player i adopt any continuation strategy whatsoever.

If SCj(h) ∩ Tj ≠ ∅, have player i find a t′j ∈ {t^h_j | tj ∈ SCj(h) ∩ Tj} such that U*i(t′j) ≥ U*i(sj) for all sj ∈ {t^h_j | tj ∈ SCj(h) ∩ Tj}, and let player i play a γ-best response to t′j as long as player j's actions are consistent with t′j. (This is a new best-case scenario.) In the event that player j deviates from t′j, player i continues the process.

Suppose that player j never deviates from some strategy in Tj. Since Tj is finite, player i will eventually (according to her conjecture πj) work her way down her list of scenarios until she is playing a γ-best response to player j's exact continuation strategy. Since player i assigns positive probability only to player j's forgiving strategies, this implies that she is playing a γ-best response to j's actual strategy from the beginning of the game. Notice that it is important that player i "start at the top." For example, if she begins by γ-best-responding to the opponent's worst-case strategy, there is no guarantee that the opponent's best-case strategy will distinguish itself.


Let T̄j ≡ supp(πj) \ Tj. The strategy for player i that I described above yields at least

$$\sum_{s_j \in T_j} \pi_j(s_j)\left[U_i^*(s_j) - \gamma\right] + \sum_{s_j \in \bar T_j} \pi_j(s_j)\, z_l$$

(in expectation), which is bounded below by

$$\sum_{s_j \in \operatorname{supp}(\pi_j)} \pi_j(s_j)\, U_i^*(s_j) - \gamma(1 + z_h - z_l).$$

The value γ is arbitrary, so (letting γ → 0) the supremum expected payoff of player i must be at least as great as

$$\sum_{s_j \in \operatorname{supp}(\pi_j)} \pi_j(s_j)\, U_i^*(s_j).$$

Notice that this is also the most that player i can expect, and so U*i(πj) must equal this value.
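The "work down the list" procedure in this proof can be sketched computationally for a finite candidate set. Below is a minimal illustration (the strategy names and the two-action stage game are hypothetical, not from the paper): candidates are ranked best-case first, and the tracker follows the top-ranked candidate still consistent with observed play, dropping candidates as the opponent's actions refute them.

```python
def consistent(candidate, history):
    """A candidate strategy (a function: history -> action) is consistent
    with an observed action sequence if it would have produced exactly
    those actions along the way."""
    return all(candidate(history[:t]) == history[t] for t in range(len(history)))

def track_opponent(candidates, actual, periods):
    """Follow the proof of Lemma 1: in each period, track the highest-ranked
    candidate still consistent with the opponent's observed play.
    Returns the index of the tracked candidate in each period."""
    history, tracked = [], []
    for _ in range(periods):
        idx = next(k for k, c in enumerate(candidates) if consistent(c, history))
        tracked.append(idx)
        history.append(actual(history))  # opponent acts; the player observes
    return tracked

# Hypothetical candidates, ranked from best-case to worst-case for the player:
always_C = lambda h: "C"
C_then_D = lambda h: "C" if len(h) < 2 else "D"
always_D = lambda h: "D"

# The opponent actually plays always_D; the tracker "starts at the top"
# and works down until it locks onto the true strategy.
print(track_opponent([always_C, C_then_D, always_D], always_D, 4))  # → [0, 2, 2, 2]
```

Because the actual strategy is never refuted, the tracker eventually settles on it forever, which is the sense in which the player ends up best-responding to the exact strategy in play.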

The second lemma uses the first to construct a bound on the actual payoffs of a player who ε-best-responds to her conjecture. The proof directly follows the analysis of Aumann and Sorin (1989), particularly their Lemma 8.2.

LEMMA 2. Suppose that player i holds a conjecture πj ∈ ∆cFj and that player j's actual strategy sj is such that πj(sj) > 0. Then if si ∈ BR^ε_i(πj),

$$U_i(s_i, s_j) \ge U_i^*(s_j) - \frac{\varepsilon}{\pi_j(s_j)}.$$

That is, if player i ε-best-responds to her conjecture πj, then her strategy is an ε/πj(sj)-best response to player j's actual strategy sj.

Proof. From the last lemma, since player i selects an ε-best response to her conjecture, we have that

$$U_i(s_i, \pi_j) \ge \sum_{t_j \in \operatorname{supp}(\pi_j)} \pi_j(t_j)\, U_i^*(t_j) - \varepsilon.$$

Note that

$$U_i(s_i, \pi_j) = \sum_{t_j \in \operatorname{supp}(\pi_j)} \pi_j(t_j)\, U_i(s_i, t_j).$$

Therefore

$$\sum_{t_j \in \operatorname{supp}(\pi_j)} \pi_j(t_j)\left[U_i^*(t_j) - U_i(s_i, t_j)\right] \le \varepsilon.$$


Each bracketed term is non-negative, by definition of a supremum expected payoff. Therefore

$$\pi_j(t_j)\left[U_i^*(t_j) - U_i(s_i, t_j)\right] \le \varepsilon$$

for all tj ∈ supp(πj). In particular,

$$\pi_j(s_j)\left[U_i^*(s_j) - U_i(s_i, s_j)\right] \le \varepsilon,$$

and algebraic manipulation produces the result.
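Spelling out the final step: dividing the last inequality by πj(sj) > 0 gives

$$U_i^*(s_j) - U_i(s_i, s_j) \le \frac{\varepsilon}{\pi_j(s_j)}, \qquad \text{i.e.,} \qquad U_i(s_i, s_j) \ge U_i^*(s_j) - \frac{\varepsilon}{\pi_j(s_j)},$$

which is the bound stated in the lemma.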

Given the last lemma, Theorem 2 is easily proven. Suppose βi > 0. One option for player i in the repeated game is to "pretend" to be (mimic the play of) any of his perturbation strategies. Since player j holds a conjecture that is consistent with the perturbation, πi(si) ≥ βiµi(si) for all si ∈ Si. If player i adopts perturbation strategy si, the last lemma establishes that the rational player j's payoff is bounded below by

$$U_j^*(s_i) - \frac{\varepsilon}{\pi_i(s_i)} \;\ge\; U_j^*(s_i) - \frac{\varepsilon}{\beta_i\mu_i(s_i)}.$$

This guarantees that player j's ε-best response to her conjecture is an ε/βiµi(si)-best response to strategy si. Therefore, if player i were to adopt si, he is assured at least wi(si; ε/βiµi(si)) in the event in which player j's strategy is not perturbed. The result follows.

3.2. Examples. Take the prisoners' dilemma depicted in Fig. 3 and suppose that the supergame is perturbed with the tit-for-tat strategy for both players. That is, µi(TFTi) = 1 for i = 1, 2. It is easy to show that Ui(TFTi, sj) = Uj(TFTi, sj) for all sj ∈ Sj. It then follows that wi(TFTi; γ) = 5 − γ in this particular example. Therefore, player i expects a payoff of at least 5 − ε/βi in the supergame. If ε is small compared to β1 and β2 then the players can convince each other that they have adopted the tit-for-tat strategy. The option of establishing reputations therefore leads to long-run cooperation.
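The tit-for-tat calculation can be illustrated numerically. Since Fig. 3 is not reproduced in this excerpt, the stage payoffs below are assumed (a standard prisoners' dilemma in which mutual cooperation pays 5 to each player); the time averages illustrate both the cooperation payoff of 5 and the equalizing property Ui(TFTi, sj) = Uj(TFTi, sj) under the average-payoff criterion.

```python
# Hypothetical prisoners' dilemma stage payoffs (Fig. 3 is not shown here);
# mutual cooperation is assumed to pay 5 to each player.
PAYOFF = {("C", "C"): (5, 5), ("C", "D"): (0, 6),
          ("D", "C"): (6, 0), ("D", "D"): (1, 1)}

def tft(own, opp):
    """Tit-for-tat: cooperate first, then copy the opponent's last action."""
    return opp[-1] if opp else "C"

def avg_payoffs(s1, s2, periods=1000):
    """Time-average payoffs when strategies s1 and s2 are played."""
    h1, h2, u1, u2 = [], [], 0, 0
    for _ in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        u1, u2 = u1 + p1, u2 + p2
        h1.append(a1); h2.append(a2)
    return u1 / periods, u2 / periods

always_D = lambda own, opp: "D"

print(avg_payoffs(tft, tft))       # mutual tit-for-tat: (5.0, 5.0)
print(avg_payoffs(tft, always_D))  # averages nearly equal: TFT mirrors its opponent
```

Against any opponent, tit-for-tat echoes the opponent's actions with a one-period lag, which is why the two players' long-run average payoffs coincide.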

Notice that the opportunity to establish a reputation depends on the values ε and β as well as on the distribution of perturbation strategies. The probability that player i is perturbed with strategy si is βiµi(si). Therefore, player j must (by virtue of ε-best-responding to his conjecture) play at least an ε/βiµi(si)-best response to strategy si. If ε is relatively large compared to βiµi(si), player j may disregard strategy si when ε-best-responding to his conjecture. In this case, player i is not guaranteed much of a reward for attempting to build a reputation for playing si. If, on the other hand, ε is small compared to βiµi(si), then player i is guaranteed that, if he adopts strategy si, his opponent will play a near-best response. He can therefore enjoy the corresponding payoff.


Now consider a repeated bargaining game. In each period player 1 makes a demand a1 ∈ [0, 1] and player 2 makes an offer a2 ∈ [0, 1]. If a1 > a2 then no exchange takes place and payoffs in the period are zero for both players. If a1 ≤ a2 then player 1 receives a payoff of (a1 + a2)/2 and player 2 receives 1 − (a1 + a2)/2. A new bargaining game takes place in each subsequent period. Suppose there is one-sided incomplete information in that player 1's strategy may be perturbed, and assume that among the perturbation strategies assigned positive probability is the strategy that always demands a ∈ [0, 1]. Then player 1 can assure herself of a payoff close to a by making this demand in every period (and thereby mimicking the perturbation strategy). The rational player 1 expects at least this much, and she achieves at least this amount in equilibrium.
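The stage game just described is easy to state in code; a minimal sketch of the payoff rule:

```python
def bargaining_stage(a1, a2):
    """Stage payoffs of the bargaining game: player 1 demands a1 and
    player 2 offers a2, both in [0, 1]. Trade occurs only if the demand
    does not exceed the offer, and the surplus splits at the midpoint
    (a1 + a2) / 2."""
    if a1 > a2:
        return (0.0, 0.0)          # demand exceeds offer: no exchange
    m = (a1 + a2) / 2
    return (m, 1 - m)

print(bargaining_stage(0.8, 0.2))  # → (0.0, 0.0)
print(bargaining_stage(0.4, 0.6))  # → (0.5, 0.5)
```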

An interesting scenario, which we might call the reputation paradox, arises in this game when both players have the opportunity to establish reputations. Suppose that player 1 may be perturbed with the strategy that always demands 4/5 and that player 2 may be perturbed with the strategy that always offers only 1/5. In this case both players believe they can establish reputations and each believes he will obtain a payoff that is close to or in excess of 4/5. However, payoffs close to (4/5, 4/5) are not feasible in the supergame: the per-period surplus is only 1, while 4/5 + 4/5 = 8/5.

Does this mean that there is a technical problem? As we will see, it does not. Remember that type-RF behavior is similar to rationalizability and that, in particular, it does not imply that players hold correct conjectures. Players may believe that they can "outsmart" each other, although they cannot both do so. Interestingly enough, given the assumption of type-RF behavior, players have the real option of establishing a reputation. They simply may not be correct in their estimates about how soon their opponents will be convinced that they have adopted perturbation strategies. However, if we take an equilibrium notion as our behavioral foundation, such a paradox induces a very genuine technical problem. There may not exist an equilibrium under some values of the parameters and some perturbations. This issue is taken up in Section 5, where you will find an existence theorem. Note immediately, however, that the reputation paradox described above cannot occur in common interest games, as Aumann and Sorin (1989) first showed. In common interest games, reputations cannot conflict.

4. CONSISTENCY OF TYPE-RF BEHAVIOR

I claimed that the reputation paradox does not lead to a technical problem with regard to type-RF behavior. The central issue is captured by the question, "Is it possible for type-RF behavior to be common knowledge between the players?" In other words, is type-RF behavior internally consistent, at least at the beginning of the game? To prove consistency, it is sufficient to find strategy subsets R1 ⊂ F1 and R2 ⊂ F2 with the following property. For i = 1, 2 and each si ∈ Ri there is a conjecture πj of player i such that πj = βjµj + (1 − βj)θj for some θj ∈ ∆cRj,


and si ∈ BR^ε_i(πj). That is, it can be common knowledge between the players that their strategy profile is an element of R1 × R2 and that they play ε-best responses to their conjectures at the beginning of the game.

THEOREM 3. Take any perturbed game G∞(β, µ) in which µ ∈ ∆cF1 × ∆cF2, and let ε be any positive number. It is possible for type-RF behavior to be common knowledge between the players.

The conclusion also holds for ε = 0 as long as conjectures are constrained to be finite.

This theorem is proved in the Appendix, but I shall convey the general idea here for the case of a finite number of perturbation strategies. The method will be most easily understood by those who have contemplated rationalizability in the one-shot game of "matching pennies." In this game it can be common knowledge between the players that each believes she will win with certainty. (This, of course, is a critical difference between properties of rationalizability and equilibrium.) For the perturbed supergame, I specify strategies (two for each player) with the following flavor. The four corresponding strategy profiles generate a specific finite history h that is consistent with none of the perturbation strategies, so that after h the players know that they are both rational. Then the players play a one-period matching game, in which each plays one of two stage-game actions. Depending on the outcome of the matching game, the strategies generate a supergame payoff that one of the players most prefers. Keeping "matching pennies" in mind, one can see how both players may be willing to adopt such strategies; each thinks he will win the matching component of the game with certainty. It is then possible for the associated strategy sets to be common knowledge.

5. PURE-STRATEGY EQUILIBRIA AND EXISTENCE

For those who are willing to accept the notion of pure-strategy equilibrium as fundamental, we can partially discard the requirement that agents use forgiving strategies. However, this promotes two new concerns to the forefront. First, not all perturbation strategies can be mimicked effectively by players in search of reputations; yet, as the first theorem below demonstrates, this does not limit the possibility of significant reputations. Second, some perturbations may prohibit the existence of an equilibrium. This is the counterpart to the reputation paradox that I discussed in the context of rationalizability. Below I show that if perturbation strategies are not "too demanding" (a term to be made precise), the existence of a pure-strategy equilibrium is assured.

Another boon of selecting pure-strategy equilibria is that the interaction between ε and β is no longer critical to establishing a reputation. This is because a player, in mimicking a perturbation strategy, may deviate from his "equilibrium" strategy (this is the test of an equilibrium). In this case, the player will


have signalled to his opponent that his strategy is perturbed, and β is no longer relevant. His opponent will believe with certainty that he is not rational.

The following theorem states that given the right perturbation strategy, players can establish any reputation. An implication is that, given the right perturbation, we can guarantee that equilibrium payoffs fall within a ball around a point on the Pareto frontier, and that this ball can be made arbitrarily small. The theorem is a generalization of the results of Watson (1994a) and a response to the conjecture therein. The distinction between this theorem and Corollary 1 is that the theorem does not require the rational players to adopt forgiving strategies but does confine attention to pure-strategy equilibria.

THEOREM 4. Take any payoff vector v that is strictly individually rational and feasible in the game G∞ and let ρ be any positive number. Then there exists a pure-strategy profile t ∈ F1 × F2 with the following properties:

(a) Ui(t) = vi, i = 1, 2.
(b) Profile t is a pure-strategy Nash equilibrium of G∞(β, t) for every β.
(c) For any distribution µ ∈ ∆cF1 × ∆cF2 with µ1(t1) > 0 and µ2(t2) > 0, there are values β* ∈ (0, 1) and ε* > 0 such that for all ε < ε* (including ε = 0) and all β1, β2 ∈ (0, β*), the payoff vectors associated with the pure-strategy ε-equilibria of the perturbed game G∞(β, µ) lie above v − (ρ, ρ).

Please consult the Appendix for a proof.

Corollary 1 and Theorem 4 establish bounds on the payoffs arising from ε-equilibria of the perturbed supergame. However, these results do not guarantee the existence of an equilibrium under a variety of perturbations. As discussed in the preceding section, if the perturbation allows both players to establish reputations as being very "demanding," an equilibrium may not exist. By limiting the range of perturbations, however, I can affirm that an equilibrium exists.

For any distribution µi ∈ ∆cFi let

$$W_i(\mu_i) \equiv \limsup_{\varepsilon \to 0}\left[\sup_{s_i \in \operatorname{supp}(\mu_i)} w_i(s_i;\varepsilon)\right].$$

Think of wi(si; ε) as the "demand" of player i's strategy si. Then Wi(µi) is the most player i can demand via a reputation when the perturbation is given by µi. The following theorem establishes conditions under which pure-strategy equilibria exist in perturbed supergames. The theorem also identifies the scope of payoffs that are supported by equilibria. Therefore we can think of it as a "modified folk theorem."

THEOREM 5. Take any supergame G∞, any µ ∈ ∆cF1 × ∆cF2, and any ρ > 0. Suppose there is a strictly individually rational and feasible supergame payoff v such that v1 > W1(µ1) and v2 > W2(µ2). Then there exist values β* ∈ (0, 1) and ε* > 0 such that for every β1, β2 ∈ [0, β*) and every ε ∈ (0, ε*), G∞(β, µ) possesses a pure-strategy ε-equilibrium whose payoffs are within ρ of v. In addition, for i = 1, 2, the requirement that vi > Wi(µi) can be dropped if βi = 0. Furthermore, if µ is of finite support then these results hold for equilibria in forgiving strategies.

A proof is in the Appendix.

As an example, take the bargaining game described in subsection 3.2. Suppose that the game is perturbed with the strategy of player 1 that always demands c and the strategy of player 2 that always offers d, regardless of the history. The players may wish to establish reputations as following these perturbation strategies. However, notice that the reputations are inconsistent if c > d. In this case no pure-strategy equilibrium exists. On the other hand, if c < d then Theorem 5 guarantees that an equilibrium exists. Theorem 5 also guarantees the existence of an equilibrium in any game with one-sided incomplete information.

6. A “GRAIN OF TRUTH” ASSUMPTION AND LEARNING

The analysis of type-RF behavior leads to stark predictions about what players believe will occur in a perturbed, repeated game. However, recall that in this rationalizability context nothing can be said about the actual outcome of the game. To derive conclusions about actual outcomes, I needed to replace the rationalizability framework with the more stringent requirement of equilibrium. Unfortunately, this approach has two troubling aspects. First, that players reach an equilibrium is a very strong assumption—one which I, for one, am hesitant to make. Second, an equilibrium may even fail to exist under some perturbations.

One might then wonder if additional insight can be gained by exploring the middle ground between type-RF behavior and equilibrium. In fact, the answer is "yes." In equilibrium players know the strategies employed by their opponents, whereas with type-RF behavior no such condition is assumed. One candidate for enriching the assumptions of type-RF behavior is to add the "seed" of an equilibrium. Suppose one assumes, within the context of type-RF behavior, that each player assigns at least some positive probability to the actual strategy played by her opponent. One might then say, following the terminology of Kalai and Lehrer (1993), that each player's conjecture contains a grain of truth.5

5 Kalai and Lehrer's results on learning actually utilize a concept that is weaker than a grain of truth. They assume that the actual distribution over infinite histories (induced by the players' strategies) is absolutely continuous with respect to the individual players' beliefs. I am content to use the simpler notion here.


DEFINITION 3. Suppose si ∈ Si is the strategy chosen by player i and πj ∈ ∆cSj is player i's conjecture over the possible strategies of player j, for i = 1, 2. If y1 ≡ π1(s1) > 0 and y2 ≡ π2(s2) > 0 then it will be said that the conjectures contain a grain of truth of order y = (y1, y2).

Type-RF behavior guarantees that a player will nearly optimize (when ε is small) against each strategy in the support of his conjecture. Since perturbations necessitate that certain strategies are assigned positive probability by players' conjectures, perturbations allow players to establish reputations. As we saw, this leads to outcome selection on the level of conjectures. Coupling type-RF behavior with a grain of truth assumption adds real outcome selection to the story.

THEOREM 6. Take any perturbed game G∞(β, µ) such that µi ∈ ∆cFi and βi > 0, i = 1, 2. Assume that behavior is type-RF and that the conjectures of the players contain a grain of truth of order y. Then the actual payoff of the rational player i is bounded below by

$$\sup_{s_i \in \operatorname{supp}(\mu_i)} w_i\!\left(s_i;\, \varepsilon/\beta_i\mu_i(s_i)\right) - \varepsilon/y_j$$

in the event that player i faces the rational player j.

The proof, which is in the Appendix, is a simple application of Lemma 2 of subsection 3.1.

Compare this theorem with Theorem 2. The assumption that conjectures contain a grain of truth of order y implies that player i's actual payoff is within ε/yj of what player i expects to obtain. This is a significant result because what each player expects to obtain can be very great indeed. Figure 4 demonstrates the relevance of, and relationship between, Theorems 2 and 6. In the figure I have let

$$w_i \equiv \sup_{s_i \in \operatorname{supp}(\mu_i)} w_i\!\left(s_i;\, \varepsilon/\beta_i\mu_i(s_i)\right)$$

denote the supremum of "reputation values" defined by µi and ε. The figure shows the payoffs that result in the case in which both players are rational. Player i expects (from Theorem 2) to achieve at least a payoff of wi in the repeated game, for i = 1, 2. With the grain of truth assumption (Theorem 6), player i's actual payoff is bounded below by wi − ε/yj. The actual payoffs in the game therefore lie in the shaded region of the figure.

Two additional results are important to note. First, not all orders of grain of truth may be possible. If (w1 − ε/y2, w2 − ε/y1) is not feasible in the supergame then it cannot be possible, under type-RF behavior, for conjectures to contain a grain of truth of order y. This result is directly analogous to the possibility (as in


FIG. 4. Expected and actual payoffs.

Section 5) that an equilibrium may not exist in the perturbed game. Obviously, though, a grain of truth is possible for y sufficiently small.

The second noteworthy result concerns type-RF behavior when ε = 0. In this case, if conjectures contain a grain of truth then type-RF behavior implies Nash equilibrium. This follows from Lemma 2, which in this instance demonstrates that each player actually best-responds to each strategy of her opponent to which she assigns positive probability, including her opponent's actual strategy. A generalization is also apparent. If behavior is type-RF and conjectures contain a grain of truth of order y, then the players' chosen strategies form an ε/min{y1, y2}-equilibrium.

7. CONCLUSION

Identifying in an abstract environment the conditions under which reputations may flourish can help us understand how effective reputations may be in real-world settings. In situations in which people interact repetitively, opportunities to establish reputations may lead to cooperation. The perturbed, repeated prisoners' dilemma, which has been applied extensively by researchers from many disciplines, serves as a good example. However, just as reputations may induce cooperation, so might they incite competition and conflict. The possibility of conflict depends not only on the type of perturbation but also on the type of stage game being played. In common interest games, for example, there


can be no conflict, and reputations always lead to some level of cooperation. In bargaining games, though, conflict is more probable.

Such settings tug at the seams of our theoretical models and force us to evaluate the foundations of the theory. For example, if we believe that players will (if at all) arrive at pure-strategy equilibria, and if we accept strategy perturbations as an acceptable modeling tool, then we must be willing to admit that players sometimes will not reach an equilibrium. Alternatively, perturbations allow us to identify conditions under which players necessarily behave in a very sophisticated fashion—perhaps one marked by uncountable mixed-strategy equilibria. In my view, these findings suggest a rationale for embracing the less stringent notion of rationalizability in some contexts. As I showed, the weak assumption of type-RF behavior is sufficient to allow reputation phenomena. Furthermore, type-RF behavior is an internally consistent notion in the sense that it can be common knowledge between the players.

APPENDIX

Proof of Theorem 1. Let v be any feasible, strictly individually rational payoff vector in the supergame. Then there is a sequence of actions h^v ≡ {a^n}∞_{n=1} that achieves v. Define the strategy profile t as follows. Strategy ti prescribes player i to follow h^v_i as long as player j follows h^v_j. If player j deviates from tj in a given period (either by failing to follow h^v_j or by failing to punish player i when ordained to do so) then player i begins a punishment phase. The punishment phase involves player i forcing player j's stage-game payoffs sufficiently close to u_j (player j's minmax payoff) until player j's average payoff from the start of the game has fallen below vj − rj, where rj ∈ (0, vj − u_j). After a punishment phase, t dictates that the players follow h^v anew, acting as though the game started over.

Players cannot gain by deviating from t, either when they are in "good standing" or when they are supposed to punish one another. If player i deviates from t a finite number of times, his payoff is vi in the supergame, as is the case in equilibrium. If player i deviates an infinite number of times from his equilibrium strategy, then there is an infinite number of periods at which player i's average payoff falls below vi − ri. In this case, since we are employing the lim inf of average payoffs criterion, player i's supergame payoff is bounded above by vi − ri < vi.
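The role of the lim inf criterion can be illustrated numerically. The sketch below uses hypothetical numbers (target payoff v = 3, punishment margin r = 1, punishment payoff 0, a deviation attempted every 50 periods): a player who deviates infinitely often has her running average pushed below v − r infinitely often, so the lim inf of her average payoffs is at most v − r.

```python
def running_averages(stream):
    """Running time-averages of a stage-payoff stream."""
    total, avgs = 0, []
    for n, u in enumerate(stream, start=1):
        total += u
        avgs.append(total / n)
    return avgs

v, r = 3, 1  # hypothetical target payoff and punishment margin

def punished_stream(periods, deviate_every=50):
    """Stage payoffs of a player who deviates repeatedly: each deviation
    triggers punishment (payoff 0) until the running average falls below
    v - r, after which play returns to the target path (payoff v)."""
    payoffs, total, punishing = [], 0, False
    for n in range(1, periods + 1):
        if not punishing and n % deviate_every == 0:
            punishing = True      # a deviation occurs; punishment begins
        u = 0 if punishing else v
        total += u
        if punishing and total / n < v - r:
            punishing = False     # average driven below v - r; resume path
        payoffs.append(u)
    return payoffs

avgs = running_averages(punished_stream(10_000))
# The running average dips below v - r = 2 at the end of every punishment
# phase, and these dips recur forever, so the lim inf is at most v - r.
print(min(avgs[-1000:]) < v - r)  # → True
```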

Proof of Theorem 3. I will first provide a proof for the case in which µi, i = 1, 2, is of finite support (that is, when there are a finite number of perturbation strategies). Following this, I will describe how the proof is extended to accommodate a countably infinite number of perturbation strategies.

Let s′1 ∈ F1 and s′2 ∈ F2 be such that wi(s′i; ε/2) = max_{si∈supp(µi)} wi(si; ε/2), for i = 1, 2. Notice that we can find individually rational payoff vectors x and y such that x1 ≥ w1(s′1; ε/2), y2 ≥ w2(s′2; ε/2), x1 ≥ y1, and y2 ≥ x2. Think of s′i as the perturbation strategy that player i would most like to establish a reputation as playing. Then x is an individually rational payoff vector that gives player 1 at least what he can expect from mimicking s′1, and y gives player 2 at least what she can expect from mimicking s′2.

I shall now define several strategies. First notice that since the number of perturbation strategies is finite, there is an integer m > 0 and a history of action profiles h ∈ Hm such that h is consistent with none of the perturbation strategies. That is, after history h the players must assign zero probability to the event that one or both of the players' strategies has been perturbed. Realize also that we can find forgiving strategy profiles s^x and s^y such that Ui(s^x) = xi and Ui(s^y) = yi, for i = 1, 2, U*1(s^y_2) = x1, and U*2(s^x_1) = y2. Furthermore, let ai, bi ∈ Ai be such that ai ≠ bi, i = 1, 2.

Notice that we can find forgiving strategies t^a_1, t^b_1 ∈ F1 and t^a_2, t^b_2 ∈ F2 with the following properties: for i = 1, 2,

• each strategy profile t ∈ {t^a_1, t^b_1} × {t^a_2, t^b_2} is consistent with history h;

• t^a_i(h) = ai and t^b_i(h) = bi;

• after histories ha and hb, strategies t^a_i and t^b_i follow the continuation strategy s^x_i. After histories h(a1, b2) and h(b1, a2), strategies t^a_i and t^b_i follow the continuation strategy s^y_i; and

• if history l ∈ H is consistent with neither t^a_j nor t^b_j, but is consistent with at least one of the perturbation strategies of player j, then both t^a_i and t^b_i prescribe a continuation strategy that is an ε-best response to the possible (perturbation) continuation strategies of player j and yields a payoff within ε of wj(sj; ε) to each possible perturbation strategy sj.

To see that such forgiving strategies exist, note that I have not defined the strategies for all histories. In particular, I can specify that after "unexpected" histories (those that are not generated by the specifications above) t^a_i and t^b_i can "give in," perhaps by following s^x_i or s^y_i.

Suppose the rational player i adopts strategy ti ∈ {t^a_i, t^b_i}, for i = 1, 2. The following outcome path ensues. If player i's strategy is perturbed, the rational player j (playing tj) will discern this by period m and will then play an ε-best response. If neither of the players' strategies is perturbed, the players will choose actions corresponding to history h. After history h the players assign zero probability to the event that either's strategy is perturbed. Then at period m + 1 the players enter into a "matching" game in which player 1 chooses a1 or b1 and player 2 chooses a2 or b2 (according to which exact strategy profile is being played). If the players "match" by playing (a1, a2) or (b1, b2) then they continue according to s^x. This leads to the payoff vector x, which player 1 enjoys. If, on the other hand, the players do not match, which is the case if they play


(a1, b2) or (b1, a2), they continue according to s^y. This leads to the payoff vector y, which favors player 2.

For i = 1, 2 define conjectures π^a_j and π^b_j in the following manner. Recall that my abuse of notation allows any si ∈ Si to also represent the probability distribution that assigns all mass to strategy si. Then define π^a_j ≡ βjµj + (1 − βj)t^a_j and π^b_j ≡ βjµj + (1 − βj)t^b_j, for j = 1, 2. For example, π^a_j is the conjecture that is consistent with the perturbation and that assigns probability one to the event that the rational player j chooses strategy t^a_j.

It is not difficult to see that t^a_1 ∈ BR^ε_1(π^a_2), t^b_1 ∈ BR^ε_1(π^b_2), t^a_2 ∈ BR^ε_2(π^b_1), and t^b_2 ∈ BR^ε_2(π^a_1). For instance, if player 1 believes that the rational player 2 has chosen strategy t^a_2 for sure, then t^a_1 is an ε-best response to her conjecture. In this case, player 1 is willing to "give in" if player 2 wins the matching game at period m + 1 (by selecting b2). However, player 1 assigns zero probability to this occurrence, and so player 1 does not expect that she will be forced to give in.6

With these strategies, the players may believe that they can "outsmart" each other. Player 1 may think that his opponent will choose action a2 in period m + 1, and therefore select a1. Player 2 nevertheless might swear that player 1 will choose b1 and therefore may play a2. As with the notion of rationalizability, the conjectures of the players may not be correct.
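This "outsmarting" logic mirrors matching pennies and can be made concrete in a few lines (the action labels and point conjectures here are illustrative, not from the paper):

```python
def winner(c1, c2):
    """Outcome of the period-(m+1) matching game: matched actions lead to
    the continuation s^x (payoff x, favoring player 1); mismatched actions
    lead to s^y (payoff y, favoring player 2)."""
    return "x" if c1 == c2 else "y"

# Each player best-responds to a point conjecture about the opponent:
belief_1 = "a"                              # player 1 thinks player 2 plays a...
choice_1 = belief_1                         # ...so player 1 matches it
belief_2 = "b"                              # player 2 thinks player 1 plays b...
choice_2 = "a" if belief_2 == "b" else "b"  # ...so player 2 mismatches it

print(winner(choice_1, belief_1))  # → x: player 1 expects to win
print(winner(belief_2, choice_2))  # → y: player 2 also expects to win
print(winner(choice_1, choice_2))  # actual outcome: only one conjecture is right
```

Both conjectures are internally consistent, yet at most one can be correct, which is exactly the feature of rationalizability the proof exploits.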

Now suppose that µi is of countably infinite support. Extending the proof above to cover this case is not difficult. Notice first that there exist finite sets X1 ⊂ S1 and X2 ⊂ S2 such that µi(Xi) > 1 − ε/(zh − zl), for i = 1, 2. Consider that the players basically ignore the perturbation strategies not contained in X1 or X2. Define s′i ∈ Fi so that wi(s′i; ε/2) = max_{si∈Xi} wi(si; ε/2) for i = 1, 2, and let h be a finite history that is consistent with none of the strategies of X1 and X2.

With these alterations the proof proceeds as before, and we define x, y, s^x, s^y, t^a_1, t^b_1, t^a_2, and t^b_2 accordingly. The only difference is that there may be perturbation strategies (not contained in X1 or X2) that are consistent with history h. In the proof for µ of finite support, after history h each player assigns zero probability to the event that her opponent is perturbed. Now, however, if history h is reached, each player merely believes that her opponent is rational with a probability that exceeds 1 − ε/(zh − zl). Nonetheless, this is sufficient to imply that t^a_1 ∈ BR^ε_1(π^a_2), t^b_1 ∈ BR^ε_1(π^b_2), t^a_2 ∈ BR^ε_2(π^b_1), and t^b_2 ∈ BR^ε_2(π^a_1) as before, and the proof is finished.

6 Note that t^a_1 would not be an ε-best response to conjecture π^a_2 if one required a more stringent, sequential notion of best-response behavior. I am obviously exploiting the ex ante notion of a best response here, although I believe a proof is possible even under the more restrictive definition in which a strategy must specify an ε-best response after all histories.

Proof of Theorem 4. Take as given a game G∞, ρ > 0, and a payoff vector v that is strictly individually rational and feasible in the supergame. Theorem 1


(the folk theorem) guarantees the existence of an equilibrium strategy profile t ∈ F1 × F2 such that Ui(t) = vi, for i = 1, 2. This satisfies part (a) of the theorem. It is not difficult to see that the strategy profile constructed in the proof of Theorem 1 has two additional properties: (1) wi(ti; 0) = vi and (2) there is a number r > 0 such that si ∈ BR^r_i(tj) implies that si ∈ BRi(tj), for i = 1, 2. I will take advantage of these properties shortly.

Since t is an equilibrium in G∞, it is also an equilibrium in the perturbed game G∞(β, t). This is because the perturbation selects the same strategy (t) that the rational players adopt. Thus part (b) of the theorem is satisfied.

To prove part (c), let ρ′ ≡ min{ρ, r/2} and take any distribution µ ∈ ∆cF1 × ∆cF2 such that µ1(t1) > 0 and µ2(t2) > 0. Define ε* > 0 to satisfy r > ε*/µi(ti) and define β* > 0 to satisfy both (1 − β*)vi + β*zl > vi − ρ′ and (vi − ρ′ − β*zh)/(1 − β*) > vi − r, for i = 1, 2. Now take any ε ∈ (0, ε*) and any β1, β2 ∈ (0, β*). Note that (1 − βj)vi + βjzl > vi − ρ′ and (vi − ρ′ − βjzh)/(1 − βj) > vi − r, for i = 1, 2.

Given any two strategy profiles s and s′, I will say that their outcome paths diverge at period n if a^n(s) ≠ a^n(s′). Let s be any pure-strategy ε-equilibrium in the perturbed supergame G∞(β, µ). Suppose s never diverges from t. Then Ui(s) = Ui(t) = vi and player i's expected equilibrium payoff is bounded below by

$$(1-\beta_j)v_i + \beta_j z_l > (1-\beta^*)v_i + \beta^* z_l > v_i - \rho' \ge v_i - \rho$$

for i = 1, 2. Therefore in this case part (c) of the theorem is satisfied.

The other possibility to consider is that s diverges from t at some period. This implies that the strategy profile (ti, sj) diverges from s at some point, for some i. Let n be the first period at which (ti, sj) diverges from s. That is, in equilibrium, prior to period n player j cannot distinguish between her opponent's equilibrium strategy and the perturbation strategy ti. Then, at period n, strategy ti takes a different action than does player i's equilibrium strategy si. In equilibrium, if player i has been perturbed with strategy ti, after period n player j will infer that player i's strategy has been perturbed. In other words, player j will realize after period n that he faces a perturbation strategy, not the rational player i.

At period n + 1, when player j first rules out that he faces a rational opponent, his updated conjecture will assign positive probability only to perturbation strategies. In particular, his conjecture will assign probability of at least μ_i(t_i) to the continuation strategy of t_i (by Bayes' rule). In equilibrium, the rational player j must play an ε-best response to his updated conjecture from period n + 1. Lemma 2 then implies that the rational player j will play an ε/μ_i(t_i)-best response to t_i in the continuation game from period n + 1. Since no finite time matters for supergame payoffs, this implies that s_j is an ε/μ_i(t_i)-best response to t_i in the supergame. Recall that r > ε*/μ_i(t_i), and so (by property (2) above) s_j is a best response to t_i in the supergame.
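The Bayes'-rule step admits a simple numerical illustration: once player j rules out the rational opponent, the conjecture renormalizes over the surviving perturbation strategies, so the weight on t_i can only rise above its conditional prior μ_i(t_i). All numbers below are hypothetical:

```python
# Hypothetical illustration of the updating step. Once player j rules out
# the rational opponent, the posterior on the perturbation strategy t_i is
# at least its conditional prior weight mu_i(t_i).
beta_i = 0.1                                  # total prior mass on perturbations
mu = {"t_i": 0.4, "s1": 0.35, "s2": 0.25}     # conditional distribution mu_i

# Suppose the history is consistent with t_i and s1 but has ruled out s2.
consistent = ["t_i", "s1"]
mass = sum(beta_i * mu[s] for s in consistent)
posterior = beta_i * mu["t_i"] / mass

# The surviving mass is at most beta_i, so the posterior weakly exceeds mu_i(t_i).
assert posterior >= mu["t_i"]
```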

106 JOEL WATSON

Now consider player i's choice of strategy in the supergame. By definition, she must fare at least as well by using her equilibrium strategy as she would if she were to select any other strategy. One possibility is that player i adopt the perturbation strategy t_i instead of her equilibrium strategy. The rational player j will best-respond to t_i, so player i expects at least

(1 − β_j)w_i(t_i; 0) + β_j z_l = (1 − β_j)v_i + β_j z_l > v_i − ρ′ ≥ v_i − ρ

if she follows t_i. This follows from property (1) above and the constraint that β_j < β*. The bound of part (c) of the theorem is therefore satisfied for player i.

Now we must find a bound for player j when (t_i, s_j) diverges from s. Two further cases must be analyzed. First, suppose (s_i, t_j) also diverges from s. That is, suppose player j also has the opportunity at some point to deviate from his equilibrium strategy to follow t_j, when player i is rational. In this case, the analysis above applies as well to player j and the bound of part (c) is satisfied for player j.

Second, suppose that, although player i has the opportunity to deviate from s_i to follow t_i, player j never has the corresponding opportunity (as long as player i follows s_i). In other words, (s_i, t_j) never diverges from s. In this circumstance, we can still fashion a bound on player j's equilibrium payoff. Remember that according to the equilibrium, the rational player i expects at least v_i − ρ′. This implies that the rational player i expects at least (v_i − ρ′ − β_j z_h)/(1 − β_j) conditional on facing the rational player j. By the definition of β*, this value exceeds v_i − r. Since the rational player j plays like strategy t_j along the path generated by s, it must be that s_i ∈ BR_i^r(t_j). Property (2) above implies that s_i ∈ BR_i(t_j). Finally, then, property (1) establishes that player j obtains at least v_j against the rational opponent in equilibrium, and the bound of part (c) is satisfied for player j.

Proof of Theorem 5. Take any perturbed game G_∞(β, μ) in which μ ∈ Δ_cF_1 × Δ_cF_2, and any feasible supergame payoff v such that v_i > W_i(μ_i), for i = 1, 2. Let

α ≡ (1/2) min{v_1 − W_1(μ_1), v_2 − W_2(μ_2), v_1 − u_1, v_2 − u_2}.

By the definition of W_i(μ_i) there exists an ε* such that ε ∈ (0, ε*) implies that w_i(s_i; ε) < W_i(μ_i) + α, for s_i ∈ supp(μ_i) and i = 1, 2. Since W_i(μ_i) + α < v_i, we can find β* > 0 such that β*z_l + (1 − β*)v_i > β*z_h + (1 − β*)[W_i(μ_i) + α]. Now take any ε ∈ (0, ε*) and any β such that β_i ∈ [0, β*), i = 1, 2. Since v is feasible, there is a sequence of actions h_v ≡ {a^n}_{n=1}^∞ that achieves v. Define the strategy profile t as follows. Strategy t_i prescribes player i to follow h_{v,i} as long as player j follows h_{v,j}. If player j deviates from h_{v,j} at some point, then t_i prescribes play according to (a) in the case that β_j > 0 and (b) in the case that β_j = 0.


(a) Player i forms a list, s_j^1, s_j^2, . . . , of the perturbation strategies for player j that are consistent with the history of play in the game. If the list is empty then t_i proceeds according to (b) below. Note that one or more of the strategies on the list may be behaviorally equivalent conditional on the history of play. Such strategies are listed as one (with their probabilities summed). If player i's assessment of the probability of s_j^1 (updated conditional on the history) exceeds 1 − ε/[2(z_h − z_l)], then t_i prescribes that player i play an ε/2-best response to s_j^1 as long as player j's behavior remains consistent with s_j^1. If j's behavior fails to conform to s_j^1 at some point, then (a) is started once again. Now take the case in which the probability of s_j^1 is not greater than 1 − ε/[2(z_h − z_l)]. There exists a finite sequence of actions for player i that will distinguish s_j^1 from s_j^2. (After taking these actions and observing j's behavior, player i will have ruled out either s_j^1 or s_j^2, or both.) Strategy t_i prescribes that these actions be taken and (a) started once again, where the new list of perturbation strategies is formed by pruning the old list and preserving the order of the remaining elements.
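Procedure (a) is essentially a screening algorithm. The sketch below is my own simplification, not the paper's formalism: strategies for player j are modeled as functions from j's past actions (a tuple) to j's next action, and the two bookkeeping operations are pruning strategies the history contradicts and merging behaviorally equivalent ones with their probabilities summed.

```python
# Simplified sketch of the list maintenance in procedure (a).

def prune(candidates, history):
    """Keep only (strategy, prob) pairs consistent with j's observed play."""
    survivors = []
    for strat, prob in candidates:
        if all(strat(history[:n]) == history[n] for n in range(len(history))):
            survivors.append((strat, prob))
    return survivors

def merge_equivalent(candidates, continuations):
    """Merge strategies acting identically on all given continuation histories,
    summing probabilities; order of first appearance is preserved."""
    merged = []
    for strat, prob in candidates:
        for k, (rep, p) in enumerate(merged):
            if all(strat(h) == rep(h) for h in continuations):
                merged[k] = (rep, p + prob)
                break
        else:
            merged.append((strat, prob))
    return merged

# Example with invented strategies: two always play "C", one always plays "D".
always_c = lambda h: "C"
always_d = lambda h: "D"
candidates = [(always_c, 0.5), (always_d, 0.3), (lambda h: "C", 0.2)]
survivors = prune(candidates, ("C", "C"))          # always_d is ruled out
merged = merge_equivalent(survivors, [(), ("C",), ("C", "C")])
```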

(b) In the continuation, strategy t_i prescribes that player i follow h_{v,i} anew, as long as j follows h_{v,j}. If player j deviates from h_{v,j}, then t_i punishes player j as in the proof of Theorem 1, until player j's average payoff from the beginning of the game falls below u_j + α. After the punishment phase, t_i starts (b) again.
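The punishment bookkeeping in (b) can be sketched with invented stage payoffs (u_j, α, and the payoff stream below are all hypothetical): punishment continues until player j's average payoff, computed from the beginning of the game, drops below u_j + α.

```python
# Hypothetical sketch of the stopping rule in (b). 'punish_payoff' is what
# player j earns per period while being punished (assumed below u_j).
u_j, alpha = 0.5, 0.25
payoffs = [2.0, 2.0, 2.0]      # j's stage payoffs before the deviation
punish_payoff = 0.0

while sum(payoffs) / len(payoffs) >= u_j + alpha:
    payoffs.append(punish_payoff)

periods_punished = len(payoffs) - 3
assert sum(payoffs) / len(payoffs) < u_j + alpha
```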

If the rational players follow strategy t, then player i's expected payoff is bounded below by (1 − β_j)v_i + β_j z_l > (1 − β*)v_i + β*z_l. This is also a valid bound on continuation payoffs after each history that is consistent with t.

Suppose the rational players adopt strategy profile t and consider any history h which is consistent with profile t. After h, player i ascribes (by Bayes' rule) at least probability (1 − β_j) to the event that her opponent is rational. Player i, after h, may consider deviating from t_i in either of two ways. First, she can follow one of the perturbation strategies that deviates from the play of the rational players. If player i were to take this course, her payoff is bounded above by

(1 − β_j)[W_i(μ_i) + α] + β_j z_h < (1 − β*)[W_i(μ_i) + α] + β*z_h < (1 − β*)v_i + β*z_l

by the definitions of α, β*, and ε*. Second, she may deviate from her equilibrium strategy and all of her perturbation strategies. This behavior can earn player i no more than

(1 − β_j)[u_i + α] + β_j z_h < (1 − β*)[u_i + α] + β*z_h < (1 − β*)v_i + β*z_l.

Therefore it is not in player i's interest to deviate from t_i after any such history h. Strategy t_i also specifies ε-best-response behavior after all histories in which player j has been discovered to be a perturbation. These facts imply that t is an ε-equilibrium in the game G_∞(β, μ).
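The two deviation bounds can be verified numerically; the values below are invented for illustration (W stands for W_i(μ_i) and u for u_i, and α is computed from two of the four terms in its definition):

```python
# Hypothetical numeric check of the two deviation bounds.
v_i, z_l, z_h = 2.0, 0.0, 3.0
W, u = 1.0, 0.5
alpha = 0.5 * min(v_i - W, v_i - u)   # alpha as defined in the proof

# beta* must satisfy beta*.z_l + (1-beta*).v_i > beta*.z_h + (1-beta*).(W+alpha)
beta_star = 0.05
assert (beta_star * z_l + (1 - beta_star) * v_i
        > beta_star * z_h + (1 - beta_star) * (W + alpha))

beta_j = 0.03
eq_bound = (1 - beta_j) * v_i + beta_j * z_l        # equilibrium-path payoff
dev1 = (1 - beta_j) * (W + alpha) + beta_j * z_h    # mimic a perturbation
dev2 = (1 - beta_j) * (u + alpha) + beta_j * z_h    # abandon all strategies
assert dev1 < (1 - beta_star) * v_i + beta_star * z_l
assert dev2 < (1 - beta_star) * v_i + beta_star * z_l
assert eq_bound > max(dev1, dev2)                   # deviating never pays
```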

The second assertion is easily proved. Furthermore, if μ has finite support, then after any history there is a sequence of actions for player j that makes her behavior consistent with none of her perturbation strategies. By playing these, she induces t_i to play according to (b) above. In the finite case, therefore, player j can obtain v_j = U_j^*(t_i) against t_i after any history, which proves that t_i is forgiving.

Proof of Theorem 6. Take any perturbation strategy s_i in the support of μ_i and let s_j be the actual strategy of the rational player j. We know from Lemma 2 that the rational player j, by ε-best-responding to her conjecture, plays an ε/β_iμ_i(s_i)-best response to s_i. This implies that U_i(s_i, s_j) ≥ w_i(s_i; ε/β_iμ_i(s_i)), which in turn implies that U_i^*(s_j) ≥ w_i(s_i; ε/β_iμ_i(s_i)). Since conjectures contain a grain of truth of order y, player i must assign at least probability y_j > 0 to player j's actual strategy s_j. Player i selects an ε-best response to his conjecture, and so (invoking Lemma 2 again) his payoff is bounded below by U_i^*(s_j) − ε/y_j.

REFERENCES

AUMANN, R. J., AND SORIN, S. (1989). "Cooperation and Bounded Recall," Games Econ. Behav. 1, 5–39.

BERNHEIM, B. D. (1984). "Rationalizable Strategic Behavior," Econometrica 52, 1007–1028.

CHO, I.-K. (1994). "Rationalizability, Stationarity and Bargaining," Rev. Econ. Stud. 61, 357–374.

FUDENBERG, D., AND LEVINE, D. (1989). "Reputation and Equilibrium Selection in Games with a Patient Player," Econometrica 57, 759–778.

FUDENBERG, D., AND MASKIN, E. (1986). "The Folk Theorem in Repeated Games with Discounting or with Incomplete Information," Econometrica 54, 533–554.

KALAI, E., AND LEHRER, E. (1993). "Rational Learning Leads to Nash Equilibrium," Econometrica 61, 1019–1045.

KREPS, D. M., MILGROM, P., ROBERTS, J., AND WILSON, R. (1982). "Rational Cooperation in the Finitely Repeated Prisoner's Dilemma," J. Econ. Theory 27, 245–252.

KREPS, D. M., AND WILSON, R. (1982). "Reputation and Imperfect Information," J. Econ. Theory 27, 253–279.

MILGROM, P., AND ROBERTS, J. (1982). "Predation, Reputation, and Entry Deterrence," J. Econ. Theory 27, 280–312.

PEARCE, D. G. (1984). "Rationalizable Strategic Behavior and the Problem of Perfection," Econometrica 52, 1029–1050.

RADNER, R. (1980). "Collusive Behavior in Noncooperative Epsilon-Equilibria of Oligopolies with Long but Finite Lives," J. Econ. Theory 22, 136–154.

RUBINSTEIN, A. (1979). "Equilibrium in Supergames with the Overtaking Criterion," J. Econ. Theory 21, 1–9.

SCHMIDT, K. M. (1993). "Reputation and Equilibrium Characterization in Repeated Games of Conflicting Interests," Econometrica 61, 325–357.

TAKAHASHI, I. (1992). "Perturbations and Equilibrium in the Repeated Prisoners' Dilemma Game," unpublished manuscript.

WATSON, J. (1993). "A 'Reputation' Refinement without Equilibrium," Econometrica 61, 199–205.

WATSON, J. (1994a). "Cooperation in the Infinitely Repeated Prisoners' Dilemma with Perturbations," Games Econ. Behav. 7, 260–285.

WATSON, J. (1994b). "Strategy Perturbations in Repeated Games as Rules of Thumb," UCSD Working Paper 94-20.