Binmore Modeling Rational Players II


  • 7/28/2019 Binmore Modeling Rational Players II


Economics and Philosophy
http://journals.cambridge.org/EAP

    Modeling Rational Players: Part II

    Ken Binmore

Economics and Philosophy / Volume 4 / Issue 01 / April 1988, pp 9-55
DOI: 10.1017/S0266267100000328, Published online: 05 December 2008

    Link to this article: http://journals.cambridge.org/abstract_S0266267100000328

    How to cite this article:

Ken Binmore (1988). Modeling Rational Players: Part II. Economics and Philosophy, 4, pp 9-55. doi:10.1017/S0266267100000328


    Downloaded from http://journals.cambridge.org/EAP, IP address: 144.173.6.37 on 06 Apr 2013


    Economics and Philosophy, 4, 1988, 9-55. Printed in the United States of America.

MODELING RATIONAL PLAYERS
Part II

KEN BINMORE
London School of Economics

. . . why may we not say that all Automata . . . have an artificiall life.
Hobbes, Leviathan

Even in self-consciousness, the I . . . remains a riddle to itself.
Schopenhauer

1. INTRODUCTION

This is the second part of a two-part paper. It can be read independently of the first part provided that the reader is prepared to go along with the unorthodox views on game theory which were advanced in Part I and are summarized below. The body of the paper is an attempt to study some of the positive implications of such a viewpoint. This requires an exploration of what is involved in modeling "rational players" as computing machines.

The basic tenet borrowed from Part I is the claim that traditional game theory is unsatisfactory insofar as the behavior of ideal "perfectly rational" players is treated axiomatically a la Bourbaki. In the first place, it does not "deliver the goods." For games of any complexity, confusion reigns supreme about what the "correct" analysis ought to be. Simultaneously, there are simple games for which it is quite clear what the "correct" traditional analysis is, but for which the conclusions of this analysis are profoundly uncomfortable. In the second place, doubts are appropriate about the logical foundations of the traditional approach. It is, by now, a cliche that completeness is incompatible with consistency for formal deductive systems.1 Any such system is therefore "imperfect" in the sense that it can be replaced by a better system. Traditional theory seeks to evade this issue by confining its attention exclusively to substantive questions (what do players do?) and entirely neglecting procedural questions (how do players decide what to do?). As Simon (1976) has observed, this is a criticism that is relevant not only to game theory, but to economic theory in general.2 In game theory, however, matters are particularly serious. Indeed, in the first part of this paper, I argued that it is because traditional theory neglects procedural questions that it has failed to "deliver the goods."

To seek to tackle procedural questions seriously is to commit oneself to an attempt to model the thinking processes of the players explicitly. Traditional game theory lacks such a model and hence is helpless in the face of the counterfactual: suppose a perfectly rational player made the following sequence of irrational moves, then. . . . But an equilibrium analysis cannot evade such counterfactuals. What keeps players on the equilibrium path is their expectation of what would happen if they were to deviate from equilibrium play. However, the traditional straitjacket makes it very hard to confront such issues squarely. The result is the construction of magnificent mathematical edifices of which a medieval scholastic might justly be proud, but little in the way of genuine progress.

The existence of an explicit model of a player allows out-of-equilibrium play to be treated in a non-metaphysical manner. To illustrate this point, a lighthearted example is borrowed from Selten and Leopold (1982) who, in turn, borrowed it from Lewis (1976). Consider the counterfactual: if kangaroos had no tails, they would topple over. Selten and Leopold argue3 that, to make sense of such a counterfactual, it is necessary to have available a background theory which is adequate to admit the construction of a computer model of a kangaroo. The parameters of such a model may then be varied so as to "remove the kangaroo's tail." The stability of the resulting construct can then be tested (perhaps by simulation), thereby attaching a meaning to the superficially nonsensical counterfactual.

Thus, in game theory, the observation of an unanticipated4 sequence of moves on the part of an opponent should lead to an updating of the model used to describe his or her thinking processes and hence to a revision of the predictions of his or her future behavior. Of course, in some situations such a procedure will result in "hustling"5 and other attempts at deceit. But untangling such attempts is part of what game theory should be about.

Such a viewpoint forces attention to be paid to issues which traditional game theory leaves unformalized and therefore, in accordance with the Bourbaki ethos, neglects altogether. In particular, the process by means of which equilibrium is reached (the "libration") will necessarily determine the nature of the equilibrium achieved. Part I of this paper distinguished two types of equilibrating environment: the evolutive and the eductive. The former is undoubtedly the more significant for positive economics. In such an environment, the players are seen as simple stimulus-response machines whose behavior has the appearance of having adapted to the behavior of other machines because ill-adapted machines have been weeded out by some form of evolutionary competition. The dynamics of the libration process are therefore external to the players and visible over time to an observer as a game is played repeatedly.

This paper, and its predecessor, which consist of amplified and revised extracts from a working paper (1986) of the same title, were written with the support of NSF grant number SES-8605025. I am indebted to numerous individuals for much useful comment, but particularly to Ariel Rubinstein. Figures 1-7 appear on pages 49-53. © 1988 Cambridge University Press 0266-2671/88 $5.00 + .00

1. Of sufficient complexity (Gödel's theorem).
2. The theory of the organization of the firm is a notable exception.
3. Lewis (1976) offers a more grandiose theory.
4. In what follows later, all sequences of moves are anticipated, in the sense that they are assigned positive probability. What is to be understood is that the observation of a very low probability event will lead to a very substantial revision of the model via Bayesian updating.
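The Bayesian revision of the opponent model described above (and in footnote 4) can be made concrete in a few lines. The opponent "types" and all of the probabilities below are invented purely for illustration; only the structure of the updating comes from the text.

```python
# Sketch of Bayesian updating of a model of the opponent (footnote 4).
# The types and probabilities are invented for illustration.

# Prior over rival models ("types") of the opponent.
prior = {"equilibrium-player": 0.95, "hustler": 0.04, "dinosaur": 0.01}

# Probability each type assigns to the observed (very unexpected) move.
likelihood = {"equilibrium-player": 0.001, "hustler": 0.5, "dinosaur": 0.3}

def update(prior, likelihood):
    """One step of Bayes' rule: posterior is proportional to prior x likelihood."""
    unnormalized = {t: prior[t] * likelihood[t] for t in prior}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}

posterior = update(prior, likelihood)
# The observation of a very low probability event produces a very
# substantial revision: the "hustler" model now dominates prediction.
```

The point of the sketch is only that a single observation to which the favored model assigned very low probability is enough to shift most of the posterior weight onto a rival model of the opponent's thinking.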

This paper, however, follows most economic theorists in confining attention to the eductive context.6 This means that equilibrium is achieved through careful reasoning by the agents before and during the play of the game. Sometimes the use of equilibrium strategies in this context is made part of the axiomatic characterization of rationality. But this evades the fundamental question: namely, what is the "right" equilibrium? To resolve this question, it is necessary to reduce the problem to an appropriate one-person problem in which players deduce information about the expected play of their opponents through some form of introspection. Central to this problem are the reasoning chains which begin, "If-I-think-that-you-think-that-I-think. . . ." Such considerations are internal to a player. The dynamics of an eductive libration are therefore invisible to an observer. But the point of view we are pushing here insists that this is no good reason for proceeding as though the dynamics were absent altogether.7

The use of computing machines (automata) to model players in an evolutive context is presumably uncontroversial. For evolutive work, the emphasis belongs on machines of low complexity compared with their environment, in accordance with Simon's (1955, 1959, 1977) notion of bounded rationality. In Part I of this paper, it was argued that computing machines are also appropriate for modeling players in an eductive context. Mathematicians are sold on the idea that they can model themselves, in their aspect as formal calculators, in this way.8 One can therefore defend the use of computing machines, in an eductive context, by claiming that players are to be modeled as mathematicians of the formalist school. Such computing machines, however, must be understood to have the potential to be of very high complexity compared with their environment. In particular, there is no reason why such machines should not reprogram themselves as the game proceeds.

5. Playing "badly" in the hope of tempting the opponent into an unwise attempt at exploitation.
6. Although without the implicit assumption that the same results are to be expected in an evolutive setting.
7. It is not denied that common knowledge conditions on beliefs allow a satisfying static analysis of certain static games. But an evasion of the study of the process by means of which beliefs are formed leads to difficulties for dynamic games.

Further comment on the previous paragraph is postponed until Section 2. The point which requires emphasis at this stage is that, even in an eductive context, with a machine that has access to all relevant information and which can compute for an arbitrarily long period, it is necessarily the case that the machine will sometimes be in error in the predictions it makes about its opponents. The reason is essentially that the machine would sometimes calculate forever if this were permitted. To avoid this, a "stopping rule" must be built in. If such a stopping rule guillotines an exact calculation, then the machine will be forced to employ an alternative "guessing algorithm." By its nature, such a guessing algorithm will sometimes guess wrongly. Part I of this paper offered a formal argument to this effect, along with the conclusion that "perfect rationality" is therefore an unattainable ideal useful only for metaphysical purposes.9

Observe that there is necessarily an arbitrary element in the choice of stopping-rule-cum-guessing algorithm, if only because it will always be better to stop later and hence guess less frequently. This may be uncomfortable for the mathematically minded, but it has advantages when it comes to "removing the kangaroo's tail" in order to explain an unanticipated event.
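What a stopping-rule-cum-guessing algorithm amounts to can be caricatured as follows. The "exact calculation," its cost model, and the fallback heuristic are all inventions for illustration; only the structure (compute exactly when the budget allows, otherwise guess, and a later stopping point means fewer guesses) comes from the text.

```python
# Caricature of a stopping-rule-cum-guessing algorithm. The toy
# game-tree evaluation and the heuristic are invented for illustration.

def exact_value(position, depth):
    """Pretend exact game-tree evaluation; its cost grows with depth."""
    if depth == 0:
        return position % 3 - 1          # toy terminal payoff in {-1, 0, 1}
    return max(exact_value(position * 2 + move, depth - 1)
               for move in (0, 1))

def evaluate(position, depth, budget):
    """Stop the exact calculation when the budget (the stopping rule)
    would be exceeded, and fall back on a cheap guessing algorithm."""
    steps_needed = 2 ** depth            # cost of the exact calculation
    if steps_needed <= budget:
        return exact_value(position, depth), "exact"
    guess = 1 if position % 2 else -1    # crude heuristic: may guess wrongly
    return guess, "guessed"

# A larger budget (a later stopping point) guesses less frequently,
# which is why the choice of stopping rule is necessarily arbitrary.
value, how = evaluate(position=5, depth=20, budget=1000)   # forced to guess
```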
From the point of view being advocated here, a rational opponent cannot be seen as a unique "type." There are an infinite number of types of model which can justifiably be described as "rational." This leaves open the possibility of explaining deviations from predicted play without the necessity of abandoning the hypothesis that the opponent is rational. He or she may not be of a type to which high probability was attached originally, but such probabilities can be updated as the game proceeds.

8. The reference is to the Church/Turing hypothesis and includes Turing machines under the general heading of computing machines.
9. Megiddo (1986) summarizes matters elegantly with the oxymoron: " . . . a fully rational player . . . can even decide undecidable problems."

I am aware that the "trembling-hand" explanation of deviations also does not require abandoning the hypothesis that the opponent is rational after a deviation. However, in Part I of this paper, it was argued at length that, in an eductive context, the trembling-hand explanation should be one of last resort. This is not to deny that irrational "mistakes" may frequently be needed to explain deviations. All that is claimed is that, when the "kangaroo's tail" can be removed in various ways, unlikely parameter variations of the model should not be ranked above likely parameter variations.

Part II of this paper is no more able to offer sharp answers to the problems raised than Part I. Since sharp answers would provide an incidental resolution of the problem of scientific induction, this is perhaps not surprising. However, diffuse answers are not without value and those offered do seem to provide some insight into a number of questions. The interpretation of mixed strategies (Section 4) and the issue of "forwards induction" (the examples of Sections 6 and 7) deserve explicit mention. On the latter question, a summary of the general approach advocated for the eductive context is postponed until the conclusion (Section 8). It may seem perverse to delay this account until the very end, but I am anxious that what is essentially a program for further study not be confused with yet another "equilibrium concept." Perhaps it may provide a framework for an equilibrium concept if a good theory of eductive libration processes ever emerges, and perhaps the considerations of Sections 2, 3, and 5 may have a place in such a theory. As things stand, however, possibly their most useful role is in providing a warning that schemes like that offered in the conclusion should not be interpreted too naively.

Finally, as in so many things, it should be noted that Harsanyi and Selten have visited most of these issues before. We refer, in particular, to Selten (1975) on "mistakes," Selten (1978) on "machine models," Marschak and Selten (1978) on "correlated trembles," Harsanyi (1967, 1968) on "types," Harsanyi (1975) and Harsanyi and Selten (1980, 1982) on "the tracing procedure."

2. COSTLY RATIONALITY

This section is an attempt to relate the ideas offered in this paper to the more general issues usually discussed under the somewhat misleading heading of "bounded rationality." Megiddo (1986) is a thought-provoking introduction to these questions, which are both conceptually difficult and technically demanding. However, as always in this paper, this section confines itself to generalities.

What would a genuinely applicable theory of a rational player be like? Obviously, it would have to take account, not only of what is technically feasible in the way of calculation, but also of the costs of calculating. As Megiddo (1986) and others emphasize, the cost of the time used in calculating will be a major consideration. Real-life games always involve some explicit or implicit constraint on the time that may be taken to make a move.10 A chess clock is perhaps the most evident embodiment of such a time constraint. Accepting this view forces a simultaneous acceptance of the requirement that a model of a rational player will have to incorporate heuristics, which it uses to decide whether or not to carry out a specific calculation and to guess, or "value," the results of calculations which it decides not to carry out in detail. (Recall the stopping-rule-cum-guessing algorithm of Section 1.) This idea will be familiar to those who know something of the mechanics of chess-playing computer programs.

An important incidental is that an attempt to model the heuristic decision-making process along Bayesian lines is hopelessly inappropriate. The point will not be pressed here since it follows from what was said in Section 6 of Part I. Briefly, the achievement of consistency cannot realistically be regarded as costless. On the contrary, the necessary calculations should be expected to dwarf anything that has been discussed hitherto.

Of all the many possible models of costly rationality, which should be chosen? One approach is to hypothesize "meta-players" who design the machines which actually play. In Megiddo and Wigderson (1986), Neyman (1985), Rubinstein (1985), and Abreu and Rubinstein (1986), these "meta-players" are seen as playing a "meta-game"11 in which a pure strategy is the choice of machine. A Nash equilibrium is then sought for this meta-game. A natural criticism is that this procedure evades the issue by transferring the problem of costly rationality from the players of the game to unmodeled meta-players. But these meta-players have to solve a problem which is even more complex than that faced by the players themselves. However, such a criticism is fair only if the meta-players are granted a real, rather than a metaphorical, existence. Consider, by way of analogy, the status of the "auctioneer" in the Arrow/Debreu model of a market. Nobody believes that such an auctioneer really exists, except in rare circumstances. No individual nor organization actually takes on the highly complex task of calculating the market-clearing prices. This is achieved via an unmodeled tatonnement process for which the auctioneer serves as a simplifying substitute. In a methodologically analogous fashion, meta-players can be seen as a metaphor for an evolutionary process. The question of the complexity of the decision-making processes attributed to the meta-players then ceases to be an issue since it is unloaded onto the environment. This is to argue that work of this kind should be classified as evolutive in character, along with such work as that reported by Maynard Smith (1982, Chapter 5) on the evolutionary stability of animal learning rules.

10. Formally, one may think of the failure to make a move by time t as a species of "move" in itself, with appropriate consequences for the game payoffs.
11. Not a meta-game in the sense of Howard (1971), but in the natural sense.

Before more is said about these evolutionary questions, another point needs to be made. Usually, theoretical realizations of bounded rationality incorporate a fixed, exogenously determined, upper bound on some aspect of the complexity of the strategies available to a player (i.e., on the complexity of a machine available to a meta-player), and, from now on, this paper will use the term "bounded rationality" only in this sense.12 Such an exogenously determined constraint must be expected to be active in equilibrium. For example, in Neyman (1985) or Megiddo and Wigderson (1986), cooperation is obtained as equilibrium behavior by boundedly rational players in games like the repeated Prisoner's Dilemma, precisely because the optimal strategies make the constraint active. To quote Megiddo (1985):

The underlying idea is that, no matter what the scarce resource is, [meta-]players design machines that waste the entire amount of available resource so that they cannot perform even simple tasks like counting the number of stages [in a repeated game].

It was in this sense that the machines which model the players in an evolutive context were said to be low in complexity relative to their environment.

One may think of bounded rationality, as construed above, as a degenerate case of costly rationality in which costs are zero if the exogenously imposed constraint is not violated, but unacceptably large if it is. However, this will obviously not do for an eductive analysis, for which a minimal requirement must surely be that the marginal costs of calculation must always be very small. Thus, if a machine fails to carry out certain computational tasks in equilibrium, it will be because the machine has chosen not to do so, not because it is unable to do so without abandoning other computational tasks. In an eductive context, any computational constraints on a machine must therefore be endogenous (i.e., self-imposed). This point is made in an encouraging paper of Abreu and Rubinstein (1986).13 It may clarify matters to observe that bounded rationality, in essence, empowers a meta-player to make commitments to stay within a certain strategy set. But Selten's reasons (Part I, Section 2) for rejecting commitment in an eductive context, remain valid even if the players are modeled as computing machines.

12. Although, in philosophical discussions, the term "bounded rationality" often signals only that procedural questions are not to be ignored entirely.
13. Encouraging because, in spite of allowing complexity to be endogenous and making few assumptions about the cost of calculating, they are still able to generate substantive results.
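The kind of machine at issue in the Neyman and Megiddo/Wigderson results can be pictured as a finite automaton of bounded size. The two-state "grim trigger" machine below is a standard textbook example for the repeated Prisoner's Dilemma, not one taken from those papers.

```python
# A two-state finite automaton ("grim trigger") for the repeated
# Prisoner's Dilemma -- a standard illustration of the kind of machine
# a meta-player might choose; it is not drawn from the cited papers.
# States: "C" (cooperate) and "D" (defect).

GRIM = {
    # (current state, opponent's last move) -> next state
    ("C", "C"): "C",   # keep cooperating while the opponent does
    ("C", "D"): "D",   # a single defection triggers permanent punishment
    ("D", "C"): "D",
    ("D", "D"): "D",
}

def play(machine, opponent_moves, start="C"):
    """Run the automaton against a fixed sequence of opponent moves;
    the machine's move in each round is simply its current state."""
    state, moves = start, []
    for opp in opponent_moves:
        moves.append(state)
        state = machine[(state, opp)]
    return moves

# With only two states the machine cannot count rounds, so it cannot
# defect "just before the end" of a long but finite game: the bound on
# its complexity is active in equilibrium, as the quotation describes.
history = play(GRIM, ["C", "C", "D", "C", "C"])
# history == ["C", "C", "C", "D", "D"]
```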


So far this section has argued that some of the recent work, in which costly or bounded rationality has been imported into a game-theoretic framework, can be seen as exercises in evolutive game theory. At the risk of confusing matters hopelessly and completely, I now propose to argue that, even in an eductive context, evolution has a central role to play. One might even take the view that, without some evolutionary story in the background, rationality, in the sense required for an eductive analysis, makes little sense. Given several rival models of rationality, how is one to make a choice?14 The practical man's answer is to let them compete and see which survives. This, for example, is the ethos behind "Dutch book" defenses of Bayesianism.15 Of course, such a view requires interpreting the word "evolution" in an adequately wide sense. The word invites concentration on biological processes, but it is important to draw attention to social evolution also. Genetics doubtless largely determines the brain's hardware, but its software is probably more relevant to the concerns of this paper. To paraphrase Dawkins (1976): memes (ideas, learning rules, behavioral norms, etc.) are just as much the object of evolutionary pressures as genes, but memes multiply through imitation rather than physical replication. In particular, education serves as a meme propagator, and this is one of the reasons for the choice of the word "eductive."

What distinguishes eductive theory from evolutive theory is that, in the former, evolution operates at one remove. In evolutive theory, evolutionary processes work directly on the strategies in one specific game. When costly rationality is taken into account, this means that the evolutionary processes act directly on the rules of behavior which implement these strategies. Such rules of behavior can be thought of as computer programs or as computing machines operated by such programs.
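The evolutive case, in which selection acts directly on the strategies of one specific game, is what discrete-time replicator dynamics formalizes. A minimal sketch, with a 2x2 payoff matrix invented for illustration (a Prisoner's-Dilemma-like game in which "D" strictly dominates "C"):

```python
# Minimal discrete-time replicator dynamics: evolutionary pressure acts
# directly on the strategies of one specific game, as in the evolutive
# case described above. The payoff matrix is invented for illustration.

PAYOFF = {("C", "C"): 2, ("C", "D"): 0,
          ("D", "C"): 3, ("D", "D"): 1}

def step(pop):
    """One generation: each strategy reproduces in proportion to the
    payoff it earns against a randomly drawn member of the population."""
    fitness = {s: sum(pop[t] * PAYOFF[(s, t)] for t in pop) for s in pop}
    mean = sum(pop[s] * fitness[s] for s in pop)
    return {s: pop[s] * fitness[s] / mean for s in pop}

pop = {"C": 0.9, "D": 0.1}          # mostly cooperators to start
for _ in range(100):
    pop = step(pop)
# Ill-adapted rules are weeded out: "D" takes over the population.
```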
In eductive theory, on the other hand, evolutionary processes work on a master-program which has the capacity to choose strategies in a wide variety of disparate games, some of which may never have been played before. This distinction will be reflected in the relative complexity of the decision-making process in the evolutive and the eductive cases respectively.

14. A really good rationality model should be able to solve this choice problem, too - in which case it should pick itself! I have constructed an impossibility theorem for social choice in certain special circumstances using this principle (Binmore, 1975). Perhaps a more general theorem is true.
15. Although the fact that rational agents will treat a bet-laying situation as a game, and hence choose strategically, means that this defense is flawed.

How are the evolutionary origins attributed to a master-program in the eductive context to be operationalized? More is involved than making the choice of constraints on calculation endogenous and, of course, requiring that the costs of calculation be very small. The device of allowing meta-players to select Nash equilibrium strategies in a machine-choosing


meta-game will not suffice in an eductive context. This is not for novel qualitative reasons. It is simply that the inadequacies of the expedient in the evolutive case become more pressing in the eductive case.

Imagine a basic collection of possible master-programs. From this "meme-soup" of possible master-programs, individuals are repeatedly drawn at random to act as players in games similarly drawn at random from a wide menu of possible games. A selection mechanism of some sort operates so that master-programs which engender low payoffs become less frequent in the soup than those which engender high payoffs. Think of a large population of hosts (the hardware), each of which may be infected with a master-program (the software), with successful master-programs sometimes displacing less successful master-programs. Interest then centers on the master-programs which survive this evolutionary process.16 At least two regulatory mechanisms need to be taken into account in considering what governs the complexity of the surviving master-programs. The first is that punishments must be provided for delayed decisions. The second is that complex machines are more likely to malfunction than simple machines.17

16. Axelrod's (1984) "olympiad" for programs which play the repeated Prisoner's Dilemma is a good image, aside from the fact that the game-menu contains only one item.
17. The physical costs of acquiring and operating hardware seem much less important.
18. When a game is played repeatedly, the term "one-shot" game is used to distinguish the game as played just once from the "repeated game" or "super-game" in which the game and all its repetitions are treated as a whole.
19. Recall that the population of master-program hosts is assumed to be large. Note also, that such considerations will arise if the game selected for play from the menu of available games happens itself to be a repeated game.

In order to say something about the master-programs (or machines or memes) which thrive in this evolutionary competition, it is necessary to look for an appropriate equilibrium. But a Nash equilibrium is not an appropriate equilibrium concept. This is not because of the familiar fact that repeated play of a game between the same opponents admits a whole range of phenomena inexplicable with Nash equilibrium in a one-shot game18 - for example, cooperation, altruism, revenge, threats, etc. (see Aumann, 1981). Each time a new game is played, the players are re-selected at random and hence the players cannot use the current game as an opportunity to punish a specific opponent for bad behavior in the past, nor to signal to a specific opponent that good behavior would be jointly profitable in the future.19 Nor is Nash equilibrium rejected because of an inadequate "informational basis." It is understood that the players are to be supplied with all information allowed by the rules of a game before playing it (and this includes the payoffs of the opponents to the extent that this is permitted). Moreover, large numbers of observations of the play of games in the past will allow the master-programs to "learn" how relevant characteristics of master-programs are


distributed overall in the population as equilibrium is approached. An "explanation" is therefore provided of how certain matters can become "common knowledge" as required in classical game theory (although, admittedly, the explanation is only slightly less skeletal than the traditional appeals to "implicit communication" or "a common cultural heritage").

One reason for rejecting Nash equilibrium is that, if one master-program (or set of closely related master-programs) wins out in the evolutionary struggle, then, towards the end of the struggle, there will be a high probability that such a master-program will be playing itself (or a close relative or relatives) when called upon to play. Given that it is the "welfare" of the master-program, rather than the "welfare" of any particular host or hosts carrying the master-program, that matters for evolutionary success or failure, it is clear that a master-program, playing itself, has a one-person decision problem to solve rather than a multi-person problem. A mechanism therefore exists for the evolution of cooperative behavior as described, for example, by Axelrod (1984). Biologists refer to this as kin-selection. A gene which generates behavior which favors relatives, generates behavior which favors itself, because the gene has a good chance of being replicated in the body of a relative. As far as game theory is concerned, these considerations provide some support for the intuition that, if one equilibrium Pareto-dominates another, then the former should be preferred.20 But why not go further and abandon the equilibrium requirement altogether? This is a popular question which is usually asked in the specific context of the one-shot Prisoner's Dilemma by those who find the game theory solution paradoxical. Hofstadter (1983) gives an unusually clear version of the position of the "anti-equilibrium school." In essence, if I am playing myself, why do I not just maximize the sum of my payoffs? Game theorists usually shrug off this question,21 but they are really not entitled to do so, given their traditional view that game theory concerns players who all reason in precisely the same "perfectly rational" manner.

20. The discussion which follows leaves the status of this intuition indeterminate. The reason is that there is a tension between the forces which push in this direction and those which maintain the stability of the equilibrium in the face of "mutant invaders." Which wins out would seem to depend on the precise modeling of the dynamics. As always, Harsanyi and Selten (1980, 1982) have considered this point earlier. Perhaps the evolutionary viewpoint of this section will clarify the nature of the tension they see between Pareto-dominance and risk-dominance when comparing equilibria. It will not, on the other hand, clarify their resolution of this tension which they do not claim to be other than a formalization of their educated intuition on this topic.
21. With some excuse, since it certainly leads to silly conclusions.

From the viewpoint advocated here, however, it is not true that a winner in the evolutionary struggle will necessarily be playing itself (or


MODELING RATIONAL PLAYERS 19

a close relative).22 A small probability will exist, for example, that an opponent may be a "dinosaur" using an obsolete strategy that has not yet been eliminated. Indeed, if the malfunctioning of programs is allowed for, inferior strategies will never be eliminated. Such malfunctioning will ensure a continual invasion of the population of master-programs by "mutants." A winning master-program will therefore have to incorporate a facility designed to detect and identify such deviants (during the play of each separate game) with a view to exploiting them, where possible. More immediately to the point, however, is the fact that a master-program will not win the evolutionary struggle in the long run unless it is, itself, immune to exploitation by mutants. Thus, although mutants (and/or dinosaurs) may be present in relatively small numbers, their presence must be expected to have a very strong influence on the behavior of the winning master-program.

To say that the idea of a Nash equilibrium in a machine-choosing meta-game should be replaced by the idea of an evolutionary stable equilibrium (Maynard Smith, 1982) would be a gross simplification. For one thing, the nature of the mutants around will be a function of the equilibrium machine. Nevertheless, it does convey the flavor of what is being proposed.

In summary, the use of the device of a machine-choosing meta-game in an eductive context requires that:

1. Marginal costs of complexity be always small, so that bounds on the amount of computation are determined endogenously.
2. Machines be capable of playing any role in any one of a wide variety of games, including games of which they have little or no previous experience.
3. Equilibrium in the meta-game be interpreted in an evolutionary sense. As a minimum, the equilibrium should be immune to invasion by mutant machines created by malfunctions in the machines currently present in the equilibrium population.
4. Machines be capable of recognizing and responding to deviant behavior.23

22. It is not argued that there will necessarily be a unique winner. On the contrary, it is to be expected that a mix of winners will survive who calculate in much the same way but not necessarily with the same precision, nor from precisely the same premises. It may even be that a mix of very disparate winners may survive, held together by an analogue of "Bayesian equilibrium."

23. In Section 5 of Part I, machines are considered which receive, as part of their preliminary input, the Gödel number of an opponent. These were considered in order to make a philosophical point: namely, that the assumption that an opponent's mixed strategy choice can be predicted with precision is questionable. In the current context, it would, of course, be an evasion to postulate that an opponent's Gödel number is known a priori without offering an explanation of how it became known.


It is, of course, one thing to formulate such dicta and quite another to construct a theory in which they are operationalized. In what follows, their role is merely to provide a standpoint from which to evaluate the structure for a game-playing machine outlined informally in Sections 3 and 5.

3. GAME-PLAYING MACHINES24

What follows in this section clearly has wider implications than the game-theoretic applications for which it is proposed. In particular, it has some significance for the software-hardware approach to the mind-body problem advocated by Putnam (1975) and others, and some relevance to the explanations advanced by some biologists for the evolution of self-consciousness in humans. However, I have nothing particularly original to offer on these general issues. The aim is to codify what seems to be an emerging consensus on these questions, in a manner that, although far from formal, is sufficiently precise to allow the possible applications in game theory to be sensibly evaluated.

Recall, from Part I, that only the play of contests is to be considered. I use this term to indicate a game in which no pre-play communication between the players about the game is permitted (nor communication during its play except insofar as the formal moves provided for in the rules of the game can be used for this purpose). To say that a game is a contest is therefore to make a statement about the environment in which the game is played. The study of contests has a good claim to be regarded as fundamental in that games played in environments which do permit some communication can be reduced, in principle, to contests by modeling the communication possibilities as formal moves in a larger game (Nash, 1951; Binmore and Dasgupta, 1987).

The fact that players cannot exchange messages before the play of the game does not imply that they do not share information beyond that supplied to them in accordance with the rules of the game. In the language of the preceding section, they will certainly share the information that they have been chosen from the same population or society. Among other things, this will involve a shared knowledge about arbitrary conventions which will have evolved in the population. Examples are, "Drive on the right," or, "Condition your investment decisions on the level of sunspot activity." Such conventions allow players to coordinate their behavior by making use of the fact that they can commonly

24. A preliminary version of this paper (Binmore, 1986) used the term "Bayesian automaton." But it seems that the word "Bayesian" cannot be employed without the risk of being identified with the "naive Bayesianism" criticized in Part I, Section 6. In any case, a change of terminology may serve to signal that the ideas have been developed somewhat. The word "machine" is used in a deliberately wide sense to include any adequately complex computing device.


observe phenomena which are not intrinsic to the game (such as the labels used to identify strategies or actions).

Sometimes difficulties over such conventions are allowed to confuse rationality discussions.25 But the issues seem to me to be secondary and best studied by looking directly at the manner in which such conventions actually evolve, by explicitly modeling the appropriate extrinsic phenomena and employing the theory of repeated games. For this reason, the term "contest" will be taken to imply that the game is played in an environment in which any labeling of events which players observe together has yet to take on a meaning. One may think of this requirement as forbidding the "tacit communication" implicit in a common understanding of the historical significance of arbitrary symbols or extraneous circumstances.

Recall that the basic aim of this paper is to explore what is involved in an eductive libration process. The advantage of "factoring out" communication possibilities that are not explicitly modeled as part of the game (which is what is achieved by looking only at contests) is that pre-play libration takes place entirely inside the head of each player separately. One consequence is that care is necessary in interpreting the meaning of an equilibrium in an eductive context - particularly equilibria requiring mixed strategies. This point is made at greater length in Section 4. All that needs to be said at this point is that machines (or master-programs) which survive the evolutionary struggle described in the previous section will exhibit behavior which is adapted to the mix of behaviors exhibited by the equilibrium population. Thus, when two "rational" machines face each other in a static game, their choices of strategy will be approximately optimal given their predictions of the strategy to be chosen by the other player.
But this prediction need not always be realized: either because the equilibrium population may contain many different types of machine (as in Harsanyi's (1967, 1968) theory of incomplete information); or because the manner in which the machines calculate has some built-in indeterminacy;26 or both at once. The ideal of "perfect

25. For example, there is the red herring concerning what "rationality" requires in the "pure coordination" bimatrix game in which both matrices are the 2 x 2 identity matrix. If the strategy labels have no significance for the players, then there is nothing to be said, since no reason can then exist for supposing it more likely that an opponent will choose one label rather than another. (Note, incidentally, that the location of strategy labels in a visual display may well be significant.) If, on the other hand, the labeling of strategies is significant, then nothing can be said without a preliminary discussion of how and why the labels are significant.

The example of the preceding paragraph is trivial. At a deeper level, however, there are the "rationality" claims made for such notions as correlated equilibrium (Aumann, 1974, 1987).

26. This is not the same as admitting the possibility that mixed strategies may be used. The reference is to uncertainty about which calculation has been employed in determining which strategy is to be used.


rationality" excludes such possibilities, but the fundamental tenet of this paper is that this ideal is unattainable.

So far, all that has been achieved is to delimit the class of games to be considered so that all pre-play activity can be located inside each machine separately. Of course, as the game proceeds, players will learn about the structure of the machines they are playing against by studying the moves they choose. This will lead them to revise their predictions of what their opponents are likely to do next. Only in static games (i.e., games which can be thought of as taking place at a single instant) do these learning considerations become irrelevant so that only the pre-play calculations matter. For this reason, this section and the next are concerned only with static contests. The question of learning about opponents over time is left until Section 5. This allows attention to be concentrated on the amount of calculation a machine does, although learning issues cannot be neglected altogether.

The basic model I propose is that of a computing machine programmed to write programs to play games. It may therefore be thought of as a "meme-generating meme." For a static game, it receives, as input, a coded description of the game and responds by writing a program which analyzes the game and recommends a choice of (possibly mixed) strategy. An analysis of the game includes a prediction of the behavior of the machines acting as the other players. For simplicity, attention will be confined to two-player games so that only one opponent need be considered. On what basis is a prediction of the opponent's behavior to be made? Note that a description of the opposing machine (its Gödel number) is not given as part of a machine's data as in Part I, Section 5, nor data about the manner in which the specific machine currently being played has played games in the past.
The possibility of establishing a reputation as a certain type of player is therefore absent.

In predicting the opponent's behavior, a machine can therefore only draw on the fact that both itself and the opposing machine have been drawn from the same population of potential players. The data available from this source will be classified under two headings: objective and subjective. The objective data consist of observations of the way games have been played in the past by players drawn at random from the same population. Obviously, data on the way the current game, or similar games, has been played will be particularly important. The subjective data consist of the machine's own master-program. (For simplicity, memories of more primitive master-programs that may once have controlled the machine in the past, and may still control other machines in the present, are ignored.) The machine's master-program, it is assumed, is stored like other data and can be accessed to an extent to be discussed later. Notice that the subjective data is just as "hard" as the objective data.

Yet more taxonomy is now required. Two situations will be distinguished in respect of the available objective data. The first occurs when there is a very large quantity of relevant objective data: for example, if the strategies chosen by players drawn at random from the population at large have been observed for several million previous plays of the game which is about to be played. The second occurs when the quantity of relevant data is not very large: perhaps only a few previous plays of the current game, or similar games, have been observed (although the requirement that evolution has brought about equilibrium in regard to the master-programs requires that there must be a very large amount of objective data available on dissimilar games).

The two situations are distinguished because it seems to be only the second which is of genuine interest for an eductive analysis. Unless one wishes to contemplate the survival of master-programs whose predictions are at variance with the empirical evidence, the first situation is essentially evolutive in character. A master-program need only write a program to play the game which maximizes expected utility given the empirically observed probability distribution over the opponent's strategies. Complexity problems are therefore unloaded onto the environment. My suspicion is that those who insist that the choice of an equilibrium strategy should be taken as an axiomatic requirement of "rationality" actually have this evolutive setting in the back of their minds. In such a setting, the axiom is certainly defensible, provided that the word "equilibrium" is not over-specified. For example, game-playing machines will be "satisficers" to the extent that they will carry through calculations only to an appropriate level of approximation. But, as Radner (1980) has pointed out (see also Megiddo, 1986), very different results can sometimes follow when precise equilibria are replaced by approximate equilibria.
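In the data-rich, evolutive situation, the master-program's task collapses to straightforward expected-utility maximization against the observed frequencies. A minimal sketch of such a program (the payoff matrix and observation counts below are hypothetical illustrations, not taken from the paper):

```python
def empirical_best_reply(payoffs, opponent_counts):
    """Best reply to the empirical frequency distribution of the
    opponent's past strategy choices (the data-rich, 'evolutive' case)."""
    total = sum(opponent_counts)
    freqs = [c / total for c in opponent_counts]          # observed mixed strategy
    expected = [sum(u * f for u, f in zip(row, freqs))    # expected utility of each
                for row in payoffs]                       # of one's own pure strategies
    return max(range(len(payoffs)), key=expected.__getitem__)

# Hypothetical 2 x 2 game (row player's payoffs) and observation counts.
payoffs = [[2.0, 0.0],
           [3.0, 1.0]]
print(empirical_best_reply(payoffs, [700, 300]))  # → 1
```

Note that nothing in this procedure is strategic: the opponent is treated as a statistical regularity, which is exactly why complexity problems are "unloaded onto the environment."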

It is when the amount of objective data is inadequate for a "naive" empirical approach that an eductive analysis becomes necessary. Any libration dynamics can then no longer be treated as being entirely external to the players and hence requiring no justification on rationality grounds, so that equilibrium has to be treated as a fundamental principle. In the situation of interest for an eductive approach, if "rationality" generates equilibria, this fact must be explained, together with the reasons for the selection of one equilibrium rather than another when multiple equilibria exist. Somehow, the machine's subjective data must be harnessed to this end, along with whatever available objective data may bear on the matter.

An aside is now necessary on data which may be available at one remove via the reports of others. As far as objective data is concerned, this is no problem provided appropriate reservations are made about the reliability of material obtained secondhand. But what of reports from other machines about their subjective data? What of the deductions that can be made about the subjective data of other machines from observing


their general game-playing behavior? Both of these considerations are important to the assumption that lies at the heart of the approach advocated here: namely, that introspection is a viable method for predicting the behavior of others. More specifically, in order to predict what an opposing machine will do in a situation for which adequate objective data are not available, the assumption will be that a machine will use, as a guide, what it would do itself, if it were in the same situation. Such a procedure makes sense only if the answers to the preceding questions are supportive of such a predictive device. To consider the plausibility of this requirement, it is necessary to return to the putative evolutionary origins of the game-playing machines. Recall that the basic mechanism seen as driving the evolutionary process in Section 2 was imitation, with an emphasis on the role of education in facilitating imitation. Thus, as a machine learns about the subjective data present in other machines in the population, it must be expected to incorporate into its own master-program those elements of other master-programs which are successful and to suppress those elements of its own original master-program which are less successful. In the long run, the tendency will be to generate a population with closely similar master-programs, and hence its members will have good reason to suppose that introspection is a valuable source of information about the thinking processes of others. It is not claimed, of course, that such a conclusion is necessary: only that it serves as a useful working hypothesis.

How does a machine make use of its subjective data in seeking to predict the behavior of the opponent?27 Sometimes, of course, it will not need to make such a prediction, as in the one-shot Prisoners' Dilemma. Let us, however, confine attention to those cases where a prediction (albeit an approximate prediction) is necessary if the situation is to be reduced to a one-person optimization problem. It will then be assumed that the machine seeks to simulate the reasoning processes of the opponent by running its own master-program28 with the opponent's input data and using the output as a prediction of the opponent's play.

27. The answer to this question will clarify why a master-program which writes programs is introduced rather than supposing that all the necessary programs have been written already and stored ready for use. Such a fixed library of programs is then just a poor relation of the "grand book of game theory" to which, following Von Neumann and Morgenstern (1944), game theorists are fond of appealing when pressed for a defense of equilibria in an eductive context. Binmore (1986) evades introducing the idea of a master-program by this method, but only by evoking an "outsider" who intervenes to "improve" the design of machines. This "outsider," or the master-program in the current paper, substitutes for the author of the "grand book of game theory" in the traditional parable.

28. A simple, but costly, method would be to copy the instructions of the master-program to another storage site and then to introduce enough auxiliary instructions to make this usable as a subroutine or procedure. Such a procedure would then correspond to what the biologists call a "simulator" (Monod, 1972).
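The proposal, predicting the opponent by running one's own decision procedure on the opponent's input data, can be caricatured in a few lines. The explicit depth parameter below stands in for the stopping rule analyzed later in this section, cutting off the regress that arises when a machine simulates itself; the game and the fallback `guess` are hypothetical illustrations, not constructs from the paper:

```python
def decide(payoffs, guess, depth):
    """Choose a pure strategy in a symmetric 2 x 2 game by simulating
    the opponent with one's own decision procedure.

    payoffs[i][j]: own payoff when playing i against j.
    guess:         fallback prediction of the opponent's choice,
                   invoked when the simulation budget is exhausted.
    depth:         how many levels of self-simulation remain.
    """
    if depth == 0:
        predicted = guess()                       # basic guessing rule
    else:
        # Simulate the opponent by running this very procedure,
        # one rung lower on the simulation ladder.
        predicted = decide(payoffs, guess, depth - 1)
    # Best reply to the prediction (ties broken arbitrarily).
    return max(range(len(payoffs)), key=lambda i: payoffs[i][predicted])

# Hypothetical symmetric game; the naive guess is "opponent plays 0".
payoffs = [[3, 0], [5, 1]]
print(decide(payoffs, guess=lambda: 0, depth=3))  # → 1
```

Without the depth cutoff, the recursion would never terminate: this is the "simulation of a simulation of a simulation" regress made concrete.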


But this is not so straightforward as it may seem. The evolutionary defense given above for the use of introspection as a tool in predicting the behavior of others relies on machines' having the facility to modify their own programs as they learn from experience. The machines are therefore self-correcting. However, if a machine is programmed to correct itself if the prediction of what it would do in some future circumstance fails some criterion, then the prediction will not necessarily be correct because the predicted behavior is subject to correction.29 A self-correcting machine cannot therefore predict even its own behavior with certainty.30

The nuts-and-bolts reason for this difficulty is that a machine which runs precisely its own program before finally deciding what action to take is involved in an infinite regress.31 To simulate a machine that simulates its own operation requires a simulation of a simulation which, in turn, requires a simulation of a simulation of a simulation; and so on. If the machine is not to calculate forever, a stopping rule is required to break out of this infinite sequence of nested loops. When this stopping rule calls for a halt, the machine will necessarily have to call upon a "guessing algorithm" in order to predict its own behavior. Given that a guessing-algorithm-cum-stopping-rule is required, where does it come from? In particular, what is the origin of the basic guessing rule?

Here the fact has to be faced that an answer to this question must be arbitrary to a large extent, unless the evolutionary process offered as an "explanation" of the origin of the game-playing machines is to be modeled explicitly. Even then, the answer will presumably be contingent on such historical accidents as the composition of the original population from which the current population evolved. This is not to say that nothing can be said at all.
If nothing else, the second law of thermodynamics will tell us something about the probabilities which a guessing rule should attribute to events which are neutral to evolutionary pressures.32 However, the general problem of scientific induction is beyond the scope

29. The traditional response is that a "perfect" machine's predictions will not fail the criterion. This sweeps under the carpet the question which really matters here: namely, how did the perfect machine get to be perfect?

30. Part I offered Gödel as an authority for such assertions, but any philosopher who writes on self-consciousness seems to say something of the sort. An extract to this effect from Schopenhauer is quoted at the head of Part II, but I prefer Hume's more mundane observation that he often repeatedly decides to rise from his bed but fails to do so, only later to find himself getting dressed with no clear idea of how this came about. For a viewpoint closer to that of this paper, see Scriven (1965).

31. Observe that to know the program listing of a machine is not to know the results of all calculations of which the program is capable.

32. I see the criteria which Harsanyi and Selten (1982) impose on the prior distribution from which their tracing procedure begins as a brave attempt to systemize their judgments about the extent to which arbitrariness in the choice of their equivalent of a basic guessing rule can be delimited. Much of what they say (although certainly not all) is obviously relevant, given such an interpretation.


of this paper, and no attempt will be made to explore these issues. It will simply be assumed that a machine does incorporate an algorithm which is somehow capable of converting whatever paucity of objective data is available into a preliminary prediction of what it would do if it were called upon for a decision in various circumstances. This basic guessing algorithm is employed only when the stopping rule prevents more elaborate introspection.

What now follows on stopping rules is only a little less sketchy. Given a basic guessing rule, the point of iterating simulations is to refine whatever estimate of the opponent's behavior is current. Instead of working downwards, through lower and lower levels of simulation, until the basic guessing rule is invoked, one can think of starting with the basic guessing rule and then working upwards. The machine can then be ascribed an inductive structure in which moving one rung up the simulation ladder is seen simply as a device for replacing one prediction by a more refined prediction. The stopping rule then appears as a rule of thumb for determining when convergence has been sufficiently approached,33 i.e., when the estimated (small) costs of moving one more rung up the ladder outweigh the estimated benefits of a more refined prediction. These estimates, like the basic guessing rule, will necessarily be arbitrary to some degree. Thus the stopping rule will also have arbitrary features.

It would be easy to underestimate the complexity implicit in the structure discussed so far. Recall, however, that the machines under discussion are self-correcting. This means, in particular, that the stopping-rule-cum-guessing-algorithm will be vulnerable to self-correction.
Consider first the implications in respect of the stopping rule.

A consequence of what was said earlier about self-correcting machines is that such a machine cannot "know" (i.e., have full access to) all the operating details of those aspects of its operation which it monitors with a view to possible correction.34 To this extent, a self-correcting

33. It is tempting to dispense with the stopping rule once its role in establishing the necessity for a basic guessing algorithm is over. Instead, one might go directly to the limit. It is no counter-argument that machines cannot "go directly to the limit." According to the Church/Turing thesis, if a pure mathematician has an algorithm for finding exact limits, a machine can be constructed to operate the algorithm. The proper counter-argument is quite different. It is simply that working a finite number of steps up the simulation ladder from the basic guessing rule is equivalent to working the same number of steps down the ladder to the basic guessing rule. But working an

infinite number of steps up is equivalent to nothing at all in the original structure attributed to a game-playing machine. This is not to deny that the clearer mathematics usually obtained in the limit may not be relevant, only to assert that the interpretive problems cannot be ignored, especially since there will normally be many limits to be taken and the order in which these are taken will typically be significant.

34. And there is no point in the machine's incurring the cost of monitoring an operation which it is not programmed to correct if necessary.


machine will therefore necessarily be a "riddle to itself." But it need not be assumed that the machine is "unaware" of the existence of elements of its own operation to which it does not have full access. Such ignorance may be taken account of, when the machine seeks to simulate its own behavior, by its using, not one simulating subroutine, but many alternative simulating subroutines, each of whose outputs is then weighted using an appropriate probability35 to produce a final prediction.

As far as illustrative examples are concerned, the technically easiest way of taking account of such ineradicable uncertainties requires the assumption that the machine's hardware admits a facility for accessing "black boxes" which generate random numbers.36 The interiors of these "black boxes" (or their programs) are assumed not to be available for inspection, although nothing precludes the making of statistical inferences on the basis of their output. The advantage of working with machines equipped with such randomizers37 is that it becomes possible to proceed on the assumption that all machines have an identical structure even though their behavior need not be identical. It must be remembered, however, that such randomizing "black boxes" are not an essential feature for a game-playing machine. They are a mathematical device for representing certain unavoidable areas of ignorance in a technically tractable fashion.

Figure 1 illustrates the structure of a game-playing machine x which seeks to predict an opponent y, given that a random stopping rule is utilized. The notation Gn(x) (n ≥ 1) denotes a mixed strategy for player x which is an optimal reply to the use of the mixed strategy Gn-1(y) by the opposing player y. Some unspecified tie-breaking rule is taken for granted here, but this is not a point which deserves close attention. The basic guessing rule, denoted by G0, is used with probability 1 - r0.
The nth-level refinement, Gn, is used with probability r0r1 . . . rn-1(1 - rn).

Consider next the implications of a self-correcting facility for the basic guessing rule. Observe that, in Figure 1,

H0 = (1 - r0)G0 + r0(1 - r1)G1 + . . . + r0r1 . . . rn-1(1 - rn)Gn     (1)

is supposedly an improvement on the guessing rule G0. In principle, the

35. Of course, these probabilities, along with everything else, will be subject to self-correction.

36. Note that such a modeling device does not allow an escape from the difficulties with "perfect rationality" discussed in Part I, Section 5. The machine z cannot then ensure that r is always wrong, but it can ensure that r is statistically wrong. The point here is that, although z cannot calculate r's prediction p when this is generated partly at random, z can calculate the probabilities with which the various possible predictions will be made.

37. These are not "non-deterministic automata." Mathematicians have usurped this term for another purpose (Hopcroft and Ullman, 1979).


machine should therefore be able to improve itself by replacing G0 throughout by H0. The new structure may then itself be vulnerable to improvement; and so on. Such self-correction will be worthwhile until the basic guessing rule is no longer much altered by further iterations. If G0 in Figure 1 is now assumed to be a basic guessing rule generated by such self-improving "bootstrapping," then it will itself be about as good a predictor as the method is capable of producing. A stopping rule that attaches a high probability to going beyond G0 would then be computationally inefficient. The implication is that, with a "good" basic guessing rule, the probabilities r0, r1, and so on, should be very small. But then quadratic terms in (1) will be negligible. With a "good" basic guessing rule G0, Figure 1 should therefore be replaced by the even simpler structure shown in Figure 2A, where r0 is understood to be very small.38 Of course, the simplicity of Figure 2A is only superficial, being achieved only by concealing the complexities inside the algorithm G0.

Before leaving Figures 1 and 2A, some comment on convergence issues may be worthwhile. It will be familiar from the well-known "cobweb model" that the type of bootstrap iteration procedure described here may oscillate violently if predictions for the next period are made dependent only on behavior in the current period. (See, for example, Binmore, 1983, pp. 339, 389.) On the other hand, stability is achieved if a suitable weighted average of past behaviors is used for predictive purposes. The current section provides some sort of theoretical backing for such a procedure. However, no results are offered on convergence beyond the computations embodied in the specific examples illustrated in Figures 3 and 4.

In summary, this section has argued that a game-playing machine in an eductive context needs to be seen as a complicated device in which loops are nested within loops to an indefinite degree.
Not only this,their structure will be arbitrary to some extent, and there will be aspectsof even its own operation to which a machine will not have full access.Moreover, such ignorance is ineradicable. But this is not to argue thateductive game theory is a hopeless enterprise. On the contrary, thebootstrap iteration argument offered here provides a useful supplementto the more traditional arguments employed when attention is focusedon Nash equilibria in static contests. (See Figures 3A and 4A for ex-amples.) Finally, it should be noted that, although the details of whatis pro po sed differ m arkedly, the gene ral appro ach to which the arg um entleads closely rese m bles the tracing procedure of Harsan yi an d Selten (1980,1982).

38. It will be determined by some criterion which renders the machine approximately indifferent to prolonging its self-improvement exercise and to ending it.

                      II

      I       2, -2       0,  0
              0,  0       1, -1

                 EXHIBIT 1

4. MIXED STRATEGIES

The improvement of the basic guessing rule G0 obtained by replacing G0 by the rule H0 = (1 - r0)G0 + r0G1 (see Figure 2A) will cease to be worthwhile when H0 is nearly the same as G0. (This does not necessarily imply that G0 is nearly the same as G1, because G1 - G0 = (H0 - G0)/r0 and r0 will be small for the reasons given in the preceding section.) Figure 3A illustrates the dynamics for the zero-sum game with the payoff matrix of Exhibit 1.

A point (p, q) in Figure 3A is to be identified with a basic guessing rule G0 in which G0(I) = (1 - p, p) and G0(II) = (1 - q, q). The trajectory marked with arrows indicates the path traced if an initial basic guessing rule originally located at the point (0.1, 0.1) is successively replaced by a guessing rule H0 = (1 - r0)G0 + r0G1, when r0 is vanishingly small. (For example, the optimal reply rule G1 to an initial basic guessing rule G0 corresponding to the point (p, q) located in region A of Figure 3A corresponds to the point (0, 1). Thus H0 will correspond to that point on the line segment joining (p, q) and (0, 1) which is at a distance from (p, q) equal to a fraction r0 of the total length of the line segment. Next H0 is placed in the role of G0 and the calculation is iterated. Eventually, the boundary between regions A and B is reached, whereupon the optimal reply rule G1 jumps to (1, 1) and so a new line segment has to be considered.)
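The bootstrap trajectory just described is easy to simulate. The sketch below is mine, not the paper's: the payoffs are those of Exhibit 1, while the constant step size r0, the iteration count, and the tie-breaking convention are illustrative assumptions.

```python
# Illustrative sketch of the bootstrap iteration H0 = (1 - r0)G0 + r0*G1
# for the zero-sum game of Exhibit 1.  p and q are the probabilities the
# current guessing rule G0 assigns to player I's and player II's SECOND
# pure strategies; G1 replaces each guess by the corresponding best reply.

A = [[2, 0],
     [0, 1]]          # payoffs to player I; player II receives the negation

def best_reply_I(q):
    # Player I's best second-strategy probability against the guess q.
    u1 = A[0][0] * (1 - q) + A[0][1] * q      # I's payoff from first strategy
    u2 = A[1][0] * (1 - q) + A[1][1] * q      # ... and from second strategy
    return 0.0 if u1 >= u2 else 1.0           # ties broken arbitrarily

def best_reply_II(p):
    # Player II minimizes I's payoff, so II picks the worse column for I.
    v1 = A[0][0] * (1 - p) + A[1][0] * p      # I's payoff if II plays first column
    v2 = A[0][1] * (1 - p) + A[1][1] * p
    return 0.0 if v1 <= v2 else 1.0

p, q, r0 = 0.1, 0.1, 0.001                    # initial rule at (0.1, 0.1)
for _ in range(50000):
    p, q = ((1 - r0) * p + r0 * best_reply_I(q),
            (1 - r0) * q + r0 * best_reply_II(p))

print(round(p, 2), round(q, 2))               # approaches the equilibrium (2/3, 2/3)
```

Starting from (0.1, 0.1), the simulated path first heads toward the corner (0, 1), as in region A of Figure 3A, and then spirals in toward the equilibrium.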

Such trajectories are perhaps best thought of as providing a test for a "good" basic guessing rule to pass. Thus the basic guessing rule G0 corresponding to (0, 1/2) fails the test because a refinement of G0 leads along the trajectory to something significantly different from G0. Only basic guessing rules close to the unique Nash equilibrium of the game at (2/3, 2/3) will pass the test, in this particular instance.

However, the fact that all trajectories in this game converge on its unique Nash equilibrium is not the point here. Any other result would

be surprising. What is at issue in this section is the interpretation of a Nash equilibrium when mixed strategies are involved.

In this zero-sum game, if player I randomizes between his pure strategies, using the first with probability 1/3 and the second with probability 2/3, then any strategy is an optimal response for player II, including that in which she randomizes between her pure strategies using the first with probability 1/3 and the second with probability 2/3. Since the same is true with the roles of I and II reversed, the quoted pair of mixed strategies is a Nash equilibrium for the game. This manner of presenting the idea forces the question: why should rational players randomize? They will necessarily be indifferent between all pure strategies to which the recommended mixed strategy attaches a positive probability.

The answer is, like much else in game theory, that the wrong question has been asked. The argument which led us to the unique Nash equilibrium in the zero-sum game of Figure 3A was an argument about the convergence of guessing rules. The actual choice of strategy made by a player is another matter.39 This will depend on the stopping rule which controls the bootstrap iteration by means of which the basic guessing rule is refined. If, for example, this happens to leave the final pair of predictions in region D of Figure 3A, then an optimal response by both players is to use their first pure strategy (i.e., p = q = 0). They will therefore make this choice, although the predicted return on using the first pure strategy will only be a tiny bit better than that for the second pure strategy. The point here is that they do not choose to randomize - i.e., the output from a machine will not normally designate a mixed strategy.40 But this output will be predictable by the other machine only to the extent that the guessing rule allows. This conclusion is a consequence of the observation that certain aspects of its own operation must necessarily be hidden from a self-correcting machine. This is particularly easy to understand when random stopping rules are employed, as in the illustrative example of Section 3 used to motivate the dynamics of this section. But the same conclusion necessarily applies even when attention is confined to machines whose entire operation is totally deterministic.

The supposed paradox, that game theory necessarily requires rational players to randomize, is therefore a chimera.41 When a Nash equilibrium calls for the use of mixed strategies, an eductive analysis requires that the probabilities involved be seen as reflecting only the ineradicable

39. Especially in the unlikely event that the guessing rule happens to make the player indifferent between two pure strategies. For such cases, an arbitrary tie-breaking rule was proposed in Section 3.
40. Although this is not to say that recommending a mixed strategy is forbidden.
41. Although this is not to say that randomizing may not frequently be a convenient or cost-effective means for avoiding calculation. Consider, for example, the case of a repeated two-person, zero-sum game.

uncertainties that a player will necessarily have about what the others will do. Jokes about game theory's recommending that finance ministers toss coins to decide precisely when to devalue are therefore misplaced: finance ministers can achieve exactly the same effect precisely as they always have - i.e., by using a committee of economic and financial experts.

The idea that mixed strategies can be "purified" - i.e., that nobody need consciously be randomizing in order for mixed strategies to be meaningful - has a long history. The first expression of this idea in an eductive context seems to be due to Harsanyi (1973). A more recent reference is Aumann, Katznelson, Radner, and Rosenthal (1981). What is offered in this section is therefore only a new twist on an old story. Of course, in an evolutive context, all is much easier, since a mixed strategy can then be seen simply as a summary of the behaviors current in the population as a whole.

5. LEARNING

Sections 3 and 4 were confined to the discussion of static contests. This section studies contests with some dynamic structure. As soon as time enters the picture, the problem of learning has to be faced. The traditional approach is to use Bayesian updating, often within the conceptual framework created by Harsanyi (1967, 1968) for his theory of "games of incomplete information." This section also takes these tools of analysis as basic.42 Where it differs from the traditional approach is in its enlargement of the range of phenomena about which the players may learn as the game unfolds. In particular, the players may learn about their opponents' thinking processes from observing their behavior.43 This includes not only their opponents' beliefs but also the manner in which these beliefs are constructed. A major complication is that such a view requires abandoning the principle (whether explicit or implicit) that deviations from equilibrium behavior should be taken to be uncorrelated. The examples of Part I were intended largely for the purpose of demonstrating that, although the principle may be defensible in some evolutive contexts, it is not intuitively satisfying in an eductive context.

The issues considered here are obviously very relevant to "signalling games" - i.e., games in which the actions chosen by the players may

42. The criticisms of Bayesianism in Part I, Section 6 notwithstanding. This criticism was directed against the inappropriate use of a methodology suitable only for "closed universe" problems when the actual problem is an "open universe" problem. Much of the previous discussion can be seen as an attempt to "close the universe of debate" so as to legitimize Bayesian techniques. Admittedly, however, there remains room for doubt as to the success of this attempt.
43. Indeed, in principle, they might learn things about their own thinking process by observing their own behavior.

serve to signal important information to the opponents, usually in respect of the signaller's own future intentions in the game.44 Some brief discussion of these issues appears in Section 6. At this point, I want only to stress that the approach advocated here rejects an assumption which is explicit or implicit in much of the literature: namely, that beliefs can be seen as the object of strategic choice. This seems to me to make nonsense of the fundamental Bayesian insight that preferences and beliefs can be separated. Indeed, one might go so far as to say that the purpose of studying rationality is so that beliefs can be formed without their being distorted by preferences. The approach advocated here derives from a different line of research: namely, that developed by those who have worked on the value of reputations in a game-theoretic context (e.g., Kreps and Wilson, 1982b).

In this work, it is given that the opponent may not be "rational." Instead it is assumed that a small initial probability exists that the opponent is a type of player who is "irrational" in a highly specific manner. This probability is revised as observations of the opponent's behavior become available. Matters are complicated by the fact that it will often be worthwhile for a "rational" opponent to consider aping the behavior of an "irrational" opponent in the hope of being wrongly identified. Interest centers around the precise extent to which such "hustling" is worthwhile and its role in generating behavior which is markedly different from that which occurs when the possibility of "irrationality" is ruled out altogether.

The twist on this story offered here is that, once the idea of perfect rationality has been put aside, the same conceptual framework can be employed without postulating the possible existence of players who are downright irrational. Instead, one may begin with an array of types of player, with each type representing a different specification of the factors not tied down by the assertion that a player is "rational." This allows a wider range of behavior to be reconciled with "rationality," albeit only "approximate rationality," than would otherwise be possible. Notice, however, that it is not claimed that all behavior is capable of being reconciled with "rationality." My own opinion is that a really good theory would include not only a clear picture of the types of machine to be regarded as approximately rational, but also a clear view on the nature of the defects likely to afflict such machines, so that a "minimal deviation" hypothesis is always available to "explain" off-the-equilibrium-path actions. If an evolutionary origin for the "rational" machines is envisaged, such defects can be identified with unimproving mutations. What needs to be emphasized is that, whether or not the opponent is thought to be "rational," deviations from predicted play in this framework will not, in general, be treated as conveying no information about possible future behavior.45

For simplicity, no further account will be taken of the possible existence of "irrational" types of player, although this imposes a severe restriction on the class of games to which the discussion is applicable. This allows attention to be focused on the source of the variation in the different types of rational player. Recall the argument of Section 3 which asserts that a "rational" machine will necessarily incorporate a stopping rule and a guessing algorithm. Both provide possible sources of variation among types, but it is the guessing rule which seems important here.46 Also remember the distinction made in Section 3 between subjective and objective data. The procedure proposed here is to continue to treat subjective data - i.e., data a machine can obtain by examining its own operation - as common among the players. In practice, this means that, in simulating another machine, a machine works on the hypothesis that the other machine has the same program as itself.47 However, in this section, the objective data will not be treated as part of this program but as a private input to this program. The important point is that this objective input may vary between machines and it is to this variation that differences in the basic guessing rule are to be chiefly attributed.

44. As in the "forwards induction" of Kohlberg and Mertens (1986).

The illustrative example employed in Sections 3 and 4 (see Figures 1, 2A, 3A) treats these differences in the basic guessing rule as being sufficiently small as to be negligible. This seems reasonable as a hypothesis if it is supposed that the players have a very large amount of relevant objective data, since one can then appeal to some variant of the law of large numbers. However, in Section 3, it was argued that the interesting case for an eductive analysis is that in which the amount of relevant objective data is small. In this case the differences between different machines must be expected to be significant and the machines' programs must be expected to recognize this fact.

The diagram of Figure 2B is an attempt to adapt the illustrative example used previously to take proper account of this more general situation. The machines are assumed to recognize n different possible types of machine, distinguished by the differing objective data they may have received. The objective data received by a machine of type i is assumed to determine a set of beliefs for that machine about the objective

45. One might think of the traditional trembling-hand argument as attributing deviations to random electrical fluctuations in the computer hardware. The theory envisaged above, on the other hand, would locate the source of deviations in the computer software.
46. The importance of the stopping rule is that its existence makes it necessary that there be a guessing algorithm.
47. But recall that some aspects of this program will not be available for self-examination and correction.

data (i.e., the type) of the opponent.48 The machine then employs these beliefs to generate a basic guessing rule G0 to be used in predicting its opponent's behavior. In Figure 2B, p_ji is the probability attributed by a type i machine to the event that the opponent is type j. These probabilities are taken to be part of the subjective data of a machine, since they are obtained as a by-product of the manner in which a machine processes its objective data.49 The notation G_1^j(y) represents an optimal response by a machine of type j occupying the role of player y, given that such a machine predicts the behavior of its opponent x using the guessing rule G_0^j(x). Since a machine of type i does not know the type of its opponent, it assesses these optimal responses with the weighted average

    Ḡ_1^i(y) = p_1i G_1^1(y) + p_2i G_1^2(y) + . . . + p_ni G_1^n(y)

Assuming, as in Section 3, that G_0^i is already a good basic guessing rule for a machine of type i, so that the probability r0 may be taken to be small, a self-correction by a machine of type i would be to replace G_0^i by

    H_0 = (1 - r0) G_0^i + r0 Ḡ_1^i

Figure 3B shows how this modification is reflected in the dynamics for the simple zero-sum game studied in Section 4. Only two types are considered, and the points labeled 1 and 2 represent initial basic guessing rules from which the bootstrapping process begins. Each type attaches probability 3/4 to the event that the opponent is of the same type as itself. As in the case of only one type described in Section 4, both trajectories converge on the unique Nash equilibrium. Thus, although the two different types come to the game with different experiences, introspection leads them to approximately the same final basic guessing rule.

Recall that an eductive equilibration takes place entirely inside a single player's head in a static game. The example is therefore describing the manner in which introspection can lead rational players to reliable conclusions about rational opponents even though the premises from which the opponent began may be uncertain. It is some such intuition that lies behind the classical defense of Nash equilibrium. However, this classical defense requires that the predictions each player makes about his or her opponent's play are common knowledge.

Figures 3C and 3D indicate situations with two and four types respectively. In Figure 3C, each type, rather eccentrically, attaches probability 1/10 to the event that the other player is the same type as itself. Nevertheless, convergence to the unique Nash equilibrium is still attained. To obtain an example without convergence to the Nash equilibrium, it is necessary to consider even more bizarre initial conditions, as in Figure 3D. Here type 4 is certain that her opponent is type 3, type 3 that his opponent is type 2, type 2 that her opponent is type 1, and type 1 that his opponent is type 4.

                      II

      I       3, 2        0, 0
              0, 0        2, 3

                 EXHIBIT 2

So far the approach of Section 3 for static games has been generalized to take account of different types of "rational" player. Obviously, what has been said has relevance to the problem of selecting equilibria in static games when many rival equilibria exist. Harsanyi and Selten (1980, 1982) make this problem the chief motivation for studying their closely related tracing procedure. However, I am reluctant to follow Harsanyi and Selten in making a sequence of heroic assumptions from which to proceed,50 and hence this issue will be put aside in favor of a study of the implications of the approach for dynamic games. This study is taken up shortly. But, before leaving static games altogether, it will be instructive to look briefly at the following version of the "Battle of the Sexes" in order to comment on the relevance in this context of Aumann's (1974, 1987) notion of a correlated equilibrium. For a discussion of the traditional viewpoint in this connection, see, for example, Binmore and Brandenburger (1987).

48. This is where Harsanyi's (1967, 1968) incomplete information framework is borrowed. Note the elegance with which problems of "beliefs about beliefs" are treated.
49. This is all that will be offered in defense of the "common knowledge" requirements of Harsanyi's theory.
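The two-type dynamics of Figure 3B discussed above can be sketched in the same style as before. The belief weights (3/4 self-belief) are from the text; the starting rules, step size, and iteration count are illustrative assumptions of mine, since only the figure fixes the actual starting points 1 and 2.

```python
# Two types play the zero-sum game of Exhibit 1 (Section 4).  guess[i] is
# type i's guessing rule (p, q): the probabilities it attaches to player I
# and player II using their second pure strategies.  Each type attaches
# probability 3/4 to the opponent being its own type, as in Figure 3B,
# and refines its rule toward a best reply averaged over the opponent's
# possible types.

A = [[2, 0], [0, 1]]                  # payoffs to player I (Exhibit 1)

def br_I(q):                          # player I's best second-strategy probability
    u1 = A[0][0] * (1 - q) + A[0][1] * q
    u2 = A[1][0] * (1 - q) + A[1][1] * q
    return 0.0 if u1 >= u2 else 1.0

def br_II(p):                         # player II minimizes player I's payoff
    v1 = A[0][0] * (1 - p) + A[1][0] * p
    v2 = A[0][1] * (1 - p) + A[1][1] * p
    return 0.0 if v1 <= v2 else 1.0

P = [[0.75, 0.25],                    # P[j][i] = probability a type i machine
     [0.25, 0.75]]                    # attaches to its opponent being type j

guess = [[0.1, 0.3], [0.9, 0.6]]      # illustrative initial guessing rules
r0 = 0.001
for _ in range(100000):
    reply = [(br_I(q), br_II(p)) for p, q in guess]
    guess = [[(1 - r0) * guess[i][k]
              + r0 * sum(P[j][i] * reply[j][k] for j in range(2))
              for k in range(2)] for i in range(2)]

# Despite different starting experiences, both types' rules end up near the
# unique Nash equilibrium (2/3, 2/3).
print([[round(x, 2) for x in g] for g in guess])
```

The contraction works in two stages: the coupling through the belief weights pulls the two types' rules together, after which the pooled rule follows essentially the single-type dynamics of Section 4.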

The game matrix for the "Battle of the Sexes" is shown in Exhibit 2. The game has three Nash equilibria. In Figure 4A, which is analogous to Figure 3A, these are located at the points (0, 0), (1, 1) and (2/5, 3/5). The first of these, for example, corresponds to each player's using his or her first pure strategy. If only one type of player is admitted (i.e., both players begin with the same objective data), then introspection will necessarily lead both players to nearly the same final basic guessing rule, and this will be one of the three Nash equilibria. In Figure 4A, the Nash

50. Among other things, I am unhappy about artificial devices introduced to secure convergence. But how are players to evaluate divergent processes?

equilibrium to which they are led is (0, 0). If their objective data had led them to place their initial basic guessing rule at any point (p, q) with p + q < 1, the result would have been the same. With p + q > 1, the resulting Nash equilibrium would have been (1, 1). Figure 4B shows that the Nash equilibrium to which the players are led when the initial basic guessing rule satisfies p + q = 1 is (2/5, 3/5). Harsanyi and Selten's "tracing procedure" selects the last of these as the "solution of the game." Their defense of this result depends on locating their equivalent of an initial basic guessing rule suitably. One might, for example, argue that, since the game exhibits a certain symmetry, the initial basic guessing rule should exhibit the same type of symmetry. This would locate it on the line p + q = 1 and hence one would be led to select the same Nash equilibrium as in the Harsanyi/Selten theory. However, I find such an argument too weak to be acceptable. This is partly because I see no particular reason for insisting on a priori symmetry requirements in the players' objective data and partly because, even for situations in which such symmetries might be defensible, the slightest perturbation of (p, q) from the line p + q = 1 destroys the conclusion entirely.

It is, indeed, far from clear that attention should be confined only to Nash equilibria in this context. The objective data that types receive will, in general, be correlated, since it is obtained from observing the behavior of the same population of players. It is therefore not surprising that, if the process of refining guessing rules converges, then it does not necessarily converge to a Nash equilibrium of the game, although it always converges to a correlated equilibrium of the game. Correlated equilibria are relevant whenever both players are partially informed about the outcome of some exogenous random event and condition their choice of strategy on this information. The requirement for equilibrium is then the same as that proposed by Nash: no player must have an incentive to deviate from his or her choice, provided that the other player does not deviate. A Nash equilibrium is a correlated equilibrium for which each player's information about the exogenous event is independent of the other player's information. Otherwise a correlated equilibrium may differ from a Nash equilibrium.51

As an example, consider the case when the exogenous random event consists of whatever prediction is written in a game theory book about how the "Battle of the Sexes" will be played by rational individuals in the same circumstances as the two players currently about to play the game. The event is random in the sense that neither player is assumed to have read the book, and so what it actually says is unknown to them.

51. Although Nash equilibrium remains the fundamental concept. A correlated equilibrium is a Nash equilibrium of an expanded game for which the first "move" consists of the transmission of the correlated information to the players.
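A minimal check of this equilibrium requirement for the "Battle of the Sexes" (the payoffs follow Exhibit 2; the public signal and its distribution are illustrative assumptions of mine, not the paper's example): a public signal that selects one of the two pure equilibria with equal probability yields a correlated equilibrium, and its expected payoffs match no Nash equilibrium of the game.

```python
# A public signal selects one of the two pure equilibria of the Battle of
# the Sexes with equal probability; both players observe the signal and
# play the selected profile.  Because the signal is public, the correlated
# equilibrium condition reduces to: after each signal, neither player can
# gain by deviating unilaterally from the recommended cell.

U = [[(3, 2), (0, 0)],          # (payoff to I, payoff to II) per cell,
     [(0, 0), (2, 3)]]          # payoff values taken from Exhibit 2

signal = {(0, 0): 0.5, (1, 1): 0.5}   # distribution over recommended cells

def obeys(dist):
    # Each recommended pure strategy must be a best reply to the
    # opponent's recommended pure strategy.
    for (i, j), prob in dist.items():
        if prob == 0:
            continue
        if any(U[k][j][0] > U[i][j][0] for k in range(2)):
            return False                 # player I prefers to deviate
        if any(U[i][k][1] > U[i][j][1] for k in range(2)):
            return False                 # player II prefers to deviate
    return True

print(obeys(signal))                     # True: a correlated equilibrium
expected = [sum(p * U[i][j][n] for (i, j), p in signal.items()) for n in (0, 1)]
print(expected)                          # [2.5, 2.5]
```

The expected payoff pair (2.5, 2.5) is attainable at none of the game's three Nash equilibria, whose payoffs are (3, 2), (2, 3), and (1.2, 1.2), which is exactly the sense in which correlation enlarges the equilibrium set.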

But each player is told what th