Dynamic Incentive Mechanisms

David C. Parkes, Ruggiero Cavallo, Florin Constantin, and Satinder Singh

Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

Much of AI is concerned with the design of intelligent agents. A complementary challenge is to understand how to design "rules of encounter" (Rosenschein and Zlotkin 1994) by which to promote simple, robust, and beneficial interactions between multiple intelligent agents. This is a natural development, as AI is increasingly used for automated decision making in real-world settings. As we extend the ideas of mechanism design from economic theory, the mechanisms (or rules) become algorithmic and many new challenges surface. Starting with a short background on mechanism design theory, the aim of this paper is to provide a nontechnical exposition of recent results on dynamic incentive mechanisms, which provide rules for the coordination of agents in sequential decision problems. The framework of dynamic mechanism design embraces coordinated decision making both in the context of uncertainty about the world external to an agent and also in regard to the dynamics of agent preferences. In addition to tracing some recent developments, we point to ongoing research challenges.

How can we design intelligent protocols to coordinate actions, allocate resources, and make decisions in environments with multiple rational agents, each seeking to maximize individual utility? This problem of "inverse game theory," in which we design a game form to provide structure to interactions between rational agents, has sparked interest within artificial intelligence since the early 1990s, when the seminal work of Rosenschein and Zlotkin (1994) and Ephrati and Rosenschein (1991) introduced the core themes of mechanism design to the AI community.

Consider this an inverse artificial intelligence, perhaps: the problem of designing rules to govern the interaction between agents in order to promote desirable outcomes for multiagent environments. The rules are themselves typically instantiated algorithmically, with the computational procedure by which the rules of interaction are enacted forming the coordination mechanism, and the object of design and analysis.

The design of a suitable mechanism can require the integration of multiple methods from AI, such as those of preference elicitation, optimization, and machine learning, in addition to an explicit consideration of agent incentives. An interesting theme that emerges is that by careful design it is possible to simplify the reasoning problem faced by agents. Rather than require agents to struggle with complex decision problems, we are in the unusual position of being able to design simple decision environments.

Mechanism design theory was developed within mathematical economics as a way to think about what "the market" (viewed somewhat like an abstract computer) can achieve, at least in principle, as a method for coordinating the activities of rational agents. During the debates of the 1960s and 1970s about centralized command-and-control versus market economies, Hurwicz (1973) developed the formal framework of mechanism design to address this fundamental question. The basic setup considers a system of rational agents and an outcome space, with each agent holding private information about its type, this type defining the agent's preferences over different outcomes. Each agent makes a claim about its type, and the mechanism receives these claims and selects an outcome; for example, an allocation of resources, an assignment of tasks, or a decision about a public project. Being rational agents, the basic modeling assumption is that an agent will seek to make a claim so that the outcome selected maximizes its utility given its beliefs about the claims made by other agents, that is, in a game-theoretic equilibrium. Jackson (2000) provides an accessible recent survey of economic mechanism design.

Mechanism design theory typically insists on designs that enjoy the special property of incentive compatibility: namely, it should be in every agent's own best interest to be truthful in reporting its type. Especially when achieved in a dominant-strategy equilibrium (so that truthfulness is best whatever the claims of other agents), this obviates the need for strategic reasoning and simplifies an agent's decision problem.

If being able to achieve incentive compatibility sounds a bit too good to be true, it often is; in addition to positive results there are also impossibility results that identify properties on outcomes that cannot be achieved under any incentive mechanism. Just as computer science uses complexity considerations to divide problems into tractable and intractable, mechanism design theory classifies outcome rules into those that are possible and impossible, working under its own lens of incentive constraints. Where mechanism design gets really interesting within AI is when the incentive constraints are in tension with the computational constraints, but we're getting a bit ahead of ourselves.

Two Simple Examples

For a first example, we can think about an auction for a last-minute ticket for a seat at an ice hockey game at the Winter Olympics. Rather than the usual auction concept (the highest bidder wins and pays his or her bid price), we can consider instead a second-price sealed-bid auction (Vickrey 1961) (see figure 1). There are three agents (A1, A2, and A3), each with a simple type that is just a single number representing its value for the ticket. For example, A2 is willing to pay up to $1000, with its utility interpreted as v – p for value v = 1000 and payment p.

Figure 1. A Single-Item Allocation Problem.

Agents A1, A2, and A3 have values $900, $1000, and $400. A2 wins the hockey ticket and makes a payment of $900, for any bid above $900.

In a second-price auction, A2 would win, but pay $900 rather than its bid price of $1000.

Collecting reports of a single number from each agent, allocating the item to the agent with the highest report (breaking ties at random), and collecting as payment the second-highest report defines the allocation mechanism. This mechanism is incentive compatible in the strong sense that truthful bidding is a dominant-strategy equilibrium. A2 need not worry about bidding less than $1000; its bid merely describes the most it might pay, and its actual payment is the smallest amount that it could bid and still win.1
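To make the rule concrete, here is a minimal Python sketch of the second-price auction; the function and agent names are ours, for illustration only.

# A minimal sketch of a second-price (Vickrey) sealed-bid auction.
# The highest bidder wins and pays the second-highest report.
import random

def second_price_auction(bids):
    # bids: dict mapping agent id -> reported value
    top = max(bids.values())
    winner = random.choice([a for a, b in bids.items() if b == top])  # random tie-break
    rest = [b for a, b in bids.items() if a != winner]
    payment = max(rest) if rest else 0
    return winner, payment

# The hockey-ticket example: A2 wins and pays $900.
print(second_price_auction({"A1": 900, "A2": 1000, "A3": 400}))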

For a second example, consider a university campus with a central mall along which all students walk. The problem is to determine where to build a glass building that will house the skeleton of a 26-meter blue whale that washed up on Prince Edward Island in 1987.2 Let us suppose that every student has a preferred location ℓ* along the mall to locate the whale. Moreover, we assume single-peaked preferences, such that for two locations ℓ and ℓ′, both on the same side of a student's most preferred location, ℓ′ is less preferred than ℓ whenever ℓ′ is further from ℓ* than ℓ.

A truthful mechanism can locate the whale at the median of all the reported peaks (with a random tie-breaking step if there is an even number of agents) (Moulin 1980). See figure 2. For truthful reports of preference peaks, that is, (100, 200, 400), A2's report forms the median and the whale is located at 200 meters into the mall. No agent can improve the outcome in its favor. For example, A1 only changes the location when its report is greater than 200, but then the position moves away from its most preferred location of 100.


Figure 2. Single-Peaked Preferences and the Median Selection Mechanism.

Agents A1, A2, and A3 have preferred locations at 100, 200, and 400. The whale exhibit is located at the median report of 200.

It is a fun exercise to think about how the mean rule, in comparison, fails to be truthful.3
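As a further illustration, here is a minimal sketch of the median rule, including the random tie-breaking step for an even number of agents; the helper name is ours.

# A sketch of the median selection rule for single-peaked preferences.
import random

def median_rule(peaks):
    s = sorted(peaks)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]
    return random.choice([s[n // 2 - 1], s[n // 2]])  # random tie-break for even n

print(median_rule([100, 200, 400]))  # locates the whale at 200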

Mechanism Design and AI

These two examples are representative of problems for which the microeconomic theory of mechanism design has made great advances. In deriving clean, closed-form results, it is common to adopt a model in which the private information of an agent is a single number, perhaps a value or a location. A typical goal is to characterize the family of incentive-compatible mechanisms for a problem, in order to identify the mechanism that is optimized for a particular design criterion, perhaps social welfare, revenue, or some notion of fairness. Moreover, the examples are representative in that they are static problems: the set of agents is fixed and a decision is to be made in a single time period.

Contrast these examples, then, with the kinds of complex decision problems in which AI is typically interested: environments in which actions have uncertain effects, are taken across multiple time periods, and perhaps with the need to learn about the value and effect of different actions. Moreover, we are increasingly interested in multiagent environments, including those in which there is self-interest: e-commerce, crowdsourcing, multirobot systems, sensor networks, smart grids, scheduling, and so forth.

The research challenge that we face is to adapt the methods of mechanism design to more complex problems, addressing computational challenges as they arise and extending characterization results as required, in order to develop a theory and practice of computational mechanism design for dynamic, complex, multiagent environments. It seems to us that progress will require relaxing some aspects of mechanism design theory, for example substituting a kind of approximate incentive compatibility (or models of nonequilibrium behavior) and being willing to adopt heuristic notions of optimality in return for increasing mechanism scale and mechanism reach.

Outline

In what follows we will develop some intuitions for incentive mechanisms that are compatible with two different kinds of dynamics. There is external uncertainty, in which the dynamics are outside the boundary of individual agents and occur because of agent arrivals and departures and other uncertain events in the environment. In addition there is internal uncertainty, in which the dynamics are due to learning and information acquisition within the boundary of an individual agent, and where it is the private information of an agent that is inherently dynamic.

External Uncertainty

By external uncertainty, we intend to describe a sequential decision problem in which the uncertain events occur in the environment, external to an agent. The dynamics may include the arrival and departure of agents with respect to the sphere of influence of a mechanism's outcome space, as well as changes to the outcomes that are available to a mechanism. Crucially, by insisting on external uncertainty we require that an agent that arrives has a fixed type, and is able to report this type, truthfully if it chooses, to the mechanism upon its arrival.4

A Dynamic Unit-Demand Auction

To illustrate this, we return to the problem of allocating hockey tickets, and now ask what would happen if one ticket is sold on each of two days, with agents arriving and departing in different periods.

There is one ticket available for sale on Monday and one available for sale on Tuesday. Both tickets are for the Wednesday game. The type of an agent defines an arrival, departure, and value. The arrival period is the first period in which an agent is able to report its type. The departure period is the final period in which it has value for winning a ticket. The arrival-departure interval denotes the set of periods in which an agent has value for an allocation (see figure 3). In the example, the types are ($900, 1, 2), ($1000, 1, 2), and ($400, 2, 2), so that A1 and A2 are patient, with value for a ticket on either of days 1 or 2, but with A3 impatient, arriving on day 2 and requiring a ticket allocation on the day of its arrival. The setting is unit-demand because each agent requires only a single ticket.

We could run a sequence of second-price auctions, with A2 winning on Monday for $900, dropping out, and A1 winning on Tuesday for $400. But this would not be truthful, and we should expect that agents would try to misreport to improve the outcome in their favor. A2 can deviate and bid $500 and win on Tuesday for $400, or just wait until Tuesday to bid.

A simple variant provides a dynamic and truthful auction. We adopt the same greedy decision policy, committing the item for sale in a given period to the unallocated agent with the highest value among those present. The item is held by the mechanism until the reported departure time of the agent. But rather than collect the second price, the payment is set to the lowest amount the agent could have bid and still been allocated in some period (not necessarily the same period) within its arrival-departure interval (Hajiaghayi et al. 2005). In the example, A2 is allocated a ticket on Monday, which is released to the agent on Tuesday. A1 is allocated on Tuesday. For payments, both agents pay $400 because this is the smallest amount each could bid (leaving all other bids unchanged) and still be allocated a ticket.

The key to understanding the truthfulness of the dynamic auction is that the allocation policy is monotonic. An agent that loses with a bid v, arrival a, and departure d also loses for all bids with v′ ≤ v, a′ ≥ a, and d′ ≤ d, fixing the bids of other agents. It is easy to see that the greedy allocation rule is monotonic. To see why monotonicity provides truthfulness, first consider any (a′, d′) report and some value report v′ ≠ v. The payment p for any winning bid is the critical value at which an agent first wins in some period. This is the same for all winning bids, and therefore if v ≥ p the agent wins and has no useful value misreport. If v < p then the agent loses, and continues to lose for all v′ except v′ ≥ p, but then its payment p is greater than its true value.

Knowing that value v will be truthfully reported for all arrival-departure misreports, we can now consider temporal manipulations. For this, we assume that misreports a′ < a are not possible, because the arrival models the time period when an agent first realizes his or her demand, or first discovers the mechanism. A misreport to a later departure with d′ > d is never useful because the mechanism does not release an allocated ticket to an agent until d′, when the agent would have no value. Finally, deviations with a′ > a and d′ < d can only increase the payment made by a winning agent, by monotonicity (since the critical value must increase), and will have no effect on the outcome for a losing agent.
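The following sketch puts the greedy allocation rule and the critical-value payment together for the worked example. The representation of types, the helper names, and the tie-breaking convention are ours, so this is an illustration in the spirit of the mechanism rather than a full implementation.

# A sketch of the truthful dynamic unit-demand auction: greedy allocation
# (highest present, unallocated value wins each period) and a payment equal
# to the smallest value at which the agent would still win in some period.

def greedy_allocation(types, horizon, favored=None):
    # types: dict id -> (value, arrival, departure); one item sold per period
    allocated, schedule = set(), {}
    for t in range(1, horizon + 1):
        present = [(v, a == favored, a) for a, (v, arr, dep) in types.items()
                   if arr <= t <= dep and a not in allocated]
        if present:
            _, _, winner = max(present)   # highest value wins; ties favor `favored`
            allocated.add(winner)
            schedule[t] = winner
    return schedule

def critical_payment(agent, types, horizon):
    # smallest candidate value at which `agent` is still allocated in some period
    value, arr, dep = types[agent]
    candidates = sorted({0} | {v for a, (v, _, _) in types.items() if a != agent})
    for c in candidates:
        trial = dict(types)
        trial[agent] = (c, arr, dep)
        if agent in greedy_allocation(trial, horizon, favored=agent).values():
            return c
    return None

types = {"A1": (900, 1, 2), "A2": (1000, 1, 2), "A3": (400, 2, 2)}
print(greedy_allocation(types, horizon=2))                                 # {1: 'A2', 2: 'A1'}
print(critical_payment("A1", types, 2), critical_payment("A2", types, 2))  # 400 400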

But what about more general problems? Can we design incentive mechanisms for uncertain environments with a dynamic agent population, and where agents have general valuation functions on sequences of actions by the mechanism? In fact, this is possible, through a different generalization of the second-price auction to dynamic environments.

A Static Variation

We first review the Vickrey-Clarke-Groves (VCG) mechanism for static problems. For this, assume that the bids (($900, 1, 2), ($1000, 1, 2), ($400, 2, 2)) can all be made on Monday (even for agent A3). The complete allocation and payments can now be determined in a single period. In the VCG mechanism, the tickets are allocated to maximize reported value, that is, to A1 and A2. For the payment, we determine the externality imposed by an agent through its presence. This is (Va – Vb), where Va is the total reported value to other agents from the optimal action that would be taken if the agent was absent, and Vb is the total reported value to other agents from the action taken given the agent's presence.

In the example, for A1 this is payment p1 = Va – Vb = 1400 – 1000 = 400, and for A2 this is p2 = Va – Vb = 1300 – 900 = 400. In both cases, this is the cost imposed on A3 by the presence of A1 or A2, which is $400 because A3 would be allocated if either A1 or A2 was absent. For an incentive analysis, consider agent A1. The utility to A1 is v · 𝟙(A1 wins) – p1 = v · 𝟙(A1 wins) – (Va – Vb) = v · 𝟙(A1 wins) + Vb – Va, where the indicator 𝟙(E) is 1 if event E is true and 0 otherwise. Now, Va is independent of A1's bid and so can be ignored for an incentive analysis. We are left with A1's utility depending on v · 𝟙(A1 wins) + Vb, where v is its true value. A1 affects this quantity through the effect of its bid on the allocation, and thus whether or not it wins and the value to other agents. The VCG mechanism selects an allocation that maximizes total reported value, and thus maximizes A1's true value plus the reported value of the other agents when A1 is truthful.5

A misreport is never useful for an agent and canprovide less utility.
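A short sketch of the VCG computation for this static variation (two identical tickets, unit demand) follows; the helper names are ours.

# A sketch of the VCG mechanism for k identical items with unit-demand bidders.
# Each winner pays the externality (Va - Vb) it imposes on the other agents.

def best_value(values, k):
    winners = sorted(values, key=values.get, reverse=True)[:k]
    return sum(values[a] for a in winners), set(winners)

def vcg(values, k):
    total, winners = best_value(values, k)
    payments = {}
    for i in winners:
        others = {a: v for a, v in values.items() if a != i}
        va, _ = best_value(others, k)    # value to others if i were absent
        vb = total - values[i]           # value to others given i's presence
        payments[i] = va - vb
    return winners, payments

print(vcg({"A1": 900, "A2": 1000, "A3": 400}, k=2))
# ({'A1', 'A2'}, {'A1': 400, 'A2': 400})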


Figure 3. A Dynamic Auction for Tickets to a Wednesday Hockey Game.

Agents A1 and A2 have value $900 and $1000, respectively, for an allocation on Monday or Tuesday. A3 arrives on Tuesday, and has value $400 for a ticket allocated that day.

The Online VCG Mechanism

A generalization of the VCG mechanism to dynamic environments is provided by the online VCG mechanism (Parkes and Singh 2003). Payments are collected so that each agent's expected payment is exactly the expected externality imposed by the agent on other agents upon its arrival. The expected externality is the difference between the total expected discounted value to the other agents under the optimal policy without agent i, and the total expected discounted value to the other agents under the optimal policy with agent i.6 For this, the mechanism must have a correct probabilistic model of the environment, including a model of the agent arrival and departure process.

The online VCG mechanism aligns the incentives of agents with the social objective of following a decision policy that maximizes the expected total discounted value to all participants. The proof of incentive compatibility establishes that each agent's utility is aligned with the total expected discounted value of the entire system. The mechanism's incentive properties extend to any problem in which each agent's value, conditioned on a sequence of actions by the mechanism, is independent of the private type of other agents.

For a simple example, suppose that the types of A1 and A2 are unchanged, while a probabilistic model states that A3 will arrive on Tuesday with a value that is uniform on [$300, $600]. Thus, the expected externality that A2 imposes on A3 is $450, which is the mean of this value distribution. In making this payment, A2 cannot do better in expectation by misreporting its type, as long as other agents in future periods play the truthful equilibrium and the probabilistic model of the center is correct.
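As a sketch, this expected externality can also be estimated by sampling from the probabilistic model; the $450 figure is just the mean of the uniform distribution, and for simplicity the sketch ignores discounting over the two days.

# A sketch of a sample-based estimate of A2's expected externality:
# with A2 absent, A1 and A3 are both allocated; with A2 present, only A1 is.
import random

def expected_externality_A2(samples=100_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v3 = rng.uniform(300, 600)        # A3's value on Tuesday
        total += (900 + v3) - 900         # others' value without A2 minus with A2
    return total / samples

print(round(expected_externality_A2(), 1))   # approximately 450.0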

The kind of incentive compatibility achieved by the online VCG mechanism is somewhat weaker than the dominant-strategy equilibrium achieved in the static VCG mechanism. Notice, for example, that if A3 always bids $300 then A2 can reduce its payment by delaying its arrival until period 2. Rather, truthful reporting is an agent's best response in expectation, as long as the probabilistic model of the mechanism is correct and agents in the current and future periods are truthful.7

A Challenge Problem: Dynamic Combinatorial Auctions

In a combinatorial auction (CA), agent valuations can express arbitrary relationships between items, such as substitutes ("I want a ticket for one of the following hockey games") and complements ("I want a ticket for two games involving USA"). See Cramton, Shoham, and Steinberg (2006) for a summary of recent advances for static CAs. In a dynamic CA, we can allow for an uncertain supply of distinct goods, agent arrival and departure, and agents' valuations on different sequences of allocations of goods.

The online VCG mechanism is well defined, and provides incentive compatibility, for dynamic CAs. On the other hand, there remain significant computational challenges in getting these dynamic mechanisms to truly play out in realistic, multiagent environments, and here there is plenty of opportunity for innovative computational advances. For example, the following issues loom large:

Bidding languages and preference elicitation. This is an issue well developed for static CAs but largely unexplored for dynamic CAs. One direction is to develop concise representations of valuation functions on sequences of allocation decisions. Another direction is to develop methods for preference elicitation, so that only as much information as is required to determine an optimal decision in the current period is elicited.

Winner determination and payment computation. The winner-determination and payment-computation problem has been extensively studied in static CAs, with tractable special cases identified, and approximation schemes and scalable, heuristic algorithms developed (Cramton, Shoham, and Steinberg 2006). Similarly, we require significant progress on winner determination in dynamic CAs, where the decision problem is online and one of stochastic optimization. Given approximations, incentive considerations will once again come into play.

Learning. The incentive compatibility of the dynamic VCG mechanism relies, in part, on having an exact probabilistic model of the agent arrival and departure process. We must develop techniques to provide incentive compatibility (perhaps approximately) along the path of learning, in order to enable the deployment of dynamic mechanisms in unknown environments.

We are, in a sense, in the place that static CAs were around a decade ago, when there was a concerted effort to provide a solid computational footing for CAs, inspired in part by the push by the U.S. Federal Communications Commission to design CAs for the allocation of wireless spectrum. One domain that seems compelling, in working to develop a computational grounding for dynamic CAs, is crowdsourcing, where human and computational resources are coordinated to solve challenging problems (Shahaf and Horvitz 2010; von Ahn and Dabbish 2008). A second domain of interest is smart grids, in which renewable energy sources (necessarily more bursty than traditional power sources) are dynamically matched through dynamic pricing against the shifting demand of users (Vytelingum et al. 2010).


Heuristic Mechanism Design

Within microeconomic theory, a high premium is placed on developing analytic results that exactly characterize the optimal mechanism rule within the space of all possible mechanism rules. But this is typically not possible in many of the complex environments in which AI researchers are interested, because it would imply that we can derive an optimal algorithm in a domain. This is often out of reach. There is even debate about what optimality entails in the context of bounded computational resources, and still greater gaps in understanding about how to best use bounded resources (Russell and Wefald 1991; Russell, Subramanian, and Parr 1993). Rather, it is more typical to approach difficult computational problems in AI through a combination of inspiration and perspiration, with creativity and experimentation and the weaving together of different methods.

From this viewpoint, we can ask what it would mean to have a heuristic approach to mechanism design. One idea is to insist on provable incentive properties but punt on provable optimality properties. In place of optimality, we might adopt as a gold standard by which to evaluate a mechanism the performance of a state-of-the-art, likely heuristic, algorithm for the problem with cooperative rather than self-interested agents (Parkes 2009). In this sense, a heuristic approach to mechanism design is successful when the empirical performance of a designed mechanism is good in comparison with the performance of a gold-standard algorithm that would be adopted in a cooperative system.

To make this concrete, we can return again to our running example and suppose now that our hockey enthusiasts have friends that also like hockey, and want to purchase multiple tickets. An agent's type is now a value for q ≥ 1 tickets, and this is an all-or-nothing demand, with no value for receiving q′ < q tickets and the same value for more than q tickets. Optimal sequential policies are not available for this problem with current computational methods. The reason is the curse of dimensionality resulting from the need to include present, but unallocated, agents in the state of the planning problem.8 A gold-standard, but heuristic, algorithm is provided by the methods of online stochastic combinatorial optimization (OSCO) (Hentenryck and Bent 2006). Crucially, we can model the future realization of demand as independent of past allocation decisions, conditioned on the past realization of demand. This "uncertainty independence" property permits high-quality decision making through scalable, sample-trajectory methods.

But these OSCO algorithms cannot be directly combined with payments to achieve incentive compatibility. First, a dynamic-VCG approach fails to be incentive compatible because an agent may seek to misreport to correct for the approximation of the algorithm.9 Moreover, the heuristic policies can fail to be monotonic, which would otherwise allow for suitable payments to provide incentive compatibility (Parkes and Duong 2007). For a simple example, suppose that three tickets for the Wednesday game are for sale, and that each can be sold on either Monday or Tuesday. There are three bidders, with types ($50, 1, 1, q = 2), ($90, 1, 2, q = 1), and ($200, 2, 2, q = 1) for A1, A2, and A3, where A3 arrives with a probability in (0, 0.25). The optimal policy (which we should expect from OSCO in this simple example) will allocate to A1 on Monday and then to A2 or A3 on Tuesday, depending on whether or not A3 arrives. But A2 could request q = 2 instead and report type ($90, 1, 2, q = 2). In this case, the optimal policy will not allocate any tickets on Monday, and will then allocate to A2 or to {A2, A3} on Tuesday, depending on whether or not A3 arrives. This fails a required generalization of monotonicity, which insists on improving outcomes for higher value, earlier arrival, later departure, or smaller quantity. In this case, A2 asks for an additional ticket and in one scenario (when A3 arrives) goes from losing to winning.

One approach to address this problem is to perturb the decisions of an OSCO algorithm in order to achieve monotonicity. For this, a self-correcting process of output-ironing is adopted (Parkes and Duong 2007; Constantin and Parkes 2009).10 A decision policy is automatically modified to provide monotonicity by identifying and canceling any allocation decision for which some higher type would not provably be allocated by the algorithm (see figure 4 for a simple one-dimensional example). For example, the allocation decision for A2 ($90, 1, 2, q = 2) in the event A3 arrives would be canceled by output-ironing, since it would be observed that a higher type ($90, 1, 2, q = 1) would not be allocated. The computational challenge is to enable tractable sensitivity analysis, so that monotonicity can be verified. The procedure also requires that the original algorithm is almost monotonic, so that the performance of the heuristic mechanism remains close to that of the target algorithm.
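As a sketch of the idea in the one-dimensional setting of figure 4, where a type is a single value and a higher type simply means a higher value, an allocation decision is canceled unless every higher test value would also win; we assume, as the figure suggests, that the example rule wins again for values above $900.

# A sketch of one-dimensional output-ironing over a grid of test values.

def ironed(allocate, values):
    # allocate: value -> bool; values: ascending grid of test values
    decisions = {}
    for i, v in enumerate(values):
        wins = allocate(v)
        if wins and any(not allocate(w) for w in values[i + 1:]):
            wins = False                    # cancel: some higher value would lose
        decisions[v] = wins
    return decisions

rule = lambda v: 400 <= v <= 600 or v > 900   # nonmonotonic, as in figure 4
grid = [300, 400, 500, 600, 700, 800, 900, 1000]
print(ironed(rule, grid))
# wins at 400-600 are canceled because 700-900 would lose; the win at 1000 is kept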

Internal Uncertainty

By internal uncertainty, we describe a sequential decision problem in which the uncertain events occur within the scope of an individual agent's view of the world. The dynamics are those of information acquisition, learning, and updates to the local goals or preferences of an agent, all of which trigger changes to an agent's preferences.

To model this we adopt the idea of a dynamic type: the information, local to an agent and private, can change from period to period and in a way that depends on the actions of the mechanism. For this reason, we will need incentive compatibility constraints to hold for an agent in every period, so that a mechanism continually provides incentives to share private type information with the mechanism. In contrast, for external uncertainty, in which an agent's type is static, it is sufficient to align incentives only until the period in which an agent makes a claim about its type.11

We will see an analog to the online VCG mechanism, in which payments are defined so that an agent's expected total payment forward from every period is the expected externality imposed by the agent on the other agents. With external uncertainty, this property on payments needs to hold only upon an agent's arrival. For internal uncertainty, this property must hold in every period.

Dynamic Auctions with Learning Agents

By way of illustration, suppose that our group of hockey enthusiasts are not sure exactly how much they enjoy the sport, and learn new information about their value whenever allocated a ticket. Each time someone receives a ticket to attend a game, he or she receives an independent sample of the value for watching hockey games. The mechanism design problem is to coordinate the learning process, so that a ticket is allocated in each period to maximize the expected discounted value, given that there is uncertainty about each person's true value. In particular, it might be optimal to allocate a ticket to someone other than the person with the highest current expected value, to allow learning.

Consider figure 5. This Markov chain illustrates the Bayesian learning process for a single agent. We assume that upon attending a game, an agent will either like the game (value = 1) or dislike the game (value = 0). An agent's value is sampled from a stationary distribution, with probability θ ∈ [0, 1] of liking a game. In any period t, an agent's current internal belief state is represented by the counts (N_1^t, N_0^t) of games liked and disliked. This agent's initial belief state is (1, 2), so that its subjective belief is 1/3 that it will like the next game, and 2/3 that it will dislike the next game. The Markov chain captures Bayesian learning. In the event that an agent likes the game, the observation is a "1" and it transitions to belief state (2, 2).12 From this state, its subjective belief that it will like the next game is 1/2, and so forth. The dynamic type of an agent is its current belief state.
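A sketch of this belief-update process follows; the counts play the role of the parameters of a Beta prior over the unknown probability θ (see note 12).

# A sketch of the Bayesian learning chain of figure 5: the belief state is the
# pair of counts (N1, N0) of games liked and disliked so far.

def prob_like(belief):
    n1, n0 = belief
    return n1 / (n1 + n0)          # subjective probability of liking the next game

def update(belief, observation):
    n1, n0 = belief
    return (n1 + 1, n0) if observation == 1 else (n1, n0 + 1)

belief = (1, 2)                    # initial state: probability 1/3 of liking a game
belief = update(belief, 1)         # the agent attends and likes the game
print(belief, prob_like(belief))   # (2, 2) 0.5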


Figure 4. An Illustration of Output-Ironing in a Single-Item Allocation Problem.

The initial allocation rule is nonmonotonic, with the agent winning for values [$400, $600] but losing for values ($600, $900]. Output-ironing would establish monotonicity by canceling the allocation decision in the interval [$400, $600].


This belief state also defines the probabilistic model for how its type will evolve based on sequences of actions.

For the multiagent problem, the Markov chain of each agent advances upon an allocation of a ticket. In every period, the decision to make is which Markov chain to advance. This is a multiarmed bandits problem (MABP), with each arm corresponding to an agent, the state of an arm only changing when activated, and only one arm activated in every period. The optimal policy for the MABP solves the multiagent learning problem, identifying an allocation policy to maximize expected discounted value and finding the right trade-off between exploration and exploitation. It may not be optimal to allocate to the agent with the highest expected value when there is considerable uncertainty about another agent's value. With discounting, and an infinite time horizon, the optimal policy is characterized by an "index policy," and there is an algorithm that scales linearly in the number of arms (Gittins 1989).13 What makes this a mechanism design problem is that each agent has private information about its current belief state, and will misreport this information if this improves the allocation policy in its favor.

The Dynamic VCG Mechanism

In the dynamic VCG mechanism (Bergemann and Välimäki 2010), each agent must report a probabilistic model for how its type will evolve based on sequences of actions, and also, in each period, its current type.

The mechanism selects an action in each period according to the optimal policy, that is, the policy that maximizes the total expected discounted value. Let xt denote the action selected in period t. As in the online VCG mechanism, the payments are defined so that the expected discounted payments made by an agent equal the expected externality it imposes on the other agents through its effect on the decision policy. But now the expected discounted payment must be aligned in every period and not just upon arrival.

For this, a payment (Va – Vb) is collected in each period t, where Va is the expected reported discounted value that the other agents would achieve forward from the current period under the optimal decision policy that would be followed if agent i was not present, and Vb is the expected reported discounted value that the other agents would achieve forward from the current period under the optimal decision policy that would be followed if agent i was not present in the next period forward, conditioned on taking action xt in the current period.

In so doing, the dynamic VCG mechanism aligns the incentives of an agent with the objective of maximizing the expected discounted value, summed across all agents. This makes it incentive compatible for an agent to report its true probabilistic model and, in each period, its current type.14

In the hockey ticket example, each agent will make a claim about its current belief state (N_1^t, N_0^t) in each period. The ticket will be allocated to the agent with the highest index (Gittins 1989), and the allocated agent will then receive value 1 or 0, make a payment, and advance to a new belief state, whereupon it will make a new report.15

For the special case of two agents, the expected externality imposed on A2 when A1 is allocated is (1 – γ)W2, where W2 = w2/(1 – γ) is the expected discounted value to agent 2 for receiving the item in every period including the current period, with w2 = N_1^t/(N_1^t + N_0^t), where these are the counts for agent 2 and γ ∈ (0, 1) is the discount factor.


Figure 5. The Markov Chain Model for a Bayesian Learning Agent Observing a Sequence of {0, 1} Values Received for Attending a Sequence of Hockey Games.

Each state is labeled with the counts (N_1^t, N_0^t) of the number of value-1 and value-0 observations in the current period t. Transitions are labeled with a probability (depending on the relative counts) and the value received.


The amount (1 – γ)W2 = W2 – γW2 represents the effect of pushing back the sequence of allocations to A2 by one period, and is the expected externality imposed on A2 by the presence of A1 in the current period. In fact, we see that the payment collected is (1 – γ)W2 = (1 – γ)w2/(1 – γ) = w2, exactly the expected value to A2 for the current ticket. For three or more agents, things get just a bit more complicated. For example, with three agents, A1's payment upon winning is not simply max(w2, w3), but greater than this. This is because we will usually have Va > max(W2, W3), because there is an option value in being able to switch between A2 and A3 over time.16
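A sketch of this two-agent payment computation follows; gamma stands for the discount factor.

# A sketch of the two-agent dynamic VCG payment discussed above: when A1 is
# allocated, it pays (1 - gamma) * W2 = w2, A2's expected value for one ticket.

def dynamic_vcg_payment_two_agents(belief_other, gamma):
    n1, n0 = belief_other
    w2 = n1 / (n1 + n0)            # A2's expected value for a single ticket
    W2 = w2 / (1 - gamma)          # A2's value for a ticket in every period
    return (1 - gamma) * W2        # externality of delaying A2's stream by one period

print(dynamic_vcg_payment_two_agents((1, 2), gamma=0.95))   # approximately 1/3, equal to w2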

Even with two agents, the dynamic VCG mechanism is distinct from a sequence of second-price auctions. This is because the item need not be allocated to the agent with the highest expected value. Rather, A2 may be allocated over A1 if there is still useful information to be gained about A2's underlying value distribution.

To think about how a sequence of simple second-price auctions fails, suppose there are two agents, with belief states (55, 45) and (2, 2), respectively. A1 has expected value 55/100 = 0.55 and A2 has expected value 2/4 = 0.5. Suppose A1 is truthful in a second-price auction, and bids its current expected value. If A2 also bids truthfully, then with high probability (because N_1^t + N_0^t is large for A1, and thus there is low variance on its estimate of θ) A2 will lose in every future period. But A2 has considerable uncertainty about its true type (which we may suppose is θ = 0.8), and could instead bid more aggressively, for example bidding 0.6 despite having negative expected utility in the current period. This allows for exploration, and in the event of enjoying the hockey game, A2's revised posterior belief is (3, 2), and the agent will win and have positive expected utility by bidding 0.55 or higher in the next period. In this way, the information value from winning and being allocated a ticket can outweigh the short-term loss in utility.

Dynamic Auctions with Deliberative Agents

For a second example of a problem with internal uncertainty, we suppose now that our sports enthusiasts are competing for a block of 10 tickets, and that each bidder has uncertainty about how to use the tickets, and a costly process to determine the best use and thus the ultimate value. Should the tickets go to a group of friends from college, to a charity auction, or be given as gifts to staff at work? In finding the best use, a bidder needs to take costly actions, such as calling old friends that want to chat forever, or developing and solving an optimization model for how to best use them to improve morale at work. Given a set of candidate uses, a bidder's value for the tickets is the maximum value across the possible uses.

The problem of deciding when, whether, and for how long to pursue this costly value-improvement process is a problem of metadeliberation. What is the right trade-off to make between identifying good uses and the cost of this process (Russell and Wefald 1991; Horvitz 1988; Larson and Sandholm 2004)? Here we also need to handle self-interest: one agent might seek to shirk deliberation by pretending to have a very costly process, so that deliberation by other agents is prioritized first. The agent would then only need to deliberate about its own value if the other agents find that they have a low enough value. A socially optimal sequence of deliberation actions will try to identify high-value uses from agents for which deliberation is quite cheap, in order to avoid costly deliberation by agents whose values are expected to be quite low.

Figure 6 provides a Markov decision process (MDP) to model the deliberation process of an individual agent. Later, we will show how to combine such a model into a multiagent sequential decision problem. From any state, the agent has the choice of deliberating or stopping and putting the tickets to the best use identified so far. For example, if the agent deliberates once, with probability 0.33 its value for the tickets will increase from 0 to 3, and with probability 0.67 its value will increase from 0 to 1. Deliberation in this example is costly (with a per-step cost of 1.1). Each agent has a model of its costly deliberation process and a discount factor γ ∈ (0, 1). Let us assume that a deliberation action only revises an agent's value weakly upwards and that each agent has only a single deliberation action available in every state.

The dynamic VCG mechanism applies to this problem because each agent's value is independent of the type of other agents, conditioned on a sequence of actions. An optimal policy will pursue a sequence of deliberation actions, ultimately followed by an allocation action. The goal is to maximize the total expected discounted value of the allocation net of the cost of deliberation. Upon deliberation, the local state of an agent changes, and thus we see the characteristic of internal uncertainty and dynamic type. The mechanism will coordinate which agent should deliberate in each period until deciding to allocate the item. Upon activating an agent, either by requesting a deliberation step or allocating the item, a payment may be demanded of the activated agent. The payment aligns incentives, and is such that the expected utility to an agent is always nonnegative, even though it may be both engaged in deliberation and making payments. Upon deliberation, each agent will report its updated local type truthfully (for example, its new value and revised belief about how future deliberation will improve its value).


By assuming sufficiently patient agents (with discount factor γ close enough to 1) and sufficiently costly deliberation, the problem has a structure reminiscent of the MABP, because a single agent is "activated" in each period (either to deliberate or to receive the items) and an agent's state only changes when activated. One difference is that each agent has two actions (deliberate or stop) from each state. It is possible to convert an agent's MDP into a Markov chain by a simple transformation in two steps. First, we prune away the actions that would not be optimal in a world in which this was the only agent (see figure 7). In the example, the discount factor is γ = 0.95. As long as the value of an agent only increases (as is the case in this domain), this pruning step is sound (Cavallo and Parkes 2008). The second step is to convert the finite-horizon Markov chain into an infinite-horizon Markov chain by unrolling any terminal stop action with one-time value w into an absorbing state, with reward (1 – γ)w received in every period into perpetuity (see figure 8). This step is valid because these states will remain absorbing states under an optimal policy for the MABP.
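As a sketch of the second step, an absorbing state paying (1 – γ)w in every period has the same present value as the one-time stop value w; the per-period rewards shown in figure 8 appear to be obtained in exactly this way (for example, (1 – 0.95) × 3 = 0.15).

# A sketch checking that the absorbing-state transformation preserves value.
gamma = 0.95
w = 3.0
per_period = (1 - gamma) * w                                        # about 0.15 in every period
present_value = sum(per_period * gamma**t for t in range(10_000))   # geometric sum
print(per_period, round(present_value, 6))                          # approximately 0.15 and 3.0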

In the metadeliberation auction suggested in Cavallo and Parkes (2008), the dynamic VCG mechanism is implemented as follows: each agent first reports its local deliberation model to the mechanism along with its current deliberation state. The mechanism then constructs a MABP by converting each agent's MDP into a Markov chain and determines the agent to activate in the current period. If the state in the agent's pruned Markov chain is one from which a deliberation action is taken, then this is suggested to the agent by the mechanism. Otherwise, the state is an absorbing state and the item is allocated to the agent. Payments are collected in either case, in order to align each agent's incentives.

Figure 9 illustrates a simple two-agent example where the local MDPs have already been converted into Markov chains. Assume discount factor γ = 0.75 and that deliberation costs are 0 for both agents. Initially A1 has value 2 and A2 has value 0. The optimal policy calls for A2 to deliberate, and make a payment of (1 – γ) times the expected value to A1 if decisions were optimized for him, that is, 0.25 × 11.1 = 2.775. If A2 transitions to the value 40 state, in the second time period it is allocated the item and must pay 11.1.


Figure 6. A Markov Decision Process to Model the Deliberation Process of an Agent.

Each state is labeled with the value from the current best use of the block of tickets. This value is received from a stop action. The states labeled stop are terminal states. Upon a deliberation action (with cost 1.1) the agent undergoes a probabilistic transition to a new state with weakly higher value.


Figure 7. The Single-Agent Deliberation Model in Which the Suboptimal Actions Are Removed.

The result is a Markov chain where, for example, (1) the agent will deliberate in state 0, undergoing a probabilistic transition to a state with value 3 or 1, and (2) the agent will stop deliberating and receive value 3 in state 3.

Figure 8. Obtaining an Infinite-Horizon Markov Chain.

The single-agent deliberative model in which the terminal stop states in the Markov chain (the chain itself obtained by pruning suboptimal actions) are now transformed into absorbing states, obtaining an infinite-horizon Markov chain.


Alternatively, if A2 transitions to the value 8 state, the optimal policy calls for A1 to deliberate and make payment 0.25 × 8 = 2. If A1 then transitions to the absorbing state with value 6, the item is allocated to A2, who pays 6. If A1 instead transitions to the nonabsorbing value 6 state, it deliberates again (again making payment 2). Finally, if A1 then transitions to the value 40 state, it is allocated the item and makes payment 8; otherwise A2 is allocated and makes payment 6.

A Challenge: Scaling Up

The incentive properties of the dynamic VCG mechanism extend to any problem in which each agent's private type evolves in a way that is independent of the type of other agents, when conditioned on the actions of the mechanism. But what is challenging in developing applications is the same problem that we identified in looking to apply the online VCG mechanism to CAs: the sequential decision problem becomes intractable in many domains, and substituting approximation algorithms does not sustain the incentive properties. Two important challenges include developing tractable special cases and representation languages, and developing heuristic approaches.

Tractable special cases and representation languages. Can we identify additional models for multiagent problems with dynamic private state for which the optimal decision policy is tractable? A related direction is to develop representation languages that allow agents to succinctly describe their local dynamics, for example, models of learning and models of value refinement.

Heuristic approaches. Just as heuristic approaches seem essential for the design of practical dynamic mechanisms for environments with external uncertainty, so too will we need to develop methods to leverage heuristic algorithms for sequential decision making with agents that face internal uncertainty. For this, it seems likely that we will need to adopt approximate notions of incentive compatibility, and develop characterizations of coordination policies with "good enough" incentive properties.

The AI community seems especially well placed to be creative and flexible in creating effective dynamic incentive mechanisms. Roth (2002) has written of the need for developing an engineering for economics, and it seems that our community has plenty to offer in this direction.

We need not be dogmatic. For example, incentive compatibility is nice to have if available, but we will need to adopt more relaxed criteria by which to judge the stability of mechanisms in the presence of self-interested agents. A good alternative will enable new approaches to the design and analysis of mechanisms (Lubin and Parkes 2009). Indeed, Internet ad auctions are inspired by, but not fully faithful to, the theories of incentive-compatible mechanism design. Search engines initially adopted a second-price-style auction over a first-price auction because the first-price auction created too much churn on the servers, as bidders and bidding agents automatically chased each other around the bid space!


Figure 9. An Auction with Deliberative Agents.

Panel (a) shows Agent 1 and panel (b) Agent 2. Agent 1's initial best-use value is 2, and may increase to as much as 40 through deliberation. Agent 2's initial best-use value is 0, and may increase to 8 or 40 through deliberation. The discount factor is γ = 0.75, while the cost of deliberation is 0.


So incentive compatibility, at least in some form of local stability, became an important and pragmatic design criterion.

Certainly, there are real-world problems of interest where insisting on truthfulness comes at a great cost to system welfare. Budish and Cantillon (2009) present a nice exposition of this in the context of course registration at Harvard Business School, where the essentially unique strategy-proof mechanism (the randomized serial dictatorship) has bad welfare properties because of the callousness of relatively insignificant choices of early agents across lower-ranked courses in a random priority ordering.

Conclusions

The promise of dynamic incentive mechanisms is that they can provide simplification, robustness, and optimality in dynamic, multiagent environments, by engineering the right rules of encounter. Good success has been found in generalizing the canonical VCG mechanism to dynamic environments, and in enabling the coupling of heuristic approaches to stochastic optimization with incentive compatibility in restricted domains. Still, many challenges remain, both in terms of developing useful characterizations of "good enough" incentive compatibility and in leveraging these characterizations within computational frameworks.

Dynamic mechanisms are fascinating in their ability to embrace both uncertainty that occurs outside of the scope of individual agents and also to "reach within" an agent and coordinate its own learning and deliberation processes.17 But here we see the beginning of a problem of scope. Presumably we do not really believe that centralized decision making, with coordination even down to the details of an agent's deliberation process, is sensible in large-scale, complex environments.

This is where it is also important to pivot away from direct revelation mechanisms, in which information is elicited by a center that makes and enforces decisions, to indirect revelation mechanisms. An indirect mechanism allows an agent to interact while revealing only the minimal information required to facilitate coordination. Can dynamic mechanisms be developed that economize on preference elicitation by allowing agents to send messages that convey approximate or incomplete information about their type, in response to queries from a mechanism?18

A couple of other limitations of the current state of the art should be highlighted. First, we have been focused exclusively on goals of social welfare, often termed economic efficiency. Little is known about how to achieve alternative goals, such as revenue or various measures of fairness, in dynamic contexts. Second, we have assumed mechanisms in which there is money that can be used for aligning agent incentives. But we saw from the median-choice rule mechanism and its application to the blue whale location problem that there are interesting static mechanisms for contexts without money. Indeed, a recent line of research within computer science is developing around the notion of mechanism design without money (Procaccia and Tennenholtz 2009). But there is relatively little known about how to design dynamic mechanisms without money.19 We might imagine that the curator of the exhibition on the university mall is interested in bringing through a progression of massive mammals. How should a dynamic mechanism be structured to facilitate a sequence of decisions about which mammal, and in which location, to exhibit each year?

Acknowledgments

This work would not be possible without many great collaborators. The first author would like to warmly acknowledge Jonathan Bredin, Quang Duong, Eric Friedman, Mohammad T. Hajiaghayi, Takayuki Ito, Adam Juda, Mark Klein, Robert Kleinberg, Mohammad Mahdian, Gabriel Moreno, Chaki Ng, Margo Seltzer, Sven Seuken, and Dimah Yanovsky. Parkes's work is supported in part by Yahoo! and Microsoft Research awards.

Notes
1. Another way to see this is that each agent faces the choice of winning or losing, and the price for winning is independent of his report. The report of an agent's value just determines which choice is made, with the choice that is made being optimal for the agent given the report. A2 faces a choice of losing for zero payment, or winning for $900, and is happy to win. A3 faces a choice between winning for $1000 or losing, and is happy not to win.
2. The University of British Columbia has just opened such an exhibit.
3. Both of these examples illustrate direct mechanisms in which agents make a report about their type. In indirect mechanisms, a message sent by an agent is typically interpreted instead as providing partial information about an agent's type; for example, a bid for an item at price p in an ascending price auction can be interpreted as a claim that v ≥ p, for value v. Indirect mechanisms are important when preference elicitation is costly.
4. Without a dynamic agent population, external uncertainty can be rolled into the standard framework of mechanism design because incentive constraints need bind only in the initial period, with a decision made and committed to (for example, a decision policy) that will structure the subsequent realization of outcomes (Dolgov and Durfee 2005).
5. In the example, when A1 is truthful then v + Vb = 900 + 1000 = 1900, and this is unchanged for all reports v′ ≥ 400. For reports v′ < 400, then v + Vb = 0 + 1400 < 1900 and A1's utility falls from 500 to 0.
6. As is familiar from Markov decision processes, for infinite decision horizons a discount factor γ ∈ (0, 1) is adopted, and the objective is to maximize the expected discounted sum, that is, V(0) + γV(1) + γ²V(2) + ..., where V(t) is the total value to all agents for the action in period t.
7. This is a refinement on a Bayesian-Nash equilibrium, referred to as a within-period ex post Nash equilibrium, because an agent's best strategy is to report its true type whatever the reports of other agents up to and including the current period, just as long as other agents follow the truthful equilibrium in future periods. It is equivalent to dominant-strategy equilibrium in the final period of a dynamic problem, when online VCG is equivalent to the static VCG mechanism.
8. In comparison, if agents are impatient and demand an allocation of q tickets in the period in which they arrive, the associated planning problem is tractable, with a closed-form characterization of the optimal policy and an analytic characterization of incentive-compatible dynamic mechanisms (Dizdar, Gershkov, and Moldovanu 2009).
9. This effect is familiar from static problems (Lehmann, O'Callaghan, and Shoham 2002) but exacerbated here because we get a new unraveling, with agents no longer having a good basis for believing that other agents will be truthful, and thus no longer believing that the mechanism's model of the probabilistic process is correct.
10. This use of the phrasing "ironing" is evocative of removing the nonmonotonic "rumples" from the decision policy, adopted here in the same sense as Myerson's (1981) seminal work in optimal mechanism design. Whereas Myerson achieves monotonicity by ironing out nonmonotonicity in the input into an optimization procedure, the approach adopted here is to achieve monotonicity by working on the output of an optimization procedure.
11. We can also couple internal and external uncertainty, for example, in enabling mechanism design for environments in which there is uncertainty about the availability of resources and in which individual agents are refining their own beliefs about their value for different outcomes (Cavallo, Parkes, and Singh 2010).
12. Formally, the agent's prior in period t on the unknown probability is a Beta distribution, Beta(αt, βt), with parameters αt and βt. The Beta distribution satisfies the conjugate property, so that the posterior is in the same family. Given prior Beta(αt, βt), the posterior is Beta(αt + 1, βt) after a "1" and Beta(αt, βt + 1) after a "0."
13. An index policy is one in which an index (or "quality") is computed separately for each arm and the arm with the highest index is activated in every period (a minimal illustration of this structure, together with the Beta updates of note 12, follows these notes).
14. Truthful reporting is an equilibrium in the same refinement of the Bayesian-Nash equilibrium as for the online VCG mechanism, with truthful reporting optimal whatever the current types of agents, as long as the other agents report their true type forward from the current period.
15. For technical reasons, every agent is entitled to make a report in every period, even if not activated. This is to allow an agent to correct an earlier misreport and be truthful going forward from the current state.
16. The payment can be easily approximated through a sample-based computation, by sampling the trajectory of states reached under the optimal index policy to A2 and A3 alone (a sketch of this style of sample-based estimate also follows these notes).
17. For a more technical overview of dynamic mechanisms, see Parkes (2007). Lavi and Nisan (2000) first introduced the question of dynamic mechanisms to computer science, giving a focus to the design of prior-free and incentive-compatible online algorithms.
18. Such mechanisms have been developed to good effect for settings of static mechanism design, but are still in their infancy for dynamic mechanism design (Said 2008).
19. But see Jackson and Sonnenschein (2007), Zou, Gujar, and Parkes (2010), Abdulkadiroglu and Sönmez (1999), and Lu and Boutilier (2010) for some possible inspiration.
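To make notes 12 and 13 concrete, here is a small sketch in Python. It maintains Beta posteriors over each arm's unknown success probability and runs an index policy. For brevity the index used is simply the posterior mean rather than the Gittins index on which the article's analysis rests, so this illustrates only the structure of an index policy (one index per arm, activate the argmax), not the optimal computation. All names are hypothetical.

```python
# A sketch of notes 12 and 13: Beta-Bernoulli belief updates and the
# shape of an index policy. The per-arm "index" below is the posterior
# mean; the Gittins index is more involved to compute, so treat this
# only as an illustration of the policy structure, not the optimal rule.

import random


class BernoulliArm:
    """An arm with an unknown success probability and a Beta(alpha, beta) belief."""

    def __init__(self, true_prob, alpha=1.0, beta=1.0):
        self.true_prob = true_prob      # hidden from the policy
        self.alpha = alpha              # prior pseudo-count of "1" outcomes
        self.beta = beta                # prior pseudo-count of "0" outcomes

    def index(self):
        # Stand-in index: the posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)

    def pull(self):
        outcome = 1 if random.random() < self.true_prob else 0
        # Conjugate update: Beta(a, b) -> Beta(a + 1, b) after a "1",
        # and Beta(a, b + 1) after a "0".
        if outcome == 1:
            self.alpha += 1
        else:
            self.beta += 1
        return outcome


def run_index_policy(arms, periods):
    """In every period, activate the arm with the highest current index."""
    total_reward = 0
    for _ in range(periods):
        chosen = max(arms, key=lambda arm: arm.index())
        total_reward += chosen.pull()
    return total_reward


if __name__ == "__main__":
    random.seed(0)
    arms = [BernoulliArm(0.3), BernoulliArm(0.6), BernoulliArm(0.8)]
    print("total reward over 200 periods:", run_index_policy(arms, periods=200))
```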
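The sample-based computation of note 16 rests on the discounted objective of note 6. The sketch below, again in Python with purely illustrative simulators and numbers, shows the generic ingredient: a Monte Carlo estimate of an expected discounted sum of per-period values, computed once with all agents present and once with one agent removed. The gap is the kind of marginal-impact quantity from which sample-based payments are built; the article's actual payment terms are defined by the specific mechanism and are not reproduced here.

```python
# A sketch of sample-based estimation (note 16) of the discounted
# objective (note 6): average sum_t gamma^t V(t) over sampled
# trajectories. Simulators and numbers are illustrative assumptions.

import random


def estimate_discounted_value(simulate_period_values, gamma=0.95,
                              horizon=200, num_samples=1000, seed=0):
    """Average of sum_t gamma^t V(t) over sampled trajectories.

    simulate_period_values(rng, horizon) should return a list of per-period
    total values V(0), ..., V(horizon - 1) for one sampled trajectory.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        values = simulate_period_values(rng, horizon)
        total += sum((gamma ** t) * v for t, v in enumerate(values))
    return total / num_samples


if __name__ == "__main__":
    # Toy simulators: per-period total value is higher when agent A1 participates.
    def with_a1(rng, horizon):
        return [rng.uniform(8.0, 12.0) for _ in range(horizon)]

    def without_a1(rng, horizon):
        return [rng.uniform(6.0, 9.0) for _ in range(horizon)]

    v_all = estimate_discounted_value(with_a1)
    v_others = estimate_discounted_value(without_a1)
    print(f"estimated discounted value with A1:    {v_all:.1f}")
    print(f"estimated discounted value without A1: {v_others:.1f}")
    print(f"estimated marginal impact of A1:       {v_all - v_others:.1f}")
```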

References
Abdulkadiroglu, A., and Sönmez, T. 1999. House Allocation with Existing Tenants. Journal of Economic Theory 88(2): 233–260.
Bergemann, D., and Välimäki, J. 2010. The Dynamic Pivot Mechanism. Econometrica 78(2): 771–789.
Budish, E., and Cantillon, E. 2009. The Multiunit Assignment Problem: Theory and Evidence from Course Allocation at Harvard. Technical report, University of Chicago Booth School of Business.
Cavallo, R., and Parkes, D. C. 2008. Efficient Metadeliberation Auctions. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI'08), 50–56. Menlo Park, CA: AAAI Press.
Cavallo, R.; Parkes, D. C.; and Singh, S. 2010. Efficient Mechanisms with Dynamic Populations and Dynamic Types. Technical report, Harvard University.
Constantin, F., and Parkes, D. C. 2009. Self-Correcting Sampling-Based Dynamic Multiunit Auctions. In Proceedings of the 10th ACM Electronic Commerce Conference (EC'09), 89–98. New York: Association for Computing Machinery.
Cramton, P.; Shoham, Y.; and Steinberg, R., eds. 2006. Combinatorial Auctions. Cambridge, MA: The MIT Press.
Dizdar, D.; Gershkov, A.; and Moldovanu, B. 2009. Revenue Maximization in the Dynamic Knapsack Problem. Technical report, University of Bonn.
Dolgov, D. A., and Durfee, E. H. 2005. Computationally-Efficient Combinatorial Auctions for Resource Allocation in Weakly-Coupled MDPs. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'05). New York: Association for Computing Machinery.
Ephrati, E., and Rosenschein, J. S. 1991. The Clarke Tax as a Consensus Mechanism Among Automated Agents. In Proceedings of the 9th National Conference on Artificial Intelligence (AAAI-91), 173–178. Menlo Park, CA: AAAI Press.
Gittins, J. C. 1989. Multiarmed Bandit Allocation Indices. New York: Wiley.
Hajiaghayi, M. T.; Kleinberg, R.; Mahdian, M.; and Parkes, D. C. 2005. Online Auctions with Re-usable Goods. In Proceedings of the ACM Conference on Electronic Commerce, 165–174. New York: Association for Computing Machinery.
Hentenryck, P. V., and Bent, R. 2006. Online Stochastic Combinatorial Optimization. Cambridge, MA: The MIT Press.
Horvitz, E. J. 1988. Reasoning Under Varying and Uncertain Resource Constraints. In Proceedings of the 7th National Conference on Artificial Intelligence (AAAI-88), 139–144. Menlo Park, CA: AAAI Press.
Hurwicz, L. 1973. The Design of Mechanisms for Resource Allocation. American Economic Review Papers and Proceedings 63(5): 1–30.
Jackson, M. O., and Sonnenschein, H. F. 2007. Overcoming Incentive Constraints by Linking Decisions. Econometrica 75(1): 241–258.
Jackson, M. O. 2000. Mechanism Theory. In The Encyclopedia of Life Support Systems. Mississauga, Ontario, Canada: EOLSS Publishers.
Larson, K., and Sandholm, T. 2004. Using Performance Profile Trees to Improve Deliberation Control. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI'04). Menlo Park, CA: AAAI Press.
Lavi, R., and Nisan, N. 2000. Competitive Analysis of Incentive Compatible On-line Auctions. In Proceedings of the 2nd ACM Conference on Electronic Commerce (EC-00). New York: Association for Computing Machinery.
Lehmann, D.; O'Callaghan, L. I.; and Shoham, Y. 2002. Truth Revelation in Approximately Efficient Combinatorial Auctions. Journal of the Association for Computing Machinery 49(5): 577–602.
Lu, T., and Boutilier, C. 2010. The Unavailable Candidate Model: A Decision-Theoretic View of Social Choice. In Proceedings of the 11th ACM Conference on Electronic Commerce (EC'10). New York: Association for Computing Machinery.
Lubin, B., and Parkes, D. C. 2009. Quantifying the Strategyproofness of Mechanisms via Metrics on Payoff Distributions. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 349–358. Redmond, WA: AUAI Press.
Moulin, H. 1980. On Strategy-Proofness and Single-Peakedness. Public Choice 35(4): 437–455.
Myerson, R. B. 1981. Optimal Auction Design. Mathematics of Operations Research 6(1): 58–73.
Parkes, D. C., and Duong, Q. 2007. An Ironing-Based Approach to Adaptive Online Mechanism Design in Single-Valued Domains. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI'07), 94–101. Menlo Park, CA: AAAI Press.
Parkes, D. C. 2009. When Analysis Fails: Heuristic Mechanism Design via Self-Correcting Procedures. In Proceedings of the 35th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM'09), Lecture Notes in Computer Science 5901, 62–66. Berlin: Springer.
Parkes, D. C. 2007. Online Mechanisms. In Algorithmic Game Theory, ed. N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, chapter 16, 411–439. Cambridge, UK: Cambridge University Press.
Parkes, D. C., and Singh, S. 2003. An MDP-Based Approach to Online Mechanism Design. In Proceedings of the 17th Annual Conference on Neural Information Processing Systems (NIPS'03). Cambridge, MA: The MIT Press.
Procaccia, A. D., and Tennenholtz, M. 2009. Approximate Mechanism Design Without Money. In Proceedings of the 10th ACM Conference on Electronic Commerce, 177–186. New York: Association for Computing Machinery.
Rosenschein, J. S., and Zlotkin, G. 1994. Designing Conventions for Automated Negotiation. AI Magazine 15(3): 29–46.
Roth, A. E. 2002. The Economist as Engineer. Econometrica 70(4): 1341–1378.
Russell, S., and Wefald, E. 1991. Do the Right Thing. Cambridge, MA: The MIT Press.
Russell, S. J.; Subramanian, D.; and Parr, R. 1993. Provably Bounded Optimal Agents. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), 338–344. San Francisco: Morgan Kaufmann Publishers.
Said, M. 2008. Information Revelation in Sequential Ascending Auctions. Technical report, Yale University, New Haven, CT.
Shahaf, D., and Horvitz, E. 2010. Generalized Task Markets for Human and Machine Computation. In Proceedings of the 24th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
Vickrey, W. 1961. Counterspeculation, Auctions, and Competitive Sealed Tenders. Journal of Finance 16(1): 8–37.
von Ahn, L., and Dabbish, L. 2008. Designing Games with a Purpose. Communications of the ACM 51(8): 58–67.
Vytelingum, P.; Voice, T. D.; Ramchurn, S. D.; Rogers, A.; and Jennings, N. R. 2010. Agent-Based Micro-Storage Management for the Smart Grid. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.
Zou, J.; Gujar, S.; and Parkes, D. 2010. Tolerable Manipulability in Dynamic Assignment Without Money. In Proceedings of the 24th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.

David Parkes is a professor of computer science at Harvard University and founder of the EconCS research group. He received his Ph.D. in computer and information science from the University of Pennsylvania in 2001. His research interests include multiagent systems, computational mechanism design, and computational social choice.

Ruggiero Cavallo is a research scientist at Yahoo! Research in New York City. He was an undergraduate at Cornell University, graduating in 2001, and received his Ph.D. in computer science from Harvard University in 2008. His primary research area is mechanism design.

Florin Constantin is a postdoctoral researcher in computer science at Georgia Institute of Technology. He received his Ph.D. from Harvard University in 2009. His current research interests are computational mechanism design and algorithmic game theory.

Satinder Singh is a professor of computer science and engineering at the University of Michigan and, at present, the director of the AI Laboratory. He received his Ph.D. in computer science at the University of Massachusetts, Amherst. His research interests include reinforcement learning, architectures for flexibly intelligent autonomous agents, and game-theoretic techniques for multiagent learning.
