
American Economic Journal: Microeconomics 2019, 11(4): 111–150 https://doi.org/10.1257/mic.20170025


Dynamic Non-monetary Incentives†

By Daniel Bird and Alexander Frug*

We study a principal-agent interaction where investments and rewards arrive stochastically over time and are privately observed by the agent. Investments (costly for the agent, beneficial for the principal) can be concealed by the agent. Rewards (beneficial for the agent, costly for the principal) can be forbidden by the principal. We ask how rewards should be used and which investments incentivized. We identify the unique optimal mechanism and analyze the dynamic investment and compensation policies. When all rewards are identical, the unique optimal way to provide incentives is by a “carte blanche” to pursue all rewards arriving in a predetermined time frame. (JEL D82, M52)

Optimal incentivization when the interests of the involved parties are not aligned is a long-standing and intensively studied question in economics. Most of the literature focuses on issues of imperfect action observability (e.g., hidden effort) or private information (e.g., cost of effort) and assumes monetary compensation. However, in some applications, instantaneous monetary transfers are infeasible or inferior to other methods of compensation, and more importantly, there is uncertainty regarding the availability of future compensation opportunities. Examples of such situations are abundant: an employer can shorten work days when business is slow or allow certain perks when they become available (e.g., attend exclusive events held by corporate partners or professional conferences); a start-up firm may be unable to pay its employees until it locates an investor to finance it; an antitrust authority may incentivize welfare-increasing mergers by supporting profit-increasing mergers in the future; or the management of a large organization may incentivize its divisions by granting preferred access to the organization’s capital when the need arises.

For a more concrete example, consider an ongoing relationship between the city mayor and a subordinate bureaucrat whose salary is fixed by regulations. Occasionally, the bureaucrat’s position allows him to recognize opportunities that are either “investments” that benefit the mayor but are costly for himself (e.g., informally facilitating the application process of a desired project) or “rewards” that constitute a personal benefit for the bureaucrat and entail some cost for the mayor (e.g., appointing associates to vacated municipal positions). While investments can easily be concealed by the bureaucrat and rewards banned by the mayor, there are multiple ways in which the randomly arriving rewards can be used to incentivize investments. What is the optimal method of incentivization in this environment? Which investments should be incentivized? How does the optimal policy change over time?

* Bird: Eitan Berglas School of Economics, Tel Aviv University, Ramat-Aviv, Tel-Aviv 6997801 (email: [email protected]); Frug: Department of Economics and Business, Universitat Pompeu Fabra and Barcelona GSE, Ramon Trias Fargas 25-27, 08005 Barcelona (email: [email protected]). Johannes Hörner was coeditor for this article. We are grateful to Eddie Dekel, Bruno Strulovici, and Asher Wolinsky for valuable suggestions and comments. We also appreciate the helpful comments of seminar participants at Arizona State University, London School of Economics, Northwestern University, Penn State University, Tel Aviv University, The European University Institute, The Hebrew University, UC Berkeley, UC San Diego, and Universitat Pompeu Fabra. Frug acknowledges the financial support of the Spanish Ministry of Economy and Competitiveness (projects ECO2014-56154-P and SEV-2011-0075).

† Go to https://doi.org/10.1257/mic.20170025 to visit the article page for additional materials and author disclosure statement(s) or to comment in the online discussion forum.

To address these questions, we consider a dynamic principal-agent model with a feature that, to the best of our knowledge, is novel in this literature: both investments and rewards (compensation opportunities) arrive stochastically over time and are either taken immediately or foregone. The implementation of investments is costly for the agent, and so the principal uses the randomly arriving rewards for incentivization. While investment and reward availability is privately observed by the agent, implemented investments and rewards are perfectly observed by the principal, who can restrict the set of allowed actions based on past implementation. We identify the unique optimal mechanism, which is economically appealing due to its simplicity and clear interpretation.

To understand the problem faced by the principal, consider first a situation with one type of reward activity; that is, every implemented reward generates the same (positive) payoff for the agent and entails the same cost for the principal. Even in this relatively simple case, the optimal form of compensation is not obvious. For example, the agent may be allowed to pursue a fixed number of reward activities (no matter how long it takes), or enjoy all reward activities that arrive in a fixed time interval (no matter how many of them there are). The principal may begin compensation immediately (“front-loading”) or delay it for as long as possible (“back-loading”). We show that the unique optimal way to compensate the agent is via a “time-constrained carte blanche,” i.e., allowing all rewards that arrive in a fixed time interval that begins at the present moment. In our mayor-bureaucrat example, this implies that after the bureaucrat identifies and implements an investment for the mayor, the most efficient way to provide compensation is by granting him complete freedom to fill vacated positions in his department for a predetermined amount of time. Section IIA explains the economic intuition behind this result.

When there are multiple types of investments and rewards, the principal has to decide how to jointly use the different reward activities and which investment opportunities are worth pursuing. Our main result shows that there is a unique optimal mechanism, which we call the generalized time mechanism. Under this mechanism, the principal compensates the agent by specifying a (separate) time interval for each type of reward within which all rewards of that type are allowed. When a desired investment is implemented, the time intervals are extended.

The generalized time mechanism exhibits several notable economic properties that arise only when there are multiple types of investments and rewards. Firstly, the set of allowed rewards at each point in time depends on the amount of compensation owed to the agent and thus changes over time. When owed compensation is high, the principal permits multiple types of reward activities at the same time.


Surprisingly, the principal permits reward activities with a high cost of providing a util to the agent even when it is not necessary to do so. Specifically, even though the principal can incentivize the agent by increasing only the time interval in which cheap rewards are allowed, she allows more expensive rewards as well. As time goes by, the principal gradually reduces the set of allowed rewards, until a new investment is implemented.

Secondly, investment decisions change non-monotonically over time. Periods with many completed investments are followed by periods in which only high-quality investments are incentivized. If such investment opportunities do not arrive quickly enough, the principal gradually becomes less selective about the investments she is willing to incentivize. Consequently, the principal may incentivize an investment opportunity similar to one she previously chose to forgo. Additionally, in the long run, the agent is essentially allowed to enjoy all rewards without implementing any investments. In our mayor-bureaucrat example, this implies that ultimately the bureaucrat will have complete freedom to enjoy personal benefits and cease pursuing opportunities that benefit the mayor.

The only information asymmetry assumed in this paper is with regard to the availability of investments and rewards. We abstract away from additional adverse selection and moral hazard problems by assuming that the payoffs from implemented actions are perfectly observable and that investments and rewards arrive according to independent Poisson processes. In particular, this implies that the agent cannot hasten the arrival of rewards and that both players always have the same beliefs about the availability of investments and rewards in the future.

The paper proceeds as follows. Section I introduces the model and presents some preliminary analysis. In Section II, we highlight the economic intuition and the implications of our main result in a model with a single type of investment project and a single type of reward activity. In Section III, we prove that there exists a unique optimal mechanism and study its qualitative properties. In Section IV, we analyze variations of our main model. In particular, we show that our main qualitative results remain valid if monetary transfers are added to the model or the principal lacks commitment power. Moreover, we discuss the role of the assumption that opportunities are privately observed. Section V offers a review of the related literature. Section VI concludes. All proofs are relegated to the Appendix.

I. Model

We consider an infinite-horizon continuous-time mechanism design problem in which a principal (she) incentivizes an agent (he) to implement stochastically arriving investments via stochastically arriving rewards. Investment projects and reward activities arrive according to independent Poisson processes. There are I ∈ ℕ types of investment projects: investment i ∈ I has an arrival rate μ_i, its implementation incurs a loss of l_i for the agent, and it generates a benefit of B_i for the principal. Without loss of generality, we order the types of investment projects such that the rate of return on investments, B_i/l_i, is weakly decreasing in i. Similarly, there are J ∈ ℕ types of reward activities: reward j ∈ J has an arrival rate λ_j, generates a gain of g_j for the agent, and entails a cost of C_j for the principal.


Again, without loss of generality, we order the reward types such that the cost of providing a util, C_j/g_j, is weakly increasing in j. All model parameters are strictly positive. All investments and rewards can be “scaled down” and implemented at an intensity of α ∈ [0, 1].1 We assume that both players are expected-utility maximizers and that they share the same (strictly) positive discount rate r.

At each point in time, the agent can implement an investment, or enjoy a reward, only if it is both available (by nature) and allowed (by the principal). To specify what investments and rewards are allowed at a given point in time, we define delegation lists. A (pure) delegation list is a vector of the form

D = D^{inv} × D^{rew} ∈ [0, 1]^I × [0, 1]^J,

where the kth coordinate of D^{inv} (D^{rew}) is the intensity at which an investment (reward) of type k is allowed. When an investment/reward is permitted at intensity zero, we say that it is forbidden.

We assume that only the agent observes whether investments and rewards are available.2 However, an implemented investment/reward is publicly observed. The principal is assumed to have full commitment power, which allows her to credibly propose a contract at the beginning of the interaction. A contract is a function D̃(h_t) that maps public histories (clarified below) into lotteries over pure delegation lists. It is without loss of generality to restrict attention to such contracts (as opposed to allowing the agent to choose at what intensity to implement an opportunity) due to the Poisson arrival assumption. In particular, privately observed opportunities that were forgone in the past are not informative about the future. Therefore, leaving partial discretion to the agent regarding the implementation intensity of an available investment/reward has no value.

The public information at time t consists of the realized delegation list and the agent’s action at time t (i.e., either an implementation of an available investment/reward at the allowed intensity or “no action”). A public history at the beginning of time t, h_t, describes the public information for each s ∈ [0, t). We say that a contract D̃(·) is measurable if for any history h_t the following (multidimensional) integral exists:

∫_h ( ∫_{s=t}^∞ e^{−r(s−t)} E[D̃(h_s)] ds ) dν_{D̃}(h | h_t),

where ν_{D̃}(h | h_t) is the measure of the infinite public histories generated by the contract D̃(·) conditional on h_t.

The main implication of measurability is that under a measurable contract the agent’s continuation utility and the principal’s continuation value are well defined. Denote by N_i^{inv} and N_j^{rew} the counting processes that, respectively, count the number of times the ith investment and the jth reward were implemented at a positive intensity. Then, under a measurable contract, the expectation

E[ ∫_{s=t}^∞ e^{−r(s−t)} ( Σ_{j∈J} g_j E[D̃_j^{rew}(h_s)] dN_{j,s}^{rew} − Σ_{i∈I} l_i E[D̃_i^{inv}(h_s)] dN_{i,s}^{inv} ) | h_t ]

is well defined.

We say that a contract is incentive compatible if, after every public history, and given any realized delegation list D ∈ supp D̃(h_t), implementing the available investment/reward at the permitted intensity is optimal for the agent. We restrict attention to incentive-compatible contracts. This is without loss of generality since, for any contract where there are histories after which the agent strictly prefers to conceal an available investment/reward of a given type, there is a payoff-equivalent contract under which, after such histories, that type of investment/reward is forbidden.

1 When players are expected-utility maximizers and stochastic mechanisms are allowed, assuming that actions can be scaled down is equivalent to assuming that the principal can commit to approving actions probabilistically.

2 The assumption of asymmetric information regarding reward availability is immaterial to our results. See Section IVB for a detailed discussion on publicly observable investments and rewards.

To conclude, the contracting space from which the principal can choose at the beginning of the interaction is the set of all measurable and incentive-compatible contracts. Notice that we do not need an explicit IR constraint since by concealing all investments, the agent guarantees himself a nonnegative continuation utility after every history. Therefore, any incentive-compatible contract is also interim IR.

A. Markovian Contracts

In the stationary environment of this model, it is well known that attention can be restricted to mechanisms that use the agent’s expected continuation utility, u, as a state variable.3 The set of continuation utilities is bounded from below by zero, since the agent can always conceal all future investment opportunities, and bounded from above by his expected utility from enjoying all future rewards without carrying out any investments:

u̅ ≡ ∫_0^∞ e^{−rt} Σ_{j∈J} g_j λ_j dt = (Σ_{j∈J} g_j λ_j) / r.

Any continuation utility in the interval [0, u̅] is feasible.

To specify a Markovian delegation mechanism, we need to define a delegation function specifying the delegation list for every feasible continuation utility, D(u), and a stochastic process <u>, according to which the agent’s continuation utility changes. In the Markovian environment, it is sufficient to specify a pure delegation list for every level of u: were the principal to randomize over multiple delegation lists at some u, one of the delegation lists in the support would maximize her continuation value whenever that u is reached.

Formally, a Markovian contract is defined by a delegation function, D(u), which specifies a (pure) delegation list for each u, and by a stochastic process <u> = {F_τ(u), φ_i^{inv}(u), φ_j^{rew}(u), u_0} with support [0, u̅]. Here, F_τ(u) is a Markov process that specifies the distribution of the agent’s continuation utility after a time period of length τ ∈ (0, ∞] during which neither rewards nor investments were implemented, given that the agent’s current continuation utility equals u. The functions φ_i^{inv}(u) and φ_j^{rew}(u), for i ∈ I and j ∈ J, specify the distribution of the agent’s continuation utility after the implementation of an available investment/reward at state u (at the unique intensity specified by D(u)). The starting value of the process is u_0.

3 See, for example, Spear and Srivastava (1987) and Thomas and Worrall (1990).

B. The Principal’s Problem

The principal’s objective is to choose a delegation function and a stochastic process for the agent’s continuation utility that maximize her expected value at time zero:

(OBJ)  max_{D(u), <u>}  E[ ∫_0^∞ e^{−rt} ( Σ_{i∈I} D_i^{inv}(u_t) B_i dN_{i,t}^{inv} − Σ_{j∈J} D_j^{rew}(u_t) C_j dN_{j,t}^{rew} ) ],

subject to the promise-keeping constraint that stipulates that u is indeed the agent’s expected continuation utility:

(PK)  u_t = E[ ∫_{s=t}^∞ e^{−r(s−t)} ( Σ_{j∈J} D_j^{rew}(u_s) g_j dN_{j,s}^{rew} − Σ_{i∈I} D_i^{inv}(u_s) l_i dN_{i,s}^{inv} ) ].

Moreover, the chosen mechanism must incentivize the agent to implement available actions at the allowed intensity:

(IC^{inv})  E[φ_i^{inv}(u)] − D_i^{inv}(u) l_i ≥ u  ∀ u ∈ [0, u̅], i ∈ I,

(IC^{rew})  E[φ_j^{rew}(u)] + D_j^{rew}(u) g_j ≥ u  ∀ u ∈ [0, u̅], j ∈ J.

The principal’s value function corresponding to the solution to the above problem is denoted by V(u). This value function is weakly concave since we allow lotteries.4

To provide sufficient incentives for the agent to implement an investment of type i, the principal must increase his continuation utility by at least l_i. The cost of providing one util by using rewards of type j is C_j/g_j, and so the expected cost of incentivizing an investment of type i via rewards of type j is l_i (C_j/g_j). Therefore, incentivizing investment i via reward j can be profitable only if B_i > l_i (C_j/g_j). Moreover, if it is not profitable to incentivize a type i investment via the most efficient reward, i.e., B_i ≤ l_i (C_1/g_1), forgoing these investments does not harm the principal.

Similarly, consider a reward for which C_j/g_j ≥ B_1/l_1. The inequality implies that for every investment opportunity implemented in the past, the agent’s effort generated a (weakly) lower value to the principal than the cost of compensation via this reward activity. Therefore, such reward activities cannot generate positive net value to the principal and may be ignored (any past investment that is incentivized via rewards of this type can be forgone, as the cost of incentivizing the agent to implement the investment is not less than the value generated by him doing so). To simplify the exposition of our results, we assume that all investment opportunities and reward activities may potentially be used in a profitable manner. That is, we make the following parametric assumptions:

B_I / l_I > C_1 / g_1  and  C_J / g_J < B_1 / l_1.

4 Were the value function to have a convex region, the principal could increase her continuation payoff by offering the agent a fair lottery over his continuation utility in this region.
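To make these screening conditions concrete, the following Python sketch checks the two orderings and the parametric assumptions for a set of made-up parameter values (none of the numbers B, l, C, g come from the paper), and computes the principal's net value from backing each investment type with each reward type.

```python
# Screening of investment/reward pairs; all parameter values below are made up.
# Types are ordered as in the text: B[i]/l[i] weakly decreasing in i,
# C[j]/g[j] weakly increasing in j (both orderings are without loss of generality).
B = [9.0, 6.0, 4.0]   # principal's benefit from investment type i
l = [3.0, 3.0, 4.0]   # agent's loss from investment type i
C = [1.0, 3.0]        # principal's cost of reward type j
g = [2.0, 2.0]        # agent's gain from reward type j

assert all(B[i] / l[i] >= B[i + 1] / l[i + 1] for i in range(len(B) - 1))
assert all(C[j] / g[j] <= C[j + 1] / g[j + 1] for j in range(len(C) - 1))

def profitable(i, j):
    """Incentivizing investment i via reward j costs l[i] * C[j] / g[j] to the
    principal, so the pair is profitable iff B[i] exceeds that cost."""
    return B[i] > l[i] * C[j] / g[j]

# The parametric assumptions of the text: B_I/l_I > C_1/g_1 and C_J/g_J < B_1/l_1.
assert B[-1] / l[-1] > C[0] / g[0]
assert C[-1] / g[-1] < B[0] / l[0]

for i in range(len(B)):
    for j in range(len(C)):
        net = B[i] - l[i] * C[j] / g[j]
        print(f"investment {i + 1} via reward {j + 1}: net value {net:+.2f} per unit intensity")
```

Note that the parametric assumptions do not make every pair profitable: with these illustrative numbers, investment 3 backed by reward 2 yields a negative net value, which is one reason the optimal mechanism favors cheap rewards.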

C. Preliminary Analysis

Before starting the main analysis, we provide two useful lemmata. The first lemma establishes that increasing the agent’s continuation utility decreases the principal’s value.

LEMMA 1: The value function V(u) is strictly decreasing.

This lemma has two useful corollaries.5

COROLLARY 1: In an optimal mechanism, u_0 = 0.

COROLLARY 2: In an optimal mechanism, the incentive constraints for investments are binding:

E[φ_i^{inv}(u)] = u + l_i D_i^{inv}(u)  ∀ i ∈ I, u ∈ [0, u̅].

Implementing an available investment is incentive compatible for the agent if doing so increases his continuation utility by the cost of implementation. As the discounted expected number of rewards is finite, there is an upper bound (u̅) on his continuation utility. Therefore, it is natural to define a notion of the principal’s (limited) capacity to incentivize additional investments (henceforth, we refer to the principal’s capacity to incentivize investments as her capacity). Formally, the capacity at state u is u̅ − u. Since rewards are short-lived, a reward that has not been used for compensation in the past cannot be used in the future, and thus this capacity is not only limited but also perishable. Considering the principal’s problem in terms of how to utilize her limited and perishable capacity turns out to be helpful in many parts of our analysis.

5 All subsequent results hold up to changes that affect the implementation of future investments and rewards with probability zero.


When an investment opportunity is available, the principal can commit some of her capacity in order to incentivize it. Since there is uncertainty regarding the future availability of investments and capacity is perishable, choosing not to seize the currently available opportunity can be optimal only if the capacity can be redirected to better investments in the future.6 Consequently, if the best investment opportunity is available, it must be implemented at full intensity (or at the maximal possible intensity if full intensity is not incentive compatible) to avoid the risk of capacity being wasted or devoted to inferior investments in the future.

LEMMA 2: In an optimal mechanism, D_1^{inv}(u) = min{1, (u̅ − u)/l_1}.

II. A Special Case: One Investment, One Reward

In this section, we illustrate and discuss a main property of the optimal mechanism: the unique optimal method for providing compensation by means of a certain type of reward activity is a time allowance. That is, the principal allows the agent to enjoy all rewards that arrive within a given time interval that starts at the present moment. To illustrate this property and its implications in the most transparent way, we consider the case with one type of investment project and one type of reward activity. Clearly, this dramatically simplifies the problem and mutes many of its dimensions; the principal does not need to calculate either the optimal combination of rewards (all rewards are of the same type) or which investment projects to incentivize if they become available (Lemma 2).

Even in this simple case, however, there are multiple ways in which the principal can compensate the agent. In addition to the proposed time-allowance solution, the principal can, for example, allow the agent to pursue a fixed number of rewards after every implemented investment, back-load compensation to the distant future, or employ explicit lotteries. We begin with an intuitive presentation of the time mechanism (henceforth the TM). Although this mechanism is a Markovian delegation mechanism, it is instructive to start with an alternative representation that uses the length of time in which the agent is allowed to enjoy rewards as a state variable. To simplify the exposition in this section, we omit investment and reward indexes.

Time Mechanism.—Let s ∈ [0, ∞] denote the amount of time, starting at the present moment, in which the agent is allowed to enjoy all available reward activities. Set s_0 = 0; that is, the agent’s initial time allowance is zero. When the state is s, and there is f(s) ∈ ℝ_+ that solves the indifference condition

∫_0^{f(s)} e^{−rt} λg dt − l = ∫_0^s e^{−rt} λg dt,

6 Foregoing an investment that is currently available in order to increase the capacity available for incentivizing identical investments in the future is strictly suboptimal as, with positive probability, the next investment opportunity will arrive only in the distant future and some capacity will be wasted.


the principal allows the investment project at full intensity and increases the agent’s time allowance to f(s) if an investment is implemented. If there is no such f(s) ∈ ℝ_+, the principal allows the investment project at the intensity α that solves

∫_0^∞ e^{−rt} λg dt − αl = ∫_0^s e^{−rt} λg dt

and increases the agent’s time allowance to s = ∞ if an investment is implemented.

The agent’s expected continuation utility when his time allowance is s is

u(s) ≡ ∫_0^s e^{−rt} λg dt = ((1 − e^{−rs})/r) λg.

Therefore, it is possible to transform the TM back into the general language of Markovian delegation mechanisms that use the agent’s continuation utility as a state variable. A formal definition of the TM as a Markovian delegation mechanism is provided in Appendix B.
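Since u(s) has a closed form, the indifference condition can be solved explicitly. The Python sketch below (all parameter values are hypothetical, chosen only for illustration) computes the post-investment allowance f(s) when it exists, and the scaled-down intensity α = (u̅ − u(s))/l when it does not.

```python
import math

# Closed-form objects for the time mechanism; parameter values are hypothetical.
r, lam, g, l = 0.1, 2.0, 1.0, 3.0   # discount rate, reward arrival rate, reward gain, investment loss

u_bar = lam * g / r                 # upper bound on the agent's continuation utility

def u(s):
    """Continuation utility u(s) = ((1 - e^{-rs})/r) * lam * g from a finite allowance s."""
    return (1 - math.exp(-r * s)) / r * lam * g

def next_state(s):
    """Post-investment (allowance, intensity) at state s.

    The indifference condition u(f(s)) = u(s) + l has a real solution
    iff u(s) + l < u_bar; otherwise the investment is implemented at
    intensity alpha = (u_bar - u(s)) / l and the allowance jumps to infinity.
    """
    target = u(s) + l
    if target < u_bar:
        f = -math.log(1 - r * target / (lam * g)) / r
        return f, 1.0
    return math.inf, (u_bar - u(s)) / l

f0, a0 = next_state(0.0)      # allowance granted for a first investment
assert abs(u(f0) - l) < 1e-9  # at s = 0, the new allowance is worth exactly l utils
```

With these numbers u̅ = 20, so early investments are rewarded with finite allowances; the partial-intensity branch is exactly the logic of Lemma 2 specialized to a single investment type.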

A. Optimality and Uniqueness

PROPOSITION 1: The time mechanism is the unique optimal mechanism.

PROOF: This proposition is a special case of Proposition 2. ∎

The simplicity of the J = I = 1 case enables us to explain the intuition behind this result and to highlight its main driving forces using basic economic concepts. Firstly, it is strictly desirable for the principal to provide compensation as quickly as possible in order to increase her available capacity to incentivize future investment opportunities. It is instructive to begin by considering the extreme situation where the principal has already committed to allowing the agent to enjoy all rewards indefinitely. In this case, the principal cannot incentivize additional investments, and her continuation value is the expected cost of allowing all future rewards, (C/g)u̅. If the principal were offered a one-time opportunity to provide the agent with u̅ utils immediately at a cost of (C/g)u̅, she would strictly benefit from accepting this offer, because although she would incur an identical direct cost of compensation, she would also be able to incentivize profitable investments in the future. In general, the principal has a limited capacity, and hence she will eventually be forced to forgo profitable investments. Thus, if the principal has an opportunity to increase her capacity (by permitting an available reward activity), she will strictly prefer to do so.

Secondly, conditional on every possible arrival time of the next investment, τ, the TM maximizes the principal’s expected capacity at τ. To see this, let u > 0, and denote by s the corresponding time allowance under the TM. If τ ≥ s, the agent’s continuation utility at τ equals zero (which is the lower bound for any IC mechanism), and hence the principal has the maximal available capacity, u̅. Now consider the case where τ < s. By the definition of the TM, regardless of the actual reward arrival process, all rewards arriving before time τ are allowed. Therefore, the TM attains the lower bound on the agent’s expected continuation utility at τ and hence maximizes the principal’s expected capacity at τ.

Thirdly, by the construction of the TM, conditional on the value of τ, there is no uncertainty regarding the principal’s available capacity at τ. Recall that the principal has a (weakly) concave value function over u; thus, a lottery over capacity (u̅ − u) cannot be beneficial. Therefore, minimizing the risk associated with the availability of future capacity is desirable for the principal.

We have shown that, under the TM, the weakly risk-averse principal enjoys the best expected outcome at the moment of the next investment opportunity, with no uncertainty. Hence, the TM must be an optimal mechanism. Moreover, for any mechanism that is distinct from the TM, there are values of τ for which there exist histories in which some rewards are not allowed at full intensity before time τ . Under such a mechanism, in expectation, the agent receives less compensation than under the TM, and hence the principal’s expected capacity at τ is strictly lower than under the TM. As the arrival time of the next investment opportunity is a random variable with full support, it follows that the TM is the unique optimal mechanism.

B. Properties of the Time Mechanism

Reward Independence.—The realized arrival of rewards is not taken into account under the TM. Thus, our result does not depend on the assumption that the principal observes implemented rewards. In some cases, it might be natural to assume that the principal only partially observes which rewards are implemented, possibly with some delay. In these environments, the TM remains the unique optimal mechanism.

Front-Loading.—Under the TM, the principal provides compensation for the investments the agent has already performed as early as possible (“front-loads compensation”) in two distinct ways. Firstly, the principal front-loads compensation within a history by using the earliest available opportunities to compensate the agent. Secondly, the principal front-loads compensation across different histories by providing compensation via a time allowance. Were the time required to provide the promised continuation utility different in two histories, compensation could be front-loaded by shifting a fraction of the compensation that should be paid in the distant future in one realization to an earlier point in time in the other realization. Doing so is desirable as the risk of wasting resources is higher in the realization with the earlier payment.

Foregone Investment Opportunities.—Under the TM, after a finite amount of time, the mechanism reaches an absorbing state in which no investments are implemented.7 By contrast, if the principal could provide instantaneous compensation (e.g., via monetary transfers8) at an identical direct cost to the one she incurs if reward activities are used for compensation, all available investments would be implemented. Forgoing some investment opportunities is an unavoidable consequence of the scarcity of compensation opportunities. The expected amount of time it takes to reach this absorbing state determines the (expected) welfare loss resulting from the lack of transfers. The time it takes to reach this state is decreasing in the arrival rate of investment projects and their implementation cost and increasing in the principal’s capacity to compensate the agent, ū.9

7 By the Borel-Cantelli lemma, under the TM, the agent’s continuation utility reaches ū in a finite time with probability one. In this absorbing state, the agent’s continuation utility never decreases, and no further investments are implemented.

8 Throughout this paper, we refer to transfers as a nonnegative payment the principal can deliver to the agent.

C. Value Function

The use of time allowances enables us to define a notion of the resources that the principal has at her disposal. If, at state s, the principal were required to marginally increase the agent’s continuation utility, she would do so by allowing him to enjoy all rewards that arrive in an infinitesimal time interval starting at s units of time in the future. Thus, the marginal resource at state s is the right to enjoy all reward activities in that infinitesimal interval.

The principal’s value at u is determined by the cost of paying the current debt (the agent’s continuation utility) and the amount of remaining resources that will successfully be utilized in expectation. In the limiting case where an investment opportunity is always available ( μ = ∞ ), the principal will successfully use all her resources to incentivize additional investment projects. Therefore, the value function in this limiting case has a linear form given by

V_∞(u) = (B/l)(ū − u) − λC/r.

In the opposite limiting case of μ = 0, only the cost of compensation remains. All of the uncommitted incentivization resources will be wasted, as no further investment opportunities are expected. Thus, the value function is given by

V_0(u) = −(C/g)u.

For any μ ∈ (0, ∞) , the value function lies inside the wedge created by the extreme value functions (an example is depicted in Figure 1). In Appendix C, we provide an algorithm for analytically deriving the value function.
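The two bounding value functions are easy to evaluate at the parameters of Figure 1, for which the capacity is ū = λg/r = 3. A minimal Python sketch (the numbers are taken directly from the figure caption; everything else is straightforward arithmetic):

```python
# Bounds on the value function in the single-investment, single-reward case,
# evaluated at the parameters of Figure 1.
r, lam, B, l, g, C = 1 / 20, 3 / 20, 5.0, 1.0, 1.0, 1.0
u_bar = lam * g / r  # capacity: utility of enjoying all rewards forever (= 3)

def V_inf(u):
    """Upper bound (mu = infinity): each util of free capacity earns the return B/l."""
    return (B / l) * (u_bar - u) - lam * C / r

def V_0(u):
    """Lower bound (mu = 0): only the cost of the promised compensation remains."""
    return -(C / g) * u

assert abs(V_inf(0.0) - 12.0) < 1e-9          # full capacity, no outstanding debt
assert abs(V_inf(u_bar) - V_0(u_bar)) < 1e-9  # the wedge closes at u = u_bar
```

The endpoints match the axes of Figure 1: the wedge opens at V_∞(0) = 12 and closes at u = ū = 3, where both bounds equal −λC/r = −3.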

The value obtained from successfully utilizing the marginal resource is constant; however, for any 0 < μ < ∞ , the probability of successfully utilizing the marginal resource depends on the state s . The marginal resource is wasted at state s only if no investment opportunity arrives in s units of time. Therefore, the probability of successfully utilizing the marginal resource is increasing in s . Due to discounting, the amount of time required to provide u utils via the TM is convex in u . This, in turn, implies that the value function is strictly concave in u despite the risk neutrality of both players and the linearity of the environment.10

9 There is no nontrivial bound on the welfare loss due to a lack of transfers. One can easily construct examples in which the percentage of implemented discounted investments approaches one or zero.

10 We prove this for our general model in Lemma 8.
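The convexity claim above can be checked in closed form in the single-reward case, where a time allowance of length s delivers utility u(s) = λg(1 − e^{−rs})/r. A short sketch using the Figure 1 parameters:

```python
import math

# Single-reward time-allowance accounting (parameters from Figure 1).
r, lam, g = 1 / 20, 3 / 20, 1.0

def utility(s):
    """Continuation utility delivered by a time allowance of length s."""
    return lam * g * (1 - math.exp(-r * s)) / r

def allowance(u):
    """Time allowance required to deliver continuation utility u (for u < lam*g/r)."""
    return -math.log(1 - r * u / (lam * g)) / r

# The two maps are inverses of each other.
assert abs(utility(allowance(1.5)) - 1.5) < 1e-9

# Convexity: second differences of allowance(u) are positive, which is what
# drives the strict concavity of the value function despite risk neutrality.
h = 0.01
for u in (0.5, 1.0, 1.5, 2.0, 2.5):
    assert allowance(u + h) - 2 * allowance(u) + allowance(u - h) > 0
```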

III. The General Case

Our main result is that in the general model presented in Section I, there is a unique optimal mechanism under which the principal selectively incentivizes investments and uses reward-specific time allowances to compensate and incentivize the agent.

Formally, we say that a Markovian delegation mechanism is a multidimensional time mechanism (henceforth MDTM) if it satisfies the following four properties: (i) for every j ∈ J, D_j^rew(·) is an increasing step function with a support of {0, 1}; that is, type j rewards are allowed at full intensity if u is sufficiently high, and forbidden otherwise. (ii) The implementation of a reward does not change the state of the mechanism, φ_j^rew(u) ≡ u. (iii) In the absence of an investment/reward, the process u_t is weakly decreasing, where du_t = (r u_t − Σ_{j : D_j^rew(u_t)=1} λ_j g_j) dt. (iv) All (IC_inv) are binding, and u increases deterministically upon implementation of an investment, φ_i^inv(u) ≡ u + l_i D_i^inv(u).

The key feature of MDTMs is that compensation is provided via J time allowances. In particular, for each MDTM, there is a function S : [0, ū] → ℝ₊^J, such that at state u the agent’s time allowance for reward j is S_j(u).11 The dynamics of these time allowances are simple. Strictly positive time allowances decrease at a constant rate of dt, and upon implementation of an investment all positive time allowances are increased by the same (state- and investment-specific) amount, while forbidden rewards may receive only a smaller (state-, investment-, and reward-specific) time allowance than the aforementioned increase.

11 Note that under MDTMs, u = Σ_{j∈J} λ_j g_j (1 − e^{−r S_j(u)})/r. Additionally, the time allowances are dynamically consistent in the sense that for u′ < u ∈ [0, ū): (i) there exists t < ∞ such that for all j with S_j(u) < ∞, S_j(u′) = max{S_j(u) − t, 0}, and (ii) S_j(u′) = ∞ ⇒ S_j(u) = ∞.

Figure 1. The Value Function and Its Bounds for r = 1/20, μ = 5/20, λ = 3/20, B = 5, l = g = C = 1

[Figure: the value function plotted against the agent’s utility, lying between its upper bound (V_∞) and lower bound (V_0).]

PROPOSITION 2: There is a unique optimal mechanism. Moreover, this mechanism is a multidimensional time mechanism.

We refer to the optimal mechanism as the generalized time mechanism (henceforth the GTM).

When I = J = 1, the TM is the only (unidimensional) time mechanism that satisfies Corollary 1 and Lemma 2, and thus Proposition 2 pins down the unique optimal mechanism. In the general case, there are many multidimensional time mechanisms that satisfy these requirements. For example, with two types of reward activities, there are many (two-dimensional) time allowances that provide the same level of expected compensation, and when there are two types of investments, mechanisms can differ in regard to when type 2 investments are incentivized.

The simple case with one type of investment and reward discussed above is illuminating in many respects; however, it is not rich enough to address two major aspects of the principal’s problem: the choice of what types of rewards to permit and what types of investments to incentivize. After giving a road map to the proof of Proposition 2 in the following subsection, we address these aspects in Section IIIB.

A. Road Map of Proof

The key step is to establish the following properties. First, it is without loss of generality to restrict attention to mechanisms that satisfy a generalized reward-independence property; namely, implementing rewards does not impact the dynamics of the mechanism. Second, it is without loss of generality to restrict attention to mechanisms where compensation can be represented by a function x(u) : [0, ū] → [0, J], such that at state u reward activities {1, …, ⌊x(u)⌋} are allowed at full intensity, reward ⌊x(u)⌋ + 1 is allowed at intensity x(u) − ⌊x(u)⌋, and all other rewards are forbidden. That is, mechanisms under which expensive rewards are allowed only if all cheaper rewards are allowed at full intensity.12 Finally, the optimal mechanism does not use explicit lotteries.13

For each such mechanism, denote by W(x) the instantaneous compensation provided to the agent when the control is x:

W(x) = ∫_0^x λ_⌈z⌉ g_⌈z⌉ dz,

and denote by C(x) the instantaneous cost of using this control:

C(x) = ∫_0^x λ_⌈z⌉ C_⌈z⌉ dz.
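These piecewise-linear integrals can be transcribed directly; a minimal sketch, in which the arrival rates and payoff values are hypothetical numbers chosen only to illustrate the ⌈z⌉ indexing:

```python
import math

def W(x, lam, g):
    """Instantaneous compensation under control x in [0, J]:
    the integral of lam_ceil(z) * g_ceil(z) over z in [0, x]."""
    k = math.floor(x)
    full = sum(lam[j] * g[j] for j in range(k))        # rewards 1..floor(x) at full intensity
    partial = lam[k] * g[k] * (x - k) if x > k else 0  # marginal reward at partial intensity
    return full + partial

def C_cost(x, lam, C):
    """Instantaneous cost of the control: same integral with g replaced by C."""
    return W(x, lam, C)

lam = [3 / 20, 2 / 20]  # hypothetical arrival rates for two reward types
g = [1.0, 1.5]          # hypothetical agent benefits
C = [1.0, 3.0]          # hypothetical principal costs (C2/g2 > C1/g1: reward 2 is dearer)

# Control x = 1.5: reward 1 at full intensity, reward 2 at intensity one half.
w = W(1.5, lam, g)       # 0.15*1.0 + 0.10*1.5*0.5
c = C_cost(1.5, lam, C)  # 0.15*1.0 + 0.10*3.0*0.5
```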

In combination with Corollaries 1 and 2, this enables us to write the (HJB) equation, corresponding to the (OBJ) problem, using as the control variables x(u) and {D_i^inv(u), φ_i^inv(u)}_{i∈I} (the desired intensity and compensation distribution for investment project implementation). The optimality condition is given by

(HJB)  sup_{x(u), {D_i^inv(u), φ_i^inv(u)}_{i∈I}} {−rV(u) + V′(u)[ru − W(x(u))] − C(x(u)) + Σ_{i∈I} μ_i (D_i^inv(u) B_i + V(φ_i^inv(u)) − V(u))} = 0,

subject to

x(u) ∈ [0, J],  D_i^inv(u) ∈ [0, min{1, (ū − u)/l_i}],  φ_i^inv(u) = u + D_i^inv(u) l_i,

where V′(u) is the left-hand derivative of V(u), which exists due to the concavity of V(·).

12 Recall that we compare different types of rewards in terms of the direct cost of providing one util, C_j/g_j, to the agent.

13 The class satisfying these restrictions is more general than the class of MDTMs.
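In the I = J = 1 special case, the (HJB) equation can be solved numerically by a crude value iteration on a grid, with the TM policy hardcoded (all rewards allowed while u > 0, investment intensity min{1, (ū − u)/l}). The sketch below uses the Figure 1 parameters; the grid size, time step, and iteration count are arbitrary choices for this illustration, not from the paper:

```python
import math

# Discrete-time value iteration for the single-investment, single-reward case.
r, mu, lam = 1 / 20, 5 / 20, 3 / 20
B, l, g, C = 5.0, 1.0, 1.0, 1.0
u_bar = lam * g / r                      # principal's capacity (= 3)
N = 151                                  # grid points on [0, u_bar]
du = u_bar / (N - 1)
grid = [i * du for i in range(N)]
dt = 0.05                                # time step of the discrete approximation
disc = math.exp(-r * dt)

def interp(V, u):
    """Linear interpolation of V on the grid, clamped to [0, u_bar]."""
    u = min(max(u, 0.0), u_bar)
    i = min(int(u / du), N - 2)
    w = (u - grid[i]) / du
    return (1 - w) * V[i] + w * V[i + 1]

V = [0.0] * N
for _ in range(3000):
    Vn = []
    for u in grid:
        D = min(1.0, (u_bar - u) / l)           # investment intensity upon arrival
        x = 1.0 if u > 0 else 0.0               # all rewards allowed while u > 0
        flow = (mu * D * B - x * lam * C) * dt  # expected instantaneous payoff
        drift = u + (r * u - x * lam * g) * dt  # deterministic drift of the state
        cont = mu * dt * interp(V, u + D * l) + (1 - mu * dt) * interp(V, drift)
        Vn.append(flow + disc * cont)
    V = Vn

# Up to discretization error, the computed value lies inside the wedge of
# Section IIC:  -(C/g)*u  <=  V(u)  <=  (B/l)*(u_bar - u) - lam*C/r.
```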

Using this representation, we establish that we can further restrict attention to mechanisms in which the set of allowed rewards is increasing in u and in which all types of rewards are used at full intensity or not at all.

Next, we show that, as under the TM, compensation via any reward is front-loaded. That is, if a reward-type is allowed almost surely at some points in time in the future, this reward must be allowed at full intensity at present. This, in combination with the independence property established above, implies that there exists an optimal mechanism that is a MDTM. Finally, we establish that the value function is strictly concave, which, in turn, implies that the optimal mechanism is unique.

B. Optimal Investment and Compensation Policies

For ease of exposition, we assume that the cost of providing a util, C_j/g_j, is strictly increasing in j and that the rate of return on investments, B_i/l_i, is strictly decreasing in i.14 This assumption entails no loss of generality (see Appendix D).

Dynamic Compensation Structure.—In Section II, we argued that front-loading compensation reduces the waste of compensation opportunities and thus minimizes the indirect (opportunity) cost of compensation. The same reasoning suggests that the principal is inclined to permit rewards of different types simultaneously. However, different types of rewards are associated with different direct costs for the principal. Thus, the principal might prefer to avoid permitting expensive rewards. The optimal compensation structure balances the trade-off between these direct and indirect costs.

Since the agent’s time allowances decrease over time when no investments are implemented, and we have shown that cheap rewards are allowed before expensive ones, there exist weakly increasing activation thresholds {û_j^rew}_{j=1}^J such that reward activity j is permitted if and only if u ≥ û_j^rew.

14 Recall that we have already ordered actions such that both ratios are weakly monotone.

A more interesting feature of compensation under the GTM is that the above activation thresholds are interior in the following sense: (i) cheap rewards are allowed strictly before the more expensive ones, and (ii) the principal permits expensive rewards before it is strictly necessary to do so. In other words, even though the principal could provide the promised utility using cheap rewards alone, at some states she prefers to use expensive rewards as well. For the next proposition, recall that the agent’s expected continuation utility from enjoying all future rewards of type j is ∫_0^∞ e^{−rt} λ_j g_j dt = λ_j g_j/r.

PROPOSITION 3: Activation thresholds of reward activities, û_j^rew, are strictly increasing in j and satisfy û_j^rew < Σ_{k=1}^{j−1} λ_k g_k/r.

We illustrate the intuition for this result using two types of rewards. When the agent’s continuation utility is very low, the amount of time required to compensate him by using only the cheap reward activity is very small. Therefore, with very high probability, the agent will have received all his compensation by the time the next investment opportunity arrives even if only rewards of type 1 are used. Hence, using the expensive type of reward is unlikely to increase the number of investments that will be implemented in the future, but it will definitely increase the cost of compensation. Thus, for sufficiently low levels of u, only rewards of type 1 are allowed.

The intuition for why the principal uses the expensive reward before the capacity of a cheaper reward is exhausted is also straightforward. By forbidding the expensive reward at present, the principal is wasting some capacity, albeit capacity with a low value (that is, incentives that are provided via the expensive reward). However, if the cheap reward is forbidden in the distant future, there are likely to be many additional opportunities to use it; thus, with high probability, the cheap reward that is forbidden in the distant future will not be wasted. Therefore, once the probability of wasting the high-value capacity is low enough, it is profitable to risk wasting some of it to avoid wasting the low-value capacity with certainty. As the probability of wasting the high-value capacity converges monotonically to zero with the time allowance of the cheap reward, allowing the expensive reward to be used before the cheap one is exhausted is beneficial.
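The vanishing waste probability in this argument is simply the tail of the exponential waiting time for the next investment. A minimal sketch, taking the arrival rate μ from the Figure 1 parameters:

```python
import math

# Probability that capacity committed now is "wasted": no investment
# opportunity arrives during the cheap reward's time allowance s.
mu = 5 / 20  # investment arrival rate (Figure 1 value)

def waste_prob(s):
    return math.exp(-mu * s)

# The waste probability falls monotonically to zero in s: once the cheap
# reward's allowance is long, adding the expensive reward risks little.
probs = [waste_prob(s) for s in (1, 5, 10, 20)]
assert all(a > b for a, b in zip(probs, probs[1:]))
```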

Proposition 3 is also informative about the combinations of time allowances that are provided in the GTM. Were the time allowance for the expensive reward greater than that of the cheap reward, then, if no investment arrived for a long enough time, the expensive reward would be allowed while the cheap reward was forbidden. Similarly, were the cheap reward allowed indefinitely while the expensive reward was not, we might reach a state where the cheap reward would be allowed indefinitely while the expensive one would be forbidden.

COROLLARY 3:

(i) If the time allowance of reward j is positive and finite, it is strictly greater than the time allowance of any reward j̃ > j.

(ii ) The time allowance of reward j is infinite if and only if the time allowance of all rewards is infinite.

(iii ) When an investment is implemented, strictly positive time allowances are increased by the same amount, and a smaller time allowance may also be granted for rewards that were initially forbidden.

Dynamic Investment Policy.—In the GTM, there are three forces that lead the principal to become more selective about the investments she incentivizes at higher levels of u. Firstly, as the agent’s continuation utility increases, the principal’s capacity to incentivize additional investments decreases. Secondly, by Proposition 3, the direct cost of incentivizing the agent is increasing in his continuation utility. Thirdly, the use of time allowances for compensation implies that the probability of a better investment opportunity arriving before the marginal resource (of any type of reward) is wasted is increasing in the agent’s continuation utility. In the next proposition, we show that there exist thresholds {û_i^inv}_{i=1}^I such that investment project i is incentivized if and only if u ≤ û_i^inv. The strict concavity of the value function implies that the thresholds û_i^inv are in fact strictly decreasing in i.

PROPOSITION 4: There exist thresholds ū = û_1^inv > û_2^inv > ⋯ > û_I^inv > 0, such that:

(i) If u ≤ û_i^inv, then investments of type i are incentivized at intensity min{1, (û_i^inv − u)/l_i}.

(ii) If u > û_i^inv, then investments of type i are forbidden.

This proposition, and the implication of Proposition 3 that the agent’s continuation utility drifts down as long as investments are not implemented (for u ∈ (0, ū)), provide insights into the nature of dynamic investment decisions. Specifically, it suggests a potential explanation of why an investment opportunity similar to one that was forgone in the past is implemented at present. After a large and profitable investment is carried out, the principal requires a high return on her limited resources to justify the incentivization of an investment. Therefore, despite the principal having the resources, some investment opportunities are temporarily forgone until the agent’s continuation utility decreases and the return on such investments is deemed acceptable.

A Continuum of Rewards and Investments.—The previous subsections study the optimal investment and compensation policies when there are finitely many types of investment and compensation opportunities. Due to our continuous-time model, one may wonder how sensitive the above-mentioned qualitative properties are to this finiteness assumption and whether anything substantial would change if we allowed for a continuum of investments and rewards.15 By using finite approximation techniques, it is straightforward to verify that with a continuum of investments/rewards there exists an optimal mechanism that is a MDTM. Furthermore, this MDTM satisfies the properties established in Propositions 3–4. However, the continuum of investments/rewards does have two minor implications for the conclusions that can be drawn from these propositions.

15 Formally, think of a model where investment and reward opportunities arrive according to two independent Poisson processes. When an investment (reward) opportunity is realized, its type is then drawn i.i.d. according to a continuous distribution function.

In the discrete model, the cheapest reward arrives with positive probability; thus, some compensation can be provided only via the cheapest reward. This, in combination with the front-loading of compensation (via any reward type), implies that if enough time has passed since the last investment was implemented, the agent is not allowed to enjoy any rewards. In the continuous model, it is not possible to provide any compensation only via the cheapest reward, and thus the agent is always allowed to enjoy some cheap rewards after the first investment has been implemented. However, the probability of an allowed reward arriving is arbitrarily small if no investment has been implemented for a long enough time.

Second, in the discrete model, the best investment opportunity arrives with strictly positive probability. Since this investment is always implemented and the principal’s capacity is limited, after a finite amount of time, the agent will cease implementing investments. In the continuous case, the best opportunity arrives with a probability of zero; thus, it is no longer true that the principal will exhaust her capacity to incentivize the agent in finite time. However, the expectation at time zero of the capacity that will be available at time t converges to zero as t → ∞. This, in turn, implies that the expected percentage of available investments the agent implements in the distant future is arbitrarily small.

C. The Asymmetric Impact of Stochasticity

The key feature of the environment we consider is the stochastic availability of investment and compensation opportunities. With regard to compensation opportunities, not only is it possible to essentially nullify the stochasticity at no cost, but doing so turns out to be the only correct policy. However, the stochasticity related to investment availability adversely affects the principal’s expected value. The agent’s inability to credibly commit to implementing investments in the future makes it impossible to use currently available rewards to incentivize future investments. Therefore, the principal is exposed to the risk of wasting incentivization opportunities due to the uncertain arrival of investment opportunities.

A natural way to assess the impact of the stochastic arrival in our environment is to scale the payoff parameters of a given type of investment/reward and to correct its arrival rate accordingly, such that the expected payoffs from implementing all the opportunities (of that specific type) arriving in a fixed time interval remain the same. Formally, we say that an investment type ⟨μ, B, l⟩ is smoothed by a factor of γ > 1 if it is replaced by ⟨γμ, B/γ, l/γ⟩. A similar notion is defined for rewards.
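The defining invariant of this operation, that the expected benefit and cost flows per unit of time are unchanged, can be checked mechanically; a minimal sketch, using the Figure 1 investment parameters:

```python
# Smoothing an investment type <mu, B, l> by a factor gamma > 1 replaces it
# with <gamma*mu, B/gamma, l/gamma>. The expected benefit flow (mu*B) and
# expected implementation-cost flow (mu*l) per unit of time are preserved,
# so only the arrival uncertainty is reduced.
mu, B, l = 5 / 20, 5.0, 1.0  # Figure 1 values

def smooth(mu, B, l, gamma):
    return gamma * mu, B / gamma, l / gamma

for gamma in (2.0, 10.0, 100.0):
    m2, B2, l2 = smooth(mu, B, l, gamma)
    assert abs(m2 * B2 - mu * B) < 1e-9  # benefit flow preserved
    assert abs(m2 * l2 - mu * l) < 1e-9  # cost flow preserved
```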

PROPOSITION 5: Smoothing investments increases V(u) for all u < ū. Smoothing rewards does not alter V(u).

To incentivize the agent to implement an investment, the principal must offer a sufficient number of discounted rewards, in expectation. When compensation is provided via time allowances, this is done by calculating the expected discounted number of rewards that will be available at each point in time and setting the time allowance to the required length. Since this compensation method satisfies the reward-independence property, in a sense, the stochastic nature of reward availability is eliminated. Thus, as long as compensation is provided in accordance with the unique optimal method, smoothing rewards does not affect the dynamics of investment implementation and the players’ expected payoffs.

To illustrate the gain from smoothing investments in the simplest possible way, suppose that there is only one type of investment and that each investment opportunity is “large” enough so that the agent’s continuation utility jumps to ū when the first investment is implemented. If the principal could replace this investment type by a “flow investment” that was always available (smoothing by an infinite factor), she would avoid the inevitable waste of capacity that occurs while waiting for the first investment opportunity to arrive. More generally, smoothing investments reduces the expected waste of the principal’s capacity to provide incentives and thus increases her expected value.

Finally, we comment that it is the uncertain arrival of investments, together with the limitedness and perishability of rewards (and not their stochastic arrival), that generates the non-monotone investment policy and the possible use of expensive rewards before cheap rewards are exhausted.

IV. Extensions

A. Monetary Transfers

In many applications, the agent may receive compensation via a combination of monetary transfers and reward activities. In this subsection, we will argue that the main insights of our model remain valid in these more general environments.

Monetary transfers are a limiting type of reward with g_j = C_j and an arrival rate that goes to infinity. This limiting type of reward is different from other rewards as it provides the principal with an unlimited capacity to compensate the agent. The existence of monetary transfers makes all rewards for which the cost of granting a util is more than one irrelevant. The principal’s unlimited ability to provide a util to the agent at a cost of one makes rewards for which the cost is higher inferior, and thus they will not be used. Rewards for which the cost of providing a util is less than one are used via a time allowance. This is summarized in the following proposition, whose proof is similar to that of Proposition 2 and is thus omitted.

PROPOSITION 6: Consider a model in which monetary transfers are available:

(i) Rewards for which C_j/g_j > 1 are not allowed in the optimal mechanism.

(ii) Rewards for which C_j/g_j ≤ 1 are used via time allowances.

The optimal mechanism in this setting resembles the GTM in some ways but differs in others. Firstly, unlike in the GTM, a certain type of compensation device (transfers) is used only after all cheaper rewards are exhausted. This difference reflects the unique nature of transfers as an instantaneous and nonperishable compensation device, a property that negates the need to start using transfers before it is strictly necessary to do so. Were the principal to use transfers while a more efficient reward were not fully committed, she could reduce the expected cost of compensation by decreasing the transfer and allowing the agent to enjoy that reward. If this change creates a shortage of rewards in the future, sufficient incentives can then be provided via transfers instead. Moreover, with positive probability, no such shortage arises and the agent’s time allowance reaches zero, in which case the cost of compensation is strictly reduced.

Secondly, when transfers are available, there is no longer an absorbing state in which all investments are forgone. More precisely, if all investments have a return greater than one, no investment is ever forgone; while if there exist investments that are profitable to incentivize only via rewards (and not via transfers), all such investments will eventually be forgone.

B. Public Observability of Rewards and Investments

We assume that only the agent observes whether investments and rewards are available. However, in the GTM, the agent reveals the availability of worthwhile investments without receiving any information rents. This raises a natural question about the effect of asymmetric information in our model. Were we to assume instead that both players (or only the principal) observe the availability of rewards, the GTM would remain the unique optimal mechanism. To see this, note that under the GTM the IC constraints for reward implementation are nonbinding.

However, the information asymmetry over investment opportunities is harmful to the principal as it decreases her ability to use her limited capacity effectively. Consider an environment identical to ours in all respects except that the availability of investment opportunities is observed by both players.16 In this environment, the principal can condition the set of permissible actions not only on implemented investments and rewards but also on their availability in the past.

We illustrate this in the special case of the model with one type of investment and one type of reward. For more general models with stochastic opportunities and symmetric information over availability, see Forand and Zápal (2018) and our companion paper Bird and Frug (2019). In this subsection, we omit investment and reward indexes.

Consider first the case where rewards are “relatively scarce,” l + (μl/r) ≥ λg/r. The following is an optimal mechanism: the agent is asked to implement all investments at intensity λg/(μl + rl), and all rewards are allowed at full intensity after the first investment arrives. If the agent forgoes an available investment, all rewards and investments are forbidden indefinitely.

16 To avoid confusion, we emphasize that the principal cannot compel the agent to implement an investment.

This mechanism is incentive compatible since by construction the agent’s continuation utility from implementing an investment is zero, which is equal to his continuation utility from not doing so. To see that the mechanism is optimal, observe that under this mechanism the agent receives the maximal possible expected compensation and has a participation utility of zero. Since under the TM some rewards are wasted with positive probability, this mechanism outperforms the TM.

In the case of “abundant rewards,” l + (μl/r) < λg/r, the following mechanism is optimal. The agent is asked to implement all investments at full intensity, and after the first implemented investment, all rewards are allowed at intensity (r + μ)l/(λg). If the agent forgoes an available investment, all rewards and investments are forbidden indefinitely. This mechanism is incentive compatible by construction and is optimal as all investments are implemented at the minimal possible (expected) cost. It outperforms the TM, as under the TM some investments are forgone.
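In both cases, the binding incentive constraint reduces to a single equation: the agent’s net gain from implementing an available investment must be exactly zero. A minimal numerical check (the parameter tuples are arbitrary illustrations; the identity holds algebraically in either regime, the regime condition only determines which mechanism is feasible):

```python
# IC check for the symmetric-information mechanisms (one investment type,
# one reward type). In each case, the agent's net gain from implementing an
# available investment must equal zero.
def net_gain_scarce(r, mu, lam, g, l):
    d = lam * g / (mu * l + r * l)  # investment intensity
    W = (lam * g - mu * d * l) / r  # value of all future rewards net of
                                    # future implementation costs
    return W - d * l                # gain from implementing now vs. punishment 0

def net_gain_abundant(r, mu, lam, g, l):
    y = (r + mu) * l / (lam * g)    # reward intensity (investments at d = 1)
    W = (y * lam * g - mu * l) / r
    return W - l

for params in [(0.05, 0.25, 0.15, 1.0, 1.0), (0.05, 0.1, 0.5, 2.0, 0.5)]:
    assert abs(net_gain_scarce(*params)) < 1e-12
    assert abs(net_gain_abundant(*params)) < 1e-12
```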

To summarize, the nature of compensation in our environment limits the principal’s capacity to compensate the agent. However, the degree to which the principal can utilize her limited capacity and the correct way to do so depend on the informational assumptions. When the principal observes which investments are available, she can contract over the arrival of future investment opportunities. Such contracts allow the principal to combine realizations in which investment opportunities are rare and the capacity constraint is slack with other realizations in which investments are abundant and the capacity constraint is binding. Under asymmetric information on the availability of investment opportunities, such contracts cannot be enforced, and thus asymmetric information exacerbates the severity of the limited capacity problem.

It is worthwhile to note that the existence of an “informational problem” (or lack thereof) is itself a consequence of the type of compensation we consider. In particular, in a model where compensation is provided solely via monetary transfers, the observability of investment opportunities does not impact the principal’s value. This is due to the fact that when transfers are the means of compensation, reimbursing the agent for his (exact) implementation cost after observing a completed investment is an optimal mechanism regardless of the observability of investment opportunities.

C. Commitment

In this subsection, we show that lack of commitment has only mild implications on our main qualitative results. In particular, the public perfect equilibrium that maximizes the principal’s payoff is a MDTM, albeit not the GTM.

The principal’s expected continuation value from a contract utilizing time allowances is independent of current reward availability, due to the Poisson arrival of rewards. Moreover, since we assume that the principal selects the delegation list without knowing whether rewards are available, the principal cannot revoke a time allowance exactly when the reward is available. Therefore, if the principal lacks commitment power, the only limitation on her ability to use time allowances is the existence of a maximal time allowance she can credibly give.

From the previous section, we know that a MDTM maximizes the principal's capacity to incentivize further investments conditional on the agent receiving a fixed continuation utility. Therefore, any continuation utility that the principal can credibly provide without commitment, she can also credibly provide via a MDTM. Thus, even without commitment power, the principal will use a MDTM. By forbidding all rewards, the principal can unilaterally obtain a nonnegative continuation value; therefore, her lack of commitment implies that in any equilibrium her continuation value is nonnegative at all times. In the class of MDTMs, this restriction is equivalent to putting an upper bound on the agent's continuation utility, which, in turn, implies that there is a finite bound on the length of the time allowances that are promised.

PROPOSITION 7: The public perfect equilibrium that maximizes the principal's value is implemented by a MDTM.

This result stands in contrast to the results derived in Möbius (2001) and Hauser and Hopenhayn (2008). They study repeated games where occasionally each player can grant a favor to his counterpart and find that efficient PPE are supported by "chip mechanisms" in which the principal caps the number of rewards the agent will receive. The difference in our results stems from the difference in the information structure. Whereas in the "trading favors" literature the principal privately observes available rewards, in our model, the agent is the one who observes reward availability. Therefore, not only can the agent detect (and punish) any deviation by the principal, but the principal's IC constraints are easier to satisfy as in expectation there is no immediate gain from forbidding a reward.

V. Related Literature

Our work contributes to the literature on dynamic principal-agent interactions by providing the first analysis of a dynamic model with uncertain availability of investments and rewards. The existing literature has addressed a plethora of economic questions in rich environments;17 however, it does so by assuming compensation (solely) via monetary transfers. We consider a model where the complexity stems from the dynamic stochasticity of action availability. Along other dimensions, our environment is simple; indeed, it is so simple that a direct comparison between our model and previous dynamic principal-agent models is not insightful.

However, our model does provide new insights into the long-standing debate over the optimal timing of compensation. Earlier work such as Lazear (1981) and Harris and Holmstrom (1982) generally suggests that when the principal has full commitment power, compensation should be back-loaded. Ray (2002) reaches the same conclusion when the principal does not have full commitment power. Later work on models of full commitment, such as Rogerson (1985) and Sannikov (2008), shows that the optimal timing of compensation is more complicated as it relates to the time-dependent effectiveness and cost of providing incentives, in which case either back-loading or front-loading may be optimal. In the present paper, we add a novel argument to this debate by pointing out that when compensation opportunities are perishable, the principal has an unambiguous incentive to front-load compensation.

17 Seminal papers in this field include the works of Baron and Besanko (1984), Rogerson (1985), Holmstrom and Milgrom (1987), and Spear and Srivastava (1987).

The type of uncertainty in our model is similar to the uncertainty in the "trading favors" literature (e.g., Möbius 2001 and Hauser and Hopenhayn 2008), which studies equilibria in games where each player occasionally has an opportunity to grant a favor to his counterpart at a cost to himself.18 Möbius (2001) suggests the use of a "chip mechanism" in which a player grants a favor if the difference between the number of favors granted and received is not too large, and Hauser and Hopenhayn (2008) improves on this mechanism by allowing the cost (in chips) of a favor to depend on the current chip distribution and adding a mean-reverting element to the chip distribution.

More closely related papers that study stochasticity in action availability are Bird and Frug (2019) and Forand and Zápal (2018). The key difference between those papers and ours is that they assume action availability is publicly observed but allow for non-stationary environments. Forand and Zápal (2018) considers a model with linear payoffs (as in this paper) and shows that there exists an optimal mechanism under which rewards are back-loaded while investments are front-loaded. That is, once the agent enjoys a certain reward-type, he is allowed to do so indefinitely at full intensity, and once an investment is implemented at partial intensity, it is never implemented again. Bird and Frug (2019), however, assumes that the payoffs from actions are convex (i.e., there are decreasing returns to the effort exerted on an investment opportunity). They characterize the unique optimal mechanism in a model reminiscent of the one suggested in this paper and show that, under symmetric observability, compensation must increase monotonically over time. More generally, they establish conditions under which the use of an arbitrary type of "compensation opportunity" must increase (and an arbitrary type of "investment opportunity" must decrease) over time in any interaction in which it is present.

Our work also contributes to the literature on delegation. Following the seminal work of Holmström (1977), the literature on delegation has focused on models where the agent has private information about a payoff-relevant state of nature. Melumad and Shibano (1991) and Alonso and Matouschek (2008) show that in a single delegation problem the optimal delegation rule is to cap the agent's decisions against his bias. Armstrong and Vickers (2010) considers a static delegation problem where, like in our model, the agent has private information about the available actions. They show that the principal permits an action if it provides her with a high enough payoff relative to the agent's utility from performing the same action. Frankel (2014, 2016a) studies a more complex environment where multiple tasks are delegated. Frankel (2016a) shows that in the multidimensional case, capping the weighted average of the agent's actions remains an effective mechanism. By contrast, Frankel (2014) shows that if there is sufficient uncertainty about the agent's preferences, the max–min optimal mechanism is a "moment mechanism" under which all agent types act as if they have the same preferences as the principal.

18 This type of uncertainty is different from the uncertainty resulting from imperfect monitoring (moral hazard) that is studied in many other continuous-time models; see, e.g., Sannikov (2008), Biais et al. (2010), and Fong and Li (2017).

Guo and Hörner (2015); Frankel (2016b); and Li, Matouschek, and Powell (2017) analyze dynamic delegation where, unlike in our model, the key uncertainty is over the payoffs from available actions. These papers assume the agent's preferred action is always available and focus on how to incentivize him to choose the principal's preferred action instead. Moreover, the last two papers effectively assume that there are at most two actions apart from the default of doing nothing, and the first two papers assume that the payoffs from previous actions are not observed. Lipnowski and Ramos (2017) studies a similar setting in which the principal's lack of commitment is binding and characterizes the equilibrium payoff in these cases. Several works of a more applied nature have also focused on this environment. Guo (2016) studies the delegation of experimentation when no transfers are permitted, and Nocke and Whinston (2010, 2013) analyzes the optimal dynamic merger policy for an antitrust authority.

VI. Concluding Remarks

This paper studied incentivization in a dynamic principal-agent model where investment and compensation opportunities arrive stochastically over time and are privately observed by the agent. Our main result shows that there exists a unique optimal mechanism in this environment and that the mechanism satisfies several natural and intuitive properties.

First, we established that the unique optimal way to provide incentives is by updating the reward-specific time allowances (the time intervals within which all rewards that become available are allowed) upon implementation of each desired investment. Every other method of compensation needlessly wastes some of the principal’s limited capacity to provide incentives, which, in turn, reduces the number of investments that will be implemented.

We also identified the qualitative properties of the dynamic investment policy and the relation between the time allowances of different types of rewards. We showed that the limited capacity to provide compensation and uncertain investment availability lead the principal to utilize expensive compensation opportunities before cheaper rewards are exhausted and that, in general, the investment policy is non-monotone over time (the principal may become more or less selective regarding which investment opportunities to incentivize).

An important assumption in our analysis is that, even though the agent's private information is richer than the public information, both players always share the same beliefs about the availability of investments and rewards in the future. If this were not the case (i.e., if the arrival of investments and rewards were not governed by Poisson processes), we would have to consider two additional trade-offs that are not covered in this work: first, the agent's trade-off between receiving an immediate payoff and manipulating the principal's beliefs and, second, the principal's trade-off between learning about the future and maximizing her continuation value based on her current beliefs. We believe these additional considerations create an interesting dynamic adverse-selection problem, which we leave for future research.

Appendix A: Proofs

PROOF OF LEMMA 1: Let $\hat D(\cdot)$ be a mechanism (a mapping from histories to lotteries over delegation lists) under which the agent's and principal's payoffs are $u > 0$ and $v$, respectively. To prove the lemma, we show that, for $u' < u$ (sufficiently close), there exists an incentive-compatible mechanism under which the agent gets $u'$, and the principal's expected value is greater than $v$. Let $H_1(\alpha, \delta, T)$ be the set of all histories under $\hat D(\cdot)$ in which the first reward is implemented (i) before the first investment and before time $T$, (ii) at an intensity of at least $\alpha$, and (iii) with an incentive constraint that is slack by at least $\delta$.

Case 1: There exist $\alpha > 0$, $\delta > 0$, $T < \infty$ such that the measure of $H_1(\alpha, \delta, T)$ is positive.

There exists a function $\epsilon(h) > 0$ such that modifying the mechanism $\hat D(\cdot)$ and setting the intensity of the first reward at histories consistent with $H_1(\alpha, \delta, T)$ at $(1 - \epsilon(h))$ of the intensity under $\hat D(\cdot)$ induces an incentive-compatible mechanism. This modification reduces the expected amount of rewards without affecting investments, and the lemma follows.

Case 2: The measure of $H_1(\alpha, \delta, T)$ is zero for all $\alpha > 0$, $\delta > 0$, $T < \infty$.

In this case, we show there exists a positive measure of histories for which rewards are implemented only after the first investment, and the incentive constraint of this investment is not binding.

Assume to the contrary that under $\hat D(\cdot)$, rewards are implemented before the first investment with probability one. Since we are in Case 2, the IC constraint for the first reward is binding with probability one. This, in turn, implies that the agent is indifferent between concealing all rewards (and hence implementing no investments) and following the mechanism, in contradiction to our assumption that $u > 0$.

To see why there exists a positive measure of histories for which, in addition to the previous condition, the incentive constraint of the first investment is slack, suppose that in the set of histories where rewards are not implemented prior to investments, the agent is indifferent between implementing and concealing the first investment, with probability one. It follows that his expected payoff under $\hat D(\cdot)$, conditional on this set of histories, is zero. Since $u > 0$, it must be the case that the agent strictly benefits, in expectation, from implementing rewards prior to the first investment, contradicting Case 2.

Let $H_2(\alpha, \delta, T)$ be the set of all histories under $\hat D(\cdot)$ where the first reward is implemented (i) between the first and second investments, and before time $T$, (ii) at an intensity of at least $\alpha$, and (iii) such that the incentive constraints of the first investment and reward are slack by at least $\delta$. If there exist $\alpha > 0$, $\delta > 0$, $T < \infty$ such that $H_2(\alpha, \delta, T)$ has a strictly positive measure, we can modify $\hat D(\cdot)$ by slightly reducing the intensity of the first reward at histories consistent with $H_2(\alpha, \delta, T)$, as in Case 1, without violating any of the incentive constraints. This is because all of the incentive constraints prior to the modification were slack, and the modification does not affect the constraints that come after it.

If such $\alpha > 0$, $\delta > 0$, $T < \infty$ cannot be found, as before, there must exist a positive measure of histories for which rewards are implemented only after two completed investments are observed, and the incentive constraints of these investments are slack. The argument develops as before, until we find the smallest $k$ such that there is a strictly positive measure of histories where the agent enjoys a strictly positive (expected) gain from implementing rewards between the $k$th and $(k+1)$th investment, and all of the previous incentive constraints are slack. We then modify the mechanism in an analogous way to that suggested in Case 1. Eventually, such a $k$ will be found, as otherwise we would get a contradiction to the assumption that the agent's expected payoff from the mechanism is positive. ∎

PROOF OF LEMMA 2: Let $\hat u$ be such that $D_1^{inv}(\hat u) < \min\{1, (\bar u - \hat u)/l_1\}$ and suppose that an investment of type 1 is currently available. Denote by $u = \hat u + l_1 D_1^{inv}(\hat u)$ the agent's continuation utility after the investment is implemented. We show that there exist $\alpha, \epsilon > 0$ such that implementing the investment at intensity $D_1^{inv}(\hat u) + \alpha$, allowing all rewards for $\epsilon$ units of time, and then resuming the original mechanism at state $u$ (i) is incentive compatible, (ii) satisfies the (PK) condition, and (iii) is profitable for the principal.

Assume that a type-1 investment opportunity is currently available. By the (PK) condition, the intensity of the investment project that can be incentivized for a given ϵ is the α(ϵ) that solves

$$u = -\alpha(\epsilon)\, l_1 + \frac{1 - e^{-r\epsilon}}{r} \sum_{j\in J} \lambda_j g_j + e^{-r\epsilon} u,$$

$$\alpha(\epsilon) = \frac{\dfrac{1 - e^{-r\epsilon}}{r} \sum_{j\in J} \lambda_j g_j - (1 - e^{-r\epsilon})\, u}{l_1}.$$
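As a numerical sanity check, the closed form for $\alpha(\epsilon)$ can be substituted back into the (PK) condition it was derived from. The sketch below uses purely illustrative parameter values (the rates, gains, and costs are not taken from the paper):

```python
import math

# Illustrative parameters (not from the paper): discount rate r, implementation
# cost l_1 of a type-1 investment, continuation utility u, and two reward types
# with Poisson arrival rates lambda_j and gains g_j.
r, l1, u, eps = 0.1, 2.0, 3.0, 0.7
lam, g = [0.5, 0.3], [1.0, 2.0]

sum_lg = sum(lj * gj for lj, gj in zip(lam, g))

# Closed form for alpha(eps) obtained by solving the (PK) condition.
alpha = ((1 - math.exp(-r * eps)) / r * sum_lg - (1 - math.exp(-r * eps)) * u) / l1

# (PK): u = -alpha*l_1 + ((1 - e^{-r*eps})/r) * sum_j lambda_j g_j + e^{-r*eps} * u
pk_rhs = -alpha * l1 + (1 - math.exp(-r * eps)) / r * sum_lg + math.exp(-r * eps) * u
assert abs(pk_rhs - u) < 1e-12
```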

Thus, the continuation value from the modified mechanism is

$$\alpha(\epsilon)\, B_1 - \frac{1 - e^{-r\epsilon}}{r} \sum_{j\in J} \lambda_j C_j + e^{-r\epsilon} V(u).$$

We need to show that

$$V(u) < \alpha(\epsilon)\, B_1 - \frac{1 - e^{-r\epsilon}}{r} \sum_{j\in J} \lambda_j C_j + e^{-r\epsilon} V(u).$$

Rearranging terms, we get

$$V(u) < \frac{B_1 \sum_{j\in J} \lambda_j g_j}{r\, l_1} - \frac{\sum_{j\in J} \lambda_j C_j}{r} - u\, \frac{B_1}{l_1}.$$

Denote by $\bar V(u)$ the value function in the auxiliary model where an investment of type 1 is always available. In this model, the principal uses all of her rewards to incentivize investment 1, and thus

$$\bar V(u) = \left( \frac{\sum_{j\in J} \lambda_j g_j}{r} - u \right) \frac{B_1}{l_1} - \frac{\sum_{j\in J} \lambda_j C_j}{r}.$$

Clearly, $V(u) \le \bar V(u)$ and, moreover, this inequality is strict for $u < \bar u$ since the event that no investment projects arrive for $T$ units of time has positive probability for any $T$.

Thus, for any $u < \bar u$,

$$V(u) < \bar V(u) = \frac{B_1 \sum_{j\in J} \lambda_j g_j}{r\, l_1} - \frac{\sum_{j\in J} \lambda_j C_j}{r} - u\, \frac{B_1}{l_1}. \;\; ∎$$

PROOF OF PROPOSITION 2: Recall that $x(u) : [0, \bar u] \to [0, J]$ is the function that specifies which rewards are allowed at $u$. In particular, reward activities $\{1, \dots, \lfloor x(u)\rfloor\}$ are allowed at full intensity, reward $\lfloor x(u)\rfloor + 1$ is allowed at intensity $x(u) - \lfloor x(u)\rfloor$, and all other rewards are forbidden. Furthermore,

$$W(x) = \int_0^x \lambda_{\lceil z\rceil}\, g_{\lceil z\rceil}\, dz, \qquad C(x) = \int_0^x \lambda_{\lceil z\rceil}\, C_{\lceil z\rceil}\, dz.$$
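To make the decomposition concrete, the following sketch implements the delegation list induced by a scalar $x$ and the piecewise-linear $W(x)$; all parameter values are illustrative and not from the paper:

```python
import math

def reward_intensities(x, J):
    """Delegation list induced by x in [0, J]: rewards 1..floor(x) at full
    intensity, reward floor(x)+1 at the fractional part, the rest forbidden."""
    k = math.floor(x)
    return [1.0 if j <= k else (x - k if j == k + 1 else 0.0)
            for j in range(1, J + 1)]

def W(x, lam, g):
    """W(x) = int_0^x lambda_{ceil z} g_{ceil z} dz: piecewise linear, with
    slope lambda_j g_j on the segment [j-1, j]."""
    total, remaining = 0.0, x
    for lj, gj in zip(lam, g):
        step = min(1.0, max(0.0, remaining))
        total += step * lj * gj
        remaining -= step
    return total

lam, g = [0.5, 0.3, 0.2], [1.0, 2.0, 4.0]
D = reward_intensities(1.5, 3)   # [1.0, 0.5, 0.0]
w = W(1.5, lam, g)               # 1 * 0.5 * 1.0 + 0.5 * 0.3 * 2.0 = 0.8
```

The same loop with $C_j$ in place of $g_j$ gives $C(x)$.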

First, we prove a sequence of lemmata that jointly establish that there is an optimal mechanism that is a MDTM.

LEMMA 3: It is without loss of generality to restrict attention to mechanisms with the following properties:

(i) $\varphi_j^{rew}(u) = u$ for all $j \in J$, $u \in [0, \bar u]$.

(ii) $D_j^{rew}(u)$ can be written as

$$D_j^{rew}(x(u)) = \begin{cases} 1 & \text{if } j \le \lfloor x(u)\rfloor, \\ x(u) - \lfloor x(u)\rfloor & \text{if } j = \lfloor x(u)\rfloor + 1, \\ 0 & \text{if } j > \lfloor x(u)\rfloor + 1. \end{cases}$$

(iii) $F_\tau(u)$ is a Dirac distribution with a mass point at $e^{r\tau} u - \int_0^\tau e^{r(\tau - s)}\, W(x(u_s))\, ds$.

PROOF: Consider an arbitrary IC mechanism and assume that the current state is $u_0 > 0$. Denote by $\tau$ the time until the next investment opportunity arrives. For any pair $u_0, \tau$ and $t < \tau$, the above mechanism generates a distribution of the continuation utility at time $t$ in terms of time-0 utils, $e^{-rt} u_t \,|\, u_0$, and this distribution is independent of $\tau$. The expected continuation utility (measured in terms of time-0 utils) is weakly decreasing (by assumption, no investment projects are carried out during this time) and continuous. Continuity follows from rearranging the (PK) condition to get

$$\text{(A1)}\qquad e^{-rt}\, E[u_t \,|\, u_0] = u_0 - E\left[ \int_0^t e^{-rs} \sum_{j\in J} D_j^{rew}(u_s)\, g_j\, dN_{j,s}^{rew} \right].$$

Therefore, $e^{-rt} E[u_t \,|\, u_0]$ is differentiable almost everywhere (in $t$); moreover, equation (A1) shows that its derivative belongs to the interval $[-e^{-rt} \sum_{j\in J} \lambda_j g_j,\, 0]$. Therefore, $\partial E[u_t \,|\, u_0]/\partial t \in [\,r E[u_t \,|\, u_0] - \sum_{j\in J} \lambda_j g_j,\; r E[u_t \,|\, u_0]\,]$ almost everywhere, and we can replicate the dynamics of $E[u_t \,|\, u_0]$ by finding the unique $x \in [0, J]$ for which

$$\text{(A2)}\qquad \frac{\partial E[u_t \,|\, u_0]}{\partial t} = r E[u_t] - W(x).$$

Since we are focusing on a Markovian solution, we suppress the time index and denote the solution to equation (A2) by x(u) .
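Since $W$ is piecewise linear and increasing in $x$, the $x$ solving equation (A2) can be recovered by inverting $W$ segment by segment. A minimal sketch, with illustrative arrival rates and gains (and assuming every $\lambda_j g_j > 0$):

```python
def W_inv(target, lam, g):
    """Invert W(x) = int_0^x lambda_{ceil z} g_{ceil z} dz, which has slope
    lambda_j g_j on [j-1, j]. Assumes 0 <= target <= W(J) and positive slopes."""
    x, acc = 0.0, 0.0
    for lj, gj in zip(lam, g):
        slope = lj * gj
        if acc + slope >= target:
            return x + (target - acc) / slope
        acc += slope
        x += 1.0
    return x  # target == W(J)

lam, g = [0.5, 0.3, 0.2], [1.0, 2.0, 4.0]
# Segment totals are 0.5, 0.6, 0.8, so a target of 0.8 uses all of segment 1
# and half of segment 2: x = 1.5.
x = W_inv(0.8, lam, g)
```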

By construction, the reward component described above induces the minimal expected cost (for the principal) out of all the compensation schemes that induce the same expected rate of compensation as that of the chosen mechanism. Since there is no uncertainty about the value of $u_t$ under this scheme, it is clear that replacing the reward component of the original mechanism with the one constructed above does not harm the principal, who has a weakly concave value function.

Since the set of points where $E[u_t \,|\, u_0]$ is non-differentiable is of measure zero, the delegation mechanism can be completed in an arbitrary way. ∎

We have shown directly that there exists an optimal Markovian mechanism in which there are no lotteries over $u$ between implemented investments for any $u \in [0, \bar u]$. It follows that lotteries are also not needed to incentivize investments. With a slight abuse of notation, for any $u, i$ we denote by $\varphi_i^{inv}(u) \in [0, \bar u]$ the agent's (deterministic) continuation utility after implementing investment $i$ (at the allowed intensity) in state $u$.

As the value function is weakly concave, it is differentiable almost everywhere and has directional derivatives at all points. We use the notation V′(u) to represent the left derivative of V(u) .


In combination with Corollaries 1 and 2, this enables us to write the (HJB) equation corresponding to problem (OBJ), using as control variables $x(u)$ and $\{D_i^{inv}(u), \varphi_i^{inv}(u)\}_{i\in I}$. The optimality condition is given by

$$\text{(HJB)}\qquad \sup_{x(u),\, \{D_i^{inv}(u),\, \varphi_i^{inv}(u)\}_{i\in I}} \left\{ -rV(u) + V'(u)\big[ru - W(x(u))\big] - C(x(u)) + \sum_{i\in I} \mu_i \Big( D_i^{inv}(u)\, B_i + V\big(\varphi_i^{inv}(u)\big) - V(u) \Big) \right\} = 0,$$

subject to

$$x(u) \in [0, J], \qquad D_i^{inv}(u) \in \left[0,\, \min\left\{1,\, \frac{\bar u - u}{l_i}\right\}\right], \qquad \varphi_i^{inv}(u) = u + D_i^{inv}(u)\, l_i.$$

LEMMA 4: There exists an optimal x(u) that is weakly increasing.

PROOF: Assume that $k_1$ is the solution for $x(u)$ in the (HJB) equation. This implies that

$$-V'(u)\, W(k_1) - C(k_1) \ge -V'(u)\, W(k_2) - C(k_2) \qquad \forall\, k_2 < k_1.$$

From the weak concavity of $V(u)$, for any $\tilde u > u$, we have $V'(\tilde u) \le V'(u)$. Therefore,

$$-V'(\tilde u)\, W(k_1) - C(k_1) \ge -V'(\tilde u)\, W(k_2) - C(k_2) \qquad \forall\, k_2 < k_1.$$

This implies that there is an optimal solution for $x(\tilde u)$ that is at least $k_1$. ∎

LEMMA 5: The image of any optimal non-decreasing x(u) is contained in {0, … , J } .

PROOF: Assume there is a state $u$ at which the mechanism spends a strictly positive amount of time with $x(u) \notin \{0, \dots, J\}$, and denote $k = \lceil x(u)\rceil$ and $\alpha = x(u) - \lfloor x(u)\rfloor$. For this to be true, we must have $ru = W(x(u))$, which in combination with Lemma 4 implies that from now on reward $k$ is allowed at an intensity of at least $\alpha$. Now consider the following deviation: allow reward $k$ at intensity 1 for $\epsilon_1$ discounted units of time (or until the arrival of the next investment) and then decrease the intensity at which reward $k$ is allowed by $\alpha$ for $\epsilon_2 = \big((1-\alpha)/\alpha\big)\epsilon_1$ units of time in all histories. Clearly, this deviation does not violate the IC for any reward or investment and provides the agent with a continuation utility of $u$. Moreover, this modification frees up capacity and thus increases the principal's value. ∎
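The utils accounting behind this deviation can be checked in one line: raising the fractional reward from intensity $\alpha$ to 1 for $\epsilon_1$ discounted units of time adds $(1-\alpha)\epsilon_1$ utils (per $\lambda_k g_k$), while lowering the intensity by $\alpha$ for $\epsilon_2 = ((1-\alpha)/\alpha)\epsilon_1$ units removes $\alpha\epsilon_2$. A sketch with an illustrative $\alpha$ (any value in $(0,1)$ works):

```python
# Illustrative fractional intensity of the marginal reward (not from the paper).
alpha, eps1 = 0.25, 1.0
eps2 = (1 - alpha) / alpha * eps1  # duration of the offsetting reduction

gain = (1 - alpha) * eps1  # extra utils (per lambda_k g_k) while intensity is 1
loss = alpha * eps2        # utils forgone while intensity is reduced by alpha
assert abs(gain - loss) < 1e-12
```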

LEMMA 6: Under an optimal mechanism, $u > \big(\sum_{i=1}^{j} \lambda_i g_i\big)/r$ implies that $x(u) \ge j + 1$.


The proof of this lemma is a bit long and computational, and so we first give a brief intuition. If the assertion in the lemma is false, we can construct a profitable deviation. To do so, we provide the agent with a small amount of utils via the cheapest reward that will not be used under the original mechanism (until additional investments are implemented) in the near future and then offset this change by revoking his right to enjoy more expensive rewards afterward. Every investment opportunity that would have been incentivized under the original mechanism is also incentivized under the new mechanism. The new mechanism weakly reduces the cost of providing the promised continuation utility and with positive probability also increases the number of investments that can be incentivized.

PROOF: Assume to the contrary that there exists $\tilde u$ such that $W(x(\tilde u)) < r\tilde u$. First, we define the reward to be added and for how many units of time it is to be added. Given Lemma 4, our assumption implies that there exists an open interval $\Upsilon$ such that for all $u \in \Upsilon$, $x(u) = k < j + 1$ and $\sum_{i=1}^{k} \lambda_i g_i < ru$. Consider $u_0 \in \Upsilon$:

• There exists $\epsilon_1 > 0$ such that if no investment project is implemented before time $t < \epsilon_1$, then $u_t \in \Upsilon$. That is,

$$u_0 - \frac{1 - e^{-\epsilon_1 r}}{r} \sum_{i=1}^{k} \lambda_i g_i < \sup(\Upsilon).$$

• There exists $\epsilon_2 > 0$ such that if no investment is implemented by time $\epsilon_2$, the continuation utility after $\epsilon_2$ units of time in which reward activities 1 to $k+1$ are allowed at full intensity is in $\Upsilon$. That is,

$$\inf(\Upsilon) < u_0 - \frac{1 - e^{-\epsilon_2 r}}{r} \sum_{i=1}^{k+1} \lambda_i g_i < \sup(\Upsilon).$$

• Since $\delta \equiv u_0 - (1/r) \sum_{i=1}^{k} \lambda_i g_i > 0$, there exists $T$ such that by time $T$ at least $\delta/2$ discounted utils are provided (in expectation) to the agent via reward activities $\{k+1, \dots, J\}$. This implies that there exists $\epsilon_3 > 0$ such that reward $k+1$ is available to the agent, in expectation, for at least $\epsilon_3 > 0$ discounted units of time.

Let $\epsilon \in (0, \min\{\epsilon_1, \epsilon_2, \epsilon_3\})$. Add reward activity $k+1$ to the delegation list for $\epsilon$ units of time, unless an investment is implemented beforehand, in which case treat its arrival time as $\epsilon$. This change provides $\xi(\epsilon) = \big((1 - e^{-\epsilon r})/r\big)\, \lambda_{k+1}\, g_{k+1}$ utils to the agent.

Now, define the rewards that are to be revoked. Construct a process that measures the discounted expected utils the agent receives from rewards $\{k+1, \dots, J\}$ after $\epsilon$ units of time under the original mechanism:

$$y_t = \int_\epsilon^t e^{-rs} \sum_{i=k+1}^{J} \lambda_i g_i\, D_i^{rew}(u_s)\, ds.$$


While $y_t < \omega(\epsilon)$, the compensation under the new mechanism is given by $\tilde x_t = \min\{k, x_t\}$, where $\omega(\epsilon)$ is the unique solution to

$$\xi(\epsilon) = E\big[\min\{\omega(\epsilon),\, y_\infty\}\big].$$

Once $y_t \ge \omega(\epsilon)$, return to the original mechanism. The new mechanism provides the same expected continuation utility as the original one, since the first change provides $\xi(\epsilon)$ discounted utils to the agent and the second change counterbalances this increase.

The new mechanism weakly reduces the cost of compensation, as $\xi(\epsilon)$ discounted utils are provided via reward activity $k+1$, as opposed to some mixture of (weakly) more expensive rewards under the original mechanism. Moreover, with positive probability, $u_t = \bar u$ and $y_t < \xi(\epsilon)$, in which case additional investments may be implemented under the new mechanism. ∎

By Lemma 3, there exists an optimal mechanism in which the set of allowed rewards is fully determined by the arrival of past investments. By Lemmata 4, 5, and 6, the principal front-loads compensation via each reward to which she has committed. Taken together, these findings show that there exists an optimal mechanism that is a MDTM.

LEMMA 7: For all $j < J$, there exists $\tilde u(j) < \sum_{k=1}^{j} \lambda_k g_k / r$ for which, under the optimal mechanism, $x(\tilde u(j)) \ge j + 1$.

The main part of this proof is to show that $ur < W(x(u))$ for all $u \in (0, \bar u)$. The result then trivially follows from the monotonicity of $W(x(u))$ and the fact that $x(0) = 0$. By Lemma 6, we know that $ur \le W(x(u))$; thus, it is enough to show a profitable deviation if $ur = W(x(u))$. In a MDTM, the condition $ur = W(x(u))$ implies that the agent is allowed to enjoy rewards $\{1, \dots, j-1\}$ indefinitely and is not allowed to enjoy the more expensive rewards. We show that by providing the agent with a small time allowance for reward $j$, the principal increases her capacity to incentivize investments via rewards $\{j, \dots, J\}$, and with an arbitrarily high probability she still manages to use all rewards of types $\{1, \dots, j-1\}$. This implies that providing the agent with a small time allowance for reward $j$ increases the principal's continuation payoff.

PROOF: Consider such a $u$, denote $j = x(u) + 1$ and $i^* = \arg\max_i \{D_i^{inv}(u) : D_i^{inv}(u) > 0\}$ (the worst investment incentivized at $u$), and construct a deviation as follows. Remove reward activity 1 from the delegation list between periods $T$ and $T + k$, and add activity $j$ to the delegation list for the next $d$ units of time (or until a permitted investment project arrives, in which case consider this arrival time to be $d$). Incentivize the next investment project by first returning reward activity 1 to the delegation list between $T$ and $T + k$ and then reverting to the original compensation strategy.


For the initial change to satisfy the (PK) condition, it must be the case that

$$d = \frac{1}{r}\, \log\left( \frac{1}{1 - \dfrac{g_1 \lambda_1 (e^{kr} - 1)\, e^{-r(T+k)}}{g_j \lambda_j}} \right).$$

Moreover, we choose $k$ such that the next investment cannot be incentivized by reward activity 1 alone:

$$\min_{i:\, D_i^{inv}(u) > 0} l_i\, D_i^{inv}(u) > \lambda_1 g_1\, \frac{1 - e^{-rk}}{r}.$$
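The closed form for $d$ can be verified numerically against the (PK) requirement that the discounted utils gained from reward $j$ over $[0, d]$ equal those lost from revoking reward 1 over $[T, T+k]$. The parameter values below are illustrative only:

```python
import math

# Illustrative parameters (not from the paper).
r = 0.1
g1, lam1 = 1.0, 0.5   # reward 1: gain and Poisson arrival rate
gj, lamj = 3.0, 0.4   # reward j
T, k = 5.0, 2.0       # reward 1 is revoked on [T, T + k]

# Closed form for d derived from the (PK) condition.
d = math.log(1.0 / (1.0 - g1 * lam1 * (math.exp(k * r) - 1)
                    * math.exp(-r * (T + k)) / (gj * lamj))) / r

gain = gj * lamj * (1 - math.exp(-r * d)) / r                     # utils added on [0, d]
loss = g1 * lam1 * math.exp(-r * T) * (1 - math.exp(-r * k)) / r  # utils removed on [T, T+k]
assert abs(gain - loss) < 1e-12
```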

We show that there exists $T^*$ such that for all $T > T^*$ this deviation is profitable. Therefore, it is without loss of generality to assume that no investment project arrives before $d$.

The deviation is costly if no allowed investment arrives until time $T$, which happens with probability $e^{-\sum_{i: D_i^{inv}(u) > 0} \mu_i T}$. The increase in the cost of compensation is at most19

$$\frac{1 - e^{-rd}}{r}\, \lambda_j C_j - e^{-rT}\, \frac{1 - e^{-rk}}{r}\, \lambda_1 C_1 = e^{-rT}\, \frac{\lambda_1 (1 - e^{-kr})\, (g_1 C_j - g_j C_1)}{g_j\, r}.$$
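This algebraic identity can also be checked numerically, reusing the closed form for $d$; the parameter values (including the costs $C_1$, $C_j$) are again purely illustrative:

```python
import math

# Illustrative parameters (not from the paper).
r = 0.1
g1, lam1, C1 = 1.0, 0.5, 0.8
gj, lamj, Cj = 3.0, 0.4, 2.0
T, k = 5.0, 2.0

d = math.log(1.0 / (1.0 - g1 * lam1 * (math.exp(k * r) - 1)
                    * math.exp(-r * (T + k)) / (gj * lamj))) / r

# Left-hand side: cost of reward j on [0, d] minus the saved cost of reward 1.
lhs = (1 - math.exp(-r * d)) / r * lamj * Cj \
      - math.exp(-r * T) * (1 - math.exp(-r * k)) / r * lam1 * C1
# Right-hand side: the simplified expression.
rhs = math.exp(-r * T) * lam1 * (1 - math.exp(-k * r)) * (g1 * Cj - gj * C1) / (gj * r)
assert abs(lhs - rhs) < 1e-12
```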

For this deviation to create a profit, it must increase the discounted amount of time in which reward project $j$ is allowed.20 The value generated from using reward activity $j$ for the next $d$ units of time is at least

$$\frac{1 - e^{-rd}}{r} \left( \lambda_j g_j\, \frac{B_{i^*}}{l_{i^*}} - \lambda_j C_j \right) = e^{-rT}\, \frac{g_1 \lambda_1 (1 - e^{-kr})}{r} \left( \frac{B_{i^*}}{l_{i^*}} - \frac{C_j}{g_j} \right).$$

Note that the ratio of the size of the gain to the size of the loss, $g_1\big(g_j (B_{i^*}/l_{i^*}) - C_j\big)/\big(g_1 C_j - g_j C_1\big)$, is bounded away from zero, and thus it is enough to show that the ratio of the probability of loss to the probability of gain converges to zero.

The probability of gain is the probability that reward activity $j$ will always be allowed after the next investment project arrives, as this implies better usage of the most efficient available reward project. Denote by $p_j(u)$ the probability that reward project $j$ is used in full given an initial promise of $u$.

Denote by $u_t(T)$ the agent's continuation utility at time $t$ conditional on the initial choice of $T$ and no allowed investment project arriving before time $t$.

We are interested in $E_\tau[p_j(u_\tau(T))]$, where $\tau$ is the arrival time of the first allowed investment project. Since $k$ was chosen so that the first investment could not be incentivized solely by reward 1, we have that $p_j(u_\tau(T)) > 0$. Moreover, $p_j(\cdot)$ is an increasing function because the amount of time for which reward activity $j$ is allowed in a MDTM is increasing in $u$.

19 If the first investment project arrives after $T + k$, this bound is exact, but if the first project arrives between $T$ and $T + k$, the actual cost is slightly less.

20 By Lemmata 3, 4, and 5 and the assumption that $ur \le W(x(u))$, rewards $1, \dots, j-1$ are permitted until the next implemented investment project arrives.

The previous expectation is bounded from below by $\Pr[u_\tau(T) > x]\; p_j(x)$ for any $x \in \big(u - \lambda_1 g_1 (1 - e^{-rk})/r,\; u\big)$. As the second term is a strictly positive constant that does not depend on $T$, it is sufficient to bound $\Pr[u_\tau(T) > x]$. This probability is bounded from below by the probability of this event in histories in which the first investment project to arrive is of type 1. In this case,

$$u_\tau(T) = \begin{cases} u - \lambda_1 g_1\, \dfrac{1 - e^{-rk}}{r}\, e^{-r(T - \tau)} & \text{if } d < \tau \le T, \\[4pt] u - \lambda_1 g_1\, \dfrac{1 - e^{-r(T + k - \tau)}}{r} & \text{if } T \le \tau < T + k, \\[4pt] u & \text{if } \tau > T + k. \end{cases}$$

Therefore, clearly,

$$\lim_{T \to \infty} \Pr\big[u_\tau(T) > x\big] = 1.$$

This implies that the probability of gain (loss) converges to 1 (0) with T , and thus the deviation is profitable for a large enough T . ∎

LEMMA 8: V(u) is strictly concave.

PROOF: Consider an optimal MDTM. Assume that the agent's continuation utility equals $u$, and let $u_1 < u_2$ be such that $u_1, u_2 \in [0, \bar u]$ and $u = (u_1 + u_2)/2$. One (nonnatural) way the principal can deliver a promise of $u$ is to fictitiously split all investments and rewards in half and create two (perfectly correlated) fictitious worlds, each of which contains half of each reward and half of each investment opportunity. She can then provide $u_1/2$ in fictitious world 1 and $u_2/2$ in fictitious world 2.

Observe that scaling all payoffs by 1 / 2 multiplies the players’ discounted payoffs by half in any mechanism. Thus, the same MDTM remains optimal in each fictitious world. From Lemma 7, it follows that if u < u – , then with posi-tive probability, the agent’s continuation utility in world 1 eventually reaches zero. Moreover, the first time this happens, the agent’s continuation utility in world 2 is strictly positive. Then, the principal can benefit from temporarily merging the two fictitious worlds. Specifically, instead of wasting her capacity in world 1 (where all rewards are forbidden before the next investment opportunity arrives), she would rather use the most efficient reward activity in world 1 to speed up compen-sation in world 2 . This (weakly) decreases the cost of compensation; moreover, if sufficiently many type-1 investments arrive in quick succession, merging the two worlds increases the number of such investments that can be incentivized in world 2. ∎


LEMMA 9: There is a unique optimal mechanism.

PROOF:
Due to the strict concavity of V(u), any optimal mechanism must satisfy the properties described in Lemmata 3 and 4 (that is, compensation can be represented by an increasing x(u)). Moreover, under an optimal mechanism, lotteries are not used to incentivize investment projects.

By the separability of the (HJB) equation in the different controls, and by the strict concavity of the value function V(u), there is a unique optimal delegation list for all but a measure zero set of u's.

When selecting the optimal intensity for investment project i, the principal is maximizing

\[
\max_{D_i^{inv}(u)\, \in\, \left[0,\, \min\left\{1,\, \frac{\bar u - u}{l_i}\right\}\right]} \mu_i \Big( D_i^{inv}(u)\, B_i + V\big(u + D_i^{inv}(u)\, l_i\big) - V(u) \Big),
\]

which is a strictly concave function of D_i^{inv}(u) and thus has a unique maximizer for any u. Similarly, when deciding which reward projects to allow, the principal is maximizing

\[
\max_{x(u)\, \in\, \{0, \ldots, J\}} V'(u)\big(ru - w(x(u))\big) - C(x(u)),
\]

which has a unique solution unless V′(u) = g_j / C_j for some j ∈ J, a condition that can hold for at most a finite set of u's due to the strict monotonicity of V′(u). By the (PK) condition, x(ū) = J and x(0) = 0 in any mechanism.

By Lemma 7, under the optimal mechanism, the measure of time for which u = ũ for any ũ ∈ (0, ū) is zero, and thus there is an essentially unique optimal mechanism. ∎

PROOF OF PROPOSITION 3:
By Lemma 7, we know that û_j^rew < ∑_{k=1}^{j−1} λ_k g_k / r. Since the optimal mechanism satisfies the conditions specified in Lemma 4, we know that û_j^rew is weakly increasing. Thus, all we need to show is strict monotonicity. We do so by formalizing the intuition provided in the text.

Assume to the contrary that two rewards with C_k/g_k < C_j/g_j have the same threshold, and consider a promise of u = û_k^rew + δ. This implies that the time allowance of both rewards is t_2 and that the combined value (to the agent) of these time allowances is ϵ ≤ δ, where

\[
(\lambda_k g_k + \lambda_j g_j)\, \frac{1 - e^{-r t_2}}{r} = \epsilon.
\]


If, instead, only reward k were used to provide the ϵ utils that should have been provided by reward activities j and k , it would need to be used for t 1 units of time, such that

\[
\lambda_k g_k\, \frac{1 - e^{-r t_1}}{r} = \epsilon.
\]

This gives

\[
t_2 = \frac{1}{r}\, \log\!\left(\frac{\lambda_k g_k + \lambda_j g_j}{\lambda_j g_j + \lambda_k g_k\, e^{-r t_1}}\right).
\]
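The closed form for t_2 can be checked numerically: by construction, allowing rewards k and j together for t_2 units of time delivers the same agent utility ϵ as allowing reward k alone for t_1 units of time. A minimal sketch, with hypothetical parameter values:

```python
import math

# Check that reward k alone over t1 delivers the same agent utility eps
# as rewards k and j together over t2, with t2 given by the closed form.
# All parameter values below are hypothetical.
r = 0.05
lam_k, g_k = 1.0, 2.0     # reward k: arrival rate and agent benefit
lam_j, g_j = 0.5, 1.0     # reward j
t1 = 0.3                  # time allowance when only reward k is used

# Utility delivered by reward k alone over t1:
eps = lam_k * g_k * (1 - math.exp(-r * t1)) / r

# Closed form for t2:
t2 = math.log((lam_k * g_k + lam_j * g_j)
              / (lam_j * g_j + lam_k * g_k * math.exp(-r * t1))) / r

# Utility delivered by both rewards over t2 matches eps:
eps2 = (lam_k * g_k + lam_j * g_j) * (1 - math.exp(-r * t2)) / r
assert abs(eps - eps2) < 1e-12
```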

The expected cost of allowing both rewards for t 2 units of time is

\[
(\lambda_k C_k + \lambda_j C_j)\, \frac{1 - e^{-r t_2}}{r},
\]

while the expected cost of allowing reward k for t 1 units of time is

\[
\lambda_k C_k\, \frac{1 - e^{-r t_1}}{r}.
\]

Thus, the direct savings in compensation costs by using reward k for t 1 units of time instead of using k and j for t 2 units of time is

\[
\frac{\lambda_k \lambda_j\, (C_j g_k - C_k g_j)\, e^{-r t_1}\left(e^{r t_1} - 1\right)}{r\, (\lambda_k g_k + \lambda_j g_j)}.
\]

Taking a Taylor expansion around t 1 = 0 , we get

\[
\frac{\lambda_k \lambda_j\, (C_j g_k - C_k g_j)}{\lambda_k g_k + \lambda_j g_j}\, t_1 + O(t_1^2).
\]

Thus, the savings in costs is of order t_1.

The probability of a loss of value to the principal due to this change (an investment project arriving before t_1 when both reward activities are not committed) is

\[
1 - e^{-\sum_{i \in I} \mu_i t_1}.
\]

Taking a Taylor expansion around t 1 = 0 , we get

\[
\sum_{i \in I} \mu_i\, t_1 + O(t_1^2).
\]


The maximal loss from a misallocation of reward activity k in a period of length t 1 (which occurs if an investment project of type 1 arrives immediately) is

\[
g_k \lambda_k \left(\frac{B_1}{l_1} - \frac{C_k}{g_k}\right) \frac{1 - e^{-r t_1}}{r}.
\]

Taking a Taylor expansion around t 1 = 0 , we get

\[
g_k \lambda_k \left(\frac{B_1}{l_1} - \frac{C_k}{g_k}\right) t_1 + O(t_1^2).
\]

The probability of loss and the magnitude of loss are each of order t_1; thus, the maximal expected loss is of order t_1^2. Recall that the gain is of order t_1. Clearly, t_1 converges to zero with δ, which concludes the proof. ∎
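The order argument above can be illustrated numerically: halving t_1 roughly halves the direct savings (first order) but roughly quarters the maximal expected loss (second order). A sketch with hypothetical parameter values:

```python
import math

# Numeric illustration of the orders in the proof of Proposition 3.
# All parameter values below are hypothetical.
r = 0.05
lam_k, g_k, C_k = 1.0, 2.0, 1.0
lam_j, g_j, C_j = 0.5, 1.0, 2.0   # chosen so that C_k/g_k < C_j/g_j
mu_total = 0.8                     # sum of the investment arrival rates
B1_over_l1 = 3.0                   # hypothetical value of B_1/l_1

def savings(t1):
    # Direct savings from using reward k alone for t1 instead of k and j for t2.
    return (lam_k * lam_j * (C_j * g_k - C_k * g_j)
            * math.exp(-r * t1) * (math.exp(r * t1) - 1)
            / (r * (lam_k * g_k + lam_j * g_j)))

def max_expected_loss(t1):
    prob = 1 - math.exp(-mu_total * t1)   # an investment arrives before t1
    loss = g_k * lam_k * (B1_over_l1 - C_k / g_k) * (1 - math.exp(-r * t1)) / r
    return prob * loss

ratio_savings = savings(0.01) / savings(0.005)                    # ~ 2: order t1
ratio_loss = max_expected_loss(0.01) / max_expected_loss(0.005)   # ~ 4: order t1^2
assert abs(ratio_savings - 2) < 0.05
assert abs(ratio_loss - 4) < 0.1
```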

PROOF OF PROPOSITION 4:
The threshold for investment i is given by the first-order condition of the (HJB) equation, B_i / l_i = V′(û_i^inv). As V′(·) is strictly increasing, the thresholds for which these conditions hold are strictly increasing in i. By Lemma 2, û_1^inv = ū. To see that û_I^inv > 0, assume that when u = 0 an investment of type I is available. By the assumption that all investment types are potentially profitable, B_I − l_I (C_1/g_1) > 0. Thus, incentivizing this investment at an infinitesimal intensity by providing an infinitesimal time allowance for rewards of type 1 increases the value to the principal, as otherwise the marginal capacity is wasted with certainty. ∎

PROOF OF PROPOSITION 5:
The proof of the first part of the result is analogous to the proof of Lemma 8. Smoothing the arrival rate of investment i from μ_i to μ̄_i is equivalent to replacing investment i with two independent investments i_1, i_2 with respective arrival rates of μ_i and μ̄_i − μ_i. Then, we can consider investment i_1 and μ_i/μ̄_i of all other investments and rewards as fictitious world 1, and investment i_2 and (μ̄_i − μ_i)/μ̄_i of all other investments and rewards as fictitious world 2. By the same logic used in Lemma 8, for any continuation utility u, the principal can use the GTM in each world separately with respective states of u(μ_i/μ̄_i) and u((μ̄_i − μ_i)/μ̄_i) and generate the same payoff as she would get in the original world. However, as the arrivals of investments i_1 and i_2 are independent, with positive probability world 1 will reach a state of u_1 = 0 while u_2 > 0. At this point, the principal can "transfer" rewards of type 1 from world 1 to world 2 and thereby increase the speed of compensation in world 2 without increasing its cost, a change that with positive probability increases the value she generates in fictitious world 2. The proof of the second part of this proposition is trivial and hence omitted. ∎

PROOF OF PROPOSITION 7:
For each ũ ∈ [0, ū], define by V_ũ(u) the principal's value in the mechanism (i.e., when the principal has commitment power) in which the support of the agent's continuation utility is restricted to the set [0, ũ].


By a proof that is identical to that of Proposition 2, it follows that this value is attained by a multidimensional time mechanism. ∎

LEMMA 10: V u ̃ (u) is:

(i ) Weakly increasing in u ̃ ;

(ii ) Strictly decreasing and continuous in u ;

(iii ) Continuous in u ̃ .

PROOF:
The first property follows directly from the fact that any mechanism that can be implemented when the agent's utility is restricted to an interval can also be implemented when it is restricted to a superset of that interval. The second property follows directly from the properties of the optimal mechanism established in Lemma 1 and Proposition 2. Continuity in ũ follows from the upper bound on the marginal value of capacity being B_1/l_1.

Since V_ū(ū) < 0 and V_0(0) = 0, there exists a maximal u* such that V_{u*}(u*) = 0. Since the principal's value must be nonnegative at all times in any PPE, the agent's continuation utility cannot exceed u* in any PPE.

Finally, observe that the MDTM that implements V_{u*}(·) constitutes a PPE in which, following a deviation by the principal, the state shifts to u*. Note that as the principal does not observe action availability, her deviation at time t has no impact on her expected payoff at time t. Therefore, since the principal's value with commitment is no less than her value without it, it follows that a MDTM is the optimal mechanism without commitment. ∎

Appendix B: The TM as a Markovian Delegation Mechanism

The following conditions constitute the TM as a Markovian delegation mechanism:

(1) D^inv(u) = min{1, (ū − u)/l},

(2) D^rew(u) = 1 if u > 0, and D^rew(u) = 0 if u = 0,

(3) φ^inv(u) = min{u + l, ū},

(4) φ^rew(u) = u,

(5) F_τ(u) is a Dirac distribution centered on max{(gλ + e^{rτ}(ru − gλ))/r, 0},


(6) u_0 = 0.

In words, conditions (1) and (2) specify the delegation function D(u): investments are always incentivized at the maximal possible intensity, and rewards are allowed at full intensity whenever the agent's continuation utility is positive. Condition (3) states that upon implementation of an investment, the agent's continuation utility increases by exactly the cost of implementation. The main feature of the TM is reflected in conditions (4) and (5): the agent's continuation utility does not depend on the actual number of enjoyed rewards but drifts down continuously and deterministically (as long as no investment is implemented). The initial condition (6) trivially corresponds to s_0 = 0.
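Conditions (1)–(6) can be illustrated with a minimal discrete-time simulation sketch. The parameter values and the Euler step below are hypothetical and purely illustrative:

```python
import math
import random

# Minimal discrete-time simulation of the TM dynamics: the promise u jumps
# up at investment arrivals and drifts down deterministically otherwise.
# All parameter values and the step size dt are hypothetical.
r, mu, lam, g, l = 0.05, 0.5, 1.0, 1.0, 0.4
u_bar = lam * g / r              # maximal deliverable promise
dt, horizon = 0.01, 50.0

random.seed(1)
u = 0.0                          # condition (6): u_0 = 0
allowed_time = 0.0               # discounted time during which rewards are allowed

t = 0.0
while t < horizon:
    if u > 0:                    # condition (2): rewards allowed iff u > 0
        allowed_time += math.exp(-r * t) * dt
    if random.random() < mu * dt:
        u = min(u + l, u_bar)    # condition (3): jump at an investment arrival
    else:
        # conditions (4)-(5): deterministic downward drift of the promise
        u = max((lam * g + math.exp(r * dt) * (r * u - lam * g)) / r, 0.0)
    t += dt

# allowed_time approximates the discounted measure of time with rewards
# allowed, which is bounded above by 1/r.
```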

Appendix C: Derivation of the Value Function

From the discussion in the paper, it is clear that the principal’s value is related to the discounted amount of time the agent is permitted to enjoy reward activities. Therefore, once we know the expected discount factor at the first time u t = 0 for every initial value of u 0 , we can calculate the expected discounted measure of time in which rewards are allowed, from which, in turn, we can derive the principal’s expected value.

By the definition of the TM, an agent whose initial continuation utility is u_0 and who carried out no investment projects for t periods has a time-t continuation utility of

\[
u(t, u_0) = \frac{e^{rt}(r u_0 - \lambda g) + \lambda g}{r}.
\]
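As a check, this expression satisfies u(0, u_0) = u_0 and has drift r u_t − λg, consistent with the promise-keeping dynamics of the TM. A short numeric sketch with hypothetical parameter values:

```python
import math

# Check that u(t, u0) = (e^{rt}(r u0 - lam g) + lam g) / r starts at u0 and
# drifts at rate r*u - lam*g. All parameter values are hypothetical.
r, lam, g, u0 = 0.05, 1.0, 1.0, 3.0

def u(t):
    return (math.exp(r * t) * (r * u0 - lam * g) + lam * g) / r

assert abs(u(0.0) - u0) < 1e-12          # initial condition

t, dt = 1.0, 1e-6
numeric_drift = (u(t + dt) - u(t)) / dt  # forward-difference approximation
assert abs(numeric_drift - (r * u(t) - lam * g)) < 1e-4
```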

Denote by τ(u) the first hitting time of u_t = 0 given u_0 = u:

\[
\tau(u) = \min\, \big\{\, t \in \mathbb{R}_+ \cup \{\infty\} : u_t = 0,\; u_0 = u \,\big\}.
\]

Define the expected discount factor at time τ(u) to be

\[
h(u) = E\big[ e^{-r \tau(u)} \big].
\]

Consider an initial promise of x and a short interval of time ϵ in which two (or more) investment projects are unlikely to occur. Then h(x) must satisfy the recursion

\[
h(x) = e^{-\mu\epsilon} e^{-r\epsilon}\, h(u(\epsilon, x)) + e^{-r\epsilon} \mu \int_0^{\epsilon} e^{-\mu t}\, h\Big(u\big(\epsilon - t,\; u(t, x) + \min\{l,\, \bar u - u(t, x)\}\big)\Big)\, dt + e^{-r\epsilon} O(\epsilon^2).
\]


Rearranging gives

\[
e^{r\epsilon} h(x) = e^{-\mu\epsilon}\, h(u(\epsilon, x)) + \mu \int_0^{\epsilon} e^{-\mu t}\, h\Big(u\big(\epsilon - t,\; u(t, x) + \min\{l,\, \bar u - u(t, x)\}\big)\Big)\, dt + O(\epsilon^2).
\]

Since h(x) is monotone decreasing and continuous, it is differentiable a.e., and thus we can differentiate the last equality with respect to ϵ:

\[
\begin{aligned}
r e^{r\epsilon} h(x) ={}& -\mu e^{-\mu\epsilon}\, h(u(\epsilon, x)) + e^{-\mu\epsilon}\, h'(u(\epsilon, x))\, e^{r\epsilon}(rx - \lambda g) \\
&+ \mu e^{-\mu\epsilon}\, h\big(u(\epsilon, x) + \min\{l,\, \bar u - u(\epsilon, x)\}\big) \\
&+ \mu \int_0^{\epsilon} e^{-\mu t}\, h'\Big(u\big(\epsilon - t,\; u(t, x) + \min\{l,\, \bar u - u(t, x)\}\big)\Big) \\
&\qquad \times\, e^{r(\epsilon - t)} \Big( e^{rt}(rx - \lambda g) + r \min\{l,\, \bar u - u(t, x)\} \Big)\, dt + O(\epsilon).
\end{aligned}
\]

Taking the limit of ϵ → 0 and rearranging, we get

\[
\text{(C1)}\qquad 0 = -\left(\frac{\mu}{r} + 1\right) h(x) + h'(x)\left(x - \frac{\lambda g}{r}\right) + \frac{\mu}{r}\, h\big(x + \min\{l,\, \bar u - x\}\big).
\]

Clearly, we have the boundary conditions

(C2) h(0) = 1, h(ū) = 0.

Therefore, h(x) is the solution to a differential difference equation with suitable boundary conditions.

Consider the range x ∈ [ū − l, ū]. For this range, we know that h(x + min{l, ū − x}) = h(ū) = 0, and thus for this interval equation (C1) becomes

\[
0 = -\left(\frac{\mu}{r} + 1\right) h_1(x) + h_1'(x)\, (x - \bar u).
\]

This is a simple ODE whose solution is

\[
h_1(x) = \alpha\, (\bar u - x)^{(\mu/r) + 1}
\]

for some scalar α. This, in turn, implies that in the interval [ū − 2l, ū − l] equation (C1) becomes

\[
0 = -\left(\frac{\mu}{r} + 1\right) h_2(x) + h_2'(x)\, (x - \bar u) + \frac{\mu}{r}\, h_1(x + l)
\]


with the boundary condition h_2(ū − l) = h_1(ū − l), which is again an ODE in h_2(x). This ODE can also be solved, as h_1(x + l) is already known (up to α).

We can continue in an iterative fashion: in the interval [ū − (k + 1)l, ū − kl], the solution to equation (C1) is the solution to the ODE

\[
0 = -\left(\frac{\mu}{r} + 1\right) h_{k+1}(x) + h_{k+1}'(x)\, (x - \bar u) + \frac{\mu}{r}\, h_k(x + l),
\]

with the boundary condition h_{k+1}(ū − kl) = h_k(ū − kl).

Since u – is finite and l > 0 , there are a finite number of iterations before we reach k ≥ u – /l , at which point we can use the boundary condition h(0) = 1 to solve for α in h 1 .
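The iterative construction can be sketched numerically: set α = 1 on the first interval, integrate the delay ODE backward interval by interval on a grid, and then rescale using h(0) = 1. All parameter values and the Euler discretization below are hypothetical:

```python
import math

# Numeric sketch of the iterative construction of h described above.
# All parameter values and the Euler grid are hypothetical.
r, mu, lam, g, l = 0.05, 0.5, 1.0, 1.0, 0.4
u_bar = lam * g / r                  # maximal promise (= lambda*g/r)

n = 20000
dx = u_bar / n
step = int(round(l / dx))            # grid points per interval of length l
xs = [i * dx for i in range(n + 1)]
h = [0.0] * (n + 1)

# On [u_bar - l, u_bar]: h1(x) = alpha*(u_bar - x)^(mu/r + 1); take alpha = 1.
for i in range(n - step, n + 1):
    h[i] = (u_bar - xs[i]) ** (mu / r + 1)

# Integrate the delay ODE backward using the already-known values at x + l:
#   h'(x) = [(mu/r + 1) h(x) - (mu/r) h(x + l)] / (x - u_bar)
for i in range(n - step, 0, -1):
    hp = ((mu / r + 1) * h[i] - (mu / r) * h[i + step]) / (xs[i] - u_bar)
    h[i - 1] = h[i] - hp * dx

# Pin down alpha with the boundary condition h(0) = 1.
alpha = 1.0 / h[0]
h = [alpha * v for v in h]
# Now h satisfies (C2): h(0) = 1 and h(u_bar) = 0, and h is decreasing in x.
```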

When u = 0, the expected discount factor at the arrival of the first investment project is μ/(r + μ), after which time there are (1 − h(l))/r expected discounted units of time in which reward activities are allowed before u = 0 is hit again. Thus, the discounted amount of time in which the agent is expected to be allowed to benefit from rewards when the current promise is u = 0, W(0), solves

\[
W(0) = \frac{\mu}{r + \mu}\, \frac{1 - h(l)}{r} + \frac{\mu}{r + \mu}\, h(l)\, W(0)
\]

or

\[
W(0) = \frac{1}{r} - \frac{1}{\mu(1 - h(l)) + r}.
\]

Clearly, for any u > 0 , we have

\[
W(u) = \frac{1 - h(u)}{r} + h(u)\, W(0) = \frac{1}{r} + h(u)\left(W(0) - \frac{1}{r}\right).
\]

Therefore, the principal’s value function is given by

\[
V(u) = \frac{B}{l}\big(W(u)\, g\lambda - u\big) - W(u)\, \lambda C.
\]
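The closed form for W(0) can be checked against its defining recursion for any value of h(l); the sketch below uses hypothetical values of r, μ, and h(l):

```python
# Check of the closed form for W(0) against its defining recursion,
# using hypothetical values of r, mu, and h(l).
r, mu = 0.05, 0.5
h_l = 0.7                          # hypothetical value of h(l)

W0 = 1 / r - 1 / (mu * (1 - h_l) + r)

# Recursion: W(0) = mu/(r+mu) * (1 - h(l))/r + mu/(r+mu) * h(l) * W(0)
rhs = mu / (r + mu) * (1 - h_l) / r + mu / (r + mu) * h_l * W0
assert abs(W0 - rhs) < 1e-9        # here W0 = 15.0
```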

Appendix D: Weak Order on Actions

At the beginning of Section III, we assumed that B i / l i is strictly decreasing and C j / g j is strictly increasing in order to simplify the exposition of our results. The proof of our main result makes it clear that if there are two reward activities, j 1 , j 2 , with the same rate of transfer, then the principal treats them identically. The same holds for investment projects. If the relative benefit of two investments is the same for the principal, then she incentivizes both until the agent’s continuation utility reaches the appropriate threshold.

