Signaling with Private Monitoring∗
Gonzalo Cisternas and Aaron Kolb
August 10, 2019
Preliminary
The most recent version of this paper can be found at
http://web.mit.edu/gcistern/www/spm.pdf
Abstract
We examine linear-quadratic signaling games between a long-run player that has a
normally distributed type and a myopic player who privately observes a noisy signal
of the long-run player’s actions. An imperfect signal of the myopic player’s behavior
is publicly observed, and thus there is two-sided signaling. Time is continuous over a
finite horizon, and the noise is Brownian. We construct linear-Markov equilibria using
the players’ beliefs up to the second order as states. In such equilibria, the long-run
player’s second-order belief is controlled, reflecting that past actions are used to fore-
cast the continuation game. Via this higher-order belief channel, the informational
content of the long-run player’s action is not only driven by the weight attached to her
type, but also by how aggressively she has signaled in the past. Applications to models
of leadership, reputation, and trading are examined.
Keywords: signaling; private monitoring; private beliefs; learning; Brownian motion.
JEL codes: C73, D82, D83.
∗Cisternas: MIT Sloan School of Management, 100 Main St., Cambridge, MA 02142, [email protected]. Kolb: Indiana University Kelley School of Business, 1309 E. Tenth St., Bloomington, IN, [email protected]. We thank Alessandro Bonatti, Robert Gibbons and Vish Viswanathan for useful conversations.
1 Introduction
Signaling—that is, the strategic transmission of private information through actions—plays
a central role in many economic interactions. In labor markets, for instance, workers’ edu-
cational choices can credibly convey information about their hidden abilities (Spence, 1973).
In financial markets, signaling influences how quickly market prices incorporate the infor-
mation privately possessed by traders with market power (Kyle, 1985). In organizations, it
offers a productivity-enhancing rationale for leadership by example (Hermalin, 1998).
In all these settings, “receivers” must extract the informational content of the signals gen-
erated by those who hold superior information: employers must interpret a college degree
as a signal of ability; investors must interpret financial data to learn about others’ informa-
tion; followers must interpret leaders’ actions to decide how to follow, and so on. In most
studies of signaling thus far, however, the signals involved are either perfect or public (or
both), or private but restricted to static settings, thereby making any such analysis effectively
public: at the moment they have to move, senders know their receivers’ beliefs. Crucially, while such
public analyses can be appropriate in some settings (e.g., public academic credentials can be
highly informative signals), they are less so in others.
Indeed, imperfect private signals are pervasive: they naturally appear when employers
subjectively assess workers’ performances (MacLeod, 2003; Levin, 2003); when brokers han-
dle orders of retail investors (Yang and Zhu, 2018); or when data brokers collect data from
consumers’ online behavior (Bonatti and Cisternas, 2019), for instance. The presence of id-
iosyncratic noise then renders the inferences made by receivers private, raising a fundamental
question: how do learning and signaling in repeated interactions play out when those who
hold payoff-relevant information do not know what others have seen?
In this paper, we make progress in this direction by examining a player’s signaling in-
centives in settings where her actions generate signals that are hidden to her. Specifically, a
long-run player (she) of a normally distributed type interacts with a myopic player (he) over
a finite horizon. The myopic player privately observes a noisy signal of the long-run player’s
actions, while the long-run player can learn about the myopic player’s private inferences
from an imperfect public signal of the myopic player’s behavior. The players’ preferences are
linear-quadratic and the noise is Brownian. Using continuous-time methods, we construct
linear Markov perfect equilibria (LME) using the players’ beliefs as states.
The games we study feature one-sided incomplete information and imperfect private
monitoring. Consider a leader of an organization interacting with a follower (or many of
them, acting in coordination). The organization’s payoff increases with both the proximity
of the leader’s action to a newly realized state of the world (adaptation) and the proximity of
the follower’s actions to the leader’s action (coordination). Moreover, the follower attempts
to match the leader’s action at all times. The environment is, however, complex, in the sense
that the leader cannot immediately convey the state of the world to the follower: the latter
learns it only gradually by subjectively evaluating the leader’s actions. In turn, the leader
receives feedback of the follower’s inferences through a public signal of the follower’s actions.
Due to the private monitoring, the follower’s belief is private, and hence both parties have
private information to signal to one another. In addition, the leader is forced to use her past
actions to estimate the follower’s belief. This forecast—the leader’s second-order belief—is
itself private, even along the path of play, as the leader conditions her actions on the state of
the world. How does the leader then manage the transition of the organization to the new
state of the world accounting for this higher-order uncertainty? What are the implications
for learning and payoffs, and hence for the value of better information channels which reduce
higher-order uncertainty?
Economic forces. We construct LME using beliefs up to the long-run player’s second
order as states. This second-order belief is controlled by the long-run player, reflecting that
she uses past play to forecast the continuation game. The well-known problem of the state
space growing due to the myopic player attempting to forecast such private belief is then
circumvented by a key representation lemma (Lemma 2) that expresses the (candidate, on
path) second-order belief as a convex combination of the long-run player’s type and the belief
about that type based on public information exclusively. This representation reflects how
the long-run player calibrates her belief using the public information when learning about
the myopic player’s belief. This “public” state is therefore part of the set of belief states,
and is affected by the myopic player.
Because different types take different actions in equilibrium, and actions are used to
forecast the myopic player’s belief, different types also perceive different continuation games
as measured by their second-order beliefs. This creates a novel history-inference effect,
whereby the sensitivity of the long-run player’s action to her type—which determines the
myopic player’s learning—is comprised not only of the direct weight her strategy places on
her type, but also by how aggressively she has signaled in the past via the myopic player’s
inference of the long-run player’s private history. This effect compounds over time as the
second-order belief increasingly reflects the long-run player’s type, and its amplitude is decreasing in
the quality of the public signal: shutting down the public signal (no feedback case) maximizes
the potential reliance on the type, while making the public signal noiseless (perfect feedback)
eliminates this dependence. These extreme cases are exploited in the applications we study.
Applications. In Section 2, we illustrate the main economic insights of the paper by exam-
ining a game in which a leader must adapt an organization to a new economic environment
while controlling the coordination costs with a myopic follower who tries to match her action.
To accommodate the follower, the leader’s action is less sensitive to her type (i.e.,
achieves less adaptation) than in the full-information benchmark; but successful accommo-
dation requires knowing the follower’s belief. Critically, because higher types take higher
actions due to their stronger adaptation motives, they also expect their followers to have
higher beliefs—the coordination motive leads higher types to take higher actions via the
history-inference channel. In the absence of feedback, therefore, standard decreasing adap-
tation incentives that fully determine signaling and learning when beliefs are public are then
offset by higher-order belief effects that make the leader’s signaling increasing over time.
This qualitatively different signaling behavior has important consequences on learning
and payoffs, and hence on the value of better information channels within the organization.
In the extreme cases of a myopic and a patient leader, we show that the follower’s overall
learning from the interaction is always higher when the leader receives no feedback than when
the public signal is perfect, a consequence of stronger overall adaptation in the former case.
Learning is, however, a measure of the organization’s struggle: learning occurs only when the
private signal is informative of the state of the world, and hence only when miscoordination
occurs along the way. The stronger signaling that arises in the no-feedback case is then more
decisive when the leader is impatient. Specifically, the history-inference effect substitutes for
the lack of adaptation of a myopic leader, thereby reducing the added value of a noiseless
public signal relative to the no-feedback case as the degree of impatience increases.
In Section 5 we explore two applications based on extensions of our model. In the first,
the type is a bias, and the long-run player wants to preserve a reputation for neutrality,
modeled via a terminal quadratic loss in the myopic player’s belief (e.g., a politician facing
reelection). Clearly, eliminating the public signal has a negative direct effect on the long-run
player’s payoff (increased uncertainty in a concave objective). Since higher types take higher
actions due to their larger biases, however, those types must offset higher beliefs to appear
unbiased; the history-inference effect is then negative, which weakens signaling and hence
the sensitivity of the myopic player’s belief, potentially leading to higher payoffs.
Finally, we exploit the presence of the public belief state in a trading model in which
an informed trader faces both a myopic trader that privately monitors her orders and a
competitive market maker who only observes the public total order flow. In this context,
we show that there is no linear Markov equilibrium for any degree of noise of the private
signal. Intuitively, the myopic player introduces momentum into the price process, as the
information he obtains now gets distributed to the market maker through all future order
flows. This causes prices to move against the insider and creates urgency, leading the insider
to trade away all information in the first instant.
Technical contribution. The setting we examine is asymmetric, in terms of the players’
preferences and their private information (a fixed state versus a changing one). In particular,
the players can signal at substantially different rates, which is in stark contrast to a small lit-
erature on symmetric multi-sided learning (see the literature review section). With different
rates of learning, however, the equilibrium analysis can become severely complicated.
Specifically, the belief states we construct depend on two functions of time: (1) the
myopic player’s posterior variance, which determines the sensitivity of the myopic player’s
belief to her private signal, and (2) the weight attached to the long-run player’s type in the
representation result, which captures the contribution of the history-inference effect to the
long-run player’s signaling. Standard dynamic-programming arguments reduce the problem
of existence of LME to a boundary value problem (BVP) that these two functions, along with
the weights in the long-run player’s linear strategy, must satisfy. The two learning ordinary
differential equations (ODEs) endow the BVP with exogenous initial conditions, while the
rest carry endogenous terminal conditions arising from myopic play at the end of the game.
Determining the existence of a solution to such a BVP is challenging because it involves
multiple ODEs in both directions. For this reason, we establish two sets of results. In a
private value environment, the myopic player’s best response does not directly depend on
his belief about the long-run player’s type, but only indirectly via his expectation of the
latter player’s action. In that context, we show that there is a one-to-one mapping between
the solutions to the learning ODEs (Lemma 5), a consequence of the ratio of the signaling
coefficients being constant. This, in turn, makes traditional shooting methods based on the
continuity of the solutions applicable. Via this method, we show the existence of LME in the
leadership model of Section 2 when the public signal is of intermediate quality for horizon
lengths that are decreasing in the prior variance about the state of the world (Theorem 1).
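The shooting logic can be illustrated with a toy boundary value problem that mimics the structure described above: a forward “learning” variable with a known initial condition coupled to a backward “incentive” variable pinned down at the terminal date. The system below is a deliberately stylized sketch—it is not the paper’s actual BVP—and bisection over the unknown initial value targets the terminal condition.

```python
# Toy forward-backward BVP solved by shooting plus bisection.
# x' = -x * y**2,  x(0) = 1      ("learning" ODE, known initial condition)
# y' = y - x,      y(T) = 0.5    ("incentive" ODE, known terminal condition)
# Illustrative system only -- not the paper's actual equations.

def terminal_gap(y0: float, T: float = 1.0, n: int = 4000) -> float:
    """Integrate forward from (x(0)=1, y(0)=y0); return y(T) - 0.5."""
    dt = T / n
    x, y = 1.0, y0
    for _ in range(n):
        x, y = x + dt * (-x * y * y), y + dt * (y - x)  # Euler step
    return y - 0.5

# terminal_gap(0) < 0 < terminal_gap(2), so bisect on the unknown y(0).
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if terminal_gap(mid) < 0:
        lo = mid
    else:
        hi = mid
y0_star = 0.5 * (lo + hi)
print(y0_star, terminal_gap(y0_star))
```

The continuity of the discrete solution map in the initial guess is what makes bisection (and, more generally, shooting) applicable—the same property exploited in the private-value case above.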
In common value settings, the multidimensionality issue seems unavoidable. Building on
the literature on BVPs with intertemporal linear constraints (Keller, 1968), however, we can
show the existence of LME via the use of fixed-point arguments applied to our BVP with
intratemporal nonlinear (terminal) constraints. Specifically, the multidimensional shooting
problem can be reformulated as one of the existence of a fixed point for a suitable function
derived from the BVP, which we then tackle for a variation of the leadership model when
the follower also cares about matching the state of the world (Theorem 2). Critically, the
method is general—it applies to the whole class of games under study, and opens a way for
examining other settings exhibiting learning and asymmetries.
Related Literature Static noisy signaling was introduced by Matthews and Mirman
(1983) in a limit pricing context, and further studied by Carlsson and Dasgupta (1997)
as a refinement tool. Recent dynamic analyses involving Gaussian noise and public beliefs
include Dilmé (2019), Gryglewicz and Kolb (2019), Kolb (2019) and Heinsalu (2018).1
Multisided signaling has been examined by Foster and Viswanathan (1996) and Bonatti
et al. (2017) in symmetric settings with imperfect public monitoring and dispersed fixed
private information. In those settings, beliefs are private, but the presence of a commonly
observed public signal permits a representation of first-order beliefs that eliminates the need
for higher-order ones.2 Bonatti and Cisternas (2019) in turn examine two-sided signaling in
a setting where firms privately observe a summary statistic of a consumer’s past behavior to
price discriminate. Via the prices they set, however, firms perfectly reveal their information
to the consumer.
The literature on repeated games with private monitoring is extensive, and has largely
focused on non-Markovian incentives—Ely et al. (2005) and Hörner and Lovo (2009) (the
latter allowing for incomplete information) study equilibria in which inferences of others’
private histories are not needed, and Mailath and Morris (2002) and Hörner and Olszewski
(2006) study almost public information structures. Levin (2003) and Fuchs (2007) examine
one-sided private monitoring in repeated principal-agent interactions.
Regarding our applications, the stage game of our leadership model is a simplified version
of Dessein and Santos (2006).3 In turn, the value of public information has been studied
by Morris and Shin (2002), Angeletos and Pavan (2007), and Amador and Weill (2012)
in settings with infinitesimal players, thus rendering signaling and inferences of individual
private histories unnecessary. Regarding trading models, Yang and Zhu (2018) show, in a
richer two-period version of our model, that a linear equilibrium ceases to exist if a signal of
an informed player’s last trade is too precise and privately observed by another player.
To conclude, this paper contributes to a growing literature employing continuous-time
techniques to the analysis of dynamic incentives. Sannikov (2007) examines two-player games
of imperfect public monitoring; Faingold and Sannikov (2011) reputation effects with behav-
ioral types; Cisternas (2018) off-path private beliefs in games of ex ante symmetric uncer-
tainty; and Hörner and Lambert (2019) information design in career concerns settings.
1This last paper also displays a normally distributed type, but it lacks strategic interdependence between actions. Thus, behavior is unchanged under some information structures involving private monitoring.
2Likewise in He and Wang (1995), where infinitely many agents privately see dynamic exogenous signals.
3See Bolton and Dewatripont (2013) for such a static analysis with one round of pre-play communication. More generally, these are instances of the linear-quadratic team theory of Marschak and Radner (1972).
2 Application: Leading Coordination and Adaptation
Economists have long understood that adaptation to changes in the external economic en-
vironment is a key problem for organizations (e.g., Simon, 1951), and that successful adap-
tation often requires substantial coordination of activities within them.4 Since at least
Radner (1962), however, it has been recognized that coordination is threatened by the pres-
ence of different decision-makers and sources of information. As Williamson (1996) further
points out, “failures of coordination can arise because autonomous parties read and re-
act to signals differently, even though their purpose is to achieve a timely and compatible
combined response.” Private signals—a consequence of either private sources or subjective
interpretations—therefore play an integral role in organizations’ ability to adapt to change.
The study of the adaptation-coordination trade-off has led to important insights regard-
ing the returns to specialization (Dessein and Santos, 2006), centralization (Alonso et al.,
2008), and governance structures (Rantakari, 2008), but the great majority of these analyses
have been static. In particular, the central question of how information about the economic
environment is gradually transmitted and reflected in decision-making has been much less
explored.5 The difficulty in analyzing learning dynamics in organizations while accounting
for idiosyncratic shocks and/or private information is apparent: to appropriately coordinate
their actions, individuals must forecast what other members know.
In this application, we examine how a leader—a member of an organization with crucial
information about the economic environment and the opportunity to influence others—
manages the dynamics of adaptation and coordination when a follower privately monitors
her actions. For instance, top management wishes to adapt its strategy to a shift in the
market fundamentals, but it suffers from imperfect control: the information about such
changes trickles down the organization through various layers before reaching key productive
divisions. Or consider an expert who leads by example to transmit a technique or skill—an
intangible activity or knowledge—to an apprentice, and the latter subjectively evaluates the
expert’s actions. In both cases, the economic ‘fundamentals’ are learned only gradually by
the receiver, and the sender does not directly observe what the receiver has seen.
Specifically, consider the following game inspired by the team theory of Marschak and
4The literature on the topic is extensive. Refer, for instance, to Chapter 4.2 in Williamson (1996) and Chapter 4 in Milgrom and Roberts (1992).
5Marschak (1955), in discussing team theory as a framework for examining organizations, makes the case clear (p. 137): “A realistic theory of teams would be dynamic. It takes time to process and pass messages along a chain of team members; and messages must include not only information on external variables but also information on what has been done by [others...] Knowledge about [probabilities and payoffs] is acquired gradually, while the team already proceeds with decisions. These facts make the dynamic team problem similar to those in cybernetics and in sequential statistical analysis.”
Radner (1972). A team consisting of a leader (she) and a follower (he) operates in an
environment parametrized by a state of the world $\theta \sim \mathcal{N}(\mu, \gamma^o)$. The team’s payoff is
$$\int_0^T e^{-rt}\left\{-(a_t - \theta)^2 - (a_t - \bar{a}_t)^2\right\}dt, \qquad (1)$$
where $a_t$ denotes the leader’s action at time $t$, $\bar{a}_t$ the follower’s counterpart, $r \geq 0$ is a
discount rate, and $T < \infty$. Thus, performance increases with the proximity of the leader’s
action to the state of the world (adaptation) and with the proximity of both players’ actions
(coordination). Such actions can take values over the whole real line.
We depart from Marschak’s and Radner’s approach to modeling teams by allowing
divergence in preferences. Specifically, we assume that the leader’s preferences coincide with
the team’s payoff, while the follower is myopic, trying to minimize $(a_t - \bar{a}_t)^2$ at all times $t \in [0, T]$.
The leader knows the realized value of θ, while the follower only knows its distribution.
As time progresses, however, the follower privately observes an imperfect signal of the form
$$dY_t = a_t\,dt + \sigma_Y\,dZ^Y_t,$$
where ZY is a Brownian motion and σY > 0 a volatility parameter. In particular, immediate
adaptation at no coordination costs via a perfectly revealing action is not possible.
In turn, the leader can learn about what the follower has done (and, hence, about what
the follower has seen) from
$$dX_t = \bar{a}_t\,dt + \sigma_X\,dZ^X_t,$$
where ZX is independent of ZY . This signal is public; for instance, an output measure
observed by both parties, or information prepared by the follower.
In this context, our goal is twofold. First, to understand how private monitoring affects
the players’ behavior. Second, to assess the value of better information channels for the team,
which is an issue of central importance for the performance of organizations. To achieve both
objectives in a unified way, we thus fix σY > 0 and focus on the cases of σX = 0 and +∞.
Specifically, to understand how the presence of a noisy private signal affects the players’
behavior, it is useful to examine the benchmark in which Y—and hence, the follower’s
belief—is public; an indirect approach for studying this case is by setting σX = 0, to the
extent that the follower’s action reveals his belief at all times. On the other hand, to assess
the value of better bottom-up information systems (as measured by lower values of σX),
it is natural to consider the baseline case in which X is absent, which is equivalent to
setting σX = ∞. Reductions in σX have the appealing interpretation of being the result
of interventions intended to improve the information that leaders receive from within the
organization (which can be important if leaders are busy with other activities). As it turns
out, the extreme cases of σX to be discussed deliver the sharpest economic intuitions.
2.1 Perfect Feedback (“Public”) Case: σX = 0
When the public signal has no noise, the observation of the follower’s action opens the
possibility of the leader perfectly inferring the follower’s belief. Thus, we aim to characterize
a linear Markov equilibrium (LME) of the form
$$a_t = \beta_{0t} + \beta_{1t} M_t + \beta_{3t}\theta \quad \text{and} \quad \bar{a}_t = \mathbb{E}_t[a_t] = \beta_{0t} + (\beta_{1t} + \beta_{3t}) M_t, \qquad (2)$$
where $M_t := \mathbb{E}_t[\theta]$, and $\beta_{it}$, $i = 0, 1, 3$, are functions of time satisfying $\beta_{1t} + \beta_{3t} \neq 0$, $t \in [0, T]$.6
The deterministic feature of the candidate equilibrium coefficients is explained shortly; the
last condition permits identifying the follower’s belief from his observed action.7
From standard results in filtering theory, if the follower expects $(a_t)_{t \geq 0}$ as in (2), then
$$dM_t = \frac{\beta_{3t}\gamma_t}{\sigma_Y^2}\Big[dY_t - \underbrace{\{\beta_{0t} + (\beta_{1t} + \beta_{3t})M_t\}}_{=\,\mathbb{E}_t[a_t]}\,dt\Big], \quad \text{where} \quad \dot{\gamma}_t = -\left(\frac{\gamma_t \beta_{3t}}{\sigma_Y}\right)^2. \qquad (3)$$
That is, the follower updates upwards whenever the observed increment, dYt, is larger than
the follower’s expectation of it, Et[dYt] = [β0t + (β1t + β3t)Mt]dt. Moreover, the intensity of
the reaction is larger the more aggressively the leader signals the state (i.e., a larger β3t),
and the less is known about the latter (i.e., a larger γt). Finally, learning is deterministic due
to the Gaussian structure, and is faster the stronger the intensity of the leader’s signaling.
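For concreteness, the filtering equations (3) can be simulated with a simple Euler–Maruyama scheme. The constant weights β0, β1, β3 below are illustrative placeholders (chosen so that β0 = 0 and β1 + β3 = 1, the form established in Proposition 1); the equilibrium coefficients are deterministic functions of time.

```python
import numpy as np

# Euler-Maruyama sketch of the follower's filter (3).
# Constant beta weights are an illustrative assumption, not the equilibrium path.
rng = np.random.default_rng(1)

T, n = 10.0, 10_000
dt = T / n
theta = 1.0                               # leader's realized type
mu, gamma0, sigma_Y = 0.0, 1.0, 1.5       # prior mean/variance, signal noise
beta0, beta1, beta3 = 0.0, 0.3, 0.7

M, gamma = mu, gamma0                     # follower's posterior mean and variance
gammas = [gamma]
for _ in range(n):
    a = beta0 + beta1 * M + beta3 * theta            # leader's action (she sees M when sigma_X = 0)
    dY = a * dt + sigma_Y * np.sqrt(dt) * rng.standard_normal()
    expected = (beta0 + (beta1 + beta3) * M) * dt    # follower's expected increment
    M += (beta3 * gamma / sigma_Y**2) * (dY - expected)
    gamma -= (gamma * beta3 / sigma_Y) ** 2 * dt     # deterministic variance decay
    gammas.append(gamma)
```

As the text notes, the variance path is deterministic—here it tracks the closed form γt = 1/(1/γo + β3²t/σY²)—while the posterior mean responds to the realized Brownian noise.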
The leader’s problem is to maximize (1) subject to (3), recognizing that she affects $M$
via $dY_t = a_t\,dt + \sigma_Y\,dZ^Y_t$. The next result establishes the existence of a LME, along with
some properties that any such equilibrium should satisfy:
Proposition 1 (LME—Public Case). For all r ≥ 0 and T > 0:
(i) Existence: there exists a LME. In any such equilibrium at = β3tθ + (1− β3t)Mt.
(ii) Signaling coefficient: β3t ∈ (1/2, 1) for t < T , β3T = 1/2, and β3 is strictly decreasing.
Recall that the full information benchmark is simply $a_t = \bar{a}_t = \theta$ at all times. From this
perspective, the leader sacrifices adaptation (i.e., β3 < 1) to be able to coordinate with the
6We skip subindex 2 to be consistent with the general model presented in Section 3, where we complete the notion of linear Markov equilibrium with an additional state variable.
7More precisely, a LME as in (2) is perfect when Y is public, but only Nash when Y is private but σX = 0, as the continuity of the paths of M makes deviations by the myopic player observable in this latter case. Due to the full-support noise, however, this distinction is vacuous in discrete time.
follower; in particular, the coefficient on M , β1 = 1− β3, must be positive, as higher values
of M then require higher actions by the leader.8
The leader’s incentives to sacrifice adaptation are weaker the farther the team is from the
end of the interaction. In fact, in equilibrium, dMt ∝ γtβ3t[θ−Mt] at all times, and so stronger
adaptation today brings—via signaling—more coordination tomorrow. This dynamic incen-
tive decays (deterministically) because there is less time to enjoy future coordination and
beliefs are less responsive as learning progresses. The terminal value simply reflects that
static equilibrium behavior $(a, \bar{a}) = \big(\tfrac{1}{2}\theta + \tfrac{1}{2}M,\ M\big)$ arises at the end of the interaction.
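The terminal condition β3T = 1/2 can be checked directly: given the follower’s best response ā = M, the leader’s flow payoff −(a − θ)² − (a − M)² peaks at a = (θ + M)/2. A quick numeric check with illustrative values of θ and M:

```python
import numpy as np

# Static terminal play: with the follower best-responding at abar = M, the
# leader's flow payoff -(a - theta)^2 - (a - M)^2 is maximized at (theta + M)/2.
theta, M = 2.0, 0.5                       # illustrative values
grid = np.linspace(-5.0, 5.0, 100_001)    # candidate actions
payoff = -(grid - theta) ** 2 - (grid - M) ** 2
a_star = grid[np.argmax(payoff)]
print(a_star)                             # ~ (theta + M)/2 = 1.25
```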
2.2 No Feedback Case: σX = ∞
When the public signal is uninformative, the leader must perform a non-trivial inference of
the follower’s belief to correctly assess how her actions affect future payoffs (i.e., to correctly
assess the continuation game). This is, in turn, an exercise of inference of private histories.
Forecasting by input. In the public case the leader’s past actions were immaterial for
inferring the follower’s contemporaneous belief, as the latter was fully determined by the
realized history $Y^t$—i.e., the leader forecasted by output. In fact, since (3) is linear, we have
$$M_t = A_1(t) + \int_0^t A_2(t, s)\,dY_s,$$
for some deterministic functions A1 and A2. The ability to observe Y t implies that the leader
always computes her forecast as above, with larger effort profiles only indicating that the
corresponding shocks ZY were lower, and vice-versa.
In the absence of feedback, the leader does not observe $M$. Thus, her second-order belief,
$\widetilde{M}_t := \mathbb{E}_t[M_t]$, is a payoff-relevant state. Further, as long as $M$ is as above (for potentially
different functions $A_1$ and $A_2$), and using that $\mathbb{E}_t[dY_t] = a_t\,dt$, the leader’s forecast reads
$$\widetilde{M}_t = A_1(t) + \int_0^t A_2(t, s)\,a_s\,ds. \qquad (4)$$
Unlike the public case, therefore, the leader now forecasts by input: the more effort she has
exerted in, say, pushing the follower’s belief upwards towards $\theta$ from a low prior, the higher
she thinks the current value of $M$ is.
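The output/input contrast can be made concrete in a naive discretization in which the leader runs the follower’s filter replacing dY with its conditional mean a dt, as (4) suggests: because the forecast is driven by her own actions, different types reach different forecasts. The constant weights below are illustrative, not equilibrium objects.

```python
# Forecasting by input: the leader feeds her own actions into the follower's
# filter in place of the unobserved dY. Constant weights are illustrative.
T, n = 5.0, 5_000
dt = T / n
mu, gamma0, sigma_Y = 0.0, 1.0, 1.5
beta1, beta3 = 0.4, 0.6

def input_forecast(theta: float) -> float:
    """Discretized version of (4): the forecast as a function of own play."""
    gamma, M_tilde = gamma0, mu
    for _ in range(n):
        a = beta1 * M_tilde + beta3 * theta              # leader's own action
        expected = (beta1 + beta3) * M_tilde             # forecasted follower expectation
        M_tilde += (beta3 * gamma / sigma_Y**2) * (a - expected) * dt
        gamma -= (gamma * beta3 / sigma_Y) ** 2 * dt
    return M_tilde

# Different types "push" differently and so forecast different beliefs:
print(input_forecast(1.0), input_forecast(-1.0))
```

In the public case, by contrast, feeding one fixed observation path Y into the filter yields one and the same belief for every type.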
8That β0t = 0 and β1t + β3t = 1 hold at all times in this dynamic setting can be understood from the leader’s incentives at Mt = θ: in this case, there is no coordination loss (at = āt), and the objective then becomes, locally, one of minimizing the adaptation cost; but since Et[dMt] = 0 (i.e., M is locally unpredictable), there are no incentives to move away from at = θ. Thus, β0 + (β1 + β3)θ = θ for all types.
This contrast between the public and no-feedback cases is natural: in the absence of
any additional information, an expert must rely on how much emphasis she has given to a
particular idea or technique to assess how much the apprentice has assimilated the latter.
(By contrast, in the public case, the apprentice’s output signal suffices for perfectly inferring
his understanding of the topic.) This dependence of Mt on the past history of play in the
no-feedback case reflects the well-known idea that, in games of private monitoring, players
must rely on their past behavior to forecast others’ private histories; yet, it has important
effects on equilibrium outcomes.
Representation of second-order belief and history-inference effect. Observe that
$\widetilde{M}$ is hidden to the follower: off the path of play, because deviations go undetected; and in
equilibrium, because the leader’s action carries her type. Along the path of play of a linear
strategy, however, one would expect a linear relationship between $\theta$ and $\widetilde{M}$, as the linearity
of (4) suggests. When this is the case, the follower’s (third-order) inference of $\widetilde{M}$ would then
reduce to a function of $M$, and the system of beliefs would “close.”
To this end, suppose that the follower expects that, in equilibrium, $\widetilde{M}$ satisfies
$$\widetilde{M}_t = \left(1 - \frac{\gamma_t}{\gamma^o}\right)\theta + \frac{\gamma_t}{\gamma^o}\mu \qquad (5)$$
when the leader follows a strategy
$$a_t = \beta_{0t}\mu + \beta_{1t}\widetilde{M}_t + \beta_{3t}\theta, \qquad (6)$$
for some deterministic coefficients $\beta_{it}$, $i = 0, 1, 3$ (potentially different from those in the public
case). The representation (5) encodes two ideas. First, there is no second-order uncertainty
at time zero, i.e., $\widetilde{M}_0 = \mu = M_0$; this is obtained by setting $\gamma_0 = \gamma^o$ in the right-hand side
of (5). Second, if enough signaling has taken place, the leader would expect the follower to
have learned the state: $\gamma_t \approx 0$ in the same expression leads to $\widetilde{M}_t \approx \theta$.
How is the follower’s learning, $\gamma_t$, now determined? To simplify notation, let us use
$$\chi := 1 - \frac{\gamma}{\gamma^o}$$
to denote the weight on the type in (5). Inserting this into (6) yields $a_t = [\beta_{0t} + \beta_{1t}(1 - \chi_t)]\mu + [\beta_{3t} + \beta_{1t}\chi_t]\theta$, and so the new signaling coefficient is no longer given by the weight
that the equilibrium strategy directly attaches to the type, $\beta_3$, but instead by
$$\alpha \equiv \beta_3 + \beta_1\chi.$$
We refer to $\beta_1\chi$ as the history-inference effect on signaling. In fact, because the leader
uses her actions to forecast $\widetilde{M}$, the follower needs to infer the leader’s private histories to
extract the correct informational content of the signal Y . However, since higher types take
higher actions due to their static adaptation and future coordination motives—forces that
fully determine behavior in the public case—those types also expect their followers to have
higher beliefs. Alternatively, given a history Y t, consider the impact that a marginal increase
in θ has on the leader’s action: in the public case, the overall effect is β3, as all types agree
on the value that $M$ takes; this is not the case when there is no feedback, as different types
perceive different continuation games via $\widetilde{M}$. We collect these ideas in the next result.
Lemma 1 (Belief Representation). Suppose that the follower expects $a_t = [\beta_{0t} + \beta_{1t}(1 - \chi_t)]\mu + \alpha_t\theta$, where $\alpha = \beta_3 + \beta_1\chi$, $\chi = 1 - \gamma/\gamma^o$, and $\gamma_t := \mathbb{E}_t[(\theta - M_t)^2]$. Then, $\dot{\gamma}_t = -\left(\frac{\gamma_t\alpha_t}{\sigma_Y}\right)^2$.
Moreover, if the leader follows (6), $\widetilde{M}_t = \chi_t\theta + (1 - \chi_t)\mu$ holds at all times.
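The representation can be verified numerically: iterating the leader’s forecast (the follower’s filter with dY replaced by a dt) alongside the variance ODE of Lemma 1 reproduces the convex combination of the type and the prior mean. The constant weights below (chosen to sum to one) are illustrative; equilibrium weights vary over time.

```python
# Numerical check of the belief representation in Lemma 1 under constant,
# illustrative weights (equilibrium weights are time-varying functions).
T, n = 10.0, 50_000
dt = T / n
theta, mu, gamma0, sigma_Y = 2.0, 0.0, 1.0, 1.5
beta0, beta1, beta3 = 0.1, 0.3, 0.6       # hypothetical weights summing to one

gamma, M_tilde = gamma0, mu               # posterior variance; leader's forecast of M
for _ in range(n):
    chi = 1.0 - gamma / gamma0
    alpha = beta3 + beta1 * chi                          # effective signaling weight
    a = beta0 * mu + beta1 * M_tilde + beta3 * theta     # leader's strategy (6)
    # Leader's expectation of the follower's expectation of a_t:
    exp_exp_a = (beta0 + beta1 * (1 - chi)) * mu + alpha * M_tilde
    M_tilde += (alpha * gamma / sigma_Y**2) * (a - exp_exp_a) * dt
    gamma -= (gamma * alpha / sigma_Y) ** 2 * dt         # Lemma 1's variance ODE

chi = 1.0 - gamma / gamma0
gap = abs(M_tilde - (chi * theta + (1 - chi) * mu))
print(gap)                                # ~ 0 up to floating-point error
```

The discrete recursion preserves the identity exactly step by step, so the gap is pure floating-point roundoff; note also that the final effective weight β3 + β1χ exceeds β3, the compounding history-inference effect discussed above.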
The representation of the second-order belief (5) holds only under the linear strategy (6).
More generally, the leader controls M as reflected by (4), and thus (θ,M, µ, t) effectively
summarizes all the payoff-relevant information for the leader’s decision-making.
Proposition 2 (LME—No Feedback Case). For all $r \geq 0$ and $T > 0$:
(i) Existence: there exists a LME. In any such equilibrium: $\beta_0 + \beta_1 + \beta_3 = 1$; $\beta_{3t} > 1/2$,
$t \in [0, T)$; $\beta_{3T} = 1/2$; and $\beta_1 > 0$ over $[0, T]$.
(ii) Signaling coefficient: $\alpha > 1/2$; $\alpha_T \to 1$ as $T \to \infty$; and $\alpha'_t \geq 0$, $t \in [0, T)$, with strict
inequality if and only if $r > 0$.
Thus, in the no-feedback case, the signaling coefficient α behaves radically differently from its public-case counterpart: it is non-decreasing, and its right endpoint becomes asymptotically close to 1 as the length of the interaction increases. See Figure 1.
Figure 1: The signaling coefficients β3 (public case), β3 (no-feedback case), and α (no-feedback case). Left: r = 0; right: r = 1. Other parameter values: γo = 1, σY = 1.5, T = 10.
The reason behind the discrepancy lies in the history-inference effect compounding over
time. In fact, since the leader expects the follower to gradually learn the state as signaling
progresses, M attaches an increasingly higher weight χ to θ in (5). With a positive coordi-
nation motive (β1 > 0), this implies that higher types take higher actions over time via this
second-order belief channel. We conclude that private monitoring generates an interesting
phenomenon whereby standard monotonically decreasing signaling effects under public be-
liefs are more than offset by an increasingly strong informational content in the leader’s past
history of play (except for the r = 0 case, where both forces perfectly offset each other).
2.3 Learning, Coordination, and the Value of Public Information
The fact that the leader has to rely on her private information to coordinate with the follower,
and that this force reinforces the direct signaling effect coming from β3, opens the possibility
for more information to be transmitted in the no-feedback case. At a minimum, αT > βPub3T = 1/2 for all T > 0, so more signaling indeed takes place by the end of the game.
To assess the validity of this conjecture, we take advantage of the model's analytic solutions in the patient (r = 0) and myopic (r = ∞) cases. Let γPub and γNF denote the follower's posterior variance in the public and no-feedback cases, respectively.
Proposition 3 (Learning comparison). For every T > 0:
(i) Patient case: if r = 0, βPub30 > α0 and γPubT > γNFT ;
(ii) Large r case: for every δ ∈ (0, T), γPubt > γNFt for t ∈ [T − δ, T] if r is large enough.
Figure 2: Terminal values of γPub and γNF as functions of r. Parameter values: γo = σY = 1 and T = 4.
Consequently, when the leader is either patient or very impatient, in the no-feedback case
the follower always has a more precise knowledge of the state of the world by the end of the
interaction. Along these lines, part (i) says that, if the leader is patient, this result is non-trivial due to an inter-temporal substitution effect: the leader, anticipating that the history-inference effect will eventually take place, reduces α0 = βNF30 below its public counterpart, βPub30. Part (ii) then states that the fraction of time over which the follower has a more
accurate belief can converge to 1 as r grows large. Figure 2 shows, albeit numerically, that
learning is higher in the no-feedback case for intermediate values of r.
One may be tempted to conjecture that the organization can be better off by isolating
the leader from any information about the follower, as this fosters the latter’s learning.
The caveat is that information transmission happens through actions: a more precise belief
that is the result of more aggressive signaling is necessarily the reflection of more transient
miscoordination, as learning occurs only when Y is informative about the state of the world.
Proposition 4 (Team's ex ante payoffs—public vs. no-feedback).

(i) Patient case: if r = 0, the team's ex ante payoff is larger in the public case for all T > 0.

(ii) Large r case: there is T̄ > 0 such that, for all T > T̄, the team's ex ante flow payoffs are larger in the no-feedback case over [T̄, T] for r sufficiently large.
In the patient case, the team is unequivocally better off in the public case. In particular,
one can show that ex ante coordination costs satisfy

0 < Eθ[∫_0^T [βPub3t(θ − M̂t)]² dt] = −σY² log(γPubT/γo) < −σY² log(γNFT/γo) = Eθ[∫_0^T [αt(θ − M̂t)]² dt].
Thus, the extent of the follower’s learning is effectively a measure of the total coordination
costs incurred by the team, which are larger in the no-feedback case.9 Consequently, an important takeaway of our analysis is that focusing on an outcome measure such as terminal learning γT, without accounting for the specific features of the information channels within firms, can be very misleading when assessing past (or even future) performance: a better understanding of the economic environment can in fact be the reflection of a painful struggle to coordinate actions. Alternatively, our analysis uncovers how intervening on an information channel that does not feed a follower can nonetheless affect his learning via the strategic response of other members of the organization.
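The chain of equalities above can be checked numerically from the variance ODE alone, since the follower's forecast-error variance at t is γt; a minimal sketch, with a hypothetical signaling path αt taken as given:

```python
import math

# Numerical check of the identity  E∫ [alpha_t(theta - M_t)]^2 dt
#   = -sigma_Y^2 * log(gamma_T / gamma^o),
# which follows from dgamma/dt = -gamma^2*alpha^2/sigma_Y^2 and from the
# follower's forecast-error variance being gamma_t. The path alpha_t below
# is hypothetical; the identity holds for any alpha.

def entropy_identity_check(alpha, gamma0=1.0, sigma_Y=1.5, T=10.0, n=200_000):
    dt = T / n
    gamma = gamma0
    cost = 0.0  # accumulates the ex ante coordination cost  ∫ alpha^2 gamma dt
    for i in range(n):
        a = alpha(i * dt)
        cost += a * a * gamma * dt
        gamma += -(gamma * a / sigma_Y) ** 2 * dt   # variance ODE
    rhs = -sigma_Y**2 * math.log(gamma / gamma0)    # entropy-reduction formula
    return cost, rhs

cost, rhs = entropy_identity_check(lambda t: 0.6 + 0.02 * t)
assert abs(cost - rhs) < 1e-3
```

The check makes concrete the takeaway in the text: the extent of the follower's learning (the log term) is exactly the total coordination cost incurred along the way.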
We conclude our analysis by discussing the second part in the proposition, which allows
us to begin talking about the value of better public information and its dependence on
parameters of the model, such as the leader’s degree of patience.
Part (ii) states that, for sufficiently large horizons and discount rates, the organization's continuation payoffs at a time T̄ (independent of r) are ranked in favor of the no-feedback case. This is in fact the result of stronger adaptation by the leader. To see why, observe
9Note that − log(γT /γo) is the entropy reduction in the follower’s belief over [0, T ].
first that a sufficiently long horizon is needed for the history-inference effect to gain strength.
Along these lines, the first row of Figure 3 plots total ex ante coordination and adaptation losses when r = 0: the latter are lower in the no-feedback case for T beyond a threshold.
Figure 3: First row: total adaptation and coordination losses for T ∈ [0, 10] for r = 0. Second row: flow-loss differences (no-feedback minus public) for T = 30 and r ∈ {0, 0.5, 1, +∞}. Other parameters: γo = σY = 1.
The second row plots differences of ex ante coordination and adaptation flow losses be-
tween cases in a large horizon context. In particular, adaptation losses are eventually lower
in the no-feedback case (right panel), and this is more pronounced as r grows. In fact,
recall that a myopic leader attaches a weight of 1/2 to her type at all times in the public
case. By contrast, in the no-feedback case, the history-inference effect continues to operate
if the leader is myopic, allowing α to become arbitrarily close to 1 as T lengthens. Thus, an
impatient leader in the public case simply adapts too little after a threshold.
This analysis of discounting and time-horizon effects has two implications. First, from (ii),
initializing the game with second-order uncertainty can improve upon a public counterpart
with potentially higher uncertainty yet a perfect ability to coordinate; for instance, inten-
sive one-sided communication by a leader, and subsequent leadership by example without
feedback, can dominate two-sided communication followed by a perfect feedback channel.
Second, the additional value that a noiseless feedback channel brings to an organization relative to the no-feedback case is expected to fall with discounting, as the leader is only weakly adapted to the environment when beliefs are public. See Figure 4:
Figure 4: Ex ante total payoffs comparison (public vs. no-feedback) as functions of r. Parameter values: γo = σY = 1 and T = 4.
We have examined how a leader gradually adapts a team to a new economic environ-
ment while controlling the team’s coordination costs, uncovering three sets of results. First,
higher-order uncertainty arising from private monitoring radically affects the way in which a
leader’s actions transmit her private information: the signaling coefficient is non-decreasing,
which is in stark contrast with its public counterpart. Second, the history-inference effect driving the previous result is sufficiently strong to generate more learning on behalf of the follower; such learning, however, is a measure of the coordination costs incurred by the organization. Third, the value of interventions aimed at improving the information flow to leaders depends critically on both horizon effects and discounting: longer interactions combined with leader myopia reduce the value of noiseless information structures.
Critically, this example is just a first attempt at understanding organizations as dynamic
enterprises, where decision makers can signal and learn information at the same time that
decisions are being made. From this standpoint, it is important to recognize that public
signals rarely are perfectly informative or pure noise. Away from those cases, we would
expect informed parties’ forecasts to lie somewhere in between the two extreme cases just
analyzed: i.e., to rely both on input and output measures. This is done in the next section,
where the key will be to derive a generalization of the representation Mt = χtθ + (1 − χt)µ for 0 < σX ≤ ∞.
3 General Model
We consider two-player linear-quadratic-Gaussian games with one-sided private information
and one-sided private monitoring in continuous time. The baseline model considered is
introduced next, and extensions of it are presented in Section 5 via two further applications.
Players, Actions and Payoffs. A forward-looking long-run player (she) and a myopic counterpart (he) interact in a repeated game played continuously over a time interval
[0, T], T < ∞. At each t ∈ [0, T], the long-run player chooses an action at, while the myopic player chooses an action āt, both taking values in the real line. Given a profile of realized actions, (at, āt)t∈[0,T], the long-run player's total payoff is

∫_0^T e−rt U(at, āt, θ) dt. (7)
In this specification, r ≥ 0 is the long-run player's discount rate, U : R3 → R is a quadratic function capturing her flow (i.e., stage-game) utility, and θ denotes the value of a normally distributed random variable with mean µ and variance γo > 0 that parametrizes the economic environment. In turn, the myopic player's stage-game payoff at any time t ≥ 0 is given by

Ū(at, āt, θ) (8)

if (at, āt) was chosen at that time, where Ū : R3 → R is also quadratic.
In what follows, we assume the following properties on the quadratic functions U and Ū (partial derivatives are denoted by subindices):

Assumption 1.

(i) Strict concavity: Uaa = Ūāā = −1;

(ii) Non-trivial signaling: Uaθ(Ūāθ + ŪāaUaθ) > 0;

(iii) Second-order inferences: Uaā ≠ 0 and |Ūāθ| + |Ūāa| ≠ 0;

(iv) Myopic best-replies intersect: UaāŪāa < 1.
We first require that the players' objectives be concave in their respective choice variables; from this perspective, (i) is simply a normalization. A second minimal requirement is that the long-run player strategically cares about θ, which is implied by (ii). Equipped with
this, part (iii) says that second-order inferences are relevant for play: the myopic player’s
first-order belief matters for his behavior—either directly because he cares about θ, or be-
cause he wants to predict the long-run player’s action—and in turn the long-run player wants
to predict the myopic player’s action, invoking a second-order belief.10
The remaining parts are technical conditions pertaining to the one-shot game that arises
at the end of the interaction via the static game with higher-order uncertainty that takes
place at T . Specifically, part (iv) ensures that a static Nash equilibrium always exists, and
part (ii) ensures that any such equilibrium involves non-trivial signaling. We elaborate more
on these conditions when we explain how to find equilibria of the type we are interested in.
10Of course, (iii) is not really a restriction to our analysis, but instead a choice.
Information. The long-run player observes θ before play begins, while the myopic player only knows the distribution θ ∼ N(µ, γo) from which it is drawn (and this is common knowledge). In addition, there are two signals, X and Y, that convey noisy information about the players' actions according to

dXt = āt dt + σX dZXt, (9)
dYt = at dt + σY dZYt, (10)

where ZX and ZY are independent Brownian motions, and σY and σX are strictly positive volatility parameters. In this linear product-structure specification, the signal Y is only observed by the myopic player, while the signal X is public.11
Let Et[·] denote the long-run player's conditional expectation operator, which conditions on the histories (θ, as, Xs : 0 ≤ s ≤ t), t > 0, and on her conjecture of the myopic player's play. Likewise, Êt[·] denotes the myopic player's analog, which conditions on (ās, Xs, Ys : 0 ≤ s ≤ t) and on his conjecture of the long-run player's strategy.
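For concreteness, the monitoring structure (9)–(10) can be simulated on a discrete time grid; a minimal sketch with placeholder constant action paths (the drifts follow the convention that X carries the myopic player's action and Y the long-run player's):

```python
import random

# Discretized simulation of the signals (9)-(10): X is public and carries the
# myopic player's action a_bar; Y is seen only by the myopic player and
# carries the long-run player's action a. The action paths are placeholders.

def simulate_signals(a, a_bar, sigma_X=1.0, sigma_Y=1.0, T=1.0, n=1000, seed=1):
    rng = random.Random(seed)
    dt = T / n
    X = Y = 0.0
    for i in range(n):
        t = i * dt
        X += a_bar(t) * dt + sigma_X * rng.gauss(0.0, dt ** 0.5)  # (9)
        Y += a(t) * dt + sigma_Y * rng.gauss(0.0, dt ** 0.5)      # (10)
    return X, Y

X_T, Y_T = simulate_signals(a=lambda t: 0.5, a_bar=lambda t: -0.5)
```

As the volatilities shrink, the signals converge to the integrated action paths, which is a quick way to verify the discretization.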
Strategies and Equilibrium Concept. To characterize equilibrium outcomes, we focus
on Nash equilibria. From a time-zero perspective, an admissible strategy for the long-run
player is any square-integrable real-valued process (at)t∈[0,T ] that is progressively measurable
with respect to the filtration generated by (θ,X). Similarly, an admissible strategy (at)t∈[0,T ]
for the myopic player satisfies identical integrability conditions, but the measurability re-
striction is with respect to (X, Y ).12
Definition 1 (Nash equilibrium). An admissible pair (at, āt)t≥0 is a Nash equilibrium if,

(i) given (āt)t≥0, the process (at)t≥0 maximizes

E0[∫_0^T e−rt U(at, āt, θ) dt]

among all admissible processes, and

(ii) āt solves max_{a′∈R} Êt[Ū(at, a′, θ)] for all t ∈ [0, T].
In the next section, we characterize Nash equilibria that are supported by linear Markov strategies and that are subgame perfect, i.e., sequentially rational on and off the path
11Thus, flow payoffs do not convey any additional information to the players (i.e., they are either realized after time T, or they can be written in terms of the actions and signals observed by each player).
12Square integrability is in the sense of E0[∫_0^T a²t dt] < +∞ for the long-run player. Such a condition ensures that a strong solution to (9) exists, and thus that the outcome of the game is well defined.
of play. Such equilibria generalize that presented in Section 2 for the no-feedback case to
settings in which 0 < σX ≤ ∞.
Remark 1 (Extensions). The baseline model can be generalized along two dimensions:

(i) Terminal payoffs: terminal payoffs of the form e−rTΨ(aT), with Ψ quadratic, can be added to (7). A reputation model with this property is studied in Section 5.1.

(ii) Long-run player affecting the public signal X: the drift of (9) can be generalized to āt + νat, where ν ∈ [0, 1] is a scalar. An insider trading model involving ν = 1, as well as Uaa = 0 (i.e., linear utility), is explored in Section 5.2.
4 Equilibrium Analysis: Linear Markov Equilibria
To construct linear Markov perfect equilibria (henceforth, LME), we first postulate a minimal
set of belief states up to the second order to be used by the players in any equilibrium of this
kind. We then derive a representation of the long-run player’s second-order belief as a linear
function of a subset of such belief states, when the players use the candidate belief states in
a linear fashion. This result generalizes the representation (5) obtained in section 2.2, and
it circumvents the problem of the set of states growing without bound (Section 4.1).
In Section 4.2 we then turn to setting up the long-run player’s best-response problem,
and elaborate on how the problem of existence of LME reduces to finding solutions to a
boundary-value problem. In Section 4.3, we illustrate two proof techniques that depend on
whether the myopic player’s best response explicitly depends on his belief about the state of
the world or not (common vs. private values environments, respectively). Finally, we obtain
two existence results for LME, each for a variation of the coordination game from Section 2.
4.1 Belief States and Representation Lemma
With linear-quadratic payoffs and signals that are linear in the players’ actions, it is natural
to examine equilibria in which the long-run player conditions on her type θ linearly.
The logic is then analogous to that in Section 2.2. Specifically, since the myopic player
cares about the long-run player’s action (and/or her type) to determine his best response, he
will use Y to learn about θ. Because Y is privately observed, however, the myopic player’s
(first-order) belief about θ is private. The strategic interdependence of the players’ actions in
the long-run player’s payoff then forces the latter agent to forecast the myopic player’s belief,
which leads her second-order belief to become a relevant state. As we demonstrate shortly,
such second-order belief is also private due to its dependence on the long-run player’s type
via her past actions. Thus, the myopic player is forced to perform a non-trivial inference
about such hidden second-order belief, and so forth.
Along the path of play of any pure strategy, however, the outcome of the game should
depend only on the tuple (θ,X, Y ). Intuitively, given any rule that specifies behavior as a
function of past actions and information, the dependence on past play must disappear when
such a rule is followed, thus leading to realized outcomes that depend on the exogenous
elements of the model. In particular, the long-run player’s second-order belief should be
a function of (θ,X) exclusively, which is the only source of information available to her.
Moreover, in this Gaussian environment, one would expect the relationship between M and
(θ,X) to be linear if the rule that drives behavior is linear in some belief states.
Let M̂t := Êt[θ] denote the mean of the myopic player's belief, and let Mt := Et[M̂t] denote the long-run player's second-order counterpart. The previous discussion then suggests the
existence of a deterministic function χ and a process (Lt)t∈[0,T ] that depends on the paths of
the public signal X, such that M admits the representation
Mt = χtθ + (1− χt)Lt (11)
when the players follow linear Markov strategies

at = β0t + β1tMt + β2tLt + β3tθ, (12)
āt = δ0t + δ1tM̂t + δ2tLt, (13)
where the coefficients βit and δjt, i = 0, 1, 2, 3 and j = 0, 1, 2, are deterministic. (We
occasionally use ~β := (β0, β1, β2, β3) and ~δ := (δ0, δ1, δ2) for convenience.) The reason for
augmenting the strategies of Section 2.2 by the public state L is apparent: if (11) holds, the myopic player uses it to forecast M, which means that L becomes a payoff-relevant state for both players.
Lemma 2 below characterizes the pair (χ, L) that validates (11)–(13). Before stating
the result, it is instructive to explain its derivation and introduce some notation. When
the myopic player conjectures that (11)–(12) hold for some "public" process L, he then expects the long-run player's realized actions to follow

at = α0t + α2tLt + α3tθ, where α0t := β0t, α2t := β2t + β1t(1 − χt), and α3t := β3t + β1tχt. (14)
Because L is public, the myopic player can then filter θ from (X, Y ) when Y is driven by (14).
This learning problem is (conditionally) Gaussian, and hence the myopic player’s posterior
belief is fully characterized by a mean process (M̂t)t≥0 and a deterministic variance

γt := V̂art = Êt[(θ − M̂t)²],

where we have omitted the hat symbol in γt for notational convenience. As in Section 2, this
posterior variance will be determined by the signaling coefficient

α3t := β3t + β1tχt,

with β1tχt encoding the history-inference effect: different types are expected to take different actions not only because of their direct signaling incentives (captured by β3) but also because their past actions have led them to hold different beliefs today.
Critically, while the long-run player does not observe M̂, she recognizes that deviations from (14) affect its evolution via Y—thus, her problem is one of stochastic control of an
unobserved state. Given the linear-quadratic payoffs, linear dynamics, and Gaussian noise,
this problem can be recast as one of controlling the long-run player's estimate of M̂—namely, Mt := Et[M̂t]—after appropriately adjusting her flow payoffs via the use of conditional expectations.13 Inserting the general linear Markov strategy (12) into the law of motion of (M̂t)t≥0, and solving for Mt as a function of {θ, (Xs)s<t}, allows us to pin down (χ, L) given the coefficients in the strategies.
Lemma 2 (Representation of second-order belief). Suppose that (X, Y) is driven by (12)–(13) and that the myopic player believes that (11) holds. Then, (11) holds at all times (path-by-path of X) if and only if

dγt/dt = −γt²(β3t + β1tχt)²/σY², t > 0, γ0 = γo, (15)

dχt/dt = γt(β3t + β1tχt)²(1 − χt)/σY² − γtχt²δ1t²/σX², t > 0, χ0 = 0, (16)

dLt = (l0t + l1tLt)dt + BtdXt, t > 0, L0 = µ, (17)

where l0t, l1t, and Bt are given in (B.6)–(B.8). Moreover, Lt = E[M̂t|FXt] = E[θ|FXt] and γtχt = Vart = Et[(M̂t − Mt)²].
The long-run player uses the public signal to learn about the myopic player’s belief. By
13This is the so-called separation principle, which allows one to filter first, and optimize afterwards usingbelief states, in these types of problems. We elaborate more on this topic in the proof of Lemma 4, wherewe derive the laws of motion of the Markov belief states.
the lemma, along the path of (12)–(13), we have that

Mt = (Vart/V̂art)θ + (1 − Vart/V̂art)E[θ|FXt].
Indeed, while learning about M̂ from X, the only informational advantage that the long-run player has relative to an outsider who observes X exclusively is that she knows her type. Due to the Gaussian structure of the model, therefore, (i) Mt is a linear combination of θ and E[M̂t|FXt], and (ii) the weights are deterministic. By the law of iterated expectations, E[M̂t|FXt] = E[θ|FXt], and the representation follows.
Let us now elaborate on the structure of the χ-ODE (16). Recall that the common prior assumption implies that the long-run player knows that M̂0 = µ at time zero: Var0 = 0 then implies M0 = µ in the previous expression, and so the χ-ODE starts at zero. As signaling progresses, however, second-order uncertainty arises due to the long-run player losing track of M̂ (i.e., Vart > 0): this is captured by dχt/dt > 0 as soon as α3 > 0 in (16). In other words, the long-run player expects M̂ to gradually reflect her type θ, and so χt > 0.
Observe that if σX = ∞ (the public signal is infinitely noisy) or δ1 ≡ 0 (the myopic player does not signal back), the public signal is uninformative, so we would expect the long-run player to forecast M̂ solely by input: in fact, Lt = L0 = µ and χt = 1 − γt/γo hold in this case, exactly as in Section 2.14 Otherwise, she also forecasts by "output," as reflected in the dependence of L on X. Conversely, as δ1²/σX² grows, there is more downward pressure on the growth of χ: as the signal-to-noise ratio in X improves, the long-run player relies less on her past actions to forecast M̂, everything else held constant. Thus, the no-feedback case maximizes the potential impact of second-order belief effects on behavior.
Our subsequent analysis takes the system (15)–(16) as an input. Thus, we require it to have a unique solution over [0, T] so as to ensure that the ODE characterization is valid. To this end, notice that the myopic player's best reply can be written as δ1t := ūθ + ūa[β3t + β1tχt], where ūθ = Ūāθ and ūa = Ūāa are real numbers.
Lemma 3. Suppose that β1 and β3 are continuous, β3t ≠ 0 for all t, and δ1t = ūθ + ūa[β3t + β1tχt]. Then, there is a unique solution to (15)–(16). Such a solution satisfies 0 < γt ≤ γo and 0 < χt < 1, t ∈ (0, T].
The idea is that, under the conditions in the lemma, (γ, χ) is bounded, and hence a
solution to the system exists over [0, T ] (as solutions to ODE systems either exist or explode
over unrestricted domains). Since the system is locally Lipschitz continuous, uniqueness
14Setting δ1/σX ≡ 0 in (16) leads to the same ODE that χ satisfies in the no-feedback case. By uniqueness, that solution is χt = 1 − γt/γo. See the proof of Lemma 1.
ensues; in particular, γt = Êt[(θ − Êt[θ])²] and χt = Vart/V̂art = Et[(M̂t − Mt)²]/γt.
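The bounds in Lemma 3 can be checked by integrating (15)–(16) forward; a minimal sketch, with constant hypothetical coefficients (beta1, beta3, delta1) standing in for the time-varying equilibrium paths:

```python
# Forward integration of the system (15)-(16) from Lemma 2. The constant
# coefficients beta1, beta3, delta1 below are hypothetical stand-ins for the
# equilibrium paths, so this is only an illustrative sketch.

def integrate_gamma_chi(beta1, beta3, delta1, gamma0=1.0,
                        sigma_Y=1.0, sigma_X=1.0, T=4.0, n=40_000):
    dt = T / n
    g, c = gamma0, 0.0   # initial conditions: gamma_0 = gamma^o, chi_0 = 0
    for _ in range(n):
        a3 = beta3 + beta1 * c                      # signaling coefficient (14)
        dg = -(g * a3 / sigma_Y) ** 2                                   # (15)
        dc = (g * a3**2 * (1 - c) / sigma_Y**2
              - g * c**2 * delta1**2 / sigma_X**2)                      # (16)
        g, c = g + dg * dt, c + dc * dt
    return g, c

gamma_T, chi_T = integrate_gamma_chi(beta1=0.3, beta3=0.6, delta1=0.5)
assert 0 < gamma_T <= 1.0 and 0 < chi_T < 1   # bounds in Lemma 3
```

Setting delta1 = 0 recovers the no-feedback relation χt = 1 − γt/γo from Section 2, which serves as a useful consistency check on the discretization.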
The belief representation (11) relies on the long-run player following the linear strategy
(12); i.e., it does not hold off the path of play. In fact, as argued earlier, (Mt)t≥0 is controlled
by the long-run player, a phenomenon that is the consequence of the private monitoring
present in the model: past play is used for forecasting the myopic player’s private histories,
and so different actions yield different perceptions of the continuation game as measured by
M . Moreover, because such deviations are hidden, from the long-run player’s perspective, the
myopic player is always assuming that (11) holds—thus, the pair (γ, χ) affects the evolution
of M in the myopic player’s learning process, and hence the evolution of M . The next result
introduces the law of motion of M and L for an arbitrary strategy of the long-run player,
which will allow us to state her best-response problem.
Lemma 4. Suppose that the long-run player follows (a′t)t≥0 while the myopic player follows (13) and believes (11)–(12). Then, from the long-run player's perspective,

dMt = (γtα3t/σY²)(a′t − [α0t + α2tLt + α3tMt])dt + (χtγtδ1t/σX)dZt, (18)

dLt = (χtγtδ1t/(σX²(1 − χt)))[δ1t(Mt − Lt)dt + σXdZt], (19)

where (γ, χ) solves (15)–(16) and (Zt)t≥0 is a Brownian motion from her standpoint.
The dynamics (18) show that the long-run player's choice of strategy a′ affects M. In particular, she will update her belief upward when a′t > α0t + α2tLt + α3tMt, i.e., when she exceeds her own expectation of the myopic player's belief about her behavior. The intensity of such a reaction is given by γtα3t/σY²: more uncertainty (higher γ) and stronger signaling (larger α3) make the long-run player's belief more sensitive to her own actions. Further, M evolves deterministically when δ1/σX ≡ 0.15
The drift of (19) demonstrates that the long-run player affects L only indirectly via
changes in M , due to her action not entering the public signal directly. Further, the drift
captures that the belief of an outsider who only observes X always moves in the direction of
M on average, reflecting that such an outsider learns the long-run player’s type. From this
perspective, by leading to Lt = µ at all times, the no-feedback case (σX =∞) misses a mild
signal-jamming effect—the ability to influence a public belief, albeit only indirectly.
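As an illustration, the controlled dynamics (18)–(19) can be simulated with an Euler–Maruyama scheme; the constant coefficients and the deviation path a_dev below are hypothetical:

```python
import random

# Euler-Maruyama simulation of the belief dynamics (18)-(19) for an arbitrary
# deviation strategy a_dev, under hypothetical constant coefficients.

def simulate_beliefs(a_dev, gamma, chi, alpha0, alpha2, alpha3, delta1,
                     sigma_X=1.0, sigma_Y=1.0, M0=0.0, L0=0.0,
                     T=1.0, n=1000, seed=0):
    rng = random.Random(seed)
    dt = T / n
    M, L = M0, L0
    for i in range(n):
        dZ = rng.gauss(0.0, dt ** 0.5)   # common Brownian increment in (18)-(19)
        drift_M = (gamma * alpha3 / sigma_Y**2
                   * (a_dev(i * dt) - (alpha0 + alpha2 * L + alpha3 * M)))
        vol = chi * gamma * delta1 / sigma_X
        M_new = M + drift_M * dt + vol * dZ
        L += (vol / (sigma_X * (1 - chi))) * (delta1 * (M - L) * dt + sigma_X * dZ)
        M = M_new
    return M, L

# With delta1 = 0, the public signal carries no information about the myopic
# player's belief: L stays at L0 and M follows its drift deterministically.
M, L = simulate_beliefs(lambda t: 1.0, gamma=0.8, chi=0.4,
                        alpha0=0.0, alpha2=0.2, alpha3=0.7, delta1=0.0)
assert L == 0.0 and M > 0.0
```

The final two lines verify the remark above that M evolves deterministically (and L remains constant) when δ1/σX ≡ 0.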
Finally, the full-support monitoring and linear-quadratic structure, along with the (equilibrium) representation (11), make it clear that (t, θ, Lt, Mt) and (t, Lt, M̂t) summarize all
15It is worth noting that (Mt)t≥0 corresponds to a player's non-trivial belief that is controlled by the same player. Unless there are experimentation effects, players' own beliefs are usually affected by other players.
the payoff-relevant information for our players.16 In this vein, the time variable captures both time-horizon effects and the learning effects via γ and χ.
4.2 Dynamic Programming and the Boundary-Value Problem
The long-run player’s best-response problem. Given a conjecture ~β by the myopic
player, the coefficients ~δ will be such that
at := δ0t + δ1tMt + δ2tLt = arg maxa′
Et[U(α0t + α2tLt + α3tθ, a′, θ)]. (20)
Because U is quadratic, the long-run player's best-response problem is, up to a constant,

max_{(at)t∈[0,T]} E0[∫_0^T e−rt U(at, δ0t + δ1tMt + δ2tLt, θ) dt] s.t. (18) and (19),

where ~δ satisfies (20). Observe that we have replaced M̂ by M in the flow by means of Et[M̂t²] = Mt² + χtγt, and then using that χtγt is deterministic.
We can now define the notion of a linear Markov perfect equilibrium (LME).
Definition 2 (Linear Markov Perfect Equilibrium). A Nash equilibrium (at, āt)t≥0 is a Linear Markov Equilibrium (LME) if there are deterministic coefficients (~β, ~δ) such that āt satisfies (20) and at = α0t + α2tLt + α3tθ, where: (i) (Lt)t≥0 evolves as in (17); (ii) ~α satisfies (14); and (iii) β0 + β1M + β2L + β3θ is an optimal policy for the long-run player.
The natural approach for establishing the existence of LME is via dynamic programming.
Specifically, we postulate a quadratic value function

V(θ, m, ℓ, t) = v0t + v1tθ + v2tm + v3tℓ + v4tθ² + v5tm² + v6tℓ² + v7tθm + v8tθℓ + v9tmℓ,

where the vit, i = 0, . . . , 9, depend on time only. In turn, the HJB equation is
rV = sup_{a′} { U(a′, āt, θ) + Vt + µM(a′)Vm + µLVℓ + (σM²/2)Vmm + σMσLVmℓ + (σL²/2)Vℓℓ },

where µM(a′) and µL (respectively, σM and σL) are the drifts (respectively, volatilities) in the laws of motion for M and L given in Lemma 4, and where āt is determined by ~β and
16We have focused on the long-run player exclusively. While deviations by the myopic player do affect L, the same assumptions (i.e., linear-quadratic structure and undetectable deviations) make his flow payoff fully determined by the current value of (t, L, M̂) after all private histories.
χ via (20). A LME with coefficients ~β for the long-run player is obtained when the linear
Markov strategy (12) is an optimal policy for the previous HJB equation.
The boundary-value problem. We briefly explain how to obtain a system of ordinary
differential equations (ODEs) for ~β. Letting a(θ, m, ℓ, t) denote the maximizer of the right-hand side in the HJB equation, the first-order condition (FOC) reads

Ua(a(θ, m, ℓ, t), δ0t + δ1tm + δ2tℓ, θ) + (γtα3t/σY²)[v2t + 2v5tm + v7tθ + v9tℓ] = 0, (21)

where the bracketed term is Vm(θ, m, ℓ, t), and γtα3t/σY² in the second term captures the sensitivity of M to the long-run player's action at time t. Solving for a(θ, m, ℓ, t) in the previous FOC, the equilibrium condition becomes a(θ, m, ℓ, t) = β0t + β1tm + β2tℓ + β3tθ.
Because the latter condition is a linear equation, we can solve for (v2, v5, v7, v9) as a
function of the coefficients ~β. Inserting these into the HJB equation along with a(θ,m, `, t) =
β0t + β1tm+ β2t`+ β3tθ in turn allows us to obtain a system of ODEs that the ~β coefficients
must satisfy. The resulting system is coupled with the ODEs that v6 and v8 satisfy (and that
are obtained from the HJB equation): since M feeds into L, the envelope condition with
respect to M is not enough to determine equations for the candidate equilibrium coefficients.
Finally, since the pair (γ, χ) affects the law of motion of (M,L), it also affects the evolution
of (~β, v6, v8), and so the ODEs (15)–(16) must be considered.
The boundary conditions for the system of ODEs that (β0, β1, β2, β3, v6, v8, γ, χ) satisfies
are as follows. First, there are the exogenous initial conditions that γ and χ satisfy, i.e.,
γ0 = γo > 0 and χ0 = 0. Second, there are terminal conditions v6T = v8T = 0 due to
the absence of a lump-sum terminal payoff in the long-run player’s problem. Third, more
interestingly, there are endogenous terminal conditions that are determined by the static
Nash equilibrium that arises from myopic play at time T. In fact, letting

Uaθ = uθ, Uaā = ua, and Ua(0, 0, 0) = u0,

and analogously for the myopic player via the substitution (·) ↔ (·̄), we obtain

β0T = (u0 + uaū0)/(1 − uaūa), β1T = ua[uθūa + ūθ]/(1 − uaūaχT), β2T = ua²ūa[uθūa + ūθ](1 − χT)/((1 − uaūa)(1 − uaūaχT)), β3T = uθ, (22)
which are well-defined thanks to (iv) in Assumption 1 and χT ∈ (0, 1).17
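As a consistency check on (22)—writing ū0, ūa, ūθ as u0b, uab, uthetab—the terminal coefficients form a fixed point of the static mutual best replies implied, under our reading of the notation, by the quadratic payoffs and the normalization in Assumption 1(i); the numerical values below are arbitrary:

```python
# Sanity check of the terminal conditions (22): together with the delta's of
# footnote 17, the coefficients should satisfy the static best-reply relations
#   beta0 = u0 + ua*delta0,  beta1 = ua*delta1,  beta2 = ua*delta2,
# which follow from the quadratic FOCs with U_aa = -1 (an assumption of this
# sketch, not a statement from the paper).

def terminal_coefficients(u0, ua, utheta, u0b, uab, uthetab, chi):
    beta0 = (u0 + ua * u0b) / (1 - ua * uab)
    beta1 = ua * (utheta * uab + uthetab) / (1 - ua * uab * chi)
    beta2 = (ua**2 * uab * (utheta * uab + uthetab) * (1 - chi)
             / ((1 - ua * uab) * (1 - ua * uab * chi)))
    beta3 = utheta
    # Myopic player's terminal coefficients (footnote 17):
    delta0 = u0b + uab * beta0
    delta1 = uthetab + uab * (beta3 + beta1 * chi)
    delta2 = uab * (beta2 + beta1 * (1 - chi))
    return (beta0, beta1, beta2, beta3), (delta0, delta1, delta2)

(b0, b1, b2, b3), (d0, d1, d2) = terminal_coefficients(
    u0=0.1, ua=0.5, utheta=1.0, u0b=0.0, uab=0.4, uthetab=0.8, chi=0.3)
assert abs(b0 - (0.1 + 0.5 * d0)) < 1e-12
assert abs(b1 - 0.5 * d1) < 1e-12
assert abs(b2 - 0.5 * d2) < 1e-12
```

The identities hold for any parameter values satisfying Assumption 1(iv) and χ ∈ (0, 1), which is what makes (22) the unique candidate terminal condition.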
We conclude that b := (β0, β1, β2, β3, v6, v8, γ, χ)′ satisfies a boundary-value problem
17From here, δ0T = ū0 + ūaβ0T, δ1T = ūθ + ūa[β3T + β1TχT], and δ2T = ūa[β2T + β1T(1 − χT)].
(BVP) of the form

dbt/dt = f(bt), s.t. D0b0 + DTbT = (B(χT)′, γo, 0)′, (23)

where (i) f : R6 × R+ × [0, 1) → R8, (ii) D0 and DT are the diagonal matrices

D0 = diag(0, 0, 0, 0, 0, 0, 1, 1) and DT = diag(1, 1, 1, 1, 1, 1, 0, 0),

and (iii)

χ ↦ B(χ) := ((u0 + uaū0)/(1 − uaūa), ua[uθūa + ūθ]/(1 − uaūaχ), ua²ūa[uθūa + ūθ](1 − χ)/((1 − uaūa)(1 − uaūaχ)), uθ, 0, 0) ∈ R6. (24)
The general expression for f(·) given any pair (U, Ū) satisfying Assumption 1 is tedious and long, and can be found in spm.nb on our websites. In the next section, we provide examples that exhibit all the relevant properties that any such f(·) can satisfy.
The question of finding LME is then reduced to finding solutions to the BVP (23) (subject
to the rest of the coefficients of the value function being well defined). We turn to this issue
in the next section.
4.3 Existence of Linear Markov Equilibria: Interior Case
In this section, we present two existence results for LME in the case $\sigma_X \in (0, \infty)$: one for the application introduced in Section 2, and a second for a variation of it in which the follower (i.e., the myopic player) cares about both matching the leader's action and matching her type. We accomplish this by proving the existence of a solution to the BVP that arises in each setting, for the case in which the leader is patient (i.e., r = 0); the applicability of the methods is, however, more general (both in terms of the flow payoffs and time preferences).
The problem of finding a solution to any instance of the BVP (23) is complex because there are multiple ODEs running in each direction: $(\beta_0, \beta_1, \beta_2, \beta_3, v_6, v_8)$ are traced backward from their (endogenous) terminal values, while $(\gamma, \chi)$ are traced forward from their (exogenous) initial ones (see Figure 5). In practice, this implies that traditional "shooting methods" can become severely complicated. Specifically, when constructing, say, a modified backward initial value problem (IVP) in which $(\gamma, \chi)$ has a parametrized initial condition at T, the requirement becomes that the chosen parameters induce terminal values at time 0 that exactly match $(\gamma^o, 0)$. With more than one variable, however, this method essentially requires accurate knowledge of the relationship between γ and χ at T for all possible coefficients $\vec\beta$: only then can we trace the parametrized values over a region of initial (time-T) values in a way that ensures the target is hit.
Figure 5: In the BVP, $(\gamma, \chi)$ has initial conditions, while $(\vec\beta, v_6, v_8)$ has terminal ones.
The reason behind this dimensionality problem is the asymmetry of the environment: the rate at which the long-run player signals her private information, $\alpha_3 := \beta_3 + \chi\beta_1$, can be substantially different from the rate at which the myopic player signals his private belief, $\delta_1$. This, in turn, potentially introduces a non-trivial history dependence between γ and χ, reflected in the coupled system of ODEs they satisfy. Two natural questions then arise: first, under which conditions can this history dependence be simplified; and second, how to tackle the existence of LME when such a simplification is not possible.
Private values: one-dimensional shooting. We say that an environment is one of private values if the myopic player's flow utility satisfies

$$\bar u_\theta := \bar U_{\bar a\theta} = 0,$$

i.e., the myopic player's best reply does not directly depend on his belief about θ, but only indirectly via the long-run player's action. Otherwise, we say that the environment is one of common values (despite the long-run player always knowing θ).
In a private-values setting, the myopic player's coefficient on M is $\delta_1 = \bar u_a \alpha_3$. In this case, there is a one-to-one mapping between γ and χ:

Lemma 5. Set $\sigma_X \in (0, \infty)$. Suppose that $\beta_1$ and $\beta_3$ are continuous and that $\delta_1 = \bar u_a \alpha_3$. If $\bar u_a \neq 0$, there are positive constants $c_1$, $c_2$ and d, independent of $\gamma^o$, such that

$$\chi_t = \frac{c_1 c_2\big(1 - [\gamma_t/\gamma^o]^d\big)}{c_1 + c_2[\gamma_t/\gamma^o]^d}.$$

Moreover, (i) $0 \le \chi_t < c_2 < 1$ for all $t \in [0, T]$, and (ii) $c_2 \to 0$ as $\sigma_X \to 0$ and $c_2 \to 1$ as $\sigma_X \to \infty$. If instead $\bar u_a = 0$, then $\chi_t = 1 - \gamma_t/\gamma^o$ and $c_2 = 1$.
It is easy to see that the right-hand side of the expression for χ in the previous lemma is strictly decreasing in $\gamma_t$. Consequently, when the ratio of the signaling coefficients is constant, the dimensionality of the (backward) shooting problem is reduced to a single variable. The lemma also states that, as long as $\sigma_X < \infty$, χ is always strictly below 1, reflecting that the scope for the history-inference effect is diminished relative to the no-feedback case. Further, the characterization of χ obtained in the no-feedback case, (5), is recovered when $\bar u_a = 0$, as the public signal is then uninformative.
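The map in Lemma 5 is easy to inspect numerically; the constants $c_1, c_2, d$ below are hypothetical stand-ins:

```python
def chi_of_gamma(gamma, gamma0, c1, c2, d):
    """Lemma 5 map from the current posterior variance gamma_t to chi_t."""
    x = (gamma / gamma0) ** d
    return c1 * c2 * (1.0 - x) / (c1 + c2 * x)

c1, c2, d, g0 = 1.3, 0.6, 2.0, 1.0   # illustrative constants only
grid = [g0 * k / 100.0 for k in range(1, 101)]
vals = [chi_of_gamma(g, g0, c1, c2, d) for g in grid]

assert vals[-1] == 0.0                              # chi = 0 when gamma_t = gamma^o
assert all(u > v for u, v in zip(vals, vals[1:]))   # strictly decreasing in gamma_t
assert all(0.0 <= v < c2 for v in vals)             # bounded above by c2 < 1
```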
Thanks to the previous lemma, the standard shooting method based on the continuity of the solutions is applicable. We state below the BVP for the leading-by-example application of Section 2 when $\sigma_X \in (0, \infty)$, in its undiscounted version. Recall that in that setting the follower wants to match the leader's action, and so

$$\bar a_t = E_t[a_t] \;\Rightarrow\; \delta_{1t} = \alpha_{3t} \;\Leftrightarrow\; \bar u_a = 1.$$

(Since scaling U and $\bar U$ each by a constant does not alter incentives, the ODEs below are obtained under $U(a, \bar a, \theta) = -(\theta - a)^2 - (a - \bar a)^2$ and $\bar U(a, \bar a, \theta) = -(a - \bar a)^2$, as opposed to $[-(\theta - a)^2 - (a - \bar a)^2]/4$ and $-(a - \bar a)^2/2$, which would yield (i) in Assumption 1.) We omit the $\beta_0$-ODE, which is uncoupled from the rest and linear in itself:
$$\dot v_{6t} = \beta_{2t}^2 + 2\beta_{1t}\beta_{2t}(1-\chi_t) - \beta_{1t}^2(1-\chi_t)^2 + \frac{2 v_{6t}\alpha_{3t}^2\gamma_t\chi_t}{\sigma_X^2(1-\chi_t)}$$

$$\dot v_{8t} = -2\beta_{2t} - 2(1-2\alpha_{3t})\beta_{1t}(1-\chi_t) - 4\beta_{1t}^2\chi_t(1-\chi_t) + \frac{v_{8t}\alpha_{3t}^2\gamma_t\chi_t}{\sigma_X^2(1-\chi_t)}$$

$$\dot\beta_{1t} = \frac{\alpha_{3t}\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{2\sigma_X^2(\alpha_{3t}-\beta_{1t})\beta_{1t}(1-\chi_t) - \alpha_{3t}^2\beta_{1t}\gamma_t\chi_t v_{8t} - 2\sigma_Y^2\alpha_{3t}\chi_t\big(\beta_{2t} - \beta_{1t}[1-\chi_t-2\beta_{2t}\chi_t]\big)\Big\}$$

$$\dot\beta_{2t} = \frac{\alpha_{3t}\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{2\sigma_X^2\beta_{1t}^2(1-\chi_t)^2 + 2\sigma_Y^2\alpha_{3t}\beta_{2t}\chi_t^2(1-2\beta_{2t}) - \alpha_{3t}^2\gamma_t\chi_t(2v_{6t} + \beta_{2t}v_{8t})\Big\}$$

$$\dot\beta_{3t} = \frac{\alpha_{3t}\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{-2\sigma_X^2\beta_{1t}(1-\chi_t)\beta_{3t} + 2\sigma_Y^2\alpha_{3t}\beta_{2t}\chi_t^2(1-2\beta_{3t}) - \alpha_{3t}^2\beta_{3t}\gamma_t\chi_t v_{8t}\Big\}$$

$$\dot\gamma_t = -\frac{\gamma_t^2\alpha_{3t}^2}{\sigma_Y^2}$$

with boundary conditions $v_{6T} = v_{8T} = 0$, $\beta_{1T} = \frac{1}{2(2-\chi_T)}$, $\beta_{2T} = \frac{1-\chi_T}{2(2-\chi_T)}$, $\beta_{3T} = \frac{1}{2}$ and $\gamma_0 = \gamma^o$, and where $\alpha_3 := \beta_3 + \beta_1\chi$ and $\chi_t$ is as in the previous lemma. We have the following:
Theorem 1. Let $\sigma_X \in (0, \infty)$ and r = 0. Then, there exists a strictly positive function $T(\gamma^o) \in O(1/\gamma^o)$ such that, for all $T < T(\gamma^o)$, there exists a LME based on the solution to the previous BVP. In that equilibrium, $\beta_{0t} = 0$, $\beta_{1t} + \beta_{2t} + \beta_{3t} = 1$ and $\alpha_{3t} > 0$ for all $t \in [0, T]$.
The key step behind the proof is to show that $(\beta_1, \beta_2, \beta_3, v_6, v_8, \gamma)$ can be bounded uniformly over $[0, T(\gamma^o))$, for some $T(\gamma^o) > 0$, when $\gamma_t \in [0, \gamma^o]$ at all times. For a given $T < T(\gamma^o)$, therefore, tracing the (parametrized) initial condition of γ in the (backward) IVP from 0 upwards, as depicted schematically in Figure 6, will lead to at least one γ-path landing exactly at $\gamma^o$ (while the rest of the ODEs still admit solutions), due to the continuity of the solutions with respect to the initial conditions.18
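To illustrate the shooting logic concretely, the sketch below runs the one-dimensional procedure on a stand-in system, the undiscounted public-case ODEs from Appendix A (with $\beta_1 = 1 - \beta_3$), rather than the interior system itself: the terminal value of γ is parametrized and bisected until the backward path lands exactly at $\gamma^o$. All parameter values are illustrative:

```python
def rk4_final(f, y0, t1, n=400):
    """Integrate dy/dt = f(y) on [0, t1] with classical RK4; return y(t1)."""
    h = t1 / n
    y = list(y0)
    for _ in range(n):
        k1 = f(y)
        k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h * (a + 2 * b + 2 * c + d) / 6
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

sY2, g0, T = 1.0, 0.8, 2.0   # illustrative sigma_Y^2, gamma^o, horizon

def backward_rhs(y):
    # time-reversed undiscounted system, y = (beta3, gamma), with beta1 = 1 - beta3
    b3, g = y
    return [b3 * b3 * (1 - b3) * g / sY2, (b3 * g) ** 2 / sY2]

def gamma_at_0(gF):
    # start at "time T" (backward time 0) with beta3(T) = 1/2, gamma(T) = gF
    return rk4_final(backward_rhs, [0.5, gF], T)[1]

# shoot on the parametrized terminal value gF until the path lands at gamma^o
lo, hi = 0.0, g0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if gamma_at_0(mid) < g0 else (lo, mid)
gF = 0.5 * (lo + hi)
assert abs(gamma_at_0(gF) - g0) < 1e-6   # the target gamma^o is hit
```

Bisection suffices here precisely because, by Lemma 5's dimension reduction, only one terminal value has to be traced.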
Figure 6: The one-dimensional shooting method.
As expected, the signaling coefficient in the interior cases lies “in between” those found
in the extreme cases of Section 2. Graphically for the r = 0 case:19
[Two panels: the signaling coefficients $\beta_3$ (public), α (no feedback), and α (interior) over $t \in [0, 10]$, for $\sigma_X = 0.1$ and $\sigma_X = 0.75$.]
Common-value settings: fixed-point methods. When $\alpha_3$ and $\delta_1$ cease to be proportional, χ can depend on both current and past values of γ at all points in time (as is the case in most coupled-ODE systems). The multi-dimensionality problem reappears.

18 See Bonatti et al. (2017) for an application of this method to a symmetric oligopoly model featuring dispersed fixed private information, imperfect public monitoring, and multiple long-run players.
19 In the discounted case, one can instead work with the 'forward-looking' component of $(\beta_1, \beta_2, \beta_3, v_6, v_8)$, defined as these dynamic coefficients net of their myopic counterparts (given the learning induced by the dynamic strategy). This forward-looking system eliminates a component, linear in r, that is present in the system that $(\beta_1, \beta_2, \beta_3, v_6, v_8)$ satisfies and absent in the undiscounted version.
[Two panels: the signaling coefficients $\beta_3$ (public), α (no feedback), and α (interior) over $t \in [0, 10]$, for $\sigma_X = 2$ and $\sigma_X = 10$.]

Figure 7: As $\sigma_X$ ranges from 0 to $+\infty$, the signaling coefficient starts close to the public benchmark and gradually approaches the no-feedback counterpart.
Observe that finding a solution to any given instance of the BVP (23) is, mathematically,
a fixed-point problem. Specifically, notice that the static Nash equilibrium at time T depends
on the value that χ takes at that point. The latter value, however, depends on how much
signaling has taken place along the way, i.e., on values of the coefficients ~β at times prior
to T . Those values, in turn, depend on the value of the equilibrium coefficients at T by
backward induction, and we are back to the same point where we started.
Our approach therefore applies a fixed-point argument adapted from the literature on
BVPs with intertemporal linear constraints (Keller, 1968) to our problem with intratemporal
nonlinear constraints. Because the method is novel and has the generality required to become
useful in other asymmetric settings, we briefly elaborate on how it works.20
Let $t \mapsto b_t(s, \gamma^o, 0)$ denote the solution to the forward IVP version of (23) when the initial condition is $(s, \gamma^o, 0)$, $s \in \mathbb{R}^6$, provided a solution exists. From Lemma 3, the last two components of b, namely γ and χ, always admit solutions as long as the others do; moreover, there are no constraints on their terminal values. Thus, for the fixed-point argument, we can focus on the first six components of $b := (\beta_0, \beta_1, \beta_2, \beta_3, v_6, v_8, \gamma, \chi)$ by defining the gap function

$$g(s) = B(\chi_T(s, \gamma^o, 0)) - D_T \int_0^T f(b_t(s, \gamma^o, 0))\,dt.$$

This function measures the distance between the total growth of $(\beta_0, \beta_1, \beta_2, \beta_3, v_6, v_8)$ (the last term in the display) and its target value, $B(\chi_T(s, \gamma^o, 0))$. By (24), B(χ) is nonlinear: the static Nash equilibrium imposes nonlinear relationships across variables at time T.21

20 Our approach is inspired by Theorem 1.2.7 in Keller (1968), the proof of which is not provided there.
21 The function g takes only the first six components of b because there are no "shooting" constraints on γ and χ. Yet one is not really dispensing with (γ, χ), as this pair does affect $(\beta_0, \beta_1, \beta_2, \beta_3, v_6, v_8)$.
Critically, using that $b_0(s, \gamma^o, 0) = s$ by definition, it follows that

$$g(s) = s \;\Longleftrightarrow\; B(\chi_T(s, \gamma^o, 0)) = s + D_T \int_0^T f(b_t(s, \gamma^o, 0))\,dt = D_T b_T(s, \gamma^o, 0),$$

where the last equality follows from the definition of the ODE system that $D_T b$ satisfies. Thus, the shooting problem (i.e., find s such that $B(\chi_T(s, \gamma^o, 0)) = D_T b_T(s, \gamma^o, 0)$) can be transformed into one of finding a fixed point of the function g.22
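The following toy sketch mimics the construction on a two-variable system with invented dynamics and an invented nonlinear terminal condition B(·): the gap function g is assembled from a forward IVP, and simple fixed-point iteration on g recovers the initial value at which the shooting target is hit:

```python
def integrate(s, g0, T, n=500):
    """Forward Euler pass returning (beta_T, gamma_T, total growth of beta)."""
    h = T / n
    beta, gam, growth = s, g0, 0.0
    for _ in range(n):
        db = -0.3 * beta * gam          # hypothetical beta-ODE
        dg = -0.2 * gam * beta ** 2     # hypothetical gamma-ODE
        growth += h * db
        beta += h * db
        gam += h * dg
    return beta, gam, growth

def B(gam):
    return 1.0 / (2.0 - gam)            # hypothetical nonlinear terminal condition

def g(s, g0=0.8, T=1.0):
    betaT, gamT, growth = integrate(s, g0, T)
    return B(gamT) - growth             # gap function: target minus total growth

s = B(0.0)                              # center of the candidate set S
for _ in range(200):                    # fixed-point iteration s -> g(s)
    s = g(s)

betaT, gamT, growth = integrate(s, 0.8, 1.0)
assert abs(g(s) - s) < 1e-8             # s is (numerically) a fixed point of g
assert abs(betaT - B(gamT)) < 1e-8      # equivalently, the shooting target is hit
```

At the fixed point, $s$ plus the total growth equals $\beta_T$, which in turn equals the target $B(\gamma_T)$; this is exactly the equivalence displayed above.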
The bulk of the method then consists of finding a time $T(\gamma^o)$ and a compact set S of values for s such that (i) for all s ∈ S, a unique solution $(b_t(s, \gamma^o, 0))_{t \in [0, T(\gamma^o)]}$ exists for the IVP with initial condition $(s, \gamma^o, 0)$, and (ii) g is a continuous map from S to itself. The natural choice for S is a ball centered around $s_0 := B(0)$, the terminal condition of the trivial game with T = 0. With this in hand, part (i) can be accomplished by bounding the solutions uniformly as in the one-dimensional shooting method, but now over $[0, T(\gamma^o)] \times S$. In turn, the continuity requirement in (ii) is guaranteed if the system of ODEs has enough regularity, while the self-map condition can be ensured because the system scales with $\gamma^o$ and T.23
We can now establish our main existence result for a variation of the leading-by-example application in which the follower's best reply is given by

$$\bar a_t = \bar u_\theta E_t[\theta] + E_t[a_t] \;\Rightarrow\; \delta_{1t} = \bar u_\theta + \alpha_{3t}, \quad \text{where } \bar u_\theta > 0.$$

(The positivity constraint ensures that (ii) in Assumption 1 is satisfied.24) The associated BVP is given by (B.32)–(B.38) in the Appendix.
Theorem 2. Set $\sigma_X \in (0, \infty)$, $\bar u_\theta > 0$ and r = 0 in the leadership model. Then, there is a strictly positive function $T(\gamma^o) \in O(1/\gamma^o)$ such that if $T < T(\gamma^o)$, there exists a LME based on the BVP (B.32)–(B.38). In such an equilibrium, $\alpha_3 > 0$.
22 A BVP with intertemporal linear constraints differs from ours in that $D_0 b_0 + D_T b_T = (B(\chi_T)', \gamma^o, 0)'$ becomes $Ab_0 + Bb_T = \alpha$, where A and B are not necessarily diagonal matrices and α is a constant vector. Thus, unlike in our analysis, one may not be able to dispense with a subset of the system (even if the associated ODEs can be shown to exist independently from the rest): when A and B are not diagonal, the analog of g(·) may carry constraints on all coordinates. A complication that arises in our setting, however, is that our version of α is a nonlinear function of a subset of components of $b_T$. This requires estimating $B(\chi_T(s, \gamma^o, 0))$ for all values of s over which g(·) must be shown to be a self-map.
23 In general, it is more useful to work with a change of variables that eliminates $1 - \chi_t$ from the denominators in the system, and which reflects play when the state variable L is replaced by $(1 - \chi)L$. Having shown existence for the associated BVP in this case, we can then recover a solution to our original BVP by reversing the change of variables and applying Lemma 3 (which ensures that $1 - \chi_t > 0$ for all $t \in [0, T]$, and hence that the right-hand side of our system of interest is well defined). This approach avoids the unnecessary task of finding a uniform upper bound for χ strictly less than 1, which would otherwise be required when bounding the system uniformly. In all cases, $\gamma_t \in [0, \gamma^o]$ because the IVP under consideration is in its forward version (Lemma 3).
24 Since $\bar U_{\bar a\theta} = \bar u_\theta > 0$, $\bar U_{\bar a a} = \bar u_a = 1$ and $U_{a\theta} = 1/2 > 0$, it follows that $U_{a\theta}(\bar U_{\bar a\theta} + \bar U_{\bar a a}U_{a\theta}) > 0$.
We conclude with three observations that follow from this theorem. First, the self-map condition, while not affecting the order of $T(\gamma^o)$ relative to a traditional one-dimensional shooting case, is not vacuous either. In fact, since $s_0 = B(0)$ is the center of S, we have that

$$g(s) - s_0 = B(\chi_T(s, \gamma^o, 0)) - B(0) - D_T \int_0^T f(b_t(s, \gamma^o, 0))\,dt.$$

Thus, bounding $B(\chi_T(s, \gamma^o, 0)) - B(0)$ imposes an additional constraint relative to those that ensure that the system is uniformly bounded (and which guarantee that the last term in the previous expression is bounded too). In other words, the self-map condition reduces the constant of proportionality in $T(\gamma^o) \in O(1/\gamma^o)$.
Second, the set of horizons for which a LME is guaranteed to exist increases without bound as $\gamma^o \searrow 0$: this is because the rate of growth of the system of ODEs scales with this parameter, and so its solution converges to the full-information solution $(v_6, v_8, \beta_0, \beta_1, \beta_2, \beta_3, \chi, \gamma) = (0, 0, 0, 0, 0, 1, 0, 0)$, which is defined for all T > 0.25
Finally, the bound T (γo) is obtained under minimal knowledge of the system: it imposes
crude bounds that only use the degree of the polynomial vector f(b), and that do not exploit
any relationship between the coefficients. Thus, the proof technique is both general and
improvable, provided more is known about the system in specific settings.
5 Extensions
As noted in Remark 1, our model can be generalized to accommodate a quadratic terminal payoff or to allow the long-run player to affect the public signal. To demonstrate, we first explore a political setting in which a politician's payoff depends on her terminal reputation, and then a trading model à la Kyle (1985) exhibiting private monitoring of an insider's trades.
5.1 Reputation for Neutrality
We consider an application in which the long-run player is an expert or politician with
career concerns. The politician has a hidden ideological bias θ and takes repeated actions
25 Inspection of the $\vec\beta$-ODEs in the previous BVP indicates that $v_6$ and $v_8$ always appear multiplied by γ. Thus, we can instead look at the system with $\tilde v_i = \gamma v_i$, i = 6, 8, and these ODEs scale with γ. Since the system is uniformly bounded, γ never vanishes, and we can recover $v_i$, i = 6, 8.
— for example, adopting positions on critical issues26 or making campaign promises.27 She
receives utility from taking actions that conform to her bias but also from attaining a neutral
reputation at the end of the horizon; hence, she must trade off her ideological desires with
her career concerns.
We model this specification with

$$-\int_0^T e^{-rt}(a_t - \theta)^2\,dt - e^{-rT}\psi \bar a_T^2$$

as the payoff for the long-run player, where ψ > 0 is common knowledge and governs the intensity of career concerns, and a flow payoff of $\bar U(a_t, \bar a_t, \theta) = -(\bar a_t - \theta)^2$ for the myopic player. Since the myopic player optimally chooses $\bar a_t = M_t$ at each $t \in [0, T]$, the long-run player's termination payoff is effectively $-e^{-rT}\psi M_T^2$. The myopic player can be interpreted as a decision-maker (or, in reduced form, an electorate) whose actions are direct communication, journalism, or opinion polls that convey his belief about the long-run player.
As in the leading-by-example application, we study the role of public feedback in deter-
mining learning and payoffs for the long-run player in equilibrium. Note that the direct effect
on payoffs of removing public feedback is negative: due to the concavity of the termination
payoff, greater uncertainty about the myopic player's belief hurts the long-run player. However, an indirect effect runs in the opposite direction. All else equal, the long-run player prefers
higher actions when her type is higher, and hence her equilibrium strategy attaches positive
weight to her type. But the concavity of the termination payoff implies that the greater
the perceived value of M , the greater the incentive the long-run player has to manipulate it
downward. Higher types therefore must offset higher beliefs from their perspectives, leading
to a negative history-inference effect, which dampens the signaling coefficient α. With re-
duced signaling, the belief is less volatile from an ex ante perspective, which improves payoffs
due to the concavity of the objective function.28 Indeed, provided the objective is not too
concave, the indirect effect dominates, and the politician is better off:
26 Mayhew (1974), in a classic political science text, outlines three kinds of activities congresspeople engage in for electoral reasons: advertising, credit claiming, and (as in the current model) position taking. He describes the dynamic nature of position taking:

. . . it might be rational for members in electoral danger to resort to innovation. The form of innovation available is entrepreneurial position taking, its logic being that for a member facing defeat with his old array of positions, it makes good sense to gamble on some new ones.

27 Campaign promises may be costly either due to a politician's honesty (Callander and Wilkie, 2007) or because the electorate might not reelect politicians who renege on promises (Aragones et al., 2007).
28 It is easy to show that the ex ante expectation of $M_T^2$ is $\gamma^o - \gamma_T$, so that greater learning by the myopic player results in larger terminal losses for the long-run player.
Proposition 5. Suppose that $\psi < \sigma_Y^2/\gamma^o$ and r = 0. Then, for all T > 0: (i) there are unique LME in the public and no-feedback cases, and (ii) learning is lower and ex ante payoffs are higher in the no-feedback case.
Proposition 5 highlights one mechanism through which an expert might benefit from
committing to not following polls or journalism that publicly convey her reputation for bias.
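The claim in footnote 28, that the ex ante expectation of $M_T^2$ equals $\gamma^o - \gamma_T$, is an instance of the law of total variance: the posterior mean is a martingale whose variance equals the prior variance minus the posterior variance. A minimal sketch in a discrete-time Gaussian-learning analogue, with illustrative parameters:

```python
import random

random.seed(7)
g0, s2, n, N = 1.0, 2.0, 5, 200_000   # prior variance, signal variance, signals, draws
gT = 1.0 / (1.0 / g0 + n / s2)        # posterior variance after n signals
w = g0 / (g0 + s2 / n)                # Kalman weight on the signal average

total = 0.0
for _ in range(N):
    theta = random.gauss(0.0, g0 ** 0.5)
    ybar = theta + random.gauss(0.0, (s2 / n) ** 0.5)   # average of the n signals
    M = w * ybar                                        # posterior mean of theta
    total += M * M

# E[M^2] equals g0 - gT (the analogue of gamma^o - gamma_T), up to Monte Carlo error
assert abs(total / N - (g0 - gT)) < 0.02
```

Here M plays the role of the myopic player's terminal belief $M_T$: more learning (a lower $g_T$) mechanically raises $E[M^2]$, and hence the long-run player's expected terminal losses.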
The present environment is one of common values. Hence, one can establish the existence of a LME in the interior version of this problem with methods analogous to those in Section 4.3. The only difference is that our baseline model had terminal conditions that were a function of χ exclusively, whereas the presence of a terminal payoff delivers a terminal condition for $\beta_1$ that also depends on γ according to

$$\beta_{1T} = -\frac{\psi\gamma_T}{\sigma_Y^2 + \psi\gamma_T\chi_T},$$

reflecting the fact that the incentive to manipulate the myopic player's belief in the final moment is decreasing in the precision of that belief. To the extent that B, now depending on both γ and χ, is of class $C^1$, our fixed-point method goes through.29,30
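A quick check of this comparative static: the magnitude of the manipulation motive, $|\beta_{1T}|$, increases with $\gamma_T$, i.e., it falls with the precision $1/\gamma_T$ of the terminal belief. Parameter values below are illustrative:

```python
def beta1T(psi, gammaT, chiT, sY2):
    # terminal condition for beta_1 in the reputation application
    return -psi * gammaT / (sY2 + psi * gammaT * chiT)

psi, chiT, sY2 = 0.5, 0.3, 1.0
gammas = [0.1 * k for k in range(1, 30)]
mags = [abs(beta1T(psi, g, chiT, sY2)) for g in gammas]

assert all(u < v for u, v in zip(mags, mags[1:]))   # |beta1T| rises with gamma_T
```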
5.2 Insider Trading
An asset with fixed fundamental value θ is traded in continuous time until date T, when its
fundamental value is revealed, ending the game. The long-run player, or insider, privately
observes θ prior to the start of the game. The myopic player has a technology which allows
him to obtain private, noisy signals of the insider’s trades, as in Yang and Zhu (2018). Both
players and a flow of noise traders submit continuous orders to a third party, the market
maker, who executes those trades at a price Lt, which is public information.
We depart from the baseline model along three dimensions. First, the myopic player's flow payoff depends on L according to $\xi(\theta - L)\bar a - \frac{\bar a^2}{2}$, where $\xi \ge 0$, the interpretation being that L is the action of the market maker.31 Second, the long-run player's flow payoff is
29 The only adjustment needed is in proving the self-map condition for g, where $B_\gamma$ appears.
30 We can also study the case ψ < 0, where the long-run player wants to appear as extreme as possible at the end of the horizon. In that case, the history-inference effect becomes positive: higher types forecast higher beliefs by the myopic player and have greater incentive to further manipulate those beliefs. The history-inference effect thus amplifies the signaling coefficient, which benefits the long-run player by increasing learning and hence the terminal reward. This effect reinforces the positive direct effect of greater uncertainty in the presence of a convex terminal payoff, so the long-run player again prefers the environment with no feedback. This result is valid for mildly convex rewards, as $\beta_{1T}$ is not well-defined if ψ is too negative.
31 The quadratic loss term strengthens our non-existence result, as it limits the myopic player's ability to exploit the private information he acquires. The parameter ξ can then be interpreted as the size of the myopic player or the (inverse of) his transaction costs.
simply $(\theta - L_t)a_t$, i.e., it is linear in $a_t$. Finally, the public signal now includes the long-run player's action: $dX_t = (a_t + \bar a_t)\,dt + \sigma_X dZ^X_t$. Hence, the myopic player learns from both the private monitoring channel and the public price.
Following the literature, we seek an equilibrium in which the informed trader reveals her private information gradually over time through a linear strategy of the form (12). Hence, we require that the coefficients of the insider's strategy be $C^1$ functions over compact subsets of [0, T).32 We can then apply Lemmas 2 and 3 to such sets.33
Clearly, when ξ = 0, the model reduces to the classic model of Kyle (1985) (see also Back
(1992)), and hence a LME with trading strategy of the form β3(θ − L) always exists. This
is not the case when ξ > 0.
Proposition 6. Fix ξ > 0. Then for all σY > 0, there does not exist a linear Markov
equilibrium of the insider trading game.
The intuition for this result is as follows. As the myopic player privately observes a
signal of the insider’s trades, he acquires private information about θ over time. The myopic
player’s own trades then carry further information to the market maker, beyond that which
the market maker learns from the insider alone. This introduces momentum into the law of
motion for the price from the insider's perspective, measured by a term $\xi(m - \ell)$ in the price
drift; the insider’s trading at any time not only causes an immediate price impact but also
sets forth continued future price impacts as the myopic player’s trades continue to inform the
market maker. These repeated price impacts via the myopic player make future trades less
attractive to the insider, thereby putting the insider in a “race against herself” and inducing
her to trade away all information in the first instant.
This result is intimately related to a non-existence result in Yang and Zhu (2018). In a two-period model, they show that a linear equilibrium ceases to exist if the private signal of a back-runner—a trader who only participates in the last round after receiving noisy information about the informed player's first-period trade—is sufficiently precise, a situation in which a mixed-strategy equilibrium emerges. More generally, the existence problem relates to how, under pure strategies, an informed player's rush to trade depends on the number of trading opportunities in certain settings. In this vein, Foster and Viswanathan (1994) show, in an asymmetric environment where one long-run trader's information nests another's, that the better-informed trader quickly trades on the commonly known piece of information so as to exploit her superior information only later on. While there are important differences between our settings (the belief of the lesser-informed player is, in their model, always known to the first, and their common information is exogenous), there is a common theme: once common information is created, either exogenously or endogenously, there is pressure to trade on it quickly. Such pressure is increasing in the number of trading opportunities.34
6 Conclusion
We have examined the implications of a minimal—yet natural—departure from an extensive
literature on signaling games: namely, that the signal observed by a receiver is both noisy
and private. We showed that, unlike in settings where such a signal is public, the sender’s
history of play affects the informativeness of her actions at all points in time, and we explored
the learning and payoff implications of this history-inference effect in applications. In the
process, we have introduced an approach for establishing the existence of LME in dynamic
games of asymmetric learning. Let us now discuss three assumptions of the model: its
asymmetry, the presence of a myopic player, and the linear-quadratic-Gaussian structure.
The asymmetry of the environment studied indeed provides us with enough tractability,
in the sense that it allows us to “close” the set of states at the second order. If instead the
long-run player had a stochastic type, or access to an imperfect private signal, even higher-
order beliefs would become payoff-relevant states. While some economic environments may
feature some of these assumptions, a natural question that arises is whether we believe
economic behavior in such settings is effectively driven by such higher-order inferences.
Second, the presence of a myopic player is not a major technical limitation. In fact, most
of the results are derived for, or can be generalized to, continuous coefficients $\vec\delta$. With a
long-run “receiver” such coefficients solve ODEs capturing optimality and correct beliefs but
(i) no additional states are needed, and (ii) the fixed-point argument is applicable (to an
enlarged boundary value problem).
Finally, the linear-quadratic-Gaussian class examined is clearly restrictive. Yet its advantage lies in that it is a powerful framework for uncovering economic effects that are likely to be key in other, more nonlinear, environments. From that perspective, the way in
which the inference of others’ private histories interacts with payoffs in shaping signaling,
along with the time-effects that learning has on incentives, seem to exhaust the set of effects
that we would expect to be of first order in other settings.
34 In symmetric settings, Holden and Subrahmanyam (1992) show that intense trading occurs in early periods between two identically informed traders, and Back et al. (2000) obtain the corresponding nonexistence result directly in continuous time.
Appendix A: Proofs for Section 2
Proofs for Section 2.1
Since player 2 attempts to match player 1's action, we have

$$\bar a_t = E_t[\beta_{0t} + \beta_{1t}M_t + \beta_{3t}\theta] = \underbrace{\beta_{0t}}_{\delta_{0t}} + \underbrace{(\beta_{1t} + \beta_{3t})}_{\delta_{1t}}M_t.$$
The HJB equation for player 1 is

$$rV(\theta, m, t) = \sup_a\Big\{-(a - \theta)^2 - (a - \bar a_t)^2 + \Lambda_t\mu_t(a)V_M(\theta, m, t) + \frac{\Lambda_t^2\sigma_Y^2}{2}V_{MM}(\theta, m, t) + V_t(\theta, m, t)\Big\}, \qquad (A.1)$$

where

$$\Lambda_t := \frac{\beta_{3t}\gamma_t}{\sigma_Y^2}, \qquad \mu_t(a) := a - \beta_{0t} - (\beta_{1t} + \beta_{3t})m.$$
To obtain the maximizer of the RHS of (A.1), we impose the first-order condition

$$0 = -2(a - \theta) - 2(a - \beta_{0t} - (\beta_{1t} + \beta_{3t})m) + \frac{\beta_{3t}\gamma_t[v_{2t} + 2mv_{4t} + \theta v_{5t}]}{\sigma_Y^2}$$

$$\implies 0 = -2(\beta_{0t} + \beta_{1t}m + \beta_{3t}\theta - \theta) - 2\beta_{3t}(\theta - m) + \frac{\beta_{3t}\gamma_t[v_{2t} + 2mv_{4t} + \theta v_{5t}]}{\sigma_Y^2}, \qquad (A.2)$$

where in the second line we have used that the maximizer must be $a^* := \beta_{0t} + \beta_{1t}m + \beta_{3t}\theta$. Since (A.2) must hold for all $(\theta, m, t) \in \mathbb{R}^2 \times [0, T]$, the constant term and the coefficients on θ and m must vanish, and we obtain

$$v_{2t} = \frac{2\sigma_Y^2\beta_{0t}}{\beta_{3t}\gamma_t} \qquad (A.3)$$

$$v_{4t} = \frac{\sigma_Y^2(\beta_{1t} - \beta_{3t})}{\beta_{3t}\gamma_t} \qquad (A.4)$$

$$v_{5t} = \frac{2\sigma_Y^2(2\beta_{3t} - 1)}{\beta_{3t}\gamma_t}. \qquad (A.5)$$
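With $(v_2, v_4, v_5)$ given by (A.3)–(A.5), the first-order condition (A.2) then holds identically in (θ, m); the following sketch verifies this at arbitrary (hypothetical) coefficient values:

```python
import random

random.seed(0)
b0, b1, b3, gam, sY2 = 0.1, 0.4, 0.6, 0.7, 1.3   # arbitrary time-t coefficients
v2 = 2 * sY2 * b0 / (b3 * gam)                   # (A.3)
v4 = sY2 * (b1 - b3) / (b3 * gam)                # (A.4)
v5 = 2 * sY2 * (2 * b3 - 1) / (b3 * gam)         # (A.5)

for _ in range(100):
    theta, m = random.uniform(-5, 5), random.uniform(-5, 5)
    a_star = b0 + b1 * m + b3 * theta
    foc = (-2 * (a_star - theta)
           - 2 * (a_star - b0 - (b1 + b3) * m)
           + b3 * gam * (v2 + 2 * m * v4 + theta * v5) / sY2)
    assert abs(foc) < 1e-9   # the FOC vanishes identically at a*
```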
Since $v_{iT} = 0$ for all $i \in \{0, 1, \ldots, 5\}$, (A.3)–(A.5) imply the terminal values of the coefficients

$$\beta_{0T} = 0 \qquad (A.6)$$

$$\beta_{1T} = \tfrac{1}{2} \qquad (A.7)$$

$$\beta_{3T} = \tfrac{1}{2}, \qquad (A.8)$$

which are also the myopic equilibrium coefficients, that is, the coefficients for the game in which both players act myopically at each instant (while still updating their beliefs).
Substituting $a^*$ into (A.1) yields

$$0 = -r[v_{0t} + v_{1t}\theta + v_{2t}m + v_{3t}\theta^2 + v_{4t}m^2 + v_{5t}\theta m] - [\beta_{0t} + m\beta_{1t} + \theta(\beta_{3t} - 1)]^2 - (m - \theta)^2\beta_{3t}^2 - \frac{(m - \theta)[v_{2t} + 2mv_{4t} + \theta v_{5t}]\beta_{3t}^2\gamma_t}{\sigma_Y^2} + \frac{v_{4t}\beta_{3t}^2\gamma_t^2}{\sigma_Y^2} + \dot v_{0t} + \dot v_{1t}\theta + \dot v_{2t}m + \dot v_{3t}\theta^2 + \dot v_{4t}m^2 + \dot v_{5t}\theta m, \qquad (A.9)$$

which again must hold for all $(\theta, m, t) \in \mathbb{R}^2 \times [0, T]$. Using (A.3)–(A.5) and their derivatives, we eliminate $(v_{2t}, v_{4t}, v_{5t}, \dot v_{2t}, \dot v_{4t}, \dot v_{5t})$ from (A.9) to obtain a new equation. As the constant term and the coefficients on θ, m, θ², m² and θm in this new equation must vanish, we obtain, along with (A.13), a system of ODEs for $(v_0, v_1, v_3, \beta_0, \beta_1, \beta_3, \gamma)$. This system contains a subsystem for $(\beta_0, \beta_1, \beta_3, \gamma)$:
$$\dot\beta_{0t} = 2r\beta_{0t}\beta_{3t} \qquad (A.10)$$

$$\dot\beta_{1t} = \beta_{3t}\Big[r(2\beta_{1t} - 1) + \frac{\beta_{1t}\beta_{3t}\gamma_t}{\sigma_Y^2}\Big] \qquad (A.11)$$

$$\dot\beta_{3t} = \beta_{3t}\Big[r(2\beta_{3t} - 1) - \frac{\beta_{1t}\beta_{3t}\gamma_t}{\sigma_Y^2}\Big] \qquad (A.12)$$

$$\dot\gamma_t = -\frac{(\beta_{3t}\gamma_t)^2}{\sigma_Y^2}. \qquad (A.13)$$
Hence, a linear Markov equilibrium must solve the following boundary value problem:
(A.10)-(A.13) with boundary conditions γ0 = γo and (A.6)-(A.8). We show that a solution
to this boundary value problem always exists.
It is useful to transform this boundary value problem into an initial value problem by
reversing the direction of time and making a guess for the initial value of γ. Hence, we define
the backward system
$$\dot\beta_{0t} = -2r\beta_{0t}\beta_{3t} \qquad (A.14)$$

$$\dot\beta_{1t} = \beta_{3t}\Big[r(1 - 2\beta_{1t}) - \frac{\beta_{1t}\beta_{3t}\gamma_t}{\sigma_Y^2}\Big] \qquad (A.15)$$

$$\dot\beta_{3t} = \beta_{3t}\Big[r(1 - 2\beta_{3t}) + \frac{\beta_{1t}\beta_{3t}\gamma_t}{\sigma_Y^2}\Big] \qquad (A.16)$$

$$\dot\gamma_t = \frac{\beta_{3t}^2\gamma_t^2}{\sigma_Y^2}, \qquad (A.17)$$

with initial conditions $\beta_{00} = 0$, $\beta_{10} = \beta_{30} = \tfrac{1}{2}$ and $\gamma_0 = \gamma^F \ge 0$.
Define $B^{pub}_t := \beta_{1t} + \beta_{3t}$.

Lemma A.1. In any solution to the backward system over an interval [0, T], if $\gamma^F > 0$, then (i) $B^{pub}_t = 1$ for all $t \in [0, T]$, (ii) $\beta_{3t} \in (1/2, 1)$ and $\beta_{1t} \in (0, 1/2)$ for all $t \in (0, T]$, (iii) $\beta_3$ is monotonically increasing while $\beta_1$ is monotonically decreasing, and (iv) γ is strictly increasing. If $\gamma^F = 0$, then $\beta_{1t} = \beta_{3t} = \tfrac{1}{2}$ and $\gamma_t = 0$ for all $t \in [0, T]$. For any $\gamma^F$, $\beta_0 \equiv 0$.
Proof of Lemma A.1. We first claim that if a solution exists over some interval [0, T], then $\beta_{3t} > 0$ for all $t \in [0, T]$. To see this, let $f_{\beta_3}(t, \beta_{3t})$ denote the RHS of (A.16). Letting $x_t := 0$ for all $t \in [0, T]$, we have $\beta_{30} = 1/2 > x_0$ and $\dot\beta_{3t} - f_{\beta_3}(t, \beta_{3t}) = 0 = \dot x_t - f_{\beta_3}(t, x_t)$. By the comparison theorem in Teschl (2012, Theorem 1.3), the claim holds.
Next, we show that $B^{pub} \equiv 1$. Adding (A.15) and (A.16) yields

$$\dot B^{pub}_t = 2r\beta_{3t}(1 - B^{pub}_t),$$

which, given the initial condition and any $\beta_{3t}$, has solutions of the form $B^{pub}_t = 1 - Ce^{-\int_0^t 2r\beta_{3s}\,ds}$. Since $B^{pub}_0 = 1 = 1 - C$, we have C = 0 and $B^{pub} \equiv 1$.
Hence we can rewrite the $\beta_3$ ODE as

$$\dot\beta_{3t} = \beta_{3t}\Big[r(1 - 2\beta_{3t}) + \frac{\beta_{3t}(1 - \beta_{3t})\gamma_t}{\sigma_Y^2}\Big]. \qquad (A.18)$$

We now show that $\beta_3 < 1$. Let $f_{\beta_3}(t, \beta_{3t})$ now denote the RHS of (A.18), and define $x_t := 1$ for all $t \in [0, T]$. Then $x_0 = 1 > \beta_{30} = \tfrac{1}{2}$, and $\dot\beta_{3t} - f_{\beta_3}(t, \beta_{3t}) = 0 \le r = \dot x_t - f_{\beta_3}(t, x_t)$, so by the comparison theorem, the claim holds. Since $\beta_{1t} = 1 - \beta_{3t}$, we have $\beta_1 > 0$.
Consider the case $\gamma^F = 0$. Since $\beta_3 > 0$, $\gamma_t = 0$ is the unique solution to (A.17). Letting $z_t := \beta_{3t} - \beta_{1t}$, we have $\dot z_t = -2r\beta_{3t}z_t$. As this is a linear ODE with initial condition $z_0 = 0$, the unique solution is $z_t = 0$ for all $t \in [0, T]$. Since $\beta_{1t} + \beta_{3t} = 1$, we have $\beta_{1t} = \beta_{3t} = 1/2$ for all $t \in [0, T]$, proving the claim in the lemma statement.
Next consider the case \gamma^F > 0. Since \beta_3 > 0, (A.17) implies \gamma is strictly increasing, and hence \gamma_t > 0 for all t \in [0, T]. Now whenever \beta_{3t} = 1/2, we have \dot\beta_{3t} = \frac{1}{2}\Big[0 + \frac{\gamma_t}{4\sigma_Y^2}\Big] > 0, and thus \beta_{3t} > 1/2 for all t \in (0, T]. Since \beta_{1t} = 1 - \beta_{3t}, we have \beta_{1t} < 1/2 for all such t.
We now turn to (iii). Since \dot\beta_{1t} + \dot\beta_{3t} = 0 for all t \in [0, T], it suffices to show that \dot\beta_{3t} > 0 for all t \in [0, T]; in turn, it suffices to show that H_t := r(1-2\beta_{3t}) + \frac{\beta_{3t}(1-\beta_{3t})\gamma_t}{\sigma_Y^2} > 0 for all t \in [0, T], where \dot\beta_{3t} = \beta_{3t}H_t. Now H_0 = \frac{\gamma_0}{4\sigma_Y^2} > 0, and with algebra it can be shown that whenever H_t = 0, \dot H_t = \frac{(1-\beta_{3t})\beta_{3t}^3\gamma_t^2}{\sigma_Y^4} > 0. It follows that H_t > 0 for all t \in [0, T], as desired.
Finally, note that in all cases we have \beta_3 > 0, so the unique solution to (A.14) consistent with the initial condition \beta_{00} = 0 is \beta_0 \equiv 0.
Proof of Proposition 1. Existence of an LME reduces to existence of a solution to the boundary value problem defined by (A.10)-(A.13) with the associated boundary conditions. Using the uniform bounds in Lemma A.1, we can obtain existence using a backward shooting argument analogous to the one used in Theorem 1. It is then straightforward to verify that the remaining value function coefficients are well-defined and that the HJB equation is satisfied. The remaining claims have been established in Lemma A.1.
We conclude with a derivation of the closed form solution for the undiscounted case, to
which we refer in Propositions 3 and 4.
Lemma A.2. For r = 0, the leading by example game has a unique LME for the public case,
and (β0, β1, β3, γ) can be expressed in closed form.
Proof. We derive the following closed-form solution:

\gamma_t = \frac{\gamma_T}{2} + \frac{1}{\frac{2}{\gamma_T} - \frac{T-t}{\sigma_Y^2}}   (A.19)
\beta_{3t} = \frac{1}{2 - \frac{\gamma_T(T-t)}{2\sigma_Y^2}},   (A.20)

where

\gamma_T = \frac{\gamma^o T + 2\sigma_Y^2 - \sqrt{(\gamma^o T)^2 + 4\sigma_Y^4}}{T}   (A.21)

and where \beta_0 \equiv 0 and \beta_1 \equiv 1 - \beta_3.
Observe that \dot\beta_{3t}\gamma_t + \beta_{3t}\dot\gamma_t = \frac{\beta_{3t}^2\gamma_t^2}{\sigma_Y^2}. Hence, define \Pi_t := \beta_{3t}\gamma_t, which has ODE \dot\Pi_t = \frac{\Pi_t^2}{\sigma_Y^2} with initial condition \Pi_0 = \beta_{30}\gamma^F = \gamma^F/2; its solution is

\Pi_t = \frac{1}{\frac{2}{\gamma^F_{pub}} - \frac{t}{\sigma_Y^2}}.
Substitute \Pi into (A.17) to obtain

\dot\gamma_t = \frac{1}{\sigma_Y^2}\Bigg(\frac{1}{\frac{2}{\gamma^F_{pub}} - \frac{t}{\sigma_Y^2}}\Bigg)^2 \implies \gamma_t = C_\gamma + \frac{1}{\frac{2}{\gamma^F_{pub}} - \frac{t}{\sigma_Y^2}}.

As \gamma_0 = \gamma^F_{pub}, we have C_\gamma = \gamma^F_{pub}/2 and thus

\gamma_t = \frac{\gamma^F_{pub}}{2} + \frac{1}{\frac{2}{\gamma^F_{pub}} - \frac{t}{\sigma_Y^2}}.   (A.22)
Moreover,

\gamma_T = \gamma^o = \frac{\gamma^F_{pub}}{2} + \frac{1}{\frac{2}{\gamma^F_{pub}} - \frac{T}{\sigma_Y^2}},

which is equivalent to the quadratic

\frac{T}{2}\big(\gamma^F_{pub}\big)^2 - \big(\gamma^o T + 2\sigma_Y^2\big)\gamma^F_{pub} + 2\sigma_Y^2\gamma^o = 0.
The quadratic on the LHS is convex, evaluates to 2\sigma_Y^2\gamma^o > 0 at \gamma^F_{pub} = 0, and evaluates to -(\gamma^o)^2 T/2 < 0 at \gamma^F_{pub} = \gamma^o, so there is a unique solution in (0, \gamma^o), which in the forward system is \gamma_T as in the proposition statement. Substituting this into (A.22) and returning to the forward system by replacing t with T - t yields \gamma_t in the forward system. It is easy to verify that \gamma_t > 0 for all t.
Lastly, we have (in the forward system) \beta_{3t} = \Pi_t/\gamma_t = \frac{1}{2 - \frac{\gamma^F_{pub}(T-t)}{2\sigma_Y^2}} and \beta_{1t} = 1 - \beta_{3t}.
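As a sanity check on the algebra above, one can verify numerically that the closed form (A.19)-(A.21) satisfies \gamma_0 = \gamma^o (i.e., that \gamma_T solves the quadratic) and \beta_{3T} = 1/2. The sketch below is purely illustrative; the parameter values are arbitrary:

```python
import math

# Check the r = 0 closed form of Lemma A.2 at illustrative parameter values.
sigma_Y2 = 1.0   # sigma_Y^2
gamma_o = 0.8    # prior variance gamma^o
T = 2.0

# (A.21): gamma_T
gamma_T = (gamma_o * T + 2 * sigma_Y2
           - math.sqrt((gamma_o * T) ** 2 + 4 * sigma_Y2 ** 2)) / T

def gamma(t):    # (A.19), forward time
    return gamma_T / 2 + 1 / (2 / gamma_T - (T - t) / sigma_Y2)

def beta3(t):    # (A.20), forward time
    return 1 / (2 - gamma_T * (T - t) / (2 * sigma_Y2))

# Residual of the quadratic (T/2)x^2 - (gamma_o T + 2 sigma_Y^2)x + 2 sigma_Y^2 gamma_o = 0
quad = (T / 2) * gamma_T ** 2 - (gamma_o * T + 2 * sigma_Y2) * gamma_T + 2 * sigma_Y2 * gamma_o
```

At these values one can confirm that quad is numerically zero, \gamma(0) = \gamma^o, \gamma(T) = \gamma_T \in (0, \gamma^o), and \beta_3(T) = 1/2.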
Proofs for Section 2.2
Proof of Lemma 1. Let \beta_{0t} + \beta_{1t}M_t + \beta_{3t}\theta denote the long-run player's strategy. Thus, E_t[a_t] = \alpha_{0t} + \alpha_{3t}M_t, where \alpha_0 = \beta_0 + \beta_1(1-\chi)m_0 and \alpha_3 = \beta_3 + \chi\beta_1. Then, dM_t = \frac{\alpha_{3t}\gamma_{1t}}{\sigma_Y^2}[dY_t - (\alpha_{0t} + \alpha_{3t}M_t)dt], so letting R(t, s) = \exp\big(-\int_s^t \frac{\alpha_{3u}^2\gamma_{1u}}{\sigma_Y^2}du\big),

M_t = m_0R(t, 0) + \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_{1s}}{\sigma_Y^2}\big[(a_{2s} - \alpha_{0s})ds + \sigma_Y dZ^Y_s\big]

\implies E[M_t\,|\,\theta] = m_0R(t, 0) + \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_{1s}}{\sigma_Y^2}(a_{2s} - \alpha_{0s})ds,

where the second line takes expectations conditional on \theta (abusing notation, we continue to write M_t for this conditional mean in what follows), and

\dot\gamma_{1t} = -\frac{\gamma_{1t}^2\alpha_{3t}^2}{\sigma_Y^2}.   (A.23)
On the path of play, however, a_{2t} = \beta_{0t} + \beta_{1t}M_t + \beta_{3t}\theta, so we obtain

M_t = m_0R(t, 0) + \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_{1s}}{\sigma_Y^2}\big[-(1-\chi_s)\beta_{1s}m_0 + \beta_{1s}M_s + \beta_{3s}\theta\big]ds,

where we used that \beta_{0s} - \alpha_{0s} = -(1-\chi_s)\beta_{1s}m_0. In particular,
dM_t = \Big(-M_t\Big[\frac{\alpha_{3t}\gamma_t}{\sigma_Y^2}(\alpha_{3t} - \beta_{1t})\Big] + \frac{\alpha_{3t}\gamma_t}{\sigma_Y^2}\big[-\beta_{1t}(1-\chi_t)m_0 + \beta_{3t}\theta\big]\Big)dt,

and so, letting R(t, s) = \exp\big(-\int_s^t \frac{\alpha_{3u}\gamma_{1u}}{\sigma_Y^2}(\alpha_{3u} - \beta_{1u})du\big),

M_t = m_0\Big(R(t, 0) - \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_s}{\sigma_Y^2}\beta_{1s}(1-\chi_s)ds\Big) + \theta\int_0^t R(t, s)\frac{\alpha_{3s}\gamma_s}{\sigma_Y^2}\beta_{3s}ds.
From here,

\chi_t = \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_s}{\sigma_Y^2}\beta_{3s}ds \quad and \quad 1 - \chi_t = R(t, 0) - \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_s}{\sigma_Y^2}\beta_{1s}(1-\chi_s)ds.
Critically, observe that the second constraint is a direct consequence of the first. Specifically, adding and subtracting \beta_{3s} and noticing that \alpha_{3s} - \beta_{1s} = \beta_{3s} - \beta_{1s}(1-\chi_s), we can write

-\int_0^t R(t, s)\frac{\alpha_{3s}\gamma_s}{\sigma_Y^2}\beta_{1s}(1-\chi_s)ds = \int_0^t R(t, s)\frac{\alpha_{3s}\gamma_s}{\sigma_Y^2}[\alpha_{3s} - \beta_{1s}]ds - \chi_t
= \int_0^t \frac{dR(t, s)}{ds}ds - \chi_t
= 1 - R(t, 0) - \chi_t,   (A.24)

where in the last equality we used that R(t, t) = 1.
With this in hand, the relevant constraint is the first. In differential form, and using that \alpha_{3t} = \beta_{3t} + \beta_{1t}\chi_t,

\dot\chi_t = -\frac{\alpha_{3t}\gamma_t}{\sigma_Y^2}[\alpha_{3t} - \beta_{1t}]\chi_t + \frac{\alpha_{3t}\beta_{3t}\gamma_t}{\sigma_Y^2} = \frac{(\beta_{3t} + \beta_{1t}\chi_t)^2\gamma_t}{\sigma_Y^2}(1 - \chi_t) = \frac{\alpha_{3t}^2\gamma_{1t}}{\sigma_Y^2}(1 - \chi_t).
Using the exact same arguments in the proof of Lemma 3, we conclude that the re-
sulting (χ, γ) system admits a unique solution with the same properties as in that lemma.
Furthermore, it is easy to check that 1− γt/γo satisfies the χ-ODE, concluding the proof.
We now turn to characterizing equilibrium for the no feedback case. From Lemma 1, given a conjecture by the follower about (\beta_0, \beta_1, \beta_3), the variance of the follower's belief evolves deterministically as \dot\gamma_t = -\frac{\alpha_t^2\gamma_t^2}{\sigma_Y^2}, and \chi \equiv 1 - \gamma/\gamma^o.

The follower matches the expectation of the leader's action by playing

\bar a_t = E^2_t[\beta_{0t} + \beta_{1t}M_t + \beta_{3t}\theta]
= E_t[\beta_{0t} + \beta_{1t}(m_0[1-\chi_t] + \chi_t\theta) + \beta_{3t}\theta]
= \underbrace{\beta_{0t} + \beta_{1t}m_0(1-\chi_t)}_{\delta_{0t}} + \underbrace{(\beta_{1t}\chi_t + \beta_{3t})}_{\delta_{1t}}M_t.
Define \mu_{1t} := \alpha_t\gamma_t/\sigma_Y^2, \mu_{0t} = -\mu_{1t}[\beta_{0t} + \beta_{1t}m_0(1-\chi_t)] and \mu_{2t} = -\alpha_t\mu_{1t}, where \alpha_t = \beta_{1t}\chi_t + \beta_{3t}.
The HJB equation is

rV(\theta, m, t) = \sup_a\Big\{-\underbrace{\big(a^2 - 2a[\delta_{0t} + \delta_{1t}m] + \delta_{0t}^2 + 2\delta_{0t}\delta_{1t}m + \delta_{1t}^2[\gamma_t\chi_t + m^2]\big)}_{=E_t[(a - \bar a_t)^2]} - (a - \theta)^2 + (\mu_{0t} + a\mu_{1t} + m\mu_{2t})V_m(\theta, m, t) + V_t(\theta, m, t)\Big\},   (A.25)

where we have used that E_t[M_t] = \hat M_t = m and E_t[M_t^2] = (E_t[M_t])^2 + Var[M_t] = m^2 + \gamma_t\chi_t.
Imposing the first-order condition on the RHS of (A.25) and evaluating at a = \beta_{0t} + \beta_{1t}m + \beta_{3t}\theta yields

0 = -4[\beta_{0t} + \beta_{1t}m + \beta_{3t}\theta] + 2\theta + \frac{\alpha_t\gamma_t[v_{2t} + 2mv_{4t} + \theta v_{5t}]}{\sigma_Y^2} + 2[\beta_{0t} + m_0\beta_{1t}(1-\chi_t) + m\alpha_t].   (A.26)
Matching coefficients, we obtain

v_{2t} = \frac{2\sigma_Y^2[\beta_{0t} - m_0\beta_{1t}(1-\chi_t)]}{\alpha_t\gamma_t}   (A.27)
v_{4t} = \frac{-\sigma_Y^2[\beta_{3t} - \beta_{1t}(2-\chi_t)]}{\alpha_t\gamma_t}   (A.28)
v_{5t} = \frac{2\sigma_Y^2[2\beta_{3t} - 1]}{\alpha_t\gamma_t},   (A.29)

provided that \alpha_t, \gamma_t > 0, which we will confirm below. Setting v_{iT} = 0 for i = 2, 4, 5 yields the terminal conditions stated below. As in the public benchmark, these conditions also yield the equilibrium coefficients for the case where both players behave myopically.

We now evaluate (A.25) at a = \beta_{0t} + \beta_{1t}m + \beta_{3t}\theta, using (A.27)-(A.29) and their derivatives to eliminate (v_{2t}, v_{4t}, v_{5t}, \dot v_{2t}, \dot v_{4t}, \dot v_{5t}), and in turn using (A.33) and (A.34) to eliminate \dot\gamma_t and \dot\chi_t. The resulting equation gives a system of ODEs for (v_0, v_1, v_3, \beta_0, \beta_1, \beta_3, \gamma), which contains the following subsystem for (\beta_0, \beta_1, \beta_3, \gamma):
\dot\beta_{0t} = \frac{\alpha_t}{2\sigma_Y^2}\big\{2r\sigma_Y^2\beta_{0t}(2-\chi_t) - r\sigma_Y^2(1-\chi_t) + 2\gamma_t\beta_{1t}^2(1-\chi_t)\big\}   (A.30)
\dot\beta_{1t} = \frac{\alpha_t}{2\sigma_Y^2}\big\{-r\sigma_Y^2 + 2\beta_{1t}[\beta_{3t}\gamma_t + r\sigma_Y^2(2-\chi_t)] - 2\beta_{1t}^2\gamma_t(1-\chi_t)\big\}   (A.31)
\dot\beta_{3t} = \frac{\alpha_t}{2\sigma_Y^2}\big\{-r\sigma_Y^2(2-\chi_t) - 2\beta_{3t}[\beta_{1t}\gamma_t - r\sigma_Y^2(2-\chi_t)]\big\}   (A.32)
\dot\gamma_t = -\frac{\alpha_t^2\gamma_t^2}{\sigma_Y^2}   (A.33)
\dot\chi_t = \frac{\gamma_t\alpha_t^2(1-\chi_t)}{\sigma_Y^2}   (A.34)

with boundary conditions \beta_{0T} = \frac{1-\chi_T}{2(2-\chi_T)}, \beta_{1T} = \frac{1}{2(2-\chi_T)}, \beta_{3T} = \frac{1}{2}, \gamma_0 = \gamma^o and \chi_0 = 0. The ODE for \alpha is

\dot\alpha_t = -r\alpha_t[1 - \alpha_t(2-\chi_t)],   (A.35)

and its terminal value is \alpha_T = \frac{1}{2-\chi_T}.
Let (\beta^m_{0t}, \beta^m_{1t}, \beta^m_{3t}, \alpha^m_t) = \Big(\frac{1-\chi_t}{2(2-\chi_t)}, \frac{1}{2(2-\chi_t)}, \frac{1}{2}, \frac{1}{2-\chi_t}\Big) denote the myopic coefficients, that is, the myopic equilibrium given the variance \gamma induced by the dynamic candidate equilibrium strategy.
Proof of Proposition 2. We work with the backward system, using an initial guess γ0 = γF
to formulate an initial value problem parameterized by γF .
The backward system for the coefficients is

\dot\beta_{0t} = \frac{\alpha_t}{2\sigma_Y^2}\big\{-2r\sigma_Y^2\beta_{0t}(2-\chi_t) + r\sigma_Y^2(1-\chi_t) - 2\gamma_t\beta_{1t}^2(1-\chi_t)\big\}   (A.36)
\dot\beta_{1t} = \frac{\alpha_t}{2\sigma_Y^2}\big\{r\sigma_Y^2 - 2\beta_{1t}[\beta_{3t}\gamma_t + r\sigma_Y^2(2-\chi_t)] + 2\beta_{1t}^2\gamma_t(1-\chi_t)\big\}   (A.37)
\dot\beta_{3t} = \frac{\alpha_t}{2\sigma_Y^2}\big\{r\sigma_Y^2(2-\chi_t) + 2\beta_{3t}[\beta_{1t}\gamma_t - r\sigma_Y^2(2-\chi_t)]\big\}   (A.38)
\dot\alpha_t = r\alpha_t[1 - \alpha_t(2-\chi_t)]   (A.39)
\dot\gamma_t = \frac{\alpha_t^2\gamma_t^2}{\sigma_Y^2}   (A.40)

with boundary conditions \beta_{00} = \frac{1-\chi_0}{2(2-\chi_0)}, \beta_{10} = \frac{1}{2(2-\chi_0)}, \beta_{30} = \frac{1}{2} and \alpha_0 = \frac{1}{2-\chi_0} > 0.
By Teschl's comparison theorem, \alpha > 0 in any solution to the IVP. It follows that for \gamma^F = 0, the IVP has a unique solution (\beta_0, \beta_1, \beta_3, \gamma, \chi) = (0, 1/2, 1/2, 0, 1). By continuity, suppose now that \gamma^F > 0 is sufficiently small that there exists a solution to the IVP. By the same argument as in the proof of Lemma A.1, \gamma is increasing.

Given such (\gamma_t)_{t\in[0,T]}, we can write the right-hand side of the \alpha-ODE as a function of the form f_\alpha(t, \alpha) that is of class C^1. Using that \chi_t = 1 - \gamma_t/\gamma^o, observe that, in the backward system,

\frac{d}{dt}\Big(\frac{1}{2-\chi_t}\Big) - f_\alpha(t, 1/(2-\chi_t)) = -\frac{\dot\gamma_t}{\gamma^o(1 + \gamma_t/\gamma^o)^2} < 0 = \dot\alpha_t - f_\alpha(t, \alpha_t),

where the first inequality follows from \gamma_t being increasing in the backward system. Moreover, \alpha_0 = \frac{1}{2-\chi_0}. The comparison theorem allows us to conclude that \alpha_t \ge 1/(2-\chi_t), and in turn \dot\alpha_t \le 0 (and hence \dot\alpha_t \ge 0 in the forward system), for all t \in [0, T], with both inequalities strict for t \in (0, T] (t \in [0, T) in the forward system) if and only if r > 0. It follows that for all t \in (0, T], \alpha_t < \alpha_0 = \frac{1}{2-\chi_0} < 1.
Now, by simple addition of the ODEs, we obtain that B^\infty_t := \beta_0 + \beta_1 + \beta_3 satisfies

\dot B^\infty_t = \frac{\alpha_t}{2\sigma_Y^2}\big\{2r\sigma_Y^2(2-\chi_t)[1 - B^\infty_t]\big\}, with B^\infty_0 = 1.

It is easy to see that the unique solution is B^\infty_t = 1 (one can integrate the previous ODE between 0 and t small, and argue by contradiction using the continuity of B^\infty).
Next, we establish uniform bounds on \beta_1 and \beta_3 (and hence \beta_0). Observe that \beta_{30} = \beta^m_{30} = 1/2 and \beta_{10} = \beta^m_{10} > 0; we first argue that \beta_{3t} > \frac{1}{2} and \beta_{1t} > 0 for all t \in (0, T]. By way of contradiction, suppose that \tau \in (0, T] is the first strictly positive time either of these properties fails. If \beta_{1\tau} = 0, then \dot\beta_{1\tau} = \frac{r\beta_{3\tau}}{2} > 0, a contradiction; hence \beta_{1\tau} > 0. But if \beta_{3\tau} = 1/2, then \dot\beta_{3\tau} = \frac{\beta_{1\tau}\gamma_\tau(1 + 2\beta_{1\tau}\chi_\tau)}{4\sigma_Y^2} > 0, also a contradiction. We conclude that \beta_{3t} > \frac{1}{2} and \beta_{1t} > 0 for all t \in (0, T].
By definition, \alpha_t = \beta_{1t}\chi_t + \beta_{3t} and thus \beta_{3t} \le \alpha_t < 1. Finally, note that \beta_{10} = \beta^m_{10} < 1 and the RHS of the \beta_1 ODE can be written as f_{\beta_1}(t, \beta_1) of class C^1, and with algebra it can be shown that

\dot\beta^m_{1t} - f_{\beta_1}(t, \beta^m_{1t}) = \frac{\gamma_t}{4\sigma_Y^2(2-\chi_t)^4}\big(\beta_{3t}[2-\chi_t] - [1-\chi_t]\big)\big(2\beta_{3t}[2-\chi_t] + \chi_t\big) > 0 = \dot\beta_{1t} - f_{\beta_1}(t, \beta_{1t}),

so by the comparison theorem, we have \beta_{1t} < \beta^m_{1t} < 1 for all t \in (0, T].

By an analogous one-dimensional shooting argument as in the public case, we obtain existence of a solution to the BVP and hence existence of an LME.
The final claim to prove is that as T → ∞, αT → 1, for which we use the forward
system. Recalling that α > 1/2, we have γT → 0 as T → ∞, and thus χT → 1 and
αT = 1/(2− χT )→ 1.
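As a numerical illustration of the shooting argument (not part of the formal proof), the backward system (A.36)-(A.40) can be integrated from a guess \gamma_0 = \gamma^F and the guess adjusted until the terminal value matches \gamma^o. The sketch below uses Euler integration and bisection, with arbitrary parameter values and the substitution \chi = 1 - \gamma/\gamma^o from Lemma 1:

```python
# One-dimensional shooting for the no-feedback backward system (A.36)-(A.40).
# Illustrative sketch only; parameters are arbitrary and bisection presumes the
# terminal value is monotone in the guess, consistent with the uniqueness argument.

def shoot(gamma_F, r, sigma_Y2, gamma_o, T, n=4000):
    """Integrate the backward IVP from the guess gamma_0 = gamma_F; return terminal gamma."""
    dt = T / n
    g = gamma_F
    chi = 1 - g / gamma_o
    b1, b3, a = 1 / (2 * (2 - chi)), 0.5, 1 / (2 - chi)
    for _ in range(n):
        chi = 1 - g / gamma_o
        da = r * a * (1 - a * (2 - chi))
        db1 = a / (2 * sigma_Y2) * (r * sigma_Y2
                                    - 2 * b1 * (b3 * g + r * sigma_Y2 * (2 - chi))
                                    + 2 * b1 ** 2 * g * (1 - chi))
        db3 = a / (2 * sigma_Y2) * (r * sigma_Y2 * (2 - chi)
                                    + 2 * b3 * (b1 * g - r * sigma_Y2 * (2 - chi)))
        dg = a ** 2 * g ** 2 / sigma_Y2
        a += da * dt; b1 += db1 * dt; b3 += db3 * dt; g += dg * dt
    return g

def solve_gamma_F(r, sigma_Y2, gamma_o, T, tol=1e-8):
    """Bisect on gamma_F in (0, gamma_o) so that the backward solution ends at gamma_o."""
    lo, hi = 0.0, gamma_o
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shoot(mid, r, sigma_Y2, gamma_o, T) < gamma_o:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```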
We refer to the following result in Propositions 3 and 4.

Lemma A.3. For r = 0, the leading by example game has a unique LME for the no feedback case, expressed in closed form as follows:

\beta_{1t} = \frac{\gamma^o[\sigma_Y^2(\gamma^o + \gamma_T)^2 - (T-t)(\gamma^o)^2\gamma_T]}{(\gamma^o + \gamma_T)[2\sigma_Y^2(\gamma^o + \gamma_T)^2 - (T-t)(\gamma^o)^2\gamma_T]}   (A.41)
\beta_{3t} = \frac{\sigma_Y^2(\gamma^o + \gamma_T)^2}{2\sigma_Y^2(\gamma^o + \gamma_T)^2 - (T-t)(\gamma^o)^2\gamma_T}   (A.42)
\alpha_t = \frac{\gamma^o}{\gamma^o + \gamma_T}   (A.43)
\gamma_t = \frac{\gamma_T\sigma_Y^2(\gamma^o + \gamma_T)^2}{\sigma_Y^2(\gamma^o + \gamma_T)^2 - (T-t)(\gamma^o)^2\gamma_T},   (A.44)

for all t \in [0, T], where \chi_t = 1 - \gamma_t/\gamma^o, \gamma_T is the unique solution in (0, \gamma^o) to the cubic \gamma_T T(\gamma^o)^3 + (\gamma_T - \gamma^o)(\gamma_T + \gamma^o)^2\sigma_Y^2 = 0, and \beta_0 \equiv 1 - \beta_1 - \beta_3.
Proof. We work with the backward system. First note that by setting r = 0 in (A.39), \alpha must be constant and equal to its initial value \alpha_0 = \frac{1}{2-\chi_0}. Next, recall that by Lemma 1, \chi_t = 1 - \frac{\gamma_t}{\gamma^o}, so \chi_0 = 1 - \frac{\gamma^F_{NF}}{\gamma^o} and thus \alpha_t = \alpha = \frac{\gamma^o}{\gamma^F_{NF} + \gamma^o} for all t \in [0, T]. Next, note that the ODE \dot\gamma_t = \frac{\alpha^2\gamma_t^2}{\sigma_Y^2} given an initial value \gamma^F_{NF} has solution

\gamma_t = \frac{\gamma^F_{NF}\sigma_Y^2}{\sigma_Y^2 - \gamma^F_{NF}\Big(\frac{\gamma^o}{\gamma^F_{NF} + \gamma^o}\Big)^2 t};

switching back to the forward system by replacing t with T - t yields the expression in the original statement. Now the terminal condition \gamma_T = \gamma^o is equivalent to the following cubic equation for \gamma^F_{NF}:

q(\gamma^F_{NF}) := \gamma^F_{NF}T(\gamma^o)^3 + \big(\gamma^F_{NF} - \gamma^o\big)\big(\gamma^F_{NF} + \gamma^o\big)^2\sigma_Y^2 = 0.   (A.45)

Note q(\gamma^F_{NF}) > 0 for \gamma^F_{NF} \ge \gamma^o and q(\gamma^F_{NF}) \le 0 for \gamma^F_{NF} \le 0, so all real roots must lie in (0, \gamma^o). Now any root of the cubic must satisfy

\frac{T(\gamma^o)^3}{\gamma^o - \gamma^F_{NF}} = \sigma_Y^2\frac{(\gamma^F_{NF} + \gamma^o)^2}{\gamma^F_{NF}}.   (A.46)

The LHS of (A.46) is strictly increasing for \gamma^F_{NF} \in (0, \gamma^o) while the RHS is strictly decreasing in this interval, so q has a unique real root. Returning to the \beta_1 ODE, using \alpha = \beta_1\chi + \beta_3, we have, in the forward system, \dot\beta_{1t} = \frac{\alpha\gamma_t\beta_{1t}}{\sigma_Y^2}(\alpha - \beta_{1t}). This ODE can be solved by integration after moving \beta_1(\alpha - \beta_1) to the LHS, and with algebra one obtains (in the forward system) the expression in the proposition statement. One then obtains \beta_{3t} from these known quantities using \beta_{3t} = \alpha - \beta_{1t}\chi_t.
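Although the cubic (A.45) has no simple closed-form root, the bracketing q(0) < 0 < q(\gamma^o) makes it easy to locate numerically. A minimal sketch (illustrative parameter values, not part of the paper):

```python
def q(x, T, sigma_Y2, gamma_o):
    """The cubic (A.45), whose unique root in (0, gamma_o) is gamma^F_NF."""
    return x * T * gamma_o ** 3 + (x - gamma_o) * (x + gamma_o) ** 2 * sigma_Y2

def gamma_F_NF(T, sigma_Y2, gamma_o, tol=1e-12):
    # q(0) < 0 < q(gamma_o), and the root in (0, gamma_o) is unique (Lemma A.3),
    # so bisection applies.
    lo, hi = 0.0, gamma_o
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, T, sigma_Y2, gamma_o) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

root = gamma_F_NF(T=2.0, sigma_Y2=1.0, gamma_o=0.8)
```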
Proofs for Section 2.3
Here we prove Propositions 3 and 4. Since these results require some preliminary lemmas,
we organize them into separate sections.
Proof of Proposition 3
We treat the patient and myopic cases one at a time. The following lemma compares signaling
and learning between the public and no feedback cases for r = 0.
Lemma A.4. For r = 0 and all values of T, \gamma^o and \sigma_Y, more information is revealed in the no feedback case than in the public benchmark case: \gamma^F_{pub} > \gamma^F_{NF}. In the public benchmark, there is more aggressive signaling early in the game and less aggressive signaling later in the game, relative to the no feedback case; i.e., there exists T^* \in (0, T) such that \beta^{pub}_{3t} > \alpha^{NF} if and only if t < T^*.
Proof. For the first claim, recall that \gamma^F_{NF} is the unique positive root of the cubic equation q(\gamma^F) = 0 defined in (A.45), where for \gamma > 0, q(\gamma) > 0 iff \gamma > \gamma^F_{NF}. Hence, to prove the claim, it suffices to show that q(\gamma^F_{pub}) > 0. By direct calculation, we have

q(\gamma^F_{pub}) = (\gamma^o)^3\Big(T\gamma^o + 2\sigma_Y^2 - \sqrt{(T\gamma^o)^2 + 4\sigma_Y^4}\Big) + \frac{\sigma_Y^2}{T^3}\Big(2\sigma_Y^2 - \sqrt{(T\gamma^o)^2 + 4\sigma_Y^4}\Big)\Big(2T\gamma^o + 2\sigma_Y^2 - \sqrt{(T\gamma^o)^2 + 4\sigma_Y^4}\Big)^2
= (\gamma^o)^4 T q_2(S), where
q_2(S) := 1 + 2S - \sqrt{1 + 4S^2} + S\Big(2S - \sqrt{1 + 4S^2}\Big)\Big(2 + 2S - \sqrt{1 + 4S^2}\Big)^2 and S := \frac{\sigma_Y^2}{T\gamma^o}.
We now show that q_2(S) > 0 for all S > 0 (observe that q_2(0) = 0). Let R(S) = 1 + 2S - \sqrt{1 + 4S^2}; it is straightforward to verify that R(0) = 0 and that for all S \ge 0, R'(S) > 0 and R(S) < 1. Moreover, the inverse of R is the function S : [0, 1) \to [0, \infty) characterized by S(R) := \frac{R(2-R)}{4(1-R)}. Hence, by change of variables, q_2(S) > 0 for all S > 0 iff q_3(R) > 0 for all R \in (0, 1), where

q_3(R) := R - S(R)(1-R)(R+1)^2.

Now for R \in [0, 1),

q_3(R) > 0 \iff S(R) = \frac{R(2-R)}{4(1-R)} < \frac{R}{(1-R)(R+1)^2} \iff q_4(R) := (2-R)(R+1)^2 < 4.

It is straightforward to verify that over the interval [0, 1], q_4(R) attains its maximum value of 4 only at R = 1, and tracing our steps backwards this implies that q(\gamma^F_{pub}) > 0, so \gamma^F_{pub} > \gamma^F_{NF}, proving the first claim.
For the second claim, using the forward system, since \beta^{pub}_{3T} = \frac{1}{2} < \alpha^{NF} and \beta^{pub}_{3t} is monotonically decreasing, it suffices to show that \beta^{pub}_{30} > \alpha^{NF}. Using the associated expressions from Lemmas A.2 and A.3, this is equivalent to

\frac{1}{2 - \frac{\gamma^F_{pub}T}{2\sigma_Y^2}} > \frac{\gamma^o}{\gamma^o + \gamma^F_{NF}} \iff \tilde\gamma := \gamma^o\Big(1 - \frac{\gamma^F_{pub}T}{2\sigma_Y^2}\Big) < \gamma^F_{NF}.
It suffices to show that q(\tilde\gamma) := T\tilde\gamma(\gamma^o)^3 + (\tilde\gamma - \gamma^o)(\tilde\gamma + \gamma^o)^2\sigma_Y^2 < 0. Recalling that

\gamma^F_{pub} = \frac{\gamma^o T + 2\sigma_Y^2 - \sqrt{(\gamma^o T)^2 + 4\sigma_Y^4}}{T},

one can show that

q(\tilde\gamma) = \frac{(\gamma^o)^4 T\big[-\gamma^o T + \sqrt{(\gamma^o T)^2 + 4\sigma_Y^4}\big]}{2\sigma_Y^2}\Bigg[1 - \frac{2\sigma_Y^2 - \big(\gamma^o T - \sqrt{(\gamma^o T)^2 + 4\sigma_Y^4}\big)}{2\sigma_Y^2}\Bigg]
= -\frac{T(\gamma^o)^4}{2\sigma_Y^4}\Big[(T\gamma^o)^2 + 2\sigma_Y^4 - T\gamma^o\sqrt{(T\gamma^o)^2 + 4\sigma_Y^4}\Big].

The expression in square brackets can be written as \frac{x+y}{2} - \sqrt{xy} > 0, where x = (T\gamma^o)^2 > 0 and y = (T\gamma^o)^2 + 4\sigma_Y^4 > 0, and thus q(\tilde\gamma) < 0, concluding the proof.
Continuing toward the proof of Proposition 3, we now handle the myopic case. We begin by deriving the solution for the public and no feedback cases for a myopic leader.

Lemma A.5. Suppose the leader is myopic. In the LME for the public case, \beta_3 = 1/2 and \gamma^{pub}_t = \frac{4\sigma_Y^2\gamma^o}{4\sigma_Y^2 + \gamma^o t}. In the LME for the no feedback case, \alpha_t = \frac{\gamma^o}{\gamma^o + \gamma^{NF}_t}, where \gamma^{NF}_t is defined implicitly as the unique solution in (0, \gamma^o] of the equation 2\ln(\gamma^{NF}_t/\gamma^o) - \gamma^o/\gamma^{NF}_t + \gamma^{NF}_t/\gamma^o = -\frac{\gamma^o t}{\sigma_Y^2}.
Proof. We first consider the public benchmark case, where in the myopic solution, \beta_{3t} = 1/2, and thus (in the forward system)

\dot\gamma^{pub}_t = -\frac{\beta_{3t}^2(\gamma^{pub}_t)^2}{\sigma_Y^2} = -\frac{(\gamma^{pub}_t)^2}{4\sigma_Y^2} \implies \gamma^{pub}_t = \frac{4\sigma_Y^2\gamma^o}{4\sigma_Y^2 + \gamma^o t},

and thus u^{pub}_t = \frac{\gamma^{pub}_t}{2} = \frac{2\sigma_Y^2\gamma^o}{4\sigma_Y^2 + \gamma^o t}.

In the myopic solution to the no feedback case, \alpha_t = \frac{1}{2-\chi_t} = \frac{1}{1 + \gamma^{NF}_t/\gamma^o}, where \gamma^{NF}_t solves the ODE

\dot\gamma^{NF}_t = -\frac{\alpha_t^2\big(\gamma^{NF}_t\big)^2}{\sigma_Y^2} = -\frac{1}{\sigma_Y^2}\Big(\frac{\gamma^o\gamma^{NF}_t}{\gamma^o + \gamma^{NF}_t}\Big)^2   (A.47)
\implies \frac{2\dot\gamma^{NF}_t}{\gamma^{NF}_t} + \frac{\gamma^o\dot\gamma^{NF}_t}{(\gamma^{NF}_t)^2} + \frac{\dot\gamma^{NF}_t}{\gamma^o} = -\frac{\gamma^o}{\sigma_Y^2}.   (A.48)

By integrating both sides of (A.48) and using that \gamma^{NF}_0 = \gamma^o to pin down the constant of integration, we obtain that the solution (\gamma^{NF}_t)_{t\in[0,T]} to (A.47) satisfies

2\ln(\gamma^{NF}_t/\gamma^o) - \gamma^o/\gamma^{NF}_t + \gamma^{NF}_t/\gamma^o = -\frac{\gamma^o t}{\sigma_Y^2}.   (A.49)

To verify that \gamma^{NF}_t \in (0, \gamma^o] is well-defined as such, define f : (0, 1] \to \mathbb{R} by

f(y) := 2\ln(y) - 1/y + y,

and note that f is strictly increasing, as f'(y) = (1 + 1/y)^2 > 0; moreover, f(1) = 0 \ge -\frac{\gamma^o t}{\sigma_Y^2}, while \lim_{y\to 0} f(y) = -\infty < -\frac{\gamma^o t}{\sigma_Y^2}. It follows that for all t \in [0, T], \gamma^{NF}_t \in (0, \gamma^o] is uniquely determined by (A.49).
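Since f is strictly increasing, (A.49) pins down \gamma^{NF}_t by simple bisection; the following sketch, with illustrative parameter values (not from the paper), implements this:

```python
import math

def f(y):
    """f(y) = 2 ln y - 1/y + y, strictly increasing on (0, 1]."""
    return 2 * math.log(y) - 1 / y + y

def gamma_NF(t, gamma_o, sigma_Y2, tol=1e-12):
    """Solve f(gamma / gamma_o) = -gamma_o * t / sigma_Y^2 for gamma in (0, gamma_o]."""
    target = -gamma_o * t / sigma_Y2
    lo, hi = 1e-12, 1.0          # work with the ratio y = gamma / gamma_o
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return gamma_o * 0.5 * (lo + hi)

# gamma^NF starts at gamma_o at t = 0 and decreases in t.
```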
Lemma A.6. For the myopic case, \gamma^{pub}_t > \gamma^{NF}_t for all t \in (0, T].

Proof. Observe that \gamma^{NF}_0 = \gamma^{pub}_0 = \gamma^o. Solving the ODEs for \gamma^{pub} and \gamma^{NF} by integration, and using that \alpha_t \ge \beta_{3t} = 1/2, with strict inequality for all t > 0, then yields the result.
The last step toward proving Proposition 3 is to establish uniform convergence of solutions to the myopic solutions as r \to \infty. For arbitrary T > 0 and r \ge 0, let BVP^{pub}(r) denote the boundary value problem for (\beta_1, \beta_3, \gamma) defined by (A.11)-(A.13) and the associated boundary conditions, parameterized by r, and likewise let BVP^{NF}(r) denote the boundary value problem for (\alpha, \gamma, \chi) defined by (A.33)-(A.35) and the associated boundary conditions. Define

\Xi^{pub} := \{(\beta_1, \beta_3, \gamma) \in C^1([0, T])^3 such that (\beta_1, \beta_3, \gamma) solves BVP^{pub}(r) for some r \ge 0\}
\Xi^{NF} := \{(\alpha, \gamma, \chi) \in C^1([0, T])^3 such that (\alpha, \gamma, \chi) solves BVP^{NF}(r) for some r \ge 0\}.

Lemma A.7. The families \{(\dot\beta_1, \dot\beta_3, \dot\gamma) : (\beta_1, \beta_3, \gamma) \in \Xi^{pub}\} and \{(\dot\alpha, \dot\gamma, \dot\chi) : (\alpha, \gamma, \chi) \in \Xi^{NF}\} are uniformly bounded, and hence \Xi^{pub} and \Xi^{NF} are equicontinuous.
Proof. We begin with the public case. Recall that \Xi^{pub} is uniformly bounded, and in particular, we have (\beta_{1t}, \beta_{3t}, \gamma_t) \in [0, 1/2] \times [1/2, 1] \times [0, \gamma^o] for all (\beta_1, \beta_3, \gamma) \in \Xi^{pub} and all t \in [0, T]. It follows that |\dot\gamma_t| = \frac{\beta_{3t}^2\gamma_t^2}{\sigma_Y^2} \le \frac{(\gamma^o)^2}{\sigma_Y^2}. We now establish a uniform bound on \dot\beta_3. If we define \beta^m_3 := 1/2 and \beta^f_3 := \beta_3 - \beta^m_3, we have from the (backward system) \beta_3 ODE

\dot\beta^f_{3t} = \dot\beta_{3t} = \beta_{3t}[-2r\beta^f_{3t} + \beta_{3t}(1-\beta_{3t})\gamma_t/\sigma_Y^2],

which is linear in \beta^f_3. Solving this ODE and multiplying through by r, we obtain

r\beta^f_{3t} = \int_0^t re^{-r\int_s^t 2\beta_{3u}du}\,\beta_{3s}^2(1-\beta_{3s})\gamma_s/\sigma_Y^2\,ds.
Now |\beta_{3s}^2(1-\beta_{3s})\gamma_s/\sigma_Y^2| \le g^{pub} := \frac{\gamma^o}{\sigma_Y^2}. Moreover, 2\beta_{3u} \ge 1, and thus

|r\beta^f_{3t}| \le g^{pub}\int_0^t re^{-r\int_s^t 2\beta_{3u}du}ds \le g^{pub}\int_0^t re^{-r(t-s)}ds = g^{pub}(1 - e^{-rt}) < g^{pub}.

It follows that

|\dot\beta_{3t}| = |\dot\beta^f_{3t}| = |\beta_{3t}[-2r\beta^f_{3t} + \beta_{3t}(1-\beta_{3t})\gamma_t/\sigma_Y^2]| \le 2\beta_{3t}|r\beta^f_{3t}| + |\beta_{3t}^2(1-\beta_{3t})\gamma_t/\sigma_Y^2| \le 2g^{pub} + g^{pub} = 3g^{pub},

which is the desired uniform bound, as g^{pub} is independent of r. Now since \beta_1 + \beta_3 \equiv 1, |\dot\beta_{1t}| is also uniformly bounded above by 3g^{pub}. Hence we have established uniform bounds on the derivatives (\dot\beta_1, \dot\beta_3, \dot\gamma) for (\beta_1, \beta_3, \gamma) \in \Xi^{pub}, and thus \Xi^{pub} is equicontinuous.
Next, we turn to the no feedback case, where we recall the uniform bounds \alpha_t \in [1/(2-\chi_t), 1] \subset [1/2, 1], \gamma_t \in [0, \gamma^o] and \chi_t \in [0, 1]. Immediately, we have |\dot\gamma_t| = \frac{\alpha_t^2\gamma_t^2}{\sigma_Y^2} \le \frac{(\gamma^o)^2}{\sigma_Y^2}, and since \chi \equiv 1 - \gamma/\gamma^o, |\dot\chi_t| = |-\dot\gamma_t/\gamma^o| \le \frac{\gamma^o}{\sigma_Y^2} =: g^{NF}. We now uniformly bound \dot\alpha_t.

Set \alpha^m_t := 1/(2-\chi_t) and \alpha^f_t := \alpha_t - \alpha^m_t, and note that \dot\alpha^m_t = \dot\chi_t/(2-\chi_t)^2. We have

\dot\alpha^f_t := \dot\alpha_t - \dot\alpha^m_t = -r\alpha_t(2-\chi_t)\alpha^f_t - \dot\chi_t/(2-\chi_t)^2,

which is linear in \alpha^f. As in the public case, solving this ODE and multiplying through by r yields

r\alpha^f_t = \int_0^t re^{-r\int_s^t \alpha_u(2-\chi_u)du}\big[-\dot\chi_s/(2-\chi_s)^2\big]ds
\implies |r\alpha^f_t| \le \int_0^t re^{-r\int_s^t \alpha_u(2-\chi_u)du}\big|\dot\chi_s/(2-\chi_s)^2\big|ds.

Now |\dot\chi_s/(2-\chi_s)^2| \le |\dot\chi_s| \le g^{NF} as noted above, so

|r\alpha^f_t| \le g^{NF}\int_0^t re^{-r\int_s^t \alpha_u(2-\chi_u)du}ds \le g^{NF}\int_0^t re^{-r(t-s)}ds = g^{NF}(1 - e^{-rt}) < g^{NF},

where we have used that \alpha_u \ge 1/(2-\chi_u) \implies \int_s^t \alpha_u(2-\chi_u)du \ge t - s. We now have

|\dot\alpha_t| = |-r\alpha_t(2-\chi_t)\alpha^f_t| = |r\alpha^f_t| \cdot |\alpha_t| \cdot |2-\chi_t| \le g^{NF} \cdot 1 \cdot 2.

We have shown that (\dot\alpha, \dot\gamma, \dot\chi) are uniformly bounded for (\alpha, \gamma, \chi) \in \Xi^{NF}, and thus \Xi^{NF} is equicontinuous.
Let (\beta^{pub,\infty}_1, \beta^{pub,\infty}_3, \gamma^{pub,\infty}) and (\alpha^{NF,\infty}, \gamma^{NF,\infty}, \chi^{NF,\infty}) denote the myopic equilibrium coefficients for the public and no feedback cases, respectively.
Proposition A.1. Fix any T > 0 and let \{r_n\}_{n=1}^\infty be a sequence with \lim_{n\to\infty} r_n = \infty. Let \{(\beta^{pub,n}_1, \beta^{pub,n}_3, \gamma^{pub,n})\}_{n=1}^\infty and \{(\alpha^{NF,n}, \gamma^{NF,n}, \chi^{NF,n})\}_{n=1}^\infty be sequences of solutions to BVP^{pub}(r_n) and BVP^{NF}(r_n), respectively. Then, uniformly, (\beta^{pub,n}_1, \beta^{pub,n}_3, \gamma^{pub,n}) \to (\beta^{pub,\infty}_1, \beta^{pub,\infty}_3, \gamma^{pub,\infty}) and (\alpha^{NF,n}, \gamma^{NF,n}, \chi^{NF,n}) \to (\alpha^{NF,\infty}, \gamma^{NF,\infty}, \chi^{NF,\infty}).
Proof. First, note that r_n \to \infty implies that r_n = 0 for at most finitely many n, so there exists N such that r_n > 0 for all n \ge N; it suffices to consider only such n. In the public benchmark, recall that \beta^{pub,\infty}_1 = \beta^{pub,\infty}_3 = 1/2. We have \beta_3 \ge 1/2 > 0, so the \beta_3 ODE can be rearranged to obtain

\beta^{pub,n}_{3t} = 1/2 + \frac{1}{r_n}\Bigg[\frac{(\beta^{pub,n}_{3t})^2(1-\beta^{pub,n}_{3t})\gamma^{pub,n}_t}{2\beta^{pub,n}_{3t}\sigma_Y^2} - \frac{\dot\beta^{pub,n}_{3t}}{2\beta^{pub,n}_{3t}}\Bigg].

Since \beta^{pub,n}_{3t} \ge 1/2, the expression in brackets is uniformly bounded in absolute value by some constant K^{pub} (independent of r_n and t). Hence \beta^{pub,n}_3 converges pointwise to 1/2 = \beta^{pub,\infty}_3. Since \beta^{pub,n}_1 \equiv 1 - \beta^{pub,n}_3, \beta^{pub,n}_1 converges pointwise to 1/2 = \beta^{pub,\infty}_1. As [0, T] is compact and the sequence \{(\beta^{pub,n}_1, \beta^{pub,n}_3)\}_{n=1}^\infty is equicontinuous by Lemma A.7, we apply Lemma 39 in Royden (1988, p. 168) and obtain uniform convergence for \beta^{pub}_1 and \beta^{pub}_3. Finally, we prove uniform convergence for \gamma^{pub} by proving pointwise convergence and invoking the same result to obtain uniform convergence. Note that for i \in \mathbb{N} \cup \{\infty\},

\gamma^{pub,i}_t = \frac{\gamma^o\sigma_Y^2}{\sigma_Y^2 + \gamma^o\int_t^T(\beta^{pub,i}_{3s})^2ds}.

Since for n \in \mathbb{N}, the functions (\beta^{pub,n}_{3s})^2 are bounded uniformly by a constant and converge pointwise to (\beta^{pub,\infty}_{3s})^2, the dominated convergence theorem (Royden, 1988, p. 267, Theorem 16) implies \int_t^T(\beta^{pub,n}_{3s})^2ds \to \int_t^T(\beta^{pub,\infty}_{3s})^2ds, and thus pointwise for all t \in [0, T],

\gamma^{pub,n}_t \to \frac{\gamma^o\sigma_Y^2}{\sigma_Y^2 + \gamma^o\int_t^T(\beta^{pub,\infty}_{3s})^2ds} = \gamma^{pub,\infty}_t.

By Lemma A.7, the sequence \gamma^{pub,n} is equicontinuous, and thus Royden (1988, p. 168, Lemma 39) gives uniform convergence to \gamma^{pub,\infty}.
Next, consider the no feedback case. By Royden (1988, p. 169, Theorem 40), there exists a subsequence of \{(\alpha^{NF,n}, \gamma^{NF,n}, \chi^{NF,n})\}_{n=1}^\infty indexed by \{k(n)\}_{n=1}^\infty which converges uniformly on [0, T] to some limit (\alpha^*, \gamma^*, \chi^*). We argue that (\alpha^*, \gamma^*, \chi^*) = (\alpha^{NF,\infty}, \gamma^{NF,\infty}, \chi^{NF,\infty}), and thus the original sequence converges uniformly to (\alpha^{NF,\infty}, \gamma^{NF,\infty}, \chi^{NF,\infty}).

Using a similar dominated convergence argument to before, we have that pointwise

\gamma^{NF,k(n)}_t \to \frac{\gamma^o\sigma_Y^2}{\sigma_Y^2 + \gamma^o\int_t^T(\alpha^*_s)^2ds},

so \gamma^*_t = \frac{\gamma^o\sigma_Y^2}{\sigma_Y^2 + \gamma^o\int_t^T(\alpha^*_s)^2ds} and \chi^{NF,k(n)}_t \to 1 - \gamma^*_t/\gamma^o. Since \chi^{NF,k(n)} \to \chi^* uniformly, we have \chi^* = 1 - \gamma^*/\gamma^o.

Let \alpha^{m,n}_t := 1/(2 - \chi^{NF,n}_t). Since \alpha^{NF,n}_t > 0 for all n, the \alpha ODE can be rearranged to obtain

\alpha^{NF,n}_t - \alpha^{m,n}_t = \frac{1}{r_n}\Bigg[-\frac{\dot\alpha^{NF,n}_t}{\alpha^{NF,n}_t(2 - \chi^{NF,n}_t)}\Bigg].

Since 2 - \chi^{NF,n}_t > 1 and \alpha^{NF,n}_t > 1/2, the expression in brackets is uniformly bounded in absolute value by some constant K^{NF} (independent of r_n and t), and thus |\alpha^{NF,n}_t - \alpha^{m,n}_t| \to 0 pointwise. It follows that pointwise, and by familiar arguments uniformly, \alpha^{m,k(n)} \to \alpha^*. But \alpha^{m,k(n)} = 1/(2 - \chi^{NF,k(n)}) \to 1/(2 - \chi^*), so \alpha^* = 1/(2 - \chi^*) = 1/(1 + \gamma^*/\gamma^o).

By differentiating the equation \gamma^*_t = \frac{\gamma^o\sigma_Y^2}{\sigma_Y^2 + \gamma^o\int_t^T(\alpha^*_s)^2ds}, we obtain \dot\gamma^*_t = \frac{(\alpha^*_t\gamma^*_t)^2}{\sigma_Y^2} = \frac{(\gamma^*_t)^2}{\sigma_Y^2[1 + \gamma^*_t/\gamma^o]^2}, subject to \gamma^*_T = \gamma^o. This equation has a unique solution, which is \gamma^{NF,\infty} as obtained in the proof of Lemma A.5, from which \alpha^* = \alpha^{NF,\infty} and \chi^* = \chi^{NF,\infty}. This establishes that all convergent subsequences have the same limit, and hence the original sequence converges to that limit.
Proof of Proposition 3. Part (i) is a restatement of results from Lemma A.4. For part (ii), choose any \delta \in (0, T). Define \underline\gamma := \min_{t\in[T-\delta,T]}(\gamma^{pub,\infty}_t - \gamma^{NF,\infty}_t), where we use \gamma^{x,r} to denote the solution for the case x \in \{pub, NF\} and discount rate r. By Lemma A.6, \underline\gamma > 0. By Proposition A.1, \gamma^{pub,r} - \gamma^{NF,r} converges uniformly to \gamma^{pub,\infty} - \gamma^{NF,\infty} as r \to \infty, and thus for any \varepsilon \in (0, \underline\gamma), there exists \bar r > 0 such that for all r > \bar r and all t \in [T-\delta, T], we have \gamma^{pub,r}_t - \gamma^{NF,r}_t > \gamma^{pub,\infty}_t - \gamma^{NF,\infty}_t - \varepsilon \ge \underline\gamma - \varepsilon > 0, as desired.
Proof of Proposition 4
We begin by calculating expected flow losses in the public and no feedback cases.
Lemma A.8. The expected flow payoffs to player 1 in the public benchmark and no feedback cases have magnitudes

u^{pub}_t = \gamma^{pub}_t[(1-\beta_{3t})^2 + \beta_{3t}^2]
u^{NF}_t = (1-\alpha_{3t})^2\gamma^o + \alpha_{3t}^2\gamma^{NF}_t.
Proof. Let E_\theta denote ex ante expectations over realizations of \theta \sim N(m_0, \gamma^o). In the public benchmark, using \beta_{1t} = 1 - \beta_{3t}, we have

u^{pub}_t = E_\theta\big(E_0\big[(\theta - a_t)^2 + (a_t - \bar a_t)^2\big]\big) = E_\theta\big(E_0\big[(\theta - \beta_{1t}M_t - \beta_{3t}\theta)^2 + (M_t - \beta_{1t}M_t - \beta_{3t}\theta)^2\big]\big)
= E_\theta\big(E_0\big[(\theta - M_t)^2\big([1-\beta_{3t}]^2 + \beta_{3t}^2\big)\big]\big) = E_0\big[E_t\{(\theta - M_t)^2\}\big([1-\beta_{3t}]^2 + \beta_{3t}^2\big)\big] = \gamma^{pub}_t[(1-\beta_{3t})^2 + \beta_{3t}^2],

where in the second to last step we have used that the long-run player and the myopic player have the same information prior to \theta being realized, along with the law of iterated expectations.

In the no feedback case, we have

u^{NF}_t = E_\theta\big(E_0\big[(\theta - a_t)^2 + (a_t - \bar a_t)^2\big]\big)
= E_\theta\big(E_0\big[(\theta - (1-\alpha_t)m_0 - \alpha_t\theta)^2 + ([1-\alpha_t]m_0 + \alpha_t\theta - [1-\alpha_t]m_0 - \alpha_tM_t)^2\big]\big)
= E_\theta\big(E_0\big[(1-\alpha_t)^2(\theta - m_0)^2 + \alpha_t^2(M_t - \theta)^2\big]\big)
= \gamma^o(1-\alpha_t)^2 + \alpha_t^2E_\theta\big(E_0\big\{E_t\big[(M_t - \theta)^2\big]\big\}\big),

where we have again used the law of iterated expectations. In the second term, we expand

E_t\big[(M_t - \theta)^2\big] = E_t\big[M_t^2 - 2M_t\theta + \theta^2\big] = E_t\big[M_t^2\big] - 2\theta\hat M_t + \theta^2
= \gamma^{NF}_t\chi_t + \hat M_t^2 - 2\theta\hat M_t + \theta^2
= \gamma^{NF}_t\chi_t + (\hat M_t - \theta)^2
= \gamma^{NF}_t\chi_t + (1-\chi_t)^2(\theta - m_0)^2.

This term is unchanged under the expectation E_0. Under the outer expectation E_\theta, it reduces to \gamma^{NF}_t\chi_t + (1-\chi_t)^2\gamma^o, and using \chi_t = 1 - \gamma^{NF}_t/\gamma^o, the result follows.
Let V^{pub} and V^{NF} denote player 1's ex ante expected payoff, i.e., at time 0 taking expectations over \theta \sim N(m_0, \gamma^o).

Lemma A.9. Assume r = 0. For all T, \gamma^o, \sigma_Y^2 > 0, V^{pub} > V^{NF}.

Proof. For i \in \{pub, NF\}, define \tilde T^i := \frac{T\gamma^F_i}{\sigma_Y^2} and define \rho := \gamma^F_{NF}/\gamma^o. We claim that the quantities under comparison are

V^{pub} = -\sigma_Y^2\Big\{\tilde T^{pub}/2 - \ln\Big[\frac{16 - 8\tilde T^{pub}}{(4 - \tilde T^{pub})^2}\Big]\Big\}
V^{NF} = -\sigma_Y^2\{\rho(1-\rho) - \ln\rho\}.
First consider the public benchmark, where from Lemma A.8,

V^{pub} = E_0\Big(\int_0^T\big[-(\theta - a_t)^2 - (a_t - \bar a_t)^2\big]dt\Big) = -\int_0^T u^{pub}_t dt = -\int_0^T\big[\gamma_t\big([1-\beta_{3t}]^2 + \beta_{3t}^2\big)\big]dt.

Using the closed-form expressions for \gamma_t and \beta_{3t}, the integrand simplifies to

\gamma_t\big([1-\beta_{3t}]^2 + \beta_{3t}^2\big) = \frac{\gamma^F_{pub}}{2}\Big[1 + \frac{2t\gamma^F_{pub}\sigma_Y^2}{(2\sigma_Y^2 - t\gamma^F_{pub})(4\sigma_Y^2 - t\gamma^F_{pub})}\Big] = \frac{\gamma^F_{pub}}{2}\Big[1 + \frac{2\tilde t^{pub}}{(2 - \tilde t^{pub})(4 - \tilde t^{pub})}\Big],

where \tilde t^{pub} := t\gamma^F_{pub}/\sigma_Y^2. Using that the function g : x \mapsto \frac{x}{(2-x)(4-x)} has antiderivative \ln\big(\frac{(4-x)^2}{2-x}\big), integrating the second term with respect to \tilde t^{pub} over [0, \tilde T^{pub}] and rescaling by \sigma_Y^2/\gamma^F_{pub}, we obtain

V^{pub} = -\frac{T\gamma^F_{pub}}{2} - \sigma_Y^2\Big(\ln\Big[\frac{(4 - \tilde T^{pub})^2}{2 - \tilde T^{pub}}\Big] - \ln 8\Big)
= -\sigma_Y^2\Big\{\tilde T^{pub}/2 - \ln\Big[\frac{16 - 8\tilde T^{pub}}{(4 - \tilde T^{pub})^2}\Big]\Big\}.
Next, consider the no feedback case, where by Lemma A.8,

V^{NF} = E_0\Big(\int_0^T\big[-(\theta - a_t)^2 - (a_t - \bar a_t)^2\big]dt\Big) = -\int_0^T u^{NF}_t dt = -\int_0^T\big[(1-\alpha)^2\gamma^o + \alpha^2\big([1-\chi_t]^2\gamma^o + \gamma_t\chi_t\big)\big]dt.   (A.50)

By plugging in the analytic solutions for \alpha, \chi_t and \gamma_t in terms of \gamma^F_{NF}, the integrand in (A.50) is

\frac{\gamma^o\gamma^F_{NF}}{(\gamma^o + \gamma^F_{NF})^2} \cdot \frac{t(\gamma^o\gamma^F_{NF})^2 - (\gamma^o + \gamma^F_{NF})^3\sigma_Y^2}{t(\gamma^o)^2\gamma^F_{NF} - (\gamma^o + \gamma^F_{NF})^2\sigma_Y^2}.   (A.51)

From the equation defining \gamma^F_{NF}, we have (\gamma^o + \gamma^F_{NF})^2\sigma_Y^2 = T\gamma^F_{NF}(\gamma^o)^3/(\gamma^o - \gamma^F_{NF}), and thus (A.51) reduces to

\frac{\gamma^o\gamma^F_{NF}}{(\gamma^o + \gamma^F_{NF})^2} \cdot \frac{t\gamma^F_{NF}(\gamma^o - \gamma^F_{NF}) - T\gamma^o(\gamma^o + \gamma^F_{NF})}{t(\gamma^o - \gamma^F_{NF}) - T\gamma^o}.   (A.52)

Integrating (A.52) yields

V^{NF} = -\int_0^T\Bigg[\frac{\gamma^o\gamma^F_{NF}}{(\gamma^o + \gamma^F_{NF})^2} \cdot \frac{t\gamma^F_{NF}(\gamma^o - \gamma^F_{NF}) - T\gamma^o(\gamma^o + \gamma^F_{NF})}{t(\gamma^o - \gamma^F_{NF}) - T\gamma^o}\Bigg]dt
= -\frac{T\gamma^o\gamma^F_{NF}}{(\gamma^o + \gamma^F_{NF})^2}\Bigg[\gamma^F_{NF} + \frac{(\gamma^o)^2\ln\frac{\gamma^o}{\gamma^F_{NF}}}{\gamma^o - \gamma^F_{NF}}\Bigg]
= -\frac{\sigma_Y^2\big(\gamma^o - \gamma^F_{NF}\big)}{(\gamma^o)^2}\Bigg[\gamma^F_{NF} + \frac{(\gamma^o)^2\ln\frac{\gamma^o}{\gamma^F_{NF}}}{\gamma^o - \gamma^F_{NF}}\Bigg]
= -\sigma_Y^2\{\rho(1-\rho) - \ln\rho\}.
Next, we prove that V^{NF} < V^{pub}. Substituting in the expression for \gamma^F_{pub} and simplifying, we obtain

V^{NF} - V^{pub} = \sigma_Y^2\Big\{\rho(\rho - 1) + \ln\rho + 1 + \tilde T/2 - \frac{\sqrt{4 + \tilde T^2}}{2} - \ln\Big[8\Big(-\tilde T + \sqrt{4 + \tilde T^2}\Big)\Big] + 2\ln\Big[2 - \tilde T + \sqrt{4 + \tilde T^2}\Big]\Big\} = \frac{\sigma_Y^2}{2\rho}f(\rho),

where again \tilde T := \frac{T\gamma^o}{\sigma_Y^2} and \rho := \frac{\gamma^F_{NF}}{\gamma^o}, where in the second line we have used that the cubic equation defining \gamma^F_{NF} rearranges to \tilde T = \frac{(1-\rho)(1+\rho)^2}{\rho}, and where

f(x) := A_1(x) + 2x\ln\Big(\frac{A_2(x)^2}{A_3(x)}\Big)   (A.53)
A_1(x) := x^3 - 3x^2 + 3x + 1 - z(x)
A_2(x) := x^3 + x^2 + x - 1 + z(x)
A_3(x) := 8[x^3 + x^2 - x - 1 + z(x)]
z(x) := \sqrt{4x^2 + (1-x)^2(1+x)^4}.
We now show that f(x) < 0 for all x \in (0, 1), so that in particular f(\rho) < 0, from which the desired result follows. To that end, we first show that A_2(x) > 0 and A_3(x) > 0 for all x > 0. By inspection, for all x > 0, we have A_2(x) > A_3(x)/8, and

A_3(x)/8 = (x-1)(x+1)^2 + \sqrt{4x^2 + (1-x)^2(1+x)^4} > (x+1)^2\big[x - 1 + \sqrt{(1-x)^2}\big] \ge 0.

Next, we apply the inequality \ln(y) \le 2\big(y^{1/2} - 1\big) for y > 0, with y = \frac{A_2(x)^2}{A_3(x)}, in (A.53) to obtain

f(x) \le A_1(x) + 4x\Big(\frac{A_2(x)}{\sqrt{A_3(x)}} - 1\Big).   (A.54)

For x > 0, the RHS of (A.54) is negative if and only if

\frac{A_2(x)}{\sqrt{A_3(x)}} < -\frac{A_1(x)}{4x} + 1.   (A.55)
For x \in (0, 1), the RHS of (A.55) is strictly positive:

-\frac{A_1(x)}{4x} + 1 = \frac{1}{4x}\Big[\sqrt{4x^2 + (1-x)^2(1+x)^4} - x^3 + 3x^2 + x - 1\Big] > \frac{1}{4x}\Big[(1-x)(1+x)^2 - x^3 + 3x^2 + x - 1\Big] = \frac{1}{2}[1 + x(1-x)] > 0.

Hence, for x \in (0, 1), (A.55) is equivalent to

0 > A_2(x)^2 - A_3(x)\Big(-\frac{A_1(x)}{4x} + 1\Big)^2 = \frac{2}{x^2}\big[(1-x)^2A_4(x) + A_5(x)z(x)\big],   (A.56)

where

A_4(x) = x^6 - 4x^4 - x^3 + 4x^2 + 3x + 1
A_5(x) = x^5 - 3x^4 + x^3 + 2x^2 - 1.

Now by Descartes' rule of signs, A_5 has 3 sign changes and hence at most 3 positive real roots, counting multiplicity. It is easy to verify that there is a double root at x = 1, that A_5(2) = -1 < 0, and that \lim_{x\to+\infty} A_5(x) = +\infty, so there is a positive root at some x > 2. This implies there are no roots in (0, 1). Since A_5(0) = -1 < 0, it follows that A_5(x) < 0 for all x \in (0, 1). Thus, without signing A_4(x), it suffices to show that (1-x)^2|A_4(x)| < -A_5(x)z(x), or equivalently (since A_5(x) < 0)

0 > (1-x)^4A_4(x)^2 - z(x)^2A_5(x)^2 = 4(1-x)^4x^5(x^4 - 2x^2 - 3x - 1).   (A.57)

Now for x \in (0, 1), we have in (A.57) x^4 - 2x^2 - 3x - 1 < -2x^2 - 3x < 0, and the outside factor is clearly positive. Hence we have shown that f(x) < 0 for x \in (0, 1), concluding the proof.
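The chain of inequalities above can also be spot-checked numerically; the sketch below (purely illustrative, not part of the proof) evaluates f from (A.53) on a grid in (0, 1):

```python
import math

def z(x):
    return math.sqrt(4 * x ** 2 + (1 - x) ** 2 * (1 + x) ** 4)

def f(x):
    """f from (A.53): A_1(x) + 2x ln(A_2(x)^2 / A_3(x))."""
    A1 = x ** 3 - 3 * x ** 2 + 3 * x + 1 - z(x)
    A2 = x ** 3 + x ** 2 + x - 1 + z(x)
    A3 = 8 * (x ** 3 + x ** 2 - x - 1 + z(x))
    return A1 + 2 * x * math.log(A2 ** 2 / A3)

vals = [f(k / 200) for k in range(1, 200)]   # grid in (0, 1)
# Every value on the grid is strictly negative, as the proof establishes.
```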
Lemma A.10. Suppose the leader is myopic. There exists \bar T > 0 such that for all T > \bar T, the team's ex ante flow payoff is strictly larger in the no feedback case over [\bar T, T].
Proof. Plugging the solutions from Lemma A.5 into the expressions for the expected flow losses in Lemma A.8, we have

u^{pub}_t = \gamma^{pub}_t[(1-\beta_{3t})^2 + \beta_{3t}^2] = \frac{\gamma^{pub}_t}{2} = \frac{2\sigma_Y^2\gamma^o}{4\sigma_Y^2 + \gamma^o t}
u^{NF}_t = (1-\alpha_{3t})^2\gamma^o + \alpha_{3t}^2\gamma^{NF}_t = \frac{\gamma^o\gamma^{NF}_t}{\gamma^o + \gamma^{NF}_t}.

Let \tilde t := \frac{t\gamma^o}{\sigma_Y^2}. Flow losses are worse in the public benchmark case at time t if and only if

u^{pub}_t > u^{NF}_t \iff \frac{2\sigma_Y^2\gamma^o}{4\sigma_Y^2 + t\gamma^o} > \frac{\gamma^o\gamma^{NF}_t}{\gamma^o + \gamma^{NF}_t} \iff \gamma^{NF}_t < \frac{2\sigma_Y^2\gamma^o}{2\sigma_Y^2 + \gamma^o t} = \frac{2\gamma^o}{2 + \tilde t}.

Since f(\gamma/\gamma^o) is strictly increasing in \gamma, where f(\gamma^{NF}/\gamma^o) is the LHS of (A.49), it suffices to show that there exists \bar T such that f(\gamma/\gamma^o)\big|_{\gamma = \frac{2\gamma^o}{2+\tilde t}} > -\frac{\gamma^o t}{\sigma_Y^2} = -\tilde t whenever t > \bar T. By direct calculation,

f(\gamma/\gamma^o)\big|_{\gamma = \frac{2\gamma^o}{2+\tilde t}} + \tilde t = \frac{f_2(\tilde t)}{2(\tilde t + 2)}, where   (A.58)
f_2(\tilde t) := \tilde t^2 + 4(\tilde t + 2)\ln\Big[\frac{2}{\tilde t + 2}\Big].

Now the denominator of the RHS of (A.58) is strictly positive, and it is easy to verify that in the numerator, f_2(0) = 0 and f_2'(0) = -4. Moreover, f_2'(\tilde t) = 2\tilde t - 4 + 4\ln\big[\frac{2}{\tilde t + 2}\big], with \lim_{\tilde t\to+\infty} f_2'(\tilde t) = +\infty as the linear term dominates the log term, and f_2''(\tilde t) = 2 - \frac{4}{\tilde t + 2} \ge 0, with strict inequality for all \tilde t > 0. It follows that there is a unique positive threshold \tilde t^* defined by f_2(\tilde t^*) = 0 such that the RHS of (A.58) is strictly positive, and u^{pub}_t > u^{NF}_t, if and only if \tilde t > \tilde t^*. Letting \bar T = \tilde t^*\sigma_Y^2/\gamma^o yields the result.
Proof of Proposition 4. Part (i) of the proposition is simply a restatement of Lemma A.9. For part (ii), use the superscript r for the discount rate, as in the proof of Proposition 3. Let \bar T be as in Lemma A.10 and define \underline u := \min_{t\in[\bar T, T]}(u^{pub,\infty}_t - u^{NF,\infty}_t), which is strictly positive. Now the mappings \gamma \mapsto \gamma/2 and \gamma \mapsto \frac{\gamma^o\gamma}{\gamma^o + \gamma} underlying the expressions for expected flow losses in Lemma A.8 are continuous for \gamma \in [0, \gamma^o], a compact set, and hence uniformly continuous; together with Proposition A.1, this implies that u^{pub,r} and u^{NF,r} converge uniformly to u^{pub,\infty} and u^{NF,\infty}, respectively. Hence for any \varepsilon \in (0, \underline u), there exists \bar r such that for all r > \bar r and all t \in [\bar T, T], u^{pub,r}_t - u^{NF,r}_t > u^{pub,\infty}_t - u^{NF,\infty}_t - \varepsilon \ge \underline u - \varepsilon > 0.
Appendix B: Proofs for Section 4
In the proofs for this section, we denote the prior by m0 := µ.
Proof of Lemma 2. We establish a more general version of the lemma for a drift of the form \bar a_t + \nu a_t, \nu \in [0, 1], in X. We use "p1" and "p2" to refer to the long-run player and the myopic player, respectively. Without fear of confusion, we also use \gamma_{1t} for the posterior variance of p2 (\gamma_t in the main body), as this variance appears in the first filtering step of the proof. Likewise, p1's posterior variance will be denoted by \gamma_2, as it is obtained from a second filtering step.

Inserting (11) in (12), we can write a_t = \alpha_{0t} + \alpha_{2t}L_t + \alpha_{3t}\theta, with

\alpha_{0t} = \beta_{0t}, \quad \alpha_{2t} = \beta_{2t} + \beta_{1t}(1-\chi_t), \quad and \quad \alpha_{3t} = \beta_{3t} + \beta_{1t}\chi_t,

for the action that p2 conjectures drives the public signal X.^{35} With this in hand, p2's filtering problem can be solved using the Kalman filter (Chapters 11 and 12 in Liptser and Shiryaev, 1977). Specifically, define

dX^2_t := dX_t - [\bar a_t + \nu(\alpha_{0t} + \alpha_{2t}L_t)]dt = \nu\alpha_{3t}\theta dt + \sigma_X dZ^X_t
dY^2_t := dY_t - [\alpha_{0t} + \alpha_{2t}L_t]dt = \alpha_{3t}\theta dt + \sigma_Y dZ^Y_t,
which are in p2’s information set. Then, by Theorems 12.6 and 12.7 in Liptser and Shiryaev
(1977), θ|F2t ∼ N (Mt, γ1t), where,
dMt =να3tγ1t
σ2X
[dX2t − να3tMtdt] +
α3tγ1t
σ2Y
[dY 2t − α3tMtdt]
˙γ1t = −γ21tα
23tΣ,
and Σ :=(ν2
σ2X
+ 1σ2Y
). (These expressions hold for any (admissible) strategies of the players,
as deviations go undetected.)
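The two-signal filtering step above is easy to simulate. The following is a minimal numerical sketch (not from the paper; all parameter values, including the constant $\alpha_3$, are illustrative assumptions) that Euler-discretizes the updating equations for $M_t$ and $\gamma_{1t}$ and checks the posterior variance against its closed form $1/\gamma_{1t} = 1/\gamma_{10} + \alpha_3^2\Sigma t$:

```python
import numpy as np

# Hedged numerical sketch of p2's filtering step: theta ~ N(m0, g0) is static,
# and p2 observes dX2 = nu*a3*theta dt + sX dZX and dY2 = a3*theta dt + sY dZY.
# All parameter values below are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
T, n = 1.0, 20000
dt = T / n
nu, a3, sX, sY = 0.5, 1.0, 1.0, 0.8   # constant alpha_3 for simplicity
m0, g0 = 0.0, 1.0
Sigma = nu**2 / sX**2 + 1.0 / sY**2

theta = m0 + np.sqrt(g0) * rng.standard_normal()
M, g = m0, g0
for _ in range(n):
    dZX = np.sqrt(dt) * rng.standard_normal()
    dZY = np.sqrt(dt) * rng.standard_normal()
    dX2 = nu * a3 * theta * dt + sX * dZX
    dY2 = a3 * theta * dt + sY * dZY
    # posterior-mean and posterior-variance updates from the Kalman filter
    M += (nu * a3 * g / sX**2) * (dX2 - nu * a3 * M * dt) \
       + (a3 * g / sY**2) * (dY2 - a3 * M * dt)
    g += -(g**2) * a3**2 * Sigma * dt

# closed form: 1/g_t = 1/g0 + a3^2 * Sigma * t
g_exact = 1.0 / (1.0 / g0 + a3**2 * Sigma * T)
assert abs(g - g_exact) < 1e-3
assert abs(M - theta) < 2.5   # the estimate concentrates near theta
```

The variance path is deterministic, so the Euler-discretized $\gamma_1$ can be compared line by line with the closed form; the mean path is noisy and is only checked loosely.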
P1 can affect $M_t$ via her choice of $(a_t)_{t\in[0,T]}$. It is easy to see that, from her perspective,
$$dM_t = \frac{\nu\alpha_{3t}\gamma_{1t}}{\sigma_X^2}\big[(\nu a_t - \nu\{\alpha_{0t}+\alpha_{2t}L_t+\alpha_{3t}M_t\})dt + \sigma_X dZ^X_t\big] + \frac{\alpha_{3t}\gamma_{1t}}{\sigma_Y^2}\big[(a_t - \{\alpha_{0t}+\alpha_{2t}L_t+\alpha_{3t}M_t\})dt + \sigma_Y dZ^Y_t\big]$$

³⁵Since $\alpha_{3t}$ plays a key role in the economic analysis and appears frequently throughout the paper, we sometimes abbreviate it to $\alpha_t$.
under any admissible strategy $(a_t)_{t\in[0,T]}$. Rearranging terms, we can write
$$dM_t = (\mu_{0t} + \mu_{1t}a_t + \mu_{2t}M_t)dt + B^X_t dZ^X_t + B^Y_t dZ^Y_t, \tag{B.1}$$
where
$$\mu_{1t} = \alpha_{3t}\gamma_{1t}\Sigma, \quad \mu_{0t} = -\mu_{1t}[\alpha_{0t}+\alpha_{2t}L_t], \quad \mu_{2t} = -\alpha_{3t}\mu_{1t}, \quad B^X_t = \frac{\nu\alpha_{3t}\gamma_{1t}}{\sigma_X}, \quad B^Y_t = \frac{\alpha_{3t}\gamma_{1t}}{\sigma_Y}. \tag{B.2}$$
This dynamic is linear in $M$. Also, since $L_t$ depends only on the paths of $X$, $\mu_{0t}$ is in p1's information set. The same holds for $(a_t)_{t\in[0,T]}$, which is measurable with respect to $(\theta,X)$.

On the other hand, because p1 always thinks that p2 is on path, the public signal follows
$$dX_t = (\nu a_t + \delta_{0t} + \delta_{1t}M_t + \delta_{2t}L_t)dt + \sigma_X dZ^X_t$$
from her perspective. This dynamic is also affine in the unobserved state $M$, with an intercept that is again measurable with respect to $(\theta,X)$. The hidden state $M$ along with the observation state $(\theta,X)$ is thus conditionally Gaussian. In particular, applying the filtering equations in Theorem 12.7 in Liptser and Shiryaev (1977) yields that $\hat M_t := E_t[M_t]$ and $\gamma_{2t} := E_t[(M_t-\hat M_t)^2]$ satisfy
$$d\hat M_t = \underbrace{(\mu_{0t}+\mu_{1t}a_t+\mu_{2t}\hat M_t)dt}_{=E_t[(\mu_{0t}+\mu_{1t}a_t+\mu_{2t}M_t)dt]} + \frac{\sigma_X B^X_t + \gamma_{2t}\delta_{1t}}{\sigma_X^2}\big[dX_t - (\nu a_t+\delta_{0t}+\delta_{1t}\hat M_t+\delta_{2t}L_t)dt\big] \tag{B.3}$$
$$\dot\gamma_{2t} = 2\mu_{2t}\gamma_{2t} + (B^X_t)^2 + (B^Y_t)^2 - \left(\frac{\sigma_X B^X_t + \gamma_{2t}\delta_{1t}}{\sigma_X}\right)^2, \tag{B.4}$$
where $dZ_t := [dX_t - (\nu a_t+\delta_{0t}+\delta_{1t}\hat M_t+\delta_{2t}L_t)dt]/\sigma_X$ is a Brownian motion from the long-run player's standpoint. (For notational simplicity, we have omitted the dependence of $\hat M$ and $Z$ on the strategy $(a_t)_{t\in[0,T]}$ that is being followed.)
Inserting $a_t = \beta_{0t}+\beta_{1t}\hat M_t+\beta_{2t}L_t+\beta_{3t}\theta$ into the right-hand side of $d\hat M_t$, and solving for $d\hat M_t$ as a function of the increment $dX_t$, we obtain
$$d\hat M_t = [\hat\mu_{0t} + \hat\mu_{1t}\hat M_t + \hat\mu_{2t}L_t + \hat\mu_{3t}\theta]dt + \hat B_t dX_t,$$
where
$$\hat\mu_{0t} = \underbrace{-\alpha_{3t}\gamma_{1t}\alpha_{0t}\Sigma}_{\text{constant in }\mu_{0t}} + \underbrace{\alpha_{3t}\gamma_{1t}\beta_{0t}\Sigma}_{\mu_{1t}\beta_{0t}} + \underbrace{\frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}}_{=[\sigma_XB^X_t+\gamma_{2t}\delta_{1t}]/\sigma_X^2}[-\nu\beta_{0t}-\delta_{0t}]$$
$$\hat\mu_{1t} = \underbrace{\alpha_{3t}\gamma_{1t}\beta_{1t}\Sigma}_{\mu_{1t}\beta_{1t}} + \underbrace{(-\alpha_{3t}^2\gamma_{1t}\Sigma)}_{\mu_{2t}} + \frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}[-\nu\beta_{1t}-\delta_{1t}]$$
$$\hat\mu_{2t} = \underbrace{-\alpha_{3t}\gamma_{1t}\alpha_{2t}\Sigma}_{L_t\text{ term in }\mu_{0t}} + \underbrace{\alpha_{3t}\gamma_{1t}\beta_{2t}\Sigma}_{\mu_{1t}\beta_{2t}} + \frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}[-\nu\beta_{2t}-\delta_{2t}]$$
$$\hat\mu_{3t} = \underbrace{\alpha_{3t}\gamma_{1t}\beta_{3t}\Sigma}_{\mu_{1t}\beta_{3t}} + \frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}[-\nu\beta_{3t}] = \left[\frac{\alpha_{3t}\gamma_{1t}}{\sigma_Y^2} - \frac{\nu\gamma_{2t}\delta_{1t}}{\sigma_X^2}\right]\beta_{3t}$$
$$\hat B_t = \frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}.$$
Let $R(t,s) := \exp\left(\int_s^t \hat\mu_{1u}\,du\right)$, and denote the prior distribution of $\theta$ by $\mathcal N(m_0,\gamma_{10})$; in particular, $\hat M_0 = m_0$. Path-by-path of $X$, therefore,
$$\hat M_t = R(t,0)m_0 + \theta\int_0^t R(t,s)\hat\mu_{3s}\,ds + \int_0^t R(t,s)[\hat\mu_{0s}+\hat\mu_{2s}L_s]ds + \int_0^t R(t,s)\hat B_s\,dX_s,$$
which, after recalling that we started from (11), yields a system of two equations:
$$\chi_t = \int_0^t R(t,s)\hat\mu_{3s}\,ds, \qquad L_t = \frac{R(t,0)m_0 + \int_0^t R(t,s)[\hat\mu_{0s}+\hat\mu_{2s}L_s]ds + \int_0^t R(t,s)\hat B_s\,dX_s}{1-\chi_t}.$$
The validity of the construction boils down to finding a solution to the previously stated equation for $\chi$ that takes values in $[0,1)$. In fact, when this is the case,
$$(1-\chi_t)dL_t - L_t\,d\chi_t = [\hat\mu_{0t} + (\hat\mu_{1t}(1-\chi_t)+\hat\mu_{2t})L_t]dt + \hat B_t dX_t \;\Longrightarrow\; dL_t = \frac{L_t[\hat\mu_{1t}+\hat\mu_{2t}+\hat\mu_{3t}]dt + \hat\mu_{0t}dt + \hat B_t dX_t}{1-\chi_t}.$$
Thus, letting $R_2(t,s) := \exp\left(\int_s^t \frac{\hat\mu_{1u}+\hat\mu_{2u}+\hat\mu_{3u}}{1-\chi_u}du\right)$, we obtain that
$$L_t = R_2(t,0)m_0 + \int_0^t \frac{R_2(t,s)\hat\mu_{0s}}{1-\chi_s}ds + \int_0^t \frac{R_2(t,s)\hat B_s}{1-\chi_s}dX_s, \tag{B.5}$$
i.e., $L$ is a (linear) function of the paths of $X$, as conjectured. Moreover, in the particular case of $\nu = 0$, it is easy to verify that $dL_t = (\ell_{0t}+\ell_{1t}L_t)dt + B_t dX_t$, where
$$\ell_{0t} = -\frac{\gamma_t\chi_t\delta_{0t}\delta_{1t}}{\sigma_X^2(1-\chi_t)} \tag{B.6}$$
$$\ell_{1t} = -\frac{\gamma_t\chi_t\delta_{1t}(\delta_{1t}+\delta_{2t})}{\sigma_X^2(1-\chi_t)} \tag{B.7}$$
$$B_t = \frac{\gamma_t\chi_t\delta_{1t}}{\sigma_X^2(1-\chi_t)}. \tag{B.8}$$
We will ultimately find a solution to the equation for $\chi$ that is of class $C^1$ and that takes values in $[0,1)$. In particular, if $\chi$ is differentiable,
$$\dot\chi_t = \hat\mu_{1t}\chi_t + \hat\mu_{3t} = \alpha_{3t}\gamma_{1t}\Sigma\underbrace{[\chi_t\beta_{1t}-\alpha_{3t}\chi_t]}_{=\alpha_{3t}(1-\chi_t)-\beta_{3t}} + \chi_t\frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}[-\nu\beta_{1t}-\delta_{1t}] + \alpha_{3t}\gamma_{1t}\beta_{3t}\Sigma + \frac{\nu\alpha_{3t}\gamma_{1t}+\gamma_{2t}\delta_{1t}}{\sigma_X^2}[-\nu\beta_{3t}]. \tag{B.9}$$
Using that $\alpha_{3t} = \beta_{3t}+\beta_{1t}\chi_t$, we obtain the following system of ODEs:
$$\dot\gamma_{1t} = -\gamma_{1t}^2(\beta_{3t}+\beta_{1t}\chi_t)^2\Sigma$$
$$\dot\gamma_{2t} = -2\gamma_{2t}\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2\Sigma + \gamma_{1t}^2(\beta_{3t}+\beta_{1t}\chi_t)^2\Sigma - \left(\frac{\nu\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)+\gamma_{2t}\delta_{1t}}{\sigma_X}\right)^2$$
$$\dot\chi_t = \gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2\Sigma(1-\chi_t) - (\nu[\beta_{3t}+\beta_{1t}\chi_t]+\delta_{1t}\chi_t)\left(\frac{\nu\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)+\gamma_{2t}\delta_{1t}}{\sigma_X^2}\right).$$
In the proof of the next lemma we establish that $\chi = \gamma_2/\gamma_1$. After replacing $\nu = 0$ and $\gamma_2 = \chi\gamma$ in the third ODE, and writing $\gamma$ for $\gamma_1$, the first and third equations of the previous system correspond to (15)–(16), as desired. The representation $L_t = E[\theta|\mathcal F^X_t]$ is proved in Lemma B.1 at the end of this subsection. $\blacksquare$
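As a numerical sanity check on the relation $\chi = \gamma_2/\gamma_1$, one can Euler-integrate the $\nu = 0$ system directly. This is a hedged sketch: the constant coefficients $\beta_1$, $\beta_3$, $\delta_1$ and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hedged check of the claim chi = gamma2/gamma1 for the nu = 0 system,
# with constant illustrative coefficients beta1, beta3, delta1.
b1, b3, d1 = 0.3, 0.5, 0.7
sX, sY, g_o = 1.0, 0.8, 1.0

n, T = 200000, 1.0
dt = T / n
g1, g2, chi = g_o, 0.0, 0.0
for _ in range(n):
    a3 = b3 + b1 * chi
    dg1 = -(g1**2) * a3**2 / sY**2
    dg2 = -2 * g2 * g1 * a3**2 / sY**2 + (g1 * a3)**2 / sY**2 - (g2 * d1 / sX)**2
    dchi = g1 * a3**2 * (1 - chi) / sY**2 - g1 * (d1 * chi)**2 / sX**2
    g1, g2, chi = g1 + dg1 * dt, g2 + dg2 * dt, chi + dchi * dt

assert 0 < g2 < g1 <= g_o     # ordering established in Lemma 3
assert 0 < chi < 1
assert abs(chi - g2 / g1) < 1e-3
```

The ratio $\gamma_2/\gamma_1$ and $\chi$ satisfy the same ODE with the same initial condition, so the two discretized paths agree up to discretization error.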
Proof of Lemma 3. Consider the system $(\gamma_1,\gamma_2,\chi)$ from the proof of the previous lemma when $\nu = 0$ (in particular, $\Sigma$ becomes $1/\sigma_Y^2$). Also, let $\delta_{1t} := u_\theta + u_a\alpha_{3t}$.³⁶ The local existence of a solution follows from continuity of the associated operator. Suppose that the maximal interval of existence is $[0,\bar T)$, with $\bar T \le T$.

Since the system is locally Lipschitz continuous in $(\gamma_1,\gamma_2,\chi)$, uniformly in $t \in [0,T]$ for given continuous coefficients, its solution is unique over the same interval (Picard–Lindelöf). In particular, observe that $(\gamma_{1t},\gamma_{2t},\chi_t) = (\gamma^o,0,0)$ solves the system as long as $\beta_3 \equiv 0$. Without loss of generality, then, assume $\beta_{30} \neq 0$.

³⁶All the results in this proof extend to a generic continuous function $\delta_1$ over $[0,T]$ in which the explicit dependence on $\vec\beta$ and $\chi$ is not recognized, which happens when the myopic player becomes forward looking.
Observe that $\gamma_1$ is (weakly) decreasing over $[0,\bar T)$, so $\gamma_{1t} \le \gamma^o$. Suppose there is a time at which $\gamma_1$ is strictly negative, and let $s$ be the first time $\gamma_1$ crosses zero, so that $\gamma_{1s} = 0$ and, for $t > s$ close to $s$,
$$0 > \gamma_{1t} = \int_s^t \dot\gamma_{1u}\,du = -\int_s^t \gamma_{1u}^2[\beta_{3u}+\beta_{1u}\chi_u]^2\Sigma\,du;$$
but the zero function also solves the $\gamma_1$-ODE from time $s$ onward, so uniqueness (Picard–Lindelöf) forces $\gamma_{1t} = 0$ for $t \ge s$, a contradiction. Thus, $\gamma_{1t} \in [0,\gamma^o]$ for all $t \in [0,\bar T)$. Moreover, if $\gamma_{1t} > 0$, straightforward integration shows that
$$\gamma_{1t} = \frac{\gamma^o}{1 + \gamma^o\int_0^t[\beta_{3s}+\beta_{1s}\chi_s]^2\Sigma\,ds}.$$
Since $\vec\beta$ is continuous over $[0,T]$, if $\gamma_1$ ever vanishes in $[0,\bar T)$ we must have that $\chi$ diverges at such a point; by definition of $\bar T$, however, that point must be $\bar T$. Thus, $\gamma_{1t} > 0$ in $[0,\bar T)$ (regardless of whether $\chi$ diverges at $\bar T$ or not).
We now show that $0 < \gamma_{2t} < \gamma_{1t}$ for $t > 0$. In fact, since $\gamma_{20} = 0$, $\gamma_{10} > 0$ and $\beta_{30} > 0$, we have $\gamma_{2\epsilon} > 0$ for $\epsilon$ small. Consider now $[\epsilon,\bar t]$ with $\bar t \in (\epsilon,\bar T)$. Then,
$$f_{\gamma_2}(t,x) := -\frac{2x\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2}{\sigma_Y^2} + \frac{\gamma_{1t}^2(\beta_{3t}+\beta_{1t}\chi_t)^2}{\sigma_Y^2} - \left(\frac{x\delta_{1t}}{\sigma_X}\right)^2$$
is locally Lipschitz continuous with respect to $x$, uniformly in $t \in [\epsilon,\bar t]$. Since $0 - f_{\gamma_2}(t,0) \le 0 = \dot\gamma_{2t} - f_{\gamma_2}(t,\gamma_{2t})$ and $0 < \gamma_{2\epsilon}$, we obtain that $\gamma_{2t} > 0$ for all $t \in [\epsilon,\bar t]$ by means of standard comparison theorems (e.g., Theorem 1.3 in Teschl, 2012), and hence over $(0,\bar T)$ as well.
Now, let $z_t := \gamma_{2t}-\gamma_{1t}$, $t < \bar T$. Using the ODEs for $\gamma_1$ and $\gamma_2$, we deduce that
$$\dot z_t < -\frac{2\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2 z_t}{\sigma_Y^2}, \qquad z_0 = \gamma_{20}-\gamma_{10} = -\gamma^o < 0.$$
It is then easy to conclude (Gronwall's inequality) that
$$z_t < z_0\exp\left(-\int_0^t \frac{2\gamma_{1s}(\beta_{3s}+\beta_{1s}\chi_s)^2}{\sigma_Y^2}\,ds\right) < 0, \qquad t < \bar T,$$
as $\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2$ is continuous over $[0,\bar t]$, $\bar t < \bar T$. Thus, $\gamma_{2t} < \gamma_{1t}$ for all $t \in [0,\bar T)$.
With this in hand, $\gamma_{2t}/\gamma_{1t} \in (0,1)$ for all $t \in (0,\bar T)$, and $\gamma_{20}/\gamma_{10} = 0$. Moreover, it is easy to verify that this ratio solves the $\chi$-ODE. By uniqueness, $\chi = \gamma_2/\gamma_1$. Replacing $\gamma_2 = \chi\gamma_1$ and $\nu = 0$ in the $\chi$-ODE above yields (16), i.e.,
$$\dot\chi_t = \frac{\gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2(1-\chi_t)}{\sigma_Y^2} - \frac{\gamma_{1t}(\delta_{1t}\chi_t)^2}{\sigma_X^2}, \qquad t \in [0,\bar T).$$
By the previous analysis, $(\gamma_1,\gamma_2,\chi)$ is bounded over $[0,\bar T)$. If $\bar T < T$, the solution could be extended strictly beyond $\bar T$ thanks to the continuity of the associated operator (Peano's theorem), contradicting the definition of $\bar T$. Thus, the only option is $\bar T = T$, in which case the system admits a continuous extension to $T$.³⁷ By continuity, such an extension is unique, and the desired properties ($\chi = \gamma_2/\gamma_1$ as stated in Lemma 2; $\chi$ solves (16) and $\chi \in (0,1)$; and $\gamma \in (0,\gamma^o]$) hold up to $T$ by the exact same arguments previously applied over compact subsets of $[0,\bar T)$, now applied over $[0,T]$.³⁸ $\blacksquare$
Proof of Lemma 4. The long-run player's problem is to choose an admissible $a := (a_t)_{t\in[0,T]}$ that maximizes
$$U(a) := E_0\left[\int_0^T e^{-rt}U(a_t,\delta_{0t}+\delta_{1t}M_t+\delta_{2t}L_t,\theta)\,dt\right],$$
where $(M_t)_{t\ge0}$ is given by (B.1) and $(L_t)_{t\ge0}$ by (B.5). Using that the flow is quadratic, we obtain that
$$U(a) = E_0\left[\int_0^T e^{-rt}U(a_t,\delta_{0t}+\delta_{1t}\hat M^a_t+\delta_{2t}L_t,\theta)\,dt\right] + \frac{U_{aa}}{2}E_0\left[\int_0^T e^{-rt}E_t[(M^a_t-\hat M^a_t)^2]\,dt\right]$$
with $\hat M^a_t := E_t[M^a_t]$, where we have made explicit the dependence of both processes on the strategy followed. By the proof of Lemma 2, $(\hat M^a_t)_{t\in[0,T]}$ evolves as in (B.3), i.e.,
$$d\hat M^a_t = (\mu_{0t}+\mu_{1t}a_t+\mu_{2t}\hat M^a_t)dt + \frac{\sigma_XB^X_t+\gamma_{2t}\delta_{1t}}{\sigma_X}\,dZ^a_t,$$
where $dZ^a_t := [dX_t-(\nu a_t+\delta_{0t}+\delta_{1t}\hat M^a_t+\delta_{2t}L_t)dt]/\sigma_X$ is a Brownian motion from the long-run player's standpoint, $(\mu_0,\mu_1,\mu_2,B^X_t)$ are given by (B.2), and $\gamma_{2t}$ evolves as in (B.4). Moreover, from the same filtering equations (B.3)–(B.4) we know that $E_t[(M^a_t-\hat M^a_t)^2]$ is independent of the strategy followed, and that it coincides with $\gamma_{2t}$, $t\in[0,T]$. Thus, the long-run player's problem reduces to
$$\max_{(a_t)_{t\ge0}\text{ admissible}}\; E_0\left[\int_0^T e^{-rt}U(a_t,\delta_{0t}+\delta_{1t}\hat M^a_t+\delta_{2t}L_t,\theta)\,dt\right],$$
where $(\hat M^a_t)_{t\in[0,T]}$ is as above, and $(L_t)_{t\ge0}$ is linear in the paths of $X$ according to (B.5). In differential form, the latter process can be written as
$$dL_t = \frac{1}{1-\chi_t}\left\{L_t[\hat\mu_{1t}+\hat\mu_{2t}+\hat\mu_{3t}] + \hat\mu_{0t} + \hat B_t[\nu a_t+\delta_{0t}+\delta_{1t}\hat M^a_t+\delta_{2t}L_t]\right\}dt + \frac{\sigma_X\hat B_t}{1-\chi_t}\,dZ^a_t,$$
where we used that $dX_t = (\nu a_t+\delta_{0t}+\delta_{1t}\hat M^a_t+\delta_{2t}L_t)dt+\sigma_XdZ^a_t$ from the long-run player's standpoint. (Refer to the proof of Lemma 2 for the expressions for $(\hat\mu_{0t},\hat\mu_{1t},\hat\mu_{2t},\hat\mu_{3t},\hat B_t)$.)

³⁷For a generic system $\dot z_t = f(t,z_t)$: if $z$ is bounded over $[0,\bar T)$ and $f$ is continuous, there exists $K$ such that $|z_t-z_s| \le K|t-s|$; but this implies that $(z_s)_{s\nearrow\bar T}$ is Cauchy, and hence the limit exists.
³⁸An alternative way of seeing that $\chi < 1$ is that $\dot\chi_t \le \gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2(1-\chi_t)/\sigma_Y^2$, and so $\chi_t \le 1-\gamma_t/\gamma^o$ by standard comparison theorems, as the latter function satisfies $\dot z_t = \gamma_{1t}(\beta_{3t}+\beta_{1t}\chi_t)^2(1-z_t)/\sigma_Y^2$, $z_0 = 0$.
So far, we have fixed an admissible strategy $(a_t)_{t\in[0,T]}$ (in the sense of Section 3) for the long-run player, and then obtained processes $\hat M^a$ and $Z^a$ that potentially depend on that choice. The above problem thus differs from traditional control problems with perfectly observed states in that the Brownian motion is, in principle, affected by the choice of strategy. With linear dynamics, however, the separation principle (e.g., Liptser and Shiryaev, 1977, Chapter 16) applies. In fact, the solution to the long-run player's problem can be found by first fixing a Brownian motion, say, $Z_t := Z^0_t$ (i.e., $Z^a_t$ when $a \equiv 0$), and then solving the optimization problem that replaces $Z^a$ by $Z$ in the laws of motion of $\hat M^a$ and $L$. The method works to the extent that $Z^a \equiv Z$ for all $(a_t)_{t\ge0}$: it is easy to conclude from (B.1) and (B.3) that the process $M^a_t - \hat M^a_t$ is independent of the strategy followed, and hence so is $Z^a_t$, given that
$$\sigma_X\,dZ^a_t = dX_t - (\nu a_t+\delta_{0t}+\delta_{1t}\hat M^a_t+\delta_{2t}L_t)dt = \delta_{1t}(M^a_t-\hat M^a_t)dt + \sigma_X\,dZ^X_t$$
under the true data-generating process, thanks to the linearity of the dynamics. In this procedure, therefore, one filters as a first step, and then optimizes afterwards using the posterior mean as a controlled state.³⁹
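The cancellation behind this argument can be illustrated in a stripped-down scalar example. This is a hedged sketch: the dynamics, the fixed gain, and all parameters below are generic illustrative choices, not the paper's model. With linear dynamics and a deterministic gain, the estimation error — and hence the innovation process — is the same under any control path.

```python
import numpy as np

# Hedged illustration of the separation principle in a generic scalar
# linear-Gaussian problem: the estimation error M - Mhat does not depend
# on the control path a.  All parameters are illustrative.
mu1, mu2, B, d1, sX = 1.0, -0.5, 0.6, 0.9, 1.0
n, dt = 5000, 1.0 / 5000
rng = np.random.default_rng(1)
dZ = np.sqrt(dt) * rng.standard_normal(n)   # state noise
dW = np.sqrt(dt) * rng.standard_normal(n)   # observation noise

def error_path(a):
    """Estimation error M - Mhat under control path a, fixed deterministic gain."""
    K = 0.4                                  # any deterministic gain works
    M = Mhat = 0.0
    err = np.empty(n)
    for i in range(n):
        dX = d1 * M * dt + sX * dW[i]        # observation increment
        M += (mu1 * a[i] + mu2 * M) * dt + B * dZ[i]
        Mhat += (mu1 * a[i] + mu2 * Mhat) * dt + K * (dX - d1 * Mhat * dt)
        err[i] = M - Mhat
    return err

e1 = error_path(np.zeros(n))                     # a ≡ 0
e2 = error_path(np.sin(np.linspace(0, 10, n)))   # some other control
assert np.max(np.abs(e1 - e2)) < 1e-9
```

The control enters $M$ and $\hat M$ with identical coefficients, so it cancels exactly in the error recursion — which is precisely why one may fix $Z$ before optimizing.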
Returning to the $\nu = 0$ case, we can then insert $Z_t$ in the dynamic of $\hat M^a_t$. Omitting the dependence of the resulting process on $a$ (as any control problem does), it is easy to see that
$$d\hat M_t = \frac{\gamma_t\alpha_{3t}}{\sigma_Y^2}\big(a_t - [\alpha_{0t}+\alpha_{2t}L_t+\alpha_{3t}\hat M_t]\big)dt + \frac{\chi_t\gamma_t\delta_{1t}}{\sigma_X}\,dZ_t.$$
As for the expression for $L$ (display (19)), this one follows from (17) using that $dX_t = (\delta_{0t}+\delta_{2t}L_t+\delta_{1t}\hat M_t)dt + \sigma_X\,dZ_t$ from the long-run player's perspective. In fact, it is easy to see from (B.6)–(B.8) that
$$\ell_{0t} + B_t\delta_{0t} + (\ell_{1t}+B_t\delta_{2t})L_t + B_t\delta_{1t}\hat M_t = \frac{\gamma_t\chi_t\delta_{1t}^2}{\sigma_X^2(1-\chi_t)}(\hat M_t - L_t).$$
This concludes the proof. $\blacksquare$

³⁹Relative to Chapter 16 in Liptser and Shiryaev (1977), our problem is more general in that it allows for a linear component in the flow, and the public signal can be controlled (when $\nu \neq 0$). The first generalization is clearly innocuous. As for the second, the key behind the separation principle is that the innovations $dX_t - E_t[dX_t]$ are independent of the strategy followed, which also happens when $\nu \neq 0$. Given any admissible strategy $(a_t)_{t\ge0}$, therefore, the fact that the filtrations of $Z$, $Z^a$ and $X^a$ satisfy $\mathcal F^Z_t = \mathcal F^{Z^a}_t \subseteq \mathcal F^{X^a}_t$, $t \ge 0$, means the optimal control found by using $Z$ is weakly better than any such $(a_t)_{t\ge0}$. See p. 183 in Section 16.1.4 of Liptser and Shiryaev (1977) for more details in the context of a quadratic regulator problem.
Lemma B.1. The process $L$ is the belief about $\theta$ held by an outsider who observes only $X$. Moreover,
$$\begin{pmatrix}\theta\\ M_{1t}\end{pmatrix}\bigg|\,\mathcal F^X_t \sim \mathcal N(M^{out}_t,\gamma^{out}_t), \quad\text{where}\quad M^{out}_t = \begin{pmatrix}L_t\\ L_t\end{pmatrix} \quad\text{and}\quad \gamma^{out}_t = \begin{pmatrix}\frac{\gamma_{1t}}{1-\chi_t} & \frac{\gamma_{1t}\chi_t}{1-\chi_t}\\ \frac{\gamma_{1t}\chi_t}{1-\chi_t} & \frac{\gamma_{1t}\chi_t}{1-\chi_t}\end{pmatrix}.$$
Proof. The outsider jointly filters the state $v_t = (\theta, M_{1t})'$. For the evolution of the state and the signal, we adopt notation from Liptser and Shiryaev (1977) (Section 12.3). From the outsider's perspective, both players (and in particular player 2) are on the equilibrium path, and thus the outsider believes that $v_t$ evolves as
$$dv_t = a_1(t,X^{out})v_t\,dt + b_1(t,X)\,dW_1(t) + b_2(t,X)\,dW_2(t),$$
where
$$a_1(t,X^{out}) := \begin{pmatrix}0&0\\ z_t&-z_t\end{pmatrix} \text{ for } z_t := \frac{(\nu\alpha_{3t})^2\gamma_{1t}}{\sigma_X^2} + \frac{\alpha_{3t}^2\gamma_{1t}}{\sigma_Y^2}, \quad b_1(t,X) := \begin{pmatrix}0&0\\ 0&\frac{\alpha_{3t}\gamma_{1t}}{\sigma_Y}\end{pmatrix}, \quad b_2(t,X) := \begin{pmatrix}0\\ \frac{\nu\alpha_{3t}\gamma_{1t}}{\sigma_X}\end{pmatrix},$$
$$W_1(t) := \begin{pmatrix}W_{11}(t)\\ Z^Y_t\end{pmatrix} \quad\text{and}\quad W_2(t) := Z^X_t,$$
where $W_{11}(t)$ is a standard Brownian motion and $W_{11}(t)$, $Z^Y_t$ and $Z^X_t$ are mutually independent. The signal is
$$dX^{out}_t := dX_t - [\delta_{0t}+\delta_{2t}L_t+\nu(\alpha_{0t}+\alpha_{2t}L_t)]dt = A_1(t,X)v_t\,dt + B_1(t,X)\,dW_1(t) + B_2(t,X)\,dW_2(t),$$
where $A_1(t,X) := \begin{pmatrix}\nu\alpha_{3t} & \delta_{1t}\end{pmatrix}$, $B_1(t,X) := \begin{pmatrix}0&0\end{pmatrix}$ and $B_2(t,X) = \sigma_X$.
Hence, denoting $M^{out}_t = \begin{pmatrix}M^{out}_{t,1}\\ M^{out}_{t,2}\end{pmatrix}$ and $\gamma^{out}_t = \begin{pmatrix}\gamma^{out}_{t,11}&\gamma^{out}_{t,12}\\ \gamma^{out}_{t,21}&\gamma^{out}_{t,22}\end{pmatrix}$, and imposing $\gamma^{out}_{t,21} = \gamma^{out}_{t,12}$, we have from the standard filtering equations of Liptser and Shiryaev (1977) (Theorem 12.7) that $\begin{pmatrix}\theta\\ M_{1t}\end{pmatrix}\big|\mathcal F^X_t \sim \mathcal N(M^{out}_t,\gamma^{out}_t)$, where $M^{out}$ and $\gamma^{out}$ are the unique solutions to
$$dM^{out}_t = a_1(t,X)M^{out}_t\,dt + \frac{1}{\sigma_X^2}\left[\begin{pmatrix}0\\ \nu\alpha_{3t}\gamma_{1t}\end{pmatrix} + \gamma^{out}_t\begin{pmatrix}\nu\alpha_{3t}\\ \delta_{1t}\end{pmatrix}\right]\times\cdots \tag{B.10}$$
$$\cdots\times\left\{dX^{out}_t - (\nu\alpha_{3t}M^{out}_{t,1} + \delta_{1t}M^{out}_{t,2})dt\right\} \tag{B.11}$$
$$\dot\gamma^{out}_t = a_1(t,X)\gamma^{out}_t + \gamma^{out}_t a_1^* + b_1b_1^* + b_2b_2^* - \frac{1}{\sigma_X^2}\left[\begin{pmatrix}0\\ \nu\alpha_{3t}\gamma_{1t}\end{pmatrix} + \gamma^{out}_t\begin{pmatrix}\nu\alpha_{3t}\\ \delta_{1t}\end{pmatrix}\right]\left[\begin{pmatrix}0\\ \nu\alpha_{3t}\gamma_{1t}\end{pmatrix} + \gamma^{out}_t\begin{pmatrix}\nu\alpha_{3t}\\ \delta_{1t}\end{pmatrix}\right]^* \tag{B.12}$$
with initial conditions $M^{out}_0 = \begin{pmatrix}m_0\\ m_0\end{pmatrix}$ and $\gamma^{out}_0 = \begin{pmatrix}\gamma^o&0\\ 0&0\end{pmatrix}$.
Recall that
$$\dot\gamma_{1t} = -\frac{\alpha_{3t}^2}{\sigma_Y^2}\gamma_{1t}^2, \qquad \dot\chi_t = \frac{\gamma_{1t}\alpha_{3t}^2(1-\chi_t)}{\sigma_Y^2} - \frac{\gamma_{1t}\delta_{1t}^2\chi_t^2}{\sigma_X^2},$$
with initial conditions $\gamma_{10} = \gamma^o$ and $\chi_0 = 0$. It is then straightforward to verify that
$$\gamma^{out}_t = \frac{\gamma_{1t}}{1-\chi_t}\begin{pmatrix}1&\chi_t\\ \chi_t&\chi_t\end{pmatrix}$$
satisfies the differential equation for $\gamma^{out}$ above, along with the given initial condition. Moreover, this matrix is positive semidefinite, as its leading principal minors are positive multiples of $1$ and $\chi_t-\chi_t^2 > 0$.
Next, substitute the solution $\gamma^{out}_t$ into (B.11) and subtract the equation for the second component from the first to obtain the following ODE for $\Delta M := M^{out}_{t,1} - M^{out}_{t,2}$:
$$d\Delta M_t = -\Sigma\,\alpha_{3t}^2\gamma_{1t}\,\Delta M_t\,dt,$$
with initial condition $\Delta M_0 = 0$. Now if $\Delta M_t > 0$, then $d\Delta M_t < 0$, giving us a contradiction; likewise for the case $\Delta M_t < 0$. It follows that $\Delta M_t = 0$, and thus $M^{out}_{t,1} = M^{out}_{t,2}$, for all $t \ge 0$. Substituting this back into (B.11), we have
$$dM^{out}_{t,1} = \frac{\gamma_{1t}(\nu\alpha_{3t}+\delta_{1t}\chi_t)}{\sigma_X^2(1-\chi_t)}\big(dX^{out}_t - (\nu\alpha_{3t}+\delta_{1t})M^{out}_{t,1}dt\big) = \frac{\gamma_{1t}(\nu\alpha_{3t}+\delta_{1t}\chi_t)}{\sigma_X^2(1-\chi_t)}\big[dX_t - (\nu\alpha_{0t}+\delta_{0t}+M^{out}_{t,1}(\nu\alpha_{3t}+\delta_{1t})+L_t(\nu\alpha_{2t}+\delta_{2t}))dt\big] \tag{B.13}$$
On the other hand, we have
$$dL_t = \frac{L_t[\hat\mu_{1t}+\hat\mu_{2t}+\hat\mu_{3t}]dt + \hat\mu_{0t}dt + \hat B_t\,dX_t}{1-\chi_t} = \frac{\gamma_{1t}(\nu\alpha_{3t}+\delta_{1t}\chi_t)}{\sigma_X^2(1-\chi_t)}\big[dX_t - (\delta_{0t}+\nu\alpha_{0t}+L_t[\nu(\alpha_{2t}+\alpha_{3t})+\delta_{1t}+\delta_{2t}])dt\big] \tag{B.14}$$
Defining $\tilde L_t := M^{out}_{t,1} - L_t$ and subtracting (B.14) from (B.13), we obtain
$$d\tilde L_t = -\frac{\gamma_{1t}(\nu\alpha_{3t}+\delta_{1t}\chi_t)}{\sigma_X^2(1-\chi_t)}(\nu\alpha_{3t}+\delta_{1t})\tilde L_t\,dt,$$
with initial condition $\tilde L_0 = m_0 - m_0 = 0$. We conclude that $\tilde L_t = 0$, and thus $L_t = M^{out}_{t,1} = M^{out}_{t,2}$, for all $t \ge 0$. $\blacksquare$
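A quick numerical check of the covariance matrix claimed in Lemma B.1 can be done over a grid of illustrative values of $\gamma_1$ and $\chi$ (a hedged sketch; the grid values are arbitrary):

```python
import numpy as np

# Hedged sanity check of gamma_out = gamma1/(1-chi) * [[1, chi], [chi, chi]]:
# it is positive semidefinite, and the outsider's variance of theta,
# gamma1/(1-chi), exceeds p2's variance gamma1 (the outsider observes less).
for g1 in (0.2, 0.7, 1.5):
    for chi in (0.1, 0.5, 0.9):
        G = (g1 / (1 - chi)) * np.array([[1.0, chi], [chi, chi]])
        eig = np.linalg.eigvalsh(G)
        assert eig.min() >= -1e-9            # positive semidefinite
        assert G[0, 0] > g1                  # outsider knows less about theta
        # det(G) is a positive multiple of chi - chi^2, as in the proof
        assert abs(np.linalg.det(G) - (g1 / (1 - chi))**2 * (chi - chi**2)) < 1e-9
```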
Proof of Lemma 5. Suppose $\delta_1 = u_a\alpha_3$, $u_a > 0$. The $\chi$-ODE for $\nu \in [0,1]$ boils down to
$$\dot\chi_t = \gamma_{1t}\alpha_{3t}^2\left(\left[\frac{\nu^2}{\sigma_X^2}+\frac{1}{\sigma_Y^2}\right](1-\chi_t) - \frac{(\nu+u_a\chi_t)^2}{\sigma_X^2}\right) =: -\gamma_{1t}\alpha_{3t}^2\,Q(\chi_t).$$
The goal is to find a function $f : [0,\bar\chi)\to[0,\gamma^o]$, for some $\bar\chi\in(0,1)$, such that $f(\chi_t) = \gamma_t$ for all $t \ge 0$. When this is the case, and such an $f$ is differentiable, $f'(\chi_t)\dot\chi_t = \dot\gamma_t$. Thus, if $\alpha_{3t} > 0$,
$$\frac{f'(\chi_t)}{f(\chi_t)} = \frac{1}{\sigma_Y^2\,Q(\chi_t)}.$$
Thus, we aim to solve the ODE
$$\frac{f'(\chi)}{f(\chi)} = \frac{1}{\sigma_Y^2\,Q(\chi)}, \qquad \chi\in(0,\bar\chi), \quad\text{and}\quad f(0) = \gamma^o,$$
over some domain $[0,\bar\chi)$, with the property that $f(\chi) > 0$ for all $\chi\in[0,\bar\chi)$.
To this end, let
$$c_2 = \frac{\sqrt{b^2+4u_a^2/[\sigma_X\sigma_Y]^2}-b}{2(u_a/\sigma_X)^2} \qquad\text{and}\qquad -c_1 = \frac{-\sqrt{b^2+4u_a^2/[\sigma_X\sigma_Y]^2}-b}{2(u_a/\sigma_X)^2},$$
where $b := [\nu^2/\sigma_X^2+1/\sigma_Y^2]+2\nu u_a/\sigma_X^2$, be the roots of the quadratic
$$Q(\chi) = \left(\frac{u_a}{\sigma_X}\right)^2\chi^2 + \chi\left(\left[\frac{\nu^2}{\sigma_X^2}+\frac{1}{\sigma_Y^2}\right]+\frac{2\nu u_a}{\sigma_X^2}\right) - \frac{1}{\sigma_Y^2}.$$
Clearly, $-c_1 < 0 < c_2$. Also, it is easy to verify that $c_2 < 1$.⁴⁰ Thus,
$$\frac{1}{\sigma_Y^2\,Q(\chi)} = -\frac{\sigma_X^2}{(\sigma_Yu_a)^2(c_1+c_2)}\left[\frac{1}{\chi+c_1}-\frac{1}{\chi-c_2}\right]$$
is well defined (and negative) over $[0,c_2)$, with $1/(\chi+c_1) > 0$ and $-1/(\chi-c_2) > 0$ over the same domain. We can then set $\bar\chi = c_2$ and solve
$$\int_0^\chi\frac{f'(s)}{f(s)}\,ds = -\frac{\sigma_X^2}{(\sigma_Yu_a)^2(c_1+c_2)}\log\left(\frac{\chi+c_1}{c_2-\chi}\cdot\frac{c_2}{c_1}\right) \;\Rightarrow\; f(\chi) = f(0)\left(\frac{c_1}{c_2}\right)^{1/d}\left(\frac{c_2-\chi}{\chi+c_1}\right)^{1/d},$$
where $1/d = \sigma_X^2/[(\sigma_Yu_a)^2(c_1+c_2)] > 0$. We then impose $f(0) = \gamma^o$, thus obtaining a strictly positive and decreasing function with the initial condition we look for. Moreover, letting $\gamma = f(\chi)$, its inverse is decreasing and given by
$$\chi(\gamma) = f^{-1}(\gamma) = c_1c_2\,\frac{1-(\gamma/\gamma^o)^d}{c_1+c_2(\gamma/\gamma^o)^d}.$$
When $\gamma = \gamma^o$, we have that $\chi = 0$, whereas when $\gamma = 0$, it follows that $\chi = c_2$, as desired.
We verify that this candidate satisfies the $\chi$-ODE; we do this for the $\nu = 0$ case only. To this end, it is easy to verify that
$$\frac{d\chi(\gamma_t)}{dt} = \frac{\alpha_{3t}^2\gamma_t}{\sigma_Y^2[c_1+c_2(\gamma_t/\gamma^o)^d]^2}\,c_1c_2d\,[c_1+c_2]\left(\frac{\gamma_t}{\gamma^o}\right)^d.$$
By construction, moreover,
$$c_1c_2 = c_1-c_2 = \frac{\sigma_X^2}{\sigma_Y^2u_a^2},$$
which follows from equating the first- and zeroth-order coefficients in $Q(\chi) = u_a^2\chi^2/\sigma_X^2 + \chi/\sigma_Y^2 - 1/\sigma_Y^2 = u_a^2(\chi-c_2)(\chi+c_1)/\sigma_X^2$. Thus, $dc_1c_2 = c_1+c_2$.
On the other hand,
$$\frac{[u_a\chi(\gamma)]^2}{\sigma_X^2} = \frac{u_a^2}{\sigma_X^2}\left[c_1c_2\,\frac{1-(\gamma/\gamma^o)^d}{c_1+c_2(\gamma/\gamma^o)^d}\right]^2 = \frac{c_1^2(1-c_2)}{\sigma_Y^2}\left[\frac{1-(\gamma/\gamma^o)^d}{c_1+c_2(\gamma/\gamma^o)^d}\right]^2,$$

⁴⁰This follows from squaring both sides of $\sqrt{b^2+4u_a^2/[\sigma_X\sigma_Y]^2} < b+2u_a^2/\sigma_X^2$, using that $b+2u_a^2/\sigma_X^2 > 0$ and $b = [\nu^2/\sigma_X^2+1/\sigma_Y^2]+2\nu u_a/\sigma_X^2$.
where we used that $u_a^2c_1^2c_2^2/\sigma_X^2 = c_1^2(1-c_2)/\sigma_Y^2$ follows from $u_a^2c_2^2/\sigma_X^2 = (1-c_2)/\sigma_Y^2$, by definition of $c_2$. Thus, the right-hand side of the $\chi$-ODE evaluated at our candidate $\chi(\gamma)$ satisfies
$$\gamma_1\alpha_3^2\left(\frac{1-\chi}{\sigma_Y^2}-\frac{(u_a\chi)^2}{\sigma_X^2}\right)\Bigg|_{\chi=\chi(\gamma)} = \frac{\alpha_3^2\gamma_1}{\sigma_Y^2}\left(1-\chi-c_1^2(1-c_2)\left[\frac{1-(\gamma/\gamma^o)^d}{c_1+c_2(\gamma/\gamma^o)^d}\right]^2\right).$$
Thus, using that $c_1c_2d = c_1+c_2$ in our expression for $d\chi(\gamma_t)/dt$, it suffices to show that
$$[c_1+c_2]^2\left(\frac{\gamma_t}{\gamma^o}\right)^d = (1-\chi)[c_1+c_2(\gamma/\gamma^o)^d]^2 - c_1^2(1-c_2)[1-(\gamma/\gamma^o)^d]^2.$$
Using that $\chi[c_1+c_2(\gamma/\gamma^o)^d] = c_1c_2[1-(\gamma/\gamma^o)^d]$, it is easy to conclude that this equality reduces to three equations:
$$0 = c_1^2 - c_1^2c_2 - c_1^2 + c_1^2c_2$$
$$(c_1+c_2)^2 = 2c_1c_2 - c_1c_2(c_2-c_1) + 2c_1^2(1-c_2)$$
$$0 = c_2^2 + c_1c_2^2 - c_1^2(1-c_2),$$
capturing the conditions on the constant, $(\gamma/\gamma^o)^d$ and $(\gamma/\gamma^o)^{2d}$ terms, respectively. The first condition is trivially satisfied. As for the third, by definition of $c_1$ and $c_2$ we have that $c_2^2/(1-c_2) = \sigma_X^2/(u_a\sigma_Y)^2 = c_1^2/(1+c_1)$. Thus, $c_1^2(1-c_2) = c_2^2(1+c_1)$, and the result follows. For the second, use that $c_1c_2(c_2-c_1) = -(c_1-c_2)^2$ and that $c_1^2(1-c_2) = c_2^2(1+c_1)$ to conclude
$$2c_1c_2 - c_1c_2(c_2-c_1) + 2c_1^2(1-c_2) = c_1^2+c_2^2+2c_2^2(1+c_1) = c_1^2+c_2^2+2c_2^2+2c_2\underbrace{c_1c_2}_{=c_1-c_2} = (c_1+c_2)^2.$$
Thus, $\chi(\gamma)$ as postulated satisfies the $\chi$-ODE. We then conclude by uniqueness of any such solution.
Finally, when $u_a = 0$, we have that $\delta_1 \equiv 0$, and the $\chi$-ODE reduces to $\dot\chi_t = \alpha_{3t}^2\gamma_t(1-\chi_t)/\sigma_Y^2$, $\chi_0 = 0$. It is then easy to verify that $\chi(\gamma) = 1-\gamma/\gamma^o$ satisfies this ODE, and hence we conclude using the same uniqueness argument. $\blacksquare$
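The closed form for $\chi(\gamma)$ and the identities used in its verification can be checked numerically. This is a hedged sketch for the $\nu = 0$ case with illustrative parameter values; the finite-difference test below checks that $\chi'(\gamma)\dot\gamma$ matches the $\chi$-ODE at a sample point.

```python
import numpy as np

# Hedged numerical check of the closed form chi(gamma) from Lemma 5
# (nu = 0, delta1 = ua*alpha3; all parameter values illustrative).
sX, sY, ua, g_o = 1.0, 0.8, 1.2, 1.0

# roots of Q(chi) = (ua/sX)^2 chi^2 + chi/sY^2 - 1/sY^2 (nu = 0 case)
b = 1.0 / sY**2
disc = np.sqrt(b**2 + 4 * ua**2 / (sX * sY)**2)
c2 = (disc - b) / (2 * (ua / sX)**2)
c1 = (disc + b) / (2 * (ua / sX)**2)
d = (sY * ua)**2 * (c1 + c2) / sX**2

# identities used in the proof
assert 0 < c2 < 1
assert abs(c1 * c2 - (c1 - c2)) < 1e-12
assert abs(c1 * c2 - sX**2 / (sY**2 * ua**2)) < 1e-12
assert abs(d * c1 * c2 - (c1 + c2)) < 1e-12

def chi(g):
    u = (g / g_o)**d
    return c1 * c2 * (1 - u) / (c1 + c2 * u)

assert abs(chi(g_o)) < 1e-12 and abs(chi(0.0) - c2) < 1e-12

# chi(gamma) solves the chi-ODE paired with gammadot = -alpha3^2 gamma^2/sY^2
g, a3, eps = 0.5, 1.0, 1e-6
dchi_dg = (chi(g + eps) - chi(g - eps)) / (2 * eps)
lhs = dchi_dg * (-(a3**2) * g**2 / sY**2)
rhs = g * a3**2 * ((1 - chi(g)) / sY**2 - (ua * chi(g))**2 / sX**2)
assert abs(lhs - rhs) < 1e-6
```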
Proof of Theorem 1

We begin by proving that a solution to the BVP exists and that any solution has the stated properties; from there, we will construct the rest of the equilibrium objects. Recall that $\alpha_t = \alpha_{3t} = \beta_{1t}\chi_t + \beta_{3t}$.
We begin by reversing time and posing the associated IVP, parameterized by an initial guess $\gamma_0 = \gamma_F \in [0,\gamma^o]$:
$$\dot v_{6t} = -\beta_{2t}^2 - 2\beta_{1t}\beta_{2t}(1-\chi_t) + \beta_{1t}^2(1-\chi_t)^2 - \frac{2v_{6t}\alpha_t^2\gamma_t\chi_t}{\sigma_X^2(1-\chi_t)} \tag{B.15}$$
$$\dot v_{8t} = 2\beta_{2t} + 2(1-2\alpha_t)\beta_{1t}(1-\chi_t) + 4\beta_{1t}^2\chi_t(1-\chi_t) - \frac{v_{8t}\alpha_t^2\gamma_t\chi_t}{\sigma_X^2(1-\chi_t)} \tag{B.16}$$
$$\dot\beta_{1t} = \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{-2\sigma_X^2(\alpha_t-\beta_{1t})\beta_{1t}(1-\chi_t) + 2\sigma_Y^2\alpha_t\chi_t(\beta_{2t}[1+2\beta_{1t}\chi_t]-\beta_{1t}[1-\chi_t]) + \alpha_t^2\beta_{1t}\gamma_t\chi_tv_{8t}\Big\} \tag{B.17}$$
$$\dot\beta_{2t} = \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{-2\sigma_X^2\beta_{1t}^2(1-\chi_t)^2 - 2\sigma_Y^2\alpha_t\beta_{2t}\chi_t^2(1-2\beta_{2t}) + \alpha_t^2\gamma_t\chi_t(2v_{6t}+\beta_{2t}v_{8t})\Big\} \tag{B.18}$$
$$\dot\beta_{3t} = \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{2\sigma_X^2\beta_{1t}(1-\chi_t)\beta_{3t} - 2\sigma_Y^2\alpha_t\beta_{2t}\chi_t^2(1-2\beta_{3t}) + \alpha_t^2\beta_{3t}\gamma_t\chi_tv_{8t}\Big\} \tag{B.19}$$
$$\dot\alpha_t = \frac{\alpha_t\gamma_t\chi_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{4\sigma_Y^2\beta_{2t}\chi_t + \alpha_t\gamma_tv_{8t}\Big\} \tag{B.20}$$
$$\dot\gamma_t = \frac{\gamma_t^2\alpha_t^2}{\sigma_Y^2} \tag{B.21}$$
with initial conditions $v_{60} = v_{80} = 0$, $\beta_{10} = \frac{1}{2(2-\chi_0)}$, $\beta_{20} = \frac{1-\chi_0}{2(2-\chi_0)}$, $\beta_{30} = \frac12$ and $\gamma_0 = \gamma_F$, where $\chi_t = \chi(\gamma_t)$ using the function $\chi : \mathbb R_+ \to (-\infty,c_2)$ as defined in Lemma 5. Hereafter, we use the notation $\bar\chi := c_2$. Recall also that $c_2 < 1$, and note by inspection that $\chi(\gamma) \searrow -c_1 < 0$ as $\gamma \to +\infty$; since $\chi$ is decreasing, it has range $(-c_1,\bar\chi)$. The right-hand sides of the equations above are of class $C^1$ over the domain $\{(v_6,v_8,\beta_1,\beta_2,\beta_3,\gamma)\in\mathbb R^5\times\mathbb R_+\}$, and hence, subject to existence, the system has a unique solution over $[0,T]$ given the initial conditions. To solve the BVP, our goal is to show that there exists $\gamma_F \in (0,\gamma^o)$ such that a (unique) solution to the IVP above exists and satisfies $\gamma_T(\gamma_F) = \gamma^o$.
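The shooting logic just described — integrate the time-reversed IVP from a guess $\gamma_F$ and adjust the guess until $\gamma_T = \gamma^o$ — can be sketched numerically. The example below is a deliberate simplification: it freezes $\alpha$ at a constant $\bar\alpha$, whereas in the proof $\alpha$ is determined jointly with the other coefficients, so it illustrates only the numerical scheme, with illustrative parameter values.

```python
import numpy as np

# Hedged sketch of the shooting argument: Euler-integrate the reversed
# variance equation (B.21) forward from a guess gamma_F and bisect on
# gamma_F so that the terminal value hits gamma^o.  Freezing alpha at a
# constant abar is an illustrative simplification, not the paper's system.
sY, abar, g_o, T = 0.8, 0.6, 1.0, 1.0

def gamma_T(gF, n=20000):
    g, dt = gF, T / n
    for _ in range(n):
        g += (g**2) * abar**2 / sY**2 * dt   # reversed-time ODE (B.21)
    return g

lo, hi = 0.0, g_o
for _ in range(60):                          # bisection on the initial guess
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if gamma_T(mid) < g_o else (lo, mid)
gF = 0.5 * (lo + hi)

assert 0 < gF < g_o
assert abs(gamma_T(gF) - g_o) < 1e-6
# closed form for constant abar: gamma_T = gF / (1 - gF*abar^2*T/sY^2)
gF_exact = g_o / (1 + g_o * abar**2 * T / sY**2)
assert abs(gF - gF_exact) < 1e-3
```

Bisection is valid here because $\gamma_T$ is monotone in $\gamma_F$; the proof instead uses continuity in the initial condition plus the intermediate value theorem, which is the same idea without requiring monotonicity.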
Note that if $\gamma_F = 0$, then the IVP has the following (unique) solution: for all $t \in [0,T]$, $\chi_t = \bar\chi$, $\beta_{1t} = \frac{1}{2(2-\bar\chi)}$, $\beta_{2t} = \frac{1-\bar\chi}{2(2-\bar\chi)}$ and $\beta_{3t} = \frac12$ (so that $\alpha_t = \frac{1}{2-\bar\chi}$) and $\gamma_t = 0$, with $v_6$ and $v_8$ obtained from integration of their ODEs. (Clearly, this does not correspond to a solution to the BVP, since $\gamma_T(0) = 0 < \gamma^o$.)

We now consider $\gamma_F > 0$. Given the $C^1$ property mentioned above, and the existence of a solution to the IVP for $\gamma_F = 0$, there exists $\epsilon > 0$ such that a (unique) solution to the IVP exists over $[0,T]$ for each $\gamma_F \in (0,\epsilon)$ (see the Theorem on page 397 in Hirsch et al. (2004)).

Observe that for $\gamma_F > 0$, we can change variables using $\bar v_i := \gamma v_i$ for $i = 6,8$ and create a new IVP in $(\bar v_6,\bar v_8,\beta_1,\beta_2,\beta_3,\gamma)$. We label this System 1, and it consists of (B.17)–(B.21) (replacing all instances of $\gamma v_i$ with $\bar v_i$, $i = 6,8$) together with
$$\dot{\bar v}_{6t} = \gamma_t\Big\{-\beta_{2t}^2 - 2\beta_{1t}\beta_{2t}(1-\chi_t) + \beta_{1t}^2(1-\chi_t)^2 + \bar v_{6t}\alpha_t^2\Big[\frac{1}{\sigma_Y^2}-\frac{2\chi_t}{\sigma_X^2(1-\chi_t)}\Big]\Big\} \tag{B.22}$$
$$\dot{\bar v}_{8t} = \gamma_t\Big\{2\beta_{2t} + 2(1-2\alpha_t)\beta_{1t}(1-\chi_t) + 4\beta_{1t}^2\chi_t(1-\chi_t) + \bar v_{8t}\alpha_t^2\Big[\frac{1}{\sigma_Y^2}-\frac{\chi_t}{\sigma_X^2(1-\chi_t)}\Big]\Big\} \tag{B.23}$$
subject to $\bar v_{60} = \bar v_{80} = 0$ and the remaining initial conditions above. We will argue later that when this system has a solution over $[0,T]$, $v_6 = \bar v_6/\gamma$ and $v_8 = \bar v_8/\gamma$ are well defined, and hence a solution to the original IVP can be recovered.
System 1 has the property that in its solution, $\bar v_6$ and $\beta_2$ can be expressed directly as functions of the other variables. In anticipation of this property (which we will soon verify), it is convenient to work with a reduced IVP in $(\bar v_8,\beta_1,\beta_3,\gamma)$, which we call System 2, as follows:
$$\dot{\bar v}_{8t} = \gamma_t\Big\{2[1-\beta_{1t}-\beta_{3t}] + 2(1-2\alpha_t)\beta_{1t}(1-\chi_t) + 4\beta_{1t}^2\chi_t(1-\chi_t) + \bar v_{8t}\alpha_t^2\big[1/\sigma_Y^2-\chi_t/(\sigma_X^2[1-\chi_t])\big]\Big\} \tag{B.24}$$
$$\dot\beta_{1t} = \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{-2\sigma_X^2(\alpha_t-\beta_{1t})\beta_{1t}(1-\chi_t) + 2\sigma_Y^2\alpha_t\chi_t([1-\beta_{1t}-\beta_{3t}][1+2\beta_{1t}\chi_t]-\beta_{1t}[1-\chi_t]) + \alpha_t^2\beta_{1t}\chi_t\bar v_{8t}\Big\} \tag{B.25}$$
$$\dot\beta_{3t} = \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{2\sigma_X^2\beta_{1t}(1-\chi_t)\beta_{3t} - 2\sigma_Y^2\alpha_t[1-\beta_{1t}-\beta_{3t}]\chi_t^2(1-2\beta_{3t}) + \alpha_t^2\beta_{3t}\chi_t\bar v_{8t}\Big\} \tag{B.26}$$
and (B.21), subject to the initial conditions $\bar v_{80} = 0$, $\beta_{10} = \frac{1}{2(2-\chi(\gamma_F))}$, $\beta_{30} = \frac12$ and $\gamma_0 = \gamma_F$. Observe that in this system, we have substituted $1-\beta_1-\beta_3$ for all instances of $\beta_2$ in the original ODEs. We will show that there exists a $\gamma_F \in (0,\gamma^o)$ such that a (unique) solution to System 2 exists and satisfies $\gamma_T(\gamma_F) = \gamma^o$, and then we will show that given this solution, we can construct the solutions to (B.18) (which will be $1-\beta_1-\beta_3$) and (B.22) directly, solving System 1 (and hence the BVP).
Based on System 2, define
$$\bar\gamma := \sup\{\gamma_F > 0 \mid \text{a solution to System 2 with } \gamma_0 = \gamma_F \text{ exists over } [0,T]\}.$$
Since the RHS of the equations that comprise System 2 are of class $C^1$, the solution is unique when it exists, and there is continuous dependence of the solution on the initial conditions; in particular, the terminal value $\gamma_T$ is continuous in $\gamma_F$ (see the Theorem on page 397 in Hirsch et al. (2004)). Hence, if there exists $\gamma_F \in (0,\bar\gamma)$ such that $\gamma_T(\gamma_F) \ge \gamma^o$, by the intermediate value theorem there exists a $\gamma_F \in (0,\bar\gamma)$ such that $\gamma_T(\gamma_F) = \gamma^o$, allowing us to construct a solution to System 1. We rule out the alternative case by contradiction, and then we return to the case just described to complete the construction.

Suppose by way of contradiction that for all $\gamma_F \in (0,\bar\gamma)$, $\gamma_T(\gamma_F) < \gamma^o$. In particular, because $\gamma_t$ is nondecreasing in the backward system for any initial condition, we have that $\gamma_t \in (0,\gamma^o)$ and, by Lemma 5, $\chi_t \in (0,\bar\chi)$ for all $t \in [0,T]$ when $\gamma_F \in (0,\bar\gamma)$.

To reach a contradiction, it suffices to show that the solution to System 2 can be bounded uniformly over $\gamma_F \in (0,\bar\gamma)$, as this would imply that the solution can be extended strictly to the right of $\bar\gamma$ in this case, violating the definition of $\bar\gamma$.
To establish uniform bounds, we decompose $\beta_1$ and $\beta_3$ as sums of forward-looking and myopic components and show that both of these components are uniformly bounded; for $\bar v_8$, we establish the bound without any decomposition. Specifically, define $\beta^m_{1t} := \frac{1}{2(2-\chi_t)}$, $\beta^m_{3t} := \frac12$ and $\beta^f_{it} := \beta_{it}-\beta^m_{it}$ for all $t\in[0,T]$, $i = 1,3$. Observe that for $\gamma_F \in (0,\bar\gamma)$, $\beta^m_{1t}$ is uniformly bounded by $[1/4,1/2] \subset [0,1]$, as $\chi_t = \chi(\gamma_t)\in[0,1]$ for all $t\in[0,T]$, and trivially $\beta^m_{3t}\in[0,1]$. Hence, we define one final system, System 3, in $(\bar v_8,\beta^f_1,\beta^f_3,\gamma)$, to be (B.21), (B.24), and
$$\dot\beta^f_{1t} = -\frac{\dot\chi_t}{2(2-\chi_t)^2} + \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{-2\sigma_X^2(\alpha_t-[\beta^f_{1t}+\beta^m_{1t}])[\beta^f_{1t}+\beta^m_{1t}](1-\chi_t) + 2\sigma_Y^2\alpha_t\chi_t\big([1-(\beta^f_{1t}+\beta^m_{1t})-(\beta^f_{3t}+\beta^m_{3t})][1+2(\beta^f_{1t}+\beta^m_{1t})\chi_t]\big) - 2\sigma_Y^2\alpha_t\chi_t(\beta^f_{1t}+\beta^m_{1t})(1-\chi_t) + \alpha_t^2(\beta^f_{1t}+\beta^m_{1t})\chi_t\bar v_{8t}\Big\} =: h_{\beta^f_1}(\beta^f_{1t},\beta^m_{1t},\beta^f_{3t},\beta^m_{3t},\bar v_{8t},\gamma_t) \tag{B.27}$$
$$\dot\beta^f_{3t} = \frac{\alpha_t\gamma_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)}\Big\{2\sigma_X^2(\beta^f_{1t}+\beta^m_{1t})(1-\chi_t)(\beta^f_{3t}+\beta^m_{3t}) + \alpha_t^2(\beta^f_{3t}+\beta^m_{3t})\chi_t\bar v_{8t} + 4\sigma_Y^2\alpha_t\beta^f_{3t}\chi_t^2[1-(\beta^f_{1t}+\beta^m_{1t})-(\beta^f_{3t}+\beta^m_{3t})]\Big\} =: h_{\beta^f_3}(\beta^f_{1t},\beta^m_{1t},\beta^f_{3t},\beta^m_{3t},\bar v_{8t},\gamma_t) \tag{B.28}$$
subject to initial conditions $\bar v_{80} = 0$, $\beta^f_{10} = 0$, $\beta^f_{30} = 0$ and $\gamma_0 = \gamma_F \in (0,\bar\gamma)$, where $\alpha_t = [\beta^f_{3t}+\beta^m_{3t}]+[\beta^f_{1t}+\beta^m_{1t}]\chi_t$. Define $h_{\bar v_8}(\beta^f_{1t},\beta^m_{1t},\beta^f_{3t},\beta^m_{3t},\bar v_{8t},\gamma_t)$ as the RHS of (B.24) with $\beta^f_{it}+\beta^m_{it}$ substituted for $\beta_{it}$, $i = 1,3$.
Given that $\beta^m_1$ and $\beta^m_3$ are uniformly bounded, it suffices to show that the solutions $(\bar v_8,\beta^f_1,\beta^f_3)$ are uniformly bounded by some $[-K,K]^3$. (Recall that $\gamma$ is already bounded by $[0,\gamma^o]$.)

Define $\bar\alpha = (K+1)\bar\chi + (K+1)$, where we suppress the dependence on $K$. Next, for $x \in \{\bar v_8,\beta^f_1,\beta^f_3\}$, define $\bar h_x : \mathbb R^2_{++}\to\mathbb R_{++}$ as follows:
$$\bar h_{\bar v_8}(K,\gamma^o) := \gamma^o\Big\{2[1+2(K+1)] + 2(1+2\bar\alpha)(K+1) + 4(K+1)^2\bar\chi + K\bar\alpha^2\big[1/\sigma_Y^2+\bar\chi/(\sigma_X^2[1-\bar\chi])\big]\Big\} \tag{B.29}$$
$$\bar h_{\beta^f_1}(K,\gamma^o) := \frac{\bar\alpha^2\gamma^o\big[1/\sigma_Y^2+\bar\chi/(\sigma_X^2[1-\bar\chi])\big]}{2(2-\bar\chi)^2} + \frac{\bar\alpha\gamma^o}{2\sigma_X^2\sigma_Y^2(1-\bar\chi)}\Big\{2\sigma_X^2[\bar\alpha+K+1](K+1) + 2\sigma_Y^2\bar\alpha\bar\chi([1+2(K+1)][1+2(K+1)\bar\chi]+K+1) + \bar\alpha^2K(K+1)\bar\chi\Big\} \tag{B.30}$$
$$\bar h_{\beta^f_3}(K,\gamma^o) := \frac{\bar\alpha\gamma^o}{2\sigma_X^2\sigma_Y^2(1-\bar\chi)}\Big\{2\sigma_X^2(K+1)^2 + \bar\alpha^2\bar\chi K(K+1) + 4\sigma_Y^2\bar\alpha\bar\chi^2K[1+2(K+1)]\Big\} \tag{B.31}$$
Define
$$\bar T(\gamma^o) := \max_{K'>0}\,\min_{x\in\{\bar v_8,\beta^f_1,\beta^f_3\}}\frac{K'}{\bar h_x(K',\gamma^o)},$$
and let $K$ denote the arg max.⁴¹ We now show that, given $T < \bar T(\gamma^o)$, $(\bar v_8,\beta^f_1,\beta^f_3)$ are uniformly bounded by $[-K,K]^3$. Suppose otherwise, and define $\tau = \inf\{t > 0 : (\bar v_{8t},\beta^f_{1t},\beta^f_{3t})\notin[-K,K]^3\}$; by supposition and continuity of the solutions, $\tau\in(0,T)$ and $|x_\tau| = K$ for some $x\in\{\bar v_8,\beta^f_1,\beta^f_3\}$. Now by construction of the $\bar h_x(K,\gamma^o)$, for all $t\in[0,\tau]$ and for each $x\in\{\bar v_8,\beta^f_1,\beta^f_3\}$ we have
$$|\dot x_t| = |h_x(\beta^f_{1t},\beta^m_{1t},\beta^f_{3t},\beta^m_{3t},\bar v_{8t},\gamma_t)| < \bar h_x(K,\gamma^o),$$
and thus, by the triangle inequality,
$$|x_\tau| < 0 + \tau\cdot\bar h_x(K,\gamma^o) < \bar T(\gamma^o)\bar h_x(K,\gamma^o) \le K,$$
a contradiction. We conclude that the solutions $(\bar v_8,\beta^f_1,\beta^f_3,\gamma)$ to System 3 are uniformly bounded by $[-K,K]^3\times[0,\gamma^o]$, and hence the solutions $(\bar v_8,\beta_1,\beta_3,\gamma)$ to System 2 are uniformly bounded by $[-K,K]\times[-(K+1),K+1]^2\times[0,\gamma^o]$. This gives us the desired contradiction of the definition of $\bar\gamma$ from before, so we conclude that there exists $\gamma_F\in(0,\bar\gamma)$ such that the solution to System 2 satisfies $\gamma_T(\gamma_F) = \gamma^o$. (Note that any such $\gamma_F$ lies in $(0,\gamma^o)$, as $\gamma$ is nondecreasing in the backward system.)

Now consider any such $\gamma_F$ and solution $(\bar v_8,\beta_1,\beta_3,\gamma)$ to System 2 with $\gamma_T(\gamma_F) = \gamma^o$. We claim, using the comparison theorem, that $\alpha > 0$ over $[0,T]$. First, we have $\alpha_0 > 0$, as $\alpha_0 = \frac{1}{2-\chi(\gamma_F)}$, and by Lemma 5, $0 \le \chi(\gamma_F) < \bar\chi < 1$. Now the RHS of (B.20) is locally Lipschitz continuous in $\alpha$, uniformly in $t$, as the remaining coefficients appearing in that ODE are bounded (being continuous functions of time over the compact set $[0,T]$). By standard application of the comparison theorem in Teschl (2012, Theorem 1.3), we have $\alpha_t > 0$ for all $t\in[0,T]$.

Hence, we can define
$$v^{cand}_{6t} := \frac{\sigma_Y^2[-1+2\beta_{1t}(1-\chi_t)+\alpha_t]}{\alpha_t} - \frac{\bar v_{8t}}{2}, \qquad \beta^{cand}_{2t} := 1-\beta_{1t}-\beta_{3t}.$$

⁴¹Note that $\bar T(\gamma^o), K < \infty$, as the $\bar h_x$ grow faster than linearly in $K$.
Observe that in System 2, (B.24)–(B.26) were obtained by replacing $\beta_{2t}$ with $\beta^{cand}_{2t}$ in (B.23), (B.17) and (B.19), so $(\bar v_8,\beta_1,\beta^{cand}_2,\beta_3,\gamma)$ solves (B.23), (B.17), (B.19) and (B.21). It is tedious but straightforward to verify that $v^{cand}_{6t}$ and $\beta^{cand}_{2t}$ solve (B.22) and (B.18), respectively. Now the RHS of (B.22) and (B.18), given the solutions $(\bar v_8,\beta_1,\beta_3,\gamma)$, are locally Lipschitz continuous in their respective variables, uniformly in $t$, and thus $(v^{cand}_6,\beta^{cand}_2)$ are the unique solutions to (B.22) and (B.18) given the other variables.

We have $\gamma_t \ge \gamma_F > 0$ for all $t\in[0,T]$. Thus, we can recover $v_6 = \bar v_6/\gamma$ and $v_8 = \bar v_8/\gamma$ as the solutions to (B.15) and (B.16). Hence, we have established the existence of $\gamma_F\in(0,\gamma^o)$ and a solution to the associated IVP posed at the beginning of the proof such that $\gamma_T(\gamma_F) = \gamma^o$. By reversing the direction of time, this is a solution to the BVP in the theorem statement.
For the remainder of the proof, we refer to the forward system. Now, in the full system of equations for the equilibrium, learning and value function coefficients, we have
$$v_{2t} = \frac{2\sigma_Y^2\beta_{0t}}{\gamma_t\alpha_t}, \qquad v_{5t} = -\frac{\sigma_Y^2[\beta_{3t}-\beta_{1t}(2-\chi_t)]}{\gamma_t\alpha_t}, \qquad v_{7t} = \frac{2\sigma_Y^2(2\beta_{3t}-1)}{\gamma_t\alpha_t}, \qquad v_{9t} = \frac{2\sigma_Y^2[\beta_{2t}-\beta_{1t}(1-\chi_t)]}{\gamma_t\alpha_t},$$
and a system of ODEs for $v_0,v_1,v_3,v_4,\beta_0$:
$$\dot v_{0t} = \beta_{0t}^2 + \frac{\alpha_t\gamma_t\chi_t}{\sigma_X^2(1-\chi_t)^2}\Big\{\sigma_Y^2[-2\beta_{2t}\chi_t(1-\chi_t)+\alpha_t\chi_t(1-\chi_t)^2] + \alpha_t\sigma_X^2(1-\chi_t)^2 - \alpha_t\gamma_t\chi_tv_{6t}\Big\}$$
$$\dot v_{1t} = -2\beta_{0t}$$
$$\dot v_{3t} = \frac{\alpha_t^2\gamma_t\chi_tv_{3t}}{\sigma_X^2(1-\chi_t)} + 2\beta_{0t}[\beta_{1t}(1-\chi_t)+\beta_{2t}]$$
$$\dot v_{4t} = 1-2\beta_{3t}^2$$
$$\dot\beta_{0t} = -\frac{\alpha_t^2\gamma_t\chi_t[\alpha_t\gamma_t(v_{3t}+\beta_{0t}v_{8t})+4\sigma_Y^2\beta_{0t}\beta_{2t}\chi_t]}{2\sigma_X^2\sigma_Y^2(1-\chi_t)},$$
all of which have terminal value $0$. Since the system of ODEs above is linear, it has a unique solution (given the solution to the BVP), and $v_2,v_5,v_7$ and $v_9$ are well defined by the above formulas since $\alpha,\gamma > 0$. Furthermore, by inspection, $(\beta_0,v_3) = (0,0)$ solves their respective ODEs. We conclude that there exists a LME. $\blacksquare$
Proof of Theorem 2

Let
$$\tilde\beta_2 := \beta_2/(1-\chi), \qquad \tilde v_6 := v_6\gamma/(1-\chi)^2, \qquad \tilde v_8 := v_8\gamma/(1-\chi).$$
The boundary value problem is
$$\dot{\tilde v}_{6t} = \gamma_t\Big\{-\beta_{1t}^2 + 2\beta_{1t}\tilde\beta_{2t} + \tilde\beta_{2t}^2 + \tilde v_{6t}\Big(\frac{\alpha_t^2}{\sigma_Y^2}+\frac{2(u_\theta+\alpha_t)^2\chi_t}{\sigma_X^2}\Big)\Big\} \tag{B.32}$$
$$\dot{\tilde v}_{8t} = \gamma_t\Big\{(-2+4\alpha_t)\beta_{1t} - 2\tilde\beta_{2t} + \frac{\tilde v_{8t}(u_\theta+\alpha_t)^2\chi_t}{\sigma_X^2} - 4\beta_{1t}^2\chi_t\Big\} \tag{B.33}$$
$$\dot\beta_{1t} = \frac{\gamma_t}{4\sigma_X^2\sigma_Y^2(1+u_\theta\chi_t)}\times\Big\{2\sigma_X^2\alpha_t\big(u_\theta^2-2\beta_{1t}^2+\alpha_t(u_\theta+2\beta_{1t})\big) + \tilde v_{8t}\alpha_t(u_\theta+\alpha_t)^2(u_\theta-2\beta_{1t})\chi_t + 4\beta_{1t}\chi_t\big[u_\theta^2\sigma_Y^2 + (2u_\theta\sigma_X^2+\sigma_Y^2)\alpha_t^2 + u_\theta\alpha_t(u_\theta\sigma_X^2+2\sigma_Y^2-\sigma_X^2\beta_{1t})\big] - 4\sigma_Y^2(u_\theta+\alpha_t)^2\tilde\beta_{2t}\chi_t + 4\sigma_Y^2(u_\theta+\alpha_t)^2\beta_{1t}(u_\theta-2\tilde\beta_{2t})\chi_t^2\Big\} \tag{B.34}$$
$$\dot{\tilde\beta}_{2t} = \frac{\gamma_t}{4\sigma_X^2\sigma_Y^2(1+u_\theta\chi_t)}\times\Big\{2\sigma_X^2\alpha_t\big[u_\theta^2+2\beta_{1t}^2+\alpha_t(u_\theta+2\tilde\beta_{2t})\big] + \alpha_t\chi_t(u_\theta+\alpha_t)^2\big[-4\tilde v_{6t}+\tilde v_{8t}(u_\theta-2\tilde\beta_{2t})\big] + 4\alpha_t\chi_tu_\theta\sigma_X^2\big[\beta_{1t}^2+(u_\theta+2\alpha_t)\tilde\beta_{2t}\big] - 4(u_\theta+\alpha_t)^2\big[u_\theta\tilde v_{6t}\alpha_t+\sigma_Y^2\tilde\beta_{2t}(-u_\theta+2\tilde\beta_{2t})\big]\chi_t^2\Big\} \tag{B.35}$$
$$\dot\beta_{3t} = \frac{\gamma_t}{4\sigma_X^2\sigma_Y^2(1+u_\theta\chi_t)}\times\Big\{-4\sigma_X^2\alpha_t^2\beta_{1t} + 2\alpha_t\chi_t(u_\theta+\alpha_t)\big[-u_\theta\sigma_X^2+2u_\theta\sigma_X^2\alpha_t-\tilde v_{8t}\alpha_t(u_\theta+\alpha_t)\big] - 2\alpha_t\chi_t\big[2u_\theta\sigma_X^2\alpha_t\beta_{1t}-2\sigma_X^2\beta_{1t}^2\big] - \chi_t^2\big[\tilde v_{8t}\alpha_t(u_\theta+\alpha_t)^2(u_\theta-2\beta_{1t}) + 4u_\theta\sigma_X^2\alpha_t(u_\theta+\alpha_t-\beta_{1t})\beta_{1t}\big] + 4\sigma_Y^2\chi_t^2(u_\theta+\alpha_t)^2(-1+2\alpha_t)\tilde\beta_{2t} + 8\sigma_Y^2(u_\theta+\alpha_t)^2\beta_{1t}\tilde\beta_{2t}\chi_t^3\Big\} \tag{B.36}$$
$$\dot\gamma_t = -\frac{\alpha_t^2\gamma_t^2}{\sigma_Y^2} \tag{B.37}$$
$$\dot\chi_t = \gamma_t\big\{\alpha_t^2(1-\chi_t)/\sigma_Y^2 - (u_\theta+\alpha_t)^2\chi_t^2/\sigma_X^2\big\}, \tag{B.38}$$
with boundary conditions
$$(\gamma_0,\chi_0,\tilde v_{6T},\tilde v_{8T},\beta_{1T},\tilde\beta_{2T},\beta_{3T}) = \left(\gamma^o,\,0,\,0,\,0,\,\frac{1+2u_\theta}{2(2-\chi_T)},\,\frac{1+2u_\theta}{2(2-\chi_T)},\,1/2\right).$$
The core of the proof is to establish the existence of a solution (v6, v8, β1, β2, β3, γ, χ) to
the boundary value problem for all T < T (γo); from there, it is straightforward to verify
that the remaining coefficients are well-defined and that the HJB equation is satisfied. We
complete these steps at the end of the proof.42
It is useful to introduce z = (v6, v8, β1, β2, β3, γ, χ) and write the system of ODEs (B.32)-
(B.38) as zt = F (zt). We write z = (z1, z2, . . . , z5) and F (z) = (F1(z), F2(z), . . . , F5(z)).
42In particular, we have ignored β0 for now since it does not appear in any of the ODEs displayed here;afterward, it is easily shown to be identically zero.
Define $B: \mathbb{R}^2_+ \to \mathbb{R}^5$ by
$$B(\gamma, \chi) = \Big(0,\ 0,\ \tfrac{1+2u_\theta}{2(2-\chi)},\ \tfrac{1+2u_\theta}{2(2-\chi)},\ 1/2\Big),$$
formed by writing the terminal value of $z$ as a function of $(\gamma, \chi)$. Define $s_0 \in \mathbb{R}^5$ by
$$s_0 = B(\gamma^o, 0) = \Big(0,\ 0,\ \tfrac{1+2u_\theta}{4},\ \tfrac{1+2u_\theta}{4},\ 1/2\Big).$$
For $x \in \mathbb{R}^n$, let $\|x\|_\infty$ denote the sup norm, $\sup_{1\le i\le n}|x_i|$. For any $\rho > 0$, let $S_\rho(s_0)$ denote the $\rho$-ball around $s_0$:
$$S_\rho(s_0) := \{s \in \mathbb{R}^5 :\ \|s - s_0\|_\infty \le \rho\}.$$
For all $s \in S_\rho(s_0)$, let IVP-$s$ denote the initial value problem defined by (B.32)-(B.38) and initial conditions $(v_{60}, v_{80}, \beta_{10}, \beta_{20}, \beta_{30}, \gamma_0, \chi_0) = (s, \gamma^o, 0)$. Whenever a solution to IVP-$s$ exists, it is unique, as $F$ is of class $C^1$; denote it by $\mathbf{z}(s) = (z(s), \gamma(s), \chi(s)) = (v_6(s), v_8(s), \beta_1(s), \beta_2(s), \beta_3(s), \gamma(s), \chi(s))$. Note that such a solution solves the BVP if and only if
$$z_T(s) = B(\gamma_T(s), \chi_T(s)), \tag{B.39}$$
as the initial values $\gamma_0(s) = \gamma^o$ and $\chi_0(s) = 0$ are satisfied by construction. Note also that $z_T(s) = s + \int_0^T F(\mathbf{z}_t(s))\,dt$; hence (B.39) is satisfied if and only if $s$ is a fixed point of the function $g: S_\rho(s_0) \to \mathbb{R}^5$ defined by
$$g(s) := B(\gamma_T(s), \chi_T(s)) - \int_0^T F(\mathbf{z}_t(s))\,dt. \tag{B.40}$$
Note, moreover, that for any solution, we have by Lemma 3 that $\chi_t \in [0, \bar\chi)$, where we define $\bar\chi$ as 1 for the purpose of this proof.
Before establishing conditions sufficient for $g$ to be a continuous self-map on $S_\rho$ for a given $\rho > 0$, we establish the following result, which gives existence, uniqueness and uniform bounds of solutions to IVP-$s$ for all $s \in S_\rho$. Specifically, for arbitrary $K > 0$, we ensure that the solution $z_t(s)$ varies by at most $K$ from its starting point $s$ for all $t \in [0, T]$, and thus, by the triangle inequality, this solution varies by at most $\rho + K$ from $s_0$. These bounds will be used again when we turn to the self-map property.
Lemma B.2. Fix $\gamma^o > 0$, $\rho > 0$ and $K > 0$. Then there exists a threshold $T^{SBC}(\gamma^o; \rho, K) > 0$ such that if $T < T^{SBC}(\gamma^o; \rho, K)$, then for all $s \in S_\rho(s_0)$ a unique solution to IVP-$s$ exists over $[0, T]$. Moreover, for all $t \in [0, T]$, $z_t(s) \in S_{\rho+K}(s_0)$. We call this property the System Bound Condition (SBC).

Proof. Recall that $F$ is of class $C^1$, and hence, given $s \in S_\rho(s_0)$, the solution $\mathbf{z}(s)$ is unique whenever it exists. Toward the SBC, note that it suffices to ensure that $\|z_t(s) - s\|_\infty < K$ for all $t$, since then, by the triangle inequality, $\|z_t(s) - s_0\|_\infty \le \|z_t(s) - s\|_\infty + \|s - s_0\|_\infty < \rho + K$. In what follows, we construct bounds on $F$ by writing $F(\mathbf{z}(s)) = F(\mathbf{z}(s) - s_0 + s_0)$ and using the conjectured bounds $\|z(s) - s_0\|_\infty < \rho + K$, $\gamma \in (0, \gamma^o]$, $\chi \in [0, \bar\chi)$ for the solution, when it exists. Using these bounds on $F$, we then identify a threshold time $T^{SBC}(\gamma^o; \rho, K)$ such that at all times $t < T^{SBC}(\gamma^o; \rho, K)$ the solution to IVP-$s$ (exists and) satisfies the conjectured bounds.
Note that the desired component-wise inequalities $|z_{it}(s) - s_{0i}| < \rho + K$, $i \in \{1, 2, \dots, 5\}$, imply the further bounds
$$|v_{6t}|,\ |v_{8t}| < \bar v := \rho + K$$
$$|\beta_{1t}| < \bar\beta_1(\rho, K) := \frac{1+2u_\theta}{4} + \rho + K$$
$$|\beta_{2t}| < \bar\beta_2(\rho, K) := \frac{1+2u_\theta}{4} + \rho + K$$
$$|\beta_{3t}| < \bar\beta_3(\rho, K) := 1/2 + \rho + K$$
$$|\alpha_t| < \bar\alpha(\rho, K) := \bar\beta_1(\rho, K)\bar\chi + \bar\beta_3(\rho, K).$$
Hereafter, we suppress the dependence of $\bar\beta_i$, $i \in \{1, 2, 3\}$, and $\bar\alpha$ on $(\rho, K)$.
Define functions $h_i: \mathbb{R}^3_{++} \to \mathbb{R}_{++}$ as follows:⁴³
$$h_1(\gamma^o; \rho, K) := \gamma^o\Big\{(\bar\beta_1 + \bar\beta_2)^2 + \bar v\big(\bar\alpha^2/\sigma_Y^2 + 2(u_\theta+\bar\alpha)^2\bar\chi/\sigma_X^2\big)\Big\}$$
$$h_2(\gamma^o; \rho, K) := \gamma^o\Big\{(2+4\bar\alpha)\bar\beta_1 + 2\bar\beta_2 + \bar v(u_\theta+\bar\alpha)^2\bar\chi/\sigma_X^2 + 4\bar\beta_1^2\bar\chi\Big\}$$
$$h_3(\gamma^o; \rho, K) := \frac{\gamma^o}{4\sigma_X^2\sigma_Y^2}\Big\{2\sigma_X^2\bar\alpha\big(u_\theta^2 + 2\bar\beta_1^2 + \bar\alpha(u_\theta+2\bar\beta_1)\big) + \bar v\bar\alpha(u_\theta+\bar\alpha)^2(u_\theta+2\bar\beta_1)\bar\chi$$
$$\qquad + 4\bar\beta_1\bar\chi\big[u_\theta^2\sigma_Y^2 + (2u_\theta\sigma_X^2+\sigma_Y^2)\bar\alpha^2 + u_\theta\bar\alpha(u_\theta\sigma_X^2 + 2\sigma_Y^2 + \sigma_X^2\bar\beta_1)\big] + 4\sigma_Y^2(u_\theta+\bar\alpha)^2\big[\bar\beta_2\bar\chi + \bar\beta_1(u_\theta+2\bar\beta_2)\bar\chi^2\big]\Big\}$$
$$h_4(\gamma^o; \rho, K) := \frac{\gamma^o}{4\sigma_X^2\sigma_Y^2}\Big\{2\sigma_X^2\bar\alpha\big[u_\theta^2 + 2\bar\beta_1^2 + \bar\alpha(u_\theta+2\bar\beta_2)\big] + \bar\alpha\bar\chi(u_\theta+\bar\alpha)^2\big[4\bar v + \bar v(u_\theta+2\bar\beta_2)\big]$$
$$\qquad + 4\bar\alpha\bar\chi u_\theta\sigma_X^2\big[\bar\beta_1^2 + (u_\theta+2\bar\alpha)\bar\beta_2\big] + 4(u_\theta+\bar\alpha)^2\bar\chi^2\big[u_\theta\bar v\bar\alpha + \sigma_Y^2\bar\beta_2(u_\theta+2\bar\beta_2)\big]\Big\}$$
$$h_5(\gamma^o; \rho, K) := \frac{\gamma^o}{4\sigma_X^2\sigma_Y^2}\Big\{4\sigma_X^2\bar\alpha^2\bar\beta_1 + 2\bar\alpha\bar\chi(u_\theta+\bar\alpha)\big[u_\theta\sigma_X^2 + 2u_\theta\sigma_X^2\bar\alpha + \bar v\bar\alpha(u_\theta+\bar\alpha)\big] + 2\bar\alpha\bar\chi\big[2u_\theta\sigma_X^2\bar\alpha\bar\beta_1 + 2\sigma_X^2\bar\beta_1^2\big]$$
$$\qquad + \bar\chi^2\big[\bar v\bar\alpha(u_\theta+\bar\alpha)^2(u_\theta+2\bar\beta_1) + 4u_\theta\sigma_X^2\bar\alpha\bar\beta_1(u_\theta+\bar\alpha+\bar\beta_1)\big] + 4\sigma_Y^2\bar\chi^2(u_\theta+\bar\alpha)^2(1+2\bar\alpha)\bar\beta_2 + 8\sigma_Y^2(u_\theta+\bar\alpha)^2\bar\beta_1\bar\beta_2\bar\chi^3\Big\}.$$

⁴³ We use $\mathbb{R}_{++}$ to denote $(0, +\infty)$.
Now for arbitrary $(\rho, K) \in \mathbb{R}^2_{++}$, define
$$T^{SBC}(\gamma^o; \rho, K) := \min_{i \in \{1, 2, \dots, 5\}} \frac{K}{h_i(\gamma^o; \rho, K)}. \tag{B.41}$$
We claim that, by construction, for any $t < T^{SBC}(\gamma^o; \rho, K)$, if a solution exists at time $t$, then $\|z_t(s) - s\|_\infty < K$, $\gamma_t \in (0, \gamma^o]$ and $\chi_t \in [0, \bar\chi)$. To see this, suppose by way of contradiction that there is some $s \in S_\rho$ and some $t < T^{SBC}(\gamma^o; \rho, K)$ at which a solution to IVP-$s$ exists but either $|z_{it}(s) - s_i| \ge K$ for some $i \in \{1, 2, \dots, 5\}$, $\gamma_t \notin (0, \gamma^o]$ or $\chi_t \notin [0, \bar\chi)$; let $\tau$ be the infimum of such times. Now by Lemma 3, it cannot be that $\gamma_t \notin (0, \gamma^o]$ or $\chi_t \notin [0, \bar\chi)$ while $\mathbf{z}_t(s)$ exists, so (by continuity of $\mathbf{z}(s)$ w.r.t. time) it must be that for some $i \in \{1, 2, \dots, 5\}$, $|z_{i\tau}(s) - s_i| \ge K$, and the bounds $\gamma_t \in (0, \gamma^o]$ and $\chi_t \in [0, \bar\chi)$ are satisfied for all $t \in [0, \tau]$.

By construction of the $h_i(\gamma^o; \rho, K)$, for all $t \in [0, \tau]$ we have $|F_i(\mathbf{z}_t(s))| \le h_i(\gamma^o; \rho, K)$, and thus
$$|z_{i\tau}(s) - s_i| = \Big|\int_0^\tau F_i(\mathbf{z}_t(s))\,dt\Big| \le \int_0^\tau |F_i(\mathbf{z}_t(s))|\,dt \le \tau \cdot h_i(\gamma^o; \rho, K) < T^{SBC}(\gamma^o; \rho, K)\,h_i(\gamma^o; \rho, K) \le K,$$
where the second-to-last inequality uses that $\tau < T^{SBC}(\gamma^o; \rho, K)$ and the last uses the definition (B.41) of $T^{SBC}(\gamma^o; \rho, K)$; but via the strict inequality, this contradicts the definition of $\tau$, proving the claim. By the triangle inequality, it follows that $z_t(s) \in S_{\rho+K}(s_0)$ if a solution exists at time $t < T^{SBC}(\gamma^o; \rho, K)$. Together, these bounds imply that the solution cannot explode prior to time $T^{SBC}(\gamma^o; \rho, K)$. In other words, a unique solution must exist over $[0, T]$ for any $T < T^{SBC}(\gamma^o; \rho, K)$, and it satisfies the SBC.
In order to invoke a fixed point theorem, the key remaining step is to establish, through the following lemma, that $g$ is a well-defined, continuous self-map on $S_\rho$ when $T$ is below a threshold $\overline{T}(\gamma^o; \rho, K)$. The expression for the latter is provided in the proof of the lemma.

Lemma B.3. Fix $\gamma^o > 0$, $\rho > 0$ and $K > 0$. There exists $\overline{T}(\gamma^o; \rho, K) \le T^{SBC}(\gamma^o; \rho, K)$ such that for all $T < \overline{T}(\gamma^o; \rho, K)$, $g$ is a well-defined, continuous self-map on $S_\rho$.

Proof. First, the inequality $\overline{T}(\gamma^o; \rho, K) \le T^{SBC}(\gamma^o; \rho, K)$, which holds by construction (as carried out below), ensures that a unique solution to IVP-$s$ exists for all $s \in S_\rho$. Next, we argue that $g$ is continuous. Note that $g(s)$ can be written as $B(\gamma_T(s), \chi_T(s)) - [z_T(s) - s]$. Since $F$ is of class $C^1$ on the domain $S_{\rho+K} \times (0, \gamma^o] \times [0, \bar\chi)$, $\mathbf{z}_t(s)$ (which includes $\gamma$ and $\chi$) is locally Lipschitz continuous in $s$, uniformly in $t \in [0, T]$,⁴⁴ and $B$ is continuous; continuity of $g$ thus follows readily.
To complete the proof, we show that if $T < \overline{T}(\gamma^o; \rho, K)$, then $g$ satisfies the condition
$$\|g(s) - s_0\|_\infty \le \rho \quad \text{for all } s \in S_\rho,$$
which we refer to as the Self-Map Condition (SMC).

Note that $g(s) - s_0 = \Delta(s) - \int_0^T F(\mathbf{z}_t(s))\,dt$, where
$$\Delta(s) := B(\gamma_T(s), \chi_T(s)) - B(\gamma^o, 0) = \Big(0,\ 0,\ \frac{1+2u_\theta}{2}\Big[\frac{1}{2-\chi_T(s)} - \frac{1}{2}\Big],\ \frac{1+2u_\theta}{2}\Big[\frac{1}{2-\chi_T(s)} - \frac{1}{2}\Big],\ 0\Big).$$
The $h_i(\gamma^o; \rho, K)$ constructed in the proof of the previous lemma will provide us a bound for the components of $\int_0^T F(\mathbf{z}_t(s))\,dt$, but we must also bound $\Delta(s)$, and in particular $\Delta_3(s)$ and $\Delta_4(s)$. Note that $\Delta_3(s) = \Delta_4(s)$.
Recalling that $\chi \in [0, 1)$, the ODE for $\chi$ implies that
$$\dot\chi_t \le \gamma_t\big\{\alpha_t^2(1-\chi_t)/\sigma_Y^2\big\} \le \gamma^o\bar\alpha^2/\sigma_Y^2,$$
which depends on $(\rho, K)$ through $\bar\alpha$. Hence, by the fundamental theorem of calculus, we have $\chi_t = \int_0^t \dot\chi_s\,ds \le (\gamma^o\bar\alpha^2/\sigma_Y^2)\,t$.

Hence, using $\chi_T(s) \le 1$ to bound $2 - \chi_T(s)$ in the denominators from below by 1, we have the following bound for $\Delta_3(s) = \Delta_4(s)$:
$$|\Delta_3(s)| = \bigg|\frac{1+2u_\theta}{2}\Big[\frac{1}{2-\chi_T(s)} - \frac{1}{2}\Big]\bigg| = \frac{1+2u_\theta}{2}\bigg|\frac{\chi_T(s)}{2(2-\chi_T(s))}\bigg| \le \frac{1+2u_\theta}{4}\big(\gamma^o\bar\alpha^2/\sigma_Y^2\big)T.$$
For arbitrary $(\rho, K) \in \mathbb{R}^2_{++}$, define $\bar\Delta_i(\gamma^o; \rho, K) = \frac{1+2u_\theta}{4}(\gamma^o\bar\alpha^2/\sigma_Y^2)$ for $i \in \{3, 4\}$ and define $\bar\Delta_i(\gamma^o; \rho, K) = 0$ for $i \in \{1, 2, 5\}$. Note that for all $i \in \{1, 2, 3, 4, 5\}$, $\bar\Delta_i(\gamma^o; \rho, K)$ is proportional to $\gamma^o$, and by construction, $|\Delta_i(s)| \le T\,\bar\Delta_i(\gamma^o; \rho, K)$.
Now for arbitrary $(\rho, K) \in \mathbb{R}^2_{++}$, define
$$\overline{T}(\gamma^o; \rho, K) := \min\Big\{T^{SBC}(\gamma^o; \rho, K),\ \min_{i \in \{1, 2, \dots, 5\}} \frac{\rho}{\bar\Delta_i(\gamma^o; \rho, K) + h_i(\gamma^o; \rho, K)}\Big\}. \tag{B.42}$$

⁴⁴ See the theorem on page 397 of Hirsch et al. (2004).
To establish the SMC, it suffices to establish for each $i \in \{1, 2, \dots, 5\}$ that $|g_i(s) - s_{0i}| \le \rho$ for all $s \in S_\rho(s_0)$. We calculate
$$|g_i(s) - s_{0i}| = \Big|\underbrace{B_i(\gamma_T(s), \chi_T(s)) - B_i(\gamma^o, 0)}_{=\Delta_i(s)} - \int_0^T F_i(\mathbf{z}_t(s))\,dt\Big| \le |\Delta_i(s)| + \int_0^T |F_i(\mathbf{z}_t(s))|\,dt \le T\,\bar\Delta_i(\gamma^o; \rho, K) + T\,h_i(\gamma^o; \rho, K) < \rho,$$
where (i) in the second-to-last inequality we have used the definition of $\bar\Delta_i(\gamma^o; \rho, K)$ and that $|F_i(\mathbf{z}_t(s))| \le h_i(\gamma^o; \rho, K)$; and (ii) in the last inequality we have used that $T < \overline{T}(\gamma^o; \rho, K) \le \frac{\rho}{\bar\Delta_i(\gamma^o; \rho, K) + h_i(\gamma^o; \rho, K)}$ by construction. Hence, for all $i \in \{1, 2, \dots, 5\}$ we have $|g_i(s) - s_{0i}| \le \rho$, completing the proof.
To complete the solution to the boundary value problem (B.32)-(B.38), note that by Lemma B.3, $g$ is a well-defined, continuous self-map on the compact set $S_\rho$. By Brouwer's Theorem, there exists $s^*$ such that $s^* = g(s^*)$, and hence the solution to IVP-$s^*$ is a solution to the BVP. To see that $\overline{T}(\gamma^o) \in O(1/\gamma^o)$, note simply that $\gamma^o$ appears as an outside factor in the $h_i$ and the $\bar\Delta_i$, and hence in the denominators of the expressions defining $T^{SBC}(\gamma^o; \rho, K)$ and $\overline{T}(\gamma^o; \rho, K)$. Moreover, since $\rho$ and $K$ have been chosen arbitrarily, we can then optimize $\overline{T}(\gamma^o; \rho, K)$ over choices of $(\rho, K) \in \mathbb{R}^2_{++}$.
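The fixed-point construction underlying the proof can be mimicked numerically: integrate an IVP forward from a guessed initial value $s$, form $g(s) = B(\gamma_T(s), \chi_T(s)) - \int_0^T F$, and iterate until $s = g(s)$. The sketch below is a toy illustration only — `F_toy` and `B_toy` are hypothetical stand-ins for the actual maps $F$ and $B$ of (B.32)-(B.38) — with the horizon taken small so that $s \mapsto g(s)$ is a contraction, mirroring the role of the threshold $\overline{T}(\gamma^o; \rho, K)$:

```python
import numpy as np

def solve_ivp_euler(F, z0, T, n=2000):
    # Forward Euler on [0, T]; returns the terminal state and the time-integral of F.
    dt = T / n
    z = np.array(z0, dtype=float)
    integral = np.zeros_like(z)
    for _ in range(n):
        f = F(z)
        integral += f * dt
        z = z + f * dt
    return z, integral

# Toy stand-ins (hypothetical, chosen only to illustrate the scheme):
# state z = (s-component, gamma, chi)
def F_toy(z):
    s, gamma, chi = z
    return np.array([-s * gamma, -gamma**2, gamma * (1 - chi)])

def B_toy(gamma_T, chi_T):
    return 0.5 / (2.0 - chi_T)   # terminal target for the s-component

def g(s, gamma0, T):
    zT, integral = solve_ivp_euler(F_toy, [s, gamma0, 0.0], T)
    return B_toy(zT[1], zT[2]) - integral[0]

# Iterate s -> g(s); for T small the map is a contraction on a ball around s0.
gamma0, T = 1.0, 0.3
s = 0.0
for _ in range(100):
    s = g(s, gamma0, T)

# At the fixed point, the terminal value of the s-component matches B (cf. (B.39)):
zT, _ = solve_ivp_euler(F_toy, [s, gamma0, 0.0], T)
print(abs(zT[0] - B_toy(zT[1], zT[2])) < 1e-6)
```

The design mirrors the proof exactly: a fixed point of $g$ is precisely an initial condition whose forward trajectory hits the terminal target, i.e., a solution of the BVP.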
We argue that $\alpha$ is finite and that $\gamma$ and $\alpha$ are strictly positive. Finiteness comes directly from the definition $\alpha = \beta_1\chi + \beta_3$ and the finiteness of the underlying variables. This in turn implies that $\gamma_t > 0$ for all $t \in [0, T]$. The ODE for $\alpha$ is
$$\dot\alpha_t = \frac{\alpha_t(u_\theta+\alpha_t)\gamma_t\chi_t}{2\sigma_X^2\sigma_Y^2(1+u_\theta\chi_t)}\Big\{2u_\theta\sigma_X^2\alpha_t - v_{8t}\alpha_t(u_\theta+\alpha_t) - 4\sigma_Y^2(u_\theta+\alpha_t)\beta_{2t}\chi_t\Big\}. \tag{B.43}$$
By continuity of the solution to the BVP, the RHS of the equation above is locally Lipschitz continuous in $\alpha$, uniformly in $t$. Moreover, $\alpha_T = \beta_{1T}\chi_T + \beta_{3T} = \frac{1+u_\theta\chi_T}{2-\chi_T} > 0$. By a standard application of the comparison theorem to the backward version of the previous ODE, it must be that $\alpha_t > 0$ for all $t \in [0, T]$.

Using the solution to the BVP and the facts above, we solve for the rest of the equilibrium coefficients. First, we have directly
$$v_{2t} = \frac{2\sigma_Y^2\beta_{0t}}{\gamma_t\alpha_t}, \qquad v_{5t} = \frac{\sigma_Y^2[\beta_{1t}(2-\chi_t) - \beta_{3t} - u_\theta]}{\gamma_t\alpha_t}, \qquad v_{7t} = -\frac{2\sigma_Y^2(1-2\beta_{3t})}{\gamma_t\alpha_t}, \qquad v_{9t} = \frac{2\sigma_Y^2[\beta_{2t} - \beta_{1t}(1-\chi_t)]}{\gamma_t\alpha_t}.$$
The last three are clearly well-defined since $\alpha, \gamma > 0$.
The remaining ODEs for $\beta_0$, $v_0$, $v_1$, $v_3$ and $v_4$ are
$$\dot\beta_{0t} = -\frac{(u_\theta+\alpha_t)\gamma_t\chi_t}{2\sigma_X^2\sigma_Y^2(1-\chi_t)(1+u_\theta\chi_t)}\Big\{4u_\theta\sigma_Y^2\beta_{0t}\beta_{2t}(1-\chi_t)\chi_t + \alpha_t^2\big[v_{8t}\beta_{0t}(1-\chi_t) + v_{3t}\gamma_t(1+u_\theta\chi_t)\big]$$
$$\qquad + \alpha_t\big[u_\theta v_{3t}\gamma_t(1+u_\theta\chi_t) + \beta_{0t}(1-\chi_t)\big(-2u_\theta\sigma_X^2 + u_\theta v_{8t} + 4\sigma_Y^2\beta_{2t}\chi_t\big)\big]\Big\}, \quad \beta_{0T} = 0,$$
$$\dot v_{0t} = \beta_{0t}^2 + (u_\theta+\alpha_t)^2\gamma_t\chi_t + \frac{(u_\theta+\alpha_t)^2\gamma_t\chi_t^2}{\sigma_X^2}\big[-v_{6t} + \sigma_Y^2(u_\theta+\alpha_t-2\beta_{2t})/\alpha_t\big], \quad v_{0T} = 0,$$
$$\dot v_{1t} = -2\beta_{0t}, \quad v_{1T} = 0,$$
$$\dot v_{3t} = 2\beta_{0t}(\beta_{1t}+\beta_{2t})(1-\chi_t) + \frac{v_{3t}(u_\theta+\alpha_t)^2\gamma_t\chi_t}{\sigma_X^2(1-\chi_t)}, \quad v_{3T} = 0, \quad\text{and}$$
$$\dot v_{4t} = 1 - 2\beta_{3t}^2, \quad v_{4T} = 0.$$
Observe that the system for $(\beta_0, v_1, v_3)$ is uncoupled from $(v_0, v_4)$. By inspection, the former has solution $(\beta_0, v_1, v_3) = (0, 0, 0)$, and uniqueness follows from the associated operator being locally Lipschitz continuous in $(\beta_0, v_1, v_3)$ uniformly in $t \in [0, T]$. It follows that $v_2 = 0$, and the solutions for $(v_0, v_4)$ can be obtained directly by integration, given their terminal values. We conclude that a linear Markov equilibrium exists.
Appendix C: Proofs for Section 5

Proofs for Section 5.1

We first analyze the public case, then the no feedback case, and then the learning and payoff comparisons. Proposition 5 is then an immediate consequence of Lemmas C.5 and C.6.

Public Case

System of ODEs. We look for an equilibrium of the form $a_t = \beta_{0t} + \beta_{1t}M_t + \beta_{3t}\theta$, where $\overline{M}_t = M_t$, as the belief $M_t$ is publicly known.
The (backward) system of ODEs is
$$\dot\beta_{0t} = -r\beta_{0t}\beta_{3t}$$
$$\dot\beta_{1t} = -\beta_{1t}\beta_{3t}\Big(r + \frac{\beta_{3t}\gamma_t}{\sigma_Y^2}\Big)$$
$$\dot\beta_{3t} = -\beta_{3t}\Big[-r + \beta_{3t}\Big(r - \frac{\beta_{1t}\gamma_t}{\sigma_Y^2}\Big)\Big]$$
$$\dot\gamma_t = \frac{\beta_{3t}^2\gamma_t^2}{\sigma_Y^2},$$
with initial conditions
$$\beta_{00} = 0, \quad \beta_{10} = -\frac{\psi\gamma_0}{\sigma_Y^2} \le 0, \quad \beta_{30} = 1 \quad\text{and}\quad \gamma_0 = \gamma^F \in (0, \gamma^o).$$
Note: for the value function written as
$$V(\theta, M_t, t) = v_{0t} + v_{1t}\theta + v_{2t}M_t + v_{3t}\theta^2 + v_{4t}M_t^2 + v_{5t}\theta M_t,$$
we have the (backward) system
$$\dot v_{0t} = -rv_{0t} - \frac{(v_{2t}^2 - 4\sigma_Y^2 v_{4t})\gamma_t^2}{(-2\sigma_Y^2 + v_{5t}\gamma_t)^2}$$
$$\dot v_{1t} = -rv_{1t} - \frac{2v_{2t}\gamma_t}{-2\sigma_Y^2 + v_{5t}\gamma_t}$$
$$\dot v_{2t} = -rv_{2t} - \frac{v_{2t}(4\sigma_Y^2\gamma_t + 4v_{4t}\gamma_t^2)}{(-2\sigma_Y^2 + v_{5t}\gamma_t)^2}$$
$$\dot v_{3t} = -1 - rv_{3t} + \frac{4\sigma_Y^4}{(-2\sigma_Y^2 + v_{5t}\gamma_t)^2}$$
$$\dot v_{4t} = -rv_{4t} - \frac{4v_{4t}\gamma_t(2\sigma_Y^2 + v_{4t}\gamma_t)}{(-2\sigma_Y^2 + v_{5t}\gamma_t)^2}$$
$$\dot v_{5t} = -rv_{5t} - \frac{4\gamma_t[\sigma_Y^2(-2v_{4t} + v_{5t}) + v_{4t}v_{5t}\gamma_t]}{(-2\sigma_Y^2 + v_{5t}\gamma_t)^2},$$
with initial conditions $v_{00} = v_{10} = v_{20} = v_{30} = v_{50} = 0$ and $v_{40} = -\psi$.
In terms of the $\beta$ coefficients (for which existence of a solution is shown below), we have
$$v_{2t} = \frac{2\sigma_Y^2\beta_{0t}}{\beta_{3t}\gamma_t}, \qquad v_{4t} = \frac{\sigma_Y^2\beta_{1t}}{\beta_{3t}\gamma_t}, \qquad v_{5t} = -\frac{2\sigma_Y^2(1-\beta_{3t})}{\beta_{3t}\gamma_t}.$$
Since $\beta_{3t} > 0$ and thus $v_{5t}\gamma_t = -\frac{2\sigma_Y^2(1-\beta_{3t})}{\beta_{3t}} < 2\sigma_Y^2$, the denominator in each ODE is bounded away from zero. Given $v_2$, $v_4$ and $v_5$, the ODEs for $v_0$, $v_1$ and $v_3$ are linear and uncoupled and thus have solutions.
Existence of Linear Markov Equilibrium: r = 0 case. When $r = 0$, the backward system simplifies to
$$\dot\beta_{0t} = 0, \qquad \dot\beta_{1t} = -\frac{\beta_{1t}\beta_{3t}^2\gamma_t}{\sigma_Y^2}, \qquad \dot\beta_{3t} = \frac{\beta_{1t}\beta_{3t}^2\gamma_t}{\sigma_Y^2}, \qquad \dot\gamma_t = \frac{\beta_{3t}^2\gamma_t^2}{\sigma_Y^2},$$
with initial conditions
$$\beta_{00} = 0, \quad \beta_{10} = -\frac{\psi\gamma_0}{\sigma_Y^2} \le 0, \quad \beta_{30} = 1 \quad\text{and}\quad \gamma_0 = \gamma^F \in (0, \gamma^o).$$
Define $\hat\psi := \psi\gamma^o/\sigma_Y^2$ and $\hat T := T\gamma^o/\sigma_Y^2$.

Lemma C.1. Suppose $r = 0$. For all $T > 0$ and all $\psi > 0$, there exists a linear Markov equilibrium. The corresponding $\gamma_T \in (0, \gamma^o)$ satisfies $g^{pub}(\gamma_T/\gamma^o) = 0$, where
$$g^{pub}(\rho) := -\hat T\hat\psi\rho^2(1-\rho) + \rho(1+\hat T) - 1.$$
In addition, $\beta_3 \in (0, 1]$ is increasing and $\beta_1 < 0$ is decreasing, and $\beta_1 + \beta_3$ is constant.
Proof. Since $g^{pub}(0) = -1 < 0 < g^{pub}(1) = \hat T$, there exists $\gamma^F \in (0, \gamma^o)$ as in the statement of the lemma. We now show that for any such $\gamma^F$, there exists a solution to the backward IVP with $\gamma_0 = \gamma^F$, and it satisfies $\gamma_T = \gamma^o$. The proof is constructive, and the solution is unique conditional on $\gamma^F$.

Note that $\dot\beta_{1t} + \dot\beta_{3t} = 0$, so $\beta_1 + \beta_3$ is constant, and
$$\beta_{1t} + \beta_{3t} = \beta_{10} + \beta_{30} = 1 + \beta_{10} \implies \beta_{1t} = 1 + \beta_{10} - \beta_{3t}.$$
Hence, a uniformly bounded solution for $\beta_3$ exists if and only if the same holds for $\beta_1$.

Next, define $\Pi := \beta_1\gamma$ and observe that $\dot\Pi \equiv 0$, so
$$\beta_{1t}\gamma_t = \beta_{10}\gamma^F \implies \beta_{1t} = \beta_{10}\gamma^F/\gamma_t \tag{C.1}$$
$$\phantom{\beta_{1t}\gamma_t = \beta_{10}\gamma^F} \implies \beta_{3t} = 1 + \beta_{10}(1 - \gamma^F/\gamma_t), \tag{C.2}$$
where $\gamma_t \ge \gamma^F > 0$ for all $t$ over the interval of existence, since $\gamma$ is nondecreasing. Now $|\beta_{1t}| \le |\beta_{10}|$, so both $\beta_1$ and $\beta_3$ are uniformly bounded; we now show that $\gamma$ is uniformly bounded above.
Using (C.2), the ODE for $\gamma$ is
$$\dot\gamma_t = [1 + \beta_{10}(1 - \gamma^F/\gamma_t)]^2\gamma_t^2/\sigma_Y^2 = [(1+\beta_{10})\gamma_t - \beta_{10}\gamma^F]^2/\sigma_Y^2.$$
Integrating and using the initial condition for $\beta_{10}$ and $\gamma_0 = \gamma^F$ yields
$$\gamma_t = \frac{\gamma^F[\sigma_Y^4 + t\psi(\gamma^F)^2]}{\sigma_Y^4 - t\gamma^F(-\gamma^F\psi + \sigma_Y^2)},$$
wherever this exists. Writing $\rho = \gamma^F/\gamma^o$, the denominator is strictly positive if and only if
$$0 < \sigma_Y^4\Big[1 - \frac{t\gamma^o}{\sigma_Y^2}(-\rho^2\hat\psi + \rho)\Big] =: \sigma_Y^4 h(t).$$
Now $h(t)$ is linear and thus bounded between $h(0) = 1 > 0$ and $h(T) = 1 - \hat T(-\rho^2\hat\psi + \rho) = \frac{\rho^2\hat T}{1-\rho} > 0$, where we have used the identity $g^{pub}(\rho) = 0$ to eliminate $\rho^2\hat\psi$. We conclude that the denominator is strictly positive for all $t$.

Moreover, at time $t = T$, we have
$$\gamma_T = \frac{\gamma^F[\sigma_Y^4 + T\psi(\gamma^F)^2]}{\sigma_Y^4 h(T)} = \frac{\rho\gamma^o\sigma_Y^4[1 + \hat T\hat\psi\rho^2]}{\sigma_Y^4\frac{\hat T\rho^2}{1-\rho}} = \gamma^o\,\frac{[1 + \hat T\hat\psi\rho^2](1-\rho)}{\hat T\rho} = \gamma^o,$$
where the last equality follows from $g^{pub}(\rho) = 0$.
Returning to (C.1), we obtain that $\beta_1$ is negative and increasing (in the backward system), and, applying the comparison theorem, it cannot change sign. From (C.2), we obtain that $\beta_3$ is less than 1, is decreasing in the backward system, and cannot change sign.
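The construction in the proof of Lemma C.1 can be checked numerically: bisect for the root $\rho$ of $g^{pub}$, set $\gamma^F = \rho\gamma^o$, and confirm that the closed-form path for $\gamma_t$ returns to the prior variance at $t = T$. The parameter values below are hypothetical illustration values:

```python
# Hypothetical parameters
sigmaY2 = 1.0       # sigma_Y^2
gamma_o = 1.0       # prior variance gamma^o
T, psi = 2.0, 1.5   # horizon and terminal weight
That = T * gamma_o / sigmaY2     # \hat T
phat = psi * gamma_o / sigmaY2   # \hat psi

def g_pub(rho):
    return -That * phat * rho**2 * (1 - rho) + rho * (1 + That) - 1

# g_pub(0) = -1 < 0 < g_pub(1) = That, so bisection finds a root in (0, 1)
lo, hi = 1e-12, 1.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    (lo, hi) = (mid, hi) if g_pub(mid) < 0 else (lo, mid)
rho = 0.5 * (lo + hi)
gammaF = rho * gamma_o

# Closed-form posterior variance from the proof; gamma_T should recover gamma^o
def gamma_t(t):
    num = gammaF * (sigmaY2**2 + t * psi * gammaF**2)
    den = sigmaY2**2 - t * gammaF * (sigmaY2 - gammaF * psi)
    return num / den

print(round(gamma_t(T), 6))   # 1.0, i.e. gamma_T = gamma^o
```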
Define functions $\overline\psi, \underline\psi: [8, \infty) \to [27/8, \infty)$ by
$$\overline\psi(\hat T) = -\frac{1}{\hat T} + \frac{5}{2} + \frac{\hat T}{8} + \frac{(\hat T-8)^{3/2}}{8\sqrt{\hat T}} \quad\text{and}\quad \underline\psi(\hat T) = -\frac{1}{\hat T} + \frac{5}{2} + \frac{\hat T}{8} - \frac{(\hat T-8)^{3/2}}{8\sqrt{\hat T}}.$$

Lemma C.2. There is a unique linear Markov equilibrium if $\hat T \le 8$, or if $\hat T > 8$ and $\hat\psi \notin [\underline\psi(\hat T), \overline\psi(\hat T)]$. For $\hat\psi \le 27/8$, there is a unique linear Markov equilibrium for all $\hat T$ (and thus for all $T$). For the remaining parameter settings, there is multiplicity of equilibria.
Proof. We have $(g^{pub})'(\rho) = -\hat T\hat\psi\rho(2-3\rho) + 1 + \hat T$. This quadratic has real roots if and only if $\hat\psi \ge \frac{3(1+\hat T)}{\hat T}$; since $(g^{pub})'(0) = 1 + \hat T > 0$, $\hat\psi < \frac{3(1+\hat T)}{\hat T}$ implies that $g^{pub}$ is strictly increasing on $[0, 1]$, in which case there is a unique solution to $g^{pub}(\rho) = 0$. For the rest of the proof, suppose that $\hat\psi > \frac{3(1+\hat T)}{\hat T}$. Denote by $\underline\rho$ and $\overline\rho$ the roots of $(g^{pub})'(\rho) = 0$, with $0 < \underline\rho < \overline\rho < 1$, and note that these are continuously differentiable in $\hat T$ and $\hat\psi$ on the assumed domain:
$$\underline\rho := \frac{1 - \sqrt{1 - 3(1+\hat T)/(\hat T\hat\psi)}}{3}, \qquad \overline\rho := \frac{1 + \sqrt{1 - 3(1+\hat T)/(\hat T\hat\psi)}}{3}.$$
The function $g^{pub}$ is increasing on $[0, \underline\rho) \cup [\overline\rho, 1]$ and decreasing on $[\underline\rho, \overline\rho)$, so a necessary and sufficient condition for uniqueness is that either $g^{pub}(\underline\rho) < 0$ or $g^{pub}(\overline\rho) > 0$.
By the envelope theorem,
$$\frac{d}{d\hat\psi}g^{pub}(\underline\rho) = \frac{\partial}{\partial\hat\psi}g^{pub}(\rho)\Big|_{\rho=\underline\rho} < 0 \quad\text{and}\quad \frac{d}{d\hat\psi}g^{pub}(\overline\rho) = \frac{\partial}{\partial\hat\psi}g^{pub}(\rho)\Big|_{\rho=\overline\rho} < 0.$$
By construction we have $g^{pub}(\underline\rho) > g^{pub}(\overline\rho)$. Moreover, we have the limits
$$\lim_{\hat\psi\,\downarrow\,3(1+\hat T)/\hat T} g^{pub}(\underline\rho) = g^{pub}(1/3) = \frac{\hat T - 8}{9} \quad\text{and}\quad \lim_{\hat\psi\,\uparrow\,\infty} g^{pub}(\underline\rho) = -1.$$
For $\hat T \le 8$, the first limit is nonpositive, and hence for all $\hat\psi$ in the domain, $g^{pub}(\underline\rho), g^{pub}(\overline\rho) < 0$, giving uniqueness. For $\hat T > 8$, the first limit is positive, and thus there exist thresholds $\underline\psi, \overline\psi$ with $\frac{3(1+\hat T)}{\hat T} < \underline\psi < \overline\psi < \infty$ defined implicitly by
$$g^{pub}(\overline\rho(\underline\psi); \underline\psi) = 0 \tag{C.3}$$
$$g^{pub}(\underline\rho(\overline\psi); \overline\psi) = 0, \tag{C.4}$$
such that (i) if $\hat\psi < \underline\psi$, then $g^{pub}(\overline\rho) > 0$, giving uniqueness, and (ii) if $\hat\psi > \overline\psi$, then $g^{pub}(\underline\rho) < 0$, giving uniqueness. We now identify $\underline\psi$ and $\overline\psi$. Note that if either (C.3) or (C.4) holds, there exists
a double root at some $\rho_2 \in (0, 1)$, and $g^{pub}$ must be of the form
$$g^{pub}(\rho) \equiv K(\rho - \rho_1)(\rho - \rho_2)^2 \iff -\hat T\hat\psi\rho^2(1-\rho) + \rho(1+\hat T) - 1 \equiv K(\rho - \rho_1)(\rho - \rho_2)^2.$$
Matching coefficients w.r.t. powers of $\rho$ yields the system
$$\hat T\hat\psi = K \tag{C.5}$$
$$-\hat T\hat\psi = -K(\rho_1 + 2\rho_2) \tag{C.6}$$
$$1 + \hat T = K(\rho_2^2 + 2\rho_1\rho_2) \tag{C.7}$$
$$-1 = -K\rho_1\rho_2^2 \tag{C.8}$$
in the variables $(K, \hat\psi, \rho_1, \rho_2)$. Together, (C.5) and (C.6) imply $\rho_1 = 1 - 2\rho_2$, with which (C.8) becomes $1 = K(1-2\rho_2)\rho_2^2$. Dividing (C.7) by this yields $1 + \hat T = \frac{2-3\rho_2}{\rho_2(1-2\rho_2)}$. Solving this yields
$$\rho_2 = \frac{4 + \hat T \pm \sqrt{(\hat T-8)\hat T}}{4(1+\hat T)}.$$
Plugging these values of $\rho_2$ back into (C.7), solving for $K$ and then using (C.5) to solve for $\hat\psi$ yields
$$\underline\psi, \overline\psi \in \left\{\frac{-16(1+\hat T)^3}{\hat T\big(4+\hat T-\sqrt{\hat T(\hat T-8)}\big)\big(4-5\hat T+3\sqrt{\hat T(\hat T-8)}\big)},\ \frac{-32(1+\hat T)^3}{\hat T\big(-2+\hat T+\sqrt{\hat T(\hat T-8)}\big)\big(4+\hat T+\sqrt{\hat T(\hat T-8)}\big)}\right\}.$$
Multiplying each fraction by the conjugate of each factor in the denominator and simplifying yields the expressions given before the statement of the lemma.
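The thresholds can be spot-checked numerically. The snippet below (the horizon value $\hat T = 12$ is a hypothetical choice with $\hat T > 8$) computes $\underline\psi(\hat T)$ and $\overline\psi(\hat T)$ from the formulas above and counts roots of $g^{pub}$ by sign changes on a fine grid, finding a unique equilibrium value of $\rho$ outside the interval and three inside it:

```python
import math

That = 12.0  # hypothetical horizon with That > 8, so multiplicity can arise

def psi_thresh(T, sign):
    # Lemma C.2 thresholds: -1/T + 5/2 + T/8 ± (T-8)^{3/2} / (8 sqrt(T))
    return -1.0 / T + 2.5 + T / 8.0 + sign * (T - 8.0) ** 1.5 / (8.0 * math.sqrt(T))

psi_lo, psi_hi = psi_thresh(That, -1), psi_thresh(That, +1)

def count_roots(phat, n=100000):
    # count sign changes of g_pub over a fine grid on (0, 1)
    g = lambda r: -That * phat * r**2 * (1 - r) + r * (1 + That) - 1
    xs = [i / n for i in range(1, n)]
    vals = [g(x) for x in xs]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)

# one root below psi_lo and above psi_hi; three roots strictly in between
counts = [count_roots(3.0), count_roots(3.9), count_roots(5.0)]
print(psi_lo < 3.9 < psi_hi, counts)   # True [1, 3, 1]
```

At $\hat T = 12$ the thresholds evaluate to roughly $\underline\psi \approx 3.63$ and $\overline\psi \approx 4.21$, consistent with the double-root values obtained from (C.5)-(C.8).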
No Feedback Case

System of ODEs. We look for an equilibrium of the form $a_t = \beta_{0t}m_0 + \beta_{1t}\overline M_t + \beta_{3t}\theta$, where $\overline M_t = \mathbb{E}^1_t[M_t]$.

The backward system is
$$\dot\beta_{0t} = -\frac{\alpha_t[r\sigma_Y^2\beta_{0t} + m_0\beta_{1t}^2\gamma_t(1-\chi_t)]}{\sigma_Y^2}$$
$$\dot\beta_{1t} = -\frac{\alpha_t\beta_{1t}[r\sigma_Y^2 + \beta_{3t}\gamma_t - \beta_{1t}\gamma_t(1-\chi_t)]}{\sigma_Y^2}$$
$$\dot\beta_{3t} = \frac{\alpha_t[r\sigma_Y^2(1-\beta_{3t}) + \beta_{1t}\beta_{3t}\gamma_t]}{\sigma_Y^2}$$
$$\dot\alpha_t = r(1-\alpha_t)\alpha_t$$
$$\dot\gamma_t = \frac{\alpha_t^2\gamma_t^2}{\sigma_Y^2},$$
with initial conditions
$$\beta_{00} = 0, \quad \beta_{10} = -\frac{\psi\gamma_0}{\sigma_Y^2 + \psi\gamma_0\chi(\gamma_0)}, \quad \beta_{30} = 1, \quad \alpha_0 = \frac{\sigma_Y^2}{\sigma_Y^2 + \psi\gamma_0\chi(\gamma_0)} \quad\text{and}\quad \gamma_0 = \gamma^F,$$
where $\chi(\gamma) := 1 - \gamma/\gamma^o$.

Writing the value function as
$$V(t, \theta, \overline M_t) = v_{0t} + v_{1t}\theta + v_{2t}\overline M_t + v_{3t}\theta^2 + v_{4t}\overline M_t^2 + v_{5t}\theta\overline M_t,$$
we have the (backward) system
$$\dot v_{0t} = -rv_{0t} - \frac{v_{2t}\gamma_t^2[v_{2t} + 4m_0v_{4t}(1-\chi_t)]}{[-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)]^2}$$
$$\dot v_{1t} = -rv_{1t} - \frac{2\gamma_t\big[2m_0v_{4t}v_{5t}\gamma_t(1-\chi_t) + v_{2t}\big(-2\sigma_Y^2 + \gamma_t[v_{5t} + 2v_{4t}\chi_t]\big)\big]}{[-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)]^2}$$
$$\dot v_{2t} = -rv_{2t} - \frac{4\gamma_t\big[\sigma_Y^2v_{2t} + 2m_0v_{4t}^2\gamma_t(1-\chi_t)\big]}{[-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)]^2}$$
$$\dot v_{3t} = -rv_{3t} - \frac{v_{5t}\gamma_t\big[-4\sigma_Y^2 + \gamma_t(v_{5t} + 4v_{4t}\chi_t)\big]}{[-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)]^2}$$
$$\dot v_{4t} = -rv_{4t} + \frac{4v_{4t}\gamma_t[-2\sigma_Y^2 + v_{4t}\gamma_t]}{[-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)]^2}$$
$$\dot v_{5t} = -rv_{5t} - \frac{4v_{4t}\gamma_t}{-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)} + \frac{4v_{5t}\gamma_t[-\sigma_Y^2 + v_{4t}\gamma_t]}{[-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t)]^2},$$
with initial conditions $v_{10} = v_{20} = v_{30} = v_{50} = 0$, $v_{00} = -\psi\gamma_0\chi(\gamma_0)$ and $v_{40} = -\psi$. Given a solution to the $\beta$ system (with existence shown in the next subsection),
$$v_{2t} = \frac{2\sigma_Y^2\beta_{0t}}{\alpha_t\gamma_t}, \qquad v_{4t} = \frac{\sigma_Y^2\beta_{1t}}{\alpha_t\gamma_t}, \qquad v_{5t} = -\frac{2\sigma_Y^2(1-\beta_{3t})}{\alpha_t\gamma_t}.$$
As $\alpha$ is positive and constant (as shown in the next subsection), the coefficients $v_2$, $v_4$ and $v_5$ are well-defined in terms of the solution to the $\beta$ system. Now, in the ODEs for $v_0$, $v_1$ and $v_3$, the denominators satisfy $-2\sigma_Y^2 + \gamma_t(v_{5t} + 2v_{4t}\chi_t) = -\frac{2\sigma_Y^2}{\alpha_t}$, which is bounded away from 0. Given $v_2$, $v_4$ and $v_5$, these ODEs are linear and uncoupled and thus have solutions.
Existence of Linear Markov Equilibrium: r = 0 case. When $r = 0$, the associated backward system is
$$\dot\beta_1 = -\frac{\beta_1\gamma(\beta_3 + \beta_1\chi - \beta_1)(\beta_3 + \beta_1\chi)}{\sigma_Y^2}$$
$$\dot\beta_3 = \frac{\beta_3\beta_1\gamma(\beta_3 + \beta_1\chi)}{\sigma_Y^2}$$
$$\dot\beta_0 = -\frac{\beta_1^2\gamma(1-\chi)(\beta_3 + \beta_1\chi)}{\sigma_Y^2}$$
$$\dot\gamma = \frac{\gamma^2(\beta_3 + \beta_1\chi)^2}{\sigma_Y^2},$$
where $\chi = \chi(\gamma) = 1 - \gamma/\gamma^o$, and with initial conditions
$$\beta_{30} = 1, \quad \beta_{00} = 0, \quad \gamma_0 = \gamma^F \in (0, \gamma^o), \quad\text{and}\quad \beta_{10} = -\frac{\psi\gamma^F}{\sigma_Y^2 + \psi\gamma^F\chi(\gamma^F)} < 0.$$

Lemma C.3. For all $T > 0$ and $\psi > 0$, there exists a linear Markov equilibrium. The corresponding $\gamma_T \in (0, \gamma^o)$ satisfies $g^{NF}(\gamma_T/\gamma^o) = 0$, where
$$g^{NF}(\rho) := \hat T\rho - (1-\rho)[1 + \hat\psi\rho(1-\rho)]^2, \quad \rho \in [0, 1],$$
with $\hat T := T\gamma^o/\sigma_Y^2$ and $\hat\psi := \psi\gamma^o/\sigma_Y^2$.
Proof. Observe first that since $g^{NF}(0) = -1 < 0$ and $g^{NF}(1) = \hat T > 0$, there exists $\gamma^F \in (0, \gamma^o)$ as in the statement of the lemma. Now fix any such solution as the initial condition for the posterior variance in the backward IVP. We will show that this problem admits a solution over $[0, T]$ with the property that $\gamma_T = \gamma^o$.

Consider the backward IVP indexed by $\gamma^F$ over its maximal interval of existence. Notice first that $\dot\beta_0 + \dot\beta_1 + \dot\beta_3 \equiv 0$, and so
$$\beta_{0t} + \beta_{1t} + \beta_{3t} = \beta_{00} + \beta_{10} + \beta_{30} = 1 - \frac{\psi\gamma^F}{\sigma_Y^2 + \psi\gamma^F\chi(\gamma^F)}.$$
Thus, as long as $\beta_1$ and $\beta_3$ exist, $\beta_0$ will too, and since $\beta_0$ does not appear in any of the other ODEs, we can omit it from the analysis.

Similarly, $\alpha := \beta_3 + \beta_1\chi$ satisfies $\dot\alpha \equiv 0$, and so
$$\alpha_t := \beta_{3t} + \beta_{1t}\chi_t = \bar\alpha := 1 - \frac{\psi\gamma^F\chi(\gamma^F)}{\sigma_Y^2 + \psi\gamma^F\chi(\gamma^F)} = \frac{\sigma_Y^2}{\sigma_Y^2 + \psi\gamma^F\chi(\gamma^F)} \in (0, 1).$$
Consider now the subsystem
$$\dot\beta_1 = -\frac{\beta_1\gamma(\beta_3 + \beta_1\chi - \beta_1)(\beta_3 + \beta_1\chi)}{\sigma_Y^2} = -\frac{\beta_1\bar\alpha\gamma(\bar\alpha - \beta_1)}{\sigma_Y^2}, \qquad \dot\beta_3 = \frac{\beta_3\beta_1\gamma(\beta_3 + \beta_1\chi)}{\sigma_Y^2} = \frac{\beta_3\beta_1\gamma\bar\alpha}{\sigma_Y^2}, \tag{C.9}$$
and observe that since $\beta_{10} < 0$ and $\beta_{30} = 1 > 0$, the same inequalities hold in a neighborhood of zero.

We claim that $\beta_3$ and $\beta_1$ do not change signs. First, both cannot vanish at the same time, as this would violate $\alpha_t = \bar\alpha > 0$. Now suppose $\beta_3$ is the first to do so, say at time $t$; then $\beta_{1s} < 0$ for all $s \in [0, t]$, and by the comparison theorem, $\beta_3 > 0$ on all of $[0, t]$, a contradiction. Likewise, a contradiction obtains if $\beta_1$ vanishes first. We therefore conclude that $\beta_1$ is increasing while $\beta_3$ is decreasing, and that they lie in $[\beta_{10}, 0]$ and $[0, 1]$, respectively, as long as they exist.
The existence of a solution of the IVP over $[0, T]$ then reduces to the existence of a solution to the $\gamma$-ODE when the latter is driven by $\bar\alpha$. As long as this solution exists, straightforward integration shows that
$$\gamma_t = \frac{\gamma^F\sigma_Y^2}{\sigma_Y^2 - \gamma^F\bar\alpha^2 t}.$$
Since $\gamma^F > 0$, the right-hand side is well defined over $[0, T]$ if $\sigma_Y^2 - \gamma^F\bar\alpha^2 T > 0$. Using that $\bar\alpha = \sigma_Y^2/[\sigma_Y^2 + \psi\gamma^F(1 - \gamma^F/\gamma^o)] = 1/[1 + \hat\psi\rho(1-\rho)]$, where $\rho = \gamma^F/\gamma^o$ and $\hat\psi = \psi\gamma^o/\sigma_Y^2$, we get
$$\sigma_Y^2 - \gamma^F\bar\alpha^2 T > 0 \iff 1 - \rho\hat T\bar\alpha^2 > 0 \iff [1 + \hat\psi\rho(1-\rho)]^2 - \rho\hat T > 0.$$
By definition of $\gamma^F$, however, $g^{NF}(\rho) = 0$, which implies
$$[1 + \hat\psi\rho(1-\rho)]^2 - \rho\hat T = \rho[1 + \hat\psi\rho(1-\rho)]^2 > 0.$$
Moreover,
$$\gamma_T = \frac{\gamma^F\sigma_Y^2}{\sigma_Y^2 - \gamma^F\bar\alpha^2 T} = \frac{\gamma^F}{1 - \rho\hat T\bar\alpha^2} = \frac{\gamma^F[1 + \hat\psi\rho(1-\rho)]^2}{\rho[1 + \hat\psi\rho(1-\rho)]^2} = \gamma^o,$$
concluding the proof.
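As in the public case, the closed forms in the proof of Lemma C.3 can be verified directly. The parameter values below are hypothetical illustration values:

```python
# Hypothetical parameters
sigmaY2, gamma_o = 1.0, 1.0
T, psi = 2.0, 1.5
That, phat = T * gamma_o / sigmaY2, psi * gamma_o / sigmaY2

def g_nf(rho):
    return That * rho - (1 - rho) * (1 + phat * rho * (1 - rho)) ** 2

# g_nf(0) = -1 < 0 < g_nf(1) = That: bisect for the root
lo, hi = 1e-12, 1.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    (lo, hi) = (mid, hi) if g_nf(mid) < 0 else (lo, mid)
rho = 0.5 * (lo + hi)

gammaF = rho * gamma_o
alpha_bar = 1.0 / (1.0 + phat * rho * (1 - rho))   # constant weight on theta
gamma_T = gammaF * sigmaY2 / (sigmaY2 - gammaF * alpha_bar**2 * T)
print(round(gamma_T, 6))   # 1.0, i.e. gamma_T = gamma^o
```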
Lemma C.4. Suppose that $\hat\psi \in (0, 2]$. Then $g^{NF}$ has a unique root in $[0, 1]$, and thus there is a unique LME.

Proof. To show uniqueness, we prove that, for this range of values of $\hat\psi$, the derivative of $g^{NF}$ is positive at any point satisfying $g^{NF}(\rho) = 0$; thus, $g^{NF}$ can cross zero only once, and it does so from below.

It is easy to verify that
$$(g^{NF})'(\rho) = \hat T + [1 + \hat\psi\rho(1-\rho)]^2 - 2\hat\psi(1-\rho)(1-2\rho)[1 + \hat\psi\rho(1-\rho)].$$
At a crossing point, however, $\hat T\rho + \rho[1 + \hat\psi\rho(1-\rho)]^2 = [1 + \hat\psi\rho(1-\rho)]^2$, and so
$$(g^{NF})'(\rho) = \frac{1 + \hat\psi\rho(1-\rho)}{\rho}\Big\{1 + \hat\psi\rho(1-\rho) - 2\hat\psi\rho(1-\rho)(1-2\rho)\Big\}.$$
Since $\hat\psi > -4$ and $\rho \in (0, 1)$, we have $1 + \hat\psi\rho(1-\rho) > 0$. Thus, it suffices to show that
$$Q(\rho) := 1 - \hat\psi\rho + 5\hat\psi\rho^2 - 4\hat\psi\rho^3 > 0 \quad\text{for } \rho \in (0, 1).$$
Now $Q(\rho) = 1 - \hat\psi\rho(1-\rho)(1-4\rho)$, so when $\hat\psi > 0$, this cubic is positive (and at least 1) over $[1/2, 1]$. Also, over $[0, 1/2]$ we have
$$Q(\rho) = 1 - \hat\psi\rho + 5\hat\psi\rho^2[1 - 4\rho/5] > 1 - \hat\psi\rho > 1 - \hat\psi/2.$$
Since $\hat\psi \le 2$, the result follows.
Learning and Payoff Comparisons

Lemma C.5. If $\hat\psi \in (0, 2]$, then there is more learning in the public case for all $T > 0$.

Proof. Let $\rho^x = \gamma_T^x/\gamma^o \in (0, 1)$, where $\gamma_T^x$ is the terminal value of $\gamma$ in the BVP of case $x \in \{\text{public}, \text{no feedback}\}$. When $\hat\psi \in (0, 2]$, these values are the unique solutions to the equations
$$0 = g^{NF}(\rho) := \rho\hat T - (1-\rho)[1 + \hat\psi\rho(1-\rho)]^2 = \rho(1+\hat T) - 1 - \hat\psi\rho(1-\rho)^2[2 + \hat\psi\rho(1-\rho)]$$
$$0 = g^{pub}(\rho) := \rho(1+\hat T) - 1 - \hat\psi\hat T\rho^2(1-\rho), \tag{C.10}$$
respectively. In particular, observe that $\rho^x > 1/(1+\hat T)$, $x \in \{\text{public}, \text{no feedback}\}$. Our goal is to show $\rho^{pub} < \rho^{NF}$.

Now, using that $\rho^{pub}(1+\hat T) - 1 = \hat\psi\hat T(\rho^{pub})^2(1-\rho^{pub})$, we get that
$$g^{NF}(\rho^{pub}) = \underbrace{\frac{\hat\psi(1-\rho^{pub})}{\hat T}}_{>0}\Big\{\hat T^2(\rho^{pub})^2 - (1-\rho^{pub})\big[2\rho^{pub}\hat T + \rho^{pub}(1+\hat T) - 1\big]\Big\}.$$
Thus, letting
$$Q(\rho) := \hat T^2\rho^2 - (1-\rho)\big[2\rho\hat T + \rho(1+\hat T) - 1\big] = \rho^2(\hat T^2 + 3\hat T + 1) - \rho(3\hat T + 2) + 1,$$
it suffices to show that $Q(\rho^{pub}) < 0$, as $g^{NF}(\rho) < 0$ if and only if $\rho < \rho^{NF}$.

Observe that the roots of $Q$ are given by
$$\rho_- := \frac{(3-\sqrt 5)\hat T + 2}{2(\hat T^2 + 3\hat T + 1)} \quad\text{and}\quad \rho_+ := \frac{(3+\sqrt 5)\hat T + 2}{2(\hat T^2 + 3\hat T + 1)},$$
and that $\rho_- < \frac{1}{1+\hat T} < \rho_+$. Consequently, it suffices to show that $g^{pub}(\rho_+) > 0$: this ensures that $\rho^{pub} < \rho_+$, and since $\rho^{pub} > \frac{1}{1+\hat T} > \rho_-$, this implies that $Q(\rho^{pub}) < 0$.
Straightforward algebraic manipulation yields that
$$g^{pub}(\rho_+) > 0 \iff 4(1+\hat T)\big[(3+\sqrt 5)\hat T + 2\big]\big[\hat T^2 + 3\hat T + 1\big]^2 - 8\big[\hat T^2 + 3\hat T + 1\big]^3 - \hat\psi\hat T\big[(3+\sqrt 5)\hat T + 2\big]^2\big[2\hat T^2 + (3-\sqrt 5)\hat T\big] > 0.$$
Since the constraint is tightest when $\hat\psi = 2$, we aim to show that, for all $\hat T > 0$,
$$4(1+\hat T)\big[(3+\sqrt 5)\hat T + 2\big]\big[\hat T^2 + 3\hat T + 1\big]^2 - 8\big[\hat T^2 + 3\hat T + 1\big]^3 - 2\hat T\big[(3+\sqrt 5)\hat T + 2\big]^2\big[2\hat T^2 + (3-\sqrt 5)\hat T\big] > 0$$
$$\iff 4\big[(1+\sqrt 5)\hat T + (\sqrt 5 - 1)\big]\big[\hat T^2 + 3\hat T + 1\big]^2 - 2\big[(3+\sqrt 5)\hat T + 2\big]^2\big[2\hat T^2 + (3-\sqrt 5)\hat T\big] > 0.$$
The polynomial on the left-hand side can then be written as $\hat T\sum_{i=0}^5 a_i\hat T^i$, where
$$a_5 = 4(1+\sqrt 5) > 0, \quad a_4 = 4(-9+\sqrt 5) < 0, \quad a_3 = 4(-13+11\sqrt 5) > 0,$$
$$a_2 = 68(\sqrt 5 - 1) > 0, \quad a_1 = 4(-11+9\sqrt 5) > 0, \quad a_0 = 4(\sqrt 5 - 1) > 0. \tag{C.11}$$
It is trivial to check that $a_3 + a_4\hat T + a_5\hat T^2 > 0$ for all $\hat T > 0$. Since the rest of the coefficients are strictly positive, the proof is concluded.
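The conclusion of Lemma C.5 — $\rho^{pub} < \rho^{NF}$ whenever $\hat\psi \in (0, 2]$ — can be spot-checked on a grid of (hypothetical) parameter values:

```python
def root_bisect(g):
    # bisect on (0, 1); both g's satisfy g(0) = -1 < 0 < g(1) = That
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        (lo, hi) = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

ok = True
for That in [0.1, 0.5, 1.0, 2.0, 5.0, 20.0]:
    for phat in [0.1, 0.5, 1.0, 1.5, 2.0]:
        g_pub = lambda r, T=That, p=phat: -T * p * r**2 * (1 - r) + r * (1 + T) - 1
        g_nf = lambda r, T=That, p=phat: T * r - (1 - r) * (1 + p * r * (1 - r)) ** 2
        rho_pub, rho_nf = root_bisect(g_pub), root_bisect(g_nf)
        # smaller terminal variance ratio = more learning in the public case
        ok = ok and rho_pub < rho_nf
print(ok)   # True
```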
Let $V^x$ denote the ex ante payoff to player 1 in the case $x \in \{\text{public}, \text{no feedback}\}$. It follows that
$$V^{pub} = \mathbb{E}_0\Big[-\int_0^T (a_t - \theta)^2 dt - \psi M_T^2\Big] = -\int_0^T \mathbb{E}_0\big[(\beta_{1t}M_t + [\beta_{3t} - 1]\theta)^2\big]dt - \psi(m_0^2 + \gamma^o - \gamma_T)$$
$$= -\int_0^T \big[(\beta_{3t}-1)^2\gamma^o + \beta_{1t}^2(\gamma^o - \gamma_t) + 2\beta_{1t}(\beta_{3t}-1)(\gamma^o - \gamma_t)\big]dt - \hat\psi\sigma_Y^2(1-\rho^{pub}).$$
Using the solutions for the coefficients and $\gamma_t$ in terms of $\gamma^F$ and carrying out the simplifications, we obtain
$$V^{pub} = V^{pub}(\rho^{pub}),$$
where
$$V^{pub}(\rho) := \sigma_Y^2\Big\{-\hat\psi(1-\rho) + \hat T\hat\psi\rho^2\big[-\hat\psi(1-\rho) + 1\big] + \ln\Big(\frac{1-\rho}{\hat T\rho}\Big)\Big\}.$$
In the no feedback case, note that
$$\mathbb{E}_0[\overline M_t^2] = \mathbb{E}_0[(\chi_t\theta + (1-\chi_t)m_0)^2] = \mathbb{E}_0[\chi_t^2\theta^2] = \chi_t^2\gamma^o.$$
Hence,
$$\mathbb{E}_0[M_t^2] = \mathbb{E}_0[(M_t - \overline M_t)^2] + \mathbb{E}_0[\overline M_t^2] = \chi_t\gamma_t + \chi_t^2\gamma^o.$$
Using $a_t = \beta_{0t}m_0 + \beta_{1t}\overline M_t + \beta_{3t}\theta = \bar\alpha\theta$, we now calculate
$$V^{NF} = \mathbb{E}_0\Big[-\int_0^T (a_t - \theta)^2 dt - \psi(\chi_T\gamma_T + \chi_T^2\gamma^o)\Big] = \mathbb{E}_0\Big[-\int_0^T \theta^2(1-\bar\alpha)^2 dt - \psi\chi_T(\gamma_T + \chi_T\gamma^o)\Big] = -(1-\bar\alpha)^2\gamma^o T - \psi\chi_T(\gamma_T + \chi_T\gamma^o).$$
Expressing $\chi_T = 1 - \gamma_T/\gamma^o$, $\gamma_T$ and $\bar\alpha$ in terms of $\gamma^F = \gamma_T$, we have
$$V^{NF} = V^{NF}(\rho^{NF}),$$
where
$$V^{NF}(\rho) := \sigma_Y^2\big\{-\hat\psi^2\rho(1-\rho)^3 - \hat\psi(1-\rho)\big\}.$$
Lemma C.6. For $\hat\psi \in (0, 1]$, the long-run player is better off in the no feedback case than in the public case for all $T > 0$.

Proof. We show that (i) $V^{pub}(\rho^{pub}) < V^{NF}(\rho^{pub})$ and (ii) $V^{NF}(\rho)$ is increasing for $\rho \ge \rho^{pub}$, so that $V^{pub}(\rho^{pub}) < V^{NF}(\rho^{NF})$.
Toward establishing (i), define
$$V(\rho) := V^{pub}(\rho) - V^{NF}(\rho) = \sigma_Y^2\Big\{\hat T\hat\psi\rho^2\big[-\hat\psi(1-\rho) + 1\big] + \ln\Big(\frac{1-\rho}{\hat T\rho}\Big) + \hat\psi^2\rho(1-\rho)^3\Big\},$$
so that our first goal is to show $V(\rho^{pub}) < 0$. Using the inequality $\ln(x) < x - 1$ for $x > 0$, we have
$$V(\rho) < \sigma_Y^2\Big\{\hat T\hat\psi\rho^2\big[-\hat\psi(1-\rho) + 1\big] + \Big[\frac{1-\rho}{\hat T\rho} - 1\Big] + \hat\psi^2\rho(1-\rho)^3\Big\} = \frac{\sigma_Y^2}{\hat T\rho}V_2(\rho),$$
where $V_2(\rho) := \hat T^2\hat\psi\rho^3[1 - \hat\psi(1-\rho)] + 1 - \rho(1+\hat T) + \hat T\hat\psi^2\rho^2(1-\rho)^3$, and so it suffices to show $V_2(\rho^{pub}) < 0$. Now the equation $g^{pub}(\rho^{pub}) = 0$ is equivalent to $\hat\psi = -\frac{1 - (1+\hat T)\rho}{\hat T\rho^2(1-\rho)}\big|_{\rho = \rho^{pub}}$; using this to eliminate $\hat\psi$ and simplifying, we obtain
$$V_2(\rho^{pub}) = -\frac{[\rho(1+\hat T) - 1]^3}{\hat T\rho^2}\bigg|_{\rho = \rho^{pub}},$$
which is strictly negative as $\rho^{pub} > \frac{1}{1+\hat T}$, establishing claim (i).
Toward claim (ii), differentiate:
$$\frac{d}{d\rho}V^{NF}(\rho) = \sigma_Y^2\Big\{-\hat\psi^2\big[-3\rho(1-\rho)^2 + (1-\rho)^3\big] + \hat\psi\Big\} = \underbrace{\sigma_Y^2\hat\psi}_{>0}\Big\{-\hat\psi(1-\rho)^2(1-4\rho) + 1\Big\}.$$
The expression in braces is positive iff $h(\rho) := (1-\rho)^2(1-4\rho) < \frac{1}{\hat\psi}$. Now, for $\rho \in [0, 1]$, $h(\rho)$ attains its maximum value of 1 at $\rho = 0$. Hence, the expression is positive for all $\rho \in (0, 1)$ if $\hat\psi \le 1$. We conclude that $V^{NF}(\rho)$ is increasing for all $\rho \in (0, 1)$, and hence for all $\rho \ge \rho^{pub}$, if $\hat\psi \le 1$.

Combining parts (i) and (ii) yields $V^{pub}(\rho^{pub}) < V^{NF}(\rho^{pub}) < V^{NF}(\rho^{NF})$, as desired.
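Finally, the payoff ranking of Lemma C.6 can be spot-checked by evaluating $V^{pub}$ and $V^{NF}$ at the respective equilibrium values of $\rho$. The grid below is hypothetical, with $\hat\psi \in (0, 1]$, and payoffs are reported in units of $\sigma_Y^2$:

```python
import math

def root_bisect(g):
    # bisect on (0, 1); both g's change sign on this interval
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        (lo, hi) = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def payoffs(That, phat):
    g_pub = lambda r: -That * phat * r**2 * (1 - r) + r * (1 + That) - 1
    g_nf = lambda r: That * r - (1 - r) * (1 + phat * r * (1 - r)) ** 2
    rp, rn = root_bisect(g_pub), root_bisect(g_nf)
    # V^pub(rho) and V^NF(rho) in units of sigma_Y^2
    v_pub = (-phat * (1 - rp) + That * phat * rp**2 * (1 - phat * (1 - rp))
             + math.log((1 - rp) / (That * rp)))
    v_nf = -phat**2 * rn * (1 - rn) ** 3 - phat * (1 - rn)
    return v_pub, v_nf

ok = all(payoffs(T, p)[0] < payoffs(T, p)[1]
         for T in [0.2, 1.0, 2.0, 5.0, 15.0] for p in [0.1, 0.5, 1.0])
print(ok)   # True: no feedback dominates for the long-run player
```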
Proofs for Section 5.2

Proof of Proposition 6. The proof is by contradiction; suppose that a linear Markov equilibrium exists in which the long-run player plays $a_t = \beta_{0t} + \beta_{1t}M_t + \beta_{2t}L_t + \beta_{3t}\theta$. Define $\alpha_{0t} := \beta_{0t}$, $\alpha_{2t} := \beta_{2t} + \beta_{1t}(1-\chi_t)$ and $\alpha_{3t} := \beta_{3t} + \beta_{1t}\chi_t$. We derive a collection of necessary conditions for equilibrium and show that there is no real value of $\alpha_{30}$ for which they can be satisfied.

From the long-run player's perspective, given an action profile $(a_t)_{t\ge 0}$, the state variables $M_t$ and $L_t$ evolve according to
$$dL_t = \mu_L(a_t)\,dt + \sigma_L\,dZ_t^X, \qquad dM_t = \mu_M(a_t)\,dt + \sigma_M\,dZ_t^X,$$
where
$$\mu_L(a) = \frac{B_t}{1-\chi_t}\big[a + \xi(m - l) - \alpha_{0t} - (\alpha_{2t} + \alpha_{3t})l\big]$$
$$\mu_M(a) = \Sigma\alpha_{3t}\gamma_t\big[a - \alpha_{0t} - \alpha_{2t}l - \alpha_{3t}m\big]$$
$$\sigma_L = \frac{B_t\sigma_X}{1-\chi_t}, \qquad \sigma_M = B_t\sigma_X, \qquad B_t = \frac{\gamma_t(\alpha_{3t} + \xi\chi_t)}{\sigma_X^2}, \qquad \Sigma = \frac{1}{\sigma_X^2} + \frac{1}{\sigma_Y^2},$$
and where the learning ODEs are
$$\dot\gamma_t = -\gamma_t^2\alpha_{3t}^2\Sigma \tag{C.12}$$
$$\dot\chi_t = \gamma_t\alpha_{3t}^2\Sigma(1-\chi_t) - (\alpha_{3t} + \xi\chi_t)B_t. \tag{C.13}$$
In any conjectured equilibrium, given $(t, \theta, M_t, L_t) = (t, \theta, m, l)$, the long-run player's expected continuation payoff from following the equilibrium strategy is $\mathbb{E}_t\big[\int_t^T (\theta - L_s)a_s\,ds\big]$, which is of the form $v_{0t} + v_{1t}\theta + v_{2t}m + v_{3t}l + v_{4t}\theta^2 + v_{5t}m^2 + v_{6t}l^2 + v_{7t}\theta m + v_{8t}\theta l + v_{9t}ml$.
By optimality, the long-run player's value function must satisfy the HJB equation
$$0 = \sup_a\Big\{(\theta - L)a + V_t + \mu_L(a)V_L + \mu_M(a)V_M + \frac{1}{2}\sigma_M^2 V_{MM} + \frac{1}{2}\sigma_L^2 V_{LL} + \sigma_L\sigma_M V_{LM}\Big\},$$
and since this is linear in $a$, the following indifference condition must hold:
$$0 = (\theta - l) + \frac{B_t}{1-\chi_t}V_L + \alpha_{3t}\gamma_t\Sigma V_M. \tag{C.14}$$
Since (C.14) must hold for all values of $\theta$, $m$ and $l$, we match coefficients to obtain the following four equations:
$$\text{constant}: \quad 0 = \gamma_t\Big[\Sigma v_{2t}\alpha_{3t} + \frac{v_{3t}(\alpha_{3t} + \xi\chi_t)}{\sigma_X^2(1-\chi_t)}\Big] \tag{C.15}$$
$$\theta: \quad 0 = 1 + \Sigma v_{7t}\alpha_{3t}\gamma_t + \frac{v_{8t}\gamma_t(\alpha_{3t} + \xi\chi_t)}{\sigma_X^2(1-\chi_t)} \tag{C.16}$$
$$m: \quad 0 = \gamma_t\Big[2\Sigma v_{5t}\alpha_{3t} + \frac{v_{9t}(\alpha_{3t} + \xi\chi_t)}{\sigma_X^2(1-\chi_t)}\Big] \tag{C.17}$$
$$l: \quad 0 = -1 + \Sigma v_{9t}\alpha_{3t}\gamma_t + \frac{2v_{6t}\gamma_t(\alpha_{3t} + \xi\chi_t)}{\sigma_X^2(1-\chi_t)}. \tag{C.18}$$
Note that $\gamma_0 = \gamma^o > 0$, and since $\chi_0 = 0$, (C.16) (or (C.18)) implies $\alpha_{30} \ne 0$. Hence, for sufficiently small $t$, we can solve (C.17)-(C.18) to obtain
$$v_{5t} = -\frac{\sigma_Y^2 v_{9t}(\alpha_{3t} + \xi\chi_t)}{2(\sigma_X^2 + \sigma_Y^2)\alpha_{3t}(1-\chi_t)} \tag{C.19}$$
$$v_{6t} = \frac{\big[\sigma_X^2\sigma_Y^2 - (\sigma_X^2 + \sigma_Y^2)v_{9t}\alpha_{3t}\gamma_t\big](1-\chi_t)}{2\sigma_Y^2\gamma_t(\alpha_{3t} + \xi\chi_t)}. \tag{C.20}$$
Differentiate $v_{5t}$, use (C.13) to replace $\dot\chi_t$ and evaluate at time $t = 0$ to obtain
$$\dot v_{50} = -\frac{v_{90}\alpha_{30}(\xi + \alpha_{30})\gamma_0 + \sigma_Y^2\dot v_{90}}{2(\sigma_X^2 + \sigma_Y^2)}. \tag{C.21}$$
On the other hand, the indifference condition reduces the HJB equation to
$$0 = V_t + \mu_L(0)V_L + \mu_M(0)V_M + \frac{1}{2}\sigma_M^2 V_{MM} + \frac{1}{2}\sigma_L^2 V_{LL} + \sigma_L\sigma_M V_{LM}. \tag{C.22}$$
Now (C.22) must hold for all $\theta$, $m$ and $l$, and by matching coefficients, we obtain a set of 10 equations. Evaluating the coefficients on $m^2$ and $ml$ at $t = 0$, and using (C.19), (C.20) and (C.21) to eliminate $v_5$, $v_6$ and $\dot v_5$, we obtain
$$m^2: \quad 0 = \frac{(\sigma_X^2 + 2\sigma_Y^2)v_{90}\alpha_{30}(\xi + \alpha_{30})\gamma_0 - \sigma_X^2\sigma_Y^2\dot v_{90}}{\sigma_X^2(\sigma_X^2 + \sigma_Y^2)} \implies \dot v_{90} = \frac{(\sigma_X^2 + 2\sigma_Y^2)v_{90}\alpha_{30}(\xi + \alpha_{30})\gamma_0}{\sigma_X^2\sigma_Y^2} \tag{C.23}$$
$$ml: \quad 0 = \xi - \frac{(\sigma_X^2 + 2\sigma_Y^2)v_{90}\alpha_{30}(\xi + \alpha_{30})\gamma_0}{\sigma_X^2\sigma_Y^2} + \dot v_{90} \implies \dot v_{90} = -\xi + \frac{(\sigma_X^2 + 2\sigma_Y^2)v_{90}\alpha_{30}(\xi + \alpha_{30})\gamma_0}{\sigma_X^2\sigma_Y^2} \tag{C.24}$$
Clearly, (C.23) and (C.24) cannot hold simultaneously, which gives us the desired contradiction. (Note that if $v_{90} \ne 0$ and $\alpha_{30} > 0$, equality (of the form $+\infty = +\infty$) between the right-hand sides of (C.23) and (C.24) can be achieved if $\alpha_{30} = +\infty$, supporting the interpretation that the long-run player would trade away all information in the first instant.)
References
Alonso, R., W. Dessein, and N. Matouschek (2008): “When does coordination re-
quire centralization?” American Economic Review, 98, 145–179.
Amador, M. and P.-O. Weill (2012): “Learning from private and public observations
of others’ actions,” Journal of Economic Theory, 147, 910–940.
Angeletos, G.-M. and A. Pavan (2007): “Efficient use of information and social value
of information,” Econometrica, 75, 1103–1142.
Aragones, E., A. Postlewaite, and T. Palfrey (2007): “Political reputations and
campaign promises,” Journal of the European Economic Association, 5, 846–884.
Back, K. (1992): “Insider trading in continuous time,” The Review of Financial Studies,
5, 387–409.
Back, K., C. H. Cao, and G. A. Willard (2000): “Imperfect competition among
informed traders,” The Journal of Finance, 55, 2117–2155.
Bolton, P. and M. Dewatripont (2013): "Authority in Organizations," Handbook of
Organizational Economics, 342-372.
Bonatti, A. and G. Cisternas (2019): “Consumer Scores and Price Discrimination,”
Review of Economic Studies.
Bonatti, A., G. Cisternas, and J. Toikka (2017): “Dynamic oligopoly with incom-
plete information,” The Review of Economic Studies, 84, 503–546.
Callander, S. and S. Wilkie (2007): “Lies, damned lies, and political campaigns,”
Games and Economic Behavior, 60, 262–286.
Carlsson, H. and S. Dasgupta (1997): “Noise-proof equilibria in two-action signaling
games,” Journal of Economic Theory, 77, 432–460.
Cisternas, G. (2018): “Two-sided learning and the ratchet principle,” The Review of
Economic Studies, 85, 307–351.
Dessein, W. and T. Santos (2006): “Adaptive Organizations,” Journal of Political Econ-
omy, 114, 956–995.
Dilme, F. (2019): “Dynamic Quality Signaling with Hidden Actions,” Games and Economic
Behavior, 113, 116–136.
Ely, J. C., J. Hörner, and W. Olszewski (2005): "Belief-free equilibria in repeated
games," Econometrica, 73, 377-415.
Faingold, E. and Y. Sannikov (2011): “Reputation in continuous-time games,” Econo-
metrica, 79, 773–876.
Foster, F. D. and S. Viswanathan (1994): “Strategic trading with asymmetrically in-
formed traders and long-lived information,” Journal of Financial and Quantitative Anal-
ysis, 29, 499–518.
——— (1996): “Strategic trading when agents forecast the forecasts of others,” The Journal
of Finance, 51, 1437–1478.
Fuchs, W. (2007): “Contracting with repeated moral hazard and private evaluations,”
American Economic Review, 97, 1432–1448.
Gryglewicz, S. and A. Kolb (2019): “Strategic Pricing in Volatile Markets,” Tech. rep.
He, H. and J. Wang (1995): “Differential information and dynamic behavior of stock
trading volume,” The Review of Financial Studies, 8, 919–972.
Heinsalu, S. (2018): “Dynamic Noisy Signaling,” American Economic Journal: Microeco-
nomics, 10, 225–249.
Hermalin, B. E. (1998): “Toward an economic theory of leadership: Leading by example,”
American Economic Review, 1188–1206.
Hirsch, M. W., S. Smale, and R. L. Devaney (2004): Differential equations, dynamical
systems, and an introduction to chaos, Academic press.
Holden, C. W. and A. Subrahmanyam (1992): “Long-lived private information and
imperfect competition,” The Journal of Finance, 47, 247–270.
Hörner, J. and N. S. Lambert (2019): "Motivational Ratings," Review of Economic Studies.
Hörner, J. and S. Lovo (2009): "Belief-free equilibria in games with incomplete information," Econometrica, 77, 453–487.
Hörner, J. and W. Olszewski (2006): "The folk theorem for games with private almost-perfect monitoring," Econometrica, 74, 1499–1544.
Keller, H. B. (1968): Numerical Methods for Two-Point Boundary-Value Problems, Blaisdell Publishing Co.
Kolb, A. (2019): "Strategic Real Options," Journal of Economic Theory, forthcoming.
Kyle, A. S. (1985): "Continuous auctions and insider trading," Econometrica, 1315–1335.
Levin, J. (2003): "Relational incentive contracts," American Economic Review, 93, 835–857.
Liptser, R. S. and A. Shiryaev (1977): Statistics of Random Processes 1, 2, Springer-Verlag, New York.
MacLeod, W. B. (2003): "Optimal contracting with subjective evaluation," American Economic Review, 93, 216–240.
Mailath, G. J. and S. Morris (2002): "Repeated games with almost-public monitoring," Journal of Economic Theory, 102, 189–228.
Marschak, J. (1955): "Elements for a theory of teams," Management Science, 1, 127–137.
Marschak, J. and R. Radner (1972): Economic Theory of Teams, New Haven, CT: Yale University Press.
Matthews, S. A. and L. J. Mirman (1983): "Equilibrium limit pricing: The effects of private information and stochastic demand," Econometrica, 981–996.
Mayhew, D. R. (1974): Congress: The Electoral Connection, vol. 26, Yale University Press.
Milgrom, P. R. and J. D. Roberts (1992): Economics, Organization and Management, Prentice-Hall.
Morris, S. and H. S. Shin (2002): "Social value of public information," American Economic Review, 92, 1521–1534.
Radner, R. (1962): "Team decision problems," The Annals of Mathematical Statistics, 33, 857–881.
Rantakari, H. (2008): "Governing Adaptation," The Review of Economic Studies, 75, 1257–1285.
Royden, H. L. (1988): Real Analysis, Macmillan.
Sannikov, Y. (2007): "Games with imperfectly observable actions in continuous time," Econometrica, 75, 1285–1329.
Simon, H. A. (1951): "A formal theory of the employment relationship," Econometrica.
Spence, M. (1973): "Job Market Signaling," The Quarterly Journal of Economics, 87, 355–374.
Teschl, G. (2012): Ordinary Differential Equations and Dynamical Systems, vol. 140, American Mathematical Society.
Williamson, O. E. (1996): The Mechanisms of Governance, Oxford University Press.
Yang, L. and H. Zhu (2018): "Back-running: Seeking and hiding fundamental information in order flows," Rotman School of Management Working Paper.