The Princeton Encyclopedia - Scholars at Harvard - Harvard University
A Model of Scienti c Communication - Harvard University · 2020-02-25 · A Model of Scienti c...
Transcript of A Model of Scienti c Communication - Harvard University · 2020-02-25 · A Model of Scienti c...
A Model of Scientific Communication
Isaiah Andrews, Harvard University and NBER∗
Jesse M. Shapiro, Brown University and NBER
February 2020
Abstract
We propose a model of empirical science in which an analyst makes a report to
an audience after observing some data. Agents in the audience may differ in
their beliefs or objectives, and may therefore update or act differently follow-
ing a given report. We contrast the proposed model with a classical model of
statistics in which the estimate directly determines the payoff. We identify set-
tings in which the proposed model prescribes very different, and more realistic,
optimal rules.
keywords: statistical communication, statistical decision theory
∗This article previously circulated under the title “Statistical Reports for Remote Agents.” Weacknowledge funding from the National Science Foundation under Grant No. 1654234, the Alfred P.Sloan Foundation, the Silverman (1968) Family Career Development Chair at MIT, and the BrownUniversity Population Studies and Training Center. Any opinions, findings, and conclusions orrecommendations expressed in this material are those of the authors and do not necessarily reflectthe views of the funding sources. We thank Glen Weyl for contributions to this project in its earlystages. Conversations with Matthew Gentzkow and Kevin M. Murphy on related projects greatlyinfluenced our thinking. For comments and helpful conversations, we also thank Alberto Abadie,Tim Armstrong, Gary Chamberlain, Xiaohong Chen, Max Kasy, Elliot Lipnowski, Adam McCloskey,Emily Oster, Mikkel Plagborg-Møller, Tomasz Strzalecki, and Neil Thakral. The paper also benefitedfrom the comments of seminar and conference audiences at Brown University, Harvard University, theMassachusetts Institute of Technology, the University of Chicago, Boston College, Rice University,Texas A&M University, New York University, Columbia University, the University of California LosAngeles, the University of Texas at Austin, Oxford University, and from comments by conferencediscussant Jann Spiess. We thank our dedicated research assistants for their contributions to thisproject. E-mail: [email protected], jesse shapiro [email protected].
1
1 Introduction
Statistical decision theory, following Wald (1950), is the dominant theory of optimal
estimation in econometrics.1 The classical decision-theoretic model envisions an an-
alyst who estimates an unknown parameter based on some data. The performance
of the estimate is judged by its proximity to the true value of the parameter. This
judgment is formalized by treating the estimate as a decision that, along with the
parameter, determines a realized payoff or loss. For example, if the loss is taken to
be the square of the difference between the estimate and the parameter, then the
expected loss of the decision-maker is the estimator’s mean squared error, a standard
measure of performance.
Although many scientific situations seem well described by the classical model,
many others do not. Scientists often communicate their findings to a broad and diverse
audience, consisting of many different agents (e.g., practitioners, policymakers, other
scientists) with different opinions and objectives. These diverse agents may make
different decisions, or form different judgments, following a given scientific report. In
such cases, it is the beliefs and actions of these audience members which ultimately
matter for realized payoffs or losses.
In this paper we propose an alternative model of empirical science that captures
scientific situations of this kind. We develop sufficient conditions under which the
proposed model predicts very different scientific reports from the classical model.
We offer examples satisfying those conditions, and argue that in practical situations
similar to these examples, scientists’ reports seem to be more consistent with the
predictions of the proposed model than with the predictions of the classical model.
In the proposed communication model, the analyst reports a recommended deci-
sion (or estimate) to an audience based on some observed data. Each agent in the
audience takes her optimal decision (or forms an optimal estimate) after observing
the analyst’s report. Agents differ in their priors, and may therefore have different
optimal decisions (or estimates) following a given report. To simplify the analysis,
we assume that every possible prior belief on the unknown parameter is held by some
agent in the audience. A reporting rule (specifying a distribution of recommendations
1See Lehmann and Casella (1998) for a textbook treatment of statistical decision theory andStoye (2012) for a recent discussion of its relation to econometrics.
2
for each realization of the data) induces an expected loss for each agent, which we
call the rule’s communication risk.
We contrast the proposed communication model with a decision model in which
the analyst selects a decision (or estimate) that directly determines the loss for all
agents. Because agents in the decision model do not optimize following the ana-
lyst’s report, each agent’s expected loss, which we call the rule’s decision risk, is
weakly higher than under the communication model.2 The implications of the deci-
sion model coincide with those of the classical model. Our analysis therefore proceeds
by contrasting rankings of rules under the decision model with rankings under the
communication model.
We find that the two models can imply very different rankings of rules. An ex-
ample illustrates. Suppose that an analyst conducts a randomized controlled trial to
assess the effect of a deworming medication on the average body weight of children
in a low-income country. Even if deworming medication is known to (weakly) im-
prove nutrition, sampling error means that the treatment-control difference may be
negative. Under a canonical quadratic loss, the decision model implies that all au-
dience members prefer that the analyst censor negative estimates at zero, since zero
is closer to the (weakly positive) true effect than any negative number. Under the
same loss, the communication model implies that censoring discards potentially useful
information (the more negative the estimate, the weaker the evidence for a large posi-
tive effect), and has no corresponding benefit (agents can incorporate censoring when
determining their optimal decisions or estimates). While a policymaker choosing a
subsidy for the medication might well censor the decision (thus ruling out imposing
a tax), we claim, and illustrate by example, that a scientist choosing a report for a
research article would be unlikely to censor.
In the paper, we formalize and generalize the example. A rule is admissible with
respect to a given definition of risk if no other rule yields a weakly lower risk for
all agents and a strictly lower risk for at least one. While admissibility is a very
weak notion of optimality, in settings like the deworming example there exists no rule
that is simultaneously admissible for decision and communication. More generally,
we show that the sets of decision-admissible and communication-admissible rules do
2Decision risk is what Lehmann and Casella (1998, Chapter 4) call the Bayes risk.
3
not intersect when (i) some decision is dominated in loss and (ii) the set of feasible
decisions is smaller than the set of distinct optimal action profiles for agents.
We next turn to principles for choosing rules other than admissibility. One such
principle is to minimize weighted average risk with respect to some weights on the
audience. Applying this principle to decision risk leads to the class of Bayes decision
rules. Bayes decision rules with respect to full-support priors have strong optimality
properties in the classical setting, but because they are admissible in decision risk,
they are inadmissible in communication risk under conditions (i) and (ii).
Another principle is to minimize the maximum risk over agents in the audience. In
contrast to our findings for admissibility and optimality in weighted average risk, we
find that any rule that is minimax in decision risk is minimax in communication risk.
This finding establishes a sense in which any rule that is robust for decision-making
is also robust for communication. However, minimax rules can be inadmissible, and
under conditions (i) and (ii), any rule that is both minimax and admissible in decision
risk is minimax and inadmissible in communication risk.
We apply our model to two types of scientific settings recently studied in econo-
metrics: estimating the conditional expectation function when this function is known
to be monotonically decreasing, and estimating the optimal choice of treatment from
a finite set. In both settings, as in the deworming example, we show that the implica-
tions of the decision and communication models are different, and we argue that the
communication model is more consistent with practice in some real-world situations.
Our analysis is positive in nature. We do not take a stand on which principles
should guide scientists’ choice of statistical reports. Nor do we argue that the com-
munication model is a better description of all scientific situations than the decision
model. Rather, we argue that the communication model better describes some im-
portant situations, and that in these situations, it seems to better match scientific
practice.
Heterogeneity among agents plays a central role in our analysis. When agents are
homogeneous, the distinction between decision and communication risk is inconse-
quential, because a benevolent analyst can simply report the agents’ optimal decision
(or estimate) given the data. When agents are instead heterogeneous, the distinction
can be consequential, because different agents may prefer different decisions (or es-
4
timates). Although we model heterogeneity by assuming agents differ in their prior
beliefs, we show that our model is equivalent to one in which agents instead differ in
their loss functions.
Our baseline model assumes that the sets of possible parameter values, data re-
alizations, and decisions are finite. This greatly simplifies the analysis but has the
drawback that it precludes applying our framework to many canonical statistical situ-
ations such as estimating the mean of a Gaussian random variable. Online Appendix
B.1 develops continuous versions of the example and applications discussed in the
main text and shows that there remains a strong conflict between the decision and
communication models, in the sense that changing the notion of risk can reverse
dominance orderings.
We assume that the analyst’s report takes the form of a recommended decision.
This ensures that both decision and communication risk are well-defined for all rules
we consider, and thus allows us to directly compare the ranking of rules under the
two notions of risk. We discuss additional implications of this assumption in Section
2.3.
We are not aware of past work that studies the ranking of rules based on com-
munication risk in a setting with heterogeneous agents. Raiffa and Schlaifer (1961),
Hildreth (1963), Sims (1982, 2007), and Geweke (1997, 1999), among others, consider
the problem of communicating statistical findings to diverse, Bayesian agents.3 One
conclusion from this literature is that when the analyst can communicate the full
likelihood, data, or a sufficient statistic, the analyst should do so. Our analysis is
particularly related to that of Hildreth (1963) who studies, among other topics, the
properties of what we term communication risk in the single-agent setting. Banerjee
et al. (forthcoming) study minimax experimental design when the audience has het-
erogeneous priors. Spiess (2018) studies optimal estimation in a setting where, unlike
in our model, the objectives of the analyst and audience may be misaligned. Andrews
et al. (2020) study the implications of communication risk for structural estimation
3See also Efron (1986) and Poirier (1988). A related literature (e.g., Pratt 1965, Kwan 1999,Abadie forthcoming, Abadie and Kasy 2019, Frankel and Kasy 2018) assesses the Bayesian interpre-tation of frequentist inference. Another literature (e.g., Zhang et al. 2013, Jordan et al. 2018, Zhuand Lafferty 2018a) considers the problem of distributing statistical estimation and inference acrossmultiple machines when communication is costly.
5
in economics (see also Andrews et al. 2017).
Our setting is also related to the literature on comparisons of experiments following
Blackwell (1951, 1953).4 What we term communication risk has previously appeared
in this literature (see for instance Example 1.4.5 in Torgersen 1991), but the primary
focus has been on properties (e.g., Blackwell’s order) that hold for all possible beliefs
and loss functions. By contrast, we focus on the comparison between communication
risk and decision risk for a given loss function and class of priors.
Our setting is broadly related to large literatures on strategic communication
(Crawford and Sobel 1982) and information design (Bergemann and Morris 2019).
As in Farrell and Gibbons (1989), the receivers (agents) in our setting are hetero-
geneous. As in Kamenica and Gentzkow (2011), the sender (analyst) in our setting
commits in advance to a reporting strategy. Unlike much of the literature on strategic
communication, our setting does not involve a conflict of interest between the sender
and the receivers.
The remainder of the paper is organized as follows. Section 2 introduces our
setting, notation, and key definitions, and provides some preliminary results. Section
3 provides our main results on the implications of the communication model and its
differences with the decision and classical models. Section 4 presents applications
to two types of scientific situations. Section 5 concludes. An appendix following
the body text provides proofs for the main results. Online Appendix A proves the
results in the example and applications. Online Appendix B discusses extensions and
auxiliary results.
2 Model
2.1 Primitives
An analyst observes data X ∈ X for X a finite sample space, |X | <∞. The distribu-
tion of X is governed by a parameter θ ∈ Θ, with X|θ ∼ Fθ, for Θ a finite parameter
space. We assume that Fθ has support equal to X for all θ ∈ Θ. The analyst also
observes a public random variable V ∼ U [0, 1] that is independent of θ and X.
4Le Cam (1996) provides a brief review, while an extensive treatment can be found in Torgersen(1991).
6
The analyst publicly commits to a rule c : X × [0, 1] → ∆ (D) that maps from
realizations of the data X and the public random variable V into a distribution over
decisions d ∈ D, for D a finite space. Let C denote the set of all such rules and with a
slight abuse of notation let c (X, V ) ∈ D denote the random realization from a given
rule c ∈ C.Rules are evaluated by their performance with respect to a closed set A ⊆ ∆ (Θ)
of priors on the parameter space, which we will call the audience. Specifically, the
function ρ : C ×A → R≥0 describes the risk of rule c ∈ C with respect to any a ∈ A.
We discuss three possible choices of (A, ρ), along with their interpretation, in the next
section.
The ordering of rules under the risk function ρ (·, ·) may depend on the prior
a ∈ A. We consider three canonical criteria by which the analyst could select a rule,
or set of rules, in such a case.
Definition 1. For a given risk function ρ (·, ·) and audience A, a rule c∗ ∈ C is
• admissible if there exists no rule c ∈ C such that ρ (c, a) ≤ ρ (c∗, a) for all
a ∈ A, with strict inequality for at least one a ∈ A.
• ω-optimal if for a probability measure ω with support equal to A∫Aρ (c∗, a) dω (a) = inf
c∈C
∫Aρ (c, a) dω (a) . (1)
• minimax if
supa∈A
ρ (c∗, a) = infc∈C
supa∈A
ρ (c, a) .
These criteria are central to the classical study of statistics (Lehmann and Casella
1998). They also have an intuitive economic meaning. Admissibility excludes rules
that are dominated. ω-optimality defines the set of rules optimal with respect to
full-support Pareto weights ω on the audience. Minimaxity minimizes the worst-case
risk and is familiar from welfare economics, game theory, and the study of decision-
making under ambiguity. Existence of rules optimal according these criteria follows
from standard arguments.
7
Proposition 1. Suppose that ρ (c, a) is continuous in a for all c ∈ C, and that the set
of risk functions {ρ (c, ·) : c ∈ C} is compact in the supremum norm. Then there exists
an admissible rule, an ω-optimal rule for any ω, a minimax rule, and an admissible
minimax rule. Moreover, any ω-optimal rule is admissible.
Under the conditions of Proposition 1, the criteria we consider are available to the
analyst, in the sense that they each admit at least one rule.
2.2 Risk Functions for Communication and Decision
In our setting, a model of empirical science is characterized by a risk function ρ (·, ·)and an audienceA, which together determine the implications of the canonical criteria
for the analyst’s choice of rules.
We focus on three such models.
Definition 2. Fix a loss function L : D ×Θ→ R≥0. Then:
• The communication model takes A = ∆ (Θ) and
ρ (c, a) = R∗a (c) = Ea
[mind
Ea [L (d, θ) |c (X, V ) , V ]],
for R∗a (c) the communication risk of rule c ∈ C for prior a ∈ ∆ (Θ) and Ea
the expectation under a.
• The decision model takes A = ∆ (Θ) and
ρ (c, a) = Ra (c) = Ea [L (c (X, V ) , θ)]
for Ra (c) the decision risk of rule c ∈ C for prior a ∈ ∆ (Θ).
• The classical model takes A to be the vertices of ∆ (Θ), such that each prior
a ∈ A places probability 1 on some θ (a) ∈ Θ, and takes
ρ (c, a) = Rθ(a) (c) = Eθ(a) [L (c (X, V ) , θ (a))]
for Rθ(a) (c) the frequentist risk of rule c ∈ C for the parameter θ (a) ∈ Θ.
8
The communication risk R∗a (c) is the ex-ante expected loss under the prior a from tak-
ing the a-posteriori optimal decision after observing c (X, V ) and V . The decision risk
Ra (c) is the expected loss under the prior a from taking the decision c (X, V ). Finally,
the frequentist risk Rθ(a) (c) is the expected loss under the parameter value θ (a) from
taking the decision c (X, V ). Lemma 1 in the appendix shows that communication
and decision risk satisfy the conditions of Proposition 1, while these conditions hold
trivially for frequentist risk. Hence, Proposition 1 implies the existence of admissible,
ω-optimal, and minimax rules in the communication, decision, and classical models.
The decision and classical models have different audiences but the same risk func-
tion. The following proposition shows that their implications coincide in a strong
sense.
Proposition 2. The decision model implies the same sets of admissible rules, ω-
optimal rules for some ω, and minimax rules as the classical model.
Intuitively, because the audience A = ∆ (Θ) consists of the set of all possible
convexifications of the vertices of ∆ (Θ), the dominance relation (admissibility), the
implications of full-support averages (ω-optimality), and the worst-case risk (mini-
maxity) are all preserved when switching from the classical model’s audience to the
decision model’s audience. This point is well-understood in the literature (see, e.g.,
Stoye 2011).
While we focus on the case with A = ∆ (Θ) for simplicity of exposition, many of
our results (though not Proposition 2) extend to the case of a general closed, convex
audience A ⊆ ∆ (Θ). We highlight these extensions in the proofs. That said, some
degree of heterogeneity is necessary for the distinction between communication and
decision risk to be interesting. For an audience A = {a} consisting of a single prior
a, all three notions of optimality discussed in Definition 1 coincide, and optimality in
the decision model implies optimality in the communication model.
It is worth briefly highlighting the role of randomization in the rules c. Specifically,
the analyst may randomize publicly through the random variable V , considering c :
X × [0, 1] → D, privately through the choice of a randomized decision, c : X →∆ (D) , or both publicly and privately, c : X × [0, 1] → ∆ (D). For the decision
and classical models private and public randomization are equivalent, in the sense
9
that the risk depends only on the distribution of c (X, V ) |X. For the communication
model, by contrast, the nature of the randomization matters. Intuitively, since V
is observed, publicly randomized rules c : X × [0, 1] → D can be viewed as ex-ante
mixtures over non-random rules c : X → D, where it is revealed ex-post which rule
c was selected. By contrast, privately randomized rules c : X → ∆ (D) can be
viewed as ex-ante mixtures over non-random rules c : X → D, where it is never
revealed which c was selected. Hence, the public random variable V ∼ U [0, 1] is a
modeling device to allow the analyst to randomize across rules while revealing the
outcome of the randomization to the audience. Lemma 2 in the appendix shows that
public randomization always yields weakly lower communication risk than private
randomization, with a strict inequality in some cases.
2.3 Interpretation of the Models
To interpret the models, we can identify each prior a ∈ ∆ (Θ) with some agent. Agents
in the audience for a given scientific finding might include practitioners, policymakers,
or other scientists. We highlight two different ways to interpret the decision d ∈ Dand associated loss L (d, θ).
One interpretation is that the decision d ∈ D represents a real-world action whose
consequences are captured by L (d, θ). For example, doctors may need to choose a
treatment, policymakers to set a tax, and scientists to decide on what experiment to
run next. On this interpretation, the decision model reflects a situation in which the
analyst makes a decision on behalf of all agents, or equivalently, all agents are bound
to take the decision recommended by the analyst. The communication model, by
contrast, reflects a situation in which each agent is free to take her optimal decision
given the information in the analyst’s report.
Another interpretation is that the decision d ∈ D represents a best guess whose
departure from the truth is captured by L (d, θ). This interpretation is evoked by
canonical losses, such as L (d, θ) = (d− θ)2, that increase in the distance between
the estimate and the parameter. On this interpretation, the decision model reflects
a situation in which each agent evaluates the quality of the analyst’s guess according
to the agent’s prior. The communication model, by contrast, reflects a situation in
10
which each agent evaluates the quality of the agent’s own best guess, as informed by
the analyst’s report as well as the agent’s prior.
In many real-world situations the agents in the audience for a given scientific find-
ing will have diverse opinions and may therefore make different decisions, or form dif-
ferent best guesses about an unknown parameter, after learning the same report. The
communication model better reflects such situations than does the decision model.
In other situations—for example, a government committee deciding on the appropri-
ate treatment to reimburse for a given diagnosis for all practitioners, or a scientific
committee deciding where next to point a telescope that will provide data to many
researchers—the decision model seems a better fit.
In our model, agents differ only in their prior beliefs about the parameter. Casual
experience suggests that different agents (doctors, policymakers, scientists) do often
disagree subjectively about how to interpret evidence, and the assumption of hetero-
geneous priors is one way to capture such disagreements (Morris 1995). But agents
may also differ in their preferences or objectives. Online Appendix B.2 shows that
models in which an audience of agents differ in their loss functions can be cast into our
setting through appropriate relabeling and choice of A.5 The same appendix shows
how to accommodate heterogeneous beliefs about the likelihood Fθ. Online Appendix
B.3 deals with the case where an agent’s payoff is determined by the agent’s regret
(i.e., loss or risk relative to a best-possible decision or rule).
It is also possible to do away with heterogeneous agents altogether by envisioning a
single agent who receives some information about the parameter θ that is not available
to the analyst at the time they make their report. Under this interpretation, A is the
set of posterior beliefs that the agent may hold following receipt of the information.
Under the decision model, the agent is bound by the recommended decision regardless
of the additional information. Under the communication model, the agent is free to
make an optimal decision after learning the additional information.
We assume throughout that the set of rules C available to the analyst is the set of
estimators or decision rules, i.e. rules mapping from the sample space to distributions
on the decision space D. This assumption is important for comparing the implications
5Brown (1975) considers a setting with a collection of possible loss functions and proposes corre-sponding notions of admissibility.
11
of the decision and communication models, because decision risk is not well-defined for
rules c /∈ C. Modifying the communication model to allow the use of a broader set of
rules C ′ ⊃ C would only strengthen the conflicts between decision and communication
risk that we document in Section 3.6 An unrealistic aspect of the communication
model is that communication risk (unlike decision risk) is invariant to permutations
of the analyst’s report (e.g., reporting high values of d when the parameter θ is
expected to be low, or vice versa). Online Appendix B.4 shows how to eliminate this
unrealistic aspect of the model without altering our substantive findings.
In some situations, including some of the examples we discuss below, the com-
munication constraint is not binding, in the sense that D is rich enough to allow the
analyst to communicate all the decision-relevant information in the data. In general,
this need not be the case, and when it is not, an analyst concerned with minimiz-
ing communication risk might like to use a richer vocabulary than D. In practice,
scientists often report estimates or other summaries that do not convey all of the
information in the data. We think a plausible reason is that there are communication
or information processing constraints on the part of the agents in the audience. Our
model captures those constraints by the restriction to rules c ∈ C. Restricting reports
to use a finite vocabulary is a standard way to model such constraints in information
theory (e.g., Cover and Thomas 2006), and has been explored in recent studies of
nonparametric statistics under computational constraints (Zhu and Lafferty 2018a,
b).
Online Appendix B.1 develops versions of our example and applications with
continuous parameter spaces. For the example and one application we further al-
low for a continuous sample space X and decision space D. With continuous D,
communication-optimal rules can take an unrealistic form, for example encoding the
full data in the decimal expansion of c (X, V ). The appendix therefore focuses on
showing that the decision and communication models deliver conflicting rankings of
particular, realistic-seeming rules. How best to model communication constraints for
general settings with continuous decision spaces seems a potentially interesting topic
for future work.
6If, according to a given optimality criterion, any rule that is optimal in decision risk is notoptimal in communication risk, then the same applies if we evaluate communication risk with respectto C′ ⊃ C.
12
3 Implications of the Communication Model
In this section we discuss the implications of the communication model, emphasizing
the contrast with the decision (and hence classical) model. To illustrate the key
intuitions, we return to (and elaborate) the example from the introduction.
Example. An analyst observes data on weight gain for a sample of n children enrolled
in a randomized trial of deworming drugs. Weight is measured to finite precision, so
the weight gain for child i is Xi ∈ X0 for X0 a finite set with |X0| ≥ 2. The data are
thus
X = (X1, ..., Xn) ∈ X = X n0 .
Indices i are assigned at random, with children {1, ..., n} assigned to control and
children {n+ 1, ..., n} assigned to treatment, with n, n − n ≥ 3 . Children’s weight
gains Xi, Xj are independent for all i 6= j. For children in the control group, Xi ∼F0 (θ) and for those in the treatment group Xi ∼ F0
(θ), where F0 (t) is an exponential
family with probability mass function of the form f0 (x; t) = exp (tx)h (t) g (x) . This
implies that the control and treatment group means
(X,X
)=
(1
n
n∑i=1
Xi,1
n− n
n∑i=n+1
Xi
)
are a sufficient statistic for θ =(θ, θ). Let XM and XM denote the sets of possible
values for each of these means.
The average treatment effect of deworming drugs on child weight is
EF0(θ) [Xi]− EF0(θ) [Xi] = ATE (θ) ,
which can be estimated without bias by the difference in means X−X. Suppose that
the average treatment effect is known a priori to be non-negative, and that θ, θ ∈ Θ0
for Θ0 a finite set with |Θ0| ≥ 2 and maxθ∈Θ0EF0(θ) [Xi]−minθ∈Θ0 EF0(θ) [Xi] > 2/3.
The parameter space is thus
Θ ={θ ∈ Θ2
0 : ATE (θ) ≥ 0}
={θ ∈ Θ2
0 : θ ≥ θ}.
13
The audience consists of governments who must decide how much to subsidize
(or tax) deworming drugs. The optimal Pigouvian subsidy is simply ATE (θ). The
governments face a loss L (d, θ) = (d− ATE (θ))2 for d the per-unit subsidy, with
d < 0 denoting a tax. The set of feasible decisions,
D ={x− x : (x, x) ∈ XM ×XM
},
corresponds to the support of the average treatment effect estimate X −X. Online
Appendix B.5 provides a microfoundation for the quadratic loss (d− ATE (θ))2 in a
Pigouvian setting.
Because θ ≥ θ by assumption, a tax (d < 0) is never optimal. Therefore, rules that
sometimes recommend d < 0 are unappealing from the standpoint of decision risk.
Nevertheless, because the sample is finite we may have X < X even though θ ≥ θ.
Agents who are, say, unsure whether to subsidize a little or not at all may value
knowing that X < X, because they may apply a lower subsidy when X < X than
when, say, X = X. Therefore, from the standpoint of communication risk, rules that
never report d < 0 may be unappealing because they suppress useful information.
We may alternatively envision the loss as capturing the scientific community’s
desire for a good guess of the true average treatment effect. On this interpretation,
a guess d < 0 is again unappealing from the standpoint of decision risk (such a guess
cannot be right), but may be appealing from the standpoint of communication risk
(because it conveys useful information that agents can use in formulating their own
guesses). N
The Example illustrates a setting in which we expect the implications of the
communication and decision models to be very different. We next formalize the
source of the tension. The following subsections show how it manifests under each of
the criteria that we consider for choosing rules.
Definition 3. A decision d ∈ D is dominated in loss if there exists d′ ∈ D such
that L (d, θ) ≥ L (d′, θ) for all θ ∈ Θ, with strict inequality for some θ′ ∈ Θ.
A decision d ∈ D is dominated in loss if another decision yields a weakly smaller
loss under all parameter values, with a strictly smaller loss under some parameter
value.
14
Definition 4. Let P be the set of partitions of X , with generic element P ∈ P. Let
P∗ denote the subset of P such that for every cell Xp ∈ P ∈ P∗, each agent has at
least one decision d ∈ D that is optimal for every X ∈ Xp. That is,
P∗ =
{P ∈ P :
{∩X∈Xp arg min
d∈DEa [L (d, θ) |X]
}6= ∅ for all Xp ∈ P, a ∈ ∆ (Θ)
}.
The effective size of the sample space X , denoted N (X ), is the minimal size
of a partition in P∗
N (X ) = min {|P | : P ∈ P∗} .
The effective size of the sample space is the smallest number of cells into which
we can partition the sample space X such that knowing only which cell contains X
is sufficient for all agents to take optimal decisions. For example, such a partition
would group data realizations x and x′ that imply the same likelihood, i.e., for which
Prθ {X = x} = Prθ {X = x′} for all θ.
Claim 1. In the Example, any d < 0 is dominated in loss, and N (X ) = |XM | ×∣∣XM
∣∣ > |D|.Intuitively, any decision d < 0 is dominated by the decision d′ = 0 because decision
d′ is strictly closer to the optimal decision ATE (θ). At the same time any two distinct
realizations of the sufficient statistics imply distinct optimal actions for some agent,
so the effective size of the sample space is |XM | ×∣∣XM
∣∣.3.1 Admissibility
When there exists a dominated decision and N (X ) ≥ |D|, the communication and
decision models have non-overlapping sets of admissible rules.
Proposition 3. If (i) there exists a decision d ∈ D that is dominated in loss and
(ii) N (X ) ≥ |D| , then any rule c ∈ C that is admissible under the decision model is
inadmissible under the communication model, and vice versa.
The proof of Proposition 3 proceeds in two steps. The first step establishes that,
under the decision model, if d ∈ D is dominated in loss by d′ ∈ D, then any rule
15
c ∈ C that recommends d with positive probability is dominated by the rule c′ ∈ Cthat instead recommends d′.
The second step establishes that, under the communication model, if the effective
size of the sample space is weakly larger than the decision space, then any rule c ∈ Cwhich never uses some d ∈ D is dominated by another rule c′ ∈ C which sometimes
uses d. The reason is that, for any such c, there exists X ∗ ⊆ X such that (i) different
elements of X ∗ would lead some agent a to take different actions and (ii) c sometimes
assigns the same signal to all X ∈ X ∗. It follows that c can be dominated by a rule
c′ that reports d on the subset of X ∗ where a would like to take a particular action,
which strictly reduces the communication risk for agent a without increasing it for
any other agent.
Propositions 2 and 3 immediately establish conditions for a conflict between the
communication and classical models:
Corollary 1. If (i) there exists a decision d ∈ D that is dominated in loss and (ii)
N (X ) ≥ |D| , then any rule c ∈ C that is admissible under the classical model is
inadmissible under the communication model, and vice versa.
Claim 1 allows us to apply Proposition 3 and Corollary 1 to the Example.
Corollary 2. In the Example, there is no rule c ∈ C that is admissible under both the
decision model and the communication model, and no rule c ∈ C that is admissible
under both the classical model and the communication model.
Online Appendix B.1.1 develops a variant of this example with Gaussian data
and a continuous parameter space, and shows that decision and communication risk
induce conflicting dominance orderings.
It is also possible to modify the Example so the analyst reports estimates of
the treatment and control group means rather than of their difference. With this
modification the communication constraint is not binding, in the sense that it is
feasible to report a sufficient statistic for θ. The hypotheses of Proposition 3, and
hence the conclusions of Corollary 2, continue to hold for this alternative formulation.7
7Formally, let d =(d, d)∈ D = XM ×XM , and define the loss as
L (d, θ) =(d− d−ATE (θ)
)2.
16
Example. (Continued) Kruger et al. (1996) conducted an early randomized con-
trolled trial of the effect of anthelmintic therapy (deworming drugs) on children’s
growth. A separate randomization was used to study the effect of iron-fortified soup.
Children were weighed to the nearest 0.1kg at baseline and endline. Among children
who received unfortified soup, those receiving deworming drugs had a lower average
growth over the intervention period (mean weight gain of 0.9kg, n = 15) than those
receiving a placebo treatment (mean weight gain of 1.0kg, n = 14; see Table 4 of
Kruger et al. 1996). Kruger et al. (1996) state that “[Positive effects on weight gain]
can be expected with reduction in diarrhoea, anorexia, malabsorption, and iron loss
caused by parasitic infection” (p. 10). In a later review of the literature, Croke et al.
(2016) state that “there is no scientific reason to believe that deworming has negative
side effects on weight” (p. 19). If we interpret these statements to mean that the
average treatment effect is known to be non-negative, then censoring the estimated
treatment effect at 0 (i.e. reporting that the treatment and control groups experienced
the same average weight gain) would lead to an estimate strictly closer to the truth
than the negative estimate implied by the group means, and would therefore domi-
nate in mean squared error. However, Kruger et al. (1996) report the group means
and do not report a censored estimate. Indeed, among the four studies that Croke
et al. (2016) identify as implying negative point estimates of the effect of deworming
drugs on weight, none published a censored point estimate.8
The preceding analysis suggests one possible explanation. While each member of
the audience for the research might be happy with some censoring scheme, different
members of the audience would like different ones, and (under the conditions above)
By Claim 1, d− d < 0 is dominated in loss and N (X ) = |XM | ×∣∣XM
∣∣ ≥ |D|.8Croke et al. (2016, Figure 2) identify 4 negative point estimates out of a total of 22 reviewed.
These 4 negative point estimates are from 4 distinct studies (including Kruger et al. 1996), out of atotal of 20 distinct studies reviewed. Donnen et al. (1998, Table 2) report the regression-adjustedweight gains for a group treated with mebendazole and a control. They further report that thetreated group’s gain is statistically significantly below that of the control group at all time horizonsconsidered. Croke et al. (2016, Figure 2) report a statistically significant effect on weight gain of-0.45kg based on the data from Donnen et al. (1998). Miguel and Kremer (2004, Table V) reporttreatment and control group means of standardized weight-for-age and a statistically insignificantdifference in means of -0.00 to rounding precision. Croke et al. (2016, Figure 2) report a statisticallyinsignificant effect on weight of -0.76kg based on the data from Miguel and Kremer (2004). Awasthiet al. (2000, Table 1) report treatment and control group means of weight gain and report thatthese are not statistically different. Croke et al. (2016, Figure 2) report a statistically insignificanteffect of -0.05kg based on the data from Awasthi et al. (2000).
17
any single censoring scheme sacrifices decision-relevant information for some audience
member. N
3.2 ω-optimality
In the classical setting, a rule c is a Bayes decision rule (or Bayes estimator) if it
prescribes the optimal decision given the data X for a Bayesian agent with some
proper prior (e.g., Lehmann and Casella 1998, p. 6; Robert 2007, p. 63). Such decision
rules have strong optimality properties in the classical setting. In particular, Complete
Class Theorems show that in many settings any rule that cannot be expressed as Bayes
is dominated by one that can be. These Theorems have been invoked as part of the
justification for using Bayes procedures (e.g., Robert 2007, p. 512).
Under the decision model any rule that is ω-optimal is a Bayes decision rule in the
classical sense. The reason is that the weighted average decision risk for the audience
can be expressed as the decision risk for a single agent with a weighted average prior
aω (θ) =∫
∆(Θ)a (θ) dω (a):
∫∆(Θ)
Ra (c) dω (a) =∑θ∈Θ
(Eθ [L (c (X, V ) , θ)]
∫∆(Θ)
a (θ) dω (a)
). (2)
Unlike decision risk, communication risk does not generally admit a representation like
(2). Nonetheless, we show in Online Appendix B.6 that a Complete Class Theorem
holds in both the decision model and the communication model.9
Propositions 1 and 3 establish conditions under which the set of ω-optimal rules
in the communication model does not overlap with the set of ω-optimal rules in the
decision model or with the set of Bayes decision rules with respect to full-support
priors.
Corollary 3. If (i) there exists a decision d ∈ D that is dominated in loss and (ii)
N (X ) ≥ |D| , then:
9To describe this result, define ω-optimality analogously to ω-optimality, except that the supportof ω may be a strict subset of ∆ (Θ). Unlike ω-optimality, ω-optimality need not imply admissibility.The Complete Class Theorem states that any rule that is not ω-optimal is dominated by some ω-optimal rule.
18
(a) Any rule c ∈ C that is ω-optimal under the decision model is inadmissible, and
therefore not ω′-optimal for any ω′, under the communication model, and vice versa.
(b) Any rule c ∈ C that can be expressed as a Bayes decision rule with respect to a
prior with full support on Θ is inadmissible, and therefore not ω-optimal for any ω,
under the communication model.
(c) Any rule c ∈ C that is ω-optimal under the communication model is not an
optimal decision rule for any agent with a full-support prior, a ∈ int (∆ (Θ)).
Corollary 3 implies a very strong sense in which the communication model predicts
different ω-optimal rules than the decision (and hence classical) model. This arises
because (under the conditions of the corollary) Bayes decision rules with respect to
full-support priors discard actionable information.
Example. (Continued) Suppose that Kruger et al. (1996) had selected a full-support
prior a ∈ int (∆ (Θ)) and reported the associated Bayes decision. By construction,
under both the decision and communication models, this rule is optimal for audience
members with prior a. Moreover, under the decision model, the rule is optimal with
respect to some full-support Pareto weights ω on the audience, and hence admissible.
Under the communication model, however, this rule is not optimal with respect to
any such Pareto weights, and is inadmissible. The reason is that any such Bayes
decision rule never selects d < 0, and so censors the data unnecessarily.
Figure 1 illustrates this intuition in a toy numerical example with F0 (t) a Bernoulli
distribution with success probability t, Θ0 = {0.3, 0.7}, and hence
Θ = {(0.3, 0.3) , (0.3, 0.7) , (0.7, 0.7)}. The upper-left plot shows the Bayes decision
rule c for a particular full-support prior a ∈ int (∆ (Θ)). The rule c never makes a
negative report, and is therefore dominated in communication risk by another rule c′
that does sometimes make a negative report, shown in the upper-right plot. The lower
plot shows, for each a, the normalized difference R∗a (c) − R∗a (c′) in communication
risk between the rule c and the dominating rule c′.
The dominating rule c′ achieves weakly lower communication risk than the rule c
for all agents a ∈ ∆ (Θ). Because rule c is the Bayes decision rule for agent a, the
dominating rule c′ achieves the same communication risk as c for this agent. The same
holds for other agents who, like agent a, put a lot of prior mass on the parameter
19
values {(0.3, 0.3) , (0.7, 0.7)} under which the intervention is ineffective. However,
for other agents, such as a′, who put more prior mass on the possibility that the
intervention is effective, the dominating rule c′ achieves strictly lower communication
risk than the rule c. Intuitively, such agents value knowing when X < X, i.e. when
the evidence for gains from treatment is especially weak.
Croke et al. (2016, Figure 2) review 20 distinct studies reporting on randomized
controlled trials of the effects of deworming drugs on children’s growth. None of these
reports a Bayesian posterior mean for a proper prior, or other explicitly Bayesian
estimate of the effect of deworming on weight. Our analysis illustrates one possible
reason for this, which is that reporting a Bayes decision with respect to one prior can
sacrifice information useful to an agent with a different prior. N
3.3 Maximum Risk
The set of minimax rules in the communication model nests that in the decision
model.
Theorem 1. Any rule c∗ ∈ C that is minimax under the decision (or classical) model
is minimax under the communication model.
An intuition for the proof is as follows. Pick some rule c∗ ∈ C that is minimax in
decision risk. Because the set of priors ∆ (Θ) is convex, we can show (following results
in Grunwald and Dawid 2004, who in turn build on the classic minimax theorem of
von Neumann 1928) that there exists some worst-off agent a∗ ∈ ∆ (Θ) for whom c∗
is optimal in decision risk. For this agent a∗, it follows that c∗ must also be optimal
in communication risk, because the agent’s optimal decision following any report
c∗ (X, V ) will simply be the report itself. But because this agent a∗ is worst-off in
decision risk, it follows that this same agent must also be worst-off in communication
risk. Collecting this reasoning gives:
infc∈C
supa∈∆(Θ)
Ra (c) = infc∈C
Ra∗ (c) = Ra∗ (c∗) = R∗a∗ (c∗) = infc∈C
R∗a∗ (c) = infc∈C
supa∈∆(Θ)
R∗a (c) .
Hence c∗ is minimax in communication risk. Convexity of the audience is important
20
Figure 1: Illustration of a Bayes decision rule in running example
Bayes decision rule c Decensored Bayes decision rule c′
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0.1
0.1
0.1
0.1
0.1
0
0
0
0
0
0
0.1
0.2
0.2
0.2
0.2
0
0
0
0
0
0
0.2
0.3
0.3
0.3
0.3
0
0
0
0
0
0
0.2
0.3
0.4
0.4
0.4
0
0
0
0
0
0
0.2
0.3
0.4
0.4
0.4
0
0
0
0
0
0
0.2
0.3
0.4
0.4
0.4
0
0
0
0
0
0
0.2
0.3
0.4
0.4
0.4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0.2
-0.1
0
-0.2
-0.1
0
0
-0.2
-0.1
0
0
0
-0.2
-0.1
0
0
0
0
-0.2
-0.1
0.1
0.1
0.1
0.1
0.1
-0.2
-0.1
0
0.1
0.2
0.2
0.2
0.2
-0.2
-0.1
0
0
0.2
0.3
0.3
0.3
0.3
-0.2
-0.1
0
0
0
0.2
0.3
0.4
0.4
0.4
-0.2
-0.1
0
0
0
0
0.2
0.3
0.4
0.4
0.4
-0.1
0
0
0
0
0
0.2
0.3
0.4
0.4
0.4
0
0
0
0
0
0
0.2
0.3
0.4
0.4
0.4
-1
-0.9
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.9
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.7
-0.6
-0.5
-0.4
-0.3
-0.6
-0.5
-0.4
-0.3
-0.5
-0.4
-0.3
-0.4
-0.3
-0.3
Communication risk of Bayes decision rulerelative to decensored rule
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Note: The plots illustrate properties of a Bayes decision rule in the running example withΘ0 = {0.3, 0.7} and thus Θ = {(0.3, 0.3) , (0.3, 0.7) , (0.7, 0.7)}. We set n = 20, n = 10,and X0 = {0, 1}, so that XM = XM = {0, 0.1, ..., 1} and D = {−1,−0.9, ..., 1}. Weassume that F0 (t) is a Bernoulli distribution with success probability t. The upper leftplot shows the Bayes decision rule c for a particular full-support prior a ∈ int (∆ (Θ)),specifically a = (0.1, 0.1, 0.8). The upper right plot shows a rule c′ such that c′ (X) =c (X)+1 {c (X) = 0}min
{0, X −X
}. The lower plot shows the difference in communication
risk R∗a (c) − R∗a (c′) between the two rules for each agent a ∈ ∆ (Θ), normalized by themaximum communication risk (over all agents) from a report of the full data. In the plot,we label the prior a, for which R∗a (c) = R∗a (c′), as well as another prior a′, for whichR∗a′ (c) > R∗a′ (c
′).
21
for this line of reasoning, because without it we cannot guarantee that the worst-case
prior corresponds to an agent a∗ in the audience.
Although minimax rules under the decision model are guaranteed to be minimax
under the communication model, they need not be appealing if the analyst cares about
more than the worst-case risk. Recall that by Proposition 1 and Lemma 1, both the
decision and communication models admit a nonempty set of minimax admissible
rules. Under the conditions of Proposition 3, these sets do not intersect, and thus
there is no rule that is minimax admissible under both models.
Corollary 4. If (i) there exists a decision d ∈ D that is dominated in loss and (ii)
N (X ) ≥ |D| , then any rule c∗ ∈ C that is minimax and admissible under the decision
(or classical) model is minimax and inadmissible under the communication model.
Example. (Continued). In the Example, any rule c∗ that is minimax under the
decision model reports d ≥ 0 with probability one. Hence, all rules that are minimax
under the decision model are inadmissible under the communication model in this
setting. Figure 2 continues our numerical illustration. The upper-left plot shows
a minimax decision rule c∗ in this numerical example. The rule c∗ never makes a
negative report, and is therefore dominated in communication risk by another rule c∗∗
that does sometimes make a negative report, shown in the upper-right plot. The lower
plot shows, for each a, the normalized difference R∗a (c∗)−R∗a (c∗∗) in communication
risk between the rule c∗ and the dominating rule c∗∗. The worst-off agent a∗, and other
agents who likewise put significant mass on parameters under which the treatment
is ineffective, do not benefit in communication risk from the dominating rule c∗∗.
However, agents who, like the labeled agent a∗∗, believe a priori that the treatment is
very likely effective, benefit in communication risk from rule c∗∗ because overturning
their priors requires strong evidence of ineffectiveness. Such evidence can be provided
by reporting the treatment-control difference X −X in cases where this difference is
negative. N
22
Figure 2: Illustration of minimax decision rule in running example
Minimax decision rule c∗ Decensored minimax decision rule c∗∗
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0
0
0
0
0.1
0.2
0.2
0.2
0.3
0.3
0.3
0
0
0
0
0.1
0.2
0.3
0.4
0.4
0.4
0.4
0
0
0
0
0.1
0.2
0.4
0.4
0.4
0.4
0.4
0
0
0
0
0.1
0.3
0.4
0.4
0.4
0.4
0.4
0
0
0
0
0.1
0.3
0.4
0.4
0.4
0.4
0.4
0
0
0
0
0.1
0.3
0.4
0.4
0.4
0.4
0.4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0.2
-0.1
0
-0.2
-0.1
0
0
-0.2
-0.1
0
0
0
-0.2
-0.1
0
0
0
0
0.1
0.1
0.1
0.1
0.1
0.1
0.1
-0.2
0.1
0.2
0.2
0.2
0.3
0.3
0.3
-0.2
-0.1
0.1
0.2
0.3
0.4
0.4
0.4
0.4
-0.2
-0.1
0
0.1
0.2
0.4
0.4
0.4
0.4
0.4
-0.2
-0.1
0
0
0.1
0.3
0.4
0.4
0.4
0.4
0.4
-0.1
0
0
0
0.1
0.3
0.4
0.4
0.4
0.4
0.4
0
0
0
0
0.1
0.3
0.4
0.4
0.4
0.4
0.4
-1
-0.9
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.9
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.7
-0.6
-0.5
-0.4
-0.3
-0.6
-0.5
-0.4
-0.3
-0.5
-0.4
-0.3
-0.4
-0.3
-0.3
Communication risk of minimax rulerelative to decensored rule
0
2
4
6
8
10
12
14
10-3
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Note: The plots illustrate properties of a minimax decision rule in the running examplewith Θ0 = {0.3, 0.7} and thus Θ = {(0.3, 0.3) , (0.3, 0.7) , (0.7, 0.7)}. We set n = 20, n = 10,and X0 = {0, 1}, so that XM = XM = {0, 0.1, ..., 1} and D = {−1,−0.9, ..., 1}. Weassume that F0 (t) is a Bernoulli distribution with success probability t. The upper leftplot shows the minimax decision rule c∗. The upper right plot shows a rule c∗∗ such thatc∗∗ (X) = c∗ (X) + 1 {c∗ (X) = 0}min
{0, X −X
}. The lower plot shows the difference in
communication risk R∗a (c∗) − R∗a (c∗∗) between the two rules for each agent a ∈ ∆ (Θ),normalized by the maximum communication risk (over all agents) from a report of the fulldata. In the plot, we label the prior a∗ of the worst-off agent, for which R∗a∗ (c∗) = R∗a∗ (c∗∗),as well as another prior a∗∗, for which R∗a∗∗ (c∗) > R∗a∗∗ (c∗∗).
23
4 Applications
In this section we apply our model to two types of scientific situations studied in the
recent econometrics literature. In both situations we show that the implications of
the decision and communication models are different, and we argue that the commu-
nication model is more consistent with some observed scientific practices.
4.1 Estimation of a Monotone Function
Our first application is to the estimation of a function that is known to be mono-
tonically decreasing, which is a special case of estimation under shape restrictions.
Shape restrictions arise in many important economic situations, and econometric pro-
cedures that exploit these restrictions can offer improved performance according to
conventional criteria (see, e.g., the review by Chetverikov et al. 2018).
As a concrete example, consider estimating the effect of the length of a job-seeker’s
current spell of unemployment on the probability that her resume receives a callback.
Following Oberholzer-Gee (2008), Kroft et al. (2013), and Eriksson and Rooth (2014),
we may imagine that the analyst conducts an audit study, submitting artificial re-
sumes with randomly assigned employment histories, and recording the responses
from potential employers. Multiple classes of economic models (reviewed, for ex-
ample, in Kroft et al. 2013, Section II) imply that a longer current unemployment
spell makes an applicant less attractive to an employer, which can be interpreted
as a monotonicity restriction that the conditional probability of a callback is weakly
decreasing in the duration of the current unemployment spell.
Formally, let Yi ∈ {0, 1} denote a binary outcome, and suppose we are interested
in the conditional mean E [Yi|Zi] of Yi given some discrete Zi ∈ Z = {z1, ..., zJ}, with
z1 < z2 < ... < zJ . In the resume audit setting, we can think of Yi as an indicator
for whether resume i generates a callback, and Zi as the duration (in months) of the
fictitious applicant’s current unemployment spell.
The analyst observes n ≥ 3 independent draws of Yi for each predictor value zj.
Denote the fraction of successes when Zi = zj by Xj ∈{
0, 1n, ..., 1
}. The number of
successes nXj follows a binomial distribution and it is without loss to represent the
24
data as the vector of success rates,
X = (X1, ..., XJ) ∈ X =
{0,
1
n, ..., 1
}J.
The unknown parameter is the vector θ = (θ1, ..., θJ) of conditional means θj =
E [Y |Z = zj]. We assume that conditional mean is weakly decreasing in j
θ1 ≥ θ2 ≥ ... ≥ θJ . (3)
We assume that θj ∈ Θ0 ⊂ (0, 1) for each j, for Θ0 a finite set such that max {Θ0} −min {Θ0} > 2/3. The parameter space is thus
Θ ={θ ∈ ×Jj=1Θ0 : θ1 ≥ θ2 ≥ ... ≥ θJ
}.
We take the decision space to equal the sample space, D = X , so the communi-
cation constraint is not binding in the sense that it is feasible to report the full data
X. Finally, we suppose that the objective is to estimate θ, which we formalize by
considering the quadratic loss
L (d, θ) = ‖d− θ‖22 =
∑j
(dj − θj)2 .
The results of this section hold for any loss of the form L (d, θ) = ‖d−θ‖pp or L (d, θ) =
‖d− θ‖p for p > 1.
A natural rule is to simply report the data, c (X, V ) = X. All audience members
agree that such a report is unbiased, in the sense that Ea [X − θ] = 0 for all a ∈ ∆ (Θ) .
However, Xj need not be decreasing in j, and indeed Prθ {Xj > Xj−1} > 0 for all j
and all θ ∈ Θ. Hence, this rule does not respect the restrictions on the parameter
space. Let d(j) sort the elements of d in decreasing order, so d(1) is the largest and
d(J) the smallest. Define
d∗j (d) = d(j)
as the decision which sorts the elements in d in decreasing order. The following result
is immediate from Chernozhukov et al. (2009).
25
Claim 2. Any decision d with dj > dj−1 for some j is dominated in loss by d∗ (d).
At the same time, one can show that any two distinct realizations of the data
imply distinct optimal actions for some agent, which implies the following.
Claim 3. The effective size of the sample space is equal to the size of the sample and
decision spaces, N (X ) = |X | = |D| .
It follows by Proposition 3 that there is no rule c ∈ C that is admissible under
both the decision model and the communication model, and no rule c ∈ C that is
admissible under both the classical model and the communication model. Online
Appendix B.1.2 develops a version of this example with X normally distributed and
a continuous parameter space Θ0 ⊆ R, and shows that decision and communication
risk again imply different dominance orderings.
Figure 3 illustrates the conflict between the goals of communication and decision
in a numerical example of this application. The top row of plots shows three example
realizations of the data X that all correspond to the same rearranged decision d∗ (X),
which is depicted in the bottom plot. The rearranged decision d∗ (X) dominates a
decision of d (X) = X in loss for any X for which Xj > Xj−1 for some j, and hence for
the three realizations X depicted in the top row of plots. However, by pooling several
different realizations of the data into the same decision, the rearranged decision d∗ (X)
sacrifices information that is decision-relevant for some members of the audience.
Kroft et al. (2013, Figure 2, top panel) report the average callback rate as a
function of months unemployed, which we may think of as analogous to reporting
X, and which violates monotonicity in their data. Kroft et al. (2013, Figure 3) also
report a sorted local linear regression estimate based on Chernozhukov et al (2009),
which imposes monotonicity, and which we may think of as analogous to reporting
d∗ (X). Eriksson and Rooth (2014, Table 5) report the callback rate as a function of
months unemployed in the current spell, which violates monotonicity in their data.
Eriksson and Rooth do not report any sorted or monotonicity-constrained estimates
of this relationship.10
10Oberholzer-Gee (2008, Table 2) reports a linear probability model relating the callback proba-bility to the number of months unemployed in the current spell, which respects monotonicity in hisdata among those currently unemployed.
26
Figure 3: Illustration of dominating decision in estimation of a monotone function
Example data realizations corresponding to the same dominating decision
0 2 4 6 8 10 12
0.05
0.1
0.15
0.2
0 2 4 6 8 10 12
0.05
0.1
0.15
0.2
0 2 4 6 8 10 12
0.05
0.1
0.15
0.2
Dominating decision
0 2 4 6 8 10 12
0.05
0.1
0.15
0.2
Note: The plots illustrate properties of the dominating decision in the estimation of afunction that is known to be monotonically decreasing. The top row of plots depicts threepossible realizations of the callback rates X = (X1, ..., XJ). These are the means of a binaryoutcome Yi conditional on the unemployment duration Zi ∈ Z = {z1, ..., zJ} = {1, ..., 12}.For each realization X, the decision d = X is dominated in (quadratic) loss by the decisiond = d∗ (X), which is depicted in the bottom plot. Under the dominating decision d∗ (X),all of the realizations X depicted in the top row of plots are reported identically, with thereport depicted in the bottom plot.
27
The fact that both articles report estimates that violate monotonicity seems more
consistent with the predictions of the communication model than with the predic-
tions of the decision model. The fact that Kroft et al. (2013) additionally report
sorted estimates seems consistent with the predictions of the decision model, perhaps
suggesting a concern for both communication and decision-making in their context.
4.2 Optimal Treatment Assignment
Our second application is to optimal treatment assignment. Manski (2004) studies
the problem of assigning one of a finite set of treatments to each member of some
population, potentially as a function of a discrete covariate.
As a concrete example, consider the problem of an analyst who must make a
clinical recommendation to an audience of physicians on the basis of the available
evidence. This is a problem faced by many organizations, including medical learned
societies and private publishers. Say that each physician’s goal is to achieve the best
average outcome for patients with each of a given set of attributes (e.g., diagnosis).
We suppose these attributes are discrete, as in Manski (2004), and study the problem
of recommending treatment to patients in a given attribute cell.
Formally, denote the available treatments (e.g., medications) by t ∈ {1, ..., T} for
T ≥ 2. Suppose that n ≥ 1 units (e.g., patients) are randomly allocated to each
treatment t, and that for each unit i the analyst measures a binary outcome Yi (e.g.,
an indicator for the resolution of symptoms). Let us further assume patient outcomes
are independent, so it is without loss to represent the data for treatment t as a fraction
of successes Xt ∈{
0, 1n, ..., 1
}, with nXt following a binomial distribution. The sample
space is then
X =
{0,
1
n, ..., 1
}T.
The unknown parameter is (θ1, ..., θT ) where θt = E [Xt] denotes the success proba-
bility for units assigned to treatment t. We assume that θt lies in a finite set Θ0 ⊂ (0, 1)
with |Θ0| ≥ 2, so the parameter space is Θ = ΘT0 ⊂ (0, 1)T . We show in Online Ap-
pendix B.1.3 that our results for this application continue to hold for continuous
parameter spaces Θ0 = (0, 1) or Θ0 = [0, 1].
The agent’s (e.g., physician’s) decision consists of picking a treatment or declining
28
do so. Formally we take the decision space to be D = {1, ..., T} ∪ {ι} where ι
corresponds to not picking a treatment. The agent’s objective is to pick the best
treatment which, following Manski (2004), we formalize by considering the regret loss
L (d, θ) =
−θd + maxt θt if d 6= ι
maxt θt if d = ι.
Declining to pick a treatment yields greater loss than picking any given treatment
(e.g., because the patient cannot self-prescribe). The decision space in this case is too
small to convey the full data, since T + 1 = |D| < |X | = (n+ 1)T .
Classical decision-theoretic results for selection problems (Lehmann 1966, Eaton
1967) imply the following:
Claim 4. Consider the rule c∗ that takes c∗ (X, V ) = arg maxtXt if the argmax is
unique and otherwise randomizes uniformly over arg maxtXt. This rule minimizes
decision risk uniformly over ∆ (Θ) among rules that are invariant with respect to
permutations of the treatments. Moreover, it is a Bayes decision rule for the agent a∗
with a∗ (θ) = 1|Θ| for all θ ∈ Θ, and is minimax under the decision model.
The rule c∗ is a special case of what Manski (2004) terms the“conditional empirical
success” rule, and is related to the empirical welfare maximization procedures studied
by Kitagawa and Tetenov (2018) and Athey and Wager (2019). The results of Stoye
(2009) imply that c∗ is a minimax decision rule for regret loss in the case of T = 2.
Claim 4 shows that the rule c∗ is an appealing decision rule in several senses, and
together with Theorem 1 implies that c∗ is minimax under the communication model.
The rule c∗ is nevertheless inadmissible under the communication model. To see this,
note first the following properties of the setting:
Claim 5. The decision d = ι is dominated in loss, and the effective size of the sample
space is at least as large as the decision space, N (X ) ≥ |D| .
It follows by Proposition 3 that there is no rule c ∈ C that is admissible under
both the decision model and the communication model, and no rule c ∈ C that is
admissible under both the classical model and the communication model. Since, by
Claim 4, c∗ is a Bayes decision rule for the full-support prior a∗ (θ) = 1|Θ| , it follows
29
that c∗ is admissible under the decision model and therefore, by Proposition 3 and
Corollary 3(b), inadmissible, and not ω-optimal for any ω, under the communication
model.
To understand why rule c∗ is inadmissible under the communication model, note
the following:
Claim 6. Under the communication model, the rule c∗ is dominated by the rule c
that takes c (X, V ) = ι if arg maxtXt = {1, ..., T} and c (X, V ) = c∗ (X, V ) otherwise.
When arg maxtXt = {1, ..., T}, the data are uninformative about which treatment
is best, in the sense that the likelihood for a given parameter value θ is unchanged if
we permute which θt is associated to which treatment. The rule c reflects this fact by
reporting ι, while the rule c∗ instead makes a guess at random. From the standpoint
of decision-making, the rule c∗ selects an undominated decision while the rule c may
not. From the standpoint of communication, the rule c∗ obscures information that
the rule c does not. The information that c∗ obscures is useful to agents whose
priors are such that they wish to follow the empirical success rule only when the
data are informative. Online Appendix B.7 extends the analysis to demonstrate a
case in which the communication model favors reporting ι even when the data are
informative, provided the amount of information in the data is small in comparison
to the audience’s beliefs.
In practice, analysts in situations like the one we have modeled sometimes ex-
press their ignorance rather than choosing a concrete recommendation at random.
UpToDate is a private publisher that synthesizes medical research into clinical rec-
ommendations. As in the communication model, readers of these recommendations
include practitioners who are free to make different clinical decisions. On the choice
among selective serotonin reuptake inhibitors (SSRIs) to treat unipolar major depres-
sion in adults, UpToDate says “Given the lack of clear superiority in efficacy among
antidepressants, selecting a drug is based on other factors, such as ... patient prefer-
ence or expectations” (Simon 2019). Such a report seems more similar to c than to
c∗, and thus more consistent with the predictions of the communication model than
with the predictions of the decision model.
30
5 Conclusions
We propose a model of scientific communication in which the analyst’s report is
designed to convey useful information to the agents in the audience, rather than, as
in a classical model of statistics, to make a good decision or guess on these agents’
behalf. We show conditions under which the proposed model predicts very different
reporting rules from the classical model. We offer examples satisfying these conditions,
and we argue that in practical situations similar to these examples, scientists’ reports
seem to be more consistent with the predictions of the proposed model than with the
predictions of the classical model.
Proofs
This appendix provides proofs for the main results in the paper. Online Appendix
A proves results for the Example and applications. Online Appendix B discusses
extensions and additional results.
Proof of Proposition 1 We first prove the existence of an ω-optimal rule. Since
the set of risk functions {ρ (c, ·) : c ∈ C} is compact in the supremum norm, the set
of weighted average risks{∫A ρ (c, a) dω (a) : c ∈ C
}is compact as well, and thus has
a smallest element. This implies the existence of an ω-optimal rule.
We next show that any ω-optimal rule is admissible. Suppose that the result fails,
so there exists some c′ ∈ C that is ω-optimal but inadmissible. ω-optimality implies
that ∫Aρ (c′, a) dω (a) = min
c∈C
∫Aρ (c, a) dω (a) .
Inadmissibility implies the existence of c′′ ∈ C such that ρ (c′, a) ≥ ρ (c′′, a) for all
a ∈ A and ρ (c′, a′) > ρ (c′′, a′) for some a′ ∈ A. By continuity of ρ (c, a) in a,
ρ (c′, a) > ρ (c′′, a) on an open neighborhood of a′ in ∆ (Θ) (which may or may not
be contained in A). The full support assumption on ω implies that the intersection
31
of this open neighborhood with A has positive ω measure,11 and thus that∫Aρ (c′, a) dω (a) >
∫Aρ (c′′, a) dω (a) .
Hence, we have achieved a contradiction and proved the result.
Since we have shown that an ω-optimal rule exists for any ω, and that ω-optimal
rules are admissible, it follows that an admissible rule exists.
We next prove the existence of minimax rules. To that end, note that A is a closed
subset of a finite-dimensional simplex and so is compact. Since ρ (c, a) is continuous
in a and A is compact, supa∈A ρ (c, a) = maxa∈A ρ (c, a) exists for all c ∈ C. Moreover,
since the set of risk functions {ρ (c, ·) : c ∈ C} is compact in the supremum norm, the
set {maxa∈A ρ (c, a) : c ∈ C} is compact, and so has a minimum, which implies the
existence of a minimax rule.
Finally, to prove the existence of an admissible minimax rule, define the set C0 ⊆ Cof minimax rules and note that the set of minimax risk functions {ρ (c, ·) : c ∈ C0} is
a closed subset of a compact set, and thus is compact. Let us take a countable dense
subset of A, {a1, a2, ...}. For each j ≥ 1, let Cj denote the subset of rules in Cj−1 with
minimal risk at aj,
Cj =
{c ∈ Cj−1 : ρ (c, aj) = min
c′∈Cj−1
ρ (c′, aj)
},
where the min on the right hand side is achieved by compactness of {ρ (c, ·) : c ∈ Cj−1}.{ρ (c, ·) : c ∈ Cj} is a closed subset of a compact set, and so is again compact. More-
over, {ρ (c, ·) : c ∈ Cj} is non-empty by construction for all j, so ∩j {ρ (c, ·) : c ∈ Cj}is non-empty by Cantor’s Intersection Theorem. Any rule c∗ ∈ ∩jCj ⊆ C0 is minimax
by definition but must also be admissible, since otherwise (by continuity of ρ in a)
it would have been dropped at some finite step j. Thus, there exists an admissible
minimax rule. 2
Lemma 1. Decision risk Ra (c) and communication risk R∗a (c) are both continu-
ous in a for all c ∈ C. Moreover, the sets of risk functions {R· (c) : c ∈ C} and
11Since a′ is in the support of ω by assumption, we know that ω assigns positive mass to all openneighborhoods of a′.
32
{R∗· (c) : c ∈ C} are compact in the supremum norm.
Proof of Lemma 1 We first show continuity of decision risk. Note that the decision
risk of a rule c can be written as
∑Θ
a (θ) Eθ [L (c (X, V ) , θ)]
where L (d, θ) is uniformly bounded. Hence, the decision risk is trivially continuous,
and indeed Lipschitz, in a.
We next show continuity of communication risk. To this end, note that for any
fixed c ∈ C, if we define B to be the set of mappings b : D × [0, 1] → D then for any
c ∈ C and any a ∈ ∆ (Θ) we have
R∗a (c) = minb∈B
Ea [L (b (c (X, V ) , V ) , θ)] = minb∈B
Ra (b ◦ c) ,
where the min is achieved by a binding rule ba ∈ B with
ba (c (X, V ) , V ) ∈ mind∈D
Ea [L (d, θ) |c (X, V ) , V ]
for all (X, V ). However, the same argument used to establish continuity of decision
risk implies that Ea [L (b (C, V ) , θ)] is continuous in a. Indeed,
supc,b |Ea [L (b (c (X, V ) , V ) , θ)]− Ea [L (b (c (X, V ) , V ) , θ)]| ≤2 maxd,θ |L (d, θ)|
∑Θ |a (θ)− a (θ)|
(4)
so Ea [L (b (c (X, V ) , V ) , θ)] is Lipschitz in a (with respect to the L1-norm) uniformly
over b, c. This implies that minb∈B Ea [L (b (c (X, V ) , V ) , θ)] is Lipschitz in a as well,
since for ba the optimal rule for a, with
minb∈B
Ea [L (b (c (X, V ) , V ) , θ)] = Ea [L (ba (c (X, V ) , V ) , θ)] = R∗a (c) ,
we have
Ea [L (ba (c (X, V ) , V ) , θ)] ≥ Ea [L (ba (c (X, V ) , V ) , θ)] = R∗a (c)
33
Ea [L (ba (c (X, V ) , V ) , θ)] ≥ Ea [L (ba (c (X, V ) , V ) , θ)] = R∗a (c) ,
and thus
|R∗a (c)−R∗a (c)| ≤ 2 maxd,θ|L (d, θ)|
∑Θ
|a (θ)− a (θ)| .
Hence, communication risk is Lipschitz in a, with the same Lipschitz constant as
decision risk.
To prove compactness, note that both the decision and communication risk func-
tions are uniformly bounded by maxd,θ |L (d, θ)|, and that the set of beliefs A is com-
pact with respect to the L1-norm∑
Θ |a (θ)− a (θ)|. Thus, the sets of decision and
communication risk functions are bounded Lipschitz functions on a compact domain,
and so are compact in the supremum norm by the Arzela–Ascoli Theorem. 2
Proof of Proposition 2 For admissibility, note that rule c dominates rule c′ in the
classical model if and only if
Eθ [L (c (X, V ) , θ)] ≤ Eθ [L (c′ (X, V ) , θ)] for all θ ∈ Θ (5)
with strict inequality for some θ. Note, however, that this implies
Ea [L (c (X, V ) , θ)] ≤ Ea [L (c′ (X, V ) , θ)]
for all a, with strict inequality for full-support a, and thus that c dominates c′ in
the decision model. Conversely, if c dominates c′ in the decision model, then (5) is
immediate. Moreover, there must be a strict inequality for some θ, or else the decision
risk functions for c and c′ would be the same.
For ω-optimality, for any full-support weights ω on ∆ (Θ), define
aω (θ) =
∫∆(Θ)
a (θ) dω (a)
as the implied weight on Θ. Note that if ω has full support on ∆ (Θ) then aω has full
support on Θ, and thus that if a rule c is optimal in the decision model with respect
to the full-support weights ω, it is optimal in the classical model with respect to the
full-support weights aω. Conversely, for any full-support weights aω in the classical
34
model, let ν = minθ∈Θ aω (θ). Let η denote the uniform weight on ∆ (Θ), and let ω
correspond to a mixture with weight ν |Θ| on η and weight 1−ν |Θ| on the degenerate
weight which puts mass one on a (θ) = aω(θ)−ν1−ν|Θ| . By construction the weight ω has full
support, and any rule that is optimal in the classical model with respect to aω is
optimal in the decision model with respect to ω.
Finally note that the maximum risk in the decision model is the same as that in
the classical model, in the sense that for all c ∈ C,
maxa∈A
Ea [L (c (X, V ) , θ)] = maxθ∈Θ
Eθ [L (c (X, V ) , θ)] .
Hence, equivalence of minimax rules in the two models is immediate. 2
To prove several results that follow, it is helpful to restrict attention to the class
of rules that are non-random conditional on the data and the public randomization
device (that is, rules c : X × [0, 1]→ D, rather than c : X × [0, 1]→ ∆ (D)). We can
do so without increasing either decision or communication risk.
Lemma 2. For any c : X × [0, 1] → ∆ (D), there exists a publicly-randomized rule
c : X×[0, 1]→ D such that Ra (c) = Ra (c) and R∗a (c) ≤ R∗a (c), both for all a ∈ ∆ (Θ).
Proof of Lemma 2 Note that we can construct a U [0, 1] random variable U inde-
pendent of X and V such that for some function c : X × [0, 1]2 → D, the conditional
distribution of c (X, V, U) given (X, V ) coincides with that of c (X, V ).12
Next, note that using V we can generate two independent U [0, 1] random variables
(V1, V2), e.g. by taking alternating terms in the decimal expansion of V . Let us define
c (X, V ) = c (X, V1, V2) , noting that c (X, V ) is non-random conditional on (X, V ) .
The conditional distribution of c (X, V ) given X is the same as the conditional dis-
tribution of c (X, V ) given X for all X by construction. Hence, Eθ [L (c (X, V ) , θ)] =
Eθ [L (c (X, V ) , θ)] for all θ ∈ Θ, which implies that Ra (c) = Ra (c) for all a ∈ ∆ (Θ),
12For example, numbering the elements of D as d1, ..., d|D| and defining pj (X,V ; c) =Pr {c (X,V ) = dj |X,V } , conditional on c (X,V ) = dj let us take U to be uniformly distributed
on[∑
j′<j pj′ (X,V ; c) ,∑
j′≤j pj′ (X,V ; c)]. Note that U ∼ U [0, 1] conditional on (X,V ) for all
(X,V ), and so is independent of (X,V ) . If we then define c so that c (X,V, U) = dj if and only if
U ∈[∑
j′<j pj′ (X,V ; c) ,∑
j′≤j pj′ (X,V ; c)], the conditional distribution of c (X,V, U) coincides
with that of c (X,V ).
35
and so establishes the result for decision risk.
For communication risk, as in the proof of Lemma 1 above, we define B to be the
set of mappings b : D × [0, 1] → D and let ba denote the optimal rule for a. Let c
denote the (potentially randomized) rule considered in the statement of the lemma,
and ba a corresponding binding rule. Note that by the definition of c
R∗a (c) = minb∈B
Ra (b ◦ c) = Ra (ba ◦ c) = Ra (ba ◦ c) ≥ minb∈B
Ra (b ◦ c) = R∗a (c) ,
so c has weakly lower communication risk than c.13 2
Proposition 3 extends to cases with general audiences A 6= ∆ (Θ). To state the
result for this more general case, we first need to define dominance in loss and the
effective size of the sample space for a general audience A.
Definition 5. A decision d ∈ D is dominated in loss if there exists d′ ∈ D such
that L (d, θ) ≥ L (d′, θ) for all θ ∈ Θ, with strict inequality for some θ′ such that
Pra′ {θ = θ′} > 0 for at least one agent a′ ∈ A.
Definition 6. Let P be the set of partitions of X , with generic element P ∈ P. Let
P∗A denote the subset of P such that for every cell Xp ∈ P ∈ P∗A, each agent a ∈ Ahas at least one decision d ∈ D that is optimal for every X ∈ Xp. That is,
P∗A =
{P ∈ P :
{∩X∈Xp arg min
d∈DEa [L (d, θ) |X]
}6= ∅ for all Xp ∈ P, a ∈ A
}.
The effective size of the sample space X for audience A, denoted N (X ,A), is
the minimal size of a partition in P∗A
N (X ,A) = min {|P | : P ∈ P∗A} .13To see why this inequality can be strict, it suffices to consider an example. Suppose thatD = {d1, d2} , X = {x1, x2} , and that c implies Pr {c (X,V ) = dj |V,X = xj} = 2/3 for j ∈ {1, 2}.To achieve the same conditional distribution over c (X,V ) |X with a publicly-randomized rule c, itmust be the case that V11 = {v|c (x1, v) = d1} and V22 = {v|c (x2, v) = d2} both have Lebesguemeasure 2/3, and thus that V11 ∩ V22 has Lebesgue measure at least 1/3 (since the measure ofV11 ∪ V22 cannot exceed one). Conditional on V ∈ V11 ∩ V22, however, agents can perfectly infer Xbased on observing (c (X,V ) , V ) , whereas this is never possible based on observing (c (X,V ) , V ) .Hence, for certain loss functions and priors, communication risk will be strictly smaller under c thanunder c.
36
Proposition 4. If (i) there exists a decision d ∈ D that is dominated in loss and
(ii) N (X ,A) ≥ |D| , then any rule c ∈ C that is admissible under the decision model
with audience A is inadmissible under the communication model with audience A,
and vice versa.
Proof of Proposition 4 To prove this result, note first that under our full-support
assumption on Fθ, any rule c that selects the dominated decision d with positive
probability is dominated in decision risk by the rule c′ that is equal to c except that
it chooses the dominating decision d′ whenever c chooses d. Hence, any rule c that
selects d with positive probability is inadmissible under the decision model.
We next show that any rule c that selects d with probability zero is inadmissible
under the communication model. By Lemma 2 we can limit attention to rules c
that use only the public randomization device. Fix a given such rule c, and let
D′ ⊂ D denote the set of decisions that c selects with positive probability. Since
|D′| < |D| ≤ N (X ,A) , we know that for every realization of the public randomization
device V there is some aV ∈ A, some XV ⊆ X , and some dV ∈ D such that
∩X∈XV arg mind∈D
EaV [L (d, θ) |X] = ∅
and c (X, V ) = dV for all X ∈ XV .14
We can view (XV , dV ) as a random variable supported on 2X × D, for 2X the
power set of X . Since XV is non-empty with probability one and (X ,D) are fi-
nite, there exists a pair X ∗ ⊆ X and d∗ ∈ D such that X ∗ is non-empty and
Pr {XV = X ∗, dV = d∗} > 0, where this event depends only on V so the probabil-
ity does not need to be subscripted by a. Let a∗ be an agent with
∩X∈X ∗ arg mind∈D
Ea∗ [L (d, θ) |X] = ∅, (6)
noting that we can take a∗ = aV for any V with XV = X ∗. Let
V∗ = {V ∈ [0, 1] : XV = X ∗, dV = d∗} ,14If this is not the case, then the level sets of c (·, V ) form a partition in P∗A of size at most |D′| ,
contradicting |D′| < N (X ,A) .
37
so c (X, V ) = d∗ for all X ∈ X ∗ and V ∈ V∗. Finally, let d∗∗ denote an element of
arg mind∈D
Ea∗ [L (d, θ) |c (X, V ) = d∗, V ∈ V∗] .
By (6), there exists X ∈ X ∗ and d ∈ D such that
Ea∗
[L(d, θ)|X]< Ea∗
[L (d∗∗, θ) |X
].
For d an element of D\D′, let c be the rule that reports c (X, V ) = d when V ∈ V∗ and
X = X, and that agrees with c otherwise. Given c (X, V ), an agent can reconstruct
c (X, V ), so the rule c does not increase communication risk for any agent relative to
c. For agent a∗, however,
Ea∗
[L(d, θ)|c (X, V ) = d, V ∈ V∗
]< Ea∗
[L (d∗∗, θ) |c (X, V ) = d, V ∈ V∗
],
so d∗∗ is now sub-optimal for a∗ conditional on observing{c (X, V ) = d, V ∈ V∗
}.
This immediately implies that
Ea∗
[mind∈D
Ea∗ [L (d, θ) |c (X, V ) , V ∈ V∗] |c (X, V ) = d∗, V ∈ V∗]<
mind∈D
Ea∗ [L (d, θ) |c (X, V ) = d∗, V ∈ V∗] .
Since Pra∗ {c (X, V ) = d∗, V ∈ V∗} > 0, this implies that R∗a∗ (c) < R∗a∗ (c), so c dom-
inates c as we wanted to show. 2
Proof of Proposition 3 Follows immediately from Proposition 4 by setting A =
∆ (Θ) . 2
Proof of Corollary 1 Follows immediately from Propositions 2 and 3. 2
Proof of Corollary 3 Lemma 1 shows that the decision and communication mod-
els satisfy the conditions of Proposition 1. For part (a) of the corollary, note that
since by Proposition 1 ω-optimality in the decision model implies admissibility in
that model, under the conditions of the corollary it also implies inadmissibility in the
38
communication model by Proposition 3. Moreover, since ω-optimality for communi-
cation implies admissibility for communication, the set of rules that are ω-optimal for
decision and ω′-optimal for communication, where ω and ω′ may be different, must
not overlap.
For part (b) of the corollary, note that by Proposition 2, c is a Bayes decision rule
with respect to a full-support prior if and only if it is ω-optimal in the decision model
for some ω. The result thus follows by part (a).
Finally, for part (c) of the corollary, note that ω-optimality in the communication
model implies admissibility for communication. By Proposition 3 this implies inad-
missibility for decision, and thus by the argument in part (b) implies that the rule is
not a Bayes decision rule with respect to a full-support prior. �
Theorem 1 again extends to general (closed) audiences A, though in this case we
must also require that the audience be convex.
Theorem 2. If the audience A is convex, any rule c∗ ∈ C that is minimax under the
decision model is minimax under the communication model.
Proof of Theorem 2 Note that by continuity of decision risk in a (established
in Lemma 1) and compactness of closed subsets of the finite-dimensional simplex,
supa∈ARa (c) is achieved for all c. Note, next, that the set of decision risk functions
{R· (c) : c ∈ C} is compact in the supremum norm by Lemma 1. Hence,
infc∈C
supARa (c)
is achieved, so a min-max rule exists.
Thus, there exists c∗ ∈ C with infc∈Cmaxa∈ARa (c) = maxa∈ARa (c∗) . Let c∗ :
X → ∆ (D) be a rule that does not use the public randomization device such that the
distribution of c∗(X)|X is the same as that of c∗(X, V )|X for all X ∈ X . Ra (c∗) =
Ra (c∗) for all a, so maxa∈ARa (c∗) = maxa∈ARa (c∗). For C the class of such rules,
note that if we limit attention to c ∈ C we can cast the problem into the setting of
Section 5 of Grunwald and Dawid (2004). Specifically, let us view c ∈ C as the action
“a” in their terminology, and let us define the state “X” in their terminology as the
pair (X, θ) , noting that this state takes only a finite number of values. Since the class
39
of priors A is closed and convex by assumption, the set of implied distributions for
(X, θ) is closed and convex as well. Finally, the set of risk functions for c : X → Dis finite, while the set of decision risk functions for C is the convex hull of a finite set
and so is compact.
Theorem 5.2 of of Grunwald and Dawid (2004) then implies that there exists
a∗ ∈ A such that
maxa∈A
Ra (c∗) = Ra∗ (c∗) = minc∈C
Ra∗ (c) . (7)
Note, however, that the same argument used to construct c∗,
minc∈C
Ra∗ (c) = minc∈C
Ra∗ (c) ,
so (7) and the fact that Ra (c∗) = Ra (c∗) for all a implies that maxa∈ARa (c∗) =
Ra∗ (c∗) = minc∈C Ra∗ (c) . Note, next, that by the definition of communication risk
minc∈C Ra∗ (c) = minc∈C R∗a∗ (c) . Thus, since Ra (c) ≥ R∗a (c) for all a ∈ A, c ∈ C, we
have Ra∗ (c∗) = R∗a∗ (c∗).
Combining this reasoning, we see that
infc∈C
maxa∈A
Ra (c) = maxa∈A
Ra (c∗) = Ra∗ (c∗) = R∗a∗ (c∗) = minc∈C
R∗a∗ (c) .
Note, next, that by definition
infc∈C
maxa∈A
Ra (c) ≥ minc∈C
maxa∈A
R∗a (c) ≥ minc∈C
R∗a∗ (c) .
Thus, since the first and last quantities are equal, we obtain that
infc∈C
supa∈A
Ra (c) = infc∈C
Ra∗ (c) = Ra∗ (c∗) = R∗a∗ (c∗) = infc∈C
R∗a∗ (c) = infc∈C
supa∈A
R∗a (c) ,
as we wanted to show. 2
Proof of Theorem 1 Immediate from Theorem 2, taking A = ∆ (Θ) . 2
Proof of Corollary 4 Immediate from Proposition 3. 2
40
References
Abadie, Alberto. Forthcoming. Statistical non-significance in empirical economics.
American Economic Review: Insights.
Abadie, Alberto and Maximilian Kasy. 2019. Choosing among regularized estimators
in empirical economics: The risk of machine learning. Review of Economics
and Statistics 101(5): 743-762.
Andrews, Isaiah, Matthew Gentzkow, and Jesse M. Shapiro. 2017. Measuring the
sensitivity of parameter estimates to estimation moments. Quarterly Journal
of Economics 132(4): 1553-1592.
Andrews, Isaiah, Matthew Gentzkow, and Jesse M. Shapiro. 2020. Transparency in
structural research. NBER Working Paper No. 26631.
Awasthi, S., V. K. Pande, and R. H. Fletcher. 2000. Effectiveness and cost-effectiveness
of albendazole in improving nutritional status of pre-school children in urban
slums. Indian Pediatrics 37(1): 19-29.
Athey, Susan and Stefan Wager. 2019. Efficient policy learning. arXiv preprint
arXiv:1702.0289. Accessed on September 20 2019 at
<https://arxiv.org/abs/1702.02896v5>.
Banerjee, Abhijit, Sylvain Chassang, Sergio Montero, and Erik Snowberg. Forth-
coming. A theory of experimenters: robustness, randomization, and balance.
American Economic Review.
Bergemann, Dirk and Stephen Morris. 2019. Information design: A unified perspec-
tive. Journal of Economic Literature 57(1): 44-95.
Blackwell, David. 1951. The comparison of experiments. In J. Neyman (ed.), Pro-
ceedings of the Second Berkeley Symposium on Mathematical Statistics and
Probability : 93-102. Berkeley: University of California Press.
Blackwell, David. 1953. Equivalent comparisons of experiments. Annals of Mathe-
matical Statistics 24(2): 265-272.
Brown, Lawrence D. 1975. Estimation with incompletely specified loss functions
(the case of several location parameters). Journal of the American Statistical
Association 70(350): 417-427.
Chen, Ying, Navin Kartik, and Joel Sobel. 2008. Selecting cheap-talk equilibria.
Econometrica 76(1): 117-136.
Chernozhukov, Victor, Ivan Fernandez-Val, and Alfred Galichon. 2009. Improv-
41
ing point and interval estimators of monotone functions by rearrangement.
Biometrika 96(3): 559-575.
Chetverikov, Denis, Andres Santos, and Azeem M. Shaikh. 2018. The econometrics
of shape restrictions. Annual Review of Economics 10: 31-63.
Cover, Thomas M. and Joy A. Thomas. 2006. Elements of Information Theory. 2nd
ed. New York: Wiley-Interscience.
Crawford, Vincent P. and Joel Sobel. 1982. Strategic information transmission.
Econometrica 50(6): 1431-1451.
Croke, Kevin, Joan Hamory Hicks, Eric Hsu, Michael Kremer, and Edward Miguel.
2016. Does mass deworming affect child nutrition? Meta-analysis, cost-effectiveness,
and statistical power. World Bank Policy Research Working Paper No. 7921.
Donnen, Philippe, Daniel Brasseur, Michele Dramaix, Francoise Vertongen, Mweze
Zihindula, Mbasha Muhamiriza, and Philippe Hennart. 1998. Vitamin A sup-
plementation but not deworming improves growth of malnourished preschool
children in eastern Zaire. Journal of Nutrition 128(8): 1320-1327.
Eaton, Morris L. 1967. Some optimum properties of ranking procedures. The Annals
of Mathematical Statistics 38(1): 124-137.
Efron, Bradley. 1986. Why isn’t everyone a Bayesian? American Statistician 40(1):
1-5.
Eriksson, Stefan and Dan-Olof Rooth. 2014. Do employers use unemployment as a
sorting criterion when hiring? Evidence from a field experiment. American
Economic Review 104(3): 1014-1039.
Farrell, Joseph and Robert Gibbons. 1989. Cheap talk with two audiences. American
Economic Review 79(5): 1214-1223.
Frankel, Alex and Maximilian Kasy. 2018. Which findings should be published?
Harvard University Working Paper. Accessed on November 8, 2018 at
<https://maxkasy.github.io/home/files/papers/findings.pdf>.
Geweke, John. 1997. Posterior simulators in econometrics. In D. M. Kreps and
K. F. Wallis (eds.), Advances in Economics and Econometrics: Theory and
Applications, Seventh World Congress 3: 128-165. Cambridge: Cambridge
University Press.
Geweke, John. 1999. Using simulation methods for Bayesian econometric models:
Inference, development, and communication. Econometric Reviews 18(1): 1-
73.
42
Grunwald, Peter D. and A. Philip Dawid. 2004. Game theory, maximum entropy,
minimum discrepancy and robust Bayesian decision theory. Annals of Statis-
tics 32(4): 1367-1433.
Hildreth, Clifford. 1963. Bayesian statisticians and remote clients. Econometrica
31(3): 422-438.
Jordan, Michael I., Jason D. Lee, and Yung Yang. 2018. Communication-efficient
distributed statistical inference. Journal of the American Statistical Associa-
tion.
Kamenica, Emir and Matthew Gentzkow. 2011. Bayesian persuasion. American
Economic Review 101(6): 2590-2615.
Kitagawa, Toru and Aleksey Tetenov. 2018. Who should be treated? Empirical
welfare maximization methods for treatment choice. Econometrica 86(2): 591-
616.
Kremer, Michael and Edward Miguel. 2004. Worms: identifying impacts on education
and health in the presence of treatment externalities. Econometrica 72(1):
159-217.
Kroft, Kory, Fabian Lange, and Matthew J. Notowidigdo. 2013. Duration dependence
and labor market conditions: Evidence from a field experiment. The Quarterly
Journal of Economics 128(3): 1123–1167.
Kruger, Marita, Chad J. Badenhorst, and Erna P. G. Mansvelt. 1996. Effects of iron
fortification in a school feeding scheme and anthelmintic therapy on the iron
status and growth of six- to eight-year-old schoolchildren. Food and Nutrition
Bulletin 17(1): 1-11.
Kwan, Yum K. 1999. Asymptotic Bayesian analysis based on a limited information
estimator. Journal of Econometrics 88(1): 99-121.
Le Cam, L. 1996. Comparison of experiments—A short review. In T. S. Ferguson,
L. S. Shapley, and J. B. MacQueen (eds.), Statistics, Probability and Game
Theory: Papers in Honor of David Blackwell : 127-138. Hayward: Institute of
Mathematical Statistics.
Lehmann, E. L. 1966. On a theorem of Bahadur and Goodman. The Annals of
Mathematical Statistics 37(1): 1-6.
Lehmann, E.L. and George Casella. 1998. Theory of Point Estimation. 2nd ed. New
York: Springer-Verlag.
Manski, Charles F. 2004. Statistical treatment rules for heterogeneous populations.
43
Econometrica 72(4): 1221-1246.
Morris, Stephen. 1995. The common prior assumption in economic theory. Eco-
nomics and Philosophy 11(2): 227-253.
Oberholzer-Gee, Felix. 2008. Nonemployment stigma as rational herding: A field
experiment. Journal of Economic Behavior & Organization 65(1): 30-40.
Poirier, Dale J. 1988. Frequentist and subjectivist perspectives on the problems of
model building in economics. Journal of Economic Perspectives 2(1): 121-144.
Pratt, John W. 1965. Bayesian interpretation of standard inference statements. Jour-
nal of the Royal Statistical Society, Series B (Methodological) 27(2): 169-203.
Raiffa, Howard and Robert Schlaifer. 1961. Applied Statistical Decision Theory.
Boston: Division of Research, Graduate School of Business Administration,
Harvard University.
Robert, Christian. 2007. The Bayesian Choice: From Decision-Theoretic Founda-
tions to Computational Implementation. 2nd ed. New York: Springer-Verlag.
Simon, Gregory. 2019. Unipolar major depression in adults: choosing initial treat-
ment. In D. Solomon (ed.), UpToDate. Accessed on September 19, 2019 at
<https://www.uptodate.com/contents/unipolar-major-depression-in-adults-choosing-
initial-treatment>.
Sims, Christopher A. 1982. Scientific standards in econometric modeling. In M.
Hazewinkel and A. H. G. Rinnooy Kan (eds.), Current Developments in the
Interface: Economics, Econometrics, Mathematics : 317-337. Holland: D Rei-
del Publishing Company.
Sims, Christopher A. 2007. Bayesian methods in applied econometrics, or, why econo-
metrics should always and everywhere be Bayesian. Slides from the Hotelling
Lecture, presented June 29, 2007 at Duke University.
Spiess, Jann. 2018. Optimal estimation when researcher and social preferences are
misaligned. Harvard University Working Paper. Accessed on December 27,
2019 at<https://scholar.harvard.edu/files/spiess/files/alignedestimation.pdf>.
Stoye, Jorg. 2009. Minimax regret treatment choice with finite samples. Journal of
Econometrics 151(1): 70-81.
Stoye, Jorg. 2011. Statistical decisions under ambiguity. Theory and Decision 70(2):
129-148.
Stoye, Jorg. 2012. New perspectives on statistical decisions under ambiguity. Annual
Review of Economics 4: 257-282.
44
Torgersen, E. 1991. Comparison of Experiments. Encyclopedia of Mathematics and
its Applications. 1st ed. Cambridge: Cambridge University Press.
von Neumann, John. 1928. Zur theorie der gesellschaftsspiele. Mathematische An-
nalen 100(1): 295-320.
Wald, Abraham. 1950. Statistical Decision Functions. New York, NY: John Wiley
& Sons.
Zhu, Yuancheng, and John Lafferty. 2018a. Distributed nonparametric regression
under communication constraints. Proceedings of the 35th International Con-
ference on Machine Learning 80.
Zhu, Yuancheng, and John Lafferty. 2018b. Quantized minimax estimation over
Sobolev ellipsoids. Information and Inference: A Journal of the IMA 7(1):
31-82.
Zhang, Yuchen, John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013.
Information-theoretic lower bounds for distributed statistical estimation with
communication constraints. Advances in Neural Information Processing Sys-
tems 26.
45
Online Appendix for
A Model of Scientific Communication
Isaiah Andrews, Harvard University and NBER
Jesse M. Shapiro, Brown University and NBER
February 2020
A Proofs for Claims in Example and Applications
Proof of Claim 1 For dominance in loss, note that since 0 ∈ XM and 0 ∈ XM ,
0 ∈ D by construction. For any d < 0, L (d, θ) > L (0, θ) for all θ ∈ Θ, so d is
dominated in loss by d′ = 0 ∈ D.
To show that N (X ) = |XM |×∣∣XM
∣∣, note first that sufficiency of(X,X
)implies
that N (X ) ≤ |XM | ×∣∣XM
∣∣ . To complete the proof we will show that for any X =(X,X
)∈ X and X ′ =
(X ′, X
′)
such that X 6= X ′, there exists an agent a ∈ ∆ (Θ)
such that
arg mind∈D
Ea [L (d, θ) |X] ∩ arg mind∈D
Ea [L (d, θ) |X ′] = ∅, (8)
which in turn implies that N (X ) ≥ |XM | ×∣∣XM
∣∣.Note that under the assumed form for the distribution F0, the joint likelihood of
the control group outcomes satisfies the monotone likelihood ratio property in X.15 In
particular, for any agent with a non-degenerate prior on θ, the posterior distribution
on θ is strictly increasing in X in the sense of first-order stochastic dominance. Fix the
prior on θ as the prior which puts mass one on the largest value in Θ0, and consider
15Specifically, the joint density is given by∏n
i=1 exp (θXi)h (θ) g (Xi) =exp (θ · n ·X)h (θ)
n∏ni=1 g (Xi), and for X ′ > X and θ′ > θ,∏n
i=1 exp(θ′X ′i
)h(θ′)g (X ′i)∏n
i=1 exp (θX ′i)h (θ) g (X ′i)>
∏ni=1 exp
(θ′Xi
)h(θ′)g (Xi)∏n
i=1 exp (θXi)h (θ) g (Xi).
46
the class of all priors on θ, ∆ (Θ0) , noting that since we have taken θ as large as
possible all of these priors respect the bounds on the parameter space.
Since the data have full support under all θ ∈ Θ, the implied class of posteriors
for θ conditional on(X,X
)is the same as the class of priors, and X does not matter
for the posterior. The posterior risk function for agent a conditional on X is∑θ∈Θ
(d− θ)2 a (θ|X) .
Strict convexity of the squared error loss implies that each agent’s set of optimal
actions conditional on X contains at most two elements. Let us fix a value X <
max {XM} and consider an agent a who, conditional on X, is indifferent between
two decisions d and d′ with d < d′ (and prefers these to all other decisions). Exis-
tence of such an agent follows from our assumptions. Specifically, since n ≥ 3, and
maxθ∈Θ0EF0(θ) [Xi]−minθ∈Θ0 EF0(θ) [Xi] > 2/3, we know that there exist n1 < n2≤ n
such that 1n1, 1n2∈ (0,maxθ∈Θ0 ATE (θ)) . However, upper and lower bounds on
ATE (θ) after fixing θ at its upper bound are maxθ∈Θ0 ATE (θ) and 0, respectively.
Thus, by the definition ofD there exist d and d′ inD with d, d′ ∈ (0,maxθ∈Θ0 ATE (θ)),
from which the claim follows.
Note that since agent a is indifferent between d and d′, we must have
maxθ∈Θ0EF0(θ) [Xi]− Ea [θ|X] = 1
2d+ 1
2d′. Since |Θ0| ≥ 2, we know that there exists
θ′ ∈ Θ0 with θ′ 6= Ea [θ|X]. Assume θ′ < Ea [θ|X] (if not, we can use the same
argument with θ′ > Ea [θ|X]). If we define aε as the agent with ε > 0 more prior
mass on θ′, and ε less prior mass on all other parameter values, then for ε sufficiently
small aε strictly prefers d′ to any other action conditional on X, but strictly prefers
d to d′ conditional on any X ′ > X. Since we can repeat this argument for all
X < max {XM} , this verifies (8) for all X, X ′ such that X ′ 6= X. However, we can
likewise repeat the same argument to verify (8) for all X, X ′ such that X′ 6= X, which
proves the claim. 2
Proof of Corollary 2 Follows immediately from Claim 1, Proposition 3, and Corol-
lary 1. 2
Proof of Claim 2 Note that for all θ ∈ Θ, θj is decreasing in j by assumption.
We can consider d and θ as step functions defined on the interval [0, J ] with steps
47
at the integers, and ‖d− θ‖2 is just the L2 distance in this case. For d∗ which takes
d∗j = d(j), Proposition 1 of Chernozhukov et al (2009) shows that for any d ∈ D with
dj > dj−1 for some j and any θ ∈ Θ,
‖d− θ‖2 ≥ ‖d∗ − θ‖2
with strict inequality if θj < θj−1. Hence
‖d− θ‖22 ≥ ‖d
∗ − θ‖22 ,
for all θ ∈ Θ, with strict inequality for some some θ′ ∈ Θ, and d is dominated in loss.
2
Proof of Claim 3 To prove the claim, we need to show that any two distinct
realizations of the data, say X and X ′, induce distinct optimal actions for some
agent, in the sense that there exists a ∈ ∆ (Θ) with
arg mind∈D
Ea [L (d, θ) |X] ∩ arg mind∈D
Ea [L (d, θ) |X ′] = ∅.
To this end, suppose without loss of generality that for some j, Xj 6= X ′j. Let
us consider the set of agents Aj who put probability one on θj′ = max {Θ0} for
j′ < j, and probability one on θj′ = min {Θ0} for j′ > j. Hence, for these agents
monotonicity imposes no restrictions on θj and, moreover, the only decision-relevant
data is Xj. However, the binomial distribution has exponential family structure with
f (x; t) = exp (tx)h (t) g (x). Hence, the same argument used to prove Claim 1 in the
running example establishes the result. Since we can repeat this argument for all X
and X ′ with X 6= X ′, this proves the claim. 2
Proof of Claim 4 Theorems 4.2 and 4.3 of Eaton (1967) establish the invariant
optimality and minimaxity claims in the classical model. Since we have taken the
audience to consist of the set of all priors, Proposition 2 shows that minimaxity holds
in the classical model if and only if it holds in the decision model, and the same is
true of invariant optimality. For the Bayes part of the claim, note that the prior a∗
implies that θt and θs are independent for all s 6= t, so Ea∗ [θt|X] is a function of Xt
alone. Moreover, by the monotone likelihood ratio assumption Ea∗ [θt|Xt] is strictly
48
increasing in Xt. Hence, arg maxt Ea∗ [θt|X] = arg maxtXt, which proves the claim.
2
Proof of Claim 5 For the first part of the claim, note that d = 1 yields a strictly
lower loss than choosing d = ι for all θ ∈ Θ.
For the second part of the claim, note that it suffices to show that it holds for
a restricted version of the audience, A ⊆ ∆ (Θ). Specifically, let us consider the
audience consisting of only three agents, a0, a1, and a2.
Agent a0 has a uniform prior on Θ, a0 (θ) = a∗ (θ) = 1|Θ| . This again implies that
θt is independent of θs for all s 6= t. By the monotone likelihood ratio property, pro-
vided arg maxtXt is unique, this agent strictly prefers to set d = arg maxtXt. When
arg maxtXt is non unique, by contrast, this agent strictly prefers d ∈ arg maxtXt to
d 6∈ arg maxtXt, but is indifferent among d ∈ arg maxtXt.
Note, next, that since
a (θ|X) =f (X; θ) a (θ)∑
Θ f(X; θ
)a(θ) , Ea [θ|X] =
∑Θ
θa (θ|X) .
for f (X; θ) the probability mass function of Fθ, and Fθ has full support for all θ ∈ Θ,
Ea [θ|X] is continuous in a.Hence, there exists an open neighborhoodN (a0) around a0
such that all agents a ∈ N (a0) strictly prefer to set d ∈ arg maxtXt to d 6∈ arg maxtXt
for all realizations of X. Within this neighborhood, there is an agent a1 who when
arg maxt
Xt = {1, ..., T}
strictly prefers d = 1, and an agent a2 who strictly prefers d = 2 conditional on the
same event. This immediately implies, however, that N (X ,A) ≥ T + 1, since(arg mind∈D
Ea0 [L (d, θ) |X] , arg mind∈D
Ea1 [L (d, θ) |X] , arg mind∈D
Ea2 [L (d, θ) |X]
)
=
(arg maxtXt, arg maxtXt, arg maxtXt) when arg maxtXt is a singleton
(arg maxtXt, 1, 2) when arg maxtXt= {1, ..., T},
where the right hand side takes T + 1 distinct values. 2
49
Proof of Claim 6 Since agents are free to randomize conditional on observing
c (X, V ) = ι, c yields weakly smaller communication risk for all agents than c∗.
To show that there is a strict inequality for some agents, consider agents a for
whom Ea [θt| arg maxtXt = {1, ..., T}] is non-constant across t, while for all (X, V )
arg maxt Ea [θt|c∗ (X, V )] = c∗ (X, V ) . Note in particular that the agent with a uni-
form prior a∗ (θ) = 1/ |Θ| has arg maxt Ea∗ [θt|c∗ (X, V )] = c∗ (X, V ) , so such agents
a exist by the continuity of conditional expectations in a. When arg maxtXt =
{1, ..., T}, the decision taken by these agents is uniformly randomized under the rule
c∗, while under the rule c they are able to pick a decision they strictly prefer to
uniform randomization. Since these agents are still free to set d = c (X, V ) when
c (X, V ) 6= ι, this establishes that they have strictly lower communication risk under
c than under c∗. 2
B Additional Results Referenced in the Text
B.1 Continuous Versions of Example and Applications
This section extends the recurring deworming example, along with the monotone es-
timation and treatment assignment applications discussed in Section 4, to continuous
settings. For the deworming example and monotone estimation application we show
that changing from decision to communication risk can reverse dominance orderings
in a setting with Θ, X , and D all continuous, while for the treatment assignment
application we show that the results discussed in the main text extend unchanged to
the case with continuous Θ but discrete X and D. We also extend our main result
on admissibility, Proposition 3, to the case of continuous Θ.
B.1.1 Continuous Version of Deworming Example
Consider a Gaussian variant of the discrete deworming example developed in the
main text. Specifically, suppose that the data consist of control- and treatment-
group means from a randomized trial. Let us take X = R2, and assume that the two
means are independent and normally distributed with known variances σ2, σ2 > 0,(X
X
)∼ N
((θ
θ
),
(σ2 0
0 σ2
)).
50
Normality with known variance can be motivated by the asymptotic normality of
sample means (along with consistent estimability of their asymptotic variance).
Note that the average treatment effect is simply the difference between the treat-
ment and control means, ATE (θ) = θ − θ, and that the sample average treatment
effect X − X is again an unbiased estimator for ATE (θ). We again restrict the
parameter space such that the average treatment effect is non-negative, considering
Θ ={θ ∈ R2 : θ − θ ≥ 0
},
and take the decision space to correspond to the support of the treatment-control
difference, D = R. We consider quadratic loss L (d, θ) = (d− ATE (θ))2 , and take
the audience to consist of all possible priors ∆ (Θ) . For any rule c : X ×[0, 1]→ ∆ (D)
and any prior a ∈ ∆ (Θ) , we define decision and communication risk as in the discrete
case, with the difference that, unlike in the discrete case, decision and communication
risk may now be infinite for some rules and priors.
To highlight the distinction between decision and communication risk, we con-
sider two particular rules. First, the treatment-control difference,
c′ (X, V ) = c′ (X) = X −X
and, second, the treatment-control difference censored below at zero,
c′′ (X, V ) = c′′ (X) = max{X −X, 0
}.
Since all audience members agree that the average treatment effect is non-
negative, and the distribution of c′ (X) has support equal to R, c′′ (X) has strictly
lower decision risk than c′ for all a,
Ra (c′′) < Ra (c′) for all a ∈ ∆ (Θ) .
At the same time, c′′ censors the data. Note in particular that distinct realizations of
c′ (X) induce distinct posterior expectations for all agents a with non-degenerate pri-
ors on ATE (θ), V ara (ATE (θ)) > 0. Hence, by strict convexity of the loss, censoring
negative estimates strictly increases communication risk for these agents, while not
reducing it for any other agent. Thus, we see that R∗a (c′′) ≥ R∗a (c′) for all a ∈ ∆ (Θ) ,
51
with strict inequality when V ara (ATE (θ)) > 0. Hence, switching from decision to
communication risk reverses the dominance ordering between c′ (X) and c′′ (X) . The
rule c′ (X) preferred by the communication model again appears closer to what is
reported in practice.
B.1.2 Continuous Version of Monotone Estimation Application
Next, let us consider a continuous version of the monotone estimation application
discussed in the main text. Specifically, suppose we observe a normal random vector
X ∈ X = RJ with independent elements Xj ∼ N(θj, σ
2j
), and σj strictly positive
and known for all j. As with the previous extension, this may again be motivated by
asymptotic normality of sample means.
Suppose that θ ∈ Θ0 for each j, for Θ0 some interval with a nonempty interior
(e.g. Θ0 = R), and that θj is known to be decreasing in j so the joint parameter
space is
Θ ={θ ∈ ΘJ
0 : θ1 ≥ θ2 ≥ ... ≥ θJ}.
We take D = RJ , consider the squared error loss,
L (d, θ) = ‖d− θ‖22 =
∑j
(dj − θj)2 ,
and again take the audience to consist of all possible priors, ∆ (Θ) .
As in the deworming example above we compare two decision rules. The first
reports the raw data
c′ (X, V ) = c′ (X) = X,
while the second reports sorted estimates
c′′ (X, V ) = c′′ (X) = d∗ (X) ,
where as in the main text d∗ (d) sorts the elements of d in decreasing order, so d∗1 (d) =
maxj dj and d∗J (d) = minj dj.
The results of Chernozhukov et al (2009) again show that c′ dominates c′′ in
decision risk, and in particular that Ra (c′′) ≤ Ra (c′) for all a ∈ ∆ (Θ), while Ra (c′′) <
Ra (c′) for a such that Pra {θj < θj−1 for some j} > 0.
52
At the same time, c′′ (X) is a transformation of c′ (X), so
R∗a (c′′) ≥ R∗a (c′) for all a ∈ ∆ (Θ).
Note, next, that any two distinct realizations of c′ (X) induce distinct posterior ex-
pectations for some agent. To see that this is the case, suppose that X and X differ
in their jth component, and consider an agent a with degenerate priors on θj′ for
j′ 6= j and a non-degenerate normal prior on θj (truncated to obey the monotonicity
restrictions). If a observes c′ (X) then their posterior mean Ea [θj|c′ (X)] = Ea [θj|Xj]
is strictly monotonic in Xj. If this agent instead observes c′′ (X), any value of c′′ (X)
with at least two distinct elements is consistent with multiple values of Xj, which
implies that
Pr a {Ea [θj|c′′ (X)] 6= Ea [θj|c′ (X)]} > 0.
Jensen’s inequality thus implies thatR∗a (c′′) > R∗a (c′) . Thus, we again see that switch-
ing from decision risk to communication risk reverses the dominance ordering between
c′ and c′′.
B.1.3 Continuous Version of Treatment Choice Application
Define X , D, and L (d, θ) as in Section 4.2, but take Θ = (0, 1)T (or Θ = [0, 1]T ) and
again consider the audience A = ∆ (Θ) . The results of Lehmann (1966) and Eaton
(1967) continue to apply, and imply that the rule c∗ (X, V ) discussed in the main text
uniformly minimizes decision risk among all decision rules invariant with respect to
permutation of the treatments, and is minimax. Moreover, c∗ is a Bayes decision rule
for the agent a∗ with an independent U [0, 1] prior on each θt, and is admissible in the
classical model by Theorem 4.3 of Eaton (1967). Admissibility in the decision model
then follows immediately by the same argument as in the proof of Proposition 2.
At the same time, the rule c defined in the main text continues to dominate c∗
in the communication model. To see that this is the case, note that given c (X, V )
the agent is again free to randomize uniformly when c (X, V ) = ι, which yields a
random decision with the same conditional distribution a c∗ (X, V ) |X for all X. Thus,
R∗a (c) ≤ R∗a (c∗) for all a ∈ ∆ (Θ) . To see that this inequality is strict for some
a ∈ ∆ (Θ) , consider the restricted set of priors ∆(ΘT
0
)for Θ0 finite, |Θ0| ≥ 2. Claim
6 implies that there is a strict inequality for some a ∈ ∆(ΘT
0
).
53
B.1.4 Admissibility Conflict with Continuous Parameter Space
A corollary of Proposition 4 is that the conclusions of Proposition 3 can be extended
to some cases with continuous Θ.
Corollary 5. Suppose that for an infinite (and potentially uncountable) set Θ but
finite X and D, there exists a decision d ∈ D that is dominated in loss, in the
sense that there exists d′ ∈ D with L (d, θ) ≥ L (d′, θ) for all θ ∈ Θ with strict
inequality for at least one θ ∈ Θ. Suppose further that for some finite subset Θ ⊂Θ, N
(X ,∆
(Θ))≥ |D|. Then any rule c that is admissible in decision risk is
inadmissible in communication risk and vice versa.
The dominance in loss condition holds for continuous Θ in the deworming exam-
ple, monotone estimation application, and treatment choice application. Thus, the ad-
missibility conflict extends to these three settings as well under conditions on Θ analo-
gous to those in the discrete case. (Specifically, we maintain that supθ∈Θ0EF0(θ) [Xi]−
infθ∈Θ0 EF0(θ) [Xi] > 2/3 in the deworming example, and sup {Θ0} − inf {Θ0} > 2/3
in the monotone estimation application.)
Proof of Corollary 5 Note first that any rule with Prθ {c (X, V ) = d} > 0 for
some θ ∈ Θ is inadmissible in decision risk by the same argument as in the proof
of Proposition 4. Next, consider any rule such that Pra {c (X, V ) = d} = 0 for all
a ∈ ∆(
Θ). Let us construct a rule c as in the proof of Proposition 4, and note that
by construction R∗a (c) ≤ R∗a (c) for all a ∈ ∆ (Θ) , with a strict inequality for some
a ∈ ∆(
Θ). Thus, c is inadmissible in the communication model, as we wanted to
show. �
B.2 Heterogeneous Loss Functions and Likelihoods
In Section 2, we claimed that models with heterogeneous loss functions and likelihoods
can be cast into our setting. To see that this is the case, suppose we begin with a model
with an audience of agents A where agent a’s prior is πa ∈ ∆ (Θ), their likelihood
function is Fa,θ, and their loss function is La (d, θ). Here we do not identify agent a
with their prior since agents differ on multiple dimensions. Let us further suppose
54
that the loss is uniformly bounded above
supa∈A,d∈D,θ∈Θ
La (d, θ) <∞,
and that the probability mass function fa (x; θ) of Fa,θ is uniformly bounded away
from zero,
infa∈A,x∈X ,θ∈Θ
fa (x; θ) ≥ η > 0.
Under the assumption of uniformly bounded loss, for M = |Θ| × |D| we can
express
La (d, θ) =M∑m=0
la (m)Lm (d, θ)
where L0 (d, θ) = 0, while for m ≥ 1 the functions Lm (d, θ) are proportional to
1{d = dj(m)
}1{θ = θk(m)
},{(dj(m), θk(m)
): m = 1, ...,M
}= D×Θ, la satisfies la (m) ≥
0 for allm, and∑M
m=0 la (m) = 1. Intuitively, la (m) = La(dj(m), θk(m)
)/Lm
(dj(m), θk(m)
),
so for two agents a and a′, la (m) /la′ (m) measures how much a loses, relative to a′,
from the decision-parameter pair(dj(m), θk(m)
).
Likewise, we can express
Pr Fa,θ {X = x} =N∑n=1
pa (n) fn (x; θ)
where
fn (x; θ) =
1− η (|X | − 1) if θ = θk(n) and x = xr(n)
η otherwise,
{(θk(n), xr(n)
): n = 1, ..., N
}= Θ × X , pa (n) satisfies pa (n) ≥ 0 for all n, and∑N
n=0 pa (n) = 1.
Using these observations, we can cast this example into our baseline setting by
augmenting the parameter space. Specifically, define
Θ∗ = Θ× {0, ...,M} × {1, ..., N} .
For each agent a, define a’s prior on this augmented space as π∗a (θ∗) = πa (θ) la (m) pa (n),
55
and let agents share the homogeneous loss function
L (d, θ∗) = Lm (d, θ) ,
and the homogeneous likelihood
Fθ∗ = Fn,θ,
where Fn,θ has mass function fn (·; θ) . Since the prior imposes mutual independence
between (θ,m, n), agent a′s posterior risk from action d conditional on (X, V ) is
Ea [L (d, θ∗) |X, V ] =
∑θ∈Θ
∑Nn=1
∑Mm=0 Lm (d, θ) fn (X; θ) pa (n) la (m) πa (θ)∑
θ∈Θ
∑Nn=1
∑Mm=0 fn (X; θ) pa (n) la (m) πa (θ)
=
∑θ∈Θ
∑Nn=1 fn (X; θ) pa (n)
∑Mm=0 Lm (d, θ) la (m) πa (θ)∑
θ∈Θ
∑Nn=1 fn (X; θ) pa (n) πa (θ)
=
∑θ∈Θ La (d, θ) fa (x; θ) π (θ)∑
θ∈Θ fa (x; θ) π (θ)= Ea [La (d, θ) |X, V ]
and so coincides with the posterior risk in the initial heterogeneous prior, heteroge-
neous loss, heterogeneous likelihood model. Since this holds for all (X, V ), the pos-
terior risk conditional on (c (X, V ) , V ) likewise coincides for all possible c. Hence, all
risk calculations likewise coincide, and we have successfully recast the initial hetero-
geneous prior, heterogeneous loss, heterogeneous likelihood model as a heterogeneous
prior, homogeneous loss, homogeneous likelihood model with a particular audience
A ⊂ ∆ (Θ∗). Whether the class of priors {π∗a : a ∈ A} is closed and/or convex will
depend on the original priors πa, losses La, and likelihoods Fa,θ.
In conjunction with heterogeneous loss functions and likelihoods, our approach
can also allow heterogeneous decision spaces Da and parameter spaces Θa provided
|Da| = |D| and |Θa| = |Θ| for all a ∈ A. To see that this is the case, note that for
each a we can define a bijection φa between Da and D, and a bijection ψa between Θa
and Θ. Using these bijections we can again regard the loss and likelihood for agent a
as being defined on D and Θ, and use the same argument as above.
56
B.3 Regret-based Payoff Functions
For a given loss function L : D × Θ → R≥0, consider the regret loss function L :
D ×Θ→ R≥0 formed by subtracting the minimized loss:
L (d, θ) = L (d, θ)−mind′∈D
L (d′, θ) .
Regret loss functions of this kind have been considered by e.g. Manski (2004). Defin-
ing the communication, decision, and frequentist risk functions with respect to the
regret loss leaves all of the results in Section 3 unchanged (since the regret loss func-
tion is simply a different choice of loss, and our results do not depend on the form of
the loss).
For a given risk function ρ : C × A → R≥0, we may alternatively consider the
regret risk function ρ : C × A → R≥0 formed by subtracting the best-possible risk:
ρ (c, a) = ρ (c, a)− infc′∈C
ρ (c′, a) .
Regret risk functions of this kind are considered in Stoye (2012). Defining the regret
communication, decision, and frequentist risk functions leaves the results on admissi-
bility and ω−optimality in Sections 3.1 and 3.2 unchanged (since the set of ω-optimal
and admissible rules are the same in both cases). We do not know whether the results
on minimaxity in Section 3.3 extend to this case.
B.4 Selecting Natural Permutations
Let ψ permute the elements of D, and let ψ ◦ c : X × [0, 1]→ ∆ (D) be the rule with
realization ψ (c (X, V )). It is immediate that (ψ ◦ c) ∈ C and that R∗a (ψ ◦ c) = R∗a (c).
By contrast, in general Ra (ψ ◦ c) 6= Ra (c). That is, communication risk is invariant
to permutations, whereas decision risk is not. As a result, the communication model
has the unrealistic implication that if, for example, D is a subset of R and is symmetric
around zero, all agents are indifferent between the report c and the report −c.It is possible to modify the communication model to eliminate this unrealistic
implication. Define for any rule c ∈ C the set of rules
Q (c) = {c′ ∈ C : R∗a (c′) = R∗a (c)∀a ∈ A}
57
that are equivalent in communication risk to the rule c. Now say that a rule c∗ ∈ C is
• naturally admissible in communication risk if c∗ is admissible in communication
risk and there exists no rule c ∈ Q (c∗) such that Ra (c) ≤ Ra (c∗) for all a ∈ A,
with strict inequality for at least one a ∈ A.
• naturally ω-optimal in communication risk if c∗ is ω-optimal in communication
risk and ∫ARa (c∗) dω (a) = inf
c∈Q(c∗)
∫ARa (c) dω (a) .
• naturally minimax in communication risk if c∗ is minimax in communication
risk and
supa∈A
Ra (c∗) = infc∈Q(c∗)
supa∈A
Ra (c) .
For each optimality criterion, naturalness selects from among the communication-
optimal rules those that are not worse in decision risk than some other rule that
is equivalent in communication risk. The resulting rules are “natural” in the sense
that they will not lead to unnecessary losses if their reports are taken literally as
recommended decisions. This approach is similar in spirit to the refinement of cheap-
talk equilibria that arises from an arbitrarily small cost of lying (Chen et al. 2008).
The resulting refinement does not affect the relationship between the communi-
cation and decision models.
Proposition 5. There exists a rule c∗ that is admissible in decision risk and naturally
admissible in communication risk if and only if there exists a rule c ∈ C that is
admissible in both decision and communication risk. The same holds for ω-optimality
and minimaxity.
Proof of Proposition 5 We first consider admissibility. Since a rule is naturally
admissible in communication risk only if it is admissible in communication risk, the
“only if” part of the statement is trivial. For the “if” part, note that if c∗ is admissible
in both decision risk and communication risk, then it is admissible in communication
risk, and admissible in decision risk relative to the restricted class of rules Q (c∗) .
However, this is the definition of natural admissibility. The same argument works for
ω-optimality and minimaxity. �
58
B.5 Derivation of Quadratic Loss in Deworming Example
Let k ≥ max {D} be the private marginal cost of the medication with k−d the market
price with a subsidy (or tax) d ∈ D. Assume that households’ private willingness-
to-pay is distributed uniformly on [0,W ] for W ≥ k −min {D}, so that the share of
households purchasing the medication at market price k − d is given by
W − (k − d)
W∈ [0, 1] .
Each purchase has a social benefit not internalized by households given by θ =
ATE (θ) (e.g. because malnutrition has an impact on children’s long-run outcomes
that is not fully internalized by parents). The government’s payoff is given by the
sum of total surplus from the purchases, plus the social benefit:(W − (k − d)
W
)(W + (k − d)
2−(k − θ
))where W+(k−d)
2is the average private willingness-to-pay conditional on purchase and
(k − ATE (θ)) is the social marginal cost. We can rewrite the government’s payoff as
1
2W
(W 2 − 2W
(k − θ
)−(d− θ
)2
+(k − θ
)2).
Subtracting the best possible payoff (which arises when d = θ) gives
− 1
2W
(d− θ
)2
which is directly proportional to the assumed loss.
B.6 Complete Class Theorems for Decision and
Communication Models
In this section we establish complete class theorems for decision and communication
risk. To state this result, we allow a general closed audience A ⊆ ∆ (Θ), and say that
a rule c∗ ∈ C is ω-optimal in decision risk if and only if∫Ra (c∗) dω (a) = inf
c∈C
∫Ra (c) dω (a) ,
59
where unlike for ω-optimality the support of the probability measure ω may be a
strict subset of A. We define ω-optimality in communication risk analogously.
Proposition 6. A rule c is admissible in decision risk for the audience A only if it is
ω-optimal in decision risk for some ω. Likewise, a rule is admissible in communication
risk only if it is ω-optimal in communication risk for some ω.
For decision risk, this result is an immediate consequence of the classical complete
class theorem. By contrast, communication risk is nonlinear in a, and so requires a
slightly different argument (and in particular depends on the public randomization
device).
Proof of Proposition 6 We first prove the result for decision risk. Note that the
set of decision risk functions is trivially convex, since we can take a mixture of any
two decision rules to form a new decision rule. Hence we can apply Theorem 8.4.3
of Robert (2007) (with A playing the role of Θ) to obtain that weighted average risk
minimizing procedures are a complete class for decision risk.
We next prove the result for communication risk. Note first that the presence
of V ensures that the set of communication risk functions is convex. Specifically, for
any pair of rules c1, c2 ∈ C and any α ∈ [0, 1] we can construct a new rule
cα (X, V ) = 1 {V ≤ α} c (X, V/α) + 1 {V > α} c (X, (1− V ) / (1− α)) .
Since each agent a observes (cα (X, V ) , V ),
R∗a (cα) = αR∗a (c1) + (1− α)R∗a (c2) .
We show continuity of communication risk in Lemma 1, so combined with the
convexity of the class of communication rules, Theorem 8.4.3 of Robert (2007) again
implies that weighted average risk minimizing procedures are a complete class for
communication risk. 2
B.7 Extension of Optimal Treatment Assignment Example
This section extends the analysis of optimal treatment assignment in Section 4.2 of
the main text to show that when agents have sufficiently informative beliefs, it may
60
be communication-preferred to report ι even in some cases without exact ties. To
develop these results we consider a restricted audience A ⊂ ∆ (Θ).
Claim 7. Suppose that for a closed audience A and some non-empty set E ⊆ X ,
arg maxt
Ea [θt|X] = arg maxt
Ea [θt] for all a ∈ A, X ∈ E . (9)
Then the rule c which takes c (X, V ) = c∗ (X, V ) when X 6∈ E and c (X, V ) = ι when
X ∈ E has weakly lower communication risk for all a ∈ A than does the rule c∗.
Claim 8. If in addition to the conditions of Claim 7,{X : arg max
tXt = {1, ..., T}
}∩ E 6= ∅, (10)
there exists a ∈ A with arg maxt Ea [θt|c∗ (X, V ) , V ] = c∗ (X, V ) for all (X, V ), and
arg maxt Ea [θt] a singleton, then c dominates c∗ in communication risk.
Claim 9. Finally, if the conditions of Claim 7 hold, the conditions of Claim 8 hold for
all a ∈ A, and the audience A is invariant under permutation of the treatments (that
is, the set of priors is unchanged by relabeling the treatments), then c∗ is minimax in
decision risk but not communication risk for the audience A.
Condition (9) in Claim 7 formalizes what it means for X ∈ E to be uninformative
for the audience A : no agent in this audience changes their action in response to
observing X ∈ E . Note that E and A are tightly linked here. For example, if we take
A to contain only agents with dogmatic priors, then all data is uninformative and we
can take E = X . By contrast, if we take A = {a∗} to consist of the agent with an
uninformative prior discussed in Claim 4, then the largest possible E is
E =
{X : arg max
tXt = {1, ..., T}
}.
Claim 8 is intuitive as well. The condition arg maxt Ea [θt|c∗ (X, V ) , V ] = c∗ (X, V )
means that agent a has a sufficiently uninformative prior that they are willing to follow
the recommendation of the minimax rule. By contrast, the fact that arg maxt Ea [θt] is
a singleton means this agent’s prior is sufficiently informative that, absent any data,
61
they have a unique preferred decision. Putting these two conditions together, how-
ever, means that this agent would strictly prefer to avoid the randomization induced
by the rule c∗.
Finally, for Claim 9, note that, since the audience is invariant, minimaxity of c∗
follows from the fact that it minimizes risk in the class of invariant decision rules.
However, the conditions of Claim 9 imply that the audience of A is not convex, since
the convex hull of any invariant audience necessarily contains a prior a such that
arg maxt
Ea [θt] = {1, ..., T}
which we have ruled out by assumption. Hence, Claim 9 illustrates that with a non-
convex audience, minimax decision rules need not be minimax communication rules.
Proof of Claim 7 Note that all agents have the option to choose d ∈ arg maxt Ea [θt]
conditional on observing c (X, V ) = ι, while choosing d ∈ arg maxt Ea [θt|c (X, V ) , V ]
conditional on observing c (X, V ) 6= ι. By the definition of E this yields a weakly
lower expected loss for agent a than choosing some d ∈ arg maxt Ea [θt|c∗ (X, V ) , V ].
2
Proof of Claim 8 If arg maxt Ea [θt] is a singleton for a given agent a and (9) holds,
then conditional on X ∈ E agent a strictly prefers not to randomize their decision.
At the same time, since arg maxt Ea [θt|c∗ (X, V )] = c∗ (X, V ), under the rule c∗ this
agent’s decision is random conditional on the data when
X ∈ E ∩{X : arg max
tXt = {1, ..., T}
}.
As above, since the agent is free to choose d = c (X, V ) conditional on c (X, V ) 6= ι
and d = arg maxt Ea [θt] conditional on c (X, V ) = ι, we see that c yields a strictly
lower communication risk for this agent. Since we have shown in the proof of Claim
7 that c yields weakly lower communication risk than c∗ for all a ∈ A, c dominates
c∗. �
62
Proof of Claim 9 Since the set A is a closed subset of the simplex and communi-
cation risk is continuous in a, the proof of Claim 8 implies that
infa∈A{R∗a (c∗)−R∗a (c)} > 0.
Hence, the maximum communication risk of c∗ for the audience A is strictly larger
than that of c, which proves the second part of the claim. To prove the first part
of the claim, note that by the assumed invariance of the audience A, the maximum
decision risk of a rule c is bounded below by the risk of the rule ψ◦c where ψ randomly
permutes the treatments. Note, however, that the rule ψ ◦ c is invariant to relabeling
of the treatments by construction. The results of Lehmann (1966) imply, however,
that Ra
(cI)≥ Ra (c∗) for any invariant rule cI and any a ∈ A. Thus, we have that
for any rule c
supa∈A
Ra (c) ≥ supa∈A
Ra
(ψ ◦ c
)≥ sup
a∈ARa (c∗) ,
which proves that c∗ is minimax in decision risk. �
63