A Model of Scienti c Communication - Harvard University · 2020-02-25 · A Model of Scienti c...

A Model of Scientific Communication

Isaiah Andrews, Harvard University and NBER∗

Jesse M. Shapiro, Brown University and NBER

February 2020

Abstract

We propose a model of empirical science in which an analyst makes a report to

an audience after observing some data. Agents in the audience may differ in

their beliefs or objectives, and may therefore update or act differently follow-

ing a given report. We contrast the proposed model with a classical model of

statistics in which the estimate directly determines the payoff. We identify set-

tings in which the proposed model prescribes very different, and more realistic,

optimal rules.

keywords: statistical communication, statistical decision theory

∗This article previously circulated under the title “Statistical Reports for Remote Agents.” Weacknowledge funding from the National Science Foundation under Grant No. 1654234, the Alfred P.Sloan Foundation, the Silverman (1968) Family Career Development Chair at MIT, and the BrownUniversity Population Studies and Training Center. Any opinions, findings, and conclusions orrecommendations expressed in this material are those of the authors and do not necessarily reflectthe views of the funding sources. We thank Glen Weyl for contributions to this project in its earlystages. Conversations with Matthew Gentzkow and Kevin M. Murphy on related projects greatlyinfluenced our thinking. For comments and helpful conversations, we also thank Alberto Abadie,Tim Armstrong, Gary Chamberlain, Xiaohong Chen, Max Kasy, Elliot Lipnowski, Adam McCloskey,Emily Oster, Mikkel Plagborg-Møller, Tomasz Strzalecki, and Neil Thakral. The paper also benefitedfrom the comments of seminar and conference audiences at Brown University, Harvard University, theMassachusetts Institute of Technology, the University of Chicago, Boston College, Rice University,Texas A&M University, New York University, Columbia University, the University of California LosAngeles, the University of Texas at Austin, Oxford University, and from comments by conferencediscussant Jann Spiess. We thank our dedicated research assistants for their contributions to thisproject. E-mail: [email protected], jesse shapiro [email protected].

1

1 Introduction

Statistical decision theory, following Wald (1950), is the dominant theory of optimal

estimation in econometrics.1 The classical decision-theoretic model envisions an an-

alyst who estimates an unknown parameter based on some data. The performance

of the estimate is judged by its proximity to the true value of the parameter. This

judgment is formalized by treating the estimate as a decision that, along with the

parameter, determines a realized payoff or loss. For example, if the loss is taken to

be the square of the difference between the estimate and the parameter, then the

expected loss of the decision-maker is the estimator’s mean squared error, a standard

measure of performance.

Although many scientific situations seem well described by the classical model,

many others do not. Scientists often communicate their findings to a broad and diverse

audience, consisting of many different agents (e.g., practitioners, policymakers, other

scientists) with different opinions and objectives. These diverse agents may make

different decisions, or form different judgments, following a given scientific report. In

such cases, it is the beliefs and actions of these audience members which ultimately

matter for realized payoffs or losses.

In this paper we propose an alternative model of empirical science that captures

scientific situations of this kind. We develop sufficient conditions under which the

proposed model predicts very different scientific reports from the classical model.

We offer examples satisfying those conditions, and argue that in practical situations

similar to these examples, scientists’ reports seem to be more consistent with the

predictions of the proposed model than with the predictions of the classical model.

In the proposed communication model, the analyst reports a recommended deci-

sion (or estimate) to an audience based on some observed data. Each agent in the

audience takes her optimal decision (or forms an optimal estimate) after observing

the analyst’s report. Agents differ in their priors, and may therefore have different

optimal decisions (or estimates) following a given report. To simplify the analysis,

we assume that every possible prior belief on the unknown parameter is held by some

agent in the audience. A reporting rule (specifying a distribution of recommendations

1See Lehmann and Casella (1998) for a textbook treatment of statistical decision theory andStoye (2012) for a recent discussion of its relation to econometrics.

2

for each realization of the data) induces an expected loss for each agent, which we

call the rule’s communication risk.

We contrast the proposed communication model with a decision model in which

the analyst selects a decision (or estimate) that directly determines the loss for all

agents. Because agents in the decision model do not optimize following the ana-

lyst’s report, each agent’s expected loss, which we call the rule’s decision risk, is

weakly higher than under the communication model.2 The implications of the deci-

sion model coincide with those of the classical model. Our analysis therefore proceeds

by contrasting rankings of rules under the decision model with rankings under the

communication model.

We find that the two models can imply very different rankings of rules. An ex-

ample illustrates. Suppose that an analyst conducts a randomized controlled trial to

assess the effect of a deworming medication on the average body weight of children

in a low-income country. Even if deworming medication is known to (weakly) im-

prove nutrition, sampling error means that the treatment-control difference may be

negative. Under a canonical quadratic loss, the decision model implies that all au-

dience members prefer that the analyst censor negative estimates at zero, since zero

is closer to the (weakly positive) true effect than any negative number. Under the

same loss, the communication model implies that censoring discards potentially useful

information (the more negative the estimate, the weaker the evidence for a large posi-

tive effect), and has no corresponding benefit (agents can incorporate censoring when

determining their optimal decisions or estimates). While a policymaker choosing a

subsidy for the medication might well censor the decision (thus ruling out imposing

a tax), we claim, and illustrate by example, that a scientist choosing a report for a

research article would be unlikely to censor.

In the paper, we formalize and generalize the example. A rule is admissible with

respect to a given definition of risk if no other rule yields a weakly lower risk for

all agents and a strictly lower risk for at least one. While admissibility is a very

weak notion of optimality, in settings like the deworming example there exists no rule

that is simultaneously admissible for decision and communication. More generally,

we show that the sets of decision-admissible and communication-admissible rules do

2Decision risk is what Lehmann and Casella (1998, Chapter 4) call the Bayes risk.

3

not intersect when (i) some decision is dominated in loss and (ii) the set of feasible

decisions is smaller than the set of distinct optimal action profiles for agents.

We next turn to principles for choosing rules other than admissibility. One such

principle is to minimize weighted average risk with respect to some weights on the

audience. Applying this principle to decision risk leads to the class of Bayes decision

rules. Bayes decision rules with respect to full-support priors have strong optimality

properties in the classical setting, but because they are admissible in decision risk,

they are inadmissible in communication risk under conditions (i) and (ii).

Another principle is to minimize the maximum risk over agents in the audience. In

contrast to our findings for admissibility and optimality in weighted average risk, we

find that any rule that is minimax in decision risk is minimax in communication risk.

This finding establishes a sense in which any rule that is robust for decision-making

is also robust for communication. However, minimax rules can be inadmissible, and

under conditions (i) and (ii), any rule that is both minimax and admissible in decision

risk is minimax and inadmissible in communication risk.

We apply our model to two types of scientific settings recently studied in econo-

metrics: estimating the conditional expectation function when this function is known

to be monotonically decreasing, and estimating the optimal choice of treatment from

a finite set. In both settings, as in the deworming example, we show that the implica-

tions of the decision and communication models are different, and we argue that the

communication model is more consistent with practice in some real-world situations.

Our analysis is positive in nature. We do not take a stand on which principles

should guide scientists’ choice of statistical reports. Nor do we argue that the com-

munication model is a better description of all scientific situations than the decision

model. Rather, we argue that the communication model better describes some im-

portant situations, and that in these situations, it seems to better match scientific

practice.

Heterogeneity among agents plays a central role in our analysis. When agents are

homogeneous, the distinction between decision and communication risk is inconse-

quential, because a benevolent analyst can simply report the agents’ optimal decision

(or estimate) given the data. When agents are instead heterogeneous, the distinction

can be consequential, because different agents may prefer different decisions (or es-

4

timates). Although we model heterogeneity by assuming agents differ in their prior

beliefs, we show that our model is equivalent to one in which agents instead differ in

their loss functions.

Our baseline model assumes that the sets of possible parameter values, data re-

alizations, and decisions are finite. This greatly simplifies the analysis but has the

drawback that it precludes applying our framework to many canonical statistical situ-

ations such as estimating the mean of a Gaussian random variable. Online Appendix

B.1 develops continuous versions of the example and applications discussed in the

main text and shows that there remains a strong conflict between the decision and

communication models, in the sense that changing the notion of risk can reverse

dominance orderings.

We assume that the analyst’s report takes the form of a recommended decision.

This ensures that both decision and communication risk are well-defined for all rules

we consider, and thus allows us to directly compare the ranking of rules under the

two notions of risk. We discuss additional implications of this assumption in Section

2.3.

We are not aware of past work that studies the ranking of rules based on com-

munication risk in a setting with heterogeneous agents. Raiffa and Schlaifer (1961),

Hildreth (1963), Sims (1982, 2007), and Geweke (1997, 1999), among others, consider

the problem of communicating statistical findings to diverse, Bayesian agents.3 One

conclusion from this literature is that when the analyst can communicate the full

likelihood, data, or a sufficient statistic, the analyst should do so. Our analysis is

particularly related to that of Hildreth (1963) who studies, among other topics, the

properties of what we term communication risk in the single-agent setting. Banerjee

et al. (forthcoming) study minimax experimental design when the audience has het-

erogeneous priors. Spiess (2018) studies optimal estimation in a setting where, unlike

in our model, the objectives of the analyst and audience may be misaligned. Andrews

et al. (2020) study the implications of communication risk for structural estimation

3See also Efron (1986) and Poirier (1988). A related literature (e.g., Pratt 1965, Kwan 1999,Abadie forthcoming, Abadie and Kasy 2019, Frankel and Kasy 2018) assesses the Bayesian interpre-tation of frequentist inference. Another literature (e.g., Zhang et al. 2013, Jordan et al. 2018, Zhuand Lafferty 2018a) considers the problem of distributing statistical estimation and inference acrossmultiple machines when communication is costly.

5

in economics (see also Andrews et al. 2017).

Our setting is also related to the literature on comparisons of experiments following

Blackwell (1951, 1953).4 What we term communication risk has previously appeared

in this literature (see for instance Example 1.4.5 in Torgersen 1991), but the primary

focus has been on properties (e.g., Blackwell’s order) that hold for all possible beliefs

and loss functions. By contrast, we focus on the comparison between communication

risk and decision risk for a given loss function and class of priors.

Our setting is broadly related to large literatures on strategic communication

(Crawford and Sobel 1982) and information design (Bergemann and Morris 2019).

As in Farrell and Gibbons (1989), the receivers (agents) in our setting are hetero-

geneous. As in Kamenica and Gentzkow (2011), the sender (analyst) in our setting

commits in advance to a reporting strategy. Unlike much of the literature on strategic

communication, our setting does not involve a conflict of interest between the sender

and the receivers.

The remainder of the paper is organized as follows. Section 2 introduces our

setting, notation, and key definitions, and provides some preliminary results. Section

3 provides our main results on the implications of the communication model and its

differences with the decision and classical models. Section 4 presents applications

to two types of scientific situations. Section 5 concludes. An appendix following

the body text provides proofs for the main results. Online Appendix A proves the

results in the example and applications. Online Appendix B discusses extensions and

auxiliary results.

2 Model

2.1 Primitives

An analyst observes data X ∈ X for X a finite sample space, |X | <∞. The distribu-

tion of X is governed by a parameter θ ∈ Θ, with X|θ ∼ Fθ, for Θ a finite parameter

space. We assume that Fθ has support equal to X for all θ ∈ Θ. The analyst also

observes a public random variable V ∼ U [0, 1] that is independent of θ and X.

4Le Cam (1996) provides a brief review, while an extensive treatment can be found in Torgersen(1991).

6

The analyst publicly commits to a rule c : X × [0, 1] → ∆ (D) that maps from

realizations of the data X and the public random variable V into a distribution over

decisions d ∈ D, for D a finite space. Let C denote the set of all such rules and with a

slight abuse of notation let c (X, V ) ∈ D denote the random realization from a given

rule c ∈ C.Rules are evaluated by their performance with respect to a closed set A ⊆ ∆ (Θ)

of priors on the parameter space, which we will call the audience. Specifically, the

function ρ : C ×A → R≥0 describes the risk of rule c ∈ C with respect to any a ∈ A.

We discuss three possible choices of (A, ρ), along with their interpretation, in the next

section.

The ordering of rules under the risk function ρ (·, ·) may depend on the prior

a ∈ A. We consider three canonical criteria by which the analyst could select a rule,

or set of rules, in such a case.

Definition 1. For a given risk function ρ (·, ·) and audience A, a rule c∗ ∈ C is

• admissible if there exists no rule c ∈ C such that ρ (c, a) ≤ ρ (c∗, a) for all

a ∈ A, with strict inequality for at least one a ∈ A.

• ω-optimal if for a probability measure ω with support equal to A∫Aρ (c∗, a) dω (a) = inf

c∈C

∫Aρ (c, a) dω (a) . (1)

• minimax if

supa∈A

ρ (c∗, a) = infc∈C

supa∈A

ρ (c, a) .

These criteria are central to the classical study of statistics (Lehmann and Casella

1998). They also have an intuitive economic meaning. Admissibility excludes rules

that are dominated. ω-optimality defines the set of rules optimal with respect to

full-support Pareto weights ω on the audience. Minimaxity minimizes the worst-case

risk and is familiar from welfare economics, game theory, and the study of decision-

making under ambiguity. Existence of rules optimal according these criteria follows

from standard arguments.

7

Proposition 1. Suppose that ρ (c, a) is continuous in a for all c ∈ C, and that the set

of risk functions {ρ (c, ·) : c ∈ C} is compact in the supremum norm. Then there exists

an admissible rule, an ω-optimal rule for any ω, a minimax rule, and an admissible

minimax rule. Moreover, any ω-optimal rule is admissible.

Under the conditions of Proposition 1, the criteria we consider are available to the

analyst, in the sense that they each admit at least one rule.

2.2 Risk Functions for Communication and Decision

In our setting, a model of empirical science is characterized by a risk function ρ (·, ·)and an audienceA, which together determine the implications of the canonical criteria

for the analyst’s choice of rules.

We focus on three such models.

Definition 2. Fix a loss function L : D ×Θ→ R≥0. Then:

• The communication model takes A = ∆ (Θ) and

ρ (c, a) = R∗a (c) = Ea

[mind

Ea [L (d, θ) |c (X, V ) , V ]],

for R∗a (c) the communication risk of rule c ∈ C for prior a ∈ ∆ (Θ) and Ea

the expectation under a.

• The decision model takes A = ∆ (Θ) and

ρ (c, a) = Ra (c) = Ea [L (c (X, V ) , θ)]

for Ra (c) the decision risk of rule c ∈ C for prior a ∈ ∆ (Θ).

• The classical model takes A to be the vertices of ∆ (Θ), such that each prior

a ∈ A places probability 1 on some θ (a) ∈ Θ, and takes

ρ (c, a) = Rθ(a) (c) = Eθ(a) [L (c (X, V ) , θ (a))]

for Rθ(a) (c) the frequentist risk of rule c ∈ C for the parameter θ (a) ∈ Θ.

8

The communication risk R∗a (c) is the ex-ante expected loss under the prior a from tak-

ing the a-posteriori optimal decision after observing c (X, V ) and V . The decision risk

Ra (c) is the expected loss under the prior a from taking the decision c (X, V ). Finally,

the frequentist risk Rθ(a) (c) is the expected loss under the parameter value θ (a) from

taking the decision c (X, V ). Lemma 1 in the appendix shows that communication

and decision risk satisfy the conditions of Proposition 1, while these conditions hold

trivially for frequentist risk. Hence, Proposition 1 implies the existence of admissible,

ω-optimal, and minimax rules in the communication, decision, and classical models.

The decision and classical models have different audiences but the same risk func-

tion. The following proposition shows that their implications coincide in a strong

sense.

Proposition 2. The decision model implies the same sets of admissible rules, ω-

optimal rules for some ω, and minimax rules as the classical model.

Intuitively, because the audience A = ∆ (Θ) consists of the set of all possible

convexifications of the vertices of ∆ (Θ), the dominance relation (admissibility), the

implications of full-support averages (ω-optimality), and the worst-case risk (mini-

maxity) are all preserved when switching from the classical model’s audience to the

decision model’s audience. This point is well-understood in the literature (see, e.g.,

Stoye 2011).

While we focus on the case with A = ∆ (Θ) for simplicity of exposition, many of

our results (though not Proposition 2) extend to the case of a general closed, convex

audience A ⊆ ∆ (Θ). We highlight these extensions in the proofs. That said, some

degree of heterogeneity is necessary for the distinction between communication and

decision risk to be interesting. For an audience A = {a} consisting of a single prior

a, all three notions of optimality discussed in Definition 1 coincide, and optimality in

the decision model implies optimality in the communication model.

It is worth briefly highlighting the role of randomization in the rules c. Specifically,

the analyst may randomize publicly through the random variable V , considering c :

X × [0, 1] → D, privately through the choice of a randomized decision, c : X →∆ (D) , or both publicly and privately, c : X × [0, 1] → ∆ (D). For the decision

and classical models private and public randomization are equivalent, in the sense

9

that the risk depends only on the distribution of c (X, V ) |X. For the communication

model, by contrast, the nature of the randomization matters. Intuitively, since V

is observed, publicly randomized rules c : X × [0, 1] → D can be viewed as ex-ante

mixtures over non-random rules c : X → D, where it is revealed ex-post which rule

c was selected. By contrast, privately randomized rules c : X → ∆ (D) can be

viewed as ex-ante mixtures over non-random rules c : X → D, where it is never

revealed which c was selected. Hence, the public random variable V ∼ U [0, 1] is a

modeling device to allow the analyst to randomize across rules while revealing the

outcome of the randomization to the audience. Lemma 2 in the appendix shows that

public randomization always yields weakly lower communication risk than private

randomization, with a strict inequality in some cases.

2.3 Interpretation of the Models

To interpret the models, we can identify each prior a ∈ ∆ (Θ) with some agent. Agents

in the audience for a given scientific finding might include practitioners, policymakers,

or other scientists. We highlight two different ways to interpret the decision d ∈ Dand associated loss L (d, θ).

One interpretation is that the decision d ∈ D represents a real-world action whose

consequences are captured by L (d, θ). For example, doctors may need to choose a

treatment, policymakers to set a tax, and scientists to decide on what experiment to

run next. On this interpretation, the decision model reflects a situation in which the

analyst makes a decision on behalf of all agents, or equivalently, all agents are bound

to take the decision recommended by the analyst. The communication model, by

contrast, reflects a situation in which each agent is free to take her optimal decision

given the information in the analyst’s report.

Another interpretation is that the decision d ∈ D represents a best guess whose

departure from the truth is captured by L (d, θ). This interpretation is evoked by

canonical losses, such as L (d, θ) = (d− θ)2, that increase in the distance between

the estimate and the parameter. On this interpretation, the decision model reflects

a situation in which each agent evaluates the quality of the analyst’s guess according

to the agent’s prior. The communication model, by contrast, reflects a situation in

10

which each agent evaluates the quality of the agent’s own best guess, as informed by

the analyst’s report as well as the agent’s prior.

In many real-world situations the agents in the audience for a given scientific find-

ing will have diverse opinions and may therefore make different decisions, or form dif-

ferent best guesses about an unknown parameter, after learning the same report. The

communication model better reflects such situations than does the decision model.

In other situations—for example, a government committee deciding on the appropri-

ate treatment to reimburse for a given diagnosis for all practitioners, or a scientific

committee deciding where next to point a telescope that will provide data to many

researchers—the decision model seems a better fit.

In our model, agents differ only in their prior beliefs about the parameter. Casual

experience suggests that different agents (doctors, policymakers, scientists) do often

disagree subjectively about how to interpret evidence, and the assumption of hetero-

geneous priors is one way to capture such disagreements (Morris 1995). But agents

may also differ in their preferences or objectives. Online Appendix B.2 shows that

models in which an audience of agents differ in their loss functions can be cast into our

setting through appropriate relabeling and choice of A.5 The same appendix shows

how to accommodate heterogeneous beliefs about the likelihood Fθ. Online Appendix

B.3 deals with the case where an agent’s payoff is determined by the agent’s regret

(i.e., loss or risk relative to a best-possible decision or rule).

It is also possible to do away with heterogeneous agents altogether by envisioning a

single agent who receives some information about the parameter θ that is not available

to the analyst at the time they make their report. Under this interpretation, A is the

set of posterior beliefs that the agent may hold following receipt of the information.

Under the decision model, the agent is bound by the recommended decision regardless

of the additional information. Under the communication model, the agent is free to

make an optimal decision after learning the additional information.

We assume throughout that the set of rules C available to the analyst is the set of

estimators or decision rules, i.e. rules mapping from the sample space to distributions

on the decision space D. This assumption is important for comparing the implications

5Brown (1975) considers a setting with a collection of possible loss functions and proposes corre-sponding notions of admissibility.

11

of the decision and communication models, because decision risk is not well-defined for

rules c /∈ C. Modifying the communication model to allow the use of a broader set of

rules C ′ ⊃ C would only strengthen the conflicts between decision and communication

risk that we document in Section 3.6 An unrealistic aspect of the communication

model is that communication risk (unlike decision risk) is invariant to permutations

of the analyst’s report (e.g., reporting high values of d when the parameter θ is

expected to be low, or vice versa). Online Appendix B.4 shows how to eliminate this

unrealistic aspect of the model without altering our substantive findings.

In some situations, including some of the examples we discuss below, the com-

munication constraint is not binding, in the sense that D is rich enough to allow the

analyst to communicate all the decision-relevant information in the data. In general,

this need not be the case, and when it is not, an analyst concerned with minimiz-

ing communication risk might like to use a richer vocabulary than D. In practice,

scientists often report estimates or other summaries that do not convey all of the

information in the data. We think a plausible reason is that there are communication

or information processing constraints on the part of the agents in the audience. Our

model captures those constraints by the restriction to rules c ∈ C. Restricting reports

to use a finite vocabulary is a standard way to model such constraints in information

theory (e.g., Cover and Thomas 2006), and has been explored in recent studies of

nonparametric statistics under computational constraints (Zhu and Lafferty 2018a,

b).

Online Appendix B.1 develops versions of our example and applications with

continuous parameter spaces. For the example and one application we further al-

low for a continuous sample space X and decision space D. With continuous D,

communication-optimal rules can take an unrealistic form, for example encoding the

full data in the decimal expansion of c (X, V ). The appendix therefore focuses on

showing that the decision and communication models deliver conflicting rankings of

particular, realistic-seeming rules. How best to model communication constraints for

general settings with continuous decision spaces seems a potentially interesting topic

for future work.

6If, according to a given optimality criterion, any rule that is optimal in decision risk is notoptimal in communication risk, then the same applies if we evaluate communication risk with respectto C′ ⊃ C.

12

3 Implications of the Communication Model

In this section we discuss the implications of the communication model, emphasizing

the contrast with the decision (and hence classical) model. To illustrate the key

intuitions, we return to (and elaborate) the example from the introduction.

Example. An analyst observes data on weight gain for a sample of n children enrolled

in a randomized trial of deworming drugs. Weight is measured to finite precision, so

the weight gain for child i is Xi ∈ X0 for X0 a finite set with |X0| ≥ 2. The data are

thus

X = (X1, ..., Xn) ∈ X = X n0 .

Indices i are assigned at random, with children {1, ..., n} assigned to control and

children {n+ 1, ..., n} assigned to treatment, with n, n − n ≥ 3 . Children’s weight

gains Xi, Xj are independent for all i 6= j. For children in the control group, Xi ∼F0 (θ) and for those in the treatment group Xi ∼ F0

(θ), where F0 (t) is an exponential

family with probability mass function of the form f0 (x; t) = exp (tx)h (t) g (x) . This

implies that the control and treatment group means

(X,X

)=

(1

n

n∑i=1

Xi,1

n− n

n∑i=n+1

Xi

)

are a sufficient statistic for θ =(θ, θ). Let XM and XM denote the sets of possible

values for each of these means.

The average treatment effect of deworming drugs on child weight is

EF0(θ) [Xi]− EF0(θ) [Xi] = ATE (θ) ,

which can be estimated without bias by the difference in means X−X. Suppose that

the average treatment effect is known a priori to be non-negative, and that θ, θ ∈ Θ0

for Θ0 a finite set with |Θ0| ≥ 2 and maxθ∈Θ0EF0(θ) [Xi]−minθ∈Θ0 EF0(θ) [Xi] > 2/3.

The parameter space is thus

Θ ={θ ∈ Θ2

0 : ATE (θ) ≥ 0}

={θ ∈ Θ2

0 : θ ≥ θ}.

13

The audience consists of governments who must decide how much to subsidize

(or tax) deworming drugs. The optimal Pigouvian subsidy is simply ATE (θ). The

governments face a loss L (d, θ) = (d− ATE (θ))2 for d the per-unit subsidy, with

d < 0 denoting a tax. The set of feasible decisions,

D ={x− x : (x, x) ∈ XM ×XM

},

corresponds to the support of the average treatment effect estimate X −X. Online

Appendix B.5 provides a microfoundation for the quadratic loss (d− ATE (θ))2 in a

Pigouvian setting.

Because θ ≥ θ by assumption, a tax (d < 0) is never optimal. Therefore, rules that

sometimes recommend d < 0 are unappealing from the standpoint of decision risk.

Nevertheless, because the sample is finite we may have X < X even though θ ≥ θ.

Agents who are, say, unsure whether to subsidize a little or not at all may value

knowing that X < X, because they may apply a lower subsidy when X < X than

when, say, X = X. Therefore, from the standpoint of communication risk, rules that

never report d < 0 may be unappealing because they suppress useful information.

We may alternatively envision the loss as capturing the scientific community’s

desire for a good guess of the true average treatment effect. On this interpretation,

a guess d < 0 is again unappealing from the standpoint of decision risk (such a guess

cannot be right), but may be appealing from the standpoint of communication risk

(because it conveys useful information that agents can use in formulating their own

guesses). N

The Example illustrates a setting in which we expect the implications of the

communication and decision models to be very different. We next formalize the

source of the tension. The following subsections show how it manifests under each of

the criteria that we consider for choosing rules.

Definition 3. A decision d ∈ D is dominated in loss if there exists d′ ∈ D such

that L (d, θ) ≥ L (d′, θ) for all θ ∈ Θ, with strict inequality for some θ′ ∈ Θ.

A decision d ∈ D is dominated in loss if another decision yields a weakly smaller

loss under all parameter values, with a strictly smaller loss under some parameter

value.

14

Definition 4. Let P be the set of partitions of X , with generic element P ∈ P. Let

P∗ denote the subset of P such that for every cell Xp ∈ P ∈ P∗, each agent has at

least one decision d ∈ D that is optimal for every X ∈ Xp. That is,

P∗ =

{P ∈ P :

{∩X∈Xp arg min

d∈DEa [L (d, θ) |X]

}6= ∅ for all Xp ∈ P, a ∈ ∆ (Θ)

}.

The effective size of the sample space X , denoted N (X ), is the minimal size

of a partition in P∗

N (X ) = min {|P | : P ∈ P∗} .

The effective size of the sample space is the smallest number of cells into which

we can partition the sample space X such that knowing only which cell contains X

is sufficient for all agents to take optimal decisions. For example, such a partition

would group data realizations x and x′ that imply the same likelihood, i.e., for which

Prθ {X = x} = Prθ {X = x′} for all θ.

Claim 1. In the Example, any d < 0 is dominated in loss, and N (X ) = |XM | ×∣∣XM

∣∣ > |D|.Intuitively, any decision d < 0 is dominated by the decision d′ = 0 because decision

d′ is strictly closer to the optimal decision ATE (θ). At the same time any two distinct

realizations of the sufficient statistics imply distinct optimal actions for some agent,

so the effective size of the sample space is |XM | ×∣∣XM

∣∣.3.1 Admissibility

When there exists a dominated decision and N (X ) ≥ |D|, the communication and

decision models have non-overlapping sets of admissible rules.

Proposition 3. If (i) there exists a decision d ∈ D that is dominated in loss and

(ii) N (X ) ≥ |D| , then any rule c ∈ C that is admissible under the decision model is

inadmissible under the communication model, and vice versa.

The proof of Proposition 3 proceeds in two steps. The first step establishes that,

under the decision model, if d ∈ D is dominated in loss by d′ ∈ D, then any rule

15

c ∈ C that recommends d with positive probability is dominated by the rule c′ ∈ Cthat instead recommends d′.

The second step establishes that, under the communication model, if the effective

size of the sample space is weakly larger than the decision space, then any rule c ∈ Cwhich never uses some d ∈ D is dominated by another rule c′ ∈ C which sometimes

uses d. The reason is that, for any such c, there exists X ∗ ⊆ X such that (i) different

elements of X ∗ would lead some agent a to take different actions and (ii) c sometimes

assigns the same signal to all X ∈ X ∗. It follows that c can be dominated by a rule

c′ that reports d on the subset of X ∗ where a would like to take a particular action,

which strictly reduces the communication risk for agent a without increasing it for

any other agent.

Propositions 2 and 3 immediately establish conditions for a conflict between the

communication and classical models:

Corollary 1. If (i) there exists a decision d ∈ D that is dominated in loss and (ii)

N (X ) ≥ |D| , then any rule c ∈ C that is admissible under the classical model is

inadmissible under the communication model, and vice versa.

Claim 1 allows us to apply Proposition 3 and Corollary 1 to the Example.

Corollary 2. In the Example, there is no rule c ∈ C that is admissible under both the

decision model and the communication model, and no rule c ∈ C that is admissible

under both the classical model and the communication model.

Online Appendix B.1.1 develops a variant of this example with Gaussian data

and a continuous parameter space, and shows that decision and communication risk

induce conflicting dominance orderings.

It is also possible to modify the Example so the analyst reports estimates of

the treatment and control group means rather than of their difference. With this

modification the communication constraint is not binding, in the sense that it is

feasible to report a sufficient statistic for θ. The hypotheses of Proposition 3, and

hence the conclusions of Corollary 2, continue to hold for this alternative formulation.7

7Formally, let d =(d, d)∈ D = XM ×XM , and define the loss as

L (d, θ) =(d− d−ATE (θ)

)2.

16

Example. (Continued) Kruger et al. (1996) conducted an early randomized con-

trolled trial of the effect of anthelmintic therapy (deworming drugs) on children’s

growth. A separate randomization was used to study the effect of iron-fortified soup.

Children were weighed to the nearest 0.1kg at baseline and endline. Among children

who received unfortified soup, those receiving deworming drugs had a lower average

growth over the intervention period (mean weight gain of 0.9kg, n = 15) than those

receiving a placebo treatment (mean weight gain of 1.0kg, n = 14; see Table 4 of

Kruger et al. 1996). Kruger et al. (1996) state that “[Positive effects on weight gain]

can be expected with reduction in diarrhoea, anorexia, malabsorption, and iron loss

caused by parasitic infection” (p. 10). In a later review of the literature, Croke et al.

(2016) state that “there is no scientific reason to believe that deworming has negative

side effects on weight” (p. 19). If we interpret these statements to mean that the

average treatment effect is known to be non-negative, then censoring the estimated

treatment effect at 0 (i.e. reporting that the treatment and control groups experienced

the same average weight gain) would lead to an estimate strictly closer to the truth

than the negative estimate implied by the group means, and would therefore domi-

nate in mean squared error. However, Kruger et al. (1996) report the group means

and do not report a censored estimate. Indeed, among the four studies that Croke

et al. (2016) identify as implying negative point estimates of the effect of deworming

drugs on weight, none published a censored point estimate.8

The preceding analysis suggests one possible explanation. While each member of

the audience for the research might be happy with some censoring scheme, different

members of the audience would like different ones, and (under the conditions above)

By Claim 1, d− d < 0 is dominated in loss and N (X ) = |XM | ×∣∣XM

∣∣ ≥ |D|.8Croke et al. (2016, Figure 2) identify 4 negative point estimates out of a total of 22 reviewed.

These 4 negative point estimates are from 4 distinct studies (including Kruger et al. 1996), out of atotal of 20 distinct studies reviewed. Donnen et al. (1998, Table 2) report the regression-adjustedweight gains for a group treated with mebendazole and a control. They further report that thetreated group’s gain is statistically significantly below that of the control group at all time horizonsconsidered. Croke et al. (2016, Figure 2) report a statistically significant effect on weight gain of-0.45kg based on the data from Donnen et al. (1998). Miguel and Kremer (2004, Table V) reporttreatment and control group means of standardized weight-for-age and a statistically insignificantdifference in means of -0.00 to rounding precision. Croke et al. (2016, Figure 2) report a statisticallyinsignificant effect on weight of -0.76kg based on the data from Miguel and Kremer (2004). Awasthiet al. (2000, Table 1) report treatment and control group means of weight gain and report thatthese are not statistically different. Croke et al. (2016, Figure 2) report a statistically insignificanteffect of -0.05kg based on the data from Awasthi et al. (2000).

17

any single censoring scheme sacrifices decision-relevant information for some audience

member. N

3.2 ω-optimality

In the classical setting, a rule c is a Bayes decision rule (or Bayes estimator) if it

prescribes the optimal decision given the data X for a Bayesian agent with some

proper prior (e.g., Lehmann and Casella 1998, p. 6; Robert 2007, p. 63). Such decision

rules have strong optimality properties in the classical setting. In particular, Complete

Class Theorems show that in many settings any rule that cannot be expressed as Bayes

is dominated by one that can be. These Theorems have been invoked as part of the

justification for using Bayes procedures (e.g., Robert 2007, p. 512).

Under the decision model any rule that is ω-optimal is a Bayes decision rule in the

classical sense. The reason is that the weighted average decision risk for the audience

can be expressed as the decision risk for a single agent with a weighted average prior

aω (θ) =∫

∆(Θ)a (θ) dω (a):

∫∆(Θ)

Ra (c) dω (a) =∑θ∈Θ

(Eθ [L (c (X, V ) , θ)]

∫∆(Θ)

a (θ) dω (a)

). (2)

Unlike decision risk, communication risk does not generally admit a representation like

(2). Nonetheless, we show in Online Appendix B.6 that a Complete Class Theorem

holds in both the decision model and the communication model.9

Propositions 1 and 3 establish conditions under which the set of ω-optimal rules

in the communication model does not overlap with the set of ω-optimal rules in the

decision model or with the set of Bayes decision rules with respect to full-support

priors.


N (X ) ≥ |D| , then:

9To describe this result, define ω-optimality analogously to ω-optimality, except that the supportof ω may be a strict subset of ∆ (Θ). Unlike ω-optimality, ω-optimality need not imply admissibility.The Complete Class Theorem states that any rule that is not ω-optimal is dominated by some ω-optimal rule.

18

(a) Any rule c ∈ C that is ω-optimal under the decision model is inadmissible, and

therefore not ω′-optimal for any ω′, under the communication model, and vice versa.

(b) Any rule c ∈ C that can be expressed as a Bayes decision rule with respect to a

prior with full support on Θ is inadmissible, and therefore not ω-optimal for any ω,

under the communication model.

(c) Any rule c ∈ C that is ω-optimal under the communication model is not an

optimal decision rule for any agent with a full-support prior, a ∈ int (∆ (Θ)).

Corollary 3 implies a very strong sense in which the communication model predicts

different ω-optimal rules than the decision (and hence classical) model. This arises

because (under the conditions of the corollary) Bayes decision rules with respect to

full-support priors discard actionable information.

Example. (Continued) Suppose that Kruger et al. (1996) had selected a full-support

prior a ∈ int (∆ (Θ)) and reported the associated Bayes decision. By construction,

under both the decision and communication models, this rule is optimal for audience

members with prior a. Moreover, under the decision model, the rule is optimal with

respect to some full-support Pareto weights ω on the audience, and hence admissible.

Under the communication model, however, this rule is not optimal with respect to

any such Pareto weights, and is inadmissible. The reason is that any such Bayes

decision rule never selects d < 0, and so censors the data unnecessarily.

Figure 1 illustrates this intuition in a toy numerical example with F0 (t) a Bernoulli

distribution with success probability t, Θ0 = {0.3, 0.7}, and hence

Θ = {(0.3, 0.3) , (0.3, 0.7) , (0.7, 0.7)}. The upper-left plot shows the Bayes decision

rule c for a particular full-support prior a ∈ int (∆ (Θ)). The rule c never makes a

negative report, and is therefore dominated in communication risk by another rule c′

that does sometimes make a negative report, shown in the upper-right plot. The lower

plot shows, for each a, the normalized difference R∗a (c) − R∗a (c′) in communication

risk between the rule c and the dominating rule c′.

The dominating rule c′ achieves weakly lower communication risk than the rule c

for all agents a ∈ ∆ (Θ). Because rule c is the Bayes decision rule for agent a, the

dominating rule c′ achieves the same communication risk as c for this agent. The same

holds for other agents who, like agent a, put a lot of prior mass on the parameter

19

values {(0.3, 0.3) , (0.7, 0.7)} under which the intervention is ineffective. However,

for other agents, such as a′, who put more prior mass on the possibility that the

intervention is effective, the dominating rule c′ achieves strictly lower communication

risk than the rule c. Intuitively, such agents value knowing when X < X, i.e. when

the evidence for gains from treatment is especially weak.

Croke et al. (2016, Figure 2) review 20 distinct studies reporting on randomized

controlled trials of the effects of deworming drugs on children’s growth. None of these

reports a Bayesian posterior mean for a proper prior, or other explicitly Bayesian

estimate of the effect of deworming on weight. Our analysis illustrates one possible

reason for this, which is that reporting a Bayes decision with respect to one prior can

sacrifice information useful to an agent with a different prior. N

3.3 Maximum Risk

The set of minimax rules in the communication model nests that in the decision

model.

Theorem 1. Any rule c∗ ∈ C that is minimax under the decision (or classical) model

is minimax under the communication model.

An intuition for the proof is as follows. Pick some rule c∗ ∈ C that is minimax in

decision risk. Because the set of priors ∆ (Θ) is convex, we can show (following results

in Grunwald and Dawid 2004, who in turn build on the classic minimax theorem of

von Neumann 1928) that there exists some worst-off agent a∗ ∈ ∆ (Θ) for whom c∗

is optimal in decision risk. For this agent a∗, it follows that c∗ must also be optimal

in communication risk, because the agent’s optimal decision following any report

c∗ (X, V ) will simply be the report itself. But because this agent a∗ is worst-off in

decision risk, it follows that this same agent must also be worst-off in communication

risk. Collecting this reasoning gives:

infc∈C

supa∈∆(Θ)

Ra (c) = infc∈C

Ra∗ (c) = Ra∗ (c∗) = R∗a∗ (c∗) = infc∈C

R∗a∗ (c) = infc∈C

supa∈∆(Θ)

R∗a (c) .

Hence c∗ is minimax in communication risk. Convexity of the audience is important

20

Figure 1: Illustration of a Bayes decision rule in running example

Bayes decision rule c Decensored Bayes decision rule c′

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0.1

0.1

0.1

0.1

0.1

0

0

0

0

0

0

0.1

0.2

0.2

0.2

0.2

0

0

0

0

0

0

0.2

0.3

0.3

0.3

0.3

0

0

0

0

0

0

0.2

0.3

0.4

0.4

0.4

0

0

0

0

0

0

0.2

0.3

0.4

0.4

0.4

0

0

0

0

0

0

0.2

0.3

0.4

0.4

0.4

0

0

0

0

0

0

0.2

0.3

0.4

0.4

0.4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

-0.2

-0.1

0

-0.2

-0.1

0

0

-0.2

-0.1

0

0

0

-0.2

-0.1

0

0

0

0

-0.2

-0.1

0.1

0.1

0.1

0.1

0.1

-0.2

-0.1

0

0.1

0.2

0.2

0.2

0.2

-0.2

-0.1

0

0

0.2

0.3

0.3

0.3

0.3

-0.2

-0.1

0

0

0

0.2

0.3

0.4

0.4

0.4

-0.2

-0.1

0

0

0

0

0.2

0.3

0.4

0.4

0.4

-0.1

0

0

0

0

0

0.2

0.3

0.4

0.4

0.4

0

0

0

0

0

0

0.2

0.3

0.4

0.4

0.4

-1

-0.9

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.9

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.7

-0.6

-0.5

-0.4

-0.3

-0.6

-0.5

-0.4

-0.3

-0.5

-0.4

-0.3

-0.4

-0.3

-0.3

Communication risk of Bayes decision rulerelative to decensored rule

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Note: The plots illustrate properties of a Bayes decision rule in the running example withΘ0 = {0.3, 0.7} and thus Θ = {(0.3, 0.3) , (0.3, 0.7) , (0.7, 0.7)}. We set n = 20, n = 10,and X0 = {0, 1}, so that XM = XM = {0, 0.1, ..., 1} and D = {−1,−0.9, ..., 1}. Weassume that F0 (t) is a Bernoulli distribution with success probability t. The upper leftplot shows the Bayes decision rule c for a particular full-support prior a ∈ int (∆ (Θ)),specifically a = (0.1, 0.1, 0.8). The upper right plot shows a rule c′ such that c′ (X) =c (X)+1 {c (X) = 0}min

{0, X −X

}. The lower plot shows the difference in communication

risk R∗a (c) − R∗a (c′) between the two rules for each agent a ∈ ∆ (Θ), normalized by themaximum communication risk (over all agents) from a report of the full data. In the plot,we label the prior a, for which R∗a (c) = R∗a (c′), as well as another prior a′, for whichR∗a′ (c) > R∗a′ (c

′).

21

for this line of reasoning, because without it we cannot guarantee that the worst-case

prior corresponds to an agent a∗ in the audience.

Although minimax rules under the decision model are guaranteed to be minimax

under the communication model, they need not be appealing if the analyst cares about

more than the worst-case risk. Recall that by Proposition 1 and Lemma 1, both the

decision and communication models admit a nonempty set of minimax admissible

rules. Under the conditions of Proposition 3, these sets do not intersect, and thus

there is no rule that is minimax admissible under both models.


N (X ) ≥ |D| , then any rule c∗ ∈ C that is minimax and admissible under the decision

(or classical) model is minimax and inadmissible under the communication model.

Example. (Continued). In the Example, any rule c∗ that is minimax under the

decision model reports d ≥ 0 with probability one. Hence, all rules that are minimax

under the decision model are inadmissible under the communication model in this

setting. Figure 2 continues our numerical illustration. The upper-left plot shows

a minimax decision rule c∗ in this numerical example. The rule c∗ never makes a

negative report, and is therefore dominated in communication risk by another rule c∗∗

that does sometimes make a negative report, shown in the upper-right plot. The lower

plot shows, for each a, the normalized difference R∗a (c∗)−R∗a (c∗∗) in communication

risk between the rule c∗ and the dominating rule c∗∗. The worst-off agent a∗, and other

agents who likewise put significant mass on parameters under which the treatment

is ineffective, do not benefit in communication risk from the dominating rule c∗∗.

However, agents who, like the labeled agent a∗∗, believe a priori that the treatment is

very likely effective, benefit in communication risk from rule c∗∗ because overturning

their priors requires strong evidence of ineffectiveness. Such evidence can be provided

by reporting the treatment-control difference X −X in cases where this difference is

negative. N

22

Figure 2: Illustration of minimax decision rule in running example

Minimax decision rule c∗ Decensored minimax decision rule c∗∗

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0.1

0.1

0.1

0.1

0.1

0.1

0.1

0

0

0

0

0.1

0.2

0.2

0.2

0.3

0.3

0.3

0

0

0

0

0.1

0.2

0.3

0.4

0.4

0.4

0.4

0

0

0

0

0.1

0.2

0.4

0.4

0.4

0.4

0.4

0

0

0

0

0.1

0.3

0.4

0.4

0.4

0.4

0.4

0

0

0

0

0.1

0.3

0.4

0.4

0.4

0.4

0.4

0

0

0

0

0.1

0.3

0.4

0.4

0.4

0.4

0.4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

-0.2

-0.1

0

-0.2

-0.1

0

0

-0.2

-0.1

0

0

0

-0.2

-0.1

0

0

0

0

0.1

0.1

0.1

0.1

0.1

0.1

0.1

-0.2

0.1

0.2

0.2

0.2

0.3

0.3

0.3

-0.2

-0.1

0.1

0.2

0.3

0.4

0.4

0.4

0.4

-0.2

-0.1

0

0.1

0.2

0.4

0.4

0.4

0.4

0.4

-0.2

-0.1

0

0

0.1

0.3

0.4

0.4

0.4

0.4

0.4

-0.1

0

0

0

0.1

0.3

0.4

0.4

0.4

0.4

0.4

0

0

0

0

0.1

0.3

0.4

0.4

0.4

0.4

0.4

-1

-0.9

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.9

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.7

-0.6

-0.5

-0.4

-0.3

-0.6

-0.5

-0.4

-0.3

-0.5

-0.4

-0.3

-0.4

-0.3

-0.3

Communication risk of minimax rulerelative to decensored rule

0

2

4

6

8

10

12

14

10-3

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Note: The plots illustrate properties of a minimax decision rule in the running examplewith Θ0 = {0.3, 0.7} and thus Θ = {(0.3, 0.3) , (0.3, 0.7) , (0.7, 0.7)}. We set n = 20, n = 10,and X0 = {0, 1}, so that XM = XM = {0, 0.1, ..., 1} and D = {−1,−0.9, ..., 1}. Weassume that F0 (t) is a Bernoulli distribution with success probability t. The upper leftplot shows the minimax decision rule c∗. The upper right plot shows a rule c∗∗ such thatc∗∗ (X) = c∗ (X) + 1 {c∗ (X) = 0}min

{0, X −X

}. The lower plot shows the difference in

communication risk R∗a (c∗) − R∗a (c∗∗) between the two rules for each agent a ∈ ∆ (Θ),normalized by the maximum communication risk (over all agents) from a report of the fulldata. In the plot, we label the prior a∗ of the worst-off agent, for which R∗a∗ (c∗) = R∗a∗ (c∗∗),as well as another prior a∗∗, for which R∗a∗∗ (c∗) > R∗a∗∗ (c∗∗).

23

4 Applications

In this section we apply our model to two types of scientific situations studied in the

recent econometrics literature. In both situations we show that the implications of

the decision and communication models are different, and we argue that the commu-

nication model is more consistent with some observed scientific practices.

4.1 Estimation of a Monotone Function

Our first application is to the estimation of a function that is known to be mono-

tonically decreasing, which is a special case of estimation under shape restrictions.

Shape restrictions arise in many important economic situations, and econometric pro-

cedures that exploit these restrictions can offer improved performance according to

conventional criteria (see, e.g., the review by Chetverikov et al. 2018).

As a concrete example, consider estimating the effect of the length of a job-seeker’s

current spell of unemployment on the probability that her resume receives a callback.

Following Oberholzer-Gee (2008), Kroft et al. (2013), and Eriksson and Rooth (2014),

we may imagine that the analyst conducts an audit study, submitting artificial re-

sumes with randomly assigned employment histories, and recording the responses

from potential employers. Multiple classes of economic models (reviewed, for ex-

ample, in Kroft et al. 2013, Section II) imply that a longer current unemployment

spell makes an applicant less attractive to an employer, which can be interpreted

as a monotonicity restriction that the conditional probability of a callback is weakly

decreasing in the duration of the current unemployment spell.

Formally, let Yi ∈ {0, 1} denote a binary outcome, and suppose we are interested

in the conditional mean E [Yi|Zi] of Yi given some discrete Zi ∈ Z = {z1, ..., zJ}, with

z1 < z2 < ... < zJ . In the resume audit setting, we can think of Yi as an indicator

for whether resume i generates a callback, and Zi as the duration (in months) of the

fictitious applicant’s current unemployment spell.

The analyst observes n ≥ 3 independent draws of Yi for each predictor value zj.

Denote the fraction of successes when Zi = zj by Xj ∈{

0, 1n, ..., 1

}. The number of

successes nXj follows a binomial distribution and it is without loss to represent the

24

data as the vector of success rates,

X = (X1, ..., XJ) ∈ X =

{0,

1

n, ..., 1

}J.

The unknown parameter is the vector θ = (θ1, ..., θJ) of conditional means θj =

E [Y |Z = zj]. We assume that conditional mean is weakly decreasing in j

θ1 ≥ θ2 ≥ ... ≥ θJ . (3)

We assume that θj ∈ Θ0 ⊂ (0, 1) for each j, for Θ0 a finite set such that max {Θ0} −min {Θ0} > 2/3. The parameter space is thus

Θ ={θ ∈ ×Jj=1Θ0 : θ1 ≥ θ2 ≥ ... ≥ θJ

}.

We take the decision space to equal the sample space, D = X , so the communi-

cation constraint is not binding in the sense that it is feasible to report the full data

X. Finally, we suppose that the objective is to estimate θ, which we formalize by

considering the quadratic loss

L (d, θ) = ‖d− θ‖22 =

∑j

(dj − θj)2 .

The results of this section hold for any loss of the form L (d, θ) = ‖d−θ‖pp or L (d, θ) =

‖d− θ‖p for p > 1.

A natural rule is to simply report the data, c (X, V ) = X. All audience members

agree that such a report is unbiased, in the sense that Ea [X − θ] = 0 for all a ∈ ∆ (Θ) .

However, Xj need not be decreasing in j, and indeed Prθ {Xj > Xj−1} > 0 for all j

and all θ ∈ Θ. Hence, this rule does not respect the restrictions on the parameter

space. Let d(j) sort the elements of d in decreasing order, so d(1) is the largest and

d(J) the smallest. Define

d∗j (d) = d(j)

as the decision which sorts the elements in d in decreasing order. The following result

is immediate from Chernozhukov et al. (2009).

25

Claim 2. Any decision d with dj > dj−1 for some j is dominated in loss by d∗ (d).

At the same time, one can show that any two distinct realizations of the data

imply distinct optimal actions for some agent, which implies the following.

Claim 3. The effective size of the sample space is equal to the size of the sample and

decision spaces, N (X ) = |X | = |D| .

It follows by Proposition 3 that there is no rule c ∈ C that is admissible under

both the decision model and the communication model, and no rule c ∈ C that is

admissible under both the classical model and the communication model. Online

Appendix B.1.2 develops a version of this example with X normally distributed and

a continuous parameter space Θ0 ⊆ R, and shows that decision and communication

risk again imply different dominance orderings.

Figure 3 illustrates the conflict between the goals of communication and decision

in a numerical example of this application. The top row of plots shows three example

realizations of the data X that all correspond to the same rearranged decision d∗ (X),

which is depicted in the bottom plot. The rearranged decision d∗ (X) dominates a

decision of d (X) = X in loss for any X for which Xj > Xj−1 for some j, and hence for

the three realizations X depicted in the top row of plots. However, by pooling several

different realizations of the data into the same decision, the rearranged decision d∗ (X)

sacrifices information that is decision-relevant for some members of the audience.

Kroft et al. (2013, Figure 2, top panel) report the average callback rate as a

function of months unemployed, which we may think of as analogous to reporting

X, and which violates monotonicity in their data. Kroft et al. (2013, Figure 3) also

report a sorted local linear regression estimate based on Chernozhukov et al (2009),

which imposes monotonicity, and which we may think of as analogous to reporting

d∗ (X). Eriksson and Rooth (2014, Table 5) report the callback rate as a function of

months unemployed in the current spell, which violates monotonicity in their data.

Eriksson and Rooth do not report any sorted or monotonicity-constrained estimates

of this relationship.10

10Oberholzer-Gee (2008, Table 2) reports a linear probability model relating the callback proba-bility to the number of months unemployed in the current spell, which respects monotonicity in hisdata among those currently unemployed.

26

Figure 3: Illustration of dominating decision in estimation of a monotone function

Example data realizations corresponding to the same dominating decision

0 2 4 6 8 10 12

0.05

0.1

0.15

0.2

0 2 4 6 8 10 12

0.05

0.1

0.15

0.2

0 2 4 6 8 10 12

0.05

0.1

0.15

0.2

Dominating decision

0 2 4 6 8 10 12

0.05

0.1

0.15

0.2

Note: The plots illustrate properties of the dominating decision in the estimation of afunction that is known to be monotonically decreasing. The top row of plots depicts threepossible realizations of the callback rates X = (X1, ..., XJ). These are the means of a binaryoutcome Yi conditional on the unemployment duration Zi ∈ Z = {z1, ..., zJ} = {1, ..., 12}.For each realization X, the decision d = X is dominated in (quadratic) loss by the decisiond = d∗ (X), which is depicted in the bottom plot. Under the dominating decision d∗ (X),all of the realizations X depicted in the top row of plots are reported identically, with thereport depicted in the bottom plot.

27

The fact that both articles report estimates that violate monotonicity seems more

consistent with the predictions of the communication model than with the predic-

tions of the decision model. The fact that Kroft et al. (2013) additionally report

sorted estimates seems consistent with the predictions of the decision model, perhaps

suggesting a concern for both communication and decision-making in their context.

4.2 Optimal Treatment Assignment

Our second application is to optimal treatment assignment. Manski (2004) studies

the problem of assigning one of a finite set of treatments to each member of some

population, potentially as a function of a discrete covariate.

As a concrete example, consider the problem of an analyst who must make a

clinical recommendation to an audience of physicians on the basis of the available

evidence. This is a problem faced by many organizations, including medical learned

societies and private publishers. Say that each physician’s goal is to achieve the best

average outcome for patients with each of a given set of attributes (e.g., diagnosis).

We suppose these attributes are discrete, as in Manski (2004), and study the problem

of recommending treatment to patients in a given attribute cell.

Formally, denote the available treatments (e.g., medications) by t ∈ {1, ..., T} for

T ≥ 2. Suppose that n ≥ 1 units (e.g., patients) are randomly allocated to each

treatment t, and that for each unit i the analyst measures a binary outcome Yi (e.g.,

an indicator for the resolution of symptoms). Let us further assume patient outcomes

are independent, so it is without loss to represent the data for treatment t as a fraction

of successes Xt ∈{

0, 1n, ..., 1

}, with nXt following a binomial distribution. The sample

space is then

X =

{0,

1

n, ..., 1

}T.

The unknown parameter is (θ1, ..., θT ) where θt = E [Xt] denotes the success proba-

bility for units assigned to treatment t. We assume that θt lies in a finite set Θ0 ⊂ (0, 1)

with |Θ0| ≥ 2, so the parameter space is Θ = ΘT0 ⊂ (0, 1)T . We show in Online Ap-

pendix B.1.3 that our results for this application continue to hold for continuous

parameter spaces Θ0 = (0, 1) or Θ0 = [0, 1].

The agent’s (e.g., physician’s) decision consists of picking a treatment or declining

28

do so. Formally we take the decision space to be D = {1, ..., T} ∪ {ι} where ι

corresponds to not picking a treatment. The agent’s objective is to pick the best

treatment which, following Manski (2004), we formalize by considering the regret loss

L (d, θ) =

−θd + maxt θt if d 6= ι

maxt θt if d = ι.

Declining to pick a treatment yields greater loss than picking any given treatment

(e.g., because the patient cannot self-prescribe). The decision space in this case is too

small to convey the full data, since T + 1 = |D| < |X | = (n+ 1)T .

Classical decision-theoretic results for selection problems (Lehmann 1966, Eaton

1967) imply the following:

Claim 4. Consider the rule c∗ that takes c∗ (X, V ) = arg maxtXt if the argmax is

unique and otherwise randomizes uniformly over arg maxtXt. This rule minimizes

decision risk uniformly over ∆ (Θ) among rules that are invariant with respect to

permutations of the treatments. Moreover, it is a Bayes decision rule for the agent a∗

with a∗ (θ) = 1|Θ| for all θ ∈ Θ, and is minimax under the decision model.

The rule c∗ is a special case of what Manski (2004) terms the“conditional empirical

success” rule, and is related to the empirical welfare maximization procedures studied

by Kitagawa and Tetenov (2018) and Athey and Wager (2019). The results of Stoye

(2009) imply that c∗ is a minimax decision rule for regret loss in the case of T = 2.

Claim 4 shows that the rule c∗ is an appealing decision rule in several senses, and

together with Theorem 1 implies that c∗ is minimax under the communication model.

The rule c∗ is nevertheless inadmissible under the communication model. To see this,

note first the following properties of the setting:

Claim 5. The decision d = ι is dominated in loss, and the effective size of the sample

space is at least as large as the decision space, N (X ) ≥ |D| .

It follows by Proposition 3 that there is no rule c ∈ C that is admissible under

both the decision model and the communication model, and no rule c ∈ C that is

admissible under both the classical model and the communication model. Since, by

Claim 4, c∗ is a Bayes decision rule for the full-support prior a∗ (θ) = 1|Θ| , it follows

29

that c∗ is admissible under the decision model and therefore, by Proposition 3 and

Corollary 3(b), inadmissible, and not ω-optimal for any ω, under the communication

model.

To understand why rule c∗ is inadmissible under the communication model, note

the following:

Claim 6. Under the communication model, the rule c∗ is dominated by the rule c

that takes c (X, V ) = ι if arg maxtXt = {1, ..., T} and c (X, V ) = c∗ (X, V ) otherwise.

When arg maxtXt = {1, ..., T}, the data are uninformative about which treatment

is best, in the sense that the likelihood for a given parameter value θ is unchanged if

we permute which θt is associated to which treatment. The rule c reflects this fact by

reporting ι, while the rule c∗ instead makes a guess at random. From the standpoint

of decision-making, the rule c∗ selects an undominated decision while the rule c may

not. From the standpoint of communication, the rule c∗ obscures information that

the rule c does not. The information that c∗ obscures is useful to agents whose

priors are such that they wish to follow the empirical success rule only when the

data are informative. Online Appendix B.7 extends the analysis to demonstrate a

case in which the communication model favors reporting ι even when the data are

informative, provided the amount of information in the data is small in comparison

to the audience’s beliefs.

In practice, analysts in situations like the one we have modeled sometimes ex-

press their ignorance rather than choosing a concrete recommendation at random.

UpToDate is a private publisher that synthesizes medical research into clinical rec-

ommendations. As in the communication model, readers of these recommendations

include practitioners who are free to make different clinical decisions. On the choice

among selective serotonin reuptake inhibitors (SSRIs) to treat unipolar major depres-

sion in adults, UpToDate says “Given the lack of clear superiority in efficacy among

antidepressants, selecting a drug is based on other factors, such as ... patient prefer-

ence or expectations” (Simon 2019). Such a report seems more similar to c than to

c∗, and thus more consistent with the predictions of the communication model than

with the predictions of the decision model.

30

5 Conclusions

We propose a model of scientific communication in which the analyst’s report is

designed to convey useful information to the agents in the audience, rather than, as

in a classical model of statistics, to make a good decision or guess on these agents’

behalf. We show conditions under which the proposed model predicts very different

reporting rules from the classical model. We offer examples satisfying these conditions,

and we argue that in practical situations similar to these examples, scientists’ reports

seem to be more consistent with the predictions of the proposed model than with the

predictions of the classical model.

Proofs

This appendix provides proofs for the main results in the paper. Online Appendix

A proves results for the Example and applications. Online Appendix B discusses

extensions and additional results.

Proof of Proposition 1 We first prove the existence of an ω-optimal rule. Since

the set of risk functions {ρ (c, ·) : c ∈ C} is compact in the supremum norm, the set

of weighted average risks{∫A ρ (c, a) dω (a) : c ∈ C

}is compact as well, and thus has

a smallest element. This implies the existence of an ω-optimal rule.

We next show that any ω-optimal rule is admissible. Suppose that the result fails,

so there exists some c′ ∈ C that is ω-optimal but inadmissible. ω-optimality implies

that ∫Aρ (c′, a) dω (a) = min

c∈C

∫Aρ (c, a) dω (a) .

Inadmissibility implies the existence of c′′ ∈ C such that ρ (c′, a) ≥ ρ (c′′, a) for all

a ∈ A and ρ (c′, a′) > ρ (c′′, a′) for some a′ ∈ A. By continuity of ρ (c, a) in a,

ρ (c′, a) > ρ (c′′, a) on an open neighborhood of a′ in ∆ (Θ) (which may or may not

be contained in A). The full support assumption on ω implies that the intersection

31

of this open neighborhood with A has positive ω measure,11 and thus that∫Aρ (c′, a) dω (a) >

∫Aρ (c′′, a) dω (a) .

Hence, we have achieved a contradiction and proved the result.

Since we have shown that an ω-optimal rule exists for any ω, and that ω-optimal

rules are admissible, it follows that an admissible rule exists.

We next prove the existence of minimax rules. To that end, note that A is a closed

subset of a finite-dimensional simplex and so is compact. Since ρ (c, a) is continuous

in a and A is compact, supa∈A ρ (c, a) = maxa∈A ρ (c, a) exists for all c ∈ C. Moreover,

since the set of risk functions {ρ (c, ·) : c ∈ C} is compact in the supremum norm, the

set {maxa∈A ρ (c, a) : c ∈ C} is compact, and so has a minimum, which implies the

existence of a minimax rule.

Finally, to prove the existence of an admissible minimax rule, define the set C0 ⊆ Cof minimax rules and note that the set of minimax risk functions {ρ (c, ·) : c ∈ C0} is

a closed subset of a compact set, and thus is compact. Let us take a countable dense

subset of A, {a1, a2, ...}. For each j ≥ 1, let Cj denote the subset of rules in Cj−1 with

minimal risk at aj,

Cj =

{c ∈ Cj−1 : ρ (c, aj) = min

c′∈Cj−1

ρ (c′, aj)

},

where the min on the right hand side is achieved by compactness of {ρ (c, ·) : c ∈ Cj−1}.{ρ (c, ·) : c ∈ Cj} is a closed subset of a compact set, and so is again compact. More-

over, {ρ (c, ·) : c ∈ Cj} is non-empty by construction for all j, so ∩j {ρ (c, ·) : c ∈ Cj}is non-empty by Cantor’s Intersection Theorem. Any rule c∗ ∈ ∩jCj ⊆ C0 is minimax

by definition but must also be admissible, since otherwise (by continuity of ρ in a)

it would have been dropped at some finite step j. Thus, there exists an admissible

minimax rule. 2

Lemma 1. Decision risk Ra (c) and communication risk R∗a (c) are both continu-

ous in a for all c ∈ C. Moreover, the sets of risk functions {R· (c) : c ∈ C} and

11Since a′ is in the support of ω by assumption, we know that ω assigns positive mass to all openneighborhoods of a′.

32

{R∗· (c) : c ∈ C} are compact in the supremum norm.

Proof of Lemma 1 We first show continuity of decision risk. Note that the decision

risk of a rule c can be written as

∑Θ

a (θ) Eθ [L (c (X, V ) , θ)]

where L (d, θ) is uniformly bounded. Hence, the decision risk is trivially continuous,

and indeed Lipschitz, in a.

We next show continuity of communication risk. To this end, note that for any

fixed c ∈ C, if we define B to be the set of mappings b : D × [0, 1] → D then for any

c ∈ C and any a ∈ ∆ (Θ) we have

R∗a (c) = minb∈B

Ea [L (b (c (X, V ) , V ) , θ)] = minb∈B

Ra (b ◦ c) ,

where the min is achieved by a binding rule ba ∈ B with

ba (c (X, V ) , V ) ∈ mind∈D

Ea [L (d, θ) |c (X, V ) , V ]

for all (X, V ). However, the same argument used to establish continuity of decision

risk implies that Ea [L (b (C, V ) , θ)] is continuous in a. Indeed,

supc,b |Ea [L (b (c (X, V ) , V ) , θ)]− Ea [L (b (c (X, V ) , V ) , θ)]| ≤2 maxd,θ |L (d, θ)|

∑Θ |a (θ)− a (θ)|

(4)

so Ea [L (b (c (X, V ) , V ) , θ)] is Lipschitz in a (with respect to the L1-norm) uniformly

over b, c. This implies that minb∈B Ea [L (b (c (X, V ) , V ) , θ)] is Lipschitz in a as well,

since for ba the optimal rule for a, with

minb∈B

Ea [L (b (c (X, V ) , V ) , θ)] = Ea [L (ba (c (X, V ) , V ) , θ)] = R∗a (c) ,

we have

Ea [L (ba (c (X, V ) , V ) , θ)] ≥ Ea [L (ba (c (X, V ) , V ) , θ)] = R∗a (c)

33

Ea [L (ba (c (X, V ) , V ) , θ)] ≥ Ea [L (ba (c (X, V ) , V ) , θ)] = R∗a (c) ,

and thus

|R∗a (c)−R∗a (c)| ≤ 2 maxd,θ|L (d, θ)|

∑Θ

|a (θ)− a (θ)| .

Hence, communication risk is Lipschitz in a, with the same Lipschitz constant as

decision risk.

To prove compactness, note that both the decision and communication risk func-

tions are uniformly bounded by maxd,θ |L (d, θ)|, and that the set of beliefs A is com-

pact with respect to the L1-norm∑

Θ |a (θ)− a (θ)|. Thus, the sets of decision and

communication risk functions are bounded Lipschitz functions on a compact domain,

and so are compact in the supremum norm by the Arzela–Ascoli Theorem. 2

Proof of Proposition 2 For admissibility, note that rule c dominates rule c′ in the

classical model if and only if

Eθ [L (c (X, V ) , θ)] ≤ Eθ [L (c′ (X, V ) , θ)] for all θ ∈ Θ (5)

with strict inequality for some θ. Note, however, that this implies

Ea [L (c (X, V ) , θ)] ≤ Ea [L (c′ (X, V ) , θ)]

for all a, with strict inequality for full-support a, and thus that c dominates c′ in

the decision model. Conversely, if c dominates c′ in the decision model, then (5) is

immediate. Moreover, there must be a strict inequality for some θ, or else the decision

risk functions for c and c′ would be the same.

For ω-optimality, for any full-support weights ω on ∆ (Θ), define

aω (θ) =

∫∆(Θ)

a (θ) dω (a)

as the implied weight on Θ. Note that if ω has full support on ∆ (Θ) then aω has full

support on Θ, and thus that if a rule c is optimal in the decision model with respect

to the full-support weights ω, it is optimal in the classical model with respect to the

full-support weights aω. Conversely, for any full-support weights aω in the classical

34

model, let ν = minθ∈Θ aω (θ). Let η denote the uniform weight on ∆ (Θ), and let ω

correspond to a mixture with weight ν |Θ| on η and weight 1−ν |Θ| on the degenerate

weight which puts mass one on a (θ) = aω(θ)−ν1−ν|Θ| . By construction the weight ω has full

support, and any rule that is optimal in the classical model with respect to aω is

optimal in the decision model with respect to ω.

Finally note that the maximum risk in the decision model is the same as that in

the classical model, in the sense that for all c ∈ C,

maxa∈A

Ea [L (c (X, V ) , θ)] = maxθ∈Θ

Eθ [L (c (X, V ) , θ)] .

Hence, equivalence of minimax rules in the two models is immediate. 2

To prove several results that follow, it is helpful to restrict attention to the class

of rules that are non-random conditional on the data and the public randomization

device (that is, rules c : X × [0, 1]→ D, rather than c : X × [0, 1]→ ∆ (D)). We can

do so without increasing either decision or communication risk.

Lemma 2. For any c : X × [0, 1] → ∆ (D), there exists a publicly-randomized rule

c : X×[0, 1]→ D such that Ra (c) = Ra (c) and R∗a (c) ≤ R∗a (c), both for all a ∈ ∆ (Θ).

Proof of Lemma 2 Note that we can construct a U [0, 1] random variable U inde-

pendent of X and V such that for some function c : X × [0, 1]2 → D, the conditional

distribution of c (X, V, U) given (X, V ) coincides with that of c (X, V ).12

Next, note that using V we can generate two independent U [0, 1] random variables

(V1, V2), e.g. by taking alternating terms in the decimal expansion of V . Let us define

c (X, V ) = c (X, V1, V2) , noting that c (X, V ) is non-random conditional on (X, V ) .

The conditional distribution of c (X, V ) given X is the same as the conditional dis-

tribution of c (X, V ) given X for all X by construction. Hence, Eθ [L (c (X, V ) , θ)] =

Eθ [L (c (X, V ) , θ)] for all θ ∈ Θ, which implies that Ra (c) = Ra (c) for all a ∈ ∆ (Θ),

12For example, numbering the elements of D as d1, ..., d|D| and defining pj (X,V ; c) =Pr {c (X,V ) = dj |X,V } , conditional on c (X,V ) = dj let us take U to be uniformly distributed

on[∑

j′<j pj′ (X,V ; c) ,∑

j′≤j pj′ (X,V ; c)]. Note that U ∼ U [0, 1] conditional on (X,V ) for all

(X,V ), and so is independent of (X,V ) . If we then define c so that c (X,V, U) = dj if and only if

U ∈[∑

j′<j pj′ (X,V ; c) ,∑

j′≤j pj′ (X,V ; c)], the conditional distribution of c (X,V, U) coincides

with that of c (X,V ).

35

and so establishes the result for decision risk.

For communication risk, as in the proof of Lemma 1 above, we define B to be the

set of mappings b : D × [0, 1] → D and let ba denote the optimal rule for a. Let c

denote the (potentially randomized) rule considered in the statement of the lemma,

and ba a corresponding binding rule. Note that by the definition of c

R∗a (c) = minb∈B

Ra (b ◦ c) = Ra (ba ◦ c) = Ra (ba ◦ c) ≥ minb∈B

Ra (b ◦ c) = R∗a (c) ,

so c has weakly lower communication risk than c.13 2

Proposition 3 extends to cases with general audiences A 6= ∆ (Θ). To state the

result for this more general case, we first need to define dominance in loss and the

effective size of the sample space for a general audience A.

Definition 5. A decision d ∈ D is dominated in loss if there exists d′ ∈ D such

that L (d, θ) ≥ L (d′, θ) for all θ ∈ Θ, with strict inequality for some θ′ such that

Pra′ {θ = θ′} > 0 for at least one agent a′ ∈ A.

Definition 6. Let P be the set of partitions of X , with generic element P ∈ P. Let

P∗A denote the subset of P such that for every cell Xp ∈ P ∈ P∗A, each agent a ∈ Ahas at least one decision d ∈ D that is optimal for every X ∈ Xp. That is,

P∗A =

{P ∈ P :

{∩X∈Xp arg min

d∈DEa [L (d, θ) |X]

}6= ∅ for all Xp ∈ P, a ∈ A

}.

The effective size of the sample space X for audience A, denoted N (X ,A), is

the minimal size of a partition in P∗A

N (X ,A) = min {|P | : P ∈ P∗A} .13To see why this inequality can be strict, it suffices to consider an example. Suppose thatD = {d1, d2} , X = {x1, x2} , and that c implies Pr {c (X,V ) = dj |V,X = xj} = 2/3 for j ∈ {1, 2}.To achieve the same conditional distribution over c (X,V ) |X with a publicly-randomized rule c, itmust be the case that V11 = {v|c (x1, v) = d1} and V22 = {v|c (x2, v) = d2} both have Lebesguemeasure 2/3, and thus that V11 ∩ V22 has Lebesgue measure at least 1/3 (since the measure ofV11 ∪ V22 cannot exceed one). Conditional on V ∈ V11 ∩ V22, however, agents can perfectly infer Xbased on observing (c (X,V ) , V ) , whereas this is never possible based on observing (c (X,V ) , V ) .Hence, for certain loss functions and priors, communication risk will be strictly smaller under c thanunder c.

36

Proposition 4. If (i) there exists a decision d ∈ D that is dominated in loss and

(ii) N (X ,A) ≥ |D| , then any rule c ∈ C that is admissible under the decision model

with audience A is inadmissible under the communication model with audience A,

and vice versa.

Proof of Proposition 4 To prove this result, note first that under our full-support

assumption on Fθ, any rule c that selects the dominated decision d with positive

probability is dominated in decision risk by the rule c′ that is equal to c except that

it chooses the dominating decision d′ whenever c chooses d. Hence, any rule c that

selects d with positive probability is inadmissible under the decision model.

We next show that any rule c that selects d with probability zero is inadmissible

under the communication model. By Lemma 2 we can limit attention to rules c

that use only the public randomization device. Fix a given such rule c, and let

D′ ⊂ D denote the set of decisions that c selects with positive probability. Since

|D′| < |D| ≤ N (X ,A) , we know that for every realization of the public randomization

device V there is some aV ∈ A, some XV ⊆ X , and some dV ∈ D such that

∩X∈XV arg mind∈D

EaV [L (d, θ) |X] = ∅

and c (X, V ) = dV for all X ∈ XV .14

We can view (XV , dV ) as a random variable supported on 2X × D, for 2X the

power set of X . Since XV is non-empty with probability one and (X ,D) are fi-

nite, there exists a pair X ∗ ⊆ X and d∗ ∈ D such that X ∗ is non-empty and

Pr {XV = X ∗, dV = d∗} > 0, where this event depends only on V so the probabil-

ity does not need to be subscripted by a. Let a∗ be an agent with

∩X∈X ∗ arg mind∈D

Ea∗ [L (d, θ) |X] = ∅, (6)

noting that we can take a∗ = aV for any V with XV = X ∗. Let

V∗ = {V ∈ [0, 1] : XV = X ∗, dV = d∗} ,14If this is not the case, then the level sets of c (·, V ) form a partition in P∗A of size at most |D′| ,

contradicting |D′| < N (X ,A) .

37

so c (X, V ) = d∗ for all X ∈ X ∗ and V ∈ V∗. Finally, let d∗∗ denote an element of

arg mind∈D

Ea∗ [L (d, θ) |c (X, V ) = d∗, V ∈ V∗] .

By (6), there exists X ∈ X ∗ and d ∈ D such that

Ea∗

[L(d, θ)|X]< Ea∗

[L (d∗∗, θ) |X

].

For d an element of D\D′, let c be the rule that reports c (X, V ) = d when V ∈ V∗ and

X = X, and that agrees with c otherwise. Given c (X, V ), an agent can reconstruct

c (X, V ), so the rule c does not increase communication risk for any agent relative to

c. For agent a∗, however,

Ea∗

[L(d, θ)|c (X, V ) = d, V ∈ V∗

]< Ea∗

[L (d∗∗, θ) |c (X, V ) = d, V ∈ V∗

],

so d∗∗ is now sub-optimal for a∗ conditional on observing{c (X, V ) = d, V ∈ V∗

}.

This immediately implies that

Ea∗

[mind∈D

Ea∗ [L (d, θ) |c (X, V ) , V ∈ V∗] |c (X, V ) = d∗, V ∈ V∗]<

mind∈D

Ea∗ [L (d, θ) |c (X, V ) = d∗, V ∈ V∗] .

Since Pra∗ {c (X, V ) = d∗, V ∈ V∗} > 0, this implies that R∗a∗ (c) < R∗a∗ (c), so c dom-

inates c as we wanted to show. 2

Proof of Proposition 3 Follows immediately from Proposition 4 by setting A =

∆ (Θ) . 2

Proof of Corollary 1 Follows immediately from Propositions 2 and 3. 2

Proof of Corollary 3 Lemma 1 shows that the decision and communication mod-

els satisfy the conditions of Proposition 1. For part (a) of the corollary, note that

since by Proposition 1 ω-optimality in the decision model implies admissibility in

that model, under the conditions of the corollary it also implies inadmissibility in the

38

communication model by Proposition 3. Moreover, since ω-optimality for communi-

cation implies admissibility for communication, the set of rules that are ω-optimal for

decision and ω′-optimal for communication, where ω and ω′ may be different, must

not overlap.

For part (b) of the corollary, note that by Proposition 2, c is a Bayes decision rule

with respect to a full-support prior if and only if it is ω-optimal in the decision model

for some ω. The result thus follows by part (a).

Finally, for part (c) of the corollary, note that ω-optimality in the communication

model implies admissibility for communication. By Proposition 3 this implies inad-

missibility for decision, and thus by the argument in part (b) implies that the rule is

not a Bayes decision rule with respect to a full-support prior. �

Theorem 1 again extends to general (closed) audiences A, though in this case we

must also require that the audience be convex.

Theorem 2. If the audience A is convex, any rule c∗ ∈ C that is minimax under the

decision model is minimax under the communication model.

Proof of Theorem 2 Note that by continuity of decision risk in a (established

in Lemma 1) and compactness of closed subsets of the finite-dimensional simplex,

supa∈ARa (c) is achieved for all c. Note, next, that the set of decision risk functions

{R· (c) : c ∈ C} is compact in the supremum norm by Lemma 1. Hence,

infc∈C

supARa (c)

is achieved, so a min-max rule exists.

Thus, there exists c∗ ∈ C with infc∈Cmaxa∈ARa (c) = maxa∈ARa (c∗) . Let c∗ :

X → ∆ (D) be a rule that does not use the public randomization device such that the

distribution of c∗(X)|X is the same as that of c∗(X, V )|X for all X ∈ X . Ra (c∗) =

Ra (c∗) for all a, so maxa∈ARa (c∗) = maxa∈ARa (c∗). For C the class of such rules,

note that if we limit attention to c ∈ C we can cast the problem into the setting of

Section 5 of Grunwald and Dawid (2004). Specifically, let us view c ∈ C as the action

“a” in their terminology, and let us define the state “X” in their terminology as the

pair (X, θ) , noting that this state takes only a finite number of values. Since the class

39

of priors A is closed and convex by assumption, the set of implied distributions for

(X, θ) is closed and convex as well. Finally, the set of risk functions for c : X → Dis finite, while the set of decision risk functions for C is the convex hull of a finite set

and so is compact.

Theorem 5.2 of of Grunwald and Dawid (2004) then implies that there exists

a∗ ∈ A such that

maxa∈A

Ra (c∗) = Ra∗ (c∗) = minc∈C

Ra∗ (c) . (7)

Note, however, that the same argument used to construct c∗,

minc∈C

Ra∗ (c) = minc∈C

Ra∗ (c) ,

so (7) and the fact that Ra (c∗) = Ra (c∗) for all a implies that maxa∈ARa (c∗) =

Ra∗ (c∗) = minc∈C Ra∗ (c) . Note, next, that by the definition of communication risk

minc∈C Ra∗ (c) = minc∈C R∗a∗ (c) . Thus, since Ra (c) ≥ R∗a (c) for all a ∈ A, c ∈ C, we

have Ra∗ (c∗) = R∗a∗ (c∗).

Combining this reasoning, we see that

infc∈C

maxa∈A

Ra (c) = maxa∈A

Ra (c∗) = Ra∗ (c∗) = R∗a∗ (c∗) = minc∈C

R∗a∗ (c) .

Note, next, that by definition

infc∈C

maxa∈A

Ra (c) ≥ minc∈C

maxa∈A

R∗a (c) ≥ minc∈C

R∗a∗ (c) .

Thus, since the first and last quantities are equal, we obtain that

infc∈C

supa∈A

Ra (c) = infc∈C

Ra∗ (c) = Ra∗ (c∗) = R∗a∗ (c∗) = infc∈C

R∗a∗ (c) = infc∈C

supa∈A

R∗a (c) ,

as we wanted to show. 2

Proof of Theorem 1 Immediate from Theorem 2, taking A = ∆ (Θ) . 2

Proof of Corollary 4 Immediate from Proposition 3. 2

40

References

Abadie, Alberto. Forthcoming. Statistical non-significance in empirical economics.

American Economic Review: Insights.

Abadie, Alberto and Maximilian Kasy. 2019. Choosing among regularized estimators

in empirical economics: The risk of machine learning. Review of Economics

and Statistics 101(5): 743-762.

Andrews, Isaiah, Matthew Gentzkow, and Jesse M. Shapiro. 2017. Measuring the

sensitivity of parameter estimates to estimation moments. Quarterly Journal

of Economics 132(4): 1553-1592.

Andrews, Isaiah, Matthew Gentzkow, and Jesse M. Shapiro. 2020. Transparency in

structural research. NBER Working Paper No. 26631.

Awasthi, S., V. K. Pande, and R. H. Fletcher. 2000. Effectiveness and cost-effectiveness

of albendazole in improving nutritional status of pre-school children in urban

slums. Indian Pediatrics 37(1): 19-29.

Athey, Susan and Stefan Wager. 2019. Efficient policy learning. arXiv preprint

arXiv:1702.0289. Accessed on September 20 2019 at

<https://arxiv.org/abs/1702.02896v5>.

Banerjee, Abhijit, Sylvain Chassang, Sergio Montero, and Erik Snowberg. Forth-

coming. A theory of experimenters: robustness, randomization, and balance.

American Economic Review.

Bergemann, Dirk and Stephen Morris. 2019. Information design: A unified perspec-

tive. Journal of Economic Literature 57(1): 44-95.

Blackwell, David. 1951. The comparison of experiments. In J. Neyman (ed.), Pro-

ceedings of the Second Berkeley Symposium on Mathematical Statistics and

Probability : 93-102. Berkeley: University of California Press.

Blackwell, David. 1953. Equivalent comparisons of experiments. Annals of Mathe-

matical Statistics 24(2): 265-272.

Brown, Lawrence D. 1975. Estimation with incompletely specified loss functions

(the case of several location parameters). Journal of the American Statistical

Association 70(350): 417-427.

Chen, Ying, Navin Kartik, and Joel Sobel. 2008. Selecting cheap-talk equilibria.

Econometrica 76(1): 117-136.

Chernozhukov, Victor, Ivan Fernandez-Val, and Alfred Galichon. 2009. Improv-

41

ing point and interval estimators of monotone functions by rearrangement.

Biometrika 96(3): 559-575.

Chetverikov, Denis, Andres Santos, and Azeem M. Shaikh. 2018. The econometrics

of shape restrictions. Annual Review of Economics 10: 31-63.

Cover, Thomas M. and Joy A. Thomas. 2006. Elements of Information Theory. 2nd

ed. New York: Wiley-Interscience.

Crawford, Vincent P. and Joel Sobel. 1982. Strategic information transmission.


Croke, Kevin, Joan Hamory Hicks, Eric Hsu, Michael Kremer, and Edward Miguel.

2016. Does mass deworming affect child nutrition? Meta-analysis, cost-effectiveness,

and statistical power. World Bank Policy Research Working Paper No. 7921.

Donnen, Philippe, Daniel Brasseur, Michele Dramaix, Francoise Vertongen, Mweze

Zihindula, Mbasha Muhamiriza, and Philippe Hennart. 1998. Vitamin A sup-

plementation but not deworming improves growth of malnourished preschool

children in eastern Zaire. Journal of Nutrition 128(8): 1320-1327.

Eaton, Morris L. 1967. Some optimum properties of ranking procedures. The Annals

of Mathematical Statistics 38(1): 124-137.

Efron, Bradley. 1986. Why isn’t everyone a Bayesian? American Statistician 40(1):

1-5.

Eriksson, Stefan and Dan-Olof Rooth. 2014. Do employers use unemployment as a

sorting criterion when hiring? Evidence from a field experiment. American

Economic Review 104(3): 1014-1039.

Farrell, Joseph and Robert Gibbons. 1989. Cheap talk with two audiences. American


Frankel, Alex and Maximilian Kasy. 2018. Which findings should be published?

Harvard University Working Paper. Accessed on November 8, 2018 at

<https://maxkasy.github.io/home/files/papers/findings.pdf>.

Geweke, John. 1997. Posterior simulators in econometrics. In D. M. Kreps and

K. F. Wallis (eds.), Advances in Economics and Econometrics: Theory and

Applications, Seventh World Congress 3: 128-165. Cambridge: Cambridge

University Press.

Geweke, John. 1999. Using simulation methods for Bayesian econometric models:

Inference, development, and communication. Econometric Reviews 18(1): 1-

73.

42

Grunwald, Peter D. and A. Philip Dawid. 2004. Game theory, maximum entropy,

minimum discrepancy and robust Bayesian decision theory. Annals of Statis-

tics 32(4): 1367-1433.

Hildreth, Clifford. 1963. Bayesian statisticians and remote clients. Econometrica

31(3): 422-438.

Jordan, Michael I., Jason D. Lee, and Yung Yang. 2018. Communication-efficient

distributed statistical inference. Journal of the American Statistical Associa-

tion.

Kamenica, Emir and Matthew Gentzkow. 2011. Bayesian persuasion. American


Kitagawa, Toru and Aleksey Tetenov. 2018. Who should be treated? Empirical

welfare maximization methods for treatment choice. Econometrica 86(2): 591-

616.

Kremer, Michael and Edward Miguel. 2004. Worms: identifying impacts on education

and health in the presence of treatment externalities. Econometrica 72(1):

159-217.

Kroft, Kory, Fabian Lange, and Matthew J. Notowidigdo. 2013. Duration dependence

and labor market conditions: Evidence from a field experiment. The Quarterly

Journal of Economics 128(3): 1123–1167.

Kruger, Marita, Chad J. Badenhorst, and Erna P. G. Mansvelt. 1996. Effects of iron

fortification in a school feeding scheme and anthelmintic therapy on the iron

status and growth of six- to eight-year-old schoolchildren. Food and Nutrition

Bulletin 17(1): 1-11.

Kwan, Yum K. 1999. Asymptotic Bayesian analysis based on a limited information

estimator. Journal of Econometrics 88(1): 99-121.

Le Cam, L. 1996. Comparison of experiments—A short review. In T. S. Ferguson,

L. S. Shapley, and J. B. MacQueen (eds.), Statistics, Probability and Game

Theory: Papers in Honor of David Blackwell : 127-138. Hayward: Institute of

Mathematical Statistics.

Lehmann, E. L. 1966. On a theorem of Bahadur and Goodman. The Annals of

Mathematical Statistics 37(1): 1-6.

Lehmann, E.L. and George Casella. 1998. Theory of Point Estimation. 2nd ed. New

York: Springer-Verlag.

Manski, Charles F. 2004. Statistical treatment rules for heterogeneous populations.

43


Morris, Stephen. 1995. The common prior assumption in economic theory. Eco-

nomics and Philosophy 11(2): 227-253.

Oberholzer-Gee, Felix. 2008. Nonemployment stigma as rational herding: A field

experiment. Journal of Economic Behavior & Organization 65(1): 30-40.

Poirier, Dale J. 1988. Frequentist and subjectivist perspectives on the problems of

model building in economics. Journal of Economic Perspectives 2(1): 121-144.

Pratt, John W. 1965. Bayesian interpretation of standard inference statements. Jour-

nal of the Royal Statistical Society, Series B (Methodological) 27(2): 169-203.

Raiffa, Howard and Robert Schlaifer. 1961. Applied Statistical Decision Theory.

Boston: Division of Research, Graduate School of Business Administration,

Harvard University.

Robert, Christian. 2007. The Bayesian Choice: From Decision-Theoretic Founda-

tions to Computational Implementation. 2nd ed. New York: Springer-Verlag.

Simon, Gregory. 2019. Unipolar major depression in adults: choosing initial treat-

ment. In D. Solomon (ed.), UpToDate. Accessed on September 19, 2019 at

<https://www.uptodate.com/contents/unipolar-major-depression-in-adults-choosing-

initial-treatment>.

Sims, Christopher A. 1982. Scientific standards in econometric modeling. In M.

Hazewinkel and A. H. G. Rinnooy Kan (eds.), Current Developments in the

Interface: Economics, Econometrics, Mathematics : 317-337. Holland: D Rei-

del Publishing Company.

Sims, Christopher A. 2007. Bayesian methods in applied econometrics, or, why econo-

metrics should always and everywhere be Bayesian. Slides from the Hotelling

Lecture, presented June 29, 2007 at Duke University.

Spiess, Jann. 2018. Optimal estimation when researcher and social preferences are

misaligned. Harvard University Working Paper. Accessed on December 27,

2019 at<https://scholar.harvard.edu/files/spiess/files/alignedestimation.pdf>.

Stoye, Jorg. 2009. Minimax regret treatment choice with finite samples. Journal of

Econometrics 151(1): 70-81.

Stoye, Jorg. 2011. Statistical decisions under ambiguity. Theory and Decision 70(2):

129-148.

Stoye, Jorg. 2012. New perspectives on statistical decisions under ambiguity. Annual

Review of Economics 4: 257-282.

44

Torgersen, E. 1991. Comparison of Experiments. Encyclopedia of Mathematics and

its Applications. 1st ed. Cambridge: Cambridge University Press.

von Neumann, John. 1928. Zur theorie der gesellschaftsspiele. Mathematische An-

nalen 100(1): 295-320.

Wald, Abraham. 1950. Statistical Decision Functions. New York, NY: John Wiley

& Sons.

Zhu, Yuancheng, and John Lafferty. 2018a. Distributed nonparametric regression

under communication constraints. Proceedings of the 35th International Con-

ference on Machine Learning 80.

Zhu, Yuancheng, and John Lafferty. 2018b. Quantized minimax estimation over

Sobolev ellipsoids. Information and Inference: A Journal of the IMA 7(1):

31-82.

Zhang, Yuchen, John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013.

Information-theoretic lower bounds for distributed statistical estimation with

communication constraints. Advances in Neural Information Processing Sys-

tems 26.

45

Online Appendix for

A Model of Scientific Communication

Isaiah Andrews, Harvard University and NBER

Jesse M. Shapiro, Brown University and NBER

February 2020

A Proofs for Claims in Example and Applications

Proof of Claim 1 For dominance in loss, note that since 0 ∈ XM and 0 ∈ XM ,

0 ∈ D by construction. For any d < 0, L (d, θ) > L (0, θ) for all θ ∈ Θ, so d is

dominated in loss by d′ = 0 ∈ D.

To show that N (X ) = |XM |×∣∣XM

∣∣, note first that sufficiency of(X,X

)implies

that N (X ) ≤ |XM | ×∣∣XM

∣∣ . To complete the proof we will show that for any X =(X,X

)∈ X and X ′ =

(X ′, X

′)

such that X 6= X ′, there exists an agent a ∈ ∆ (Θ)

such that

arg mind∈D

Ea [L (d, θ) |X] ∩ arg mind∈D

Ea [L (d, θ) |X ′] = ∅, (8)

which in turn implies that N (X ) ≥ |XM | ×∣∣XM

∣∣.Note that under the assumed form for the distribution F0, the joint likelihood of

the control group outcomes satisfies the monotone likelihood ratio property in X.15 In

particular, for any agent with a non-degenerate prior on θ, the posterior distribution

on θ is strictly increasing in X in the sense of first-order stochastic dominance. Fix the

prior on θ as the prior which puts mass one on the largest value in Θ0, and consider

15Specifically, the joint density is given by∏n

i=1 exp (θXi)h (θ) g (Xi) =exp (θ · n ·X)h (θ)

n∏ni=1 g (Xi), and for X ′ > X and θ′ > θ,∏n

i=1 exp(θ′X ′i

)h(θ′)g (X ′i)∏n

i=1 exp (θX ′i)h (θ) g (X ′i)>

∏ni=1 exp

(θ′Xi

)h(θ′)g (Xi)∏n

i=1 exp (θXi)h (θ) g (Xi).

46

the class of all priors on θ, ∆ (Θ0) , noting that since we have taken θ as large as

possible all of these priors respect the bounds on the parameter space.

Since the data have full support under all θ ∈ Θ, the implied class of posteriors

for θ conditional on(X,X

)is the same as the class of priors, and X does not matter

for the posterior. The posterior risk function for agent a conditional on X is∑θ∈Θ

(d− θ)2 a (θ|X) .

Strict convexity of the squared error loss implies that each agent’s set of optimal

actions conditional on X contains at most two elements. Let us fix a value X <

max {XM} and consider an agent a who, conditional on X, is indifferent between

two decisions d and d′ with d < d′ (and prefers these to all other decisions). Exis-

tence of such an agent follows from our assumptions. Specifically, since n ≥ 3, and

maxθ∈Θ0EF0(θ) [Xi]−minθ∈Θ0 EF0(θ) [Xi] > 2/3, we know that there exist n1 < n2≤ n

such that 1n1, 1n2∈ (0,maxθ∈Θ0 ATE (θ)) . However, upper and lower bounds on

ATE (θ) after fixing θ at its upper bound are maxθ∈Θ0 ATE (θ) and 0, respectively.

Thus, by the definition ofD there exist d and d′ inD with d, d′ ∈ (0,maxθ∈Θ0 ATE (θ)),

from which the claim follows.

Note that since agent a is indifferent between d and d′, we must have

maxθ∈Θ0EF0(θ) [Xi]− Ea [θ|X] = 1

2d+ 1

2d′. Since |Θ0| ≥ 2, we know that there exists

θ′ ∈ Θ0 with θ′ 6= Ea [θ|X]. Assume θ′ < Ea [θ|X] (if not, we can use the same

argument with θ′ > Ea [θ|X]). If we define aε as the agent with ε > 0 more prior

mass on θ′, and ε less prior mass on all other parameter values, then for ε sufficiently

small aε strictly prefers d′ to any other action conditional on X, but strictly prefers

d to d′ conditional on any X ′ > X. Since we can repeat this argument for all

X < max {XM} , this verifies (8) for all X, X ′ such that X ′ 6= X. However, we can

likewise repeat the same argument to verify (8) for all X, X ′ such that X′ 6= X, which

proves the claim. 2

Proof of Corollary 2 Follows immediately from Claim 1, Proposition 3, and Corol-

lary 1. 2

Proof of Claim 2 Note that for all θ ∈ Θ, θj is decreasing in j by assumption.

We can consider d and θ as step functions defined on the interval [0, J ] with steps

47

at the integers, and ‖d− θ‖2 is just the L2 distance in this case. For d∗ which takes

d∗j = d(j), Proposition 1 of Chernozhukov et al (2009) shows that for any d ∈ D with

dj > dj−1 for some j and any θ ∈ Θ,

‖d− θ‖2 ≥ ‖d∗ − θ‖2

with strict inequality if θj < θj−1. Hence

‖d− θ‖22 ≥ ‖d

∗ − θ‖22 ,

for all θ ∈ Θ, with strict inequality for some some θ′ ∈ Θ, and d is dominated in loss.

2

Proof of Claim 3 To prove the claim, we need to show that any two distinct

realizations of the data, say X and X ′, induce distinct optimal actions for some

agent, in the sense that there exists a ∈ ∆ (Θ) with

arg mind∈D

Ea [L (d, θ) |X] ∩ arg mind∈D

Ea [L (d, θ) |X ′] = ∅.

To this end, suppose without loss of generality that for some j, Xj 6= X ′j. Let

us consider the set of agents Aj who put probability one on θj′ = max {Θ0} for

j′ < j, and probability one on θj′ = min {Θ0} for j′ > j. Hence, for these agents

monotonicity imposes no restrictions on θj and, moreover, the only decision-relevant

data is Xj. However, the binomial distribution has exponential family structure with

f (x; t) = exp (tx)h (t) g (x). Hence, the same argument used to prove Claim 1 in the

running example establishes the result. Since we can repeat this argument for all X

and X ′ with X 6= X ′, this proves the claim. 2

Proof of Claim 4 Theorems 4.2 and 4.3 of Eaton (1967) establish the invariant

optimality and minimaxity claims in the classical model. Since we have taken the

audience to consist of the set of all priors, Proposition 2 shows that minimaxity holds

in the classical model if and only if it holds in the decision model, and the same is

true of invariant optimality. For the Bayes part of the claim, note that the prior a∗

implies that θt and θs are independent for all s 6= t, so Ea∗ [θt|X] is a function of Xt

alone. Moreover, by the monotone likelihood ratio assumption Ea∗ [θt|Xt] is strictly

48

increasing in Xt. Hence, arg maxt Ea∗ [θt|X] = arg maxtXt, which proves the claim.

2

Proof of Claim 5 For the first part of the claim, note that d = 1 yields a strictly

lower loss than choosing d = ι for all θ ∈ Θ.

For the second part of the claim, note that it suffices to show that it holds for

a restricted version of the audience, A ⊆ ∆ (Θ). Specifically, let us consider the

audience consisting of only three agents, a0, a1, and a2.

Agent a0 has a uniform prior on Θ, a0 (θ) = a∗ (θ) = 1|Θ| . This again implies that

θt is independent of θs for all s 6= t. By the monotone likelihood ratio property, pro-

vided arg maxtXt is unique, this agent strictly prefers to set d = arg maxtXt. When

arg maxtXt is non unique, by contrast, this agent strictly prefers d ∈ arg maxtXt to

d 6∈ arg maxtXt, but is indifferent among d ∈ arg maxtXt.

Note, next, that since

a (θ|X) =f (X; θ) a (θ)∑

Θ f(X; θ

)a(θ) , Ea [θ|X] =

∑Θ

θa (θ|X) .

for f (X; θ) the probability mass function of Fθ, and Fθ has full support for all θ ∈ Θ,

Ea [θ|X] is continuous in a.Hence, there exists an open neighborhoodN (a0) around a0

such that all agents a ∈ N (a0) strictly prefer to set d ∈ arg maxtXt to d 6∈ arg maxtXt

for all realizations of X. Within this neighborhood, there is an agent a1 who when

arg maxt

Xt = {1, ..., T}

strictly prefers d = 1, and an agent a2 who strictly prefers d = 2 conditional on the

same event. This immediately implies, however, that N (X ,A) ≥ T + 1, since(arg mind∈D

Ea0 [L (d, θ) |X] , arg mind∈D

Ea1 [L (d, θ) |X] , arg mind∈D

Ea2 [L (d, θ) |X]

)

=

(arg maxtXt, arg maxtXt, arg maxtXt) when arg maxtXt is a singleton

(arg maxtXt, 1, 2) when arg maxtXt= {1, ..., T},

where the right hand side takes T + 1 distinct values. 2

49

Proof of Claim 6 Since agents are free to randomize conditional on observing

c (X, V ) = ι, c yields weakly smaller communication risk for all agents than c∗.

To show that there is a strict inequality for some agents, consider agents a for

whom Ea [θt| arg maxtXt = {1, ..., T}] is non-constant across t, while for all (X, V )

arg maxt Ea [θt|c∗ (X, V )] = c∗ (X, V ) . Note in particular that the agent with a uni-

form prior a∗ (θ) = 1/ |Θ| has arg maxt Ea∗ [θt|c∗ (X, V )] = c∗ (X, V ) , so such agents

a exist by the continuity of conditional expectations in a. When arg maxtXt =

{1, ..., T}, the decision taken by these agents is uniformly randomized under the rule

c∗, while under the rule c they are able to pick a decision they strictly prefer to

uniform randomization. Since these agents are still free to set d = c (X, V ) when

c (X, V ) 6= ι, this establishes that they have strictly lower communication risk under

c than under c∗. 2

B Additional Results Referenced in the Text

B.1 Continuous Versions of Example and Applications

This section extends the recurring deworming example, along with the monotone es-

timation and treatment assignment applications discussed in Section 4, to continuous

settings. For the deworming example and monotone estimation application we show

that changing from decision to communication risk can reverse dominance orderings

in a setting with Θ, X , and D all continuous, while for the treatment assignment

application we show that the results discussed in the main text extend unchanged to

the case with continuous Θ but discrete X and D. We also extend our main result

on admissibility, Proposition 3, to the case of continuous Θ.

B.1.1 Continuous Version of Deworming Example

Consider a Gaussian variant of the discrete deworming example developed in the

main text. Specifically, suppose that the data consist of control- and treatment-

group means from a randomized trial. Let us take X = R2, and assume that the two

means are independent and normally distributed with known variances σ2, σ2 > 0,(X

X

)∼ N

((θ

θ

),

(σ2 0

0 σ2

)).

50

Normality with known variance can be motivated by the asymptotic normality of

sample means (along with consistent estimability of their asymptotic variance).

Note that the average treatment effect is simply the difference between the treat-

ment and control means, ATE (θ) = θ − θ, and that the sample average treatment

effect X − X is again an unbiased estimator for ATE (θ). We again restrict the

parameter space such that the average treatment effect is non-negative, considering

Θ ={θ ∈ R2 : θ − θ ≥ 0

},

and take the decision space to correspond to the support of the treatment-control

difference, D = R. We consider quadratic loss L (d, θ) = (d− ATE (θ))2 , and take

the audience to consist of all possible priors ∆ (Θ) . For any rule c : X ×[0, 1]→ ∆ (D)

and any prior a ∈ ∆ (Θ) , we define decision and communication risk as in the discrete

case, with the difference that, unlike in the discrete case, decision and communication

risk may now be infinite for some rules and priors.

To highlight the distinction between decision and communication risk, we con-

sider two particular rules. First, the treatment-control difference,

c′ (X, V ) = c′ (X) = X −X

and, second, the treatment-control difference censored below at zero,

c′′ (X, V ) = c′′ (X) = max{X −X, 0

}.

Since all audience members agree that the average treatment effect is non-

negative, and the distribution of c′ (X) has support equal to R, c′′ (X) has strictly

lower decision risk than c′ for all a,

Ra (c′′) < Ra (c′) for all a ∈ ∆ (Θ) .

At the same time, c′′ censors the data. Note in particular that distinct realizations of

c′ (X) induce distinct posterior expectations for all agents a with non-degenerate pri-

ors on ATE (θ), V ara (ATE (θ)) > 0. Hence, by strict convexity of the loss, censoring

negative estimates strictly increases communication risk for these agents, while not

reducing it for any other agent. Thus, we see that R∗a (c′′) ≥ R∗a (c′) for all a ∈ ∆ (Θ) ,

51

with strict inequality when V ara (ATE (θ)) > 0. Hence, switching from decision to

communication risk reverses the dominance ordering between c′ (X) and c′′ (X) . The

rule c′ (X) preferred by the communication model again appears closer to what is

reported in practice.

B.1.2 Continuous Version of Monotone Estimation Application

Next, let us consider a continuous version of the monotone estimation application

discussed in the main text. Specifically, suppose we observe a normal random vector

X ∈ X = RJ with independent elements Xj ∼ N(θj, σ

2j

), and σj strictly positive

and known for all j. As with the previous extension, this may again be motivated by

asymptotic normality of sample means.

Suppose that θ ∈ Θ0 for each j, for Θ0 some interval with a nonempty interior

(e.g. Θ0 = R), and that θj is known to be decreasing in j so the joint parameter

space is

Θ ={θ ∈ ΘJ

0 : θ1 ≥ θ2 ≥ ... ≥ θJ}.

We take D = RJ , consider the squared error loss,

L (d, θ) = ‖d− θ‖22 =

∑j

(dj − θj)2 ,

and again take the audience to consist of all possible priors, ∆ (Θ) .

As in the deworming example above we compare two decision rules. The first

reports the raw data

c′ (X, V ) = c′ (X) = X,

while the second reports sorted estimates

c′′ (X, V ) = c′′ (X) = d∗ (X) ,

where as in the main text d∗ (d) sorts the elements of d in decreasing order, so d∗1 (d) =

maxj dj and d∗J (d) = minj dj.

The results of Chernozhukov et al (2009) again show that c′ dominates c′′ in

decision risk, and in particular that Ra (c′′) ≤ Ra (c′) for all a ∈ ∆ (Θ), while Ra (c′′) <

Ra (c′) for a such that Pra {θj < θj−1 for some j} > 0.

52

At the same time, c′′ (X) is a transformation of c′ (X), so

R∗a (c′′) ≥ R∗a (c′) for all a ∈ ∆ (Θ).

Note, next, that any two distinct realizations of c′ (X) induce distinct posterior ex-

pectations for some agent. To see that this is the case, suppose that X and X differ

in their jth component, and consider an agent a with degenerate priors on θj′ for

j′ 6= j and a non-degenerate normal prior on θj (truncated to obey the monotonicity

restrictions). If a observes c′ (X) then their posterior mean Ea [θj|c′ (X)] = Ea [θj|Xj]

is strictly monotonic in Xj. If this agent instead observes c′′ (X), any value of c′′ (X)

with at least two distinct elements is consistent with multiple values of Xj, which

implies that

Pr a {Ea [θj|c′′ (X)] 6= Ea [θj|c′ (X)]} > 0.

Jensen’s inequality thus implies thatR∗a (c′′) > R∗a (c′) . Thus, we again see that switch-

ing from decision risk to communication risk reverses the dominance ordering between

c′ and c′′.

B.1.3 Continuous Version of Treatment Choice Application

Define X , D, and L (d, θ) as in Section 4.2, but take Θ = (0, 1)T (or Θ = [0, 1]T ) and

again consider the audience A = ∆ (Θ) . The results of Lehmann (1966) and Eaton

(1967) continue to apply, and imply that the rule c∗ (X, V ) discussed in the main text

uniformly minimizes decision risk among all decision rules invariant with respect to

permutation of the treatments, and is minimax. Moreover, c∗ is a Bayes decision rule

for the agent a∗ with an independent U [0, 1] prior on each θt, and is admissible in the

classical model by Theorem 4.3 of Eaton (1967). Admissibility in the decision model

then follows immediately by the same argument as in the proof of Proposition 2.

At the same time, the rule c defined in the main text continues to dominate c∗

in the communication model. To see that this is the case, note that given c (X, V )

the agent is again free to randomize uniformly when c (X, V ) = ι, which yields a

random decision with the same conditional distribution a c∗ (X, V ) |X for all X. Thus,

R∗a (c) ≤ R∗a (c∗) for all a ∈ ∆ (Θ) . To see that this inequality is strict for some

a ∈ ∆ (Θ) , consider the restricted set of priors ∆(ΘT

0

)for Θ0 finite, |Θ0| ≥ 2. Claim

6 implies that there is a strict inequality for some a ∈ ∆(ΘT

0

).

53

B.1.4 Admissibility Conflict with Continuous Parameter Space

A corollary of Proposition 4 is that the conclusions of Proposition 3 can be extended

to some cases with continuous Θ.

Corollary 5. Suppose that for an infinite (and potentially uncountable) set Θ but

finite X and D, there exists a decision d ∈ D that is dominated in loss, in the

sense that there exists d′ ∈ D with L (d, θ) ≥ L (d′, θ) for all θ ∈ Θ with strict

inequality for at least one θ ∈ Θ. Suppose further that for some finite subset Θ ⊂Θ, N

(X ,∆

(Θ))≥ |D|. Then any rule c that is admissible in decision risk is

inadmissible in communication risk and vice versa.

The dominance in loss condition holds for continuous Θ in the deworming exam-

ple, monotone estimation application, and treatment choice application. Thus, the ad-

missibility conflict extends to these three settings as well under conditions on Θ analo-

gous to those in the discrete case. (Specifically, we maintain that supθ∈Θ0EF0(θ) [Xi]−

infθ∈Θ0 EF0(θ) [Xi] > 2/3 in the deworming example, and sup {Θ0} − inf {Θ0} > 2/3

in the monotone estimation application.)

Proof of Corollary 5 Note first that any rule with Prθ {c (X, V ) = d} > 0 for

some θ ∈ Θ is inadmissible in decision risk by the same argument as in the proof

of Proposition 4. Next, consider any rule such that Pra {c (X, V ) = d} = 0 for all

a ∈ ∆(

Θ). Let us construct a rule c as in the proof of Proposition 4, and note that

by construction R∗a (c) ≤ R∗a (c) for all a ∈ ∆ (Θ) , with a strict inequality for some

a ∈ ∆(

Θ). Thus, c is inadmissible in the communication model, as we wanted to

show. �

B.2 Heterogeneous Loss Functions and Likelihoods

In Section 2, we claimed that models with heterogeneous loss functions and likelihoods

can be cast into our setting. To see that this is the case, suppose we begin with a model

with an audience of agents A where agent a’s prior is πa ∈ ∆ (Θ), their likelihood

function is Fa,θ, and their loss function is La (d, θ). Here we do not identify agent a

with their prior since agents differ on multiple dimensions. Let us further suppose

54

that the loss is uniformly bounded above

supa∈A,d∈D,θ∈Θ

La (d, θ) <∞,

and that the probability mass function fa (x; θ) of Fa,θ is uniformly bounded away

from zero,

infa∈A,x∈X ,θ∈Θ

fa (x; θ) ≥ η > 0.

Under the assumption of uniformly bounded loss, for M = |Θ| × |D| we can

express

La (d, θ) =M∑m=0

la (m)Lm (d, θ)

where L0 (d, θ) = 0, while for m ≥ 1 the functions Lm (d, θ) are proportional to

1{d = dj(m)

}1{θ = θk(m)

},{(dj(m), θk(m)

): m = 1, ...,M

}= D×Θ, la satisfies la (m) ≥

0 for allm, and∑M

m=0 la (m) = 1. Intuitively, la (m) = La(dj(m), θk(m)

)/Lm

(dj(m), θk(m)

),

so for two agents a and a′, la (m) /la′ (m) measures how much a loses, relative to a′,

from the decision-parameter pair(dj(m), θk(m)

).

Likewise, we can express

Pr Fa,θ {X = x} =N∑n=1

pa (n) fn (x; θ)

where

fn (x; θ) =

1− η (|X | − 1) if θ = θk(n) and x = xr(n)

η otherwise,

{(θk(n), xr(n)

): n = 1, ..., N

}= Θ × X , pa (n) satisfies pa (n) ≥ 0 for all n, and∑N

n=0 pa (n) = 1.

Using these observations, we can cast this example into our baseline setting by

augmenting the parameter space. Specifically, define

Θ∗ = Θ× {0, ...,M} × {1, ..., N} .

For each agent a, define a’s prior on this augmented space as π∗a (θ∗) = πa (θ) la (m) pa (n),

55

and let agents share the homogeneous loss function

L (d, θ∗) = Lm (d, θ) ,

and the homogeneous likelihood

Fθ∗ = Fn,θ,

where Fn,θ has mass function fn (·; θ) . Since the prior imposes mutual independence

between (θ,m, n), agent a′s posterior risk from action d conditional on (X, V ) is

Ea [L (d, θ∗) |X, V ] =

∑θ∈Θ

∑Nn=1

∑Mm=0 Lm (d, θ) fn (X; θ) pa (n) la (m) πa (θ)∑

θ∈Θ

∑Nn=1

∑Mm=0 fn (X; θ) pa (n) la (m) πa (θ)

=

∑θ∈Θ

∑Nn=1 fn (X; θ) pa (n)

∑Mm=0 Lm (d, θ) la (m) πa (θ)∑

θ∈Θ

∑Nn=1 fn (X; θ) pa (n) πa (θ)

=

∑θ∈Θ La (d, θ) fa (x; θ) π (θ)∑

θ∈Θ fa (x; θ) π (θ)= Ea [La (d, θ) |X, V ]

and so coincides with the posterior risk in the initial heterogeneous prior, heteroge-

neous loss, heterogeneous likelihood model. Since this holds for all (X, V ), the pos-

terior risk conditional on (c (X, V ) , V ) likewise coincides for all possible c. Hence, all

risk calculations likewise coincide, and we have successfully recast the initial hetero-

geneous prior, heterogeneous loss, heterogeneous likelihood model as a heterogeneous

prior, homogeneous loss, homogeneous likelihood model with a particular audience

A ⊂ ∆ (Θ∗). Whether the class of priors {π∗a : a ∈ A} is closed and/or convex will

depend on the original priors πa, losses La, and likelihoods Fa,θ.

In conjunction with heterogeneous loss functions and likelihoods, our approach

can also allow heterogeneous decision spaces Da and parameter spaces Θa provided

|Da| = |D| and |Θa| = |Θ| for all a ∈ A. To see that this is the case, note that for

each a we can define a bijection φa between Da and D, and a bijection ψa between Θa

and Θ. Using these bijections we can again regard the loss and likelihood for agent a

as being defined on D and Θ, and use the same argument as above.

56

B.3 Regret-based Payoff Functions

For a given loss function L : D × Θ → R≥0, consider the regret loss function L :

D ×Θ→ R≥0 formed by subtracting the minimized loss:

L (d, θ) = L (d, θ)−mind′∈D

L (d′, θ) .

Regret loss functions of this kind have been considered by e.g. Manski (2004). Defin-

ing the communication, decision, and frequentist risk functions with respect to the

regret loss leaves all of the results in Section 3 unchanged (since the regret loss func-

tion is simply a different choice of loss, and our results do not depend on the form of

the loss).

For a given risk function ρ : C × A → R≥0, we may alternatively consider the

regret risk function ρ : C × A → R≥0 formed by subtracting the best-possible risk:

ρ (c, a) = ρ (c, a)− infc′∈C

ρ (c′, a) .

Regret risk functions of this kind are considered in Stoye (2012). Defining the regret

communication, decision, and frequentist risk functions leaves the results on admissi-

bility and ω−optimality in Sections 3.1 and 3.2 unchanged (since the set of ω-optimal

and admissible rules are the same in both cases). We do not know whether the results

on minimaxity in Section 3.3 extend to this case.

B.4 Selecting Natural Permutations

Let ψ permute the elements of D, and let ψ ◦ c : X × [0, 1]→ ∆ (D) be the rule with

realization ψ (c (X, V )). It is immediate that (ψ ◦ c) ∈ C and that R∗a (ψ ◦ c) = R∗a (c).

By contrast, in general Ra (ψ ◦ c) 6= Ra (c). That is, communication risk is invariant

to permutations, whereas decision risk is not. As a result, the communication model

has the unrealistic implication that if, for example, D is a subset of R and is symmetric

around zero, all agents are indifferent between the report c and the report −c.It is possible to modify the communication model to eliminate this unrealistic

implication. Define for any rule c ∈ C the set of rules

Q (c) = {c′ ∈ C : R∗a (c′) = R∗a (c)∀a ∈ A}

57

that are equivalent in communication risk to the rule c. Now say that a rule c∗ ∈ C is

• naturally admissible in communication risk if c∗ is admissible in communication

risk and there exists no rule c ∈ Q (c∗) such that Ra (c) ≤ Ra (c∗) for all a ∈ A,

with strict inequality for at least one a ∈ A.

• naturally ω-optimal in communication risk if c∗ is ω-optimal in communication

risk and ∫ARa (c∗) dω (a) = inf

c∈Q(c∗)

∫ARa (c) dω (a) .

• naturally minimax in communication risk if c∗ is minimax in communication

risk and

supa∈A

Ra (c∗) = infc∈Q(c∗)

supa∈A

Ra (c) .

For each optimality criterion, naturalness selects from among the communication-

optimal rules those that are not worse in decision risk than some other rule that

is equivalent in communication risk. The resulting rules are “natural” in the sense

that they will not lead to unnecessary losses if their reports are taken literally as

recommended decisions. This approach is similar in spirit to the refinement of cheap-

talk equilibria that arises from an arbitrarily small cost of lying (Chen et al. 2008).

The resulting refinement does not affect the relationship between the communi-

cation and decision models.

Proposition 5. There exists a rule c∗ that is admissible in decision risk and naturally

admissible in communication risk if and only if there exists a rule c ∈ C that is

admissible in both decision and communication risk. The same holds for ω-optimality

and minimaxity.

Proof of Proposition 5 We first consider admissibility. Since a rule is naturally

admissible in communication risk only if it is admissible in communication risk, the

“only if” part of the statement is trivial. For the “if” part, note that if c∗ is admissible

in both decision risk and communication risk, then it is admissible in communication

risk, and admissible in decision risk relative to the restricted class of rules Q (c∗) .

However, this is the definition of natural admissibility. The same argument works for

ω-optimality and minimaxity. �

58

B.5 Derivation of Quadratic Loss in Deworming Example

Let k ≥ max {D} be the private marginal cost of the medication with k−d the market

price with a subsidy (or tax) d ∈ D. Assume that households’ private willingness-

to-pay is distributed uniformly on [0,W ] for W ≥ k −min {D}, so that the share of

households purchasing the medication at market price k − d is given by

W − (k − d)

W∈ [0, 1] .

Each purchase has a social benefit not internalized by households given by θ =

ATE (θ) (e.g. because malnutrition has an impact on children’s long-run outcomes

that is not fully internalized by parents). The government’s payoff is given by the

sum of total surplus from the purchases, plus the social benefit:(W − (k − d)

W

)(W + (k − d)

2−(k − θ

))where W+(k−d)

2is the average private willingness-to-pay conditional on purchase and

(k − ATE (θ)) is the social marginal cost. We can rewrite the government’s payoff as

1

2W

(W 2 − 2W

(k − θ

)−(d− θ

)2

+(k − θ

)2).

Subtracting the best possible payoff (which arises when d = θ) gives

− 1

2W

(d− θ

)2

which is directly proportional to the assumed loss.

B.6 Complete Class Theorems for Decision and

Communication Models

In this section we establish complete class theorems for decision and communication

risk. To state this result, we allow a general closed audience A ⊆ ∆ (Θ), and say that

a rule c∗ ∈ C is ω-optimal in decision risk if and only if∫Ra (c∗) dω (a) = inf

c∈C

∫Ra (c) dω (a) ,

59

where unlike for ω-optimality the support of the probability measure ω may be a

strict subset of A. We define ω-optimality in communication risk analogously.

Proposition 6. A rule c is admissible in decision risk for the audience A only if it is

ω-optimal in decision risk for some ω. Likewise, a rule is admissible in communication

risk only if it is ω-optimal in communication risk for some ω.

For decision risk, this result is an immediate consequence of the classical complete

class theorem. By contrast, communication risk is nonlinear in a, and so requires a

slightly different argument (and in particular depends on the public randomization

device).

Proof of Proposition 6 We first prove the result for decision risk. Note that the

set of decision risk functions is trivially convex, since we can take a mixture of any

two decision rules to form a new decision rule. Hence we can apply Theorem 8.4.3

of Robert (2007) (with A playing the role of Θ) to obtain that weighted average risk

minimizing procedures are a complete class for decision risk.

We next prove the result for communication risk. Note first that the presence

of V ensures that the set of communication risk functions is convex. Specifically, for

any pair of rules c1, c2 ∈ C and any α ∈ [0, 1] we can construct a new rule

cα (X, V ) = 1 {V ≤ α} c (X, V/α) + 1 {V > α} c (X, (1− V ) / (1− α)) .

Since each agent a observes (cα (X, V ) , V ),

R∗a (cα) = αR∗a (c1) + (1− α)R∗a (c2) .

We show continuity of communication risk in Lemma 1, so combined with the

convexity of the class of communication rules, Theorem 8.4.3 of Robert (2007) again

implies that weighted average risk minimizing procedures are a complete class for

communication risk. 2

B.7 Extension of Optimal Treatment Assignment Example

This section extends the analysis of optimal treatment assignment in Section 4.2 of

the main text to show that when agents have sufficiently informative beliefs, it may

60

be communication-preferred to report ι even in some cases without exact ties. To

develop these results we consider a restricted audience A ⊂ ∆ (Θ).

Claim 7. Suppose that for a closed audience A and some non-empty set E ⊆ X ,

arg maxt

Ea [θt|X] = arg maxt

Ea [θt] for all a ∈ A, X ∈ E . (9)

Then the rule c which takes c (X, V ) = c∗ (X, V ) when X 6∈ E and c (X, V ) = ι when

X ∈ E has weakly lower communication risk for all a ∈ A than does the rule c∗.

Claim 8. If in addition to the conditions of Claim 7,{X : arg max

tXt = {1, ..., T}

}∩ E 6= ∅, (10)

there exists a ∈ A with arg maxt Ea [θt|c∗ (X, V ) , V ] = c∗ (X, V ) for all (X, V ), and

arg maxt Ea [θt] a singleton, then c dominates c∗ in communication risk.

Claim 9. Finally, if the conditions of Claim 7 hold, the conditions of Claim 8 hold for

all a ∈ A, and the audience A is invariant under permutation of the treatments (that

is, the set of priors is unchanged by relabeling the treatments), then c∗ is minimax in

decision risk but not communication risk for the audience A.

Condition (9) in Claim 7 formalizes what it means for X ∈ E to be uninformative

for the audience A : no agent in this audience changes their action in response to

observing X ∈ E . Note that E and A are tightly linked here. For example, if we take

A to contain only agents with dogmatic priors, then all data is uninformative and we

can take E = X . By contrast, if we take A = {a∗} to consist of the agent with an

uninformative prior discussed in Claim 4, then the largest possible E is

E =

{X : arg max

tXt = {1, ..., T}

}.

Claim 8 is intuitive as well. The condition arg maxt Ea [θt|c∗ (X, V ) , V ] = c∗ (X, V )

means that agent a has a sufficiently uninformative prior that they are willing to follow

the recommendation of the minimax rule. By contrast, the fact that arg maxt Ea [θt] is

a singleton means this agent’s prior is sufficiently informative that, absent any data,

61

they have a unique preferred decision. Putting these two conditions together, how-

ever, means that this agent would strictly prefer to avoid the randomization induced

by the rule c∗.

Finally, for Claim 9, note that, since the audience is invariant, minimaxity of c∗

follows from the fact that it minimizes risk in the class of invariant decision rules.

However, the conditions of Claim 9 imply that the audience of A is not convex, since

the convex hull of any invariant audience necessarily contains a prior a such that

arg maxt

Ea [θt] = {1, ..., T}

which we have ruled out by assumption. Hence, Claim 9 illustrates that with a non-

convex audience, minimax decision rules need not be minimax communication rules.

Proof of Claim 7 Note that all agents have the option to choose d ∈ arg maxt Ea [θt]

conditional on observing c (X, V ) = ι, while choosing d ∈ arg maxt Ea [θt|c (X, V ) , V ]

conditional on observing c (X, V ) 6= ι. By the definition of E this yields a weakly

lower expected loss for agent a than choosing some d ∈ arg maxt Ea [θt|c∗ (X, V ) , V ].

2

Proof of Claim 8 If arg maxt Ea [θt] is a singleton for a given agent a and (9) holds,

then conditional on X ∈ E agent a strictly prefers not to randomize their decision.

At the same time, since arg maxt Ea [θt|c∗ (X, V )] = c∗ (X, V ), under the rule c∗ this

agent’s decision is random conditional on the data when

X ∈ E ∩{X : arg max

tXt = {1, ..., T}

}.

As above, since the agent is free to choose d = c (X, V ) conditional on c (X, V ) 6= ι

and d = arg maxt Ea [θt] conditional on c (X, V ) = ι, we see that c yields a strictly

lower communication risk for this agent. Since we have shown in the proof of Claim

7 that c yields weakly lower communication risk than c∗ for all a ∈ A, c dominates

c∗. �

62

Proof of Claim 9 Since the set A is a closed subset of the simplex and communi-

cation risk is continuous in a, the proof of Claim 8 implies that

infa∈A{R∗a (c∗)−R∗a (c)} > 0.

Hence, the maximum communication risk of c∗ for the audience A is strictly larger

than that of c, which proves the second part of the claim. To prove the first part

of the claim, note that by the assumed invariance of the audience A, the maximum

decision risk of a rule c is bounded below by the risk of the rule ψ◦c where ψ randomly

permutes the treatments. Note, however, that the rule ψ ◦ c is invariant to relabeling

of the treatments by construction. The results of Lehmann (1966) imply, however,

that Ra

(cI)≥ Ra (c∗) for any invariant rule cI and any a ∈ A. Thus, we have that

for any rule c

supa∈A

Ra (c) ≥ supa∈A

Ra

(ψ ◦ c

)≥ sup

a∈ARa (c∗) ,

which proves that c∗ is minimax in decision risk. �

63

A Model of Scienti c Communication - Harvard University · 2020-02-25 · A Model of Scienti c...

Documents

Transcript of A Model of Scienti c Communication - Harvard University · 2020-02-25 · A Model of Scienti c...