Information Projection: Model and Applications.∗
Kristóf Madarász
University of California, Berkeley.†
First Version: March 2007.
This Version: April 2008.
Abstract
Evidence from both psychology and economics shows that people systematically underestimate
informational differences. I model such information projection by assuming that after processing a
signal, a person overestimates the probability with which this signal is available to others. I apply the
model to agency and communication settings. When learning about an expert’s skill using ex-post
information, a biased evaluator exaggerates how much a skilled expert could have known ex ante,
and hence underestimates the expert on average. To minimize such underestimation, experts will ex ante
be too reluctant to produce useful information that will be seen more clearly by the evaluator ex
post, and too eager to gather information that the evaluator will independently learn ex post. I also
show that information projection introduces noise into evaluations decreasing an expert’s incentive
to exert effort and lowering the optimal use of monitoring relative to the Bayesian case. Evidence
from, and applications to, medical malpractice, liability regulation, and effective communication are
discussed.
Keywords: Hindsight Bias, Curse of Knowledge, Internal Labor Markets, Medical Malpractice,
Communication, Retrospectroscope.
∗I am grateful to Matthew Rabin for his constant encouragement and suggestions. I also owe special thanks to Botond
Koszegi for his advice and support. I benefited from conversations with George Akerlof, Jerker Denrell, Erik Eyster, Marina
Halac, Daniel Kahneman, Ulrike Malmendier, David Rosenberg, Adam Szeidl, Daniel Teknos and seminar participants at
LSE, LBS, Cambridge, Oxford, UC San Diego, Yahoo! Research, Google, UC Berkeley, CEU and HAS.
†Contact: [email protected] or 549 Evans Hall # 3880, UC Berkeley, Berkeley CA 94720-3880.
1 Introduction
The study of how asymmetric information affects economic activity typically builds on the assumption
that people perceive informational differences correctly. Evidence shows, however, that people system-
atically mispredict informational differences and exaggerate the similarity between the information they
have and the information available to others. For example, having learned novel information about a patient,
one may well exaggerate the extent to which an attentive physician should have diagnosed cancer earlier.
In this paper, I model such information projection by assuming that having processed a signal, a
biased person exaggerates the probability with which this signal is also available to others. I show that
as a result, a biased examiner will be too pessimistic about the skill of the agents she evaluates. In turn,
agents will have incentives to change the type of information they produce to mitigate the adverse effects
of hindsight bias on their reputation. I also investigate how this bias affects the optimal use of incentives,
communication and social inference.
In the context of financial markets, Camerer, Loewenstein, and Weber (1989) provide laboratory
evidence that better informed traders overestimate how much uninformed traders know, and that such
curse of knowledge affects market outcomes. In Section 2, I review both controlled laboratory and more
stylized field evidence to support my claim that information projection is a widespread phenomenon. In
Section 3, I develop a formal model of information projection building on CLW (1989).
In Sections 4 and 5, I turn to the main application of the paper: the influence of information projection
on performance evaluation. To illustrate the results, consider a radiologist who diagnoses a patient based
on an ambiguous X-ray. After the diagnosis is made, the patient returns with novel symptoms and an
evaluator is asked to assess the radiologist’s original diagnosis. A biased evaluator projects ex-post
information, and acts as if all radiologists should have guessed the symptoms earlier. This leads to two
types of inferential mistakes: underestimation, and over- and under-inference.
Assume that radiologists differ in skill, and skilled ones understand the X-ray and unskilled ones
do not. If more ex-ante information increases the chances of an ex-post successful treatment, a biased
evaluator exaggerates the success rate for both types of radiologists. In hindsight, a successful treatment
becomes the norm and a failed one becomes a surprise. If the probability of failure decreases with
skill, the evaluator thus underestimates the radiologist's skill on average. The ‘surprisingly’ high failure
to success ratio is perceived to be the result of the lack of skill, rather than the lack of sufficient ex-ante
information.
While the evaluator underestimates the agent on average, information projection will typically affect
her conditional beliefs as well. Whenever knowing the symptoms ex-ante would have increased the
chances of a successful diagnosis more for a skilled type than for an unskilled type, the evaluator over-infers
skill from performance. For example, if the symptoms alone are uninformative, but combined with
the X-ray they are perfectly indicative of cancer, a biased evaluator perceives differences in luck to
be differences in skill. If however, knowing the symptoms alone is almost perfectly informative, and
hence the probability of a failed treatment depends very little on understanding the X-ray, the evaluator
perceives differences in performance to be due to differences in luck. Here, she underinfers skill from
performance.
Given these results, a natural question to ask is how the radiologist might change his behavior to
minimize the adverse effects of information projection on his reputation. Evidence suggests that ex-
perts are often aware that those evaluating them are biased. For example, it is argued that “defensive
medicine”, medical procedures designed to minimize false liability rather than maximize cost-effective
health care, is due partly to the fear of experts that those evaluating them will suffer from hindsight bias.
To study such behavior, assume that the radiologist can decide what radiographs to order ex ante. I
show that if a radiograph is a substitute for the ex-post information, i.e., it provides information ex-ante
that the evaluator will independently learn ex post, then the radiologist has an incentive to over-produce
this radiograph. Overly costly MRIs might be ordered for all patients if such MRIs produce the same
information that the evaluator inevitably learns ex post. At the same time, the radiologist is too reluctant
to produce complement information, i.e., radiographs that help him make a good diagnosis but can be
interpreted better in light of ex-post information. He might avoid ordering a mammography that helps
detect breast cancer if he fears it can be interpreted much better in hindsight than in foresight. Thus as a
result of information projection, increasing the likelihood of ex-post evaluations could increase produc-
tion costs, lower productivity and exacerbate the types of over- and underproduction of information that
observers have attributed to medical malpractice regulation.1
If the management of a hospital is aware of evaluators' propensity for hindsight bias, it can correct this
mistake to some extent. In important situations however, even the perfect anticipation of the evaluator’s
1 See Studdert et al. (2005), Kessler and McClellan (2002), and Jackson and Righi (2006).
bias cannot eliminate inefficiencies. To show this, in Section 5, I turn to a context where the amount
of information that the radiologist learns from an X-ray is a function of effort rather than of skill.
To motivate the radiologist, a hospital may provide incentives to encourage a careful reading of the
X-ray. When the radiologist’s effort is not observed however, he might be rewarded and punished based
solely on whether the patient’s condition improved or deteriorated. In cases of limited liability or risk-
averse radiologists, no such reward scheme can be first-best optimal. A second-best scheme may instead
involve monitoring whether the radiologist’s diagnosis accorded with the information that was available
to him ex-ante.
A biased evaluator, however, is prone to judge the correctness of the diagnosis not on the basis of the
ex-ante available information, but on the basis of both the ex-ante and the ex-post information. Thus the
radiologist is punished too often for bad luck and rewarded too rarely for good decisions. As a result, the
radiologist’s incentive to carefully understand the X-ray is lower than under a Bayesian evaluator. More
generally, an agent who is de jure facing a negligence rule is de facto punished and rewarded according
to strict liability if he is assessed by a biased judge or jury. I show that the report of a biased evaluator
contains too much noise and hence even if the hospital anticipates the bias, it has a reason to monitor less
often than in the rational case. I also show that if the hospital does rely on biased reports, it nevertheless
decides to induce lower levels of effort to save on incentives that are appropriate in the rational case, but
too strong in the biased case.
In Section 6, I turn to the influence of information projection on communication. I show that a
listener who projects his private information will be too credulous of a speaker’s advice because he
overestimates how much the speaker knows. I also show that a speaker who projects her non-communicable
background knowledge will mistakenly send messages that are too ambiguous for
her audience to interpret. I identify conditions for over- and under-communication.
Finally, in Section 7, I conclude with a brief discussion of some further implications and extensions
of my model. I discuss how information projection might affect social inferences in networks causing
hostility between groups, as well as the possibility of extending my model to capture the related phe-
nomenon of ignorance projection, where a person who does not observe a signal underestimates the
probability that this signal is available to others.
2 Evidence and Related Literature
Folk wisdom has for a long time recognized the existence of what I call information projection, as noted
by the common refrain, "hindsight is 20/20". I begin this section by discussing both laboratory and more
stylized evidence on two closely related phenomena: hindsight bias – the phenomenon that people form
biased judgements in hindsight relative to foresight –, and the curse of knowledge – the phenomenon
that informed people overestimate the information of those uninformed.2 I then turn to a brief summary
of some evidence on related biases lending support to the existence of the projection of various forms
of private information. Although individual studies are often subject to alternative interpretations, the
sum-total of the studies provides a compelling case for the widespread existence of this phenomenon.
The presence of information projection in experimental financial markets was demonstrated by
Camerer, Loewenstein, and Weber (1989). A group of Wharton and Chicago MBA students traded
assets of eight different companies in a double-oral auction. Traders were divided into two groups. In
the first group, traders were presented with the past earnings history of the companies (not including
1980) and traded assets that yielded returns in proportion to the actual 1980 earnings of these compa-
nies. In the second group, traders received the same information, and in addition they also learned the
actual 1980 earnings of the companies. By design, returns for traders in the second group depended on
the market price established by those in the first group. Therefore to maximize earnings, better-informed
traders had to guess as correctly as possible the market price at which less-informed traders traded these
assets. If traders in the second group project their information, then their guesses and hence the price at
which they trade are significantly different from the market price established by the first group. CLW
(1989) finds that the guesses of better-informed traders were biased by 60% towards the actual 1980
earnings and market prices were biased by 30%.3 The reason why the bias in the market was lower than
in judgements is that traders with a smaller bias traded more aggressively. Less biased traders behaved
as if they had anticipated that others would project information.
Further evidence comes from the experimental study of Loewenstein, Moore, and Weber (2006)
who build on CLW (1989). They study the curse of knowledge using a set of visual recognition tasks.
2Hindsight bias studies involve both between-subject and within-subject designs. In the latter, participants have to re-
call their own prior estimates after being presented with new evidence. Since my focus in this paper is on interpersonal
information projection, I concentrate on the between-subject designs.
3 CLW do not report the numbers explicitly, only graphically, so these are approximate. See CLW (1989) p. 1241.
Figure 1: From Loewenstein, Moore, and Weber (2006).
In these tasks, subjects are presented with two pictures that differ in one crucial detail. LMW (2006)
divide subjects into three groups: uninformed, informed, and choice. In the uninformed condition, no
additional information is available besides the two pictures. In the informed condition, the difference
between the pictures is highlighted for the subjects. In the choice condition, subjects could decide
whether to obtain additional information for a small fee, or remain uninformed. After looking at the
pictures, the subjects in each group are asked to guess what fraction of people in the uninformed group
could tell the difference between the two pictures. Subjects are compensated based on how well they
predicted this fraction.
As Figure 1 indicates, the informed subjects’ mean estimate was significantly higher than the unin-
formed subjects’ mean estimate. Importantly, a significant portion of the people in the choice condition
paid for additional information. In this group, the mean estimate was 55.4%, while the mean estimate of
subjects who chose to remain uninformed was 34.6%. Hence people not only projected their additional
information, but also paid for information that biased their judgements in a way that lowered their earnings.
The work of Fischhoff (1975) initiated research on hindsight bias. He showed that reporting an out-
come of an uncertain historical event increases the perceived ex-ante likelihood of the reported outcome
occurring. Fischhoff’s findings were replicated by a plethora of studies, and most of these find a strong
presence of such hindsight bias, often larger than the one found in this initial study. These studies and the
meta-analyses building on them also show that the presence of hindsight bias is robust to a great number
of debiasing techniques. A robust comparative static result is that the more informative the outcome,
the greater the bias (Harley et al., 2004). As I demonstrate in Section 3, my model of information
projection exhibits the same monotonicity.
Less controlled evidence comes from more explicit field studies. In the context of liability judge-
ments, there is a wealth of evidence that juries and experienced judges fail to ignore superior informa-
tion and instead form judgements as if the defendant had information that was unavailable at the time
he acted. Experiments have demonstrated the existence of information projection in the evaluation of
ex-ante judgements of various experts. Anderson et al. (1997) documented the existence of the bias in
judges deciding on cases of auditors’ liability where auditors failed to predict the financial problems of
their audit clients. Caplan, Posner, and Cheney (1991) conducted a study with 112 practicing anesthesi-
ologists. Here physicians saw identical case histories but were either told that the case ended in minor
damages or were told that it ended in severe damages. Those who were told that severe damage occurred were
more likely to judge the ex-ante care to be negligent. In certain cases, the difference in the frequency
of ruling negligence was as great as 51%. Bukszar and Terry (1988) demonstrate hindsight bias in the
solution of business case studies, and Hastie, Schkade, and Payne (1999) document very serious biases in
jurors’ judgement of punitive liability. Strong effects were found among others in experiments on the as-
sessment of railroad accidents, legality of search, evaluation of military officers, etc. For survey articles
on the evidence, see e.g., Harley (2007).4
A large set of other psychological findings further indicate that people project various types of private
information. For example, a study by Gilovich, Medvec, and Savitsky (1998) shows that people greatly
overestimate the probability that their lies, once made, are detected by others.5 Such overestimation was
also documented in the context of communication. In a set of experiments, Kruger et al. (2005) found
that when people communicate through email, they overestimate how well their intent is transmitted
through their messages.6 Here, senders had to make serious and sarcastic statements either through
email or voice recording, and then guess the probability that receivers would be able to understand their
intent.
As Figure 2 shows, the mean estimate for both those sending an email and those sending a voice
recording was 78%, while the actual probabilities were 73% in the voice condition and 58% in the
email condition. Kruger et al. (2005) also conduct an experiment where they ask subjects in the email
condition to vocalize their messages before sending them. Senders are again randomly divided into two
groups; some are asked to vocalize the message in the same tone as the tone of their email, and others
4 The legal profession has long recognized this bias and developed certain procedures to mitigate its effects. One such procedure is the bifurcation of trials, where ex-post evidence is suppressed at the initial phases of the trial. For more on this, see Rachlinski (1998).
5 Illusion of transparency was also studied in the context of negotiations; see Van Boven, Gilovich, and Medvec (2003). Here the results are harder to interpret.
6 See also Newton (1990) on tappers and listeners.
Figure 2: From Kruger, Epley, Parker, and Ng (2005).
are asked to vocalize it in the opposite tone. Senders in both groups overestimate how easy it would be
to understand their messages, yet such overestimation decreased significantly in the case where senders
vocalize in the opposite tone. While some of these results may be due to general overconfidence about
one’s ability to communicate, the evidence is more consistent with the interpretation of information
projection.
My paper builds closely on the experimental results of Camerer, Loewenstein, and Weber (1989).
CLW offer a preliminary model of this bias by assuming that a better-informed trader's estimate of the
mean of a less-informed trader's estimate of the value of an asset is a linear combination of the better-informed
trader's own estimate of this mean value and the less-informed trader's actual estimate of this mean value.
Biais and Weber (2007) build on this formalization of CLW and assume that after observing a realization
of a random variable, a person misperceives the mean of her prior on this variable to be the mean of her
own posterior. Biais and Weber then study whether this formulation of within person hindsight bias can
explain trading behavior consistent with underreaction to news. They also test their hypothesis using
psychometric and investment data from a sample of investment bankers in Frankfurt and London.
In the context of predicting future changes in one’s taste, the phenomenon of projection has also
been studied by Loewenstein, O’Donoghue, and Rabin (2003) and Conlin, O’Donoghue, and Vogel-
sang (2007). In contrast to the projection of taste, the projection of information is most relevant in the
interpersonal domain where people think about what others might or might not know, and hence it is
primarily a social bias.
Several other papers, with no explicitly developed model, have argued that information projection, under
the rubric of hindsight bias or curse of knowledge, might have important economic consequences.
Among others, Viscusi and Zeckhauser (2005), Camerer and Malmendier (2006), Heath and Heath
(2007), and Rachlinski (1998) argue that information projection might be an important factor in economic
settings affecting both judgements and the functioning of organizations. The model also belongs to the
small but growing literature on quasi-Bayesian models of individual biases e.g., Rabin (2002), Mul-
lainathan (2002) and the literature on social biases e.g., DeMarzo, Vayanos, and Zwiebel (2003).
The evidence I summarized in this section is indicative of the fact that people project various forms
of information. Although this evidence comes from a diverse set of experimental paradigms that use
different methods of identification and classify information projection under a variety of rubrics, the
model that I present in the next section provides a framework to study this phenomenon in a more
unified manner. It also provides a setup to make more precise predictions about the implications of
information projection in organizations and labor markets and to test such predictions.
3 Model
Consider an environment where people observe signals about the underlying physical state ω ∈ Ω, where Ω is
bounded. An example of ω could be the fundamental value of a company's stock, the medical condition
of a patient, or the geophysical conditions of a place where an engineer is commissioned to build a
bridge. Let there be M people and N different signals $\{s_j\}_{j=1}^N$. A signal is a function $s_j : \Omega \to \Delta Z$
from the set of states to the set of lotteries over a realization space Z. These signals provide information
about the state through the correlation between the observed outcome from Z and the state ω ∈ Ω.
Information is interpreted given a common prior σ(ω) over Ω, where this prior also determines people’s
shared view about ω absent any signals.
Let $p_{jm}$ denote the probability with which signal $s_j$ is available to person m. If $p_{jm} = 0$, then
$s_j$ is surely not available to her, and if $p_{jm} = 1$, it surely is. The collection of these parameters for all
m and all j is given by $p = \{p_{jm}\}_{j=1,\dots,N;\; m=1,\dots,M}$. The elements of this vector describe the correct Bayesian
estimates of the distribution of information. The informational environment is then summarized by the
tuple $\langle \Omega, \{s_j\}_{j=1}^N, \sigma, p \rangle$.
In what follows, I distinguish between the availability and the processing of a signal. Availability
refers to the fact that this signal is ’present’, while processing refers to the fact that its information
content is actually understood. As an illustration, note that only someone who has training in medicine
knows what to infer from the radiograph. In cases where this distinction applies, I assume that $p_m$
concerns the availability of a signal.
We can now define information projection in the following way: a person who projects information
exaggerates the probability that the signals she processed are available to others. To measure the extent
of this mistake, I introduce a parameter ρ ∈ [0, 1] which denotes the degree of information projection.
Definition 1 Person i exhibits interpersonal information projection of universal degree ρ if, after processing
signal $s_j$, her perception of $p_{jk}$ is given by $p^{\rho}_{jk} \equiv p_{jk}(1-\rho) + \rho$ for all $k \in M$, $k \neq i$.
Information projection by person i corresponds to the overestimation of the probability that a signal
person i processed is available to others. Such overestimation is increasing in $\rho_i$. If $\rho_i = 0$, then the
person has correct Bayesian perceptions and does not exaggerate the availability of signals. If $\rho_i = 1$,
then she exhibits full information projection and her perception of the probability that the signals she
processed are available to others equals 1. In cases where $0 < \rho_i < 1$, the person exhibits partial infor-
mation projection.7 The above definition captures the key feature of the evidence: people underestimate
informational differences to an extent not warranted by Bayesian reasoning.
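To fix ideas, the following minimal sketch (my own illustration, not from the paper) implements the perceived availability of Definition 1, together with the alternative functional form discussed in footnote 7:

```python
# A sketch of Definition 1: a rho-biased person i, having processed signal
# s_j, perceives the probability that s_j is available to person k as a
# convex combination of the true probability and 1.

def biased_availability(p_jk: float, rho: float) -> float:
    """Perceived availability p^rho_jk = p_jk * (1 - rho) + rho."""
    return p_jk * (1 - rho) + rho

def biased_availability_alt(p_jk: float, rho: float) -> float:
    """Alternative form from footnote 7; maps p_jk = 0 to 0 for every rho."""
    return p_jk / ((1 - rho) + rho * p_jk) if p_jk > 0 else 0.0

# rho = 0 recovers the correct Bayesian estimate; rho = 1 is full projection.
assert biased_availability(0.3, 0.0) == 0.3
assert biased_availability(0.3, 1.0) == 1.0
assert biased_availability_alt(0.0, 0.7) == 0.0
```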
Intuition suggests that certain pieces of information are projected more than others, and that the
extent to which a particular piece of information is projected depends on a number of factors. In the
above definition, I allow for heterogeneity in projection by allowing ρ to vary across signals and across
individuals. If $\rho_i^j$ denotes the degree to which person i projects signal $s_j$, then such heterogeneity exists
whenever $\rho_i^j \neq \rho_i^l$ for some $j, l \in N$ or $\rho_i^j \neq \rho_k^j$ for some $i, k \in M$. Here, I do not attempt to pin down the
factors determining the value of ρ. My claim, though, is that the evidence suggests that in a number of
economically important domains ρ > 0. More research is needed to get a better understanding of why
certain signals are projected more than others.
While full information projection is not sensitive to the re-description of signals, partial information
projection is. For example, if two signals are collapsed into one, then partial projection of the combined
signal induces a different probability distribution on the information of player m′ than if the two signals
were projected individually. In most relevant applications though there is a natural way to break down
7 In certain contexts, for greater psychological realism or for issues of measurability, the following transformation of the
true probabilities into perceived probabilities might be more appropriate: $p^{\rho}_{jk} = p_{jk}/[(1-\rho) + \rho p_{jk}]$. This functional form
preserves the same properties as the previous one for all $p_{jk} > 0$, but assumes that if $p_{jk} = 0$, then $p^{\rho}_{jk} = 0$ for all ρ.
information into distinct signals or groups of signals. For example, in the case of hindsight bias in per-
formance evaluation where information projection happens over time, the timing of information already
suggests a way to break down information into distinct signals. Importantly, however, almost all results
in this paper are qualitative and do not depend on the use of partial projection. I indicate it in the text
when a result holds under partial information projection but not under full information projection.
There is another sense in which the exact separation of signals matters in my setup. This concerns
the distinction between availability and processing. If one signal requires skill to be processed and the
other does not, then my model has different implications when these two signals are collapsed into one,
or considered to be separate. Here, I always assume that the degree to which a signal requires skill to be
processed is fixed.
As mentioned in Section 2, evidence suggests that in important contexts, people anticipate that others
are biased. Since I build on this fact in the applications, I define such anticipation formally. Let the
probability density function $f_{i,k}(\rho)$ describe the beliefs of person i concerning the extent to which person
$k \neq i$ projects her information. If $f_{i,k}$ is not concentrated on 0, person i believes that there is a non-zero
probability that person k is biased. Two types of anticipation are of special interest. First, if person i
believes that person k is not biased, then the cdf generated by $f_{i,k}$ is such that $F_{i,k}(0) = 1$. Second, if person i
has a perfect guess of person k's bias $\rho_k$, then $F_{i,k}(\rho_k) = 1$ and $F_{i,k}(\rho) = 0$ for all $\rho < \rho_k$.
3.1 A Dinner
Many of the paper’s results follow from two ways information projection biases inference. To demon-
strate these, consider a dinner invitation from Mrs. Robinson (the host) to Mr. Keynes (the guest). At
the dinner, Robinson offers either fish or meat to Keynes; let her choice be denoted by $y \in \{M, F\}$.
Assume that Keynes either prefers fish, $\omega_F$, or meat, $\omega_M$. Robinson observes a noisy signal $s_r$ about
his taste, where $\Pr(s_r = \omega \mid \omega) = h \geq 0.5$. Keynes also observes a private signal $s_k$ about his
preference for the evening. Keynes is better informed about his own taste and thus $s_k$ is such that
$\Pr(s_k = \omega \mid \omega) = z > h$.
The point of this example is to see how Keynes’ views change about Robinson after the dinner.
Consider a case where Robinson has four possible types θ: she is either kind, and follows her signal, or
she is mean, and follows only her own taste. In addition, she either prefers meat or fish. Assume for a
11
moment that Keynes knows his taste, and set z = 1. Assume also that Robinson observes only a noisy
signal where h = 2/3. Let the prior belief of Keynes be π0(θ), and assume that he initially believes
that each type is equally likely. The following table summarizes a Bayesian versus a fully biased guest’s
beliefs about the kindness of the host after being served the meal he likes and after being served the meal
he dislikes:
Posterior | Bayesian, ρ = 0 | Biased, ρ = 1
$\pi_1(kind \mid y = s_k)$ | $\frac{2/3 + 2/3}{2/3 + 2/3 + 1} = \frac{4}{7}$ | $\frac{1 + 1}{1 + 1 + 1} = \frac{2}{3}$
$\pi_1(kind \mid y \neq s_k)$ | $\frac{1/3 + 1/3}{1/3 + 1/3 + 1} = \frac{2}{5}$ | $\frac{0}{1} = 0$
$E[\pi_1(kind)]$ | $\frac{7}{12}\cdot\frac{4}{7} + \frac{5}{12}\cdot\frac{2}{5} = \frac{1}{2}$ | $\frac{7}{12}\cdot\frac{2}{3} = \frac{7}{18}$
Note that a biased guest overestimates kindness if he is served the meal he likes, and underestimates
it if he is served the meal he dislikes. In the former case, he believes that a kind host serves the right
meal with probability one. In the latter case, he believes that a kind host serves the wrong meal with
probability zero. Hence in both situations a biased Keynes reads too much into Robinson’s choice.
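The table's entries can be reproduced by enumerating Robinson's four equally likely types; the sketch below is my own illustration of that computation, not part of the paper:

```python
# Four equally likely host types (kind/mean x shares-taste/differs),
# z = 1, true h = 2/3. A kind host follows her signal; a mean host
# serves her own preferred meal. A fully biased guest treats h as 1.
from fractions import Fraction

def dinner_posteriors(h):
    """Pr(right meal) and the posteriors Pr(kind | right), Pr(kind | wrong)."""
    match = {"kind_same": h, "kind_diff": h,
             "mean_same": Fraction(1), "mean_diff": Fraction(0)}
    pr_match = sum(match.values()) / 4
    post_match = (match["kind_same"] + match["kind_diff"]) / (4 * pr_match)
    post_miss = ((1 - match["kind_same"]) + (1 - match["kind_diff"])) / (4 * (1 - pr_match))
    return pr_match, post_match, post_miss

pr, ps, pf = dinner_posteriors(Fraction(2, 3))      # Bayesian, rho = 0
_, ps_b, pf_b = dinner_posteriors(Fraction(1))      # fully biased, rho = 1

assert (ps, pf) == (Fraction(4, 7), Fraction(2, 5))
assert (ps_b, pf_b) == (Fraction(2, 3), Fraction(0))
# Expected posteriors weight by the TRUE probability 7/12 of a right meal.
assert pr * ps + (1 - pr) * pf == Fraction(1, 2)
assert pr * ps_b + (1 - pr) * pf_b == Fraction(7, 18)
```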
More generally, whenever the guest’s signal about his taste is more precise than the host’s, a biased
guest overestimates how well different types separate. Let πρ1(θ) be the posterior of a ρ−biased guest.
The following proposition shows that the guest overinfers from the host’s choice.
Proposition 1 For all $\pi_0$, $\pi_1^{\rho}(\theta_{kind} \mid y = s_k)$ is increasing and $\pi_1^{\rho}(\theta_{kind} \mid y \neq s_k)$ is decreasing in ρ.
The above proposition holds independently of whether Keynes actually observes the realization of
$s_r$, or just knows that Robinson observed $s_r$. In both cases, he exaggerates how much she knew, and in
expected terms infers too much from her choice.
As the third row of the above table shows, these two over-attributions do not cancel out; rather, on
average Keynes comes to believe that Robinson is mean. Note that relative to his prior, a biased guest
overestimates the probability that he will be served the meal he prefers. A biased guest’s estimate is 3/4
while the true probability is 7/12. This means that Keynes will be disappointed in Robinson on average.
More generally, a biased guest who knows more about his own taste than the host overestimates the
probability with which the host can serve his preferred meal if she wants to. This implies that on
average he underestimates the probability that the host cares about his taste. As the following proposition
shows such underestimation holds for all z > h.
Proposition 2 For all $\pi_0$, $E[\pi_1^{\rho}(\theta_{kind})]$ is decreasing in ρ, where expectations are taken with respect to
the true distribution of signals.
To further illustrate this point, consider a case where the guest and the host receive i.i.d. noisy signals
about the state. Assume that the guest mistakenly believes that the host observed the exact same signal
realization as he did. As long as the two signals are i.i.d., the expected posterior of a biased observer
and that of a Bayesian one are the same. Underestimation happens only if the biased guest has more
information than the host. Even in this case however, over-inference has an interesting implication.
Assume that it so happens that the host has the same taste as the guest. Here a fully biased guest on
average infers that the host is kind. In contrast, if it happens to be so that the host has a different taste,
then a fully biased guest on average infers that the host is hostile. Thus a biased guest misattributes
differences in taste to differences in intentions.
4 Skill Assessment
Let’s now turn to the main application of the paper and consider the impact of information projection
on performance evaluation. This application is motivated both by the key role performance evaluation
plays in labor markets, organizations, medicine, and law and by the evidence which indicates that this
bias is common in such contexts.8 In this section, I focus on a problem of skill assessment where a
supervisor wants to learn about the competence of her agent. In the next section, I focus on the problem
of incentives and monitoring where a principal wants to motivate the agent to exert effort.
Consider an agent who is hired by a principal to process and act upon information available to
him. Since agents differ in their ability to understand information, a supervisor is asked to review the
agent’s performance on a task and assess the agent’s competence. Such assessment then forms the
basis of compensation, firing or job-allocation decisions. Assume, as it is typically the case, that when
evaluating his performance the supervisor has access to some ex-post information that was not available
to the agent.9
8 On the former, see e.g., Alchian and Demsetz (1972) or Lazear (2000).
9 Berlin (2000) argues that in the case of lung cancer, the generally accepted error rate for radiological detection is between
20% and 50%. For radiographs that were previously evaluated as normal but where the patient later developed lung carcinoma,
however, the carcinoma is seen in retrospect in as many as 90% of the cases.
Consider for example a social worker who is assigned a case of foster care. After the injury of
the child, the state commissions a supervisor to investigate whether the social worker was effective at
preventing this outcome. All the home visits and the phone calls of the social worker are reviewed to
establish whether the social worker acted appropriately given his information. In doing so, a biased
supervisor projects information that becomes available only through learning that the child was injured.
Similar evaluation might happen when a CEO is assessed by a Board that knows the market conditions
that were uncertain at the time when the CEO had to decide how to allocate funds among various
projects.
Agent | Supervisor | Agent's info | Supervisor's info
Radiologist | Medical examiner | Patient's X-ray | Subsequent case history
Social worker | Government official | Child's family history | Child's reaction to treatment
CEO | Board | Firm's investment projects | Market conditions
I first show that a biased supervisor underestimates the skill of the agent on average. Since both
higher skill and more information lead to a higher probability of success in my setup, exaggerating how
much information the social worker had leads to underestimation. The second result identifies conditions
under which the supervisor will infer too little or too much from performance. I conclude this section by
showing how increasing the frequency of monitoring changes the behavior of an agent who anticipates
the bias. I derive predictions on the types of information that will be over-produced and the types that
will be under-produced.
4.1 Setup
The radiologist’s (agent’s) task is to offer a treatment recommendation y which maximizes the probabil-
ity of a successful outcome. Before taking y, he receives a set of signals $s_0$ about the patient's condition
ω, where $s_0$ consists both of signals that all radiologists understand and of signals that require skill
to be processed. The probability that a radiologist understands skill-intensive signals depends on his type
θ ∈ [0, 1]. A radiologist of type θ understands such signals a fraction θ of the time; the most skilled
radiologist (θ = 1) always understands the X-ray while the incompetent one (θ = 0) never does. If he
does not understand these signals, he infers nothing from them.10
10 In some specifications, a more natural interpretation of θ would refer to the fraction of signals the agent understands/receives
or his ability to distinguish between important and unimportant signals. Importantly, the results in this
After the radiologist takes y, the supervisor observes whether a success (S) or a failure (F ) occurred
along with a set of novel signals $s_1$ about ω. In most cases, observing success or failure alone carries
information about the state ω, and it is key to our analysis that the supervisor does learn something novel
about the patient's medical condition that was not available to the radiologist ex ante.
Assume that understanding more signals in s0 increases the probability that the radiologist’s ex-ante
optimal choice leads to a success ex-post. Furthermore, assume that if he could use the signals in s1
in addition, this probability would be even higher for all types. As long as the principal cares about
success, she prefers to employ a high-type radiologist over a low type. Assume finally that neither the
supervisor nor the agent observes θ but they share a common prior $\pi_0$ with full support over [0, 1].11 The
uncertainty about the radiologist's skill motivates the skill assessment of the supervisor because what the
supervisor learns about θ can then form the basis of employment, compensation and allocation decisions
by the principal.
4.2 Example
To illustrate the formal setup, consider first a specific information and task structure. Let $\Omega = \{1, 2, 3, 4\}$
and let an ex-ante signal $s_0$ provide noisy information on whether the state is an even or an odd number.
Formally, let $\Pr(s_0 = z \mid z) = h$ where $z \in \{even, odd\}$ and $h \in (0.5, 1)$. Let a second ex-ante
signal s′0 give precise information on whether the state is low (ω ≤ 2) or high (ω > 2). Assume that
s′0 requires skill to be processed but s0 does not. Assume that a success occurs if y = ω and a failure
occurs otherwise. Since s′0 is processed with probability θ by an agent of type θ, the true probability of
success for a type θ agent is:
$$\Pr(S \mid \theta, h) = \frac{h}{2}(1 + \theta) \qquad (1)$$
Assume that the supervisor observes an ex-post signal s1, which tells her precisely whether the state is
an even or an odd number. If the supervisor projects s1, then her perceived probability of a success given
section require only that the probability of success is increasing in type for any given set of signals.
11 The assumption that the agent does not know his type is for simplicity only. It is otherwise standard in the career concerns
literature, e.g. Holmström (1999). Since in Sections 4.3-4.6 the agent is passive, this assumption plays no role in the results there.
a projection of degree ρ is:
$$\Pr(S \mid \theta, h)_{\rho} = \frac{1}{2}(\rho + (1 - \rho)h)(1 + \theta) \qquad (2)$$
where subscript ρ refers to the degree of the bias. This equation shows that the supervisor’s expectation
of the probability with which a type θ agent should succeed is increasing in ρ. Assume that the supervisor
observes success or failure, but not y. Let $\pi_1^{\rho}(S)$ and $\pi_1^{\rho}(F)$ denote a ρ-biased supervisor's updated
beliefs after observing success and failure, respectively. The following claim shows that biased inference
after success is the same as the unbiased, but a biased supervisor is more pessimistic after a failure than
a Bayesian one.
Claim 1 For all $\pi_0$, $\pi_1^{\rho}(S)$ does not change in ρ; $\pi_1^{\rho}(F)$ is decreasing in ρ in the sense of first-order
stochastic dominance (FOSD).
If recruitment and firing decisions are based on the supervisor’s assessment, then this claim implies
that the agent will be fired too often after a failure but not after a success.
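Claim 1 can be verified numerically; the sketch below is my own, and the θ grid, uniform prior, and h = 0.75 are illustrative assumptions rather than values from the paper:

```python
# Posterior mean of theta after success ("S") or failure ("F"), using the
# perceived success probability of Eq. (2) for a rho-biased supervisor.
import numpy as np

def mean_posterior_theta(outcome: str, rho: float, h: float = 0.75) -> float:
    grid = np.linspace(0.0, 1.0, 101)              # candidate skill levels
    prior = np.full_like(grid, 1.0 / grid.size)    # uniform prior pi_0
    p_success = 0.5 * (rho + (1.0 - rho) * h) * (1.0 + grid)   # Eq. (2)
    lik = p_success if outcome == "S" else 1.0 - p_success
    post = lik * prior / np.sum(lik * prior)
    return float(np.sum(grid * post))

# After a success the factor (rho + (1 - rho) h) / 2 cancels, so the
# assessment is independent of rho; after a failure it falls with rho.
print(mean_posterior_theta("S", 0.0), mean_posterior_theta("S", 0.8))  # equal
print(mean_posterior_theta("F", 0.0), mean_posterior_theta("F", 0.8))  # lower with bias
```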
4.3 Underestimation
Although the above example was illustrative of the setup, it did not explicate how information projection
changes the supervisor’s assessment more generally. To identify this mechanism, let me now turn to the
more general case. The first result is also the main result of this section. It claims that if the supervisor
projects productive information, she underestimates the radiologist’s skill level on average.
Proposition 3 Taking expectations based on the true distribution of signals, $E[\pi_1^{\rho}]$ FOSD $E[\pi_1^{\rho'}]$ iff
$\rho' \geq \rho$, for all $\pi_0$.
Information projection leads to the systematic underestimation of the agent’s skill. Since the su-
pervisor projects productive information, she overestimates the overall probability of a success and
underestimates the overall probability of a failure. Hence she is more surprised observing a failure and
less surprised observing a success than in the unbiased case. As a result, a biased supervisor puts more
weight on the information revealed by a failure and less weight on the information revealed by a success
than a Bayesian supervisor. Since lower types are more likely to fail than higher types, this leads to
underestimation on average. Note that in the Bayesian case the expected posterior always equals the
prior and hence the above proposition also implies that the expected biased posterior is lower than the
prior, and that this is true for any prior.
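The same example illustrates Proposition 3; in the sketch below (again my own, with assumed parameters), expectations are taken with respect to the true distribution of outcomes while the biased supervisor updates with the inflated success probability:

```python
# Expected posterior mean of theta under a rho-biased supervisor,
# using Eq. (1) for the true and Eq. (2) for the perceived likelihood.
import numpy as np

grid = np.linspace(0.0, 1.0, 101)
prior = np.full_like(grid, 1.0 / grid.size)
h, rho = 0.75, 0.8
p_true = 0.5 * h * (1.0 + grid)                        # Eq. (1)
p_bias = 0.5 * (rho + (1.0 - rho) * h) * (1.0 + grid)  # Eq. (2)

def mean_post(lik):
    post = lik * prior / np.sum(lik * prior)
    return float(np.sum(grid * post))

pr_s = float(np.sum(p_true * prior))  # true unconditional Pr(S)
e_bayes = pr_s * mean_post(p_true) + (1 - pr_s) * mean_post(1 - p_true)
e_biased = pr_s * mean_post(p_bias) + (1 - pr_s) * mean_post(1 - p_bias)
print(e_bayes)   # equals the prior mean of 0.5
print(e_biased)  # strictly below 0.5: underestimation on average
```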
Proposition 3 shows that if the supervisor has access to information that the agent did not have, the
supervisor is negatively biased in her assessment. Let’s call an increase in s1, the ex-post information,
a change that increases the extent to which knowing and acting upon s1 increases the probability of a
successful outcome after an optimal decision. The next corollary shows that an increase in the projected
signal leads to further underestimation.
Corollary 1 For all ρ > 0, $E[\pi_1^{\rho}]$ is decreasing in $s_1$ in the sense of FOSD.
In the analysis above, I assumed that the supervisor’s inference is based on a performance measure
which consists of either a success or a failure and the knowledge of what the set of ex-ante signals is. In
many situations, the supervisor has more detailed information, such as observing the exact realizations
of the signals in s0 or the agent’s action y. In a Bayesian setting, such information leads to more precise
estimates of θ. In a world of biased evaluators, it might well increase underestimation. Thus in-depth
investigations might be welfare reducing in a biased case, even if they are welfare improving in the
Bayesian one.
4.4 Over- and Under-inference
Proposition 3 is consistent with the general wisdom that hindsight bias leads to too much ex-post blame.
Although this result is true on average, it does not follow that the supervisor assigns too much blame both
after a success and after a failure. Conditional beliefs will depend on the exact nature of the information
projected. If the bias leads to a perception, where the marginal return to skill is higher than in the
Bayesian case, the supervisor overinfers skill from performance. This happens, for example, in a case
where, in the absence of ex-ante information, the outcome is determined only by chance, but in hindsight
an able radiologist should have detected cancer had he not only understood the X-ray but also known what
symptoms the patient would develop later. In contrast, if information projection leads to a perception,
where the marginal return to skill is lower than in the Bayesian case, the supervisor underinfers skill from
performance. This is the case where ex-post information completely substitutes for the skill-intensive
information and hence in retrospect all differences in performance are due to differences in luck. The
following two examples show how conditional beliefs depend on the nature of the projected information.
Let $\omega = \epsilon_1 \epsilon_2$ where $\epsilon_i \in \{1, -1\}$ for i = 1, 2, and let there be a symmetric prior σ both on $\epsilon_1$
and $\epsilon_2$. Let $s_0$ be a signal about $\epsilon_1$ that is true with probability h. Let $s'_0$ be a signal about $\epsilon_2$ that is
always true. Assume that skill is necessary for the understanding of $s'_0$ but not of $s_0$. In this case, the
true probability of success for a type θ agent is:

$$\Pr(S \mid \theta) = \theta h + \frac{1}{2}(1 - \theta) \qquad (3)$$

Let the ex-post information be given by $s_1$ where $s_1$ reveals $\epsilon_1$. Hence the perceived probability of
success of type θ with information projection of degree ρ is:

$$\Pr(S \mid \theta)_{\rho} = \rho\theta + (1 - \rho)\theta h + \frac{1}{2}(1 - \theta). \qquad (4)$$
It is easy to see that for all ρ > 0 the marginal return on skill in the biased case is higher than in the
Bayesian case: $h - \frac{1}{2} + \rho(1 - h)$ versus $h - \frac{1}{2}$. As a result, the supervisor exaggerates the extent to
which performance is influenced by skill. In the limit, where $h = \frac{1}{2}$, performance is not informative
about skill, yet a biased supervisor infers skill from performance. Here, full information projection leads
to the complete illusion of talent.12
Let everything be as before, except the ex-post information. Let $s_1$ now tell the true value of $\epsilon_2$ with
probability z. Here, in contrast to the previous example, the productivity of $s_1$ depends on whether the
agent processed $s'_0$ or not. If $s'_0$ was processed, $s_1$ adds no information. If it was not, $s_1$ increases the
agent's chances of producing a successful outcome. The true probability of success for a type θ agent is

$$\Pr(S \mid \theta) = \theta h + \frac{1}{2}(1 - \theta) \qquad (5)$$

and the perceived probability with full information projection is

$$\Pr(S \mid \theta)_1 = \theta h + (1 - \theta)[hz + (1 - h)(1 - z)]. \qquad (6)$$
12 For other mechanisms leading to the illusion of talent based on false beliefs, see Rabin (2002) or Spiegler (2006).
In contrast to the previous case, here the marginal return on skill is lower in the biased case. In the
limit, where z = 1, the perceived probability of success equals h for all types and hence the marginal
return is perceived to be zero. This means that a fully biased supervisor does not update her beliefs
after observing performance because she believes that differences in performance are due entirely to
differences in luck.
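Both limiting cases can be checked directly from Eqs. (4) and (6); the sketch below is my own illustration with assumed parameter values:

```python
# Perceived marginal return to skill in the two examples.

def perceived_success_eq4(theta: float, h: float, rho: float) -> float:
    """Ex-post signal reveals eps_1; perceived Pr(S | theta) from Eq. (4)."""
    return rho * theta + (1.0 - rho) * theta * h + 0.5 * (1.0 - theta)

def perceived_success_eq6(theta: float, h: float, z: float) -> float:
    """Ex-post signal reveals eps_2 w.p. z; full projection, Eq. (6)."""
    return theta * h + (1.0 - theta) * (h * z + (1.0 - h) * (1.0 - z))

# h = 1/2: the true return to skill is zero, yet under full projection the
# perceived return is 1/2: the complete illusion of talent.
print(perceived_success_eq4(1.0, 0.5, 1.0) - perceived_success_eq4(0.0, 0.5, 1.0))

# z = 1: the projected ex-post signal fully substitutes for skill, so the
# perceived return is zero and performance looks like pure luck.
print(perceived_success_eq6(1.0, 0.7, 1.0) - perceived_success_eq6(0.0, 0.7, 1.0))
```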
Given a bias of degree ρ, let $\lambda_{\rho}(\theta) = \Pr(S \mid \theta)/\Pr(S \mid \theta)_{\rho}$ be a measure of the exaggeration of the
probability of success for a type θ agent. We can now state the following more general result:

Proposition 4 For all $\pi_0$ and ρ > 0, if $\lambda_{\rho}(\theta)$ is decreasing in θ, then $\pi_1^{\rho}(S)$ FOSD $\pi_1(S)$; if $\lambda_{\rho}(\theta)$ is
increasing in θ, then $\pi_1(S)$ FOSD $\pi_1^{\rho}(S)$.
The above proposition specifies the effects of information projection on the supervisor’s assessment
after a success as a function of the projected information.13 The impact on the assessment after a failure
is the outcome of the net effect of two forces: underestimation, and over- and under-inference. In the
case of over-inference, these two point in the same direction and the supervisor is always too pessimistic
after a failure. In the case of under-inference, they point in different directions and the net effect is
ambiguous. As Example 2 shows, it is possible that the under-inference effect dominates and a biased
supervisor is too optimistic after a failure.
4.5 Production of Information
As the evidence suggests, many professionals do anticipate the presence of hindsight bias. In our model,
this suggests that the agent might respond strategically to the supervisor’s bias. It follows from the above
analysis that if the agent prefers a higher assessment to a lower one, information projection decreases his
welfare on average. To avoid such underestimation, the agent has incentives to reduce the information
gap that exists between the ex-ante and the ex-post environment. This incentive might motivate the agent
to change the set of signals he has, and also to avoid certain tasks if possible.
Consider a specification where the radiologist has access to a set of signals $s_0$ and can decide to
produce an additional ex-ante radiograph $\hat{s}_0$.14 The cost of producing this radiograph is a and the benefit
13 In the case where the marginal return to skill is the same both in the true and in the biased perception, only underestimation
has an effect. See the example of Section 4.2.
14 I do not require that $\hat{s}_0$ needs skill to be processed, only that there are some skill-intensive signals in $s_0$. For ease of
notation, I suppress $s_0$ in what follows.
is the increased probability of a successful treatment. Assume that the radiologist bears the full cost of
a and also that his compensation w0 depends on the outcome. This compensation is w0,S > 0 after a
success, and 0 after a failure. Assume that the radiologist’s utility depends on the supervisor’s evaluation
as well. In particular, assume that the radiologist's future wage $w_1$ equals the mean of the evaluator's
ex-post assessment, i.e., $w_1^{\rho} = E[\theta \mid \pi_1^{\rho}]$. Future wages could be interpreted as a reduced form for
the radiologist's future employment opportunities.15 Formally, consider the following von Neumann-
Morgenstern utility function for the agent:
$$U(w, a) = w_0 + w_1^{\rho} - a \cdot \mathbf{1}\{\hat{s}_0 \text{ is produced}\} \qquad (7)$$
The above specification assumes risk neutrality over assessments, i.e., that ex-ante the agent cares only
about the expected beliefs of the supervisor, and does not care about the difference between the condi-
tional beliefs after a success or a failure. This assumption is mainly for expository purposes, and later in
this section, I discuss the case where this assumption is relaxed.
Let m denote the frequency of monitoring. This frequency corresponds to the ex-ante probability
with which the agent is evaluated. Since the supervisor's assessment of θ changes only conditional on an
evaluation taking place, it follows that for a fixed m and ρ, the radiologist's optimal choice of whether to produce the
additional radiograph is determined by the following inequality:
$$[\Pr(S \mid \hat{s}_0) - \Pr(S)]\, w_{0,S} - a \geq m\, E[w_1^{\rho} - w_1^{\rho}(\hat{s}_0)] \qquad (8)$$
The left-hand side is the direct benefit minus the direct cost of producing signal $\hat{s}_0$. The right-hand side is the
loss/gain in expected future wages from producing this additional radiograph.
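In code, the production rule of Eq. (8) reduces to a single comparison; the following sketch is my own illustration, and all argument names are hypothetical:

```python
# Produce the extra radiograph iff its direct net benefit covers the
# monitoring-weighted expected reputational loss from Eq. (8).

def produce_radiograph(delta_pr: float, w0_s: float, cost: float,
                       m: float, ew1_without: float, ew1_with: float) -> bool:
    """delta_pr = Pr(S | extra radiograph) - Pr(S); ew1_* = E[w_1^rho]."""
    direct_net_benefit = delta_pr * w0_s - cost
    reputational_loss = m * (ew1_without - ew1_with)
    return direct_net_benefit >= reputational_loss

# In the Bayesian case ew1_without == ew1_with (the expected posterior equals
# the prior), so the decision is independent of m. With rho > 0, substitute
# signals make ew1_with > ew1_without, so a larger m tilts the decision
# toward producing; with complements the tilt is reversed.
print(produce_radiograph(0.05, 10.0, 0.4, 0.5, 0.50, 0.56))  # True
```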
In the Bayesian case, the expectation of the right-hand side of Eq. (8) is always zero. This is true
because the supervisor’s expected posterior always equals her prior under Bayesian learning, and this
holds independently of what signals the radiologist did or did not see. As a result, given the assumption
of risk-neutrality, the choice of producing the additional radiograph is independent of the frequency of
monitoring. Furthermore, even if we relax the assumption of risk-neutrality, the radiologist’s choice
should be independent of the ex-post information. Since such information does not influence the agent’s
15 The specific assumption that $w_1^{\rho} = E[\theta \mid \pi_1^{\rho}]$ is without loss of generality in the sense that the results hold for all utility
functions that are increasing in $\pi_1^{\rho}$ in the sense of FOSD.
ex-ante productivity, it should not affect the supervisor’s assessment in the Bayesian case.
In the biased case, at the same time, the posterior is decreasing in the information gap between the
ex-ante and the ex-post stage. Here the choice whether to produce the additional radiograph depends
crucially on the relationship of this signal and the ex-post information. To see this, let me distinguish be-
tween two ways the productivity of these two signals could be linked. I call signals $\hat{s}_0$ and $s_1$ substitutes
if processing $\hat{s}_0$ decreases the productivity gain from having $s_1$. I call these two signals complements if
processing $\hat{s}_0$ increases the productivity gain from having $s_1$. The following definition introduces these
two properties formally:
Definition 2 Let $\lambda_{\rho} = \Pr(S)/\Pr_{\rho}(S)$ and $\hat{\lambda}_{\rho} = \Pr(S \mid \hat{s}_0)/\Pr_{\rho}(S \mid \hat{s}_0)$. Signals $\hat{s}_0$ and $s_1$ are substitutes
if $\lambda_{\rho} < \hat{\lambda}_{\rho}$ for all ρ. Signals $\hat{s}_0$ and $s_1$ are complements if $\lambda_{\rho} > \hat{\lambda}_{\rho}$ for all ρ.
When signals are substitutes, the radiologist can reduce the information gap between ex-ante and ex-
post by ordering an additional radiograph. When signals are complements, ordering a new radiograph
increases this information gap. For a given m and ρ, let a(m, ρ) denote the cost at which Eq. (8) holds
with equality. According to the next proposition, an increase in the probability of monitoring leads to an
increase in the production of substitute information and to a decrease in the production of complement
information.
Proposition 5 If $\hat{s}_0$ and $s_1$ are substitutes, then a(ρ, m) is increasing in m iff ρ > 0. If $\hat{s}_0$ and $s_1$ are
complements, then a(ρ, m) is decreasing in m iff ρ > 0.
A radiologist has additional incentives to undertake diagnostic procedures that substitute for ex-post
information. The reason is that such diagnostic procedures reduce the probability of unwarranted ex-
post blame. Even when such procedures are socially inefficient, because they are too costly or simply
undesirable, for example because they expose the patient to too much radiation, a radiologist will un-
dertake them to maintain a good reputation. As a result, the more he is monitored, the more expensive
and potentially more harmful his activities will be on such tasks. At the same time, a radiologist has
additional incentives to avoid information that can be interpreted much better in hindsight than in fore-
sight. Even if the production of such information increases productivity more than it increases costs, the
radiologist is better-off without producing such information because this way, he can avoid developing
a bad reputation. Both effects are increasing in the frequency of monitoring.16
Proposition 5 provides distinct predictions on over- and under-production of information as a func-
tion of the environment. Stylized evidence provides support for the existence of both over- and under-
production of information. For example, Studdert et al. (2005) survey physicians (in areas of surgery,
radiology and emergency medicine) in Pennsylvania. In their sample, 43% of the physicians report using
imaging technology in clinically unnecessary circumstances, and 42% of them claim that they took steps to
restrict their practices, which included eliminating procedures prone to future complications. Kessler
and McClellan (2002) show that changes in defensive medicine that result from medical liability reforms
are primarily in diagnostic rather than in therapeutic practices. Intuition suggests that there is typically
more room for information projection in the former than in the latter context. While many argue for
a direct link between defensive medicine and hindsight bias, further evidence is needed to test for the
mechanism described.
Note that Proposition 5 rests crucially on the assumption that the supervisor conditions her inference
of the agent’s competence on all signals produces by the radiologist. She does not need to observe the
realization of these ex-ante signals, but the results depend on the fact that she does observer whether s0
was or was not produced.
Alternatively, one could imagine a situation where the radiologist can produce $\hat{s}_0$ secretly, i.e., in
a way that the supervisor does not learn about $\hat{s}_0$. In this case, the radiologist's expectation of his
future wages is independent of his production choice. Even here, the anticipation of the supervisor's
bias might lead to distortions in production choices. To see these effects, let's return for a moment to
Proposition 4. In environments where the supervisor overinfers skill from performance, wages are too
high after a success and too low after a failure. It follows that here the radiologist wants to over-produce
ex-ante information secretly. In environments where the supervisor underinfers skill from performance,
wages are too low after a success, and hence the radiologist might want to under-produce ex-ante
information secretly. These deviations from the Bayesian incentives disappear if these under- and over-
productions are detected, unlike in the case of Proposition 5.
Given the focus on skill assessment, I assumed that skill-intensive signals are always present as is
16 A corollary of the underproduction result is that the radiologist is also averse to ordering tests that deliver results after
his recommendation is made. These tests increase the information gap between ex-ante and ex-post, and hence increase the
extent of underestimation.
true in most situations. In situations where the agent cannot eliminate, or sufficiently reduce, the infor-
mativeness of novel ex-post signals, ceteris paribus, he would like to avoid procedures that involve infer-
ence about his skill to protect himself from underestimation. A similar incentive might well be present in
a Bayesian setting where the radiologist is risk-averse over future wages. Absent skill-intensive signals,
the agent is not exposed to wage fluctuations that result from the supervisor's updating.17 Information
projection amplifies this aversion so that even a risk-neutral agent will exhibit such preferences. Impor-
tantly though, the two mechanisms are distinct and the incentives that can alter them are also different.
5 Reward and Punishment
In the previous section, I showed that a biased supervisor underestimates the agent's skill on average.
A principal, responsible for employment decisions, can to some extent correct the supervisor’s mistake
if she anticipates that the supervisor’s reports are too negative on average. In most situations, however,
a principal does not have as detailed information about the agent’s task as the supervisor. Hence such
corrections might introduce other forms of inefficiencies, and might not eliminate the incentives of the
agent to act against underestimation.
In this section, I turn from a context where the amount of information that the agent learns from a signal is a function of his skill to situations where it is a function of how much effort he exerts. How often the radiologist understands X-rays depends on how carefully he evaluates them. A careful evaluation is costly because it requires the radiologist to exert effort. To provide incentives for the radiologist, the principal offers him a contract that rewards the agent for a good health outcome and punishes him for a bad one. If the health outcome is only a noisy measure of the correctness of the radiologist's diagnosis, and effort is unobservable, better incentives can be provided if the principal hires a supervisor to monitor the radiologist. This way, the principal can tie reward and punishment more closely to whether the radiologist made the correct diagnosis given the information available to him ex ante.18
The main result of this section shows that if the supervisor projects ex-post information, the efficiency gains from monitoring are reduced. I show that if the supervisor believes that the agent could have learned the true state, the radiologist is punished too often and exerts less effort than in the Bayesian case. I also show that when the principal designing the incentives anticipates the supervisor's bias, she wants to monitor less often. Even if she decides to monitor, she induces less effort on the part of the agent than in the Bayesian case. The reason is that information projection, even if anticipated by the principal, introduces noise into the supervisor's reports, and hence decreases the efficiency of monitoring.

17 On this logic in the Bayesian setting, see Hermalin (1993).
18 For the classic insight that increasing observability reduces inefficiency in the context of moral hazard, see Holmström (1979) and Shapiro and Stiglitz (1984).
5.1 Effort
Assume that the agent’s level of the effort determines the probability with which he understands signal
s0. Let p(a) be the probability that s0 is understands s0 when the agent exerts effort a. If he does not
understand it, he infers nothing from s0. I assume decreasing returns to effort in terms of the processing
probability. Formally, p′(a) > 0 and p′′(a) < 0. I also assume that lima→0 p′(a) = ∞ and lima→∞
p′(a) = 0.
Let s0 be such that Pr(s0 = ω | ω) = h. Assume that the probability of a success conditional on the agent's action matching the state, y = ω, is k, and that the probability of success for actions different from the state, y ≠ ω, is z, where k > z. Finally, assume that if the agent does not process s0, he is equally likely to take any action y ∈ Ω, and that the probability that such a random action matches the state is b, where b < h.
For simplicity, assume that both the agent and the principal are risk-neutral. Let the agent's utility function again be U(w, a) = w − a, and the principal's utility function be V(r, w) = r − w, where r is the revenue to the principal from the task. Let the revenue of the principal be 1 after a success and 0 after a failure.
5.2 Performance Contract
As the benchmark, I characterize the first-best effort level, at which the marginal social benefit of effort equals its marginal social cost. The first-best effort level, a∗f, is defined implicitly by the following equality:

qp′(a∗f) = 1

where q = (h − b)(k − z), and q measures the productivity gain from processing signal s0.19 This productivity gain increases in h, the precision of the agent's signal, and in k, the probability of success conditional on an optimal action. It decreases in b, the probability of making the right choice by chance, and in z, the probability of success conditional on a non-optimal choice. With a slight abuse of notation, let the vector q denote the collection of the parameters h, b, k, z.
Let’s now turn to the case, where the agent’s effort is unobservable. Assume that the agent is pro-
tected by limited liability, i.e., w0 ≥ 0 has to be true in all contingencies. Let the agent’s outside option
be 0. Given the assumption of risk-neutrality, the principal’s optimal contract is one that offers the low-
est compensation possible after a failure. This implies that the compensation after a failure is wF = 0.20
Let wS denote the compensation offered to the agent upon a success.
In light of these considerations, the principal's problem is to maximize her expected utility:

max_{a,wS} V(r(a, q), w) = [p(a)q + bk + (1 − b)z](1 − wS)    (9)

subject to the agent's incentive compatibility constraint:

an(q, w) = argmax_a [p(a)q + bk + (1 − b)z]wS − a.    (10)
Given the agent’s utility function, we can replace this incentive compatibility constraint with its
first-order condition. To guarantee that there is a unique stable equilibrium, I assume that p′′′(a) ≤ 0 for
all a. The optimal effort level, a∗n(q), which solves this constrained maximization problem is defined
implicitly by following equation:
qp′ = 1−p′′(p+ (bk + (1− b)z)/q)
(p′)2. (11)
Let w∗n(q) denote the corresponding optimal wage.
Note that a∗n(q) is always smaller than a∗f(q). The reason is that the principal faces a trade-off: implementing a higher level of effort is only feasible at the cost of leaving a higher rent to the agent. Thus effort is lower and the agent's rent is higher than in the first best. A simple comparative static result follows from Eq. (11): increasing h or k increases the productivity of processing information and thus generates higher utility for the principal under a given contract. Since p′ > 1 always holds in equilibrium, a higher h or k allows for cheaper incentives, and thus the principal wants to induce more effort, implying that effort is increasing in h and k.

19 I assume that the solution is always interior. Furthermore, ĥ = h + (1 − h)(|Ω|/(|Ω| − 1))b, where |Ω| is the cardinality of the action space.
20 On the use of limited liability contracts, see, e.g., Innes (1990) and Dewatripont and Bolton (2005). I believe that the results of this section hold a fortiori for a risk-averse radiologist.

Lemma 1 An increase in h or k increases the equilibrium effort level a∗n(q) and the payoff to the principal.
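A minimal numerical check of Lemma 1, under the same assumed functional form and parameter values as in the first-best sketch above; raising h raises the equilibrium effort that solves Eq. (11).

```python
import numpy as np
from scipy.optimize import brentq

theta = 6.0
p   = lambda a: 1.0 - np.exp(-theta * np.sqrt(a))
dp  = lambda a: theta * np.exp(-theta * np.sqrt(a)) / (2.0 * np.sqrt(a))
ddp = lambda a: -theta * np.exp(-theta * np.sqrt(a)) * (theta * np.sqrt(a) + 1.0) / (4.0 * a**1.5)

def a_star_n(h, b, k, z):
    """Second-best effort from Eq. (11): qp' = 1 - p''(p + (bk + (1-b)z)/q)/(p')^2."""
    q = (h - b) * (k - z)
    rent = (b * k + (1.0 - b) * z) / q
    foc = lambda a: q * dp(a) - 1.0 + ddp(a) * (p(a) + rent) / dp(a) ** 2
    return brentq(foc, 1e-10, 10.0)

print(f"a_n*(h=0.80) = {a_star_n(0.80, 0.25, 0.9, 0.4):.5f}")
print(f"a_n*(h=0.85) = {a_star_n(0.85, 0.25, 0.9, 0.4):.5f}")  # higher h, higher effort
```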
5.3 Bayesian Monitoring
The effort level characterized by Eq. (11) is optimal given that the supervisor observes a performance measure that consists only of success and failure; obtaining more precise reports about the agent's action allows the principal to induce the same level of effort at a lower cost.

Suppose that the principal can monitor the agent by learning the agent's action and the information that was available to him. In the case of such monitoring, the optimal contract rewards the agent if his action is the one suggested by the information available to him, and punishes the agent otherwise. Since, given this information, whether a success happens or not contains no additional information, it is easy to see that such a compensation scheme is optimal. Given such a reward scheme, the agent's incentive compatibility constraint is now:

am(q, w) = argmax_a p(a)(1 − b)wS + bwS − a    (12)

and the optimal contract induces an equilibrium effort level, a∗m(q), defined implicitly by the following condition:

p′q = 1 − p′′(p + b/(1 − b))/(p′)².    (13)
Let w∗m(q) denote the corresponding optimal wage.
The equilibrium effort under monitoring, a∗m(q), is always greater than the equilibrium effort without monitoring, a∗n(q). The reason is that monitoring improves the trade-off between providing incentives and leaving a positive rent to the agent: it rewards good decisions rather than good luck. As a result, if the principal monitors the agent, she can induce the same level of effort at a lower cost, and hence for any given level of effort she realizes a greater expected profit. The fact that it becomes cheaper for the principal to induce effort means that the principal is willing to pay for monitoring.

Lemma 2 The equilibrium under monitoring induces higher effort, a∗n(q) < a∗m(q), and the principal is better off with the option of monitoring.
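Lemma 2 can be spot-checked the same way: Eqs. (11) and (13) differ only in the rent term, so with the same assumed primitives the monitored rent term b/(1 − b) is smaller and the induced effort larger.

```python
import numpy as np
from scipy.optimize import brentq

theta, h, b, k, z = 6.0, 0.8, 0.25, 0.9, 0.4
q = (h - b) * (k - z)
p   = lambda a: 1.0 - np.exp(-theta * np.sqrt(a))
dp  = lambda a: theta * np.exp(-theta * np.sqrt(a)) / (2.0 * np.sqrt(a))
ddp = lambda a: -theta * np.exp(-theta * np.sqrt(a)) * (theta * np.sqrt(a) + 1.0) / (4.0 * a**1.5)

def solve_effort(rent):
    """Root of qp' = 1 - p''(p + rent)/(p')^2 for a given rent term."""
    foc = lambda a: q * dp(a) - 1.0 + ddp(a) * (p(a) + rent) / dp(a) ** 2
    return brentq(foc, 1e-10, 10.0)

a_n = solve_effort((b * k + (1 - b) * z) / q)  # Eq. (11): performance pay only
a_m = solve_effort(b / (1 - b))                # Eq. (13): monitoring
print(f"a_n* = {a_n:.5f} < a_m* = {a_m:.5f}")
```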
5.4 Biased Monitoring
Let the supervisor’s ex-post signal be s1 and assume that the projected information is such that along
with s0 it perfectly reveals the state but alone its uninformative. This means that a biased supervisor
perceives the true problem as if h = 1 for all h ≤ 1. Furthermore, it also implies that upon not
processing s0 the supervisor still believes that the probability that the agent can take the right action
is b .The consequence of such information projection is that the supervisor makes wrong attributions
from the agent’s choice. Whenever y = ω, the supervisor concludes that the agent did not successfully
read the information available to her. Hence, if the agent did read and follow s0, but this information
turned out to be ’incorrect’ ex-post, the supervisor mistakenly infers that the agent did not read s0.21 The
probability of this mistake is p(a)(1− h), i.e., the probability that s0 is processed times the probability
that s0 did not suggest the right action.
Assume that the agent correctly predicts the bias of the supervisor. In this case, the agent’s effort is
given by the solution of the following maximization problem:
a1m(q, w) = argmax_a p(a)h(1 − b)wS + bwS − a.    (14)
Comparing this condition with that of Eq. (12), it is clear that the return to effort is lower in the biased case. The reason is that an unbiased supervisor can distinguish, up to probability (1 − b), between a bad decision that is due to wrong ex-ante information and a bad decision that results from not processing a signal. In contrast, a biased supervisor mistakes a bad decision due to wrong ex-ante information for a bad decision due to not having processed the available information. This implies that for any given compensation wS, the agent exerts less effort in the biased case.

21 Note that the 'inference' of the supervisor is only about whether the agent's effort was successful or not. In a moral hazard context, there is no inference about the agent's effort a.
Proposition 6 Suppose h < 1. Then a1m(q, w) < am(q, w), and a1m(q, w) is increasing in h, with a1m(q, w) = am(q, w) if h = 1.
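The sketch below illustrates Proposition 6 by solving the agent's first-order condition for a fixed success wage under an unbiased and a biased supervisor. The wage level, the functional form, and the parameter values are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import brentq

theta, h, b = 6.0, 0.8, 0.25
dp = lambda a: theta * np.exp(-theta * np.sqrt(a)) / (2.0 * np.sqrt(a))

def agent_effort(marginal_wage):
    """Agent's FOC: p'(a) * marginal_wage = 1, with p' strictly decreasing in a."""
    return brentq(lambda a: dp(a) * marginal_wage - 1.0, 1e-12, 50.0)

w_S = 2.0                               # an arbitrary fixed success wage
a_m  = agent_effort((1 - b) * w_S)      # Eq. (12): unbiased monitoring
a_m1 = agent_effort(h * (1 - b) * w_S)  # Eq. (14): bias scales the return by h
print(f"a_m = {a_m:.5f} > a_m^1 = {a_m1:.5f}")
```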
The above proposition shows that if a negligence-based reward scheme is enforced by a biased evaluator, then it becomes closer to strict liability, which in our setup reduces care. A further corollary is that a negligence rule might actually backfire. The reason is that under monitoring the radiologist is offered a lower compensation in equilibrium, which in the Bayesian case is outweighed by the increased probability of reward. Since the probability of a reward is reduced in the biased case, care might be lower under biased monitoring than under the simple performance contract.
As a final scenario, consider the case where the supervisor's bias is common knowledge between the principal and the agent. If the principal is aware of the supervisor's bias, she knows that at times the supervisor comes to the wrong conclusion. Since the principal can only determine the probability of this mistake, and not whether a particular report of the supervisor is wrong or right, information projection adds noise to the supervisor's reports. Thus, the data obtained by monitoring contain more noise than in the Bayesian case. This decreases the efficiency of monitoring. As a result, the principal decides to induce less effort than she would had she believed that the supervisor had perfect Bayesian perception.
Let the optimal induced effort level be denoted by a1∗m(q), implicitly defined by:

p′q = 1 − p′′(p + b/(h(1 − b)))/(p′)².
Proposition 7 If the principal anticipates the bias, she induces effort a1∗m(q) < a∗m(q), and a1∗m(q) is increasing in h.
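Numerically, the bias-anticipating principal's condition simply replaces the rent term b/(1 − b) of Eq. (13) with the larger b/(h(1 − b)); a sketch with the same assumed primitives:

```python
import numpy as np
from scipy.optimize import brentq

theta, h, b, k, z = 6.0, 0.8, 0.25, 0.9, 0.4
q = (h - b) * (k - z)
p   = lambda a: 1.0 - np.exp(-theta * np.sqrt(a))
dp  = lambda a: theta * np.exp(-theta * np.sqrt(a)) / (2.0 * np.sqrt(a))
ddp = lambda a: -theta * np.exp(-theta * np.sqrt(a)) * (theta * np.sqrt(a) + 1.0) / (4.0 * a**1.5)

def solve_effort(rent):
    foc = lambda a: q * dp(a) - 1.0 + ddp(a) * (p(a) + rent) / dp(a) ** 2
    return brentq(foc, 1e-10, 10.0)

a_m  = solve_effort(b / (1 - b))        # Bayesian monitoring, Eq. (13)
a_m1 = solve_effort(b / (h * (1 - b)))  # anticipated biased monitoring
print(f"a_m* = {a_m:.5f} > a_m^1* = {a_m1:.5f}")  # bias lowers induced effort
```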
The analysis above has implications for the effect of hindsight bias on tort liability. It implies that whenever unobservable effort is involved, information projection reduces rather than increases an injurer's incentive to exercise due care. This observation is in contrast with the common conjecture, e.g., Rachlinski (1998), that an agent anticipating hindsight bias will take too much precaution to avoid ex-post blame.22

22 A key difference between my setup and the standard setup for the study of optimal liability, e.g., Shavell (1980), is that the level of precaution (effort) is unobservable, and it is not the action itself but the probability of a right action that increases in effort.
6 Communication
In the previous sections, I focused on the problem of performance evaluation, but information projection might affect other aspects of organizational life as well. One such domain is communication. Both intuition and the evidence presented in Section 2 indicate that when giving or taking advice, people assume too much about what the other party knows. In this section I demonstrate two ways in which information projection affects efficient information transmission between a speaker and a listener. These two themes are credulity and unintended ambiguity.

Credulity refers to a case where a listener follows the recommendation of a speaker too closely because he assumes that the recommendation already incorporates his private information. As a result, he will fail to combine his private information with the speaker's recommendation, and will fail to deviate sufficiently from this recommendation even when he should. Unintended ambiguity refers to the case where a speaker sends a message that is too ambiguous for the listener. A biased speaker exaggerates the probability with which her background knowledge is shared with the listener, and hence overestimates how likely it is that the listener will be able to interpret her message. I show that depending on the messages available to the speaker, the speaker might communicate too often or too rarely.
6.1 Credulity
Consider a situation where an advisee has to take an action ye that is as close as possible to an unknown state ω, on which the shared prior is N(0, σ0²). This state could describe the optimal solution of a research problem, the best managerial decision on the organization of production, or the diagnosis of a patient. The advisee has some private information about ω given by se = ω + εe, where εe is a mean-zero Gaussian noise term such that the posterior on ω, given the prior and se, is N(s̄e, σ̄e²). The advisor also has some private information about ω given by sr = ω + εr, where εr is a mean-zero Gaussian noise term such that the posterior on ω, given the prior and sr, is N(s̄r, σ̄r²).23 The advisor makes a recommendation yr equal to her posterior mean. The advisor cannot communicate the full distribution or the true signal directly. Such limits on communication might arise due to complexity considerations, or because it is prohibitively costly to explain this private information. Instead, she can give a recommendation regarding the best action she would follow.

23 Formally, if εe ∼ N(0, σe²) then s̄e = (σ0²/(σ0² + σe²))se and σ̄e² = σ0²σe²/(σ0² + σe²). Similarly, if εr ∼ N(0, σr²) then s̄r = (σ0²/(σ0² + σr²))sr and σ̄r² = σ0²σr²/(σ0² + σr²).
Let the advisee’s and the advisor’s objective be
maxye−Eω(ye − ω)2 (15)
thus the advisee’s goal is to take an action that minimizes the distance between his action and the state.
Given the advisor’s recommendation yr, and the advisee’s private information se, a rational advisee takes
action y0e such that:
y0e = E[N(ω, c0, v0)] (16)
where c0 = yrσ2eσ2e+σ2
r
+ seσ2rσ2r+σ2
e
and N(ω ; c, v) is a short form for a normally distributed random variable
with mean c and variance v. This action is based on the correct perception of how information is
distributed between the advisor and the advisee. This action efficiently aggregates the information in the
recommendation yr and the advisee’s private information se.
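A minimal sketch of this rational benchmark, implementing the aggregation rule of Eq. (16); the prior and noise variances are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
prior_var, e_noise, r_noise = 1.0, 0.25, 0.25   # variances (assumptions)

def post(signal, noise_var):
    """Normal-normal update of the N(0, prior_var) prior: returns (mean, variance)."""
    w = prior_var / (prior_var + noise_var)
    return w * signal, prior_var * noise_var / (prior_var + noise_var)

omega = rng.normal(0.0, np.sqrt(prior_var))
s_e = omega + rng.normal(0.0, np.sqrt(e_noise))   # advisee's private signal
s_r = omega + rng.normal(0.0, np.sqrt(r_noise))   # advisor's private signal

se_bar, ve = post(s_e, e_noise)   # advisee's posterior mean and variance
y_r,    vr = post(s_r, r_noise)   # advisor's recommendation = her posterior mean

c0 = y_r * ve / (ve + vr) + se_bar * vr / (vr + ve)   # rational action, Eq. (16)
c1 = y_r                                              # fully credulous action (rho = 1)
print(f"omega = {omega:.3f}, y_e^0 = {c0:.3f}, y_e^1 = {c1:.3f}")
```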
Consider now the case where the advisee exhibits full information projection. Here, he believes that the advisor's recommendation is based not only on the realization of sr but also on se, and thus that it already incorporates all the information available to the parties. As a result, he reacts to the advice yr by taking action y1e such that:

y1e = E[N(ω; c1, v1)]    (17)

where c1 = yr and v1 = v0. It follows that if the advisee exhibits full information projection, he puts all the weight on what the advisor says and no weight on his private information. This way, his private information is lost. The following proposition shows that a biased advisee follows the recommendation of his advisor too closely.

Proposition 8 E|yr − yρe| is decreasing in ρ, and E|yr − y1e| = 0, where expectations are taken with respect to the true distribution of signals.
This proposition follows from the discussion above. Note that the more precise the advisee's private information is, the greater is the loss relative to the unbiased case. In the biased case, information aggregation fails because the advisee fails to sufficiently update the advisor's recommendation given his private information.24 One way to eliminate this information loss is to invest in a technology that allows the advisor to communicate her posterior distribution. Another option is to block communication between the advisor and the advisee. Assuming full information projection, the advisee is ex ante better off without a recommendation if and only if his signal is more precise than the advisor's signal. More generally, the following corollary is true:

24 The logic of why a biased advisee will be too credulous is also indicative of why information projection can result in irrational herding behavior. In the context of Banerjee (1992), for example, while rational updating results in herding-type behavior only when the action space is not as fine as the signal space, information projection leads to herding even if the action space is as fine as the signal space, and hence where no rational herding should occur.

Corollary 2 There exists an indicator function k(ρ, σe, σr) ∈ {0, 1} such that the advisee is better off with a recommendation if k(ρ, σe, σr) = 0, and better off without a recommendation if k(ρ, σe, σr) = 1. The function k(ρ, σe, σr) is increasing in ρ and σr, and decreasing in σe.
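A Monte Carlo sketch of the corollary's trade-off. The degree of projection ρ, the linear interpolation used here for the biased action yρe, and the variances are all assumptions made for illustration: with a precise own signal and a noisy advisor, the biased advisee is better off with communication blocked (k = 1).

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 1.0                                    # full information projection (assumption)
prior_var, e_noise, r_noise = 1.0, 0.1, 1.0  # precise advisee, noisy advisor

def post(signal, noise_var):
    w = prior_var / (prior_var + noise_var)
    return w * signal, prior_var * noise_var / (prior_var + noise_var)

n = 200_000
omega = rng.normal(0.0, np.sqrt(prior_var), n)
s_e = omega + rng.normal(0.0, np.sqrt(e_noise), n)
s_r = omega + rng.normal(0.0, np.sqrt(r_noise), n)

se_bar, ve = post(s_e, e_noise)
y_r,    vr = post(s_r, r_noise)

y0 = y_r * ve / (ve + vr) + se_bar * vr / (vr + ve)  # rational use of the advice
y_rho = rho * y_r + (1 - rho) * y0                   # one reading of the biased action

print(f"loss with advice:    {np.mean((y_rho - omega) ** 2):.4f}")
print(f"loss without advice: {np.mean((se_bar - omega) ** 2):.4f}")  # smaller here
```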
6.2 Ambiguity, Over- and Under-Communication
In the above context, information projection leads to credulity because the advisee projects his private information. Let's now turn to a context where the advisor projects her private information about the state ω. Consider an information structure analogous to the examples in Section 4.4. Let ω = ω1ω2, where ω1, ω2 ∈ {−1, 1}. Assume that s0 = ω1 is the advisor's background knowledge, which cannot be communicated to the advisee. Let s1 = ω2 be the signal that can be communicated to the advisee. As an example, consider a radiologist who speaks to a patient about the patient's medical condition ω. Signal s0 incorporates the radiologist's knowledge of medicine, such as the meaning of a complex medical term. Signal s1 is a medical term that describes the condition of the patient. If the patient does not know the meaning of the medical term, then s1 conveys no information to him. If the patient knows the meaning of the medical term, he can interpret s1 in light of s0.
Let there be a third signal s2 with Pr(s2 = ω | ω) = h, where 0.5 < h < 1. This signal provides noisy information about ω, but does not require the patient to know s0, the medical language. For simplicity, let the true probabilities with which signals (s0, s1, s2) are available to the advisee be pe = (0, 0, 0). Assume that the patient has a symmetric prior on ω, and that the advisor can send only one signal because sending two is prohibitively costly. Sending one message costs c. Let the payoff to the advisor be 1 if the advisee guesses ω correctly, and let it be 0 otherwise.
The advisor has three distinct options: remain silent, send signal s1, or send signal s2. The table below summarizes the advisor's perceived payoff from each action as a function of ρ:

Payoff / action:   Silence                          Send s1                        Send s2
EV0:               1/2                              1/2 − c                        h − c
EVρ:               ρ² + (1 − ρ²)(ρh + (1 − ρ)/2)    (1 − ρ)ρh + (ρ² + 1)/2 − c     ρ² + (1 − ρ²)h − c
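The table is easy to reproduce mechanically; in the sketch below, ρ, h, and c are assumed inputs, and the advisor picks the option with the highest perceived payoff.

```python
def perceived_payoffs(rho, h, c):
    """Biased advisor's perceived expected payoffs (the EV-rho row of the table)."""
    silence = rho**2 + (1 - rho**2) * (rho * h + (1 - rho) * 0.5)
    send_s1 = (1 - rho) * rho * h + (rho**2 + 1) / 2 - c
    send_s2 = rho**2 + (1 - rho**2) * h - c
    return {"silence": silence, "send s1": send_s1, "send s2": send_s2}

# Example: h = 0.75, c = 0.1; an unbiased advisor (rho = 0) sends s2 since h - c > 1/2.
for rho in (0.0, 0.4, 0.9):
    ev = perceived_payoffs(rho, h=0.75, c=0.1)
    best = max(ev, key=ev.get)
    print(f"rho = {rho}: chooses '{best}'  " + str({k: round(v, 3) for k, v in ev.items()}))
```

With these assumed numbers the unbiased advisor sends s2, while the strongly biased advisor (ρ = 0.9) remains silent, in line with the discussion that follows.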
Since an unbiased advisor knows that s1 does not convey any valuable information to the patient, she never sends s1. Furthermore, she decides to spend time describing the state to the patient in lay terms whenever h − c > 1/2, that is, when the expected benefit of talking is greater than the cost of not remaining silent. In contrast, a biased medical advisor exaggerates the probability with which the medical term conveys valuable information to the patient, because she projects her knowledge of the medical language s0. Hence, if she is sufficiently biased, she prefers to send s1 over s2. Formally, this happens when ρ ≥ 2h − 1.
At the same time, a biased advisor also exaggerates the probability that the patient already knows both the medical and the lay description. As a result, she underestimates the return to sending a costly message in general. The net effect of these two forces depends on the degree to which the advisor is biased. If the advisor is fully biased, she always decides to remain silent, because she assumes that the advisee already knows ω anyway. If the advisor is only moderately biased, however, she might communicate even when a rational advisor would remain silent.
Proposition 9 If ρ < 2h − 1, the advisor sends s2 iff c ≤ k2(ρ, h) and is silent otherwise. The function k2(ρ, h) is increasing in h and decreasing in ρ. If ρ > 2h − 1, the advisor sends s1 iff c ≤ k1(ρ, h) and is silent otherwise. Furthermore, if h = 0.5, the advisor sends s1 iff c ≤ 0.5ρ(1 − ρ).
The above proposition shows not only that a biased advisor might send a dominated message, but also offers some comparative statics on whether there will be too much or too little communication. If the advisor is only moderately biased, and the lay description is sufficiently informative, then she communicates too rarely. Here, underestimating the return to communication dominates her overestimation of how informative the medical description is. If the advisor is sufficiently biased, then depending on how informative the lay description is, she might communicate too often. Since she overestimates the probability that, following the medical description, the advisee will take the right action, she engages in costly communication even though it conveys no information to the advisee. Hence adding dominated communication options might decrease efficiency in the presence of information projection.

The results in the above proposition are consistent with the intuition that the curse of knowledge leads to too much ambiguity. For example, many argue that this is true of computer manuals written by experts but targeted at lay people. While in the case of computer manuals hiring a lay person rather than an expert to proof-read the manuscript could reduce the curse, in many other situations more explicit communication protocols might improve welfare more effectively.
7 Conclusion
In this paper, I developed a model of information projection applicable to problems of asymmetric information. The applications in this paper are motivated by problems and evidence from labor markets, organizations, medicine, and law, but they are not exhaustive in any sense. I conclude the paper by considering some possible further applications and extensions.
The results in Section 4 and Section 5 suggest that if debiasing is ineffective, special kinds of incentives might be necessary to mitigate the adverse effects of information projection on the production and processing of information. Novel insights might be gained in contexts where the radiologist can decide both what information to produce and how much effort to exert in understanding the information he has produced.
Another possible extension of the over-inference and underestimation results of Section 2 is to the analysis of group formation in social networks. Recall that a biased guest will be too optimistic about the kindness of the host if the host and the guest have similar tastes, and too pessimistic about the host's kindness if their tastes differ. This implies that if friendships are formed partly on the basis of perceived social intentions, then members of a group might be too similar in taste. More importantly, such cliques will misperceive each other as hostile because they mistakenly attribute taste differences to hostile intentions. As a corollary of the underestimation result of Proposition 2, it might also be true that a social network will have too few links.
The underestimation result can also be extended to the settings of Section 6. A biased advisee might underestimate how attentive his advisor is to him, because he exaggerates the precision of the advice an attentive advisor could give if he wanted to. Here attentiveness is defined as the probability that the advisor bases her recommendation on information rather than on noise. A biased advisor might underestimate how perceptive her advisee is, because she does not recognize how ambiguous her messages are. Here perceptiveness is defined as the probability that the advisee listens to the advisor's message. Such inferences can result in the breakdown of communication between parties who have much to share with each other but suffer from projection bias.
Another direction in which to extend the ideas presented in this paper is the related phenomenon of ignorance projection. Ignorance projection happens when someone who does not observe a signal underestimates the probability with which this signal is available to others. Though the evidence on ignorance projection is not as strong as the evidence on information projection, it might still be a phenomenon worth studying, both empirically and theoretically. Finally, one could study information and ignorance projection in the intrapersonal domain, where people project their current information and their current ignorance onto their future selves, leading to distortions in prospective memory.
8 Appendix
Proof of Proposition 1. Note first that since z > h, the host follows sk if she observes sk. Without loss of generality, assume that sk = meat. The biased conditional likelihoods are given by

πρ1(θkind | y = meat) = (ρ + (1 − ρ)h)π0(θkind) / [(ρ + (1 − ρ)h)π0(θkind) + π0(θmean, meat)]    (18)

and

πρ1(θkind | y = fish) = (1 − ρ)(1 − h)π0(θkind) / [(1 − ρ)(1 − h)π0(θkind) + π0(θmean, fish)]    (19)

Since h ≥ 0.5, πρ1(θkind | y = sk) is increasing in ρ and πρ1(θkind | y ≠ sk) is decreasing in ρ.
Proof of Proposition 2. The guest's perception of the ex-ante likelihood of the event y = sk is increasing in ρ. By virtue of the properties of Bayes' rule, the following relation holds for all ρ:

π0(θkind) = πρ1(θkind | y = sk) Prρ(y = sk | π0) + πρ1(θkind | y ≠ sk) Prρ(y ≠ sk | π0).

The expected posterior πρ1(θkind), at the same time, is given by:

E[πρ1(θkind)] = [πρ(y = sk | θkind)π0(θkind) / Prρ(y = sk)] Pr(y = sk) + [πρ(y ≠ sk | θkind)π0(θkind) / Prρ(y ≠ sk)] Pr(y ≠ sk).    (20)

Since Prρ(y = sk | π0) is increasing and Prρ(y ≠ sk | π0) is decreasing in ρ, it follows from Proposition 1 that E[πρ1(θkind)] is decreasing in ρ.
Proof of Claim 1. Note that

(1/2)(ρ + h(1 − ρ))(1 + θ)π0(θ) / ∫0^1 (1/2)(ρ + h(1 − ρ))(1 + θ)π0(θ)dθ = (1 + θ)π0(θ) / ∫0^1 (1 + θ)π0(θ)dθ = (h/2)(1 + θ)π0(θ) / ∫0^1 (h/2)(1 + θ)π0(θ)dθ    (21)

hence it follows that πρ(θ | S) = π0(θ | S) for all ρ and π0. The result on πρ(θ | F) follows from the proof of Proposition 3 below.
Proof of Proposition 3. The expected posterior is the probability-weighted average of the posterior after a success and the posterior after a failure: E[πρ1 | π0] = Pr0(S)πρ1(· | S) + (1 − Pr0(S))πρ1(· | F). For a given type θ this is equal to

E[πρ1(θ) | π0(θ)] = Pr0(S) · Prρ(S | θ)π0(θ) / Prρ(S) + (1 − Pr0(S)) · Prρ(F | θ)π0(θ) / (1 − Prρ(S)).    (22)

Note that E[π1 | π0] = π0.

Let's introduce two variables: λρS = Pr0(S)/Prρ(S) and λρF = (1 − Pr0(S))/(1 − Prρ(S)), where Pr0 denotes probabilities taken with respect to π0. Note that λρS < 1 and λρF > 1, and that λρS is decreasing and λρF is increasing in ρ, given the assumption that Prρ(S | θ) = ρ Pr(S | s1, θ) + (1 − ρ) Pr(S | θ).

Since Prρ(S | θ) is increasing in θ for all ρ, it follows that the expected weight on higher types is decreasing in ρ. Formally,

λρS Prρ(S | θ)π0(θ) + λρF Prρ(F | θ)π0(θ) = λρS π0(θ) + (λρF − λρS) Prρ(F | θ)π0(θ)

where the equality follows from the fact that Prρ(S | θ) + Prρ(F | θ) = 1 for all ρ. Hence, lower types are overweighted relative to higher types. Since Prρ(F | θ) is decreasing in θ for all ρ, it follows that for any θ∗ < 1

∫0^θ∗ E[πρ1(θ) | π0]dθ > ∫0^θ∗ E[π1(θ) | π0]dθ.    (23)

Furthermore, since λρF − λρS is increasing in ρ, it follows that

∫0^θ∗ E[πρ1(θ) | π0]dθ > ∫0^θ∗ E[πρ′1(θ) | π0]dθ    (24)

whenever ρ > ρ′.
Proof of Corollary 1. If for a given s0, Pr(S | θ, s1) > Pr(S | θ, s′1) for all θ, then for all ρ,

λρS,s1 = Pr0(S)/Prρ(S, s1) < Pr0(S)/Prρ(S, s′1) = λρS,s′1.

Since for both s1 and s′1, Prρ(S | θ) is increasing in θ, the corollary follows from the above proof of Proposition 3.
Proof of Proposition 4. To show that πρ1(S) FOSD π1(S), we have to show that

∫0^θ∗ π1(θ | S)dθ ≥ ∫0^θ∗ πρ1(θ | S)dθ

for all θ∗ < 1. One can rewrite this inequality in the following way:

Prρ(S)/Pr(S) ≥ (∫0^θ∗ Prρ(S | θ)π0(θ)dθ) / (∫0^θ∗ Pr(S | θ)π0(θ)dθ).    (25)

If λρ(θ) = Pr(S | θ)/Prρ(S | θ) is decreasing in θ for all θ, with λρ(0) ≥ λρ(θ) ≥ λρ(1), then this inequality holds since ∫0^1 πρ1(θ | S)dθ = ∫0^1 π1(θ | S)dθ = 1. If λρ(θ) is increasing in θ, then the reverse inequality holds, and π1(S) FOSD πρ1(S).
Proof of Proposition 5. Note first that in the Bayesian case the RHS of Eq. (9) is zero and does not depend on s1. In the biased case, wρ1 = E[θ | πρ1] depends on s1, and it is decreasing in ρ. Whether s0 is produced more often or less often than in the Bayesian case depends on whether the following expression is positive or negative:

E[wρ1] − E[wρ1 | s0]    (26)

It follows from the proof of Proposition 3 that if λρS > λρS,s0, then E[wρ1] > E[wρ1 | s0] for all ρ > 0. This is true because underestimation is decreasing in Pr0(S)/Prρ(S) = λρS. Similarly, if λρS < λρS,s0, then E[wρ1] < E[wρ1 | s0] for all ρ > 0. It follows that if λρS > λρS,s0, then a(m, ρ) is decreasing in m, and if λρS < λρS,s0, then a(m, ρ) is increasing in m.
Proof of Lemma 1. First, let's derive the optimal contract as given by Eq. (11). The principal's maximization problem yields the following Lagrangian:

L(wS, a, µ) = (p(a)q + bk + (1 − b)z)(1 − wS) + µ(p′(a)qwS − 1)

The FOC with respect to a is given by p′q(1 − wS) + µp′′qwS = 0, and with respect to wS it is −(p(a)q + bk + (1 − b)z) + µp′(a)q = 0. Solving for µ and substituting wS = 1/(p′(a)q), the equilibrium effort level is given by

p′q = 1 − p′′(p + (bk + (1 − b)z)/q)/(p′)² = 1 − p′′(p + b/(h − b) + z/q)/(p′)²    (27)

Let the solution of this equation be denoted by a∗n(q). Note that the second-order conditions are satisfied as long as p′′′(a) ≤ 0.

An increase in k or h increases q and hence increases the LHS of Eq. (27), and it decreases the RHS of Eq. (27). Since p is increasing and concave and p′′′ ≤ 0, it follows that this leads to a higher equilibrium effort level. To see the effects of an increase in k and h on the principal's welfare, note that for a given wS, (p(a)q + bk + (1 − b)z)(1 − wS) is increasing in a since wS < 1. Furthermore, the optimal wS after the increase cannot be larger than the original wS because h − b < 1 < p′ and k − z < 1 < p′.
Proof of Lemma 2. Let's first derive the optimal contract under monitoring as given by Eq. (13). The principal's maximization problem yields the following Lagrangian:

L(wS, a, µ) = (p(a)q + bk + (1 − b)z) − (p(a)(1 − b) + b)wS + µ(p′(a)(1 − b)wS − 1)

The first-order condition with respect to a is given by p′q − p′(1 − b)wS + µp′′(1 − b)wS = 0, and the first-order condition with respect to wS is given by −(p(1 − b) + b) + µp′(1 − b) = 0. Solving for µ and substituting wS = 1/(p′(1 − b)), we get that the equilibrium effort level a∗m is determined by

p′q = 1 − p′′(p + b/(1 − b))/(p′)²    (28)

To see the inequality in the lemma, note first that (bk + (1 − b)z)/((h − b)(k − z)) > b/(1 − b) ⟺ b(k − z) + z(1 − b) > bh(k − z), which is always true if h < 1.

Compare now Eq. (28) with Eq. (27). Note that the LHSs of the two equations are the same, and the RHS of Eq. (28) is smaller than the RHS of Eq. (27). Given the assumption that p′′′ ≤ 0, it follows that a is greater under monitoring.

To show the increase in the principal's welfare, note that

EVn = p(a∗n)q + bk + (1 − b)z − (p(a∗n) + b/(h − b) + z/q)/p′(a∗n)

and

EVm = p(a∗m)q + bk + (1 − b)z − (p(a∗m) + b/(1 − b))/p′(a∗m)

Since (1 − 1/p′(a∗n)) and (1 − 1/p′(a∗m)) are both positive because p′(a∗n), p′(a∗m) > 1, and because b/(h − b) + z/q > b/(1 − b) if h < 1, it follows that EVm > EVn.
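A short numerical spot-check of the proof's key inequality and of the equivalence of the two rent terms, using the same assumed parameter values as in the earlier sketches:

```python
h, b, k, z = 0.8, 0.25, 0.9, 0.4
q = (h - b) * (k - z)

rent_n = b / (h - b) + z / q          # rent term of Eq. (27)
rent_m = b / (1 - b)                  # rent term of Eq. (28)

assert abs(rent_n - (b * k + (1 - b) * z) / q) < 1e-12  # the two forms of (27) agree
assert b * (k - z) + z * (1 - b) > b * h * (k - z)      # the proof's key inequality
print(f"rent_n = {rent_n:.3f} > rent_m = {rent_m:.3f}  =>  a_n* < a_m*")
```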
Proof of Proposition 6. Fix a wage wS. The agent's effort choice a1m(q, w) is given by the solution of the maximization problem

a1m(q, w) = argmax_a p(a)h(1 − b)wS + bwS − a    (29)

The FOC is given by p′h(1 − b)wS = 1. In contrast, am(q, w) is defined by the FOC p′(1 − b)wS = 1. Hence, for any given wS, a1m(q, w) < am(q, w) as long as h < 1. Also, a1m(q, w) is increasing in h.
Proof of Proposition 7. To prove this proposition, consider the principal's problem when she knows that the agent's effort is given by a1m(q, w). Here the principal's Lagrangian is given by

L(wS, a, µ) = p(a)q + (bk + (1 − b)z) − (p(a)h(1 − b) + b)wS + µ(p′(a)h(1 − b)wS − 1)

The first-order condition with respect to a is given by p′q − p′h(1 − b)wS + µp′′h(1 − b)wS = 0, and the first-order condition with respect to wS is given by −ph(1 − b) − b + µp′h(1 − b) = 0. Solving for µ and substituting wS = 1/(p′h(1 − b)), we get that a1∗m(q) is given by:

p′q = 1 − p′′(p + b/(h(1 − b)))/(p′)²    (30)

Comparing Eq. (30) with Eq. (28), it follows that a1∗m < a∗m as long as h < 1, because the RHS of (30) is always greater than the RHS of Eq. (28). Furthermore, since the RHS of (30) is decreasing in h, a1∗m is increasing in h.
Proof of Proposition 8. Since εe and εr are independent, the joint distribution of ω, se, and sr is multivariate normal with mean vector (0, 0, 0) and the corresponding covariance matrix C. Given the assumptions on C, it follows that E[ω | sr] = (σ0²/(σ0² + σr²))sr, and that E[ω | se, sr] is given by

E[ω | se, sr] = [σ0², σ0²] [ σ0² + σe²   σ0² ; σ0²   σ0² + σr² ]⁻¹ [ se ; sr ]    (31)

Straightforward calculation shows that

E[ω | se, yr] = yr σ̄e²/(σ̄e² + σ̄r²) + s̄e σ̄r²/(σ̄r² + σ̄e²)

where s̄e = (σ0²/(σ0² + σe²))se, σ̄e² = σ0²σe²/(σ0² + σe²), and σ̄r² = σ0²σr²/(σ0² + σr²).

Consider now the biased case where ρ = 1. Here the advisee believes that yr = E[ω | se, sr] and hence takes the action y1e = yr. For ρ < 1, the advisee believes that with probability ρ it is the case that yr = E[ω | se, sr], and with probability 1 − ρ it is the case that yr = E[ω | sr]. Hence it is always true that yρe ∈ [min{y0e, yr}, max{y0e, yr}]. Furthermore, as ρ increases, |yρe − yr| decreases.
Proof of Corollary 2. Note first that −E(yρe − ω)² is decreasing in ρ by virtue of Proposition 8, since the estimate of ω has the lowest variance given se and yr if ye = E[ω | se, yr]. Also, for a fixed ρ, E|yr − yρe| is increasing in σr and decreasing in σe. Hence if we fix σe < M < ∞, there always exists a sufficiently large σr such that −E(s̄e − ω)² > −E(yρe − ω)². Similarly, for a fixed σr > 0, there always exists a sufficiently small σe such that −E(s̄e − ω)² > −E(yρe − ω)². It follows that k(ρ, σe, σr) is decreasing in σe, increasing in σr, and increasing in ρ.
Proof of Proposition 9. Simple calculations show that s2 dominates s1 iff 2h − 1 > ρ. Here the advisor sends s2 iff c < (1 − ρ²)(1 − ρ)(h − 0.5) = k2(ρ, h). It follows that k2(ρ, h) is increasing in h and decreasing in ρ. If 2h − 1 < ρ, the advisor sends s1 iff c < ρ³(h − 0.5) + ρ(0.5 − ρh) = k1(ρ, h).
References
[1] Alchian, Armen, and Harold Demsetz. 1972. "Production, Information Costs and Economic Orga-
nization." American Economic Review, 62(5): 777-95.
[2] Anderson, John, Marianne Jennings, Jordan Lowe, and Philip Reckers. 1997. "The Mitigation of
Hindsight Bias in Judges’ Evaluation of Auditor Decisions." Auditing: A Journal of Practice
and Theory, 16(2): 20–39.
[3] Banerjee, Abhijit. 1992. "A Simple Model of Herd Behavior." Quarterly Journal of Economics,
107(3): 797-817.
[4] Berlin, Leonard. 2002. "Malpractice Issues in Radiology: Hindsight Bias." American Journal of Roentgenology, 175(3): 597-601.
[5] Biais, Bruno, and Martin Weber. 2007. "Hindsight Bias and Investment Performance." Mimeo, IDEI Toulouse.
[6] Bukszar, Ed, and Terry Connolly. 1988. "Hindsight Bias and Strategic Choice: Some Problems in Learning from Experience." Academy of Management Journal, 31(3): 628-641.
[7] Camerer, Colin, George Loewenstein, and Martin Weber. 1989. "The Curse of Knowledge in Eco-
nomic Settings: An Experimental Analysis." Journal of Political Economy, 97(5): 1234-1254.
[8] Camerer, Colin, and Ulrike Malmendier. 2007. "Behavioral Economics of Organizations." in: P.
Diamond and H. Vartiainen (eds.), Behavioral Economics and Its Applications. Princeton:
Princeton University Press.
[9] Caplan, Robert, Karen Posner, and Frederick Cheney. 1991. "Effect of Outcome on Physicians' Judgments of Appropriateness of Care." Journal of the American Medical Association, 265(15): 1957-1960.
[10] Conlin, Mike, Ted O’Donoghue, and Timothy Vogelsang. 2007. "Projection Bias in Catalog Or-
ders." American Economic Review, 97(4): 1217-1249.
[11] DeMarzo, Peter, Dimitri Vayanos, and Jeffrey Zwiebel. 2003. "Persuasion Bias, Social Influence, and Unidimensional Opinions." Quarterly Journal of Economics, 118(3): 909-968.
[12] Dewatripont, Mathias, and Patrick Bolton. 2005. Contract Theory. Cambridge: The MIT Press.
[13] Fischhoff, Baruch. 1975. "Hindsight ≠ Foresight: The Effect of Outcome Knowledge on Judgment Under Uncertainty." Journal of Experimental Psychology: Human Perception and Performance, 1(3): 288-299.
[14] Gilovich, Thomas, Kenneth Savitsky, and Victoria Medvec. 1998. "The Illusion of Transparency: Biased Assessment of Others' Ability to Read Our Emotional States." Journal of Personality and Social Psychology, 75(2): 743-753.
[15] Kessler, Daniel, and Mark McClellan. 2002. "How Liability Law Affects Medical Productivity."
Journal of Health Economics, 21(6): 931-955.
[16] Kruger, Justin, Nicholas Epley, Jason Parker, and Zhi-Wen Ng. 2005. "Egocentrism over E-mail: Can People Communicate as Well as They Think?" Journal of Personality and Social Psychology, 89(6): 925-936.
[17] Harley, Erin, Keri Carlsen, and Geoffrey Loftus. 2004. "The “Saw-It-All-Along” Effect: Demon-
strations of Visual Hindsight Bias." Journal of Experimental Psychology: Learning, Memory,
and Cognition, 30(5): 960-968.
[18] Harley, Erin. 2007. "Hindsight Bias in Legal Decision Making." Social Cognition, 25(1): 48-63.
[19] Hastie, Reid, David Schkade, and John Payne. 1999. "Juror Judgments in Civil Cases: Hindsight Effects on Judgments of Liability for Punitive Damages." Law and Human Behavior, 23(5): 597-614.
[20] Heath, Chip, and Dan Heath. 2007. Made to Stick: Why Some Ideas Survive and Others Die.
Random House.
[21] Hermalin, Benjamin. 1993. "Managerial Preferences Concerning Risky Projects." Journal of Law,
Economics, & Organization, 9(1): 127-35.
[22] Holmström, Bengt. 1979. "Moral Hazard and Observability." Bell Journal of Economics, 10(1):
74-91.
[23] Holmström, Bengt. 1999. "Managerial Incentive Problems - A Dynamic Perspective." Review of
Economic Studies, 66(1): 169-182.
[24] Innes, Robert. 1990. "Limited Liability and Incentive Contracting with Ex-ante Action Choices." Journal of Economic Theory, 52(1): 45-67.
[25] Jackson, Rene, and Alberto Righi. 2006. Death of Mammography: How Our Best Defense Against
Cancer is Being Driven to Extinction. Caveat Press.
[26] Lazear, Edward. 2000. "Performance Pay and Productivity." American Economic Review, 90(5):
1346-61.
[27] Loewenstein, George, Ted O’Donoghue, and Matthew Rabin. 2003. "Projection Bias in Predicting
Future Utility." Quarterly Journal of Economics, 118(4): 1209-1248.
[28] Loewenstein, George, Don Moore, and Roberto Weber. 2006. "Misperceiving the Value of Infor-
mation in Predicting the Performance of Others." Experimental Economics, 9(3): 281-295.
[29] Mullainathan, Sendhil. 2002. "A Memory-Based Model of Bounded Rationality." Quarterly Jour-
nal of Economics, 117(3): 735-774.
[30] Newton, Elizabeth. 1990. "Overconfidence in the Communication of Intent: Heard and Unheard
Melodies." Unpublished Doctoral Dissertation, Stanford University, Stanford, CA.
[31] Rabin, Matthew. 2002. "Inference by Believers in the Law of Small Numbers." Quarterly Journal of Economics, 117(3): 775-816.
[32] Rachlinski, Jeffrey. 1998. "A Positive Psychological Theory of Judging in Hindsight." The Univer-
sity of Chicago Law Review, 65(2): 571-625.
[33] Shapiro, Carl, and Joseph Stiglitz. 1984. "Equilibrium Unemployment as a Worker Discipline De-
vice." American Economic Review, 74(3): 433-444.
[34] Shavell, Steven. 1980. "Strict Liability Versus Negligence." Journal of Legal Studies, 9(1): 1-25.
[35] Spiegler, Roni. 2006. "The Market for Quacks." Review of Economic Studies, 73(4): 1113-1131.
[36] Studdert, David, Michelle Mello, William Sage, Catherine DesRoches, Jordon Peugh, Kinga Zapert, and Troyen Brennan. 2005. "Defensive Medicine Among High-Risk Specialist Physicians in a Volatile Malpractice Environment." Journal of the American Medical Association, 293(21): 2609-2617.
[37] Van Boven, Leaf, Thomas Gilovich, and Victoria Medvec. 2003. "The Illusion of Transparency in Negotiations." Negotiation Journal, 19(2): 117-131.