Channeled Attention and Stable Errors


Tristan Gagnon-Bartsch

Harvard

Matthew Rabin

Harvard

Joshua Schwartzstein∗

Harvard Business School

September 14, 2021

Abstract

We develop a framework for assessing when a person will notice that a theory she has about the world is wrong, premised on the idea that people neglect information that they view (through the lens of their misconceptions) to be irrelevant. Focusing on the question of when a mistaken theory can persist in the long run even when attention is very cheap, we study the attentional stability of both general psychological biases—such as naivete about present bias or neglecting the redundancy in social information—and context-specific empirical misconceptions—such as false beliefs about medicinal side effects or financial investments. People discover their errors only when the data they deem relevant causes them to incidentally notice that their theory is wrong. We explore which combinations of errors and environments allow an error to persist. People tend to notice costly errors in a particular context when they are right about which factors matter even when they are wrong about how these factors matter, whereas errors that lead people to ubiquitously treat consequential factors as irrelevant will be stable across a broad class of environments. Environments designed to make such factors artificially germane can induce recognition of an error.

∗E-mails: [email protected], [email protected], and [email protected]. We thank Chang Liu for excellent research assistance. For helpful comments, we are grateful to Nava Ashraf, Francesca Bastianello, Max Bazerman, Katherine Coffman, Christine Exley, Erik Eyster, Drew Fudenberg, Nicola Gennaioli, Brian Hall, David Hirshleifer, Botond Kőszegi, Spencer Kwon, George Loewenstein, Michael Luca, Kathleen McGinn, Sendhil Mullainathan, Andrei Shleifer, Dmitry Taubinsky and seminar participants at BEAM, Berkeley, Boston College, Brandeis, CEU, Cornell, Dartmouth, ESSET, Harvard, LSE, McGill, Melbourne, Princeton, Stanford, University of New South Wales, University of Technology Sydney, and Yale. Gagnon-Bartsch and Rabin thank the Centre for Experimental Research on Fairness, Inequality and Rationality (FAIR) at the Norwegian School of Economics (NHH) for financial support and hospitality while conducting research on this paper.


“My experience is what I agree to attend to. Only those items which I notice shape my mind; without selective interest, experience is an utter chaos. ... [Attention] implies withdrawal from some things in order to deal effectively with others.” – William James

“The art of being wise is the art of knowing what to overlook.” – William James

“No theory is kind to us that cheats us of seeing.” – Henry James

“Truly nothing is to be expected except for the unexpected.” – Alice James

1 Introduction

Recent economic models explore the consequences of agents who misconceive the world. These models span a wide range of misconceptions—including psychological biases such as bad statistical reasoning, faulty social inference, and distorted beliefs about one’s personal traits, as well as cases where agents happen to have beliefs that mismatch the reality of a specific context. A common critique of such models is that people should discover their errors in the many cases where they will be exposed to events they think are unlikely or impossible. Someone who suffers from the gambler’s fallacy, for instance, will see more streaks than she expects. A person who neglects that others follow the crowd will see an unexpected degree of consensus. And a person who is naive about his self-control problems will indulge himself more than he anticipates. Won’t these error-makers notice that something is amiss?

To study this question, we build from a literature pioneered by Sims (2003) that analyzes the optimal allocation of costly attention. These models assume that agents have correct expectations, including about how attention will influence their beliefs. We depart from this literature in two interrelated ways. First, although we continue to assume agents direct attention by weighing its expected costs against its expected benefits, we assume that agents may deploy their subjectively rational attention based on mistaken priors. Our framework brings to the fore ways that attention costs and mistaken priors interact and also makes clear that theories of attention cannot render a priori errors irrelevant. If attention costs aren’t small enough to justify full-attention analysis, then economic predictions should depend on the pre-attentional errors that agents haven’t fully moved away from. In many cases, in fact, the main lesson of realizing attention is costly would seem to be that errors will remain by and large intact irrespective of the specific nature of attention costs.1

1By analogy, consider a party host trying to locate an absent-minded friend who may have (yet again) run out of gas by forgetting to fill his tank. If she asks if anyone knows where her friend departed from, she might not welcome a guest interrupting to suggest that everybody instead try to calculate the gas mileage of the friend’s car. And if another party-goer interjects “Hold on! We can’t begin to understand where your friend is without first studying how combustion engines work!”, then the host might rethink who she invites to parties as she sets off alone in search of her friend. The analogy is imperfect: one could imagine a (mean) host deeming “not here” a sufficient description of a missing guest; it’s hard to imagine an economist deeming “not the full-attention outcome” a sufficient description of the economy.



Our second departure is to re-purpose the study of costly attention to address a variant of our opening question: When exposed to similar recurring situations, will patterns she didn’t expect eventually alert an agent that her priors are mistaken? When addressing this question, one might doubt the relevance of bad priors after all. Knowing she will often encounter similar situations, an agent might find it worthwhile to pay enough attention early on to avoid repeated mistakes. A shopper may be inattentive to the price on any given day, yet initially invest enough in learning the likely range of prices to know when attention is warranted and what to expect when she doesn’t pay attention. In such settings, rational priors combined with limited attention to case-by-case particulars would seem a good approximation.

Yet even in such circumstances, there is a strong countervailing force, noted by Schwartzstein (2014), that motivates our approach. Unlike settings where people balance the costs of attention against correctly forecast benefits, the question at hand is whether somebody who doesn’t know she is making a mistake will discover she is doing so. Economizing on attention plainly implies you shouldn’t query available evidence for signs of errors you aren’t aware you are making. And even if you realize you’re likely making some unknown sizable mistake, you won’t allocate attention without a sense of where to direct it. People don’t constantly evaluate which variables and their co-variations are worthy of attention, nor do they attempt to imagine, formulate, and evaluate all the possible psychological biases they might succumb to.

The starting point for understanding why people don’t notice their errors is that they don’t notice most things. The cognitive scientist Alexandra Horowitz (2013) conveys this core truth well:

You are missing most of what is happening around you right now. You are missing what is happening in the distance and right in front of you. In reading these words, you are ignoring an unthinkably large amount of information that continues to bombard all of your senses. The hum of the fluorescent lights; the ambient noise in the room; the feeling of the chair against your legs or back; your tongue touching the roof of your mouth; the tension you are holding in your shoulders or jaw; the constant hum of traffic or a distant lawnmower; the blurred view of your own shoulders and torso in your peripheral vision; a chirp of a bug or whine of a kitchen appliance.

Our opening quote, by prominent early psychologist William James, notes that the impossibility of ubiquitous attention demands effective selectivity in attention. Attention research since then has focused on where people direct attention, not on how much attention they pay.2 An individual employs mental models of the world that guide her in selecting the features of each new situation to notice and which choices to make. But research also indicates that the models and intuitions guiding decision-makers can be flawed. The second quote by William James indicates that wisdom is necessary to direct our attention in the right direction, and his brother Henry cautioned that theories that help guide us also run the danger of blinding us to insights.3

2In common usage, “to pay attention” means to direct attention to (what somebody deems) the right thing. When teachers, parents, or partners implore us to “Pay attention!” it’s to rein in our wandering minds, not to rouse our dormant ones. (For that, they implore “Wake up!”) Even without knowing what topic they are competing with, the authority’s imperative means to stop attending to a friend’s tweet or internet news, or to quit ruminating over a recent argument or new romantic interest.




We apply our framework to cognitive errors and empirical mistakes that are widely studied, which match the notion of people bringing coherent but potentially flawed theories of how to make decisions in new situations. These errors can be framed with a particular structure: agents focus on a few key factors when making decisions, but miss the relevance of some things, misunderstand how—or even just how often—some things matter, or wrongly react to some irrelevant things. We establish how the errors embedded in the agent’s guiding theories are especially prone to going unnoticed. These errors are, in the language of Handel and Schwartzstein (2018), “mental gaps”—not knowing which information to process—and not “frictions”—avoiding correctly-assessed costs of such processing. Rather than studying agents who are selecting what to notice from a swarm of random variables—like a frog skillfully deploying its tongue to zap the choicest bits of information buzzing around it (or if lucky a juicy byte crawling past)—we study people who are unaware of the shortcomings in the mental models they employ.4

This paper formalizes some criteria to assess when people will eventually notice their errors. These criteria rest on the premise that a person channels her attention to information that she sees as relevant, and ignores information she deems irrelevant. Since she doesn’t pay careful attention to postulates she doesn’t think are true, she will notice her error only through “incidental learning”: only if evidence she gathers to resolve utility-relevant uncertainties within her (mis)conception of the world indicates something amiss with her conception will she necessarily notice her error. Our applications and general results examine which combinations of the true structure of the world, the agent’s perceived structure of the world, her goals, and the informational environment will lead her to notice an error.

3The role economic theory plays in making us miss the role of attention may be an illustration. One reason it can be hard for us economists to appreciate the attentional barriers to correcting mistakes is a byproduct of a valuable feature of our theories. Models of errors designed for integration into economics, like all of our formal models, strip away distracting complexities, so that economists can focus on the consequences of agents’ errors. This makes these barriers universally non-salient to economists: without keeping in mind that the agents being studied are not focusing on those consequences, it can seem like such models assume error-making agents are willfully ignoring useful information. One reaction to this misperception is to disbelieve that somebody would be that irrational. Another is to proffer attention costs that are so large as to deter an agent who is fully aware of the possible error from investing the attention needed to verify whether she is doing so. Our framework applies to the many cases where neither reaction seems warranted.

4We assume few readers of this article believe that horoscopes are truly informative. (Except Capricorns, who tend to be gullible.) You are well-advised not to reexamine your disbelief, but it may prove useful to consider why you don’t examine it—you are deploying the right attentional strategy given that you’re right. Would you notice if you weren’t right? The costliness of attention certainly prevents thinking through all potential astrological systems to check for validity. You haven’t investigated how the features of clouds on your 13th birthday might govern your life, and if it were true that your vaporological sign of “sorta puffy” means that “helping a stranger today will have unanticipated benefits for your career,” you would never notice. But nor have you investigated popular astrological systems, such as those based on the positions of the planets, the moon, and Pluto, just in case they are true. This is despite the abundant data on your friends’ birthdays, personalities, and important life events to let you do so. And although you certainly can bypass all the data collection by learning from those you trust, you haven’t thoroughly investigated whether you’ve chosen the right people to trust. It is noteworthy that both those who subscribe to astrology and those who don’t are following the wisdom of the ages, but have selected different wisdoms. (And of course both groups are ignoring the new-age wisdom of economists who note that a person’s astrological sign can be informative if they or those around them have behaved differently in life based on believing the astrology.)




After briefly reviewing related literature in Section 2, Section 3 provides two extended examples that introduce the basics of our model. Section 4 then begins formalizing the framework. We focus on errors that can be captured with “quasi-Bayesian” models: the agent is Bayesian with a misspecified prior π over a set of parameters, where π either assigns positive probability to impossible parameters, or zero probability to possible parameters. We assume the agent’s available attention strategies include all ways to coarsen the information that she has access to. We define a “sufficient attentional strategy” (SAS) as one in which the agent attends to all available information that she perceives as helpful for maximizing long-run payoffs.

Two strong assumptions about access to information simplify the setup, analysis, and interpretation of our framework. First, we assume the agent can observe the payoffs generated by each of her potential choices, not just the one chosen. By eliminating the effect of her actions on information exposure, we can distinguish between the effects of limited attention versus limited information that might arise endogenously from experimentation.5 Second, we assume that the agent can freely access past information she deems useful at the moment, even if she did not previously take note of that information. This could happen if the agent encodes to memory all that happens in a retrieval-friendly way even if she does not make an effort to notice, or (more realistically) the environment is such that relevant historical data is available at later dates (e.g., it is archived in a database). In Appendix A we formally compare this assumption to the alternative where the agent can only access information later if she attends to it immediately. An earlier draft, Gagnon-Bartsch, Rabin, and Schwartzstein (2018), provides a more complete analysis and shows that, given our focus on long-run questions of whether a person eventually wakes up to her error, results under this alternative are broadly similar to results assuming perfect encoding.

The definition of a SAS does not rule out that the agent attends to information she believes is useless. To reflect a stronger notion of limited attention, we primarily focus on the implications of employing a “minimal” SAS, whereby a person notices no more than she finds useful. For instance, consider a manager who wants to learn a worker’s success rate at a given task. If she believes this rate is constant over time, then she will find it sufficient to notice the worker’s historical success frequency while ignoring temporal trends. Or consider a gym member who wants to assess whether her membership is worthwhile. If she is naive about her self-control problem (thinking she’ll go exactly when warranted), then she will find it sufficient to notice the frequency she skips the gym without analyzing whether the skips are from laziness or from busyness. And if she is certain that her membership is worth it, then she won’t take note of her attendance rate at all. A simple result is that a person following a minimal SAS never notices an event she thought was impossible.6

5Although this is unrealistic, our long-run predictions would extend without this assumption so long as the agent either occasionally faces restrictions on her choice set or observes the experiences of other agents.




The final step in our framework uses the notion of a minimal SAS to address our motivating question: When will a mistaken agent notice that information she is exposed to indicates that an alternative explanation for what she is seeing is far more likely, even when she pays attention only to information that she finds relevant under her mistaken theory? The assumption that the agent judges her theory’s plausibility by comparing it to an alternative theory reflects the fact that in any dynamic setting with a persistent role of chance, any particular history occurs with infinitesimal probability in absolute terms. Thus, the agent does not abandon her model simply based on the absolute unlikeliness of an observation, but does so when that observation is far more likely under a compelling alternative. In the limit case that we are considering, “more likely” will almost always mean infinitely more likely, and so we say that a theory π is “attentionally inexplicable” relative to an alternative theory λ if the limit noticed data is infinitely more likely under λ than under π. In most of our analysis, we take λ to be the correct model; when doing so, we simply use the term “attentionally inexplicable” without an explicit reference to λ. We include the more general λ in our framework to emphasize the role that a contemplated alternative must play in judging an explanation implausible.
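In the noticed-history notation introduced in Section 4 (where nt denotes the data the agent has noticed through period t), one rough way to express this condition (a sketch rather than the paper’s formal definition) is:

P(nt | λ) / P(nt | π) → ∞ as t → ∞ (almost surely under the true data-generating process),

so that the limit noticed data becomes infinitely more likely under the alternative λ than under the agent’s theory π.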

From this, we define a tractable and disciplined “solution concept” designating an erroneous theory π in any specified decision problem as attentionally stable if there exists any SAS where the correct model does not end up looking infinitely more likely than π. This lexicographic test of positing full attention to any information that could be of any help, and allowing no attention to any other information, obviously leans heavily on the formulation we employ to capture a person’s bad model of the world. The notion that the agent puts zero weight on the true model may well seem knife-edge, as it would predict the agent discovers her error if she puts the slimmest of odds on the true model being correct. But, besides putting zero weight on the truth, our approach also assumes negligible cost of attention. If the cost of attention were non-negligible, then any weight she puts on the truth would need to be big enough to justify the attention.

At the end of Section 4, we also define a notion of explicability that would be applicable if full attention were automatic. By comparing this to attentional explicability, we show how “forcing” an agent to pay attention to information she deems unuseful dramatically narrows the set of mistakes that could go unnoticed. Indeed, this notion of inexplicability with full attention can be seen as formalizing (and often validating) the common critique we began with, yet clarifying how misleading it can be when taking even small costs of attention into account.

6This feature has relevance for existing approaches to handle such “impossibility” in models of mistakes. The most common fix is to create an environment where no perceived-impossible events happen. For early examples, see Barberis, Shleifer, and Vishny (1998), where the measurability condition holds in a natural way, and Rabin (2002), where features of the model are contrived to rule out such events. Another common approach, employed for instance in models of naivete about present bias as formulated by O’Donoghue and Rabin (2001)—which typically generates behavior by an agent that she predicted was impossible—is to simply ignore the issue. This could be viewed as reflecting channeled attention under a minimal SAS.




In Section 5 we apply the framework to some errors familiar from existing theoretical research. These include naivete about present bias, mislearning from others by neglecting the correlation in their opinions (as in DeMarzo, Vayanos, and Zwiebel 2003 and Eyster and Rabin 2010, 2014), and empirical misconceptions about patterns in stock returns (as in Barberis, Shleifer, and Vishny 1998). For these errors, we demonstrate environments in which people wake up to their mistake and environments in which they do not.

While these examples highlight errors that are stable in some settings but not others, Section 6 describes classes of errors that are attentionally stable irrespective of the decision context—a feature we call preference-independent attentionally explicable (or PIAE, pronounced as Americans pronounce “paella”). For incidental learning to reveal a costly mistake, the agent must want to pay attention for her own purposes; that is, she must face some uncertainty within her theory that she wants to reduce. For instance, if a person is confident that managed funds always outperform index funds, she may invest in managed funds without noticing that her guiding theory is false. But there are non-doctrinaire errors that are PIAE as well. We show that PIAE models are typically “overly coarse”—they ignore relevant outcomes or predictive signals. Erroneous models that are not PIAE, on the other hand, typically make the right distinctions or are “overly fine”—they place importance on truly irrelevant outcomes or signals.

In Section 7, we analyze some general principles of our framework that play out across settings. When asking which combinations of preferences and available actions promote incidental learning, we show that combinations that encourage the agent to track the data in a way that reveals the error do not have a tight connection with the costliness of the error. A trader who needs to justify trading decisions by reporting precise earnings forecasts is more likely to discover an error than a trader with more at stake but no such reporting requirement. We show, more generally, that environments requiring a person to engage as she goes along with features of the data she has seen tend to induce an agent to notice her error. One intriguing consequence of this principle is that (in the spirit of a laboratory experiment by Esponda, Vespa, and Yuksel, 2020) somebody can be better off when she is initially missing some key information, because this uncertainty induces her to pay continual attention to what she is experiencing in order to resolve it.



2 Relationship to the Literature

To put our framework in a broader perspective, we provide a cursory discussion of some strands of psychological and neurological research on attention, as well as economic research motivated by (and contributing to) these strands. In doing so, we flag both the forces that we embed in our framework and those we omit.

Much of the psychological research identifies aspects of stimuli, such as visual features, that tend to induce or escape attention, but different strands have emphasized very different themes. A substantial literature emphasizes the notion of “attentional capture”, whereby certain stimuli, such as unusual dress, tend to capture attention independent of the instrumental value of noticing them.7

Another prominent line of research (see Dehaene 2014 and Chater 2018 for reviews) highlights the surprising degree to which seemingly conspicuous stimuli may go unnoticed. Most famously, when Simons and Chabris (1999) tasked people with counting the number of passes in a film clip of basketball players, they often failed to notice a gorilla walking across the court. Such “inattentional blindness” to gorillas is not limited to novice observation: Drew, Vo, and Wolfe (2013) illustrated that many experienced radiologists failed to notice images of gorillas superimposed on lung x-rays they were screening for cancerous nodules—even though the gorillas were 48 times larger than the average nodule that they were expertly identifying, and even when eye-tracking demonstrated that they looked directly at the gorilla. Some research in this latter tradition illustrates our assumption that people can parse data in a way that ignores distinctions they deem irrelevant, even when they must notice and even act on those details at a more literal and proximate level. For instance, Johansson et al. (2005) had experimental participants choose which of two faces was more attractive, but found a considerable fraction defended choices as recorded by the experimenters without noticing that those recorded choices did not match their previously stated preferences.

It can be challenging to draw on some of this rich evidence from attention research—where it can be unclear what attentional strategies would be optimal in the contexts studied—for lessons about how attention might distort economically consequential inferences people draw from their environment. Recent economic models have begun to meet this challenge by drawing on the insights from these different strands of psychological and neurological research. Bordalo, Gennaioli, and Shleifer (2012, 2013, 2020) build on themes explored in the attentional-capture research to formalize ways in which surprising features of a context may receive disproportionate weights in decisions. Continuing this idea that stimulus-driven attention plays a role in economic choices, Li and Camerer (2021) show that machine-learned algorithms trained to predict which features of visual scenes attract attention also predict choices in experimental games.

The pervasive theme from psychological research on how allocation of attention reflects a person’s goals has been embraced, clarified, and formalized by recent models of rational inattention by Sims (2003), Woodford (2012), Caplin and Dean (2015), Matejka and McKay (2015), and others.8 The notion of sparsity following Gabaix (2014) has likewise elaborated on the calculus of optimal attention.9 And Payzan-LeNestour and Woodford (2020) experimentally demonstrate that a form of “outlier blindness”, whereby observers do not distinguish between improbable extreme values along a dimension, may be consistent with rational information processing.

7See Bordalo, Gennaioli, and Shleifer (2021) for a review targeted to economists.




Although we (crucially) differ in allowing attention to be guided by misspecified priors and models, our framework is in this tradition of goal-directed attention. As such, we share the clear drawback of ignoring all the ways that stimuli might draw attention that have little to do with their instrumental value. But there is one form of such stimulus-driven attention that we abstract from that is arguably clarifying: we make no a priori assumptions about how agents divide the world into variables. Relative to Schwartzstein’s (2014) stylized setting—where a person channels her attention based on wrongly thinking some predefined variable is unimportant—or other approaches that take variables as exogenous such as Gabaix (2014), our “SAS” approach is less restrictive: when following a SAS, the agent recalls a coarsening of the true history each period that she deems sufficient for any future decision. This highlights that what a person considers to be a variable potentially worth tracking is itself driven by her theory. For example, a manager’s beliefs about whether rude employees are less productive team members depend on how she organizes the complexity of conversations into a variable called “being rude”. It also captures the important feature that one can pay attention to aspects of variables without noting everything relevant about them, or pay attention to them momentarily without noticing long-run statistics. Moreover, our flexible approach allows us to apply the framework to a wide range of erroneous models. Our framework is clearly unrealistically agnostic, and occasionally exotic, in how we let agents divide the world, but there is a strong sense in which we are deriving the full attentional strategy endogenously from the agent’s model of the world.

The focus on how our theories channel our attention clarifies how it’s simultaneously true that surprising features of the data often grab our attention, while unexpected features do not. The word “surprise” embeds the act of noticing. The exposure to unexpected events is, per Alice James’s quote, an inevitability in life. Of the infinity of unexpected things that happen to us at every moment, it is only those things that our theories focus on that typically attract our attention. Attempting to drive north from Paris, you’ll be jolted awake if you see signs you’re entering Marseille, yet you could only get there by also mindlessly driving by Provencal villages you’ve never heard of (and thus couldn’t expect to see). Likewise, research in perception indicates that unexpected outcomes are more likely to be incidentally noticed if they share features with what people are looking out for. Most et al. (2005), for example, find that a person is much more likely to notice an unexpected black circle if instructed to attend to circles or black objects than if instructed to attend to squares or white objects.

8For a review, see Mackowiak, Matejka, and Wiederholt (2021).

9Unlike these other papers, Gabaix (2014) allows for agents to have priors in new situations that are incorrect, but assumes that the attentional strategy is guided by the true model.




Our theory’s notion of incidental learning builds on the perceptual intuition by asking not whether a single observation wakes people up, but rather whether statistics of multiple observations do. Whether or not people are surprised by a very unexpected gorilla walking among basketball players is directly a matter of whether they noticed it. But if a manager is very confident that an employee is rude only 10% of the time, and believes the tolerable threshold is 20%, then she might not count how often the employee is rude overall even if she separately notices and condemns each individual act of rudeness. Likewise, Hanna, Mullainathan, and Schwartzstein (2014) illustrate an example where seaweed farmers don’t notice the importance of the pod sizes they plant for their yields. This is despite their own experience of planting the pods (which varied considerably in size) and reaping the profits: the farmers failing to learn were using all the relevant data, without tracking relationships that would’ve allowed them to wake up to the importance of pod size for yield.

In such examples what people miss are statistical gorillas. These gorillas are “statistical” because what the person misses is the mismatch over time between his expected distribution of some outcome and the actual one, without any individual observation itself being a big surprise. A person might be able to answer on a given day whether he forgot to take his pills without tracking how often he forgets. A trader might carefully track earnings each day, without carefully examining whether the sequence of earnings conforms to her assumptions about the earnings process.

Our framework is related to several other strands of economic research that explore the persistence of erroneous beliefs. The logic behind why channeled attention prevents noticing of errors is similar to self-confirming equilibrium (Fudenberg and Levine 1993) and why people can maintain incorrect beliefs about options they rarely experiment with in “bandit problems” (Gittins 1979). This literature emphasizes that wrong initial beliefs can persist because of frictions in data collection, whereas we show how such beliefs may persist because of failures to notice and process data that a person is exposed to.

Our question of when a person rejects her theory also connects to related models of paradigm shifts and “testability” (e.g., Hong, Stein, and Yu 2007; Ortoleva 2012; Al-Najjar and Shmaya 2014). While studying paradigm shifts has the flavor of analyzing when people wake up to errors, those papers do not study the interaction between waking up and inattention. More recently, Fudenberg and Lanzini (2020) consider the evolutionary fitness of misspecified models, deeming a model unstable if certain mutations of that model both allow people to better explain their observations and yield higher payoffs than the original model. (See their paper for a more detailed comparison with our approach, as well as a model that combines both approaches.)



3 Some Simple Illustrative Examples

This section presents two illustrative examples. They introduce the basic components of our framework and give a flavor for some of our results.

3.1 Example 1: The Forgetful Patient

Our first example addresses a basic question: how could someone hold false beliefs about his own repeated behavior, which he can clearly observe? We consider a person who mispredicts how often he forgets to take his medication and thus forgoes opportunities to invest in beneficial reminders.

Each period, a patient forgets to take his pills with probability θ ∈ (0,1). Forgetting is costly; each time he forgets, he incurs a cost c > 0. But he can make an investment at cost χ > 0 that reduces how often he forgets. The technology works by alerting the patient to a missed dose, which then leads him to take the pill with probability α ∈ (0,1). So, when using the technology, the patient forgets to take the pill with overall probability θ(1−α) < θ. The patient’s expected flow utility with and without the technology is then −θ(1−α)c − χ and −θc, respectively. He should use the technology if θ > k ≡ χ/(αc).
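As a quick numerical check of this threshold condition, here is a minimal Python sketch; the parameter values below are hypothetical and chosen only for illustration.

```python
# Forgetful-patient investment problem: compare expected flow utilities
# with and without the reminder technology. All numbers are hypothetical.
theta = 0.30   # per-period probability of forgetting
c     = 10.0   # cost of each missed dose
chi   = 1.5    # per-period cost of the reminder technology
alpha = 0.80   # probability an alert leads the patient to take the pill

u_without = -theta * c                      # expected flow utility, no technology
u_with    = -theta * (1 - alpha) * c - chi  # expected flow utility with technology

k = chi / (alpha * c)  # investing is optimal iff theta > k
print(f"k = {k:.4f}; invest? {theta > k}")            # k = 0.1875; invest? True
print(f"u_with = {u_with}, u_without = {u_without}")  # -2.1 vs -3.0
```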

Suppose the patient forgets often enough that he should in fact use the reminder technology; that is, the true value of θ, labeled θ∗, satisfies θ∗ > k. Treating this rate as a constant, the erroneous theory we consider is that the patient, like many individuals, underestimates how often he forgets: his prior π over θ places probability one on θ < k.10 This situation is depicted in Figure 1.

[Figure 1 depicts the unit interval of forgetting rates θ, split at the threshold k: for θ below k the optimal action is x∗ = Don’t Invest, and for θ above k it is x∗ = Invest; the support of π lies below k while the true rate θ∗ lies above k.]

Figure 1: The figure shows the patient’s optimal decision as a function of his rate of forgetfulness, θ. The red line depicts the support of the patient’s misspecified model while the point θ∗ denotes the true parameter value.

Will the individual wake up to his forgetfulness? A common intuition is that, despite being dogmatic that θ < k, he should: he’s observing himself forget at a much higher rate than he thought possible. Our concept of explicability with full attention will reflect this common intuition. Given that the data identifies the true rate of forgetting in the long run, the person’s misspecified model π is inexplicable relative to an alternative “light-bulb” model that puts weight on θ∗. The patient is sufficiently surprised by how rarely he takes his pills that he wakes up to his error.

10For experimental evidence that people do not fully predict future forgetting, see Ericson (2011).




But if the patient thinks he forgets so rarely that it’s not worth investing in a reminder technology, why would he pay attention to exactly how often he forgets? After all, under his wrong model of his own forgetfulness, π, he doesn’t see any reason to interrogate himself about how often he’s taken his pills—he thinks he’s doing fine enough.

Applying our notion of channeled attention that we formalize below, the patient must follow a sufficient attentional strategy (SAS) that notices all seemingly useful data to take optimal actions (according to π), but can ignore the rest. The patient’s SAS in this case is very simple: he does not think he needs to notice any statistics relating to his rate of forgetting, so noticing nothing about this rate constitutes a SAS given π. Although the patient’s long-run empirical forgetting rate would seem shockingly large, his model is not falsified when he ignores this data. Thus, under this coarse SAS, the patient’s model π is attentionally explicable. And, in an attentionally-stable equilibrium, the patient persistently makes costly mistakes by forgoing the reminder technology.11

In the context of people failing to take their medication, much evidence supports the basic mechanism that people persistently think they adhere more often than they do. For example, patients’ self-reported adherence in interviews is typically much greater than their electronically-measured adherence (Garber et al. 2004). While some (or all) of this might reflect patients not wanting to admit their failure to adhere, it is noteworthy that interviews are typically carried out by research personnel rather than the patients’ clinician and that patients’ self reports are more accurate when they are elicited through daily diaries instead of interviews (Garber et al. 2004).

This last point suggests a change to the environment that would jolt the patient into discovering his error: incentivizing him to track his adherence and periodically review this data with the doctor. Specifically, imagine that every T ≥ 1 periods, the doctor creates incentives for the patient to notice his precise rate of pill-taking. In this environment, π becomes attentionally inexplicable relative to a lightbulb model that places weight on θ∗. The patient will be so shocked by how low his empirical rate of pill-taking is (only fraction 1−θ∗ of the time!) that he will wake up to his own forgetfulness. This example illustrates our basic premise. A person does not wake up simply based on how surprising he would find the data if he paid attention to all the details—in both situations above, the outcome data are identical. Instead, a person wakes up when the data that guide actions within his wrong model are sufficiently surprising.
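To see the likelihood-ratio logic numerically, consider the following hedged simulation sketch (all parameter values hypothetical): once periodic review forces the precise pill-taking rate into the noticed data, the data’s likelihood under a light-bulb alternative at θ∗ outgrows its likelihood under the dogmatic theory without bound.

```python
# Log-likelihood ratio of a light-bulb model (theta*) over the patient's
# dogmatic theory, computed from the noticed count of forgotten doses.
# Illustrative values; the ratio grows roughly linearly in n.
import math, random

random.seed(0)
theta_star, theta_pi = 0.30, 0.10  # true rate vs. rate the prior insists on
n = 1000
forgets = sum(random.random() < theta_star for _ in range(n))

def loglik(k, n, theta):
    # log P(k forgets out of n | theta), dropping the shared binomial term
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

llr = loglik(forgets, n, theta_star) - loglik(forgets, n, theta_pi)
print(f"forgot {forgets}/{n}; log-likelihood ratio = {llr:.1f}")  # large and growing
```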

11This may shed light on recent studies finding that the provision of free reminder technologies (which patients must still set up on their own) may not improve adherence (Volpp et al. 2017; Choudhry et al. 2017). For patient-controlled reminders to be effective, patients need to be aware of their forgetfulness, and (for the many who are aware), they must recognize that the extent of their forgetfulness warrants the effort to set up their reminders.



3.2 Example 2: The Patient Who Worries About Non-Existent Side Effects

Our second example illustrates how stable errors may lead people to see false patterns in the data. Consider a patient who thinks that taking a drug possibly causes some symptom (e.g., headache, nausea, pain, dry mouth, etc.) and neglects the likelihood of feeling that symptom without taking the drug. Such a patient may come to exaggerate the severity of side effects and perceive a false relationship between taking the drug and experiencing the symptom.12

More specifically, imagine a patient learning about the likelihood of a particular symptom while taking a medication. Each period there is a probability v > 0 that he must choose whether to use the treatment (e.g., he feels sick on a fraction v of days). Suppose the treatment has a known benefit b > 0. Each day the patient may also experience a symptom that imposes cost c > 0; this happens with probability θT ∈ [0,1] if the patient took the medication that day and with probability θN ∈ [0,1] otherwise. The patient’s choice to use the treatment thus depends on his beliefs over (θN,θT): he uses it if the benefit, b, outweighs the estimated cost of side effects, c(θT − θN).13

Finally, assume the patient can observe (from news media or word-of-mouth) the choices and outcomes of others, so he has rich enough data to learn parameters θT and θN independent of his own behavior.14

Suppose the patient is dogmatic about the likelihood of the symptom when he does not take the medication; that is, he is (perhaps wrongly) sure that θN = θ̄N. Low values of θ̄N capture the belief that the symptom does not typically happen on its own and is thus likely caused by the treatment (i.e., a side effect). Because the patient is dogmatic about θN, he feels free to ignore his condition while not using the medicine. In contrast, assume the patient is uncertain about the possibility of side effects, and thus pays attention to the frequency of the symptom when using the medicine. This corresponds to a model π over (θN,θT) that (i) is degenerate on θ̄N, (ii) non-degenerate over θT, and (iii) includes values of θT in its support that render the patient uncertain if the medication is worthwhile given θ̄N. Letting (θ∗N,θ∗T) denote the true values of (θN,θT), we further assume θ̄N ≠ θ∗N and (θ̄N,θ∗T) ∈ supp(π).15

Under the seemingly sufficient attentional strategy that ignores outcomes while not using the medicine, the patient will eventually act as if he learns the true θ∗T while maintaining his false belief θ̄N. Such erroneous beliefs will cause the patient to ultimately refuse treatment whenever θ̄N < θ∗T − (b/c) < θ∗N: he exaggerates the extent of side effects to such a degree that medication seems too onerous to use. Furthermore, the patient not only misperceives the treatment’s value in this case, but may also act as if he believes in a false causal relationship. Consider the case where θ∗N = θ∗T, so the medication does not contribute to the symptom whatsoever. If θ̄N < θ∗N, the patient acts as if θT = θ∗N and thus as if the symptom is more likely under treatment. The difference θ∗T − θ̄N = θ∗N − θ̄N > 0 captures the patient’s perceived effect of the medication on the symptom, which in truth is zero.

12See, e.g., Barsky et al. (2002) and Hauser, Hansen, and Enck (2012) on the “nocebo” effect.

13More precisely, the patient’s flow utility on a day where he chooses x ∈ {T,N} is equal to u(x,y) = b·1{x = T} − c·1{y = 1}, where y ∈ {0,1} is an indicator for whether the patient experiences the symptom that day.

14This assumption that the patient continually receives feedback on both options independent of his action simplifies matters since in each round it is optimal to take the myopically-optimal action that maximizes current flow utility.

15The punchline of this example still holds if the patient is not dogmatic about θN but thinks that nothing he might learn about θN will impact his decision regarding treatment. If the patient’s bad theory instead involves sufficient uncertainty over θN so that he feels compelled to learn θN to make optimal decisions, he would incidentally learn his theory is wrong even when the uncertainty is miscalibrated (e.g., π is concentrated on values of θN less than θ∗N).




In more intuitive terms, the patient incorrectly thinks he knows how he feels when he does not take medication but is uncertain how he feels when he does. This leads him to pay careful attention to how he feels while using a medicine but ignore symptoms associated with side effects when not using it. Thus, if the patient underestimates the base-rate of some ailment associated with side effects, then he will over-attribute its occurrence while using the medicine to a side effect rather than coincidence. When the treatment in truth has no effect on the ailment, this mistake generates an illusory causal relationship: the patient wrongly thinks the medicine causes the ailment.
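The following simulation sketch illustrates the mechanism under assumed numbers (the symptom rate is identical with and without treatment, yet the dogmatic patient perceives a sizable side effect); the parameter values are hypothetical.

```python
# Illusory causation from one-sided attention: the patient only tracks
# symptoms on treatment days and is dogmatic about no-treatment days.
import random

random.seed(1)
theta_true  = 0.20   # true symptom rate, the same on all days
theta_N_bar = 0.05   # dogmatic (wrong) belief about no-treatment days

treat_days = symptoms_on_treat = 0
for day in range(10_000):
    treated = random.random() < 0.5          # whether he medicates that day
    symptom = random.random() < theta_true   # symptom arrives regardless
    if treated:                              # he attends only to these days
        treat_days += 1
        symptoms_on_treat += symptom

theta_T_hat = symptoms_on_treat / treat_days  # converges to 0.20
print(f"learned theta_T = {theta_T_hat:.3f}; "
      f"perceived effect = {theta_T_hat - theta_N_bar:.3f} (true effect: 0)")
```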

The misspecified model here, in fact, is attentionally stable in a stronger sense than the previous example. Take the case where θ̄N = 0—the patient thinks side-effect-like symptoms only arise with treatment. Even if a doctor required the patient to record how often he experiences these symptoms, the patient thinks he can accurately do this without auditing how he feels on days where he skips the medication. Simple modifications to incentives that induced the misspecified agent to “wake up” in the previous example do not similarly work here. Below, we partially characterize the sorts of errors that (like this one) are stable across a wide range of decision environments and those that (like the previous one) are stable in a narrower set of environments. The intuition behind why this error is particularly stable is that it entirely ignores the possibility of certain outcomes (e.g., feeling some symptom when not taking a drug), while other errors anticipate the right set of potential outcomes but are merely miscalibrated on their probabilities.

4 Framework

This section formalizes our framework. We first set up the learning environment in Section 4.1 and then develop our model of channeled attention in Section 4.2. In Section 4.3, we present our criteria for assessing whether an attention-channelling decision-maker will discover his errors.

4.1 Environment

Consider a person updating his beliefs over a parameter θ ∈ Θ that influences the distribution of payoff-relevant outcomes. For instance, in the forgetful patient example above, θ is the baseline rate of forgetting; in the side-effects example, θ represents the probability of experiencing a symptom with and without medication. More broadly, parameter θ might be a feature of the person’s surroundings or measure the extent of his biases, such as naivete over present bias.




As depicted in Figure 2, each period t = 1,2, ... is structured as follows: the person (i) receives a signal st ∈ St correlated with θ, (ii) takes an action xt ∈ Xt, and (iii) can see a realized outcome, or “resolution”, rt ∈ Rt. In terms of the side-effects example above, xt represents the patient’s choice to use the treatment in round t, and rt ∈ {0,1} indicates whether he then experiences a symptom. (There were no signals in that example.) Let yt = (rt,st) denote the data generated in round t; we call yt an “observable”. At the end of each period t, the person earns payoff ut(xt,yt |ht), which depends on the current action and observable, as well as (in some cases spelled out below) the history ht. In addition to determining period-t’s payoff, the observable yt provides information about θ and the optimal action in future periods.

Although some of our applications relax these restrictions, for simplicity of presenting and proving our formal results our general analysis assumes that Θ and each Yt ≡ Rt × St are finite valued, each Xt is compact, and ut(xt,yt |ht) is continuous and bounded in xt for all yt and ht.

[Figure 2 depicts a timeline: period t begins → realize signal st ∈ St → take action xt ∈ Xt → realize outcome rt ∈ Rt → earn payoff ut(xt,yt |ht) → period t ends.]

Figure 2: Timeline of events within period t.
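For concreteness, a period’s data can be encoded as below; this is an illustrative sketch, and the class and field names are ours rather than the paper’s.

```python
# Sketch of the within-period objects from Figure 2 (illustrative names).
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class Period:
    s: Any    # signal s_t, drawn correlated with theta
    x: Any    # action x_t, taken after the signal
    r: Any    # resolution r_t, realized after the action
    u: float  # flow payoff u_t(x_t, y_t | h_t)

    @property
    def y(self) -> Tuple[Any, Any]:
        # the period's "observable" y_t = (r_t, s_t)
        return (self.r, self.s)

# A history h_t consists of the current signal plus all past periods' data.
past: List[Period] = []
```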

For each t, let the history ht denote all the data possibly observed prior to period t’s action:

ht ≡ (st, yt−1, xt−1, yt−2, xt−2, . . . , y1, x1).

Let Ht be the set of all such histories up to period t, and let H ≡ ∪∞t=1 Ht. Since ht includes the period-t signal (for reasons that will be useful below), we additionally let ht(¬st) denote all data prior to period t (i.e., ht excluding st).

The observables are correlated with parameter θ. Conditional on θ and the history, st is drawn according to distribution Ps(st |ht(¬st),θ). The subsequent resolution, rt, may depend on the signal and action realized in t, and we denote its distribution by Pr(rt |xt,ht,θ). These two distributions form a joint distribution over observables denoted by P(yt |xt,ht(¬st),θ). Let π∗ ∈ ∆(Θ) denote the true probability distribution from which nature draws θ, and let θ∗ denote the realized value. In sum, the decision environments studied in this paper are described by the tuple (Θ, ×∞t=1 Xt, ×∞t=1 Yt, ×∞t=1 ut, P, π∗).

We assume the person begins with a “misspecified” model of the world. Specifically, the person correctly knows the likelihood of outcomes conditional on the parameter but has incorrect beliefs about which parameters are possible.16 Let Θ denote the set of all parameters that either the person or modeler deem possible. A misspecified model (or “theory”) is a prior belief over parameters π ∈ ∆(Θ) such that supp(π) ≠ supp(π∗).17 Misspecified theories therefore either place positive weight on parameters that never occur or no weight on parameters that might actually occur. While this assumption may raise concerns that the agent in our framework trivially fails to learn because he does not entertain the true parameter in his model, this is not the case: it will be clear below that, with full attention, the agent generally discovers his model is misspecified.

We assume that starting with a misspecified model is the person’s only mistake. He updates according to Bayes’ Rule given his prior π (when possible) and chooses actions that maximize his expected lifetime utility with respect to those updated beliefs.

Unless otherwise noted, we assume that the person’s actions do not affect what he can learn:

Assumption 1. For all t ∈ N, ht ∈ Ht, xt ∈ Xt, yt ∈ Yt, and θ ∈ Θ: P(yt |xt,ht(¬st),θ) = P(yt | ȳt, θ), where ȳt ≡ (yt−1, yt−2, . . . , y1).

Assumption 1 implies that any learning failure arises from insufficient attention, not insufficient experimentation. Unless explicitly noted otherwise, we also assume that payoffs in period t are independent of the history:

16Many recent models, spanning a wide range of errors in both single-person and social judgments, take this “quasi-Bayesian” approach. Examples include Barberis, Shleifer, and Vishny (1998) on stock-market misperceptions; Rabin (2002), Rabin and Vayanos (2010), and He (2019) on the gambler’s and hot-hand fallacies; Benjamin, Rabin, and Raymond (2016) on the non-belief in the law of large numbers; Spiegler (2016) on biases in causal reasoning; and Heidhues, Kőszegi, and Strack (2018) on overconfidence. Examples of biases about misreading information include Rabin and Schrag (1999) on confirmation bias; Mullainathan (2002a) on naivete about limited memory; and Bushong and Gagnon-Bartsch (2021) on attribution bias. Models of coarse or categorical thinking include Mullainathan (2002b), Jehiel (2005), Jehiel and Koessler (2008), Fryer and Jackson (2008), Mullainathan, Schwartzstein, and Shleifer (2008), and Eyster and Piccione (2013). Models that incorporate errors in reasoning about the informational content of others’ behavior include Eyster and Rabin (2005), Esponda (2008), and Eyster, Rabin, and Vayanos (2018); models exploring failures to understand the redundancy in such content in social-learning settings include DeMarzo, Vayanos, and Zwiebel (2003), Eyster and Rabin (2010, 2014), Bohren (2016), Gagnon-Bartsch and Rabin (2017), and Bohren and Hauser (2021). Camerer, Ho, and Chong (2004) and Crawford and Iriberri (2007) provide models incorporating false beliefs about others’ strategic reasoning. Misspecified models have also been considered in specific applications, such as firms learning about demand (Kirman 1975; Nyarko 1991) and macroeconomic forecasting (Sargent 1993; Evans and Honkapohja 2001). Further from the quasi-Bayesian approach, other models posit inconsistencies in a person’s beliefs across periods. Although below we translate it to something that fits in our framework, naivete about one’s future self-control falls within this broader category. The model of projection bias in Loewenstein, O’Donoghue, and Rabin (2003) similarly posits that somebody may have systematically different beliefs about future tastes as a function of fluctuating contemporaneous tastes. Similar models of interpersonal projection bias include Madarász (2012) on information projection and Gagnon-Bartsch (2016) and Gagnon-Bartsch, Pagnozzi, and Rosato (2021) on taste projection.

17A misspecified model could in principle have the same support as the true model. However, given our assumption that Θ is finite, our analysis will make clear that the only case of interest is supp(π) ≠ supp(π∗).




Assumption 2. For all t ∈ N and θ ∈ Θ, the action set Xt is independent of ht, and for all xt ∈ Xt, yt ∈ Yt, and ht ∈ Ht, the payoff ut(xt,yt |ht) is independent of ht.

Without an incentive for experimentation, Assumption 2 ensures that myopically optimal actions are in fact long-run optimal. When this assumption holds, we write ut(xt,yt |ht) simply as ut(xt,yt).

4.2 Channeled Attention

As discussed in the introduction, people tend to direct their attention to subjectively task-relevant information and away from other information (see, e.g., Chun, Golomb, and Turk-Browne 2011 and Gazzaley and Nobre 2012 for reviews).18 To model channeled attention, we assume the person notices and remembers coarse statistics about the history rather than its exact value. For each t = 1,2, . . . , let Nt partition the set of histories, Ht. Following ht ∈ Ht, the person recalls only the element of Nt containing ht, denoted by nt(ht). We call nt(ht) the noticed history. As in the side-effect example from Section 3, the patient sees no purpose in tracking his symptoms on days when he didn’t use the medicine; hence, nt(ht) need not distinguish outcomes that occurred on those days. Similarly, the forgetful patient has no incentive to track his precise rate of pill-taking; nt(ht) thus encodes whatever coarsening of that rate is, in his mind, sufficient to guide his actions.
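A noticing strategy can be thought of as a map from full histories into the cells of a partition; the sketch below encodes two such maps for the forgetful patient, with illustrative names and a hypothetical threshold.

```python
# Noticing partitions as functions of the full history (illustrative).
from typing import List

History = List[int]  # 1 = forgot the pill that period, 0 = took it

def notice_nothing(h: History):
    # The patient's minimal SAS: every history lands in the same cell,
    # so nothing about the forgetting rate is noticed.
    return None

def notice_threshold(h: History, k: float = 0.1875):
    # A finer partition with two cells: whether the empirical forgetting
    # rate exceeds the investment threshold k (the rate itself is not kept).
    return (sum(h) / len(h) > k) if h else False

h = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(notice_nothing(h), notice_threshold(h))  # None True (rate 0.5 > 0.1875)
```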

Figure 3 amends the timeline of an individual period (Figure 2) by including the information coarsening that happens each round. Prior to taking action xt, the person summarizes all prior data into what he believes is a "sufficient statistic", nt(ht), and then he uses nt(ht) to guide his action.

[Figure 3: Timeline of events within period t, including the coarsening of past information. Period t begins → realize signal st ∈ St → notice coarsened history nt(ht) ∈ Nt → take action xt ∈ Xt → realize outcome rt ∈ Rt → earn payoff ut(xt, yt | ht) → period t ends.]

18A basic psychological fact is that attention and memory do not act like cameras that faithfully record all we see; in fact, we attend to and remember only a small subset of available information. Dehaene (2014), Chater (2018), and others review the literature and highlight how goal-oriented attention reinforces the inherent constraints on attention, arguing that our "narrow channel of consciousness" is surprisingly effective at blocking out information we're not looking out for.


A noticing strategy N is the full sequence of the person's noticing partitions, N = (N1, N2, . . .). This strategy specifies for each point in time what the person has noticed conditional on the true history. We assume the person is aware that he selectively notices and recalls information. More formally, define an attentional strategy as a pair φ = (N, σ) combining a noticing strategy with a behavioral strategy σ = (σ1, σ2, . . .), where σt : Nt → ∆(Xt) maps noticed histories to actions. When the person observes noticed history nt, he uses his theory π and knowledge of φ to weight the probability of each ht ∈ nt and updates accordingly. Specifically, the likelihood of nt given θ conditional on φ is P(nt | θ, φ) = ∑_{ht∈nt} P(ht | θ, φ), and the resulting posterior over θ is

πt(θ) = P(nt | θ, φ) π(θ) / ∑_{θ′∈Θ} P(nt | θ′, φ) π(θ′).
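To make the updating rule concrete, here is a minimal sketch in Python (our own illustration, not part of the formal framework): a two-point parameter space, i.i.d. binary outcomes, and an invented coarsening that records only whether successes outnumber failures. All names and numbers are hypothetical.

```python
from itertools import product

# A minimal sketch of the posterior formula pi_t(theta) given a noticed
# (coarsened) history. The two-point parameter space and the particular
# coarsening below are invented for illustration.

THETAS = {0.25: 0.5, 0.75: 0.5}           # prior pi over theta
T = 5                                      # history length

def p_history(h, theta):
    """P(h | theta): i.i.d. Bernoulli outcomes."""
    k = sum(h)
    return theta**k * (1 - theta)**(T - k)

def notice(h):
    """Noticing partition: the person notices only whether successes
    outnumber failures (a coarse statistic of the history)."""
    return sum(h) > T / 2

def posterior(noticed_cell):
    """pi_t(theta) = P(n_t|theta) pi(theta) / sum_theta' P(n_t|theta') pi(theta')."""
    histories = [h for h in product([0, 1], repeat=T) if notice(h) == noticed_cell]
    lik = {th: sum(p_history(h, th) for h in histories) for th in THETAS}
    norm = sum(lik[th] * pr for th, pr in THETAS.items())
    return {th: lik[th] * THETAS[th] / norm for th in THETAS}

print(posterior(True))    # beliefs after noticing "mostly successes"
print(posterior(False))   # beliefs after noticing "mostly failures"
```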

Our analysis implicitly assumes that the person recalls his prior π, strategy φ, and the time period. We assume attentional costs are negligible, and thus an optimizing decision-maker ignores some piece of data only when he perceives it as useless for guiding future decisions. We therefore call his attentional strategy "sufficient" with respect to his theory π if he filters out only information that π deems irrelevant for decisions.

Definition 1. An attentional strategy φ = (N, σ) is a sufficient attentional strategy (SAS) given π if, under π, the person expects to do no worse by following φ than he would by following any other attentional strategy φ̃. Under Assumptions 1 and 2, sufficiency amounts to

max_{x∈Xt} Eπ[ut(x, y) | nt(ht)] = max_{x∈Xt} Eπ[ut(x, y) | ht]

for all t and ht ∈ Ht that occur with positive probability under (π, σ).

Under a sufficient attentional strategy (SAS), the person believes that his expected payoff is independent of whether he optimizes using the coarsened history or the precise history. For instance, the "forgetful patient" in Section 3 follows a SAS that neglects the rate at which he forgets to take his pills because he wrongly believes that learning this rate would not influence his behavior.
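To see what the sufficiency condition rules out, consider the following toy check (hypothetical payoffs and parameters of our own invention): the person observes one binary outcome, then bets on the next one. Lumping both observed outcomes into a single noticed cell fails Definition 1 here because the person herself expects a higher payoff from conditioning on the exact history.

```python
# A toy check of Definition 1 (sufficiency), under invented payoffs.
# The person saw one outcome y in {0,1} (the "history"), believes theta in
# {0.25, 0.75} with equal prior weight, and then bets on the next outcome.

THETAS = {0.25: 0.5, 0.75: 0.5}
ACTIONS = [0, 1]                       # action = guess of the next outcome

def posterior(y):
    lik = {th: th if y == 1 else 1 - th for th in THETAS}
    norm = sum(lik[th] * pr for th, pr in THETAS.items())
    return {th: lik[th] * THETAS[th] / norm for th in THETAS}

def exp_payoff(x, belief):
    """u(x, y') = 1 if the guess matches the next outcome, else 0."""
    p1 = sum(th * pr for th, pr in belief.items())    # perceived P(y'=1)
    return p1 if x == 1 else 1 - p1

def best(belief):
    return max(exp_payoff(x, belief) for x in ACTIONS)

# Coarse noticing strategy: lump both histories into one cell, so beliefs
# stay at the prior. Sufficiency requires the same perceived payoff as
# conditioning on the exact history.
prior = dict(THETAS)
for y in (0, 1):
    print(y, best(posterior(y)), best(prior))
# best(posterior(y)) = 0.625 > best(prior) = 0.5 for each y, so ignoring y
# is NOT a SAS: the person herself expects to do better by noticing y.
```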

Without further restrictions on the noticing strategy, our analysis allows a person to notice a feature of the data today that emerged in the past even if he did not previously notice it. A forgetful patient could, for instance, notice whether his empirical frequency of forgetting exceeds a threshold without ever noticing the exact empirical frequency. However, at some level the exact frequency must be tracked in order to keep asking whether it exceeds a threshold. In Appendix A we discuss refinements on SASs that incorporate assumptions on the interaction between attention and memory (e.g., requiring a person to notice statistics today in order to use them later).19 As we

19Recent work in economics studies implications of memory-based models of imperfect recall (Mullainathan 2002a; Bernheim and Thomadsen 2005; Bodoh-Creed 2019; Enke, Schwerter, and Zimmermann 2019; Koszegi, Loewenstein, and Murooka 2021) and the interaction between attention and memory (Bordalo, Gennaioli, and Shleifer 2017, 2020).


discuss further in that section, much of our qualitative analysis remains unchanged under assumptions on memory interactions and limitations, with the exception that some forms of imperfect memory (reflecting assumptions on imperfect encoding into memory) can sometimes increase the scope for incidental learning. In the main text, our assumption that a person is able to freely recall past information she currently deems useful makes the analysis transparent by isolating the impact of channeled attention. And we think it's largely realistic as a first pass, especially in situations where a person doesn't need to rely solely on her memory to guide decisions (e.g., information is stored in a database or can be written down).20

Fixing π, there are typically many SASs in a given environment. Our definition of a SAS does not mandate that a person ignores data he deems useless. However, in the spirit of motivating our framework with small but non-zero costs of attention, we often consider the "minimal" case where all seemingly extraneous information is ignored. Say that Ñ is coarser than N if for all t, Ñt is coarser than Nt and at least one of these coarsenings is strict.

Definition 2. Given π, a SAS (N, σ) is minimal if there does not exist another SAS that can be obtained by coarsening N.

Minimal attentional strategies are perhaps most consistent with our interpretation that the person ignores data due to small costs of attention. However, a minimal SAS also assumes a perhaps implausible ability to ignore data, and we discuss in the conclusion how our framework extends to situations where some data is impossible to ignore.

The following result establishes the existence of a minimal SAS for any given π (Appendix D provides proofs).21

Proposition 1. Consider any model π and an environment that satisfies Assumptions 1 and 2. There exists a minimal SAS φ given π. Furthermore, if after each history ht ∈ H there is a unique optimal action x∗t ∈ Xt given ht and π, then there exists a minimal SAS with the following form: in each period t ∈ ℕ, the person notices x∗t and nothing more.

A minimal SAS requires the person to simply query the history each period, asking "what action should I take today?" It is then sufficient for the person to ignore everything else (including actions previously taken). We often compare the beliefs that arise under a minimal SAS to those that


20In Appendix A we consider refinements on noticing strategies that increase the scope for incidental learning. But there are realistic refinements that essentially do not change our results. For example, almost all our results carry over when requiring the person to notice an event when it happens in order to remember it later, yet allowing the person to freely revisit any previously noticed data if he currently deems it relevant.

21There may be multiple minimal SASs for a given π and environment. The particular SAS a person follows can dictate which long-run beliefs he comes to hold. To illustrate, consider a person who thinks that each of M doctors makes the same recommendation given a fixed set of symptoms. There are thus M minimal SASs, each taking the following form: the person follows the advice of Doctor m ∈ {1, . . . , M} and ignores the others. If advice in fact varies across doctors, the patient's (seemingly inconsequential) choice of whom to follow will determine his long-run beliefs.


arise when paying full attention. To facilitate such comparisons, we define the full-attention noticing strategy as NF, where nFt(ht) = {ht} for all ht ∈ H. Hence, under NF, the person perfectly distinguishes histories in every period. A full-attention SAS is one that follows NF.

4.3 Light-Bulb Theories and Attentional Explicability

We now present our criterion for assessing when the data noticed while following a SAS will lead a person to discover that his misspecified model π is false. Roughly, we say that π is attentionally inexplicable relative to "light-bulb" model λ ∈ ∆(Θ) if the noticed data are infinitely more likely under the light-bulb model than under the prior theory. Otherwise, we say π is attentionally explicable relative to λ.

Definition 3. Theory π is attentionally inexplicable with respect to λ and SAS φ = (N, σ) if the limit inferior as t → ∞ of the ratio P(nt | π)/P(nt | λ) equals 0 with positive probability under π∗ when the person follows SAS φ. Otherwise, π is attentionally explicable with respect to λ and φ. When the latter case holds relative to λ = π∗, we call φ an attentionally stable equilibrium (ASE) given π.

One motivation behind this criterion is to imagine that a Bayesian starts not with prior π but instead puts infinitesimal weight ε ≈ 0 on an alternative model, λ, so that his prior is (1 − ε) · π + ε · λ. Is it possible that the Bayesian's posterior eventually concentrates on λ, placing less and less weight on π? If so, our formulation says π is attentionally inexplicable relative to λ. If not, our formulation says π is attentionally explicable relative to λ. Hence, when θ∗ ∈ supp(λ), our notion of explicability given a full-attention SAS essentially amounts to assessing whether long-run Bayesian beliefs settle on the true parameter.22 While Definition 3 may seem like a convoluted way to present such a familiar concept, it provides an interpretational advantage under channeled attention. Specifically, λ captures possibilities that a person might entertain at moments of reflection and doubt, but these possibilities do not influence what data he notices prior to such reflection.23

22More precisely, π is attentionally inexplicable if there is a positive probability that the selectively-noticed data makes π seem implausible relative to λ. This probabilistic definition takes a stand on inexplicability in cases where P(nt | π)/P(nt | λ) converges to 0 under some infinite histories, but not others. To give a simple example, suppose that a ball is drawn from an urn each day, and the balls from this urn always have the same color. Ex ante, the urn may be one of three types—it may contain purely red, blue, or yellow balls. If the person's erroneous theory π posits that the urn only has two types—purely red or blue—then P(nt | π)/P(nt | λ) may converge to 0 when facing a yellow urn but not when facing a red or blue urn. As usual, our definition tilts the results in favor of classifying a model as attentionally inexplicable: if there is any chance that the noticed data wakes the person up, we deem π attentionally inexplicable.

23This foundation suggests the agent understands the possibility λ, but thinks it is so unlikely that he doesn't let it guide his attention. Taken literally, the agent can be seen as conceptualizing a large set of possibilities (including the one that turns out to be true), but cannot pay attention to all of them. However, given the inevitability of the agent ignoring almost all such hypotheses, this literal interpretation will often seem implausible. Indeed, a second interpretation may better reflect our motivation, while reaching the same conclusion via similar logic. Namely, we can think of the alternative model, λ, as arriving from a "light-bulb" moment in which the agent brainstorms alternatives. The exploding-relative-likelihood criterion then tests whether the agent will feel compelled to abandon his priors. Even absent any theory of when these light-bulb moments occur—and absent any assumption that λ is the uniquely compelling alternative to π—this test seems sufficient for determining whether π would eventually be abandoned.


Some additional discussion regarding our explicability criterion is in order. First, as highlighted above, the criterion directly depends on the alternative theory λ against which π is compared. The necessity of specifying such an alternative theory is tied up with the insufficiency of saying people should doubt their theories when they observe very unlikely things. To say a model π is attentionally inexplicable does not simply mean noticed outcomes are unlikely given π; in many models, any given outcome is unlikely. For instance, if a coin is believed to be fair and i.i.d., then a sequence of 200 flips consisting of all heads is no less likely than any other 200-flip sequence that involves a more balanced mix of heads and tails. But we would all start imagining the frightening alternative of a two-headed coin after the first sequence. A criterion that deems the fair-coin hypothesis attentionally inexplicable must therefore compare it to an alternative model that allows the coin to be biased.
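The following back-of-the-envelope sketch (with an invented near-two-headed alternative of our own) illustrates the point: every 200-flip sequence is equally unlikely under the fair-coin theory, so only the likelihood ratio against a light-bulb alternative separates the all-heads sequence from a balanced one.

```python
# Under the fair-coin theory pi, every 200-flip sequence has probability
# 2^-200, so low probability alone cannot condemn pi; what matters is the
# likelihood RATIO against a light-bulb alternative lambda allowing bias.

T = 200
p_fair = 0.5 ** T                        # P(sequence | pi), any sequence

def p_biased(num_heads, bias=0.999):
    """P(sequence | lambda): a (hypothetical) near-two-headed coin."""
    return bias**num_heads * (1 - bias)**(T - num_heads)

for heads in (200, 100):                 # all heads vs. a balanced sequence
    ratio = p_fair / p_biased(heads)
    print(f"{heads} heads: P(n|pi)/P(n|lambda) = {ratio:.3g}")

# All heads: the ratio is astronomically small, so pi is attentionally
# inexplicable relative to lambda. Balanced: the ratio explodes instead,
# and the fair-coin theory remains explicable.
```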

Specifying the alternative model λ is thus a central feature of our explicability criterion. While our framework allows for any λ, most of our analysis assumes λ is the correct (typically degenerate) theory that we as researchers think the agent ought to entertain.24 Accordingly, when we say that π is attentionally explicable or inexplicable without reference to a particular light-bulb theory, we implicitly take λ = π∗. Focusing on λ = π∗ both pins down our analysis and most closely mirrors folk intuition regarding when people should "get a clue". At the same time, taking λ = π∗ gives a potentially misleading impression that if a person discovers his model is wrong, then he necessarily abandons it in favor of the true model: if π is attentionally inexplicable with respect to π∗, then it is also attentionally inexplicable with respect to the infinite array of models that explain reality better than π. The dynamics following a "light-bulb moment" where π is deemed inexplicable—and the question of which model a person adopts after rejecting π—are beyond the scope of our analysis.25

Finally, when it does not create confusion, we take it as understood that nature draws θ∗ from π∗ and that probabilistic statements and definitions are with respect to the true distribution conditional on θ∗.


24When π∗ is non-degenerate, the analyst may want to consider whether a theory π is attentionally explicable for various realizations of θ∗. In this case, for a given θ∗, the analyst would ask whether liminf_{t→∞} P(nt | π)/P(nt | λ) equals 0 with positive probability under the distribution induced by θ∗.

25Although we take λ to be the "true" model, we do not necessarily designate that λ is degenerate on the realized parameter—λ can be probabilistic over the parameters, and thus can represent an array of alternative possibilities against which a person compares his initial theory. Whether λ is degenerate or not becomes relevant for explicability only when we violate our assumption that Θ is finite. Suppose, for instance, that the person's theory about the bias of a coin is uniform on [0,1] in a situation where we believe there is real uncertainty over the bias. If the coin turns out to be biased 0.55 (say), we do not want the person to deem his uncertain-prior model inexplicable merely because a dogmatic prior of 0.55 would have designated the realized outcome as more likely. By contrast, in the more realistic scenario where we posit a true model in which the coin is certainly unbiased, we are comfortable saying that the uncertain [0,1] theory is inexplicable.


Equivalently, we imagine π∗ as being degenerate on some θ∗. This allows us to avoid constantly mentioning both θ∗ and π∗ in the presentation.

4.4 Explicability With Full Attention

We briefly note some properties of explicable theories under a full-attention SAS. These serve as benchmarks for assessing the impact of channeled attention. For simplicity, we focus on environments that are "stationary" and have rich enough feedback for a rational agent to learn the true parameter.

Definition 4. The environment is stationary if Xt, Yt, and ut are independent of t and P(yt | xt, ht(¬st), θ) = P(yt | xt, θ) for all t ∈ ℕ, yt ∈ Yt, xt ∈ Xt, ht ∈ Ht, and θ ∈ Θ.

When the environment is stationary, we denote the constant action space, outcome space, and utility function by X, Y, and u, respectively. In such settings, a misspecified model is explicable under full attention if and only if it explains observations as well as the alternative model.

Observation B.1 in Appendix B.1, which reflects results known at least since Berk (1966), says that a theory π is attentionally explicable with respect to λ and a full-attention SAS if π explains observations better than λ (in the Kullback-Leibler sense) and attentionally inexplicable if it does worse.26 This observation suggests that, within environments with rich feedback, any false theory that is explicable under a full-attention SAS necessarily generates no long-run welfare loss. Hence, with full attention, explicable models do not continually generate costly mistakes in the environments we consider.

Channeled attention limits when "light-bulb moments" will happen. In the special case of a full-attention SAS, a model π is inexplicable if it seems excessively unlikely relative to the light-bulb model λ when all available data is actively used to assess the relative likelihood of π versus λ. Under a coarsely-attentive SAS, our interpretation is as follows: we ask when only the seemingly-relevant data under π is enough to alert the person that his model is misspecified. If this selected data—which was noticed solely to make optimal decisions given π—happens to make π seem implausible relative to some proposed alternative, then π is attentionally inexplicable. Put differently, we do not interpret the person as actively noticing aspects of the data for the dedicated purpose of distinguishing π from λ. After all, he sees no reason to question π.

26This observation has two immediate implications: (i) π is attentionally explicable with respect to π∗ and a full-attention SAS if and only if there exists some θ ∈ supp(π) that makes the same predictions over observables as θ∗; and (ii) any π that assigns positive probability to θ∗ ∈ supp(π∗) is attentionally explicable with respect to any λ ∈ ∆(Θ) and a full-attention SAS.


5 Additional Examples

This section discusses two primary ways to apply our framework. The first is to study the stability of systematic psychological biases. We illustrate this application by examining both naivete about self-control problems and correlation/redundancy neglect. The second is to study the stability of false models of statistical processes or empirical misconceptions more broadly. We illustrate this application with a discussion of misspecified beliefs about the evolution of firm earnings, which includes the well-known model by Barberis, Shleifer, and Vishny (1998). Other classes of applications are described in Section 8.

Beyond this paper, several others have invoked our notion of channeled attention to explain the persistence of psychological biases, including overconfidence (Heidhues, Koszegi, and Strack 2018), the gambler's fallacy (He 2021), inaccurate stereotyping in discrimination contexts (Bohren et al. 2021), and interpersonal projection bias (Gagnon-Bartsch, Pagnozzi, and Rosato 2021).

5.1 Misunderstanding Self-Control

Why do most of us have experience using a commitment device to solve a particular self-control problem, yet display low demand for such devices overall? Why do people seem to quickly learn they have self-control problems in some situations, yet remain naive in others?

While some theoretical work (e.g., Ali 2011) would seem to suggest that rational learning should correct underestimation of self-control problems, channeled attention can enable a person to persistently make this error in some situations yet wake up in others.27 In a canonical environment, we investigate when a person might come to realize that her self-control problem is more severe than she deemed plausible. As with all errors, greater uncertainty makes her more prone to discover her error; the more dogmatic she is, the less likely she is to do so. The degree to which she thinks her uncertainty matters for behavior is also crucial: she's likely to pay attention to resolving uncertainty about her self-control problem only when she anticipates this might determine the wisdom of some course of action (e.g., investing in a commitment device).

We go into this application in particular detail to demonstrate how to apply the framework in a familiar setting. Consider a person who repeatedly decides whether to take an action with immediate cost and delayed benefit. To fix ideas, imagine decisions to visit the gym, as studied by DellaVigna and Malmendier (2006). At the start of each period t, the person chooses whether to buy a gym membership or, if she already has one, whether to visit the gym. A gym membership lasts for T periods (beginning the period after purchase) and costs m (paid the period after purchase).

27Gottlieb (2019) integrates Benabou and Tirole's (2004) model of willpower into Ali's (2011) model of learning about self-control and shows that motivated mis-updating provides another reason why someone may fail to learn they have a self-control problem.


If the person goes to the gym on day t, then she also pays an immediate effort (or opportunity) cost equal to ct and earns benefit b > 0 in the future. Costs ct are i.i.d. draws from U[0, c̄], where c̄ > b. If the person doesn't go to the gym, she incurs no effort cost or benefit. We assume ct is observable at the start of period t regardless of whether she goes or not (i.e., Assumption 1 holds).

Following Laibson (1997) and O'Donoghue and Rabin (1999, 2001), we consider a (β, δ) discounter with δ = 1. Thus, in each period t, the person discounts future costs or benefits by a factor β < 1. The Laibson and O'Donoghue-Rabin models assume the person has a constant self-control problem and is dogmatic about its degree. The model in Eliaz and Spiegler (2006), by contrast, accommodates the more realistic case where a person is uncertain about her future self-control problem. Because their model is only two periods, the formalization can accommodate both truly stochastic present bias and uncertainty about a permanent degree of present bias.

Following the latter approach, we assume that the true present bias in a domain is potentially stochastic, and—reflecting the notion that the person may think she has different present bias in different domains—she is also uncertain about how often her present bias is severe or mild in the given domain. The person's discount factor fluctuates from day to day between no present bias (to simplify what we mean by "mild") and some degree of present bias. To capture naivete, we assume the person underestimates the likelihood that she will be present biased. Specifically, she believes that in any future period t, her discount factor will be βt = β with probability q or βt = 1 with probability 1 − q. For example, the person thinks she'll be tempted to avoid the gym on any given day with probability q, while in reality she is tempted with probability 1. An important special case is where the person dogmatically believes in some q. The person is sophisticated when q = 1; fully naive when q = 0; and partially naive when q ∈ (0, 1). We consider a naive person with prior π over q such that 1 ∉ supp(π).28

Partial naivete—the belief that βt will sometimes equal 1—is inexplicable under full attention since the history of costs and gym attendance will prove this theory false. Notice that a gym member visits the gym on day t if and only if βtb > ct ⇔ b > ct/βt. Since we assume the person can observe her effort cost (ct) and her overall desire to avoid the gym (i.e., ct/βt), the history of these values together will reveal that βt never equals 1.

With channeled attention, however, π may be part of an attentionally stable equilibrium. To build intuition, first note that this is always true when the membership is free (m = 0). In this case, deciding whether to visit the gym on a given day requires the person to notice only whether her

28In the alternative formulation of partial naivete proposed by O'Donoghue and Rabin (2001), the person dogmatically believes that her self-control parameter is β̂ ∈ [β, 1]. Under that formulation, the person sees something she thought was impossible every period (i.e., β is lower than she thought possible). Thus, the O'Donoghue and Rabin (2001) misspecification is "immeasurable". Since the agent in our framework has wide flexibility in how she encodes subjectively impossible events, O'Donoghue and Rabin's (2001) formulation of partial naivete is in fact more likely to be attentionally explicable than Eliaz and Spiegler's (2006).


current disinclination to visit, ct/βt, exceeds b. She need not separately notice the precise values of ct and βt—she can simply ask herself if she wants to skip the gym without asking herself why (i.e., high opportunity cost vs. laziness). Furthermore, because she thinks there is nothing payoff-relevant to learn, she need not remember past values of ct/βt or her attendance rate. Thus, when following such a SAS, the person will not notice that βt has a distribution inconsistent with π.

When the membership is costly (m > 0), whether π is part of an attentionally stable equilibrium depends on how uncertain the person is about her self-control problem. In this case, the person has an incentive to collect information that helps her predict whether the membership is worthwhile. For a given point belief q, the person desires the membership if m < T{(1 − q) · E[b − c | b > c] Pr(b > c) + q · E[b − c | βb > c] Pr(βb > c)}, or, equivalently, if

m < T · (b²/(2c̄)) · [(1 − q) + qβ(2 − β)].    (1)

That is, she buys the membership if she thinks its cost is lower than the option value of being able to use the gym. This option value is high when perceived effort costs are low (i.e., low c̄) and naivete is high (i.e., low q).
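As an illustration, the sketch below evaluates Condition (1) for hypothetical parameter values (the numbers are ours, chosen only to show the comparative statics).

```python
# Evaluating Condition (1) for hypothetical parameter values. V(q) is the
# perceived value of a membership for point belief q; the person buys
# whenever m < V(q).

T, b, c_bar, beta = 30, 1.0, 2.0, 0.5    # illustrative numbers only

def V(q):
    """RHS of Condition (1): T * (b^2 / 2c_bar) * [(1-q) + q*beta*(2-beta)]."""
    return T * (b**2 / (2 * c_bar)) * ((1 - q) + q * beta * (2 - beta))

for q in (0.0, 0.5, 1.0):                # fully naive, partially naive, sophisticated
    print(f"q = {q}: perceived option value V(q) = {V(q):.2f}")

# V is decreasing in q: the more tempted the person expects to be, the less
# the membership is worth to her. If m < V(q) for every q in supp(pi), she
# buys regardless of what she might learn, so she has no reason to track
# how often beta_t = beta -- and the error can persist.
```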

It is straightforward that the person may persistently "pay not to go to the gym" in an attentionally stable equilibrium. When Condition (1) holds for all q ∈ supp(π), the person thinks the membership is worthwhile no matter what. The analysis is then similar to the case where m = 0: if the person is certain she has sufficient self-control to justify the membership, then she need not track her behavior and notice that she goes too little. Indeed, there is suggestive evidence that people not only overestimate how often they will go to the gym, but do not realize how little they went in the past (Beshears et al. 2021; Carrera et al. 2021).

Even when the person is initially unsure whether the membership is worthwhile (i.e., Condition (1) holds for some q ∈ supp(π) but not all), she still need not wake up to the fact that she goes to the gym less often than she thought possible. By Proposition 1, she need only notice whether she should buy the membership (e.g., whether Condition (1) holds). But this does not require her to precisely notice the frequency with which she has gone to the gym.

So what would get the person to wake up? She needs incentives to precisely track her behavior. This could arise, for example, when facing a menu of gym memberships with different perks (e.g., access to exercise classes, different hours, different days open). We'll capture such incentives in a reduced-form way by imagining that the person needs to precisely track her willingness to pay for a gym membership—that is, she has an incentive to accurately assess the expected value of the right-hand side of Condition (1). In this case, we say the person has an incentive to track her precise willingness to pay for a gym membership. In contrast, we say the baseline case described above (deciding to buy a fixed plan at price m) does not induce such an incentive.


Proposition 2. In the setting above, there is no attentionally stable equilibrium given the person's model π if and only if both (i) π is non-degenerate and (ii) the person has an incentive to track her precise willingness to pay for a gym membership.

To summarize, waking up depends on the person having some sophistication and uncertainty about her self-control problems, and the incentive to precisely learn the extent of those problems to guide her actions. For intuition behind this result, first suppose that both (i) and (ii) hold. The person thinks she must attend to the fraction of times that βt = β in order to distinguish between any q and q′ in the support of π. In the long run, the observed fraction will be inconsistent with π, so π is attentionally inexplicable. If either (i) or (ii) fails, then the person believes there is no benefit to learning the precise extent of her self-control problem, and there is a SAS under which π is attentionally explicable.29 While the idea that a person needs to track her willingness to pay might seem artificial in this gym context, it is more natural in settings with continuous action choices. For example, consider payday loans, where a person's optimal action (the size of the loan) varies continuously in beliefs about future self-control (which influence the perceived cost of credit). This continuous action mirrors the need to track willingness to pay. In line with our model's prediction, there is suggestive evidence that people do tend to wake up to their present bias in such settings (Allcott et al. 2021), unlike in the gym setting, which involves coarser decisions.

This application further illustrates two more general points. First, a person may recognize the extent of an error, such as limited self-control, in one situation yet remain naive in another. This may shed light, for example, on why most of us have experience using a commitment device in some situation, despite making few commitments overall (Laibson 2015). Second, a person may persistently fail to appreciate the extent to which her actions deviate from her intentions. This principle also underlies our "forgetful patient" example in Section 3, where the person overestimates the probability she will remember to take a medication (e.g., Garber et al. 2004). Our framework suggests that an incentive to record and review or internalize one's own behavior may be an effective way to remedy naivete about the divide between actions and intentions.

5.2 Redundancy Neglect

Consider another well-known psychological bias: the tendency to neglect correlations in others' opinions (as in DeMarzo, Vayanos, and Zwiebel 2003; Eyster and Rabin 2010, 2014; Enke and

29More specifically, without an incentive to track her willingness to pay, the person's noticing partition Nt in a period t where she decides whether to buy a membership has at most two cells when following a minimal SAS: {ht ∈ Ht | Et[V(q)] ≥ m} and {ht ∈ Ht | Et[V(q)] < m}, where V(q) denotes the right-hand side of (1) and Et is with respect to πt. In contrast, with an incentive to track willingness to pay, we assume the person has an incentive to truthfully state Et[V(q)] for each period t in which she can buy a membership. Hence, in those periods, Nt must have a cell for each possible realization of ∑_{k=1}^{t} 1{βk = 1} under π. Since in reality βk = β in every period, the person will notice that ∑_{k=1}^{t} 1{βk = 1} is always equal to 0, which becomes increasingly improbable under π.


Zimmermann 2019). Appendix C.1 fleshes out an example of a newly hired employee who seeks advice from her colleagues. Over time, this person updates her beliefs about the quality of advice that each colleague provides. However, she wrongly treats their advice as conditionally independent, ignoring that colleagues also talk with each other. Whether the person wakes up to this error depends on the amount of feedback she receives. If she always observes whether advice was useful or not ex post, then she can learn about an individual colleague simply by comparing his recommendations with the realized outcomes. Since this strategy does not depend on correlations across colleagues' advice, her mistaken model is attentionally stable under a minimal SAS; consequently, she may persistently overreact to consensus advice. In contrast, if she does not always observe whether advice was useful or not, then any SAS requires her to notice the correlation across others' advice: in the absence of feedback, the efficient way to update about the quality of a colleague is to "benchmark" his advice against that of other colleagues believed to be knowledgeable. Thus, with limited feedback, she will notice that colleagues are simply echoing one another and hence discover her mistake. This application highlights how reducing the quality of information can promote waking up, a theme we return to below.

5.3 Empirical Misconceptions of Asset Patterns

We now consider a setting that involves high stakes, expert participants, and divergent interests: financial markets. We apply our attentional-explicability criterion to a class of misspecified beliefs that includes the well-known model by Barberis, Shleifer, and Vishny ("BSV", 1998) in which investors wrongly assume there are predictable trends in a stock's performance. We sketch this application here and provide details in Appendix C.2.

Consider a single security that pays out 100% of its earnings as dividends. In truth, the earnings process, denoted (Mt), follows a random walk. However, a risk-neutral representative investor holds wrong beliefs about the process, believing that it is in one of two "regimes" at any point in time: in one regime earnings are mean reverting, and in the other they trend. We show that the propensity for a market participant to eventually recognize this error will depend on the scope of her objective. In particular, an analyst who simply needs to make a buy/sell recommendation to an investor is less likely to discover the mistake than an analyst who additionally has an incentive to back up her recommendation with precise predictions about future earnings.

Suppose that Mt is observable and agents use this information to update their beliefs about future earnings. The equilibrium price pt in period t is the expected value of future earnings given these beliefs. Consider an analyst who provides a buy/sell recommendation to a price-taking, risk-neutral investor. In a given period t, the analyst recommends "buy" if the expected value of future earnings exceeds pt and "sell" otherwise. The analyst may additionally need to make precise predictions


about future earnings to back up her buy/sell recommendation. In this case, she announces her prediction of the current period's earnings, denoted M̂t, just before they are realized, and she faces incentives for accuracy: her payoff from predicting earnings is −(M̂t − Mt)², and we assume this is additively separable from her incentive to make an accurate recommendation.
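Why does the accuracy incentive widen what must be noticed? With payoff −(M̂t − Mt)², the optimal announcement is the conditional mean of Mt given what the analyst has noticed, so any coarsening that pools histories with different conditional means lowers her own expected payoff. The sketch below makes this point with an invented two-state earnings distribution; it illustrates the general logic only, not the model in Appendix C.2.

```python
# With payoff -(Mhat - M)^2, the optimal announcement is the conditional
# mean E[M | noticed history], so coarser noticing strictly lowers the
# analyst's own expected payoff whenever the mean varies across a cell.
# The two-state earnings distribution here is invented for illustration.

import random
random.seed(1)

def avg_payoff(mhat, draws):
    return sum(-(mhat - m) ** 2 for m in draws) / len(draws)

# Suppose earnings are N(1,1) after "up" histories and N(-1,1) after
# "down" histories, and each history type is equally likely.
up   = [random.gauss( 1, 1) for _ in range(100_000)]
down = [random.gauss(-1, 1) for _ in range(100_000)]

fine   = 0.5 * avg_payoff( 1, up) + 0.5 * avg_payoff(-1, down)  # cell means
coarse = 0.5 * avg_payoff( 0, up) + 0.5 * avg_payoff( 0, down)  # pooled mean
print(fine, coarse)   # fine ~ -1.0, coarse ~ -2.0: coarse noticing costs her
```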

We show in Appendix C.2 that waking up in this application depends on the scope of an individual's objective. The buy/sell recommendation requires only a coarse view of the data, thereby permitting a false belief like the BSV model to persist. On the other hand, a precise prediction about earnings requires detailed attention to patterns in the data, which is more likely to force the agent to confront inconsistencies that reveal her misspecified beliefs.30

The broader point is that expertise is not necessarily portable across similar problems that make use of the same data. While we frame this analysis as holding across individuals with different objectives, we could instead think of it as holding within individuals across problems. That is, to the extent that learning is "local" to a given problem or situation, a person might wake up to an error in one problem yet remain naive in another. (See the conclusion for a more extensive discussion.) In this way, the analysis from this section perhaps relates to recent evidence from Akepanidtaworn et al. (2021) showing that expert portfolio managers are skilled at buying assets but follow costly heuristics when selling. The data suggest that the managers pay disproportionate attention to information pertinent to purchases relative to sales, and this seems to stem from differing ways of looking at the two problems. When buying, managers strive to pick the next "great idea", but when selling, they aim to raise quick cash to fund their next purchase idea. As the authors note, channeled attention provides a reason why managers may not learn that their selling strategies significantly underperform random selling strategies: they channel their attention away from data that would answer this question because, under their wrong models, they don't think to ask it.

6 What Errors Tend to Be Stable Across Environments?

This section examines which errors tend to be attentionally stable across a wide range of environments. The crux of our formal results is foreshadowed by intuitions in Section 3. Errors involving the neglect of important dimensions or contingencies (e.g., being vigilant about noticing symptoms

30Given that different market participants may reach different conclusions about the earnings process—analysts who merely make recommendations may remain naive while those who need to back up those recommendations with predictions may wake up—a natural question arises: once analysts abandon their erroneous model, how does the market respond? This question is beyond the scope of our framework since we are agnostic about what model an agent adopts after realizing that her initial model is inexplicable. That said, even if some analysts in this application were to adopt the true model, we speculate that others may remain ignorant due to channeled attention. Given that they believe others share their misspecified model, analysts either (i) see no benefit in noticing other analysts' recommendations or (ii) base their decisions solely on other analysts' recommendations. Either way, they need not wake up to other analysts using a different model even if, under (ii), they may end up making better decisions once other analysts wake up.


after taking medicine, while assuming you rarely feel those symptoms when not taking medicine) are more stable than those that can be thought of as involving correct beliefs about which factors matter but incorrect priors as to how much they matter (e.g., knowing you might underestimate the likelihood of being forgetful while still appreciating the possibility of being forgetful).

6.1 Errors that are Attentionally Stable Across Stationary Environments

Within stationary environments meeting Assumptions 1 and 2, this section characterizes errors that are stable for any action space or utility function the decision-maker may have.

Under Assumption 1, we can separate the environment into two components: the outcome environment, (×∞t=1 Yt, Θ, P, π∗), describing possible distributions over outcomes, and the choice environment, (×∞t=1 Xt, ×∞t=1 ut), describing the action space and utility function.

Definition 5. Restricting attention to stationary environments where Assumptions 1 and 2 hold, and fixing the outcome environment, a theory π is preference-independent attentionally explicable (PIAE) if for any choice environment (action space X and u : X × Y → ℝ), there exists an attentionally stable equilibrium given π.

As a prelude to discussions below about how specific errors are PIAE, we first provide a more general characterization of when a model π satisfies PIAE, which depends on π's predicted probability distributions over outcomes. Given our stationarity assumption, yt is i.i.d. (conditional on θ) with distribution P(·|θ). Let Y(θ) denote the support of P(·|θ) and let Y(π) ≡ ∪_{θ∈supp(π)} Y(θ). Concepts from probability theory regarding minimal sufficient statistics (e.g., Lehmann and Casella 1998) help us analyze when models are PIAE. Let

mπ(y) ≡ { y′ ∈ Y(π) | ∃ g(y′, y) > 0 s.t. ∀θ ∈ supp(π), P(y′|θ) = P(y|θ) g(y′, y) }.    (2)

In words, the partition defined by mπ(y) is a minimal sufficient statistic for updating beliefs about θ: mπ(y) lumps y′ together with y if and only if, under π, the person updates the same way upon noticing y′ as she would after noticing y. We analogously define minimal sufficient statistics over histories yt = (yt−1, yt−2, . . . , y1), which we denote by mπ(yt).31

31For an example of mπ(y) and mπ(yt), suppose Y = {0,1} and θ ∈ [0,1] represents the probability that an employee delivers a successful performance (i.e., y = 1) on any given day. Suppose a manager's theory π over values of θ assigns positive weight to θ′ and θ′′ ≠ θ′. Then mπ(0) = {0} and mπ(1) = {1} since P(1|θ)/P(0|θ) = θ/(1 − θ) depends on θ. Furthermore, letting k(yt) denote the count of successes in yt, mπ(ỹt) = {yt | k(yt) = k(ỹt)} since

P(ỹt|θ) / P(yt|θ) = [ (t−1 choose k̃) θ^k̃ (1 − θ)^(t−1−k̃) ] / [ (t−1 choose k) θ^k (1 − θ)^(t−1−k) ]

is independent of θ if and only if k = k̃, where k = k(yt) and k̃ = k(ỹt).


To ensure that the "minimal sufficient statistic" partition defined by (2) is fixed over time, we assume that P(y|θ) ∈ (0,1) for all (y, θ) ∈ Y(π) × supp(π).32

With these concepts in hand, the following result characterizes when a theory is PIAE.

Lemma 1. Consider a stationary environment where Assumptions 1 and 2 hold, and suppose P(y|θ) ∈ (0,1) for all (y, θ) ∈ Y(π) × supp(π). A theory π is PIAE if and only if there exists θ ∈ supp(π) such that with probability one

liminf_{t→∞} P(mπ(yt)|θ) / P(mπ(yt)|θ∗) > 0.    (3)

The idea behind this result is straightforward: in stationary environments, it is always sufficient—and sometimes necessary—for the person to attend to all information helpful in updating beliefs about θ. Therefore, if noticing mπ(yt) does not force the person to wake up, then there is no choice environment that will. On the other hand, if noticing mπ(yt) does force the person to wake up, then we can surely find a choice environment where noticing this data is necessary under any SAS.
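For a feel of how the test in Lemma 1 operates, the sketch below simulates the Bernoulli example from footnote 31: the minimal sufficient statistic is the success count, so condition (3) can be checked by tracking the (log) likelihood ratio of that count under the best θ in supp(π) against θ∗. The parameter values are invented.

```python
import math, random
random.seed(0)

theta_star = 0.5                 # true success rate
support_pi = [0.25, 0.75]        # support of the misspecified theory

def log_pmf(k, n, p):
    """log of the Binomial(n, p) pmf at k -- the distribution of the
    noticed statistic m_pi(y^t), i.e. the success count."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

k = 0
for t in range(1, 2001):
    k += random.random() < theta_star
    if t in (100, 500, 2000):
        log_ratio = (max(log_pmf(k, t, th) for th in support_pi)
                     - log_pmf(k, t, theta_star))
        print(t, round(log_ratio, 1))

# The log-ratio drifts toward -infinity: no theta in supp(pi) keeps up with
# theta_star, so condition (3) fails and this pi is not PIAE.
```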

Lemma 1 helps identify specific classes of models that are stable across choice environments. One immediate corollary of Lemma 1 is that dogmatic errors (e.g., false but dogmatic beliefs about oneself) are PIAE. Specifically, if π is degenerate, then the minimal sufficient statistic distinguishes no data (i.e., mπ(y) = Y(π) and mπ(yt) = Y(π)^(t−1)), and thus the ratio in (3) remains positive. We next describe additional corollaries of Lemma 1 that further categorize classes of models as either PIAE or not. We discuss this categorization in intuitive terms here, and relegate formal definitions and results to Appendix B.2.

We begin by describing two classes of models that are PIAE. First, consider "censored" models that ignore possible outcomes. For an example, imagine a situation where, in any given period, an employee's performance may be good, bad, or awful, Y = {−1, 0, 1}, and θ determines the likelihoods of these outcomes. A manager with a censored model believes, for instance, that an awful performance is impossible: P(y = −1|θ) = 0 for all θ ∈ supp(π). In this case, the manager's attentional strategy could simply coarsen outcomes based on "good" or "not good" without further distinguishing bad from awful performances. Generally, such "censored" models are PIAE: the agent can simply ignore those outcomes he deems impossible.

Second, consider models that neglect predictive signals. These models treat a subset of signals as independent of outcomes. Examples include seaweed farmers failing to appreciate the importance of pod size (Hanna, Mullainathan, and Schwartzstein 2014); small investors failing to appreciate the importance of analyst affiliation when interpreting investment recommendations (Malmendier and Shanthikumar 2007); investors failing to appreciate that the way a manager chooses to report

32In principle, mπt(y) could vary in t if P(y|θ) ∈ {0,1} for some (y, θ) ∈ Y(π) × supp(π) since the support of πt may differ from that of π.


current earnings predicts future earnings (Teoh, Welch, and Wong 1998a and 1998b); or physicians using overly simple models to predict the value of testing for heart attacks (Mullainathan and Obermeyer 2019). For a concrete illustration, suppose a manager receives a signal s ∈ {0,1} each period that is correlated with an employee's current performance, which can be good (y = 1) or bad (y = 0). In truth, P(yt = 1|st = 1, θ) = θ + κ and P(yt = 1|st = 0, θ) = θ − κ; the manager, however, thinks P(yt = 1|st, θ) = θ regardless of st. Such neglectful models are generally PIAE.

We now describe three classes of models that are not PIAE, meaning they are prone to incidental learning in some stationary environment. These include models that are miscalibrated but emphasize the right distinctions and those that emphasize truly unimportant distinctions or variables.

First, consider uncertain models that correctly specify the set of outcomes but incorrectly specify their probabilities. When the person is unsure of his optimal action, he will collect the data his model deems necessary to learn. The amount of uncertainty he perceives—which is determined by uncertainty in π and the sensitivity of P(·|θ) to θ—thus impacts the scope of attention and, ultimately, the stability of his model. Specifically, if no two observations lead to the same beliefs over parameters (defined in the appendix as the Varying Likelihood Ratio Property, or VLRP), then the person believes that separately noticing every outcome would aid in learning θ. When the agent's preferences are such that he in fact wants to learn θ, he will thus notice that his model is miscalibrated. For example, if a manager is mistakenly convinced that the employee delivers good performance with frequency θ = 3/4 when in truth θ∗ = 1/2, then his mistake is PIAE. If the manager instead is uncertain whether θ = 3/4 or θ = 1/4, then his mistake is not PIAE.
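The contrast between the dogmatic and the uncertain manager can be traced directly to the partition in (2). The sketch below (our own construction) computes mπ(y) for both cases: with a degenerate π every outcome is lumped together, while with two-point uncertainty the likelihood ratio varies with θ and each outcome must be noticed separately.

```python
# Computing the partition in (2) for the manager example. P(y|theta) is
# Bernoulli; the support of pi is the only thing we vary.

def m_pi(y, outcomes, support, P):
    """Cell of y: all y2 whose likelihood ratio to y is constant across
    every theta in supp(pi), i.e. P(y2|theta) = P(y|theta) * g(y2, y)."""
    cell = []
    for y2 in outcomes:
        ratios = {P(y2, th) / P(y, th) for th in support}
        if len(ratios) == 1:      # same g(y2, y) for every theta
            cell.append(y2)
    return cell

P = lambda y, th: th if y == 1 else 1 - th
print(m_pi(1, [0, 1], [0.75], P))        # dogmatic pi:  [0, 1] -- lumps everything
print(m_pi(1, [0, 1], [0.25, 0.75], P))  # uncertain pi: [1] -- must separate outcomes
```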

Second, consider "overly elaborate" models that anticipate too wide a range of outcomes. These models represent a counterpoint to "censored" models. Consider again a manager example with three outcomes, Y = {−1, 0, 1}. The manager's model is overly elaborate if P(y = −1|θ) > 0 for all θ ∈ supp(π) while in reality P(y = −1|θ∗) = 0; that is, the manager wrongly thinks the employee will sometimes deliver awful performance when in fact this never happens. Such models are prone to discovery, so long as the person has an incentive to track the frequency of various outcomes. In this case, the person will eventually notice that an outcome he deemed possible never materializes.
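A quick calculation (with an invented subjective probability p) shows why such frequency tracking is fatal to the overly elaborate model:

```python
# If pi says an awful performance (y = -1) occurs with probability p > 0
# each period but it truly never occurs, the noticed event "no -1 in t
# periods" has probability (1-p)^t under pi and probability 1 in truth.

p = 0.05                                  # illustrative subjective probability
for t in (50, 200, 1000):
    print(t, (1 - p) ** t)                # P(noticed data | pi) -> 0

# The likelihood ratio against the truth vanishes, so the model is
# attentionally inexplicable whenever a SAS must track these frequencies.
```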

Third, consider "over-fit" models that assume the set of predictive signals is wider than it truly is. These models represent a counterpoint to those that neglect predictive signals. To illustrate, consider again the manager example with signals, but suppose now P(yt = 1|st, θ) = θ for either st ∈ {0,1}, meaning that the signals are truly uninformative. A manager with an over-fit model wrongly treats signals as informative: he believes, for instance, that P(yt = 1|st, θ) = θ + ε when st = 1 and P(yt = 1|st, θ) = θ − ε when st = 0. Such models are not PIAE when there is uncertainty about how useful the signals are (e.g., the manager wants to determine which employee characteristics predict success). In these contexts, the person would record the sequence of both signals and resolutions, which would ultimately prove his theory false.


Table 1 summarizes the categorizations above (again, with formal details in Appendix B.2). Overall, the results demonstrate a sense in which "overly coarse" models—those that ignore relevant outcomes or predictive signals—have a greater tendency to be stable than "overly fine" models that place importance on truly irrelevant outcomes or signals.

Table 1: CATEGORIZATION OF PIAE MODELS

                            Attentional stability    Attentional stability
                            dependent on choice      independent of choice
                            environment              environment
CENSORED                                                       X
PREDICTOR NEGLECT                                              X
MISCALIBRATED WITH VLRP              X
OVERLY ELABORATE                     X
OVER-FIT                             X

A simple intuition underlies this pattern: when a person thinks a variable does not matter, she is certain about how much it matters—she thinks it does not matter at all. Conversely, when a person thinks a variable does matter, she is often initially uncertain about its importance. It is this uncertainty that enables incidental learning.

6.2 Errors that are Attentionally Stable Across Broader Environments

The section above characterizes errors that are attentionally stable for any action space or utility function assuming the environment is stationary. We now relax that stationarity restriction and ask which errors are attentionally stable across history-dependent action spaces and utility functions.

Definition 6. Restricting attention to environments where Assumption 1 holds, and fixing the outcome environment, a theory π is strongly preference-independent attentionally explicable (S-PIAE) if for any choice environment (specifying a sequence of action spaces Xt and utility functions ut : Xt × Yt × Ht → ℝ), there exists an attentionally stable equilibrium given π.

While all S-PIAE errors are PIAE, PIAE errors may not be S-PIAE. To characterize S-PIAE errors, it is useful to introduce the question of whether an error is attentionally measurable for a given SAS: does the person ever confront data she thought was impossible? More formally, we say π is attentionally measurable with respect to SAS φ = (N, σ) if all finite noticed histories given φ that occur with positive probability under π∗ are assigned positive probability under π.


Proposition 3. (i) If φ = (N, σ) is a minimal sufficient attentional strategy given π, then π is attentionally measurable with respect to φ. Moreover, (ii) restricting attention to environments where Assumption 1 holds, any theory π that is attentionally inexplicable under a full-attention SAS is S-PIAE only if π is not attentionally measurable with respect to a full-attention SAS.

The first part of Proposition 3 shows that for any misspecified model π, there exists a sufficient way to filter the data such that the person never notices an outcome she assumed impossible. In particular, this is true whenever the person follows a minimal SAS. Intuitively, the person sees no benefit to distinguishing events that she assigns zero probability from those that she assigns positive probability. Put differently, a minimal noticing strategy will not be fine-tuned to notice when subjectively zero-probability events occur. While nothing would be more surprising to a person than noticing an event she deemed impossible, these are precisely the events she assumes are not worth looking out for.

The second part says that the only misspecified models that are attentionally stable across history-dependent utility functions and action spaces are those that are not attentionally measurable under a full-attention SAS. Models that assume truly possible events are impossible are more robustly stable than those that anticipate the correct set of outcomes. The idea is that a person who is incentivized to carefully track everything she sees will fail to wake up only if her error enables her to entirely write off truly possible outcomes. In this sense, "bigger errors" are more robustly stable than "smaller errors" since they engender less overlap between the questions a person thinks to ask and those she should be asking.

7 What Types of Situations Tend to Make Errors Stable?

We now explore features of the environment that influence an error's stability, generalizing insights from our applications in Section 5. We separately analyze how a person's objective and information structure influence whether he discovers his error.

7.1 How A Person’s Goals Influence Stability

We first establish that whether a person wakes up is not tightly linked to the cost of the mistake. For every outcome environment and erroneous theory a person can have, there exists some choice environment in which the model is stable. In particular, if ut is independent of outcomes, then the person has no incentive to attend to data and his model is thus attentionally explicable.

More interestingly, for every erroneous model π there is always a choice environment in which π is both stable and costly, meaning that the person's limiting behavior under a SAS given π is suboptimal relative to his limiting behavior under a SAS given the true model, π∗. The counterpoint is


often true as well: for a large class of misspecified models π, there will exist an environment where π is necessarily attentionally inexplicable despite being costless. Before stating these results, we first define our notion of a costly error more formally.

Definition 7. Consider a model π and SAS φ = (N, σ) given π. Let ūt(φ | ht) ≡ E_(θ∗,φ)[ut(x, y) | ht] denote expected utility in period t given the true parameter θ∗, history ht, and SAS φ. Let φ∗ be any SAS given π∗. The SAS φ is costless if |ūt(φ | ht) − ūt(φ∗ | ht)| → 0 almost surely given θ∗. When the SAS φ is not costless it is costly, and when it is also an attentionally stable equilibrium, it is a costly attentionally stable equilibrium.

Proposition 4. Consider a model π and any outcome environment (×∞t=1 Yt, Θ, P, π∗) in which π is attentionally inexplicable under a full-attention SAS.

1. There exists a choice environment (×∞t=1 Xt, ×∞t=1 ut) and a corresponding SAS φ such that φ is a costly attentionally stable equilibrium given π.

2. If π is attentionally measurable under a full-attention SAS, then there exists a choice environment (×∞t=1 Xt, ×∞t=1 ut) where (i) every SAS given π is costless, yet (ii) there exists no attentionally stable equilibrium.

The intuition behind Part 1 of Proposition 4 is as follows. Fixing the outcome environment (×∞t=1 Yt, Θ, P, π∗), we can construct a choice environment with binary actions each period, Xt = {H, L}, such that H is optimal for any θ ∈ supp(π) but L is optimal for parameter θ∗. The proof considers a utility function with two properties: (i) the person incurs a big penalty if he ever switches actions—so he's effectively choosing between "always H" and "always L"—and (ii) "always H" yields a higher payoff in any period where π provides a better fit of the empirical distribution than does π∗. When the person believes θ∗ is impossible, he thinks there is nothing payoff-relevant to learn because he is confident H is optimal in the first period and, because of the switching penalty, he thinks he should never revise his action. While the proof constructs a context where the person is ex ante dogmatic about the optimal action, many of our examples show that costly errors can remain attentionally stable even with active updating about which action to take.

What is more (and in a sense obvious given our framework), the costly attentionally stable equilibria identified in Proposition 4 can be arbitrarily costly to the decision-maker. A person fails to discover a costly mistake only when he wrongly deems valuable data entirely useless and ignores it. Yet, once deemed useless, the true value of this data—which determines the scale of the person's mistake—has no influence on his decision to ignore it. That is, no matter how great the true benefit of some data relative to the attentional cost of processing it, the decision-maker may continually ignore this data if his perceived benefit is sufficiently small.


To see the intuition behind Part 2, consider a choice environment where in each round t, the person earns a payoff of 1 if he correctly repeats back the entire history ht, and earns −1 otherwise. If π is attentionally measurable under a full-attention SAS (equivalently, π∗ is absolutely continuous with respect to π), then π assigns zero probability only to events that are truly impossible under the true model. Hence, given π, the person has an incentive to actively notice ht each round. In this case, any error is costless: if the person knows ht, he will always take the optimal action despite holding a misspecified model. Furthermore, knowledge of ht forces the person to wake up to any inexplicable error.

Taken together, the two parts of Proposition 4 show that whether the person wakes up is not intimately tied to the cost of the mistake, contrary to intuitions from the rational inattention literature where inattention is more likely to enable small errors than big ones. So, if the cost of the mistake doesn't encourage waking up, what does?

The basic answer is whether the person's objective encourages him to track and discern more features of the data. As a simple observation, say that choice environment A requires the person to track more than choice environment B when any SAS given A is also a SAS given B. Then if π is part of an attentionally stable equilibrium given A, it must also be part of an attentionally stable equilibrium given B. This observation—which holds whenever a person's objectives in choice environment A subsume those in B—is central, for instance, to our analyst example illustrating when the BSV misspecification is stable (Section 5.3).

As another example, incentives to report information beyond what a person has learned within his model promote waking up. A person’s posterior over θ is often not enough to prove his model false. In Appendix B.3, we define choice environments in which the person finds it insufficient to track only what’s necessary to update his beliefs about θ. We show that a person is more likely to wake up in such environments than in those where he finds this data sufficient. For example, in Section 3, a forgetful patient was more likely to wake up to his prospective memory failures when he needed to track and report back statistics on when he takes pills. In such examples, the person does not recognize that carefully tracking what he’s seen will help him learn about the payoff-relevant parameter, θ—such tracking therefore enables incidental learning.
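To make this distinction concrete, consider the following sketch (our illustration, not part of the paper’s formal apparatus; the success probability of 0.6 and the switch-counting task are arbitrary choices). A Beta–Bernoulli posterior depends on the data only through the counts, while a report about the sequence itself cannot be computed from those counts:

```python
import random

random.seed(0)

# Our illustration: i.i.d. Bernoulli outcomes with an arbitrary rate of 0.6.
outcomes = [random.random() < 0.6 for _ in range(1000)]

# Minimal sufficient statistic for updating beliefs about theta:
k, n = sum(outcomes), len(outcomes)
posterior_mean = (1 + k) / (2 + n)  # posterior mean under a Beta(1,1) prior

# A statistic a reporting task might demand, which (k, n) does not pin down:
switches = sum(a != b for a, b in zip(outcomes, outcomes[1:]))

print(f"posterior mean of theta: {posterior_mean:.3f}")
print(f"number of switches: {switches}")  # depends on order, not just (k, n)
```

A person rewarded for reporting switches must notice the order of outcomes, and order information is precisely the kind of incidentally gathered data that can falsify an i.i.d. theory even though the counts alone never would.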

7.2 How the Information Structure Influences Stability

The information structure also influences attentional stability. In particular, environments that reduce the need to track and discern features of the data reduce the scope for incidental learning.

First, consider the impact of a person needing to learn parameters of her model through experience versus description. For example, suppose that an investor is making a decision about a company. Her incentives are such that her optimal action is continuously and strictly increasing in the likelihood she attaches to the company having high versus low dividends that period. The company is ex ante equally likely to be good or bad, but good companies yield high dividends in any given period with probability .75 and bad companies do so with probability .25. While the investor has correct beliefs about the likelihood that the company is good, she is wrong about how this maps to the likelihood of high dividends: she incorrectly believes that good companies yield high dividends with probability .9 and bad companies yield high dividends with probability .2. If the person needs to learn whether the company is good or bad through the dividend history, then her incorrect theory is attentionally inexplicable: her action in a given period reveals the fraction of times the company had high dividends, which will not approach .9 or .2 in the long run. On the other hand, if the investor were informed ex ante whether the company was good or bad, then she need not wake up to her misspecification: she will feel that there’s nothing left to learn, so one minimal SAS is for her not to notice anything. Because uncertainty promotes incidental learning, gleaning answers to questions from experience may provide more opportunities to wake up than being told some of those answers ahead of time.33
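A simulation sketch of this example (the code is ours and purely illustrative) makes the inexplicability concrete:

```python
import random

random.seed(1)

# A good company truly pays high dividends with probability .75; the
# investor's model says .9 (good) or .2 (bad).
T = 10_000
high = sum(random.random() < 0.75 for _ in range(T))
freq = high / T
print(f"empirical frequency of high dividends: {freq:.3f}")  # ~ .75

# If she tracks this frequency (as she must when learning the company's
# type from the dividend history), neither .9 nor .2 can explain it.
# If instead she is told the type ex ante, a minimal SAS notices nothing,
# and the same data never confronts her theory.
```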

More broadly, providing a person with information ex ante that she would otherwise figure out herself hinders incidental learning. Even under the wrong model and a coarsely-attentive SAS, the person in the example above would correctly learn over time whether the company is good or bad. Providing this information upfront, however, prevents her from waking up. On the other hand, providing a person with information ex ante that contradicts what she would otherwise conclude (given her misspecification and SAS) promotes incidental learning. As an illustration, extend the investor example so her misspecified model allows for the possibility that dividends are sometimes reported with error. Such a theory could be attentionally explicable since it provides an explanation for why the observed frequency of high dividends does not approach .9 or .2 in the long run. However, if the investor were informed ex ante that dividends are always perfectly reported, then her theory goes back to being attentionally inexplicable. Truthfully informing a person of something she would’ve concluded on her own can therefore hinder the discovery of an error, while truthfully informing her of something that contradicts what she would’ve concluded can help.34

33 Similarly, garbling feedback may also promote waking up (as noted above in the redundancy-neglect application). Extend this investor example so that earnings may be observable in addition to dividends. Suppose the investor correctly knows that good companies always have high earnings and bad companies always have low earnings. Then the logic above says that the investor is more likely to wake up if she cannot observe earnings than if she can. This example has the intriguing feature that a person might be better off in the long run if she cannot observe a statistic that (i) her model misuses, and (ii) is sufficient both under her model and under the correct model.

34 These results reflect the findings of Esponda, Vespa, and Yuksel (2020), who experimentally examine the persistence of base-rate neglect in the presence of rich feedback. In a baseline treatment, subjects are initially induced to adopt an erroneous mental model—they are told some parameter values underlying the data-generating process, which facilitates base-rate neglect. Subjects in this treatment are significantly less likely to use the feedback to correct their tendency for base-rate neglect relative to a treatment in which the subjects do not know the underlying parameters and must learn them from experience. That is, subjects are more apt to notice their error when they do not receive ex ante descriptions of the problem and instead must learn about it from experience.


Second, consider the impact of a person being able to delegate her decision-making to a colleague or algorithm. Continuing with the investor example, imagine that the investor can program an algorithm that implements her (subjectively) optimal strategy. And suppose her utility function is −(x − θ_g)², where x ∈ [0,1] is her action and θ_g ∈ {0,1} indicates whether the company is good. Then the investor’s optimal action in a given period is equal to the current probability she attaches to the company being good. By the logic of the previous paragraph, she will wake up if she decides what action to take each period. However, if she instead runs an algorithm that calculates the posterior probability (using her incorrect model) and then implements the associated optimal action, then she need not wake up. In this case, one SAS involves only noticing each period that she’s following the algorithm—this is clearly insufficient to reveal her mistake. While we’re abstracting from the obvious benefits of delegation or using algorithms (e.g., reducing noise and effort in decision-making), we’re emphasizing an often-overlooked cost: delegating others to examine the data and answer the questions we think we should be asking prevents us from recognizing that we’re in fact asking the wrong questions.
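The delegated rule can be sketched as follows (an illustrative fragment assuming the dividend probabilities above; the function name is hypothetical):

```python
def algorithm_action(prior_good: float, high_dividend: bool) -> float:
    """One step of the delegated rule: update P(good) under the investor's
    misspecified likelihoods and return the subjectively optimal action."""
    p_high_good, p_high_bad = 0.9, 0.2  # her (wrong) dividend probabilities
    lik_good = p_high_good if high_dividend else 1.0 - p_high_good
    lik_bad = p_high_bad if high_dividend else 1.0 - p_high_bad
    posterior = (prior_good * lik_good) / (
        prior_good * lik_good + (1.0 - prior_good) * lik_bad
    )
    # With u(x) = -(x - theta_g)^2, the optimal x_t equals P(good | data).
    return posterior

x = algorithm_action(prior_good=0.5, high_dividend=True)
print(f"period-1 action: {x:.3f}")
```

Each period the investor notices only that she ran the algorithm; the stream of dividend inputs, whose long-run frequency would falsify her model, never has to pass through her own attention.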

Third, consider the impact of a person being free to not track data over time because it is recorded somewhere (e.g., in a database). In the investor illustration, this may enable her to use algorithms which, as we just saw, can prevent her from waking up. If she continued to take actions herself, then the inclusion of a database wouldn’t impact her behavior or whether she wakes up under our maintained assumption (in the main text) of perfect memory.

The following proposition summarizes and formalizes these intuitions.

Proposition 5. Consider any environment Γ ≡ (Θ, ×_{t=1}^∞ X_t, ×_{t=1}^∞ Y_t, ×_{t=1}^∞ u_t, P, π∗) that satisfies Assumption 1.

1. Consider a modified environment identical to Γ aside from the person knowing ex ante that a set of parameters Θ̄ ⊂ supp(π) are not possible: in the modified environment, the person’s theory π̄ satisfies supp(π̄) = supp(π) \ Θ̄.

Let Θ_reject(Γ, φ) = {θ | Pr(θ|n_t) → 0 almost surely given π∗ under SAS φ}, and suppose Θ̄ ⊆ Θ_reject(Γ, φ_min), where φ_min is a minimal SAS that constitutes an attentionally stable equilibrium if one exists and is an arbitrary minimal SAS otherwise. If there exists an attentionally stable equilibrium given π in the original environment Γ, then there exists an attentionally stable equilibrium given π̄ in the modified environment. On the other hand, the converse does not generally hold.

Now suppose Θ̄ ⊆ supp(π) \ Θ_reject(Γ, φ_min). Even if there exists an attentionally stable equilibrium given π in the original environment Γ, there may not exist an attentionally stable equilibrium given π̄ in the modified environment.

2. Consider a modified environment identical to Γ aside from allowing the person to select an option each period that implements a subjectively optimal strategy: in the modified environment, the action space each period is X_t ∪ {d}, where selecting d will implement a subjectively optimal behavioral strategy. There is always an attentionally stable equilibrium in the modified environment.

3. Consider a modified environment identical to Γ aside from allowing the person to observe the full history each period: in the modified environment, the person receives signals s̃_t = (s_t, h_t(¬s_t)) for all t ∈ ℕ, where s_t follows the signal structure of Γ and h_t(¬s_t) is the history prior to period t. There exists an attentionally stable equilibrium given π in the original environment Γ if and only if there exists an attentionally stable equilibrium given π in the modified environment.

Parts 1 and 2 of Proposition 5 were discussed above. Part 3 highlights that attentional stability is not about the limited availability of data per se, but rather a failure to notice the right features of the data. Any attentionally stable error will remain stable if we grant the person continual access to a complete archive of past outcomes (e.g., outcomes are recorded somewhere). That is, a person who can always revisit any past data—if he so chooses—still need not get a clue.35

Relaxing the assumption of perfect memory clarifies that giving a person access to the full history each period, if anything, hinders a person from recognizing an error. In Appendix A, we define refinements on SASs that reflect assumptions on the interaction between attention and memory. We establish in that appendix that giving a person access to a recording of the full history each period impedes getting a clue so long as a person is sophisticated about memory imperfections. If a person thinks she should preemptively notice information today because it might be useful tomorrow, then she will end up noticing more today than if she had access to a recording of the full history. That is, a SAS with memory imperfections remains a SAS without them, but not vice-versa.

The results above reflect the most basic implication of our framework: a person is alerted to her mistakes not by the statistical unlikeliness of all the data in front of her, but rather by how surprising she finds the data she notices. This means that factors that are often intuited as promoting learning—increasing the stakes, decreasing the cost of information gathering, simplifying the choice, etc.—may not help in our framework (and may even backfire) depending on how they influence the person’s perceived uncertainty about the optimal action.

35 To provide intuition, suppose φ is an ASE given π in an environment where the agent does not have continual access to an archive of past outcomes. If π is not attentionally stable when the person can access the history prior to any decision, then there must exist information in the history that she believes would improve her decision beyond the data she gathered under φ. But this contradicts the assumption that φ is sufficient in the first place: any sufficient attentional strategy extracts all seemingly useful data from the history.


8 Limitations, Extensions, and Further Implications

This paper develops a framework for investigating when people might notice their errors. We use this framework to partially characterize when people are more or less likely to discover their mistakes, and to investigate the stability of psychological biases and empirical misconceptions.

In turning to several modifications and extensions worth considering, we note that our framework is built around a central theme that complements the emphasis of recent research on attention: a person’s (potentially mistaken) prior beliefs critically influence the perceived benefits of paying attention and, consequently, the implications of limited attention. Our focus on cases where attentional costs are near zero highlights the centrality of such beliefs.

As a byproduct, this focus also stacks the deck in favor of a person noticing his errors, as does our study of repetitive choice contexts that provide sufficient data to identify the true model. In this sense, our attentional stability criterion provides a “stress test”: if a person does not discover his error in repetitive environments where attention is cheap, then he is unlikely to do so elsewhere.

Our framework also abstracts away from factors beyond channeled attention that could hinder learning. We assume the agent processes information in a fully Bayesian way given his priors. Hence, psychological biases such as confirmation bias or motivated misreading of information do not play a role in our analysis. Such factors surely contribute to the persistence of some erroneous beliefs, yet, by neutralizing them, we demonstrate how channeled attention itself enables the persistence of errors.

As noted in the introduction, an additional limitation inheres in our ambition to provide a sharp framework that we and others can broadly apply: our approach ignores how non-instrumental factors influence what draws attention. Especially when we focus on minimal attentional strategies, our framework permits agents to ignore non-instrumental data that they might, in reality, be forced to notice. One could embed assumptions about what captures attention as a primitive (e.g., payoffs or utility in some contexts), and our general framework leaves such assumptions to be imposed on a case-by-case basis.36 However, it is worth noting again that any form of attentional capture that influences what momentarily draws attention without systematically influencing what comes to mind from memory will typically not overturn our results on when people discover their errors.

Our framework is meant to be applied once a misspecified model is provided as a primitive, and we rely on empirical literature in both psychology and economics to suggest these primitives. However, this literature is often silent on whether a particular error is “local” or “global”; that is, whether a person must correct an error in one context if he notices it in another. It is unclear, for example, if a person who is naive about his self-control problem has varying degrees of naivete depending on the context (e.g., spending vs. exercise) or instead applies the same misspecified model of his self-control across all inter-temporal decision problems. In practice, to apply our model consistently—and to capture the realism that people often fail to port their expertise across similar contexts (e.g., Green, Rao, and Rothschild 2019)—we assume that any awareness of errors is local. Under this assumption, a person may discover his mistake in one context, but continue to make that same mistake in other contexts. Furthermore, as we discussed in Section 4, whenever an error is inexplicable relative to the truth it is also inexplicable relative to infinitely many other errors: a person who discovers that his model is incorrect need not adopt the true model.

36 Attentional capture could be integrated into our framework by focusing solely on SASs that reflect the posited attentional capture instead of focusing on minimal SASs. Alternatively, we could preserve our focus on minimal SASs and modify the agent’s payoffs so that they are rewarded for accurately reporting the object of attentional capture.

Our framework admits further applications. A natural question is how outsiders, such as firms, might design environments to channel others’ attention and prevent them from waking up to mistakes that benefit the outsiders.37 Our framework may also shed light on how policymakers and researchers might go about debiasing those who are making errors. If the goal is to provide data to convince a person that his model π is wrong, our theory highlights the importance of providing data that is relevant within that model. While many researchers find that debiasing people is particularly difficult (e.g., Soll, Milkman, and Payne 2015), the difficulty might partially lie in the type of information debiasing campaigns choose to provide. Selecting information based solely on how much it would move people’s beliefs if it were processed may be far less effective than targeting information that seems relevant given their biased beliefs. Simply describing the correct model may also have limited effect: even when we are exposed to the correct alternative, it may not be immediately compelling because we hadn’t noticed the data supporting it.38

37 As an example, imagine a setting where there is a good (e.g., a printer) with a base price that consumers notice and an add-on price (e.g., the price of ink) that consumers neglect because they mistakenly think it is not large or variable enough to impact their decision of whether or what to buy. As previously recognized in the literature following Gabaix and Laibson (2006) (for a review, see Heidhues and Koszegi 2018), the seller might take advantage of consumers’ misspecified models of the add-on price by setting the base price low and the add-on price high. For example, the printer seller would set the price of the printer low enough so that purchasing it seems like a good deal to the consumer regardless of the realized ink price. The more novel implication of our framework is that channeled attention provides a new reason why the base price might stay low: the seller is nervous that a high base price will put the consumer on the margin where she should attend to the add-on price to inform some decision; e.g., to decide how much she’ll actually use a printer given the cost of ink and thus whether she should buy a different printer. This could wake the consumer up to add-on prices that are higher than she thought possible and consequently cause her to avoid purchasing any product in that category altogether.

38 This complements a different reason highlighted by Schwartzstein and Sunderam (2021) for why people may not find the true model compelling: when strategic persuaders tailor models to the data (i.e., propose models after seeing the data), they are able to propose false models that fit the data better than the true model.


References

AKEPANIDTAWORN, K., R. DI MASCIO, A. IMAS, AND L. SCHMIDT (2021): “Selling Fast and Buying Slow: Heuristics and Trading Performance of Institutional Investors.” Working Paper.

ALI, S.N. (2011): “Learning Self Control.” Quarterly Journal of Economics, 126(2), 857–893.

ALLCOTT, H., J. KIM, D. TAUBINSKY, AND J. ZINMAN (2021): “Payday Lending, Self-Control, and Consumer Protection.” Working Paper.

AL-NAJJAR, N. AND E. SHMAYA (2015): “Uncertainty and Disagreement in Equilibrium Models.” Journal of Political Economy, 123, 778–808.

BARBERIS, N., A. SHLEIFER, AND R. VISHNY (1998): “A Model of Investor Sentiment.” Journal of Financial Economics, 49, 307–343.

BARSKY, A., R. SAINTFORT, M. ROGERS, AND J. BORUS (2002): “Nonspecific Medication Side Effects and the Nocebo Phenomenon.” Journal of the American Medical Association, 287(5), 622–627.

BENABOU, R. AND J. TIROLE (2004): “Willpower and Personal Rules.” Journal of Political Economy, 112(4), 848–886.

BENJAMIN, D., A. BODOH-CREED, AND M. RABIN (2016): “Base-Rate Neglect: Foundations and Applications.” Working Paper.

BENJAMIN, D., M. RABIN, AND C. RAYMOND (2016): “A Model of Non-Belief in the Law of Large Numbers.” Journal of the European Economic Association, 14(2), 515–544.

BERK, R. (1966): “Limiting Behavior of Posterior Distributions when the Model is Incorrect.” Annals of Mathematical Statistics, 37(1), 51–58.

BERNHEIM, D. AND R. THOMADSEN (2005): “Memory and Anticipation.” The Economic Journal, 115(503), 271–304.

BESHEARS, J., H.N. LEE, K. MILKMAN, R. MISLAVSKY, AND J. WISDOM (2021): “Creating Exercise Habits Using Incentives: The Tradeoff between Flexibility and Routinization.” Management Science, 67(7), 3985–4642.

BODOH-CREED, A. (2019): “Mood, Memory, and Biased Beliefs and Actions.” Review of Finance, 24(1), 227–262.

BOHREN, A. (2016): “Informational Herding with Model Misspecification.” Journal of Economic Theory, 163, 222–247.

BOHREN, A., K. HAGGAG, A. IMAS, AND D. POPE (2021): “Inaccurate Statistical Discrimination: An Identification Problem.” Working Paper.

BOHREN, A. AND D. HAUSER (2021): “Learning with Heterogeneous Misspecified Models: Characterization and Robustness.” Econometrica, forthcoming.

BORDALO, P., N. GENNAIOLI, AND A. SHLEIFER (2012): “Salience Theory of Choice Under Risk.” The Quarterly Journal of Economics, 127(3), 1243–1285.

BORDALO, P., N. GENNAIOLI, AND A. SHLEIFER (2013): “Salience and Consumer Choice.” Journal of Political Economy, 121(5), 803–843.

BORDALO, P., N. GENNAIOLI, AND A. SHLEIFER (2017): “Memory, Attention, and Choice.” Working Paper.

BORDALO, P., N. GENNAIOLI, AND A. SHLEIFER (2020): “Memory, Attention, and Choice.” Quarterly Journal of Economics, 135(3), 1399–1442.

BORDALO, P., N. GENNAIOLI, AND A. SHLEIFER (2021): “Salience.” Working Paper.

BRADLEY, S. AND N. FELDMAN (2020): “Hidden Baggage: Behavioral Responses to Changes in Airline Ticket Tax Disclosure.” American Economic Journal: Economic Policy, 12(4), 58–87.

BROADBENT, D. (1958): Perception and Communication, Pergamon Press.

BRONNENBERG, B., J. DUBE, M. GENTZKOW, AND J. SHAPIRO (2015): “Do Pharmacists Buy Bayer? Informed Shoppers and the Brand Premium.” Quarterly Journal of Economics, 130(4), 1669–1726.

BUSHONG, B. AND T. GAGNON-BARTSCH (2021): “Learning with Misattribution of Reference Dependence.” Working Paper.

CAMERER, C., T. HO, AND J. CHONG (2004): “A Cognitive Hierarchy Model of Games.” Quarterly Journal of Economics, 119(3), 861–898.

CAPLIN, A. AND M. DEAN (2015): “Revealed Preference, Rational Inattention, and Costly Information Acquisition.” American Economic Review, 105(7), 2183–2203.

CARRERA, M., H. ROYER, M. STEHR, J. SYDNOR, AND D. TAUBINSKY (2021): “How Are Preferences for Commitment Revealed?” Working Paper.

CHATER, N. (2018): The Mind is Flat, Yale University Press.

CHOUDHRY, N., A. KRUMME, P. ERCOLE, ET AL. (2017): “Effect of Reminder Devices on Medication Adherence: The REMIND Randomized Clinical Trial.” JAMA Internal Medicine, 177(5), 624–631.

CHUN, M., J. GOLOMB, AND N. TURK-BROWNE (2011): “A Taxonomy of External and Internal Attention.” Annual Review of Psychology, 62(1), 73–101.

CRAWFORD, V. AND N. IRIBERRI (2007): “Level-K Auctions: Can a Non-equilibrium Model of Strategic Thinking Explain the Winner’s Curse and Overbidding in Private-Value Auctions?” Econometrica, 75(6), 1721–1770.

DEHAENE, S. (2014): Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts, Viking Press.

DELLAVIGNA, S. AND U. MALMENDIER (2006): “Paying Not to Go to the Gym.” American Economic Review, 96, 694–719.

DELLAVIGNA, S. AND M. GENTZKOW (2019): “Uniform Pricing in U.S. Retail Chains.” The Quarterly Journal of Economics, 134(4), 2011–2084.

DEMARZO, P., D. VAYANOS, AND J. ZWIEBEL (2003): “Persuasion Bias, Social Influence, and Uni-Dimensional Opinions.” Quarterly Journal of Economics, 118, 909–968.

DREW, T., M. VO, AND J. WOLFE (2013): “The Invisible Gorilla Strikes Again: Sustained Inattentional Blindness in Expert Observers.” Psychological Science, 24(9), 1848–1853.

ENKE, B., F. SCHWERTER, AND F. ZIMMERMANN (2019): “Associative Memory and Belief Formation.” Working Paper.

ENKE, B. AND F. ZIMMERMANN (2019): “Correlation Neglect in Belief Formation.” Review of Economic Studies, 86(1), 313–332.

ERICSON, K. (2011): “Forgetting We Forget: Overconfidence and Memory.” Journal of the European Economic Association, 9(1), 43–60.

ESPONDA, I. (2008): “Behavioral Equilibrium in Economies with Adverse Selection.” American Economic Review, 98(4), 1269–1291.

ESPONDA, I. AND D. POUZO (2016): “Berk-Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models.” Econometrica, 84(3), 1093–1130.

ESPONDA, I., E. VESPA, AND S. YUKSEL (2020): “Mental Models and Learning: The Case of Base-Rate Neglect.” Working Paper.

EVANS, G. AND S. HONKAPOHJA (2001): Learning and Expectations in Macroeconomics, Princeton University Press.

EYSTER, E. AND M. PICCIONE (2013): “An Approach to Asset Pricing Under Incomplete and Diverse Perceptions.” Econometrica, 81, 1483–1506.

EYSTER, E. AND M. RABIN (2005): “Cursed Equilibrium.” Econometrica, 73(5), 1623–1672.

EYSTER, E. AND M. RABIN (2010): “Naive Herding in Rich-Information Settings.” American Economic Journal: Microeconomics, 2(4), 221–243.

EYSTER, E. AND M. RABIN (2014): “Extensive Imitation is Irrational and Harmful.” Quarterly Journal of Economics, 129(4), 1861–1898.

EYSTER, E., M. RABIN, AND D. VAYANOS (2018): “Financial Markets where Traders Neglect the Informational Content of Asset Prices.” Journal of Finance, 74(1), 371–399.

FRYER, R. AND M. JACKSON (2008): “A Categorical Model of Cognition and Biased Decision-Making.” B. E. Journal of Theoretical Economics, 8, 1–44.

FUDENBERG, D. AND G. LANZANI (2020): “The Stability of Misperceptions under Mutations.” Working Paper.

FUDENBERG, D. AND D. LEVINE (1993): “Self-Confirming Equilibrium.” Econometrica, 61, 523–545.

GABAIX, X. (2014): “A Sparsity-Based Model of Bounded Rationality.” The Quarterly Journal of Economics, 129(4), 1661–1710.

GABAIX, X. AND D. LAIBSON (2006): “Shrouded Attributes, Consumer Myopia, and Information Suppression in Competitive Markets.” Quarterly Journal of Economics, 121(2), 505–540.

GAGNON-BARTSCH, T. (2016): “Taste Projection in Models of Social Learning.” Working Paper.

GAGNON-BARTSCH, T., M. PAGNOZZI, AND A. ROSATO (2021): “Projection of Private Values in Auctions.” American Economic Review, forthcoming.

GAGNON-BARTSCH, T. AND M. RABIN (2017): “Naive Social Learning, Mislearning, and Unlearning.” Working Paper.

GAGNON-BARTSCH, T., M. RABIN, AND J. SCHWARTZSTEIN (2018): “Channeled Attention and Stable Errors.” Working Paper. https://scholar.harvard.edu/files/gagnonbartsch/files/channeledattention2018.pdf

GARBER, M., D. NAU, S. ERICKSON, J. AIKENS, AND J. LAWRENCE (2004): “The Concordance of Self-Report with Other Measures of Medication Adherence: A Summary of the Literature.” Medical Care, 42(7), 649–652.

GAZZALEY, A. AND A. NOBRE (2012): “Top-Down Modulation: Bridging Selective Attention and Working Memory.” Trends in Cognitive Sciences, 16(2), 129–135.

GITTINS, J. (1979): “Bandit Processes and Dynamic Allocation Indices.” Journal of the Royal Statistical Society Series B (Methodological), 41(2), 148–177.

GOTTLIEB, D. (2019): “Will You Never Learn? Self Deception and Biases in Information Processing.” Working Paper.

GREEN, E., J. RAO, AND D. ROTHSCHILD (2019): “A Sharp Test of the Portability of Expertise.” Management Science, 65(6), 2445–2945.

HANDEL, B. AND J. SCHWARTZSTEIN (2018): “Frictions or Mental Gaps: What’s Behind the Information We (Don’t) Use and When Do We Care?” Journal of Economic Perspectives, 32(1), 155–178.

HANNA, R., S. MULLAINATHAN, AND J. SCHWARTZSTEIN (2014): “Learning Through Noticing: Theory and Evidence from a Field Experiment.” Quarterly Journal of Economics, 129(3), 1311–1353.

HAUSER, W., E. HANSEN, AND P. ENCK (2012): “Nocebo Phenomena in Medicine.” Deutsches Aerzteblatt Online, 109(26), 459–465.

HE, K. (2021): “Mislearning from Censored Data: The Gambler’s Fallacy in Optimal-Stopping Problems.” Working Paper.

HEIDHUES, P. AND B. KOSZEGI (2018): “Behavioral Industrial Organization.” Handbook of Behavioral Economics, Eds. D. Bernheim, S. DellaVigna, and D. Laibson, 1, 517–612.

HEIDHUES, P., B. KOSZEGI, AND P. STRACK (2018): “Unrealistic Expectations and Misguided Learning.” Econometrica, forthcoming.

HONG, H., J. STEIN, AND J. YU (2007): “Simple Forecasts and Paradigm Shifts.” Journal of Finance, 62, 1207–1242.

HOROWITZ, A. (2013): On Looking: A Walker’s Guide to the Art of Observation, Simon and Schuster.

JEHIEL, P. (2005): “Analogy-Based Expectation Equilibrium.” Journal of Economic Theory, 123, 81–104.

JEHIEL, P. AND F. KOESSLER (2008): “Revisiting Games of Incomplete Information with Analogy-Based Expectations.” Games and Economic Behavior, 62(2), 533–557.

JOHANSSON, P., L. HALL, S. SIKSTROM, AND A. OLSSON (2005): “Failure to Detect Mismatches Between Intention and Outcome in a Simple Decision Task.” Science, 310, 116–119.

KIRMAN, A. (1975): “Learning by Firms About Demand Conditions.” In R. Day and T. Groves (Eds.), Adaptive Economic Models, Academic Press, 137–156.

KOSZEGI, B., G. LOEWENSTEIN, AND T. MUROOKA (2021): “Fragile Self-Esteem.” Review of Economic Studies, forthcoming.

LAIBSON, D. (1997): “Golden Eggs and Hyperbolic Discounting.” Quarterly Journal of Economics, 112(2), 443–477.

LAIBSON, D. (2015): “Why Don’t Present-Biased Agents Make Commitments?” American Economic Review Papers and Proceedings, 105(5), 267–272.

LEHMANN, E. AND G. CASELLA (1998): Theory of Point Estimation, Springer New York.

LI, X. AND C. CAMERER (2021): “Predictable Effects of Visual Salience in Experimental Decisions and Games.” Working Paper.

LOEWENSTEIN, G., T. O’DONOGHUE, AND M. RABIN (2003): “Projection Bias in Predicting Future Utility.” Quarterly Journal of Economics, 118(4), 1209–1248.

MACKOWIAK, B., F. MATEJKA, AND M. WIEDERHOLT (2021): “Rational Inattention: A Review.” Working Paper.

MADARASZ, K. (2012): “Information Projection: Model and Applications.” Review of Economic Studies, 79, 961–985.

MALMENDIER, U. AND D. SHANTHIKUMAR (2007): “Are Small Investors Naive About Incentives?” Journal of Financial Economics, 85, 457–489.

MATEJKA, F. AND A. MCKAY (2015): “Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model.” The American Economic Review, 105(1), 272–298.

MOST, S., B. SCHOLL, E. CLIFFORD, AND D. SIMONS (2005): “What You See Is What You Set: Sustained Inattentional Blindness and the Capture of Awareness.” Psychological Review, 112, 217–242.

MULLAINATHAN, S. (2002a): “A Memory-Based Model of Bounded Rationality.” Quarterly Journal of Economics, 117(3), 735–774.

MULLAINATHAN, S. (2002b): “Thinking Through Categories.” Working Paper.

MULLAINATHAN, S., J. SCHWARTZSTEIN, AND A. SHLEIFER (2008): “Coarse Thinking and Persuasion.” Quarterly Journal of Economics, 123, 577–619.

MULLAINATHAN, S. AND Z. OBERMEYER (2019): “A Machine Learning Approach to Low-Value Health Care: Wasted Tests, Missed Heart Attacks and Mis-Predictions.” Working Paper.

NICKERSON, R. AND M. ADAMS (1979): “Long-term Memory for a Common Object.” Cognitive Psychology, 11(3), 287–307.

NYARKO, Y. (1991): “Learning in Misspecified Models and the Possibility of Cycles.” Journal of Economic Theory, 55(2), 416–427.

O’DONOGHUE, T. AND M. RABIN (1999): “Doing It Now or Later.” American Economic Review, 89(1), 103–124.

O’DONOGHUE, T. AND M. RABIN (2001): “Choice and Procrastination.” Quarterly Journal of Economics, 116(1), 121–160.

ORTOLEVA, P. (2012): “Modeling the Change of Paradigm: Non-Bayesian Reactions to Unexpected News.” American Economic Review, 102(6), 2410–2436.

OSTERBERG, L. AND T. BLASCHKE (2005): “Adherence to Medication.” New England Journal of Medicine, 353(5), 487–497.

PAYZAN-LENESTOUR, E. AND M. WOODFORD (2020): “‘Outlier Blindness’: Efficient Coding Generates an Inability to Represent Extreme Values.” Working Paper.

RABIN, M. (2002): “Inference by Believers in the Law of Small Numbers.” Quarterly Journal of Economics, 117(3), 775–816.

RABIN, M. AND J. SCHRAG (1999): “First Impressions Matter: A Model of Confirmatory Bias.” Quarterly Journal of Economics, 114(1), 37–82.

RABIN, M. AND D. VAYANOS (2010): “The Gambler’s and Hot-Hand Fallacies: Theory and Applications.” Review of Economic Studies, 77, 730–778.

SARGENT, T. (1993): Bounded Rationality in Macroeconomics, Oxford University Press.

SCHWARTZSTEIN, J. (2014): “Selective Attention and Learning.” Journal of the European Economic Association, 12(6), 1423–1452.

SCHWARTZSTEIN, J. AND A. SUNDERAM (2021): “Using Models to Persuade.” American Economic Review, 111(1), 276–323.

SIMONS, D. AND C. CHABRIS (1999): “Gorillas in Our Midst: Sustained Inattentional Blindness for Dynamic Events.” Perception, 28, 1059–1074.

SIMONS, D. AND C. CHABRIS (2011): “What People Believe about How Memory Works: A Representative Survey of the U.S. Population.” PLoS ONE, 6(8), e22757.

SIMS, C. (2003): “Implications of Rational Inattention.” Journal of Monetary Economics, 50, 665–690.

SOLL, J., K. MILKMAN, AND J. PAYNE (2015): “A User’s Guide to Debiasing.” In The Wiley Blackwell Handbook of Judgment and Decision Making, Eds. G. Keren and G. Wu.

SPIEGLER, R. (2016): “Bayesian Networks and Boundedly Rational Expectations.” Quarterly Journal of Economics, 131, 1243–1290.

TEOH, S., I. WELCH, AND T. WONG (1998a): “Earnings Management and the Underperformance of Seasoned Equity Offerings.” Journal of Financial Economics, 50(1), 63–99.

TEOH, S., I. WELCH, AND T. WONG (1998b): “Earnings Management and the Long-term Market Performance of Initial Public Offerings.” Journal of Finance, 53(6), 1935–1974.

VOLPP, K., A. TROXEL, S. MEHTA, ET AL. (2017): “Effect of Electronic Reminders, Financial Incentives, and Social Support on Outcomes After Myocardial Infarction: The HeartStrong Randomized Clinical Trial.” JAMA Internal Medicine, 177(8), 1093–1101.

WOODFORD, M. (2012): “Inattentive Valuation and Reference-dependent Choice.” Working Paper.

Appendix

A Attention and Memory

Throughout the text we assume that a person’s memory is very flexible: any information a person doesn’t currently believe she needs to notice is not top of mind; any information she currently believes she needs to notice is top of mind. In particular, she is able to access at any time anything that happened to her, whether or not she remembered or even noticed it when it first happened. There are circumstances where such assumptions are natural (e.g., if relevant information is collected and stored in an external database). There are also circumstances where these assumptions seem plainly counterfactual. Here (and in greater detail in an earlier draft of the paper, Gagnon-Bartsch, Rabin, and Schwartzstein 2018) we explore what happens if we refine noticing strategies to account for realistic interactions between attention and memory.

Our analysis would remain unchanged if we refine noticing strategies to require that a person must notice an event as it happens in order to recall it later—e.g., a person needs to first notice a gorilla to later recall that she’s seen it. But it changes in subtle ways if we take this idea further and impose that once some piece of data goes unnoticed (i.e., it’s no longer top of mind even if noticed before), then it can no longer be noticed in the future.

Definition A.1. A noticing strategy N is memory consistent if for all t ∈ ℕ and h_t, ĥ_t ∈ H_t, ĥ_t ∈ n_t(h_t) implies (s_{t+1}, y_t, x_t; ĥ_t) ∈ n_{t+1}((s_{t+1}, y_t, x_t; h_t)) for all (s_{t+1}, y_t, x_t) ∈ S_{t+1} × Y_t × X_t.

Memory consistency says “once unnoticed, never noticed again”. That is, under memory consistency, when the agent notices a coarsening of the history today, then all further coarsenings must maintain the current one. This refinement crudely captures the idea that data is less likely to be top of mind today if it was not top of mind yesterday. Since memory consistency refines noticing strategies, if there exists an attentionally stable equilibrium under memory consistency, then there exists an attentionally stable equilibrium without requiring memory consistency.39

39 In describing the noticing strategy in Definition A.1, we implicitly assume that a person’s beliefs about her noticing plans in future periods are time consistent. It is straightforward to extend our framework to handle inconsistency.

In fact, memory consistency promotes incidental learning in some situations. To illustrate, consider a manager who in each period assigns an employee to one of two tasks, x_t ∈ {H, L}: “high importance” or “low importance”. The manager assigns tasks based on his beliefs about the employee’s ability, θ ∈ [0,1]. The employee’s output y_t ∈ {0,1} is i.i.d. conditional on θ with P(y_t = 1|θ) = θ, where y_t = 1 denotes a “successful” job in round t and θ represents the employee’s success rate. The manager has incentives to assign the “high” task on day t if and only if he currently believes the worker’s success rate, θ, exceeds 50%. Suppose the manager is overly pessimistic about the worker and the support of his misspecified model has an upper bound strictly below the true rate, θ∗. The manager need not discover this mistake without memory consistency; that is, if he can continually re-partition the full history of outcomes each round in a way that is specifically tailored to his current decision problem. In this case, the manager can follow an attentional strategy that simply notices the optimal action each round and nothing more (see Proposition 1). The optimal assignment in a single period provides but a rough sense of how often the worker has succeeded (e.g., only whether this rate exceeds 50%). This information alone is not enough to discover that θ∗ ∉ supp(π)—he would need to further notice the worker’s precise performance rate over time. Yet when h_t is always accessible, the manager has no incentive to track this seemingly superfluous data since his coarse attentional strategy seems sufficient.

These incentives change under memory consistency. In this case, if the manager were to notice nothing more than the optimal action today, then he would no longer be able to recall the exact number of good and bad performances that happened prior to today. This is because the noticing partition in any later round must continue to ignore the same details that the manager ignores today. But such an attentional strategy is not sufficient, since future decisions may rely on these details despite being irrelevant today.40 As such, a SAS under memory consistency requires the manager to notice the employee’s empirical performance rate at each point in time: this allows him to both take the optimal action today and precisely update his beliefs over θ after observing more outcomes in the future. The empirical performance rate, however, will converge to θ∗ ∉ supp(π) and consequently render the manager’s model attentionally inexplicable.

40 For instance, assigning the high task reveals only that the employee has delivered good performance in at least 50% of the previous periods, but it does not reveal by how much the employee surpassed this benchmark. Thus, noticing only this action does not tell the manager how many subsequent bad performances he should endure before re-assigning the employee to the low task. From the manager’s perspective, this attentional strategy performs worse than a strategy that pays full attention. Hence, it is not a SAS under memory consistency.

This example highlights a sense in which incidental learning under memory consistency comes from a discrepancy between the data necessary to make an optimal decision and the data necessary to precisely learn parameters. Having unlimited access to historical data allows the agent to bypass other details required for belief updating and hence limits the scope for incidental learning.
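A sketch of this example follows (the true rate of 0.7, the model bound of 0.6, and the horizon are our illustrative choices, not values from the text):

```python
import random

random.seed(2)

theta_star = 0.7      # true success rate (illustrative)
model_upper = 0.6     # upper bound of supp(pi) (illustrative)

successes = 0
T = 5000
for t in range(T):
    successes += random.random() < theta_star

rate = successes / T
assign_high = rate > 0.5  # all that the optimal action itself reveals

print(f"assign high task: {assign_high}")   # consistent with his model
print(f"empirical rate: {rate:.3f}")        # exceeds every theta <= 0.6

# Without memory consistency, a SAS can notice only `assign_high` each
# round, so the inexplicable rate is never confronted.  Under memory
# consistency, re-assignment decisions force him to carry the running
# counts, and the rate they imply falsifies his model's support.
```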

Memory consistency does not say that a person notices all information that he previously encoded. Instead, it is consistent with allowing the person to freely discard information that he previously encoded once it is no longer decision-relevant under his misspecified model. That said, there are situations where it seems likely that some data would be top of mind even when a person no longer finds it useful (e.g., immediately after an action inspired by that data). To handle such scenarios, we consider a refinement that captures the limiting situation of automatic recall (“AR”) where a person continually notices anything that he previously noticed.

Definition A.2. A noticing strategy N satisfies automatic recall (AR) if for all t ∈ ℕ and h_t, ĥ_t ∈ H_t, ĥ_t ∉ n_t(h_t) implies that (ŝ_{t+1}, ŷ_t, x̂_t; ĥ_t) ∉ n_{t+1}((s_{t+1}, y_t, x_t; h_t)) for all (s_{t+1}, y_t, x_t), (ŝ_{t+1}, ŷ_t, x̂_t) ∈ S_{t+1} × Y_t × X_t.

Automatic recall requires the person to distinguish the continuations of any two histories that were previously distinguished. For example, if a consumer considers a product’s price when deciding whether to buy it, then automatic recall says that she always remembers this price when making future decisions. Although the combination of automatic recall and memory consistency is extreme, an earlier draft (Gagnon-Bartsch, Rabin, and Schwartzstein, 2018)—along with many of the proofs below—shows that our results on when an error is attentionally stable largely continue to hold even if we impose these refinements.

B Supplemental Results

B.1 Basic Result Under Full Attention

In this section, we formally state the results under full attention that we sketched in Section 4.4. These results serve as a “full-attention benchmark” against which we contrast outcomes under channeled attention. (See Appendix D for proof.) Recall that we assume nature draws θ∗ from π∗ and probabilistic statements are with respect to the true distribution conditional on θ∗.

Under Assumption 1, let D(θ∗‖λ) ≡ min_{θ∈supp(λ)} D(θ∗‖θ), where D(θ∗‖θ) is the Kullback-Leibler divergence of P(·|θ) from P(·|θ∗), and define ∆D(θ∗‖λ, π) ≡ D(θ∗‖λ) − D(θ∗‖π) as the degree to which π better explains observations than λ.41

41 The Kullback-Leibler divergence is given by

D(θ∗‖θ) = ∑_{y∈Y} P(y|θ∗) log [P(y|θ∗)/P(y|θ)].   (B.1)

Observation B.1. Suppose Assumptions 1 and 2 hold, the environment is stationary, and D(θ∗‖λ) or D(θ∗‖π) is finite.

1. Theory π is explicable with respect to λ and a full-attention SAS if (i) ∆D(θ∗‖λ, π) > 0 or (ii) ∆D(θ∗‖λ, π) = 0 and θ∗ ∈ supp(λ) ∪ supp(π).

2. Theory π is inexplicable with respect to λ and a full-attention SAS if ∆D(θ∗‖λ, π) < 0.
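To see how Observation B.1 operates, the sketch below evaluates the divergence in (B.1) for Bernoulli outcomes, using the dividend example from Section 7.2; the comparison theory λ is our illustrative choice:

```python
from math import log

def kl_bernoulli(p_true: float, p_model: float) -> float:
    """D(p_true || p_model) for Bernoulli outcome distributions, as in (B.1)."""
    return (p_true * log(p_true / p_model)
            + (1 - p_true) * log((1 - p_true) / (1 - p_model)))

p_true = 0.75                                  # true high-dividend rate
d_pi = min(kl_bernoulli(p_true, p) for p in (0.9, 0.2))   # over supp(pi)
d_lambda = kl_bernoulli(p_true, 0.75)          # lambda contains the truth

delta = d_lambda - d_pi
print(f"D(theta*||pi) = {d_pi:.4f}, D(theta*||lambda) = {d_lambda:.4f}")
print(f"Delta-D = {delta:.4f} < 0: pi is inexplicable w.r.t. lambda")
```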

B.2 Preference-Independent Attentional Explicability

This section formalizes the results discussed in Section 6 (e.g., Table 1). The following propositions are essentially corollaries of Lemma 1. See the main text for intuitive descriptions of these results and Appendix D for proofs.

Following our presentation in Section 6, we first consider two classes of models that are PIAE.

“Censored” models that ignore possible outcomes. We define “censored models” as follows.

Definition B.1. Model π is censored if Y(π) ⊂ Y(θ∗) and there exists θ ∈ supp(π) such that for all y ∈ Y(θ), P(m_π(y)|θ) = P(m_π(y)|y ∈ Y(θ), θ∗).

Proposition B.1. Suppose the assumptions underlying Lemma 1 hold. If π is censored, then π is PIAE.

Note that the assumption that a censored model correctly explains anticipated outcomes—i.e., for all y ∈ Y(θ), P(m_π(y)|θ) = P(m_π(y)|y ∈ Y(θ), θ∗)—is stronger than necessary for PIAE. As we show in the proof, this extra assumption ensures that π is additionally PIAE under automatic recall (see Definition A.2). Thus, Y(π) ⊂ Y(θ∗) alone is enough to imply that π is PIAE.

Models that neglect predictive signals. We define models with “predictor neglect” as follows.

Definition B.2. Consider an environment where y_t = (r_t, s_t^1, …, s_t^K) in each round, and for all θ ∈ supp(π) ∪ {θ∗}, P(s_t, r_t|θ) = P(r_t|s_t, θ)P(s_t), where s_t ≡ (s_t^1, …, s_t^K). That is, due to uncertainty over θ, the person may be uncertain about how s predicts r, but she is certain about the frequency of s since it is independent of θ. Model π exhibits predictor neglect if there exists J ∈ {0, …, K−1} such that for all θ ∈ supp(π), P(r_t|s_t, θ) is independent of (s_t^{J+1}, …, s_t^K).

Proposition B.2. Suppose the assumptions underlying Lemma 1 hold. If π exhibits predictor neglect and there exists some θ ∈ supp(π) such that P(r_t|s_t^1, …, s_t^J, θ) = P(r_t|s_t^1, …, s_t^J, θ∗) for all possible (r, s^1, …, s^J) under θ∗, then π is PIAE.

Intuitively, the person feels free to ignore any information on the “neglected” signals (s^{J+1}, …, s^K). Therefore, so long as there exists some θ within the person’s model that can explain the joint distribution over (r, s^1, …, s^J), the model is PIAE.

Next, we consider three classes of models that are not PIAE.

Uncertain models that correctly specify the set of outcomes but incorrectly specify their probabilities. Sufficient uncertainty induces incidental learning when the misspecified theory correctly predicts which outcomes are possible but incorrectly specifies the probabilities of those outcomes. The next definition describes uncertain environments where no two observations lead to the same beliefs over parameters.

bilities. Sufficient uncertainty induces incidental learning when the misspecified theory correctlypredicts which outcomes are possible but incorrectly specifies the probabilities of those outcomes.The next definition describes uncertain environments where no two observations lead to the samebeliefs over parameters.

Definition B.3. For any π, we say the family of distributions {P(·|θ)}_{θ∈supp(π)∪{θ∗}} satisfies the Varying Likelihood Ratio Property (VLRP) if for all y, y′ ∈ Y(π) and all θ, θ′ ∈ supp(π) ∪ {θ∗}, P(y|θ)/P(y′|θ) = P(y|θ′)/P(y′|θ′) if and only if y = y′ or θ = θ′.42

42 The VLRP condition is a generalization of the more familiar monotone likelihood ratio property (MLRP), as it does not require supp(π) ∪ {θ∗} to be ordered. VLRP therefore holds for any family of distributions that satisfies (strict) MLRP.


Whenever supp(π) contains at least two elements, VLRP implies that the person finds it necessary to separately notice every outcome in order to learn θ; that is, m_π(y) is a singleton for each y ∈ Y(π).

Proposition B.3. Suppose the assumptions underlying Lemma 1 hold and, in addition, VLRP holds with Y(π) = Y(θ∗). If supp(π) has at least two elements and θ∗ ∉ supp(π), then π is not PIAE.

“Overly elaborate” models that anticipate too wide a range of outcomes. We define overly-elaborate models as follows.

Definition B.4. Model π is overly elaborate if Y(π) ⊃ Y(θ∗) and there exists y ∈ Y(π) such that m_π(y) ∩ Y(θ∗) = ∅ with P(m_π(y)|θ) > 0 for all θ ∈ supp(π).

Proposition B.4. Suppose the assumptions underlying Lemma 1 hold. If π is overly elaborate, then π is not PIAE.

“Over-fit” models that assume the set of predictive signals is wider than it truly is. We provide a definition of “over-fit” models within the same class of environments in which we considered predictor neglect, above. Over-fit models can be seen as a counterpoint to those with predictor neglect.

Definition B.5. Consider the environment introduced above in Definition B.2: in each round, y_t = (r_t, s_t^1, …, s_t^K) and, for all θ ∈ supp(π) ∪ {θ∗}, P(s_t, r_t|θ) = P(r_t|s_t, θ)P(s_t), where s_t ≡ (s_t^1, …, s_t^K). Model π is over-fit if it has the following properties:

1. There exists J ∈ {0, …, K−1} such that in truth P(r_t|s_t, θ∗) is independent of (s_t^{J+1}, …, s_t^K). That is, for all s, s̄ ∈ S such that (s^1, …, s^J) = (s̄^1, …, s̄^J) and (s^{J+1}, …, s^K) ≠ (s̄^{J+1}, …, s̄^K), P(r|s, θ∗) = P(r|s̄, θ∗).

2. For all θ ∈ supp(π), P(r_t|s_t, θ) depends on both (s_t^1, …, s_t^J) and (s_t^{J+1}, …, s_t^K). That is, for all s, s̄ ∈ S such that s ≠ s̄, P(r|s, θ) ≠ P(r|s̄, θ).

3. Signals (s^{J+1}, …, s^K) are useful for updating under model π. That is, there exist s, s̄ ∈ S such that (s^1, …, s^J) = (s̄^1, …, s̄^J), (s^{J+1}, …, s^K) ≠ (s̄^{J+1}, …, s̄^K), and (r, s̄) ∉ m_π((r, s)) for some resolution r where (r, s), (r, s̄) ∈ Y(π).

To summarize, over-fit models are certain that some useless signals help predict outcomes (properties 1 and 2), yet exhibit some uncertainty about the extent to which they help (property 3).

Proposition B.5. Suppose the assumptions underlying Lemma 1 hold. If π is over-fit, then π is not PIAE.

Intuitively, there exist choice environments where the person seeks to learn the extent to which signals (s^{J+1}, …, s^K) predict outcomes, and attending to these signals would eventually prove π false.


B.3 How Objectives Influence Attentional Stability: Additional Results

Definition B.6. Fix a model π and an outcome environment (×_{t=1}^∞ Y_t, Θ, P, π∗). Say that a choice environment (×_{t=1}^∞ X_t, ×_{t=1}^∞ u_t) creates incentives for a person to keep track of what she’s seen and not just what she’s learned when only keeping track of the minimal sufficient statistic for updating beliefs is not a SAS. Otherwise, say that the choice environment creates incentives for a person to only keep track of what she’s learned.

An example of an environment that creates these incentives is one that requires the person to accurately report back h_t or statistics of h_t. As we show next, these incentives can induce the discovery of a PIAE error: while we define a PIAE error as stable across all stationary environments that meet Assumptions 1 and 2, it need not be stable more broadly.

Corollary B.1. Consider a model π and any outcome environment (×_{t=1}^∞ Y_t, Θ, P, π∗) in which π is inexplicable under a full-attention SAS. If π is PIAE but not S-PIAE, then (i) there exists an attentionally stable equilibrium in every choice environment that creates incentives for a person to only keep track of what she’s learned, but (ii) there is no attentionally stable equilibrium in some choice environment that creates incentives for a person to keep track of what she’s seen and not just what she’s learned.

This corollary provides a sense in which a person is more likely to wake up to errors in environments that create incentives for her to keep track of what she’s seen and not just what she’s learned.

C Application Details

This section provides details for the applications discussed in Section 5.

C.1 Redundancy Neglect

This application considers an agent who neglects the correlated nature of others’ advice (as in DeMarzo, Vayanos, and Zwiebel 2003; Eyster and Rabin 2010; or Enke and Zimmermann 2019). In each period, the agent encounters a problem that has a solution dependent on a binary state that is i.i.d. over time. For instance, the state may be the optimal way to resolve a problem at work, and new problems crop up over time. Denote the state in period t by ω_t ∈ {0,1}. We suppose that ω_t is i.i.d. across periods with P(ω_t = 1) = .5. In each period t, the person receives signals about ω_t from 2 information sources, denoted s_t = (s_t^1, s_t^2) ∈ {0,1}². These signals, for instance, may be colleagues’ advice on how to resolve a problem. Let θ_i ≡ P(s_t^i = ω_t|ω_t) denote the precision of signals from source i ∈ {1,2}, and suppose θ_i ∈ {.5, γ} for some γ ∈ (.5, 1). That is, an information source is either uninformative (i.e., θ_i = .5) or partially informative (i.e., θ_i = γ). The person’s objective each period is to choose x_t ∈ [0,1] to maximize −(x_t − ω_t)². Thus, the optimal x_t matches her subjective probability that ω_t = 1 conditional on s_t.

Suppose the two information sources are in fact perfectly correlated: s_t^1 = s_t^2 for all t. For instance, one colleague has good intuition (or access to better information), while the second colleague has none and simply repeats (or “apes”) the first.43 We explore the stability of a misspecified model where the agent treats these two information sources as independent, thereby thinking she receives two informative signals each period instead of one. We also describe some features of the environment that can help or hinder the discovery of this error.

43 Notwithstanding the usual meaning of “ape”, here we also mean to invoke the intelligence of corvids and great apes: the colleague doing the imitating is presumably behaving wisely—he is, in fact, doing exactly what the agent is doing (learning from better informed colleagues).

To specify the agent’s erroneous model more precisely, let θ = (θ_1, θ_2, θ_c) denote the parameter governing the agent’s signals. As noted above, θ_i for i ∈ {1,2} is the precision of source i. Additionally, θ_c ∈ {0,1} parameterizes the correlation in information sources: θ_c = 0 denotes independence and θ_c = 1 denotes perfect correlation. The agent’s misspecified model π puts probability one on θ_c = 0.

The first feature of the environment that matters for the discovery of this error is whether the agent feels compelled to learn about the precision of her information sources. For instance, a salesperson may encounter new clients each period and must decide how to pitch her product. She takes into account client-specific advice from her colleagues, but she may also need to learn which of them gives good advice. Or, for another example, a new professor teaches different lectures each day and learns over time which colleagues give good advice on how to lead those classes.

The second feature concerns the feedback available to the agent. We say the environment has perfect feedback when the agent observes the current state at the end of each period. That is, r_t = ω_t for all t. For instance, the worker can readily assess whether her colleagues’ advice was good or bad. In contrast, we say the environment has no feedback when r_t = ∅ for all t. In such cases, we assume the agent does not observe the state nor receive utility until the end of the game. For instance, a teacher only gets noisy feedback about whether classes go well or poorly—some of this uncertainty is resolved only when she receives course evaluations. While these two cases are extreme, they adequately reveal how less feedback can enable incidental learning.

Proposition C.1. In the setting above, consider any misspecified model π that puts probability 1 on θ_c = 0 (independent signals).

1. Perfect feedback (i.e., r_t = ω_t for all t): Model π is part of an attentionally stable equilibrium if and only if π is dogmatic about the precision of one or both information sources.

2. No feedback (i.e., r_t = ∅ for all t): Model π is part of an attentionally stable equilibrium if and only if π is dogmatic about both information sources or dogmatic that one information source is uninformative.

This result reveals two ways in which uncertainty enables the incidental discovery of redundancy neglect. The first part says that if the person receives perfect feedback, then her erroneous “independent-signals” model is part of an attentionally stable equilibrium if and only if she knows the precision of one or both information sources. If she is certain about one of the sources, then she can ignore feedback about it. Under this SAS there is no way to notice the correlation between sources. However, if she is uncertain about the precision of both, then tracking how often each gives good advice will cause her to incidentally notice that their advice matches the state at an inexplicably similar rate.

The second part says that if the person receives no feedback, then the “independent-signals” model is part of an attentionally stable equilibrium if and only if she is dogmatic about the precision of both information sources. If she is uncertain about the precision of at least one information source, then she will learn its precision by tracking how often the two sources give the same advice: precise advice is more likely to be similar than noisy advice. To provide some intuition, suppose the worker is dogmatic that Colleague 1 has high precision yet is uncertain about the precision of Colleague 2. In this case, even without feedback on whether s_t^1 is correct, the mere event of agreement between colleagues would be good news about the quality of Colleague 2’s advice.
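The following simulation sketch (with illustrative γ = 0.8 and horizon) shows the statistics at issue: the two sources’ accuracies coincide period by period, and they agree far more often than any pair of independent signals could:

```python
import random

random.seed(3)

gamma, T = 0.8, 5000   # true precision of the informed colleague (illustrative)
match1 = match2 = agree = 0
for _ in range(T):
    omega = random.random() < 0.5
    s1 = omega if random.random() < gamma else (not omega)
    s2 = s1                        # colleague 2 simply repeats colleague 1
    match1 += (s1 == omega)        # feedback on colleague 1's accuracy
    match2 += (s2 == omega)        # feedback on colleague 2's accuracy
    agree += (s1 == s2)

print(f"accuracy of colleague 1: {match1 / T:.3f}")
print(f"accuracy of colleague 2: {match2 / T:.3f}")  # identical, every period
print(f"agreement rate:          {agree / T:.3f}")   # always 1

# Under independence with precisions near 0.8, agreement should occur at
# rate roughly 0.8^2 + 0.2^2 = 0.68.  An agent uncertain about the
# precisions tracks exactly these frequencies, so perpetual agreement (or
# perfectly matching accuracies) is the data that proves her model wrong.
```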

C.2 Empirical Misconceptions of Asset Patterns

In this section we apply our framework to a class of misspecified beliefs that include the well-known model by Barberis, Shleifer, and Vishny (“BSV”, 1998). Consider a single security that pays out 100% of its earnings as dividends. In truth, the earnings process, denoted (M_t), follows a random walk: M_t = M_{t−1} + y_t, where y_t ∈ {−y, y} for some y > 0 and M_0 = 0. However, a risk-neutral representative investor holds wrong beliefs about the process, believing that it is in one of two “regimes” at any point in time: in one regime earnings are mean reverting, and in the other they trend. We show that the propensity for a market participant to eventually recognize this error will depend on the scope of her objective.

More formally, each agent's misspecified model posits an underlying state of the economy ω_t ∈ {1, 2} that determines how y_t is drawn in round t. If ω_t = 1, then y_t follows a mean-reverting Markov model; if ω_t = 2, then y_t follows a trending Markov model. The presumed transition probabilities in each of the two states are given in the table below, where θ_L ≤ .5 ≤ θ_H.

Table 2: Misspecified Models of Earnings Growth (BSV, 1998)

(a) The Mean-Reverting Process (Regime 1)

    ω_t = 1        y_{t−1} = y    y_{t−1} = −y
    y_t = y           θ_L            1 − θ_L
    y_t = −y        1 − θ_L           θ_L

(b) The Trending Process (Regime 2)

    ω_t = 2        y_{t−1} = y    y_{t−1} = −y
    y_t = y           θ_H            1 − θ_H
    y_t = −y        1 − θ_H           θ_H

Agents also believe the state ω_t follows a Markov process with the following transition matrix, where rows index the current state ω_{t−1} and columns the next state ω_t (so θ_1 and θ_2 are the probabilities of exiting regimes 1 and 2, respectively):

                   ω_t = 1      ω_t = 2
    ω_{t−1} = 1    1 − θ_1       θ_1
    ω_{t−1} = 2     θ_2         1 − θ_2

with θ_1, θ_2 ∈ (0, 1). In reality, θ*_L = θ*_H = 1/2. An agent's misspecified model π over θ = (θ_L, θ_H, θ_1, θ_2) places no weight on θ with θ_L = θ_H = 1/2. This nests the BSV misspecification, where an agent is convinced of all the transition probabilities above, and thus her misspecified model π is degenerate on some θ = (θ_L, θ_H, θ_1, θ_2) with θ_L < 1/2 < θ_H.

Earnings are observable, and agents use this information to update their beliefs about the current state and future earnings. Assuming risk neutrality and a discount rate ρ, the equilibrium price in period t is p_t = E_t[V_t], where V_t ≡ ∑_{k=0}^∞ M_{t+k}/(1+ρ)^k is the discounted value of all future earnings and E_t denotes expectations under π conditional on (y_{t−1}, . . . , y_1) (see Footnote 44).

Consider an analyst who provides a buy/sell recommendation to a price-taking, risk-neutral investor. In a given period t, the analyst recommends "buy" (x_t = 1) if E_t[V_t] ≥ p_t and "sell" (x_t = 0) otherwise. The analyst may additionally need to make precise predictions about future earnings to back up her buy/sell recommendation. In this case, she announces her prediction of the current period's earnings, denoted M̂_t, just before they are realized, and her payoff from predicting earnings is −(M̂_t − M_t)². We assume this is additively separable from her incentive to make an accurate recommendation. The optimal forecast is E_t[M_t].
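To see the differing attentional demands concretely, the following sketch (ours; the parameter values are hypothetical and not a calibration from BSV or this paper, and we adopt the switching-probability reading of the transition matrix above) runs the agent's regime filter on truly random-walk earnings. The one-step-ahead forecast is a fine-grained function of the regime probability q_t, whereas the buy/sell recommendation requires only a coarse comparison.

```python
# A minimal sketch, assuming the misspecified regime model above with
# hypothetical parameters; in truth earnings changes are i.i.d. +/- y.
import random

theta_L, theta_H = 0.3, 0.7  # believed continuation probabilities in regimes 1, 2
lam1, lam2 = 0.1, 0.3        # believed probabilities of exiting regimes 1 and 2

def update(q, same_direction):
    """One filtering step: q is P(regime 1) before observing the change y_t."""
    like1 = theta_L if same_direction else 1 - theta_L  # likelihood under regime 1
    like2 = theta_H if same_direction else 1 - theta_H  # likelihood under regime 2
    post = q * like1 / (q * like1 + (1 - q) * like2)    # P(regime 1 | data so far)
    return post * (1 - lam1) + (1 - post) * lam2        # next period's regime prior

random.seed(1)
q, prev_y, M = 0.5, 1, 0
for _ in range(10):
    y = random.choice([1, -1])  # the true random walk (y normalized to 1)
    q = update(q, y == prev_y)
    prev_y, M = y, M + y
    p_cont = q * theta_L + (1 - q) * theta_H  # P(next change repeats y_t)
    print(f"q_t = {q:.3f},  one-step forecast of M = {M + prev_y*(2*p_cont - 1):+.3f}")
```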

Proposition C.2. For any misspecified model π over θ, the analyst is more likely to wake up when she needs to provide predictions to back up her recommendations than when she does not need to make such predictions: fixing π, any noticing strategy that is part of a SAS for the analyst who needs to provide predictions is also part of a SAS for the analyst who does not need to provide predictions, but not vice-versa. Consequently, if there is an attentionally stable equilibrium for the analyst who needs to provide predictions, there is also an attentionally stable equilibrium for the analyst who does not need to provide predictions.

44 The timing is such that p_t denotes the price in period t prior to the period-t earnings realization; p_t is thus based on (y_{t−1}, . . . , y_1).


Waking up to an error depends on the scope of an individual's objective. The buy/sell recommendation requires only a coarse view of the data, thereby permitting a false belief like the BSV model to persist. A precise prediction about earnings, on the other hand, requires detailed attention to patterns in the data, which is more likely to force the agent to confront inconsistencies that reveal her misspecified beliefs. Take the BSV misspecification as an example. Following the logic of Proposition 1, the analyst who only needs to make a recommendation can coarsely attend to the data so that she notices only whether she should recommend buy or sell. At any date t, her minimal noticing partition N_t contains just two elements: n^t_B = {h^t ∈ H^t | E_t[V_t] ≥ p_t} and n^t_S = {h^t ∈ H^t | E_t[V_t] < p_t}. In each period, she essentially notices a binary signal ("recommend buy" or "recommend sell"), and these coarse signals alone are not enough to reveal her error. The analyst who needs to make precise predictions, on the other hand, must parse the earnings history more finely. Her expectation of the next period's earnings is a function of the probability that the current regime is mean reverting. Letting q_t denote this probability, the analyst must partition the history of outcomes in a way that precisely distinguishes q_t. This value, however, gives a very good sense of the sequence of earnings innovations that must have happened prior to period t. As such, the analyst must confront the history of outcomes by noticing q_t and is therefore more likely to reject the misspecified model based on her noticed data. We conjecture (but have not yet proven) that there is no attentionally stable equilibrium for an analyst who holds the misspecified BSV model and needs to offer earnings predictions.

D Proofs

D.1 Proofs of Results in the Main Text

Proof of Proposition 1. Let N^F be the full-attention noticing strategy and let σ^F be a pure behavioral strategy such that φ^F = (N^F, σ^F) is a SAS. For each h^t, let x*(h^t) ∈ X_t denote the action to which σ^F(h^t) assigns probability one. Define N such that for all t,

$$n^t(h^t)\equiv\big\{\tilde h^t\in H^t \mid x^*(\tilde h^t)=x^*(h^t)\big\}.$$

Let σ be a behavioral strategy where, for all t, σ_t(n^t(h^t)) places probability one on x*(h^t); φ = (N, σ) is therefore a SAS.

Furthermore, φ is minimal. To see this, consider any Ñ that is a coarsening of N. Then Ñ must include some partition Ñ_t with an element ñ^t with the following property: there exist at least two elements n^t and n̄^t of N_t such that ñ^t ∩ n^t ≠ ∅ and ñ^t ∩ n̄^t ≠ ∅. However, by construction, all h^t ∈ n^t prescribe the same optimal action under π, and this action is distinct from the one prescribed by all h^t ∈ n̄^t. Thus, Ñ is not part of a SAS given π since ñ^t induces a suboptimal action with positive probability under π. □
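The construction in this proof is mechanical: pool histories by the action they prescribe. The toy sketch below (ours; the histories and action labels are hypothetical) makes that explicit.

```python
# A minimal sketch of the Proposition 1 construction: the coarsest sufficient
# partition pools histories that prescribe the same full-attention optimal action.
from collections import defaultdict

x_star = {"h1": "buy", "h2": "sell", "h3": "buy", "h4": "sell"}  # hypothetical map

cells = defaultdict(set)
for h, action in x_star.items():
    cells[action].add(h)  # n_t(h) = all histories sharing h's optimal action

print(list(cells.values()))  # e.g. [{'h1', 'h3'}, {'h2', 'h4'}] (set order varies)
# Any strict coarsening must merge histories with distinct optimal actions and so
# induces a suboptimal action with positive probability; it cannot be a SAS.
```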

Proof of Proposition 2. Consider the setup from Section 5.1, and consider any misspecified model π such that q̄ < 1, where q̄ = max supp(π). There are two relevant types of periods in this application: (1) periods where the person decides whether to buy (or renew) the membership, and (2) periods where the person decides whether to visit the gym (assuming he has a membership). We assume these types of periods are mutually exclusive. (This is inconsequential but simplifies exposition.)

(⇐) Suppose (i) π is non-degenerate and (ii) the person has an incentive to track her precise willingness to pay for a gym membership. In this case, in each period t where the person decides whether to buy a membership, N_t has exactly t + 1 elements, one for each possible realization of ∑_{k=1}^t 1{β_k = β} under π (see the discussion in Footnote 29). In reality β_k = β in every period. Thus, for all h^t, n^t(h^t) perfectly reveals that β_k = β for all k ≤ t. Thus,

$$\lim_{t\to\infty}\frac{P(n^t(h^t)\mid\pi)}{P(n^t(h^t)\mid\pi^*)}=\lim_{t\to\infty}P\left(\sum_{k=1}^{t}\mathbf{1}\{\beta_k=\beta\}=t\ \middle|\ \pi\right)\le\lim_{t\to\infty}\bar q^{\,t}=0,$$

where the first equality follows from the fact that P(n^t(h^t)|π*) = P(∑_{k=1}^t 1{β_k = β} = t | π*) = 1 when π* is degenerate on the true parameter, q* = 1.

(⇒) We now show that conditions (i) and (ii) are necessary for there to exist no attentionally stable equilibrium given π. If (i) fails to hold, then a minimal SAS is such that (1) in each period t where the person decides whether to buy a membership, N_t is a singleton; (2) in each period t where the person decides whether to visit the gym, N_t distinguishes only whether or not c_t/β_t > b. Since the person does not notice any details of the history beyond his current sentiment about going to the gym, this SAS constitutes an attentionally stable equilibrium given π. (More precisely, liminf_{t→∞} P(c_t/β_t > b|π)/P(c_t/β_t > b|π*) > 0 and liminf_{t→∞} P(c_t/β_t ≤ b|π)/P(c_t/β_t ≤ b|π*) > 0 since the event c_t/β_t > b has positive probability under both models and these probabilities are constant for all t.)

Now suppose that (i) holds but (ii) does not. As above, in each period t where the person decides whether to visit the gym, N_t distinguishes only whether or not c_t/β_t > b, and thus this data alone cannot render π attentionally inexplicable. Consider instead periods t where the person decides whether to buy the membership. The only interesting case is where Condition 1 holds for some q ∈ supp(π) but not all (otherwise the person is subjectively certain about the optimal membership decision). As noted in the text in Section 5.1 (details in Footnote 29), each noticing partition (of a minimal SAS) consists of at most two elements in this case: letting V(q) denote the right-hand side of (1), where E_t is with respect to π conditional on h^t, the two elements are n^t_B ≡ {h^t ∈ H^t | E_t[V(q)] ≥ m}, which contains all histories following which it is optimal to buy the contract, and n^t_R ≡ {h^t ∈ H^t | E_t[V(q)] < m}, which contains all histories following which it is optimal to reject. Note that lim_{t→∞} P(n^t_B|π)/P(n^t_B|π*) > 0 and lim_{t→∞} P(n^t_R|π)/P(n^t_R|π*) > 0, since P(n^t_x|π) is bounded away from 0 as t → ∞ given the assumption that Condition 1 holds for some q ∈ supp(π) but not all. □

The following lemma (and remark) will be useful in establishing explicability in many of theproofs to follow.

Lemma D.1. Assume Assumptions 1 and 2 hold, and that the environment is stationary. Enumerate Y arbitrarily by Y = {y^1, . . . , y^N} and suppose the true parameter is θ*. For any θ, θ′ ∈ Θ, define

$$Z(\theta,\theta'\mid\theta^*)\equiv\prod_{n=1}^{N}\left(\frac{P(y^n\mid\theta)}{P(y^n\mid\theta')}\right)^{P(y^n\mid\theta^*)}.\tag{D.1}$$

1. If Z(θ, θ′|θ*) < 1, then the likelihood ratio P(y^t|θ)/P(y^t|θ′) → 0 almost surely.

2. If Z(θ, θ′|θ*) > 1, then P(y^t|θ)/P(y^t|θ′) → ∞ almost surely.

3. If Z(θ, θ′|θ*) = 1 and θ or θ′ equals θ*, then P(y^t|θ)/P(y^t|θ′) → 1 almost surely.

Proof of Lemma D.1. For any y^t = (y_{t−1}, . . . , y_1), let k_n(y^t) be the number of times outcome y^n happens prior to round t. That is, k_n(y^t) ≡ ∑_{τ=1}^{t−1} 1{y_τ = y^n}, and thus k_n(y^t)/t → P(y^n|θ*) almost surely by the Strong Law of Large Numbers (SLLN). Then

$$\frac{P(y^t\mid\theta)}{P(y^t\mid\theta')}=\frac{\prod_{n=1}^{N}P(y^n\mid\theta)^{k_n(y^t)}}{\prod_{n=1}^{N}P(y^n\mid\theta')^{k_n(y^t)}}=\left(\frac{\prod_{n=1}^{N}P(y^n\mid\theta)^{k_n(y^t)/t}}{\prod_{n=1}^{N}P(y^n\mid\theta')^{k_n(y^t)/t}}\right)^{t}=(Z_t)^t,\tag{D.2}$$

where

$$Z_t\equiv\prod_{n=1}^{N}\left(\frac{P(y^n\mid\theta)}{P(y^n\mid\theta')}\right)^{k_n(y^t)/t}.\tag{D.3}$$

If Z(θ, θ′|θ*) < 1, fix any Z̄ ∈ (Z(θ, θ′|θ*), 1). On the event {lim_{t→∞} Z_t = Z(θ, θ′|θ*)}, which occurs with probability one by construction, there exists T ∈ N such that for all t > T, Z_t < Z̄ < 1. It follows that limsup_{t→∞} (Z_t)^t ≤ lim_{t→∞} (Z̄)^t = 0 on this event, and thus (Z_t)^t → 0 almost surely. A similar argument holds for the case where Z(θ, θ′|θ*) > 1. Finally, if Z(θ, θ′|θ*) = 1 and θ or θ′ equals θ*, then P(·|θ) = P(·|θ′) by Gibbs' inequality. This implies that Z_t = 1 for all t and thus P(y^t|θ)/P(y^t|θ′) = 1 for all t. □

Remark D.1. Note that ln Z(θ, θ′|θ*) = D(θ*‖θ′) − D(θ*‖θ), where D is the KL divergence defined in Equation B.1. Hence, the three conditions of Lemma D.1 are equivalent to (i) D(θ*‖θ′) < D(θ*‖θ), (ii) D(θ*‖θ′) > D(θ*‖θ), and (iii) D(θ*‖θ′) = D(θ*‖θ) and θ or θ′ equals θ*.
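The identity in Remark D.1 is easy to check numerically. The sketch below (ours; the three-outcome distributions are hypothetical) verifies it and illustrates which case of Lemma D.1 applies.

```python
# A minimal sketch checking Remark D.1 on a hypothetical three-outcome example:
# ln Z(theta, theta' | theta*) = D(theta*||theta') - D(theta*||theta).
import math

p_star  = [0.5, 0.3, 0.2]  # P(.|theta*), the true distribution
p_theta = [0.4, 0.4, 0.2]  # P(.|theta)
p_prime = [0.6, 0.2, 0.2]  # P(.|theta')

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

log_Z = sum(ps * math.log(pt / pp) for ps, pt, pp in zip(p_star, p_theta, p_prime))
print(log_Z, kl(p_star, p_prime) - kl(p_star, p_theta))  # identical values
# Here log_Z > 0 (Z > 1): theta is strictly closer to theta* in KL divergence,
# so P(y^t|theta)/P(y^t|theta') -> infinity a.s. (Part 2 of Lemma D.1).
```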


Proof of Lemma 1. We begin by proving a similar result for the case of memory consistency (MC)and automatic recall (AR) defined in Definitions A.1 and A.2.

Lemma D.2. Assume Assumptions 1 and 2 hold, and that the environment is stationary. Further suppose S is a singleton and P(y|θ) ∈ (0, 1) for all (y, θ) ∈ Y(π) × supp(π). Then the theory π is PIAE under memory consistency and automatic recall given θ* if and only if there exists θ ∈ supp(π) such that

$$P(m_\pi(y)\mid\theta)\ \ge\ P(m_\pi(y)\mid\theta^*)\quad\forall y\in Y(\pi).\tag{D.4}$$

Proof of Lemma D.2: (⇐) For any (X, u), the person believes it is sufficient to record m_π(y_t) each period since this is sufficient for updating beliefs about θ (given S is a singleton). There are two cases to consider depending on whether the support of outcomes under the misspecified model, Y(π), matches the true support of outcomes, Y(θ*):

1. Suppose Y(π) = Y(θ*). This implies straightforwardly that condition (D.4) holds only if it holds with equality: P(m_π(y)|θ) = P(m_π(y)|θ*) ∀y ∈ Y(π). Under this condition, such a noticing strategy (i.e., recording m_π(y_t) each period) is part of an attentionally stable equilibrium. To see this, consider any history of outcomes y^t ∈ Y(π)^{t−1}, and, slightly abusing notation, denote the corresponding noticed history by n^t = (m_π(y_{t−1}), . . . , m_π(y_1)). The model π is attentionally explicable if for some θ ∈ supp(π), liminf_{t→∞} P(n^t|θ)/P(n^t|θ*) > 0 with probability 1 given θ*. Since Y(π) is finite, enumerate the elements of m_π(·) by {m^1_π, . . . , m^N_π}. For any y^t ∈ Y(π)^{t−1}, let k_n(y^t) ≡ ∑_{τ=1}^{t−1} 1{y_τ ∈ m^n_π} denote the count of entries in y^t that are in m^n_π. Then for any θ ∈ supp(π),

$$\frac{P(n^t(y^t)\mid\theta)}{P(n^t(y^t)\mid\theta^*)}=\frac{\prod_{n=1}^{N}P(m^n_\pi\mid\theta)^{k_n(y^t)}}{\prod_{n=1}^{N}P(m^n_\pi\mid\theta^*)^{k_n(y^t)}}=\left(\frac{\prod_{n=1}^{N}P(m^n_\pi\mid\theta)^{k_n(y^t)/t}}{\prod_{n=1}^{N}P(m^n_\pi\mid\theta^*)^{k_n(y^t)/t}}\right)^{t}.\tag{D.5}$$

This likelihood ratio is identical to the one considered in Lemma D.1 except that the "noticed" outcome space in this case is {m^1_π, . . . , m^N_π} rather than Y(π). Since P(m_π(y)|θ) = P(m_π(y)|θ*) ∀y ∈ Y(π), the ratio in (D.5) converges to 1 in t, and hence π is part of an attentionally stable equilibrium irrespective of (X, u). Additionally, it is straightforward that this noticing strategy satisfies MC and AR: note that, for all t,

$$n^t(y^t)=\big\{\tilde y^t\in Y(\pi)^{t-1}\mid m_\pi(\tilde y_\tau)=m_\pi(y_\tau)\ \forall\tau=1,\dots,t-1\big\}.\tag{D.6}$$

Thus, for all y_t ∈ Y(π), if h̃^t ∈ n^t(h^t), then the continuation history (y_t, h̃^t) = (y_t, ỹ_{t−1}, . . . , ỹ_1) ∈ n^{t+1}((y_t, h^t)), which implies memory consistency. Furthermore, if h̃^t ∉ n^t(h^t), then there must exist some τ < t such that m_π(ỹ_τ) ≠ m_π(y_τ). This implies that, for all y_t, ỹ_t ∈ Y(π), the continuation history (ỹ_t, h̃^t) = (ỹ_t, ỹ_{t−1}, . . . , ỹ_1) ∉ n^{t+1}((y_t, h^t)), and hence AR holds.

2. Suppose Y(π) ≠ Y(θ*). Thus, condition (D.4) need not hold with equality. The case of equality is handled above. To handle the case without equality, suppose there exists θ ∈ supp(π) such that P(m_π(y)|θ) ≥ P(m_π(y)|θ*) ∀y ∈ Y(π) with a strict inequality for some y ∈ Y(π). As such, the support Y(π) must exclude at least one outcome in Y(θ*), so the set Y^0 ≡ Y(θ*) \ Y(π) is non-empty. Let P^0 ≡ ∑_{y∈Y^0} P(y|θ*). Again enumerate the elements of m_π(·) as {m^1_π, . . . , m^N_π}. We will construct an alternative collection of sufficient statistics over Y(π) ∪ Y^0 for each time period t, denoted {m̃^{(1,t)}_π, . . . , m̃^{(N,t)}_π}, such that P(m̃^{(n,t)}_π|θ) = P(m̃^{(n,t)}_π|θ*) ∀n = 1, . . . , N and ∀t ∈ N. Suppose the person merges y ∈ Y^0 with elements of a partition over Y(π) according to a randomizing device governed by discrete i.i.d. random variables z_t with support Z = {1, . . . , N} and mass function P(z_t = n) = [P(m^n_π|θ) − P(m^n_π|θ*)]/P^0. We augment the observation space to Y × Z. Then for all t and all outcomes (y_t, z_t), define m̃^{(n,t)}_π by

$$y_t\in \tilde m^{(n,t)}_\pi\ \Longleftrightarrow\ \big(y_t\in m^n_\pi\big)\ \text{or}\ \big(y_t\notin Y(\pi)\ \text{and}\ z_t=n\big).$$

In other words, y_t is lumped according to m_π if y_t ∈ Y(π) and is otherwise lumped stochastically according to the randomizing device z_t. Thus, each m̃^{(n,t)}_π is encoded with the same probability under both θ and θ*: from the specification of P(z_t = n), it follows that for all n ∈ {1, . . . , N} and all t ∈ N, P(m̃^{(n,t)}_π|θ*) = P(m^n_π|θ*) + P^0 · P(z_t = n) = P(m^n_π|θ) = P(m̃^{(n,t)}_π|θ), where the last equality follows from the fact that the only realizations of y_t included in m̃^{(n,t)}_π beyond those in m^n_π have probability zero under θ. Given that the distribution of noticed outcomes under m̃_π(·) is equivalent for both θ and θ*, the proof concludes along the same lines as the case above with Y(π) = Y(θ*), aside from the simple difference that each m^n_π above is replaced with the corresponding m̃^{(n,t)}_π. Furthermore, an analogous argument to that in the previous case establishes that the noticing strategy satisfies MC and AR.
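The randomizing device above can be checked numerically. The sketch below (ours; the outcome labels and probabilities are hypothetical and chosen so that θ dominates θ* on Y(π)) verifies that the augmented cells are noticed with identical probabilities under both parameters.

```python
# A minimal sketch of the merging device in case 2: outcomes outside Y(pi) are
# folded into the noticed cells at random so that every cell has the same
# probability under theta as under theta*. All numbers are hypothetical.
Y_pi    = ["a", "b"]                         # outcomes the model deems possible
p_theta = {"a": 0.7, "b": 0.3}               # P(.|theta): dominates truth on Y(pi)
p_star  = {"a": 0.6, "b": 0.25, "c": 0.15}   # truth also produces "c" (Y^0 = {c})

P0 = p_star["c"]
z_mass = {m: (p_theta[m] - p_star[m]) / P0 for m in Y_pi}  # P(z_t = n)
assert abs(sum(z_mass.values()) - 1) < 1e-12

for m in Y_pi:
    p_cell_star = p_star[m] + P0 * z_mass[m]  # cell probability under theta*
    print(m, p_cell_star, p_theta[m])         # ~0.7 0.7 and ~0.3 0.3 (float error)
# Under theta the extra outcome "c" has probability zero, so each cell keeps
# probability p_theta[m]; the noticed distributions now coincide exactly.
```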

(⇒) Suppose π is PIAE (with MC and AR) given θ*. Enumerate Y(π) = {y^1, . . . , y^N} and consider the action space X = [0, 1]^N along with utility function u(x, y) = −∑_{n=1}^N (x_n − 1{y = y^n})². We first show that under (X, u), any SAS requires that the person notices at least the information contained in m_π(y_t) each period since this is a minimal sufficient statistic (see, e.g., Lehmann and Casella 1998). To establish this, we show that the person's optimal action after noticing y ∈ m_π differs from the optimal action following any y′ ∈ m′_π where m′_π ≠ m_π. The optimal action under (X, u) is such that x_n = ∑_{θ∈supp(π)} P(y^n|θ)π_t(θ). First, if m_π(y) = Y(π) ∀y, then we are trivially done. If there exists y ∈ Y(π) such that m_π(y) ≠ Y(π), then it suffices to show the following: y′ ∉ m_π(y) ⇒ ∑_{θ∈supp(π)} P(y|θ)π(θ|y) ≠ ∑_{θ∈supp(π)} P(y|θ)π(θ|y′), where π(θ|y) is the posterior probability of θ following outcome y given prior π(θ). Note that ∑_{θ∈supp(π)} P(y|θ)π(θ|y) ≠ ∑_{θ∈supp(π)} P(y|θ)π(θ|y′) ⇔ ∑_{θ∈supp(π)} P(y|θ)[π(θ|y) − π(θ|y′)] ≠ 0. Further,

$$\begin{aligned}
\sum_{\theta\in\mathrm{supp}(\pi)}P(y\mid\theta)\big[\pi(\theta\mid y)-\pi(\theta\mid y')\big]&=\sum_{\theta\in\mathrm{supp}(\pi)}P(y\mid\theta)\left[\frac{P(y\mid\theta)\pi(\theta)}{P(y)}-\frac{P(y'\mid\theta)\pi(\theta)}{P(y')}\right]\\
&\propto\sum_{\theta\in\mathrm{supp}(\pi)}\pi(\theta)P(y\mid\theta)\big[P(y\mid\theta)P(y')-P(y'\mid\theta)P(y)\big]\\
&=\sum_{\theta\in\mathrm{supp}(\pi)}\pi(\theta)\big[P(y\mid\theta)^2P(y')-P(y\mid\theta)P(y'\mid\theta)P(y)\big]\\
&=\mathbb{E}_\theta\big[P(y\mid\theta)^2P(y')-P(y\mid\theta)P(y'\mid\theta)P(y)\big]\\
&>\mathbb{E}_\theta\big[P(y)^2P(y')-P(y'\mid\theta)P(y)^2\big]=0,
\end{aligned}$$

where the strict inequality follows from Jensen's inequality given that y′ ∉ m_π(y) and hence P(y′|θ)/P(y|θ) depends on θ. Thus, any π that is PIAE (with MC and AR) must be part of an ASE involving a SAS that records m_π(y_t) each round (as in Equation D.6).

To finally establish that condition (D.4) must hold, we proceed by contradiction: suppose condition (D.4) does not hold, so for any θ ∈ supp(π), there exists y ∈ Y(π) such that P(m_π(y)|θ) < P(m_π(y)|θ*). Under a SAS where the person records each instance of m_π(y), the predicted distribution over noticed outcomes for each t and θ ∈ supp(π) will differ from the true distribution in the limit. As such, the KL distance between these distributions is positive and P(n^t|θ)/P(n^t|θ*) → 0 almost surely by Remark D.1. Thus π is not PIAE (with MC and AR), a contradiction.

Completing the Proof of Lemma 1: Unlike Lemma D.2, Lemma 1 in the main text does not assume memory consistency and automatic recall. Thus, we now extend the proof above by dropping these two assumptions.

(⇐) For any (X, u), the person believes it is sufficient to notice m_π(y_t) and s_t each period since this is sufficient for updating beliefs about θ and for taking the optimal action. Hence, for any (X, u), noticing m_π(y_t) and s_t constitutes the noticing strategy for some SAS. By assumption, there exists θ ∈ supp(π) such that liminf_{t→∞} P(m_π(y^t)|θ)/P(m_π(y^t)|θ*) > 0 with probability 1 under θ*. Thus, under the noticing strategy just described, liminf_{t→∞} P(n^t|θ)/P(n^t|θ*) > 0 (with probability 1 under θ*), which implies that π is part of an attentionally stable equilibrium given θ*. Finally, since (X, u) was arbitrary, π is PIAE given θ*.

(⇒) Suppose π is PIAE given θ*. The proof follows along the same lines as the analogous result assuming MC and AR, above. First, we show that there exist (X, u) under which any SAS requires the person to notice m_π(y_t) (and s_t) for all t. As above, consider X = [0, 1]^N along with utility function u(x, y) = −∑_{n=1}^N (x_n − 1{y = y^n})². Analogous to the proof with automatic recall and memory consistency, the person's optimal action after noticing y_t ∈ m_π(y_t) differs from the optimal action following any ỹ_t ∉ m_π(y_t), and hence the person must distinguish any m_π(y_t) from m̃_π(ỹ_t) ≠ m_π(y_t). Given this result, PIAE by definition implies that there exists θ ∈ supp(π) and a SAS such that liminf_{t→∞} P(n^t|θ)/P(n^t|θ*) > 0 with probability 1 under θ*, which in turn implies liminf_{t→∞} P(m_π(y^t)|θ)/P(m_π(y^t)|θ*) > 0 with probability 1 under θ* since n^t must contain at least as much information as m_π(y^t). □

Proof of Proposition 3. Part 1. Consider a minimal SAS (N, σ) given π. Toward a contradiction, suppose that π is not attentionally measurable with respect to (N, σ). This implies that there exists a sample path h^t̄, t̄ ≥ 2, that occurs with positive probability under θ* with the following property: there exists a finite t ≤ t̄ such that P(n^t(h^t)|θ) = 0 for all θ ∈ supp(π), where h^t is the history up to time t consistent with h^t̄. Let τ be the smallest such t. Consider a modified noticing strategy Ñ = (Ñ^1, Ñ^2, . . .) derived from N in the following way. First, let Ñ^k = N^k for all k < τ and all k > τ. Second, since N^τ is a finite partition, enumerate its elements arbitrarily by N^τ = {n^τ_1, . . . , n^τ_J} for some J ≥ 1. By the assumption above, there exists some element n^τ ∈ N^τ such that P(n^τ|θ) = 0 ∀θ ∈ supp(π). Since the enumeration of N^τ is arbitrary, label this element by n^τ_J. There must, however, exist some n^τ_i ∈ N^τ, i ≠ J, such that P(n^τ_i|θ) > 0 for some θ ∈ supp(π). Let Ñ^τ consist of J − 1 elements, Ñ^τ = {ñ^τ_1, . . . , ñ^τ_{J−1}}, such that ñ^τ_k = n^τ_k if k ≠ i, J and ñ^τ_i = n^τ_i ∪ n^τ_J. That is, Ñ^τ is a coarsening of N^τ where the (subjectively) zero-probability cell, n^τ_J, is merged with a positive-probability cell, n^τ_i. The noticing strategy Ñ is thus coarser than N, and the attentional strategy (Ñ, σ) is also sufficient given π, since altering how a person behaves in subjectively zero-probability situations does not impact his expected payoffs. Hence, (N, σ) is not minimal, a contradiction.

Part 2. Suppose π is attentionally measurable with respect to a full-attention SAS. Then π is not S-PIAE: take an environment where the person is incentivized to accurately repeat back everything she ever saw. For example, she gets a strictly positive payoff for accurately reporting h^t and 0 otherwise. In this environment, π is attentionally inexplicable with respect to every SAS: because π is attentionally measurable with respect to a full-attention SAS, the person places positive probability on every truly possible history, so every SAS is equivalent to the full-attention SAS. Since π is attentionally inexplicable with respect to a full-attention SAS, it is attentionally inexplicable with respect to every SAS in this environment. □

Proof of Proposition 4. We begin by stating and proving a simple lemma.

Lemma D.3. Suppose Assumptions 1 and 2 hold. If with probability 1 under θ* there exists some t̄ ∈ N such that for all t > t̄ the optimal action given π_t is independent of θ ∈ supp(π_t), then there exists an attentionally stable equilibrium (N, σ) given π whether or not it is costly.

Proof of Lemma D.3: Suppose that with probability 1 under θ* there exists some t̄ such that for all t > t̄ the optimal action given π_t is independent of θ ∈ supp(π_t). Then there exists a SAS (N, σ) given π under which: with probability 1 there exists some period t̄ after which (i) the noticed history n^t(h^t) discards all information from t̄ on except possibly aspects of the current signal s_t, and (ii) the noticing strategy N lumps together any signal that is impossible under π_t with a signal that is possible under π_t. Under such a SAS, P(n^t|π)/P(n^t|λ) is bounded away from 0 because the effective history length is essentially bounded at t̄. Hence, an attentionally stable equilibrium given θ* exists, completing the proof of Lemma D.3.

Proof of Part 1. Suppose θ* ∉ supp(π) and consider a stationary and binary action space X = {0, 1}. Let y^{t+1} = (y_t, . . . , y_1) ∈ ×_{k=1}^t Y_k denote the sequence of realized outcomes through period t. For each t ∈ N, consider the history-dependent utility function u_t defined as follows:

$$u_t(x_t,y_t\mid h^t)=\begin{cases}\dfrac{\max_{\theta\in\mathrm{supp}(\pi)}P(y^{t+1}\mid\theta)}{\max_{\theta\in\mathrm{supp}(\pi^*)}P(y^{t+1}\mid\theta)} & \text{if } \sum_{k=1}^{t}x_k=0\\[2ex]\dfrac{\max_{\theta\in\mathrm{supp}(\pi^*)}P(y^{t+1}\mid\theta)}{\max_{\theta\in\mathrm{supp}(\pi)}P(y^{t+1}\mid\theta)} & \text{if } \sum_{k=1}^{t}x_k=t\\[2ex]-1 & \text{if } \sum_{k=1}^{t}x_k\notin\{0,t\}.\end{cases}\tag{D.7}$$

Given that π is inexplicable, it must be that θ* ∉ supp(π). Furthermore, under the true parameter θ*, lim_{t→∞} P(y^{t+1}|θ)/P(y^{t+1}|θ*) = 0 for all θ ∈ supp(π). Thus, according to model π, it is optimal to choose x_t = 0 for all t, and strictly so whenever supp(π) is not a subset of supp(π*). As such, there exists a minimal SAS in which the person chooses x_t = 0 for all t and ignores all feedback. This SAS yields an attentionally stable equilibrium given π (by Lemma D.3), and it is costly given that, under π*, it is in fact optimal to choose x_t = 1 for all t.

Proof of Part 2. Consider a model π and an outcome environment (×_{t=1}^∞ Y_t, Θ, P, π*) in which π is inexplicable. Suppose π* is absolutely continuous with respect to π (i.e., π is attentionally measurable under a full-attention SAS) but θ* ∉ supp(π). Consider the choice environment where for all t = 1, 2, . . . , we have X_t = H_t and u_t(x_t|h^t) = 1 if x_t = h^t and u_t(x_t|h^t) = −1 otherwise. That is, the agent has an incentive to repeat back the full history.

Consider an arbitrary minimal SAS (N, σ). Absolute continuity in this choice environment implies that the sufficient noticing strategy N = (N^1, N^2, . . .) must distinguish all histories that arise with positive probability under θ*, since any such histories happen with positive probability under π as well (see Footnote 45). Since the minimal SAS (N, σ) must precisely distinguish the true history h^t each round (i.e., for all h^t, h̃^t with positive probability under π, h̃^t ∉ n^t(h^t)), the behavioral strategy σ_t : N_t → X_t is such that x_t = h^t with probability 1. Hence, the person acts optimally each period and the SAS (N, σ) is thus costless.

Finally, we can show that π is attentionally inexplicable given the arbitrary minimal SAS (N, σ). For all θ ∈ supp(π), P(n^t(h^t)|θ) = ∑_{h̃^t∈n^t(h^t)} P(h̃^t|θ) = P(h^t|θ), since sufficiency of (N, σ) implies that for any θ ∈ supp(π), h̃^t ∈ n^t(h^t) where h̃^t ≠ h^t only if h̃^t is assigned probability zero under θ. Thus the relevant Bayes factor for assessing attentional explicability is equivalent to the one for assessing full-attention explicability: for all h^t ∈ H^t, P(n^t(h^t)|θ)/P(n^t(h^t)|θ*) = P(h^t|θ)/P(h^t|θ*). Since π is inexplicable with full attention, P(h^t|θ)/P(h^t|θ*) converges to zero in t with positive probability, and hence P(n^t(h^t)|θ)/P(n^t(h^t)|θ*) does as well. Thus, π is θ*-attentionally inexplicable under any arbitrary minimal SAS. □

45 That is, for all t, if h^t and h̃^t ≠ h^t happen with positive probability under θ* (and hence under π), then h̃^t ∉ n^t(h^t). To see why, consider any period t and h^t, h̃^t ∈ H^t such that h̃^t ≠ h^t, P(h^t|θ*) > 0, and P(h̃^t|θ*) > 0. Toward a contradiction, suppose h̃^t ∈ n^t(h^t). Recall from Definition 2 that for all h^t ∈ H^t that occur with positive probability under (π, σ), sufficiency requires max_{x∈X_t} E_{(π,σ)}[u_t(x|h^t)|n^t(h^t)] = max_{x∈X_t} E_{(π,σ)}[u_t(x|h^t)|h^t]. Note that this condition fails if both h^t and h̃^t are assigned positive probability under π: in this case, max_{x∈X_t} E_{(π,σ)}[u_t(x|h^t)|n^t(h^t)] < max_{x∈X_t} E_{(π,σ)}[u_t(x|h^t)|h^t]. Thus, no cell of any N_t can contain more than one history assigned positive probability under π. However, absolute continuity implies that for all t and any history h^t ∈ H^t that can occur under θ*, there exists θ ∈ supp(π) such that h^t has positive probability under θ. Hence, P(h^t|π) > 0 and P(h̃^t|π) > 0, and thus h̃^t ∈ n^t(h^t) would imply a contradiction to our sufficiency assumption.

Proof of Proposition 5. Part 1. Suppose Θ̄ ⊆ Θ^reject(Γ, φ^min), where φ^min is a minimal SAS. Suppose φ^min is an attentionally stable equilibrium given π in the original environment, and hence liminf_{t→∞} P(n^t|π)/P(n^t|π*) > 0, where n^t are noticed histories under φ^min. Because supp(π̃) = supp(π) \ Θ̄ and because Θ^reject(Γ, φ^min) = {θ | Pr(θ|n^t) → 0 almost surely given π* under SAS φ^min}, it follows that long-run beliefs under φ^min assign positive probability to all θ ∈ supp(π) \ Θ^reject(Γ, φ^min) in both the original environment (i.e., starting with prior π) and the modified environment (i.e., starting with prior π̃). Thus, liminf_{t→∞} P(n^t|π̃)/P(n^t|π*) > 0 in the modified environment, since the environment is identical to Γ aside from the refinement of the prior. Furthermore, φ^min is still a SAS in the modified environment (albeit not necessarily minimal). Thus, φ^min is an attentionally stable equilibrium given π̃ in the modified environment. We saw by example in the text that there may be an attentionally stable equilibrium given π̃ in the modified environment but no attentionally stable equilibrium given π in the original environment.

Now suppose Θ̄ ⊆ supp(π) \ Θ^reject(Γ, φ^min). In contrast to the case above, long-run beliefs under φ^min assign positive probability to all θ ∈ supp(π) \ Θ^reject(Γ, φ^min) in the original environment but not in the modified environment. Thus, it is not necessarily the case that liminf_{t→∞} P(n^t|π̃)/P(n^t|π*) > 0 under φ^min. We saw an example of this case in the text.

Part 2. In the original environment, let φ = (N, σ) denote the SAS derived in the proof of Proposition 1 that, in each period t, notices only the optimal action given π and h^t. Consider a modified environment identical to Γ aside from expanding the action space each period to X̃_t = X_t ∪ {d}, where selecting d implements the recommended action under φ. That is, prior to period 1, the person submits the behavioral strategy σ to a delegate or algorithm, and in any period t in which the person chooses x_t = d, the delegate or algorithm implements the action specified by σ_t and n^t(h^t). Hence, u_t(d, y_t|h^t) = u_t(σ(n^t(h^t)), y_t|h^t). Consider the attentional strategy φ̃ = (Ñ, σ̃) where Ñ makes no distinctions (i.e., Ñ_t = {H̃_t} for all t) and σ̃_t(n^t) = d for all t. In the modified environment, φ̃ is a SAS since it implements the same behavior as φ, which itself is a SAS. Furthermore, since φ̃ follows the coarsest possible noticing strategy, φ̃ is an attentionally stable equilibrium given π.

Part 3. Let φ = (N, σ) be an attentionally stable equilibrium given π in the original environment. In the original environment, the history each period looks like

$$h^t=(s_t,\,y_{t-1},\,x_{t-1},\,y_{t-2},\,x_{t-2},\,\dots,\,y_1,\,x_1).$$

In the modified environment, the history each period instead looks like

$$\tilde h^t=(\tilde s_t,\,y_{t-1},\,x_{t-1},\,y_{t-2},\,x_{t-2},\,\dots,\,y_1,\,x_1)=(h^t,\,y_{t-1},\,x_{t-1},\,y_{t-2},\,x_{t-2},\,\dots,\,y_1,\,x_1).$$

Let h̃^t_1 be the first component of h̃^t, which (for illustration) above is h^t, and let H̃^t be the set of modified histories up to period t. Now, let Ñ be defined by

$$\tilde n^t(\tilde h^t)=\big\{\hat h^t\in\tilde H^t\mid \hat h^t_1\in n^t(\tilde h^t_1)\big\}\quad\forall\, t,\ \tilde h^t\in\tilde H^t.$$

So the person notices the same things under Ñ that she does under N. Now derive σ̃ from σ in the obvious way and let φ̃ = (Ñ, σ̃).

Note that φ̃ leads to the same behavior, beliefs, and effective information sets as φ. Since φ is a SAS, so is φ̃. And since φ is an ASE given π in the original environment, φ̃ is an ASE given π in the modified environment. The other direction is analogous and omitted. □

D.2 Proofs of Supplemental Results

Proof of Observation B.1. Under the full-attention noticing strategy, P(n^t(h^t)|π) = P(h^t|π) and P(n^t(h^t)|λ) = P(h^t|λ) for all h^t ∈ H. Supposing D(θ*‖λ) and D(θ*‖π) are finite (the case where one is infinite is obvious), P(h^t|π) > 0 and P(h^t|λ) > 0 for all h^t in the support of P(·|θ*). Now letting Θ^min_π = argmin_{θ∈supp(π)} D(θ*‖θ) and Θ^min_λ = argmin_{θ∈supp(λ)} D(θ*‖θ), expand

$$\begin{aligned}
P(h^t\mid\pi)&=P(h^t\mid\Theta^{\min}_\pi)\cdot\pi(\Theta^{\min}_\pi)+P(h^t\mid\mathrm{supp}(\pi)\setminus\Theta^{\min}_\pi)\cdot(1-\pi(\Theta^{\min}_\pi))\\
&=P(h^t\mid\Theta^{\min}_\pi)\cdot\left[\pi(\Theta^{\min}_\pi)+\frac{P(h^t\mid\mathrm{supp}(\pi)\setminus\Theta^{\min}_\pi)}{P(h^t\mid\Theta^{\min}_\pi)}\cdot(1-\pi(\Theta^{\min}_\pi))\right].
\end{aligned}$$

Similarly expand P(h^t|λ). As a result,

$$\frac{P(h^t\mid\pi)}{P(h^t\mid\lambda)}=\frac{P(h^t\mid\Theta^{\min}_\pi)\cdot\left[\pi(\Theta^{\min}_\pi)+\frac{P(h^t\mid\mathrm{supp}(\pi)\setminus\Theta^{\min}_\pi)}{P(h^t\mid\Theta^{\min}_\pi)}\cdot(1-\pi(\Theta^{\min}_\pi))\right]}{P(h^t\mid\Theta^{\min}_\lambda)\cdot\left[\lambda(\Theta^{\min}_\lambda)+\frac{P(h^t\mid\mathrm{supp}(\lambda)\setminus\Theta^{\min}_\lambda)}{P(h^t\mid\Theta^{\min}_\lambda)}\cdot(1-\lambda(\Theta^{\min}_\lambda))\right]}.\tag{D.8}$$

Without loss of generality, assume Θ^min_π ≠ supp(π). Thus, for all θ ∈ supp(π) \ Θ^min_π and θ′ ∈ Θ^min_π, we have D(θ*‖θ) > D(θ*‖θ′). It follows from Part 1 of Lemma D.1 that P(h^t|θ)/P(h^t|θ′) → 0 almost surely, implying that P(h^t|supp(π)\Θ^min_π)/P(h^t|Θ^min_π) → 0 almost surely. Thus, (D.8) implies that

$$\frac{P(h^t\mid\pi)}{P(h^t\mid\lambda)}\ \xrightarrow{\ a.s.\ }\ \frac{P(h^t\mid\Theta^{\min}_\pi)}{P(h^t\mid\Theta^{\min}_\lambda)}\cdot\frac{\pi(\Theta^{\min}_\pi)}{\lambda(\Theta^{\min}_\lambda)}.\tag{D.9}$$

If ∆D(θ*‖λ, π) > 0 (i.e., D(θ*‖π) < D(θ*‖λ)), then for all θ ∈ Θ^min_π and θ′ ∈ Θ^min_λ, we have D(θ*‖θ) = D(θ*‖π) < D(θ*‖λ) = D(θ*‖θ′). It then follows from Part 2 of Lemma D.1 that P(h^t|θ)/P(h^t|θ′) → ∞ almost surely, and thus P(h^t|Θ^min_π)/P(h^t|Θ^min_λ) → ∞ almost surely. Therefore, (D.9) implies that P(h^t|π)/P(h^t|λ) → ∞ almost surely, and hence π is explicable with respect to λ and a full-attention SAS. Similarly, if ∆D(θ*‖λ, π) = 0 with θ* ∈ supp(λ) ∪ supp(π), then Part 3 of Lemma D.1 implies that P(h^t|π)/P(h^t|λ) converges a.s. to a positive constant, again implying that π is explicable with respect to λ and a full-attention SAS.

Finally, if ∆D(θ*‖λ, π) < 0, then a similar argument based on Part 1 of Lemma D.1 implies that P(h^t|Θ^min_π)/P(h^t|Θ^min_λ) → 0 almost surely, and thus π is inexplicable with respect to λ and a full-attention SAS. □

Proof of Proposition B.1. Suppose π is censored. Thus there exists θ ∈ supp(π) such that P(m_π(y)|θ) = P(m_π(y)|y ∈ Y(θ), θ*) for all y ∈ Y(θ). Since Y(θ) ⊂ Y(θ*), it must be that P(m_π(y)|θ*) ≤ P(m_π(y)|y ∈ Y(θ), θ*) = P(m_π(y)|θ) for all y ∈ Y(θ), which implies that the sufficient condition for PIAE (with memory consistency and automatic recall) from the proof of Lemma 1 holds (Condition D.4). If π is PIAE with memory consistency and automatic recall, then it is PIAE more generally. □

Proof of Proposition B.2. Assume π exhibits predictor neglect and that P(r|s^1, . . . , s^J, θ) = P(r|s^1, . . . , s^J, θ*) for all possible (r, s^1, . . . , s^J) under π. Any SAS must distinguish y = (r, s^1, . . . , s^K) from ỹ = (r̃, s̃^1, . . . , s̃^K) only if (r, s^1, . . . , s^J) ≠ (r̃, s̃^1, . . . , s̃^J). Let N be the number of distinct values of (r, s^1, . . . , s^J) under π. Then for each n = 1, . . . , N, m_π(y^t) must record the count k_n(y^t) of outcomes y_τ, τ < t, such that y_τ ∈ m^n_π. Then P(m_π(y^t)|θ)/P(m_π(y^t)|θ*) is identical to (D.11) from the proof of Proposition B.4, below. As such, π is PIAE if P(m^n_π|θ*) = P(m^n_π|θ) for all n = 1, . . . , N. The fact that P(r|s^1, . . . , s^J, θ) = P(r|s^1, . . . , s^J, θ*) for all possible (r, s^1, . . . , s^J) under π along with the definition of m_π implies P(m^n_π|θ*) = P(m^n_π|θ) for all n = 1, . . . , N, so π is PIAE. □


Proof of Proposition B.3. Suppose {P(·|θ)}_{θ∈supp(π)∪θ*} satisfies VLRP. Thus, for each y ∈ Y(π), there exists no y′ ∈ Y(π) such that y′ ≠ y and P(y|θ)/P(y′|θ) is constant in θ ∈ supp(π). This implies that for all y ∈ Y(π), m_π(y) = {y}. Accordingly, for any y^t ∈ Y(π)^{t−1} and all y^n ∈ Y(π), m_π(y^t) must record the count of outcomes y_τ in y^t such that y_τ = y^n (denoted by k_n(y^t)). Then

$$\frac{P(m_\pi(y^t)\mid\theta)}{P(m_\pi(y^t)\mid\theta^*)}=\frac{\prod_{n=1}^{N}P(y^n\mid\theta)^{k_n(y^t)}}{\prod_{n=1}^{N}P(y^n\mid\theta^*)^{k_n(y^t)}}=\left(\frac{\prod_{n=1}^{N}P(y^n\mid\theta)^{k_n(y^t)/t}}{\prod_{n=1}^{N}P(y^n\mid\theta^*)^{k_n(y^t)/t}}\right)^{t}.\tag{D.10}$$

Hence, Lemma D.1 along with (D.10) implies that lim_{t→∞} P(m_π(y^t)|θ)/P(m_π(y^t)|θ*) > 0 with probability 1 given θ* iff −D(θ*‖θ) ≥ 0. Since the Kullback-Leibler divergence is non-negative, −D(θ*‖θ) ≥ 0 ⇔ D(θ*‖θ) = 0 ⇔ P(·|θ) = P(·|θ*), which contradicts VLRP. Hence, π is not PIAE. □

Proof of Proposition B.4. Let m_π be the partition of Y(π) defined in (2). Since this partition is unique and finite, enumerate its elements as {m^1_π, . . . , m^N_π}. For any y^t ∈ Y(π)^{t−1}, m_π(y^t) must record the count of outcomes y_τ in y^t such that y_τ ∈ m^n_π (denoted by k_n(y^t)). Then

$$\frac{P(m_\pi(y^t)\mid\theta)}{P(m_\pi(y^t)\mid\theta^*)}=\frac{\prod_{n=1}^{N}P(m^n_\pi\mid\theta)^{k_n(y^t)}}{\prod_{n=1}^{N}P(m^n_\pi\mid\theta^*)^{k_n(y^t)}}=\left(\frac{\prod_{n=1}^{N}P(m^n_\pi\mid\theta)^{k_n(y^t)/t}}{\prod_{n=1}^{N}P(m^n_\pi\mid\theta^*)^{k_n(y^t)/t}}\right)^{t}.\tag{D.11}$$

Likelihood ratio (D.11) is identical to the one considered in the proof of Lemma 1 (Equation D.5). Thus lim_{t→∞} P(m_π(y^t)|θ)/P(m_π(y^t)|θ*) > 0 with probability 1 given θ* iff D(θ*‖θ) = 0, where D in this case is the KL distance from P_m(·|θ) to P_m(·|θ*), with P_m(·|θ) denoting the implied probability measure over {m^1_π, . . . , m^N_π} given θ. Since π is overly elaborate, there exists some m^n_π such that m^n_π ∩ Y(θ*) = ∅, implying P_m(m^n_π|θ) > 0 while P_m(m^n_π|θ*) = 0. Finally, since D(θ*‖θ) is non-negative and D(θ*‖θ) = 0 ⇔ P_m(·|θ) = P_m(·|θ*), and because the latter equality does not hold given that P_m(m^n_π|θ) > 0 but P_m(m^n_π|θ*) = 0, we must have D(θ*‖θ) > 0 = D(θ*‖θ*). It then follows from Lemma D.1 and Remark D.1 that ratio (D.11) converges to 0 a.s. and π is therefore not PIAE. □

Proof of Proposition B.5. Following the setup of the proof of Proposition B.4, let m_π be the partition of Y(π) defined in (2) and enumerate its elements as {m^1_π, . . . , m^N_π}. Again following the proof of Proposition B.4, for any y^t ∈ Y(π)^{t−1}, m_π(y^t) must record the count of outcomes y_τ in y^t such that y_τ ∈ m^n_π (denoted by k_n(y^t)), and thus lim_{t→∞} P(m_π(y^t)|θ)/P(m_π(y^t)|θ*) (which in this case is identical to the likelihood ratio in Equation D.11) is positive with probability 1 given θ* iff D(θ*‖θ) = 0, where D in this case is the KL distance from P_m(·|θ) to P_m(·|θ*). Note that D(θ*‖θ) = 0 iff P_m(·|θ) = P_m(·|θ*). We now show that the previous equality is violated for any θ ∈ supp(π) when π is over-fit: since π is over-fit, there exist s, s̃ ∈ S such that (s^1, . . . , s^J) = (s̃^1, . . . , s̃^J), (s^{J+1}, . . . , s^K) ≠ (s̃^{J+1}, . . . , s̃^K), and (r, s̃) ∉ m_π((r, s)) for some resolution r where (r, s), (r, s̃) ∈ Y(π). For all θ ∈ supp(π), P(r|s, θ) ≠ P(r|s̃, θ), but P(r|s, θ*) = P(r|s̃, θ*). This implies that, for each θ ∈ supp(π), one of the following inequalities must hold: P(r|s, θ) ≠ P(r|s, θ*) or P(r|s̃, θ) ≠ P(r|s̃, θ*). Consider an arbitrary θ ∈ supp(π), and suppose WLOG that the first of the two previous inequalities holds: P(r|s, θ) ≠ P(r|s, θ*). Since P(s) is independent of the parameter (by assumption), the previous inequality implies that P((r, s)|θ) ≠ P((r, s)|θ*), and therefore P_m(m_π((r, s))|θ) ≠ P_m(m_π((r, s))|θ*). □

Proof of Corollary B.1. Part (i) is immediate from Lemma 1's characterization of PIAE models. Part (ii) follows from the definition of not being S-PIAE. □

Proof of Proposition C.1. Consider the setup from Section C.1, and consider any misspecified model π that puts probability 1 on θ_c = 0 (independent signals).

Part 1: Suppose there is perfect feedback (i.e., r_t = ω_t for all t).

(⇐) Suppose π is dogmatic about the precision of one or both information sources. If π is dogmatic about both sources, then a minimal SAS is such that, for all t, N_t distinguishes only the value of s_t = (s^1_t, s^2_t). Thus, P(n^t(h^t)|π)/P(n^t(h^t)|π*) = P(s_t|π)/P(s_t|π*) for all t. This ratio is bounded away from zero as t → ∞ since P(s_t|π*) ≥ 1 − γ and P(s_t|π) ≥ (1 − γ)². As such, suppose π is dogmatic about the precision of only one information source. WLOG suppose π is dogmatic that source 2 has precision θ_2 ∈ {.5, γ}, and it puts positive probability on both θ*_1 (the true precision of source 1) and θ̃_1, which denotes the other possible (yet false) value of θ_1. A minimal SAS is such that, for all t, N_t will (i) notice s_t = (s^1_t, s^2_t); (ii) ignore all past signals from source 2 from periods k < t; and (iii) distinguish the value of W^1_t ≡ ∑_{k=1}^{t−1} 1{s^1_k = ω_k}, which is sufficient for updating beliefs about θ_1. Thus:

$$\begin{aligned}
\frac{P(n^t(h^t)\mid\pi)}{P(n^t(h^t)\mid\pi^*)}&=\frac{P(W^1_t\mid\pi)}{P(W^1_t\mid\theta^*_1)}\cdot\frac{P(s_t\mid W^1_t,\pi)}{P(s_t\mid\pi^*)}=\left(\sum_{\theta_1\in\{\tilde\theta_1,\theta^*_1\}}\pi(\theta_1)\,\frac{P(W^1_t\mid\theta_1)}{P(W^1_t\mid\theta^*_1)}\right)\frac{P(s_t\mid W^1_t,\pi)}{P(s_t\mid\pi^*)}\\
&=\left(\pi(\theta^*_1)+\pi(\tilde\theta_1)\left(\frac{\tilde\theta_1}{\theta^*_1}\right)^{W^1_t}\left(\frac{1-\tilde\theta_1}{1-\theta^*_1}\right)^{t-W^1_t}\right)\frac{P(s_t\mid W^1_t,\pi)}{P(s_t\mid\pi^*)}.
\end{aligned}\tag{D.12}$$

Since in truth W^1_t ∼ Binomial(t, θ*_1), lim_{t→∞} (θ̃_1/θ*_1)^{W^1_t} ((1−θ̃_1)/(1−θ*_1))^{t−W^1_t} = 0 (see Footnote 46). Hence, liminf_{t→∞} P(n^t(h^t)|π)/P(n^t(h^t)|π*) > 0 since P(s_t|W^1_t, π)/P(s_t|π*) is bounded away from 0 as t → ∞.

46 To see this, write (θ̃_1/θ*_1)^{W^1_t}((1−θ̃_1)/(1−θ*_1))^{t−W^1_t} as L(t, W_t|θ*_1, θ̃_1)^t, where L(t, W_t|θ*_1, θ̃_1) ≡ (θ̃_1/θ*_1)^{W^1_t/t}((1−θ̃_1)/(1−θ*_1))^{1−W^1_t/t}. Note that lim_{t→∞} L(t, W_t|θ*_1, θ̃_1) = (θ̃_1/θ*_1)^{θ*_1}((1−θ̃_1)/(1−θ*_1))^{1−θ*_1} ≡ L̄(θ̃_1|θ*_1). Furthermore, argmax_{θ̃_1} L̄(θ̃_1|θ*_1) = θ*_1 and L̄(θ*_1|θ*_1) = 1. Thus, L̄(θ̃_1|θ*_1) < 1 since θ̃_1 ≠ θ*_1, and hence lim_{t→∞} L(t, W_t|θ*_1, θ̃_1)^t = lim_{t→∞} L̄(θ̃_1|θ*_1)^t = 0.

(⇒) Suppose π is part of an attentionally stable equilibrium. Toward a contradiction, suppose that π is not dogmatic about either information source, and define W^2_t analogously to W^1_t.

Therefore, a minimal SAS is such that, for all t, N_t must distinguish the value of W^1_t, W^2_t, and s_t. (To see this, note that the count of "successes" is a minimal sufficient statistic for updating about the parameter governing a binomial distribution; see Footnote 31.) Thus, following the logic of (D.12) above, we have

$$\begin{aligned}
\frac{P(n^t(h^t)\mid\pi)}{P(n^t(h^t)\mid\pi^*)}&=\frac{P(W^1_t,W^2_t\mid\pi)}{P(W^1_t,W^2_t\mid\pi^*)}\cdot\frac{P(s_t\mid W^1_t,W^2_t,\pi)}{P(s_t\mid\pi^*)}=\frac{P(W^1_t\mid\pi)\,P(W^2_t\mid\pi)}{P(W^1_t\mid\theta^*_1)}\cdot\frac{P(s_t\mid W^1_t,W^2_t,\pi)}{P(s_t\mid\pi^*)}\\
&=P(W^2_t\mid\pi)\left(\pi(\theta^*_1)+\pi(\tilde\theta_1)\left(\frac{\tilde\theta_1}{\theta^*_1}\right)^{W^1_t}\left(\frac{1-\tilde\theta_1}{1-\theta^*_1}\right)^{t-W^1_t}\right)\frac{P(s_t\mid W^1_t,W^2_t,\pi)}{P(s_t\mid\pi^*)}.
\end{aligned}$$

As shown above, the middle term of the expression above (in parentheses) converges to π(θ*_1) and P(s_t|W^1_t, W^2_t, π)/P(s_t|π*) is finite. Furthermore, lim_{t→∞} P(W^2_t|π) = 0 given that for each θ_2 ∈ supp(π), the person presumes W^2_t ∼ Binomial(t, θ_2). Thus, lim_{t→∞} P(n^t(h^t)|π)/P(n^t(h^t)|π*) = 0.

Part 2: Suppose there is no feedback (i.e., r_t = ∅ for all t).

(⇐) Suppose π is dogmatic about the precision of both information sources. (Below we deal with the additional case where π is dogmatic that one information source is uninformative.) As in the similar case above with feedback, a minimal SAS is such that, for all t, N_t distinguishes only the value of s_t = (s^1_t, s^2_t). Thus, lim_{t→∞} P(n^t(h^t)|π)/P(n^t(h^t)|π*) = lim_{t→∞} P(s_t|π)/P(s_t|π*) > 0.

(⇒) Suppose π is part of an attentionally stable equilibrium. Toward a contradiction, assume the person is uncertain about at least one of the information sources and is not dogmatic that either source is uninformative. As we show below, the person must track the number of periods in which the two sources agree. Let a_t ≡ 1{s^1_t = s^2_t} be an indicator for agreement in period t and let Q_t ≡ ∑_{k=1}^t a_k count the number of agreements through round t. Assuming the person notices Q_t (and s_t) each period, then π is attentionally inexplicable:

$$\frac{P(n^t(h^t)\mid\pi)}{P(n^t(h^t)\mid\pi^*)}=\frac{P(Q_t\mid\pi)}{P(Q_t\mid\pi^*)}\cdot\frac{P(s_t\mid Q_t,\pi)}{P(s_t\mid\pi^*)}=P(Q_t\mid\pi)\,\frac{P(s_t\mid Q_t,\pi)}{P(s_t\mid\pi^*)},\tag{D.13}$$

where the second equality follows from the fact that Q_t is deterministic under θ* given that signals are in truth perfectly correlated. Furthermore, lim_{t→∞} P(Q_t = t|π) = 0 given that π treats s^1_t and s^2_t as independent for all t. Thus (D.13) converges a.s. to 0 given that P(s_t|Q_t, π)/P(s_t|π*) is bounded from above.

To complete the proof, we show that a SAS must notice (Q_t, s_t) in each period t. Let π_t(θ_1, θ_2) denote beliefs over (θ_1, θ_2) ∈ {.5, γ}² conditional on h^t and note that the optimal action given these beliefs is

$$x^*_t=\sum_{(\theta_1,\theta_2)}\pi_t(\theta_1,\theta_2)\,P(\omega_t=1\mid s_t,\theta_1,\theta_2).\tag{D.14}$$

We consider what minimal data the person finds sufficient to form the beliefs π_t(θ_1, θ_2) that determine the optimal action (D.14). First consider how the person (who believes θ_c = 0) updates these beliefs upon observing a single period: given beliefs π_{t−1} and signal s_t, Bayes' rule implies

$$\pi_t(\theta_1,\theta_2\mid s_t)=\frac{P(s_t\mid\theta_1,\theta_2)\,\pi_{t-1}(\theta_1,\theta_2)}{\sum_{(\tilde\theta_1,\tilde\theta_2)}P(s_t\mid\tilde\theta_1,\tilde\theta_2)\,\pi_{t-1}(\tilde\theta_1,\tilde\theta_2)}.\tag{D.15}$$

Examining P(s_t|θ_1, θ_2) for various values of s_t reveals that beliefs update identically when signals agree, i.e., s_t ∈ {(0,0), (1,1)}, and update identically when signals disagree, i.e., s_t ∈ {(1,0), (0,1)}. To see this, note that P(s_t = (1,1)|θ_1, θ_2) = ½ P(s_t = (1,1)|θ_1, θ_2, ω_t = 1) + ½ P(s_t = (1,1)|θ_1, θ_2, ω_t = 0) = ½[1 + 2θ_1θ_2 − θ_1 − θ_2]. Similar calculations show that P(s_t = (0,0)|θ_1, θ_2) = P(s_t = (1,1)|θ_1, θ_2) = ½[1 + 2θ_1θ_2 − θ_1 − θ_2] and P(s_t = (1,0)|θ_1, θ_2) = P(s_t = (0,1)|θ_1, θ_2) = ½[θ_1 + θ_2 − 2θ_1θ_2]. This implies that a_t = 1{s^1_t = s^2_t} is sufficient for updating beliefs following a single period.

Now consider updating the prior π based on h^t. We will show that Q_t = ∑_{k=1}^t a_k is sufficient for h^t to form updated beliefs π_t(θ_1, θ_2). The calculations above reveal that for all parameter combinations (θ_1, θ_2) ≠ (γ, γ), P(a_t = 1|θ_1, θ_2) = 1/2. In contrast, P(a_t = 1|γ, γ) = 1 − 2γ(1−γ). Thus,

$$\pi_t(\gamma,\gamma\mid h^t)=\frac{[1-2\gamma(1-\gamma)]^{Q_t}\,[2\gamma(1-\gamma)]^{t-Q_t}\,\pi(\gamma,\gamma)}{[1-2\gamma(1-\gamma)]^{Q_t}\,[2\gamma(1-\gamma)]^{t-Q_t}\,\pi(\gamma,\gamma)+[.5]^t\,(1-\pi(\gamma,\gamma))},\tag{D.16}$$

but for any (θ_1, θ_2) ≠ (γ, γ),

$$\pi_t(\theta_1,\theta_2\mid h^t)=\frac{[.5]^t\,\pi(\theta_1,\theta_2)}{[1-2\gamma(1-\gamma)]^{Q_t}\,[2\gamma(1-\gamma)]^{t-Q_t}\,\pi(\gamma,\gamma)+[.5]^t\,(1-\pi(\gamma,\gamma))}.\tag{D.17}$$

From Equations (D.16) and (D.17), it is clear that Q_t is sufficient for h^t to update beliefs. Furthermore, so long as π_t(γ, γ) is non-degenerate, Q_t is also necessary. But as shown in (D.13), noticing Q_t each period renders π attentionally inexplicable, a contradiction.

Finally, consider the case where the person is dogmatic that one of the information sources is uninformative. This implies that π_t(γ, γ) = 0 for all t, and hence Equation (D.17) implies that beliefs about all other combinations of (θ_1, θ_2) are independent of h^t. Therefore a minimal SAS is such that, for all t, N_t distinguishes only the value of s_t. Thus, π is attentionally explicable in this case. □

Proof of Proposition C.2. For the analyst who does not need to provide predictions, each period's noticing partition under a minimal SAS contains at most two elements, n^t_B = {h^t ∈ H^t | E_t[V_t] ≥ p_t} and n^t_S = {h^t ∈ H^t | E_t[V_t] < p_t}. That is, the analyst must notice only whether E_t[V_t] is high enough to justify buying at the prevailing price. The analyst who does need to provide predictions, on the other hand, needs to additionally notice the earnings forecast E_t[M_t]. It is clear then that any sufficient noticing strategy for the analyst who needs to provide predictions is also a sufficient noticing strategy for the analyst who does not, but not vice-versa. This implies that if there is an attentionally stable equilibrium for the analyst who needs to provide predictions, then there is also an attentionally stable equilibrium for the analyst who does not need to provide predictions. □
