A Sparsity-Based Model of Bounded Rationality,

Applied to Basic Consumer and Equilibrium Theory∗

Xavier Gabaix

NYU Stern, CEPR and NBER

November 14, 2012

Abstract

This paper defines and analyzes a “sparse max” operator, which generalizes the tra-

ditional max operator used everywhere in economics. The agent builds (as economists

do) a simplified model of the world which is sparse, considering only the variables

of first-order importance. His stylized model and his resulting choices both derive

from constrained optimization. Still, the sparse max remains tractable to compute.

Moreover, the induced outcomes reflect basic psychological forces governing limited

attention.

With the sparse max, we can behaviorally enrich a variety of economic models.

Here we study two pillars of economics: basic theory of consumer demand (choosing a

consumption bundle subject to a budget constraint) and competitive equilibrium. We

obtain a behavioral version of Marshallian and Hicksian demand, the Slutsky matrix,

the Edgeworth box, Roy’s identity etc., and competitive equilibrium. The Slutsky

matrix is no longer symmetric — non-salient prices are associated with anomalously

small demand elasticities. In the Edgeworth box, the offer curve is “extra-dimensional:”

it is a two-dimensional surface rather than a one-dimensional curve. As a result,

different aggregate price levels correspond to materially distinct competitive equilibria,

in a similar spirit to a Phillips curve. This framework provides a way to assess which

parts of basic microeconomics are robust, and which are not, to the assumption of

perfect maximization.

∗[email protected]. I thank David Laibson for a great many enlightening conversations about behavioral economics over the years. For very good research assistance I am grateful to Jonathan Libgober, Elliot Lipnowski, and Farzad Saidi, and for helpful comments to Andrew Abel, Daniel Benjamin, Douglas Bernheim, Pierre-André Chiappori, Vincent Crawford, Stefano Dellavigna, Alex Edmans, David Hirshleifer, Daniel Kahneman, Paul Klemperer, Botond Koszegi, Sendhil Mullainathan, Matthew Rabin, Antonio Rangel, Larry Samuelson, Yuliy Sannikov, Thomas Sargent, Josh Schwartzstein, Andrei Shleifer, and participants at various seminars and conferences. I am grateful to INET for financial support.


1 Introduction

This paper proposes a tractable model of some dimensions of bounded rationality (BR). It

develops a “sparse maximum” operator, which is a behavioral generalization of the traditional

“maximum” operator.1 In the sparse max, the agent pays less or no attention to some features

of the problem. The sparse max has a psychological foundation and is quite versatile. It

easily handles problems of maximization with constraints.

This research has two goals: one is to provide an easy-to-use, plausible alternative to full

maximization, that can be used for theoretical applications and (ultimately) for empirical

ones. Another is to probe which parts of economic theory are robust, and which are not,

to that assumption of perfect maximization. Given the application-oriented focus of this

paper (where theory is meant to shed light on other, central theories), I submit a proof of

concept by re-examining two pillars of economic theory: basic consumer theory (problem 1),

and basic equilibrium theory. We obtain a simple, fresh perspective on those topics: a behav-

ioral version of Marshallian and Hicksian demand, Edgeworth boxes etc., and competitive

equilibrium sets. We can see which parts of basic microeconomics are very sensitive to the

assumption of perfect maximization, and which are robust. We also obtain an enrichment of

perfect-rationality basic microeconomics, that is arguably more plausible, and often equally

convenient (or more, as BR agents tend to use simpler problems).

The principles behind the sparse max are the following. First, the agent in the model

builds a simplified model of the world, somewhat like economists do. He builds a represen-

tation of the world that is simple enough, and thinks about the world through his partial

model. Second, this representation is “sparse,” i.e., uses few parameters that are non-zero

or differ from the usual state of affairs. These choices are controlled by an optimization of

his representation of the world that depends on the problem at hand. I draw from fairly

recent literature on statistics and image processing to use a notion of “sparsity” that still

entails well-behaved, convex maximization problems (see Tibshirani (1996), Candès and Tao

(2006), Donoho (2006), Mallat (2009)). The idea is to think of “sparsity” (having lots of

zeroes in a vector) instead of “simplicity” (which is an amorphous notion), and measure the

lack of “sparsity” by the sum of absolute values. This apparently simple step leads to a rich

set of results in statistics and signal processing — largely because of the tractable (convex)

notion of “simplicity” it leads to (through linear penalties, rather than fixed costs). This

paper follows this lead to use sparsity notions in economic modelling, and to the best of my

knowledge is the first to do so.2

“Sparsity” is also a psychologically realistic feature of life. For any decision, in principle,

1 The meaning of “sparse” is that of a sparse vector or matrix. For instance, a vector in $\mathbb{R}^{100000}$ with only a few non-zero elements is sparse. In this paper, the vector of things the agent considers is (endogenously) sparse.

2 Econometricians have already successfully used sparsity (e.g. Belloni and Chernozhukov 2011).


thousands of considerations are relevant to the agent; his income, but also GDP growth in

his country, the interest rate, recent progress in the construction of plastics, interest rates

in Hungary, the state of the Amazonian forest, etc. Since it would be too burdensome to

take all of these variables into account, he is going to discard most of them.3 The traditional

modelling for this is to postulate a fixed cost for each variable. However, that often leads

to intractable problems (fixed costs, with their non-convexity, are notoriously ill-behaved).

In contrast, the notion of sparsity I use (again, following the statistics literature mentioned

above, using linear rather than fixed costs) leads to problems that are easy to solve.

The model rests on very robust psychological notions, which are discussed below. It

incorporates limited attention, of course. To supply the “missing elements” due to limited

attention, people rely on defaults. When taking into account some information, they an-

chor on the default and do a limited adjustment towards the truth. This “anchoring and

adjustment” (Tversky and Kahneman 1974) is at the heart of the model. At the same time,

attention is allocated purposefully, towards features that are likely to be important. Sparsity

is a way to capture this allocation of attention.

If the agent is confused about prices, how is the budget constraint still satisfied? I propose

a way to incorporate maximization under constraint. For this purpose I tried to strike a good

balance between psychological plausibility and tractability. The formulation of the sparse

max with constraints has some nice properties, e.g. of duality, which are quite useful.

After the sparse max has been defined, I probe its usefulness by re-examining two building

blocks of basic microeconomics: consumer theory and competitive equilibrium theory. By

basic consumer theory, I mean the optimal choice of a consumption bundle subject to a

budget constraint4:

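$$\max_{c_1, \dots, c_n} u(c_1, \dots, c_n) \quad \text{subject to} \quad p_1 c_1 + \dots + p_n c_n \le w \tag{1}$$

where $u$ is the utility function, $c_i$ is the quantity consumed of good $i$, $p_i$ its price, and $w$ the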

budget. There does not yet appear to be any systematic treatment of this building block

with a limited rationality model other than sparsity in the literature to date.5

One might think that there is little to add to such an old and basic topic. However, it turns

out that (sparsity-based) limited rationality leads to enrichments that may be both realistic

and intellectually intriguing. I assume that agents do not fully pay attention to all prices.

The price they perceive is a weighted average of the true price and the default price. The

3 Ignoring variables altogether and assuming that they do not differ from their usual values are the same thing in the model. For instance, in most decisions we do not pay attention to the quantity of oxygen that is available to us because there is plenty of it. In the model, ignoring the oxygen factor is modeled as assuming that the quantity of oxygen available is the normal quantity. Indeed, the two are arguably the same thing.

4 In companion projects, I examine how to inject sparsity into dynamic programming and macro-finance.

5 Dufwenberg et al. (2011) analyze general equilibrium with other-regarding preferences, especially when they imply as-if rational behavior. In problem (1), their preferences are rational (though they depend on other people’s actions).


sparse max determines how much attention they pay to each good, and how they adjust

their budget constraint. The comparative statics are sensible. People pay more attention

to goods that have more volatile prices, and to which they are fundamentally more price

elastic.

The agent exhibits a form of nominal illusion. If all prices and his budget increase by, say, 10%, the consumer does not react in the traditional model; however, in the sparsity model, the agent under-perceives the increase of non-salient prices; hence, his demand changes. This apparently simple departure has a host of economically relevant consequences.

The model allows us to work out the sparse counterpart of the key building blocks of

consumer theory: Marshallian and Hicksian demand, Slutsky matrix, Edgeworth boxes, Roy’s

identity, Shephard’s lemma, etc. We can see which results are robust under BR, and which

are not (see section 1). One result is that the Slutsky matrix is no longer symmetric: non-

salient prices will lead to small terms in the matrix. At the same time, the model offers

a parsimonious deviation from the rational model. I argue below that indeed, the extant

evidence seems to favor the effects theorized here. In addition, deviations from Slutsky

symmetry could be useful for potential future empirical work: they offer a way to recover

quantitatively the extent of limited attention (up to a multiplicative factor).

I next revisit equilibrium theory and the venerable Edgeworth box. Recall that the “offer

curve” of an agent is the set of consumption bundles he chooses as prices change (those price

changes also affecting the value of his endowment). In the traditional Edgeworth box analysis

(with two goods), this offer curve is, well, a curve: a one-dimensional object. However, in the

sparsity model, it becomes a two-dimensional object (see Figure 2). We obtain an “extra-dimensional” offer curve.6 When the prices of the two goods change, in the traditional model only their ratio matters. However, in the present model, both prices matter, not just their ratio, and we have a two-dimensional offer curve. This effect, though quite basic, appears to be new.

Next, we study equilibrium theory with those sparse agents. When two-dimensional offer

curves intersect, we do not expect a unique equilibrium. The key finding is that robustly,

in the sparse model, there is a Phillips curve in the Edgeworth box. More precisely: for each

price level, there is a (typically locally unique) equilibrium. However, as the price level rises,

there is a different real allocation: some agents (say) perceive that the price of the good

they care more about (say, their wage) is higher, and they supply more of that good. We

obtain effects of the Keynesian macroeconomic style (e.g., people supply more labor when

the perceived wage is higher), in a basic microeconomic context.

I gather what appears to be robust and not robust in the basic microeconomic theory of

consumer behavior and competitive equilibrium — when the specific deviation is a sparsity-

6This notion is very different from the idea of a “thick indifference curve”, in which the consumer is

indifferent between dominated bundles. A sparse consumer has only a thin indifference curve.


seeking agent.7

What is robust in basic microeconomics?

Propositions that are not robust

Tradition: The Slutsky matrix is symmetric. Sparse model: The Slutsky matrix is asymmetric, as elasticities to non-salient prices become small.

Tradition: The Marshallian demand $c(p, w)$ is homogeneous of degree 0, i.e., there is no money illusion. Sparse model: Lack of attention leads to some nominal illusion.

Tradition: The offer curve is one-dimensional in the Edgeworth box. Sparse model: It is typically a two-dimensional pinched ribbon.

Tradition: The competitive equilibrium allocation is (locally) uniquely determined. Sparse model: Each aggregate price level leads to a different equilibrium allocation, like in a Phillips curve; so there is a one-dimensional continuum of equilibria.

Tradition: The Slutsky matrix is the second derivative of the expenditure function. Sparse model: They are linked, but in a richer way, modulated by price salience.

Tradition: The Slutsky matrix is negative semi-definite. The weak axiom of revealed preferences holds. Sparse model: These properties generally fail.

Small robustness: Propositions that hold at the default price, but not away from it, to the first order

Marshallian and Hicksian demands, Shephard’s lemma and Roy’s identity: the values of the underlying objects are the same in the traditional and sparse model at the default price, but differ (to the first order in $p - p^d$) away from the default price.

Greater robustness: Objects are very close around the default price, up to second order terms

Tradition: People maximize their “objective” welfare. Sparse model: People maximize in default situations, but there are losses away from them.

Tradition: Competitive equilibrium is efficient. Sparse model: It is efficient if it happens at the default price. Away from the default price, competitive equilibrium has inefficiencies, unless people have the same misperceptions.

Expenditure function $e(p, u)$, indirect utility function $v(p, w)$: their values are the same, under the traditional and sparse models, up to second order terms in the price deviation from the default ($p - p^d$).8

Traditional economics gets the signs right – or, more prudently put, the signs predicted

by the rational model are robust under a sparsity variant. Those predictions are of the type

“if the price of good 1 goes down, demand for it goes up”, or more generally “if there’s a good
incentive to do X, people will indeed tend to do X.”9 Those sign predictions make intuitive

7 I use the sparsity benchmark not as “the truth,” of course, but as a plausible benchmark for a less than fully rational agent. The paper details the underlying conditions for these statements.

8 This is just a consequence of the envelope theorem.

9 This is true for “direct” effects, though not necessarily once indirect effects are taken into account. For


sense, and, not coincidentally, they hold in the sparse model:10 they remain unchanged

even when the agent has a limited, qualitative understanding of his situation. Indeed, when

economists think about the world, or in much applied microeconomic work, it is often the

sign predictions that are used and trusted (e.g. as in Becker-style price theory), rather than

the detailed quantitative predictions.

After this behavioral version of basic micro, the paper concludes with a discussion of other

approaches to inattention / behavioral economics. The sparse approach is a complement to

other approaches, e.g. learning. Most other approaches either cannot11 handle the basic

consumer problem in eq. 1, or they appear to be too complicated to handle it tractably (e.g.

fixed costs lead to very complex NP-complete problems;12 multidimensional signal extraction
is extremely hard to use outside the Gaussian-quadratic setup (Veldkamp 2011)). There are also
differences of substance, discussed below.

The plan of the paper is as follows. Section 2 states the sparse max, analyzes it, first

without constraints, and then with constraints. It also discusses its psychological underpin-

nings. Section 3 develops consumer theory, and section 4 analyzes competitive equilibrium

theory. Those two sections are the richest in fairly concrete results. Section 5 discusses links

with existing themes in behavioral and information economics. Section 6 presents conclud-

ing remarks. Many proofs are in the appendix or the online appendix. The online appendix

contains extensions and other applications, e.g. basic producer theory and pricing anomalies.

2 The Sparse Max Operator

2.1 The Sparse Max Without Constraints

The agent faces a maximization problem which is, in its traditional version, $\max_a u(a, x, \mu)$ subject to $b(a, x, \mu) \ge 0$, where $u$ is a utility function, and $b$ is a constraint (or a vector of constraints). I want to define the “sparse max” operator:

$$\operatorname{smax}_a u(a, x, \mu) \quad \text{subject to} \quad b(a, x, \mu) \ge 0 \tag{2}$$

instance, this is true for compensated demand (see the part on the Slutsky matrix), and in partial equilibrium. This is not necessarily true for uncompensated demand (where income effects arise) or in general equilibrium, though in many situations those “second round” effects are small.

10 The closely related notion of strategic complements and substitutes (Bulow, Geanakoplos and Klemperer 1985) is also robust to a sparsity deviation.

11 For instance, the “near rational” approach, which is quite useful in some contexts (Akerlof and Yellen 1985, Chetty 2012), says that the chosen consumption leads to a loss of at most $\varepsilon$ utils, but it does not precisely predict which consumption will be chosen. The “adjust information over time” approach (e.g. Gabaix and Laibson 2002, Mankiw and Reis 2002) does not apply to a static problem.

12 Suppose that each price can be examined by paying a fixed cost. There are $2^n$ ways to allocate those fixed costs. This leads to an intractable problem when $n$ is large.


which is a less than fully attentive version of the “max” operator. We first shall examine cases without constraints, i.e. study $\operatorname{smax}_a u(a, x, \mu)$. To fix ideas, take the following quadratic example:

$$u(a, x, \mu) = -\Big(a - \sum_{i=1}^{n} \mu_i x_i\Big)^2 \tag{3}$$

Then, the traditional optimal action is $a(\mu) = \sum_{i=1}^{n} \mu_i x_i$. For instance, to choose consumption $a$ (normalized from some baseline), the decision maker should consider not only his wealth, $x_1$, and the deviation of GDP from its trend, $x_2$, but also the interest rate, $x_{10}$, demographic trends in China, $x_{100}$, recent discoveries in the supply of copper, $x_{200}$, etc. There are 10,000 (say) factors that should in principle be taken into account. However, most of them have a small impact on his decision, i.e., their impact $\mu_i x_i$ is small in absolute value.

Notations are as follows: $a$ is the action; it is potentially multi-dimensional. $\mu$ is an ideal parameter (typically a vector). However, the agent might pick a sparser parameter $m$ (we will specify how) and maximize $u(a, x, m)$ rather than $u(a, x, \mu)$. Then, he will pick the action: $a(m) = \sum_{i=1}^{n} m_i x_i$, for some vector $m$ that endogenously has lots of zeros, i.e., $m$ is “sparse.” For instance, if the agent only pays attention to his wage and the state of the economy, $m_1$ and $m_2$ will be non-zero, and the other $m_i$’s will be zero.

I now present a procedure that the agent might follow. I first describe it, then analyze its

properties, then justify it (justification is easier after the properties are clear). The inputs

are:

• $m^d$, the “default parameter” (typically $m^d = 0$). This corresponds psychologically to what the agent considers if he has “no time” to think about it. In the quadratic example, it will be $m^d = (0, \dots, 0)$, i.e. the agent thinks about nothing.

• A parameter $\kappa \ge 0$, the penalty for lack of sparsity. If $\kappa = 0$, the agent will be the traditional, rational agent model. If $\kappa > 0$, the agent has a real taste for sparsity.

• A probability measure over the set of $x$: the DM behaves as if the $x$ were drawn from that measure. We assume that $E[x] = 0$. Two cases are worth distinguishing:

1. Attention chosen before seeing the variables (Ex ante attention). The DM chooses the weight $m_i$ before seeing the $x_i$. As a proxy for the likely magnitude of $x_i$, he uses $\sigma_i$, the standard deviation of $x_i$ under that measure.

2. Attention chosen after seeing the variables (Ex post attention). The DM chooses the weight $m_i$ after seeing the $x_i$. Formally, this is a special case of the former, where the agent knows the magnitude of the variables (but not the sign), and hence sets $\sigma_i = |x_i|$.

• Finally, a random variable $\eta$, a “modulus” for the discernible changes in action $a$. That will ensure that the model is invariant in the units in which $a$ is measured. In practice, only the product $\kappa\eta$ matters, but it’s useful to think of $\kappa$ and $\eta$ separately, as this way, $\kappa$ is unitless.

In a static context, these objects are just exogenous. However, in dynamic models, which

have more structure, they become more endogenous (Gabaix 2012a). For instance, the

“default model” may simply be the one that assumes that all variables are at their average

value. The random variable $\eta$ might represent the normal variability of the actions. At this
stage, though, we shall just take them as exogenous. Still, much of the economics will come
from the limited attention, not from playing with $m^d$ and $\eta$. In this sense, those parameters
need to be specified, but do not matter too much.

We define $a^d := \arg\max_a u(a, x, m^d)$, and will call it the default action. It’s the optimal action under the default model. We assume that function $u(a, x, m)$ is strictly concave in $a$, and twice continuously differentiable ($C^2$) in $(a, m)$ near $(a^d, m^d)$.

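Definition 1 (Sparse max operator without constraints). The sparse maximum, $u^s$, and sparse maximand, $a^s$, written $u^s = \operatorname{smax}_{a|\kappa,\sigma} u(a, x, \mu)$ and $a^s = \arg\operatorname{smax}_{a|\kappa,\sigma} u(a, x, \mu)$, are defined by the following procedure: First, calculate the optimally sparse representation of the world, $m^*$:

$$m^* = \arg\min_m \; \frac{1}{2}(m - \mu)' \, E\big[u_{am}' (-u_{aa}^{-1}) u_{am}\big] (m - \mu) + \kappa \sum_i \big|m_i - m_i^d\big| \, E\big[(u_{a m_i} \eta)^2\big]^{1/2} \tag{4}$$

Second, define the sparse action as $a^s = \arg\max_a u(a, x, m^*)$, and the resulting utility as $u^s = u(a^s, x, \mu)$. In (4) derivatives are evaluated at the default model and action: $m^d$, $a^d := \arg\max_a u(a, x, m^d)$.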

In other terms, the agent solves for the optimal $m^*$ that trades off a proxy for the utility losses (the first term on the right-hand side of equation 4) and a psychological penalty for deviations from a sparse model (the second term on the right-hand side of 4). Then, the agent maximizes over the action $a$, taking $m^*$ to be the true model.

When $\kappa = 0$, the sparse max is simply the regular max. Hence, the model continuously includes the traditional model with no cognitive friction.

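The expression $\Lambda(m) := \frac{1}{2}(m - \mu)' \, E\big[u_{am}' (-u_{aa}^{-1}) u_{am}\big] (m - \mu)$ is the quadratic approximation of the expected loss as a result of an imperfect model $m$.13 One modeling

13 Indeed, consider a function $u$ with no $x$, and $a(m) = \arg\max_a u(a, m)$, the best action under model $m$. The utility loss from using the approximate model $m$ rather than the true model $\mu$ is $L(m) := u(a(\mu), \mu) - u(a(m), \mu)$. A Taylor expansion shows that for $m$ close to $\mu$, $L(m) = \Lambda(m)$ to the leading order. Indeed, as $a(m)$ solves $u_a(a(m), m) = 0$, the implicit function theorem gives $u_{aa}\hat a + u_{am}\hat m = 0$, i.e., $\hat a = -u_{aa}^{-1} u_{am} \hat m$ with $\hat m = m - \mu$, where $\hat a$ is the first-order change in the action. Hence, the loss is, up to $o(\|\hat m\|^2)$ terms, $L = -u_a \hat a - \frac{1}{2}\hat a' u_{aa} \hat a = \frac{1}{2}\hat m' u_{am}' (-u_{aa}^{-1}) u_{am} \hat m$, since $u_a = 0$ at the optimum. Note that this holds if the derivatives are evaluated at the default model (as done here) or at the true model.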

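Figure 1: The anchoring and adjustment function $\tau(\mu, \kappa)$, plotted against $\mu$. It equals $\mu + \kappa$ for $\mu < -\kappa$, 0 between $-\kappa$ and $\kappa$, and $\mu - \kappa$ for $\mu > \kappa$.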

decision in defining the sparse max is to use the quadratic approximation $\Lambda(m)$ rather than the exact loss, which

would be very complex to use for both the decision maker and the economist.14 The deci-

sion maker uses a simplified representation of the loss from inattention. This is one way to

circumvent Simon’s “infinite regress problem” — that optimizing the allocation of thinking

cost can be even more complex than the original problem. I avoid that problem by assuming

a simpler representation of it, namely a quadratic loss around the default.

Let us now analyze the model further.

Model Analysis Let us first analyze the problem $\min_m \frac{1}{2}(m - \mu)^2 + \kappa |m|$, which is the elementary component of problem (4). There, $m$ wants to be close to the ideal weight $\mu$, but there is a linear penalty for deviation from 0, $\kappa |m|$. Its solution (see Lemma 1 in the appendix) is:

$$m = \tau(\mu, \kappa) \tag{5}$$

for the “anchoring and adjustment” function $\tau$ plotted in Figure 1 and defined as:

$$\tau(\mu, \kappa) = (|\mu| - |\kappa|)_+ \operatorname{sign}(\mu) \tag{6}$$

i.e., for $\kappa > 0$, $\tau(\mu, \kappa) = 0$ for $|\mu| \le |\kappa|$, $\mu - \kappa$ for $\mu > \kappa$ and $\mu + \kappa$ for $\mu < -\kappa$.

When the ideal weight is small ($|\mu| \le \kappa$), then the sparse weight is 0 ($m = 0$). All small components are replaced by 0. This confers sparsity on the model. Second, for $\mu > \kappa$, $m = \mu - \kappa$. This corresponds to a partial adjustment towards the correct value $\mu$. This motivates the term “anchoring and adjustment” (Tversky and Kahneman 1974).
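As a minimal sketch (not taken from the paper), the anchoring and adjustment function of eq. (6) coincides with the soft-thresholding operator used in the statistics literature cited earlier. The Python below implements it and checks by brute force that it minimizes the elementary objective $\frac{1}{2}(m - \mu)^2 + \kappa |m|$. The function names `tau`, `elementary_objective` and `grid_minimizer`, and the grid bounds, are illustrative assumptions.

```python
def tau(mu: float, kappa: float) -> float:
    """Anchoring-and-adjustment function: (|mu| - |kappa|)_+ * sign(mu)."""
    magnitude = max(abs(mu) - abs(kappa), 0.0)
    if mu > 0:
        return magnitude
    if mu < 0:
        return -magnitude
    return 0.0


def elementary_objective(m: float, mu: float, kappa: float) -> float:
    """Objective 0.5 * (m - mu)^2 + kappa * |m| of the elementary problem."""
    return 0.5 * (m - mu) ** 2 + kappa * abs(m)


def grid_minimizer(mu: float, kappa: float, step: float = 0.001) -> float:
    """Brute-force minimizer on a grid over [-3, 3]; used only as a check."""
    grid = [i * step for i in range(-3000, 3001)]
    return min(grid, key=lambda m: elementary_objective(m, mu, kappa))


if __name__ == "__main__":
    kappa = 0.25
    for mu in (-1.0, -0.2, 0.0, 0.2, 1.0):
        # The closed form and the grid search should agree up to the grid step.
        print(mu, tau(mu, kappa), grid_minimizer(mu, kappa))
```

Small ideal weights (here $|\mu| \le 0.25$) are mapped to exactly zero, while larger ones are shrunk towards zero by $\kappa$, which is the partial adjustment described above.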

Let us continue our analysis of the sparse max by considering the case where:

$$u(a, x, m) = u(a, m_1 x_1, \dots, m_n x_n) \tag{7}$$

and calculate the sparse max $\operatorname{smax}_a u(a, x, \mu)$. We consider the case where the agent views the $x_i$’s as uncorrelated, and $a$ is one-dimensional. We call $\sigma_a := E[\eta^2]^{1/2}$ the typical modulus for movements in $a$. We analyze the cases of ex ante and ex post attention in turn.

14 Tibshirani (1996, section 8) also recommends using a quadratic approximation in statistics.

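Proposition 1 When “attention is chosen before seeing the variables”, the smax operator can be equivalently formulated as: $a^s = \arg\operatorname{smax}_{a|\kappa,\sigma} u(a, (x_i)_{i=1 \dots n})$ and $u^s = \operatorname{smax}_{a|\kappa,\sigma} u(a, (x_i)_{i=1 \dots n})$. The attentional policy is:

$$m_i^* = \tau\Big(1, \frac{\kappa \sigma_a}{\sigma_i \cdot a_{x_i}}\Big) \tag{8}$$

where $a_{x_i} = -u_{aa}^{-1} u_{a x_i}$ is the traditional marginal impact of a small change in $x_i$, evaluated at the default model. The action is:

$$a^s = \arg\max_a u(a, m_1^* x_1, \dots, m_n^* x_n) \tag{9}$$

and the utility is $u^s = u(a^s, x, \mu)$.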

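Proposition 2 When “attention is chosen after seeing the variables”, the smax operator can be equivalently formulated as: $a^s = \arg\operatorname{smax}_{a|\kappa,\sigma} u(a, (x_i)_{i=1 \dots n})$ and $u^s = \operatorname{smax}_{a|\kappa,\sigma} u(a, (x_i)_{i=1 \dots n})$. The agent uses the truncated version $x_i^s$ of variable $x_i$, according to:

$$x_i^s = \tau\Big(x_i, \frac{\kappa \sigma_a}{a_{x_i}}\Big) \tag{10}$$

where $a_{x_i} = -u_{aa}^{-1} u_{a x_i}$ is the traditional marginal impact of a small change in $x_i$, evaluated at the default model. The action is $a^s = \arg\max_a u(a, x_1^s, \dots, x_n^s)$ and the utility is $u^s = u(a^s, x, \mu)$.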

To interpret the sparse max further, take the example $\kappa = 0.3$. Hence, the sparse max procedure in (8) implies:

“Eliminate each feature of the world (i.e., $x_i$, $i = 1 \dots n$) that would change the action by less than a fraction $\kappa = 30\%$ of the standard deviation of that action” (i.e., eliminate the $x_i$ such that $|a_{x_i} \cdot \sigma_i| < \kappa \sigma_a$).

This is how a sparse agent sails through life: for a given problem, out of the thousands of

variables that might be relevant, he takes into account only a few that are important enough

to significantly change his decision.
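The elimination rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's own code: the function name `retained_features` and the numbers are hypothetical, and the marginal impacts $a_{x_i}$ and standard deviations $\sigma_i$, $\sigma_a$ are taken as given. A feature is kept only when its typical impact on the action, $|a_{x_i} \sigma_i|$, exceeds $\kappa \sigma_a$.

```python
def retained_features(impacts, sigmas, kappa, sigma_a):
    """Indices i whose typical impact |a_xi * sigma_i| exceeds kappa * sigma_a."""
    threshold = kappa * sigma_a
    return [
        i
        for i, (a_xi, sigma_i) in enumerate(zip(impacts, sigmas))
        if abs(a_xi * sigma_i) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical marginal impacts a_xi and standard deviations sigma_i.
    impacts = [1.0, 0.5, 0.125, -0.0625]
    sigmas = [1.0, 1.0, 2.0, 1.0]
    # With kappa = 0.3 and sigma_a = 1, the last two features are dropped.
    print(retained_features(impacts, sigmas, kappa=0.3, sigma_a=1.0))
```

Setting `kappa=0.0` keeps every feature with a non-zero impact, which corresponds to the fully attentive agent.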

Then, after doing the simplification and removing many variables (many $m_i$ are 0), the
agent chooses the action based on his simplified set of variables, eq. 9. Let us illustrate this
by an example.15

15 Also (normalizing $\kappa \sigma_a = \sigma_i = 1$), it is easy to see that $m$ has at most $\min(\|\mu\|_1, \|\mu\|_2^2)$ non-zero components (because $m_i \ne 0$ implies $|\mu_i| \ge 1$). Hence, even with infinite-dimensional $m$ and $\mu$, provided the norm of $\mu$ is bounded, $m$ has a finite number of non-zero components, and is therefore sparse.


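Example 1 Take the quadratic loss problem, (3). The traditional, non-sparse answer is: $a = \sum_{i=1}^{n} \mu_i x_i$. The sparse answer is, with ex ante attention:

$$a^{s,\text{ex ante}} = \sum_{i=1}^{n} \tau\Big(\mu_i, \frac{\kappa \sigma_a}{\sigma_i}\Big) x_i \tag{11}$$

while with ex post attention, it is $a^{s,\text{ex post}} = \sum_{i=1}^{n} \tau(\mu_i x_i, \kappa \sigma_a)$.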
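Example 1 can be checked numerically. The sketch below assumes that the ex ante rule weights variable $i$ by $\tau(\mu_i, \kappa\sigma_a/\sigma_i)$ and that the ex post rule truncates each term $\mu_i x_i$ at $\kappa\sigma_a$. All function names and numbers are hypothetical choices made for illustration.

```python
def tau(mu, kappa):
    """Anchoring-and-adjustment function: (|mu| - |kappa|)_+ * sign(mu)."""
    magnitude = max(abs(mu) - abs(kappa), 0.0)
    return magnitude if mu > 0 else -magnitude if mu < 0 else 0.0


def rational_action(mu, x):
    """Traditional answer: a = sum_i mu_i * x_i."""
    return sum(mu_i * x_i for mu_i, x_i in zip(mu, x))


def sparse_action_ex_ante(mu, x, sigma, kappa, sigma_a):
    """Ex ante attention: weights depend on sigma_i, not on the realized x_i."""
    return sum(
        tau(mu_i, kappa * sigma_a / sigma_i) * x_i
        for mu_i, x_i, sigma_i in zip(mu, x, sigma)
    )


def sparse_action_ex_post(mu, x, kappa, sigma_a):
    """Ex post attention: each realized term mu_i * x_i is truncated."""
    return sum(tau(mu_i * x_i, kappa * sigma_a) for mu_i, x_i in zip(mu, x))


if __name__ == "__main__":
    mu = [1.0, 0.5, 0.125, 0.0625]   # ideal weights
    x = [1.0, -1.0, 1.0, 2.0]        # realized variables
    sigma = [1.0, 2.0, 1.0, 1.0]     # standard deviations of the x_i
    kappa, sigma_a = 0.25, 1.0
    print(rational_action(mu, x))
    print(sparse_action_ex_ante(mu, x, sigma, kappa, sigma_a))
    print(sparse_action_ex_post(mu, x, kappa, sigma_a))
```

In this example the two smallest weights are ignored under both rules, and the retained ones are shrunk, so the sparse actions differ from the rational one while using only two of the four variables.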

2.2 Sparse Max With Constraints

We extend the model, so that it handles maximization under constraints. The decision maker

wishes to maximize a utility subject to constraints, i.e. (2).

A leading example is the following: We start from a default price $p^d$. The new price is $p = p^d + x$, while the price perceived by the agent is $p(m) = p^d + m x$. The ideal vector of weights is thus $\mu = (1, \dots, 1)$. This can be rewritten as: $p(m) = m p + (1 - m) p^d$. Then, $b$ is the budget constraint under model parameter $m$, $b(c, p - p^d, m) = w - p(m) \cdot c$. We shall call $p^s := p(m^*)$ the vector of prices perceived by the sparse agent.

We assume that $u$ and $b$ are concave in $a$ (and at least one of them strictly concave) and $C^2$ around $(a^d, m^d)$.

Definition 2 (Sparse max operator with constraints). The sparse maximum of problem (2) is defined by the following two steps. We call $L(a, x, m, \lambda) := u(a, x, m) + \lambda \cdot b(a, x, m)$ the associated Lagrangian.

1. i. Select the Lagrange multiplier $\lambda^d \in \mathbb{R}_+$ associated with the problem (2) at the default model, so that the optimal action under the default model is $\arg\max_a L(a, x, m^d, \lambda^d)$.

ii. Selection of model $m^*$. Use the sparse max operator of Definition 1, without constraints, for the function $L(a, x, m, \lambda^d)$. That returns a representation $m^*$.

2. Action taking the constraints into account.

i. Form a function $a(\lambda) = \arg\max_a L(a, x, m^*, \lambda)$, the optimal action under model $m^*$ with Lagrange multiplier $\lambda$.

ii. Maximize utility subject to the “true” constraint: $\lambda^* = \arg\max_{\lambda \in \mathbb{R}_+} u(a(\lambda), x, m^*)$ subject to $b(a(\lambda), x, \mu) \ge 0$. With just one binding constraint this is equivalent to choosing the lowest $\lambda^*$ so that the constraint does bind: $b(a(\lambda^*), x, \mu) = 0$. The resulting sparse action is $a^s := a(\lambda^*)$. Utility is $u^s = u(a^s, x, \mu)$.

Model justification This device does not depend on the sparsity perspective to trans-

late a BR maximum without constraints, into a BR maximum with constraints. It could be

reused in other contexts. Let me show how it is a reasonable way to extend the operator to

add constraints.


The first step seems quite natural. To turn a problem with constraints into an unconstrained problem, we add the “price” of the constraints to the utility. Step 1.i of Definition 2 picks a Lagrange multiplier, $\lambda^d$, using the default representation $m^d$. For instance, in a consumption problem (1), $\lambda^d$ is the “marginal utility of a dollar”, at the default prices. This way, in Step 1.ii we can use the Lagrangian $L(a, x, m, \lambda^d)$ to encode the importance of the constraints and maximize it without constraints, so that the basic sparse max can be applied. That yields the chosen attention allocation, $m^*$.

The second step comes from the following intuition.16 Take the consumer problem (1): we would like to keep the usual psychological / economic reasoning that the ratio of marginal utilities is the ratio of perceived prices:17 $u'_1 / u'_2 = p^s_1 / p^s_2$. Pretty much the only way to keep that intuition and the budget constraint is to say that

$$u'(c) = \lambda p^s \tag{12}$$

for some scalar $\lambda$. The agent will tune $\lambda$ so that the constraint binds, i.e. the value of $c(\lambda)$ defined above satisfies $p \cdot c(\lambda) = w$.18 In step 2, the agent “hears clearly” if the budget constraint binds.19 More psychological intuition will be given below Proposition 4.
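To make step 2 concrete, here is a small Python sketch for a consumer with log utility $u(c) = \sum_i \alpha_i \ln c_i$. The utility function, the attention vector `m` (taken as given rather than computed from step 1), and every name below are assumptions made for illustration. With this utility, condition (12) gives $c_i(\lambda) = \alpha_i / (\lambda p^s_i)$, and the true budget constraint $p \cdot c(\lambda) = w$ pins down $\lambda$ in closed form.

```python
def perceived_prices(p, p_default, m):
    """Perceived price p^s_i = m_i * p_i + (1 - m_i) * p^d_i."""
    return [m_i * p_i + (1.0 - m_i) * pd_i for p_i, pd_i, m_i in zip(p, p_default, m)]


def sparse_demand(alpha, p, p_default, m, w):
    """Demand of a log-utility consumer who perceives prices p^s but must
    satisfy the true budget constraint p . c = w (attention m is exogenous here)."""
    p_s = perceived_prices(p, p_default, m)
    # u'(c) = lam * p^s  =>  c_i = alpha_i / (lam * p^s_i).
    # Imposing p . c = w  =>  lam = (1 / w) * sum_i alpha_i * p_i / p^s_i.
    lam = sum(a_i * p_i / ps_i for a_i, p_i, ps_i in zip(alpha, p, p_s)) / w
    c = [a_i / (lam * ps_i) for a_i, ps_i in zip(alpha, p_s)]
    return c, lam


if __name__ == "__main__":
    alpha = [0.5, 0.5]
    p_default = [1.0, 1.0]
    # All prices and the budget rise by 10% relative to the default.
    p, w = [1.1, 1.1], 1.1
    c_full, _ = sparse_demand(alpha, p, p_default, [1.0, 1.0], w)
    c_sparse, _ = sparse_demand(alpha, p, p_default, [1.0, 0.5], w)
    print(c_full)    # full attention: demand is unchanged by the nominal shift
    print(c_sparse)  # partial attention to good 2: demand tilts towards good 2
```

The second call illustrates the nominal illusion discussed in the introduction: the agent under-perceives the price increase of the less salient good and buys more of it, while the true budget constraint still binds.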

Advanced topics The following properties will be useful later, but the reader is encouraged to skip this subsection at the first reading. With the sparse max, the envelope theorem holds at the default model, but needs to be modified away from it (see online appendix, section 9.1). In addition, the sparse max has the following nice duality property.

Proposition 3 (Duality is respected by the sparse max). Suppose $-u$, $b$ are convex in $a$, at least one of them strictly so, and let $\hat b$, $\hat u$ be two real numbers. Consider the dual optimization problems: (i) $v(\hat b) := \operatorname{smax}_a u(a, x, \mu)$ s.t. $b(a, x, \mu) \le \hat b$, (ii) $e(\hat u) := \operatorname{smin}_a b(a, x, \mu)$ s.t. $u(a, x, \mu) \ge \hat u$. Assume that the constraint binds for problem (i) at $\hat b$. Then the two problems are dual of each other, i.e. $e(v(\hat b)) = \hat b$ and $v(e(\hat u)) = \hat u$. They also yield the same sparse model $m^*$.

2.3 Psychological Underpinnings

The model is based on very robust psychological facts: (i) limitedness of attention and

working memory; (ii) use of defaults and anchor; (iii) anchoring and adjustment processes.

16 The equivalence with $b(a(\lambda^*), x, \mu) = 0$, mentioned in Step 2, is justified in Lemma 2 in the appendix.

17 Otherwise, as usual, if we had $u'_1 / u'_2 > p^s_1 / p^s_2$, the consumer could consume a bit more of good 1 and less of good 2, and project to be better off.

18 If there are several $\lambda$, the agent takes the smallest value, which is the utility-maximizing one.

19 This model, with a general objective function and constraints, delivers, as a special case, the third adjustment rule discussed by the NBER WP version of Chetty, Looney and Kroft (2009) in the context of consumption between two goods and one tax.


I take them in turn.

Limited attention and sparsity It is clear that we do not handle thousands of variables
when dealing with a specific problem. For instance, research on working memory
documents that people handle roughly “seven plus or minus two” items (Miller 1956, Kahneman
2011). At the same time, we do know (in our long-term memory) about many
variables, $x_i$. The model roughly represents that selective use of information. In step 1, the
mind contemplates thousands of $x_i$, and decides which handful it will bring up for conscious
examination. Those are the variables with a non-zero $m_i$. We simplify problems, and can
attend to only a few things: this is what sparsity represents.20

Recall the terminology for mental operations of Kahneman (2003), where system 1 is

the intuitive, fast, largely unconscious and parallel system, while system 2 is the analytical,

slow, conscious system. One could say that the choice of “what comes to mind” in Step

1 is a system 1 operation that (unconsciously) selects what to bring up to the

conscious mind. Step 2 is more like a system 2 operation, determining what to choose, given

a restricted set of variables actively considered.

Reliance on defaults What about when we have no time to think? What comes to
mind? This is represented by the vector $m^d$, the default parametrization of variables taken
into account when we have no time to think.21 This default model, and the default action
(which is the optimal action under the default model), correspond to “system 1 under
extreme time pressure”. The importance of defaults has been shown in a growing literature
(Carroll et al. 2009). Here, the model default is very simple (basically, it is “do not think
about anything”, $m^d = 0$), but it could be enriched, following other models (e.g. Gennaioli
and Shleifer 2010).

Anchoring and adjustment The mind, in the model, anchors on the default model.

Then, it does a full or partial adjustment towards the truth. This is exactly the psychology of

20There is a rich literature in psychology featuring elimination of dimensions (e.g., Tversky’s “elimination

by aspects” 1972, and the literature reviewed in Payne, Bettman and Johnson 1993; see also Gabaix, Laibson,

Moloche and Weinberg 2006). The theme is that, given processing cost, the DM must not (and indeed,

cannot) consciously pay attention to many dimensions. This is very intuitive, and there is strong experimental

support for this, e.g., by looking at active clicks in information lookups (Payne, Bettman and Johnson 1993).

Partly unconsciously, the mind monitors many things in parallel; when they’re important or unusual enough

they’re brought to consciousness. Most of the time, it is modelled in psychology as a process. The advantage

of process models (e.g., sequential search of information) is that they make good intuitive sense. The

disadvantage is that the whole search process needs to be simulated to obtain actual predictions, so that

the predictions are somewhat opaque and cumbersome. The sparsity-based model is a model of elimination

of dimension, which eschews any step by step process, hence obtaining a closed-form representation of what

agents will really take into account. I have tried to optimize it to make it tractable and widely applicable,

while trying to capture some core psychology of inattention.
21 Any model of limited attention needs something akin to a “default”. Bayesian models, for instance, need

a “prior.”


“anchoring and adjustment,” as dubbed by Tversky and Kahneman (1974). There is anchor-

ing on a default value and partial adjustment towards the truth (e.g., people pay only partial

attention to the base rate when forming probability inferences); “People make estimates by

starting from an initial value that is adjusted to yield the final answer [...]. Adjustments

are typically insufficient. That is, different starting points yield different estimates, which

are biased toward the initial value” (Tversky and Kahneman, 1974, p. 1129). It now has

a plethora of experimental evidence supporting it, and appears central to the ability of the

mind to find its way in complex problems (Kahneman 2011, chapter 11).

In the model, this effect is generated by the anchoring and adjustment function . It

exhibits anchoring on the default model, and partial adjustment towards the truth. It would

be interesting to experimentally investigate the function — perhaps to refine it. The com-

parative statics make sense (less important variables are less used). The quantitative forms

would make sense too. When the variable has high values, it is largely taken into account.

Hence, even though there is no specific experimental evidence regarding this function, the

extensive psychological evidence supports its basic elements.

Purposeful attention, directed to a priori important things In this model, the

DM pays more attention to more important things. Recent models show the rich psychologi-

cal implication of that basic fact. In Bordalo, Gennaioli and Shleifer (2012), agents choosing

between two gambles pay more attention to states where the two gambles are most different.

In Koszegi and Szeidl (2012), people focus more on features that most differ in the choice

set.

Discussion When presenting any boundedly rational procedure, certain modelling de-

cisions have to be made. Going away from the safe shores of rationality, we venture into the

unknown. An example is the literature on learning in games (Fudenberg and Levine 1998),

which features somewhat ad hoc algorithms, like fictitious play. Indeed, it may be instruc-

tive to note that many of the useful innovations in basic modelling have started without any

axiomatic basis: prospect theory, hyperbolic discounting, learning in games, fairness models,

Calvo pricing etc. Sometimes the axioms came, but later.22

Criteria to judge a model include: 1. Usability of the model: portability and tractability;
2. Formal properties (e.g. representation invariance); 3. Predictions; 4. Intrinsic necessity
/ axioms. I’ll insist on criteria 1 to 3, while deferring criterion 4 to the online appendix.23

22 The rich work on sparsity (Candès and Tao (2006), Donoho (2006)) has established many near-optimality
properties of the use of the “$\ell_1$” norm, which can be used as a substitute for the perhaps more
ideal, but intractable, notion of number of non-zero coefficients. It is conceivable that some of those results
could be used to quantify the optimality properties of the attention procedure in the sparse max, and perhaps
improve its formulation.
23 The online appendix shows, however, that given the quadratic-linear structure of the model, its scaling
factors (e.g. the multiplication by kk and the linke) are rendered necessary by an axiom that the


I’d like to contrast the model with other approaches such as noisy signals with Bayesian

updating and fixed costs, which may do better on criterion 4, but are quite problematic for

criterion 1. The sparsity model, we shall see, is very easy to use, as I will illustrate in a
core question of economics: consumer and equilibrium theory. It will make one prediction
which I believe is distinctive and quite true regarding the impact of limited attention on
cross-elasticities (the Slutsky matrix).

Isn’t that a complex problem? One could object that it is easier to optimize on $a$, like
in the traditional model, than on $a$ and $m$, like in the sparse max. This is not the case
when the situation is seen the following way: at time 0, so to speak, the agent chooses an
“attentional policy”, i.e. the $m^*$. Then, he is ready to react to many situations, with a
precompiled, sparse attention that allows him to focus on just a few variables. In that way,
it is economical to use something like the sparse max.

In addition, the sparse max leads (at least in many situations) to a quite simple algorithm
(for the agents), as shown in Proposition 2.

If you know and , why not use them? One interpretation is that it is system 1

(Kahneman 2003) who, at some level, knows (i.e. has a sense that the volatility of the

interest rate isn’t important), and chooses not to bring it to the attention of system 2, for a more

thorough analysis. System 1 chooses the representation , while system 2 takes care of the

actual maximization, with a simpler problem.24

How does the agent know , ? This knowledge is that system 1 has a sense of what

variables are important, and which are not, at the default model. It seems intuitive that for

many problems at least, agents do have a sense of which variables are important or not. To

keep the model simple, this sense is encoded by a knowledge of and .

Why not a fixed cost or a quadratic cost? The “taste for sparsity” in (4) features a linear
cost, $\kappa |m_i|$ (normalizing $m^d_i = 0$). Why not a more general cost, say $\kappa |m_i|^\alpha$ for a more general
$\alpha \ge 0$? First, that could well be a good idea, in some cases. Then (assuming that the $x_i$ are uncorrelated), we obtain a different function. The online appendix works out the case
of fixed cost. However, in general, an $\alpha > 1$ would generate no sparsity (generically, all $m_i$
would be non-zero). An $\alpha < 1$ (including a fixed cost, which is $\alpha = 0$) would generate
a non-convex problem ($|m_i|^\alpha$ is not convex then). This would generate a very intractable problem in general (an NP-complete problem).25 Hence, the case $\alpha = 1$ is the only one to
generate both sparsity and tractability.
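To see why the exponent matters, here is a minimal numerical sketch (not taken from the paper). It assumes a stylized one-variable version of the attention problem, $\min_m \frac{1}{2}(m - \mu)^2 + \kappa |m|^\alpha$, where $\mu$ is a hypothetical stand-in for the attention the agent would pay at zero cost. The function names and all numbers are illustrative assumptions.

```python
import numpy as np

def attention_linear_cost(mu, kappa):
    """argmin_m 0.5*(m - mu)**2 + kappa*|m| : soft thresholding (exact zeros)."""
    return np.sign(mu) * np.maximum(np.abs(mu) - kappa, 0.0)

def attention_quadratic_cost(mu, kappa):
    """argmin_m 0.5*(m - mu)**2 + kappa*m**2 : proportional shrinkage, no exact zeros."""
    return mu / (1.0 + 2.0 * kappa)

def attention_fixed_cost(mu, kappa):
    """argmin_m 0.5*(m - mu)**2 + kappa*1{m != 0} : hard thresholding (non-convex)."""
    # Paying attention (m = mu) costs kappa; ignoring (m = 0) costs 0.5*mu**2.
    return np.where(0.5 * mu**2 > kappa, mu, 0.0)

# Hypothetical "costless" attention levels and cost parameter
mu = np.array([0.05, 0.2, 0.6, 1.0])
kappa = 0.3

m_l1 = attention_linear_cost(mu, kappa)
m_l2 = attention_quadratic_cost(mu, kappa)
m_fc = attention_fixed_cost(mu, kappa)
print("linear cost   :", m_l1)
print("quadratic cost:", m_l2)
print("fixed cost    :", m_fc)
```

With these numbers, the linear cost sets the two small entries exactly to zero and shrinks the others; the quadratic cost shrinks every entry but leaves none at zero; the fixed cost also produces zeros, but through a non-convex objective, which is what makes the many-variable version hard to solve.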

The agent’s decisions depend on the basis, i.e. on which $x_i$ are available. That is arguably
a desirable feature of the model. For instance, take the Chetty, Looney and Kroft (CLK,
2009) results, where $x_1$ is the sticker price, and $x_2$ the extra tax (all demeaned), and consumption

model should be invariant to the choice of units.
24 See e.g. Fudenberg and Levine (2006), Brocas and Carrillo (2008) for very different dual-self models.
25 This special role of $\alpha = 1$ has been elucidated by Tibshirani (1996). Hassan and Mertens (2011) also
use $\alpha = 1$.


should reflect the total price ($x_1 + x_2$), then a sparse agent will react with a high weight on
the sticker price, and a low weight on the tax (because the tax rate is expected to vary less).
This conforms to the experimental evidence of CLK. To model that the sticker price is very
“available”, we put it on the basis, and don’t put the total price on the basis: calculating
the real price requires more effort. In contrast, a “pure” rational inattention-with-entropy
agent (à la Sims 2003) has predictions that do not depend on the basis. This is one of the
appealing features of the Sims model. However, this leads to counterfactual predictions, in
this context at least. The Sims model will dampen equally the sticker price and the tax,
i.e. people will perceive $m (x_1 + x_2)$ with the same dampening $m \in [0, 1]$.26 The experimental results of CLK support that lower weight on the seemingly less important part, the tax. To
account for such empirical facts, the basis available to the agents does indeed matter.27

Invariance by rescaling The model is invariant by rescaling; its predictions are indepen-

dent of the units in which the components of and are measured. For instance, || kkdoes not depend on the units of. More generally, the model is invariant (for small changes)

to reparametrizations of the action. For instance, if the agent picks consumption or log con-

sumption, the representation chosen by the decision maker is the same. This adds some

robustness and ease of use to the model. In addition, the model is invariant by (possibly

nonlinear) transformation of the utility function (see online appendix, section 9.2).

Where does the modulus for action come from? One possibility is to take the normal

variability of the action under the true model. Another is to take the variability of the

action under a fixed point. This interesting issue, fortunately, does not matter too much for

many applications. The pros and cons are discussed in the online appendix, section 9.3.

Calibration We can venture a word about calibration. As a rough baseline, we can
imagine that people will search for information that accounts for at least $\kappa^2 = 10\%$ of the
variance of the decision, i.e., if || . Then, we find = (cf section 2.1). That leads
to the baseline of $\kappa = \sqrt{10\%} \simeq 0.3$. The reader may find that, rather than 10%, $\kappa^2 = 1\%$ is
better (though this may be very optimistic about people’s attention), which corresponds to
$\kappa = \sqrt{1\%} = 0.1$, a number still in the same order of magnitude as $\kappa \simeq 0.3$. As it turns out,
in subsequent work (Gabaix 2012b), the calibration $\kappa \simeq 0.3$ works quite well in predicting subjects’ behavior in experimental games.

Section 5.1 contains more justification. Given it involves more math, I defer it. Instead,

let us move to more concrete economics, and apply the sparse max to basic consumer theory.

26With entropy penalty à la Sims (2003), the solution to the quadratic problem is: E£Entropy | ¤ =

P

, for a ∈ [0, 1] which increases with the attention budget. Hence, all dimensions are dampened equally. In contrast, in the sparse model (11), less important dimensions are dampened more.
27 This is why other researchers on rational inattention (Mackowiak and Wiederholt 2009, Woodford 2012)

also have basis-dependent models, or their counterpart in their framework.


3 Consumer Theory

3.1 Basic Consumer Theory: Marshallian Demand

We revisit the theory of consumer’s behavior, with a sparse flavor. We rewrite (1) with the
more compact notation $c = (c_1, \ldots, c_n)$: Marshallian demand given price vector $p$ and budget
$w$ is: $c(p, w) = \arg\max_{c \in \mathbb{R}^n} u(c)$ subject to $p \cdot c \le w$. The perceived price is:

$p^s_i(m) = m_i p_i + (1 - m_i) p^d_i$ (13)

When $m_i = 1$, the agent fully perceives price $p_i$, while when $m_i = 0$, he replaces it by the
default price. In some cases, it is useful to work in percentage terms, i.e. in logs,

$p^s_i(m) = (p_i)^{m_i} (p^d_i)^{1 - m_i}$ (14)

We call $v(p, w) = u(c(p, w))$ the indirect utility function.28 We call $e(p, \hat u) = \min_c p \cdot c$ s.t. $u(c) \ge \hat u$ the expenditure function (the minimum expenditure required to reach utility
level $\hat u$) and the associated Hicksian demand $h(p, \hat u) = \arg\min_c p \cdot c$ s.t. $u(c) \ge \hat u$.

The sparse max in Definition 2 has two parts: the first part states the optimal (sparsely)
perceived price that we will call $p^s := p^s(m^*)$ as a short-hand.29 Let us defer its analysis
to focus on the second part. Given the perceived price $p^s$, what is the desired consumption
$c^s(p, w)$ of a sparse agent? We call $c^r(p, w)$ the Marshallian demand under the traditional
model. Superscript $s$ denotes a sparsely attentive version, and $r$ the traditional, rational,
model.

Proposition 4 (Marshallian demand). Given the true price vector $p$ and the perceived price
vector $p^s$, the Marshallian demand of a sparse agent is

$c^s(p, w) = c^r(p^s, w')$ (15)

where the as-if budget $w'$ solves $p \cdot c^r(p^s, w') = w$ (if there are several such $w'$, take the largest
one), i.e. ensures that the budget constraint is hit under the true price.
When rational demand is linear in wealth ($c^r(p, w) = c^r(p, 1)\, w$), the demand of a sparse
agent is:

$c^s(p, w) = \dfrac{c^r(p^s, w)}{p \cdot c^r(p^s, 1)}$ (16)

i.e. $c^s(p, w) = c^r\!\left(p^s, \dfrac{w}{p \cdot c^r(p^s, 1)}\right)$.

To unpack this Marshallian demand, we start with an example.

28 Hence $v$ is only the “consumption utility.” One could imagine a richer notion using psychic costs, but
that raises many other interesting issues (Bernheim and Rangel 2009).
29 In the framework of the previous section, the price comes from a distribution whose mean is $p^d$.


Example 2 (Demand by a sparse agent with quasi-linear utility). Take $u(c) = v(c_1, \ldots, c_{n-1}) + c_n$,
with $v$ strictly concave. Demand for good $i < n$ is independent of wealth and is:
$c^s_i(p) = c^r_i(p^s)$.

In the above example, the demand of the sparse agent is simply the rational demand given
the perceived price (for all goods but the last one). The budget constraint is “absorbed” by
the residual good $n$. This is often the most plausible way the budget is respected: extra over-
and under-spending is absorbed in a general fund, for instance savings.

In the world of static consumption theory, without a linear good, there is no “residual
account” in which mistakes, and income effects, can be absorbed. The formulation of Proposition
4 handles that case. It says, “given an as-if budget $w'$, optimize with perceived prices
$p^s$: that gives $c^r(p^s, w')$. If the budget constraint isn’t saturated, change the budget $w'$ so as
to hit the budget constraint”.

Let us say that the consumer goes to the supermarket, with a budget of $w = \$99$.
Because of the lack of full attention to prices, the value of the basket in the cart is actually
$p \cdot c^r(p^s, w) = \$100$. When demand is linear in wealth, the consumer buys 1% less of all the
goods, to hit the budget constraint, and spends exactly $99 (this is the adjustment factor
$1 / (p \cdot c^r(p^s, 1)) = 0.99$). When demand is not necessarily linear in wealth, the adjustment is (to the leading order) proportional to the marginal demand,
$\partial c^r / \partial w$, rather than the average
demand, $c^r / w$. When he understands that he must consume less than he expected, the sparse
agent cuts “luxury goods”, not “necessities”.
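The supermarket logic can be sketched numerically. The following is a minimal illustration, assuming Cobb-Douglas utility (so that demand is linear in wealth and the rescaling to the true budget is exact) and a perceived price that is a weighted average of the true and default prices. The function names and all numbers are hypothetical.

```python
import numpy as np

def perceived_price(p, p_default, m):
    """Perceived price: weight m on the true price, 1 - m on the default price."""
    return m * p + (1.0 - m) * p_default

def rational_demand_cd(p, w, alpha):
    """Rational Cobb-Douglas demand for u(c) = sum_i alpha_i * ln(c_i)."""
    return alpha / alpha.sum() * w / p

def sparse_demand_cd(p, w, alpha, p_default, m):
    """Plan at perceived prices, then rescale so spending equals w at true prices."""
    p_s = perceived_price(p, p_default, m)
    basket = rational_demand_cd(p_s, w, alpha)  # basket planned under perceived prices
    cost = p @ basket                           # what that basket costs at true prices
    return basket * (w / cost)                  # proportional cut (or increase)

# Hypothetical numbers: the price of good 2 rose by 50%, half of which is noticed
p_default = np.array([1.0, 1.0])
p = np.array([1.0, 1.5])
m = np.array([1.0, 0.5])
alpha = np.array([1.0, 1.0])
w = 99.0

c_s = sparse_demand_cd(p, w, alpha, p_default, m)
c_r = rational_demand_cd(p, w, alpha)
print("sparse demand  :", c_s, " spending:", p @ c_s)
print("rational demand:", c_r, " spending:", p @ c_r)
```

In this example the planned basket would cost 108.9 at true prices, so every quantity is scaled by 99/108.9. The sparse agent ends up buying more of good 2 than the rational agent, because he perceives only half of its price increase.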

Here are two other concrete examples. Recall that is the perceived price.

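Example 3 (Demand by a sparse Cobb-Douglas agent). Take $u(c) = \sum_{i=1}^n \alpha_i \ln c_i$, with
$\alpha_i \ge 0$. Demand is: $c^s_i(p, w) = \dfrac{\alpha_i / p^s_i}{\sum_j \alpha_j p_j / p^s_j}\, w$.

Example 4 (Demand by a sparse CES agent). Take $u(c) = \sum_{i=1}^n c_i^{1 - 1/\eta} / (1 - 1/\eta)$, with
$\eta > 0$. Demand is: $c^s_i(p, w) = \dfrac{(p^s_i)^{-\eta}}{\sum_j p_j (p^s_j)^{-\eta}}\, w$.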

Determination of the attention to prices, $m^*$. The exact value of attention, $m$, is
not essential for many issues, and this subsection might be skipped in a first reading. The
model endogenizes the weights in (13) as follows.

Proposition 5 (Attention to prices). In the basic consumption problem, assuming either

that price shocks are uncorrelated, or that utility is separable, attention to prices is:

∗ =

µ1

¶(17)

where = − (00−1) is the elasticity of demand for good .


Empirical work already measures something akin to those attention weights. For instance,
Allcott and Wozny (2012) find that car buyers behave as if they put a weight $m = 0.72$ on
gas prices, rather than a weight of 1. Chetty, Looney and Kroft (2009) find that people take
the tax partially into account, with $m = 0.35$.

Proposition 5 states that attention to prices is greater for goods (i) with more volatile
prices, and (ii) with higher price elasticity (i.e., roughly, for goods whose price is more
important in the purchase decision). These predictions seem sensible, though not extremely
surprising.30 Still they might provide plausible hypotheses for empirical work, as they
express attention in terms that are observable in principle (except for the parameter $\kappa$, but
it is hypothesized to be the same across goods). It would be interesting to test them. It is
not easy, but we shall see that the results below (especially (20)) offer a way to do that.

The online appendix (sections 8.3 and 8.4) contains two applications of Proposition 5. The
first is to calculate inattention in market equilibrium, i.e., with endogenous price volatility.
The second application is to shrouded attributes: extant analyses typically assume that
attention is paid to the base good (e.g. the printer) rather than the add-on (e.g. the
cartridge, see Gabaix and Laibson 2006). Proposition 5 allows one to derive, rather than merely
assume, this effect, and to qualify its domain of validity.

It is comforting to note that the sparse machinery can be applied to those potentially

arduous problems, even though the “direct” predictions are not very surprising. For our

theoretical issues, what is important is that we do have some procedure to pick the $m_i$, so

that the model is closed. This allows us to derive the “indirect” consequences of limited

attention to prices. More surprises happen here, as we shall now see.

3.2 Edgeworth Box: Extra-dimensional Offer Curve and Nominal

Illusion

To illustrate how a sparse agent is qualitatively different from a traditional agent, we consider
his offer curve. Take a consumer with endowment $\omega \in \mathbb{R}^n$ (he is endowed with $\omega_i$ units of
good $i$, $i = 1, \ldots, n$). Given a price vector $p$, his wealth is $p \cdot \omega$, and so his demand is $D(p) := c(p, p \cdot \omega)$, which has values in $\mathbb{R}^n$. The offer curve is defined as the set of demands, as
prices vary (perhaps as the Walrasian auctioneer varies them): $OC := \{ D(p) : p \in \mathbb{R}^n_{++} \}$.

Let us start with two goods ( = 2). The left panel of Figure 2 is the offer curve of

the rational consumer: it has the traditional shape. The right panel plots the offer curve of

30The term can be directly measured in empirical work. In theory, it could just be a parameter,

e.g. = 0.1 if only changes that make consumption move by 10% are deemed important enough for

the agent. Another variant is to assume that, under the default model, the agent just pays attention to his

wealth, wealth has standard deviation , and =

is the standard deviation of under the default

model. If the is the actual standard deviation of consumption for the consumer, then there is a fixed

point, and the solution is simple in the case where one good (“money”) has linear utility: = 1 (1 + ).

The online appendix (section 9.3) discusses this further.


[Figure 2: two panels, “Offer Curve: Traditional agent” (left) and “Offer Curve: Sparse agent” (right). Axes: c1 (horizontal) and c2 (vertical), each running from 0.0 to 1.0; the endowment is marked Ω.]

Figure 2: This Figure shows part of the offer curve with Cobb-Douglas preferences. It shows
the set of demanded consumptions $c(p, p \cdot \omega)$, as the price vector $p$ varies. The left panel is the traditional (rational) agent’s offer curve. The right panel is the sparse agent’s offer
curve (in gray): it is a 2-dimensional surface. It is plotted for prices in the range $[1/4, 4]$ (the full offer curve is described at the end of this subsection). Utility is $u(c_1, c_2) = \ln c_1 + \ln c_2$. For
the sparse agent, we have $p^d = (1, 1)$ and $m = (1, 1/2)$.

a sparse consumer with the same basic preferences: it corresponds to the gray area in the
right panel. It is a 2-dimensional “ribbon”, with a pinch at the endowment,
rather than the 1-dimensional curve of the rational consumer. The offer curve has acquired
an extra dimension.

What is going on here? Normally, the offer curve is one-dimensional in the traditional
model. Because demand $c(p, p \cdot \omega)$ is homogeneous of degree 0 in $p = (p_1, p_2)$, only the
relative price $p_1 / p_2$ matters. However, in the sparsity model, Marshallian demand $c^s(p, w)$ is
not homogeneous of degree 0 in $(p, w)$ any more (generally speaking; we will specify conditions
later). This is a form of nominal illusion. Not just the relative price $p_1 / p_2$ matters.31 Hence,
the offer curve is described by two parameters $(p_1, p_2)$, so its image is 2-dimensional. One
might call this an “extra-dimensional offer curve” in the Edgeworth box.32

31 It is instructive to stare at the result directly, with $u(c_1, c_2) = \ln c_1 + \ln c_2$ and $p^d = (1, 1)'$. The BR
agent’s offer curve is the set of $(c_1, c_2)$ such that: $u'(c_2) / u'(c_1) = p_2^{m_2} / p_1^{m_1}$ (using the log prices specification
of attention) and such that the budget constraint holds, $p \cdot (c - \omega) = 0$. Hence, the OC is the set of $(c_1, c_2)$
that satisfy the following two equations for some $p_1$ and $p_2$:

$\dfrac{c_1}{c_2} = \left(\dfrac{p_1}{p_2}\right)^{-m_1} p_2^{m_2 - m_1}$

$\dfrac{p_1}{p_2}(c_1 - \omega_1) + c_2 - \omega_2 = 0$

Hence, when $m_1 \ne m_2$ (in the rational model, $m_1 = m_2 = 1$), the offer curve is described by two
parameters: $p_1 / p_2$ and $p_2^{m_1 - m_2}$. The offer curve is two-dimensional.
32 This “2-dimensional offer curve” is distinct from the textbook “thick indifference curve”. The “thick

indifference curve” arises when the consumer violates strict monotonicity (i.e. likes equally 7 and 7.1 ba-

nanas), is not associated to any endowment or prices, and has no pinch. The sparse offer curve, in contrast,

arises from nominal illusion, needs an endowment and prices, and has a pinch at the endowment.


To better understand the nominal illusion, we need to examine how we can have $c^s(\lambda p, \lambda w) \ne c^s(p, w)$ for a multiplicative factor $\lambda \ne 1$. Suppose that the Walrasian auctioneer increases all prices by 10% and the budget by 10% (i.e. $\lambda = 1.1$). For a rational consumer, nothing really
changes and he picks the same consumption. However, consider a sparse consumer who pays
more attention to good 1: $m_1 > m_2$. He perceives that the price of good 1 has increased
more than the price of good 2 has (he perceives that they have respectively increased by
$m_1 \cdot 10\%$ vs $m_2 \cdot 10\%$). So, he perceives that the relative price of good 1 has increased. Hence, he consumes less of good 1, and more of good 2. His demand curve has shifted,
even though prices and budget have all increased by the same 10%. The consumer exhibits
nominal illusion.
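A minimal numerical check of this nominal illusion follows. It is a sketch under stated assumptions: utility is $\ln c_1 + \ln c_2$, perceived prices follow the log-linear specification (14), the default price is $(1, 1)$ and attention is $(1, 1/2)$. The function name and the starting prices and budget are hypothetical.

```python
import numpy as np

def sparse_demand_log(p, w, p_default, m):
    """Sparse demand for u = ln c1 + ln c2 with log-linear perceived prices."""
    p_s = p**m * p_default**(1.0 - m)   # perceived prices
    basket = 0.5 * w / p_s              # rational Cobb-Douglas plan at perceived prices
    return basket * w / (p @ basket)    # rescale so that p . c = w at true prices

p_default = np.array([1.0, 1.0])
m = np.array([1.0, 0.5])                # more attention to good 1
p = np.array([1.0, 1.0])
w = 2.0
scale = 1.1                             # all prices and the budget rise by 10%

c_before = sparse_demand_log(p, w, p_default, m)
c_after = sparse_demand_log(scale * p, scale * w, p_default, m)
c_rational = sparse_demand_log(scale * p, scale * w, p_default, np.ones(2))

print("sparse, before scaling:", c_before)
print("sparse, after scaling :", c_after)
print("rational, after       :", c_rational)
```

Scaling prices and budget by the same factor leaves the fully attentive bundle unchanged, while the sparse agent shifts consumption away from good 1 and toward good 2, still spending exactly his budget.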

More advanced notions The next Proposition formalizes the notion that, if $m$ does
not have all equal components, then the offer curve has “one extra dimension” compared to
the traditional model.

Proposition 6 (Extra-dimensional offer curve). Take a price such that ·³

´−1(− ) 6=

0. Then around $D(p)$, the offer curve of the sparse agent has one extra dimension compared
to the traditional model, i.e. it has dimension $n$.

The restriction implies that $c \ne \omega$: we do not start at the endowment (this can be seen
in the “pinch” at $\omega$ in Figure 2, right panel).33 It also implies that in the log linear model,
$m$ does not have all identical components, i.e., the consumer pays more attention to some
goods than others.34

Next, we show that when inattention is unlimited, the offer curve is very wide indeed.
This “wide” OC effect relies on potentially extreme prices and misperceptions. In slight
variants, the OC does not cover the whole space, e.g. if there is “limited misperception.”35

Example 5 (Wide offer curves with unbounded inattention). Suppose that there are two goods,
with different inattention ($m_1 \ne m_2$) in the loglinear specification (14). Then, any consumption
that neither dominates the endowment nor is dominated by it is in the consumer’s offer
curve.

33 A point $c$ in the offer curve satisfies $p \cdot c = p \cdot \omega$ for some $p$, so it must be north-west or south-east of $\omega$. Hence, the “pinch.”

34Indeed, if = (1 1)0, then,

=

, so

·³

´−1= −1, and the condition of Proposition

6 is equivalent to · (− ) 6= 0, which is never satisfied.35Such a limited misperception can come from “attention allocated ex post”: () = · exp (ln

),

where 0 and is as in (6). Then,¯ln

()

¯≤ . With that model, we have a less extreme OC, but it

retains its extra-dimensional shape. It is very similar to Figure 2,


3.3 Slutsky Matrix and Inferring Inattention from Data

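The Slutsky matrix is an important object, because it encodes both elasticities of substitution
and welfare losses from distorted prices (the distortion coming from taxes, or, as we shall
see, from inattention). Its term $S_{ij}$ is defined to be the compensated change in consumption
of $c_i$ as price $p_j$ changes:

$S_{ij}(p, w) := \dfrac{\partial c_i(p, w)}{\partial p_j} + \dfrac{\partial c_i(p, w)}{\partial w}\, c_j(p, w) =: \left. \dfrac{\partial c_i(p, w)}{\partial p_j} \right|_{\text{compensated}}$ (18)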

With the traditional agent, the most surprising fact about it is that it is symmetrical:
$S^r_{ij} = S^r_{ji}$, i.e., the (compensated) impact of the price of good $j$ on the consumption of good
$i$ is exactly the same as the impact of the price of good $i$ on the consumption of good $j$.
Varian (1992, p.123) comments: “This is a rather nonintuitive result.” Mas-Colell, Whinston
and Green (1995, p.70) concur: “Symmetry is not easy to interpret in plain economic terms.
As emphasized by Samuelson (1947), it is a property just beyond what one would derive
without the help of mathematics.”36 Now, if a prediction is non-intuitive to Mas-Colell and
coauthors, it might require too much sophistication from the average consumer.

We will soon present a less rational, and psychologically more intuitive, prediction. We

use the price misperception discussed in (13). The consumer is partially inattentive, and

“sees” only a part of the price change. We first derive the marginal Marshallian demand.

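Proposition 7 The marginal Marshallian demand $\partial c^s_i(p, w) / \partial p_j$ is, at the default price $p^d$:

$\dfrac{\partial c^s_i}{\partial p_j} = \dfrac{\partial c^r_i}{\partial p_j} \times m_j - \dfrac{\partial c^r_i}{\partial w}\, c^r_j \times (1 - m_j)$ (19)

It comprises attenuated attention to the price, $\dfrac{\partial c^r_i}{\partial p_j} \times m_j$, and compensation to satisfy the
budget constraint, $-\dfrac{\partial c^r_i}{\partial w}\, c^r_j \times (1 - m_j)$. We are now ready to derive the Slutsky matrix.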

Proposition 8 (Slutsky matrix). Evaluated at the default price, the Slutsky matrix $S^s$ is,
compared to the traditional matrix $S^r$:

$S^s_{ij} = S^r_{ij}\, m_j$ (20)

i.e. the sparse demand sensitivity to price $j$ is the rational one, times $m_j$, the salience of
price $j$. As a result the sparse Slutsky matrix is not symmetric in general. Sensitivities
corresponding to “non-salient” price changes (low $m_j$) are anomalously low.
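As a check on how the definition (18) and the marginal demand (19) combine into (20), here is a short derivation sketch at the default price. It assumes (19) in the form $\partial c^s_i / \partial p_j = m_j\, \partial c^r_i / \partial p_j - (1 - m_j)\, (\partial c^r_i / \partial w)\, c^r_j$, and uses the fact that at $p = p^d$ perceived and true prices coincide, so that $c^s = c^r$ and $\partial c^s_i / \partial w = \partial c^r_i / \partial w$ there.

```latex
\begin{align*}
S^s_{ij}
  &= \frac{\partial c^s_i}{\partial p_j}
     + \frac{\partial c^s_i}{\partial w}\, c^s_j \\
  &= m_j \frac{\partial c^r_i}{\partial p_j}
     - (1 - m_j) \frac{\partial c^r_i}{\partial w}\, c^r_j
     + \frac{\partial c^r_i}{\partial w}\, c^r_j \\
  &= m_j \left( \frac{\partial c^r_i}{\partial p_j}
     + \frac{\partial c^r_i}{\partial w}\, c^r_j \right)
   = m_j\, S^r_{ij}.
\end{align*}
```

The income-compensation terms combine so that only the fraction $m_j$ of the rational response survives, which is why column $j$ of the sparse matrix is the rational column scaled by $m_j$.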

36 In the traditional model, this comes from the fact that $S^r_{ij}(p, u) = \dfrac{\partial^2 e(p, u)}{\partial p_i \partial p_j}$, which is symmetric, as the
order in which one takes derivatives does not matter. One may find this mathematical property (“Young’s
theorem”) itself surprising at first.


Instead of looking at the full price change, the consumer just reacts to a fraction $m_j$ of it. Hence,
she’s typically less responsive than the rational agent. For instance, say that $m_i > m_j$, so
that the price of $i$ is salient, but the price of good $j$ isn’t very salient. The model predicts
that $|S^s_{ij}|$ is lower than $|S^s_{ji}|$. That’s because good $j$’s price isn’t very salient, so quantities don’t
react much to it.

The asymmetry of the Slutsky matrix indicates that, in general, a sparse consumer cannot

be represented by a rational consumer who simply has different tastes or some adjustment

costs. Such a consumer would feature a symmetric Slutsky matrix.37

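Proposition 9 (Estimation of limited attention) Choice data allow one to recover the allocation
of attention $m$, up to a multiplicative factor $\bar m$. Indeed, suppose that an empirical
Slutsky matrix $S^s_{ij}$ is available. Then, $m_j$ can be recovered as $m_j = \bar m \prod_{i=1}^n \left( S^s_{ij} / S^s_{ji} \right)^{\gamma_i}$, for any
$(\gamma_i)_{i=1 \ldots n}$ s.t. $\sum_i \gamma_i = 1$.

Proof: We have $S^s_{ij} / S^s_{ji} = m_j / m_i$, so $\prod_{i=1}^n \left( S^s_{ij} / S^s_{ji} \right)^{\gamma_i} = \prod_{i=1}^n \left( m_j / m_i \right)^{\gamma_i} = m_j / \bar m$, for $\bar m := \prod_{i=1}^n m_i^{\gamma_i}$. □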
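The recovery formula of Proposition 9 can be sketched as follows. This is an illustration under stated assumptions: the “rational” matrix `S_r` and the attention vector `m_true` are made-up numbers, equal weights $\gamma_i = 1/n$ are used by default, and all entries of the Slutsky matrix are assumed non-zero so that the ratios are well defined.

```python
import numpy as np

def recover_attention(S_s, gamma=None):
    """Return m_j / m_bar from a sparse Slutsky matrix, m_bar = prod_i m_i**gamma_i."""
    n = S_s.shape[0]
    if gamma is None:
        gamma = np.full(n, 1.0 / n)          # equal weights, summing to 1
    ratios = S_s / S_s.T                     # entry (i, j) equals m_j / m_i
    return np.prod(ratios ** gamma[:, None], axis=0)

# Hypothetical symmetric "rational" Slutsky matrix (rows sum to zero at unit prices)
S_r = np.array([[-2.0, 1.0, 1.0],
                [1.0, -3.0, 2.0],
                [1.0, 2.0, -3.0]])
m_true = np.array([1.0, 0.5, 0.25])          # hypothetical attention weights
S_s = S_r * m_true                           # column j scaled by m_j, as in (20)

m_rel = recover_attention(S_s)               # attention up to the factor m_bar
m_bar = np.prod(m_true ** (1.0 / 3.0))
S_r_back = S_s / m_rel                       # proportional to S_r, hence symmetric

print("recovered m / m_bar:", m_rel)
print("symmetric after rescaling:", np.allclose(S_r_back, S_r_back.T))
```

The recovered vector is proportional to the true attention vector, and dividing each column of the observed matrix by it yields a symmetric matrix, which is the testable prediction discussed in the text.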

Equation (20) makes tight testable predictions, as the sparsity model is a parsimonious
extension of the traditional one.38 It allows for the recovery of the attention terms $m_j$, up
to a multiplicative factor. The underlying “rational” matrix can be recovered as $S^r_{ij} :=
S^s_{ij} / m_j$. A testable prediction is that $S^r$ should be symmetric.39 There is a literature on
consumption, estimating Slutsky matrices (e.g. Deaton and Muellbauer 1980), but it does not
seem to explore the potential role of inattention. It would be good to revisit this literature,
emphasizing non-salient prices, using the comparative statics and specific functional form
predicted by the model (Propositions 5 and 8).

Qualitatively, the extant evidence is encouraging. In addition to the tax salience already
mentioned, there is a literature pointing to effects qualitatively consistent with Proposition
8: the literature on obfuscation and shrouded attributes (Gabaix and Laibson 2006, Ellison
and Ellison 2009). Those papers (Abaluck and Gruber 2011, Anagol and Kim 2012, Brown,
Hossain and Morgan 2010, Chetty, Looney and Kroft 2009) find direct field evidence that
some prices are neglected (at least partially) by consumers, leading to an underreaction to

37 Intuitively, the reason for this impossibility is the following: suppose that price 2 is not salient, so $S_{22}$ is
small. That might induce the modeler to represent the consumer as a rational agent with inelastic demand
for good 2. However, that move would make $S_{21}$ be small too. Hence, it is difficult (and indeed generally
impossible, by Proposition 8) to represent the inattentive consumer by a rational consumer who is simply
more price inelastic.
38 It complements the earlier parsimonious deviation from symmetry identified by Browning and Chiappori
(1998), who have in mind a very different phenomenon: intra-household bargaining. Quantitatively, the
vector $m$ adds $n - 1$ extra degrees of freedom (d.f.), which may be compared to the $2n + O(1)$ d.f. added in
the Browning-Chiappori model (I absorb low-dimension restrictions in the $O(1)$ term). The matrix $S^s$ has
$n^2/2 + 3n/2 + O(1)$ d.f., a good restriction compared to an unrestricted matrix with $n^2$ d.f.
39 The Slutsky matrix, by itself, does not allow one to recover the extra multiplicative factor: for any $\lambda > 0$,
$S^s$ admits a dilated factorization $S^s_{ij} = (\lambda^{-1} S^r_{ij})(\lambda m_j)$, so the Slutsky matrix can give $m$ only up to a
multiplicative factor $\lambda$. To recover this factor, one needs to see how the demand changes as $p^d$ varies.


prices of add-ons. However, I am not aware of direct tests of the quantitative structure

predicted by Proposition 8, which makes sense, given this proposition and its functional

form are new.

3.4 More Advanced Issues: Hicksian Demand, Expenditure Func-

tion, Welfare, WARP

This subsection may be skipped at the first reading. It contains more advanced considerations

on consumer theory. We define $M := \mathrm{diag}(m_1, \dots, m_n)$ the attention matrix, so that (20)
becomes $S^s = S^r M$.

The Weak Axiom of Revealed Preferences is violated We shall see that the
venerable weak axiom of revealed preferences (WARP) is violated, in a way that makes
sense psychologically. First, recall that a small price change $\delta p$ leads to a compensated
change in consumption $\delta c^r = S^r \delta p$. In the traditional model, it satisfies $\delta p \cdot \delta c^r \le 0$, i.e.
$\delta p' S^r \delta p \le 0$: the Slutsky matrix is negative semi-definite. Here is the version with a sparse agent.

Proposition 10 (Violation of WARP). Suppose that ≠ 0, and reason at the default price. The agent’s decisions violate the weak axiom of revealed preferences: there is a price
change $\delta p$, such that the corresponding change in consumption $\delta c^s = S^s \delta p$ satisfies $\delta p \cdot \delta c^s > 0$. In other terms, the Slutsky matrix fails to be negative semi-definite. However, for
all price changes $\delta p$, we have: $\delta p^s \cdot \delta c^s \le 0$, where $\delta p^s = M \delta p$ is the perceived price change.

WARP fails, but something like it holds: at $p^d$, we have $\delta p^s \cdot \delta c^s \le 0$, i.e. $\sum_i m_i \, \delta p_i \, \delta c^s_i \le 0$.
Hence we do preserve a version of the “law of demand” (if prices go up, (compensated)
demand goes down, in a salience-weighted sense).

Here is the intuition, detailed further in the appendix. Suppose that the agent pays attention
to the car price, but not to the gas price. Suppose that the car price goes down, but the gas price
goes up by a lot. A rational agent will see that the total price of transportation (car plus gas)
has gone up, so he consumes less of it: $\delta p \cdot \delta c^r < 0$, with $\delta p = (\delta p_{car}, \delta p_{gas})$. However, a
sparse agent just sees that the car price went down, so he consumes more transportation:
$\delta p \cdot \delta c^s > 0$. This is a violation of WARP.40

40 Proposition 10 shows that this logic is quite general. The condition of the proposition is quite weak: with two goods it essentially means that $m_1 \ne m_2$. Here is a simple example with two goods: $p^d = (1, 1)$ and
$$S^r = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}$$
(which can come from $u = \ln c_1 + \ln c_2$), and good 2 non-salient, $m = (1, 0)$. Consider $\delta p = (1, 2)$ (which the reader may wish to multiply by some small $\varepsilon > 0$ so we deal with small price changes). As the price of good 2 increases more, the rational agent consumes less of good 2, and more of good 1: $\delta c^r = S^r \delta p = (1, -1)$. However, the sparse consumer perceives $\delta p^s = M \delta p = (1, 0)$, so he perceives only that good 1’s price increases. So he consumes less of good 1, and more of good 2: $\delta c^s = S^s \delta p = S^r \delta p^s = (-1, 1) = -\delta c^r$. Hence, we have $\delta p \cdot \delta c^r = -1 < 0 < \delta p \cdot \delta c^s = 1$, a violation of WARP.
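The arithmetic of this two-good example can be checked mechanically. The sketch below is only an illustration: the helper names (`matvec`, `dot`) are hypothetical, and the numbers are the ones stated in the footnote (rational Slutsky matrix, attention vector, price change).

```python
# Numerical check of the two-good example in footnote 40.
S_r = [[-1.0, 1.0], [1.0, -1.0]]  # rational Slutsky matrix at the default price
m = [1.0, 0.0]                    # attention weights: good 2 is non-salient
dp = [1.0, 2.0]                   # true price change

def matvec(matrix, vector):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

dp_perceived = [m_i * dp_i for m_i, dp_i in zip(m, dp)]  # M dp
dc_rational = matvec(S_r, dp)                            # S^r dp
dc_sparse = matvec(S_r, dp_perceived)                    # S^s dp = S^r M dp

warp_rational = dot(dp, dc_rational)              # negative: law of demand holds
warp_sparse = dot(dp, dc_sparse)                  # positive: WARP is violated
salience_weighted = dot(dp_perceived, dc_sparse)  # non-positive: weighted law of demand

print(dc_rational, dc_sparse, warp_rational, warp_sparse, salience_weighted)
```

The last quantity illustrates the second half of Proposition 10: the salience-weighted inner product stays non-positive even though the unweighted one changes sign.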


Hicksian demand, welfare and related notions We start with an explicit expres-

sion for Hicksian demand and expenditure function.

Proposition 11 The sparse Hicksian demand is: $h^s(p, u) = h^r(p^s, u)$. The sparse expenditure
function is $e^s(p, u) = p \cdot h^s(p, u) = p \cdot h^r(p^s, u)$.

Indeed, this is the consumption chosen by a consumer attempting to minimize expenditure
under the perception of prices $p^s$. We next examine the link between expenditure
function and Slutsky matrix, which is tight in the traditional case.

Proposition 12 (Link between Slutsky matrix and expenditure function). The expenditure
function satisfies, at the default price: $e^s = e^r$, $e^s_p = e^r_p$, and: $e^s_{pp} - e^r_{pp} = -(I - M)' S^r (I - M)$,
i.e. $e^s_{p_i p_j} = S^r_{ij} \times (m_i + m_j - m_i m_j)$. Note that $e^s_{p_i p_j} = S^s_{ij} + S^s_{ji} - m_i S^s_{ij}$, rather than the
traditional $e^r_{p_i p_j} = S^r_{ij}$.

The expenditure function is “twisted” by the attention terms, $m_i$. We have a
relationship that is somewhat more complex than in the traditional model.
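A short derivation may help trace where the twist comes from. It is a sketch under two assumptions: the linear perception $p^s = p^d + M (p - p^d)$ and the expression $e^s(p, u) = p \cdot h^r(p^s, u)$ from Proposition 11. The notation $\partial_i h^r_k$ for $\partial h^r_k / \partial p_i$ is introduced here for brevity, and the steps are not taken from the online appendix.

```latex
\begin{align*}
e^s_{p_i} &= h^r_i(p^s) + m_i \sum_k p_k \, \partial_i h^r_k(p^s), \\
e^s_{p_i p_j} &= m_j \, \partial_j h^r_i(p^s) + m_i \, \partial_i h^r_j(p^s)
               + m_i m_j \sum_k p_k \, \partial_{ij} h^r_k(p^s). \\
\intertext{The rational Hicksian demand satisfies $\sum_k p_k \, \partial_i h^r_k(p) = 0$ for all $p$;
differentiating with respect to $p_j$ gives
$\sum_k p_k \, \partial_{ij} h^r_k(p) = -\partial_i h^r_j(p) = -S^r_{ij}$. Hence, at $p = p^s = p^d$,}
e^s_{p_i p_j} &= S^r_{ij} \, (m_i + m_j - m_i m_j), \\
e^s_{p_i p_j} - e^r_{p_i p_j} &= -(1 - m_i) \, S^r_{ij} \, (1 - m_j).
\end{align*}
```

With full attention ($m_i = m_j = 1$) the last line vanishes and the traditional relation is recovered.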

Let us now discuss the welfare losses from price misperception.

Proposition 13 (Indirect utility function and welfare losses). At the default price $p^d$,
$v$, $v_p$ and $v_w$ are the same under the sparse and the traditional model, but $v_{pp}$ differs:

$$v^s_{pp} - v^r_{pp} = -v_w \left( e^s_{pp} - e^r_{pp} \right) = v_w (I - M)' S^r (I - M) \qquad (21)$$

The intuition is simple: the utility loss ($v^r - v^s$) is equal to the extra expenditure $e^s - e^r$ due to suboptimal behavior, times the utility value of money, $v_w$. That suboptimal behavior
is itself captured by a lack of substitution when reacting to prices, $(I - M)' S^r (I - M)$. As
usual, we can expect the welfare losses to be quite small, as they are second order (e.g.
Akerlof-Yellen 1985, Krusell-Smith 1996).

We now present the generalized version of Shephard’s lemma and Roy’s identity.

Proposition 14 (Shephard’s lemma, Roy’s identity). Evaluated at the default price, we have
Shephard’s lemma: $e^s_p = h^s$ at $p = p^d$, and Roy’s identity: $c^s(p, w) + \frac{v^s_p(p, w)}{v^s_w(p, w)} = 0$. However,
away from $p^d$, these expressions need to be modified. Indeed we have the modified Shephard’s
lemma

$$e^s_p(p, u) = h^s(p, u) + (p - p^s) \cdot h^s_p(p, u) \qquad (22)$$

and the modified Roy’s identity:

() + ()

( )= [( · 0 ( 0)) − ] · ( 0) (23)

Note that in (23) at = = , we recover the traditional Roy’s identity because

· 0 (0) = 1.


Price perceptions in logs In the loglinear model, we still have ()

= at

= . Then, it is easy to verify that all properties in section 3.1 apply also with this log-

linear formulation ( except for the last two displayed equations of Proposition 14, where

should be replaced by ).41 The loglinear model leads to slightly easier calculations

in the competitive equilibrium.

4 Competitive Equilibrium

There are $A$ agents. Agent $a$ has endowment $\omega^a \in R^n$, and the total endowment is $\omega = \sum_{a=1}^{A} \omega^a$.
We call $Z(p) := \sum_{a=1}^{A} c^a(p) - \omega$ the economy’s aggregate excess demand function
when prices are $p$, with $c^a(p) = c^a(p, p \cdot \omega^a)$. Call $\mathcal{P}^*$ the set of equilibrium prices, i.e. of
prices that lead to zero excess demand:42 $\mathcal{P}^* := \{ p \in R^n_{++} : Z(p) = 0 \}$. The resulting set of
equilibrium allocations for consumer $a$ is: $\mathcal{C}^a := \{ c^a(p) : p \in \mathcal{P}^* \}$. First, we study the structure of prices. Then, we study equilibria with many agents.

4.1 “Excess Volatility” of Prices in a Sparse Economy

In this subsection, for simplicity, we take a representative agent economy. We say that good
$i$ is relatively less salient (or more obscure) than good $j$ if its price is less salient, $m_i < m_j$.

A core effect is the following.

When the price level is high, the relative price of the non-salient good is high. In the
traditional model, the set of equilibrium prices is: $\mathcal{P}^* = \{ (\lambda \bar p_i)_{i=1,\dots,n} : \lambda \in R_{++} \}$ for $\bar p = u'(\omega)$. Now consider a sparse economy with $p^d = \bar p$, and in the case of the “perception in
logs”, eq. (14). Then, the set of equilibrium prices is:

$$\mathcal{P}^* = \left\{ \left( \bar p_i \, \lambda^{1/m_i} \right)_{i=1,\dots,n} : \lambda \in R_{++} \right\} \qquad (24)$$

This means that, “when the price level is high ($\lambda$ is high), the relative price of the obscure
good is high.” When $\lambda$ is high, the consumer perceives the high price of the obscure good
less, hence demands more of it. That increases the price of the obscure good.
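The structure of the price set in (24) can be checked numerically in a small case. The following is a minimal sketch under stated assumptions, not the paper's own computation: a representative agent with log utility $u(c) = \sum_i \alpha_i \ln c_i$, perception in logs with default price equal to $u'(\omega)$, and sparse demand computed as the rational demand at perceived prices rescaled to satisfy the true budget (the linear-demand case of Proposition 4). All parameter values and function names are hypothetical.

```python
# Hypothetical three-good endowment economy with u(c) = sum_i alpha_i * ln(c_i).
alpha = [0.5, 0.3, 0.2]   # utility weights (illustrative)
omega = [1.0, 2.0, 0.5]   # endowment of the representative agent (illustrative)
m = [1.0, 0.5, 0.25]      # attention to each price (illustrative)
p_bar = [a / w for a, w in zip(alpha, omega)]  # u'(omega), also used as default price

def perceived_price(p):
    """Perception in logs: ln p^s_i = ln p^d_i + m_i (ln p_i - ln p^d_i)."""
    return [pd * (pi / pd) ** mi for pi, pd, mi in zip(p, p_bar, m)]

def sparse_demand(p, wealth):
    """Cobb-Douglas demand at perceived prices, rescaled to exhaust the true budget."""
    ps = perceived_price(p)
    unit = [a / psi for a, psi in zip(alpha, ps)]   # rational demand at wealth 1
    cost = sum(pi * ci for pi, ci in zip(p, unit))  # true cost of that bundle
    return [wealth * ci / cost for ci in unit]

def excess_demand(p):
    """Demand minus endowment when wealth is the value of the endowment."""
    wealth = sum(pi * wi for pi, wi in zip(p, omega))
    return [ci - wi for ci, wi in zip(sparse_demand(p, wealth), omega)]

def candidate_price(level):
    """Price vector of the form in (24): p_i = p_bar_i * level**(1/m_i)."""
    return [pb * level ** (1.0 / mi) for pb, mi in zip(p_bar, m)]

for level in (0.5, 1.0, 2.0, 5.0):
    p = candidate_price(level)
    print(level, max(abs(z) for z in excess_demand(p)), p[2] / p[0])
```

In this sketch the excess demand is zero (up to rounding) at every price level, and the relative price of the least salient good (good 3) rises with the level.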

Non-salient prices need to be more volatile to absorb supply shocks. Let us illustrate
this effect in a simple case. Suppose again an endowment economy, and there is a supply
shock: the endowment moves from $\omega(0)$ at $t = 0$ to $\omega(1)$ at $t = 1$. To deal with the

41The same would hold with other specifications of perceived prices (), with the notation “” in the

results as a stand-in for ()

.

42 An equilibrium exists under a simple “non satiation” condition, see Debreu (1970). We simply need to
assume Debreu’s Assumption A, which states that $\| Z(p) \|$ becomes infinite when one price goes to 0, while the sum of prices remains constant. One can directly take Debreu’s existence Proposition, as he does not
assume that $Z(p)$ is homogeneous of degree 0. See also Shafer and Sonnenschein (1975) for equilibrium
existence in non-standard economies.


arbitrariness in the price level, assume that the price of good 1 is pinned down at 1 (price
level determination is discussed below). I assume that the price is initially at the default level,
$p(0) = p^d$. Call $p^{[r]}(t)$ and $p^{[s]}(t)$ the price vector in the rational economy and the sparse
economy, respectively. Call $\delta p = p(1) - p(0)$ the price change caused by the supply shock,
and consider the case of a small price change (due to a small change in the endowment),
assuming $m_i > 0$. We have the following:

Proposition 15 (Non-salient prices need to be more volatile to absorb supply shocks) Call
$\delta p_i^{[r]}$ and $\delta p_i^{[s]}$ the change in the equilibrium price of good $i$ in the rational and sparse economy,
respectively. Then: $\delta p_i^{[s]} = \frac{1}{m_i} \delta p_i^{[r]}$. That is, after a supply shock, the movements of price in
the sparse economy are like the movements in the rational economy, but amplified by a factor
$\frac{1}{m_i}$. Hence, ceteris paribus, the prices of non-salient goods are more volatile.

Hence, non-salient prices need to be more volatile to clear the market.43 This may be

why the price of many goods, e.g. commodities, are very volatile. Consumers are quite price

inelastic, because they are inattentive. As demand underreacts to shocks, in equilibrium,

prices need to overreact.44
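The amplification result can be illustrated with a small supply shock. This sketch rests on assumptions that are not in the text: log utility with equal weights, linear price perception $p^s_i = p^d_i + m_i (p_i - p^d_i)$, the price of good 1 pinned at 1, and sparse demand computed as rational demand at perceived prices rescaled to the true budget. All numbers and names are hypothetical.

```python
alpha = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]  # utility weights (illustrative)
m = [1.0, 0.5, 0.2]                        # attention to each price (illustrative)
omega_0 = [1.0, 1.0, 1.0]                  # endowment before the shock
omega_1 = [1.0, 0.95, 0.95]                # endowment after the shock

def rational_price(omega):
    """Equilibrium price in the rational economy, normalized so that p_1 = 1."""
    marginal_utility = [a / w for a, w in zip(alpha, omega)]
    return [mu / marginal_utility[0] for mu in marginal_utility]

p_d = rational_price(omega_0)  # default price = initial equilibrium price

def sparse_demand(p, wealth):
    """Rational demand at linearly perceived prices, rescaled to the true budget."""
    ps = [pd + mi * (pi - pd) for pi, pd, mi in zip(p, p_d, m)]
    unit = [a / psi for a, psi in zip(alpha, ps)]
    cost = sum(pi * ci for pi, ci in zip(p, unit))
    return [wealth * ci / cost for ci in unit]

dp_rational = [p1 - p0 for p1, p0 in zip(rational_price(omega_1), p_d)]
dp_sparse = [d / mi for d, mi in zip(dp_rational, m)]  # candidate from Proposition 15
p_sparse = [pd + d for pd, d in zip(p_d, dp_sparse)]

# Verify that the candidate price clears the market in the sparse economy.
wealth = sum(pi * wi for pi, wi in zip(p_sparse, omega_1))
excess = [ci - wi for ci, wi in zip(sparse_demand(p_sparse, wealth), omega_1)]
print(dp_rational, dp_sparse, excess)
```

In this example the price of the least salient good (attention 0.2) moves five times as much as in the rational economy, and the market still clears.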

We now move away from the representative-agent case to study equilibrium allocations.

4.2 A Phillips Curve in the Edgeworth Box

Here, as in the traditional model, Walras’ law applies: market clearing in the market for
goods 1 to $n - 1$ implies market clearing for good $n$. So the set of equilibria is really
$\mathcal{P}^* = \{ p : Z_{-n}(p) = 0 \}$, where $Z_{-n} = (Z_i)_{1 \le i < n}$. As $Z_{-n}$ is a function $R^n_{++} \to R^{n-1}$, $\mathcal{P}^*$ is
still a 1-dimensional manifold (as in the traditional model).

In the traditional model with one equilibrium allocation, $\mathcal{P}^* = \{ \lambda p^* : \lambda \in R_{++} \}$, and $\mathcal{C}^a$ is just a point, $c^a(p^*)$ (more generally, the set of equilibria consists of a finite union of such sets, under
weak conditions given in Debreu 1970).

In the sparse setup, $\mathcal{P}^*$ is still 1-dimensional, but we have a one-dimensional set of real equilibria $\mathcal{C}^a$. This is in stark contrast to the traditional model, which typically (“generically”) has only isolated equilibria.45

To fix ideas, it is useful to consider a few completely solved examples. Let us start with
the case of one rational consumer and one sparse consumer.46

43 Gul, Pesendorfer and Strzalecki (2012) have a very different model where, to clear the market, prices
need to move more than in the traditional model.

44 In general equilibrium, some agents need to pay attention to good $i$, otherwise its volatility would be
infinite. The resulting equilibrium in the allocation of attention is developed in section 8.3 of the online
appendix.

45 In the traditional model, equilibria are the intersection of offer curves. However, this is typically not the
case here. The reason, available upon request, is left as an exercise for the reader’s students. (Hint: this is
because $c(p, w)$ isn’t homogeneous of degree 0.)

46 The emergence, stated here, of a continuum of real equilibria due to bounded rationality seems new.


[Figure: two Edgeworth-box panels titled “Traditional Model” and “Sparse Model”; horizontal axis c1 and vertical axis c2, each running from 0 to 1.]

Figure 3: These Edgeworth boxes show competitive equilibria when both agents have Cobb-Douglas
preferences. The left panel illustrates the traditional model with rational agents:
there is just one equilibrium, $c^a = (1/2, 1/2)$. The right panel illustrates the situation when
type $a$ is rational, and type $b$ is boundedly rational: there is a one-dimensional continuum
of competitive equilibria. Agent $a$’s share of the total endowment ($\omega^a$) is the same in both
cases.

Proposition 16 Suppose agent $a$ is rational, and the other agent is sparse with $m_1 \ne m_2$,
and two goods. Consider the set $\mathcal{C}^a$ of equilibrium allocations that agent $a$ obtains in competitive
equilibria. It is equal to the rational agent’s offer curve. In particular, it is one-dimensional.

This example is a bit extreme, but illustrates the potential vastness of the set of equilibria.

The economic intuition is as follows. Take an allocation $c^a$ in $a$’s offer curve, and a price $\bar p$
such that $c^a(\bar p) = c^a$. We want to see if consumer $b$ can be induced to consume the
corresponding $c^b = \omega - c^a$. Consider prices $\lambda \bar p$, and take $m_1 > m_2$. Then when $\lambda$ increases,
the obscure good appears relatively cheaper, so the sparse consumer consumes more of it
(arbitrarily more, actually). That allows us to reach all the $c^b$ in the consumer’s budget curve
corresponding to price $\bar p$: $\{ c^b \text{ s.t. } \bar p \cdot c^b = \bar p \cdot \omega^b \}$.

So, take a consumer $a$ who is Cobb-Douglas ($u(c_1, c_2) = \ln c_1 + \ln c_2$). The equilibrium
set is his offer curve, as shown in Figure 3, right panel. In contrast, in the traditional
model, when consumer $b$ has the same preferences, but is rational, there is just one
equilibrium, $c^a = (1/2, 1/2)$.
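The dependence of the real allocation on the price level can be seen in a small computation. The sketch below is an illustration under assumptions that the text does not spell out: both agents have $u = \ln c_1 + \ln c_2$, each owns half of a total endowment $(1, 1)$, the default price is $(1, 1)$, the sparse agent perceives prices in logs with attention $(1, 0.5)$, and the price of good 1 indexes the price level. Function names and parameter values are hypothetical.

```python
import math

P_D = [1.0, 1.0]        # default price (illustrative)
M_B = [1.0, 0.5]        # agent b's attention; good 2 is the obscure good
OMEGA_A = [0.5, 0.5]    # endowment of the rational agent a
OMEGA_B = [0.5, 0.5]    # endowment of the sparse agent b

def rational_demand(p, wealth):
    """Demand for u = ln c1 + ln c2."""
    return [wealth / (2.0 * p[0]), wealth / (2.0 * p[1])]

def sparse_demand(p, wealth):
    """Rational demand at perceived (log-perception) prices, rescaled to the true budget."""
    ps = [pd * (pi / pd) ** mi for pi, pd, mi in zip(p, P_D, M_B)]
    unit = rational_demand(ps, 1.0)
    cost = p[0] * unit[0] + p[1] * unit[1]
    return [wealth * c / cost for c in unit]

def allocations(p):
    """Consumption of agent a (rational) and agent b (sparse) at prices p."""
    wealth_a = p[0] * OMEGA_A[0] + p[1] * OMEGA_A[1]
    wealth_b = p[0] * OMEGA_B[0] + p[1] * OMEGA_B[1]
    return rational_demand(p, wealth_a), sparse_demand(p, wealth_b)

def excess_good1(p):
    c_a, c_b = allocations(p)
    return c_a[0] + c_b[0] - (OMEGA_A[0] + OMEGA_B[0])

def equilibrium(p1, lo=1e-6, hi=1e6):
    """Bisection (on a log scale) for the p2 that clears the market for good 1, given p1."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if excess_good1([p1, mid]) < 0.0:
            lo = mid
        else:
            hi = mid
    return [p1, math.sqrt(lo * hi)]

for p1 in (0.5, 1.0, 2.0, 4.0):
    p = equilibrium(p1)
    c_a, c_b = allocations(p)
    print(p1, round(p[1], 4), [round(x, 4) for x in c_a])
```

At the default price level the allocation is the traditional one; at other price levels agent $a$'s bundle moves away from it, while both markets continue to clear (the second by Walras' law).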

This is a Phillips curve intuition. Suppose we start at a middle point of the curve in
Figure 3, right panel. Suppose for concreteness that consumer $b$ is a worker, and while good
2 is food, say, good 1 is “leisure,” so that when he consumes less good 1, he works more.
Let us say that $m_1 > m_2$; he pays keen attention to his nominal wage ($p_1$) and less to the price
of food, $p_2$. Suppose now that the central bank raises the price level. Then, consumer $b$

The most closely related may be those of Geanakoplos and Mas-Colell (1989), who analyze a two-period

model with fewer assets than states, and show a (high-dimensional) continuum of equilibria (see also Magill

and Quinzii (1992)). Those papers are quite far in their substance from the previous analysis (they study

incomplete markets with full rationality, here I study complete markets with limited rationality). Still, they

all stem from the fact that in the traditional model, prices have a one-dimensional indeterminacy.


sees that his nominal wage went up, and sees less clearly that the price of good 2 went up.

So he perceives that his real wage ($p_1 / p_2$) went up. Hence, he supplies more labor: i.e., he

consumes less leisure (good 1), and more of good 2. Hence, the central bank, by raising the

price level, shifted the equilibrium to a different point. This is a Phillips curve in effect:

by changing the price level (in a way not fully understood by the agents), the central bank

changes the real allocation in the economy.

Is this Phillips curve something real and important? This question, actually, is much

debated in macroeconomics. Standard macro deals with one equilibrium, conditioning on

the price level (and expectations thereof). This is to some extent what we have here. Given a

price level, there’s (locally) only one equilibrium,47 but the price level changes the equilibrium

(when there are some frictions in the perception or posting of prices). This is akin to a

(temporary) Phillips curve (see Gali 2011 for an empirical update): when the price level

goes up, the perceived wage goes up, and people supply more labor (this analysis is most

naturally done in a production economy). Hence, we observe here the price-level dependent

equilibria long theorized in macro (e.g. Lucas 1972), but in the pristine and general universe

of basic microeconomics.

4.3 Complement on Competitive Equilibrium

Efficiency Next we discuss efficiency of competitive equilibrium.

Proposition 17 (Efficiency of competitive equilibrium). Assume that competitive equilibria
are interior. Then, an equilibrium is Pareto efficient if and only if the perception of relative
prices is identical across agents.

Hence, typically the equilibrium is not Pareto-efficient when we are not at default prices.
The argument is very simple: if consumers $a$ and $b$ have the same perceptions, then for two
goods $i$, $j$, $\frac{u^a_{c_i}}{u^a_{c_j}} = \frac{p^s_i}{p^s_j} = \frac{u^b_{c_i}}{u^b_{c_j}}$, so that the ratio of marginal utilities is equalized across
agents; there are no extra gains from trade.

Equilibrium selection What pins down the price? One approach is via a quantity

theory of money , e.g. chosen by the central bank: then the price ∈ P∗ is the (often unique) one that ensures that nominal GDP is ( · =). Other approaches would rely

on expectations in a dynamic model.

Endogenous default price Endogenizing the default price would be interesting. In
the context of a static model, one might hypothesize that a good default price would be such
that it is also an equilibrium price of the related sparse economy, i.e. such that $Z(p^d; p^d) = 0$

47Mathematically, this comes from Debreu (1970), who shows (generically, and under technical conditions)

that, for a given price level, the equilibrium is locally unique.


(rewrite excess demand as $Z(p; p^d)$). Then $p^d$ must be an equilibrium price of the underlying
rational model. Alternatively, $p^d$ might be better thought of as some expectation of the price,
given the past prices and recent shocks. The proper locus of the endogenization of $p^d$ may
be in an explicitly dynamic model, something I tackle in companion work.

Generality of the effects Many effects simply stem from the fact that the Marshallian
demand $c(p, w)$ is not homogeneous of degree 0 in $(p, w)$, which is a form of “nominal illusion”
(or really, a price illusion, as there is no “money”, simply a default price; Fehr and Tyran
2001 and Shafir, Diamond and Tversky 1997 analyze money illusion). In many ways, many
effects here do not depend on the specifics of money illusion. However, it is still useful to
have a specific model, as the economic intuition for many effects does depend on the spirit
of the sparsity model (e.g. salient prices), and some conditions need to be specified (e.g.
Proposition 6).

5 Discussion

5.1 Two Models that Generate a Noisy Sparse Max

Under some conditions, some models offer a noisy microfoundation for the sparse max, i.e.

their representative agent version is an agent doing a sparse max. We emphasize the basic,

quadratic case.

Heterogeneous fixed costs Consider next an agent with an “all or nothing” attention:
each $m_i$ is equal to 0 (no attention) or $\mu_i$ (full attention). This can be represented as a
fixed cost for the penalty in (4), rather than a linear cost (the online appendix fills in the
details related to scaling). This leads the agent to use the “hard thresholding” function
$\tau^H(\mu, \kappa) = \mu \cdot 1_{|\mu| \ge |\kappa|}$, rather than the “soft thresholding” function $\tau$. One obtains a sparse
max with that function $\tau^H$: we’d replace $\tau$ by $\tau^H$ throughout.

Some economists will prefer the fixed cost model. The following Proposition indicates

that the sparse max is, in some sense, the “representative agent equivalent” of many agents

with heterogeneous fixed costs. 48

Proposition 18 (Fixed costs models as a microfoundation of sparse max in the basic
case). Suppose that agents use fixed costs $\kappa'$, and the distribution of $\kappa'$’s is $f(\kappa' \mid \kappa) = \frac{\kappa}{\kappa'^2} 1_{\kappa' \ge \kappa}$.
Then, the soft-threshold $\tau$ is the average value of the hard-threshold $\tau^H$: $\tau(\mu, \kappa) = \int_0^\infty \tau^H(\mu, \kappa') f(\kappa' \mid \kappa) \, d\kappa'$.
Also, in quadratic problems with a one-dimensional action, the average behavior of those “fixed cost” agents is described by the sparse max.
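The averaging claim can be checked by direct numerical integration. The sketch below assumes that the soft threshold is $\tau(\mu, \kappa) = (|\mu| - |\kappa|)_+ \, \mathrm{sign}(\mu)$, that the hard threshold keeps $\mu$ whenever $|\mu|$ is at least the fixed cost, and that fixed costs follow a Pareto density with exponent 1 and lower bound $\kappa$ (the Zipf case mentioned in the footnote). Function names, the midpoint rule, and the test values are illustrative choices.

```python
import math

def soft_threshold(mu, kappa):
    """tau(mu, kappa) = (|mu| - |kappa|)_+ * sign(mu)."""
    return math.copysign(max(abs(mu) - abs(kappa), 0.0), mu)

def hard_threshold(mu, cost):
    """All-or-nothing attention: keep mu if it is at least as large as the fixed cost."""
    return mu if abs(mu) >= abs(cost) else 0.0

def pareto_density(cost, kappa):
    """Density kappa / cost^2 on [kappa, infinity), zero below kappa."""
    return kappa / cost ** 2 if cost >= kappa else 0.0

def average_hard_threshold(mu, kappa, steps=20000):
    """Midpoint-rule integral of hard_threshold(mu, k) * density(k) over k.

    The integrand vanishes for k > |mu|, so integrating over [kappa, |mu|] suffices.
    """
    upper = abs(mu)
    if upper <= kappa:
        return 0.0
    h = (upper - kappa) / steps
    total = 0.0
    for i in range(steps):
        k = kappa + (i + 0.5) * h
        total += hard_threshold(mu, k) * pareto_density(k, kappa)
    return total * h

kappa = 0.5
for mu in (-3.0, -0.2, 0.4, 0.5, 2.0):
    print(mu, soft_threshold(mu, kappa), average_hard_threshold(mu, kappa))
```

The two columns agree up to the integration error, which is the sense in which the soft threshold is the population average of heterogeneous hard thresholds.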

48This distribution turns out to be a “Zipf” distribution, like many distributions in the social world

(firm, cities, words; see Gabaix 1999) but I will refrain from attaching to that a special meaning.


Noisy signals Here is a version of this idea with the noisy signals model.

Proposition 19 (Signal-extraction models as a noisy microfoundation of sparse max in the

basic case). Consider a model with quadratic loss (3), where the signals are = +, with

noises with relative precision = () (), with ( ) Gaussian with uncorrelated

components. The agent: (i) decides on signal precision (ii) receives signals = (1 ),

and then (iii) takes action (), to maximize: max≥0max() E [ ( () ) | ]−2

2

P .

Then, for a given , averaging over the signals, the optimal action () is the sparse max

ex ante ():

E [ () | ] = ex ante () =X

µ

¶ (25)

This assumption that the cost of precision is linear is quite common (e.g. Geanakoplos

and Milgrom 1991). Precision is proportional to the number of i.i.d. signals collected, and

the cost can be proportional to the number.

Hence, economists who think that deep down, agents perform Bayesian updating with

costly signals (which is simply an allegory for some messier, biological reality) have a means

to understand the sparse max as follows: The sparse max is the representative agent version

of a model with noisy signals. Of course, this holds only under definite assumptions.49

5.2 Links with Themes of the Literature

Sparsity is another line of attack on the polymorphous problem of confusion, inattention,

simplification, and bounded rationality (Kahneman 2003). This paper is best viewed as a

complement rather than a substitute for existing models. For instance, there is a vast litera-

ture on learning (Sargent 1993, Fudenberg and Levine 1998) that sometimes generates a host

of stylized facts because agents may not set up their models optimally (Fuster, Laibson, and

Mendel 2010). One could imagine joining these literatures in a model of “sparse learning.”

Some of the most active themes are the following.50

Inattention and information acquisition. This paper is related to the literature on mod-

eling inattention (Dellavigna 2009, Veldkamp 2011). One strand uses fixed costs, paid over

time (Grossman and Laroque 1990, Gabaix and Laibson 2002, Mankiw and Reis 2002, Reis

2006, Abel, Eberly and Panageas 2010, Schwartzstein 2012). Those models are instructive,

but they quickly become hard to work with as the number of variables increases. Also, those

49 Finally, there’s a potential Bayesian interpretation of sparse max (which transposes the usual Bayesian
interpretation of Tibshirani’s Lasso). The agent receives a signal $\tilde s = x + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$, and
$x$ is drawn according to $f(x) = \frac{\kappa}{2} e^{-\kappa |x|}$ with $\kappa > 0$. Then, the maximum likelihood estimator is:
$\arg\max_x -\frac{(\tilde s - x)^2}{2 \sigma^2} - \kappa |x|$, which gives $x = \tau(\tilde s, \kappa \sigma^2)$.

50 I omit many models here, in particular “process” models, e.g., Bolton and Faure-Grimaud (2009), Gabaix,
Laibson, Moloche, and Weinberg (2006). They are instructive but yield somewhat complex mappings between
situations and outcomes. Automata models (Rubinstein 1998) lead to complex models that are not (yet)
ready to handle basic problems such as the consumption choice problem.


papers all require a time dimension, so they don’t naturally apply to static problems, such

as the basic consumption problem (1).

Closer to this paper, perhaps, is Chetty, Looney and Kroft (2009). They study a con-

sumption problem with two goods, where the DM may not think about the tax. Attention

is modeled as paying a fixed cost, or a distribution thereof. This paper proposes a general

sparse max with constraints. Also, it derives the whole of basic consumer and equilibrium

theory with several goods.

An influential proposal made by Sims (2003) is to use an entropy-based penalty for the

cost of information; this literature is progressing impressively (e.g., Mackowiak and Wieder-

holt 2011, 2012, Woodford 2012). It has the advantage of a nice mathematical foundation.

We saw earlier substantive differences between the entropy-based model and the current

proposal. In addition, the entropy-based model leads to non-deterministic models (agents

take stochastic decisions), and the modeling is very complex when it goes beyond the linear-

Gaussian case: the solutions require a computer to be solved, and there’s no closed form

(Matejka and Sims 2010). As a result, no one has (yet) been able to work out the ba-

sic consumption problem (1), much less derive its implications for basic consumption and

equilibrium theory.51

In contrast, this paper’s techniques allow for it. The sparse max presents some important

differences in substance discussed in section 2.3. Still a reader might ask: “Why not do

Bayesian signal extraction all the time, with quadratic approximations, and average over the

agent to look at the average outcomes?” One answer is that it may be better to directly work

with the sparse max, without signals, rather than always do the detour via an allegorical

signal extraction with hypothesized Gaussian signals. In addition, the sparse max handles

constraints easily. Constraints are much harder to handle by limited-information papers.

Limited understanding of strategic interactions. In several types of models, the BR comes

from the interactions between the decision maker and other players: see Camerer, Ho, and

Chong (2004), Crawford and Iriberri (2007), Costa-Gomes, Crawford, and Iriberri (2013),

Eyster and Rabin (2005), Jéhiel (2005), Jéhiel and Samuelson (forthcoming).52 These models

prove very useful for capturing naiveté about strategic interactions. However, in a single-

person context, they (typically) model the decision maker as fully rational.

Uncertainty aversion and concern for robustness. Hansen and Sargent (2007) model that

agents understand that they do not know the right model and have a concern for “robust-

ness”. In contrast, in the present model, the decision maker is biased towards simplicity,

not robustness towards bad events. It would be interesting to study the interaction between

51 See however Mackowiak and Wiederholt 2011 for a macro model, which uses quadratic approximations
and source-specific costs, thus departing from the Sims doctrine.

52 Relatedly, an interesting literature studies BR in organizations (e.g., Radner and Van Zandt 2001), and
aims at predictions on the level of large organizations rather than individual decision making. See also
Madarász and Prat (2010) for a recent interesting advance in BR in a strategic context.


taste for sparsity and concern for robustness.

It may be interesting to note that the Sims framework is based on Shannon’s information theory
of the 1940s. The Hansen-Sargent framework is influenced by the engineering literature

of the 1970s. The present framework is inspired by the sparsity-based literature of the 1990s-

2000s (Tibshirani 1996, Candès and Tao 2006, Donoho 2006, Mallat 2009), which shows that

sparsity has many properties of tractability and near-optimality.53 The present paper is the

first paper in economic theory to use recent sparsity-based literature from statistics.

6 Conclusion

This paper proposes a tractable enrichment of the traditional “max” operator, with some

boundedly rational features: the “sparse max” operator. This formulation is quite tractable.

At the same time, it arguably features some psychological realism. We all simplify reality, and

this model represents one way to do so. It features the “anchoring with partial adjustment”

mechanism, which is a pervasive feature of psychology. It gives a quantitative version of

when the adjustment will be large or small.

The simplicity of the core model allows for the formulation of a sparse, limited attention

version of important building blocks of economics: the basic theory of consumer behavior,

and competitive equilibrium. We can bring a behavioral enrichment to venerable and often-

used concepts such as Marshallian and Hicksian demand, Edgeworth boxes, and competitive

equilibrium sets. Some surprises emerge. The model allows us to better understand what is

robust and non-robust in basic microeconomics.

We argued that other models with a more traditional form (e.g. extraction of noisy

signals) might lead to fairly similar features, but would be quite intractable. The sparse

max allows us to explore features of economic life that hopefully apply to other models. It

does it with relatively little effort.

No doubt, the model could and should be greatly enriched. It is silent about some difficult

operations such as Bayesian (or non-Bayesian) updating and learning (see Gennaioli and

Shleifer 2010 for recent progress in that direction), and memory management (Mullainathan

2002). As is, it is also silent about mental accounts (Thaler 1985), though that extension

appears to be within reach.

As a work in progress, I extend the model to include multi-agent models and dynamic

programming (Gabaix 2012a,b) to handle applications in macroeconomics and finance.54 The

53 For instance, somewhat miraculously, one can do a regression with fewer observations than regressors
(like in genetics, or perhaps growth empirics) by assuming that the number of non-zero regressors is small
and using an $\ell_1$ penalty for sparsity (e.g. Belloni and Chernozhukov 2011), as in $\min_\beta \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta' x_i)^2 + \lambda \| \beta \|_1$.

54 See also Croce, Lettau and Ludvigson (2012) for an investigation of how sparse models offer a promising
way to explain financial puzzles.


sparsity theme proves particularly useful. Agents in a sparsity macro model are simpler to

model than in the traditional model, as their actions depend on a small number of variables

and are easier to analyze and interpret. I conjecture that they may be more realistic too.

Despite these current limitations, the sparse max might be a useful tool for thinking

about the impact of bounded rationality in economic situations.

Appendix: Proofs

Remaining proofs are in the online appendix. We start with the following lemma.

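Lemma 1 For $\kappa \ge 0$, the solution of $\min_m \frac{1}{2} (m - \mu)^2 + \kappa |m - m^*|$ is $m = m^* + \tau(\mu - m^*, \kappa)$
where $\tau$ is the anchoring and adjustment function given in (6).

Proof. By shifting $m \to m - m^*$, $\mu \to \mu - m^*$, it is enough to consider the case $m^* = 0$.
The f.o.c. is: $m - \mu + \kappa \, \mathrm{sign}(m) = 0$, where $\mathrm{sign}(m)$ is the sign of $m$ if $m \ne 0$, and an
indefinite quantity in $[-1, 1]$ if $m = 0$. When $\mu > 0$ and $m > 0$, that yields $m = \mu - \kappa$.
Consistency with $m > 0$ requires $\mu > \kappa$. If $0 \le \mu \le \kappa$, the solution is $m = 0$: the loss
function is decreasing for $m < 0$, and increasing for $m > 0$. That yields $m = \tau(\mu, \kappa)$ for
$\mu \ge 0$. The case $\mu < 0$ is symmetrical.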
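Lemma 1 is easy to confirm numerically. The sketch below assumes, as the proof indicates, that the anchoring and adjustment function is the soft threshold $\tau(\mu, \kappa) = (|\mu| - |\kappa|)_+ \, \mathrm{sign}(\mu)$, and compares the closed form with a brute-force grid search. The function names, the variable `anchor` (standing for the point toward which the penalty pulls), and the grid settings are illustrative.

```python
import math

def soft_threshold(mu, kappa):
    """tau(mu, kappa) = (|mu| - |kappa|)_+ * sign(mu)."""
    return math.copysign(max(abs(mu) - abs(kappa), 0.0), mu)

def lemma1_solution(mu, kappa, anchor):
    """Closed form of Lemma 1: anchor + tau(mu - anchor, kappa)."""
    return anchor + soft_threshold(mu - anchor, kappa)

def grid_minimizer(mu, kappa, anchor, lo=-5.0, hi=5.0, step=1e-3):
    """Brute-force minimizer of 0.5 (m - mu)^2 + kappa |m - anchor| on a grid."""
    n = int(round((hi - lo) / step))
    grid = (lo + i * step for i in range(n + 1))
    return min(grid, key=lambda x: 0.5 * (x - mu) ** 2 + kappa * abs(x - anchor))

for mu, kappa, anchor in [(2.0, 0.5, 0.0), (0.3, 0.5, 0.0), (-2.0, 0.5, 1.0)]:
    print(mu, kappa, anchor,
          lemma1_solution(mu, kappa, anchor),
          grid_minimizer(mu, kappa, anchor))
```

The second case shows the sparsity effect: when the target is within $\kappa$ of the anchor, the minimizer stays exactly at the anchor.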

Proof of Proposition 1 We have ( ) = (11 ), = and

= , where is the cross-derivative of ( 1 ) with respect to (w.r.t.) and

. Hence, the problem becomes:

∗ = argmin

X

1

2( − ) ( − )

¡−−1 ¢ 2E [] + X

|| ||E£2

2

¤12= argmin

X

1

2( − )

2¡−−1 ¢ 22 +

X

|| ||

as under probability , and are uncorrelated for 6= . We have decoupled problems:

for = 1,

∗ = argmin

1

2( − )

2¡−−1 ¢ 22 + || ||

Let use the notations := E [2]12and :=

|−1 | (which can be infinite). As 0,the problem becomes:

∗ = argmin

1

2( − )

2+ || (26)

Using Lemma 1 just before this proof, this leads to ∗ = ( ). As () =

argmax ( ), we have ( ) = 0, and the implicit function theorem gives:


+ = 0, and finally = −−1 . Hence, =¯

¯, and

∗ = ( ) =

µ

¯

¯¶=

µ1

¶.

using ( ) = (−) and the fact that is homogenous of degree 1 in ( ).

Proof of Proposition 2 We apply Proposition 1 to the case = ||. That leads to:

∗ =

µ1

·

¶ =

µ

¶=

so = argmax (∗11

∗) = argmax (1

1

)

Proof of Proposition 4 We follow Definition 2. The Lagrangian is ( ) =

() + ( − · ), with = + and = + . We just need step 2 here.

The function () = argmax (∗ ) satisfies 0 ( ()) = i.e. () = 0−1 ().

Hence using the solution ∗ satisfies the budget constraint · (∗) = (Lemma 2 shows

formally it binds), and the chosen consumption is = (∗). Calling 0 = · , we havethat = ( 0), as it satisfies · = and 0 () = .

When demand is linear, the equation for 0 writes: = · ( 0) = · ( 1)0,so 0 = · ( 1), and

() = ( 0) = ( 1)0 = ( 1)

· ( 1)

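Proof of Proposition 7 We have: $c^s(p, w) = c^r\left( \left( p^d_i + m_i (p_i - p^d_i) \right)_{i=1,\dots,n}, w'(p, w) \right)$.
We differentiate w.r.t. $p_j$, at the point $p = p^d$:

$$\frac{\partial c^s}{\partial p_j} = \frac{\partial c^r}{\partial p_j} m_j + \frac{\partial c^r}{\partial w} \frac{\partial w'}{\partial p_j} \qquad (27)$$

Proposition 4 implies $p \cdot c^r(p^s, w'(p, w)) = w$, and differentiating w.r.t. $p_j$:

$$0 = c_j + p \cdot \frac{\partial c^r}{\partial p_j} m_j + p \cdot \frac{\partial c^r}{\partial w} \frac{\partial w'}{\partial p_j}$$

As for the rational demand, $p \cdot c^r(p, w) = w$, we have (respectively differentiating w.r.t.
$p_j$ and $w$) $c_j + p \cdot \frac{\partial c^r}{\partial p_j} = 0$ and $p \cdot \frac{\partial c^r}{\partial w} = 1$, so $0 = c_j - m_j c_j + \frac{\partial w'}{\partial p_j}$, and $\frac{\partial w'}{\partial p_j} = (m_j - 1) c_j$.
Finally (27) gives: $\frac{\partial c^s(p, w)}{\partial p_j} = \frac{\partial c^r}{\partial p_j} m_j + \frac{\partial c^r}{\partial w} \times (m_j - 1) c_j$.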


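Proof of Proposition 8 We can now evaluate the Slutsky term (18) at $p = p^d$:

$$\begin{aligned}
S^s_{ij} &= \frac{\partial c^s_i}{\partial p_j} + \frac{\partial c^s_i}{\partial w} c^s_j = \frac{\partial c^s_i}{\partial p_j} + \frac{\partial c^r_i}{\partial w} c^r_j, \text{ as for all } w, \; c^s(p^d, w) = c^r(p^d, w) \\
&= \frac{\partial c^r_i}{\partial p_j} m_j - \frac{\partial c^r_i}{\partial w} c^r_j (1 - m_j) + \frac{\partial c^r_i}{\partial w} c^r_j, \text{ by (19)} \\
&= \left( \frac{\partial c^r_i}{\partial p_j} + \frac{\partial c^r_i}{\partial w} c^r_j \right) m_j = S^r_{ij} m_j, \text{ by (18)}
\end{aligned}$$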
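The identity $S^s_{ij} = S^r_{ij} m_j$ can be checked by finite differences in a concrete case. The sketch below is an illustration under assumptions: Cobb-Douglas utility with weights `alpha`, linear price perception around the default price, and sparse demand computed as rational demand at perceived prices rescaled to satisfy the true budget. Parameter values, function names, and the step size are hypothetical.

```python
alpha = [0.5, 0.3, 0.2]  # Cobb-Douglas weights (illustrative)
p_d = [1.0, 2.0, 0.5]    # default price (illustrative)
m = [1.0, 0.6, 0.1]      # attention to each price (illustrative)
w = 10.0                 # wealth (illustrative)

def rational_demand(p, wealth):
    return [a * wealth / pi for a, pi in zip(alpha, p)]

def sparse_demand(p, wealth):
    """Rational demand at linearly perceived prices, rescaled to the true budget."""
    ps = [pd + mi * (pi - pd) for pi, pd, mi in zip(p, p_d, m)]
    unit = rational_demand(ps, 1.0)
    cost = sum(pi * ci for pi, ci in zip(p, unit))
    return [wealth * ci / cost for ci in unit]

def slutsky(demand, p, wealth, h=1e-6):
    """Slutsky matrix S_ij = dc_i/dp_j + (dc_i/dw) c_j by central differences."""
    n = len(p)
    c = demand(p, wealth)
    up, down = demand(p, wealth + h), demand(p, wealth - h)
    dc_dw = [(u - d) / (2.0 * h) for u, d in zip(up, down)]
    S = [[0.0] * n for _ in range(n)]
    for j in range(n):
        p_up = list(p)
        p_dn = list(p)
        p_up[j] += h
        p_dn[j] -= h
        c_up, c_dn = demand(p_up, wealth), demand(p_dn, wealth)
        for i in range(n):
            S[i][j] = (c_up[i] - c_dn[i]) / (2.0 * h) + dc_dw[i] * c[j]
    return S

S_r = slutsky(rational_demand, p_d, w)
S_s = slutsky(sparse_demand, p_d, w)
for i in range(3):
    print([round(S_s[i][j], 4) for j in range(3)],
          [round(S_r[i][j] * m[j], 4) for j in range(3)])
```

Each column $j$ of the sparse Slutsky matrix is the rational column scaled by $m_j$, so the sparse matrix is not symmetric when attention differs across goods.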

References

Abaluck, Jason, and Jonathan Gruber. 2011. “Choice Inconsistencies among the Elderly:
Evidence from Plan Choice in the Medicare Part D Program.” American Economic Review
101: 1180-1210.

Abel, Andrew, Janice Eberly and Stavros Panageas. 2010. “Optimal Inattention to the

Stock Market with Information Costs and Transactions Costs.” Working Paper, Wharton.

Akerlof, George, and Janet Yellen. 1985. “Can Small Deviations from Rationality Make

Significant Differences in Economic Equilibria?” American Economic Review 75: 708-720.

Allcott, Hunt, and Nathan Wozny. 2012. “Gasoline Prices, Fuel Economy, and the

Energy Paradox.” Working Paper, NYU.

Anagol, Santosh and Hugh Hoikwang Kim. 2012. “The Impact of Shrouded Fees: Evi-

dence from a Natural Experiment in the Indian Mutual Funds Market.” American Economic

Review 102: 576-93.

Belloni, Alexandre and Victor Chernozhukov. 2011. “ℓ1-Penalized Quantile Regression

in High-Dimensional Sparse Models.” Annals of Statistics 39: 82-130.

Bernheim, B. Douglas, and Antonio Rangel. 2009. “Beyond Revealed Preference: Choice-

Theoretic Foundations for Behavioral Welfare Economics.” Quarterly Journal of Economics

124: 51-104.

Bolton, Patrick and Antoine Faure-Grimaud. 2009. “Thinking Ahead: The Decision

Problem.” Review of Economic Studies 76(4): 1205-1238.

Brocas, Isabelle, and Juan Carrillo. 2008. “The Brain as a Hierarchical Organization.”
American Economic Review 98: 1312-1346.

Brown, Jennifer, Tanjim Hossain, and John Morgan. 2010. “Shrouded Attributes and

Information Suppression: Evidence from the Field.” Quarterly Journal of Economics 125:

859-876.

Browning, Martin, and Pierre-André Chiappori. 1998. “Efficient Intra-Household Allocations - A General Characterization and Empirical Tests.” Econometrica 66: 1241-1278.

Bulow, Jeremy, John Geanakoplos, and Paul Klemperer. 1985. “Multimarket oligopoly:

Strategic substitutes and complements.” Journal of Political Economy 93: 488-511.


Camerer, Colin F., Teck-Hua Ho, and Juin-Kuan Chong. 2004. “A Cognitive Hierarchy

Model of Games.” Quarterly Journal of Economics 119: 861-898.

Candès, Emmanuel, and Terence Tao. 2006. “Near-optimal signal recovery from random

projections: universal encoding strategies?” IEEE Transactions on Information Theory 52:

5406-5425.

Carroll, Gabriel D., James Choi, David Laibson, Brigitte C. Madrian, and Andrew Metrick. 2009. “Optimal Defaults and Active Decisions.” Quarterly Journal of Economics 124: 1639-1674.

Chetty, Raj. 2012. “Bounds on Elasticities with Optimization Frictions: A Synthesis of

Micro and Macro Evidence on Labor Supply.” Econometrica.

Chetty, Raj, Adam Looney, and Kory Kroft. 2009. “Salience and Taxation: Theory and

Evidence.” American Economic Review 99: 1145-1177.

Costa-Gomes, Miguel, Vincent Crawford, and Nagore Iriberri. 2013. “Structural Models

of Nonequilibrium Strategic Thinking: Theory, Evidence, and Applications.” Journal of

Economic Literature 51.

Crawford, Vincent, and Nagore Iriberri. 2007. “Level-k Auctions: Can Boundedly Rational Strategic Thinking Explain the Winner’s Curse and Overbidding in Private-Value Auctions?” Econometrica 75: 1721-1770.

Croce, Massimiliano, Martin Lettau, and Sydney Ludvigson. 2012. “Investor Information, Long-Run Risk, and the Duration of Risky Cash-Flows.” Working Paper, Berkeley.

Debreu, Gérard. 1970. “Economies with a Finite Set of Equilibria.” Econometrica 38:

387-392.

Dellavigna, Stefano. 2009. “Psychology and Economics: Evidence from The Field.”

Journal of Economic Literature 47: 315-372.

Donoho, David. 2006. “Compressed Sensing.” IEEE Transactions on Information Theory

52: 1289-1306.

Dufwenberg, Martin, Paul Heidhues, Georg Kirchsteiger, Frank Riedel, and Joel Sobel.

2011. “Other-Regarding Preferences in General Equilibrium.” Review of Economic Studies

78: 613-639.

Ellison, Glenn and Sara Fisher Ellison. 2009. “Search, Obfuscation, and Price Elasticities

on the Internet.” Econometrica 77: 427-452.

Esponda, Ignacio. 2008. “Behavioral Equilibrium in Economies with Adverse Selection.”

American Economic Review 98: 1269-91.

Eyster, Erik, and Matthew Rabin. 2005. “Cursed Equilibrium.” Econometrica 73: 1623-1672.

Fehr, Ernst and Jean-Robert Tyran. 2001. “Does Money Illusion Matter?” American

Economic Review 91(5): 1239-1262.

Fudenberg, Drew, and David Levine. 1998. The Theory of Learning in Games. MIT Press.


Fudenberg, Drew, and David Levine. 2006. “A Dual-Self Model of Impulse Control.”

American Economic Review 96: 1449-1476.

Fuster, Andreas, David Laibson, and Brock Mendel. 2010. “Natural Expectations and

Macroeconomic Fluctuations.” Journal of Economic Perspectives 24: 67-84.

Gabaix, Xavier. 2012a. “Sparse Dynamic Programming and Aggregate Fluctuations.”

Working Paper, NYU.

Gabaix, Xavier. 2012b. “Game Theory with Sparsity-Based Bounded Rationality.”

Working Paper, NYU.

Gabaix, Xavier, and David Laibson. 2002. “The 6D Bias and the Equity Premium

Puzzle.” NBER Macroeconomics Annual 16: 257-312.

Gabaix, Xavier and David Laibson. 2006. “Shrouded Attributes, Consumer Myopia, and

Information Suppression in Competitive Markets.” Quarterly Journal of Economics 121 (2):

505-540.

Gabaix, Xavier, David Laibson, Guillermo Moloche and Stephen Weinberg. 2006. “Costly Information Acquisition: Experimental Analysis of a Boundedly Rational Model.” American Economic Review 96: 1043-1068.

Gali, Jordi. 2011. “The Return of the Wage Phillips Curve.” Journal of the European

Economic Association 9 (3): 436-461.

Geanakoplos, John and Andreu Mas-Colell. 1989. “Real indeterminacy with financial

assets.” Journal of Economic Theory 47(1): 22-38.

Geanakoplos, John and Paul Milgrom. 1991. “A theory of hierarchies based on limited

managerial attention.” Journal of the Japanese and International Economies 5: 205-225.

Gennaioli, Nicola, and Andrei Shleifer. 2010. “What Comes to Mind.” Quarterly Journal

of Economics 125: 1399-1433.

Grossman, Sanford and Guy Laroque. 1990. “Asset Pricing and Optimal Portfolio Choice

in the Presence of Illiquid Durable Consumption Goods.” Econometrica 58(1): 25-51.

Gul, Faruk, Wolfgang Pesendorfer, and Tomasz Strzalecki. 2012. “Behavioral Competitive

Equilibrium and Extreme Prices.” Working Paper, Harvard.

Hansen, Lars, and Thomas Sargent. 2007. Robustness. Princeton University Press.

Hassan, Tarek and Thomas M. Mertens. 2011. “Market Sentiment: A Tragedy of the

Commons.” American Economic Review Papers and Proceedings 101: 402-05.

Hirshleifer, David, Sonya Lim and Siew Teoh. 2009. “Driven to Distraction: Extraneous

Events and Underreaction to Earnings News.” Journal of Finance 64: 2289-2325.

Jéhiel, Philippe. 2005. “Analogy-Based Expectation Equilibrium.” Journal of Economic

Theory 123: 81-104.

Jéhiel, Philippe, and Larry Samuelson. “Reputation with Analogical Reasoning.” Quarterly Journal of Economics, forthcoming.

Kahneman, Daniel. 2003. “Maps of Bounded Rationality: Psychology for Behavioral

Economics.” American Economic Review 93: 1449-1475.


Kahneman, Daniel. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.

Koszegi, Botond, and Adam Szeidl. 2011. “A Model of Focusing in Economic Choice.”

Working Paper, Berkeley.

Krusell, Per and Anthony Smith. 1996. “Rules of Thumb in Macroeconomic Equilibrium:

A Quantitative Analysis.” Journal of Economic Dynamics and Control.

Lucas, R.E., Jr. 1972. “Expectations and the Neutrality of Money.” Journal of Economic

Theory 4: 103-124.

Magill, Michael and Martin Quinzii. 1992. “Real effects of money in general equilibrium.”

Journal of Mathematical Economics 21(4): 301-342.

Mackowiak, Bartosz and Mirko Wiederholt. 2009. “Optimal Sticky Prices under Rational

Inattention.” American Economic Review 99(3): 769-803.

Mackowiak, Bartosz and Mirko Wiederholt. 2012. “Inattention to Rare Events.” Working

Paper, Northwestern.

Madarász, Kristóf, and Andrea Prat. 2010. “Screening with an Approximate Type

Space.” Working Paper, London School of Economics.

Mallat, Stéphane. 2009. A Wavelet Tour of Signal Processing: The Sparse Way. Third edition. Academic Press.

Mas-Colell, Andreu, Michael Whinston, and Jerry Green. 1995. Microeconomic Theory. Oxford University Press.

Matejka, Filip and Christopher Sims. 2010. “Discrete actions in information-constrained

tracking problems.” Working Paper, Princeton University.

Mankiw, N. Gregory, and Ricardo Reis. 2002. “Sticky Information Versus Sticky Prices:

A Proposal to Replace the New Keynesian Phillips Curve.” Quarterly Journal of Economics

117: 1295-1328.

Miller, George. 1956. “The magical number seven plus or minus two: some limits on our capacity for processing information.” Psychological Review 63 (2): 81-97.

Mullainathan, Sendhil. 2002. “A Memory-Based Model of Bounded Rationality.” Quarterly Journal of Economics 117: 735-774.

O’Donoghue, Ted, and Matthew Rabin. 1999. “Doing It Now or Later.” American

Economic Review 89: 103-124.

Payne, John W., James R. Bettman, and Eric J. Johnson. 1993. The Adaptive Decision Maker. Cambridge University Press.

Rabin, Matthew and Georg Weizsäcker. 2009. “Narrow Bracketing and Dominated

Choices.” American Economic Review 99: 1508-1543.

Radner, Roy, and Timothy Van Zandt. 2001. “Real-Time Decentralized Information

Processing and Returns to Scale.” Economic Theory 17: 545-575.

Reis, Ricardo. 2006. “Inattentive Consumers.” Journal of Monetary Economics 53:

1761-1800.

Rubinstein, Ariel. 1998. Modeling Bounded Rationality. MIT Press.


Samuelson, Paul. 1947. Foundations of Economic Analysis. Harvard University Press.

Sargent, Thomas. 1993. Bounded Rationality in Macroeconomics. Oxford University

Press.

Schwartzstein, Joshua. 2011. “Selective Attention and Learning.” Working Paper, Dartmouth.

Shafer, Wayne and Hugo Sonnenschein. 1975. “Equilibrium in abstract economies without ordered preferences.” Journal of Mathematical Economics 2(3): 345-348.

Shafir, Eldar, Peter Diamond and Amos Tversky. 1997. “Money Illusion.” Quarterly

Journal of Economics 112(2): 341-74.

Sims, Christopher. 2003. “Implications of Rational Inattention.” Journal of Monetary

Economics 50: 665-690.

Thaler, Richard H. 1985. “Mental Accounting and Consumer Choice.” Marketing Science

4: 199-214.

Tibshirani, Robert. 1996. “Regression shrinkage and selection via the lasso.” Journal of

the Royal Statistical Society B 58: 267-288.

Tversky, Amos. 1972. “Elimination by aspects: A theory of choice.” Psychological

Review 79: 281-299.

Tversky, Amos, and Daniel Kahneman. 1974. “Judgment under uncertainty: Heuristics

and biases.” Science 185: 1124-1130.

Veldkamp, Laura. 2011. Information Choice in Macroeconomics and Finance. Princeton

University Press.

Woodford, Michael. 2012. “Inattentive Valuation and Reference-Dependent Choice.”

Working Paper, Columbia.
