Consumer Learning and Habit Formation with Multiple … · Consumer Learning and Habit Formation...
Transcript of Consumer Learning and Habit Formation with Multiple … · Consumer Learning and Habit Formation...
Consumer Learning and Habit Formation with
Multiple Brand Choices
Valentin Agafonov∗
Advisor: Professor Ilya Segal
2007
This paper proposes a tractable structural model for the behavior of a
consumer who learns and forms habits from consumption. The consumer’s
dynamic programming problem is solved and applied to two sets of scanner
data on purchases of yogurt to estimate the model’s structural parameters
by maximum likelihood. Using ideas from multinomial logit, the consumer
is viewed as making purchase decisions faced with multiple brand choices.
Estimation of the model produces statistically significant estimates (as well
as dollar values) of both learning and habit formation in the data.
*I would like to thank Professor Segal for his encouraging and intellectually rigorous mentorship. Professor Segal’s insightful suggestions helped me better grasp the intuition behind economics. And his unwavering commitment to encouraging research has made my experience of writing a thesis very intellectually rewarding. I would like to thank Professor Geoffrey Rothwell for his many valuable suggestions and ideas for the project. I would like to thank Professor James Lattin for graciously providing Data Set 1 and I would like to thank Andrea Pozzi for graciously providing Data Set 2.
Contents
1 Introduction 4
1.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . 7
2 Model 8
2.1 Solution to the Consumer’s Dynamic Programming Problem . 11
2.2 Special Cases of the Model . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Consumer Learning . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Habit Formation . . . . . . . . . . . . . . . . . . . . . 12
2.3 The Econometric Model (Multiple Brand Choices) . . . . . . . 12
2.3.1 Likelihood Function . . . . . . . . . . . . . . . . . . . . 18
2.4 Alternative Specifications . . . . . . . . . . . . . . . . . . . . . 19
3 Data 20
4 Estimation 23
4.1 Evidence from Data Set 1 . . . . . . . . . . . . . . . . . . . . 23
4.2 Evidence from Data Set 2 . . . . . . . . . . . . . . . . . . . . 32
5 Conclusion 37
A Solution to the Consumer’s Dynamic Programming Problem 40
A.1 The First Period Choice . . . . . . . . . . . . . . . . . . . . . 40
A.2 The Second Period Choice . . . . . . . . . . . . . . . . . . . . 42
2
B Likelihood of Consumer’s Decisions 43
B.1 Likelihood of Consumer’s First Period Decision . . . . . . . . 43
B.2 Conditional Likelihood of Consumer’s Second Period Decision 45
References 48
List of Tables
1 Data Sets Used . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2 Estimated Model Parameters for Data Set 1 . . . . . . . . . . 24
3 Dynamic Effects Predictions versus Observed (Data Set 1) . . 25
4 Estimated Model Parameters for Data Set 2 . . . . . . . . . . 33
5 Dynamic Effects Predictions versus Observed (Data Set 2) . . 34
List of Figures
1 Learning Only Distribution of Reservation Price P2 . . . . . . 13
2 Habit Formation Only Distribution of Reservation Price P2 . . 14
3 Shifts in Demand (Data Set 1) . . . . . . . . . . . . . . . . . . 30
4 Demand (Data Set 1) . . . . . . . . . . . . . . . . . . . . . . . 31
5 Demand (Data Set 2) . . . . . . . . . . . . . . . . . . . . . . . 36
6 Reservation Price (Data Set 2) . . . . . . . . . . . . . . . . . . 37
3
1 Introduction
The dependence of consumer demand upon previous consumption, if it were
present to a significant degree in consumer decision making, would need to
be accounted for in models of consumer demand. Two important potential
sources of this path dependence in consumer demand are habit formation
and learning effects. There is a significant amount of literature on rational
habit formation — i.e. Becker and Murphy (1988). With habit formation,
a consumer’s demand for a product will be higher because the product was
consumed before. The other source of path dependence is learning. With ex-
perience goods, a consumer will not know with certainty her marginal benefit
of consuming the product prior to consuming it, and will have incentives to
experiment — i.e. trade off current utility for potential future utility by ac-
quiring information about a product now through consuming it now (i.e. see
Bergemann and Valimaki 1996). Therefore with learning consumer behavior
will depend on prior purchases of the product — those who have tried the
product before are more certain (and have more accurate predictions) about
their marginal benefit from consuming it than those who have not tried it
before. An important question is whether these habit formation and learning
effects are present to a significant degree in consumer behavior, because if
they were, they would have to be accounted for in demand derivations in or-
der to avoid biased estimates, and they could make current demand models
richer and lend them more explanatory power.
4
There are not many papers in the economics literature concerned with
measuring the sizes of these effects. One paper measures both habit forma-
tion and learning effects from scanner data (Osborne 2006). Other papers
(Ackerberg 2003;Erdem and Keane 1996; Crawford and Shum 2005) mea-
sure these effects as well, but none (except Osborne 2006) accounts for both
effects in the same demand model.
Using classical methods (MLE) for such a model can be computationally
demanding. Since modeling a consumer who is learning normally involves
an infinite horizon with uncertain future determinants of demand, a paper
attempting to model this behavior would solve the consumer’s problem by
backwards induction through time, and the larger the number of time pe-
riods, the larger the state space, the more computationally intensive calcu-
lations become — this is referred to as the “curse of dimensionality”. This
is further complicated by the fact that modeling learning involves persistent
(through time) unobserved variables which need to be integrated out in or-
der to calculate the likelihood of an observation. This integration does not
have a closed form and is simulated for each iteration of likelihood. To ease
the computational burden, approximations to the dynamic programming so-
lution are used by interpolating the solution from a subset of points in the
state space. Even with these simplifications, the estimation process is still
very computationally demanding. Additionally, using approximate solutions
to the consumer’s dynamic programming problem to estimate the structural
model by maximum likelihood has potential pitfalls, such as inconsistent
5
and biased parameter estimates (i.e. see Fernandez-Villaverde et al 2006,
finding that second-order errors in the approximate solution to the dynamic
programming problem can result in first-order errors in the approximated
likelihood). My goal in this paper is to derive a tractable model of consumer
learning and habit formation, and use this model to test the presence of these
effects in the behavior of consumers in two panel data sets of supermarket
shoppers (about 4000 consumers in one and about 11000 consumers in the
other data set)
Estimating both consumer learning and habit formation in the same
model has many advantages. For example, a model estimating habit for-
mation and not accounting for learning could be subject to the criticism that
estimates of habit formation reflect consumer learning (i.e. the consumer
tries the product, discovers her high valuation, and keeps buying it for that
reason and not because of habit formation). The advantage of the model in
this paper is that since learning is controlled for, such a bias should not be
present. Osborne points out that identifying learning without controlling for
habit formation is also subject to possible difficulties. He explains that habit
formation may make it look like there is less learning than there really is be-
cause it makes switching brands costly. But controlling for habit formation
should fix these difficulties as well.
Since Osborne (2005) is the only paper in the literature which estimates
learning and habit formation in the same model, it is the best paper to com-
pare with the estimates derived in this paper. It may be useful to observe
6
the differences between Osborne (2005) and this paper. Osborne applies his
model to scanner data on purchases of laundry detergent while this paper
applies its model to scanner data on purchases of yogurt. Since Osborne
uses a very rich heterogeneity structure in his model, he estimates it using
Markov Chain Monte Carlo, and this paper uses classical methods of estima-
tion (Maximum Likelihood).
1.1 Summary of Results
The paper develops a structural model of consumer demand using an exact
solution to the consumer’s dynamic programming problem and estimates it
using MLE. In Data Set 1, the parameter corresponding to the consumer’s
uncertainty about her true taste for yogurt has an estimate which is is highly
statistically significantly different from zero, with a value of 5.44 and a stan-
dard error of 0.664 (in Data Set 2 it has value 6.06 and standard error 0.142).
Using a Taylor series approximation, this translates into an estimate of 230
cents for the standard deviation of tastes, with the estimator’s standard error
at 153 cents (and in Data Set 2 standard deviation of tastes is 427 cents and
has a standard error of 60 cents). Thus consumers are very heterogeneous
in their tastes for yogurt, with some consumers liking it a lot, but others
not liking it. This indicates incentives for consumer learning. In Data Set 2
the habit formation estimate is highly statistically significantly different from
zero, with a value of 82 cents and a standard error of 8.14 cents. This esti-
mate indicates that consuming yogurt in period 1 increases the consumer’s
7
taste for it in the next period by 82 cents. Since most brands of yogurt in
the data sets cost between 50 cents and $1, the habit formation effect is very
significant.
2 Model
The model developped in this section is a structural model of consumer choice
which explicitly solves the consumer’s maximization problem as opposed to
a reduced form model. The advantages of a structural model include the
fact that, if the model is correctly specified, parameters of the model can be
taken away from the data (and, for example, used for counterfactuals/policy
experiments mindful of the Lucas critique). Another advantage of the struc-
tural model is that when estimated by Maximum Likelihood it captures more
information from the data than a reduced form model would. This advan-
tage is especially relevant for this paper because a reduced form specification
failed to detect statistically significant learning and habit formation in the
data because it discarded too much relevant information. The reduced form
model is discussed in section 2.4.
Consumer behavior which is subject to learning can result from the con-
sumer not being certain about her marginal benefit of consuming the product
until she consumes it — this could be true of pure experience goods, but may
be true of all products to some degree. Thus there is a component to utility
which is unknown before the first consumption event, but is revealed after
8
it. A consumer’s first purchase event is a gamble because marginal utility
of consumption is unknown at that time, but once a first purchase is made,
the consumer acquires information about her marginal utility of consuming
the product, and hence, for potential future purchase events, has accurate
information about her valuation. Thus in a first purchase the consumer not
only derives utility from immediate consumption, but also acquires valuable
information for consumption in future periods. While the consumer does
not derive utility from this information directly, she uses it to make choices
maximizing total expected utility. Therefore the consumer may want to
trade off utility from immediate consumption for information. These trade-
offs are built into the behavior of a forward-looking consumer who maximizes
ecpected total utility rather than immediate utility.
We assume consumers have von Neumann-Morgenstern form utility, hence
they maximize expected total utility in the face of uncertain future outcomes
rather than just deterministic total utility (which would be impossible here
in the presence of random variables affecting utility). This is a two-period
model, so we assume customers do not appreciably discount the future. One
advantage of a two-period model is the existence of a closed-form solution to
the dynamic programming problem. Another advantage is the tractability of
the two-period model.
Define the total utility of a two-period consumption choice as
U = vx1,x2 + x1(ε1 − P1 + η) + x2(ε2 − P2 + η), (1)
9
And
v1,1 − v1,0 > v0,1 − v0,0 (2)
so define
δ = (v1,1 − v1,0) − (v0,1 − v0,0) . (3)
where the purchase decisions xk ∈ {0, 1} so the consumer chooses to consume
one unit of the product or zero units, depending on which of these options
yields the higher expected utility. The εk are standard Random Utility Model
distrubances, and here they account for effects of unobservable factors on
the utility. Pk are prices for the product. The η is the unknown component
of utility which will be used to capture learning effects — it is observed
by the agent through consuming the product only if and after she makes
a purchase in the first period, and does not change with time, hence the
information acquired by a first period purchase event is very valuable. The
agent observes ε1 and P1 before deciding on x1, and observes ε2 and P2
before deciding on x2. The δ is used to capture habit formation — a positive
value on δ indicates that the marginal utility of consuming a product will
be higher in period 2 if the product was consumed in period 1. εk, η, and
P2 are assumed normally distributed. Normalize E(η) = 0, while E(ε) can
be nonzero. Therefore, when a consumer purchases yogurt, she buys three
things: immediate consumption, an increase in next period’s utility (through
habit formation), and information.
For the benefit of the model with multiple brands in Section 2.3, introduce
10
the intercept r into both periods’ flow utility. Assume this intercept is known
to the consumer before the first period. This amounts to adding r(x1 + x2)
to the utility specified in equation 1.
2.1 Solution to the Consumer’s Dynamic Programming
Problem
We solve the consumer’s Bellman equation using the Law of Iterated Ex-
pectations to split up E max() expressions (mathematical expectation taken
over random variables of the maximum of these random variables) in the
value function into simple expectations. See Appendix A.1 for the derivation
of the first period reservation price P1 and Appendix A.2 for the derivation
of the second period reservation price P2.
P1 = ε1 + r + v1,0Φ
⎛⎝v1,0 − v1,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
+ σ2η
⎞⎠ − v0,0Φ
⎛⎝v0,0 − v0,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
⎞⎠
+[r + v1,1 + τ(με2 − μP2 ,
√σ2
ε2+ σ2
P2+ σ2
η , v1,0 − v1,1 − r,∞)]
×⎡⎣1 − Φ
⎛⎝v1,0 − v1,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
+ σ2η
⎞⎠
⎤⎦ (4)
−[r + v0,1 + τ(με2 − μP2,
√σ2
ε2+ σ2
P2, v0,0 − v0,1 − r,∞)
]
×⎡⎣1 − Φ
⎛⎝v0,0 − v0,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
⎞⎠
⎤⎦
11
[P2
∣∣∣ x1
]= r + (v0,1 − v0,0) + ε2 + x1(δ + η) (5)
2.2 Special Cases of the Model
In order to see the effect of this x1(δ + η) term on demand dynamics, we
observe the components of this effect in two special cases of the model.
2.2.1 Consumer Learning
• Set δ = v0,0 = v0,1 = v1,0 = v1,0 ≡ 0. This will be a pure learning model.
Then P2 |x1=1 = P2 |x1=0 + η. And η is stochastic with E(η) = 0. Thus
P2 |x1=1 is a mean preserving spread of P2 |x1=0 . Hence P2 |x1=0 Second
Order Stochastically Dominates P2 |x1=1 . See Figure 1.
2.2.2 Habit Formation
• Set η = ση ≡ 0. This will be a pure habit formation model. Then
P2 |x1=1 = P2 |x1=0 + δ. Thus P2 |x1=1 First Order Stochastically Dom-
inates P2 |x1=0 . see Figure 2.
2.3 The Econometric Model (Multiple Brand Choices)
In this section the econometric model which will be used to apply the eco-
nomic model to the data is discussed. The economic model solution derived
above represents optimal behavior for a consumer, which is the subject of
12
Figure 1: Learning Only Distribution of Reservation Price P2
CumulativeProbability
P2
F( |x = 1)P2 1
F( |x = 0)P2 1
interest in this paper. The goal in this section is to get the most out of
the data available in order to gain knowledge about the underlying behavior
described by the economic model. Firstly, the model in this section will go
forward from single brand to multiple brand, which is done for the following
reasons:
1 Data Coverage The panel data sets which are available for this paper
include information about transactions for many brands/flavors of yo-
gurt. One way of getting the most out of this data is to treat each brand
of yogurt as a “type” of the same product and thereby incorporate a
13
Figure 2: Habit Formation Only Distribution of Reservation Price P2
CumulativeProbability
P2
F( |x = 1)P2 1
F( |x = 0)P2 1
larger proportion of the data into estimates than would be possible by
just picking a popular brand as the product and ignoring purchases of
all other brands.
2 External Validity Another advantage of treating brands of yogurt as
types is that an analysis which estimates parameters of the economic
model from data on just one brand of yogurt may end up with estimates
of that particular brand’s properties rather than the properties of the
product category.
3 Brand Substitutability Yet another advantage of treating brands of yo-
14
gurt as types is that, if the brands are sufficiently substitutable, changes
in the price (or other characteristic) of one brand may have an effect
on demand for other brands.
Consumers often do not make choices about purchases in isolation from
alternatives: choices available to the consumer matter. For example, when
choosing a yogurt, consumers have a variety of brands and flavors available
to them, and even if maximizing the flow utility of every one of these choices
separately would yield decisions to buy one unit rather than buy zero units,
consumers do not buy every yogurt in the store. In fact, assuming the con-
sumer has unit demand for yogurt, and also has an available outside good
(not buying any yogurt), then it would make sense for the consumer to select
the yogurt which yields the highest surplus (i.e. the yogurt the purchase
of which will yield highest expected utility over the consumer’s horizon).
Alternatively, purchases of different brands of yogurt could be modeled as
independent of one another, but it seems probable that brands are substi-
tutable, thus I model choice between brands.
In order to econometrically express the idea that the consumer selects the
best from a number of alternatives we apply some ideas behind McFadden’s
model of multinomial choice (multinomial logit), specifically, the assumption
of Independence of Irrelevant Alternatives. Only part of the derivation from
McFadden (1975) is used because this paper makes different assumptions
about the error term ε than those of standard multinomial logit.
15
The utility in the economic model, when allowed to apply to different
brands and individuals, takes the following form for brand/flavor j and in-
dividual i:
U ji = rj(xji1 + xji
2 ) + vxji1 ,xji
2+ xji
1 (εji1 − P j
1 + ηji) + xji2 (εji
2 − P j2 + ηji) (6)
The statistician does not observe the εjik and ηji and considers them stochas-
tic, while με, σε, ση, μP2, σP2 (which represent beliefs of the consumers, and
in the case of P2, not necessarily the actual mean and standard deviation
of observed prices) and v1,1, v0,1, v0,0, δ, rj are deterministic parameters of the
model.1 We assume v1,1, v0,1, v0,0, δ, με, σε, ση, μP2, σP2 are specific to the prod-
uct, and hence are the same for every brand/flavor. We also assume these
parameters are the same for every individual in the data — the only vari-
ables varying by individual are εjik and ηji. The intercept rj accounts for
the possibility of differences (vertical differentiation) between brands/flavors
— i.e. differences which all consumers would agree on, hence they are not
accounted for by the “error” term ε.
The statistician does not observe all the variables which enter into the
customer’s utility and affect the customer’s decisions, thus the statistician
assigns probabilities to decisions to buy or not buy a brand/flavor based on
the information set consisting of observed variables and distribution assump-
tions about the stochastic unobserved variables. Denote the probability of
1Assume σε1 = σε2 ≡ σε and με1 = με2 ≡ με
16
selecting alternative n from a nonempty set of choices N by pN(n). Assume
pN(n) > 0 for all n ∈ N . Then assume Independence from Irrelevant
Alternatives (IIA): if n, m ∈ N , then
p{n,m}(n)
p{n,m}(m)=
pN(n)
pN (m)(7)
This is a strong assumption, but it is a standard one because it makes it
possible to evaluate likelihood without calculating the cdf for multivariate
distributions, a task which is extremely computationally intensive.
Define an arbitrary index for the brands/flavors available to the consumer,
and for convenience set the index of the “none” choice (none of the available
brand/flavors purchased) to 0.
The likelihood of the consumer’s first period choice l1 is derived in Ap-
pendix B.1 and the likelihood of the consumer’s second period choice l2 is
derived in Appendix B.2.
For l1 �= 0
pN(l1) =1 − Φ
(P s
1−cl1−με
σε
)
Φ(
P s1−cl1−με
σε
)/⎛
⎝1 +∑
m∈N,m�=0
1 − Φ(
P s1−cm−με
σε
)Φ
(P s
1−cm−με
σε
)⎞⎠ (8)
and for l1 = 0
pN(l1) = 1
/⎛⎝1 +
∑m∈N,m�=0
1 − Φ(
P s1−cm−με
σε
)Φ
(P s
1−cm−με
σε
)⎞⎠ (9)
17
For l2 �= 0
pN( l2| l1) =
[1 − Φ
(P
l22 −c
l22 −δ−με√
σ2ε+σ2
η
)]1(l1=l2) [1 − Φ
(P
l22 −c
l22 −με
σε
)]1−1(l1=l2)
[Φ
(P
l22 −c
l22 −δ−με√
σ2ε+σ2
η
)]1(l1=l2) [Φ
(P
l22 −c
l22 −με
σε
)]1−1(l1=l2)(10)
÷
⎛⎜⎜⎜⎝1 +
∑m∈N,m�=0
[1 − Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [1 − Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
[Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
⎞⎟⎟⎟⎠
and for l2 = 0
pN( l2| l1) =
⎛⎜⎜⎜⎝1 +
∑m∈N,m�=0
[1 − Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [1 − Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
[Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
⎞⎟⎟⎟⎠
−1
(11)
2.3.1 Likelihood Function
The likelihood of an observation with choice b1 for the first stage and choice b2
for the second stage is Pr(l1 = b1, l2 = b2) = Pr(l2 = b2|l1 = b1)×Pr(l1 = b1).
Thus likelihood is
L = pN(b2| b1)pN(b1) (12)
and L is computed by plugging in pN(b2| b1) and pN(b1) from equations
(8),(9),(10),(11).
18
2.4 Alternative Specifications
A reduced form model attempting to measure learning and habit formation
in the data on consumer behavior is:
x2 = β0 + β1Sale1 + β2Sale1Price2 + β3Price2 (13)
Where x2 ∈ {0, 1} is the purchase in period 2, Sale1 is a dichotomous variable
which equals 1 if the brand was on sale in period 1 and 0 otherwise, and
Pricek is the price of the brand of yogurt in period k. Assuming Sale1 is
positively correlated with x1, the purchase in period 1, then:
• Habit formation would result in β1 > 0 because the marginal benefit
of consuming the product in period 2 would be higher for a consumer
who consumed it in period 1 than for a consumer who did not.
• Consumer learning would result in β2 > 0 because a consumer who
consumed the product in period 1 is less price sensitive in period 2
than a consumer who did not consume the product in period 1
Running this regression (probit since the dependent variable is dichotomous)
on the data did not produce statistically significant estimates of coefficients
β1 and β2. This is most likely explained by the weakness of the association
between Sale1 and x1 because a sale in the first period does not compel many
consumers to buy the brand in period 1 who would not have bought it in
period 1 anyway.
19
3 Data
In order to test the presence of learning and habit formation in consumer
behavior, and to measure the sizes of these effects, we turn to revealed pref-
erence data for supermarket shoppers.
Using this type of data (revealed preference) has the advantage that the
economic agents in the data have the correct incentives (they are shopping
normally) and they are experienced and knowledgeable about optimizing
their decisions in this setting. The data sets used are scanner data for pur-
chases of yogurt. Two data sets were used, one with transactions of approxi-
mately 4000 consumers followed over a period of two years, and another with
approximately 11000 consumers also followed over a period of approximately
two years.
20
Table 1: Data Sets Used
Data Set Final Number of Consumers in Data
Data Set 1 3384Data Set 2 8650
21
In each of the data sets, for each customer i, a random week ti is picked
as a starting point. A random week is picked for each consumer in order
to insure prices vary in the data used (for identification purposes) since the
prices may be very similar for all consumers in the same week. Another
reason to pick a random week for each consumer is that it helps to sidestep
potential difficulties associated with measuring the purchasing behavior of
different consumers in the same week because the unobserved variables af-
fecting utility may be correlated across consumers on the same date. The
customer’s purchase or lack thereof of 35 brands of yogurt at ti is recorded.
Since the learning and habit formation effects we are attempting to isolate
are intertemporal consumption effects, we attempt to remove other, poten-
tially confounding, intertemporal effects from the data proactively so that
any statistical association we find can confidently be attributed to our spe-
cific intertemporal consumption effects of interest and not other potentially
confounding effects. One such effect is storage — a consumer may purchase a
product at one time, form a rational habit for it, but not purchase it again at
the next opportunity only because she has stored a sufficient amount of the
product from her first period purchase. This hypothetical behavior would
create a downward bias on habit formation estimates. Thus, in order to
minimize storage effects, we look at the consumer’s shopping trip four weeks
after the original trip date ti, and refer to this date as t′i = ti + 4. The
customer’s purchase or lack thereof of 35 brands of yogurt at t′i is recorded.
Because we are attempting to measure learning effects, we also drop any
22
consumer i who purchased any of the 35 brands of yogurt four weeks or less
prior to ti from our data. Even if brand j of yogurt was not purchased by
consumer i during week t′i or week ti, we still record the price that we expect
consumer i observed of yogurt brand j in that week. This is done by check-
ing the price observed in a purchase of the same brand of yogurt j in the
same week in the same store (by another consumer). Since the multibrand
model, as is common in the literature, assumes the consumer purchases no
more than one brand of yogurt on each shopping trip, observations where
consumer i purchases more than one brand of yogurt during week t′i or week
ti are dropped.
4 Estimation
The maximum likelihood estimation results are shown in Table 2 for Data
Set 1 and in Table 4 for Data Set 2. v0,0 was normalized to zero, and so was
r0, the brand-specific intercept for brand 0 (not buying anything). Thus the
utility of purchasing nothing is zero. Prices in the data are given in cents.
4.1 Evidence from Data Set 1
23
Table 2: Estimated Model Parameters for Data Set 1
Parameter Estimate Parameter Estimate Parameter Estimate
ln(σε) 5.119139 ln(r8) 5.296639 ln(r23) 5.018946(0.2026432) (0.4201343) (0.5572788)
με -668.8425 ln(r9) 5.121379 ln(r24) 5.02987(154.6298) (0.5021016) (0.5489833)
ln(σP2 ) 4.904234 ln(r10) 5.13403 ln(r25) 5.153828(6.171514) (0.4912067) (0.5025538)
μP2 -1.091621 ln(r11) -4.997847 ln(r26) -3.128203(782.3295) (31.97883) (182.4084)
ln(ση) 5.438941 ln(r12) 4.999525 ln(r27) 5.181113(0.6640285) (0.5903094) (0.4771504)
δ 80.35355 ln(r13) -4.963921 ln(r28) 5.179468(165.7878) (34.20364) (0.4754916)
ln(v1,1) 4.152634 ln(r14) 5.203751 ln(r29) -2.726739(2.427955) (0.4679177) (161.2342)
ln(v0,1) -11.33953 ln(r15) 5.198169 ln(r30) 4.954935(2229.194) (0.4659872) (0.6175905)
ln(r1) 5.379895 ln(r16) 5.022056 ln(r31) 5.167926(0.3962141) (0.5546964) (0.4799807)
ln(r2) 5.484487 ln(r17) 5.340205 ln(r32) 4.842411(0.3577982) (0.4052718) (0.6913984)
ln(r3) 5.416172 ln(r18) 4.998615 ln(r33) -2.630624(0.3753905) (0.5919309) (141.8869)
ln(r4) 5.002529 ln(r19) 5.270693 ln(r34) -4.137877(0.5894222) (0.4346098) (239.6542)
ln(r5) 5.403721 ln(r20) 5.00915 ln(r35) -4.062496(0.3828157) (0.5671102) (232.3306)
ln(r6) 5.482439 ln(r21) 4.974899(0.3592197) (0.6360194)
ln(r7) 5.42788 ln(r22) 5.029355(0.376878) (0.5785778)
Note: Standard errors are in parentheses.
24
Table 3: Dynamic Effects Predictions versus Observed (Data Set 1)
Purchase Probabilities Predicted Observed
Pr (x2 = 0|x1 = 1) 0.9453 1Pr (x2 = 1|x1 = 1) 0.0547 0Pr (x2 = 0|x1 = 0) 0.9993 0.9983Pr (x2 = 1|x1 = 0) 0.0007 0.0017
Note: Calculation of predicted values relies on the observed sample standarddeviation and mean of P2. Observed probabilities are empirical frequenciesin the data.
25
The με estimate helps pin down the population average of consumer tastes
for yogurt, με + rj , where rj is the intercept for brand j — for example, for
brand 1, the population average is -452 cents. We will continue to use brand
1 whenever discussion requires a specific brand example. To be more spe-
cific, -452 cents is the population average of consumer tastes for nonbuyers
(x1 = x2 = 0), which are the majority. The population average for con-
sumers who buy in both periods is -388 cents, and for consumers who only
buy in the second period is -452 cents, and for consumers who only buy in
the first period is -469 cents. It may at first appear that people don’t like
yogurt very much and would prefer no yogurt to yogurt, but this is not ex-
actly what takes place. The parameter ln(ση) corresponds to the variation
in intrinsic taste for yogurt in the population, as well as the consumer’s un-
certainty about her true taste for yogurt. The estimate is highly statistically
significantly different from zero, with a value of 5.44 and a standard error of
0.664. Using a Taylor series approximation to estimate the standard error of
the coefficient’s transformation, this translates into an estimate of 230 cents
for ση with the estimator’s standard error at 153 cents. Thus consumers are
very heterogeneous in their tastes for yogurt, with some consumers liking
it a lot, but others not liking it. This result, the highly negative average
consumer taste tempered by a large degree of variation in consumer tastes,
can be seen in almost all brands in the data, and is also found in Osborne
(2005). Osborne points out that this wide heterogeneity in tastes points to
the “experience good” nature of the product. One of the estimates for the
26
consumer’s uncertainty about her true taste for the product in Osbourne can
be converted to cents to yield a standard deviation of 53 cents (this param-
eter is compared to this paper’s ση which represents both variation in true
tastes and consumer uncertainty about true tastes).
The δ estimate indicates that consuming brand j in period 1 increases the
consumer’s taste for the product in period 2 by 80 cents. This corresponds
to habit formation. The habit formation estimate which can be calculated
from the model estimated in Osborne is 44 cents (calculated by dividing the
habit formation coefficient by the MRS). Osborne finds that there is a lot of
individual heterogeneity in the degree of habit formation, and that 62% of
households are habit formers.
While habit formation can be easily interpreted by looking at the model
coefficients, consumer learning is not as straightforward to point out in the
numbers with an estimated model. Firstly, the highly statistically significant
coefficient on ln(ση), in other words, high degree of heterogeneity in consumer
tastes not revealed prior to the first consumption event, indicates strong in-
centives for consumers to learn: if consumer tastes have a large variance, and
consumers know this, they know that they receive valuable rewards from ex-
perimentation with brands. Osborne suggests that learning can be identified
in consumer behavior by looking at the difference between the share of con-
sumers who buy the product in the first period but do not repurchase it and
the share of consumers who do not buy the product in the first period but
buy it in the next period. The intuition here would be that those who pur-
27
chase but do not repurchase the product learned their true valuation of the
product by experimentation. But those who do not purchase right away but
purchase later represent the share of consumers who can be expected in the
second period to have seen high enough realizations of εi and perhaps low
enough realizations of price — thus this is a predictor of what proportion of
consumers can be expected to, in any period, purchase the product because
they are in the high tail of the distribution of unobservables for that period.
Hence large values of the difference between the two shares (in this paper,
probabilities) should be a good signal of learning in consumer behavior. We
can observe this identifier in Table 3. The difference in probabilities is almost
1, the maximum possible. Hence this is another indication of learning in the
data.
With estimates of the structural model it is also possible to examine coun-
terfactuals. Consider the consumer’s dynamic programming value function,
given the consumer purchases the brand in period 1, EV |x1=1. Then consider
a counterfactual in which the consumer does not learn from consumption but
does form habits — in other words, remove the learning from the consumer’s
optimization problem by specifying η = ση ≡ 0. Denote the consumer’s value
function in this counterfactual EV ′|x1=1. Calculating EV |x1=1 − EV ′|x1=1
should yield a measure of how much better off the consumer expects to be
because of the ability to learn about her product valuation through consump-
tion. Here the value is 17 cents, which may upon first examination seem low,
but really is not considering its frame of reference. Almost all of the brands
28
of yogurt in the sample cost between 50 cents and $1, hence 17 cents is a
significant change in valuation when compared to the cost of the product.
Comparing the estimate of habit formation, that the consumer’s taste for the
brand in the second period increases by 80 cents if the product is consumed
in the first period, to the prices of yogurt in the sample, indicates that the
effect of habit formation is just as large as the effect of price.
The estimated parameters of the structural model allow us to examine
demand for yogurt. Period 2 demand is:
Pr(P2 > P2
∣∣∣ P2
)= 1 − F (P2) (14)
where P2 has cdf F and the probability is over the εi2’s and ηi’s (consumers).
The effects of consumer learning and habit formation on demand can be
seen in Figure 3. The graph shows the differences implied by the model and
estimated parameters between demand given x1 = 0 and given x1 = 1. If the
product is purchased in the first period, habit formation shifts demand up by
80 cents and learning makes consumers less price sensitive hence the change in
slope. But the difference between demand given x1 = 0 and given x1 = 1 can
only be conveniently decomposed into its basic component effects in this view,
which includes negative values of price. When we look at the region of the
demand curve estimated in Figure 4 where only positive prices are considered,
the differences between the two curves are not obviously attributable to one
effect or the other. Thus the structural model provides intuition, which is
29
Figure 3: Shifts in Demand (Data Set 1)−1
000
−500
050
0Pr
ice
(P2)
(cen
ts)
0 .2 .4 .6 .8 1Demand
Demand given x1=0Step 1: Habit formation shifts up demand by 80 cents (delta)Step 2: Learning makes consumers less price sensitive
grounded in learning effects and habit formation effects, about what happens
with the demand curves and how their differences can be explained with
economic effects experienced at the micro level of the individual decision
making consumer. But if we step away from the model and analyze the
demand graph in Figure 4, we can still make some conclusions. It can be
seen that demand given x1 = 1 is not merely a shift up of demand given
x1 = 0. This implies that only additive habit formation, in the absence of
other effects, could not be responsible for the differences between the curves.
Also, the difference between the curves has to include a shift and not just a
30
slope effect, thus it appears we can also rule out consumer learning only in the
absence of other effects. Therefore either both consumer learning and habit
formation effects are important, or there is misspecification in the model.
Figure 4: Demand (Data Set 1)
020
040
060
0Pr
ice
(P2)
(cen
ts)
0 .02 .04 .06 .08 .1Demand
Demand given x1=0 Demand given x1=1
31
4.2 Evidence from Data Set 2
The με estimate is necessary to calculate average consumer tastes. -700 cents
is the population average of consumer tastes for nonbuyers (x1 = x2 = 0),
which are the majority. The population average for consumers who buy in
both periods is -669 cents, and for consumers who only buy in the second
period is -700 cents, and for consumers who only buy in the first period
is -751 cents. As in Data Set 1, this does not mean people do not like
yogurt. The parameter ln(ση) corresponds to the variation in intrinsic taste
for yogurt in the population, as well as the consumer’s uncertainty about
her true taste for yogurt. The estimate is highly statistically significantly
different from zero, with a value of 6.06 and a standard error of 0.142. Using a
Taylor series approximation to estimate the standard error of the coefficient’s
transformation, this translates into an estimate of 427 cents for ση with the
estimator’s standard error at 60 cents. Thus, as in Data Set 1, consumers
are very heterogeneous in their tastes for yogurt, with some consumers liking
it a lot, but others not liking it. This is shown in Figure 6.
32
Table 4: Estimated Model Parameters for Data Set 2
Parameter Estimate Parameter Estimate Parameter Estimate
ln(σε) 5.53991 ln(r8) 3.947013 ln(r23) 3.693937(0.0164833) (0.5691153) (0.7270177)
με -827.8881 ln(r9) 2.919857 ln(r24) -5.10243(4.058027) (1.673276) (66.53002)
ln(σP2 ) 23.36663 ln(r10) 4.674613 ln(r25) 3.649968(.) (0.2023852) (0.8332664)
μP2 1.335668 ln(r11) 3.952077 ln(r26) -5.918308(.) (0.707396) (.)
ln(ση) 6.056033 ln(r12) 6.049533 ln(r27) 4.689607(0.1416533) (0.0494318) (0.2062894)
δ 82.31698 ln(r13) 2.190537 ln(r28) 4.051851(8.140572) (2.378045) (0.4674159)
ln(v1,1) 3.436761 ln(r14) 6.009966 ln(r29) 4.458556(0.1701274) (0.0555648) (0.2870664)
ln(v0,1) -8.309021 ln(r15) -5.162236 ln(r30) 3.989435(202.2424) (.) (0.5214765)
ln(r1) 4.84797 ln(r16) 3.616467 ln(r31) 4.268801(0.161968) (0.7956624) (0.3447859)
ln(r2) 2.906048 ln(r17) 4.155919 ln(r32) -12.76221(1.688454) (0.4252708) (1831.423)
ln(r3) 2.887012 ln(r18) 3.637418 ln(r33) 5.667088(1.649668) (0.6763644) (0.0703793)
ln(r4) 4.70944 ln(r19) 4.149523 ln(r34) 4.691722(0.2004381) (0.3967899) (0.1878336)
ln(r5) 4.773026 ln(r20) 4.429662 ln(r35) -5.227693(0.1789292) (0.2675972) (2.812465)
ln(r6) 4.765643 ln(r21) 4.171555(0.1854232) (0.3809042)
ln(r7) 4.638903 ln(r22) 4.070337(0.2216636) (0.4880038)
Note: the numbering of product brands in the two data sets is different,therefore brand number j in the first data set is not the same brand as brandnumber j in the second data set. Standard errors are in parentheses.
33
Table 5: Dynamic Effects Predictions versus Observed (Data Set 2)
Purchase Probabilities Predicted Observed
Pr (x2 = 0|x1 = 1) 0.9170 0.9259Pr (x2 = 1|x1 = 1) 0.0830 0.0741Pr (x2 = 0|x1 = 0) 0.9988 0.9950Pr (x2 = 1|x1 = 0) 0.0012 0.0050
Note: Calculation of predicted values relies on the observed sample standarddeviation and mean of P2. Observed probabilities are empirical frequenciesin the data.
34
The δ estimate is highly statistically significantly different from zero,
with a value of 82 and a standard error of 8.14. This estimate indicates that
consuming brand j in period 1 increases the consumer’s taste for the product
in period 2 by 82 cents. This corresponds to habit formation.
The large estimate for ση is an indication of significant incentives for con-
sumer learning. Another way to identify learning is to observe the difference
between the share of consumers who buy the product in the first period but
do not repurchase it and the share of consumers who do not buy the product
in the first period but buy it in the next period. Large values of the difference
between the two shares (in this paper, probabilities) should be a good signal
of learning in consumer behavior. We can observe this identifier in Table
5. The difference in probabilities is almost 1, the maximum possible. Hence
this is another indication of learning in the data.
A calculation of EV |x1=1 − EV ′|x1=1, which should yield a measure of how
much better off the consumer expects to be because of the ability to learn
about her product valuation through consumption, results in 30 cents. Recall
that almost all of the brands of yogurt in the sample cost between 50 cents
and $1, hence 30 cents is a significant change in valuation when compared
to the cost of the product. Comparing the estimate of habit formation, that
the consumer’s taste for the brand in the second period increases by 82 cents
if the product is consumed in the first period, to the prices of yogurt in the
sample, indicates that the effect of habit formation is just as large as the
effect of price.
35
Examining the demand curves given x1 = 0 and x1 = 1 in Figure 5 leads
to the same conclusions as looking at the demand curves in Data Set 1.
Solely additive habit formation and solely learning cannot account for the
differences in the curves. Therefore either both consumer learning and habit
formation effects are present in the data or the model was misspecified.
Figure 5: Demand (Data Set 2)
020
040
060
080
0Pr
ice
(P2)
(cen
ts)
0 .05 .1Demand
Demand given x1=0 Demand given x1=1
Figure 6 shows the distribution of the reservation price (given x1) and
a 95% confidence interval around its mean based on the estimated variance
parameters and not the standard error of the estimate of the mean P2 This
is a visual representation of the initially confounding highly negative average
36
taste for every single brand of yogurt. The negative average is offset by
the very large variation around the mean in reservation price, hence the
heterogeneity in tastes for yogurt, and the fact that some people like a brand
of yogurt a lot while others do not like it at all.
Figure 6: Reservation Price (Data Set 2)
0.1
.2.3
.4De
nsity
−1595 −1199 −700 −618 −201 0 359 500Reservation Price (cents)
95% CI [P2|x1=1] 95% CI [P2|x1=0]Pr(P2|x1=1) Pr(P2|x1=0)
5 Conclusion
In this paper a tractable dynamic structural model for the intertemporal
purchasing behavior of a consumer which accounts for consumer learning
37
and habit formation was derived. The consumer’s dynamic programming
problem was solved exactly and the solution was used to estimate a model of
purchasing behavior with multiple brand choices using Maximum Likelihood.
The model was estimated on two sets of scanner data of purchases of yogurt.
The predictions from the two data sets were similar and yielded statistically
significant evidence of learning and habit formation, as well as dollar figures
for these effects. A counterfactual calculation allowed to estimate a dollar
figure for how much better off the consumer expects to be because of the
ability to learn about her product valuation through consumption. And
estimates of the parameters of the structural model allowed to derive the
demand for yogurt.
Demand curves derived from the estimated model, however, do not guar-
antee that consumer learning and habit formation are the effects moving the
data. For example, if habit formation were allowed to not be additive, and if
consumers with low valuation for yogurt experienced stronger habit forma-
tion than consumers with high valuation for yogurt, then such a non-additive
habit formation effect alone could explain the differences in demand curves
given x1.
Also, the model does not allow for persistent (through time) individual-
specific unobservables, which means that individual-specific tastes for a brand
based solely on the characteristics observable to the consumer prior to con-
sumption are not accounted for in the model (individual-specific tastes for a
brand which are not based solely on observable characteristics are accounted
38
for by η). This could be problematic if consumers form heterogeneous predic-
tions about their taste for the product prior to the first consumption event.
Therefore in future research it would be best to include a variable to fill
this gap. However, the model estimated in Osborne (2005) suggests con-
sumers are not very heterogeneous in their expectations about their taste for
a product before they try it.
39
Appendix
A Solution to the Consumer’s Dynamic Pro-
gramming Problem
A.1 The First Period Choice
The agent has beliefs
E(U | x1 = 0) = E [max(r + v0,1 − P2 + ε2 + E2(η|x1 = 0), v0,0)]
= E [max(r + v0,1 − P2 + ε2, v0,0)]
and
E(U | x1 = 1) = E[max(2r + v1,1 + ε1 − P1 + η + ε2 − P2 + E2(η|x1 = 1),
r + v1,0 + ε1 − P1 + η)]
= E[r + ε1 − P1 + η + max(r + v1,1 + ε2 − P2 + η, v1,0)]
= r + ε1 − P1 + E[max(r + v1,1 + ε2 − P2 + η, v1,0)].
She will be indifferent between x1 = 0 and x1 = 1 when E(U |x1 = 0) =
E(U | x1 = 1). Thus the reservation price P1 is
P1 = ε1 + r + E[max(v1,0, r + v1,1 + ε2 − P2 + η)] − E[max(v0,0, r + v0,1 + ε2 − P2)]
40
= ε1 + r + E[max(v1,0, r + v1,1 + ε2 − P2 + η)| ε2 − P2 + η > v1,0 − v1,1 − r]
×Pr(ε2 − P2 + η > v1,0 − v1,1 − r)
+E[max(v1,0, r + v1,1 + ε2 − P2 + η)| ε2 − P2 + η ≤ v1,0 − v1,1 − r]
×Pr(ε2 − P2 + η ≤ v1,0 − v1,1 − r)
−E[max(v0,0, r + v0,1 + ε2 − P2)| ε2 − P2 > v0,0 − v0,1 − r]
×Pr(ε2 − P2 > v0,0 − v0,1 − r)
−E[max(v0,0, r + v0,1 + ε2 − P2)| ε2 − P2 ≤ v0,0 − v0,1 − r]
×Pr(ε2 − P2 ≤ v0,0 − v0,1 − r)
= ε1 + r + v1,0 Pr(ε2 − P2 + η ≤ v1,0 − v1,1 − r) − v0,0 Pr(ε2 − P2 ≤ v0,0 − v0,1 − r)
+[r + v1,1 + E(ε2 − P2 + η| ε2 − P2 + η > v1,0 − v1,1 − r)]
×Pr(ε2 − P2 + η > v1,0 − v1,1 − r)
−[r + v0,1 + E(ε2 − P | ε2 − P2 > v0,0 − v0,1 − r)]
×Pr(ε2 − P2 > v0,0 − v0,1 − r)
= ε1 + r + v1,0Φ
⎛⎝v1,0 − v1,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
+ σ2η
⎞⎠ − v0,0Φ
⎛⎝v0,0 − v0,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
⎞⎠
+[r + v1,1 + τ(με2 − μP2 ,
√σ2
ε2+ σ2
P2+ σ2
η , v1,0 − v1,1 − r,∞)]
×⎡⎣1 − Φ
⎛⎝v1,0 − v1,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
+ σ2η
⎞⎠
⎤⎦ (15)
−[r + v0,1 + τ(με2 − μP2,
√σ2
ε2+ σ2
P2, v0,0 − v0,1 − r,∞)
]
×⎡⎣1 − Φ
⎛⎝v0,0 − v0,1 − r − με2 + μP2√
σ2ε2
+ σ2P2
⎞⎠
⎤⎦
where τ() is defined as follows:
41
For any γ ∼ N(μ, σ2),
E (γ| γ ∈ [a, b]) =∫
x Pr(γ = x| γ ∈ [a, b])dx
=∫
xPr(γ = x, γ ∈ [a, b])
Pr(γ ∈ [a, b])dx
=∫
xPr(γ = x) × 1(x ∈ [a, b])
Pr(γ ∈ [a, b])dx
=
b∫a
xPr(γ = x)
Pr(γ ∈ [a, b])dx
=
b∫a
x(2πσ2)
− 12 e−
12(
x−μσ )
2
Φ(
b−μσ
)− Φ
(a−μ
σ
) dx
= μ +σ√2π
e−12(
a−μσ )
2
− e−12(
b−μσ )
2
Φ(
b−μσ
)− Φ
(a−μ
σ
)= τ(μ, σ, a, b)
A.2 The Second Period Choice
Expected utility conditional on purchase decisions is
E2(U | x1 = 0, x2 = 0) = v0,0
E2(U | x1 = 0, x2 = 1) = r + v0,1 + ε2 − P2
E2(U | x1 = 1, x2 = 0) = r + v1,0 + ε1 − P1 + η
E2(U | x1 = 1, x2 = 1) = 2r + v1,1 + ε1 − P1 + η + ε2 − P2 + η.
Given x1 = 0: The agent will be indifferent between x2 = 0 and x2 = 1
when E2(U | x1 = 0, x2 = 0) = E2(U | x1 = 0, x2 = 1). Thus the
42
reservation price is[P2
∣∣∣ x1 = 0]
= r + (v0,1 − v0,0) + ε2.
Given x1 = 1: The agent will be indifferent between x2 = 0 and x2 = 1
when E2(U | x1 = 1, x2 = 0) = E2(U | x1 = 1, x2 = 1). Thus the
reservation price is[P2
∣∣∣ x1 = 1]
= r + (v0,1 − v0,0) + ε2 + δ + η.
[P2
∣∣∣ x1
]= r + (v0,1 − v0,0) + ε2 + x1(δ + η) (16)
B Likelihood of Consumer’s Decisions
B.1 Likelihood of Consumer’s First Period Decision
The IIA assumption implies2
pN(m) =p{n,m}(m)
p{n,m}(n)pN(n). (17)
Define
p{n,n}(n) =1
2; and p{n}(n) = 1 (18)
Summing over m ∈ N ,
1 =∑
m∈N
pN(m) = pN(n)∑
m∈N
p{n,m}(m)
p{n,m}(n)(19)
2Equations (17) through (21) are the derivation by McFadden
43
thus
pN(n) = 1
/ ∑m∈N
p{n,m}(m)
p{n,m}(n)(20)
and using equation (17) this probability can be written as
pN (l) =p{n,l}(l)p{n,l}(n)
/ ∑m∈N
p{n,m}(m)
p{n,m}(n). (21)
For the first stage decision l1, calculate pN(l1) with equation (21) using
the “none” choice as benchmark. Thus
pN (l1) =p{0,l1}(l1)p{0,l1}(0)
/ ∑m∈N
p{0,m}(m)
p{0,m}(0). (22)
This calculation uses probabilities of purchasing a brand/flavor for a cus-
tomer faced with one brand/flavor choice, and thus choosing between buying
that only brand/flavor or buying nothing — in other words, the derived
solutions for “binary” models. And for a brand/flavor with index s,
p{0,s}(s) = Pr(P1 > P s1 ) (23)
and
p{0,s}(0) = 1 − p{0,s}(s) (24)
the expression for P1 from equation (4) can be used to calculate these prob-
abilities.
Thus, from the statistician’s point of view, for individual i and brand/flavor
44
with index s:
Pr(P1 > P s1 ) = Pr(εsi
1 + cs > P s1 ) (25)
where cs is all the terms from the right hand side of equation (4) except ε1,
and cs is deterministic. Thus
p{0,s}(s) = 1 − Φ(
P s1 − cs − με
σε
)(26)
p{0,s}(0) = Φ(
P s1 − cs − με
σε
)(27)
Thus for l1 �= 0
pN (l1) =1 − Φ
(P s
1−cl1−με
σε
)
Φ(
P s1−cl1−με
σε
)/⎛
⎝1 +∑
m∈N,m�=0
1 − Φ(
P s1−cm−με
σε
)Φ
(P s
1−cm−με
σε
)⎞⎠ (28)
and for l1 = 0
pN(l1) = 1
/⎛⎝1 +
∑m∈N,m�=0
1 − Φ(
P s1−cm−με
σε
)Φ
(P s
1−cm−με
σε
)⎞⎠ (29)
B.2 Conditional Likelihood of Consumer’s Second Pe-
riod Decision
Denote the probability of selecting alternative n from a nonempty set of
choices N , conditional on the first-stage decision l1, by pN(n|l1). Going
through the McFadden derivation3 with this conditional probability will re-
3Equations (17) through (21)
45
sult in:
pN ( l| l1) =p{n,l}( l| l1)p{n,l}(n| l1)
/ ∑m∈N
p{n,m}(m| l1)p{n,m}(n| l1) . (30)
Using the “none” choice as benchmark gives
pN( l2| l1) =p{0,l2}( l2| l1)p{0,l2}(0| l1)
/ ∑m∈N
p{0,m}(m| l1)p{0,m}(0| l1) . (31)
For a brand/flavor with index s
p{0,s}(s|l1) = Pr([P2
∣∣∣ l1] > P2) (32)
and
p{0,s}(0|l1) = 1 − p{0,s}(s|l1) (33)
where [P2
∣∣∣ l1] =[P2
∣∣∣ x1 = 1(l1 = s)]
(34)
and the convenience function 1() evaluates to 1 when its argument is true,
and 0 otherwise. The expression for[P2
∣∣∣ x1
]from equation (5) can be used
to calculate these probabilities. Define
cs2 = rs + v0,1 − v0,0 (35)
Thus
p{0,s}(s| l1) = [Pr(cs2 + δ + ε2 + η > P2)]
1(l1=s) [Pr(cs2 + ε2 > P2)]
1−1(l1=s)
46
p{0,s}(0| l1) = [Pr(cs2 + δ + ε2 + η ≤ P2)]
1(l1=s) [Pr(cs2 + ε2 ≤ P2)]
1−1(l1=s)
and
p{0,s}(s| l1) =
⎡⎣1 − Φ
⎛⎝P s
2 − cs2 − δ − με√
σ2ε + σ2
η
⎞⎠
⎤⎦
1(l1=s) [1 − Φ
(P s
2 − cs2 − με
σε
)]1−1(l1=s)
p{0,s}(0| l1) =
⎡⎣Φ
⎛⎝P s
2 − cs2 − δ − με√
σ2ε + σ2
η
⎞⎠
⎤⎦
1(l1=s) [Φ
(P s
2 − cs2 − με
σε
)]1−1(l1=s)
.
Thus for l2 �= 0
pN( l2| l1) =
[1 − Φ
(P
l22 −c
l22 −δ−με√
σ2ε+σ2
η
)]1(l1=l2) [1 − Φ
(P
l22 −c
l22 −με
σε
)]1−1(l1=l2)
[Φ
(P
l22 −c
l22 −δ−με√
σ2ε+σ2
η
)]1(l1=l2) [Φ
(P
l22 −c
l22 −με
σε
)]1−1(l1=l2)(36)
÷
⎛⎜⎜⎜⎝1 +
∑m∈N,m�=0
[1 − Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [1 − Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
[Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
⎞⎟⎟⎟⎠
and for l2 = 0
pN( l2| l1) =
⎛⎜⎜⎜⎝1 +
∑m∈N,m�=0
[1 − Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [1 − Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
[Φ
(P m
2 −cm2 −δ−με√
σ2ε+σ2
η
)]1(l1=m) [Φ
(P m
2 −cm2 −με
σε
)]1−1(l1=m)
⎞⎟⎟⎟⎠
−1
(37)
47
References
Ackerberg, D. (2003), “Advertising, Learning, and Consumer Choice in Ex-
perience Goods Markets: A Structural Empirical Examination,” Inter-
national Economic Review, 44 (3), 1007-1040.
Becker, G., Murphy, K. (1988), “A Theory of Rational Addiction,” The
Journal of Political Economy, 96 (4), 675-700.
Bergemann, D., Valimaki, J. (1996), “Learning and Strategic Pricing,”
Econometrica, 64 (5), 1125-1149.
Crawford, G., Shum, M. (2005), “Uncertainty and Learning in Pharmaceu-
tical Demand,” Econometrica, 73 (4), 1137-1173.
Erdem, T., Keane, M. (1996), “Decision-making Under Uncertainty: Cap-
turing Dynamic Brand Choice Processes in Turbulent Consumer Goods
Markets,” Marketing Science, 15 (1), 1-20.
Fernandez-Villaverde, J., Rubio-Ramırez, J.F., and Santos, M. (2006), “Con-
vergence Properties of the Likelihood of Computed Dynamic Models,”
Econometrica, 74 (1), 93-119.
McFadden, D. (1975), “The Revealed Preferences of a Government Bureau-
cracy: Theory,” The Bell Journal of Economics, 6 (2), 401-416.
Osborne, M. (2006), “Consumer Learning, Habit Formation, and Hetero-
geneity: A Structural Examination,” Unpublished Manuscript.
48