Ba Yes Marketing

6
The HB How Bayesian methods have changed the face of marketing research. 20 Summer 2004

Transcript of Ba Yes Marketing

Page 1: Ba Yes Marketing

TheHB

How Bayesian methodshave changed the face of marketing research.

20 Summer 2004

Page 2: Ba Yes Marketing

marketing research 21

Brevolution

By Greg M. Allenby, David G. Bakken, and Peter E. Ross i

For most of the history of market-

ing research, analytical methods have

relied on “classical” statistics. We use

these methods to make inferences about

the characteristics of a population from

the characteristics of a sample drawn from

that population. Because we can never

determine the true value of the population

parameters from samples with complete

certainty, we have regarded the sample as

one realization of many that could have

been realized and then quantify our uncer-

tainty with a frequency concept of proba-

bility. If we’re interested in the population

mean for some measure (say height, for

example), then the sample mean is our best

guess about the mean of the population,

and we conceive of uncertainty as arising

from hypothetical samples that could have

been selected from the population.

Classical statistical methods express uncer-

tainty through the variability of statistics

calculated from these hypothetical samples

when interpreting confidence intervals and

conducting hypothesis tests.

Over the last 10 years, a paradigm shift

has occurred in the statistical analysis of

marketing data, especially data obtained

from conjoint experiments. The new

paradigm reflects a different perspective

on probability. This view of probability

was first proposed in the 18th century by

Thomas Bayes, an English clergyman who

died in 1760. Bayes wrote a paper (pub-

lished posthumously in 1763) in which he

proposed a rule for accounting for uncer-

tainty that has become known as Bayes’

Theorem. This became the foundation for

Bayesian statistical inference, and Bayes

most likely would be amazed by the many

applications sprouting from his paper. In

Bayesian statistics, probabilities reflect a

belief about the sample of data under

study rather than about the frequency of

events across hypothetical samples. Bayes’

Theorem exploits the fact that the joint

probability of two events, A and B, can be

written as the product of the probability of

one event and the conditional probability

of the second event, given the occurrence

Reprinted with permission from Marketing Research, Summer 2004, published by the American Marketing Association.

Page 3: Ba Yes Marketing

of the first event. If we consider “A” to be our “hypothesis”(H) in the absence of any observations “B” (or “D” for data),we can express the rule as follows:

Pr(H|D) = Pr(D|H) ¥ Pr(H) / Pr(D)

The probability of the hypothesis given (or “conditional”on—the meaning of the “|” symbol) the data is equal to theprobability of the data given the hypothesis times the proba-bility of the hypothesis divided by the probability of the data.We refer to the Pr(H) as the priorprobability and Pr(H|D) as the pos-terior probability. (See “Bayes’Theorem Example” on page 25.)While prior probabilities can bethought of as arising from hypothet-ical samples, the frequency conceptof probability need not beemployed, and the prior probabilitycan reflect the researcher’s beliefabout the state of nature.Alternatively, “H” can stand for anyunobserved aspect of an analysis,including model parameters such asthe estimates of utility in a conjoint model. In Bayesian analy-sis, all unobserved objects of analysis are dealt with in thesame manner.

One of the main differences between the Bayesian and theclassical approach is that the classical approach assumes thatthe hypothesis (H) is known with certainty, and the methodssearch for the best fitting model that maximizes Pr(D|H)—theprobability of the data given the hypothesis. In contrast, theBayesian approach acknowledges that the hypothesis, in reality,is never known, and the classical assumption that it is knownlimits the ability of the analyst to account for uncertainty. (Thedata may be consistent with many hypotheses.) The Bayesianapproach asks, “Which hypothesis is most likely, given thedata?” It is the observed data that are known for certain, andBayesian analysis then proceeds by “conditioning” on the data.

Bayesian Application EmergenceUntil recently, Bayesian statistical inference has been more a

subject of intellectual curiosity than a practical application

among researchers in marketing and other fields. There are atleast three reasons for this. First, for many simple problemswhere the analyst is unwilling or unable to make informativestatements about prior probabilities, the Bayesian and classicalapproach lead to essentially the same conclusions. Second,while Bayes’ Theorem is a conceptually simple method ofaccounting for uncertainty, it has been difficult to implementin all but the simplest problems. As an example, a conjointanalysis involving 15 part-worth estimates and 500 respon-dents leads to an analysis with 750 parameters, making theapplication of Bayes’ Theorem difficult. Finally, Bayesian anal-ysis is particularly well-suited for understanding what wemight call the “data generating process” behind complex mar-keting behaviors. Marketing researchers, however, do not usu-ally make inferences about the process that gives rise to theobserved behavior. The one area in which we do pay attentionis the process by which consumers maximize utility in makingchoices. It’s not surprising, therefore, that Bayesian analysishas gained a strong following in this area.

Most marketing researchers are familiar with a least one ofthe variations of conjoint analysis (full profile conjoint, adap-tive conjoint, and “choice-based” conjoint). In conjoint analy-sis, consumers are presented with descriptions of actual orhypothetical products or services and asked to evaluate these

products or services in somefashion. In the language ofmodeling, the characteristicsthat describe the alternatives(attributes or features, forexample) and the consumerbehavior (choice among alterna-tives, for example) are observed.The utilities that consumersassign to the different productcharacteristics are unobserved.Random factors (e.g., responseerrors) are also unobserved. Theanalytical challenge lies in iso-

lating the unobserved utility from the other unobserved fac-tors, given the attribute descriptions and observed choices.Moreover, it is desirable to do this in a manner that allows forrespondent heterogeneity. A shortcoming of classical methodsfor analyzing conjoint data is that they cannot easily accountfor heterogeneity across consumers in the unobserved utilities.

Accounting for uncertainty is important when analyzingmarketing research data. Marketing data typically comprisesmany heterogeneous “units” (e.g., households, respondents,panel members, activity occasions) with limited information oneach unit and large amounts of uncertainty. In a conjoint analy-sis, it is rare to have more than 20 or so evaluations or choicesof product descriptions per respondent. In direct marketingdata, it is rare to have more than a few dozen orders for a cus-tomer within a given product category. Researchers oftenreport that predictions in marketing research analyses are tooaggressive and unrealistic. A reduction in price of 10% or20%, for example, results in an increase in market share that’sknown to be too large. Overly optimistic predictions can easily

22 Summer 2004

Execut ive Summar y

In the last 10 years, a paradigm shift has occurred in the

statistical analysis of marketing data. Since the early 1990s,

more than 50 papers on hierarchical Bayes (HB) methods

have been published in top marketing journals, offering better

solutions to a wider class of research problems than previ-

ously possible. This article provides an introduction to and

perspective on Bayesian methods in marketing research.

In Bayesian statistics,probabilities reflect a belief

about the sample of data understudy rather than about the frequency of events across

hypothetical samples.

USER
Resaltar
Page 4: Ba Yes Marketing

result if estimates of price sensitivity are incorrectly assumed tobe known with certainty. When uncertainty is taken into con-sideration, market share predictions become more realistic andtend to agree with current and past experience. Bayes’ Theoremtherefore offers an elegant solution to an important problem.

HB ModelsRecent developments in statistical computing have made

Bayesian analysis accessible to researchers in marketing andother fields. A key innovation from the 1980s, known asMarkov Chain Monte Carlo (MCMC) simulation, has facili-tated the estimation of complex models of behavior that canbe infeasible to estimate with alternative methods. (MonteCarlo simulators provide a mechanism for generating randomdraws from statistical distributions. The Markov Chain is adevice that, coupled with the Monte Carlo simulator, trans-forms the simulator into a rather efficient engine for searchingthe randomly generated distribution.) These models are writ-ten in a hierarchical form and, when estimated with MCMCmethods, are referred to as hierarchical Bayes (HB) models.Discrete choice models, for example, assume that revealedchoices reflect an underlying process where consumers havepreferences for alternatives and select the one that offers great-est utility. Utility is assumed to be related to specific attributelevels that are valued by the consumer, and consumers are

assumed to be heterogeneous in their preference for theattributes. The model is written as a series of hierarchical alge-braic statements, where model parameters in one level of thehierarchy are “unpacked,” or explained, in subsequent levels:

(1) Pr (yih = 1) = Pr (Vih + eih > Vjh + ejh for all j)

(2) Vih = xi'bh

(3) bh ~ Normal(b, Sb)

Here “Pr” stands for “the probability that…,” “i” and “j”denote different choice alternatives, yih is the choice outcomefor respondent h, Vih is the utility of choice alternative i torespondent h, and eih is a random or “error” component. Inequation 2, xi denotes the attributes of the ith alternative, andbh are the weights given to the attributes by respondent h.Finally, equation 3 is a “random-effects” model that assumesthat the attribute weights are normally distributed in the pop-ulation (with a mean of and b variance of Sb).

The “bottom” of the hierarchy specified by equations 1–3is the model for the observed choice data. Equation 1 repre-sents the individual’s choice process and specifies that alterna-tive j is chosen if the latent or unobserved utility is the largestamong all of the alternatives. Latent utility is not observeddirectly and is linked to characteristics of the choice alterna-tive in equation 2. Each individual’s partworths or attribute

marketing research 23

Page 5: Ba Yes Marketing

weights (the bh in equation 2) are linked to those of otherindividuals by a common distribution in equation (3).Equation 3 allows for heterogeneity among the units ofanalysis by specifying a probabilistic model of how the unitsare related. Bayes’ Theorem therefore provides a method of“bridging” the analysis across respondents while providingan exact accounting of all the uncertainty present. Evencasual observation of consumer choices suggests that con-sumers vary in the value they assign to different product fea-tures or attributes. The advantage of the Bayesian approachto this problem is the ability to reflect the heterogeneity inthose consumer preferences.

One of the main objections to choice-based conjoint hasbeen that the method enables us to estimate utilities only inthe aggregate, averaged across all respondents in the survey.The HB method, in contrast, allows us to make “individual-level” estimates of utilities for each feature level. This seem-ingly overcomes this objection, making choice-based conjointmore comparable to ratings or ranking-based methods of anal-ysis. However, the individual level utilities estimated by OLSregression for ratings and rank data are not particularly reli-able, given the small number of observations for any singleindividual. A principal benefit of HB estimation lies in equa-tion 3, which bridges analysis among respondents in the sam-ple and increases the information used to make inferenceabout each respondent’s utilities.

The predictive superiority of HB methods stems from thefreedom afforded by MCMC to specify more realistic modelsand the ability to conduct disaggregate analysis. Models ofheterogeneity were previously limited to the use of demo-graphic covariates to explain differences or the use of finitemixture models, such as latent class models. Neither approachis realistic—demographic variables are too broad in scope tobe related to attributes in a specific product category, and theassumption that heterogeneity is well-approximated by a smallnumber of customer segments or “latent classes” is more ahope than a reality. Much of the predictive superiority of HBmethods is due to avoiding the restrictive analytic assumptionsthat alternative methods impose.

Solving Marketing ProblemsThe best way to illustrate the power of Bayesian methods to

solve marketing problems is with a concrete example. Mostpractitioners of conjoint methods have observed, at least anec-dotally, that consumers may reject some feature levels out-right. Put differently, these feature levels do not enter into theconsumer’s consideration set. For example, some consumersmay simply reject all choices that exceed a certain price thresh-old. The basic conjoint model assumes, however, that con-sumers have at least some minimal utility for all feature levelsand, moreover, that they apply a “compensatory rule” in mak-ing choices or evaluating alternatives. The compensatory ruleimplies that consumers will consciously trade off a better levelof one feature in order to get a more desirable level of a secondfeature. In reality, consumers may use non-compensatory rulesin making their choices. One such rule is the “conjunctive” or“and” rule, which requires that an alternative have certain fea-

tures in order to be considered.Practitioners have tried different ad hoc or heuristic

approaches for reducing the bias that departures from theassumption of a compensatory decision process might intro-duce into the estimation of conjoint utilities. One approachinvolves asking respondents to eliminate feature levels that areunacceptable to them. These features may then be eliminatedfrom the conjoint experiment for that respondent (possibleonly in ACA or adaptive conjoint) or the information might beused in estimating the utility values, such that the unaccept-able levels are assigned a large negative utility.

The real problem for the modeler is that, while the choicebehavior is observed, the formation of the consideration set orthe decision rule is not. The question becomes, can we inferthe consideration set and the decision rule from the observedchoices? Happily, the answer is “yes.” Tim Gilbride, a gradu-ate student at Ohio State University, working with GregAllenby (one of the authors of this article), has developed amodeling approach that uses Bayesian estimation to simulta-neously estimate the attribute-level utilities and test for differ-ent consideration sets and decision rules. Using a choice-basedconjoint study for digital cameras, Gilbride estimated a varietyof models, including a “standard” HB choice model, a modelthat assumed a conjunctive (“and”) rule, a model thatassumed a disjunctive (“or”) rule, and a model that allowedfor heterogeneity in the decision process (some consumers usecompensatory rules, some use conjunctive rules, etc.). The keyfinding is that most consumers, at least in this category, appearto use a conjunctive rule. Moreover, the model makes it possi-ble to identify the frequency with which respondents screen oneach of the attributes. Exhibit 1 shows the results of this anal-ysis for the digital camera study.

The ChallengesFreedom from computational constraints allows researchers

and practitioners to develop more realistic models of buyerbehavior and decision making. Moreover, this freedom enablesexploration of marketing problems that have proven elusiveover the years, such as models for advertising ROI, sales forceeffectiveness, and similarly complex problems that ofteninvolve endogeneity. This occurs when the dependent variable

24 Summer 2004

Exhibit 1 Respondents screening on camera attributes

Body style Mid-roll Annotation Annotation Annotation Annotation Op. Zoom Viewfinder Settings Pricechange 1 2 3 4 feedback

45%

40%

35%

30%

25%

20%

15%

10%

5%

0%

8%

18%

6% 4% 4% 4% 4%

41%40%

5%

22%

USER
Resaltar
Page 6: Ba Yes Marketing

has some impact on the independent variable as, for example,when advertising budgets are established as a percent of theprevious year’s sales. The promise of Bayesian statistical meth-ods lies in the ability to deal with these complex problems, butthe very complexity of the problems creates a significant chal-lenge to both researchers and practitioners.

In most cases, there exists no off-the shelf programs for esti-mating complex HB models. Sawtooth Software has createdtwo programs for estimating HB models, one for choice-basedconjoint and one for OLS regression. However, these programsare limited to modeling relatively simple problems. In fact,these programs are really HB extensions to “standard” choice-based conjoint and regression models. While the availability ofthese programs from Sawtooth Software has been a majorimpetus behind the adoption of HB estimation for choice-basedconjoint, our ability to estimate more complex models is lim-ited by the need to customize programs for each new model.For researchers who need to estimate other types of models,WINBUGS and MCMCPack (for the R/S statistical computingenvironments) offer a means of implementing HB methods.

Practitioners, in particular, must also adjust to some of thedifferences they will encounter in using HB methods. As oneexample, HB estimation methods do not “converge” on a

closed-form solution in the way that many of our classical esti-mation methods, such as multinomial logit, do. Practitionerswill need to become comfortable with the fact that, once thevariance in the estimation stabilizes after several thousand“burn-in” iterations, there will still be considerable variationin the average parameter estimates. Another important differ-ence with HB methods is that, instead of a point estimate ofvalues for each respondent, we usually end up with a distribu-tion of estimates for each respondent. While this is powerful interms of understanding uncertainty, it adds to the complexityof the analysis, particularly in the case of market simulation.

Applications and OpportunitiesHB models are revolutionizing research in marketing.

Significant strides have been made in quantifying many aspectsof behavior, including sources of measurement error (such ascarry-over effects and scale usage), volumetric forecasts, andsatiation. Researchers are exploring how individuals differ andhow variables capable of reflecting the motivations presentwhen they interact with their environment and seek helpthrough the goods and services available for sale. Finally, HBmodels have revolutionized the use of spreadsheet simulatorsthat explore marketplace scenarios using the individual-levelestimates of utility previously described. While our exampleshave focused on survey-based data, Bayesian methods havebeen applied to a wide variety of marketing data, includingtransaction data and Web site “click through” data.

Bayesian models give researchers in marketing the freedomto study the complexities of human behavior in a more realis-tic fashion than was previously possible. Human behavior isextremely complex. Unfortunately, many of the models andvariables used in our analysis are not. Consider, for example,the linear model in equation 2. While this model has been theworkhorse of much statistical analysis, particularly in conjointanalysis, it does not provide a true representation of howrespondents encode, judge, and report on items in a question-naire. The future holds many opportunities to look behind theresponses in surveys to gain better insight into how individualsmight act in the marketplace. ●

Additional ReadingGilbride, Timothy J. and Greg M. Allenby (2004), “A ChoiceModel with Conjunctive, Disjunctive, and CompensatoryScreening Rules,” Marketing Science, forthcoming.

Rossi, Peter E. and Greg M. Allenby (2003), “BayesianStatistics and Marketing,” Marketing Science, 22, 304-328.

Greg M. Allenby is the Helen C. Kurtz chair in marketing atthe Fisher College of Business, Ohio State University. He maybe reached at [email protected]. David G. Bakken is seniorvice president, marketing science, at Harris Interactive. Hemay be reached at [email protected]. Peter E.Rossi is Joseph T. Lewis professor of marketing and statistics,graduate school of business, University of Chicago. He may bereached at [email protected].

marketing research 25

Bayes’ Theorem Example

A person working as a quality inspector on a production line is tasked withpicking out units that have a specific blemish. Testing has determined thatthis particular inspector correctly declares (D) 90% of the units that haveblemishes (B+), and never incorrectly declares an unblemished unit (B-). The inspector picks up a unit from the production line and declares it tobe blemished. What is the probability that this unit is in fact blemished? Wemight be inclined to guess that the probability is 90%, but we need to takeaccount of the fact that the inspector never makes a mistake if the unit isunblemished. We can use Bayes’ Theorem to calculate the odds that the unitin the inspector’s hand is in fact blemished. From Bayes’ Theorem we have:

Pr(B+| D) = Pr(D| B+) ¥ Pr(B+) / Pr(D)

Pr(B-| D) = Pr(D| B-) ¥ Pr(B-) / Pr(D)

Dividing the two expressions to remove Pr(D) leaves:

Pr(B+| D) =

Pr(D| B+) ¥ Pr(B+)

Pr(B-| D) Pr(D| B-) Pr(B-)

or “posterior odds = likelihood ratio ¥ prior odds.” If a blemish is present, theinspector is correct 90% of the time, so Pr(D|B+) = 0.90. When a blemish isnot present, the inspector never makes a mistake, so Pr(D|B-) = 0.00. Thismeans that, regardless of the prior odds that a blemish exists, Pr(B+)/Pr(B-).Once the inspector declares a blemish, the posterior odds are infinite or theprobability that a blemish exists, Pr(B+|D), is equal to one and the probabilitythat a blemish does not exist, Pr(B-|D) is zero. For the less extreme casewhere the inspector occasionally makes a mistake when a blemish is not present, Bayes’ Theorem accounts for all forms of uncertainty in deriving the posterior odds.