
CELEBRATING 30 YEARS

Vol. 30, No. 5, September–October 2011, pp. 801–819
ISSN 0732-2399 | EISSN 1526-548X | 11 | 3005 | 0801

http://dx.doi.org/10.1287/mksc.1110.0660
© 2011 INFORMS

Active Machine Learning for Consideration Heuristics

Daria Dzyabura, John R. Hauser
MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

{[email protected], [email protected]}

We develop and test an active-machine-learning method to select questions adaptively when consumers use heuristic decision rules. The method tailors priors to each consumer based on a “configurator.” Subsequent questions maximize information about the decision heuristics (minimize expected posterior entropy). To update posteriors after each question, we approximate the posterior with a variational distribution and use belief propagation (iterative loops of Bayes updating). The method runs sufficiently fast to select new queries in under a second and provides significantly and substantially more information per question than existing methods based on random, market-based, or orthogonal-design questions.

Synthetic data experiments demonstrate that adaptive questions provide close-to-optimal information and outperform existing methods even when there are response errors or “bad” priors. The basic algorithm focuses on conjunctive or disjunctive rules, but we demonstrate generalizations to more complex heuristics and to the use of previous-respondent data to improve consumer-specific priors. We illustrate the algorithm empirically in a Web-based survey conducted by an American automotive manufacturer to study vehicle consideration (872 respondents, 53 feature levels). Adaptive questions outperform market-based questions when estimating heuristic decision rules. Heuristic decision rules predict validation decisions better than compensatory rules.

Key words: active learning; adaptive questions; belief propagation; conjunctive models; consideration sets; consumer heuristics; decision heuristics; disjunctions of conjunctions; lexicographic models; variational Bayes estimation

History: Received: January 6, 2010; accepted: May 4, 2011; Eric Bradlow and then Preyas Desai served as the editor-in-chief and Fred Feinberg served as associate editor for this article. Published online in Articles in Advance July 15, 2011.

1. Problem Statement: Adaptive Questions to Identify Heuristic Decision Rules

We develop and test an active-machine-learning algorithm to identify heuristic decision rules. Specifically, we select questions adaptively based on prior beliefs and respondents’ answers to previous questions. To the best of our knowledge, this is the first (near-optimal) adaptive-question method focused on consumers’ noncompensatory decision heuristics. Extant adaptive methods focus on compensatory decision rules and are unlikely to explore the space of noncompensatory decision rules efficiently (e.g., Evgeniou et al. 2005; Toubia et al. 2007, 2004; Sawtooth 1996). In prior noncompensatory applications, question selection was almost always based on either random profiles or profiles chosen from an orthogonal design.

We focus on noncompensatory heuristics because of managerial and scientific interest. Scientific interest is well established. Experimental and revealed-decision-rule studies suggest that noncompensatory heuristics are common, if not dominant, when consumers face decisions involving many alternatives, many features, or if they are making consideration rather than purchase decisions (e.g., Gigerenzer and Goldstein 1996; Payne et al. 1988, 1993; Yee et al. 2007). Heuristic rules often represent a rational trade-off among decision costs and benefits and may be more robust under typical decision environments (e.g., Gigerenzer and Todd 1999). Managerial interest is growing as more firms focus product development and marketing efforts on getting consumers to consider their products or, equivalently, preventing consumers from rejecting products without evaluation. We provide illustrative examples in this paper, but published managerial examples include Japanese banks, global positioning systems, desktop computers, smart phones, and cellular phones (Ding et al. 2011, Liberali et al. 2011).

Our focus is on adaptive question selection, but to select questions adaptively, we need intermediate estimates after each answer and before the next question is asked. To avoid excessive delays in online questionnaires, intermediate estimates must be obtained in a second or less (e.g., Toubia et al. 2004). This is a difficult challenge when optimizing questions for noncompensatory heuristics because we must search over a discrete space of the order of 2^N decision rules, where N is the number of feature levels (called aspects, as in Tversky 1972). Without special structure, finding a best-fitting heuristic is much more difficult than finding best-fitting parameters for an (additive) compensatory model—such estimation algorithms typically require the order of N parameters. The ability to scale to large N is important in practice because consideration heuristics are common in product categories with large numbers of aspects (e.g., Payne et al. 1993). Our empirical application searches over 9.00 × 10^15 heuristic rules.

We propose an active-machine-learning solution (hereafter, active learning) to select questions adaptively to estimate noncompensatory heuristics. The active-learning algorithm approximates the posterior with a variational distribution and uses belief propagation to update the posterior distribution. It then asks the next question to minimize expected posterior entropy by anticipating the potential responses (in this case, to consider or not consider). The algorithm runs sufficiently fast to be implemented between questions in an online questionnaire.

In the absence of error, this algorithm comes extremely close to the theoretical limit of the information that can be obtained from binary responses. With response errors modeled, the algorithm does substantially and significantly better than extant question-selection methods. We also address looking ahead S steps, generalized heuristics, and the use of population data to improve priors. Synthetic data suggest that the proposed method recovers parameters with fewer questions than extant methods. Empirically, adaptive-question selection is significantly better at predicting future consideration than benchmark question selection. Noncompensatory estimation is also significantly better than the most commonly applied compensatory method.

We begin with a brief review and taxonomy of existing methods to select questions to identify consumer decision rules. We then review noncompensatory heuristics and motivate their managerial importance. Next, we present the algorithm, test parameter recovery with synthetic data, and describe an empirical illustration in the automobile market. We close with generalizations and managerial implications.

2. Existing Methods for Question Selection to Reveal Consumer Decision Rules

Marketing has a long tradition of methods to measure consumer decision rules. Figure 1 attempts a taxonomy that highlights the major trends and provides examples.

The vast majority of papers focus on compensatory decision rules. The most common methods include either self-explication, which asks respondents to self-state their decision rules, or conjoint analysis, which infers compensatory decision rules from questions in which respondents choose, rank, or rate bundles of aspects called product profiles. These methods are applied widely and have demonstrated both predictive accuracy and managerial relevance (e.g., Green 1984, Green and Srinivasan 1990, Wilkie and Pessemier 1973). In early applications, profiles were chosen from either full-factorial, fractional-factorial, or orthogonal designs, but as hierarchical-Bayes estimation became popular, many researchers moved to random designs to explore interactions better. For choice-based conjoint analysis, efficient designs are a function of the parameters of compensatory decision rules, and researchers developed “aggregate customization” to preselect questions using data from prestudies (e.g., Arora and Huber 2001). More recently, faced with impatient online respondents, researchers developed algorithms for adaptive conjoint questions based on compensatory models (e.g., Toubia et al. 2004). After data are collected adaptively, the likelihood principle enables the data to be reanalyzed with models using classical statistics, Bayesian statistics, or machine learning.

In some applications respondents are asked to self-state noncompensatory heuristics. Self-explication has had mixed success because respondents often chose profiles with aspects they had previously stated as unacceptable (e.g., Green et al. 1988). Recent experiments with incentive-compatible tasks, such as having respondents write an e-mail to a friend who will act as their agent, are promising (Ding et al. 2011).

Researchers have begun to propose methods to identify heuristic decision rules from directly measured consideration of product profiles. Finding the best-fit decision rule requires solving a discrete optimization problem that is NP-hard (e.g., Martignon and Hoffrage 2002). Existing estimation uses machine-learning methods such as greedy heuristics, greedoid dynamic programs, logical analysis of data, or linear-programming perturbation (Dieckmann et al. 2009, Hauser et al. 2010, Kohli and Jedidi 2007, Yee et al. 2007). Even for approximate solutions, runtimes are exponential in the number of aspects, limiting methods to moderate numbers of aspects. Bayesian methods have been used to estimate parameters for moderate numbers of aspects (e.g., Gilbride and Allenby 2004, 2006; Hauser et al. 2010; Liu and Arora 2011). To date, profiles for direct consideration measures are chosen randomly or from an orthogonal design.

Figure 1 Taxonomy of Existing Methods to Select Questions to Identify Consumer Decision Rules

[Figure: a two-branch taxonomy of question selection (and example estimation) for consumer decision rules. Compensatory (additive) rules: self-stated (common since the 1970s), fixed (orthogonal, random, aggregate customization; with HB, latent class, machine learning, etc.), and adaptive (polyhedral, SVMs, ACA; with HB, latent class, analytic center, etc.), each for moderate or large numbers of aspects. Noncompensatory heuristics: self-stated (unacceptable aspects, but respondents typically overstate; incentive-compatible e-mail task; moderate numbers of aspects), fixed (orthogonal, random, or market-based; with HB, LAD, greedoid DPs, integer programs, greedy heuristics; moderate or large numbers of aspects), and adaptive (no prior applications are feasible for all heuristics; large numbers of aspects).]

Note. ACA, adaptive conjoint analysis; DP, dynamic program; HB, hierarchical Bayes; SVMs, support-vector machines.

Within this taxonomy, Figure 1 illustrates the focus of this paper (thick box)—adaptive questions for noncompensatory heuristics. We also develop an estimation method for noncompensatory heuristics that scales to large numbers of aspects, even when applied to extant question-selection methods (dotted box). We focus on questions that ask about consideration directly (consider or not). However, our methods apply to all data in which the consumer responds with a yes or no answer and might be extendable to choice-based data where more than one profile is shown at a time. We do not focus on methods where consideration is an unobserved construct inferred from choice data (e.g., Erdem and Swait 2004, van Nierop et al. 2010). There is one related adaptive method—the first stage of adaptive choice-based conjoint analysis (ACBC; Sawtooth 2008) that is based on rules of thumb to select approximately 28 profiles that are variations on a “bring-your-own” profile. Profiles are not chosen optimally, and noncompensatory heuristics are not estimated.

3. Noncompensatory Decision Heuristics

We classify decision heuristics as simple and more complex. The simple heuristics include conjunctive, disjunctive, lexicographic, take-the-best, and elimination by aspects. The more complex heuristics include subset conjunctive and disjunctions of conjunctions. The vast majority of scientific experiments have examined the simple heuristics, with conjunctive the most common (e.g., Gigerenzer and Selten 1999; Payne et al. 1988, 1993, and references therein). The study of more complex heuristics, which nest the simple heuristics, is relatively recent, but there is evidence that some consumers use the more complex forms (Jedidi and Kohli 2005, Hauser et al. 2010). Both simple and complex heuristics apply for consideration, choice, or other decisions and for a wide variety of product categories. For simplicity of exposition, we define the heuristics with respect to the consideration decision and illustrate the heuristics for automotive features.

3.1. Simple Heuristics

3.1.1. Conjunctive Heuristic. For some features consumers require acceptable (“must-have”) levels. For example, a consumer might only consider a sedan body type and only consider Toyota, Nissan, or Honda. Technically, for features not in the conjunction, such as engine type, all levels are acceptable.


3.1.2. Disjunctive Heuristic. If the product has “excitement” levels of a feature, the product is considered no matter what the levels of the other features are. For example, a consumer might consider all vehicles with a hybrid engine.

3.1.3. Take-the-Best. The consumer ranks products on a single most diagnostic feature and considers only those above some cutoff. For example, the consumer may find “brand” most diagnostic, rank products on brand, and consider only those with brands that are acceptable—say, Toyota, Nissan, and Honda.

3.1.4. Lexicographic (by Features). This heuristic is similar to take-the-best except the feature need not be most diagnostic. If products are tied on a feature level, then the consumer continues examining features lower in the lexicographic ordering until ties are broken. For example, the consumer might rank on brand, then body style, considering only Toyota, Nissan, and Honda and, among those brands, only sedans.

3.1.5. Elimination by Aspects. The consumer selects an aspect and eliminates all products with unacceptable levels, and then he or she selects another aspect and eliminates products with unacceptable levels on that aspect, continuing until only considered products are left. For example, the consumer may eliminate all but Toyota, Nissan, and Honda and all but sedans. Researchers have also examined acceptance by aspects and lexicographic by aspects, which generalize elimination by aspects in the obvious ways.

When the only data are consider versus not consider, it does not matter in which order the profiles were eliminated or accepted. Take-the-best, lexicographic (by features), elimination by aspects, acceptance by aspects, and lexicographic by aspects are indistinguishable from conjunctive heuristics. The rules predict differently when respondents are asked to rank data and differ in the underlying cognitive process, but they do not differ when predicting the observed consideration set. Disjunctive is a mirror image of conjunctive. Thus, any question-selection algorithm that optimizes questions to identify conjunctive heuristics can be applied (perhaps with a mirror image) to any of the simple heuristics.
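For intuition, the following sketch (our illustration with hypothetical feature names, not the paper's code) shows a conjunctive screen and an elimination-by-aspects pass yielding the same consideration set when only consider/not-consider outcomes are observed:

```python
def conjunctive_consider(profile, acceptable):
    """Consider iff every screened feature has an acceptable level.

    profile: dict feature -> level; acceptable: dict feature -> set of levels.
    Features absent from `acceptable` are unconstrained (all levels acceptable).
    """
    return all(profile[f] in acceptable[f] for f in acceptable)


def eliminate_by_aspects(profiles, unacceptable_aspects):
    """Sequentially drop profiles carrying each unacceptable (feature, level)."""
    survivors = list(profiles)
    for feat, level in unacceptable_aspects:
        survivors = [p for p in survivors if p[feat] != level]
    return survivors


cars = [
    {"brand": "Toyota", "body": "sedan"},
    {"brand": "Toyota", "body": "crossover"},
    {"brand": "Chevy", "body": "sedan"},
]
acceptable = {"brand": {"Toyota", "Nissan", "Honda"}, "body": {"sedan"}}
via_conjunctive = [c for c in cars if conjunctive_consider(c, acceptable)]
via_eba = eliminate_by_aspects(cars, [("brand", "Chevy"), ("body", "crossover")])
print(via_conjunctive == via_eba)  # -> True (both keep only the Toyota sedan)
```

The elimination order in `eliminate_by_aspects` can be permuted without changing the surviving set, which is exactly why the ordered heuristics collapse to conjunctive screening under consider/not-consider data.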

3.2. More Complex Heuristics

3.2.1. Subset Conjunctive. The consumer considers a product if F features have levels that are acceptable. The consumer does not require all features to have acceptable levels. For example, the consumer might have acceptable brands (Toyota, Honda, Nissan), acceptable body types (sedan), and acceptable engines (hybrid) but only require that two of the three features have levels that are acceptable.

3.2.2. Disjunctions of Conjunctions. The consumer might have two or more sets of acceptable aspects. For example, the consumer might consider [Toyota and Honda sedans] or [crossover body types with hybrid engines]. Disjunctions of conjunctions nests the subset conjunctive heuristic and all of the simple heuristics (for consideration). However, its generality is also a curse. Empirical applications require cognitive simplicity to avoid overfitting data.
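A disjunctions-of-conjunctions rule is a logical OR over conjunctive screens. A brief sketch (our illustration, with hypothetical feature names) of the [Toyota and Honda sedans] or [crossover body types with hybrid engines] example:

```python
def doc_consider(profile, conjunctions):
    """Disjunction of conjunctions: consider if ANY conjunction is satisfied.

    Each conjunction maps feature -> set of acceptable levels; features
    absent from a conjunction are unconstrained within that conjunction.
    """
    return any(
        all(profile[f] in ok for f, ok in conj.items())
        for conj in conjunctions
    )


rule = [
    {"brand": {"Toyota", "Honda"}, "body": {"sedan"}},   # [Toyota/Honda sedans]
    {"body": {"crossover"}, "engine": {"hybrid"}},       # or [hybrid crossovers]
]
print(doc_consider({"brand": "Honda", "body": "sedan", "engine": "gas"}, rule))      # -> True
print(doc_consider({"brand": "Chevy", "body": "crossover", "engine": "gas"}, rule))  # -> False
```

With a single conjunction the rule reduces to the conjunctive heuristic, which is the sense in which disjunctions of conjunctions nest the simpler forms.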

All of these decision heuristics are postulated as descriptions of how consumers make decisions. Heuristics are not, and need not be, tied to utility maximization. For example, it is perfectly reasonable for a consumer to screen out low-priced products because the consumer believes that he or she is unlikely to choose such a product if considered and, hence, does not believe that evaluating such a product is worth the time and effort. (Put another way, the consumer would purchase a fantastic product at a low price if he or she knew about the product but never finds out about the product because the consumer chose not to evaluate low-priced products. When search costs are considered, it may be rational for the consumer not to search the lower-priced product because the probability of finding an acceptable low-priced product is too low.)

In this paper we illustrate our question-selection algorithm with conjunctive decision rules (hence it applies to all simple heuristics). We later extend the algorithm to identify disjunctions-of-conjunctions heuristics (which nest subset conjunctive heuristics).

4. Managerial Relevance: Stylized Motivating Example

As a stylized example, suppose that automobiles can be described by four features with two levels each: Toyota or Chevy, sedan or crossover body type, hybrid or gasoline engine, and premium or basic trim levels, for a total of eight aspects. Suppose we are managing the Chevy brand that makes only sedans with gasoline engines and basic trim, and suppose it is easy to change trim levels but not the other features. If consumers are compensatory and their partworths are heterogeneous and not “too extreme,” we can get some consumers to consider our vehicle by offering sufficiently premium trim levels. It might be profitable to do so.

Suppose instead that a segment of consumers is conjunctive on [Toyota ∧ crossover]. (In our notation, ∧ is the logical “and”; ∨ is the logical “or.”) No amount of trim levels will attract these conjunctive consumers. They will not pay attention to Chevy advertising, visit the GM website, or travel to a Chevy dealer—they will never evaluate any Chevy sedans no matter how much we improve them. In another example, if a segment of consumers is conjunctive on [crossover ∧ hybrid], we will never get those consumers to evaluate our vehicles unless we offer a hybrid crossover vehicle, no matter how good we make our gasoline-engine sedan. Even with disjunctions of conjunctions, consumers who use [(sedan ∧ hybrid) ∨ (crossover ∧ gasoline engine)] will never consider our gasoline-engine sedan. In theory we might approximate noncompensatory heuristics with compensatory partworth decision rules (especially if we include interactions), but if there are many aspects, empirical approximations may not be accurate.

Many products just never make it because they are never considered; consumers never learn that the products have outstanding aspects that could compensate for the product’s lack of a conjunctive feature. Our empirical illustration is based in the automotive industry. Managers at high levels in the sponsoring organization believe that conjunctive screening was a major reason that the automotive manufacturer faced slow sales relative to other manufacturers. For example, they had evidence that more than half of the consumers in the United States would not even consider their brands. Estimates of noncompensatory heuristics are now important inputs to product-design and marketing decisions at that automotive manufacturer.

Noncompensatory heuristics can imply different managerial decisions. Hauser et al. (2010) illustrate how rebranding can improve the share of a common electronic device if consumers use compensatory models but not if consumers use noncompensatory models. Ding et al. (2011) illustrate that conjunctive rules and compensatory rules are correlated in the sense that feature levels with higher average partworth values also appear in more “must-have” rules. However, the noncompensatory models identify combinations of aspects that would not be considered even though their combined partworth values might be reasonable.

Figure 2 Example Configurator and Example Queries (Color in Original)

[Figure: (a) Example configurator: drop-down menus for brand, body type, engine cylinders, engine type, composite MPG, and price (e.g., Toyota, minivan, 6 cylinders, gasoline, 20 MPG, $32,000) with a “Consider” button. (b) Example query: a profile described by brand, body type, engine cylinders, quality rating, crash test rating, composite MPG, engine type, and price, with “Consider”/“Would Not Consider” buttons, a “Next>>” button, and a “29 questions left” counter.]

5. Question Types, Error Structure, and Notation

5.1. Question Types and Illustrative Example
Figure 2 illustrates the basic question formats. The example is automobiles, but these types of questions have been used in a variety of product categories—usually durable goods, where consideration is easy to define and a salient concept to consumers (Dahan and Hauser 2002, Sawtooth 2008, Urban and Hauser 2004). Extensive pretests suggest that respondents can accurately “configure” a profile that they would consider (Figure 2(a)). If respondents use only one conjunctive rule in their heuristic, they find it difficult to accurately configure a second profile. If they use a disjunctions-of-conjunctions heuristic with sufficiently distinct conjunctions, such as [(Toyota ∧ sedan) ∨ (Chevy ∧ truck)], we believe they can configure a second profile “that is different from previous profiles that you said you will consider.” In this section we focus on the first configured profile and the corresponding conjunctive heuristic. In a later section, we address more conjunctions in a disjunctions-of-conjunctions heuristic.

After configuring a considered profile, we ask respondents whether or not they will consider various profiles (Figure 2(b)). Our goal is to select the profiles that provide the most information about decision heuristics (information is defined below). With synthetic data we plot cumulative information (parameter recovery) as a function of the number of questions. In our empirical test, we ask 29 queries, half of which are adaptive and half of which are chosen randomly (proportional to market share). We compare predictions based on the two types of questions. Although the number of questions was fixed in the empirical test, we address how stopping rules can be endogenous to the algorithm.

5.2. Notation, Error Structure, and Question-Selection Goal

Let M be the number of features (e.g., brand, body style, engine type, trim level; M = 4), and let N be the total number of aspects (e.g., Toyota, Chevy, sedan, crossover, hybrid, gasoline engine, low trim, high trim; N = 8). Let i index consumers and j index aspects. For each conjunction, consumer i’s decision rule is a vector, a_i, of length N, with elements a_ij such that a_ij = 1 if aspect j is acceptable and a_ij = −1 if it is not. For example, a_i = {+1, −1, +1, −1, +1, −1, +1, +1} would indicate that the ith consumer finds hybrid Toyota sedans with both low and high trim to be acceptable.

Each sequential query (Figure 2(b)), indexed by k, is a profile, x_ik, with N elements, x_ijk, such that x_ijk = 1 if i’s profile k has aspect j and x_ijk = 0 if it does not. Each x_ik has exactly M nonzero elements, one for each feature. (In our stylized example, a profile contains one brand, one body type, one engine type, and one trim level.) For example, x_ik = {1, 0, 1, 0, 1, 0, 1, 0} would be a hybrid Toyota sedan with low trim.

Let X_iK be the matrix of the first K profiles given to a consumer; each row corresponds to a profile. Mathematically, profile x_ik satisfies a conjunctive rule a_i if whenever x_ijk = 1, then a_ij = 1, such that every aspect of the profile is acceptable. In our eight-aspect example, consumer i finds the hybrid Toyota sedan with low trim to be acceptable (compare a_i to x_ik). This condition can be expressed as min_j {x_ijk a_ij} ≥ 0. It is violated only if a profile has at least one level (x_ijk = 1) that is unacceptable (a_ij = −1). Following Gilbride and Allenby (2004), we define a function to indicate when a profile is acceptable: I(x_ik, a_i) = 1 if min_j {x_ijk a_ij} ≥ 0, and I(x_ik, a_i) = 0 otherwise. We use the same coding for disjunctive rules but modify the definition of I(x_ik, a_i) to use max_j rather than min_j.
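In code, the acceptability indicator is straightforward. The following is a sketch in Python (our illustration, not the authors' implementation); the disjunctive variant assumes a profile is acceptable when at least one of its aspects is acceptable:

```python
import numpy as np

def indicator(x, a, rule="conjunctive"):
    """I(x, a): 1 if profile x is acceptable under decision rule a, else 0.

    x: 0/1 aspect vector of the profile; a: +1/-1 acceptability vector.
    """
    x, a = np.asarray(x), np.asarray(a)
    if rule == "conjunctive":
        # Acceptable iff every aspect in the profile is acceptable:
        # min_j {x_j * a_j} >= 0.
        return int((x * a).min() >= 0)
    # Disjunctive variant (our reading): at least one profile aspect is acceptable.
    return int(np.any((x == 1) & (a == 1)))

# Eight-aspect example: Toyota, Chevy, sedan, crossover, hybrid, gasoline, low trim, high trim.
a_i = [+1, -1, +1, -1, +1, -1, +1, +1]   # hybrid Toyota sedans, either trim
x_ik = [1, 0, 1, 0, 1, 0, 1, 0]          # hybrid Toyota sedan with low trim
print(indicator(x_ik, a_i))              # -> 1: the profile is acceptable
```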

Let y_ik be consumer i's answer to the kth query, where y_ik = 1 if the consumer says "consider" and y_ik = 0 otherwise. Let y_iK be the vector of the first K answers. If there were no response errors, we would observe y_ik = 1 if and only if I(x_ik, a_i) = 1. However, empirically, we expect response errors. Because the algorithm must run rapidly between queries, we choose a simple form for response error. Specifically, we assume that a consumer gives a false-positive answer with probability ε₁ and a false-negative answer with probability ε₂. For example, the ith consumer will say "consider (y_ik = 1)" with probability 1 − ε₂ whenever the indicator function implies "consider," but he or she will also say "consider" with probability ε₁ if the indicator function implies "not consider." This error structure implies the following data-generating model:

    Pr(y_ik = 1 | x_ik, a_i) = (1 − ε₂) I(x_ik, a_i) + ε₁ [1 − I(x_ik, a_i)],
    Pr(y_ik = 0 | x_ik, a_i) = ε₂ I(x_ik, a_i) + (1 − ε₁) [1 − I(x_ik, a_i)].   (1)
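The data-generating model in Equation (1) translates directly into code (again a sketch; the conjunctive indicator is inlined):

```python
import numpy as np

def response_prob(y, x, a, eps1, eps2):
    """Pr(y | x, a) from Equation (1); eps1 = false-positive rate,
    eps2 = false-negative rate (both set prior to data collection)."""
    I = int((np.asarray(x) * np.asarray(a)).min() >= 0)  # conjunctive I(x, a)
    p_consider = (1 - eps2) * I + eps1 * (1 - I)         # Pr(y = 1 | x, a)
    return p_consider if y == 1 else 1.0 - p_consider

a_i = [+1, -1, +1, -1, +1, -1, +1, +1]
x_ik = [1, 0, 1, 0, 1, 0, 1, 0]                           # acceptable profile
print(response_prob(1, x_ik, a_i, eps1=0.05, eps2=0.10))  # -> 0.9 (= 1 - eps2)
```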

Each new query x_{i,K+1} is based on our posterior beliefs about the decision rules (a_i). After the Kth query, we compute the posterior Pr(a_i | X_iK, y_iK) conditioned on the first K queries (X_iK), the first K answers (y_iK), and the priors. (Posterior beliefs might also reflect information from other respondents; see §6.6.) We seek to select the x_ik's to get as much information as feasible about a_i or, equivalently, to reduce uncertainty about a_i by the greatest amount. In §6.2 we define "information" and describe how we optimize it.

5.3. Error Magnitudes Are Set Prior to Data Collection

We cannot know the error magnitudes until after data are collected, but we must set the ε's in order to collect data. Setting the ε's is analogous to setting "accuracy" parameters in aggregate customization. We address this conundrum in two ways: (1) we treat the ε's as "tuning" parameters and explore the sensitivity to these tuning parameters with synthetic data. Setting tuning parameters is common in machine-learning query selection. (2) For the empirical test, we rely on managerial judgment (Little 2004a, b). Because the tuning parameters are set by managerial judgment prior to data collection, our empirical test is conservative in the sense that predictions might improve if future research allows updating of the error magnitudes within or across respondents.

To aid intuition we motivate the ε's with an illustrative microanalysis of our stylized example. Suppose that a respondent's conjunctive heuristic is [Toyota ∧ crossover]. This respondent should find a crossover Toyota acceptable and not care about the engine and trim. Coding each aspect as acceptable or not, and preserving the order Toyota, Chevy, sedan, crossover, hybrid, gasoline, premium trim, and basic trim, this heuristic becomes a_i = [+1, −1, −1, +1, +1, +1, +1, +1]. Suppose that when this respondent makes a consideration decision, he or she makes errors with probability δ on each aspect, where an error involves flipping that aspect's acceptability. For example, suppose he or she is shown a Toyota crossover with a hybrid engine and premium trim; that is, x_ik = [1, 0, 0, 1, 1, 0, 1, 0]. He or she matches the heuristic to the profile aspect by aspect, making errors with probability δ for each acceptable aspect in the profile; e.g., Toyota is acceptable per the heuristic but may be mistaken for unacceptable with probability δ. The respondent can make a false-negative error if any of the four aspects in the profile are mistaken for unacceptable ones. If these errors occur independently, the respondent will make a false-negative error with probability ε₂ = 1 − (1 − δ)⁴. If a profile is unacceptable, say, x_ik = [0, 1, 1, 0, 1, 0, 1, 0], we easily compute ε₁ = δ²(1 − δ)².

In this illustration, any prior belief on the distribution of the heuristics and profiles implies expected ε's as a function of the δ's. Whether one specifies the δ's and derives expected ε's or specifies the ε's directly depends on the researchers and managers, but in either case, the tuning parameters are specified prior to data collection. With synthetic data we found no indication that one specification is preferred to the other. Empirically, we found it easier to think about the ε's directly.
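For a concrete sense of the mapping, with an illustrative δ = 0.05 (our choice, not a value from the paper) the two error rates in the microanalysis work out as:

```python
delta = 0.05  # illustrative aspect-level error rate (our assumption)

# Acceptable profile, four aspects: a false negative occurs if any
# of the four acceptable aspects is mistakenly flipped.
eps2 = 1 - (1 - delta) ** 4

# Unacceptable profile with two unacceptable and two acceptable aspects:
# a false positive requires flipping both unacceptable aspects while
# leaving both acceptable aspects unflipped.
eps1 = delta ** 2 * (1 - delta) ** 2

print(round(eps2, 4))  # -> 0.1855
print(round(eps1, 6))  # -> 0.002256
```

Even a small aspect-level δ thus implies a nontrivial false-negative rate on acceptable profiles, while false positives remain rare.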

6. Adaptive Question Selection

To select questions adaptively, we use the following procedure:

Step 1. Initialize beliefs by generating consumer-specific priors.

Step 2. Select the next query based on current posterior beliefs.

Step 3. Update posterior beliefs from the priors and the responses to all the previous questions.

Step 4. Continue looping Steps 2 and 3 until Q questions are asked (or until another stopping rule is reached).
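Steps 1–4 amount to the following loop (a Python skeleton with hypothetical helper functions; all names are ours, not the paper's):

```python
def adaptive_survey(init_priors, select_query, ask, update_beliefs, Q):
    """Skeleton of the adaptive questioning loop (Steps 1-4)."""
    beliefs = init_priors()                          # Step 1: configurator-based priors
    history = []
    for _ in range(Q):                               # Step 4: fixed-Q stopping rule
        x = select_query(beliefs)                    # Step 2: min expected posterior entropy
        y = ask(x)                                   # respondent answers consider / not
        history.append((x, y))
        beliefs = update_beliefs(beliefs, history)   # Step 3: variational-Bayes update
    return beliefs
```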

6.1. Initialize Consumer-Specific Beliefs (Step 1)

Hauser and Wernerfelt (1990, p. 393) provide examples where self-stated consideration set sizes are one-tenth or less of the number of brands on the market. Our experience suggests these examples are typical. If the question-selection algorithm used noninformative priors, the initial queries would be close to random guesses, most of which would not be considered by the consumer. When a consumer considers a profile, we learn (subject to the errors) that all of its aspects are acceptable; when a consumer rejects a profile, we learn only that one or more aspects are unacceptable. Therefore, the first considered profile provides substantial information and a significant shift in beliefs. Without observing the first considered profile directly, queries are not efficient, particularly with large numbers of aspects (N). To address this issue, we ask each respondent to configure a considered profile and, hence, gain substantial information.

Prior research using compensatory rules (e.g., Toubia et al. 2004) suggests that adaptive questions are most efficient relative to random or orthogonal questions when consumers' heuristic decision rules are heterogeneous. We expect similar results for noncompensatory heuristics. In the presence of heterogeneity, the initial configured profile enables us to tailor prior beliefs to each respondent.

For example, in our empirical application we tailor prior beliefs using the co-occurrence of brands in consideration sets. Such data are readily available in the automotive industry and for frequently purchased consumer goods. Alternatively, prior beliefs might be updated on the fly using a collaborative filter on prior respondents (see §6.6). Without loss of generality, let j = 1 index the brand aspect the respondent configures, and for other brand aspects, let b_1j be the prior probability that brand j is acceptable when brand 1 is acceptable. Let x_i1 be the configured profile, and set y_i1 = 1. When co-occurrence data are available, prior beliefs on the marginal probabilities are set such that Pr(a_i1 = 1 | x_i1, y_i1) = 1 and Pr(a_ij = 1 | x_i1, y_i1, priors) = b_1j for j ≠ 1.

Even without co-occurrence data, we can set respondent-specific priors for every aspect on which we have strong prior beliefs. We use weakly informative priors for all other aspects. When managers have priors across features (e.g., considered hybrids are more likely to be Toyotas), we also incorporate those priors (Little 2004a, b).

6.2. Select the Next Question Based on Posterior Beliefs from Prior Answers (Step 2)

The respondent's answer to the configurator provides the first of a series of estimates of his or her decision rule, p_ij1 = Pr(a_ij = 1 | X_i1 = x_i1, y_i1) for all aspects j. (We have suppressed the notation for "priors.") We update these probabilities by iterating through Steps 2 and 3, computing updated estimates after each question–answer pair using all data collected up to and including the Kth question, p_ijK = Pr(a_ij = 1 | X_iK, y_iK) for K > 1. (Details are in Step 3; see §6.3.) To select the (K + 1)st query (Step 2), assume we have computed posterior values (p_ijK) from prior queries (up to K) and that we can compute contingent values (p_{ij,K+1}) one step ahead for any potential new query (x_{i,K+1}) and its corresponding answer (y_{i,K+1}). We seek those questions that tell us as much as feasible about the respondent's decision heuristic. Equivalently, we seek to reduce uncertainty about a_i by the greatest amount.

Following Lindley (1956) we define the most informative question as the query that minimizes a loss function. In this paper, we use Shannon's entropy as the uncertainty measure (Shannon 1948), but other measures of uncertainty could be used without otherwise changing the algorithm. Shannon's entropy, measured in bits, quantifies the amount of information that is missing because the value of a random variable is not known for certain. Higher entropy corresponds to more uncertainty. Zero entropy corresponds to perfect knowledge. Shannon's entropy (hereafter, entropy) is used widely in machine learning, has proven robust in many situations, and is the basis of criteria used to evaluate parameter recovery and predictive ability (U² and Kullback–Leibler 1951 divergence). We leave to future implementations other loss functions such as Rényi (1961) entropy, surprisals, and other measures of information.¹ Mathematically,

    H_{a_i} = −∑_{j=1}^{N} [p_ijK log₂ p_ijK + (1 − p_ijK) log₂(1 − p_ijK)].   (2)

If some aspects are more important to managerial strategy, we use a weighted sum in Equation (2).

To select the (K + 1)st query, x_{i,K+1}, we enumerate candidate queries, anticipating the answer to the question, y_{i,K+1}, and anticipating how that answer updates our posterior beliefs about the respondent's heuristic. Using the p_ijK's we compute the probability the respondent will consider the profile, q_{i,K+1}(x_{i,K+1}) = Pr(y_{i,K+1} = 1 | X_iK, y_iK, x_{i,K+1}). Using the Step 3 algorithm (described in the next subsection), we update the posterior p_{ij,K+1}'s for all potential queries and answers. Let p⁺_{ij,K+1}(x_{i,K+1}) = Pr(a_ij = 1 | X_iK, y_iK, x_{i,K+1}, y_{i,K+1} = 1) be the posterior beliefs if we ask profile x_{i,K+1} and the respondent considers it. Let p⁻_{ij,K+1}(x_{i,K+1}) = Pr(a_ij = 1 | X_iK, y_iK, x_{i,K+1}, y_{i,K+1} = 0) be the posterior beliefs if the respondent does not consider the profile. Then the expected posterior entropy (suppressing the argument x_{i,K+1} of q, p⁺, and p⁻) is

    E[H_{a_i}(x_{i,K+1}) | X_iK, y_iK]
      = −q_{i,K+1} ∑_j {p⁺_{ij,K+1} log₂ p⁺_{ij,K+1} + (1 − p⁺_{ij,K+1}) log₂(1 − p⁺_{ij,K+1})}
      − (1 − q_{i,K+1}) ∑_j {p⁻_{ij,K+1} log₂ p⁻_{ij,K+1} + (1 − p⁻_{ij,K+1}) log₂(1 − p⁻_{ij,K+1})}.   (3)

When the number of feasible profiles is moderate, we compute Equation (3) for every profile and choose the profile that minimizes Equation (3). However, in large designs such as the 53-aspect design in our empirical example, the number of potential queries (357,210) can be quite large. Because this large number of computations cannot be completed in less than a second, we focus our search using uncertainty sampling (e.g., Lewis and Gale 1994). Specifically, we evaluate Equation (3) for the T queries about which we are most uncertain. "Most uncertain" is defined as q_{i,K+1}(x_{i,K+1}) ≈ 0.5. Profiles identified from among the T most uncertain profiles are approximately optimal and, in some cases, optimal (e.g., see Appendix A). Uncertainty sampling is similar to choice balance as used in both polyhedral methods and aggregate customization (e.g., Arora and Huber 2001, Toubia et al. 2004). Synthetic data tests demonstrate that with T sufficiently large, we achieve close-to-optimal expected posterior entropy. For our empirical application, setting T = 1,000 kept question selection under a second. As computing speeds improve, researchers can use a larger T.

¹ Rényi's entropy reduces to Shannon's entropy when Rényi's α = 1, the only value of α for which information on the a_ij's is separable. To use these measures of entropy, modify Equations (2) and (3) to reflect Rényi's α.
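Under the independent-aspect approximation introduced in §6.3, the one-step-ahead selection with uncertainty sampling can be sketched as follows. The single-query posterior update here stands in for the full Step 3 algorithm, and all function names are ours:

```python
import numpy as np

def H(p):
    """Entropy (bits) of independent aspect probabilities, as in Equation (2)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return float(-np.sum(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def posterior(p, S, y, eps1, eps2):
    """One-query Bayes update of independent aspect beliefs p.
    S: indices of the aspects in the profile; y: 1 = consider, 0 = not."""
    post = np.asarray(p, dtype=float).copy()
    for j in S:
        others = np.prod([p[k] for k in S if k != j])
        l1 = (1 - eps2) * others + eps1 * (1 - others)  # Pr(y = 1 | a_j = +1)
        l0 = eps1                                        # Pr(y = 1 | a_j = -1)
        if y == 0:
            l1, l0 = 1 - l1, 1 - l0
        post[j] = p[j] * l1 / (p[j] * l1 + (1 - p[j]) * l0)
    return post

def select_query(p, candidates, eps1, eps2, T=1000):
    """Evaluate Equation (3) only for the T candidates whose consideration
    probability q is closest to 0.5 (uncertainty sampling); return the best."""
    p = np.asarray(p, dtype=float)
    prod = np.array([np.prod([p[j] for j in S]) for S in candidates])
    q = (1 - eps2) * prod + eps1 * (1 - prod)
    short_list = np.argsort(np.abs(q - 0.5))[:T]
    best, best_eh = None, np.inf
    for c in short_list:
        S = candidates[c]
        eh = (q[c] * H(posterior(p, S, 1, eps1, eps2))
              + (1 - q[c]) * H(posterior(p, S, 0, eps1, eps2)))
        if eh < best_eh:
            best, best_eh = S, eh
    return best, best_eh
```

With uninformative beliefs, every candidate reduces expected entropy below the current H(p); the selection simply picks the largest expected reduction.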

Equation (3) is myopic because it computes expected posterior entropy one step ahead. Extending the algorithm S steps ahead is feasible for small N. However, S-step computations are exponential in S. For example, if there are 256 potential queries, a two-step-ahead algorithm requires that we evaluate 256² = 65,536 potential queries (without further approximations). Fortunately, synthetic data experiments suggest that one-step-ahead computations achieve close to the theoretical maximum information of one bit per query (when there are no response errors) and do quite well when there are response errors. For completeness we coded a two-step-ahead algorithm in the case of 256 potential queries. Even for modest problems, its running time was excessive (over 13 minutes between questions); it provided negligible improvements in parameter recovery. Our empirical application has over a thousand times as many potential queries; a two-step-ahead algorithm was not computationally feasible.

6.3. Update Beliefs About Heuristic Rules Based on Answers to the K Questions (Step 3)

In Step 3 we use Bayes theorem to update our beliefs after the Kth query:

    Pr(a_i | X_iK, y_iK) ∝ Pr(y_iK | x_iK, a_i = a) Pr(a = a_i | X_{i,K−1}, y_{i,K−1}).   (4)

The likelihood term, Pr(y_iK | x_iK, a_i = a), comes from the data-generating model in Equation (1). The variable of interest, a_i, is defined over all binary vectors of length N. Because the number of potential conjunctions is exponential in N, updating is not computationally feasible without further structure on the distribution of conjunctions. For example, with N = 53 in our empirical example, we would need to update the distribution for 9.0 × 10¹⁵ potential conjunctions.

To gain insight for a feasible algorithm, we examine solutions to related problems. Gilbride and Allenby (2004) use a "Griddy–Gibbs" algorithm to sample threshold levels for features. At the consumer level, the thresholds are drawn from a multinomial distribution. The Griddy–Gibbs uses a grid approximation to the (often univariate) conditional posterior. We cannot modify their solution directly, in part because most of our features are horizontal (e.g., brand) and thresholds do not apply. Even for vertical features, such as price, we want to allow non-threshold heuristics. We need algorithms that let us classify each level as acceptable or not.

For a feasible algorithm, we use a variational Bayes approach. In variational Bayes inference, a complex posterior distribution is approximated with a variational distribution chosen from a family of distributions judged similar to the true posterior distribution. Ideally, the variational family can be evaluated quickly (Attias 1999, Ghahramani and Beal 2000). Even with an uncertainty-sampling approximation in Step 2, we must compute posterior distributions for 2T question–answer combinations and do so while the respondent waits for the next question.

As our variational distribution, we approximate the distribution of a_i with N independent binomial distributions. This variational distribution has N parameters, the p_ij's, rather than parameters for the 2^N potential values of a_i. Because this variational approximation is within a consumer, we place no restriction on the empirical population distribution of the a_ij's. Intercorrelation at the population level is likely (and allowed) among aspect probabilities. For example, we might find that those automotive consumers who screen on Toyota also screen on hybrid engines. In another application we might find that those cellular phone consumers who screen on Nokia also screen on "flip." For every respondent the posterior values of all p_ijK's depend on all of the data from that respondent, not just queries that involve the jth aspect.

To calculate posteriors for the variational distribution, we use a version of belief propagation (Yedidia et al. 2003, Ghahramani and Beal 2001). The algorithm converges iteratively to an estimate of p_iK. The hth iteration uses Bayes theorem to update each p^h_ijK based on the data and based on p^h_ij'K for all j' ≠ j. Within the hth iteration, the algorithm loops over aspects and queries using the data-generating model (Equation (1)) to compute the likelihood of observing y_k = 1 conditioned on the likelihood for k' ≠ k. It continues until the estimates of the p^h_ijK's stabilize. In our experience, the algorithm converges quickly: 95.6% of the estimations converge in 20 or fewer iterations, 99.2% in 40 or fewer iterations, and 99.7% in 60 or fewer iterations. Appendix B provides the pseudo-code.
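Appendix B gives the authors' pseudo-code; the following is our simplified sketch of the iterative loop for conjunctive rules, repeatedly applying Bayes theorem to each aspect probability conditioned on the current estimates for the other aspects:

```python
import numpy as np

def update_beliefs(prior, profiles, answers, eps1, eps2, max_iter=60, tol=1e-6):
    """Iterate Bayes updates of p_j = Pr(a_j = 1) until the estimates stabilize.
    profiles: list of aspect-index lists; answers: matching 0/1 responses."""
    prior = np.asarray(prior, dtype=float)
    p = prior.copy()
    for _ in range(max_iter):
        p_prev = p.copy()
        for j in range(len(p)):
            num, den = prior[j], 1.0 - prior[j]    # odds start at the prior
            for S, y in zip(profiles, answers):
                if j not in S:
                    continue                       # conjunctive I(x, a) ignores absent aspects
                others = np.prod([p[k] for k in S if k != j])
                l1 = (1 - eps2) * others + eps1 * (1 - others)  # Pr(y=1 | a_j = +1)
                l0 = eps1                                        # Pr(y=1 | a_j = -1)
                num *= l1 if y == 1 else 1 - l1
                den *= l0 if y == 1 else 1 - l0
            p[j] = num / (num + den)
        if np.max(np.abs(p - p_prev)) < tol:       # estimates stabilized
            break
    return p

p = update_beliefs([0.5] * 4, profiles=[[0, 1], [2, 3]], answers=[1, 0],
                   eps1=0.05, eps2=0.05)
# A considered profile {0, 1} pushes p_0 and p_1 up; a rejected {2, 3} pushes p_2, p_3 down.
```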

Although variational distributions work well in a variety of applications, there is no guarantee for our application. Performance is an empirical question that we address in §§7 and 8. Finally, we note that the belief propagation algorithm and Equation (4) appear to depend explicitly only on the questions that are answered by consumer i. However, our notation has suppressed the dependence on prior beliefs. It is a simple matter to make prior beliefs dependent on the distribution of the a_i's, as estimated from previous respondents (see §6.6).

6.4. Stopping Rules (Step 4)

Adaptive-question selection algorithms for compensatory decision rules and fixed question-selection algorithms for compensatory or noncompensatory rules rely on a target number of questions chosen by prior experience or judgment. Such a stopping rule can be used with the adaptive question-selection algorithm proposed in this paper. For example, we stopped after Q = 29 questions in our empirical illustration.

However, expected posterior entropy minimization makes it feasible to select a stopping rule endogenously. One possibility is to stop questioning when the expected reduction in entropy drops below a threshold for two or more adaptive questions. Synthetic data provide some insight. In §7 we plot the information obtained about parameters as a function of the number of questions. In theory we might also gain insight from our empirical example. However, because our empirical example used only 29 questions for 53 aspects, for 99% of the respondents the adaptive-question selection algorithm would still have gained substantial information if the respondents had been asked a 30th question. We return to this issue in §11. Because we cannot redo our empirical example, we leave this and other stopping-rule extensions to future research.

6.5. Extension to Disjunctions of Conjunctions

Disjunctions-of-conjunctions heuristics nest both simple and complex heuristics. The extension to disjunctions of conjunctions is conceptually simple. After we reach a stopping rule, whether it be fixed a priori or endogenous, we simply restart the algorithm by asking a second configurator question but requiring an answer that is substantially different from the profiles that the respondent has already indicated he or she will consider. If the respondent cannot configure such a profile, we stop. Empirically, cognitive simplicity suggests that respondents use relatively few conjunctions (e.g., Gigerenzer and Goldstein 1996, Hauser et al. 2010, Martignon and Hoffrage 2002). Most consumers use one conjunction (Hauser et al. 2010). Hence the number of questions should remain within reason. We test this procedure on synthetic data and, to the extent that our data allow, empirically.

6.6. Using Data from Previous Respondents

We can use data from other respondents to improve priors for new respondents, but in doing so, we want to retain the advantage of consumer-specific priors. Collaborative filtering provides a feasible method (e.g., Breese et al. 1998). We base our collaborative filter on the consumer-specific data available from the configurator (Figure 2(a)).

Specifically, after a new respondent completes the configurator, we use collaboratively filtered data from previous respondents who configured similar profiles. For example, if an automotive consumer configures a Chevy, we search for previous respondents who configured a Chevy. For other brands we compute priors with a weighted average of the brand posteriors (p_ij's) from those respondents. (We weight previous respondents by predictive precision.) We do this for all configured features. As sample sizes increase, population data overwhelm even "bad" priors; performance will converge to performance based on accurate priors (assuming the collaborative filter is effective). We test finite-sample properties on synthetic data and, empirically, with an approximation based on the data we collected.
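A minimal sketch of such a filter (all function and argument names are ours; a real implementation would match on each configured feature and estimate the precision weights from predictive performance):

```python
import numpy as np

def collaborative_priors(configured_aspects, prev_configs, prev_posteriors,
                         precisions, n_aspects):
    """Priors for a new respondent: precision-weighted average of the aspect
    posteriors (p_ij) of previous respondents with overlapping configurators."""
    matches = [i for i, c in enumerate(prev_configs)
               if set(configured_aspects) & set(c)]
    if not matches:
        return np.full(n_aspects, 0.5)            # weakly informative fallback
    w = np.array([precisions[i] for i in matches], dtype=float)
    w /= w.sum()                                  # normalize the precision weights
    return w @ np.asarray(prev_posteriors, dtype=float)[matches]
```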

7. Synthetic Data Experiments

To evaluate the ability of the active-learning algorithm to recover known heuristic decision rules, we use synthetic respondents. To compare adaptive-question selection to established methods, we choose a synthetic decision task with sufficiently many aspects to challenge the algorithm but for which existing methods are feasible. With four features at four levels (16 aspects), there are 65,536 heuristic rules, a challenging problem for extant heuristic-rule estimation methods. An orthogonal design is 32 profiles and, hence, in the range of tasks in the empirical literature. We simulate respondents who answer any number of questions K ∈ [1, 256], where 256 profiles exhaust the feasible profiles. To evaluate the question-selection methods, we randomly select 1,000 heuristic rules (synthetic respondents). For each aspect we draw a Bernoulli probability from a Beta(1, 1) distribution (uniform distribution) and draw a +1 or −1 using the Bernoulli probability. This "sample size" is on the high side of what we might expect in an empirical study and provides sufficient heterogeneity in heuristic rules.
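The draw described above is straightforward to reproduce (sketch; the seed is our choice, for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, our assumption

def draw_heuristics(n_respondents=1000, n_aspects=16):
    """For each aspect, draw a Bernoulli probability from Beta(1, 1) (i.e.,
    uniform), then draw a_ij = +1 or -1 using that probability."""
    probs = rng.beta(1.0, 1.0, size=(n_respondents, n_aspects))
    return np.where(rng.random((n_respondents, n_aspects)) < probs, 1, -1)

rules = draw_heuristics()
print(rules.shape)                         # -> (1000, 16)
print(sorted(np.unique(rules).tolist()))   # -> [-1, 1]
```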

For each decision heuristic, a_i, we use either the proposed algorithm or an established method to select questions. The synthetic respondent then "answers" the questions using the decision heuristic, but with response errors ε₁ and ε₂ chosen as if generated by reasonable δ's. To compare question-selection methods, we keep the estimation method constant. We use the variational Bayes belief-propagation method developed in this paper. The benchmark question-selection methods are orthogonal, random, and market based. Market-based questions are chosen randomly but in proportion to profile shares we might expect in the market; market shares are known for synthetic data.

With synthetic data we know the parameters a_ij. For any K and for all i and j, we use the "observed" synthetic data to update the probability p_ijK that a_ij = 1. An appropriate information-theoretic measure of parameter recovery is U², which quantifies the percentage of uncertainty explained (empirical information/initial entropy; Hauser 1978); U² = 100% indicates perfect parameter recovery.
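A sketch of such a measure follows; this is our reading of U² as empirical information divided by initial entropy, and Hauser (1978) gives the formal definition, which may differ in details:

```python
import numpy as np

def u_squared(p_est, a_true):
    """Percent uncertainty explained about one respondent's a_ij."""
    p = np.clip(np.asarray(p_est, dtype=float), 1e-12, 1 - 1e-12)
    a = np.asarray(a_true)
    likelihood = np.where(a == 1, p, 1 - p)   # Pr(true a_j | current beliefs)
    h_init = float(len(p))                    # 1 bit per aspect under p = 0.5 priors
    empirical_info = h_init + np.sum(np.log2(likelihood))
    return 100.0 * empirical_info / h_init

print(round(u_squared([0.5, 0.5], [1, -1]), 6))  # -> 0.0: no information
print(round(u_squared([1.0, 0.0], [1, -1]), 6))  # -> 100.0: perfect recovery
```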

We begin with synthetic data that contain no response errors. These data quantify potential maximum gains with adaptive questions, test how rapidly active-learning questions recover parameters perfectly, and bound improvements that would be possible with nonmyopic S-step-ahead algorithms. We then repeat the experiments with error-laden synthetic data and with "bad" priors. Finally, we examine whether we can recover disjunctions-of-conjunctions heuristics and whether population-based priors improve predictions.

7.1. Tests of Upper Bounds on Parameter Recovery (No Response Errors)

Figure 3 presents key results. To simplify interpretation we plot random queries in Appendix D, rather than Figure 3, because the results are indistinguishable from market-based queries on the scale of Figure 3. Market-based queries do approximately 3% better than random queries for the first 16 queries, approximately 1% better for the first 32 queries, and approximately 0.5% better for all 256 queries. (Queries 129–256 are not shown in Figure 3; the random-query and the market-based query curves asymptote to 100%.) Orthogonal-design questions are only defined for K = 32.

Figure 3: Synthetic Data Experiments (Base Comparison, No Error): Percent Uncertainty Explained (U²) for Alternative Question-Selection Methods. [Line plot of U² (%) against the number of questions (0–128) for adaptive, market-based, and orthogonal-design questions.]

Questions selected adaptively by the active-learning algorithm find respondents' decision heuristics much more rapidly than existing question-selection methods. The adaptive questions come very close to an optimal reduction in posterior entropy. With 16 aspects and equally likely priors, the prior entropy is 16 log₂(2), which is 16 bits. The configurator reveals four acceptable aspects (four bits). Each subsequent query is a binary outcome that can reveal at most one bit. A perfect active-learning algorithm would require 12 additional queries to identify a decision rule (4 bits + 12 bits identifies the 16 elements of a_i). On average, in the absence of response error, the adaptive questions identify the respondent's decision heuristic in approximately 13 questions. The variational approximation and the one-step-ahead question-selection algorithm appear to achieve close-to-optimal information (12 bits in 13 questions).

We compare the relative improvement as a result of question-selection methods by holding information constant and examining how many questions it takes to achieve that level of parameter recovery. Because an orthogonal design is fixed at 32 questions, we use it as a benchmark. As illustrated in the first line of data in Table 1, an orthogonal design requires 32 queries to achieve a U² of approximately 76%. Market-based questions require 38 queries; random questions require 40 queries; and adaptive questions, only nine queries. To parse the configurator from the adaptive questions, Appendix D plots the U² obtained with a configurator plus market-based questions. The plot parallels the plot of purely market-based queries, requiring 30 queries to achieve a U² of approximately 76%. In summary, in an errorless world, the active-learning algorithm chooses adaptive questions that provide substantially more information per question than existing nonadaptive methods. The large improvements in U², even for small numbers of questions, suggest that adaptive questions are chosen to provide information efficiently.

7.2. Tests of Parameter Recovery When There Are Response Errors or "Bad" Priors

We now add either response error or "bad" priors and repeat the synthetic data experiments. The plots remain quasi-concave for a variety of levels of response error and/or bad priors.² We report representative values in Table 1. (Table 1 is based on false negatives occurring 5% of the time. False positives are set by the corresponding δ. Bad priors perturb "good" priors with bias drawn from U[0, 0.1].) Naturally, as we add errors or bad priors, the amount of information obtained per question decreases; for example, 13 adaptive questions achieved a U² of 100% without response errors but only 55.5% with response errors. On average, it takes 12.4 adaptive questions to obtain a U² of 50% (standard deviation 8.7). The last column of Table 1 reports the information obtained by 32 orthogonal questions. Adaptive questions obtain relatively more information per question than existing methods under all scenarios. Indeed, adaptive questions appear to be more robust to bad priors than existing question-selection methods.

² Although the plots in Figure 3 are concave, there is no guarantee that the plots remain concave for all situations. However, we do expect all plots to be quasi-concave, and they are.

7.3. Tests of the Ability to Recover Disjunctions-of-Conjunctions Heuristics

We now generate synthetic data for respondents who have two distinct conjunctions rather than just one conjunction. By distinct, we mean no overlap in the conjunctions. We allow both question-selection methods to allocate one-half of their questions to the first conjunction and one-half to the second conjunction. To make the comparison fair, all question-selection methods use data from the two configurators when estimating the parameters of the disjunctions-of-conjunctions heuristics. After 32 questions (plus two configurators), estimates based on adaptive questions achieve a U² of 80.0%, whereas random questions achieve a U² of only 34.5%. Adaptive questions also beat market-based and orthogonal-design questions handily.

This is an important result. With random questions, false positives from the second conjunction pollute the estimation of the parameters of the first conjunction, and vice versa. The active-learning algorithm focuses questions on one or the other conjunction to provide good recovery of the parameters of both conjunctions. We expect the two-conjunction results to extend readily to more than two conjunctions. Although this initial test is promising, future tests might improve the algorithm with endogenous stopping rules that allocate questions optimally among conjunctions.

7.4. Tests of Incorporating Data from Previous Respondents

To demonstrate the value of incorporating data from other respondents, we split the sample of synthetic respondents into two halves. For the first half of the sample, we use bad priors, ask questions adaptively, and estimate the a_i vectors. We use the estimated a_i vectors and a collaborative filter on two features to customize priors for the remaining respondents. We then ask questions of the remaining respondents using

INFORMS holds copyright to this article and distributed this copy as a courtesy to the author(s). Additional information, including rights and permission policies, is available at http://journals.informs.org/.


Dzyabura and Hauser: Active Machine Learning for Consideration Heuristics. Marketing Science 30(5), pp. 801–819, © 2011 INFORMS

Table 1  Synthetic Data Experiments: Number of Questions Necessary to Match Predictive Ability of 32 Orthogonal Questions

                     Adaptive    Random     Market-based   Orthogonal-design   Percent
                     questions   questions  questions      questions           uncertainty(a)
Base comparison          9          40          39               32               76.1
Error in answers        11          38          38               32               53.6
"Bad" priors             6          42          41               32               50.4

Note. Number of questions in addition to the configurator question.
(a) U² (percent uncertainty explained) when heuristics are estimated from 32 orthogonal questions; U² for other question-selection methods is approximately the same subject to integer constraints on the number of questions.

collaborative-filter-based priors. On average, U² is 17.8% larger for the second set of respondents (using collaborative-filter-based priors) than for the first set of respondents (not using collaborative-filter-based priors). Thus, even when we use bad priors for early respondents, the posteriors from those respondents are sufficient for the collaborative filter. The collaborative-filter-based priors improve U² for the remaining respondents.
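A minimal sketch of the previous-respondent mechanism described above. The data layout, the exact matching rule, and the feature names are our assumptions for illustration: earlier respondents' posteriors are averaged whenever they match the new respondent on the two observed features, and the default prior is used otherwise.

```python
def collaborative_priors(target_feats, earlier, default_prior):
    """Customize a new respondent's priors from earlier respondents (sketch).

    target_feats: the two features observed for the new respondent
        (hypothetical example: configured brand and body style).
    earlier: list of (feats, posterior) pairs from earlier respondents,
        where posterior is that respondent's estimated Pr(a_j = 1) vector.
    Averages the posteriors of earlier respondents who match on both
    features; falls back to default_prior when no one matches.
    """
    matches = [post for feats, post in earlier if feats == target_feats]
    if not matches:
        return list(default_prior)
    n = len(matches)
    return [sum(post[j] for post in matches) / n
            for j in range(len(default_prior))]
```

In the synthetic experiment above, the first half of the sample supplies the `earlier` pairs; the second half's question selection then starts from the customized priors rather than the bad ones.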

7.5. Summary of Synthetic Data Experiments

The synthetic data experiments suggest that

• adaptive question selection via active learning is feasible and can recover the parameters of known heuristic decision rules,

• adaptive question selection provides more information per question than existing methods,

• one-step-ahead active-learning adaptive questions achieve gains in information (reduction in entropy) that are close to the theoretical maximum when there are no response errors,

• adaptive question selection provides more information per question when there are response errors,

• adaptive question selection provides more information per question when there are badly chosen priors,

• it is feasible to extend adaptive-question selection to disjunctions-of-conjunctions heuristic decision rules, and

• incorporating data from other respondents improves parameter recovery.

These synthetic data experiments establish that if respondents use heuristic decision rules, then the active-learning algorithm provides a means to ask questions that provide substantially more information per question.

8. Illustrative Empirical Application with a Large Number of Aspects

In the spring of 2009, a large American automotive manufacturer (AAM) recognized that consideration of their vehicles was well below that of non-U.S. vehicles. Management was interested in exploring various means to increase consideration. As part of that effort,

AAM fielded a Web-based survey to 2,336 respondents recruited and balanced demographically from an automotive panel maintained by Harris Interactive, Inc. Respondents were screened to be 18 years of age and interested in purchasing a new vehicle in the next two years. Respondents received 300 Harris points (good for prizes) as compensation for completing a 40-minute survey. The response rate was 68.2%, and the completion rate was 94.9%.

The bulk of AAM's survey explored various marketing strategies that AAM might use to enhance consideration of their brands. The managerial test of communications strategies is tangential to the scope and focus of this paper, but we illustrate in §10 the types of insight provided by estimating consumers' noncompensatory heuristics.

Because AAM's managerial decisions depended on the accuracy with which they could evaluate their communications strategies, we were given the opportunity to test adaptive-question selection for a subset of the respondents. A subset of 872 respondents was not shown any communications inductions. Instead, after configuring a profile, evaluating 29 calibration profiles, and completing a memory-cleansing task (Frederick 2005), respondents evaluated a second set of 29 validation profiles. (A 30th profile in calibration and validation was used for other research purposes by AAM.) The profiles varied on 53 aspects: brand (21 aspects), body style (9 aspects), price (7 aspects), engine power (3 aspects), engine type (2 aspects), fuel efficiency (5 aspects), quality (3 aspects), and crash-test safety (3 aspects).

8.1. Adaptive Question Selection for Calibration Profiles

To test adaptive question selection, one-half of the calibration profiles were chosen adaptively by the active-learning algorithm. The other half were chosen randomly in proportion to market share from the top 50 best-selling vehicles in the United States. To avoid order effects and to introduce variation in the data, the question-selection methods were randomized. This probabilistic variation means that the number of queries of each type is 14.5, on average, but varies by respondent.
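The randomized interleaving above can be sketched as an independent coin flip per question; the function name and interface are our assumptions for illustration.

```python
import random

def assign_question_types(n_questions=29, p_adaptive=0.5, rng=None):
    """Randomly interleave the two question-selection methods (sketch).

    Each calibration question is independently assigned to the adaptive
    or the market-based method, so a respondent sees about
    n_questions * p_adaptive adaptive queries (14.5 on average here),
    with the exact count varying by respondent.
    """
    rng = rng or random.Random()
    return ["adaptive" if rng.random() < p_adaptive else "market-based"
            for _ in range(n_questions)]
```

Averaged over many respondents, the adaptive count concentrates at 14.5, while any single respondent may see a few more or a few fewer.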


As a benchmark we chose market-based queries rather than random queries. The market-based queries perform slightly better on synthetic data than purely random queries and, hence, provide a stronger test. We could not test an orthogonal design because 29 queries is but a small fraction of the 13,320 profiles in a 53-aspect orthogonal design. (A full factorial would require 357,210 profiles.) Furthermore, even if we were to complete an orthogonal design of 13,320 queries, Figure 2 suggests that orthogonal queries do only slightly better than random or market-based queries. Following Sándor and Wedel (2002) and Vriens et al. (2001), we split the market-based profiles (randomly) over respondents.

Besides enabling methodological comparisons, this mix of adaptive and market-based queries has practical advantages with human respondents. First, the market-based queries introduce variety to engage the respondent and help disguise the choice-balance nature of the active-learning algorithm. (Respondents get variety in the profiles they evaluate.) Second, market-based queries sample "far away" from the adaptive queries chosen by the active-learning algorithm. They might prevent the algorithm from getting stuck in a local maximum (an analogy to simulated annealing).

8.2. Selecting Priors for the Empirical Application

AAM had co-occurrence data available from prior research, so we set priors as described in §6.1. In addition, using AAM's data and managerial beliefs, we were able to set priors on some pairwise conjunctions such as "Porsche ∧ Kia" and "Porsche ∧ pick-up." Rather than setting these priors directly as correlations among the a_ij's, AAM's managers found it more intuitive to generate "pseudo-questions" in which the respondent was assumed to "not consider" a "Porsche ∧ pick-up" with probability q, where q was set by managerial judgment. In other applications researchers might set the priors directly.

8.3. Validation Profiles Used to Evaluate Predictive Ability

After a memory-cleansing task, respondents were shown a second set of 29 profiles, this time chosen by the market-based question-selection method. Because there was some overlap between the market-based validation and the market-based calibration profiles, we have an indicator of respondent reliability. Respondents consistently evaluated market-based profiles 90.5% of the time. Respondents are consistent, but not perfect, and, thus, modeling response error (via the ε's) appears to be appropriate.

8.4. Performance Measures

Although hit rate is an intuitive measure, it can mislead intuition for consideration data. If a respondent were to consider 20% of both calibration and validation profiles, then a null model that predicts "reject all profiles" will achieve a hit rate of 80%. Such a null model, however, provides no information, has a large number of false-negative predictions, and predicts a consideration set size of 0. On the other hand, a null model that predicts randomly proportional to the consideration set size in the calibration data would predict a larger validation consideration set size and balance false positives and false negatives, but it would achieve a lower hit rate (68%: 0.68 = (0.8)² + (0.2)²). Nonetheless, for interested readers, Appendix E provides hit rates.
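The random-null hit rate above follows from matching probabilities: the model agrees with a considered profile with probability equal to the consideration share, and with a rejected profile with probability equal to its complement. A one-line check:

```python
def random_null_hit_rate(share):
    """Hit rate of the random null model: predict 'consider' with the
    calibration consideration-set share, scored against validation data
    with the same share. Matches on considered profiles with probability
    share**2 and on rejected profiles with probability (1 - share)**2."""
    return share * share + (1 - share) * (1 - share)
```

With a 20% consideration share this gives 0.2² + 0.8² = 0.68, the 68% hit rate cited in the text.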

We expand evaluative criteria by examining false-positive and false-negative predictions. A manager might put more (or less) weight on not missing considered profiles than on predicting as considered profiles that are not considered. However, without knowing specific loss functions to weigh false positives and false negatives differently, we cannot have a single managerial criterion (e.g., Toubia and Hauser 2007). Fortunately, information theory provides a commonly used measure that balances false positives and false negatives: the Kullback–Leibler divergence (KL). KL is a nonsymmetric measure of the difference from a prediction model to a comparison model (Chaloner and Verdinelli 1995, Kullback and Leibler 1951). It discriminates among models even when the hit rates might otherwise be equal. Appendix C provides formulae for the KL measure appropriate to the data in this paper. We calculate divergence from perfect prediction; hence a smaller KL is better.

In synthetic data we knew the "true" decision rule and could compare the estimated parameters a_ij to known parameters. U² was the appropriate measure. With empirical data we do not know the true decision rule; we only observe the respondents' judgments about consider versus not consider; hence KL is an appropriate measure. However, both attempt to quantify the information explained by the estimated parameters (decision heuristics).

8.5. Key Empirical Results

Table 2 summarizes KL divergence for the two question-selection methods that we tested: adaptive questions and market-based questions. For each question type, we use two estimation methods: (1) the variational Bayes belief-propagation algorithm computes the posterior distribution of the noncompensatory heuristics, and (2) a hierarchical Bayes logit model (HB) computes the posterior distribution for a compensatory model. HB is the most used estimation method for additive utility models (Sawtooth 2004), and it has proven accurate for zero-versus-one consideration decisions (Ding et al. 2011, Hauser et al. 2010).


Table 2  Illustrative Empirical Application: KL Divergence for Question-Selection-and-Estimation Combinations (Where Smaller Is Better)

                                 Noncompensatory   Compensatory
                                 heuristics        decision model
Question-selection method
  Adaptive questions               0.475 (a,b,c)     0.537 (c)
  Market-based questions           0.512 (c)         0.512 (c,d)
Null models
  Consider all profiles            0.565
  Consider no profiles             0.565
  Randomly consider profiles       0.562

(a) Significantly better than market-based questions for noncompensatory heuristics (p < 0.001).
(b) Significantly better than compensatory decision model (p < 0.001).
(c) Significantly better than null models (p < 0.001).
(d) Significantly better than adaptive questions for compensatory decision model (p < 0.001).

The latter authors provide a full HB specification in Appendix C. Both estimation methods are based only on the calibration data. For comparison, Table 2 also reports predictions for null models that predict all profiles as considered, predict no profiles as considered, and predict profiles randomly based on the consideration set size among the calibration profiles.

When the estimation method assumes respondents use heuristic decision rules, rules estimated from adaptive questions predict significantly better than rules estimated from market-based queries. (Hit rates are also significantly better.) Furthermore, for adaptive questions, heuristic rules predict significantly better than HB-estimated additive rules. Although HB-estimated additive models nest lexicographic models (and hence conjunctive models for consideration data), the required ratio of partworths is approximately 10^15 and not realistic empirically. More likely, HB does less well because its assumed additive model with 53 parameters overfits the data, even with shrinkage to the population mean.

It is perhaps surprising that ∼14.5 adaptive questions do so well for 53 aspects. This is an empirical issue, but we speculate that the underlying reasons are (1) consumers use cognitively simple heuristics with relatively few aspects, (2) the adaptive questions search the space of decision rules efficiently to confirm the cognitively simple rules, (3) the configurator focuses this search quickly, and (4) consumer-specific priors keep the search focused.

There is an interesting, but not surprising, interaction effect in Table 2. If the estimation assumes an additive model, noncompensatory-focused adaptive questions do not do as well as market-based questions. Also, consistent with prior research using nonadaptive questions (e.g., Dieckmann et al. 2009, Kohli and Jedidi 2007, Yee et al. 2007), noncompensatory estimation is comparable to compensatory estimation using market-based questions. Perhaps to truly identify heuristics, we need heuristic-focused adaptive questions.

But are consumers compensatory or noncompensatory? The adaptive-question–noncompensatory-estimation combination is significantly better than all other combinations in Table 2. But what if we estimated both noncompensatory and compensatory models using all 29 questions (combining ∼14.5 adaptive questions and ∼14.5 market-based questions)? The noncompensatory model predicts significantly better than the compensatory model when all 29 questions are used (KL = 0.451 versus KL = 0.560, p < 0.001 using a paired t-test). Differences are also significant at p < 0.001 using a related-samples Wilcoxon signed-rank test. Because we may not know a priori whether the respondent is noncompensatory or compensatory, collecting data both ways gives us flexibility for post-data-collection reestimation. (In the automotive illustration, prior theory suggested that consumers were likely to use noncompensatory heuristics.)

8.6. Summary of Empirical Illustration

Adaptive questions to identify noncompensatory heuristics are promising. We appear able to select questions to provide significantly more information per query than market-based queries. Furthermore, it appears that questions are chosen efficiently because we can predict well with ∼14.5 questions, even in a complex product category with 53 aspects. This is an indication of cognitive simplicity. Finally, consumers appear to be noncompensatory.

9. Initial Tests of Generalizations: Disjunctions of Conjunctions and Population Priors

Although data were collected based on the conjunctive active-learning algorithm, we undertake exploratory empirical tests of two proposed generalizations: disjunctions of conjunctions and prior-respondent-based priors. These exploratory tests complement the theory in §§6.5 and 6.6 and the synthetic data tests in §§7.3 and 7.4.

9.1. Disjunctions of Conjunctions

AAM managers sought to focus on consumers' primary conjunctions because, in prior studies sponsored by AAM, 93% of the respondents used only one conjunction (Hauser et al. 2010). However, we might gain additional predictive power by searching for second and subsequent conjunctions using the methods of §6.5. Ideally, this requires new data, but we get an indicator by (1) estimating the best model with the data, (2) eliminating all calibration profiles that


were correctly classified with the first conjunction, and (3) using the remaining market-based profiles to search for a second conjunction. As expected, this strategy reduced false negatives because there were more conjunctions. It came at the expense of a slight increase in false positives. Overall, using all 29 questions, KL increased slightly (0.459 versus 0.452, p < 0.001), suggesting that the reestimation on incorrectly classified profiles overfit the data. Because the disjunctions-of-conjunctions (DOC) generalization works for synthetic data, a true test awaits new empirical data.

9.2. Previous-Respondent-Based Priors

The priors used to initialize consumer-specific beliefs were based on judgments by AAM managers and analysts; however, we might also use the methods proposed in §6.6 to improve priors based on data from other respondents. As a test, we used the basic algorithm to estimate the p_ijK's, used the collaborative filter to reset the priors for each respondent, reestimated the model (the p′_ijK's), and compared predicted consideration to observed consideration. Previous-respondent-based priors improved predictions but not significantly (0.448 versus 0.452, p = 0.082), suggesting that AAM provided good priors for this application.

10. Managerial Use

The validation reported in this paper was part of a much larger effort by AAM to identify communications strategies that would encourage consumers to consider AAM vehicles. At the time of the study,

Table 3  Percentage of Respondents Using Aspect as an Elimination Criterion

Brand (elimination %): BMW 68; Buick 97; Cadillac 86; Chevrolet 34; Chrysler 66; Dodge 60; Ford 23; GMC 95; Honda 14; Hyundai 89; Jeep 96; Kia 95; Lexus 86; Lincoln 98; Mazda 90; Nissan 14; Pontiac 97; Saturn 95; Subaru 99; Toyota 15; VW 86

Body type (elimination %): Sports car 84; Hatchback 81; Compact sedan 62; Standard sedan 58; Crossover 62; Small SUV 61; Full-size SUV 71; Pickup truck 82; Minivan 90

Quality (elimination %): Q-rating 5: 0; Q-rating 4: 1; Q-rating 3: 23

Crash test (elimination %): C-rating 5: 0; C-rating 4: 27; C-rating 3: 27

Engine type (elimination %): Gasoline 3; Hybrid 44

Engine power (elimination %): 4 cylinders 9; 6 cylinders 11; 8 cylinders 69

EPA rating (elimination %): 15 mpg 79; 20 mpg 42; 25 mpg 16; 30 mpg 5; 35 mpg 0

Price, $ (elimination %): 12,000: 77; 17,000: 54; 22,000: 46; 27,000: 48; 32,000: 61; 37,000: 71; 45,000: 87

two of the three American manufacturers had entered bankruptcy. AAM's top management believed that overcoming consumers' unwillingness to consider AAM vehicles was critical if AAM was to become profitable. Table 2, combined with ongoing studies by AAM, was deemed sufficient evidence for managers to rely on the algorithm to identify consumers' heuristic decision rules. AAM is convinced of the relevancy of consumer heuristics and is actively investigating how to use noncompensatory data routinely to inform management decisions. We summarize here AAM's initial use of information on consumer heuristics.

The remaining 1,464 respondents each answered 29 adaptive plus market-based questions, were shown an experimental induction, and then answered a second set of 29 adaptive plus market-based questions. Each induction was a communications strategy targeted to influence consumers to (1) consider AAM vehicles or (2) consider vehicles with aspects on which AAM excelled. Details are proprietary and beyond the scope of this paper. However, in general, the most effective communications strategies were those that surprised consumers with AAM's success in a non-U.S. reference group. AAM's then-current emphasis on J.D. Power and Consumer Reports ratings did not change consumers' decision heuristics.

AAM used the data on decision heuristics for product development. AAM recognized heterogeneity in heuristics and identified clusters of consumers who share decision heuristics. There were four main clusters: high selectivity on brand and body type, selectivity on brand, selectivity on body type, and (likely) compensatory. There were two to six subclusters


within each main cluster, for a total of 20 clusters.³

Each subcluster was linked to demographic and other decision variables to suggest directed communications and product development strategies. Decision rules for targeted consumer segments are proprietary, but the population averages are not. Table 3 indicates the percentage of the population that uses elimination rules for each of the measured aspects.

Although some brands were eliminated by most consumers, larger manufacturers have many targeted brands. For example, Buick was eliminated by 97% of the consumers and Lincoln by 98%, but these are not the only GM and Ford brands. For AAM, the net consideration of its brands was within the range of more-aggregate studies. Consumers are mixed on their interest in "green" technology: 44% eliminate hybrids from consideration, but 69% also eliminate large engines. Price elimination illustrates that heuristics are screening criteria, not surrogates for utility: 77% of consumers will not investigate a $12,000 vehicle. This means that consumers' knowledge of the market tells them that, net of search costs, their best strategy is to avoid investing time and effort to evaluate $12,000 vehicles. It does not mean that consumers would not buy a top-of-the-line Lexus if it were offered for $12,000. Table 3 provides aggregate summaries across many consumer segments; AAM's product development and communications strategies were targeted within segment. For example, 84% of consumers overall eliminate sports cars, indicating the sports-car segment is a relatively small market. However, the remaining 16% of consumers constitute a market that is sufficiently large for AAM to target vehicles for that market.

11. Summary and Challenges

We found active machine learning to be an effective methodology to select questions adaptively in order to identify consideration heuristics. Both the synthetic data experiments and the proof-of-concept empirical illustration are promising, but many challenges remain.

Question selection might be improved further with experience in choosing "tuning" parameters (the ε's and T), improved priors, an improved focus on more-complex heuristics, and better variational Bayes belief-propagation approximations. In addition, further experience will provide insight on the information gained as the algorithm learns. For example,

³ AAM used standard clustering methods on the posterior p_ij's. By the likelihood principle, it is possible to use latent-structure models to reanalyze the data. Post hoc clustering is likely to lead to more clusters than latent-structure modeling. Comparisons of clustering methods are beyond the scope and tangential to our current focus on methods to select questions efficiently for the estimation of heuristic decision rules.

Figure 4  Average Expected Reduction in Entropy up to the 29th Question

[Figure: expected reduction in entropy (y-axis, 0.00–0.60) by question number (x-axis, 1–29), plotted separately for adaptive questions and market-based questions.]

Figure 4 plots the average expected reduction in entropy for adaptive questions and for market-based questions. We see that, on average, adaptive questions provide substantially more information per question (5.5 times as much). Prior to the 10th question, the increasingly accurate posterior probabilities enable the algorithm to ask increasingly more accurate questions. Beyond 10 questions the expected reduction in entropy decreases and continues to decrease through the 29th question. It is likely that AAM would have been better able to identify consumers' conjunctive decision rules had they used 58 questions for estimation rather than split the questions between calibration and validation. Research might explore the mix between adaptive and market-based questions.

The likelihood principle implies that other models can be tested on AAM's and other adaptive data. The variational Bayes belief-propagation algorithm does not estimate standard errors for the p_ij's. Other Bayesian methods might specify more complex distributions. Reestimation or bootstrapping, when feasible, might improve estimation.

Active machine learning might also be extended to other data-collection formats, including formats in which multiple profiles are shown on the same page or formats in which configurators are used in creative ways. The challenge for large N is that we would like to approximate decision rules in less than N queries per respondent.

Acknowledgments

This research was supported by the MIT Sloan School of Management, the Center for Digital Business at MIT (http://ebusiness.mit.edu), and an unnamed American automotive manufacturer. The authors thank Rene Befurt, Sigal Cordeiro, Theodoros Evgeniou, Vivek Farias, Patricia Hawkins, Dmitriy Katz, Phillip Keenan, Andy Norton, Ele Ocholi, James Orlin, Erin MacDonald, Daniel Roesch, Joyce Salisbury, Matt Selove, Olivier Toubia, Paul Tsier, Catherine Tucker, Glen Urban, Kevin Wang, and Juanjuan Zhang for their insights, inspiration, and help on this project. Both reviewers and the area editor provided excellent comments that improved the paper.


Appendix A. Example Where Uncertainty Sampling Minimizes Posterior Entropy

We choose a simple example with two aspects to demonstrate the intuition. For this formal example, we abstract away from response error by setting ε1 = ε2 = 0, and we choose uninformative priors such that p⁰_i1 = p⁰_i2 = 0.5. With two aspects there are four potential queries, x_i1 = {0,0}, {0,1}, {1,0}, and {1,1}; and four potential decision rules, a_i = {−1,−1}, {−1,+1}, {+1,−1}, and {+1,+1}, each of which is a priori equally likely. However, the different x_i1's provide differential information about the decision rules. For example, if x_i1 = {0,0} and y_i1 = 1, then the decision rule must be a_i = {−1,−1}. At the other extreme, if x_i1 = {1,1} and y_i1 = 1, then all decision rules are consistent. The other two profiles are each consistent with half of the decision rules. We compute Pr(y_i1 = 1 | x_i1) for the four potential queries as 0.25, 0.50, 0.50, and 1.00, respectively.

We use the formulae in the text for expected posterior entropy, E[H(x_i1)].

Potential query (x_i1)   Pr(y_i1 = 1 | x_i1)   E[H(x_i1)]
{0, 0}                   0.25                  −(3/2)[(2/3) log2(2/3) + (1/3) log2(1/3)] = 1.4
{0, 1}                   0.50                  −log2(1/2) = 1
{1, 0}                   0.50                  −log2(1/2) = 1
{1, 1}                   1.00                  −2 log2(1/2) = 2

Expected posterior entropy is minimized for either of the queries {0,1} or {1,0}, both of which are consistent with uncertainty sampling (choice balance).
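Appendix A's numbers can be checked by enumeration. The sketch below is our illustration, not the paper's code; we use an encoding chosen to reproduce the appendix's counts (a rule considers a profile exactly when every aspect with a_j = +1 is present), and posterior entropy is the sum of the marginal entropies of the a_j's, matching the tabled values.

```python
from itertools import product
from math import log2

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def expected_posterior_entropy(x):
    """Expected sum of marginal entropies of (a_1, a_2) after asking profile x.

    Encoding (an assumption chosen to reproduce the appendix's counts):
    rule a considers profile x iff every aspect j with a[j] = +1 has x[j] = 1.
    Priors are uniform over the four rules; no response error.
    """
    rules = list(product([-1, +1], repeat=2))
    eh = 0.0
    for y in (0, 1):
        consistent = [a for a in rules
                      if int(all(x[j] == 1 for j in range(2) if a[j] == +1)) == y]
        if not consistent:
            continue
        pr_y = len(consistent) / len(rules)
        # posterior marginal Pr(a_j = +1 | x, y) for each aspect j
        marg = [sum(a[j] == +1 for a in consistent) / len(consistent)
                for j in range(2)]
        eh += pr_y * sum(h2(m) for m in marg)
    return eh
```

Evaluating the four queries reproduces the table: {0,1} and {1,0} give expected entropy 1 bit, {1,1} gives 2 bits, and {0,0} gives approximately 1.4 bits, so the choice-balanced queries minimize expected posterior entropy.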

Appendix B. Pseudo-Code for Belief-Propagation Algorithm

Maintain the notation of the text; let p_iK be the vector of the p_ijK's, and let p_iK,−j be the vector of all but the jth element. Define two index sets, S⁺_j = {k | x_ijk = 1, y_ik = 1} and S⁻_j = {k | x_ijk = 1, y_ik = 0}. Let superscript h index an iteration, with h = 0 indicating a prior. The belief-propagation algorithm uses all of the data, X_iK and y_iK, when updating for the Kth query. In application, the ε's are set by managerial judgment prior to data collection. Our application used ε1 = ε2 = 0.01 for query selection.

Use the priors to initialize p⁰_iK. Initialize all Pr(y_ik | X_iK, p^(h−1)_iK,−j, a_ij = ±1).

While max_j |p^h_ijK − p^(h−1)_ijK| > 0.001   [Continue looping until p^h_ijK converges.]
  For j = 1 to N   [Loop over all aspects.]
    For k ∈ S⁺_j   [Use the variational distribution to approximate the data likelihood.]
      Pr(y_ik = 1 | X_iK, p^(h−1)_iK,−j, a_ij = 1)
        = (1 − ε2) ∏_{g≠j: x_igk=1} p^(h−1)_igK + ε1 (1 − ∏_{g≠j: x_igk=1} p^(h−1)_igK),
      Pr(y_ik = 1 | X_iK, p^(h−1)_iK,−j, a_ij = −1) = ε1
    end loop k ∈ S⁺_j
    For k ∈ S⁻_j   [Use the variational distribution to approximate the data likelihood.]
      Pr(y_ik = 0 | X_iK, p^(h−1)_iK,−j, a_ij = 1)
        = (1 − ε1)(1 − ∏_{g≠j: x_igk=1} p^(h−1)_igK) + ε2 ∏_{g≠j: x_igk=1} p^(h−1)_igK,
      Pr(y_ik = 0 | X_iK, p^(h−1)_iK,−j, a_ij = −1) = 1 − ε1
    end loop k ∈ S⁻_j
    Pr(y_iK | X_iK, p^(h−1)_iK,−j, a_ij = 1) = ∏_{k=1}^{K} Pr(y_ik | X_iK, p^(h−1)_iK,−j, a_ij = 1),
    Pr(y_iK | X_iK, p^(h−1)_iK,−j, a_ij = −1) = ∏_{k=1}^{K} Pr(y_ik | X_iK, p^(h−1)_iK,−j, a_ij = −1)
      [Compute data likelihoods across all K questions as a product of marginal distributions for each k.]
    Pr(a_ij = 1 | X_iK, y_iK, p^(h−1)_iK,−j) ∝ Pr(y_iK | X_iK, p^(h−1)_iK,−j, a_ij = 1) Pr(a_ij = 1 | prior),
    Pr(a_ij = −1 | X_iK, y_iK, p^(h−1)_iK,−j) ∝ Pr(y_iK | X_iK, p^(h−1)_iK,−j, a_ij = −1)(1 − Pr(a_ij = 1 | prior)),
    p^h_ijK = Pr(a_ij = 1 | X_iK, y_iK, p^(h−1)_iK,−j), normalized
      [Use Bayes theorem, then normalize.]
  end loop j   [Test for convergence and continue if necessary.]
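The loop above translates into code fairly directly. The sketch below is our illustrative reading of the pseudo-code (variable names, the iteration cap, and the tiny example are assumptions): p[j] plays the role of Pr(a_ij = 1), the probability that aspect j is acceptable, and questions that do not contain aspect j drop out of the likelihood ratio for a_ij.

```python
from math import prod

def bp_update(X, y, prior, eps1=0.01, eps2=0.01, tol=1e-3, max_iter=100):
    """Variational belief-propagation loop for one respondent (sketch).

    X: list of K binary profiles (each a list of N 0/1 aspect indicators).
    y: list of K consider (1) / not-consider (0) answers.
    prior: list of N prior probabilities Pr(a_j = 1) ("aspect j acceptable").
    Returns posterior marginals p_j = Pr(a_j = 1 | data).
    """
    N = len(prior)
    p = list(prior)
    for _ in range(max_iter):
        p_old = list(p)
        for j in range(N):
            like_pos, like_neg = 1.0, 1.0  # Pr(data | a_j = +1 / -1)
            for xk, yk in zip(X, y):
                if xk[j] != 1:
                    continue  # questions without aspect j cancel in the ratio
                # probability the *rest* of the profile is acceptable
                rest = prod(p[g] for g in range(N) if g != j and xk[g] == 1)
                if yk == 1:
                    like_pos *= (1 - eps2) * rest + eps1 * (1 - rest)
                    like_neg *= eps1
                else:
                    like_pos *= (1 - eps1) * (1 - rest) + eps2 * rest
                    like_neg *= (1 - eps1)
            num = like_pos * prior[j]
            p[j] = num / (num + like_neg * (1 - prior[j]))  # Bayes, normalized
        if max(abs(a - b) for a, b in zip(p, p_old)) < tol:
            break
    return p
```

On a small noiseless example with three aspects, where the respondent accepts aspects 1 and 2 but rejects any profile containing aspect 3, the posteriors converge to the correct rule within a few iterations.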

Appendix C. Kullback–Leibler Divergence for Empirical Data

The Kullback–Leibler divergence (KL) is an information-theory-based measure of the divergence from one probability distribution to another. In this paper we seek the divergence from the predicted consideration probabilities to those that are observed in the validation data, recognizing the discrete nature of the data (to consider or not). For respondent i we predict that profile k is considered with probability r_ik = Pr(y_ik = 1 | x_ik, model). Then the divergence from the true model (the y_ik's) to the model being tested (the r_ik's) is given by Equation (C1). With log base 2, KL has the units of bits:

KL = Σ_{k ∈ validation} [ y_ik log2(y_ik / r_ik) + (1 − y_ik) log2((1 − y_ik) / (1 − r_ik)) ].   (C1)

When the r_ik's are themselves discrete, we must use the observations of false-positive and false-negative predictions to separate the summation into four components. Let V = the number of profiles in the validation sample, let Cv = the number of considered validation profiles, let Fp = the false-positive predictions, and let Fn = the false-negative predictions. Then KL is given by the following equation, where S_c,c is the set of profiles that are considered in the calibration data and considered in the validation data; the sets S_c,nc,



S_{nc,c}, and S_{nc,nc} are defined similarly (nc → not considered):

KL = \sum_{S_{c,c}} \log_2\!\left(\frac{C_v}{C_v - F_p}\right) + \sum_{S_{c,nc}} \log_2\!\left(\frac{V - C_v}{F_n}\right) + \sum_{S_{nc,c}} \log_2\!\left(\frac{C_v}{F_p}\right) + \sum_{S_{nc,nc}} \log_2\!\left(\frac{V - C_v}{V - C_v - F_n}\right).

After algebraic simplification, KL can be written as

KL = C_v \log_2 C_v + (V - C_v)\log_2(V - C_v) - (C_v - F_p)\log_2(C_v - F_p) - F_n \log_2 F_n - F_p \log_2 F_p - (V - C_v - F_n)\log_2(V - C_v - F_n). \quad (C2)

KL is a sum over the set of profiles. Sets with more profiles are harder to fit; if V were twice as large and C_v, F_p, and F_n were scaled proportionally, then KL would be twice as large. For comparability across respondents with different validation-set sizes, we divide KL by V.

Appendix D. Percent Uncertainty Explained (U²) for Other Question-Selection Methods

[Figure: U² (%) versus number of questions (0–128) for five question-selection methods: random questions, adaptive questions, market-based questions, random questions plus configurator, and orthogonal-design questions.]

Appendix E. Hit Rates for Question-Selection-and-Estimation Combinations (Where Larger Is Better)

                                 Noncompensatory      Compensatory
                                 heuristics           decision model
Question-selection methods
  Adaptive questions             0.848^{a,b,c,d,e}    0.594^{d}
  Market-based questions         0.827^{b,c,d,e}      0.806^{c,d,f}
Null models
  Consider all profiles          0.180
  Consider no profiles           0.820
  Randomly consider profiles     0.732

a. Significantly better than market-based questions for noncompensatory heuristics (p < 0.001).
b. Significantly better than the compensatory decision model (p < 0.001).
c. Significantly better than the random null model (p < 0.001).
d. Significantly better than the consider-all-profiles null model (p < 0.001).
e. Significantly better than the consider-no-profiles null model (p < 0.001).
f. Significantly better than adaptive questions for the compensatory decision model (p < 0.001).

References

Arora, N., J. Huber. 2001. Improving parameter estimates and model prediction by aggregate customization in choice experiments. J. Consumer Res. 28(2) 273–283.

Attias, H. 1999. Inferring parameters and structure of latent variable models by variational Bayes. K. B. Laskey, H. Prade, eds. Proc. 15th Conf. Uncertainty Artificial Intelligence (UAI-99), Morgan Kaufmann, San Francisco, 21–30.

Breese, J. S., D. Heckerman, C. Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. Proc. 14th Conf. Uncertainty Artificial Intelligence (UAI-98), Morgan Kaufmann, San Francisco, 43–52.

Chaloner, K., I. Verdinelli. 1995. Bayesian experimental design: A review. Statist. Sci. 10(3) 273–304.

Dahan, E., J. R. Hauser. 2002. The virtual customer. J. Product Innovation Management 19(5) 332–353.

Dieckmann, A., K. Dippold, H. Dietrich. 2009. Compensatory versus noncompensatory models for predicting consumer preferences. Judgment Decision Making 4(3) 200–213.

Ding, M., J. R. Hauser, S. Dong, D. Dzyabura, Z. Yang, C. Su, S. Gaskin. 2011. Unstructured direct elicitation of decision rules. J. Marketing Res. 48(February) 116–127.

Erdem, T., J. Swait. 2004. Brand credibility, brand consideration, and choice. J. Consumer Res. 31(1) 191–198.

Evgeniou, T., C. Boussios, G. Zacharia. 2005. Generalized robust conjoint estimation. Marketing Sci. 24(3) 415–429.

Frederick, S. 2005. Cognitive reflection and decision making. J. Econom. Perspect. 19(4) 25–42.

Ghahramani, Z., M. J. Beal. 2000. Variational inference for Bayesian mixtures of factor analyzers. Advances in Neural Information Processing Systems, Vol. 12. MIT Press, Cambridge, MA.

Ghahramani, Z., M. J. Beal. 2001. Propagation algorithms for variational Bayesian learning. Advances in Neural Information Processing Systems, Vol. 13. MIT Press, Cambridge, MA.

Gigerenzer, G., D. G. Goldstein. 1996. Reasoning the fast and frugal way: Models of bounded rationality. Psych. Rev. 103(4) 650–669.

Gigerenzer, G., R. Selten, eds. 2001. Bounded Rationality: The Adaptive Toolbox. MIT Press, Cambridge, MA.

Gigerenzer, G., P. M. Todd, The ABC Research Group. 1999. Simple Heuristics That Make Us Smart. Oxford University Press, Oxford, UK.

Gilbride, T. J., G. M. Allenby. 2004. A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Sci. 23(3) 391–406.

Gilbride, T. J., G. M. Allenby. 2006. Estimating heterogeneous EBA and economic screening rule choice models. Marketing Sci. 25(5) 494–509.

Green, P. E. 1984. Hybrid models for conjoint analysis: An expository review. J. Marketing Res. 21(2) 155–169.

Green, P. E., V. Srinivasan. 1990. Conjoint analysis in marketing: New developments with implications for research and practice. J. Marketing 54(4) 3–19.

Green, P. E., A. M. Krieger, P. Bansal. 1988. Completely unacceptable levels in conjoint analysis: A cautionary note. J. Marketing Res. 25(August) 293–300.

Hauser, J. R. 1978. Testing the accuracy, usefulness, and significance of probabilistic choice models: An information-theoretic approach. Oper. Res. 26(3) 406–421.

Hauser, J. R., B. Wernerfelt. 1990. An evaluation cost model of consideration sets. J. Consumer Res. 16(4) 393–408.

Hauser, J. R., O. Toubia, T. Evgeniou, R. Befurt, D. Dzyabura. 2010. Disjunctions of conjunctions, cognitive simplicity, and consideration sets. J. Marketing Res. 47(June) 485–496.

Jedidi, K., R. Kohli. 2005. Probabilistic subset-conjunctive models for heterogeneous consumers. J. Marketing Res. 42(4) 483–494.

Kohli, R., K. Jedidi. 2007. Representation and inference of lexicographic preference models and their variants. Marketing Sci. 26(3) 380–399.



Kullback, S., R. A. Leibler. 1951. On information and sufficiency. Ann. Math. Statist. 22(1) 79–86.

Lewis, D. D., W. A. Gale. 1994. Training text classifiers by uncertainty sampling. Proc. 17th Annual Internat. ACM SIGIR Conf. Res. Dev. Inform. Retrieval, Dublin, Ireland, 3–12.

Liberali, G., G. L. Urban, J. R. Hauser. 2011. Field experiments: Providing unbiased competitive information to encourage trust, consideration, and sales. Working paper, MIT Sloan School of Management, Cambridge, MA.

Lindley, D. V. 1956. On a measure of the information provided by an experiment. Ann. Math. Statist. 27(4) 986–1005.

Little, J. D. C. 2004a. Managers and models: The concept of a decision calculus. Management Sci. 50(12, Supplement) 1841–1853. [Reprinted from 1970.]

Little, J. D. C. 2004b. Comments on "Models and managers: The concept of a decision calculus": Managerial models for practice. Management Sci. 50(12, Supplement) 1854–1860.

Liu, Q., N. Arora. 2011. Efficient choice designs for a consider-then-choose model. Marketing Sci. 30(2) 321–338.

Martignon, L., U. Hoffrage. 2002. Fast, frugal, and fit: Simple heuristics for paired comparisons. Theory Decision 52(1) 29–71.

Payne, J. W., J. R. Bettman, E. J. Johnson. 1988. Adaptive strategy selection in decision making. J. Experiment. Psych.: Learn., Memory, Cognition 14(3) 534–552.

Payne, J. W., J. R. Bettman, E. J. Johnson. 1993. The Adaptive Decision Maker. Cambridge University Press, Cambridge, UK.

Rényi, A. 1961. On measures of information and entropy. Proc. 4th Berkeley Sympos. Math., Statistics, Probability, University of California, Berkeley, Berkeley, 547–561.

Sándor, Z., M. Wedel. 2002. Profile construction in experimental choice designs for mixed logit models. Marketing Sci. 21(4) 455–475.

Sawtooth Software. 1996. ACA system: Adaptive conjoint analysis. ACA Manual. Sawtooth Software, Sequim, WA.

Sawtooth Software. 2004. CBC hierarchical Bayes analysis. Technical paper, Sawtooth Software, Sequim, WA.

Sawtooth Software. 2008. ACBC. Technical paper, Sawtooth Software, Sequim, WA.

Shannon, C. E. 1948. A mathematical theory of communication. Bell System Tech. J. 27(July and October) 379–423, 623–656.

Toubia, O., J. R. Hauser. 2007. On managerially efficient experimental designs. Marketing Sci. 26(6) 851–858.

Toubia, O., J. R. Hauser, R. Garcia. 2007. Probabilistic polyhedral methods for adaptive choice-based conjoint analysis: Theory and application. Marketing Sci. 26(5) 596–610.

Toubia, O., J. R. Hauser, D. I. Simester. 2004. Polyhedral methods for adaptive choice-based conjoint analysis. J. Marketing Res. 41(February) 116–131.

Tversky, A. 1972. Elimination by aspects: A theory of choice. Psych. Rev. 79(4) 281–299.

Urban, G. L., J. R. Hauser. 2004. "Listening in" to find and explore new combinations of customer needs. J. Marketing 68(2) 72–87.

van Nierop, E., B. Bronnenberg, R. Paap, M. Wedel, P. H. Franses. 2010. Retrieving unobserved consideration sets from household panel data. J. Marketing Res. 47(February) 63–74.

Vriens, M., M. Wedel, Z. Sándor. 2001. Split-questionnaire designs: A new tool in survey design. Marketing Res. 13(2) 14–19.

Wilkie, W. L., E. A. Pessemier. 1973. Issues in marketing's use of multi-attribute attitude models. J. Marketing Res. 10(4) 428–441.

Yedidia, J. S., W. T. Freeman, Y. Weiss. 2003. Understanding belief propagation and its generalizations. G. Lakemeyer, B. Nebel, eds. Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, San Francisco, 239–269.

Yee, M., E. Dahan, J. R. Hauser, J. Orlin. 2007. Greedoid-based noncompensatory inference. Marketing Sci. 26(4) 532–549.
