Learning Demand Curves in B2B Pricing: A New Framework and Case Study
Huashuai Qu
Department of Mathematics, University of Maryland, College Park, MD 20742
Ilya O. Ryzhov, Michael C. Fu
Robert H. Smith School of Business, University of Maryland, College Park, MD 20742
[email protected], [email protected]
Eric Bergerson, Megan Kurka
Vendavo, Inc., Mountain View, CA 94043
In business-to-business (B2B) pricing, a seller seeks to maximize revenue obtained from high-volume trans-
actions involving a wide variety of buyers, products, and other characteristics. Buyer response is highly
uncertain, and the seller only observes whether buyers accept or reject the offered prices. These deals are also
subject to high opportunity cost, since revenue is zero if the price is rejected. The seller must adapt to this
uncertain environment and learn quickly from new deals as they take place. We propose a new framework
for statistical and optimal learning in this problem, based on approximate Bayesian inference, that has the
ability to measure and update the seller’s uncertainty about the demand curve based on new deals. In a case
study based on historical data, we show that our approach offers significant practical benefits.
Key words: optimal learning; B2B pricing; price optimization; Bayesian learning; approximate Bayesian inference
1. Introduction
We study the problem of optimally pricing high-volume commercial transactions between busi-
nesses, referred to as business-to-business or B2B pricing. For example, consider a negotiation
between a supplier of raw materials (the seller) and a manufacturer (the buyer), which ends in a
final price offer named by the seller. If the price is rejected, the seller incurs a high opportunity cost
(lost revenue); however, it may not be clear whether a lower offer would have gotten the deal, and if
so, how much lower it should have been. If the price is accepted, the seller is left wondering whether
a higher price would have also worked. The seller makes many such pricing decisions over time,
and attempts to maximize revenue, subject to considerable uncertainty about buyer behaviour and
willingness to pay.
We consider a case application based on historical data provided by Vendavo, Inc., a firm special-
izing in B2B pricing science. The data include information on tens of thousands of B2B transactions
(most of them unsuccessful) involving a single seller and a large number of buyers. This seller’s
price optimization problem involves the following challenges:
• Big data. The data are highly heterogeneous, covering thousands of distinct products and
buyers. Different product types have different price sensitivities. Consequently, the data contain a
large number of “rows” (observed deals) as well as “columns” (explanatory variables). Predictive
models may thus be vulnerable to noise accumulation, spurious correlations, and computational
issues (Fan et al. 2014).
• Noise. We are only able to observe a binary (yes/no) response from the buyer, representing
whether the seller’s price was accepted or rejected. The proportion of accepted offers (“wins”) is
very low. Furthermore, many of the products and buyers may appear infrequently and have few or
no wins. Even with a large amount of data, predictive models are likely to be inaccurate.
• High cost of failure. If a price is rejected, the seller’s revenue is zero. In B2B transactions, the
total value of the deal may be in the millions of dollars. If the historical data are insufficient to
make accurate predictions about future deals, the seller must learn quickly from new deals as they
take place. It may not be enough to use a pricing strategy that works well in the long run, as the
practical value is in the very short term.
We seek to address these challenges using predictive and prescriptive analytics, leading to new
developments in both statistical modeling and price optimization. In addition to short-term perfor-
mance, computational efficiency is also an issue. Ideally, price optimization should be implementable
in real time and on demand, so that a sales representative may access it during a negotiation
through a tablet app.
Many models in revenue management allow stochastic product demand (Bitran and Caldentey
2003), but in our case, the seller faces the additional challenge of environmental uncertainty: we
do not know the exact distribution of the buyer’s willingness to pay. Rather, this distribution
is estimated from historical data, assuming some statistical model (e.g., logistic regression, as in
Hormby et al. 2010), and this model is updated over time as new transactions take place. In this
way, any given deal provides new information about the demand distribution, aside from its purely
economic value in generating revenue. Furthermore, since any given statistical model is likely to
be inaccurate, we may not wish to implement the price that seems to be optimal under that
model. Instead, we may experiment with prices (for instance, charging slightly more or less than
the recommended price) in order to obtain new information and potentially discover better pricing
strategies. Doing this may result in lost revenue at first, but the new information may help to
improve pricing decisions in the (hopefully near) future.
We approach the problem from the perspective of optimal learning (Powell and Ryzhov 2012),
which typically uses Bayesian models to measure the uncertainty or the potential for error in the
predictive model. In our case, we use logistic regression with the coefficients modeled as a random
vector (because their “true” values are unknown). The practical power of these models comes from
the concept of “correlated beliefs” (Negoescu et al. 2011, Qu et al. 2015), which measures the
similarities and differences between various types of deals, so that a sale involving one product
will teach us something about other, similar products. The Bayesian model can then be integrated
with a pricing strategy that accounts for the uncertainty in the model, e.g., by correcting overly
aggressive prices when the uncertainty is high, or by experimenting with higher prices when there
is a chance that they may be better than we think. The outcomes of our decisions feed back into
the model and modify our beliefs for future decisions. This framework can provide meaningful
guidance within very short time horizons, even in the presence of very noisy data.
Optimal learning methods typically use simple Bayesian models that can be updated very quickly.
In linear regression (ordinary least squares), the standard approach is to assume that the regression
coefficients are normally distributed, which enables us to concisely model and update correlated
beliefs. However, there is no analogous model for logistic regression, making it difficult to represent
beliefs about logistic demand curves. We approach this problem using approximate Bayesian infer-
ence (Ryzhov 2015), and create a new learning mechanism that allows us to maintain and update a
multivariate normal belief on the regression coefficients using rigorous statistical approximations.
We then develop a “Bayes-greedy” pricing strategy that optimizes an estimate of expected revenue
by averaging over all possible revenue curves.
We find that the Bayesian framework performs very well in both predictive and prescriptive
roles. Surprisingly, despite the approximations used in the Bayesian model, its predictive power is
quite competitive with exact logistic regression. Our first insight is that uncertainty is valuable:
the benefits of quantifying our uncertainty about the predictive model easily compensate for any
reduction in accuracy incurred by using approximations. Our second insight is that uncertainty
is more valuable for optimization than for prediction: while the exact and Bayesian models have
similar predictive power, the price recommendation obtained from the model is greatly improved
by the inclusion of uncertainty in the pricing decision.
Thus, our paper makes the following contributions: 1) We introduce a new approximate Bayesian
learning model for B2B demand curves based on logistic regression.¹ Our approach opti-
mizes a statistical measure of distance (Kullback-Leibler divergence) between the multivariate nor-
mal approximation and the exact, non-normal posterior distribution. 2) We show how the seller’s
beliefs can be efficiently updated in this model, using stochastic gradient methods to calculate the
optimal statistical approximation. 3) We propose the Bayes-greedy pricing policy and show how
these prices can be efficiently computed. 4) We demonstrate the practical value of these methods
on the Vendavo dataset. Part of our case study is purely data-driven, while another part uses simulation models calibrated using the data.

¹ We note that our model could potentially be used in any application of logistic regression, not only pricing. We believe that its performance in this application illustrates the practical potential of approximate Bayesian methods in other problems involving decision-making under uncertainty.
2. Literature Review
B2B pricing is a multifaceted problem and has been approached from multiple angles in the liter-
ature. Below, we survey other perspectives and contrast them with the present paper.
Practical implementations of dynamic price optimization often use statistical models such as
logistic regression (Agrawal and Ferguson 2007, Hormby et al. 2010). A significant challenge in
practice is to find a segmentation of the customers (Bodea and Ferguson 2014) that will lead
to successful targeted pricing (Cross et al. 2011). The products can also be segmented based
on common characteristics that influence their value (Gale and Swire 2012). In our work, the
segmentation is assumed to be (mostly) given; in our case study, we use statistical model selection
methods (Hastie et al. 2001) to identify the most important segments from among a large number
pre-specified by the seller. Our learning framework assumes a fixed set of regression features. The
question we ask is how the effects of these segments can be learned efficiently, in a way that leads
to improved price optimization within a short timespan.
The literature has devoted considerable attention to behavioural issues affecting B2B negotia-
tions. For instance, one important issue in practice (Elmaghraby et al. 2015) is that, even if the
seller uses price optimization tools, the salespeople conducting the negotiations may choose not to
implement the recommended prices, instead viewing them as references or targets (Bruno et al.
2012). Elmaghraby et al. (2012) studies the effects of such recommendations on changes in sales-
people’s ultimate price quotes. Zhang et al. (2014) proposes a model that explicitly captures the
latent “trust” state of the buyer and infers this state from a sequence of interactions between the
buyer and seller. Other work has considered strategic behaviour on the buyers’ part (Leng and
Parlar 2005, Elmaghraby et al. 2008), and its effect on the efficacy of pricing strategies such as
trade-in rebates (Agrawal et al. 2015). Our approach may be able to partially accommodate some
of these dimensions (for example, when considering a new deal with a given buyer, the regression
features could potentially include information about our history with that buyer, such as previous
price quotes and responses), but we do not explicitly include an economic or game-theoretic model.
Rather, our core focus is on prediction and optimization based on historical data.
Another relevant stream of literature deals with the “learning and earning” problem (Harrison
et al. 2012), which also considers pricing under environmental uncertainty, where learning plays a
major role. A common approach there is to develop pricing strategies that collect revenue at the
best possible asymptotic rate. In many cases (Besbes and Zeevi 2009, Broder and Rusmevichien-
tong 2012, Keskin and Zeevi 2014, den Boer and Zwart 2015), such strategies are “semi-myopic,”
meaning that they alternate a period of randomized exploration with a period of purely myopic
decision-making, with the exploration periods spaced increasingly further apart. More recent work
(Besbes and Saure 2014, Keskin and Zeevi 2015) has extended these ideas to problems with non-
stationary demand curves. While our B2B setting also involves learning, it differs from this work
in two fundamental ways: 1) The rate-optimality of semi-myopic methods is asymptotic, meaning
that the benefits are realized over a long time horizon. This is highly relevant in B2C applica-
tions, e.g., in e-commerce, where millions of transactions are observed, but the opportunity cost
of each individual transaction may be fairly low. However, B2B strategies require good short-term
performance, which is ultimately evaluated empirically. 2) Much of the existing work on learning
in pricing assumes that customers are homogeneous (i.i.d.), and/or that only a single product is
being sold. Pricing strategies may strongly rely on these assumptions; for example, the Gittins
index approach of Xia and Dube (2007) or Chhabra and Das (2011) cannot be easily extended to
heterogeneous data. By contrast, customer and product heterogeneity is built into our approach.
Our methodology in this paper has roots in the simulation literature (Chau et al. 2014), where
Bayesian models are often used to estimate the performance of a simulation system (Chick 2006,
Chen et al. 2015). We also use stochastic gradient methods from this literature (Fu 2015) to
solve the technical problem of optimizing difficult expectations (such as the Bayesian expected
revenue). Approximate Bayesian inference, which we use to develop our statistical model, is a
promising method for designing learning mechanisms when standard models are not usable (Qu
et al. 2015), and has previously demonstrated practical benefits in applications such as market-
making in financial exchanges (Das and Magdon-Ismail 2009, Brahma et al. 2012). While Bayesian
learning has previously been studied in the context of dynamic pricing (Araman and Caldentey
2009, Farias and Van Roy 2010), our approximate scheme provides the modeling flexibility needed
to accommodate detailed segmentation, which plays a major role in the practice of B2B pricing.
3. Demand Model
Section 3.1 gives the basic definitions and notation for logistic demand curves. In Section 3.2, we
explain how our uncertainty about such a curve can be represented by a Bayesian prior. Section 3.3
describes our approach to updating the prior after a single new deal is observed. Finally, Section
3.4 gives the technical details of how this update is implemented.
3.1. Modeling the demand curve
Consider a generic deal in which the seller quotes a price p, and the buyer makes a binary response
denoted by Y . The event that Y = 1 represents a sale (or “win”), whereas Y = 0 is a “loss,” meaning
that the deal did not go through. We express the win probability $P(Y = 1)$ as a function

\[
\rho(x, \beta) = \frac{1}{1 + e^{-\beta^\top x}}, \tag{1}
\]
where $x \in \mathbb{R}^M$ is a vector that depends on $p$, as well as on additional characteristics of the product
or the buyer, which are known to the seller at the time p is chosen. The function ρ, which is not
known exactly to the seller, is also called the demand curve (Cope 2007). The seller’s expected
revenue from the deal is given by
\[
R(p; x, \beta) = p \cdot \rho(x, \beta), \quad p \ge 0,
\]

with $p^* = \arg\max_p R(p; x, \beta)$ denoting the optimal price. For simplicity, we work with the revenue
function throughout this paper. However, it is straightforward to modify the analysis to maximize
profit rather than revenue.
Equation (1) is an instance of logistic regression, a standard model for forecasting demand or
sales (Ch. 9, Talluri and Van Ryzin 2006). In the simplest possible case, we can let $x = [1, p]^\top$,
which implies that the buyers are homogeneous (given a fixed price, their valuations are drawn
from a single common distribution). However, in practice, x also contains information such as the
type and quantity of product stipulated in the deal. We may need to use a large number of dummy
variables to describe the product. For example, a large retailer may wish to include features that
classify products by department (e.g., electronic, furniture, housewares), then generally describe
the item in question (e.g., TVs, cameras, tablets), and finally give more detailed information such
as the brand and model of the item. Additionally, x could describe the buyer with varying degrees
of granularity (e.g., whether the buyer is located in Europe or Asia, followed by more detailed
country information), since B2B pricing is highly individualized in practice (Elmaghraby et al.
2015). We could also include interaction terms between product and customer features (e.g., if a
particular product type sells better in a particular region), as well as interactions between these
features and the price (to model the case where different products have different price sensitivities).
Since the outcome of B2B negotiations heavily depends on the individual salesperson, x may also
include characteristics of the sales force. In a practical application, x may include hundreds or
thousands of elements.
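As a concrete illustration, a feature vector of this kind might be assembled as in the following sketch; the segment names and encoding choices here are hypothetical illustrations, not the encoding used in our case study.

```python
import numpy as np

def build_feature_vector(dept_dummies, region_dummies, quantity, price):
    """Illustrative construction of x for the demand model (1).

    dept_dummies / region_dummies are one-hot encodings of hypothetical
    product and customer segments; the price-by-department interaction
    terms let each department have its own price sensitivity.
    """
    base = np.concatenate([[1.0], dept_dummies, region_dummies, [quantity]])
    price_terms = price * np.concatenate([[1.0], dept_dummies])
    return np.concatenate([base, price_terms])

def win_probability(x, beta):
    """The logistic demand curve rho(x, beta) from equation (1)."""
    return 1.0 / (1.0 + np.exp(-(beta @ x)))
```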
However, in all of these cases, the regression coefficients β are unknown to the seller, and must
be inferred based on prior knowledge as well as new information obtained by observing new wins
and losses. The margin for error in estimating β is quite narrow. First, the opportunity cost for
lost deals is extremely high (the seller receives zero revenue if the deal fails). Second, the demand
curve can be highly sensitive to the values of β, meaning that small estimation errors can lead to
large differences in the recommended prices. We now describe a Bayesian framework for optimal
learning on the basis of a single new observation (the goal being to implement this framework
sequentially).
3.2. Bayesian model for learning demand curves
In the Bayesian view, any unknown quantity is modeled as a random variable whose distribution
represents our beliefs about likely values for that quantity. We use a multivariate normal prior
distribution, that is,
\[
\beta \sim \mathcal{N}(\theta, \Sigma). \tag{2}
\]
The main benefit of the multivariate normal distribution is that it allows us to compactly represent
correlated beliefs using the covariance matrix Σ. The off-diagonal entries in this matrix can be
viewed as representing the degree of similarity or difference between the values of different regression
coefficients. Correlations have great practical impact when the design matrix is sparse, that is,
many of the components of x are equal to zero for any given observation. This is likely to be the
case in our application: the seller may include hundreds of distinct products into the model, and
only a few observations may be available for a given product even if the overall dataset is large.
However, if we believe that two products are similar, correlated beliefs will allow us to learn about
one product from a deal that involves the other one. This greatly increases the information value
of a single deal, and allows us to learn about a large number of products from a small number of
observations. Furthermore, normality assumptions will substantially simplify the computation of
optimal prices in Section 4.
However, we first require a mechanism for efficiently updating the covariance matrix after new
observations. We use Bayes’ rule to derive the conditional density of β given Y , the associated
features x, and the modeling assumption in (2). This posterior density represents our new beliefs
about the regression coefficients after an additional observation has been made. We first rewrite
the likelihood function of $Y$ more compactly as $\ell(H(\beta; Y))$, where $\ell(z) = \frac{1}{1 + e^{-z}}$ and $H(\beta; Y) = (2Y - 1)\,\beta^\top x$. Then, the posterior density of $\beta$ can be written as

\[
P(\beta \mid x, Y) \propto \ell(H(\beta; Y))\, |\Sigma|^{-\frac{1}{2}}\, e^{-\frac{1}{2} (\beta - \theta)^\top \Sigma^{-1} (\beta - \theta)}. \tag{3}
\]
In multi-stage problems where decisions are made sequentially, it is desirable to use a conjugate
model (DeGroot 1970) where the prior and posterior distributions belong to the same family (e.g.,
multivariate normal). Such models admit computationally efficient learning schemes where the
entire belief distribution is compactly characterized by a finite number of parameters, and these
parameters can be updated recursively after each new observation. However, (3) is non-normal due
to the presence of $\ell$.
We would like to retain the multivariate normal distribution in order to use the power of cor-
related beliefs. Since this is not possible using standard Bayesian updating, we use the methods
of approximate Bayesian inference (Ryzhov 2015). Essentially, if the posterior distribution is not
conjugate with the prior, we replace it by a simpler distribution that does belong to our chosen
family (multivariate normal), and optimally approximates the true, non-normal posterior. We use
a variational Bayesian approach, where the parameters (θ′,Σ′) of the desired normal density Q
are chosen to minimize the Kullback-Leibler (KL) divergence between Q and the true posterior
$P(\cdot \mid x, Y)$. This quantity is defined as

\[
D_{\mathrm{KL}}(Q \,\|\, P) = \mathbb{E}_Q\!\left[\log \frac{Q(\beta; \theta', \Sigma')}{P(\beta; x, Y, \theta, \Sigma)}\right], \tag{4}
\]

where $\mathbb{E}_Q$ is the expectation with respect to $Q$. The KL divergence, which is always non-negative,
measures the “distance” between two probability distributions. Lower KL divergence suggests that
there is more similarity between P and Q (zero KL divergence occurs if and only if P and Q are
identical). We wish to find
\[
(\theta^*, \Sigma^*) = \arg\min_{(\theta', \Sigma')} D_{\mathrm{KL}}(Q \,\|\, P),
\]
the parameter values for which the multivariate normal distribution Q optimally approximates the
non-normal distribution P .
3.3. Approximate Bayesian inference for logistic regression
We first observe that the definition in (4) can be partially simplified, due to the following result.
Proposition 1. Given $x$, $Y$, and the modeling assumption in (2), the KL divergence can be written as

\[
D_{\mathrm{KL}}(Q \,\|\, P) = \mathbb{E}_Q\!\left[\log\left(1 + e^{-H(\beta; Y)}\right)\right] + h(\theta, \Sigma, \theta', \Sigma'), \tag{5}
\]

with the second component given in closed form as

\[
h(\theta, \Sigma, \theta', \Sigma') = \frac{1}{2}\left[\operatorname{tr}\left(\Sigma^{-1} \Sigma'\right) + (\theta - \theta')^\top \Sigma^{-1} (\theta - \theta') - M - \log \frac{|\Sigma'|}{|\Sigma|} + C\right], \tag{6}
\]

where $C$ is a constant that does not depend on $\theta', \Sigma'$.
Proof: From (3), we have

\[
\log \frac{Q(\beta; \theta', \Sigma')}{P(\beta; x, Y, \theta, \Sigma)} = \log \frac{|\Sigma'|^{-\frac{1}{2}}\, e^{-\frac{1}{2} (\beta - \theta')^\top (\Sigma')^{-1} (\beta - \theta')}}{\ell(H(\beta; Y))\, |\Sigma|^{-\frac{1}{2}}\, e^{-\frac{1}{2} (\beta - \theta)^\top \Sigma^{-1} (\beta - \theta)}} + C.
\]

Taking expectations yields

\[
D_{\mathrm{KL}}(Q \,\|\, P) = \mathbb{E}_Q\!\left[\log\left(1 + e^{-H(\beta; Y)}\right)\right] + D_{\mathrm{KL}}(Q \,\|\, P_0),
\]

where $P_0$ is the prior distribution $\mathcal{N}(\theta, \Sigma)$. The KL divergence between two multivariate normal distributions is given in (6). Q.E.D.
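Since everything in (6) other than the constant is the standard KL divergence between two multivariate normals, it can be evaluated directly; a minimal sketch (assuming $\Sigma$ is invertible) is:

```python
import numpy as np

def kl_normal(theta, Sigma, theta_p, Sigma_p):
    """KL divergence of Q = N(theta_p, Sigma_p) from P0 = N(theta, Sigma),
    matching the closed-form terms of (6) up to the constant C."""
    M = len(theta)
    Sinv = np.linalg.inv(Sigma)
    d = theta - theta_p
    return 0.5 * (np.trace(Sinv @ Sigma_p) + d @ Sinv @ d - M
                  - np.log(np.linalg.det(Sigma_p) / np.linalg.det(Sigma)))
```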
Unfortunately, even with this simplification, the expectation in (5) cannot be expressed in closed
form. Note, however, that the function inside the expectation is known, and the expectation is taken
with respect to a known distribution. To optimize the expected value, we can use gradient-based
stochastic search (Kim 2006). In gradient-based optimization, we would first calculate
\[
\nabla D_{\mathrm{KL}}(Q \,\|\, P) = \nabla \mathbb{E}_Q\!\left[\log\left(1 + e^{-H(\beta; Y)}\right)\right] + \nabla h(\theta, \Sigma, \theta', \Sigma'), \tag{7}
\]
where $\nabla$ is the gradient with respect to $(\theta', \Sigma')$, and apply a steepest descent algorithm to find $(\theta^*, \Sigma^*)$ to a desired precision. Since the expectation in (7) is intractable, its gradient also cannot be
written explicitly, but it can be estimated from Monte Carlo simulation. Blei et al. (2012) proposes
to use the likelihood ratio method (Sec. 15.2 in Spall 2005) for estimating the gradient of the KL
divergence in Bayesian logistic regression. However, this and other gradient-based methods often
converge slowly to the optimal solution when the dimensionality of the problem is high. In our
case, we are estimating $M^2 + M$ parameters, where $M$ is on the order of hundreds or thousands.
To mitigate these computational challenges, we propose the following form for (θ′,Σ′):
\[
\theta' = \Sigma'\left(\Sigma^{-1}\theta + \left(Y - \tfrac{1}{2}\right) x\right), \tag{8}
\]
\[
\Sigma' = \left(\Sigma^{-1} + v^{-1} x x^\top\right)^{-1}. \tag{9}
\]
We apply the Sherman-Morrison-Woodbury formula (Golub and Van Loan 2012) to (8)-(9) and
obtain
\[
\theta' = \theta + \frac{v\left(Y - \tfrac{1}{2}\right) - x^\top \theta}{v + x^\top \Sigma x}\, \Sigma x, \tag{10}
\]
\[
\Sigma' = \Sigma - \frac{\Sigma x x^\top \Sigma}{v + x^\top \Sigma x}. \tag{11}
\]
This form substantially reduces the dimensionality of the optimization problem, as there is now
only a single parameter v to be determined. Aside from this computational convenience, we choose
this precise form for the posterior parameters because it resembles the updating equations used in
Bayesian linear regression. In a standard least-squares model y=x>β+ ε, normality assumptions
on β and the residual error ε induce normality of the posterior distribution of β given y and x.
Furthermore, the parameters of the posterior distribution can be computed recursively from the
prior parameters (Minka 2000) using an update that is very similar to (10)-(11). In our case, the
quantity v in (11) is exactly analogous to the variance of the residual error in linear regression,
while the quantity v(Y − 1
2
)replaces the continuous observation y.
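In code, the update (10)-(11) is a rank-one operation; a minimal sketch, assuming the variance parameter $v$ has already been chosen (Section 3.4 shows how), is:

```python
import numpy as np

def posterior_update(theta, Sigma, x, y, v):
    """Rank-one posterior update (10)-(11) after observing response y
    for features x. Analogous to Bayesian linear regression with
    observation v*(y - 1/2) and residual variance v."""
    Sx = Sigma @ x
    denom = v + x @ Sx
    theta_new = theta + (v * (y - 0.5) - x @ theta) / denom * Sx
    Sigma_new = Sigma - np.outer(Sx, Sx) / denom
    return theta_new, Sigma_new
```

Note that no matrix inversion is required: a single matrix-vector product dominates the cost, so the update remains cheap even with hundreds of features.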
Intuitively, this model treats $v\left(Y - \tfrac{1}{2}\right)$ as an observation of the log-odds of success for the next deal. Subtracting $\tfrac{1}{2}$ from $Y$ ensures that this observation can be both positive and negative, so that new wins cause us to increase the estimated win probability, while new losses shift the estimate downward. This is in line with the standard interpretation of logistic regression that positive coefficients lead to higher win probabilities. The parameter $v$ can be thought of as a user-specified measure of the accuracy of this observation (higher $v$ means lower accuracy).
In the literature, it has been fairly common to approach Bayesian logistic regression by forcing
it to resemble linear regression. The main issue is the choice of v, since there is no pre-specified
variance parameter in logistic regression. Spiegelhalter and Lauritzen (1990) proposed to use $v = p(1 - p)$, where $p$ is the predicted success probability for the feature vector $x$ using $\theta$ as the
regression coefficients. Jaakkola and Jordan (2000) proposed an improved method where a heuristic
recursive update was used for v. Under this rule, the posterior update in (10)-(11) was shown to
follow from a first-order Taylor series approximation of the non-normal density in (3).
We propose to calculate $v$ by optimizing the KL divergence, that is, $v^* = \arg\min_v D_{\mathrm{KL}}(Q \,\|\, P)$. Even with the simplified form of $\theta'$, $\Sigma'$, the expectation in (5) is not expressible in closed form.
Now, however, since we are solving a scalar optimization problem, gradient-based methods are an
effective way to find v.
3.4. Gradient-based optimization of the KL divergence
We estimate the gradient of the KL divergence using infinitesimal perturbation analysis (IPA; see
Fu 2008). If Q is the distribution of β, we can write
\[
\log\left(1 + e^{-H(\beta; Y)}\right) = \log\left(1 + e^{-(2Y - 1)\left(x^\top \theta' + \sqrt{x^\top \Sigma' x}\, Z\right)}\right),
\]
where $Y \in \{0, 1\}$ is fixed and $Z \sim \mathcal{N}(0, 1)$. For a fixed sample path $\omega$, we now write
\[
\nabla_v \log\left(1 + e^{-H(\beta(\omega); Y)}\right) = -\frac{(2Y - 1)\, e^{-H(\beta(\omega); Y)}}{1 + e^{-H(\beta(\omega); Y)}}\, \nabla_v\!\left(x^\top \theta' + \sqrt{x^\top \Sigma' x}\, Z(\omega)\right), \tag{12}
\]

where

\[
\nabla_v\!\left(x^\top \theta' + \sqrt{x^\top \Sigma' x}\, Z(\omega)\right) = \frac{\left(Y - \tfrac{1}{2}\right) x^\top \Sigma x + x^\top \theta}{\left(v + x^\top \Sigma x\right)^2}\, x^\top \Sigma x + \frac{\left(x^\top \Sigma x\right)^2}{\left(v + x^\top \Sigma x\right)^2}\, Z(\omega). \tag{13}
\]
The next result shows that the sample path (IPA) derivative is an unbiased estimator of the gradient of the expectation in (5).
Proposition 2. $\nabla_v \mathbb{E}_Q\!\left[\log\left(1 + e^{-H(\beta; Y)}\right)\right] = \mathbb{E}_Q\!\left[\nabla_v \log\left(1 + e^{-H(\beta; Y)}\right)\right]$.
Proof: We can directly verify the conditions given in Proposition 1 of L’Ecuyer (1995) for the
interchange between the gradient and the expectation. First, for any $\omega$, the gradient in (12)-(13) is continuous at all $v \ge 0$. Second, for any $\omega$, the above gradient exists for all $v \ge 0$. Third, for any fixed $v \ge 0$, the above gradient exists for all $\omega$. Finally, we observe that, for any $v$,

\[
\left|\nabla_v \log\left(1 + e^{-H(\beta(\omega); Y)}\right)\right| \le \frac{\left|x^\top \Sigma x + x^\top \theta\right|}{\left|x^\top \Sigma x\right|} + |Z(\omega)|,
\]

whence $\mathbb{E}_Q \sup_v \left|\nabla_v \log\left(1 + e^{-H(\beta; Y)}\right)\right| < \infty$. It is therefore valid to interchange the gradient and the expectation. Q.E.D.
The IPA estimator for fixed $v$ can be constructed as follows. Given fixed $\theta$, $\Sigma$, $x$, and $Y$, we calculate $\theta'$ and $\Sigma'$ using (10) and (11). Then, we simulate $Z \sim \mathcal{N}(0, 1)$ and calculate $\beta^\top x = x^\top \theta' + \sqrt{x^\top \Sigma' x} \cdot Z$. The stochastic component of the estimator of $\nabla_v D_{\mathrm{KL}}(Q \,\|\, P)$ is given by

\[
G = -\frac{(2Y - 1)\, e^{-H(\beta; Y)}}{1 + e^{-H(\beta; Y)}} \left[\frac{\left(Y - \tfrac{1}{2}\right) x^\top \Sigma x + x^\top \theta}{\left(v + x^\top \Sigma x\right)^2}\, x^\top \Sigma x + \frac{\left(x^\top \Sigma x\right)^2}{\left(v + x^\top \Sigma x\right)^2}\, Z\right].
\]
To obtain the deterministic component, we return to (5) and differentiate $h$. The terms in (6) can be rewritten as

\[
\operatorname{tr}\left(\Sigma^{-1} \Sigma'\right) = \operatorname{tr}\left(I - \frac{x x^\top \Sigma}{v + x^\top \Sigma x}\right),
\]
\[
(\theta - \theta')^\top \Sigma^{-1} (\theta - \theta') = \left(\frac{v\left(Y - \tfrac{1}{2}\right) - x^\top \theta}{v + x^\top \Sigma x}\right)^2 x^\top \Sigma x,
\]
\[
\log |\Sigma'| = \log \left|\left(\Sigma^{-1} + v^{-1} x x^\top\right)^{-1}\right|,
\]

whence

\[
\nabla_v \operatorname{tr}\left(\Sigma^{-1} \Sigma'\right) = \frac{\operatorname{tr}\left(x x^\top \Sigma\right)}{\left(v + x^\top \Sigma x\right)^2},
\]
\[
\nabla_v (\theta - \theta')^\top \Sigma^{-1} (\theta - \theta') = 2 \left(\frac{v\left(Y - \tfrac{1}{2}\right) - x^\top \theta}{v + x^\top \Sigma x}\right) \frac{\left(Y - \tfrac{1}{2}\right) x^\top \Sigma x + x^\top \theta}{\left(v + x^\top \Sigma x\right)^2} \left(x^\top \Sigma x\right)^2,
\]

and

\[
\nabla_v \log |\Sigma'| = -\operatorname{tr}\left(\left(\nabla_v\left(\Sigma^{-1} + v^{-1} x x^\top\right)\right) \Sigma'\right) = \frac{1}{v^2} \operatorname{tr}\left(x x^\top \left(\Sigma - \frac{\Sigma x x^\top \Sigma}{v + x^\top \Sigma x}\right)\right) = \frac{1}{v} \cdot \frac{1}{v + x^\top \Sigma x} \operatorname{tr}\left(x x^\top \Sigma\right).
\]
The final form for the IPA estimator is given by

\[
\widehat{\nabla_v D_{\mathrm{KL}}}(Q \,\|\, P) = \left(\frac{v\left(Y - \tfrac{1}{2}\right) - x^\top \theta}{v + x^\top \Sigma x}\right) \frac{\left(Y - \tfrac{1}{2}\right) x^\top \Sigma x + x^\top \theta}{\left(v + x^\top \Sigma x\right)^2} \left(x^\top \Sigma x\right)^2 - \frac{x^\top \Sigma x}{2v} \cdot \frac{\operatorname{tr}\left(x x^\top \Sigma\right)}{\left(v + x^\top \Sigma x\right)^2} + G,
\]

and it follows from Proposition 2 that

\[
\nabla_v D_{\mathrm{KL}}(Q \,\|\, P) = \mathbb{E}\left(\widehat{\nabla_v D_{\mathrm{KL}}}(Q \,\|\, P)\right).
\]
We can now apply the Robbins-Monro stochastic approximation method (Kushner and Yin 2003)

\[
v_{k+1} = v_k - \alpha_k \widehat{\nabla_{v_k} D_{\mathrm{KL}}}(Q \,\|\, P), \tag{14}
\]

which is guaranteed to converge to $v^*$ from an arbitrary starting point under suitable conditions on the stepsize $\alpha_k$. The value obtained from this procedure can then be plugged into (10) and (11) to determine the parameters of the approximate posterior distribution.
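A sketch of this procedure follows: each iteration combines the stochastic component $G$ with the deterministic terms above, and the recursion (14) is run with step sizes $\alpha_k = a/(k+1)$. The starting point, step-size constant, and iteration count below are illustrative assumptions, not tuned values from the paper.

```python
import numpy as np

def ipa_gradient(v, theta, Sigma, x, y, rng):
    """One-sample IPA estimate of the KL-divergence gradient in v,
    following (12)-(13) and the deterministic terms above."""
    xSx = x @ Sigma @ x                     # x' Sigma x
    xth = x @ theta                         # x' theta
    denom = v + xSx
    # Sample x'beta under the candidate posterior (10)-(11)
    mean = xth + (v * (y - 0.5) - xth) / denom * xSx
    var = xSx - xSx**2 / denom
    z = rng.standard_normal()
    H = (2*y - 1) * (mean + np.sqrt(var) * z)
    # Stochastic component G; note e^{-H}/(1+e^{-H}) = 1/(1+e^{H})
    dlin = ((y - 0.5) * xSx + xth) / denom**2 * xSx + xSx**2 / denom**2 * z
    G = -(2*y - 1) * dlin / (1.0 + np.exp(H))
    # Deterministic component: gradient of h
    det = ((v * (y - 0.5) - xth) / denom) \
        * ((y - 0.5) * xSx + xth) / denom**2 * xSx**2 \
        - xSx / (2.0 * v) * xSx / denom**2
    return det + G

def optimal_v(theta, Sigma, x, y, v0=1.0, iters=500, a=1.0, seed=0):
    """Robbins-Monro recursion (14), projected to keep v positive."""
    rng = np.random.default_rng(seed)
    v = v0
    for k in range(iters):
        v = max(v - a / (k + 1) * ipa_gradient(v, theta, Sigma, x, y, rng), 1e-8)
    return v
```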
4. Price optimization in the multi-stage problem
We now apply our approximate Bayesian framework to the multi-stage pricing problem. Suppose that we have a sequence of deals, where $x^n$, $n = 0, 1, \ldots, N$, denotes the features of the $(n+1)$st deal (including the quoted price $p^n$), and $Y^{n+1}$ is the buyer's response. We use different time indices to express the fact that the response is observed only after the features (and the price) have been fixed. The seller's initial beliefs are represented by a multivariate normal distribution with the prior parameters $(\theta^0, \Sigma^0)$, which may be calibrated based on historical data (see Section 5).
Suppose now that, after the first $n$ deals have been observed, the seller's beliefs are represented by a multivariate normal distribution with parameters $(\theta^n, \Sigma^n)$. The features $x^n$ of the next deal become known to the seller, a price $p^n$ is quoted, and the response $Y^{n+1}$ is observed. We now apply approximate Bayesian inference and assume that the new posterior distribution of $\beta$, taking into account the new information $Y^{n+1}$, is normal. The parameters of this distribution are obtained from the recursive update (10)-(11), with the variance parameter $v$ computed using the procedure
in Section 3.4. We then proceed to the next deal under the assumption that the seller’s belief distri-
bution continues to be normal. In this way, approximate Bayesian inference is applied sequentially.
Every new iteration introduces an additional degree of approximation, but the learning mechanism
is computationally efficient, and we maintain the ability to model and update our uncertainty
about β. We now show how price optimization can be integrated into this framework.
4.1. Definition of Bayes-greedy prices
The seller's pricing decisions are adaptive, so that $p^n$ may depend on the posterior parameters $(\theta^n, \Sigma^n)$, as well as on the other features of $x^n$. The seller's decision is to choose a pricing policy,
which can be represented as a function $\pi$ mapping $(\theta^n, \Sigma^n, x^n)$ to a price $p^n \ge 0$. The optimal policy maximizes the objective function

\[
\sup_\pi \mathbb{E}^P \sum_{n=0}^{N} R(p^n; x^n, \beta), \tag{15}
\]

where we take an additional expectation of the expected revenue since $\beta$ is random and the price $p^n$ is not known until $n$ deals have been observed. The notation $\mathbb{E}^P$ means that the expected value is taken with respect to the probability measure $P$ induced by the approximate Bayesian model.
It is clear that (15) is intractable even for small $N$, since our distribution of belief is characterized by $M^2 + M$ continuous parameters, and furthermore, the sequence $\{x^n\}_{n=0}^N$ of deals is not known to
the seller in advance. In fact, the seller has very little information about the process that generates
the features xn of each deal; modeling this process is substantially more difficult than modeling
uncertainty about the regression coefficients, and is outside the scope of this paper. However, since
the regression features xn become known just before we choose the price for that deal, it is possible
to design a myopic policy that seeks to maximize the revenue from the deal without looking ahead
to future deals. Myopic policies have also attracted attention in recent literature (e.g., Harrison
et al. 2012) because they can be shown to possess asymptotic optimality properties in some cases.
Since we primarily deal with short time horizons in our application, we focus on developing a
myopic policy that is computationally tractable and will perform well in practice.
Recall that, ideally, the seller would like to maximize the true revenue curve by choosing the price

\[
p^{*,n} = \arg\max_{p \ge 0} \frac{p}{1 + e^{-(x^n)^\top \beta}},
\]

where $x^n$ is a deterministic function of $p$. Since $\beta$ is unknown, a standard definition for a myopic policy is given by

\[
p^n = \arg\max_{p \ge 0} \frac{p}{1 + e^{-(x^n)^\top \theta^n}}, \tag{16}
\]
where $\theta^n$ is the current vector of regression coefficients. This approach is used in frequentist models (e.g., in Broder and Rusmevichientong 2012), where $\theta^n$ is computed using maximum likelihood
estimation (in other words, frequentist logistic regression). If $x^n$ depends linearly on the price, (16) has a closed-form expression in terms of the Lambert W-function (Li and Huh 2011).
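For intuition, in the scalar case $\rho(p) = 1/(1 + e^{-(a + bp)})$ with $b < 0$, solving the first-order condition of (16) gives $p^* = (1 + W(e^{a-1}))/(-b)$; a sketch under these assumptions (cf. Li and Huh 2011) is:

```python
import numpy as np
from scipy.special import lambertw

def myopic_price(a, b):
    """Closed-form maximizer of p / (1 + exp(-(a + b*p))) for b < 0.

    From the first-order condition: p* = (1 + W(e^{a-1})) / (-b),
    where W is the principal branch of the Lambert W-function."""
    assert b < 0, "price sensitivity must be negative"
    return (1.0 + lambertw(np.exp(a - 1.0)).real) / (-b)
```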
However, we argue that this approach may underperform here, because it does not use all of
the available information. The value of the Bayesian model is that it enables us to quantify the
decision-maker’s uncertainty about the regression coefficients, which should be leveraged when
making the pricing decision. We define the Bayes-greedy price
\[
p^n = \arg\max_{p \ge 0} \mathbb{E}^n_\beta\!\left(\frac{p}{1 + e^{-(x^n)^\top \beta}}\right), \tag{17}
\]
where the expectation is taken with respect to the distribution $\beta \sim \mathcal{N}(\theta^n, \Sigma^n)$ obtained through approximate Bayesian inference. Because the revenue function $R$ is nonlinear, (16) and (17) yield different prices even for the same values of $x^n$ and $\theta^n$. The Bayes-greedy price takes uncertainty
into account by integrating over the entire space of possible revenue curves. The next result shows
that the Bayesian estimate of the revenue is quasi-concave, which implies that it has a single global
maximum at the Bayes-greedy price.
Proposition 3. Suppose that $x^n$ is linear in the price $p$. Then, the Bayes-greedy revenue function $\mathbb{E}^n_\beta R(p; x^n, \beta)$ is quasi-concave in $p$ when $p \ge 0$.
Proof: It is straightforward to show that $R(p; x^n, \beta)$ is log-concave in $p$ for fixed $\beta$. The expectation
is taken over a multivariate normal density, which is log-concave. The product of log-concave
functions is log-concave. From Brascamp and Lieb (1976), the integral of a log-concave function is
also log-concave. The result follows. Q.E.D.
4.2. Computation of Bayes-greedy prices
We now discuss the solution of the Bayes-greedy price optimization problem in (17). Since this
procedure only depends on n through the posterior parameters, we drop the time index in the
following for notational convenience. Under the Bayesian assumption $\beta \sim \mathcal{N}(\theta, \Sigma)$, we have

\[
x^\top \beta \sim \mathcal{N}\!\left(x^\top \theta,\; x^\top \Sigma x\right).
\]
Consequently, the revenue function can be rewritten as

\[
R(p; x, \beta) = \frac{p}{1 + e^{-x^\top \theta - \sqrt{x^\top \Sigma x} \cdot Z}},
\]
where $Z \sim \mathcal{N}(0, 1)$. The normality assumption considerably simplifies the computation of the Bayes-greedy price, since (17) now requires us to optimize an expectation over a scalar probability distribution. This expectation is known in statistics as the logistic-normal integral (Demidenko 2013), and cannot be expressed in closed form.² However, we observe that IPA can again be used to optimize it. Since the win probability $\rho(x, \beta)$ is continuous, differentiable, and bounded in $p$, it is straightforward to show (similarly to Proposition 2) that the relevant conditions for the validity of the IPA estimator hold, whence

\[
\nabla_p \mathbb{E}_\beta R(p; x, \beta) = \mathbb{E}_\beta\!\left(\nabla_p \frac{p}{1 + e^{-x^\top \beta}}\right).
\]
For a fixed sample path $\omega$, we write

\[
\nabla_p R(p; x, \beta(\omega)) = \frac{1}{1 + e^{-x^\top \theta - \sqrt{x^\top \Sigma x} \cdot Z(\omega)}} + \frac{p\, e^{-x^\top \theta - \sqrt{x^\top \Sigma x} \cdot Z(\omega)}}{\left(1 + e^{-x^\top \theta - \sqrt{x^\top \Sigma x} \cdot Z(\omega)}\right)^2}\, \nabla_p\!\left(x^\top \theta + \sqrt{x^\top \Sigma x} \cdot Z(\omega)\right). \tag{18}
\]
To make this expression more explicit, we need to specify the dependence of x on the price. Suppose
that this dependence is linear, that is, x can be partitioned as
\[
x = \left[x^f,\; p \cdot x^p\right]^\top,
\]
where $x^f$ is a vector of features whose values are known to the seller and not dependent on $p$, and $x^p$ is another fixed vector of features related to the price sensitivity. Thus, each component of $x$ either depends linearly on $p$, or does not depend on $p$ at all. In the simplest possible example, $x^f_i$ may be a dummy variable which equals 1 if the buyer is asking for a certain specific product. We may then have a different feature $x^p_j = x^f_i$ for some $j$, so that our model includes the base effect of the product on the win probability, as well as a specific price sensitivity for that product.

² A closed-form approximation is available in Crooks (2009). However, we found that it was less reliable than IPA for price optimization.

We can then partition
\[
\theta = \left[\theta^f, \theta^p\right]^\top, \qquad \Sigma = \begin{bmatrix} \Sigma^{ff} & \Sigma^{fp} \\ \Sigma^{pf} & \Sigma^{pp} \end{bmatrix}.
\]

In this case,
\[
\nabla_p\!\left(x^\top \theta + \sqrt{x^\top \Sigma x} \cdot Z(\omega)\right) = (x^p)^\top \theta^p + \frac{(x^p)^\top \Sigma^{pf} x^f + p\, (x^p)^\top \Sigma^{pp} x^p}{\sqrt{x^\top \Sigma x}}\, Z(\omega). \tag{19}
\]
The IPA gradient $\widehat{\nabla_p R}(p; x, \beta)$ is obtained by generating $Z \sim \mathcal{N}(0, 1)$ and substituting this quantity for $Z(\omega)$ in (18) and (19). The optimal price is found by iterating

\[
p^{k+1} = p^k + \alpha_k \widehat{\nabla_{p^k} R}\left(p^k; x, \beta\right). \tag{20}
\]

Due to Proposition 3, this procedure converges to the Bayes-greedy price.
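A sketch of the ascent (20) under the partitioned-feature assumption of this section is given below; the step sizes and iteration count are illustrative choices, and $\Sigma^{pf} = (\Sigma^{fp})^\top$ is used to avoid storing both off-diagonal blocks.

```python
import numpy as np

def bayes_greedy_price(theta, Sigma, xf, xp, p0=1.0, iters=2000, seed=0):
    """Stochastic gradient ascent (20) on the Bayes-greedy objective (17),
    with x = [xf, p * xp]; a sketch, not a tuned implementation."""
    nf = len(xf)
    th_f, th_p = theta[:nf], theta[nf:]
    Sff, Sfp, Spp = Sigma[:nf, :nf], Sigma[:nf, nf:], Sigma[nf:, nf:]
    rng = np.random.default_rng(seed)
    p = p0
    for k in range(1, iters + 1):
        mean = xf @ th_f + p * (xp @ th_p)                      # x' theta
        xSx = (xf @ Sff @ xf + 2.0 * p * (xf @ Sfp @ xp)
               + p**2 * (xp @ Spp @ xp))                        # x' Sigma x
        z = rng.standard_normal()
        e = np.exp(-(mean + np.sqrt(xSx) * z))
        # gradient of the exponent, equation (19)
        dlin = xp @ th_p + (xf @ Sfp @ xp + p * (xp @ Spp @ xp)) / np.sqrt(xSx) * z
        grad = 1.0 / (1.0 + e) + p * e / (1.0 + e)**2 * dlin    # IPA gradient (18)
        p = max(p + p0 / k * grad, 0.0)                         # ascent step (20)
    return p
```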
We can now summarize our entire framework for price optimization and statistical estimation.
Suppose that we have already observed outcomes from $n$ deals and constructed the belief parameters $(\theta^n, \Sigma^n)$. For the $(n+1)$st deal, we are given the features $x^{f,n}$, $x^{p,n}$. We then carry out the following steps:
1. Apply procedure (20) to find the Bayes-greedy price;
2. Implement the price $p^n$ that is returned by this procedure (i.e., quote the price to the buyer);
3. Observe the response $Y^{n+1}$;
4. Apply procedure (14) to find the optimal variance parameter $v^n$;
5. Calculate $(\theta^{n+1}, \Sigma^{n+1})$ from (10)-(11).
This process is repeated for $n = 0, 1, \ldots, N$; a minimal sketch of this loop follows.
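The sketch below ties the steps together; `posterior_update`, `optimal_v`, and `bayes_greedy_price` refer to the sketches in Sections 3.3, 3.4, and 4.2, and `observe_response` is a hypothetical stand-in for the buyer's decision.

```python
import numpy as np

def pricing_loop(theta, Sigma, deals, observe_response):
    """Sequential Bayes-greedy pricing: steps 1-5 for each deal.

    `deals` yields (xf, xp) feature pairs; `observe_response(x, p)`
    returns the binary outcome Y for the quoted price."""
    revenue = 0.0
    for xf, xp in deals:
        p = bayes_greedy_price(theta, Sigma, xf, xp)            # steps 1-2
        x = np.concatenate([xf, p * xp])
        y = observe_response(x, p)                              # step 3
        revenue += p * y
        v = optimal_v(theta, Sigma, x, y)                       # step 4
        theta, Sigma = posterior_update(theta, Sigma, x, y, v)  # step 5
    return theta, Sigma, revenue
```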
5. Empirical study
We evaluated the proposed methods on a historical dataset provided by Vendavo. Our goal is to
validate two different dimensions of our work: 1) the approximate Bayesian statistical model, which
can be used for estimation and prediction even when price optimization is not required; 2) the
Bayes-greedy pricing policy. Section 5.1 describes the data and the pre-screening procedure used to
reduce the feature space. Section 5.2 compares the approximate Bayesian model with benchmarks
on purely statistical metrics, where the goal is accurate prediction rather than price optimization.
Section 5.3 presents a qualitative comparison of the Bayes-greedy prices with the historical prices
in the data, and Section 5.4 gives the results of simulations evaluating the potential of different
pricing strategies for maximizing revenue.
5.1. Data description and model selection
Historical transaction data were provided by Vendavo in anonymized form. The part of the data
used to train the model consisted of 50,000 individual observations (both historical wins and
historical losses were recorded). The available information included categorical features for product
and customer types; at the most detailed level, there were 1881 different products and 2051 different
customers. The product types were aggregated hierarchically on four levels; each product ID was also assigned to a ProductLevel1, ProductLevel2, and ProductLevel3, with ProductLevel1
representing the coarsest aggregation (containing the most products). Similarly, the customer types
were organized into a hierarchy with two levels. The data also included the quantity of product
stated in the deal, the historical unit price that was quoted, the geographical location of the
customer, the manufacturing plant where the product was made, the channel involved in the sale,
and the ID of the sales representative.
In addition, we built additional features that reflect the popularity of the products and customers’
willingness to purchase them on an aggregate level. These include: 1) the average win rate for each
product; 2) the average win rate aggregated by ProductLevel1; 3) the average unit price for each
product; 4) the average win rate for each customer; and 5) the average win rate aggregated by the
top customer level. We also included interaction terms between the price and the ProductLevel1
variables, modeling heterogeneous price sensitivities.
Altogether, the initial model contained over 5,000 features. However, many of these features
are unlikely to be statistically significant. For example, many individual product IDs appear very
rarely, in only a small number of deals. In many of these cases, we will never obtain enough
data to confirm that the presence of these products has significant effects on the win probability;
furthermore, even if these products are in fact significant, they simply do not appear frequently
enough to exert a heavy impact on revenue. However, it may well be the case that these products
belong to a significant ProductLevel1 or ProductLevel2.
With a large number of features, the cost of estimating a statistical model may also become
prohibitive. For all of these reasons, we first performed statistical screening using the Lasso
method (Tibshirani 1996, Roth 2004) to eliminate features that are unlikely to be correlated with
the response variable. This method applies a regularization penalty to the standard maximum-
likelihood estimation approach. Given a design matrix X and a response vector y, one solves the
problem
\[
\hat{\beta} = \arg\min_\beta \left\{-\log L(\beta; X, y) + \lambda \|\beta\|_1\right\}, \tag{21}
\]
where $L$ is the usual likelihood function for logistic regression. The penalty function $\|\beta\|_1$ is non-differentiable at zero, which causes $\beta_i$ to shrink to zero if the $i$th feature does not sufficiently
improve the likelihood. The parameter λ controls the tradeoff between model accuracy and model
size, and is typically chosen to optimize some statistical criterion. In our case, we use the area
under the ROC curve, which is widely used to measure the quality of a statistical model when the
response is binary with a low proportion of 1s (Smithson and Merkle 2013).
The regularized problem (21) simultaneously performs model selection (eliminating insignificant
features) and estimation (calculating regression coefficients β). However, recent theoretical work
(Belloni and Chernozhukov 2013) has shown that the coefficients directly obtained from Lasso can
be biased. For this reason, practitioners are recommended to use Lasso for screening, remove all
insignificant features (that is, features $i$ with $\beta_i = 0$) from the model, and then refit the coefficients
of the remaining features using standard logistic regression or some other technique.
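A sketch of this screen-then-refit workflow, using scikit-learn's L1-penalized logistic regression (in which the penalty is parameterized as $C = 1/\lambda$), is shown below; the candidate grid and the use of a held-out set for the AUC criterion are our assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def lasso_screen(X_train, y_train, X_val, y_val, lambdas):
    """Solve (21) over a grid of penalties and keep the nonzero features
    at the AUC-maximizing lambda; a sketch of the screening step."""
    best_auc, keep = -np.inf, None
    for lam in lambdas:
        model = LogisticRegression(penalty="l1", C=1.0 / lam,
                                   solver="liblinear", max_iter=1000)
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        if auc > best_auc:
            best_auc, keep = auc, np.flatnonzero(model.coef_[0])
    return keep  # indices of retained features, to be refit downstream
```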
At this point, our model enters the picture. From Lasso, we obtain a total of 188 selected features,
including the unit price, 4 ProductLevel1s, 17 ProductLevel2s, 23 ProductLevel3s, 15 Plants, 16
interactions between ProductLevel1s and the unit price, all the engineered features, and various
customer types. All other features are now removed from the model, so x now has 188 elements.
We can then apply our approximate Bayesian procedure from Section 3 to the training data in
order to estimate the regression coefficients and covariance matrix.
5.2. Statistical quality of the approximate Bayesian model
Our first test seeks to evaluate the approximate Bayesian model in a purely statistical sense.
This comparison can be carried out based purely on the historical data, and does not involve
any price optimization. The goal is simply to gauge the predictive power of the model given
pre-specified historical prices. This is not directly related to the seller’s objective of maximizing
revenue, since even a poorly-specified model could potentially yield good price recommendations
(Besbes and Zeevi 2015). Nevertheless, this issue is important for understanding the quality of the
approximations used in the model.
We considered three statistical methods:
1. Bayesian logistic regression with KL minimization (KL). This is the proposed method
described in Section 3, where the posterior update is made to resemble linear regression, and the
variance parameter is chosen to optimize the KL divergence.
2. Bayesian logistic regression with variational bound (VB). This is the technique proposed by
Jaakkola and Jordan (2000), which also forces the learning mechanism to resemble linear regression,
but uses a heuristic to choose the variance parameter. This benchmark allows us to quantify the
value of optimizing the parameter.
3. Frequentist logistic regression (LR). We also implemented classical frequentist LR with
maximum-likelihood estimation. This benchmark allows us to quantify the value of including uncer-
tainty in the form of a Bayesian prior.
Each of these methods began with the same training data after screening (that is, each method had
to fit 188 coefficients). Each model was then evaluated on a separate test dataset containing 19,385
observations. We calculated five different performance metrics for each model, as summarized in
Table 1.
The Accuracy metric is the percentage of correct predictions made by classifying data points as
wins if their predicted win probability is over 0.5. In this particular application, this metric is less
insightful as both the training and test data are imbalanced, that is, most of the deals were losses.
Therefore, a naive model that predicts all deals to be losses might appear to perform well in terms
of accuracy. The area under the ROC curve (AUC) is a better metric when considering unbalanced
binary response values. The F1 score includes both precision and recall in the calculation, but does
not consider the true negative rate.
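For reference, all five metrics can be computed from predicted win probabilities as in the sketch below (thresholding at 0.5, as in our Accuracy definition):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, win_prob, threshold=0.5):
    """Metrics reported in Table 1, from predicted win probabilities."""
    y_pred = (np.asarray(win_prob) >= threshold).astype(int)
    return {"Accuracy": accuracy_score(y_true, y_pred),
            "AUC": roc_auc_score(y_true, win_prob),  # threshold-free
            "F1 Score": f1_score(y_true, y_pred),
            "Precision": precision_score(y_true, y_pred),
            "Recall": recall_score(y_true, y_pred)}
```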
We see that frequentist LR has the best AUC on both the training and test data. Nonetheless,
AUC scores are fairly close for all three policies. Furthermore, the VB method has the best recall
and F1 score (but also the worst precision and AUC). The KL model is generally situated between
the two. Both Bayesian models have better recall than LR, meaning that they make fewer false
negative predictions (i.e., are less likely to predict historical wins as losses).
The similarities between models may be more surprising than the differences. Recall that the
KL model uses two layers of approximations: first, we force the non-normal posterior to be normal
(Section 3.2), and second, we force the parameters of that posterior to resemble linear regression
(Section 3.3). By contrast, the LR method calculates the exact maximum-likelihood estimate of the
regression coefficients. Nonetheless, the value of incorporating uncertainty into the model, in the
form of the covariance matrix, largely outweighs the loss incurred by using these approximations,
producing a very similar AUC. Our conclusion from Table 1 is that all three models are fairly
competitive in terms of statistical predictive power; we experimented with several different ways
of generating the training and test sets, and found that KL was consistently close to LR. The real
value of Bayesian uncertainty becomes evident when we move from prediction to optimization.
Table 1: Performance metrics from three statistical models on the training and test data.

Metric     | Training Data       | Test Data
           | LR     KL     VB    | LR     KL     VB
Accuracy   | 0.867  0.863  0.858 | 0.828  0.825  0.823
AUC        | 0.871  0.858  0.842 | 0.851  0.839  0.827
F1 Score   | 0.445  0.441  0.471 | 0.439  0.439  0.479
Precision  | 0.643  0.616  0.566 | 0.643  0.616  0.591
Recall     | 0.340  0.343  0.403 | 0.333  0.341  0.403
5.3. Comparison with historical prices
We now begin to connect the models back to the price optimization problem. Each of the three
models in Section 5.2 can be used to calculate a recommended price for any new deal, given a fixed
set of features xf , xp. For the LR model, we use the simple myopic policy of (16). The Bayes-greedy
policy can be used by KL and VB, since both are Bayesian models. For each of the deals in the
test set, we calculated the recommended price for each model, and compared it to the historical
price in the data.
This comparison can provide qualitative arguments for or against a model. The historical prices
may not be optimal, but we expect them to be realistic. If any model recommends prices that
are consistently, unreasonably higher than historical, the model is most likely not useful for price
optimization. Additional insight can be obtained by separately considering the historical wins and
losses. Intuitively, if the historical price led to a loss, we expect that the price was too high, and
that the optimal price should be lower. Likewise, if the historical price led to a win, the optimal
price may have been higher.
However, we may not see such a clean separation across the whole dataset. Many of the products
appear very rarely in the data and have few wins, if any. For such products, any model will have
trouble distinguishing between wins and losses. The models will learn better for those product
types that appear reasonably often and have sufficiently many wins. For this reason, we conducted
this comparison for three different ProductLevel1s using data from the test set. Figure 1 shows the
empirical distributions of the difference between the recommended and historical prices (positive
values mean higher recommended prices, negative values mean higher historical prices) for each
model across these three types.
We observe the following behaviours:
• The LR model tends to recommend prices that are much higher than historical (the peaks of
the distributions appear to the right of zero). This is generally the case for both wins and losses.
• The VB model tends to recommend prices that are close to historical (the peaks are close to
zero), for both wins and losses.
Figure 1: Differences between recommended and historical prices for selected ProductLevel1s. Panels (a)-(c) show the first, second, and third ProductLevel1.
• The KL model tends to recommend prices that are close to or below historical for losses, but
higher than historical for wins.
This suggests that the KL model, in conjunction with Bayes-greedy pricing, has better potential
than the other models for price optimization, as it has a better ability to detect opportunities for
additional revenue. This is explored further in the next test.
5.4. Expected revenues based on simulated buyers
We now compare the revenues generated by different statistical and optimization models. This is
the main metric of interest for the seller. However, it is less clear how such a comparison should
be designed: unlike the purely statistical comparison in Section 5.2, it cannot be carried out based
purely on the historical data. The reason is that, for any observation, the distribution of the
response Y depends on the price. We only know the response for the historical price, and there is
no way to go back and redo the same deal with the same customer using a different price offer.
For this reason, we compare different pricing methods using a simulation model, where we use
the data to generate win probabilities for a sequence of buyers, then simulate their responses for
different prices. However, the mechanism for generating these customers must necessarily come from
some statistical model, which itself is one of the main research questions of this paper. For example,
if we use classical logistic regression to fit the demand curve used to generate the customers, this
may bias the results in favour of LR-based policies.
Our approach to this issue is to use multiple simulation models:
1. Frequentist model. Win probabilities are generated from (1), where the true coefficients β
are fixed, but unknown to any of the pricing policies. These values of β are obtained by fitting
a frequentist LR model to a large set of data (70,000 deals). The prior coefficients used by the
policies are fit using a smaller training dataset, so they may be quite different from the true β
values. Nonetheless, since the true demand curve here is itself a fitted LR model, one may expect LR-based methods to do better in this setting.
2. Bayesian model (trained). We first fit the KL model to the training data to obtain the prior
parameters (θ0,Σ0). Then, we run 1000 macroreplications, each of which generates a set of values
β ∼ N(θ0, Σ0). These values are fixed within a given macroreplication and plugged into (1) to
calculate win probabilities. This approach may favour the Bayesian model, since the true coefficients
are drawn from the prior. However, within each individual macroreplication, the values of β may
be quite different from the prior coefficients θ0.
3. Bayesian model (noisy). This approach is similar to the previous one. However, after the true
coefficients β are generated, we replace the prior covariance matrix Σ0 by a diagonal matrix. In
this way, the Bayesian methods are given less information about the correlations and start with
less accurate beliefs, making it more important to learn quickly.
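For concreteness, the following sketch (in Python; the names setup_macroreplication, theta0, Sigma0, and beta_lr are hypothetical and not part of our implementation) shows how the hidden coefficients β and the prior covariance handed to the policies might be set up under each of the three models:

import numpy as np

rng = np.random.default_rng(seed=0)

def setup_macroreplication(model, theta0, Sigma0, beta_lr):
    """Return (true coefficients, prior covariance given to the policies)
    for one macroreplication. theta0, Sigma0 come from fitting the KL
    model to the training data; beta_lr is a frequentist LR fit to the
    large (70,000-deal) dataset."""
    if model == "frequentist":
        # Model 1: the truth is the fixed LR fit, unknown to the policies.
        return beta_lr, Sigma0
    beta = rng.multivariate_normal(theta0, Sigma0)  # models 2 and 3
    if model == "noisy":
        # Model 3: the policies see only diag(Sigma0), i.e., they start
        # with no information about correlations between coefficients.
        return beta, np.diag(np.diag(Sigma0))
    return beta, Sigma0  # model 2 (trained)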
In each of these cases, the various pricing methods start with a prior obtained from the training
data, and are implemented for 100 simulated deals. The features xf, xp of each deal are chosen
by randomly sampling (bootstrapping) a row of data from the test set. The price p can then be
calculated based on these features and the current beliefs about the coefficients, and the response
is generated by plugging xf, xp, p, and the hidden values β into (1). If the price is accepted, the
policy generates p dollars in revenue; otherwise, no revenue is earned. The beliefs are then updated
using the appropriate statistical method.
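A minimal sketch of this loop (recommend_price and update_beliefs are hypothetical stand-ins for a pricing policy and the corresponding belief update, and the assembly of the feature vector from xf, xp, and p is a simplification) is:

import numpy as np

rng = np.random.default_rng(seed=0)

def win_probability(x, beta):
    # Logistic demand curve, as in (1).
    return 1.0 / (1.0 + np.exp(-x @ beta))

def simulate_revenue(recommend_price, update_beliefs, beliefs,
                     beta_true, test_rows, n_deals=100):
    """One macroreplication: bootstrap n_deals rows of features, price
    each deal from the current beliefs, draw a Bernoulli win/loss from
    the hidden true coefficients, and update the beliefs."""
    total_revenue = 0.0
    for _ in range(n_deals):
        row = test_rows[rng.integers(len(test_rows))]  # bootstrap a deal
        p = recommend_price(row, beliefs)              # price from current beliefs
        x = np.append(row, p)                          # simplified feature assembly
        won = rng.random() < win_probability(x, beta_true)
        if won:
            total_revenue += p                         # revenue only if accepted
        beliefs = update_beliefs(beliefs, x, won)
    return total_revenue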
We implemented the following combinations of statistical and optimization schemes (a small driver sketch follows this list):
• Our approximate Bayesian learning scheme is implemented with both the Bayes-greedy (KL-
Bayes) and frequentist (KL-Freq) pricing policies. That is, we use (10)-(11) to update after each
deal, but the recommended prices are computed from (17) and (16), respectively. We also imple-
mented a version where the KL model was used to fit the prior, but this prior was not updated
during the 100 deals (“no learning” or KL-Bayes-NL). The Bayes-greedy policy was used for pric-
ing. Note that, if the customers are homogeneous (that is, x = [1, p]⊤), KL-Bayes-NL will always
pick the same price. However, since our data are highly heterogeneous, this was not the case in
our simulations. By including this version, we can measure the value added by continuing to learn
after we have fit the prior.
• The VB model of Jaakkola and Jordan (2000) is implemented with the Bayes-greedy policy
(VB-Bayes).
Figure 2 Cumulative revenues for 100 deals (averaged over 1000 macroreplications): (a) frequentist simulation model; (b) Bayesian model (trained); (c) Bayesian model (noisy).
• Classical logistic regression is implemented with the frequentist policy (LR-Freq).
• The historical pricing strategy is also implemented (since we are bootstrapping the features
xf, xp from the test set, we can simply use the price p that was recorded). This method does not
use any statistical updating since the decisions are already fixed.
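Combining the two sketches above, the overall comparison could be driven by a loop of the following shape (again with hypothetical names; policies is assumed to map labels such as "KL-Bayes" or "LR-Freq" to (recommend_price, update_beliefs) pairs):

def run_experiment(policies, model, theta0, Sigma0, beta_lr,
                   test_rows, n_macro=1000):
    """Average cumulative revenue per policy over n_macro macroreplications."""
    totals = {name: 0.0 for name in policies}
    for _ in range(n_macro):
        beta_true, prior_cov = setup_macroreplication(
            model, theta0, Sigma0, beta_lr)
        for name, (price_fn, update_fn) in policies.items():
            beliefs = (theta0, prior_cov)  # prior handed to the policy
            totals[name] += simulate_revenue(price_fn, update_fn, beliefs,
                                             beta_true, test_rows)
    return {name: t / n_macro for name, t in totals.items()}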
Figure 2 shows the averaged cumulative revenues obtained by different methods in each of
the three simulation models. Somewhat surprisingly, variants of the KL model achieve the best
performance in all cases, even when β is not drawn from the Bayesian model. The LR method
underperforms for the reason described in Section 5.3: it tends to recommend high prices, which
are only infrequently successful. However, LR consistently outperforms VB, suggesting that the
KL optimality criterion plays an important role in obtaining good practical performance from the
Bayesian model.3
3 The VB method would likely perform better if the customers were generated directly from that model. We did not consider this case here because the VB model demonstrated the lowest AUC in modeling the customers in the data (Table 1), and thus would produce the least “realistic” customers.
Figure 3 Outcomes and win probabilities of simulated deals: (a) classification of simulated deals by outcome; (b) true win probabilities for simulated deals.
The choice of statistical model accounts for much of the observed difference in performance.
This occurs because the models are using priors that are calibrated from a relatively large training
dataset (30,000 deals). The additional improvement that can be made from 100 deals is small in
comparison, explaining the similar performance of KL-Bayes and KL-Bayes-NL in Figures 2(a)-
2(b). Nonetheless, when covariance information is removed from the prior, the value of sequential
learning is evident (Figure 2(c)). Furthermore, the Bayes-greedy policy consistently outperforms
the frequentist policy, even when both policies use the KL model: in Figure 2(a), the Bayes-greedy
policy has earned approximately 20% more revenue from 100 deals.
Figure 3 examines the simulated deals more closely for insight into why the KL model and Bayes-
greedy policy earn more revenue than LR. In our simulations, different pricing policies make use of
different statistical models, but the resulting pricing decisions are evaluated on the same set of true
coefficients β and the same bootstrapped transaction data xf, xp. Thus, for any given deal, the
same demand curve is used to evaluate decisions made by two different policies. In Figure 3(a), we
combined all the simulated deals from all the macroreplications and classified them by outcome,
namely, whether KL-Bayes and LR-Freq won or lost the deal, and if they both won, which policy
had the higher price (and thus earned more revenue).
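The classification itself is mechanical; a minimal sketch (assuming each simulated deal records a win indicator and a price for both policies) is:

def classify_deal(won_kl, price_kl, won_lr, price_lr):
    """Assign one simulated deal to the five categories of Figure 3(a)."""
    if won_kl and won_lr:
        # Both policies won; the higher price earned more revenue.
        return ("both win, KL-Bayes higher" if price_kl > price_lr
                else "both win, LR-Freq higher")
    if won_kl:
        return "only KL-Bayes wins"
    if won_lr:
        return "only LR-Freq wins"
    return "both lose"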
The most immediate insight from Figure 3(a) is that, among the 19.3% of deals won by both
policies, KL-Bayes charged a higher price 75% of the time. Those deals that were won by only
one of the two policies were split almost equally between KL-Bayes and LR-Freq. Unsurprisingly
(considering the overall low proportion of wins in the data), both policies lost nearly half of the
deals.
To obtain additional insight into the five categories from Figure 3(a), we also calculated the
true win probabilities, using (1) with the true coefficients β, for all of the simulated deals (recall
that the coefficients β characterize the true demand curve, and are generated independently of the
pricing policy). The empirical distributions of these quantities are shown in Figure 3(b). First, we
see that the true win probability varies greatly between individual simulated transactions, which
reflects the heterogeneity of products and customers in our data. Simply put, there are many deals
where the win probability will be low regardless of the price.
Although the same demand curves are used to generate outcomes for KL-Bayes and LR-Freq, the
actual outcomes are simulated independently, by generating Bernoulli random variables with
success probabilities given by (1). Thus, even if the policies recommend similar prices, low win
probabilities mean that we are much more likely to see outcomes where one policy wins and one
loses than we are to see outcomes where both policies win. This random noise accounts for the
large proportion of such cases in Figure 3(a). As can be seen in Figure 3(b), we are more likely to
see outcomes where both policies win if the win probability is higher overall for that deal.
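To see the magnitude of this effect, suppose for illustration that both policies offer prices yielding the same win probability q on a given deal, with the two outcomes drawn independently. Then P(both win) = q², while P(exactly one wins) = 2q(1 − q); for q = 0.2, these probabilities are 0.04 and 0.32, so a split outcome is eight times more likely than a joint win.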
Thus, KL-Bayes outperforms LR-Freq not because it is able to win more “long-shot” deals, but
because it consistently makes better offers for those deals that are realistically winnable. In
other words, KL-Bayes obtains greater value from each
win.
6. Conclusion
We have developed a framework for statistical and optimal learning in B2B pricing. Our statistical
model uses approximate Bayesian inference to learn an unknown logistic demand curve efficiently.
Our optimization strategy then uses the distribution of belief in this model to recommend prices
that are adjusted for the seller’s uncertainty. Our case study shows that this approach performs
well in realistic settings, and that its predictive power is competitive with that of a frequentist
approach that does not use any approximations. We believe that this paper lays the methodological
groundwork for improved decision support tools in price optimization. Moosmayer et al. (2013)
finds that target prices are strongly correlated with salespeople’s final price quotes; thus, even if
recommended prices are not implemented directly, they remain quite influential. A better model
therefore not only makes better recommendations, but is also less likely to mislead the salesperson.
We briefly discuss some avenues for future work. First, our framework assumes that we have
access to data regarding both wins and losses. While this was true in our case study, there may
be other practical settings where only the historical wins are available (for example, because
the salespeople prefer not to report losses). Then, in order to model the demand curve, the task
would be to reliably infer the number of losses based solely on the wins, a statistical problem
that is outside the scope of the present paper. Second, we have separated the problem of model
selection (identifying the most important segments) from estimation and optimization, whereas
ideally one might wish to adaptively identify new significant features in real time (for example,
if a new product that was not previously in the model suddenly experiences high demand). The
techniques from Li et al. (2015) may be applicable, but would lead to greater computational cost.
A simpler approach for practitioners may be to repeat the model selection procedure at regular
intervals and “reset” the Bayesian model, which would then be used to learn in the short term.
References
Agrawal, V., M. Ferguson. 2007. Bid-response models for customised pricing. Journal of Revenue & Pricing
Management 6(3) 212–228.
Agrawal, V., M. Ferguson, G. C. Souza. 2015. Trade-in rebates for price discrimination and product recovery.
Submitted for publication.
Araman, V. F., R. Caldentey. 2009. Dynamic pricing for nonperishable products with demand learning.
Operations Research 57(5) 1169–1188.
Belloni, A. V., V. Chernozhukov. 2013. Least squares after model selection in high-dimensional sparse models.
Bernoulli 19(2) 521–547.
Besbes, O., D. Saure. 2014. Dynamic pricing strategies in the presence of demand shifts. Manufacturing &
Service Operations Management 16(4) 513–528.
Besbes, O., A. Zeevi. 2009. Dynamic pricing without knowing the demand function: Risk bounds and
near-optimal algorithms. Operations Research 57(6) 1407–1420.
Besbes, O., A. Zeevi. 2015. On the (surprising) sufficiency of linear models for dynamic pricing with demand
learning. Management Science 61(4) 723–739.
Bitran, G., R. Caldentey. 2003. An overview of pricing models for revenue management. Manufacturing &
Service Operations Management 5(3) 203–229.
Blei, D. M., M. I. Jordan, J. W. Paisley. 2012. Variational Bayesian inference with stochastic search.
Proceedings of the 29th International Conference on Machine Learning. 1367–1374.
Bodea, T., M. Ferguson. 2014. Segmentation, Revenue Management and Pricing Analytics. Routledge.
Brahma, A., M. Chakraborty, S. Das, A. Lavoie, M. Magdon-Ismail. 2012. A Bayesian market maker.
Proceedings of the 13th ACM Conference on Electronic Commerce. 215–232.
Brascamp, H. J., E. H. Lieb. 1976. On extensions of the Brunn-Minkowski and Prekopa-Leindler theorems,
including inequalities for log concave functions, and with an application to the diffusion equation.
Journal of Functional Analysis 22(4) 366–389.
Broder, J., P. Rusmevichientong. 2012. Dynamic pricing under a general parametric choice model. Operations
Research 60(4) 965–980.
Bruno, H. A., H. Che, S. Dutta. 2012. Role of reference price on price and quantity: insights from business-
to-business markets. Journal of Marketing Research 49(5) 640–654.
Chau, M., M. C. Fu, H. Qu, I. O. Ryzhov. 2014. Simulation optimization: A tutorial overview and recent
developments in gradient-based methods. A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley,
J. A. Miller, eds., Proceedings of the 2014 Winter Simulation Conference. 21–35.
Chen, C.-H., S. E. Chick, L. H. Lee, N. A. Pujowidianto. 2015. Ranking and selection: efficient simulation
budget allocation. M. C. Fu, ed., Handbook of Simulation Optimization. Springer, 45–80.
Chhabra, M., S. Das. 2011. Learning the demand curve in posted-price digital goods auctions. Proceedings
of the 10th International Conference on Autonomous Agents and Multiagent Systems. 63–70.
Chick, S. E. 2006. Subjective Probability and Bayesian Methodology. S.G. Henderson, B.L. Nelson, eds.,
Handbooks of Operations Research and Management Science, vol. 13: Simulation. North-Holland Pub-
lishing, Amsterdam, 225–258.
Cope, E. 2007. Bayesian strategies for dynamic pricing in e-commerce. Naval Research Logistics 54(3)
265–281.
Crooks, G. E. 2009. Logistic approximation to the logistic-normal integral. Tech. rep., Lawrence Berkeley
National Laboratory.
Cross, R. G., J. A. Higbie, Z. N. Cross. 2011. Milestones in the application of analytical pricing and revenue
management. Journal of Revenue & Pricing Management 10(1) 8–18.
Das, S., M. Magdon-Ismail. 2009. Adapting to a market shock: Optimal sequential market-making. D. Koller,
Y. Bengio, D. Schuurmans, L. Bottou, R. Culotta, eds., Advances in Neural Information Processing
Systems, vol. 21. 361–368.
DeGroot, M. H. 1970. Optimal Statistical Decisions. John Wiley and Sons.
Demidenko, E. 2013. Mixed models: theory and applications with R (2nd ed.). John Wiley and Sons.
den Boer, A. V., B. Zwart. 2015. Dynamic pricing and learning with finite inventories. Operations Research
63(4) 965–978.
Elmaghraby, W., A. Gulcu, P. Keskinocak. 2008. Designing optimal preannounced markdowns in the presence
of rational customers with multiunit demands. Manufacturing & Service Operations Management 10(1)
126–148.
Elmaghraby, W., W. Jank, I. Z. Karaesmen, S. Zhang. 2012. An exploratory analysis of B2B price changes.
Journal of Revenue & Pricing Management 11(6) 607–624.
Elmaghraby, W., W. Jank, S. Zhang, I. Z. Karaesmen. 2015. Sales force behavior, pricing information, and
pricing decisions. Manufacturing & Service Operations Management (to appear).
Fan, J., F. Han, H. Liu. 2014. Challenges of big data analysis. National Science Review 1(2) 293–314.
Farias, V. F., B. Van Roy. 2010. Dynamic pricing with a prior on market response. Operations Research
58(1) 16–29.
Fu, M. C. 2008. What you should know about simulation and derivatives. Naval Research Logistics 55(8)
723–736.
Fu, M. C. 2015. Stochastic gradient estimation. M. C. Fu, ed., Handbook of Simulation Optimization.
Springer, 105–147.
Gale, B. T., D. J. Swire. 2012. Implementing strategic B2B pricing: Constructing value benchmarks. Journal
of Revenue & Pricing Management 11(1) 40–53.
Golub, G. H., C. F. Van Loan. 2012. Matrix computations (3rd ed.). JHU Press.
Harrison, J. M., N. B. Keskin, A. Zeevi. 2012. Bayesian dynamic pricing policies: Learning and earning
under a binary prior distribution. Management Science 58(3) 570–586.
Hastie, T., R. Tibshirani, J. Friedman. 2001. The Elements of Statistical Learning (2nd ed.). Springer.
Hormby, S., J. Morrison, P. Dave, M. Meyers, T. Tenca. 2010. Marriott International increases revenue by
implementing a group pricing optimizer. Interfaces 40(1) 47–57.
Jaakkola, T. S., M. I. Jordan. 2000. Bayesian parameter estimation via variational methods. Statistics and
Computing 10(1) 25–37.
Keskin, N. B., A. Zeevi. 2014. Dynamic pricing with an unknown demand model: Asymptotically optimal
semi-myopic policies. Operations Research 62(5) 1142–1167.
Keskin, N. B., A. Zeevi. 2015. Chasing demand: Learning and earning in a changing environment. Submitted
for publication.
Kim, S. 2006. Gradient-based simulation optimization. L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson,
D. M. Nicol, R. M. Fujimoto, eds., Proceedings of the 2006 Winter Simulation Conference. 159–167.
Kushner, H. J., G. Yin. 2003. Stochastic approximation and recursive algorithms and applications (2nd ed.).
Springer.
L’Ecuyer, P. 1995. Note: On the interchange of derivative and expectation for likelihood ratio derivative
estimators. Management Science 41(4) 738–747.
Leng, M., M. Parlar. 2005. Free shipping and purchasing decisions in B2B transactions: A game-theoretic
analysis. IIE Transactions 37(12) 1119–1128.
Li, H., W. T. Huh. 2011. Pricing multiple products with the multinomial logit and nested logit models:
Concavity and implications. Manufacturing & Service Operations Management 13(4) 549–563.
Li, Y., H. Liu, W. B. Powell. 2015. The knowledge gradient policy using a sparse additive belief model.
arXiv preprint arXiv:1503.05567.
Minka, T. P. 2000. Bayesian linear regression. Tech. rep., Microsoft Research.
Moosmayer, D. C., A. Y.-L. Chong, M. J. Liu, B. Schuppar. 2013. A neural network approach to predicting
price negotiation outcomes in business-to-business contexts. Expert Systems with Applications 40(8)
3028–3035.
Negoescu, D. M., P. I. Frazier, W. B. Powell. 2011. The knowledge-gradient algorithm for sequencing
experiments in drug discovery. INFORMS Journal on Computing 23(3) 346–363.
Powell, W. B., I. O. Ryzhov. 2012. Optimal learning. John Wiley and Sons.
Qu, H., I. O. Ryzhov, M. C. Fu, Z. Ding. 2015. Sequential selection with unknown correlation structures.
Operations Research 63(4) 931–948.
Roth, V. 2004. The generalized LASSO. IEEE Transactions on Neural Networks 15(1) 16–28.
Ryzhov, I. O. 2015. Approximate Bayesian inference for simulation and optimization. B. Defourny, T. Terlaky,
eds., Modeling and optimization: theory and applications. Springer. To appear.
Smithson, M., E. C. Merkle. 2013. Generalized linear models for categorical and continuous limited dependent
variables. CRC Press.
Spall, J. C. 2005. Introduction to stochastic search and optimization: estimation, simulation, and control.
John Wiley & Sons.
Spiegelhalter, D. J., S. L. Lauritzen. 1990. Sequential updating of conditional probabilities on directed
graphical structures. Networks 20(5) 579–605.
Talluri, K. T., G. J. Van Ryzin. 2006. The theory and practice of revenue management. Springer.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1) 267–288.
Xia, C. H., P. Dube. 2007. Dynamic pricing in e-services under demand uncertainty. Production and
Operations Management 16(6) 701–712.
Zhang, J. Z., O. Netzer, A. Ansari. 2014. Dynamic targeted pricing in B2B relationships. Marketing Science
33(3) 317–337.