
Bidding Machine: Learning to Bid for Directly Optimizing Profits in Display Advertising

Kan Ren, Ke Chang, Yifei Rong, Weinan Zhang, Yong Yu, and Jun Wang

Abstract—Real-time bidding (RTB) based display advertising has become one of the key technological advances in computational advertising. RTB enables advertisers to buy individual ad impressions via an auction in real-time and facilitates the evaluation and the bidding of individual impressions across multiple advertisers. In RTB, advertisers face three main challenges when optimizing their bidding strategies, namely (i) estimating the utility (e.g., conversions, clicks) of the ad impression, (ii) forecasting the market value (thus the cost) of the given ad impression, and (iii) deciding the optimal bid for the given auction based on the first two. Previous solutions assume the first two are solved before addressing the bid optimization problem. However, these challenges are strongly correlated and dealing with any individual problem independently may not be globally optimal. In this paper, we propose Bidding Machine, a comprehensive learning-to-bid framework, which consists of three optimizers dealing with each challenge above and, as a whole, jointly optimizes these three parts. We show that such a joint optimization largely increases the campaign effectiveness and the profit. From the learning perspective, we show that the bidding machine can be updated smoothly with either offline periodical batch or online sequential training schemes. Our extensive offline empirical study and online A/B testing verify the high effectiveness of the proposed bidding machine.

Index Terms—Real-Time Bidding, User Response Prediction, Bid Landscape Forecasting, Bidding Strategy Optimization


1 INTRODUCTION

EMERGING in 2009 [22] and popularized since 2011 [10], real-time bidding (RTB) based display advertising has become a major paradigm in computational advertising from both the technical and business perspectives. With ad exchanges as intermediaries, RTB enables publishers to sell individual ad impressions by hosting a real-time auction, and facilitates advertisers in evaluating each auctioned ad impression and bidding for it.

In RTB display advertising, as is shown in Figure 1, when a user visits a publisher's site, e.g., a web page or a mobile app page, (0) a bid request for the corresponding ad display opportunity, along with its information about the underlying user, the domain context and the auction, (1) is broadcast to hundreds or thousands of advertisers for bid via an ad exchange [31]. With the help of computer algorithms on demand-side platforms (DSPs), each advertiser estimates the potential utility and the possible cost for the received bid request and (2) makes the final decision of the bid price in this real-time auction. The ad exchange then (3) determines the winner, who proposed the highest bid price and gets to show the ad while paying the second highest price, which is called the market price [3] (in a second-price auction). The whole loop finishes in less than 100 ms. The winning advertiser (4) sends the ad creative to the user and (5) receives the user response (e.g., click, conversion) later. Each day, such a request-bidding-feedback loop occurs billions of times for an ordinary RTB platform, which makes RTB a true battlefield of big data. For example, YOYI DSP, which has deployed the algorithms proposed in this paper and hosted the online A/B testing, handles more than 10 billion ad transactions daily in 2017.

• K. Ren, K. Chang, W. Zhang and Y. Yu are with Shanghai Jiao Tong University. E-mail: {kren, kchang, wnzhang, yyu}@apex.sjtu.edu.cn

• Y. Rong is with Meituan-Dianping Inc. E-mail: [email protected]. This work was done when he was with YOYI Inc.

• J. Wang is with University College London. E-mail: [email protected]

From the view of an advertiser, the goal is to spend the campaign budget on the most effective ad opportunities to achieve high profits, i.e., ad volume with more positive user responses (e.g., clicks or conversions) yet at an inexpensive cost. Each time, in order to calculate the bid, the advertiser first predicts the probability of a positive user response for that ad display, i.e., how likely the user is to click or convert, which is normally measured by the predicted click-through rate (CTR) or conversion rate (CVR). Generally, advertisers should bid higher and allocate more budget on the ad inventory with higher CTR or CVR [24]. For most advertisers, the cost, which means the market price of that bid opportunity [3], should also be estimated to better determine the bid price.

As we may find in the above description, each advertiser faces three main challenges within the bidding procedure. The first is to estimate the utility, i.e., the CTR or CVR of the ad impression, with consideration of the user, ad, publisher and contextual information. CTR or CVR quantifies the expected probability of a click or conversion for the advertiser w.r.t. the given ad request. The second challenge is to forecast the probable cost of showing the ad, which is the amount paid for the impression. Note that the true cost is the market price rather than the bid price for the winner of the auction. The last, and most important, problem is to adopt a proper bidding strategy to win as many effective (high utility, yet low cost) impressions as possible, so as to maximize the profits of the advertiser under the campaign budget constraint.


Typically, the bid optimization is done on the following sequential basis. First, the CTR estimation1 is formulated as a binary regression problem, which can be solved by machine learning models such as logistic regression [20], [27], Bayesian probit regression [11], gradient boosting regression trees [12], factorization machines [23], etc. The common objective in this stage is to make the estimation as accurate as possible by, for instance, minimizing the cross entropy error between the predicted CTR and the ground truth user responses. Second, for cost estimation, the literature [8], [33] formulates the problem as a prediction task to forecast the market price distribution, which is named bid landscape forecasting, or to directly estimate the winning price of the given bid request. Third, based on the estimated CTR and cost of the ad display opportunity, one seeks the optimal bidding function along with other considerations including the campaign budget, the auction volume, etc. [16], [19], [24], [37], [38].

However, such sequential optimization is indeed not optimal. According to Bayesian decision theory [4], the learning of the user response model and the bid landscape model should be informed by the final bidding utility. As is studied in the literature [26], the required accuracy of the CTR prediction should not be the same throughout the prediction range [0, 1], as there is a cost (negative utility) for the advertiser who wins an impression with no click, but no cost (zero utility) for losing one. The value of clicks also varies across campaigns; it would be good if the CTR learning could tailor its efforts more toward those higher-valued cases and make them better predicted. More importantly, the user response prediction is indeed correlated with the second-price auction in RTB: if it wins an auction, the advertiser pays the market price and then obtains the payoff from the user conversions triggered by the ad.

Therefore, the market price and the competition have a significant impact on the campaign performance. On one hand, if a bid is performed in a highly competitive situation, it is hard to predict with confidence whether the advertiser will win the ad auction or not; thus the optimization of the CTR prediction in such a case should be more focused and fine-tuned than in the less competitive case. On the other hand, the bidding strategy also influences the utility estimation and cost prediction modules. In [19], the authors proved the advantage, in bidding effectiveness, of combining the two optimization models for both CTR and the winning price under budget constraints. As a consequence, the natural idea is to solve the three main challenges altogether and derive a comprehensive methodology to jointly optimize the bidding performance.

In this paper, we present a novel optimization framework, named Bidding Machine (BM), as shown in Figure 1, which considers the three challenges as a whole and directly pushes the limit of the campaign profit by jointly optimizing the three components: user response prediction, bid landscape forecasting and bid optimization. In bidding machine, we adopt the utility estimation model and the cost prediction model while utilizing a comprehensive utility optimization objective function.

1. In this paper, we focus on the CTR estimation, while the CVR estimation can be done in the same way.

Fig. 1: The joint learning framework of bidding machine. (Diagram: the user, other advertisers and the ad exchange interact through the loop 0. ad request, 1. bid request, 2. bid response, 3. win notice, 4. ad creative, 5. user response; the request feature x, the true user response y and the market price z feed the three modules, utility estimation (user response prediction), market modeling (bid landscape forecasting) and bid optimization (bid strategy), via the learning/performing training and data flows.)

Moreover, we take the budget into consideration and prove the optimal bidding strategy under the second-price auction. A functional optimization method is then used to derive the optimal bidding function with budget constraints. The overall methodology works as a learning-to-bid model, which consumes recent historical bidding logs and (i) updates the estimation models and (ii) optimizes the corresponding bidding strategy; it then (iii) performs the bidding phase online and observes the real-world feedback. After that, it returns to the first step and repeats the update-optimize-perform loop. The procedure circulates and acts as a machine interacting with the market while trying to maximize the obtained profits.

Note that our methodology tries to resolve the three main challenges in a comprehensive optimization framework and directly optimizes the profit for the advertiser. To the best of our knowledge, it is the first work that jointly optimizes these three key components of RTB. In our recent work [26], we combined utility estimation and bidding strategy as a whole while adopting a naive counting-based method for market price modeling, which is not optimal with respect to the expected cost since the market price varies in different contexts. The authors in [19] proposed a method which combines CTR estimation and winning price prediction (as will be compared in the experiments). But the method still regards the two aspects as separate problems and does not put them into a joint optimization framework.

To sum up, the contributions of this paper are three-fold: (i) We point out that the three main challenges in RTB, namely user response prediction, bid landscape forecasting and bid optimization, are indeed highly correlated but are commonly tackled separately in previous work. (ii) We propose bidding machine, a comprehensive framework to jointly optimize these three components to directly push the limit of the campaign profit. (iii) Extensive empirical study demonstrates that BM vastly improves the bidding performance against the state-of-the-art baselines, in both offline experiments on two public datasets and online A/B testing on a commercial RTB platform.

In the rest of the paper, we first discuss the related literature in Sec. 2 and then present the bidding machine framework. Specifically, we formulate the problem in Sec. 3, and then present our CTR estimation model in Sec. 4 and cost prediction model in Sec. 5. After that, we solve the functional optimization problem for the optimal bidding strategy in Sec. 6. Due to the page limit, the online learning version is presented in Appendix A, along with the proof of the optimal bidding function under the second-price auction in Appendix B. We also make a game theoretic analysis of multiple bidders with the same optimal strategy in Appendix C. We present the experiments and discuss the detailed results in Sec. 7. Finally, we conclude this paper and discuss future work in Sec. 8.

2 RELATED WORK

In this section, we discuss the related literature on RTB techniques, specifically the three key components of RTB, as pointed out in Sec. 1, namely utility estimation, cost prediction and bidding strategy optimization.

Utility: User Response Prediction. The first challenge is the utility estimation, which is mostly about user response prediction, such as the click-through rate (CTR) estimation or the conversion rate (CVR) estimation. It plays a key role in real-time display advertising [16], [20], [30]. The response prediction is a probability estimation task [21] which models the interest of users in the content of publishers or the ads, and is used to derive the budget allocation of the advertisers [27]. Typically, the response prediction problem is formulated as a binary regression problem with prediction likelihood as the training objective [1], [11], [23], [27]. From the methodology view, linear models such as logistic regression [16] and non-linear models such as tree-based models [12] and factorization machines [21], [23] are commonly used. Other variants include Bayesian probit regression [11], FTRL-trained factorization machines [28], and neural network learning frameworks [25]. Normally, the area under the ROC curve (AUC) and the relative information gain (RIG) are common evaluation metrics for CTR prediction accuracy [11]. Recently, the authors in [6], [29] pointed out that such metrics may not be good enough for evaluating CTR predictors in RTB based advertising because of the subsequent bidding and auctions. The authors in [29] proposed a cost-sensitive objective function to tackle the cost issue in user response prediction learning. However, it may be influenced by the suboptimal cost estimation module. In this part of the paper, considering utility estimation, we use a logistic regression as a working example and go one step further than [6] by reformulating the CTR estimation learning to directly optimize campaign performance (profit).

Cost: Bid Landscape Forecasting. For cost estimation, we refer to bid landscape forecasting, which aims at predicting the distribution of the market price for a type of ad inventory [8]. The advertisers use it to calculate the winning rate given a bid and to help decide the final bid price. Several winning function forms were hypothesized in [17], [38] to directly induce the optimal bidding functions. A campaign-level forecasting system with tree models was presented in [8]. The authors in [14] conducted an error handling methodology to improve the efficiency and reliability of the bid landscape forecasting system. As advertisers only know the statistics (market price, user clicks, etc.) from their winning impressions, the authors in [33] proposed a solution to handle such data censorship in market price prediction. Later we will show that the market price distribution indeed plays an essential role in both CTR model learning and bidding strategy optimization for campaign profit optimization, which has never been formally discussed.

Strategy: Bid Optimization. With the estimated utility of CTR/CVR, the advertisers are able to assess the value of the impression and perform a bid. Auction theory [9] proves that truthful bidding, i.e., bidding the action value times the action rate, is the optimal strategy in the second-price auction [16]. However, with budget and auction volume constraints, truthful bidding may not be optimal [36]. The linear bidding strategy [24] is widely used in industry, where the bid price is calculated as the predicted CTR/CVR multiplied by a constant parameter tuned according to the campaign budget and performance. The authors in [7] proposed a bidding function with the truthful bidding value minus a tuned parameter. A lift-based bidding strategy was recently proposed in [34] where the bid price is determined by the lift of the user's CVR after seeing the displayed ad.

However, the impact of the market price distribution, i.e., the bid landscape, has not yet been studied in the above works, and the final utility of the campaign is not considered in the optimization objective, which may result in some unfavorable statistics such as a relatively high effective cost per click (eCPC) and a low return-on-investment ratio (ROI). The authors in [17] combined the winning rate estimation and the winning price prediction together and deployed the estimation results in different bidding strategies towards different business demands. The authors in [15] embedded a budget smoothing component into a bid optimization framework. In [37], [38], with the estimated CTR as input of the bidding function, the authors leveraged functional optimization to derive non-linear bidding functions. Our work is different from the above works as we directly model CTR learning and cost estimation as part of bid optimization for campaign profit maximization. In [19] the authors combined two predictors to decide the final bid price. However, the proposed model may be suboptimal since they did not take profit as the objective function and heuristically set the bid as a predicted winning price plus a constant value.

To sum up, all the existing learning frameworks in RTB consider the user response prediction, bid landscape forecasting and bid optimization as three separate parts, while in our paper, we model them as a whole and perform a novel joint optimization.

3 PROBLEM DEFINITION OF BIDDING MACHINE

We propose a unified learning framework, named Bidding Machine, which integrates both utility and cost estimation and puts them back into bid optimization, to maximize the overall profit for advertisers.

Recall that in the RTB scenario, on one side, a user visiting an online page may trigger an auction, in real time, for each ad slot on the web page. On the other side, the advertiser receives a bid request from the ad exchange [35], along with the information about the user, the context on the web page and other auction features. Then the bidding engine of the DSP decides, on behalf of a campaign, whether to participate in the auction and how much to bid on this impression opportunity.

As discussed in many works [7], [19], [24], [38], the final bid price is influenced by many factors including the estimated utility (i.e., user response) and cost (i.e., market price). Bid optimization should also consider budget pacing [2] to control the cost and optimize the final profit. However, almost all the related work bases the bidding decision on the estimated utility and the predicted cost, and treats these three parts separately. [26] embedded user response prediction, i.e., utility estimation, into the bid optimization and made the prediction aware of the market information and cost sensitivity. [38] took the budget constraint, the winning probability and the cost of the particular ad impression altogether and proposed a functional bidding optimization to maximize the target KPIs, e.g., clicks.

Our work goes a step further. We propose a unified optimization framework on the integral utility-minus-cost objective, which is the profit gained by the advertiser. Our goal is to learn the user response prediction model with market competition modeling, and to optimize the final bidding strategy considering the preset budget constraints. We will first formulate the unified learning problem and then discuss our optimization solutions. Note that the derived learning formulations of each component benefit from the updates of the other two parts. Thus the whole optimization framework is a joint learning procedure, where the user response learning, market competition modeling and bidding strategy optimization run as a circulation mechanism.

3.1 Problem Definition

Typically, a bid request contains various information about an ad display opportunity, including the information of the underlying user, location, time, browser and other contextual information about the web page. Along with the features extracted from the campaign itself, we construct the high-dimensional feature vector for the bid request, which is denoted as x. We also use p_x(x) to denote the probability distribution of the input feature vector x that matches the campaign target rules.
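To make this concrete, the following is a minimal sketch (ours, not the paper's production feature pipeline) of turning the categorical fields of a bid request into a sparse binary feature vector x via one-hot hashing; the field names are hypothetical.

```python
import numpy as np

def hash_features(bid_request, dim=2**20):
    """Map categorical bid-request fields to the active indices of a sparse one-hot x.

    `bid_request` is a dict of hypothetical fields, e.g.
    {"hour": "14", "os": "android", "domain": "news.example.com"}.
    """
    active = {hash(f"{field}={value}") % dim for field, value in bid_request.items()}
    return np.array(sorted(active), dtype=np.int64)

# x is represented by its non-zero indices; theta is a dense weight vector.
x_idx = hash_features({"hour": "14", "os": "android", "domain": "news.example.com"})
theta = np.zeros(2**20)
score = theta[x_idx].sum()                 # theta^T x for a binary sparse x
ctr = 1.0 / (1.0 + np.exp(-score))         # the logistic CTR model used below
print(ctr)
```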

Without loss of generality, we take the click as the user response, and the CTR estimation is denoted as a function p(y = 1|x) ≡ f_θ(x) mapping from the feature vector x to the probability of a click, where y ∈ {0, 1} is a binary variable indicating whether a user click occurs (1) or not (0). We define the true value of an occurring click as v, which is preset by the advertiser.

Next, we define the context in which the utility estimation is situated. A lot of previous work has specified the bidding strategy as a function b(f_θ(x)) mapping from the predicted CTR (or other estimated KPIs) f_θ(x) to the bid price [16], [24], [38]. Essentially, the mapping follows the sequential dependency assumption x → f_θ(x) → b proposed by [37], [38]. In this paper, we follow the same formulation. For simplicity, we use b(·) to represent the bidding function, but also occasionally use b to directly represent the bid price.

TABLE 1: Notations and descriptions

Notation     Description
v            The pre-defined value of a positive user response.
y            The true label of the user response.
x            The bid request represented by its features.
p_x(x)       The probability density function of x.
z            The market price.
p_z(z)       The probability density function of z.
θ            The weights of the CTR estimation function.
f_θ(x)       The CTR estimation function to learn.
r            The predicted CTR.
b(f_θ(x))    The bid price determined by the estimated CTR; b for short.
D            The training set.
R(·)         The utility function.
w_φ(b)       The winning probability given bid price b.
c(b)         The expected cost given bid price b if winning.

Once the DSP sends out the bid b, the ad exchange hosts a second-price auction [13] and decides who is going to win the auction. The probability of winning an auction is influenced by the bid price b and the stochastic market price z with an underlying p.d.f. p_z(z); we use w_φ(b) to denote the probability of winning:

w_\phi(b) = \int_0^b p_z(z)\,dz,   (1)

which is the probability that the bid b is higher than the market price z [13], and φ is the parameter of our winning probability model. The details of the winning function will be discussed later.

If the bid wins the auction, the advertiser pays the cost, which is the market price z. We denote the expected cost in the second-price auction as

c(b) = \frac{\int_0^b z\, p_z(z)\,dz}{\int_0^b p_z(z)\,dz},   (2)

which is essentially the expected market price when winning the auction [13]. Once we have defined the bidding function b, the true value of a click v, the winning rate w and the expected cost c, we are ready to define a general form of the utility function R_θ(x, y; b, v, c, w) for a given (x, y) 2-tuple in the training data (all the received historical impressions).
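As a concrete illustration of Eqs. (1) and (2), here is a minimal sketch (our own, with made-up prices) that computes w(b) and c(b) from an empirical histogram of observed market prices, i.e., the kind of non-parametric p_z(z) referred to later in Sec. 4.3.

```python
import numpy as np

# Hypothetical observed market prices (winning prices from historical logs),
# in the same integer currency unit as the bids.
market_prices = np.array([23, 77, 12, 45, 301, 64, 88, 19, 150, 42])

max_price = 310
counts = np.bincount(market_prices, minlength=max_price + 1)
pz = counts / counts.sum()                            # empirical p.d.f. p_z(z)
cdf = np.cumsum(pz)                                   # w(b) = sum_{z<=b} p_z(z), Eq. (1)
cum_cost = np.cumsum(np.arange(max_price + 1) * pz)   # sum_{z<=b} z p_z(z)

def win_prob(b):
    return cdf[min(int(b), max_price)]

def expected_cost(b):
    """Expected market price conditioned on winning with bid b, Eq. (2)."""
    w = win_prob(b)
    return cum_cost[min(int(b), max_price)] / w if w > 0 else 0.0

print(win_prob(80), expected_cost(80))
```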

Our task is to build a joint optimization framework modeling user response, market competition and bidding strategy, to maximize the overall profit for the advertiser, which is formulated as

(b(\cdot), \theta^*, \phi^*) = \arg\max_{b,\theta,\phi} \int_x R(x, y; b, v, c, w)\, p_x(x)\, dx.   (3)

Following [6], [26], the overall revenue R(·) can be defined as the utility function below w.r.t. the corresponding auction sample (x, y):

R(b, \theta, \phi) = \int_x [vy - c(b(f_\theta(x)))]\, w_\phi(b(f_\theta(x)))\, p_x(x)\, dx
                   = \sum_{(x,y)\in D} [vy - c(b(f_\theta(x)))]\, w_\phi(b(f_\theta(x))).   (4)

The summary of our notations is gathered in Table 1. In the next three sections, we will discuss the detailed optimization of R(b, θ, φ) w.r.t. the CTR estimator f_θ(·) (Sec. 4), the winning function w_φ(·) (Sec. 5) and the bidding function b(·) (Sec. 6).


4 UTILITY: USER RESPONSE LEARNING

In this section, we formally describe the utility estimation task, i.e., user response learning, and propose two solutions for this problem. Recall that, in the training set D, each sample is represented as a 2-tuple (x, y), where x denotes the feature vector of the bid request, and y denotes the indicator of whether a user action (click) occurs.

4.1 Gradient for Expected Utility

To solve Eq. (3), the utility function R_θ(·) can be naturally defined as the expected direct profit from the campaign:

R^{EU}_\theta(x, y) = [vy - c(b(f_\theta(x)))] \cdot w(b(f_\theta(x))),   (5)

where, to simplify our notation, we drop the dependency on b, v, c, w for R^{EU}_\theta(x, y). The expectation is taken w.r.t. whether we win or not, where not winning has zero utility. The overall expected direct profit [6] of all the auctions can be calculated by substituting Eqs. (1) and (2) into Eq. (5):

\sum_{(x,y)\in D} R^{EU}_\theta(x, y)
  = \sum_{(x,y)\in D} \Big[ vy - \frac{\int_0^{b(f_\theta(x))} z\, p_z(z)\,dz}{\int_0^{b(f_\theta(x))} p_z(z)\,dz} \Big] \cdot \int_0^{b(f_\theta(x))} p_z(z)\,dz
  = \sum_{(x,y)\in D} \int_0^{b(f_\theta(x))} (vy - z)\, p_z(z)\,dz.   (6)

Taking Eq. (6) into Eq. (3) with a regularization term, we have

\theta^{EU} = \arg\min_\theta\; -\sum_{(x,y)\in D} R^{EU}_\theta(x, y) + \frac{\lambda}{2}\|\theta\|_2^2
            = \arg\min_\theta\; -\sum_{(x,y)\in D} \int_0^{b(f_\theta(x))} (vy - z)\, p_z(z)\,dz + \frac{\lambda}{2}\theta^T\theta,   (7)

where the optimal value of θ is obtained by a gradient descent algorithm. The gradient of R^{EU}_\theta(x, y) with regard to θ is calculated as

\frac{\partial R^{EU}_\theta(x, y)}{\partial \theta}
  = \underbrace{(vy - b(f_\theta(x)))}_{\text{bid error}} \cdot \underbrace{p_z(b(f_\theta(x)))}_{\text{market sensitivity}} \cdot \frac{\partial b(f_\theta(x))}{\partial f_\theta(x)} \frac{\partial f_\theta(x)}{\partial \theta},   (8)

and we update for each data instance as \theta \leftarrow \theta - \eta\big(-\frac{\partial R^{EU}_\theta(x,y)}{\partial \theta} + \lambda\theta\big) with the above chain rule, where η is the learning rate.

We take logistic regression as our CTR prediction function, i.e.,

f_\theta(x) \equiv \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}},   (9)

and get \frac{\partial f_\theta(x)}{\partial \theta} = \sigma(\theta^T x)(1 - \sigma(\theta^T x))\, x. After learning, we denote the predicted CTR as r = f_θ(x) to simplify the subsequent derivation. We will later discuss the specific formulation and the corresponding optimization method for the bidding function b(r) w.r.t. the predicted CTR r.

Fig. 2: The illustration of the impact from the bid and market price of Expected Utility (EU); click value v = 300. (Left: the bid error term against the bid price for positive and negative responses. Right: the market sensitivity term and the combined weight bid error × market sensitivity against the bid price.)

Discussion. Eq. (8) provides a novel gradient update, taking into account both the utility and the cost of a bidding decision (the bid error term) as well as the impact from the market price distribution (the market sensitivity term). They act as two additional re-weighting functions influencing a conventional gradient update, which is formulated by the remaining terms in the equation. We illustrate their impact in Figure 2. The left subfigure shows the weight from the bid error against the bid price with different user responses (y = 1 or y = 0). We see that the update of the CTR model aims to correct a bid towards the true value vy of a training instance, i.e., an optimal model (parameter) would generate a bid close to v for a positive instance, and close to zero for a negative instance. The right subfigure plots the weight adjustment from the market sensitivity term (left y-axis) and the combined weight bid error × market sensitivity (right y-axis). We observe that the market sensitivity term re-weights the bid error by checking the fitness to the market price distribution; this makes the gradient focus more on fixing the errors (if any) when the bid is close to the market price. This is intuitively correct because when the bid is close to the market price, the competition is high and a small error (winning a case with no click, or vice versa) would make a huge difference in terms of the cost and reward.

4.2 Gradient for Risk-Return

Besides the expected utility (EU), we also propose a risk-return (RR) model to balance the risk and return of a bid decision as below:

R^{RR}_\theta(x, y) = \Big( \underbrace{\frac{vy}{z}}_{\text{return}} - \underbrace{\frac{v(1-y)}{v-z}}_{\text{risk}} \Big) \cdot w(b(f_\theta(x))),   (10)

where we define that when y = 1, the winning utility is v/z, which is the ratio between the return and the cost of this transaction; when y = 0, the winning utility becomes the penalty for taking risk, -v/(v-z), which is defined as the ratio between the loss (-v) and the gain if winning (v - z). Note that v is always higher than z as v ≥ b > z. The penalty is very high when bidding in a very low margin (low v - z) case. Thus the new optimization objective function is

\theta^{RR} = \arg\min_\theta\; -\sum_{(x,y)\in D} R^{RR}_\theta(x, y) + \frac{\lambda}{2}\|\theta\|_2^2
            = \arg\min_\theta\; -\sum_{(x,y)\in D} \int_0^{b(f_\theta(x))} \Big( \frac{vy}{z} - \frac{v(1-y)}{v-z} \Big) p_z(z)\,dz + \frac{\lambda}{2}\theta^T\theta,   (11)


Fig. 3: The illustration of the impact from the bid and market price of Risk Return (RR); click value v = 300.

which leads to the gradient of R^{RR}_\theta(x, y) w.r.t. θ as

\frac{\partial R^{RR}_\theta(x, y)}{\partial \theta}
  = \underbrace{\Big( \frac{vy}{b(f_\theta(x))} - \frac{v(1-y)}{v - b(f_\theta(x))} \Big)}_{\text{bid error}} \cdot \underbrace{p_z(b(f_\theta(x)))}_{\text{market sensitivity}} \cdot \frac{\partial b(f_\theta(x))}{\partial f_\theta(x)} \frac{\partial f_\theta(x)}{\partial \theta}.   (12)

Discussion. To understand the above gradient, we plot the bid error, the market sensitivity and their combined weight in Figure 3. The RR model differs from the previous EU model in that the bid error term turns into the return when the response is positive, and becomes the risk when the response is negative. If y = 0 and the bid price is high, or if y = 1 and the bid price is low, the bid error is quite large, which pushes the model to avoid such cases.

As is shown in both Eqs. (8) and (12), the market price distribution plays an important role in the optimization: with the bidding function and the CTR estimation function determined, the gradient is weighted by the probability density function (p.d.f.) of the market price, denoted as p_z(z).

4.3 Model Realization

Solving the proposed learning objectives (7) and (11) relies on the realization of the bidding function b(f_θ(x)), the market price distribution p_z(z) and the CTR estimation function f_θ(x) itself. In this section, we discuss the solutions for the two proposed training objectives given some specific implementations of b(f_θ(x)), p_z(z) and f_θ(x).

Without loss of generality, for the CTR estimation model, we adopt the widely used logistic regression for f_θ(x) as in Eq. (9).

For the bidding strategy, we employ a widely used linear bidding function w.r.t. the predicted CTR [24] with a scaling parameter ρ:

b(f_\theta(x)) \equiv \rho \cdot v \cdot f_\theta(x).   (13)

Taking Eqs. (9) and (13) into (8) and (12), respectively, we derive the final gradient of the proposed EU utility:

\frac{\partial R^{EU}_\theta(x, y)}{\partial \theta} = \rho v^2 (y - \rho\sigma(\theta^T x)) \cdot p_z(b(f_\theta(x))) \cdot \sigma(\theta^T x)(1 - \sigma(\theta^T x))\, x,   (14)

and that of the RR utility:

\frac{\partial R^{RR}_\theta(x, y)}{\partial \theta} = \rho v \Big( \frac{y}{\rho\sigma(\theta^T x)} - \frac{1-y}{1 - \rho\sigma(\theta^T x)} \Big) \cdot p_z(b(f_\theta(x))) \cdot \sigma(\theta^T x)(1 - \sigma(\theta^T x))\, x,   (15)

where the bidding function parameter ρ acts as a calibration term in the bid correction.

Note that various bid landscape models can be utilized to model p_z(z), such as the parametric log-normal distribution [8] and Gamma distribution [6]. In this paper, while our model is flexible with various landscape models, we first adopt a non-parametric p_z(z) which is directly obtained from each campaign's winning price data [3], and also a parametric functional model p_z(z, x; φ) which will be discussed later.
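For illustration, below is a minimal sketch (ours) of one stochastic gradient step with the realized gradients of Eqs. (14) and (15), using the linear bid of Eq. (13) and an empirical histogram for p_z(z); regularization and learning-rate scheduling are omitted, and the small eps is only for numerical stability.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def profit_gradient_step(theta, x, y, v, rho, pz, eta=1e-3, model="EU"):
    """One stochastic gradient step on R^EU (Eq. 14) or R^RR (Eq. 15).

    x   : dense feature vector of the bid request
    y   : 1 if the impression was clicked, 0 otherwise
    v   : click value; rho : bid scaling parameter of Eq. (13)
    pz  : array with the empirical p_z(z) over integer prices (market sensitivity)
    """
    f = sigmoid(theta @ x)                       # predicted CTR, Eq. (9)
    b = rho * v * f                              # linear bid, Eq. (13)
    density = pz[min(int(b), len(pz) - 1)]       # p_z(b(f_theta(x)))
    if model == "EU":
        bid_term = rho * v * v * (y - rho * f)                     # Eq. (14)
    else:
        eps = 1e-6                                                 # stability only
        bid_term = rho * v * (y / (rho * f + eps)
                              - (1 - y) / (1 - rho * f + eps))     # Eq. (15)
    grad = bid_term * density * f * (1 - f) * x  # chain rule through the sigmoid
    return theta + eta * grad                    # ascend the profit-related utility
```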

4.4 Links to Previous Work

It is of great interest to compare our profit-optimized solutions with the existing ones that optimize the fitness of the user response data. A logistic regression could be trained with the squared error (SE) loss to fit the user response data:

L^{SE}_\theta(x, y) = \frac{1}{2}(y - \sigma(\theta^T x))^2,
\frac{\partial L^{SE}_\theta(x, y)}{\partial \theta} = (\sigma(\theta^T x) - y)\,\sigma(\theta^T x)(1 - \sigma(\theta^T x))\, x.   (16)

More commonly, in a binary output case, a logistic regression can also be trained with the cross entropy (CE) loss:

L^{CE}_\theta(x, y) = -y \log \sigma(\theta^T x) - (1-y) \log(1 - \sigma(\theta^T x)),
\frac{\partial L^{CE}_\theta(x, y)}{\partial \theta} = (\sigma(\theta^T x) - y)\, x.   (17)

We see that our solutions in Eq. (14) and Eq. (15) extend the original gradients in Eq. (16) and Eq. (17) by (i) replacing the user response errors with the bid errors, and (ii) adding the consideration of the market price and the competition. Under the assumptions of (i) a truthful bidding function [16], [24] and (ii) a uniform market price distribution, i.e.,

b(f_\theta(x)) = v \cdot f_\theta(x),   (18)
p_z(z) = l,   (19)

our proposed learning models, which directly optimize (maximize) the profit-related utility, are equivalent to the standard logistic regression with (minimizing) the squared error loss or the cross entropy loss:

-\frac{\partial R^{EU}_\theta(x, y)}{\partial \theta} = v^2 l\, (\sigma(\theta^T x) - y) \cdot \sigma(\theta^T x)(1 - \sigma(\theta^T x))\, x,
-\frac{\partial R^{RR}_\theta(x, y)}{\partial \theta} = v l\, (\sigma(\theta^T x) - y)\, x.   (20)

But note that, in our settings, we adopt more reasonable bidding functions and market price distributions to achieve substantial improvements over the traditional regression loss methods.
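This equivalence can also be checked numerically: under the truthful bid of Eq. (18) and the uniform density of Eq. (19), the negated EU gradient equals v²·l times the SE gradient, and the negated RR gradient equals v·l times the CE gradient. A small verification sketch of our own:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, x, y, v, l = rng.normal(size=4), rng.normal(size=4), 1, 300.0, 0.005

f = 1.0 / (1.0 + np.exp(-theta @ x))           # sigma(theta^T x)
se_grad = (f - y) * f * (1 - f) * x             # Eq. (16)
ce_grad = (f - y) * x                           # Eq. (17)

# EU / RR gradients with b = v * f (rho = 1) and p_z(z) = l (uniform density).
eu_grad = v * v * (y - f) * l * f * (1 - f) * x                   # Eq. (14)
rr_grad = v * (y / f - (1 - y) / (1 - f)) * l * f * (1 - f) * x   # Eq. (15)

assert np.allclose(-eu_grad, v * v * l * se_grad)   # first line of Eq. (20)
assert np.allclose(-rr_grad, v * l * ce_grad)       # second line of Eq. (20)
```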

5 COST: MARKET COMPETITION MODELING

From the above derivation, the market price distribution p_z(z) contributes greatly to the utility estimation component. Moreover, as mentioned in Sec. 4.3, the landscape function p_z(z) has various realizations. In this section, we propose a machine learning methodology to model the market competition, which is to learn the market price distribution function p_z(z, x; φ) with the specified feature x. Besides, our learning objective is the profit of the advertiser, rather than merely the likelihood between the true distribution and the learned model, though our model has similar prediction accuracy of distribution modeling, as shown in the experiments in Appendix E.

Fig. 4: An illustration of landscapes with different α values; the forms of p_z(z) reflect the analysis results in Sec. 7.6. (Left: the winning probability w_z(b) for the linear (α = 300), quadratic (α = 300) and long-tail (α = 75) forms. Right: the corresponding probability density functions p_z(z).)

We take the objective function of expected utility as in Eq. (4), and the gradient of R w.r.t. φ can be derived as

\frac{\partial R}{\partial \phi} = \frac{\partial}{\partial \phi} \Big[ vy \int_0^b p_z(z)\,dz - \int_0^b z\, p_z(z)\,dz \Big].   (21)

Next, we will discuss several formulations of the market price distribution p_z(z, x; φ).

The market price z is a positive variable and its p.d.f. is p_z(z), z ∈ [0, +∞). Here we model p_z(z, x; φ) and naturally obtain the winning probability w_z, which describes the probability of winning when proposing bid b under the market price assumption p_z(z), as in Eq. (1). Another goal is to find a proper formulation for the winning probability function w, which should satisfy two properties:

b \to 0^+ \Rightarrow w \to 0; \qquad b \to +\infty \Rightarrow w \to 1.   (22)

Note that the p.d.f. of the market price has a long-tail property, as shown in our discussion in Sec. 7.6, so we propose three formulations of the bid landscape function for market price modeling.

5.1 Linear Form

First, we base this form on the assumption of a uniform market price distribution ranging over [0, α(x; φ)], where α is the maximal value of the market price, and we take α(x; φ) = e^{φ^T x} for each bid request feature x, where the exponential function is used to make sure the upper bound is positive. Thus the corresponding winning probability for the proposed bid price b is

w(b, x; \phi) = \frac{b}{\alpha(x;\phi)} = \frac{b}{e^{\phi^T x}}, \quad b \in [0, \alpha(x;\phi)],
p_z(z, x; \phi) = \frac{\partial w(z, x; \phi)}{\partial z} = \frac{1}{e^{\phi^T x}}.   (23)

Thus the gradient ∂R/∂φ derived from Eq. (21) is

\frac{\partial R}{\partial \phi} = \frac{\partial}{\partial \phi} \Big[ vy \int_0^b p_z(z)\,dz - \int_0^b z\, p_z(z)\,dz \Big] = \Big( \frac{b^2}{2} - vyb \Big) \frac{x}{e^{\phi^T x}}.   (24)

As is shown in Figure 4, the winning probability function w_z(b) is proportional to the bid price and the maximal market price is α, whose value varies for different x and is learned by our parametric model.
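As an illustration, here is a minimal sketch (our own simplification) of one gradient step for φ under the linear form, directly implementing Eq. (24); b is assumed to be the bid actually proposed for x.

```python
import numpy as np

def linear_form_phi_step(phi, x, y, b, v, eta=1e-4):
    """One ascent step on the expected profit w.r.t. phi for the linear landscape form.

    alpha = exp(phi^T x) is the per-request maximal market price; the per-sample
    gradient is (b^2/2 - v*y*b) * x / alpha, as in Eq. (24).
    """
    alpha = np.exp(phi @ x)
    grad = (b * b / 2.0 - v * y * b) * x / alpha
    return phi + eta * grad

# Usage (hypothetical values): phi = linear_form_phi_step(phi, x, y=1, b=42.0, v=300.0)
```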

5.2 Quadratic Form

We keep modeling the maximal market price as α(x; φ), that is z ∈ [0, α(x; φ)], and take a quadratic formulation for the winning probability function w_z(b). Then

w(b, x; \phi) = \frac{b}{\alpha(x;\phi)} \Big( 2 - \frac{b}{\alpha(x;\phi)} \Big), \quad b \in [0, \alpha(x;\phi)],
p_z(z, x; \phi) = \frac{\partial w(z, x; \phi)}{\partial z} = -\frac{2z}{\alpha(x;\phi)^2} + \frac{2}{\alpha(x;\phi)},   (25)

since the winning probability has the properties that w(b = 0) = 0, w(b = α) = 1, and is monotonically increasing over [0, α], as illustrated in Figure 4.

Thus the gradient ∂R/∂φ derived from Eq. (21) is

\frac{\partial R}{\partial \phi} = \frac{\partial}{\partial \phi} \Big[ vy \int_0^b p_z(z)\,dz - \int_0^b z\, p_z(z)\,dz \Big]
  = \Big[ vy \Big( \frac{2b^2}{\alpha(x;\phi)^3} - \frac{2b}{\alpha(x;\phi)^2} \Big) - \Big( \frac{4b^3}{3\alpha(x;\phi)^3} - \frac{b^2}{\alpha(x;\phi)^2} \Big) \Big] \cdot \frac{\partial \alpha(x;\phi)}{\partial \phi}.   (26)

Here α(x; φ) is a functional mapping from x to the maximal market price for each bid request. We also take a regression model to fit the mapping function, as in Sec. 5.1, whose parameter is φ.

5.3 Long Tail Form

A simple winning probability function form satisfying the properties in Eq. (22) can be adopted from [38]:

w(b, x; \phi) = \frac{b}{b + \alpha(x;\phi)},   (27)

where we again take α(x; φ) = e^{φ^T x} so as to learn the proper α value for each feature x. The market price distribution p_z(z, x; φ) is then

p_z(z, x; \phi) = \frac{\partial w(z, x; \phi)}{\partial z} = \frac{e^{\phi^T x}}{(z + e^{\phi^T x})^2}.   (28)

Thus we can derive the gradient of R w.r.t. φ as

\frac{\partial R}{\partial \phi}
  = \frac{\partial}{\partial \phi} \Big[ vy \int_0^b p_z(z)\,dz - \int_0^b z\, p_z(z)\,dz \Big]
  = \frac{\partial}{\partial \phi} \frac{vyb}{b + e^{\phi^T x}} - \frac{\partial}{\partial \phi} \int_0^b z\, p_z(z)\,dz
  = -\frac{vyb\, e^{\phi^T x}}{(b + e^{\phi^T x})^2}\, x - \frac{\partial}{\partial \phi} \Big[ \Big( \frac{e^{\phi^T x}}{e^{\phi^T x} + b} + \ln(e^{\phi^T x} + b) - (\phi^T x + 1) \Big) e^{\phi^T x} \Big].   (29)

As is also shown in Figure 4, the market price probability p_z(z) is monotonically decreasing as z rises, with a long tail for large market prices. Thus the winning probability function w_z(b) approaches 1 as the bid price b gets larger.
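For the long-tail form, the quantities appearing in Eq. (29) have simple closed forms. The sketch below is our own derivation check (assuming α = e^{φᵀx} for some request, and using SciPy only for the quadrature cross-check): it implements w(b), p_z(z) and the partial cost integral ∫₀ᵇ z p_z(z)dz and verifies the latter numerically.

```python
import numpy as np
from scipy.integrate import quad   # only used for the numerical cross-check

def win_prob(b, alpha):
    return b / (b + alpha)                          # Eq. (27)

def pdf(z, alpha):
    return alpha / (z + alpha) ** 2                 # Eq. (28)

def partial_cost(b, alpha):
    """Closed form of integral_0^b z p_z(z) dz for the long-tail form."""
    return alpha * (np.log(1.0 + b / alpha) - b / (b + alpha))

alpha, b = 75.0, 120.0                              # alpha = exp(phi^T x) for some request x
numeric, _ = quad(lambda z: z * pdf(z, alpha), 0.0, b)
assert abs(numeric - partial_cost(b, alpha)) < 1e-6
print(win_prob(b, alpha), partial_cost(b, alpha))
```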


Algorithm 1 Double Optimization for Campaign Profits

Input: Training set D
Output: Winning function w_φ(·), optimal utility (CTR) estimation model f_θ(·)
 1: Initially set parameters θ and φ
 2: for num. of training rounds do
 3:   for each sample (x, y) ∈ D do
 4:     Calculate the gradient of θ via Eq. (8)
 5:     Calculate the gradient of φ via Eq. (21)
 6:     Update parameters θ and φ via gradient descent
 7:   end for
 8: end for

5.4 Double Optimization for Campaign Performance

With the derived bid landscape forecasting model for campaign profits, we can naturally propose a double optimization algorithm. In Algorithm 1, both the CTR estimation model and the bid landscape forecasting model are learned, with parameters θ and φ, respectively. Moreover, the bid landscape model derives the corresponding winning probability function w_φ(b) w.r.t. the proposed bid price b.

Generally speaking, this double optimization aims to optimize the campaign profit from the view of the two prediction models. Note that, in this procedure, the contribution made by the bid landscape forecasting model φ lies in the learning of the CTR estimation model θ, since the update of the CTR model contains the landscape forecasting results p_z(z, x; φ) as in Eqs. (8) and (12).
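A minimal sketch of this alternating loop, in the spirit of Algorithm 1, assuming per-sample gradient callables like those sketched in Sec. 4.3 and Sec. 5 (the function names are ours):

```python
def double_optimization(data, theta, phi, bid_fn, grad_theta, grad_phi,
                        rounds=10, eta=1e-3):
    """Jointly update the CTR model (theta) and the landscape model (phi),
    in the spirit of Algorithm 1.

    data       : iterable of (x, y) samples from the bidding logs
    bid_fn     : callable(theta, x) -> bid price b(f_theta(x)), e.g. rho * v * sigmoid(theta @ x)
    grad_theta : callable(theta, phi, x, y) -> dR/dtheta as in Eq. (8)
    grad_phi   : callable(phi, x, y, b)     -> dR/dphi   as in Eq. (21)
    """
    for _ in range(rounds):
        for x, y in data:
            b = bid_fn(theta, x)
            theta = theta + eta * grad_theta(theta, phi, x, y)   # ascend profit w.r.t. theta
            phi = phi + eta * grad_phi(phi, x, y, b)             # ascend profit w.r.t. phi
    return theta, phi
```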

6 BID OPTIMIZATION

In this section, we focus on bid optimization under budget constraints. The rationale is that, since the CTR estimation and the winning probability have been settled, the bidding function b(·) should be fine-tuned for bid optimization. The click maximization framework [38] only takes the bid price as the upper bound of the cost, which is suitable for the first-price auction rather than the commonly used second-price auction.

As is proved in Appendix B, the linear bidding function b(u) w.r.t. the utility u is the optimal bidding strategy under the second-price auction. Here we replace f_θ(x) with r to represent the predicted user response for simplicity, and consider the linear bidding function

u(r) = vr, \qquad b(r) = \gamma u(r) = \gamma v r.   (30)

Then we derive the optimal solution of the parameter γ, where T is the total auction volume and B is the campaign budget:

\arg\max_\gamma\; T \int_r \int_0^{\gamma v r} (vr - z)\, p_z(z)\,dz \cdot p_r(r)\,dr
\quad \text{s.t.} \quad T \int_r \int_0^{\gamma v r} z\, p_z(z)\,dz \cdot p_r(r)\,dr = B.   (31)

The Lagrangian is

\mathcal{L}(\gamma, \lambda) = T \int_r \int_0^{\gamma v r} [vr - (\lambda + 1) z]\, p_z(z)\,dz \cdot p_r(r)\,dr + \lambda B,   (32)

Algorithm 2 Periodic Bidding Machine

Input: Training sets {D_1, D_2, ..., D_n}, total budgets {B_1, B_2, ..., B_n} for each training set D_i
Output: Optimal utility (CTR) estimation model f_θ(·), winning function w_φ(·) and bidding strategy b(·)
 1: Initially set parameters θ, φ and γ
 2: for each dataset D_i in the training data do
 3:   for each sample (x, y) ∈ D_i do
 4:     Calculate the gradients via Eqs. (8) and (21)
 5:     Optimize the corresponding parameters via gradient descent
 6:   end for
 7:   Update the bidding function b(·) by solving Eq. (34)
 8: end for

where λ is the Lagrangian multiplier. Setting the derivative w.r.t. γ to zero, we get

\frac{\partial \mathcal{L}(\gamma, \lambda)}{\partial \gamma} = 0 \;\Rightarrow\; \gamma = \frac{1}{\lambda + 1}.   (33)

To solve for λ, we take the derivative of the Lagrangian w.r.t. λ and let it be zero, which yields the constraint equation

T \int_r \int_0^{\frac{vr}{1+\lambda}} z\, p_z(z)\,dz\; p_r(r)\,dr = B,   (34)

which normally has no analytic solution for λ except for some trivial implementations of p_z(z) and p_r(r). Fortunately, a numeric solution for λ is easy to find because the left-hand side of the equation monotonically decreases as λ increases.

From Eq. (34), we find that the distribution of the predicted CTR, p_r(r), directly influences the optimal value of λ in the bidding function Eq. (33). This means that if we update the CTR estimation model f_θ(x), p_r(r) changes accordingly, which in turn leads to a change of the optimal λ [38].
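Since the left-hand side of Eq. (34) is monotone in λ, a simple bisection (or any one-dimensional root finder) suffices. Below is a minimal sketch under our own simplifying assumption that the integrals are approximated by empirical averages over logged (predicted CTR, market price) pairs:

```python
import numpy as np

def expected_spend(lam, ctrs, market_prices, v, T):
    """Empirical estimate of the left-hand side of Eq. (34) over a sample of
    (predicted CTR, market price) pairs from the logs."""
    bids = v * ctrs / (1.0 + lam)              # b(r) = v r / (1 + lambda), Eqs. (30) and (33)
    won = bids > market_prices                 # second-price auction: win and pay z when b > z
    return T * np.mean(np.where(won, market_prices, 0.0))

def solve_lambda(budget, ctrs, market_prices, v, T, lo=0.0, hi=1e4, iters=60):
    """Bisection on lambda: the expected spend decreases monotonically in lambda."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_spend(mid, ctrs, market_prices, v, T) > budget:
            lo = mid                           # overspending, so raise lambda (lower bids)
        else:
            hi = mid
    return (lo + hi) / 2.0
```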

6.1 Bidding Machine Algorithm

Having specified the learning algorithms for utility optimization and market modeling, and gone through bid optimization, we deliver our bidding machine algorithm as Algorithm 2, which is also illustrated in Figure 1. After receiving a bid request, our bidding machine puts it through the whole prediction process, i.e., (i) estimate the utility and (ii) predict the winning function, and then (iii) propose the corresponding bid price according to the optimal bidding strategy. Our predicted results are then aligned with the ground truth: if we win, the true user response supervises the utility estimation module, and the real market price corrects the winning function learning. After that, we fine-tune our bidding strategy to obtain the optimal bidding function that maximizes the advertiser's total profit. The offline training involves a joint optimization of the three components by alternately optimizing one and fixing the other two.

The bidding machine algorithm thus digests the historical bidding information and updates the parameters as a whole. We have also derived the online FTRL learning paradigm [20] for the bidding machine framework, which is included in Appendix A. For comparison with our previous work [26], we implement the SGD learning paradigm in our offline experiments.

7 EXPERIMENTS

In this section, we first present the datasets and the experiment settings with evaluation metrics. Second, we will present the user response prediction model performance and the bid landscape forecasting results, and discuss some reasons behind the improvement of our models. Third, we will discuss the experimental results for the whole bidding machine framework and finally the online A/B test results.

7.1 Datasets

We use two real-world datasets, iPinYou and YOYI, and provide repeatable offline empirical studies2.

iPinYou is a leading DSP company in China. The iPinYou dataset3 was released to promote research on real-time bidding. The entire dataset contains 65M bid records including 20M impressions, 15K clicks and 16K CNY expense on 9 different campaigns over 10 days in 2013. The auctions during the last 3 days are set as test data while the rest are training data.

YOYI runs a major DSP focusing on multi-device display advertising in China. The YOYI dataset4 contains 402M impressions, 500K clicks and 428K CNY expense during 8 days in Jan. 2016. The first 7 days in the time sequence are set as training data while the last day is the test data.

For the repeatable experiments, we focus our study on the iPinYou dataset. Our algorithms are further evaluated on the YOYI dataset for multi-device display advertising.

In real-time bidding, the training data contains far fewer positive samples than negative ones. Thus, similar to [12], negative down-sampling and the corresponding calibration method are adopted in the experiments. The online A/B test is conducted on an operational real-time bidding platform run by YOYI.
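For reference, the calibration commonly paired with negative down-sampling (of the form used in [12]) maps a probability p predicted on the down-sampled data back to the original scale, where w denotes the rate at which negative samples were kept; a one-line sketch of our own:

```python
def calibrate(p, w):
    """Map a CTR p predicted on negatively down-sampled data back to the original scale.

    w is the negative down-sampling rate (fraction of negatives kept), 0 < w <= 1;
    e.g., calibrate(0.20, 0.05) is roughly 0.0123.
    """
    return p / (p + (1.0 - p) / w)
```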

7.2 Experiment Setup

Experiment Flow. We take the original impression history log as the full-volume bid request data. The data contain a list of bid record triples with the user response (click) label, the corresponding market price and the request features. We follow the previous work [38] for feature engineering and the whole experiment flow, which is as follows: the bid requests are received along the time sequence, the same as the procedure in which the history log was generated. When receiving a request, our bid engine decides the bid price to participate in the real-time bidding auction. It wins if its bid price is higher than the market price, and loses otherwise. Note that the overall objective is to gain as much profit as possible, and the contribution of the market modeling to the final goal is to help the learning of the CTR model, which sequentially affects the final bid price. Thus both our user response prediction and market modeling are derived into a CTR estimation model in the experimental measurement of results. Hence, on one hand, we deploy different CTR estimation models to predict the user response probability, which can then be compared against each other. On the other hand, we also present the experimental results of our bid landscape forecasting model against other prediction models in Appendix E. After bidding, the labeled clicks of the winning impressions act as the user feedback information. It is worth mentioning that this evaluation methodology works well for evaluating user response prediction and bid optimization [3], [38] and has been adopted in the display advertising industry [18].

2. Repeatable experiment code: https://goo.gl/ZjMbnI.
3. iPinYou dataset link: http://goo.gl/9r8DtM.
4. YOYI dataset link: http://goo.gl/xaao4q.

Budget Constraints. It is obvious that if our bid engine bids a very high price each time, the cost and profit will stay the same as in the original test log. Thus the budget constraints play a key role in the evaluation [38]. For CTR estimation and bid landscape forecasting, we only report the test results without budget constraints since we care more about the prediction performance. For the bidding strategy optimization, we follow [37], [38] to run the evaluation using 1/64, 1/32, 1/16, 1/8, 1/4 and 1/2 of the original total cost in the test log as the budget constraints.

7.3 Evaluation Measures

Since our objective is to improve the profit of a performance campaign and cut down the unnecessary cost in bidding, in our evaluation we measure the profit and the ROI w.r.t. the corresponding cost in the bidding phase. When the bid engine wins the auction, the corresponding market price is added to the total cost. When the user response (click) is positive, we take the campaign click value (preset by the advertiser) of this action as the return. In our settings, this click value is set equal to the eCPC in the campaign's history data log, while in a real-world scenario the campaign value is set by the advertiser. The profit is regarded as the total gross profit (\sum \text{return} - \sum \text{cost}) over the whole test data auctions. ROI = profit/cost is another important measurement reflecting the cost-effectiveness of a bidding strategy. It can be regarded as a metric relatively orthogonal to the auction volume and the bid cost.
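A minimal sketch of this replay-style evaluation, under our own simplifying assumptions (a single campaign, a fixed click value and a hard budget stop):

```python
def replay(bids, market_prices, clicks, click_value, budget):
    """Replay a bidding strategy over a test log and report profit and ROI.

    bids, market_prices and clicks are aligned per auction; we win when bid >
    market price, pay the market price (second-price auction), earn click_value
    on each clicked winning impression, and stop once the budget is exhausted.
    """
    cost, ret = 0.0, 0.0
    for b, z, y in zip(bids, market_prices, clicks):
        if cost >= budget:
            break
        if b > z:
            cost += z
            ret += click_value * y
    profit = ret - cost
    roi = profit / cost if cost > 0 else 0.0
    return profit, roi
```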

We also take ad-related metrics such as eCPC, cost per thousand impressions (CPM), CTR, and the winning rate to compare the bidding performance of the different prediction models.

To measure the binary classification performance of CTR estimation, we adopt the commonly used AUC (area under the ROC curve) and RMSE (root mean squared error) to measure the accuracy of a regression model. Moreover, for bid landscape forecasting, we report the ANLP (averaged negative log probability) [32] of the forecasting results over the test dataset. The detailed experimental results for AUC, RMSE and ANLP can be found in Appendices D and E. We also present the significance test results in Appendix F.

7.4 Compared Settings

Test Settings without Budget Constraint. In the first part of our experiment, the unlimited-budget setting is tested. All the CTR models are embedded with the same truthful bidding function. We compare the following models in this part:


Fig. 5: Training on iPinYou (left) and YOYI (right). (Learning curves of profit against training rounds for the SE, CE, EU and RR models; the left panel is campaign 1458.)

TABLE 2: Direct campaign profit over baselines.

             Profit (×10^7)          ROI
iPinYou       SE       CE          SE        CE
1458          3.2      3.6         4.2       6.6
2259         -0.32     0.40       -0.080     0.18
2261          0.29     0.63        0.26      0.40
2821          0.11     0.08        0.21      0.023
2997          0.11     0.14        0.42      0.71
3358          1.76     2.4         5.4       5.2
3386          0.51     1.6         0.16      1.2
3427          0.33     2.9         0.11      3.4
3476          0.65     3.1         0.36      3.5
Average       0.74     1.7         1.2       2.3
YOYI        665.6    669.5         1.8       1.9

CE - The logistic regression model [12], [20], which is widely used in many DSP platforms to make predictions of user feedback. This model takes cross entropy as its optimization objective and has the gradient of Eq. (17).

SE - A logistic regression model that takes the squared error loss as the objective function, with the gradient update of Eq. (16).

EU - Our proposed expected utility model for CTR estimation, which takes the gradient update of Eq. (14).

RR - Our proposed risk-return model for CTR estimation, which takes the gradient update of Eq. (15).

BM(MKT) - Our proposed CTR estimation model with market (MKT) modeling, as described in the double optimization method of Algorithm 1.

Note that the EU and RR models consider a statistical bid landscape function p_z(z), while the last BM(MKT) model utilizes a parametric market competition model p_z(z, x; φ).

Test Settings with Budget Constraint. In the second scenario, we evaluate different bidding strategies under budget constraints. Here we test 4 solutions:

CELIN - As in [24], [38], the bid value is linearly proportional to the predicted CTR. We implement a logistic regression model with a linear bidding strategy, which is widely used in many production environments.

ORTB - The Optimal Real-Time Bidding strategy [38], which applies functional optimization to the bidding function.

PRUD - A Prudent bidder [19] which combines CTR estimation and winning price prediction together to efficiently bid in real time.

BM(FULL) - The full bidding machine framework with joint optimization as described in Algorithm 2.

TABLE 3: Campaign profit improvement over baseline CE.

             Profit gain             ROI gain
iPinYou       EU         RR         EU        RR
1458          7.10%      9.00%      233%      267%
2259         81.6%      99.3%       233%      472%
2261         26.3%      31.1%      44.4%     91.2%
2821          573%       615%      1334%      943%
2997          5.00%      0.700%    -3.60%    -11.4%
3358          1.70%      6.70%     77.1%     77.7%
3386         -1.20%      2.50%     20.6%     58.3%
3427          5.50%      8.70%     52.0%      175%
3476          4.20%      8.60%     16.0%     91.1%
YOYI          9.04%      0.600%    14.8%      2.11%
Average      +71.2%     +78.2%     +202%     +217%

7.5 Campaign Profit Optimization

As we have found in Appendix D, our models have at least comparable performance for predicting CTR, so we are now ready to examine the performance of profit optimization for each campaign in an unlimited-budget setting (we will present the results under a limited budget in Section 7.7.2). Figure 5 plots the obtained profit against the training rounds for the 4 models on both the iPinYou and YOYI datasets. The model learns on the whole training set in each round. While the figures show the convergence of each estimation model, SE does not generalize its CTR prediction well to profit optimization on the iPinYou dataset. Compared to RR, EU's prediction focuses on medium-valued CTR cases, which is indeed the range with a high volume of clicks in YOYI's market data, while RR focuses more on higher-valued cases. This result shows that EU is better at winning high-quality cases than RR.

We further examine the two baselines, SE and CE, in more detail in Table 2. Both models achieve positive ROIs in almost all campaigns, and in most campaigns CE outperforms SE in terms of both profit and ROI. This is consistent with our finding in Appendix D that CE outperforms SE in CTR prediction accuracy.

We next take the better CE model as the baseline and compare the profit gain and ROI gain of our proposed EU and RR models against it, as shown in Table 3. We observe that (i) both EU and RR consistently achieve higher profit than the CE baseline; only on iPinYou campaign 3386 does EU gain less profit. On average, our proposed models improve the profit by about 71.2% for EU and 78.2% for RR, respectively. (ii) For the ROI metric, EU and RR obtain even larger overall improvements against the baseline CE: the average ROI gains are 202% for EU and 217% for RR, respectively. These results suggest that our proposed models are much more cost-effective. (iii) RR is the best overall: on average it gains 7.0% and 15% more than EU in profit and ROI, respectively.
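For reference, the gains in Table 3 are relative improvements over the CE baseline; the following are the assumed definitions (with ROI taken as profit over cost), not reproduced from the paper:

ROI = profit / cost,  profit gain(M) = (P_M − P_CE) / |P_CE|,  ROI gain(M) = (ROI_M − ROI_CE) / |ROI_CE|.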

Finally, Table 4 provides other statistics to summarize the overall campaign performance of the 4 CTR estimation models. CTR and eCPC reflect the quality and cost-effectiveness of the winning impressions. Our models, both EU and RR, outperform the baselines in terms of CTR and eCPC with comparable CPM and a relatively low winning rate. This indicates that EU and RR successfully allocate the budget to high-quality, cost-effective ad inventories and avoid the low-quality ones.


TABLE 4: Overall statistics in offline evaluation.

              CTR (×10^-4)                      eCPC
iPinYou        SE     CE     EU     RR          SE     CE     EU     RR
1458           34     33     59    190          17     11      4.3    3.4
2259            3.3    3.6    3.7    5.8       303    235    172    136
2261            2.4    2.7    3.0    2.8       234    212    188    168
2821            5.5    5.9    4.8    7.0       116    137    105    112
2997           31     25     26     27           9.8    8.2    8.3    8.6
3358           51     41     69     61          18     19     12     12
3386            7.8   11     13     15          90     48     43     36
3427            7.2   25     29     72.8        98     25     17.3   10
3476            6.4   16     17     33.1       111     34     30     20
Average        16     18     25     46         110     81     64     57
YOYI           16     18     26     24          12.9   12.4   11.3   12

              CPM                               Win Rate
iPinYou        SE     CE     EU     RR          SE     CE     EU     RR
1458           57     37     25     65          0.22   0.24   0.13   0.041
2259          100     84     64     78          0.89   0.63   0.44   0.24
2261           57     56     56     46          0.55   0.81   0.71   0.67
2821           63     80     50     78          0.12   0.63   0.48   0.45
2997           30     20     21     22          0.55   0.63   0.65   0.63
3358           92     77     80     70          0.11   0.20   0.11   0.13
3386           71     54     55     55          0.82   0.45   0.36   0.29
3427           70     60     49     75          0.75   0.26   0.22   0.082
3476           71     55     50     65          0.49   0.31   0.31   0.15
Average        68     58     50     62          0.50   0.46   0.38   0.30
YOYI           20     23     29     30          0.36   0.30   0.22   0.22

Fig. 6: Analysis of bid price and market price distributions (iPinYou campaign 2259), reflecting the formulation in Sec. 5. Left: log-scale count distributions over price for the market price, the CE and EU bid prices, and the market price of impressions with clicks. Right: log-scale count distribution of the price difference (bid price − market price) for CE and EU.

7.6 Bidding Data Analysis

In this section, we further analyze the bidding data to gain more insight into why our models outperform the baselines. As discussed in our formulation, a key advantage of our models is the introduction of the market price distribution and the utility of the bid into the learning of the CTR model parameters. To understand the impact, in the left subfigure of Figure 6 we plot the distribution of bid values for our EU model (similar to RR) and the baseline CE model, and compare them with the market price distribution as well as the market prices of the impressions that received clicks. We cut the figure off at price > 300 since the market price never goes beyond 300 in the dataset.

Firstly, we see that the bid prices generated from CE deviate far from the market prices; a large portion of the bids from CE are very high, whereas the distribution (in log scale) of the market prices gently descends from 50 to 300, with its peak in the region between 0 and 30.

By contrast, our EU model nicely reduces the difference between the distributions of the bid price and the market price by focusing the training on the cases where the bid is close to the market price (see the discussion in Section 4.3).

Moreover, considering the market price distribution of the impressions with clicks, we find that the bid distribution of EU fits it much better than that of CE, which means the bids from EU are less likely to miss high-quality ad impressions than those from CE.

Fig. 7: Bidding Machine Learning Performance. Learning curves of AUC, scaled profit and ANLP against training rounds on iPinYou campaigns 3476 (left) and 3358 (right).

The right subfigure of Figure 6 further shows the distributions of the price difference between the bids (from EU and CE, respectively) and the true market prices. We find that CE has a rather biased bidding strategy: a large portion of its bids are much higher than the corresponding market prices. For EU, on the contrary, the major proportion of the bids falls in the "sensitive zone" where the bid price is close to the market price. The peak is located right at zero, which indicates that EU effectively leverages the market price distribution and bids sensibly.
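The price-difference analysis above can be reproduced with a few lines of code; the sketch below assumes per-impression arrays of bid prices and winning (market) prices from the test log and plots log-scaled counts as in the right panel of Figure 6 (the function and variable names are illustrative).

import numpy as np
import matplotlib.pyplot as plt

def plot_price_difference(bid_prices, market_prices, label):
    # Histogram of (bid price - market price) with log-scaled counts,
    # mirroring the right panel of Figure 6.
    diff = np.asarray(bid_prices, dtype=float) - np.asarray(market_prices, dtype=float)
    counts, edges = np.histogram(diff, bins=100)
    centers = 0.5 * (edges[:-1] + edges[1:])
    plt.plot(centers, np.log10(counts + 1), label=label)

# Usage: plot_price_difference(bids_eu, prices, "EU"); plot_price_difference(bids_ce, prices, "CE")
# then plt.legend(); plt.xlabel("Price Difference"); plt.ylabel("Log-Number"); plt.show()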

7.7 Bidding Machines

7.7.1 Profit Optimization with Market Modeling

In this section, we present the performance of our CTR estimation model learned jointly with bid landscape forecasting. In Algorithm 1, the learning of the bid landscape forecasting model p_z(z, x; φ) and of the corresponding winning probability function w(b, x; φ) contributes to the CTR estimation model f_θ(x). Thus, for evaluation, we implement the learned CTR model, which optimizes the campaign profit with the benefit of the bid landscape forecasting model. In this experiment, we aim to demonstrate that the binary optimization (with both CTR optimization and market price learning) for campaign profit is stronger than the single optimizer (with only CTR optimization).

Table 5 summarizes the results over four key metrics for the binary optimization framework, which takes parametric market modeling into consideration. From the table, we can easily see that the binary optimization, i.e., BM(MKT), performs the best on almost all measurements. The improvement is reasonable since both the CTR estimation part and the bid landscape modeling part are learned by maximizing the profit. For AUC performance, we can see that the learning of the bid landscape model contributes to the classification accuracy of the CTR estimation model, which reflects the effectiveness of the learning paradigm for the market sensitivity p_z(z) in Eq. (8).

Recall that there are three main metrics for the three sub-problems: AUC for CTR estimation, ANLP for bid landscape forecasting and profit for the bidding strategy. We plot the learning curves of bidding machine in terms of these metrics in Figure 7. We observe that the campaign profit converges after around 8 rounds, while both AUC and ANLP converge after 6 rounds, which shows the good convergence property of bidding machine.


TABLE 5: Campaign profit for Single CTR estimation and Binary Optimization with market modeling.

                          1458    2259    2261    2821    2997    3358    3386    3427    3476    Average
AUC            EU         .987    .674    .622    .608    .606    .970    .761    .976    .954    .795
               RR         .977    .691    .619    .639    .608    .980    .778    .960    .950    .800
               BM(MKT)    .981    .678    .647    .620    .603    .980    .788    .973    .955    .803
Profit (×10^7) EU         3.91    .732    .797    .539    .147    2.42    1.58    3.05    3.25    1.82
               RR         3.98    .803    .827    .572    .141    2.54    1.64    3.14    3.39    1.89
               BM(MKT)    4.02    .766    .863    .669    .148    2.57    1.73    3.18    3.31    1.91
ROI            EU         19.2    .607    .582    .333    .679    9.26    1.46    5.30    4.02    4.60
               RR         24.3    1.03    .771    .247    .624    9.29    1.90    9.57    6.63    6.04
               BM(MKT)    31.7    .829    .692    .476    .733    8.83    1.08    9.70    5.40    6.61
eCPC           EU         4.27    172     187     104     8.33    11.4    42.5    17.3    30.0    64.3
               RR         3.39    136     167     112     8.61    11.4    36.1    10.3    19.7    56.1
               BM(MKT)    2.62    151     175     94.7    8.07    11.9    50.2    10.1    23.5    58.7

7.7.2 Experiments with Budgets

As formulated in Eq. (6), the bidding machine algorithm is capable of jointly optimizing CTR estimation, bid landscape forecasting and the bidding function by alternately fixing two of them and optimizing the third module. In this section, we evaluate our joint optimization models under budget constraints. We mainly compare four models: CELIN, ORTB, PRUD and our integrated algorithm BM(FULL), as discussed in Section 7.4. We set the test budget to 1/2, 1/4, 1/8, 1/16, 1/32 and 1/64 of the original total cost in the history log, respectively.
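As a reading aid for this alternating scheme, the following is a minimal block-coordinate sketch: each round holds two modules fixed and updates the third toward the shared profit objective. The three update callables are hypothetical placeholders, not the paper's Algorithm 2.

def train_bidding_machine(train_log, update_ctr, update_landscape, update_bidder,
                          theta, phi, bid_params, rounds=10):
    # Alternating (block-coordinate) optimization over the three modules:
    # CTR parameters theta, landscape parameters phi, and bidding parameters.
    for _ in range(rounds):
        theta = update_ctr(theta, phi, bid_params, train_log)            # fix phi, bidder
        phi = update_landscape(theta, phi, bid_params, train_log)        # fix theta, bidder
        bid_params = update_bidder(theta, phi, bid_params, train_log)    # fix theta, phi
    return theta, phi, bid_params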

In Figure 8, we compare the overall performance of these four models over the tested campaigns of the iPinYou dataset. The x-axis indicates the proportion of the total cost used as the budget setting. We find that in almost all settings and metrics, our proposed bidding machine algorithm BM(FULL) outperforms the other strong baselines, including the state-of-the-art ORTB and PRUD.

Specifically, we find that (i) BM(FULL) achieves higher profits than the other models since the learning objective of bidding machine is directly the profit. Moreover, as the budget constraint loosens, the baseline models obtain high profits under a medium constraint but their profits drop quickly as the budget grows, which means these models do not pay much attention to bidding efficiency and waste budget on high-market-price requests with low returns. (ii) The ROI and eCPC results reflect cost-effectiveness. We can see that the overall ROI of the baselines decreases to almost zero, while BM(FULL) stays effective under all settings, which means our model can dynamically drop bid traffic with low benefit and save the budget. (iii) As for CTR, we find that PRUD achieves the highest CTR under tight budgets, while our model becomes better when the budget constraint is loose. PRUD maximizes the total clicks and achieves state-of-the-art performance in that respect; however, profit is not its optimization goal, so PRUD has higher CTR but lower ROI than our model.

In Table 6, we list the detailed profits achieved by all the compared models under three different budget constraints on the iPinYou campaigns. The results clearly present the trends observed in Figure 8.

7.8 Online A/B Testing

Our bidding machine models are deployed and tested in a live, commercial environment provided by the YOYI PLUS (Programmatic Links Us) platform, a major DSP in China. The online experiment consists of two phases.

Fig. 8: Performance with budgets on iPinYou. The four panels plot profit (×10^7), ROI, eCPC and CTR against the budget proportion for CELIN, ORTB, PRUD and BM(FULL).

TABLE 6: Achieved direct profit (×10^6) with budgets.

           CELIN                   ORTB                    PRUD                    BM(FULL)
iPinYou    1/64   1/8    1         1/64   1/8    1         1/64   1/8    1         1/64   1/8    1
1458       40.1   37.1   22.5      39.9   37.2   32.8      35.3   32.3    3.55     40.7   40.5   40.5
2259        2.47   5.67  -5.69      1.81   3.72  -2.55      4.00   5.20    .838     3.91   6.47   8.62
2261        1.93   4.48    .151     1.63   4.99   3.89       .872   3.47   2.46     1.97   5.31   9.13
2821        3.97   6.03  -13.8      3.69   5.47  -4.13      1.42   4.57   1.70      4.10   7.28   7.60
2997         .518  1.25   -1.03      .530  1.36    .151      .111   .665   .675      .670  1.39   1.49
3358       24.3   24.3   11.1      24.3   23.6   18.0      24.0   21.4    1.24     25.2   26.3   26.3
3386        7.83  13.3    2.43      6.67  12.3   10.1       7.10  11.5    3.71      8.30  16.4   18.4
3427       29.8   30.9   10.1      29.6   30.5   20.6      30.9   32.3    3.35     30.7   32.0   32.4
3476       21.7   33.3   15.7      17.3   33.1   22.8      19.8   29.6    2.82     23.3   33.6   34.8
Average    14.7   17.37   4.62     13.9   16.9   11.3      13.7   15.6    2.26     15.4   18.8   19.8

The first phase tests our CTR estimation models on desktop traffic, while the second phase compares different bidding strategies over campaigns that only target mobile inventories. The received bid requests are randomly assigned to each model according to the user cookie ID, and the traffic share controlled by the DSP platform is set equal among all the compared models. We set the same budget constraint for all deployed models; the unit of money is CNY.

Phase I. We test over 10 campaigns during 25-26 January 2016. There are 4 deployed models: EU, RR, CE and FM, where the first three have been discussed in Section 7.4 and FM is a factorization machine model [23] with non-hierarchical feature engineering. The comparison of bidding machine with our previously proposed models is presented in Phase II. To show the comparable performance of user response prediction, we set the same linear bidding function for all prediction models, including the baselines.


Fig. 9: Online testing results on YOYI (Phase I in 2016).

Fig. 10: Online results on YOYI MOBILE (Phase II in 2017). Bar charts of ROI, profit (×10^3 CNY), eCPC (CNY), CTR (‰), win rate (%) and CPM (CNY) for the CELIN, EULIN and BM strategies.

The only difference is the embedded prediction model. The whole tested bid flow contains over 89M auctions, including 3.3M impressions, 8,440 clicks and 1,403 CNY of budget cost. The overall results are presented in Figure 9.

From the comparison in Figure 9, we draw the following conclusions: (i) EU and RR achieve higher profit and ROI than CE and FM. Specifically, EU achieves twice the ROI of FM, and RR achieves more than 50% higher return than FM. EU gains 25.5% and 53.0% more profit than CE and FM, respectively. (ii) eCPC consistently shows an inverse relationship with the trend of ROI, and the online results also reflect this relationship: EU and RR have lower eCPC than the two baseline models. (iii) As for CTR, we find that EU achieves the highest CTR and RR also performs better than CE. FM has a higher CTR than the CE model because it can learn feature interactions via the latent vector inner product [23]. However, FM obtains relatively less profit gain and lower ROI than CE, which shows that FM does not pay enough attention to the auctions with high return value.

Phase II. We test over 5 campaigns on the mobile platform during 30 days in April 2017. Here we deploy three bidding strategies: CELIN, EULIN and BM. The first two are the linear bidding strategy with the CE and EU models embedded, respectively. The third is our bidding machine algorithm, referred to as BM in this section, which considers CTR optimization, bid landscape modeling and bidding strategy optimization altogether to maximize the overall profit. Note that in our two-stage online testing, the RR model performed worse than the EU model in Phase I; we therefore discard RR in Phase II due to business constraints. The whole tested bid flow involves 224M auctions, including 23M impressions, 168K clicks and in total 50K CNY of budget cost. The overall results are illustrated in Figure 10.

For bidding efficiency, we draw the following conclusions from Figure 10: (i) All three bidding algorithms achieve more than 100% profit w.r.t. the cost, and the bidding machine algorithm in particular gains almost double profit. The joint profit optimization delivers the best performance on the ROI and profit measurements. (ii) The EULIN model achieved the best (lowest) eCPC in the Phase I online experiments, while our bidding machine beats this strong baseline and reduces the cost of each click one step further. The reason is that bidding machine dynamically learns the market competition, while EULIN relies on a static bid landscape forecast. (iii) EULIN and BM maintain lower winning rates than CELIN while achieving higher CTR, which echoes the finding of the Phase I online testing. (iv) BM and CELIN prefer lower-priced impressions, as shown in the CPM comparison. Moreover, our learning-based bid landscape modeling also aims to maximize the overall profit, which goes a step further than CTR-only optimization and contributes to obtaining more profit.

In sum, the online A/B testing results demonstrate the effectiveness of our proposed bidding machine model for profit optimization. The differences between the offline and online experimental results are reasonable, given the offline data bias and the market dynamics [39].

8 CONCLUSIONS

In this paper, we proposed bidding machine, a comprehensive learning to bid framework, which aims to maximize the profit of the advertiser in real-time bidding based display advertising. In our bidding machine paradigm, the learning model consumes the recent bidding logs and jointly optimizes three components, namely user response prediction, bid landscape forecasting and the bidding strategy, in a unified objective function. Our mathematical derivations showed that the gradient of each component benefits from the behavior of the others. We tested our prediction model and the optimized bidding strategy against other state-of-the-art bidding algorithms under various market settings. The empirical study, including both offline experiments and online A/B testing on a commercial RTB platform, has verified the practical efficacy of the proposed bidding machine.

In future work, we plan to integrate the censored learning paradigm [3], [32] for more accurate bid landscape forecasting. Extending the BM framework to reinforcement learning settings [5] would also be a promising direction. Besides, we would like to investigate multi-agent bidding machine interactions and explore the potential equilibria.

REFERENCES

[1] D. Agarwal, R. Agrawal, R. Khanna, and N. Kota. Estimating rates of rare events with multiple hierarchies through scalable log-linear models. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 213-222. ACM, 2010.

[2] D. Agarwal, S. Ghosh, K. Wei, and S. You. Budget pacing for targeted online advertisements at linkedin. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1613-1619. ACM, 2014.

[3] K. Amin, M. Kearns, P. Key, and A. Schwaighofer. Budget optimization for sponsored search: Censored learning in MDPs. UAI, 2012.

[4] J. O. Berger. Statistical decision theory and Bayesian analysis. Springer Science & Business Media, 2013.


[5] H. Cai, K. Ren, W. Zhang, K. Malialis, J. Wang, Y. Yu, and D. Guo. Real-time bidding by reinforcement learning in display advertising. In The Tenth ACM International Conference on Web Search and Data Mining (WSDM). ACM, 2017.

[6] O. Chapelle. Offline evaluation of response prediction in online advertising auctions. In Proceedings of the 24th International Conference on World Wide Web Companion, pages 919-922. International World Wide Web Conferences Steering Committee, 2015.

[7] Y. Chen, P. Berkhin, B. Anderson, and N. R. Devanur. Real-time bidding algorithms for performance-based display ad allocation. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1307-1315. ACM, 2011.

[8] Y. Cui, R. Zhang, W. Li, and J. Mao. Bid landscape forecasting in online ad exchange marketplace. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 265-273. ACM, 2011.

[9] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Technical report, National Bureau of Economic Research, 2005.

[10] Google. The arrival of real-time bidding and what it means for media buyers. Google White Paper, 2012.

[11] T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft's bing search engine. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 13-20, 2010.

[12] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, et al. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pages 1-9. ACM, 2014.

[13] V. Krishna. Auction theory. Academic Press, 2009.

[14] K. J. Lang, B. Moseley, and S. Vassilvitskii. Handling forecast errors while bidding for display advertising. In Proceedings of the 21st international conference on World Wide Web, pages 371-380. ACM, 2012.

[15] K.-C. Lee, A. Jalali, and A. Dasdan. Real time bid optimization with smooth budget delivery in online advertising. In Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, page 1. ACM, 2013.

[16] K.-C. Lee, B. Orten, A. Dasdan, and W. Li. Estimating conversion rate in display advertising from past performance data. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 768-776. ACM, 2012.

[17] X. Li and D. Guan. Programmatic buying bidding strategies with win rate and winning price estimation in real time mobile advertising. In Advances in Knowledge Discovery and Data Mining, pages 447-460. Springer, 2014.

[18] H. Liao, L. Peng, Z. Liu, and X. Shen. iPinYou global RTB bidding algorithm competition dataset. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pages 1-6. ACM, 2014.

[19] C.-C. Lin, K.-T. Chuang, W. C.-H. Wu, and M.-S. Chen. Combining powers of two predictors in optimizing real-time bidding strategy under constrained budget. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 2143-2148. ACM, 2016.

[20] H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, et al. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1222-1230. ACM, 2013.

[21] A. K. Menon, K.-P. Chitrapura, S. Garg, D. Agarwal, and N. Kota. Response prediction using collaborative filtering with hierarchies and side-information. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 141-149. ACM, 2011.

[22] S. Muthukrishnan. Ad exchanges: Research issues. In Internet and network economics, pages 1-12. Springer, 2009.

[23] R. J. Oentaryo, E.-P. Lim, J.-W. Low, D. Lo, and M. Finegold. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. In Proceedings of the 7th ACM international conference on Web search and data mining, pages 123-132. ACM, 2014.

[24] C. Perlich, B. Dalessandro, R. Hook, O. Stitelman, T. Raeder, and F. Provost. Bid optimizing and inventory scoring in targeted online advertising. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 804-812. ACM, 2012.

[25] Y. Qu, H. Cai, K. Ren, W. Zhang, Y. Yu, Y. Wen, and J. Wang. Product-based neural networks for user response prediction. In IEEE International Conference on Data Mining (ICDM), 2016.

[26] K. Ren, W. Zhang, Y. Rong, H. Zhang, Y. Yu, and J. Wang. User response learning for directly optimizing campaign performance in display advertising. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 679-688. ACM, 2016.

[27] M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th international conference on World Wide Web, pages 521-530. ACM, 2007.

[28] A.-P. Ta. Factorization machines with follow-the-regularized-leader for CTR prediction in display advertising. In Big Data (Big Data), 2015 IEEE International Conference on, pages 2889-2891. IEEE, 2015.

[29] F. Vasile and D. Lefortier. Cost-sensitive learning for bidding in online advertising auctions. NIPS Workshop: Machine Learning for eCommerce, 2015.

[30] J. Wang and S. Yuan. Real-time bidding: A new frontier of computational advertising research. In CIKM Tutorial, 2013.

[31] J. Wang, W. Zhang, and S. Yuan. Display advertising with real-time bidding (RTB) and behavioural targeting. arXiv preprint arXiv:1610.03013, 2016.

[32] Y. Wang, K. Ren, W. Zhang, J. Wang, and Y. Yu. Functional bid landscape forecasting for display advertising. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 115-131. Springer, 2016.

[33] W. C.-H. Wu, M.-Y. Yeh, and M.-S. Chen. Predicting winning price in real time bidding with censored data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1305-1314. ACM, 2015.

[34] J. Xu, X. Shao, J. Ma, K.-C. Lee, H. Qi, and Q. Lu. Lift-based bidding in ad selection. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.

[35] Y. Yuan, F. Wang, J. Li, and R. Qin. A survey on real time bidding advertising. In Service Operations and Logistics, and Informatics (SOLI), 2014 IEEE International Conference on, pages 418-423. IEEE, 2014.

[36] W. Zhang. Optimal Real-Time Bidding for Display Advertising. PhD thesis, UCL (University College London), 2016.

[37] W. Zhang and J. Wang. Statistical arbitrage mining for display advertising. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1465-1474. ACM, 2015.

[38] W. Zhang, S. Yuan, and J. Wang. Optimal real-time bidding for display advertising. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1077-1086. ACM, 2014.

[39] W. Zhang, T. Zhou, J. Wang, and J. Xu. Bid-aware gradient descent for unbiased learning with censored data in display advertising. In KDD, 2016.

Kan Ren is a PhD student in the Department of Computer Science, Shanghai Jiao Tong University. His research interests include data mining and computational advertising. He has published several research papers in KDD, CIKM, ECML-PKDD, WSDM and ICDM.

Yifei Rong is responsible for algorithm R&D at Meituan-Dianping Inc. He has published several scientific papers about computational advertising and marketing in WSDM and CIKM.

Weinan Zhang is an assistant professor in the Department of Computer Science, Shanghai Jiao Tong University. His research interests include machine learning and data mining, particularly deep learning and reinforcement learning techniques for real-world data mining scenarios. He has published 50 research papers in conferences and journals including KDD, SIGIR, AAAI, WWW, WSDM, ICDM, JMLR and IPM.

Yong Yu is a professor at Shanghai Jiao Tong University. His research interests include information systems, data mining and machine learning. He has published over 200 papers and served as a PC member of a dozen related conferences (e.g., WWW, SIGIR, RecSys, NIPS, ICML) in these fields.

Jun Wang is a professor at University College London. He has published over 100 research papers in leading journals and conference proceedings including TOIS, TKDE, WWW, CIKM, SIGIR, KDD, SIGMM and AAAI. He received the Best Doctoral Consortium award at ACM SIGIR'06 and the Best Paper Prize at ECIR'09, ECIR'12, ADKDD'14 and SIGIR'17.