Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf...
-
Upload
daniel-gallagher -
Category
Documents
-
view
219 -
download
0
Transcript of Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf...
![Page 1: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/1.jpg)
Probabilistic Machine Learning in Computational Advertising
Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela
Online Services and AdvertisingMicrosoft Research Cambridge, UK
NIPS 2009 – December 2009
![Page 2: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/2.jpg)
Outline
• Online Advertising and Paid Search• AdPredictorTM: Predicting User Clicks on Ads
[Appendix]• Model shrinking• Parallel training
![Page 3: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/3.jpg)
ONLINE ADVERTISING AND PAID SEARCH
![Page 4: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/4.jpg)
Advertising Industry Business: Size
0
100
200
300
400
500
600
2001 2002 2003 2004 2005 2006
Year
Outdoor
Cinema
Radio
TV
Online
Annu
al E
xpen
ditu
re (i
n bi
llion
USD
)
GDP Denmark (2006)
Microsoft Revenue (2008)
Data: World Advertising Research Center Report 2007Data: World Advertising Research Center Report 2007
![Page 5: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/5.jpg)
Advertising Industry Business: Growth
-20.00%
-10.00%
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
2001 2002 2003 2004 2005 2006
Year
Outdoor
Cinema
Radio
TV
Online
Data: World Advertising Research Center Report 2007Data: World Advertising Research Center Report 2007Data: World Advertising Research Center Report 2007Data: World Advertising Research Center Report 2007
![Page 6: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/6.jpg)
![Page 7: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/7.jpg)
$1.00$1.00
$2.00$2.00
$0.10$0.10
* 10%* 10%
* 4%* 4%
* 50%* 50%
=$0.10=$0.10
=$0.08=$0.08
=$0.05=$0.05
$0.80$0.80
$1.25$1.25
$0.05$0.05
Display to users (expected bid)Display to users (expected bid) Charge advertisers (per click)Charge advertisers (per click)
![Page 8: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/8.jpg)
The Scale of Things
• Realistic training set for proof of concept: 7,000,000,000 impressions
• 2 weeks of CPU time during training: 2 wks × 7 days × 86,400 sec/day =
1,209,600 seconds• Learning algorithm speed requirement:– 5,787 impression updates / sec– 172.8 μs per impression update
![Page 9: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/9.jpg)
ADPREDICTORBayesian Linear Probit Regression
![Page 10: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/10.jpg)
Impression Level Predictions
![Page 11: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/11.jpg)
One Weight per Feature Value102.34.12.201
15.70.165.9
221.98.2.187
92.154.3.86
Client IP
Exact Match
Broad Match
MatchType
Position
ML-1
SB-1
SB-2
++ pClick
![Page 12: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/12.jpg)
Click Potential
Linear: click potential = sum of feature click contributions
click potentialclick potential00
PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2
ListingId = 798831ListingId = 798831
ClientIP = 98.0.101.23ClientIP = 98.0.101.23
clickclickclickclickno clickno clickno clickno click
Impression click potentialImpression click potential
![Page 13: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/13.jpg)
Gaussian Noise
Probit: area under Gaussian tail as a function of click potential
click potentialclick potential00
Impression click potentialImpression click potential
clickclickclickclickno clickno clickno clickno click
P(click) = P(potential > 0)P(click) = P(potential > 0)
![Page 14: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/14.jpg)
Probit
Probit: area under Gaussian tail as a function of click potential
click potentialclick potential00
Impression click potentialImpression click potential
100%100%
P(click|Impression)P(click|Impression)
![Page 15: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/15.jpg)
click potentialclick potential00
PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2
ListingId = 798831ListingId = 798831
ClientIP = 98.0.101.23ClientIP = 98.0.101.23
Modelling Uncertainty
![Page 16: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/16.jpg)
click potentialclick potential00
Impression click potentialImpression click potential
Uncertainty about the Potential
![Page 17: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/17.jpg)
click potentialclick potential00
Impression click potentialImpression click potential
Probability of Click
100%100%
![Page 18: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/18.jpg)
Uncertainty: Bayesian Probabilities102.34.12.201
15.70.165.9
221.98.2.187
92.154.3.86
Client IP
Exact Match
Broad Match
MatchType
Position
ML-1
SB-1
SB-2
p(pClick)++
![Page 19: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/19.jpg)
Principled Exploration
![Page 20: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/20.jpg)
Training Algorithm in Action
w1
w1
w2
w2++
zz
cc Prediction Training/Update
![Page 21: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/21.jpg)
Posterior Updates for the Click Event
![Page 22: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/22.jpg)
Client IP: Mean & Variance
![Page 23: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/23.jpg)
Calibrated Predictions
![Page 24: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/24.jpg)
Joint Updates vs. Independent Aggregation
Naive Bayes
![Page 25: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/25.jpg)
adPredictor Wrap Up
![Page 26: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/26.jpg)
Thank [email protected]
![Page 27: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/27.jpg)
APPENDIX
![Page 28: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/28.jpg)
Dealing with Millions of Variables• Observation 1: Large variable bags follow a
power-law w.r.t. frequency of items• Observation 2: Weight posteriors of rare items
are close to their prior• Idea:
1. Initially, the belief of each new item is compactly represented by one (and the same) prior
2. After observing an item for the first time, the posterior is allocated
3. At regular intervals, all weight posteriors with a small deviation from the prior are removed
![Page 29: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/29.jpg)
Naïve Approach – Shared Memory
• Does not scale– Constant contention for locks– Some features are very frequent– Synchronization issues
Training Node 1 Training Node 2
Impression A Impression B
MSNH1110.0.0.1 USA(etc) MSNH11 Canada 10.0.1.25 (etc)
ModelFile
Conflict!
UpdateUpdate
UpdateUpdate
Upda
te
Upda
te
Upda
te
Upd
ate
![Page 30: Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services.](https://reader036.fdocuments.in/reader036/viewer/2022062322/5697bf811a28abf838c8562b/html5/thumbnails/30.jpg)
Proposal: Approximate LearningTrain Node 1 Train Node 2
Impression A Impression B
MSNH1110.0.0.1 USA(etc) MSNH11 Canada 10.0.1.25 (etc)
Upda
te UpdateUpdate
Update
Update
Merge Deltas
Updat
e
Update
Update
Final Model File