Claudia Perlich Chief Scientist, Dstillery
Adjunct Professor, Stern (NYU)
@claudia_perlich
Tales from the data trenches of display advertising
Ad Exchange
• 200 Million browsers (identified via cookies)
• 10 Million URLs
• 10 Billion auctions per day
• Conversion = shopping at one of our campaign sites; base rates of 0.0001% to 1%
• Where should we advertise, and at what price?
• Does the ad have a causal effect?
• What data should we pay for?
• Attribution?
• Who should we target for a marketer?
• What requests are fraudulent?
The Non-Branded Web vs. The Branded Web
A consumer’s online/mobile activity gets recorded like this:
Our Browser Data: Agnostic
I do not want to ‘understand’ who you are …
Browsing History – Hashed URLs:
date1 abkcc
date2 kkllo
date3 88iok
date4 7uiol
…
Brand Events Encoded:
date1 3012L20
date2 4199L30
…
date n 3075L50
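To make the encoding above concrete, here is a minimal sketch (not Dstillery's actual pipeline) of hashing a browsing history into anonymous URL tokens and a sparse binary indicator row in ~10 million dimensions; the hash choice, token length, and helper names are illustrative assumptions.

```python
# Sketch: anonymize URLs as short hash tokens and build a sparse indicator vector.
import hashlib
from scipy.sparse import csr_matrix

N_DIMS = 10_000_000  # ~10 million URL indicator dimensions (as in the slides)

def hash_url(url: str) -> str:
    """Replace the raw URL with an anonymous hash token (cf. 'abkcc' above)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()[:5]

def history_to_features(visited_urls) -> csr_matrix:
    """Map a browsing history to a 1 x N_DIMS sparse binary indicator row."""
    cols = sorted({int(hashlib.md5(u.encode("utf-8")).hexdigest(), 16) % N_DIMS
                   for u in visited_urls})
    return csr_matrix(([1] * len(cols), ([0] * len(cols), cols)), shape=(1, N_DIMS))

history = ["shop.example.com/shoes", "news.example.org/article"]  # hypothetical URLs
print([hash_url(u) for u in history])
x = history_to_features(history)
print(x.nnz, "non-zero URL indicators out of", N_DIMS)
```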
Analytical Decomposition:
• Targeting Model
• Bidding Model
• Fraud
• Causal Analysis
The Heart and Soul
Predictive modeling on hashed browsing history:
• 10 Million dimensions for URLs (binary indicators)
• extremely sparse data
• positives are extremely rare
Targeting Model: P(Buy | URLs, inventory, ad)
How can we learn from 10M features with no/few positives?
We cheat.
In ML, cheating is called “Transfer Learning”
The heart and soul
Has to deal with the 10 Million URLs
Need to find more positives!
Targeting Model: P(Buy | URLs, inventory, ad)
Experiment
Randomized targeting across 58 different large display ad campaigns.
• Served ads to users with active, stable cookies
• Targeted ~5,000 random users per day for each marketer; campaigns ran for 1 to 5 months, with between 100K and 4MM impressions per campaign
• Observed outcomes: clicks on ads, post-impression (PI) purchases (conversions)
Data
Targeting
• Optimize targeting using Click and PI Purchase
• Technographic info and web history as input variables
• Evaluate each separately trained model on its ability to rank-order users for PI Purchase, using AUC (Mann-Whitney-Wilcoxon statistic)
• Each model is trained/evaluated using logistic regression
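A hedged sketch of this evaluation setup on synthetic data: one logistic regression trained on click labels and one on purchase labels, both scored by AUC for ranking held-out users by purchase. The data, features, and constants below are made up; only the train-on-proxy / evaluate-on-purchase structure follows the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 50))                                    # stand-in user features
purchase = (X[:, 0] + 0.1 * rng.standard_normal(5000) > 0.9).astype(int)
click = (X[:, 1] + 0.3 * rng.standard_normal(5000) > 0.9).astype(int)

train, test = train_test_split(np.arange(5000), test_size=0.5, random_state=0)
for name, y in [("click", click), ("purchase", purchase)]:
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    scores = model.predict_proba(X[test])[:, 1]
    auc = roc_auc_score(purchase[test], scores)               # always judged on purchase
    print(f"trained on {name}: purchase AUC = {auc:.3f}")
```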
[Figure: Predictive performance* (AUC) for purchase learning – train on click vs. train on purchase; Dalessandro et al. 2012]
*Restricted feature set used for these modeling results; qualitative conclusions generalize
[Figure: Predictive performance* (AUC) for click learning – train on click vs. train on purchase; Dalessandro et al. 2012]
*Restricted feature set used for these modeling results; qualitative conclusions generalize
Evaluated on predicting purchases (AUC in the target domain)
[Figure: Predictive performance* (AUC) for site-visit learning – AUC distribution for train on clicks vs. train on site visits vs. train on purchase; Dalessandro et al. 2012]
*Restricted feature set used for these modeling results; qualitative conclusions generalize
Significantly better targeting when training on the source task
Evaluated on predicting purchases (AUC in the target domain)
Why is learning the wrong thing better???
Transfer: Navigating Bias-Variance
[Figure: Predictive performance* (AUC) across 58 different display ad campaigns – AUC distribution for train on clicks vs. train on site visits vs. train on purchase; Dalessandro et al. 2012]
*Restricted feature set used for these modeling results; qualitative conclusions generalize
Significantly better targeting when training on the source task
Purchases: high cost, high correlation → high variance
Clicks: low cost, low correlation → high bias
Site visits: low cost, high correlation → low bias & variance
The heart and soul
Has to deal with the 10 Million URLs
Transfer learning:
• Use all kinds of site visits instead of new purchases – a sample biased in every possible way, to reduce variance
• Negatives are ‘everything else’: pre-campaign, without an impression
• Stacking for transfer learning (sketch below)
Targeting Model
Organic: P(SiteVisit | URLs)
P(Buy | URLs, inventory, ad)
MLJ 2014
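A minimal sketch of the stacking idea, assuming synthetic data and a scaled-down feature space: stage 1 fits P(SiteVisit | URLs) on the plentiful source task, and stage 2 fits P(Buy | stage-1 score, inventory, ad context) on the scarce target task. The MLJ 2014 system differs in scale and detail.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
X_urls = rng.integers(0, 2, size=(n, 200))          # scaled-down URL indicators
site_visit = (X_urls[:, :5].sum(axis=1) + rng.random(n) > 3).astype(int)
purchase = ((site_visit == 1) & (rng.random(n) < 0.05)).astype(int)   # rare positives
inventory = rng.integers(0, 2, size=(n, 10))         # stand-in inventory/ad features

# Stage 1: high-dimensional model on the abundant source task (site visits).
stage1 = LogisticRegression(max_iter=200).fit(X_urls, site_visit)
organic_score = stage1.predict_proba(X_urls)[:, 1].reshape(-1, 1)

# Stage 2: low-dimensional model on the target task (purchases), using the
# stage-1 score as a single compressed "URL history" feature.
X_stage2 = np.hstack([organic_score, inventory])
stage2 = LogisticRegression(max_iter=200).fit(X_stage2, purchase)
print("P(Buy | score, inventory):", stage2.predict_proba(X_stage2[:3])[:, 1])
```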
Logistic regression in 10 Million dimensions
Stochastic Gradient Descent
L1 and L2 constraints
Automatic estimation of optimal learning rates
Bayesian empirical industry priors
Streaming updates of the models
Fully automated: ~10,000 models per week
KDD 2014
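A rough sketch of what such a model could look like with off-the-shelf tools: streaming logistic regression via SGD with elastic-net (L1 + L2) regularization over hashed URL features, updated in mini-batches as new data arrives. The library, dimensionality, and hyperparameters are assumptions, not the production system; the Bayesian industry priors and automatic learning-rate estimation are out of scope here.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

hasher = FeatureHasher(n_features=2**20, input_type="string")   # scaled down from 10M
model = SGDClassifier(loss="log_loss", penalty="elasticnet",
                      alpha=1e-6, l1_ratio=0.15, learning_rate="optimal")

def stream_update(batch_of_histories, batch_of_labels):
    """One streaming update: hash each browser's URL tokens, take an SGD step."""
    X = hasher.transform(batch_of_histories)
    model.partial_fit(X, batch_of_labels, classes=[0, 1])

# Hypothetical mini-batch of hashed-URL histories with site-visit labels.
stream_update([["abkcc", "kkllo"], ["88iok"], ["7uiol", "abkcc"]], [1, 0, 0])
print(model.predict_proba(hasher.transform([["abkcc"]]))[:, 1])
```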
Targeting Model: p(sv|urls) =
Real-time Scoring of a User
[Timeline diagram: observations (site visits with positive or negative correlation) and ads served over time; a purchase; ProspectRank compared against a threshold]
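A toy sketch of real-time scoring, assuming a simple additive model: each hashed URL carries a learned weight (positive or negative), their sum is the ProspectRank, and it is compared to a targeting threshold. Weights, names, and the threshold below are hypothetical.

```python
URL_WEIGHTS = {"abkcc": 0.8, "kkllo": 0.3, "88iok": -0.5}   # illustrative learned weights
THRESHOLD = 1.0                                              # hypothetical targeting cutoff

def prospect_rank(recent_hashed_urls):
    """Score a browser from its recent hashed URL events."""
    return sum(URL_WEIGHTS.get(u, 0.0) for u in recent_hashed_urls)

def should_target(recent_hashed_urls):
    return prospect_rank(recent_hashed_urls) >= THRESHOLD

print(should_target(["abkcc", "kkllo"]))            # True: in-market signals present
print(should_target(["abkcc", "kkllo", "88iok"]))   # False: in-market indicators declined
```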
ENGAGEMENT
Some prospects fall out of favor once their in-market indicators decline.
[Figure: Lift over RON (run of network) vs. total impressions – lift over random for 66 campaigns for online display ad prospecting; median lift = 5x]
Note: the top prospects are consistently rated as excellent compared to alternatives by advertising clients’ internal measures, and when measured by their analysis partners (e.g., Nielsen): high ROI, low cost-per-acquisition, etc.
<snip>
The Pokerface Bidding Model P(SiteVisit|Prospect Rank, Inventory, ad)
KDD 2012 Best Paper
Marginal Inventory Score:
Convert into bid price:
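A hedged illustration only, since the actual KDD 2012 formulation is more involved: treat the marginal inventory score as a ratio of site-visit rates and turn it into a capped CPM bid with a simple multiplicative rule. All function names and constants are hypothetical.

```python
def marginal_inventory_score(p_sv_given_inventory: float, p_sv_baseline: float) -> float:
    """How much better this inventory is than the prospect's baseline site-visit rate."""
    return p_sv_given_inventory / max(p_sv_baseline, 1e-9)

def bid_price(prospect_rank: float, inventory_score: float,
              base_cpm: float = 1.0, max_cpm: float = 10.0) -> float:
    """Convert prospect quality and inventory quality into a capped CPM bid."""
    return min(base_cpm * prospect_rank * inventory_score, max_cpm)

print(bid_price(prospect_rank=2.0, inventory_score=1.5))   # 3.0 (CPM)
```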
[Figure: lift by inventory for a hotel campaign]
Measuring causal effect?
A/B Testing
Practical concerns
Estimate causal effects from observational data
Using targeted maximum likelihood estimation (TMLE) to estimate causal impact: E[Y_{A=ad}] – E[Y_{A=no ad}] (ADKDD 2011)
Can be done ex post for different questions
Need to control for confounding
Data has to be ‘rich’ and cover all combinations of confounding and treatment
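An illustrative stand-in on synthetic data: a plain inverse-propensity-weighted (IPW) estimate of E[Y_{A=ad}] – E[Y_{A=no ad}] that adjusts for the targeting confounder. The slides use TMLE, which is doubly robust and more elaborate; this sketch only shows why the confounding adjustment matters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 50_000
in_market = rng.random(n)                                     # confounder: targeting signal
ad = (rng.random(n) < 0.2 + 0.6 * in_market).astype(int)      # targeted exposure
buy = (rng.random(n) < 0.01 + 0.05 * in_market + 0.01 * ad).astype(int)

# Propensity model P(ad | confounders), then reweight outcomes by 1/propensity.
X = in_market.reshape(-1, 1)
e = LogisticRegression().fit(X, ad).predict_proba(X)[:, 1]
ate_ipw = np.mean(ad * buy / e) - np.mean((1 - ad) * buy / (1 - e))
ate_naive = buy[ad == 1].mean() - buy[ad == 0].mean()
print(f"naive difference: {ate_naive:.4f}, IPW estimate: {ate_ipw:.4f} (true effect 0.01)")
```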
An important decision…
I think she is hot!
Hmm – so what should I write to her to get her number?
Source: OK Trends
Hardships of causality: Beauty is confounding.
“You are beautiful.”
Beauty determines both the probability of getting her number and the probability that James will say it; we need to control for actual beauty, or it can appear that paying compliments is a bad idea.
Hardships of causality: Targeting is confounding.
We only show ads to people we know are more likely to convert (ad or not).
[Figure: conversion rates – did not see ad vs. saw ad]
Need to control for confounding
Data has to be ‘rich’ and cover all combinations of confounding and treatment
Observational Causal Methods: TMLE
Negative Test: wrong ad
Positive Test: A/B comparison
Some creatives do not work …
The Police: Fraud
Tracking artificial co-visitation patterns
Blacklist inventory in the exchanges
Ignore the browser
KDD 2013
Unreasonable performance increase, spring 2012
[Figure: performance index roughly doubles (2x) within 2 weeks]
Oddly predictive websites?
36% of traffic is non-intentional (up from 6% in 2011 to 36% in 2012)
Traffic patterns are ‘non-human’: e.g., two websites sharing 50% of their visitors (website 1 ↔ website 2)
Data from bid requests in ad exchanges
Node: hostname
Edge: 50% co-visitation
WWW 2010
[Graph figures: co-visitation networks around the Boston Herald, WWW 2010 vs. WWW 2012; in the later graph, oddly connected hosts such as womenshealthbase appear]
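A small sketch of building the co-visitation network from bid-request logs: for each pair of hostnames, count the share of one host's browsers that were also seen on the other, and keep edges above the 50% threshold. Field names and data structures are illustrative, not the KDD 2013 implementation.

```python
from collections import defaultdict

def covisitation_edges(bid_requests, threshold=0.5):
    """bid_requests: iterable of (browser_id, hostname) pairs from the exchanges."""
    visitors = defaultdict(set)
    for browser, host in bid_requests:
        visitors[host].add(browser)
    edges = []
    for a, va in visitors.items():
        for b, vb in visitors.items():
            if a != b and len(va & vb) / len(va) >= threshold:
                edges.append((a, b, len(va & vb) / len(va)))
    return edges

# Hypothetical log: two sites sharing all their visitors is suspicious.
log = [(1, "siteA.com"), (1, "siteB.com"), (2, "siteA.com"), (2, "siteB.com"),
       (3, "news.example.com")]
print(covisitation_edges(log))
```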
Unreasonable performance increase, spring 2012
[Figure: performance index roughly doubles (2x) within 2 weeks]
Now it is also coming to brands
• ‘Cookie Stuffing’ increases the value of the ad for retargeting
• Messing up Web analytics …
• Messes up my models because a botnet is easier to predict than a human
Fraud pollutes my models
• Don’t show ads on those sites
• Don’t show ads to a hijacked browser
• Need to remove the visits to the fraud sites
• Need to remove the fraudulent brand visits
When we see a browser caught up in fraudulent activity, we send it to the penalty box, where we ignore all of its actions.
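A toy sketch of the penalty box, with hypothetical names and a made-up cooling-off period: once a browser is flagged for fraudulent activity, all of its subsequent events are ignored so they cannot pollute the models.

```python
import time

PENALTY_SECONDS = 7 * 24 * 3600           # illustrative cooling-off period
penalty_box = {}                           # browser_id -> release timestamp

def flag_fraudulent(browser_id, now=None):
    penalty_box[browser_id] = (now or time.time()) + PENALTY_SECONDS

def is_ignored(browser_id, now=None):
    return (now or time.time()) < penalty_box.get(browser_id, 0.0)

def usable_events(events):
    """Drop all events from browsers currently in the penalty box."""
    return [e for e in events if not is_ignored(e["browser_id"])]

flag_fraudulent("browser-42")
events = [{"browser_id": "browser-42", "url": "brand.com"},
          {"browser_id": "browser-7", "url": "news.com"}]
print(usable_events(events))               # only browser-7's event survives
```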
Using the penalty box: all back to normal
[Figure: performance index over 3 more weeks in spring 2012 – back to normal]
On a personal note
claudia.perlich@gmail.com
Some References
1. B. Dalessandro, F. Provost, R. Hook. Audience Selection for On-Line Brand Advertising: Privacy-Friendly Social Network Targeting. KDD 2009.
2. O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Estimating the Effect of Online Display Advertising on Browser Conversion. ADKDD 2011.
3. C. Perlich, O. Stitelman, B. Dalessandro, T. Raeder, F. Provost. Bid Optimizing and Inventory Scoring in Targeted Online Advertising. KDD 2012 (Best Paper Award).
4. T. Raeder, O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Design Principles of Massive, Robust Prediction Systems. KDD 2012.
5. B. Dalessandro, O. Stitelman, C. Perlich, F. Provost. Causally Motivated Attribution for Online Advertising. ADKDD 2012.
6. B. Dalessandro, R. Hook, C. Perlich, F. Provost. Transfer Learning for Display Advertising. MLJ 2014.
7. T. Raeder, C. Perlich, B. Dalessandro, O. Stitelman, F. Provost. Scalable Supervised Dimensionality Reduction Using Clustering. KDD 2013.
8. O. Stitelman, C. Perlich, B. Dalessandro, R. Hook, T. Raeder, F. Provost. Using Co-visitation Networks for Classifying Non-Intentional Traffic. KDD 2013.