20141209 meetup hassan
Online Display Advertising Optimization with H2O
Hassan Namarvar, Principal Data Scientist
SF DATA MINING MEETUP, December 9th, 2014
OUTLINE
Introducing ShareThis
Online display advertising problem
Estimation of conversion rate using H2O
Results from live campaigns
Ongoing work
Q&A
SHARING TOOLS AT SCALE
23 Billion PAGE VIEWS
120 SOCIAL CHANNELS
210 MM US USERS (1)
95% REACH*
2.4 MM SITES AND APPS
(1) comScore Media Metrix Report
* Includes PC, Tablet, and Mobile sites.
This is Missy! She is busy chatting and browsing on the web…
USER
Missy reads an article and shares it to her
Facebook page using the ShareThis widget
SOCIAL ACTIVITY
ShareThis observes the share and can then
target Missy and her friends with advertising
messages tailored to their interests
SOCIAL DATA
MAKING SOCIAL DATA ACTIONABLE
CATEGORY TARGETING: TECHNOLOGY
TVS: 1.1 MM
AUDIO: 800K
SMARTPHONES: 13.7 MM
TABLETS: 5.3 MM
PCs: 6 MM
GAMING: 7 MM
CAMERAS: 1.3 MM
28.6 MM USERS
35 MM+ SOCIAL ACTIONS
1.2 MM SOCIAL ACTIONS/DAY
[Figure: INTEREST over TIME for an example user, “DAN” (MALE 25-45, TECH ENTHUSIAST, HHI $75K+). The curve runs from TRIGGER through EXCITEMENT to PEAK READINESS FOR ENGAGEMENT and then FADING INTEREST, with the STANDARD TARGETING THRESHOLD marked.]
ShareThis ONLY targets users within 24 hours to ensure ads reach them at the most relevant moment
SHARETHIS MESSAGING TRIGGER
REAL TIME MESSAGING REACHES USERS DURING PEAK INTEREST
ONLINE DISPLAY ADVERTISING
Advertisers’ goal is to target the most receptive online audience in the right context and at the right time, in order to influence users to engage with the ad.
[Diagram: Publisher Web Page, Ad, Ad Exchange, Real Time Bidding (RTB) System, Model Pipeline (Production), Models, ShareThis Data, Campaign Data, Meta Data]
ONLINE DISPLAY ADVERTISING
Campaign Performance
Advertisers seek the optimal price to bid for each ad call.
Cost per Click (CPC) Model
Cost per Action (CPA) Model
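The pricing formulas for these two models are not preserved in this transcript. One common way to express them, given here as an assumption rather than as the formula used in the talk, is expected-value bidding: the bid per thousand impressions is the advertiser's target cost per event times the predicted probability of that event. The function and parameter names below are hypothetical.

```python
def bid_cpm_for_cpc(target_cpc: float, ctr: float) -> float:
    """Bid per 1000 impressions under a CPC goal (assumed expected-value rule).

    target_cpc: what the advertiser is willing to pay per click.
    ctr: predicted probability that an impression leads to a click.
    """
    return target_cpc * ctr * 1000.0


def bid_cpm_for_cpa(target_cpa: float, cvr: float) -> float:
    """Bid per 1000 impressions under a CPA goal (assumed expected-value rule).

    target_cpa: what the advertiser is willing to pay per conversion.
    cvr: predicted probability that an impression leads to a conversion.
    """
    return target_cpa * cvr * 1000.0


# Hypothetical numbers, for illustration only.
cpc_bid = bid_cpm_for_cpc(target_cpc=0.50, ctr=0.002)
cpa_bid = bid_cpm_for_cpa(target_cpa=20.0, cvr=0.0005)
```

Under this rule, a $20 CPA goal and a predicted CVR of 0.05% correspond to a bid of about $10 per thousand impressions, which is why the accuracy of the CVR estimate drives the bid directly.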
MODELING CONVERSION RATE (CVR)
CTR and CVR are directly related to the user interacting with the ad in a given context.
Challenge
They are fundamentally difficult to directly model and predict.
CVR is even harder to model than CTR, since conversions are very rare events.
View-through conversions have longer delays in the logging system.
PROBLEM SETUP
Let us define Users, Publishers, Ads, Devices, and Locations as:
Goal: Find the optimal ad such that the probability of conversion is the highest.
PROBLEM SETUP
At the single-user level, the problem is binary: conversion or no conversion.
A conversion event is a random binary event.
Transactional (low-level) data features are poorly correlated with a user’s direct response to a display ad.
DATA HIERARCHIES
Advertisers (levels A0, A1, A2)
  A0 Root
    Advertiser 1: Campaign 1, Campaign 2
    Advertiser 2: Campaign 3, Campaign K
Locations (levels L0, L1, L2)
  L0 Root
    Location 1: Zipcode 1, Zipcode 2
    Location 2: Zipcode 3, Zipcode N
Users (levels U0, U1, U2)
  U0 Root
    UserClust 1: UserGroup 1, UserGroup 2
    UserClust 2: UserGroup 3, UserGroup I
Publishers (levels P0, P1, P2)
  P0 Root
    PubType 1: Publisher 1, Publisher 2
    PubType 2: Publisher 3, Publisher J
HIGH LEVEL MODELING
Compute conversions for similar users, contexts, ads, …
Maximum Likelihood Estimate (MLE):
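The MLE formula itself is not preserved in this transcript. For a binary conversion event, the maximum likelihood estimate of the conversion rate in a group is the number of conversions divided by the number of trials in that group. A minimal sketch, assuming each logged event is a dictionary with hypothetical field names, computes this estimate at two levels of the advertiser hierarchy:

```python
from collections import defaultdict


def mle_cvr(events, key_fn):
    """Return {group key: conversions / trials}, the binomial MLE per group.

    events: iterable of dicts with a boolean 'converted' field (assumed layout).
    key_fn: maps an event to the group (hierarchy node) it belongs to.
    """
    trials = defaultdict(int)
    conversions = defaultdict(int)
    for event in events:
        key = key_fn(event)
        trials[key] += 1
        conversions[key] += int(event["converted"])
    return {key: conversions[key] / trials[key] for key in trials}


# Toy data, for illustration only.
events = [
    {"advertiser": "adv1", "campaign": "c1", "converted": True},
    {"advertiser": "adv1", "campaign": "c1", "converted": False},
    {"advertiser": "adv1", "campaign": "c2", "converted": False},
    {"advertiser": "adv2", "campaign": "c3", "converted": False},
]

# Same events, aggregated at the campaign level and at the advertiser level.
cvr_by_campaign = mle_cvr(events, lambda e: (e["advertiser"], e["campaign"]))
cvr_by_advertiser = mle_cvr(events, lambda e: e["advertiser"])
```

Lower levels of the hierarchy are more specific but have fewer trials, so their estimates are noisier; higher levels are more stable but less specific.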
COMBINING ESTIMATORS: LOGISTIC REGRESSION
Let [notation missing in transcript] denote the MLEs of the CVRs of events at Q different levels.
Goal: Estimate CVR using a combination of estimators:
Log-likelihood
Logistic Regression
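The slide's equations are not preserved here. The following is a minimal sketch of the general idea, assuming the standard logistic form: the level-wise estimates are used as input features, and the weights are fit by gradient ascent on the log-likelihood. The toy data, the learning rate, and all function names are assumptions for illustration; the talk used H2O for the actual model fitting.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, x):
    """Combined estimate: sigmoid(w0 + sum_q w_q * x_q)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy rows: each row holds Q = 2 level-wise estimates for one event,
# rescaled to [0, 1] for readability. Labels are 1 for conversion.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.6, 0.7],
     [0.4, 0.3], [0.3, 0.4], [0.2, 0.1], [0.1, 0.2]]
y = [1, 1, 1, 0, 1, 0, 0, 0]

w = fit_logistic(X, y)
p_high = predict(w, [0.9, 0.9])
p_low = predict(w, [0.1, 0.1])
```

The fitted weights indicate how much each hierarchy level contributes to the combined estimate, so noisy levels can be down-weighted rather than discarded.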
PRACTICAL ISSUES
Data Imbalance
  CVR is inherently very low
  Need to up-sample conversions or down-sample non-conversions
Remove Anomalies
  Retargeting visit data as proxy for conversions when conversion data is not available
  Remove outliers
Missing Features
  Sometimes features are missing or there are not enough conversions
  Impute features
Feature Selection
  Discard a feature if it is missing in more than 70% of the training examples
  Discard a feature if its variance is lower than a threshold (10e-9)
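Two of these steps can be sketched in a few lines: down-sampling non-conversions, and the two feature-selection rules with the thresholds quoted on the slide. The row layout (dictionaries with `None` for a missing value) and the function names are assumptions for illustration, not the production pipeline.

```python
import random
from statistics import pvariance


def downsample_negatives(rows, label_key, keep_rate, seed=0):
    """Keep every conversion and a random fraction of non-conversions."""
    rng = random.Random(seed)
    return [r for r in rows if r[label_key] == 1 or rng.random() < keep_rate]


def select_features(rows, feature_names, max_missing=0.70, min_variance=10e-9):
    """Drop features that are mostly missing or nearly constant.

    max_missing and min_variance use the values quoted on the slide;
    note that 10e-9 as written equals 1e-8.
    """
    kept = []
    for name in feature_names:
        present = [r[name] for r in rows if r.get(name) is not None]
        missing_fraction = 1.0 - len(present) / len(rows)
        if missing_fraction > max_missing:
            continue  # missing in more than 70% of the examples
        if len(present) < 2 or pvariance(present) < min_variance:
            continue  # too few values, or variance below the threshold
        kept.append(name)
    return kept


# Toy rows, for illustration only: 'b' is constant, 'c' is 75% missing.
rows = [
    {"a": 0.1, "b": 1.0, "c": None, "y": 0},
    {"a": 0.5, "b": 1.0, "c": None, "y": 0},
    {"a": 0.9, "b": 1.0, "c": None, "y": 1},
    {"a": 0.3, "b": 1.0, "c": 2.0, "y": 0},
]

selected = select_features(rows, ["a", "b", "c"])
balanced = downsample_negatives(rows, "y", keep_rate=0.5)
```

Down-sampling changes the base rate seen by the model, so scores trained on the sampled data generally need to be recalibrated before they are used as probabilities.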
WHY A NEW MACHINE LEARNING TOOL?
Available large-scale ML tools such as Apache Mahout, Vowpal Wabbit, Hadoop RMR, and native Spark MLlib each have their own issues.
Critical Features for a state-of-the-art ML package:
Ease of use
System reliability
In-memory (fast)
Distributed
Extensible (API/SDK)
Accurate algorithms
Visualization (data and results)
Easy to deploy to production
SCORE CALIBRATION
Calibrate Model Scores
Find best threshold from AUC
The ad server attributes a conversion to the last impression
RTB needs to deliver a certain number of impressions per day
There is a trade-off between wasting impressions and winning conversions.
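The slide does not say how the threshold is derived from the ROC analysis. A minimal sketch, assuming the threshold is chosen to maximize the true positive rate minus the false positive rate (Youden's J, an assumed criterion), looks like this. The scores and labels are toy values.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) for each distinct score used as a cut-off."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(scores, labels):
    """Cut-off maximizing TPR - FPR (an assumed selection criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] - p[2])[0]


# Toy model scores and conversion labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0]

model_auc = auc(scores, labels)
threshold = best_threshold(scores, labels)
```

Lowering the threshold lets more impressions through, which helps meet daily delivery goals but spends more impressions on users who are unlikely to convert; raising it does the opposite.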
BUILDING A CPA MODEL: RETARGETED VISITS AS A PROXY FOR CONVERSIONS
USER-CENTRIC: Focus on RT Users
OPTIMAL TIME: Deliver Ads at the optimal times
BETTER PERFORMANCE: Leverage optimization opportunities
DON’T WASTE IMPRESSIONS: Target Users Who Likely Convert
LIVE TEST ON A CAR INSURANCE CAMPAIGN
TESTED FOR TWO MONTHS AND MEASURED THE PERFORMANCE BY DFA.
The CPA test for a car insurance campaign showed a 58% improvement in eCPA and a 57% improvement in conversion rate (CVR).
ONGOING WORK
Tests are expensive and time consuming
We need to evaluate models before deploying to production
Build many models and evaluate them offline
Different datasets
Different features
Different algorithms
COMBINING ESTIMATORS: GRADIENT BOOSTING MACHINE
Let [notation missing in transcript] denote categorical features.
Goal: Estimate CVR using an ensemble of weak prediction models (decision trees):
Gradient boosting combines weak learners into a single strong learner, in an iterative fashion.
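The iterative scheme can be illustrated with a deliberately small sketch: each round fits a one-split regression tree (a stump) to the residuals of the current model under logistic loss and adds it with a learning rate. This is a simplified stand-in, not H2O's GBM; it assumes categorical features have already been one-hot encoded as 0/1 columns, and all names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares regression stump: the best single split over all features."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def fit_gbm(X, y, n_trees=50, learning_rate=0.3):
    """Boost stumps on the negative gradient of the logistic loss."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # initial score: log-odds of the base rate
    scores = [f0] * len(X)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, scores)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        scores = [fi + learning_rate * tree(xi) for fi, xi in zip(scores, X)]
    return f0, trees, learning_rate


def predict_cvr(model, x):
    f0, trees, learning_rate = model
    return sigmoid(f0 + learning_rate * sum(tree(x) for tree in trees))


# Toy one-hot rows and conversion labels, for illustration only.
X = [[1, 0], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

model = fit_gbm(X, y)
p_a = predict_cvr(model, [1, 0])
p_b = predict_cvr(model, [0, 0])
```

Each stump on its own is a weak predictor; the ensemble becomes useful because every round corrects the errors left by the previous ones.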
OFFLINE SIMULATIONS
Selecting models in practice
Accuracy of prediction on unseen data
Scoring time at production
Remove anomalies using Deep Learning
Correlations with other campaign KPIs (CTR, Brand lift, Viewability, Winning Price, …)
Performance Stability
CONCLUSION
How did H2O help us?
Maximized ROI by optimizing campaign performance and budget allocation.
Enabled advanced ML algorithms in the Hadoop cluster
Used all data and built models much faster
Reduced R&D time significantly
Built a smooth model-building pipeline (R and Spark API)
ACKNOWLEDGEMENT
THE TEAM: Prasanta Behera, Xibin Chen, Wahid Chrabakh, Jinghao Miao, Hassan Namarvar, Yan Qu
THANK YOU!