20141209 meetup hassan

Online Display Advertising Optimization with H2O

Hassan Namarvar, Principal Data Scientist

SF DATA MINING MEETUP, December 9th, 2014

OUTLINE

Introducing ShareThis

Online display advertising problem

Estimation of conversion rate using H2O

Results from live campaigns

Ongoing work

Q&A

SHARING TOOLS AT SCALE

23 Billion PAGE VIEWS

120 SOCIAL CHANNELS

210 MM US USERS (1)

95% REACH*

2.4 MM SITES AND APPS

(1) comScore Media Metrix Report

* Includes PC, Tablet, and Mobile sites.

This is Missy! She is busy chatting and browsing on the web…

USER

Missy reads an article and shares it to her Facebook page using the ShareThis widget.

SOCIAL ACTIVITY

ShareThis observes the share and can then target Missy and her friends with advertising messages tailored to their interests.

SOCIAL DATA

MAKING SOCIAL DATA ACTIONABLE

CATEGORY TARGETING: TECHNOLOGY

TVS: 1.1 MM
AUDIO: 800K
SMARTPHONES: 13.7 MM
TABLETS: 5.3 MM
PCs: 6 MM
GAMING: 7 MM
CAMERAS: 1.3 MM

28.6 MM USERS

35 MM+ SOCIAL ACTIONS

1.2 MM SOCIAL ACTIONS/DAY

[Figure: interest over time for “DAN” (male, 25-45, tech enthusiast, HHI $75K+). Axes: INTEREST vs. TIME. Labels: TRIGGER, EXCITEMENT, PEAK READINESS FOR ENGAGEMENT, FADING INTEREST, STANDARD TARGETING THRESHOLD.]

ShareThis ONLY targets users within 24 hours to ensure ads reach them at the most relevant moment

SHARETHIS MESSAGING TRIGGER

REAL TIME MESSAGING REACHES USERS DURING PEAK INTEREST

ONLINE DISPLAY ADVERTISING

Advertisers aim to reach the most receptive online audience in the right context and at the right time, in order to influence users to engage with the ad.

[Diagram: Publisher Web Page, Ad, Ad Exchange, Real Time Bidding (RTB) System, Model Pipeline (Production), Models, ShareThis Data, Campaign Data, Meta Data]

ONLINE DISPLAY ADVERTISING

Campaign Performance

Advertisers seek the optimal price to bid for each ad call.

Cost per Click (CPC) Model

Cost per Action (CPA) Model
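The bid formulas for the two pricing models are not in the text of this slide. A minimal sketch of the usual expected-value reasoning, assuming the bid for one impression equals what the advertiser pays per outcome times the probability of that outcome (function and parameter names are hypothetical, not taken from the talk):

```python
def bid_for_cpc_campaign(target_cpc: float, p_click: float) -> float:
    """Expected value of one impression when the advertiser pays per click."""
    return target_cpc * p_click


def bid_for_cpa_campaign(target_cpa: float, p_click: float,
                         p_conv_given_click: float) -> float:
    """Expected value of one impression when the advertiser pays per action.

    Assumption: conversions follow clicks, so the conversion probability per
    impression is p_click * p_conv_given_click.
    """
    return target_cpa * p_click * p_conv_given_click


def to_cpm(bid_per_impression: float) -> float:
    """Exchanges usually quote prices per thousand impressions (CPM)."""
    return 1000.0 * bid_per_impression


if __name__ == "__main__":
    # Illustrative numbers only.
    print(to_cpm(bid_for_cpc_campaign(target_cpc=2.0, p_click=0.001)))
    print(to_cpm(bid_for_cpa_campaign(50.0, 0.001, 0.02)))
```

Under this view, the quality of the bid depends entirely on how well CTR and CVR are estimated, which is what the following slides address.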

MODELING CONVERSION RATE (CVR)

CTR (click-through rate) and CVR are directly related to the user interacting with the ad in a given context.

Challenge

Both are fundamentally difficult to model and predict directly.

CVR is even harder to model than CTR, since conversions are very rare events.

View-through conversions have longer delays in the logging system.

PROBLEM SETUP

Let us define Users, Publishers, Ads, Devices, and Locations as:

Goal: Find the optimal ad such that the probability of conversion is the highest.
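A possible formalization of this setup is sketched below. All symbols are assumed notation rather than the speaker's: calligraphic letters are the five sets, Y = 1 marks a conversion, and a* is the ad to serve.

```latex
\begin{align*}
\mathcal{U} &= \{u_1, \dots, u_I\}, &
\mathcal{P} &= \{p_1, \dots, p_J\}, &
\mathcal{A} &= \{a_1, \dots, a_K\}, \\
\mathcal{D} &= \{d_1, \dots, d_M\}, &
\mathcal{L} &= \{l_1, \dots, l_N\}, \\
a^{*} &= \arg\max_{a \in \mathcal{A}} \Pr(Y = 1 \mid u, p, a, d, l).
\end{align*}
```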

PROBLEM SETUP

At the single-user level, the problem is binary: conversion or no conversion.

The conversion event is a random binary event.

Transactional (low-level) data features are poorly correlated with a user’s direct response to a display ad.

DATA HIERARCHIES

A0 Root
  A1: Advertiser 1
    A2: Campaign 1, Campaign 2
  A1: Advertiser 2
    A2: Campaign 3, Campaign K

L0 Root
  L1: Location 1
    L2: Zipcode 1, Zipcode 2
  L1: Location 2
    L2: Zipcode 3, Zipcode N

U0 Root
  U1: UserClust 1
    U2: UserGroup 1, UserGroup 2
  U1: UserClust 2
    U2: UserGroup 3, UserGroup I

P0 Root
  P1: PubType 1
    P2: Publisher 1, Publisher 2
  P1: PubType 2
    P2: Publisher 3, Publisher J

HIGH LEVEL MODELING

Compute conversions for similar users, contexts, ads, …

Maximum Likelihood Estimate (MLE):
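The formula itself is missing from the transcript. For a binary conversion event, the maximum likelihood estimate of the conversion rate of a group is simply conversions divided by impressions in that group. A minimal sketch of computing it at several hierarchy levels, assuming a hypothetical event layout and a hypothetical campaign-to-advertiser lookup table:

```python
from collections import defaultdict

# Hypothetical lookup table for one hierarchy (campaign -> advertiser).
CAMPAIGN_TO_ADVERTISER = {
    "campaign_1": "advertiser_1",
    "campaign_2": "advertiser_1",
    "campaign_3": "advertiser_2",
}


def mle_cvr(events, key_fn):
    """Return {group key: conversions / impressions} for the grouping key_fn."""
    trials = defaultdict(int)
    successes = defaultdict(int)
    for event in events:
        key = key_fn(event)
        trials[key] += 1
        successes[key] += event["converted"]
    return {key: successes[key] / trials[key] for key in trials}


# Made-up impressions; "converted" is 1 for a conversion and 0 otherwise.
events = [
    {"campaign": "campaign_1", "publisher": "publisher_1", "converted": 1},
    {"campaign": "campaign_1", "publisher": "publisher_1", "converted": 0},
    {"campaign": "campaign_2", "publisher": "publisher_1", "converted": 0},
    {"campaign": "campaign_3", "publisher": "publisher_2", "converted": 0},
]

# Lowest level: (campaign, publisher) pairs.
low_level = mle_cvr(events, lambda e: (e["campaign"], e["publisher"]))
# One level up in the ad hierarchy: advertiser.
advertiser_level = mle_cvr(events, lambda e: CAMPAIGN_TO_ADVERTISER[e["campaign"]])
# Root: every impression in one group.
root_level = mle_cvr(events, lambda e: "root")
```

Higher levels pool more impressions, so their estimates are less noisy but also less specific to the ad call at hand.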

COMBINING ESTIMATORS: LOGISTIC REGRESSION

Consider the MLEs of the CVRs of events at Q different levels.

Goal: Estimate CVR using a combination of these estimators:

Log-likelihood

Logistic Regression
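One standard way to write the two items named above is sketched here. The notation is assumed, not recovered from the slide: \hat{p}_q is the MLE at level q, \beta are the weights to learn, y_i is the 0/1 outcome of event i, and \pi_i is the model's predicted conversion probability for event i.

```latex
\begin{align*}
\pi &= \Pr(Y = 1 \mid \hat{p}_1, \dots, \hat{p}_Q)
     = \sigma\Big(\beta_0 + \sum_{q=1}^{Q} \beta_q \, \hat{p}_q\Big),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \\
\ell(\beta) &= \sum_{i=1}^{n} \Big[\, y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Big].
\end{align*}
```

Fitting the model means choosing \beta to maximize \ell(\beta), usually with a regularization penalty.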

PRACTICAL ISSUES

Data Imbalance
  CVR is inherently very low
  Need to up-sample conversions or down-sample non-conversions

Remove Anomalies
  Use retargeting visit data as a proxy for conversions when conversion data is not available
  Remove outliers

Missing Features
  Sometimes features are missing or there are not enough conversions
  Impute features

Feature Selection
  Discard a feature if it is missing in more than 70% of the training examples
  Discard a feature if its variance is lower than a threshold (10e-9)
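Two of the steps above can be sketched in a few lines. This is an illustration under assumptions, not the pipeline used in the talk: rows are dictionaries, missing values are None, features are numeric, and all function names are hypothetical.

```python
import random
from statistics import pvariance


def downsample_negatives(rows, label_key, keep_rate, seed=0):
    """Keep every conversion and a random fraction keep_rate of non-conversions."""
    rng = random.Random(seed)
    return [r for r in rows if r[label_key] == 1 or rng.random() < keep_rate]


def select_features(rows, feature_names, max_missing=0.70, min_variance=10e-9):
    """Apply the two filters from the slide (10e-9 is written as on the slide, i.e. 1e-8)."""
    kept = []
    for name in feature_names:
        present = [r[name] for r in rows if r.get(name) is not None]
        missing_fraction = 1.0 - len(present) / len(rows)
        if not present or missing_fraction > max_missing:
            continue  # too many training examples lack this feature
        if pvariance(present) < min_variance:
            continue  # (near-)constant feature carries no signal
        kept.append(name)
    return kept


# Made-up rows: "a" is informative, "b" is mostly missing, "c" is constant.
rows = [
    {"a": 1.0, "b": None, "c": 5.0, "y": 0},
    {"a": 2.0, "b": None, "c": 5.0, "y": 0},
    {"a": 3.0, "b": None, "c": 5.0, "y": 1},
    {"a": 4.0, "b": 1.0, "c": 5.0, "y": 0},
]
print(select_features(rows, ["a", "b", "c"]))
```

Note that down-sampling non-conversions inflates the predicted rates, so scores from a model trained this way need to be calibrated before they are used as probabilities.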

WHY A NEW MACHINE LEARNING TOOL?

Available large-scale ML tools, such as Apache Mahout, Vowpal Wabbit, Hadoop RMR, and native Spark MLlib, each have their own issues.

Critical Features for a state-of-the-art ML package:

Ease of use

System reliability

In-memory (fast)

Distributed

Extensible (API/SDK)

Accurate algorithms

Visualization (data and results)

Easy to deploy to production

H2O PLATFORM

[Screenshot: H2O platform web API]

H2O PLATFORM: GLM MODEL

[Screenshot: the CPA model built with the GLM algorithm]

SCORE CALIBRATION

Calibrate Model Scores

Find best threshold from AUC

Ad server attributes a conversion to the last impression

RTB needs to deliver a certain number of impressions per day

There is a trade-off between wasting impressions and winning conversions.
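The trade-off can be made concrete with a small sketch: given a daily impression goal, work out what share of ad calls must be bid on, then read the score threshold off a sample of historical scores. This is one simple heuristic, not the calibration method from the talk, and every name and number below is hypothetical.

```python
import math


def bid_fraction_needed(daily_requests, daily_impression_goal, win_rate):
    """Share of ad calls to bid on so that expected wins meet the daily goal."""
    return min(1.0, daily_impression_goal / (daily_requests * win_rate))


def score_threshold(sample_scores, fraction):
    """Lowest score among the top `fraction` of a historical score sample."""
    ranked = sorted(sample_scores, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[k - 1]


# Illustrative numbers: 1M ad calls per day, 50K impressions needed, 25% win rate.
fraction = bid_fraction_needed(1_000_000, 50_000, 0.25)
sample = [i / 10 for i in range(10)]  # stand-in for yesterday's model scores
threshold = score_threshold(sample, fraction)
```

A higher threshold wastes fewer impressions on unlikely converters but risks under-delivering; a lower one does the opposite.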

BUILDING A CPA MODEL: RETARGETED VISITS AS A PROXY FOR CONVERSIONS

USER-CENTRIC: Focus on RT Users

OPTIMAL TIME: Deliver Ads at the optimal times

BETTER PERFORMANCE: Leverage optimization opportunities

DON’T WASTE IMP.: Target Users Who Are Likely to Convert

LIVE TEST ON A CAR INSURANCE CAMPAIGN

TESTED FOR TWO MONTHS; PERFORMANCE MEASURED BY DFA.

The CPA test for a car insurance campaign showed a 58% improvement in eCPA and a 57% improvement in conversion rate (CVR).
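For readers unfamiliar with the metric, eCPA (effective cost per action) is spend divided by conversions, and an improvement is a relative reduction against the control. A small sketch with made-up numbers that are unrelated to the campaign above (function names are hypothetical):

```python
def ecpa(spend: float, conversions: int) -> float:
    """Effective cost per action: total spend divided by conversions."""
    if conversions <= 0:
        raise ValueError("eCPA is undefined without conversions")
    return spend / conversions


def relative_improvement(control_ecpa: float, test_ecpa: float) -> float:
    """Fraction by which the test group's eCPA is lower than the control's."""
    return (control_ecpa - test_ecpa) / control_ecpa


# Made-up figures for illustration only.
control = ecpa(spend=1000.0, conversions=25)  # 40.0 per conversion
test = ecpa(spend=900.0, conversions=30)      # 30.0 per conversion
improvement = relative_improvement(control, test)  # 0.25, i.e. 25% cheaper
```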

LIVE TESTS ON DIFFERENT CAMPAIGNS: OBSERVED CPA LIFT

ONGOING WORK

Live tests are expensive and time-consuming

We need to evaluate models before deploying to production

Build many models and evaluate them offline

Different datasets

Different features

Different algorithms
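Offline evaluation of this kind usually comes down to scoring a held-out dataset with each candidate model and comparing a ranking metric such as AUC. A dependency-free sketch, assuming 0/1 labels and one score list per model (the model names and numbers are placeholders):

```python
def auc(labels, scores):
    """Probability that a random conversion outscores a random non-conversion.

    Pairwise (Mann-Whitney) form; O(n^2), fine for small samples only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_models(labels, scores_by_model):
    """Return (auc, model name) pairs, best model first."""
    return sorted(((auc(labels, s), name) for name, s in scores_by_model.items()),
                  reverse=True)


# Made-up held-out labels and scores from two hypothetical models.
labels = [0, 0, 1, 1]
candidates = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.1, 0.2, 0.7, 0.9],
}
ranking = rank_models(labels, candidates)
```

The same loop extends naturally to different datasets and feature sets by changing what produces each score list.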

COMBINING ESTIMATORS: GRADIENT BOOSTING MACHINE

Consider a set of categorical features.

Goal: Estimate CVR using an ensemble of weak prediction models (decision trees):

Gradient boosting combines weak learners into a single strong learner, in an iterative fashion.
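The iterative idea can be shown with a toy implementation. This is a teaching sketch, not H2O's GBM: it assumes features are already encoded as numbers, uses one-split trees (stumps) as the weak learners, and fits each stump by least squares to the gradient of the log-loss. All names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares one-split tree: (feature, threshold, left value, right value)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left_mean, right_mean)
    return None if best is None else best[1:]


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Start from the log-odds of the base rate, then add stumps one at a time."""
    base_rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(base_rate / (1 - base_rate))
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the current score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split left
            break
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_cvr(model, row):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_predict(s, row) for s in stumps))


# Tiny made-up dataset: one numeric feature, conversions at larger values.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbm(X, y)
```

Each round corrects the errors left by the previous rounds, which is how many weak learners add up to one strong learner.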

MODEL COMPARISON

Comparing AUC plots of GBM and random forest (RF) models on test data:

OFFLINE SIMULATIONS

Comparing AUC plots of GBM and RF models on test data:

OFFLINE SIMULATIONS

Selecting models in practice

Accuracy of prediction on unseen data

Scoring time in production

Remove anomalies using Deep Learning

Correlations with other campaign KPIs (CTR, Brand lift, Viewability, Winning Price, …)

Performance Stability

EVALUATION ON IMPRESSION DATA

Correlation of GBM model scores with CTR

EVALUATION ON IMPRESSION DATA

Correlation of GBM model scores with average winning bid price
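The plots for these two evaluation slides are not in the transcript. One plausible way to produce such a comparison is to sort impressions by model score, split them into equal-sized buckets, average the KPI (clicks or winning price) per bucket, and compute the Pearson correlation between bucket score and bucket KPI. The sketch below assumes that approach; the names and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def bucket_means(scores, values, n_buckets):
    """Mean score and mean KPI value per equal-sized bucket of sorted scores."""
    ranked = sorted(zip(scores, values))
    n = len(ranked)
    mean_scores, mean_values = [], []
    for b in range(n_buckets):
        chunk = ranked[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not chunk:
            continue
        mean_scores.append(sum(s for s, _ in chunk) / len(chunk))
        mean_values.append(sum(v for _, v in chunk) / len(chunk))
    return mean_scores, mean_values


# Made-up impressions: model score and whether the impression was clicked.
scores = [0.1, 0.2, 0.3, 0.4]
clicks = [0, 0, 1, 1]
bucket_scores, bucket_ctr = bucket_means(scores, clicks, n_buckets=2)
correlation = pearson(bucket_scores, bucket_ctr)
```

Passing winning bid prices instead of clicks as `values` gives the second comparison.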

GBM MODEL (TEST) vs GLM MODEL (CONTROL)

A/B TEST: OBSERVED CPA LIFT

CONCLUSION

How did H2O help us?

Maximized ROI by optimizing campaign performance and budget allocation.

Enabled advanced ML algorithms on our Hadoop cluster

Used all the data and built models much faster

Reduced R&D time significantly

Building a smooth model-building pipeline (R and Spark API)

ACKNOWLEDGEMENT

THE TEAM: Prasanta Behera, Xibin Chen, Wahid Chrabakh, Jinghao Miao, Hassan Namarvar, Yan Qu

THANK YOU!

SHARETHIS IS HIRING!

Please check out: www.sharethis.com/about/careers

Q&A