Page 1: Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning team, Criteo at MLconf NYC - 4/15/16

Copyright © 2016 Criteo

ML for Display Advertising @ Scale

Damien Lefortier

MLconf NYC

2016-04-15

Page 2:

Outline

• Introduction to the AdTech / Criteo

• Deep dive into our ML algorithms

• Offline and online evaluation

• Future areas of research

Page 4:

AdTech / Criteo

[Figure: Advertiser and Publisher]

Page 5:

Our engine is trying to answer 3 questions

Common objective: maximize the client’s value

1. How much should we bid for a given ad space?

2. What products should we recommend / show?

3. What is the best look & feel of the banner?

Page 6:

Physical infrastructure

7 in-house data centers on 3 continents

~ 15000 servers; largest Hadoop cluster in Europe

More than 35 PB of data storage

Traffic

800k HTTP requests / sec (peak activity)

29000 impressions / sec (peak activity)

< 10 ms to process a bidding request

< 100 ms to render the ad (if we win)

Page 8:

Bidding

• Should we bid?

• At which price?

Recommendation

• Which products should we display?

Look & Feel

• Big image vs small image

• Background color, ...

Prediction

• Generic prediction engine

• Specific models trained on TBs of data

Page 10:

Bidding strategy (1)

• Because we sell performance, Criteo’s interests and our clients’ are aligned.

• In a 2nd-price auction, the cost of a display (the 2nd price or the floor) is no higher than our bid and independent of it, so we should bid the maximum value the client is willing to pay.

• We use adjustments for 1st price auctions.
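The second-price argument above can be sketched in a few lines. This is a minimal illustration, not Criteo's bidder: the function name `second_price_outcome`, the floor value, and the competing bids are all assumptions made up for the example.

```python
def second_price_outcome(our_bid, competing_bids, floor=0.0):
    """Return (won, cost) for a sealed-bid second-price auction with a floor.

    The winner pays the larger of the best competing bid and the floor,
    never its own bid.
    """
    price_to_beat = max(max(competing_bids, default=0.0), floor)
    if our_bid > price_to_beat:
        return True, price_to_beat
    return False, 0.0


# Hypothetical competing bids for one ad space.
competing = [0.40, 0.75]

# Two different winning bids pay the same cost: the cost does not depend on our bid.
outcome_low = second_price_outcome(0.80, competing, floor=0.10)
outcome_high = second_price_outcome(2.00, competing, floor=0.10)
```

Since the cost is the same for every winning bid, bidding the full value only changes whether we win, which is the reason given on the slide for bidding the maximum the client is willing to pay.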

Page 11:

Bidding strategy (2)

• This value depends on the predicted performance and the client’s objective.

• Some examples:

• Click optimized campaign: bid = maxCPC × pClick

• CR optimized campaign: bid = maxCPO × pCR

• …
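As a worked example of the two formulas above, the sketch below multiplies the client's maximum value by the predicted probability. The function name `compute_bid` and all numbers are illustrative assumptions; pCR is taken here to be the predicted probability that a display leads to an order.

```python
def compute_bid(max_value, predicted_rate):
    """Expected value of one display: value of the event times its probability."""
    if not 0.0 <= predicted_rate <= 1.0:
        raise ValueError("predicted_rate must be a probability")
    return max_value * predicted_rate


# Click optimized campaign: bid = maxCPC * pClick (made-up values).
click_bid = compute_bid(max_value=0.50, predicted_rate=0.005)

# CR optimized campaign: bid = maxCPO * pCR (made-up values).
order_bid = compute_bid(max_value=20.0, predicted_rate=0.0001)
```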

Page 12:

We train our prediction models on our historical displays

Historical displays (legend: clicked displays; converted displays, size = order value)

Variables

• Level of engagement of the user

• Quality of inventory

• User fatigue

• For travel: time to check-in and number of nights

Machine Learning Algorithms

Our ability to predict relies greatly on the relevance of the variables we consider

Page 14:

Recommend products for a user

• What we want: reco(user) = products

• 1B users x 3B products!

• But we need to scale and keep it fresh

• What we can do:

Pre-select products offline

Refine scoring online to get final candidates
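The two-stage approach above (pre-select offline, refine online) might look like the following sketch. The candidate table, product ids, and fixed scores are hypothetical stand-ins; in practice the table would come from an offline job and the scores from a learned model.

```python
# Offline stage: a precomputed table mapping a viewed product to a short
# candidate list (hypothetical product ids).
CANDIDATES_BY_PRODUCT = {
    "orange_shoes": ["red_shoes", "orange_sandals", "shoe_polish"],
    "blue_shirt": ["white_shirt", "blue_tie"],
}


def recommend(viewed_products, score, top_k=2):
    """Online stage: score only the pre-selected candidates and keep the best."""
    candidates = []
    for product in viewed_products:
        for candidate in CANDIDATES_BY_PRODUCT.get(product, []):
            if candidate not in candidates and candidate not in viewed_products:
                candidates.append(candidate)
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:top_k]


# Stand-in for a learned model: fixed scores per product.
toy_scores = {
    "red_shoes": 0.02,
    "orange_sandals": 0.05,
    "shoe_polish": 0.01,
    "white_shirt": 0.03,
    "blue_tie": 0.04,
}
result = recommend(["orange_shoes"], score=toy_scores.get, top_k=2)
```

The point of the split is that the online step only scores a handful of candidates per user instead of the full catalog.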

Page 15:

Bob saw orange shoes

Some candidate products

Historical

Similar

Complementary

Most viewed

Page 16:

Products delivering the best performance are displayed

Historical displays (legend: clicked products; converted products, size = order value)

Variables

• Products seen by the user

• Time since product event

• Level of similarity

• Product features

Machine Learning Algorithms

Products are selected based on their CTR, CR or OV

Page 18:

We train our prediction models on our historical displays

Historical displays (color = look & feel; legend: clicked displays; converted displays, size = order value)

Variables

Some of which we control:

• How the user interacts with the banner

• Organization of information

• Color set

Some of which we don’t:

• Zone format

• Publisher

Machine Learning Algorithms

Look and feel will be selected based on its CTR, CR or OV

Page 20:

Many models to learn

• We have different ML models for bidding, recommendation, …, and for each campaign objective. We use logistic regression in many places.

• Each model is trained independently & refreshed as often as possible.

• Three main sources of features: user, ad, page (mostly categorical).

Page 23:

Learn on huge volumes of data

10 000 displays

leads to

50 clicks

leads to

1 sale
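The funnel above translates into rates as follows. This is only arithmetic on the slide's numbers; the helper `displays_needed` and the target of 1,000 sales are illustrative assumptions.

```python
displays, clicks, sales = 10_000, 50, 1

ctr = clicks / displays               # click-through rate per display
cr_per_click = sales / clicks         # conversion rate per click
sales_per_display = sales / displays  # conversion rate per display


def displays_needed(positive_examples, rate_per_display):
    """Displays required, on average, to observe that many positive examples."""
    return positive_examples / rate_per_display


# Displays needed to collect about 1,000 sales at this rate (hypothetical target).
needed_for_1000_sales = displays_needed(1_000, sales_per_display)
```

At these rates, positive examples are rare, which is why a sales model needs far more displays than a click model to see the same number of positives.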

Page 24:

Quadratic features

• Outer product between 2 features (similar to a polynomial kernel of degree 2).

• Example between site and advertiser:

[Figure: two feature hierarchies. Publisher side: Publisher network, Publisher, Site, Url. Advertiser side: Advertiser network, Ad, Campaign, Advertiser.]
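A quadratic (cross) feature between site and advertiser can be sketched as below. The helper name `cross_features`, the string format of the combined value, and the example values are assumptions for illustration only.

```python
from itertools import product


def cross_features(left, right):
    """Return one combined categorical value per (left, right) feature pair."""
    return [
        f"{l_name}={l_val}&{r_name}={r_val}"
        for (l_name, l_val), (r_name, r_val) in product(left.items(), right.items())
    ]


# Hypothetical display with one publisher-side and one advertiser-side feature.
display = {"site": "news.example", "advertiser": "shoes_inc"}
crossed = cross_features(
    {"site": display["site"]},
    {"advertiser": display["advertiser"]},
)
```

Each crossed value is itself a categorical feature, so a linear model gets one weight per (site, advertiser) pair instead of one weight per site plus one per advertiser.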

Page 25:

Hashing trick

• Standard representation of categorical features: “one-hot” encoding

• Dimensionality equal to the number of different values…

• Hashing to reduce dimensionality (made popular by John Langford in VW)

• Dimensionality now independent of number of values

• Using:
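A minimal sketch of the hashing trick follows. The formula from the slide did not survive extraction, so this is not it: the choice of SHA-256, the `name=value` key format, and the bucket count are assumptions picked for a deterministic, self-contained example.

```python
import hashlib


def hashed_index(feature_name, value, num_buckets):
    """Map a categorical (name, value) pair to a bucket in [0, num_buckets)."""
    key = f"{feature_name}={value}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(features, num_buckets=2 ** 20):
    """Return a sparse vector {bucket index: count} for one example."""
    vector = {}
    for name, value in features.items():
        index = hashed_index(name, value, num_buckets)
        vector[index] = vector.get(index, 0) + 1
    return vector


# Hypothetical example with three categorical features.
example = {"site": "news.example", "advertiser": "shoes_inc", "user_country": "FR"}
sparse = hash_features(example, num_buckets=2 ** 20)
```

The model dimension is fixed by `num_buckets` no matter how many distinct sites or advertisers appear; the price is that unrelated values can collide in the same bucket.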

Page 26:

In-house Machine Learning library -- IRMA

• We have our own large-scale distributed machine learning library on top of Hadoop used for all our models.

• From an ML perspective, we rely in most cases on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System).

Page 27:

Distribution of L-BFGS & SGD

• L-BFGS, being a batch algorithm, is easy to distribute.

• SGD is a bit trickier: we use parameter averaging across machines, and Hogwild! for multi-threading on each machine.

• We use Hadoop AllReduce:
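Parameter averaging for SGD can be simulated on one machine as below. This is a toy sketch under stated assumptions: the workers are plain Python lists, `average_parameters` stands in for what an AllReduce sum followed by a division would produce, and Hogwild! multi-threading is not shown. The data, learning rate, and epoch count are made up.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(shard, num_features, epochs=5, learning_rate=0.1):
    """Plain SGD for logistic regression on one worker's shard."""
    weights = [0.0] * num_features
    for _ in range(epochs):
        for features, label in shard:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
            for j, x in enumerate(features):
                weights[j] -= learning_rate * (prediction - label) * x
    return weights


def average_parameters(worker_weights):
    """Element-wise mean of the workers' weight vectors."""
    count = len(worker_weights)
    return [sum(column) / count for column in zip(*worker_weights)]


random.seed(0)
# Toy data: first feature is a bias term, label is 1 when the second is positive.
data = []
for _ in range(400):
    x = random.uniform(-1.0, 1.0)
    data.append(([1.0, x], 1 if x > 0 else 0))

shards = [data[i::4] for i in range(4)]  # 4 simulated workers
local_models = [sgd_logistic(shard, num_features=2) for shard in shards]
averaged = average_parameters(local_models)
```

Each worker only sees its own shard, and a single averaging step combines the local models. The averaged weights could then serve as the starting point for a batch solver.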

Page 28:

A word on more advanced techniques

• IRMA is not only about vanilla logistic regression with L2 regularization…

• It contains more advanced techniques such as transfer learning, factorization machines, learning to rank, cost-sensitive learning, …

• For example, we use cost-sensitive learning for bidding.

Page 30:

Offline & online evaluation

Usual two-step process:

• Offline testing is fast, cheap, and efficient for wide exploration.

• Online testing is expensive but has the final word.

Page 31:

Offline metrics (bidding case)

• We use classical metrics: LLH, RMSE, … (which focus on the prediction and ignore the bidding system where we use these models).

• The utility metric from “Offline Evaluation of Response Prediction in Online Advertising Auctions” by O. Chapelle (WWW’15).
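The two classical metrics named above can be computed as in this sketch. Whether LLH is averaged, summed, or normalized in practice is not stated on the slide; the per-example average used here, the clipping constant, and the toy labels and predictions are assumptions. The utility metric from the cited paper is not reproduced.

```python
import math


def log_likelihood(labels, predictions, eps=1e-15):
    """Average log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def rmse(labels, predictions):
    """Root mean squared error between labels and predicted probabilities."""
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up click labels and predicted click probabilities.
labels = [1, 0, 0, 1]
predictions = [0.8, 0.1, 0.3, 0.6]
llh = log_likelihood(labels, predictions)
error = rmse(labels, predictions)
```

Both metrics look only at the predicted probabilities, which is the limitation noted on the slide: neither accounts for how the prediction changes the bid or the auction outcome.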

Page 32:

Online metrics (bidding case)

• RevExTac = Revenue Excluding Traffic Acquisition Costs

• Cost, Revenue, …

Page 33:

Some statistics on evaluation

• 100K+ offline tests per year

• 1K+ A/B tests per year

• Many people

• We developed a platform and processes that enable very fast testing and improvement

Page 35:

Some examples of future areas of research

• Counterfactual evaluation (offline A/B tests)

• Embeddings for recommendation

• Policy learning

Page 36:

Counterfactual evaluation

• Estimate the business metric directly (clicks, sales, …).

• Using the production model + randomization.

• Good results on clicks already.
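One common way to estimate a business metric from randomized production logs is inverse propensity scoring (IPS). The slide does not name the estimator, so IPS is an assumption here, as are the log format, the function names, and the toy data.

```python
def ips_estimate(logs, new_policy_probability):
    """Inverse propensity scoring estimate of a new policy's average reward.

    logs: iterable of (context, action, logging_probability, reward).
    new_policy_probability(context, action): probability that the new
    policy would choose `action` in `context`.
    """
    total = 0.0
    count = 0
    for context, action, logging_probability, reward in logs:
        weight = new_policy_probability(context, action) / logging_probability
        total += weight * reward
        count += 1
    return total / count


# Toy logs: production picked banner A or B uniformly at random (p = 0.5).
logs = [
    ("user1", "A", 0.5, 1),
    ("user2", "B", 0.5, 0),
    ("user3", "A", 0.5, 0),
    ("user4", "B", 0.5, 1),
]


def always_a(context, action):
    """Candidate policy to evaluate offline: always show banner A."""
    return 1.0 if action == "A" else 0.0


estimated_clicks_per_display = ips_estimate(logs, always_a)
```

The randomization matters because every action the new policy might take needs a non-zero logging probability; otherwise the weight is undefined for that action.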

Page 37:

Embeddings for recommendation

• Can embeddings (for example, à la word2vec) help us compute similarities between, e.g., different products or users?
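Given embeddings, similarity between products is often measured with cosine similarity. The sketch below assumes that measure and uses made-up 3-dimensional vectors; real embeddings would be learned, for example with a word2vec-style model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings for three hypothetical products.
embeddings = {
    "orange_shoes": [0.9, 0.1, 0.0],
    "red_shoes": [0.8, 0.2, 0.1],
    "blue_tie": [0.0, 0.2, 0.9],
}


def most_similar(product, table):
    """Return the other product whose embedding is closest to `product`."""
    others = [name for name in table if name != product]
    return max(others, key=lambda name: cosine_similarity(table[product], table[name]))


nearest = most_similar("orange_shoes", embeddings)
```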

Page 38:

Policy learning – example on Look & Feel optimization

• Classical supervised machine learning approach: learn a pClick model and sort by predicted values for each possible value (e.g., each color).

• This is a hard problem and may be overkill!

• Really, we only want to know which color is the best according to some business metric (e.g., sales).
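In its simplest, context-free form, choosing the best color directly looks like the sketch below. It assumes the colors were shown uniformly at random so that raw sales rates are comparable; the function name `best_color` and the logs are hypothetical, and an actual policy would also depend on context.

```python
from collections import defaultdict


def best_color(logs):
    """Pick the color with the highest observed sales rate.

    logs: iterable of (color, sale) pairs collected while colors were
    shown uniformly at random.
    """
    shown = defaultdict(int)
    sold = defaultdict(int)
    for color, sale in logs:
        shown[color] += 1
        sold[color] += sale
    return max(shown, key=lambda color: sold[color] / shown[color])


# Made-up logs: (color shown, 1 if the display led to a sale else 0).
logs = [
    ("red", 0), ("red", 1),
    ("blue", 0), ("blue", 0),
    ("green", 1), ("green", 1),
]
chosen = best_color(logs)
```

No per-color click model is trained here: the decision is made directly on the business metric, which is the contrast the slide draws with the supervised approach.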

Page 39:

Academic research @ Criteo

• Our 1st public dataset is online: http://bit.ly/1vgw2XC

• New 1TB dataset released last year.

• Some recent publications:

Offline Evaluation of Response Prediction in Online Advertising Auctions. O. Chapelle, WWW’15.

Sources of Variability in Large-scale Machine Learning Systems. D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML Systems, 2015.

Cost-sensitive Learning for Bidding in Online Advertising Auctions. F. Vasile and D. Lefortier, NIPS workshop on ML for e-Commerce, 2015.

Page 40:

Questions
