RecSys 2015: Large-scale real-time product recommendation at Criteo
Copyright © 2015 Criteo
Large-Scale Real-Time Product Recommendation at Criteo
Romain Lerallut, Diane Gasselin
RecSys Vienna, Sept 18, 2015
« The largest internet company you’ve never heard of »
• Founded in 2005, in the adtech business since 2008
• Recommendation was our first product
• Disruptive business models
• 1,700 people worldwide (50+% of them joined less than a year ago)
• 300+ engineers
• 26 offices
• Live in 130 countries
• 1B unique users
A technology company first and foremost
We buy
• Inventory! (ad spaces)
• Billions of times a day
• All over the Internet
• For 95% of the population
=> Funding the Web
We sell
• Clicks!
• (that convert)
• (that convert a lot)
=> Delighting our clients!
We take the risk: you pay only for what you get
Learn on huge volumes of data
10,000 displays
leads to
50 clicks
leads to
1 sale
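The funnel implies a click-through rate of 0.5% and a click-to-sale conversion rate of 2%, i.e. roughly one sale per 10,000 displays. A minimal sketch of that arithmetic, using only the slide's round numbers (the variable names are ours):

```python
displays = 10_000
clicks = 50
sales = 1

ctr = clicks / displays              # click-through rate
conversion_rate = sales / clicks     # sales per click
sales_per_display = sales / displays # end-to-end rate

print(f"CTR: {ctr:.2%}, conversion: {conversion_rate:.0%}, sales/display: {sales_per_display:.4%}")
```

So positive labels are rare at every stage, which is why the models need very large volumes of display data to learn from.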
Physical infrastructure
7 in-house data centers on 3 continents
~15,000 servers, the largest Hadoop cluster in Europe
More than 35 PB of Big Data storage
Traffic
800k HTTP requests / sec (peak activity)
29,000 impressions / sec (peak activity)
< 10 ms to process an RTB (real-time bidding) request
< 100 ms to process a recommendation request
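As a rough back-of-envelope check (our own estimate, not a figure from the talk), Little's law, L = λW, relates throughput and latency to the number of requests in flight. Under the pessimistic assumption that every one of the 800k peak requests per second were an RTB request using its full 10 ms budget:

```python
def in_flight(requests_per_sec: int, latency_ms: int) -> float:
    """Little's law: average concurrency = arrival rate x time spent in the system."""
    return requests_per_sec * latency_ms / 1000

# Upper bound: assumes every HTTP request is an RTB request taking the full 10 ms budget.
peak_rtb_in_flight = in_flight(800_000, 10)
print(peak_rtb_in_flight)
```

Under that assumption, at most about 8,000 requests would be in flight at once across the serving fleet; real concurrency is lower, since not every HTTP request is an RTB request and most finish under budget.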
(Big) Data Sources
Ad display data: 20B events / day
User behavior data: 2B events / day
Catalog data: 1M+ products / client
10k clients
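Daily volumes are easier to compare with the traffic figures when converted to per-second rates. A small sketch; these are daily averages only, and peak rates are higher:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day: float) -> float:
    """Average event rate over a day."""
    return events_per_day / SECONDS_PER_DAY

ad_display_rate = per_second(20e9)    # ad display events
user_behavior_rate = per_second(2e9)  # user behavior events
print(f"{ad_display_rate:,.0f} display events/s, {user_behavior_rate:,.0f} behavior events/s")
```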
How do we do it?
Recommend products for a user
• What we want: reco(user) = products
• But 1B users × 3B products!
• And we need to scale and keep it fresh
• What we can do:
  • Pre-select candidate products offline (sources)
  • Refine the recommendation online
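To see why exhaustive scoring is out of reach, count the user-product pairs. The cost of one nanosecond per score below is a hypothetical, deliberately optimistic figure:

```python
users = 1_000_000_000      # 1B users (from the slide)
products = 3_000_000_000   # 3B products (from the slide)
pairs = users * products   # 3 x 10^18 user-product pairs

NS_PER_SCORE = 1           # hypothetical cost of scoring one pair, in nanoseconds
SECONDS_PER_YEAR = 365 * 24 * 3600
cpu_years = pairs * NS_PER_SCORE / 1e9 / SECONDS_PER_YEAR
print(f"{pairs:.1e} pairs -> about {cpu_years:.0f} CPU-years per full refresh")
```

Even under that assumption a single full pass would take on the order of a century of CPU time, which is why the work is split into offline pre-selection and online scoring of a small candidate set per user.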
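Offline: prepare sources
(Diagram) Advertiser events feed co-events (Item View – Item View, Item Sale – Item Sale), which yield similarities and complementarities, kept as top N lists; advertiser events also yield best-of and best-of by category lists. Figures shown on the diagram: 50B; 350M keys / 12B values; 50M keys / 1B values.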
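One way to picture the co-event step is as a map-reduce over user histories: the map phase emits every pair of products seen by the same user, the reduce phase sums the pairs, and a final step keeps the top N neighbours per product. The sketch below is a single-process stand-in for the Map-Reduce jobs; using raw co-view counts as the similarity is our simplifying assumption (real systems usually normalize them), and the product names are made up. The same code applies to item-sale pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations

def map_user_history(viewed_items):
    """Map: emit ((a, b), 1) for every ordered pair of distinct items one user viewed."""
    for a, b in combinations(sorted(set(viewed_items)), 2):
        yield (a, b), 1
        yield (b, a), 1

def reduce_pairs(emitted):
    """Reduce: sum the counts per (item, other_item) key."""
    co_counts = defaultdict(Counter)
    for (a, b), count in emitted:
        co_counts[a][b] += count
    return co_counts

def top_n(co_counts, n):
    """Keep the n most co-viewed items for each item."""
    return {item: [other for other, _ in neighbours.most_common(n)]
            for item, neighbours in co_counts.items()}

histories = [
    ["orange_shoes", "red_shoes"],
    ["orange_shoes", "red_shoes", "socks"],
    ["orange_shoes", "hat"],
]
emitted = (pair for history in histories for pair in map_user_history(history))
similar = top_n(reduce_pairs(emitted), n=2)
print(similar["orange_shoes"])
```

Truncating to the top N is what keeps the output table bounded per key, whatever the number of raw co-events.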
Offline: prepare sources (example)
User X saw orange shoes. Some candidate products for user X:
• Historical
• Similar
• Complementary
• Best-of (other users): most viewed products on the client website
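Once the per-product tables exist, gathering the sources for a given user is mostly a set of lookups keyed by what the user saw. A minimal sketch with hypothetical table contents for the orange-shoes example; the function and argument names are ours:

```python
def sources_for_user(viewed, similar, complementary, best_of):
    """Gather candidate products per source for one user (no deduplication yet)."""
    return {
        "historical": list(viewed),
        "similar": [q for p in viewed for q in similar.get(p, [])],
        "complementary": [q for p in viewed for q in complementary.get(p, [])],
        "best_of": list(best_of),  # site-wide list, shared by all users of this client
    }

user_x_viewed = ["orange_shoes"]
similar = {"orange_shoes": ["red_shoes", "orange_sneakers"]}
complementary = {"orange_shoes": ["socks", "shoe_polish"]}
best_of = ["tshirt", "red_shoes"]  # most viewed products on the client website

print(sources_for_user(user_x_viewed, similar, complementary, best_of))
```

Note that "red_shoes" comes out of two sources; removing such duplicates is left to the online merge step.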
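Reco overview
(Diagram) Offline, advertiser events feed the source computation (Map-Reduce jobs), which produces the sources; display, click and sale logs feed the prediction models. The Recommendation Service combines the sources, the prediction models and the catalog. Figures shown on the diagram: 50B, 4.5B, 500M, 12h, 6h, 4h, 100K qps.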
ML model
• Logistic regression models, because:
  • They scale
  • They are fast
  • They can handle lots of features (with a bit of magic)
• Features: product-specific, user-specific, user-product interactions, display-specific
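The slide does not say what the "bit of magic" is. One widely used way to let logistic regression handle a huge, sparse feature space is the hashing trick, and the sketch below assumes it, with one illustrative feature per family from the slide. Feature names, the bucket count and the learning rate are hypothetical; this is not Criteo's production model.

```python
import hashlib
import math

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed weight vector

def bucket(feature: str) -> int:
    """Hashing trick: map a feature string to a weight index (collisions are tolerated)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS

def extract_features(user: dict, product: dict, display: dict) -> list[str]:
    """One illustrative feature per family listed on the slide."""
    return [
        f"product:{product['category']}",                                 # product-specific
        f"user:{user['segment']}",                                        # user-specific
        f"user_x_product:{user['last_category']}|{product['category']}",  # interaction
        f"display:{display['size']}",                                     # display-specific
    ]

def predict(weights: dict, features: list[str]) -> float:
    """Predicted click probability: sigmoid of the sum of the hashed feature weights."""
    z = sum(weights.get(bucket(f), 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights: dict, features: list[str], clicked: int, lr: float = 0.1) -> None:
    """One stochastic gradient step on the logistic (log) loss."""
    grad = predict(weights, features) - clicked  # derivative of the loss w.r.t. z
    for f in features:
        b = bucket(f)
        weights[b] = weights.get(b, 0.0) - lr * grad

feats = extract_features({"segment": "new", "last_category": "shoes"},
                         {"category": "shoes"}, {"size": "300x250"})
weights: dict = {}
for _ in range(20):
    sgd_step(weights, feats, clicked=1)
print(predict(weights, feats))
```

Hashing keeps memory fixed no matter how many distinct feature values appear, and cross features such as the user-product pair cost nothing extra to declare.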
Online: sources
Similarities, Most viewed, Most bought
Online: merge of products
Products from the three sources (Similarities, Most viewed, Most bought) are merged into a single candidate list
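A minimal sketch of the merge, assuming the sources arrive as plain lists of product IDs (names are hypothetical). It removes duplicates while remembering which sources proposed each product; feeding that provenance to the scoring model as a feature is our assumption, not something the talk states.

```python
def merge_sources(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Merge per-source candidate lists into one deduplicated candidate map.

    Returns product_id -> names of the sources that proposed it, in first-seen order.
    """
    merged: dict[str, set[str]] = {}
    for source_name, products in sources.items():
        for product_id in products:
            merged.setdefault(product_id, set()).add(source_name)
    return merged

sources = {
    "similarities": ["red_shoes", "orange_sneakers"],
    "most_viewed": ["tshirt", "red_shoes"],
    "most_bought": ["socks", "tshirt"],
}
candidates = merge_sources(sources)
print(list(candidates))
```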
Online: scoring
Each product from the three sources (Similarities, Most viewed, Most bought) gets a score:
0.02 0.12 0.06 0.18 0.03 0.05 0.01 0.005 0.011 0.013 0.004 0.007
Online: scoring (sorted)
The same scores, sorted from highest to lowest:
0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004
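The sorting step itself is simple. This sketch pairs hypothetical product IDs p00 to p11 with the twelve scores from the scoring slide, in slide order, and reproduces the sorted order shown:

```python
# Hypothetical product IDs p00..p11 paired with the twelve scores from the slide.
slide_scores = [0.02, 0.12, 0.06, 0.18, 0.03, 0.05, 0.01, 0.005, 0.011, 0.013, 0.004, 0.007]
scores = {f"p{i:02d}": s for i, s in enumerate(slide_scores)}

def rank_candidates(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort candidates by score, highest first (ties keep their input order)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranked = rank_candidates(scores)
print([s for _, s in ranked])
```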
Online: candidates
Sorted scores: 0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004
(Figure: banner mock-up with four product slots, "SHOP" buttons and a "-50%" badge)
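When only a few banner slots need filling, a full sort is not required: a bounded heap selects the k best candidates in O(n log k). A sketch assuming a four-slot banner like the one drawn on the slide, with hypothetical product IDs paired with the slide's scores:

```python
import heapq

# Hypothetical product IDs p00..p11 paired with the twelve scores from the slide.
slide_scores = [0.02, 0.12, 0.06, 0.18, 0.03, 0.05, 0.01, 0.005, 0.011, 0.013, 0.004, 0.007]
scores = {f"p{i:02d}": s for i, s in enumerate(slide_scores)}

BANNER_SLOTS = 4  # assumption: the four-slot banner drawn on the slide

def fill_banner(scores: dict[str, float], slots: int = BANNER_SLOTS) -> list[str]:
    """Pick the `slots` highest-scoring products, best first, without sorting everything."""
    return heapq.nlargest(slots, scores, key=scores.get)

banner = fill_banner(scores)
print(banner)
```

With twelve candidates the difference from a full sort is negligible; the heap only pays off when candidate lists are long relative to the number of slots.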
Evaluation
The basics: online A/B testing
• It is the only truth we have
• 50% of users on model A
• 50% of users on model B
• But it is onerous:
  • If the new model is not good, we lose money, fast!
  • Tests are long (~2 weeks needed to get good confidence intervals)
  • Code has to be production-ready (no bugs, good performance): we run 24/7
  • It can be heavy on the infrastructure
  • And it does not take long-term effects into account
(Figure: two identical mock banners, "My company / BUY! BUY! BUY!", one per test population)
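A sketch of the two ingredients of such a test, under our own assumptions: a deterministic 50/50 split obtained by hashing the user ID with an experiment name (so a user always sees the same model), and a normal-approximation (Wald) confidence interval on the CTR difference. The interval width shrinks only as 1/√n, which is why tests must run long enough. Names are hypothetical.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically put a user in population A or B (stable across requests)."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    value = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return "A" if value % 2 == 0 else "B"

def ctr_diff_ci(clicks_a: int, displays_a: int, clicks_b: int, displays_b: int,
                z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for CTR(A) - CTR(B)."""
    p_a = clicks_a / displays_a
    p_b = clicks_b / displays_b
    se = math.sqrt(p_a * (1 - p_a) / displays_a + p_b * (1 - p_b) / displays_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se

print(assign_variant("user42", "new_model_test"))
print(ctr_diff_ci(600, 100_000, 500, 100_000))
```

If the interval contains 0, the test cannot yet tell the two models apart.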
The test framework for prediction
• ALTERNATIVE: a framework that replays production logs offline
  • 30,000 tests / year
  • Replay ~x100
• BUT: we only have data on the products we display (exploration is costly)
• SO: we can only make sure we are not completely mistaken
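A minimal sketch of log replay, assuming the offline metric is log loss (the talk does not name one) and that each logged row holds the features of a product that was actually displayed plus whether it was clicked. Products that were never displayed have no row and no label, which is exactly the limitation above.

```python
import math

def log_loss(predictions, labels, eps=1e-15):
    """Average negative log-likelihood of the observed clicks."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def replay(logged_displays, model):
    """Score a candidate model on displays that production actually served."""
    preds = [model(row["features"]) for row in logged_displays]
    labels = [row["clicked"] for row in logged_displays]
    return log_loss(preds, labels)

# Made-up log with hypothetical features.
logged = [
    {"features": {"viewed_same_category": 1}, "clicked": 1},
    {"features": {"viewed_same_category": 0}, "clicked": 0},
]
baseline = replay(logged, lambda f: 0.5)
candidate = replay(logged, lambda f: 0.9 if f["viewed_same_category"] else 0.1)
print(baseline, candidate)
```

A lower replay loss suggests the candidate is not worse on what was shown, but says nothing about products the current model never displayed.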
Ultimate solution: offline A/B testing
• Find the best offline predictor of online performance
• Counterfactual Reasoning and Learning Systems: Léon Bottou (Microsoft Research, Redmond, WA), Jonas Peters (Max Planck Institute, Tübingen), Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson
• But we haven’t succeeded in making it match reality precisely... YET
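One standard building block of counterfactual evaluation, discussed in the Bottou et al. paper, is importance sampling, also called inverse propensity scoring (IPS): each logged reward is reweighted by how much more (or less) likely the new policy is to take the logged action than the logging policy was. A minimal sketch with made-up logs and hypothetical names; it is not Criteo's evaluation method.

```python
def ips_estimate(logs, target_prob):
    """Inverse propensity scoring estimate of a new policy's average reward.

    logs: records with the context, the action actually taken (e.g. the product shown),
          the probability the logging policy had of taking it, and the observed reward.
    target_prob(context, action): probability that the new policy takes that action.
    """
    total = 0.0
    for rec in logs:
        weight = target_prob(rec["context"], rec["action"]) / rec["logging_prob"]
        total += weight * rec["reward"]
    return total / len(logs)

# Made-up log: the logging policy showed product "a" or "b" with probability 0.5 each.
logs = [
    {"context": "u1", "action": "a", "logging_prob": 0.5, "reward": 1},
    {"context": "u2", "action": "b", "logging_prob": 0.5, "reward": 0},
    {"context": "u3", "action": "a", "logging_prob": 0.5, "reward": 1},
    {"context": "u4", "action": "b", "logging_prob": 0.5, "reward": 1},
]

def always_a(context, action):
    return 1.0 if action == "a" else 0.0

print(ips_estimate(logs, always_a))
```

The estimate is only unbiased if the logging policy gave non-zero probability to every action the new policy might take. That is why exploration matters: a product that was never displayed cannot be reweighted into the estimate.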
What’s next?
What’s next for us: upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantics, NLP)
• Instant updates of similarities (because batch computation is soooo last year)
• Joint product scoring (score the full banner, not each product independently)
What’s next for you: fancy a try?
With us! http://labs.criteo.com/jobs/
On your own:
• We published datasets for click prediction
  • 4GB of display-click data: Kaggle challenge in 2014, http://bit.ly/1vgw2XC
  • 1TB of display-click data (the industry’s largest dataset): http://bit.ly/1PyH4Vq
    • 4 billion observations
    • 156 billion feature values
    • Available on Microsoft Azure
    • Used by edX (UC Berkeley)
• We would be happy to share reco-centric data!
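For readers who want to try the Kaggle data: as we understand the 2014 challenge's description, each training line is tab-separated, with a click label, 13 integer features (mostly counts), then 26 categorical features hashed to hex strings, and missing values left empty. A minimal parser under that assumption (check the README shipped with the data; the test file has no label column):

```python
NUM_INTEGER = 13      # integer-valued features
NUM_CATEGORICAL = 26  # hashed categorical features

def parse_line(line: str):
    """Parse one training line into (label, integer features, categorical features).

    Empty fields (missing values) become None.
    """
    fields = line.rstrip("\r\n").split("\t")
    expected = 1 + NUM_INTEGER + NUM_CATEGORICAL
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    label = int(fields[0])
    integers = [int(v) if v else None for v in fields[1:1 + NUM_INTEGER]]
    categoricals = [v if v else None for v in fields[1 + NUM_INTEGER:]]
    return label, integers, categoricals

# Made-up example line: label 1, first integer feature 5, first categorical "68fd1e64".
example = "\t".join(["1", "5"] + [""] * 12 + ["68fd1e64"] + [""] * 25)
label, integers, categoricals = parse_line(example)
print(label, integers[:2], categoricals[:2])
```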
Questions?