Realtime Recommendations with Redis
Torben Brodt, plista GmbH
April 25th, 2013
NoSQL Search Roadshow, http://nosqlroadshow.com/nosql-berlin-2013/
Introduction
● Torben Brodt, Head of Data Engineering
○ computer science studies
○ 5 years plista
○ publication "collaborative filtering"
○ evangelist for "power of algorithms"
● plista GmbH
○ recommendations & advertising
○ founded in 2008, Berlin [DE]
○ ~5k recommendations/second
Contents
1. How to feed a recommender?
2. How to build a recommendation?
3. How to scale a recommender?
How to feed a recommender?
● to show recommendations we are integrated on the website
● we have URL + HTTP headers
○ user agent
○ IP address -> geolocation
How to feed a recommender?
● push the data away quickly
● make use of data quickly
RULE: be quick
src http://en.wikipedia.org/wiki/Pac-Man
Technology overview
● Apache Lucene for content
● MySQL for relational data
● Machine learning
○ Hadoop? No! It's batch + slow
○ In memory? Yes, stream computing
● Redis for statistics
○ Live
○ Backup
How to build a recommendation?
Classification
● different recommender families
Behavioral: based on interaction between user and article
○ Most Popular
○ Collaborative Filtering
○ Item to Item
Content: based on the articles
○ Content Similarity
○ Latest Item
Most popular with
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de" 1 berlin_wins
● ZREVRANGEBYSCORE
p:welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135
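The write and read on this slide can be sketched in plain Python. This is a minimal in-memory stand-in for a Redis sorted set, not plista's implementation: `zincrby` and `top_n` are hypothetical helpers named after the Redis ZINCRBY command and a descending range read with a limit, and the scores are the ones shown above.

```python
from collections import defaultdict

# In-memory stand-in for Redis sorted sets: key -> {member: score}.
store = defaultdict(dict)

def zincrby(key, increment, member):
    """Add `increment` to the member's score, starting from 0 if it is new."""
    store[key][member] = store[key].get(member, 0) + increment
    return store[key][member]

def top_n(key, n):
    """Return the n highest-scored (member, score) pairs, best first."""
    ranked = sorted(store[key].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

# Seed the counters with the values from the slide.
store["p:welt.de"] = {
    "berlin_wins": 689,
    "summer_is_coming": 420,
    "plista_company": 135,
}

# A reader opens welt.de/football/berlin_wins.html: one live write.
zincrby("p:welt.de", 1, "berlin_wins")

# The next recommendation request is one live read.
most_popular = top_n("p:welt.de", 2)
print(most_popular)
```

Against a real Redis server the same two steps would be a single increment and a single range query per request, which is what keeps both paths cheap.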
Live Read + Live Write = Real Time Recommendations
Recap Data types
● String, Lists, Set, ..
● Hash
○ map between string fields and string values, very fast
○ HINCRBY complexity: O(1)
● Sorted Set
○ ZINCRBY complexity: O(log(N)), where N is the number of elements in the sorted set
○ allows limiting the number of results: ZREVRANGEBYSCORE
○ UNION + INTERSECT
p:welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135
Most popular with timeseries
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNION
○ "p:welt.de:1360007000"
○ "p:welt.de:1360006000"
○ "p:welt.de:1360005000"
● ZREVRANGEBYSCORE
p:welt.de:1360005000
berlin_wins 420
summer_is_coming 135
plista_best_company 689
p:welt.de:1360006000
berlin_wins 420
summer_is_coming 135
plista_best_company 689
p:welt.de:1360007000
berlin_wins 689
summer_is_coming 420
plista_best_company 135
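A small sketch of the time-series idea, in plain Python rather than against Redis: each hit is counted in a key that carries a time bucket, and reading sums the buckets. The bucket width of 1000 seconds is an assumption read off the example keys, and `bucket_key` and `union_sum` are hypothetical helper names.

```python
from collections import Counter

BUCKET_SECONDS = 1000  # assumption: width implied by the example keys

def bucket_key(publisher, unix_ts):
    """Build a key such as 'p:welt.de:1360007000' by flooring the timestamp."""
    bucket = unix_ts - unix_ts % BUCKET_SECONDS
    return f"p:{publisher}:{bucket}"

def union_sum(buckets):
    """Sum scores per member across buckets, like a sorted-set union with SUM."""
    total = Counter()
    for scores in buckets:
        total.update(scores)  # Counter.update adds counts, it does not overwrite
    return total.most_common()

# Scores per bucket exactly as printed on the slide.
store = {
    "p:welt.de:1360005000": {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689},
    "p:welt.de:1360006000": {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689},
    "p:welt.de:1360007000": {"berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135},
}

key = bucket_key("welt.de", 1360007421)  # a hit arriving inside the newest bucket
ranking = union_sum(store.values())
print(key, ranking)
```

Old buckets can simply expire, so the ranking forgets stale popularity without any batch cleanup job.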
Most popular with timeseries
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNION ... WEIGHTS
○ "p:welt.de:1360007000" .. 4
○ "p:welt.de:1360006000" .. 2
○ "p:welt.de:1360005000" .. 1
● ZREVRANGEBYSCORE
p:welt.de:1360005000
berlin_wins 420
summer_is_coming 135
plista_best_company 689
p:welt.de:1360006000
berlin_wins 420
summer_is_coming 135
plista_best_company 689
p:welt.de:1360007000
berlin_wins 689
summer_is_coming 420
plista_best_company 135
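The effect of the weights 4, 2, 1 can be checked with a short plain-Python sketch. `zunion_weighted` is a hypothetical helper that mimics what a weighted sorted-set union with SUM aggregation computes; the scores are the ones printed on the slide.

```python
def zunion_weighted(sorted_sets, weights):
    """Multiply each set's scores by its weight, then sum per member."""
    if len(sorted_sets) != len(weights):
        raise ValueError("need exactly one weight per sorted set")
    combined = {}
    for scores, weight in zip(sorted_sets, weights):
        for member, score in scores.items():
            combined[member] = combined.get(member, 0) + score * weight
    # Highest combined score first, as a descending range read would return it.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

newest = {"berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135}
middle = {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689}
oldest = {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689}

ranking = zunion_weighted([newest, middle, oldest], [4, 2, 1])
print(ranking)
```

With these numbers berlin_wins scores 689*4 + 420*2 + 420*1 = 4016, so recent hits dominate while older buckets still contribute.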
Most popular with timeseries
[Figure: timeline of buckets from -1h to -8h relative to :1360007000]
Most popular to any context
● it's not only the publisher; we use ~50 context attributes
context attributes:
● publisher
● weekday
● geolocation
● demographics
● ...
publisher = welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135
weekday = sunday
berlin_wins 400 +1
dortmund_wins 200
... 100
geolocation = dortmund
dortmund_wins 200
berlin_wins 10 +1
... 5
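The "+1" in each of the three tables suggests that one page view increments one counter per context attribute. A minimal sketch of that write path, in plain Python with nested dictionaries standing in for Redis sorted sets. The one-letter prefixes follow the p:, w: and g: keys used elsewhere in the talk; `context_keys` and `record_hit` are hypothetical names.

```python
from collections import defaultdict

# key -> {article: hits}; stands in for one sorted set per context value.
counters = defaultdict(lambda: defaultdict(int))

# Assumed mapping from context attribute to key prefix.
PREFIXES = {"publisher": "p", "weekday": "w", "geolocation": "g"}

def context_keys(context):
    """Turn {'publisher': 'welt.de', ...} into ['p:welt.de', ...]."""
    return [f"{PREFIXES[name]}:{value}"
            for name, value in context.items() if name in PREFIXES]

def record_hit(context, article):
    """Increment the article once in every context it was seen in."""
    for key in context_keys(context):
        counters[key][article] += 1

# Seed with the numbers from the slide.
counters["p:welt.de"].update({"berlin_wins": 689, "summer_is_coming": 420, "plista_company": 135})
counters["w:sunday"].update({"berlin_wins": 400, "dortmund_wins": 200})
counters["g:dortmund"].update({"dortmund_wins": 200, "berlin_wins": 10})

record_hit(
    {"publisher": "welt.de", "weekday": "sunday", "geolocation": "dortmund"},
    "berlin_wins",
)
print({key: dict(hits) for key, hits in counters.items()})
```

With ~50 context attributes this means ~50 small increments per hit, which is only practical because each increment is cheap.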
Most popular to any context
● how it looks in Redis
ZUNION ... WEIGHTS
p:welt.de:1360007 4
p:welt.de:1360006 2
p:welt.de:1360005 1
w:sunday:1360007 4
w:sunday:1360006 2
w:sunday:1360005 1
g:dortmund:1360007 4
g:dortmund:1360006 2
g:dortmund:1360005 1
publisher = welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135
weekday = sunday
berlin_wins 400
dortmund_wins 200
... 100
geolocation = dortmund
dortmund_wins 200
berlin_wins 10
... 5
Most popular with Effect size
ZUNION ... WEIGHTS
p:welt.de:1360007 4 * 70%
p:welt.de:1360006 2 * 70%
p:welt.de:1360005 1 * 70%
w:sunday:1360007 4 * 10%
w:sunday:1360006 2 * 10%
w:sunday:1360005 1 * 10%
g:dortmund:1360007 4 * 30%
g:dortmund:1360006 2 * 30%
g:dortmund:1360005 1 * 30%
Effect Size
Examples:
small effect: weather
big effect: publisher
Data with a small effect should not be taken into account; otherwise we get average results.
● which context has an influence?
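One way to read this slide is that each key's final weight is its time weight multiplied by the effect size of its context. The sketch below builds such a weight list in plain Python. The pairing of 70% with publisher, 10% with weekday and 30% with geolocation is an interpretation of the slide layout; the `x:rainy` weather context, its 2% effect size and the 5% cutoff are invented purely for illustration.

```python
TIME_WEIGHTS = [4, 2, 1]
BUCKETS = [1360007, 1360006, 1360005]

# Effect size per context; the last entry is a made-up low-effect example.
EFFECT_SIZE = {
    "p:welt.de": 0.70,
    "w:sunday": 0.10,
    "g:dortmund": 0.30,
    "x:rainy": 0.02,
}

MIN_EFFECT = 0.05  # hypothetical cutoff below which a context is ignored

def weighted_keys(effect_size, buckets, time_weights, min_effect=0.0):
    """Return (key, weight) pairs with weight = time weight * effect size."""
    plan = []
    for prefix, effect in effect_size.items():
        if effect < min_effect:
            continue  # small effects would only pull results towards the average
        for bucket, time_weight in zip(buckets, time_weights):
            plan.append((f"{prefix}:{bucket}", time_weight * effect))
    return plan

plan = weighted_keys(EFFECT_SIZE, BUCKETS, TIME_WEIGHTS, MIN_EFFECT)
for key, weight in plan:
    print(f"{key} {weight:.2f}")
```

The resulting pairs are what would be passed as keys and weights to a single weighted union.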
SUM over..
● timeseries
● different context
● previous hits of the user
● similar publisher
knowledge
publisher = welt.de
berlin_wins 689
summer_is_coming 420
plista_company 135
Σ
ZUNION ... WEIGHTS
p:welt.de:1360007 4
p:welt.de:1360006 2
p:welt.de:1360005 1
w:sunday:1360007 4
w:sunday:1360006 2
w:sunday:1360005 1
g:dortmund:1360007 4
g:dortmund:1360006 2
g:dortmund:1360005 1
... redis can do it ;)
Even more Matrix Operations ;)
● Similarity Matrix
● Human Control Matrix
● Meta-learning Matrix
○ cooperation with
○ aided from
∏Σ
More recommenders possible
this was only about most popular
● other algorithms using Redis
○ incremental collaborative filtering
○ article to article paths (~graph)
○ .. using external data sources
How to scale a recommender?
Distribution to many servers
● 1 client to access n servers
● partitioning of data using hashing
● for UNION we run into problems
○ combined keys need to be on same server
○ NO consistent hashing possible
○ workaround: prefix hashing
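The talk does not show how prefix hashing is implemented. One plausible reading is that only the part of the key before the time bucket is hashed, so that all buckets of one context land on the same server and can be unioned there. A sketch under that assumption; the server names, `shard_prefix` and `server_for` are hypothetical.

```python
import zlib

SERVERS = ["redis-0", "redis-1", "redis-2", "redis-3"]  # hypothetical names

def shard_prefix(key):
    """'p:welt.de:1360007000' -> 'p:welt.de'.

    Assumes every key ends in a ':<bucket>' suffix.
    """
    return key.rsplit(":", 1)[0]

def server_for(key, servers=SERVERS):
    """Pick a server from the key prefix only, ignoring the time bucket."""
    digest = zlib.crc32(shard_prefix(key).encode("utf-8"))
    return servers[digest % len(servers)]

keys = [
    "p:welt.de:1360007000",
    "p:welt.de:1360006000",
    "p:welt.de:1360005000",
]
placement = {key: server_for(key) for key in keys}
print(placement)
```

This only co-locates the time buckets of a single context. A union across publisher, weekday and geolocation keys would still need those keys on one server, for example by hashing an even coarser prefix.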
How to scale a recommender?
Low Latency
● master/slave replication
● should be close to edge servers
● e.g. 1 Redis instance per 1 webserver
src http://en.wikipedia.org/wiki/Flash_(comics)
How to scale a recommender?
Application in Database
● Lua support is shipped
● but single-core process
● a long read blocks all writes
● concurrency issue
src http://lua.org
How to scale a recommender?
in spite of all those disadvantages
● Redis fits perfectly for simple operations
○ SUM + AGGREGATE + MIN + MAX
● in-memory operations are pretty fast
● real-time features feel better in a real-time database (e.g. time series)
● we don't need batch
What else in Redis?
● message bus
● many recommenders
● live statistics
● caching
"One technology to rule them all"
Questions?
www.plista.com
@torbenbrodt
xing.com/profile/Torben_Brodt
http://goo.gl/pvXm5
http://lnkd.in/MUXXuv