Realtime Recommendationsnosqlroadshow.com/dl/NoSQL-Berlin-2013/GOTO/GOTO... · plista_company 135...

Realtime Recommendations with Redis

Torben Brodt, plista GmbH

April 25th, 2013

NoSQL Search Roadshow, http://nosqlroadshow.com/nosql-berlin-2013/


Introduction

● Torben Brodt, Head of Data Engineering
  ○ computer science studies
  ○ 5 years at plista
  ○ publication "collaborative filtering"
  ○ evangelist for "power of algorithms"

● plista GmbH
  ○ recommendations & advertising
  ○ founded in 2008, Berlin [DE]
  ○ ~5k recommendations per second

Contents

1. How to feed a recommender?

2. How to build a recommendation?

3. How to scale a recommender?

How to feed a recommender?


● to show recommendations, we are integrated into the publisher's website
● per request we have the URL + HTTP headers
  ○ user agent
  ○ IP address -> geolocation


● push the data away quickly
● make use of the data quickly

RULE: be quick

Image source: http://en.wikipedia.org/wiki/Pac-Man

Technology overview

● Apache Lucene for content
● MySQL for relational data
● Machine learning
  ○ Hadoop? No! It's batch + slow
  ○ In memory? Yes, stream computing
● Redis for statistics
  ○ live
  ○ backup

How to build a recommendation?


Behavioral: based on interaction between user and article
  ○ Most Popular
  ○ Collaborative Filtering
  ○ Item to Item

Content: based on the articles
  ○ Content Similarity
  ○ Latest Item

Classification

● different recommender families

Most popular with

Page view: welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de" 1 berlin_wins
● ZREVRANGEBYSCORE

Sorted set p:welt.de:
  berlin_wins        689 (+1)
  summer_is_coming   420
  plista_company     135

Live Read + Live Write = Real-Time Recommendations
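The write/read cycle on this slide can be sketched without a server. The Python below is a minimal in-memory stand-in, assuming a plain dict per key in place of a Redis sorted set; the helper names `zincrby` and `most_popular` are hypothetical and only mimic what ZINCRBY and ZREVRANGEBYSCORE ... LIMIT return.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for Redis sorted sets: key -> {member: score}.
store: dict[str, dict[str, float]] = defaultdict(dict)


def zincrby(key: str, increment: float, member: str) -> float:
    """Like ZINCRBY: add `increment` to the member's score (missing members start at 0)."""
    zset = store[key]
    zset[member] = zset.get(member, 0.0) + increment
    return zset[member]


def most_popular(key: str, limit: int) -> list[tuple[str, float]]:
    """Like ZREVRANGEBYSCORE key +inf -inf WITHSCORES LIMIT 0 <limit>.

    Highest score first; equal scores are ordered by member name in
    descending lexicographic order, as Redis does for reverse ranges.
    """
    ranked = sorted(store[key].items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ranked[:limit]


# Seed p:welt.de with the counts from the slide, then record one page view
# of welt.de/football/berlin_wins.html.
for member, count in [("berlin_wins", 689), ("summer_is_coming", 420), ("plista_company", 135)]:
    zincrby("p:welt.de", count, member)
zincrby("p:welt.de", 1, "berlin_wins")

print(most_popular("p:welt.de", 2))  # [('berlin_wins', 690.0), ('summer_is_coming', 420.0)]
```

Each page view costs one increment and each recommendation request one range read, which is why both can run live.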

● String, List, Set, ...
● Hash
  ○ map between string fields and string values, very fast
  ○ HINCRBY complexity: O(1)
● Sorted Set
  ○ ZINCRBY complexity: O(log(N)), where N is the number of elements in the sorted set
  ○ allows limiting the number of results: ZREVRANGEBYSCORE ... LIMIT
  ○ union + intersection (ZUNIONSTORE, ZINTERSTORE)
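The union and intersection operations in the last bullet combine several sorted sets into one, summing the scores by default (AGGREGATE SUM) or keeping the MIN or MAX instead. The sketch below reproduces those score semantics on plain dicts; `zunion` and `zinter` are hypothetical names, weights are left out, and the toy scores are illustrative only.

```python
# Hypothetical re-implementation of the score semantics of
# ZUNIONSTORE / ZINTERSTORE with AGGREGATE SUM | MIN | MAX (weights omitted).
AGGREGATE = {"SUM": sum, "MIN": min, "MAX": max}


def zunion(sets: list[dict[str, float]], aggregate: str = "SUM") -> dict[str, float]:
    """Members present in ANY input set; scores combined over the sets containing them."""
    combine = AGGREGATE[aggregate]
    members = set().union(*sets)
    return {m: combine(s[m] for s in sets if m in s) for m in members}


def zinter(sets: list[dict[str, float]], aggregate: str = "SUM") -> dict[str, float]:
    """Members present in EVERY input set; scores combined over all sets."""
    combine = AGGREGATE[aggregate]
    members = set(sets[0]).intersection(*sets[1:])
    return {m: combine(s[m] for s in sets) for m in members}


# Toy scores: two sorted sets that share only "berlin_wins".
a = {"berlin_wins": 400.0, "dortmund_wins": 200.0}
b = {"berlin_wins": 10.0, "summer_is_coming": 5.0}

print(zinter([a, b]))         # {'berlin_wins': 410.0}
print(zinter([a, b], "MIN"))  # {'berlin_wins': 10.0}
print(sorted(zunion([a, b], "MAX").items()))
# [('berlin_wins', 400.0), ('dortmund_wins', 200.0), ('summer_is_coming', 5.0)]
```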

Recap: data types

Sorted set p:welt.de:
  berlin_wins      689 (+1)
  plista_company   135

Most popular with timeseries

Page view: welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNION over the latest time slots
  ○ "p:welt.de:1360007000"
  ○ "p:welt.de:1360006000"
  ○ "p:welt.de:1360005000"
● ZREVRANGEBYSCORE

One sorted set per time slot:
  p:welt.de:1360005000: berlin_wins 420, plista_best_company 689
  p:welt.de:1360006000: berlin_wins 420
  p:welt.de:1360007000: berlin_wins 689
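Writing into time slots only changes the key: every hit goes into the sorted set of the current slot, and a read unions the most recent slots. The sketch below assumes a slot width of 1000 seconds, matching the spacing of the example keys (the talk does not state the width, and the weight chart that follows is labeled in hours); `slot_key`, `recent_keys`, `record_hit` and `popular_recent` are hypothetical helpers over an in-memory dict.

```python
from collections import defaultdict

SLOT_SECONDS = 1000  # assumption: matches p:welt.de:1360005000 / ...6000 / ...7000

# Hypothetical in-memory stand-in for Redis: key -> {member: score}
store: dict[str, dict[str, float]] = defaultdict(dict)


def slot_key(context: str, ts: int) -> str:
    """Key of the time slot containing Unix time `ts`, e.g. p:welt.de:1360007000."""
    return f"{context}:{ts - ts % SLOT_SECONDS}"


def recent_keys(context: str, now: int, n: int) -> list[str]:
    """Keys of the current slot and the n-1 slots before it, newest first."""
    start = now - now % SLOT_SECONDS
    return [f"{context}:{start - i * SLOT_SECONDS}" for i in range(n)]


def record_hit(context: str, article: str, ts: int) -> None:
    """Write path: like ZINCRBY <slot key> 1 <article>."""
    zset = store[slot_key(context, ts)]
    zset[article] = zset.get(article, 0.0) + 1


def popular_recent(context: str, now: int, n: int) -> list[tuple[str, float]]:
    """Read path: plain (unweighted) union of the last n slots, best first."""
    totals: dict[str, float] = {}
    for key in recent_keys(context, now, n):
        for article, score in store.get(key, {}).items():
            totals[article] = totals.get(article, 0.0) + score
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


record_hit("p:welt.de", "berlin_wins", 1360005500)
record_hit("p:welt.de", "berlin_wins", 1360007123)
record_hit("p:welt.de", "summer_is_coming", 1360007200)

print(recent_keys("p:welt.de", 1360007123, 3))
# ['p:welt.de:1360007000', 'p:welt.de:1360006000', 'p:welt.de:1360005000']
print(popular_recent("p:welt.de", 1360007123, 3))
# [('berlin_wins', 2.0), ('summer_is_coming', 1.0)]
```

Old slots simply stop being read; in Redis one could additionally put an EXPIRE on each slot key so memory stays bounded.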




Page view: welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNION ... WEIGHTS (newer time slots count more)
  ○ "p:welt.de:1360007000" .. 4
  ○ "p:welt.de:1360006000" .. 2
  ○ "p:welt.de:1360005000" .. 1
● ZREVRANGEBYSCORE

One sorted set per time slot:
  p:welt.de:1360005000: berlin_wins 420
  p:welt.de:1360006000: berlin_wins 420
  p:welt.de:1360007000: berlin_wins 689

[Figure: weight per time slot, decreasing from the current slot (:1360007000) back through -1h to -8h]
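With WEIGHTS, each slot's scores are multiplied by its weight before they are summed, so recent hits count more: the current slot counts 4x, the previous one 2x, the one before 1x, as on the slide. The hit counts below are made up to show the effect; `weighted_union` is a hypothetical helper that mirrors ZUNION ... WEIGHTS with the default AGGREGATE SUM.

```python
# Hypothetical mirror of: ZUNION 3 <k1> <k2> <k3> WEIGHTS 4 2 1 WITHSCORES
def weighted_union(sets: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> list[tuple[str, float]]:
    """Sum weight * score per member over the weighted keys; best first."""
    totals: dict[str, float] = {}
    for key, weight in weights.items():
        for member, score in sets.get(key, {}).items():
            totals[member] = totals.get(member, 0.0) + weight * score
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


# Made-up hit counts per time slot (newest slot first).
slots = {
    "p:welt.de:1360007000": {"berlin_wins": 30, "summer_is_coming": 5},
    "p:welt.de:1360006000": {"berlin_wins": 10, "summer_is_coming": 20},
    "p:welt.de:1360005000": {"summer_is_coming": 40},
}
decay = {"p:welt.de:1360007000": 4, "p:welt.de:1360006000": 2, "p:welt.de:1360005000": 1}
flat = {key: 1 for key in decay}

print(weighted_union(slots, flat))   # [('summer_is_coming', 65.0), ('berlin_wins', 40.0)]
print(weighted_union(slots, decay))  # [('berlin_wins', 140.0), ('summer_is_coming', 100.0)]
```

Without weights, the article that was popular earlier wins; with weights, the one that is popular right now does. Halving the weight at each step back gives a simple exponential decay.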

Most popular to any context

● it's not only the publisher: we use ~50 context attributes

context attributes:
● publisher
● weekday
● geolocation
● demographics
● ...

publisher = welt.de
  berlin_wins      689 (+1)
  plista_company   135

weekday = sunday
  berlin_wins      400 (+1)
  dortmund_wins    200
  ...              100

geolocation = dortmund
  dortmund_wins    200
  berlin_wins       10 (+1)
  ...                5

Most popular to any context

ZUNION ... WEIGHTS
  p:welt.de:1360007 4    p:welt.de:1360006 2    p:welt.de:1360005 1
  w:sunday:1360007 4     w:sunday:1360006 2     w:sunday:1360005 1
  g:dortmund:1360007 4   g:dortmund:1360006 2   g:dortmund:1360005 1

● how it looks in Redis:

publisher = welt.de
  berlin_wins      689 (+1)
  plista_company   135

weekday = sunday
  berlin_wins      400
  dortmund_wins    200
  ...              100

geolocation = dortmund
  dortmund_wins    200
  berlin_wins       10
  ...                5
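In practice the read becomes one ZUNION over every (context, time slot) key, with the slot weights repeated for each context. The sketch below only assembles that argument list; `zunion_args` is a hypothetical helper, the key scheme follows the slide (p: publisher, w: weekday, g: geolocation), and the resulting list could be handed to a client's raw-command call such as redis-py's `execute_command`. ZUNION itself requires Redis 6.2 or later; older servers offer ZUNIONSTORE with the same WEIGHTS option.

```python
def zunion_args(contexts: list[str], slots: list[int], slot_weights: list[float]) -> list[str]:
    """Build: ZUNION <numkeys> <key ...> WEIGHTS <weight ...> WITHSCORES."""
    keys: list[str] = []
    weights: list[str] = []
    for context in contexts:
        # Same slot weights for every context: newest slot 4, then 2, then 1.
        for slot, weight in zip(slots, slot_weights):
            keys.append(f"{context}:{slot}")
            weights.append(str(weight))
    return ["ZUNION", str(len(keys)), *keys, "WEIGHTS", *weights, "WITHSCORES"]


args = zunion_args(["p:welt.de", "w:sunday", "g:dortmund"],
                   [1360007, 1360006, 1360005], [4, 2, 1])
print(" ".join(args))
```

The printed command lists the nine keys from p:welt.de:1360007 to g:dortmund:1360005, followed by WEIGHTS 4 2 1 4 2 1 4 2 1 and WITHSCORES.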

Most popular with Effect size

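ZUNION ... WEIGHTS, each time-slot weight scaled by the effect size of its context:
  p:welt.de:1360007 4    p:welt.de:1360006 2    p:welt.de:1360005 1    * 70% each
  w:sunday:1360007 4     w:sunday:1360006 2     w:sunday:1360005 1     * 10% each
  g:dortmund:1360007 4   g:dortmund:1360006 2   g:dortmund:1360005 1   * 30% each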

Effect Size

Examples:
● small effect: weather
● big effect: publisher

Data with a small effect should not be taken into account; otherwise we get average results.

● which context has an influence?
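One way to read the effect-size slides: each key's weight becomes slot weight x effect size of its context, and contexts whose effect falls below a cut-off are left out entirely, so they cannot pull the result towards the average. The sketch assumes the 70% / 10% / 30% factors belong to publisher, weekday and geolocation (the order used on the earlier slide), and the cut-off `MIN_EFFECT = 0.2` is a made-up value; the talk does not say how effect sizes are measured.

```python
# Assumed mapping of the slide's factors to contexts, and a hypothetical cut-off.
EFFECT = {"p:welt.de": 0.7, "w:sunday": 0.1, "g:dortmund": 0.3}
SLOT_WEIGHTS = {1360007: 4, 1360006: 2, 1360005: 1}
MIN_EFFECT = 0.2


def effective_weights(effect: dict[str, float], slot_weights: dict[int, float],
                      min_effect: float) -> dict[str, float]:
    """Weight per key = slot weight * context effect; weak contexts are dropped."""
    return {
        f"{context}:{slot}": weight * size
        for context, size in effect.items() if size >= min_effect
        for slot, weight in slot_weights.items()
    }


def weighted_union(sets: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> dict[str, float]:
    """Sum weight * score per member (like ZUNION ... WEIGHTS, AGGREGATE SUM)."""
    totals: dict[str, float] = {}
    for key, weight in weights.items():
        for member, score in sets.get(key, {}).items():
            totals[member] = totals.get(member, 0.0) + weight * score
    return totals


weights = effective_weights(EFFECT, SLOT_WEIGHTS, MIN_EFFECT)
# Made-up hit counts for the current slot of each context.
hits = {
    "p:welt.de:1360007": {"berlin_wins": 10},
    "w:sunday:1360007": {"dortmund_wins": 100},  # ignored: weekday effect is below the cut-off
    "g:dortmund:1360007": {"dortmund_wins": 20},
}
scores = weighted_union(hits, weights)
print({m: round(s, 2) for m, s in scores.items()})  # {'berlin_wins': 28.0, 'dortmund_wins': 24.0}
```

With the cut-off set to 0, the weekday set would add 4 x 0.1 x 100 = 40 to dortmund_wins and push it ahead, even though the weekday barely influences clicks.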

SUM over...
● timeseries
● different contexts
● previous hits of the user
● similar publishers
→ knowledge

publisher = welt.de
  berlin_wins      689
  plista_company   135

Σ ZUNION ... WEIGHTS p:welt.de:1360007 4, p:welt.de:1360006 2, p:welt.de:1360005 1

... Redis can do it ;)

Even more Matrix Operations ;)

● Similarity Matrix

● Human Control Matrix

● Meta-learning Matrix
  ○ cooperation with
  ○ aided by

∏Σ

More recommenders possible

This was only about "most popular".

● other algorithms using Redis
  ○ incremental collaborative filtering
  ○ article-to-article paths (~graph)
  ○ ... using external data sources
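The talk only names article-to-article paths. As an illustration of how such a graph could live in sorted sets, and not as plista's actual method, the sketch below keeps one sorted set of successors per article (a hypothetical key such as next:<article>), incremented like ZINCRBY whenever a reader moves from one article to the next; the highest-scored successors become the recommendations.

```python
from collections import defaultdict

# Stand-in for one sorted set per article, e.g. key next:<article> -> {successor: count}.
edges: dict[str, dict[str, float]] = defaultdict(dict)


def record_session(articles: list[str]) -> None:
    """For each consecutive pair A -> B in a reading session: ZINCRBY next:A 1 B."""
    for src, dst in zip(articles, articles[1:]):
        edges[src][dst] = edges[src].get(dst, 0.0) + 1


def next_articles(article: str, limit: int) -> list[str]:
    """Most frequent successors first (like ZREVRANGEBYSCORE ... LIMIT 0 <limit>)."""
    ranked = sorted(edges.get(article, {}).items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [dst for dst, _ in ranked[:limit]]


record_session(["berlin_wins", "summer_is_coming", "plista_company"])
record_session(["berlin_wins", "summer_is_coming"])
record_session(["berlin_wins", "plista_company"])

print(next_articles("berlin_wins", 2))  # ['summer_is_coming', 'plista_company']
```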

How to scale a recommender?




Distribution to many servers
● 1 client to access n servers
● partitioning of data using hashing
● for UNION we run into problems
  ○ combined keys need to be on the same server
  ○ NO consistent hashing over whole keys possible
  ○ workaround: prefix hashing
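Prefix hashing can be sketched as routing a key by hashing only its prefix, so that all time slots of one context land on the same server and can be unioned there. Below, the prefix is everything before the last colon (an assumption based on the p:welt.de:<slot> key scheme), the server names are hypothetical, and plain modulo hashing stands in for whatever the real client used. Redis Cluster, released after this talk, offers the same idea as hash tags: only the part of a key inside {...} is hashed.

```python
import hashlib

SERVERS = ("redis-1", "redis-2", "redis-3")  # hypothetical shard names


def shard_prefix(key: str) -> str:
    """'p:welt.de:1360007' -> 'p:welt.de' (drop the trailing time-slot part)."""
    return key.rsplit(":", 1)[0]


def server_for(key: str, servers: tuple[str, ...] = SERVERS) -> str:
    """Pick a server from a stable hash of the key's prefix (simple modulo sharding).

    hashlib instead of hash(): Python salts str hashes per process, so
    hash() would route the same key differently on every web server.
    """
    digest = hashlib.sha256(shard_prefix(key).encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


slot_keys = ["p:welt.de:1360007", "p:welt.de:1360006", "p:welt.de:1360005"]
print({server_for(k) for k in slot_keys})  # one server: the three slots can be unioned locally
```

The limit matches the slide's warning: keys of different contexts (p:..., w:..., g:...) still hash to different servers, so a union across contexts needs a shared prefix or has to be merged in the client.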


Low Latency
● master/slave replication
● should be close to edge servers
● e.g. 1 Redis instance per webserver

Image source: http://en.wikipedia.org/wiki/Flash_(comics)

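The low-latency setup (writes go to the master, reads go to a replica on the same machine as the web server) can be sketched with in-memory stand-ins. The class names below are hypothetical; the explicit `sync()` call imitates Redis's asynchronous master/replica replication, which is why a read right after a write may still see the old value.

```python
from typing import Optional


class Node:
    """In-memory stand-in for one Redis instance (hypothetical)."""

    def __init__(self) -> None:
        self.data: dict[str, dict[str, float]] = {}


class LocalReplicaRouter:
    """Writes go to the central master; reads go to the replica next to the web server."""

    def __init__(self, master: Node, local_replica: Node) -> None:
        self.master = master
        self.replica = local_replica

    def zincrby(self, key: str, increment: float, member: str) -> None:
        zset = self.master.data.setdefault(key, {})
        zset[member] = zset.get(member, 0.0) + increment

    def zscore(self, key: str, member: str) -> Optional[float]:
        return self.replica.data.get(key, {}).get(member)

    def sync(self) -> None:
        """Imitate replication catching up: copy the master's data to the replica."""
        self.replica.data = {k: dict(v) for k, v in self.master.data.items()}


router = LocalReplicaRouter(master=Node(), local_replica=Node())
router.zincrby("p:welt.de", 1, "berlin_wins")
print(router.zscore("p:welt.de", "berlin_wins"))  # None: the replica has not caught up yet
router.sync()
print(router.zscore("p:welt.de", "berlin_wins"))  # 1.0
```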


Application in Database
● Lua support is shipped
● but Redis is a single-core process
● a long read blocks all writes
● concurrency issue

Image source: http://lua.org


In spite of all those disadvantages:
● Redis is a perfect fit for simple operations
  ○ AGGREGATE with SUM, MIN, MAX
● in-memory operations are pretty fast
● real-time features feel better in a real-time database (e.g. time series)
● we don't need batch

What else in Redis?

● message bus● many recommenders● live statistics● caching

"One technology to rule them all"

Questions?

www.plista.com

[email protected]

@torbenbrodt

xing.com/profile/Torben_Brodt

http://goo.gl/pvXm5

http://lnkd.in/MUXXuv



