Realtime Recommendationsnosqlroadshow.com/dl/NoSQL-Berlin-2013/GOTO/GOTO... · plista_company 135...

Realtime Recommendations with Redis Torben Brodt plista GmbH April 25th, 2013 NoSQL Search Roadshow http://nosqlroadshow.com/nosql-berlin-2013/


Realtime Recommendations with Redis

Torben Brodt, plista GmbH

April 25th, 2013

NoSQL Search Roadshow, http://nosqlroadshow.com/nosql-berlin-2013/


Introduction

● Torben Brodt, Head of Data Engineering
  ○ computer science studies
  ○ 5 years at plista
  ○ publication "collaborative filtering"
  ○ evangelist for the "power of algorithms"

● plista GmbH
  ○ recommendations & advertising
  ○ founded in 2008, Berlin [DE]
  ○ ~5k recommendations per second


Contents

1. How to feed a recommender?

2. How to build a recommendation?

3. How to scale a recommender?


How to feed a recommender?

● to show recommendations, we are integrated into the website

● with each request we get the URL + HTTP headers
  ○ user agent
  ○ IP address -> geolocation
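To make the "URL + HTTP headers" bullet concrete, here is a minimal Python sketch that turns one page view into the context attributes used later in the talk (publisher, geolocation, ...). The function names, the `GEO_TABLE` lookup and its IP prefixes (RFC 5737 documentation ranges) are hypothetical; a production system would query a real GeoIP database instead.

```python
from urllib.parse import urlsplit

# Hypothetical IP-prefix -> city table; a real system would query a GeoIP database.
GEO_TABLE = {"192.0.2.": "dortmund", "198.51.100.": "berlin"}


def lookup_geo(ip: str) -> str:
    """Map an IP address to a city via the toy prefix table above."""
    for prefix, city in GEO_TABLE.items():
        if ip.startswith(prefix):
            return city
    return "unknown"


def extract_context(url: str, headers: dict, remote_ip: str) -> dict:
    """Turn one page view (URL, HTTP headers, client IP) into context attributes."""
    # The slides write URLs without a scheme; add one so urlsplit finds the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    article = parts.path.rsplit("/", 1)[-1].removesuffix(".html")
    return {
        "publisher": parts.hostname or "",
        "article": article,
        "user_agent": headers.get("User-Agent", ""),
        "geolocation": lookup_geo(remote_ip),
    }


ctx = extract_context(
    "welt.de/football/berlin_wins.html",
    {"User-Agent": "Mozilla/5.0"},
    "192.0.2.7",
)
print(ctx)
```

Note that a plain dict is case-sensitive, while HTTP header names are not; a real web framework usually hands over a case-insensitive mapping.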


How to feed a recommender?

● push the data away quickly
● make use of the data quickly

RULE: be quick

src http://en.wikipedia.org/wiki/Pac-Man


Technology overview

● Apache Lucene for Content
● MySQL for relational data
● Machine Learning
  ○ Hadoop? No! It's batch + slow
  ○ In Memory? Yes, stream computing
● Redis for Statistics
  ○ Live
  ○ Backup


How to build a recommendation?

Classification

● different recommender families

● Behavioral: based on the interaction between user and article
  ○ Most Popular
  ○ Collaborative Filtering
  ○ Item to Item

● Content: based on the articles
  ○ Content Similarity
  ○ Latest Item


Most popular with Redis

welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de" 1 berlin_wins
● ZREVRANGEBYSCORE "p:welt.de" +inf -inf

p:welt.de
  berlin_wins        689 (+1)
  summer_is_coming   420
  plista_company     135

Live Read + Live Write = Real Time Recommendations
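The two commands on this slide can be modeled without a server. The Python sketch below is an in-memory stand-in (the class `SortedSets` is hypothetical) that mimics `ZINCRBY` and `ZREVRANGEBYSCORE ... WITHSCORES LIMIT`, including Redis's reverse-lexicographic order for equal scores. Against a real server, the same calls in redis-py would be `r.zincrby("p:welt.de", 1, "berlin_wins")` and `r.zrevrangebyscore("p:welt.de", "+inf", "-inf", start=0, num=3, withscores=True)`.

```python
class SortedSets:
    """In-memory stand-in for the two Redis sorted-set commands on the slide."""

    def __init__(self):
        self.data = {}  # key -> {member: score}

    def zincrby(self, key, increment, member):
        # ZINCRBY: a missing key or member starts at 0, then the increment is added.
        zset = self.data.setdefault(key, {})
        zset[member] = zset.get(member, 0) + increment
        return zset[member]

    def zrevrangebyscore(self, key, max_score=float("inf"), min_score=float("-inf"),
                         offset=0, count=None):
        # ZREVRANGEBYSCORE ... WITHSCORES LIMIT offset count:
        # highest score first, equal scores in reverse lexicographic order.
        items = [(m, s) for m, s in self.data.get(key, {}).items()
                 if min_score <= s <= max_score]
        items.sort(key=lambda ms: (ms[1], ms[0]), reverse=True)
        end = None if count is None else offset + count
        return items[offset:end]


store = SortedSets()
# Seed the counters from the slide (berlin_wins before its +1).
for member, score in [("berlin_wins", 688), ("summer_is_coming", 420), ("plista_company", 135)]:
    store.zincrby("p:welt.de", score, member)

# Live write: one view of welt.de/football/berlin_wins.html
store.zincrby("p:welt.de", 1, "berlin_wins")

# Live read: the two most popular articles for this publisher
print(store.zrevrangebyscore("p:welt.de", count=2))
# [('berlin_wins', 689), ('summer_is_coming', 420)]
```

The write and the read touch the same key, which is what makes "live read + live write" work: every page view is visible to the very next recommendation request.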


Recap Data types

● Strings, Lists, Sets, ...
● Hash
  ○ map between string fields and string values, very fast
  ○ HINCRBY complexity: O(1)
● Sorted Set
  ○ ZINCRBY complexity: O(log(N)), where N is the number of elements in the sorted set
  ○ allows limiting the number of results: ZREVRANGEBYSCORE ... LIMIT
  ○ union + intersection: ZUNIONSTORE, ZINTERSTORE

p:welt.de
  berlin_wins        689 (+1)
  summer_is_coming   420
  plista_company     135


Most popular with timeseries

welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNIONSTORE over
  ○ "p:welt.de:1360007000"
  ○ "p:welt.de:1360006000"
  ○ "p:welt.de:1360005000"
● ZREVRANGEBYSCORE

p:welt.de:1360005000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

p:welt.de:1360006000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

p:welt.de:1360007000
  berlin_wins           689
  summer_is_coming      420
  plista_best_company   135
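The write side only has to pick the bucket key for the current timestamp; the read side sums the buckets. A Python sketch under stated assumptions: the 1000-second bucket width is inferred from the example keys (...7000, ...6000, ...5000), the helper names are hypothetical, `zunion_sum` mimics `ZUNIONSTORE` with no weights and its default `AGGREGATE SUM`, and the scores are the ones on the slide.

```python
def bucket_key(publisher, ts, width=1000):
    """Key of the time bucket a Unix timestamp falls into, e.g. p:welt.de:1360007000."""
    return f"p:{publisher}:{ts - ts % width}"


def zunion_sum(zsets):
    """Like ZUNIONSTORE without WEIGHTS: scores of the same member are added up."""
    out = {}
    for zset in zsets:
        for member, score in zset.items():
            out[member] = out.get(member, 0) + score
    return out


def top(zset, n):
    """Highest scores first, like ZREVRANGEBYSCORE ... LIMIT 0 n."""
    return sorted(zset.items(), key=lambda ms: (ms[1], ms[0]), reverse=True)[:n]


# Bucket contents as shown on the slide
buckets = {
    "p:welt.de:1360007000": {"berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135},
    "p:welt.de:1360006000": {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689},
    "p:welt.de:1360005000": {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689},
}

# A page view at 1360007421 is counted in the newest bucket.
print(bucket_key("welt.de", 1360007421))  # p:welt.de:1360007000
print(top(zunion_sum(buckets.values()), 3))
# [('berlin_wins', 1529), ('plista_best_company', 1513), ('summer_is_coming', 690)]
```

Without weights, plista_best_company, which was popular mainly in the older buckets, ends up almost tied with the currently trending berlin_wins.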


Most popular with timeseries

welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNIONSTORE ... WEIGHTS
  ○ "p:welt.de:1360007000" .. weight 4
  ○ "p:welt.de:1360006000" .. weight 2
  ○ "p:welt.de:1360005000" .. weight 1
● ZREVRANGEBYSCORE

p:welt.de:1360005000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

p:welt.de:1360006000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

p:welt.de:1360007000
  berlin_wins           689
  summer_is_coming      420
  plista_best_company   135
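A worked version of the weighted union on this slide, as a Python sketch: `zunion_weighted` (a hypothetical helper) multiplies every bucket's scores by its weight before summing, which is what `ZUNIONSTORE ... WEIGHTS 4 2 1` does with the default `AGGREGATE SUM`. With redis-py the call would be `r.zunionstore("p:welt.de:recent", {"p:welt.de:1360007000": 4, "p:welt.de:1360006000": 2, "p:welt.de:1360005000": 1})`, where the destination key name is made up for illustration.

```python
def zunion_weighted(weighted_zsets):
    """Like ZUNIONSTORE ... WEIGHTS w1 w2 ... with the default AGGREGATE SUM."""
    out = {}
    for zset, weight in weighted_zsets:
        for member, score in zset.items():
            out[member] = out.get(member, 0) + weight * score
    return out


newest = {"berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135}  # :1360007000
middle = {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689}  # :1360006000
oldest = {"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689}  # :1360005000

scores = zunion_weighted([(newest, 4), (middle, 2), (oldest, 1)])
ranking = sorted(scores.items(), key=lambda ms: (ms[1], ms[0]), reverse=True)
print(ranking)
# [('berlin_wins', 4016), ('plista_best_company', 2607), ('summer_is_coming', 2085)]
```

Unweighted, berlin_wins leads plista_best_company by only 16 points (1529 vs 1513); with the weights the lead grows to 1409 (4016 vs 2607), because recent views count more.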


Most popular with timeseries

[Figure: time buckets keyed by timestamp (e.g. :1360007000), reaching back from now to -1h, -2h, ..., -8h]
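A sketch of how the key-to-weight map for the last n buckets could be generated. The halving scheme (..., 4, 2, 1) is extrapolated from the weights on the previous slide and is an assumption, as is the hypothetical function name; `width=1000` reproduces the example keys, while `width=3600` gives hourly buckets as the chart's -1h ... -8h axis suggests. The returned dict has the shape redis-py's `zunionstore(dest, keys)` accepts for weighted keys.

```python
def recent_bucket_weights(prefix, now, n, width=1000):
    """Map the n most recent bucket keys to weights; newest gets 2**(n-1), oldest gets 1."""
    newest = now - now % width
    return {f"{prefix}:{newest - i * width}": 2 ** (n - 1 - i) for i in range(n)}


print(recent_bucket_weights("p:welt.de", 1360007421, 3))
# {'p:welt.de:1360007000': 4, 'p:welt.de:1360006000': 2, 'p:welt.de:1360005000': 1}

# Eight hourly buckets, as on the chart's -1h ... -8h axis
hourly = recent_bucket_weights("p:welt.de", 1360007421, 8, width=3600)
```

Halving keeps old buckets from ever dominating, and the bucket count bounds how many keys one union has to read.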


Most popular to any context

● it's not only the publisher: we use ~50 context attributes

context attributes:
● publisher
● weekday
● geolocation
● demographics
● ...

publisher = welt.de
  berlin_wins        689 (+1)
  summer_is_coming   420
  plista_company     135

weekday = sunday
  berlin_wins        400 (+1)
  dortmund_wins      200
  ...                100

geolocation = dortmund
  dortmund_wins      200
  berlin_wins         10 (+1)
  ...                  5
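Each page view therefore becomes one increment per context attribute. A Python sketch of that fan-out, using a plain dict as a stand-in for Redis; the `p:` / `w:` / `g:` prefixes come from the next slide, while the function names and the 1000-second bucket are assumptions. With redis-py the loop body would become `pipe.zincrby(key, 1, article)` on a `r.pipeline(transaction=False)`, followed by a single `pipe.execute()`, so ~50 increments cost one network round trip.

```python
# Key prefixes as on the following slide; other attributes would need their own.
PREFIXES = {"publisher": "p", "weekday": "w", "geolocation": "g"}


def context_keys(context, ts, width=1000):
    """One time-bucketed sorted-set key per known context attribute."""
    bucket = ts - ts % width
    return [f"{PREFIXES[attr]}:{value}:{bucket}"
            for attr, value in context.items() if attr in PREFIXES]


def record_view(store, context, article, ts):
    """Fan one page view out into a +1 on every matching context sorted set."""
    for key in context_keys(context, ts):
        zset = store.setdefault(key, {})
        zset[article] = zset.get(article, 0) + 1


store = {}  # key -> {member: score}, standing in for Redis
record_view(store, {"publisher": "welt.de", "weekday": "sunday", "geolocation": "dortmund"},
            "berlin_wins", 1360007421)
print(sorted(store))
# ['g:dortmund:1360007000', 'p:welt.de:1360007000', 'w:sunday:1360007000']
```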


Most popular to any context

ZUNIONSTORE ... WEIGHTS
  p:welt.de:1360007    4
  p:welt.de:1360006    2
  p:welt.de:1360005    1
  w:sunday:1360007     4
  w:sunday:1360006     2
  w:sunday:1360005     1
  g:dortmund:1360007   4
  g:dortmund:1360006   2
  g:dortmund:1360005   1

● what it looks like in Redis:

publisher = welt.de
  berlin_wins        689 (+1)
  summer_is_coming   420
  plista_company     135

weekday = sunday
  berlin_wins        400
  dortmund_wins      200
  ...                100

geolocation = dortmund
  dortmund_wins      200
  berlin_wins         10
  ...                  5


Most popular with Effect size

ZUNIONSTORE ... WEIGHTS (time weight × effect size)
  p:welt.de:1360007    4 × 70%
  p:welt.de:1360006    2 × 70%
  p:welt.de:1360005    1 × 70%
  w:sunday:1360007     4 × 10%
  w:sunday:1360006     2 × 10%
  w:sunday:1360005     1 × 10%
  g:dortmund:1360007   4 × 30%
  g:dortmund:1360006   2 × 30%
  g:dortmund:1360005   1 × 30%

Effect Size

● which context has an influence?

Examples: small effect: weather; big effect: publisher

Data with a small effect should not be taken into account; otherwise we get average results.
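A Python sketch of how the final `WEIGHTS` values could be assembled: each key's time weight is multiplied by its context's effect size, and contexts below a cutoff are left out entirely. The effect sizes are read row by row from the slide; how plista estimated them is not stated, and the 0.2 cutoff and the function name are hypothetical. The resulting dict could be passed to redis-py as `r.zunionstore(dest, weights)`.

```python
TIME_WEIGHTS = [4, 2, 1]  # newest bucket first, as on the slides
EFFECT_SIZE = {"p": 0.7, "w": 0.1, "g": 0.3}  # publisher, weekday, geolocation


def union_weights(contexts, buckets, min_effect=0.2):
    """Key -> WEIGHTS value for ZUNIONSTORE: time weight times effect size.

    Contexts whose effect size is below min_effect (a hypothetical cutoff)
    are dropped, so weak signals do not pull the ranking toward the average.
    """
    weights = {}
    for prefix, value in contexts.items():
        effect = EFFECT_SIZE[prefix]
        if effect < min_effect:
            continue
        for bucket, time_weight in zip(buckets, TIME_WEIGHTS):
            weights[f"{prefix}:{value}:{bucket}"] = time_weight * effect
    return weights


w = union_weights({"p": "welt.de", "w": "sunday", "g": "dortmund"},
                  [1360007, 1360006, 1360005])
for key, weight in w.items():
    print(f"{key:20} {weight:.1f}")
# p:welt.de:* -> 2.8, 1.4, 0.7; g:dortmund:* -> 1.2, 0.6, 0.3; w:sunday:* dropped
```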


SUM over ...

● time series
● different contexts
● previous hits of the user
● similar publishers

knowledge

publisher = welt.de
  berlin_wins        689
  summer_is_coming   420
  plista_company     135

Σ ZUNIONSTORE ... WEIGHTS
  p:welt.de:1360007    4
  p:welt.de:1360006    2
  p:welt.de:1360005    1
  w:sunday:1360007     4
  w:sunday:1360006     2
  w:sunday:1360005     1
  g:dortmund:1360007   4
  g:dortmund:1360006   2
  g:dortmund:1360005   1

... Redis can do it ;)


Even more Matrix Operations ;)

● Similarity Matrix
● Human Control Matrix
● Meta-learning Matrix
  ○ cooperation with
  ○ aided by

∏Σ


More recommenders possible

● this was only about Most Popular
● other algorithms using Redis
  ○ incremental collaborative filtering
  ○ article-to-article paths (~graph)
  ○ ... using external data sources

How to scale a recommender?

Distribution to many servers
● 1 client to access n servers
● partitioning of data using hashing
● for UNION we run into problems
  ○ keys combined in one ZUNIONSTORE need to be on the same server
  ○ no consistent hashing over the full key possible
  ○ workaround: prefix hashing
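A minimal Python sketch of prefix hashing, assuming the prefix is everything before the last colon, so the time-bucket suffix is ignored when choosing a server. CRC32 and the function name are arbitrary illustrative choices, not necessarily what plista used. This only co-locates the buckets of one context; a union across `p:`, `w:` and `g:` keys would need a shared prefix too, and the slide does not say how that was handled. Redis Cluster, added later in Redis 3.0, builds the same idea in as hash tags: only the part of a key inside `{...}` is hashed.

```python
import zlib


def shard_for(key, n_servers):
    """Choose a server from the key prefix only (everything before the last ':').

    All time buckets of one context (p:welt.de:1360007000, p:welt.de:1360006000, ...)
    then land on the same server, so ZUNIONSTORE can combine them there.
    """
    prefix = key.rsplit(":", 1)[0]
    return zlib.crc32(prefix.encode("utf-8")) % n_servers


keys = [f"p:welt.de:{ts}" for ts in (1360007000, 1360006000, 1360005000)]
print({key: shard_for(key, 8) for key in keys})  # same server number for all three
```

The trade-off: a very busy prefix (a large publisher) puts all of its load on a single server.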


How to scale a recommender?

Low Latency
● master/slave replication
● replicas should be close to the edge servers
● e.g. 1 Redis instance per webserver

src http://en.wikipedia.org/wiki/Flash_(comics)
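With a replica on every webserver, reads can stay local while writes still go to the single master. A Python sketch of that routing; in production `master` and `local_replica` would be two redis-py clients (e.g. `redis.Redis(host="localhost")` for the replica, with a site-specific master host). The class names are hypothetical, and `RecordingClient` exists only so the sketch runs without a server.

```python
class ReplicatedStore:
    """Writes go to the master, reads to the replica running next to this webserver."""

    def __init__(self, master, local_replica):
        self.master = master
        self.local_replica = local_replica

    def record_view(self, key, article):
        self.master.zincrby(key, 1, article)

    def top(self, key, n):
        return self.local_replica.zrevrangebyscore(
            key, "+inf", "-inf", start=0, num=n, withscores=True)


class RecordingClient:
    """Stand-in for a redis-py client that just records the calls it receives."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def zincrby(self, *args):
        self.calls.append(("zincrby",) + args)

    def zrevrangebyscore(self, *args, **kwargs):
        self.calls.append(("zrevrangebyscore",) + args)
        return []


master, replica = RecordingClient("master"), RecordingClient("localhost")
store = ReplicatedStore(master, replica)
store.record_view("p:welt.de:1360007000", "berlin_wins")
store.top("p:welt.de:1360007000", 3)
print(master.calls)   # [('zincrby', 'p:welt.de:1360007000', 1, 'berlin_wins')]
print(replica.calls)  # [('zrevrangebyscore', 'p:welt.de:1360007000', '+inf', '-inf')]
```

Replication is asynchronous, so a local read may lag the master slightly; for popularity counts that is usually an acceptable price for avoiding a network hop on every recommendation.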


How to scale a recommender?

Application in Database
● Lua support is shipped
● but Redis runs as a single-threaded process
● a long read blocks all writes
● concurrency issue

src http://lua.org


How to scale a recommender?

in spite of all those disadvantages
● Redis is a perfect fit for simple operations
  ○ AGGREGATE SUM / MIN / MAX
● in-memory operations are pretty fast
● real-time features feel better in a real-time database (e.g. time series)
● we don't need batch


What else in Redis?

● message bus
● many recommenders
● live statistics
● caching

"One technology to rule them all"