Redis
gleicon
Redis
Case studies
Redis
- Key/Value
- Async I/O
- Very fast (most ops take O(1))
- Active development (VM, APPEND Only datafile, HASH type)
- Values can be data types: LISTS, SETS, ORDERED SETS (http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes)
- One step further than memcached, same intuitive applications and patterns
Redis
- Why?
- RestMQ
- Brief MongoDB/Redis recap
- Information Retrieval: using SETs to search books
Why? Another Key/Value?
Redis is a key/value store, but it offers several data types.
These datatypes are the building blocks of more complex stuff you use already.
Think LISTs, SETs, Ordered SETs, and methods to deal with them as you would do with a good standard library.
Also, different persistence strategies, replication, locks, increments...
RestMQ
RestMQ is an HTTP/REST/JSON based message queue.
- HTTP as transport protocol
- REST as a way to organize resources
- JSON as data exchange format
- Built initially to mimic Amazon's SQS functionality at GAE (http://jsonqueue.appspot.com)
- Standalone server, uses Python, Cyclone, Twisted and Redis.
- COMET consumer (bind your http client and get objects)
RestMQ and Redis
For each queue <<q>> in Redis:
- q:uuid - The queue unique id counter
- q:queue - The queue LIST (fifo)
- q:control - Queue pause control
- q:<<id>> - objects in queue.
Global:
QUEUESET: SET containing all queues
Also: Persistence, statistics
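The key layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeRedis`, `enqueue` and `dequeue` are hypothetical names, the in-memory class stands in for a real Redis connection, and the order of operations is a guess at how such a queue could work rather than RestMQ's actual code. The q:control pause flag is left out.

```python
import json
from collections import defaultdict, deque


class FakeRedis:
    """Hypothetical in-memory stand-in for the few Redis commands used below."""

    def __init__(self):
        self.strings = {}
        self.lists = defaultdict(deque)
        self.sets = defaultdict(set)

    def incr(self, key):
        self.strings[key] = int(self.strings.get(key, 0)) + 1
        return self.strings[key]

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def sadd(self, key, member):
        self.sets[key].add(member)

    def rpush(self, key, value):
        self.lists[key].append(value)

    def lpop(self, key):
        return self.lists[key].popleft() if self.lists[key] else None


def enqueue(r, q, obj):
    """Store obj under q:<id> and append that key to the q:queue LIST."""
    r.sadd("QUEUESET", q)          # register the queue in the global SET
    uid = r.incr(f"{q}:uuid")      # next id from the per-queue counter
    key = f"{q}:{uid}"
    r.set(key, json.dumps(obj))    # the object itself, JSON encoded
    r.rpush(f"{q}:queue", key)     # FIFO: push at the tail
    return key


def dequeue(r, q):
    """Pop the oldest key from q:queue and return the decoded object."""
    key = r.lpop(f"{q}:queue")     # FIFO: pop from the head
    if key is None:
        return None
    return json.loads(r.get(key))
```

With a real server, the same calls (INCR, SET, GET, SADD, RPUSH, LPOP) would go through a Redis client instead of the stand-in class.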
An async/sharding Redis client
Original python clients:
- redis.py: Synchronous
- txredis: Incomplete
Needed:
Async client, with connection pool and sharding (well, sharding is a plus).
http://github.com/fiorix/txredisapi
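One common way to shard keys across several Redis servers is consistent hashing. The sketch below is a generic illustration, not txredisapi's implementation: the `HashRing` class, the node names and the `replicas` parameter are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to server names."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node gets `replicas` virtual points so keys spread evenly.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first point clockwise from the key."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The point of the ring is that adding a server only moves the keys that the new server takes over; every other key stays where it was.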
Web app framework
The original RestMQ was twisted.web based. Cool, but too much work.
http://github.com/fiorix/cyclone
A Twisted-based Tornado clone. COMET is a breeze, lots of web framework stuff, JSON encode/decode support built in.
Integrates easily with txredisapi. The core queue protocol was ported and extended from the GAE version.
RestMQ
- COMET consumer
- REST producer/consumer
- JSON based producer/consumer
- COMET is pausable (start/stop control)
- HTTP based. Even CURL can operate a MQ now.
- Asynchronous I/O
- Map/Reduce and Actors are a given (easy to implement, example shipped)
http://github.com/gleicon/restmq
Brief MongoDB/Redis recap - Books
MongoDB
{ 'id': 1, 'title' : 'Diving into Python', 'author': 'Mark Pilgrim', 'tags': ['python','programming', 'computing'] }
{ 'id': 2, 'title' : 'Programming Erlang', 'author': 'Joe Armstrong', 'tags': ['erlang','programming', 'computing', 'distributedcomputing', 'FP'] }
{ 'id': 3, 'title' : 'Programming in Haskell', 'author': 'Graham Hutton', 'tags': ['haskell','programming', 'computing', 'FP'] }
Redis
SET book:1 "{'title' : 'Diving into Python', 'author': 'Mark Pilgrim'}"
SET book:2 "{'title' : 'Programming Erlang', 'author': 'Joe Armstrong'}"
SET book:3 "{'title' : 'Programming in Haskell', 'author': 'Graham Hutton'}"
SADD tag:python 1
SADD tag:erlang 2
SADD tag:haskell 3
SADD tag:programming 1 2 3
SADD tag:computing 1 2 3
SADD tag:distributedcomputing 2
SADD tag:FP 2 3
Brief MongoDB/Redis recap - Books
MongoDB
Search tags for erlang or haskell:
db.books.find({"tags": { $in: ['erlang', 'haskell'] }})
Search tags for erlang AND haskell (no results):
db.books.find({"tags": { $all: ['erlang', 'haskell'] }})
This search yields 3 results:
db.books.find({"tags": { $all: ['programming', 'computing'] }})
Redis
SINTER 'tag:erlang' 'tag:haskell'
0 results
SINTER 'tag:programming' 'tag:computing'
3 results: 1, 2, 3
SUNION 'tag:erlang' 'tag:haskell'
2 results: 2 and 3
SDIFF 'tag:programming' 'tag:haskell'
2 results: 1 and 2 (haskell is excluded)
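The same searches can be checked without a Redis server, since SINTER, SUNION and SDIFF behave like intersection, union and difference on plain sets. A minimal sketch using Python's built-in `set`; the `tags` dict mirrors the SADD commands from the recap, and the helper names `sinter`, `sunion` and `sdiff` are made up for this example:

```python
# Tag name -> set of book ids, mirroring the SADD commands above.
tags = {
    "python": {1},
    "erlang": {2},
    "haskell": {3},
    "programming": {1, 2, 3},
    "computing": {1, 2, 3},
    "distributedcomputing": {2},
    "FP": {2, 3},
}


def sinter(*names):
    """Ids present in every named tag (like SINTER)."""
    return set.intersection(*(tags[n] for n in names))


def sunion(*names):
    """Ids present in at least one named tag (like SUNION)."""
    return set.union(*(tags[n] for n in names))


def sdiff(first, *rest):
    """Ids in the first tag that are in none of the others (like SDIFF)."""
    return tags[first].difference(*(tags[n] for n in rest))


print(sinter("erlang", "haskell"))         # empty set
print(sinter("programming", "computing"))  # books 1, 2 and 3
print(sunion("erlang", "haskell"))         # books 2 and 3
print(sdiff("programming", "haskell"))     # books 1 and 2
```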
DOCDB
http://github.com/gleicon/docdb
Almost a document database.
eBook indexing - Basic IR procedure
- tokenize (split) each word
- take the stop words out
- stemming
- group words to make composed searches possible
Lots of word SETs, but as more documents are stored, the growth rate slows.
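The first three steps of the procedure can be sketched as follows. Everything here is an assumption made for illustration: the stop word list is tiny, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and a plain dict of sets plays the role of the Redis SETs. It is not the code DOCDB uses, and the word grouping step is not covered.

```python
import re

# Assumption: a tiny illustrative stop word list, not a real one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {naive_stem(t) for t in tokens if t not in STOP_WORDS}


def index_document(index, doc_id, text):
    """Add doc_id to one set per stem (one SADD per stem in Redis terms)."""
    for stem in stems(text):
        index.setdefault(stem, set()).add(doc_id)


index = {}
index_document(index, 1, "The cats are sleeping")
index_document(index, 2, "A cat sleeps")
```

After these two calls, the sets for "cat" and "sleep" both hold ids 1 and 2, so no new set is created for words the index has already seen.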
DOCDB
Simulation of how many word SETs would be created per book:
$ python doc_to_sets.py books/10702.txt
5965
$ python doc_to_sets.py books/13437-8.txt
6125
$ python doc_to_sets.py books/2346.txt
1920
$ python doc_to_sets.py books/24022.txt
3470
$ python doc_to_sets.py books/advsh12.txt
5576
DOCDB
Simulation of how many word SETs would be created per book, accumulating the result:
$ python doc_to_sets.py books/10702.txt books/13437-8.txt books/2346.txt books/24022.txt books/advsh12.txt
5965
9183
9426
10030
11400
That would mean 11400 SETs in Redis, each named after the stem of a word and containing the IDs of the books that contain that word. The first document creates 5965 new sets (starting from none), while the last document adds only 1370.
The search would then use SINTER, SUNION and SDIFF as shown before, to find books by words.
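To see how many new sets each book contributes, the accumulated totals can be turned into per-document increments. The sketch below reads the accumulated output as the five totals 5965, 9183, 9426, 10030 and 11400 (an interpretation that matches the 1370 figure quoted above). The function names are made up for this example, and `cumulative_set_counts` uses a bare word split rather than whatever doc_to_sets.py does.

```python
import re


def cumulative_set_counts(documents):
    """Number of distinct word sets after each document is added."""
    seen = set()
    counts = []
    for text in documents:
        seen.update(re.findall(r"[a-z]+", text.lower()))
        counts.append(len(seen))
    return counts


def new_sets_per_document(totals):
    """Turn running totals into the number of sets each document added."""
    previous = 0
    added = []
    for total in totals:
        added.append(total - previous)
        previous = total
    return added


# Running totals as read from the accumulated simulation output.
totals = [5965, 9183, 9426, 10030, 11400]
print(new_sets_per_document(totals))
```

Each later book adds fewer sets than it would create on its own, because many of its words already have a set.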
The End
- Check the project's website: http://code.google.com/p/redis/
- Python/Twisted driver: http://github.com/fiorix/txredisapi (connection pool, consistent hashing)
- No silver bullet
- Plan ahead, use IR techniques
- Own your data
- SETs and LISTs are building blocks for most operations regarding indexes. Use them.
- http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes - Intro to Redis data types
- More about its features: http://code.google.com/p/redis/wiki/Features
- http://code.google.com/p/redis/wiki/TwitterAlikeExample - Twitter clone using Redis