Sharding Redis at Flite

SHARDING REDIS at FLITE

Eugene Feingold & Chakri Kodali

WHAT IS FLITE?

•  Display advertising platform
•  Create and publish rich, interactive ads
•  Serve ad impressions
•  Collect and analyze metrics, batch AND realtime, with Redis

REALTIME YOU SAY?

•  Realtime monitoring of ad performance
•  Debugging of ads, with instant feedback on triggered events

METRICS ARCHITECTURE

[Diagram: seven ads, three JVMs, and two node.js processes. Label: "pipelined write of ~30 items".]

WRITING TO REDIS

Persistence: Up to 4 hours for most data

Amount: ~30 writes per event, pipelined

Granularity: By second, minute, hour

Write Types:
•  HSET - JSON blobs (Event body data)
•  LPUSH - Lists of events (Events by session)
•  HINCRBY - Simple counters (Events by ad)
•  ZINCRBY - Sorted set counters (To retrieve “Top 100”)
•  PUBLISH (pub/sub) - Stream event data (Debugger)
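The granularity line suggests that each counter is written once per time bucket. Below is a minimal sketch of how such bucketed key names could be derived. The key layout and the `TimeBuckets` class are hypothetical and not taken from Flite's code.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives one key per granularity (second, minute, hour).
class TimeBuckets {
    // Assumed key layout: "<prefix>:<s|m|h>:<bucket start in epoch seconds>".
    static Map<String, String> keysFor(String prefix, Instant eventTime) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("second", prefix + ":s:" + eventTime.truncatedTo(ChronoUnit.SECONDS).getEpochSecond());
        keys.put("minute", prefix + ":m:" + eventTime.truncatedTo(ChronoUnit.MINUTES).getEpochSecond());
        keys.put("hour", prefix + ":h:" + eventTime.truncatedTo(ChronoUnit.HOURS).getEpochSecond());
        return keys;
    }
}
```

Each of the three keys would then receive its own HINCRBY or ZINCRBY in the same pipeline, which is one way a single event could fan out into roughly 30 writes.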

READING FROM REDIS

Redis transactions (MULTI/EXEC) are used extensively.

Read Types:
•  HGETALL - get all data for event
•  HGET - get event counts by ad
•  LRANGE - list of all events by session
•  ZREVRANGE - top 100 ads with highest number of events
•  SUBSCRIBE

AND ALL OF THIS AT SCALE

•  Daily traffic peaks: 100k - 200k events per minute

•  Peaks are really plateaus that last for hours

•  Read load is negligible by comparison, but reads must be fast

•  In fact, everything must be fast: <1 sec latency for debugger to work
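Combining these peaks with the ~30 writes per event quoted earlier gives a rough write rate. This is a back-of-the-envelope sketch; the class and method names are illustrative only.

```java
// Rough peak write rate implied by the slide's numbers.
class PeakLoad {
    // "~30 writes per event" is taken from the slides; it is an approximation.
    static final int WRITES_PER_EVENT = 30;

    // Converts events per minute into individual Redis writes per second.
    static long writesPerSecond(long eventsPerMinute) {
        return eventsPerMinute * WRITES_PER_EVENT / 60;
    }
}
```

For 100k and 200k events per minute this works out to about 50,000 and 100,000 individual writes per second, sustained for hours.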

WHAT’S WRONG WITH THIS PICTURE?

[Diagram: the metrics architecture again (ads, JVMs, node.js), now annotated "Bottleneck!"]

SCALABILITY FAIL!

What did it look like?

•  2-3x our usual load
•  Only 1 core being used
•  14,000 open connections!

A QUICK SOLUTION?

•  MOAR Megahurtz!!: m2.2xl is already about as fast as Amazon gets.

•  Redis-As-A-Service: Expensive, not fast enough for even our usual load.

•  twemproxy: Twitter’s sharding solution. Doesn’t support all commands: PING, MULTI, INFO, MGET, etc…

WHAT ABOUT JEDIS’S NATIVE SHARDING?

•  No pipelining

•  No pubsub

•  Complicated consistent hashing mechanism makes reading in other environments more difficult

LET’S ROLL OUR OWN!

Goals: Speed, speed, speed

Not Goals: Fault tolerance, redundancy, resiliency

HOW HARD CAN IT BE?

Sharding method: Java hashCode of key for every item written

[Diagram: the JVM spreads the items of each ad event across the Redis boxes; node.js reads each key from one box.]

Write to many, read from one
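The first approach can be sketched as follows, assuming the shard is simply the key's `hashCode` reduced modulo the number of Redis boxes. The class and method names are hypothetical. `Math.floorMod` is used because `hashCode` can be negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Take one: every item is routed by its own key.
class ItemKeySharding {
    // Maps a key to a shard index in [0, shardCount).
    static int shardFor(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Groups the item keys of one event by the shard each would be written to.
    static Map<Integer, List<String>> groupByShard(List<String> itemKeys, int shardCount) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String key : itemKeys) {
            byShard.computeIfAbsent(shardFor(key, shardCount), s -> new ArrayList<>()).add(key);
        }
        return byShard;
    }
}
```

Each entry in the returned map needs its own connection and pipeline, so one event can end up talking to every Redis box.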

WHAT HAPPENED?

                                        Before Sharding   After Sharding
Items per Event                         30                30
Items written per Event per Redis       30                10
Redis Connections per Event             1                 3
Connections per second per Redis box    n                 n

Reality: When n gets to around 500, Redis maxes out CPU and starts rejecting connections.

Theory: Since Redis claims to be able to handle 70k connections per second, the amount of data being sent per connection is the problem.

SO HOW HARD CAN IT BE?

HARD.

TAKE TWO

Sharding method: Java hashCode of EVENT key for every item written. A single key now lives on multiple Redis boxes

[Diagram: the JVM sends all items of an ad event to a single Redis box; node.js reads from all boxes.]

Write to one, read from many
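A minimal in-memory sketch of the second approach follows. Plain maps stand in for the Redis boxes, and all names are hypothetical. It shows why a counter written this way has to be summed across shards when it is read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a set of Redis boxes; each "shard" is a map of counters.
class EventKeySharding {
    private final List<Map<String, Long>> shards = new ArrayList<>();

    EventKeySharding(int shardCount) {
        for (int i = 0; i < shardCount; i++) shards.add(new HashMap<>());
    }

    // Take two: the shard is chosen by the EVENT key, so all items of one event share a shard.
    int shardFor(String eventKey) {
        return Math.floorMod(eventKey.hashCode(), shards.size());
    }

    // Write to one: every counter touched by this event is incremented on a single shard.
    void recordEvent(String eventKey, List<String> counterKeys) {
        Map<String, Long> shard = shards.get(shardFor(eventKey));
        for (String counter : counterKeys) shard.merge(counter, 1L, Long::sum);
    }

    // Read from many: the same counter may exist on every shard, so sum the partial values.
    long readCounter(String counterKey) {
        long total = 0;
        for (Map<String, Long> shard : shards) total += shard.getOrDefault(counterKey, 0L);
        return total;
    }
}
```

Sorted-set reads such as a "Top 100" would need a similar merge of per-shard results before the final ranking can be produced.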

BETTER!

                                        Before Sharding   After Take 1   After Take 2
Items per Event                         30                30             30
Items written per Event per Redis       30                10             30
Redis Connections per Event             1                 3              1
Connections per second per Redis box    n                 n              n/3

More load can be easily accommodated by adding boxes
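The connection counts in the table can be reproduced with a small simulation. This is a sketch using synthetic keys; the key names and the `ConnectionCount` class are made up for illustration.

```java
// Counts how many connections each shard receives under the two sharding schemes.
class ConnectionCount {
    static final int ITEMS_PER_EVENT = 30;

    static int shardFor(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Take 1: an event opens one connection to every shard that any of its item keys maps to.
    static long[] perItemKey(int events, int shardCount) {
        long[] connections = new long[shardCount];
        for (int e = 0; e < events; e++) {
            boolean[] touched = new boolean[shardCount];
            for (int i = 0; i < ITEMS_PER_EVENT; i++) {
                touched[shardFor("event-" + e + ":item-" + i, shardCount)] = true;
            }
            for (int s = 0; s < shardCount; s++) {
                if (touched[s]) connections[s]++;
            }
        }
        return connections;
    }

    // Take 2: an event opens exactly one connection, chosen by the event key.
    static long[] perEventKey(int events, int shardCount) {
        long[] connections = new long[shardCount];
        for (int e = 0; e < events; e++) {
            connections[shardFor("event-" + e, shardCount)]++;
        }
        return connections;
    }
}
```

With three shards, the first scheme sends every event to every box, so each box still sees n connections. The second scheme splits the n connections among the boxes.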

CODING CHALLENGES

Java
•  Managing multiple connection pools
•  Managing multiple pipelines
•  Automatic health checks

node.js
•  Finding hashing function that works in different environments
•  Managing multiple pipelines
•  Fanout requests and merging responses once the pipeline is executed
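One way to get a hash that behaves the same in several environments is to write it out explicitly instead of relying on a runtime's built-in. The sketch below re-implements the formula documented for Java's `String.hashCode`. Whether Flite ported exactly this function to node.js is an assumption, and the class name is hypothetical.

```java
// Re-implements the documented String.hashCode formula using only 32-bit integer math.
class PortableHash {
    static int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i); // int arithmetic wraps on overflow, as in Java's own hashCode
        }
        return h;
    }

    // Maps the (possibly negative) hash to a shard index in [0, shardCount).
    static int shardFor(String key, int shardCount) {
        return Math.floorMod(hash(key), shardCount);
    }
}
```

A JavaScript port would have to force 32-bit wrap-around on every step (for example with `| 0`) and shift negative remainders into range, since `%` there keeps the sign of the dividend.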

SINGLE-REDIS JEDIS WORKFLOW

On application startup:
1.  Initialize jedisPool with connection info

Every time:
1.  Jedis jedisClient = jedisPool.getResource();
2.  Pipeline pipeline = jedisClient.pipelined();
3.  State your business
4.  pipeline.sync();
5.  jedisPool.returnResource(jedisClient);

SHARDED JEDIS WORKFLOW

On application startup:
1.  Initialize n jedisPools with connection info

Every time:
1.  Jedis jedisClient = jedisPool.getResource();  (Which pool?)
2.  Pipeline pipeline = jedisClient.pipelined();  (Which client?)
3.  State your business  (To whom?)
4.  pipeline.sync();  (Which pipeline?)
5.  jedisPool.returnResource(jedisClient);  (Return what where?)


[Diagram: RedisNodeManager (Spring-created singleton) holds several JedisPoolManagers (one box is labelled RedisPoolManager). getPipelineManager() returns a RedisPipelineManager, and getPipeline(shardKey) on it returns a Pipeline.]
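The idea behind a per-request pipeline manager can be sketched without Jedis. In the sketch below, `FakePipeline` is a stand-in for a real Jedis pipeline, and `PipelineManagerSketch` is a hypothetical simplification. Neither is the actual Flite class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a Jedis pipeline so the sketch runs without a Redis server.
class FakePipeline {
    final int shard;
    final List<String> commands = new ArrayList<>();
    boolean synced = false;

    FakePipeline(int shard) { this.shard = shard; }

    void hincrBy(String key, String field, long by) {
        commands.add("HINCRBY " + key + " " + field + " " + by);
    }

    void sync() { synced = true; }
}

// Per-request object: answers "which pipeline?" by lazily opening one per shard.
class PipelineManagerSketch {
    private final int shardCount;
    private final Map<Integer, FakePipeline> open = new HashMap<>();

    PipelineManagerSketch(int shardCount) { this.shardCount = shardCount; }

    // Real code would borrow a connection from that shard's pool here.
    FakePipeline getPipeline(String shardKey) {
        int shard = Math.floorMod(shardKey.hashCode(), shardCount);
        return open.computeIfAbsent(shard, FakePipeline::new);
    }

    // Flushes every pipeline opened during this request and reports how many there were.
    // Real code would also return each borrowed connection to its pool.
    int syncAll() {
        for (FakePipeline p : open.values()) p.sync();
        int flushed = open.size();
        open.clear();
        return flushed;
    }
}
```

Callers only ever name a shard key. The manager decides which pool, client, and pipeline that maps to, and a single `syncAll()` at the end answers "return what where?".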

LET’S LOOK AT SOME CODE!

OPERATIONAL DETAILS

•  Redis is single threaded
   o  You can run multiple Redises on one server
   o  Bind each Redis instance to a specific core

QUESTIONS?
