Memcached Code Camp 2009


memcached best practices presentation at Silicon Valley Code Camp 2009.

Transcript of Memcached Code Camp 2009

memcached

scaling your website

with memcached

by: steve yen

about me

• Steve Yen

• NorthScale

• Escalate Software

• Kiva Software

what you’ll learn

• what, where, why, when

• how

• especially, best practices

“mem cache dee”

• latest version: 1.4.1

• http://code.google.com/p/memcached

open source

distributed cache

livejournal

helps your websites run fast

popular

simple

KISS

easy

small bite-sized steps

• not a huge, forklift replacement rearchitecture / reengineering project

fast

“i only block for memcached”

scalable

many client libraries

• might be TOO many

• the hit list...

• Java ==> spymemcached

• C ==> libmemcached

• Python, Ruby, etc ==>

• libmemcached wrappers

frameworks

• rails

• django

• spring / hibernate

• cakephp, symfony, etc

applications

• drupal

• wordpress

• mediawiki

• etc

it works

it promises to solve performance problems

it delivers!

problem?

your website is too slow

RDBMS melting down

urgent! emergency

one server

web app + RDBMS

1 + 1 servers

web app

RDBMS

N + 1 servers

web app, web app, web app, web app

RDBMS

RDBMS

EXPLAIN PLAN?

buy a bigger box

buy better disks

master write DB + multiple read DB?

vertical partitioning?

sharding?

uh oh, big reengineering

• risky!

• touch every line of code, every query!!

and, it’s 2AM

you need a band-aid

a simple band-aid now

use a cache

keep things in memory!

don’t hit disk

distributed cache

• to avoid wasting memory

don’t write one of these yourself

memcached

simple API

• hash-table-ish

your code before

v = db.query( SOME SLOW QUERY )

your code after

v = memcachedClient.get(key)
if (!v) {
    v = db.query( SOME SLOW QUERY )
    memcachedClient.set(key, v)
}
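
in a concrete language, the same pattern might look like this sketch with the python-memcached client; the server address, key scheme, and run_slow_query() stand-in are illustrative assumptions, not from the slides

import memcache

mc = memcache.Client(["127.0.0.1:11211"])    # assumed local memcached node

def run_slow_query(report_id):               # hypothetical stand-in for the slow query
    return {"report_id": report_id, "rows": []}

def get_report(report_id):
    key = "report:%s" % report_id            # illustrative key scheme
    v = mc.get(key)                          # try the cache first
    if v is None:                            # miss: fall back to the database
        v = run_slow_query(report_id)
        mc.set(key, v)                       # populate the cache for next time
    return v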

cache read-heavy stuff

invalidate when writing

• db.execute(“UPDATE foo WHERE ...”)

• memcachedClient.delete(...)
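
a matching sketch of the write path with the same assumed client; the db handle and table are illustrative

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def save_foo(db, foo_id, new_value):
    # db is a hypothetical handle with an execute() method, as on the slide
    db.execute("UPDATE foo SET value = %s WHERE id = %s", (new_value, foo_id))
    mc.delete("foo:%s" % foo_id)             # drop the now-stale cached copy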

and, repeat

• each day...

• look for the next slowest operations

• add code to cache a few more things

your life gets better

thank you memcached!

no magic

you are in control

now for the decisions

memcached adoption

• first, start using memcached

• poorly

• but you can breathe again

memcached adoption

• next, start using memcached correctly

memcached adoption

• later

• queueing

• persistence

• replication

• ...

an early question

where to run servers?

answer 1

• right on your web servers

• a great place to start, if you have extra memory

servers

web app + memcached, web app + memcached, web app + memcached, web app + memcached

RDBMS

add up your memory usage!

• having memcached server swap == bad!

answer 2

• run memcached right on your database server?

• WRONG!

answer 3

• run memcached on separate dedicated memcached servers

• congratulations!

• you either have enough money

• or enough traffic that it matters

running a server

• daemonize

• don’t be root!

• no security

server lists

• mc-server1:11211

• mc-server2:11211

• mc-server3:11211
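
a client is typically handed that whole list at construction time; a sketch with python-memcached, reusing the hostnames above

import memcache

# the client takes the full server list; keys are distributed across all three
mc = memcache.Client([
    "mc-server1:11211",
    "mc-server2:11211",
    "mc-server3:11211",
])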

consistent hashing

source: http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
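
very roughly, consistent hashing places each server at many points on a hash ring and sends a key to the next point around the ring, so losing one server only remaps that server's slice of keys; the toy ring below illustrates the idea and is not any particular client's exact algorithm

import bisect
import hashlib

def _hash(s):
    # take the first 8 hex digits of an MD5 as a ring position
    return int(hashlib.md5(s.encode()).hexdigest()[:8], 16)

class HashRing(object):
    def __init__(self, servers, points_per_server=100):
        self.ring = sorted(
            (_hash("%s-%d" % (server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self.positions = [pos for pos, _ in self.ring]

    def server_for(self, key):
        # first ring position at or after the key's hash, wrapping around
        i = bisect.bisect(self.positions, _hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["mc-server1:11211", "mc-server2:11211", "mc-server3:11211"])
print(ring.server_for("user:1234"))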

client-side intelligence

• no “server master” bottleneck

libmemcached

• fast C memcached client

• supports consistent hashing

• many wrappers to your favorite languages

updating server lists

• push out new configs and restart?

• moxi

• memcached + integrated proxy

keys

• no whitespace

• 250 char limit

• use short prefixes

keys & MD5

• don’t

• stats become useless
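
a sketch of a readable key scheme (the prefix names are invented for illustration); short prefixes stay under the limit and keep stats readable, unlike opaque MD5 keys

def cache_key(prefix, ident):
    # e.g. cache_key("usr", 1234) -> "usr:1234"
    key = "%s:%s" % (prefix, ident)
    assert " " not in key and len(key) <= 250    # no whitespace, 250 char limit
    return key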

values

• any binary object

• 1MB limit

• change #define & recompile if you want more

• and you’re probably doing something wrong if you want more

values

• query resultset

• serialized object

• page fragment

• pages

• etc

nginx + memcached

>1 language?

• JSON

• protocol buffers

• XML
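
if more than one language reads the cache, storing values as plain JSON strings keeps them portable across clients; a minimal sketch with assumed keys

import json
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

profile = {"id": 42, "name": "alice"}            # illustrative value
mc.set("usr:42", json.dumps(profile))            # store a plain JSON string

raw = mc.get("usr:42")
if raw is not None:
    profile = json.loads(raw)                    # readable from any language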

memcached is lossy

• memcached WILL lose data

that’s a good thing

remember, it’s a CACHE

why is memcached lossy?

memcached node dies

when node restarts...

• you just get a bunch of cache misses

(and a short RDBMS spike)

eviction

more disappearing data!

LRU

• can config memcached to not evict

• but, you’re probably doing something wrong if you do this

remember, it forgets

• it’s just a CACHE

expiration

• aka, timeouts

• memcached.set(key, value, timeout)
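
in the python-memcached client the expiration is the time argument to set(); the 30-minute session below is just an illustration

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

# expire this entry 30 minutes after it is written
mc.set("session:abc123", {"user_id": 42}, time=30 * 60)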

use expirations or not?

1st school of thought

• expirations hide bugs

• you should be doing proper invalidations

• (aka, deletes)

• coherency!

school 2

• it’s 3AM and I can’t think anymore

• business guy:

• “sessions should auto-logout after 30 minutes due to bank security policy”

put sessions in memcached?

• just a config change

• eg, Ruby on Rails

good

• can load-balance requests to any web host

• don’t touch the RDBMS on every web request

bad

• could lose a user’s session

solution

• save sessions to memcached

• the first time, also save to RDBMS

• ideally, asynchronously

• on cache miss, restore from RDBMS
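
a rough sketch of that pattern; save_session_to_db and load_session_from_db are hypothetical stand-ins for your own RDBMS code

import memcache

mc = memcache.Client(["127.0.0.1:11211"])
SESSION_TTL = 30 * 60                        # illustrative 30-minute timeout

def save_session_to_db(session_id, data):    # hypothetical: your RDBMS write
    pass

def load_session_from_db(session_id):        # hypothetical: your RDBMS read
    return None

def create_session(session_id, data):
    mc.set("sess:%s" % session_id, data, time=SESSION_TTL)
    save_session_to_db(session_id, data)     # the first time, also persist (ideally async)

def save_session(session_id, data):
    # later updates hit only memcached; a background job re-syncs the RDBMS
    mc.set("sess:%s" % session_id, data, time=SESSION_TTL)

def load_session(session_id):
    data = mc.get("sess:%s" % session_id)
    if data is None:                         # node died, evicted, or expired
        data = load_session_from_db(session_id)
        if data is not None:
            mc.set("sess:%s" % session_id, data, time=SESSION_TTL)
    return data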

in the background...

• have a job querying the RDBMS

• cron job?

• the job queries for “old” looking session records in the sessions table

• refresh old session records from memcached
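
sketched as a function you might call from cron; find_stale_session_ids and save_session_to_db are again hypothetical stand-ins

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def find_stale_session_ids():                # hypothetical: query the sessions table
    return []                                #   for "old"-looking records

def save_session_to_db(session_id, data):    # hypothetical: your RDBMS write
    pass

def refresh_stale_sessions():
    # run from cron: re-save the live memcached copy over the stale RDBMS row
    for session_id in find_stale_session_ids():
        data = mc.get("sess:%s" % session_id)
        if data is not None:                 # still live in the cache
            save_session_to_db(session_id, data)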

add vs replace vs set

append vs prepend

CAS

• compare - and - swap

incr and decr

• no negative numbers
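
a quick tour of those verbs using python-memcached method names; CAS is omitted here because the gets/cas calls differ between clients

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

mc.set("k", "v")          # unconditional write
mc.add("k", "other")      # no effect: only stores if "k" does NOT exist
mc.replace("k", "v2")     # only stores if "k" already exists

mc.append("k", "!")       # "v2"  -> "v2!"
mc.prepend("k", ">")      # "v2!" -> ">v2!"

mc.set("hits", "0")       # counters must already exist and be numeric
mc.incr("hits")           # 1
mc.incr("hits", 10)       # 11
mc.decr("hits", 20)       # clamped at 0: no negative numbers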

queueing

• “hey, with those primitives, I could build a queue!”

don’t

• memcached is lossy

• protocol is incorrect for a queue

• instead

• gearman

• beanstalkd

• etc

cache stampedes

• gearman with a unique job id (duplicate refresh jobs collapse into one)

• encode a timestamp in your values

• one app node randomly decides to refresh slightly early
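
one way to sketch "refresh slightly early": store the expiry deadline next to the value and let each request roll a die whose odds of triggering a refresh rise as the deadline nears; the key, windows, and recompute callback are illustrative

import random
import time
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

TTL = 300              # real expiration handed to memcached
EARLY_WINDOW = 30      # start considering an early refresh 30s before expiry

def cached(key, recompute):
    entry = mc.get(key)
    now = time.time()
    if entry is not None:
        value, deadline = entry
        time_left = deadline - now
        # chance of volunteering grows from 0 to 1 over the last EARLY_WINDOW seconds
        chance = max(0.0, (EARLY_WINDOW - time_left) / EARLY_WINDOW)
        if random.random() > chance:
            return value                     # most requests keep the cached copy
    value = recompute()                      # this request refreshes a bit early
    mc.set(key, (value, now + TTL), time=TTL)
    return value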

coherency

denormalization

• or copies of data

example: changing a product price
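
for instance, if the price is copied into several cached objects, the write path has to delete every copy it knows about; the key names and db handle below are illustrative

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def change_price(db, product_id, category_id, new_price):
    # db is a hypothetical handle with an execute() method
    db.execute("UPDATE products SET price = %s WHERE id = %s",
               (new_price, product_id))
    # every cached object that embeds a copy of the price must go
    mc.delete("product:%s" % product_id)     # the product record itself
    mc.delete("cat_page:%s" % category_id)   # category listing fragment
    mc.delete("home:featured")               # homepage fragment showing prices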

memcached UDFs

• another great tool in your toolbox

• on a database trigger, delete stuff from memcached

memcached UDFs

• works even if you do UPDATES with fancy WHERE clauses

multigets

• they are your friend

• memcached is fast, but...

• imagine 1ms for a get request

• 200 serial gets ==> 200ms

a resultset loop

foreach product in resultset
    c = memcached.get(product.category_id)
    do something with c

2 loops

for product in resultset
    multiget_request.append(product.category_id)

multiget_response = memcachedClient.multiget(multiget_request)

for c in multiget_response
    do something with c
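
with python-memcached the batched version maps onto get_multi(), which returns a dict of the keys that were found; the key scheme and resultset shape are assumed

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def categories_for(resultset):
    # one round trip for N keys instead of N round trips
    keys = ["cat:%s" % p.category_id for p in resultset]
    found = mc.get_multi(keys)               # dict of key -> value, hits only
    return [found.get(k) for k in keys]      # None where that key missed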

memcached slabber

• allocates memory into slabs

• it might “learn” the wrong slab sizes

• watch eviction stats
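
one way to watch them, assuming your client exposes raw server stats the way python-memcached's get_stats() does

import memcache

mc = memcache.Client(["mc-server1:11211", "mc-server2:11211"])

for server, stats in mc.get_stats():
    # a steadily climbing "evictions" counter means useful data is being pushed out
    print(server, "evictions:", stats.get("evictions"))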

losing a node

• means your RDBMS gets hit

replication

• simple replication in libmemcached

• >= 2x memory cost

• only simple verbs

• set, get, delete

• doesn’t handle flapping nodes

persistence

things that speak memcached

• tokyo tyrant

• memcachedb

• moxi

another day

• monitoring & statistics

• near caching

• moxi

thanks!!!

• love any feedback

• your memcached war stories

• your memcached wishlist

• steve.yen@northscale.com