Introduction to memcached

Transcript of Introduction to memcached

  2. 2. Tags: memcached, performance, scalability, PHP, MySQL, caching techniques, #ikdoeict
  3. 3. Lead web developer at Netlog for 4 years; PHP + MySQL + frontend; working on Gatcha
  4. 4. For whom? A talk for professional bachelor ICT students
  5. 5. Why this talk? It is one of the first things I've learnt at Netlog, and I use it every single day.
  6. 6. Program: about caching; about memcached; examples; tips & tricks; toolsets and other solutions
  7. 7. What is caching? A copy of real data with faster (and/or cheaper) access
  8. 8. What is caching? From Wikipedia: "A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache." Term introduced by IBM in the 60s.
  9. 9. The anatomy: simple key/value storage; simple operations: save, get, delete
  10. 10. Terminology: storage cost; retrieval cost (network load / algorithm load); invalidation (keeping data up to date / removing irrelevant data); replacement policy (FIFO/LFU/LRU/MRU/RANDOM vs. Belady's algorithm); cold cache / warm cache
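Of the replacement policies listed, LRU is the one memcached relies on later in this talk. A minimal sketch of the idea, in Python with an in-process dictionary (the class name `LRUCache` and its capacity are illustrative assumptions, not memcached's actual implementation):

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.set("c", 3)  # over capacity: "b" is evicted
```

Reading "a" before inserting "c" is what saves it: the key that has gone unused the longest is the one that is dropped.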
  11. 11. Terminology: cache hit and cache miss. Typical stats: hit ratio = hits / (hits + misses); miss ratio = 1 - hit ratio. Example: 45 cache hits and 10 cache misses give 45 / (45 + 10) = 82% hit ratio and 18% miss ratio.
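The hit ratio can be tracked by counting at the point where the cache is read. A small sketch, assuming a hypothetical `CountingCache` wrapper around a dictionary, that reproduces the 45 hits / 10 misses example:

```python
class CountingCache:
    """Dictionary-backed cache that counts hits and misses on every get."""

    def __init__(self):
        self.items = {}
        self.hits = 0
        self.misses = 0

    def set(self, key, value):
        self.items[key] = value

    def get(self, key):
        if key in self.items:
            self.hits += 1
            return self.items[key]
        self.misses += 1
        return None

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache()
cache.set("user:1", "an")
for _ in range(45):
    cache.get("user:1")        # 45 lookups of a stored key: hits
for uid in range(2, 12):
    cache.get(f"user:{uid}")   # 10 lookups of unknown keys: misses

hit_ratio = cache.hit_ratio()  # 45 / (45 + 10)
miss_ratio = 1 - hit_ratio
```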
  12. 12. When to cache? Caches are only efficient when the benefits of faster access outweigh the overhead of checking and keeping your cache up to date: more cache hits than cache misses.
  13. 13. Where are caches used? At hardware level (CPU, HDD); operating systems (RAM); web stack; applications; your own short term vs long term memory
  14. 14. Caches in the web stack: browser cache; DNS cache; Content Delivery Networks (CDN); proxy servers; application level: full output caching (eg. WordPress WP-Cache plugin) ...
  15. 15. Caches in the web stack (contd): application level: opcode cache (APC); query cache (MySQL); storing denormalized results in the database; object cache; storing values in PHP objects/classes
  16. 16. Efficiency of caching? The earlier in the process, the closer to the original request(er), the faster: browser cache will be faster than cache on a proxy. But probably also the harder to get it right: the closer to the requester, the more parameters the cache depends on.
  17. 17. What to cache on the server-side? As a PHP backend developer, what to cache? Expensive operations, ie. operations that work with slower resources: database access; reading files (in fact, any filesystem access); API calls; heavy computations; XML
  18. 18. Where to cache on the server-side? As a PHP backend developer, where to store cache results? In the database (computed values, generated HTML): you'll still need to access your database. In static files (generated HTML or serialized PHP values): you'll still need to access your file system.
  19. 19. in memory!
  20. 20. memcached
  21. 21. About memcached: free & open source, high-performance, distributed memory object caching system. Generic in nature, intended for use in speeding up dynamic web applications by alleviating database load. Key/value dictionary.
  22. 22. About memcached (contd): developed by Brad Fitzpatrick for LiveJournal in 2003. Now used by Netlog, Facebook, Flickr, Wikipedia, Twitter, YouTube ...
  23. 23. Technically: it's a server; client access over TCP or UDP; servers can run in pools (eg. 3 servers with 64GB mem each give you a single pool of 192GB storage for caching); servers are independent, clients manage the pool
  24. 24. What to store in memcache? High demand (used often); expensive (hard to compute); common (shared across users). Best? All three.
  25. 25. What to store in memcache? (contd) Typical: user sessions (often); user data (often, shared); homepage data (eg. often, shared, expensive)
  26. 26. What to store in memcache? (contd) Workflow: monitor application (query logs / profiling); add a caching level; compare speed gain
  27. 27. Memcached principles: fast network access (memcached servers close to other application servers); no persistency (if your server goes down, data in memcached is gone); no redundancy / fail-over; no replication (single item in cache lives on one server only); no authentication (not in shared environments)
  28. 28. Memcached principles (contd): 1 item (the value stored under a key) is maximum 1MB; keys are strings of at most 250 characters (in application typically MD5 of user readable string); no enumeration of keys (thus no list of valid keys in cache at a certain moment, no list of keys beginning with user_, ...); no active clean-up (only clean up when more space is needed, LRU)
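Hashing a readable name keeps every key well under the 250-character limit and free of spaces, whatever goes into the name. A sketch of that convention in Python (the helper `cache_key` and the example name are assumptions for illustration):

```python
import hashlib

MAX_KEY_LENGTH = 250  # memcached key limit mentioned above


def cache_key(readable):
    """Turn a readable name of any length into a fixed-size key."""
    # An MD5 hex digest is always 32 hexadecimal characters.
    return hashlib.md5(readable.encode("utf-8")).hexdigest()


key = cache_key("user_details:uid=42:lang=nl")
long_key = cache_key("search:" + "very long query " * 100)
```

The trade-off is that hashed keys are no longer readable when debugging, so it helps to log the readable name next to the digest.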
  29. 29. $ telnet localhost 11211
          Trying
          Connected to localhost.
          Escape character is '^]'.
          get foo
          VALUE foo 0 2
          hi
          END
          stats
          STAT pid 8861
          (etc)
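The telnet session above uses memcached's text protocol: a `get` is answered with a `VALUE <key> <flags> <bytes>` line, the data, and `END`. A sketch of encoding a `set` and decoding that reply in Python, without opening a socket (the helper names `build_set` and `parse_get` are hypothetical):

```python
def build_set(key, value, flags=0, exptime=0):
    """Encode a text-protocol 'set' command as bytes."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def parse_get(response):
    """Decode the reply to a single-key 'get'; returns None on a miss."""
    if response == b"END\r\n":
        return None
    header, rest = response.split(b"\r\n", 1)
    _, _key, _flags, size = header.split(b" ")
    return rest[: int(size)].decode("utf-8")


request = build_set("foo", "hi")
value = parse_get(b"VALUE foo 0 2\r\nhi\r\nEND\r\n")
```

Sending `request` over the telnet connection would store the value that the `get foo` in the session reads back.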
  30. 30. Client access: both ASCII and binary protocol; in real life: clients available for all major languages: C, C++, PHP, Python, Ruby, Java, Perl, Windows, ...
  31. 31. PHP clients: support the basics such as multiple servers, setting values, getting values, incrementing, decrementing and getting stats. Two options: pecl/memcache; pecl/memcached (newer, in beta, a couple more features)
  32. 32. PHP Client Comparison
          Feature                        pecl/memcache          pecl/memcached
          First Release Date             2004-06-08             2009-01-29 (beta)
          Actively Developed?            Yes                    Yes
          External Dependency            None                   libmemcached
          Automatic Key Fixup            Yes                    No
          Append/Prepend                 No                     Yes
          Automatic Serialization        Yes                    Yes
          Binary Protocol                No                     Optional
          CAS                            No                     Yes
          Compression                    Yes                    Yes
          Communication Timeout          Connect Only           Various Options
          Consistent Hashing             Yes                    Yes
          Delayed Get                    No                     Yes
          Multi-Get                      Yes                    Yes
          Session Support                Yes                    Yes
          Set/Get to a specific server   No                     Yes
          Stores Numerics                Converted to Strings   Yes
  33. 33. PHP Client functions
          Memcached::add         Add an item under a new key
          Memcached::addServer   Add a server to the server pool
          Memcached::decrement   Decrement numeric item's value
          Memcached::delete      Delete an item
          Memcached::flush       Invalidate all items in the cache
          Memcached::get         Retrieve an item
          Memcached::getMulti    Retrieve multiple items
          Memcached::getStats    Get server pool statistics
          Memcached::increment   Increment numeric item's value
          Memcached::set         Store an item
          ...
  34. 34. Output caching: pages with high load / expensive to generate. Very easy. Very fast. But: all the dependencies ... (language, CSS, template, logged-in user's details, ...)
  36. 36. Data caching: on a lower level; easier to find all dependencies; ideal solution for offloading database queries; the database is almost always the biggest bottleneck in backend performance problems
  38. 38. "There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)
  39. 39. Invalidation: caching for a certain amount of time (eg. 10 minutes); don't delete caches; thus: you can't trust that data coming from cache is correct
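Time-based invalidation can be sketched as storing an expiry timestamp next to each value. The Python below is an in-memory illustration only; the `TTLCache` class and the injectable `clock` are assumptions made so that expiry can be shown without waiting 10 minutes:

```python
import time


class TTLCache:
    """Dictionary-backed cache whose entries expire after ttl seconds."""

    def __init__(self, clock=time.time):
        self.clock = clock   # injectable so expiry can be demonstrated
        self.items = {}      # key -> (expires_at, value)

    def set(self, key, value, ttl):
        self.items[key] = (self.clock() + ttl, value)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.items[key]  # lazily drop the expired entry
            return None
        return value


now = [0.0]  # fake clock for the demonstration
cache = TTLCache(clock=lambda: now[0])
cache.set("search:memcached", ["result 1", "result 2"], ttl=600)
fresh = cache.get("search:memcached")  # within 10 minutes: served from cache
now[0] = 601.0
stale = cache.get("search:memcached")  # after 10 minutes: treated as a miss
```

Between the write and the expiry nothing checks the source, which is exactly why the cached data cannot be trusted to be current.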
  40. 40. Invalidation (contd) Use: great for summaries, overviews and pages where it's not that big a problem if data is a little bit out of date (eg. search results); good for quick and dirty optimizations
  41. 41. Invalidation (contd): store forever, and expire on certain events. The userdata example: store userdata forever; when a user changes any of his preferences, throw the cache away.
  42. 42. Invalidation Use: data that is fetched more than it's updated; where it's critical the data is correct. Improvement: instead of delete on event, update cache on event. (Mind: race conditions. Cache invalidation always as close to the original change as possible!)
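Event-based invalidation for the userdata example might look like the sketch below. Plain dictionaries stand in for MySQL and memcached, and the function names are hypothetical; the point is that the cache entry is removed directly after the write that makes it stale:

```python
database = {42: {"nickname": "jan", "lang": "nl"}}  # stand-in for MySQL
cache = {}                                          # stand-in for memcached


def get_user(uid):
    key = f"user:{uid}"
    if key not in cache:                  # miss: read from the database
        cache[key] = dict(database[uid])  # store without an expiry time
    return cache[key]


def update_nickname(uid, nickname):
    database[uid]["nickname"] = nickname  # 1. change the source of truth
    cache.pop(f"user:{uid}", None)        # 2. invalidate right after it


before = get_user(42)["nickname"]
update_nickname(42, "johan")
after = get_user(42)["nickname"]
```

The "update on event" improvement would replace the `pop` with a write of the fresh row, at the cost of having to think about two writers racing each other.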
  43. 43. Uses at Netlog: sessions (cross server); database results (via database class, or object caching); flooding checks; output caching (eg. for RSS feeds); locks
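Flooding checks and locks are commonly built on two memcached operations: `add`, which only succeeds when the key does not exist yet, and `increment`. The sketch below is one possible approach, not necessarily how Netlog does it; `FakeMemcache` is a hypothetical in-memory stand-in, and a real counter or lock key would also be given an expiry time:

```python
class FakeMemcache:
    """In-memory stand-in exposing add/incr/delete like a memcached client."""

    def __init__(self):
        self.items = {}

    def add(self, key, value):
        if key in self.items:
            return False  # add only succeeds for a new key
        self.items[key] = value
        return True

    def incr(self, key):
        self.items[key] += 1
        return self.items[key]

    def delete(self, key):
        self.items.pop(key, None)


def allow_action(mc, uid, limit):
    """Flooding check: allow at most `limit` actions per counter lifetime."""
    key = f"flood:{uid}"
    if mc.add(key, 1):  # first action creates the counter
        return True
    return mc.incr(key) <= limit


def with_lock(mc, name, work):
    """Run work() only if nobody else currently holds the lock key."""
    if not mc.add(f"lock:{name}", 1):
        return None  # somebody else holds the lock
    try:
        return work()
    finally:
        mc.delete(f"lock:{name}")


mc = FakeMemcache()
attempts = [allow_action(mc, 42, limit=3) for _ in range(5)]
```

Here the first three attempts are allowed and the last two are refused, because the counter has passed the limit.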
  53. 53. Query Caching (contd): queries with JOIN and WHERE statements are harder to cache; often not easy to find the cache key on update/change events; solution: JOIN in PHP
  54. 54. Query Caching (contd): queries with JOIN and WHERE statements are harder to cache; often not easy to find the cache key on update/change events; solution: JOIN in PHP. In following example: what if the nickname of a user changes?
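The idea of joining in application code can be sketched as follows. This is not the example from the original slides; it is an illustrative Python version with hypothetical tables and function names. Messages and users are cached under separate keys and combined at read time, so a nickname change invalidates exactly one key:

```python
messages_table = [
    {"id": 1, "uid": 42, "text": "hello"},
    {"id": 2, "uid": 7, "text": "hi there"},
]
users_table = {42: {"nickname": "jan"}, 7: {"nickname": "els"}}
cache = {}  # stand-in for memcached


def cached(key, load):
    """Return the cached value for key, loading and storing it on a miss."""
    if key not in cache:
        cache[key] = load()
    return cache[key]


def get_messages():
    return cached("messages", lambda: [dict(m) for m in messages_table])


def get_user(uid):
    return cached(f"user:{uid}", lambda: dict(users_table[uid]))


def messages_with_nicknames():
    # The "JOIN" happens here, in application code, per message.
    return [
        {**m, "nickname": get_user(m["uid"])["nickname"]}
        for m in get_messages()
    ]


def rename_user(uid, nickname):
    users_table[uid]["nickname"] = nickname
    cache.pop(f"user:{uid}", None)  # only one key to invalidate


first = messages_with_nicknames()
rename_user(42, "johan")
second = messages_with_nicknames()
```

Had the joined result been cached as a whole, every cached list containing that user would have to be found and thrown away.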
  58. 58. So? Pros: speed, duh; queries get simpler (better for your db); easier porting to key/value storage solutions. Cons: you're relying on memcached to be up and have good hit ratios.
  59. 59. Multi-Get Optimisations: we reduced database access; memcached is faster, but access to memcache still has its price. Solution: multiget: fetch multiple keys from memcached in one single call; result is array of items.
  60. 60. Multi-Get Optimisations (contd): back to the addUserDetails example: find UIDs from array; multiget to memcached for details of UIDs; for UIDs without result, do a query (SELECT ... FROM USERS WHERE uid IN (...)); for each fetched user, store in cache; worst case (no hits): 1 query; return merged cache/db results
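The multi-get steps above can be sketched as follows. Dictionaries stand in for memcached and the USERS table, and all names (`get_multi`, `query_users`, `get_user_details`) are assumptions for illustration; the `queries` list only exists to show how many database round trips were made:

```python
users_table = {1: "an", 2: "bart", 3: "cis"}  # stand-in for the USERS table
cache = {"user:1": "an"}                       # stand-in for memcached
queries = []                                   # records database round trips


def get_multi(keys):
    """Like a client multi-get: returns only the keys that were found."""
    return {k: cache[k] for k in keys if k in cache}


def query_users(uids):
    """Stands in for SELECT ... FROM USERS WHERE uid IN (...)."""
    queries.append(sorted(uids))
    return {uid: users_table[uid] for uid in uids if uid in users_table}


def get_user_details(uids):
    found = get_multi([f"user:{uid}" for uid in uids])  # one cache call
    result = {
        uid: found[f"user:{uid}"] for uid in uids if f"user:{uid}" in found
    }
    missing = [uid for uid in uids if uid not in result]
    if missing:
        for uid, row in query_users(missing).items():   # at most one query
            cache[f"user:{uid}"] = row                   # warm the cache
            result[uid] = row
    return result  # merged cache/db results


details = get_user_details([1, 2, 3])
again = get_user_details([1, 2, 3])
```

The first call needs one query for the two uncached users; the second call is answered entirely from the cache.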
  61. 61. Consistent Hashing: client is responsible for managing pool; hashes a certain key to a certain server; clients can be naive: distribute keys on size of pool; if one server goes down, all keys will now be queried on other servers > cold cache; use a client with consistent hashing algorithms, so if a server goes down, only data on that server gets lost
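The difference between the naive approach and consistent hashing can be sketched with a simple hash ring. This is an illustration in Python, not the algorithm of any particular client; the server names, the number of points per server and the helper names are assumptions:

```python
import bisect
import hashlib


def stable_hash(text):
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def naive_server(key, servers):
    """Naive client: hash modulo the size of the pool."""
    return servers[stable_hash(key) % len(servers)]


class HashRing:
    """Each server owns many points on a ring; a key goes to the next point."""

    def __init__(self, servers, points_per_server=100):
        self.ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self.hashes = [h for h, _ in self.ring]

    def server_for(self, key):
        index = bisect.bisect(self.hashes, stable_hash(key)) % len(self.ring)
        return self.ring[index][1]


servers = ["cache1", "cache2", "cache3"]
survivors = ["cache1", "cache2"]  # cache3 goes down
keys = [f"user:{uid}" for uid in range(1000)]


def count_moved(lookup_before, lookup_after):
    """Count keys that change server although their old server is still up."""
    return sum(
        1 for key in keys
        if lookup_before(key) != "cache3"
        and lookup_before(key) != lookup_after(key)
    )


before, after = HashRing(servers), HashRing(survivors)
moved_naive = count_moved(
    lambda k: naive_server(k, servers), lambda k: naive_server(k, survivors)
)
moved_ring = count_moved(before.server_for, after.server_for)
```

With the ring, keys that lived on a surviving server stay where they were, so `moved_ring` is zero; with the modulo scheme a large share of those keys lands on a different server and has to be fetched again.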
  62. 62. Memcached Statistics: available stats from servers include: uptime, #calls (get/set/...), #hits (since uptime), #misses (since uptime); no enumeration, no distinguishing on types of caches; add own logging / statistics to monitor effectiveness of your caching str