Caching Up and Down the Stack

Post on 06-May-2015

description

Whether you're looking to make your web app run faster or scale better, one great way to achieve both is to simply do less work. How? By using caches, the data hidey-holes that generations of engineers have thoughtfully left at key junctures in computing infrastructure, from your CPU to the backbone of the internet. Requests into web applications, which span great distances and often involve expensive frontend and backend lifting, are great candidates for caching of all types. We'll discuss the benefits and tradeoffs of caching at different layers of the stack and how to find low-hanging cacheable fruit, with a particular focus on server-side improvements.

Transcript of Caching Up and Down the Stack

Caching Up and Down the Stack

Long Island/Queens Django Meetup 5/20/14

Hi, I’m Dan Kuebrich

● Software engineer, Python fan
● Web performance geek
● Founder of Tracelytics, now part of AppNeta
● Once (and future?) Queens resident

DJANGO

What is “caching”?

● Caching is avoiding doing expensive work
    o by doing cheaper work
● Common examples?
    o On repeat visits, your browser doesn’t download images that haven’t changed
    o Your CPU caches instructions, data so it doesn’t have to go to RAM… or to disk!

What is “caching”?

[Diagram: Uncached: Client → Data Source. Cached: Client → Cache Intermediary → Data Source. Labels: “Fast!” and “Slow...”]

“Latency Numbers Every Programmer Should Know”

Systems Performance: Enterprise and the Cloud by Brendan Gregg http://books.google.com/books?id=xQdvAQAAQBAJ&pg=PA20&lpg=PA20&source=bl&ots=hlTgyxdrnR&sig=CCjddHrY1H6muMVW9BFcbdO7DDo&hl=en&sa=X&ei=dS7oUquhOYr9oAT9oYGoDw&ved=0CCkQ6AEwAA#v=onepage&q&f=false

A whole mess of caching:

(Closer to the user)
● Browser cache
● CDN
● Proxy / optimizer
● Application-based
    o Full-page
    o Fragment
    o Object cache
● Database
    o Query cache
    o Denormalization
(Closer to the data)

Caching in Django apps: Frontend

● Client-side assets
● Full pages

Client-side assets

● Use HTTP caches!
    o Browser
    o CDN
    o Intermediate proxies
● Set policy with cache headers
    o Cache-Control / Expires
    o ETag / Last-Modified

HTTP Cache-Control and Expires

● Stop the browser from even asking for it
● Expires
    o Pick a date in the future, good til then
● Cache-Control
    o More flexible
    o Introduced in HTTP 1.1
    o Use this one

HTTP Cache-Control and Expires

dan@JLTM21:~$ curl -I https://login.tv.appneta.com/cache/tl-layouts_base_unauth-compiled-162c2ceecd9a7ff1e65ab460c2b99852a49f5a43.css

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=315360000
Content-length: 5955
Content-Type: text/css
Date: Tue, 20 May 2014 23:12:16 GMT
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Last-Modified: Fri, 16 May 2014 20:51:19 GMT
Server: nginx
Connection: keep-alive

HTTP Cache Control in Django

https://docs.djangoproject.com/en/dev/topics/cache/
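The code from this slide did not survive in the transcript. As a framework-free sketch of the policy seen in the curl output above, both headers can be derived from a single lifetime. The helper name `far_future_headers` and the `public` directive are assumptions for illustration, not something shown in the talk.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

TEN_YEARS = 315360000  # seconds; the max-age seen in the curl output above


def far_future_headers(max_age=TEN_YEARS, now=None):
    """Return Cache-Control and Expires headers for a fingerprinted asset.

    Hypothetical helper: a long lifetime is only safe when the URL changes
    whenever the content changes (for example, a hash in the filename).
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        # usegmt=True renders the HTTP date format, ending in "GMT"
        "Expires": format_datetime(expires, usegmt=True),
    }


for name, value in far_future_headers(max_age=3600).items():
    print("%s: %s" % (name, value))
```

In a Django view, the `cache_control` decorator from `django.views.decorators.cache` sets the same Cache-Control directives on the response, for example `@cache_control(max_age=3600)`.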

ETag + Last-Modified

dan@JLTM21:~$ curl -I www.appneta.com/stylesheets/styles.css

HTTP/1.1 200 OK
Last-Modified: Tue, 20 May 2014 05:52:50 GMT
ETag: "30854c-1c3d3-4f9ce7d715080"
Vary: Accept-Encoding
Content-Type: text/css
...

ETag + Last-Modified

dan@JLTM21:~$ curl -I www.appneta.com/stylesheets/styles.css --header 'If-None-Match: "30854c-1c3d3-4f9ce7d715080"'

HTTP/1.1 304 Not Modified
Last-Modified: Tue, 20 May 2014 05:52:50 GMT
ETag: "30854c-1c3d3-4f9ce7d715080"
Vary: Accept-Encoding
Content-Type: text/css
Date: Tue, 20 May 2014 23:21:12 GMT
...

ETag vs Last-Modified

● Last-Modified is date-based
● ETag is content-based
● Most webservers generate both
● Some webservers (Apache) generate ETags that depend on local state
    o If you have a load-balanced pool of servers working here, they might not be using the same ETags!
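One way around the load-balancing problem is to derive the ETag purely from the response bytes, so every server in the pool produces the same tag. The sketch below is an assumption-laden illustration, not how any particular webserver does it: `make_etag` and `respond` are hypothetical names, and real If-None-Match handling also covers lists of tags and weak validators.

```python
import hashlib


def make_etag(body):
    """Content-based ETag: identical bytes give an identical tag on any server."""
    return '"%s"' % hashlib.sha1(body).hexdigest()


def respond(body, if_none_match=None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The client's copy is current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


css = b"body { color: black; }"
status, headers, _ = respond(css)
revalidated, _, payload = respond(css, if_none_match=headers["ETag"])
print(status, revalidated, len(payload))
```

The first call plays the role of the plain curl request above, the second the request carrying If-None-Match.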

CDNs

● Put content closer to your end-users
    o and offload HTTP requests from your servers
● Best for static assets
● Same cache control policies apply

Full-page caching

[Diagram: Client → Varnish → Data Source]

No internet standards necessary!

Full-page caching: mod_pagespeed

[Diagram: Client → mod_pagespeed → Data Source]

● Dynamically rewrites pages with frontend optimizations
● Caches rewritten pages

Full-page caching in Django

Wait, where is this getting cached?

● Django makes it easy to configure
    o In-memory
    o File-based
    o Memcached
    o etc.
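A sketch of what that configuration can look like in a settings.py, one entry per backend listed above. The alias names `files` and `memcached` and both LOCATION values are placeholder assumptions. `PyMemcacheCache` is available in Django 3.2 and later; releases contemporary with this talk used `MemcachedCache` instead.

```python
# settings.py fragment (a sketch; aliases and locations are placeholders)
CACHES = {
    # Per-process memory: simple, but not shared between workers
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
    # Files on local disk
    "files": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",
    },
    # Memcached, shared by every app server that points at it
    "memcached": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    },
}
```

Only the `default` alias is required; the others are there to show the shape of each backend's entry.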

Full-page caching: dynamic pages?

Fragment caching

Full-page caching: dynamic pages?

Full-page caching: the ajax solution

Object caching

def get_item_by_id(key):
    # Look up the item in our database
    return session.query(User)\
        .filter_by(id=key)\
        .first()

Object caching

# mc is a memcached client; session and User come from the ORM setup
# (all assumed to be defined elsewhere)
def get_item_by_id(key):
    # Check in cache
    val = mc.get(key)
    # If exists, return it
    if val is not None:
        return val
    # If not, get the val, store it in the cache
    val = session.query(User)\
        .filter_by(id=key)\
        .first()
    mc.set(key, val)
    return val

Object caching

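# `decorator` is presumably the third-party decorator package:
# from decorator import decorator
@decorator
def cache(expensive_func, key):
    # Check in cache
    val = mc.get(key)
    # If exists, return it
    if val is not None:
        return val
    # If not, get the val, store it in the cache
    val = expensive_func(key)
    mc.set(key, val)
    return val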

Object caching

@cache
def get_item_by_id(key):
    # Look up the item in our database
    return session.query(User)\
        .filter_by(id=key)\
        .first()
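The slides assume a memcached client and a database session that are not shown. A self-contained variant of the same pattern might look like the following, with a plain dict standing in for memcached and a fake lookup standing in for the database. The `cached` name, the time-to-live, and the key prefix are additions for illustration and are not part of the talk.

```python
import functools
import time

_store = {}  # stand-in for memcached: key -> (expiry timestamp, value)


def cached(ttl_seconds=60):
    """Cache a one-argument function's results for ttl_seconds."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(key):
            # Prefix with the function name so two cached functions
            # that take the same id do not collide.
            cache_key = "%s:%s" % (func.__name__, key)
            entry = _store.get(cache_key)
            if entry is not None and entry[0] > time.monotonic():
                return entry[1]  # hit
            value = func(key)  # miss: do the expensive work
            _store[cache_key] = (time.monotonic() + ttl_seconds, value)
            return value
        return wrapper
    return decorate


calls = []


@cached(ttl_seconds=30)
def get_item_by_id(key):
    calls.append(key)  # records each trip to the "database"
    return {"id": key, "name": "user-%s" % key}


get_item_by_id(7)
get_item_by_id(7)  # served from the cache; no second lookup
print(len(calls))
```

Storing an expiry next to each value gives a crude safety net when explicit invalidation is missed.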

Object caching in Django

Query caching

[Diagram: Client → Database. Inside the database, a “Cached?” check sits between the Query Cache and the actual tables.]

Query caching

mysql> select SQL_CACHE count(*) from traces;
+----------+
| count(*) |
+----------+
|  3135623 |
+----------+
1 row in set (0.56 sec)

mysql> select SQL_CACHE count(*) from traces;
+----------+
| count(*) |
+----------+
|  3135623 |
+----------+
1 row in set (0.00 sec)
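To make the idea concrete outside of MySQL, here is a toy query cache built on Python's sqlite3 module. It is a sketch of the concept only: the `QueryCache` class is hypothetical, and it empties itself on every write, which is coarser than MySQL's per-table invalidation. Note also that MySQL removed its query cache in version 8.0, so `SQL_CACHE` applies to older servers.

```python
import sqlite3


class QueryCache:
    """Cache SELECT results by SQL text; drop everything on any write."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}
        self.hits = 0

    def select(self, sql, params=()):
        key = (sql, params)
        if key in self.results:
            self.hits += 1
            return self.results[key]
        rows = self.conn.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows

    def write(self, sql, params=()):
        self.conn.execute(sql, params)
        self.results.clear()  # coarse invalidation: any write empties the cache


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY)")
db = QueryCache(conn)
db.write("INSERT INTO traces DEFAULT VALUES")
first = db.select("SELECT count(*) FROM traces")
second = db.select("SELECT count(*) FROM traces")  # served from the cache
```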

Query caching

[Figure: Uncached vs. Cached]

Denormalization

mysql> select table1.x, table2.y from table1 join table2 on table1.z = table2.q where table1.z > 100;

mysql> select table1.x, table1.y from table1 where table1.z > 100;
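The two queries above can be tried end to end with sqlite3. The sample rows are invented for illustration; the table and column names follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, z INTEGER, y TEXT);  -- y starts empty
    CREATE TABLE table2 (q INTEGER, y TEXT);
    INSERT INTO table1 (x, z) VALUES (1, 50), (2, 150), (3, 200);
    INSERT INTO table2 (q, y) VALUES (50, 'a'), (150, 'b'), (200, 'c');
""")

# Normalized read: needs the join.
joined = conn.execute(
    "SELECT table1.x, table2.y FROM table1 "
    "JOIN table2 ON table1.z = table2.q WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()

# Denormalize: copy y onto table1 once, so reads can skip the join.
conn.execute(
    "UPDATE table1 SET y = (SELECT y FROM table2 WHERE table2.q = table1.z)"
)

flat = conn.execute(
    "SELECT table1.x, table1.y FROM table1 WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()
```

The copied column is itself a cache: any later change to table2.y has to be propagated to table1.y, which is the invalidation cost of this approach.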

Caching: what can go wrong?

● Invalidation
● Fragmentation
● Stampedes
● Complexity

Invalidation

[Diagram: Client, Cache Intermediary, Data Source. Labels: “Update!”, “Write” (to the data source), “Invalidate” (the cached copy)]
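The write-then-invalidate flow in the diagram can be sketched with two dicts, one standing in for the data source and one for the cache intermediary. The names and sample data are illustrative assumptions.

```python
database = {"user:1": "Dan"}   # stand-in for the data source
cache = {}                     # stand-in for the cache intermediary


def read(key):
    if key not in cache:
        cache[key] = database[key]  # miss: fill from the source
    return cache[key]


def update(key, value):
    database[key] = value   # 1. write to the source of truth
    cache.pop(key, None)    # 2. invalidate, so the next read refills


read("user:1")              # fills the cache with the old value
update("user:1", "Daniel")  # the stale cached copy is dropped
```

Without step 2, `read` would keep returning the old value for as long as the entry stayed in the cache.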

Invalidation on page-scale

(More savings, generally more invalidation...)
● Browser cache
● CDN
● Proxy / optimizer
● Application-based
    o Full-page
    o Fragment
    o Object cache
● Database
    o Query cache
    o Denormalization
(Smaller savings, generally less invalidation)

Fragmentation

● What if I have a lot of different things to cache?
    o More misses
    o Potential cache eviction
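The effect is easy to see by replaying two access patterns through a small least-recently-used cache. This is a toy simulation under assumed workloads, not a measurement of any real system; `hit_rate` is a hypothetical helper.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Replay a sequence of keys through an LRU cache; return hits / total."""
    lru = OrderedDict()
    hits = 0
    for key in accesses:
        if key in lru:
            hits += 1
            lru.move_to_end(key)         # mark as most recently used
        else:
            lru[key] = True
            if len(lru) > capacity:
                lru.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


few_hot_keys = ["a", "b"] * 50   # 2 distinct keys, 100 accesses
many_keys = list(range(50)) * 2  # 50 distinct keys, 100 accesses

print(hit_rate(few_hot_keys, capacity=10))
print(hit_rate(many_keys, capacity=10))
```

With the same cache size and the same number of requests, the workload spread over many keys evicts each entry before it is requested again.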

Fragmentation

[Chart: frequency of access (y-axis) across your pages / objects (x-axis)]

Stampedes

● On a cache miss extra work is done
● The result is stored in the cache
● What if multiple simultaneous misses?
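One common mitigation is to let only one caller recompute a missing key while the others wait for its result. The sketch below does this with a per-key lock inside a single process; all names are illustrative. Across several servers the lock would have to live somewhere shared, for example a memcached `add`, which only succeeds for the first caller.

```python
import threading
import time

cache = {}
locks = {}
locks_guard = threading.Lock()  # protects the locks dict itself
compute_calls = []


def expensive(key):
    compute_calls.append(key)
    time.sleep(0.05)  # simulate slow work
    return "value-for-%s" % key


def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another thread may have filled the cache while we waited.
        if key not in cache:
            cache[key] = expensive(key)
        return cache[key]


threads = [threading.Thread(target=get, args=("report",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(compute_calls))
```

Eight threads miss at roughly the same time, but the re-check inside the lock means the expensive function runs only once.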

Stampedes

http://allthingsd.com/20080521/stampede-facebook-opens-its-profile-doors/

Complexity

● How much caching do I need, and where?
● What is the invalidation process
    o on data update? on release?
● What happens if the caches fall over?
● How do I debug it?

Takeaways

● The ‘how’ of caching:
    o What are you caching?
    o Where are you caching it?
    o How bad is a cache miss?
    o How and when are you invalidating?

Takeaways

● The ‘why’ of caching:
    o Did it actually get faster?
    o Is speed worth extra complexity?
    o Don’t guess – measure!
    o Always use real-world conditions.

Questions?

Thanks!

● Interested in measuring your Django app’s performance?
    o Free trial of TraceView: www.appneta.com/products/traceview
● See you at Velocity NYC this fall?
● Twitter: @appneta / @dankosaur

Resources

● Django documentation on caching: https://docs.djangoproject.com/en/dev/topics/cache/
● Varnish caching, via Disqus: http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
● Django cache option comparisons: http://codysoyland.com/2010/jan/17/evaluating-django-caching-options/
● More Django-specific tips: http://www.slideshare.net/csky/where-django-caching-bust-at-the-seams
● Guide to cache-related HTTP headers: http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
● Google PageSpeed: https://developers.google.com/speed/pagespeed/module