Scaling Django

Scaling Django Web Apps, Mike Malone, EuroDjangoCon 2009, Tuesday, May 5, 2009

by Mike Malone, presented at EuroDjangoCon

Transcript of Scaling Django

Page 1: Scaling Django

Scaling Django Web Apps, Mike Malone

Page 2: Scaling Django

Hi, I’m Mike.

Page 3: Scaling Django

Page 4: Scaling Django

Page 5: Scaling Django

http://www.flickr.com/photos/kveton/2910536252/

Page 6: Scaling Django

Page 7: Scaling Django

Pownce

• Large scale

• Hundreds of requests/sec

• Thousands of DB operations/sec

• Millions of user relationships

• Millions of notes

• Terabytes of static data

Page 8: Scaling Django

Pownce

• Encountered and eliminated many common scaling bottlenecks

• Real world example of scaling a Django app

• Django provides a lot for free

• I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way

Page 9: Scaling Django

Scalability

Page 10: Scaling Django

Scalability

Scalability is NOT:

• Speed / Performance

• Generally affected by language choice

• Achieved by adopting a particular technology

Page 11: Scaling Django

A Scalable Application

import time

def application(environ, start_response):
    time.sleep(10)
    start_response('200 OK', [('content-type', 'text/plain')])
    return ('Hello, world!',)

Page 12: Scaling Django

A High Performance Application

def application(environ, start_response):
    remote_addr = environ['REMOTE_ADDR']
    f = open('access-log', 'a+')
    f.write(remote_addr + "\n")
    f.flush()
    f.seek(0)
    hits = sum(1 for l in f.xreadlines()
               if l.strip() == remote_addr)
    f.close()
    start_response('200 OK', [('content-type', 'text/plain')])
    return (str(hits),)

Page 13: Scaling Django

Scalability


A scalable system doesn’t need to change when the size of the problem changes.

Page 14: Scaling Django

Scalability

• Accommodate increased usage

• Accommodate increased data

• Maintainable

Page 15: Scaling Django

Scalability

• Two kinds of scalability

• Vertical scalability: buying more powerful hardware, replacing what you already own

• Horizontal scalability: buying additional hardware, supplementing what you already own

Page 16: Scaling Django

Vertical Scalability

• Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)

• Inherently limited by current technology

• But it’s easy! If you can get away with it, good for you.

Page 17: Scaling Django

Vertical Scalability


Skyscrapers are special. Normal buildings don’t need 10-floor foundations. Just build!

- Cal Henderson

Page 18: Scaling Django

Horizontal Scalability


The ability to increase a system’s capacity by adding more processing units (servers)

Page 19: Scaling Django

Horizontal Scalability


It’s how large apps are scaled.

Page 20: Scaling Django

Horizontal Scalability

• A lot more work to design, build, and maintain

• Requires some planning, but you don’t have to do all the work up front

• You can scale progressively...

• The rest of the presentation roughly follows the order in which you’ll need each technique

Page 21: Scaling Django

Caching

Page 22: Scaling Django

Caching

• Several levels of caching available in Django

• Per-site cache: caches every page that doesn’t have GET or POST parameters

• Per-view cache: caches output of an individual view

• Template fragment cache: caches fragments of a template

• None of these are that useful if pages are heavily personalized

Page 23: Scaling Django

Caching

• Low-level Cache API

• Much more flexible, allows you to cache at any granularity

• At Pownce we typically cached

• Individual objects

• Lists of object IDs

• Hard part is invalidation

Page 24: Scaling Django

Caching

• Cache backends:

• Memcached

• Database caching

• Filesystem caching

Page 25: Scaling Django

Caching


Use Memcache.

Page 26: Scaling Django

Sessions


Use Memcache.

Page 27: Scaling Django

Sessions

Or Tokyo Cabinet: http://github.com/ericflo/django-tokyo-sessions/

Thanks @ericflo

Page 28: Scaling Django

Caching

Basic caching comes free with Django:

from django.core.cache import cache
from django.db import models

class UserProfile(models.Model):
    ...
    def get_social_network_profiles(self):
        cache_key = 'networks_for_%s' % self.user.id
        profiles = cache.get(cache_key)
        if profiles is None:
            # Cache miss: hit the database, then store the result
            profiles = self.user.social_network_profiles.all()
            cache.set(cache_key, profiles)
        return profiles

Page 29: Scaling Django

Caching

Invalidate when a model is saved or deleted:

from django.core.cache import cache
from django.db.models import signals

def nuke_social_network_cache(sender, instance, **kwargs):
    # Signal receivers are plain functions called with the sending model
    # class and the saved/deleted instance; there is no self
    cache_key = 'networks_for_%s' % instance.user_id
    cache.delete(cache_key)

# SocialNetworkProfile is the app's model class, defined elsewhere
signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile)

Page 30: Scaling Django

Caching


• Invalidate post_save, not pre_save

• Still a small race condition

• Simple solution, worked for Pownce:

• Instead of deleting, set the cache key to None for a short period of time

• Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
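The placeholder-plus-add trick above can be sketched in modern Python 3. This is a minimal sketch, not Pownce's code: `InMemoryCache` is a hypothetical dict-backed stand-in for memcached's `get`/`set`/`add` semantics (so it runs without a server), and `invalidate`, `cached_fetch` and the 5-second `TOMBSTONE_SECONDS` window are assumptions. A reader that loaded stale data just before a write can't put it back into the cache, because `add` fails while the `None` placeholder is still stored.

```python
import time

TOMBSTONE_SECONDS = 5  # assumed: longer than a typical in-flight request


class InMemoryCache:
    """Hypothetical dict-backed stand-in for memcached get/set/add."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at or None)
        self._clock = clock

    def _live(self, key):
        """True if key holds an unexpired entry (expired ones are dropped)."""
        entry = self._data.get(key)
        if entry is None:
            return False
        expires_at = entry[1]
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]
            return False
        return True

    def get(self, key):
        return self._data[key][0] if self._live(key) else None

    def set(self, key, value, timeout=None):
        expires_at = None if timeout is None else self._clock() + timeout
        self._data[key] = (value, expires_at)

    def add(self, key, value, timeout=None):
        """Store only if nothing is stored for key; return True on success."""
        if self._live(key):
            return False
        self.set(key, value, timeout)
        return True


def invalidate(cache, key):
    # Instead of delete(), park a None placeholder for a short time.
    cache.set(key, None, TOMBSTONE_SECONDS)


def cached_fetch(cache, key, load_from_db):
    value = cache.get(key)
    if value is None:
        value = load_from_db()
        # add() fails while the placeholder exists, so a reader holding
        # data loaded before the invalidation cannot put it back.
        cache.add(key, value)
    return value
```

The placeholder only has to outlast requests that were already in flight when the write happened; once it expires, the next reader repopulates the key from the database.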

Page 31: Scaling Django

Advanced Caching


• Memcached’s atomic increment and decrement operations are useful for maintaining counts

• But they’re not available in Django 1.0

• Added in 1.1 by ticket #6464
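A minimal sketch of maintaining a count with atomic increments, with a fallback for a key that isn't cached yet. `CounterCache` is a hypothetical in-memory stand-in (a lock plays the role of memcached's server-side atomicity) whose `incr` raises `ValueError` for a missing key, as Django 1.1's `cache.incr()` does; `increment_count` and the `count_from_db` callable are assumptions standing in for your own code.

```python
import threading


class CounterCache:
    """Hypothetical in-memory stand-in for memcached get/add/incr."""

    def __init__(self):
        self._data = {}
        # memcached increments atomically on the server; a lock mimics that here.
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def add(self, key, value):
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def incr(self, key, delta=1):
        with self._lock:
            if key not in self._data:
                # Same contract as Django 1.1's cache.incr(): missing key -> ValueError
                raise ValueError("Key %r not found" % (key,))
            self._data[key] += delta
            return self._data[key]


def increment_count(cache, key, delta, count_from_db):
    """Bump a cached counter atomically, seeding it from the database on a miss.

    count_from_db is a hypothetical callable returning the authoritative
    count, already including the change being recorded.
    """
    try:
        return cache.incr(key, delta)
    except ValueError:
        value = count_from_db()
        # add() is a no-op if another process seeded the key first.
        cache.add(key, value)
        return value
```

The common case is a single atomic round trip; only a cold key pays for the database query.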

Page 32: Scaling Django

Advanced Caching


• You can still use them if you poke at the internals of the cache object a bit

• cache._cache is the underlying cache object

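from django.core.cache import cache

def increment(cache_key, delta=1):  # hypothetical wrapper function
    try:
        result = cache._cache.incr(cache_key, delta)
    except ValueError:
        # nonexistent key raises ValueError
        # Do it the hard way, store the result.
        # recompute_count() is a hypothetical helper standing in for that work
        result = recompute_count()
        cache.set(cache_key, result)
    return result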

Page 33: Scaling Django

Advanced Caching


• Other missing cache API

• delete_multi & set_multi

• append: add data to existing key after existing data

• prepend: add data to existing key before existing data

• cas: store this data, but only if no one has edited it since I fetched it

Page 34: Scaling Django

Advanced Caching


• It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)

• User and UserProfile

• fetched almost every request

• rarely change

• But Django won’t let you: its memcached backend turns a timeout of 0 into the default timeout

• IMO, this is a bug :(

Page 35: Scaling Django

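The Memcache Backend

import memcache
from django.core.cache.backends.base import BaseCache
from django.utils.encoding import smart_str

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=0):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # timeout=0 is falsy, so `timeout or self.default_timeout` swaps it
        # for the default: you can never ask memcached to keep a key forever
        return self._cache.add(smart_str(key), value, timeout or self.default_timeout)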

Page 36: Scaling Django

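The Memcache Backend

import memcache
from django.core.cache.backends.base import BaseCache
from django.utils.encoding import smart_str

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=None):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # Only fall back to the default when no timeout was given, so an
        # explicit 0 ("never expire" to memcached) is passed through
        if timeout is None:
            timeout = self.default_timeout
        return self._cache.add(smart_str(key), value, timeout)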

Page 37: Scaling Django

Advanced Caching


• Typical setup has memcached running on web servers

• Pownce web servers were I/O and memory bound, not CPU bound

• Since we had some spare CPU cycles, we compressed large objects before caching them

• The Python memcache library can do this automatically, but the API is not exposed
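A minimal sketch of compressing large values before they go to the cache, roughly the job python-memcached does internally when `set()` is given a `min_compress_len` (memcached stores a client flags field with each item, which is where a "compressed" marker can travel). The `encode_for_cache`/`decode_from_cache` helpers and the `(compressed, payload)` pair are hypothetical; the 150000-byte threshold is the default used in the monkey patch on the next slide.

```python
import pickle
import zlib

MIN_COMPRESS_LEN = 150000  # bytes; same default as the monkey patch on the next slide


def encode_for_cache(value, min_compress_len=MIN_COMPRESS_LEN):
    """Serialize value and compress it only if it is large enough to be worth it.

    Returns (compressed, payload): a flag plus the bytes to store.
    """
    data = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    if len(data) >= min_compress_len:
        packed = zlib.compress(data)
        if len(packed) < len(data):  # keep it only if it actually shrank
            return True, packed
    return False, data


def decode_from_cache(compressed, payload):
    data = zlib.decompress(payload) if compressed else payload
    return pickle.loads(data)
```

Small values skip compression entirely, so the spare CPU is only spent where it saves meaningful memory and network I/O.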

Page 38: Scaling Django

Monkey Patching core.cache

from django.core.cache import cache
from django.utils.encoding import smart_str
import inspect as i

# Only patch if the underlying memcache client's set() accepts min_compress_len
if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
    class CacheClass(cache.__class__):
        def set(self, key, value, timeout=None, min_compress_len=150000):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.set(smart_str(key), value, timeout, min_compress_len)
    # Swap the class of the existing cache object so every caller gets the new set()
    cache.__class__ = CacheClass

Page 39: Scaling Django

Advanced Caching


• Useful tool: automagic single object cache

• Use a manager to check the cache prior to any single object get by pk

• Invalidate assets on save and delete

• Eliminated several hundred QPS at Pownce
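A minimal sketch of the single-object cache idea in plain Python 3, so it runs without Django. `FakeDB`, `CachingManager` and the `"model:pk"` key format are hypothetical; in Django the lookup would live in a custom `Manager` and the invalidation in `post_save`/`post_delete` signal handlers. The author's real code is in the django-caching repository linked on the next slide; this is not that code.

```python
class FakeDB:
    """Hypothetical stand-in for one database table; counts SELECTs."""

    def __init__(self):
        self.rows = {}
        self.queries = 0

    def select(self, pk):
        self.queries += 1
        return self.rows.get(pk)


class CachingManager:
    """Check the cache before any single-object lookup by primary key."""

    def __init__(self, model_name, db, cache):
        self.model_name = model_name
        self.db = db
        self.cache = cache  # any dict-like cache

    def _key(self, pk):
        return "%s:%s" % (self.model_name, pk)

    def get(self, pk):
        key = self._key(pk)
        obj = self.cache.get(key)
        if obj is None:
            obj = self.db.select(pk)
            if obj is not None:  # don't cache misses
                self.cache[key] = obj
        return obj

    # In Django, the invalidation below would run from post_save /
    # post_delete signal handlers rather than from the manager.
    def save(self, pk, obj):
        self.db.rows[pk] = obj
        self.cache.pop(self._key(pk), None)

    def delete(self, pk):
        self.db.rows.pop(pk, None)
        self.cache.pop(self._key(pk), None)
```

Because every by-pk lookup goes through one method, repeated fetches of hot objects like `User` and `UserProfile` cost one query per change instead of one per request.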

Page 40: Scaling Django

Advanced Caching


All this and more at:

http://github.com/mmalone/django-caching/

Page 41: Scaling Django

Advanced Caching

• Consistent hashing: hashes cached objects in such a way that most objects map to the same node after a node is added or removed.


http://www.flickr.com/photos/deepfrozen/2191036528/
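A minimal consistent-hashing sketch: each cache server is placed on a ring at many pseudo-random points (virtual nodes), and a key belongs to the first server point clockwise from the key's own hash. `HashRing`, the MD5-based hash, the 100 replicas per server and the `cacheN:11211` server names are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with `replicas` virtual nodes per server."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []    # sorted hash positions
        self._owners = {}  # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash("%s#%d" % (node, i))
            self._owners[h] = node
            bisect.insort(self._ring, h)

    def remove_node(self, node):
        for i in range(self.replicas):
            h = self._hash("%s#%d" % (node, i))
            del self._owners[h]
            self._ring.remove(h)

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        h = self._hash(key)
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, h) % len(self._ring)
        return self._owners[self._ring[idx]]
```

With plain `hash(key) % number_of_servers`, going from three servers to four would remap about three quarters of the keys; on the ring, only the keys that land on the new server's points move (roughly a quarter here), so most of the cache stays warm.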

Page 42: Scaling Django

Caching

Now that you’ve made life easier for your DB server, the next thing to fall over is your app server.

Page 43: Scaling Django

Load Balancing

Page 44: Scaling Django

Load Balancing

• Out of the box, Django uses a shared-nothing architecture

• App servers have no single point of contention

• Responsibility pushed down the stack (to DB)

• This makes scaling the app layer trivial: just add another server

Page 45: Scaling Django

Load Balancing

Spread work between multiple nodes in a cluster using a load balancer.

• Hardware or software

• Layer 7 or Layer 4

(Diagram: Load Balancer → App Servers → Database)

Page 46: Scaling Django

Load Balancing


• Hardware load balancers

• Expensive, like $35,000 each, plus maintenance contracts

• Need two for failover / high availability

• Software load balancers

• Cheap and easy, but more difficult to eliminate as a single point of failure

• Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx

Page 47: Scaling Django

Load Balancing


• Most of these are layer 7 proxies, and some software balancers do cool things

• Caching

• Re-proxying

• Authentication

• URL rewriting

Page 48: Scaling Django

Load Balancing

A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.

(Diagram: Hardware Balancers → Software Balancers → App Servers)

Page 49: Scaling Django

Load Balancing


• At Pownce, we used a single Perlbal balancer

• Easily handled all of our traffic (hundreds of simultaneous connections)

• A single point of failure (SPOF), but we didn’t have $100,000 for black-box solutions, and we weren’t worried about service guarantees beyond three or four nines

• Plus there were some neat features that we took advantage of

Page 50: Scaling Django

Perlbal Reproxying

Perlbal reproxying is a really cool, and really poorly documented, feature.

Page 51: Scaling Django

Perlbal Reproxying

1. Perlbal receives request

2. Perlbal proxies it to an app server

   1. App server checks auth (etc.)

   2. Returns HTTP 200 with X-Reproxy-URL header set to internal file server URL

3. File served from file server via Perlbal

Page 52: Scaling Django

Perlbal Reproxying

• Completely transparent to end user

• Doesn’t tie up a large app server process while the file is served

• Users can’t access files directly (like they could with a 302)

Page 53: Scaling Django

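Perlbal Reproxying

Plus, it’s really easy:

from django.http import HttpResponse

def download(request, filename):
    # Check auth, do your thing
    response = HttpResponse()
    # FILE_SERVER: base URL of the internal file server, defined elsewhere
    response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
    return response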

Page 54: Scaling Django

Load Balancing


Best way to reduce load on your app servers: don’t use them to do hard stuff.

Page 55: Scaling Django

Queuing

Page 56: Scaling Django

Queuing

• A queue is simply a bucket that holds messages until they are removed for processing by clients

• Many expensive operations can be queued and performed asynchronously

• User experience doesn’t have to suffer

• Tell the user that you’re running the job in the background (e.g., transcoding)

• Make it look like the job was done real-time (e.g., note distribution)

Page 57: Scaling Django

Queuing

• Lots of open source options for queuing

• Ghetto Queue (MySQL + Cron)

• This is the official name.

• Gearman

• TheSchwartz

• RabbitMQ

• Apache ActiveMQ

• ZeroMQ

Page 58: Scaling Django

Queuing

• Lots of fancy features: brokers, exchanges, routing keys, bindings...

• Don’t let that crap get you down: this is really simple stuff

• Biggest decision: persistence

• Does your queue need to be durable and persistent, able to survive a crash?

• This requires logging to disk, which slows things down, so don’t do it unless you have to

Page 59: Scaling Django

Queuing

• Pownce used a simple ghetto queue built on MySQL / cron

• Problematic if you have multiple consumers pulling jobs from the queue

• No point in reinventing the wheel, there are dozens of battle-tested open source queues to choose from
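The multiple-consumer problem with a ghetto queue is two workers grabbing the same row. A minimal sketch of one fix, using SQLite from the Python standard library in place of MySQL so it runs anywhere: a worker claims a job with a conditional `UPDATE ... WHERE claimed_by IS NULL` and only processes it if exactly one row changed. The `jobs` table and the helper names are hypothetical; a cron job would call `claim_next` in a loop.

```python
import sqlite3


def make_queue(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " claimed_by TEXT)"
    )
    conn.commit()


def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()


def claim_next(conn, worker_id):
    """Claim the oldest unclaimed job for worker_id, or return None.

    The UPDATE only matches a row that is still unclaimed, so two
    consumers that SELECT the same job cannot both win it.
    """
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE claimed_by IS NULL"
            " ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, payload = row
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker_id, job_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id, payload
        # Another consumer claimed it first; look for the next job.


def finish(conn, job_id):
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
    conn.commit()
```

This handles double-claiming but not crashed workers, retries or fan-out, which is where a dedicated queue from the list above earns its keep.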

Page 60: Scaling Django

Django Standalone Scripts

Consumers need to set up the Django environment:

from django.core.management import setup_environ
from mysite import settings

setup_environ(settings)

Page 61: Scaling Django

THE DATABASE!

Page 62: Scaling Django

The Database

• Until now we’ve been talking about:

• Shared nothing

• Pushing problems down the stack

• But we have to store a persistent and consistent view of our application’s state somewhere

• Enter the database...

Page 63: Scaling Django

CAP Theorem

• Three properties of a shared-data system

• Consistency: all clients see the same data

• Availability: all clients can see some version of the data

• Partition Tolerance: system properties hold even when the system is partitioned & messages are lost

• But you can only have two

Page 64: Scaling Django

CAP Theorem

• Big long proof... here’s my version.

• Empirically, seems to make sense.

• Eric Brewer

• Professor at University of California, Berkeley

• Co-founder and Chief Scientist of Inktomi

• Probably smarter than me

Page 65: Scaling Django

CAP Theorem

• The relational database systems we all use were built with consistency as their primary goal

• But at scale our system needs to have high availability and must be partitionable

• The RDBMS’s consistency requirements get in our way

• Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance

Page 66: Scaling Django

The Database

• There are lots of non-relational databases coming onto the scene

• CouchDB

• Cassandra

• Tokyo Cabinet

• But they’re not that mature, and they aren’t easy to use with Django

Page 67: Scaling Django

The Database

• Django has no support for

• Non-relational databases like CouchDB

• Multiple databases (coming soon?)

• If you’re looking for a project, plz fix this.

• Only advice: don’t get too caught up in trying to duplicate the existing ORM API

Page 68: Scaling Django

I Want a Pony

• Save always saves every field of a model

• Causes unnecessary contention and more data transfer

• A better way:

• Use descriptors to determine what’s dirty

• Only update dirty fields when an object is saved
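A minimal sketch of descriptor-based dirty tracking in plain Python 3 (not Django's ORM): each `Field` descriptor records assignments that actually change a value, and `save` emits an `UPDATE` for just those columns. `Field`, `Model`, `Note`, the `?` placeholder style (as in sqlite3) and the `execute(sql, params)` callable are hypothetical; a DB-API cursor's `execute` fits that shape.

```python
_MISSING = object()


class Field:
    """Descriptor that notes which attributes changed since the last save."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        old = instance.__dict__.get(self.name, _MISSING)
        instance.__dict__[self.name] = value
        if old is _MISSING or old != value:
            instance.__dict__.setdefault("_dirty", set()).add(self.name)


class Model:
    table = None  # subclasses set the table name

    def __init__(self, pk, **values):
        self.pk = pk
        for name, value in values.items():
            setattr(self, name, value)
        self._dirty = set()  # a freshly loaded row starts clean

    def dirty_fields(self):
        return sorted(self._dirty)

    def update_sql(self):
        """Return (sql, params) updating only dirty columns, or None if clean."""
        fields = self.dirty_fields()
        if not fields:
            return None
        assignments = ", ".join("%s = ?" % name for name in fields)
        params = [getattr(self, name) for name in fields] + [self.pk]
        return "UPDATE %s SET %s WHERE id = ?" % (self.table, assignments), params

    def save(self, execute):
        """execute: any callable taking (sql, params), e.g. a DB-API cursor.execute."""
        statement = self.update_sql()
        if statement is not None:
            execute(*statement)
            self._dirty.clear()


class Note(Model):
    table = "notes"
    body = Field()
    views = Field()
```

Bumping a view counter then writes one column instead of rewriting the whole row, and saving an unchanged object skips the database entirely.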

Page 69: Scaling Django

Denormalization

Page 70: Scaling Django

Denormalization

• Django encourages normalized data, which is usually good

• But at scale you need to denormalize

• Corollary: joins are evil

• Django makes it really easy to do joins using the ORM, so pay attention

Page 71: Scaling Django

Denormalization

• Start with a normalized database

• Selectively denormalize things as they become bottlenecks

• Denormalized counts, copied fields, etc. can be updated in signal handlers
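A minimal sketch of a denormalized count kept up to date from signal handlers, using SQLite so it runs standalone. The `Signal` class is a tiny hypothetical stand-in for Django's `post_save`/`post_delete` (its keyword arguments are not Django's `sender`/`instance` signature), and the `users.note_count` column and helper names are assumptions. The handlers do `note_count = note_count + 1` in SQL so concurrent updates don't overwrite each other.

```python
import sqlite3


class Signal:
    """Tiny hypothetical stand-in for a Django model signal."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, **kwargs):
        for receiver in self._receivers:
            receiver(**kwargs)


post_save = Signal()
post_delete = Signal()


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY,
                            note_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE notes (id INTEGER PRIMARY KEY,
                            user_id INTEGER NOT NULL, body TEXT);
        """
    )
    return conn


def create_note(conn, user_id, body):
    cur = conn.execute(
        "INSERT INTO notes (user_id, body) VALUES (?, ?)", (user_id, body))
    post_save.send(conn=conn, user_id=user_id, created=True)
    return cur.lastrowid


def delete_note(conn, note_id):
    (user_id,) = conn.execute(
        "SELECT user_id FROM notes WHERE id = ?", (note_id,)).fetchone()
    conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
    post_delete.send(conn=conn, user_id=user_id)


def bump_note_count(conn, user_id, created, **kwargs):
    if created:  # edits to an existing note don't change the count
        conn.execute(
            "UPDATE users SET note_count = note_count + 1 WHERE id = ?", (user_id,))


def drop_note_count(conn, user_id, **kwargs):
    conn.execute(
        "UPDATE users SET note_count = note_count - 1 WHERE id = ?", (user_id,))


post_save.connect(bump_note_count)
post_delete.connect(drop_note_count)
```

Reading a user's note count is then a single-row lookup instead of a `COUNT(*)` over the notes table.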

Page 72: Scaling Django

Replication

Page 73: Scaling Django

Replication

• Typical web app is 80 to 90% reads

• Adding read capacity will get you a long way

• MySQL Master-Slave replication

(Diagram: a Read & Write master replicating to Read only slaves)

Page 74: Scaling Django

Replication

• Django doesn’t make it easy to use multiple database connections, but it is possible

• Some caveats

• Slave lag interacts with caching in weird ways

• You can only save to your primary DB (the one you configure in settings.py)

• Unless you get really clever...

Page 75: Scaling Django

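Replication

1. Create a custom database wrapper by subclassing DatabaseWrapper

from django.db.backends.mysql.base import (
    CursorWrapper, Database, DatabaseWrapper, django_conversions)

class SlaveDatabaseWrapper(DatabaseWrapper):
    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',
                'use_unicode': True,
            }
            # Merge in the connection settings of a randomly chosen slave
            # (pick_random_slave is a helper defined elsewhere)
            kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
            self.connection = Database.connect(**kwargs)
            ...
        cursor = CursorWrapper(self.connection.cursor())
        return cursor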

Page 76: Scaling Django

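Replication

2. Custom QuerySet that uses primary DB for writes

from django.db.models.query import QuerySet

class MultiDBQuerySet(QuerySet):
    ...
    def update(self, **kwargs):
        # Point the query at the primary DB (default_connection, defined
        # elsewhere), then restore the slave connection even if the update fails
        slave_conn = self.query.connection
        self.query.connection = default_connection
        try:
            return super(MultiDBQuerySet, self).update(**kwargs)
        finally:
            self.query.connection = slave_conn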

Page 77: Scaling Django

Replication

3. Custom Manager that uses your custom QuerySet

from django import db

class SlaveDatabaseManager(db.models.Manager):
    def get_query_set(self):
        return MultiDBQuerySet(self.model, query=self.create_query())

    def create_query(self):
        # connection: the slave wrapper from step 1, instantiated elsewhere
        return db.models.sql.Query(self.model, connection)

Page 78: Scaling Django

Replication

Example on GitHub:

http://github.com/mmalone/django-multidb/

Page 79: Scaling Django

Replication

• Goal:

• Read-what-you-write consistency for writer

• Eventual consistency for everyone else

• Slave lag screws things up
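A minimal sketch of the read-what-you-write goal: writes go to the master, a user who just wrote is pinned to the master for a few seconds so slave lag can't hide their own change, and everyone else reads from a random slave and sees the write eventually. `ReplicationRouter`, its `read_db`/`write_db` methods and the 5-second `pin_seconds` (an assumed upper bound on slave lag) are all hypothetical.

```python
import random
import time


class ReplicationRouter:
    """Send writes to the master and reads to slaves, with read-your-writes.

    A user who wrote within the last pin_seconds reads from the master;
    everyone else gets a random slave.
    """

    def __init__(self, master, slaves, pin_seconds=5.0, clock=time.monotonic):
        self.master = master
        self.slaves = list(slaves)
        self.pin_seconds = pin_seconds  # assumed upper bound on slave lag
        self.clock = clock
        # user id -> time of that user's last write. With several app
        # servers this would need to live in memcached or the session.
        self._last_write = {}

    def write_db(self, user_id):
        self._last_write[user_id] = self.clock()
        return self.master

    def read_db(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return self.master
        return random.choice(self.slaves)
```

If lag ever exceeds `pin_seconds` the writer can still see stale data, so the window should be sized from measured slave lag rather than guessed.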

Page 80: Scaling Django

Replication


What happens when you become write saturated?

Page 81: Scaling Django

Federation

Page 82: Scaling Django

Federation


• Start with Vertical Partitioning: move tables that aren’t joined with each other onto separate database servers

• Actually pretty easy

• Except not with Django

Page 83: Scaling Django

Federation


django.db.models.base

FAIL!

Page 84: Scaling Django

Federation

If the Django pony gets kicked every time someone uses {% endifnotequal %}, I don’t want to know what happens every time django.db.connection is imported.

http://www.flickr.com/photos/captainmidnight/811458621/

Page 85: Scaling Django

Federation

• At some point you’ll need to split a single table across databases (e.g., user table)

• Now auto-increment won’t work

• But Django uses auto-increment for PKs

• ugh

• Pluggable UUID backend?
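Once one table spans several databases, each server's auto-increment counter would hand out colliding ids. A minimal sketch of the UUID route: generate the primary key in the application with `uuid.uuid4()` and use the id itself to pick a shard. The `SHARDS` names and both helpers are hypothetical.

```python
import uuid

SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]  # hypothetical


def new_user_id():
    """Generate a primary key in the app; no shared auto-increment counter."""
    return uuid.uuid4()


def shard_for(user_id, shards=SHARDS):
    """Pick a shard from the id itself.

    uuid4 values are random, so user_id.int % len(shards) spreads rows
    evenly and gives the same answer on every server (unlike Python's
    built-in hash() of str/bytes, which is randomized per process).
    """
    return shards[user_id.int % len(shards)]
```

One trade-off of this sketch: plain modulo placement means changing the number of shards moves most rows, which is the same problem consistent hashing solves for the cache tier.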

Page 86: Scaling Django

Profiling, Monitoring & Measuring

Page 87: Scaling Django

Know your SQL

>>> Article.objects.filter(pk=3).query.as_sql()
('SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article" WHERE "app_article"."id" = %s ', (3,))

Page 88: Scaling Django

Know your SQL

>>> import sqlparse
>>> def pp_query(qs):
...     t = qs.query.as_sql()
...     sql = t[0] % t[1]
...     print sqlparse.format(sql, reindent=True, keyword_case='upper')
...
>>> pp_query(Article.objects.filter(pk=3))
SELECT "app_article"."id", "app_article"."name", "app_article"."author_id"
FROM "app_article"
WHERE "app_article"."id" = 3

Page 89: Scaling Django

Know your SQL

>>> from django.db import connection
>>> connection.queries
[{'time': '0.001', 'sql': u'SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article"'}]

connection.queries is only populated when DEBUG = True.

Page 90: Scaling Django

Know your SQL

• It’d be nice if a lightweight stacktrace could be done in QuerySet.__init__

• Stick the result in connection.queries

• Now we know where the query originated
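A minimal sketch of tagging each query with where it came from, using a toy `QuerySet` in plain Python 3 rather than Django's: `__init__` captures the innermost few caller frames with `traceback.extract_stack`, and `execute` stores them next to the SQL. `QuerySet`, `QUERY_LOG` (standing in for `connection.queries`) and `recent_articles` are hypothetical.

```python
import traceback

QUERY_LOG = []  # stands in for django.db.connection.queries


class QuerySet:
    """Toy query object that remembers which code created it (hypothetical)."""

    def __init__(self, sql):
        self.sql = sql
        # Keep only the innermost few frames; the last one is this
        # __init__ itself, so drop it and keep the callers.
        self.origin = traceback.extract_stack(limit=4)[:-1]

    def execute(self):
        QUERY_LOG.append({
            "sql": self.sql,
            "origin": ["%s:%d in %s" % (f.filename, f.lineno, f.name)
                       for f in self.origin],
        })
        return []  # a real QuerySet would return rows here


def recent_articles():
    return QuerySet("SELECT * FROM app_article ORDER BY id DESC LIMIT 10").execute()
```

Capturing at construction time rather than at execution time matters because lazy QuerySets are often evaluated far from the code that built them, for example inside a template.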

Page 91: Scaling Django

Measuring


Django Debug Toolbar

http://github.com/robhudson/django-debug-toolbar/

Page 92: Scaling Django

Monitoring

• Ganglia

• Munin


You can’t improve what you don’t measure.

Page 93: Scaling Django

Measuring & Monitoring

• Measure

• Server load, CPU usage, I/O

• Database QPS

• Memcache QPS, hit rate, evictions

• Queue lengths

• Anything else interesting
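Memcached reports cumulative counters through its `stats` command (`cmd_get`, `get_hits`, `get_misses` and `evictions`, among others), so rates come from the difference between two snapshots. A minimal sketch, assuming each snapshot is a dict of string values like those the `stats` command returns; `cache_health` is a hypothetical helper whose output you would feed to Ganglia or Munin.

```python
def cache_health(prev, curr, interval_seconds):
    """Summarize two memcached `stats` snapshots taken interval_seconds apart."""

    def delta(name):
        # memcached counters are cumulative since the server started
        return int(curr[name]) - int(prev[name])

    hits, misses = delta("get_hits"), delta("get_misses")
    lookups = hits + misses
    return {
        "gets_per_sec": delta("cmd_get") / interval_seconds,
        "hit_rate": hits / lookups if lookups else None,
        "evictions": delta("evictions"),
    }
```

A falling hit rate together with rising evictions usually means the cache is too small for the working set, not that the caching code is wrong.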

Page 94: Scaling Django

All done... Questions?

Page 95: Scaling Django

Contact Me

Mike Malone

twitter.com/mjmalone
