
Scaling Django Web Apps

Mike Malone

san francisco meetup

Tuesday, May 26, 2009

... & some Django Patterns & Best Practices


Hi, I’m Mike.


http://www.flickr.com/photos/kveton/2910536252/


Pownce

• Large scale

• Hundreds of requests/sec

• Thousands of DB operations/sec

• Millions of user relationships

• Millions of notes

• Terabytes of static data




Pownce

• Encountered and eliminated many common scaling bottlenecks

• Real world example of scaling a Django app

• Django provides a lot for free

• I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way



Scalability



Scalability

Scalability is NOT:

• Speed / Performance

• Generally affected by language choice

• Achieved by adopting a particular technology



A Scalable Application

import time

def application(environ, start_response):
    time.sleep(10)
    start_response('200 OK', [('content-type', 'text/plain')])
    return ('Hello, world!',)



A High Performance Application

def application(environ, start_response):
    remote_addr = environ['REMOTE_ADDR']
    f = open('access-log', 'a+')
    f.write(remote_addr + "\n")
    f.flush()
    f.seek(0)
    hits = sum(1 for l in f.xreadlines()
               if l.strip() == remote_addr)
    f.close()
    start_response('200 OK', [('content-type', 'text/plain')])
    return (str(hits),)



Scalability


A scalable system doesn’t need to change when the size of the problem changes.



Scalability

• Accommodate increased usage

• Accommodate increased data

• Maintainable




Scalability

• Two kinds of scalability

• Vertical scalability: buying more powerful hardware, replacing what you already own

• Horizontal scalability: buying additional hardware, supplementing what you already own




Vertical Scalability

• Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)

• Inherently limited by current technology

• But it’s easy! If you can get away with it, good for you.




Vertical Scalability

“Sky scrapers are special. Normal buildings don’t need 10 floor foundations. Just build!”

- Cal Henderson



Horizontal Scalability


The ability to increase a system’s capacity by adding more processing units (servers)





It’s how large apps are scaled.




• A lot more work to design, build, and maintain

• Requires some planning, but you don’t have to do all the work up front

• You can scale progressively...

• Rest of the presentation is roughly in order if you’re scaling as you go...



Caching



Caching

• Several levels of caching available in Django

• Per-site cache: caches every page that doesn’t have GET or POST parameters

• Per-view cache: caches output of an individual view

• Template fragment cache: caches fragments of a template

• None of these are that useful if pages are heavily personalized




Caching

• Low-level Cache API

• Much more flexible, allows you to cache at any granularity

• At Pownce we typically cached

• Individual objects

• Lists of object IDs

• Hard part is invalidation




Caching

• Cache backends:

• Memcached

• Database caching

• Filesystem caching




Caching


Use Memcache.



Sessions


Use Memcache.



Sessions

Or Tokyo Cabinet

http://github.com/ericflo/django-tokyo-sessions/

Thanks @ericflo





Caching

Basic caching comes free with Django:

from django.core.cache import cache
from django.db import models

class UserProfile(models.Model):
    ...
    def get_social_network_profiles(self):
        cache_key = 'networks_for_%s' % self.user.id
        profiles = cache.get(cache_key)
        if profiles is None:
            # Cache miss: hit the database, then remember the result
            profiles = self.user.social_network_profiles.all()
            cache.set(cache_key, profiles)
        return profiles



Caching

Invalidate when a model is saved or deleted:

from django.core.cache import cache
from django.db.models import signals

def nuke_social_network_cache(sender, instance, **kwargs):
    # Signal receivers get the sender class and the saved/deleted instance
    cache_key = 'networks_for_%s' % instance.user_id
    cache.delete(cache_key)

signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile)



Caching


• Invalidate post_save, not pre_save

• Still a small race condition

• Simple solution, worked for Pownce:

• Instead of deleting, set the cache key to None for a short period of time

• Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
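The two tricks above can be sketched without a real memcached. TinyCache below is a hypothetical in-memory stand-in (not Django's cache API), and the names invalidate and cached_fetch are made up for illustration. The idea: invalidation parks None on the key for a few seconds, and readers populate with add, so a reader holding stale data cannot overwrite the key right after a write.

```python
import time


class TinyCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.time():
            self._data.pop(key, None)
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry else None

    def set(self, key, value, timeout):
        self._data[key] = (value, time.time() + timeout)

    def add(self, key, value, timeout):
        # Like memcached's add: store only if nothing is stored for the key.
        if self._live(key) is not None:
            return False
        self.set(key, value, timeout)
        return True


def invalidate(cache, key, tombstone_seconds=5):
    # Instead of deleting, park None on the key for a short period.
    cache.set(key, None, tombstone_seconds)


def cached_fetch(cache, key, load, timeout=300):
    value = cache.get(key)
    if value is None:
        value = load()
        # add() fails while the None marker is present, so a slow reader
        # cannot repopulate the cache with stale data right after a write.
        cache.add(key, value, timeout)
    return value
```

While the marker is alive every read goes to the database; once it expires, the next reader's add succeeds and caching resumes.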






Advanced Caching


• Memcached’s atomic increment and decrement operations are useful for maintaining counts

• But they’re not available in Django 1.0

• Added in 1.1 by ticket #6464



Advanced Caching


• You can still use them if you poke at the internals of the cache object a bit

• cache._cache is the underlying cache object

try:
    result = cache._cache.incr(cache_key, delta)
except ValueError:  # nonexistent key raises ValueError
    # Do it the hard way, store the result.
return result
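A runnable sketch of the same pattern, assuming a client whose incr raises ValueError on a missing key as described above. FakeMemcacheClient, bump, and count_from_source are hypothetical names; "the hard way" is assumed here to mean recomputing the count from the source of truth.

```python
class FakeMemcacheClient:
    """Hypothetical stand-in: incr() on a missing key raises ValueError."""

    def __init__(self):
        self.store = {}

    def incr(self, key, delta=1):
        if key not in self.store:
            raise ValueError('key not found')
        self.store[key] += delta
        return self.store[key]

    def add(self, key, value):
        # Store only if the key is absent.
        if key in self.store:
            return False
        self.store[key] = value
        return True


def bump(client, key, delta, count_from_source):
    try:
        return client.incr(key, delta)
    except ValueError:
        # Do it the hard way: recompute the count, then store it.
        # add() avoids clobbering a value another process just seeded.
        value = count_from_source()
        client.add(key, value)
        return value
```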



Advanced Caching


• Other missing cache API

• delete_multi & set_multi

• append: add data to existing key after existing data

• prepend: add data to existing key before existing data

• cas: store this data, but only if no one has edited it since I fetched it



Advanced Caching


• It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)

• User and UserProfile

• fetched almost every request

• rarely change

• But Django won’t let you

• IMO, this is a bug :(
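A minimal sketch of why "forever" gets lost, assuming the backend picks its timeout with an `or` fallback. Memcached treats an expiry of 0 as "never expire", but 0 is falsy in Python. The function names and DEFAULT_TIMEOUT are made up for illustration.

```python
DEFAULT_TIMEOUT = 300


def timeout_with_or(timeout=0):
    # 0 is falsy, so "cache forever" silently becomes the default timeout.
    return timeout or DEFAULT_TIMEOUT


def timeout_with_none_check(timeout=None):
    # Only fall back to the default when no timeout was given at all.
    if timeout is None:
        timeout = DEFAULT_TIMEOUT
    return timeout
```

With the None check, an explicit 0 reaches memcached unchanged.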



The Memcache Backend

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=0):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # timeout=0 (memcached's "never expire") is falsy, so it is
        # replaced by default_timeout here
        return self._cache.add(smart_str(key), value, timeout or self.default_timeout)



The Memcache Backend

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=None):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # Only fall back to the default when no timeout is given,
        # so an explicit 0 is passed through
        if timeout is None:
            timeout = self.default_timeout
        return self._cache.add(smart_str(key), value, timeout)



Advanced Caching


• Typical setup has memcached running on web servers

• Pownce web servers were I/O and memory bound, not CPU bound

• Since we had some spare CPU cycles, we compressed large objects before caching them

• The Python memcache library can do this automatically, but the API is not exposed



Monkey Patching core.cache

from django.core.cache import cache
from django.utils.encoding import smart_str
import inspect as i

if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
    class CacheClass(cache.__class__):
        def set(self, key, value, timeout=None, min_compress_len=150000):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.set(smart_str(key), value, timeout, min_compress_len)
    cache.__class__ = CacheClass



Advanced Caching


• Useful tool: automagic single object cache

• Use a manager to check the cache prior to any single object get by pk

• Invalidate assets on save and delete

• Eliminated several hundred QPS at Pownce
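The single object cache can be sketched in plain Python. CachingManager is a hypothetical stand-in for a Django manager, fetch_from_db stands in for the ORM lookup, and a dict plays the role of the cache; this is not the code from the talk.

```python
class CachingManager:
    """Hypothetical stand-in for a manager that caches get-by-pk."""

    def __init__(self, model_name, fetch_from_db, cache):
        self.model_name = model_name
        self.fetch_from_db = fetch_from_db  # callable: pk -> object
        self.cache = cache                  # any dict-like cache

    def _key(self, pk):
        return 'object:%s:%s' % (self.model_name, pk)

    def get(self, pk):
        # Check the cache before touching the database.
        key = self._key(pk)
        obj = self.cache.get(key)
        if obj is None:
            obj = self.fetch_from_db(pk)
            self.cache[key] = obj
        return obj

    def invalidate(self, pk):
        # Call this from the save and delete hooks.
        self.cache.pop(self._key(pk), None)
```

In a Django project the invalidate call would hang off post_save and post_delete, so callers keep using the manager as usual.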



Advanced Caching


All this and more at:

http://github.com/mmalone/django-caching/





Advanced Caching

• Consistent hashing: hashes cached objects in such a way that most objects map to the same node after a node is added or removed.


http://www.flickr.com/photos/deepfrozen/2191036528/








Consistent Hashing

hash_ring on PyPI
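For intuition, here is a minimal consistent hash ring built from the standard library. It is a sketch of the technique, not the hash_ring package's API; the class name, the replicas parameter, and the node names in the usage note are assumptions.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        # Each node is placed at many points to even out the distribution.
        for i in range(self.replicas):
            point = self._hash('%s#%d' % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash('%s#%d' % (node, i))
            del self._owner[point]
            self._points.remove(point)

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._points, self._hash(key))
        if idx == len(self._points):
            idx = 0
        return self._owner[self._points[idx]]
```

With HashRing(['cache1', 'cache2', 'cache3']), removing 'cache3' only remaps the keys that lived on 'cache3'; keys on the other two nodes stay where they were.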



Caching

Now you’ve made life easier for your DB server, the next thing to fall over is your app server.


Load Balancing



Load Balancing

• Out of the box, Django uses a shared nothing architecture

• App servers have no single point of contention

• Responsibility pushed down the stack (to DB)

• This makes scaling the app layer trivial: just add another server




Load Balancing

Spread work between multiple nodes in a cluster using a load balancer.

• Hardware or software

• Layer 7 or Layer 4

[Diagram: Load Balancer → App Servers → Database]



Load Balancing


• Hardware load balancers

• Expensive, like $35,000 each, plus maintenance contracts

• Need two for failover / high availability

• Software load balancers

• Cheap and easy, but more difficult to eliminate as a single point of failure

• Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx



Load Balancing


• Most of these are layer 7 load balancers, and some software balancers do cool things

• Caching

• Re-proxying

• Authentication

• URL rewriting



Load Balancing

A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.

[Diagram: Hardware Balancers → Software Balancers → App Servers]



Load Balancing


• At Pownce, we used a single Perlbal balancer

• Easily handled all of our traffic (hundreds of simultaneous connections)

• A SPOF, but we didn’t have $100,000 for black box solutions, and weren’t worried about service guarantees beyond three or four nines

• Plus there were some neat features that we took advantage of



Perlbal Reproxying

Perlbal reproxying is a really cool, and really poorly documented, feature.



Perlbal Reproxying

1. Perlbal receives request

2. Forwards request to App Server

   a. App server checks auth (etc.)

   b. Returns HTTP 200 with X-Reproxy-URL header set to internal file server URL

3. File served from file server via Perlbal



Perlbal Reproxying

• Completely transparent to end user

• Doesn’t keep large app server instance around to serve file

• Users can’t access files directly (like they could with a 302)




Perlbal Reproxying

Plus, it’s really easy:

def download(request, filename):
    # Check auth, do your thing
    response = HttpResponse()
    response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
    return response



Load Balancing


Best way to reduce load on your app servers: don’t use them to do hard stuff.


Queuing



Queuing

• A queue is simply a bucket that holds messages until they are removed for processing by clients

• Many expensive operations can be queued and performed asynchronously

• User experience doesn’t have to suffer

• Tell the user that you’re running the job in the background (e.g., transcoding)

• Make it look like the job was done real-time (e.g., note distribution)
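The shape of this is easy to see with the standard library's in-process queue. This is only a sketch: a real deployment would use one of the queue servers, and handle_request, worker, and the upper-casing "work" are placeholders.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    # Consumer: pull jobs off the queue until the None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(job.upper())  # stand-in for the expensive work


def handle_request(note):
    # The request handler only enqueues the job and returns immediately.
    jobs.put(note)
    return 'accepted'
```

Start the worker with threading.Thread(target=worker).start(), call handle_request as often as needed, and put None on the queue to shut the worker down.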




Queuing

• Lots of open source options for queuing

• Ghetto Queue (MySQL + Cron)

• this is the official name.

• Gearman

• TheSchwartz

• RabbitMQ

• Apache ActiveMQ

• ZeroMQ




Queuing

• Lots of fancy features: brokers, exchanges, routing keys, bindings...

• Don’t let that crap get you down, this is really simple stuff

• Biggest decision: persistence

• Does your queue need to be durable and persistent, able to survive a crash?

• This requires logging to disk which slows things down, so don’t do it unless you have to




Queuing

• Pownce used a simple ghetto queue built on MySQL / cron

• Problematic if you have multiple consumers pulling jobs from the queue

• No point in reinventing the wheel, there are dozens of battle-tested open source queues to choose from
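To see why multiple consumers are a problem for a table-backed queue, and one common way around it, here is a sketch using SQLite as a stand-in for MySQL. The jobs table, its columns, and the function names are assumptions, not Pownce's schema. A job is claimed with a conditional UPDATE, so two consumers that selected the same row cannot both win.

```python
import sqlite3


def make_queue():
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE jobs ('
                 'id INTEGER PRIMARY KEY, payload TEXT, claimed_by TEXT)')
    return conn


def enqueue(conn, payload):
    conn.execute('INSERT INTO jobs (payload) VALUES (?)', (payload,))
    conn.commit()


def claim_next(conn, consumer):
    row = conn.execute('SELECT id, payload FROM jobs '
                       'WHERE claimed_by IS NULL ORDER BY id LIMIT 1').fetchone()
    if row is None:
        return None
    job_id, payload = row
    # The WHERE clause makes the claim conditional: if another consumer
    # claimed the row after our SELECT, nothing is updated and we back off.
    cur = conn.execute('UPDATE jobs SET claimed_by = ? '
                       'WHERE id = ? AND claimed_by IS NULL',
                       (consumer, job_id))
    conn.commit()
    return payload if cur.rowcount == 1 else None
```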




Django Standalone Scripts

Consumers need to set up the Django environment:

from django.core.management import setup_environ
from mysite import settings

setup_environ(settings)



Django Standalone Scripts


Great blog post by James Bennett (@ubernostrum)

http://bit.ly/django-standalone-scripts




THE DATABASE!



The Database

• Til now we’ve been talking about

• Shared nothing

• Pushing problems down the stack

• But we have to store a persistent and consistent view of our application’s state somewhere

• Enter, the database...




CAP Theorem

• Three properties of a shared-data system

• Consistency: all clients see the same data

• Availability: all clients can see some version of the data

• Partition Tolerance: system properties hold even when the system is partitioned & messages are lost

• But you can only have two




CAP Theorem

• Big long proof... here’s my version.

• Empirically, seems to make sense.

• Eric Brewer

• Professor at University of California, Berkeley

• Co-founder and Chief Scientist of Inktomi

• Probably smarter than me




CAP Theorem

• The relational database systems we all use were built with consistency as their primary goal

• But at scale our system needs to have high availability and must be partitionable

• The RDBMS’s consistency requirements get in our way

• Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance




The Database

• There are lots of non-relational databases coming onto the scene

• CouchDB

• Cassandra

• Tokyo Cabinet

• But they’re not that mature, and they aren’t easy to use with Django




The Database

• Django has no support for

• Non-relational databases like CouchDB

• Multiple databases (coming soon?)

• If you’re looking for a project, plz fix this.

• Only advice: don’t get too caught up in trying to duplicate the existing ORM API




I Want a Pony

• Save always saves every field of a model

• Causes unnecessary contention and more data transfer

• A better way:

• Use descriptors to determine what’s dirty

• Only update dirty fields when an object is saved
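A rough sketch of the descriptor idea in plain Python, outside the ORM. TrackedField and Note are hypothetical, and save just returns the SQL fragment and parameters it would send (no WHERE clause, no database).

```python
class TrackedField:
    """Descriptor that records which attributes were changed."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get('_values', {}).get(self.name)

    def __set__(self, instance, value):
        values = instance.__dict__.setdefault('_values', {})
        dirty = instance.__dict__.setdefault('_dirty', set())
        if values.get(self.name) != value:
            values[self.name] = value
            dirty.add(self.name)


class Note:
    title = TrackedField()
    body = TrackedField()

    def __init__(self, title, body):
        self.title = title
        self.body = body
        self._dirty.clear()  # freshly loaded: nothing to write yet

    def save(self):
        # Build an UPDATE that only touches the changed columns.
        columns = sorted(self._dirty)
        self._dirty.clear()
        if not columns:
            return None
        assignments = ', '.join('%s = ?' % c for c in columns)
        return ('UPDATE note SET %s' % assignments,
                [getattr(self, c) for c in columns])
```

Changing only body yields an UPDATE that sets only body, so a concurrent change to title is not overwritten.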



Denormalization



Denormalization

• Django encourages normalized data, which is usually good

• But at scale you need to denormalize

• Corollary: joins are evil

• Django makes it really easy to do joins using the ORM, so pay attention




Denormalization

• Start with a normalized database

• Selectively denormalize things as they become bottlenecks

• Denormalized counts, copied fields, etc. can be updated in signal handlers



Replication



Replication

• Typical web app is 80 to 90% reads

• Adding read capacity will get you a long way

• MySQL Master-Slave replication

[Diagram labels: Read & Write; Read only]



Replication

• Django doesn’t make it easy to use multiple database connections, but it is possible

• Some caveats

• Slave lag interacts with caching in weird ways

• You can only save to your primary DB (the one you configure in settings.py)

• Unless you get really clever...




Replication

1. Create a custom database wrapper by subclassing DatabaseWrapper

class SlaveDatabaseWrapper(DatabaseWrapper):
    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',
                'use_unicode': True,
            }
            # Merge in the chosen slave's settings instead of
            # discarding the defaults above
            kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
            self.connection = Database.connect(**kwargs)
            ...
        cursor = CursorWrapper(self.connection.cursor())
        return cursor



Replication

2. Custom QuerySet that uses primary DB for writes

class MultiDBQuerySet(QuerySet):
    ...
    def update(self, **kwargs):
        slave_conn = self.query.connection
        self.query.connection = default_connection
        super(MultiDBQuerySet, self).update(**kwargs)
        self.query.connection = slave_conn



Replication

3. Custom Manager that uses your custom QuerySet

class SlaveDatabaseManager(db.models.Manager):
    def get_query_set(self):
        return MultiDBQuerySet(self.model, query=self.create_query())

    def create_query(self):
        return db.models.sql.Query(self.model, connection)



Replication

Example on github:

http://github.com/mmalone/django-multidb/





Replication

• Goals:

• Read-what-you-write consistency for writer

• Eventual consistency for everyone else

• Slave lag screws things up
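One possible way to get both goals, sketched under assumptions (this is not necessarily what Pownce did): remember when each user last wrote, and send that user's reads to the master until the slaves have had time to catch up. ReplicaRouter, max_lag_seconds, and the 'master' / 'slave' labels are hypothetical.

```python
import time


class ReplicaRouter:
    def __init__(self, max_lag_seconds=5.0, clock=time.time):
        self.max_lag_seconds = max_lag_seconds
        self.clock = clock
        self._last_write = {}  # user id -> time of that user's last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def db_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.max_lag_seconds:
            return 'master'  # the writer must see their own write
        return 'slave'       # everyone else tolerates a little lag
```

The window has to be at least as long as the worst slave lag you expect, which is one reason slave lag is worth monitoring.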




Replication


What happens when you become write saturated?


Federation



Federation


• Start with Vertical Partitioning: split tables that aren’t joined across database servers

• Actually pretty easy

• Except not with Django



Federation


django.db.models.base

FAIL!



Federation

• At some point you’ll need to split a single table across databases (e.g., user table)

• Now auto-increment won’t work

• But Django uses auto-increment for PKs

• So specify your own PKs in the save() method

• Not a bad idea to start with UUIDs from day one since it’s a pain in the ass to migrate




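Federation

class Model(models.Model):
    def save(self, force_insert=False, force_update=False):
        if not self.id:
            # No PK yet: generate one ourselves and force an INSERT.
            # uuid.uuid() is presumably a custom generator, not the
            # standard library (which offers e.g. uuid.uuid4()).
            force_insert = True
            self.id = uuid.uuid()
        return super(Model, self).save(force_insert, force_update)

    class Meta:
        abstract = True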



UUID Generator


http://gist.github.com/117292




Profiling, Monitoring & Measuring



Know your SQL

>>> Article.objects.filter(pk=3).query.as_sql()
('SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article" WHERE "app_article"."id" = %s ', (3,))



Know your SQL

>>> import sqlparse
>>> def pp_query(qs):
...     t = qs.query.as_sql()
...     sql = t[0] % t[1]
...     print sqlparse.format(sql, reindent=True, keyword_case='upper')
...
>>> pp_query(Article.objects.filter(pk=3))
SELECT "app_article"."id", "app_article"."name", "app_article"."author_id"
FROM "app_article"
WHERE "app_article"."id" = 3



Know your SQL

>>> from django.db import connection
>>> connection.queries
[{'time': '0.001', 'sql': u'SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article"'}]



Know your SQL

• It’d be nice if a lightweight stacktrace could be done in QuerySet.__init__

• Stick the result in connection.queries

• Now we know where the query originated
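The stack-capture part is cheap to sketch with the standard library. Here queries is a plain list standing in for connection.queries, and record_query and list_articles are hypothetical; hooking this into QuerySet.__init__ is left out.

```python
import traceback

queries = []  # stand-in for connection.queries


def record_query(sql, depth=3):
    # extract_stack() lists frames oldest-first; drop this function's own
    # frame and keep the few frames closest to the call site.
    frames = traceback.extract_stack()[:-1][-depth:]
    origin = ['%s:%d in %s' % (f.filename, f.lineno, f.name) for f in frames]
    queries.append({'sql': sql, 'origin': origin})


def list_articles():
    # Any code path that issues a query would record its origin like this.
    record_query('SELECT * FROM app_article')
```

After calling list_articles(), the last entry in queries carries the SQL plus an origin list whose final element names list_articles.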




Monitoring & Measuring


Django Debug Toolbar

http://github.com/robhudson/django-debug-toolbar/






• Ganglia (http://ganglia.info)

• Munin (http://munin.projects.linpro.no/)

• Cacti (http://cacti.net)


You can’t improve what you don’t measure.





• All Servers

• CPU Usage

• Disk utilization

• IO Wait

• Memory Usage

• Bandwidth Usage





• Database Servers

• Queries per second

• Open connections

• Slave lag

• Cache hit rate





• Web Servers

• Requests per second

• Response time

• Apache children (or equivalent)





• Cache servers

• Requests per second

• Eviction rate

• LRU reference age

• Average object size

• Cache hit ratio





• Application level

• Queue lengths

• Registration rate

• Anything interesting

• You should be able to correlate your server level metrics (like DB QPS) with application level metrics (like API traffic)



All done... Questions?

Mike Malone

twitter.com/mjmalone




Django Patterns & Best Practices



URLs


Always name URLs.



url(r'^user/(?P<username>\w+)/$', ..., name='user')

{% url user username %}

from django.core.urlresolvers import reverse
reverse('user', kwargs={'username': username})



The ORM


Use Model.objects.get_or_create()



token, created = Token.objects.get_or_create(
    key=access_token.key,
    defaults={'secret': access_token.secret})
if not created:
    token.secret = access_token.secret
    token.save()



Managers


Custom Managers are awesome. You should use them.



Managers


• Custom managers are good for

• Caching

• Denormalization

• Custom SQL

• Complex relationships

• Anything on a model that you want to hide behind a pretty API



Managers

class FollowingDescriptor(object):
    def __get__(self, instance, cls):
        class RelationshipManager(models.Manager):
            def get_query_set(self):
                return User.objects.filter(follower_relationships__user=instance)

            def add(self, user):
                instance.following_relationships.create(to_user=user)

            def remove(self, user):
                try:
                    relationship = instance.following_relationships.get(to_user=user)
                    relationship.delete()
                except ObjectDoesNotExist:
                    pass
        return RelationshipManager()



Class-based Views


Django views are callables that take a request object and return a response object.



Class-based Views


• Just implement the __call__() method

• Views instantiated when urls.py is imported

• View instances are global variables

• Not thread-safe

• Retain state between requests
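A tiny illustration of the shared-state problem, using plain callables instead of Django request and response objects. CountingView and per_request_view are made-up names.

```python
class CountingView:
    """Callable view object; one instance is shared by every request."""

    def __init__(self):
        self.seen = []

    def __call__(self, request):
        # State stored on self outlives the request that wrote it.
        self.seen.append(request)
        return 'seen %d request(s)' % len(self.seen)


# Mirrors what happens when urls.py is imported: the view is built once.
view = CountingView()


def per_request_view(request):
    # Building a fresh instance per request keeps state from leaking.
    return CountingView()(request)
```

Calling view twice reports one request, then two; per_request_view always reports one.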



Class-based Views

• Make your view subclass HttpResponse

• Kind of hacky, but it works

• Instantiated per request

• Thread safe

• Safe to maintain state in the view instances

• Jacob promises to fix the problems with the __call__()-based approach




Class-based Views


__call__()-based approach

http://www.djangosnippets.org/snippets/1071/





Class-based Views



Subclass approach








Subclassy Models

• Model inheritance (abstract base classes and multi-table inheritance) added in Django 1.0

• Useful for creating a common base class

• Pownce: Note superclass would have been nice

• Multi-table inheritance works by creating separate tables for the superclass and each subclass

• But if you fetch an object via the superclass manager, you get an instance of the superclass... lame.



Subclassy Models



Use a custom Manager & QuerySet to return an instance of the base class




All done... Questions?







Contact Me
