Python, async web frameworks, and MongoDB

Python, MongoDB, and asynchronous web frameworks
A. Jesse Jiryu Davis
emptysquare.net

description

A talk covering the state of the art for writing asynchronous web applications using Python and MongoDB.

Transcript of Python, async web frameworks, and MongoDB

Page 1: Python, async web frameworks, and MongoDB

Python, MongoDB, and asynchronous web frameworks

A. Jesse Jiryu Davis

Page 2: Python, async web frameworks, and MongoDB

Agenda

• Talk about web services in a really dumb (“abstract”?) way
• Explain when we need async web servers
• Why is async hard?
• What is Tornado and how does it work?
• Why am I writing a new PyMongo wrapper to work with Tornado?
• How does my wrapper work?

Page 3: Python, async web frameworks, and MongoDB

CPU-bound web service

[Diagram: a Client connected to a Server by a socket]

• No need for async
• Just spawn one process per core
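A minimal sketch of the "one process per core" advice, using only the standard library. The names `cpu_bound_work` and `handle_batch` are hypothetical, and repeated hashing is just a stand-in for whatever CPU-heavy work a real service does.

```python
import hashlib
import multiprocessing
import os


def cpu_bound_work(payload):
    """Hypothetical CPU-heavy request handler: hash the payload repeatedly."""
    digest = payload
    for _ in range(1000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def handle_batch(payloads, workers=None):
    """Spread requests over a pool sized to the number of cores."""
    workers = workers or os.cpu_count() or 1
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(cpu_bound_work, payloads)
```

Call `handle_batch([...])` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Adding more processes than cores would not help here, because every worker is already busy computing rather than waiting.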

Page 4: Python, async web frameworks, and MongoDB

Normal web service

[Diagram: a Client connected to a Server by a socket; the Server connected to a Backend (DB, web service, SAN, …) by a socket]

• Assume backend is unbounded
• Service is bound by:
  • Context-switching overhead
  • Memory!

Page 5: Python, async web frameworks, and MongoDB

What’s async for?

• Minimize resources per connection
• I.e., wait for backend as cheaply as possible

Page 6: Python, async web frameworks, and MongoDB

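CPU- vs. Memory-bound

[Diagram: a spectrum running from CPU-bound to memory-bound. Crypto sits toward the CPU-bound end and chat toward the memory-bound end, with “Most web services?” placed along the spectrum.]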

Page 7: Python, async web frameworks, and MongoDB

HTTP long-polling (“COMET”)

• E.g., chat server
• Async’s killer app
• Short-polling is CPU-bound: tradeoff between latency and load
• Long-polling is memory bound
• “C10K problem”: kegel.com/c10k.html
• Tornado was invented for this
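The polling tradeoff can be put in rough numbers. This is a back-of-the-envelope sketch, not a measurement: it assumes every client polls on a fixed timer, that new messages arrive at a uniformly random moment within the interval, and that each idle long-poll connection costs a fixed amount of memory. The function names and the figures in the usage note are illustrative only.

```python
def short_poll_load(clients, interval_s):
    """Requests per second the server handles when every client polls on a timer."""
    return clients / interval_s


def short_poll_mean_latency(interval_s):
    """Average delay before a client sees a new message (uniform-arrival assumption)."""
    return interval_s / 2


def long_poll_memory_mb(clients, kb_per_connection):
    """Memory held by idle long-poll connections, in megabytes."""
    return clients * kb_per_connection / 1024
```

With 10,000 clients polling every 5 seconds, `short_poll_load(10000, 5)` gives 2,000 requests per second for an average latency of 2.5 seconds; shortening the interval cuts latency but raises load in proportion. With long-polling the same 10,000 clients generate almost no request load while idle, but at an assumed 64 KB per connection they hold `long_poll_memory_mb(10000, 64)` = 625 MB, which is why memory per connection becomes the limit.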

Page 8: Python, async web frameworks, and MongoDB

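Why is async hard to code?

[Sequence diagram over time, with Client, Server, and Backend: the client sends a request to the server; the server sends a request to the backend and must store state while it waits; the backend's response comes back; the server sends its response to the client.]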
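The “store state” step is what makes callback-style code awkward: the function that starts the backend request returns before the response arrives, so anything needed later has to be kept somewhere other than local variables. A toy sketch using only the standard library follows; `fake_fetch`, `Handler`, and `run_pending` are hypothetical names, and the deque stands in for a real event loop.

```python
from collections import deque

pending = deque()  # callbacks the toy loop will run later


def fake_fetch(url, callback):
    """Hypothetical non-blocking fetch: returns at once, calls back later."""
    pending.append(lambda: callback("body of " + url))


class Handler:
    def __init__(self, name):
        self.name = name    # state that must survive until the callback runs
        self.output = None

    def get(self):
        fake_fetch("http://example.com", callback=self.on_response)
        # get() returns here; its local variables are gone

    def on_response(self, response):
        # Only attributes stored on self are still available
        self.output = "%s: %s" % (self.name, response)


def run_pending():
    """Toy event loop: run every scheduled callback in order."""
    while pending:
        pending.popleft()()
```

After `h = Handler("req-1"); h.get()`, `h.output` is still `None`; it is filled in only once `run_pending()` delivers the response.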

Page 9: Python, async web frameworks, and MongoDB

Ways to store state (this slide is in beta)

[Chart: coding difficulty on one axis, memory per connection on the other. Plotted points: Multithreading; Tornado, Node.js; Greenlets / Gevent.]

Page 10: Python, async web frameworks, and MongoDB

What’s a greenlet?

• A.K.A. “green threads”
• A feature of Stackless Python, packaged as a module for standard Python
• Greenlet stacks are stored on heap, copied to / from OS stack on resume / pause
• Cooperative
• Memory-efficient
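To illustrate only the “Cooperative” bullet, here is a standard-library sketch that uses generators rather than greenlets. Generators are a stand-in: they can pause only at an explicit `yield` in their own body, whereas a greenlet can switch from anywhere in its call stack. The names `worker` and `run_round_robin` are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does a little work, then voluntarily gives up control."""
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # cooperative pause: hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not requeue it
        queue.append(task)
```

Nothing preempts a task here: if one never yields, the others never run. That is the price of cooperative scheduling, and also why it needs no locks around the shared `log`.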

Page 11: Python, async web frameworks, and MongoDB

Threads: State stored on OS stacks

    # pseudo-Python
    sock = listen()
    request = parse_http(sock.recv())
    mongo_data = db.collection.find()
    response = format_response(mongo_data)
    sock.sendall(response)
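A runnable version of the same idea, as a sketch under stated assumptions: `fake_find` is a hypothetical stand-in for `db.collection.find()`, and `handle` and `serve_one` are illustrative names. Each connection gets its own thread, so the request and the query result are ordinary local variables on that thread's stack.

```python
import socket
import threading
import time


def fake_find():
    """Hypothetical stand-in for db.collection.find(); blocks like a real query."""
    time.sleep(0.01)
    return [{"x": 1}]


def handle(conn):
    """Serve one connection; per-request state lives in this thread's locals."""
    with conn:
        request = conn.recv(1024)  # blocks this thread only
        docs = fake_find()         # blocks this thread only
        body = "%d docs for %s" % (len(docs), request.decode().strip())
        conn.sendall(body.encode())


def serve_one(server_side):
    """Start one thread for one connection, as a thread-per-connection server would."""
    thread = threading.Thread(target=handle, args=(server_side,))
    thread.start()
    return thread
```

The code reads top to bottom like the pseudo-Python above, which is the appeal of threads; the cost is one OS stack per open connection.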

Page 12: Python, async web frameworks, and MongoDB

Gevent: State stored on greenlet stacks

    # pseudo-Python
    from gevent import monkey; monkey.patch_all()
    sock = listen()
    request = parse_http(sock.recv())
    mongo_data = db.collection.find()
    response = format_response(mongo_data)
    sock.sendall(response)

Page 13: Python, async web frameworks, and MongoDB

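Tornado: State stored in RequestHandler

    import tornado.web
    from tornado.httpclient import AsyncHTTPClient

    # Callback-style API as shown on the slide (Tornado releases before 6.0)
    class MainHandler(tornado.web.RequestHandler):
        @tornado.web.asynchronous
        def get(self):
            AsyncHTTPClient().fetch(
                "http://example.com", callback=self.on_response)

        def on_response(self, response):
            # format_response is the same placeholder used on the earlier slides
            formatted = format_response(response)
            self.write(formatted)
            self.finish()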

Page 14: Python, async web frameworks, and MongoDB

Tornado IOStream

    class IOStream(object):
        def read_bytes(self, num_bytes, callback):
            # Stored under underscore names so the attribute
            # does not overwrite the read_bytes method itself
            self._read_bytes = num_bytes
            self._read_callback = callback
            io_loop.add_handler(
                self.socket.fileno(),
                self.handle_events,
                events=READ)

        def handle_events(self, fd, events):
            data = self.socket.recv(self._read_bytes)
            self._read_callback(data)

Page 15: Python, async web frameworks, and MongoDB

Tornado IOLoop

    class IOLoop(object):
        def add_handler(self, fd, handler, events):
            self._handlers[fd] = handler
            # _impl is epoll or kqueue or ...
            self._impl.register(fd, events)

        def start(self):
            while True:
                event_pairs = self._impl.poll()
                for fd, events in event_pairs:
                    self._handlers[fd](fd, events)
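The same register-then-dispatch pattern can be tried out with the standard library's `selectors` module, which picks epoll, kqueue, or select for the platform. This is a toy sketch, not Tornado's implementation; `MiniLoop` and `run_demo` are hypothetical names, and unlike the slide's loop this one exits once no handlers remain so the demo can finish.

```python
import selectors
import socket


class MiniLoop:
    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._handlers = {}  # file descriptor -> callback

    def add_handler(self, sock, handler):
        self._handlers[sock.fileno()] = handler
        self._selector.register(sock, selectors.EVENT_READ)

    def remove_handler(self, sock):
        del self._handlers[sock.fileno()]
        self._selector.unregister(sock)

    def start(self):
        # Poll for ready sockets and dispatch to the registered callback
        while self._handlers:
            for key, events in self._selector.select(timeout=1.0):
                self._handlers[key.fd](key.fileobj, events)


def run_demo():
    loop = MiniLoop()
    reader, writer = socket.socketpair()
    received = []

    def on_readable(sock, events):
        received.append(sock.recv(1024))
        loop.remove_handler(sock)  # no handlers left, so the loop exits

    loop.add_handler(reader, on_readable)
    writer.sendall(b"ping")
    loop.start()
    reader.close()
    writer.close()
    return received
```

One thread serves every socket: the loop sleeps in `select()` until some socket is ready, then runs that socket's callback to completion before polling again.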

Page 16: Python, async web frameworks, and MongoDB

Python, MongoDB, & concurrency

• Threads work great with pymongo
• Gevent works great with pymongo
  – monkey.patch_socket(); monkey.patch_thread()
• Tornado works so-so
  – asyncmongo
    • No replica sets, only first batch, no SON manipulators, no document classes, …
  – pymongo
    • OK if all your queries are fast
    • Use extra Tornado processes

Page 17: Python, async web frameworks, and MongoDB

Introducing: “Motor”

• Mongo + Tornado
• Experimental
• Might be official in a few months
• Uses Tornado IOLoop and IOStream
• Presents standard Tornado callback API
• Stores state internally with greenlets
• github.com/ajdavis/mongo-python-driver/tree/tornado_async

Page 18: Python, async web frameworks, and MongoDB

Motor

    class MainHandler(tornado.web.RequestHandler):
        def initialize(self):
            # Tornado's per-request setup hook; the slide used __init__(self),
            # which does not match RequestHandler's constructor signature
            self.c = MotorConnection()

        @tornado.web.asynchronous
        def post(self):
            # No-op if already open
            self.c.open(callback=self.connected)

        def connected(self, c, error):
            self.c.collection.insert(
                {'x': 1}, callback=self.inserted)

        def inserted(self, result, error):
            self.write('OK')
            self.finish()

Page 19: Python, async web frameworks, and MongoDB

Motor internals

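[Sequence diagram: time runs along one axis and stack depth along the other. Participants: Client, IOLoop, RequestHandler, and pymongo running on a greenlet. Labeled events: request, start, IOStream.sendall(callback), switch(), return, schedule callback, switch(), parse Mongo response, callback(), callback(), HTTP response.]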

Page 20: Python, async web frameworks, and MongoDB

Motor internals: wrapper

    import greenlet
    import tornado.ioloop

    class MotorCollection(object):
        def insert(self, *args, **kwargs):
            callback = kwargs.pop('callback')
            kwargs['safe'] = True

            def call_insert():
                # Runs on child greenlet
                result, error = None, None
                try:
                    sync_insert = self.sync_collection.insert
                    result = sync_insert(*args, **kwargs)
                except Exception as e:
                    error = e

                # Schedule the callback to be run on the main greenlet
                tornado.ioloop.IOLoop.instance().add_callback(
                    lambda: callback(result, error)
                )

            # Start child greenlet
            greenlet.greenlet(call_insert).switch()
            return

(The slide marks steps 1, 2, 3, 6, and 8 alongside this code.)

Page 21: Python, async web frameworks, and MongoDB

Motor internals: fake socket

    class MotorSocket(object):
        def __init__(self, socket):
            # Makes socket non-blocking
            self.stream = tornado.iostream.IOStream(socket)

        def sendall(self, data):
            child_gr = greenlet.getcurrent()

            # This is run by IOLoop on the main greenlet
            # when data has been sent;
            # switch back to child to continue processing
            def sendall_callback():
                child_gr.switch()

            self.stream.write(data, callback=sendall_callback)

            # Resume main greenlet
            child_gr.parent.switch()

(The slide marks steps 4, 5, and 7 alongside this code.)

Page 22: Python, async web frameworks, and MongoDB

Motor

• Shows a general method for asynchronizing synchronous network APIs in Python
• Who wants to try it with MySQL? Thrift?
• (Bonus round: resynchronizing Motor for testing)

Page 23: Python, async web frameworks, and MongoDB

Questions?

A. Jesse Jiryu Davis

(10gen is hiring, of course: 10gen.com/careers)