PyCon2013

Post on 08-Nov-2014


Async I/O for Python 3 (PyCon 2013 keynote)

Guido van Rossum
guido@python.org

This all started on python-ideas...

• When someone proposed to fix asyncore.py
• http://mail.python.org/pipermail/python-ideas/2012-September/016185.html
– Subject: asyncore: included batteries don't fit
– Date: September 22, 2012
– By October 6 it was a centithread
– On October 12 I started several new threads
– On December 12 I first posted PEP 3156

Take a deep breath

What is async I/O?

• Do something else while waiting for I/O
• It's an old idea (as old as computers)
• With lots of approaches
– threads, callbacks, events...

• I'll come back to this later

Why async I/O?

• I/O is slow compared to other work
– the CPU is not needed to do I/O
• Keep a UI responsive
– avoid beach ball while loading a url
• Want to do several/many I/O things at once
– some complex client apps
– typical server apps

Why not use threads?

• (Actually you may if they work for you!)
• OS threads are relatively expensive
• Max # open sockets >> max # threads
• Preemptive scheduling causes races
– "solved" with locks

Async I/O without threads

• select(), poll(), etc.
• asyncore :-(
• write your own
• frameworks, e.g. Twisted, Tornado, zeroMQ
• Wrap C libraries, e.g. libevent, libev, libuv
• Stackless, gevent, eventlet
• (Some overlap)

Downsides

• Too many choices
• Nobody likes callbacks
• APIs not always easy
• Standard library doesn't cooperate

So, about gevent...

• Scary implementation details
– x86 CPython specific stack-copying code
• Monkey-patching
– "patch-and-pray"
• Don't know when it task switches
– could be not enough
– could be unexpected

So what to do?

No, really!

Let's standardize the event loop

• At the bottom of all of these is an event loop
– (that is, all except OS threads)
• Event loop multiplexes I/O
• Various other features also common

Why is the event loop special?

• Serializes event handling
– handle only one event at a time
• There should be only one
– otherwise it's not serializing events
• Each framework has its own event loop API
– even though the functionality has much overlap

What functionality is needed?

• start, stop running the loop
– variant: always running
• schedule callback DT in the future (may be 0)
– also: repeated timer callback
• set callback for file descriptor when ready
– variant: call when I/O done
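A minimal sketch of the first two requirements (start/stop and timed callbacks), assuming a toy class named ToyLoop that is invented here for illustration and is not part of Tulip or PEP 3156. It has no file descriptor support; a real loop would wait in select() or a similar call instead of sleeping.

```python
import heapq
import itertools
import time


class ToyLoop:
    """Hypothetical minimal loop: timed callbacks only, no I/O multiplexing."""

    def __init__(self):
        self._timers = []               # heap of (when, seq, callback, args)
        self._seq = itertools.count()   # tie-breaker keeps FIFO order
        self._stopped = False

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._timers, (when, next(self._seq), callback, args))

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def stop(self):
        self._stopped = True

    def run(self):
        """Run until stop() is called or nothing is left to do."""
        self._stopped = False
        while self._timers and not self._stopped:
            when, _, callback, args = heapq.heappop(self._timers)
            delay = when - time.monotonic()
            if delay > 0:
                time.sleep(delay)       # a real loop would wait for I/O here
            callback(*args)             # events are handled one at a time


events = []
loop = ToyLoop()
loop.call_later(0.02, events.append, "later")
loop.call_soon(events.append, "soon")
loop.run()
print(events)
```

The heap orders callbacks by due time, so the call_soon() callback runs before the one scheduled 20 ms out even though it was registered second.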

Interop

• Most frameworks don't interoperate
• There's a small cottage industry adapting the event loop from framework X to be usable with framework Y
– Tornado now maintains a Twisted adapter
– There's also a zeroMQ adapter for Tornado
– I hear there's a gevent fork of Tornado
– etc.

Enter PEP 3156 and Tulip

I know this is madness

• Why can't we all just use Tornado?
• Let's just import Twisted into the stdlib
• Standardizing gevent solves all its problems
– no more monkey-patching
– greenlets in the language
• Or maybe use Stackless Python?
• Why reinvent the wheel?
– libevent/ev/uv is the industry standard

Again: PEP 3156 and Tulip

• I like to write clean code from scratch
• I also like to learn from others
• I really like clean interfaces
• PEP 3156 and Tulip satisfy all my cravings

What is PEP 3156? What is Tulip?

• PEP 3156:
– standard event loop interface
– slated for Python 3.4
• Tulip:
– experimental prototype (currently)
– reference implementation (eventually)
– additional functionality (maybe)
– works with Python 3.3 (always)

PEP 3156 is not just an event loop

• It's also an interface to change the event loop implementation (to another conforming one)
– this is the path to framework interop
– (even gevent!)
• It also proposes a new way of writing callbacks
– (that doesn't actually use callbacks)

But first, the event loop

• Influenced by both Twisted and Tornado
• Reviewed by (some) other stakeholders
• The PEP is not in ideal state yet
• I am going to sprint Mon-Tue on PEP and Tulip

Event loop method groups

• starting/stopping the loop
• basic callbacks
• I/O callbacks
• thread interactions
• socket I/O operations
• higher-level network operations

Starting/stopping the event loop

• run() # runs until nothing more to do
• run_forever()
• run_once([timeout])
• run_until_complete(future, [timeout])
• stop()

• May change these around a bit

Basic callbacks

• call_soon(callback, *args)
• call_later(delay, callback, *args)
• call_repeatedly(interval, callback, *args)
• call_soon_threadsafe(callback, *args)

• All return a Handler instance which can be used to cancel the callback
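A short sketch of these calls, assuming today's asyncio module (the stdlib descendant of Tulip) rather than the 2013 draft API, so details may differ from the slide. In asyncio the returned object is called a Handle, and call_repeatedly() has no counterpart there, so it is left out. The function name demo_callbacks is made up for this example.

```python
import asyncio


def demo_callbacks():
    loop = asyncio.new_event_loop()
    seen = []
    try:
        loop.call_soon(seen.append, "soon")
        loop.call_later(0.01, seen.append, "later")
        handle = loop.call_later(0.02, seen.append, "cancelled")
        handle.cancel()                      # this callback never runs
        loop.call_later(0.05, loop.stop)     # ends run_forever()
        loop.run_forever()
    finally:
        loop.close()
    return seen


print(demo_callbacks())
```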

I/O callbacks

• add_reader(fd, callback, *args) -> Handler
• remove_reader(fd)
• add_writer(fd, callback, *args) -> Handler
• remove_writer(fd)
• Not all fd types are always acceptable
• fd may be an object with a fileno() method
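A sketch of a ready-callback on a socket, again assuming today's asyncio rather than the draft API. It uses a selector-based loop explicitly because add_reader() is a readiness-style call; the socket pair and the name demo_add_reader are illustrative choices, not something from the talk.

```python
import asyncio
import socket


def demo_add_reader():
    loop = asyncio.SelectorEventLoop()
    rsock, wsock = socket.socketpair()
    rsock.setblocking(False)
    received = []

    def on_readable():
        # Called by the loop once rsock has data waiting.
        received.append(rsock.recv(100))
        loop.remove_reader(rsock)
        loop.stop()

    try:
        loop.add_reader(rsock, on_readable)   # socket object has fileno()
        loop.call_soon(wsock.send, b"ping")
        loop.run_forever()
    finally:
        loop.close()
        rsock.close()
        wsock.close()
    return received


print(demo_add_reader())
```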

UNIX signals

• add_signal_handler(sig, callback, *args) -> Handler

• remove_signal_handler(sig)

• Raise RuntimeError if signals are unsupported

Thread interactions

• wrap_future(future) -> Future
• run_in_executor(executor, callback, *args) -> Future
• Used to run code in another thread
– sometimes there is no alternative
– e.g. getaddrinfo(), database connections

• Threads may use call_soon_threadsafe()
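A sketch of handing a blocking call to a worker thread, assuming asyncio's run_in_executor() as it exists today. The function blocking_lookup is a hypothetical stand-in for something like getaddrinfo(); it only sleeps briefly and reports which thread ran it.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(name):
    """Hypothetical stand-in for a blocking call such as getaddrinfo()."""
    time.sleep(0.01)
    return name.upper(), threading.current_thread().name


def demo_executor():
    loop = asyncio.new_event_loop()
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        try:
            # Returns a Future that the loop can wait on without blocking.
            future = loop.run_in_executor(pool, blocking_lookup, "example")
            return loop.run_until_complete(future)
        finally:
            loop.close()


result = demo_executor()
print(result)
```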

Socket I/O operations

• sock_recv(sock, nbytes) -> Future
• sock_sendall(sock, data) -> Future
• sock_accept(sock) -> Future
• sock_connect(sock, address) -> Future

• Only transports should use these
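A sketch of two of these operations, assuming today's asyncio, where the same method names survive but are awaited with `await` instead of yield-from. The connected socket pair and the name exchange are illustrative only; as the slide says, application code would normally go through transports instead.

```python
import asyncio
import socket


async def exchange():
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    a.setblocking(False)        # the loop's sock_* methods need
    b.setblocking(False)        # non-blocking sockets
    try:
        await loop.sock_sendall(a, b"xyzzy")
        return await loop.sock_recv(b, 100)
    finally:
        a.close()
        b.close()


reply = asyncio.run(exchange())
print(reply)
```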

High-level network operations

• getaddrinfo(host, port, ...) -> Future
• getnameinfo(address, [flags]) -> Future
• create_connection(factory, host, port, ...) -> Future
• start_serving(factory, host, port, ...) -> Future

• Use these in your high-level code

Um, Futures?

• Like PEP 3148 Futures (new in Python 3.2):
– from concurrent.futures import Future
– f.set_result(x), f.set_exception(e)
– f.result(), f.exception()
– f.add_done_callback(func)
– wait(fs, [timeout, [flags]]) -> (done, not_done)
– as_completed(fs, [timeout]) -> <iterator>

• However, adapted for use with coroutines

Um, coroutines?

• Whoops, let me get back to that later

What's a Future?

• Abstraction for a value to be produced later
– Also known as Promises (check Wikipedia)
– Per Wikipedia, these are explicit futures
• API:
– result() blocks until result is ready
– an exception is a "result" too: will be raised!
– exception() blocks and checks for exceptions
– done callbacks called when result/exc is ready
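The API above can be tried with the PEP 3148 class itself. This is a sketch only: it creates Future objects by hand, which the concurrent.futures documentation reserves for testing, since executors normally create them. Both futures are already complete when result() is called, so nothing blocks here.

```python
from concurrent.futures import Future

log = []

ok = Future()
ok.add_done_callback(lambda f: log.append(("done", f.result())))
ok.set_result(42)                      # triggers the done callback

failed = Future()
failed.set_exception(ValueError("boom"))

try:
    failed.result()                    # an exception is a "result" too
except ValueError as exc:
    log.append(("raised", str(exc)))

# exception() returns the stored exception instead of raising it.
log.append(("exception()", type(failed.exception()).__name__))

print(log)
```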

Futures and coroutines

• Not the concurrent.futures.Future class!
• Nor exactly the same API
• Where PEP 3148 "blocks", we must use...

Drum roll, please

PEP 380: yield-from

    @coroutine
    def getresp():
        s = socket()
        yield from loop.sock_connect(s, host, port)
        yield from loop.sock_sendall(s, b'xyzzy')
        data = yield from loop.sock_recv(s, 100)

• Yes, you can now return from a generator!
• Please, do not write real code like this! :-)
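The PEP 380 mechanics can be seen without any event loop. In this sketch the functions read_chunk, get_response and drive are all made up for illustration; drive plays the role the event loop would play, resuming the generator each time it suspends.

```python
def read_chunk(n):
    """Sub-generator: suspends once, then returns a value (PEP 380)."""
    reply = yield ("need", n)     # suspend; the driver sends data back in
    return reply[:n]              # becomes the value of the yield-from


def get_response():
    head = yield from read_chunk(2)
    body = yield from read_chunk(3)
    return head + b"|" + body


def drive(gen, data):
    """Hypothetical driver standing in for the event loop."""
    try:
        next(gen)                 # run to the first suspension point
        while True:
            gen.send(data)        # resume with the "I/O result"
    except StopIteration as stop:
        return stop.value         # the generator's return value


result = drive(get_response(), b"xyzzy")
print(result)
```

Only the innermost generator yields to the driver; get_response never sees the suspension, which is why the code reads like ordinary sequential calls.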

I cannot possibly do this justice

• The best way to think about this is that yield-from is magic that "blocks" your current task but does not block your application

• It's almost best to pretend it isn't there when you squint (but things don't work without it)

• PS. @coroutine / yield-from are very close to async / await in C#

How to think about Futures

• Most of the time you can forget they are there
• Just pretend that:
    data = yield from <function_returning_future>
  is equivalent to:
    data = <equivalent_blocking_function>
  ...and keep calm and carry on

• Also forget about result(), exception(), and done-callbacks

Error handling

• Futures can raise exceptions too
• Just put a try/except around the yield-from:
    try:
        data = yield from loop.sock_connect(s, h, p)
    except OSError:
        <error handling code>
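The same pattern in runnable form, assuming today's asyncio, where `await` takes the place of yield-from. The failing operation is simulated: a plain Future that is given an OSError stands in for a connect call, and the names fetch and demo_error_handling are hypothetical.

```python
import asyncio


async def fetch(fut):
    try:
        return await fut              # the Future's exception is raised here
    except OSError as exc:
        return "recovered: %s" % exc


def demo_error_handling():
    loop = asyncio.new_event_loop()
    try:
        fut = loop.create_future()
        # Simulate a failed connection attempt on the next loop iteration.
        loop.call_soon(fut.set_exception, OSError("connection refused"))
        return loop.run_until_complete(fetch(fut))
    finally:
        loop.close()


outcome = demo_error_handling()
print(outcome)
```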

Coroutines

• Yield-from must be used inside a generator
• Use @coroutine decorator to indicate that you're using yield-from to pass Futures
• Coroutines are driven by the yield-from
• Without yield-from a coroutine doesn't run

• What if you want an autonomous task?

Tasks

• Tasks run as long as the event loop runs
• A Task is a coroutine wrapped in a Future
• Two ways to create Tasks:
– @task decorator (instead of @coroutine)
– f = Task(some_coroutine())
• The Task makes sure the coroutine runs
• Task is a subclass of Future
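A sketch of autonomous tasks, assuming today's asyncio (Python 3.7 or later), where asyncio.create_task() is the usual way to wrap a coroutine in a Task and the @task decorator from the slide does not exist. The coroutine ticker and the other names are invented for the example.

```python
import asyncio


async def ticker(log, name, count):
    for i in range(count):
        log.append((name, i))
        await asyncio.sleep(0)        # give other tasks a turn


async def main():
    log = []
    # Each create_task() call schedules the coroutine to run on its own.
    a = asyncio.create_task(ticker(log, "a", 2))
    b = asyncio.create_task(ticker(log, "b", 2))
    await asyncio.gather(a, b)
    return log


task_log = asyncio.run(main())
print(task_log)
print(issubclass(asyncio.Task, asyncio.Future))
```

Both tickers make progress without main() driving them step by step; main() only waits for them to finish.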

Back to higher-level network ops

• Consider: loop.create_connection(factory, host, port)

• This will block and create a TCP connection
• It returns a Future when ready
• The factory is a protocol class
– or a factory function returning a protocol instance

• Future's result is a (transport, protocol) tuple

Wait; transports and protocols?!

• PEP 3153 (async I/O) explains why transport and protocol is the right abstraction
– transport: provides two byte streams
  • e.g. TCP or SSL or pipes
– protocol: implements application logic
  • e.g. SMTP or FTP or IRC

• Only this abstraction level supports both ready- (select) and done-callbacks (IOCP)
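A sketch of the transport/protocol split, assuming today's asyncio, where start_serving() became create_server() and create_connection() still yields a (transport, protocol) pair. The classes EchoServer and OneShotClient and the function round_trip are hypothetical, and the example assumes a loopback TCP connection on 127.0.0.1 is available.

```python
import asyncio


class EchoServer(asyncio.Protocol):
    """Application logic only; the transport owns the socket."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)          # echo the bytes back


class OneShotClient(asyncio.Protocol):
    def __init__(self, message, done):
        self.message = message
        self.done = done                    # Future resolved with the reply

    def connection_made(self, transport):
        transport.write(self.message)

    def data_received(self, data):
        if not self.done.done():
            self.done.set_result(data)


async def round_trip(message):
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoServer, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]   # port chosen by the OS
    done = loop.create_future()
    transport, protocol = await loop.create_connection(
        lambda: OneShotClient(message, done), "127.0.0.1", port)
    try:
        return await done
    finally:
        transport.close()
        server.close()


echoed = asyncio.run(round_trip(b"xyzzy"))
print(echoed)
```

Neither protocol class touches a socket directly, which is what lets the same code run over a ready-callback (select) or a done-callback (IOCP) transport.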

Below the event loop

• Lowest level factored out
– selector classes: uniform API to select, poll, etc.
– will be stdlib classes in their own right
– also an IOCP "proactor" (not the same API)

• Not part of the PEP (uncontroversial)
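The selector classes did land in the standard library as the selectors module (Python 3.4). A small sketch of that uniform API, using a socket pair and a function name, wait_for_data, chosen only for this example:

```python
import selectors
import socket


def wait_for_data():
    sel = selectors.DefaultSelector()     # picks epoll, kqueue, select, ...
    rsock, wsock = socket.socketpair()
    try:
        sel.register(rsock, selectors.EVENT_READ, data="reader")
        idle = sel.select(timeout=0)      # nothing written yet
        wsock.send(b"hi")
        ready = sel.select(timeout=1)     # now rsock is readable
        return idle, [
            (key.data, bool(mask & selectors.EVENT_READ), key.fileobj.recv(10))
            for key, mask in ready
        ]
    finally:
        sel.close()
        rsock.close()
        wsock.close()


idle, ready = wait_for_data()
print(idle, ready)
```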

There's a lot more...

But I'm out of time :-(

• StreamReader class: like a file whose methods return Futures (e.g. readline())

• Datagram protocol (under development)
• Various types of locks (experimental)
• Exemplary HTTP client and server protocols
– (may base client on Requests, HTTP for humans)

• Subprocess support (mostly TBD)

More about interop...

• Write code against standard event loop API
• May use yield-from, don't have to
• Will interop with other code written like that
• Will also work with adapted event loop
– e.g. Twisted reactor
– code using legacy event loop API will also work
– Ideally most of Twisted will work with any standard event loop

Using Futures w/o yield-from

• You can use Futures without yield-from!
• Just use add_done_callback() and set_result()
• This is how Twisted can adapt the event loop
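A sketch of the callback-only style, assuming today's asyncio Future, which keeps the add_done_callback() and set_result() methods named on the slide. No coroutine appears anywhere; the name demo_callback_style is made up for the example.

```python
import asyncio


def demo_callback_style():
    loop = asyncio.new_event_loop()
    results = []

    def on_done(fut):
        # Runs on the loop once the Future has a result.
        results.append(fut.result())
        loop.stop()

    try:
        fut = loop.create_future()
        fut.add_done_callback(on_done)
        loop.call_later(0.01, fut.set_result, "ready")
        loop.run_forever()
    finally:
        loop.close()
    return results


print(demo_callback_style())
```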

When can I have it?

• Tulip works but is in flux and undocumented
• PEP 3156 still to be reviewed thoroughly
• Push to be ready for Python 3.4 (Feb 2014)
– 3.4.0 beta 1 cutoff date Nov 23, 2013
• Tulip (3rd party) will work with vanilla 3.3
• Will keep Tulip around for a few releases
• PS. stdlib version won't be named "tulip"

And the rest of the stdlib?

• We'll start thinking about that in earnest once 3.4 is out of the door

• We may eventually have to deprecate urllib, socketserver etc.

• Or emulate them on top of PEP 3156
• But that will take years

What about older Python versions?

• Sorry, you're out of luck :-(
• yield-from only available in 3.3
• Much of Tulip depends on yield-from
– even the parts that just use Futures
• Consider this a carrot for porting to 3.3 :-)
• However, someone could implement a PEP-conforming event loop in Python 2.7
– just use yield instead of yield-from

Acknowledgments

• Greg Ewing for PEP 380 (yield-from)
• Glyph and SF Twisted folks for meetings
• Richard Oudkerk for the IOCP proactor work
• Nikolay Kim for much of the code and tests
• Charles-François Natali for the Selectors
• Eli Bendersky, Geert Jansen, Saúl Ibarra Corretgé, Steve Dower, Dino Viehland, Ben Darnell, Laurens van Houtven, Giampaolo Rodolà, and everyone on python-ideas...

Oh yeah, I'm sprinting

• Will be here Monday - Tuesday