Don't Do This
Richard Jones, PyCon AU 2013
Monday, 8 July 13
In this talk I'm going to poke around in some strange corners of Python and perhaps show you some things you can do with Python that you probably shouldn't. First up I'm going to look at some strange edge cases of the Python grammar.
>>> from serialise_marshal import SerialiseMarshal
>>> try:
...     from serialise_json import SerialiseJSON
... except:
...     SerialiseJSON = None
...     pass
...
Let's say we have some mixin classes that perform serialisation. Let's say that our preferred mixin might not be available, but we want things to go on rocking regardless.
>>> from serialise_marshal import SerialiseMarshal
>>> try:
...     from serialise_json import SerialiseJSON
... except:
...     SerialiseJSON = None
...     pass
...
>>> class foo(SerialiseJSON or SerialiseMarshal):
...     pass
...
So, did you know that classes in the bases clause of a class definition could also be expressions? Oh yes. Because "inheritance" just doesn't cut it in this modern world of rapidly changing serialisation protocols. We need more "fallbackitance".
>>> try:
...     B0RK
... except eval('NameError'):
...     print("Caught!")
... else:
...     print("ok")
...
Caught!
So, who knew that except clauses can be expressions? Whatever the expression evaluates to had better be an exception class (or a tuple of them), but as long as it is, that's the exception type that's caught.
>>> def generate_stuff():
...     for i in range(3):
...         yield 'spam'
...     while True:
...         yield 'ham'
...
Hey, generators are cool, right?
>>> def generate_stuff():
...     for i in range(3):
...         yield 'spam'
...     while True:
...         yield 'ham'
...
>>> generate_stuff = generate_stuff().__next__
When you pluck out their __next__ method you can just keep calling them and they generate stuff!
>>> def generate_stuff():
...     for i in range(3):
...         yield 'spam'
...     while True:
...         yield 'ham'
...
>>> generate_stuff = generate_stuff().__next__
>>> generate_stuff()
'spam'
>>> generate_stuff()
'spam'
>>> generate_stuff()
'spam'
>>> generate_stuff()
'ham'
>>> generate_stuff()
'ham'
>>> generate_stuff()
'ham'
>>> generate_stuff()
'ham'
... and so on
They're like happy little spewing machines that can make your program become awesomer!
def generate_5_assertions():
    for i in range(5):
        yield AssertionError
    while True:
        yield RuntimeError
generate_5_assertions = generate_5_assertions().__next__
Let's modify our generator to generate exception classes instead of strings.
def generate_5_assertions():
    for i in range(5):
        yield AssertionError
    while True:
        yield RuntimeError
generate_5_assertions = generate_5_assertions().__next__
import random
while True:
    try:
        assert random.randint(0, 1)
        print('Phew!')
    except generate_5_assertions():
        print('Assertion Squashed!')
And now, in some stupid code that generates stupid assertion errors about half the time, we can restrict our program so that it's only so tolerant of those errors.
don't do this richard$ python3 except_clause.py
Phew!
Assertion Squashed!
Assertion Squashed!
Phew!
Phew!
Assertion Squashed!
Phew!
Phew!
Assertion Squashed!
Phew!
Assertion Squashed!
Phew!
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
AssertionError
5 errors and we stop squashing them. Er, don't do this?
And just in case you thought I was kidding, this is an actual line of code from MongoDB. OK, it isn't exactly the same: it's avoiding logging 90% of a particular kind of error.
now let's look at some ways that Python's runtime is perhaps a little more mutable than you previously thought
>>> def f():
...     print('ohai there!')
...
>>> f()
ohai there!
OK, now let's do something a little more odd. Let's define a function, say f(). It does a thing.
>>> def f():
...     print('ohai there!')
...
>>> f()
ohai there!
>>> f.__code__
<code object f at 0x10b25e930, file "<stdin>", line 1>
The thing it does is in its code, and that code object is attached to the function object as the __code__ attribute.
>>> def f():
...     print('ohai there!')
...
>>> f()
ohai there!
>>> f.__code__
<code object f at 0x10b25e930, file "<stdin>", line 1>
>>> exec(f.__code__)
ohai there!
You can exec code objects. That's fun.
>>> def g():
...     print('hello, world!')
...
>>> g()
hello, world!
Let's make another function.
>>> def g():
...     print('hello, world!')
...
>>> g()
hello, world!
>>> g.__code__ = f.__code__
>>> g()
ohai there!
How many of you knew the __code__ attribute was mutable?
>>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
The code object is not unique to functions. The code in modules is also encapsulated in a code object.
>>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
>>> print(some_code.__cached__)
__pycache__/some_code.cpython-33.pyc
In fact, the "pyc" file that Python writes to cache a module's code is essentially that code object, marshalled, preceded by a small header.
>>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
>>> print(some_code.__cached__)
__pycache__/some_code.cpython-33.pyc
>>> import marshal
>>> with open(some_code.__cached__, 'rb') as f:
...     code = marshal.loads(f.read()[12:])
...
We can unmarshal that object. And I think you know where I'm heading with this.
>>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
>>> print(some_code.__cached__)
__pycache__/some_code.cpython-33.pyc
>>> import marshal
>>> with open(some_code.__cached__, 'rb') as f:
...     code = marshal.loads(f.read()[12:])
...
>>> f.__code__ = code
>>> f()
Hello, world!
I can't think of a single reason why you'd ever want to do this, so I'm not even going to bother to tell you not to. I have a feeling you'd be able to justify it regardless.
You can also create code objects by hand.
You don't even have to start with Python source code. Which, let's face it, would be the most obvious way of constructing code objects by hand. But we're not here for the obvious way to do things, are we?
<python><Module>
  <FunctionDef name="adder">
    <arguments><arg arg="a" /><arg arg="b" /></arguments>
    <body>
      <Return><Add>
        <left><Load id="a" /></left><right><Load id="b" /></right>
      </Add></Return>
    </body>
  </FunctionDef>
  <Expr>
    <Call><func><Load id="print" /></func><args>
      <Str value="1 + 2 =" /><Call>
        <func><Load id="adder" /></func><args><Num value="1" /><Num value="2" /></args>
      </Call></args>
    </Call>
  </Expr>
  <Expr>
    <Call><func><Load id="print" /></func><args>
      <Str value="one + two =" /><Call>
        <func><Load id="adder" /></func><args><Str value="one" /><Str value="two" /></args>
      </Call></args>
    </Call>
  </Expr>
</Module></python>
Witness, for example, the beautiful elegance of this XML Python source. We shall call this "adder.pyxml". Isn't it beautiful? And elegant? In this modern age of Service Oriented Architecture DOMs over well-formed WSDL carriers with ubiquitous SGML DTDs incorporating the full implementation of OMA DRM, who wouldn't want to code in XML directly? The way we do this is to parse the XML and construct what's known as an Abstract Syntax Tree, which we can then compile into a code object.
...
Which basically consists of a bunch of this. I have it on good authority from someone close to the ast code that this really isn't done very often. I even managed to provoke a segfault from a deep corner of the ast code, so that was fun.
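To give a concrete flavour of that "bunch of this", here's a small sketch of building the adder function's AST by hand with the ast module and compiling it. This is illustrative, not the loader's actual code; a few node fields (posonlyargs, type_ignores, type_params) only exist on newer Pythons, and older versions simply ignore the extras.

```python
import ast

# Hand-build the AST for: def adder(a, b): return a + b
# -- roughly what the pyxml loader constructs from the XML above.
func = ast.FunctionDef(
    name='adder',
    args=ast.arguments(
        posonlyargs=[],
        args=[ast.arg(arg='a'), ast.arg(arg='b')],
        kwonlyargs=[], kw_defaults=[], defaults=[]),
    body=[ast.Return(value=ast.BinOp(
        left=ast.Name(id='a', ctx=ast.Load()),
        op=ast.Add(),
        right=ast.Name(id='b', ctx=ast.Load())))],
    decorator_list=[],
    type_params=[])  # required on 3.12+, harmless earlier
module = ast.Module(body=[func], type_ignores=[])
ast.fix_missing_locations(module)  # hand-built nodes lack line numbers

namespace = {}
exec(compile(module, '<pyxml>', 'exec'), namespace)
print(namespace['adder'](1, 2))
```

Note ast.fix_missing_locations: compile() refuses nodes without line numbers, and nobody wants to set lineno on every node by hand.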
>>> import pyxml_loader
>>> pyxml_loader.install()
>>>
>>> import adder
>>>
>>> print(adder.adder(3, 4))
7
So, to make this glorious new possibility a reality, we install our pyxml loader and now we can import adder.pyxml! Huzzah!
To do this, we create a custom file loader for the import machinery. The import machinery needs us to register a finder, which locates pyxml files matching the module name and returns a loader, which actually loads the code for the module it found. You can also abuse this to import SQL files that you can execute. Or write funny little DSLs that meld Python and LISP. Or write something to implement macros for Python.
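A minimal sketch of such a finder/loader pair using importlib. The names PyXMLFinder and PyXMLLoader are invented for illustration, and the XML-to-AST parsing step is elided: this sketch simply execs the file contents as Python source where the real loader would parse XML first.

```python
import importlib.abc
import importlib.util
import os
import sys

class PyXMLLoader(importlib.abc.Loader):
    def __init__(self, filename):
        self.filename = filename

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        with open(self.filename) as f:
            source = f.read()
        # The real loader would parse XML into an AST here; for this
        # sketch we just treat the file contents as Python source.
        exec(compile(source, self.filename, 'exec'), module.__dict__)

class PyXMLFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        # Look for <module>.pyxml along sys.path (or a package's path).
        for entry in (path or sys.path):
            candidate = os.path.join(entry or '.', fullname + '.pyxml')
            if os.path.exists(candidate):
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=PyXMLLoader(candidate))
        return None

def install():
    sys.meta_path.insert(0, PyXMLFinder())
```

After install(), a plain `import adder` will happily pick up adder.pyxml.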
let's throw in some of Python's slightly more powerful introspection capabilities and see what damage we can do...
>>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None,
 '__package__': None, '__builtins__': <module 'builtins' (built-in)>,
 '__name__': '__main__'}
So, you all know about locals() and globals(), right? They give you a handle on the dictionary that is the local or global namespace you're in.
>>> import inspect
>>> inspect.currentframe().f_locals
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None,
 '__package__': None, '__builtins__': <module 'builtins' (built-in)>,
 '__name__': '__main__'}
Or, as I like to call it by its full name, inspect.currentframe().f_locals...
>>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None,
 '__package__': None, '__builtins__': <module 'builtins' (built-in)>,
 '__name__': '__main__'}
>>> spam = 1
>>> locals()['spam']
1
Anyway, you can poke at that dict just like it was a dict (hint: it *is* a dict)
>>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None,
 '__package__': None, '__builtins__': <module 'builtins' (built-in)>,
 '__name__': '__main__'}
>>> spam = 1
>>> locals()['spam']
1
>>> locals()['ham'] = 2
>>> ham
2
So of course it's possible to modify that dict to create new local or global variables.
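One hedge worth adding: that works at module (or interactive) scope, where locals() really is the namespace dict. Inside a CPython function, frame locals are optimised, and writes to the dict locals() returns generally don't stick:

```python
def try_to_inject():
    # At function scope, CPython's locals() is effectively a snapshot;
    # writing to it does not create a real local variable.
    locals()['ham'] = 2
    try:
        return ham  # NameError: the write above didn't stick
    except NameError:
        return 'no ham here'

print(try_to_inject())  # prints: no ham here
```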
Given some JSON in a file config.json:

{
    "message": "Hello, world!",
    "badness_value": 1000
}
So, given some JSON... and I think you may know where I'm going with this.
Given some JSON in a file config.json:

{
    "message": "Hello, world!",
    "badness_value": 1000
}

>>> from json_loader import import_json
>>> import_json('config.json')
>>> message
'Hello, world!'
>>> badness_value
1000
Yes, loading variables directly from JSON files.
import json
import inspect

def import_json(filename):
    caller = inspect.currentframe().f_back
    caller.f_locals.update(json.load(open(filename)))
Why should import * hog all the namespace pollution fun? We look up the call stack to find the local namespace of interest - the inspect module provides some handy features for this. First we get the frame - think of it as the state of the function. Each frame has a reference to its calling frame in f_back, and its local variables in f_locals. This is far from the worst thing I'll show you today, but it's still worth saying: you probably shouldn't do this.
>>> import marshal
>>> class SerialiseMarshal(object):
...     @staticmethod
...     def serialise(data):
...         return marshal.dumps(data)
...
>>> class DoStuff(SerialiseMarshal):
...     def do_stuff(self):
...         return self.serialise(dict(message="Hello, world!"))
...
>>> DoStuff().do_stuff()
b'{u\x07\x00\x00\x00messageu\r\x00\x00\x00Hello, world!0'
OK, on to another kind of unexpected mutability. Let's go back to our serialisation idea. Say we have this kind of setup where a doing stuff class inherits from a mixin class to serialise some data using the marshal module.
>>> import json
>>> class SerialiseJSON(object):
...     @staticmethod
...     def serialise(data):
...         return json.dumps(data)
...
Let's say we decide a little while later in the code that we want to stop serialising with marshal and use JSON instead.
>>> import json
>>> class SerialiseJSON(object):
...     @staticmethod
...     def serialise(data):
...         return json.dumps(data)
...
>>> DoStuff.__bases__ = (SerialiseJSON,)
>>> DoStuff().do_stuff()
'{"message": "Hello, world!"}'
We can just swap out the old mixin class and replace it with the new one and hey presto DON'T DO THIS.
class MyContextManager(object):
    def __enter__(self):
        pass  # do stuff at start
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass  # do stuff at exit

with MyContextManager():
    pass  # do stuff!
So context managers are pretty neat, right? So who else, when they saw them for the first time, thought "hey, I reckon we could hack some namespaces right here..." No? Oh, well I did.
>>> from context_capture import capture_in
>>> d = {}
>>> with capture_in(d):
...     spam = 'ham'
...
>>> d
{'spam': 'ham'}
Here's a context manager that'll snarf all local variable assignments and copy them into a dictionary called "d". Kind of like a little backup namespace. I am not going to justify this to you!
>>> from context_capture import capture_on
>>> class T(object):
...     def __init__(self):
...         with capture_on(self):
...             spam = 'spam'
...             ham = 'ham'
...
>>> t = T()
>>> t.spam
'spam'
It's pretty easy to modify the code to capture the locals onto another object. No more typing "self" all the time!
>>> from context_capture import capture_globals
>>> def foo():
...     with capture_globals():
...         spam = 'ham'
...     print(spam)
...
>>> foo()
ham
>>> spam
'ham'
Who else is sick and tired of typing global all the time? Well we can do away with all those pesky "global" variable declarations by promoting all local assignments into the global namespace.
import inspect

class LocalsCapture(object):
    def __enter__(self):
        caller_frame = inspect.currentframe().f_back
        self.local_names = set(caller_frame.f_locals)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        caller_frame = inspect.currentframe().f_back
        for name in caller_frame.f_locals:
            if name not in self.local_names:
                self.capture(name, caller_frame.f_locals[name])

class capture_in(LocalsCapture):
    def __init__(self, namespace):
        self.namespace = namespace

    def capture(self, name, value):
        self.namespace[name] = value

class capture_globals(capture_in):
    def __init__(self):
        caller_frame = inspect.currentframe().f_back
        super(capture_globals, self).__init__(caller_frame.f_globals)
How does it work? Well, recall our context manager is invoked twice. __enter__ is invoked at the start of the with block, so when that happens we snapshot the local variable names belonging to the caller's frame using our old friends f_back and f_locals. When the with block exits we are invoked again, so in __exit__ we see what new local names exist and capture the new ones.
A common problem we software developers face is that we're often asked to find out why some live code has gone awry. Sometimes we're not the author of the code, and sometimes we're not even familiar with the deployment scenario. And sometimes it's late at night and you're just not at all happy about having been called up to fix someone else's mess.
print 'query =', query
So we try to print out some values like the web query but we have no idea where the output goes.
import sys
print >>sys.stderr, 'query =', query
So we try maybe standard error, maybe that'll make it to the server logs?
import logging
logging.debug('query = %r', query)
Nope. OK, maybe logging? But seriously, there are so many ways this can fail: not knowing where the log file is, or what the logging level is set to. Ugh.
pip install q
To print the value of foo, put this in your program:
import q; q(foo)
Output will go to /tmp/q (or $TMPDIR/q), so:
tail -f /tmp/q
So the q module was born. Quick and dirty debugging output for tired programmers.
results in this in the "q" file:
The "q" module not only dumps the value but also includes the context the value was seen in including the expression that created the value and the function the q invocation was made in.
import q
@q
def function(...):
    ...
To trace a function's inputs and return value, use q as a decorator.
results in this in the "q" file:
The decorator tracing gives you information about what arguments the function was called with and the return value from the function. It's clever if the return value is huge - that gets stored off in a separate file referenced from the q log. But there's a lot of funky stuff going on in q.
info = self.inspect.getframeinfo(self.sys._getframe(1), context=9)
# info.index is the index of the line containing the end of the call
# expression, so this gets a few lines up to the end of the expression.
lines = ['']
if info.code_context:
    lines = info.code_context[:info.index + 1]

# If we see "@q" on a single line, behave like a trace decorator.
if lines[-1].strip().startswith('@') and args:
    return self.trace(args[0])
Just to give you some idea of how one bit works, this is how we determine whether q has been invoked as a decorator or just as a value-dumping function with a callable argument. The decorator usage is detected using this code. It walks the call stack to see whether we're invoked as a function or decorator by looking at the actual source code of the call site - if it looks like a decorator we declare it a decorator!
class OverloadDemo {
    void test() {
        System.out.println("No parameters");
    }
    // Overload test for one integer parameter.
    void test(int a) {
        System.out.println("a: " + a);
    }
    // Overload test for two integer parameters.
    void test(int a, int b) {
        System.out.println("a and b: " + a + " " + b);
    }
    // Overload test for a double parameter.
    void test(double a) {
        System.out.println("double a: " + a);
    }
}
This is Java. Don't do this. So back when I was teaching Python at university I had a student ask me how to do overloading of methods like Java does. I said that "Python doesn't work that way". Then I thought for a moment, and said "ask me again next week".
pip install overload
So overload was born.
>>> class A(object):
...     @overload
...     def method(self, a):
...         return 'a'
...     @method.add
...     def method(self, a, b):
...         return 'a, b'
...
>>> a = A()
>>> a.method(1)
'a'
>>> a.method(1, 2)
'a, b'
And here you go, method overloading.
>>> @overload
... def func(a:int):
...     return 'int'
...
>>> @func.add
... def func(a:str):
...     return 'str'
...
>>> func(1)
'int'
>>> func('s')
'str'
>>> func(1.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "overload.py", line 94, in f
    raise TypeError('invalid call argument(s)')
TypeError: invalid call argument(s)
Overloading functions works too. As do function argument annotations. You can even overload classmethods, staticmethods and, if you really need to, classes themselves.
func.__defaults__
The implementation uses a bunch of introspection into functions for things like the default argument values supplied at function creation time.
func.__code__.co_argcount
func.__code__.co_varnames
The code object gives us the required argument count and the names of those arguments. We match the arguments passed into the function against the function signature using basically the same method as a regular function call, trying to find the first signature that accepts the passed arguments. For each of the co_argcount positional slots we pop an element off the positional arguments passed in, or, if there are none left, we grab the value from the keyword arguments by argument name.
func.__annotations__.get(arg)
We can also look into the annotations dictionary, which is keyed by the argument names in the function definition. We only care if the annotation is a type object and the value is an instance of that type. If it's not a match we discard this overload option and move on to the next (if any).
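Putting those pieces together, here's a much-simplified sketch of the matching logic just described. It ignores defaults, *args/**kwargs and methods, all of which the real overload module handles; matches and dispatch are names invented for this sketch.

```python
def matches(func, args, kwargs):
    # Bind positional then keyword arguments to the function's named
    # parameters, much as a regular call would.
    code = func.__code__
    names = code.co_varnames[:code.co_argcount]
    args = list(args)
    bound = {}
    for name in names:
        if args:
            bound[name] = args.pop(0)
        elif name in kwargs:
            bound[name] = kwargs[name]
        else:
            return False      # a required argument is missing
    if args:
        return False          # leftover positional arguments
    # Check annotations: only type annotations matter, and the value
    # must be an instance of the annotated type.
    for name, value in bound.items():
        ann = func.__annotations__.get(name)
        if isinstance(ann, type) and not isinstance(value, ann):
            return False
    return True

def dispatch(options, *args, **kwargs):
    # Try each registered signature in turn; first match wins.
    for func in options:
        if matches(func, args, kwargs):
            return func(*args, **kwargs)
    raise TypeError('invalid call argument(s)')
```

With two registered functions annotated `a: int` and `a: str`, dispatching a 1 picks the int version, an 's' the str version, and a 1.0 raises TypeError, mirroring the REPL session above.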
func.__code__.co_flags & 0x04
And then, if the *args flag is set, we can pass any remaining positional arguments from the invocation along as *args values.
func.__code__.co_flags & 0x08
Similarly, if the **kwargs flag is set, we can pass leftover keyword arguments from the invocation along as **kwargs values.
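Those two magic numbers are the CO_VARARGS and CO_VARKEYWORDS flag bits on the code object; a quick sanity check:

```python
CO_VARARGS, CO_VARKEYWORDS = 0x04, 0x08  # code object flag bits

def takes_anything(*args, **kwargs):
    pass

def takes_nothing():
    pass

# The flags are set only when the signature actually has *args / **kwargs.
assert takes_anything.__code__.co_flags & CO_VARARGS
assert takes_anything.__code__.co_flags & CO_VARKEYWORDS
assert not takes_nothing.__code__.co_flags & CO_VARARGS
assert not takes_nothing.__code__.co_flags & CO_VARKEYWORDS
```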
if isinstance(callable, (classmethod, staticmethod)):
    ...
So now, given we have matched the supplied values to a function signature, we invoke the function. There are some other hacks in there, like detecting staticmethod and classmethod: they're funky proxy-ish objects which handle the class argument.
So the q module is nice, but there's something about it that's just a little fishy.
import q; q(foo)
So, who can see something odd about this? That's right, modules aren't callable. To make this work, the q module resorts to a bit of a hack.
# Install the Q() object in sys.modules so that "import q" gives a callable q.
import sys
sys.modules['q'] = Q()
q currently does this, which has side-effects. Most notably, as soon as you replace the module in sys.modules it can be garbage-collected, since there are no references left to it (the Q class does not retain a reference to its module). Thus the Q class needs additional yucky hacks around imports and other things so it can handle that and additionally pretend to be a module. There's an alternative: we can make modules callable.
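For comparison, a sketch of the sys.modules swap with the one tweak that avoids the garbage-collection problem: the replacement object keeps a reference to the real module. The Q class here is a trivial stand-in, not q's actual implementation.

```python
import sys

class Q(object):
    def __init__(self, module):
        self._module = module          # keep the real module alive
    def __call__(self, *args):
        for arg in args:
            print('q:', repr(arg))     # stand-in for q's real log-to-/tmp/q

# Inside the q module itself, the swap would read:
#     sys.modules[__name__] = Q(sys.modules[__name__])
```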
Given hello_world.py:
import callable_modules
callable_modules.enable()

def __call__():
    print 'hello, world!'
Given hello_world.py:
Python 2.7.1 (r271:86832, Aug 5 2011, 03:30:24)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import hello_world
>>> hello_world()
hello, world!
Hey, presto, callable modules! But how does callable_modules.enable work?
...
The handling of callability is done at the type level (as in C types). Every builtin type, like int or the module type, has a PyTypeObject structure. Callability is implemented through a slot called tp_call: a pointer to a C function that is invoked when objects of that type are called. If it's not set (i.e. NULL) then objects of that type aren't callable. So we need to provide a callable for the tp_call slot, and a way to assign it to the slot using ctypes. Oh yes, ctypes.
First up, here's our callback for the tp_call slot. We define the C level API for the ternaryfunc callback and a simple Python function that implements the calling of the __call__ method on the object (module). The ctypes layer does some interesting things with Python callbacks for C functions that I won't go into now; suffice to say it took me a while to figure out the last argtype for the function declaration needed to be c_void_p...
Next we define the PyTypeObject structure - or enough of it at least - so we can assign to the tp_call slot. We also need PyObject defined since we need to access the type object through the C object itself.
So once we have all those parts in place, we can modify the module type to make its instances callable!
TODO segfault on missing __call__
>>> import callable_modules
>>> callable_modules.enable()
>>>
>>> import string
>>> def called(*args, **kw):
...     print 'called with', args, kw
...
>>> string.__call__ = called
>>> string()
called with () {}
>>> string(1, 2, three='four')
called with (1, 2) {'three': 'four'}
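Incidentally, if ctypes is a step too far even for you, there's a less drastic route to the same end (my aside, not part of the talk's ctypes hack): a module's __class__ can be reassigned to a types.ModuleType subclass, and since special-method lookup happens on the type, a __call__ defined there makes the module itself callable.

```python
import sys
import types

class CallableModule(types.ModuleType):
    # Special-method lookup happens on the type, so defining __call__
    # on a ModuleType subclass makes module instances callable; we
    # delegate to a module-level function also named __call__.
    def __call__(self, *args, **kw):
        return self.__dict__['__call__'](*args, **kw)

# A module opts in with:
#     sys.modules[__name__].__class__ = CallableModule
```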
Using similar ctypes hackery we can modify builtin types to add new attributes. Yes, this has been done, see http://clarete.github.com/forbiddenfruit/.
>>> from forbiddenfruit import curse
>>> from datetime import timedelta, datetime
>>> curse(int, 'days', property(lambda s: timedelta(s)))
>>> (12).days
datetime.timedelta(12)
>>> curse(timedelta, 'ago', property(lambda s: datetime.now() - s))
>>> print (12).days.ago
2013-05-31 18:56:49.745315
The above is inspired by http://shouldly.github.com/ from Ruby land.
Thanks
Ryan Kelly
Nick Coghlan
Controlling Minecraft from Python
- demo game of life or something
- maybe "import this" in Minecraft?
import sys

def reraise_as(new_type):
    e_type, e_value, e_traceback = sys.exc_info()
    new_exception = new_type()
    new_exception.__cause__ = e_value
    try:
        raise new_exception.with_traceback(e_traceback)
    finally:
        del e_traceback
try:
    do_something_crazy()
except Exception:
    reraise_as(UnhandledException)
This is a neat idea by David Cramer (dcramer) which uses the __cause__ attribute of exceptions (new in Python 3) to let you re-raise an exception under a different type without losing any information. https://github.com/dcramer/reraise
TODO
bytecodehacks
"optimisations"
automatic "self"
what's the worst thing we could do with bytecode?
TODO
https://pypi.python.org/pypi/magicsuper