[Python-Dev] Python 3 optimizations...

Stefan Behnel stefan_ml at behnel.de
Fri Jul 23 10:38:32 CEST 2010


stefan brunthaler, 23.07.2010 08:48:

> I guess it would be a good idea to quickly outline my inline caching
> approach, so that we all have a basic understanding of how it works.

Yes, that certainly makes it easier to discuss.

> If we take for instance the BINARY_ADD instruction, the interpreter
> evaluates the actual operand types and chooses the matching operation
> implementation at runtime, i.e., operands that are unicode strings
> will be concatenated via unicode_concatenate, while for float operands
> the interpreter would end up invoking float_add via binary_op1. Now, a
> very efficient way to achieve purely interpretative inline caching is
> to quicken the type-generic BINARY_ADD instruction to a type-dependent
> FLOAT_ADD instruction (this technique, i.e., inline caching via
> quickening, is the primary contribution of my ECOOP paper). Hence, I
> have a very simple code generator that generates type-dependent
> interpreter instructions in a pre-compile step of the interpreter, and
> uses runtime type information to quicken/rewrite instructions. Aside
> from the operators, I have implemented this quickening technique for
> the FOR_ITER, COMPARE_OP and CALL_FUNCTION instructions.
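[For readers unfamiliar with quickening, here is a hypothetical toy
interpreter in plain Python that rewrites a generic BINARY_ADD in place
once it observes float operands. The opcode names mirror CPython's, but
the interpreter loop and code layout are invented for illustration.]

```python
# Toy sketch of inline caching via quickening: the generic BINARY_ADD
# rewrites itself at its own bytecode offset to a specialized FLOAT_ADD
# after observing float operands, so later executions skip the generic
# type dispatch. A type miss deoptimizes back to the generic opcode.

BINARY_ADD, FLOAT_ADD, LOAD_CONST, RETURN = range(4)

def run(code, consts):
    stack, pc = [], 0
    while True:
        op, arg = code[pc]
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == BINARY_ADD:
            b, a = stack.pop(), stack.pop()
            if type(a) is float and type(b) is float:
                # Quicken: rewrite this instruction in place so the
                # next execution dispatches straight to the float path.
                code[pc] = (FLOAT_ADD, arg)
            stack.append(a + b)          # generic dispatch this time
        elif op == FLOAT_ADD:
            b, a = stack.pop(), stack.pop()
            if type(a) is float and type(b) is float:
                stack.append(a + b)      # fast path, no type dispatch
            else:
                code[pc] = (BINARY_ADD, arg)  # deoptimize on type miss
                stack.append(a + b)
        elif op == RETURN:
            return stack.pop()
        pc += 1

code = [(LOAD_CONST, 0), (LOAD_CONST, 1), (BINARY_ADD, 0), (RETURN, 0)]
print(run(code, [1.5, 2.5]))   # 4.0 -- and code[2] is now FLOAT_ADD
```

[The real implementation does this in C over CPython's actual bytecode
array; the point is only that the rewrite happens in place, per
instruction, using types observed at runtime.]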

This sounds like wpython (a CPython derivative with a wider set of byte code commands) could benefit from it.

Do I understand correctly that you modify the byte code of modules/functions at runtime?

>> I'm absolutely interested, although not for the CPython project but
>> for Cython. I wonder how you do inline caching in Python if the
>> methods of a type can be replaced by whatever at runtime. Could you
>> elaborate on that?
>
> Currently, I only provide optimized derivatives for several separate
> call targets, i.e., whether a call target is a C function with varargs
> or a Python function/method -- this already eliminates a lot of
> overhead from invoking call_function.

Ah, yes, that makes good sense. So you basically add an intermediate step to calls that provides faster dispatch for known C functions.
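[A hypothetical sketch of that intermediate step: a call site classifies
its target once (C builtin vs. Python function) and caches the decision,
so repeated calls skip the per-call inspection a generic CALL_FUNCTION
redoes. The `CallSite` class and kind names are invented; in the real
interpreter each branch would be a separate instruction derivative, and
the fast paths here look alike only because Python-level calls are
uniform.]

```python
import types

class CallSite:
    """One dynamic call site; caches the dispatch decision after the
    first call, like CALL_FUNCTION quickened to a derivative."""

    def __init__(self):
        self.kind = None  # filled in on the first call

    def call(self, func, *args):
        if self.kind is None:
            # One-time classification; later calls skip these checks.
            if isinstance(func, types.BuiltinFunctionType):
                self.kind = "c_varargs"    # e.g. len: no frame setup
            elif isinstance(func, types.FunctionType):
                self.kind = "py_function"  # would build a Python frame
            else:
                self.kind = "generic"      # anything else callable
        return func(*args)

site = CallSite()
site.call(len, (1, 2, 3))
print(site.kind)  # c_varargs
```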

>> Or do you restrict yourself to builtin types?
>
> Currently, my approach provides optimized derivative instructions for
> the standard library, e.g., unicode strings, numerical objects,
> containers, and iterators.

>> I'm interested in the code that determines what can be optimised in
>> what way. I read that Jython recently received a contribution that
>> provides type information for lots of modules and builtins, but
>> having something like that for CPython would be cool.
>>
>> That might be worth it already, just think of list.append(). We have
>> an optimistic optimisation for object.append() in Cython that gives
>> us massive speed-ups in loops that build lists, even if we don't know
>> at compile time that we are dealing with lists.
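[The optimistic append trick can be sketched in plain Python; the helper
name `append_optimistic` is invented here, and Cython emits the
equivalent C directly rather than a Python function.]

```python
def append_optimistic(obj, item):
    """Guess that obj is a plain list and take the direct C-level
    path; fall back to a generic method call when the guess misses."""
    if type(obj) is list:          # exact-type guard, no subclass check
        list.append(obj, item)     # direct call, no attribute lookup
    else:
        obj.append(item)           # generic fallback for list-likes

class Bag:                         # a non-list with an append method
    def __init__(self):
        self.items = []
    def append(self, item):
        self.items.append(item)

xs = []
append_optimistic(xs, 1)           # fast path
b = Bag()
append_optimistic(b, 2)            # fallback path
print(xs, b.items)                 # [1] [2]
```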

> Yes, that sounds like a reasonable thing to do. I could provide much
> more optimized derivatives based on application profiles, too. Since I
> use a simple code generator for generating the derivatives, it would
> also be possible to provide end-users with the means to analyze their
> apps and generate optimized instruction derivatives matching their
> profile.

Such an approach would also be very useful for Cython. Think of a profiler that runs a program in CPython and tells you exactly what static type annotations to put where in your Python code to make it compile to a fast binary with Cython. Or, even better, it could just spit out a .pxd file that you drop next to your .py file and that provides the static type information for you.
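[A minimal sketch of that profiling idea, assuming `sys.settrace` to
observe the concrete argument types a function sees during a run. The
traced function `scale` and the emitted declaration line are invented
for illustration; real .pxd output would need proper Cython syntax.]

```python
import sys
from collections import defaultdict

# function name -> parameter name -> set of observed type names
seen = defaultdict(lambda: defaultdict(set))

def tracer(frame, event, arg):
    if event == "call":
        code = frame.f_code
        if code.co_name.startswith("<"):   # skip comprehensions etc.
            return None
        for name in code.co_varnames[:code.co_argcount]:
            if name in frame.f_locals:
                seen[code.co_name][name].add(
                    type(frame.f_locals[name]).__name__)
    return None

def scale(values, factor):
    return [v * factor for v in values]

sys.settrace(tracer)
scale([1.0, 2.0], 3.0)
sys.settrace(None)

for func, params in seen.items():
    decls = ", ".join(
        (next(iter(ts)) if len(ts) == 1 else "object") + " " + name
        for name, ts in params.items())
    print(f"cdef {func}({decls})")  # e.g. cdef scale(list values, float factor)
```

[Parameters that were only ever seen with one type get that type; mixed
ones fall back to `object`, which is roughly the decision a .pxd
generator would have to make.]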

Stefan


