[Python-Dev] Speeding up CPython 5-10% (original) (raw)

Stefan Behnel stefan_ml at behnel.de
Fri Jan 29 05:00:02 EST 2016


Yury Selivanov schrieb am 27.01.2016 um 19:25:

tl;dr The summary is that I have a patch that improves CPython performance up to 5-10% on macro benchmarks. Benchmarks results on Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available at [1]. There are no slowdowns that I could reproduce consistently.

There are two different optimizations that yield this speedup: LOADMETHOD/CALLMETHOD opcodes and per-opcode cache in ceval loop. LOADMETHOD & CALLMETHOD ------------------------- We had a lot of conversations with Victor about his PEP 509, and he sent me a link to his amazing compilation of notes about CPython performance [2]. One optimization that he pointed out to me was LOAD/CALLMETHOD opcodes, an idea first originated in PyPy. There is a patch that implements this optimization, it's tracked here: [3]. There are some low level details that I explained in the issue, but I'll go over the high level design in this email as well. Every time you access a method attribute on an object, a BoundMethod object is created. It is a fairly expensive operation, despite a freelist of BoundMethods (so that memory allocation is generally avoided). The idea is to detect what looks like a method call in the compiler, and emit a pair of specialized bytecodes for that. So instead of LOADGLOBAL/LOADATTR/CALLFUNCTION we will have LOADGLOBAL/LOADMETHOD/CALLMETHOD. LOADMETHOD looks at the object on top of the stack, and checks if the name resolves to a method or to a regular attribute. If it's a method, then we push the unbound method object and the object to the stack. If it's an attribute, we push the resolved attribute and NULL. When CALLMETHOD looks at the stack it knows how to call the unbound method properly (pushing the object as a first arg), or how to call a regular callable. This idea does make CPython faster around 2-4%. And it surely doesn't make it slower. I think it's a safe bet to at least implement this optimization in CPython 3.6. So far, the patch only optimizes positional-only method calls. It's possible to optimize all kind of calls, but this will necessitate 3 more opcodes (explained in the issue). We'll need to do some careful benchmarking to see if it's really needed.

I implemented a similar but simpler optimisation in Cython a while back:

http://blog.behnel.de/posts/faster-python-calls-in-cython-021.html

Instead of avoiding the creation of method objects, as you proposed, it just normally calls getattr and if that returns a bound method object, it uses inlined calling code that avoids re-packing the argument tuple. Interestingly, I got speedups of 5-15% for some of the Python benchmarks, but I don't quite remember which ones (at least raytrace and richards, I think), nor do I recall the overall gain, which (I assume) is what you are referring to with your 2-4% above. Might have been in the same order.

Stefan



More information about the Python-Dev mailing list