Message 258204 - Python tracker (original) (raw)

This issue supersedes issue #6033. I decided to open a new one, since the patch is about Python 3.6 (not 2.7) and is written from scratch.

The idea is to add new opcodes to avoid instantiation of BoundMethods. The patch only affects method calls of Python functions with positional arguments.

I'm working on the attached patch in this repo: https://github.com/1st1/cpython/tree/call_meth2

If the patch gets accepted, I'll update it with the docs etc.

Performance Improvements

Method calls in micro-benchmarks are 1.2x faster:

call_method

Avg: 0.330371 -> 0.281452: 1.17x faster

call_method_slots

Avg: 0.333489 -> 0.280612: 1.19x faster

call_method_unknown

Avg: 0.304949 -> 0.251623: 1.21x faster

Improvements in mini-benchmarks, such as Richards are less impressive, I'd say it's 3-7% improvement. The full output of benchmarks/perf.py is here: https://gist.github.com/1st1/e00f11586329f68fd490

When the full benchmarks suite is run, some of them report that they were slow. When I ran them separately several times, they all show no real slowdowns. It's just some of them (such as nbody) are highly unstable.

It's actually possible to improve the performance another 1-3% if we fuse __PyObject_GetMethod with ceval/LOAD_METHOD code. I've tried this here: https://github.com/1st1/cpython/tree/call_meth4, however I don't like to have so many details of object.c into ceval.c.

Changes in the Core

Two new opcodes are added -- LOAD_METHOD and CALL_METHOD. Whenever compiler sees a method call "obj.method(..)" with positional arguments it compiles it as follows:

LOAD_FAST(obj) LOAD_METHOD(method) {call arguments} CALL_METHOD

LOAD_METHOD implementation in ceval looks up "method" on obj's type, and checks that it wasn't overridden in obj.dict. Apparently, even with the dict check this is still faster then creating a BoundMethod instance etc.

If the method is found and not overridden, LOAD_METHOD pushes the resolved method object, and 'obj'. If the method was overridden, the resolved method object and NULL are pushed to the stack.

CALL_METHOD then looks at the two stack values after call arguments. If the first one isn't NULL, it means that we have a method call.

Why CALL_METHOD?

It's actually possible to hack CALL_FUNCTION to support LOAD_METHOD. I've tried this approach in https://github.com/1st1/cpython/tree/call_meth3. It looks like that adding extra checks in CALL_FUNCTION have negative impact on many benchmarks. It's really easier to add another opcode.

Why only pure-Python methods?

LOAD_METHOD atm works only with methods defined in pure Python. C methods, such as list.append are still wrapped into a descriptor, that creates a PyCFunction object on each attribute access. I've tried to do that in https://github.com/1st1/cpython/tree/call_cmeth. It does impact C method calls in a positive way, although my implementation is very hacky. It still uses LOAD_METHOD and CALL_METHOD opcodes, so my idea is to consider merging this patch first, and then introduce the necessary refactoring of PyCFunction and MethodDesctiptors in a separate issue.

Why only calls with positional arguments?

As showed in "Why CALL_METHOD?", making CALL_FUNCTION to work with LOAD_METHOD slows it down. For keyword and var-arg calls we have three more opcodes -- CALL_FUNCTION_VAR, CALL_FUNCTION_KW, and CALL_FUNCTION_VAR_KW. I suspect that making them work with LOAD_METHOD would slow them down too, which will probably require us to add three (!) more opcodes for LOAD_METHOD.

And these kind of calls require much more overhead anyways, I don't expect them to be as optimizable as positional arg calls.