[Python-Dev] Speeding up CPython 5-10% (original) (raw)

Damien George damien.p.george at gmail.com
Fri Jan 29 07:38:53 EST 2016


Hi Yury,

> Off-topic: have you ever tried hg.python.org/benchmarks, or compared MicroPython vs CPython? I'm curious whether MicroPython is faster -- in that case we'll try to copy some optimization ideas.

I've tried a small number of those benchmarks, but not in any rigorous way, and not enough to compare properly with CPython. Maybe one day I (or someone) will get to it and report results :)

One thing that makes MP fast is pointer tagging: small integers are stored directly inside object pointers. Thus integer arithmetic on values below 2**30 (on a 32-bit arch) requires no heap allocation.
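
To make the tagging idea concrete, here is a minimal sketch in Python that models a 32-bit machine word as a plain integer. It assumes the low bit is used as the small-integer tag, which leaves 31 bits for a signed value; the function names are hypothetical and this is not MicroPython's actual code.

```python
WORD_BITS = 32                      # assumption: 32-bit architecture
WORD_MASK = (1 << WORD_BITS) - 1
SMALL_INT_MIN = -(1 << (WORD_BITS - 2))      # -2**30
SMALL_INT_MAX = (1 << (WORD_BITS - 2)) - 1   # 2**30 - 1


def fits_small_int(n):
    """True if n can be stored inside a tagged word."""
    return SMALL_INT_MIN <= n <= SMALL_INT_MAX


def box_small_int(n):
    """Pack n into a word: value shifted left by one, low bit set as the tag."""
    if not fits_small_int(n):
        raise OverflowError("value needs a heap-allocated integer")
    return ((n << 1) | 1) & WORD_MASK


def is_small_int(word):
    """A word with the low bit set holds an integer, not a pointer."""
    return (word & 1) == 1


def unbox_small_int(word):
    """Recover the signed value: sign-extend from 32 bits, then shift right."""
    if word & (1 << (WORD_BITS - 1)):
        word -= 1 << WORD_BITS
    return word >> 1


def add_words(a, b):
    """Add two tagged words; return None where a real VM would use the heap."""
    if is_small_int(a) and is_small_int(b):
        result = unbox_small_int(a) + unbox_small_int(b)
        if fits_small_int(result):
            return box_small_int(result)
    return None
```

As long as both operands and the result stay in range, the whole addition happens on machine words; only the overflow case would need a heap object.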

> Do you use opcode dictionary caching only for LOAD_GLOBAL-like opcodes? Do you have an equivalent of LOAD_FAST, or do you use dicts to store local variables?

The opcodes that have dict caching are:

    LOAD_NAME
    LOAD_GLOBAL
    LOAD_ATTR
    STORE_ATTR
    LOAD_METHOD (not implemented yet in mainline repo)
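
As a rough illustration of what caching means for these opcodes, the sketch below in Python keeps a guess of the slot where a key was found last time and checks that slot before falling back to a full search. The SlotTable class, its linear search and the counter are all hypothetical simplifications, not the actual MicroPython data structures.

```python
class SlotTable:
    """Hypothetical stand-in for a dict, stored as a list of (key, value) slots."""

    def __init__(self, items):
        self.slots = list(items.items())
        self.full_searches = 0  # counts how often the guess did not help

    def lookup(self, key, guess):
        """Return (value, slot_index); the caller stores slot_index as its next guess."""
        # Fast path: the guessed slot still holds the key we want.
        if 0 <= guess < len(self.slots) and self.slots[guess][0] == key:
            return self.slots[guess][1], guess
        # Slow path: search the whole table and report where the key was found.
        self.full_searches += 1
        for index, (slot_key, value) in enumerate(self.slots):
            if slot_key == key:
                return value, index
        raise NameError(key)


globals_table = SlotTable({"a": 1, "b": 2, "c": 3})

# First execution of the opcode: the initial guess (0) is wrong.
value, offset_guess = globals_table.lookup("c", 0)

# Later executions reuse the stored guess and skip the search.
value_again, offset_guess = globals_table.lookup("c", offset_guess)
```

The guess is only a hint: if the table changes, the fast path simply fails and the slow path repairs the guess.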

For local variables we use LOAD_FAST and STORE_FAST (and DELETE_FAST). Actually, there are 16 dedicated opcodes for loading from positions 0-15, and 16 for storing to these positions. E.g.:

    LOAD_FAST_0
    LOAD_FAST_1
    ...

Mostly this is done to save RAM, since LOAD_FAST_0 is 1 byte.
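
The RAM saving can be sketched in Python as follows. The opcode numbers and the two-byte fallback form are made up for illustration and are not MicroPython's real encoding.

```python
LOAD_FAST_0 = 0xB0   # hypothetical: first of 16 consecutive one-byte opcodes
LOAD_FAST_N = 0x1A   # hypothetical: generic form, followed by a one-byte index


def encode_load_fast(index):
    """Encode a local-variable load, using one byte for positions 0-15."""
    if index < 0 or index > 255:
        raise ValueError("index out of range for this sketch")
    if index < 16:
        return bytes([LOAD_FAST_0 + index])
    return bytes([LOAD_FAST_N, index])


def decode_load_fast(code):
    """Return the local-variable position encoded by encode_load_fast."""
    opcode = code[0]
    if LOAD_FAST_0 <= opcode < LOAD_FAST_0 + 16:
        return opcode - LOAD_FAST_0
    if opcode == LOAD_FAST_N:
        return code[1]
    raise ValueError("not a LOAD_FAST opcode")
```

Functions with few locals, which is the common case, then pay one byte per local load instead of two.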

> If we change the opcode size, it will probably affect libraries that compose or modify code objects. Modules like "dis" will also need to be updated. And that's probably just the tip of the iceberg.

> We can still implement your approach if we add a separate private 'unsigned char' array to each code object, so that LOAD_GLOBAL can store the key offsets. It should be a bit faster than my current patch, since it has one less level of indirection. But this way we lose the ability to optimize LOAD_METHOD, simply because it requires more memory for its cache. In any case, I'll experiment!

The problem with that approach (having a separate array for offset_guess) is this: how do you know where to look in that array for a given LOAD_GLOBAL opcode? The second LOAD_GLOBAL in your bytecode should look at the second entry in the array, but how does it know that it is the second one?
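
To illustrate the difficulty, here is a small Python sketch using a toy bytecode format that is entirely hypothetical (LOAD_GLOBAL takes a one-byte operand, every other opcode takes none). Given only the instruction offset, the interpreter would have to count the preceding LOAD_GLOBAL opcodes to find its entry in the separate array.

```python
LOAD_GLOBAL = 0x1B   # hypothetical opcode, followed by a one-byte name index
OTHER = 0x00         # hypothetical opcode with no operand


def cache_index_by_scanning(code, ip):
    """Find the cache entry for the LOAD_GLOBAL at offset ip by counting
    the LOAD_GLOBAL opcodes that come before it in the bytecode."""
    index = 0
    pos = 0
    while pos < ip:
        if code[pos] == LOAD_GLOBAL:
            index += 1
            pos += 2   # skip the opcode and its operand
        else:
            pos += 1
    return index


# Three LOAD_GLOBAL opcodes, at offsets 0, 3 and 6.
code = bytes([LOAD_GLOBAL, 0, OTHER, LOAD_GLOBAL, 1, OTHER, LOAD_GLOBAL, 2])
```

Scanning like this on every execution would cost more than the cache saves. When the guess is stored inline, right after the opcode, the instruction pointer already says where it is.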

I'd love to experiment implementing my original caching idea with CPython, but no time!

Cheers, Damien.
