[Python-Dev] Speeding up CPython 5-10%
Yury Selivanov yselivanov.ml at gmail.com
Fri Jan 29 10:06:38 EST 2016
- Previous message (by thread): [Python-Dev] Speeding up CPython 5-10%
- Next message (by thread): [Python-Dev] Speeding up CPython 5-10%
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Damien,
BTW I just saw (and backed!) your new Kickstarter campaign to port MicroPython to ESP8266, good stuff!
On 2016-01-29 7:38 AM, Damien George wrote:
> Hi Yury,
>
>> [..] Do you use opcode dictionary caching only for LOAD_GLOBAL-like
>> opcodes? Do you have an equivalent of LOAD_FAST, or do you use dicts
>> to store local variables?
>
> The opcodes that have dict caching are:
>
> LOAD_NAME
> LOAD_GLOBAL
> LOAD_ATTR
> STORE_ATTR
> LOAD_METHOD (not implemented yet in mainline repo)
>
> For local variables we use LOAD_FAST and STORE_FAST (and DELETE_FAST).
> Actually, there are 16 dedicated opcodes for loading from positions 0-15,
> and 16 for storing to these positions. Eg:
>
> LOAD_FAST_0
> LOAD_FAST_1
> ...
>
> Mostly this is done to save RAM, since LOAD_FAST_0 is 1 byte.
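A rough Python sketch of the space saving Damien describes (the opcode numbers and the `load_fast_compact` helper are made up for illustration, not MicroPython's actual encoding):

```python
# Hypothetical opcode numbering (illustrative only): sixteen dedicated
# single-byte opcodes LOAD_FAST_0..LOAD_FAST_15, plus a generic two-byte
# LOAD_FAST <index> form for higher slots.
LOAD_FAST_0 = 0x10   # ... through 0x1F for LOAD_FAST_15
LOAD_FAST = 0x30     # generic form: opcode byte followed by an index byte

def load_fast_compact(n):
    """Emit bytecode for loading local slot n, preferring the 1-byte form."""
    if n < 16:
        return bytes([LOAD_FAST_0 + n])   # 1 byte: index folded into opcode
    return bytes([LOAD_FAST, n])          # 2 bytes: opcode + index

# The first 16 local slots cost one byte each instead of two:
assert load_fast_compact(3) == bytes([0x13])
assert load_fast_compact(40) == bytes([0x30, 40])
```

Since the first handful of locals dominate real code, folding the index into the opcode shrinks most LOAD_FAST instructions to a single byte, which matters on RAM-constrained targets.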
Interesting. This might actually make CPython slightly faster too. Worth trying.
If we change the opcode size, it will probably affect libraries that compose or modify code objects. Modules like "dis" will also need to be updated. And that's probably just the tip of the iceberg.
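To illustrate why tooling is sensitive to this, here is a naive bytecode walker of the kind such libraries contain (a sketch; it assumes every instruction occupies a fixed two bytes, which is true of CPython 3.6+ — at the time of this thread instructions were 1 or 3 bytes):

```python
import dis

def walk(func):
    """Naively walk a function's raw bytecode, assuming a fixed
    two-byte (opcode, arg) instruction size. Any change to opcode
    sizes would silently break walkers like this one."""
    co = func.__code__.co_code
    return [dis.opname[co[i]] for i in range(0, len(co), 2)]

def f(x):
    return x + 1

print(walk(f))  # names of the opcodes (and, on 3.11+, inline CACHE slots)
```

Code like this is scattered across bytecode-manipulating libraries, which is why changing instruction widths ripples far beyond the interpreter itself.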
>> We can still implement your approach if we add a separate private
>> 'unsigned char' array to each code object, so that LOAD_GLOBAL can store
>> the key offsets. It should be a bit faster than my current patch, since
>> it has one less level of indirection. But this way we lose the ability
>> to optimize LOAD_METHOD, simply because it requires more memory for its
>> cache. In any case, I'll experiment!
>
> Problem with that approach (having a separate array for the offset guess)
> is: how do you know where to look into that array for a given LOAD_GLOBAL
> opcode? The second LOAD_GLOBAL in your bytecode should look into the
> second entry in the array, but how does it know?
I've changed my approach a little bit. Now I have a simple function [1] to initialize the cache for code objects that are called frequently enough.
It walks through the code object's opcodes and creates the appropriate offset/cache tables.
Then, in the ceval loop I have a couple of convenient macros to work with the cache [2]. They use the INSTR_OFFSET() macro to locate the cache entry via the offset table initialized by [1].
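A simplified Python model of that offset-table idea (assumed and illustrative — the opcode numbers are made up, and instructions are modeled as flat (opcode, arg) byte pairs): a pre-pass walks the bytecode once and records a cache slot for every offset holding a LOAD_GLOBAL, so the eval loop can later map its current instruction offset straight to the right entry, answering the "how does the second LOAD_GLOBAL know?" question.

```python
LOAD_GLOBAL, LOAD_CONST, RETURN_VALUE = 116, 100, 83  # illustrative numbers

def build_offset_table(code):
    """One-time pre-pass: map instruction offset -> cache index for each
    LOAD_GLOBAL. Instructions are (opcode, arg) pairs, two units wide."""
    table = {}
    for offset in range(0, len(code), 2):
        if code[offset] == LOAD_GLOBAL:
            table[offset] = len(table)  # second LOAD_GLOBAL -> slot 1, etc.
    return table

bytecode = [LOAD_GLOBAL, 0, LOAD_CONST, 1, LOAD_GLOBAL, 2, RETURN_VALUE, 0]
table = build_offset_table(bytecode)
assert table == {0: 0, 4: 1}  # each LOAD_GLOBAL finds its own cache slot
```

The key property is that the lookup key is something the eval loop already has for free (the instruction offset), so no extra operand or bytecode change is needed.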
Thanks, Yury
[1] https://github.com/1st1/cpython/blob/opcache4/Objects/codeobject.c#L167
[2] https://github.com/1st1/cpython/blob/opcache4/Python/ceval.c#L1164