[Python-Dev] Opcode cache in ceval loop

Yury Selivanov yselivanov.ml at gmail.com
Mon Feb 1 16:21:37 EST 2016


Hi Damien,

On 2016-02-01 3:59 PM, Damien George wrote:

Hi Yury,

That's great news about the speed improvements with the dict offset cache!

The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets a cache offset table allocated for it (+1 byte for each opcode) + an array of cache structs.

Ok, so each opcode has a 1-byte cache that sits separately from the actual bytecode. But a lot of opcodes don't use it, so that leads to some wasted memory, correct?

Each code object has a list of opcodes and their arguments (bytes object == unsigned char array).

"Hot" code objects have an offset table (unsigned chars), and a cache entries array (hope your email client will display the following correctly):

opcodes          offset       cache entries
                 table

 OPCODE            0            cache for 1st LOAD_ATTR
 ARG1              0            cache for 1st LOAD_GLOBAL
 ARG2              0            cache for 2nd LOAD_ATTR
 OPCODE            0            cache for 1st LOAD_METHOD
 LOAD_ATTR         1            ...
 ARG1              0
 ARG2              0
 OPCODE            0
 LOAD_GLOBAL       2
 ARG1              0
 ARG2              0
 LOAD_ATTR         3
 ARG1              0
 ARG2              0
 ...              ...
 LOAD_METHOD       4
 ...              ...
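
A back-of-the-envelope sketch of the memory cost implied by the layout above. It assumes one offset-table byte per bytecode byte (opcodes and args alike, as in the diagram) and 32 bytes per cache entry. The function name is made up for illustration and is not part of the patch.

```python
CACHE_ENTRY_SIZE = 32  # bytes per cache struct, as quoted from code.h above


def cache_overhead(code_len, n_cached_opcodes):
    """Extra bytes a hot code object pays (illustrative estimate only)."""
    # One unsigned char per bytecode byte, whether or not that byte is cacheable.
    offset_table_bytes = code_len
    # One cache struct per LOAD_ATTR / LOAD_GLOBAL / LOAD_METHOD occurrence.
    entries_bytes = CACHE_ENTRY_SIZE * n_cached_opcodes
    return offset_table_bytes + entries_bytes
```

So the "wasted" part is the zero bytes in the offset table; the 32-byte entries are only allocated for opcodes that actually use the cache.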

When, say, a LOAD_ATTR opcode executes, it first checks if the code object has a non-NULL cache-entries table.

If it does, that LOAD_ATTR then uses the offset table (indexing with its INSTR_OFFSET()) to find its position in cache-entries.
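
The lookup can be modelled in a few lines of Python. This is a toy sketch, not the C code from the patch: `cache_entry_for` is a hypothetical helper, `instr_offset` is taken to be the byte position of the opcode itself, and the offset table is assumed to hold 1-based indices with 0 meaning "no cache", which is how the diagram reads.

```python
def cache_entry_for(offset_table, cache_entries, instr_offset):
    """Return the cache entry for the opcode at instr_offset, or None."""
    if cache_entries is None:
        # Code object is not hot yet: nothing has been allocated.
        return None
    slot = offset_table[instr_offset]
    if slot == 0:
        # This byte is an arg, or an opcode that does not use the cache.
        return None
    # Assumption: table values are 1-based, so subtract 1 to index the array.
    return cache_entries[slot - 1]


# Data mirroring the diagram: cacheable opcodes at byte offsets 1, 5 and 8.
offset_table = bytes([0, 1, 0, 0, 0, 2, 0, 0, 3, 0, 0])
cache_entries = ["1st LOAD_ATTR", "1st LOAD_GLOBAL", "2nd LOAD_ATTR"]
entry = cache_entry_for(offset_table, cache_entries, 5)
```

Because the table is indexed by byte offset, there is no need to track an opcode count at run time.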

But then how do you index the cache, do you keep a count of the current opcode number? If I remember correctly, CPython has some opcodes taking 1 byte, and some taking 3 bytes, so the offset into the bytecode cannot be easily mapped to a bytecode number.

First, when a code object is created, it doesn't have an offset table and cache entries (those are set to NULL).

Each code object has a new field to count how many times it was called. Each time a code object is called with PyEval_EvalFrameEx, that field is incremented.

Once a code object is called more than 1024 times we:

  1. allocate memory for its offset table;

  2. iterate through its opcodes and count how many LOAD_ATTR, LOAD_METHOD and LOAD_GLOBAL opcodes it has;

  3. as part of (2), initialize the offset table with the correct mapping. Some opcodes will have a non-zero entry in the offset table, some won't. Opcode args will always have zeros in the offset table;

  4. allocate the cache-entries table.
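
The steps above can be sketched as a toy Python model. Everything here is illustrative rather than the actual C implementation: the class and function names are hypothetical, the opcode numbers are placeholders, the 1-byte/3-byte instruction split follows the encoding mentioned in the question, and raising an error when more than 255 cacheable opcodes are found is just this sketch's way of respecting the unsigned-char table.

```python
# Placeholder opcode numbers for the sketch; not claimed to match CPython.
HAVE_ARGUMENT = 90   # opcodes >= this are followed by two argument bytes
POP_TOP, LOAD_ATTR, LOAD_GLOBAL, LOAD_METHOD = 1, 106, 116, 160
CACHED_OPCODES = {LOAD_ATTR, LOAD_GLOBAL, LOAD_METHOD}
HOT_THRESHOLD = 1024


class CodeObject:
    def __init__(self, code):
        self.code = code
        self.call_count = 0
        self.offset_table = None    # NULL until the code object is hot
        self.cache_entries = None   # NULL until the code object is hot


def build_offset_table(code):
    """Steps 1-3: allocate the table, count cacheable opcodes, fill the mapping."""
    table = bytearray(len(code))    # step 1: all zeros, one byte per bytecode byte
    count = 0
    i = 0
    while i < len(code):
        op = code[i]
        if op in CACHED_OPCODES:
            count += 1              # step 2: count LOAD_ATTR/LOAD_GLOBAL/LOAD_METHOD
            if count > 255:
                # Sketch-only limit: the index must fit in an unsigned char.
                raise ValueError("too many cacheable opcodes for a 1-byte index")
            table[i] = count        # step 3: 1-based index; arg bytes stay 0
        i += 3 if op >= HAVE_ARGUMENT else 1
    return table, count


def on_call(co):
    """Model of the per-call bookkeeping that triggers the optimization."""
    co.call_count += 1
    if co.call_count <= HOT_THRESHOLD or co.cache_entries is not None:
        return
    co.offset_table, count = build_offset_table(co.code)
    co.cache_entries = [None] * count   # step 4: one empty entry per cached opcode


code = bytes([POP_TOP, LOAD_ATTR, 0, 0, POP_TOP, LOAD_GLOBAL, 1, 0, LOAD_ATTR, 2, 0])
co = CodeObject(code)
for _ in range(HOT_THRESHOLD + 1):
    on_call(co)
```

After the loop, `co.offset_table` is non-zero only at the byte offsets of the three cacheable opcodes, and `co.cache_entries` has three slots.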

Yury


