[Python-Dev] Opcode cache in ceval loop
Yury Selivanov yselivanov.ml at gmail.com
Mon Feb 1 16:21:37 EST 2016
- Previous message (by thread): [Python-Dev] Opcode cache in ceval loop
- Next message (by thread): [Python-Dev] Opcode cache in ceval loop
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Damien,
On 2016-02-01 3:59 PM, Damien George wrote:
> Hi Yury,
> That's great news about the speed improvements with the dict offset cache!
>> The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets a cache offset table allocated for it (+1 byte for each opcode) + an array of cache structs.
> Ok, so each opcode has a 1-byte cache that sits separately to the actual bytecode. But a lot of opcodes don't use it so that leads to some wasted memory, correct?
Each code object has a list of opcodes and their arguments (bytes object == unsigned char array).
"Hot" code objects have an offset table (unsigned chars), and a cache entries array (hope your email client will display the following correctly):
  opcodes       offset    cache entries
                table

  OPCODE        0         cache for 1st LOAD_ATTR
  ARG1          0         cache for 1st LOAD_GLOBAL
  ARG2          0         cache for 2nd LOAD_ATTR
  OPCODE        0         cache for 1st LOAD_METHOD
  LOAD_ATTR     1         ...
  ARG1          0
  ARG2          0
  OPCODE        0
  LOAD_GLOBAL   2
  ARG1          0
  ARG2          0
  LOAD_ATTR     3
  ARG1          0
  ARG2          0
  ...           ...
  LOAD_METHOD   4
  ...           ...
When, say, a LOAD_ATTR opcode executes, it first checks if the code object has a non-NULL cache-entries table.
If it has, that LOAD_ATTR then uses the offset table (indexing with its INSTR_OFFSET()) to find its position in cache-entries.
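The lookup just described can be modeled in a few lines of Python. This is only an illustrative sketch, not the C code from the patch: the names `CodeCache` and `lookup` are hypothetical, and the rule that a table value `n` refers to `cache_entries[n - 1]` is an assumption inferred from the diagram, where 0 means "no cache entry".

```python
from typing import List, Optional


class CodeCache:
    """Hypothetical model of the per-code-object cache (names are illustrative)."""

    def __init__(self, offset_table: Optional[bytes] = None,
                 cache_entries: Optional[List[dict]] = None) -> None:
        # One small integer per bytecode byte; 0 means "no cache entry here".
        self.offset_table = offset_table
        # One entry per cached opcode; stays None until the code object is hot.
        self.cache_entries = cache_entries


def lookup(cache: CodeCache, instr_offset: int) -> Optional[dict]:
    """Return the cache entry for the opcode at instr_offset, or None."""
    if cache.cache_entries is None:       # code object is not hot yet
        return None
    slot = cache.offset_table[instr_offset]
    if slot == 0:                         # opcode argument or uncached opcode
        return None
    return cache.cache_entries[slot - 1]  # assumed 1-based slot numbering


# Byte layout taken from the diagram:
# OPCODE ARG1 ARG2 OPCODE LOAD_ATTR ARG1 ARG2 OPCODE LOAD_GLOBAL ARG1 ARG2 LOAD_ATTR ARG1 ARG2
table = bytes([0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 3, 0, 0])
entries = [{"for": "1st LOAD_ATTR"},
           {"for": "1st LOAD_GLOBAL"},
           {"for": "2nd LOAD_ATTR"}]
hot = CodeCache(table, entries)
cold = CodeCache()
```

With this layout, `lookup(hot, 4)` yields the entry for the first LOAD_ATTR, `lookup(hot, 5)` (an argument byte) yields `None`, and any lookup on `cold` yields `None` because no tables have been allocated.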
> But then how do you index the cache, do you keep a count of the current opcode number? If I remember correctly, CPython has some opcodes taking 1 byte, and some taking 3 bytes, so the offset into the bytecode cannot be easily mapped to a bytecode number.
First, when a code object is created, it doesn't have an offset table and cache entries (those are set to NULL).
Each code object has a new field to count how many times it was called. Each time a code object is called with PyEval_EvalFrameEx, that field is incremented.
Once a code object is called more than 1024 times we:
1. allocate memory for its offset table;
2. iterate through its opcodes and count how many LOAD_ATTR, LOAD_METHOD and LOAD_GLOBAL opcodes it has;
3. As part of (2) we initialize the offset-table with correct mapping. Some opcodes will have a non-zero entry in the offset-table, some won't. Opcode args will always have zeros in the offset tables.
4. Then we allocate cache-entries table.
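The sequence above (count calls, then build the tables once the threshold is crossed) might look roughly like the following Python model. It is a sketch under stated assumptions, not the patch itself: `ToyCode`, `on_call`, and `_init_cache` are hypothetical names, instructions are given as `(name, has_arg)` pairs rather than real bytecode, and slots are numbered from 1 so that 0 can mean "no entry".

```python
from typing import List, Optional, Tuple

HOT_THRESHOLD = 1024  # from the text: optimize after more than 1024 calls
CACHED = {"LOAD_ATTR", "LOAD_METHOD", "LOAD_GLOBAL"}


class ToyCode:
    """Hypothetical stand-in for a code object (not CPython's real layout)."""

    def __init__(self, instructions: List[Tuple[str, bool]]) -> None:
        # An opcode with an argument occupies 3 bytes (OPCODE, ARG1, ARG2),
        # an opcode without one occupies 1 byte.
        self.instructions = instructions
        self.call_count = 0
        self.offset_table: Optional[bytearray] = None   # NULL until hot
        self.cache_entries: Optional[List[dict]] = None  # NULL until hot

    def on_call(self) -> None:
        """Model of the counter increment done on every call."""
        self.call_count += 1
        if self.call_count > HOT_THRESHOLD and self.offset_table is None:
            self._init_cache()

    def _init_cache(self) -> None:
        size = sum(3 if has_arg else 1 for _, has_arg in self.instructions)
        # Allocate the offset table, zero-filled: one byte per bytecode byte.
        # A bytearray slot holds at most 255, so this sketch fails with
        # ValueError beyond 255 cached opcodes; real code must handle that.
        table = bytearray(size)
        count, offset = 0, 0
        for name, has_arg in self.instructions:
            if name in CACHED:
                count += 1
                table[offset] = count  # assumed 1-based slot; arg bytes stay 0
            offset += 3 if has_arg else 1
        self.offset_table = table
        # Allocate one (initially empty) cache entry per cached opcode.
        self.cache_entries = [{} for _ in range(count)]


# Same instruction sequence as the diagram earlier in the message.
code = ToyCode([("OPCODE", True), ("OPCODE", False), ("LOAD_ATTR", True),
                ("OPCODE", False), ("LOAD_GLOBAL", True), ("LOAD_ATTR", True)])
```

Calling `code.on_call()` 1024 times leaves both tables unallocated; the 1025th call builds them, so cold code objects pay only for the counter.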
Yury