[Python-Dev] Wordcode: new regular bytecode using 16-bit units

Victor Stinner victor.stinner at gmail.com
Wed Apr 13 12:24:44 EDT 2016


Hi,

During the recent discussions about Python performance, changing the Python bytecode came up. Serhiy proposed reusing MicroPython's short bytecode to reduce both disk space and memory footprint.

Demur Rumed proposes a different change: a regular bytecode built from 16-bit units. Every instruction always has one 8-bit argument, which is zero if the instruction doesn't take an argument:

http://bugs.python.org/issue26647

According to benchmarks, it looks faster:

http://bugs.python.org/issue26647#msg263339

IMHO it's a nice enhancement: it makes the code simpler. The most interesting change is made in Python/ceval.c:

```
-        if (HAS_ARG(opcode))
-            oparg = NEXTARG();
+        oparg = NEXTARG();
```

This code is in the very hot loop that evaluates Python bytecode. I expect that removing a conditional branch here reduces CPU branch mispredictions.

I reviewed the first versions of the change, and IMHO it's almost ready to be merged. But I would prefer to have a review from at least a second core reviewer.

Can someone please review the change?

A side effect of wordcode is that instructions with an argument in 0..255 now use 2 bytes instead of 3, so it also reduces the size of the bytecode in the most common case.

Larger arguments cost more: a 16-bit argument (0..65,535) now uses 4 bytes instead of 3. Arguments are supported up to 32 bits: a 24-bit argument uses 3 units (6 bytes), a 32-bit argument uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit argument for keyword defaults and a 24-bit argument for annotations. The other common instructions known to use large arguments are jumps in bytecode longer than 256 bytes.
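
To make the size rules above concrete, here is a minimal sketch in C of how an instruction could be laid out in 16-bit units. It assumes that each extra byte of a large argument is carried by a prefix unit placed before the real instruction, high-order bytes first. The names wordcode_units, wordcode_emit and SKETCH_EXTENDED_ARG (and its value) are made up for this example and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode number for the prefix unit that carries the
   higher bytes of a large argument (an assumption for this sketch). */
#define SKETCH_EXTENDED_ARG 144

/* Number of 16-bit units needed for an instruction with this argument:
   1 unit up to 8 bits, 2 up to 16 bits, 3 up to 24 bits, 4 up to 32 bits. */
static size_t
wordcode_units(uint32_t oparg)
{
    if (oparg <= 0xFF) return 1;
    if (oparg <= 0xFFFF) return 2;
    if (oparg <= 0xFFFFFF) return 3;
    return 4;
}

/* Write one instruction into `out` (room for 8 bytes is always enough)
   and return the number of bytes written. Prefix units come first and
   hold the high-order bytes; the real opcode comes last with the low byte. */
static size_t
wordcode_emit(unsigned char *out, unsigned char opcode, uint32_t oparg)
{
    size_t units = wordcode_units(oparg);
    size_t n = 0;

    for (size_t i = units; i > 1; i--) {
        out[n++] = SKETCH_EXTENDED_ARG;
        out[n++] = (unsigned char)((oparg >> (8 * (i - 1))) & 0xFF);
    }
    out[n++] = opcode;
    out[n++] = (unsigned char)(oparg & 0xFF);
    return n;
}
```

With this layout, wordcode_emit(buf, op, 7) writes 2 bytes and wordcode_emit(buf, op, 0x1234) writes 4 bytes, which matches the 2-byte and 4-byte figures quoted above.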

Right now, ceval.c still fetches the opcode and then the oparg with two separate 8-bit reads. Later, we can discuss whether it would be possible to ensure that the bytecode is always aligned to 16 bits in memory, so that the two bytes can be fetched through a uint16_t* pointer.

Maybe we can overallocate 1 byte in codeobject.c and manually align the memory block if needed. Or maybe ceval.c should copy the code if it's not aligned?
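
As a rough illustration of the two fetch strategies, the sketch below reads one instruction either as two bytes or as a single 16-bit word. The function names are hypothetical. The word version uses memcpy rather than a uint16_t* cast, which is a choice made for this example so that it stays well-defined on unaligned input; it is not what the patch does.

```c
#include <stdint.h>
#include <string.h>

/* Two separate 8-bit reads: opcode first, then its argument. */
static void
fetch_two_bytes(const unsigned char *ip, int *opcode, int *oparg)
{
    *opcode = ip[0];
    *oparg = ip[1];
}

/* One 16-bit load. memcpy keeps the load valid even when `ip` is not
   2-byte aligned; a direct uint16_t* dereference would require alignment. */
static void
fetch_one_word(const unsigned char *ip, int *opcode, int *oparg)
{
    uint16_t word;
    memcpy(&word, ip, sizeof word);

    /* Detect the host byte order at run time so the sketch is portable. */
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);

    if (first_byte == 1) {
        /* Little-endian: the first byte in memory is the low byte. */
        *opcode = word & 0xFF;
        *oparg = word >> 8;
    }
    else {
        /* Big-endian: the first byte in memory is the high byte. */
        *opcode = word >> 8;
        *oparg = word & 0xFF;
    }
}
```

Both functions return the same opcode and oparg for the same input; the open question in the thread is only whether the single load can be made safe and fast everywhere.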
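
The overallocation idea could look roughly like the following sketch. The helper names align_up_16bit and copy_aligned are invented for this example, and plain malloc stands in for whatever allocator codeobject.c would really use. Note that malloc already returns suitably aligned memory, so the spare byte only matters when the bytes live at an arbitrary offset inside a larger block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round a pointer up to the next 2-byte boundary (moves 0 or 1 byte). */
static unsigned char *
align_up_16bit(unsigned char *p)
{
    uintptr_t addr = (uintptr_t)p;
    return p + (addr & 1);
}

/* Copy `size` bytes of bytecode into a new block with one spare byte and
   return a 2-byte aligned pointer into it. `*block` receives the pointer
   that must later be passed to free(). Returns NULL on allocation failure. */
static unsigned char *
copy_aligned(const unsigned char *code, size_t size, void **block)
{
    unsigned char *raw = malloc(size + 1);
    if (raw == NULL) {
        *block = NULL;
        return NULL;
    }
    unsigned char *aligned = align_up_16bit(raw);
    memcpy(aligned, code, size);
    *block = raw;
    return aligned;
}
```

The caller keeps both pointers: the aligned one for evaluation and the raw one for freeing the block.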

Raymond Hettinger proposed something similar, but it looks like there are concerns about non-aligned memory accesses:

http://bugs.python.org/issue25823

The cost of non-aligned memory accesses depends on the CPU architecture, but they can raise a SIGBUS on some architectures (MIPS and SPARC?).

Victor
