[Python-Dev] Bytecode analysis
damien morton dmorton@bitfurnace.com
Wed, 26 Feb 2003 00:59:11 -0500
I implemented LOAD_FAST_n, STORE_FAST_n and LOAD_CONST_n for n < 16.

Getting a small 2% improvement in speed, going from about 21800 PyStones to 22300 PyStones. It is very hard to get consistent readings on the PyStones; has anyone got any tips on how to get more consistent results under Windows?

Getting a small 3% reduction in .pyc file sizes:

    os.path  24,929  unmodified
    os.path  24,149  with modifications
I sort of cheated on the switch statement to avoid the use of a goto.
    opcode = NEXTOP();
    if (HAS_ARG(opcode))
        oparg = NEXTARG();
    ...
    switch (opcode) {
    ...
    case LOAD_FAST_14:
    case LOAD_FAST_15:
        oparg = opcode - LOAD_FAST_0;
        /* fall through to the general LOAD_FAST case */
    case LOAD_FAST:
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
            ...
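For anyone who wants to play with the fall-through trick outside of ceval.c, here is a minimal standalone sketch. Everything in it is hypothetical and only for illustration: the opcode numbering, the names OP_STOP and run_sketch, the cut-down range n < 4 instead of n < 16, and the "sum the loaded locals" behaviour that stands in for pushing onto the value stack. It is not the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbering for illustration; not CPython's real values. */
enum {
    OP_STOP = 0,
    OP_LOAD_FAST_0,   /* OP_LOAD_FAST_0 .. OP_LOAD_FAST_3 are contiguous */
    OP_LOAD_FAST_1,
    OP_LOAD_FAST_2,
    OP_LOAD_FAST_3,
    OP_LOAD_FAST      /* general form: followed by a 2-byte little-endian index */
};

/* Execute `code`, adding every loaded local to an accumulator.
   Returns the accumulator at OP_STOP, or -1 on an unknown opcode. */
long run_sketch(const unsigned char *code, size_t len,
                const long *locals, size_t nlocals)
{
    size_t pc = 0;
    long acc = 0;

    for (;;) {
        unsigned opcode, oparg = 0;

        assert(pc < len);
        opcode = code[pc++];
        if (opcode == OP_LOAD_FAST) {   /* only the general form has an argument */
            assert(pc + 1 < len);
            oparg = code[pc] | ((unsigned)code[pc + 1] << 8);
            pc += 2;
        }

        switch (opcode) {
        case OP_STOP:
            return acc;
        case OP_LOAD_FAST_0:
        case OP_LOAD_FAST_1:
        case OP_LOAD_FAST_2:
        case OP_LOAD_FAST_3:
            /* Recover the index from the opcode itself, no goto needed. */
            oparg = opcode - OP_LOAD_FAST_0;
            /* fall through */
        case OP_LOAD_FAST:
            assert(oparg < nlocals);
            acc += locals[oparg];
            break;
        default:
            return -1;
        }
    }
}
```

Feeding it the byte sequence OP_LOAD_FAST_2, OP_LOAD_FAST 1 0, OP_LOAD_FAST_0, OP_STOP with locals {10, 20, 30} exercises both the 1-byte and the 3-byte forms through the same case body.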
I also altered the opcode.h file to use an enum for the opcodes instead of all those #defines. Much easier to re-arrange things that way. I have a feeling that most of the speedup (such as it is) comes from that re-arrangement, which packs the opcodes into a contiguous numeric space.

I suspect that sorting the opcodes by frequency of access might also have some positive effect. Also, organising the opcodes and the switch statement so that frequently co-occurring opcodes are adjacent to each other might also have some positive effect.
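To try the frequency idea, one would first need a ranking of opcodes by how often they are executed. A rough sketch of that step follows. It assumes a trace of executed opcode bytes (arguments already stripped) has been collected somehow; the names op_count, rank_opcodes and NUM_OPCODES are made up for this example. It only produces the ordering and says nothing about whether renumbering actually helps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_OPCODES 256   /* assumption: opcodes fit in one byte */

struct op_count {
    unsigned char opcode;
    unsigned long count;
};

/* qsort comparator: higher counts first, ties broken by opcode value. */
static int by_count_desc(const void *a, const void *b)
{
    const struct op_count *x = a;
    const struct op_count *y = b;

    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return (int)x->opcode - (int)y->opcode;
}

/* Fill `out` (NUM_OPCODES entries) with all opcodes, most frequent first. */
void rank_opcodes(const unsigned char *trace, size_t len, struct op_count *out)
{
    size_t i;

    assert(out != NULL);
    assert(trace != NULL || len == 0);

    for (i = 0; i < NUM_OPCODES; i++) {
        out[i].opcode = (unsigned char)i;
        out[i].count = 0;
    }
    for (i = 0; i < len; i++)
        out[trace[i]].count++;   /* out is still indexed by opcode here */

    qsort(out, NUM_OPCODES, sizeof out[0], by_count_desc);
}
```

The position of each opcode in the sorted array could then be used as its new enum value, which is where having the opcodes in an enum rather than #defines would pay off.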
-----Original Message-----
From: guido@python.org [mailto:guido@python.org]
Sent: Tuesday, 25 February 2003 20:25
To: damien morton
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Bytecode analysis
> As you say, LOADFAST is a very frequently occurring instruction, both
> statically and dynamically. Reducing it from a 3 byte instruction to a
> 1 byte instruction in 97% of (static) cases should be an overall good.
>
> Most of the opcodes I proposed could be added without disturbing
> locality of reference.
>
> e.g.
>
> switch (op = *p++) {
> ...
> case LOADFAST:
>   index = (*p++) + (*p++)<<8
>   goto LOADFASTMAIN;
>   break;
> case LOADFAST0:
> case LOADFAST1:
> case LOADFAST15:
>   index = op - LOADFAST0
> LOADFASTMAIN:
>   ...
>   break;
>
>
> }

Good idea. Can you benchmark this?

--Guido van Rossum (home page: http://www.python.org/~guido/)