[Python-Dev] Bytecode analysis

damien morton dmorton@bitfurnace.com
Wed, 26 Feb 2003 00:59:11 -0500


I implemented LOAD_FAST_n, STORE_FAST_n and LOAD_CONST_n opcodes for n < 16.

I'm getting a small 2% improvement in speed, going from about 21,800 PyStones to 22,300 PyStones. It's very hard to get consistent readings from PyStone, though. Does anyone have tips on getting more consistent results under Windows?

I'm also getting a small 3% reduction in .pyc file sizes:

    os.path   24,929 bytes (unmodified)
    os.path   24,149 bytes (with modifications)
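The size saving follows from the encoding: in the Python 2.x bytecode format an opcode at or above HAVE_ARGUMENT is followed by a 2-byte argument, so a short form that bakes the local's index into the opcode saves 2 bytes per occurrence. A minimal sketch, assuming a hypothetical opcode numbering; encode_load_fast() and encoded_size() are illustrative names, not CPython functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbering, for illustration only. */
enum {
    LOAD_FAST_0 = 1,      /* LOAD_FAST_0 .. LOAD_FAST_15 occupy 1..16 */
    HAVE_ARGUMENT = 90,   /* opcodes >= this carry a 2-byte argument */
    LOAD_FAST = 124
};

/* The short forms must be argument-less, i.e. below HAVE_ARGUMENT. */
_Static_assert(LOAD_FAST_0 + 15 < HAVE_ARGUMENT,
               "short forms must not take an argument");

/* Encode "LOAD_FAST index" into buf (room for 3 bytes), using the 1-byte
   short form when index < 16. Returns the number of bytes written. */
static size_t encode_load_fast(unsigned char *buf, unsigned index)
{
    assert(index <= 0xFFFF);   /* the long form has a 16-bit argument */
    if (index < 16) {
        buf[0] = (unsigned char)(LOAD_FAST_0 + index);
        return 1;
    }
    buf[0] = LOAD_FAST;
    buf[1] = (unsigned char)(index & 0xFF);   /* low byte first */
    buf[2] = (unsigned char)(index >> 8);
    return 3;
}

/* Bytes needed to encode a run of LOAD_FAST instructions, with or
   without the short forms. */
static size_t encoded_size(const unsigned *indices, size_t n, int use_short)
{
    unsigned char scratch[3];
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += use_short ? encode_load_fast(scratch, indices[i]) : 3;
    return total;
}
```

For the indices {0, 3, 15, 16, 200} the long form needs 15 bytes and the mixed encoding 9, since only the last two indices fall back to the 3-byte form.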

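I cheated a little in the switch statement to avoid using a goto: the LOAD_FAST_n cases compute oparg from the opcode and then fall through into the ordinary LOAD_FAST case.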

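    opcode = NEXTOP();
    if (HAS_ARG(opcode))
        oparg = NEXTARG();    /* LOAD_FAST_n opcodes take no argument */
    ...
    switch (opcode) {
    ...
    case LOAD_FAST_14:
    case LOAD_FAST_15:
        /* the local's index is encoded in the opcode itself; this relies
           on LOAD_FAST_0 .. LOAD_FAST_15 being numbered contiguously */
        oparg = opcode - LOAD_FAST_0;
        /* fall through to the generic LOAD_FAST handler */
    case LOAD_FAST:
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
        ...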
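To see the fall-through trick in isolation, here is a self-contained toy interpreter loop. Everything in it (the STOP and ADD opcodes, the numbering, the stack machine and run()) is hypothetical and much simpler than ceval.c; only the 2-byte little-endian argument mirrors the Python 2.x bytecode format.

```c
#include <assert.h>

/* Toy opcodes (hypothetical numbering). Opcodes below HAVE_ARGUMENT take
   no argument; the rest are followed by a 2-byte little-endian argument. */
enum {
    STOP = 0,
    LOAD_FAST_0, LOAD_FAST_1, LOAD_FAST_2, LOAD_FAST_3,   /* 1..4 */
    ADD,
    HAVE_ARGUMENT = 90,
    LOAD_FAST = HAVE_ARGUMENT
};

#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)

/* Execute code against the given locals and return the top of the stack
   when STOP is reached. No bounds checking: a sketch, not a VM. */
static long run(const unsigned char *code, const long *locals)
{
    long stack[64];
    int sp = 0;
    const unsigned char *p = code;

    for (;;) {
        int opcode = *p++;
        int oparg = 0;
        if (HAS_ARG(opcode)) {
            oparg = p[0] | (p[1] << 8);   /* low byte first */
            p += 2;
        }
        switch (opcode) {
        case LOAD_FAST_0:
        case LOAD_FAST_1:
        case LOAD_FAST_2:
        case LOAD_FAST_3:
            /* recover the index from the opcode, then share the handler */
            oparg = opcode - LOAD_FAST_0;
            /* fall through */
        case LOAD_FAST:
            stack[sp++] = locals[oparg];
            break;
        case ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case STOP:
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}
```

Because the short forms only set oparg and fall through, the body of LOAD_FAST exists once, so adding the new opcodes does not duplicate the handler code.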

I also changed opcode.h to use an enum for the opcodes instead of all those #defines, which makes it much easier to rearrange things. I have a feeling that most of the speedup (such as it is) comes from that rearrangement, which packs the opcodes into a contiguous numeric space; presumably a dense range of case values makes it easier for the compiler to turn the switch into a compact jump table. I suspect that sorting the opcodes by frequency of use might also help, as might organising the opcodes and the switch statement so that frequently co-occurring opcodes are adjacent to each other.

-----Original Message-----
From: guido@python.org [mailto:guido@python.org]
Sent: Tuesday, 25 February 2003 20:25
To: damien morton
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Bytecode analysis

> As you say, LOAD_FAST is a very frequently occurring instruction, both
> statically and dynamically. Reducing it from a 3-byte instruction to a
> 1-byte instruction in 97% of (static) cases should be an overall good.
>
> Most of the opcodes I proposed could be added without disturbing
> locality of reference.
>
> e.g.
>
> switch (op = *p++) {
> ...
> case LOAD_FAST:
>     index = p[0] | (p[1] << 8);   /* 2-byte argument, low byte first */
>     p += 2;
>     goto LOAD_FAST_MAIN;
>     break;
> case LOAD_FAST_0:
> case LOAD_FAST_1:
> case LOAD_FAST_15:
>     index = op - LOAD_FAST_0;
> LOAD_FAST_MAIN:
>     ...
>     break;
> }

Good idea. Can you benchmark this?

--Guido van Rossum (home page: http://www.python.org/~guido/)