[Python-Dev] Performance (non)optimization: 31-bit ints in pointers (original) (raw)

Jeff Epler jepler@unpythonic.net
Mon, 12 Aug 2002 15:51:08 -0500


Many Lisp interpreters use 'tagged types' to, among other things, let small ints reside directly in the machine registers.

Python might wish to take advantage of this by designating pointers to odd addresses stand for integers according to the following relationship: p = (i<<1) | 1 i = (p>>1) (due to alignment requirements on all common machines, all valid pointers-to-struct have 0 in their low bit) This means that all integers which fit in 31 bits can be stored without actually allocating or deallocating anything.

I modified a Python interpreter to the point where it could run simple programs. The changes are unfortunately very invasive, because they make any C code which simply executes o->ob_type or otherwise dereferences a PyObject* invalid when presented with a small int. This would obviously affect a huge amount of existing code in extensions, and is probably enough to stop this from being implemented before Python 3000.

This also introduces another conditional branch in many pieces of code, such as any call to PyObject_TypeCheck().

Performance results are mixed. A small program designed to test the speed of all-integer arithmetic comes out faster by 14% (3.38 vs 2.90 "user" time on my machine) but pystone comes out 5% slower (14124 vs 13358 "pystones/second").

I don't know if anybody's barked up this tree before, but I think these results show that it's almost certainly not worth the effort to incorporate this "performance" hack in Python. I'll keep my tree around for awhile, in case anybody else wants to see it, but beware that it still has serious issues even in the core: >>> 0+0j Traceback (most recent call last): File "", line 1, in ? TypeError: unsupported operand types for +: 'int' and 'complex' >>> (0).class Segmentation fault

Jeff jepler@unpythonic.net PS The program that shows the advantage of this optimization is as follows:

j = 0 for k in range(10): for i in range(100000) + range(1<<30, 1<<30 + 100000): j = j ^ i print j