[Python-Dev] Tagged integers

James Y Knight foom at fuhm.net
Thu Jul 15 08:37:45 CEST 2004


On Jul 14, 2004, at 9:42 PM, Guido van Rossum wrote:

> Sorry, I'm still not convinced that it's worth to break all the 3rd party extensions that undoubtedly are doing all sorts of things that strictly speaking they shouldn't do.

There really is a minimal set of things you can do that will not already cause problems. The only thing you can do with an arbitrary PyObject * is access its ob_type or ob_refcnt. Anything else will break with objects today. So, those accesses all need to be cleaned up to use the new macros.

The other thing a 3rd party extension could do that would now break is to access a PyIntObject's ->ob_ival field directly. There's already the PyInt_AS_LONG macro they ought to be using for that.
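To make the accessor argument concrete, here is a minimal sketch in C of why direct field access breaks once small integers live inside the pointer itself. It is not the actual patch: the names type_t, object_t, IS_TAGGED, tag_int, get_type and as_long are hypothetical, and it assumes heap objects are at least 2-byte aligned (so a real pointer has its low bit clear) and that right-shifting a negative intptr_t is an arithmetic shift, which is true on mainstream compilers but implementation-defined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for PyTypeObject, PyObject and PyIntObject
   (not CPython's real layouts). */
typedef struct { const char *name; } type_t;
typedef struct { long refcnt; type_t *type; } object_t;
typedef struct { object_t head; long ival; } int_object_t;

static type_t int_type = { "int" };

/* Assumption: a set low bit marks a tagged integer, a clear low bit
   marks an ordinary pointer to a heap object. */
#define IS_TAGGED(p) ((((uintptr_t)(p)) & 1u) != 0)

/* Pack a small integer into a pointer-sized word. */
static object_t *tag_int(long v)
{
    assert(v >= LONG_MIN / 2 && v <= LONG_MAX / 2); /* one bit is used by the tag */
    return (object_t *)((((uintptr_t)(intptr_t)v) << 1) | 1u);
}

/* Recover the integer; relies on arithmetic right shift (see lead-in). */
static long untag_int(object_t *p)
{
    return (long)(((intptr_t)p) >> 1);
}

/* Accessor in the spirit of a type-lookup macro: a tagged value must
   never be dereferenced, so the type is supplied by the branch. */
static type_t *get_type(object_t *p)
{
    return IS_TAGGED(p) ? &int_type : p->type;
}

/* Accessor in the spirit of PyInt_AS_LONG: works for both the tagged
   and the heap-allocated representation. */
static long as_long(object_t *p)
{
    return IS_TAGGED(p) ? untag_int(p) : ((int_object_t *)p)->ival;
}
```

Code that goes through get_type() and as_long() keeps working for both representations, whereas code that writes p->type or ->ival directly would dereference a tagged value that is not a valid address.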

If the idea to rename the ob_type/ob_refcnt fields is implemented, I really don't see any mysterious runtime failures occurring.

The idea of doing this in stages (as Jeff Epler says) is probably a good one. One question though (I am not very knowledgeable in this area): are C extensions generally binary compatible between Python major versions anyway? I had thought that they weren't.

> And what about all the extra code generated for PyDECREF and PyINCREF calls? These now all contain an extra jump. Horrors!

Indeed. That (and the similar branch in Py_GETTYPE) is why some operations are slower. The only question here is: do the speedups outweigh the slowdowns?

Investigating pybench some more, it seems that the speedup for many of the tests comes from the shortcut for tagged types in Py_INCREF and Py_DECREF. E.g. with the pybench TupleSlicing test, as written, the speed diff is -39.13%. However, changing the tuple to contain strings instead of integers causes the change to be +0.03%.
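The refcounting shortcut can be sketched in C roughly as follows. This is an illustration under assumptions, not the code from the patch: object_t, IS_TAGGED, incref, decref, copy_refs and the refcount_writes counter are all hypothetical names, and the low-bit tagging scheme is assumed. It shows how copying references to tagged values touches no object memory at all, while the same copy over heap objects performs one refcount write per item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PyObject; only the refcount matters here. */
typedef struct { long refcnt; } object_t;

/* Assumption: a set low bit marks a tagged (non-heap) value. */
#define IS_TAGGED(p) ((((uintptr_t)(p)) & 1u) != 0)

/* Illustration only: counts how many refcount writes hit memory. */
static long refcount_writes = 0;

static object_t *tag_int(long v)
{
    return (object_t *)((((uintptr_t)(intptr_t)v) << 1) | 1u);
}

static void incref(object_t *p)
{
    if (IS_TAGGED(p))
        return;              /* the extra branch: nothing to count */
    p->refcnt++;
    refcount_writes++;
}

static void decref(object_t *p)
{
    if (IS_TAGGED(p))
        return;
    assert(p->refcnt > 0);
    p->refcnt--;             /* deallocation at zero is omitted here */
    refcount_writes++;
}

/* Copy n references, roughly what slicing a tuple has to do. */
static void copy_refs(object_t **dst, object_t **src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        incref(src[i]);
        dst[i] = src[i];
    }
}
```

With a source array of tagged integers, copy_refs() leaves refcount_writes at zero; with heap objects such as strings, every item costs a memory write plus the untaken branch, which is consistent with the string variant of the test showing essentially no change.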

This makes me think that perhaps using an inline tagged representation for things like True, False, and single character strings < 256 might be a win as well, even though they don't ever cause memory allocation -- just because of the refcounting. Well, I tried making booleans a tagged type as well: pybench then runs 11% faster than standard Python (vs the 7% with just tagged integers).

The all-important benchmark Parrotbench runs 4.8% faster than standard Python.

James

P.S.: I have also previously experimented with removing refcounting from Python completely, and using the Boehm GC, because I theorized that refcounting could very well be slower than a full conservative GC. It mostly worked (I did not finish, so it did not support destructors or weakrefs), but was not a win, because it was no longer possible to use large block allocation and free list tricks for integers, and thus integer allocation caused a huge slowdown. I might revisit that now. I do realize using Boehm GC has pretty much a negative probability of making it into standard Python, though. :)
