[Python-Dev] Tagged integers (original) (raw)
James Y Knight foom at fuhm.net
Wed Jul 14 08:41:19 CEST 2004
- Previous message: [Python-Dev] ANN: Reminder -- SciPy 04 is coming up
- Next message: [Python-Dev] Tagged integers
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
So I was saying to someone the other day "Gee, I wonder why Python doesn't use tagged integers, it seems like it would be a lot faster than allocating new objects all the time.", and they said "Cause you'd have to change everything, too much work!" and I said "Nah, you only need to change a few things to use macros, it'd only take a few hours, mostly query-replace".
So, of course, I had to do it then, and it only took a couple hours, and appears to be at least somewhat faster.
On the test that probably puts my change in the most positive light possible: "x = 0; while x < 50000000: x = x + 1", it achieves about a 50% increase in speed. More normal integer-heavy things seem to be at most 20% faster.
My implementation works as follows:
- Assumes at least 32-bit words
- Lower 2 bits of machine word are the tag. tag 0 == normal object pointer (all object pointers are already word aligned, so their lower 2 bits will be 0). tag 1 == integer. 2&3 unassigned.
- Integers that fit in 30 bits are stored into the word directly, not
in allocated memory.
- For example, integer "5" would be stored as (5 << 2) | 1 == 0x15.
- Thus, an arbitrary "PyObject *" can no longer be relied upon to actually be a pointer. Sometimes it will be tagged data instead.
- Thus, no code can directly access object fields ->ob_refcnt, or ->ob_type. Introduced Py_GETTYPE/Py_GETREF macros and search&replaced code in python to use them. These macros check the tag bits, and do the right thing if it is tagged data.
- I kept tagged integers as the same class as allocated int objects, as it simplified implementation (nothing outside intobject knows the difference). It would probably be nicer to do away with heap allocated int objects and overflow directly to long objects after 30 bits, but, IIRC, there are various bits of python that require integers up to the platform's full integer width to fit in an instance of PyInt_Type. Also, subclasses of integer must be heap allocated too, of course.
- intobject.c cannot directly access ->ob_ival. Instead, I made it use PyInt_AS_LONG, which is modified to know about tagged integers.
- That's about it!
So, why doesn't python already use tagged integers? Surely someone's thought to "just do it" before? I only see discussion of it with relation to pypy.
A couple things:
- It's almost certainly not fully portable -- that's fine! Keep using heap allocated int objects on architectures that it doesn't work with.
- It does cause incompatibility with external C modules. Not only binary incompatibility -- the source also needs to be modified to not use ->ob_type/->ob_refcnt directly. ob_refcnt is hardly used outside of INCREF/DECREF macros already, so that's good. ob_type is used a lot in extension modules' PyXXX_Check macros.
Here's the patch I have against Python-2.3.3. Please note this is just a couple hour hack, it may have errors. Most of the diff is quite boring and repetitious. <http://fuhm.net/~jknight/python233-tagint.diff.gz>.
So, any thoughts? Worth continuing on with this? If this is something that people are interested in actually doing, I could update the patch against latest CVS and put the changes in #ifdefs so it's compile-time selectable.
Thoughts for future development: There is space available for 2 more tagged data types. Could they be productively used? Perhaps one for single element tuples. Perhaps one for single character unicode strings. Dunno if those are easily doable and would actually increase performance.
James
PS: Here's the interesting portions of the changes. Yes, I realize typeof() and ({ are GCC extensions, but this was the most straightforward way to implement inline expression macros that may use their arguments more than once. Maybe they should be inline functions instead of macros?
===== object.h
#define Py_OBJ_TAG(op) (((Py_uintptr_t)(op)) & 0x03)
#define Py_GETTYPE(op) ({typeof((op)) __op = (op);
(!Py_OBJ_TAG(__op))?__op->ob_type:Py_tagged_types[Py_OBJ_TAG(__op)];
})
#define Py_GETREF(op) ({typeof((op)) __op = (op);
Py_OBJ_TAG(__op)?1:__op->ob_refcnt; })
#define Py_SETREF(op, val) ({typeof((op)) __op = (op);
if(!Py_OBJ_TAG(__op))
__op->ob_refcnt = (val);
})
#define Py_ADDREF(op, amt) ({typeof((op)) ___op = (op);
if(!Py_OBJ_TAG(___op))
___op->ob_refcnt += (amt);
Py_GETREF(___op);
})
#define Py_INCREF(op) (
_Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA
Py_ADDREF(op, 1), (void)0)
#define Py_DECREF(op)
if (_Py_DEC_REFTOTAL _Py_REF_DEBUG_COMMA
Py_ADDREF(op, -1) != 0)
_Py_CHECK_REFCNT(op)
else
_Py_Dealloc((PyObject *)(op))
===== intobject.h
#define PyInt_AS_LONG(op) ({typeof((op)) __op = (op);
Py_OBJ_TAG(__op)?(((long)__op) >> 2):((PyIntObject *)__op)->ob_ival;
})
- Previous message: [Python-Dev] ANN: Reminder -- SciPy 04 is coming up
- Next message: [Python-Dev] Tagged integers
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]