[Python-Dev] Simple dicts
Tim Peters tim.one@comcast.net
Wed, 14 May 2003 23:08:18 -0400
Behind the scenes, Damien Morton has been playing with radically different designs for "small" dicts. This reminded me that, in previous lives, I always used chaining ("linked lists") for collision resolution in hash tables. I don't have time to pursue this now, but it would make a nice experiment for someone who wants to try a non-trivial but still-simple and interesting project. It may even pay off, but that shouldn't be a motivation.
Design notes:
PyDictEntry would grow a new
PyDictEntry *next;
slot, boosting it from 12 to 16 bytes on a 32-bit box. This wasn't reasonable when Python was designed, but pymalloc allocates 16-byte chunks with virtually no wasted space.
PyDictObject would lose the ma_fill, ma_table, and ma_smalltable members, and gain a pointer to a variable-size vector of pointers
PyDictEntry **first;
For a hash table with 2**n slots, this vector holds 2**n pointers, memset to NULL initially. The hash chain for an object with hash code h starts at first[h & ma_mask], and is linked together via PyDictEntry.next pointers. Hash chains per hash code are independent. There's no use for the "dummy" state. There's no logical need for the "unused" state either, although a micro-optimizer may want to retain that in some form.
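To make the layout concrete, here's a minimal standalone sketch of the proposed structure. The names mirror the post (first, ma_mask, me_hash, next), but this is a toy with void* keys compared by identity, not CPython source:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch of the chained-dict layout described above. */
typedef struct entry {
    long me_hash;
    void *me_key;
    void *me_value;
    struct entry *next;      /* the new link slot, 12 -> 16 bytes per entry */
} entry;

typedef struct {
    long ma_mask;            /* 2**n slots -> mask == 2**n - 1 */
    entry **first;           /* vector of 2**n chain heads, NULL to start */
} chaindict;

static chaindict *chaindict_new(int log2size)
{
    chaindict *mp = malloc(sizeof(chaindict));
    size_t n = (size_t)1 << log2size;
    mp->ma_mask = (long)(n - 1);
    mp->first = calloc(n, sizeof(entry *));   /* memset-to-NULL */
    return mp;
}

/* Push a new entry onto the head of its hash chain. */
static void chaindict_insert(chaindict *mp, long hash, void *key, void *value)
{
    entry *p = malloc(sizeof(entry));
    p->me_hash = hash;
    p->me_key = key;
    p->me_value = value;
    p->next = mp->first[hash & mp->ma_mask];
    mp->first[hash & mp->ma_mask] = p;
}

static entry *chaindict_lookup(chaindict *mp, long hash, void *key)
{
    entry *p;
    for (p = mp->first[hash & mp->ma_mask]; p != NULL; p = p->next)
        if (p->me_key == key)   /* identity check only, in this toy */
            return p;
    return NULL;
}
```

Note how two keys whose hash codes land in the same slot simply share a chain; neither disturbs any other slot.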
Memory use isn't nearly as bad as it may first appear -- to the contrary, it's probably better on average! Assuming a 32-bit box:
Current: tables are normally 1/3 to 2/3 full. If there are N active objects in the table, at 1/3 full the table contains 3N PyDictEntries, and at 2/3 full it contains 1.5N PyDictEntries, for a total of (multiplying by 12 bytes per PyDictEntry) 18N to 36N bytes.
Chaining: assuming tables are still 1/3 to 2/3 full. At 1/3 full there are 3N first pointers and at 2/3 full there are 1.5N first pointers, for a total of 6N to 12N bytes for first pointers. Independent of load factor, 16N bytes are consumed by the larger PyDictEntry structs. Adding, that's 22N to 28N bytes. This relies on pymalloc's tiny wastage when allocating 16-byte chunks (under 1%). The worst case is worse than the current scheme, and the best case is better. The average is probably better.
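The arithmetic in the two cases above can be double-checked mechanically. This little fragment encodes the per-object byte counts (12-byte open-addressing entries, 16-byte chained entries, 4-byte pointers on a 32-bit box); the load factor is passed as a num/den pair of slots per object:

```c
/* Worked check of the byte-per-object figures above (32-bit box). */
enum {
    OPEN_ENTRY = 12,   /* current PyDictEntry: hash, key, value */
    CHAIN_ENTRY = 16,  /* plus the 4-byte next pointer */
    PTR = 4
};

/* slots per object is 3 at 1/3 full (num=3, den=1),
   and 3/2 at 2/3 full (num=3, den=2) */
static int open_bytes_per_obj(int num, int den)
{
    return OPEN_ENTRY * num / den;          /* every slot is an entry */
}

static int chain_bytes_per_obj(int num, int den)
{
    return PTR * num / den + CHAIN_ENTRY;   /* slots are just pointers */
}
```

The open-addressing range works out to 18..36 bytes per object, and the chaining range to 22..28, matching the totals above.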
Note that "full" is a misnomer here. A chained table with 2**i slots can actually hold any number of objects, even if i==0; on average, each hash chain contains N/2**i PyDictEntry structs.
Note that a small load factor is less important with chained resolution than with open addressing, because collisions at different hash codes can't interfere with each other (IOW, an object in slot #i never slows access to an object in the slot #j collision list, whenever i != j; "breathing room" to ease cross-hashcode collision pressure isn't needed; primary collisions are all that exist).
Collision resolution code: Just a list walk. For example, lookdict_string could be, in its entirety:
    static dictentry *
    lookdict_string(dictobject *mp, PyObject *key, register long hash)
    {
        dictentry *p = mp->first[hash & mp->ma_mask];

        if (PyString_CheckExact(key)) {
            for (; p != NULL; p = p->next) {
                if (p->me_key == key ||
                    (p->me_hash == hash &&
                     PyString_Eq(p->me_key, key)))
                    return p;
            }
            return NULL;
        }
        mp->ma_lookup = lookdict;
        return lookdict(mp, key, hash);
    }
Resizing: Probably much faster. The vector of first pointers can be realloc'ed, and will sometimes benefit from the platform malloc extending it in-place. No other memory allocation is needed on a resize. Instead, about half the PyDictEntry structs will need to move to "the other half" of the table (the structs themselves don't get copied or moved; they just get relinked via their next pointers).
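The relink-on-resize idea can be sketched in a few lines. This toy doubles a table of chain heads and redistributes the entries by pointer surgery alone; for simplicity it calloc's a fresh head vector rather than realloc'ing the old one as suggested above, and the entry struct carries only the fields the resize touches:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch of resize-by-relinking: no entry is copied or freed,
   each one is just pushed onto its chain in the doubled table. */
typedef struct entry {
    long me_hash;
    struct entry *next;
} entry;

static entry **grow(entry **first, size_t oldsize)
{
    size_t newsize = 2 * oldsize;
    long newmask = (long)(newsize - 1);
    entry **newfirst = calloc(newsize, sizeof(entry *));
    size_t i;

    for (i = 0; i < oldsize; i++) {
        entry *p = first[i];
        while (p != NULL) {
            entry *next = p->next;          /* save before relinking */
            size_t slot = p->me_hash & newmask;
            p->next = newfirst[slot];       /* push onto the new chain */
            newfirst[slot] = p;
            p = next;
        }
    }
    free(first);
    return newfirst;
}
```

An entry whose hash code has the new high mask bit clear stays in its old slot; one with the bit set moves to "the other half" -- about half of them, on average.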
Copying: Probably slower, due to needing a PyObject_Malloc() call for each key/value pair.
Building a dict up: Probably slower, again due to repeated PyObject_Malloc() calls.
Referencing a dict: Probably a wash, although because the code can be so much simpler compilers may do a better job of optimizing it, and no tests are needed to distinguish among three kinds of states. Out-of-cache dicts are killers either way. Also see next point.
Optimizations: The cool algorithmic thing about chaining is that self-organizing gimmicks (like swap-toward-front (or move-to-front) on reference) are easy to code and run fast (again, the dictentry structs don't move, you just need to fiddle a few next pointers). When collision chains can collide, dynamic table reorganization is so complicated and expensive that nobody has even thought about trying it in Python. When they can't collide, it's simple. Note too that since the memory burden per unused slot falls from 12 to 4 bytes, sparser tables are less painful to contemplate.
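The move-to-front gimmick really is just a few pointer fiddles. A hedged sketch, again with identity-compared toy keys rather than real dict entries:

```c
#include <assert.h>
#include <stddef.h>

/* Toy sketch of move-to-front on a hash chain: on a hit, splice the
   found entry out and relink it as the chain head.  Only next pointers
   change; the entry structs themselves never move in memory. */
typedef struct entry {
    void *me_key;
    struct entry *next;
} entry;

static entry *lookup_mtf(entry **head, void *key)
{
    entry *p = *head, *prev = NULL;

    for (; p != NULL; prev = p, p = p->next) {
        if (p->me_key == key) {
            if (prev != NULL) {         /* not already at the front */
                prev->next = p->next;   /* unlink from mid-chain */
                p->next = *head;        /* relink at the head */
                *head = p;
            }
            return p;
        }
    }
    return NULL;
}
```

A frequently referenced key migrates to the front of its chain, so later probes for it terminate after one comparison.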
Small dicts: There's no gimmick here to favor them.