[Python-Dev] Reducing memory overhead for dictionaries by removing me_hash

Kirat Singh kirat.singh at gmail.com
Sun Apr 23 23:03:37 CEST 2006


Interesting, thanks for the responses. And yeah, I meant 1/3; I always mix up negatives.

I agree that, as you point out, the biggest slowdown would be on classes that define their own hash. However, since classes use instance dicts and this change would reduce the dict size from 96 -> 64 bytes, we could blow 4 bytes to cache the hash on the object.
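Roughly the entry layout change I have in mind (a simplified sketch; the current fields follow the 2.x dictobject.h, the struct names here are renamed so this compiles standalone, and the second layout is hypothetical):

    #include <Python.h>

    /* Current 2.x entry layout (PyDictEntry in dictobject.h): three
     * 4-byte fields on a 32-bit box, 12 bytes per entry. */
    typedef struct {
        long      me_hash;   /* cached hash(me_key) */
        PyObject *me_key;
        PyObject *me_value;
    } entry_current;         /* renamed to avoid clashing with the header */

    /* Proposed layout: 8 bytes per entry, so an 8-slot table shrinks
     * from 96 to 64 bytes. */
    typedef struct {
        PyObject *me_key;
        PyObject *me_value;
    } entry_proposed;        /* hypothetical */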

In fact, PyObject_Hash could 'intern' the result of the hash into a hashvalue member of the class dict. This might be the best of both worlds, since it would only use space for the hash value if it's needed.
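Roughly the kind of caching I mean (a hypothetical helper, not the real PyObject_Hash; it assumes the instance has somewhere to stash a cached hash and uses -1 as the "not computed yet" sentinel):

    #include <Python.h>

    /* Hypothetical caching wrapper around PyObject_Hash().  The cached_hash
     * slot and the -1 sentinel are assumptions for the sake of the sketch;
     * PyObject_Hash() itself returns -1 only on error. */
    static long
    cached_object_hash(PyObject *op, long *cached_hash)
    {
        if (*cached_hash != -1)
            return *cached_hash;           /* already computed once */
        *cached_hash = PyObject_Hash(op);  /* compute and remember it */
        return *cached_hash;
    }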

Oh, and the reason I brought up strings was that one can grab ob_shash from the string object in lookdict_string to avoid even the function call to get the hash for a string, so it's just the same as storing the hash on the entry for strings.
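For strings that could look something like this (a simplified sketch, not the actual lookdict_string; ob_shash is the hash-cache field 2.x string objects already carry):

    #include <Python.h>

    /* Simplified sketch: fetch a string key's hash without an extra
     * function call, the way a string-specialized lookup could if entries
     * no longer stored me_hash.  Not the real lookdict_string(). */
    static long
    string_key_hash(PyObject *key)
    {
        long hash = ((PyStringObject *)key)->ob_shash;
        if (hash == -1)                    /* not cached yet */
            hash = PyObject_Hash(key);     /* computes and caches ob_shash */
        return hash;
    }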

The reason I looked into this to begin with was that my code used up a bunch of memory that was traceable to lots of little objects with instance dicts, so it seemed that if instance dicts took less memory I wouldn't have to go and add __slots__ to a bunch of my classes, or rewrite things as tuples/lists, etc.

thanks! -Kirat

On 4/23/06, Tim Peters <tim.peters at gmail.com> wrote:

[Kirat Singh]
> Hi, this is my first python dev post, so please forgive me if this topic has
> already been discussed.

It's hard to find one that hasn't -- but it's even harder to find the old
discussions ;-)

> It seemed to me that removing me_hash from a dict entry would save 2/3 of
> the space

1/3, right?

> used by dictionaries and also improve alignment of the entries
> since they'd be 8 bytes instead of 12.

How would that help?  On 32-bit boxes, we have 3 4-byte members in
PyDictEntry, and they're all 4-byte aligned.  In what respect related to
alignment is that sub-optimal?

> And sets end up having just 4 byte entries.
>
> I'm guessing that string dicts are the most common (hence the specialized
> lookupdict_string routine),

String dicts were the only kind at first, and their speed is critical because
Python itself makes heavy use of them (e.g., to implement instance and module
namespaces, and keyword arguments).

> and since strings already contain their hash, this would probably mitigate
> the performance impact.

No slowdown in string dicts would be welcome, but since strings already cache
their own hash, they seem unaffected by this.  It's the speed of other key
types that would suffer, and for classes that define their own hash method
that could well be deadly (see Martin's reply for more detail).

> One could also add a hash to Tuples since they are immutable.

A patch to do that was recently rejected.  You can read its comments for some
of the reasons:

    http://www.python.org/sf/1462796

More reasons were given in a python-dev thread about the same thing earlier
this month:

    http://mail.python.org/pipermail/python-dev/2006-April/063275.html

> If this isn't a totally stupid idea, I'd be happy to volunteer to try the
> experiment and run any suggested tests.

I'd be -1 if it slowed dict operations for classes that define their own
hash.  I do a lot of that ;-)

> PS any opinion on making PyStringEq a macro?

Yes: don't bother unless it provably speeds something "important" :-)  It's
kinda messy for a macro otherwise, macros always make debugging harder (can't
step through the source expansion in a debugger w/o a lot of pain), etc.

> inline function would be nice but I hesitate to bring up the C/C++ debate,
> both languages suck in their own special way ;-)

Does the Python source even compile as C++ now?  People have been working
toward that, but my last impression was that it's not there yet.
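To make the cost Tim describes concrete: with me_hash stored in the entry, a collision probe can compare hashes with a single memory read, and a resize can reinsert entries without touching the keys; without it, the stored key's hash would have to be recomputed, which for a class that defines __hash__ in Python means a full Python-level call. A minimal sketch of the two cases (simplified; not code from dictobject.c):

    #include <Python.h>

    /* With the cached hash in the entry, a probe just reads a field. */
    static long
    entry_hash_with_cache(PyDictEntry *ep)
    {
        return ep->me_hash;                /* one memory read */
    }

    /* Without it, the stored key's hash must be recomputed, which may
     * dispatch to a Python-level __hash__ for user-defined classes. */
    static long
    entry_hash_without_cache(PyDictEntry *ep)
    {
        return PyObject_Hash(ep->me_key);
    }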


