[Python-Dev] Tuning Python dicts (original) (raw)
Reid Kleckner rnk at mit.edu
Sat Apr 10 22:32:25 CEST 2010
- Previous message: [Python-Dev] Tuning Python dicts
- Next message: [Python-Dev] Tuning Python dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, Apr 10, 2010 at 2:57 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
Any other advice would also be helpful. I may be missing the point, but ISTM that the assumption of this approach is that there are often collisions in the hash table. I think that assumption is false; at least, I recommend to validate that assumption before proceeding.
It's just an experiment for a class, not something I am (yet) seriously thinking about contributing back to CPython. I think my chances of improving over the current implementation are slim. I do not have that much hubris. :) I would just rather do experimental rather than theoretical work with data structures.
I think you're right about the number of collisions, though. CPython dicts use a pretty low load factor (2/3) to keep collision counts down. One of the major benefits cited in the paper is the ability to maintain performance in the face of higher load factors, so I may be able to bump up the load factor to save memory. This would increase collisions, but then that wouldn't matter, because resolving them would only require looking within two consecutive cache lines.
There are, of course, cases where dicts do show collisions (namely when all keys hash equal), however, I'm uncertain whether the approach would help in that case.
Yes, in fact, hopscotch hashing fails completely as I mentioned at the end of my last message.
Reid
- Previous message: [Python-Dev] Tuning Python dicts
- Next message: [Python-Dev] Tuning Python dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]