[Python-Dev] Tuning Python dicts (original) (raw)

Reid Kleckner rnk at mit.edu
Sat Apr 10 23:35:24 CEST 2010


On Sat, Apr 10, 2010 at 4:40 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

Reid Kleckner <rnk mit.edu> writes:

I think you're right about the number of collisions, though.  CPython dicts use a pretty low load factor (2/3) to keep collision counts down.  One of the major benefits cited in the paper is the ability to maintain performance in the face of higher load factors, so I may be able to bump up the load factor to save memory.  This would increase collisions, but then that wouldn't matter, because resolving them would only require looking within two consecutive cache lines. Why wouldn't it matter? Hash collisions still involve more CPU work, even though if you're not access memory a lot.

So we know for sure that extra CPU work to avoid cache misses is a big win, but it's never clear whether or not any given memory access will be a miss. Because Python's access patterns are rarely random, it may turn out that all of the elements it accesses are in cache and the extra CPU work dominates. If you have a random access pattern over a large dataset, the dictionary will not fit in cache, and saving random memory accesses at the expense of CPU will be a win.

Reid



More information about the Python-Dev mailing list