[Python-Dev] More compact dictionaries with faster iteration (original) (raw)
Andrew Svetlov andrew.svetlov at gmail.com
Tue Dec 18 23:40:49 CET 2012
- Previous message: [Python-Dev] More compact dictionaries with faster iteration
- Next message: [Python-Dev] More compact dictionaries with faster iteration
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Good plan!
On Tue, Dec 18, 2012 at 11:35 PM, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
On Dec 11, 2012, at 1:13 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
On Dec 10, 2012, at 2:48 AM, Christian Heimes <christian at python.org> wrote: On the other hand every lookup and collision checks needs an additional multiplication, addition and pointer dereferencing: entry = entriesbaseaddr + sizeof(PyDictKeyEntry) * idx
Currently, the dict implementation allows alternative lookup functions based on whether the keys are all strings. The choice of lookup function is stored in a function pointer. That lets each lookup use the currently active lookup function without having to make any computations or branches. An indirect function call is technically a branch, as seen from the CPU (and not necessarily a very predictable one, although recent Intel CPUs are said to be quite good at that). FWIW, we already have an indirection to the lookup function. I would piggyback on that, so no new indirections are required. My plan now is to apply the space compaction idea to sets. That code is less complex than dicts, and set operations stand to benefit the most from improved iteration speed. The steps would be: * Create a good set of benchmarks for set operations for both size and speed. * Start with the simplest version of the idea: separate the entries table from the hash table. Keep the hash table at Pyssizet, and pre-allocate the entry array to two-thirds the size of the hash table. This should give about a 25% space savings and speed-up iteration for all the set-to-set operations. * If that all works out, I want to trim the entry table for frozensefs so that the entry table has no over-allocations. This should give a much larger space savings. * Next, I want to experiment with the idea of using char/short/long sizes for the hash table. Since there is already an existing lookup indirection, this can be done with no additional overhead. Small sets will get the most benefit for the space savings and the cache performance for hash lookups should improve nicely (for a hash table of size 32 could fit in a single cache line). At each step, I'll run the benchmarks to make sure the expected speed and space benefits are being achieved. As a side-effect, sets without deletions will retain their insertion order. If this is of concern, it would be no problem to implement Antoine's idea of scrambling the entries during iteration. Raymond P.S. I've gotten a lot of suggestions for improvements to the proof-of-concept code. Thank you for that. The latest version is at: http://code.activestate.com/recipes/578375/ In that code, entries are stored in regular Python lists and inherit their over-allocation characteristics (about 12.5% overallocated for large lists). There are many other possible allocation strategies with their own respective speed/space trade-offs.
Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com
-- Thanks, Andrew Svetlov
- Previous message: [Python-Dev] More compact dictionaries with faster iteration
- Next message: [Python-Dev] More compact dictionaries with faster iteration
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]