[Python-Dev] PyBench DictCreation (was Re: Performance compares) (original) (raw)

Tim Peters tim.one@home.com
Fri, 18 May 2001 15:48:33 -0400

Previous message: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Next message: [Python-Dev] IPv6
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[Neil Schemenauer]

A table of keys and values. Values are looked up by looping over the table comparing each key until the correct one is found (ie. its O(n) where n is the size of the table). For Python, the cost of doing compares probably outweighs the cost of doing the hashing, even for small tables.

I thought about that before. The inlining appeals but the algorithm not much: the dict implementation as is loops over all the table entries too, except that instead of starting with "i = 0" it starts (now) with "i = hash & mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^= poly"; and instead of giving up when "i >= length" it punts when finding an entry with a null value. Incrementing via ++i is certainly cheaper, except that even when small, the hash table usually hits on the first try when the key is present, so usually gets out before incrementing.

Its not clear to me though if it would be a win.

Best guess is not.

Assuming that interned strings are the most common key, a assocation table with four entries would take on average two pointer compares to look up a value.

Actually an average of 2.5 when the key is present and each key is equally likely to be queried, and always 4 when the queried key is not present. The hash table has better expected stats on both counts, but needs 4 unused slots too to achieve that. The savings would be in memory for small dicts more than in time (if at all).

Previous message: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Next message: [Python-Dev] IPv6
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]