[Python-Dev] More compact dictionaries with faster iteration (original) (raw)

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Dec 12 23:06:18 CET 2012


On 12/12/2012 10:31 PM, PJ Eby wrote:

On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:

On 12/12/2012 01:15 AM, Nick Coghlan wrote:

On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov at microsoft.com_ _<mailto:dinov at microsoft.com>> wrote: OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.

Which reminds me, I was going to note that one of the main gains with ordered keyword arguments, is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple where you define the order of iteration in one operation, and set the values in another. So here's a brand new argument against ordered dicts: The existence of perfect hashing schemes. They fundamentally conflict with ordered dicts. If I understand your explanation, then they don't conflict with the type of ordering described in this thread. Raymond's optimization separates the "hash table" part from the "contents" part of a dictionary, and there is no requirement that these two parts be in the same size or the same order.

I don't fully agree.

Perfect hashing already separates "hash table" from "contents" (sort of), and saves the memory in much the same way.

Yes, you can repeat the trick and have 2 levels of indirection, but that then requires an additional table of small ints which is pure overhead present for the sorting; in short, it's no longer an optimization but just overhead for the sortability.

Dag Sverre

Indeed, Raymond's split design lets you re-parameterize the hashing all you want, without perturbing the iteration order at all. You would in fact be able to take a dictionary at any moment, and say, "optimize the 'hash table' part to a non-colliding state based on the current contents", without touching the 'contents' part at all. (One could do this at class creation time on a class dictionary, and just after importing on a module dictionary, for example.)



More information about the Python-Dev mailing list