[Python-Dev] PEP for new dictionary implementation (original) (raw)

Jim J. Jewett jimjjewett at gmail.com
Thu Feb 16 19:24:22 CET 2012

Previous message: [Python-Dev] PEP for new dictionary implementation
Next message: [Python-Dev] PEP for new dictionary implementation
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

PEP author Mark Shannon wrote (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):

... allows ... (the _dict_ attribute of an object) to share keys with other attribute dictionaries of instances of the same class.

Is "the same class" a deliberate restriction, or just a convenience of implementation? I have often created subclasses (or even families of subclasses) where instances (as opposed to the type) aren't likely to have additional attributes. These would benefit from key-sharing across classes, but I grant that it is a minority use case that isn't worth optimizing if it complicates the implementation.

By separating the keys (and hashes) from the values it is possible to share the keys between multiple dictionaries and improve memory use.

Have you timed not storing the hash (in the dict) at all, at least for (unicode) str-only dicts? Going to the string for its own cached hash breaks locality a bit more, but saves 1/3 of the memory for combined tables, and may make a big difference for classes that have relatively few instances.

Reduction in memory use is directly related to the number of dictionaries with shared keys in existence at any time. These dictionaries are typically half the size of the current dictionary implementation.

How do you measure that? The limit for huge N across huge numbers of dicts should be 1/3 (because both hashes and keys are shared); I assume that gets swamped by object overhead in typical small dicts.

If a table is split the values in the keys table are ignored, instead the values are held in a separate array.

If they're just dead weight, then why not use them to hold indices into the array, so that values arrays only have to be as long as the number of keys, rather than rounding them up to a large-enough power-of-two? (On average, this should save half the slots.)

A combined-table dictionary never becomes a split-table dictionary.

I thought it did (at least temporarily) as part of resizing; are you saying that it will be re-split by the time another thread is allowed to see it, so that it is never observed as combined?

Given that this optimization is limited to class instances, I think there should be some explanation of why you didn't just automatically add slots for each variable assigned (by hard-coded name) within a method; the keys would still be stored on the type, and array storage could still be used for the values; the dict slot could initially be a NULL pointer, and instance dicts could be added exactly when they were needed, covering only the oddball keys.

I would reword (or at least reformat) the Cons section; at the moment, it looks like there are four separate objections, and seems to be a bit dismissive towards backwards copmatibility. Perhaps something like:

While this PEP does not change any documented APIs or invariants, it does break some de facto invariants.

C extension modules may be relying on the current physical layout of a dictionary. That said, extensions which rely on internals may already need to be recompiled with each feature release; there are already changes planned for both Unicode (for efficiency) and dicts (for security) that would require authors of these extensions to at least review their code.

Because iteration (and repr) order can depend on the order in which keys are inserted, it will be possible to construct instances that iterate in a different order than they would under the current implementation. Note, however, that this will happen very rarely in code which does not deliberately trigger the differences, and that test cases which rely on a particular iteration order will already need to be corrected in order to take advantage of the security enhancements being discussed under hash randomization, or for use with Jython and PyPy.

-jJ

If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

Previous message: [Python-Dev] PEP for new dictionary implementation
Next message: [Python-Dev] PEP for new dictionary implementation
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list