[Python-Dev] PEP for new dictionary implementation
Jim Jewett jimjjewett at gmail.com
Fri Feb 17 04:32:51 CET 2012
On Thu, Feb 16, 2012 at 4:34 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> On 16.02.2012 19:24, Jim J. Jewett wrote:
>> PEP author Mark Shannon wrote (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):
>>> ... allows ... (the __dict__ attribute of an object) to share keys with other attribute dictionaries of instances of the same class.
>> Is "the same class" a deliberate restriction, or just a convenience of implementation?
> It's about the implementation: the class keeps a pointer to the key set. A subclass has a separate pointer for that.
I would prefer to see that reason in the PEP; after a few years, I have trouble finding email, even when I remember reading the conversation.
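For concreteness, a minimal sketch of the layout Martin describes -- hypothetical struct and field names, not the actual CPython ones -- showing why the sharing naturally stops at class boundaries:

    /* Hypothetical sketch of per-class key sharing; names are illustrative,
     * not the real CPython identifiers.  Every instance dict of a class
     * points at the key table cached on that class; a subclass caches a
     * separate table of its own. */
    #include <stdio.h>

    typedef struct {
        int size;              /* number of key slots                   */
        const char *keys[8];   /* interned attribute names              */
    } shared_keys;

    typedef struct {
        shared_keys *keys;     /* borrowed from the class               */
        void *values[8];       /* per-instance values, parallel to keys */
    } split_dict;

    typedef struct {
        const char *name;
        shared_keys *cached_keys;   /* the per-class pointer Martin means */
    } klass;

    int main(void) {
        shared_keys base_keys = { 2, { "x", "y" } };
        klass base = { "Base", &base_keys };

        /* Two instances of Base share one key table... */
        split_dict a = { base.cached_keys, { 0 } };
        split_dict b = { base.cached_keys, { 0 } };

        /* ...but a subclass starts from a key table of its own. */
        shared_keys sub_keys = { 2, { "x", "y" } };
        klass sub = { "Sub", &sub_keys };
        split_dict c = { sub.cached_keys, { 0 } };

        printf("a and b share keys: %d\n", a.keys == b.keys);   /* 1 */
        printf("a and c share keys: %d\n", a.keys == c.keys);   /* 0 */
        return 0;
    }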
>> Have you timed not storing the hash (in the dict) at all, at least for (unicode) str-only dicts? Going to the string for its own cached hash breaks locality a bit more, but saves 1/3 of the memory for combined tables, and may make a big difference for classes that have relatively few instances.
> I'd be in favor of that, but it is actually an unrelated change: whether or not you share key sets is unrelated to whether or not str-only dicts drop the cached hash.
Except that the biggest arguments against it are that it breaks cache locality, and it changes the dictentry struct -- which this patch already does anyway.
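For what it's worth, the 1/3 figure is just the cached hash measured against a (hash, key, value) entry. A quick back-of-envelope check, using illustrative struct names rather than the real dictentry definition:

    /* Rough check of the "saves 1/3 of a combined-table entry" figure:
     * dropping the cached hash turns a (hash, key, value) slot into a
     * (key, value) slot.  Illustrative structs, not the real dictentry. */
    #include <stdio.h>

    typedef struct { size_t hash; void *key; void *value; } entry_with_hash;
    typedef struct { void *key; void *value; } entry_without_hash;

    int main(void) {
        size_t with = sizeof(entry_with_hash);
        size_t without = sizeof(entry_without_hash);
        printf("with cached hash:    %zu bytes per slot\n", with);
        printf("without cached hash: %zu bytes per slot\n", without);
        printf("saving: %.0f%%\n", 100.0 * (with - without) / with);
        return 0;   /* ~33% on a typical 64-bit build */
    }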
> Given a dict, it may be tricky to determine whether or not it is str-only, i.e. what layout to use.
Isn't that exactly the same determination needed when deciding whether or not to use lookdict_unicode? (It would make the switch to the more general lookdict more expensive, as that would involve a new allocation.)
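The existing determination, as I understand it, is a per-dict lookup-function pointer that starts on the str-only fast path and is switched, one way, to the general routine when that assumption first breaks. A simplified sketch, with hypothetical types and signatures rather than the real lookdict ones:

    /* Simplified sketch of the "switch the lookup function" pattern:
     * assume str-only keys until a non-str key shows up, then downgrade
     * permanently to the general routine.  Hypothetical types/signatures. */
    #include <stdio.h>

    struct dict;
    typedef int (*lookup_fn)(struct dict *d, const char *key);

    struct dict {
        lookup_fn lookup;   /* str-only fast path by default */
    };

    static int lookdict_general(struct dict *d, const char *key) {
        (void)d;
        printf("general lookup for %s\n", key ? key : "<non-str>");
        return 0;
    }

    static int lookdict_str_only(struct dict *d, const char *key) {
        if (key == NULL) {                  /* stand-in for "non-str key" */
            d->lookup = lookdict_general;   /* one-way downgrade          */
            return d->lookup(d, key);
        }
        printf("str-only fast path for %s\n", key);
        return 0;
    }

    int main(void) {
        struct dict d = { lookdict_str_only };
        d.lookup(&d, "spam");   /* fast path */
        d.lookup(&d, NULL);     /* triggers the downgrade */
        d.lookup(&d, "eggs");   /* now uses the general routine */
        return 0;
    }

Swapping only the function pointer is cheap; swapping the table layout at the same point is what would force the new allocation mentioned above.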
>>> Reduction in memory use is directly related to the number of dictionaries with shared keys in existence at any time. These dictionaries are typically half the size of the current dictionary implementation.
>> How do you measure that? The limit for huge N across huge numbers of dicts should be 1/3 (because both hashes and keys are shared); I assume that gets swamped by object overhead in typical small dicts.
> It's more difficult than that. He also drops the smalltable (which I think is a good idea), so accounting how this all plays together is tricky.
All the more reason to explain in the PEP how he measured or approximated it.
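Even a back-of-envelope model would help. Ignoring object headers, the smalltable, and resizing policy (which is exactly the accounting that needs spelling out), the per-instance arithmetic is roughly:

    /* Back-of-envelope model behind the "limit is 1/3" estimate:
     * a combined table costs (hash, key, value) = 3 words per slot in
     * every instance; a split table costs (hash, key) = 2 words per slot
     * once per class, plus 1 word per slot per instance.  Headers, the
     * smalltable, and resizing policy are deliberately ignored. */
    #include <stdio.h>

    int main(void) {
        const double combined_per_instance = 3.0;  /* words per slot           */
        const double shared_once           = 2.0;  /* words per slot, per class */
        const double values_per_instance   = 1.0;  /* words per slot           */
        long counts[] = { 1, 2, 10, 1000 };

        for (int i = 0; i < 4; i++) {
            long n = counts[i];
            double split = shared_once + n * values_per_instance;
            double combined = n * combined_per_instance;
            printf("N = %5ld instances: split/combined = %.2f\n",
                   n, split / combined);
        }
        return 0;   /* the ratio tends to 1/3 as N grows */
    }

That is where the 1/3 limit comes from; "typically half" presumably reflects small N plus the per-object overhead this model leaves out.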
>>> If a table is split the values in the keys table are ignored, instead the values are held in a separate array.
>> If they're just dead weight, then why not use them to hold indices into the array, so that values arrays only have to be as long as the number of keys, rather than rounding them up to a large-enough power-of-two? (On average, this should save half the slots.)
> Good idea. However, how do you track per-dict how large the table is?
Why would you want to?
The per-instance array needs to be at least as large as the highest index used by any key for which it has a value; if the keys table gets far larger (or even shrinks), that doesn't really matter to the instance. What does matter to the instance is getting a value of its own for a new (to it) key -- and then the keys table can tell it which index to use, which in turn tells it whether or not it needs to grow the array.
Are you thinking of len(o.__dict__), which will indeed be a bit slower? That will happen with split dicts and potentially missing values, regardless of how much memory is set aside (or not) for the missing values.
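Roughly, the scheme described above looks like this (hypothetical names throughout; a sketch, not anything in the PEP):

    /* Sketch of the proposed indexing scheme: the shared key table reuses
     * its otherwise-dead value slots to record, per key, an index into a
     * compact per-instance values array, and an instance grows that array
     * only when it first stores a value for a key that is new to it.
     * All names are hypothetical. */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_KEYS 8

    typedef struct {
        int nkeys;
        const char *names[MAX_KEYS];
        int value_index[MAX_KEYS];   /* stored in the otherwise-dead slot */
    } shared_keys;

    typedef struct {
        shared_keys *keys;
        int capacity;                /* only as large as it needs to be   */
        void **values;
    } split_dict;

    /* Find or add a key in the shared table; return its value index. */
    static int key_index(shared_keys *sk, const char *name) {
        for (int i = 0; i < sk->nkeys; i++)
            if (strcmp(sk->names[i], name) == 0)
                return sk->value_index[i];
        assert(sk->nkeys < MAX_KEYS);
        sk->names[sk->nkeys] = name;
        sk->value_index[sk->nkeys] = sk->nkeys;   /* next compact slot */
        return sk->value_index[sk->nkeys++];
    }

    static void split_set(split_dict *d, const char *name, void *value) {
        int idx = key_index(d->keys, name);
        if (idx >= d->capacity) {                 /* grow only when needed */
            int newcap = idx + 1;
            d->values = realloc(d->values, newcap * sizeof(void *));
            for (int i = d->capacity; i < newcap; i++)
                d->values[i] = NULL;
            d->capacity = newcap;
        }
        d->values[idx] = value;
    }

    int main(void) {
        shared_keys sk = { 0, { 0 }, { 0 } };
        split_dict d = { &sk, 0, NULL };
        int x = 1, y = 2;
        split_set(&d, "x", &x);
        split_set(&d, "y", &y);
        printf("values array holds %d slots (one per key actually set)\n",
               d.capacity);
        free(d.values);
        return 0;
    }

Under such a scheme len(o.__dict__) would have to count non-NULL value slots, but that is true of any split layout with missing values, which is the point above.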
-jJ