[Python-Dev] Hashing proposal: change only string-only dicts (original) (raw)

"Martin v. Löwis" martin at v.loewis.de
Wed Jan 18 18:55:31 CET 2012

Previous message: [Python-Dev] Hashing proposal: change only string-only dicts
Next message: [Python-Dev] Hashing proposal: change only string-only dicts
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Am 18.01.2012 17:01, schrieb PJ Eby:

On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" <martin at v.loewis.de_ _<mailto:martin at v.loewis.de>> wrote:

Am 17.01.2012 22:26, schrieb Antoine Pitrou: > Only 2 bits are used in obsstate, meaning 30 are left. These 30 bits > could cache a "hash perturbation" computed from the string and the > random bits: > > - hash() would use obshash > - dictlookup() would use ((obshash * 1000003) ^ (obsstate & ~3)) > > This way, you cache almost all computations, adding only a computation > and a couple logical ops when looking up a string in a dict. That's a good idea. For Unicode, it might be best to add another slot into the object, even though this increases the object size.

Wouldn't that break the ABI in 2.x?

I was thinking about adding the field at the end, so I thought it shouldn't. However, if somebody inherits from PyUnicodeObject, it still might - so my new proposal is to add the extra hash into the str block, either at str[-1], or after the terminating 0. This would cause an average increase of four bytes of the storage (0 bytes in 50% of the cases, 8 bytes because of padding in the other 50%).

What do you think?

Regards, Martin

Previous message: [Python-Dev] Hashing proposal: change only string-only dicts
Next message: [Python-Dev] Hashing proposal: change only string-only dicts
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list