[Python-Dev] Hashing proposal: change only string-only dicts

Gregory P. Smith greg at krypto.org
Wed Jan 18 06:58:51 CET 2012


On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

I'd like to propose a different approach to seeding the string hashes: only do so for dictionaries involving only strings, and leave the tp_hash slot of strings unchanged.

Each string would get two hashes: the "public" hash, which is constant across runs and bugfix releases, and the dict-hash, which is only used by the dictionary implementation, and only if all keys to the dict are strings. In order to allow caching of the hash, all dicts should use the same hash (if caching wasn't necessary, each dict could use its own seed). There are several variants of that approach wrt. caching of the hash:

1. add an additional field to all string objects, to cache the second hash value.

yuck, our objects are large enough as it is.
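A toy Python model of the two-hash idea (variant 1, with the extra cached field) may make it concrete. This is a hypothetical illustration, not CPython's implementation: `StringObject`, `DICT_HASH_SEED`, `_digest64` and `hash_used_by_dict` are invented names, and `hashlib.blake2b` merely stands in for whatever hash functions a real string object would use.

```python
import hashlib
import os

# Hypothetical process-wide seed for the dict-hash. One shared seed lets every
# dict reuse the cached value, which is why the proposal uses a single seed.
DICT_HASH_SEED = os.urandom(16)


def _digest64(data: bytes, key: bytes = b"") -> int:
    # blake2b is only a stand-in for the real string hash function.
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


class StringObject:
    """Toy model of a string object carrying two hashes (not CPython's layout)."""

    __slots__ = ("value", "_public_hash", "_dict_hash")

    def __init__(self, value: str) -> None:
        self.value = value
        self._public_hash = None  # cached "public" hash
        self._dict_hash = None    # the extra per-object field from variant 1

    def public_hash(self) -> int:
        # Unseeded, so it is the same in every run.
        if self._public_hash is None:
            self._public_hash = _digest64(self.value.encode("utf-8"))
        return self._public_hash

    def dict_hash(self) -> int:
        # Seeded per process, so it differs between runs; computed once, then cached.
        if self._dict_hash is None:
            self._dict_hash = _digest64(self.value.encode("utf-8"), key=DICT_HASH_SEED)
        return self._dict_hash


def hash_used_by_dict(key: StringObject, all_keys_are_strings: bool) -> int:
    """Pick the hash a dict would use under the proposal."""
    return key.dict_hash() if all_keys_are_strings else key.public_hash()


s = StringObject("spam")
print(hash_used_by_dict(s, all_keys_are_strings=True))   # seeded; varies per run
print(hash_used_by_dict(s, all_keys_are_strings=False))  # stable across runs
```

The `_dict_hash` slot is the per-object cost objected to above; variant a) drops it and instead lets the single hash vary between runs.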

a) variant: in 3.3, drop the extra field, and declare that hashes may change across runs

+1 Absolutely. We can and should make 3.3 change hashes across runs (behavior that can be disabled via a flag or environment variable).
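This is essentially what later releases shipped: since Python 3.3, str and bytes hashes are randomized per process by default, and setting the `PYTHONHASHSEED` environment variable to a fixed integer (0 disables randomization) makes them reproducible. A minimal sketch that observes this by hashing the same string in fresh interpreters; the helper name `str_hash_in_fresh_interpreter` is an assumption for illustration.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # A fixed seed reproduces the same hash in every run...
    print(str_hash_in_fresh_interpreter("spam", "0") == str_hash_in_fresh_interpreter("spam", "0"))
    # ...while different seeds normally give different hashes.
    print(str_hash_in_fresh_interpreter("spam", "1") == str_hash_in_fresh_interpreter("spam", "2"))
```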

I think the issue of doctests and such breaking even in 2.7 due to hash order changes is being overblown. Code like that already needs to fix its tests at least once if it wants them to pass on both 32-bit and 64-bit Python VMs (they have different hashes). Do we have any measure of how big a deal this will be before going too far here?
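Tests of the kind discussed here usually break because their expected output embeds the iteration order of a hash-based container. One common remedy is to sort before comparing, so the result no longer depends on hash values, build width, or hash seed. A minimal doctest-style sketch; `distinct_words` is a hypothetical function. (Since Python 3.7, dicts preserve insertion order, so today this mostly affects sets.)

```python
def distinct_words(text: str) -> list:
    """Return the distinct words of text in sorted order.

    Returning or printing the set itself would make the expected output
    depend on string hash values; sorting removes that dependency.

    >>> distinct_words("spam eggs spam")
    ['eggs', 'spam']
    """
    # set() discards duplicates; sorted() gives an order that does not
    # depend on hash values, 32-/64-bit builds, or the hash seed.
    return sorted(set(text.split()))


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```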

-gps


