I'd like to propose a different approach to seeding the string hashes:
only do so for dictionaries involving only strings, and leave the
tp_hash slot of strings unchanged.
Each string would get two hashes: the "public" hash, which is constant
across runs and bugfix releases, and the dict-hash, which is only used
by the dictionary implementation, and only if all keys in the dict are
strings. In order to allow caching of the hash, all dicts should use
the same hash (if caching wasn't necessary, each dict could use its own
seed).
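Here is a minimal Python sketch of the two-hash idea, for illustration only. All names in it (`public_hash`, `dict_hash`, `key_hash`, `_DICT_SEED`) are hypothetical. 64-bit FNV-1a stands in for CPython's real string hash, and seeding by XOR-ing the FNV offset basis is a toy scheme, not a vetted keyed hash. The sketch shows that the public hash is stable across runs, while a dict containing only string keys would use the seeded value instead.

```python
import os

_MASK64 = (1 << 64) - 1
_FNV_OFFSET = 0xcbf29ce484222325
_FNV_PRIME = 0x100000001b3

# Hypothetical process-wide seed, drawn once at startup and shared by all dicts.
_DICT_SEED = int.from_bytes(os.urandom(8), "little")


def _fnv1a(data: bytes, start: int) -> int:
    """64-bit FNV-1a over `data`, starting from `start` instead of the fixed offset."""
    h = start
    for byte in data:
        h = ((h ^ byte) * _FNV_PRIME) & _MASK64
    return h


def public_hash(s: str) -> int:
    """Unseeded hash: constant across runs (models the unchanged tp_hash)."""
    return _fnv1a(s.encode("utf-8"), _FNV_OFFSET)


def dict_hash(s: str, seed: int = _DICT_SEED) -> int:
    """Seeded hash: only the dict implementation would ever look at this."""
    return _fnv1a(s.encode("utf-8"), (_FNV_OFFSET ^ seed) & _MASK64)


def key_hash(key, all_keys_are_str: bool) -> int:
    """Pick the hash a dict would use for `key` under the proposal."""
    if all_keys_are_str:
        return dict_hash(key)
    if isinstance(key, str):
        return public_hash(key)  # mixed-key dicts see the public string hash
    return hash(key)             # other types keep their ordinary hash
```

A dict would decide once, for example with `all(isinstance(k, str) for k in keys)`, and fall back to the public hashes as soon as a non-string key is inserted. A real implementation would have to rehash the table when that switch happens.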
There are several variants of that approach wrt. caching of the hash:
1. add an additional field to all string objects, to cache the second
   hash value.
   a) variant: in 3.3, drop the extra field, and declare that hashes
      may change across runs
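The following is a rough model of variant 1, written as a Python class that plays the role of a string object with one extra cache slot. It is hypothetical: `CachedStr`, its slot names, and the FNV-1a stand-in hash are illustrative assumptions, not CPython internals. Caching the dict-hash in a single per-string field only works because every dict shares the same seed, which is the constraint described earlier.

```python
import os

_MASK64 = (1 << 64) - 1
_FNV_OFFSET = 0xcbf29ce484222325
_FNV_PRIME = 0x100000001b3
# Hypothetical process-wide seed shared by all dicts, so one cached value suffices.
_DICT_SEED = int.from_bytes(os.urandom(8), "little")


def _fnv1a(data: bytes, start: int) -> int:
    """64-bit FNV-1a stand-in for the real string hash."""
    h = start
    for byte in data:
        h = ((h ^ byte) * _FNV_PRIME) & _MASK64
    return h


class CachedStr:
    """Toy string object: `_public` models the existing cached hash field,
    `_dict` models the additional field proposed in variant 1."""

    __slots__ = ("value", "_public", "_dict")

    def __init__(self, value: str):
        self.value = value
        self._public = None  # filled lazily by public_hash()
        self._dict = None    # filled lazily by dict_hash()

    def public_hash(self) -> int:
        if self._public is None:
            self._public = _fnv1a(self.value.encode("utf-8"), _FNV_OFFSET)
        return self._public

    def dict_hash(self) -> int:
        # Valid only because every dict uses _DICT_SEED; per-dict seeds
        # would need one cache slot per seed, which is not possible.
        if self._dict is None:
            start = (_FNV_OFFSET ^ _DICT_SEED) & _MASK64
            self._dict = _fnv1a(self.value.encode("utf-8"), start)
        return self._dict
```

Under variant (a), the two slots collapse back into one: the seeded value becomes the only hash, and the extra per-object memory goes away.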
I think the issue of doctests and such breaking even in 2.7 due to hash order changes is being overblown. Code like that already needs to fix its tests at least once if the tests are to pass on both 32-bit and 64-bit Python VMs (they have different hashes). Do we have _any_ measure of how big a deal this will be before going too far here?