
On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

I'd like to propose a different approach to seeding the string hashes:

only do so for dictionaries involving only strings, and leave the

tp_hash slot of strings unchanged.

Each string would get two hashes: the "public" hash, which is constant

across runs and bugfix releases, and the dict-hash, which is only used

by the dictionary implementation, and only if all keys to the dict are

strings. In order to allow caching of the hash, all dicts should use

the same hash (if caching wasn't necessary, each dict could use its own

seed).
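To make the two-hash idea concrete, here is a minimal sketch in Python rather than C. It is only an illustration under stated assumptions: FNV-1a stands in for the real string hash (it is not CPython's algorithm), and the names `public_hash`, `dict_hash`, `lookup_hash` and `_DICT_SEED` are hypothetical.

```python
import os

_MASK = (1 << 64) - 1
_FNV_OFFSET = 0xcbf29ce484222325
_FNV_PRIME = 0x100000001b3

# Hypothetical process-wide seed, shared by all dicts so the result could be cached.
_DICT_SEED = int.from_bytes(os.urandom(8), "little")


def _fnv1a(data, start):
    """FNV-1a over bytes, used here only as a stand-in hash function."""
    h = start
    for byte in data:
        h ^= byte
        h = (h * _FNV_PRIME) & _MASK
    return h


def public_hash(s):
    """Unseeded hash: the same value on every run."""
    return _fnv1a(s.encode("utf-8"), _FNV_OFFSET)


def dict_hash(s, seed=_DICT_SEED):
    """Seeded hash: differs between runs because the seed does."""
    return _fnv1a(s.encode("utf-8"), _FNV_OFFSET ^ seed)


def lookup_hash(key, all_keys_are_strings):
    """Pick the hash a dict would use for this key (assumed policy)."""
    if isinstance(key, str):
        if all_keys_are_strings:
            return dict_hash(key)
        return public_hash(key)
    return hash(key)  # non-string keys keep their ordinary hash
```

In this sketch a dict holding only string keys would index by `dict_hash`, while anything that asks for the string's hash directly still sees the stable `public_hash`.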

There are several variants of that approach wrt. caching of the hash

1. add an additional field to all string objects, to cache the second

   hash value.

yuck, our objects are large enough as it is.

   a) variant: in 3.3, drop the extra field, and declare that hashes

   may change across runs

+1 Absolutely. We can and should make 3.3 change hashes across runs (behavior that can be disabled via a flag or environment variable).

I think the issue of doctests and such breaking even in 2.7 due to hash order changes is being overblown. Code like that has already needed to fix its tests at least once if it wants them to pass on both 32-bit and 64-bit Python VMs (they have different hashes). Do we have _any_ measure of how big a deal this will be before going too far here?
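For illustration, the usual fix for such tests is to stop depending on iteration order at all. A small sketch, assuming a test that prints a set of strings (the helper name `render_names` is hypothetical):

```python
def render_names(names):
    """Return a stable, comma-separated listing of a set of strings."""
    # Sorting removes any dependence on the hash-determined iteration order.
    return ", ".join(sorted(names))


print(render_names({"spam", "eggs", "ham"}))  # prints "eggs, ham, spam"
```

A doctest written against this output passes regardless of word size or hash seed, whereas one written against the repr of the set does not.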

-gps