On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
> Hash functions are already unstable across Python versions. Making
> them unstable across interpreter processes (multiprocessing doesn't
> share dicts, right?) doesn't sound like a big additional problem.
> Users who want a distributed hash table will need to pull their own
> hash function out of hashlib or re-implement a non-cryptographic hash
> instead of using the built-in one, but they probably need to do that
> already to allow themselves to upgrade Python.
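To make the quoted point concrete: a version- and process-stable hash pulled out of hashlib might look like the sketch below (the function name is my own, not anything proposed in the thread).

```python
import hashlib

def stable_hash(key: str) -> int:
    """Hash that is stable across Python versions and processes,
    suitable for e.g. a distributed hash table.  Slower than the
    built-in hash(), but its value never depends on the interpreter."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Fold the first 8 bytes of the digest into an int.
    return int.from_bytes(digest[:8], "big")
```

Unlike the built-in `hash()`, two processes (or two Python versions) always agree on `stable_hash("spam")`.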
Here's an idea.  Suppose we add a sys.hash_seed or some such, that's settable to an int, and defaults to whatever we're using now.  Then programs that want a fix can just set it to a random number, and on Python versions that support it, it takes effect.  Everywhere else it's a silent no-op.
Downside: sys has to have slots for this to work; does sys actually have slots?  My memory's hazy on that.  I guess actually it'd have to be sys.set_hash_seed().  But same basic idea.
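A rough sketch of what a set_hash_seed() plus a seeded string hash could look like, purely as an illustration (the module-level seed, the function names, and the choice of FNV-1a are my assumptions, not anything in CPython):

```python
_hash_seed = 0  # default 0: behave like the current, unseeded hash

def set_hash_seed(seed: int) -> None:
    """Set a process-wide seed mixed into every string hash."""
    global _hash_seed
    _hash_seed = seed

def seeded_str_hash(s: str) -> int:
    """64-bit FNV-1a, with the process-wide seed XORed into the
    offset basis so different seeds give different hash functions."""
    h = (14695981039346656037 ^ _hash_seed) & 0xFFFFFFFFFFFFFFFF
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 1099511628211) & 0xFFFFFFFFFFFFFFFF
    return h
```

With seed 0 every process computes the same hashes, as today; a program that wants the fix calls set_hash_seed() once at startup with a random value.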
Anyway, this would make fixing the problem *possible*, while still pushing off the hard decisions to the app/framework developers.  ;-)
(Downside: every hash operation includes one extra memory access, but strings only compute their hash once anyway.)
Given that changing dict won't help, and changing the default hash is a non-starter, an option to set the seed is probably the way to go.  (Maybe with an environment variable and/or command line option so users can work around old code.)
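The environment-variable workaround for old code might be as simple as the following, read once at interpreter startup (the variable name PYHASHSEED and the "random" convention are made up for this sketch):

```python
import os
import random

def read_hash_seed() -> int:
    """Read a hash seed from a hypothetical PYHASHSEED environment
    variable: unset means 0 (today's unseeded behaviour), "random"
    means pick a fresh seed per process, anything else is an int."""
    value = os.environ.get("PYHASHSEED")
    if value is None:
        return 0
    if value == "random":
        return random.getrandbits(64)
    return int(value)
```

That way an admin can fix a vulnerable app with `PYHASHSEED=random python app.py`, without touching its source.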