[Python-3000] Performance Notes - new hash algorithm (original) (raw)

Guido van Rossum guido at python.org
Fri Sep 7 22:53:45 CEST 2007


I'd like Tim Peters's input on this before we change it. I seem to recall that there's an aspect of non-randomness to the existing hash function that's important when you hash many closely related strings, e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been so long that I may misremember this, and perhaps it was related to the dictionary implementation.

In any case we need to see the code as a patch, of course.

On 9/7/07, Gregory P. Smith <greg at krypto.org> wrote:

On 9/4/07, Thomas Hunger <hto at arcor.de> wrote: > > Hello, > > I don't know much about python internals, so the following might be > bogus: > > I replaced unicodehash and stringhash with the hash function from > here: http://www.azillionmonkeys.com/qed/hash.html. > > Then I ran the following micro-benchmark : > > $ time ./python bench.py > > where bech.py is: > > f = dict((line, nr) for nr, line > in enumerate(open('/usr/share/dict/words', > encoding='latin1').readlines())) > > Python3k original hash: real 0m2.210s > new hash: real 0m1.842s > > So maybe this is an interesting hash function? > > Tom

Sounds like a great idea to me. Can you submit it as a patch? We should run some more realistic perf tests and profiles but I imagine the impact will only be good. -gps


Python-3000 mailing list Python-3000 at python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-3000 mailing list