[Python-3000] Performance Notes - new hash algorithm

Larry Hastings larry at hastings.org
Sun Sep 9 04:24:47 CEST 2007

If the Python community is just noticing the Hsieh hash, that implies that the Bob Jenkins hashes are probably unknown as well. Behold:

    http://burtleburtle.net/bob/hash/doobs.html

To save you a little head-scratching, the functions you want to play with are hashlittle()/hashlittle2() in "lookup3.c":

    http://burtleburtle.net/bob/c/lookup3.c

hashlittle() returns a 32-bit hash; hashlittle2() returns two 32-bit hashes of the same input (in effect a 64-bit hash). The "little" implies that the function is better on little-endian machines. (There is a hashbig() but no hashbig2(); that is left as an exercise for the reader.)
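For anyone who wants a single 64-bit value out of the two 32-bit results, here is a minimal sketch of the combining step, written in Python for illustration. The names combine_hash_pair, split_hash_pair, pc and pb are my own, the sample values are made up rather than real hashlittle2() output, and which word goes in the high half is an assumption, not something the hash requires.

```python
MASK32 = 0xFFFFFFFF  # largest value that fits in 32 bits


def combine_hash_pair(pc, pb):
    """Pack two 32-bit hash words into one 64-bit integer.

    Assumption: pb becomes the high half and pc the low half.
    """
    if not (0 <= pc <= MASK32 and 0 <= pb <= MASK32):
        raise ValueError("both halves must fit in 32 bits")
    return (pb << 32) | pc


def split_hash_pair(h64):
    """Inverse of combine_hash_pair: return (pc, pb)."""
    return h64 & MASK32, (h64 >> 32) & MASK32


# Made-up sample words, not real hash output.
sample = combine_hash_pair(0x9ABCDEF0, 0x12345678)
print(hex(sample))  # prints 0x123456789abcdef0
```

Since the two words are computed from the same input in one pass, packing them this way is what "in effect a 64-bit hash" amounts to in practice.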

In our testing (at Facebook, for memcached), hashlittle2() was faster than the Hsieh hash. That testing was done a year ago, before I joined, so I don't have numbers for you.

One goal of Jenkins's hashes is uniform distribution, so these functions presumably lack the serendipitous "similar inputs hash to similar values" behavior of Python's current hash function. But why is that a feature? (Not that I doubt Tim Peters!)
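To make the "similar inputs hash to similar values" behavior concrete, here is a small sketch. It assumes CPython, where small non-negative integers hash to themselves, and a power-of-two table indexed by masking the low bits, which is a simplification of what dict really does. blake2b from hashlib stands in for a well-mixed hash purely for illustration; it is not a Jenkins hash, and slot_for and mixed_hash are hypothetical names.

```python
import hashlib

TABLE_SIZE = 8  # power of two, so masking the low bits selects a slot


def slot_for(h, table_size=TABLE_SIZE):
    """Map a hash value to a slot by keeping its low bits (simplified)."""
    return h & (table_size - 1)


def mixed_hash(n):
    """Stand-in for a uniformly mixing 32-bit hash (blake2b, NOT Jenkins)."""
    data = n.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=4).digest()
    return int.from_bytes(digest, "little")


keys = range(TABLE_SIZE)  # consecutive small ints: "similar" inputs
builtin_slots = [slot_for(hash(k)) for k in keys]
mixed_slots = [slot_for(mixed_hash(k)) for k in keys]

print("built-in hash slots:", builtin_slots)
print("mixed hash slots:   ", mixed_slots)
print("distinct slots:", len(set(builtin_slots)), "vs", len(set(mixed_slots)))
```

With the built-in hash, the eight consecutive keys land in eight distinct slots, whereas a well-mixed hash spreads them pseudo-randomly and can put several in the same slot. That may be one reading of why the behavior counts as a feature for dict-style tables, though I offer it as an illustration rather than an answer.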

Oh, and, all the Jenkins code is public domain.

Cheers,

/larry/
