[Python-Dev] String hash function multiplier (original) (raw)

Bob Ippolito bob at redivi.com
Tue Apr 13 21:30:28 EDT 2004


On Apr 13, 2004, at 9:09 PM, Jeff Epler wrote:

With -O2 -mcpu=i686 or newer, gcc uses "imul" for both 100003 and 65599, rather than shifts and adds.

There may be a few people who care about some other processor, but I wouldn't listen to them. (the only non-x86 CPU I program for on a weekly basis doesn't have hardware multiply, but it's also much too small for Python) The current value goes back a long way: http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Objects/ stringobject.c#rev2.31 ... all the way back to when Python did string haching instead of hashing. Other than some abstract beauty to 65599, are there some other practical advantages I'm missing?

It's not expected that GCC optimize an integer constant into shifts on
its own. Anyways, a practical advantage is that with a sane
instruction set, like PPC, it saves you a memory access or some
instructions (depending on the compiler I guess). Both 100003 and
65599 are too big to be immediate values in a PPC instruction, but the
shift constants are not.

I guess the real question for Raymond is, does it really make a
measurable difference? And what effect does it have on pickled dicts
(or other such hash-using data structures), if any?

-bob



More information about the Python-Dev mailing list