[Python-Dev] String hash function multiplier (original) (raw)

Raymond Hettinger python at rcn.com
Tue Apr 13 22:16:16 EDT 2004


[Jeff Epler]

> With -O2 -mcpu=i686 or newer, gcc uses "imul" for both 100003 and > 65599, > rather than shifts and adds.

[Bob Ippolito]

It's not expected that GCC optimize an integer constant into shifts on its own.

Right, the actual diff is:

*** 1145,1151 **** p = (unsigned char *) a->ob_sval; x = p << 7; while (--len >= 0) ! x = (1000003x) ^ *p++; x ^= a->ob_size; if (x == -1) x = -2; --- 1152,1158 ---- p = (unsigned char *) a->ob_sval; x = *p << 7; while (--len >= 0) ! x = (x << 6) + (x << 16) - x + (long)*p++; x ^= a->ob_size; if (x == -1) x = -2;

I guess the real question for Raymond is, does it really make a measurable difference?

Yes, the loop runs 20% faster on my Pentium III. The speedup ought to be much more dramatic on the Pentium IV (where the integer ALU instructions speedup from 1 clock to 0.5 clocks while the latency on integer multiplication slows down from 4 clocks to 14-18 clocks).

And what effect does it have on pickled dicts (or other such hash-using data structures), if any?

The test suite runs fine. Dicts may display in a different order than in previous pythons. That may upset some doctests if the writer didn't take Tim's documented advice about making tests that do not depend on display order.

Raymond



More information about the Python-Dev mailing list