Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps

Ulf Zibis Ulf.Zibis at gmx.de
Wed May 23 23:58:07 UTC 2012


Hi,

What about making this approach a little more general? See bug 6812862 <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862> - "provide customizable hash() algorithm in HashMap for speed tuning", plus all later comments. Then you could additionally save the check:

    if ((0 != h) && (k instanceof String))
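To illustrate the idea, here is a minimal sketch of a table that receives its hash strategy once, at construction time, so no per-lookup type check is needed. The names PluggableHashSketch, Table and indexFor are hypothetical and are not taken from the bug report or from the JDK; this only shows the general shape such a design could have.

```java
import java.util.function.ToIntFunction;

final class PluggableHashSketch {

    /** Hypothetical map-like table whose hash function is supplied by the caller. */
    static final class Table<K> {
        private final ToIntFunction<? super K> hasher;
        private final int mask;

        Table(int capacity, ToIntFunction<? super K> hasher) {
            // The index mask below only works for power-of-two capacities.
            if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
                throw new IllegalArgumentException("capacity must be a power of two");
            }
            this.hasher = hasher;
            this.mask = capacity - 1;
        }

        /** Bucket index for a key: no instanceof test, the strategy is already fixed. */
        int indexFor(K key) {
            return hasher.applyAsInt(key) & mask;
        }
    }

    public static void main(String[] args) {
        Table<String> legacy = new Table<>(16, String::hashCode);
        System.out.println(legacy.indexFor("a"));
    }
}
```

A String-specific map would pass a String-specific hasher, while other maps would keep Object::hashCode or a cheaper function.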

Looking at the code points of many charsets, the main variance seems to be in the lower 8 bits of a character, especially if the strings belong to the same language. So if we composed the initial 32-bit values from 4 chars each, the murmur3 algorithm could run almost twice as fast.
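A minimal sketch of that packing, assuming the low 8 bits of four consecutive chars are combined into one 32-bit input word (the class and method names are hypothetical). Compared with packing two full chars per word, this halves the number of words the mixing loop has to process. It also discards the upper 8 bits of every char, so chars that differ only in those bits produce the same input word.

```java
final class PackedCharsSketch {

    /**
     * Packs the low 8 bits of up to four chars, starting at off,
     * into one int. The first char ends up in the lowest byte.
     */
    static int packLow8(CharSequence s, int off) {
        int word = 0;
        int end = Math.min(off + 4, s.length());
        for (int i = off; i < end; i++) {
            word |= (s.charAt(i) & 0xFF) << (8 * (i - off));
        }
        return word;
    }

    /** Number of 32-bit input words needed for len chars. */
    static int blocks(int len, int charsPerBlock) {
        return (len + charsPerBlock - 1) / charsPerBlock;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(packLow8("abcd", 0)));
        System.out.println(blocks(16, 2) + " words vs. " + blocks(16, 4) + " words");
    }
}
```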

If you alter all hash maps in the JDK to use a new hash value, which noteworthy use cases remain for the legacy hashCode()? Do we really need 2 hash fields in String?

In Project Coin, we set in stone the use of compile-time hashes for the strings-in-switch extension, so it can never profit from the murmur3 optimization. IMO, what a pity! (Prominent people have said that it will never make sense to change String's hash algorithm.) See:
http://markmail.org/message/ig3nzmfinfuvgbwz
http://markmail.org/message/h3nlhhae5qlmf37a

On 23.05.2012 21:03, Mike Duigou wrote:

Also, this change

-        return h ^ (h >>> 7) ^ (h >>> 4);
+        h ^= (h >>> 7) ^ (h >>> 4);
+
+        return h;

will make the compiler generate an additional iload/istore pair. While the JITted code will be the same, it may bother the inlining heuristic. Wouldn't

    return (h ^= (h >>> 7) ^ (h >>> 4));

have the same effect?

Anyway, please add a comment for later readers.
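For reference, the three variants discussed above can be compared side by side. This is only a sketch with hypothetical method names. All three return the same value, and the bytecode each one produces can be inspected with javap -c.

```java
final class SpreadSketch {

    /** Original form: a single expression, no assignment to h. */
    static int original(int h) {
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Form from the change: update h, then return it in a second statement. */
    static int split(int h) {
        h ^= (h >>> 7) ^ (h >>> 4);

        return h;
    }

    /** Suggested form: the compound assignment is itself the returned expression. */
    static int compound(int h) {
        return (h ^= (h >>> 7) ^ (h >>> 4));
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        System.out.println(original(h) == split(h) && split(h) == compound(h));
    }
}
```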

-Ulf


