Update on alternative hashing for String keys with hash-based Maps (original) (raw)

Doug Lea dl at cs.oswego.edu
Sat Jul 14 06:43:57 PDT 2012


On 07/13/12 19:09, Mike Duigou wrote:

Doug Lea has been investigating further improvements

A quick overview: The preview version of ConcurrentHashMap (currently as jsr166e.ConcurrentHashMapV8 -- see http://gee.cs.oswego.edu/dl/concurrency-interest/index.html) addresses string hash problems from a different angle. Rather than replacing the hash code algorithm, it uses O(log n) techniques to provide graceful degradation in cases where many keys have colliding hash codes.

The basic techniques used apply only when the colliding keys are Comparable, which includes all cases of interest (String, numbers, etc; this is dynamically checked, so the nominal key types don't matter). They reduce worst-case impact of dumb/hostile usages where, say, a billion keys all have the same hash code, from a billion steps per operation down to about 50 steps without the need for alternate hash functions (like murmur) that would be less susceptible to these problems at the expense of noticeably higher costs in the normal case.

It looks possible but will take a bunch of effort to apply these techniques to the other JDK hash-based classes that are susceptible to these kinds of performance issues. (The main downside in doing this is that each of them will require a few hundred lines of slightly different specializations of the required variant form of balanced trees.)

This approach may eliminate the motivation/need for "hash2" slot introduced in class String, further reducing footprints (i.e., murmur would still be supported but its results not cached). And further, the ALT-HASHING controls may turn into no-ops for all JDK tables.

-Doug



More information about the jdk7u-dev mailing list