Re: Faster HashMap implementation
Sorry for the delay getting back to this.
I put an update of HashMapV2 at http://gee.cs.oswego.edu/dl/wwwtmp/HashMapV2.java. It includes the better support for tiny maps and the probe sequences that were previously missing.
On further evaluation of it (among other possible HashMap replacements), I'm not convinced that any of them are enough of an improvement to commit.
To explain why, I'll first back up to the main observations that led to exploring open-address (unchained) tables. Here are runs of the JDK HashMap (chained) vs IdentityHashMap (IHM) (open-addressed) on the (further updated) MapMicroBenchmark on an x86 (this is typical of other x86 and sparc runs). Also shown is the current HashMapV2. Times are in nanoseconds per element-operation, averaged across 8 key types.
Size:        9     36    144    576   2304   9216  36864  147456  589824
HashMap     49     33     30     33     37     45     97     208     273
IHM         41     28     26     28     31     35     58     151     188
V2          47     31     29     37     36     40     78     188     257
On these (and other) tests, IdentityHashMap is from 15% to 60% faster than HashMap. Plus it typically has a smaller footprint, by around 40% in steady state (although for tiny tables it has a larger footprint). So you'd think that by compromising a bit on the properties that make IHM faster, it would be possible to arrive at something close in a revised HashMap. HashMapV2 usually is in between on both time and space, but unfortunately much closer to HashMap; close enough that it is probably no better than making a couple of simple tweaks to the current HashMap (see below).
The main issues are hash quality and caching:
IdentityHashCodes have good randomness properties. (This is by design: in HotSpot, they are produced by a simple random number generator.) This means that the basic linear-probe open-address table in IHM usually runs close to its theoretically expected performance level. To deal effectively with the non-randomness of hashCodes (vs identityHashCodes) for common key types, you need to both precondition (bit-spread) them and give up on linear probing in favor of pseudo-randomized probing techniques. (If you do no preconditioning and use linear probing, you get slowdowns by a factor of 20 for some known key-type/value usages.) The first costs computation overhead; the second costs memory locality. While you can trade these off a bit (heavier preconditioning vs more locality of subsequent probes), I don't know of a pair of such techniques better than those in the current V2. And even these measures cannot achieve as much randomness as you get in IdentityHashMap. In particular, neither helps much when dealing with the common problem of encountering much greater than theoretically expected numbers of duplicate hashCodes. In a chained table, these poison only one bin, but in an open-address table they tend to reduce overall performance. So all in all, the hit rates are noticeably worse.
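To make the tradeoff concrete, here is a minimal sketch of the two measures just described, assuming a power-of-two table; the names are mine, not HashMapV2's. The spreader shown is the supplemental hash used by the current chained HashMap; the probe sequence takes increasing (triangular) steps, which visit every slot of a power-of-two table.

    final class OpenAddressSketch {
        // Preconditioning: bit-spread the raw hashCode so that keys
        // differing only in high bits still differ in the low-order
        // index bits. (This variant is the supplemental hash in the
        // current chained HashMap.)
        static int spread(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        // Pseudo-randomized probing: instead of scanning adjacent slots
        // (linear probing), take increasing steps. For a power-of-two
        // capacity these triangular offsets visit every slot, but each
        // extra probe gives up memory locality. Assumes the table is
        // never completely full.
        static int findSlot(Object[] keys, Object key, int hash) {
            int mask = keys.length - 1;
            int i = hash & mask;
            for (int p = 1; keys[i] != null; p++) {
                if (keys[i].equals(key))
                    return i;           // hit
                i = (i + p) & mask;     // next pseudo-random probe
            }
            return i;                   // first empty slot: a miss
        }
    }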
Because Objects themselves are required to cache their own identityHashCodes, there is no reason for IdentityHashMap to do so, and hence no need for a side-array to hold them. The overall space reduction also makes it an easy call to use a 2/3 vs 3/4 load factor threshold, which is a good and common value for open-address tables. The need for a side-array of hashes in HashMapV2 not only takes up some space, but also leads to cache misses, mainly on collisions. For large tables, these cases noticeably hurt overall throughput. Note: one reason for caching hashes is compatibility. For many key types (see below) it is a better tradeoff to recompute them when needed for rehashing and not to bother using them to pre-screen equals. But there are exceptions, and people out there surely have code that relies on the lack of recomputation.
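For illustration, the side-array arrangement amounts to something like the following (the field names are assumptions, not HashMapV2's actual layout): the cached hash prescreens equals(), but reading it is an extra memory touch that can itself miss in cache.

    final class CachedHashSketch {
        Object[] keys;
        Object[] vals;
        int[] hashes;   // side-array of cached (spread) hashCodes, parallel to keys

        // One probe step: the cached hash screens out most mismatches
        // without calling equals(), but hashes[i] is a separate array
        // access that can itself miss in cache on collision-heavy lookups.
        Object getAt(int i, Object key, int hash) {
            if (hashes[i] == hash && keys[i].equals(key))
                return vals[i];
            return null;    // caller continues along the probe sequence
        }
    }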
(In case anyone is curious, the best version I tried with a heavy randomizer, linear probing, and no side-array has these benchmark stats:

Size:        9     36    144    576   2304   9216  36864  147456  589824
average     46     36     38     44     43     49     88     235     294

which is OK, but with no room to improve to the point of being good enough to avoid inviting a slew of performance bug reports about hash recomputations.)
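(For reference, a heavy randomizer paired with linear probing might look like the sketch below. I've used the MurmurHash3 32-bit finalizer as the mixer, which is an assumption on my part rather than the exact randomizer tested.)

    final class HeavyRandomizerSketch {
        // A heavier mixer than the spreader above (this is the
        // MurmurHash3 fmix32 finalizer); it costs more arithmetic per
        // lookup but randomizes poor hashCodes more thoroughly.
        static int mix(int h) {
            h ^= h >>> 16;
            h *= 0x85ebca6b;
            h ^= h >>> 13;
            h *= 0xc2b2ae35;
            return h ^ (h >>> 16);
        }

        // Linear probing: scan adjacent slots, maximizing memory
        // locality and relying entirely on the mixer for randomness.
        static int next(int i, int mask) {
            return (i + 1) & mask;
        }
    }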
Further, the minor-looking compatibility issues with the current HashMap (like needing an entirely different map to handle overflow after 500 million entries) all add some constant overhead. There is also intrinsically worse performance on some entrySet operations because of the lack of Entry objects; simple escape analysis will often, but not always, come to the rescue here, so it is a relatively minor concern. But all together these contribute to the conclusion that the open-address approach doesn't have enough advantages to replace HashMap. Similarly for the mixed side-array approach of Alex's Compact/Fast HashMap.
As I mentioned in previous posts, it is possible to use open addressing to get speedups and footprint reductions comparable to IHM for the set of key types that seem to account for up to 80% of all usages: Strings, Integer/Long, and identity. All three share the lack of any need to cache hash codes, plus either good randomization properties or known cheap ways to attain them. Identity is already supported, but people don't seem to choose IdentityHashMaps all that often even when they apply. The Integer case is supported in a not-very-nice way in Sun JDK releases under -XX:+AggressiveOpts. Strings are by far the most common case, but no one has tried anything comparable. Internal specialization might be worth a try, but I can't get too excited about the prospects for the only available means of doing this: optimistic type tests followed by costly de-optimization when you are wrong. This extends the lack of performance predictability you already get in Java to living with higher-order algorithmic luckiness. Explicit specialization (as is upcoming in Scala) seems like a much better line of attack.
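(To illustrate, the optimistic-type-test approach amounts to something like the sketch below. It is purely illustrative: the deoptimization itself happens inside the JIT and isn't expressible in source code.)

    final class OptimisticSpecialization {
        // Optimistic fast path: guess the common key type. A JIT compiling
        // this against an all-String profile can drop the generic branch
        // and install a trap; the costly part is deoptimizing when the
        // guess later turns out wrong.
        static int keyHash(Object key) {
            if (key instanceof String)
                return key.hashCode();   // Strings cache their own hash codes
            int h = key.hashCode();      // generic preconditioned path
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }
    }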
There is though one issue surrounding current HashMaps that does seem to deserve a short-term minor improvement. People (including the IBM "bloat" paper folks) have repeatedly shown that a great many HashMaps are either permanently empty or hold at most a few elements. It would be pretty straightforward to accommodate this by (1) lazily initializing arrays and (2) initially allocating only capacity 4, then jumping to the current default of 16 on the first resize. This seems like a simple win. Testing it out, it seems to have almost no performance impact on non-tiny map usages (in part because adding an initial 4->16 resize warms up the resize code).
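Concretely, the tweak could look something like this minimal sketch (the constant and field names are mine, and the hashing machinery is elided; only the allocation policy is shown):

    final class LazyTinyAllocation {
        static final int TINY_CAPACITY = 4;      // initial capacity on first use
        static final int DEFAULT_CAPACITY = 16;  // current HashMap default

        Object[] table;  // left null until first write, so empty maps cost nothing

        void ensureTable() {
            if (table == null)                   // (1) lazy initialization
                table = new Object[TINY_CAPACITY];
        }

        int nextCapacity() {
            // (2) the first resize jumps 4 -> 16, warming up the resize
            // path early; subsequent resizes double as usual.
            return (table.length == TINY_CAPACITY)
                    ? DEFAULT_CAPACITY
                    : table.length << 1;
        }
    }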
-Doug