HashMap - Hybrid approach

Doug Lea dl at cs.oswego.edu
Thu Jun 11 14:53:08 UTC 2009


Alex Yakovlev wrote:

> I thought about tree-like structures for collisions, but it occurred to me that the problem is not the search depth (which non-linear structures can reduce) but memory fragmentation.

There were three reasons I suggested tries for collisions: (1) to remove a dependent load, allowing prefetch into the elements array; (2) to decrease footprint; (3) to conform to the current threshold parameters while still avoiding the long sequential pile-ups you can get with open-addressing-style collision handling, especially when the number of duplicate hash codes is higher than statistically expected. It may be possible to get most of the effects without them, though. Here are some notes on possible ways to get there:

First a digression: One source of hash duplicates is that System.identityHashCode (which you also get as Object.hashCode for keys that don't override it) has had from 21 to 25 significant bits in various releases of hotspot (and fewer in some other JVMs), which means you start seeing more and more duplicates in very large tables. It is a known problem that IdentityHashMaps may have crummy performance for tables past a few million elements. On the other hand, because identityHashCodes are nicely randomized by the JVM, and because IdentityHashMap avoids an indirection compared to other maps, its performance on less-than-huge tables is by far the fastest.
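(As an aside, a throwaway probe along these lines makes the bit-width easy to check on a given JVM; this snippet is just my illustration, not from any measurements in this thread. It ORs together the identityHashCodes of many fresh objects and counts how many bit positions ever varied:)

    public class IdentityHashBits {
        public static void main(String[] args) {
            int union = 0;
            for (int i = 0; i < 1000000; ++i)
                union |= System.identityHashCode(new Object());
            // With only ~21-25 significant bits, the high bits never vary.
            System.out.println("distinct bit positions seen: " +
                               Integer.bitCount(union));
            System.out.println("highest set bit: " +
                               (32 - Integer.numberOfLeadingZeros(union)));
        }
    }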

About footprint (again, refining/correcting my previous mail): In 32-bit mode, under the default 3/4 load factor, the current HashMap allocates C + 6N words (6 = 4 fields plus a 2-word object header per entry), versus 13C/4 for yours. In steady state, C = 4N/3 just before a resize and 8N/3 just after, averaging 6N/3 = 2N. So the steady-state footprints below look good, but the initial sizes under the default capacity of 16 don't. (While I'm at it, for further comparison, the footprint for IdentityHashMap is also below; it hardwires a 2/3 threshold, not 3/4.)

                  steady state          initial
              min     ave     max     (default cap)
   yours:    4.33N   6.50N   8.67N    52
   current:  7.33N   8.00N   8.67N    16
  (Identity: 3.00N   4.50N   6.00N    32)
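(Spelling out the "current" row from the C + 6N formula above, as a sanity check -- my arithmetic, not part of the original table:)

    min:  4N/3 + 6N = 22N/3 ~ 7.33N   (just before a resize)
    ave:    2N + 6N =  8N   = 8.00N
    max:  8N/3 + 6N = 26N/3 ~ 8.67N   (just after a resize)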

To cope with known bloat issues, any possible HashMap replacement should have a smaller footprint for unused and tiny maps. Using a dynamic collision tree is one way to reduce initial footprint. Another way is just to be more cautious on startup:

Also, you currently allocate 64 words initially for a 1.0 load factor, which is surprisingly more than the 52 for the 0.75 default. Under the above scheme the initial values would be closer. Further note that the steady-state space differences between 0.75 and 1.0 are very small (6.25%):

                  steady state          initial
              min     ave     max     (default cap)
   lf=1.0:   4.00N   6.00N   8.00N    64

And further, the performance differences across these load factors are very small in the few tests I ran. This suggests a different sort of tactic for helping with indirections and prefetches. Suppose that index[h] were usually just h; that is, with the key in elements[2*h]. (This only works in the current design if loadFactor == 1, but can work with smaller values if the elements array is sized at hashlen.) Under this scheme, you can optimistically read this element at the same time as the index. To make this work, you'd need a bitmap of available slots rather than a freelist, adding C/32 words of space overhead, plus some heuristic juggling inside put and resize. It might be worth an experiment or two. Doing this requires yet further work/thought about how sizing varies with loadFactor arguments.
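A minimal sketch of what I mean (names and layout are illustrative only; non-null keys assumed, and the collision-chain fallback plus the put/resize juggling are elided):

    class OptimisticTable {
        final Object[] elements;  // key at 2*i, value at 2*i+1
        final int[] index;        // index[h] == h in the common case
        final long[] freeBitmap;  // bit i set => slot i free; C/32 words overhead

        OptimisticTable(int capacity) {  // capacity: a power of two, >= 64 here
            elements = new Object[2 * capacity];
            index = new int[capacity];
            java.util.Arrays.fill(index, -1);
            freeBitmap = new long[capacity >>> 6];
            java.util.Arrays.fill(freeBitmap, -1L);  // everything starts free
        }

        Object get(Object key, int hash) {
            int h = hash & (index.length - 1);
            Object k = elements[2 * h];   // optimistic read: both loads issue at once
            if (key.equals(k))
                return elements[2 * h + 1];
            int i = index[h];             // slow path: follow the indirection
            if (i >= 0 && i != h && key.equals(elements[2 * i]))
                return elements[2 * i + 1];
            return null;                  // (walk of further collisions elided)
        }

        boolean isFree(int slot) {        // bitmap replaces the freelist
            return (freeBitmap[slot >>> 6] & (1L << slot)) != 0;
        }

        void markUsed(int slot) {
            freeBitmap[slot >>> 6] &= ~(1L << slot);
        }
    }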

> But in my array-based approach it is possible to defragment it and (almost) always store all collisions in sequential array indices.

In the current HashMap, the nodes chained inside bins tend to have good locality after a GC, because the GC will usually relocate them to be adjacent. Analogously, it would seem to make sense to defrag only on resize, perhaps accompanied by heuristic one-place swaps during put.

> I gathered some statistics: for successful reads, when the map is almost full at a .75 load factor just before resize, 70% of gets succeed on the first probe, 23% on the second, 5% on the third, and 1% on the fourth.

The exact values depend, of course, on hash quality, insertion order, etc. But this also reflects the fact that the collisions array (or the back half of the current index array) is typically less than 30% occupied. This suggests shrinking its default size and letting it resize independently, using the same sort of approach as the main hash: start out at, say, 25% of hashlen, by letting sets of 4 indices share an overflow slot, and expand as needed. Or maybe start with 50%, which would still be enough footprint savings to bother.
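A rough sketch of an independently resizing overflow region (sizes and names are my own illustration, not Alex's code):

    class Overflow {
        int[] next;        // chain links among overflowed entries
        Object[] entries;  // key at 2*i, value at 2*i+1 for overflowed entries
        int used;

        Overflow(int hashlen) {
            int cap = Math.max(4, hashlen / 4);  // start at 25% of the main hash
            next = new int[cap];
            entries = new Object[2 * cap];
        }

        // Hand out the next overflow slot, doubling only this region when
        // full; the main index array is left untouched.
        int allocateSlot() {
            if (used == next.length) {
                next = java.util.Arrays.copyOf(next, 2 * next.length);
                entries = java.util.Arrays.copyOf(entries, 2 * entries.length);
            }
            return used++;
        }
    }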

-Doug


