HashMap - Hybrid approach
Doug Lea dl at cs.oswego.edu
Wed Jun 10 12:59:32 UTC 2009
Alex Yakovlev wrote:
> To have single-indirect access we need to read key/value objects with the first array read.
Backing up a little, there are five main forces in hash table design. These are fresh in my mind because I've been trying to put together CustomConcurrentHashMap, the last planned JSR166y JDK7 component, that handles weak/soft refs, custom equality comparisons, eviction, etc. (Concurrency adds further forces omitted here.)
- Memory stalls
- Cache misses due to randomization of indexing can easily cost hundreds of cycles. This is made worse if there are multiple levels of indirection, each with poor locality. (Multiple levels with good locality are much less of a problem, which is why ConcurrentHashMap is pretty fast despite adding one.) And it is made even worse when there are too many computations before the processor can even issue a prefetch, which is one reason why bit-spreading functions require careful speed-vs-uniformity tradeoff analysis (one such function appears in the sketch after this list).
- Method Dispatching
- Both Object.equals and Object.hashCode are often "megamorphic", i.e., they cannot be inlined because they are overridden in too many classes. In some usages, this accounts for the main performance bottleneck. So caching hashcodes, which usually also allows skipping calls to equals for different objects, has a noticeable effect (the sketch after this list shows this comparison pattern). Also, it is sometimes possible to bias the structure of code to encourage compilers to do more per-call inlining, which helps a lot. Note that these effects can be difficult to test. Test programs that include multiple key types (like our DenseMapMicrobenchmark) are slightly more accurate indicators of megamorphism effects, but even here, these kinds of microbenchmarks will often cause compilers to do deeper inlining than they would in most actual usages.
- Collision handling
- Collisions lead to some sort of slower scanning. Especially given the highly variable quality of user-supplied hashCodes, you'd like to localize the impact of collisions to avoid per-map slowdowns. Linear-probes and related techniques don't fare very well on these grounds.
- Biasing for expected operation mixes
- I once checked that collection (and in particular HashMap/Hashtable) usage approximately obeys the usual 4:1 read-vs-write ratio seen in just about every aspect of computing. In fact it seems a bit more extreme -- something like 84% reads (get() or iterate), 15% adds (put() and variants), and 1% deletes (remove, clear). So making reads fast is clearly the highest priority.
- Footprint
- Many Java frameworks (especially EE) can create hundreds of thousands of collections, many of them very small, but also some huge ones. This leads to second-order bloat effects, like high TLB miss rates and much slower GC. For some details, see papers by Gary Sevitsky and colleagues at http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html. This is a difficult issue with hash tables, since the whole idea is to preallocate space to allow random indexing. And because resizing is not cheap, you don't want to be overly incremental about it.
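To make the first two forces concrete, here is a minimal sketch, assuming a conventional chained table. The spread() function is essentially the supplemental hash that JDK 6's java.util.HashMap applies, and the lookup loop shows how a cached hash field lets most non-matching entries be rejected without dispatching a (possibly megamorphic) equals() call. The Sketch and Entry names are illustrative only.

    class Sketch {
        static final class Entry {
            final Object key, value;
            final int hash;     // cached spread(key.hashCode())
            Entry next;
            Entry(Object key, Object value, int hash, Entry next) {
                this.key = key; this.value = value;
                this.hash = hash; this.next = next;
            }
        }

        // Essentially JDK 6 HashMap's supplemental hash: mixes high bits
        // down so poor user-supplied hashCodes still spread across the
        // table, while staying cheap enough that the index (and thus the
        // prefetch) is available quickly.
        static int spread(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        static Object get(Entry[] table, Object key) {
            int h = spread(key.hashCode());
            for (Entry e = table[h & (table.length - 1)]; e != null; e = e.next) {
                Object k;
                // Compare cached hashes first: the (possibly megamorphic)
                // equals() is dispatched only when the hashes already agree
                // and the keys are not reference-identical.
                if (e.hash == h && ((k = e.key) == key || key.equals(k)))
                    return e.value;
            }
            return null;
        }
    }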
> And we cannot store collisions in pre-allocated arrays -- to read them from the same array as the keys/values, they would need to be objects. Thus write speed in my original map will always be faster.
Right. My thought here is to use an alternative structure for collisions. So, given capacity C, you'd have:
- elements[] -- 2C of key/value
- hashes[] -- 1C of hash + bookkeeping bits
- collisions -- a dynamic bitwise trie or somesuch
Compared to current HashMap (a rough sketch of this layout follows the list below)...
- It allows parallel prefetches on get of both elements[2*h] and hashes[h], so it should reduce memory stalls.
- It triples the unloaded footprint. You can get most of this back in default-constructor usages by not allocating the arrays, instead using only the collisions tree until size reaches, say, 4, and then using a larger (say, 64) initial capacity once you know you aren't a tiny map.
- Collisions may or may not be a little more expensive to resolve.
- Iterators are likely slower and definitely a lot messier
- The trie nodes needed here would require 2 or 3 more pointer fields than do list-based nodes, so the incremental footprint grows faster.
- Resizing is slower and more complicated, since collisions may or may not be moved to the main table.
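For concreteness, here is a rough sketch of a get() against that layout, assuming C is a power of two and both arrays are allocated (per the footprint note above, a default-constructed map could defer allocation until it outgrows the collisions structure). The HybridMap and Collisions names are hypothetical, the overflow structure is left abstract, and the bookkeeping bits in hashes[] are ignored for brevity.

    class HybridMap {
        interface Collisions {                  // hypothetical overflow
            Object find(int hash, Object key);  // structure, e.g. a trie
        }

        Object[] elements;      // 2C slots: key at 2*i, value at 2*i + 1
        int[] hashes;           // C slots: cached hash + bookkeeping bits
        Collisions collisions;  // entries whose home slot is occupied

        static int spread(int h) {   // any cheap bit-spreader works here
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        Object get(Object key) {
            int h = spread(key.hashCode());
            int i = h & (hashes.length - 1);
            // elements[2*i] and hashes[i] are independent loads, so once
            // i is known the processor can issue both prefetches in
            // parallel -- the point of the first bullet above.
            Object k = elements[2 * i];
            if (k != null && hashes[i] == h && (k == key || key.equals(k)))
                return elements[2 * i + 1];
            // Miss in the main table: consult the overflow structure.
            return collisions == null ? null : collisions.find(h, key);
        }
    }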
Without fleshing this out, I'm not sure whether this hits the right tradeoffs.
-Doug