Faster HashMap implementation (original) (raw)

Doug Lea dl at cs.oswego.edu
Wed Jul 1 12:27:50 UTC 2009


While I'm posting in-progress stuff, here are some other notes on hash maps:

First, some observations about usage (repeating some already posted):

Via some dated but probably still valid dynamic measurments: Ops: Approx 84% read, 15% add/modify, 1% remove Sizes: Peak at 0, 1, 2, then mostly flat distribution across all others Static guesses from frequencies of code forms, mainly via google code search Key types 50-60% Strings 10-20% Numbers -- mainly Integer, then Long, then others 10-20% Identity-based (no override of equals/hashCode) 10-30% eveything else Reads might be around 70% get, 30% traverse? get: At least 80% successful (no null checks on get) Iterators: Probably less than 20% Entry, others about half key vs value Puts: typically add, not modify

The tables below (from MapMicrobenchmark) show that cache misses completely dominate performance for large maps. You'd like to both push out the sizes for which they come into play, and lessen the number of indirections (thus misses) even when they do. HashMapV2 mostly does the latter. Of course, for cache misses to dominate, you also need good key hashing and collision strategies to cope with possibly poor hashCode()'s.

Times are approximately nanosecs per element/op. Here are runs on an Intel x86 linux 2.6.24-23 with jdk6 -server. Absolute times vary on other platforms but show the same patterns (*).

HashMap: Type/Size: 36 144 576 2304 9216 36864 147456 589824 BigDecimal 40 37 43 47 57 174 267 317 BigInteger 39 34 38 39 48 187 247 288 String 37 34 38 42 54 149 227 253 Double 41 36 39 44 49 80 219 250 Float 40 38 41 44 48 95 216 267 Long 34 29 33 34 43 74 194 242 Integer 38 33 36 39 46 70 199 233 Object 36 33 35 39 47 64 215 275

average 38 34 37 41 49 111 223 265

HashMapV2: Type/Size: 36 144 576 2304 9216 36864 147456 589824 BigDecimal 34 34 40 44 58 146 255 286 BigInteger 28 25 29 34 44 151 220 243 String 34 29 34 40 59 136 208 238 Double 37 32 32 44 46 55 180 192 Float 34 37 38 42 42 67 165 246 Long 26 21 22 29 32 40 123 147 Integer 29 25 30 37 46 60 137 157 Object 34 31 33 41 49 64 196 250

average 32 29 32 38 47 89 185 219

Here are some more details listed across some operation categories (via updated MapCheck) using as keys, English words (Strings) from a big dictionary. Times are scaled differently than above, and these runs are from AMD-x86/solaris, also typical of others:

                     HashMap  :     V2

Access Present : 213 : 200 Add Absent : 265 : 245 Modify Present : 231 : 199 Remove Present : 111 : 101 Search Absent : 119 : 93 Traverse access entry : 173 : 137 Traverse access key/val: 147 : 87

(*) At least on Solaris jdk6u14, adding the -XX:+AggressiveOpts option also seems to enable the specialized Integer version of HashMap, which improves performance on some tests but worsens on others. On all platforms, the -XX:+AggressiveOpts switch seems to lead to more run-to-run variation.

Here's synosis of MapMicrobenchmark pasted from javadoc:



More information about the core-libs-dev mailing list