Faster HashMap implementation (original) (raw)
Doug Lea dl at cs.oswego.edu
Wed Jul 1 12:27:50 UTC 2009
- Previous message: hg: jdk7/tl/jdk: 6811297: Add more logging to HTTP protocol handler
- Next message: Faster HashMap implementation
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
While I'm posting in-progress stuff, here are some other notes on hash maps:
First, some observations about usage (repeating some already posted):
Via some dated but probably still valid dynamic measurments: Ops: Approx 84% read, 15% add/modify, 1% remove Sizes: Peak at 0, 1, 2, then mostly flat distribution across all others Static guesses from frequencies of code forms, mainly via google code search Key types 50-60% Strings 10-20% Numbers -- mainly Integer, then Long, then others 10-20% Identity-based (no override of equals/hashCode) 10-30% eveything else Reads might be around 70% get, 30% traverse? get: At least 80% successful (no null checks on get) Iterators: Probably less than 20% Entry, others about half key vs value Puts: typically add, not modify
The tables below (from MapMicrobenchmark) show that cache misses completely dominate performance for large maps. You'd like to both push out the sizes for which they come into play, and lessen the number of indirections (thus misses) even when they do. HashMapV2 mostly does the latter. Of course, for cache misses to dominate, you also need good key hashing and collision strategies to cope with possibly poor hashCode()'s.
Times are approximately nanosecs per element/op. Here are runs on an Intel x86 linux 2.6.24-23 with jdk6 -server. Absolute times vary on other platforms but show the same patterns (*).
HashMap: Type/Size: 36 144 576 2304 9216 36864 147456 589824 BigDecimal 40 37 43 47 57 174 267 317 BigInteger 39 34 38 39 48 187 247 288 String 37 34 38 42 54 149 227 253 Double 41 36 39 44 49 80 219 250 Float 40 38 41 44 48 95 216 267 Long 34 29 33 34 43 74 194 242 Integer 38 33 36 39 46 70 199 233 Object 36 33 35 39 47 64 215 275
average 38 34 37 41 49 111 223 265
HashMapV2: Type/Size: 36 144 576 2304 9216 36864 147456 589824 BigDecimal 34 34 40 44 58 146 255 286 BigInteger 28 25 29 34 44 151 220 243 String 34 29 34 40 59 136 208 238 Double 37 32 32 44 46 55 180 192 Float 34 37 38 42 42 67 165 246 Long 26 21 22 29 32 40 123 147 Integer 29 25 30 37 46 60 137 157 Object 34 31 33 41 49 64 196 250
average 32 29 32 38 47 89 185 219
Here are some more details listed across some operation categories (via updated MapCheck) using as keys, English words (Strings) from a big dictionary. Times are scaled differently than above, and these runs are from AMD-x86/solaris, also typical of others:
HashMap : V2
Access Present : 213 : 200 Add Absent : 265 : 245 Modify Present : 231 : 199 Remove Present : 111 : 101 Search Absent : 119 : 93 Traverse access entry : 173 : 137 Traverse access key/val: 147 : 87
(*) At least on Solaris jdk6u14, adding the -XX:+AggressiveOpts option also seems to enable the specialized Integer version of HashMap, which improves performance on some tests but worsens on others. On all platforms, the -XX:+AggressiveOpts switch seems to lead to more run-to-run variation.
Here's synosis of MapMicrobenchmark pasted from javadoc:
- A micro-benchmark with key types and operation mixes roughly
- corresponding to some real programs.
- The main results are a table of approximate nanoseconds per
- element-operation (averaged across get, put etc) for each type,
- across a range of map sizes.
- The program includes a bunch of microbenchmarking safeguards that
- might underestimate typical performance. For example, by using many
- different key types and exercising them in warmups it disables most
- dynamic type specialization. Some test classes, like Float and
- BigDecimal are included not because they are commonly used as keys,
- but because they can be problematic for some map implementations.
- By default, it creates and inserts in order dense numerical keys
- and searches for keys in scrambled order. Use "r" as second arg to
- instead use random numerical values, and "s" as third arg to search
- in insertion order.
- Previous message: hg: jdk7/tl/jdk: 6811297: Add more logging to HTTP protocol handler
- Next message: Faster HashMap implementation
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]