Are the faster versions of HashMap and BigInteger going into jdk7?
Alex Yakovlev alex14n at gmail.com
Tue Oct 27 06:22:44 UTC 2009
Doug,
On Sat, Oct 24, 2009 at 2:43 AM, Doug Lea <dl at cs.oswego.edu> wrote:
> Although at the end of that mail
> (http://mail.openjdk.java.net/pipermail/core-libs-dev/2009-July/002168.html)
> I mentioned that there are several less dramatic changes to the current
> implementation that seem worthwhile. In case anyone is interested, a
> preliminary version with those changes, that I put together shortly
> afterward is at http://gee.cs.oswego.edu/dl/wwwtmp/HashMapV6.java
> One of these days/weeks/months I hope to re-check for further possible
> improvements and consider committing.
I have an idea for improving both your and my previous efforts. I've got no code to show yet, but the idea is to use three arrays:
Object[] keyValueTable with the key/value mappings. Its size is 2*capacity, so it follows the open-hash approach, and it is lazily initialized.
int[] indexTable with packed indices. The lowest bits contain the next index, the middle bits hold hashCode bits, and one bit flags that this cell belongs to another hash chain. We don't need an 'end of list' flag - it can be encoded by setting the next-index bits equal to the cell's own index.
Entry<K,V>[] overflowTable with HashEntry-based linked lists for overflow. It will be used only for very large maps, or maybe for small ones too with load factor >= 1 - what do you think? This will be faster (fewer memory lookups) than storing all overflow data in a single trie. (A rough layout sketch follows below.)
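To make the layout concrete, here is a rough field sketch - the class name, constant names, bit widths and the nested Entry class are just illustrative assumptions on my part, not working code:

// Minimal sketch of the proposed layout; names and bit widths are assumptions.
class OpenHashMap<K, V> {
    // Key/value pairs stored inline: slot i keeps its key at 2*i and its value at 2*i + 1.
    Object[] keyValueTable;            // length = 2 * capacity, lazily allocated

    // Packed per-slot metadata (example split, assuming capacity <= 2^16):
    static final int NEXT_BITS       = 0x0000FFFF; // low bits: index of the next slot in the chain
    static final int HASH_BITS       = 0x7FFF0000; // middle bits: cached hashCode bits
    static final int OTHER_CHAIN_BIT = 0x80000000; // slot holds an entry from another chain
    int[] indexTable;                  // one packed int per slot; next == own index ends the chain

    // Classic linked-entry chains, used only once the open-addressed part overflows
    // (very large maps, or perhaps small maps with load factor >= 1).
    Entry<K, V>[] overflowTable;       // null until first needed

    static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }
}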
In the get method we first look into keyValueTable. If its key part is null, return null; if it is referentially equal to the given key, return the stored value - so we have only one random memory lookup. Otherwise we look at indexTable: if its 'another chain' bit is set, return null; otherwise compare the hash bits and, if they match, compare the keys by calling equals. If they are not equal, check the next index until it equals the current index, which indicates end of list. If overflowTable is not null we also check it (it works like the table in the current HashMap).
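Roughly, the get path could look like the following method sketch - it reuses the field and constant names assumed above, ignores null keys and resizing, and is only an illustration of the lookup order, not actual code:

// Hedged sketch of the get path; continues the assumed OpenHashMap fields above.
@SuppressWarnings("unchecked")
V get(Object key) {
    if (keyValueTable == null)
        return null;
    int capacity = keyValueTable.length >>> 1;
    int hash = key.hashCode();                    // real code would spread the bits
    int i = hash & (capacity - 1);                // home slot
    Object k = keyValueTable[2 * i];
    if (k == null)
        return null;                              // nothing hashed to this slot
    if (k == key)
        return (V) keyValueTable[2 * i + 1];      // reference hit: one random lookup
    int meta = indexTable[i];
    if ((meta & OTHER_CHAIN_BIT) != 0)
        return null;                              // home slot is borrowed by another chain
    while (true) {
        if ((meta & HASH_BITS) == (hash & HASH_BITS)
                && key.equals(keyValueTable[2 * i]))
            return (V) keyValueTable[2 * i + 1];
        int next = meta & NEXT_BITS;
        if (next == i)
            break;                                // next == own index marks end of chain
        i = next;
        meta = indexTable[i];
    }
    if (overflowTable != null)                    // spill-over chains, as in the current HashMap
        for (Entry<K, V> e = overflowTable[hash & (overflowTable.length - 1)]; e != null; e = e.next)
            if (key.equals(e.key))
                return e.value;
    return null;
}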
By storing the next index in the open-hash layout we get linked-hash performance on gets (no 'false' lookups), though the put operation will be slower than in a pure linked hash. We can prefer adjacent cells, which gives a gain of several percent as I wrote before. The defragmentation trick cannot be used in the open-map approach (and it leads to overly complex code), but we can probe not only the next free index but also the previous one, or maybe several steps forward/backward that could be prefetched. This should be tested on different platforms to figure out the optimal behavior.
In the put operation we may need to move out entries belonging to other hash chains, but we have no space to store a previous index for a doubly linked list, so we'll need to search the possible previous indices - either adjacent cells, some large step, or some invertible function - here I'll strongly need your advice.
Then, to insert a new key, we first check the adjacent cell and fill it if it's empty; if not, we take a big step, either fixed or function-based, fill that cell if it is empty, take another step if not, and so on. We can also limit the number of probe steps, after which we fall back to overflowTable. And if the current hash load is high enough, or the array sizes are already at their maximum, we'll use overflowTable anyway.
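A possible probe routine for finding a free cell, just to illustrate the idea - the step size, the probe limit and the method name are assumptions, not a worked-out design:

// Probe sketch: adjacent cell first, then fixed "big steps", then give up.
static final int MAX_PROBES = 8;                  // after this many misses, fall back to overflowTable

int findFreeSlot(int home) {
    int capacity = keyValueTable.length >>> 1;
    int i = (home + 1) & (capacity - 1);          // try the adjacent cell first (often already prefetched)
    if (keyValueTable[2 * i] == null)
        return i;
    int step = (capacity >>> 3) | 1;              // one possible "big step"; odd, so it cycles through all slots
    for (int probes = 0; probes < MAX_PROBES; probes++) {
        i = (i + step) & (capacity - 1);
        if (keyValueTable[2 * i] == null)
            return i;
    }
    return -1;                                    // caller falls back to the overflow table
}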
To iterate over the elements we first iterate over keyValueTable, then over overflowTable if it's not null.
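For illustration, counting the entries in that order might look like this (the method name is made up):

// Iteration-order sketch: main table first, then the overflow chains.
int countEntries() {
    int n = 0;
    if (keyValueTable != null)
        for (int i = 0; i < keyValueTable.length; i += 2)
            if (keyValueTable[i] != null)
                n++;                              // occupied open-addressed slot
    if (overflowTable != null)
        for (Entry<K, V> head : overflowTable)
            for (Entry<K, V> e = head; e != null; e = e.next)
                n++;                              // entries spilled into the linked chains
    return n;
}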
To make HashSet use less memory we can add a flag indicating that keyValueTable contains only keys, but we should probably check that this does not hurt performance.
For the linked map we'll add two more arrays holding a next/previous pair per entry: one with next/previous indices, another (initialized only together with overflowTable) with pointers to the next/previous HashEntries. Also, in the Entry object we'll need to add two index fields and two pointers to link either into the main arrays or to an overflow Entry object.
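A rough sketch of what that extra bookkeeping might look like, again with assumed names and layout:

// Hedged sketch of the linked-variant additions; names and layout are assumptions.
int[] linkTable;               // two ints per slot: next / previous slot in insertion order
Entry<K, V>[] linkEntryTable;  // two refs per slot: next / previous overflow entries
                               // (allocated only together with overflowTable)

// An overflow entry in the linked variant would carry both kinds of links:
static class LinkedEntry<K, V> extends Entry<K, V> {
    int prevIndex, nextIndex;                     // links into the main arrays
    LinkedEntry<K, V> prevEntry, nextEntry;       // links to other overflow entries
    LinkedEntry(K key, V value, Entry<K, V> next) { super(key, value, next); }
}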
Do you think it's worth trying? This way we'd have the best of your earlier open-hash-based approach and my packed indices plus adjacent cells for prefetching, with no problems at high load factors or with very large maps, and better speed than a single trie for everything, which would need ~30 random memory lookups (one for each bit).
Alex