Scaling problem of HashMap (introduced with alternative hashing)
Wolfgang Hoschek whoschek at cloudera.com
Sat Jan 5 22:10:39 UTC 2013
Mike Duigou <mike.duigou at ...> writes:
> I am responding on the StackOverflow thread. I will look into using ThreadLocalRandom.
>
> The random.next() is clearly a potential bottleneck but given that this happens only once per HashMap
> instance it is still unclear why a reasonable application would want to create hundreds or thousands of
> hash maps per second.
>
> Mike
There are tons of apps out there that create a transient HashMap per record in big data applications. Think parsers and serializers in tight loops, for example. ArrayList and HashMap are probably the most heavily used data structures in everyday Java usage, and perf regressions in this area affect all existing Java programs. Putting any synchronization into the unsynchronized collection classes is a real gotcha.
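To make the per-record pattern concrete, here is a minimal sketch (class and method names are illustrative, not from any real codebase): a parser that builds one short-lived HashMap per input record, so any fixed cost in the HashMap constructor is paid once per record in the tight loop.

```java
import java.util.HashMap;
import java.util.Map;

public class PerRecordMaps {
    // Illustrative parser: one transient HashMap per input record.
    // Any fixed per-construction cost (e.g. a CAS on a shared Random)
    // is therefore incurred once per record.
    static Map<String, String> parseRecord(String record) {
        Map<String, String> fields = new HashMap<>(); // constructor cost hit here, per record
        for (String pair : record.split(",")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        // Tight loop: millions of records means millions of short-lived maps.
        for (int i = 0; i < 3; i++) {
            Map<String, String> m = parseRecord("id=" + i + ",name=rec" + i);
            System.out.println(m.get("id") + " " + m.get("name"));
        }
    }
}
```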
In my opinion, this is unacceptable and needs to be fixed ASAP. The change apparently introduced in 7u6 (CR#7118743) should be reverted, or fixed so that it requires no synchronization or CAS operation.
Somehow this situation reminds me of the colossal mistake of making StringBuffer, Vector and Hashtable synchronized in JDK 1.1/1.2. People paid dearly for years for that mistake. No need to repeat that experience.
Indeed, the performance of the HashMap constructor is of such high importance that Josh Bloch (the creator of the java collections framework) back in the day even put in an optimization, in response to customer feedback, to avoid the multiplication loop for the common case, which is the no-arg HashMap() constructor:
java6:

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }
I just checked the source and it appears that this optimization has ALSO somehow got lost on the way from java6 to java7. In Java7 I find that the no-arg constructor now runs the while (capacity < initialCapacity) loop. In other words, CR#7118743 not only adds CAS and other unnecessary random-related computation overhead that hurts a common usage pattern, it also removes the deliberate fast path put in by Josh Bloch back then, which makes this an even bigger performance regression!
java7:

    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
capacity <<= 1;
this.loadFactor = loadFactor;
threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
table = new Entry[capacity];
useAltHashing = sun.misc.VM.isBooted() &&
(capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
init();
}
Wolfgang