RFR: [6904367]: (coll) IdentityHashMap is resized before exceeding the expected maximum size
Peter Levart peter.levart at gmail.com
Sat Jul 12 15:47:34 UTC 2014
On 07/11/2014 09:08 PM, Doug Lea wrote:
> I've been content to just observe Martin and Peter's nice efforts on
> this, but one note:
>
> On 07/11/2014 03:00 PM, Martin Buchholz wrote:
>> On Wed, Jul 9, 2014 at 3:17 PM, Peter Levart <peter.levart at gmail.com> wrote:
>>> IMHO resizing is arranged so that the table is always 33% ... 66%
>>> full (if nothing is ever removed from it) except when capacity
>>> reaches 2**29, then it can be filled up to the top.
>>>
>>> avg. # of probes can be taken as a rough estimation of average
>>> slow-down, max. # of probes can be taken as a rough estimation of
>>> worst-case-operation slow-down. So where to draw the line?
>
> Linear probing has long been known to be prone to long worst-case
> probes, but with recent computer architectures this is offset by its
> extreme cache friendliness.
>
> Bear in mind that the number of bits of identityHashCode is less than
> 32 on all JVMs I know. It can be as low as 23, which means that you
> are sure to see a lot of exact collisions on IHMs with only tens of
> millions of elements, and there's nothing much you can do that will
> help.
>
> -Doug
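(A quick aside to Doug's point: a throwaway sketch like the following can show how wide identityHashCode actually is on a given JVM. The class name and the 1M sample size are arbitrary; this is just an illustration, not part of the patch:)

import java.util.HashSet;
import java.util.Set;

public class IdentityHashBits {
    public static void main(String[] args) {
        int orOfHashes = 0;
        Set<Integer> distinct = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            int h = System.identityHashCode(new Object());
            orOfHashes |= h;
            distinct.add(h);
        }
        // the highest bit ever set bounds the hash width from above;
        // duplicates among only 1M objects hint at a narrow hash space
        System.out.println("highest hash bit seen: "
                + (31 - Integer.numberOfLeadingZeros(orOfHashes)));
        System.out.println("distinct hashes:       "
                + distinct.size() + " / 1000000");
    }
}

If the hash really is as narrow as 23 bits, duplicates are already bound to show up among a million objects.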
We have max. size = 2^29 - 1. This cannot change because of serialization backwards compatibility. It could be a little less, but not much. Experiment shows that it is possible to fill an IHM up to 99% without too much CPU effort. If such an IHM is serialized, we must be able to deserialize it...
But why do we have to limit the table length to 2^30? Because it has to be a power of 2, and 2^31 is already a negative int... What would happen if the max. table length were 2^30 + 2^29 ...
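To make the trade-off concrete -- this is only an illustrative sketch with made-up names, not the actual IdentityHashMap code nor the webrev below -- a power-of-two length lets the hash be reduced to a table index with a single mask, while a length of (1 << 30) + (1 << 29) needs something more expensive, such as an unsigned remainder or special-cased folding:

class TableIndexSketch {
    // power-of-two table length: reduce the hash with a single cheap mask
    static int indexPow2(int h, int len) {
        return h & (len - 1);
    }

    // len = (1 << 30) + (1 << 29) is not a power of two: fall back to an
    // unsigned remainder (or some special-cased folding), which costs a
    // few extra cycles on every probe sequence start
    static int indexNonPow2(int h, int len) {
        return (int) ((h & 0xFFFFFFFFL) % len);
    }

    public static void main(String[] args) {
        int h = System.identityHashCode(new Object());
        System.out.println(indexPow2(h, 1 << 30));
        System.out.println(indexNonPow2(h, (1 << 30) + (1 << 29)));
    }
}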
Here's what we get now:
// max. size = 2^29 - 1
// table length = 2^30
 10% max. size, probes:  0.1 avg.,      9 max.
 20% max. size, probes:  0.1 avg.,     15 max.
 30% max. size, probes:  0.2 avg.,     25 max.
 40% max. size, probes:  0.3 avg.,     38 max.
 50% max. size, probes:  0.5 avg.,     64 max.
 60% max. size, probes:  0.7 avg.,     93 max.
 65% max. size, probes:  0.9 avg.,    147 max.
 70% max. size, probes:  1.2 avg.,    224 max.
 75% max. size, probes:  1.5 avg.,    354 max.
 80% max. size, probes:  2.0 avg.,    441 max.
 85% max. size, probes:  2.8 avg.,    789 max.
 90% max. size, probes:  4.5 avg.,   1869 max.
 91% max. size, probes:  5.1 avg.,   2377 max.
 92% max. size, probes:  5.7 avg.,   3409 max.
 93% max. size, probes:  6.6 avg.,   3804 max.
 94% max. size, probes:  7.8 avg.,   5824 max.
 95% max. size, probes:  9.5 avg.,   7021 max.
 96% max. size, probes: 12.0 avg.,  12607 max.
 97% max. size, probes: 16.2 avg.,  17643 max.
 98% max. size, probes: 24.5 avg.,  34561 max.
 99% max. size, probes: 49.6 avg., 131371 max.
100% ... haven't waited long enough...
If the table length were 2^30 + 2^29, we would only ever fill it up to ~66%, as usual, and there would be no slow-down:
// max. size = 2^29 - 1
// table length = 2^30 + 2^29
 10% max. size, probes: 0.0 avg.,   7 max.
 20% max. size, probes: 0.1 avg.,  11 max.
 30% max. size, probes: 0.1 avg.,  13 max.
 40% max. size, probes: 0.2 avg.,  20 max.
 50% max. size, probes: 0.3 avg.,  28 max.
 60% max. size, probes: 0.4 avg.,  40 max.
 65% max. size, probes: 0.4 avg.,  42 max.
 70% max. size, probes: 0.5 avg.,  52 max.
 75% max. size, probes: 0.5 avg.,  65 max.
 80% max. size, probes: 0.6 avg.,  87 max.
 85% max. size, probes: 0.7 avg.,  87 max.
 90% max. size, probes: 0.8 avg., 128 max.
 91% max. size, probes: 0.8 avg., 128 max.
 92% max. size, probes: 0.8 avg., 128 max.
 93% max. size, probes: 0.8 avg., 129 max.
 94% max. size, probes: 0.9 avg., 129 max.
 95% max. size, probes: 0.9 avg., 131 max.
 96% max. size, probes: 0.9 avg., 150 max.
 97% max. size, probes: 0.9 avg., 150 max.
 98% max. size, probes: 1.0 avg., 159 max.
 99% max. size, probes: 1.0 avg., 159 max.
100% max. size, probes: 1.0 avg., 159 max.
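Numbers of this shape can be reproduced with a small stand-alone simulation of linear probing. This is a reconstruction of the kind of measurement, not the actual test program; the scaled-down table length and the 90% fill ratio are just example parameters:

import java.util.Random;

public class ProbeStats {
    public static void main(String[] args) {
        int len = 1 << 20;             // stand-in for the real 2^30 table, scaled down
        int size = (int) (len * 0.90); // fill the table to 90%
        boolean[] occupied = new boolean[len];
        Random rnd = new Random(42);

        long totalProbes = 0;
        int maxProbes = 0;
        for (int n = 0; n < size; n++) {
            int i = rnd.nextInt(len);  // stand-in for the hashed key index
            int probes = 0;
            while (occupied[i]) {      // linear probing: step to the next slot
                i = (i + 1) & (len - 1);
                probes++;
            }
            occupied[i] = true;
            totalProbes += probes;
            maxProbes = Math.max(maxProbes, probes);
        }
        System.out.printf("avg probes: %.1f, max probes: %d%n",
                (double) totalProbes / size, maxProbes);
    }
}

Re-running it at increasing fill ratios shows the same blow-up of both the average and the maximum probe count as the table approaches 100%.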
If we're willing to pay the price of special-casing the non-power-of-2 MAX_CAPACITY = (1 << 29) + (1 << 28), which costs approximately 4% of throughput:
Original code:
Benchmark              Mode  Samples         Score  Score error  Units
j.t.IHMBench.ihm0Put  thrpt       30  25217087.593    50117.867  ops/s
j.t.IHMBench.ihm1Get  thrpt       30  43017677.457   230902.599  ops/s
Patched code:
Benchmark              Mode  Samples         Score  Score error  Units
j.t.IHMBench.ihm0Put  thrpt       30  24213091.899   122980.408  ops/s
j.t.IHMBench.ihm1Get  thrpt       30  40754537.027   936380.022  ops/s
Using this JMH benchmark:
import java.util.IdentityHashMap;
import java.util.Map;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class IHMBench {

    // 1M distinct identity keys, shared by both benchmark methods
    static final Object[] keys = new Object[1_000_000];
    static {
        for (int i = 0; i < keys.length; i++) {
            keys[i] = new Object();
        }
    }

    static final Object value = new Object();

    // map under test, pre-sized to the expected number of keys
    final Map<Object, Object> map = new IdentityHashMap<Object, Object>(keys.length);

    int i = 0;

    @Benchmark
    public void ihm0Put(Blackhole bh) {
        bh.consume(map.put(keys[i], value));
        int j = i + 1;
        i = j < keys.length ? j : 0;
    }

    @Benchmark
    public void ihm1Get(Blackhole bh) {
        bh.consume(map.get(keys[i]));
        int j = i + 1;
        i = j < keys.length ? j : 0;
    }
}
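(For completeness: the class above is run through the standard JMH harness, e.g. built against jmh-core with the jmh-generator-annprocess annotation processor and launched as "java -jar target/benchmarks.jar IHMBench"; the fork/iteration settings are omitted here.)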
Then here's a possible solution:
http://cr.openjdk.java.net/~plevart/jdk9-dev/IdentityHashMap/webrev.06/
Is it worth it?
Regards, Peter