I have encountered a large discrepancy in the performance of Arrays.fill(int[], int) vs. Arrays.fill(Object[], Object). From JMH:
Benchmark           Mode  Samples      Mean  Mean error  Units
fillIntArray        avgt       10   802.393      19.323  ns/op
fillReferenceArray  avgt       10  5323.516     105.982  ns/op
The array size is 8192, which means the int array is being filled at more than 10 slots per nanosecond (on a 2.66 GHz Intel Core i7); that sounds like fantastic performance.
My question is, what is stopping HotSpot from applying the same optimization to a reference array?
One guess was the maintenance of the card table, but I couldn't find enough information to confirm it. A naive view of the optimization opportunities suggests that the card table could be updated wholesale, either before or after writing out the array (a sketch of what I mean follows the assembly listings below).
Printing assembly code, all I can see in the fill(Object[], Object) case is the explicit loop involving the write barrier on each slot write:
lea eax, [edx+ebx*4+0x10]   ; compute the slot address (base + 0x10 header + index*4)
mov [eax], edx              ; store the reference into the slot
shr eax, 9                  ; slot address >> 9 = card index (512-byte cards)
mov [edi+eax], ah           ; mark that card; edi looks like the card-table base
whereas for fill(int[], int) I see an opaque
mov edx, 0x0000000011145ca0
call edx
for the whole filling operation.
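To make the wholesale-update idea concrete, here is a minimal plain-Java sketch. This is emphatically not HotSpot code: the class name and both constants are my own assumptions, with the 4-byte reference size and 512-byte card size read off the *4 and shr 9 in the listing above.

class CardMarkSketch {
    static final int REF_SIZE   = 4;   // assumption: compressed 4-byte references
    static final int CARD_SHIFT = 9;   // assumption: 512-byte cards

    final Object[] slots;
    final byte[] cardTable;

    CardMarkSketch(int size) {
        slots = new Object[size];
        cardTable = new byte[((size * REF_SIZE) >> CARD_SHIFT) + 1];
    }

    // What the emitted loop effectively does: store, then dirty the card, per slot.
    void fillWithPerSlotBarrier(Object v) {
        for (int i = 0; i < slots.length; i++) {
            slots[i] = v;
            cardTable[(i * REF_SIZE) >> CARD_SHIFT] = 1;  // card mark on every write
        }
    }

    // The "wholesale" alternative: a plain (vectorizable) fill, then one pass
    // dirtying the whole card range the array covers.
    void fillWithBulkCardUpdate(Object v) {
        for (int i = 0; i < slots.length; i++) {
            slots[i] = v;
        }
        int lastCard = ((slots.length - 1) * REF_SIZE) >> CARD_SHIFT;
        for (int card = 0; card <= lastCard; card++) {
            cardTable[card] = 1;
        }
    }
}

The first method mirrors the store-plus-card-mark pattern in the listing; the second is the fill-then-mark-range variant I would naively expect to be possible.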
I was testing on the 64-Bit Server VM (build 24.0-b56, mixed mode) with default settings.
For reference, this is the code I have used:
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 2)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class Writing
{
  static final int TARGET_SIZE = 1 << 13;

  static final int[] intArray = new int[TARGET_SIZE];
  static final Object[] referenceArray = new Object[TARGET_SIZE];

  int intVal = 1;

  @GenerateMicroBenchmark
  public void fillIntArray() {
    Arrays.fill(intArray, intVal++);
  }

  @GenerateMicroBenchmark
  public void fillReferenceArray() {
    Arrays.fill(referenceArray, new Object());
  }
}
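If it helps, here is a hypothetical extra method for the same class (not part of the measured run above): an explicit loop over the reference array, to check whether a hand-written fill is treated any differently from the Arrays.fill(Object[], Object) call.

  // Hypothetical addition: hand-rolled fill over the reference array.
  @GenerateMicroBenchmark
  public void fillReferenceArrayManualLoop() {
    Object value = new Object();
    Object[] array = referenceArray;
    for (int i = 0; i < array.length; i++) {
      array[i] = value;
    }
  }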