I have encountered a large discrepancy in the performance of Arrays.fill(int[], int) vs. Arrays.fill(Object[], Object). From JMH:
Benchmark           Mode  Samples      Mean  Mean error  Units
fillIntArray        avgt       10   802.393      19.323  ns/op
fillReferenceArray  avgt       10  5323.516     105.982  ns/op
The array size is 8192, which means the int array is being filled at more than 10 slots per nanosecond (on a 2.66 GHz Intel Core i7); that sounds like fantastic performance.
My question is, what is stopping HotSpot from applying the same optimization to a reference array?
One guess was the maintenance of the card table, but I couldn't find enough information to confirm it. A naive view of the optimization opportunities suggests that the card table could be updated wholesale, either before or after writing out the array (a sketch of what I mean follows the assembly listings below).
Printing assembly code, all I can see in the fill(Object[], Object) case is the explicit loop involving the write barrier on each slot write:
lea eax, [edx+ebx*4+0x10]   ; compute the slot address (base + 0x10 header + index*4)
mov [eax], edx              ; store the reference into the slot
shr eax, 9                  ; slot address >> 9 = card index (512-byte cards)
mov [edi+eax], ah           ; mark that card; edi looks like the card-table base
whereas for fill(int[], int) I see an opaque
mov edx, 0x0000000011145ca0
call edx
for the whole filling operation.
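To make the wholesale-update idea concrete, here is a minimal plain-Java sketch. This is emphatically not HotSpot code: the class name and both constants are my own assumptions, with the 4-byte reference size and 512-byte card size read off the *4 and shr 9 in the listing above.

class CardMarkSketch {
    static final int REF_SIZE   = 4;   // assumption: compressed 4-byte references
    static final int CARD_SHIFT = 9;   // assumption: 512-byte cards

    final Object[] slots;
    final byte[] cardTable;

    CardMarkSketch(int size) {
        slots = new Object[size];
        cardTable = new byte[((size * REF_SIZE) >> CARD_SHIFT) + 1];
    }

    // What the emitted loop effectively does: store, then dirty the card, per slot.
    void fillWithPerSlotBarrier(Object v) {
        for (int i = 0; i < slots.length; i++) {
            slots[i] = v;
            cardTable[(i * REF_SIZE) >> CARD_SHIFT] = 1;  // card mark on every write
        }
    }

    // The "wholesale" alternative: a plain (vectorizable) fill, then one pass
    // dirtying the whole card range the array covers.
    void fillWithBulkCardUpdate(Object v) {
        for (int i = 0; i < slots.length; i++) {
            slots[i] = v;
        }
        int lastCard = ((slots.length - 1) * REF_SIZE) >> CARD_SHIFT;
        for (int card = 0; card <= lastCard; card++) {
            cardTable[card] = 1;
        }
    }
}

The first method mirrors the store-plus-card-mark pattern in the listing; the second is the fill-then-mark-range variant I would naively expect to be possible.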
I was testing on the 64-Bit Server VM (build 24.0-b56, mixed mode) with default settings.
For reference, this is the code I have used:
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 2)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class Writing
{
  static final int TARGET_SIZE = 1 << 13;

  static final int[] intArray = new int[TARGET_SIZE];
  static final Object[] referenceArray = new Object[TARGET_SIZE];

  int intVal = 1;

  @GenerateMicroBenchmark
  public void fillIntArray() {
    Arrays.fill(intArray, intVal++);
  }

  @GenerateMicroBenchmark
  public void fillReferenceArray() {
    Arrays.fill(referenceArray, new Object());
  }
}
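If it helps, here is a hypothetical extra method for the same class (not part of the measured run above): an explicit loop over the reference array, to check whether a hand-written fill is treated any differently from the Arrays.fill(Object[], Object) call.

  // Hypothetical addition: hand-rolled fill over the reference array.
  @GenerateMicroBenchmark
  public void fillReferenceArrayManualLoop() {
    Object value = new Object();
    Object[] array = referenceArray;
    for (int i = 0; i < array.length; i++) {
      array[i] = value;
    }
  }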