Performance bug in and XXS patch for Parallel Old GC scanning large object arrays (original) (raw)
Reingruber, Richard richard.reingruber at sap.com
Wed Mar 2 12:26:44 UTC 2016
- Previous message (by thread): 8149642: gc/g1/TestShrinkAuxiliaryData* tests fail with "GC triggered before VM initialization completed"
- Next message (by thread): Performance bug in and XXS patch for Parallel Old GC scanning large object arrays
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
I would like to report a performance bug in the parallel old collector handling large object arrays and contribute an XXS patch for the bug that improves gc pauses by 5x in a microbenchmark. The problem is that scanning of large object arrays is not really parallelized.
When marking through large object arrays, the elements are scanned in chunks of ObjArrayMarkingStride. This is done in
ObjArrayKlass::objarray_follow_contents(ParCompactionManager* cm, oop obj, int index)
In particular, the partially scanned array is pushed on the _objarray_stack only after the elements are marked and pushed on the marking stack. It would help parallelizing the work, if the array was made available to other threads (by pushing it on _objarray_stack) before the current thread visits its stride of elements.
diff -r 5c4f8192021e src/share/vm/gc/parallel/psCompactionManager.inline.hpp --- a/src/share/vm/gc/parallel/psCompactionManager.inline.hpp Tue Mar 01 09:42:19 2016 +0100 +++ b/src/share/vm/gc/parallel/psCompactionManager.inline.hpp Wed Mar 02 12:17:07 2016 +0100 @@ -125,14 +125,14 @@ T* const beg = base + beg_index; T* const end = base + end_index; + if (end_index < len) { + cm->push_objarray(obj, end_index); // Push the continuation. + } + // Push the non-NULL elements of the next stride on the marking stack. for (T* e = beg; e < end; e++) { cm->mark_and_push(e); }
- if (end_index < len) {
- cm->push_objarray(obj, end_index); // Push the continuation.
- } }
inline void ParCompactionManager::follow_contents(objArrayOop obj, int index) {
Below you can find a microbenchmark that demonstrates the performance bug and the output of 2 runs without and with the patch on a linux x86_64 box with 32 hw threads and 16 cores.
I'd suggest as well to reduce ObjArrayMarkingStride (512 by dflt) for the parallel old gc for better parallelization. G1 and the parallel scavenger for example use ParGCArrayScanChunk of 50). Both push the remainder before iterating the array.
Regards, Richard.
-- Richard Reingruber | SAP JVM | CORE PLATFORM, SAP SE Pflichtangaben/Mandatory Disclosure Statements: http://www.sap.com/company/legal/impressum.epx
Benchmark Run 1 without the path
$ ./jdk9/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4 openjdk version "9-internal" OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt) OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)
Building large array with 100000 elements of small trees with depth 4 Heap stats: free:7769 MB (99.0 %) max:7851 MB done Heap stats: free:6417 MB (81.7 %) max:7851 MB Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 8759 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1856 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 3657 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 4280 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1823 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 2439 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1803 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1675 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1889 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1867 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1854 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1771 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1792 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1681 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1844 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1807 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1832 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1835 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1953 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1846 ms d038402 at ld3357 [/net/usr.work/d038402/tmp] 17:55:04
Benchmark Run 2 without the path
$ ./jdk9_mod/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4 openjdk version "9-internal" OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt) OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)
Building large array with 100000 elements of small trees with depth 4 Heap stats: free:7769 MB (99.0 %) max:7851 MB done Heap stats: free:6376 MB (81.2 %) max:7851 MB Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 3684 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 1073 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 425 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 369 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 436 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 334 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 352 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 420 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 391 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 393 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 356 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 352 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 348 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 354 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 340 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 370 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 363 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 370 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 355 ms Calling System.gc() Heap stats: free:6467 MB (82.4 %) max:7851 MB Full GC duration: 348 ms
Benchmark source code
package test.gc;
public class TestLargeObjectArray {
private static final long MB = 1 << 20;
private int arraySize;
private int treeDepth;
private int iterations;
private LittleTree[] arrayOfSmallTrees;
public static void main(String[] args) {
TestLargeObjectArray test = new TestLargeObjectArray();
test.parseCmdLine(args);
test.buildLargeArrayOfSmallTrees();
test.run();
}
private void run() {
while (iterations-- > 0) {
long start = System.currentTimeMillis();
message("Calling System.gc()");
System.gc();
long end = System.currentTimeMillis();
heapStats();
message("Full GC duration: " + (end - start) + " ms");
}
}
public void buildLargeArrayOfSmallTrees() {
try {
message("Building large array with " + arraySize + " elements of small trees with depth " + treeDepth);
heapStats();
arrayOfSmallTrees = new LittleTree[arraySize];
for (int i = 0; i < arraySize; i++) {
arrayOfSmallTrees[i] = LittleTree.build(treeDepth);
}
message("done");
heapStats();
} catch (OutOfMemoryError oom) {
arrayOfSmallTrees = null;
message("OutOfMemory: please reduce array size or tree depth");
System.exit(1);
}
}
private void heapStats() {
long free = Runtime.getRuntime().freeMemory();
long max = Runtime.getRuntime().maxMemory();
float freePercentage = round((float) free / (float) max * 100);
message("Heap stats: free:" + free / MB + " MB (" + freePercentage + " %)" + " max:" + max / MB + " MB");
}
private float round(float f) {
return Math.round(f * 10) / 10f;
}
private void parseCmdLine(String[] args) {
if (args.length != 3) {
message();
message("usage: java " + TestLargeObjectArray.class.getName()
+ " <test iterations, e.g. 20> <large array size, e.g. 50000> <small tree depth, e.g. 4>");
message();
System.exit(1);
}
int i = 0;
iterations = Integer.parseInt(args[i++]);
arraySize = Integer.parseInt(args[i++]);
treeDepth = Integer.parseInt(args[i++]);
}
private void message(String msg) {
System.out.println(msg);
}
private void message() {
System.out.println();
}
}
class LittleTree {
private static final int ARITY = 6;
@SuppressWarnings("unused")
private LittleTree[] subtrees;
public LittleTree(LittleTree[] trees) {
this.subtrees = trees;
}
public static LittleTree build(int treeDepth) {
if (treeDepth == 0) {
return null;
}
// recursion
LittleTree[] subtrees = new LittleTree[ARITY];
for (int i = 0; i < ARITY; i++) {
subtrees[i] = build(treeDepth - 1);
}
return new LittleTree(subtrees);
}
}
- Previous message (by thread): 8149642: gc/g1/TestShrinkAuxiliaryData* tests fail with "GC triggered before VM initialization completed"
- Next message (by thread): Performance bug in and XXS patch for Parallel Old GC scanning large object arrays
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]