RFR: 8200557: OopStorage parallel iteration scales poorly

Kim Barrett kim.barrett at oracle.com
Thu Apr 19 08:18:35 UTC 2018


Please review this change to OopStorage parallel iteration to improve the scaling with additional threads.

Two sources of poor scaling were found: (1) contention when claiming blocks, and (2) each worker thread ended up touching the majority of the blocks, even those not processed by that thread.

To address this, we changed the representation of the sequence of all blocks. Rather than being a doubly-linked intrusive list linked through the blocks, it is now an array of pointers to blocks. We use a combination of refcounts and an RCU-inspired mechanism to safely manage the array storage when it needs to grow, avoiding the need to lock access to the array while performing concurrent iteration.
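As a rough illustration of the storage-management idea (not the code in the webrev), the sketch below keeps a reference count on each array and uses a small reader counter so that a single writer can tell when no iterator is still between "load the array pointer" and "take a reference". All names here (BlockArray, Storage, acquire, done, grow) are hypothetical, the reader counter is a much simpler stand-in for the RCU-inspired mechanism, and the sketch assumes one growing thread at a time and an array whose contents are not modified after publication.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Block {};  // Stand-in for a storage block (hypothetical).

// A fixed-capacity array of block pointers with a reference count.
struct BlockArray {
  std::atomic<int> refs{1};      // Starts with the owner's reference.
  const size_t capacity;
  std::vector<Block*> slots;     // Not modified after publication here.
  explicit BlockArray(size_t cap) : capacity(cap) { slots.reserve(cap); }
};

class Storage {
  std::atomic<BlockArray*> _active;
  std::atomic<int> _readers{0};  // Readers between pointer load and ref.

  static void release(BlockArray* a) {
    // Whoever drops the last reference frees the array.
    if (a->refs.fetch_sub(1) == 1) delete a;
  }

public:
  explicit Storage(size_t capacity) : _active(new BlockArray(capacity)) {}
  ~Storage() { release(_active.load()); }

  // Iterator side: obtain a counted reference without taking a lock.
  BlockArray* acquire() {
    _readers.fetch_add(1);             // Enter the short critical window.
    BlockArray* a = _active.load();
    a->refs.fetch_add(1);              // Safe: the writer waits for us.
    _readers.fetch_sub(1);             // Leave the window.
    return a;
  }

  static void done(BlockArray* a) { release(a); }

  // Writer side (assumed single writer): publish a larger array.
  void grow(size_t new_capacity) {
    BlockArray* old = _active.load();
    assert(new_capacity >= old->capacity);
    BlockArray* bigger = new BlockArray(new_capacity);
    bigger->slots.assign(old->slots.begin(), old->slots.end());
    _active.store(bigger);
    // Wait until no reader can still be holding an uncounted old pointer.
    while (_readers.load() != 0) { /* spin */ }
    release(old);                      // Drop the owner's reference.
  }
};
```

An iterator that called acquire() before a grow() keeps using the old array until it calls done(), at which point the old storage is freed. Unlike this sketch, a production mechanism would also need to keep the writer from waiting indefinitely when readers keep arriving.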

The use of an array for the sequence of all blocks permits parallel iteration to claim ranges of indices using Atomic::add, which can be more efficient on some platforms than using cmpxchg loops. It also allows a worker thread to touch only the blocks it is going to process, rather than walking a list of blocks. The only complicating factor is that we have to account for possible overshoot in a claim attempt: an add may return a start index that is already at or past the end of the array.
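To make the claiming scheme and the overshoot case concrete, here is a minimal sketch using std::atomic in place of HotSpot's Atomic::add. The RangeClaimer type, its field names, and the fixed step size are assumptions for illustration only; the webrev may choose the range size differently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical claim state shared by all worker threads.
struct RangeClaimer {
  std::atomic<size_t> next{0};  // Next unclaimed index; may pass limit.
  size_t limit;                 // Number of blocks in the array.
  size_t step;                  // Indices requested per claim attempt.

  RangeClaimer(size_t limit_, size_t step_) : limit(limit_), step(step_) {
    assert(step > 0);
  }

  // Try to claim [begin, end). Returns false when nothing is left.
  bool claim(size_t& begin, size_t& end) {
    // Cheap pre-check; it also bounds how far next can run past limit.
    if (next.load(std::memory_order_relaxed) >= limit) return false;
    // One atomic add claims a whole range; no compare-exchange retry loop.
    size_t start = next.fetch_add(step, std::memory_order_relaxed);
    // Overshoot: other threads claimed the tail between check and add.
    if (start >= limit) return false;
    begin = start;
    // Clamp a partial final range to the end of the array.
    end = start + std::min(step, limit - start);
    return true;
  }
};
```

Each worker would loop on claim() and process only the blocks at the indices it was given, so it never reads blocks that belong to another worker's range.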

Blocks know their position in the array, to facilitate empty block deletion (an empty block might be anywhere in the active array, and we don't want to have to search for it). This also helps with allocation_status, eliminating the verification search that was needed with the list representation. allocation_status is now constant-time, which directly benefits -Xcheck:jni.
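The following single-threaded sketch shows why a stored index makes both operations cheap. The names (Block::active_index, ActiveArray, contains, remove) are hypothetical, and filling the vacated slot with the last entry is an assumption about how removal could work, not a description of the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Block {
  size_t active_index = 0;  // This block's position in the array.
};

class ActiveArray {
  std::vector<Block*> _blocks;

public:
  size_t size() const { return _blocks.size(); }

  void push(Block* b) {
    b->active_index = _blocks.size();
    _blocks.push_back(b);
  }

  // Constant-time membership check: no search through the sequence.
  bool contains(const Block* b) const {
    size_t i = b->active_index;
    return i < _blocks.size() && _blocks[i] == b;
  }

  // Constant-time removal: move the last entry into the vacated slot.
  void remove(Block* b) {
    assert(contains(b));
    Block* last = _blocks.back();
    _blocks[b->active_index] = last;
    last->active_index = b->active_index;
    _blocks.pop_back();
  }
};
```

A query such as allocation_status can then validate a candidate block by comparing the array slot at the block's recorded index with the block itself, instead of walking every block.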

A new gtest-based performance demonstration is included. It's not really a test, in that it doesn't do any verification. Rather, it performs parallel iteration and reports total time, per-thread times, and per-thread percentage of blocks processed. This is done for a variety of thread counts, to show the parallel speedup and load balancing. Running on my dual 6-core Xeon, I'm seeing more or less linear speedup for up to 10 threads processing 1M OopStorage entries.

CR: https://bugs.openjdk.java.net/browse/JDK-8200557

Webrev: http://cr.openjdk.java.net/~kbarrett/8200557/open.00/

Testing: jdk-tier{1-3}, hs-tier{1-5}, on all Oracle supported platforms


