RFR: 8200557: OopStorage parallel iteration scales poorly (original) (raw)

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Wed Apr 25 21:30:42 UTC 2018


http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.udiff.html

+OopStorage::BasicParState::IterationData::IterationData() :

Can this constructor be inlined in the declaration?  This seems out of place here (and I had to hunt for it).

Can some other simple accessors be inlined in the declarations as well?  That would decrease jumping around trying to understand things.

I think the OopStorage::BasicParState functions should be in a new .cpp file since there's now *src/hotspot/share/gc/shared/oopStorageParState.hpp *Can you do this as a follow up/cleanup RFE?

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.udiff.html

class Block;                  // Forward decl; defined in .inline.hpp file.

Can you give a short description of these, ie:

class Block;                  // Block of oop* stored in OopStorage

Or maybe a short description over the class declarations?  The trouble with these generic names is you forget why they exist.

I'd call BlockList a AllocationBlockList and BlockArray an ActiveBlockArray

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.inline.hpp.frames.html

38 class OopStorage::BlockArray {

This would be a good place to comment the purpose of this and a line about refcounting.

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.frames.html

Why does BlockList need to be included in the .hpp file?

http://cr.openjdk.java.net/~kbarrett/8200557/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.frames.html

All these places have block_count() which are indistinguishable. It looks like _block_count for the BlockArray should always be acquired outside of BlockArray class or rather give that as the only public option to prevent mistakes.  It seems like there wouldn't be much performance cost to calling load_acquire once at the safepoint iteration.

Comment about why refcounts and what do they do here would be nice.

151 void OopStorage::BlockArray::increment_refcount() const {

567 void OopStorage::relinquish_block_array(BlockArray* array) const { 568 if (array->decrement_refcount()) { 569 assert(array != _active_array, "invariant"); 570 BlockArray::destroy(array); 571 } 572 }

Here array is a pointer to the active array so iterators could destroy the array if they are the last ones to use it.  Is this right?

I've read through most of the rest of the code and it looks good apart from how complicated it is, which I don't know how to suggest to improve it other than some reminder comments and maybe more descriptive names.

thanks, Coleen

On 4/19/18 4:18 AM, Kim Barrett wrote:

Please review this change to OopStorage parallel iteration to improve the scaling with additional threads.

Two sources of poor scaling were found: (1) contention when claiming blocks, and (2) each worker thread ended up touching the majority of the blocks, even those not processed by that thread. To address this, we changed the representation of the sequence of all blocks. Rather than being a doubly-linked intrusive list linked through the blocks, it is now an array of pointers to blocks. We use a combination of refcounts and an RCU-inspired mechanism to safely manage the array storage when it needs to grow, avoiding the need to lock access to the array while performing concurrent iteration. The use of an array for the sequence of all blocks permits parallel iteration to claim ranges of indices using Atomic::add, which can be more efficient on some platforms than using cmpxchg loops. It also allows a worker thread to only touch exactly those blocks it is going to process, rather than walking a list of blocks. The only complicating factor is that we have to account for possible overshoot in a claim attempt. Blocks know their position in the array, to facilitate empty block deletion (an empty block might be anywhere in the active array, and we don't want to have to search for it). This also helps with allocationstatus, eliminating the verification search that was needed with the list representation. allocationstatus is now constant-time, which directly benefits -Xcheck:jni. A new gtest-based performance demonstration is included. It's not really a test, in that it doesn't do any verification. Rather, it performs parallel iteration and reports total time, per-thread times, and per-thread percentage of blocks processed. This is done for a variety of thread counts, to show the parallel speedup and load balancing. Running on my dual 6 core Xeon, I'm seeing more or less linear speedup for up to 10 threads processing 1M OopStorage entries. CR: https://bugs.openjdk.java.net/browse/JDK-8200557 Webrev: http://cr.openjdk.java.net/~kbarrett/8200557/open.00/ Testing: jdk-tier{1-3}, hs-tier{1-5}, on all Oracle supported platforms



More information about the hotspot-dev mailing list