RFR (M): 8077144: Concurrent mark initialization takes too long (original) (raw)

Kim Barrett kim.barrett at oracle.com
Sat Mar 26 01:38:16 UTC 2016


On Mar 15, 2016, at 6:12 PM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:

Hi Mikael, updated webrev at http://cr.openjdk.java.net/~tschatzl/8077144/webrev.3/ (full) http://cr.openjdk.java.net/~tschatzl/8077144/webrev.2to3/ (diff) which implements the suggested changes.

I've run out of steam for looking at g1ConcurrentMark.cpp changes for now. I may have more comments there later. I like the new approach though.


src/share/vm/gc/g1/g1ConcurrentMark.cpp 51 #include "logging/logTag.hpp"

Why add this #include? log.hpp appears to be the "include all of logging" header, and it's already included. And I didn't find anything specific to log tags in the changes.


src/share/vm/gc/g1/g1ConcurrentMark.cpp
1142 inline void set_card_bitmap_range(BitMap* bm,

Is this really necessary? I suspect set_range with a small_range hint would get much of the benefit (especially with the _last_marked_bit_idx optimization in the caller), with a lot less code. E.g.

bm.set_range(beg, end, BitMap::small_range);

And that's ignoring the question of where this should actually cross over between looping and calling set_range; do you have measurements to show that 8 is actually a good crossing point?

Just using set_range would also make this assert superfluous: 1148 assert(end_idx <= bm->size(), "sanity");


src/share/vm/gc/g1/g1ConcurrentMark.cpp
1151 // use par_at_put_range (if parallel). The range is made up of the

I see no parallelism here. Maybe this comment has some left over cut and paste?


src/share/vm/gc/g1/g1ConcurrentMark.cpp 1199 BitMap::idx_t card_num = (BitMap::idx_t)(uintptr_t(addr) >> CardTableModRefBS::card_shift);

I would like the cast to BitMap::idx_t removed; I don't think it serves any useful purpose because idx_t is size_t is big enough. And if it did perform a narrowing, I'd rather not have that hidden if/when we someday turn on -Wconversion or the like.

Personally, I'd prefer an explict reinterpret_cast for the uintptr_t conversion, but I realize using C++ cast operators is still kind of an unusual thing in this code base.

1286 (BitMap::idx_t)(uintptr_t(G1CollectedHeap::heap()->reserved_region().start()) >> CardTableModRefBS::card_shift);

Same thing here.


src/share/vm/gc/g1/g1ConcurrentMark.cpp 1260 int obj_sz = obj->size(); ... 1268 marked_bytes += (size_t)obj_sz * HeapWordSize;

[Pre-existing, feel free to ignore.]

obj_sz could be declared size_t, since obj->size() is not going to return a negative value, and then the cast could be removed.


src/share/vm/gc/g1/g1ConcurrentMark.cpp 1206 void set_bit_for_region(HeapRegion* hr) { 1207 BitMap::idx_t index = (BitMap::idx_t) hr->hrm_index(); 1208 _region_bm->par_at_put(index, true); 1209 }

This is the only use of _region_bm in the G1LiveDataHelper class. Of the three places that make one of these, two provide a region bitmap and one passes in NULL, meaning for that one it can't actually call this function, which suggests this function shouldn't really be part of the helper class. (It could be a static member taking a bitmap argument, or a similar file scoped helper function.)

[Pre-existing] The explicit cast of hrm_index to BitMap::idx_t serves no useful purpose, and if it were a narrowing conversion the cast would just mask that if we ever turn on -Wconversion or the like.


src/share/vm/gc/g1/g1ConcurrentMark.cpp
1281 //assert(region_bm != NULL, ""); 1282 assert(card_bm != NULL, "");

Left-over debugging code? Later: Oh, the commented out assert is related to the above discussion about set_bit_for_region.


src/share/vm/gc/g1/g1ConcurrentMark.cpp 688 void G1ConcurrentMark::cleanup_for_next_mark() { ... 699 clear_bitmap(_nextMarkBitMap, _parallel_workers, true); 700 701 // Clear the live count data. If the marking has been aborted, the abort() 702 // call already did that. 703 if (!has_aborted()) { 704 clear_all_live_data(_parallel_workers); 705 DEBUG_ONLY(verify_all_live_data()); 706 }

[Pre-existing] abort() also clears the _nextMarkBitMap, but appears to do so in STW. Seems like one of those is superfluous. And wouldn't it be better to not do any of these in (STW) abort() but rather do all of them here?


src/share/vm/gc/g1/g1ConcurrentMark.cpp
2496 _bitmap->clear_range(start, end);

If it weren't for the risk of tripping the (IMO bogus) assert on minimum size in BitMap::clear_large_range, I would suggest calling clear_range with a large_range hint.


src/share/vm/gc/g1/g1ConcurrentMark.cpp 2475 class G1ClearAllLiveDataTask : public AbstractGangTask { ... 2494 BitMap::idx_t start = M * BitsPerByte * to_process; 2495 BitMap::idx_t end = MIN2(start + M * BitsPerByte, _bitmap->size()); ... 2501 void G1ConcurrentMark::clear_all_live_data(WorkGang* workers) { ... 2506 size_t const num_chunks = align_size_up(_card_live_bm.size_in_words() * HeapWordSize, M) / M;

These two snippets implicitly share knowledge that the chunking size for tasks is 1Mbyte. One uses "chunk" and the other uses "task" terminology, which further obscures this relationship. Consistent terminology and a static constant in G1ClearAllLiveDataTask that is used in both places would help.


src/share/vm/gc/g1/g1ConcurrentMark.cpp 1130 class G1LiveDataHelper VALUE_OBJ_CLASS_SPEC { ... 1142 inline void set_card_bitmap_range(BitMap* bm,

This uses non-parallel set_bit and set_range. Parallel callers are operating on distinct heap regions. If that isn't guaranteed, then two parallel tasks could try to modify bits in the same BitMap word simultaneously. We're OK so long as G1HeapRegionSize is a multiple of

BitsPerWord * (1 << CardTableModRefBs::card_shift)

(which is 2^15 with 512 byte cards on a 64bit machine), but G1HeapRegionSize is a product option with a relatively unconstrained value (only bounded between 1M and 32M, no alignment constraint).

Oh, but, it looks like HeapRegion::setup_heap_region_size ensures the region size is a power of 2 within the min/max bounds, rounding up an arbitrary command line value if necessary.

So we're OK, though it required some searching to convince myself of that. An assertion somewhere in G1LiveDataHelper might be appropriate, since it doesn't ever use par_xxx bitmap functions, unlike the old code, which selected between them based on an argument that no longer exists.


src/share/vm/gc/g1/g1ConcurrentMark.hpp 307 // A set bit indicates that the given card contains a live object.

"... contains a live object." is perhaps imprecise. The card might contain multiple objects, and might contain parts of one or two objects that cross either card boundary. Something like "... contains at least part of at least one live object." would be more precise. But maybe I'm being overly pedantic.


src/share/vm/gc/g1/g1ConcurrentMark.hpp 266 class G1ConcurrentMark: public CHeapObj { ... 280 protected:

[Pre-existing, feel free to ignore.] Why is all this stuff protected rather than private?


src/share/vm/gc/g1/g1ConcurrentMark.hpp 855 // Precondition: obj is below region's NTAMS.

There is no longer an argument called "region"; it's now the region containing obj. (Actually, it always ways, but there used to also be a precondition that obj was in region.)


src/share/vm/gc/g1/g1OopClosures.inline.hpp 134 _cm->grayRoot(obj, hr);

With the removal of _worker_i as an argument here, the _worker_i member of G1RootRegionScanClosure appears to be unused.


src/share/vm/utilities/bitMap.cpp 72 os::pretouch_memory((char*)word_addr(0), (char*)word_addr(size()));

Casts to char* are no longer needed, since fix for JDK-8151414.

Also, copyright needs updating.


src/share/vm/utilities/bitMap.hpp 147 static idx_t size_in_words(size_t size_in_bits) {

I find the name of this function confusing in conjunction with the no-arg ordinary member function; I'm not keen on overloads with with very different semantics and usage. Clearer (to me) here might be something like calc_size_in_words.

Also, copyright needs updating.




More information about the hotspot-gc-dev mailing list