RFR: 8199417: Modularize interpreter GC barriers (original) (raw)

Erik Österlund erik.osterlund at oracle.com
Wed Apr 4 21:03:52 UTC 2018

Previous message: RFR: 8199417: Modularize interpreter GC barriers
Next message: RFR: 8199417: Modularize interpreter GC barriers
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Coleen,

On 2018-04-04 22:05, coleen.phillimore at oracle.com wrote:

Thank you for keeping loadheapoop in the macroAssembler, I like it and think it should be consistent with the rest of the platforms.

I'm glad you liked it.

http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/x86/templateTablex86.cpp.udiff.html

+ dooopstore(masm, Address(rdx, 0), rax, INHEAPARRAY); What happens if someone misses INHEAPARRAY and puts INHEAP instead?

Then we will miss performing precise card marking, which we expect should happen when either 1) storing into an array, or 2) store using unsafe, which in turn could be into an array.

http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/aarch64/templateTableaarch64.cpp.udiff.html

+ dooopstore(masm, elementaddress, noreg, INHEAP | INHEAPARRAY); Here it has both for aarch64.

do_oop_store in the templateTable code is always IN_HEAP. So that seems redundant.

I reviewed the x86 interpreter code, and will review the rest when/if people say that loadheapoop is good.

Okay, thanks. In that case, I will start to polish the SPARC code with similar wrappers meanwhile.

Thanks, /Erik

Thanks, Coleen

On 4/4/18 9:38 AM, Erik Österlund wrote: Hi Kim,

Thank you for reviewing this. I have made a new webrev with proposed changes to the x86/x64 code reflecting the concerns you and Coleen have. I thought we should settle for something we like in the x86/x64 code before going ahead and changing the other platforms too much (I don't want to bug Martin and Roman too much before). New full webrev: http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/ Incremental: http://cr.openjdk.java.net/~eosterlund/8199417/webrev.0001/ On 2018-04-01 05:12, Kim Barrett wrote: On Mar 26, 2018, at 5:31 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:

Hi, The GC barriers for the interpreter are not as modular as they could be. They currently use switch statements to check which GC barrier set is being used, and call this or that barrier based on that, in a way that assumes GCs only use write barriers. This patch modularizes this by generating accesses in the interpreter with declarative semantics. Accesses to the heap may now use storeat and loadat functions of the BarrierSetAssembler, passing along appropriate arguments and decorators. Each concrete BarrierSetAssembler can override the access completely or sprinkle some appropriate GC barriers as necessary. Big thanks go to Martin Doerr and Roman Kennke, who helped plugging this into S390, PPC and AArch64 respectively. Webrev: http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8199417 Thanks, /Erik I've only looked at the x86 and generic code so far. I have some comments, but also a bunch of questions. I think I'm missing some things somewhere. I'm sending what I've got so far anyway, and will continue studying. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssemblerx86.cpp 354 // flatten object address if needed 355 // We do it regardless of precise because we need the registers There is no "precise" in this context. I thought "precise" was referring to the concept of precise card marking, as opposed to the variable 'precise' that used to exist. I have reworded the comment to reflect that. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssemblerx86.cpp 364 #ifndef LP64 365 InterpreterMacroAssembler *imasm = staticcast<InterpreterMacroAssembler*>(masm); 366 #endif In the old code, the function received an InterpreterMacroAssembler*. What is there about this context that lets us know the downcast is safe here? Is oopstoreat only ever called with an IMA*? If so, then why isn't that the parameter type. If not, then what? Yes, this is indeed only used by the InterpreterMacroAssembler. But I wanted the interface to use MacroAssembler. Why? 1) It's only 32 bit x86 that relies on this property, and I hope it will go away soon, and the save bcp hack with it. 2) previous loadheapoop is used not only in the InterpreterMacroAssembler, and I wanted the same assembler in loadat and storeat. So for now it is only used in InterpreterMacroAssembler, and I doubt that will change any time soon. I am hoping 32 bit support will be dropped before that, and the hack will go away. For now, I have added a virtual isinterpretermasm() call that I use in an assert to make sure this is not accidentally used in the wrong context until 32 bit support is gone. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/templateInterpreterGeneratorx86.cpp 753 bs->loadat(masm, INHEAP | ONWEAKOOPREF, TOBJECT, 754 rax, fieldaddress, /tmp1/ rbx, /tmpthread/ rdx); This needs to distinguish between WEAK and PHANTOM references, else it will break ZGC's concurrent reference processing. (At least so long as we still have to deal with finalization.) It does. This is ONWEAKOOPREF. The get intrinsic is generated for WeakReference (and SoftReference) only. PhantomReference does not have a getter. Therefore, in all contexts this is intrinsified, it will refer to an ONWEAKOOPREF. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/templateInterpreterGeneratorx86.cpp 698 address TemplateInterpreterGenerator::generateReferencegetentry(void) { Almost all the code here (and much of the commentary) was/is specific to G1 (or SATB collectors). Shouldn't it all be in the barrierset's loadat? As it is now, if (say) we're using Parallel GC then I think there's a bunch of additional code being generated and executed here that wasn't present before. It is true that interpreter Reference.get() intrinsics with Parallel/CMS/Serial currently run a few instructions more than they need to pay for the abstraction. There is no doubt in my mind that this abstraction is worth the cost. Reference.get in the interpreter is not a hot path, and does not need to optimize every cycle. I changed the G1 comments to more general comments instead. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/methodHandlesx86.cpp 170 _ loadheapoop(methodtemp, Address(recv, NONZERO(javalanginvokeMethodHandle::formoffsetinbytes()))); => 175 bs->loadat(masm, INHEAP, TOBJECT, methodtemp, Address(recv, NONZERO(javalanginvokeMethodHandle::formoffsetinbytes())), t This doesn't look like an improvement in code readability to me. Why can't bs->loadat be the implementation of " loadheapoop" ?_ And the lines are getting really long; so long that webrev is cutting off the right end, so I can't review the new versions of the lines. Due to popular demand, I have done what you (and Coleen) both proposed: retained the loadheapoop abstraction. Before: * loadheapoop was the "raw" variant, similar to oopDesc::loaddecodeheapoop, which no longer exists. Now: * loadheapoop calls MacroAssembler::accessloadat * accessloadat grabs the active BarrierSetAssembler and calls loadat on it. ... same idea for stores. * What used to be the raw versions for oop loads and stores, has moved to BarrierSetAssembler::loadat/storeat. These accessors now optionally take temporary register and decorator parameters than you don't have to supply: * supplying temprary register is polite to the GC so its accesses can be a bit better * supplying decorators can be done if it is not a default access. For example, ASRAW can now be used to perform a raw load instead. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/methodHandlesx86.cpp 363 _ loadheapoop(temp2defc, memberclazz); => 368 BarrierSetAssembler* bs = BarrierSet::barrierset()->barriersetassembler(); 369 Register rthread = LP64ONLY(r15thread) NOTLP64(noreg); 370 bs->loadat(masm, INHEAP, TOBJECT, temp2defc, memberclazz, noreg, rthread); Similarly, this doesn't seem like an improvement to me. And there are more like this. Fixed. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/macroAssemblerx86.hpp It would reduce the clutter in this change to leave the asAddress declarations where they were, since that place is now public, rather than moving them to a different public section. Fixed. ------------------------------------------------------------------------------

Generally, it seems like it might be nice for the MacroAssembler class to have a reference to the barrierset, or maybe even the bs's assembler, rather than looking them up in various places, and in different ways. For example, I see each of Universe::heap()->barrierset() BarrerSet::barrierset() used in various different places in new code. Some consistency would be nice. I think it seems fine to just fetch them only in the accessload/storeat wrappers. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/macroAssemblerx86.cpp 5249 bs->loadat(this, INROOT | ONPHANTOMOOPREF, TOBJECT, 5250 value, Address(value, -JNIHandles::weaktagvalue), tmp, thread); 5251 verifyoop(value); This puts the verifyoop after the G1 pre-barrier, where in the old code verifyoop was before the pre-barrier. I remember the old order being intentional, though don't off-hand recall how important that was. If for no other reason, it seems like the new order could rarely (due to bad timing) have a bad oop reported somewhere far away during SATB buffer processing, rather than at the point of the load. With the new version's order there could be one verify after the done label. That's just a minor nit. Conversely, I think the new order is better and more GC agnostic. It avoids, for example, sending in bad oops for verification with ZGC. The more GC-agnostic interface is that after the loadat ------------------------------------------------------------------------------ src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssemblerx86.cpp 230 #ifdef LP64 231 if (crarg1 != thread) { 232 _ mov(crarg1, thread); 233 } 234 if (crarg0 != preval) { 235 _ mov(crarg0, preval); 236 } 237 #else 238 _ push(thread); 239 _ push(preval); 240 #endif Old code used: __ passarg1(thread);_ __ passarg0(preval);_ Why is this being changed? Oh, passargN are static helpers not available outside macroAssemblerx86.cpp. Maybe they should be exposed in some way, like public helpers in the x86 version of the MacroAssembler class, rather than copied inline here (and maybe other places?). Fixed. (added a wrapper for this to MacroAssembler, as suggested) ------------------------------------------------------------------------------ src/hotspot/cpu/x86/interpmasmx86.cpp Deleted these lines: 519 // The resulting oop is null if the reference is not yet resolved. 520 // It is Universe::thenullsentinel() if the reference resolved to NULL via condy. I don't think those comment lines should be deleted. Fixed. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/gc/shared/modRefBarrierSetAssemblerx86.cpp 86 void ModRefBarrierSetAssembler::storeat(MacroAssembler* masm, DecoratorSet decorators, BasicType type, 87 Address dst, Register val, Register tmp1, Register tmp2) { 88 if (type == TOBJECT || type == TARRAY) { 89 oopstoreat(masm, decorators, type, dst, val, tmp1, tmp2); 90 } else { 91 BarrierSetAssembler::storeat(masm, decorators, type, dst, val, tmp1, tmp2); 92 } 93 } Doesn't the conditional conversion from storeat to oopstoreat actually indicate a bug in the caller? That is, calling storeat with a oop (TOBJECT or TARRAY) value seems like a bug, and should be asserted against, rather than translating. And oopstoreat should assert that the value is an oop value. And should there be any verification that the decorators and the type are consistent? But maybe I'm not understanding the protocol around oopstoreat? There being no documentation, it seems entirely possible that I've misunderstood something here. Maybe I'm being confused by similarities in naming with Access:: that don't hold? The confusion roots in that Access and BarrierSetAssembler are not structured exactly the same. Access provides storeat and oopstoreat. The reason for providing both of these, is that I could not safely infer whether the passed in parameters were oops or not without changing oop and narrowOop to always be value objects, which I did not dare to do as the generated code was worse. In BarrierSetAssembler, there is no such restriction. The type for the access is always passed in to storeat/loadat as BasicType. It is the responsibility of ModRefBarrierSetAssembler to filter away non-oop references, and call oopstoreat for relevant accesses for ModRef. This follows the same pattern that the arraycopy stubs used. This allows, e.g. CardTableModRef to only override oopstoreat. This is similar to how ModRefBarrierSet::AccessBarrier<decorators,_ _BarrierSetT>::oopstoreinheap calls the concrete barriersets for help generating that access. I hope this helps understanding the structuring. This is also the reason why G1BarrierSetAssembler::loadat checks whether it is an oop that is loaded or not, because loads are not riding on the ModRef layer, which only knows about oop stores. ------------------------------------------------------------------------------ src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssemblerx86.cpp 88 void CardTableBarrierSetAssembler::storecheck(MacroAssembler* masm, Register obj, Address dst) { 89 // Does a store check for the oop in register obj. The content of 90 // register obj is destroyed afterwards. The comment states obj is an oop, but when called by the precise case of oopstoreat, it is not an oop, but rather a derived pointer. I think it's the comment that's wrong. I changed the comment to refer to obj as an oop* instead of an oop. ------------------------------------------------------------------------------ I was surprised that oopstoreat is initially declared at the ModRefBarrierSetAssembler level, rather than at the BarrierSetAssembler level. I was also surprised to not see an ooploadat. In Access we have at suffixed operations paired with non-at suffixed operations, the difference being the at suffixed operations have a base object, to allow dealing with the brooks-ptr. I'm not finding any such distinction in the xxxAssembler API. Presumably I've missed something somewhere. How is that handled? I think I previously already explained this one. ModRef does not know about oop loads, only stores. Therefore, G1 must override loadat, and not ooploadat. Thanks, /Erik ------------------------------------------------------------------------------

Previous message: RFR: 8199417: Modularize interpreter GC barriers
Next message: RFR: 8199417: Modularize interpreter GC barriers
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the hotspot-dev mailing list