RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism) (original) (raw)

Doerr, Martin martin.doerr at sap.com
Tue Apr 24 22:40:08 UTC 2018


Hi Erik,

thanks a lot for your explanations and for digging so much into PPC stuff.

So I think we should continue using memory_order_conservative instead of seq_cst for now. Or we could introduce seq_cst and implement it like conservative on PPC64 for now.

Best regards, Martin

-----Original Message----- From: Erik Österlund <erik.osterlund at oracle.com> Sent: Dienstag, 24. April 2018 22:03 To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)

Hi Martin,

On 2018-04-24 19:02, Doerr, Martin wrote:

Hi Erik,

I was not suggesting removing the more conservative ordering property, only allowing using seqcst, when you know that is the correct semantics. Ok, thanks for clarifying. There are both leading sync and trailing sync conventions for implementing C++11 seqcst on PPC I'm not convinced that this can be freely selected. Is there a GCC switch? What about xlC on AIX?

It is not selectable by the compiler, which is why I think we should generate our own inline assembly where we explicitly choose the convention as either leading sync or trailing sync, depending on how we roll (cf. Repairing Sequential Consistency in C/C++11 https://dl.acm.org/citation.cfm?id=3062352 where both leading and trailing sync bindings for C++11 are provided). Anything else seems unsafe (to me). When I talk about C++11 semantics, I don't necessarily mean the chosen implementation by specific compilers. I mean the semantic properties of seq_cst, acquire, release, acq_rel, consume, etc.

The trailing sync convention would have a different weakness: Store + Cmpxchg The store will have to be SeqCst in order to generate correct code on PPC64.

Sure, if you move the fence from one side to another, but still have one side lacking a full fence, there will naturally be a possibility of reordering on the other side. And therefore I am not arguing we necessarily remove "conservative". Having said that - doesn't this look like a much less innocent mistake - having some sort of dekker duality between a relaxed store (?!) and seq_cst cmpxchg/load? I am biased to think that seems like a much worse and less subtle programmer mistake, that is easier to understand. We don't even do that on x86 today (we would use release_store_fence for the store in such a dekker duality), and it would break more consistently on other platforms too. The comments in orderAccess.hpp file states: "Use release_store_fence to update values like the thread state, where we don't want the current thread to continue until all our prior memory accesses (including the new thread state) are visible to other threads". Now we can have opinions about this "visibility" property, but we have arbitrarily tied it to the stores. Therefore, the fact is that such dekker dualities use release_store_fence today, and there is no corresponding fence_load_acquire() operation. Therefore, I think it would seem, in HotSpot, easier to adopt a trailing sync seq_cst convention, to fit into existing code conventions. Sorry for derailing further down the rabbit hole...

Even if we can select it in C++, it would be unfortunate if it differs from JMM. JMM should not be changed for the sake of C++ implementation cleanup. We have "JEP 188: Java Memory Model Update" for that.

Sure. The chosen bindings for the JMM would have to comply with that. I will stop myself from derailing into a conversation about that. :)

From the other mail:

With C++11, you can choose both what memory ordering semantics successful CAS and failing CAS should have You're right. We just need to use it correctly.

Indeed. :)

Thanks, /Erik

Thanks and best regards, Martin

-----Original Message----- From: Erik Österlund [mailto:erik.osterlund at oracle.com] Sent: Dienstag, 24. April 2018 18:25 To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism) Hi Martin, On 2018-04-24 15:43, Doerr, Martin wrote: Hi Erik,

I highly appreciate if we can get the VM code to work correctly with C++ 2011 semantics on PPC64. But I think this needs to be done very carefully in order to avoid missing sync instructions between CAS and Load. I was not suggesting removing the more conservative ordering property, only allowing using seqcst, when you know that is the correct semantics. E.g. the new global counter implementation uses (after fix): Atomic::add(memoryorderconservative) + OrderAccess::loadacquire In C++2011 this should become: Cmpxchg SeqCst + Load Seq Cst Cmpxchg SeqCst + any other Load would be incorrect on PPC64. So we will need to double-check all corresponding loads. This might be getting out of topic, but here goes... There are both leading sync and trailing sync conventions for implementing C++11 seqcst on PPC. What you are saying is true for the leading sync convention, but not for the trailing sync convention. In HotSpot, I think it is probably more desirable to bind the expectations for all seqcst accesses to have trailing sync convention to avoid this issue of seqcst store followed by loadacquire being reordered. I don't think we want a situation where PPC implements leading sync (despite being able to choose trailing sync), when AArch64 uses trailing sync. But perhaps this is derailing a bit... I've seen that you have moved the supportIRIWfornotmultiplecopyatomiccpu fence into RawAccessBarrier::loadinternal. Ideally, RawAccessBarrier::storeinternal should use OrderAccess::releasestore without fence in case of supportIRIWfornotmultiplecopyatomiccpu, but I'm not requesting to change this now. We'd get this by moving to C++ 2011 semantics (which assumes supportIRIWfornotmultiplecopyatomiccpu), right? That code should really call into Atomic/OrderAccess with seqcst, and let the platform layer do what it needs to do. Thanks, /Erik Best regards, Martin

-----Original Message----- From: Erik Österlund [mailto:erik.osterlund at oracle.com] Sent: Montag, 23. April 2018 12:41 To: Andrew Haley <aph at redhat.com>; Doerr, Martin <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism) Hi Andrew, On 2018-04-23 12:20, Andrew Haley wrote: On 04/18/2018 05:37 PM, Doerr, Martin wrote:

It'd be great if we could specify semantics for Atomic:add like we do for Atomic::cmpchgx. However, the double-fence is a very bad default from performance perspective. I wonder if PPC64 is the only platform which gets hit. Probably not. Would it be acceptable to set the default to memoryorderacquirerelease and specify memoryorderconservative for the new usage? I think this would fit perfectly for PPC64, but some people may not like it. One could say PPC64 is the problem, but one could also say the VM code is the problem �� Indeed. In as much as we use stronger memory ordering than we need in more than one place, the VM code is the problem. I don't really like the straight forward fix to insert a fence with #ifdef AIX. I agree. It looks to me like it's sufficient if the increment of globalcounter and the load of thread->getrcucounter() are sequentially consistent. On at least one platform, AArch64, we can do that without additional fencing. I would also like to have seqcst available. In the Access API, I need MOSEQCST for JMM volatile support. I would like the semantic requirements to go straight into Atomic, instead of doing a "if (supportIRIWfornotmultiplecopyatomiccpu) { OrderAccess::fence(); }" dance in shared code, that may or may not make sense at a given platform (among other reasons because it enforces whether leading or trailing sync convention is used in shared code, rather than leave that decision to the platform level). Plus I have a mouse pad that says I love sequential consistency, although that might not be a valid argument to introduce this to Atomic. Thanks, /Erik



More information about the hotspot-dev mailing list