RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException
Erik Österlund erik.osterlund at oracle.com
Mon Jun 5 11:52:02 UTC 2017
Hi Andrew,
On 2017-06-03 11:00, Andrew Haley wrote:
> On 30/05/17 11:57, Erik Österlund wrote:
>> The issue is not whether an algorithm depends on IRIW or not. The issue
>> is that we have to explicitly reason about IRIW to prove that it works.
>> The lack of IRIW violates seq.cst, and by extension linearization points
>> that rely on seq.cst, and by extension algorithms that rely on
>> linearization points. By breaking the very building blocks that were
>> used to reason about algorithms and their correctness, we rely on chance
>> for it to work...
>
> I've been thinking about this some more, and one thing you said has been
> baffling me. I don't think that I have come across anywhere in HotSpot
> that uses seq.cst internally, and I don't think there is even any support
> for it in OrderAccess. The only thing that I know of where we actually
> need seq.cst is in the C++ interpreter and JNI's support for volatile
> fields, and there we simulate seq.cst by using release fence; store; full
> fence.
Yes, I miss sequential consistency in hotspot too. In fact, my mouse pad has "sequential consistency <3" printed on it. :) For atomic operations such as CAS, we do use what we refer to as conservative memory ordering, which appears to be closely related to SC.
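To be concrete, "conservative" here roughly means a full two-way barrier on each side of the atomic operation. A sketch of the ordering this amounts to (illustrative only; the real fences live inside the per-platform implementations of Atomic::cmpxchg, and the body below is of course not itself atomic):

    // Sketch of the ordering a CAS provides under the "conservative"
    // convention (assumes HotSpot's OrderAccess and jint from
    // runtime/orderAccess.hpp and utilities/globalDefinitions.hpp).
    jint conservative_cmpxchg(jint exchange_value, volatile jint* dest,
                              jint compare_value) {
      OrderAccess::fence();   // full barrier before the atomic update
      jint old = *dest;       // stand-in for the platform CAS instruction(s)
      if (old == compare_value) {
        *dest = exchange_value;
      }
      OrderAccess::fence();   // full barrier after the atomic update
      return old;
    }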
Other than that, the main idea with SC is, on an abstract level, to have (at least) the guarantees of the weaker acquire/release as well as the effect of a full fence between any pair of SC accesses, so that all threads can agree upon some total ordering of the SC accesses (let's not dig too far into the details and interactions between SC and acquire/release on the same atomic objects just yet). In hotspot, rather than having such semantics, we have instead elected to accomplish that with interleaved OrderAccess::fence() calls where required. In the cases you mentioned, the fence is guarded with if statements checking whether "support_IRIW_for_not_multiple_copy_atomic_cpu" is set, to control the convention of where the fence is placed.
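For concreteness, here is a simplified illustration of the two fence placements such a guard selects between (a sketch only, not the actual code paths in the interpreter or JNI; the function names are made up):

    // On non-multiple-copy-atomic CPUs (flag set), IRIW is restored by a
    // leading full fence on the volatile load side; otherwise the full
    // fence trails the volatile store instead.
    void runtime_volatile_store(volatile jint* field, jint value) {
      OrderAccess::release_store(field, value);
      if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
        OrderAccess::fence();   // trailing full fence on the store side
      }
    }

    jint runtime_volatile_load(volatile jint* field) {
      if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
        OrderAccess::fence();   // leading full fence on the load side
      }
      return OrderAccess::load_acquire(field);
    }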
I believe and hope that if we had had an SC memory ordering, it would have been used more.
> But we could do that with the seq.cst C++11 functions instead.
Or build our own SC bindings. In fact, my new Access API does come with sequential consistency as one memory ordering constraint. Hopefully that will not be shot down...
The benefit of rolling our own SC mappings is that we control the ABI and conventions. Some immediate benefits are:
1) It is compliant with our dynamically generated code interacting with our runtime. For example, the most popular bindings for SC on non-multiple-copy-atomic CPUs seem to be what are commonly referred to as the "leading sync" and "trailing sync" conventions, where a full fence is placed either before or after every SC access to ensure that every two SC accesses are interleaved with a full fence. Both the leading sync and trailing sync conventions were proven correct [1]. But subsequently that proof was shown to be flawed, and it turns out that the trailing sync bindings are not always safe, as they may violate e.g. IRIW when weaker acquire accesses are combined with SC accesses on the same locations [2]. Yet the currently recommended ARMv7 bindings appear to be trailing sync [3], because that has been determined to be faster. Recently, both trailing sync and leading sync bindings have been proposed that correct the currently known flaws of C++11 seq_cst [4]. These bindings are not necessarily compliant with lock-free races between our runtime and our dynamically generated code. (A sketch of the two store conventions follows after this list.)
2) The trailing sync and leading sync conventions used by C++11 compilers are not ABI compliant. And there is no real way for us to know whether a leading sync or trailing sync convention is used by the C++ compiler without full-on selling our souls to uncontracted compiler internals. Any subsequent compiler upgrade might break the shady contract we thought we had with our dynamically generated code. This is a fundamental issue with relying on C++11 atomics.
3) We can be compliant with our own JMM and not have to think about the interactions and differences between our JMM and the evolving C++11 memory model when we write synchronized code.
4) Rather than having trailing sync on some architectures and leading sync on others, leading to observably different behaviour when combined with dynamically generated code, I would ideally like to have the same convention on all machines so that any such behavioural difference for generated code looks the same on all platforms. This is not the case for C++11 atomics. Currently recommended bindings use leading sync for PPC and trailing sync for ARMv7 [3]. And they will behave slightly differently.
5) The ARM manuals talk about "visibility" of stores as part of their contract - a property that is too specific to fit into the C++11 memory model, yet might be useful for a JVM that needs to interact with dynamically generated code. In particular, if some ARMv8 stlr instruction guarantees "visibility" (which I think is better thought of as not reordering with any subsequent accesses in program order, as a memory model should preferably talk about constraints between memory access orderings), then this is equivalent to always having a trailing fence() on SC stores, and hence unlike e.g. a C++11 leading sync SC store binding for PPC. I would argue we do want this stronger property, as our current generated code occasionally depends on it.
6) We can decide upon what properties we think are important independently of which direction C++ decides to go. They can e.g. decide to buy weaker consistency for a failing CAS in the name of micro-optimization, and we can elect not to buy that bag of problems.
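As promised in point 1, here is a rough sketch of the two store conventions (illustrative only; the names are made up, and a complete mapping also has to cover SC loads, CAS and mixed-strength accesses):

    // "Leading sync": full fence before every SC access.
    void sc_store_leading_sync(volatile jint* addr, jint value) {
      OrderAccess::fence();                     // fence *before* the access
      OrderAccess::release_store(addr, value);
    }

    // "Trailing sync": full fence after every SC access.
    void sc_store_trailing_sync(volatile jint* addr, jint value) {
      OrderAccess::release_store(addr, value);
      OrderAccess::fence();                     // fence *after* the access
    }

Either convention on its own puts a full fence between any two SC accesses, but the two are not interchangeable when code compiled with one convention races against dynamically generated code emitted with the other - which is exactly the interaction we have to care about.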
Hope you bought my car salesman pitch.
> Of course I have a motive for digging into this: I'm the lead of the
> AArch64-port project. That processor (uniquely?) has real support for
> seq.cst, and it'd be nice to use it without throwing a full fence at
> every volatile store.
I understand your motivation. At least I think we both want some kind of SC semantics in hotspot in one shape or another. :)
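For concreteness, the AArch64 support in question is the ldar/stlr load-acquire/store-release pair, which is commonly held to be strong enough to carry SC accesses without extra fences. A rough sketch (not HotSpot code, just GCC-style inline assembly) of an SC load/store pair that avoids a full dmb at every volatile store:

    #include <stdint.h>

    // SC store relying on stlr instead of a trailing full fence.
    inline void sc_store(volatile int64_t* p, int64_t v) {
      __asm__ volatile("stlr %[v], [%[p]]"
                       : /* no outputs */
                       : [p] "r" (p), [v] "r" (v)
                       : "memory");
    }

    // Matching SC load relying on ldar.
    inline int64_t sc_load(const volatile int64_t* p) {
      int64_t v;
      __asm__ volatile("ldar %[v], [%[p]]"
                       : [v] "=r" (v)
                       : [p] "r" (p)
                       : "memory");
      return v;
    }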
Thanks, /Erik
[1] "Clarifying and Compiling C/C++ Concurrency: from C++11 to POWER", M. Betty et. al., POPL'12 [2] "Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings", Y. A. Manerka, 2016 [3] https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html [4] "Repairing sequential consistency in C/C++ 11", O. Lahav, PLDI'17