RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException

Andrew Haley aph at redhat.com
Mon Jun 5 15:07:15 UTC 2017


On 05/06/17 12:52, Erik Ă–sterlund wrote:

Hi Andrew,

On 2017-06-03 11:00, Andrew Haley wrote:
On 30/05/17 11:57, Erik Ă–sterlund wrote:

The issue is not whether an algorithm depends on IRIW or not. The issue is that we have to explicitly reason about IRIW to prove that it works. The lack of IRIW violates seq_cst, and by extension linearization points that rely on seq_cst, and by extension algorithms that rely on linearization points. By breaking the very building blocks that were used to reason about algorithms and their correctness, we rely on chance for it to work...

I've been thinking about this some more, and one thing you said has been baffling me. I don't think that I have come across anywhere in HotSpot that uses seq_cst internally, and I don't think there is even any support for it in OrderAccess. The only thing that I know of where we actually need seq.cst is in the C++ interpreter and JNI's support for volatile fields, and there we simulate seq.cst by using release fence; store; full fence.

Yes, I miss sequential consistency in hotspot too. In fact, my mouse pad has "sequential consistency <3" printed on it. :)

For atomic operations such as CAS, we do use what we refer to as conservative memory ordering, which appears to be closely related to SC. Other than that, the main idea with SC is, on an abstract level, to have (at least) the guarantees of the weaker acquire/release as well as the effect of a full fence between any pair of SC accesses, so that they can agree upon some total ordering of SC accesses (let's not dig too far into the details and interactions between SC and acquire/release on the same atomic objects just yet). In hotspot, rather than having such semantics, we have instead elected to accomplish this with interleaved OrderAccess::fence() calls where required. In the cases you mentioned, the fence is guarded with if statements checking whether "support_IRIW_for_not_multiple_copy_atomic_cpu" is set, to control the convention of where the fence is placed. I believe and hope that if we had had an SC memory ordering, it would have been used more.
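The "release fence; store; full fence" simulation described above can be sketched with C++11 fences. This is only an illustration of the pattern, not HotSpot source; `slot`, `volatile_store` and `volatile_load` are made-up names:

```cpp
#include <atomic>

std::atomic<int> slot{0};

// Simulated seq.cst volatile store: release fence, then the store
// itself, then a trailing full fence.
void volatile_store(int v) {
  std::atomic_thread_fence(std::memory_order_release); // release fence
  slot.store(v, std::memory_order_relaxed);            // the store
  std::atomic_thread_fence(std::memory_order_seq_cst); // full fence
}

// The matching load side: plain load followed by an acquire fence.
int volatile_load() {
  int v = slot.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);
  return v;
}
```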
But we could do that with the seq.cst C++11 functions instead. Or build our own SC bindings.

True: I'd like us to be able to define what we need on a per-CPU basis rather than defining things in the shared code, so that we really can use the appropriate instructions everywhere, and every target does what we need as well as it can.

In particular, ARM listened to our pleas about sequential consistency and gave us what we asked for in v8, and I'd like to take advantage of the results.

In fact, my new Access API does come with sequential consistency as one memory ordering constraint. Hopefully that will not be shot down...

I don't believe that we should use SC where acq/rel is sufficient: the efficiency hit on some hardware is not nice.

The benefit of rolling our own SC mappings is that we control the ABI and conventions. Some immediate benefits are:

1) It is compliant with our dynamically generated code interacting with our runtime. For example, the most popular bindings for SC on non-multiple copy atomic CPUs seems to be either what is commonly referred to as "leading sync" or "trailing sync" conventions where a full fence is placed either before or after every SC access to ensure every two SC accesses would be interleaved with a full fence. Both the leading sync and trailing sync conventions were proven correct [1]. But subsequently, the proof was proven wrong and it turns out that the trailing sync bindings are not always safe as it may violate e.g. IRIW when weaker acquire accesses are combined with SC accesses on the same locations [2].
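The two fence-placement conventions can be sketched as follows. This is a schematic of the idea only (illustrative names, not taken from any compiler's actual mapping): a full fence is placed either before or after the SC store, so that any two SC accesses have a full fence between them.

```cpp
#include <atomic>

std::atomic<int> g{0};

// "Leading sync" convention: full fence BEFORE each SC store.
void sc_store_leading_sync(int v) {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  g.store(v, std::memory_order_release);
}

// "Trailing sync" convention: full fence AFTER each SC store.
void sc_store_trailing_sync(int v) {
  g.store(v, std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
```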

That doesn't surprise me.

Yet the currently recommended ARMv7 bindings appear to be trailing sync [3], because it has been determined to be faster. Recently, both trailing sync and leading sync bindings have been proposed that correct the currently known flaws of C++11 seq_cst [4]. These bindings are not necessarily compliant with lock-free races between our runtime and our dynamically generated code.

2) The trailing sync and leading sync conventions used by C++11 compilers are not ABI compliant.

I don't understand what you mean by this.

And there is no real way for us to know whether a leading sync or trailing sync convention is used by the C++ compiler without full-on selling our souls to uncontracted compiler internals. Any subsequent compiler upgrade might break the shady contract we thought we had with our dynamically generated code. This is a fundamental issue with relying on C++11 atomics.

OK. But where we do know, we should be able to override it in the CPU-dependent code.

3) We can be compliant with our own JMM and not have to think about the interactions and differences between our JMM and the evolving C++11 memory model when we write synchronized code.

Kinda sorta, but in the longer term we will really need to be able to interwork with C++ code that uses atomics.

4) Rather than having trailing sync on some architectures and leading sync on others leading to observably different behaviour when combined with dynamically generated code, I would ideally like to have the same conventions on all machines so that any such behavioural difference for generated code looks the same on all platforms. This is not the case for C++11 atomics.

And I really would like this not to happen. A honking great full fence after every volatile store is not my idea of a good time.

Currently recommended bindings use leading sync for PPC and trailing sync for ARMv7 [3]. And they will behave slightly differently.

5) The ARM manuals talk about "visibility" of stores as parts of its contract - a property that is too specific to fit into the C++11 memory model, yet might be useful for a JVM that needs to interact with dynamically generated code. In particular, if some ARMv8 stlr instruction guarantees "visibility" (which I think can be better thought of as not reordering with any subsequent accesses in program order as a memory model should preferably talk about constraints between memory access orderings), then this is equivalent to always having a trailing fence() on SC stores,

But consider this:

volatile x, y

Thread 1        Thread 2
--------        --------
x = 1           y = 1
r1 = y          r1 = x

On the ARMv8 memory model, we can't do this as

mov  r0, #1        mov  r0, #1
stlr r0, [x]       stlr r0, [y]
ldr  r1, [y]       ldr  r1, [x]
dmb                dmb

In this case, clearly the DMB instructions have no effect. In order to get SC we need to match the STLR with LDAR. STLR is not equivalent to always having a trailing fence, but we can recover SC by using a DMB before the load.
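For comparison, the same store-buffering litmus test can be written with C++11 seq_cst atomics, under which the outcome of both threads reading 0 is forbidden. This is a sketch (thread setup and names are illustrative), not a claim about any particular compiler's code generation:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = -1, r2 = -1;

// Store-buffering litmus test: each thread stores to one variable,
// then loads the other. With seq_cst accesses, r1 == 0 && r2 == 0
// cannot occur.
void run_litmus() {
  std::thread t1([] {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
  });
  std::thread t2([] {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
  });
  t1.join();
  t2.join();
}
```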

and hence unlike e.g. a C++11 leading sync SC store binding for PPC. I would argue we do want this stronger property as our current generated code occasionally depends on that.

6) We can decide upon what properties we think are important independently of which direction C++ decides to go. They can e.g. decide to buy weaker consistency for a failing CAS in the name of micro optimization, and we can elect not to buy that bag of problems.
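C++11 does expose exactly that knob: a compare-exchange can specify a weaker memory order for the failure path than for the success path. A minimal illustration (illustrative names, not HotSpot code):

```cpp
#include <atomic>

std::atomic<int> word{0};

// CAS with strong ordering on success but only relaxed ordering
// when the compare fails -- the weaker failing-CAS semantics the
// text refers to.
bool cas(int expected, int desired) {
  return word.compare_exchange_strong(
      expected, desired,
      std::memory_order_seq_cst,    // ordering if the CAS succeeds
      std::memory_order_relaxed);   // weaker ordering if it fails
}
```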

They already did. And rightly so, IMO.

Hope you bought my car salesman pitch.

Of course I have a motive for digging into this: I'm the lead of the AArch64-port project. That processor (uniquely?) has real support for seq.cst, and it'd be nice to use it without throwing a full fence at every volatile store.

I understand your motivation. At least I think we both want some kind of SC semantics in hotspot in one shape or another. :)

Indeed, but not everywhere: much of the time acq/rel will do.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



More information about the jdk10-dev mailing list