aarch64 DMB - patch

Andrew Dinn adinn at redhat.com
Tue Jun 23 10:12:44 UTC 2015


Hi Benedikt,

On 17/06/15 14:26, Benedikt Wedenik wrote:

> I checked out both repositories and compared the AD-file. My patch also works in the latest version of http://hg.openjdk.java.net/jdk9/hs-comp.
>
> If ADinn is working on that part of the code right now, do you think I should talk to him directly?

You have been talking to him directly -- it's just that I have not been responding because I have been away on holiday for a few weeks.

Firstly, here is a summary of what is currently being done to replace memory barriers with ldar/stlr instructions.

I have already made one change in jdk9 to ensure that dmb instructions are elided for volatile gets and non-object field volatile puts. You can track progress for that patch via the associated JIRA issue:

https://bugs.openjdk.java.net/browse/JDK-8078263
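
For illustration, here is a minimal Java sketch of the kind of source code this first change is concerned with: a volatile get and a volatile put on a non-object (primitive) field. The class, field and method names are made up for the example, and which instructions the JIT actually emits for it depends on the JDK build in use.

```java
// Illustrative only: names are hypothetical, not taken from the JDK sources.
class VolatileIntExample {
    // A non-object (primitive) volatile field.
    static volatile int counter;

    // Volatile put of a primitive value.
    static void publish(int value) {
        counter = value;
    }

    // Volatile get.
    static int read() {
        return counter;
    }
}
```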

That fix required modifying the ad file rules which match MemBarAcquire, MemBarRelease and MemBarVolatile nodes to employ predicates which filter out the cases where generation of a dmb can safely be omitted. It also required changing the rules for put and get to use corresponding predicates to generate stlr and ldar in precisely the same cases. The predicates need to detect /exactly/ the same cases for elision and generation of synchronizing loads/stores in order for the optimization to be correct. You should look at the prior jdk9 aarch64 code to see why these predicates are defined as they are -- the jdk7 and jdk8 aarch64 rules differ and are not a good starting point.
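
The requirement that both sets of rules agree can be pictured with a toy model. The sketch below is not HotSpot code: canUseStlr is a hypothetical stand-in for the real predicates, the instruction names are simplified strings, and the card mark store that accompanies an object put is left out. The point it tries to show is that a single decision drives both the barrier elision and the choice of store instruction, so the two can never disagree.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical illustration, not the ad file logic.
class BarrierSelectionSketch {
    // Hypothetical shared predicate: true when the put may use stlr.
    // Modelled on the first fix, which covers non-object volatile puts only.
    static boolean canUseStlr(boolean isVolatilePut, boolean isObjectField) {
        return isVolatilePut && !isObjectField;
    }

    // Returns the simplified instruction sequence for one field put.
    static List<String> emitPut(boolean isVolatilePut, boolean isObjectField) {
        List<String> out = new ArrayList<>();
        boolean useStlr = canUseStlr(isVolatilePut, isObjectField);
        // Barriers are needed only for a volatile put that keeps a plain store.
        boolean needBarriers = isVolatilePut && !useStlr;
        if (needBarriers) {
            out.add("dmb"); // models the MemBarRelease before the store
        }
        out.add(useStlr ? "stlr" : "str");
        if (needBarriers) {
            out.add("dmb"); // models the MemBarVolatile after the store
        }
        return out;
    }
}
```

If the barrier rules and the store rules used separately written tests, one could elide the dmb while the other still emitted a plain str, which is exactly the mismatch the predicates have to rule out.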

This first fix fails to optimize volatile object stores. That's because the current predicates do not recognize the GC card mark nodes inserted by the compiler. I am about to post a fix for this case to aarch64-dev and hotspot-dev. The JIRA is

https://bugs.openjdk.java.net/browse/JDK-8078743
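
For illustration, the source-level shape of a volatile object store is sketched below. The names are made up for the example. Because the field holds a reference, the compiled store may be accompanied by a GC card mark, depending on the collector in use, which is what distinguishes this case from a primitive volatile put.

```java
// Illustrative only: names are hypothetical, not taken from the JDK sources.
class VolatileRefExample {
    // A volatile field holding an object reference.
    static volatile Object ref;

    // Volatile object put: the case the card mark handling is about.
    static void publish(Object o) {
        ref = o;
    }

    // Volatile get of the reference.
    static Object read() {
        return ref;
    }
}
```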

A follow-up fix will also optimize CAS operations to drop dmbs in favour of ldar/stlr. This third fix depends on the second fix as it requires use of a common function to test for the presence of GC card mark nodes. The JIRA issue is

https://bugs.openjdk.java.net/browse/JDK-8080293
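
For illustration, a typical Java-level CAS is sketched below using java.util.concurrent.atomic.AtomicInteger. The class and method names around it are made up for the example, and how the compareAndSet call is compiled depends on the JDK build in use.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: CasExample and incrementWithCas are hypothetical names.
class CasExample {
    static final AtomicInteger value = new AtomicInteger(0);

    // Classic CAS retry loop: read, compute, attempt to swap, retry on failure.
    static int incrementWithCas() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```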

Now, as regards your proposed patch -- it appears to be addressing the unrelated case (unrelated to my changes above, that is) of memory barriers associated with fast lock and fast unlock operations, i.e. locks associated with synchronized methods or with synchronization on objects via the synchronized keyword. I am not sure your patch is valid wrt the jdk9 code base or even relative to jdk7/8.
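
For illustration, the two source-level forms of locking referred to here are sketched below: a synchronized method and a synchronized block. The class and method names are made up for the example.

```java
// Illustrative only: names are hypothetical, not taken from the JDK sources.
class LockExample {
    private int total;

    // Synchronized method: locks on 'this' for the duration of the call.
    synchronized void addViaMethod(int n) {
        total += n;
    }

    // Synchronized block: locks on the same monitor explicitly.
    void addViaBlock(int n) {
        synchronized (this) {
            total += n;
        }
    }

    // Reads the total under the same lock.
    synchronized int total() {
        return total;
    }
}
```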

Your attachment includes a change to elide the dmb instructions planted when a MemBarAcquireLock or MemBarReleaseLock node is matched. These are generated, respectively, before and after a FastLock and FastUnlock node. The encodings for these latter two operations, aarch64_enc_fast_lock and aarch64_enc_fast_unlock, currently employ ldxr and stlxr at the points where the object markOop field is being tested and updated (this is true in jdk7/8/9). Note that /ldxr/ is not an acquiring load. So, if your contention is that the barriers can be dropped because the markOop load-exclusive + store-exclusive pair provides sufficiently strong memory synchronization then at the very least your patch would need to modify the encoding to use ldaxr in place of ldxr.

However, I am not convinced that these barriers can be removed even granted that change. There are various other memory operations encoded in both the fast_lock and fast_unlock cases both before and after the load-exclusive + store-exclusive pair. I believe the point of separating out the MemBarAcquireLock and MemBarReleaseLock from FastLock and FastUnlock is to ensure that those related memory operations are correctly synchronized wrt memory operations performed by other threads which may be trying to synchronize on the same oop. If you think I am wrong and your optimization is valid then you really need to provide a detailed, convincing argument as to why -- n.b. that's not a requirement to convince me but rather to convince the many experts on this list who understand lock synchronization. Expect a lively and lengthy debate if you want to pursue this.
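
As a reminder of what is at stake at the Java level, the sketch below has two threads update a plain (non-volatile) field under the same lock. The names are made up for the example. The Java memory model requires that writes made before an unlock are visible to a thread that subsequently locks the same monitor, so the final count must be exact. A passing run does not prove that any particular barrier placement is correct; it only shows the guarantee the lock and unlock code has to uphold.

```java
// Illustrative only: names are hypothetical, not taken from the JDK sources.
class LockVisibilitySketch {
    private static final Object LOCK = new Object();
    private static int shared; // plain field, guarded only by LOCK

    // Runs two threads that each increment 'shared' under LOCK.
    static int runTwoThreads(final int iterationsPerThread) {
        synchronized (LOCK) {
            shared = 0;
        }
        Runnable work = () -> {
            for (int i = 0; i < iterationsPerThread; i++) {
                synchronized (LOCK) {
                    shared++;
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (LOCK) {
            return shared;
        }
    }
}
```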

regards,

Andrew Dinn


