RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores

Andrew Haley aph at redhat.com
Fri Jul 31 13:32:45 UTC 2015


Hi,

On 07/31/2015 11:33 AM, Andrew Dinn wrote:

> On 30/07/15 20:19, Vladimir Kozlov wrote:
>> First, thank you for extensive comments - they help.
>
> They were a necessity for me as much as anyone else :-)
>
>> Second, does it really help? I don't see any numbers.
>
> Hmm, running on prejudice, maybe try science? Good idea! I will obtain numbers.

That's not easy because AArch64 is a specification, not an implementation. Using load-acquire/store-release may not help much on some chips, but conversations I've had with ARM architects indicate that these instructions should be preferred. In particular, using store-release for volatile stores means that we can avoid a full fence.
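A minimal C11 sketch of the publication pattern that store-release relies on, assuming a POSIX threads environment. The names payload, ready, producer, consumer and run_message_passing are hypothetical and not HotSpot code. Java volatile stores are sequentially consistent, which is stronger than the release/acquire pair shown here; on AArch64, however, an ldar is not reordered before an earlier stlr, so the stlr/ldar pairing also gives that stronger ordering without a separate dmb.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names: illustrative only, not taken from HotSpot. */
static int payload;      /* ordinary data published by the writer */
static atomic_int ready; /* plays the role of a Java volatile flag */

/* Writer: a plain store followed by a store-release. On AArch64 the
   release store typically compiles to a single stlr, so no dmb is
   needed to keep the payload store ordered before the flag store. */
static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: a load-acquire (typically ldar) on the flag. Once it reads 1,
   the producer's earlier payload store is guaranteed to be visible. */
static void *consumer(void *arg) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin until the producer publishes */
    }
    *(int *)arg = payload;
    return NULL;
}

/* Runs one publish/consume round and returns the value the reader saw. */
int run_message_passing(void) {
    int seen = -1;
    pthread_t prod, cons;
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    pthread_create(&cons, NULL, consumer, &seen);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

The reader always observes 42: the release store orders the payload write before the flag write, and the acquire load orders the payload read after the flag read, with no full barrier on either side.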

Current status: on one out-of-order implementation of AArch64 I see no difference between "stlr" and "dmb st; str; dmb ish". On another, an in-order processor, "stlr" is 40% faster. This measures only the execution of a short loop of a few instructions, like this:

.L3:
        ldr     w2, [x1]
        add     w2, w2, 1
        str     w2, [x1]
        stlr    x4, [x3]
        subs    x0, x0, #1
        bne     .L3

versus this:

.L3:
        ldr     w2, [x1]
        add     w2, w2, 1
        str     w2, [x1]
        dmb     st
        str     x4, [x3]
        dmb     ish
        subs    x0, x0, #1
        bne     .L3
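For readers who want to repeat the comparison, here is a hypothetical C11 reconstruction of the two loops. The function names loop_stlr, loop_fenced and time_loop are illustrative assumptions, and the instruction mapping noted in the comments is what GCC and Clang typically emit for AArch64, not a guarantee.

```c
#include <stdatomic.h>
#include <time.h>

/* Hypothetical reconstruction of the two loops: a plain read-modify-write
   of *counter, then a store to *flag. Exact instruction selection depends
   on the compiler and its flags. */

/* Variant A: seq_cst store, typically lowered to stlr on AArch64
   (the same mapping proposed here for Java volatile stores). */
long loop_stlr(long *counter, _Atomic long *flag, long n) {
    for (long i = 0; i < n; i++) {
        *counter += 1;                                        /* ldr; add; str */
        atomic_store_explicit(flag, i, memory_order_seq_cst); /* stlr */
    }
    return *counter;
}

/* Variant B: relaxed store between two full fences, typically lowered to
   dmb ish; str; dmb ish. The leading barrier differs from the "dmb st" in
   the measured sequence, since C11 has no store-store-only fence. */
long loop_fenced(long *counter, _Atomic long *flag, long n) {
    for (long i = 0; i < n; i++) {
        *counter += 1;
        atomic_thread_fence(memory_order_seq_cst);            /* dmb ish */
        atomic_store_explicit(flag, i, memory_order_relaxed); /* str */
        atomic_thread_fence(memory_order_seq_cst);            /* dmb ish */
    }
    return *counter;
}

/* Times one variant using the C11 timespec_get wall clock and returns
   the elapsed time in nanoseconds. */
double time_loop(long (*loop)(long *, _Atomic long *, long), long n) {
    long counter = 0;
    _Atomic long flag = 0;
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    loop(&counter, &flag, n);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_loop(loop_stlr, n) and time_loop(loop_fenced, n) with a large n gives a rough comparison. Inspect the generated assembly (for example with -S) before attributing any difference to stlr, because the compiler may schedule the loop differently from the hand-written sequences.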

ARM's guidance is that we should optimize for the simpler in-order processors: doing so won't help the out-of-order parts very much, but it won't hurt them either.

Andrew.



More information about the hotspot-compiler-dev mailing list