Redundant barrier elimination

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Wed Feb 12 08:20:57 PST 2014


Hi,

during the PPC port, we encountered some problems with the current representation of the barriers.

1.) We can do load-acquire (ld-twi-isync) on ppc. Therefore we implement MemBarAcquire empty. But there were places where MemBarAcquire is issued without corresponding to a dedicated load. To distinguish this, we introduced MemBarLoadFence. Further, there are graphs where a ld.acq is followed by a membar instruction (sync or lwsync); in this case we can omit the -twi-isync. We check this during matching by calling followed_by_acquire() in the matcher predicate. (Comparable to Matcher::post_store_load_barrier().)
2.) Similar holds for st.rel on ia64.
3.) MemBarVolatileNode is specified to do a StoreLoad barrier. On ppc, we match it to a node that issues the 'sync' instruction. This is the only instruction doing a StoreLoad barrier. But 'sync' also does all the other barriers, so we could coalesce it with any other MemBar node.
4.) We think that in do_exits() a MemBarStoreStore suffices.

As a solution, I could think of a generic node that indicates by four flags which barriers it should execute (LoadLoad, LoadStore etc.). An optimization on the ideal graph could then coalesce nodes, or-ing the flags. The matcher could then just match the cheapest instruction doing the required barriers.
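A minimal sketch of the flag idea, outside of HotSpot and only for illustration. All names here (BarrierMask, coalesce, select_ppc_instruction) are hypothetical and not part of any existing code. The instruction choice assumes the usual PPC behavior that lwsync covers LoadLoad, LoadStore and StoreStore, while only sync covers StoreLoad.

```cpp
#include <string>

// Hypothetical bit mask: one bit per ordering the barrier must enforce.
using BarrierMask = unsigned;
constexpr BarrierMask kLoadLoad   = 1u << 0;
constexpr BarrierMask kLoadStore  = 1u << 1;
constexpr BarrierMask kStoreStore = 1u << 2;
constexpr BarrierMask kStoreLoad  = 1u << 3;

// Coalescing two adjacent barrier nodes amounts to or-ing their flags.
constexpr BarrierMask coalesce(BarrierMask a, BarrierMask b) {
  return a | b;
}

// Pick the cheapest PPC instruction that provides every requested ordering.
// An empty mask models the CPU-order case: no instruction is emitted.
inline std::string select_ppc_instruction(BarrierMask m) {
  if (m == 0) return "";                 // compiler-only ordering
  if (m & kStoreLoad) return "sync";     // only sync orders store -> load
  return "lwsync";                       // sufficient for the other three
}
```

With this shape, a StoreStore barrier followed by a StoreLoad barrier would coalesce into a single mask that selects one 'sync' instead of two instructions.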

Barriers that are implemented empty should just not be issued to the ideal graph. There should be a way to configure per-platform which barriers are not needed. Eventually they should be replaced by MemBarCPUOrder. Also, MemBarCPUOrder should not be issued if there is another MemBar operation. CPU order could be modeled by the node proposed above if none of the flags is set.

In addition, on PPC, we could peel off barrier operations from cmpxchg and represent them as individual IR nodes. Then these could be subject to further optimization, too.

Best regards, Goetz and Martin

-----Original Message-----
From: hotspot-compiler-dev-bounces at openjdk.java.net [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doug Lea
Sent: Wednesday, 12 February 2014 15:41
To: hotspot compiler
Subject: Redundant barrier elimination

While exploring JMM update options, we noticed that hotspot will tend to pile up memory barrier nodes without bothering to coalesce them into one barrier instruction. Considering the likelihood that JMM updates will accentuate pile-ups, I looked into improvements.

Currently, there is only one case where coalescing is attempted. Matcher::post_store_load_barrier does a TSO-specific forward pass that handles only MemBarVolatile. This is a harder case than others, because it takes into account that other MemBars are no-ops on TSO. It is (or should be) called only from dfa on x86 and sparc. So it does not apply on processors for which MemBarAcquire and MemBarRelease are not no-ops.

But for all (known) processors, you can always do an easier check for redundancy, buttressed by hardware-model-specific ones like post_store_load_barrier when applicable. I put together the following, which does a basic check, but I don't offhand know of a cpu-independent place to call it from. Needing to invoke this from each barrier case in each .ad file seems suboptimal. Any advice would be welcome. Or perhaps suggestions about placing similar functionality somewhere other than Matcher?
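For illustration only, the "easier check" can be sketched on a plain linear sequence of operations instead of the ideal graph. This is not the patch from this message: the Op type, the mask constants and the function below are hypothetical, and atomic operations and locks are left out. A barrier counts as redundant when a later barrier with at least the same ordering bits is reached before any load or store.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified operation stream (not HotSpot's Node graph).
enum class OpKind { Load, Store, Barrier };

struct Op {
  OpKind kind;
  unsigned mask;  // ordering bits; only meaningful when kind == Barrier
};

constexpr unsigned kLoadLoad   = 1u << 0;
constexpr unsigned kLoadStore  = 1u << 1;
constexpr unsigned kStoreStore = 1u << 2;
constexpr unsigned kStoreLoad  = 1u << 3;
constexpr unsigned kFull = kLoadLoad | kLoadStore | kStoreStore | kStoreLoad;

// Returns true if ops[i] is a barrier that is subsumed by a later barrier
// reached before any intervening load or store.
bool is_redundant_barrier(const std::vector<Op>& ops, std::size_t i) {
  const unsigned need = ops[i].mask;
  for (std::size_t j = i + 1; j < ops.size(); ++j) {
    const Op& next = ops[j];
    if (next.kind != OpKind::Barrier) {
      return false;  // a memory access comes first, so ops[i] is needed
    }
    if ((next.mask & need) == need) {
      return true;   // the later barrier provides at least the same orderings
    }
    // A weaker barrier does not subsume ops[i]; keep scanning forward.
  }
  return false;      // ran off the end without finding a covering barrier
}
```

In this toy model a StoreStore barrier directly followed by a full fence is reported as redundant, while the same barrier followed by a store is not.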

Thanks!

... diffs from JDK9 (warning: I haven't even tried to compile this)

diff -r 4c8bda53850f src/share/vm/opto/matcher.cpp
--- a/src/share/vm/opto/matcher.cpp Thu Feb 06 13:08:44 2014 -0800
+++ b/src/share/vm/opto/matcher.cpp Wed Feb 12 09:07:17 2014 -0500
@@ -2393,6 +2393,54 @@
   return false;
 }

+// Detect if current barrier is redundant. Returns true if there is
+// another upcoming barrier or atomic operation with at least the same
+// properties before next store or load. Assumes that MemBarVolatile
+// and CompareAndSwap* provide "full" fences, and that non-biased
+// FastLock/Unlock provide acquire/release
+bool Matcher::is_redundant_barrier(const Node* vmb) {


