Redundant barrier elimination

Doug Lea dl at cs.oswego.edu
Wed Feb 12 10:44:53 PST 2014


On 02/12/2014 11:20 AM, Lindenmaier, Goetz wrote:

> During the PPC port, we encountered some problems with the current representation of the barriers.

> 1.) We can do load-acquire (ld-twi-isync) on PPC, so we implement MemBarAcquire as empty. But there are places where MemBarAcquire is issued without a corresponding dedicated load. To distinguish these, we introduced MemBarLoadFence. Further, there are graphs where a ld.acq is followed by a membar instruction (sync or lwsync); in this case we can omit the -twi-isync. We check this during matching by calling followed_by_acquire() in the matcher predicate. (Comparable to Matcher::post_store_load_barrier().)

First a disclaimer: I have amateur status in C2, so could easily be wrong.

In principle, even on processors with fused fence+access instructions, the longer you go without fusing them, the more likely you are to be able to get rid of the fences. So it would seem better all around to keep the fences separate and do the smarter matching only during instruction generation. This also simplifies other C2 passes, because the fences also impose reordering constraints. Keeping the four kinds of fences (LoadLoad, LoadStore, StoreLoad, StoreStore) separate also aids optimization.
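
To make the point about separate fences concrete, here is a minimal sketch, in Python rather than C2's C++. It models a straight-line op list in which fences are ops of their own, merges fences that have no memory access between them, and picks an instruction only at emission time. The op encoding, the function names, and the PPC-flavored mapping (sync when StoreLoad is needed, lwsync otherwise) are illustrative assumptions, not HotSpot code.

```python
# Toy model: ops are ("load", name), ("store", name) or ("fence", kinds),
# where kinds is a frozenset drawn from the four fence kinds below.
# All names here are hypothetical; none of this is C2's real IR.
LOADLOAD, LOADSTORE, STORELOAD, STORESTORE = (
    "LoadLoad", "LoadStore", "StoreLoad", "StoreStore")


def fence(*kinds):
    """Build a fence op that requires the given ordering kinds."""
    return ("fence", frozenset(kinds))


def merge_adjacent_fences(ops):
    """Collapse runs of fences with no memory access in between.

    Two back-to-back fences order the same accesses as a single fence
    carrying the union of their kinds, so the second one is redundant.
    """
    out = []
    for op in ops:
        if op[0] == "fence" and out and out[-1][0] == "fence":
            out[-1] = ("fence", out[-1][1] | op[1])
        else:
            out.append(op)
    return out


def emit(ops):
    """Late instruction selection for an assumed PPC-like target."""
    asm = []
    for kind, arg in ops:
        if kind != "fence":
            asm.append(f"{kind} {arg}")
        elif STORELOAD in arg:
            asm.append("sync")      # only the heavyweight fence orders StoreLoad
        elif arg:
            asm.append("lwsync")    # covers the other three kinds
    return asm


# A volatile-style load followed by a volatile-style store.
ops = [
    ("load", "x"), fence(LOADLOAD, LOADSTORE),      # acquire after the load
    fence(LOADSTORE, STORESTORE), ("store", "y"),   # release before the store
    fence(STORELOAD),
]
print(emit(ops))                         # two lwsync in a row
print(emit(merge_adjacent_fences(ops)))  # a single lwsync between the accesses
```

Because the fences stay separate until emit() runs, the merge step needs no knowledge of which access a fence was originally attached to.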

Pushing this a little harder, it may also make sense to do this for CompareAndSwap* nodes, and maybe even *Lock nodes, rather than implicitly assuming fence properties, especially since some processors have different modes (acquire/release/full) or LL/SC idioms with these effects. It is possible/likely that JMM revisions and related JEPs will expose these.
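
A rough sketch of what explicit fences around a CAS could look like, again as a toy Python model. The mode names, the expansion chosen for each mode, and both helper functions are assumptions made for illustration; the actual fence placement for any given processor would differ.

```python
# Toy model: a CAS is lowered to a plain ("cas", var) op plus explicit
# ("fence", kinds) ops, instead of carrying implicit fence semantics.
ACQUIRE = frozenset({"LoadLoad", "LoadStore"})
RELEASE = frozenset({"LoadStore", "StoreStore"})
FULL = frozenset({"LoadLoad", "LoadStore", "StoreLoad", "StoreStore"})


def expand_cas(var, mode):
    """Hypothetical lowering of a CAS in the given mode."""
    cas = ("cas", var)
    if mode == "plain":
        return [cas]
    if mode == "acquire":
        return [cas, ("fence", ACQUIRE)]
    if mode == "release":
        return [("fence", RELEASE), cas]
    if mode == "full":
        return [("fence", FULL), cas, ("fence", FULL)]
    raise ValueError(f"unknown mode: {mode}")


def append_ops(code, new_ops):
    """Append new_ops, skipping any fence already covered by the fence
    that immediately precedes it in the output."""
    out = list(code)
    for op in new_ops:
        prev = out[-1] if out else None
        if op[0] == "fence" and prev and prev[0] == "fence" and op[1] <= prev[1]:
            continue  # the previous fence is at least as strong
        out.append(op)
    return out


# Code that already ends in a full fence, then a release-mode CAS.
code = [("store", "flag"), ("fence", FULL)]
print(append_ops(code, expand_cas("lock", "release")))
```

With the fences explicit, the release fence in front of the CAS is dropped because the preceding full fence already covers it. A CAS with implicit fence semantics would hide that opportunity.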

> 4.) We think that in do_exits() a MemBarStoreStore suffices.

(Yes, it is likely that this will become the officially sanctioned strategy across all processors.)

-Doug



More information about the hotspot-compiler-dev mailing list