[llvm-dev] [RFC] Vector Predication
Simon Moll via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 1 01:52:16 PST 2019
Hi,
On 1/31/19 8:17 PM, Philip Reames wrote:
> On 1/31/19 11:03 AM, David Greene wrote:
>> Philip Reames <listmail at philipreames.com> writes:
>>> Question 1 - Why do we need separate mask and lengths? Can't the
>>> length be easily folded into the mask operand?
>>>
>>> e.g. newmask = (<4 x i1>)((i4)%y & (1 << %L -1))
>>> and then pattern matched in the backend if needed
>> I'm a little concerned about how difficult it will be to maintain
>> enough information throughout compilation to be able to match this on
>> a machine with an explicit vector length value.
> Does the hardware *also* have a mask register? If so, this is a likely
> minor code quality issue which can be incrementally refined on. If it
> doesn't, then I can see your concern.
>
>>> Question 2 - Have you explored using selects instead? What practical
>>> problems do you run into which make you believe explicit predication
>>> is required?
>>>
>>> e.g. %sub = fsub <4 x float> %x, %y
>>>      %result = select <4 x i1> %M, <4 x float> %sub, undef
>> That is semantically incorrect. According to IR semantics, the fsub
>> is fully evaluated before the select comes along. It could trap for
>> elements where %M is 0, whereas a masked intrinsic conveys the proper
>> semantics of masking traps for masked-out elements. We need
>> intrinsics and eventually (IMHO) fully first-class predication to
>> make this work properly.
> If you want specific trap behavior, you need to use the constrained
> family of intrinsics instead. In IR, fsub is expected not to trap. We
> have an existing solution for modeling FP environment aspects such as
> rounding and trapping. The proposed signatures for your EVL proposal
> do not appear to subsume those, and you've not proposed their
> retirement. We definitely don't want two ways of describing FP
> trapping. In other words, I don't find this reason compelling since my
> example can simply be rewritten using the appropriate constrained
> intrinsic.
The existing constrained FP intrinsics do not take a mask or a vector length. So you cannot have vectorized trapping FP math at the moment (beyond what LV can do...).
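To make that concrete, here is the existing constrained form next to a mask/vlen-aware counterpart. The second declaration is only a sketch in the spirit of the proposal, not an existing or final signature:

  ; exists today: exception/rounding behavior is modeled via metadata,
  ; but the operation is always applied to all lanes
  declare <4 x float> @llvm.experimental.constrained.fsub.v4f32(<4 x float>, <4 x float>, metadata, metadata)

  ; sketch only (hypothetical name and signature): the same operation
  ; with an explicit mask and an explicit vector length
  declare <4 x float> @llvm.evl.fsub.v4f32(<4 x float>, <4 x float>, <4 x i1>, i32)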
Masking has advantages even in the default non-trapping FP environment: it is not uncommon for FP hardware to be slow on denormal values. With the operation + select approach, spurious computation on denormals in the masked-off lanes could occur, slowing down the program.
If your target has no masked FP ops (SSE, NEON, ...), you can still use EVL and have the backend lower it to a "select-safe-inputs-on-masked-off-lanes + fp-operation" pattern. If you emit that pattern too early, InstCombine etc. might fold it away, also because IR optimizations cannot distinguish between a select that was part of the original program and a select that was inserted only to give the backend a matchable pattern.
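For illustration, a minimal sketch of that lowering; the neutral inputs chosen here (zeros) are just one safe choice, the backend would pick whatever is safe for the target:

  ; masked-off lanes receive inputs that can neither trap nor hit slow
  ; denormals; the plain, unmasked fsub then runs across all lanes
  %safe_x = select <4 x i1> %M, <4 x float> %x, <4 x float> zeroinitializer
  %safe_y = select <4 x i1> %M, <4 x float> %y, <4 x float> zeroinitializer
  %sub    = fsub <4 x float> %safe_x, %safe_y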
>>> My context for these questions is that my experience recently w/
>>> existing masked intrinsics shows us missing fairly basic
>>> optimizations, precisely because they weren't able to reuse all of
>>> the existing infrastructure. (I've been working on
>>> SimplifyDemandedVectorElts recently for exactly this reason.) My
>>> concern is that your EVL proposal will end up in the same state.
>> I think that's just the nature of the beast. We need IR-level support
>> for masking and we have to teach LLVM about it.
> I'm solidly of the opinion that we already have IR support for
> explicit masking in the form of gather/scatter/etc... Until someone
> has taken the effort to make masking in this context *actually work
> well*, I'm unconvinced that we should greatly expand the usage in the
> IR.
What do you mean by "make masking work well"? LLVM's vectorization support is stuck in ~2007 (SSE, ...), with patched-in intrinsics to support masked load/store and gather/scatter on AVX2.
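For reference, this is the kind of patched-in support I mean; these declarations exist today (signatures abbreviated), but they are intrinsics rather than first-class IR:

  ; masked memory access is expressed through intrinsics, and only a
  ; few targets lower them to actual masked instructions
  declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32, <4 x i1>, <4 x float>)
  declare void @llvm.masked.store.v4f32.p0v4f32(<4 x float>, <4 x float>*, i32, <4 x i1>)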
I think this is a chicken-and-egg problem: LLVM's LoopVectorizer is rather limited, and that is used to argue that better IR support for predication is not necessary. However, if we had better IR support, more aggressive vectorization schemes would become possible. Right now, people who are serious about exploiting a SIMD ISA use target-specific intrinsics to get the functionality they need.
--
Simon Moll Researcher / PhD Student
Compiler Design Lab (Prof. Hack) Saarland University, Computer Science Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll