[llvm-dev] [RFC] Vector Predication

Luke Kenneth Casson Leighton via llvm-dev llvm-dev at lists.llvm.org
Mon Feb 4 16:54:37 PST 2019

with apologies for breaking the thread, i wasn't cc'd earlier in the conversation. http://lists.llvm.org/pipermail/llvm-dev/2019-January/129806.html

david, you wrote:

> I'm solidly of the opinion that we already have IR support for explicit masking in the form of gather/scatter/etc... Until someone has taken the effort to make masking in this context actually work well, I'm unconvinced that we should greatly expand the usage in the IR.

the problem with gather/scatter is that it requires moving the data, either register-to-register (MV) or through memory (LD/ST).

MV - particularly with quite large data sets - puts pressure on a microarchitecture to increase the size of the register file (otherwise data has to be pushed to stack).

LD/ST - as shown by Jeff Bush in his work on nyuzi - results in significant power consumption increases due to having to push data through the L1/L2 cache (which is all CAMs).

in SV we are deliberately dropping the vectorisation onto the standard register file precisely to avoid the need to exchange data between a special vector register file and a scalar register file.

additionally, the microarchitecture being designed happens to implement gather/scatter techniques internally whenever a predicate mask is used. it does this by pushing element operations into a multi-issue instruction queue and simply skipping the non-predicated elements. as a result we get 100% ALU utilisation even with back-to-back "if then else" inverted predicate masks: the non-inverted predicate issues one set of elements, and the inverted predicate issues exactly the complementary set.
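to make the utilisation argument concrete, here is a toy model in python. it is only a sketch and not the SV microarchitecture: the function name `issue_predicated` and the sample data are made up for illustration. it queues one element operation per active mask bit, and shows that the "then" mask plus its inverse together issue exactly one operation per element, with no wasted slots.

```python
def issue_predicated(op, src, dest, mask):
    """Apply op to each element whose mask bit is set; skip the rest.

    Returns the number of element operations actually issued.
    (Hypothetical helper, for illustration only.)
    """
    issued = 0
    for i, active in enumerate(mask):
        if active:
            dest[i] = op(src[i])
            issued += 1
    return issued


src = [3, -1, 4, -1, -5, 9, -2, 6]
mask = [x >= 0 for x in src]        # the "if" condition, evaluated per element
inverted = [not m for m in mask]    # the "else" side of the same condition

dest = [0] * len(src)
then_ops = issue_predicated(lambda x: x * 2, src, dest, mask)      # if x >= 0: x * 2
else_ops = issue_predicated(lambda x: -x, src, dest, inverted)     # else: -x

# Every element was handled by exactly one of the two passes.
total_ops = then_ops + else_ops
```

here `total_ops` equals `len(src)`: the two passes between them never issue an operation for a masked-out element.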

basically i feel that this is the right paradigm.

now, if a given ISA doesn't have predicate masks, then yes, absolutely, gather/scatter at the instruction level (as opposed to the micro-architectural level) is the correct way to emulate predication. instructions may be issued that exclude the non-predicated elements, put them into a group (even a SIMD fixed-width group), and re-extract them on the other side of the group-operation into the required destination registers.
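a rough sketch of that emulation in python, assuming a hypothetical fixed SIMD width of 4 and a made-up helper name `emulate_predicated_op` (neither comes from any real ISA or from LLVM): the active elements are gathered into packed groups, the operation runs across the whole group, and the results are scattered back to their original positions.

```python
def emulate_predicated_op(op, src, dest, mask, width=4):
    """Emulate a predicated element-wise op using gather / group-op / scatter.

    width is an assumed fixed SIMD group size. Unused lanes in the last
    group are padded with 0, so op must be safe to apply to 0.
    """
    # Indices of the elements the predicate selects.
    active = [i for i, m in enumerate(mask) if m]
    for start in range(0, len(active), width):
        idx = active[start:start + width]
        group = [src[i] for i in idx]             # gather into a packed group
        group += [0] * (width - len(group))       # pad the unused lanes
        result = [op(x) for x in group]           # one full-width group operation
        for lane, i in enumerate(idx):
            dest[i] = result[lane]                # scatter back to the destination
    return dest


src = [1, 2, 3, 4, 5, 6]
mask = [True, False, True, True, False, True]
out = emulate_predicated_op(lambda x: x * 10, src, [0] * len(src), mask)
```

elements whose mask bit is clear are never written, so `out` keeps its initial zeros in those positions.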

even the previously-mentioned SX-Aurora architecture (and other SIMD architectures) could use this trick to effectively "emulate" predication where the ISA doesn't have predicate masks. it can also be used to emulate variable-length vectors, by setting the top elements of a SIMD block to zero (or ignoring them entirely) and copying out only the lower-indexed elements with a scatter operation. whilst that is not particularly efficient, that's not LLVM's problem: SIMD architectures were designed the way they are because it's seductively simpler at the hardware level.
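the variable-length case can be sketched the same way. this is an illustration only: `SIMD_WIDTH`, `simd_add` and `vl_add` are hypothetical names, and the block width of 8 is an arbitrary assumption. the top lanes of the block are zeroed, the full-width operation runs, and only the lower `vl` elements are copied out.

```python
SIMD_WIDTH = 8  # assumed fixed hardware block width (hypothetical)


def simd_add(a, b):
    """Stand-in for a fixed-width SIMD add: always operates on a full block."""
    assert len(a) == len(b) == SIMD_WIDTH
    return [x + y for x, y in zip(a, b)]


def vl_add(a, b, vl):
    """Add the first vl elements of a and b using only the fixed-width op."""
    if not 0 <= vl <= SIMD_WIDTH:
        raise ValueError("vl must be between 0 and SIMD_WIDTH")
    pad = [0] * (SIMD_WIDTH - vl)                  # zero the top lanes
    block = simd_add(a[:vl] + pad, b[:vl] + pad)   # full-width operation
    return block[:vl]                              # copy out only the low lanes


result = vl_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 3)
```

the padded lanes still cost a full-width operation even though their results are thrown away, which is the inefficiency mentioned above.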

however, to expect an architecture that does support proper predication to complicate the way it does predication, by shoe-horning it into gather/scatter... that's sub-optimal, and i'm drawing a mental blank as to how it could be done at all, let alone done effectively and efficiently.

that's not to say that gather/scatter should be removed entirely: that would be a mistake. there are circumstances where gather/scatter is far better suited for use than predicate masks.

bottom line: i feel that expecting predication to be implemented in terms of gather/scatter is the wrong way round. the IR should have explicit and proper support for predicate masks, and architectures that don't have predicate masks should use gather/scatter instructions to emulate it.
