[llvm-dev] [RFC] Vector Predication

Bruce Hoult via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 1 02:58:48 PST 2019

On Fri, Feb 1, 2019 at 2:09 AM Jacob Lifshay <programmerjake at gmail.com> wrote:

> Neat! I did not know that about the V extension. So this sounds as though the V extension would like support for <VL x <4 x float>>-style vectors as well.

Yes. In general, support for <VL x <M x iN>> where M is in {2,4,8} and N could be as small as 1, though support for elements smaller than i8 is optional. (No distinction is drawn between int and float in the vector configuration -- that's up to the operations performed.)

> We are currently thinking of defining the extension in terms of a 16-bit prefix that changes standard 32-bit instructions into vectorized 48-bit instructions. That would allow most current or future standard/non-standard extensions, such as the B extension, to be vectorized without having to wait for vector versions of them to be added to the V extension (one reason we are not using the V extension instead).

Do you mean instructions following the standard 48-bit encoding scheme that happen to contain a standard 32-bit instruction as a payload?

> Having a prefix rather than, or in addition to, a layout configuration register allows intermixing vector operations on different group/element sizes without having to constantly change the vector configuration every few instructions.

No real difference. The standard RISC-V Vector extension is intended to allow exactly those changes to the vector configuration every few instructions. It's mostly the microcontroller people coming from DSP/SIMD who want to do that, so it's up to them to make that efficient on their cores -- they might even do macro-op fusion on it. Big OoO/Supercomputer style code compiled from C/FORTRAN in general doesn't want to do that kind of thing.

Example code that changes the configuration within a loop to do 16-bit loads, a 16x16->32 widening multiply, then a 32-bit shift and store:

    # Example: Load 16-bit values, widen multiply to 32b, shift 32b result
    # right by 3, store 32b values.

    loop:
        vsetvli a3, a0, vsew16,vlmul4  # vtype = 16-bit integer vectors
        vlh.v v4, (a1)          # Get 16b vector
          slli t1, a3, 1        # vl * 2 bytes per 16b element
          add a1, a1, t1        # Bump pointer
        vwmul.vs v8, v4, v1     # 32b products in v8

        vsetvli x0, a0, vsew32,vlmul8  # Operate on 32b values
        vsrl.vi v8, v8, 3
        vsw.v v8, (a2)          # Store vector of 32b
          slli t1, a3, 2        # vl * 4 bytes per 32b element
          add a2, a2, t1        # Bump pointer
          sub a0, a0, a3        # Decrement count
          bnez a0, loop         # Any more?

(This example is probably only useful if a 16x16->32 multiply is significantly faster than 32x32->32; otherwise you'd just load and sign-extend the 16-bit data into 32-bit elements.)
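
As a rough cross-check of what the loop above computes, here is a minimal Python sketch of its element-level behaviour. It is a model under stated assumptions, not the spec's definition: the names `vlmax` and `widen_mul_shift` and the `vlen_bits` parameter are invented for illustration, VLMAX is assumed to be VLEN * LMUL / SEW, the scalar operand of the multiply is modelled as a plain integer, and inputs are assumed to fit in signed 16 bits.

```python
def vlmax(vlen_bits, sew, lmul):
    """Assumed VLMAX: elements per register group = VLEN * LMUL / SEW."""
    return vlen_bits * lmul // sew


def widen_mul_shift(src, scalar, vlen_bits=128):
    """Model the strip-mined loop: 16b inputs -> 32b products -> logical >> 3."""
    # Both configurations give the same VLMAX because SEW/LMUL is unchanged
    # (16/4 == 32/8), so the second vsetvli yields the same vl as the first.
    chunk_max = vlmax(vlen_bits, sew=16, lmul=4)
    assert chunk_max == vlmax(vlen_bits, sew=32, lmul=8)

    out = []
    pos = 0
    remaining = len(src)                  # a0: elements still to process
    while remaining:
        vl = min(remaining, chunk_max)    # vsetvli: vl = min(AVL, VLMAX)
        chunk = src[pos:pos + vl]         # vlh.v: load vl 16-bit elements
        # Widening multiply: keep the low 32 bits of each product.
        prods = [(x * scalar) & 0xFFFFFFFF for x in chunk]
        # vsrl.vi is a logical shift, so it acts on the unsigned bit pattern.
        out.extend(p >> 3 for p in prods)
        pos += vl                         # pointer bumps, in elements
        remaining -= vl                   # sub a0, a0, a3
    return out
```

Passing a small `vlen_bits` (for example 8, which gives a two-element chunk in this model) forces several trips around the loop, which is a convenient way to check that the tail iteration with a short `vl` is handled.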

A note on vector register numbering. There are registers 0..31. If you specify vlmul4 then only v0,v4,v8,v12,v16,v20,v24,v28 are valid register numbers. If you specify vlmul8 then only v0,v8,v16,v24 are valid.
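
The numbering rule above can be sketched in a few lines of Python. The helper names `valid_group_bases` and `group_members` are hypothetical. Two things here go beyond what is stated above and are assumptions: that vlmul1 and vlmul2 follow the same pattern as vlmul4 and vlmul8, and that a group consists of the named register plus the consecutive registers after it.

```python
def valid_group_bases(lmul, num_regs=32):
    """Register names usable as operands for a given vlmul setting."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("lmul must be 1, 2, 4 or 8")
    # Only register numbers that are multiples of lmul are valid.
    return [f"v{n}" for n in range(0, num_regs, lmul)]


def group_members(base, lmul):
    """Registers assumed to make up the group that starts at v<base>."""
    if base % lmul != 0:
        raise ValueError(f"v{base} is not a valid register for vlmul{lmul}")
    return [f"v{n}" for n in range(base, base + lmul)]
```

Under these assumptions, `valid_group_bases(8)` lists the four names given above for vlmul8, and `group_members(8, 8)` names the registers that a vlmul8 operand written as v8 would occupy.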
