[llvm-dev] [RFC] Vector Predication
Bruce Hoult via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 1 02:58:48 PST 2019
On Fri, Feb 1, 2019 at 2:09 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> Neat! I did not know that about the V extension. So this sounds as though the V extension would like support for <VL x <4 x float>>-style vectors as well.
Yes. In general, support for <VL x <M x iN>> where M is in {2,4,8} and N could be as small as 1, though support for smaller than i8 is optional. (No distinction is drawn between int and float in the vector configuration -- that's up to the operations performed.)
> We are currently thinking of defining the extension in terms of a 16-bit prefix that changes standard 32-bit instructions into vectorized 48-bit instructions, allowing most future or current standard/non-standard extensions to be vectorized, rather than having to wait for additional extensions to have vector versions added to the V extension (one reason we are not using the V extension instead), such as the B extension.
Do you mean instructions following the standard 48-bit encoding scheme, that happen to contain a standard 32 bit instruction as a payload?
> Having a prefix rather than, or in addition to, a layout configuration register allows intermixing vector operations on different group/element sizes without having to constantly change the vector configuration every few instructions.
No real difference. The standard RISC-V Vector extension is intended to allow exactly those changes to the vector configuration every few instructions. It's mostly the microcontroller people coming from DSP/SIMD who want to do that, so it's up to them to make that efficient on their cores -- they might even do macro-op fusion on it. Big OoO/Supercomputer style code compiled from C/FORTRAN in general doesn't want to do that kind of thing.
Example code that changes the configuration within a loop to do 16 bit loads, 16x16->32 multiply, then 32 bit shift and store:
    # Example: Load 16-bit values, widen multiply to 32b, shift 32b result
    # right by 3, store 32b values.

    loop:
        vsetvli a3, a0, vsew16,vlmul4  # vtype = 16-bit integer vectors
        vlh.v v4, (a1)                 # Get 16b vector
        slli t1, a3, 1                 # Bytes loaded = vl * 2
        add a1, a1, t1                 # Bump pointer
        vwmul.vs v8, v4, v1            # 32b results in the v8 group

        vsetvli x0, a0, vsew32,vlmul8  # Operate on 32b values
        vsrl.vi v8, v8, 3
        vsw.v v8, (a2)                 # Store vector of 32b
        slli t1, a3, 2                 # Bytes stored = vl * 4
        add a2, a2, t1                 # Bump pointer
        sub a0, a0, a3                 # Decrement count
        bnez a0, loop                  # Any more?
(this example is probably only useful if 16x16->32 mul is significantly faster than 32x32->32, otherwise you'd just load and sign extend the 16 bit data into 32 bit elements)
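For anyone not used to the draft V mnemonics, here is a rough scalar model of what the loop computes, written in Python. It is only a sketch under a few assumptions that are mine and not from the spec text quoted here: `vlmax` is a made-up hardware limit, `vsetvli` is modelled as returning min(remaining, vlmax), and the `.vs` form of `vwmul` is taken to use a single scalar operand.

```python
def widen_mul_shift(src, scalar, vlmax=8):
    """Scalar model of the strip-mined loop: 16-bit inputs, 32-bit outputs.

    vlmax is a hypothetical per-iteration element limit, not a real parameter.
    """
    def s16(x):
        # Interpret the low 16 bits as a signed value.
        x &= 0xFFFF
        return x - 0x10000 if x & 0x8000 else x

    out = []
    i, remaining = 0, len(src)          # load offset (a1) and count (a0)
    while remaining:                    # bnez a0, loop
        vl = min(remaining, vlmax)      # vsetvli a3, a0, ... (assumed behaviour)
        for x in src[i:i + vl]:         # vlh.v v4, (a1)
            prod = (s16(x) * s16(scalar)) & 0xFFFFFFFF  # vwmul: 16x16 -> 32
            out.append(prod >> 3)       # vsrl.vi: logical shift on 32 bits
        i += vl                         # bump pointers by vl elements
        remaining -= vl                 # sub a0, a0, a3
    return out
```

Note that the element count per iteration is the same before and after the second vsetvli; only the element width (and hence the byte stride of the pointer bumps) changes.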
A note on vector register numbering. There are registers 0..31. If you specify vlmul4 then only v0,v4,v8,v12,v16,v20,v24,v28 are valid register numbers. If you specify vlmul8 then only v0,v8,v16,v24 are valid.
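The numbering rule can be written down in a couple of lines of Python. This is a sketch only: the helper name `valid_vregs` is made up, and extending the rule to vlmul1 and vlmul2 (every register, and every second register) is my assumption by analogy with the two cases stated above.

```python
def valid_vregs(lmul, nregs=32):
    """List the register names usable when registers are grouped by lmul.

    Hypothetical helper: valid names are the multiples of lmul below nregs.
    """
    if lmul not in (1, 2, 4, 8):
        raise ValueError("lmul must be 1, 2, 4 or 8")
    return [f"v{n}" for n in range(0, nregs, lmul)]
```

For example, valid_vregs(8) gives v0, v8, v16 and v24, which is why the loop above can keep using v8 after switching from vlmul4 to vlmul8.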