[llvm-dev] [RFC] Vector Predication
Jacob Lifshay via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 1 03:45:34 PST 2019
On Fri, Feb 1, 2019 at 2:59 AM Bruce Hoult <brucehoult at sifive.com> wrote:
> On Fri, Feb 1, 2019 at 2:09 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > Neat! I did not know that about the V extension. So this sounds as though the V extension would like support for <VL x <4 x float>>-style vectors as well.
> Yes. In general, support for <VL x <M x iN>> where M is in {2,4,8} and N could be as small as 1, though support for smaller than i8 is optional. (No distinction is drawn between int and float in the vector configuration -- that's up to the operations performed.)

> > We are currently thinking of defining the extension in terms of a 16-bit prefix that changes standard 32-bit instructions into vectorized 48-bit instructions. This allows most current or future standard/non-standard extensions, such as the B extension, to be vectorized, rather than having to wait for vector versions of them to be added to the V extension (one reason we are not using the V extension instead).

> Do you mean instructions following the standard 48-bit encoding scheme that happen to contain a standard 32-bit instruction as a payload?

Yes. We reuse the 2 LSBs from the 32-bit instruction (since they are constant) to allow for more prefix bits. An example prefix scheme (that took the complexity waaay too far, we're working on that):
https://salsa.debian.org/Kazan-team/kazan/blob/0c5abb5d35b03c52a21a54d4002f76bcec6c5d1d/docs/Prefix%20Proposal.md
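As a rough illustration of the bit budget: in the standard RISC-V length encoding, a 32-bit instruction always has 0b11 in its two lowest bits, and a 48-bit instruction has 0b011111 in its six lowest bits. Dropping the two constant bits leaves a 30-bit payload, so 48 - 6 - 30 = 12 bits remain for the prefix. The field placement below (payload directly above the length marker, prefix on top) is purely hypothetical and is not the layout from the linked proposal; `pack_prefixed` and `unpack_prefixed` are made-up names.

```python
LEN48_MARKER = 0b011111              # low 6 bits of a standard 48-bit RISC-V instruction
PAYLOAD_BITS = 30                    # 32-bit instruction minus its constant low bits (0b11)
PREFIX_BITS = 48 - 6 - PAYLOAD_BITS  # 12 bits left over for the prefix


def pack_prefixed(insn32: int, prefix: int) -> int:
    """Wrap a 32-bit instruction and a prefix in one 48-bit word (hypothetical layout)."""
    if not 0 <= insn32 < (1 << 32):
        raise ValueError("instruction does not fit in 32 bits")
    # A 32-bit encoding has bits [1:0] == 0b11 and bits [4:2] != 0b111.
    if insn32 & 0b11 != 0b11 or (insn32 >> 2) & 0b111 == 0b111:
        raise ValueError("not a standard 32-bit instruction encoding")
    if not 0 <= prefix < (1 << PREFIX_BITS):
        raise ValueError("prefix does not fit in the available bits")
    payload = insn32 >> 2            # drop the two constant low bits
    return (prefix << (6 + PAYLOAD_BITS)) | (payload << 6) | LEN48_MARKER


def unpack_prefixed(insn48: int) -> tuple[int, int]:
    """Recover (insn32, prefix) from a word produced by pack_prefixed."""
    if insn48 & 0b111111 != LEN48_MARKER:
        raise ValueError("not a 48-bit instruction encoding")
    payload = (insn48 >> 6) & ((1 << PAYLOAD_BITS) - 1)
    prefix = insn48 >> (6 + PAYLOAD_BITS)
    return (payload << 2) | 0b11, prefix   # restore the constant low bits
```

For example, `pack_prefixed(0x00B50533, 0xABC)` wraps `add a0, a0, a1` with an arbitrary 12-bit prefix value, and `unpack_prefixed` returns the original pair.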
> > Having a prefix rather than, or in addition to, a layout configuration register allows intermixing vector operations on different group/element sizes without having to constantly change the vector configuration every few instructions.

> No real difference. The standard RISC-V Vector extension is intended to allow exactly those changes to the vector configuration every few instructions. It's mostly the microcontroller people coming from DSP/SIMD who want to do that, so it's up to them to make that efficient on their cores -- they might even do macro-op fusion on it.

Yeah, that works, but you need a larger instruction fetch bandwidth.
> Big OoO/Supercomputer style code compiled from C/FORTRAN in general doesn't want to do that kind of thing.

We're aiming for SIMT-style code (Vulkan Shaders) converted into variable-length vector operations, so it's different than either microcontroller or supercomputer styles. Before vectorization, short vectors are used to represent:
- colors (RGBA)
- positions (XYZ)
- geometric vectors (XYZ)
- transformation matrices (4x4 or 4x3/3x4)
- positions in homogeneous coordinates (XYZW)
- and more.
The short vectors are used more as a grouping mechanism (like a struct or class) rather than just a method of improving performance.
One problem with the V extension in this use case is that 3-element vectors (pre-vectorization) are quite common, so if there were a mechanism to natively support them, we could pack them tightly in registers and ALUs, preventing a 25% performance loss.
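The 25% figure follows from simple lane counting, assuming the only alternative to native 3-element groups is padding each vec3 out to 4 elements. A small sketch of that arithmetic (the helper name `padding_waste` is hypothetical):

```python
from fractions import Fraction


def padding_waste(elements: int, padded_to: int) -> Fraction:
    """Fraction of lanes holding padding when each group of `elements`
    values occupies `padded_to` lanes."""
    if not 0 < elements <= padded_to:
        raise ValueError("need 0 < elements <= padded_to")
    return Fraction(padded_to - elements, padded_to)


print(padding_waste(3, 4))  # prints 1/4, i.e. one lane in four does no useful work
```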
An example:
http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html

Relevant section reproduced for convenience:

    struct VertexIn
    {
        vec3 position;
        vec3 normal;
        vec4 color; // rgba
    };

    struct VertexOut
    {
        vec4 position; // xyzw
        vec4 color;
    };

    VertexIn vertexes_in[];
    VertexOut vertexes_out[];
    vec3 light_dir;
    float ambient, diffuse;

    for(int i = 0; i < 1000; i++)
    {
        // calculate vertex colors using
        // lambert's cos model and fixed ambient brightness
        vec3 n = vertexes_in[i].normal;
        vec3 l = light_dir;
        float dot = n.x * l.x + n.y * l.y + n.z * l.z;
        float brightness = max(dot, 0.0) * diffuse + ambient;
        vec4 c = vertexes_in[i].color;
        c.rgb *= brightness;
        vertexes_out[i].color = c;

        // orthographic projection
        vertexes_out[i].position = vec4(vertexes_in[i].position, 1.0);
    }
Vectorization produces:

    for(int i = 0; i < 1000;)
    {
        // process up to VL vertices per iteration
        VL = setvl(1000 - i);

        // vertex colors: one vec3/vec4 group per vertex, VL vertices at once
        vec3xVL n = load3xVL_strided(&vertexes_in[i].normal, sizeof(VertexIn));
        vec3 l = light_dir;
        vecVL dot = n.x * l.x + n.y * l.y + n.z * l.z;
        vecVL brightness = max(dot, 0.0) * diffuse + ambient;
        vec4xVL c = load4xVL_strided(&vertexes_in[i].color, sizeof(VertexIn));
        vec3xVL c_rgb = c.rgb;
        c_rgb *= brightness;
        c.rgb = c_rgb;
        store4xVL_strided(&vertexes_out[i].color, c, sizeof(VertexOut));

        // orthographic projection: w is set to 1.0, xyz is loaded
        vec4xVL p = 1.0;
        p.xyz = load3xVL_strided(&vertexes_in[i].position, sizeof(VertexIn));
        store4xVL_strided(&vertexes_out[i].position, p, sizeof(VertexOut));

        i += VL;
    }
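To check that the strip-mined loop computes the same colors as the scalar one, here is a small simulation in plain Python. `MAX_VL`, the behaviour of `setvl` (returning `min(remaining, MAX_VL)`), and the use of list slices in place of the strided loads are all assumptions made for illustration; real hardware chooses VL itself. Only the color computation is modelled, not the position copy.

```python
MAX_VL = 8  # assumed hardware limit on vector length, for illustration only


def setvl(remaining):
    """Model of setvl: grant the requested element count, capped at MAX_VL."""
    return min(remaining, MAX_VL)


def shade_scalar(normals, colors, light_dir, ambient, diffuse):
    """One vertex per iteration, as in the original loop."""
    lx, ly, lz = light_dir
    out = []
    for (nx, ny, nz), (r, g, b, a) in zip(normals, colors):
        dot = nx * lx + ny * ly + nz * lz
        brightness = max(dot, 0.0) * diffuse + ambient
        out.append((r * brightness, g * brightness, b * brightness, a))
    return out


def shade_strip_mined(normals, colors, light_dir, ambient, diffuse):
    """Up to VL vertices per iteration, as in the vectorized loop."""
    lx, ly, lz = light_dir
    out = []
    i = 0
    while i < len(normals):
        vl = setvl(len(normals) - i)
        n = normals[i:i + vl]  # stands in for load3xVL_strided
        c = colors[i:i + vl]   # stands in for load4xVL_strided
        # Each comprehension models one vector operation across vl lanes.
        dot = [nx * lx + ny * ly + nz * lz for (nx, ny, nz) in n]
        brightness = [max(d, 0.0) * diffuse + ambient for d in dot]
        out.extend((r * s, g * s, b * s, a)
                   for (r, g, b, a), s in zip(c, brightness))
        i += vl
    return out
```

Calling both functions on a vertex count that is not a multiple of `MAX_VL` (19, say) exercises the shorter final iteration, where `setvl` returns fewer than `MAX_VL` elements.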
Jacob Lifshay