
On Mon, Feb 4, 2019, 15:32 Robin Kruppe via llvm-dev <llvm-dev@lists.llvm.org> wrote:
On Mon, 4 Feb 2019 at 23:04, Simon Moll <moll@cs.uni-saarland.de> wrote:

On NEC SX-Aurora the vector length is always interpreted in 64-bit data chunks. That is one example of a real architecture where the vscaled interpretation of VL makes sense.

Now this is a problem. Let's leave the details of why RISC-V V needs the other interpretation to Phab, but we definitely have a conflict between what these two architectures need. How do we reconcile them? Picking one option and requiring a multiplication/division of the vlen argument to get the other meaning gives a nice canonical IR form, but it seems a bit problematic for codegen: that mul/div is a prime candidate for being CSE'd across blocks (it's a pure calculation, repeated everywhere), which makes it difficult to access for pattern matching in the backend.
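
For concreteness, a rough sketch in IR. The stand-in @evl.add.v4i32 and its signature are invented for illustration here, not taken from the actual proposal, and it is declared as an ordinary function rather than an intrinsic so the snippet stays verifier-clean. Suppose chunks were the canonical interpretation, so an element-counting target has to scale the operand:

  ; Invented stand-in for a VL intrinsic taking a mask and a VL operand.
  declare <4 x i32> @evl.add.v4i32(<4 x i32>, <4 x i32>, <4 x i1>, i32)

  define <4 x i32> @step(<4 x i32> %a, <4 x i32> %b, <4 x i1> %m, i32 %vl.elems) {
    ; 2 x i32 elements per 64-bit chunk, so scale the element count down.
    %vl.chunks = lshr i32 %vl.elems, 1
    %x = call <4 x i32> @evl.add.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %m, i32 %vl.chunks)
    ret <4 x i32> %x
  }

In a real loop, CSE/LICM will hoist that lshr away from the calls that use it, so an ISel pattern that wants to match the scaling together with the intrinsic and fold both into one instruction no longer sees the two nodes side by side.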

On the other hand, it's a less serious problem than what was previously discussed re: vlen vs predication. The actual change in codegen is just omitting one instruction, which one can easily do in an SSA-based MIR pass if necessary (instead of during ISel). Moreover, the cost of a missed folding opportunity is relatively minor: it'll most likely be just a shift by an immediate, and in a lot of code it'll be amortized over basically the entire loop body.
It's only minor for power-of-2 subvector sizes. Our ISA supports subvectors of size 3, and division by 3 is much more complex than a shift. The code we will be generating will also contain a lot of subvectors of size 3, since they are used wherever a 3D position or normal vector appears (basically at least a few times in each shader).
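
To make the difference concrete, here is a sketch of the usual strength reduction, not code from any particular backend: converting an element count to a subvector count is a single shift for size 2, but for size 3 the udiv expands into a widening multiply by a magic constant plus a shift:

  ; Elements -> subvectors of size 2: one shift.
  define i32 @subvecs_of2(i32 %vl) {
    %n = lshr i32 %vl, 1
    ret i32 %n
  }

  ; Elements -> subvectors of size 3: standard magic-number expansion of
  ; udiv by 3, where 2863311531 = 0xAAAAAAAB = ceil(2^33 / 3).
  define i32 @subvecs_of3(i32 %vl) {
    %wide = zext i32 %vl to i64
    %mul  = mul i64 %wide, 2863311531
    %shr  = lshr i64 %mul, 33
    %n    = trunc i64 %shr to i32
    ret i32 %n
  }

That's several instructions, including a widening multiply, in place of one shift, repeated for every VL conversion unless the backend manages to fold it.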

Still, does anyone have a better idea?

Cheers,
Robin
_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev