NVPTX: Add f16 SIMD intrinsics by kjetilkjeka · Pull Request #1626 · rust-lang/stdarch

CUDA seems to have an API for f16 intrinsics, should we be mirroring that API instead?

I see that the CUDA function names are consistent with the PTX ISA. I will rename all the functions to follow this naming (shared by CUDA and the PTX ISA). I will not adopt the leading double underscore __, as that is a C detail that comes from not having namespaces. I'm very happy with this solution.

The part I'm a bit more unsure about is naming the type half2. The CUDA C type is, at least in one aspect, closer to something found in core::simd, as a big part of the problem it solves is being able to use the same vector instructions on both x86_64 and nvptx64 targets. The core::arch::nvptx types, by contrast, are only usable on PTX, and if it is acceptable to keep f16x2 I think that is the better name (it is consistent with the PTX ISA and, as a bonus, familiar to Rust users).
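
To make the naming question concrete, here is a minimal sketch of what this could look like. It is not the code in this PR: the module `nvptx_sketch` and the function name `hadd2` (CUDA's `__hadd2` without the `__` prefix) are hypothetical, and `f32` lanes stand in for `f16` so the sketch compiles on stable Rust on any host.

```rust
// Hypothetical stand-in for core::arch::nvptx; NOT the real stdarch API.
mod nvptx_sketch {
    /// Two packed lanes, named after the PTX ISA `.f16x2` type.
    /// Assumption: f32 stands in for f16 so this builds on stable Rust.
    #[allow(non_camel_case_types)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct f16x2(pub [f32; 2]);

    /// Lane-wise add. The name mirrors CUDA's `__hadd2` without the
    /// leading `__`, since the module path already provides a namespace.
    pub fn hadd2(a: f16x2, b: f16x2) -> f16x2 {
        f16x2([a.0[0] + b.0[0], a.0[1] + b.0[1]])
    }
}

fn main() {
    use nvptx_sketch::{f16x2, hadd2};

    // Callers reach the intrinsic through the module path, not a prefix.
    let sum = hadd2(f16x2([1.0, 2.0]), f16x2([0.5, 0.25]));
    println!("{:?}", sum);
}
```

The point of the sketch is only that the module path does the job the `__` prefix does in C, and that `f16x2` reads naturally next to other Rust vector type names.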