SIMD groundwork by huonw · Pull Request #1199 · rust-lang/rfcs

The Motivation section mentions how this RFC aims to provide just some ground-work on top of which nice SIMD functionality could be built. While builtin arithmetic, shuffles etc for repr(simd) types is nice and convenient, providing it at this level seems questionable. I think something like this could be accomplished inside the to-be-written SIMD library as well with some operator overloading and the intrinsic functions for basic arithmetic.

I agree that it isn't totally necessary to actually use the arithmetic operators: we could instead use a generic intrinsic similar to the comparison operators. However, I think it is important we do more than the platform intrinsics: LLVM (and compilers in general) knows more about its internal add instruction than arbitrary platform specific intrinsics, and so may be able to optimise it more aggressively.
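As a rough sketch of the library-level operator overloading suggested in the quoted comment, the lane-wise addition below is written as ordinary Rust that the compiler can reason about. The type name `F32x4` is hypothetical, and a plain array stands in for a `repr(simd)` struct, which this sketch does not use.

```rust
use std::ops::Add;

/// Hypothetical four-lane vector; a plain array stands in for a `repr(simd)` type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl Add for F32x4 {
    type Output = F32x4;

    // Lane-wise addition expressed as ordinary additions, not as a call to an
    // opaque platform-specific intrinsic.
    fn add(self, rhs: F32x4) -> F32x4 {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] + rhs.0[i];
        }
        F32x4(out)
    }
}
```

With this in place, `F32x4([1.0, 2.0, 3.0, 4.0]) + F32x4([10.0, 20.0, 30.0, 40.0])` uses the normal `+` operator.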

For shuffles, the same optimisation argument applies: the compiler can, e.g., simplify a sequence of shuffles into a single one. The RFC also discusses this. One point it makes is that the compiler can synthesize an optimal (or close to optimal) sequence of instructions for an arbitrary shuffle, instead of forcing the programmer to work that out themselves.
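To illustrate why a sequence of shuffles can be collapsed into one, here is a minimal sketch on plain arrays. The helpers `shuffle` and `compose` are hypothetical names for illustration, not part of the RFC: `compose` folds two index lists into a single list that has the same effect as applying them in turn.

```rust
/// Hypothetical helper: build a new vector by picking lanes of `v` listed in `idx`.
pub fn shuffle<const N: usize, const M: usize>(v: [f32; N], idx: [usize; M]) -> [f32; M] {
    idx.map(|i| v[i])
}

/// Fold "shuffle by `first`, then shuffle by `second`" into one index list.
pub fn compose<const N: usize, const M: usize>(
    first: [usize; N],
    second: [usize; M],
) -> [usize; M] {
    // Lane k of the final result is lane `second[k]` of the intermediate
    // vector, which is lane `first[second[k]]` of the original.
    second.map(|i| first[i])
}
```

For any in-range index lists, `shuffle(shuffle(v, a), b)` and `shuffle(v, compose(a, b))` give the same result, which is the kind of rewrite a compiler can only do when it understands the shuffle operation.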

I have a bad feeling about this. A regular method call shouldn't require parameters to be compile time constants. Using generics to express this requirement as shown here depends on #1062, but it would be a much cleaner solution.

This isn't a regular method call: intrinsics are special in many ways. Note that my solution on #1062 that you link to just calls the intrinsic. This is the low-level API, people generally won't be calling the intrinsics directly.

Wouldn't a compile error be nicer here?

Yes, sort of. However, using the trick mentioned in #1062 would result in very poor error messages, since the shuffle order may be passed through multiple layers of generic function calls possibly in external crates, meaning the out-of-bounds error is generated deep inside code that the programmer didn't write.
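For readers who want to see what a compile-time bounds check on shuffle indices can look like, here is a sketch using const generics, a feature that postdates this discussion. It is not the trick from #1062. The names `InBounds` and `shuffle2` are hypothetical.

```rust
/// Hypothetical helper that carries the bounds check as an associated const.
struct InBounds<const I: usize, const LANES: usize>;

impl<const I: usize, const LANES: usize> InBounds<I, LANES> {
    // Evaluated at compile time for each concrete (I, LANES) pair that is used.
    const CHECK: () = assert!(I < LANES, "shuffle index out of bounds");
}

/// Hypothetical two-lane shuffle whose indices are const generic parameters.
pub fn shuffle2<const I0: usize, const I1: usize>(v: [f32; 4]) -> [f32; 2] {
    // Referencing the consts forces the checks when this function is instantiated.
    let () = InBounds::<I0, 4>::CHECK;
    let () = InBounds::<I1, 4>::CHECK;
    [v[I0], v[I1]]
}
```

A call such as `shuffle2::<3, 0>(v)` compiles, while an out-of-range index is rejected at build time. The rejection only happens once the generic function is instantiated, so the check fires inside the helper, not in the caller's own code, which is the error-location concern raised above.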

Why implement compiler magic when these functions could be implemented in plain rust with asm!()?

As others have said, asm! is a black box, and seriously inhibits optimisations.

Additionally, inline asm allows the programmer to influence things like instruction scheduling and register allocation (within the asm section), in case the compiler is doing a bad job in that regard.

Neither of these applies here: the API is essentially exposing individual CPU instructions, i.e. each asm! block is a single instruction. Hence, there's no scheduling benefit, and none of the asm! blocks would use concrete registers: they'd all be "generic", to let the compiler allocate registers as it sees fit.

These reasons apply if one were, say, writing an entire inner loop as a single asm! block, but they don't apply here.
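As an illustration of a single-instruction block with "generic" registers, here is a sketch in the `asm!` syntax of current stable Rust (`std::arch::asm!`), which differs from the `asm!` macro that existed when this thread was written. The function name `add_one_instruction` is hypothetical, and the non-x86_64 fallback exists only so the sketch builds everywhere.

```rust
/// Adds two integers with one `add` instruction on x86_64.
#[cfg(target_arch = "x86_64")]
pub fn add_one_instruction(a: u64, b: u64) -> u64 {
    let mut acc = a;
    // SAFETY: a register-to-register `add` touches no memory and no stack.
    unsafe {
        std::arch::asm!(
            "add {acc}, {rhs}",
            // `reg` names a register class, not a concrete register, so the
            // compiler is free to pick whichever registers suit it.
            acc = inout(reg) acc,
            rhs = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    acc
}

/// Fallback for other targets so the sketch still compiles.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_one_instruction(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

There is nothing to schedule inside a one-instruction block and no register is pinned, so the only practical difference from writing `a + b` is that the compiler no longer knows the block is an addition.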

I'm unsure about what the repr(simd) introduced here really does. I guess its primary purpose is signaling to the compiler that this struct can live in the SIMD registers and be subject to SIMD operations (like this builtin arithmetic).

Yes. repr(simd) changes how a type is represented. E.g. it changes the alignment, imposes element constraints, and even changes its ABI (for function/FFI calls).
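To give a feel for how a representation attribute alters layout, the sketch below uses `repr(align(16))` purely as a stand-in; it is not `repr(simd)`, and it shows only the alignment aspect, not the element constraints or ABI changes. The names `Aligned` and `layouts` are hypothetical, and the numbers assume a target where `f32` is 4 bytes with 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for a vector type: four `f32` fields' worth of data, 16-byte aligned.
#[repr(C, align(16))]
pub struct Aligned(pub [f32; 4]);

/// Returns (size, alignment) for a bare array and for the aligned wrapper.
pub fn layouts() -> [(usize, usize); 2] {
    [
        (size_of::<[f32; 4]>(), align_of::<[f32; 4]>()),
        (size_of::<Aligned>(), align_of::<Aligned>()),
    ]
}
```

Both types hold the same sixteen bytes of data, but only the attributed one is guaranteed the stricter alignment.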

Concerning the structural typing when importing the intrinsics: Please be careful that this does not end up allowing people to "peek through" private abstractions of data-types. That would be a horrible mess of a safety issue.

It sort of does, but only in a very restricted way that is already possible with transmute.
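The transmute route referred to here can be sketched as follows. The module `vector`, the type `Quad`, and the function `peek` are hypothetical; `repr(C)` is used so that the layout the transmute relies on is well defined.

```rust
pub mod vector {
    /// The four fields are private to this module.
    #[repr(C)]
    pub struct Quad(f32, f32, f32, f32);

    impl Quad {
        pub fn new(a: f32, b: f32, c: f32, d: f32) -> Quad {
            Quad(a, b, c, d)
        }
    }
}

/// Reads the private fields from outside the module by reinterpreting the bytes.
pub fn peek(q: vector::Quad) -> [f32; 4] {
    // SAFETY: `Quad` is `repr(C)` with four `f32` fields, so it has the same
    // size and layout as `[f32; 4]`.
    unsafe { std::mem::transmute::<vector::Quad, [f32; 4]>(q) }
}
```

Code outside `vector` cannot name the fields of `Quad`, yet `peek` recovers their values, and it needs `unsafe` to do so.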

What's the reason for choosing this unconventional approach to typing, rather than using tuples, or arrays, or lang-items for the Simd* types?

Tuples and arrays don't have the right low-level details.

repr(simd) is essentially acting as a lang-item that can be defined multiple times. All of the actual lang items (i.e. #[lang = "..."]) in the compiler can only be defined once in the entire hierarchy of dependencies of a compilation target, which means we'd either have to allow multiple versions of these lang items, or just disallow linking multiple SIMD crates into a project (e.g. two different crates that define low-level SIMD interfaces, or even just versions 0.1 & 0.3 or 1.0 & 2.3 or ... of a single SIMD crate).