Implement all x86 vendor intrinsics · Issue #40 · rust-lang/stdarch

This is intended to be a tracking issue for implementing all vendor intrinsics in this repository.
This issue is also intended to be a guide for documenting the process of adding new vendor intrinsics to this crate.

If you decide to implement a set of vendor intrinsics, please check the list below to make sure somebody else isn't already working on them. If an item isn't checked off and doesn't have a name next to it, feel free to comment that you'd like to implement it!

At a high level, each vendor intrinsic should correspond to a single exported Rust function with an appropriate target_feature attribute. Here's an example for _mm_adds_epi16:

    /// Add packed 16-bit integers in a and b using saturation.
    #[inline]
    #[target_feature(enable = "sse2")]
    #[cfg_attr(test, assert_instr(paddsw))]
    pub unsafe fn _mm_adds_epi16(a: __m128i, b: __m128i) -> __m128i {
        unsafe { paddsw(a, b) }
    }

Let's break this down:

Once a function has been added, you should also add at least one test for basic functionality. Here's an example for _mm_adds_epi16:

    #[simd_test = "sse2"]
    unsafe fn test_mm_adds_epi16() {
        let a = _mm_set_epi16(0, 1, 2, 3, 4, 5, 6, 7);
        let b = _mm_set_epi16(8, 9, 10, 11, 12, 13, 14, 15);
        let r = _mm_adds_epi16(a, b);
        let e = _mm_set_epi16(8, 10, 12, 14, 16, 18, 20, 22);
        assert_eq_m128i(r, e);
    }

Note that #[simd_test] behaves like #[test]. It is a custom macro that enables the target feature inside the test and generates a wrapper that checks the feature is available on the local CPU before running it.
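To make the wrapper idea concrete, here is a minimal hand-written sketch of the pattern, not the macro's actual expansion. It assumes the stable std::arch API rather than this crate's internals, and the names adds_lanes and run_if_sse2 are hypothetical. The body is compiled with SSE2 enabled, and the outer function only calls it after a runtime feature check.

```rust
// Sketch only: a hand-written stand-in for what a #[simd_test]-style wrapper
// is assumed to do. `adds_lanes` and `run_if_sse2` are hypothetical names.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Compiled with SSE2 enabled, like the body of a SIMD test.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn adds_lanes(x: i16, y: i16) -> [i16; 8] {
    unsafe {
        // Broadcast x and y to all eight lanes and add with saturation.
        let r = _mm_adds_epi16(_mm_set1_epi16(x), _mm_set1_epi16(y));
        // __m128i and [i16; 8] are both 16 bytes wide.
        std::mem::transmute::<__m128i, [i16; 8]>(r)
    }
}

// Returns None when the feature (or the architecture) is unavailable,
// which is where a test wrapper would skip instead of running.
fn run_if_sse2(x: i16, y: i16) -> Option<[i16; 8]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // The feature was just detected at runtime, so the call is sound.
            return Some(unsafe { adds_lanes(x, y) });
        }
    }
    let _ = (x, y); // keeps other targets free of unused-variable warnings
    None
}

fn main() {
    match run_if_sse2(i16::MAX, 1) {
        Some(lanes) => println!("saturated lanes: {:?}", lanes),
        None => println!("sse2 unavailable; skipped"),
    }
}
```

When SSE2 is present, adding 1 to i16::MAX should leave every lane at i16::MAX, since the instruction saturates instead of wrapping.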

Finally, once that's done, send a PR!

Writing the implementation

An implementation of an intrinsic (so far) generally has one of three shapes:

  1. The vendor intrinsic does not have any corresponding compiler intrinsic, so you must write the implementation in such a way that the compiler will recognize it and produce the desired codegen. For example, the _mm_add_epi16 intrinsic (note the missing s in add) is implemented via simd_add(a, b), which compiles down to LLVM's cross-platform SIMD vector API.
  2. The vendor intrinsic does have a corresponding compiler intrinsic, so you must write an extern block to bring that intrinsic into scope and then call it. The example above (_mm_adds_epi16) uses this approach.
  3. The vendor intrinsic has a parameter that must be a constant value when given to the CPU instruction, where that constant is often a parameter that impacts the operation of the intrinsic. This means the implementation of the vendor intrinsic must guarantee that a particular parameter be a constant. This is tricky because Rust doesn't (yet) have a stable way of doing this, so we have to do it ourselves. How you do it can vary, but one particularly gnarly example is _mm_cmpestri (make sure to look at the constify_imm8! macro).
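The third shape can be sketched as follows. This is an illustration under stated assumptions, not this crate's code: constify_imm2! and shift_left are hypothetical, scaled-down stand-ins for the constify_imm8! approach (four match arms instead of 256), and the sketch calls the stable std::arch form of _mm_slli_epi16, which takes its shift count as a const generic argument.

```rust
// Sketch only: `constify_imm2!`, `shift_left_sse2` and `shift_left` are
// hypothetical stand-ins for the constify_imm8! approach.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Turn a runtime value into one of four literal constants by matching on it.
// Each arm expands the given macro with a literal the compiler can see.
#[allow(unused_macros)]
macro_rules! constify_imm2 {
    ($imm:expr, $expand:ident) => {
        match ($imm) & 0b11 {
            0 => $expand!(0),
            1 => $expand!(1),
            2 => $expand!(2),
            _ => $expand!(3),
        }
    };
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn shift_left_sse2(lanes: [i16; 8], imm: i32) -> [i16; 8] {
    unsafe {
        let a: __m128i = std::mem::transmute(lanes);
        // Instantiates the intrinsic with a constant shift count per arm.
        macro_rules! call {
            ($n:literal) => {
                _mm_slli_epi16::<{ $n }>(a)
            };
        }
        let r = constify_imm2!(imm, call);
        std::mem::transmute::<__m128i, [i16; 8]>(r)
    }
}

// Safe entry point: returns None when SSE2 (or x86_64) is unavailable.
fn shift_left(lanes: [i16; 8], imm: i32) -> Option<[i16; 8]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            return Some(unsafe { shift_left_sse2(lanes, imm) });
        }
    }
    let _ = (lanes, imm); // keeps other targets free of unused-variable warnings
    None
}

fn main() {
    // The shift count is a runtime value here; the match makes it constant.
    let count = std::env::args().count() as i32 + 1;
    println!("{:?}", shift_left([1; 8], count));
}
```

The cost of this design is code size: every possible value of the parameter gets its own match arm and its own copy of the instruction, which is why the 8-bit version is described above as gnarly.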

References

All Intel intrinsics can be found here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=5236

The compiler intrinsics available to us through LLVM can be found here: https://gist.github.com/anonymous/a25d3e3b4c14ee68d63bd1dcb0e1223c

The Intel vendor intrinsic API can be found here: https://gist.github.com/anonymous/25d752fda8521d29699a826b980218fc

The Clang header files for vendor intrinsics can also be incredibly useful. When in doubt, Do What Clang Does:
https://github.com/llvm-mirror/clang/tree/master/lib/Headers

TODO

["AVX2"]

