`_mm512_shrdv*` intrinsics have incorrect argument order

I tried this code:

```rust
#![feature(stdarch_x86_avx512)]

use std::arch::x86_64::*;

fn main() {
    unsafe {
        let a = _mm512_set1_epi32(0xffff);
        let b = _mm512_setzero_epi32();
        let c = _mm512_set1_epi32(1);

        let dst = _mm512_shrdv_epi32(a, b, c);
        println!("{}", _mm512_cvtsi512_si32(dst));
    }
}
```

I expected to see this happen:

The code produces the same output as the equivalent C program:

```c
#include <immintrin.h>
#include <stdio.h>

int main() {
    __m512i a = _mm512_set1_epi32(0xffff);
    __m512i b = _mm512_setzero_epi32();
    __m512i c = _mm512_set1_epi32(1);

    __m512i dst = _mm512_shrdv_epi32(a, b, c);
    printf("%u\n", _mm512_cvtsi512_si32(dst));
}
```

The C program outputs 32767.

Instead, the Rust program outputs -2147483648.


Intel's documentation for `_mm512_shrdv_epi32` (as linked from the function's rustdoc) states:

> Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.

```
FOR j := 0 to 15
    i := j*32
    dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ENDFOR
dst[MAX:512] := 0
```

meaning argument `b` supplies the upper 32 bits and `a` the lower 32 bits. However, `llvm.fshr.*` uses the opposite order: its first operand supplies the upper bits. It appears that Rust forwards the arguments `a`, `b`, and `c` to `llvm.fshr` in that order:

https://github.com/rust-lang/stdarch/blob/b1edbf90955cb9b057a323f761e2c19edb591e6f/crates/core_arch/src/x86/avx512vbmi2.rs#L997-L999
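
To make the mismatch concrete, here is a plain scalar model of one 32-bit lane (the helper names `shrdv_lane_intel` and `fshr32` are made up for illustration). Intel's intrinsic takes the low half first, while `llvm.fshr` takes the high half first, so forwarding the arguments unchanged swaps the halves:

```rust
/// One lane of VPSHRDVD per Intel's pseudocode: `a` is the LOW half,
/// `b` is the HIGH half of the 64-bit intermediate.
fn shrdv_lane_intel(a: u32, b: u32, c: u32) -> u32 {
    ((((b as u64) << 32) | a as u64) >> (c & 31)) as u32
}

/// Scalar equivalent of llvm.fshr.i32: the FIRST operand is the HIGH half.
fn fshr32(hi: u32, lo: u32, s: u32) -> u32 {
    ((((hi as u64) << 32) | lo as u64) >> (s & 31)) as u32
}

fn main() {
    let (a, b, c) = (0xffffu32, 0u32, 1u32);
    // Documented behavior: ((0 << 32) | 0xffff) >> 1 = 0x7fff = 32767.
    assert_eq!(shrdv_lane_intel(a, b, c), 32767);
    // What forwarding (a, b, c) to fshr computes instead: the low 32 bits
    // of (0xffff_0000_0000 >> 1) are 0x8000_0000, i.e. -2147483648 as i32.
    assert_eq!(fshr32(a, b, c) as i32, i32::MIN);
    // Swapping the first two operands recovers the documented result.
    assert_eq!(fshr32(b, a, c), 32767);
}
```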

This likely also applies to all similar intrinsics that lower to `llvm.fshr`.
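
Until the lowering is corrected in stdarch, one possible workaround (a sketch based on the diagnosis above, not an endorsed fix) is to swap the first two operands at the call site, which cancels the reversal:

```rust
#![feature(stdarch_x86_avx512)]

use std::arch::x86_64::*;

fn main() {
    unsafe {
        let a = _mm512_set1_epi32(0xffff);
        let b = _mm512_setzero_epi32();
        let c = _mm512_set1_epi32(1);

        // Assumption: the buggy lowering forwards (a, b, c) straight to
        // llvm.fshr, so passing (b, a, c) here yields the documented result.
        // This call becomes wrong again once the bug is fixed.
        let dst = _mm512_shrdv_epi32(b, a, c);
        println!("{}", _mm512_cvtsi512_si32(dst)); // prints 32767 on affected nightlies
    }
}
```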

Meta

`rustc --version --verbose`:

```
rustc 1.83.0-nightly (0609062a9 2024-09-13)
binary: rustc
commit-hash: 0609062a91c8f445c3e9a0de57e402f9b1b8b0a7
commit-date: 2024-09-13
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0
```