Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for bf16(xN), i1xN and x86amx
by sayantn · Pull Request #140763 · rust-lang/rust
This PR changes how LLVM intrinsics are codegen'd.
Explanation of the changes
Current procedure
This is the same for all functions; LLVM intrinsics are not treated specially.
- We get the LLVM type of a function simply from the argument and return types of its Rust signature. For example, the following function
#[link_name = "llvm.sqrt.f32"]
fn sqrtf32(a: f32) -> f32;
will simply have the LLVM type f32 (f32), due to the Rust signature.
Pros
- Simpler to implement, no extra complexity involved due to LLVM intrinsics
Cons
- LLVM intrinsics have a well-defined signature, completely defined by their name (and, if it is overloaded, the type parameters). So this process of converting Rust signatures to LLVM signatures may not work. For example, the following code generates LLVM IR without any problem
#[link_name = "llvm.sqrt.f32"]
fn sqrtf32(a: i32) -> f32;
but the generated LLVM IR is invalid, because it has the wrong signature for the intrinsic (Godbolt; adding -Zverify-llvm-ir to it will fail compilation). I would expect this code to not compile at all instead of generating invalid IR.
- LLVM intrinsics that have types in their signature that can't be accessed from Rust (notable examples are the AMX intrinsics that have the x86amx type, and (almost) all intrinsics that have vectors of i1 types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.
- If code uses a non-existent LLVM intrinsic, even -Zverify-llvm-ir won't complain. Eventually it will error out due to the non-existent function (courtesy of the linker). I don't think this is a behavior we want.
What this PR does
- When linking to non-overloaded intrinsics, we use the function LLVMIntrinsicGetType to directly get the function type of the intrinsic from LLVM.
- We then use this LLVM definition to verify the Rust signature, and emit a proper error if it doesn't match, instead of silently emitting invalid IR.
Note
This PR only focuses on non-overloaded intrinsics; overloaded ones can be handled in a future PR.
Regardless, the functionality listed below works for all intrinsics.
- If we can't find the intrinsic, we check if it has been AutoUpgraded by LLVM. If not, that means it is an invalid intrinsic, and we error out.
- Don't allow intrinsics from other archs to be declared, e.g. error out if an AArch64 intrinsic is declared when we are compiling for x86.
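The checks described above can be sketched as a small standalone model. This is not the rustc implementation: the `Ty`, `Sig`, `Entry` and `DeclError` types, the `known_intrinsics` table and the `check_declaration` function are all hypothetical names made up for illustration. The hand-written table stands in for what LLVM itself would report (e.g. through LLVMIntrinsicGetType), the arch strings are this sketch's own convention, and the AutoUpgrade fallback is left out.

```rust
use std::collections::HashMap;

/// Toy stand-in for an LLVM type (hypothetical; rustc works with LLVM type handles).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Void,
    I32,
    I64,
}

/// Toy stand-in for a function signature: parameter types plus return type.
#[derive(Debug, Clone, PartialEq)]
struct Sig {
    params: Vec<Ty>,
    ret: Ty,
}

/// The three kinds of hard error this sketch models.
#[derive(Debug, PartialEq)]
enum DeclError {
    UnknownIntrinsic,
    WrongArch { intrinsic_arch: String },
    SignatureMismatch { expected: Sig },
}

/// One table row: the arch the intrinsic belongs to (None = target-independent)
/// and the signature LLVM defines for it.
struct Entry {
    arch: Option<&'static str>,
    sig: Sig,
}

/// Hand-written table of a few non-overloaded intrinsics. In the real compiler
/// this information would come from LLVM, not from a hard-coded map.
fn known_intrinsics() -> HashMap<&'static str, Entry> {
    let mut table = HashMap::new();
    table.insert("llvm.trap", Entry { arch: None, sig: Sig { params: vec![], ret: Ty::Void } });
    table.insert("llvm.x86.rdtsc", Entry { arch: Some("x86"), sig: Sig { params: vec![], ret: Ty::I64 } });
    table.insert("llvm.aarch64.dmb", Entry { arch: Some("aarch64"), sig: Sig { params: vec![Ty::I32], ret: Ty::Void } });
    table
}

/// Checks a declaration in the order described above: the intrinsic must exist,
/// must belong to the target arch (or be target-independent), and the Rust
/// signature must match the signature derived from the name.
fn check_declaration(name: &str, target_arch: &str, rust_sig: &Sig) -> Result<(), DeclError> {
    let table = known_intrinsics();
    let entry = table.get(name).ok_or(DeclError::UnknownIntrinsic)?;
    if let Some(arch) = entry.arch {
        if arch != target_arch {
            return Err(DeclError::WrongArch { intrinsic_arch: arch.to_string() });
        }
    }
    if &entry.sig != rust_sig {
        return Err(DeclError::SignatureMismatch { expected: entry.sig.clone() });
    }
    Ok(())
}
```

In this model, declaring llvm.x86.rdtsc with an extra i32 parameter, or declaring it while targeting aarch64, produces an error value instead of silently passing through.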
Pros
- It is now not possible (or at least, it would require jumping through significantly more hoops) to introduce invalid IR using non-overloaded LLVM intrinsics.
- As we are now doing the matching of Rust signatures to LLVM intrinsics ourselves, we can now add bypasses to enable linking to such non-Rust types (e.g. matching 8192-bit vectors to x86amx and injecting llvm.x86.cast.vector.to.tile and llvm.x86.cast.tile.to.vector calls at the callsite).
Note
I don't intend for these bypasses to be permanent (at least the bf16 and i1 ones; the x86amx bypass seems inevitable). A better approach would be introducing a bf16 type in Rust, and allowing repr(simd) with bools to get Rust-native i1xNs. These are meant to be short-lived, as I mentioned, "bypass"es. They shouldn't cause any major breakage even if removed, as link_llvm_intrinsics is perma-unstable.
This PR adds bypasses for bf16 (via i16), bf16xN (via i16xN), i1xN (via iM, where M is the smallest power of 2 s.t. M >= N, unless N <= 4, where we use M = 8), and x86amx (via 8192-bit vectors). This will unblock AVX512-VP2INTERSECT, AMX and a lot of bf16 intrinsics in stdarch. This PR also automatically destructures structs if the types don't exactly match (this is required for us to start emitting hard errors on mismatches).
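To make the bypass rules above concrete, here is a minimal sketch of how a matcher might classify a (Rust type, LLVM type) pair. It only encodes the rules as stated in this description; the `Ty` and `Bypass` enums and the functions `mask_int_bits`, `bit_width` and `bypass_for` are hypothetical names for illustration, and the actual casts emitted at the callsite are not modelled.

```rust
/// Toy type model shared by the Rust side and the LLVM side (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int(u32),
    BFloat,
    X86Amx,
    Vector(u32, Box<Ty>),
}

/// Which bypass, if any, lets the Rust type stand in for the LLVM type.
#[derive(Debug, PartialEq)]
enum Bypass {
    Exact,    // types already agree, nothing to do
    Bf16Bits, // i16 / i16xN standing in for bf16 / bf16xN
    Mask,     // iM standing in for i1xN
    AmxTile,  // 8192-bit vector standing in for x86amx
}

/// Width M of the integer used for an i1xN mask: the smallest power of two
/// with M >= N, except that N <= 4 uses M = 8.
fn mask_int_bits(lanes: u32) -> u32 {
    if lanes <= 4 { 8 } else { lanes.next_power_of_two() }
}

/// Total size in bits, or None for the opaque x86amx type (or on overflow).
fn bit_width(ty: &Ty) -> Option<u32> {
    match ty {
        Ty::Int(bits) => Some(*bits),
        Ty::BFloat => Some(16),
        Ty::X86Amx => None,
        Ty::Vector(n, elem) => bit_width(elem)?.checked_mul(*n),
    }
}

/// Returns the bypass that applies, or None if the pair is a genuine mismatch
/// (which would become a hard error).
fn bypass_for(rust: &Ty, llvm: &Ty) -> Option<Bypass> {
    if rust == llvm {
        return Some(Bypass::Exact);
    }
    match (rust, llvm) {
        (Ty::Int(16), Ty::BFloat) => Some(Bypass::Bf16Bits),
        (Ty::Vector(n, r), Ty::Vector(m, l))
            if n == m && **r == Ty::Int(16) && **l == Ty::BFloat =>
        {
            Some(Bypass::Bf16Bits)
        }
        (Ty::Int(bits), Ty::Vector(n, l))
            if **l == Ty::Int(1) && *bits == mask_int_bits(*n) =>
        {
            Some(Bypass::Mask)
        }
        (Ty::Vector(..), Ty::X86Amx) if bit_width(rust) == Some(8192) => Some(Bypass::AmxTile),
        _ => None,
    }
}
```

For instance, a Rust-side u8 would pair with an LLVM <2 x i1> under the mask rule (M = 8 because N <= 4), and a 256-lane vector of 32-bit integers is 8192 bits wide, so it would pair with x86amx.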
Cons
- This only works for non-overloaded intrinsics (at least for now). Improving this to work with overloaded intrinsics too will involve significantly more work.
Possible ways to extend this to overloaded intrinsics (future)
Parse the mangled intrinsic name to get the type parameters
LLVM has a stable mangling of intrinsic names with type parameters (in LLVMIntrinsicCopyOverloadedName2), so we can parse the name to get the type parameters, and then just do the same thing.
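As a rough illustration of the parsing idea, the sketch below decodes the type suffixes of a mangled intrinsic name for a small subset of LLVM's mangling (iN, f16, bf16, f32, f64, x86amx, and fixed-length vectors written as vN followed by the element type). The `LlvmTy` enum and both functions are hypothetical names, the base name is assumed to be known already, and pointers, scalable vectors, structs and TargetExt types are deliberately not handled.

```rust
/// Subset of LLVM types that this sketch can decode from a mangled name.
#[derive(Debug, Clone, PartialEq)]
enum LlvmTy {
    Int(u32),
    Half,
    BFloat,
    Float,
    Double,
    X86Amx,
    Vector(u32, Box<LlvmTy>),
}

/// Decodes one mangled type such as "i32", "bf16" or "v4f32".
/// Returns None for anything outside the supported subset.
fn parse_mangled_ty(s: &str) -> Option<LlvmTy> {
    match s {
        "f16" => return Some(LlvmTy::Half),
        "bf16" => return Some(LlvmTy::BFloat),
        "f32" => return Some(LlvmTy::Float),
        "f64" => return Some(LlvmTy::Double),
        "x86amx" => return Some(LlvmTy::X86Amx),
        _ => {}
    }
    if let Some(rest) = s.strip_prefix('v') {
        // A vector is "v", a decimal lane count, then the element type.
        let digits_end = rest.find(|c: char| !c.is_ascii_digit())?;
        let (count, elem) = rest.split_at(digits_end);
        let count: u32 = count.parse().ok()?;
        if count == 0 {
            return None;
        }
        return Some(LlvmTy::Vector(count, Box::new(parse_mangled_ty(elem)?)));
    }
    if let Some(bits) = s.strip_prefix('i') {
        // Only plain decimal digits are accepted as a bit width.
        if bits.bytes().all(|b| b.is_ascii_digit()) {
            return bits.parse().ok().map(LlvmTy::Int);
        }
    }
    None
}

/// Splits off the overload suffix after a known base name and decodes each
/// dot-separated type parameter.
fn parse_overload_suffix(name: &str, base: &str) -> Option<Vec<LlvmTy>> {
    let suffix = name.strip_prefix(base)?.strip_prefix('.')?;
    suffix.split('.').map(parse_mangled_ty).collect()
}
```

Splitting on dots is exactly where this simple approach stops working once struct or TargetExt names appear in the suffix.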
Pros
- For most intrinsics, this will work perfectly, and is an easy way to do this.
Cons
- The LLVM mangling is not perfectly reversible. When we have TargetExt types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs, as LLVM adds more mangling to the names.
Use the IITDescriptor table and the Rust function signature
We can use the base name to get the IITDescriptors of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.
Pros
- Doesn't have the above-mentioned limitation of the parsing approach; it behaves correctly even when there are identified structs and TargetExt types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names).
Cons
- Doesn't actually use the type parameters in the name, only uses the base name and the Rust signature to get the LLVM signature (although we can check that it is the correct name). It means there would be no way to (for example) link against llvm.sqrt.bf16 until we have bf16 types in Rust. Because if we are using u16s (or any other type) as bf16s, then the matcher will deduce that the signature is u16 (u16), not bf16 (bf16) (which would lead to an error because u16 is not a valid type parameter for llvm.sqrt), even though the intended type parameter is specified in the name.
- Much more complex, and hard to maintain as LLVM gets new IITDescriptorKinds.
These 2 approaches might give different results for the same function. Let's take
#[link_name = "llvm.is.constant.bf16"] fn foo(a: u16) -> bool
The name-based approach will decide that the type parameter is bf16 and the LLVM signature is i1 (bf16), and will inject some bitcasts at the callsite.
The IITDescriptor-based approach will decide that the LLVM signature is i1 (u16), will see that the name given doesn't match the expected name (llvm.is.constant.u16), and will error out.
Other things that this PR does
- Disables all ABI checks only for the unadjusted ABI to facilitate the implementation of AMX (otherwise passing 8192-bit vectors to the intrinsic won't be allowed). This is "safe" because this ABI is only used to link to LLVM intrinsics, and passing vectors of any length to LLVM intrinsics is fine, because they don't exist at the machine level.
- Removes unnecessary bitcasts in cg_llvm/builder::check_call (now renamed as cast_arguments due to its new counterpart cast_return). This was old code from when Rust used to pass non-erased lifetimes to LLVM.
Reviews are welcome, as this is my first time actually contributing to rustc
After CI is green, we would need a try build and a rustc-perf run.
@rustbot label T-compiler A-codegen A-LLVM
r? codegen