Consider implementing ARM64 __load_acquire/__stlr intrinsics · Issue #62103 · llvm/llvm-project
As of VS 2022 17.6 Preview 3, MSVC supports the following ARM64 intrinsics used by its STL:
unsigned __int8 __load_acquire8 (const volatile unsigned __int8 * _Target);
unsigned __int16 __load_acquire16(const volatile unsigned __int16 * _Target);
unsigned __int32 __load_acquire32(const volatile unsigned __int32 * _Target);
unsigned __int64 __load_acquire64(const volatile unsigned __int64 * _Target);

void __stlr8 (volatile unsigned __int8 * _Target, unsigned __int8 _Value);
void __stlr16(volatile unsigned __int16 * _Target, unsigned __int16 _Value);
void __stlr32(volatile unsigned __int32 * _Target, unsigned __int32 _Value);
void __stlr64(volatile unsigned __int64 * _Target, unsigned __int64 _Value);
According to my understanding, the __load_acquire intrinsics emit either the ldar or the ldapr instruction (according to criteria that are beyond my cat-sized brain 🐱 🧠), while the __stlr intrinsics emit the stlr instruction. These are significantly more efficient than what was previously possible.
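For illustration only, here is a rough sketch of the semantics I have in mind, written with the GCC/Clang `__atomic` builtins rather than the MSVC intrinsics. The `sketch_*` names are hypothetical and are not part of MSVC, its STL, or Clang. My understanding is that on AArch64 targets these builtins are generally lowered to ldar (or ldapr when RCpc is available) and stlr, but the exact instruction selection depends on the compiler version and target flags, so treat that as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for __load_acquire32: an acquire load through a
// const volatile pointer, using the GCC/Clang __atomic_load_n builtin.
inline std::uint32_t sketch_load_acquire32(const volatile std::uint32_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}

// Hypothetical stand-in for __stlr32: a release store through a volatile
// pointer, using the GCC/Clang __atomic_store_n builtin.
inline void sketch_stlr32(volatile std::uint32_t* target, std::uint32_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}

// Single-threaded round trip: it only checks that the stored value is the
// one read back; it does not demonstrate cross-thread ordering.
inline std::uint32_t sketch_round_trip(std::uint32_t value) {
    volatile std::uint32_t slot = 0;
    sketch_stlr32(&slot, value);
    const std::uint32_t seen = sketch_load_acquire32(&slot);
    assert(seen == value);
    return seen;
}
```

The 8-, 16-, and 64-bit variants would presumably follow the same pattern with the corresponding fixed-width types.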
Currently, MSVC's STL is using its classic (slower) codepaths for Clang/LLVM ARM64. It would be nice if Clang added support for the new faster intrinsics.