Consider implementing ARM64 __load_acquire/__stlr intrinsics · Issue #62103 · llvm/llvm-project

As of VS 2022 17.6 Preview 3, MSVC supports the following ARM64 intrinsics used by its STL:

    unsigned __int8  __load_acquire8 (const volatile unsigned __int8  * _Target);
    unsigned __int16 __load_acquire16(const volatile unsigned __int16 * _Target);
    unsigned __int32 __load_acquire32(const volatile unsigned __int32 * _Target);
    unsigned __int64 __load_acquire64(const volatile unsigned __int64 * _Target);

    void __stlr8 (volatile unsigned __int8  * _Target, unsigned __int8  _Value);
    void __stlr16(volatile unsigned __int16 * _Target, unsigned __int16 _Value);
    void __stlr32(volatile unsigned __int32 * _Target, unsigned __int32 _Value);
    void __stlr64(volatile unsigned __int64 * _Target, unsigned __int64 _Value);
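To make the semantics concrete, here is a minimal sketch of portable stand-ins with the same shapes as the declarations above, assuming the GCC/Clang __atomic builtins. The compat_* names are hypothetical (chosen to avoid the reserved double-underscore names), and std::uint8_t through std::uint64_t stand in for MSVC's unsigned __intN types. This is not how Clang would implement the intrinsics; it is only an equivalent at the memory-model level. On AArch64, Clang and GCC typically lower an acquire load to LDAR (or LDAPR when the RCpc extension is enabled) and a release store to STLR.

```cpp
#include <cstdint>

// Hypothetical portable equivalents of MSVC's __load_acquire* intrinsics:
// each performs a single load with acquire ordering.
inline std::uint8_t compat_load_acquire8(const volatile std::uint8_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}
inline std::uint16_t compat_load_acquire16(const volatile std::uint16_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}
inline std::uint32_t compat_load_acquire32(const volatile std::uint32_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}
inline std::uint64_t compat_load_acquire64(const volatile std::uint64_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}

// Hypothetical portable equivalents of MSVC's __stlr* intrinsics:
// each performs a single store with release ordering.
inline void compat_stlr8(volatile std::uint8_t* target, std::uint8_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}
inline void compat_stlr16(volatile std::uint16_t* target, std::uint16_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}
inline void compat_stlr32(volatile std::uint32_t* target, std::uint32_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}
inline void compat_stlr64(volatile std::uint64_t* target, std::uint64_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}
```

A release store through compat_stlr32 paired with an acquire load through compat_load_acquire32 on the same variable gives the usual message-passing guarantee: writes made before the store are visible to the thread once its load observes the stored value.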

As I understand it, the __load_acquire intrinsics emit either the ldar or the ldapr instruction (according to criteria that are beyond my cat-sized brain 🐱 🧠), while the __stlr intrinsics emit the stlr instruction. Each of these is a single instruction with built-in ordering, which is significantly more efficient than what was previously possible (a plain load or store paired with a separate memory barrier).
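The efficiency gain comes from replacing a load or store plus a separate barrier with one ordered instruction. The sketch below contrasts the two shapes for 32-bit values, again assuming the GCC/Clang __atomic builtins. The function names are hypothetical, and the fence-based variants illustrate the general pattern only; they are not the STL's actual classic codepath.

```cpp
#include <cstdint>

// Fence-based acquire: relaxed load, then an acquire fence.
// On AArch64 this typically becomes a plain LDR followed by a DMB barrier.
inline std::uint32_t load_acquire_fenced(const volatile std::uint32_t* p) {
    std::uint32_t v = __atomic_load_n(p, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return v;
}

// Single-instruction acquire: on AArch64 typically LDAR (or LDAPR with RCpc).
inline std::uint32_t load_acquire_direct(const volatile std::uint32_t* p) {
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

// Fence-based release: release fence, then relaxed store.
// On AArch64 this typically becomes a DMB barrier followed by a plain STR.
inline void store_release_fenced(volatile std::uint32_t* p, std::uint32_t v) {
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(p, v, __ATOMIC_RELAXED);
}

// Single-instruction release: on AArch64 typically STLR.
inline void store_release_direct(volatile std::uint32_t* p, std::uint32_t v) {
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}
```

Both variants provide acquire/release semantics; the difference is in the generated code. Inspecting the AArch64 assembly for each pair is an easy way to see the extra barrier in the fence-based versions.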

Currently, MSVC's STL uses its classic (slower) codepaths when compiled with Clang/LLVM for ARM64. It would be nice if Clang added support for the new, faster intrinsics.