Improve ARM64 atomics for Clang by StephanTLavavej · Pull Request #4870 · microsoft/STL

This mirrors @mcfi's MSVC-PR-567635 "Leverage clang builtins __atomic_load_n/__atomic_store_n for more efficient acquired loads and released stores on Arm64" as of Iteration 13. His description:

Clang doesn't support __load_acquire/__ldar/__stlr intrinsics, so applications built with clang still generate full barriers for acquired loads and released stores. This PR changes the STL code to leverage clang builtins __atomic_load_n/__atomic_store_n to generate more efficient ldar/stlr for acquired loads and released stores.
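For illustration only, here is a minimal sketch of the two builtins named above. The helper names `load_acquire` and `store_release` are hypothetical and are not the STL's internal functions; the actual changes live in the PR diff. When targeting ARM64, Clang is expected to lower these calls to `ldar` and `stlr` rather than a plain access plus a full barrier, which can be checked by inspecting the generated assembly.

```cpp
// Sketch assuming Clang (or GCC); these builtins are not available in MSVC.
// load_acquire and store_release are hypothetical names used for illustration.

// Acquire load: later reads and writes cannot be reordered before it.
inline int load_acquire(int* ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

// Release store: earlier reads and writes cannot be reordered after it.
inline void store_release(int* ptr, int value) {
    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
}
```

A typical pairing is a producer that writes its data and then calls `store_release` on a flag, and a consumer that calls `load_acquire` on that flag before reading the data.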

This improved a benchmark score by ~2.8% on real hardware.

Resolves llvm/llvm-project#62103 because we're going to use Clang's builtins now.

Works towards #1133.