Improve ARM64 atomics for Clang by StephanTLavavej · Pull Request #4870 · microsoft/STL
This mirrors @mcfi's MSVC-PR-567635 "Leverage clang builtins `__atomic_load_n`/`__atomic_store_n` for more efficient acquired loads and released stores on Arm64" as of Iteration 13. His description:
Clang doesn't support `__load_acquire`/`__ldar`/`__stlr` intrinsics, so applications built with clang still generate full barriers for acquired loads and released stores. This PR changes the STL code to leverage clang builtins `__atomic_load_n`/`__atomic_store_n` to generate more efficient `ldar`/`stlr` for acquired loads and released stores.
This improved a benchmark score by ~2.8% on real hardware.
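For readers unfamiliar with the builtins, here is a minimal sketch of the idea, not the actual STL change. The helper names `load_acquire_sketch` and `store_release_sketch` are hypothetical and do not appear in the STL sources; the sketch assumes a compiler that provides the GCC-style `__atomic_*` builtins (Clang or GCC). When targeting Arm64, the expectation described above is that these calls lower to `ldar`/`stlr` rather than a plain access plus a full barrier.

```cpp
// Sketch only: assumes Clang or GCC, which provide the __atomic_* builtins.
#if !defined(__clang__) && !defined(__GNUC__)
#error "This sketch assumes a compiler with GCC-style __atomic builtins."
#endif

// Hypothetical helper: acquire load of an int.
// On Arm64 this is expected to become an `ldar` instruction.
inline int load_acquire_sketch(const int* ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

// Hypothetical helper: release store of an int.
// On Arm64 this is expected to become an `stlr` instruction.
inline void store_release_sketch(int* ptr, int value) {
    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
}
```

A producer would call `store_release_sketch(&flag, 1)` after writing its data, and a consumer would poll `load_acquire_sketch(&flag)` before reading it. To check the generated instructions, one could compile with Clang for an Arm64 target and inspect the assembly.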
Resolves llvm/llvm-project#62103 because we're going to use Clang's builtins now.
Works towards #1133.