Improve Guid equality checks on 64-bit platforms by BruceForstall · Pull Request #35654 · dotnet/runtime

Did you consider a vectorized approach?

I didn't. I was specifically looking to improve arm64, and x64 as well, while keeping the change simple and cross-platform.

Is it safe for arm32 as well (accessing misaligned long)?

RyuJIT converts a 64-bit long access to two 32-bit int accesses on 32-bit platforms, so there is no difference in alignment for 32-bit. There could be a difference in alignment on 64-bit platforms, since a Guid could presumably be only 4-byte aligned.
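To make the alignment question concrete, here is a minimal C sketch of comparing a 16-byte value as two 64-bit words. It is only an illustration of the idea, not the C# code from this PR: `guid16` and `guid16_equals` are hypothetical names, and the field layout is an assumption chosen to give a 16-byte struct with 4-byte alignment. The loads go through `memcpy`, so the sketch makes no assumption about 8-byte alignment.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte value type (an assumption for illustration).
   Its strictest member is a uint32_t, so it is only 4-byte aligned. */
typedef struct {
    uint32_t a;
    uint16_t b;
    uint16_t c;
    uint8_t  d[8];
} guid16;

/* Compare two 16-byte values as two 64-bit words each.
   memcpy performs the loads, so no 8-byte alignment is required;
   compilers typically lower these fixed-size copies to plain loads
   on targets that allow unaligned access. */
static bool guid16_equals(const guid16 *x, const guid16 *y)
{
    uint64_t x_lo, x_hi, y_lo, y_hi;

    memcpy(&x_lo, (const unsigned char *)x, sizeof x_lo);
    memcpy(&x_hi, (const unsigned char *)x + 8, sizeof x_hi);
    memcpy(&y_lo, (const unsigned char *)y, sizeof y_lo);
    memcpy(&y_hi, (const unsigned char *)y + 8, sizeof y_hi);

    return x_lo == y_lo && x_hi == y_hi;
}
```

On a 32-bit target each 64-bit load is naturally split into two 32-bit loads, which is the behavior described above for RyuJIT.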

fyi, here's the x86 code difference. It appears RyuJIT doesn't handle the Unsafe.Add calls well.

current x86 assembly

G_M51749_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
                    ;; bbWeight=1 PerfScore 1.25
G_M51749_IG02:
       8B4518       mov      eax, dword ptr [ebp+18H]
       3B4508       cmp      eax, dword ptr [ebp+08H]
       7532         jne      SHORT G_M51749_IG05
                    ;; bbWeight=1 PerfScore 3.00
G_M51749_IG03:
       8D4518       lea      eax, bword ptr [ebp+18H]
       8B4004       mov      eax, dword ptr [eax+4]
       8D5508       lea      edx, bword ptr [ebp+08H]
       3B4204       cmp      eax, dword ptr [edx+4]
       7524         jne      SHORT G_M51749_IG05
       8D4518       lea      eax, bword ptr [ebp+18H]
       8B4008       mov      eax, dword ptr [eax+8]
       8D5508       lea      edx, bword ptr [ebp+08H]
       3B4208       cmp      eax, dword ptr [edx+8]
       7516         jne      SHORT G_M51749_IG05
       8D4518       lea      eax, bword ptr [ebp+18H]
       8B400C       mov      eax, dword ptr [eax+12]
       8D5508       lea      edx, bword ptr [ebp+08H]
       3B420C       cmp      eax, dword ptr [edx+12]
       0F94C0       sete     al
       0FB6C0       movzx    eax, al
                    ;; bbWeight=0.50 PerfScore 9.13
G_M51749_IG04:
       5D           pop      ebp
       C22000       ret      32
                    ;; bbWeight=0.50 PerfScore 1.25
G_M51749_IG05:
       33C0         xor      eax, eax
                    ;; bbWeight=0.50 PerfScore 0.13
G_M51749_IG06:
       5D           pop      ebp
       C22000       ret      32

new x86 assembly

G_M51749_IG02:
       8B442414     mov      eax, dword ptr [esp+14H]
       8B542418     mov      edx, dword ptr [esp+18H]
       33442404     xor      eax, dword ptr [esp+04H]
       33542408     xor      edx, dword ptr [esp+08H]
       0BC2         or       eax, edx
       751B         jne      SHORT G_M51749_IG05
                    ;; bbWeight=1 PerfScore 5.25
G_M51749_IG03:
       8B44241C     mov      eax, dword ptr [esp+1CH]
       8B542420     mov      edx, dword ptr [esp+20H]
       3344240C     xor      eax, dword ptr [esp+0CH]
       33542410     xor      edx, dword ptr [esp+10H]
       0BC2         or       eax, edx
       0F94C0       sete     al
       0FB6C0       movzx    eax, al
                    ;; bbWeight=0.50 PerfScore 2.75
G_M51749_IG04:
       C22000       ret      32
                    ;; bbWeight=0.50 PerfScore 1.00
G_M51749_IG05:
       33C0         xor      eax, eax
                    ;; bbWeight=0.50 PerfScore 0.13
G_M51749_IG06:
       C22000       ret      32
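For readers less used to JIT dumps, the xor/or sequence in the new listing is the usual way a 64-bit equality test is expressed with 32-bit registers. The C sketch below mirrors that shape. It is an illustrative reading of the assembly, not code from the PR, and `equal64_via_32` and `equal128_via_32` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit equality using only 32-bit operations: xor each half
   (the result is zero only if the halves match), or the two
   results together, and test the combined value for zero. */
static bool equal64_via_32(uint32_t a_lo, uint32_t a_hi,
                           uint32_t b_lo, uint32_t b_hi)
{
    uint32_t diff_lo = a_lo ^ b_lo;   /* xor eax, [b_lo] */
    uint32_t diff_hi = a_hi ^ b_hi;   /* xor edx, [b_hi] */
    return (diff_lo | diff_hi) == 0;  /* or eax, edx; test for zero */
}

/* 16-byte equality as two such 64-bit tests, returning early
   when the first 8 bytes already differ. */
static bool equal128_via_32(const uint32_t x[4], const uint32_t y[4])
{
    if (!equal64_via_32(x[0], x[1], y[0], y[1]))
        return false;                 /* the jne to the "return 0" block */
    return equal64_via_32(x[2], x[3], y[2], y[3]);
}
```

Read this way, the new code needs one conditional branch for the whole comparison, where the current listing has three.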