Optimize Span.Copy and Span.TryCopyTo by GrabYourPitchforks · Pull Request #15947 · dotnet/coreclr

@AndyAyersMS @jkotas And here's the x86 disassembly: original and modified. Ignore the comparisons at the very beginning and very end of the function as I've moved some things around a bit.

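In particular, check out the code around the movdqu block. In the modified version it seems to be less compact and to shuffle registers more: both addresses are loaded from the stack ([ebp+8] and [ebp+0Ch]) before the copy, then reloaded, advanced by 40h, and written back afterwards, where the original just adds 40h to ecx and edx.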

Original (pointer-based)

53c15feb 8bd8        mov     ebx,eax
53c15fed c1eb06      shr     ebx,6
53c15ff0 f30f6f02    movdqu  xmm0,xmmword ptr [edx]
53c15ff4 f30f7f01    movdqu  xmmword ptr [ecx],xmm0
53c15ff8 f30f6f4210  movdqu  xmm0,xmmword ptr [edx+10h]
53c15ffd f30f7f4110  movdqu  xmmword ptr [ecx+10h],xmm0
53c16002 f30f6f4220  movdqu  xmm0,xmmword ptr [edx+20h]
53c16007 f30f7f4120  movdqu  xmmword ptr [ecx+20h],xmm0
53c1600c f30f6f4230  movdqu  xmm0,xmmword ptr [edx+30h]
53c16011 f30f7f4130  movdqu  xmmword ptr [ecx+30h],xmm0
53c16016 83c140      add     ecx,40h
53c16019 83c240      add     edx,40h

Modified (ref-based)

53b9cbf5 8bf1        mov     esi,ecx
53b9cbf7 c1ee06      shr     esi,6
53b9cbfa 8b7d08      mov     edi,dword ptr [ebp+8]
53b9cbfd 8b5d0c      mov     ebx,dword ptr [ebp+0Ch]
53b9cc00 f30f6f07    movdqu  xmm0,xmmword ptr [edi]
53b9cc04 f30f7f03    movdqu  xmmword ptr [ebx],xmm0
53b9cc08 f30f6f4710  movdqu  xmm0,xmmword ptr [edi+10h]
53b9cc0d f30f7f4310  movdqu  xmmword ptr [ebx+10h],xmm0
53b9cc12 f30f6f4720  movdqu  xmm0,xmmword ptr [edi+20h]
53b9cc17 f30f7f4320  movdqu  xmmword ptr [ebx+20h],xmm0
53b9cc1c f30f6f4730  movdqu  xmm0,xmmword ptr [edi+30h]
53b9cc21 f30f7f4330  movdqu  xmmword ptr [ebx+30h],xmm0
53b9cc26 8b7d0c      mov     edi,dword ptr [ebp+0Ch]
53b9cc29 83c740      add     edi,40h
53b9cc2c 897d0c      mov     dword ptr [ebp+0Ch],edi
53b9cc2f 8b7d08      mov     edi,dword ptr [ebp+8]
53b9cc32 83c740      add     edi,40h
53b9cc35 897d08      mov     dword ptr [ebp+8],edi
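
For anyone reading along without the source handy, here is a rough C analogue of the shape both listings seem to implement: a block count computed as length >> 6, four 16-byte load/store pairs per block, then both addresses advanced by 40h. This is only a sketch, not the C# code from the PR. The names `chunk16`, `copy16`, and `copy_blocks` are made up for illustration, and the loop control and tail handling are assumptions, since the excerpts above do not show them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte chunk: one xmm register's worth of data. */
typedef struct { unsigned char bytes[16]; } chunk16;

/* Copy one 16-byte chunk through a temporary, mirroring a movdqu
   load/store pair. memcpy keeps this valid for unaligned addresses. */
static void copy16(unsigned char *dst, const unsigned char *src)
{
    chunk16 tmp;
    memcpy(&tmp, src, sizeof tmp);   /* load:  movdqu xmm0, [src] */
    memcpy(dst, &tmp, sizeof tmp);   /* store: movdqu [dst], xmm0 */
}

/* Copy len bytes from src to dst. The buffers must not overlap. */
void copy_blocks(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    size_t blocks = len >> 6;        /* number of 64-byte blocks (shr reg,6) */
    while (blocks-- > 0) {
        copy16(dst + 0x00, src + 0x00);
        copy16(dst + 0x10, src + 0x10);
        copy16(dst + 0x20, src + 0x20);
        copy16(dst + 0x30, src + 0x30);
        dst += 0x40;                 /* add reg,40h */
        src += 0x40;                 /* add reg,40h */
    }

    /* Assumed tail handling: copy the remaining 0..63 bytes. */
    memcpy(dst, src, len & 0x3F);
}
```

In the pointer-based listing the two advancing addresses live in ecx and edx for the whole block, which is what you would hope for from the `dst += 0x40; src += 0x40;` lines here. The ref-based listing performs the same copy but round-trips both addresses through [ebp+8] and [ebp+0Ch], which accounts for the extra mov instructions.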