For the multiply case, your improved code requires duplicating a load: \[esp + 12\] is read twice, once by the mov into edx and once as the memory operand of mul. That's safe here because there are no intervening stores and the memory isn't volatile, but the register allocator would have to analyze the code to prove that duplicating the load is safe.
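
To illustrate why that proof is needed in general, here is a minimal C sketch, not taken from LLVM. The function names `load_once` and `load_twice` are hypothetical. It shows that re-reading a memory location is only equivalent to reusing the first value when no store can modify that location in between.

```c
#include <stdint.h>

/* Hypothetical example: reads *p once and reuses the value after the store. */
uint32_t load_once(uint32_t *p, uint32_t *q)
{
    uint32_t v = *p;    /* single load */
    *q = 0;             /* store that may alias *p */
    return v + v;
}

/* Same computation, but with the load duplicated after the store. */
uint32_t load_twice(uint32_t *p, uint32_t *q)
{
    uint32_t a = *p;    /* first load */
    *q = 0;             /* store that may alias *p */
    uint32_t b = *p;    /* second load: observes the store when p == q */
    return a + b;
}
```

When `p` and `q` point to different objects the two functions agree. When they alias, the results differ, so the duplication is only valid once aliasing (and volatility) has been ruled out.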

\~Craig

On Sat, Dec 1, 2018 at 9:38 AM Stefan Kanthak via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Compile the following functions with "-O3 -target i386-win32"
(see <https://godbolt.org/z/exmjWY>):

\_\_int64 \_\_fastcall div(\_\_int64 foo, \_\_int64 bar)
{
return foo / bar;
}

On the left the generated code; on the right the expected,
properly optimised code:

push dword ptr \[esp + 16\] |
push dword ptr \[esp + 16\] |
push dword ptr \[esp + 16\] |
push dword ptr \[esp + 16\] |
call \_\_alldiv | jmp \_\_alldiv
ret 16 |

\_\_int64 \_\_fastcall mod(\_\_int64 foo, \_\_int64 bar)
{
return foo % bar;
}

push dword ptr \[esp + 16\] |
push dword ptr \[esp + 16\] |
push dword ptr \[esp + 16\] |
push dword ptr \[esp + 16\] |
call \_\_allrem | jmp \_\_allrem
ret 16 |

\_\_int64 \_\_fastcall mul(\_\_int64 foo, \_\_int64 bar)
{
return foo \* bar;
}

push esi | mov ecx, dword ptr \[esp + 16\]
mov ecx, dword ptr \[esp + 16\] | mov edx, dword ptr \[esp + 12\]
mov esi, dword ptr \[esp + 8\] | imul edx, dword ptr \[esp + 8\]
mov eax, ecx | mov eax, dword ptr \[esp + 4\]
imul ecx, dword ptr \[esp + 12\] | imul ecx, eax
mul esi | add ecx, edx
imul esi, dword ptr \[esp + 20\] | mul dword ptr \[esp + 12\]
add edx, ecx | add edx, ecx
add edx, esi | ret 16
pop esi |
ret 16 |
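
For reference, both instruction sequences above compute the low 64 bits of the product from 32-bit halves. The following C sketch spells out that decomposition. The helper name `mul64_from_halves` is hypothetical, and it uses unsigned types on the assumption that only the low 64 bits matter, which are the same for signed and unsigned multiplication.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit multiply built from 32-bit pieces. */
uint64_t mul64_from_halves(uint64_t foo, uint64_t bar)
{
    uint32_t foo_lo = (uint32_t)foo, foo_hi = (uint32_t)(foo >> 32);
    uint32_t bar_lo = (uint32_t)bar, bar_hi = (uint32_t)(bar >> 32);

    /* Widening 32x32 -> 64 multiply; corresponds to the "mul" instruction. */
    uint64_t low = (uint64_t)foo_lo * bar_lo;

    /* Cross terms only affect the upper 32 bits; these are the two "imul"s.
       The arithmetic wraps modulo 2^32, as the 32-bit registers do. */
    uint32_t cross = foo_lo * bar_hi + foo_hi * bar_lo;

    /* Upper half of the result: high word of the widening product plus
       the cross terms; corresponds to the "add" instructions. */
    uint32_t high = (uint32_t)(low >> 32) + cross;

    return ((uint64_t)high << 32) | (uint32_t)low;
}
```

Note that `bar_lo` appears in two of the three multiplications. In the right-hand listing it is fetched from the stack for each use, while the left-hand listing loads it into a register once and reuses it.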