
None of the "si" division routines will be used by x86. They exist for targets that don't support the operations natively. x86 supports them natively, so it will never call the library functions.

X86 has its own assembly implementation of \_\_muldi3 that uses 32-bit pieces.
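
For readers unfamiliar with the idea, here is a rough C sketch of a 64-bit multiply assembled from 32-bit pieces. It is an illustration only, not the compiler-rt assembly, and the function name is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not the compiler-rt source): low 64 bits of a * b,
   computed from 32-bit halves the way a 32-bit target has to do it. */
uint64_t mul64_from_32bit_pieces(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 32x32 -> 64 product of the low halves (a single MUL on i386). */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* The cross terms only feed the upper 32 bits of the result, so their
       own upper halves can be discarded (arithmetic modulo 2^32). */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    /* a_hi * b_hi would land entirely above bit 63 and is dropped. */
    return low + ((uint64_t)cross << 32);
}
```

The result should match the compiler's own 64-bit `*` for any pair of inputs, since both compute the product modulo 2^64.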

We should be using the assembly versions of the "di" division routines on i386, except when compiler-rt is built with MSVC, because MSVC can't parse the AT&T assembly syntax.

\~Craig


On Mon, Dec 3, 2018 at 5:51 AM Stefan Kanthak via llvm-dev <llvm-dev@lists.llvm.org> wrote:
Hi @ll,

LLVM-7.0.0-win32.exe contains and installs
lib\\clang\\7.0.0\\lib\\windows\\clang\_rt.builtins-i386.lib

The implementation of (at least) the multiplication and division
routines \_\_\[u\]{div,mod,divmod,mul}\[sdt\]i\[34\] shipped with this
library SUCKS: they are factors SLOWER than even Microsoft's
NOTORIOUSLY POOR implementation of 64-bit division shipped with
MSVC and Windows!

The reasons: 1\. subroutine matryoshka, 2\. "C" implementation!

JFTR: the target processor "i386" (introduced October 1985) is
a 32-bit processor, it has instructions to divide 64-bit
integers by 32-bit integers, and to multiply two 32-bit
integers giving a 64-bit product!
I expect that a library written 20+ years later takes
advantage of these instructions!

\_\_divsi3 (18 instructions) performs a DIV after 2 calls of abs(),
plus a final negation, instead of just
a single IDIV
\_\_modsi3 (14 instructions) calls \_\_divsi3 (18 instructions)
\_\_divmodsi4 (17 instructions) calls \_\_divsi3 (18 instructions)
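
To make the contrast concrete, the following is a minimal C sketch of the two approaches: stripping the signs, dividing unsigned and negating afterwards, versus a plain signed division that a compiler for x86 can emit as one IDIV. Both function names are hypothetical, and the first function only mirrors the pattern described above; it is not a copy of the library source.

```c
#include <stdint.h>

/* Sketch of the sign-stripping pattern (hypothetical name). */
int32_t divsi3_via_unsigned(int32_t a, int32_t b)
{
    /* Absolute values, computed in unsigned arithmetic to avoid overflow. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;

    uint32_t q = ua / ub;  /* unsigned divide (DIV) */

    /* Negate the quotient when the operand signs differ. The conversion
       back to int32_t assumes the usual two's-complement behaviour. */
    return ((a < 0) != (b < 0)) ? (int32_t)(0u - q) : (int32_t)q;
}

/* Plain signed division; on x86 this needs no library call. */
int32_t divsi3_direct(int32_t a, int32_t b)
{
    return a / b;
}
```

Both functions truncate toward zero, so they agree wherever the direct division is defined (b != 0, and not INT32_MIN / -1).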

\_\_udivsi3 (52 instructions) does NOT use DIV, but performs BITWISE
division using shifts and additions!
\_\_umodsi3 (14 instructions) calls \_\_udivsi3 (52 instructions)
\_\_udivmodsi4 (17 instructions) calls \_\_udivsi3 (52 instructions)
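
For reference, bitwise division of this kind looks roughly like the C sketch below. It is a generic shift-and-subtract (restoring) division with a made-up name, written here for illustration; the shipped routine may differ in detail.

```c
#include <stdint.h>

/* Hypothetical sketch of bitwise (restoring) division: one quotient bit
   per iteration, 32 iterations, no divide instruction.
   The caller must pass a nonzero divisor. */
uint32_t udiv32_shift_subtract(uint32_t n, uint32_t d)
{
    uint32_t q = 0;
    uint32_t r = 0;  /* running remainder */

    for (int i = 31; i >= 0; --i) {
        /* Bring down the next dividend bit. r never exceeds n >> i,
           so the shift cannot overflow 32 bits. */
        r = (r << 1) | ((n >> i) & 1u);

        if (r >= d) {      /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= 1u << i;
        }
    }
    return q;
}
```

The loop always runs 32 times, which is what makes it so much slower than a single hardware DIV.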

\_\_muldi3 (41 instructions) performs a "long" multiplication on
16-bit "digits"
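
A "long" multiplication on 16-bit digits can be sketched in C as follows. This is an illustrative reconstruction of the schoolbook technique for one 32x32 -> 64 product, with a hypothetical function name, not the library's code.

```c
#include <stdint.h>

/* Hypothetical sketch: 32x32 -> 64 multiply built from four 16x16 -> 32
   partial products, schoolbook style. */
uint64_t mul32x32_16bit_digits(uint32_t a, uint32_t b)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;

    uint32_t p00 = a0 * b0;  /* weight 2^0  */
    uint32_t p01 = a0 * b1;  /* weight 2^16 */
    uint32_t p10 = a1 * b0;  /* weight 2^16 */
    uint32_t p11 = a1 * b1;  /* weight 2^32 */

    /* Sum of everything landing in bits 16..31; its upper half is the
       carry into bit 32. */
    uint32_t mid = (p00 >> 16) + (p01 & 0xFFFFu) + (p10 & 0xFFFFu);

    uint32_t lo = (p00 & 0xFFFFu) | (mid << 16);
    uint32_t hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

On i386 the same result comes out of one MUL instruction, which leaves the 64-bit product in EDX:EAX.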

JFTR: I haven't checked whether clang actually calls these
SUPERFLUOUS routines listed above.
IT HAD BETTER NOT, EVER!

\_\_divdi3 (37 instructions) calls \_\_udivmoddi4 (254 instructions)
\_\_moddi3 (51 instructions) calls \_\_udivmoddi4 (254 instructions)
\_\_divmoddi4 (36 instructions) calls \_\_divdi3 (37 instructions) which
calls \_\_udivmoddi4 (254 instructions)
\_\_udivdi3 (8 instructions) calls \_\_udivmoddi4 (254 instructions)
\_\_umoddi3 (33 instructions) calls \_\_udivmoddi4 (254 instructions)
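
The layering listed above, with every signed or single-result routine forwarding to one combined unsigned divide-with-remainder, can be sketched in C like this. All names are hypothetical, and `udivmod64` is only a stand-in that uses the compiler's own operators where the library has its long routine.

```c
#include <stdint.h>

/* Stand-in for the combined unsigned routine (assumption: the real one
   is the 254-instruction function mentioned above). */
static uint64_t udivmod64(uint64_t n, uint64_t d, uint64_t *rem)
{
    if (rem)
        *rem = n % d;
    return n / d;
}

/* Magnitude of a signed value, computed without signed overflow. */
static uint64_t abs_u64(int64_t v)
{
    return v < 0 ? 0u - (uint64_t)v : (uint64_t)v;
}

/* Signed quotient: divide the magnitudes, then fix up the sign. */
int64_t div64_via_udivmod(int64_t a, int64_t b)
{
    uint64_t q = udivmod64(abs_u64(a), abs_u64(b), 0);
    return ((a < 0) != (b < 0)) ? (int64_t)(0u - q) : (int64_t)q;
}

/* Signed remainder: takes the sign of the dividend. */
int64_t mod64_via_udivmod(int64_t a, int64_t b)
{
    uint64_t r;
    (void)udivmod64(abs_u64(a), abs_u64(b), &r);
    return a < 0 ? (int64_t)(0u - r) : (int64_t)r;
}
```

Each wrapper is short, but every call pays for the full unsigned routine underneath, which is where the instruction counts above add up.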

JFTR: the subdirectory compiler-rt/lib/builtins/i386/ contains FAR
better (although suboptimal) \_\_divdi3, \_\_moddi3, \_\_udivdi3 and
\_\_umoddi3 routines written in assembler, which SHOULD be
shipped with clang\_rt.builtins-i386.lib instead of the above
listed POOR and NOT optimised implementations!

NOT AMUSED
Stefan Kanthak

PS: <https://lists.llvm.org/pipermail/llvm-dev/2018-November/128094.html>
has patches for the assembler routines!

PPS: please remove the blatant lie
| The builtins library provides optimized implementations of
| this and other low-level routines, either in target-independent
| C form, or as a heavily-optimized assembly.
seen on <https://compiler-rt.llvm.org/>
These routines are NOT optimized, and for sure NOT heavily-
optimized!
\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
LLVM Developers mailing list
llvm-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev