[llvm-dev] The builtins library of compiler-rt is a performance HOG^WKILLER

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 3 11:24:21 PST 2018


Reviewers: me, Simon Pilgrim, Sanjay Patel (the 3 most active X86 contributors), and probably Steve Canon, since he wrote the original routines.

~Craig

On Mon, Dec 3, 2018 at 10:51 AM Stefan Kanthak <stefan.kanthak at nexgo.de> wrote:

"Craig Topper" <craig.topper at gmail.com> wrote:

> None of the "si" division routines will be used by x86.

That was my expectation too.

> They exist for targets that don't support the operations natively.
> X86 supports them natively so will never use the library functions.

So they SHOULD not be built (or at least not shipped) with the
builtins library for x86.

> X86 has its own assembly implementation of muldi3 that uses 32-bit
> pieces.

I know; that's why I placed this ABOVE my "JFTR:"

> We should be using the assembly versions of the "di" division routines on
> i386. Except when compiler-rt is built with MSVC because MSVC can't parse
> the at&t assembly syntax.

Again: my offer to provide these routines still stands!
I have OPTIMISED __divdi3, __moddi3, __udivdi3 and __umoddi3 in Intel
syntax, wrapped as inline files into an NMakefile, for use with ML.EXE.
For the optimisations see the patch I sent last week.

Since Howard Hinnant is NO MORE with LLVM: who is the CURRENT code
owner and reviewer for the builtins library, especially for x86?
I'm asking this SIMPLE question now for the 3rd time!

I also have __udivmoddi3: adding the pointer to the remainder as
argument and 4 more instructions will turn it into __udivmoddi4.

Compiling them with MSVC is of course easy to achieve: remove the
MASM/ML statements, put the assembler source inside an __asm block,
and add a function definition with __declspec(naked).
But then someone will have to find new filenames; I'd prefer to leave
them as *.ASM, so they can be added to YOUR source tree without
clobbering existing files.

The same holds for _alldiv, _alldvrm, _allrem, _aulldiv, _aulldvrm
and _aullrem, plus _allmul, _allshl, _allshr and _aullshr.

If you name a reviewer I'll send them to llvm-commits!

regards
Stefan

> On Mon, Dec 3, 2018 at 5:51 AM Stefan Kanthak via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Hi @ll,
>>
>> LLVM-7.0.0-win32.exe contains and installs
>> lib\clang\7.0.0\lib\windows\clang_rt.builtins-i386.lib
>>
>> The implementation of (at least) the multiplication and division
>> routines __[u]{div,mod,divmod,mul}[sdt]i[34] shipped with this
>> library SUCKS: they are factors SLOWER than even Microsoft's
>> NOTORIOUSLY POOR implementation of 64-bit division shipped with
>> MSVC and Windows!
>>
>> The reasons: 1. subroutine matroschka, 2. "C" implementation!
>>
>> JFTR: the target processor "i386" (introduced October 1985) is
>>       a 32-bit processor, it has instructions to divide 64-bit
>>       integers by 32-bit integers, and to multiply two 32-bit
>>       integers giving a 64-bit product!
>>       I expect that a library written 20+ years later takes
>>       advantage of these instructions!
>>
>> __divsi3 (18 instructions) performs a DIV after 2 calls of abs(),
>>                            plus a final negation, instead of just
>>                            a single IDIV
>> __modsi3 (14 instructions) calls __divsi3 (18 instructions)
>> __divmodsi4 (17 instructions) calls __divsi3 (18 instructions)
>>
>> __udivsi3 (52 instructions) does NOT use DIV, but performs BITWISE
>>                             division using shifts and additions!
>> __umodsi3 (14 instructions) calls __udivsi3 (52 instructions)
>> __udivmodsi4 (17 instructions) calls __udivsi3 (52 instructions)
>>
>> __muldi3 (41 instructions) performs a "long" multiplication on
>>                            16-bit "digits"
>>
>> JFTR: I haven't checked whether clang actually calls these
>>       SUPERFLUOUS routines listed above.
>>       IT BETTER SHOULD NOT, NEVER!
>>
>> __divdi3 (37 instructions) calls __udivmoddi4 (254 instructions)
>> __moddi3 (51 instructions) calls __udivmoddi4 (254 instructions)
>> __divmoddi4 (36 instructions) calls __divdi3 (37 instructions) which
>>                               calls __udivmoddi4 (254 instructions)
>> __udivdi3 (8 instructions) calls __udivmoddi4 (254 instructions)
>> __umoddi3 (33 instructions) calls __udivmoddi4 (254 instructions)
>>
>> JFTR: the subdirectory compiler-rt/lib/builtins/i386/ contains FAR
>>       better (although suboptimal) __divdi3, __moddi3, __udivdi3 and
>>       __umoddi3 routines written in assembler, which SHOULD be
>>       shipped with clang_rt.builtins-i386.lib instead of the above
>>       listed POOR and NOT optimised implementations!
>>
>> NOT AMUSED
>> Stefan Kanthak
>>
>> PS: <https://lists.llvm.org/pipermail/llvm-dev/2018-November/128094.html>
>>     has patches for the assembler routines!
>>
>> PPS: please remove the blatant lie
>>      | The builtins library provides optimized implementations of
>>      | this and other low-level routines, either in target-independent
>>      | C form, or as a heavily-optimized assembly.
>>      seen on <https://compiler-rt.llvm.org/>
>>      These routines are NOT optimized, and for sure NOT heavily-
>>      optimized!
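
For readers following the thread, here is a minimal C sketch of the two patterns the quoted message criticises for the 32-bit ("si") routines: an unsigned divide done bit by bit, and a signed divide built from two absolute values, an unsigned divide and a final negation. The function names udiv32_bitwise, sdiv32_via_abs and sdiv32_native are hypothetical and chosen for illustration only; this is not the compiler-rt source, and the divisor is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical illustration, NOT compiler-rt code.
 * Restoring shift-subtract division: one quotient bit per iteration.
 * Assumes d != 0. The partial remainder is kept in 64 bits so the
 * left shift cannot overflow. */
static uint32_t udiv32_bitwise(uint32_t n, uint32_t d)
{
    uint32_t q = 0;
    uint64_t r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit of n */
        if (r >= d) {                     /* divisor fits: subtract it */
            r -= d;
            q |= (uint32_t)1 << i;        /* and set this quotient bit */
        }
    }
    return q;
}

/* Hypothetical illustration: signed division via two absolute values,
 * an unsigned divide and a final negation. Assumes b != 0 and that the
 * quotient is representable (so not INT32_MIN / -1). */
static int32_t sdiv32_via_abs(int32_t a, int32_t b)
{
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t uq = udiv32_bitwise(ua, ub);
    return ((a < 0) != (b < 0)) ? (int32_t)(0u - uq) : (int32_t)uq;
}

/* What the message asks for instead: the plain operator, which an x86
 * compiler normally lowers to a single IDIV instruction. */
static int32_t sdiv32_native(int32_t a, int32_t b)
{
    return a / b;
}
```

All three functions return the same quotient, truncated toward zero; the difference is only in how many instructions and calls it takes to get there, which is the point under discussion.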


