[llvm-dev] The builtins library of compiler-rt is a performance HOG^WKILLER

Chris Bieneman via llvm-dev llvm-dev at lists.llvm.org
Tue Dec 4 17:38:07 PST 2018


On Dec 3, 2018, at 10:50 AM, Stefan Kanthak via llvm-dev <llvm-dev at lists.llvm.org> wrote:

"Craig Topper" <craig.topper at gmail.com> wrote:

None of the "si" division routines will be used by x86.

That was my expectation too.

They exist for targets that don't support the operations natively. X86 supports them natively so will never use the library functions.

So they SHOULD not be built (or at least not shipped) with the builtins library for x86.

I think you will find that down this path lies madness. Apple has tried for many years to limit which builtins get shipped in compiler-rt to just the smallest correct set to reduce the distribution size of clang. Over the years we've taken several different approaches, and they are all error prone and result in bugs. This problem stems from the fact that generation of builtin calls can be triggered by optimization settings, architecture, ABI, or the lunar cycle.

Initially we (Apple) maintained per-architecture lists of builtins, but those lists wouldn't get updated when new builtins got added, and we'd often get bugs after we shipped. Then I moved to an inverted system where we maintained lists of builtins to exclude, so that all new builtins were always included, but that has turned out to be a mess because it is really hard to know if it is safe to exclude something, and oh wait, the compiler changed and now it isn't safe anymore.

IMO, and coming from some painful experience, I think including all builtin functions is the easiest way to make less buggy releases, until someone comes along with a definitive way for us to always know whether a given builtin can be generated by a given compiler.

-Chris
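To make the point about the compiler generating builtin calls concrete, here is a minimal C sketch; the function names quotient64 and remainder64 are hypothetical. Nothing in the source names a builtin, yet i386 has no instruction that divides one 64-bit integer by another, so compiling this for a 32-bit x86 target (for example with `clang -m32 -O2 -S`) typically produces calls to __divdi3 and __umoddi3. Whether a call appears at all depends on the target and the operands; a constant divisor, for instance, is usually expanded inline instead.

```c
#include <stdint.h>

/* Signed 64-bit quotient. On a 32-bit x86 target this is typically
   lowered to a call to the builtin __divdi3 (assumption: variable,
   non-constant operands). */
int64_t quotient64(int64_t a, int64_t b) {
    return a / b;
}

/* Unsigned 64-bit remainder. Typically lowered to a call to __umoddi3
   on the same kind of target. */
uint64_t remainder64(uint64_t a, uint64_t b) {
    return a % b;
}
```

On a 64-bit target the same source compiles to single hardware instructions, which is exactly why the set of builtins actually needed varies with architecture and options.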

X86 has its own assembly implementation of muldi3 that uses 32-bit pieces.

I know; that's why I placed this ABOVE my "JFTR:"

We should be using the assembly versions of the "di" division routines on i386. Except when compiler-rt is built with MSVC because MSVC can't parse the AT&T assembly syntax.

Again: my offer to provide these routines still stands!

I have OPTIMISED __divdi3, __moddi3, __udivdi3 and __umoddi3 in Intel syntax, wrapped as inline files into an NMakefile, for use with ML.EXE. For the optimisations see the patch I sent last week.

Since Howard Hinnant is NO MORE with LLVM: who is the CURRENT code owner and reviewer for the builtins library, especially for x86? I'm asking this SIMPLE question now for the 3rd time!

I also have __udivmoddi3: adding the pointer to the remainder as an argument and 4 more instructions will turn it into __udivmoddi4.

Compiling them with MSVC is of course easy to achieve: remove the MASM/ML statements, put the assembler source inside an __asm block, and add a function definition with __declspec(naked).

But then someone will have to find new filenames; I'd prefer to leave them as *.ASM, so they can be added to YOUR source tree without clobbering existing files.

The same holds for _alldiv, _alldvrm, _allrem, _aulldiv, _aulldvrm and _aullrem, plus _allmul, _allshl, _allshr and _aullshr.

If you name a reviewer I'll send them to llvm-commits!

regards
Stefan

On Mon, Dec 3, 2018 at 5:51 AM Stefan Kanthak via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi @ll,

LLVM-7.0.0-win32.exe contains and installs lib\clang\7.0.0\lib\windows\clang_rt.builtins-i386.lib

The implementation of (at least) the multiplication and division routines [u]{div,mod,divmod,mul}[sdt]i[34] shipped with this library SUCKS: they are several times SLOWER than even Microsoft's NOTORIOUSLY POOR implementation of 64-bit division shipped with MSVC and Windows!

The reasons:
1. subroutine matryoshka (nested calls),
2. "C" implementation!

JFTR: the target processor "i386" (introduced October 1985) is a 32-bit processor; it has instructions to divide 64-bit integers by 32-bit integers, and to multiply two 32-bit integers giving a 64-bit product! I expect that a library written 20+ years later takes advantage of these instructions!

__divsi3 (18 instructions) performs a DIV after two calls of abs(), plus a final negation, instead of just a single IDIV
__modsi3 (14 instructions) calls __divsi3 (18 instructions)
__divmodsi4 (17 instructions) calls __divsi3 (18 instructions)
__udivsi3 (52 instructions) does NOT use DIV, but performs BITWISE division using shifts and additions!
__umodsi3 (14 instructions) calls __udivsi3 (52 instructions)
__udivmodsi4 (17 instructions) calls __udivsi3 (52 instructions)
__muldi3 (41 instructions) performs a "long" multiplication on 16-bit "digits"

JFTR: I haven't checked whether clang actually calls these SUPERFLUOUS routines listed above. IT HAD BETTER NOT, EVER!
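The widening 32 x 32 -> 64-bit multiply mentioned above is enough to build a 64-bit multiply without 16-bit "digits". The sketch below is not the compiler-rt source; muldi3_sketch is a hypothetical name. It returns the low 64 bits of the product, which is all __muldi3 has to deliver. Because only the low 64 bits are kept, the a_hi * b_hi term vanishes entirely, and the two cross terms are needed only modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical sketch of what __muldi3 computes: the low 64 bits of a
   64 x 64-bit product, assembled from 32-bit halves. */
uint64_t muldi3_sketch(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* One widening 32 x 32 -> 64-bit multiply (a single MUL on i386). */
    uint64_t product = (uint64_t)a_lo * b_lo;

    /* The cross terms land in the upper 32 bits, so their own upper
       halves are shifted out; wrapping 32-bit multiplies suffice.
       The a_hi * b_hi term would be shifted left by 64 and is dropped. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return product + ((uint64_t)cross << 32);
}
```

In two's complement the low 64 bits of a product are the same for signed and unsigned operands, so a single routine of this shape serves both.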
__divdi3 (37 instructions) calls __udivmoddi4 (254 instructions)
__moddi3 (51 instructions) calls __udivmoddi4 (254 instructions)
__divmoddi4 (36 instructions) calls __divdi3 (37 instructions), which calls __udivmoddi4 (254 instructions)
__udivdi3 (8 instructions) calls __udivmoddi4 (254 instructions)
__umoddi3 (33 instructions) calls __udivmoddi4 (254 instructions)

JFTR: the subdirectory compiler-rt/lib/builtins/i386/ contains FAR better (although suboptimal) __divdi3, __moddi3, __udivdi3 and __umoddi3 routines written in assembler, which SHOULD be shipped with clang_rt.builtins-i386.lib instead of the POOR and NOT optimised implementations listed above!

NOT AMUSED
Stefan Kanthak

PS: <https://lists.llvm.org/pipermail/llvm-dev/2018-November/128094.html> has patches for the assembler routines!

PPS: please remove the blatant lie

| The builtins library provides optimized implementations of
| this and other low-level routines, either in target-independent
| C form, or as a heavily-optimized assembly.

seen on <https://compiler-rt.llvm.org/>

These routines are NOT optimized, and for sure NOT heavily optimized!
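The argument above rests on i386's DIV instruction, which divides the 64-bit value in EDX:EAX by a 32-bit operand, leaving the quotient in EAX and the remainder in EDX, and faults if the quotient does not fit in 32 bits. A minimal sketch of the usual way to chain two such steps into a 64-by-32-bit unsigned division follows. It is not the compiler-rt or MSVC code: udiv64_by32_sketch and div64by32 are hypothetical names, div64by32 merely models the instruction in portable C (compiled for i386, its own `/` would itself become a builtin call), and a general 64-by-64 division needs extra normalization steps that are omitted here.

```c
#include <stdint.h>

/* Models one i386 DIV: divides hi:lo by a 32-bit divisor d.
   Precondition (as on the hardware): hi < d, so the quotient fits
   in 32 bits. */
static uint32_t div64by32(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem) {
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
}

/* Hypothetical sketch: unsigned 64-bit dividend, nonzero 32-bit divisor,
   computed with two chained 64/32 steps. */
uint64_t udiv64_by32_sketch(uint64_t n, uint32_t d, uint32_t *rem) {
    uint32_t r;
    /* Step 1: 0:n_hi / d. The upper word is 0 < d, so no overflow. */
    uint32_t q_hi = div64by32(0, (uint32_t)(n >> 32), d, &r);
    /* Step 2: r:n_lo / d. Since r < d, the quotient fits in 32 bits. */
    uint32_t q_lo = div64by32(r, (uint32_t)n, d, &r);
    *rem = r;
    return ((uint64_t)q_hi << 32) | q_lo;
}
```

The precondition of each step holds by construction, so two hardware divisions replace a bit-by-bit loop whenever the divisor fits in 32 bits.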


LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




