Hi Arthur, Craig,
Thanks for your comments about the GCC/Clang intrinsics. I had never considered using them, but they might be a better alternative to inline assembly.
Is there one for regular MUL?
Anyway, I want to go in the opposite direction. Where I can, I rely on the compiler's optimizations. If I want to use MULX in Clang, I do it like this:

    unsigned long mulx(unsigned long x, unsigned long y, unsigned long* hi)
    {
        auto p = (unsigned __int128){x} * y;
        *hi = static_cast<unsigned long>(p >> 64);
        return static_cast<unsigned long>(p);
    }

With BMI2 enabled, Clang compiles this to:

    mulx(unsigned long, unsigned long, unsigned long*):
        mov rcx, rdx
        mov rdx, rsi
        mulx rdx, rax, rdi
        mov qword ptr [rcx], rdx
        ret

What I want to do is take this further: rewrite the above mulx() helper without using the \_\_int128 type, in a way that lets the compiler recognize that it should use the MUL/MULX instruction.
A possible implementation looks like
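
    uint64_t mul_full_64_generic(uint64_t x, uint64_t y, uint64_t* hi)
    {
        // Split both operands into 32-bit halves.
        uint64_t xl = x & 0xffffffff;
        uint64_t xh = x >> 32;
        uint64_t yl = y & 0xffffffff;
        uint64_t yh = y >> 32;

        // Low partial product: its low 32 bits are final, the rest carries up.
        uint64_t t = xl * yl;
        uint64_t l = t & 0xffffffff;
        uint64_t h = t >> 32;

        // First cross product plus the carry; this cannot overflow 64 bits.
        t = xh * yl;
        t += h;
        h = t >> 32;

        // Second cross product plus the low 32 bits of the first.
        t = xl * yh + (t & 0xffffffff);
        l |= t << 32;

        // High partial product plus both carries.
        *hi = xh * yh + h + (t >> 32);
        return l;
    }
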
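For reference, the routine above is the schoolbook multiplication on 32-bit halves. The symbols below are just the mathematical counterparts of xl, xh, yl and yh in the code; this is a sketch of the identity the code relies on, not a formal proof.

```latex
x = 2^{32} x_h + x_l, \qquad y = 2^{32} y_h + y_l, \qquad 0 \le x_l, x_h, y_l, y_h < 2^{32}

x \cdot y = 2^{64}\, x_h y_h + 2^{32} \left( x_h y_l + x_l y_h \right) + x_l y_l

% No intermediate sum in the code overflows 64 bits, because
(2^{32} - 1)^2 + 2\,(2^{32} - 1) = 2^{64} - 1
```

Each cross product is added to at most one 32-bit carry at a time, so every intermediate value of t stays below 2^64.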
As expected, Clang is not able to match this pattern currently.
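Before asking the compiler to match the pattern, it is worth spot-checking that the generic routine agrees with the \_\_int128 version. The sketch below is only that: a test on edge cases and pseudo-random inputs, not a proof. The names matches_int128 and spot_check are hypothetical helpers, and unsigned \_\_int128 (a GCC/Clang extension) is used purely as the reference.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Same algorithm as mul_full_64_generic in the message, repeated here so the
// sketch is self-contained.
uint64_t mul_full_64_generic(uint64_t x, uint64_t y, uint64_t* hi)
{
    const uint64_t xl = x & 0xffffffff, xh = x >> 32;
    const uint64_t yl = y & 0xffffffff, yh = y >> 32;
    uint64_t t = xl * yl;
    uint64_t l = t & 0xffffffff;
    uint64_t h = t >> 32;
    t = xh * yl + h;
    h = t >> 32;
    t = xl * yh + (t & 0xffffffff);
    l |= t << 32;
    *hi = xh * yh + h + (t >> 32);
    return l;
}

// Hypothetical helper: compares one input pair against unsigned __int128,
// which serves only as the reference result.
bool matches_int128(uint64_t x, uint64_t y)
{
    uint64_t hi = 0;
    const uint64_t lo = mul_full_64_generic(x, y, &hi);
    const unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
    return lo == static_cast<uint64_t>(p) && hi == static_cast<uint64_t>(p >> 64);
}

// Hypothetical helper: all pairs of a few edge values, then pseudo-random
// pairs drawn from a xorshift64 generator.
bool spot_check()
{
    const uint64_t edge[] = {0, 1, 0xffffffff, 0x100000000,
                             0xffffffff00000000, 0xffffffffffffffff};
    for (uint64_t x : edge)
        for (uint64_t y : edge)
            if (!matches_int128(x, y))
                return false;

    uint64_t s = 0x9e3779b97f4a7c15;
    auto next = [&s] {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        return s;
    };
    for (int i = 0; i < 100000; ++i)
    {
        const uint64_t x = next();
        const uint64_t y = next();
        if (!matches_int128(x, y))
            return false;
    }
    return true;
}
```

Calling spot_check() from a test and asserting that it returns true is enough to catch a wrong carry or shift, although it says nothing about inputs that were not sampled.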
If we want to implement this optimization in Clang, I have some questions:
1\. Can we prove this pattern is equivalent to a 64x64 -> 128 MUL?
2\. Which pass should this optimization be added to?
3\. Can this pattern be split into smaller ones, e.g. UMULH?
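On the third question, one way the split could look: the low 64 bits of the product are simply the wrapping x \* y, so only the high half needs the 32-bit decomposition. The sketch below assumes that split; umulh\_generic and mul\_full\_64\_split are hypothetical names, and whether a matcher would recognize the smaller pattern more easily is an open question, not something shown here.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Hypothetical helper: high 64 bits of the 64x64 product, without __int128.
// This is the part that would correspond to UMULH on AArch64.
uint64_t umulh_generic(uint64_t x, uint64_t y)
{
    const uint64_t xl = x & 0xffffffff, xh = x >> 32;
    const uint64_t yl = y & 0xffffffff, yh = y >> 32;
    const uint64_t t0 = xl * yl;                     // low partial product
    const uint64_t t1 = xh * yl + (t0 >> 32);        // first cross product + carry
    const uint64_t t2 = xl * yh + (t1 & 0xffffffff); // second cross product + low part of t1
    return xh * yh + (t1 >> 32) + (t2 >> 32);
}

// Hypothetical helper: full multiplication built from the two smaller pieces.
uint64_t mul_full_64_split(uint64_t x, uint64_t y, uint64_t* hi)
{
    *hi = umulh_generic(x, y);
    return x * y; // low half: ordinary multiplication modulo 2^64
}
```

With this split, the low half is already a plain multiply, so only umulh\_generic would need to be pattern-matched.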
Paweł
On Sun, Dec 30, 2018 at 2:34 AM Craig Topper <craig.topper@gmail.com> wrote:
\_mulx\_u64 only exists when the target is x86\_64. That's still not very portable. I'm not opposed to removing the bmi2 check, but gcc also has the same check, so it doesn't improve portability much.

~Craig

On Sat, Dec 29, 2018 at 4:44 PM Arthur O'Dwyer via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Hi Pawel,

There is the \_mulx\_u64 intrinsic, but it currently requires the hardware flag "-mbmi2".

On Clang 3.8.1 and earlier, the \_addcarry\_u64 and \_subborrow\_u64 intrinsics required the hardware flag \`-madx\`, even though they didn't use the hardware ADX/ADOX instructions. Modern GCC and Clang permit the use of these intrinsics (to generate ADC) even in the absence of \`-madx\`.

I think it would be a very good idea for Clang to support \_mulx\_u64 (to generate MUL) even in the absence of \`-mbmi2\`.

–Arthur

On Sat, Dec 29, 2018 at 6:03 PM Paweł Bylica via cfe-dev <cfe-dev@lists.llvm.org> wrote:

Hi,

For some maybe dumb reasons I am trying to write a portable version of int128.

What is very valuable for this implementation is access to the MUL instruction on x86, which provides a full 64 x 64 -> 128 bit multiplication. An equally useful one on ARM would be the UMULH instruction.

Well, the way you can access this on Clang / GCC is to use the \_\_int128 type or inline assembly. MSVC provides an intrinsic for this instruction. This defeats the idea of a portable int128 reimplementation and makes a constexpr implementation of multiplication at least inconvenient.

Maybe there is hope for me in LLVM. Is there any pattern matcher that produces a MUL instruction of a bigger type?

If not, would it be a good idea to teach LLVM about it?

Bests,
Paweł