[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Flamedoge via llvm-dev llvm-dev at lists.llvm.org
Wed Apr 19 10:14:20 PDT 2017
- Previous message: [llvm-dev] FE_INEXACT being set for an exact conversion from float to i64
- Next message: [llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Are we better off using branches instead of cmove to implement FP to unsigned i64?
This seems to have been done for performance reasons (to avoid branch mispredictions). A branch-to-cmov transformation should not introduce additional observable side effects, and whatever performed it here clearly did not account for floating-point exceptions.
On Wed, Apr 19, 2017 at 10:01 AM, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Changing the list from cfe-dev to llvm-dev
On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:

I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. I found a comment about the rationale for using a conditional move versus a branch. I believe the predicated logic using a conditional move is causing INEXACT to be set by the other side of the predicate, as the lowered x86_64 code executes both conversions whereas GCC uses a branch. That seems to be the difference. I can’t find FPToUI in llvm/lib/Target/X86, so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.
$ more llvm/lib/Target/X86//README-X86-64.txt
…
Are we better off using branches instead of cmove to implement FP to
unsigned i64?

conv:
    ucomiss LC0(%rip), %xmm0
    cvttss2siq %xmm0, %rdx
    jb L3
    subss LC0(%rip), %xmm0
    movabsq $-9223372036854775808, %rax
    cvttss2siq %xmm0, %rdx
    xorq %rax, %rdx
L3:
    movq %rdx, %rax
    ret

instead of

conv:
    movss LCPI10(%rip), %xmm1
    cvttss2siq %xmm0, %rcx
    movaps %xmm0, %xmm2
    subss %xmm1, %xmm2
    cvttss2siq %xmm2, %rax
    movabsq $-9223372036854775808, %rdx
    xorq %rdx, %rax
    ucomiss %xmm1, %xmm0
    cmovb %rcx, %rax
    ret

On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com> wrote:

On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at gmail.com> wrote:

On 18 April 2017 at 15:54, Michael Clark via cfe-dev <cfe-dev at lists.llvm.org> wrote:

The only way towards completing a milestone is via fixing a number of small issues along the way…

I believe there's more to it than that. None of LLVM's optimizations are aware of this extra side-channel of information (with possible exceptions like avoiding speculating fdiv because of unavoidable exceptions). From what I remember, the real proposal is to replace all floating-point IR with intrinsics when FENV_ACCESS is on, which the optimizers by default won't have a clue about and will treat conservatively (essentially like they're modifying external memory).

So be careful with drawing conclusions from small snippets; you're probably not seeing the full range of LLVM's behaviour.
Yes. I’m sure. It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2

It appears to be in the LLVM lowering of the fptoui instruction, so it must be an MC layer optimisation.

; Function Attrs: noinline nounwind uwtable
define i64 @Z7fcvtluf(float %f) #0 {
  %1 = alloca float, align 4
  store float %f, float* %1, align 4
  %2 = load float, float* %1, align 4
  %3 = fptoui float %2 to i64
  ret i64 %3
}

GCC performs a comparison with ucomiss and branches, whereas Clang computes both forms and predicates the result using a conditional move. One of the conversions is evidently setting the INEXACT MXCSR flag.

Clang lowering (INEXACT set when the result is exact):

fcvtlu(float):
    movss xmm1, dword ptr [rip + .LCPI10] # xmm1 = mem[0],zero,zero,zero
    movaps xmm2, xmm0
    subss xmm2, xmm1
    cvttss2si rax, xmm2
    movabs rcx, -9223372036854775808
    xor rcx, rax
    cvttss2si rax, xmm0
    ucomiss xmm0, xmm1
    cmovae rax, rcx
    ret

GCC lowering (sets flags correctly):

fcvtlu(float):
    ucomiss xmm0, DWORD PTR .LC0[rip]
    jnb .L4
    cvttss2si rax, xmm0
    ret
.L4:
    subss xmm0, DWORD PTR .LC0[rip]
    movabs rdx, -9223372036854775808
    cvttss2si rax, xmm0
    xor rax, rdx
    ret