[llvm-dev] FE_INEXACT being set for an exact conversion from float to i64
Michael Clark via llvm-dev llvm-dev at lists.llvm.org
Wed Apr 19 10:06:58 PDT 2017
Confirmed it is in the target layer in LLVM.
Here is the test case: https://godbolt.org/g/kApSxe
$ g++ -O3 -lm fcvt.cc
$ ./a.out
1 exact
1 inexact
1 exact
1 inexact

$ clang++ -O3 -lm fcvt.cc
$ ./a.out
1 exact
1 inexact
1 inexact
1 inexact

$ cat fcvt.cc
#include #include #include #include #include <fenv.h>

typedef signed int s32;
typedef unsigned int u32;
typedef signed long long s64;
typedef unsigned long long u64;

__attribute__ ((noinline)) s32 fcvt_wu(float f) { return s32(u32(f)); }
__attribute__ ((noinline)) s64 fcvt_lu(float f) { return s64(u64(f)); }

void test_fcvt_wu(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%d ", fcvt_wu(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

void test_fcvt_lu(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%lld ", fcvt_lu(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

int main()
{
    fesetround(FE_TONEAREST);
    test_fcvt_wu(1.0f);
    test_fcvt_wu(1.1f);
    test_fcvt_lu(1.0f);
    test_fcvt_lu(1.1f);
}
On 20 Apr 2017, at 5:01 AM, Michael Clark <michaeljclark at mac.com> wrote:
Changing the list from cfe-dev to llvm-dev
On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:
I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. I found a comment about the rationale for using a conditional move versus a branch. I believe the predicate logic using a conditional move is causing INEXACT to be set from the other side of the predicate, as the lowered x86_64 code executes both conversions whereas GCC uses a branch. That seems to be the difference. I can’t find FPToUI in llvm/lib/Target/X86, so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.
$ more llvm/lib/Target/X86//README-X86-64.txt
…
Are we better off using branches instead of cmove to implement FP to
unsigned i64?

conv:
        ucomiss LC0(%rip), %xmm0
        cvttss2siq %xmm0, %rdx
        jb L3
        subss LC0(%rip), %xmm0
        movabsq $-9223372036854775808, %rax
        cvttss2siq %xmm0, %rdx
        xorq %rax, %rdx
L3:
        movq %rdx, %rax
        ret

instead of

conv:
        movss LCPI10(%rip), %xmm1
        cvttss2siq %xmm0, %rcx
        movaps %xmm0, %xmm2
        subss %xmm1, %xmm2
        cvttss2siq %xmm2, %rax
        movabsq $-9223372036854775808, %rdx
        xorq %rdx, %rax
        ucomiss %xmm1, %xmm0
        cmovb %rcx, %rax
        ret
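The difference between the two sequences quoted above can be imitated at source level. The following is only a sketch: `fptoui_branch`, `fptoui_select`, `raises_inexact` and `kBias` are hypothetical names, not anything in LLVM or GCC, and the `volatile` accesses are there purely to stop the compiler from folding or moving the floating-point operations. It assumes an IEEE 754 target with a working `<cfenv>`.

```cpp
#include <cfenv>
#include <cstdint>

// 2^63 is exactly representable as a float. Reading it through a volatile
// forces each subtraction to happen at run time, where it is written.
static volatile float kBias = 9223372036854775808.0f;

// Branch form: the subtraction only executes for inputs >= 2^63.
uint64_t fptoui_branch(float f) {
    if (f < 9223372036854775808.0f)
        return static_cast<uint64_t>(static_cast<int64_t>(f));
    float shifted = f - kBias;
    return static_cast<uint64_t>(static_cast<int64_t>(shifted)) ^ (1ull << 63);
}

// Select form: both candidates are computed, then one is picked.
// Precondition for this sketch: 0 <= f < 2^63. The unconditional signed
// conversion of a larger value would be undefined behaviour in C++, even
// though the cvttss2si instruction itself has a defined result.
uint64_t fptoui_select(float f) {
    volatile float shifted = f - kBias;  // always executes, like the subss
    uint64_t large =
        static_cast<uint64_t>(static_cast<int64_t>(shifted)) ^ (1ull << 63);
    uint64_t small = static_cast<uint64_t>(static_cast<int64_t>(f));
    return f < 9223372036854775808.0f ? small : large;
}

// Returns true if evaluating conv(f) raised FE_INEXACT.
bool raises_inexact(uint64_t (*conv)(float), float f) {
    volatile float in = f;            // keep the input a run-time value
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile uint64_t out = conv(in); // result must be computed before the test
    (void)out;
    return std::fetestexcept(FE_INEXACT) != 0;
}
```

Calling `raises_inexact(fptoui_branch, 1.0f)` and `raises_inexact(fptoui_select, 1.0f)` should give different answers even though both functions return 1: in the select form, 1.0f - 2^63 is evaluated unconditionally and is not exactly representable in single precision.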
On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com> wrote:
On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at gmail.com> wrote:

On 18 April 2017 at 15:54, Michael Clark via cfe-dev <cfe-dev at lists.llvm.org> wrote:

The only way towards completing a milestone is via fixing a number of small issues along the way…

I believe there's more to it than that. None of LLVM's optimizations are aware of this extra side-channel of information (with possible exceptions like avoiding speculating fdiv because of unavoidable exceptions).

From what I remember, the real proposal is to replace all floating-point IR with intrinsics when FENV_ACCESS is on, which the optimizers by default won't have a clue about and will treat conservatively (essentially like they're modifying external memory).

So be careful with drawing conclusions from small snippets; you're probably not seeing the full range of LLVM's behaviour.

Yes. I’m sure. It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2

It appears to be in the LLVM lowering of the fptoui intrinsic, so it must be MC layer optimisations.

; Function Attrs: noinline nounwind uwtable
define i64 @_Z7fcvt_luf(float %f) #0 {
  %1 = alloca float, align 4
  store float %f, float* %1, align 4
  %2 = load float, float* %1, align 4
  %3 = fptoui float %2 to i64
  ret i64 %3
}

GCC performs a comparison with ucomiss and branches, whereas Clang computes both forms and predicates the result using a conditional move. One of the conversions obviously is setting the INEXACT MXCSR flag.
Clang lowering (inexact set when result is exact):

fcvt_lu(float):
        movss xmm1, dword ptr [rip + .LCPI10]   # xmm1 = mem[0],zero,zero,zero
        movaps xmm2, xmm0
        subss xmm2, xmm1
        cvttss2si rax, xmm2
        movabs rcx, -9223372036854775808
        xor rcx, rax
        cvttss2si rax, xmm0
        ucomiss xmm0, xmm1
        cmovae rax, rcx
        ret

GCC lowering (sets flags correctly):

fcvt_lu(float):
        ucomiss xmm0, DWORD PTR .LC0[rip]
        jnb .L4
        cvttss2si rax, xmm0
        ret
.L4:
        subss xmm0, DWORD PTR .LC0[rip]
        movabs rdx, -9223372036854775808
        cvttss2si rax, xmm0
        xor rax, rdx
        ret
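To narrow down which step of the Clang sequence raises the flag for an input of 1.0f, each operation can be probed on its own. This is a rough sketch with hypothetical helper names (`sub_raises_inexact`, `cvt_raises_inexact`, `kTwo63`), assuming an IEEE 754 target; the `volatile` accesses only keep the compiler from folding or reordering the operations around the `<cfenv>` calls. As far as the arithmetic goes, 1.0f - 2^63 is not representable in single precision and rounds to -2^63, while truncating either 1.0f or -2^63 to a signed 64-bit integer is exact, so the subtraction is the likely source.

```cpp
#include <cfenv>
#include <cstdint>

// 2^63 as a float (exactly representable); volatile to force a run-time load.
static volatile float kTwo63 = 9223372036854775808.0f;

// True if computing f - 2^63 in single precision raises FE_INEXACT.
bool sub_raises_inexact(float f) {
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float diff = in - kTwo63;  // the analogue of the subss
    (void)diff;
    return std::fetestexcept(FE_INEXACT) != 0;
}

// True if truncating f to a signed 64-bit integer raises FE_INEXACT.
// Precondition: f lies in [-2^63, 2^63), so the conversion is defined.
bool cvt_raises_inexact(float f) {
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile int64_t out = static_cast<int64_t>(in);  // analogue of cvttss2si
    (void)out;
    return std::fetestexcept(FE_INEXACT) != 0;
}
```

For example, `sub_raises_inexact(1.0f)` probes the subtraction, while `cvt_raises_inexact(1.0f)` and `cvt_raises_inexact(-9223372036854775808.0f)` probe the two truncations that the conditional-move sequence performs for that input.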