#pragma float_control(precise, on) doesn't work for SSE intrinsics · Issue #55713 · llvm/llvm-project
This is the link to godbolt with the full reproducer: https://godbolt.org/z/qYczcba39
The problem is that the pragma doesn't switch the floating-point mode when the intrinsics are used directly, but it does work when using the operators on `__m128` types.
I originally discovered this in clang 14.0.1.
The code that shows the problem (compiled with `-Ofast -msse4.2 -mrecip=none`):
```cpp
#include <immintrin.h>

__m128 func(__m128 d, float oldLen, float newLen) {
#pragma float_control(precise, on)
    return _mm_div_ps(
        _mm_mul_ps(d, _mm_set1_ps(oldLen)),
        _mm_set1_ps(newLen)
    );
}

__m128 func1(__m128 d, float oldLen, float newLen) {
#pragma float_control(precise, on)
    return d*oldLen/newLen;
}
```
And it leads to this assembly:
```asm
.LCPI1_0:
        .long   0x3f800000                       # float 1
func(float __vector(4), float, float):           # @func(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                    # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        divss   xmm1, xmm2
        shufps  xmm1, xmm1, 0                    # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        ret
func1(float __vector(4), float, float):          # @func1(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                    # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        shufps  xmm2, xmm2, 0                    # xmm2 = xmm2[0,0,0,0]
        divps   xmm0, xmm2
        ret
```
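To make the difference between the two listings concrete, here is a minimal scalar sketch of what one lane of each function ends up computing. In `func`, `divss` computes `1 / newLen` once and `mulps` then multiplies by that reciprocal; `func1` keeps a real `divps`. The helper names are hypothetical, and the sketch assumes it is built without `-Ofast`/`-ffast-math` on a target that evaluates `float` arithmetic in single precision (e.g. x86-64 with SSE), so the compiler keeps both forms exactly as written.

```cpp
#include <cassert>

// Hypothetical model of one lane of func() as compiled above:
// the reciprocal of newLen is rounded to float first, then multiplied in.
float lane_as_func(float d, float oldLen, float newLen) {
    float scaled = d * oldLen;
    float recip = 1.0f / newLen;  // first rounding
    return scaled * recip;        // second rounding
}

// Hypothetical model of one lane of func1(): a single, correctly
// rounded division, matching divps.
float lane_as_func1(float d, float oldLen, float newLen) {
    float scaled = d * oldLen;
    return scaled / newLen;
}

// Usage check for the 7 / 3 case described below.
void check_seven_thirds() {
    assert(lane_as_func1(7.0f, 1.0f, 3.0f) == 7.0f / 3.0f);
    assert(lane_as_func(7.0f, 1.0f, 3.0f) != lane_as_func1(7.0f, 1.0f, 3.0f));
}
```

For example, with `d = 7`, `oldLen = 1`, `newLen = 3`, `lane_as_func1` returns the correctly rounded 7/3 (about 2.33333325), while `lane_as_func` returns the next float up (about 2.33333349), because `1.0f / 3.0f` has already been rounded up before the multiply.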
More generally, the `*(1/a)` optimization seems questionable here, and clang doesn't apply it to scalars, only to vector/SIMD types. Is this another bug that needs to be reported separately?
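As a rough sketch of how often the `x * (1/y)` rewrite changes a result, the hypothetical helper below compares it against the correctly rounded quotient `x / y` for every pair of small positive integers converted to `float`. It assumes a plain IEEE build (no `-Ofast`/`-ffast-math`) with single-precision evaluation, so neither expression is rewritten by the compiler itself.

```cpp
#include <cassert>

// Hypothetical helper: counts pairs (x, y) in [1, max_value] x [1, max_value]
// for which multiplying by the float reciprocal of y gives a different float
// than dividing by y directly.
int count_reciprocal_mismatches(int max_value) {
    assert(max_value >= 1);
    int mismatches = 0;
    for (int xi = 1; xi <= max_value; ++xi) {
        for (int yi = 1; yi <= max_value; ++yi) {
            float x = static_cast<float>(xi);
            float y = static_cast<float>(yi);
            float recip = 1.0f / y;  // rounded reciprocal, as in func()
            if (x * recip != x / y) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

`count_reciprocal_mismatches(10)` is nonzero, since the pair (7, 3) already disagrees. Divisors that are powers of two never contribute, because their reciprocals are exact in `float`.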