`llvm.fma.f16` intrinsic is expanded incorrectly on targets without native `half` FMA support · Issue #98389 · llvm/llvm-project
Consider the following LLVM IR:
```llvm
declare half @llvm.fma.f16(half %a, half %b, half %c)

define half @do_fma(half %a, half %b, half %c) {
  %res = call half @llvm.fma.f16(half %a, half %b, half %c)
  ret half %res
}
```
On targets without native `half` FMA support, LLVM turns this into the equivalent of:
```llvm
declare float @llvm.fma.f32(float %a, float %b, float %c)

define half @do_fma(half %a, half %b, half %c) {
  %a_f32 = fpext half %a to float
  %b_f32 = fpext half %b to float
  %c_f32 = fpext half %c to float
  ; first rounding: the exact a*b+c is rounded to float
  %res_f32 = call float @llvm.fma.f32(float %a_f32, float %b_f32, float %c_f32)
  ; second rounding: the float result is rounded again to half
  %res = fptrunc float %res_f32 to half
  ret half %res
}
```
This is a miscompilation, however: `float` does not have enough precision to compute a `half` fused multiply-add without double rounding becoming an issue. The exact result is rounded once to `float` and then again by the `fptrunc` to `half`, and the two roundings together can differ from a single correctly rounded `half` result. For instance (raw bits of each `half` are in parentheses): `do_fma(48.34375 (0x520b), 0.000013887882 (0x00e9), 0.12438965 (0x2ff6)) = 0.12512207 (0x3001)`, but LLVM's lowering to `float` FMA gives an incorrect result of `0.125 (0x3000)`.
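The arithmetic behind this example can be checked by hand. Writing each input as an integer times a power of two (decoded from the raw bits above), the exact value of $ab + c$ and its roundings work out as follows, assuming the default round-to-nearest, ties-to-even mode:

$$
\begin{aligned}
a &= \texttt{0x520b} = 1547 \cdot 2^{-5}, \quad b = \texttt{0x00e9} = 233 \cdot 2^{-24}, \quad c = \texttt{0x2ff6} = 2038 \cdot 2^{-14} \\
ab + c &= (360451 + 2038 \cdot 2^{15}) \cdot 2^{-29} = 67141635 \cdot 2^{-29} = 2^{-3} + 32771 \cdot 2^{-29} \\
\text{half midpoint} &= 2^{-3} + 2^{-14} = 2^{-3} + 32768 \cdot 2^{-29} \quad (\text{half ulp on } [2^{-3}, 2^{-2}) \text{ is } 2^{-13}) \\
\operatorname{round}_{\text{half}}(ab + c) &= 2^{-3} + 2^{-13} = \texttt{0x3001} \quad (\text{exact value lies } 3 \cdot 2^{-29} \text{ above the midpoint}) \\
\operatorname{round}_{\text{float}}(ab + c) &= 2^{-3} + 4096 \cdot 2^{-26} = 2^{-3} + 2^{-14} \quad (\text{float ulp is } 2^{-26} = 8 \cdot 2^{-29},\ 32771 = 8 \cdot 4096 + 3) \\
\operatorname{round}_{\text{half}}(2^{-3} + 2^{-14}) &= 2^{-3} = \texttt{0x3000} \quad (\text{an exact tie, broken toward the even significand})
\end{aligned}
$$

The first rounding to `float` discards the $3 \cdot 2^{-29}$ that put the exact value above the `half` midpoint, so the second rounding sees a tie and goes the wrong way.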
A correct lowering would need to use `double` (or larger). A `double` FMA is not required, because `double` is large enough to represent the result of `half * half` without any rounding. In summary, a correct lowering would look something like this:
```llvm
declare double @llvm.fmuladd.f64(double %a, double %b, double %c)

define half @do_fma(half %a, half %b, half %c) {
  %a_f64 = fpext half %a to double
  %b_f64 = fpext half %b to double
  %c_f64 = fpext half %c to double
  ; the half * half product is exact in double, so fusing is not required
  %res_f64 = call double @llvm.fmuladd.f64(double %a_f64, double %b_f64, double %c_f64)
  %res = fptrunc double %res_f64 to half
  ret half %res
}
```
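The Python sketch below (an illustration, not part of the original report) replays the example through both lowerings. It assumes an IEEE 754 platform where the `struct` module's `'e'` and `'f'` codes round a double to binary16 and binary32 with round-to-nearest-even. `half_from_bits`, `half_bits` and `round_to_float32` are hypothetical helper names, not LLVM APIs. Every finite `half` is an integer of at most 11 bits times a power of two between 2^-24 and 2^5, so a product of two needs at most 22 significant bits, which is why `a * b` is exact in Python's double. For these particular inputs the sum is exact too (the sketch asserts it), so rounding `s` to float models what a `float` FMA would return.

```python
import struct
from fractions import Fraction


def half_from_bits(bits: int) -> float:
    """Decode a raw IEEE 754 binary16 bit pattern into a Python float (a double)."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def half_bits(x: float) -> int:
    """Round a double to the nearest binary16 value (ties to even); return its raw bits."""
    return int.from_bytes(struct.pack("<e", x), "little")


def round_to_float32(x: float) -> float:
    """Round a double to the nearest binary32 value, then widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a = half_from_bits(0x520B)  # 48.34375
b = half_from_bits(0x00E9)  # 233 * 2**-24, a subnormal half
c = half_from_bits(0x2FF6)  # 0.1243896484375

# Exact rational value of a*b + c, with no rounding anywhere.
exact = Fraction(a) * Fraction(b) + Fraction(c)

# The half*half product is exact in double; for these particular inputs the
# addition happens to be exact as well, so `s` is the true value of a*b + c.
s = a * b + c
assert Fraction(a * b) == Fraction(a) * Fraction(b)
assert Fraction(s) == exact

# Float lowering: round the exact result to float (what a float FMA returns),
# then fptrunc that float to half. Two roundings.
via_float = half_bits(round_to_float32(s))

# Double lowering: the double result (exact here) is rounded once to half.
via_double = half_bits(s)

print(f"via float:  {via_float:#06x}")   # via float:  0x3000
print(f"via double: {via_double:#06x}")  # via double: 0x3001
```

The shortcut of treating `s` as the exact sum only holds for inputs like these, where the double addition does not round; the sketch checks this one triple and is not an exhaustive proof about all `half` inputs.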