llvm.fma.f16 intrinsic is expanded incorrectly on targets without native half FMA support · Issue #98389 · llvm/llvm-project

Consider the following LLVM IR:

declare half @llvm.fma.f16(half %a, half %b, half %c)

define half @do_fma(half %a, half %b, half %c) {
  %res = call half @llvm.fma.f16(half %a, half %b, half %c)
  ret half %res
}

On targets without native half FMA support, LLVM turns this into the equivalent of:

declare float @llvm.fma.f32(float %a, float %b, float %c)

define half @do_fma(half %a, half %b, half %c) {
  %a_f32 = fpext half %a to float
  %b_f32 = fpext half %b to float
  %c_f32 = fpext half %c to float
  %res_f32 = call float @llvm.fma.f32(float %a_f32, float %b_f32, float %c_f32)
  %res = fptrunc float %res_f32 to half
  ret half %res
}

This is a miscompilation, however: float does not have enough precision to perform a fused multiply-add for half without double rounding becoming an issue. The float FMA rounds the exact result to float, and the fptrunc then rounds a second time to half, which can differ from rounding the exact result to half once. For instance (raw bits of each half are in brackets): do_fma(48.34375 (0x520b), 0.000013887882 (0x00e9), 0.12438965 (0x2ff6)) = 0.12512207 (0x3001), but LLVM's lowering to float FMA gives an incorrect result of 0.125 (0x3000).
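The arithmetic in this example can be checked outside LLVM. The following is a minimal Python sketch, not LLVM's actual code path: it assumes CPython's `struct` formats `"e"` (binary16) and `"f"` (binary32) round to nearest even, and the helper names `half_from_bits`, `half_bits` and `round_to_f32` are made up for illustration. It computes the exact value of a * b + c as a rational number, then compares rounding it to half once against rounding it to float first.

```python
import struct
from fractions import Fraction


def half_from_bits(bits):
    """Decode a raw IEEE 754 binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def half_bits(x):
    """Round a Python float (binary64) to binary16 and return its raw bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def round_to_f32(x):
    """Round a Python float (binary64) to binary32, returned as a float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a = half_from_bits(0x520B)  # 48.34375
b = half_from_bits(0x00E9)  # subnormal, 233 * 2**-24
c = half_from_bits(0x2FF6)  # 2038 * 2**-14

# Exact value of a * b + c as a rational number (no rounding at all).
exact = Fraction(a) * Fraction(b) + Fraction(c)

# For these particular inputs the exact result fits in binary64, so
# converting it to a Python float loses nothing.
exact_f64 = float(exact)
assert Fraction(exact_f64) == exact

# Reference: a single rounding from the exact result to half.
single_rounded = half_bits(exact_f64)

# Float lowering: round to binary32 (what a float FMA returns), then fptrunc.
double_rounded = half_bits(round_to_f32(exact_f64))

print(hex(single_rounded))  # 0x3001
print(hex(double_rounded))  # 0x3000
```

The exact result is 2^-3 * (1 + 2^-11 + 3 * 2^-26), which lies just above the midpoint between 0x3000 and 0x3001. Rounding to float drops the 3 * 2^-26 term, leaving the value exactly on that midpoint, so the subsequent round-to-nearest-even step picks 0x3000.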

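A correct lowering would need to use double (or larger). A double FMA is not required, as double is large enough to represent the result of half * half without any rounding: each half significand has 11 bits, so the product needs at most 22 bits, well within double's 53. This is why llvm.fmuladd.f64, which may or may not fuse the multiply and the add, is sufficient. In summary, a correct lowering would look something like this: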

declare double @llvm.fmuladd.f64(double %a, double %b, double %c)

define half @do_fma(half %a, half %b, half %c) {
  %a_f64 = fpext half %a to double
  %b_f64 = fpext half %b to double
  %c_f64 = fpext half %c to double
  %res_f64 = call double @llvm.fmuladd.f64(double %a_f64, double %b_f64, double %c_f64)
  %res = fptrunc double %res_f64 to half
  ret half %res
}
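As a sanity check of the double-based lowering, here is a rough Python model of it. It relies on Python floats being IEEE 754 binary64 and on `struct`'s `"e"` format rounding to nearest even; `fma_f16_via_f64` and the other helper names are hypothetical. The sketch only handles finite inputs whose result fits in half, and it illustrates the approach on a couple of inputs rather than proving it correct for all of them.

```python
import struct
from fractions import Fraction


def half_from_bits(bits):
    """Decode a raw IEEE 754 binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def half_bits(x):
    """Round a Python float (binary64) to binary16 and return its raw bits.

    Note: struct raises OverflowError for finite values too large for half
    instead of returning infinity, so this sketch assumes in-range results.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fma_f16_via_f64(a_bits, b_bits, c_bits):
    """Model of the lowering: fpext to double, multiply, add, fptrunc.

    Hypothetical helper for finite inputs only.
    """
    a = half_from_bits(a_bits)  # fpext half -> double is exact
    b = half_from_bits(b_bits)
    c = half_from_bits(c_bits)

    # Unfused multiply in double: the product of two halves needs at most
    # 22 significand bits, so no rounding happens here.
    product = a * b
    assert Fraction(product) == Fraction(a) * Fraction(b)

    # The add may round once to double; fptrunc then rounds to half.
    return half_bits(product + c)


result = fma_f16_via_f64(0x520B, 0x00E9, 0x2FF6)
print(hex(result))  # 0x3001
```

For the inputs from the example above, this model returns 0x3001, matching the correctly rounded result rather than the 0x3000 produced by the float lowering.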