Misoptimization: EarlyCSEPass replaces powi.f16 with float result · Issue #98665 · llvm/llvm-project
It looks like EarlyCSEPass is transforming the following:
    %_6 = alloca [48 x i8], align 8
    %_3 = alloca [2 x i8], align 2
    %0 = call half @llvm.powi.f16.i32(half 0xH3C00, i32 1) ; 0xH3C00 = 1.0f16
    store half %0, ptr %_3, align 2
    %1 = load half, ptr %_3, align 2
    %_4 = fcmp oeq half %1, 0xH3C00
    br i1 %_4, label %bb1, label %bb2
Into this:
    %_6 = alloca [48 x i8], align 8
    %_3 = alloca [2 x i8], align 2
    store float 1.000000e+00, ptr %_3, align 2
    %0 = load half, ptr %_3, align 2
    %_4 = fcmp oeq half %0, 0xH3C00
    br i1 %_4, label %bb1, label %bb2
And later InstCombine folds further into:
    %_6 = alloca [48 x i8], align 8
    %_3 = alloca [2 x i8], align 2
    store float 1.000000e+00, ptr %_3, align 2
    br i1 false, label %bb1, label %bb2
EarlyCSE seems to be doing an incorrect transformation: the result of powi.f16(1.0, 1) should be half 1.0 (0x3c00), but it is returning float 1.0 (0x3f800000). This is incorrect, and because %_3 is only 2 bytes, the 4-byte float store is also an OOB write.
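To make the bit patterns concrete, here is a minimal sketch in stable Rust (no `f16` feature needed). It shows what a 2-byte load would read back after a 4-byte store of float 1.0 to the same address, which is one plausible reading of why the later comparison folds to false. The helper `low_half_bits_of_f32` and the constant `HALF_ONE_BITS` are hypothetical names introduced for illustration, and the sketch assumes a little-endian target; it does not model what EarlyCSE or InstCombine actually compute internally.

```rust
/// Bit pattern of 1.0 as an IEEE 754 binary16 value (`0xH3C00` in the IR).
const HALF_ONE_BITS: u16 = 0x3C00;

/// Hypothetical helper: the 16 bits a 2-byte load would see after a 4-byte
/// store of `x` to the same address. ASSUMPTION: little-endian byte order.
fn low_half_bits_of_f32(x: f32) -> u16 {
    let bytes = x.to_bits().to_le_bytes();
    // On a little-endian target the first two bytes in memory are the
    // low 16 bits of the 32-bit pattern.
    u16::from_le_bytes([bytes[0], bytes[1]])
}

fn main() {
    let float_one_bits = 1.0f32.to_bits();
    println!("float 1.0 bits:         {:#010x}", float_one_bits);
    println!("half  1.0 bits:         {:#06x}", HALF_ONE_BITS);
    println!("bits read back as half: {:#06x}", low_half_bits_of_f32(1.0));
}
```

Under these assumptions the value read back is not 0x3C00, so an `fcmp oeq` against half 1.0 cannot be true, which would match the `br i1 false` seen after InstCombine.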
This comes from the following Rust code, which asserts only when optimizations are enabled:
    #![feature(f16)]
    #![allow(unused)]

    #[inline(never)]
    pub fn check_pow(a: f16) {
        assert_eq!(1.0f16.powi(1), 1.0);
    }

    pub fn main() {
        check_pow(1.0);
        println!("finished");
    }
Link to compiler explorer: https://rust.godbolt.org/z/zsbzzxGvj
I'm not sure how to reduce this to an llc example since the passes appear different. I have been testing on aarch64 since x86 has other f16 ABI bugs, but I don't think this is limited to aarch64.