mul nuw+lshr exact should fold to a single multiplication (when the latter is a factor) · Issue #54824 · llvm/llvm-project (original) (raw)

Alive for the specific example, with opt+llc: https://alive2.llvm.org/ce/z/7oofsh
+label: llvm:optimizations

I tried the following:

define i64 @src(i64 noundef %0) { start: %1 = mul nuw nsw i64 %0, 52 %2 = lshr exact i64 %1, 2 ret i64 %2 }

Since 52 = 4 * 13 I expected to see that fold down to just a single mul:

define i64 @tgt(i64 noundef %0) { start: %1 = mul nuw nsw i64 %0, 13 ret i64 %1 }

But it doesn't -- the mul and lshr are left in the optimized IR, and aren't folded by the target either:

src: # @src imul rax, rdi, 52 shr rax, 2 ret

Whereas the single-multiplication version ends up not even needing an imul:

tgt: # @tgt lea rax, [rdi + 2rdi] lea rax, [rdi + 4rax] ret

So in general, %1 = mul nuw %0, CONST1; %2 = lshr exact %1, CONST2 should simplify to a single mul nuw whenever CONST1 is a multiple of 1 << CONST2.

In fact, it looks like this already happens in opt if written with div: https://alive2.llvm.org/ce/z/L-VouQ

%1 = mul nuw i64 %0, 52 %2 = udiv exact i64 %1, 4 => %1 = mul nuw nsw i64 %0, 13

So that code path just needs to take into account places where the udiv -> lshr transformation had already happened.