Hi Sanjay,
> If I'm seeing it correctly, (part of?) the fold you're looking for is here:
> ...but it's restricted to pre-legalization.
> I don't remember exactly what the problem was allowing that fold post-legalization, but maybe you can loosen that restriction?
Thanks! I tried simply removing the !LegalOperations condition
(DAGCombiner.cpp:10056), and that indeed solved my problem. Doing
this on SystemZ (for all of the opcodes) did not affect SPEC
much. Opcode counts (trunk on the left):
opcode         :    trunk   patched   delta
aghi           :    38759     38742     -17
ahi            :    34921     34936     +15
risbgn         :    37104     37092     -12
nill           :     2172      2183     +11
lr             :    29731     29735      +4
sr             :     6055      6059      +4
srk            :     3743      3741      -2
lhi            :    89566     89568      +2
risblg         :     6528      6529      +1
la             :   192375    192374      -1
Spill|Reload   :   189670    189670      +0
So, to me it seems this could be the default on SystemZ at least.
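For anyone who wants to re-check the deltas, here is a minimal sketch that recomputes them from the counts quoted above. The dictionary layout and the name "patched" (for the build with the !LegalOperations condition removed) are assumptions made for illustration; they do not come from any LLVM tooling.

```python
# Counts copied from the table above: (trunk, patched).
# "patched" is a label chosen here for the build without the
# !LegalOperations condition; it is not an LLVM term.
counts = {
    "aghi": (38759, 38742),
    "ahi": (34921, 34936),
    "risbgn": (37104, 37092),
    "nill": (2172, 2183),
    "lr": (29731, 29735),
    "sr": (6055, 6059),
    "srk": (3743, 3741),
    "lhi": (89566, 89568),
    "risblg": (6528, 6529),
    "la": (192375, 192374),
    "Spill|Reload": (189670, 189670),
}

# Per-row delta: patched minus trunk.
deltas = {name: patched - trunk for name, (trunk, patched) in counts.items()}

# Net change over the listed rows only (not the whole benchmark suite).
net = sum(deltas.values())

for name, delta in deltas.items():
    print(f"{name:<14}: {delta:+d}")
print(f"net over listed rows: {net:+d}")
```

The net change over the listed rows is a handful of instructions, which matches the impression that the effect on SPEC is small.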
/Jonas
On Fri, Feb 8, 2019 at 10:20 AM Jonas Paulsson via llvm-dev <llvm-dev@lists.llvm.org> wrote:
Hi,
SystemZ supports @llvm.ctlz.i64() natively with a single instruction
(FLOGR), and narrower versions of the intrinsic are promoted to i64.
For some reason, this leaves additions of constants unfolded, as shown
below:
This function:
declare i16 @llvm.ctlz.i16(i16, i1)

define i16 @fun(i16 %arg) {
  %1 = tail call i16 @llvm.ctlz.i16(i16 %arg, i1 false)
  ret i16 %1
}
gives this optimized DAG as input to instruction selection:
SelectionDAG has 15 nodes:
t0: ch = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t10: i32 = and t2, Constant:i32<65535>
t16: i64 = zero_extend t10
t17: i64 = ctlz t16
t22: i64 = add t17, Constant:i64<-32>
t20: i32 = truncate t22
t15: i32 = add t20, Constant:i32<-16>
t7: ch,glue = CopyToReg t0, Register:i32 $r2l, t15
t8: ch = SystemZISD::RET_FLAG t7, Register:i32 $r2l, t7:1
SelectionDAG::computeKnownBits() has a case for ISD::CTLZ, and it
appears to figure out that the high bits of t17 are zero, as expected.
t17 is guaranteed to have a value between 48 and 64, so there cannot
be any overflow here, although I am not sure whether that is the
problem or not... Should DAGCombiner::visitADD() handle this, or perhaps
visitTRUNCATE()?
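
To make the expected fold concrete, here is a small sketch that models the DAG above with plain integer arithmetic and checks, for every 16-bit input, that the two separate adds (-32 before the truncate, -16 after it) give the same result as a single add of -48. The helper names (ctlz, wrap, dag_as_emitted, dag_folded) are hypothetical and exist only for this illustration; this is a model of the arithmetic, not DAGCombiner code.

```python
def ctlz(value: int, width: int) -> int:
    """Leading zeros of `value` viewed as a `width`-bit unsigned integer.

    Returns `width` for zero, matching ctlz with is_zero_undef = false.
    """
    return width - value.bit_length()


def wrap(value: int, width: int) -> int:
    """Keep the low `width` bits (two's-complement wraparound / truncation)."""
    return value & ((1 << width) - 1)


def dag_as_emitted(arg: int) -> int:
    """Follow the nodes of the DAG dump step by step."""
    t10 = arg & 65535            # t10: and t2, 65535
    t17 = ctlz(t10, 64)          # t16/t17: zero_extend to i64, then ctlz
    t22 = wrap(t17 - 32, 64)     # t22: add t17, -32 (i64)
    t20 = wrap(t22, 32)          # t20: truncate to i32
    return wrap(t20 - 16, 32)    # t15: add t20, -16 (i32)


def dag_folded(arg: int) -> int:
    """Same computation with the two constants merged into one add of -48."""
    t17 = ctlz(arg & 65535, 64)
    return wrap(wrap(t17, 32) - 48, 32)


# Exhaustive check over all i16 inputs.
for arg in range(1 << 16):
    t17 = ctlz(arg, 64)
    assert 48 <= t17 <= 64                      # range claimed in the text
    assert dag_as_emitted(arg) == dag_folded(arg) == ctlz(arg, 16)
```

Note that truncation commutes with wrapping addition, so the two forms agree regardless of the value range; the 48..64 range only confirms that no overflow is involved in this case.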
Thanks for any help,
Jonas
_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev