Richard Guenther - Re: [PATCH] Fix PR29789, Missed invariant out of the loop with conditio (original) (raw)

This is the mail archive of the gcc-patches@gcc.gnu.orgmailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR29789, Missed invariant out of the loop with conditionals and shifts


On Thu, 15 Feb 2007, Roger Sayle wrote:

Hi Richard,

On Thu, 15 Feb 2007, Richard Guenther wrote:

2007-01-25 Richard Guenther rguenther@suse.de

PR middle-end/29789
* fold-const.c (fold_binary): Do not fold (1 << a) & b to
(b >> a) & 1.

I will apply this later.

Noooo! Or at least not so quickly, without further benchmarking.

;)

So I verified that we create exactly the same assembler on two-operand insn machines with and without the patch.

I didn't believe you :-), so I tested this out myself.

For the following test case:

int foo(int a, int b) { return ((1 << a) & b) != 0; }

which currently on i686-pc-linux-gnu with -O2 -fomit-frame-pointer is

_foo: movl 8(%esp), %eax movl 4(%esp), %ecx sarl %cl, %eax andl $1, %eax ret

but with your proposed patch becomes:

_foo: movl 4(%esp), %ecx movl $1, %eax sall %cl, %eax testl %eax, 8(%esp) setne %al movzbl %al, %eax ret

Which is significantly larger and slower.

Heh, ok. I only tested on the testcase from PR29789. It looks like the above should be somewhere in the testsuite.

I'm suprised that pinskia hadn't mentioned the various PRs concerning shifts and single bit tests, including those that would have to be reopened if you eliminated this code.

This is just working around the real problem of invariant identification by reassociations. Given an expression such as iv + (invar + 1), we currently canonicalize with the constant outermost, (iv + invar) + 1 which may cause a simplistic invariant subexpression pass to miss the fact that (invar + 1) is loop invariant. It's not just shifts, but all kinds of expressions that may need to be restructured to separate the loop variant from the loop invariant terms.

Well sure. This patch was a quick shot at solving parts of our slowness in spec2k6 libquantum. I bet that PRE with the new VN could do all this invariant motion - tree lim can only do it if I adjust its cost metric and have the trees canonicalized differently, likewise rtl invariant motion can do it with different canonicalization and current costs.

I'll see what can be done from within fold, or perhaps cleaned up within the RTL optimizers, but these transformations are part of a complex push-pull transformation chain, with the canonical forms being identified and appropriately lowered during RTL expansion.

[This is why you didn't see a change in an if(...) test. We canonicalize to "(a >> b) & 1" at the tree-level and then at RTL expansion time, we determine whether the integer value is needed or not (in expand_expr vs. do_jump) and emit/expand the appropriate form.]

Even if we decided on the alternate canonical form, which doesn't satisfy truth_value_p, your patch would also then need to transform "(a >> b) & 1" into "(1 << b) & a", so we could spot their equivalence, and lower it intelligently in expr.c/jump.c.

At a minimum you should run SPEC (for performance) and CSiBE (for code size) to see, if your change wins more often than it loses.

I did run spec2k on x86_64 and all was just noise. libquantum from spec2k6 got a nice 5% improvement, I didn't run the rest.

I'll take the patch back for now.

Richard.

-- Richard Guenther rguenther@suse.de Novell / SUSE Labs


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]