msg294674 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-29 06:58 |
The peephole optimizer optimizes some boolean conditions. For example in "if not a:" it replaces UNARY_NOT+POP_JUMP_IF_FALSE with POP_JUMP_IF_TRUE, and in "if a and b:" it makes checking the boolean value of a only once. But it is unable to optimize more complex conditions, like "if not a and b:". Proposed patch makes the compiler producing optimized code for conditions. It supports expressions with arbitrary complexity containing the "not" operator, the "and" and "or" binary operators, the "if" ternary operator, and chained comparisons, used as conditions in the ternary operator, in the "if", "while" and "assert" statements, and in generator expressions and comprehensions. It would be possible to remove the part of the peepholer optimizer, but it is needed also for optimizing the "and" and "or" operators in non-boolean context. Doing this optimization in the compiler is possible but too cumbersome, it requires 3 times more code that in the proposed patch. May be I'll find the more general solution in future. |
|
|
msg294680 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-29 07:19 |
Some examples. if not a and b: x Unpatched: 1 0 LOAD_NAME 0 (a) 2 UNARY_NOT 4 POP_JUMP_IF_FALSE 14 6 LOAD_NAME 1 (b) 8 POP_JUMP_IF_FALSE 14 10 LOAD_NAME 2 (x) 12 POP_TOP >> 14 LOAD_CONST 0 (None) 16 RETURN_VALUE Patched: 1 0 LOAD_NAME 0 (a) 2 POP_JUMP_IF_TRUE 12 4 LOAD_NAME 1 (b) 6 POP_JUMP_IF_FALSE 12 8 LOAD_NAME 2 (x) 10 POP_TOP >> 12 LOAD_CONST 0 (None) 14 RETURN_VALUE if a <= b < c: x Unpatched: 1 0 LOAD_NAME 0 (a) 2 LOAD_NAME 1 (b) 4 DUP_TOP 6 ROT_THREE 8 COMPARE_OP 1 (<=) 10 JUMP_IF_FALSE_OR_POP 18 12 LOAD_NAME 2 (c) 14 COMPARE_OP 0 (<) 16 JUMP_FORWARD 4 (to 22) >> 18 ROT_TWO 20 POP_TOP >> 22 POP_JUMP_IF_FALSE 28 24 LOAD_NAME 3 (x) 26 POP_TOP >> 28 LOAD_CONST 0 (None) 30 RETURN_VALUE Patched: 1 0 LOAD_NAME 0 (a) 2 LOAD_NAME 1 (b) 4 DUP_TOP 6 ROT_THREE 8 COMPARE_OP 1 (<=) 10 POP_JUMP_IF_FALSE 20 12 LOAD_NAME 2 (c) 14 COMPARE_OP 0 (<) 16 POP_JUMP_IF_FALSE 28 18 JUMP_FORWARD 4 (to 24) >> 20 POP_TOP 22 JUMP_FORWARD 4 (to 28) >> 24 LOAD_NAME 3 (x) 26 POP_TOP >> 28 LOAD_CONST 0 (None) 30 RETURN_VALUE if not (a and b) and c: x Unpatched: 1 0 LOAD_NAME 0 (a) 2 JUMP_IF_FALSE_OR_POP 6 4 LOAD_NAME 1 (b) >> 6 UNARY_NOT 8 POP_JUMP_IF_FALSE 18 10 LOAD_NAME 2 (c) 12 POP_JUMP_IF_FALSE 18 14 LOAD_NAME 3 (x) 16 POP_TOP >> 18 LOAD_CONST 0 (None) 20 RETURN_VALUE Patched: 1 0 LOAD_NAME 0 (a) 2 POP_JUMP_IF_FALSE 8 4 LOAD_NAME 1 (b) 6 POP_JUMP_IF_TRUE 16 >> 8 LOAD_NAME 2 (c) 10 POP_JUMP_IF_FALSE 16 12 LOAD_NAME 3 (x) 14 POP_TOP >> 16 LOAD_CONST 0 (None) 18 RETURN_VALUE Note that the __bool__() method of every value is evaluated only once. |
|
|
msg294681 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-29 07:43 |
It would be hard to get the same optimization in the peepholer. For the first example, "if not a and b:", it is easy, just replace UNARY_NOT+POP_JUMP_IF_FALSE with POP_JUMP_IF_TRUE. The more complex third example, "if not (a and b) and c:", would need multiple passes (and more complex examples can need more passes, having the quadratic complexity in corner cases). The second example, with chained comparisons, is the most complex. It uses the fact that after the conditional jump and some stack operations we got a false value on the stack. This is too complex for the peepholer. |
|
|
msg294693 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-05-29 13:00 |
Any high-level benchmarks (i.e. other than pybench and similar programs) impacted by this? |
|
|
msg294713 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2017-05-29 21:04 |
Will this negatively impact code coverage reporting or code tracing (by optimizing across basic blocks)? |
|
|
msg294714 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2017-05-29 21:16 |
The improved code does look better. I'm +1 on this proposal as long as we're sure it doesn't introduce any new problems. |
|
|
msg294743 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-30 07:25 |
> Any high-level benchmarks (i.e. other than pybench and similar programs) impacted by this? Sorry, I don't have any data. Currently I'm not able to run large benchmarks. I would be surprised if this increases the performance of a macrobenchmark even by 1%. But this change is a step in the direction of getting rid from the peephole optimizer. > Will this negatively impact code coverage reporting or code tracing (by optimizing across basic blocks)? I can't imagine the case. The compiler doesn't generate separate line numbers neither for JUMP instructions, nor for auxilary stack manipulating instructions in chained comparison, nor for UNARY_NOT. And these are the only things that are changed. Line number marks are left the same in optimized code, and they points to the same instructions. I think that coverage tools and debugger are not affected. |
|
|
msg295705 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-06-11 11:50 |
New changeset 36ff451ebae41f09560bff582c95946474d898f8 by Serhiy Storchaka in branch 'master': bpo-30501: Make the compiler producing optimized code for condition expressions. (#1851) https://github.com/python/cpython/commit/36ff451ebae41f09560bff582c95946474d898f8 |
|
|