msg63422 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-03-09 15:35 |
This optimization targets a common case of functions like this: def foo(a, b): if a: if b: return 'true' Before the optimization: 6 0 LOAD_FAST 0 (a) 3 JUMP_IF_FALSE 16 (to 22) 6 POP_TOP 7 7 LOAD_FAST 1 (b) 10 JUMP_IF_FALSE 5 (to 18) 13 POP_TOP 8 14 LOAD_CONST 1 ('true') 17 RETURN_VALUE >> 18 POP_TOP 19 JUMP_FORWARD 1 (to 23) >> 22 POP_TOP >> 23 LOAD_CONST 0 (None) 26 RETURN_VALUE After: 6 0 LOAD_FAST 0 (a) 3 JUMP_IF_FALSE 16 (to 22) 6 POP_TOP 7 7 LOAD_FAST 1 (b) 10 JUMP_IF_FALSE 9 (to 22) 13 POP_TOP 8 14 LOAD_CONST 1 ('true') 17 RETURN_VALUE 18 POP_TOP 19 JUMP_FORWARD 1 (to 23) >> 22 POP_TOP >> 23 LOAD_CONST 0 (None) 26 RETURN_VALUE Note that size of bytecode doesn't become smaller, however one execution branch (jump from offset 10) becomes shorter by one JUMP_FORWARD. Additionally, two commands at offset 18 become unreachable. However, they will not be removed by the patch in currently, because there is a basic-block change at 18, where JUMP_IF_FALSE previously had its target. If we want unreachable code be removed in such cases, we need to make peepholer make two passes with recomputing blocks[] in between. This would enable more optimizations. |
|
|
msg63424 - (view) |
Author: Paul Pogonyshev (_doublep) |
Date: 2008-03-09 15:39 |
Also, this is the reason for while() in the patch. Consider this function: def bar(a, b, c): if a: if b: if c: return 'true' Before the patch: 11 0 LOAD_FAST 0 (a) 3 JUMP_IF_FALSE 27 (to 33) 6 POP_TOP 12 7 LOAD_FAST 1 (b) 10 JUMP_IF_FALSE 16 (to 29) 13 POP_TOP 13 14 LOAD_FAST 2 (c) 17 JUMP_IF_FALSE 5 (to 25) 20 POP_TOP 14 21 LOAD_CONST 1 ('true') 24 RETURN_VALUE >> 25 POP_TOP 26 JUMP_ABSOLUTE 34 >> 29 POP_TOP 30 JUMP_FORWARD 1 (to 34) >> 33 POP_TOP >> 34 LOAD_CONST 0 (None) 37 RETURN_VALUE After: 11 0 LOAD_FAST 0 (a) 3 JUMP_IF_FALSE 27 (to 33) 6 POP_TOP 12 7 LOAD_FAST 1 (b) 10 JUMP_IF_FALSE 20 (to 33) 13 POP_TOP 13 14 LOAD_FAST 2 (c) 17 JUMP_IF_FALSE 13 (to 33) 20 POP_TOP 14 21 LOAD_CONST 1 ('true') 24 RETURN_VALUE 25 POP_TOP 26 JUMP_ABSOLUTE 34 29 POP_TOP 30 JUMP_FORWARD 1 (to 34) >> 33 POP_TOP >> 34 LOAD_CONST 0 (None) 37 RETURN_VALUE Without the while(), target for jump at offset 17 wouldn't be optimized, because "jump to unconditional jump" optimization hasn't been done yet: it is farther in the code. |
|
|
msg70540 - (view) |
Author: Jack Diederich (jackdied) *  |
Date: 2008-08-01 03:43 |
+0 * The peepholer comment promises "Optimizations are restricted to simple transformations occuring within a single basic block." and this goes beyond that. You could patch that comment. * This needs a matching patch to Lib/test/test_peepholer.py * Deeply nested "if"s suffer a flat penalty because the unconditional jump peepholer figures this out after the first extra POP_TOP (notice the JUMP_ABSOLUTE in your second disassembly). * I noticed the NOP assignment is commented out in your patch, was that unsafe? I searched for old threads on the peephole optimizer and came up with these: Skip Montanaro's paper http://www.webfast.com/~skip/python/spam7/optimizer.html Two old python-dev threads http://www.gossamer-threads.com/lists/python/dev/645669 http://www.gossamer-threads.com/lists/python/dev/645669 I was hoping to find out if writing a bytecode optimizer in python had been discussed because a generic version of your optimization would be really easy to write in pure python (using a dict for the jump table, for instance). I found nothing; my search terms might have been insufficient. |
|
|
msg70546 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-01 09:41 |
While this patch is interesting, I think it would be more efficient to introduce variants of JUMP_IF_FALSE and JUMP_IF_TRUE which pop their argument rather than leaving it on the stack (like I did in #2459). |
|
|
msg75935 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-11-16 12:51 |
This looks fine to me. Go ahead and uncomment the memcpy() line (I presume you commented it out in order to demonstrate the basic transformation without the dead code elimination). Be sure to add a testcase. Antoine, the issue with JUMP_IF variants is that they interfere with other existing byte code optimizations, resulting in a net loss. BTW, don't expect much of a real-world speed-up from this patch. The JUMP_ABSOLUTE opcode is very fast. |
|
|
msg75936 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-11-16 13:08 |
> Antoine, the issue with JUMP_IF variants is that they interfere with > other existing byte code optimizations, resulting in a net loss. Which other byte code optimizations does it interfere with? |
|
|
msg75937 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-11-16 13:13 |
Am taking this one back. May have a simpler approach. |
|
|
msg75938 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-11-16 14:27 |
Adding a simpler approach. It's a bit limited because some common cases have the POP_TOP JUMP_FORWARD 1 pair split across two basic blocks. Those cases should not be optimized away on a first pass. |
|
|
msg75944 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-11-16 21:13 |
Georg, would you please give this a second review. It passes the whole test suite for me and triggers often. |
|
|
msg75994 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-11-18 00:07 |
Applied simplified patch r67253 . |
|
|