msg109259
Author: Demur Rumed (serprex)
Date: 2010-07-04 19:53
Currently, COMPARE_OP is burdened by a needless, non-orthogonal extra layer of indirection. I've modified it to handle only the rich-comparison case, moving the other five cases to BINARY_IN, BINARY_NOT_IN, BINARY_IS, BINARY_IS_NOT, and BINARY_EXC_MATCH. Worth considering is inlining the POP_JUMP_IF_FALSE/POP_TOP pair that always follows BINARY_EXC_MATCH.
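(Illustrative sketch, not part of the patch: on an interpreter from this era, before the 3.9 CONTAINS_OP/IS_OP split, dis shows all of these forms funnelling through COMPARE_OP, with only the oparg selecting the comparison kind.)

import dis

# Everything compiles to COMPARE_OP; only the oparg distinguishes
# '<' from 'in' from 'is', so each comparison pays for the same
# inner dispatch in ceval.c.
dis.dis(compile("a < b", "<demo>", "eval"))
dis.dis(compile("a in b", "<demo>", "eval"))
dis.dis(compile("a is b", "<demo>", "eval"))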
|
|
msg109261
Author: Mark Dickinson (mark.dickinson)
Date: 2010-07-04 20:09
Any chance you could regenerate the patch without all the unnecessary whitespace changes? It's kinda hard to read right now.
|
|
msg109279
Author: Demur Rumed (serprex)
Date: 2010-07-05 01:02
I've attached the original patch without the whitespace changes, and am also including modifications to this portion of the interpreter which remove ROT_FOUR and DUP_TOPX while adding ROT_THREE_TWO, DUP_TOP_TWO, and DUP_ROT_THREE. I've seen a 5% speed increase with timeit.Timer("a[0]+=0","a=[0]"). I also modified the optimizer to reject code with a size over 32767 instead of 32760; that's most likely a useless modification which can safely be ignored.
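(For reference, a minimal sketch of the quoted micro-benchmark; the repeat/number values here are my own, not from the patch.)

import timeit

# Augmented item assignment exercises the stack-shuffling opcodes
# (DUP_TOPX/ROT_FOUR on the stock interpreter, DUP_TOP_TWO under the patch).
t = timeit.Timer("a[0] += 0", "a = [0]")
print(min(t.repeat(repeat=5, number=1000000)))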
|
|
msg109282
Author: Demur Rumed (serprex)
Date: 2010-07-05 01:26
Fixed a regression in ROT_THREE_TWO and a missed DUP_TOP2 -> DUP_TOP_TWO rename. Now only the odd segfaults remain.
|
|
msg109319
Author: Demur Rumed (serprex)
Date: 2010-07-05 14:07
I've fixed the segfaulting in IN and NOT_IN, though the code needs cleaning up given the obtusely cargo-culted nature of those opcodes. I'm also including an incremental patch which completely removes COMPARE_OP, using the opcode itself to determine the comparison type. As a result, I found it safe to remove the PyCmp_ enum, though cpickle seems to disagree. It's also lacking performance-wise.
|
|
msg109360
Author: Demur Rumed (serprex)
Date: 2010-07-05 22:47
It seems the lack of benefit from replacing COMPARE_OP with a set of aliased bytecodes was repairable through TARGET_WITH_IMPL. Also fixed: the missing amendments to the dis module.
|
|
msg109514
Author: Raymond Hettinger (rhettinger)
Date: 2010-07-08 05:56
Be sure to consider the impact on the peephole optimizer. Also consider that adding volume inside the ceval.c loop may have unintended slowing effects with various compilers -- supposedly, this has been a serious issue in the past and there has been concern about growing the body of the loop. I presume that the thrust of the effort is directed at eliminating the unpredictable switch branch inside the COMPARE_OP opcode. If so, there's not much fat to be shed here. The cost of the opcode switch is small relative to the time it takes to dispatch to an object's rich comparison methods.
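(A rough way to sanity-check that last claim; the int-versus-__lt__ comparison below is my own illustration, not something from this issue.)

import timeit

class Slow:
    # A Python-level __lt__ forces the full rich-comparison machinery.
    def __lt__(self, other):
        return True

# Comparing built-in ints is dominated by cheap C-level work; adding a
# user-defined __lt__ makes the dispatch and the method body dwarf
# whatever the opcode's inner switch costs.
print(timeit.timeit("a < b", "a = 1; b = 2"))
print(timeit.timeit("a < b", "from __main__ import Slow; a = Slow(); b = Slow()"))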
|
|
msg109538
Author: Benjamin Peterson (benjamin.peterson)
Date: 2010-07-08 13:14
I would like any such change to show what it improves through benchmarks.
|
|
msg109553
Author: Demur Rumed (serprex)
Date: 2010-07-08 16:25
The separation of COMPARE_OP into multiple opcodes shouldn't affect cache size since the opcodes are aliased. Spreading out the switch statement shouldn't cause issues from flattening, since GCC would inline the single use of cmp_outcome. Thus the only bloat is the wrapper overhead, and I believe the C code is relatively large with respect to those macros. As for DUP_TOP_TWO, that's smaller than DUP_TOPX. It may be worth benchmarking on systems besides my own, given the unpredictable nature of caches; I've seen improvement in any code that uses comparisons. Also worth noting is that ceval.o dropped from 268592 bytes to 267448 bytes, which is reflected in the interpreter's executable dropping from 6721842 bytes to 6719866 bytes. Raymond, what do you mean with respect to the peepholer? I believe my changes there simplify the code.
|
|
msg109557
Author: Antoine Pitrou (pitrou)
Date: 2010-07-08 17:33
The patch fails to compile with --with-computed-gotos:

Python/ceval.c: In function ‘PyEval_EvalFrameEx’:
Python/opcode_targets.h:101: error: label ‘TARGET_DUP_TOPX’ used but not defined
make: *** [Python/ceval.o] Error 1
|
|
msg109561
Author: Demur Rumed (serprex)
Date: 2010-07-08 18:10
Replaced TARGET_DUP_TOPX with _unknown_opcode. I recompiled with --with-computed-gotos, and the results remain promising, though the interpreter instead grows from 6756023 bytes to 6765167 bytes.
|
|
msg109564
Author: Antoine Pitrou (pitrou)
Date: 2010-07-08 18:31
Here, your patch actually shows a slowdown on pybench with computed gotos. Results attached.
|
|
msg115345
Author: Raymond Hettinger (rhettinger)
Date: 2010-09-02 00:23
I'm marking this one as rejected. The timings have shown mixed results (some favorable, some not). In general, there is a bias against expanding the number of opcodes because 1) there aren't that many codes available, 2) it grows the size of the switch-case (which has been problematic with some compilers on various systems), and 3) it complicates other optimization efforts such as the peephole optimizer and other tools that depend on bytecode (such as the Unladen Swallow LLVM project, Psyco, or ByteCodeHacks). Also, the potential improvement is very small because it removes only one indirection out of a long chain of events involved in a comparison. Most of the time is consumed in the abstract dispatch to a concrete comparison method and in that method itself.
|
|