Message 79056 - Python tracker

@Alexandre:

So, can you try dropping the switch altogether, always using computed goto, and seeing how the resulting code gets compiled?

Removing the switch won't be possible unless we change the semantics of EXTENDED_ARG. In addition, I doubt the improvement, if any, would be worth the increased complexity.

OK, it's time that I post code to experiment with that: there is no need to break EXTENDED_ARG. And the point is to fight miscompilations.
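To make the idea concrete, here is a minimal sketch of switch-free dispatch that still supports an EXTENDED_ARG-style prefix. It is not the patch under discussion and not CPython's real code: the opcode numbering, the two-byte instruction layout, and the names `run_toy_vm`, `targets` and `DISPATCH` are all made up for illustration. It relies on the "labels as values" extension of GCC (also accepted by Clang), so it is not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; NOT CPython's real opcode numbering. */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_EXTENDED_ARG, NUM_OPCODES };

/* Assumed layout: each instruction is two bytes, an opcode followed by
   an 8-bit argument. The stack depth is not checked in this sketch. */
long run_toy_vm(const unsigned char *code)
{
    /* Table of label addresses (GCC/Clang "labels as values"). */
    static void *const targets[NUM_OPCODES] = {
        [OP_HALT] = &&op_halt,
        [OP_PUSH] = &&op_push,
        [OP_ADD] = &&op_add,
        [OP_EXTENDED_ARG] = &&op_extended_arg,
    };
    long stack[16];
    size_t sp = 0;
    unsigned int oparg = 0;
    unsigned int ext = 0;   /* high bits left by a preceding EXTENDED_ARG */
    unsigned char op;

/* Fetch the next instruction, merge any pending extended bits into its
   argument, and jump straight to its handler. */
#define DISPATCH()                      \
    do {                                \
        op = code[0];                   \
        oparg = ext | code[1];          \
        ext = 0;                        \
        code += 2;                      \
        assert(op < NUM_OPCODES);       \
        goto *targets[op];              \
    } while (0)

    DISPATCH();

op_push:
    stack[sp++] = (long)oparg;
    DISPATCH();
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
op_extended_arg:
    /* Carry the high bits over to the next instruction, then dispatch
       through the same table; no switch statement is involved. */
    ext = oparg << 8;
    DISPATCH();
op_halt:
    return sp ? stack[sp - 1] : 0;
#undef DISPATCH
}
```

In this sketch the prefix is just another handler that leaves state behind for the next fetch, so every opcode ends in its own indirect jump. Whether a given compiler keeps those jumps separate is exactly what needs to be checked in the generated assembly.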

Do you actually mean the time spent interpreting bytecodes compared to the time spent in the other parts of Python? If so, your figures are wrong for CPython on x86-64. It is about 50% just like on x86 (when running pybench). With the patch, this drops to 35% on x86-64 and to 45% on x86.

More or less, I mean that, but I was giving an example, and I made up reasonable figures. 70% or even more just for dispatch (i.e. just for the mispredicted indirect jump) is valid for real-world Smalltalk interpreters, for instance, or for the ones in "The Structure and Performance of Efficient Interpreters". But when you say "interpreting opcodes", I do not know which part you refer to: just the computed goto, or the whole code in the interpreter function.