Message 123025 - Python tracker
The use of Py_LOCAL_INLINE is new to me since we usually use #define instead, but this has a cleaner look to it. I am unclear on whether all of our target compilers support an inline keyword. If you're sure it works everywhere, that's great.
I fixed ./configure to properly set up Py_LOCAL_INLINE in Issue5553. :-)
It will expand to "static inline" under gcc and "static __inline" under MSVC. On other compilers, it may expand to "static __inline__", "static __inline", or whatever else is needed to get the job done.
As a last resort, it will expand to simply "static", but I don't know of any 32-bit (or 64-bit) compilers where that would actually happen.
Also note that this patch puts a lot of faith in branch prediction. If some target processor doesn't support it, or has limited ability to remember predictions, or mispredicts, then the code will be slower.
I think even a limited amount of memory dedicated to branch prediction should be sufficient. There are two cases:
Sorting a simple type, like an int: the comparison is lightweight, and the CPU should have plenty of memory to remember which branch to take in the sorting code.
Sorting a complex type (i.e., calling a __lt__ method written in Python): the processor might not be able to remember which branch to take, but the performance impact will be small (as a percentage), since most of the CPU time is spent in the comparisons.
Thanks for taking the time to review this.