msg195918 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2013-08-22 20:52 |
The get/set/delitem slicing protocol has replaced the old Py2.x get/set/delslice protocol in Py3.x. This change introduces a substantial overhead due to processing indices as Python objects rather than plain Py_ssize_t values. This overhead should be reduced as much as possible. This ticket can be seen as a more general follow-up to . |
|
|
msg195919 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2013-08-22 20:55 |
This tiny patch adds a fast-path to _PyEval_SliceIndex() that speeds up the slicing-heavy "fannkuch" benchmark by 8% for me. |
|
|
msg195920 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2013-08-22 21:15 |
Sorry, broken patch. Here's a new one (same results). |
|
|
msg195922 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2013-08-22 21:22 |
Another patch, originally proposed in by Kristján Valur Jónsson. It uses local variables instead of pointers in PySlice_GetIndicesEx(). I see no performance difference whatsoever (Linux/gcc 4.7), but also I can't see a reason not to do this. I think it cleans up the code a bit. |
|
|
msg195925 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2013-08-22 21:48 |
And in fact, callgrind shows an *increase* in the number of instructions executed for the "use locals" patch (898M with vs. 846M without that patch when running the fannkuch benchmark twice). That speaks against making that change, I guess. |
|
|
msg195944 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2013-08-23 06:31 |
Here is another patch that remembers the Py_ssize_t slice indices if they are known at instantiation time. It only makes a very small difference for the "fannkuch" benchmark, so that's no reason to add both the complexity and the (IMHO ignorable) memory overhead. However, it also adds a public C-API function PySlice_FromIndices() that allows setting the values from C code at instantiation time, thus avoiding the overhead of having to do the conversion back again. It might also be worth exploring if we can't instantiate the Python object indices at first request using properties, iff the slice was created with integer indices. Meaning, we'd leave the PyObject* fields NULL in that case until they are being used. That would reduce the overhead of creating the slice object in the first place. Since we added the one-instance cache, it's now dominated by the creation of the index value objects when _PySlice_FromIndices() is being used internally (e.g. when calling PySequence_Get/Set/DelSlice()). |
|
|
msg231215 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-11-15 20:02 |
Are there any benchmarks? |
|
|
msg231219 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-11-15 21:32 |
As mentioned, the fannkuch benchmark benefits quite a bit from the fast path, by 8%. It was a while ago when I tested, though, so I don't have exact numbers ATM. |
|
|
msg231235 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-11-16 08:17 |
Here's another idea. There could be two implementations of "slice", one that uses Python object indices (as currently) and one that has Py_ssize_t indices (and properties for the start/stop/step attributes). Similar to what Unicode strings do, the slice type would decide which to use at instantiation time. The internals of PySliceObject are not part of the stable ABI, and the normal way to figure out the indices is PySlice_GetIndicesEx(), which would be adapted accordingly. This breaks compatibility with some C extensions that access the slice attributes directly, but that should be rare, given how complex index calculations are. This change is certainly more invasive than adding Py_ssize_t fields to the existing object, though. |
|
|
msg231237 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-11-16 08:35 |
Thanks for the review, Serhiy. There isn't really a reason not to 'normalise' the Python step object to None when 1 is passed from C code, so I updated the patch to do that. Regarding on-demand creation of the Python values, the problem is that C code that accesses the struct fields directly would not be prepared to find NULL values in them, so this would be a visible change and potential source of crashes. However, my guess is that this would rarely be a problem (see my last comment on changing the slice type internally). |
|
|
msg231238 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-11-16 09:47 |
> There could be two implementations of "slice", one that > uses Python object indices (as currently) and one that has Py_ssize_t > indices (and properties for the start/stop/step attributes). Agree, this idea LGTM. Single boolean flag should be enough to switch between implementations. This shouldn't break well written code. The PySliceObject structure is not a part of stable API. |
|
|
msg231244 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-11-16 10:41 |
I reran the benchmarks with the fast path in _PyEval_SliceIndex() and the results are not conclusive. There is no clear indication that it improves the performance. The problem is that most slices have no step, many slices are open ended on at least one side (i.e. usually have one or two fields set to None) and they are only used once, so integer index optimisations can only have a limited impact compared to instantiating the slice object in the first place. |
|
|
msg231247 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-11-16 14:32 |
Indeed, PySequence_GetSlice() is used only in few performance non-critical places in the stdlib, PySequence_Set/DelSlice() are not used at all. So looks as this way doesn't speed up any code. As for faster_PyEval_SliceIndex.patch see . May be such optimization is not always correct (only for exact PyLong). And why you pass through fast path only if -1<= Py_SIZE(v) <=1? |
|
|
msg231256 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-11-16 18:43 |
> why you pass through fast path only if -1<= Py_SIZE(v) <=1? It's usually enough, it's the fastest fast path, and it doesn't even need error handling. Any slicing where the slice calculation time matters is going to be of fairly limited size, often involving the values (!) 1 or -1 in the slice object fields, but definitely small integers. |
|
|
msg234884 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-01-28 10:29 |
So may be close this issue as it doesn't affect performance? |
|
|
msg234949 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2015-01-29 07:12 |
Closing as not worth doing. |
|
|