Issue 2268: Fold slice constants
I am attaching a proof-of-concept patch which would optimize bytecode generated from constant slices as follows:
Before patch:
>>> dis(lambda:x[1:2:3])
  1           0 LOAD_GLOBAL              0 (x)
              3 LOAD_CONST               0 (1)
              6 LOAD_CONST               1 (2)
              9 LOAD_CONST               2 (3)
             12 BUILD_SLICE              3
             15 BINARY_SUBSCR
             16 RETURN_VALUE
After the patch:
>>> dis(lambda:x[1:2:3])
  1           0 LOAD_GLOBAL              0 (x)
              3 LOAD_CONST               3 (slice(1, 2, 3))
              6 BINARY_SUBSCR
              7 RETURN_VALUE
While the peephole optimizer changes are straightforward, the optimization does not work unless slice objects gain hash and marshal support.
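For reference, both operations fail on an unpatched interpreter; marshal support is what would let a code object holding such a constant be written out to a .pyc file (tracebacks trimmed, exact wording varies slightly between versions):

    >>> hash(slice(1, 2, 3))
    TypeError: unhashable type
    >>> import marshal
    >>> marshal.dumps(slice(1, 2, 3))
    ValueError: unmarshallable object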
While I don't see any problem with adding slice marshaling, the idea of making slices hashable was recently rejected (see ), and I supported the rejection myself.
With this patch, however, I would like to reopen the discussion of the hash(slice(..)) issue.
Allowing constant folding for slices may tip the balance towards allowing hash(slice(..)) assuming that {}[:] can still be prohibited.
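For context, this is what the prohibition looks like today: slicing a dict fails only because the implicitly built slice object cannot be used as a dictionary key:

    >>> {}[:]
    TypeError: unhashable type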
One possible approach to this problem would be to emit a new bytecode instead of BINARY_SUBSCR for slice syntax and have it reject mapping objects. This should be easy for objects implemented in C, but for user-defined classes with custom (get|set)item it may not be easy to distinguish between a mapping and a sequence. However, I don't see much of a problem with always allowing x[:] for such types (user code can reject slices if necessary).
If the extra-bytecode approach is taken, it is likely that d[slice(a,b)] will end up being supported even if d[a:b] is not. Some may think that is a good feature, though.
A possible advantage of that approach would be a better error message from an attempt to slice a dictionary. The current "unhashable type" diagnostic is somewhat cryptic. "Cannot slice a dictionary" would be much friendlier.
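To make the idea concrete, here is a rough Python-level sketch of the semantics such an opcode could have (the name is made up; a real implementation would live in ceval.c and check the C-level type slots rather than use isinstance):

    def slice_subscript(container, sl):
        # Hypothetical semantics for a dedicated "subscript with a slice"
        # opcode emitted for x[a:b:c] in place of plain BINARY_SUBSCR.
        # Built-in dicts get the friendlier message; everything else,
        # including user-defined classes with a custom __getitem__,
        # is subscripted as usual and may reject the slice itself.
        if isinstance(container, dict):
            raise TypeError("cannot slice a dictionary")
        return container[sl]

With these semantics, d[a:b] would produce the new message while d[slice(a, b)] would still go through ordinary subscription, which is the asymmetry noted above.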
It is possible to work around the unhashability of slices and still implement the folding, but the ideas that come to mind, such as placing a hashable subclass of slice into the constants, seem too artificial.
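To illustrate why it feels artificial: slice is not an acceptable base type for pure-Python subclassing, so a pure-Python illustration has to use a stand-in wrapper along these lines (hypothetical names), and the interpreter would then have to unwrap it at run time:

    class _HashableSlice(object):
        """Hashable stand-in the optimizer could place into co_consts."""
        __slots__ = ('start', 'stop', 'step')

        def __init__(self, start, stop, step):
            self.start, self.stop, self.step = start, stop, step

        def __hash__(self):
            return hash((self.start, self.stop, self.step))

        def __eq__(self, other):
            return (isinstance(other, _HashableSlice) and
                    (self.start, self.stop, self.step) ==
                    (other.start, other.stop, other.step))

        def as_slice(self):
            # BINARY_SUBSCR (or a new opcode) would have to call this
            # before performing the actual subscription.
            return slice(self.start, self.stop, self.step)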
I am copying the "nosy" list from to start the discussion.
Just to quantify the improvement:
Before:
$ ./python -m timeit -s"x='abc'" "x[::-1]"
1000000 loops, best of 3: 0.305 usec per loop
$ ./python -O -m timeit -s"x='abc'" "x[::-1]"
1000000 loops, best of 3: 0.275 usec per loop
After:
$ ./python -m timeit -s"x='abc'" "x[::-1]"
1000000 loops, best of 3: 0.262 usec per loop
$ ./python -O -m timeit -s"x='abc'" "x[::-1]"
1000000 loops, best of 3: 0.253 usec per loop
For some reason, when I run pybench, the timings vary so much from run to run that I cannot even tell the difference. (The run-to-run differences are larger than the patched-to-original difference.)
FWIW, the micro-benchmark above shows 8% improvement with "-O" and 14% improvement without.