[Python-3000] Making more effective use of slice objects in Py3k

Ron Adam rrr at ronadam.com
Mon Aug 28 13:14:14 CEST 2006


Nick Coghlan wrote:

This idea is inspired by the find/rfind string discussion (particularly a couple of comments from Jim and Ron), but I think the applicability may prove to be wider than just string methods (e.g. I suspect it may prove useful for the bytes() type as well).

If I'm following the ideas here, which were based (only in part) on my suggestion, this is not a major feature request but a combination of various small changes, each of which may have some benefits of its own. The proposal is more in line with cleaning things up so that they can (if one desires) be made to work together more easily. But that needn't be the main reason for doing it.

I also recognize that Python has many very specific functions and modules, many of which are highly optimized. Most of the major problems have already been solved in that way, so it is really hard to find things that make a big difference. But I don't think that means we shouldn't work on making small improvements where they are possible, even if it's only to make things a bit easier to remember and/or learn.

I think an enriched slicing model that allows sequence views to be expressed easily as "this slice of this sequence" would allow this to be dealt with cleanly, without requiring every sequence to provide a corresponding "sequence view" with non-copying semantics. I think Guido's concern that people will reach for string views when they don't need them is also valid (as I believe that it is most often inexperience that leads to premature optimization that then leads to needless code complexity).

I agree with both of these, but maybe we should concentrate on the individual changes and not a big picture to justify a group of changes. The individual changes or enhancements need to stand on their own.

So in that light, the following individual, separate items are what I would focus on for now. (Not string views or slice partition functions. Let those come later if they prove useful.)

The specific changes I suggest based on the find/rfind discussion are:

1. make range() (what used to be xrange()) a subclass of slice(), so that range objects can be used to index sequences. The only differences between range() and slice() would then be that start/stop/step will never be None for range instances, and range instances act like an immutable sequence while slice instances do not (i.e. range objects would grow an indices() method).

  1. Remove None stored as indices in slice objects. Depending on the step value, any Nones can be converted to 0 or -1 immediately. The step should never be None or zero.

Once the slice is created, the Nones are not needed; valid index values can be determined. This moves the checks forward to slice object creation time from slice object use time.

If a slice object is reused, then there might be some (micro) performance benefits if it is defined outside a loop and then used multiple times inside a loop.

Also, the indices can be read and used directly via slice.start, etc., without having to check for None or invalid indices, if someone wants to do that.
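As a rough illustration of the reuse case, here is a sketch written for current Python 3 (an assumption; the examples in this message use Python 2's xrange). As things stand, the None fields are only resolved once the slice meets a concrete length, through slice.indices(). The names window, rows, resolved and picked are made up for the example.

```python
# One slice object, defined once and reused on sequences of different length.
window = slice(1, None, 2)          # stop is stored as None

rows = [list(range(5)), list(range(8))]

# slice.indices(length) fills in the None fields for a given length and
# returns a (start, stop, step) tuple of plain integers.
resolved = [window.indices(len(row)) for row in rows]

# The same slice object applied directly to each sequence.
picked = [row[window] for row in rows]
```

Here resolved holds one tuple of integers per row, which is roughly what a slice with no Nones stored in it would expose directly through start, stop and step.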

2. change range() and slice() to accept slice() instances as arguments so that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError if x.stop is None).

  2. Enable slices and ranges to be converted back and forth.

This works now.

    >>> xrange(*slice(1,-1,1).indices(10))
    xrange(1, 9)

There is no way to get the indices from an xrange object. They are not available via attributes or methods (that I know of), but they can be obtained by parsing the repr string.

So this doesn't work.

 slice(*xrange(1,10,1).indices())   # no indices method

While I don't have any real, specific use case for this item, it may have some educational or introspective value, i.e., something to teach the relationship between the two. An xrange() object can also be defined outside a loop and then used multiple times in an inner loop.
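For comparison, a minimal sketch of the round trip in current Python 3, where range objects (from 3.3 on) expose start, stop and step attributes. This is an assumption about a later Python, not something available in the version used above. The helper names slice_to_range and range_to_slice are hypothetical.

```python
def slice_to_range(s, length):
    """Resolve slice `s` against `length` and return the matching range."""
    return range(*s.indices(length))


def range_to_slice(r):
    """Return a slice selecting the positions in `r` (non-negative bounds only)."""
    # Slicing reinterprets a negative bound as "count from the far end",
    # so refuse it rather than silently select different items.
    if r.start < 0 or r.stop < 0:
        raise ValueError("range bounds must be non-negative")
    return slice(r.start, r.stop, r.step)


data = list(range(10))
as_range = slice_to_range(slice(1, -1, 1), len(data))
as_slice = range_to_slice(range(1, 10, 2))
```

The guard in range_to_slice is there because a range such as range(9, -1, -1) is a valid countdown, while the slice built from the same three numbers selects nothing.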

  3. Continue to make xrange() and slice() a bit more alike in how they work and the values they return, but keep them separate and don't subclass range from slice. Each has a definitely different purpose; although they are related in some ways, they shouldn't try to 'be' the other, I think.

The following examples show some inconsistencies in how they work or where they could be more alike. For example, viewing an xrange object versus a slice object gives differing representations depending on what the values of the indices are. These are just minor (barely) annoyances, and there isn't anything actually wrong, but they could be improved a bit, I think.

slice always shows all three values if viewed. (This is ok)

    >>> slice(10)
    slice(None, 10, None)     # None stored as indices.
    >>> slice(0, 10, 1)
    slice(0, 10, 1)

- xrange only shows values different from the defaults.

    >>> xrange(10)
    xrange(10)
    >>> xrange(1, 10)
    xrange(1, 10)
    >>> xrange(0, 10, 1)
    xrange(10)                # hides 0 and 1

- The xrange stop value is always an even increment of the step value + start.

is even numbered.

    >>> xrange(1, 10, 2)
    xrange(1, 11, 2)          # 11! why not 10 here?
    >>> xrange(0, 10, 3)
    xrange(0, 12, 3)          # and 10 here instead of 12?
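Both results above are consistent with the displayed stop being start plus the number of items times the step. A small sketch of that arithmetic follows, assuming that rule (it is inferred from the outputs shown, not taken from the xrange source) and using Python 3's range to count the items. normalized_stop is a hypothetical helper.

```python
def normalized_stop(start, stop, step):
    """Return start + n * step, where n is the number of items produced."""
    n = len(range(start, stop, step))
    return start + n * step


# The two cases from the session above.
first = normalized_stop(1, 10, 2)
second = normalized_stop(0, 10, 3)
```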

slice accepts anything!

    >>> slice(1, 10, 0)       # zero for step
    slice(1, 10, 0)
    >>> slice(list, int, dict)
    slice(<type 'list'>, <type 'int'>, <type 'dict'>)

xrange rejects any invalid indices.

    >>> xrange(None, 10, None)     # None not an integer.
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: an integer is required

    >>> xrange(1, 10, 0)
    Traceback (most recent call last):
      File "", line 1, in ?
    ValueError: xrange() arg 3 must not be zero
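The difference in when the checks happen can be sketched as follows. This is written for current Python 3, with range standing in for xrange (an assumption about the reader's interpreter), and raises is a hypothetical helper that only reports whether a call fails with the given exception.

```python
def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, False otherwise."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


# Creating the slice is accepted; nothing is validated yet.
lazy = slice(1, 10, 0)

# The zero step is only rejected when the slice is actually resolved.
slice_fails_on_use = raises(ValueError, lazy.indices, 10)

# range performs the same checks at creation time.
range_fails_on_creation = raises(ValueError, range, 1, 10, 0)
range_rejects_none = raises(TypeError, range, None, 10)
```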

  4. Allow slice objects to be sub-classed. That will allow for experimentation and/or for programmers to modify slice in ways they may find useful for their "own" applications. Most likely it would be a way to group together methods that all use the same start, stop and/or step indices. Could it then be possible to apply those via the slice operation all at once?
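Without subclassing, the grouping idea can be approximated by wrapping a slice instead of inheriting from it. The sketch below is only that, an approximation: Span, of and positions are made-up names, and the code targets current Python 3.

```python
class Span:
    """Hypothetical wrapper grouping helpers that share one start/stop/step."""

    def __init__(self, start=None, stop=None, step=None):
        self._slice = slice(start, stop, step)

    def of(self, seq):
        """Return the part of `seq` selected by this span."""
        return seq[self._slice]

    def positions(self, seq):
        """Return the index positions this span selects in `seq`."""
        return range(*self._slice.indices(len(seq)))


inner = Span(1, -1)
text = "hello"
selected = inner.of(text)
where = list(inner.positions(text))
```

A real subclass could be passed straight to the slice operation, which a wrapper like this cannot do; that is the part that would need the change proposed here.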

  5. Find a way to avoid slice wraparounds. These happen when iterating past zero in either direction. It usually requires a different approach and/or a check to avoid going past the zero/-1 boundary.

One thought I've had on this is to allow only positive integers along with a symbol to indicate an index is to be counted from the far end. Then an exception could be raised if a negative index is used.

Possibly something like:

    [i:\j]    # '\' indicates j is to be counted from the far end.

The line continuation backslash could be special-cased for use with slices, I think. But some other symbol might be better.
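To make the wraparound concrete, here is a small sketch of the pitfall and of one workaround that is possible without new syntax: computing the far-end index from the length and rejecting negative counts. trim is a hypothetical helper, not an existing function.

```python
def trim(seq, left, right):
    """Drop `left` items from the front and `right` items from the back."""
    if left < 0 or right < 0:
        raise ValueError("counts must be non-negative")
    # Clamp at zero so an oversized `right` cannot turn into a negative
    # index that would count from the far end again.
    stop = max(len(seq) - right, 0)
    return seq[left:stop]


data = "abcdef"

# The pitfall: -0 is just 0, so "drop nothing from the end" selects nothing.
naive = data[1:-0]

# The same request expressed with non-negative counts.
safe = trim(data, 1, 0)
shorter = trim(data, 1, 2)
```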

I think this group of separate items taken together will do what the title in this thread suggests. But each of these is a separate item in itself as well and has its own reasons why it could be helpful.

Regarding the other items...

The above changes possibly make some (or most) of the other suggestions possible and/or easier to implement. A programmer could then roll their own string views or slice partition functions in a clean way if they want to. That's the point of "Making more effective use of slice objects". It's not a specific idea, but a generality that may come about by doing these other smaller things first. And doing them as a group is probably a good way to address these things.

I hope this clarifies at least my viewpoint, if not Nick's. But I'll keep an open mind and see what he has to offer in his PEP.

Cheers, Ron


