[Python-3000] Making more effective use of slice objects in Py3k
Guido van Rossum guido at python.org
Tue Aug 29 21:55:21 CEST 2006
On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
"Guido van Rossum" <guido at python.org> wrote:
> For operations that may be forced to return a new string (e.g.
> concatenation) I think the return value should always be a new string,
> even if it could be optimized. So for example if v is a view and s is
> a string, v+s should always return a new string, even if s is empty.
I'm on the fence about this. On the one hand, I understand the desirability of being able to get the underlying string object without difficulty. On the other hand, its performance characteristics could be confusing to users of Python who may have come to expect that "st+''" is a constant time operation, regardless of the length of st.
Well views aren't strings. And s+t (for s and t strings) normally takes O(len(s)+len(t)) time.
The type consistency and predictability is more important to me.
I didn't mean to recommend v+"" as the best way to turn a view v into a string; that would be str(v).
For the non-null string addition case, I agree that it could make some sense to return the string (considering you will need to copy it anyway), but if one returned a view on that string, it would be more consistent with other methods, and getting the string back via str(view) would offer equivalent functionality. It would also require the user to be explicit about what they really want; though there is the argument that if I'm passing a string as an operand to addition with a view, I actually want a string, so give me one.
I strongly believe you're mistaken here. I don't think users will have any trouble with the concept "operations that don't (necessarily) return a substring will return a new string."
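As an illustration of the rule stated above, here is a minimal sketch. The StringView class and all of its methods are hypothetical, written only for this example; it is not the implementation under discussion. Slicing yields a substring and so stays a view, while concatenation always produces a new str, and str(v) is the explicit conversion.

```python
class StringView:
    """Hypothetical read-only view over part of a str (illustration only)."""

    def __init__(self, base, start=0, stop=None):
        # Normalize the bounds once against the underlying string.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base
        self._start = start
        self._stop = max(start, stop)

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # A strided slice is not a substring, so copy instead.
                return str(self)[index]
            # A contiguous slice is a substring: return a view, no copy.
            return StringView(self._base, self._start + start, self._start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StringView index out of range")
        return self._base[self._start + index]

    def __str__(self):
        # Explicit conversion back to a real string: one copy, O(len(view)).
        return self._base[self._start:self._stop]

    def __add__(self, other):
        # Concatenation need not be a substring, so always return a new str,
        # even when `other` is empty.
        return str(self) + str(other)

    def __radd__(self, other):
        return str(other) + str(self)


v = StringView("hello world", 6)
sub = v[1:3]          # still a view
joined = v + ""       # always a plain str
```

With this sketch the result type depends only on the operation, never on the operand values, which is the predictability argued for above.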
I'm going to implement it as returning a view, but leave commented sections for some of them to return a string.
> BTW beware that in py3k, strings (which will always be unicode
> strings) won't support the buffer API -- bytes objects will. Would you
> want views on strings or on bytes or on both?
That's tricky. Views on bytes will come for free, like array, mmap, and anything else that supports the buffer protocol. It requires the removal of the hash method for mutables, but that is certainly expected.
The question is, how useful is the buffer protocol going to be? We don't know yet.
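For readers following along today, a small illustration of a view over a buffer-supporting object. It assumes the built-in memoryview type that later Python releases provide; that type did not exist when this message was written, so this is a retrospective sketch and not what was being proposed here.

```python
data = b"GIF89a" + bytes(10)   # 16 bytes of sample data

view = memoryview(data)        # wraps the buffer without copying it
header = view[:6]              # slicing a view gives another view

# The slice still refers to the original object; a copy is made only
# when the bytes are requested explicitly.
same_object = header.obj is data
header_bytes = bytes(header)
```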
Right now, a large portion of standard library code uses strings and string methods to handle parsing, etc. Removing immutable byte strings from 3.x seems likely to result in a huge amount of rewriting necessary to utilize either bytes or text (something I have mentioned before). I believe that with views on bytes (and/or sufficient bytes methods), the vast majority would likely result in the use of bytes.
Um, unless you consider decoding a GIF file "parsing", parsing would seem to naturally fall in the realm of text (characters), not bytes.
Having a text view for such situations that works with the same kinds of semantics as the bytes view would be nice from a purity/convenience standpoint, and only needing to handle a single data type (text) could make its implementation easier. I don't have any short-term plans of writing text views, but it may be somewhat easier to do after I'm done with string/byte views.
Unifying the semantics between byte views and text views will be difficult since bytes are mutable.
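A short sketch of why mutability matters for views. It again assumes the memoryview and bytearray types of later Python releases, used here only as a stand-in for a view over mutable bytes: the view observes later changes to its buffer and cannot be hashed, which has no counterpart for a view over immutable text.

```python
buf = bytearray(b"spam and eggs")
view = memoryview(buf)[0:4]    # view over the first four bytes

before = bytes(view)           # snapshot taken before the mutation
buf[0] = ord("S")              # mutate the underlying buffer in place
after = bytes(view)            # the same view now shows the new contents

# A view over a mutable buffer cannot offer a stable hash.
hashable = True
try:
    hash(view)
except (TypeError, ValueError):
    hashable = False
```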
I recommend that you have a good look at the bytes implementation in the p3yk branch.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)