[Python-Dev] bytes / unicode

Nick Coghlan ncoghlan at gmail.com
Wed Jun 23 12:58:00 CEST 2010

On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> Note that the point of using a builtin method was to get better performance. Such type adaptions are often needed in loops, so adding a few extra Python function calls just to convert a str object to a bytes object or vice-versa is a bit much overhead.

I actually agree with that. I just think we need more real-world experience of what works with the Python 3 text model before we start messing with the APIs for the builtin objects. (Fair point that "coerce" is a loaded term given the existence of the old coercion protocol. It's the right word for the task, though.)

One of the key points coming out of this thread (to my mind) is the lack of a Text ABC or other way of making an object that can be passed to functions expecting a str instance with a reasonable expectation of having it work. Are there some core string capabilities that can be identified and then expanded out to a full str-compatible API? (i.e. something along the lines of what collections.MutableMapping now provides for dict-alikes).
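For reference, this is roughly what the MutableMapping precedent looks like in practice: a class supplies five core methods and inherits the rest of the dict API as mixins. This is only a sketch; LowerCaseDict is a made-up example class, and the import uses collections.abc, where the ABC lives in current Python versions.

```python
from collections.abc import MutableMapping


class LowerCaseDict(MutableMapping):
    """Hypothetical dict-alike that stores its keys lower-cased."""

    def __init__(self):
        self._data = {}

    # The five abstract methods that MutableMapping requires.
    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = LowerCaseDict()
d.update({"Content-Type": "text/plain"})  # mixin method, built on __setitem__
default = d.setdefault("ACCEPT", "*/*")   # mixin method
has_key = "CONTENT-TYPE" in d             # mixin __contains__, built on __getitem__
```

A Text ABC would presumably aim for the same shape: a small required core, with the rest of the str API derived from it. Which methods would form that core is exactly the open question.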

However, even if something like that was added, PJE is correct in pointing out that builtin strings still don't play well with others in many cases (usually due to underlying optimisations or other sound reasons, but perhaps sometimes gratuitously). Most of the string binary operations can be dealt with through their reflected forms, but str.__mod__ will never return NotImplemented, __contains__ has no reflected form and the actual method calls are of course right out (e.g. the arguments to str.join() or str.split() calls have no ability to affect the type of the result).
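A small sketch of those cases, as they behave in current CPython 3. Markup and MyStr are hypothetical classes invented purely for illustration.

```python
class Markup:
    """Hypothetical text-like type that is not a str subclass."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __radd__(self, other):
        # Reached, because str concatenation gives up on a non-str operand.
        return Markup(str(other) + self.text)

    def __rmod__(self, other):
        # Not reached for `str % Markup`: str does the formatting itself.
        return Markup("reflected")


class MyStr(str):
    """Hypothetical str subclass with no overridden methods."""


added = "<b>" + Markup("x")                   # reflected form: result is a Markup
formatted = "%s" % Markup("x")                # plain str, __rmod__ is never called
joined = ", ".join([MyStr("a"), MyStr("b")])  # plain str, not MyStr
parts = MyStr("a b").split()                  # list of plain str objects

try:
    Markup("x") in "xyz"                      # no reflected form to fall back on
    contains_error = None
except TypeError as exc:
    contains_error = exc
```

Only the addition lets the third party type decide the result. Formatting, joining and splitting all hand back builtin str objects, and the containment test simply fails.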

Third party number implementations couldn't provide comparable functionality to builtin int and long objects until the __index__ protocol was added. Perhaps PJE is right that what this is really crying out for is a way to have third party "real string" implementations so that there can actually be genuine experimentation in the Unicode handling space outside the language core (comparable to the difference between the "you can turn me into an int" __int__ method and the "I am an int equivalent" __index__ method).
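To make the __int__ / __index__ distinction concrete, here is a minimal sketch. Convertible and IntEquivalent are hypothetical classes used only for illustration.

```python
import operator


class Convertible:
    """Hypothetical type that can be turned into an int on request."""

    def __int__(self):
        return 1


class IntEquivalent:
    """Hypothetical type that claims to be usable wherever an int is."""

    def __index__(self):
        return 1


items = ["a", "b", "c"]

picked = items[IntEquivalent()]             # accepted directly as a sequence index
as_index = operator.index(IntEquivalent())  # the explicit form of the same protocol
as_int = int(Convertible())                 # explicit conversion works

try:
    items[Convertible()]                    # __int__ alone is not enough for indexing
    index_error = None
except TypeError as exc:
    index_error = exc
```

A string equivalent of __index__ would let an object say "treat me as text" rather than merely "you can render me as text", which str() already covers.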

That may be tapping in a nail with a sledgehammer (and would raise significant moratorium questions if pursued further), but I think it's a valid question to at least ask.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia