[Python-Dev] Maintenance burden of str.swapcase

Antoine Pitrou solipsis at pitrou.net
Wed Sep 7 14:47:49 CEST 2011


On Wed, 07 Sep 2011 11:15:04 +0900 "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Antoine Pitrou writes:
>
> > Bytes objects are often used for partly ASCII strings,
>
> All I can say to that phrase is, "urk, ISO 2022 anyone?"

You could also point out UTF-16 or EBCDIC, but I fail to see how that's relevant. Do you have problems with ISO 2022 when parsing, say, e-mail headers?
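A brief illustration of why "partly ASCII" bytes can mislead (this example is not from the thread): ISO-2022-JP, a member of the ISO 2022 family, encodes Japanese text with escape sequences and bytes that all fall in the ASCII range. The sample text is arbitrary; `iso2022_jp` is a standard Python codec.

```python
# Illustration only: ISO-2022-JP output consists entirely of bytes < 0x80,
# so a byte-level "is this ASCII?" check cannot tell it apart from ASCII.
text = "日本"
encoded = text.encode("iso2022_jp")

print(encoded)                                 # escape sequences plus 7-bit bytes
print(all(byte < 0x80 for byte in encoded))    # True: every byte is in the ASCII range
print(encoded.decode("ascii") != text)         # True: decodes as ASCII, but wrongly
print(encoded.decode("iso2022_jp") == text)    # True: the right codec round-trips
```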

> > not arbitrary "arrays of bytes". And making indexing of bytes objects return ints was IMHO a mistake.
>
> Bytes objects are not ASCII strings, even though they can be used to represent them.

I'm talking about practice, not some idealistic view of the world. In many use cases (XML, HTML, e-mail headers, many other text-based protocols), you can get a mixture of ASCII "commands" and opaque binary stuff (which may or may not, depending on these "commands", have a meaningful Unicode decoding).

In the stdlib, bytes objects are accessed far more often to poke at some text-like data, than to poke at arbitrary numbers.
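The behaviour under discussion can be seen directly in Python 3: indexing a bytes object yields an int, slicing yields bytes, and bytes keep ASCII-only text methods such as `lower()` and `swapcase()`. A minimal sketch; the message format below is a made-up protocol line with an ASCII command followed by an opaque binary payload, not any real protocol.

```python
# Hypothetical protocol message: ASCII header line, then an opaque binary payload.
message = b"DATA 4\r\n\x00\xff\x10\x80"

# Indexing returns an int (the behaviour criticised above); slicing returns bytes.
print(message[0])      # 68, the code point of "D"
print(message[0:1])    # b'D'

# Split the ASCII header from the payload without decoding the payload at all.
header, _, payload = message.partition(b"\r\n")
command, length = header.split(b" ")

# bytes case methods only touch ASCII letters; every other byte passes through.
print(command.lower())               # b'data'
print(b"Data\xff".swapcase())        # b'dATA\xff'
print(int(length) == len(payload))   # True: int() accepts ASCII digits in bytes
```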

> With PEP 393, there isn't even really a space excuse.

Of course there is. Any single non-ASCII byte of data mingled with the aforementioned ASCII "commands" will make the decoded string switch to a less efficient representation.
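A rough way to observe the space cost, assuming CPython 3.3 or later (where PEP 393 is implemented): a str is stored with the narrowest per-character width that fits its widest character, so one escaped byte widens the whole string. `sys.getsizeof` results are CPython-specific and vary by version and platform, so only the relative sizes matter here.

```python
import sys

# CPython-specific sketch: PEP 393 chooses 1, 2 or 4 bytes per character
# based on the widest character present in the string.
ascii_part = b"COMMAND " * 1250    # 10,000 bytes of pure ASCII
raw = ascii_part + b"\xff"         # plus a single non-ASCII byte

sizes = {
    "ascii only": sys.getsizeof(ascii_part.decode("ascii")),
    # latin-1 maps 0xff to U+00FF: still 1 byte per character.
    "latin-1": sys.getsizeof(raw.decode("latin-1")),
    # surrogateescape maps 0xff to U+DCFF, forcing 2 bytes per character.
    "surrogateescape": sys.getsizeof(raw.decode("ascii", "surrogateescape")),
}
for name, size in sizes.items():
    print(f"{name:>16}: {size} bytes")
```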

And "surrogateescape" will be a performance problem in itself, when used on large binary data; if you use "latin1" instead, you are risking far greater confusion; ask David about that dilemma. :-)
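Both workarounds round-trip arbitrary bytes losslessly; they differ in how mistakes surface. A small sketch of that trade-off, using a hypothetical payload whose encoding the program does not know (it happens to be UTF-8 for "café"): the latin-1 string looks like ordinary text and can be silently re-encoded into mojibake, while the surrogateescape string is rejected by strict encoders.

```python
# Hypothetical payload of unknown encoding (actually UTF-8 for "café").
raw = b"caf\xc3\xa9"

as_latin1 = raw.decode("latin-1")                     # plausible-looking text
as_escaped = raw.decode("ascii", "surrogateescape")   # contains lone surrogates

# Both round-trip when encoded back the same way they were decoded.
assert as_latin1.encode("latin-1") == raw
assert as_escaped.encode("ascii", "surrogateescape") == raw

print(as_latin1)                   # cafÃ©
print(as_latin1.encode("utf-8"))   # silently double-encoded bytes

# The surrogateescape string fails loudly if it leaks into strict encoding.
try:
    as_escaped.encode("utf-8")
    escaped_rejected = False
except UnicodeEncodeError as exc:
    escaped_rejected = True
    print("rejected:", exc.reason)
```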

> AFAICS, anything that should be done with ASCII-punned magic numbers ("protocol tokens", if you prefer) can be done with slices and (ta-da!) case conversion.

So, basically, you're saying that we should remove useful functionality and tell people to reimplement an ad hoc version of it when they need it. That sounds obnoxious.
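To make concrete what an ad hoc reimplementation might look like, here is a sketch of a hand-rolled stand-in for the ASCII-only `bytes.swapcase()`, built on `bytes.maketrans()` and `bytes.translate()`. The helper name `ascii_swapcase` is hypothetical and was not proposed in the thread; the comparison simply checks it against the built-in.

```python
import string

# Hypothetical stand-in for bytes.swapcase(), as user code would need it
# if the built-in ASCII case methods on bytes were removed.
_UPPER = string.ascii_uppercase.encode("ascii")
_LOWER = string.ascii_lowercase.encode("ascii")
_SWAP_TABLE = bytes.maketrans(_UPPER + _LOWER, _LOWER + _UPPER)


def ascii_swapcase(data: bytes) -> bytes:
    """Swap the case of ASCII letters only; all other bytes are unchanged."""
    return data.translate(_SWAP_TABLE)


sample = b"Content-Type: \xffBinary"
print(ascii_swapcase(sample))                       # b'cONTENT-tYPE: \xffbINARY'
print(ascii_swapcase(sample) == sample.swapcase())  # True
```

Every protocol library needing this would have to carry its own copy of such a table, which is the duplication the built-in method avoids.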

Regards

Antoine.


