[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Phillip J. Eby pje at telecommunity.com
Tue Feb 14 17:25:01 CET 2006

Previous message: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Next message: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:

On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:

Phillip J. Eby wrote:

I was just pointing out that since byte strings are bytes by definition, then simply putting those bytes in a bytes() object doesn't alter the existing encoding. So, using latin-1 when converting a string to bytes actually seems like the One Obvious Way to do it.

This is a misconception. In Python 2.x, the type str already is a bytes type. So if S is an instance of 2.x str, bytes(S) does not need to do any conversion. You don't need to assume it is latin-1: it's already bytes.
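To make the latin-1 point concrete: latin-1 maps each of the 256 byte values to the code point with the same number, so a decode/encode round trip leaves the bytes untouched. The following is only an illustrative sketch, written for a Python 3 interpreter (where bytes and text are distinct types); the helper name latin1_round_trip is hypothetical.

```python
def latin1_round_trip(data):
    """Decode bytes as latin-1, then encode back (hypothetical helper)."""
    text = data.decode("latin-1")
    # Each character's code point equals the original byte value.
    assert [ord(ch) for ch in text] == list(data)
    return text.encode("latin-1")


every_byte = bytes(range(256))
round_tripped = latin1_round_trip(every_byte)
```

No byte value is rejected or changed, which is why latin-1 looks like a "do nothing" encoding for byte strings.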

In fact, the 'encoding' argument seems useless in the case of str objects, and it seems it should default to latin-1 for unicode objects.

I agree with the former, but not with the latter. There shouldn't be a conversion of Unicode objects to bytes at all. If you want bytes from a Unicode string U, write bytes(U.encode(encoding))

I like it, it makes sense. Unicode strings are simply not allowed as arguments to the byte constructor. Thinking about it, why would it be otherwise? And if you're mixing str-strings and unicode-strings, that means the str-strings you're sometimes giving are actually not byte strings, but character strings anyhow, so you should be encoding those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Actually, I think you mean:

 if isinstance(s_or_U, str):
     s_or_U = s_or_U.decode('utf-8')

 b = bytes(s_or_U.encode('utf-8'))

Or maybe:

 if isinstance(s_or_U, unicode):
     s_or_U = s_or_U.encode('utf-8')

 b = bytes(s_or_U)

Which is why I proposed that the boilerplate logic get moved into the bytes constructor. I think this use case is going to be common in today's Python, but in truth I'm not as sure what bytes() will get used for in today's Python. I'm probably overprojecting based on the need to use str objects now, but bytes aren't going to be a replacement for str for a good while anyway.
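As a rough sketch of the boilerplate in question, here is the second variant wrapped in a function. It is written for a Python 3 interpreter, where str plays the role of 2.x unicode, and both the name to_bytes and the utf-8 default are assumptions made for illustration, not part of any proposal in this thread.

```python
def to_bytes(s_or_u, encoding="utf-8"):
    """Return bytes for either text or byte input (hypothetical helper)."""
    if isinstance(s_or_u, str):
        # Text has no byte representation until an encoding is chosen.
        return s_or_u.encode(encoding)
    # Already byte-like (bytes, bytearray): copy the bytes unchanged.
    return bytes(s_or_u)


utf8_result = to_bytes("caf\u00e9")
latin1_result = to_bytes("caf\u00e9", "latin-1")
passthrough = to_bytes(b"caf\xe9")
```

The same text yields different bytes under different encodings, while byte input passes through untouched, which is the asymmetry the encoding argument has to paper over.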

Kill the encoding argument, and you're left with:

Python2.X:
 - bytes(bytes_object) -> copy constructor
 - bytes(str_object) -> copy the bytes from the str to the bytes object
 - bytes(sequence_of_ints) -> make bytes with the values of the ints, error on overflow

Python3.X removes str, and most APIs that did return str return bytes instead. Now all you have is:
 - bytes(bytes_object) -> copy constructor
 - bytes(sequence_of_ints) -> make bytes with the values of the ints, error on overflow

Nice and simple.

I could certainly live with that approach, and it certainly rules out all the "when does the encoding argument apply and when should it be an error to pass it" questions. :)
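For illustration, the constructor forms listed above can be tried against the bytes type that Python 3 later shipped. That type is only a partial match for the proposal, since it kept an encoding argument for text input, so treat this as a sketch of the listed behaviors rather than of the design being discussed here.

```python
# Copy constructor: bytes from an existing bytes object.
copied = bytes(b"abc")

# Sequence of ints: each int becomes one byte.
from_ints = bytes([104, 105])

# Overflow: values outside range(256) are rejected.
try:
    bytes([256])
    overflow_error = None
except ValueError as exc:
    overflow_error = exc

# Text without an explicit encoding is refused rather than guessed at.
try:
    bytes("text")
    text_error = None
except TypeError as exc:
    text_error = exc
```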
