[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Phillip J. Eby pje at telecommunity.com
Tue Feb 14 00:17:07 CET 2006


At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:

> The conversion from Unicode to bytes is different in this respect, since you are converting from a "bigger" type to a "smaller" one. Choosing latin-1 as default for this conversion would give you all 8 bits, instead of just 7 bits that ASCII provides.

I was just pointing out that since byte strings are bytes by definition, simply putting those bytes in a bytes() object doesn't alter the existing encoding. So, using latin-1 when converting a string to bytes actually seems like the One Obvious Way to do it.
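To make the "all 8 bits" point concrete, here is a minimal sketch in present-day Python 3, where str is the Unicode type and so only stands in for the 2.x unicode type discussed here. It illustrates the latin-1 mapping itself, not the bytes() constructor proposed in this thread; the variable names are purely illustrative.

```python
# Python 3 sketch: str is Unicode here, standing in for the 2.x "unicode" type.
all_code_points = "".join(chr(i) for i in range(256))

# latin-1 maps code points 0..255 one-to-one onto byte values 0..255,
# so encoding and decoding again loses nothing.
as_latin1 = all_code_points.encode("latin-1")
round_trip = as_latin1.decode("latin-1")

# ASCII covers only code points 0..127, so the same text cannot be encoded.
try:
    all_code_points.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The one-to-one mapping is what would make latin-1 behave like "no conversion" for text whose code points all fall below 256.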

I'm so accustomed to being wary of encoding issues that the idea doesn't feel right at first - I keep going, "but you can't know what encoding those bytes are". Then I go, "Duh, that's the point." If you convert str->bytes, there's no conversion and no interpretation - neither the str nor the bytes object knows its encoding, and that's okay. So str(bytes_object) (in 2.x) should also just turn it back to a normal bytestring.

In fact, the 'encoding' argument seems useless in the case of str objects, and it seems it should default to latin-1 for unicode objects. The only use I see for having an encoding for a 'str' would be to allow confirming that the input string in fact is valid for that encoding. So, "bytes(some_str,'ascii')" would be an assertion that some_str must be valid ASCII.
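The "encoding as assertion" idea can be sketched as follows. This assumes Python 3, with a bytes object standing in for the 2.x byte string, and checked_bytes is a hypothetical helper invented for illustration; it is not the proposed bytes() signature or any existing API.

```python
def checked_bytes(raw, encoding=None):
    """Hypothetical helper: return the bytes unchanged, optionally
    asserting that they are valid in the given encoding."""
    data = bytes(raw)
    if encoding is not None:
        # Decode only to validate; the result is discarded.
        # Raises UnicodeDecodeError if the bytes are not valid.
        data.decode(encoding)
    return data


plain = checked_bytes(b"abc", "ascii")   # valid ASCII, passes through
opaque = checked_bytes(b"\xff\xfe")      # no encoding given, no check made

try:
    checked_bytes(b"\xff", "ascii")      # 0xFF is not valid ASCII
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

In this sketch the bytes are never transformed; the encoding argument only decides whether the call succeeds.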

>> So, it sounds like making the encoding default to latin-1 would be a
>> reasonably safe approach in both 2.x and 3.x.
>
> Reasonable for bytes(): yes. In general: no.

Right, I was only talking about bytes().

For 3.0, the type formerly known as "str" won't exist, so only the Unicode part will be relevant then.


