[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Adam Olsen rhamph at gmail.com
Wed Feb 15 06:11:49 CET 2006


On 2/14/06, Guido van Rossum <guido at python.org> wrote:

On 2/13/06, Adam Olsen <rhamph at gmail.com> wrote:
> If I understand correctly, there are three main candidates:
> 1. Direct copying to str in 2.x, pretending it's Latin-1 in unicode in 3.x

I'm not sure what you mean, but I'm guessing you're thinking that the repr() of a bytes object created from bytes('abc\xf0') would be bytes('abc\xf0') under this rule. What's so bad about that?
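A minimal Python 3 sketch of the first candidate, assuming the display form is spelled bytes('...') as in the example above; the helper name repr_option1 is hypothetical. Each byte is decoded as Latin-1, which maps byte values 0 to 255 one-to-one onto code points U+0000 to U+00FF, and ascii() then escapes anything outside printable ASCII:

```python
def repr_option1(data: bytes) -> str:
    """Hypothetical helper: render bytes as a str literal, one Latin-1 character per byte."""
    # Latin-1 maps every byte value 0..255 to the code point with the same
    # number, so this decode never fails and never loses information.
    text = data.decode("latin-1")
    # ascii() escapes every non-ASCII character; code points below 0x100 come out as \xNN.
    return f"bytes({ascii(text)})"


print(repr_option1(b"abc\xf0"))  # bytes('abc\xf0')
```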

See below.

> 2. Direct copying to str/unicode if there are only ASCII values, switching to a list of hex literals if there are any non-ASCII values

That works for me too. But why hex literals? As MvL stated, a list of decimals would be just as useful.

PEBKAC. Yeah, decimals are simpler and shorter even.
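A minimal sketch of the second candidate as amended in this exchange (decimals instead of hex), again assuming a bytes(...) display form; repr_option2 is a hypothetical name:

```python
def repr_option2(data: bytes) -> str:
    """Hypothetical helper: show pure-ASCII data as a string, otherwise as a list of byte values."""
    if all(b < 0x80 for b in data):
        # Every byte is ASCII, so decoding cannot fail.
        return f"bytes({data.decode('ascii')!r})"
    # At least one non-ASCII byte: show the whole value as decimal integers.
    return f"bytes({list(data)!r})"


print(repr_option2(b"abc"))      # bytes('abc')
print(repr_option2(b"abc\xf0"))  # bytes([97, 98, 99, 240])
```

One visible consequence of this design: appending a single non-ASCII byte switches the entire display from string form to list form.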

> 3. b"foo" literal with ASCII for all ASCII characters (other than \ and "), \xFF for individual characters that aren't ASCII
>
> Given the choice, I prefer the third option, with the second option as my runner-up. The first option just screams "silent errors" to me.
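A sketch of the third candidate, assuming b"..." delimiters, uppercase \xFF-style escapes as written in the proposal, and a hypothetical helper named repr_option3. The proposal does not say how ASCII control characters are shown; this sketch escapes them as well (an assumption), so the output stays on one line and is a valid Python bytes literal:

```python
def repr_option3(data: bytes) -> str:
    """Hypothetical helper: render bytes as a b"..." literal with \\xNN escapes."""
    parts = []
    for b in data:
        if b in b'\\"':
            # Backslash and double quote must be escaped inside b"...".
            parts.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:
            # Printable ASCII is shown literally.
            parts.append(chr(b))
        else:
            # Non-ASCII bytes (and, by assumption, ASCII control bytes) become \xNN.
            parts.append(f"\\x{b:02X}")
    return 'b"' + "".join(parts) + '"'


print(repr_option3(b'abc\xf0 "q"'))  # b"abc\xF0 \"q\""
```

For comparison only: the bytes type that eventually shipped in Python 3 uses a similar form for its built-in repr(), e.g. b'abc\xf0' with lowercase hex.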

The 3rd is out of the running for many reasons. I'm not sure I understand your "silent errors" fear; can you elaborate?

I think the risk is that someone will create a unicode object containing real Latin-1 characters and it will get passed through without errors, because the code assumes it's 8-bit data stored as Latin-1. If they had used other Unicode characters, they would have gotten an exception instead.
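The concern can be illustrated in Python 3 with a hypothetical helper, as_8bit, that treats a str as 8-bit data stored as Latin-1. Encoding with Latin-1 accepts any code point up to U+00FF, so genuine Latin-1 text slips through unnoticed, while other characters raise:

```python
def as_8bit(text: str) -> bytes:
    """Hypothetical helper: treat a str as 8-bit data stored as Latin-1."""
    # Succeeds for any code point U+0000..U+00FF, whatever the text really meant.
    return text.encode("latin-1")


# Real Latin-1 text, not binary data: accepted without complaint.
print(as_8bit("caf\u00e9"))  # b'caf\xe9'

# A character outside U+0000..U+00FF is caught immediately.
try:
    as_8bit("\u20ac100")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```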

However, at this point all the posts on Latin-1 encoding/decoding have become so muddled in my mind that I don't know what they're suggesting. I think I'll wait for the PEP to clear that up.

-- Adam Olsen, aka Rhamphoryncus


