[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Stephen J. Turnbull stephen at xemacs.org
Mon Apr 27 20:04:44 CEST 2009


Antoine Pitrou writes:

> > or (better for 2.x, where bytes are strings as far as most programmers are concerned) as a new data type,

> I'm -1 on any new string-like type (for file paths or whatever else) with custom encoding/decoding semantics. It's the best way to ruin the clean str/bytes separation that 3.x introduced.

Excuse me, but I can't see how a scheme that encodes bytes as Unicode, but only sometimes, counts as a "clean separation". It's a dirty hack that makes life a lot easier for Windows programmers and a little easier for many Unix programmers. Practicality beats purity, true, but at the cost of the purity.
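To make the "only sometimes" concrete, here is a minimal sketch of the behaviour under discussion. It assumes the mechanism as it is exposed in current Python 3, namely the `surrogateescape` error handler; the sample byte strings and variable names are made up for illustration and do not come from the thread.

```python
# Illustration only: sample data and names are hypothetical.
# Assumes Python 3.1+, where PEP 383 is available as the
# "surrogateescape" codec error handler.

raw = b"caf\xe9"                       # Latin-1 bytes, not valid UTF-8
clean = "caf\u00e9".encode("utf-8")    # the same word, validly encoded

decoded_raw = raw.decode("utf-8", errors="surrogateescape")
decoded_clean = clean.decode("utf-8", errors="surrogateescape")

# Valid input decodes to ordinary text, so nothing looks unusual.
assert decoded_clean == "caf\u00e9"

# The undecodable byte 0xE9 is smuggled through as the lone surrogate U+DCE9.
assert decoded_raw == "caf\udce9"

# Encoding with the same handler restores the original bytes exactly.
assert decoded_raw.encode("utf-8", errors="surrogateescape") == raw

# A strict encode rejects the lone surrogate.
try:
    decoded_raw.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

Both results are `str` objects, and only the second carries raw bytes in disguise; nothing in the type tells the two apart.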

> Besides, the goal is also to make things easier for the programmer. Otherwise, we'll have the same situation as in 2.x where many English-centric programmers produced code that was incapable of dealing with non-ASCII input, because they didn't care about the distinction between str and unicode.

So what you'll get here, AFAICS, is a new situation where many Windows-centric programmers will produce code that's incapable of dealing with non-Unicode input because they don't have to care about the distinction between Unicode and bytes.

That's an improvement, but we can still do better, and not at huge expense to programmers.


