[Python-3000] New io system and binary data

Charles D Hixson charleshixsn at earthlink.net
Sun Sep 23 19:24:24 CEST 2007


Guido van Rossum wrote:

On 9/19/07, Bill Janssen <janssen at parc.com> wrote:

This really isn't a UTF-8 problem. It is the problem with file opens defaulting to "text" mode instead of "binary" mode rearing its ugly head again.

You can repeat that until you're blue in the face but it's not going to change. Way more programs (especially simple ones) deal with text than with binary data. OTOH, almost all of that text is ASCII. Even if the system mode is set to utf-8, ascii is still ascii.

Still, this won't affect me, much, as I rarely send anything complex via pipes. (I know, I should. It's more secure. But the fact is, I don't. I use files.)

But this is the kind of thing that could make dealing with, say, xpm files a real hassle. (Probably won't, as ascii is still ascii, but it will introduce corner cases.) A lot of the time what I'm really dealing with is bytes rather than characters. I think of them as characters, and try to choose values that display nicely as characters, because that's the way that's been convenient for decades. But they ARE bytes, sometimes signed bytes. And this is going to mean that there are lots of cases where they don't map nicely to something that's trying to understand them as unicode.
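To make the corner cases concrete, here is a rough sketch. It assumes the bytes/str behaviour of later Python 3 releases, which may differ from the io draft under discussion, and the sample byte values are made up for illustration. Pure-ASCII bytes decode the same way under either codec, but a lone high byte is not valid UTF-8:

```python
# Bytes below 0x80 decode identically as ASCII and as UTF-8.
ascii_bytes = b"/* XPM */"
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Illustrative data: a byte at or above 0x80 standing alone is not valid UTF-8.
raw = bytes([0x41, 0xFF, 0x42])
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Latin-1 maps every byte value to exactly one code point, so it round-trips.
round_trip = raw.decode("latin-1").encode("latin-1")
```

Decoding as Latin-1 is one possible workaround when the data has to pass through text-oriented code; it preserves the byte values but says nothing about what they mean.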

So there needs to be an easy and obvious way to deal with files whose records are arrays of byte-valued data... data that is commonly manipulated by an editor using ascii-8.
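One way this might look, as a sketch only: it assumes fixed-length records and uses open() in "rb" mode together with the standard array module as found in later Python 3 releases. The record size, the helper name read_records, and the sample bytes are all hypothetical.

```python
import array
import os
import tempfile

RECORD_SIZE = 4  # hypothetical fixed record length, in bytes


def read_records(path, record_size=RECORD_SIZE):
    """Return each fixed-size record as an array of signed bytes."""
    records = []
    # Binary mode: no decoding and no newline translation.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(record_size)
            if len(chunk) < record_size:
                break  # end of file; a trailing partial record is dropped
            # Typecode "b" interprets each byte as a signed value (-128..127).
            records.append(array.array("b", chunk))
    return records


# Write two made-up records to a temporary file, then read them back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes([0x41, 0xFF, 0x0A, 0x80, 0x01, 0x02, 0x03, 0x04]))
records = read_records(path)
os.remove(path)
```

Here 0xFF and 0x80 come back as -1 and -128, and the 0x0A byte is left alone rather than being treated as a line ending. Switching the typecode to "B" would give unsigned values instead.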


