[Python-Dev] str object going in Py3K

Guido van Rossum guido at python.org
Wed Feb 15 22:37:52 CET 2006


On 2/15/06, Bill Janssen <janssen at parc.com> wrote:

> Well, I probably am, but that's not the reason. Reading has nothing to do with it.

Actually if you read binary data in text mode on Windows you also get corrupt (and often truncated) data, unless you're lucky enough that the binary data contains neither ^Z (EOF) nor CRLF.
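
As a rough illustration of the CRLF half of this, the sketch below uses Python 3's universal-newline handling to mimic what text mode does to binary data on read. The file name and payload are made up for the example, Latin-1 is chosen only because it maps every byte to a character, and the ^Z truncation is not reproduced here because it comes from the Windows C runtime rather than from Python's own I/O layer.

```python
import os
import tempfile

# Hypothetical binary payload that happens to contain a CR LF pair.
payload = b"\x89PNG\r\n\x1a\n\x00\x01"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode returns the bytes untouched.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with universal newlines (newline=None) turns CR LF into LF,
    # which is the same kind of translation Windows text mode applies on read.
    with open(path, "r", encoding="latin-1", newline=None) as f:
        text = f.read()

# Re-encode the text to compare it byte for byte with the original.
round_tripped = text.encode("latin-1")

print(raw == payload)                    # binary read is lossless
print(round_tripped == payload)          # text read lost the CR
print(len(payload), len(round_tripped))  # one byte shorter after translation
```

Once the CR is gone there is no way to tell which LFs were originally CR LF pairs, so the damage cannot be undone after the fact.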

> The default mode (text) corrupts data on write on a certain platform (Windows) by inserting extra bytes in the data stream. This bug particularly exhibits itself when programs developed on Linux or Mac OS X are then run on a Windows platform. I think it's a bug to default to a mode which modifies the data stream. The default mode should be 'binary'; people interested in exploiting the obsolete Windows distinction between "text" and "binary" should have to use a mode switch (I suggest "t") to put a file stream in 'text' mode.

This might have been a possibility in Python 2.x, where binary reads return strings. In Python 3000, binary files will return bytes objects while text files will return strings (which are decoded from the file's bytes using an encoding that's determined when the file is opened, taking into account system and user settings as well as possible overrides passed to open()). I expect that the APIs for reading and writing binary data will be sufficiently different from those for reading and writing text that even staunch Unix programmers won't make the mistake of using the text API for creating binary files.
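
A minimal sketch of that split, assuming the behaviour of open() in Python 3 as it was eventually released (which postdates this message); the file name and sample text are hypothetical:

```python
import os
import tempfile

mixing_error = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Text mode: str in, encoded to bytes with the encoding passed to open().
    # newline="\n" keeps the line ending identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("caf\u00e9\n")

    # Binary mode: reads return a bytes object, with no decoding at all.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode: reads return a str, decoded from the bytes on disk.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Using text data with the binary API fails loudly instead of silently
    # producing a damaged file.
    try:
        with open(path, "wb") as f:
            f.write("text")
    except TypeError as exc:
        mixing_error = exc

print(type(as_bytes).__name__, as_bytes)
print(type(as_text).__name__, repr(as_text))
print(type(mixing_error).__name__)
```

When no encoding is passed, the default is taken from the environment, which is why the sketch names one explicitly.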

I realize that's not the answer you're looking for, but for backwards compatibility we can't change the default on Windows in Python 2.x, so the point is moot until 3.0 or until a new binary file API is added to 2.x.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


