[Python-Dev] New lines, carriage returns, and Windows
Nick Maclaren nmm1 at cus.cam.ac.uk
Sat Sep 29 20:48:20 CEST 2007
"Guido van Rossum" <guido at python.org> wrote:
Have you looked at Py3k at all, especially PEP 3116 (new I/O)?
No.
Python does have its own I/O model. There are binary files and text files. For binary files, you write bytes and the semantic model is that of an array of bytes; byte indices are seek positions.
That is the same model as C and Unix. It is text files that we are discussing.
For text files, the contents are considered to be Unicode, encoded as bytes in a binary file. So a text file always has an underlying binary file. Two translations take place, both of which have defaults varying by platform. One translation is encoding Unicode text into bytes upon output, and decoding bytes to Unicode text upon input. This can use any encoding supported by the encodings package.
The character code isn't the issue here, and is almost completely irrelevant.
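For readers unfamiliar with the layering described in PEP 3116, here is a minimal sketch of a text file sitting on top of a binary one. It uses the `io` module as it eventually shipped in Python 3; the variable names are illustrative only, and `newline=""` is passed so that only the encoding step is in play.

```python
import io

# The "underlying binary file": an in-memory byte buffer.
raw = io.BytesIO()

# The text layer encodes Unicode to bytes on output.
# newline="" switches off line-ending translation for this demonstration.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
text.write("caf\u00e9\n")
text.flush()

# What actually landed in the binary file.
encoded = raw.getvalue()

# Decoding the bytes recovers the original text.
decoded = encoded.decode("utf-8")
```

The single non-ASCII character becomes two bytes in the underlying file, which is the only change the encoding layer makes here.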
The other translation deals with line endings. Upon input, any of \r\n, \r, or \n is translated to a single \n by default (this is the "universal newlines" algorithm from Python 2.x). This can be tweaked or disabled. Upon output, \n is translated into a platform-specific string chosen from \r\n, \r, or \n. This can also be disabled or overridden. Note that \r, when written, is never treated specially; if you want special processing for \r on output, you can write your own translation layer.
Grrk. That's the problem. You don't get back what you have written, for a start, which isn't nice. There are other issues, too.
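The round-trip complaint can be illustrated with a short sketch, assuming the `io.TextIOWrapper` interface of Python 3 as released. The helper `round_trip` and its parameter names are hypothetical; the explicit `"\r\n"` output setting stands in for the Windows default so the result does not depend on the platform running it.

```python
import io

def round_trip(data, write_newline, read_newline):
    """Write text through a text layer, then read it back through another.

    Returns (bytes stored in the binary file, text read back).
    """
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding="ascii", newline=write_newline)
    writer.write(data)
    writer.detach()  # flush and release the buffer without closing it
    stored = raw.getvalue()

    reader = io.TextIOWrapper(io.BytesIO(stored), encoding="ascii",
                              newline=read_newline)
    return stored, reader.read()

data = "one\rtwo\n"

# Windows-style output translation, universal newlines on input.
stored_default, back_default = round_trip(data, "\r\n", None)

# Both translations disabled with newline="".
stored_plain, back_plain = round_trip(data, "", "")
```

With translation enabled, the bare `\r` is written untouched but comes back as `\n`, so `back_default` differs from `data`. With `newline=""` on both sides the text survives unchanged, which is the "disabled" option mentioned in the quoted description.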
That's all. There is nothing unimplementable or confusing in these specifications.
Nothing unimplementable, I agree. Nothing confusing? Not in the experience of the users I have dealt with.
Python doesn't care about record I/O on legacy OSes; it does care about variability found in practice between popular OSes.
As a short-term solution, that is fine. But I have seen the wheel turn a couple of times in 40 years, and expect it to continue after I am safely 6' under ....
Note that \r, \n and friends in Python 3000 are either ASCII (in bytes literals) or Unicode (in text literals). Again, no support for legacy systems that don't use ASCII or a superset.
That's not a problem. I don't see that changing in the foreseeable future.
Legacy OSes are called that for a reason.
Well, I remember when the text I/O model that C, Unix and Python use WAS a feature of legacy OSs :-)
Seriously.
Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679