[Python-Dev] [python] Re: New lines, carriage returns, and Windows (original) (raw)

Michael Foord fuzzyman at voidspace.org.uk
Sat Sep 29 20:35:53 CEST 2007


Terry Reedy wrote:

"Michael Foord" <fuzzyman at voidspace.org.uk> wrote in message news:46FE6F92.40601 at voidspace.org.uk... | Guido van Rossum wrote:

[snip first part of nice summary of Python i/o model] | > The other translation deals with line endings. Upon input, any of | > \r\n, \r, or \n is translated to a single \n by default (this is nhe [sic] | > "universal newlines" algorithm from Python 2.x). This can be tweaked | > or disabled. Upon output, \n is translated into a platform specific | > string chosen from \r\n, \r, or \n. This can also be disabled or | > overridden. Note that \r, when written, is never treated specially; if | > you want special processing for \r on output, you can write your own | > translation layer. | So the question is, that when a string containing '\r\n' is written to a | file in text mode on a Windows platform, should it be written with the | encoded representation of '\r\n' or '\r\r\n'? I think Guido pretty clearly said that on output, the default behavior is that \r is nothing special. If you want a special case exception, write a special case translator. +1 from me. To propose otherwise is to propose that the default semantic meaning of Python text objects depend on the platform that it might be output-translated for. I believe the point of universal newline support was to get away from this. | Purity would dictate the latter and practicality the former (IMO)... I disagree. Special case exceptions complicate both learnability and code readability and maintainability. Simplicity is practicality. The symmetry of 'platform-line-endings =input> \n =output> plaform-line-endings' is both pure and practical. | However, that would mean that round tripping a string would change it | ('\r\n' would be written as '\r\n' and then read as '\n') Whereas \r\r\n would be read back as \r\n, which is what should happen. Round-trip-ability is practical to me. | - on the other | hand (particularly given that we are treating the data as text and not a | binary blob) I don't see how writing '\r\r\n' would ever actually be | useful in text. There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentially put there by a programmer. If s/he also chooses default \n translation on output, \r<translation of \n> is correct. Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'.

Michael



More information about the Python-Dev mailing list