Issue 3359: add 'rbU' mode to open()

Created on 2008-07-15 05:21 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (18)
msg69673 - Author: anatoly techtonik (techtonik) Date: 2008-07-15 05:21
'rU' universal newline support is useless, because the lines read end with '\n' regardless of the actual line ending in the source file. Applications that care about line endings still open the file in binary mode and gather the stats manually. So, to make this mode useful, 'rbU' should be added. Otherwise it isn't worth the complication in both the C code and the documentation.
msg69679 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-15 11:47
The whole idea of universal newline mode is that the various possible line endings ('\r', '\n' and '\r\n') are all mapped to '\n' precisely so the user doesn't have to detect and fiddle with them. Using 'b' and 'U' together makes no sense.

* If you really want to see the line endings use 'rb'.
* If you don't care about the line endings regardless of source, use 'rU'.
* Otherwise use 'r'.
msg69709 - Author: anatoly techtonik (techtonik) Date: 2008-07-15 19:05
If you open a file with 'r', all line endings will be mapped precisely to '\n' anyway, so it has nothing to do with 'U' mode.
msg69742 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-16 01:39
> If you open file with 'r' - all line endings will be mapped precisely to
> '\n' anyways, so it has nothing to do with 'U' mode.

No they won't -- only the platform-specific newline will. On Unix, 'r' and 'rb' are the same.
msg69764 - Author: anatoly techtonik (techtonik) Date: 2008-07-16 05:08
That's weird, and the worst part is that it is not documented. The manual says: "If Python is built without universal newline support a mode with 'U' is the same as normal text mode." but gives no information about what "normal text mode" behaviour is. The way Python works, as you describe it, is weird but true. If a developer uses the Windows platform, Unix and Windows files will be handled in the same way, but files from the Mac platform will not. Worse, the developer can't know this, because he is unlikely to have any Mac files to test with. This behavior is like a long-standing landmine set to trip up Windows and Mac Python users. Why not fix it?
msg69845 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-16 22:05
This behavior is inherited from the C-level fopen() and therefore "normal text mode" is whatever that defines. Is this really not documented anywhere?
msg69862 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-17 01:01
anatoly> If you open file with 'r' - all line endings will be mapped
anatoly> precisely to '\n' anyways, so it has nothing to do with 'U'
anatoly> mode.

Before 3.0 at least, if you copy a text file from, say, Windows to Mac, and open it with 'r', you get lines which end in '\r\n'. Here's a simple example:

    >>> open("dos.txt", "rb").read()
    'a single line\r\nanother line\r\n'
    >>> f = open("dos.txt")
    >>> f.next()
    'a single line\r\n'
    >>> f = open("dos.txt", "r")
    >>> f.next()
    'a single line\r\n'
    >>> f.next()
    'another line\r\n'

If, on the other hand, you open it with 'rU', the '\r\n' literal line ending is converted, even though CRLF is not the canonical Mac line ending:

    >>> f = open("dos.txt", "rU")
    >>> f.next()
    'a single line\n'
    >>> f.next()
    'another line\n'

Skip
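For readers on Python 3, where text mode applies universal newlines on reading by default (newline=None), a rough equivalent of the 'rb' versus 'rU' comparison in Skip's session might look like the sketch below. The helper name read_both_ways and the temporary file are illustrative assumptions, not part of the thread.

```python
import os
import tempfile


def read_both_ways(data):
    """Write data to a temporary file, then read it back in binary and text mode.

    Hypothetical helper: returns (raw bytes, list of text-mode lines).
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Binary mode: bytes come back exactly as stored.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode, default newline=None: '\r\n', '\r' and '\n' all become '\n'.
        with open(path, "r", encoding="ascii") as f:
            lines = f.readlines()
    finally:
        os.remove(path)
    return raw, lines


raw, lines = read_both_ways(b"a single line\r\nanother line\r\n")
print(repr(raw))
print(lines)
```

The binary read keeps the CRLF pairs, while the text-mode read hides them, which is the behaviour the thread is arguing about.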
msg69876 - Author: anatoly techtonik (techtonik) Date: 2008-07-17 06:46
> This behavior is inherited from the C-level fopen() and therefore
> "normal text mode" is whatever that defines.
> Is this really nowhere documented?

The relation to the fopen() function may be documented, but there is no explanation of what "normal text mode" is. Is it really pythonic that a script writer without prior experience with C, stdio and fopen should have to be aware of inherited fopen "behavior" when programming in Python?
msg70030 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-19 13:50
At least the 2.6 docs say "The default is to use text mode, which may convert ``'\n'`` characters to a platform-specific representation on writing and back on reading."
msg70068 - Author: anatoly techtonik (techtonik) Date: 2008-07-20 09:09
That's fine with me. I just need an 'rbU' mode to know in which format I should write the output file if I want to preserve the proper line endings regardless of platform. As for the Python 2.6 note, I would replace "may convert" with "converts".
msg70069 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-20 10:03
If you want to write your own line endings, read with "rU" and write with "wb".
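A minimal sketch of that read-universal, write-binary approach, assuming Python 3 (where newline=None plays the role of 'rU'). The function name rewrite_with_eol and its parameters are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def rewrite_with_eol(src, dst, eol=b"\r\n", encoding="utf-8"):
    """Copy src to dst, replacing every line ending with eol (hypothetical helper)."""
    # newline=None turns '\r', '\n' and '\r\n' into '\n' while reading.
    with open(src, "r", encoding=encoding, newline=None) as fin, \
            open(dst, "wb") as fout:
        for line in fin:
            if line.endswith("\n"):
                # Binary output: the chosen ending is written byte for byte.
                fout.write(line[:-1].encode(encoding) + eol)
            else:
                # Final line without a trailing newline stays that way.
                fout.write(line.encode(encoding))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mixed.txt")
    dst = os.path.join(tmp, "crlf.txt")
    with open(src, "wb") as f:
        f.write(b"a\rb\r\nc\nd")
    rewrite_with_eol(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(repr(converted))
```

Note that this normalizes every ending to one style, so it does not cover the case where mixed endings must be kept as found.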
msg70098 - Author: anatoly techtonik (techtonik) Date: 2008-07-21 06:12
If the line endings are mixed, I would like to leave them as they are.
msg70130 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-22 01:10
Did you look at the io.open() function? The io module is new in Python 2.6, and io.open() is also the builtin "open" in py3k!

"""
* On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
"""

I suggest trying io.open(filename, newline="")
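As an illustration of that suggestion, the sketch below copies a file line by line with newline="" on both the reading and the writing side, so mixed line endings should survive unchanged. It assumes Python 3; the helper name copy_preserving_newlines is made up for this example.

```python
import io
import os
import tempfile


def copy_preserving_newlines(src, dst, encoding="utf-8"):
    """Copy src to dst line by line without touching line endings (hypothetical helper)."""
    # newline="" on input: split on '\r', '\n' and '\r\n' but return them untranslated.
    # newline="" on output: write '\n' and '\r' characters exactly as given.
    with io.open(src, "r", encoding=encoding, newline="") as fin, \
            io.open(dst, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            fout.write(line)  # a per-line edit could be applied here


original = b"a\rb\r\nc\nd\n"
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mixed.txt")
    dst = os.path.join(tmp, "copy.txt")
    with open(src, "wb") as f:
        f.write(original)
    copy_preserving_newlines(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(copied == original)
```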
msg70180 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-23 18:14
As I indicated in msg69679, if you want to see the line endings just open the file in binary mode ('rb').
msg70202 - Author: anatoly techtonik (techtonik) Date: 2008-07-24 12:39
Thanks for the hints. It turned out that "universal text mode" is not for cross-platform but for platform-specific programming. =) So I gave up on it and ended up with my own 'rb' newline counter and a 'wb' writer which writes lines in the required format.

As for 2.6 io.open() http://docs.python.org/dev/library/io.html#module-io

- Can anybody point out what the difference is between text mode with newline='' and binary mode?
- The comment about newline= says "If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated." Does it mean that if newline='\r\n' is specified, all single '\n' characters are returned inline?
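The thread does not show the 'rb' newline counter mentioned above. One possible shape for such a counter, written for Python 3 bytes, is sketched here; the name count_line_endings and the regular expression are assumptions, not the author's actual code.

```python
import re
from collections import Counter

# '\r\n' is listed first so a CRLF pair is counted once, not as '\r' plus '\n'.
_EOL = re.compile(br"\r\n|\r|\n")


def count_line_endings(data):
    """Return a Counter of the line endings found in a bytes object (hypothetical helper)."""
    return Counter(match.group() for match in _EOL.finditer(data))


counts = count_line_endings(b"a\rb\r\nc\nd\n")
print(counts)
```

The data would typically come from open(path, "rb").read(), and the most common ending could then be chosen for the output file.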
msg70204 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-24 14:20
> does it mean that if newline='\r\n' is specified all single '\n'
> characters are returned inline?

Yes. Let's take a file with mixed newlines:

    >>> io.open("c:/temp/t", "rb").read()
    'a\rb\r\nc\nd\n'

rb mode splits only on '\r\n' (I'm on Windows)

    >>> io.open("c:/temp/t", "rb").readlines()
    ['a\rb\r\n', 'c\n', 'd\n']

rU mode splits on every newline, and converts everything to \n

    >>> io.open("c:/temp/t", "rU").readlines()
    [u'a\n', u'b\n', u'c\n', u'd\n']

newline='' splits like rU, but does not translate newlines:

    >>> io.open("c:/temp/t", newline='').readlines()
    [u'a\r', u'b\r\n', u'c\n', u'd\n']

newline='\r\n' only splits on the specified string:

    >>> io.open("c:/temp/t", newline='\r\n').readlines()
    [u'a\rb\r\n', u'c\nd\n']
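The same comparison can be tried in Python 3 without touching the filesystem by wrapping an in-memory byte buffer. This is only a sketch: the helper text_lines is a made-up name, and io.TextIOWrapper over io.BytesIO stands in for the file at c:/temp/t.

```python
import io

DATA = b"a\rb\r\nc\nd\n"  # same mixed-newline content as in the session above


def text_lines(newline):
    """Decode DATA as text with the given newline setting and return its lines."""
    stream = io.TextIOWrapper(io.BytesIO(DATA), encoding="ascii", newline=newline)
    return stream.readlines()


binary = io.BytesIO(DATA).readlines()  # bytes, split on b'\n' only
universal = text_lines(None)           # split on all endings, translated to '\n'
untranslated = text_lines("")          # split on all endings, left untouched
crlf_only = text_lines("\r\n")         # split on '\r\n' only, left untouched

for name, value in [("binary", binary), ("universal", universal),
                    ("untranslated", untranslated), ("crlf_only", crlf_only)]:
    print(name, value)
```

This also hints at the difference between newline='' and binary mode: the text stream decodes to str and treats a lone '\r' as a line break, while binary readlines() returns bytes and breaks only after '\n'.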
msg70218 - Author: anatoly techtonik (techtonik) Date: 2008-07-24 17:32
This '\r' makes things worse. I am also on Windows and didn't realize that "rb" handles '\r\n' line endings only as a side effect of '\n' being the last character. Thanks. newline='' is just what I need. I guess there is no alternative to it in the 2.5 series except manually splitting the lines returned from a binary read. What about the file.newlines attribute: is it preserved in 2.6/Py3k? BTW, it would be nice to have this example in the manual.
msg70219 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-24 18:51
Please read http://docs.python.org/dev/library/io.html#io.TextIOBase.newlines
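For a quick look at what that attribute reports, here is a small Python 3 sketch. The helper seen_newlines is hypothetical; it reads an in-memory buffer in universal newlines mode and then returns the stream's newlines attribute, which may be None, a single string, or a tuple of strings depending on what has been encountered so far.

```python
import io


def seen_newlines(data, encoding="ascii"):
    """Return the newlines attribute after reading data in universal newlines mode."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)  # newline=None
    stream.read()  # the attribute only reflects newlines read so far
    return stream.newlines


crlf_only = seen_newlines(b"x\r\ny\r\n")
mixed = seen_newlines(b"a\rb\r\nc\nd\n")
none_seen = seen_newlines(b"no newline here")

print(repr(crlf_only))
print(mixed)
print(none_seen)
```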
History
Date User Action Args
2022-04-11 14:56:36 admin set github: 47609
2008-07-24 18:51:45 amaury.forgeotdarc set messages: +
2008-07-24 17:32:57 techtonik set messages: +
2008-07-24 14:20:18 amaury.forgeotdarc set messages: +
2008-07-24 12:39:16 techtonik set messages: +
2008-07-23 18:14:19 skip.montanaro set messages: +
2008-07-22 01:10:40 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: +
2008-07-21 06:13:00 techtonik set messages: +
2008-07-20 10:03:09 georg.brandl set messages: +
2008-07-20 09:09:51 techtonik set messages: +
2008-07-19 13:50:12 georg.brandl set messages: +
2008-07-17 06:46:41 techtonik set messages: +
2008-07-17 01:01:36 skip.montanaro set messages: +
2008-07-16 22:05:15 georg.brandl set messages: +
2008-07-16 05:08:16 techtonik set messages: +
2008-07-16 01:39:10 georg.brandl set nosy: + georg.brandl; messages: +
2008-07-15 19:05:40 techtonik set messages: +
2008-07-15 11:47:47 skip.montanaro set status: open -> closed; resolution: not a bug; messages: +; nosy: + skip.montanaro
2008-07-15 05:21:48 techtonik create