Issue 1511: csv input converts \r\n to \n but csv output does not when a field has internal line breaks

Created on 2007-11-28 05:15 by fenner, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
issue1511.py (uploaded by gregory.p.smith, 2007-11-28 05:33)
issue1511.csv (uploaded by gregory.p.smith, 2007-11-28 05:36)
issue1511_py3k.py (uploaded by ajaksu2, 2009-05-12 13:23)
Messages (7)
msg57902 - Author: Bill Fenner (fenner) Date: 2007-11-28 05:15
When a field has internal line breaks, e.g.
    foo,"bar
    baz
    biff",boo
that is actually 3 lines, but one CSV row. csv.reader() converts this to ['foo', 'bar\nbaz\nbiff', 'boo'], which is reasonable behavior. Unfortunately, csv.writer() does not use the dialect's lineterminator setting for the line breaks inside such values, so the resulting file has a mix of line-termination styles:
    foo,"bar\n
    baz\n
    biff",boo\r\n
If the CSV implementation reading the file is strict about line termination, these line breaks will not be read properly.
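A minimal Python 3 sketch of the writer behavior described above, using io.StringIO in place of a file (the report itself concerns Python 2.4). The writer emits the field contents unchanged and appends lineterminator only at the end of the row. The normalize_newlines helper is hypothetical and not part of the csv module; it shows one way a caller could make embedded breaks match the dialect's terminator before writing.

```python
import csv
import io

# A row whose middle field contains embedded line breaks.
row = ["foo", "bar\nbaz\nbiff", "boo"]

buf = io.StringIO()
writer = csv.writer(buf)  # default "excel" dialect: lineterminator="\r\n"
writer.writerow(row)
out = buf.getvalue()
print(repr(out))  # embedded "\n" inside the quotes, "\r\n" only at the row end


def normalize_newlines(field: str, terminator: str = "\r\n") -> str:
    """Hypothetical helper: rewrite every line break inside a field to `terminator`."""
    # Collapse CRLF and lone CR to LF first, then expand each LF to the terminator.
    unified = field.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", terminator)


buf2 = io.StringIO()
csv.writer(buf2).writerow([normalize_newlines(f) for f in row])
print(repr(buf2.getvalue()))  # one line-ending style ("\r\n") throughout
```

Normalizing on the caller's side leaves the csv module untouched; the trade-off is that the field's original line-break style is lost.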
msg57903 - Author: Bill Fenner (fenner) Date: 2007-11-28 05:19
I realized that my description was not crystal clear: the file being read has \r\n line terminators. In the notation I used later, the input file is
    foo,"bar\r\n
    baz\r\n
    biff",boo\r\n
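A minimal sketch of parsing that CRLF input with Python 3's csv.reader, assuming the text reaches the reader without newline translation (io.StringIO with newline="" stands in for the file here). Under that assumption the embedded \r\n pairs survive in the parsed field.

```python
import csv
import io

# The input described above: one CSV row spanning three physical lines, all CRLF.
data = 'foo,"bar\r\nbaz\r\nbiff",boo\r\n'

# newline="" disables newline translation, so the reader sees the raw "\r\n" pairs.
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)  # [['foo', 'bar\r\nbaz\r\nbiff', 'boo']]
```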
msg57904 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-11-28 05:33
release25-maint and trunk (2.6) appear to do the correct thing when tested on my Ubuntu Gutsy Linux x86 box. Test script and file attached. The problem is reproducible in a release24-maint build compiled 2007-11-05.
msg57905 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-11-28 05:36
Attaching the test input file. Use od -x or similar to compare the new.csv output with the input .csv file to see whether the problem occurred. It's 2.4, which may be old enough to be considered dead.
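As an alternative to od -x, a small Python sketch that tallies line-ending styles in raw bytes and reports the first differing byte offset. Both helpers, count_line_endings and first_difference, are hypothetical; the byte strings below stand in for the two files, which in practice could be loaded with pathlib.Path(...).read_bytes().

```python
def count_line_endings(data: bytes) -> dict:
    """Hypothetical helper: count CRLF pairs, bare LFs and bare CRs in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF contains exactly one LF and one CR, so subtract those out.
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def first_difference(a: bytes, b: bytes) -> int:
    """Hypothetical helper: offset of the first differing byte, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other (or they are equal).
    return -1 if len(a) == len(b) else min(len(a), len(b))


consistent = b'foo,"bar\r\nbaz\r\nbiff",boo\r\n'  # CRLF everywhere
mixed = b'foo,"bar\nbaz\nbiff",boo\r\n'           # the mixed output from the report

print(count_line_endings(consistent))
print(count_line_endings(mixed))
print(first_difference(consistent, mixed))
```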
msg87624 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-05-12 13:23
I get different behavior in py3k compared to trunk:
    ~/trunk-py$ ./python issue1511_py3k.py
    [['foo', 'bar\r\nbaz\r\nbiff', 'boo']]
    'foo,"bar\r\nbaz\r\nbiff",boo\r\n'
    ~/trunk-py$ ../py3k/python issue1511_py3k.py
    [['foo', 'bar\nbaz\nbiff', 'boo']]
    'foo,"bar\nbaz\nbiff",boo\n'
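The py3k output is consistent with Python 3's text-mode newline translation: with the default newline=None, decoding converts \r\n to \n before the csv module sees the data. A minimal sketch using io.TextIOWrapper over in-memory bytes, which follows the same newline rules as open(); it is an illustration only, not the contents of issue1511_py3k.py.

```python
import io

raw = b'foo,"bar\r\nbaz\r\nbiff",boo\r\n'


def read_text(data: bytes, newline):
    # TextIOWrapper applies the same newline handling as open() in text mode.
    return io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline).read()


translated = read_text(raw, None)  # open()'s default: universal newlines, "\r\n" -> "\n"
preserved = read_text(raw, "")     # newline="": line endings returned untranslated
print(repr(translated))
print(repr(preserved))
```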
msg87631 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-05-12 13:59
Daniel> Daniel Diniz <ajaksu@gmail.com> added the comment:
Daniel> I get different behavior in py3k compared to trunk:
Daniel> ~/trunk-py$ ./python issue1511_py3k.py
Daniel> [['foo', 'bar\r\nbaz\r\nbiff', 'boo']]
Daniel> 'foo,"bar\r\nbaz\r\nbiff",boo\r\n'
Daniel> ~/trunk-py$ ../py3k/python issue1511_py3k.py
Daniel> [['foo', 'bar\nbaz\nbiff', 'boo']]
Daniel> 'foo,"bar\nbaz\nbiff",boo\n'

Try adding newline='' to your open calls. I believe that will preserve the CRLF pairs.

Skip
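A minimal sketch of Skip's suggestion: a round trip through real files with newline="" on both open() calls, done in a temporary directory. The file names are hypothetical. For this input the bytes written match the input exactly, CRLF pairs included.

```python
import csv
import os
import tempfile

raw = b'foo,"bar\r\nbaz\r\nbiff",boo\r\n'

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")  # hypothetical file names
    dst = os.path.join(tmp, "new.csv")
    with open(src, "wb") as f:
        f.write(raw)

    # newline="" on both calls: the csv module, not the io layer, handles line endings.
    with open(src, newline="", encoding="ascii") as f:
        rows = list(csv.reader(f))
    with open(dst, "w", newline="", encoding="ascii") as f:
        csv.writer(f).writerows(rows)

    with open(dst, "rb") as f:
        written = f.read()

print(rows)
print(written == raw)
```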
msg87632 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-05-12 14:08
You're right, sorry about the noise. Closing as out of date.
History
Date User Action Args
2022-04-11 14:56:28 admin set github: 45852
2009-05-12 14:08:00 ajaksu2 set status: open -> closed; resolution: out of date; messages: +; stage: test needed -> resolved
2009-05-12 13:59:33 skip.montanaro set messages: +
2009-05-12 13:23:17 ajaksu2 set files: + issue1511_py3k.py; components: + IO; versions: + Python 2.6, - Python 2.4; nosy: + ajaksu2, pitrou; messages: +; stage: test needed
2008-01-20 19:54:41 christian.heimes set priority: normal
2007-11-28 12:45:53 skip.montanaro set assignee: skip.montanaro; nosy: + skip.montanaro
2007-11-28 05:36:15 gregory.p.smith set files: + issue1511.csv; messages: +
2007-11-28 05:33:03 gregory.p.smith set files: + issue1511.py; nosy: + gregory.p.smith; messages: +
2007-11-28 05:19:39 fenner set messages: +
2007-11-28 05:18:53 fenner set components: + Library (Lib)
2007-11-28 05:15:17 fenner create