Issue 967934: csv module cannot handle embedded \r
Created on 2004-06-07 04:46 by gnbond, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (8)
Author: Gregory Bond (gnbond)
Date: 2004-06-07 04:46
CSV module cannot handle the case of embedded \r (i.e. carriage return) in a field.
As far as I can see, this is hard-coded into the _csv.c file and cannot be fixed with Dialect changes.
Author: Raymond Hettinger (rhettinger) *
Date: 2004-06-07 05:02
Skip, does this coincide with your planned switchover to universal newlines?
Author: Andrew McNamara (andrewmcnamara) *
Date: 2004-06-07 05:32
I suspect this restriction (CR appearing within a quoted field) is a historical accident and can be safely removed.
Author: Skip Montanaro (skip.montanaro) *
Date: 2004-06-07 11:25
It certainly intersects with it somehow. ;-) If nothing else, it will serve as a useful test case.
Author: Andrew McNamara (andrewmcnamara) *
Date: 2005-01-13 11:34
If you're interested, I've just checked in a change to the CVS head for Python 2.5 that may, at least partially, fix this problem (if you try it, let me know how it goes).
Author: David Goodger (goodger)
Date: 2006-04-05 15:35
I just filed a bug (http://www.python.org/sf/1465014) that seems to be related to this. Revision 38290 on Modules/_csv.c includes the addition of this code:
    else if (c == '\n' || c == '\r') {
        self->state = EAT_CRNL;
        break;
    }
(and similar). This seems to be eating (deleting) control chars, but newlines used to be significant.
Embedded line breaks are allowed, according to RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt). And according to the Wikipedia entry (http://en.wikipedia.org/wiki/Comma-separated_values), "a line break within an element must be preserved."
Author: Daniel Diniz (ajaksu2) *
Date: 2009-02-14 13:56
IIUC, I get the correct behavior:
trunk-py$ ./python ~/Desktop/tcsv.py
['fld1', 'fld2', 'fld3 ', 'fld4']
['fld1', 'fld2', 'fld3 \r', 'fld4']
trunk-py$ cat ~/Desktop/tcsv.py
#! /usr/local/bin/python
import csv
d = 'fld1,fld2,"fld3 ",fld4\r\n'
d2 = 'fld1,fld2,"fld3 \r'
d3 = '",fld4\r\n'
r = csv.reader([d, d2, d3], dialect="excel")
for f in r:
    print f
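For readers on Python 3, the script above can be sketched roughly as follows. This is a port for illustration only, not part of the original message; the function name parse_split_record is hypothetical, and the inputs are the same three strings, with the second record's quoted field split across two chunks right after the embedded \r.

```python
import csv


def parse_split_record():
    """Feed the reader pre-split chunks, as in the script above (hypothetical helper)."""
    d = 'fld1,fld2,"fld3 ",fld4\r\n'   # complete record, no embedded CR
    d2 = 'fld1,fld2,"fld3 \r'          # quoted field still open, ends in a bare CR
    d3 = '",fld4\r\n'                  # closes the quoted field and ends the record
    # csv.reader accepts any iterable of strings, so a plain list works here.
    return list(csv.reader([d, d2, d3], dialect="excel"))


if __name__ == "__main__":
    for row in parse_split_record():
        print(row)
```

If the reader keeps the carriage return inside the quoted field, the second row's third field should be 'fld3 \r', matching the output shown above.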
Author: R. David Murray (r.david.murray) *
Date: 2010-05-20 20:55
At some point I added test_roundtrip_quoteed_newlines to the csv unit tests, and it passes both on trunk and py3k. I believe if there was a bug here it has been fixed. I just backported the test to 2.6 in r81382, and it passes there as well. Closing as out of date.
Heh, I just noticed that the method name is misspelled :(
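A round-trip check along the lines of the test mentioned above might look like the sketch below on Python 3. It is not the actual unit test from the csv test suite; the helper name roundtrip and the sample rows are assumptions chosen to cover \n, \r\n, and a bare \r inside fields. An in-memory io.StringIO opened with newline="" stands in for a file so that no newline translation happens on write or read.

```python
import csv
import io


def roundtrip(rows):
    """Write rows with csv.writer, then read them back with csv.reader (hypothetical helper)."""
    buf = io.StringIO(newline="")   # newline="" disables newline translation
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


if __name__ == "__main__":
    # Sample data (an assumption): fields with LF, CRLF, and a bare CR.
    sample = [["a\nb", "b"], ["c", "x\r\nd"], ["e", "f\rg"]]
    print(roundtrip(sample) == sample)
```

When working with real files rather than StringIO, the csv documentation recommends opening them with newline="" for the same reason.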
History
Date | User | Action | Args
2022-04-11 14:56:04 | admin | set | github: 40357
2010-05-20 20:55:26 | r.david.murray | set | status: open -> closed; nosy: + r.david.murray; messages: +; resolution: out of date; stage: test needed -> resolved
2010-05-20 20:27:03 | skip.montanaro | set | nosy: - skip.montanaro
2009-02-14 13:56:28 | ajaksu2 | set | versions: + Python 2.6; nosy: + ajaksu2; messages: +; dependencies: + CSV regression in 2.5a1: multi-line cells; components: + Extension Modules; type: behavior; stage: test needed
2004-06-07 04:46:56 | gnbond | create |