Issue 967934: csv module cannot handle embedded \r

Created on 2004-06-07 04:46 by gnbond, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)

msg21057

Author: Gregory Bond (gnbond)

Date: 2004-06-07 04:46

CSV module cannot handle the case of embedded \r (i.e. carriage return) in a field.

As far as I can see, this is hard-coded into the _csv.c file and cannot be fixed with Dialect changes.
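For readers who want to try the reported case, here is a minimal sketch assuming a current Python 3 interpreter, which is not the 2004-era module the report was filed against. The sample data and the variable names `data` and `rows` are made up for illustration. The input is read through `io.StringIO` with `newline=''` so that no line-ending translation happens before the csv module sees the text.

```python
import csv
import io

# Hypothetical sample record: the second field is quoted and contains
# a bare carriage return (\r) between "a" and "b".
data = 'fld1,"a\rb",fld3\r\n'

# newline='' disables newline translation, so the reader receives the
# embedded \r exactly as it appears in the data.
rows = list(csv.reader(io.StringIO(data, newline='')))

for row in rows:
    # repr() makes the embedded control character visible.
    print(repr(row))
```

Whether the `\r` survives inside the second field is the behavior this issue is about; older interpreters may not behave the same way.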

msg21058

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2004-06-07 05:02

Skip, does this coincide with your planned switchover to universal newlines?

msg21059

Author: Andrew McNamara (andrewmcnamara) * (Python committer)

Date: 2004-06-07 05:32

I suspect this restriction (CR appearing within a quoted field) is a historical accident and can be safely removed.

msg21060

Author: Skip Montanaro (skip.montanaro) * (Python triager)

Date: 2004-06-07 11:25

It certainly intersects with it somehow. ;-) If nothing else, it will serve as a useful test case.

msg21061

Author: Andrew McNamara (andrewmcnamara) * (Python committer)

Date: 2005-01-13 11:34

If you're interested, I've just checked in a change to the CVS head for Python 2.5 that may, at least partially, fix this problem (if you try it, let me know how it goes).

msg21062

Author: David Goodger (goodger) (Python committer)

Date: 2006-04-05 15:35

I just filed a bug (http://www.python.org/sf/1465014) that seems to be related to this. Revision 38290 on Modules/_csv.c includes the addition of this code:

else if (c == '\n' || c == '\r') {
  self->state = EAT_CRNL;
  break;
}

(and similar). This seems to be eating (deleting) control chars, but newlines used to be significant.

Embedded line breaks are allowed, according to RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt). And according to the Wikipedia entry (http://en.wikipedia.org/wiki/Comma-separated_values), "a line break within an element must be preserved."

msg82052

Author: Daniel Diniz (ajaksu2) * (Python triager)

Date: 2009-02-14 13:56

IIUC, I get the correct behavior:

trunk-py$ ./python ~/Desktop/tcsv.py
['fld1', 'fld2', 'fld3 ', 'fld4']
['fld1', 'fld2', 'fld3 \r', 'fld4']

trunk-py$ cat ~/Desktop/tcsv.py
#! /usr/local/bin/python

import csv

d = 'fld1,fld2,"fld3 ",fld4\r\n'
d2 = 'fld1,fld2,"fld3 \r'
d3 = '",fld4\r\n'

r = csv.reader([d, d2, d3], dialect="excel")
for f in r:
    print f
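The script above is Python 2. A possible Python 3 rendering is sketched below, using the same three input strings. The only changes are the `print()` call and collecting the result into a list named `rows`, which is an addition for illustration and not part of the original script.

```python
import csv

# Same inputs as the script above: the second record is split across
# two list items, with a carriage return inside the quoted field.
d = 'fld1,fld2,"fld3 ",fld4\r\n'
d2 = 'fld1,fld2,"fld3 \r'
d3 = '",fld4\r\n'

# csv.reader accepts any iterable of strings, so a list of lines works.
rows = list(csv.reader([d, d2, d3], dialect="excel"))

for row in rows:
    print(row)
```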

msg106189

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2010-05-20 20:55

At some point I added test_roundtrip_quoteed_newlines to the csv unit tests, and it passes both on trunk and py3k. I believe if there was a bug here it has been fixed. I just backported the test to 2.6 in r81382, and it passes there as well. Closing as out of date.

Heh, I just noticed that the method name is misspelled :(
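As a rough illustration of what a round-trip check of this kind can look like, here is a sketch assuming Python 3. It is not the actual test from the csv unit tests; the helper name `roundtrip` and the sample rows are hypothetical.

```python
import csv
import io


def roundtrip(rows):
    """Write rows with csv.writer, then read them back with csv.reader."""
    # newline='' keeps the buffer from translating line endings in
    # either direction, which matters for embedded \r and \n.
    buf = io.StringIO(newline='')
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


# Hypothetical sample rows with embedded \n, \r\n, and \r in fields.
rows = [['a\nb', 'b'], ['c', 'x\r\nd'], ['e\rf', 'g']]
result = roundtrip(rows)

# True if every embedded line break survived the round trip.
print(result == rows)
```

The writer quotes any field containing a line-break character under the default dialect, which is what lets the reader reassemble the field on the way back in.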

History

Date                | User           | Action | Args
2022-04-11 14:56:04 | admin          | set    | github: 40357
2010-05-20 20:55:26 | r.david.murray | set    | status: open -> closed; nosy: + r.david.murray; messages: +; resolution: out of date; stage: test needed -> resolved
2010-05-20 20:27:03 | skip.montanaro | set    | nosy: - skip.montanaro
2009-02-14 13:56:28 | ajaksu2        | set    | versions: + Python 2.6; nosy: + ajaksu2; messages: +; dependencies: + CSV regression in 2.5a1: multi-line cells; components: + Extension Modules; type: behavior; stage: test needed
2004-06-07 04:46:56 | gnbond         | create |