msg23295 - (view) |
Author: Chris Withers (fresh) |
Date: 2004-11-24 12:00 |
On trying to parse a '\r' terminated csv generated on a Mac, I get a "newline inside string" error from the csv module. Two things sprang to mind having read: http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_csv.c?rev=1.15&view=markup ...for a bit. 1. The Dialect's lineterminator doesn't appear to be used when parsing a CSV. This feels like a bug to me, 'cos I could specify the terminator if Reader_iternext(ReaderObj *self) used it :-S 2. The processing in Reader_iternext(ReaderObj *self) assumes that a '\r' will be followed by '\0' for Macs, '\n' for Windows, and anything else is an error. But: >>> c = open('var\\data\\metadata.csv').read() >>> c[:100] 'BENEFIT,,Subjects relating to all benefits,AB \rBENEFIT,PARTNERDIED,Bereavement Should I be expecting to see a '\0' there? Anyway, the real bug seems to be the reader's ignorance of the lineterminator. However, even if my analysis is off the mark, the problem still exists :-S cheers, Chris |
|
|
msg23296 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2004-11-25 04:23 |
This is a known problem. See the April archives of the csv mailing list: http://manatee.mojam.com/pipermail/csv/2004-April/thread.html Solutions are welcome. I suspect any solution will involve either discarding PyIter_Next altogether or further subdividing what it returns. A couple things to note in the way of workarounds: 1. Reader_iternext() defers to PyIter_Next() to grab the next line, so there's really no opportunity to interject the lineterminator into the operation with the current code. This means reading from StringIO objects that use \r lineterminators will always fail. 2. If you have a real file as input and open it in universal newline mode you will get the correct behavior. |
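
The second workaround can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the sample data and the temporary file are made up, and it uses the Python 3 spelling, where passing newline=None to open() selects universal newlines mode (in the Python 2 of this thread's era the equivalent was the 'rU' file mode).

```python
import csv
import os
import tempfile

# Create a small Mac-style file with bare '\r' line endings.
# The contents are invented sample data, not the reporter's file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\rc,d\r")
    path = tmp.name

try:
    # newline=None enables universal newlines: '\r', '\n' and '\r\n'
    # are all recognised as line endings and translated to '\n'.
    with open(path, newline=None) as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

print(rows)
```

The reader itself never sees the '\r' here; the file object has already split the input into lines before the csv module gets involved.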
|
|
msg23297 - (view) |
Author: Andrew McNamara (andrewmcnamara) *  |
Date: 2005-01-13 04:14 |
The reader expects to be supplied an iterator that returns lines - in this case, the file iterator has not recognised \r as end-of-line and has read the whole file in and yielded that as a "line". If you use universal-newline mode on your source file, you should have more luck. |
|
|
msg23298 - (view) |
Author: Chris Withers (fresh) |
Date: 2005-01-18 11:25 |
I don't think it's fair to close this as a rejection. The documentation implies that you can control what line terminator this module uses, which currently isn't the case. I'm not saying this is a high priority issue, just that it shouldn't be rejected in case some day someone (maybe even me ;-) wants to have a go at fixing it... |
|
|
msg23299 - (view) |
Author: Andrew McNamara (andrewmcnamara) *  |
Date: 2005-01-18 12:11 |
This cannot be fixed with the current interface - the line splitting is being done by the file iterator, and it only supports \r and \n. As I said, you'll get better results with universal newline mode. The parser in Python 2.5 (the CVS HEAD) has been improved somewhat, but it's still not possible to use anything other than \r and \n for end-of-line. The documentation has been updated to reflect this fact. |
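
Since the reader consumes whatever iterable of lines it is handed, one possible way around this limitation is to do the record splitting yourself before the data reaches csv.reader. The sketch below assumes a hypothetical helper, iter_records, and an invented '|' terminator; neither comes from the csv module or from this thread. It splits blindly on the terminator, so it would break a quoted field that happens to contain the terminator.

```python
import csv


def iter_records(text, terminator):
    """Yield the pieces of text between occurrences of terminator.

    Hypothetical helper for illustration only. It does not understand
    CSV quoting, so a terminator inside a quoted field is still
    treated as a record boundary.
    """
    if not terminator:
        raise ValueError("terminator must be a non-empty string")
    start = 0
    while True:
        end = text.find(terminator, start)
        if end == -1:
            break
        yield text[start:end]
        start = end + len(terminator)
    # Emit any trailing record that lacks a final terminator.
    if start < len(text):
        yield text[start:]


# Invented sample data using '|' as the record terminator.
rows = list(csv.reader(iter_records("a,b|c,d|", "|")))
print(rows)
```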
|
|
msg82123 - (view) |
Author: Daniel Diniz (ajaksu2) *  |
Date: 2009-02-14 21:57 |
Needs confirmation, probably a won't fix either way. |
|
|
msg106210 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-05-21 01:33 |
The doc has been fixed; using lineterminator in reader has not been and is not likely to be implemented (unless someone wants to come forward with a patch). Processing files that use \r line endings does work; as indicated, you use universal newline mode for the input file. In Py3k you can wrap a BytesIO object in a TextIOWrapper to get universal newline parsing. So, I'm closing this as won't fix, as suggested. If someone does want to implement lineterminator for reader, they can open a new feature request issue. |
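
A minimal sketch of the Py3k approach mentioned above, assuming UTF-8 encoded input and made-up sample data: an in-memory BytesIO is wrapped in a TextIOWrapper with newline=None, so '\r' line endings are recognised before the lines reach the reader.

```python
import csv
import io

# In-memory bytes with bare '\r' line endings (invented sample data).
raw = io.BytesIO(b"a,b\rc,d\r")

# newline=None turns on universal newlines for the text layer;
# the encoding is an assumption made for this example.
text = io.TextIOWrapper(raw, encoding="utf-8", newline=None)

rows = list(csv.reader(text))
print(rows)
```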
|
|