Issue 1431091: CSV Sniffer fails to report mismatch of column counts
If one line of a CSV file is missing one or more commas, the delimiter detection code of the Sniffer class fails, setting delimiter to an empty string.
This leads to a totally misleading error when using has_header().
This code shows the problem (Python 2.4.2, FC3 and Ubuntu Breezy):
import csv

str1 = "a,b,c,d\r\n1,2,foo bar,dead beef\r\nthis,line,is,ok\r\n"
str2 = "a,b,c,d\r\n1,2,foo bar,dead beef\r\nthis,line is,not\r\n"

s = csv.Sniffer()
d1 = s.sniff(str1)
d2 = s.sniff(str2)

for line in str1.split('\r\n'):
    print line.count(',')
print d1.delimiter
print s.has_header(str1)

for line in str2.split('\r\n'):
    print line.count(',')
print d2.delimiter
print s.has_header(str2)
Problem is still there as of Python 2.4.3.
Trying to read in a file whose lines have a different number of fields, I get:
Traceback (most recent call last):
  File "../myscript.py", line 59, in ?
    main()
  File "../myscript.py", line 30, in main
    reader = csv.reader(fin, dialect)
TypeError: bad argument type for built-in operation
where "dialect" has been sniffed by feeding the first two lines to the Sniffer.
What I expect is to either:
- get different sized rows, with no exception raised
- get a csv.Error instead of the TypeError above
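For illustration, the column-count mismatch can be detected independently of the Sniffer by parsing the sample with an explicitly chosen delimiter and comparing field counts. The sketch below is only a possible workaround, not part of the csv module: find_ragged_rows is a hypothetical helper name, the comma delimiter is an assumption, and the code is written for current Python 3 rather than the 2.4 series reported here.

```python
import csv


def find_ragged_rows(sample, delimiter=","):
    """Hypothetical helper: return (row_number, field_count) for every
    non-empty row whose field count differs from that of the first row."""
    reader = csv.reader(sample.splitlines(), delimiter=delimiter)
    # Number the rows from 1 and skip blank lines (parsed as empty lists).
    rows = [(n, row) for n, row in enumerate(reader, start=1) if row]
    if not rows:
        return []
    expected = len(rows[0][1])  # the first row sets the expected width
    return [(n, len(row)) for n, row in rows if len(row) != expected]


# The two samples from this report.
str1 = "a,b,c,d\r\n1,2,foo bar,dead beef\r\nthis,line,is,ok\r\n"
str2 = "a,b,c,d\r\n1,2,foo bar,dead beef\r\nthis,line is,not\r\n"

print(find_ragged_rows(str1))  # [] (all rows have 4 fields)
print(find_ragged_rows(str2))  # [(3, 3)] (row 3 has only 3 fields)
```

Running such a check before calling sniff() or has_header() would at least let a script report the short row itself instead of failing later with the TypeError shown above.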
Thanks