[Python-ideas] csv.DictReader could handle headers more intelligently.

Steven D'Aprano steve at pearwood.info
Thu Jan 24 01:26:52 CET 2013


On 24/01/13 06:59, Jerry Hill wrote:

On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

I can't see why there would be duplicate column headers for a valid reason. Someone may have written their CSV export incorrectly, but that's not actually valid.

Sure it is. Since there is no formal spec for .csv files, a file with multiple columns that have the same text in the header is a perfectly valid .csv file. For what it's worth, the informal spec for csv files seems to be "whatever Excel does" and Excel (and every other spreadsheet-oriented program) is happy to let you have duplicated headers too.

+1

I think keeping DictReader as it is now is fine for backward compatibility. Or better, simply have DictReader raise an exception rather than silently eat data. I don't expect that anyone is relying on that behaviour, nor is it behaviour promised by the class.
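To make the "silently eat data" point concrete, here is a minimal sketch of the current behaviour. The sample data and the helper name read_rows are made up for illustration; only csv.DictReader itself is the real API.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with DictReader and return plain dicts (illustrative helper)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# The header "a" appears twice. DictReader builds each row as a dict,
# so the second "a" value overwrites the first without any notice.
rows = read_rows("a,b,a\n1,2,3\n")
print(rows)  # the value "1" from the first "a" column is gone
```

The row comes back with only two keys, and nothing tells the caller that a third column existed.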

But we should add a MultiDictReader that supports the multiple columns with the same name.
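As a rough sketch of what such a reader might look like: the function below is hypothetical (nothing called multi_dict_reader or MultiDictReader exists in the csv module), and it assumes that mapping each header to a list of values is an acceptable representation. It also ignores the restkey/restval handling that DictReader provides for short and long rows.

```python
import csv
import io
from collections import defaultdict


def multi_dict_reader(f, **kwargs):
    """Hypothetical reader: yield one dict per row, mapping each header
    to the list of values found under that header."""
    reader = csv.reader(f, **kwargs)
    try:
        header = next(reader)
    except StopIteration:
        return  # empty input: no header, so no rows
    for row in reader:
        record = defaultdict(list)
        # zip() stops at the shorter sequence, so extra or missing
        # fields are dropped here; a real design would need a policy.
        for name, value in zip(header, row):
            record[name].append(value)
        yield dict(record)


rows = list(multi_dict_reader(io.StringIO("a,b,a\n1,2,3\n")))
print(rows)
```

With this representation both values under the repeated header survive, at the cost of every value being wrapped in a list.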

It would therefore be arguable for the program to give at least a WARNING that it's throwing data away.

I think the library should give the programmer some sort of indication that they are losing data. Personally, I'd prefer an exception which can either be caught or not, depending on whether the program is designed to handle the situation or not.
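The exception approach can already be approximated in user code. The wrapper below is only a sketch: strict_dict_reader is a made-up name, and the choice of ValueError is an assumption rather than anything the csv module defines. It relies on the documented DictReader.fieldnames attribute, which reads the header row on first access.

```python
import csv
import io
from collections import Counter


def strict_dict_reader(f, **kwargs):
    """Hypothetical wrapper: return a DictReader, but raise ValueError
    if the header row contains duplicate column names."""
    reader = csv.DictReader(f, **kwargs)
    # Accessing fieldnames reads the header; it is None for empty input.
    names = reader.fieldnames or []
    dupes = [name for name, count in Counter(names).items() if count > 1]
    if dupes:
        raise ValueError(f"duplicate column headers: {dupes}")
    return reader


try:
    strict_dict_reader(io.StringIO("a,b,a\n1,2,3\n"))
except ValueError as exc:
    print("refused:", exc)
```

A caller that knows how to cope with duplicates can catch the exception; everyone else gets a loud failure instead of missing data.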

However, since Python is mechanising this as a dictionary, and since in Python setting A to 1 then setting A to 3 would throw away the earlier value for A, the import function is working AS EXPECTED in Python.

I'm not sure this behavior merits the all-caps "AS EXPECTED" label. It's not terribly surprising once you sit down and think about it, but it's certainly at least a little unexpected to me that data is being thrown away with no notice. It's unusual for errors to pass silently in Python.

Yes, we should not forget that a CSV file is not a dict. Just because DictReader is implemented with a dict as the storage, doesn't mean that it should behave exactly like a dict in all things. Multiple columns with the same name are legal in CSV, so there should be a reader for that situation.

-- Steven


