[Python-ideas] csv.DictReader could handle headers more intelligently.

J. Cliff Dyer jcd at sdf.lonestar.org
Wed Jan 23 02:06:08 CET 2013


Idea folks,

I'm working with some poorly-formed CSV files, and I noticed that DictReader always and only pulls headers off of the first row. But many of the files I see have blank lines before the row of headers, sometimes with commas to the appropriate field count, sometimes without. The current implementation's behavior in this case is likely never correct, and certainly always annoying. Given the following file:

---Start File 1---
,,
A,B,C
1,2,3
2,4,6
---End File 1---

csv.DictReader yields the rows:

{'': 'C'}
{'': '3'}
{'': '6'}

And given a file starting with a zero-length line, like the following:

---Start File 2---

A,B,C
1,2,3
2,4,6
---End File 2---

It yields the following:

{None: ['A', 'B', 'C']}
{None: ['1', '2', '3']}
{None: ['2', '4', '6']}
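For anyone who wants to reproduce the two cases without creating files on disk, here is a minimal sketch. It assumes Python 3 and feeds the sample data through io.StringIO; the variable names (file_1, file_2, rows_1, rows_2) are made up for illustration.

```python
import csv
import io

# File 1: the header row is preceded by a row of bare commas.
file_1 = ",,\nA,B,C\n1,2,3\n2,4,6\n"
# File 2: the header row is preceded by a zero-length line.
file_2 = "\nA,B,C\n1,2,3\n2,4,6\n"

# dict(row) normalizes the row type, since some Python 3 versions
# return OrderedDict instances from DictReader.
rows_1 = [dict(row) for row in csv.DictReader(io.StringIO(file_1))]
rows_2 = [dict(row) for row in csv.DictReader(io.StringIO(file_2))]

for row in rows_1:
    print(row)  # every value collapses onto the single '' key
for row in rows_2:
    print(row)  # every value lands under the default restkey, None
```

In the first case the three empty header names collide on the key '', so only the last column survives. In the second the header list is empty, so each whole row is filed under restkey.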

I think that in both cases, the proper response would be to treat the A,B,C line as the header line. The change that makes this work is pretty simple. In the fieldnames getter property, the "if self._fieldnames is None:" conditional becomes "while self._fieldnames is None or not any(self._fieldnames):" As a subclass:

import csv

class DictReader(csv.DictReader):
    @property
    def fieldnames(self):
        # Keep reading until a row with at least one non-empty field turns up.
        while self._fieldnames is None or not any(self._fieldnames):
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                break
        self.line_num = self.reader.line_num
        return self._fieldnames

    # Same as the original setter, just rewritten to associate with the
    # new getter property
    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value
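A quick way to check the getter against the two sample files is sketched below. It assumes Python 3 and repeats the subclass so the snippet runs on its own. The class name BlankSkippingDictReader and the helper read_rows are hypothetical, chosen only to avoid shadowing csv.DictReader.

```python
import csv
import io


class BlankSkippingDictReader(csv.DictReader):
    """Hypothetical name; same getter logic as the subclass above."""

    @property
    def fieldnames(self):
        # Skip rows that are empty or contain only empty strings.
        while self._fieldnames is None or not any(self._fieldnames):
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                break
        self.line_num = self.reader.line_num
        return self._fieldnames

    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value


def read_rows(text):
    """Illustrative helper: parse CSV text into a list of plain dicts."""
    return [dict(row) for row in BlankSkippingDictReader(io.StringIO(text))]


rows_1 = read_rows(",,\nA,B,C\n1,2,3\n2,4,6\n")  # File 1
rows_2 = read_rows("\nA,B,C\n1,2,3\n2,4,6\n")    # File 2
rows_empty = read_rows("")                       # no header at all

print(rows_1)
print(rows_2)
print(rows_empty)
```

With this getter, both sample files should come out with A, B and C as the keys, and a completely empty input should simply yield no rows.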

There might be some issues with existing code that depends on the {None: ['1','2','3']} construction, but I can't imagine a time when programmers would want to see {'': '3'} with the 1 and 2 values getting lost.

Thoughts? Do folks think this is worth adding to the csv library, or should I just keep using my subclass?

Cheers,
Cliff


