[Python-ideas] csv.DictReader could handle headers more intelligently.
Amaury Forgeot d'Arc amauryfa at gmail.com
Wed Jan 23 18:08:32 CET 2013
Hi,
2013/1/23 J. Cliff Dyer <jcd at sdf.lonestar.org>
> On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
> > I don't think we should start adding support for every malformed type
> > of csv file that exists. It's easy enough to remove the unnecessary
> > lines yourself before passing them to DictReader:
> >
> >     from csv import DictReader
> >
> >     with open('malformed.csv','rb') as csvfile:
> >         csvlines = list(l for l in csvfile if l.strip())
> >         csvreader = DictReader(csvlines)
> >
> > Personally, if I was dealing with this as often as you are, I'd
> > probably make a custom context manager instead. The problem lies in
> > the files themselves, not in csv's response to them.
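For readers on Python 3, where the csv module expects text rather than bytes, the blank-line filter quoted above might be sketched as follows. The helper name `dict_reader_skipping_blank_lines` and the sample data are made up for illustration. Note that dropping lines before parsing is only safe if no quoted field contains an embedded blank line.

```python
import csv
import io


def dict_reader_skipping_blank_lines(lines):
    """Return a DictReader over `lines`, ignoring whitespace-only lines.

    `lines` is any iterable of str, e.g. a file opened in text mode
    with newline=''. (Hypothetical helper, not part of the csv module.)
    """
    return csv.DictReader(line for line in lines if line.strip())


# In-memory stand-in for a file with stray blank lines around the header.
sample = io.StringIO("\n\nname,qty\n\nspam,2\neggs,3\n")
rows = [dict(row) for row in dict_reader_skipping_blank_lines(sample)]
print(rows)
```

With a real file, the same helper would be called inside `with open('malformed.csv', newline='') as csvfile:`.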
> With all due respect, while you make a good point that we don't want to
> start special casing every malformed type of CSV, there is absolutely
> something wrong with DictReader's response to files that have duplicate
> headers. It throws away data silently.
That's how Python dictionaries work, by design: d = {'a': 1, 'a': 2} "silently" discards the first value.
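A small sketch of both behaviours side by side, assuming Python 3 and a made-up three-column sample whose first and last headers collide:

```python
import csv
import io

# A dict literal with a repeated key keeps only the last value.
duplicate_key_dict = {'a': 1, 'a': 2}
print(duplicate_key_dict)  # {'a': 2}

# DictReader builds each row as a mapping from header to value, so a
# repeated header behaves the same way: the later column wins.
sample = io.StringIO("a,b,a\n1,2,3\n")
first_row = dict(next(csv.DictReader(sample)))
print(first_row)  # {'a': '3', 'b': '2'}
```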
> If you (and others on this list) aren't in favor of trying to find the
> right header row (which I can understand: "In the face of ambiguity,
> refuse the temptation to guess."), maybe a better solution would be to
> raise a (suppressible) exception if the headers aren't uniquely named.
> ("Errors should never pass silently. Unless explicitly silenced.")
What about a subclass then:
    import csv

    class CarefulDictReader(csv.DictReader):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Accessing .fieldnames reads the header row if needed.
            fieldnames = self.fieldnames
            if len(fieldnames) != len(set(fieldnames)):
                raise ValueError("Duplicate field names", fieldnames)
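As a rough illustration of how such a subclass could be used, and of how the check could be made suppressible as suggested earlier in the thread, here is one possible variant. The `strict` keyword and the guard for empty input are additions for this sketch; they are not part of csv.DictReader or of the class as posted.

```python
import csv
import io


class CarefulDictReader(csv.DictReader):
    """DictReader that refuses duplicate header names unless strict=False."""

    def __init__(self, *args, strict=True, **kwargs):
        super().__init__(*args, **kwargs)
        # Reading .fieldnames consumes the header row; it is None for
        # empty input, in which case there is nothing to check.
        fieldnames = self.fieldnames
        if (strict and fieldnames is not None
                and len(fieldnames) != len(set(fieldnames))):
            raise ValueError("Duplicate field names", fieldnames)


data = "a,b,a\n1,2,3\n"

# Default behaviour: the duplicate header is reported immediately.
try:
    CarefulDictReader(io.StringIO(data))
except ValueError as exc:
    error_args = exc.args
    print(error_args)

# Explicitly silenced: falls back to DictReader's usual last-column-wins.
lenient_rows = [dict(row) for row in
                CarefulDictReader(io.StringIO(data), strict=False)]
print(lenient_rows)
```

Because the header is read inside the constructor, the error surfaces when the reader is created rather than on the first row.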
--
Amaury Forgeot d'Arc