[Python-ideas] csv.DictReader could handle headers more intelligently.
Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Jan 29 12:16:02 CET 2013
On 29 January 2013 10:18, Shane Green <shane at umbrellacode.com> wrote:
> So I wasn't really questioning the usefulness of the dictionary representation, but couldn't the returned object also let you access the header and value sequences, etc? I was also thinking the conversion to simple dict with single (non-list) values per column could be part of the API.
>
> Appending duplicate field values as they're read reflects the order the duplicate entries appear in the source (when I've encountered CSV that purposely used duplicate column headers, the sequence in which they appear was critical). The output from the current implementation should reflect the last duplicate value, as that always replaces previous ones in the dict, so my conversions returned the last value (-1), which should do the same... I think. It was a straw man ;-). I see your point about the point. I think it would be good to have an implementation that kept all the information but still put the most usable API on it possible, rather than saying you can't have dictionary access unless you want to lose duplicate values, for example. I mean, I've needed to consume CSV a lot, and that's what would have made the module useful to me, and the implementation that keeps all the information and lets it easily be trimmed as-not-needed seems better than one that just wipes it out to start.
This is exactly what the csv.reader objects do.
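To illustrate, here is a minimal sketch of reading a file with a repeated column header through a plain csv.reader. The sample data and the names sample, headers and rows are made up for the example; nothing here is specific to any real file.

```python
import csv
import io

# Hypothetical sample data with a repeated "score" column.
sample = "name,score,score\nalice,1,2\nbob,3,4\n"

reader = csv.reader(io.StringIO(sample))
headers = next(reader)  # the header row, duplicates included
rows = list(reader)     # each row is a plain list, in source order

print(headers)
print(rows)
```

Every header and every value is still there, in the order it appeared in the source, so the caller is free to decide what to do with the duplicates.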
While it is a problem that csv.DictReader silently discards data when that is very likely an error, there's no need to try and guess how people want to deal with duplicate column headers and invent a new class for it. It's easy enough to write your own wrapper that exactly performs whatever processing you happen to want:
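from collections import defaultdict

def multireader(csvreader):
    # The first row supplies the headers; an empty input has none.
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    # Collect every value under its header, so duplicates are kept in order.
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d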
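As a rough usage sketch, the wrapper can be compared against csv.DictReader on the same input. The wrapper is repeated here so the snippet runs on its own; the sample data and the names sample, dict_rows and multi_rows are invented for the illustration.

```python
import csv
import io
from collections import defaultdict

def multireader(csvreader):
    # Same wrapper as above: one list of values per header.
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d

# Hypothetical sample data with a repeated "score" column.
sample = "name,score,score\nalice,1,2\nbob,3,4\n"

# DictReader keeps only the last value seen for a duplicated header.
dict_rows = [dict(r) for r in csv.DictReader(io.StringIO(sample))]

# multireader keeps every value, in the order the columns appear.
multi_rows = [dict(r) for r in multireader(csv.reader(io.StringIO(sample)))]

print(dict_rows)
print(multi_rows)
```

Because multireader is a generator, the ValueError for an empty input is raised only when iteration starts, not when multireader() is called.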
Oscar