Okay, sure. I guess the starting point of my argument is: DictReader is nice, so why not make a version that supports duplicate columns and easily implements the other behaviors too, whether that's discarding values from duplicate columns so there's a one-to-one mapping, or simply raising an exception as soon as a duplicate column is encountered? Something that handles the superset of legal CSV formats which do, in fact, specify exactly which header name each of their values should be mapped to.

On Jan 29, 2013, at 3:16 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 29 January 2013 10:18, Shane Green <shane@umbrellacode.com> wrote:
So I wasn't really questioning the usefulness of the dictionary
representation, but couldn't the returned object also let you access the
header and value sequences, etc.? I was also thinking the conversion to a
simple dict with single (non-list) values per column could be part of the
API.
Appending duplicate field values as they're read reflects the order the
duplicate entries appear in the source (when I've encountered CSV that
purposely used duplicate column headers, the sequence they appeared in was
critical). The output from the current implementation should reflect the
last duplicate value, since that always replaces previous ones in the dict,
so my conversions returned the last value ([-1]), which should do the same,
I think. It was a straw man ;-).
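The last-value ([-1]) conversion described above can be sketched like this; the sample data and column names are invented for illustration:

```python
import csv
import io
from collections import defaultdict

# Illustrative CSV with a duplicated "qty" column.
data = "name,qty,qty\nwidget,1,2\ngadget,3,4\n"

reader = csv.reader(io.StringIO(data))
headers = next(reader)

for row in reader:
    multi = defaultdict(list)
    for h, v in zip(headers, row):
        multi[h].append(v)  # keep every value, in source order
    # Collapse to one value per column by taking the last ([-1]),
    # which matches what DictReader's replace-on-duplicate yields.
    single = {h: vs[-1] for h, vs in multi.items()}
    print(single)
```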
I see your point about the point. I think it would be good to have an
implementation that kept all the information but still put the most usable
API possible on it, rather than saying you can't have dictionary access
unless you're willing to lose duplicate values, for example. I mean, I've
needed to consume CSV a lot, and that's what would have made the module
useful to me; an implementation that keeps all the information and lets it
easily be trimmed when it's not needed seems better than one that just
wipes it out from the start.
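One sketch of such a keep-everything-but-trim-on-demand reader; the class name and API here are invented for illustration, not an existing stdlib interface:

```python
import csv
import io
from collections import defaultdict

class MultiDictReader:
    """Illustrative reader that keeps every value for duplicated columns.

    Each row is a dict mapping header -> list of values in source order;
    last() trims that down to a plain one-value-per-column dict.
    """

    def __init__(self, f):
        self._reader = csv.reader(f)
        self.fieldnames = next(self._reader)  # raises StopIteration if empty

    def __iter__(self):
        for row in self._reader:
            d = defaultdict(list)
            for h, v in zip(self.fieldnames, row):
                d[h].append(v)
            yield dict(d)

    @staticmethod
    def last(row):
        # Discard all but the last value per column, mimicking what
        # csv.DictReader's replace-on-duplicate behaviour ends up with.
        return {h: vs[-1] for h, vs in row.items()}

# Illustrative data with a duplicated "tag" column.
rows = list(MultiDictReader(io.StringIO("id,tag,tag\n1,red,blue\n")))
print(rows)                           # [{'id': ['1'], 'tag': ['red', 'blue']}]
print(MultiDictReader.last(rows[0]))  # {'id': '1', 'tag': 'blue'}
```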
This is exactly what the csv.reader objects do.
While it is a problem that csv.DictReader silently discards data when
that is very likely an error, there's no need to try to guess how
people want to deal with duplicate column headers and invent a new
class for it. It's easy enough to write your own wrapper that performs
exactly whatever processing you want:
from collections import defaultdict

def multireader(csvreader):
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d
Oscar
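Oscar's multireader wrapper can be exercised like this (restated here so the snippet runs on its own; the sample data is invented):

```python
import csv
import io
from collections import defaultdict

def multireader(csvreader):
    # Oscar's wrapper: yields one dict per row, mapping each header to
    # the list of its values in source order.
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d

# Illustrative CSV with a duplicated "tag" column.
data = "id,tag,tag\n1,red,blue\n2,green,yellow\n"
rows = [dict(r) for r in multireader(csv.reader(io.StringIO(data)))]
print(rows[0])  # {'id': ['1'], 'tag': ['red', 'blue']}
print(rows[1])  # {'id': ['2'], 'tag': ['green', 'yellow']}
```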