read_csv clobbers values of columns with duplicate names

xref #10577 (has test for duplicates with empty data)

I don't think this is the intended behavior, although it's always possible I'm doing something wrong. Importing data with the names keyword clobbers the values of columns whose names are duplicated. For example:

from StringIO import StringIO
import pandas as pd

data = """a,1
b,2
c,3"""
names = ['field', 'field']

print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=True)
print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=False)

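Both calls print the same frame, with the first column's values (a, b, c) overwritten by the second column's: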

   field  field
0      1      1
1      2      2
2      3      3
   field  field
0      1      1
1      2      2
2      3      3

However, this produces the correct result:

df = pd.read_csv(StringIO(data), header=None)
df.columns = names
print df

   field  field
0      a      1
1      b      2
2      c      3

Interestingly, it works if the field names are in the header:

data_with_header = "field,field\n" + data
print pd.read_csv(StringIO(data_with_header))

  field  field.1
0     a        1
1     b        2
2     c        3
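
In the meantime, the same kind of suffixing could be applied to the names list by hand before calling read_csv. This is only a minimal sketch: dedupe_names is a hypothetical helper of my own, not a pandas function, and it simply imitates the field, field.1 pattern shown above without claiming to match what the parser does internally.

```python
from collections import Counter


def dedupe_names(names):
    """Suffix repeated names with .1, .2, ... (hypothetical helper, not a pandas API).

    Assumption: no incoming name already looks like 'field.1'; this sketch
    does not guard against that kind of collision.
    """
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if count == 0 else "%s.%d" % (name, count))
        seen[name] += 1
    return result


unique_names = dedupe_names(['field', 'field'])
```

Passing unique_names as names= means read_csv never sees a duplicated label, so this sidesteps the problem rather than fixing it.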

Is this a bug or am I doing something wrong?