read_csv clobbers values of columns with duplicate names
xref #10577 (has a test for duplicates with empty data)
I don't think this is the correct behavior, although it's always possible I'm doing something wrong. Importing data with the `names` keyword clobbers the values of columns whose name is duplicated: in the example below, both `field` columns end up holding the values of the second column. For example:
```python
from StringIO import StringIO
import pandas as pd

data = """a,1
b,2
c,3"""
names = ['field', 'field']

print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=True)
print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=False)
```

returns:

```
   field  field
0      1      1
1      2      2
2      3      3
   field  field
0      1      1
1      2      2
2      3      3
```
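To make the symptom concrete, here is a toy model in plain Python. It is only my guess at a mechanism, not pandas' actual parser code: if parsed columns are stored in a dict keyed by column name, the second `field` overwrites the first, and rebuilding the frame in `names` order repeats the surviving column, which matches the output above.

```python
# Toy model only: NOT pandas' parser code, just an illustration of how
# keying parsed columns by their name loses data when names repeat.
names = ['field', 'field']
parsed = [['a', 'b', 'c'], ['1', '2', '3']]  # columns as read from the CSV

by_name = {}
for name, values in zip(names, parsed):
    by_name[name] = values  # the second 'field' overwrites the first

# Rebuilding the columns in `names` order now yields the last column twice.
rebuilt = [by_name[name] for name in names]
print(rebuilt)  # [['1', '2', '3'], ['1', '2', '3']]
```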
However, this produces the correct result:
```python
df = pd.read_csv(StringIO(data), header=None)
df.columns = names
print df
```

```
   field  field
0      a      1
1      b      2
2      c      3
```
Interestingly, it works if the field names are in the header:
```python
data_with_header = "field,field\n" + data
print pd.read_csv(StringIO(data_with_header))
```

```
   field  field.1
0      a        1
1      b        2
2      c        3
```
Is this a bug or am I doing something wrong?
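In the meantime, one possible workaround, assuming suffixed names are acceptable, is to make the names unique before passing them to `read_csv`. The helper `dedupe_names` below is hypothetical (it is not part of pandas); it imitates the `field`, `field.1` convention that the header-based read produces, and also avoids colliding with a name like `field.1` that is already in the list.

```python
from collections import defaultdict


def dedupe_names(names):
    """Return a copy of ``names`` where repeated entries get a ``.N`` suffix.

    Hypothetical helper, not a pandas API: it mimics the ``field``,
    ``field.1`` style seen when duplicate names come from the header row.
    """
    counts = defaultdict(int)  # how many suffixes we've tried per base name
    seen = set()               # every name emitted so far
    result = []
    for name in names:
        candidate = name
        # Bump the suffix until the candidate doesn't collide with any name
        # already emitted (e.g. an explicit "field.1" earlier in the list).
        while candidate in seen:
            counts[name] += 1
            candidate = "{}.{}".format(name, counts[name])
        seen.add(candidate)
        result.append(candidate)
    return result
```

For example, `pd.read_csv(StringIO(data), names=dedupe_names(names))` should then give columns `field` and `field.1` holding `a, b, c` and `1, 2, 3`, since no two columns share a name. If the duplicate names themselves are required, the `header=None` approach above still works.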