Pandas inconsistently handles identically named columns in CSV export and merging · Issue #3468 · pandas-dev/pandas
Using pandas 0.10.1
Pandas allows creating a DataFrame with two columns that share the same name. (I disagree that it should be allowed, but it is allowed, so OK.) However, it doesn't handle such columns correctly in several cases.
Pandas ought to either disallow duplicate column names completely or handle them everywhere. It shouldn't handle them in some cases and not in others.
Problem 1: Round-trip to a CSV
Dump the DataFrame to a CSV and then read it back. Even though duplicate column names are supposed to be legal, they do not survive the CSV export/import round trip.
    import pandas

    # Note: df has two identically named columns now
    df = pandas.DataFrame([[1, 2]], columns=['a', 'a'])
    df.to_csv('foo.csv')
    df2 = pandas.DataFrame.from_csv('foo.csv')
    # Note: df2 does NOT have the same columns anymore; the second "a" was renamed "a.1".
    # At the least, you ought to get some warning that dumping to a CSV will
    # lose column names.
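The warning asked for above could be approximated on the user side. The following is only a minimal sketch, not anything pandas provides: `find_duplicate_columns` and `warn_on_duplicate_columns` are hypothetical helper names, and they work on any iterable of column labels rather than on a DataFrame directly.

```python
import warnings
from collections import Counter


def find_duplicate_columns(columns):
    """Return the labels that occur more than once, in first-seen order."""
    counts = Counter(columns)
    return [name for name, n in counts.items() if n > 1]


def warn_on_duplicate_columns(columns):
    """Hypothetical pre-export check: warn if any column label repeats."""
    duplicates = find_duplicate_columns(columns)
    if duplicates:
        warnings.warn(
            f"Duplicate column names {duplicates} may be renamed "
            "when the CSV is read back",
            UserWarning,
        )
    return duplicates


if __name__ == "__main__":
    # Same labels as the DataFrame in the example above.
    print(warn_on_duplicate_columns(["a", "a"]))
```

With a DataFrame, this could presumably be called as `warn_on_duplicate_columns(list(df.columns))` just before `df.to_csv(...)`.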
Problem 2: Merging a DataFrame with duplicate columns does not work
    df = pandas.DataFrame([[1, 2, 3]], columns=['a', 'a', 'b'])
    df.merge(df, how='left', on='b')
    # Result: very misleading error message
Edit: the error message is about reindexing.
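Until the merge handles duplicates, one possible workaround is to make the labels unique before merging. The sketch below is an assumption-laden illustration, not pandas behavior: `make_unique` is a hypothetical helper, and its ".N" suffix scheme is chosen only to resemble the "a.1" renaming seen in the CSV round trip.

```python
def make_unique(names, sep="."):
    """Return a copy of names in which repeats get a numeric suffix.

    Hypothetical helper: ["a", "a", "b"] becomes ["a", "a.1", "b"].
    Suffixed labels are strings, even if the original label was not.
    """
    seen = set()      # labels already emitted
    counts = {}       # last suffix number tried for each original label
    result = []
    for name in names:
        candidate = name
        n = counts.get(name, 0)
        # Keep bumping the suffix until the label collides with nothing.
        while candidate in seen:
            n += 1
            candidate = f"{name}{sep}{n}"
        counts[name] = n
        seen.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(make_unique(["a", "a", "b"]))
```

Assigning `df.columns = make_unique(df.columns)` before calling `df.merge(...)` should give the merge unique labels to work with, at the cost of changing the column names, so this sidesteps the problem rather than fixing it.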
I'd be OK with almost any solution (disallowing duplicate column names, giving a warning when you create them, handling them correctly in the merge and the CSV read, etc.). But the behavior should be consistent, one way or the other.