BUG: concat unwantedly sorts DataFrame column names if they differ · Issue #4588 · pandas-dev/pandas
When concatenating DataFrames with `concat`, the column names get sorted alphanumerically if the DataFrames' columns differ in any way. If the columns are identical across all DataFrames, they are not sorted.
This sort is undocumented and unwanted; the default behavior should certainly be not to sort. EDIT: the standard order, as in SQL, would be the columns of df1 (in their df1 order), followed by the columns that appear only in df2, i.e. excluding those shared with df1 (in their df2 order). Example:
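```python
import numpy as np
from pandas import DataFrame, concat

df4a = DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df4b = DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df5 = DataFrame(columns=['C', 'B', 'E', 'D', 'A'], data=np.random.randn(3, 5))

# Identical columns: the original order is kept
print("Cols unsorted:", concat([df4a, df4b]))
# reported column order: C B D A

# Differing columns: the result's columns come back sorted
print("Cols sorted:", concat([df4a, df5]))
# reported column order: A B C D E
```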
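Until the default changes, the SQL-style order described above can be restored after the fact. The following is a minimal sketch, assuming column labels are unique within each frame; `sql_style_column_order` and `concat_preserving_order` are hypothetical helper names, not part of pandas. It collects the first frame's columns in order, appends each later frame's new columns in their order of appearance, and reindexes the concatenated result to that order, so it works whether or not `concat` sorted the columns.

```python
import numpy as np
import pandas as pd


def sql_style_column_order(frames):
    """Hypothetical helper: first frame's columns in order, then each
    later frame's previously unseen columns in their order of appearance."""
    order = []
    seen = set()
    for df in frames:
        for col in df.columns:
            if col not in seen:
                seen.add(col)
                order.append(col)
    return order


def concat_preserving_order(frames):
    """Hypothetical helper: concatenate, then undo any column sorting by
    reindexing to the SQL-style order (assumes unique labels per frame)."""
    result = pd.concat(frames)
    return result.reindex(columns=sql_style_column_order(frames))


df4a = pd.DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df5 = pd.DataFrame(columns=['C', 'B', 'E', 'D', 'A'], data=np.random.randn(3, 5))
print(list(concat_preserving_order([df4a, df5]).columns))  # ['C', 'B', 'D', 'A', 'E']
```

Reindexing only reorders the existing columns here, because the requested order contains exactly the union of the input columns; rows missing a column (such as `E` for `df4a`) are filled with NaN as usual.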