False negative on .equals() after read_hdf() · Issue #9330 · pandas-dev/pandas

I get strange results from .equals() when a DataFrame is written to an HDF store and then read back:

import pandas as pd
df = pd.DataFrame({'B':[1,2], 'A':[str('x'), str('y')]})  # str() - just to be sure this is not linked to unicode 
print 'df:'
print df
df.to_hdf('hdf_file', 'key', format='t', mode='w')
df_out = pd.read_hdf('hdf_file', 'key')
print '\ndf_out:'
print df_out
print '\ndf equals df_out:', df.equals(df_out)
print '\ndf_out equals df:', df_out.equals(df)
print '\ndf.shape == df_out.shape:', df.shape == df_out.shape
print '\narray_equivalent(df.values, df_out.values):', pd.core.common.array_equivalent(df.values, df_out.values)
print '\ndf.index equals df_out.index:', df.index.equals(df_out.index)
print '\ndf.columns equals df_out.columns:', df.columns.equals(df_out.columns)
for col in df.columns:
    print '\ndf.{0} equals df_out.{0}: {1}'.format(col, df[col].equals(df_out[col]))

output:

df:
   A  B
0  x  1
1  y  2

df_out:
   A  B
0  x  1
1  y  2

df equals df_out: False

df_out equals df: False

df.shape == df_out.shape: True

array_equivalent(df.values, df_out.values): True

df.index equals df_out.index: True

df.columns equals df_out.columns: True

df.A equals df_out.A: True

df.B equals df_out.B: True

The interesting thing is that if the DataFrame is initialized with the dtypes assigned to the columns the other way round (integers in 'A', strings in 'B'), the results are ALL True (i.e. correct):

df = pd.DataFrame({'A':[1,2], 'B':[str('x'),str('y')]})  # in the code above

will give:

df equals df_out: True
df_out equals df: True
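
Until the cause is understood, a possible stopgap is to compare the frames piece by piece, using only the checks that returned True above (index, column labels, then each column with Series.equals). This is a minimal sketch, assuming unique column labels; frames_equal_by_column is a hypothetical helper name, not part of pandas, and the demo below uses in-memory copies rather than an HDF round trip:

```python
import pandas as pd


def frames_equal_by_column(left, right):
    """Hypothetical helper: compare two DataFrames label by label.

    Sidesteps DataFrame.equals and repeats the per-piece checks from the
    report: index, column labels, then each column via Series.equals.
    Assumes column labels are unique.
    """
    if not left.index.equals(right.index):
        return False
    if not left.columns.equals(right.columns):
        return False
    return all(left[col].equals(right[col]) for col in left.columns)


# Small in-memory demo (no HDF file involved).
df = pd.DataFrame({'B': [1, 2], 'A': ['x', 'y']})
df_copy = df.copy()

df_changed = df.copy()
df_changed.loc[0, 'B'] = 10  # change one value so the frames differ

same = frames_equal_by_column(df, df_copy)
different = frames_equal_by_column(df, df_changed)
```

With the frames from the report, the call would be frames_equal_by_column(df, df_out). Note that this check is stricter about column order than about internal storage: frames with the same columns in a different order compare as not equal.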

I have seen similar issues (#8437 and #7605), which are marked as closed, but given these strange results, this might be something different?

Python 2.7.9, pandas 0.15.2

My apologies in advance if this is a duplicate.