False negative on .equals() after read_hdf() · Issue #9330 · pandas-dev/pandas
I get unexpected results from .equals() when a DataFrame is written to an HDFStore and then read back:
import pandas as pd
df = pd.DataFrame({'B':[1,2], 'A':[str('x'), str('y')]}) # str() only to rule out a unicode-related cause
print 'df:'
print df
df.to_hdf('hdf_file', 'key', format='t', mode='w')
df_out = pd.read_hdf('hdf_file', 'key')
print '\ndf_out:'
print df_out
print '\ndf equals df_out:', df.equals(df_out)
print '\ndf_out equals df:', df_out.equals(df)
print '\ndf.shape == df_out.shape:', df.shape == df_out.shape
print '\narray_equivalent(df.values, df_out.values):', pd.core.common.array_equivalent(df.values, df_out.values)
print '\ndf.index equals df_out.index:', df.index.equals(df_out.index)
print '\ndf.columns equals df_out.columns:', df.columns.equals(df_out.columns)
for col in df.columns:
    # compare each column individually
    print '\ndf.{0} equals df_out.{0}: {1}'.format(col, df[col].equals(df_out[col]))
output:
df:
A B
0 x 1
1 y 2
df_out:
A B
0 x 1
1 y 2
df equals df_out: False
df_out equals df: False
df.shape == df_out.shape: True
array_equivalent(df.values, df_out.values): True
df.index equals df_out.index: True
df.columns equals df_out.columns: True
df.A equals df_out.A: True
df.B equals df_out.B: True
Interestingly, if the DataFrame is initialized with the dtypes swapped between the two dictionary keys (integers in 'A', strings in 'B'), all the results are True (i.e. correct):
df = pd.DataFrame({'A':[1,2], 'B':[str('x'),str('y')]}) # replaces the constructor in the code above
gives:
df equals df_out: True
df_out equals df: True
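One explanation consistent with these observations (an assumption, not something established above) is that DataFrame.equals() compares the frame's internal per-dtype blocks in the order they are stored, and that order can differ after a round-trip through the HDF store even though every column matches. The sketch below is a hypothetical pure-Python model of that idea: the Block tuple and both functions are illustrative names, not pandas internals or API.

```python
from typing import List, Tuple

# Hypothetical model of a frame's storage: each "block" is
# (dtype name, column positions, values). Not pandas' Block class.
Block = Tuple[str, Tuple[int, ...], Tuple[tuple, ...]]


def equals_in_storage_order(left: List[Block], right: List[Block]) -> bool:
    """Compare blocks pairwise in whatever order they happen to be stored."""
    if len(left) != len(right):
        return False
    return all(a == b for a, b in zip(left, right))


def _block_key(block: Block) -> Tuple[str, Tuple[int, ...]]:
    # Canonical sort key: dtype name, then the column positions the block holds.
    return (block[0], block[1])


def equals_canonical(left: List[Block], right: List[Block]) -> bool:
    """Sort blocks into a canonical order before comparing,
    so storage order no longer affects the result."""
    if len(left) != len(right):
        return False
    pairs = zip(sorted(left, key=_block_key), sorted(right, key=_block_key))
    return all(a == b for a, b in pairs)


# Same data as the report: column 0 is 'A' (strings), column 1 is 'B' (ints).
df_blocks = [
    ("object", (0,), (("x", "y"),)),
    ("int64", (1,), ((1, 2),)),
]
# Identical blocks, stored in the opposite order (as might happen on read-back).
df_out_blocks = [
    ("int64", (1,), ((1, 2),)),
    ("object", (0,), (("x", "y"),)),
]

print(equals_in_storage_order(df_blocks, df_out_blocks))  # False
print(equals_canonical(df_blocks, df_out_blocks))         # True
```

In this model, the storage-order comparison reproduces the false negative while the canonical comparison does not; whether pandas' actual block handling behaves this way is the hypothesis, not a confirmed diagnosis.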
I have seen similar issues (#8437 and #7605), which are marked as closed, but given these results, this might be something different.
python 2.7.9, pandas 0.15.2
My apologies in advance if this is a duplicate.