BUG: DataFrame.diff(axis=1) with mixed (or EA) dtypes by jbrockmendel · Pull Request #32995 · pandas-dev/pandas

I don't think we can just raise NotImplementedError here. I suspect a mix of float and bool columns will be fairly common.

Can we skip the blockwise path for axis=1 and do it columnwise instead? Something like:

In [39]: df = pd.DataFrame({'a': [1,1,2,2],'b': [1, 2, 3, 4.], 'c': [True, False, False, True]})

In [40]: import toolz

In [41]: pd.concat([a - b for a, b in toolz.sliding_window(2, (df.iloc[:, i] for i in range(len(df.columns))))], axis=1)
Out[41]:
     0    1
0  0.0  0.0
1 -1.0  2.0
2 -1.0  3.0
3 -2.0  3.0

(but without toolz, including the all-NA columns, and with the column names fixed). We could perhaps do that only when nblocks > 1, to preserve performance in the homogeneous case.
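A rough sketch of what that columnwise version might look like, using only public pandas calls. The helper name `diff_columnwise` is hypothetical and not part of pandas, and the sketch assumes at least one column and a non-zero `periods`. It follows the sign convention of `DataFrame.diff` (each column minus the one `periods` positions to its left), which is the opposite of the `a - b` in the snippet above.

```python
import numpy as np
import pandas as pd


def diff_columnwise(df: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Hypothetical helper: column-by-column take on df.diff(periods, axis=1)."""
    ncols = df.shape[1]
    pieces = []
    for i in range(ncols):
        j = i - periods  # position of the column to subtract
        if 0 <= j < ncols:
            # Series arithmetic handles the dtype mix (e.g. bool - float -> float)
            pieces.append(df.iloc[:, i] - df.iloc[:, j])
        else:
            # no partner column in range: keep an all-NA placeholder column
            pieces.append(pd.Series(np.nan, index=df.index))
    out = pd.concat(pieces, axis=1)
    out.columns = df.columns  # restore the original column labels
    return out


df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [1, 2, 3, 4.], 'c': [True, False, False, True]})
out = diff_columnwise(df)
```

Here `out` keeps the labels `a`, `b`, `c`, with `a` all-NA because it has no column to its left. Whether to dispatch to something like this only for mixed-dtype frames is a separate decision that this sketch does not attempt.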