Bug in internals alignment for Series.combine_first with Extension dtype · Issue #24147 · pandas-dev/pandas

```python
In [5]: a = pd.Series(pd.Categorical([0, 1, 2], categories=list(range(5))))

In [6]: b = pd.Series(pd.Categorical([2, 3, 4], categories=list(range(5))), index=[2, 3, 4])

In [7]: a.combine_first(b)
Out[7]:
0    0.0
1    1.0
2    2.0
3    NaN
4    NaN
dtype: category
Categories (5, int64): [0, 1, 2, 3, 4]
```

Compare with the expected result (aside from the dtype), using plain integer Series:

```python
In [8]: a = pd.Series([0, 1, 2])

In [9]: b = pd.Series([2, 3, 4], index=[2, 3, 4])

In [10]: a.combine_first(b)
Out[10]:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
```
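For reference, the expected output follows simple semantics: take the union of the labels, keep the caller's non-missing values, and fill the remaining labels from the other object. The sketch below models this in pure Python, assuming a Series is represented as a label-to-value dict and `None` stands in for NaN; `combine_first_model` is a hypothetical helper for illustration, not pandas API.

```python
from typing import Any, Dict, Hashable


def combine_first_model(this: Dict[Hashable, Any],
                        other: Dict[Hashable, Any]) -> Dict[Hashable, Any]:
    """Hypothetical model of Series.combine_first on label -> value dicts.

    None stands in for a missing value (NaN).
    """
    # Union of both label sets, sorted (assumes the labels are comparable).
    labels = sorted(set(this) | set(other))
    result = {}
    for label in labels:
        value = this.get(label)
        # Prefer the caller's value; fall back to `other` only where it is missing.
        result[label] = value if value is not None else other.get(label)
    return result


a = {0: 0, 1: 1, 2: 2}
b = {2: 2, 3: 3, 4: 4}
print(combine_first_model(a, b))  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```

Under this model, labels 3 and 4 are filled from `b`, which matches `Out[10]` and is what the categorical case in `Out[7]` fails to do.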

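Something goes wrong inside `BlockManager.apply` at https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals/managers.py#L386-L387, where the block's placement (`b.mgr_locs`) is used to select the labels that the aligned arguments are reindexed to. For the categorical case, the placement covers only the first label: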

```
(Pdb) pp b.mgr_locs.indexer
slice(0, 1, 1)
(Pdb) pp self.items[b.mgr_locs.indexer]
Int64Index([0], dtype='int64')
```

whereas it should cover all five labels:

```
(Pdb) pp b.mgr_locs.indexer
slice(0, 5, 1)
(Pdb) pp b_items
Int64Index([0, 1, 2, 3, 4], dtype='int64')
```
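The effect of the wrong placement can be illustrated with a simplified pure-Python model. It only mirrors the slicing shown in the pdb session: the manager's labels are sliced with the block's placement to get `b_items`, and an aligned argument is then reindexed onto those labels. The helpers `aligned_labels` and `reindex` are hypothetical stand-ins, not pandas internals, and `None` stands in for NaN.

```python
from typing import Any, Dict, Hashable, List, Sequence


def aligned_labels(items: Sequence[Hashable], placement: slice) -> List[Hashable]:
    """Model of `self.items[b.mgr_locs.indexer]`: the labels a block covers."""
    return list(items[placement])


def reindex(values: Dict[Hashable, Any],
            labels: Sequence[Hashable]) -> Dict[Hashable, Any]:
    """Hypothetical reindex: look up each label, None where it is absent."""
    return {label: values.get(label) for label in labels}


items = [0, 1, 2, 3, 4]                       # union index of the two Series
other = {0: None, 1: None, 2: 2, 3: 3, 4: 4}  # `b` after reindexing to `items`

# Placement seen for the categorical block vs. the one it should have.
buggy = reindex(other, aligned_labels(items, slice(0, 1, 1)))
fixed = reindex(other, aligned_labels(items, slice(0, 5, 1)))
print(buggy)  # {0: None}
print(fixed)  # {0: None, 1: None, 2: 2, 3: 3, 4: 4}
```

In this model the buggy placement keeps only label 0 of the aligned argument, so the values for labels 3 and 4 are lost, which is consistent with the NaN entries in `Out[7]`.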

I'm hitting this in the DatetimeArray refactor.

I suspect that this is a symptom of #23023.