API: expected result of concat of SparseArray with Categorical? · Issue #34459 · pandas-dev/pandas
xref #34338
Currently, when a Sparse and a Categorical are concatenated, the "dense" values of both are used: the densified values for sparse, and the non-categorical version of the values for categorical.
For example, Sparse[int] with Categorical[int] results in a plain int series:
In [1]: pd.concat([
   ...:     pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)),
   ...:     pd.Series([3, 4, 5], dtype="category")
   ...: ])
Out[1]:
0    1
1    0
2    2
0    3
1    4
2    5
dtype: int64
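To make the "dense values" idea concrete, here is a minimal sketch that densifies both inputs by hand before concatenating. It assumes `Series.sparse.to_dense()` for the sparse side and a cast to the dtype of the categories for the categorical side; the variable names are only illustrative, and this is not necessarily how `concat` does it internally.

```python
import pandas as pd

sparse = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0))
cat = pd.Series([3, 4, 5], dtype="category")

# Densify the sparse series (drops the Sparse[...] wrapper).
dense_from_sparse = sparse.sparse.to_dense()

# Take the non-categorical version of the values by casting
# to the dtype of the categories (int64 for this data).
dense_from_cat = cat.astype(cat.cat.categories.dtype)

# Both inputs are plain int64 now, so the result is plain int64 too.
manual = pd.concat([dense_from_sparse, dense_from_cat])
print(manual.dtype)
```

Doing the conversion explicitly like this gives a plain integer result regardless of which rule `concat` applies to the Sparse/Categorical combination.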
An alternative would be to preserve the sparseness in the result, which is what happens when concatenating with a plain int series (not categorical):
In [2]: pd.concat([
   ...:     pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)),
   ...:     pd.Series([3, 4, 5], dtype="int64")
   ...: ])
Out[2]:
0    1
1    0
2    2
0    3
1    4
2    5
dtype: Sparse[int64, 0]
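If a sparse result is wanted for the Sparse + Categorical case regardless of what `concat` decides, one option is to cast the result back afterwards. The following is a sketch of that workaround, not a proposal for the default behaviour; it assumes the concatenated values can be cast to the sparse dtype of the first input.

```python
import pandas as pd

sparse = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0))
cat = pd.Series([3, 4, 5], dtype="category")

# Whatever dtype concat picks for this combination,
# cast the result back to the sparse dtype of the first input.
forced_sparse = pd.concat([sparse, cat]).astype(sparse.dtype)
print(forced_sparse.dtype)
```

The cast happens after concatenation, so it does not avoid the intermediate dense result; it only restores the sparse dtype on the output.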
(Alternatively, one could argue that concatenating with a categorical should never give a numerical result, but rather object dtype or similar.)
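For comparison, the object-dtype outcome can be emulated today by casting both inputs to object before concatenating. This is only a sketch of what that alternative would look like to a user; the sparse input is densified first via `Series.sparse.to_dense()` so that the cast does not depend on how `astype(object)` treats sparse data in a given pandas version.

```python
import pandas as pd

sparse = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0))
cat = pd.Series([3, 4, 5], dtype="category")

# Cast both inputs to object so that neither the sparse
# nor the categorical dtype influences the result.
as_object = pd.concat([
    sparse.sparse.to_dense().astype(object),
    cat.astype(object),
])
print(as_object.dtype)
```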