API: expected result of concat of SparseArray with Categorical? · Issue #34459 · pandas-dev/pandas

xref #34338

Currently, when Sparse and Categorical are concatenated, the "dense" values of both are used: the densified values for sparse, and the non-categorical version of the values for categorical.

For example, Sparse[int] with Categorical[int] results in a plain int series:

In [1]: pd.concat([ 
   ...:     pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)), 
   ...:     pd.Series([3, 4, 5], dtype="category") 
   ...: ]) 
Out[1]: 
0    1
1    0
2    2
0    3
1    4
2    5
dtype: int64

An alternative would be to preserve the sparseness in the result, which is what already happens when concatenating with a plain int series (not categorical):

In [2]: pd.concat([ 
   ...:     pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)), 
   ...:     pd.Series([3, 4, 5], dtype="int64") 
   ...: ]) 
Out[2]: 
0    1
1    0
2    2
0    3
1    4
2    5
dtype: Sparse[int64, 0]

(Alternatively, you could argue that concatenating with categorical should never give a numerical result, but rather object dtype or similar.)
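
For comparison, here is a minimal sketch of how each of the three candidate results (dense int, sparse, object) can be obtained explicitly today, without relying on whatever dtype `pd.concat` infers. The variable names are only illustrative, and the sketch assumes the categorical holds integer categories with no missing values. It is not a proposal for how the inference itself should be implemented.

```python
import pandas as pd

sparse = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0))
cat = pd.Series([3, 4, 5], dtype="category")

# The inferred dtype is the subject of this issue and may differ by version.
inferred = pd.concat([sparse, cat])
print(inferred.dtype)

# Option 1: plain dense result. Densify both inputs before concatenating.
as_dense = pd.concat([sparse.sparse.to_dense(), cat.astype("int64")])

# Option 2: sparse result. Convert the categorical to its values, then
# cast the concatenated result explicitly so the dtype does not depend
# on inference.
sparse_dtype = pd.SparseDtype("int64", 0)
as_sparse = pd.concat([sparse, cat.astype("int64")]).astype(sparse_dtype)

# Option 3: object result. Cast both inputs to object up front.
as_object = pd.concat(
    [sparse.sparse.to_dense().astype(object), cat.astype(object)]
)

print(as_dense.dtype, as_sparse.dtype, as_object.dtype)
```

All three results hold the same values; only the dtype differs, which is exactly the choice being discussed here.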