BUG: Sparse incorrectly handles fill_value · Issue #12797 · pandas-dev/pandas
Sparse seems to conflate missing values (NaN) with fill_value. Based on the docs, I understand fill_value to be a user-specified value that is omitted from the sparse internal representation. fill_value may differ from the missing value (NaN), so NaN entries should survive when fill_value is something else.
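To make the intended semantics concrete, here is a minimal pure-Python sketch of what I would expect. It is not the pandas implementation: to_sparse_repr and to_dense are hypothetical helpers written only for illustration, and they assume a sparse repr that stores every entry that differs from fill_value, NaN included.

```python
import math


def to_sparse_repr(values, fill_value):
    """Hypothetical helper: keep only the entries that differ from fill_value."""
    def is_fill(v):
        # NaN never compares equal to itself, so check it explicitly
        # when fill_value itself is NaN.
        if isinstance(fill_value, float) and math.isnan(fill_value):
            return isinstance(v, float) and math.isnan(v)
        return v == fill_value

    indices = [i for i, v in enumerate(values) if not is_fill(v)]
    stored = [values[i] for i in indices]
    return indices, stored, len(values)


def to_dense(indices, stored, length, fill_value):
    """Hypothetical helper: rebuild the dense list, filling gaps with fill_value."""
    dense = [fill_value] * length
    for i, v in zip(indices, stored):
        dense[i] = v
    return dense


values = [1.0, math.nan, 0.0, 3.0, math.nan]
indices, stored, length = to_sparse_repr(values, fill_value=0.0)
dense = to_dense(indices, stored, length, fill_value=0.0)

print(indices)  # [0, 1, 3, 4] -> only the 0.0 at position 2 is omitted
print(dense)    # [1.0, nan, 0.0, 3.0, nan] -> NaN is preserved
```

With fill_value=0, only the literal 0 is dropped from storage. The NaN entries are stored like any other value and come back as NaN after densifying.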
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

# Wrong: the 2nd and last elements must be NaN
pd.SparseArray([1, np.nan, 0, 3, np.nan], fill_value=0).to_dense()
# array([ 1., 0., 0., 3., 0.])
# Wrong: the 2nd element (B) must be NaN
orig = pd.Series([1, np.nan, 0, 3, np.nan], index=list('ABCDE'))
sparse = orig.to_sparse(fill_value=0)
sparse.reindex(['A', 'B', 'C'])
# A 1.0
# B 0.0
# C 0.0
# dtype: float64
# BlockIndex
# Block locations: array([0], dtype=int32)
# Block lengths: array([1], dtype=int32)
Expected Output
pd.SparseArray([1, np.nan, 0, 3, np.nan], fill_value=0).to_dense()
# array([ 1., nan, 0., 3., nan])
sparse = orig.to_sparse(fill_value=0)
sparse.reindex(['A', 'B', 'C'])
# A 1.0
# B NaN
# C 0.0
# dtype: float64
# BlockIndex
# Block locations: array([0], dtype=int32)
# Block lengths: array([2], dtype=int32)
output of pd.show_versions()
Current master.
The fix itself looks straightforward, but it breaks some tests that use dubious comparisons.