SparseDataFrame.fillna() doesn't fill all NaNs
Code Sample, a copy-pastable example if possible
    >>> import numpy as np
    >>> import pandas as pd
    >>> naneye = np.eye(3)
    >>> naneye[1:, 0] = np.nan
    >>> naneye
    array([[  1.,   0.,   0.],
           [ nan,   1.,   0.],
           [ nan,   0.,   1.]])
    >>> import scipy.sparse
    >>> spm = scipy.sparse.csr_matrix(naneye)
    >>> spm
    <3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>
    >>> spm.nnz  # diag + 2 NaN
    5
    >>> sdf = pd.SparseDataFrame(spm, default_fill_value=0)
    >>> sdf
         0    1    2
    0  1.0  0.0  0.0
    1  NaN  1.0  0.0
    2  NaN  0.0  1.0
This is fine, since I want the implied zeros¹ of scipy matrices to remain zeros and have explicit missing data remain missing.
¹ spm.mean(0) == matrix([[ nan, 0.3333, 0.3333]])
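The difference between implied zeros and explicitly stored NaNs can be illustrated without scipy. What follows is a minimal pure-Python sketch, not scipy's implementation: `stored` and `column_means` are hypothetical names made up for this illustration, and every cell missing from `stored` is assumed to be an implied zero.

```python
import math

# Hypothetical toy model of the 3x3 matrix above: only explicitly stored
# entries are kept, keyed by (row, col). Every other cell is an implied zero.
stored = {
    (0, 0): 1.0,
    (1, 0): math.nan,  # explicit missing value
    (2, 0): math.nan,  # explicit missing value
    (1, 1): 1.0,
    (2, 2): 1.0,
}
n_rows, n_cols = 3, 3


def column_means(stored, n_rows, n_cols):
    """Mean of each column, counting unstored cells as 0.0."""
    means = []
    for col in range(n_cols):
        total = sum(stored.get((row, col), 0.0) for row in range(n_rows))
        means.append(total / n_rows)
    return means


nnz = len(stored)  # the diagonal plus the two stored NaNs
means = column_means(stored, n_rows, n_cols)
```

In this model the stored NaNs propagate into the mean of column 0, while the implied zeros in columns 1 and 2 count as ordinary zeros, which matches the footnote.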
However, if I don't specify a fill value:
    >>> sdf = pd.SparseDataFrame(spm)
    >>> sdf
         0    1    2
    0  1.0  NaN  NaN
    1  NaN  1.0  NaN
    2  NaN  NaN  1.0
    >>> sdf.fillna(-1)  # huh??
         0    1    2
    0  1.0 -1.0 -1.0
    1  NaN  1.0 -1.0
    2  NaN -1.0  1.0
    >>> sdf.fillna(-1).fillna(-2)  # at least there's a pattern to it :)
         0    1    2
    0  1.0 -1.0 -1.0
    1 -2.0  1.0 -1.0
    2 -2.0 -1.0  1.0
    >>> sdf.fillna(-1).__class__
    pandas.core.sparse.frame.SparseDataFrame
Problem description
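On sdf.fillna(-1), I'd expect all NaNs to be filled. Instead, only the implicit (unstored) entries are filled; the NaNs that were stored explicitly in the scipy matrix survive and need a second fillna call. I don't know whether it's a bug or a feature, but it certainly is strange.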
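One way to reason about the pattern is a toy model in plain Python. This is only a sketch under an assumption inferred from the output above, not pandas' actual code: `ToySparseColumn` is a hypothetical class, and its `fillna` is assumed to replace only the fill value when that fill value is NaN, and to replace stored NaNs otherwise.

```python
import math


class ToySparseColumn:
    """Hypothetical sparse column: stored values plus one fill value for the rest."""

    def __init__(self, length, stored, fill_value=math.nan):
        self.length = length
        self.stored = dict(stored)    # position -> explicitly stored value
        self.fill_value = fill_value  # value of every unstored position

    def to_list(self):
        return [self.stored.get(i, self.fill_value) for i in range(self.length)]

    def fillna(self, value):
        # Assumed behaviour, inferred from the output in this report: when the
        # fill value is NaN, only the fill value is replaced and stored NaNs
        # are left alone; otherwise the stored NaNs are replaced.
        if math.isnan(self.fill_value):
            return ToySparseColumn(self.length, self.stored, fill_value=value)
        filled = {i: (value if math.isnan(v) else v) for i, v in self.stored.items()}
        return ToySparseColumn(self.length, filled, fill_value=self.fill_value)


# Column 0 of sdf: 1.0 and the two NaNs are all stored explicitly.
col0 = ToySparseColumn(3, {0: 1.0, 1: math.nan, 2: math.nan})
# Column 1 of sdf: only the diagonal 1.0 is stored; the rest is the NaN fill value.
col1 = ToySparseColumn(3, {1: 1.0})

once0 = col0.fillna(-1).to_list()              # stored NaNs survive the first pass
once1 = col1.fillna(-1).to_list()              # implicit NaNs are filled
twice0 = col0.fillna(-1).fillna(-2).to_list()  # second pass fills the stored NaNs
```

Under that assumption the model reproduces both observations: column 0 keeps its NaNs after one call and shows -2 after the second, while column 1 is filled with -1 right away.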
xref: #15533
Output of pd.show_versions()
pandas v0.20.0rc1 844013b