SparseDataFrame.fillna() doesn't fill all NaNs

Code Sample, a copy-pastable example if possible

    >>> import numpy as np
    >>> import pandas as pd
    >>> naneye = np.eye(3)
    >>> naneye[1:, 0] = np.nan
    >>> naneye
    array([[  1.,   0.,   0.],
           [ nan,   1.,   0.],
           [ nan,   0.,   1.]])

    >>> import scipy.sparse
    >>> spm = scipy.sparse.csr_matrix(naneye)
    >>> spm
    <3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>

    >>> spm.nnz  # diag + 2 NaN
    5

    >>> sdf = pd.SparseDataFrame(spm, default_fill_value=0)
    >>> sdf
         0    1    2
    0  1.0  0.0  0.0
    1  NaN  1.0  0.0
    2  NaN  0.0  1.0

This is fine, since I want the implied zeros¹ of scipy matrices to remain zeros and have explicit missing data remain missing.

¹ spm.mean(0) == matrix([[ nan, 0.3333, 0.3333]])
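To make the distinction between implied zeros and explicitly stored NaNs concrete, here is a small sketch that inspects the scipy matrix directly. It only rebuilds the `naneye` and `spm` objects from above; the names `stored_nan` and `implicit_zeros` are introduced here for illustration.

```python
import numpy as np
import scipy.sparse

# Same construction as in the report: identity matrix with two NaNs in column 0.
naneye = np.eye(3)
naneye[1:, 0] = np.nan
spm = scipy.sparse.csr_matrix(naneye)

# spm.data holds only the explicitly stored entries (diagonal ones + NaNs).
stored_nan = int(np.isnan(spm.data).sum())

# Everything not stored is an implied zero.
implicit_zeros = spm.shape[0] * spm.shape[1] - spm.nnz

print(stored_nan, implicit_zeros)
```

So the two NaNs are real stored entries, while the four off-diagonal zeros exist only by implication.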

However, if I don't specify a fill value:

    >>> sdf = pd.SparseDataFrame(spm)
    >>> sdf
         0    1    2
    0  1.0  NaN  NaN
    1  NaN  1.0  NaN
    2  NaN  NaN  1.0

    >>> sdf.fillna(-1)  # huh??
         0    1    2
    0  1.0 -1.0 -1.0
    1  NaN  1.0 -1.0
    2  NaN -1.0  1.0

    >>> sdf.fillna(-1).fillna(-2)  # at least there's a pattern to it :)
         0    1    2
    0  1.0 -1.0 -1.0
    1 -2.0  1.0 -1.0
    2 -2.0 -1.0  1.0

    >>> sdf.fillna(-1).__class__
    pandas.core.sparse.frame.SparseDataFrame

Problem description

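On sdf.fillna(-1), I'd expect all NaNs to be filled. Instead, the first call fills only the entries that were implicit in the scipy matrix and leaves the two explicitly stored NaNs in column 0 untouched; a second fillna() call is needed to fill those. I don't know whether it's a bug or a feature, but it certainly is strange.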
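One way to read the pattern is sketched below as a pure-Python toy model. This is a guess that happens to reproduce the output above, not pandas' actual code: `ToySparseColumn` and its `fillna` are hypothetical, and the assumption is that a fill only swaps the column's fill value while that fill value is NaN, and touches stored values only afterwards.

```python
import math


class ToySparseColumn:
    """Hypothetical model: explicitly stored values plus one fill value."""

    def __init__(self, length, stored, fill_value=math.nan):
        self.length = length
        self.stored = dict(stored)      # position -> explicitly stored value
        self.fill_value = fill_value    # value of every position not stored

    def to_list(self):
        return [self.stored.get(i, self.fill_value) for i in range(self.length)]

    def fillna(self, value):
        # ASSUMED behaviour, for illustration only.
        if math.isnan(self.fill_value):
            # Only the fill value changes; stored NaNs survive.
            return ToySparseColumn(self.length, self.stored, value)
        # Fill value is no longer NaN, so now the stored NaNs get replaced.
        filled = {i: (value if math.isnan(v) else v)
                  for i, v in self.stored.items()}
        return ToySparseColumn(self.length, filled, self.fill_value)


# Column 0 of sdf: the diagonal 1.0 and the two NaNs are all stored.
col0 = ToySparseColumn(3, {0: 1.0, 1: math.nan, 2: math.nan})
once = col0.fillna(-1).to_list()
twice = col0.fillna(-1).fillna(-2).to_list()

# Column 1 of sdf: only the diagonal 1.0 is stored.
col1 = ToySparseColumn(3, {1: 1.0})
col1_once = col1.fillna(-1).to_list()
```

Under this assumption, column 0 keeps its NaNs after the first call and gets -2.0 after the second, while column 1 is fully filled with -1.0 on the first call, which matches what sdf shows.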

xref: #15533

Output of pd.show_versions()

pandas v0.20.0rc1 844013b