Filtering dataframe with sparse column leads to NAs in sparse column · Issue #27781 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame({"A": pd.SparseArray([0, 0, 0]), 'B': [1,2,3]})

# df1_filtered will have NAs in column A

df1_filtered = df1.loc[df1['B'] != 2]

df2 = pd.DataFrame({"A": pd.SparseArray([0, 1, 0]), 'B': [1,2,3]})

# df2_filtered has no NAs in column A

df2_filtered = df2.loc[df2['B'] != 2]

where df1_filtered will look like

and df2_filtered like

Problem description

Filtering a dataframe whose sparse column is all zeros introduces NAs in that sparse column. The same filter on a sparse column that contains a non-zero value does not.

Expected Output

Both filtered data frames should be the same, since filtering a dataframe that has no missing data should not introduce missing data.
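
The expectation above can be written as a small check: filter a frame with a sparse column and compare the result to the same filter applied to the plain values. This is only a sketch. The helper name `sparse_filter_matches_dense` and its `drop_b` parameter are made up for illustration, and it uses `pd.arrays.SparseArray` rather than the top-level `pd.SparseArray` from the code sample, on the assumption that the top-level alias may not be available in every pandas version.

```python
import numpy as np
import pandas as pd


def sparse_filter_matches_dense(values, drop_b=2):
    """Hypothetical helper: True if filtering keeps the sparse column's values intact.

    Builds a frame like the one in the report (sparse column "A",
    column "B" holding 1..n), drops the rows where B == drop_b, and
    compares the filtered sparse column with the same rows taken
    from the plain input values.
    """
    b = np.arange(1, len(values) + 1)
    df = pd.DataFrame({"A": pd.arrays.SparseArray(values), "B": b})

    keep = b != drop_b
    filtered = df.loc[df["B"] != drop_b]

    expected = np.asarray(values)[keep]
    actual = np.asarray(filtered["A"])  # densify the filtered sparse column

    # NaN never compares equal, so a spurious NA makes this False.
    return bool(np.array_equal(actual, expected))


if __name__ == "__main__":
    print("all-zero column preserved:", sparse_filter_matches_dense([0, 0, 0]))
    print("mixed column preserved:   ", sparse_filter_matches_dense([0, 1, 0]))
```

If the behaviour described in this report is present, the all-zero case would be expected to report `False` while the mixed case reports `True`. With the expected behaviour, both report `True`.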

Output of `pd.show_versions()`


INSTALLED VERSIONS

commit : None

pandas : 0.25.0
numpy : 1.16.2
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2.1
setuptools : 39.1.0
Cython : None
pytest : 4.3.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.2.1
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : 0.2.0
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None