Filtering dataframe with sparse column leads to NAs in sparse column · Issue #27781 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
df1 = pd.DataFrame({"A": pd.SparseArray([0, 0, 0]), 'B': [1,2,3]})
# df1_filtered will have NAs in column A
df1_filtered = df1.loc[df1['B'] != 2]
df2 = pd.DataFrame({"A": pd.SparseArray([0, 1, 0]), 'B': [1,2,3]})
# df2_filtered has no NAs in column A
df2_filtered = df2.loc[df2['B'] != 2]
where df1_filtered contains NAs in column A, and df2_filtered does not.
Problem description
Filtering a dataframe with an all-zero sparse column can lead to NAs in the sparse column.
Expected Output
Neither filtered data frame should contain NAs in column A, as filtering a dataframe with non-missing data should not lead to missing data.
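The expectation can be written as a small check. This is only a sketch: the helper name `filtered_has_na` is hypothetical, and it assumes `pd.arrays.SparseArray` is available (the report itself used the older top-level `pd.SparseArray` alias). The outcome for the all-zero column depends on the installed pandas version, so the sketch prints the result rather than assuming it.

```python
import pandas as pd

# Assumption: SparseArray is reachable via pd.arrays; the report used pd.SparseArray.
SparseArray = pd.arrays.SparseArray


def filtered_has_na(values):
    """Build the frame from the report, drop the row where B == 2,
    and report whether the sparse column A now contains any NA."""
    df = pd.DataFrame({"A": SparseArray(values), "B": [1, 2, 3]})
    filtered = df.loc[df["B"] != 2]
    return bool(filtered["A"].isna().any())


# All-zero sparse column: the case that shows NAs in the report.
all_zero_has_na = filtered_has_na([0, 0, 0])
# Sparse column with one non-zero value: the case that behaves correctly.
mixed_has_na = filtered_has_na([0, 1, 0])

print("all-zero column has NA after filtering:", all_zero_has_na)
print("mixed column has NA after filtering:", mixed_has_na)
```

On a pandas version affected by this issue, the first line would be expected to report `True`. The expected behaviour is `False` for both.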
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
pandas : 0.25.0
numpy : 1.16.2
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2.1
setuptools : 39.1.0
Cython : None
pytest : 4.3.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.2.1
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : 0.2.0
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None