pivot_table drops columns for aggregate functions that return all None, even if dropna=False · Issue #22159 · pandas-dev/pandas
pivot_table drops columns for aggregate functions that return all None, even if dropna=False. I'm using pandas 0.23.3, and the full output of pd.show_versions() is posted at the end of this comment. #21969 seems to be related.
Code Sample
import pandas as pd
# pd.__version__ is '0.23.3'

# create sample dataframe to be pivoted
df = pd.DataFrame({'fruit': ['apple', 'orange', 'peach', 'apple', 'peach'],
                   'size': [1, 1.5, 1, 2, 1],
                   'taste': [7, 8, 6, 6, 10]})

# create three aggregate functions, one which returns only None
def return_one(x): return 1
def return_sum(x): return sum(x)
def return_none(x): return None

# reshape the dataframe
df2 = pd.pivot_table(df, columns='fruit', aggfunc=[return_sum, return_none, return_one], dropna=False)

# df2 does not include the output of return_none, which is unexpected
#       return_sum              return_one
# fruit      apple orange peach      apple orange peach
# size         3.0    1.5   2.0        1.0    1.0   1.0
# taste       13.0    8.0  16.0        1.0    1.0   1.0
Problem description
The dropna argument of pivot_table is True by default, meaning that the default output of pivot_table will not include "columns whose entries are all NaN". If dropna is set to False, I would expect the resulting dataframe to include columns whose entries are all NaN (or None). This is not the case. As shown in the example, the columns are dropped even if dropna=False.
Expected Output
      return_sum              return_none              return_one
fruit      apple orange peach       apple orange peach      apple orange peach
size         3.0    1.5   2.0        None   None  None        1.0    1.0   1.0
taste       13.0    8.0  16.0        None   None  None        1.0    1.0   1.0
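As a stopgap rather than a fix for the underlying behavior, a layout close to the expected output can probably be recovered by reindexing the pivoted columns against every (function name, fruit) pair. The sketch below is an assumption on my part, not something pandas does for you: the names aggfuncs, expected_columns, missing and restored are made up for illustration, and the restored cells hold NaN rather than None when the columns had been dropped.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'orange', 'peach', 'apple', 'peach'],
                   'size': [1, 1.5, 1, 2, 1],
                   'taste': [7, 8, 6, 6, 10]})

def return_one(x): return 1
def return_sum(x): return sum(x)
def return_none(x): return None

aggfuncs = [return_sum, return_none, return_one]
table = pd.pivot_table(df, columns='fruit', aggfunc=aggfuncs, dropna=False)

# Every (function name, fruit) pair we expect to see as a column.
expected_columns = pd.MultiIndex.from_product(
    [[f.__name__ for f in aggfuncs], sorted(df['fruit'].unique())],
    names=[None, 'fruit'],
)

# Aggregate functions whose columns are absent from the pivot result.
present = set(table.columns.get_level_values(0))
missing = [f.__name__ for f in aggfuncs if f.__name__ not in present]

# reindex keeps the columns that exist and adds absent ones as all-missing.
restored = table.reindex(columns=expected_columns)

print(missing)
print(restored)
```

The list in missing depends on the pandas version in use, so it also works as a quick check for whether a given installation shows the behavior reported here.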
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.3
pytest: 3.3.2
pip: 18.0
setuptools: 39.0.1
Cython: 0.28.4
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.5
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None