Error with pivot_table and categorical data when add dropna args in version 0.23 · Issue #21133 · pandas-dev/pandas (original) (raw)
pd.version '0.23.0'
labels = ["< 1 mes","1 a 11 meses","1 a 5 años","6 a 11 años","12 a 17 años","18 a + años","alli"]
df = pd.DataFrame( {'a': pd.cut(range(0,50), bins = [0,5,10,20,30,40,50,60], labels = labels, right = True), 'b':(' '.join(['One'] * 20) + ' ' +' '.join(['Two'] * 30)).split(), 'c': range(10,60)})
df.pivot_table(index='a', aggfunc = 'count', fill_value = 0
, margins = True, margins_name = 'Total', values = 'c',dropna = False)
output
c
a
< 1 mes 5
1 a 11 meses 5
1 a 5 años 10
6 a 11 años 10
12 a 17 años 10
18 a + años 9
alli 0
Total 50
df.pivot_table(index='a', aggfunc = 'count', fill_value = 0
, margins = True, margins_name = 'Total', values = 'c',dropna = True)
output
c
a
NaN 5
< 1 mes 5
1 a 11 meses 10
1 a 5 años 10
6 a 11 años 10
12 a 17 años 9
18 a + años 0
Total 49
pd.version '0.22.0'
df.pivot_table(index='a', aggfunc = 'count', fill_value = 0
, margins = True, margins_name = 'Total', values = 'c',dropna = False)
c
a
< 1 mes 5
1 a 11 meses 5
1 a 5 años 10
6 a 11 años 10
12 a 17 años 10
18 a + años 9
alli 0
Total 50
df.pivot_table(index='a', aggfunc = 'count', fill_value = 0
, margins = True, margins_name = 'Total', values = 'c',dropna = True)
c
a
< 1 mes 5
1 a 11 meses 5
1 a 5 años 10
6 a 11 años 10
12 a 17 años 10
18 a + años 9
alli 0
Total 49
When entering the argument dropna = False with the function pivot_table the value and order of the categories is correct. But when dropna = True both the order and the value of the categories do not perform as expected, add a NaN category and drop last category. With other dataset w/o NaN values dropna = True causes categories lose coherence with the values.
The same goes for the pd.crosstab () function
INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: Spanish_Argentina.1252
pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.3.1
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None