Regression: .agg with dictionary return value in 0.20 does not work as intended · Issue #16741 · pandas-dev/pandas
When I run the code below in pandas 0.20.2:
import numpy as np
import pandas as pd
pd.show_versions()
np.random.seed(42)
df = pd.DataFrame({'uid': np.random.randint(0, 5, 20), 'color': np.random.randint(0, 5, 20)})
df.groupby('uid')['color'].agg(lambda x: x.value_counts().to_dict())
I got:
uid
0 <built-in method values of dict object at 0x10...
1 <built-in method values of dict object at 0x10...
2 <built-in method values of dict object at 0x11...
3 <built-in method values of dict object at 0x11...
4 <built-in method values of dict object at 0x11...
Name: color, dtype: object
Running exactly the same code in 0.19.2 gives the expected result:
uid
0 {0: 1}
1 {0: 1, 2: 1, 4: 1}
2 {1: 1, 2: 1, 3: 3}
3 {1: 2, 2: 3}
4 {0: 2, 3: 2, 4: 2}
Name: color, dtype: object
This might be related to the closed issue #8735, where the same problem once occurred with .apply and dictionary return values. In my case, however, .apply works as expected; only .agg has misbehaved since version 0.20.
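For anyone who needs the per-group dictionaries regardless of how .agg treats dict return values in a given version, here is a minimal sketch of one possible workaround. It is not a fix proposed in this issue: it simply iterates over the groups and builds the Series by hand, so the dict returned for each group is stored untouched. The name counts_by_uid is made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'uid': np.random.randint(0, 5, 20),
                   'color': np.random.randint(0, 5, 20)})

# Iterate over the groups explicitly instead of going through .agg,
# so pandas does not post-process the dict returned for each group.
keys, values = [], []
for uid, colors in df.groupby('uid')['color']:
    keys.append(uid)
    values.append(colors.value_counts().to_dict())

# counts_by_uid is a hypothetical name; each element is a plain dict
# mapping a color to the number of times it occurs for that uid.
counts_by_uid = pd.Series(values, index=pd.Index(keys, name='uid'),
                          name='color', dtype=object)
print(counts_by_uid)
```

The explicit loop is slower than a built-in aggregation, but for a Series of Python dicts the result has object dtype anyway, so little is lost by building it manually.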
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
pandas: 0.20.2
pytest: 2.8.5
pip: 9.0.1
setuptools: 32.1.2
Cython: 0.24.1
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None