BUG: · Issue #34380 · pandas-dev/pandas
When a single column is selected after `groupby`, named aggregation with `pd.NamedAgg` returns incorrect output.
import numpy as np
import pandas as pd
df = pd.DataFrame({'token': [12345.0, 12345.0, 12345.0, 12345.0, 12345.0, 12345.0, 6789.0, 6789.0, 6789.0, 6789.0, 6789.0, 6789.0], 'name': ['abc', 'abc', 'abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], 'ltp': [2.0, 5.0, 3.0, 9.0, 5.0, 16.0, 1.0, 5.0, 3.0, 13.0, 9.0, 20.0], 'change': [np.nan, 1.5, -0.4, 2.0, -0.44444399999999995, 2.2, np.nan, 4.0, -0.4, 3.333333, -0.307692, 1.222222]})
print (df)
token name ltp change
0 12345.0 abc 2.0 NaN
1 12345.0 abc 5.0 1.500000
2 12345.0 abc 3.0 -0.400000
3 12345.0 abc 9.0 2.000000
4 12345.0 abc 5.0 -0.444444
5 12345.0 abc 16.0 2.200000
6 6789.0 xyz 1.0 NaN
7 6789.0 xyz 5.0 4.000000
8 6789.0 xyz 3.0 -0.400000
9 6789.0 xyz 13.0 3.333333
10 6789.0 xyz 9.0 -0.307692
11 6789.0 xyz 20.0 1.222222
df0 = df.groupby('name').agg(
    pos=pd.NamedAgg(column='change', aggfunc=lambda x: x.gt(0).sum()),
    neg=pd.NamedAgg(column='change', aggfunc=lambda x: x.lt(0).sum()),
)
print (df0)
pos neg
name
abc 3.0 2.0
xyz 3.0 2.0
df1 = df.groupby('name')['change'].agg(
    pos=pd.NamedAgg(column='change', aggfunc=lambda x: x.gt(0).sum()),
    neg=pd.NamedAgg(column='change', aggfunc=lambda x: x.lt(0).sum()),
)
print (df1)
pos neg
name
abc 2.0 2.0
xyz 2.0 2.0
df2 = df.groupby('name')['change'].agg(
    pos=lambda x: x.gt(0).sum(),
    neg=lambda x: x.lt(0).sum(),
)
print (df2)
pos neg
name
abc 3.0 2.0
xyz 3.0 2.0
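As an independent cross-check of which of the outputs above is right, the counts can be computed without `agg` at all. This is a minimal sketch that rebuilds only the `name` and `change` columns from the example (with `change` rounded as in the printed frame); the name `expected` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Reduced copy of the example data: only the grouping key and the value column.
df = pd.DataFrame({
    "name": ["abc"] * 6 + ["xyz"] * 6,
    "change": [np.nan, 1.5, -0.4, 2.0, -0.444444, 2.2,
               np.nan, 4.0, -0.4, 3.333333, -0.307692, 1.222222],
})

# NaN compares False in both gt(0) and lt(0), so it is counted in neither column.
expected = pd.DataFrame({
    "pos": df["change"].gt(0).groupby(df["name"]).sum(),
    "neg": df["change"].lt(0).groupby(df["name"]).sum(),
})
print(expected)
```

Each group has three positive and two negative changes, which matches `df0` and `df2` but not `df1`.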
In my opinion, instead of silently returning wrong output, it should raise an error or a warning.
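Until pandas itself rejects this call, a user-side guard can approximate the requested behaviour. The sketch below is an assumption, not pandas API: `checked_series_agg` is a hypothetical helper that refuses `pd.NamedAgg` values before delegating to `SeriesGroupBy.agg`, which expects plain `name=function` keywords once a single column has been selected.

```python
import numpy as np
import pandas as pd


def checked_series_agg(grouped, **named_funcs):
    """Hypothetical helper (not part of pandas).

    Rejects pd.NamedAgg values, then delegates to grouped.agg(**named_funcs).
    """
    for out_name, func in named_funcs.items():
        if isinstance(func, pd.NamedAgg):
            raise TypeError(
                f"{out_name!r}: pd.NamedAgg is not valid after selecting a "
                "single column; pass the function itself"
            )
    return grouped.agg(**named_funcs)


# Reduced copy of the example data from the report.
df = pd.DataFrame({
    "name": ["abc"] * 6 + ["xyz"] * 6,
    "change": [np.nan, 1.5, -0.4, 2.0, -0.444444, 2.2,
               np.nan, 4.0, -0.4, 3.333333, -0.307692, 1.222222],
})
grouped = df.groupby("name")["change"]

# Plain functions pass through unchanged.
result = checked_series_agg(
    grouped,
    pos=lambda x: x.gt(0).sum(),
    neg=lambda x: x.lt(0).sum(),
)
print(result)

# A NamedAgg value is refused instead of producing a silently wrong frame.
try:
    checked_series_agg(
        grouped,
        pos=pd.NamedAgg(column="change", aggfunc=lambda x: x.gt(0).sum()),
    )
except TypeError as exc:
    print("rejected:", exc)
```

The check is deliberately narrow: it only looks at the keyword values and leaves everything else to pandas.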
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0