BUG: · Issue #34380 · pandas-dev/pandas (original) (raw)

After specified column after groupby output is incorrect.

df = pd.DataFrame({'token': [12345.0, 12345.0, 12345.0, 12345.0, 12345.0, 12345.0, 6789.0, 6789.0, 6789.0, 6789.0, 6789.0, 6789.0], 'name': ['abc', 'abc', 'abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], 'ltp': [2.0, 5.0, 3.0, 9.0, 5.0, 16.0, 1.0, 5.0, 3.0, 13.0, 9.0, 20.0], 'change': [np.nan, 1.5, -0.4, 2.0, -0.44444399999999995, 2.2, np.nan, 4.0, -0.4, 3.333333, -0.307692, 1.222222]})
print (df)
      token name   ltp    change
0   12345.0  abc   2.0       NaN
1   12345.0  abc   5.0  1.500000
2   12345.0  abc   3.0 -0.400000
3   12345.0  abc   9.0  2.000000
4   12345.0  abc   5.0 -0.444444
5   12345.0  abc  16.0  2.200000
6    6789.0  xyz   1.0       NaN
7    6789.0  xyz   5.0  4.000000
8    6789.0  xyz   3.0 -0.400000
9    6789.0  xyz  13.0  3.333333
10   6789.0  xyz   9.0 -0.307692
11   6789.0  xyz  20.0  1.222222
df0 = df.groupby('name').agg(pos=pd.NamedAgg(column='change',aggfunc=lambda x: x.gt(0).sum()),\
                            neg = pd.NamedAgg(column='change',aggfunc=lambda x:x.lt(0).sum()))
print (df0)
      pos  neg
name          
abc   3.0  2.0
xyz   3.0  2.0
df1 = df.groupby('name')['change'].agg(pos = pd.NamedAgg(column='change',aggfunc=lambda x:x.gt(0).sum()),\
                                 neg = pd.NamedAgg(column='change',aggfunc=lambda x:x.lt(0).sum()))
    
print (df1)
      pos  neg
name          
abc   2.0  2.0
xyz   2.0  2.0
df2 = df.groupby('name')['change'].agg(pos = lambda x:x.gt(0).sum(),\
                                      neg = lambda x:x.lt(0).sum())
print (df2)
      pos  neg
name          
abc   3.0  2.0
xyz   3.0  2.0

In my opinion instead wrong output it should raise error/warning.

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Windows
OS-release       : 7
machine          : AMD64
processor        : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : en
LOCALE           : None.None

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.2.0.post20200210
Cython           : 0.29.15
pytest           : 5.3.5
hypothesis       : 5.5.4
sphinx           : 2.4.0
blosc            : None
feather          : None
xlsxwriter       : 1.2.7
lxml.etree       : 4.5.0
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.12.0
pandas_datareader: None
bs4              : 4.8.2
bottleneck       : 1.3.2
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.2.7
numba            : 0.48.0
None