Unexpected behaviour of groupby.transform when using 'fillna' · Issue #30918 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import datetime  # needed for column 'E'; missing from the original snippet

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'baz'],
        'B': [1, 2, np.nan, 3, 3, np.nan, 4],
        'C': [np.nan] * 7,
        'D': [0, 1, 2, 3, 4, 5, 6],
        'E': [np.nan]
        + [datetime.datetime(2020, 1, 1)] * 3
        + [datetime.datetime(2020, 1, 2)] * 2
        + [datetime.datetime(2020, 1, 3)],
        'F': list('abcdefg'),
        'G': list('abc') + [np.nan] + list('efg'),
        'id': range(0, 7),
    }
).set_index('id')

# Pass the method name as a string, with the fill value as a keyword argument
df.groupby('A').transform('fillna', value=9999)
Output
B | C | D | E | F | G |
---|---|---|---|---|---|
9999.0 | 9999.0 | 2 | 2020-01-01 00:00:00 | c | c |
9999.0 | 9999.0 | 2 | 2020-01-01 00:00:00 | c | c |
9999.0 | 9999.0 | 2 | 2020-01-01 00:00:00 | c | c |
9999.0 | 9999.0 | 2 | 2020-01-01 00:00:00 | c | c |
1.0 | 9999.0 | 0 | 9999 | a | a |
1.0 | 9999.0 | 0 | 9999 | a | a |
2.0 | 9999.0 | 1 | 2020-01-01 00:00:00 | b | b |
Problem description
When using GroupBy.transform together with 'fillna', I expected it to work like GroupBy.transform together with lambda x: x.fillna(). Instead, it seems to also change values that are not NaN. Even worse, it seems to shuffle contents between groups.
Is this how it is expected to work?
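To make the "changes values that are not NaN" observation checkable, here is a minimal sketch of a diagnostic. The helper name altered_non_missing is hypothetical (it is not part of pandas), and the frame is a reduced version of the example above with only the numeric columns. It marks every cell that held a value in the original frame but holds a different one in the transformed result.

```python
import numpy as np
import pandas as pd


def altered_non_missing(original: pd.DataFrame, result: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mark cells that were not NaN in `original`
    but hold a different value in `result`."""
    # Restrict the original to the rows and columns present in the result
    aligned = original.loc[result.index, result.columns]
    return aligned.notna() & (aligned != result)


# Reduced version of the frame from the report (numeric columns only)
df = pd.DataFrame(
    {
        'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'baz'],
        'B': [1, 2, np.nan, 3, 3, np.nan, 4],
        'D': [0, 1, 2, 3, 4, 5, 6],
    }
)

# The lambda form, which the report describes as behaving as expected
filled = df.groupby('A').transform(lambda x: x.fillna(9999))
mask = altered_non_missing(df, filled)

# Any True in the mask means a non-missing value was overwritten
print(mask.any().any())
```

The same helper can be pointed at the result of transform('fillna', value=9999) on an affected pandas version to list exactly which cells were overwritten.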
Expected Output
df.groupby('A').transform(lambda x: x.fillna(9999))
B | C | D | E | F | G |
---|---|---|---|---|---|
1.0 | 9999.0 | 0 | 9999 | a | a |
2.0 | 9999.0 | 1 | 2020-01-01 00:00:00 | b | b |
9999.0 | 9999.0 | 2 | 2020-01-01 00:00:00 | c | c |
3.0 | 9999.0 | 3 | 2020-01-01 00:00:00 | d | 9999 |
3.0 | 9999.0 | 4 | 2020-01-02 00:00:00 | e | e |
9999.0 | 9999.0 | 5 | 2020-01-02 00:00:00 | f | f |
4.0 | 9999.0 | 6 | 2020-01-03 00:00:00 | g | g |
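Because the fill value here is a constant, the expected result does not actually depend on the groups. As a possible workaround (an assumption on my side, not something confirmed by the maintainers), a plain DataFrame.fillna on the non-key columns should give the same frame as the lambda-based transform. The sketch below compares the two on a reduced, numeric-only version of the example.

```python
import numpy as np
import pandas as pd

# Reduced version of the frame from the report (numeric columns only)
df = pd.DataFrame(
    {
        'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'baz'],
        'B': [1, 2, np.nan, 3, 3, np.nan, 4],
        'D': [0, 1, 2, 3, 4, 5, 6],
    }
)

# Group-wise fill via a lambda, as in the expected output above
via_transform = df.groupby('A').transform(lambda x: x.fillna(9999))

# Constant fill without groupby: drop the key column, then fill
via_fillna = df.drop(columns='A').fillna(9999)

# Raises AssertionError if the two frames differ (dtype differences ignored)
pd.testing.assert_frame_equal(via_transform, via_fillna, check_dtype=False)
```

This shortcut only applies to constant fill values. A fill that depends on the group, such as a per-group mean, still needs the groupby.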
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : None.None
pandas : 0.25.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None