BUG: odd transform behaviour with integers · Issue #7972 · pandas-dev/pandas
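After grouping on an integer column, transform seems to forget that we have groups. Instead of broadcasting each group's mean back to every row of that group, it returns one row per group, and the remaining rows come out as NaN: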
>>> pd.__version__
'0.14.1-172-gab64d58'
>>> x = np.arange(6, dtype=np.int64)
>>> df = pd.DataFrame({"a": x//2, "b": 2.0*x, "c": 3.0*x})
>>> df
   a   b   c
0  0   0   0
1  0   2   3
2  1   4   6
3  1   6   9
4  2   8  12
5  2  10  15
>>> df.groupby("a").transform("mean")
     b     c
0    1   1.5
1    5   7.5
2    9  13.5
3  NaN   NaN
4  NaN   NaN
5  NaN   NaN
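For comparison, transform is meant to broadcast each group's aggregate back onto every row of that group while keeping the original row labels. The sketch below builds that expected frame by hand: it computes the group means and then reindexes them on the key column. It then checks the result against transform. It assumes a recent pandas release in which this report has been fixed. The names group_means, expected and result are illustrative only.

```python
import numpy as np
import pandas as pd

x = np.arange(6, dtype=np.int64)
df = pd.DataFrame({"a": x // 2, "b": 2.0 * x, "c": 3.0 * x})

# One row per group, indexed by the key values 0, 1, 2.
group_means = df.groupby("a")[["b", "c"]].mean()

# Broadcast back: look up each row's key, then restore the caller's row labels.
expected = group_means.reindex(index=df["a"]).set_axis(df.index, axis=0)

# On a pandas release where this bug is fixed, transform should agree exactly.
result = df.groupby("a")[["b", "c"]].transform("mean")
pd.testing.assert_frame_equal(result, expected)
print(expected)
```

Every row of group 0 should carry b = 1 and c = 1.5, and likewise for groups 1 and 2. None of the rows should be NaN.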
>>> df["a"] = df["a"]*1.0
>>> df.groupby("a").transform("mean")
   b     c
0  1   1.5
1  1   1.5
2  5   7.5
3  5   7.5
4  9  13.5
5  9  13.5
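Converting the key to float gives the right answer, so a natural regression check is that the key dtype should make no difference at all. Here is a minimal sketch, assuming a recent pandas. The helper transform_means is hypothetical and exists only for this comparison.

```python
import numpy as np
import pandas as pd


def transform_means(key_dtype):
    """Hypothetical helper: rebuild the report's frame with the given key dtype."""
    x = np.arange(6, dtype=np.int64)
    df = pd.DataFrame({"a": (x // 2).astype(key_dtype), "b": 2.0 * x, "c": 3.0 * x})
    return df.groupby("a")[["b", "c"]].transform("mean")


int_keys = transform_means(np.int64)
float_keys = transform_means(np.float64)

# Same rows, same labels, same values, whatever the dtype of the key column.
pd.testing.assert_frame_equal(int_keys, float_keys)
print(int_keys)
```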
To make it even more obvious, give the frame an index that does not overlap the group keys:
>>> df.index = range(20, 26)
>>> df.groupby("a").transform("mean")
      b     c
0     1   1.5
1     5   7.5
2     9  13.5
20  NaN   NaN
21  NaN   NaN
22  NaN   NaN
23  NaN   NaN
24  NaN   NaN
25  NaN   NaN
Switching to a float index seems to avoid the issue as well, just as converting the key column to float does.
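The following sketch checks the last two observations. It assumes a recent pandas in which this is fixed. For both an integer index and a float index that do not overlap the group keys, transform should return exactly the caller's row labels with no NaN rows. The dictionary results is illustrative only.

```python
import numpy as np
import pandas as pd

x = np.arange(6, dtype=np.int64)
results = {}

# An integer index and a float index, neither overlapping the group keys 0, 1, 2.
for label, index in [("int", pd.Index(range(20, 26))), ("float", pd.Index(np.arange(20.0, 26.0)))]:
    df = pd.DataFrame({"a": x // 2, "b": 2.0 * x, "c": 3.0 * x}, index=index)
    result = df.groupby("a")[["b", "c"]].transform("mean")
    # One output row per input row, under the caller's own labels ...
    assert result.index.equals(df.index), label
    # ... and none of the NaN rows seen in the report.
    assert not result.isna().to_numpy().any(), label
    results[label] = result

print(results["int"])
```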