BUG: odd transform behaviour with integers · Issue #7972 · pandas-dev/pandas

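After grouping on an integer column, `transform` seems to forget that we have groups. The result holds one value per group, indexed by the group key, instead of one value per original row: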

>>> pd.__version__
'0.14.1-172-gab64d58'
>>> x = np.arange(6, dtype=np.int64)
>>> df = pd.DataFrame({"a": x//2, "b": 2.0*x, "c": 3.0*x})
>>> df
   a   b   c
0  0   0   0
1  0   2   3
2  1   4   6
3  1   6   9
4  2   8  12
5  2  10  15
>>> df.groupby("a").transform("mean")
    b     c
0   1   1.5
1   5   7.5
2   9  13.5
3 NaN   NaN
4 NaN   NaN
5 NaN   NaN
>>> df["a"] = df["a"]*1.0
>>> df.groupby("a").transform("mean")
   b     c
0  1   1.5
1  1   1.5
2  5   7.5
3  5   7.5
4  9  13.5
5  9  13.5

To make it even more obvious:

>>> df.index = range(20, 26)
>>> df.groupby("a").transform("mean")
     b     c
0    1   1.5
1    5   7.5
2    9  13.5
20 NaN   NaN
21 NaN   NaN
22 NaN   NaN
23 NaN   NaN
24 NaN   NaN
25 NaN   NaN

Switching to a float index seems to avoid the issues as well.
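For reference, here is a minimal sketch of the result I would expect, built without `transform`: aggregate once per group, then broadcast each group's mean back to its rows. The names `means` and `expected` are just illustrative, and this only shows the expected shape and values, not how pandas implements `transform` internally.

```python
import numpy as np
import pandas as pd

x = np.arange(6, dtype=np.int64)
df = pd.DataFrame({"a": x // 2, "b": 2.0 * x, "c": 3.0 * x},
                  index=range(20, 26))

# One row per group: the per-group means of the non-key columns.
means = df.groupby("a")[["b", "c"]].mean()

# Broadcast back: look up each row's integer key in the per-group table,
# then restore the original row labels (20..25).
expected = means.loc[df["a"].to_numpy()]
expected.index = df.index

print(expected)

# Compare against transform; on an affected build this is where the
# two results diverge.
actual = df.groupby("a").transform("mean")
print(actual.equals(expected))
```

The key point is that the result should carry the same index as `df` (six rows labelled 20 to 25), regardless of whether the grouping column is integer or float.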