BUG: Inconsistent behavior with groupby and copy-on-write
Grouping by a Series and then mutating that Series can have different effects depending on whether a view on the data exists.
ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
gb = df.groupby(ser)
ser.iloc[0] = 100
print(gb.sum())
     a  b
1    3  6
2    2  5
100  1  4
ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
ser2 = ser[:]
gb = df.groupby(ser)
ser.iloc[0] = 100
print(gb.sum())
   a   b
1  4  10
2  2   5
This only happens for certain paths in groupby, e.g. using
ser = pd.Series(pd.Categorical([1, 2, 1], categories=[1, 2, 100]))
gives the latter behavior. We should be taking a shallow copy of any grouping Series when we create the DataFrameGroupBy instance.
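In the meantime, a caller can decouple the grouping key from later mutations by passing an explicit copy. This is a minimal sketch of a user-side workaround, not the proposed fix: the proposal above is a shallow copy taken inside pandas, which relies on copy-on-write, whereas the deep copy here is a stand-in that should behave the same whether or not copy-on-write is enabled. The name `key` is illustrative only.

```python
import pandas as pd

ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Assumption: a deep copy is acceptable here. It owns its own data,
# so later writes to `ser` cannot reach the grouping keys.
key = ser.copy()

gb = df.groupby(key)
ser.iloc[0] = 100  # mutates `ser` only; `key` still holds [1, 2, 1]

result = gb.sum()
print(result)
```

With the copy in place, the groups are computed from the original keys `[1, 2, 1]`, matching the second output above.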
Hat-tip to @jorisvandenbossche for constructing the example.