BUG: Inconsistent behavior with groupby and copy-on-write

Grouping by a Series and then mutating that Series can have different effects depending on whether a view on the data exists.

ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
gb = df.groupby(ser)
ser.iloc[0] = 100
print(gb.sum())

     a  b
1    3  6
2    2  5
100  1  4

ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
ser2 = ser[:]
gb = df.groupby(ser)
ser.iloc[0] = 100
print(gb.sum())

   a   b
1  4  10
2  2   5
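The only difference between the two snippets is `ser2 = ser[:]`, which creates a second Series over the same data. As a rough way to confirm that such a view exists, one can check whether the two Series share memory. This is only an illustrative check using `np.shares_memory`, not something the groupby code itself does.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 1])
ser2 = ser[:]  # full slice: a new Series object over the same data

# True when both Series are backed by the same memory
shares = np.shares_memory(ser.to_numpy(), ser2.to_numpy())
print(shares)
```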

This only happens for certain paths in groupby, e.g. using

ser = pd.Series(pd.Categorical([1, 2, 1], categories=[1, 2, 100]))

gives the latter behavior. We should be taking a shallow copy of any grouping Series when we create the DataFrameGroupBy instance.
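Until something like that is done internally, a user-side workaround might look like the sketch below. Note that it uses a deep copy (`ser.copy()`) rather than the shallow copy proposed above, on the assumption that an independent copy is the simplest way to decouple the grouper from later mutation regardless of the copy-on-write setting. It is not the proposed fix itself.

```python
import pandas as pd

ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Group by an independent (deep) copy so later edits to `ser` cannot leak in
gb = df.groupby(ser.copy())
ser.iloc[0] = 100

result = gb.sum()
print(result)
```

With the copy in place, the groups should be those of the Series as it was when `groupby` was called (keys 1 and 2), whether or not another view of `ser` exists.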

Hat-tip to @jorisvandenbossche for constructing the example.