BUG: groupby and agg on read-only array gives ValueError: buffer source array is read-only · Issue #36014 · pandas-dev/pandas

import pandas as pd
import pyarrow as pa

df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
        "species": ["setosa", "setosa", "setosa", "setosa", "setosa"],
    }
)

context = pa.default_serialization_context()
data = context.serialize(df).to_buffer().to_pybytes()
df_new = context.deserialize(data)

# this fails

df_new.groupby(["species"]).agg({"sepal_length": "sum"})

# this works

df_new.copy().groupby(["species"]).agg({"sepal_length": "sum"})

This is the traceback.

Traceback (most recent call last):
  File "demo.py", line 16, in <module>
    df_new.groupby(["species"]).agg({"sepal_length": "sum"})
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 949, in aggregate
    result, how = self._aggregate(func, *args, **kwargs)
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/base.py", line 416, in _aggregate
    result = _agg(arg, _agg_1dim)
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/base.py", line 383, in _agg
    result[fname] = func(fname, agg_how)
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/base.py", line 367, in _agg_1dim
    return colg.aggregate(how)
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 240, in aggregate
    return getattr(self, func)(*args, **kwargs)
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1539, in sum
    return self._agg_general(
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 999, in _agg_general
    return self._cython_agg_general(
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1033, in _cython_agg_general
    result, agg_names = self.grouper.aggregate(
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 584, in aggregate
    return self._cython_operation(
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 537, in _cython_operation
    result = self._aggregate(result, counts, values, codes, func, min_count)
  File "/home/jeet/miniconda3/envs/rnd/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 599, in _aggregate
    agg_func(result, counts, values, comp_ids, min_count)
  File "pandas/_libs/groupby.pyx", line 475, in pandas._libs.groupby._group_add
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

If the failing .agg call uses min, max, median, or count instead of sum, it works.

With sum or mean, it fails with the ValueError shown in the traceback.

I expected the aggregation to succeed without any error.
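
For anyone trying to see why `.copy()` makes a difference, here is a minimal NumPy-only sketch of how an array can end up read-only after a bytes round trip. It does not use pyarrow, and it is only an assumption that the deserialized frame in this report is backed by an immutable buffer in a similar way; the names `values`, `raw`, `restored`, and `writable` are just for illustration.

```python
import numpy as np

values = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

# Round-trip through an immutable bytes object, loosely mimicking
# a zero-copy deserialization step (assumption, not the pyarrow code path).
raw = values.tobytes()
restored = np.frombuffer(raw, dtype=values.dtype)

# The array is a view over immutable memory, so NumPy marks it read-only.
print(restored.flags.writeable)

try:
    restored[0] = 0.0
except ValueError as exc:
    # Writing through a read-only array is rejected by NumPy.
    print("write rejected:", exc)

# copy() allocates fresh memory owned by the new array, which is writable.
writable = restored.copy()
print(writable.flags.writeable)
```

If the same thing is happening here, it would explain why `df_new.copy()` avoids the error: the copy owns writable memory, while the original still points at the read-only buffer.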