pandas.DataFrame.sum() returns wrong type for subclassed pandas DataFrame · Issue #25596 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

The following code is taken from the documentation:

https://pandas.pydata.org/pandas-docs/stable/development/extending.html

import pandas as pd

class SubclassedSeries(pd.Series):

    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries

create a class instance as in the example of the documentation

df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

df
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

this works just fine

type(df)
<class '__main__.SubclassedDataFrame'>

slicing also works fine

sliced2 = df['A']
sliced2
0    1
1    2
2    3
Name: A, dtype: int64

type(sliced2)
<class '__main__.SubclassedSeries'>

however, the sum operation returns a pandas.Series, not SubclassedSeries

sliced3 = df.sum()
sliced3
A     6
B    15
C    24
dtype: int64

type(sliced3)
<class 'pandas.core.series.Series'>

Problem description

In our project, we extend pandas as described in the documentation and implement our own kind of DataFrame and Series, similar to the geopandas project (applying sum to their DataFrame shows the same problem). For reduction operations that go through _reduce, such as sum, it is important that a SubclassedSeries is returned. Otherwise the subclass is lost after the reduction, which makes inheriting from pandas.DataFrame impractical.
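To see which reductions are affected on a given installation, a small survey like the following may help. It is only a sketch: the helper name reduction_result_types and the choice of reductions are assumptions made for illustration, and the printed class names depend on the installed pandas version.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries


def reduction_result_types(frame, names=("sum", "mean", "min", "max", "prod")):
    """Hypothetical helper: map each reduction name to the class name it returns."""
    return {name: type(getattr(frame, name)()).__name__ for name in names}


df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
report = reduction_result_types(df)

# "Series" marks a reduction that dropped the subclass,
# "SubclassedSeries" one that preserved it.
for name, cls_name in report.items():
    print(f"{name}: {cls_name}")
```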

Expected Output

type(sliced3)
<class '__main__.SubclassedSeries'>
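Until the library returns the subclass itself, the expected type can probably be obtained on the user side by overriding sum in the subclass and re-wrapping the result. This is a workaround sketch, not the fix proposed in this issue. It covers only sum, and the override is an assumption of this example rather than something the documentation prescribes.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries

    def sum(self, *args, **kwargs):
        # Delegate to pandas, then restore the subclass if it was dropped.
        result = super().sum(*args, **kwargs)
        if isinstance(result, pd.Series) and not isinstance(result, SubclassedSeries):
            result = SubclassedSeries(result)
        return result


df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
sliced3 = df.sum()
print(type(sliced3).__name__)  # SubclassedSeries
```

The isinstance guard keeps the override harmless on pandas versions that already return the subclass.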

I think I can provide a possible fix for this problem. The relevant code is in core/frame.py, just before the return statement of the _reduce function.

this is the code in core/frame.py:

def _reduce(...):
    # .... left out
    if constructor is not None:
        result = Series(result, index=labels)
    return result

I suggest the following change:

def _reduce(...):
    # .... left out
    if constructor is None:
        result = Series(result, index=labels)
    else:
        result = constructor(result, index=labels)
        # alternative (since constructor will create a SubclassedDataFrame):
        # result = self._constructor_sliced(result, index=labels)
    return result
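The idea behind the alternative (building the one-dimensional result through self._constructor_sliced instead of a hard-coded Series) can be illustrated with a toy model in plain Python. Table and Column below are hypothetical stand-ins for DataFrame and Series. This is not pandas code, only a sketch of the dispatch pattern.

```python
class Column:
    """Stand-in for pandas.Series: reduced values keyed by label."""

    def __init__(self, values, index):
        self.data = dict(zip(index, values))


class Table:
    """Stand-in for pandas.DataFrame: a dict of column label -> list."""

    def __init__(self, columns):
        self.columns = columns

    @property
    def _constructor_sliced(self):
        return Column

    def _reduce(self, op):
        labels = list(self.columns)
        result = [op(self.columns[label]) for label in labels]
        # Build the 1-D result through the hook instead of hard-coding Column,
        # so subclasses get their own type back.
        return self._constructor_sliced(result, index=labels)

    def sum(self):
        return self._reduce(sum)


class SubclassedColumn(Column):
    pass


class SubclassedTable(Table):
    @property
    def _constructor_sliced(self):
        return SubclassedColumn


total = SubclassedTable({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}).sum()
print(type(total).__name__)  # SubclassedColumn
print(total.data)            # {'A': 6, 'B': 15, 'C': 24}
```

Because the base class never names Column directly inside _reduce, the subclass only has to override the hook, which is the same contract the documentation describes for _constructor_sliced.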

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None