REGR: Restore _constructor_from_mgr to pass manager object to constructor by jorisvandenbossche · Pull Request #54922 · pandas-dev/pandas

The motivation is that geopandas does something involving inference on object dtypes differently if it gets a Manager object? Can you elaborate on that?

@jbrockmendel yes, whenever we get a manager, we don't do any inference, while if we get an object-dtype array, we do some inference (and then potentially fall back to DataFrame instead of GeoDataFrame in case there are no geometries). See also my geopandas PR related to this: geopandas/geopandas#3046

But that's geopandas-specific (and something we can handle by overriding the constructor from mgr), while this PR actually addresses a more general issue: since your change in #53871, we create a Series or DataFrame object with self._from_mgr and pass that to _constructor. The consequence is that we are now passing a non-fully-initialized object to the subclass constructor, and there is no guarantee that the subclass constructor works with such partial objects.

Essentially, this is the same as #55607, where Series._from_mgr currently returns a non-fully-initialized Series object (there is an attribute missing, so some methods don't work). Similarly for subclasses: when an object is created with _from_mgr, it didn't go through the subclass' __init__ function, and so the subclass object might not be set up fully correctly.

I think that when passing (subclassed) DataFrame/Series objects to the public constructor (__init__), one should always be able to assume that these are valid, fully-initialized objects.
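
To make the failure mode concrete, here is a minimal pure-Python sketch of the pattern, not actual pandas code. The names `Base`, `Sub`, `_from_data` and `_flags` are hypothetical stand-ins: `_from_data` plays the role of a fast path like `_from_mgr` that bypasses `__init__`, and `_flags` stands for whatever attribute `__init__` would normally set.

```python
class Base:
    """Hypothetical stand-in for a pandas-like container (not pandas code)."""

    def __init__(self, data):
        self._data = list(data)
        # Attribute that only __init__ sets (hypothetical).
        self._flags = {"allows_duplicates": True}

    @classmethod
    def _from_data(cls, data):
        # Fast path that skips __init__, loosely analogous to _from_mgr.
        obj = object.__new__(cls)
        obj._data = list(data)
        return obj  # _flags was never set: the object is only partially initialized


class Sub(Base):
    def __init__(self, data, label="default"):
        if isinstance(data, Base):
            # The subclass assumes any Base instance it receives is fully initialized.
            self._source_flags = dict(data._flags)
            data = data._data
        super().__init__(data)
        self.label = label


# A fully initialized object works as input to the subclass constructor.
full = Base([1, 2, 3])
sub_ok = Sub(full, label="ok")

# A partially initialized object breaks the subclass constructor.
partial = Base._from_data([1, 2, 3])
try:
    Sub(partial)
    failed = False
except AttributeError:
    failed = True
```

In this sketch the subclass constructor is not doing anything unreasonable. It only relies on the input being a valid instance, which is the assumption argued for above.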

I added a generic test that demonstrates this (test_constructor_with_metadata in frame/test_subclass.py). This test works on 2.0, but fails on 2.1.1.