DEPR: Deprecate the convert_dtype param in Series.apply by topper-123 · Pull Request #52257 · pandas-dev/pandas
Thinking about it again now, removing convert_dtype in Series.apply made things a lot simpler in the code base. It was implemented very inconsistently; for ExtensionArrays, for example, convert_dtype typically wasn't implemented at all. I value a consistent interface, so IMO we should either implement convert_dtype in ExtensionArray.map or remove it. I don't think convert_dtype adds much value, so I opted to deprecate it.
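To make the semantics under discussion concrete, here is a minimal sketch of what convert_dtype toggled, written as a standalone helper. The name apply_elementwise is hypothetical and this is not pandas' internal implementation; it only assumes that the Series constructor infers a dtype from a list of results, and that passing dtype=object skips that inference.

```python
import pandas as pd


def apply_elementwise(ser: pd.Series, func, convert_dtype: bool = True) -> pd.Series:
    """Hypothetical helper mimicking Series.apply(func, convert_dtype=...)."""
    # Call func on every element, collecting the raw Python results.
    results = [func(x) for x in ser]
    if convert_dtype:
        # Best effort: let the constructor infer a dtype from the results.
        return pd.Series(results, index=ser.index)
    # Opt out of inference: keep the results as Python objects.
    return pd.Series(results, index=ser.index, dtype=object)


ser = pd.Series([1, 2, 3])
print(apply_elementwise(ser, lambda x: x * 2).dtype)  # int64
print(apply_elementwise(ser, lambda x: x * 2, convert_dtype=False).dtype)  # object
```

The elements are computed identically on both paths; the flag only decides whether the final dtype inference runs.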
My original idea was actually to move convert_dtype to Series.map, but after a discussion with @rhshadrach about it (see the comment in #52140) I opted to just deprecate it in Series.apply and not move it to Series.map.
But also, right now we do infer the dtype of the result even if the starting dtype was object. So with our current behaviour, first casting to object dtype before calling apply makes no difference to the inferred result dtype.
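The point about object dtype can be checked directly. This is a minimal sketch, assuming a pandas version where Series.apply runs its usual best-effort inference on the results, as described above; it deliberately does not pass convert_dtype, so it runs whether or not the parameter still exists.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# Apply the same function directly and after casting to object dtype.
direct = ser.apply(lambda x: x + 1)
via_object = ser.astype(object).apply(lambda x: x + 1)

# The result dtype is inferred from the returned values in both cases,
# so casting to object beforehand does not force an object result.
print(direct.dtype, via_object.dtype)  # int64 int64
```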
Hm, yeah, I see this point. I must admit I didn't consider object dtype when I wrote the error message, so the question is how to deal with it. I think there are two options:
1: Implement a convert_dtype parameter in Series|DataFrame.map and guide people to use Series|DataFrame.map. That means it must either also be implemented in ExtensionArray.map, or a note must be added to the .map docstring saying that convert_dtype has no effect on ExtensionArrays.
2: Make a clearer error message.
I'm not sure which direction is best. Generally, operating element-wise makes it very difficult to guarantee a certain return dtype, because an array has to be constructed from the results and we can't know in advance what dtype it should have. E.g. for something like ser.map(lambda x: ...) we can't really know what the return dtype will be. But I also see the value in a "best effort" inference like we do currently.
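A small illustration of why the result dtype of an element-wise operation can only be inferred after the fact: the same function can return ints for some inputs and floats for others, so the resulting dtype depends on the data rather than on the function. The function half is made up for this example.

```python
import pandas as pd


def half(x):
    # Hypothetical function: exact integer division for even numbers,
    # float division otherwise.
    return x // 2 if x % 2 == 0 else x / 2


evens = pd.Series([2, 4]).map(half)  # every result is an int
mixed = pd.Series([2, 3]).map(half)  # results mix int and float

print(evens.dtype)  # int64
print(mixed.dtype)  # float64
```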
I guess the choice depends on how much people value the option to do convert_dtype=False. Personally, I put a low value on it (I wouldn't use it) and am OK with just writing a better deprecation message, so I think option 2 is better.
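For users who relied on convert_dtype=False, one way to keep an un-inferred, object-dtype result without the parameter is to build the Series from the raw results with dtype=object. This is only a sketch of a possible replacement, not necessarily what the deprecation message ended up recommending; half is again a hypothetical function.

```python
import pandas as pd


def half(x):
    # Hypothetical function returning an int for even inputs, a float otherwise.
    return x // 2 if x % 2 == 0 else x / 2


ser = pd.Series([2, 3])

# Default behaviour: results are inferred into a single float64 column.
inferred = ser.apply(half)

# Object-preserving alternative: keep each returned Python object as-is.
as_objects = pd.Series([half(x) for x in ser], index=ser.index, dtype=object)

print(inferred.tolist())    # [1.0, 1.5]
print(as_objects.tolist())  # [1, 1.5]
```

Note that calling .astype(object) on the inferred result is not equivalent: by then the int 1 has already been converted to the float 1.0.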