DEPR: Deprecate the convert_dtype param in Series.Apply by topper-123 · Pull Request #52257 · pandas-dev/pandas

Hi @jorisvandenbossche

Thinking about it again now, removing convert_dtype from Series.apply made things a lot simpler in the code base. It was very inconsistently implemented; for ExtensionArrays, it typically wasn't implemented at all. I value a consistent interface, so IMO we should either implement convert_dtype in ExtensionArray.map or remove it. I didn't think convert_dtype added much value, so I opted to deprecate it.

My original idea was actually to move convert_dtype to Series.map, but after a discussion with @rhshadrach (see the comment in #52140) I opted to just deprecate it in Series.apply and not move it to Series.map.

But also right now, we do infer the dtype of the result, even if the starting dtype was object. So with our current behaviour, first casting to object dtype before calling apply doesn't make any difference for the result dtype inference.
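To make the point above concrete, here is a minimal sketch (the series values and the `double` helper are made up for illustration). It assumes default arguments to `Series.apply`, so the result dtype is inferred from the returned values:

```python
import pandas as pd

ser = pd.Series([1, 2, 3])


def double(x):
    # Hypothetical element-wise function; always returns an int for int input.
    return x * 2


# Apply directly on the int64 Series.
direct = ser.apply(double)

# Cast to object dtype first, then apply the same function.
via_object = ser.astype(object).apply(double)

# Both results go through dtype inference, so the cast should make no difference.
print(direct.dtype)
print(via_object.dtype)
```

If the two printed dtypes match, casting to object beforehand is not a way to keep the result as object dtype.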

Hm, yeah, I see this point. I must admit I didn't consider object dtype when I wrote the error message, so the question is how to deal with it. I think there are two options:

1: Implement a convert_dtype parameter in Series|DataFrame.map and guide people to use Series|DataFrame.map. That means it must also either be implemented in ExtensionArray.map, or a note should be added to the .map docstring that convert_dtype has no effect on ExtensionArrays.
2: Make a clearer error message

I'm not super sure what direction is best. Generally, operating element-wise makes it very difficult to guarantee a certain return dtype, because an array has to be constructed from the results and we can't know in advance what dtype it should have. E.g. for something like ser.map(lambda x: ...) we can't really know what the return dtype will be. But I also see the value in a "best effort" like we do currently.
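As a rough illustration of why the return dtype can't be known up front, the sketch below maps the same function over two different inputs. The `halve` function and the sample values are hypothetical, chosen only so that the Python type of the return value depends on the data:

```python
import pandas as pd


def halve(x):
    # Hypothetical function: returns an int for even input, a float for odd input.
    return x // 2 if x % 2 == 0 else x / 2


evens = pd.Series([2, 4]).map(halve)  # every result is a Python int
odds = pd.Series([1, 3]).map(halve)   # every result is a Python float

# Same function, same input dtype, but the inferred result dtype differs
# because it is derived from the values the function happened to return.
print(evens.dtype)
print(odds.dtype)
```

The dtype is only determined after every element has been processed, which is the "best effort" inference described above.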

I guess the choice depends on how much people value the option to do convert_dtype=False. For me, I put a low value on it (I wouldn't use it) and am ok with just doing a better deprecation message, so I think option 2 is better.