API: Series.str-accessor infers dtype (and Index.str does not raise on all-NA) by h-vetinari · Pull Request #23167 · pandas-dev/pandas

closes #23163
closes #23011
closes #23551
#23555 / #23556

Several times while working on str.cat (most recently in #22725), @jreback and @WillAyd mentioned that the .str accessor should not work for (e.g.) bytes data. So far, the constructor of the StringMethods object did not infer the dtype for a Series; this PR adds that inference.
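For illustration, the kind of constructor-time check described above could look roughly like the sketch below. It relies on the public `pandas.api.types.infer_dtype`; the helper name `validate_for_str_accessor` and the allow-list `ALLOWED_TYPES` are hypothetical and are not taken from this PR's diff.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical allow-list for this sketch; the set actually used by the PR
# is not shown in this description.
ALLOWED_TYPES = {"string", "empty", "mixed"}


def validate_for_str_accessor(data):
    """Return the inferred type of `data`, or raise if it is not allowed."""
    inferred = infer_dtype(data, skipna=True)
    if inferred not in ALLOWED_TYPES:
        raise AttributeError(
            "Can only use .str accessor with string values, "
            f"not {inferred!r}"
        )
    return inferred


# Object-dtype Series of str values passes the check.
text_result = validate_for_str_accessor(pd.Series(["a", "b"], dtype=object))

# Object-dtype Series of bytes values is rejected.
try:
    validate_for_str_accessor(pd.Series([b"a", b"b"], dtype=object))
    bytes_rejected = False
except AttributeError:
    bytes_rejected = True
```

With such a check in place, a Series of bytes would fail when `.str` is accessed rather than inside an individual string method.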

What needs discussion is the following: I've commented out two tests of an encode-decode round trip that cannot work once bytes data is forbidden. Should such an encode-decode cycle explicitly be part of the desired functionality of .str (currently there are .str.encode and .str.decode methods), or not? I guess that even if not, removing it would need a deprecation cycle.

Alternatively, it would be possible to allow construction of the StringMethods object but raise on inferred_dtype='bytes' for every method except str.decode.
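A minimal, pandas-free sketch of that alternative is shown below. The class `StringMethodsSketch` and its `_forbid_bytes` helper are hypothetical stand-ins for illustration only; they do not reproduce the real StringMethods implementation.

```python
class StringMethodsSketch:
    """Hypothetical stand-in for StringMethods; not the pandas class."""

    def __init__(self, values, inferred_dtype):
        # Construction always succeeds; the inferred dtype is only recorded.
        self._values = list(values)
        self._inferred_dtype = inferred_dtype

    def _forbid_bytes(self, method_name):
        # Called by every method that should not operate on bytes data.
        if self._inferred_dtype == "bytes":
            raise TypeError(
                f"Cannot use .str.{method_name} with values of "
                "inferred dtype 'bytes'."
            )

    def decode(self, encoding):
        # The one method that stays usable for bytes data.
        return [v.decode(encoding) for v in self._values]

    def upper(self):
        self._forbid_bytes("upper")
        return [v.upper() for v in self._values]


accessor = StringMethodsSketch([b"a", b"b"], inferred_dtype="bytes")
decoded = accessor.decode("utf-8")

try:
    accessor.upper()
    upper_raised = False
except TypeError:
    upper_raised = True
```

Under this design the check moves from the constructor into each method, so a decode step on bytes data would keep working while everything else raises.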

Finally, it would be possible to partially close #13877 by also excluding 'mixed-integer' from the allowed inferred types. For floats, however, the inference code would need to be able to distinguish between actual floats and missing values (currently, both of those return inferred_type='mixed').
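The ambiguity can be reproduced with the public `pandas.api.types.infer_dtype`. The sketch below assumes a pandas version in which `infer_dtype` accepts the `skipna` keyword; `skipna=False` is passed explicitly so that missing values take part in the inference.

```python
import numpy as np
from pandas.api.types import infer_dtype

# Strings mixed with an integer get their own inferred type.
str_and_int = infer_dtype(["a", 1], skipna=False)

# Strings mixed with a real float, and strings mixed with a missing value,
# produce the same result when missing values are not skipped.
str_and_float = infer_dtype(["a", 1.5], skipna=False)
str_and_nan = infer_dtype(["a", np.nan], skipna=False)

# Skipping missing values separates the two cases.
str_and_nan_skipped = infer_dtype(["a", np.nan], skipna=True)

print(str_and_int)          # mixed-integer
print(str_and_float)        # mixed
print(str_and_nan)          # mixed
print(str_and_nan_skipped)  # string
```

So 'mixed-integer' can be excluded without touching missing-value handling, whereas 'mixed' alone does not say whether the non-string entries are floats or NaN.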