API: Series.str-accessor infers dtype (and Index.str does not raise on all-NA) by h-vetinari · Pull Request #23167 · pandas-dev/pandas
closes #23163
closes #23011
closes #23551 #23555 / #23556
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
Several times while working around str.cat (most recently #22725), @jreback and @WillAyd mentioned that the .str accessor should not work for (e.g.) bytes data. So far, the constructor of the StringMethods object didn't infer the dtype for Series; this PR adds that.
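To make the idea concrete, here is a minimal sketch of a constructor-time check. It uses a toy, list-based stand-in rather than the pandas internals: `infer_kind`, `ALLOWED_KINDS`, and `validate_for_str_accessor` are hypothetical names, and the allow-list is an assumption for illustration only.

```python
# Toy stand-in for dtype inference; NOT the actual pandas implementation.
def infer_kind(values):
    """Return a coarse label for the element types in ``values``, ignoring None."""
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return "empty"
    if kinds == {str}:
        return "string"
    if kinds == {bytes}:
        return "bytes"
    return "mixed"


# Hypothetical allow-list; the real set of accepted inferred types may differ.
ALLOWED_KINDS = {"string", "empty", "mixed"}


def validate_for_str_accessor(values):
    """Raise when the data is inferred as a kind the accessor should reject."""
    kind = infer_kind(values)
    if kind not in ALLOWED_KINDS:
        raise AttributeError(
            f"cannot use a string accessor on data inferred as {kind!r}"
        )
    return kind
```

With this check, `validate_for_str_accessor(["a", None])` passes, while `validate_for_str_accessor([b"a"])` raises as soon as the accessor would be constructed.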
What needs discussion is the following: I've commented out two tests about an encode-decode round trip that cannot work when bytes data is forbidden. Should such an encode-decode cycle explicitly be part of the desired functionality for .str (currently there are .str.encode and .str.decode methods), or not? I guess that even if not, this would need a deprecation cycle.
Alternatively, it'd be possible to allow construction of the StringMethods object but raise on inferred_dtype='bytes' for all but the str.decode method.
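A rough sketch of that alternative, assuming a per-method guard: construction always succeeds, and a decorator rejects bytes data for every method except decode. `ToyStringMethods` and `forbid_bytes` are hypothetical, list-backed illustrations, not the code in this PR.

```python
import functools


def forbid_bytes(method):
    """Decorator: reject bytes data for methods that only make sense on text."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.inferred_dtype == "bytes":
            raise TypeError(f"cannot use .{method.__name__} with bytes data")
        return method(self, *args, **kwargs)
    return wrapper


class ToyStringMethods:
    """Hypothetical, list-backed stand-in for StringMethods."""

    def __init__(self, values):
        # Construction never raises; the inferred kind is only recorded.
        self.values = list(values)
        all_bytes = bool(self.values) and all(
            isinstance(v, bytes) for v in self.values
        )
        self.inferred_dtype = "bytes" if all_bytes else "string"

    def decode(self, encoding):
        # The one method left unguarded, so it stays available for bytes data.
        return [v.decode(encoding) for v in self.values]

    @forbid_bytes
    def encode(self, encoding):
        return [v.encode(encoding) for v in self.values]

    @forbid_bytes
    def upper(self):
        return [v.upper() for v in self.values]
```

Under this design the encode-decode round trip keeps working (`ToyStringMethods(ToyStringMethods(["a"]).encode("utf-8")).decode("utf-8")` gives back `["a"]`), while calling `.upper()` on the encoded data raises a `TypeError`.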
Finally, it would be possible to partially close #13877 by also excluding 'mixed-integer' from the allowed inferred types, but for floats, the inference code would need to be able to distinguish between actual floats and missing values (currently, both of those return inferred_type='mixed').
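To illustrate the distinction that would be needed, here is a toy inference function that drops missing values before classifying. It is a sketch under stated assumptions, not the pandas inference code: `is_missing` and `infer_kind_skipna` are hypothetical helpers, and "missing" here means only `None` or a float NaN.

```python
import math


def is_missing(value):
    """True for None and float NaN, the toy notion of a missing value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_kind_skipna(values):
    """Classify ``values`` after dropping missing entries."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "empty"
    kinds = {type(v) for v in present}
    if kinds == {str}:
        return "string"
    if kinds == {int}:
        return "integer"
    if kinds == {str, int}:
        return "mixed-integer"
    # Anything else, including strings mixed with real floats.
    return "mixed"
```

With missing values skipped, `["a", float("nan")]` is classified as `"string"` while `["a", 1.5]` stays `"mixed"`, so genuine floats could be rejected without also rejecting strings that merely contain NaN.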