BUG (string dtype): replace() value in string column with non-string should cast to object dtype instead of raising an error · Issue #60282 · pandas-dev/pandas

For all other dtypes (I think; I only checked the one below), if the replacement value passed to replace() doesn't fit into the calling series, we "upcast" to object dtype and then do the replacement anyway.

Simple example with an integer series:

    ser = pd.Series([1, 2])
    ser.replace(1, "str")
    0    str
    1      2
    dtype: object

However, with the future string dtype, replacing a value with a non-string currently does not cast to object dtype, but raises instead:

    pd.options.future.infer_string = True
    ser = pd.Series(["a", "b"])
    ser.replace("a", 1)
    ...
    File ~/scipy/repos/pandas/pandas/core/internals/blocks.py:713, in Block.replace(self, to_replace, value, inplace, mask)
        709 elif self._can_hold_element(value):
        710     # TODO(CoW): Maybe split here as well into columns where mask has True
        711     # and rest?
        712     blk = self._maybe_copy(inplace)
    --> 713     putmask_inplace(blk.values, mask, value)
        714     return [blk]
        716 elif self.ndim == 1 or self.shape[0] == 1:
    ...

    File ~/scipy/repos/pandas/pandas/core/arrays/string_.py:746, in __setitem__(self, key, value)
    ...
    TypeError: Invalid value '1' for dtype 'str'. Value should be a string or missing value, got 'int' instead.

Making replace() strict (preserve dtype) in general is a much bigger topic, so I think for now we should just keep the current behaviour of upcasting to object dtype when needed.
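For illustration, the upcasting behaviour asked for here can be approximated from user code. The sketch below is not the proposed fix inside pandas; `replace_with_upcast` is a hypothetical helper name, and it assumes that the failure surfaces as the `TypeError` shown in the traceback above. It does not set `pd.options.future.infer_string`, so it should run the same way whether or not the string dtype is enabled.

```python
import pandas as pd


def replace_with_upcast(ser, to_replace, value):
    """Hypothetical helper: replace, falling back to object dtype on TypeError."""
    try:
        # Works when the replacement value fits the dtype, or when pandas
        # already upcasts to object on its own (e.g. for an integer series).
        return ser.replace(to_replace, value)
    except TypeError:
        # Assumed failure mode from the report: the string dtype rejects a
        # non-string value. Cast to object first, then replace.
        return ser.astype(object).replace(to_replace, value)


str_result = replace_with_upcast(pd.Series(["a", "b"]), "a", 1)
int_result = replace_with_upcast(pd.Series([1, 2]), 1, "str")
print(str_result.tolist(), str_result.dtype)
print(int_result.tolist(), int_result.dtype)
```

Casting with `astype(object)` before replacing mirrors what the other dtypes already do implicitly, at the cost of losing the string dtype on the result.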