BUG: ExtensionBlock.set not setting values inplace by jbrockmendel · Pull Request #32831 · pandas-dev/pandas
Does this need a whatsnew note, since your example IS user-facing? If so, can you add it in a follow-on?
yes, will do.
@jreback two questions on Block.setitem behavior (AFAICT you wrote at least one of the two original implementations)
In Block.setitem we have a check
    elif (
        exact_match
        and is_categorical_dtype(arr_value.dtype)
        and not is_categorical_dtype(values)
    ):
        # GH25495 - If the current dtype is not categorical,
        # we need to create a new categorical block
        values[indexer] = value
        return self.make_block(Categorical(self.values, dtype=arr_value.dtype))
It isn't clear why we need exact_match here. If we remove that, there is one test that fails because it expects to retain the non-Categorical dtype when setting only 2 of the 3 values with a length-2 Categorical. Is this important? (Not having this restriction would make it easier to simplify this method.)
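To make the "all values" versus "2 of the 3 values" distinction concrete, here is a minimal NumPy-only sketch. The helper is_exact_match is hypothetical and only assumes that exact_match means the new values cover every element of the existing array; the actual computation in Block.setitem may differ.

```python
import numpy as np


def is_exact_match(values: np.ndarray, new_values: np.ndarray) -> bool:
    """Hypothetical helper: True when new_values would replace every element.

    Assumption for illustration only, not the pandas implementation.
    """
    return bool(
        new_values.ndim > 0
        and new_values.shape[0] == values.shape[0]
        and new_values.size == values.size
    )


values = np.array(["a", "b", "c"], dtype=object)

# Replacing all three elements: nothing of the old data survives,
# so adopting the dtype of the new values loses no information.
full_match = is_exact_match(values, np.array(["x", "y", "z"], dtype=object))

# Replacing only two of three: one old element remains, so the
# result has to keep a dtype that can still hold it.
partial_match = is_exact_match(values, np.array(["x", "y"], dtype=object))
```

Under that assumption, the failing test corresponds to the partial case, where one original element is still present after the assignment.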
Second, the next check in Block.setitem is:
    # if we are an exact match (ex-broadcasting),
    # then use the resultant dtype
    elif exact_match:
        # We are setting _all_ of the array's values, so can cast to new dtype
        values[indexer] = value
        values = values.astype(arr_value.dtype, copy=False)
The non-obvious thing here is why we are overwriting values instead of just using value (which would also save an astype!). CoW semantics are hard, and it seems really easy for some of these to be careful and intentional and others not to be.
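One observable difference between the two approaches is whether the original buffer is mutated. The following is a rough sketch with plain NumPy arrays, not pandas internals; the function names overwrite_then_cast and use_value_directly are made up for illustration.

```python
import numpy as np


def overwrite_then_cast(values, indexer, value):
    # Mirrors the quoted branch: write into the existing buffer, then cast.
    values[indexer] = value
    return values.astype(value.dtype, copy=False)


def use_value_directly(values, indexer, value):
    # The alternative raised above: skip the write and the astype.
    return value


new = np.array([1.0, 2.0, 3.0])

original_a = np.array(["a", "b", "c"], dtype=object)
view_a = original_a[:]  # a second reference to the same buffer
result_a = overwrite_then_cast(original_a, slice(None), new)

original_b = np.array(["a", "b", "c"], dtype=object)
view_b = original_b[:]
result_b = use_value_directly(original_b, slice(None), new)

# Both results hold the same float64 data, but only the first approach
# changed what view_a sees; view_b still holds the old strings.
```

So the returned arrays are equal either way, and what differs is the side effect on anything else that shares the old buffer. Whether that side effect is intended in Block.setitem is exactly the open question.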