BUG: ExtensionBlock.set not setting values inplace by jbrockmendel · Pull Request #32831 · pandas-dev/pandas

does this need a whatsnew note, since your example IS user-facing? if so, can you add one in a follow-on?

yes, will do.

@jreback two questions on Block.setitem behavior (AFAICT you wrote at least one of the two original implementations)

In Block.setitem we have a check:

        elif (
            exact_match
            and is_categorical_dtype(arr_value.dtype)
            and not is_categorical_dtype(values)
        ):
            # GH25495 - If the current dtype is not categorical,
            # we need to create a new categorical block
            values[indexer] = value
            return self.make_block(Categorical(self.values, dtype=arr_value.dtype))

It isn't clear why we need exact_match here. If we remove that, there is one test that fails because it expects to retain the non-Categorical dtype when setting only 2 of the 3 values with a length-2 Categorical. Is this important? (not having this restriction would make it easier to simplify this method)
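To make the trade-off concrete, here is a minimal sketch of what the `Categorical(self.values, dtype=arr_value.dtype)` call would do if this branch also ran for a partial set. It is not the Block code itself: the array contents and the names `values`, `new` and `result` are made up for illustration, and it relies only on the public `pd.Categorical` constructor.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for self.values and the value being set.
values = np.array(["a", "b", "c"], dtype=object)
new = pd.Categorical(["b", "a"])  # categories are ['a', 'b']

# Partial set: only 2 of the 3 positions are overwritten.
values[[0, 1]] = np.asarray(new)

# Re-wrapping with the new dtype, as the quoted branch does.
# The untouched "c" is not one of the categories, so it becomes NaN.
result = pd.Categorical(values, dtype=new.dtype)
```

If that reading is right, dropping exact_match would mean untouched entries outside the new categories are silently lost, which may be part of what the failing test is guarding against.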

Second, the next check in Block.setitem is:

        # if we are an exact match (ex-broadcasting),
        # then use the resultant dtype
        elif exact_match:
            # We are setting _all_ of the array's values, so can cast to new dtype
            values[indexer] = value
            values = values.astype(arr_value.dtype, copy=False)

The non-obvious thing here is why we overwrite values in place instead of just using value directly (which would also save an astype!). CoW semantics are hard, and it seems really easy for some of these choices to be careful and intentional and others not.
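For reference, a minimal NumPy-only sketch of how the two options differ for anything else holding a reference to the same buffer. The helper names `overwrite_then_cast` and `use_value_directly` are hypothetical and only mirror the shape of the quoted branch, not the actual Block implementation.

```python
import numpy as np

def overwrite_then_cast(values, indexer, value):
    # Shape of the quoted branch: write in place, then cast to the new dtype.
    values[indexer] = value
    return values.astype(value.dtype, copy=False)

def use_value_directly(values, indexer, value):
    # Hypothetical alternative: skip both the write and the astype.
    return value

new = np.array([10.0, 20.0, 30.0])

# Option 1: the original buffer is mutated, so a view sees the new data.
original = np.array([1.0, 2.0, 3.0])
view = original[:]
result = overwrite_then_cast(original, slice(None), new)
same_object = result is original          # dtype already matches, no copy made
view_after_overwrite = view.tolist()

# Option 2: the original buffer is left alone, so the view keeps the old data.
original2 = np.array([1.0, 2.0, 3.0])
view2 = original2[:]
result2 = use_value_directly(original2, slice(None), new)
view_after_direct = view2.tolist()
```

So the two are only interchangeable if nothing else is expected to observe the write through the old array, which is exactly the part that is hard to tell from the code.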