BUG/API: prohibit dtype-changing IntervalArray.setitem by jbrockmendel · Pull Request #32782 · pandas-dev/pandas
I'm fine with prohibiting dtype changing here. Seems more consistent with existing behavior in pandas/numpy and makes the logic easier.
It looks like the existing needs_float_conversion logic is incomplete though and only handles the scalar case. Setting a slice with a list or IntervalArray containing np.nan doesn't raise or change dtype but instead takes the integer sentinel value (not sure if that's the right term?):
In [2]: ia = pd.arrays.IntervalArray.from_breaks(range(5))
In [3]: ia[:2] = [np.nan, pd.Interval(0, 5)]
In [4]: ia
Out[4]:
[(-9223372036854775808, -9223372036854775808], (0, 5], (2, 3], (3, 4]]
Length: 4, closed: right, dtype: interval[int64]
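For context on the number in Out[4]: it is the smallest value an int64 can hold, and it is also the integer NumPy uses internally to encode NaT. The NumPy-only sketch below (not part of the PR) just confirms those two facts. How the NaN ends up as that integer inside IntervalArray is not shown here; presumably it comes from a float-to-integer cast on assignment, but that is an assumption.

```python
import numpy as np

# Smallest value an int64 can hold; this is the number shown in Out[4].
INT64_MIN = np.iinfo(np.int64).min

# NumPy stores NaT as this same integer, visible by viewing the raw bits.
nat_bits = np.array(["NaT"], dtype="datetime64[ns]").view("int64")[0]

print(INT64_MIN)              # -9223372036854775808
print(nat_bits == INT64_MIN)  # True

# Only floating dtypes can represent NaN, so an int64 array has no slot for it.
print(np.issubdtype(np.int64, np.floating))  # False
```

So an integer-backed array cannot store a missing endpoint faithfully, which is why raising looks like the safer behavior.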
I think we can address this by shifting the logic around a little bit. The following diff addresses the issue locally for me and at first glance doesn't appear to break anything:
diff --git a/pandas/core/arrays/interval.py b/pandas/core/arrays/interval.py
index 22ce5a6f8..2e6e4bd0c 100644
--- a/pandas/core/arrays/interval.py
+++ b/pandas/core/arrays/interval.py
@@ -513,18 +513,15 @@ class IntervalArray(IntervalMixin, ExtensionArray):
         return self._shallow_copy(left, right)

     def __setitem__(self, key, value):
         # na value: need special casing to set directly on numpy arrays
-        needs_float_conversion = False
         if is_scalar(value) and isna(value):
-            if is_integer_dtype(self.dtype.subtype):
-                # can't set NaN on a numpy integer array
-                needs_float_conversion = True
-            elif is_datetime64_any_dtype(self.dtype.subtype):
+            if is_datetime64_any_dtype(self.dtype.subtype):
                 # need proper NaT to set directly on the numpy array
                 value = np.datetime64("NaT")
             elif is_timedelta64_dtype(self.dtype.subtype):
                 # need proper NaT to set directly on the numpy array
                 value = np.timedelta64("NaT")
+            else:
+                value = np.nan
             value_left, value_right = value, value

         # scalar interval
@@ -542,18 +539,18 @@ class IntervalArray(IntervalMixin, ExtensionArray):
                 msg = f"'value' should be an interval type, got {type(value)} instead."
                 raise TypeError(msg) from err

+        if is_integer_dtype(self.dtype.subtype) and np.any(isna(value_left)):
+            raise ValueError("Cannot set float NaN to integer-backed IntervalArray")
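
The point of moving the check to the end is that it then sees value_left for every input shape, not just a scalar NaN. A minimal NumPy-only sketch of that idea follows; check_settable is a hypothetical stand-in written for illustration, not pandas code, and it assumes the incoming endpoints are numeric.

```python
import numpy as np


def check_settable(backing, value_left):
    """Hypothetical stand-in for the proposed check (not pandas code).

    Raise if any incoming left endpoint is NaN while the backing array has
    an integer dtype. Works for a scalar and for a list-like alike.
    """
    incoming = np.asarray(value_left, dtype="float64")
    if np.issubdtype(backing.dtype, np.integer) and np.any(np.isnan(incoming)):
        raise ValueError("Cannot set float NaN to integer-backed IntervalArray")


int_backed = np.arange(4)      # integer dtype, standing in for interval[int64]
float_backed = np.arange(4.0)  # float64 can represent NaN

check_settable(int_backed, [0, 2])           # passes silently
check_settable(float_backed, [np.nan, 0.0])  # passes: float backing

for bad in (np.nan, [np.nan, 0.0]):          # scalar and list-like both raise
    try:
        check_settable(int_backed, bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Because the test is applied to the already-extracted left endpoints, the scalar, list, and IntervalArray cases all go through the same path.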