pandas

I'm fine with prohibiting dtype changes here. It seems more consistent with existing behavior in pandas/numpy and keeps the logic simpler.

It looks like the existing needs_float_conversion logic is incomplete, though: it only handles the scalar case. Setting a slice with a list or IntervalArray containing np.nan neither raises nor changes the dtype; instead, the NaN endpoints are stored as an integer sentinel value (not sure if that's the right term?):

In [2]: ia = pd.arrays.IntervalArray.from_breaks(range(5))

In [3]: ia[:2] = [np.nan, pd.Interval(0, 5)]

In [4]: ia
Out[4]:
[(-9223372036854775808, -9223372036854775808], (0, 5], (2, 3], (3, 4]]
Length: 4, closed: right, dtype: interval[int64]
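On the terminology question: the endpoints printed above equal the minimum signed 64-bit integer, which is also the integer pandas uses internally to represent NaT, so "sentinel" seems like a fair description. A minimal stdlib-only check of that value follows; the variable names are just for illustration, and I'm not claiming anything here about which cast inside pandas produces it.

```python
# Smallest value representable in a signed 64-bit integer.
INT64_MIN = -(2 ** 63)

# The endpoint value printed for the NaN entry in the session above.
observed_endpoint = -9223372036854775808

print(observed_endpoint == INT64_MIN)  # prints True
```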

I think we can address this by shifting the logic around a little bit. The following diff addresses the issue locally for me and, at first glance, doesn't appear to break anything:

diff --git a/pandas/core/arrays/interval.py b/pandas/core/arrays/interval.py
index 22ce5a6f8..2e6e4bd0c 100644
--- a/pandas/core/arrays/interval.py
+++ b/pandas/core/arrays/interval.py
@@ -513,18 +513,15 @@ class IntervalArray(IntervalMixin, ExtensionArray):
         return self._shallow_copy(left, right)

     def __setitem__(self, key, value):
         # na value: need special casing to set directly on numpy arrays
-        needs_float_conversion = False
         if is_scalar(value) and isna(value):
-            if is_integer_dtype(self.dtype.subtype):
-                # can't set NaN on a numpy integer array
-                needs_float_conversion = True
-            elif is_datetime64_any_dtype(self.dtype.subtype):
+            if is_datetime64_any_dtype(self.dtype.subtype):
                 # need proper NaT to set directly on the numpy array
                 value = np.datetime64("NaT")
             elif is_timedelta64_dtype(self.dtype.subtype):
                 # need proper NaT to set directly on the numpy array
                 value = np.timedelta64("NaT")
+            else:
+                value = np.nan
             value_left, value_right = value, value

         # scalar interval

@@ -542,18 +539,18 @@ class IntervalArray(IntervalMixin, ExtensionArray):
                 msg = f"'value' should be an interval type, got {type(value)} instead."
                 raise TypeError(msg) from err

+        if is_integer_dtype(self.dtype.subtype) and np.any(isna(value_left)):
+            raise ValueError("Cannot set float NaN to integer-backed IntervalArray")