BUG/API: setitem/where with mixed dt64 units · Issue #56410 · pandas-dev/pandas (original) (raw)
import pandas as pd
dti = pd.date_range("2016-01-01", periods=3, unit="s") item = pd.Timestamp("2016-02-03 04:05:06.789")
ser = pd.Series(dti)
ser[0] = item
assert ser[0] == item # <- raises bc item is rounded
mask = [False, True, True] res = ser.where(mask, item)
assert res[0] == item # <- raises bc item is rounded
ser2 = ser.copy() ser2[0] = pd.NaT res2 = ser2.fillna(item)
assert res2[0] == item # <- raises bc item is rounded
Issue Description
The operations above all go through DatetimeArray._validate_setitem_value -> _validate_scalar -> _unbox_scalar. _unbox_scalar does a value.as_unit(self.unit)
, which uses the default round_ok=True
, so casts the ms-unit item
to s-unit, which truncates.
(Something similar happens if we use a vector vec = pd.date_range(item, periods=3, unit="ms")
instead of scalar item
, but that also has other issues that I'll address later)
The apparent solution here is to do the as_unit casting in _validate_scalar, bug pass round_ok=False
. This fixes the cases above (the __setitem__
case correctly issues a PDEP6 warning). But it breaks test_resample_ohlc_result, test_resample_dst_anchor2, and test_clip_with_timestamps_and_oob_datetimes.
In the clip test, the change causes us to raise in EABackedBlock.where, fall back to coerce_to_target_dtype, which identifies new_dtype
and M8[ns]
, tries to cast, and raises OutOfBoundsDatetime. There are a few non-crazy options here. I lean toward doing a PDEP6-like deprecation for cases that go through this path (where, mask, clip, fillna).
The other cases are raising a KeyError on a DatetimeIndex.get_loc call when doing something like ser.loc[:"4-15-2000"]
. I need to look into this more closely, tentatively looks solvable.
Last I want to make sure I write down the really ugly bug in where/fillna if we use a vector:
vec = pd.date_range(item, periods=3, unit="ms")
mask = [False, True, True]
res = ser.where(mask, vec)
arr = res._values
>>> arr.dtype == arr._ndarray.dtype # <- raises, yikes!
This goes though NDArrayBackedExtensionArray._where, which does value = self._validate_setitem_value(value)
which fails to cast the unit of vec
, so gets back value
which is a M8[ms]
ndarray instead of M8[s]
. We then pass it to np.where which gives back res_values
with M8[ms]
dtype, and then call _from_backing_data which assumes that res_values.dtype == self._ndarray.dtype
. That assumption is wrong in this case and we get a corrupted EA back.
I've got a branch addressing this, just want to make sure this is written down somewhere.
cc @MarcoGorelli to make sure we're in agreement that the cases above should not silently truncate.