Series[dt64].astype(int64) vs Series[Sparse[dt64]].astype(int64) · Issue #49631 · pandas-dev/pandas
```python
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3)
ser = pd.Series(dti)
ser[0] = pd.NaT

dense = ser._values
sparse = pd.core.arrays.SparseArray(ser.values)

>>> dense.astype("int64")
array([-9223372036854775808, 1451692800000000000, 1451779200000000000])

>>> sparse.astype("int64")
[...]
ValueError: Cannot convert NaT values to integer

>>> sparse.astype("Sparse[int64]")
[0, 1451692800000000000, 1451779200000000000]
Fill: 0
IntIndex
Indices: array([1, 2], dtype=int32)
```
The dense version goes through `DatetimeArray.astype`, for which `.astype` to int64 is effectively a view of the underlying int64 data, so NaT comes through as the int64 minimum (xref #45034). The sparse version goes through `astype_nansafe`, which explicitly checks for NaTs when converting dt64 -> int64 and raises. I expected the sparse behavior to match the dense behavior.
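The difference between the two code paths can be illustrated with plain NumPy. This is a minimal sketch, not pandas' implementation: a view reinterprets the stored nanoseconds (NaT is stored as the smallest int64), while a NaT-checking cast refuses to convert. `checked_dt64_to_int64` is a hypothetical helper that only mimics the kind of check described for `astype_nansafe`; its name and body are assumptions.

```python
import numpy as np

# datetime64[ns] values with a missing first entry, analogous to `dense` above.
values = np.array(["NaT", "2016-01-02", "2016-01-03"], dtype="M8[ns]")

# View semantics: reinterpret the underlying int64 nanoseconds since the epoch.
# NaT is stored as the smallest int64, so it passes through silently.
viewed = values.view("i8")


def checked_dt64_to_int64(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: a dt64 -> int64 cast that rejects NaT."""
    if np.isnat(arr).any():
        raise ValueError("Cannot convert NaT values to integer")
    return arr.view("i8")


print(viewed)  # NaT shows up as the int64 minimum
try:
    checked_dt64_to_int64(values)
except ValueError as err:
    print(f"checked cast: {err}")
```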
When converting to `Sparse[int64]`, we only call `astype_nansafe` on the non-NaT elements, so it doesn't raise. However, the `fill_value` (NaT) somehow gets converted to 0, whereas I expected that conversion to raise.
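A rough sketch of why converting only the stored values never hits the NaT check, assuming the sparse array keeps NaT as its fill value and stores just positions 1 and 2 (matching `Indices: array([1, 2])` in the output above). This uses plain NumPy arrays whose names echo `SparseArray`'s `sp_values` and `fill_value` attributes; it is not pandas' actual conversion code, and `checked_dt64_to_int64` is the same kind of hypothetical NaT-checking helper. Applying that check to the fill value as well raises, which is the behavior expected here instead of a silent 0.

```python
import numpy as np


def checked_dt64_to_int64(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: a dt64 -> int64 cast that rejects NaT."""
    if np.isnat(arr).any():
        raise ValueError("Cannot convert NaT values to integer")
    return arr.view("i8")


# Assumed sparse layout: NaT is the fill value, only positions 1 and 2 are stored.
sp_values = np.array(["2016-01-02", "2016-01-03"], dtype="M8[ns]")
sp_indices = np.array([1, 2], dtype=np.int32)
fill_value = np.datetime64("NaT", "ns")

# Densifying recovers the original [NaT, 2016-01-02, 2016-01-03].
densified = np.full(3, fill_value)
densified[sp_indices] = sp_values

# Converting only the stored values never sees NaT, so this succeeds.
new_sp_values = checked_dt64_to_int64(sp_values)

# Running the same check on the fill value raises.
fill_raised = False
try:
    checked_dt64_to_int64(np.array([fill_value]))
except ValueError as err:
    fill_raised = True
    print(f"fill_value conversion: {err}")
```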
Side notes: `ser.astype(pd.SparseDtype(ser.dtype))` raises, as does `dense.astype("Sparse[int64]")`.