BUG/REF: TimedeltaIndex.new by jbrockmendel · Pull Request #23539 · pandas-dev/pandas (original) (raw)
If we disable the branch that calls to_timedelta, the only other changes we need to make everything work (or specifically, pass existing tests) is to a) convert generators to lists before passing into data = np.array(data, copy=False) and b) catch string-dtypes in the same block that catches np.object_ just below that. Which is to say: calling to_timedelta at all is highly redundant.
If you are going to look at it, I would recommend trying to make the checks you mention here redundant, by letting to_timedelta
handle it. Eg strings is something that to_timedelta
is made for to handle.
I think the object dtype handling could be removed, see below (but certainly didn't check all cases, so this might be too optimistic)
Dived into the (current master) code for a moment, and you have this:
# convert if not already |
---|
if getattr(data, 'dtype', None) != _TD_DTYPE: |
data = to_timedelta(data, unit=unit, box=False) |
elif copy: |
data = np.array(data, copy=True) |
data = np.array(data, copy=False) |
if data.dtype == np.object_: |
data = array_to_timedelta64(data) |
Here, the conversion of everything that is not yet a timedelta64 array is first handled with to_timedelta
. But at the end of the snippet, we still call array_to_timedelta64(data)
(what is used inside to_timedelta
) if the data is object dtype. But didn't we already just convert everything not m8 with to_timedelta
?
So this happens if data
is a TimedeltaArrayMixin (or at least one of the cases): TimedeltaArrayMixin still has a m8[ns] numpy dtype, so the getattr(data, 'dtype', None) != _TD_DTYPE
is passed; so we then convert it with np.array(data)
to an array of Timedelta objects, to then convert it back to m8 with array_to_timedelta64
.
So this is of course just a redundant back and forth conversion, that can be avoided with a check if data
is a TimedeltaArrayMixin, and in that case get the underlying numpy array of it.
(similar to what DatetimeIndex also does, it has a check if isinstance(data, DatetimeArrayMixin)
)