BUG: inconsistent state of DatetimeIndex._data · Issue #20810 · pandas-dev/pandas

Depending on how a DatetimeIndex is constructed, the underlying ._data attribute is either a DatetimeIndex or a datetime64 ndarray:

In [1]: idx1 = pd.DatetimeIndex(start="2012-01-01", periods=3, freq='D') # date_range kind of construction

In [2]: idx1._data
Out[2]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq=None)

In [3]: idx2 = pd.DatetimeIndex(idx1)

In [4]: idx2._data
Out[4]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

I think this should always be a numpy array? (It clearly doesn't hurt, but I don't see any reason for it to sometimes be a DatetimeIndex.)

This came out of fixing warnings in #20721, and is due to how _generate_regular_range and _simple_new on DatetimeIndex are implemented. From the code of _simple_new, I suspect that it assumes the input is always an ndarray and not DatetimeIndex, but in several places (like _generate_regular_range) an already constructed DatetimeIndex is passed to _simple_new.
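
To illustrate the suspected mechanism without relying on pandas internals, here is a minimal toy sketch. Everything in it is hypothetical: `ToyIndex`, `_simple_new_naive`, `_simple_new_unwrapping` and `generate_range` are made-up names, and a plain list stands in for the datetime64 ndarray. It is not the actual pandas code, only the pattern described above: a fast constructor that assumes raw values but receives an already constructed index, plus one possible way to normalize the input.

```python
class ToyIndex:
    """Toy stand-in for an index wrapping raw values (NOT pandas code)."""

    def __init__(self, values):
        self._data = list(values)

    @classmethod
    def _simple_new_naive(cls, values):
        # Assumes `values` is already raw data and stores it as-is,
        # so an index passed in ends up nested inside _data.
        obj = object.__new__(cls)
        obj._data = values
        return obj

    @classmethod
    def _simple_new_unwrapping(cls, values):
        # Unwrap an already constructed index first, so that _data
        # is always the raw container.
        if isinstance(values, cls):
            values = values._data
        obj = object.__new__(cls)
        obj._data = values
        return obj


def generate_range(start, periods, constructor):
    # Hypothetical range helper: builds an index first, then hands
    # that index (not its raw data) to the fast constructor.
    built = ToyIndex(range(start, start + periods))
    return constructor(built)


naive = generate_range(0, 3, ToyIndex._simple_new_naive)
fixed = generate_range(0, 3, ToyIndex._simple_new_unwrapping)

print(type(naive._data).__name__)  # ToyIndex
print(type(fixed._data).__name__)  # list
```

Whether the fix belongs in the fast constructor (unwrapping its input) or in the callers (passing raw values) is a design choice this sketch does not settle.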