BUG: freq setter for DatetimeIndex/TimedeltaIndex/PeriodIndex is poorly supported · Issue #20678 · pandas-dev/pandas

1. Setting freq to a frequency string alias fails for all three datetimelike indexes (the error is the same for each):

In [1]: import pandas as pd

In [2]: dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'])

In [3]: dti
Out[3]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq=None)

In [4]: dti.freq = 'D'

In [5]: dti
Out[5]: ---------------------------------------------------------------------------
AttributeError: 'str' object has no attribute 'freqstr'

2. DatetimeIndex and TimedeltaIndex allow setting freq to a frequency that does not conform to the data:

In [6]: dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'])

In [7]: dti.freq = pd.offsets.Day(2)

In [8]: dti
Out[8]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='2D')

Note that trying to construct this from scratch fails, as expected:

In [9]: dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'], freq='2D')

ValueError: Inferred frequency D from passed dates does not conform to passed frequency 2D
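The constructor's error message suggests it compares the passed frequency against one inferred from the data. A rough sketch of that kind of comparison, using the public pd.infer_freq and to_offset functions, might look like the following. The strict equality test is a simplifying assumption of mine and is not necessarily how the constructor decides conformance.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'])

# Frequency alias inferred from the values themselves (None if nothing fits).
inferred = pd.infer_freq(dti)

# The frequency we would like to assign.
candidate = to_offset('2D')

# Simplified conformance check (assumption): require the inferred offset
# to be exactly equal to the candidate offset.
conforms = inferred is not None and to_offset(inferred) == candidate

print(inferred, candidate, conforms)
```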

3. PeriodIndex gives nonsensical output when setting with an offset:

In [10]: pi = pd.PeriodIndex(['2018Q1', '2018Q2', '2018Q3'], freq='Q')

In [11]: pi
Out[11]: PeriodIndex(['2018Q1', '2018Q2', '2018Q3'], dtype='period[Q-DEC]', freq='Q-DEC')

In [12]: pi.freq = pd.offsets.Day()

In [13]: pi
Out[13]: PeriodIndex(['1970-07-12', '1970-07-13', '1970-07-14'], dtype='period[Q-DEC]', freq='D')

In addition to the nonsensical dates, note the mismatch in Out[13] between dtype (still period[Q-DEC]) and freq (now 'D').

Expected Output

I'm not sure that setting freq for a PeriodIndex should be supported: it is ambiguous whether the converted values should align to the start or the end of each period, and the PeriodIndex.asfreq method already gives more control over this.
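To make the start/end ambiguity concrete, here is a small sketch using the existing PeriodIndex.asfreq method and its how parameter on the same quarterly index as above. The variable names are only for illustration.

```python
import pandas as pd

pi = pd.PeriodIndex(['2018Q1', '2018Q2', '2018Q3'], freq='Q')

# Align each quarter to its first day: 2018-01-01, 2018-04-01, 2018-07-01
at_start = pi.asfreq('D', how='start')

# Align each quarter to its last day: 2018-03-31, 2018-06-30, 2018-09-30
at_end = pi.asfreq('D', how='end')

print(at_start)
print(at_end)
```

The same quarterly data maps to two different daily indexes, which is why a bare freq assignment would have to pick one of these silently.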

If setting freq is disallowed for PeriodIndex, should setting also be disallowed for DatetimeIndex and TimedeltaIndex in order to remain consistent? If so, should there then be an analogous asfreq for these two?

My thoughts on the three issues:

  1. String aliases should be coerced as appropriate (be it via setting or an asfreq method).
  2. This should raise with a message similar to the constructor's (be it via setting or an asfreq method).
  3. This should raise, potentially with a message indicating to use asfreq. Or, if we choose to allow setting, the result should be consistent with the default asfreq options.
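As a rough illustration of points 1 and 2 for DatetimeIndex, the sketch below coerces a string alias with to_offset and then checks that the values actually follow the requested frequency before attaching it. The helper name with_validated_freq and its error message are hypothetical and not part of pandas; this is one possible shape for the behavior, not a proposed implementation.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def with_validated_freq(index, freq):
    """Hypothetical helper: return `index` regenerated with `freq`, or raise."""
    # Point 1: coerce string aliases; offset objects pass through unchanged.
    offset = to_offset(freq)
    if len(index) == 0:
        return index
    # Point 2: regenerate the index from its first value at the requested
    # frequency and require every value to match the original.
    expected = pd.date_range(start=index[0], periods=len(index), freq=offset)
    if not (index == expected).all():
        raise ValueError(
            f"Values do not conform to passed frequency {offset.freqstr}"
        )
    return expected


dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'])

daily = with_validated_freq(dti, 'D')  # string alias is accepted
print(daily.freq)

try:
    with_validated_freq(dti, pd.offsets.Day(2))  # does not match the data
except ValueError as err:
    print(err)
```

Returning a new index rather than mutating in place sidesteps the question of whether the setter itself should exist, which is closer in spirit to an asfreq-style method.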

Output of pd.show_versions()

INSTALLED VERSIONS

commit: fa231e8
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+740.gfa231e8
pytest: 3.1.2
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None