Resampling uses inconsistent labeling for sub-daily and super-daily frequencies · Issue #9586 · pandas-dev/pandas

xref #2665
xref #5440

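`resample` appears to use an inconsistent label convention depending on whether the target frequency is sub-daily/daily or super-daily. For the sub-daily, daily and 10-day targets in the test script below, the default label is the left edge of each bin; for the monthly, quarterly and annual targets, it is the right edge (the period end).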

I found this behavior surprising and confusing. Is it intentional? If not, I would like to make it consistent, because as it stands it strikes me as poor design. The behavior also interacts confusingly with the `closed` argument (see the linked issues).
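A minimal sketch of what the two defaults amount to, assuming pandas 0.18 or later (where `resample` returns an object with a `.first()` method). Offset objects such as `pd.offsets.Hour` and `pd.offsets.MonthEnd` are used instead of string aliases so the snippet does not depend on alias renames in newer pandas versions. The point it illustrates: for an hourly target the default result is the same as passing `closed='left', label='left'`, while for a month-end target it is the same as passing `closed='right', label='right'`.

```python
import numpy as np
import pandas as pd

# Sub-daily case: 20-minute data resampled to hourly bins.
sub = pd.Series(np.arange(10),
                pd.date_range("2000-01-01", freq=pd.offsets.Minute(20), periods=10))
hourly = pd.offsets.Hour(1)

# Super-daily case: 10-day data resampled to month-end bins.
sup = pd.Series(np.arange(10),
                pd.date_range("2000-01-01", freq=pd.offsets.Day(10), periods=10))
month_end = pd.offsets.MonthEnd(1)

# For the hourly target, the default matches left-closed, left-labeled bins.
hourly_default = sub.resample(hourly).first()
hourly_left = sub.resample(hourly, closed="left", label="left").first()

# For the month-end target, the default matches right-closed, right-labeled bins.
monthly_default = sup.resample(month_end).first()
monthly_right = sup.resample(month_end, closed="right", label="right").first()

print(hourly_default.equals(hourly_left))     # True
print(monthly_default.equals(monthly_right))  # True
```

So the same call with no `closed`/`label` arguments silently switches both conventions depending on the target frequency, which is why the two arguments end up coupled.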

From my perspective (as someone who works with monthly and yearly data), the sub-daily/daily behavior makes sense and the super-daily behavior is a bug: there is no good reason to shift labels by one day for super-daily frequencies. With `label='left'`, for example, monthly bins for data starting on 2000-01-01 are labeled 1999-12-31 rather than 2000-01-01.
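A small sketch of where the one-day shift comes from, assuming pandas 0.18 or later. With a month-end target the first bin is right-closed, (1999-12-31, 2000-01-31], so its left edge is the previous month end rather than the first day of the month. For comparison only (this is not something proposed in the issue), a month-start anchored offset, `pd.offsets.MonthBegin`, defaults to left-closed, left-labeled bins and labels the first bin 2000-01-01.

```python
import numpy as np
import pandas as pd

# 10-day observations starting on 2000-01-01, as in the "10D -> 1M" case.
s = pd.Series(np.arange(10),
              pd.date_range("2000-01-01", freq=pd.offsets.Day(10), periods=10))

# Month-end bins are right-closed by default, so label="left" picks the
# previous month end: one day before the first observation.
left_label = s.resample(pd.offsets.MonthEnd(1), label="left").first().index[0]
previous_month_end = pd.Timestamp("2000-01-01") - pd.offsets.MonthEnd(1)
print(left_label, previous_month_end)  # 1999-12-31 00:00:00 1999-12-31 00:00:00

# Month-start bins default to closed="left", label="left", so the first bin
# is [2000-01-01, 2000-02-01) and is labeled by its start.
start_label = s.resample(pd.offsets.MonthBegin(1)).first().index[0]
print(start_label)  # 2000-01-01 00:00:00
```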

CC @Cd48 @kdebrab


Here's my test script:

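```python
import numpy as np
import pandas as pd

# Each pair is (frequency of the original data, target resampling frequency).
# Newer pandas versions spell some of these aliases 'h', 'ME', 'QE' and 'YE'.
for orig_freq, target_freq in [('20s', '1min'), ('20min', '1H'), ('10H', '1D'),
                               ('3D', '10D'), ('10D', '1M'), ('1M', 'Q'), ('3M', 'A')]:
    print('%s -> %s:' % (orig_freq, target_freq))
    ind = pd.date_range('2000-01-01', freq=orig_freq, periods=10)
    s = pd.Series(np.arange(10), ind)
    # Label of the first bin under the default, left and right conventions.
    print('default', s.resample(target_freq).first().index[0])
    print('left', s.resample(target_freq, label='left').first().index[0])
    print('right', s.resample(target_freq, label='right').first().index[0])
```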

```
20s -> 1min:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 00:01:00
20min -> 1H:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 01:00:00
10H -> 1D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-02 00:00:00
3D -> 10D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-11 00:00:00
10D -> 1M:
default 2000-01-31 00:00:00
left 1999-12-31 00:00:00
right 2000-01-31 00:00:00
1M -> Q:
default 2000-03-31 00:00:00
left 1999-12-31 00:00:00
right 2000-03-31 00:00:00
3M -> A:
default 2000-12-31 00:00:00
left 1999-12-31 00:00:00
right 2000-12-31 00:00:00
```