Resampling uses inconsistent labeling for sub-daily and super-daily frequencies · Issue #9586 · pandas-dev/pandas

xref #2665
xref #5440

Resample appears to use an inconsistent labeling convention depending on whether the target frequency is sub-daily/daily or super-daily:

For sub-daily/daily frequencies, label='left' labels each bin with the timestamp at its start, and label='right' labels it with that timestamp plus the frequency (i.e., the labels fall on the timestamps that exactly divide adjacent bins).
For super-daily frequencies, both labels appear to be shifted one day to the left, so the timestamps no longer fall on the boundaries between periods. Moreover, the default label switches from 'left' to 'right'! My guess is that the default was changed here because users were confused by label='left' no longer falling inside the expected interval. (I could probably check git blame for the details.)
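To isolate the two conventions described above, here is a minimal sketch that resamples ten days of daily data to a daily and to a month-end frequency and records the label of the first bin. It assumes pandas 1.0 or later, where aggregation is a method call such as `.first()` rather than the `how=` keyword, and passes the month-end rule as `pd.offsets.MonthEnd()` because its string alias changed from `'M'` to `'ME'` in pandas 2.2. The helper `first_label` is hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Ten daily observations, 2000-01-01 through 2000-01-10.
s = pd.Series(np.arange(10), index=pd.date_range("2000-01-01", periods=10, freq="D"))


def first_label(rule, **kwargs):
    """Hypothetical helper: label pandas assigns to the first resampled bin."""
    return s.resample(rule, **kwargs).first().index[0]


# Daily target: the first bin is [2000-01-01, 2000-01-02).
daily_left = first_label("1D", label="left")    # bin start
daily_right = first_label("1D", label="right")  # bin start + 1 day

# Month-end target: the first bin is (1999-12-31, 2000-01-31].
month_end = pd.offsets.MonthEnd()
monthly_default = first_label(month_end)             # month-end rules default to label='right'
monthly_left = first_label(month_end, label="left")  # one day before 2000-01-01
monthly_right = first_label(month_end, label="right")

print("1D left/right:", daily_left, daily_right)
print("month-end default/left/right:", monthly_default, monthly_left, monthly_right)
```

The daily labels sit on bin boundaries, while the month-end label='left' lands on 1999-12-31, one day before the start of the month that the bin actually covers.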

I found this behavior quite surprising and confusing. Is it intentional? I would like to rationalize this if possible, because it strikes me as very poor design. The behavior also interacts oddly with the closed argument (see the linked issues).
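One way the label convention interacts with closed can be seen in a small sketch (again assuming pandas 1.0 or later and passing `pd.offsets.MonthEnd()` for the month-end rule). An observation stamped exactly on a month end changes bins when closed changes, and with closed='left', label='left' the resulting bin is labelled with January's last day while also holding February data. This only illustrates the interaction; it is not a summary of the linked issues.

```python
import pandas as pd

# One observation stamped exactly on a month end, one in mid-February.
s = pd.Series([1, 2], index=pd.to_datetime(["2000-01-31", "2000-02-15"]))
month_end = pd.offsets.MonthEnd()

# Month-end defaults (closed='right', label='right'):
# bins (1999-12-31, 2000-01-31] and (2000-01-31, 2000-02-29].
default = s.resample(month_end).sum()

# closed='left', label='left': the 2000-01-31 point now opens the bin
# [2000-01-31, 2000-02-29), which is labelled 2000-01-31 even though
# it also contains the February observation.
left = s.resample(month_end, closed="left", label="left").sum()

print(default)
print(left)
```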

From my perspective (as someone who uses monthly and yearly data), the sub-daily/daily behavior makes sense and the super-daily behavior is a bug: there's no particular reason to use a one-day offset for frequencies with super-daily resolution.
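For comparison, pandas' month-start rule `'MS'` already labels bins the way the sub-daily/daily rules do: left-closed bins, label='left' by default, and labels on the period boundaries. The sketch below reuses the 10D -> 1M setup from the test script; it is not a proposal for how the month-end rules should be changed, only a demonstration that the clean-division labels are attainable for a super-daily frequency.

```python
import numpy as np
import pandas as pd

# Same input as the 10D -> 1M case in the test script.
s = pd.Series(np.arange(10), index=pd.date_range("2000-01-01", freq="10D", periods=10))

# Month-start rule: bins [2000-01-01, 2000-02-01), [2000-02-01, 2000-03-01), ...
ms_default = s.resample("MS").first().index[0]               # defaults to label='left'
ms_left = s.resample("MS", label="left").first().index[0]    # bin start
ms_right = s.resample("MS", label="right").first().index[0]  # next bin's start

print("MS default/left/right:", ms_default, ms_left, ms_right)
```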

CC @Cd48 @kdebrab

Here's my test script:

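```python
import numpy as np
import pandas as pd

# pandas >= 2.2 deprecates the 'H', 'M', 'Q' and 'A' aliases in favour of
# 'h', 'ME', 'QE' and 'YE'; the original aliases are kept here.
pairs = [('20s', '1min'), ('20min', '1H'), ('10H', '1D'), ('3D', '10D'),
         ('10D', '1M'), ('1M', 'Q'), ('3M', 'A')]
for orig_freq, target_freq in pairs:
    print('%s -> %s:' % (orig_freq, target_freq))
    ind = pd.date_range('2000-01-01', freq=orig_freq, periods=10)
    s = pd.Series(np.arange(10), ind)
    # Label of the first bin under the default, left and right conventions
    # (the original used the since-removed how='first' keyword).
    print('default', s.resample(target_freq).first().index[0])
    print('left', s.resample(target_freq, label='left').first().index[0])
    print('right', s.resample(target_freq, label='right').first().index[0])
```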

```
20s -> 1min:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 00:01:00
20min -> 1H:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 01:00:00
10H -> 1D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-02 00:00:00
3D -> 10D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-11 00:00:00
10D -> 1M:
default 2000-01-31 00:00:00
left 1999-12-31 00:00:00
right 2000-01-31 00:00:00
1M -> Q:
default 2000-03-31 00:00:00
left 1999-12-31 00:00:00
right 2000-03-31 00:00:00
3M -> A:
default 2000-12-31 00:00:00
left 1999-12-31 00:00:00
right 2000-12-31 00:00:00
```
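The one-day shift in the super-daily rows follows from where the month-end bins are anchored. A plain-Python sketch for the monthly case, using only the standard library, computes the edges of the right-closed month-end bin containing a date, next to the left-closed month-start bin that a daily-style convention would use. The helper names `month_end`, `month_end_bin` and `month_start_bin` are hypothetical and are not pandas functions.

```python
import calendar
from datetime import date, timedelta
from typing import Tuple


def month_end(d: date) -> date:
    """Last calendar day of the month containing d."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def month_end_bin(d: date) -> Tuple[date, date]:
    """Edges (left, right] of the right-closed month-end bin containing d."""
    left = d.replace(day=1) - timedelta(days=1)  # last day of the previous month
    return left, month_end(d)


def month_start_bin(d: date) -> Tuple[date, date]:
    """Edges [left, right) of the left-closed month-start bin containing d."""
    return d.replace(day=1), month_end(d) + timedelta(days=1)


d = date(2000, 1, 1)
print("month-end bin:  ", *month_end_bin(d))
print("month-start bin:", *month_start_bin(d))
```

The left edge of the month-end bin is 1999-12-31, matching the label='left' rows above, and it sits exactly one day before the month-start edge 2000-01-01.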