resampling closed='left' incorrect ? · Issue #5440 · pandas-dev/pandas
Hi,
Coming back to issue #2665, I'm still not convinced that the closed argument of the resampling function does what it should.
When resampling daily values to monthly values with closed='left', the closed argument should apply at the actual time the month changes, but instead it seems to apply 24 hours earlier.
In [25]: import pandas as pd
...: import numpy as np
...: dates = pd.date_range('3/29/2000', '4/3/2000', freq='1D')
In [27]: ts1 = pd.Series(np.ones(len(dates)), index=dates)
In [28]: ts2 = ts1.resample('1M', closed='left', label='left').sum()
In [29]: ts3 = ts1.resample('1M').sum()
In [30]: ts1
Out[30]:
2000-03-29 1.0
2000-03-30 1.0
2000-03-31 1.0
2000-04-01 1.0
2000-04-02 1.0
2000-04-03 1.0
Freq: D, dtype: float64
# Resampling with closed='left', i.e. asking that the first sample of the month (midnight on April 1st) belongs to that month, does not give the result you would expect. pandas does not treat 2000-04-01 00:00 as the time the month changes; it treats 2000-03-31 00:00 as the boundary, which is one day too early.
In [31]: ts2
Out[31]:
2000-02-29 2.0
2000-03-31 4.0
Freq: M, dtype: float64
# Keeping it simple (default arguments) is a possible workaround, but it would be better to have the closed argument working correctly.
In [32]: ts3
Out[32]:
2000-03-31 3.0
2000-04-30 3.0
Freq: M, dtype: float64
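
One possible reading of the ts2 result, offered as a sketch rather than a statement about pandas internals: assume the bin edges for a month-end frequency are the month-end timestamps themselves (2000-02-29, 2000-03-31, 2000-04-30). The names edges, manual and by_calendar_month below are illustrative only. Counting left-closed intervals by hand over those assumed edges reproduces the 2.0 / 4.0 split, while the month-start alias 'MS' puts the edges on the first day of each month.

```python
import numpy as np
import pandas as pd

# Same daily series as in the report above.
dates = pd.date_range('3/29/2000', '4/3/2000', freq='1D')
ts1 = pd.Series(np.ones(len(dates)), index=dates)

# Assumption: month-end resampling uses these timestamps as bin edges.
edges = pd.to_datetime(['2000-02-29', '2000-03-31', '2000-04-30'])

# Sum each left-closed interval [lo, hi) by hand and label it with lo.
manual = pd.Series(
    [ts1[(ts1.index >= lo) & (ts1.index < hi)].sum()
     for lo, hi in zip(edges[:-1], edges[1:])],
    index=edges[:-1],
)
print(manual)

# Month-start bins: edges fall on 2000-03-01 and 2000-04-01,
# with left-closed, left-labelled intervals by default.
by_calendar_month = ts1.resample('MS').sum()
print(by_calendar_month)
```

Under that assumption, the hand count matches ts2, and 'MS' gives 3.0 for March and 3.0 for April, labelled 2000-03-01 and 2000-04-01, which is the calendar-month grouping the report expected from closed='left'.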