Incorrect resampling from hourly to monthly values · Issue #2665 · pandas-dev/pandas (original) (raw)
EDIT: Modern reproducible example:
In [1]: import pandas as pd
In [2]: dates = pd.date_range('3/20/2000', '5/20/2000', freq='1h')
In [4]: ts = pd.Series(range(len(dates)), index=dates)
In [5]: def start(x):
...: return x.index[0]
...:
...: def end(x):
...: return x.index[-1]
...:
In [6]: ts.resample('1M', closed='right', label='right').apply(start)
Out[6]:
2000-03-31 2000-03-20
2000-04-30 2000-04-01
2000-05-31 2000-05-01
Freq: M, dtype: datetime64[ns]
In [7]: ts.resample('1M', closed='right', label='right').apply(end)
Out[7]:
2000-03-31 2000-03-31 23:00:00
2000-04-30 2000-04-30 23:00:00
2000-05-31 2000-05-20 00:00:00
Freq: M, dtype: datetime64[ns]
In [8]: ts.resample('1M', closed='left', label='right').apply(start)
Out[8]:
2000-03-31 2000-03-20
2000-04-30 2000-03-31
2000-05-31 2000-04-30
Freq: M, dtype: datetime64[ns]
In [9]: ts.resample('1M', closed='left', label='right').apply(end)
Out[9]:
2000-03-31 2000-03-30 23:00:00
2000-04-30 2000-04-29 23:00:00
2000-05-31 2000-05-20 00:00:00
Freq: M, dtype: datetime64[ns]
Expected behavior: #2665 (comment)
There seems to be something wrong when resampling hourly values to monthly values. The 'closed=' argument does not do what it should. I am using pandas 0.10.0.
import pandas as pd
from pandas import TimeSeries
import numpy as np
# Timeseries with hourly values
dates = pd.date_range('3/20/2000', '5/20/2000', freq='1h')
ts = TimeSeries(np.arange(len(dates)), index=dates)
# I want monthly values equal to the summed values of the hours in that month
# With "closed=left" I want the hourly value (1 value) at midnight to be
# included in the month after that very midnight
data_monthly_1 = ts.resample('1M', how='sum', closed='left', label='right')
sum_april1 = data_monthly_1.ix[1]
# If we check this, the summed values per month do not match :
data_hourly_april2 = ts[ts.index.month == 4]
sum_april2 = sum(data_hourly_april2)
# Instead, "closed=left" above resulted in an inclusion of the entire last day of the
# previous month (25 values !) (and exclusion of the last day of the month)
# This is also strange as I am nowhere concerned with using 'days'.
data_hourly_april3 = ts[ ((ts.index.month == 3) & (ts.index.day == 31)) |
((ts.index.month == 4) & (ts.index.day < 30)) ]
sum_april3 = sum(data_hourly_april3)
# PS: "closed=right" results in the answer I would expect for "closed = left":
data_monthly_4 = ts.resample('1M', how='sum', closed='right', label='right')
sum_april4 = data_monthly_4.ix[1]
When resampling these hourly to daily values, there was no problem.