Cannot aggregate by mean when using PeriodIndex and high-frequency series does cross between bins · Issue #2070 · pandas-dev/pandas
When a Series indexed by a PeriodIndex is downsampled with how='mean', resampling fails if the series does not span more than one lower-frequency bin.
For example:
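import numpy as np
from pandas import Series, period_range
# Twelve monthly periods, all inside the single year 2012
ix = period_range(start="2012-01-01", end="2012-12-31", freq="M")
s = Series(np.random.randn(len(ix)), index=ix)
s.resample("A", how='mean')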
This fails because the period range is entirely contained within a single year. I've been able to replicate it going from quarterly to annual, from monthly to quarterly, and so on. As of 0.9.1-dev, it crashes Python without raising an exception, because the Cython function group_mean_bin() attempts to index into an empty bins array:
if bins[len(bins) - 1] == len(values): # Crash
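To illustrate the failing condition outside of Cython, here is a minimal pure-Python sketch. The helper name `last_bin_ends_at_series_end` is hypothetical and is not part of pandas. It only shows that the check above has no valid element to read when bins is empty, and what a guard might look like. NumPy raises an IndexError in that case; compiled code that skips bounds checking would presumably read invalid memory instead, which would match the hard crash.

```python
import numpy as np


def last_bin_ends_at_series_end(bins, values):
    """Hypothetical helper (not pandas code) mirroring the check above.

    Returns True when the last bin edge equals len(values). An empty
    bins array is handled explicitly instead of being indexed.
    """
    if len(bins) == 0:
        # Nothing to index: the whole series falls into a single group.
        return False
    return bool(bins[len(bins) - 1] == len(values))


values = np.arange(12, dtype=float)
empty_bins = np.array([], dtype=np.int64)

# Unguarded indexing of an empty array fails in NumPy.
try:
    empty_bins[len(empty_bins) - 1]
    unguarded_failed = False
except IndexError:
    unguarded_failed = True

# The guarded version returns a result instead of failing.
guarded_result = last_bin_ends_at_series_end(empty_bins, values)
```

How pandas should treat the empty case (for example, as a single group covering the whole series) is a design decision for the maintainers; the guard here only marks where the check would need to go.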
I don't know how fine-grained pandas currently is when aggregating partially filled periods, but it would be nice to have an option to return NaN when a lower-frequency period is only partially covered by the higher-frequency data. For example, suppose we sum daily data to monthly and take the percent change across months, and either the recording started partway through the first month or data is only available partway through the last month. The percent change for the first or last period may then show a dramatic swing, and the user may not realize that it's simply an artifact of data availability rather than a truly interesting move in the underlying process. When running a lot of automated aggregations, the user may prefer not to aggregate any partially filled periods, to avoid reaching a false conclusion about the time-series trend at the beginning or end of the series.
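The behavior requested above can be approximated by hand. This is a rough sketch, not an existing pandas option: it groups daily periods by their containing month with groupby rather than resample, counts the observations per month, and masks any month whose count is below the number of calendar days. The variable names and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative daily data: starts partway through January 2012 and
# ends partway through March 2012, so only February is complete.
idx = pd.period_range(start="2012-01-20", end="2012-03-10", freq="D")
daily = pd.Series(1.0, index=idx)

# Map each daily period to the month that contains it, then aggregate.
months = daily.index.asfreq("M")
grouped = daily.groupby(months)
monthly_sum = grouped.sum()
observed_days = grouped.count().to_numpy()

# Number of calendar days in each month of the result.
result_months = pd.PeriodIndex(monthly_sum.index, freq="M")
expected_days = np.asarray(result_months.days_in_month)

# Keep the sum only where every day of the month was observed.
complete_only = monthly_sum.where(observed_days == expected_days)
```

With this data, January and March come out as NaN and February keeps its sum, so a percent change computed on `complete_only` cannot pick up a swing caused purely by the partial first or last month.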