rolling_max on a TimeSeries with freq='D' returns incorrect results · Issue #6297 · pandas-dev/pandas
rolling_max on a TimeSeries with freq='D' appears to compute the rolling mean rather than the rolling max.
In [118]: import datetime, pandas  # datetime is needed to build the index below
In [119]: indices = [datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)]
In [120]: indices.append(datetime.datetime(1975, 1, 3, 6, 0)) # So that we can have 2 datapoints on one of the days
In [121]: series = pandas.Series(range(1, 7), index=indices)
In [122]: series = series.map(lambda x: float(x)) # Use floats instead of ints as values
In [123]: series = series.sort_index() # Sort chronologically
In [124]: expected_result = pandas.Series([1.0, 2.0, 6.0, 4.0, 5.0], index=[datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)])
In [125]: actual_result = pandas.rolling_max(series, window=1, freq='D')
In [126]: assert((actual_result==expected_result).all())
AssertionError                            Traceback (most recent call last)
in ()
----> 1 assert((actual_result==expected_result).all())
AssertionError:
In [127]: expected_result
Out[127]:
1975-01-01 12:00:00    1
1975-01-02 12:00:00    2
1975-01-03 12:00:00    6
1975-01-04 12:00:00    4
1975-01-05 12:00:00    5
dtype: float64
In [128]: actual_result
Out[128]:
1975-01-01    1.0
1975-01-02    2.0
1975-01-03    4.5
1975-01-04    4.0
1975-01-05    5.0
Freq: D, dtype: float64
With a window of size 2 days, it still looks like the rolling mean:
In [130]: pandas.rolling_max(series, window=2, freq='D')
Out[130]:
1975-01-01    NaN
1975-01-02    2.0
1975-01-03    4.5
1975-01-04    4.5
1975-01-05    5.0
Freq: D, dtype: float64
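The numbers in both outputs are consistent with the series first being downsampled to daily frequency by taking the mean of each day, with the rolling max then applied to those daily means. That is a hypothesis, not something confirmed in this report. The sketch below checks it using the newer `Series.resample(...)` and `Series.rolling(...)` methods (an assumption: a pandas version where these exist) instead of `pandas.rolling_max`; the variable names are illustrative only.

```python
import datetime

import pandas as pd

# Rebuild the series from the report: one value per day at noon,
# plus a second value (6.0) at 06:00 on 1975-01-03.
indices = [datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)]
indices.append(datetime.datetime(1975, 1, 3, 6, 0))
series = pd.Series([float(v) for v in range(1, 7)], index=indices).sort_index()

# Two ways to collapse each day to a single value before the rolling step.
daily_mean = series.resample("D").mean()  # 1975-01-03 becomes (6.0 + 3.0) / 2 = 4.5
daily_max = series.resample("D").max()    # 1975-01-03 becomes 6.0

# Hypothesised behaviour: rolling max over the daily means.
mean_then_max_w1 = daily_mean.rolling(window=1).max()
mean_then_max_w2 = daily_mean.rolling(window=2).max()

# Behaviour the report expects: rolling max over the daily maxima.
max_then_max_w1 = daily_max.rolling(window=1).max()
max_then_max_w2 = daily_max.rolling(window=2).max()

print(mean_then_max_w1)
print(mean_then_max_w2)
print(max_then_max_w1)
print(max_then_max_w2)
```

Under this assumption, `mean_then_max_w1` and `mean_then_max_w2` reproduce the values in Out[128] and Out[130], while `max_then_max_w1` gives 6.0 for 1975-01-03, as in `expected_result`.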