BUG: Incorrect behavior of window aggregation functions on disjoint windows skipping overflowing elements · Issue #45647 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
from pandas.core.indexers.objects import BaseIndexer

# Largest finite float64; adding two of these overflows to inf.
f64max = 1.7976931348623158E+308
values = np.array([1, f64max, f64max, 3], dtype=np.float64)

# Windows [0, 1), [0, 1), [3, 4), [3, 4): elements 1 and 2 are never inside a window.
start = np.array([0, 0, 3, 3], dtype=np.int64)
end = start + 1

class WinInd(BaseIndexer):
    def get_window_bounds(self, num_values: 'int' = 0, min_periods: 'int | None' = None, center: 'bool | None' = None, closed: 'str | None' = None) -> 'tuple[np.ndarray, np.ndarray]':
        return start, end

pd.Series(values).rolling(WinInd()).sum()
pd.Series(values).rolling(WinInd()).mean()
Issue Description
The example outputs
0 1.0
1 1.0
2 NaN
3 NaN
dtype: float64
for both sum and mean. The correct output in the last two rows is 3.0, the single value in the window [3, 4), instead of NaN. This shows that at least these rolling window aggregations produce incorrect output on disjoint windows that skip elements whose sum overflows. The incorrect behavior originates in the window aggregation functions in pandas._libs.window.aggregations, which process elements that lie outside the disjoint windows rather than skipping over them; many of these functions have this problem. The following code shows this for sum and mean:
import numpy as np
import pandas as pd
import pandas._libs.window.aggregations as wa

values = np.array([1, np.inf, 3], dtype=np.float64)
start = np.array([0, 2], dtype=np.int64)
end = start + 1

wa.roll_sum(values, start, end, 0)
# array([ 1., nan, nan])
wa.roll_mean(values, start, end, 0)
# array([ 1., nan, nan])
Because these window aggregation functions are not exposed to the user and are wrapped by defensive code within Series.rolling that handles np.inf values, the reproducible example at the top of this report is more involved: it induces an overflow to expose the behavior.
Disjoint windows arise, for example, in GH-15354, whose use case is a step size larger than the window size. Fixing the current issue is a prerequisite for handling GH-15354.
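As a rough illustration of why a step larger than the window size yields disjoint windows, the sketch below builds window bounds from a window size and a step. The helper `stepped_bounds` and its semantics (windows anchored at their left edge, clipped at the end of the data) are assumptions made for this example, not the design proposed in GH-15354.

```python
def stepped_bounds(num_values, window, step):
    """Hypothetical helper: left-anchored windows of length `window`, one every `step` elements."""
    start = list(range(0, num_values, step))
    end = [min(s + window, num_values) for s in start]
    return start, end

start, end = stepped_bounds(num_values=10, window=2, step=3)

# Indices that fall in the gap between one window and the next.
skipped = [j for i in range(len(start) - 1) for j in range(end[i], start[i + 1])]
print(start, end, skipped)
```

With `window=2` and `step=3`, every third element lies between two consecutive windows, so an aggregation must not let those elements influence any result.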
Expected Behavior
The expected output for both sum and mean is:
0 1.0
1 1.0
2 3.0
3 3.0
dtype: float64
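As a cross-check, the per-window results can be computed directly from the bounds in the reproducible example, without any running total. This is a plain-Python reference computation written for this report, not pandas code; each window is sliced and aggregated on its own, so elements outside the windows are never read.

```python
f64max = 1.7976931348623158E+308
values = [1.0, f64max, f64max, 3.0]
start = [0, 0, 3, 3]
end = [s + 1 for s in start]

# Aggregate each window independently from its own slice.
expected_sum = [sum(values[s:e]) for s, e in zip(start, end)]
expected_mean = [sum(values[s:e]) / (e - s) for s, e in zip(start, end)]

print(expected_sum)
print(expected_mean)
```

Each window here contains exactly one element, so sum and mean coincide and no overflow can occur.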
Installed Versions
This issue is confirmed on the main branch.