BUG: Partially incorrect results when using a custom indexer for a rolling window for max and min · Issue #46726 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd from pandas import api import numpy as np
class MultiWindowIndexer(api.indexers.BaseIndexer): def init(self, window): self.window = np.array(window) super().init()
def get_window_bounds(self, num_values, min_periods, center, closed):
end = np.arange(num_values, dtype='int64') + 1
start = np.clip(end - self.window, 0, num_values)
return start, end
np.random.seed([3,14]) a = np.random.randn(20).cumsum() w = np.minimum( np.random.randint(1, 4, size=a.shape), np.arange(len(a))+1 )
df = pd.DataFrame({'Data': a, 'Window': w})
df['max1'] = df.Data.rolling(MultiWindowIndexer(df.Window)).max(engine='cython')
print(df)
Issue Description
This method basically tries to use a rolling
operation where the window
is an arbitrary series of integers instead of an integer or an offset. It is related to question/feature request #46716 and it was originally authored as an answer for a StackOverflow question here. There the author of the method notes on the bug: "The cython implementation seems to remember the largest starting index encountered so far and 'clips' smaller starting indices to the stored value. More technically correct: only stores the range of the largest start and largest end indices encountered so far in a queue, discarding smaller start indices and making them unavailable."
Expected Behavior
The result printed for index 18, should be -1.487828 instead of -1.932612, because at that point the window is 3 and it looks for the max between -1.932612 and -2.539703 and -1.487828,
Installed Versions
commit : 4bfe3d0
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.2
numpy : 1.22.3
pytz : 2021.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 61.1.1
Cython : None
pytest : 7.1.1
hypothesis : None
sphinx : None
blosc : 1.10.6
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : None
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : None
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : 1.4.34
tables : 3.7.0
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None