BUG: Using df.groupby(..).rolling(..) on non-monotonic timestamp column does not raise an exception · Issue #43909 · pandas-dev/pandas

Reproducible Example

import pandas as pd

shuffled = [3, 0, 1, 2]
sec = 1_000_000_000
df = pd.DataFrame([{'t': pd.Timestamp(2 * x * sec), 'x': x + 1, 'c': 42} for x in shuffled])

                    t  x   c
0 1970-01-01 00:00:06  4  42
1 1970-01-01 00:00:00  1  42
2 1970-01-01 00:00:02  2  42
3 1970-01-01 00:00:04  3  42

The following line raises an exception (as expected):

df.rolling(on='t', window='3s').x.sum()

The following code should raise the same exception, but instead it returns meaningless results:

df.groupby('c').rolling(on='t', window='3s').x.sum()

c   t
42  1970-01-01 00:00:06    4.0
    1970-01-01 00:00:00    1.0
    1970-01-01 00:00:02    3.0
    1970-01-01 00:00:04    6.0
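For comparison, here is a minimal sketch of what the same computation returns once the frame is sorted by t. It assumes the default right-closed window, so each 3-second window covers (t - 3s, t]. The names sorted_df and expected are only illustrative and are not part of the original report.

```python
import pandas as pd

shuffled = [3, 0, 1, 2]
sec = 1_000_000_000
df = pd.DataFrame(
    [{'t': pd.Timestamp(2 * x * sec), 'x': x + 1, 'c': 42} for x in shuffled]
)

# Sort by the time column so that 't' is monotonic within every group.
sorted_df = df.sort_values('t')

# Same groupby + time-based rolling sum as in the report, on sorted input.
expected = sorted_df.groupby('c').rolling(on='t', window='3s')['x'].sum()
print(expected.tolist())  # [1.0, 3.0, 5.0, 7.0]
```

On the unsorted frame, the reported output has 6.0 at 00:00:04 and 4.0 at 00:00:06, whereas the sorted input gives 5.0 and 7.0 at those timestamps.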

Issue Description

When df.groupby(..).rolling(on='...', ...) is called with a timedelta window on a Timestamp column that is not sorted, pandas silently returns meaningless results instead of raising an exception.

Expected Behavior

An exception should be raised, as is already the case when df.rolling(on=...) is called (without the groupby) on a non-monotonic column.
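Until such a check exists in pandas itself, the expected behavior can be approximated in user code. The sketch below is a hypothetical helper (check_monotonic_within_groups is not a pandas function) that raises a ValueError when the time column is not monotonic inside any group. Accepting either increasing or decreasing order is an assumption of this sketch.

```python
import pandas as pd


def check_monotonic_within_groups(df, by, on):
    """Hypothetical guard: raise if column `on` is not monotonic in some group."""
    bad = []
    for key, group in df.groupby(by):
        col = group[on]
        # Assumption: either sort direction counts as monotonic.
        if not (col.is_monotonic_increasing or col.is_monotonic_decreasing):
            bad.append(key)
    if bad:
        raise ValueError(f"{on} must be monotonic within each group; offending groups: {bad}")


shuffled = [3, 0, 1, 2]
sec = 1_000_000_000
df = pd.DataFrame(
    [{'t': pd.Timestamp(2 * x * sec), 'x': x + 1, 'c': 42} for x in shuffled]
)

try:
    # Run the guard before the groupby + rolling call from the report.
    check_monotonic_within_groups(df, by='c', on='t')
    df.groupby('c').rolling(on='t', window='3s')['x'].sum()
except ValueError as err:
    print(err)
```

With the shuffled frame from the example, the guard raises before the rolling sum is computed. After df.sort_values('t'), it passes silently.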

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 73c68257545b5f8530b7044f56647bd2db92e2ba
python           : 3.8.10.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.11.0-37-generic
Version          : #41~20.04.2-Ubuntu SMP Fri Sep 24 09:06:38 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.3
numpy            : 1.21.2
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.2.3
setuptools       : 45.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.0
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : 4.8.2
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : None
numba            : None