BUG: rolling with method='table' and apply sorts columns alphabetically (original) (raw)

Pandas version checks

Reproducible Example

import pandas as pd import numpy as np

datetime_column = "datetime"

def sum_of_subtraction(values): col_1 = values[:, 0] col_2 = values[:, 1] return np.mean(col_1 - col_2)

cols_order_1 = ["a", "b"] cols_order_2 = ["b", "a"]

df = pd.DataFrame( { datetime_column: pd.date_range("2020-01-01", periods=6), "a": [1, 2, 3, 4, 5, 6], "b": [6, 7, 8, 5, 6, 7], } ) order_1_df = ( df .rolling('3D', on=datetime_column, method='table')[cols_order_1] .apply(sum_of_subtraction, engine="numba", raw=True) )

order_2_df = ( df .rolling('3D', on=datetime_column, method='table')[cols_order_2] .apply(sum_of_subtraction, engine="numba", raw=True) )

Issue Description

With the code above, the function sum_of_subtraction returns a different results if the columns of the input argument values are swapped (cf the example below).

x = np.array([[1, 2],[3, 4]]) y = x[:, [1, 0]] # columns are swapped print(sum_of_subtraction(x)) # returns -1.0 print(sum_of_subtraction(y)) # returns 1.0

Yet, using rolling with method='table' followed with apply returns same results, no matter or the columns ordering. Even though the columns were given with a specific ordering, when calling apply with engine='numba', the columns are sorted alphabetically. With the code above, here are the results:

order_1_df:

a b datetime
0 -5 -5 2020-01-01 00:00:00
1 -5 -5 2020-01-02 00:00:00
2 -5 -5 2020-01-03 00:00:00
3 -3.66667 -3.66667 2020-01-04 00:00:00
4 -2.33333 -2.33333 2020-01-05 00:00:00
5 -1 -1 2020-01-06 00:00:00

order_2_df:

a b datetime
0 -5 -5 2020-01-01 00:00:00
1 -5 -5 2020-01-02 00:00:00
2 -5 -5 2020-01-03 00:00:00
3 -3.66667 -3.66667 2020-01-04 00:00:00
4 -2.33333 -2.33333 2020-01-05 00:00:00
5 -1 -1 2020-01-06 00:00:00

Expected Behavior

The results when the columns are ordering as such ["b", "a"] must be like that (order_2_df dataframe):

b a datetime
0 5 5 2020-01-01 00:00:00
1 5 5 2020-01-02 00:00:00
2 5 5 2020-01-03 00:00:00
3 3.66667 3.66667 2020-01-04 00:00:00
4 2.33333 2.33333 2020-01-05 00:00:00
5 1 1 2020-01-06 00:00:00

Installed Versions

INSTALLED VERSIONS

commit : e4956ab
python : 3.10.14
python-bits : 64
OS : Linux
OS-release : 5.15.0-1066-azure
Version : # 75-Ubuntu SMP Thu May 30 14:29:45 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+1413.ge4956ab403
numpy : 2.0.2
dateutil : 2.9.0
pip : 24.2
Cython : None
sphinx : None
IPython : 8.26.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : 2024.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None