BUG: Issues with groupby ewm and times · Issue #40951 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
This refers to the code currently on master, commit 84d9c5e (2021-04-14). The issues also exist in the latest released version of pandas, although the incorrect results there are different.
```python
import pandas as pd

halflife = "23 days"
# Rows of groups "a" and "b" are interleaved; within each group the times increase.
baseline_df = pd.DataFrame(
    {
        "A": ["a", "b", "a", "b", "a", "b"],
        "B": [0, 0, 1, 1, 2, 2],
        "C": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-10", "2020-01-02", "2020-01-23", "2020-01-03"]
        ),
    }
)

# Default (Cython) engine
cython_result = baseline_df.groupby("A").ewm(halflife=halflife, times="C").mean()
print("cython")
print(cython_result)

# Numba engine (requires numba to be installed)
print("numba")
numba_result = baseline_df.groupby("A").ewm(halflife=halflife, times="C").mean(engine="numba")
print(numba_result)

# Reference: ungrouped ewm on each group's rows, with that group's own times
expected_result_a = pd.DataFrame([0, 1, 2]).ewm(
    halflife=halflife, times=pd.to_datetime(["2020-01-01", "2020-01-10", "2020-01-23"])
).mean()
expected_result_b = pd.DataFrame([0, 1, 2]).ewm(
    halflife=halflife, times=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
).mean()
print("expected")
print(" group a")
print(expected_result_a)
print(" group b")
print(expected_result_b)
```
Output:
```
cython
            B
A
a 0  0.000000
  2  0.500000
  4  1.094088
b 1  0.000000
  3  0.500000
  5  1.094088
numba
            B
A
a 0  0.000000
  2  0.666667
  4  1.428571
b 1  0.000000
  3  0.666667
  5  1.428571
expected
 group a
          0
0  0.000000
1  0.567395
2  1.221209
 group b
          0
0  0.000000
1  0.507534
2  1.020088
```
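For reference, the expected values follow from a time-decayed weighted average. Assuming the default `adjust=True` and no missing values, with halflife $h$ (23 days here), observation $x_i$ at time $t_i$ contributes to the result at time $t_j$ with weight $0.5^{(t_j - t_i)/h}$:

$$
y_j = \frac{\sum_{i \le j} 0.5^{(t_j - t_i)/h}\, x_i}{\sum_{i \le j} 0.5^{(t_j - t_i)/h}}
$$

For example, the second row of group a (gap of 9 days since the first row) gives

$$
y_1 = \frac{0.5^{9/23} \cdot 0 + 1 \cdot 1}{0.5^{9/23} + 1} \approx \frac{1}{1.762440} \approx 0.567395,
$$

which matches the expected output.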
Problem description
There are three problems with the current groupby `ewm` implementation when `times` is not None:
- numba implementation: ignores `times` entirely
- cython implementation: does not use the correct times/deltas in `aggregations.pyx` when there are multiple groups
- when the groups are non-trivial (for example, when rows of different groups are interleaved), the time vector and the values fall out of sync
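The stdlib-only sketch below (not pandas code) recomputes the numbers in this issue. `ewm_mean_from_deltas`, `day_deltas` and `groupby_ewm` are hypothetical helpers written for illustration; the recursion is an `adjust=True`-style update where past weight decays by `0.5 ** delta` per step. The last two computations are an inference from the reported numbers, not a reading of the pandas source: the numba values coincide with ignoring the timestamps (every gap counted as one halflife), and the cython values coincide with every group reusing the first gaps of the ungrouped column `C`, consistent with the second and third problems listed above.

```python
from datetime import datetime

HALFLIFE_DAYS = 23.0  # matches halflife = "23 days" in the code sample


def ewm_mean_from_deltas(values, deltas):
    """Time-decayed EWM mean, adjust=True style, without NaN handling.

    deltas[k] is the gap between observations k and k + 1 measured in
    halflives, so the accumulated weight decays by 0.5 ** deltas[k] per step.
    """
    avg = values[0]
    old_wt = 1.0
    out = [avg]
    for k in range(1, len(values)):
        old_wt *= 0.5 ** deltas[k - 1]  # decay the weight of past observations
        avg = (old_wt * avg + values[k]) / (old_wt + 1.0)  # new point has weight 1
        old_wt += 1.0
        out.append(avg)
    return out


def day_deltas(times):
    """Gaps between consecutive timestamps, expressed in halflives."""
    return [(b - a).total_seconds() / 86400 / HALFLIFE_DAYS for a, b in zip(times, times[1:])]


def groupby_ewm(keys, values, times):
    """Per-group EWM mean that takes values and times from the same rows."""
    rows = {}
    for pos, key in enumerate(keys):
        rows.setdefault(key, []).append(pos)
    result = {}
    for key, pos in rows.items():
        group_values = [values[p] for p in pos]
        group_times = [times[p] for p in pos]  # kept in sync with group_values
        result[key] = ewm_mean_from_deltas(group_values, day_deltas(group_times))
    return result


# Data from the code sample: column A (keys), B (values), C (times)
keys = ["a", "b", "a", "b", "a", "b"]
values = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
times = [datetime(2020, 1, d) for d in (1, 1, 10, 2, 23, 3)]

# Correct: each group uses its own timestamps
correct = groupby_ewm(keys, values, times)

# Inferred numba behaviour: timestamps ignored, every gap counts as one halflife
times_ignored = ewm_mean_from_deltas([0.0, 1.0, 2.0], [1.0, 1.0])

# Inferred cython behaviour: every group reuses the first gaps of the ungrouped column C
first_gaps_reused = ewm_mean_from_deltas([0.0, 1.0, 2.0], day_deltas(times)[:2])

for name, res in [
    ("correct a", correct["a"]),
    ("correct b", correct["b"]),
    ("times ignored", times_ignored),
    ("first gaps reused", first_gaps_reused),
]:
    print(name, [round(x, 6) for x in res])
```

Rounded to six decimals, the four sequences agree with the expected (groups a and b), numba, and cython outputs shown above.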
I have a branch that fixes these issues and will link to it shortly.
Expected Output
```
cython
            B
A
a 0  0.000000
  2  0.567395
  4  1.221209
b 1  0.000000
  3  0.507534
  5  1.020088
numba
            B
A
a 0  0.000000
  2  0.567395
  4  1.221209
b 1  0.000000
  3  0.507534
  5  1.020088
expected
 group a
          0
0  0.000000
1  0.567395
2  1.221209
 group b
          0
0  0.000000
1  0.507534
2  1.020088
```