BUG: Segmentation fault when doing pandas.core.window.rolling.RollingGroupBy.apply · Issue #36727 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

df = pd.DataFrame(
    [
        ["A", "group_1", pd.Timestamp(2019, 1, 1, 9)],
        ["B", "group_1", pd.Timestamp(2019, 1, 2, 9)],
        ["C", "group_2", pd.Timestamp(2019, 1, 3, 9)],
        ["D", "group_1", pd.Timestamp(2019, 1, 6, 9)],
        ["E", "group_1", pd.Timestamp(2019, 1, 7, 9)],
        ["F", "group_1", pd.Timestamp(2019, 1, 10, 9)],
        ["G", "group_2", pd.Timestamp(2019, 1, 20, 9)],
        ["H", "group_1", pd.Timestamp(2019, 4, 8, 9)],
    ],
    columns=["index", "group", "eventTime"],
).set_index("index")

groups = df.groupby("group")
df["count_to_date"] = groups.cumcount()
rolling_groups = groups.rolling("10d", on="eventTime")
group_size = rolling_groups.apply(lambda df: df.shape[0])
print(group_size)
Problem description
The code above causes a segmentation fault inside pandas in versions after 1.0.5. Because I need this code for a project, I am restricted to pandas 1.0.5 until this is resolved. I am not sure what causes the segmentation fault, but all of the circumstances above are necessary to reproduce the bug (i.e. a DataFrame with a special index, a column set in the DataFrame after grouping, a rolling window on a group, etc.).
I have reproduced this bug on a variety of machines and operating systems.
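Because a segmentation fault kills the interpreter outright, it can help to run the reproducer in a child process and inspect how it exited. The following is only a minimal sketch using the standard library; `run_isolated` and `died_from_segfault` are hypothetical helper names, not part of pandas, and the signal check assumes a POSIX system, where `subprocess` reports termination by signal N as return code -N.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a fresh interpreter.

    Returns (returncode, stdout, stderr) so a crash in the child
    does not take down the calling process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def died_from_segfault(returncode):
    """True if the child was terminated by SIGSEGV (POSIX convention)."""
    return returncode == -int(signal.SIGSEGV)


# Harmless demonstration: a snippet that exits normally.
returncode, stdout, stderr = run_isolated("print('ok')")
print(returncode, stdout.strip(), died_from_segfault(returncode))
```

To apply this to the report, the code sample above would be passed as the `snippet` string; a result for which `died_from_segfault` returns `True` would indicate the crash described here.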
Expected Output
                        eventTime  count_to_date
group   index
group_1 A     2019-01-01 09:00:00            1.0
        B     2019-01-02 09:00:00            2.0
        D     2019-01-06 09:00:00            3.0
        E     2019-01-07 09:00:00            4.0
        F     2019-01-10 09:00:00            5.0
        H     2019-04-08 09:00:00            1.0
group_2 C     2019-01-03 09:00:00            1.0
        G     2019-01-20 09:00:00            1.0
Note: This is indeed the output of versions 1.0.5 and prior.
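As a cross-check on the expected values, the counts can be recomputed without pandas. This is a rough sketch, not how pandas implements rolling windows: it assumes the "10d" window is open on the left and closed on the right, i.e. it covers `(t - 10 days, t]`, and `trailing_window_counts` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Same rows as the code sample: (index label, group, eventTime).
events = [
    ("A", "group_1", datetime(2019, 1, 1, 9)),
    ("B", "group_1", datetime(2019, 1, 2, 9)),
    ("C", "group_2", datetime(2019, 1, 3, 9)),
    ("D", "group_1", datetime(2019, 1, 6, 9)),
    ("E", "group_1", datetime(2019, 1, 7, 9)),
    ("F", "group_1", datetime(2019, 1, 10, 9)),
    ("G", "group_2", datetime(2019, 1, 20, 9)),
    ("H", "group_1", datetime(2019, 4, 8, 9)),
]


def trailing_window_counts(rows, window=timedelta(days=10)):
    """Count, per row, the rows of the same group inside (t - window, t]."""
    counts = {}
    for label, group, when in rows:
        counts[label] = sum(
            1
            for _, other_group, other_when in rows
            if other_group == group and when - window < other_when <= when
        )
    return counts


counts = trailing_window_counts(events)
for label, group, when in sorted(events, key=lambda row: (row[1], row[2])):
    print(group, label, when, float(counts[label]))
```

Under that assumption the per-row counts agree with the `count_to_date` column shown above.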
Output of pd.show_versions()
This is just one configuration, but the bug has been reproduced on three different machines (both Linux and Mac), all exhibiting the same behavior.
INSTALLED VERSIONS
commit : 2a7d332
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
Version : Darwin Kernel Version 17.7.0: Thu Jun 18 21:21:34 PDT 2020; root:xnu-4570.71.82.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.2
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None