BUG: skiplist memory leak in rolling functions · Issue #43339 · pandas-dev/pandas



Code Sample, a copy-pastable example

Running the test_rolling_non_monotonic test with the following addition triggers the leak. It was discovered in #43338.

diff --git a/pandas/tests/window/test_rolling.py b/pandas/tests/window/test_rolling.py
index f829ae4be0..35063ba555 100644
--- a/pandas/tests/window/test_rolling.py
+++ b/pandas/tests/window/test_rolling.py
@@ -1251,6 +1251,19 @@ def test_rolling_decreasing_indices(method):
                 -0.45439658241367054,
             ],
         ),
+        (
+            "median",
+            [
+                float("nan"),
+                6.5,
+                float("nan"),
+                20.5,
+                4.0,
+                6.5,
+                9.0,
+                12.5
+            ],
+        ),
     ],
 )
 def test_rolling_non_monotonic(method, expected):

Problem description

roll_median_c and roll_quantile reassign their skiplist pointers without destroying the old skiplist when the indices are non-monotonic, so every reset leaks the previous allocation. skiplist_destroy should be called before skiplist_init. It might also be worth adding a reset() function to the skiplist to avoid unnecessary reallocations, but since these extra allocations already happen today, adding the destroy() call is probably the better first step.
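To make the ownership problem concrete, here is a minimal sketch of the reassign-without-destroy pattern and the proposed destroy-before-init fix. It does not use the pandas skiplist: toy_list_t, toy_list_init, toy_list_destroy, run_windows and the allocation counter are hypothetical names invented for illustration, and the loop only imitates the shape of a rolling function that rebuilds its state on each reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_RESETS 16

/* Toy stand-in for a skiplist: a heap-allocated struct that owns a buffer.
   Hypothetical names; this is NOT the pandas skiplist.h API. */
typedef struct {
    double *values;
    int capacity;
} toy_list_t;

static int live_blocks = 0; /* malloc'd blocks not yet freed */

static void *counted_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) live_blocks++;
    return p;
}

static void counted_free(void *p) {
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

toy_list_t *toy_list_init(int capacity) {
    toy_list_t *list = counted_malloc(sizeof *list);
    assert(list != NULL);
    list->values = counted_malloc((size_t)capacity * sizeof *list->values);
    assert(list->values != NULL);
    list->capacity = capacity;
    return list;
}

void toy_list_destroy(toy_list_t *list) {
    if (list == NULL) return;
    counted_free(list->values);
    counted_free(list);
}

/* Imitates a rolling loop that rebuilds its list `resets` times.
   Returns the number of blocks still allocated after the final destroy,
   i.e. what a leak checker would report as lost. */
int run_windows(int resets, int destroy_before_init) {
    toy_list_t *orphans[MAX_RESETS];
    int n_orphans = 0;
    if (resets > MAX_RESETS) resets = MAX_RESETS;

    live_blocks = 0;
    toy_list_t *list = toy_list_init(4);
    for (int i = 0; i < resets; i++) {
        if (destroy_before_init) {
            toy_list_destroy(list);      /* the fix: release the old list first */
        } else {
            orphans[n_orphans++] = list; /* kept only so this demo can clean up */
        }
        list = toy_list_init(4);         /* pointer reassigned to a fresh list */
    }
    toy_list_destroy(list);

    int leaked = live_blocks;
    for (int j = 0; j < n_orphans; j++) {
        toy_list_destroy(orphans[j]);    /* tidy up so the demo itself is clean */
    }
    return leaked;
}
```

In this sketch each reset without a prior destroy strands two blocks (the struct and its buffer), which mirrors the pairs of loss records valgrind reports for node_init. A reset() that reuses the existing allocation would avoid the free/malloc churn as well, but that is a separate optimization from closing the leak.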

==184879== 
==184879== 0 bytes in 1 blocks are indirectly lost in loss record 1 of 1,913
==184879==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==184879==    by 0x168B8745: node_init (skiplist.h:69)
==184879==    by 0x168B8745: skiplist_init(int) (skiplist.h:129)
==184879==    by 0x168D3F1B: __pyx_pf_6pandas_5_libs_6window_12aggregations_10roll_median_c (aggregations.cpp:9332)
==184879==    by 0x168D3F1B: __pyx_pw_6pandas_5_libs_6window_12aggregations_11roll_median_c(_object*, _object*, _object*) (aggregations.cpp:9050)
==184879==    by 0x526878: ??? (in /usr/bin/python3.9)
==184879==    by 0x629E37: _PyObject_Call (in /usr/bin/python3.9)
==184879==    by 0x59F7D5: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==184879==    by 0x59734D: ??? (in /usr/bin/python3.9)
==184879==    by 0x62C5F3: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==184879==    by 0x599266: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==184879==    by 0x59734D: ??? (in /usr/bin/python3.9)
==184879==    by 0x62C5F3: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==184879==    by 0x599266: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==184879== 
==184879== 0 bytes in 1 blocks are indirectly lost in loss record 2 of 1,913
==184879==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==184879==    by 0x168B8750: node_init (skiplist.h:70)
==184879==    by 0x168B8750: skiplist_init(int) (skiplist.h:129)
==184879==    by 0x168D3F1B: __pyx_pf_6pandas_5_libs_6window_12aggregations_10roll_median_c (aggregations.cpp:9332)
==184879==    by 0x168D3F1B: __pyx_pw_6pandas_5_libs_6window_12aggregations_11roll_median_c(_object*, _object*, _object*) (aggregations.cpp:9050)
==184879==    by 0x526878: ??? (in /usr/bin/python3.9)
==184879==    by 0x629E37: _PyObject_Call (in /usr/bin/python3.9)
==184879==    by 0x59F7D5: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==184879==    by 0x59734D: ??? (in /usr/bin/python3.9)
==184879==    by 0x62C5F3: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==184879==    by 0x599266: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==184879==    by 0x59734D: ??? (in /usr/bin/python3.9)
==184879==    by 0x62C5F3: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==184879==    by 0x599266: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==184879== 

Expected Output

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit           : 4caa51b1790d3b1c03835e919fc9f753fbd817b3
python           : 3.9.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.11.0-7620-generic
Version          : #21~1626191760~20.04~55de9c3~dev-Ubuntu SMP Tue Jul 20 18:02:09 
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.0.dev0+551.g4caa51b179.dirty
numpy            : 1.21.1
pytz             : 2019.3
dateutil         : 2.7.3
pip              : 21.2.4
setuptools       : 45.2.0
Cython           : 0.29.21
pytest           : 6.2.5
hypothesis       : 6.17.4
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.6.3
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.26.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : 1.4.22
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None