Memory leak in Document.insert_pdf()

Describe the bug (mandatory)

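Calling Document.insert_pdf() leaks memory: the process RSS keeps growing with every call, even after closing the target document and shrinking the store.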

To Reproduce (mandatory)

import os
import sys

import fitz
import psutil

# Load the source PDF fully into memory and open it from the byte stream.
with open(sys.argv[1], "rb") as f:
    doc = fitz.open(stream=f.read())

proc = psutil.Process(os.getpid())
first_ram = proc.memory_info().rss

calls = 0
while True:
    for page_idx in range(doc.page_count):
        copied_page = doc[page_idx]
        page_pdf = fitz.Document()
        page_pdf.insert_pdf(
            doc,
            from_page=page_idx,
            to_page=page_idx,
            start_at=0,
            rotate=copied_page.rotation,
        )
        calls += 1
        # I tried these after seeing suggestions in other issues but no difference
        copied_page = None
        page_pdf.close()
        page_pdf = None
        fitz.TOOLS.store_shrink(100)

    # Report RSS growth once per full pass over the document.
    ram = proc.memory_info().rss
    ram_diff = (ram - first_ram) // 1024
    print(f"{calls} calls\t{ram // 1024}KB total\t+{ram_diff}KB since start ({ram_diff / calls:0.02f}/call)")

Expected behavior (optional)

Not leaking memory :)

Screenshots (optional)

Reproduction script output:

$ python leak.py 500pages-A4.pdf
500 calls 26252KB total +3364KB since start (6.73/call)
1000 calls 28364KB total +5476KB since start (5.48/call)
1500 calls 30476KB total +7588KB since start (5.06/call)
2000 calls 32324KB total +9436KB since start (4.72/call)
2500 calls 34436KB total +11548KB since start (4.62/call)
3000 calls 36548KB total +13660KB since start (4.55/call)
3500 calls 38396KB total +15508KB since start (4.43/call)
4000 calls 40508KB total +17620KB since start (4.41/call)
4500 calls 42356KB total +19468KB since start (4.33/call)
5000 calls 44468KB total +21580KB since start (4.32/call)
5500 calls 46580KB total +23692KB since start (4.31/call)
6000 calls 48428KB total +25540KB since start (4.26/call)
6500 calls 50540KB total +27652KB since start (4.25/call)
7000 calls 52388KB total +29500KB since start (4.21/call)
7500 calls 54500KB total +31612KB since start (4.21/call)
8000 calls 56612KB total +33724KB since start (4.22/call)
8500 calls 58460KB total +35572KB since start (4.18/call)
9000 calls 60572KB total +37684KB since start (4.19/call)
9500 calls 62420KB total +39532KB since start (4.16/call)
10000 calls 64532KB total +41644KB since start (4.16/call)
10500 calls 66644KB total +43756KB since start (4.17/call)
11000 calls 68492KB total +45604KB since start (4.15/call)
11500 calls 70604KB total +47716KB since start (4.15/call)
12000 calls 72452KB total +49564KB since start (4.13/call)
12500 calls 74564KB total +51676KB since start (4.13/call)
13000 calls 76676KB total +53788KB since start (4.14/call)
13500 calls 78524KB total +55636KB since start (4.12/call)
14000 calls 80636KB total +57748KB since start (4.12/call)
14500 calls 82484KB total +59596KB since start (4.11/call)
15000 calls 84596KB total +61708KB since start (4.11/call)
15500 calls 86708KB total +63820KB since start (4.12/call)
16000 calls 88556KB total +65668KB since start (4.10/call)
16500 calls 90668KB total +67780KB since start (4.11/call)
17000 calls 92516KB total +69628KB since start (4.10/call)

The growth stabilizes at around 4 KB per call on that PDF, and it seems to leak more with PDFs that have more pages.
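The "since start" figure in the output is a running average, so it is still weighed down by the larger growth during the first pass. A quick way to see the steady-state rate is to take the slope between two samples instead. This is only a sketch using the first and last lines of the output above; the function name `marginal_rate_kb` is made up for illustration.

```python
# (calls, KB of RSS growth since start), copied from the first and
# last lines of the reproduction output above.
first = (500, 3364)
last = (17000, 69628)


def marginal_rate_kb(a, b):
    """Return KB of RSS growth per call between two (calls, kb) samples."""
    (calls_a, kb_a), (calls_b, kb_b) = a, b
    return (kb_b - kb_a) / (calls_b - calls_a)


rate = marginal_rate_kb(first, last)
print(f"{rate:.2f} KB/call")  # prints 4.02 KB/call
```

Between 500 and 17000 calls the process grew by 66264 KB over 16500 calls, so the per-call growth is essentially constant once the first pass is excluded.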

Valgrind summary on a debug build (tag 1.19.6, MuPDF 1.19.0, suppressing PyMem_RawMalloc calls):

==5508== HEAP SUMMARY:
==5508==     in use at exit: 1,507,032 bytes in 615 blocks
==5508==   total heap usage: 4,721 allocs, 4,106 frees, 8,493,027 bytes allocated
==5508==
==5508== 4,072 (32 direct, 4,040 indirect) bytes in 1 blocks are definitely lost in loss record 326 of 357
==5508==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5508==    by 0x692F78B: do_scavenging_malloc (memory.c:51)
==5508==    by 0x692F85F: fz_calloc (memory.c:111)
==5508==    by 0x6981238: pdf_new_graft_map (pdf-graft.c:42)
==5508==    by 0x689DFF5: new_Graftmap (fitz_wrap.c:17104)
==5508==    by 0x68B37A3: _wrap_new_Graftmap (fitz_wrap.c:29204)
==5508==    by 0x5ECFF7: cfunction_call (methodobject.c:552)
==5508==    by 0x50F338: _PyObject_MakeTpCall (call.c:191)
==5508==    by 0x574E12: _PyObject_VectorcallTstate (abstract.h:116)
==5508==    by 0x574E12: PyObject_Vectorcall (abstract.h:127)
==5508==    by 0x574E12: call_function (ceval.c:5077)
==5508==    by 0x574E12: _PyEval_EvalFrameDefault (ceval.c:3489)
==5508==    by 0x5100A1: _PyEval_EvalFrame (pycore_ceval.h:40)
==5508==    by 0x5100A1: function_code_fastcall (call.c:330)
==5508==    by 0x5100A1: _PyFunction_Vectorcall (call.c:367)
==5508==    by 0x5100A1: _PyObject_FastCallDictTstate (call.c:118)
==5508==    by 0x5100A1: _PyObject_Call_Prepend (call.c:489)
==5508==    by 0x54B2A2: slot_tp_init (typeobject.c:6969)
==5508==    by 0x54880A: type_call (typeobject.c:1026)
==5508==
==5508== LEAK SUMMARY:
==5508==    definitely lost: 32 bytes in 1 blocks
==5508==    indirectly lost: 4,040 bytes in 1 blocks
==5508==      possibly lost: 0 bytes in 0 blocks
==5508==    still reachable: 606,358 bytes in 115 blocks
==5508==         suppressed: 896,602 bytes in 498 blocks
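For comparison with the RSS numbers, the size of the "definitely lost" record can be pulled out of the Valgrind log and converted to KB. The snippet below is a rough sketch: the regular expression and the helper name `parse_loss_record` are my own and only cover the record format shown above.

```python
import re

# The "definitely lost" record exactly as it appears in the Valgrind log above.
record = (
    "==5508== 4,072 (32 direct, 4,040 indirect) bytes in 1 blocks "
    "are definitely lost in loss record 326 of 357"
)

# Assumed pattern for this one record format; other Valgrind record
# formats (for example records without a direct/indirect split) are not handled.
PATTERN = re.compile(
    r"([\d,]+) \(([\d,]+) direct, ([\d,]+) indirect\) bytes "
    r"in ([\d,]+) blocks are definitely lost"
)


def parse_loss_record(line):
    """Parse a 'definitely lost' record into integer byte and block counts."""
    match = PATTERN.search(line)
    if match is None:
        return None
    total, direct, indirect, blocks = (
        int(group.replace(",", "")) for group in match.groups()
    )
    return {"total": total, "direct": direct, "indirect": indirect, "blocks": blocks}


loss = parse_loss_record(record)
kb_per_block = loss["total"] / loss["blocks"] / 1024
print(f"{kb_per_block:.2f} KB per leaked block")  # prints 3.98 KB per leaked block
```

The lost graft map allocated through new_Graftmap / pdf_new_graft_map accounts for 4,072 bytes, which is close to the roughly 4 KB per call seen in the script output. That is consistent with one graft map being leaked per insert_pdf() call, although the trace alone does not prove it.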

Your configuration (mandatory)

print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)

3.9.4 (v3.9.4:1f2e3088f3, Apr 4 2021, 12:32:44) [Clang 6.0 (clang-600.0.57)]
darwin
PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library. Version date: 2022-03-03 00:00:01. Built for Python 3.9 on darwin (64-bit).

print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)

3.9.12 (main, Mar 24 2022, 16:21:12) [GCC 7.5.0]
linux
PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library. Version date: 2022-03-03 00:00:01. Built for Python 3.9 on linux (64-bit).