[Python-Dev] cpython: Issue #3329: Add new APIs to customize memory allocators

Victor Stinner victor.stinner at gmail.com
Sun Jun 16 02:18:32 CEST 2013


2013/6/15 Antoine Pitrou <solipsis at pitrou.net>:

> On Sat, 15 Jun 2013 03:54:50 +0200 Victor Stinner <victor.stinner at gmail.com> wrote:
>> The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to set up a custom allocator for all allocations made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).
>
> We already had this discussion on IRC and this argument isn't very convincing to me. If os.getcwd() doesn't hold the GIL while allocating memory, then you should fix it to hold the GIL while allocating memory.

The GIL is released for best performance; holding the GIL would have an impact on performance.

PyMem_RawMalloc() is needed when PyMem_Malloc() cannot be used because the GIL was released. For example, in issue #18227 (reuse the custom allocator in external libraries), PyMem_Malloc() is usually not appropriate. PyMem_RawMalloc() should also be used instead of PyMem_Malloc() during the Python startup sequence, because PyMem_Malloc() requires the GIL and the GIL does not exist yet at that point.

PyMem_RawMalloc() also makes memory usage measurements more accurate, because it can be replaced or hooked (with PyMem_SetRawAllocators()).
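To make the "replaced or hooked" point concrete, here is a minimal, self-contained sketch of a hook that wraps an underlying allocator and tracks live bytes. The `allocator_t` struct (a context pointer plus malloc/realloc/free function pointers) and the names `install_counting_hook()`, `raw_malloc()` and friends are hypothetical, only loosely modeled on the proposed PyMem_SetRawAllocators() interface. Because raw allocators may be called without the GIL, the counter uses C11 atomics instead of relying on any global lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator table: a context pointer plus three hooks. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t new_size);
    void (*free_fn)(void *ctx, void *ptr);
} allocator_t;

/* Header stored in front of each block so free() knows the block size. */
typedef union { size_t size; max_align_t align; } header_t;

static void *libc_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void *libc_realloc(void *ctx, void *ptr, size_t n) { (void)ctx; return realloc(ptr, n); }
static void libc_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* The "raw" allocator currently in use; defaults to libc. */
static allocator_t raw_allocator = { NULL, libc_malloc, libc_realloc, libc_free };

typedef struct {
    allocator_t inner;          /* allocator being wrapped */
    atomic_size_t live_bytes;   /* bytes currently allocated through the hook */
} counting_ctx_t;

static void *counting_malloc(void *ctx, size_t size) {
    counting_ctx_t *c = ctx;
    if (size > SIZE_MAX - sizeof(header_t)) return NULL;
    header_t *h = c->inner.malloc_fn(c->inner.ctx, sizeof(header_t) + size);
    if (h == NULL) return NULL;
    h->size = size;
    atomic_fetch_add(&c->live_bytes, size);
    return h + 1;
}

static void *counting_realloc(void *ctx, void *ptr, size_t new_size) {
    counting_ctx_t *c = ctx;
    if (ptr == NULL) return counting_malloc(ctx, new_size);
    if (new_size > SIZE_MAX - sizeof(header_t)) return NULL;
    size_t old_size = ((header_t *)ptr - 1)->size;
    header_t *h = c->inner.realloc_fn(c->inner.ctx, (header_t *)ptr - 1,
                                      sizeof(header_t) + new_size);
    if (h == NULL) return NULL;
    h->size = new_size;
    if (new_size >= old_size)
        atomic_fetch_add(&c->live_bytes, new_size - old_size);
    else
        atomic_fetch_sub(&c->live_bytes, old_size - new_size);
    return h + 1;
}

static void counting_free(void *ctx, void *ptr) {
    counting_ctx_t *c = ctx;
    if (ptr == NULL) return;
    header_t *h = (header_t *)ptr - 1;
    atomic_fetch_sub(&c->live_bytes, h->size);
    c->inner.free_fn(c->inner.ctx, h);
}

static counting_ctx_t counting;

/* Wrap the current raw allocator. Call once, before any allocation:
 * blocks allocated earlier have no header and must not reach counting_free(). */
static void install_counting_hook(void) {
    counting.inner = raw_allocator;
    atomic_init(&counting.live_bytes, 0);
    raw_allocator = (allocator_t){ &counting, counting_malloc,
                                   counting_realloc, counting_free };
}

/* Hypothetical entry points standing in for PyMem_RawMalloc() and friends. */
static void *raw_malloc(size_t n) { return raw_allocator.malloc_fn(raw_allocator.ctx, n); }
static void *raw_realloc(void *p, size_t n) { return raw_allocator.realloc_fn(raw_allocator.ctx, p, n); }
static void raw_free(void *p) { raw_allocator.free_fn(raw_allocator.ctx, p); }
```

Because every call site goes through the table, swapping one struct is enough to observe all raw allocations, which is the accuracy argument made above.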

The issue #18203 explains why I would like to replace direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc().

> I don't like the idea of adding a third layer of allocation APIs. The dichotomy between PyObject_Malloc and PyMem_Malloc is already a bit gratuitous (i.e. not motivated by any actual real-world concern, as far as I can tell).

In Python 3.3, PyMem_Malloc() cannot be used instead of malloc() where the GIL is not held. Instead of adding PyMem_RawMalloc(), an alternative is to remove the "the GIL must be held" restriction from PyMem_Malloc() by making PyMem_Malloc() always call malloc() (instead of PyObject_Malloc() in debug mode).

With such a change, a debug hook can no longer rely on the GIL: it cannot inspect Python objects, get a frame or traceback, etc. To still get accurate debug reports, calls to PyMem_Malloc() should be replaced with PyObject_Malloc().
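For illustration, here is the kind of check a debug hook can still perform without the GIL: a minimal guard-byte allocator, loosely in the spirit of (but not copied from) the hooks installed by PyMem_SetupDebugHooks(). It only touches its own bytes, so it is safe to call from any thread; what it cannot do is inspect Python objects or fetch a traceback. The function names and byte patterns below are hypothetical and arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8        /* arbitrary number of guard bytes after each block */
#define GUARD_BYTE 0xFB     /* arbitrary "forbidden" pattern for the guard area */
#define CLEAN_BYTE 0xCB     /* arbitrary fill pattern for freshly allocated memory */

/* Header in front of each block, recording the requested size. */
typedef union { size_t size; max_align_t align; } debug_header_t;

/* Allocate size bytes followed by guard bytes; no interpreter state is used. */
static void *debug_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(debug_header_t) - GUARD_SIZE) return NULL;
    debug_header_t *h = malloc(sizeof(debug_header_t) + size + GUARD_SIZE);
    if (h == NULL) return NULL;
    h->size = size;
    unsigned char *data = (unsigned char *)(h + 1);
    memset(data, CLEAN_BYTE, size);            /* expose use of uninitialized memory */
    memset(data + size, GUARD_BYTE, GUARD_SIZE);
    return data;
}

/* Return true if the guard bytes after the block are intact. */
static bool debug_check(const void *ptr) {
    const debug_header_t *h = (const debug_header_t *)ptr - 1;
    const unsigned char *guard = (const unsigned char *)ptr + h->size;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (guard[i] != GUARD_BYTE) return false;
    return true;
}

/* Free the block, aborting if a buffer overflow clobbered the guard bytes. */
static void debug_free(void *ptr) {
    if (ptr == NULL) return;
    if (!debug_check(ptr)) {
        fprintf(stderr, "debug_free: buffer overflow detected at %p\n", ptr);
        abort();
    }
    free((debug_header_t *)ptr - 1);
}
```

A report from such a hook can say *that* a block was corrupted, but attaching a Python-level traceback to it is exactly the part that needs the GIL.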

I don't yet understand the effect of such a change on backward compatibility. Could it break applications?

> As for the debug functions you added: PyMem_GetRawAllocators(), PyMem_SetRawAllocators(), PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_SetupDebugHooks(), _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators(). Well, do we need all 7 of them? Can't you try to make that 2 or 3?

Get/SetAllocators of PyMem, PyMem_Raw and PyObject can be grouped into 2 functions (get and set) with an argument to select the API.

That is what I proposed initially. I changed it when I had to choose a name for the argument ("api", "domain", something else?), because there were only two choices. With 3 families of functions (PyMem, PyMem_Raw and PyObject), generic functions become interesting again.
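The grouping discussed above, two generic functions plus an argument selecting the family, could look roughly like this sketch. `alloc_domain_t`, `get_allocator()`, `set_allocator()`, `domain_malloc()` and the table layout are hypothetical names chosen for illustration, not the API in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical selector for the three allocator families. */
typedef enum { DOMAIN_RAW, DOMAIN_MEM, DOMAIN_OBJ, DOMAIN_COUNT } alloc_domain_t;

typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t new_size);
    void (*free_fn)(void *ctx, void *ptr);
} allocator_t;

static void *libc_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void *libc_realloc(void *ctx, void *ptr, size_t n) { (void)ctx; return realloc(ptr, n); }
static void libc_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* One table indexed by domain replaces three Get/Set pairs. */
static allocator_t allocators[DOMAIN_COUNT] = {
    [DOMAIN_RAW] = { NULL, libc_malloc, libc_realloc, libc_free },
    [DOMAIN_MEM] = { NULL, libc_malloc, libc_realloc, libc_free },
    [DOMAIN_OBJ] = { NULL, libc_malloc, libc_realloc, libc_free },
};

/* Copy the current allocator of a domain into *out. */
static void get_allocator(alloc_domain_t domain, allocator_t *out) {
    assert((unsigned)domain < DOMAIN_COUNT);
    *out = allocators[domain];
}

/* Replace the allocator of a domain (intended to be called before any
 * allocation is made in that domain). */
static void set_allocator(alloc_domain_t domain, const allocator_t *in) {
    assert((unsigned)domain < DOMAIN_COUNT);
    allocators[domain] = *in;
}

/* Dispatch helpers: each family's public malloc/free would call these. */
static void *domain_malloc(alloc_domain_t domain, size_t size) {
    allocator_t *a = &allocators[domain];
    return a->malloc_fn(a->ctx, size);
}

static void domain_free(alloc_domain_t domain, void *ptr) {
    allocator_t *a = &allocators[domain];
    a->free_fn(a->ctx, ptr);
}
```

Adding a fourth family later would only mean adding an enum value, not two more exported functions.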

The arena case is different: pymalloc only uses two functions to allocate arenas: void* alloc(size_t) and void release(void*, size_t). The release function has a size argument, which is unusual, but it is required to implement it using munmap(). VirtualFree() on Windows also requires the size.
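A POSIX-only sketch shows why release() carries a size: munmap() takes the length of the mapping. The `arena_allocator_t` struct and function names are hypothetical and only mirror the two-function shape described above; a Windows version would use the VirtualAlloc()/VirtualFree() family instead and is not shown.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical arena allocator interface: alloc(size) and release(ptr, size). */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr, size_t size);
} arena_allocator_t;

/* Map an anonymous, private, read/write region of size bytes. */
static void *mmap_arena_alloc(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* munmap() needs the mapping length, hence the size argument. */
static void mmap_arena_release(void *ptr, size_t size) {
    if (ptr != NULL) {
        int rc = munmap(ptr, size);
        assert(rc == 0);
        (void)rc;
    }
}

static const arena_allocator_t mmap_arenas = { mmap_arena_alloc, mmap_arena_release };
```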

An application can choose to replace PyObject_Malloc() with its own allocator, but in my experience this has a significant impact on performance (Python is slower). To benefit from pymalloc with a custom memory allocator, _PyObject_SetArenaAllocators() can be used.
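The performance argument can be illustrated with a toy small-object allocator (a fixed-size bump allocator, far simpler than pymalloc; all names are hypothetical). Individual allocations never reach the arena hook; only each new arena does, so a custom allocator plugged in at the arena level runs once per arena instead of once per object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_SIZE (64 * 1024)  /* arbitrary arena size for this sketch */
#define BLOCK_SIZE 32           /* a single size class, for brevity */
#define MAX_ARENAS 64           /* fixed bound to keep the sketch short */

/* Hypothetical arena hooks in the alloc(size) / release(ptr, size) shape.
 * A custom allocator would be plugged in here, and only here. */
static void *arena_alloc(size_t size) { return malloc(size); }
static void arena_release(void *ptr, size_t size) { (void)size; free(ptr); }

static unsigned char *arenas[MAX_ARENAS];
static size_t arena_count;               /* number of arena_alloc() calls so far */
static size_t arena_used = ARENA_SIZE;   /* "full" so the first call grabs an arena */

/* Hand out BLOCK_SIZE chunks from the current arena; ask the arena hook
 * for more memory only when the current arena is exhausted. */
static void *small_alloc(void) {
    if (arena_used + BLOCK_SIZE > ARENA_SIZE) {
        if (arena_count == MAX_ARENAS) return NULL;
        unsigned char *a = arena_alloc(ARENA_SIZE);
        if (a == NULL) return NULL;
        arenas[arena_count++] = a;
        arena_used = 0;
    }
    void *p = arenas[arena_count - 1] + arena_used;
    arena_used += BLOCK_SIZE;
    return p;
}

/* Give every arena back through the hook, passing the size release() expects. */
static void small_release_all(void) {
    for (size_t i = 0; i < arena_count; i++)
        arena_release(arenas[i], ARENA_SIZE);
    arena_count = 0;
    arena_used = ARENA_SIZE;
}
```

With 2048 blocks per arena here, 10000 small allocations trigger only 5 calls into the arena hook.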

I kept _PyObject_SetArenaAllocators() private because I don't like its API: it is not consistent with the other SetAllocators functions. I'm not sure that it would be used, so I prefer to keep it private until it has been tested by some projects.

"Private" functions can be used by applications; it's just that Python doesn't give any backward compatibility guarantee. Am I right?

Victor
