[Python-Dev] Modify PyMem_Malloc to use pymalloc for performance

Victor Stinner victor.stinner at gmail.com
Fri Feb 12 10:07:21 EST 2016


Hi,

2016-02-12 14:31 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:

Sorry, your email must have gotten lost in my inbox.

no problemo

Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs).

It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc.

Ok, I will try my patch on some of them. Thanks for the pointers.

I suppose such a flag would create a noticeable runtime performance hit, since the compiler would no longer be able to inline the PyMem*() APIs if you redirect those APIs to other sets at runtime.

Hmm, I think you missed PEP 445. The overhead of this PEP was discussed and considered negligible enough to implement it: https://www.python.org/dev/peps/pep-0445/#performances

With PEP 445, there is no overhead to enabling the debug hooks at runtime (except the overhead of the debug checks themselves ;-)).

PyMem_Malloc now dispatches through a function pointer: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l319

Same for PyObject_Malloc: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l380
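
For the curious, here is a minimal sketch (assuming Python >= 3.5, where PyMemAllocatorEx grew a calloc slot) of how an embedder can wrap the PYMEM_DOMAIN_MEM allocator at runtime through the PEP 445 API. The counting_* wrappers and the n_mallocs counter are made-up names for illustration; only PyMemAllocatorEx, PyMem_GetAllocator and PyMem_SetAllocator are the real API:

    /* Sketch: wrap the "mem" domain allocator with a counting hook.
     * PyMem_Malloc() just follows the installed function pointers, so
     * the wrapper costs one extra indirect call per allocation. */
    #include <Python.h>
    #include <stdio.h>

    static PyMemAllocatorEx original;   /* allocator being wrapped */
    static size_t n_mallocs = 0;        /* illustration-only counter */

    static void *
    counting_malloc(void *ctx, size_t size)
    {
        (void)ctx;
        n_mallocs++;
        return original.malloc(original.ctx, size);
    }

    static void *
    counting_calloc(void *ctx, size_t nelem, size_t elsize)
    {
        (void)ctx;
        return original.calloc(original.ctx, nelem, elsize);
    }

    static void *
    counting_realloc(void *ctx, void *ptr, size_t new_size)
    {
        (void)ctx;
        return original.realloc(original.ctx, ptr, new_size);
    }

    static void
    counting_free(void *ctx, void *ptr)
    {
        (void)ctx;
        original.free(original.ctx, ptr);
    }

    int
    main(void)
    {
        PyMemAllocatorEx hook = {NULL, counting_malloc, counting_calloc,
                                 counting_realloc, counting_free};

        /* Install the wrapper before Py_Initialize() so every
         * PyMem_Malloc() call in the interpreter goes through it. */
        PyMem_GetAllocator(PYMEM_DOMAIN_MEM, &original);
        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &hook);

        Py_Initialize();
        PyRun_SimpleString("x = [str(i) for i in range(1000)]");
        Py_Finalize();

        fprintf(stderr, "PyMem_Malloc() was called %zu times\n", n_mallocs);
        return 0;
    }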

I also don't see much point in carrying around such baggage in production builds of Python, since you'd most likely only want to use the tools to debug C extensions during their development.

I propose adding an environment variable because it's rare for a debug build to be installed on a system. Usually, using a debug build requires recompiling all C extensions, which is not really... convenient...

With such an env var, it would be trivial to quickly check whether the Python memory allocators are used correctly.
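
To make the idea concrete, here is a sketch of what such a variable would trigger at startup in an embedding application: PyMem_SetupDebugHooks() (part of PEP 445) installs the debug wrappers on top of the current allocators of a release build, so API misuse is detected at runtime. The variable name PYMEM_DEBUG below is purely hypothetical:

    /* Sketch: enable the PEP 445 debug hooks on a release build when a
     * (hypothetical) environment variable is set. */
    #include <Python.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* Hypothetical variable name, checked before initializing Python. */
        const char *flag = getenv("PYMEM_DEBUG");
        if (flag != NULL && flag[0] != '\0') {
            /* Installs checks for API violations, buffer underflow/overflow
             * and use-after-free on top of the current allocators. */
            PyMem_SetupDebugHooks();
        }

        Py_Initialize();
        PyRun_SimpleString("print('running with debug hooks')");
        Py_Finalize();
        return 0;
    }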

Runtime performance, difference in memory consumption (arenas cannot be freed if there are still small chunks allocated), memory locality. I'm no expert in this, so can't really comment much.

"arenas cannot be freed if there are still small chunks allocated" yeah, this is called memory fragmentation.

There is a big difference between libc malloc() and pymalloc for small allocations: pymalloc is able to free an arena using munmap(), which immediately releases the memory to the system, whereas most implementations of malloc() use a single contiguous memory block which is only shrunk when all the memory "at the top" is free. So it's the same fragmentation issue you described, except that malloc() uses a single arena of arbitrary size (between 1 MB and 10 GB, there is no limit), whereas pymalloc uses small arenas of 256 KB.

In short, I expect less fragmentation with pymalloc.
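
Here is a toy illustration of the pattern (plain C, not CPython-specific; actual behaviour depends on the malloc implementation): with a contiguous brk()-style heap, freeing almost everything returns nothing to the OS as long as the most recently allocated block is still alive, because the heap can only shrink from the top. pymalloc side-steps this by carving memory into independent 256 KB arenas that can be munmap()'ed individually.

    /* Toy fragmentation example: one surviving block pins the heap top. */
    #include <stdlib.h>

    #define NBLOCKS 50000

    int
    main(void)
    {
        void *blocks[NBLOCKS];

        /* Grow the heap with many small allocations. */
        for (int i = 0; i < NBLOCKS; i++) {
            blocks[i] = malloc(64);
        }

        /* Free everything except the most recent block: the heap stays
         * near its peak size because the live block sits "at the top". */
        for (int i = 0; i < NBLOCKS - 1; i++) {
            free(blocks[i]);
        }

        /* ... a long-running program keeps holding blocks[NBLOCKS - 1] ... */
        free(blocks[NBLOCKS - 1]);
        return 0;
    }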

"memory locality": I have no idea on that. I guess that it can be seen on benchmarks. pymalloc is designed for objects with short lifetime.

Victor


