[Python-Dev] Memory bitmaps for the Python cyclic garbage collector (original) (raw)

Neil Schemenauer neil at python.ca
Thu Sep 7 13:30:12 EDT 2017


Python objects that participate in cyclic GC (things like lists, dicts, sets but not strings, ints and floats) have extra memory overhead. I think it is possible to mostly eliminate this overhead. Also, while the GC is running, this GC state is mutated, which destroys copy-on-write optimizations. This change would mostly fix that issue.

All objects that participate in cyclic GC have the Py_TPFLAGS_HAVE_GC bit set in their type. That causes an extra chunk of memory to be allocated before the ob_refcnt struct member. This is the PyGC_Head struct.

The whole object looks like this in memory (PyObject pointer is at arrow):

union __gc_head *gc_next;
union __gc_head *gc_prev;
Py_ssize_t gc_refs;

--> Py_ssize_t ob_refcnt struct _typeobject *ob_type; [rest of PyObject members]

So, 24 bytes of overhead on a 64-bit machine. The smallest Python object that can have a pointer to another object (e.g. a single PyObject

Carl Shaprio questioned me today on why we use a double linked-list and not the memory bitmap. I think the answer is that there is no good reason. We use a double linked list only due to historical constraints that are no longer present.

Long ago, Python objects could be allocated using the system malloc or other memory allocators. Since we could not control the memory location, bitmaps would be inefficient. Today, we allocate all Python objects via our own function. Python objects under a certain size are allocated using our own malloc, obmalloc, and are stored in memory blocks known "arenas".

The PyGC_Head struct performs three functions. First, it allows the GC to find all Python objects that will be checked for cycles (i.e. follow the linked list). Second, it stores a single bit of information to let the GC know if it is safe to traverse the object, set with PyObject_GC_Track(). Finally, it has a scratch area to compute the effective reference count while tracing refs (gc_refs).

Here is a sketch of how we can remove the PyGC_Head struct for small objects (say less than 512 bytes). Large objects or objects created by a different memory allocator will still have the PyGC_Head overhead.

There are a lot of smaller details to work out but I see no reason why the idea should not work. It should significantly reduce memory usage. Also, because the bitmap and gc_refs are contiguous in memory, locality will be improved. Ɓukasz Langa has mentioned that the current GC causes issues with copy-on-write memory in big applications. This change should solve that issue.

To implement, I think the easiest path is to create new malloc to be used by small GC objects, e.g. gcmalloc.c. It would be similar to obmalloc but have the features needed to keep track of the bitmap. obmalloc has some quirks that makes it hard to use for this purpose. Once the idea is proven, gcmalloc could be merged or made to be a variation of obmalloc. Or, maybe just optimized and remain separate. obmalloc is complicated and highly optimized. So, adding additional functionality to it will be challenging.

I believe this change would be ABI compatible.



More information about the Python-Dev mailing list