Python garbage collector — Unofficial Python Development (Victor's notes) documentation
Reference documentation by Pablo Galindo Salgado: https://devguide.python.org/garbage_collector/
Py_TPFLAGS_HAVE_GC¶
The garbage collector does not track an object if its type doesn’t have the Py_TPFLAGS_HAVE_GC flag.
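The rule above can be observed from Python. This is a minimal sketch, assuming the numeric value of Py_TPFLAGS_HAVE_GC is 1 << 14 (as defined in CPython's Include/object.h); the helper name type_has_gc_flag is hypothetical.

```python
import gc

# Assumed value of Py_TPFLAGS_HAVE_GC, taken from CPython's Include/object.h.
HAVE_GC = 1 << 14


def type_has_gc_flag(tp):
    """Return True if the type carries the Py_TPFLAGS_HAVE_GC flag."""
    return bool(tp.__flags__ & HAVE_GC)


# int cannot hold references to other objects: no flag, never tracked.
int_flag = type_has_gc_flag(int)
int_tracked = gc.is_tracked(42)

# list is a container: flag set, instances are tracked.
list_flag = type_has_gc_flag(list)
list_tracked = gc.is_tracked([])
```

Having the flag is necessary but not sufficient for an object to be tracked: some containers are only tracked lazily, so the sketch sticks to int and list.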
If a type has the Py_TPFLAGS_HAVE_GC flag, then when an object is allocated, a PyGC_Head structure is allocated at the beginning of the memory block, but the PyObject* pointer points just after this structure. The _Py_AS_GC(obj) macro gets a PyGC_Head* pointer from a PyObject* pointer using pointer arithmetic: ((PyGC_Head *)(obj) - 1).
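The extra header can be glimpsed from Python, because sys.getsizeof() adds the garbage collector overhead on top of what __sizeof__() reports. A small sketch; the helper name gc_header_overhead is hypothetical, and the exact number of bytes depends on the platform and on the CPython build.

```python
import sys


def gc_header_overhead(obj):
    """Bytes that sys.getsizeof() adds on top of obj.__sizeof__()."""
    return sys.getsizeof(obj) - obj.__sizeof__()


# A list is managed by the GC, so its memory block starts with a header.
list_overhead = gc_header_overhead([])

# An int is not managed by the GC: nothing is added.
int_overhead = gc_header_overhead(1)
```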
See also the PyObject_IS_GC() function which uses the PyTypeObject.tp_is_gc slot. An object has the PyGC_Head header if PyObject_IS_GC() returns true. For a type, the tp_is_gc slot function checks if the type is a heap type (has the Py_TPFLAGS_HEAPTYPE flag): static types don’t have the PyGC_Head header.
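The difference between static types and heap types shows up when type objects themselves are passed to gc.is_tracked(). A minimal sketch, assuming Py_TPFLAGS_HEAPTYPE is 1 << 9 (its value in CPython's Include/object.h); the class name Heap is hypothetical.

```python
import gc

# Assumed value of Py_TPFLAGS_HEAPTYPE, taken from CPython's Include/object.h.
HEAPTYPE = 1 << 9


class Heap:
    """A class created by a class statement is a heap type."""


# int is a static type: no heap type flag, no GC header, not tracked.
static_is_heap = bool(int.__flags__ & HEAPTYPE)
static_tracked = gc.is_tracked(int)

# Heap is allocated at runtime: heap type flag set, tracked by the GC.
heap_is_heap = bool(Heap.__flags__ & HEAPTYPE)
heap_tracked = gc.is_tracked(Heap)
```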
Implement the GC protocol in a type¶
- Set the Py_TPFLAGS_HAVE_GC flag.
- Define a tp_traverse function.
- Define a tp_clear function.
- For heap types, the traverse function must visit the type, and the dealloc function must call Py_DECREF(Py_TYPE(self)). Otherwise, the GC is unable to collect the type once the last instance is deleted, even if all other references to the type are already gone.
- If PyObject_New() is used to allocate an object, replace it with PyObject_GC_New().
- If the dealloc function calls PyObject_Free(), replace it with type->tp_free(self).
- The constructor should call PyObject_GC_Track(self) (or not, depending on how the object was created) and the deallocator should call PyObject_GC_UnTrack(self).
Example of dealloc function:
    static void
    abc_data_dealloc(_abc_data *self)
    {
        PyTypeObject *tp = Py_TYPE(self);
        // ... release resources ...
        tp->tp_free(self);
    #if PY_VERSION_HEX >= 0x03080000
        Py_DECREF(tp);
    #endif
    }
On Python 3.7 and older, Py_DECREF(tp); is not needed: it changed in Python 3.8, see bpo-35810.
PyType_GenericAlloc() allocates memory and immediately tracks the newly created object, even if its memory is uninitialized: its traverse function must support uninitialized objects. Python 3.11 adds a private function _PyType_AllocNoTrack() which allocates memory without tracking the object, so the caller can track the object (PyObject_GC_Track(self)) only once it’s fully initialized, which simplifies the traverse function.
- &PyBaseObject_Type (without Py_TPFLAGS_HAVE_GC):
  - tp_alloc = PyType_GenericAlloc()
  - tp_free = PyObject_Del()
- &PyType_Type (with Py_TPFLAGS_HAVE_GC):
  - tp_alloc = PyType_GenericAlloc() (inherited from &PyBaseObject_Type)
  - tp_free = PyObject_GC_Del()
- &PyDict_Type (with Py_TPFLAGS_HAVE_GC):
  - tp_alloc = _PyType_AllocNoTrack(): functions creating dicts call _PyObject_GC_TRACK()
  - tp_free = PyObject_GC_Del()
gc.collect()¶
CPython uses 3 garbage collector generations. Default thresholds (gc.get_threshold()):
- Generation 0 (youngest objects): 700
- Generation 1: 10
- Generation 2 (oldest objects): 10
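The thresholds and the current counters can be read at runtime. A short sketch; the values printed by a given interpreter may differ from the defaults listed above, since they depend on the Python version and on any earlier gc.set_threshold() call.

```python
import gc

# One threshold per generation, youngest first.
thresholds = gc.get_threshold()

# Current collection counters, in the same order as the thresholds.
counts = gc.get_count()

for generation, (threshold, count) in enumerate(zip(thresholds, counts)):
    print(f"generation {generation}: threshold={threshold} count={count}")
```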
The main function of the GC is gc_collect_main() in Modules/gcmodule.c: it collects objects of a generation. The function relies on the PyGC_Head structure. Simplified algorithm:
- Merge younger generations with the one we are currently collecting.
- Deduce unreachable objects:
  - Copy each object's reference count into PyGC_Head.
  - Traverse objects using visit_decref(); ignore objects which are not part of the generation being collected.
  - Move objects with a reference count (PyGC_Head) of 0 to an “unreachable” list.
- Move reachable objects to the next generation.
- Clear weak references and invoke callbacks as necessary.
- Call tp_finalize on objects which have one.
- Handle any objects that may have been resurrected.
- Call tp_clear on unreachable objects.
- If the DEBUG_SAVEALL flag is set, move uncollectable garbage (cycles with tp_del slots, and stuff reachable only from such cycles) to the gc.garbage list.
The exact implementation is more complicated.
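The "deduce unreachable" step can be illustrated with a toy model in pure Python. This is only a sketch under stated assumptions, not CPython's code: the Node class, link() and find_unreachable() are hypothetical names, and reference counts are maintained by hand. The final pass, which rescues objects reachable from a survivor, is something the simplified list glosses over.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a GC-tracked object."""
    name: str
    refcount: int = 0                         # all references, internal or not
    refs: list = field(default_factory=list)  # objects this node points to
    gc_refs: int = 0                          # working copy, like PyGC_Head


def link(src, dst):
    """Make src reference dst and update dst's reference count."""
    src.refs.append(dst)
    dst.refcount += 1


def find_unreachable(generation):
    members = {id(node) for node in generation}
    # Copy the reference count into the per-object GC field.
    for node in generation:
        node.gc_refs = node.refcount
    # Like visit_decref(): subtract references coming from inside the
    # generation, ignoring targets that are not part of it.
    for node in generation:
        for target in node.refs:
            if id(target) in members:
                target.gc_refs -= 1
    # Objects with gc_refs > 0 are referenced from outside: they are roots.
    # Anything reachable from a root is rescued as well.
    reachable = set()
    stack = [node for node in generation if node.gc_refs > 0]
    while stack:
        node = stack.pop()
        if id(node) in reachable:
            continue
        reachable.add(id(node))
        stack.extend(t for t in node.refs if id(t) in members)
    return [node for node in generation if id(node) not in reachable]


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
link(a, b)
link(b, a)        # isolated cycle: a <=> b
link(c, d)        # c => d
c.refcount += 1   # one external reference to c, e.g. a local variable

unreachable_names = sorted(n.name for n in find_unreachable([a, b, c, d]))
print(unreachable_names)  # ['a', 'b']
```

In this model d ends the subtraction pass with a count of 0, yet it survives because c, which has an external reference, still points to it.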
GC bugs¶
See also the Python finalization.
- bpo-42972: Heap types (PyType_FromSpec) must fully implement the GC protocol
- bpo-40217: The garbage collector doesn’t take into account that objects of heap allocated types hold a strong reference to their type. Bug fixed in Python 3.9.
- bpo-38006: issue with weak references and types which don’t implement tp_traverse.
  - GC fix for weak references: commit
  - Remove a closure in weakref.WeakValueDictionary: commit
  - PyFunctionType.tp_clear: removed temporarily, and then added again
  - cffi type missing a tp_traverse function: bug report (still open at 2021-09-24)
- bpo-35810: Object Initialization does not incref Heap-allocated Types: commit. PyObject_Init() now calls Py_INCREF(Py_TYPE(op)) if the object type is a heap type. Traverse functions must now visit the type and dealloc functions must now call Py_DECREF() on the type.
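The strong reference from an instance to its heap type can be observed with sys.getrefcount(). A minimal sketch, assuming Python 3.8 or newer on a default CPython build (on other builds the counts may behave differently); the class name Point is hypothetical.

```python
import sys


class Point:
    """Hypothetical heap type used only for this illustration."""


before = sys.getrefcount(Point)

instance = Point()                 # the new object holds a reference to Point
during = sys.getrefcount(Point)

del instance                       # the deallocator releases that reference
after = sys.getrefcount(Point)

print(before, during, after)
```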
Reference cycles¶
- C function (PyCFunctionObject): C function <=> module
  - PyCFunctionObject.m_module => module
  - module => module.__dict__
  - module.__dict__ => PyCFunctionObject
- PyTypeObject
  - type->tp_mro => type: the MRO tuple contains the type
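Both cycles can be checked from Python. In this sketch the link from a C function back to its module is read through the __self__ attribute, and the function name class_survives_until_collect is hypothetical. The second part shows the practical consequence of the type cycle: a class is only freed by a GC run, not by reference counting alone.

```python
import builtins
import gc
import weakref

# C function <=> module: len points to builtins, whose dict points to len.
func_to_module = len.__self__ is builtins
module_to_func = builtins.__dict__["len"] is len


def class_survives_until_collect():
    """Show that a class is part of a cycle and needs the GC to be freed."""
    was_enabled = gc.isenabled()
    gc.disable()               # avoid an automatic collection in between
    try:
        class Temp:
            pass

        in_mro = Temp in Temp.__mro__   # the MRO tuple contains the type
        ref = weakref.ref(Temp)
        del Temp                        # drop the only named reference
        alive_before = ref() is not None
        gc.collect()
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return in_mro, alive_before, alive_after


in_mro, alive_before, alive_after = class_survives_until_collect()
```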