[Python-Dev] Cyclic GC issues
Tim Peters tim.peters at gmail.com
Mon Oct 11 00:45:00 CEST 2004
[Jason Evans]
Since the spring of 2003, I have been developing Crux, which is a computer program for phylogenetic inferencing (bioinformatics) research. In March of 2004, I switched Crux to using Python from having used a different embeddable interpreter. For the most part, I have been very happy with Python, but Python's garbage collector has been a major source of frustration.
Then you're not using it correctly -- it's easy if you are. You didn't show actual code, though, and I don't have time to guess what you might have done.
... The important aspect of Crux is that it deals with trees.
The details shouldn't matter; Python's cyclic gc handles arbitrary graph structures, including multiple self-loops if you want them. The Python core creates graph structures of boggling complexity, so it might help you to study how core objects implement their tp_traverse and tp_clear methods. They're mostly very simple. Where they're not very simple, it's usually because someone had a silly idea of "optimization" in mind.
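To give the flavor of what the core does, here's a minimal sketch (not lifted from the core; a hypothetical container type "Pair" owning two PyObject* fields, using 2.4's Py_VISIT and Py_CLEAR macros):

    typedef struct {
        PyObject_HEAD
        PyObject *first;   /* owned reference, may be NULL */
        PyObject *second;  /* owned reference, may be NULL */
    } Pair;

    static int
    Pair_traverse(Pair *self, visitproc visit, void *arg)
    {
        /* Report each owned PyObject* exactly once. */
        Py_VISIT(self->first);
        Py_VISIT(self->second);
        return 0;
    }

    static int
    Pair_clear(Pair *self)
    {
        /* Drop owned references; harmless to call more than once. */
        Py_CLEAR(self->first);
        Py_CLEAR(self->second);
        return 0;
    }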
...
1) I originally had lone rings refer to themselves (ob_refcnt started out at 3; 2 self-references and one reference held by the associated edge). This didn't work.
"Didn't work" leaves the reader guessing about everything. Showing code, and being specific about both what you expected and what you observed, are necessary. python-dev isn't really the right place for that, though.
It appears that the cyclic GC does not actually calculate the number of live references to objects (references that are reachable by traversing all objects accessible from the root set);
Python's gc has no "root set" concept, so thinking in those terms won't help. It won't grow a root-set concept, either: "foreign" extension modules (like Tcl/Tk, or library packages written in Fortran) can hold what would normally be considered root-set objects for Python, and that's fine. They're not required to cooperate with Python's memory management beyond respecting Python's refcount rules. Forcing modules (via some new API) to identify root-set objects they own would be backward-incompatible (not to mention a major new burden on extension authors and users), so won't happen.
instead it assumes that if tp_clear() doesn't drop the reference count to a number that equals the number of references from live objects, there must still be references from live objects. Unfortunately, visit()ing self does not work, so there is no way to convince Python that all references are from unreachable objects.
Again I can't guess what "does not work" means. But I can assure you it "would work" if I wrote it. It's easy: an object's tp_traverse must call the visit() callback exactly once for each non-NULL PyObject* directly (in one step) reachable from the object, and that's all. If self is reachable from self via a refcounted pointer held by self, then self's tp_traverse() implementation must call visit(self, arg).
But I think you're getting off track even caring what your pointers point at. If some object type T has three PyObject* pointers pa, pb, pc, then using Python 2.4's Py_VISIT macro, T's tp_traverse should be coded
    static int
    T_traverse(T *self, visitproc visit, void *arg)
    {
        Py_VISIT(self->pa);
        Py_VISIT(self->pb);
        Py_VISIT(self->pc);
        return 0;
    }
It doesn't matter what pa (etc) points at. If pa points to self, fine; if it doesn't, also fine.
Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members).
Please show code. If, e.g., you have a vector v of length n holding the PyObject* "members", then
    int i;
    for (i = 0; i < self->n; ++i)
        Py_VISIT(self->v[i]);
    return 0;
is an appropriate tp_traverse(). For example, that's exactly what the tp_traverse() for Python's builtin list objects does, and a Python list L can certainly be an element of itself:
    >>> L = []
    >>> L.append(L)
    >>> L[0] is L
    True
There's no special-casing needed for the number of containees, or for the nature of what they do or don't point at. It's also irrelevant what can be reached from the direct containees.
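For completeness, the matching tp_clear() for that vector-of-members layout is just as mechanical (a sketch, assuming the same hypothetical self->v and self->n fields; not code from Crux or the core):

    static int
    T_clear(T *self)
    {
        int i;
        /* Drop every owned reference; harmless if a slot is already NULL. */
        for (i = 0; i < self->n; ++i)
            Py_CLEAR(self->v[i]);
        return 0;
    }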
2) This issue is really just a different manifestation of issue (1).
At the C (not Python object) level, each node only actually stores a pointer to a single member of the associated ring.
Please show code. If the C object contains exactly one pointer, then its tp_traverse should call visit() exactly once; etc. The correct C-level code has entirely to do with what the C struct contains, and nothing to do with the view presented at the Python level.
Given a single ring member, it is possible to traverse the ring and reach all other ring members.
That's irrelevant to gc. tp_traverse() is only responsible for revealing the objects directly reachable from self, or, more precisely, for revealing the PyObject* pointers on which self owns a reference. gc computes the transitive closure itself; tp_traverse() isn't responsible for that.
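So if the C-level node really holds just one pointer into the ring, its tp_traverse is a one-liner (a sketch, assuming a hypothetical struct with a single owned ring pointer; your actual layout may differ):

    typedef struct {
        PyObject_HEAD
        PyObject *ring;  /* the one ring member this node owns a reference to */
    } NodeObject;

    static int
    Node_traverse(NodeObject *self, visitproc visit, void *arg)
    {
        /* One owned pointer, so exactly one visit(); gc reaches the rest
           of the ring through the other objects' own tp_traverse. */
        Py_VISIT(self->ring);
        return 0;
    }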
As mentioned in issue (1), the cyclic GC expects tp_traverse() to call visit() once for each reference held.
Yes.
It is not enough for a node to visit() one ring member;
Don't know what it means for "a node to visit()". If the C struct has exactly one pointer, then it is enough for visit() to be invoked exactly once with that pointer. It would be incorrect not to visit() that pointer, and it would be incorrect to visit() more pointers than just that one.
it must visit() all ring members,
If and only if for each ring member M, self contains a pointer directly to M. Showing code is much better than English. View your C-level objects as a directed graph. tp_traverse at a node self is responsible for revealing self's outgoing edges, no less and no more than that.
in order for the GC to count how many references are from unreachable objects, versus reachable from the root set.
In summary, issues (1) and (2) are due to how the cyclic GC does the "marking" phase of mark/sweep GC.
It's not a mark/sweep collector.
My expectation of mark/sweep GC is
Your expectation is irrelevant, since it's not a mark/sweep collector.
that it should be sufficient to assure that all objects reachable from the root set are visit()ed at least once; it should not be important how many times each unreachable object is visit()ed.
Forget root sets and mark/sweep collectors. If you want to know how Python's gc actually works, read the comments in gcmodule.c's collect() function, and drill down from there as deep as your curiosity takes you.
I don't have a deep enough understanding of the Python interpreter to give a detailed suggestion for improvement. I suspect that determining the root set is not an easy operation; if this is the case, then I think we're stuck with the current design.
We are, but I don't feel "stuck" with it -- it's a very good design.
If determining the root set is easy (or even possible), then I would suggest using a standard mark/sweep collector, where unreachable objects are scheduled for destruction.
The root set is impossible to determine without active cooperation from extension modules. Python's gc can infer the existence of roots, but not their identities; it can infer the transitive closure of what's reachable from the root set intersected with the set of objects that have registered with gc, but knows nothing about the roots themselves.
tp_traverse(), tp_clear(), and tp_dealloc() would retain the same structure; the only difference would be the logic that determines which objects can be destroyed.
3) A strange thing can happen when tp_clear() is called. Suppose that an edge object is being cleared, and it drops its references to the associated rings. If ob_refcnt of one of the rings drops to 0 as a result, Python will tp_dealloc() the ring right now, without ever calling tp_clear() on the ring.
That depends on how the tp_dealloc() for the ring is coded, so is up to you. The only necessary thing for tp_clear() to do is to clear enough pointers to break cycles. Some people do only that much. For example, Python tuples don't even implement tp_clear, because a cycle involving tuples necessarily involves non-tuple objects too. Other people write tp_clear() so that it can be called from their tp_dealloc() function too. That's up to them. If you want to do the latter, it's good to write tp_clear() in such a way that calling it more than once is harmless; e.g.,
if (self->pa) {
Py_DECREF(self->pa);
self->pa = NULL;
}
etc
or, using 2.4's Py_CLEAR macro,
Py_CLEAR(self->pa);
etc.
That means that I have to keep track of whether tp_clear() has been called on objects, if it is at all important that tp_clear() be called, so that I can manually do so in tp_dealloc(),
As above, make your tp_clear() idempotent and then you don't have to keep track of anything. It's usually good practice for a tp_clear() to be robust against NULL pointers anyway.
if necessary. It is in my opinion reasonable to have cleanup code in tp_clear(), with the assumption that it will be called precisely once, but Python makes me do extra work to make sure that this happens.
You won't get a guarantee that tp_clear() will be called exactly once. You have a guarantee that tp_dealloc() will be called no more than once. It's actually unusual to find a tp_dealloc() that calls its type's tp_clear(), but I think it's a nice pattern when you can get away with it. In such cases, tp_clear() should be coded so that it can be called multiple times, and that's generally very easy to do.
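A sketch of that pattern, assuming an idempotent tp_clear() like the one above for a hypothetical gc-tracked type T (names are illustrative, not from any particular module):

    static void
    T_dealloc(T *self)
    {
        /* Untrack from gc before tearing the object down, then reuse
           the idempotent tp_clear() to drop all owned references. */
        PyObject_GC_UnTrack((PyObject *)self);
        T_clear(self);
        self->ob_type->tp_free((PyObject *)self);
    }

Because T_clear() only Py_CLEARs pointers, it doesn't matter whether gc already called it during cycle collection; calling it again from the destructor is a no-op.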
This should be pretty easy to change. A single bit per object is needed to keep track of whether tp_clear() has been called. I think this only needs to be done for object types that support cyclic GC.
There's no such thing as "one bit" in reality: because of malloc memory padding, "one more bit" would require adding 4 bytes to every gc'able object. That's an extraordinary expense. Luckily, it's not needed.
4) There is no way to control the order in which objects are tp_dealloc()ed.
It's true that there's no direct way to control the order.
This is a problem for the R-E-R construct, since at a low level, these objects are always linked together.
Many things are always linked together in the core too. Why that's "a problem" for you isn't clear. It's not a problem in the core, for example.
What I would like to do is defer tp_dealloc() on the edge until after both rings have been deleted. Instead, I am forced to use a reference-counted deletion function.
Sorry, don't know what that means, and (as above) don't know why you care.
Not calling self->ob_type->tp_free() on the edge in tp_dealloc() until later is not a reasonable option, because this defers deletion of the edge until a later round of garbage collection.
This could be addressed in the Python interpreter by paying heed to the return value of tp_dealloc(). If the return value is non-zero, move the object to the end of the list of objects to be destroyed,
Except there is no list of objects to be destroyed.
so that destruction is tried later. This allows the module to impose its own destruction ordering.
That part would need clearer motivation, preferably in the form of a PEP.