[Python-Dev] finalization again (original) (raw)

Guido van Rossum guido@python.org
Wed, 08 Mar 2000 09:06:53 -0500


A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consquence so it doesn't matter in what order you merely reclaim the memory.

When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before).

What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way.

And now we have to deal with the inevitable: finalizers that are part of cycles. It makes sense to reduce the graph of objects to a graph of finalizers only. Example:

A <=> b -> C <=> d

A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles!

If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced greaph), we get:

A -> C

We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d.

This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs.

Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

INCREF(A); A->del(); if (A->ob_refcnt == 1) A->class = NULL; /* Make a finalizer-less */ DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself.

Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, to discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).

I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object against this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think it that this rule (del called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector.

The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer run, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized flag", I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me.

Even if someone once found a good use for resurrecting inside del, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause.

So now we get to discuss what to do with multi-finalizer cycles, like:

A <=> b <=> C

Here the reduced graph is:

A <=> C

About this case you say:

If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a huge "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle to a topsort case themself.

> I find having a separate cleanup protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guesing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before .

Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-)

Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The cleanup approach is one solution, but it also has a problem: we have a set of finalizers. Whose cleanup do we call? Any? All? Suggestions?

Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the cleanup API as soon as I see finalizers in cyclical trash. I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard.

So we're back to the cleanup design. Strawman proposal: for all finalizers in a trash cycle, call their cleanup method, in arbitrary order. After all cleanup calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling del. (This seems to require another garbage colelction cycle -- so perhaps there should also be a once-only rule for cleanup?)

Separate question: what if there is no cleanup? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a cleanup but it doesn't break all cycles.

--Guido van Rossum (home page: http://www.python.org/~guido/)